SD.Next Xmass Edition 2024-12 #3653
vladmandic announced in Announcements
SD.Next Xmass Edition 2024-12
What's new?
While we have several new supported models, workflows and tools, this release is primarily about quality-of-life improvements:
The list of changes that went into this one is long: changes to GPU offloading, a brand new LoRA loader, system memory management, on-the-fly quantization, an improved GGUF loader, etc.
But the main goal is enabling modern large models to run on standard consumer GPUs
without the performance hit typically associated with aggressive memory swapping and the need for constant manual tweaks
Plus tons of new documentation, with full search
We've also added support for several new models, such as the highly anticipated NVLabs Sana (see supported models for the full list)
And several new SOTA video models: Lightricks LTX-Video, Hunyuan Video and Genmo Mochi.1 Preview
And a lot of Control and IPAdapter goodies, including a cool Redux model as well as the XLabs and InstantX IP-adapters
Plus a couple of new integrated workflows such as FreeScale and Style Aligned Image Generation
And it wouldn't be an Xmass edition without a couple of custom themes: Snowflake and Elf-Green!
All-in-all, we're at around 180 commits worth of updates; check the changelog for the full list
ReadMe | ChangeLog | Docs | WiKi | Discord
Details
New models and integrations
NVLabs Sana: support for 1.6B 2048px, 1.6B 1024px and 0.6B 512px models
Sana can synthesize high-resolution images with strong text-image alignment by using Gemma2 as its text encoder
and it's fast: typically at least 2x faster than SD-XL even for the 1.6B variant, and it maintains performance regardless of resolution
e.g., rendering at 4K is possible with less than 8GB VRAM
to use, select from networks -> models -> reference and models will be auto-downloaded on first use
reference values: sampler: default (or any flow-match variant), steps: 20, width/height: 1024, guidance scale: 4.5
note: like other LLM-based text encoders, Sana prefers long and descriptive prompts
any short prompt below 300 characters will be auto-expanded using the built-in Gemma LLM before encoding, while longer prompts are passed as-is
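for reference, a minimal diffusers-level sketch of the settings above; the model repo id is an assumption and SD.Next handles all of this from the UI:

```python
# hedged sketch: load the assumed 1.6B 1024px Sana diffusers repo and render with the
# reference values listed above (steps 20, guidance 4.5, 1024x1024)
import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",  # assumed repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="a highly detailed photograph of a snow-covered alpine village at dusk, warm lights in the windows",
    num_inference_steps=20,
    guidance_scale=4.5,
    width=1024,
    height=1024,
).images[0]
image.save("sana-test.png")
```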
note: when selecting tiles in control settings, you can also specify non-square ratios
in which case it will use context-aware image resize to maintain overall composition
note: available tiling options can be set in settings -> control
Flux Tools: Redux is actually a tool, Fill is an inpaint/outpaint optimized version of Flux-dev
Canny & Depth are optimized versions of Flux-dev for their respective tasks: they are not ControlNets that run on top of an existing model
to use, go to image or control interface and select Flux Tools in scripts
all models are auto-downloaded on first use
note: All models are gated and require acceptance of terms and conditions via web page
recommended: Enable on-the-fly quantization or compression to reduce resource usage
todo: support for Canny/Depth LoRAs
Redux works together with the existing model: it analyzes the input image and uses that as guidance instead of the prompt
optionally, a prompt can be used to combine its guidance with the input image
recommended: lower denoise strength levels result in more variety
note: can be used in inpaint/outpaint mode only
recommended: guidance scale 30
recommended: guidance scale 10
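for the Fill (inpaint/outpaint) variant above, a hedged diffusers-level sketch; image and mask paths are placeholders and SD.Next exposes the same flow through the image/control interface:

```python
# hedged sketch: inpaint with FLUX.1 Fill using the recommended guidance scale of 30
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("input.png")  # source image (placeholder path)
mask = load_image("mask.png")    # white = area to fill (placeholder path)
result = pipe(
    prompt="a wooden bench in a park",
    image=image,
    mask_image=mask,
    height=image.height,
    width=image.width,
    guidance_scale=30.0,          # recommended value from above
    num_inference_steps=50,
).images[0]
result.save("fill-result.png")
```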
As an alternative to standard ControlNets, FLUX.1 also allows LoRAs to help guide the generation process
both Depth and Canny LoRAs are available in standard control menus
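a hedged sketch of the same idea at the diffusers level; the LoRA repo id is an assumption and the control image is expected to be a pre-computed canny edge map:

```python
# hedged sketch: guide FLUX.1 with the Canny control LoRA instead of a ControlNet
import torch
from diffusers import FluxControlPipeline
from diffusers.utils import load_image

pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("black-forest-labs/FLUX.1-Canny-dev-lora")  # assumed repo id

control_image = load_image("canny-edges.png")  # pre-computed canny edge map (placeholder path)
image = pipe(
    prompt="a futuristic cityscape at night, neon reflections on wet streets",
    control_image=control_image,
    guidance_scale=10.0,            # recommended value from above
    num_inference_steps=50,
).images[0]
image.save("canny-lora-result.png")
```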
in addition to existing ControlNets from InstantX and Alimama, we now have official ones from StabilityAI
Style Aligned Image Generation: enable in scripts, compatible with SD-XL
enter multiple prompts in the prompt field, separated by new lines
style-aligned applies selected attention layers uniformly to all images to achieve consistency
can be used with or without an input image, in which case the first prompt is used to establish the baseline
note: all prompts are processed as a single batch, so VRAM is the limiting factor
FreeScale: enable in scripts, compatible with SD-XL for text and img2img
runs iterative generation of images at different scales to achieve better results (illustrated below)
can render 4K SDXL images
note: disable live preview to avoid memory issues when generating large images
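to illustrate what iterative multi-scale generation means, here is a generic coarse-to-fine sketch; this is not the FreeScale algorithm itself (the script handles its attention-level machinery internally), just the general generate-upscale-refine pattern:

```python
# generic coarse-to-fine sketch, NOT the actual FreeScale implementation
import torch
from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image

txt2img = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
img2img = AutoPipelineForImage2Image.from_pipe(txt2img)  # reuse the same weights

prompt = "a detailed painting of a mountain village in winter"
image = txt2img(prompt, width=1024, height=1024).images[0]        # base pass
for size in (2048, 4096):                                          # progressively larger scales
    image = image.resize((size, size))                             # naive upscale of previous pass
    image = img2img(prompt, image=image, strength=0.35).images[0]  # refine at the new scale
image.save("multiscale-result.png")
```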
Video models
Lightricks LTX-Video: model size 27.75GB
support for 0.9.0, 0.9.1 and custom safetensor-based models with full quantization and offloading support
support for text-to-video and image-to-video; to use, select in scripts -> ltx-video
reference values: steps 50, width 704, height 512, frames 161, guidance scale 3.0 (a minimal sketch follows after the notes below)
Hunyuan Video: model size 40.92GB
support for text-to-video; to use, select in scripts -> hunyuan video
basic support only
reference values: steps 50, width 1280, height 720, frames 129, guidance scale 6.0
Genmo Mochi.1 Preview: support for text-to-video; to use, select in scripts -> mochi.1 video
basic support only
reference values: steps 64, width 848, height 480, frames 19, guidance scale 4.5
Notes:
any use on GPUs below 16GB VRAM and systems below 48GB RAM is experimental at best
any future optimizations would likely have to go into partial loading and execution instead of offloading inactive parts of the model
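for reference, a minimal diffusers-level sketch of the LTX-Video values above; the model repo id is an assumption and SD.Next drives this via scripts -> ltx-video:

```python
# hedged sketch: text-to-video with the LTX-Video reference values (50 steps, 704x512, 161 frames)
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16).to("cuda")  # assumed repo id

frames = pipe(
    prompt="a slow dolly shot through a snowy forest at sunrise",
    width=704,
    height=512,
    num_frames=161,
    num_inference_steps=50,
    guidance_scale=3.0,
).frames[0]
export_to_video(frames, "ltx-video.mp4", fps=24)
```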
UI and workflow improvements
applying LoRA on model load results in perceived overhead on generate startup, but overall faster execution since the LoRA does not need to be processed on each step
thanks @AI-Casanova
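to illustrate the trade-off above, a small sketch of folding a low-rank update directly into a base weight; this is the general LoRA-fusion idea, not necessarily SD.Next's actual loader code:

```python
# illustrative only: fuse W' = W + scale * (up @ down) once at load time,
# so each denoising step runs on plain weights with no extra LoRA math
import torch

def fuse_lora(weight: torch.Tensor, lora_down: torch.Tensor, lora_up: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    return weight + scale * (lora_up @ lora_down)

base = torch.randn(320, 320)   # hypothetical layer weight
down = torch.randn(8, 320)     # rank-8 LoRA down-projection
up = torch.randn(320, 8)       # rank-8 LoRA up-projection
fused = fuse_lora(base, down, up, scale=1.0)
print(fused.shape)             # torch.Size([320, 320])
```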
the offload setting has large performance and resource implications, see the Offload wiki for details
example:
<lora:/test/folder/my-lora.safetensors:1.0>
huggingface: LoRAs can also be referenced directly from a huggingface repo
example:
<lora:/huggingface.co/vendor/repo/my-lora.safetensors:1.0>
bnb quant when loading unet/transformer
example: https://civitai.com/models/646328?modelVersionId=1040235
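for context, a rough diffusers-level illustration of what on-the-fly bnb quantization of a transformer looks like; the model id is illustrative and SD.Next applies this automatically based on settings:

```python
# hedged sketch: load a transformer with 4-bit bitsandbytes quantization on the fly
import torch
from diffusers import FluxTransformer2DModel, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",      # illustrative model id
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```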
default low-watermark is 0.25: skip offload if memory usage is below 25%
default high-watermark is 0.70: must offload if memory usage is above 70% (a minimal sketch of this logic follows below)
low-end systems, triggered by either lowvram or by detection of <=4GB VRAM, will use sequential offload
all other systems use balanced offload by default (can be changed in settings)
previous behavior was to use model offload on systems with <=8GB and medvram, and no offload by default
if you have issues with image decode, you'll need to enable it manually
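a hypothetical illustration of how the two watermarks interact; this is not SD.Next's actual offload code, just the threshold logic described above:

```python
# illustrative only: decide whether to offload based on current GPU memory usage fraction
def should_offload(usage: float, low: float = 0.25, high: float = 0.70) -> bool:
    if usage < low:    # below low-watermark: skip offload
        return False
    if usage > high:   # above high-watermark: must offload
        return True
    return True        # in between: left to the balanced-offload policy (simplified here)

print(should_offload(0.20), should_offload(0.85))  # False True
```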
black-reimagined, thanks @Artheriax
Updates
TorchAO
pre (during load) and post (during execution) quantization; torchao supports 4 different int-based and 3 float-based quantization schemes (a minimal sketch follows after this list)
This is in addition to existing support for:
BitsAndBytes with 3 float-based quantization schemes
Optimium.Quanto with 3 int-based and 2 float-based quantization schemes
GGUF with pre-quantized weights
GGUF loader changed from custom implementation to diffusers native
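for the torchao option above, a hedged sketch of the during-load variant through diffusers; the quant_type string and model id are assumptions:

```python
# hedged sketch: int8 weight-only torchao quantization applied when loading a transformer
import torch
from diffusers import FluxTransformer2DModel, TorchAoConfig

quant_config = TorchAoConfig("int8wo")   # assumed quant_type string (int8 weight-only)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",      # illustrative model id
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```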
Fixes
SD_NO_CACHE=true
env variable to disable file/folder caching
diffusers
wandb package