SD.Next Xmass Edition 2024-12 #3653
vladmandic announced in Announcements
SD.Next Xmass Edition 2024-12
What's new?
While we have several new supported models, workflows and tools, this release is primarily about quality-of-life improvements:
The list of changes that went into this one is long: changes to GPU offloading, a brand new LoRA loader, system memory management, on-the-fly quantization, an improved GGUF loader, etc.
But the main goal is enabling modern large models to run on standard consumer GPUs
without the performance hit typically associated with aggressive memory swapping and the need for constant manual tweaks
Plus tons of new documentation, with full search
We've also added support for several new models, such as the highly anticipated NVLabs Sana (see supported models for the full list)
And several new SOTA video models: Lightricks LTX-Video, Hunyuan Video and Genmo Mochi.1 Preview
And a lot of Control and IPAdapter goodies, including a cool Redux model as well as the XLabs and InstantX IP-adapters
Plus a couple of new integrated workflows such as FreeScale and Style Aligned Image Generation
And it wouldn't be an Xmass edition without a couple of custom themes: Snowflake and Elf-Green!
All-in-all, we're at around 180 commits worth of updates; check the changelog for the full list
ReadMe | ChangeLog | Docs | WiKi | Discord
Details
New models and integrations
NVLabs Sana: support for 1.6B 2048px, 1.6B 1024px and 0.6B 512px models
Sana can synthesize high-resolution images with strong text-image alignment by using Gemma2 as its text encoder
and it's fast: typically at least 2x faster than SD-XL even for the 1.6B variant, and it maintains performance regardless of resolution
e.g., rendering at 4K is possible with less than 8GB VRAM
to use, select from networks -> models -> reference and models will be auto-downloaded on first use
reference values: sampler: default (or any flow-match variant), steps: 20, width/height: 1024, guidance scale: 4.5
note: like other LLM-based text encoders, Sana prefers long and descriptive prompts
any short prompt below 300 characters will be auto-expanded using the built-in Gemma LLM before encoding, while longer prompts are passed as-is
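for reference, a minimal diffusers-level sketch of the settings above; the model repo id is an assumption and SD.Next handles all of this from the UI:

```python
# hedged sketch: load the assumed 1.6B 1024px Sana diffusers repo and render with the
# reference values listed above (steps 20, guidance 4.5, 1024x1024)
import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_diffusers",  # assumed repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="a highly detailed photograph of a snow-covered alpine village at dusk, warm lights in the windows",
    num_inference_steps=20,
    guidance_scale=4.5,
    width=1024,
    height=1024,
).images[0]
image.save("sana-test.png")
```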
note: when selecting tiles in control settings, you can also specify non-square ratios
in which case it will use context-aware image resize to maintain overall composition
note: available tiling options can be set in settings -> control
Flux Tools: Redux is actually a tool, Fill is an inpaint/outpaint optimized version of Flux-dev
Canny & Depth are optimized versions of Flux-dev for their respective tasks: they are not ControlNets that run on top of an existing model
to use, go to image or control interface and select Flux Tools in scripts
all models are auto-downloaded on first use
note: All models are gated and require acceptance of terms and conditions via web page
recommended: Enable on-the-fly quantization or compression to reduce resource usage
todo: support for Canny/Depth LoRAs
Redux works together with the existing model: it analyzes the input image and uses that as guidance instead of the prompt
optionally, a prompt can be used to combine its guidance with the input image
recommended: lower denoise strength levels result in more variety
note: can be used in inpaint/outpaint mode only
recommended: guidance scale 30
recommended: guidance scale 10
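for the Fill (inpaint/outpaint) variant above, a hedged diffusers-level sketch; image and mask paths are placeholders and SD.Next exposes the same flow through the image/control interface:

```python
# hedged sketch: inpaint with FLUX.1 Fill using the recommended guidance scale of 30
import torch
from diffusers import FluxFillPipeline
from diffusers.utils import load_image

pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("input.png")  # source image (placeholder path)
mask = load_image("mask.png")    # white = area to fill (placeholder path)
result = pipe(
    prompt="a wooden bench in a park",
    image=image,
    mask_image=mask,
    height=image.height,
    width=image.width,
    guidance_scale=30.0,          # recommended value from above
    num_inference_steps=50,
).images[0]
result.save("fill-result.png")
```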
As an alternative to standard ControlNets, FLUX.1 also allows LoRAs to help guide the generation process
both Depth and Canny LoRAs are available in standard control menus
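a hedged sketch of the same idea at the diffusers level; the LoRA repo id is an assumption and the control image is expected to be a pre-computed canny edge map:

```python
# hedged sketch: guide FLUX.1 with the Canny control LoRA instead of a ControlNet
import torch
from diffusers import FluxControlPipeline
from diffusers.utils import load_image

pipe = FluxControlPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights("black-forest-labs/FLUX.1-Canny-dev-lora")  # assumed repo id

control_image = load_image("canny-edges.png")  # pre-computed canny edge map (placeholder path)
image = pipe(
    prompt="a futuristic cityscape at night, neon reflections on wet streets",
    control_image=control_image,
    guidance_scale=10.0,            # recommended value from above
    num_inference_steps=50,
).images[0]
image.save("canny-lora-result.png")
```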
in addition to existing ControlNets from InstantX and Alimama, we now have official ones from StabilityAI
Style Aligned Image Generation: enable in scripts, compatible with SD-XL
enter multiple prompts in the prompt field, separated by new lines
style-aligned applies selected attention layers uniformly to all images to achieve consistency
can be used with or without an input image, in which case the first prompt is used to establish the baseline
note: all prompts are processed as a single batch, so VRAM is the limiting factor
FreeScale: enable in scripts, compatible with SD-XL for text and img2img
runs iterative generation of images at different scales to achieve better results (illustrated below)
can render 4K SDXL images
note: disable live preview to avoid memory issues when generating large images
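to illustrate what iterative multi-scale generation means, here is a generic coarse-to-fine sketch; this is not the FreeScale algorithm itself (the script handles its attention-level machinery internally), just the general generate-upscale-refine pattern:

```python
# generic coarse-to-fine sketch, NOT the actual FreeScale implementation
import torch
from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image

txt2img = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
img2img = AutoPipelineForImage2Image.from_pipe(txt2img)  # reuse the same weights

prompt = "a detailed painting of a mountain village in winter"
image = txt2img(prompt, width=1024, height=1024).images[0]        # base pass
for size in (2048, 4096):                                          # progressively larger scales
    image = image.resize((size, size))                             # naive upscale of previous pass
    image = img2img(prompt, image=image, strength=0.35).images[0]  # refine at the new scale
image.save("multiscale-result.png")
```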
Video models
Lightricks LTX-Video: model size 27.75GB
support for 0.9.0, 0.9.1 and custom safetensor-based models with full quantization and offloading support
support for text-to-video and image-to-video; to use, select in scripts -> ltx-video
reference values: steps 50, width 704, height 512, frames 161, guidance scale 3.0 (a minimal sketch follows after the notes below)
Hunyuan Video: model size 40.92GB
support for text-to-video; to use, select in scripts -> hunyuan video
basic support only
reference values: steps 50, width 1280, height 720, frames 129, guidance scale 6.0
Genmo Mochi.1 Preview: support for text-to-video; to use, select in scripts -> mochi.1 video
basic support only
reference values: steps 64, width 848, height 480, frames 19, guidance scale 4.5
Notes:
any use on GPUs below 16GB VRAM and systems below 48GB RAM is experimental at best
any future optimizations would likely have to go into partial loading and execution instead of offloading inactive parts of the model
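for reference, a minimal diffusers-level sketch of the LTX-Video values above; the model repo id is an assumption and SD.Next drives this via scripts -> ltx-video:

```python
# hedged sketch: text-to-video with the LTX-Video reference values (50 steps, 704x512, 161 frames)
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16).to("cuda")  # assumed repo id

frames = pipe(
    prompt="a slow dolly shot through a snowy forest at sunrise",
    width=704,
    height=512,
    num_frames=161,
    num_inference_steps=50,
    guidance_scale=3.0,
).frames[0]
export_to_video(frames, "ltx-video.mp4", fps=24)
```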
UI and workflow improvements
applying LoRA on model load results in perceived overhead on generate startup, but overall faster execution since the LoRA does not need to be processed on each step
thanks @AI-Casanova
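to illustrate the trade-off above, a small sketch of folding a low-rank update directly into a base weight; this is the general LoRA-fusion idea, not necessarily SD.Next's actual loader code:

```python
# illustrative only: fuse W' = W + scale * (up @ down) once at load time,
# so each denoising step runs on plain weights with no extra LoRA math
import torch

def fuse_lora(weight: torch.Tensor, lora_down: torch.Tensor, lora_up: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    return weight + scale * (lora_up @ lora_down)

base = torch.randn(320, 320)   # hypothetical layer weight
down = torch.randn(8, 320)     # rank-8 LoRA down-projection
up = torch.randn(320, 8)       # rank-8 LoRA up-projection
fused = fuse_lora(base, down, up, scale=1.0)
print(fused.shape)             # torch.Size([320, 320])
```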
the offload setting has large performance and resource implications, see the Offload wiki for details
example:
<lora:/test/folder/my-lora.safetensors:1.0>
huggingface: LoRAs can also be referenced directly from a huggingface repo
example:
<lora:/huggingface.co/vendor/repo/my-lora.safetensors:1.0>
bnb quant when loading unet/transformer
example: https://civitai.com/models/646328?modelVersionId=1040235
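for context, a rough diffusers-level illustration of what on-the-fly bnb quantization of a transformer looks like; the model id is illustrative and SD.Next applies this automatically based on settings:

```python
# hedged sketch: load a transformer with 4-bit bitsandbytes quantization on the fly
import torch
from diffusers import FluxTransformer2DModel, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",      # illustrative model id
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```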
default low-watermark is 0.25: skip offload if memory usage is below 25%
default high-watermark is 0.70: must offload if memory usage is above 70% (a minimal sketch of this logic follows below)
low-end systems, triggered by either lowvram or by detection of <=4GB VRAM, will use sequential offload
all other systems use balanced offload by default (can be changed in settings)
previous behavior was to use model offload on systems with <=8GB and medvram, and no offload by default
if you have issues with image decode, you'll need to enable it manually
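a hypothetical illustration of how the two watermarks interact; this is not SD.Next's actual offload code, just the threshold logic described above:

```python
# illustrative only: decide whether to offload based on current GPU memory usage fraction
def should_offload(usage: float, low: float = 0.25, high: float = 0.70) -> bool:
    if usage < low:    # below low-watermark: skip offload
        return False
    if usage > high:   # above high-watermark: must offload
        return True
    return True        # in between: left to the balanced-offload policy (simplified here)

print(should_offload(0.20), should_offload(0.85))  # False True
```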
black-reimagined, thanks @Artheriax
Updates
TorchAO
pre (during load) and post (during execution) quantization; torchao supports 4 different int-based and 3 float-based quantization schemes (a minimal sketch follows after this list)
This is in addition to existing support for:
BitsAndBytes with 3 float-based quantization schemes
Optimium.Quanto with 3 int-based and 2 float-based quantization schemes
GGUF with pre-quantized weights
GGUF loader changed from custom implementation to diffusers native
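for the torchao option above, a hedged sketch of the during-load variant through diffusers; the quant_type string and model id are assumptions:

```python
# hedged sketch: int8 weight-only torchao quantization applied when loading a transformer
import torch
from diffusers import FluxTransformer2DModel, TorchAoConfig

quant_config = TorchAoConfig("int8wo")   # assumed quant_type string (int8 weight-only)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",      # illustrative model id
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
```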
Fixes
SD_NO_CACHE=true
env variable to disable file/folder caching
diffusers
wandb package