Diffusers 0.32.0: New video pipelines, new image pipelines, new quantization backends, new training scripts, and more

@sayakpaul sayakpaul released this 23 Dec 16:00
[Video: hunyuan-output.mp4]

This release took a while, but it has many exciting updates. It contains several new pipelines for image and video generation, new quantization backends, and more.

Going forward, we will use a roadmap tracker to give the community more transparency about ongoing developments and releases in Diffusers.

New Video Generation Pipelines 📹

Open video generation models are on the rise, and we’re pleased to provide comprehensive integration support for all of them. The following video pipelines are bundled in this release:

Check out this section to learn more about the fine-tuning options available for these new video models.

New Image Generation Pipelines

Important Note about the new Flux Models

We can combine the regular Flux.1 Dev LoRAs with Flux Control LoRAs, Flux Control, and Flux Fill. For example, you can enable few-steps inference with Flux Fill using:

from diffusers import FluxFillPipeline
from diffusers.utils import load_image
import torch

# Load the Flux Fill pipeline in bfloat16 and move it to the GPU.
pipe = FluxFillPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Load a regular Flux.1 Dev LoRA (a few-steps "Turbo" adapter) on top of Flux Fill.
adapter_id = "alimama-creative/FLUX.1-Turbo-Alpha"
pipe.load_lora_weights(adapter_id)

# Input image and the mask that marks the region to fill.
image = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup.png")
mask = load_image("https://huggingface.co/datasets/diffusers/diffusers-images-docs/resolve/main/cup_mask.png")

image = pipe(
    prompt="a white paper cup",
    image=image,
    mask_image=mask,
    height=1632,
    width=1232,
    guidance_scale=30,
    num_inference_steps=8,  # few-steps inference with the Turbo LoRA loaded above
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),  # fixed seed for reproducibility
).images[0]
image.save("flux-fill-dev.png")

To learn more, check out the documentation.

Note

SANA is small compared to models like Flux. Sana-0.6B can be deployed on a 16GB laptop GPU and takes less than 1 second to generate a 1024×1024 image. We also support LoRA fine-tuning of SANA; check out this section for more details.
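As a rough illustration of the note above, here is a minimal sketch of text-to-image generation with `SanaPipeline`. The function name `generate_with_sana`, the default checkpoint ID, and the choice of float16 are assumptions made for this example and are not taken from the release notes; check the model card for the recommended dtype and variant before relying on them.

```python
def generate_with_sana(
    prompt,
    model_id="Efficient-Large-Model/Sana_600M_1024px_diffusers",  # assumed checkpoint ID
    dtype_name="float16",  # assumed dtype; some checkpoints may expect bfloat16
    seed=0,
):
    """Generate one 1024x1024 image with a SANA checkpoint (hypothetical helper)."""
    # Heavy imports are kept inside the function so that merely defining it
    # does not require a GPU or the libraries to be installed.
    import torch
    from diffusers import SanaPipeline

    dtype = getattr(torch, dtype_name)
    pipe = SanaPipeline.from_pretrained(model_id, torch_dtype=dtype)
    pipe.to("cuda")

    result = pipe(
        prompt=prompt,
        height=1024,
        width=1024,
        generator=torch.Generator("cpu").manual_seed(seed),  # reproducible output
    )
    return result.images[0]
```

Calling `generate_with_sana("a cyberpunk cat")` on a CUDA machine should return a PIL image that can be written out with `.save("sana.png")`. Actual latency and memory use depend on the hardware and checkpoint.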

Acknowledgements

New Quantization Backends

Please be aware of the following caveats:

  • TorchAO-quantized checkpoints cannot currently be serialized in the safetensors format. This may change in the future.
  • In this release, GGUF support is limited to loading pre-quantized checkpoints into models. Support for saving models with GGUF quantization will be added in the future.
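The two caveats above can be sketched as follows. This is a minimal sketch, assuming the `TorchAoConfig` and `GGUFQuantizationConfig` classes exposed by `diffusers`, a Flux transformer as the example model, and the `torchao` and `gguf` packages being installed. The function names and the `model_id`, `save_dir`, and `ckpt_path` arguments are hypothetical placeholders, not part of the library.

```python
def quantize_and_save_torchao(model_id, save_dir, quant_type="int8wo"):
    """Quantize a Flux transformer with TorchAO and save it (hypothetical helper)."""
    # Imports live inside the function so defining it has no heavy dependencies.
    import torch
    from diffusers import FluxTransformer2DModel, TorchAoConfig

    transformer = FluxTransformer2DModel.from_pretrained(
        model_id,
        subfolder="transformer",
        quantization_config=TorchAoConfig(quant_type),
        torch_dtype=torch.bfloat16,
    )
    # TorchAO checkpoints cannot be written as safetensors yet,
    # so safetensors serialization is disabled explicitly.
    transformer.save_pretrained(save_dir, safe_serialization=False)
    return transformer


def load_gguf_flux_transformer(ckpt_path):
    """Load a pre-quantized GGUF Flux transformer checkpoint (hypothetical helper)."""
    import torch
    from diffusers import FluxTransformer2DModel, GGUFQuantizationConfig

    # GGUF support is load-only: the checkpoint must already be quantized.
    return FluxTransformer2DModel.from_single_file(
        ckpt_path,
        quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
        torch_dtype=torch.bfloat16,
    )
```

Here `ckpt_path` is expected to point at a `.gguf` file (a local path or a Hub URL), and the returned transformer can be passed to a Flux pipeline through its `transformer` argument. A TorchAO checkpoint saved this way will likely need `use_safetensors=False` when it is reloaded with `from_pretrained`.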

New training scripts

This release features many new training scripts for the community to play with:

All commits

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @faaany
    • fix bug in require_accelerate_version_greater (#9746)
    • make pipelines tests device-agnostic (part1) (#9399)
    • make pipelines tests device-agnostic (part2) (#9400)
  • @linoytsaban
    • [SD3-5 dreambooth lora] update model cards (#9749)
    • [SD 3.5 Dreambooth LoRA] support configurable training block & layers (#9762)
    • [flux dreambooth lora training] make LoRA target modules configurable + small bug fix (#9646)
    • [advanced flux training] bug fix + reduce memory cost as in #9829 (#9838)
    • [SD3 dreambooth lora] smol fix to checkpoint saving (#9993)
    • [Flux Redux] add prompt & multiple image input (#10056)
    • [community pipeline] Add RF-inversion Flux pipeline (#9816)
    • [community pipeline rf-inversion] - fix example in doc (#10179)
    • [RF inversion community pipeline] add eta_decay (#10199)
  • @raulc0399
    • adds the pipeline for pixart alpha controlnet (#8857)
  • @yiyixuxu
    • Revert "[LoRA] fix: lora loading when using with a device_mapped mode… (#9823)
    • fix controlnet module refactor (#9968)
    • Sd35 controlnet (#10020)
    • fix offloading for sd3.5 controlnets (#10072)
    • pass attn mask arg for flux (#10122)
    • update get_parameter_dtype (#10342)
  • @jellyheadandrew
    • Add new community pipeline for 'Adaptive Mask Inpainting', introduced in [ECCV2024] ComA (#9228)
  • @DN6
    • Improve downloads of sharded variants (#9869)
    • [CI] Unpin torch<2.5 in CI (#9961)
    • Flux latents fix (#9929)
    • [Single File] Fix SD3.5 single file loading (#10077)
    • [Single File] Pass token when fetching interpreted config (#10082)
    • [Single File] Add single file support for AutoencoderDC (#10183)
    • Fix format issue in push_test yml (#10235)
    • [Single File] Add GGUF support (#9964)
    • Fix Mochi Quality Issues (#10033)
    • Fix Doc links in GGUF and Quantization overview docs (#10279)
    • Make zeroing prompt embeds for Mochi Pipeline configurable (#10284)
    • [Single File] Add single file support for Flux Canny, Depth and Fill (#10288)
    • [Single File] Add single file support for Mochi Transformer (#10268)
    • Allow Mochi Transformer to be split across multiple GPUs (#10300)
    • [Single File] Add GGUF support for LTX (#10298)
    • Mochi docs (#9934)
    • [Single File] Add Single File support for HunYuan video (#10320)
    • [Single File] Fix loading (#10349)
  • @ParagEkbote
    • Notebooks for Community Scripts Examples (#9905)
    • Move Wuerstchen Dreambooth to research_projects (#9935)
    • Fixed Nits in Docs and Example Script (#9940)
    • Notebooks for Community Scripts-2 (#9952)
    • Move IP Adapter Scripts to research project (#9960)
    • Notebooks for Community Scripts-3 (#10032)
    • Fixed Nits in Evaluation Docs (#10063)
    • Notebooks for Community Scripts-4 (#10094)
    • Fix Broken Link in Optimization Docs (#10105)
    • Fix Broken Links in ReadMe (#10117)
  • @painebenjamin
    • Fix Progress Bar Updates in SD 1.5 PAG Img2Img pipeline (#9925)
    • Add StableDiffusion3PAGImg2Img Pipeline + Fix SD3 Unconditional PAG (#9932)
  • @hlky
    • Fix beta and exponential sigmas + add tests (#9954)
    • ControlNet from_single_file when already converted (#9978)
    • Add beta, exponential and karras sigmas to FlowMatchEulerDiscreteScheduler (#10001)
    • Add sigmas to Flux pipelines (#10081)
    • Fix num_images_per_prompt>1 with Skip Guidance Layers in StableDiffusion3Pipeline (#10086)
    • Convert sigmas to np.array in FlowMatch set_timesteps (#10088)
    • Fix multi-prompt inference (#10103)
    • Test skip_guidance_layers in SD3 pipeline (#10102)
    • Fix pipeline_stable_audio formating (#10114)
    • Add sigmas to pipelines using FlowMatch (#10116)
    • Use torch in get_3d_rotary_pos_embed/_allegro (#10161)
    • Add ControlNetUnion (#10131)
    • Remove negative_* from SDXL callback (#10203)
    • refactor StableDiffusionXLControlNetUnion (#10200)
    • Use torch in get_2d_sincos_pos_embed and get_3d_sincos_pos_embed (#10156)
    • Use t instead of timestep in _apply_perturbed_attention_guidance (#10243)
    • Add dynamic_shifting to SD3 (#10236)
    • Fix use_flow_sigmas (#10242)
    • Fix ControlNetUnion _callback_tensor_inputs (#10218)
    • Use non-human subject in StableDiffusion3ControlNetPipeline example (#10214)
    • Add enable_vae_tiling to AllegroPipeline, fix example (#10212)
    • Fix checkpoint in CogView3PlusPipeline example (#10211)
    • Fix RePaint Scheduler (#10185)
    • Add ControlNetUnion to AutoPipeline from_pretrained (#10219)
    • Add set_shift to FlowMatchEulerDiscreteScheduler (#10269)
    • Use torch in get_2d_rotary_pos_embed (#10155)
    • Fix sigma_last with use_flow_sigmas (#10267)
    • Add Flux Control to AutoPipeline (#10292)
    • Check correct model type is passed to from_pretrained (#10189)
    • Fix local_files_only for checkpoints with shards (#10294)
    • Fix push_tests_mps.yml (#10326)
    • Fix EMAModel test_from_pretrained (#10325)
    • Support Flux IP Adapter (#10261)
    • Fix enable_sequential_cpu_offload in test_kandinsky_combined (#10324)
    • Fix FluxIPAdapterTesterMixin (#10354)
  • @dimitribarbot
    • Update sdxl reference pipeline to latest sdxl pipeline (#9938)
    • Add sdxl controlnet reference community pipeline (#9893)
  • @suzukimain
    • [community] Load Models from Sources like Civitai into Existing Pipelines (#9986)
  • @lawrence-cj
    • [DC-AE] Add the official Deep Compression Autoencoder code(32x,64x,128x compression ratio); (#9708)
    • [Sana] Add Sana, including SanaPipeline, SanaPAGPipeline, LinearAttentionProcessor, Flow-based DPM-sovler and so on. (#9982)
    • [Sana]add 2K related model for Sana (#10322)
    • [Sana bug] bug fix for 2K model config (#10340)
  • @darshil0805
    • Add PAG Support for Stable Diffusion Inpaint Pipeline (#9386)
  • @affromero
    • Flux Control(Depth/Canny) + Inpaint (#10192)
  • @SHYuanBest
    • [LoRA] Support HunyuanVideo (#10254)
  • @guiyrt
    • [WIP] SD3.5 IP-Adapter Pipeline Integration (#9987)