[FR] Please support WanVACETransformer3DModel.from_single_file

@DN6

Is your feature request related to a problem? Please describe.
WanVACETransformer3DModel does not support from_single_file yet, so GGUF checkpoints for Wan VACE cannot be loaded.

Describe the solution you'd like.
Install diffusers from pull request #11582:
pip install git+https://github.com/huggingface/diffusers.git@refs/pull/11582/head

I would like the following script to work:

from typing import List
import torch
import PIL.Image
from diffusers import AutoencoderKLWan, WanVACEPipeline, WanVACETransformer3DModel
from diffusers.schedulers.scheduling_unipc_multistep import UniPCMultistepScheduler
from diffusers.utils import export_to_video, load_image, load_video
from diffusers import GGUFQuantizationConfig

model_id = "a-r-r-o-w/Wan-VACE-1.3B-diffusers"
transformer_path = "https://huggingface.co/newgenai79/Wan2.1-VACE-1.3B-GGUF/blob/main/Wan2.1-VACE-1.3B-Q8_0.gguf"
transformer_gguf = WanVACETransformer3DModel.from_single_file(
    transformer_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanVACEPipeline.from_pretrained(
    model_id,
    transformer=transformer_gguf,
    vae=vae,
    torch_dtype=torch.bfloat16,
)
flow_shift = 3.0  # 5.0 for 720P, 3.0 for 480P
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=flow_shift)
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()


prompt = "A sleek, humanoid robot stands in a vast warehouse filled with neatly stacked cardboard boxes on industrial shelves. The robot's metallic body gleams under the bright, even lighting, highlighting its futuristic design and intricate joints. A glowing blue light emanates from its chest, adding a touch of advanced technology. The background is dominated by rows of boxes, suggesting a highly organized storage system. The floor is lined with wooden pallets, enhancing the industrial setting. The camera remains static, capturing the robot's poised stance amidst the orderly environment, with a shallow depth of field that keeps the focus on the robot while subtly blurring the background for a cinematic effect."
negative_prompt = "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards"

output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=832,
    height=480,
    num_frames=81,
    num_inference_steps=30,
    guidance_scale=5.0,
    conditioning_scale=0.0,
    generator=torch.Generator().manual_seed(0),
).frames[0]
export_to_video(output, "output.mp4", fps=16)

Describe alternatives you've considered.
pipe.enable_sequential_cpu_offload() works but is very slow. Loading a GGUF-quantized transformer would help.
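
To put a rough number on why GGUF would help, here is a back-of-the-envelope estimate of the transformer weight size in bf16 versus Q8_0. It is only a sketch under stated assumptions: the parameter count is the nominal 1.3B from the model name, every weight is assumed to be quantized (in practice some tensors usually stay in higher precision), and activations, the text encoder, and the VAE are ignored. The Q8_0 layout assumed here is 32 int8 values plus one fp16 scale per block, i.e. 34 bytes per 32 weights.

```python
import math

# Nominal parameter count taken from the model name (assumption, not measured).
NUM_PARAMS = 1_300_000_000

BF16_BYTES_PER_PARAM = 2   # bfloat16 stores each weight in 2 bytes
Q8_0_BLOCK_SIZE = 32       # weights per Q8_0 block
Q8_0_BLOCK_BYTES = 32 + 2  # 32 int8 values + one fp16 scale


def bf16_weight_bytes(num_params: int) -> int:
    """Bytes needed to store all weights in bfloat16."""
    return num_params * BF16_BYTES_PER_PARAM


def q8_0_weight_bytes(num_params: int) -> int:
    """Bytes needed if every weight were stored in Q8_0 blocks."""
    num_blocks = math.ceil(num_params / Q8_0_BLOCK_SIZE)
    return num_blocks * Q8_0_BLOCK_BYTES


if __name__ == "__main__":
    bf16 = bf16_weight_bytes(NUM_PARAMS)
    q8 = q8_0_weight_bytes(NUM_PARAMS)
    print(f"bf16 : {bf16 / 1e9:.2f} GB")
    print(f"Q8_0 : {q8 / 1e9:.2f} GB")
    print(f"ratio: {q8 / bf16:.3f}")
```

Under these assumptions the transformer weights shrink from about 2.6 GB to about 1.4 GB, which may be enough to use enable_model_cpu_offload() instead of the slower sequential offload on small GPUs.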

Additional context.

(sddw-dev) C:\aiOWN\diffuser_webui>python WanVace_t2v.py
W0529 22:26:44.655000 17380 site-packages\torch\distributed\elastic\multiprocessing\redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
Traceback (most recent call last):
  File "C:\aiOWN\diffuser_webui\WanVace_t2v.py", line 11, in <module>
    transformer_gguf = WanVACETransformer3DModel.from_single_file(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nitin\miniconda3\envs\sddw-dev\Lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\nitin\miniconda3\envs\sddw-dev\Lib\site-packages\diffusers\loaders\single_file_model.py", line 235, in from_single_file
    raise ValueError(
ValueError: FromOriginalModelMixin is currently only compatible with StableCascadeUNet, UNet2DConditionModel, AutoencoderKL, ControlNetModel, SD3Transformer2DModel, MotionAdapter, SparseControlNetModel, FluxTransformer2DModel, LTXVideoTransformer3DModel, AutoencoderKLLTXVideo, AutoencoderDC, MochiTransformer3DModel, HunyuanVideoTransformer3DModel, AuraFlowTransformer2DModel, Lumina2Transformer2DModel, SanaTransformer2DModel, WanTransformer3DModel, AutoencoderKLWan, HiDreamImageTransformer2DModel
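
The error above looks like an allowlist check on the model class name: WanTransformer3DModel is in the list, WanVACETransformer3DModel is not. As a rough illustration only (a self-contained sketch, not the diffusers source; the registry and function names below are hypothetical), the behavior is roughly this, which suggests that support would need a new registry entry, presumably along with whatever checkpoint key conversion the VACE model requires:

```python
# Hypothetical, simplified imitation of a class-name allowlist check.
# The registry contents and helper name are illustrative, not the diffusers source.

def check_single_file_support(class_name: str, supported: set) -> None:
    """Raise ValueError if class_name is not in the supported set."""
    if class_name not in supported:
        raise ValueError(
            "FromOriginalModelMixin is currently only compatible with "
            + ", ".join(sorted(supported))
        )


if __name__ == "__main__":
    # Two entries copied from the error message, standing in for the full list.
    supported = {"WanTransformer3DModel", "AutoencoderKLWan"}

    try:
        check_single_file_support("WanVACETransformer3DModel", supported)
    except ValueError as err:
        print(f"rejected: {err}")

    # Registering the class name is what makes the same call pass.
    supported.add("WanVACETransformer3DModel")
    check_single_file_support("WanVACETransformer3DModel", supported)
    print("accepted after registration")
```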
