Describe the bug
I am trying to load a LoRA model trained with the Flux Fill pipeline into FluxControlInpaintPipeline, but the LoRA weights cannot be loaded into the transformer. Any advice is appreciated; what I want is a Flux Fill pipeline with control.
Reproduction
Download a sample Flux Fill LoRA model:
wget https://huggingface.co/WensongSong/Insert-Anything/resolve/main/20250321_steps5000_pytorch_lora_weights.safetensors
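To see which input width the LoRA's x_embedder was trained against, the downloaded file can be inspected directly. This is a minimal sketch; the key names are assumed to follow the usual transformer.<module>.lora_A/lora_B.weight convention and may differ depending on the trainer:
from safetensors.torch import load_file

state_dict = load_file("20250321_steps5000_pytorch_lora_weights.safetensors")
# List every x_embedder-related LoRA tensor and its shape.
for key, tensor in state_dict.items():
    if "x_embedder" in key:
        print(key, tuple(tensor.shape))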
Script:
import os
import torch
from diffusers import FluxControlInpaintPipeline
from diffusers.utils import load_image, make_image_grid
from image_gen_aux import DepthPreprocessor # https://github.com/huggingface/image_gen_aux
from PIL import Image
import numpy as np
pipe = FluxControlInpaintPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Fill-dev",
    torch_dtype=torch.bfloat16,
)
# ---------------------------------------------------------------
pipe.to("cuda")
pipe.load_lora_weights("black-forest-labs/FLUX.1-Depth-dev-lora")
pipe.load_lora_weights("20250321_steps5000_pytorch_lora_weights.safetensors")
prompt = "a blue robot singing opera with human-like expressions"
image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/robot.png")
redux_img = load_image("bottom_flatlay.jpg")
head_mask = np.zeros_like(image)
head_mask[65:580,300:642] = 255
mask_image = Image.fromarray(head_mask)
processor = DepthPreprocessor.from_pretrained("LiheYoung/depth-anything-large-hf")
control_image = processor(image)[0].convert("RGB")
output = pipe(
    prompt=prompt,
    image=image,
    control_image=control_image,
    mask_image=mask_image,
    num_inference_steps=30,
    strength=0.9,
    guidance_scale=10.0,
    generator=torch.Generator().manual_seed(42),
).images[0]
make_image_grid([image, control_image, mask_image, output.resize(image.size)], rows=1, cols=4).save("output.png")
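As an additional check, the transformer's patch-embedder width can be printed right before the failing load_lora_weights call and compared with the 384 reported in the error below. This assumes x_embedder is exposed as a Linear layer on FluxTransformer2DModel, which matches the shapes in the traceback:
# Input width of the transformer after loading the Depth control LoRA,
# to compare against the 384 columns of the Fill LoRA's x_embedder.lora_A weight.
print(pipe.transformer.config.in_channels)
print(pipe.transformer.x_embedder.in_features)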
Logs
Loading pipeline components...: 29%|█████████████████████████████████████████████████████████████████▋ | 2/7 [00:00<00:00, 17.92it/s]You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 58.60it/s]
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 89.39it/s]
Loading pipeline components...: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 9.36it/s]
No LoRA keys associated to CLIPTextModel found with the prefix='text_encoder'. This is safe to ignore if LoRA state dict didn't originally have any CLIPTextModel related params. You can also try specifying `prefix=None` to resolve the warning. Otherwise, open an issue if you think it's unexpected: https://github.com/huggingface/diffusers/issues/new
/workspace/flux-cluster/venv/lib/python3.10/site-packages/peft/tuners/tuners_utils.py:168: UserWarning: Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!
warnings.warn(
/workspace/flux-cluster/venv/lib/python3.10/site-packages/peft/tuners/tuners_utils.py:837: UserWarning: Adapter default_1 was active which is now deleted. Setting active adapter to default_0.
warnings.warn(
Loading default_1 was unsucessful with the following error:
Error(s) in loading state_dict for FluxTransformer2DModel:
size mismatch for x_embedder.lora_A.default_1.weight: copying a param with shape torch.Size([256, 384]) from checkpoint, the shape in current model is torch.Size([256, 128]).
Traceback (most recent call last):
File "/workspace/flux-cluster/main.py", line 22, in <module>
pipe.load_lora_weights("20250321_steps5000_pytorch_lora_weights.safetensors")
File "/usr/local/lib/python3.10/dist-packages/diffusers/loaders/lora_pipeline.py", line 1853, in load_lora_weights
self.load_lora_into_transformer(
File "/usr/local/lib/python3.10/dist-packages/diffusers/loaders/lora_pipeline.py", line 1944, in load_lora_into_transformer
transformer.load_lora_adapter(
File "/usr/local/lib/python3.10/dist-packages/diffusers/loaders/peft.py", line 352, in load_lora_adapter
incompatible_keys = set_peft_model_state_dict(self, state_dict, adapter_name, **peft_kwargs)
File "/workspace/flux-cluster/venv/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 443, in set_peft_model_state_dict
load_result = model.load_state_dict(peft_model_state_dict, strict=False, assign=True)
File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 2581, in load_state_dict
raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for FluxTransformer2DModel:
size mismatch for x_embedder.lora_A.default_1.weight: copying a param with shape torch.Size([256, 384]) from checkpoint, the shape in current model is torch.Size([256, 128]).
System Info
I am using the following packages:
torch
torchvision
diffusers
transformers
accelerate==0.33.0
sentencepiece==0.2.0
protobuf==5.27.3
numpy<2
deepspeed==0.14.4
einops==0.8.0
huggingface-hub
pandas
opencv-python==4.10.0.84
supervision
cog
git+https://github.com/huggingface/peft.git
pillow
requests
loguru
python-dotenv
controlnet-aux
xformers
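Exact versions can also be collected with diffusers-cli env; a quick Python equivalent for the core packages (not part of the repro):
import torch, diffusers, transformers, peft
# Print the exact versions installed in the environment used for the repro.
print(torch.__version__, diffusers.__version__, transformers.__version__, peft.__version__)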
Who can help?
No response