
.to() cast in .load_lora_weights breaks on bitsandbytes-quantized text_encoders for SD/SDXL #11570

Closed
@Teriks


Describe the bug

This .to() cast on the text encoder:

text_encoder.to(device=text_encoder.device, dtype=text_encoder.dtype)

is invalid when working with an SD1.5 / SDXL pipeline whose text encoder was loaded with a bitsandbytes quantization config.

Perhaps something like this would fix it:

if is_bitsandbytes_available():
    quant, is_4bit, _ = _check_bnb_status(text_encoder)
else:
    quant, is_4bit = False, False

if not quant:
    text_encoder.to(device=text_encoder.device, dtype=text_encoder.dtype)
elif is_4bit:
    text_encoder.to(device=text_encoder.device)

This problem does not seem to affect the Flux / SD3 pipelines, so I am not sure whether other pipelines are affected.

Reproduction

import torch
import transformers

import diffusers
import diffusers.quantizers.quantization_config as _qc

# Load the SDXL text encoder with an 8-bit bitsandbytes quantization config.
text_encoder = transformers.CLIPTextModel.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0', subfolder='text_encoder', variant='fp16',
    torch_dtype=torch.float16, quantization_config=_qc.BitsAndBytesConfig(load_in_8bit=True))

pipeline = diffusers.StableDiffusionXLPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    variant='fp16',
    torch_dtype=torch.float16,
    text_encoder=text_encoder
)

# Raises ValueError: the internal .to() cast hits the quantized text encoder.
pipeline.load_lora_weights('Norod78/sdxl-emoji-lora')

pipeline.to('cuda')

pipeline(prompt='test')

Logs

REDACT\diffusers\venv\Scripts\python.exe REDACT\diffusers\test.py 
WARNING:torchao.kernel.intmm:Warning: Detected no triton, on systems without Triton certain kernels will not work
`low_cpu_mem_usage` was None, now default to True since model is quantized.
Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00, 15.86it/s]
Traceback (most recent call last):
  File "REDACT\diffusers\test.py", line 18, in <module>
    pipeline.load_lora_weights('Norod78/sdxl-emoji-lora')
  File "REDACT\diffusers\src\diffusers\loaders\lora_pipeline.py", line 657, in load_lora_weights
    self.load_lora_into_text_encoder(
  File "REDACT\diffusers\src\diffusers\loaders\lora_pipeline.py", line 894, in load_lora_into_text_encoder
    _load_lora_into_text_encoder(
  File "REDACT\diffusers\src\diffusers\loaders\lora_base.py", line 430, in _load_lora_into_text_encoder
    text_encoder.to(device=text_encoder.device, dtype=text_encoder.dtype)
  File "REDACT\diffusers\venv\Lib\site-packages\transformers\modeling_utils.py", line 3089, in to
    raise ValueError(
ValueError: You cannot cast a bitsandbytes model in a new `dtype`. Make sure to load the model using `from_pretrained` using the desired `dtype` by passing the correct `torch_dtype` argument.

Process finished with exit code 1

System Info

diffusers == 0.34.0.dev0

Who can help?

@yiyixuxu @sayakpaul @DN6 @asomoza
