
.to() cast in .load_lora_weights breaks on bitsandbytes-quantized text_encoders for SD/SDXL #11570

Closed
@Teriks


Describe the bug

This .to() cast on the text encoder:

text_encoder.to(device=text_encoder.device, dtype=text_encoder.dtype)

is invalid when working with an SD1.5 / SDXL pipeline whose text encoder was loaded with a bitsandbytes quantization config.

Perhaps something like this would fix it:

if is_bitsandbytes_available():
    quant, is_4bit, _ = _check_bnb_status(text_encoder)
else:
    quant, is_4bit = False, False

if not quant:
    text_encoder.to(device=text_encoder.device, dtype=text_encoder.dtype)
elif is_4bit:
    text_encoder.to(device=text_encoder.device)

This problem does not seem to affect the Flux / SD3 pipelines, so I am not sure whether other pipelines are affected.

Reproduction

import torch
import transformers

import diffusers
import diffusers.quantizers.quantization_config as _qc

# Load the SDXL text encoder with an 8-bit bitsandbytes quantization config.
text_encoder = transformers.CLIPTextModel.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0', subfolder='text_encoder', variant='fp16',
    torch_dtype=torch.float16, quantization_config=_qc.BitsAndBytesConfig(load_in_8bit=True))

pipeline = diffusers.StableDiffusionXLPipeline.from_pretrained(
    'stabilityai/stable-diffusion-xl-base-1.0',
    variant='fp16',
    torch_dtype=torch.float16,
    text_encoder=text_encoder
)

# Raises ValueError: the internal .to() cast hits the quantized text encoder.
pipeline.load_lora_weights('Norod78/sdxl-emoji-lora')

pipeline.to('cuda')

pipeline(prompt='test')

Logs

REDACT\diffusers\venv\Scripts\python.exe REDACT\diffusers\test.py 
WARNING:torchao.kernel.intmm:Warning: Detected no triton, on systems without Triton certain kernels will not work
`low_cpu_mem_usage` was None, now default to True since model is quantized.
Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00, 15.86it/s]
Traceback (most recent call last):
  File "REDACT\diffusers\test.py", line 18, in <module>
    pipeline.load_lora_weights('Norod78/sdxl-emoji-lora')
  File "REDACT\diffusers\src\diffusers\loaders\lora_pipeline.py", line 657, in load_lora_weights
    self.load_lora_into_text_encoder(
  File "REDACT\diffusers\src\diffusers\loaders\lora_pipeline.py", line 894, in load_lora_into_text_encoder
    _load_lora_into_text_encoder(
  File "REDACT\diffusers\src\diffusers\loaders\lora_base.py", line 430, in _load_lora_into_text_encoder
    text_encoder.to(device=text_encoder.device, dtype=text_encoder.dtype)
  File "REDACT\diffusers\venv\Lib\site-packages\transformers\modeling_utils.py", line 3089, in to
    raise ValueError(
ValueError: You cannot cast a bitsandbytes model in a new `dtype`. Make sure to load the model using `from_pretrained` using the desired `dtype` by passing the correct `torch_dtype` argument.

Process finished with exit code 1

System Info

diffusers == 0.34.0.dev0

Who can help?

@yiyixuxu @sayakpaul @DN6 @asomoza
