RuntimeError: Given groups=1, weight of size [320, 4, 3, 3], expected input[1, 3, 512, 512] to have 4 channels, but got 3 channels instead

@sayakpaul

Describe the bug

I'm trying to follow the DreamBooth training example, and training fails on the first step with the error in the title.

This is the full console output:

accelerate launch train_dreambooth.py --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" --instance_data_dir=".\dog" --output_dir=".\output" --instance_prompt="a photo of sks dog" --resolution=512 --train_batch_size=1 --gradient_accumulation_steps=1 --learning_rate=5e-6 --lr_scheduler="constant" --lr_warmup_steps=0 --max_train_steps=400
11/25/2023 22:59:47 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: no

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'clip_sample_range', 'timestep_spacing', 'prediction_type', 'variance_type', 'thresholding', 'sample_max_value', 'dynamic_thresholding_ratio'} was not found in config. Values will be initialized to default values.
{'mid_block_only_cross_attention', 'conv_in_kernel', 'dual_cross_attention', 'resnet_time_scale_shift', 'time_embedding_type', 'transformer_layers_per_block', 'class_embed_type', 'use_linear_projection', 'cross_attention_norm', 'num_attention_heads', 'conv_out_kernel', 'encoder_hid_dim', 'projection_class_embeddings_input_dim', 'time_embedding_dim', 'addition_embed_type_num_heads', 'mid_block_type', 'class_embeddings_concat', 'upcast_attention', 'addition_time_embed_dim', 'only_cross_attention', 'encoder_hid_dim_type', 'resnet_out_scale_factor', 'dropout', 'time_embedding_act_fn', 'timestep_post_act', 'reverse_transformer_layers_per_block', 'resnet_skip_time_act', 'time_cond_proj_dim', 'attention_type', 'addition_embed_type', 'num_class_embeds'} was not found in config. Values will be initialized to default values.
11/25/2023 22:59:51 - INFO - __main__ - ***** Running training *****
11/25/2023 22:59:51 - INFO - __main__ - Num examples = 5
11/25/2023 22:59:51 - INFO - __main__ - Num batches each epoch = 5
11/25/2023 22:59:51 - INFO - __main__ - Num Epochs = 80
11/25/2023 22:59:51 - INFO - __main__ - Instantaneous batch size per device = 1
11/25/2023 22:59:51 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 1
11/25/2023 22:59:51 - INFO - __main__ - Gradient Accumulation steps = 1
11/25/2023 22:59:51 - INFO - __main__ - Total optimization steps = 400
Steps: 0%| | 0/400 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "G:\dreambooth\diffusers\examples\dreambooth\train_dreambooth.py", line 1422, in <module>
    main(args)
  File "G:\dreambooth\diffusers\examples\dreambooth\train_dreambooth.py", line 1253, in main
    model_pred = unet(
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "G:\dreambooth\diffusers\src\diffusers\models\unet_2d_condition.py", line 1035, in forward
    sample = self.conv_in(sample)
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [320, 4, 3, 3], expected input[1, 3, 512, 512] to have 4 channels, but got 3 channels instead
Steps: 0%| | 0/400 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 994, in launch_command
    simple_launcher(args)
  File "C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\lib\site-packages\accelerate\commands\launch.py", line 636, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\zPTVq\AppData\Local\Programs\Python\Python310\python.exe', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--instance_data_dir=.\dog', '--output_dir=.\output', '--instance_prompt=a photo of sks dog', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=400']' returned non-zero exit status 1.
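
For anyone reading the shapes in the error: the sketch below is my reading of the message, not a confirmed diagnosis. The weight `[320, 4, 3, 3]` belongs to the UNet's `conv_in`, which takes 4 input channels, while the input `[1, 3, 512, 512]` looks like a raw RGB image at the training resolution rather than a VAE latent. The helper names `conv2d_channel_mismatch` and `vae_latent_shape` are hypothetical and only mimic the shape arithmetic in plain Python; the 4 latent channels and 8x spatial downscale are the values used by the Stable Diffusion v1 VAE.

```python
def conv2d_channel_mismatch(weight_shape, input_shape, groups=1):
    """Mimic the channel check conv2d performs (hypothetical helper).

    weight_shape is (out_channels, in_channels // groups, kH, kW).
    input_shape is (batch, channels, height, width).
    Returns None when the channels agree, else (expected, got).
    """
    expected = weight_shape[1] * groups
    got = input_shape[1]
    return None if got == expected else (expected, got)


def vae_latent_shape(image_shape, latent_channels=4, downscale=8):
    """Shape of the latent an SD v1 VAE would produce for an image batch.

    Assumes 4 latent channels and an 8x spatial downscale.
    """
    batch, _, height, width = image_shape
    return (batch, latent_channels, height // downscale, width // downscale)


conv_in_weight = (320, 4, 3, 3)   # from the error message
pixel_values = (1, 3, 512, 512)   # from the error message

# Raw pixels hit the same mismatch the traceback reports: expected 4, got 3.
print(conv2d_channel_mismatch(conv_in_weight, pixel_values))

# A latent of the same image would have passed the check.
latents = vae_latent_shape(pixel_values)
print(latents)
print(conv2d_channel_mismatch(conv_in_weight, latents))
```

If this reading is right, the images reached the UNet without being encoded by the VAE first, but I have not verified where in the script that happens.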

Reproduction

accelerate launch train_dreambooth.py --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" --instance_data_dir=".\dog" --output_dir=".\output" --instance_prompt="a photo of sks dog" --resolution=512 --train_batch_size=1 --gradient_accumulation_steps=1 --learning_rate=5e-6 --lr_scheduler="constant" --lr_warmup_steps=0 --max_train_steps=400

Logs

No response

System Info

diffusers version: 0.24.0.dev0
Platform: Windows-10-10.0.22631-SP0
Python version: 3.10.6
PyTorch version (GPU?): 2.1.1+cu121 (True)
Huggingface_hub version: 0.19.4
Transformers version: 4.35.2
Accelerate version: 0.24.1
xFormers version: not installed
Using GPU in script?: Yes
Using distributed or parallel set-up in script?: No

GPU: 3090TI 24GB VRAM

Who can help?

@sayakpaul @patrickvonplaten

RuntimeError: Given groups=1, weight of size [320, 4, 3, 3], expected input[1, 3, 512, 512] to have 4 channels, but got 3 channels instead #5932