
cpu_offload VRAM consumption larger than 4 GB #1934

Closed
@Sanster

Description


Describe the bug

I am using the code from https://huggingface.co/docs/diffusers/optimization/fp16#offloading-to-cpu-with-accelerate-for-memory-savings to test cpu_offload, but the VRAM consumption is still larger than 4 GB:

| GPU       | cpu_offload enabled | VRAM cost |
|-----------|---------------------|-----------|
| 1080      | Yes                 | 4539 MB   |
| 1080      | No                  | 5101 MB   |
| TITAN RTX | Yes                 | 5134 MB   |
| TITAN RTX | No                  | 5668 MB   |
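For reference, a minimal sketch of how peak usage can be checked from inside the script (the table above was presumably read from nvidia-smi; `torch.cuda.max_memory_allocated` reports only PyTorch's own allocations, so it will come out somewhat lower than nvidia-smi, which also counts the CUDA context):

```python
import torch

# Hypothetical measurement helper, not part of the original report:
# prints PyTorch's peak allocation on the current device in MB.
def report_peak_vram(tag: str) -> None:
    peak_mb = torch.cuda.max_memory_allocated() / 1024**2
    print(f"{tag}: peak allocated = {peak_mb:.0f} MB")
```

Calling `torch.cuda.reset_peak_memory_stats()` before each pipeline run makes the offload vs. no-offload comparison cleaner.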

Reproduction

I am using the code from https://huggingface.co/docs/diffusers/optimization/fp16#offloading-to-cpu-with-accelerate-for-memory-savings

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = "a photo of an astronaut riding a horse on mars"
pipe.enable_sequential_cpu_offload()
image = pipe(prompt).images[0]
```
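For comparison, a minimal variant that lets the offload hooks manage device placement themselves (assuming, as the diffusers documentation was later updated to recommend, that the pipeline should not be moved to CUDA before enabling sequential offload, since `pipe.to("cuda")` first loads every weight onto the GPU):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
# Note: no pipe.to("cuda") here. enable_sequential_cpu_offload() keeps the
# weights on the CPU and moves each submodule to the GPU only while it runs.
pipe.enable_sequential_cpu_offload()

prompt = "a photo of an astronaut riding a horse on mars"
image = pipe(prompt).images[0]
```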

Logs

No response

System Info

Tested on a 1080 and a TITAN RTX.

  • diffusers version: 0.11.1
  • accelerate version: 0.15.0
  • Platform: Linux-4.15.0-142-generic-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • PyTorch version (GPU?): 1.10.1+cu111 (True)
  • Huggingface_hub version: 0.11.1
  • Transformers version: 4.25.1
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No


Labels

bug (Something isn't working), stale (Issues that haven't received updates)
