Describe the bug
@a-r-r-o-w @DN6 @asomoza @jlonge4
Sorry to tag you all, but after @yiyixuxu's merge of the Wan2.2 PR, one of your commits is causing the model to OOM. I was only able to narrow it down to this:
Wan2.2 PR (working): a6d9f6a1a9a9ede2c64972d83ccee192b801c4a0
Latest main (OOM): 56d438727036b0918b30bbe3110c5fe1634ed19d
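To see where memory climbs between those two commits, a per-step memory log is the easiest probe. Here is a minimal sketch using diffusers' generic `callback_on_step_end` hook (assuming the Wan I2V pipeline exposes it like the other pipelines do):

```python
import torch

def log_memory(pipe, step, timestep, callback_kwargs):
    # Report allocated and peak CUDA memory after each denoising step.
    alloc = torch.cuda.memory_allocated() / 1024**3
    peak = torch.cuda.max_memory_allocated() / 1024**3
    print(f"step {step}: allocated {alloc:.2f} GiB, peak {peak:.2f} GiB")
    return callback_kwargs

# usage: pipe(..., callback_on_step_end=log_memory)
```

Log from the failing commit: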

[t+1m49s004ms] [INFO] [Dispatcher] Broadcasting command: run
[t+1m49s005ms] [INFO] Starting app run
[t+1m49s005ms] Downloading URL: https://cloud.inference.sh/u/4mg21r6ta37mpaz6ktzwtt8krr/01jz02fhjefhmky16f1n5bnj9p.png to /tmp/tmphbupdgg3.png
[t+1m49s168ms] 0%| | 0.00/786k [00:00<?, ?iB/s]
[t+1m49s168ms] 100%|██████████| 786k/786k [00:00<00:00, 45.5MiB/s]
[t+1m49s171ms] Generating video with prompt: god particle
[t+1m49s171ms] Using resolution: 720p, max area: 921600
[t+1m49s235ms] Loaded image: (1024, 576)
[t+1m49s247ms] Resized image from (1024, 576) to (1280, 720) (target area: 921600)
[t+1m49s247ms] Starting video generation...
[t+3m05s640ms] 0%| | 0/40 [00:00<?, ?it/s]
[t+4m15s733ms] 2%|▎ | 1/40 [01:10<45:47, 70.44s/it]
[t+5m25s846ms] 5%|▌ | 2/40 [02:20<44:29, 70.24s/it]
[t+5m26s640ms] 8%|▊ | 3/40 [03:30<43:16, 70.18s/it]
[t+5m26s640ms] 8%|▊ | 3/40 [03:31<43:27, 70.48s/it]
[t+5m26s643ms] [ERROR] Traceback (most recent call last):
[t+5m26s643ms] File "/server/tasks.py", line 50, in run_task
[t+5m26s643ms] output = await result
[t+5m26s643ms] ^^^^^^^^^^^^
[t+5m26s643ms] File "/inferencesh/apps/gpu/5zg3dm3ph35ntzewf4txeszag9/src/inference.py", line 338, in run
[t+5m26s643ms] output = self.pipe(
[t+5m26s643ms] ^^^^^^^^^^
[t+5m26s643ms] File "/inferencesh/apps/gpu/5zg3dm3ph35ntzewf4txeszag9/venv/3.12/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[t+5m26s643ms] return func(*args, **kwargs)
[t+5m26s643ms] ^^^^^^^^^^^^^^^^^^^^^
[t+5m26s643ms] File "/inferencesh/apps/gpu/5zg3dm3ph35ntzewf4txeszag9/venv/3.12/lib/python3.12/site-packages/diffusers/pipelines/wan/pipeline_wan_i2v.py", line 727, in __call__
[t+5m26s643ms] noise_pred = current_model(
[t+5m26s643ms] ^^^^^^^^^^^^^^
[t+5m26s643ms] File "/inferencesh/apps/gpu/5zg3dm3ph35ntzewf4txeszag9/venv/3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[t+5m26s643ms] return self._call_impl(*args, **kwargs)
[t+5m26s643ms] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[t+5m26s643ms] File "/inferencesh/apps/gpu/5zg3dm3ph35ntzewf4txeszag9/venv/3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[t+5m26s643ms] return forward_call(*args, **kwargs)
[t+5m26s643ms] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[t+5m26s643ms] File "/inferencesh/apps/gpu/5zg3dm3ph35ntzewf4txeszag9/venv/3.12/lib/python3.12/site-packages/diffusers/models/transformers/transformer_wan.py", line 660, in forward
[t+5m26s643ms] hidden_states = block(hidden_states, encoder_hidden_states, timestep_proj, rotary_emb)
[t+5m26s643ms] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[t+5m26s643ms] File "/inferencesh/apps/gpu/5zg3dm3ph35ntzewf4txeszag9/venv/3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[t+5m26s643ms] return self._call_impl(*args, **kwargs)
[t+5m26s643ms] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[t+5m26s643ms] File "/inferencesh/apps/gpu/5zg3dm3ph35ntzewf4txeszag9/venv/3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[t+5m26s643ms] return forward_call(*args, **kwargs)
[t+5m26s643ms] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[t+5m26s643ms] File "/inferencesh/apps/gpu/5zg3dm3ph35ntzewf4txeszag9/venv/3.12/lib/python3.12/site-packages/diffusers/models/transformers/transformer_wan.py", line 485, in forward
[t+5m26s643ms] norm_hidden_states = (self.norm3(hidden_states.float()) * (1 + c_scale_msa) + c_shift_msa).type_as(
[t+5m26s643ms] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[t+5m26s643ms] File "/inferencesh/apps/gpu/5zg3dm3ph35ntzewf4txeszag9/venv/3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[t+5m26s643ms] return self._call_impl(*args, **kwargs)
[t+5m26s643ms] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[t+5m26s643ms] File "/inferencesh/apps/gpu/5zg3dm3ph35ntzewf4txeszag9/venv/3.12/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[t+5m26s643ms] return forward_call(*args, **kwargs)
[t+5m26s643ms] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[t+5m26s643ms] File "/inferencesh/apps/gpu/5zg3dm3ph35ntzewf4txeszag9/venv/3.12/lib/python3.12/site-packages/diffusers/models/normalization.py", line 88, in forward
[t+5m26s643ms] return F.layer_norm(
[t+5m26s643ms] ^^^^^^^^^^^^^
[t+5m26s643ms] File "/inferencesh/apps/gpu/5zg3dm3ph35ntzewf4txeszag9/venv/3.12/lib/python3.12/site-packages/torch/nn/functional.py", line 2910, in layer_norm
[t+5m26s643ms] return torch.layer_norm(
[t+5m26s643ms] ^^^^^^^^^^^^^^^^^
[t+5m26s643ms] torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.44 GiB. GPU 0 has a total capacity of 79.14 GiB of which 1.16 GiB is free. Process 981657 has 77.96 GiB memory in use. Of the allocated memory 72.20 GiB is allocated by PyTorch, and 5.23 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Reproduction
diffusers@56d438727036b0918b30bbe3110c5fe1634ed19d
Run Wan2.2 I2V A14B with the example code.
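For completeness, a minimal sketch along the lines of the standard diffusers Wan I2V example. The model id and generation parameters below are assumptions chosen to match the 720p, 40-step run in the log above, not the exact app code:

```python
# pip install git+https://github.com/huggingface/diffusers@56d438727036b0918b30bbe3110c5fe1634ed19d
import torch
from diffusers import WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# Assumed Hub id for the Wan2.2 I2V A14B checkpoint.
model_id = "Wan-AI/Wan2.2-I2V-A14B-Diffusers"
pipe = WanImageToVideoPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = load_image("input.png")  # placeholder 16:9 input image

# 1280x720 over 40 steps matches the failing run above.
frames = pipe(
    image=image,
    prompt="god particle",
    height=720,
    width=1280,
    num_inference_steps=40,
).frames[0]
export_to_video(frames, "output.mp4", fps=16)
```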
Logs
System Info
A100 / H100
Who can help?
No response