[WIP] Wan2.2 #12004
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@@ -656,6 +912,45 @@ def forward(self, x, feat_cache=None, feat_idx=[0]):
        return x


def patchify(x, patch_size):
    # YiYi TODO: refactor this
    from einops import rearrange
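For orientation, a minimal sketch of what a rearrange-based patchify along these lines could look like, assuming 5D video latents of shape (batch, channels, frames, height, width); the exact pattern used in the PR may differ:

```python
import torch
from einops import rearrange

def patchify(x: torch.Tensor, patch_size: int) -> torch.Tensor:
    # fold each spatial patch_size x patch_size patch into the channel dim
    return rearrange(
        x, "b c f (h q) (w r) -> b (c q r) f h w", q=patch_size, r=patch_size
    )

x = torch.randn(1, 16, 4, 32, 32)  # (batch, channels, frames, height, width)
print(patchify(x, 2).shape)  # torch.Size([1, 64, 4, 16, 16])
```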
Hi, I think it might work for newer versions of torch: https://github.com/arogozhnikov/einops/wiki/Using-torch.compile-with-einops
thanks for the insight!
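A minimal sanity check of that claim (per the linked wiki, recent einops versions register themselves with torch.compile out of the box, so no extra setup should be needed):

```python
import torch
from einops import rearrange

@torch.compile
def fold_patches(x: torch.Tensor) -> torch.Tensor:
    # same kind of channel-folding rearrange as in the snippet above
    return rearrange(x, "b c (h q) (w r) -> b (c q r) h w", q=2, r=2)

print(fold_patches(torch.randn(1, 3, 8, 8)).shape)  # torch.Size([1, 12, 4, 4])
```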
@yiyixuxu thanks for releasing this so quickly! We are having some issues trying to get 5B I2V to work. As far as I understand, the 5B model is for both T2V and I2V. I tried a naive hack of copying the model.index.json from the 14B I2V, but it didn't quite help.
@okaris 5B I2V is not supported yet - will look to add it today
@yiyixuxu thanks for the quick reply. Happy to contribute if you can point me in the right direction.
Co-authored-by: bagheera <59658056+bghira@users.noreply.github.com>
Thanks YiYi! Just nits. Will add docs in a follow-up as discussed. I think we should remove the changes to the test files here (the Wan2.2 dual transformer should be tested separately instead of combined with the Wan2.1 tests, so that both are fully tested).
@@ -34,6 +34,103 @@
CACHE_T = 2


class AvgDown3D(nn.Module):
Maybe prefix these classes with Wan to follow the same naming convention.
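A hypothetical sketch of what the suggested renaming could look like for a 3D average-pool downsampling block; the real AvgDown3D in this PR almost certainly differs internally, so this is only to illustrate the convention:

```python
import torch
import torch.nn as nn

class WanAvgDown3D(nn.Module):
    """Average-pool downsampling over (frames, height, width)."""

    def __init__(self, factor_t: int = 2, factor_s: int = 2):
        super().__init__()
        # default stride equals kernel size, so each dim is divided by its factor
        self.pool = nn.AvgPool3d(kernel_size=(factor_t, factor_s, factor_s))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width)
        return self.pool(x)

x = torch.randn(1, 4, 8, 32, 32)
print(WanAvgDown3D()(x).shape)  # torch.Size([1, 4, 4, 16, 16])
```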
@@ -713,21 +1038,47 @@ def __init__(
            2.8251,
            1.9160,
        ],
        is_residual: bool = False,
LGTM for now, but ideally we should just make a separate AutoencoderKLWan2_2, because the structure and internal blocks are different, and try to standardize on single-file implementations per model type, similar to transformers. All the if-branching makes things a little harder to reverse engineer and raises the barrier to entry for someone wanting to look at the implementations for study purposes, IMO.
shift_msa, scale_msa, gate_msa, c_shift_msa, c_scale_msa, c_gate_msa = (
    self.scale_shift_table + temb.float()
).chunk(6, dim=1)
if temb.ndim == 4:
Same comment as for the VAE: ideally this should live in a separate transformer implementation, transformer_wan_2_2.py, if we want to adopt the single-file policy properly.
sounds good
I think the VAE can have its own class - feel free to refactor it if you prefer!
The transformer change is really minimal, and we could refactor further so there is only a single code path, i.e. we just need to always expand the timestep input to be 2D. (I did not have time to test that out, so I kept the if/else here.)
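A minimal illustration of the "always expand timesteps to 2D" idea; the variable names are illustrative, not the PR's:

```python
import torch

# Wan2.1-style: one timestep per batch item
timestep = torch.tensor([500])       # shape (batch,)
timestep_2d = timestep[:, None]      # shape (batch, 1)

# Wan2.2-style conditioning is per frame; expanding the 2.1 input to the
# same 2D layout would let both models share a single code path
num_frames = 8
timestep_2d = timestep_2d.expand(-1, num_frames)
print(timestep_2d.shape)  # torch.Size([1, 8])
```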
Hello, @yiyixuxu, I generated a video (https://github.com/user-attachments/assets/ce6ebaf1-8478-4c29-9170-57d5ae854a7d) using the code below and noticed a slight grainy texture. Is this expected behavior, and does it match the results you observed during your testing?

```python
import torch

dtype = torch.bfloat16
model_id = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"
height = 704
prompt = "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage."
output = pipe(
```
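The snippet above is cut off in the page; for reference, a self-contained sketch along the same lines, built on the existing WanPipeline API in diffusers. The height and prompt come from the comment, while width, num_frames, guidance_scale, and fps are assumed values that may differ from what the commenter actually ran:

```python
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

dtype = torch.bfloat16
model_id = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"

pipe = WanPipeline.from_pretrained(model_id, torch_dtype=dtype)
pipe.to("cuda")

prompt = (
    "Two anthropomorphic cats in comfy boxing gear and bright gloves "
    "fight intensely on a spotlighted stage."
)
output = pipe(
    prompt=prompt,
    height=704,        # from the comment
    width=1280,        # assumed
    num_frames=121,    # assumed
    guidance_scale=5.0,  # assumed
).frames[0]
export_to_video(output, "output.mp4", fps=24)  # fps assumed
```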
Hi, it does not seem to work. Although the checklist mentions multi-GPU support, I'm not sure if that applies to the diffusers version?
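For what it's worth, diffusers pipelines already support balanced device placement across GPUs; whether it works with the Wan2.2 checkpoints in this PR is not verified here:

```python
import torch
from diffusers import WanPipeline

# device_map="balanced" spreads pipeline components across available GPUs;
# Wan2.2 coverage in this PR is an open question, so treat this as a sketch.
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-TI2V-5B-Diffusers",
    torch_dtype=torch.bfloat16,
    device_map="balanced",
)
```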
Install from this PR:
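One common way to do that (the PR number comes from this thread; the command itself is the standard pip-over-git pattern, not quoted from the PR):

```
pip install git+https://github.com/huggingface/diffusers.git@refs/pull/12004/head
```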
- TI2V (only text-to-video is supported for now, adding I2V soon)
- 14B T2V
- 14B I2V