[wip][core] parallel loading of shards #12028

sayakpaul · 2025-07-31T07:53:18Z

What does this PR do?

Adds optional parallel loading of checkpoint shards, similar to huggingface/transformers#36835.

`main`: time: 8.162s
this branch: time: 5.663s

Benchmark code:

import time
t_ini = time.time()

import torch
import os
from diffusers import DiffusionPipeline, AutoModel
print(f"import time: {time.time() - t_ini:.3f}s")

os.environ["HF_ENABLE_PARALLEL_LOADING"] = "YES"
os.environ["HF_PARALLEL_LOADING_WORKERS"] = "12"
model_id = "Wan-AI/Wan2.2-I2V-A14B-Diffusers"

# Initialize the CUDA context up front so its cost is not counted in the load time.
t0 = time.time()
torch.cuda.synchronize()
print(f"CUDA sync time: {time.time() - t0:.3f}s")

print("starting model load")
t1 = time.time()
transformer = AutoModel.from_pretrained(
    model_id, subfolder="transformer", torch_dtype=torch.bfloat16, device_map="cuda"
)
torch.cuda.synchronize()
t2 = time.time()

diff = t2 - t1
print(f"time: {diff:.3f}s")
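
The script above opts in through two string-valued environment variables. As a rough illustration of how a loader could interpret them, here is a minimal sketch. The helper name `parallel_loading_config`, the set of accepted truthy spellings, and the default worker count are all assumptions made for this example and are not taken from this PR.

```python
import os

# Assumption: spellings treated as "enabled"; the real accepted set may differ.
TRUE_VALUES = {"1", "ON", "YES", "TRUE"}


def parallel_loading_config(env=os.environ, default_workers=8):
    """Hypothetical helper: return (enabled, num_workers) from env-style settings.

    `default_workers=8` is an illustrative default, not a value from the PR.
    """
    enabled = env.get("HF_ENABLE_PARALLEL_LOADING", "").upper() in TRUE_VALUES
    workers = int(env.get("HF_PARALLEL_LOADING_WORKERS", default_workers))
    return enabled, workers


# Same values as the benchmark script sets.
settings = {"HF_ENABLE_PARALLEL_LOADING": "YES", "HF_PARALLEL_LOADING_WORKERS": "12"}
print(parallel_loading_config(settings))  # → (True, 12)
```

Passing a plain dict instead of `os.environ` keeps the sketch easy to try without touching the process environment.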

sayakpaul · 2025-07-31T07:55:49Z

src/diffusers/models/model_loading_utils.py

@@ -310,6 +311,130 @@ def load_model_dict_into_meta(
    return offload_index, state_dict_index


+def check_support_param_buffer_assignment(model_to_load, state_dict, start_prefix=""):


Moved it here from modeling_utils.py.

sayakpaul · 2025-07-31T07:56:12Z

src/diffusers/models/model_loading_utils.py

+    return offload_index, state_dict_index, mismatched_keys, error_msgs
+
+
+def _find_mismatched_keys(


Same. Moved it out of modeling_utils.py.

sayakpaul · 2025-07-31T07:56:47Z

src/diffusers/models/modeling_utils.py

-        if len(resolved_model_file) > 1:
-            resolved_model_file = logging.tqdm(resolved_model_file, desc="Loading checkpoint shards")
-
-        mismatched_keys = []
-        assign_to_params_buffers = None
-        error_msgs = []
-
-        for shard_file in resolved_model_file:
-            state_dict = load_state_dict(shard_file, dduf_entries=dduf_entries)
-            mismatched_keys += _find_mismatched_keys(
-                state_dict, model_state_dict, loaded_keys, ignore_mismatched_sizes


This has been moved to load_shard_file().
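
For readers following the refactor, the general idea of pulling the per-shard body out of the loop so it can be dispatched to workers might look roughly like the sketch below. It is not the code in this PR: `read_shard`, `load_one_shard`, `load_shards_parallel`, and the in-memory `fake_storage` are hypothetical stand-ins, and shapes (tuples) stand in for tensors so the example runs without torch.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def read_shard(shard_file, fake_storage):
    # Hypothetical stand-in for reading one shard file from disk.
    return fake_storage[shard_file]


def load_one_shard(shard_file, fake_storage, expected_shapes):
    """Per-shard unit of work: read the shard and collect shape mismatches."""
    state_dict = read_shard(shard_file, fake_storage)
    mismatched = [
        (key, shape, expected_shapes[key])
        for key, shape in state_dict.items()
        if key in expected_shapes and shape != expected_shapes[key]
    ]
    return list(state_dict), mismatched


def load_shards_parallel(shard_files, fake_storage, expected_shapes, num_workers=4):
    """Run the per-shard function on a thread pool and merge the results."""
    work = partial(load_one_shard, fake_storage=fake_storage, expected_shapes=expected_shapes)
    loaded_keys, mismatched_keys = [], []
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Executor.map yields results in input order, so aggregation is deterministic.
        for keys, mismatched in pool.map(work, shard_files):
            loaded_keys += keys
            mismatched_keys += mismatched
    return loaded_keys, mismatched_keys


fake_storage = {
    "shard-1": {"a.weight": (4, 4), "a.bias": (4,)},
    "shard-2": {"b.weight": (8, 4)},
}
expected_shapes = {"a.weight": (4, 4), "a.bias": (4,), "b.weight": (4, 4)}
loaded, mismatched = load_shards_parallel(["shard-1", "shard-2"], fake_storage, expected_shapes)
print(loaded)      # → ['a.weight', 'a.bias', 'b.weight']
print(mismatched)  # → [('b.weight', (8, 4), (4, 4))]
```

Threads are used here on the assumption that shard reading is dominated by I/O; whether that holds for a given checkpoint format and device is something to measure rather than take from this sketch.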

HuggingFaceDocBuilderDev · 2025-07-31T08:27:55Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

sayakpaul added 5 commits July 10, 2025 11:06

checking.

af72ece

checking

d4e2976

checking

c9b680d

up

ab84d5a

up

536df5a

sayakpaul added 2 commits July 31, 2025 13:36

up

04cd5cc

up

cb0b3ed

sayakpaul mentioned this pull request Jul 31, 2025

Enhance Model Loading By Providing Parallelism, Uses Optional Env Flag huggingface/transformers#36835

Merged

3 tasks

sayakpaul requested a review from a-r-r-o-w July 31, 2025 08:48

Merge branch 'main' into parallel-shards-loading

2fdc091
