[tests] model-level device_map clarifications #11681

Open
wants to merge 18 commits into main

Conversation

sayakpaul
Member

What does this PR do?

Our model-level device_map documentation could be improved. We should also test the accepted values for device_map. This PR does both.

This PR relies on #11680 being merged first.

@Birch-san, is this better?

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@GrigoryEvko

GrigoryEvko commented Jun 9, 2025

Hi everyone, I tinkered a bit with the device_map code with Opus and replaced diffusers' internal device_map handling in favor of accelerate functions. I will now test on multi-GPU machines; please share your thoughts!
3635b41

The next step would be bringing diffusers' modeling_utils on par with transformers' modeling_utils to allow Torch TP, DeepSpeed, and FSDP/FSDP2 usage, etc.

@sayakpaul
Member Author

I tinkered a bit with the device_map code with Opus and replaced diffusers' internal device_map handling in favor of accelerate functions.

We already make use of accelerate functions to facilitate device mapping. So please open a PR, as that is easier for the team to navigate.

The next step would be bringing diffusers' modeling_utils on par with transformers' modeling_utils to allow Torch TP, DeepSpeed, and FSDP/FSDP2 usage, etc.

This is already in the works by @a-r-r-o-w. Again, it is much better to discuss it through PRs and issues.

This PR is scoped to exactly what its description says, so let's move related discussions to other threads.

@GrigoryEvko

We already make use of accelerate functions to facilitate device mapping. So please open a PR, as that is easier for the team to navigate.

Sure, I will clean up, test, and then submit the PR. I just noticed the team had started working on the issue and decided to comment here.

@a-r-r-o-w
Member

LGTM. I would try to make sure that the cases/failures encountered by Birch-san are resolved and work as expected.


@parameterized.expand([0, "cuda", torch.device("cuda"), torch.device("cuda:0")])
@require_torch_gpu
def test_passing_non_dict_device_map_works(self, device_map):
Member

Can we test some more cases like {"": torch.device("meta"), "decoder": torch.device("cuda")}? For example, if this were a VAE, the intention would be to not load the encoder weights but to load the decoder weights directly to the device (see the sketch below).

Additionally, we should probably run device map tests for all models IMO (can be taken up in future PR)
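
For illustration, a minimal sketch of what such a test could look like, assuming a VAE with encoder/decoder submodules; the checkpoint, test name, and assertions here are placeholders, not the actual test from this PR:

import torch
from diffusers import AutoencoderKL

def test_dict_device_map_skips_encoder():
    # "" sets the default placement; "decoder" overrides it for that submodule.
    device_map = {"": torch.device("meta"), "decoder": torch.device("cuda")}
    vae = AutoencoderKL.from_pretrained(
        "stabilityai/sd-vae-ft-mse",  # placeholder checkpoint
        device_map=device_map,
    )
    # Encoder weights were never materialized; decoder went straight to GPU.
    assert all(p.device.type == "meta" for p in vae.encoder.parameters())
    assert all(p.device.type == "cuda" for p in vae.decoder.parameters())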

Contributor

I agree this would be fantastic, but we can probably tackle that in a separate PR, and leave the scope of this one to tests/docs/bugfixes/assertions.

Member Author

Additionally, we should probably run device map tests for all models IMO (can be taken up in future PR)

We already have a number of device_map-related tests in https://github.com/huggingface/diffusers/blob/main/tests/models/test_modeling_common.py.

I can move the tests being added in this PR to test_modeling_common.py in a separate PR.

Can we test some more cases like {"": torch.device("meta"), "decoder": torch.device("cuda")}? For example, if this were a VAE, the intention would be to not load the encoder weights but to load the decoder weights directly to the device.

Feel free to add that in a separate PR.

@@ -1084,6 +1084,25 @@ def test_load_sharded_checkpoint_device_map_from_hub_local_subfolder(self):
assert loaded_model
assert new_output.sample.shape == (4, 4, 16, 16)

def test_wrong_device_map_raises_error(self):
Member

Testing more of the failure code paths (if there are any; I am not fully aware of the relevant parts of the codebase) could be nice.

Member Author

We check that the device_map is respected in https://github.com/huggingface/diffusers/blob/main/tests/models/test_modeling_common.py. We also test invalid device_map values for non-dict entries; I added another one in eb913e2.

Many errors are already handled in accelerate.

So, collectively, I think we should now be good.
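
For reference, a rough sketch of the kind of failure-path test being discussed; the model class, checkpoint, and error type are assumptions, not the exact test from eb913e2:

import unittest
from diffusers import UNet2DConditionModel

class DeviceMapFailureTests(unittest.TestCase):
    def test_wrong_device_map_raises_error(self):
        # An unrecognized device string should be rejected up front,
        # before any weights are loaded.
        with self.assertRaises(ValueError):
            UNet2DConditionModel.from_pretrained(
                "hf-internal-testing/tiny-stable-diffusion-torch",  # placeholder repo
                subfolder="unet",
                device_map="not-a-real-device",
            )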

@a-r-r-o-w
Member

@GrigoryEvko re: FSDP/FSDP2: this is already supported via accelerate, and the intended usage is to enable it via the accelerate config and modify some parts of the codebase based on the accelerate documentation. Could you elaborate on what it would mean to support this natively?

re: TP: in my experience, TP does not really help much with end-to-end speed at current diffusion model sizes. I will be prioritizing native CP support instead in the coming weeks; TP is something we could consider once CP is supported.

re: DeepSpeed: this is also something that should be enabled via accelerate in your training codebase.

@Birch-san
Contributor

Birch-san commented Jun 9, 2025

does device_map=device work (where the device variable is a torch.device)? the docs say it should work, but when I tried this, I got a "device cuda is invalid" error.

likewise, does device_map={'': device} work? the docs say it should work, but when I tried this, I got a "device cuda is invalid" error.

from diffusers.models.unets.unet_2d_condition import UNet2DConditionModel
import torch

device = torch.device('cuda')
unet: UNet2DConditionModel = UNet2DConditionModel.from_pretrained(
    'NovelAI/nai-anime-v1-full',
    torch_dtype=torch.float16,
    use_safetensors=True,
    subfolder='unet',
    device_map={'': device},
).eval()

I'm using diffusers==0.32.2 and accelerate==1.4.0 if it matters.

A map that specifies where each submodule should go. It doesn't need to be defined for each
parameter/buffer name; once a given module name is inside, every submodule of it will be sent to the
same device. Defaults to `None`, meaning that the model will be loaded on CPU.

Examples:
@Birch-san
Contributor
Jun 9, 2025

it looks like this is just documenting the scalar cases. the bit that I need docs for is the dictionary convention. {'': device.type} as the simplest valid input is extremely hard to guess. there really needs to be an explanation of what the key of the dictionary means.
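
For context, per the accelerate convention the dict keys are (dotted) submodule names, with "" acting as a catch-all for the model root; a sketch, with a hypothetical UNet-like layout:

import torch

# "" matches the model root: everything goes to this device unless overridden.
device_map = {"": torch.device("cuda")}

# More specific keys override less specific ones; each key names a submodule,
# and all of that submodule's children inherit its placement.
device_map = {
    "": "cpu",                # default for the whole model
    "down_blocks": "cuda:0",  # hypothetical submodule on GPU 0
    "up_blocks": "cuda:1",    # hypothetical submodule on GPU 1
}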

Member Author

Cc: @SunMarc @stevhliu

How is this documented in transformers?

Member

Also, can you fix the typing of device_map for DiffusionPipeline.from_pretrained? For that specific function, we only allow the "balanced" value.

Member Author

Done in 407b67f.

Member Author

@Birch-san I clarified the docs to include the {"": torch.device("cuda")} case and added tests for it, too. For other possible and valid dict inputs to device_map, I would refer you to https://huggingface.co/docs/accelerate/en/concept_guides/big_model_inference#the-devicemap; as you can see, it's hard to specify these exhaustively beforehand without a bit of investigation.

So, I would suggest loading your model with device_map="auto" first and then printing model.hf_device_map to get a much better handle on it. This gives you a reasonable starting point that you can then tweak.
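
A short sketch of that workflow, reusing the checkpoint from the snippet above:

import torch
from diffusers.models.unets.unet_2d_condition import UNet2DConditionModel

# Let accelerate compute a placement first...
unet = UNet2DConditionModel.from_pretrained(
    'NovelAI/nai-anime-v1-full',
    torch_dtype=torch.float16,
    use_safetensors=True,
    subfolder='unet',
    device_map='auto',
)
# ...then inspect it; the printed dict is a valid device_map you can
# hand-tune and pass back in on the next load.
print(unet.hf_device_map)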

@sayakpaul
Member Author

@Birch-san I ran the following with this PR branch and it worked:

from diffusers.models.unets.unet_2d_condition import UNet2DConditionModel
import torch

for device_map in [torch.device('cuda'), {'': torch.device('cuda')}]:
    unet: UNet2DConditionModel = UNet2DConditionModel.from_pretrained(
        'NovelAI/nai-anime-v1-full',
        torch_dtype=torch.float16,
        use_safetensors=True,
        subfolder='unet',
        device_map=device_map,
    ).eval()

The test_passing_non_dict_device_map_works() tests should ensure that non-dict device_map values, including the torch.device type, work.

Will add dict types in the tests as well.

sayakpaul requested a review from a-r-r-o-w on June 9, 2025 at 14:44.