With a custom attention processor on the Flux.dev transformer, inference time differs between the following two approaches:
- Manually load the transformer and inject it into a Flux.dev pipeline
- Let the pipeline constructor load the transformer internally
Inference with the first approach is about 15% slower than with the second.
What is the reason?
I built diffusers from source.
Any insights are appreciated!
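For reference, a minimal sketch of the two loading paths being compared. The custom attention processor is assumed to be a user-defined class (represented here by the `attn_processor` argument); `FluxTransformer2DModel.from_pretrained`, `FluxPipeline.from_pretrained`, and `set_attn_processor` are existing diffusers APIs, but the exact dtype and device handling is illustrative:

```python
MODEL_ID = "black-forest-labs/FLUX.1-dev"


def load_with_manual_injection(attn_processor):
    """Way 1: load the transformer separately, then pass it to the pipeline."""
    import torch
    from diffusers import FluxPipeline, FluxTransformer2DModel

    transformer = FluxTransformer2DModel.from_pretrained(
        MODEL_ID, subfolder="transformer", torch_dtype=torch.bfloat16
    )
    # Install the custom attention processor on every attention layer.
    transformer.set_attn_processor(attn_processor)
    # The pipeline reuses the injected transformer instead of loading its own.
    pipe = FluxPipeline.from_pretrained(
        MODEL_ID, transformer=transformer, torch_dtype=torch.bfloat16
    )
    return pipe.to("cuda")


def load_with_internal_constructor(attn_processor):
    """Way 2: let the pipeline constructor load the transformer internally."""
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    # Swap in the custom attention processor after loading.
    pipe.transformer.set_attn_processor(attn_processor)
    return pipe.to("cuda")
```

If the two paths load weights in different dtypes or memory layouts (for example, a dtype cast applied in one path but not the other), that alone could account for a steady-state speed difference, so it may be worth comparing `pipe.transformer.dtype` and the processor classes between the two.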