With a custom attention processor on the Flux.dev transformer, inference time differs between the following two approaches:
- Manually load the transformer and inject it into a Flux.dev pipeline
- Let the pipeline constructor load the transformer internally
Inference with the first approach is about 15% slower than with the second.
What is the reason?
I built diffusers from source.
Any insights are appreciated!
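For reference, a minimal sketch of the two loading paths being compared. The custom attention processor is assumed to be a user-defined class (represented here by the `attn_processor` argument); `FluxTransformer2DModel.from_pretrained`, `FluxPipeline.from_pretrained`, and `set_attn_processor` are existing diffusers APIs, but the exact dtype and device handling is illustrative:

```python
MODEL_ID = "black-forest-labs/FLUX.1-dev"


def load_with_manual_injection(attn_processor):
    """Way 1: load the transformer separately, then pass it to the pipeline."""
    import torch
    from diffusers import FluxPipeline, FluxTransformer2DModel

    transformer = FluxTransformer2DModel.from_pretrained(
        MODEL_ID, subfolder="transformer", torch_dtype=torch.bfloat16
    )
    # Install the custom attention processor on every attention layer.
    transformer.set_attn_processor(attn_processor)
    # The pipeline reuses the injected transformer instead of loading its own.
    pipe = FluxPipeline.from_pretrained(
        MODEL_ID, transformer=transformer, torch_dtype=torch.bfloat16
    )
    return pipe.to("cuda")


def load_with_internal_constructor(attn_processor):
    """Way 2: let the pipeline constructor load the transformer internally."""
    import torch
    from diffusers import FluxPipeline

    pipe = FluxPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    # Swap in the custom attention processor after loading.
    pipe.transformer.set_attn_processor(attn_processor)
    return pipe.to("cuda")
```

If the two paths load weights in different dtypes or memory layouts (for example, a dtype cast applied in one path but not the other), that alone could account for a steady-state speed difference, so it may be worth comparing `pipe.transformer.dtype` and the processor classes between the two.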