First of all, thanks to Aryan (0.9.7 integration) and DN6 (GGUF support). The model is quite good and the output is promising.
I need help creating a continuous video by chaining generations on the last frame. One trick is to generate a video, extract its last frame, and run inference again conditioned on that frame. Is there an easy way to do this in a loop?
My thoughts:
- Use the text encoder to generate the prompt embeddings once, then remove the text encoder from memory.
- Loop the inference code: once a run completes, extract the last frame, preferably as a latent (so I can upscale it with LTXLatentUpsamplePipeline) or otherwise as an image, create image1 from it again, condition the next run on that frame, and repeat for n iterations.
- Save the video to disk after each inference; otherwise I run out of memory (OOM).
Any thoughts / suggestions?
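To make the plan above concrete, here is a minimal sketch of the loop structure only. It is not tied to the diffusers API: `generate_segment` and `save_segment` are hypothetical callables that I would still have to write (the first would wrap the `pipe(...)` call and return a list of frames, the second would wrap something like `export_to_video`). The sketch assumes frames come back as decoded images, not latents.

```python
from pathlib import Path
from typing import Any, Callable, List


def generate_chained(
    generate_segment: Callable[[Any], List[Any]],
    save_segment: Callable[[List[Any], Path], None],
    first_image: Any,
    n_iterations: int,
    out_dir: Path,
) -> List[Path]:
    """Generate n segments, each conditioned on the previous segment's last frame.

    Both callables are hypothetical placeholders, not diffusers functions.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    condition_image = first_image
    saved_paths: List[Path] = []

    for i in range(n_iterations):
        # Run one inference conditioned on the current image.
        frames = generate_segment(condition_image)
        if not frames:
            raise ValueError(f"segment {i} returned no frames")

        # Write the segment to disk right away so it need not stay in memory.
        path = out_dir / f"segment_{i:03d}.mp4"
        save_segment(frames, path)
        saved_paths.append(path)

        # The last frame of this segment seeds the next one.
        condition_image = frames[-1]
        del frames  # drop the reference to the rest of the segment

    return saved_paths
```

Only the file paths and a single conditioning frame survive each iteration, which is the point of saving every segment locally. Whether the last latent can be reused directly instead of a decoded frame is exactly the open question here.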
import torch
import gc
from diffusers import GGUFQuantizationConfig
from diffusers import LTXConditionPipeline, LTXLatentUpsamplePipeline, LTXVideoTransformer3DModel
from diffusers.pipelines.ltx.pipeline_ltx_condition import LTXVideoCondition
from diffusers.utils import export_to_video, load_video, load_image
transformer_path = "https://huggingface.co/wsbagnsv1/ltxv-13b-0.9.7-distilled-GGUF/blob/main/ltxv-13b-0.9.7-distilled-Q3_K_S.gguf"
# transformer_path = "https://huggingface.co/wsbagnsv1/ltxv-13b-0.9.7-distilled-GGUF/blob/main/ltxv-13b-0.9.7-distilled-Q8_0.gguf"
transformer_gguf = LTXVideoTransformer3DModel.from_single_file(
    transformer_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
pipe = LTXConditionPipeline.from_pretrained(
    "Lightricks/LTX-Video-0.9.7-distilled",
    transformer=transformer_gguf,
    torch_dtype=torch.bfloat16,
)
# pipe.to("cuda")
# pipe.enable_sequential_cpu_offload()
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()
height, width = 480, 832
num_frames = 151
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
prompt = "hyperrealistic digital artwork of a young woman walking confidently down a garden pathway, wearing white button-up blouse with puffed sleeves and blue denim miniskirt, long flowing light brown hair caught in gentle breeze, carrying a small black handbag, bright sunny day with blue sky and fluffy white clouds, lush green hedges and ornamental plants lining the stone pathway, traditional Asian-inspired architecture in background, photorealistic style with perfect lighting, unreal engine 5, ray tracing, 16K UHD. camera follows subject from front as she walks forward with elegant confidence"
image1 = load_image("assets/ltx/00039.png")
condition1 = LTXVideoCondition(
    image=image1,
    frame_index=0,
)
# These values override the height, width, and num_frames set earlier.
width = 512
height = 768
num_frames = 161
# LOOP HERE
latents = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    conditions=[condition1],
    width=width,
    height=height,
    num_frames=num_frames,
    guidance_scale=1.0,
    num_inference_steps=4,
    decode_timestep=0.05,
    decode_noise_scale=0.025,
    image_cond_noise_scale=0.0,
    guidance_rescale=0.7,
    generator=torch.Generator().manual_seed(42),
    output_type="latent",
).frames
# Save the video to disk here.
# Then update image1 with the last frame (latent or image) of this run so the next inference is conditioned on it.
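Once every segment is saved, they still have to be joined into one video. A rough sketch of that step follows, assuming each later segment opens with (approximately) the frame it was conditioned on, so that frame appears twice at every seam. That assumption is mine and not verified against the pipeline output; `stitch_segments` and `total_frames` are hypothetical helper names, and frames are treated as opaque objects.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def stitch_segments(segments: Sequence[Sequence[Frame]]) -> List[Frame]:
    """Concatenate segments, dropping the first frame of every segment after the first.

    Assumes the first frame of segment k repeats the last frame of segment k-1.
    """
    stitched: List[Frame] = []
    for index, frames in enumerate(segments):
        start = 0 if index == 0 else 1  # skip the duplicated seam frame
        stitched.extend(frames[start:])
    return stitched


def total_frames(num_frames: int, n_segments: int) -> int:
    """Frame count after stitching n equally long segments with one shared frame per seam."""
    if n_segments < 1:
        return 0
    return n_segments * (num_frames - 1) + 1
```

With `num_frames = 161` as in the code above, three chained segments would give `total_frames(161, 3) == 481` frames. If the seam frames turn out not to match closely, keeping both frames or cross-fading may look better than dropping one.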