Infinite (not literally) length video creation using LTX-Video? #11590

First of all, thanks to Aryan (0.9.7 integration) and DN6 (GGUF support). The model is quite good and the output is promising.

I need help creating a continuous video by chaining segments on the last frame. One trick is to generate a video, extract its last frame, and run inference again conditioned on that frame. Is there an easy way to do this in a loop?

My thoughts so far:

Run the text encoder once to generate the prompt embeddings, then remove the text encoder from memory.
Loop the inference code. Once a segment completes, extract its last latent frame (preferred, since I can upscale it with LTXLatentUpsamplePipeline) or its last decoded image, build a new condition from it in place of image1, and repeat for n iterations.
Save each segment's video to disk after every inference; keeping all segments in memory would cause OOM.

Any thoughts / suggestions?

import torch
import gc
from diffusers import GGUFQuantizationConfig
from diffusers import LTXConditionPipeline, LTXLatentUpsamplePipeline, LTXVideoTransformer3DModel
from diffusers.pipelines.ltx.pipeline_ltx_condition import LTXVideoCondition
from diffusers.utils import export_to_video, load_video, load_image

transformer_path = "https://huggingface.co/wsbagnsv1/ltxv-13b-0.9.7-distilled-GGUF/blob/main/ltxv-13b-0.9.7-distilled-Q3_K_S.gguf"
# transformer_path = "https://huggingface.co/wsbagnsv1/ltxv-13b-0.9.7-distilled-GGUF/blob/main/ltxv-13b-0.9.7-distilled-Q8_0.gguf"
transformer_gguf = LTXVideoTransformer3DModel.from_single_file(
    transformer_path,
    quantization_config=GGUFQuantizationConfig(compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)

pipe = LTXConditionPipeline.from_pretrained(
    "Lightricks/LTX-Video-0.9.7-distilled", 
    transformer=transformer_gguf,
    torch_dtype=torch.bfloat16
)
# pipe.to("cuda")
# pipe.enable_sequential_cpu_offload()
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"

prompt = "hyperrealistic digital artwork of a young woman walking confidently down a garden pathway, wearing white button-up blouse with puffed sleeves and blue denim miniskirt, long flowing light brown hair caught in gentle breeze, carrying a small black handbag, bright sunny day with blue sky and fluffy white clouds, lush green hedges and ornamental plants lining the stone pathway, traditional Asian-inspired architecture in background, photorealistic style with perfect lighting, unreal engine 5, ray tracing, 16K UHD. camera follows subject from front as she walks forward with elegant confidence"
# The first segment is conditioned on a still image placed at frame 0.
image1 = load_image("assets/ltx/00039.png")
condition1 = LTXVideoCondition(
    image=image1,
    frame_index=0,
)
# Output resolution and number of frames per segment.
width = 512
height = 768
num_frames = 161

# LOOP HERE
latents = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    conditions=[condition1],
    width=width,
    height=height,
    num_frames=num_frames,
    guidance_scale=1.0,
    num_inference_steps=4,
    decode_timestep=0.05,
    decode_noise_scale=0.025,
    image_cond_noise_scale=0.0,
    guidance_rescale=0.7,
    generator=torch.Generator().manual_seed(42),
    output_type="latent",
).frames
# TODO: save this segment's video to disk before starting the next one
# TODO: set image1 to the last frame (latent or decoded image) of this segment,
#       then rebuild condition1 from it for the next inference
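
The loop described above could be structured roughly as follows. This is only a sketch of the control flow, not a tested LTX-Video implementation: `chain_segments`, `generate`, and `save` are hypothetical names introduced here, and the demo uses integer stand-ins for frames so it runs without a GPU or model. Each iteration generates one segment conditioned on the previous segment's last frame, hands the frames to a saver, and keeps only that last frame in memory.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")


def chain_segments(
    generate: Callable[[Frame], Sequence[Frame]],
    save: Callable[[int, Sequence[Frame]], str],
    first_frame: Frame,
    n_segments: int,
) -> List[str]:
    """Generate n_segments clips, each conditioned on the previous clip's last frame.

    `generate` and `save` are hypothetical callables supplied by the caller:
    `generate(frame)` returns the frames of one segment, and
    `save(index, frames)` persists them and returns an identifier such as a path.
    """
    paths: List[str] = []
    condition_frame = first_frame
    for index in range(n_segments):
        frames = generate(condition_frame)
        if len(frames) == 0:
            raise ValueError(f"segment {index} produced no frames")
        # Persist the segment right away so only one segment is held in memory.
        paths.append(save(index, frames))
        # The last frame of this segment conditions the next one.
        condition_frame = frames[-1]
        del frames
    return paths


# Stand-in demo: frames are integers, so the chaining is easy to inspect.
def fake_generate(start: int) -> List[int]:
    return [start + offset for offset in range(5)]


saved = {}


def fake_save(index: int, frames: Sequence[int]) -> str:
    # Records the frames in a dict instead of writing a real video file.
    path = f"segment_{index:03d}.mp4"
    saved[path] = list(frames)
    return path


demo_paths = chain_segments(fake_generate, fake_save, first_frame=0, n_segments=3)
```

With the pipeline above, `generate` would presumably wrap the `pipe(...)` call, building `LTXVideoCondition(image=frame, frame_index=0)` from the incoming frame and returning `.frames[0]` with `output_type="pil"`, while `save` would call `export_to_video`. Passing precomputed prompt embeddings so the text encoder can be freed, and chaining on latents instead of decoded frames, are not covered by this sketch.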
