LTX-2.3 Chinese Drama IC-LoRA: Depth Control

An IC-LoRA (In-Context LoRA) for LTX-Video 2.3 (22B) that conditions video generation on a monocular depth video so the generated scene's 3D structure follows a user-supplied reference. Trained on the same 78-episode Chinese historical drama corpus as the character LoRA, with the depth reference produced by Depth-Anything-3 at the source resolution.

Model details

| Field | Value |
| --- | --- |
| Base model | Lightricks/LTX-2.3-22B |
| Adapter type | IC-LoRA (with reference-video conditioning) |
| Conditioning input | Monocular depth video (Depth-Anything-3 inverse-depth render, MP4) |
| Rank | 128 |
| Alpha | 128 |
| Target modules | to_k, to_q, to_v, to_out.0 |
| Training steps | 6000 |
| Optimizer | AdamW |
| Learning rate | 1e-4, linear schedule |
| Mixed precision | bf16 |
| Reference channel | Concatenated to the latent via --video-conditioning (strength 1.0) |

Training data

The corpus and caption format are identical to those of the character LoRA. In addition, each clip has a depth video extracted with Depth-Anything-3, a unified model that predicts 6-DoF camera pose and depth in the same forward pass. The depth maps are converted to normalised inverse depth, rendered to MP4, and matched in resolution to the source clip.
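The inverse-depth normalisation step can be illustrated with a minimal sketch. The function name `normalise_inverse_depth`, the per-frame min-max scaling, and the 8-bit output range are assumptions made for illustration; the card does not state whether normalisation was per frame or per clip, nor how Depth-Anything-3 output was post-processed.

```python
def normalise_inverse_depth(depth, eps=1e-6):
    """Map one depth frame (rows of positive floats) to 8-bit inverse depth.

    Near surfaces come out bright (255) and far surfaces dark (0).
    Per-frame min-max scaling is an assumption, not the card's stated method.
    """
    # Inverse depth: nearer pixels get larger values; eps guards against zeros.
    inv = [[1.0 / max(d, eps) for d in row] for row in depth]
    lo = min(min(row) for row in inv)
    hi = max(max(row) for row in inv)
    span = hi - lo
    if span < eps:
        # A uniformly flat frame carries no geometry to normalise.
        return [[0 for _ in row] for row in depth]
    return [[round(255 * (v - lo) / span) for v in row] for row in inv]


frame = [[1.0, 2.0], [4.0, 4.0]]  # toy depth values, nearest pixel top-left
print(normalise_inverse_depth(frame))
```

A real pipeline would apply this to every frame of the depth prediction before encoding the result to MP4.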

Usage

IC-LoRA inference requires two inputs: a depth reference video at the target resolution and an inline-weave prompt.

LTX ltx_pipelines.ic_lora invocation

python -m ltx_pipelines.ic_lora \
    --prompt "char_0_person. Framed in a static eye level close-up, on a 50mm normal lens, with shallow focus. Set in a candlelit Han dynasty study, the subject sits writing on bamboo scrolls. Live-action photorealistic, cinematic Chinese drama." \
    --negative-prompt "no CGI, no animation, no illustration, no painterly style, no anime" \
    --lora <path_to>/lora_weights_step_06000.safetensors 1.0 \
    --video-conditioning <depth_reference>.mp4 1.0 \
    --width 1280 --height 544 --num-frames 89 \
    --guidance-scale 4.0 --num-inference-steps 20 \
    --skip-stage-2

Recommended strengths

| Component | Value |
| --- | --- |
| LoRA strength | 1.0 (validated default) |
| Video conditioning strength | 1.0 (faithful depth following) |
| Lower video conditioning (0.5–0.7) | Softer geometry adherence, more interpretive output |

When to use this vs the other adapters

| Use case | Reach for |
| --- | --- |
| "Generate a scene matching THIS depth/geometry reference" | Depth IC-LoRA (this one) |
| "Generate a scene matching THIS pose reference" | Pose IC-LoRA |
| "Generate a scene matching THIS line-art / canny reference" | Canny IC-LoRA |
| "Just generate a Chinese drama scene from scratch" | Character LoRA |

Stack with the character LoRA to control identity and geometry together. Validated stack: character LoRA at strength 0.9 with this depth IC-LoRA at strength 1.0.
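One way to script the stacked configuration is to assemble the command line in Python. This is a sketch under stated assumptions: the helper `build_ic_lora_command` and all file names are hypothetical, and passing `--lora` more than once is an assumption about the CLI that this card does not confirm. Only flags that appear in the invocation above are used.

```python
import shlex


def build_ic_lora_command(prompt, depth_video, loras, conditioning_strength=1.0,
                          width=1280, height=544, num_frames=89):
    """Assemble an argv list for `python -m ltx_pipelines.ic_lora`.

    `loras` is a list of (path, strength) pairs. Repeating --lora for each
    pair is an ASSUMPTION about the CLI, not documented behaviour.
    """
    argv = ["python", "-m", "ltx_pipelines.ic_lora", "--prompt", prompt]
    for path, strength in loras:
        argv += ["--lora", path, str(strength)]
    argv += [
        "--video-conditioning", depth_video, str(conditioning_strength),
        "--width", str(width), "--height", str(height),
        "--num-frames", str(num_frames),
    ]
    return argv


cmd = build_ic_lora_command(
    prompt="char_0_person. Set in a candlelit Han dynasty study, "
           "the subject sits writing on bamboo scrolls.",
    depth_video="depth_reference.mp4",             # hypothetical file name
    loras=[("character_lora.safetensors", 0.9),    # hypothetical file name
           ("depth_ic_lora.safetensors", 1.0)],    # hypothetical file name
)
print(shlex.join(cmd))  # shell-quoted form, for inspection before running
```

The remaining flags from the invocation above (negative prompt, guidance scale, inference steps, `--skip-stage-2`) can be appended to the returned list in the same way.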

What this LoRA does well

  • Faithful 3D geometric structure transfer: generated scenes match the spatial layout of the reference.
  • Preserves Chinese drama lighting + costume style across very different geometry references.
  • Strong on interior scenes (corridors, courtyards, study rooms) where depth cues are informative.

What it does NOT do

  • No identity control: the depth IC-LoRA controls geometry, not subject identity. Stack with the character LoRA for that.
  • Weak on flat or depthless references: if the depth map is uniformly flat, the model has nothing to follow.
  • Depth reference resolution must match target output resolution.
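The resolution requirement in the last point can be checked before launching a run. The sketch below assumes `ffprobe` (from FFmpeg) is installed and on the PATH; the helper names `parse_resolution`, `probe_resolution`, and `check_reference` are hypothetical and not part of any LTX tooling.

```python
import subprocess


def parse_resolution(text):
    """Parse ffprobe's 'WIDTHxHEIGHT' output, e.g. '1280x544', into (w, h)."""
    first_line = text.strip().splitlines()[0]
    width, height = first_line.split("x")[:2]
    return int(width), int(height)


def probe_resolution(path):
    """Ask ffprobe for the size of the first video stream in `path`."""
    result = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "stream=width,height",
         "-of", "csv=s=x:p=0", path],
        capture_output=True, text=True, check=True,
    )
    return parse_resolution(result.stdout)


def check_reference(reference_size, target_size):
    """Raise ValueError if the depth reference and target sizes differ."""
    if reference_size != target_size:
        raise ValueError(
            f"depth reference is {reference_size[0]}x{reference_size[1]}, "
            f"but the target is {target_size[0]}x{target_size[1]}"
        )
```

A typical call would be `check_reference(probe_resolution("depth_reference.mp4"), (1280, 544))`, where the file name is a placeholder and the target matches the `--width` and `--height` passed to the pipeline.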

License

Apache 2.0. See LICENSE for terms.

Attribution: SyFe.
