LTX-2.3 Chinese Drama IC-LoRA — Depth Control

An IC-LoRA (In-Context LoRA) for LTX-Video 2.3 (22B) that conditions video generation on a monocular depth video, so that the generated scene's 3D structure follows a user-supplied reference. It was trained on the same 78-episode Chinese historical drama corpus as the character LoRA, with depth references produced by Depth-Anything-3 at each source clip's resolution.

Model details

Field	Value
Base model	`Lightricks/LTX-2.3-22B`
Adapter type	IC-LoRA (with reference-video conditioning)
Conditioning input	Monocular depth video (Depth-Anything-3 inverse-depth render, MP4)
Rank	128
Alpha	128
Target modules	`to_k`, `to_q`, `to_v`, `to_out.0`
Training steps	6000
Optimizer	AdamW
Learning rate	1e-4, linear schedule
Mixed precision	bf16
Reference channel	concatenated to the latent via `--video-conditioning` (strength 1.0)

Training data

The corpus and caption format are identical to those of the character LoRA. The only addition is a per-clip depth video extracted with Depth-Anything-3 (a unified model that predicts 6-DoF camera pose and depth in the same forward pass). The depth maps are converted to normalised inverse depth, rendered to MP4, and matched in resolution to the source clip.
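The inverse-depth normalisation step can be sketched roughly as follows. This is a minimal illustration, not the training pipeline itself: the function name `inverse_depth_to_uint8`, the per-clip (rather than per-frame) min/max range, and the 8-bit greyscale output are all assumptions, and the code does not call Depth-Anything-3.

```python
import numpy as np


def inverse_depth_to_uint8(depth, eps=1e-6):
    """Map a (T, H, W) array of positive depths to 8-bit inverse-depth frames.

    Hypothetical helper: the card does not specify the exact mapping.
    """
    depth = np.asarray(depth, dtype=np.float32)
    # Inverse depth: near surfaces become large values, far surfaces small.
    inv = 1.0 / np.maximum(depth, eps)
    # Assumption: one min/max range for the whole clip, which avoids
    # brightness flicker between frames.
    lo, hi = float(inv.min()), float(inv.max())
    if hi - lo < eps:
        # Completely flat input: nothing to normalise.
        return np.zeros(depth.shape, dtype=np.uint8)
    norm = (inv - lo) / (hi - lo)
    return np.round(norm * 255.0).astype(np.uint8)


# Tiny one-frame example: depths of 1, 2, 4 and 4 units.
frames = inverse_depth_to_uint8(np.array([[[1.0, 2.0], [4.0, 4.0]]]))
```

In this example the nearest pixel maps to 255 and the farthest pixels map to 0. The resulting frames would still need to be encoded to MP4 with a video writer of your choice.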

Usage

IC-LoRA inference requires two inputs: a depth reference video at the target output resolution, and an inline-weave prompt.

LTX `ltx_pipelines.ic_lora` invocation

python -m ltx_pipelines.ic_lora \
    --prompt "char_0_person. Framed in a static eye level close-up, on a 50mm normal lens, with shallow focus. Set in a candlelit Han dynasty study, the subject sits writing on bamboo scrolls. Live-action photorealistic, cinematic Chinese drama." \
    --negative-prompt "no CGI, no animation, no illustration, no painterly style, no anime" \
    --lora <path_to>/lora_weights_step_06000.safetensors 1.0 \
    --video-conditioning <depth_reference>.mp4 1.0 \
    --width 1280 --height 544 --num-frames 89 \
    --guidance-scale 4.0 --num-inference-steps 20 \
    --skip-stage-2
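For scripted or batch runs, the same invocation can be assembled from Python. The sketch below only rebuilds the argument list from the flags shown in the command above. The helper names `build_ic_lora_command` and `run_ic_lora` and the file paths are hypothetical, and the shortened prompt is for illustration only.

```python
import subprocess
import sys


def build_ic_lora_command(prompt, negative_prompt, lora_path, depth_video,
                          lora_strength=1.0, conditioning_strength=1.0,
                          width=1280, height=544, num_frames=89,
                          guidance_scale=4.0, num_inference_steps=20):
    """Return the argv list for the ltx_pipelines.ic_lora call shown above."""
    return [
        sys.executable, "-m", "ltx_pipelines.ic_lora",
        "--prompt", prompt,
        "--negative-prompt", negative_prompt,
        "--lora", str(lora_path), str(lora_strength),
        "--video-conditioning", str(depth_video), str(conditioning_strength),
        "--width", str(width), "--height", str(height),
        "--num-frames", str(num_frames),
        "--guidance-scale", str(guidance_scale),
        "--num-inference-steps", str(num_inference_steps),
        "--skip-stage-2",
    ]


def run_ic_lora(**kwargs):
    """Build the command and run it; raises CalledProcessError on failure."""
    subprocess.run(build_ic_lora_command(**kwargs), check=True)


cmd = build_ic_lora_command(
    prompt="char_0_person. Set in a candlelit Han dynasty study.",
    negative_prompt="no CGI, no animation",
    lora_path="lora_weights_step_06000.safetensors",  # placeholder path
    depth_video="depth_reference.mp4",                # placeholder path
    conditioning_strength=0.6,  # softer geometry adherence (0.5-0.7 range)
)
```

Passing the arguments as a list avoids shell-quoting problems with long prompts. Call `run_ic_lora(...)` with the same keyword arguments to execute the command.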

Recommended strengths

Component	Value
LoRA strength	1.0 (validated default)
Video conditioning strength	1.0 (faithful depth following)
Lower video conditioning (0.5–0.7)	softer geometry adherence, more interpretive output

When to use this vs the other adapters

Use case	Reach for
"Generate a scene matching THIS depth/geometry reference"	Depth IC-LoRA (this one)
"Generate a scene matching THIS pose reference"	Pose IC-LoRA
"Generate a scene matching THIS line-art / canny reference"	Canny IC-LoRA
"Just generate a Chinese drama scene from scratch"	Character LoRA

Stack this adapter with the character LoRA to get identity and geometry control together. Validated stack: character LoRA at strength 0.9, depth IC-LoRA at strength 1.0.
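One way to express the validated stack on the command line is sketched below. It assumes that `--lora` may be passed more than once, each time followed by a path and a strength. That assumption is not confirmed by this card, so check the pipeline's `--help` output before relying on it. The helper `lora_flags` and both file names are hypothetical.

```python
import shlex


def lora_flags(stack):
    """Turn [(path, strength), ...] into repeated '--lora PATH STRENGTH' flags.

    Assumption: the CLI accepts one --lora flag per adapter.
    """
    flags = []
    for path, strength in stack:
        flags += ["--lora", str(path), str(strength)]
    return flags


# Validated stack from this card: character 0.9 + depth 1.0.
stack = [
    ("char_lora.safetensors", 0.9),     # placeholder file name
    ("depth_iclora.safetensors", 1.0),  # placeholder file name
]
fragment = shlex.join(lora_flags(stack))
```

The resulting `fragment` string would replace the single `--lora` line in the invocation.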

What this LoRA does well

Faithful 3D geometric structure transfer — generated scenes match the spatial layout of the reference.
Preserves Chinese drama lighting + costume style across very different geometry references.
Strong on interior scenes (corridors, courtyards, study rooms) where depth cues are informative.

What it does NOT do

No identity — depth IC-LoRA controls geometry, not subject identity. Stack with the character LoRA for that.
Not great on flat / depthless references: if the depth map is uniformly flat, the model has nothing to follow.
Depth reference resolution must match target output resolution.
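The last two limitations can be checked before launching a generation. The following is a rough pre-flight sketch that assumes the depth reference has already been decoded into a NumPy array of 8-bit frames. The function name `check_depth_reference` and the `min_range` threshold of 8 grey levels are illustrative choices, not values taken from this card.

```python
import numpy as np


def check_depth_reference(frames, width, height, min_range=8):
    """Return a list of problems found in a decoded depth reference.

    frames: uint8 array shaped (T, H, W) or (T, H, W, C).
    min_range: hypothetical threshold for "uniformly flat" depth.
    """
    frames = np.asarray(frames)
    problems = []
    ref_h, ref_w = frames.shape[1], frames.shape[2]
    if (ref_w, ref_h) != (width, height):
        problems.append(
            f"resolution {ref_w}x{ref_h} does not match target {width}x{height}"
        )
    # Spread between the darkest and brightest depth value in the clip.
    spread = int(frames.max()) - int(frames.min())
    if spread < min_range:
        problems.append(
            f"depth range is only {spread} grey levels; reference looks flat"
        )
    return problems


# A left-to-right depth ramp at the target size passes both checks.
ramp = np.tile(np.linspace(0, 255, 1280).astype(np.uint8), (4, 544, 1))
ok = check_depth_reference(ramp, width=1280, height=544)

# A constant-grey clip is flagged as flat.
flat = np.full((4, 544, 1280), 128, dtype=np.uint8)
flat_problems = check_depth_reference(flat, width=1280, height=544)
```

An empty list means the reference passed both checks.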

Related models

SyFeee/ltx2.3-chinese-drama-charlora — character / style LoRA.
SyFeee/ltx2.3-chinese-drama-iclora-pose — pose-controlled IC-LoRA.
SyFeee/ltx2.3-chinese-drama-iclora-canny — canny-edge-controlled IC-LoRA.

License

Apache 2.0. See LICENSE for terms.

Attribution: SyFe.
