---
license: apache-2.0
base_model:
- zai-org/CogVideoX-2b
language:
- en
tags:
- video-to-video
- subtitle-removal
- lora
- cogvideox
- diffusers
pipeline_tag: text-to-video
---

# CogVideoX-2b CLEAR LoRA: Subtitle Removal (Supplementary)

This repository releases **LoRA + expanded input-projection weights** for **video-to-video subtitle removal** on top of **[zai-org/CogVideoX-2b](https://huggingface.co/zai-org/CogVideoX-2b)**.

> **Disclaimer:** This is a **supplementary** experiment from the CLEAR project. The main paper results use **Wan2.1-Control**; this CogVideoX-2b variant is **not** expected to match that baseline. It is shared for **reproducibility and comparison**.

## Architecture change (high level)

CogVideoX-2b is originally **text-to-video**. For conditioning, the first-stage conv input is expanded:

- **Before:** `patch_embed.proj`: Conv2d(16 → 1920, …)
- **After:** `patch_embed.proj`: Conv2d(32 → 1920, …)
  - First 16 channels: noisy latent (inherits pretrained weights)
  - Last 16 channels: subtitle-video latent (new channels, trained)

Inference concatenates **noisy latent** and **subtitle latent** along the channel dimension before the transformer, consistent with training.

## Intended use

- **Research:** subtitle removal / video inpainting with diffusion.
- **Not** for high-stakes or misleading content; users are responsible for compliance with law and platform policies.

## How to use

1. Download **CogVideoX-2b** from `zai-org/CogVideoX-2b`.
2. Place `cogvideox_2b_CLEAR_lora_checkpoint.pt` locally.
3. Run inference with the provided script (example):

```bash
export MODEL_PATH="/path/to/CogVideoX-2b"
export CHECKPOINT="/path/to/cogvideox_2b_CLEAR_lora_checkpoint.pt"

bash scripts/inference_cogvideox_2b.sh \
  --input_video /path/to/video_with_subtitles.mp4 \
  --output_dir ./output