How to use charlesw09/CLEAR-mask-free-video-subtitle-removal-CogvideoX with Diffusers:

```shell
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("zai-org/CogVideoX-2b", dtype=torch.bfloat16, device_map="cuda")
pipe.load_lora_weights("charlesw09/CLEAR-mask-free-video-subtitle-removal-CogvideoX")

prompt = "A man with short gray hair plays a red electric guitar."
output = pipe(prompt=prompt).frames[0]
export_to_video(output, "output.mp4")
```
Update README.md
README.md
CHANGED
```diff
@@ -19,20 +19,6 @@ This repository releases **LoRA + expanded input-projection weights** for **vide
 
 > **Disclaimer:** This is a **supplementary** experiment from the CLEAR project. The main paper results use **Wan2.1-Control**; this CogVideoX-2b variant is **not** expected to match that baseline. It is shared for **reproducibility and comparison**.
 
-## What is included
-
-- **Checkpoint:** `cogvideox_2b_CLEAR_lora_checkpoint.pt`
-Contains:
-- `lora_state_dict` — LoRA on attention (`to_q`, `to_k`, `to_v`, `to_out.0`)
-- `proj_state_dict` — expanded `patch_embed.proj` (16→32 input channels for conditioning)
-- `step` — training step metadata
-
-- **Training / inference code:** see linked project or bundled scripts (if you attach the `release/` archive to the repo).
-
-## What is **not** included
-
-- **Training data** (not released).
-- **Base model weights** — load from `zai-org/CogVideoX-2b`.
 
 ## Architecture change (high level)
 
```
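The removed "What is included" section describes the checkpoint as holding three top-level entries: `lora_state_dict`, `proj_state_dict`, and `step`. A minimal sketch of how that layout could be sanity-checked is given below. The helper name `summarize_checkpoint`, the stand-in dictionary, and its parameter names are hypothetical, and the sketch assumes that LoRA parameter names embed the target module name (`to_q`, `to_k`, `to_v`, `to_out.0`) as a dotted path component. The real key naming in the file may differ.

```python
from collections import Counter
from typing import Any, Mapping

# Module names the README lists as LoRA targets.
LORA_TARGETS = ("to_q", "to_k", "to_v", "to_out.0")
# Top-level entries the README says the checkpoint contains.
EXPECTED_KEYS = ("lora_state_dict", "proj_state_dict", "step")


def summarize_checkpoint(ckpt: Mapping[str, Any]) -> dict:
    """Check the top-level layout and count LoRA parameters per target module."""
    missing = [k for k in EXPECTED_KEYS if k not in ckpt]
    if missing:
        raise KeyError(f"checkpoint is missing entries: {missing}")

    counts = Counter()
    for name in ckpt["lora_state_dict"]:
        # Assumption: the target module name appears as a dotted path component.
        for target in LORA_TARGETS:
            if f".{target}." in f".{name}.":
                counts[target] += 1
                break

    return {
        "step": ckpt["step"],
        "lora_params_per_target": dict(counts),
        "proj_params": sorted(ckpt["proj_state_dict"]),
    }


# Hypothetical stand-in for the real file; values would be tensors in practice.
fake_ckpt = {
    "lora_state_dict": {
        "transformer_blocks.0.attn1.to_q.lora_A.weight": None,
        "transformer_blocks.0.attn1.to_q.lora_B.weight": None,
        "transformer_blocks.0.attn1.to_out.0.lora_A.weight": None,
    },
    "proj_state_dict": {"weight": None, "bias": None},
    "step": 1000,
}
summary = summarize_checkpoint(fake_ckpt)
print(summary)
```

For the released file, the dictionary would presumably come from `torch.load("cogvideox_2b_CLEAR_lora_checkpoint.pt", map_location="cpu")`, assuming the `.pt` file was written with `torch.save`. An empty `lora_params_per_target` result would suggest the key naming assumption above does not hold for this checkpoint.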