---
title: Wan Studio
emoji: 🎬
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: "6.14.0"
app_file: app.py
pinned: false
short_description: "Every Wan mode, one clean UI."
python_version: "3.12.12"
startup_duration_timeout: "30m"
# Volume mounts for model weights are set programmatically by
# scripts/create_space.py (Volume(type="model", source=..., mount_path=...,
# read_only=True)): read-only, served from the maintainer's HF object
# store, zero cost against the 150 GB ephemeral disk cap.
# ZeroGPU hardware is also requested programmatically via
# SpaceHardware.ZERO_A10G, empirically the live Blackwell ZeroGPU V2 pool as of 2026.
---
# Wan Studio
> Every Alibaba Wan video-diffusion mode in one clean Gradio UI (T2V, I2V, TI2V, FLF2V, V2V, VACE, S2V, Animate), backed by HF ZeroGPU.
**Live demo:** https://huggingface.co/spaces/techfreakworm/wan-studio
*(in active development; please don't run inference, as it burns the maintainer's ZeroGPU quota)*
---
## What it is
Wan Studio is a single Gradio app that exposes every officially supported mode of the [Alibaba Wan](https://github.com/Wan-Video) video-diffusion family (Wan 2.1 and Wan 2.2) through a refined, Linear-inspired UI. Each mode (T2V, I2V, TI2V, FLF2V, V2V, VACE, S2V, Animate) lives in its own sidebar tab with mode-specific inputs and a shared two-preset model:
- **Fast (Lightning)**: 4 steps, CFG = 1.0, official Lightning LoRA loaded
- **Quality**: 30-50 steps, full sampler, no LoRA
Both Wan generations live in the same UI. The header dropdown picks `Wan 2.1` vs `Wan 2.2`, and the active mode resolves to the appropriate checkpoint: single-transformer for Wan 2.1 modes, dual-transformer MoE (`transformer` + `transformer_2`, with paired Lightning LoRAs) for Wan 2.2 A14B modes.
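
As a rough illustration of how the two presets and the generation choice could resolve to pipeline settings, here is a minimal Python sketch. `PresetKwargs`, `resolve_preset`, `transformer_slots`, and the family strings `"wan2.1"` / `"wan2.2-a14b"` are hypothetical names, not the actual contents of `pipelines/preset.py`. The Quality guidance scale is not stated above, so the default used here is an assumed placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PresetKwargs:
    """Sampler settings for one request (hypothetical container)."""
    num_inference_steps: int
    guidance_scale: float
    use_lightning_lora: bool


def resolve_preset(preset: str, quality_steps: int = 40,
                   quality_guidance: float = 5.0) -> PresetKwargs:
    """Map a preset name to sampler settings.

    quality_guidance=5.0 is an assumption; the README only says "full sampler".
    """
    if preset == "fast":
        # Fast (Lightning): 4 steps, CFG = 1.0, Lightning LoRA loaded.
        return PresetKwargs(4, 1.0, True)
    if preset == "quality":
        # Quality: 30-50 steps, no LoRA.
        if not 30 <= quality_steps <= 50:
            raise ValueError("quality_steps must be between 30 and 50")
        return PresetKwargs(quality_steps, quality_guidance, False)
    raise ValueError(f"unknown preset: {preset!r}")


def transformer_slots(family: str) -> tuple[str, ...]:
    """Return the transformer components a Lightning LoRA would attach to."""
    if family == "wan2.1":
        return ("transformer",)                     # single transformer
    if family == "wan2.2-a14b":
        return ("transformer", "transformer_2")     # dual-transformer MoE
    raise ValueError(f"unknown family: {family!r}")
```

With this shape, the Fast preset on a Wan 2.2 A14B mode would load one Lightning LoRA per entry returned by `transformer_slots("wan2.2-a14b")`.
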
---
## Roadmap
| Phase | Modes | Status |
|---|---|---|
| 1 | T2V, I2V | 🟡 in progress |
| 2 | FLF2V, V2V, TI2V-5B | planned |
| 3 | VACE (depth, pose, sketch, inpaint, outpaint, reference, extension) | planned |
| 4 | Animate (character animation + replacement) | planned |
| 5 | S2V (speech-to-video, audio-driven) | planned |
| 6 | Cross-mode chaining + Gallery + Settings polish | planned |
---
## Architecture
```
                                 ┌────────────────────────┐
           user → HF Space (UI)  │ app.py (Gradio 5.49)   │
                                 │ • Linear-themed chrome │
                                 │ • JS-only sidebar nav  │
                                 │ • @spaces.GPU handlers │
                                 └───────────┬────────────┘
                                             │
                          ┌──────────────────┴───────────────────┐
                          │                                      │
┌─────────────────────────▼─────────────────────────┐   ┌────────▼─────────┐
│ pipelines/                                        │   │ ui/              │
│ • registry.py  ModelCard catalog (12 checkpoints) │   │ • header.py      │
│ • handle.py    WanModelHandle base + LRU cache    │   │ • sidebar.py     │
│ • t2v.py       T2VHandle (Wan 2.1 + Wan 2.2 MoE)  │   │ • tabs/*.py      │
│ • i2v.py       I2VHandle                          │   │ • build_all_*.py │
│ • shared.py    UMT5 / VAE / CLIP loaded once      │   └──────────────────┘
│ • preset.py    Fast vs Quality kwargs resolver    │
└─────────────────────────┬─────────────────────────┘
                          │
                          ▼
                ┌───────────────────┐     ┌─────────────────────────────────────────┐
                │ Volume mounts     │     │ HF mirrors (techfreakworm/wan2.*-*)     │
                │ /models//         │ ◄── │ Apache-2.0 duplicates of Wan-AI repos,  │
                │ (read-only,       │     │ pinned for resilience against upstream. │
                │ served from HF)   │     │ Lightning LoRAs in wan-lightning-loras. │
                └───────────────────┘     └─────────────────────────────────────────┘
```
Design principles:
- **Single Space**, no multi-Space federation. Everything in one container.
- **Volume-mounted weights** via `huggingface_hub.Volume(type="model", read_only=True)`, backed by the maintainer's HF object store, zero ephemeral disk cost.
- **Bundled metadata stitching** (`models_meta//`): works around HF Volume small-file truncation by shipping correct JSONs in the Space repo and symlinking weights from the mount at startup.
- **One handle per (mode, generation)**, lazy-loaded on first click; LRU eviction planned for Phase 2+ when more modes go live.
- **Shared encoders** (`UMT5-XXL`, `AutoencoderKLWan`, `CLIP-ViT-H/14`) loaded once via `pipelines/shared.py` and injected into every pipeline, which saves ~25 GB of duplicated weights.
- **MPS-friendly**: identical codebase runs locally on Apple Silicon (`fp16` transformer / `fp32` VAE / no quant) and on ZeroGPU Blackwell (`bf16` / optional torchao FP8 / model CPU offload for MoE).
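
The "one handle per (mode, generation)" principle, together with the planned LRU eviction, could look roughly like the sketch below. `HandleCache` and its `max_handles` default are hypothetical and are not taken from `pipelines/handle.py`; the `loader` callable stands in for whatever actually builds a pipeline.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class HandleCache:
    """Lazy-loading cache with least-recently-used eviction (illustrative only)."""

    def __init__(self, max_handles: int = 2) -> None:
        if max_handles < 1:
            raise ValueError("max_handles must be at least 1")
        self.max_handles = max_handles
        self._handles: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        """Return the cached handle for key, calling loader() only on a miss."""
        if key in self._handles:
            # Cache hit: mark as most recently used.
            self._handles.move_to_end(key)
            return self._handles[key]
        handle = loader()
        self._handles[key] = handle
        # Evict the least recently used handles until we are within budget.
        while len(self._handles) > self.max_handles:
            self._handles.popitem(last=False)
        return handle

    def keys(self) -> list:
        """Cached keys, ordered from least to most recently used."""
        return list(self._handles)
```

A click handler would then call something like `cache.get(("t2v", "wan2.2"), build_t2v)` so the expensive load happens only on first use. A real implementation would also need to release GPU memory when a handle is evicted, which this sketch leaves out.
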
---
## Tech stack
| Layer | Choice | Why |
|---|---|---|
| Web UI | **Gradio 5.49** | ZeroGPU's only first-class SDK; rich video components |
| Diffusion runtime | **diffusers** (latest) | `WanPipeline` + `WanImageToVideoPipeline` first-party support |
| Acceleration | `@spaces.GPU(duration=..., size="large")` | ZeroGPU on-demand H100/Blackwell |
| Model storage | HF Hub model repos + `space_volumes` mounts | Read-only, free, resilient |
| Quantization (optional) | `torchao` FP8 on Blackwell | Halves MoE memory footprint |
| MoE management | `pipe.enable_model_cpu_offload()` (diffusers, backed by `accelerate`) | Two 14B transformers fit on a 48 GB GPU |
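
The `duration=...` argument in the table can be a callable rather than a constant, which is what `utils/budget.py` is described as providing. The sketch below is one way such a callable might be written; `estimate_gpu_seconds` and every constant in it (20 ms per step-frame, 30 s overhead, 120 s cap) are made-up placeholders, not measured values from this project.

```python
def estimate_gpu_seconds(num_steps: int, num_frames: int,
                         ms_per_step_frame: int = 20,
                         overhead_ms: int = 30_000,
                         cap_seconds: int = 120) -> int:
    """Estimate a ZeroGPU time budget in whole seconds.

    All default constants are assumptions for illustration only.
    Integer milliseconds are used to avoid floating-point rounding surprises.
    """
    if num_steps < 1 or num_frames < 1:
        raise ValueError("num_steps and num_frames must be positive")
    total_ms = overhead_ms + num_steps * num_frames * ms_per_step_frame
    seconds = -(-total_ms // 1000)  # ceiling division
    # Never request more than the cap, however long the job might take.
    return min(seconds, cap_seconds)


# Assumed usage, with the `spaces` package installed on a ZeroGPU Space:
#   @spaces.GPU(duration=estimate_gpu_seconds, size="large")
#   def generate(num_steps, num_frames): ...
```

Under these placeholder constants, a 4-step Fast run asks for far less GPU time than a 50-step Quality run of the same length, which is the point of computing the budget per request.
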
---
## Local development (Apple Silicon)
Tested on M5 Max (128 GB unified memory) with macOS 26 / Python 3.12.12.
```bash
git clone https://github.com/techfreakworm/wan-studio.git
cd wan-studio
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Run the UI (Wan 2.2 T2V default; model auto-downloads on first Generate)
WAN_STUDIO_PORT=7863 python3.12 app.py
# → http://localhost:7863
# Smoke build + tests
python3.12 -c "from app import build; build()"
pytest tests/ -q
```
On MPS the pipelines use `fp16` for transformers and `fp32` for the VAE. No quantization is applied: the M-series GPU doesn't have FP8 tensor cores, and quantized kernels are CUDA-specific. The default Wan 2.2 T2V-A14B MoE needs ~56 GB CPU RAM + ~10 GB GPU memory headroom, well within the M5 Max budget.
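
The ~56 GB figure is consistent with simple weight arithmetic. The sketch below assumes each of the two transformers has a nominal 14 billion parameters and counts decimal gigabytes; `weights_gb` is a hypothetical helper, and real checkpoints will differ slightly.

```python
# Bytes needed to store one parameter in each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}


def weights_gb(num_params: int, dtype: str, num_transformers: int = 1) -> float:
    """Approximate weight footprint in decimal GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] * num_transformers / 1e9


NOMINAL_14B = 14_000_000_000  # assumed parameter count per transformer

# Two fp16 transformers: 14e9 params * 2 bytes * 2 transformers.
moe_fp16 = weights_gb(NOMINAL_14B, "fp16", num_transformers=2)

# The same pair stored in FP8 takes half as much.
moe_fp8 = weights_gb(NOMINAL_14B, "fp8", num_transformers=2)
```

Here `moe_fp16` evaluates to 56.0 and `moe_fp8` to 28.0, which also matches the claim that FP8 halves the MoE memory footprint.
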
To force a smaller model for faster local iteration:
```bash
WAN_STUDIO_T2V_LOCAL_KEY=wan2.1_t2v_1.3b WAN_STUDIO_PORT=7863 python3.12 app.py
```
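
For orientation, the two environment variables used above could be read along these lines. This is a sketch only: `read_local_overrides` is a hypothetical function, and the fallback port of 7860 (Gradio's usual default) is an assumption about what `app.py` does when `WAN_STUDIO_PORT` is unset.

```python
import os
from typing import Mapping, Optional


def read_local_overrides(env: Optional[Mapping[str, str]] = None,
                         default_port: int = 7860) -> dict:
    """Collect local-development overrides from environment variables."""
    env = os.environ if env is None else env
    port = int(env.get("WAN_STUDIO_PORT", default_port))
    # An empty or missing key means "use the default model for the mode".
    t2v_key = env.get("WAN_STUDIO_T2V_LOCAL_KEY") or None
    return {"port": port, "t2v_key": t2v_key}
```

Passing a plain dict instead of `os.environ` keeps the function easy to unit-test.
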
---
## HF Space deployment
Reproducing the Space from scratch:
```bash
# 1. Duplicate the upstream Wan-AI repos into your account + build the LoRA mirror
python3.12 scripts/duplicate_upstream.py --dry-run # preview
python3.12 scripts/duplicate_upstream.py # execute (~5 min server-side copy + ~2 min LoRA upload)
# 2. Create the Space + set volume mounts + request ZeroGPU hardware
python3.12 scripts/create_space.py
# 3. Upload code
hf upload /wan-studio . --repo-type=space
```
Required HF account capabilities:
- **PRO subscription** (for ZeroGPU `large` 48 GB slice + 10 TB public model storage)
- ~300 GB of model storage will be used by the duplicated Wan-AI mirrors (free under PRO)
Phase 1 currently targets `SpaceHardware.ZERO_A10G`, which empirically resolves to the live Blackwell sm_120 pool (confirmed via a probe app; the "Nvidia H200" badge in the Spaces UI is stale marketing text).
---
## Project layout
```
wan-studio/
├── app.py                      # Gradio entry point + @spaces.GPU handlers
├── pipelines/
│   ├── registry.py             # ModelCard catalog (12 checkpoints across Wan 2.1/2.2)
│   ├── handle.py               # WanModelHandle base, mount stitching, LoRA loader
│   ├── t2v.py                  # T2VHandle (Wan 2.1 + Wan 2.2 MoE)
│   ├── i2v.py                  # I2VHandle
│   ├── shared.py               # Shared text encoder / VAE / image encoder
│   └── preset.py               # Fast vs Quality preset resolver
├── ui/
│   ├── header.py               # Brand mark + Generation/Preset chrome
│   ├── sidebar.py              # 10-mode left rail
│   └── tabs/                   # Per-mode input + output panels
├── utils/
│   ├── backend.py              # Backend.detect(): MPS vs CUDA vs ZeroGPU
│   └── budget.py               # ZeroGPU duration callable + size tier per mode
├── models_meta//               # Bundled small JSONs (configs, tokenizer)
│   └── wan2.2-t2v-a14b/
├── scripts/
│   ├── duplicate_upstream.py   # Mirror Wan-AI repos into the maintainer's account
│   └── create_space.py         # Programmatic Space configuration
├── tests/                      # 36 unit tests (backend, budget, handle, preset, registry)
├── docs/superpowers/specs/     # Design specs
├── docs/superpowers/plans/     # Implementation plans
└── NOTICE.md                   # Apache 2.0 attribution
```
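
The `models_meta//` directory in the layout exists for the metadata stitching described under Architecture: weights are symlinked from the read-only mount, and small JSON files are taken from the copies bundled in the repo. A minimal sketch of that idea follows. `stitch_model_dir` and `demo` are hypothetical names, and the real logic in `pipelines/handle.py` may differ.

```python
import shutil
import tempfile
from pathlib import Path


def _replace(dst: Path) -> None:
    """Make room for a new file at dst, removing any stale file or symlink."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    if dst.is_symlink() or dst.exists():
        dst.unlink()


def stitch_model_dir(mount_dir: Path, meta_dir: Path, out_dir: Path) -> None:
    """Build out_dir from symlinked mount files plus bundled metadata copies."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # 1. Symlink every file from the mount, so large weights are never copied.
    for src in sorted(mount_dir.rglob("*")):
        if src.is_file():
            dst = out_dir / src.relative_to(mount_dir)
            _replace(dst)
            dst.symlink_to(src)
    # 2. Bundled small files win over whatever the mount served.
    #    The symlink is removed first so the copy never writes through it.
    for src in sorted(meta_dir.rglob("*")):
        if src.is_file():
            dst = out_dir / src.relative_to(meta_dir)
            _replace(dst)
            shutil.copyfile(src, dst)


def demo() -> dict:
    """Exercise the stitching on throwaway directories."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        mount, meta, out = root / "mount", root / "meta", root / "out"
        mount.mkdir()
        meta.mkdir()
        (mount / "weights.safetensors").write_bytes(b"\x00" * 8)
        (mount / "config.json").write_text("{")            # simulated truncation
        (meta / "config.json").write_text('{"ok": true}')  # bundled good copy
        stitch_model_dir(mount, meta, out)
        return {
            "weights_is_symlink": (out / "weights.safetensors").is_symlink(),
            "config_is_symlink": (out / "config.json").is_symlink(),
            "config_text": (out / "config.json").read_text(),
        }
```

Calling `demo()` shows the intended outcome: the weights file stays a symlink into the mount, while `config.json` is a regular file holding the bundled contents.
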
---
## Acknowledgments
Wan Studio packages and exposes models trained by the [Alibaba Wan-Video team](https://github.com/Wan-Video) under the Apache 2.0 license. Lightning LoRAs are courtesy of [lightx2v](https://github.com/ModelTC/lightx2v) and the [Kijai/WanVideo_comfy](https://huggingface.co/Kijai/WanVideo_comfy) community mirror. Built on [diffusers](https://github.com/huggingface/diffusers), [Gradio](https://github.com/gradio-app/gradio), [HF Spaces](https://huggingface.co/spaces), and [HF Volumes](https://huggingface.co/docs/hub/spaces-config-reference). Full attribution in [NOTICE.md](NOTICE.md).
---
## License
[Apache License 2.0](LICENSE), the same as Wan-AI's upstream model releases.
---
## Maintainer
**Mayank Gupta**
🤗 [@techfreakworm on Hugging Face](https://huggingface.co/techfreakworm) · 💻 [@techfreakworm on GitHub](https://github.com/techfreakworm) · [mayankgupta.in](https://mayankgupta.in)
Phase 1 in progress. Issues and PRs welcome on [github.com/techfreakworm/wan-studio](https://github.com/techfreakworm/wan-studio).