---
title: Wan Studio
emoji: 🎬
colorFrom: indigo
colorTo: gray
sdk: gradio
sdk_version: "6.14.0"
app_file: app.py
pinned: false
short_description: "Every Wan mode, one clean UI."
python_version: "3.12.12"
startup_duration_timeout: "30m"
# Volume mounts for model weights are set programmatically by
# scripts/create_space.py (Volume(type="model", source=..., mount_path=...,
# read_only=True): read-only, served from the maintainer's HF object
# store, zero cost against the 150 GB ephemeral disk cap).
# ZeroGPU hardware is also requested programmatically: SpaceHardware
# .ZERO_A10G, empirically the live Blackwell ZeroGPU V2 pool as of 2026.
---

# Wan Studio

> Every Alibaba Wan video-diffusion mode in one clean Gradio UI (T2V, I2V, TI2V, FLF2V, V2V, VACE, S2V, Animate), backed by HF ZeroGPU.

Live on Hugging Face Spaces · GitHub

License: Apache 2.0 · Python 3.12 · Gradio 5.49 · ZeroGPU Blackwell · MPS-friendly

**🔗 Live demo:** https://huggingface.co/spaces/techfreakworm/wan-studio *(in active development; please don't run inference, it burns the maintainer's ZeroGPU quota)*

---

## What it is

Wan Studio is a single Gradio app that exposes every officially supported mode of the [Alibaba Wan](https://github.com/Wan-Video) video-diffusion family (Wan 2.1 + Wan 2.2) through a refined, Linear-inspired UI. Each mode (T2V, I2V, TI2V, FLF2V, V2V, VACE, S2V, Animate) lives in its own sidebar tab with mode-specific inputs and a shared two-preset model:

- **Fast (Lightning)**: 4 steps, CFG = 1.0, official Lightning LoRA loaded
- **Quality**: 30-50 steps, full sampler, no LoRA

Both Wan generations live in the same UI. The header dropdown picks `Wan 2.1` vs `Wan 2.2` and the active mode resolves to the appropriate checkpoint: single-transformer for Wan 2.1 modes, dual-transformer MoE (`transformer` + `transformer_2` paired Lightning LoRAs) for Wan 2.2 A14B modes.

---

## Roadmap

| Phase | Modes | Status |
|---|---|---|
| 1 | T2V, I2V | 🟡 in progress |
| 2 | FLF2V, V2V, TI2V-5B | planned |
| 3 | VACE (depth, pose, sketch, inpaint, outpaint, reference, extension) | planned |
| 4 | Animate (character animation + replacement) | planned |
| 5 | S2V (speech-to-video, audio-driven) | planned |
| 6 | Cross-mode chaining + Gallery + Settings polish | planned |

---

## Architecture

```
                          ┌──────────────────────────┐
   user → HF Space (UI)   │ app.py (Gradio 5.49)     │
                          │ ─ Linear-themed chrome   │
                          │ ─ JS-only sidebar nav    │
                          │ ─ @spaces.GPU handlers   │
                          └────────────┬─────────────┘
                                       │
        ┌──────────────────────────────┴─────────────────────────────┐
        │                                                            │
┌───────▼──────────┐                                          ┌──────▼────────────┐
│ pipelines/       │                                          │ ui/               │
│ ─ registry.py    │  ModelCard catalog (12 checkpoints)      │ ─ header.py       │
│ ─ handle.py      │  WanModelHandle base + LRU cache         │ ─ sidebar.py      │
│ ─ t2v.py         │  T2VHandle (Wan 2.1 + Wan 2.2 MoE)       │ ─ tabs/*.py       │
│ ─ i2v.py         │  I2VHandle                               │ ─ build_all_*.py  │
│ ─ shared.py      │  UMT5 / VAE / CLIP loaded once           └───────────────────┘
│ ─ preset.py      │  Fast vs Quality kwargs resolver
└───────┬──────────┘
        │
        ▼
┌───────────────────┐     ┌──────────────────────────────────────────┐
│ Volume mounts     │     │ HF mirrors (techfreakworm/wan2.*-*)      │
│ /models//         │ ←── │ Apache-2.0 duplicates of Wan-AI repos,   │
│ (read-only,       │     │ pinned for resilience against upstream.  │
│ served from HF)   │     │ Lightning LoRAs in wan-lightning-loras.  │
└───────────────────┘     └──────────────────────────────────────────┘
```

Design principles:

- **Single Space**, no multi-Space federation. Everything in one container.
- **Volume-mounted weights** via `huggingface_hub.Volume(type="model", read_only=True)`: backed by the maintainer's HF object store, zero ephemeral disk cost.
- **Bundled metadata stitching** (`models_meta//`): works around HF Volume small-file truncation by shipping correct JSONs in the Space repo and symlinking weights from the mount at startup.
- **One handle per (mode, generation)**, lazy-loaded on first click; LRU eviction planned for Phase 2+ when more modes go live.
- **Shared encoders** (`UMT5-XXL`, `AutoencoderKLWan`, `CLIP-ViT-H/14`) loaded once via `pipelines/shared.py` and injected into every pipeline, which saves ~25 GB of duplicated weights.
- **MPS-friendly**: identical codebase runs locally on Apple Silicon (`fp16` transformer / `fp32` VAE / no quant) and on ZeroGPU Blackwell (`bf16` / optional torchao FP8 / model CPU offload for MoE).

---

## Tech stack

| Layer | Choice | Why |
|---|---|---|
| Web UI | **Gradio 5.49** | ZeroGPU's only first-class SDK; rich video components |
| Diffusion runtime | **diffusers** (latest) | `WanPipeline` + `WanImageToVideoPipeline` first-party support |
| Acceleration | `@spaces.GPU(duration=…, size="large")` | ZeroGPU on-demand H100/Blackwell |
| Model storage | HF Hub model repos + `space_volumes` mounts | Read-only, free, resilient |
| Quantization (optional) | `torchao` FP8 on Blackwell | Halves MoE memory footprint |
| MoE management | `enable_model_cpu_offload()` (diffusers pipeline method, backed by `accelerate`) | Two 14B transformers fit on a 48 GB GPU |

---

## Local development (Apple Silicon)

Tested on M5 Max (128 GB unified memory) with macOS 26 / Python 3.12.12.

```bash
git clone https://github.com/techfreakworm/wan-studio.git
cd wan-studio
python3.12 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Run the UI (Wan 2.2 T2V default; model auto-downloads on first Generate)
WAN_STUDIO_PORT=7863 python3.12 app.py
# → http://localhost:7863

# Smoke build + tests
python3.12 -c "from app import build; build()"
pytest tests/ -q
```

On MPS the pipelines use `fp16` for transformers and `fp32` for the VAE. No quantization is applied: the M-series GPU doesn't have FP8 tensor cores and quantized kernels are CUDA-specific. The default Wan 2.2 T2V-A14B MoE needs ~56 GB CPU RAM + ~10 GB GPU memory headroom, well within the M5 Max budget.
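The per-backend precision choices described above can be summarized as a small lookup. This is a minimal sketch, not the project's actual code: `DtypePlan` and `pick_dtypes` are hypothetical names, dtypes are kept as plain strings so the snippet runs without `torch` installed, and the VAE dtype on CUDA/ZeroGPU is an assumption, since this README only states `bf16` for that backend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DtypePlan:
    """Precision plan for one backend (hypothetical helper, not from the repo)."""
    transformer: str    # dtype name for the diffusion transformer(s)
    vae: str            # dtype name for the VAE
    quantize_fp8: bool  # whether optional FP8 quantization may be applied


def pick_dtypes(backend: str, want_fp8: bool = False) -> DtypePlan:
    """Map a backend name to the precision settings described in this README."""
    if backend == "mps":
        # Apple Silicon: fp16 transformer, fp32 VAE, never quantized.
        return DtypePlan(transformer="float16", vae="float32", quantize_fp8=False)
    if backend in ("cuda", "zerogpu"):
        # CUDA / ZeroGPU: bf16, with FP8 as an opt-in.
        # Assumption: the VAE also runs in bf16 here; the README does not say.
        return DtypePlan(transformer="bfloat16", vae="bfloat16", quantize_fp8=want_fp8)
    raise ValueError(f"unknown backend: {backend!r}")


if __name__ == "__main__":
    print(pick_dtypes("mps"))
    print(pick_dtypes("zerogpu", want_fp8=True))
```

With `torch` available, a dtype name from the plan could be resolved with `getattr(torch, plan.transformer)`.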
To force a smaller model for faster local iteration:

```bash
WAN_STUDIO_T2V_LOCAL_KEY=wan2.1_t2v_1.3b WAN_STUDIO_PORT=7863 python3.12 app.py
```

---

## HF Space deployment

Reproducing the Space from scratch:

```bash
# 1. Duplicate the upstream Wan-AI repos into your account + build the LoRA mirror
python3.12 scripts/duplicate_upstream.py --dry-run   # preview
python3.12 scripts/duplicate_upstream.py             # execute (~5 min server-side copy + ~2 min LoRA upload)

# 2. Create the Space + set volume mounts + request ZeroGPU hardware
python3.12 scripts/create_space.py

# 3. Upload code
hf upload /wan-studio . --repo-type=space
```

Required HF account capabilities:

- **PRO subscription** (for ZeroGPU `large` 48 GB slice + 10 TB public model storage)
- ~300 GB of model storage will be used by the duplicated Wan-AI mirrors (free under PRO)

Phase 1 currently targets `SpaceHardware.ZERO_A10G`, which empirically resolves to the live Blackwell sm_120 pool (confirmed via a probe app; the "Nvidia H200" badge in the Spaces UI is stale marketing text).
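A probe like the one mentioned above could be as small as reading the CUDA compute capability and mapping it to an architecture name. The sketch below is an assumption about how such a probe might look, not the maintainer's actual probe app: `arch_from_capability` and `probe` are hypothetical names, and `torch` is imported lazily so the mapping can be exercised on machines without it.

```python
# Compute capability (major, minor) -> NVIDIA architecture family.
_ARCH_BY_CAPABILITY = {
    (8, 0): "Ampere (A100)",
    (8, 6): "Ampere (sm_86, e.g. A10G)",
    (8, 9): "Ada Lovelace",
    (9, 0): "Hopper (H100/H200)",
    (10, 0): "Blackwell (data center)",
    (12, 0): "Blackwell (sm_120)",
}


def arch_from_capability(major: int, minor: int) -> str:
    """Return a readable architecture label for a compute capability."""
    label = _ARCH_BY_CAPABILITY.get((major, minor))
    return label if label is not None else f"unknown (sm_{major}{minor})"


def probe() -> str:
    """Describe the current CUDA device, or explain why none is visible."""
    try:
        import torch  # optional dependency; only needed for a live probe
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "no CUDA device visible"
    major, minor = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    return f"{name}: {arch_from_capability(major, minor)}"


if __name__ == "__main__":
    print(probe())
```

On ZeroGPU the device is attached only while a `@spaces.GPU`-decorated function is running, so `probe()` would need to be called from inside such a handler to see the real hardware.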
---

## Project layout

```
wan-studio/
├── app.py                      # Gradio entry point + @spaces.GPU handlers
├── pipelines/
│   ├── registry.py             # ModelCard catalog (12 checkpoints across Wan 2.1/2.2)
│   ├── handle.py               # WanModelHandle base, mount stitching, LoRA loader
│   ├── t2v.py                  # T2VHandle (Wan 2.1 + Wan 2.2 MoE)
│   ├── i2v.py                  # I2VHandle
│   ├── shared.py               # Shared text encoder / VAE / image encoder
│   └── preset.py               # Fast vs Quality preset resolver
├── ui/
│   ├── header.py               # Brand mark + Generation/Preset chrome
│   ├── sidebar.py              # 10-mode left rail
│   └── tabs/                   # Per-mode input + output panels
├── utils/
│   ├── backend.py              # Backend.detect(): MPS vs CUDA vs ZeroGPU
│   └── budget.py               # ZeroGPU duration callable + size tier per mode
├── models_meta//               # Bundled small JSONs (configs, tokenizer)
│   └── wan2.2-t2v-a14b/
├── scripts/
│   ├── duplicate_upstream.py   # Mirror Wan-AI repos into the maintainer's account
│   └── create_space.py         # Programmatic Space configuration
├── tests/                      # 36 unit tests (backend, budget, handle, preset, registry)
├── docs/superpowers/specs/     # Design specs
├── docs/superpowers/plans/     # Implementation plans
└── NOTICE.md                   # Apache 2.0 attribution
```

---

## Acknowledgments

Wan Studio packages and exposes models trained by the [Alibaba Wan-Video team](https://github.com/Wan-Video) under the Apache 2.0 license. Lightning LoRAs are courtesy of [lightx2v](https://github.com/ModelTC/lightx2v) and the [Kijai/WanVideo_comfy](https://huggingface.co/Kijai/WanVideo_comfy) community mirror.

Built on [diffusers](https://github.com/huggingface/diffusers), [Gradio](https://github.com/gradio-app/gradio), [HF Spaces](https://huggingface.co/spaces), and [HF Volumes](https://huggingface.co/docs/hub/spaces-config-reference). Full attribution in [NOTICE.md](NOTICE.md).
---

## License

[Apache License 2.0](LICENSE), same as Wan-AI's upstream model releases.

---

## Maintainer

**Mayank Gupta**

🤗 [@techfreakworm on Hugging Face](https://huggingface.co/techfreakworm) · 💻 [@techfreakworm on GitHub](https://github.com/techfreakworm) · 🌐 [mayankgupta.in](https://mayankgupta.in)

Phase 1 in progress. Issues and PRs welcome on [github.com/techfreakworm/wan-studio](https://github.com/techfreakworm/wan-studio).