---
license: apache-2.0
base_model: Wan-AI/Wan2.2-T2V-A14B-Diffusers
pipeline_tag: text-to-video
library_name: mlx-gen
tags:
- mlx
- mlx-gen
- mflux
- apple-silicon
- 8-bit
- mixed-q8-bf16
- wan
- wan2.2
- video-generation
- text-to-video
- wan-a14b
---

# wan2.2-t2v-a14b-diffusers-8bit

This repository contains mixed q8/BF16 MLX-Gen saved weights for [`Wan-AI/Wan2.2-T2V-A14B-Diffusers`](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B-Diffusers). It is designed for local Apple Silicon inference with [`mlx-gen`](https://github.com/lpalbou/mlx-gen).

It uses the mflux/MLX saved-weight layout with MLX quantization tensors. It is not a Diffusers or Transformers `from_pretrained()` checkpoint.

## Source Model

Original model: [`Wan-AI/Wan2.2-T2V-A14B-Diffusers`](https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B-Diffusers).

This quantized derivative follows the Apache 2.0 license of the source model.

## Quantization

This is a mixed q8/BF16 checkpoint:

- q8 for quantizable Wan transformer block attention and feed-forward linears.
- BF16 for the Wan VAE.
- BF16 for Wan transformer conditioning/output projection linears, the UMT5 text encoder, scheduler metadata, tokenizer files, norms, convolutions, and other non-quantizable parameters.

This mixed policy is used because fully quantizing sensitive Wan A14B paths produced invalid or low-quality video in local validation.

## Validation

Measured on 2026-06-04 with `mlx-gen 0.18.9` on Apple Silicon.

The upstream Diffusers source snapshot measured about 118 GiB in the local Hugging Face cache before preparing these packages. The table below reports prepared-package generation from model init through MP4 save and post-save video-health validation.

Validation profile: `384x224`, 33 frames, 12 denoising steps, guidance `4`, guidance-2 `3`, 8 fps, seed `4242`, `--low-ram`.


| Package | Disk | Full-Process Physical Peak | Max RSS | MLX Peak | Total Time | Video Health |
|---|---:|---:|---:|---:|---:|---|
| BF16 package | 64.3 GiB | 33.0 GiB | 31.8 GiB | 27.7 GiB | 152.7 s | 33/33 frames, 384x224, 8 fps, temporal delta 1.3 |
| This mixed q8/BF16 package | 39.7 GiB | 20.7 GiB | 19.5 GiB | 15.5 GiB | 154.8 s | 33/33 frames, 384x224, 8 fps, temporal delta 1.4 |

Compared with the BF16 prepared package at the same validation profile, this mixed q8/BF16 package reduces disk usage by about 38% and full-process physical peak memory by about 37%. Total time was about 1% slower in this run.

Physical peak is Darwin `ri_phys_footprint` sampled for the full process. The validation is intentionally small and repeatable; it is not a claim that every full-size `1280x720`, 81-frame, 40-step job has the same memory or timing profile.

## Usage

```bash
python -m pip install -U mlx-gen

mlxgen download --model AbstractFramework/wan2.2-t2v-a14b-diffusers-8bit

mlxgen generate \
  --model AbstractFramework/wan2.2-t2v-a14b-diffusers-8bit \
  --task text-to-video \
  --prompt "A cinematic scene of a scientist working on agentic AI through the night, monitors glowing, papers shifting in a slow dolly shot." \
  --width 384 \
  --height 224 \
  --frames 33 \
  --steps 12 \
  --guidance 4 \
  --guidance-2 3 \
  --fps 8 \
  --seed 4242 \
  --low-ram \
  --metadata \
  --output video.mp4
```

## Compatibility

Requires `mlx-gen >= 0.18.9`.

Generated with `mlx-gen 0.18.9`.

Use the `mlxgen` command and Python import path for new MLX-Gen projects.

## Attribution

MLX-Gen is based on [mflux](https://github.com/filipstrand/mflux) by Filip Strand and the original mflux contributors.

Quantized and contributed by [@lpalbou](https://huggingface.co/lpalbou).
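
## Checking the Validation Figures

The relative savings quoted in the Validation section can be re-derived from the table values. The snippet below is a minimal sketch of that arithmetic only. The dictionary keys and the `percent_change` helper are illustrative names chosen here; they are not part of `mlx-gen`.

```python
# Figures copied from the validation table (GiB and seconds).
bf16 = {"disk_gib": 64.3, "phys_peak_gib": 33.0, "total_s": 152.7}
mixed = {"disk_gib": 39.7, "phys_peak_gib": 20.7, "total_s": 154.8}


def percent_change(baseline, value):
    """Signed change of value relative to baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Negative values are reductions relative to the BF16 package.
changes = {key: percent_change(bf16[key], mixed[key]) for key in bf16}

for key, pct in changes.items():
    print(f"{key}: {pct:+.1f}%")
```

Rounded to whole percentages, the results match the card's statements: about 38% less disk, about 37% lower full-process physical peak, and about 1% more total time for this single run.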