---
license: cc-by-nc-4.0
tags:
- mlx
- audio-classification
- language-identification
- wav2vec2
- apple-silicon
language:
- multilingual
base_model: facebook/mms-lid-256
pipeline_tag: audio-classification
---

# MMS-LID-256 — MLX

Meta's MMS-LID-256 language identification model, running on MLX (Apple Silicon, Metal GPU). It identifies **256 languages** from raw audio.

It loads weights directly from [facebook/mms-lid-256](https://huggingface.co/facebook/mms-lid-256); no conversion step is needed.

## Performance (M1, 10 s audio)

| Framework | Latency | Russian | English |
|---|---|---|---|
| **Python MLX** (with `mx.compile()`) | **267ms** | 98.8% | 99.8% |
| **Swift MLX** (with `compile()`) | **268ms** | 97.3% | 99.7% |
| CoreML GPU | 250ms | 89.1% | 99.8% |

## Usage

```bash
# Clone the benchmark repo
git clone https://github.com/beshkenadze/lid-bench
cd lid-bench/mlx

# Set up a virtual environment, install dependencies, and activate it
uv venv && uv pip install mlx numpy soundfile safetensors huggingface_hub
source .venv/bin/activate

# Download weights (from the original facebook repo)
huggingface-cli download facebook/mms-lid-256 --include "model.safetensors" "config.json"

# Run
python mms_lid_256.py path/to/audio.wav --benchmark
```

Full implementation: [github.com/beshkenadze/lid-bench](https://github.com/beshkenadze/lid-bench)

## Model Details

- **Architecture**: `Wav2Vec2ForSequenceClassification` (48 transformer layers, 16 attention heads)
- **Input**: Raw 16 kHz mono waveform, normalized to zero mean and unit variance
- **Output**: Probability distribution over 256 languages
- **Parameters**: ~1B
- **Weight format**: Original Hugging Face safetensors
- **Weight loading**: Conv1d weight axes are swapped and `weight_norm` is precomputed at load time

## Notes

- Weights are loaded directly from the local Hugging Face cache for `facebook/mms-lid-256`; there is no separate conversion step.
- This repo contains only the model card and a pointer to the MLX implementation.
- **The Apple Neural Engine (ANE) causes a 13x slowdown.** For the CoreML path, restrict compute units to the Metal GPU (`.cpuAndGPU`).
- Under sustained inference on M1, latency degrades to ~400 ms because of thermal throttling (the model runs 48 transformer layers per pass). M2 and later chips should fare better.
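The input and output bullets under Model Details can be made concrete with a small NumPy sketch of pre- and post-processing around the model's forward pass. It assumes the audio is already sampled at 16 kHz (no resampling is done), and that `config.json` carries an `id2label` mapping from string class indices to language codes, as Hugging Face sequence-classification configs usually do. The helper names `normalize_waveform` and `top_languages` are hypothetical and are not taken from the lid-bench code; the `1e-7` epsilon is an assumption modeled on the Hugging Face Wav2Vec2 feature extractor.

```python
import numpy as np


def normalize_waveform(wav: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Zero-mean, unit-variance normalization of a waveform.

    Assumes `wav` is already sampled at 16 kHz; no resampling is done here.
    """
    wav = np.asarray(wav, dtype=np.float32)
    if wav.ndim == 2:
        # soundfile returns (frames, channels); average channels down to mono
        wav = wav.mean(axis=1)
    return (wav - wav.mean()) / np.sqrt(wav.var() + eps)


def top_languages(logits: np.ndarray, id2label: dict, k: int = 3) -> list:
    """Softmax over the class logits, then return the k most likely labels."""
    logits = np.asarray(logits, dtype=np.float64).reshape(-1)
    # Subtract the max before exponentiating for numerical stability
    exp = np.exp(logits - logits.max())
    probs = exp / exp.sum()
    # Indices sorted by descending probability, truncated to k
    order = np.argsort(probs)[::-1][:k]
    # id2label keys are strings in a JSON-loaded config.json (assumption)
    return [(id2label[str(i)], float(probs[i])) for i in order]
```

In use, the normalized waveform would be fed to the model, and `top_languages(logits, config["id2label"])` would turn the 256 raw logits into `(language_code, probability)` pairs.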
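The weight-loading bullet (Conv1d axis swap plus `weight_norm` precomputation) can be sketched in NumPy as follows. PyTorch stores `Conv1d` weights as `(out_channels, in_channels / groups, kernel_size)`, while `mlx.nn.Conv1d` expects `(out_channels, kernel_size, in_channels)`, hence the transpose. For weight normalization, the sketch assumes the checkpoint stores the positional convolution as a magnitude/direction pair (`g`, `v`) normalized over every axis except axis 2, matching `weight_norm(..., dim=2)` in the Hugging Face Wav2Vec2 code. The exact safetensors key names (older `weight_g`/`weight_v` versus newer `parametrizations.weight.original0/1`) are not checked here, and the function names are hypothetical rather than lid-bench's.

```python
import numpy as np


def fold_weight_norm(weight_g: np.ndarray, weight_v: np.ndarray, dim: int = 2) -> np.ndarray:
    """Precompute w = g * v / ||v||, with the norm taken over every axis except `dim`.

    Mirrors torch weight_norm semantics; dim=2 is assumed for the Wav2Vec2
    positional convolution (the kernel axis of an (out, in/groups, k) tensor).
    """
    v = weight_v.astype(np.float32)
    axes = tuple(a for a in range(v.ndim) if a != dim)
    # One L2 norm per index along `dim`, kept broadcastable against v
    norm = np.sqrt(np.sum(v * v, axis=axes, keepdims=True))
    return weight_g.astype(np.float32) * v / norm


def torch_conv1d_to_mlx(weight: np.ndarray) -> np.ndarray:
    """Swap PyTorch (out, in, kernel) layout to MLX (out, kernel, in)."""
    return np.transpose(weight, (0, 2, 1))
```

Folding the normalization once at load time leaves a plain weight tensor, so the MLX forward pass never has to recompute `g * v / ||v||`; the folded result would then go through `torch_conv1d_to_mlx` like every other Conv1d weight.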