---
license: cc-by-nc-4.0
tags:
  - mlx
  - audio-classification
  - language-identification
  - wav2vec2
  - apple-silicon
language:
  - multilingual
base_model: facebook/mms-lid-256
pipeline_tag: audio-classification
---

# MMS-LID-256 — MLX

Meta's MMS-LID-256 language identification model running on MLX (Apple Silicon Metal GPU).

Identifies **256 languages** from raw audio. Uses the weights from [facebook/mms-lid-256](https://huggingface.co/facebook/mms-lid-256) directly, with no offline conversion step.

## Performance (M1, 10s audio)

| Framework | Latency | Russian | English |
|---|---|---|---|
| **Python MLX** (with `mx.compile()`) | **267ms** | 98.8% | 99.8% |
| **Swift MLX** (with `compile()`) | **268ms** | 97.3% | 99.7% |
| CoreML GPU | 250ms | 89.1% | 99.8% |
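
Latency figures of this kind are usually taken after a few warm-up runs, then summarized with a median. The sketch below shows one way to do that with the standard library only; it is not the benchmark script from the repo. `measure_latency_ms` and the `fn` callable are hypothetical names introduced for illustration. With MLX, `fn` has to force evaluation itself (for example by calling `mx.eval` on the output), because MLX evaluates lazily.

```python
import statistics
import time
from typing import Callable, List


def measure_latency_ms(fn: Callable[[], object], warmup: int = 3, runs: int = 10) -> List[float]:
    """Return per-run wall-clock latencies in milliseconds.

    `fn` is a hypothetical zero-argument callable that performs one full
    forward pass and forces the result to be computed before returning.
    """
    # Warm-up runs absorb one-time costs such as compilation and cache fills.
    for _ in range(warmup):
        fn()

    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with a real model call.
    samples = measure_latency_ms(lambda: sum(range(100_000)))
    print(f"median {statistics.median(samples):.2f} ms, max {max(samples):.2f} ms")
```

Raising `runs` and comparing early samples against late ones is a simple way to spot thermal throttling during sustained inference.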

## Usage

```bash
# Clone the benchmark repo
git clone https://github.com/beshkenadze/lid-bench
cd lid-bench/mlx

# Setup (activate the venv so `python` and `huggingface-cli` below use it)
uv venv && source .venv/bin/activate
uv pip install mlx numpy soundfile safetensors huggingface_hub

# Download weights (from original facebook repo)
huggingface-cli download facebook/mms-lid-256 --include "model.safetensors" "config.json"

# Run
python mms_lid_256.py path/to/audio.wav --benchmark
```

Full implementation: [github.com/beshkenadze/lid-bench](https://github.com/beshkenadze/lid-bench)

## Model Details

- **Architecture**: Wav2Vec2ForSequenceClassification (48 transformer layers, 16 heads)
- **Input**: Raw 16kHz waveform (zero-mean unit-variance normalized)
- **Output**: 256 language probabilities
- **Parameters**: ~1B (the 48-layer MMS-1B backbone; 315M would correspond to the smaller 24-layer variant)
- **Weight format**: Original HF safetensors (no conversion needed)
- **Weight loading**: Conv1d axis swap + weight_norm precomputation done at load time
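
The input normalization and the two load-time weight transforms listed above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the code from the repo. The function names are hypothetical. The axis order assumes PyTorch's Conv1d layout `(out, in, kernel)` going to MLX's `(out, kernel, in)`. The weight-norm formula assumes the norm is taken per kernel position (`dim=2`), as the Hugging Face Wav2Vec2 positional convolution does.

```python
import numpy as np


def normalize_waveform(x: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Zero-mean, unit-variance normalization of a raw waveform."""
    x = x.astype(np.float32)
    return (x - x.mean()) / np.sqrt(x.var() + eps)


def swap_conv1d_axes(w: np.ndarray) -> np.ndarray:
    """Move a Conv1d weight from (out, in, kernel) to (out, kernel, in)."""
    return np.transpose(w, (0, 2, 1))


def fuse_weight_norm(g: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Precompute w = g * v / ||v|| for a weight-normalized Conv1d.

    Assumption: g has shape (1, 1, kernel) and the norm of v is taken over
    the (out, in) axes separately for each kernel position.
    """
    norm = np.sqrt(np.sum(v * v, axis=(0, 1), keepdims=True))
    return g * v / norm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.normal(loc=0.3, scale=0.1, size=16000)  # 1 s of fake audio
    print(normalize_waveform(audio).std())

    v = rng.normal(size=(8, 4, 3))  # (out, in, kernel), toy sizes
    g = np.ones((1, 1, 3))
    w = swap_conv1d_axes(fuse_weight_norm(g, v))
    print(w.shape)
```

Fusing `g` and `v` once at load time means the forward pass sees an ordinary convolution weight, so no weight-norm logic is needed at inference. Which checkpoint keys hold `g` and `v` depends on how the checkpoint was saved and is not covered here.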

## Notes

- Weights are loaded directly from the Hugging Face cache for `facebook/mms-lid-256`, with no separate conversion step
- This repo contains only the model card and MLX implementation reference
- **The ANE causes a 13x slowdown** under Core ML; restrict compute units to `.cpuAndGPU` so inference stays on the Metal GPU
- Sustained inference on M1 degrades to ~400ms due to thermal throttling (48 transformer layers). M2+ should be better.