---
license: mit
language:
- en
tags:
- gguf
- ggml
- audio
- speech-recognition
- whisper
- distil-whisper
- automatic-speech-recognition
base_model: distil-whisper/distil-large-v3
pipeline_tag: automatic-speech-recognition
---

# Distil Whisper Large v3 (ggml)

ggml conversion of [distil-whisper/distil-large-v3](https://huggingface.co/distil-whisper/distil-large-v3) for use with [CrispASR](https://github.com/CrispStrobe/CrispASR) and [whisper.cpp](https://github.com/ggml-org/whisper.cpp).

## Model Details

- **Architecture**: Whisper encoder (32 layers, 1280-dim) + distilled decoder (2 layers only)
- **Parameters**: 756M (roughly half the size of whisper-large-v3, which has 1550M)
- **Speed**: 6.3x faster than whisper-large-v3, with WER within 1% of whisper-large-v3
- **Language**: English
- **License**: MIT

## Usage

```bash
# Uses the standard whisper backend (auto-detected)
crispasr -m distil-large-v3-q5_0.bin -f audio.wav
```
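
The same file should also load in whisper.cpp. The sketch below assumes a recent whisper.cpp build whose CLI binary is named `whisper-cli` and that `ffmpeg` is installed; `recording.mp3` is a hypothetical input file. whisper.cpp has traditionally expected 16 kHz, 16-bit WAV input, so the audio is converted first. Each step is skipped if a required tool or file is missing.

```bash
# Hypothetical file names; replace with your own.
MODEL="distil-large-v3-q5_0.bin"
INPUT="recording.mp3"
WAV="${INPUT%.*}.wav"   # strip the extension and append .wav

# Convert to 16 kHz mono 16-bit PCM WAV (skipped if ffmpeg or the input is missing).
if command -v ffmpeg >/dev/null 2>&1 && [ -f "$INPUT" ]; then
  ffmpeg -y -i "$INPUT" -ar 16000 -ac 1 -c:a pcm_s16le "$WAV"
fi

# Transcribe with the whisper.cpp CLI (skipped if the binary, model, or WAV is missing).
if command -v whisper-cli >/dev/null 2>&1 && [ -f "$MODEL" ] && [ -f "$WAV" ]; then
  whisper-cli -m "$MODEL" -f "$WAV"
fi
```

Older whisper.cpp builds name the binary `main` instead of `whisper-cli`; the `-m` and `-f` flags are the same.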

## Files

| File | Size | JFK sample transcription |
|------|------|-----------|
| distil-large-v3.bin | 1.5 GB | perfect |
| distil-large-v3-q5_0.bin | 513 MB | perfect |

## Why Distil Whisper?

- **6.3x faster** than whisper-large-v3 (2 decoder layers vs 32)
- **Within 1% WER** of whisper-large-v3 on standard benchmarks
- **Same encoder** as whisper-large-v3 (32 layers, 1280-dim)
- **Drop-in replacement**: same ggml format, same CLI flags