---
license: apache-2.0
language: en
tags:
  - automatic-speech-recognition
  - openvino
  - whisper
  - int8
  - quantized
base_model: openai/whisper-small
library_name: openvino
pipeline_tag: automatic-speech-recognition
---

# ov-whisper_small-int8-2026.0.0

[openai/whisper-small](https://huggingface.co/openai/whisper-small) exported to OpenVINO IR with **INT8 asymmetric weight compression** (group size 128).

The model layout targets `openvino_genai.WhisperPipeline` and includes a stateful decoder (`-with-past`), a tokenizer, and a detokenizer.

## Quantization details

| Parameter | Value |
|-----------|-------|
| Source model | `openai/whisper-small` |
| Weight format | INT8 asymmetric (per-channel) |
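| Group size | 128 (value passed at export; INT8 weight compression in NNCF is per-channel, so it likely has no effect) |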
| Encoder layers compressed | 74 / 74 (100%) |
| Decoder layers compressed | 122 / 122 (100%) |
| Task | `automatic-speech-recognition-with-past` |
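To make "INT8 asymmetric (per-channel)" concrete, the sketch below quantizes each output channel (one row of a weight matrix) with its own scale and zero point. It is a plain-Python illustration of the general technique, not NNCF's actual implementation; the function names and the example weights are made up for this sketch.

```python
def quantize_channel_asym(weights, bits=8):
    """Quantize one output channel to unsigned integer codes.

    Returns (codes, scale, zero_point). Illustrative only.
    """
    levels = 2**bits - 1                       # 255 for 8 bits
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / levels or 1.0          # fall back to 1.0 for an all-zero channel
    zero_point = round(-lo / scale)            # integer code that maps back to 0.0
    codes = [min(levels, max(0, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point


def dequantize_channel(codes, scale, zero_point):
    """Map integer codes back to approximate float weights."""
    return [(c - zero_point) * scale for c in codes]


def quantize_per_channel(matrix, bits=8):
    """Apply asymmetric quantization independently to every row (channel)."""
    return [quantize_channel_asym(row, bits) for row in matrix]


# Hypothetical 2x4 weight matrix: the two rows have very different ranges,
# which is why each row gets its own scale and zero point.
example = [
    [-0.50, 0.10, 0.25, 1.00],
    [0.02, 0.04, -0.01, 0.03],
]
for codes, scale, zero_point in quantize_per_channel(example):
    print(codes, f"scale={scale:.6f}", f"zero_point={zero_point}")
```

Because the scale is derived per row, a channel with small weights keeps fine resolution even when another channel spans a much larger range.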

## Toolchain

| Package | Version |
|---------|---------|
| Python | 3.11.9 |
| openvino | 2026.0.0 |
| openvino-genai | 2026.0.0.0 |
| openvino-tokenizers | 2026.0.0.0 |
| optimum-intel | 1.27.0 |
| optimum | 2.1.0 |
| nncf | 3.0.0 |
| transformers | 4.57.6 |
| torch | 2.11.0 |

## Usage

```python
import librosa
import numpy as np
import openvino_genai as ov_genai

# Path to the local directory that contains this model's files
pipe = ov_genai.WhisperPipeline("ov-whisper_small-int8-2026.0.0", "CPU")

# Load audio as 16 kHz float32 mono
samples, _ = librosa.load("audio.wav", sr=16000, mono=True)
samples = np.asarray(samples, dtype=np.float32)

result = pipe.generate(samples)
print(result.text)
```
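`WhisperPipeline.generate` also accepts generation options such as `language`, `task`, and `return_timestamps`. The following is a minimal sketch of requesting segment timestamps; the helper names `format_chunks` and `transcribe_with_timestamps` are hypothetical, and the language token `<|en|>` is an assumption about the input audio.

```python
def format_chunks(chunks):
    """Render timestamped chunks as '[start -> end] text' lines."""
    return [f"[{c.start_ts:.2f} -> {c.end_ts:.2f}] {c.text.strip()}" for c in chunks]


def transcribe_with_timestamps(model_dir, samples, device="CPU"):
    """Transcribe 16 kHz float32 mono samples and return formatted segments."""
    # Imported lazily so format_chunks can be used without OpenVINO installed.
    import openvino_genai as ov_genai

    pipe = ov_genai.WhisperPipeline(model_dir, device)
    result = pipe.generate(
        samples,
        language="<|en|>",       # assumed: English audio, skips language detection
        task="transcribe",
        return_timestamps=True,  # populates result.chunks
    )
    return format_chunks(result.chunks)
```

Calling `transcribe_with_timestamps("ov-whisper_small-int8-2026.0.0", samples)` with the `samples` array from the usage example should return one line per segment.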

Supported devices: `CPU`, `GPU`, `NPU` (tested on an Intel Core Ultra 7 255H with Arc 140T graphics and the Intel AI Boost NPU).
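If the target machine may lack some of these devices, one option is to query OpenVINO for what is available and fall back accordingly. This is a sketch under assumptions: `pick_device` and `load_pipeline` are hypothetical helpers, and the preference order is an arbitrary choice for illustration, not a recommendation from this model's author.

```python
def pick_device(available, preferred=("NPU", "GPU", "CPU")):
    """Return the first preferred device that appears in `available`."""
    for want in preferred:
        for name in available:
            # OpenVINO can report indexed names such as "GPU.0" and "GPU.1".
            if name == want or name.startswith(want + "."):
                return want
    raise RuntimeError(f"none of {list(preferred)} found in {list(available)}")


def load_pipeline(model_dir):
    """Create a WhisperPipeline on the best available device."""
    # Imported lazily so pick_device can be used without OpenVINO installed.
    import openvino as ov
    import openvino_genai as ov_genai

    device = pick_device(ov.Core().available_devices)
    return ov_genai.WhisperPipeline(model_dir, device), device
```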

## Reproduce the export

```bash
pip install -r requirements.txt
python export_whisper_int8_ov.py \
    --model openai/whisper-small \
    --output ov-whisper_small-int8-2026.0.0 \
    --cache-dir ./cache_dir
```

Or equivalently with `optimum-cli` directly:

```bash
optimum-cli export openvino \
    -m openai/whisper-small \
    --task automatic-speech-recognition-with-past \
    --weight-format int8 \
    --group-size 128 \
    ov-whisper_small-int8-2026.0.0
```

## Validate

```bash
python validate_whisper_genai.py ov-whisper_small-int8-2026.0.0 --device CPU
```

## Files

- `openvino_encoder_model.bin/.xml` -- Whisper encoder (INT8)
- `openvino_decoder_model.bin/.xml` -- Whisper decoder with past/beam_idx (INT8)
- `openvino_tokenizer.bin/.xml` -- Tokenizer
- `openvino_detokenizer.bin/.xml` -- Detokenizer
- `config.json`, `generation_config.json` -- Model configuration
- `tokenizer.json`, `vocab.json`, `merges.txt` -- Tokenizer data
- `export_whisper_int8_ov.py` -- Export script used to produce this model
- `validate_whisper_genai.py` -- Smoke-test script
- `requirements.txt` -- Pinned Python dependencies
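
Before loading the model, it can help to confirm that a downloaded copy is complete. The sketch below checks a local directory for the model and configuration files listed above; `missing_files` is a hypothetical helper and is not part of `validate_whisper_genai.py`.

```python
from pathlib import Path

# File names taken from the list above (IR files come as .xml/.bin pairs).
REQUIRED_FILES = [
    "openvino_encoder_model.xml",
    "openvino_encoder_model.bin",
    "openvino_decoder_model.xml",
    "openvino_decoder_model.bin",
    "openvino_tokenizer.xml",
    "openvino_tokenizer.bin",
    "openvino_detokenizer.xml",
    "openvino_detokenizer.bin",
    "config.json",
    "generation_config.json",
]


def missing_files(model_dir):
    """Return the required file names that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files("ov-whisper_small-int8-2026.0.0")
    print("all files present" if not missing else f"missing: {missing}")
```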