---
language: fr
license: apache-2.0
tags:
- automatic-speech-recognition
- ASR
- whisper
- french
- fine-tuned
---

# Whisper v3 FT - Multilingual LibriSpeech French / Français

## Description / Description

**English:**  
This is a fine-tuned version of `openai/whisper-small`, trained on a subset of the **Multilingual LibriSpeech** (MLS) dataset.  
It is intended for accurate transcription of French speech, including proper nouns such as the names of people, artists, and places.  
Typical uses include voice assistants, podcast transcription, and other applications that need French ASR.

**Français:**  
Ceci est une version affinée (fine-tuned) de `openai/whisper-small`, entraînée sur un sous-ensemble du jeu de données **Multilingual LibriSpeech** (MLS).  
Elle est conçue pour fournir une transcription précise de la parole en français, y compris les noms propres tels que les noms de personnes, d'artistes et de lieux.  
Elle convient notamment aux assistants vocaux, à la transcription de podcasts et à toute application nécessitant de l'ASR en français.

---

## Data Used / Données utilisées

- **Dataset / Jeu de données**: Multilingual LibriSpeech French subset (`keypa/mls-french-whisper-final`)  
- **Size / Taille**: 10k samples for training / 10k échantillons pour l'entraînement  
- **Preprocessing / Prétraitement**:
  - Audio resampled to 16 kHz / Audio rééchantillonné à 16 kHz  
  - Text normalized and tokenized using `WhisperTokenizer` / Texte normalisé et tokenisé avec `WhisperTokenizer`
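
The resampling step above can be sketched as follows. This is a minimal illustration, not the preprocessing script used for this model: it assumes NumPy and SciPy are installed, and `to_mono_16k` is a hypothetical helper name. It uses polyphase resampling (`scipy.signal.resample_poly`), which may differ from the resampler used during training.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # sampling rate stated in this model card


def to_mono_16k(audio: np.ndarray, sr: int) -> np.ndarray:
    """Down-mix to mono and resample to 16 kHz (hypothetical helper)."""
    if audio.ndim > 1:
        # Assumes shape (frames, channels), as returned by soundfile.read
        audio = audio.mean(axis=1)
    if sr != TARGET_SR:
        # Reduce the up/down factors by their greatest common divisor
        g = gcd(TARGET_SR, sr)
        audio = resample_poly(audio, TARGET_SR // g, sr // g)
    return audio.astype(np.float32)


# One second of a 440 Hz tone sampled at 44.1 kHz
t = np.arange(44_100) / 44_100
tone = np.sin(2 * np.pi * 440 * t)
out = to_mono_16k(tone, 44_100)
print(out.shape, out.dtype)
```

Audio prepared this way can be passed to the processor with `sampling_rate=16000`.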

---

## Performance / Performance

- WER (Word Error Rate) on test set: ~2.5% / WER (taux d'erreur de mots) sur le jeu de test : ~2,5 %  
- Final training loss: ~0.035 / Perte finale à l'entraînement : ~0,035  
- Stable for general French sentences / Stable pour des phrases françaises générales  

> ⚠️ **Note / Remarque**: Performance may drop on strong regional accents or rare proper nouns / Les performances peuvent baisser avec des accents régionaux marqués ou des noms propres rares.
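
For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate is computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is a plain-Python illustration, not the evaluation script behind the figure above; the function name `word_error_rate` and the example sentences are hypothetical, and no text normalization is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "le chat dort sur le canapé"
hypothesis = "le chat dort sous le canapé"
# One substitution over six reference words
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Reported WER values depend on how text is normalized (case, punctuation) before scoring, so comparisons are only meaningful under the same normalization.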

---

## Usage / Utilisation

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import soundfile as sf

# Load the model / Charger le modèle
repo_id = "keypa/whisper-3-mls-fr"
processor = WhisperProcessor.from_pretrained(repo_id)
model = WhisperForConditionalGeneration.from_pretrained(repo_id)

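# Load an audio file / Charger un fichier audio
speech, sr = sf.read("my_audio.wav")
if sr != 16000:
    raise ValueError("Audio must be sampled at 16 kHz / L'audio doit être échantillonné à 16 kHz")
if speech.ndim > 1:
    # Down-mix multi-channel audio to mono / Convertir l'audio multicanal en mono
    speech = speech.mean(axis=1)

# Extract log-mel features; audio beyond 30 s is truncated
# Extraire les caractéristiques log-mel ; l'audio au-delà de 30 s est tronqué
inputs = processor(speech, sampling_rate=sr, return_tensors="pt").input_features

# Generate token ids and decode them to text / Générer les tokens et les décoder en texte
generated_ids = model.generate(inputs)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]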

print(transcription)
```