Whisper Large-v3-Turbo Hindi LoRA
A LoRA fine-tuned adapter for openai/whisper-large-v3-turbo optimized for Hindi (Devanagari) speech recognition.
Results
| Model | WER (%) | Eval Set |
|---|---|---|
| openai/whisper-large-v3-turbo (baseline) | 35.56 | FLEURS hi_in test (n=418) |
| + LoRA fine-tune (this model) | 22.25 | FLEURS hi_in test (n=418) |
| + CTranslate2 INT8 deployment | 22.70 | FLEURS hi_in test (n=418) |
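The LoRA adapter gives a 37.4% relative WER reduction over the baseline (from 35.56% to 22.25%). INT8 deployment via faster-whisper costs only 0.45 percentage points of WER (from 22.25% to 22.70%).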
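As a quick check, both figures can be recomputed from the Results table. This is a minimal sketch; the variable names are illustrative and not part of the project's code.

```python
# WER values (%) copied from the Results table.
baseline_wer = 35.56
lora_wer = 22.25
int8_wer = 22.70

# Relative reduction: share of the baseline error removed by the adapter.
relative_reduction = (baseline_wer - lora_wer) / baseline_wer * 100

# Absolute cost of INT8 quantization, in percentage points of WER.
int8_cost = int8_wer - lora_wer

print(f"Relative WER reduction: {relative_reduction:.1f}%")
print(f"INT8 degradation: {int8_cost:.2f} percentage points")
```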
Evaluation uses Whisper-default text normalization. See Normalization Notes below.
Comparison with Other Hindi ASR Models
| Model | WER (%) | Method | Training Data |
|---|---|---|---|
| collabora/whisper-large-v2-hindi | 5.33 | Full fine-tune | Multi-corpus (100h+) |
| vasista22/whisper-hindi-large-v2 | 6.80 | Full fine-tune | Multi-corpus (100h+) |
| openai/whisper-large-v3-turbo | 35.56 | Zero-shot | — |
| This model (LoRA) | 22.25 | LoRA (3.33% params) | FLEURS only (~3.5h) |
Note: The collabora and vasista22 models are full fine-tunes trained on more than 100 hours of multi-corpus Hindi data. This model uses only ~3.5 hours of FLEURS data with a lightweight LoRA adapter, which is a fundamentally different trade-off: minimal data and compute in exchange for a significant WER improvement over the zero-shot baseline.
Training Curve
| Step | Train Loss | Eval Loss | Eval WER (%) |
|---|---|---|---|
| 50 | 0.263 | 0.259 | 29.40 |
| 100 | 0.210 | 0.234 | 25.74 |
| 150 | 0.145 | 0.223 | 24.49 |
| 200 | 0.148 | 0.217 | 23.43 |
| 250 | 0.146 | 0.213 | 23.82 |
| 300 | 0.096 | 0.215 | 22.42 |
| 350 | 0.109 | 0.215 | 22.50 |
Best checkpoint: step 300 (lowest val WER). Test WER: 22.25%.
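The checkpoint choice can be reproduced from the Training Curve table. The sketch below selects on validation WER, as described above, and shows that selecting on eval loss would have picked a different step. The dictionary names are illustrative.

```python
# Values copied from the Training Curve table, keyed by training step.
eval_wer = {50: 29.40, 100: 25.74, 150: 24.49, 200: 23.43,
            250: 23.82, 300: 22.42, 350: 22.50}
eval_loss = {50: 0.259, 100: 0.234, 150: 0.223, 200: 0.217,
             250: 0.213, 300: 0.215, 350: 0.215}

# Pick the step with the lowest value under each criterion.
best_by_wer = min(eval_wer, key=eval_wer.get)
best_by_loss = min(eval_loss, key=eval_loss.get)

print(f"Lowest eval WER:  step {best_by_wer} ({eval_wer[best_by_wer]:.2f}%)")
print(f"Lowest eval loss: step {best_by_loss} ({eval_loss[best_by_loss]:.3f})")
```

The two criteria disagree here (step 300 by WER, step 250 by loss), which is why the selection metric is worth stating explicitly.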
How to Use
With PEFT (LoRA adapter)
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration
from peft import PeftModel

BASE_MODEL = "openai/whisper-large-v3-turbo"
ADAPTER = "Tachyeon/whisper-large-v3-turbo-hindi-lora"

processor = WhisperProcessor.from_pretrained(BASE_MODEL)
base_model = WhisperForConditionalGeneration.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16, attn_implementation="sdpa",
)
model = PeftModel.from_pretrained(base_model, ADAPTER)
model = model.to("cuda").eval()

# Transcribe (audio_array: 16kHz float32 numpy array)
input_features = processor(
    audio_array, sampling_rate=16000, return_tensors="pt"
).input_features.to("cuda", dtype=torch.bfloat16)

with torch.inference_mode():
    predicted_ids = model.generate(
        input_features, language="hi", task="transcribe"
    )

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
With faster-whisper (merged + CTranslate2)
For production deployment, merge the adapter and convert to CTranslate2:
# Merge LoRA → convert → evaluate
python convert_and_eval.py --lora-dir outputs/whisper-large-v3-turbo-hindi-lora --quant int8 --gpu 0

Then load the converted model with faster-whisper:

from faster_whisper import WhisperModel

model = WhisperModel("path/to/ct2-model", device="cuda", compute_type="int8")
segments, info = model.transcribe("audio.wav", language="hi", beam_size=1)
print(" ".join(seg.text.strip() for seg in segments))
Full pipeline code (data prep → training → deployment): github.com/ipritamdash/whisper-hindi-lora
LoRA Configuration
| Parameter | Value |
|---|---|
| Rank (r) | 32 |
| Alpha | 64 (2x rank) |
| Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, out_proj, fc1, fc2 |
| Trainable Parameters | 27,852,800 / 836,730,880 (3.33%) |
| Bias | none |
Architecture choice follows LoRA-Whisper (arXiv:2406.06619): encoder+decoder targeting on all linear layers outperforms decoder-only or q/v-only configurations.
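The trainable parameter count in the table can be reproduced by hand. The sketch below assumes the base model's dimensions (hidden size 1280, feed-forward size 5120, 32 encoder layers, 4 decoder layers, with each decoder layer holding both a self-attention and a cross-attention block). These values are not stated in this card and are taken as assumptions about the whisper-large-v3-turbo configuration.

```python
# LoRA rank from the configuration table.
R = 32

# Assumed base-model dimensions (not stated in this card).
D_MODEL = 1280
FFN_DIM = 5120
ENCODER_LAYERS = 32
DECODER_LAYERS = 4


def lora_params(in_features, out_features, rank=R):
    """LoRA adds A (rank x in) and B (out x rank) per adapted linear layer."""
    return rank * (in_features + out_features)


# q_proj, k_proj, v_proj, out_proj are all D_MODEL x D_MODEL.
attention_block = 4 * lora_params(D_MODEL, D_MODEL)
# fc1 expands to FFN_DIM, fc2 projects back.
mlp_block = lora_params(D_MODEL, FFN_DIM) + lora_params(FFN_DIM, D_MODEL)

encoder_layer = attention_block + mlp_block
# Decoder layers carry self-attention and cross-attention.
decoder_layer = 2 * attention_block + mlp_block

trainable = ENCODER_LAYERS * encoder_layer + DECODER_LAYERS * decoder_layer
total = 836_730_880  # total parameter count from the table

print(f"{trainable:,} / {total:,} ({trainable / total:.2%})")
```

Under these assumptions the count matches the 27,852,800 reported in the table, with most of the adapter weights sitting in the 32-layer encoder.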
Training Details
| Parameter | Value |
|---|---|
| Base Model | openai/whisper-large-v3-turbo (809M params) |
| Dataset | google/fleurs hi_in |
| Train / Val / Test | 2,120 / 239 / 418 samples |
| Epochs | 3 |
| Learning Rate | 1e-4 (linear decay) |
| Warmup Steps | 50 |
| Batch Size | 4 (x4 gradient accumulation = effective 16) |
| Optimizer | AdamW (weight_decay=0.01) |
| Precision | BFloat16 |
| Gradient Checkpointing | Enabled |
| Hardware | NVIDIA A10G (23GB VRAM) |
| Training Time | 45 minutes |
| Seed | 42 |
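The schedule implied by these settings can be worked out with a little arithmetic. This is a rough sketch assuming a single GPU and that a partial final accumulation batch counts as one optimizer step; the exact total depends on how the training loop rounds.

```python
import math

# Values from the Training Details table.
train_samples = 2120
per_device_batch = 4
grad_accum = 4
epochs = 3
warmup_steps = 50

effective_batch = per_device_batch * grad_accum
# Assumption: the last, partially filled batch still triggers an update.
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_fraction = warmup_steps / total_steps

print(f"Effective batch size: {effective_batch}")
print(f"Optimizer steps: {steps_per_epoch} per epoch, ~{total_steps} total")
print(f"Warmup covers about {warmup_fraction:.1%} of training")
```

A total of roughly 400 steps is consistent with the training curve, which is logged every 50 steps and ends at step 350.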
Framework Versions
- Transformers: 4.57.3
- PEFT: 0.18.1
- PyTorch: 2.6.0+cu124
- Datasets: 3.6.0
Dataset
Google FLEURS Hindi (hi_in):
- Domain: Read speech from Wikipedia sentences
- Audio: 16kHz mono, Devanagari script
- License: CC BY 4.0
- Size: ~3.5 hours across train/val/test
Normalization Notes
Hindi ASR evaluation is sensitive to text normalization. Whisper's default normalizer strips diacritics and simplifies conjunct consonants, which can inflate apparent accuracy at the cost of semantic precision.
The WER numbers above use Whisper-default normalization for comparability with other Hugging Face models. For production Hindi ASR, consider also evaluating with the IndicNLP normalizer.
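To illustrate why normalization matters, the sketch below computes word-level WER with and without a deliberately crude normalizer that drops all Unicode combining marks. The `strip_marks` function is a simplified stand-in for illustration only, not Whisper's actual normalizer and not the one used for the results above. The example sentences are made up.

```python
import unicodedata


def strip_marks(text):
    """Drop every Unicode combining mark (categories Mn, Mc, Me)."""
    return "".join(
        ch for ch in unicodedata.normalize("NFC", text)
        if not unicodedata.category(ch).startswith("M")
    )


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


# "दिन" (day) and "दान" (donation) differ only in their vowel sign.
reference = "आज दिन अच्छा है"
hypothesis = "आज दान अच्छा है"

raw_wer = wer(reference, hypothesis)
stripped_wer = wer(strip_marks(reference), strip_marks(hypothesis))

print(f"WER without normalization: {raw_wer:.2f}")
print(f"WER after stripping marks: {stripped_wer:.2f}")
```

Here a genuine word error disappears once the vowel signs are removed, which is the kind of inflation a Hindi-aware normalizer is meant to avoid.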
Limitations
- Training data scope: Trained on FLEURS read speech (~3.5h). Performance on conversational, noisy, or accented Hindi may vary.
- Language detection: Fine-tuning on a single language can degrade Whisper's multilingual detection. Set language="hi" explicitly.
- Code-mixing: Performance on Hindi-English (Hinglish) is not evaluated.
- Base model biases: Any biases in whisper-large-v3-turbo carry through.
Citation
@misc{dash2026whisper_hindi_lora,
author = {Pritam Dash},
title = {Whisper Large-v3-Turbo Hindi LoRA Fine-tune},
year = {2026},
publisher = {Hugging Face},
url = {https://huggingface.co/Tachyeon/whisper-large-v3-turbo-hindi-lora}
}
References
- Whisper paper (Radford et al., 2023)
- LoRA paper (Hu et al., 2021)
- LoRA-Whisper (Yang et al., 2024)
- FLEURS (Conneau et al., 2023)