| library_name: transformers | |
| license: apache-2.0 | |
| language: | |
| - en | |
| - hi | |
| - bn | |
| - mr | |
| - ta | |
| - te | |
| base_model: distil-whisper/distil-large-v3 | |
| tags: | |
| - whisper | |
| - speech-recognition | |
| - multilingual | |
| - automatic-speech-recognition | |
| - hindi | |
| - bengali | |
| - marathi | |
| - tamil | |
| - telugu | |
| - english | |
| - distil-whisper | |
| - indian-languages | |
| datasets: | |
| - custom-multilingual-dataset | |
| metrics: | |
| - wer | |
| - cer | |
| pipeline_tag: automatic-speech-recognition | |
| model-index: | |
| - name: whisper-multilang-finetuned | |
| results: | |
| - task: | |
| type: automatic-speech-recognition | |
| name: Automatic Speech Recognition | |
| dataset: | |
| type: custom-multilingual-dataset | |
| name: Custom Multilingual Dataset | |
| metrics: | |
| - type: wer | |
| value: 27.08 | |
| name: Word Error Rate | |
| - type: wer | |
| value: 26.73 | |
| name: Best WER | |
| widget: | |
| - example_title: "Hindi Speech Recognition" | |
| text: "मैं आज बाजार जा रहा हूं" | |
| - example_title: "Bengali Speech Recognition" | |
| text: "আমি আজ বাজারে যাচ্ছি" | |
| - example_title: "English Speech Recognition" | |
| text: "I am going to the market today" | |
| # Whisper Multilingual Fine-tuned Model | |
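| This model is a fine-tuned version of Distil-Whisper large-v3 (`distil-whisper/distil-large-v3`), a distilled variant of OpenAI's Whisper, for multilingual speech recognition in English and five Indian languages. | |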
| ## Supported Languages | |
| - English (en) | |
| - Hindi (hi) | |
| - Bengali (bn) | |
| - Marathi (mr) | |
| - Tamil (ta) | |
| - Telugu (te) | |
| ## Model Details | |
| - **Base Model**: Distil-Whisper large-v3 (`distil-whisper/distil-large-v3`) | |
| - **Fine-tuned on**: Custom multilingual dataset | |
| - **Training Framework**: Transformers | |
| - **Model Type**: Speech-to-Text | |
| ## Usage | |
| ```python | |
| from transformers import WhisperProcessor, WhisperForConditionalGeneration | |
| import librosa | |
| # Load model and processor | |
| processor = WhisperProcessor.from_pretrained("TheKingMonarch/whisper-multilang-finetuned") | |
| model = WhisperForConditionalGeneration.from_pretrained("TheKingMonarch/whisper-multilang-finetuned") | |
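| # Clear forced decoder IDs saved in the generation config so decoding is not pinned to a preset language or task | |
| model.generation_config.forced_decoder_ids = None | |
| # Load audio as 16 kHz mono, matching the sampling_rate passed to the processor below | |
| audio, _ = librosa.load("audio.wav", sr=16000) | |
| # Extract features, generate token IDs, and decode them to text | |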
| inputs = processor(audio, sampling_rate=16000, return_tensors="pt") | |
| predicted_ids = model.generate(inputs.input_features) | |
| transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0] | |
| print(transcription) | |
| ``` | |
| ## Language-specific Usage | |
| ```python | |
| # For a specific language (e.g., Hindi); reuses `processor`, `model`, and `inputs` from the Usage example | |
| forced_decoder_ids = processor.get_decoder_prompt_ids(language="hi", task="transcribe") | |
| predicted_ids = model.generate(inputs.input_features, forced_decoder_ids=forced_decoder_ids) | |
| ``` | |
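When the language comes from user input, it can help to validate the code before building the decoder prompt. The following is a minimal sketch, not part of the model's API: the helper name `decoder_prompt_kwargs` and the `SUPPORTED_LANGUAGES` mapping are hypothetical, and the returned dictionary simply mirrors the `language` and `task` arguments used with `processor.get_decoder_prompt_ids` above.

```python
# Language codes and names copied from the "Supported Languages" list.
SUPPORTED_LANGUAGES = {
    "en": "English", "hi": "Hindi", "bn": "Bengali",
    "mr": "Marathi", "ta": "Tamil", "te": "Telugu",
}


def decoder_prompt_kwargs(language: str, task: str = "transcribe") -> dict:
    """Validate a language code and build kwargs for processor.get_decoder_prompt_ids.

    Hypothetical helper for illustration; it does not call the model.
    """
    code = language.strip().lower()
    if code not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(
            f"Unsupported language {language!r}; expected one of: {supported}"
        )
    return {"language": code, "task": task}
```

With the processor loaded as in the Usage section, the result could be passed as `processor.get_decoder_prompt_ids(**decoder_prompt_kwargs("hi"))`.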
| ## Training Details | |
| - Fine-tuned on a custom multilingual speech dataset | |
| - Optimized for Indian languages and English | |
| - **Training Steps**: 600 | |
| - **Final WER**: 27.08% (step 600) | |
| - **Best WER achieved**: 26.73% at step 550 | |
| ### Training Metrics | |
| | Step | Training Loss | Validation Loss | WER (%) | | |
| |------|---------------|-----------------|---------| | |
| | 50 | 2.075000 | 1.930286 | 133.45 | | |
| | 100 | 1.206600 | 1.275027 | 89.54 | | |
| | 150 | 0.793800 | 0.712475 | 93.42 | | |
| | 200 | 0.528700 | 0.562679 | 88.92 | | |
| | 250 | 0.379900 | 0.473467 | 89.27 | | |
| | 300 | 0.289400 | 0.369892 | 69.88 | | |
| | 350 | 0.244300 | 0.291235 | 49.58 | | |
| | 400 | 0.268800 | 0.249055 | 42.80 | | |
| | 450 | 0.122200 | 0.209867 | 36.29 | | |
| | 500 | 0.084700 | 0.173593 | 31.44 | | |
| | 550 | 0.073400 | 0.155249 | **26.73** | | |
| | 600 | 0.044300 | 0.148559 | 27.08 | | |
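For readers unfamiliar with the metric, the sketch below shows one common way to compute word error rate: the word-level edit distance between reference and hypothesis, divided by the number of reference words. It is an illustration only; the card does not say which WER implementation or text normalization was used for the figures above, and the function name is hypothetical. Because insertions count as errors, WER can exceed 100%, as in the step 50 row.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn the first i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One dropped word out of seven reference words.
print(round(word_error_rate("I am going to the market today",
                            "I am going to market today"), 2))
# Three inserted words against a one-word reference give a WER above 100%.
print(word_error_rate("hello", "hello there my friend"))
```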
| ### Training Configuration | |
| - **Base Model**: `distil-whisper/distil-large-v3` | |
| - **Learning Rate**: Tuned during training (exact value not reported) | |
| - **Batch Size**: Chosen for the available hardware (exact value not reported) | |
| - **Training Duration**: 600 steps | |
| - **Evaluation Strategy**: Every 50 steps | |
| - **Early Stopping**: Based on WER improvement | |
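The evaluation schedule above can be illustrated by selecting the best checkpoint from the logged WER values. This is a sketch under stated assumptions: the numbers are copied from the Training Metrics table, the function names are hypothetical, and the card does not describe how early stopping was actually configured (for example, its patience), so the second helper only shows what such a counter would track.

```python
# (step, WER %) pairs copied from the Training Metrics table.
EVAL_LOG = [
    (50, 133.45), (100, 89.54), (150, 93.42), (200, 88.92),
    (250, 89.27), (300, 69.88), (350, 49.58), (400, 42.80),
    (450, 36.29), (500, 31.44), (550, 26.73), (600, 27.08),
]


def best_checkpoint(log):
    """Return the (step, wer) pair with the lowest WER; the earliest step wins ties."""
    return min(log, key=lambda entry: entry[1])


def evals_since_best(log):
    """Count evaluations logged after the best one (what a patience counter tracks)."""
    best_step, _ = best_checkpoint(log)
    return sum(1 for step, _ in log if step > best_step)


step, wer = best_checkpoint(EVAL_LOG)
print(f"best step: {step}, WER: {wer:.2f}%")
print(f"evaluations since best: {evals_since_best(EVAL_LOG)}")
```

On this log the best checkpoint is step 550, with one later evaluation at step 600 that did not improve on it.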
| ## Limitations | |
| - Performance may vary across different accents and dialects | |
| - Best results on clear audio with minimal background noise | |
| - Optimized for the specific languages listed above | |
| ## Citation | |
| If you use this model, please cite: | |
| ``` | |
| @misc{whisper-multilang-finetuned, | |
| author = {Your Name}, | |
| title = {Whisper Multilingual Fine-tuned Model}, | |
| year = {2025}, | |
| publisher = {Hugging Face}, | |
| url = {https://huggingface.co/TheKingMonarch/whisper-multilang-finetuned} | |
| } | |
| ``` |