	---
	library_name: transformers
	license: apache-2.0
	language:
	- en
	- hi
	- bn
	- mr
	- ta
	- te
	base_model: distil-whisper/distil-large-v3
	tags:
	- whisper
	- speech-recognition
	- multilingual
	- automatic-speech-recognition
	- hindi
	- bengali
	- marathi
	- tamil
	- telugu
	- english
	- distil-whisper
	- indian-languages
	datasets:
	- custom-multilingual-dataset
	metrics:
	- wer
	- cer
	pipeline_tag: automatic-speech-recognition
	model-index:
	- name: whisper-multilang-finetuned
	  results:
	  - task:
	      type: automatic-speech-recognition
	      name: Automatic Speech Recognition
	    dataset:
	      type: custom-multilingual-dataset
	      name: Custom Multilingual Dataset
	    metrics:
	    - type: wer
	      value: 27.08
	      name: Word Error Rate
	    - type: wer
	      value: 26.73
	      name: Best WER
	widget:
	- example_title: "Hindi Speech Recognition"
	  text: "मैं आज बाजार जा रहा हूं"
	- example_title: "Bengali Speech Recognition"
	  text: "আমি আজ বাজারে যাচ্ছি"
	- example_title: "English Speech Recognition"
	  text: "I am going to the market today"
	---


	# Whisper Multilingual Fine-tuned Model

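	This model is a fine-tuned version of `distil-whisper/distil-large-v3`, a distilled variant of OpenAI's Whisper large-v3, for multilingual speech recognition in English and five Indian languages.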

	## Supported Languages
	- English (en)
	- Hindi (hi)
	- Bengali (bn)
	- Marathi (mr)
	- Tamil (ta)
	- Telugu (te)

	## Model Details
	- Base Model: Distil-Whisper large-v3 (`distil-whisper/distil-large-v3`)
	- Fine-tuned on: Custom multilingual dataset
	- Training Framework: Transformers
	- Model Type: Speech-to-Text

	## Usage

	```python
	from transformers import WhisperProcessor, WhisperForConditionalGeneration
	import librosa

	# Load model and processor
	processor = WhisperProcessor.from_pretrained("TheKingMonarch/whisper-multilang-finetuned")
	model = WhisperForConditionalGeneration.from_pretrained("TheKingMonarch/whisper-multilang-finetuned")

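	# Clear any forced decoder prompt stored in the generation config
	# so it does not conflict with the arguments passed to generate()
	model.generation_config.forced_decoder_ids = None

	# Load audio as mono, resampled to the 16 kHz rate the processor expects
	audio, _ = librosa.load("audio.wav", sr=16000)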

	# Transcribe
	inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
	predicted_ids = model.generate(inputs.input_features)
	transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
	print(transcription)
	```
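
With its default settings, Whisper's feature extractor works on fixed 30-second windows and truncates longer input, so the snippet above is best suited to short clips. One possible workaround, sketched here under the assumption that simple non-overlapping windows are acceptable, is to split the waveform before calling `processor`. The helper name `split_into_windows` and its defaults are illustrative and not part of this model's API.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000   # matches sr=16000 in the Usage example
WINDOW_SECONDS = 30    # assumption: Whisper's standard 30-second input window


def split_into_windows(samples: Sequence[float],
                       sample_rate: int = SAMPLE_RATE,
                       window_seconds: int = WINDOW_SECONDS) -> List[Sequence[float]]:
    """Split a mono waveform into consecutive, non-overlapping windows.

    Works on any sliceable sequence (a list, or the NumPy array returned by
    librosa.load). The last window may be shorter than the others.
    """
    if sample_rate <= 0 or window_seconds <= 0:
        raise ValueError("sample_rate and window_seconds must be positive")
    window_size = sample_rate * window_seconds
    return [samples[start:start + window_size]
            for start in range(0, len(samples), window_size)]


# Example: 70 seconds of silence -> windows of 30 s, 30 s and 10 s.
windows = split_into_windows([0.0] * (70 * SAMPLE_RATE))
print([len(w) / SAMPLE_RATE for w in windows])  # [30.0, 30.0, 10.0]
```

Each window can then go through `processor` and `model.generate` as in the Usage example, with the transcriptions joined afterwards. Words that straddle a window boundary may be cut; overlapping windows would reduce that effect.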

	## Language-specific Usage

	```python
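	# For a specific language (e.g., Hindi).
	# Reuses `processor`, `model`, and `inputs` from the Usage example above.
	forced_decoder_ids = processor.get_decoder_prompt_ids(language="hi", task="transcribe")
	predicted_ids = model.generate(inputs.input_features, forced_decoder_ids=forced_decoder_ids)
	# Recent transformers releases also accept: model.generate(..., language="hi", task="transcribe")
	transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]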
	```
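
Since the model targets six languages, it can help to validate the requested language before building the decoder prompt. The following is a small sketch: the `SUPPORTED_LANGUAGES` table mirrors the Supported Languages list, while `resolve_language` is a hypothetical convenience helper that does not ship with the model.

```python
# Language codes and names copied from the "Supported Languages" section.
SUPPORTED_LANGUAGES = {
    "en": "english",
    "hi": "hindi",
    "bn": "bengali",
    "mr": "marathi",
    "ta": "tamil",
    "te": "telugu",
}


def resolve_language(language: str) -> str:
    """Return the two-letter code for a supported language code or name.

    Hypothetical helper: raises ValueError for anything outside the six
    languages listed for this model.
    """
    key = language.strip().lower()
    if key in SUPPORTED_LANGUAGES:
        return key
    for code, name in SUPPORTED_LANGUAGES.items():
        if key == name:
            return code
    raise ValueError(
        f"Unsupported language {language!r}; expected one of "
        f"{sorted(SUPPORTED_LANGUAGES)}"
    )


print(resolve_language("Hindi"))  # hi
print(resolve_language("te"))     # te
```

The returned code can be passed as `language=` to `processor.get_decoder_prompt_ids`.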

	## Training Details
	- Fine-tuned on a custom multilingual speech dataset
	- Optimized for Indian languages and English
	- Training Steps: 600
	- Final WER: 27.08% (step 600)
	- Best WER: 26.73% (step 550)

	### Training Metrics

	| Step | Training Loss | Validation Loss | WER (%) |
	|------|---------------|-----------------|---------|
	| 50 | 2.075000 | 1.930286 | 133.45 |
	| 100 | 1.206600 | 1.275027 | 89.54 |
	| 150 | 0.793800 | 0.712475 | 93.42 |
	| 200 | 0.528700 | 0.562679 | 88.92 |
	| 250 | 0.379900 | 0.473467 | 89.27 |
	| 300 | 0.289400 | 0.369892 | 69.88 |
	| 350 | 0.244300 | 0.291235 | 49.58 |
	| 400 | 0.268800 | 0.249055 | 42.80 |
	| 450 | 0.122200 | 0.209867 | 36.29 |
	| 500 | 0.084700 | 0.173593 | 31.44 |
	| 550 | 0.073400 | 0.155249 | 26.73 |
	| 600 | 0.044300 | 0.148559 | 27.08 |
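
A WER above 100%, as at step 50, is possible because inserted words count as errors while the denominator is only the number of reference words. A minimal sketch of the standard word-level edit-distance definition follows. It is an illustration only and not necessarily the evaluation code or text normalization used to produce the table.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, deletions and insertions."""
    previous = list(range(len(hyp) + 1))
    for i, ref_token in enumerate(ref, start=1):
        current = [i]
        for j, hyp_token in enumerate(hyp, start=1):
            cost = 0 if ref_token == hyp_token else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent, using plain whitespace tokenization."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# One dropped word out of seven reference words.
print(wer("I am going to the market today", "I am going to market today"))
# Three inserted words against a one-word reference: WER exceeds 100%.
print(wer("hello", "hello there my friend"))  # 300.0
```

Applying the same `edit_distance` to `list(reference)` and `list(hypothesis)` gives a character error rate (CER) instead.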

	### Training Configuration
	- Base Model: `distil-whisper/distil-large-v3`
	- Learning Rate: tuned during training (value not reported)
	- Batch Size: chosen for the available hardware (value not reported)
	- Training Duration: 600 steps
	- Evaluation Strategy: Every 50 steps
	- Early Stopping: Based on WER improvement
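
With evaluation every 50 steps and WER as the selection metric, the best checkpoint can be read off the logged history. The sketch below replays the WER column of the Training Metrics table. The `patience` value is a hypothetical setting chosen for illustration, since the actual early-stopping configuration is not stated.

```python
from typing import List, Optional, Tuple

# (step, WER %) pairs copied from the Training Metrics table.
WER_HISTORY: List[Tuple[int, float]] = [
    (50, 133.45), (100, 89.54), (150, 93.42), (200, 88.92),
    (250, 89.27), (300, 69.88), (350, 49.58), (400, 42.80),
    (450, 36.29), (500, 31.44), (550, 26.73), (600, 27.08),
]


def best_checkpoint(history: List[Tuple[int, float]]) -> Tuple[int, float]:
    """Return the (step, WER) pair with the lowest WER."""
    return min(history, key=lambda entry: entry[1])


def early_stop_step(history: List[Tuple[int, float]],
                    patience: int) -> Optional[int]:
    """Return the step at which training would stop, or None if it never does.

    Stops after `patience` consecutive evaluations without a new best WER.
    """
    best = float("inf")
    stale = 0
    for step, value in history:
        if value < best:
            best, stale = value, 0
        else:
            stale += 1
            if stale >= patience:
                return step
    return None


print(best_checkpoint(WER_HISTORY))              # (550, 26.73)
print(early_stop_step(WER_HISTORY, patience=2))  # None
```

With `patience=2` the run is never cut short, which is consistent with training reaching step 600 even though step 550 has the lowest WER.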

	## Limitations
	- Performance may vary across different accents and dialects
	- Best results on clear audio with minimal background noise
	- Optimized for the specific languages listed above

	## Citation
	If you use this model, please cite:
	```
	@misc{whisper-multilang-finetuned,
	  author = {Your Name},
	  title = {Whisper Multilingual Fine-tuned Model},
	  year = {2025},
	  publisher = {Hugging Face},
	  url = {https://huggingface.co/TheKingMonarch/whisper-multilang-finetuned}
	}
	```