Model Card for dk2325/whisper-tiny-indian-accent
Whisper Tiny English model adapted for improved robustness on Indian-accent English speech, while retaining general English ASR performance.
Model Details
Model Description
This model is a domain-adapted ASR checkpoint built from Whisper Tiny English for better transcription quality on Indian-accent English audio.
It was fine-tuned in a constrained local setup and then shared on Hugging Face Hub.
- Developed by: DK2325
- Funded by: Self-funded personal project
- Shared by: DK2325
- Model type: Seq2Seq speech-to-text (Whisper)
- Language(s): English (with focus on Indian-accent English)
- License: Apache-2.0 (inherits upstream base model licensing)
- Finetuned from model: openai/whisper-tiny.en
Model Sources
- Repository: https://github.com/DK2325/ASR_Finetuning_openai-whisper-tiny.en
- Base model: https://huggingface.co/openai/whisper-tiny.en
- Adapted model: https://huggingface.co/dk2325/whisper-tiny-indian-accent
Uses
Direct Use
Use this model for automatic speech recognition on:
- Indian-accent English lectures
- Educational audio
- General English short-form audio where accent robustness is important
Downstream Use
Can be used in:
- Lecture transcription pipelines
- Subtitle generation workflows
- Voice-note to text systems for Indian English speakers
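For the subtitle workflow above, a minimal sketch of turning timestamped segments into SRT text might look like the following. It assumes segments in the shape the Transformers ASR pipeline returns when called with `return_timestamps=True` (a list of dicts with `timestamp` and `text` keys); the helper names `format_srt_time` and `chunks_to_srt` are hypothetical, and timestamp quality for this checkpoint has not been verified here.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks: list[dict]) -> str:
    """Convert [{"timestamp": (start, end), "text": ...}, ...] into SRT text."""
    entries = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        text = chunk["text"].strip()
        if start is None or not text:
            continue  # skip segments that cannot be placed on a timeline
        if end is None:
            end = start  # the final segment can come back without an end time
        entries.append((start, end, text))

    blocks = [
        f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n"
        for index, (start, end, text) in enumerate(entries, start=1)
    ]
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hand-written segments standing in for pipeline output.
    demo = [
        {"timestamp": (0.0, 2.5), "text": " Welcome to the lecture."},
        {"timestamp": (2.5, 5.0), "text": " Today we cover sampling."},
    ]
    print(chunks_to_srt(demo))
```

In practice the `chunks` list would come from something like `asr("path/to/audio.wav", return_timestamps=True)["chunks"]`, where `asr` is the pipeline object from the getting-started section.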
Out-of-Scope Use
Not intended for:
- Non-English transcription
- Medical, legal, or safety-critical transcription without human review
- Speaker identification, emotion recognition, or biometric tasks
- Noisy far-field audio without additional denoising/domain adaptation
Bias, Risks, and Limitations
- Performance may vary across different Indian regions, age groups, and recording setups.
- Accuracy can degrade on heavy background noise, overlapping speech, or code-switching.
- Domain adaptation can reduce performance on accents/domains far from training data.
- Model outputs should be human-validated for high-stakes scenarios.
Recommendations
- Use confidence-aware post-processing and human review for important transcripts.
- Evaluate on your own target domain before production deployment.
- Consider mixed-domain continued training if your data differs significantly.
How to Get Started with the Model
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="dk2325/whisper-tiny-indian-accent",
    device=-1,  # CPU; set to 0 for the first CUDA GPU if available
)

# The base checkpoint is English-only, so no language/task options are passed;
# recent Transformers releases may reject them for English-only Whisper models.
result = asr("path/to/audio.wav")
print(result["text"])
Training Details
Training Data
- Base fine-tuning/evaluation workflow used English speech data (LibriSpeech-style setup in project pipeline).
- Indian-accent adaptation used Indian English speech samples from the swastik17/nptel_109106147 dataset.
Training Procedure
Preprocessing
- Audio resampled/processed with Whisper feature extractor pipeline.
- Text normalized through tokenizer/processor workflow for Whisper.
- Standard ASR collator and sequence-to-sequence training stack used.
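The preprocessing steps above can be sketched as a single per-example function. This is an illustration of the common Whisper fine-tuning pattern, not the project's actual code: the function name `prepare_example`, the `audio` and `text` column names, and the assumption that audio is already at the sampling rate the feature extractor expects are all hypothetical.

```python
def prepare_example(example, processor, text_column="text"):
    """Map one dataset row to Whisper model inputs and label ids.

    `processor` is expected to behave like a Whisper processor, i.e. expose
    `feature_extractor` and `tokenizer` callables. Column names are assumed.
    """
    audio = example["audio"]

    # Log-mel input features computed by the Whisper feature extractor.
    features = processor.feature_extractor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]

    # Target token ids produced by the Whisper tokenizer.
    labels = processor.tokenizer(example[text_column]).input_ids

    return {"input_features": features, "labels": labels}
```

A function like this would typically be applied with `dataset.map(...)` after loading the processor via `AutoProcessor.from_pretrained("openai/whisper-tiny.en")`; padding of features and labels is then left to the data collator.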
Training Hyperparameters
- Training regime: fp16 mixed precision
- Learning rate: 1e-5 (conservative for adaptation stability)
- Optimizer: AdamW
- Regularization approach: low LR + weight decay + controlled adaptation duration
- Gradient accumulation: used (for low-VRAM feasibility)
- Hardware context: consumer GPU with 4GB VRAM constraints
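As a rough illustration, the hyperparameters above could be collected into keyword arguments for `transformers.Seq2SeqTrainingArguments`. Only the learning rate, fp16 setting, and the choice of AdamW come from this card; the batch size, accumulation steps, weight decay value, and step count below are assumed placeholders, not the values used in training.

```python
# Keyword arguments named after Seq2SeqTrainingArguments parameters.
# Values marked "assumed" are illustrative and NOT taken from the model card.
training_kwargs = {
    "learning_rate": 1e-5,             # from the model card
    "fp16": True,                      # from the model card (needs a CUDA GPU)
    "optim": "adamw_torch",            # AdamW per the card; exact variant assumed
    "weight_decay": 0.01,              # assumed value
    "per_device_train_batch_size": 4,  # assumed value for a 4GB GPU
    "gradient_accumulation_steps": 8,  # assumed value
    "max_steps": 1000,                 # assumed; stands in for a short adaptation run
}

# Gradient accumulation trades wall-clock time for memory: the optimizer sees
# this many examples per update even though only a small batch fits in VRAM.
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
print(f"Effective batch size: {effective_batch_size}")
```

These could then be passed as `Seq2SeqTrainingArguments(output_dir="./whisper-tiny-indian-accent", **training_kwargs)`, where the output directory is likewise a hypothetical name.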
Speeds, Sizes, Times
- Trained in a resource-constrained local environment.
- Wall-clock time and throughput were not logged consistently enough to report exact figures.
Evaluation
Testing Data, Factors and Metrics
Testing Data
- Internal project validation split (LibriSpeech-style validation setup)
- Small-sample Indian-accent checks using dataset-streamed examples
Factors
- Baseline vs fine-tuned comparison
- General English validation performance
- Accent-domain qualitative behavior on Indian English samples
Metrics
- Word Error Rate (WER)
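For readers unfamiliar with the metric, WER is the word-level edit distance (substitutions, deletions, insertions) between reference and hypothesis, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition; the function name `word_error_rate` is hypothetical, it applies no text normalization, and it is not the evaluation code used for the results in this card.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of three reference words.
    print(word_error_rate("the cat sat", "the cat sit"))
```

Because casing and punctuation count as differences here, scores depend heavily on how transcripts are normalized before comparison.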
Results
General validation sample (1 percent batch test):
- Base model WER: 0.2806
- Fine-tuned model WER: 0.0586
Indian-accent small-sample check:
- Transcription quality improved on a small set of Indian-accent samples, based on qualitative review and quick quantitative checks.
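To put the general validation figures in perspective, the absolute and relative WER reductions can be computed directly from the two reported numbers. This is simple arithmetic on the values above, not an additional measurement.

```python
base_wer = 0.2806       # base model WER reported above
finetuned_wer = 0.0586  # fine-tuned model WER reported above

absolute_reduction = base_wer - finetuned_wer
relative_reduction = absolute_reduction / base_wer

print(f"Absolute WER reduction: {absolute_reduction:.4f}")  # 0.2220
print(f"Relative WER reduction: {relative_reduction:.1%}")  # 79.1%
```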
Summary
The adaptation phase reduced WER on the project validation sample from 0.2806 to 0.0586 and showed better handling of Indian-accent speech in small-sample checks, using a training strategy that fits within a 4GB consumer GPU.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator: https://mlco2.github.io/impact#compute
- Hardware Type: Local consumer GPU (4GB VRAM class)
- Hours used: Not precisely tracked
- Cloud Provider: N/A (local training)
- Compute Region: N/A
- Carbon Emitted: Not measured
Technical Specifications
Model Architecture and Objective
- Architecture: Whisper Tiny English encoder-decoder transformer
- Objective: Sequence-to-sequence speech transcription
- Adaptation goal: Improve robustness on Indian-accent English while retaining base English ASR capability
Compute Infrastructure
Hardware
- Local machine
- GPU: 4GB VRAM class
Software
- Python
- Hugging Face Transformers
- PyTorch
- Datasets
- Evaluate
Citation
BibTeX
@misc{dk2325_whisper_tiny_indian_accent_2026,
title={Whisper Tiny Indian Accent Adaptation},
author={DK2325},
year={2026},
howpublished={\url{https://huggingface.co/dk2325/whisper-tiny-indian-accent}}
}
APA
DK2325. (2026). Whisper Tiny Indian Accent Adaptation. Hugging Face. https://huggingface.co/dk2325/whisper-tiny-indian-accent
More Information
This model was developed as a practical end-to-end ASR fine-tuning and deployment project under tight hardware constraints, with a focus on measurable improvement and a reproducible workflow.
Model Card Authors
DK2325
Model Card Contact
Contact the author through the Hugging Face profile: https://huggingface.co/dk2325