Model Card for dk2325/whisper-tiny-indian-accent

Whisper Tiny English model adapted for improved robustness on Indian-accent English speech, while retaining general English ASR performance.

Model Details

Model Description

This model is a domain-adapted ASR checkpoint built from Whisper Tiny English for better transcription quality on Indian-accent English audio.
It was fine-tuned locally on a consumer GPU with 4 GB of VRAM and then shared on the Hugging Face Hub.

Developed by: DK2325
Funded by: Self-funded personal project
Shared by: DK2325
Model type: Seq2Seq speech-to-text (Whisper)
Language(s): English (with focus on Indian-accent English)
License: Apache-2.0 (inherits upstream base model licensing)
Finetuned from model: openai/whisper-tiny.en

Model Sources

Repository: https://github.com/DK2325/ASR_Finetuning_openai-whisper-tiny.en
Base model: https://huggingface.co/openai/whisper-tiny.en
Adapted model: https://huggingface.co/dk2325/whisper-tiny-indian-accent

Uses

Direct Use

Use this model for automatic speech recognition on:

Indian-accent English lectures
Educational audio
General English short-form audio where accent robustness is important

Downstream Use

Can be used in:

Lecture transcription pipelines
Subtitle generation workflows
Voice-note to text systems for Indian English speakers

Out-of-Scope Use

Not intended for:

Non-English transcription
Medical, legal, or safety-critical transcription without human review
Speaker identification, emotion recognition, or biometric tasks
Noisy far-field audio without additional denoising/domain adaptation

Bias, Risks, and Limitations

Performance may vary across regional accents, speaker age groups, and recording setups.
Accuracy can degrade on heavy background noise, overlapping speech, or code-switching.
Domain adaptation can reduce performance on accents/domains far from training data.
Model outputs should be human-validated for high-stakes scenarios.

Recommendations

Use confidence-aware post-processing and human review for important transcripts.
Evaluate on your own target domain before production deployment.
Consider mixed-domain continued training if your data differs significantly.
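One way to make post-processing confidence-aware is to borrow two heuristics that the original openai-whisper `transcribe` function uses: a zlib compression-ratio check (default threshold 2.4), which catches repetition loops, and an average token log-probability check (default threshold -1.0). The sketch below assumes you already have an average log-probability per segment. One possible source is `generate(..., output_scores=True, return_dict_in_generate=True)` followed by `compute_transition_scores`, but that extraction step is not shown here. The thresholds are starting points, not values tuned for this model, and `needs_review` is a hypothetical helper.

```python
import zlib

COMPRESSION_RATIO_THRESHOLD = 2.4  # openai-whisper default for repetition detection
AVG_LOGPROB_THRESHOLD = -1.0       # openai-whisper default log-probability threshold


def compression_ratio(text):
    """Raw UTF-8 size divided by zlib-compressed size; high values mean repetitive text."""
    data = text.encode("utf-8")
    return len(data) / len(zlib.compress(data))


def needs_review(text, avg_logprob):
    """Return reasons a segment should go to a human reviewer (empty list if none)."""
    reasons = []
    if not text.strip():
        reasons.append("empty transcript")
    elif compression_ratio(text) > COMPRESSION_RATIO_THRESHOLD:
        reasons.append("repetitive output")
    if avg_logprob < AVG_LOGPROB_THRESHOLD:
        reasons.append("low model confidence")
    return reasons
```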

How to Get Started with the Model

from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="dk2325/whisper-tiny-indian-accent",
    device=-1  # set to 0 for CUDA if available
)

# whisper-tiny.en is English-only, so no language/task arguments are needed;
# recent Transformers versions may raise an error if they are passed.
result = asr("path/to/audio.wav")

print(result["text"])
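When you pass a file path, the pipeline decodes the audio with ffmpeg, so ffmpeg must be installed. The pipeline also accepts a dict with `"raw"` (a NumPy array) and `"sampling_rate"` keys instead. The stdlib-only sketch below reads a 16-bit PCM WAV file and down-mixes it to mono floats. `load_wav_mono` is a hypothetical helper and does not resample. If the audio is not already 16 kHz, pass its true `sampling_rate` and the pipeline resamples it, provided torchaudio is installed.

```python
import struct
import wave


def load_wav_mono(source):
    """Read a 16-bit PCM WAV (path or binary file object).

    Returns (samples, sample_rate), where samples is a list of floats in
    [-1.0, 1.0) averaged across channels. No resampling is performed.
    """
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM WAV is supported in this sketch")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    count = len(frames) // 2
    # WAV stores PCM samples little-endian, interleaved by channel.
    ints = struct.unpack(f"<{count}h", frames)
    samples = [
        sum(ints[i:i + channels]) / channels / 32768.0
        for i in range(0, count, channels)
    ]
    return samples, rate
```

For example, call `samples, rate = load_wav_mono("lecture.wav")` and then `asr({"raw": np.asarray(samples, dtype=np.float32), "sampling_rate": rate}, chunk_length_s=30)`. Setting `chunk_length_s=30` enables chunked long-form transcription for recordings longer than Whisper's 30-second window.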

Training Details

Training Data

The initial fine-tuning and evaluation workflow used English speech data in a LibriSpeech-style setup.
Indian-accent adaptation used Indian English speech samples from:
- swastik17/nptel_109106147

Training Procedure

Preprocessing

Audio was resampled to 16 kHz and converted to log-Mel input features with the Whisper feature extractor.
Transcript text was normalized and tokenized through the Whisper tokenizer/processor workflow.
A standard speech-to-text data collator and a sequence-to-sequence training stack were used.
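A common pattern in Transformers-based Whisper fine-tuning is to pad label sequences with `-100`. This is the default `ignore_index` of PyTorch's cross-entropy loss, so padded positions add nothing to the loss. The same pattern also drops a leading decoder start token if every sequence has one, because the model prepends that token itself when it shifts labels right. The plain-Python sketch below illustrates the idea. `pad_labels` is a hypothetical helper, and the project's actual collator may differ.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def pad_labels(label_batch, decoder_start_token_id=None):
    """Right-pad token-id lists to equal length with IGNORE_INDEX."""
    if decoder_start_token_id is not None and all(
        seq and seq[0] == decoder_start_token_id for seq in label_batch
    ):
        # The model prepends the start token when shifting labels right,
        # so keeping it here would duplicate it.
        label_batch = [seq[1:] for seq in label_batch]
    max_len = max(len(seq) for seq in label_batch)
    return [seq + [IGNORE_INDEX] * (max_len - len(seq)) for seq in label_batch]
```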

Training Hyperparameters

Training regime: fp16 mixed precision
Learning rate: 1e-5 (conservative for adaptation stability)
Optimizer: AdamW
Regularization approach: low LR + weight decay + controlled adaptation duration
Gradient accumulation: used to make training feasible within limited VRAM
Hardware context: consumer GPU with 4 GB VRAM
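Gradient accumulation trades time for memory. Gradients from several small micro-batches are summed before each optimizer step, so the effective batch size is the per-device batch size times the number of accumulation steps (times the number of devices). The toy sketch below uses a one-parameter linear model with mean squared error. It checks that averaging the gradients of equal-sized micro-batches reproduces the full-batch gradient. The batch sizes are hypothetical because the card does not publish the values used. With unequal micro-batch sizes, the match is only approximate.

```python
def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    """Number of examples contributing to each optimizer step."""
    return per_device_batch * accumulation_steps * num_devices


def mse_grad(w, batch):
    """d/dw of mean((w * x - y) ** 2) over (x, y) pairs."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, batch, accumulation_steps):
    """Average gradients over equal-sized micro-batches, mirroring
    dividing each micro-batch loss by accumulation_steps before backward()."""
    if len(batch) % accumulation_steps:
        raise ValueError("batch must split into equal micro-batches")
    size = len(batch) // accumulation_steps
    total = 0.0
    for step in range(accumulation_steps):
        micro_batch = batch[step * size:(step + 1) * size]
        total += mse_grad(w, micro_batch) / accumulation_steps
    return total
```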

Speeds, Sizes, Times

Trained in a resource-constrained local environment.
Exact wall-clock and throughput logs were not fully standardized for publication.

Evaluation

Testing Data, Factors and Metrics

Testing Data

Internal project validation split (LibriSpeech-style validation setup)
Small-sample Indian-accent checks on examples loaded from the dataset in streaming mode

Factors

Baseline vs fine-tuned comparison
General English validation performance
Accent-domain qualitative behavior on Indian English samples

Metrics

Word Error Rate (WER)
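WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. Lower is better, and values above 1.0 are possible. Whisper outputs cased, punctuated text, so the normalization applied before scoring strongly affects the number. WERs are only comparable under the same normalization. The sketch below uses a simple lowercase-and-strip-punctuation normalizer. That normalizer is an assumption, not the project's documented method. Evaluate's `wer` metric (listed under Software) computes the same ratio on whatever strings it receives, so normalization remains the caller's choice.

```python
import re


def normalize(text):
    """Lowercase, replace punctuation (except apostrophes) with spaces, split into words."""
    return re.sub(r"[^\w\s']", " ", text.lower()).split()


def wer(reference, hypothesis):
    """Word error rate via Levenshtein distance over normalized words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)
```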

Results

General English validation (1% sample of the validation split):
- Base model WER: 0.2806
- Fine-tuned model WER: 0.0586
Indian-accent small-sample check:
- Transcription quality on Indian-accent samples improved in qualitative review and quick quantitative spot checks; no aggregate WER is reported for this set.
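Expressed as absolute and relative reductions, the general-validation numbers above work out as shown below. This is arithmetic only. With a 1% sample, the estimate carries sampling uncertainty that is not quantified here.

```latex
\[
\Delta_{\text{abs}} = 0.2806 - 0.0586 = 0.2220,
\qquad
\Delta_{\text{rel}} = \frac{0.2220}{0.2806} \approx 0.791
\]
```

That is roughly a 79% relative WER reduction, or 22.2 percentage points absolute.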

Summary

The adaptation phase reduced WER on the 1% general validation sample from 0.2806 to 0.0586 and showed better handling of Indian-accent speech in small-sample checks, using a training strategy suited to low-resource hardware.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator: https://mlco2.github.io/impact#compute

Hardware Type: Local consumer GPU (4GB VRAM class)
Hours used: Not precisely tracked
Cloud Provider: N/A (local training)
Compute Region: N/A
Carbon Emitted: Not measured
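Training hours and power draw were not tracked, so no emissions figure can be reported. A common back-of-the-envelope formula gives a rough estimate: energy (kWh) = average power draw (kW) × hours × PUE, and emissions = energy × grid carbon intensity (kg CO2eq/kWh). The inputs in the example are hypothetical placeholders, not measurements from this project. Defaulting PUE to 1.0 is an assumption for local training without datacenter overhead.

```python
def estimate_co2_kg(avg_power_watts, hours, grid_kg_co2_per_kwh, pue=1.0):
    """Rough CO2eq estimate in kg from average power draw and runtime."""
    if min(avg_power_watts, hours, grid_kg_co2_per_kwh, pue) < 0:
        raise ValueError("inputs must be non-negative")
    energy_kwh = avg_power_watts / 1000.0 * hours * pue
    return energy_kwh * grid_kg_co2_per_kwh
```

For example, a hypothetical 75 W average draw for 10 hours on a grid at 0.7 kg CO2eq/kWh gives 0.75 kWh × 0.7, or about 0.525 kg.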

Technical Specifications

Model Architecture and Objective

Architecture: Whisper Tiny English encoder-decoder transformer
Objective: Sequence-to-sequence speech transcription
Adaptation goal: Improve robustness on Indian-accent English while retaining base English ASR capability
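Whisper's fixed input window explains why long lectures need chunking. The feature extractor pads or trims audio to 30 seconds at 16 kHz and computes log-Mel frames with a hop of 160 samples (10 ms). Whisper Tiny uses 80 Mel bins, and the encoder's second convolution has stride 2, which halves the frame count. The sketch below reproduces that arithmetic. The window-count helper ignores the overlap (stride) that chunked pipeline inference adds, so real chunk counts will be somewhat higher.

```python
import math

SAMPLE_RATE = 16_000   # Hz expected by the Whisper feature extractor
CHUNK_SECONDS = 30     # fixed input window
HOP_LENGTH = 160       # samples between log-Mel frames (10 ms)
N_MELS = 80            # Mel bins used by Whisper Tiny


def whisper_input_shape():
    """Samples per window, log-Mel feature shape, and encoder sequence length."""
    n_samples = SAMPLE_RATE * CHUNK_SECONDS   # 480,000 samples
    n_frames = n_samples // HOP_LENGTH        # 3,000 log-Mel frames
    encoder_positions = n_frames // 2         # 1,500 after the stride-2 conv
    return n_samples, (N_MELS, n_frames), encoder_positions


def n_windows(duration_seconds):
    """Non-overlapping 30 s windows needed to cover a recording."""
    return math.ceil(duration_seconds / CHUNK_SECONDS)
```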

Compute Infrastructure

Hardware

Local machine
GPU: 4GB VRAM class

Software

Python
Hugging Face Transformers
PyTorch
Datasets
Evaluate

Citation

BibTeX

@misc{dk2325_whisper_tiny_indian_accent_2026,
  title={Whisper Tiny Indian Accent Adaptation},
  author={DK2325},
  year={2026},
  howpublished={\url{https://huggingface.co/dk2325/whisper-tiny-indian-accent}}
}

APA

DK2325. (2026). Whisper Tiny Indian Accent Adaptation. Hugging Face. https://huggingface.co/dk2325/whisper-tiny-indian-accent

More Information

This model was developed as a practical end-to-end ASR fine-tuning and deployment project under tight hardware constraints, with a focus on measurable improvement and a reproducible workflow.

Model Card Authors

DK2325

Model Card Contact

Contact via the Hugging Face profile: https://huggingface.co/dk2325



Checkpoint Details

Format: Safetensors
Model size: 37.8M params
Tensor type: F32