Model Card for dk2325/whisper-tiny-indian-accent

Whisper Tiny English model adapted for improved robustness on Indian-accent English speech, while retaining general English ASR performance.

Model Details

Model Description

This model is a domain-adapted ASR checkpoint built from Whisper Tiny English for better transcription quality on Indian-accent English audio.
It was fine-tuned in a resource-constrained local setup (a consumer GPU with 4GB of VRAM) and then published on the Hugging Face Hub.

  • Developed by: DK2325
  • Funded by: Self-funded personal project
  • Shared by: DK2325
  • Model type: Seq2Seq speech-to-text (Whisper)
  • Language(s): English (with focus on Indian-accent English)
  • License: Apache-2.0 (inherits upstream base model licensing)
  • Finetuned from model: openai/whisper-tiny.en

Model Sources

  • Repository: https://huggingface.co/dk2325/whisper-tiny-indian-accent

Uses

Direct Use

Use this model for automatic speech recognition on:

  • Indian-accent English lectures
  • Educational audio
  • General English short-form audio where accent robustness is important

Downstream Use

The model can be used as a component in:

  • Lecture transcription pipelines
  • Subtitle generation workflows
  • Voice-note to text systems for Indian English speakers
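
As a rough illustration of the subtitle use case, the sketch below converts timestamped transcript chunks into SRT text. It assumes the Transformers ASR pipeline was called with return_timestamps=True, which adds a "chunks" list of {"timestamp": (start, end), "text": ...} entries to the result. The helper names and the example chunks are hypothetical and not part of this project.

```python
def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks: list) -> str:
    """Convert timestamped chunks into SRT subtitle text."""
    blocks = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        if end is None:
            # The final chunk can lack an end time; fall back to the start time.
            end = start
        text = chunk["text"].strip()
        blocks.append(
            f"{index}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n"
        )
    return "\n".join(blocks)


# Hypothetical chunks, shaped like result["chunks"] from the pipeline.
example_chunks = [
    {"timestamp": (0.0, 2.5), "text": " Welcome to the lecture."},
    {"timestamp": (2.5, 6.04), "text": " Today we discuss heat transfer."},
]
srt_text = chunks_to_srt(example_chunks)
print(srt_text)
```

Keeping the formatting step separate from transcription makes it easy to correct the text by hand before the subtitle file is written.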

Out-of-Scope Use

Not intended for:

  • Non-English transcription
  • Medical, legal, or safety-critical transcription without human review
  • Speaker identification, emotion recognition, or biometric tasks
  • Noisy far-field audio without additional denoising/domain adaptation

Bias, Risks, and Limitations

  • Performance may vary across different Indian regions, age groups, and recording setups.
  • Accuracy can degrade with heavy background noise, overlapping speech, or code-switching.
  • Domain adaptation can reduce performance on accents/domains far from training data.
  • Model outputs should be human-validated for high-stakes scenarios.

Recommendations

  • Use confidence-aware post-processing and human review for important transcripts.
  • Evaluate on your own target domain before production deployment.
  • Consider mixed-domain continued training if your data differs significantly.
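
The first recommendation could be sketched as follows, assuming each transcript segment comes with a confidence score such as an average token log-probability. This model card does not say how such scores are obtained, so the Segment structure, its field names, and the threshold value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    avg_logprob: float  # hypothetical confidence score; higher means more confident


def flag_for_review(segments, threshold=-1.0):
    """Return the segments whose confidence falls below the threshold."""
    return [s for s in segments if s.avg_logprob < threshold]


# Hypothetical segments with made-up scores, for illustration only.
segments = [
    Segment("the second law of thermodynamics", -0.21),
    Segment("entropy of the sistem increases", -1.37),
]
needs_review = flag_for_review(segments)
for s in needs_review:
    print(f"REVIEW ({s.avg_logprob:.2f}): {s.text}")
```

The threshold is illustrative; in practice it would be tuned on transcripts from the target domain.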

How to Get Started with the Model

from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="dk2325/whisper-tiny-indian-accent",
    device=-1  # set to 0 for CUDA if available
)

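# The base checkpoint (whisper-tiny.en) is English-only, so no language or task
# arguments are needed; recent transformers releases may raise an error if
# "language" or "task" is passed to an English-only Whisper model.
result = asr("path/to/audio.wav")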

print(result["text"])

Training Details

Training Data

  • The base fine-tuning and evaluation workflow used English speech data in a LibriSpeech-style setup within the project pipeline.
  • Indian-accent adaptation used Indian English speech samples from:
    • swastik17/nptel_109106147

Training Procedure

Preprocessing

  • Audio was resampled to 16 kHz (the rate Whisper expects) and converted to input features with the Whisper feature extractor.
  • Transcripts were normalized and tokenized through the Whisper tokenizer/processor workflow.
  • A standard ASR data collator and the sequence-to-sequence training stack were used.
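
To make the audio side of preprocessing concrete: Whisper consumes fixed 30-second windows of 16 kHz audio, so each clip is zero-padded or truncated to 480,000 samples before being turned into a log-Mel spectrogram (80 Mel bins by 3,000 frames for Whisper Tiny). The feature extractor does this internally; the pure-Python sketch below is only a stand-in to show the shapes involved, and the function name and example clip are hypothetical.

```python
SAMPLE_RATE = 16_000      # Hz, the rate Whisper expects
CHUNK_SECONDS = 30        # Whisper processes fixed 30-second windows
HOP_LENGTH = 160          # samples between successive spectrogram frames
N_MELS = 80               # Mel bins used by Whisper Tiny

N_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS   # 480000 samples per window
N_FRAMES = N_SAMPLES // HOP_LENGTH        # 3000 frames per window


def pad_or_trim(samples, length=N_SAMPLES):
    """Zero-pad or truncate a list of samples to exactly `length` items."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


clip = [0.1] * (SAMPLE_RATE * 5)          # a hypothetical 5-second clip
window = pad_or_trim(clip)
print(len(window), (N_MELS, N_FRAMES))
```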

Training Hyperparameters

  • Training regime: fp16 mixed precision
  • Learning rate: 1e-5 (conservative for adaptation stability)
  • Optimizer: AdamW
  • Regularization approach: low LR + weight decay + controlled adaptation duration
  • Gradient accumulation: used (for low-VRAM feasibility)
  • Hardware context: consumer GPU with 4GB VRAM constraints
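
The settings above can be collected into a configuration sketch. Only the learning rate, fp16, and the AdamW optimizer come from this model card; the weight decay, per-device batch size, and accumulation steps are hypothetical placeholders, since the exact values are not published. The example also shows how gradient accumulation yields a larger effective batch size on a single small GPU.

```python
training_config = {
    "learning_rate": 1e-5,                 # from the model card
    "fp16": True,                          # from the model card
    "optim": "adamw_torch",                # AdamW, as stated in the model card
    "weight_decay": 0.01,                  # hypothetical value
    "per_device_train_batch_size": 4,      # hypothetical value
    "gradient_accumulation_steps": 8,      # hypothetical value
}

# On one GPU, gradients from several small batches are summed before each
# optimizer step, so the effective batch size is the product of the two.
effective_batch_size = (
    training_config["per_device_train_batch_size"]
    * training_config["gradient_accumulation_steps"]
)
print(effective_batch_size)
```

The keys mirror field names of Seq2SeqTrainingArguments in Transformers, so a dictionary like this could be unpacked into it alongside an output directory.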

Speeds, Sizes, Times

  • Trained in a resource-constrained local environment.
  • Exact wall-clock time and throughput were not logged consistently enough to report.

Evaluation

Testing Data, Factors and Metrics

Testing Data

  • Internal project validation split (LibriSpeech-style validation setup)
  • Small-sample Indian-accent checks on examples streamed from the dataset

Factors

  • Baseline vs fine-tuned comparison
  • General English validation performance
  • Accent-domain qualitative behavior on Indian English samples

Metrics

  • Word Error Rate (WER)
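
For reference, WER is the word-level edit distance (substitutions, deletions, and insertions) between the hypothesis and the reference, divided by the number of reference words. The project lists the Evaluate library under software, which likely provided the metric; the dependency-free sketch below is an illustrative equivalent, not the project's evaluation code, and it applies no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute WER as word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,          # deletion (reference word missing)
                curr[j - 1] + 1,      # insertion (extra hypothesis word)
                prev[j - 1] + cost,   # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sentence pair: one deletion ("the") and one substitution.
score = word_error_rate("please open the lecture notes", "please open lecture note")
print(score)
```

Because casing and punctuation count as differences here, both reference and hypothesis should be normalized the same way before scoring.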

Results

  • General validation sample (quick test on a 1 percent subset):

    • Base model WER: 0.2806
    • Fine-tuned model WER: 0.0586
  • Indian-accent small-sample check:

    • Observed improvement trend in domain-specific transcription quality (qualitative and quick quantitative checks)
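
The two WER figures reported above correspond to an absolute reduction of about 0.222 and a relative reduction of roughly 79 percent. The short calculation below reproduces that arithmetic from the reported numbers only; it does not re-run any evaluation.

```python
base_wer = 0.2806        # base model WER reported above
finetuned_wer = 0.0586   # fine-tuned model WER reported above

absolute_reduction = base_wer - finetuned_wer
relative_reduction = absolute_reduction / base_wer

print(f"Absolute WER reduction: {absolute_reduction:.4f}")
print(f"Relative WER reduction: {relative_reduction:.1%}")
```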

Summary

The adaptation phase substantially improved ASR quality in project validation and showed better handling of Indian-accent speech, with a practical low-resource training strategy.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator: https://mlco2.github.io/impact#compute

  • Hardware Type: Local consumer GPU (4GB VRAM class)
  • Hours used: Not precisely tracked
  • Cloud Provider: N/A (local training)
  • Compute Region: N/A
  • Carbon Emitted: Not measured

Technical Specifications

Model Architecture and Objective

  • Architecture: Whisper Tiny English encoder-decoder transformer
  • Objective: Sequence-to-sequence speech transcription
  • Adaptation goal: Improve robustness on Indian-accent English while retaining base English ASR capability

Compute Infrastructure

Hardware

  • Local machine
  • GPU: 4GB VRAM class

Software

  • Python
  • Hugging Face Transformers
  • PyTorch
  • Datasets
  • Evaluate

Citation

BibTeX

@misc{dk2325_whisper_tiny_indian_accent_2026,
  title={Whisper Tiny Indian Accent Adaptation},
  author={DK2325},
  year={2026},
  howpublished={\url{https://huggingface.co/dk2325/whisper-tiny-indian-accent}}
}

APA

DK2325. (2026). Whisper Tiny Indian Accent Adaptation. Hugging Face. https://huggingface.co/dk2325/whisper-tiny-indian-accent

More Information

This model was developed as a practical end-to-end ASR fine-tuning and deployment project under tight hardware constraints, with focus on measurable improvement and reproducible workflow.

Model Card Authors

DK2325

Model Card Contact

Contact the author through the Hugging Face profile: https://huggingface.co/dk2325

