CDLI Parakeet TDT 0.6B English Fine-Tune (Uganda + Kenya)

This repository contains a NeMo ASR model fine-tuned from nvidia/parakeet-tdt-0.6b-v2 on a merged atypical-English training mix from Uganda and Kenya.

The setup mirrors the standalone Ugandan English Parakeet notebook, but extends training to both countries while keeping model-selection validation on Uganda only. Test evaluation is then reported separately for Uganda and Kenya.

Model Details

Base model: nvidia/parakeet-tdt-0.6b-v2
Fine-tuning framework: NVIDIA NeMo
Language: English
Acoustic model family: FastConformer-TDT / RNNT-BPE
Output text: lower-case English transcription with standard ASR normalization

Datasets

Uganda: cdli/ugandan_english_nonstandard_speech_v1.0
Kenya: cdli/kenyan_english_nonstandard_speech_v1.0
License: cc-by-sa-4.0
Audio sampling rate: 16 kHz

Split sizes used in this run:

Uganda train: 5175
Uganda validation: 638
Uganda test: 1016
Kenya train: 4374
Kenya validation: 542
Kenya test: 928
Merged train total: 9549 rows, 56.46 hours
Validation selection set: Uganda only, 638 rows, 4.33 hours

The underlying CDLI datasets include atypical or non-standard speech and speaker metadata such as severity of speech impairment, disorder type, age, gender, and etiology.
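The merged training set combines the two countries' training splits. Training Configuration below lists duration bounds (0.2 s to 40.0 s). The card does not say whether the row counts above are taken before or after that filter. Below is a minimal sketch of the merge step. It assumes NeMo-style JSON-lines manifests with `audio_filepath`, `duration`, and `text` fields. The function names and toy entries are hypothetical and are not taken from the original notebook.

```python
import json
from typing import Iterable

# Duration bounds from the Training Configuration section (seconds).
MIN_DURATION_S = 0.2
MAX_DURATION_S = 40.0


def merge_manifests(*manifests: Iterable[dict]) -> list:
    """Concatenate per-country manifest entries, dropping clips outside the duration bounds."""
    merged = []
    for entries in manifests:
        for entry in entries:
            if MIN_DURATION_S <= entry["duration"] <= MAX_DURATION_S:
                merged.append(entry)
    return merged


def total_hours(entries: list) -> float:
    """Sum clip durations and convert seconds to hours."""
    return sum(entry["duration"] for entry in entries) / 3600.0


def to_jsonl(entries: list) -> str:
    """Serialize entries as NeMo-style JSON lines (one object per line)."""
    return "".join(json.dumps(entry) + "\n" for entry in entries)


# Toy entries for illustration only; real manifests point at the CDLI audio files.
uganda = [
    {"audio_filepath": "ug/0001.wav", "duration": 5.0, "text": "good morning"},
    {"audio_filepath": "ug/0002.wav", "duration": 0.1, "text": "hi"},  # too short, dropped
]
kenya = [
    {"audio_filepath": "ke/0001.wav", "duration": 31.0, "text": "thank you"},
    {"audio_filepath": "ke/0002.wav", "duration": 45.0, "text": "long clip"},  # too long, dropped
]

merged = merge_manifests(uganda, kenya)
print(len(merged), f"{total_hours(merged):.2f} h")  # prints: 2 0.01 h
```

The function applies the same filter to both countries, so neither country's long or very short clips get special treatment.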

Training Configuration

Work root: /jupyter_kernel/parakeet_cdli_en_ug_ke
Train mix: Uganda + Kenya
Primary validation country: Uganda
Max manifest audio length: 40.0 s
Max training audio length: 40.0 s
Min audio length: 0.2 s
Train batch size: 8
Eval batch size: 8
Gradient accumulation steps: 8
Effective train batch size: 64 (train batch size 8 × 8 gradient accumulation steps)
Learning rate: 5e-5
Weight decay: 1e-3
Warmup steps: 100
Scheduler: CosineAnnealing
Max steps: 20000
Validation interval: 200 steps
Early stopping patience: 10
Precision: bf16-mixed when supported, otherwise mixed precision fallback
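The batch, schedule, and early-stopping values above determine how the run progresses. The sketch below reconstructs them. It assumes a single device, a final learning rate of 0, and a validation interval counted in the same step units as max steps. None of these three points is stated in the card. The warmup-then-cosine formula is a generic version and may differ in detail from NeMo's `CosineAnnealing` implementation. All names are hypothetical.

```python
import math

# Values from the Training Configuration list above.
TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 8
NUM_DEVICES = 1          # assumption: single-GPU run
PEAK_LR = 5e-5
WARMUP_STEPS = 100
MAX_STEPS = 20_000
MIN_LR = 0.0             # assumption: the final learning rate is not reported
VAL_INTERVAL_STEPS = 200
PATIENCE = 10


def effective_batch_size() -> int:
    """Number of samples contributing to each optimizer update."""
    return TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to MIN_LR at MAX_STEPS."""
    if step < WARMUP_STEPS:
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = min((step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS), 1.0)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


def steps_without_improvement_before_stop() -> int:
    """Early stopping fires after PATIENCE consecutive validation checks with no improvement."""
    return VAL_INTERVAL_STEPS * PATIENCE


print(effective_batch_size())                   # 64
print(f"{lr_at(WARMUP_STEPS):.1e}")             # 5.0e-05 (peak, end of warmup)
print(steps_without_improvement_before_stop())  # 2000
```

Under these assumptions, training stops well before 20,000 steps if the Uganda validation WER plateaus for 2,000 steps.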

Evaluation

Evaluation was run separately on the held-out Uganda and Kenya test splits using both raw transcript comparison and normalized transcript comparison.
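Raw and normalized scores differ only in whether the text is normalized before alignment. The card does not specify the normalizer. The sketch below therefore assumes a simple one that lower-cases, strips ASCII punctuation, and collapses whitespace. The reported numbers may come from a richer normalizer, so this code would not reproduce them exactly. `normalize`, `edit_distance`, and `error_rate` are hypothetical helper names.

```python
import re
import string

# Hypothetical normalizer: the card does not specify the exact normalization used.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lower-case, strip ASCII punctuation, and collapse runs of whitespace."""
    text = text.lower().translate(_PUNCT_TABLE)
    return re.sub(r"\s+", " ", text).strip()


def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution, or match at zero cost
            )
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="word", normalized=False) -> float:
    """Corpus-level WER (unit='word') or CER (unit='char'): total edits / total reference units."""
    edits = total = 0
    for ref, hyp in zip(refs, hyps):
        if normalized:
            ref, hyp = normalize(ref), normalize(hyp)
        r = ref.split() if unit == "word" else list(ref)
        h = hyp.split() if unit == "word" else list(hyp)
        edits += edit_distance(r, h)
        total += len(r)
    return edits / total if total else 0.0


refs = ["Good morning, doctor."]
hyps = ["good morning doctor"]
print(error_rate(refs, hyps))                   # 1.0: every token differs in case or punctuation
print(error_rate(refs, hyps, normalized=True))  # 0.0
```

This toy case shows why raw WER runs several points above normalized WER on both test splits: casing and punctuation mismatches count as full word errors until they are normalized away.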

Uganda Test Set

Raw WER: 27.58%
Raw CER: 14.17%
Normalized WER: 21.33%
Normalized CER: 12.81%
Average normalized utterance WER (capped at 1.0): 20.77%
Average normalized utterance CER (capped at 1.0): 12.89%

Kenya Test Set

Raw WER: 26.36%
Raw CER: 11.78%
Normalized WER: 14.56%
Normalized CER: 9.46%
Average normalized utterance WER (capped at 1.0): 14.49%
Average normalized utterance CER (capped at 1.0): 9.20%
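The "average normalized utterance" rows are presumably a different aggregate from the corpus-level rows. Each utterance gets its own error rate, clipped at 1.0 so that one insertion-heavy hypothesis cannot dominate, and the clipped values are averaged with equal weight per utterance. Corpus-level WER instead sums edits over all utterances and divides by the total number of reference words, which gives long utterances more weight. The sketch below works on already-normalized, whitespace-tokenized text. The helper names are hypothetical.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between token lists (single-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (r != h))
        prev = curr
    return prev[-1]


def utterance_wer(ref: str, hyp: str, cap: float = 1.0) -> float:
    """Per-utterance WER, clipped at `cap`."""
    r, h = ref.split(), hyp.split()
    if not r:
        return 0.0 if not h else cap  # assumption: how empty references were scored is not stated
    return min(edit_distance(r, h) / len(r), cap)


def mean_capped_wer(refs, hyps, cap: float = 1.0) -> float:
    """Unweighted mean of clipped per-utterance WERs."""
    scores = [utterance_wer(r, h, cap) for r, h in zip(refs, hyps)]
    return sum(scores) / len(scores) if scores else 0.0


def corpus_wer(refs, hyps) -> float:
    """Total edits divided by total reference words."""
    edits = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return edits / words


refs = ["yes", "please call my mother now"]
hyps = ["no no no", "please call my brother now"]
print(round(mean_capped_wer(refs, hyps), 4))  # 0.6    = (min(3/1, 1) + 1/5) / 2
print(round(corpus_wer(refs, hyps), 4))       # 0.6667 = (3 + 1) / (1 + 5)
```

On the real test splits, the two aggregates land close together (for example, 14.56% vs 14.49% WER on Kenya). That suggests few utterances hit the cap.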

Uganda and Kenya Comparison

The same shared model performs materially better on the Kenyan test split than on the Ugandan test split.

Kenya vs Uganda normalized WER: 14.56% vs 21.33%
Kenya vs Uganda normalized CER: 9.46% vs 12.81%
Absolute normalized WER gap: 6.77 points in favor of Kenya

Relative to the Uganda-only English Parakeet 0.6B run (KasuleTrevor/cdli-parakeet-en-finetune), the Uganda score in this mixed Uganda+Kenya run improved slightly:

Uganda-only normalized WER: 21.72%
UG+KE model on Uganda normalized WER: 21.33%
Absolute Uganda improvement: 0.39 WER points

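Adding Kenyan training data therefore did not hurt Uganda test performance. The 0.39-point gain is small, however, and may fall within run-to-run variance. The lower Kenya error rates suggest the shared model generalizes well to the second country. This card reports no Uganda-only baseline on the Kenya test split, so the Kenya benefit of mixed training cannot be measured directly.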
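The absolute gaps above can be restated as relative differences. The sketch below uses only the normalized WER values reported in this card. Because those inputs are rounded percentages, the relative figures are approximate. The function names are hypothetical.

```python
def absolute_gap(worse: float, better: float) -> float:
    """Difference in percentage points."""
    return worse - better


def relative_reduction(baseline: float, new: float) -> float:
    """How much lower `new` is than `baseline`, as a percentage of `baseline`."""
    return (baseline - new) / baseline * 100.0


# Normalized WER (%) values reported in this card.
UG_MIXED, KE_MIXED, UG_ONLY = 21.33, 14.56, 21.72

print(f"Kenya vs Uganda: {absolute_gap(UG_MIXED, KE_MIXED):.2f} points, "
      f"{relative_reduction(UG_MIXED, KE_MIXED):.1f}% relative")  # 6.77 points, 31.7% relative
print(f"Uganda-only vs UG+KE on Uganda: {absolute_gap(UG_ONLY, UG_MIXED):.2f} points, "
      f"{relative_reduction(UG_ONLY, UG_MIXED):.1f}% relative")   # 0.39 points, 1.8% relative
```

In relative terms, Kenya's normalized WER is roughly a third lower than Uganda's. The mixed run's improvement on Uganda is under 2% relative.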

Error Analysis Highlights

Per-country grouped analysis files were produced for:

severity
disorder type
etiology
speaker

High-level patterns from the notebook outputs:

Uganda showed the expected severity gradient, with mild speakers performing best and severe speakers worst.
Kenya also showed a severity gradient, and its error rates were lower than Uganda's in all three severity bands.
In Kenya, the best-performing disorder bucket was Dysphonia and the hardest was the combined "Stuttering (Disfluency Disorders), Dysarthria" bucket.
In Uganda, Voice disorder had the highest mean WER among the reported disorder groups.
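The grouped breakdowns are averages of a per-utterance score within each metadata value, such as severity, disorder type, etiology, or speaker. The sketch below uses only the standard library. It assumes a scored-predictions CSV with a grouping column and a per-utterance `wer` column. Those column names are hypothetical, and the real files may name them differently.

```python
import csv
import io
from collections import defaultdict


def mean_by_group(rows, group_col: str, score_col: str = "wer") -> dict:
    """Average a per-utterance score within each value of `group_col`, sorted by group name."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = row[group_col] or "unknown"  # blank metadata goes into its own bucket
        totals[key] += float(row[score_col])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in sorted(totals)}


# Toy stand-in for a scored predictions CSV; real column names may differ.
toy_csv = io.StringIO(
    "utt_id,severity,wer\n"
    "a,mild,0.10\n"
    "b,mild,0.20\n"
    "c,severe,0.60\n"
    "d,,0.30\n"
)
rows = list(csv.DictReader(toy_csv))
for group, wer in mean_by_group(rows, "severity").items():
    print(f"{group:>8}: {wer:.2f}")
```

To use a real file, replace the toy buffer with `open("test_predictions_scored_uganda.csv", newline="", encoding="utf-8")` and pass `csv.DictReader` the actual column names. Group means weight every utterance equally, so a small group with a few very poor utterances can rank as the hardest bucket.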

Usage

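```python
from nemo.collections.asr.models import ASRModel

# Download the fine-tuned checkpoint from the Hugging Face Hub.
model = ASRModel.from_pretrained("KasuleTrevor/cdli-parakeet-en-ug-ke")

# Input: 16 kHz mono WAV files.
predictions = model.transcribe(["path/to/audio.wav"])

# Newer NeMo versions return Hypothesis objects with a .text field; older ones return plain strings.
print(predictions[0].text if hasattr(predictions[0], "text") else predictions[0])
```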
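The training audio is 16 kHz (see Datasets), and Parakeet models expect single-channel input. A quick pre-flight check can therefore catch mismatched files before transcription. The sketch below uses the standard-library `wave` module, which reads only uncompressed PCM WAV files. `check_wav` is a hypothetical helper, and converting files that fail the check is left to your audio tooling.

```python
import wave

EXPECTED_RATE_HZ = 16_000  # sampling rate of the CDLI training audio


def check_wav(path: str) -> None:
    """Raise ValueError unless `path` is a 16 kHz, single-channel PCM WAV file."""
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
    if rate != EXPECTED_RATE_HZ or channels != 1:
        raise ValueError(
            f"{path}: {rate} Hz, {channels} channel(s); expected {EXPECTED_RATE_HZ} Hz mono"
        )
```

For example, call `check_wav("path/to/audio.wav")` before passing the path to `model.transcribe`.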

Files

EN-PARAKEET-TDT-UG-KE.nemo: exported NeMo checkpoint
results/EN-PARAKEET-TDT-UG-KE/: uploaded result tables and breakdown CSVs, including:
train_mix_summary_ug_ke.json
country_summary_ug_ke.csv
test_predictions_scored_uganda.csv
test_predictions_scored_kenya.csv
severity_breakdown_combined_ug_ke.csv
disorder_breakdown_combined_ug_ke.csv
etiology_breakdown_combined_ug_ke.csv

Notes

Validation for early stopping and checkpoint selection was Uganda-only, even though training used both Uganda and Kenya.
Source datasets are gated. Review the dataset terms before requesting access.
Result artifacts for both countries were uploaded to the Hub under results/EN-PARAKEET-TDT-UG-KE/.
