CDLI Parakeet TDT 0.6B English Fine-Tune (Uganda + Kenya)
This repository contains a NeMo ASR model fine-tuned from
nvidia/parakeet-tdt-0.6b-v2 on a merged atypical-English training mix from
Uganda and Kenya.
The setup mirrors the standalone Ugandan English Parakeet notebook, but extends training to both countries while keeping model-selection validation on Uganda only. Test evaluation is then reported separately for Uganda and Kenya.
Model Details
- Base model: nvidia/parakeet-tdt-0.6b-v2
- Fine-tuning framework: NVIDIA NeMo
- Language: English
- Acoustic model family: FastConformer-TDT / RNNT-BPE
- Output text: lower-case English transcription with standard ASR normalization
Datasets
- Uganda: cdli/ugandan_english_nonstandard_speech_v1.0
- Kenya: cdli/kenyan_english_nonstandard_speech_v1.0
- License: cc-by-sa-4.0
- Audio sampling rate: 16 kHz
Split sizes used in this run:
- Uganda train: 5175
- Uganda validation: 638
- Uganda test: 1016
- Kenya train: 4374
- Kenya validation: 542
- Kenya test: 928
- Merged train total: 9549 rows, 56.46 hours
- Validation selection set: Uganda only, 638 rows, 4.33 hours
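As a quick consistency check on the figures above, the merged train total can be recomputed from the per-country counts. The snippet below is only an illustrative sketch; the dictionary layout and variable names are not taken from the original notebook.

```python
# Split sizes as reported in this card (rows per country and split).
SPLITS = {
    "uganda": {"train": 5175, "validation": 638, "test": 1016},
    "kenya": {"train": 4374, "validation": 542, "test": 928},
}
MERGED_TRAIN_HOURS = 56.46  # reported duration of the merged train set

# The merged train set is the union of both countries' train splits.
merged_train_rows = sum(country["train"] for country in SPLITS.values())

# Mean utterance length implied by the reported rows and hours.
mean_train_seconds = MERGED_TRAIN_HOURS * 3600 / merged_train_rows

print(f"merged train rows: {merged_train_rows}")
print(f"mean train utterance length: {mean_train_seconds:.1f} s")
```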
The underlying CDLI datasets include atypical or non-standard speech and speaker metadata such as severity of speech impairment, disorder type, age, gender, and etiology.
Training Configuration
- Work root: /jupyter_kernel/parakeet_cdli_en_ug_ke
- Train mix: Uganda + Kenya
- Primary validation country: Uganda
- Max manifest audio length: 40.0 s
- Max training audio length: 40.0 s
- Min audio length: 0.2 s
- Train batch size: 8
- Eval batch size: 8
- Gradient accumulation steps: 8
- Effective train batch size: 64
- Learning rate: 5e-5
- Weight decay: 1e-3
- Warmup steps: 100
- Scheduler: CosineAnnealing
- Max steps: 20000
- Validation interval: 200 steps
- Early stopping patience: 10
- Precision: bf16-mixed when supported, otherwise mixed precision fallback
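The effective batch size and the shape of the learning-rate schedule follow from the values listed above. The sketch below is a generic linear-warmup plus cosine-decay curve in plain Python, not NeMo's CosineAnnealing implementation. It assumes a single device (which is what 8 x 8 = 64 implies) and a minimum learning rate of 0, which the card does not state.

```python
import math

TRAIN_BATCH_SIZE = 8
GRAD_ACCUM_STEPS = 8
BASE_LR = 5e-5
WARMUP_STEPS = 100
MAX_STEPS = 20000
MIN_LR = 0.0  # assumption: the card does not report a minimum learning rate


def effective_batch_size(per_step, accum_steps, num_devices=1):
    """Samples contributing to one optimizer update (num_devices=1 is assumed)."""
    return per_step * accum_steps * num_devices


def lr_at_step(step):
    """Linear warmup, then cosine decay (illustrative approximation only)."""
    if step < WARMUP_STEPS:
        return BASE_LR * (step + 1) / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, MAX_STEPS - WARMUP_STEPS)
    progress = min(progress, 1.0)
    return MIN_LR + 0.5 * (BASE_LR - MIN_LR) * (1 + math.cos(math.pi * progress))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
for step in (0, 50, 100, 10000, 20000):
    print(step, f"{lr_at_step(step):.2e}")
```

Because early stopping has a patience of 10 validation checks at 200-step intervals, a run may stop well before the 20000-step maximum, in which case the later part of the curve is never reached.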
Evaluation
Evaluation was run separately on the held-out Uganda and Kenya test splits using both raw transcript comparison and normalized transcript comparison.
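To make the raw versus normalized distinction concrete, here is a minimal, dependency-free sketch of corpus-level WER and CER. The `normalize` function is a hypothetical stand-in (lower-casing and punctuation removal); the exact normalization used in the notebook is not described in this card and may differ.

```python
import re


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def normalize(text):
    # Hypothetical normalization: lower-case, drop punctuation, collapse spaces.
    text = text.lower()
    text = re.sub(r"[^a-z0-9' ]+", " ", text)
    return " ".join(text.split())


def corpus_wer(refs, hyps):
    """Total word errors divided by total reference words."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    return errors / sum(len(r.split()) for r in refs)


def corpus_cer(refs, hyps):
    """Total character errors divided by total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return errors / sum(len(r) for r in refs)


# Toy strings for illustration only, not taken from the test sets.
refs = ["Hello, world!", "The cat sat."]
hyps = ["hello world", "the cat sat"]

raw_wer = corpus_wer(refs, hyps)
norm_refs = [normalize(r) for r in refs]
norm_hyps = [normalize(h) for h in hyps]
norm_wer = corpus_wer(norm_refs, norm_hyps)
norm_cer = corpus_cer(norm_refs, norm_hyps)
print(f"raw WER: {raw_wer:.2f}, normalized WER: {norm_wer:.2f}")
```

In this toy case every raw mismatch comes from casing or punctuation, so normalization removes all of them. That is the kind of effect that separates the raw and normalized scores reported below.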
Uganda Test Set
- Raw WER: 27.58%
- Raw CER: 14.17%
- Normalized WER: 21.33%
- Normalized CER: 12.81%
- Average normalized utterance WER (capped at 1.0): 20.77%
- Average normalized utterance CER (capped at 1.0): 12.89%
Kenya Test Set
- Raw WER: 26.36%
- Raw CER: 11.78%
- Normalized WER: 14.56%
- Normalized CER: 9.46%
- Average normalized utterance WER (capped at 1.0): 14.49%
- Average normalized utterance CER (capped at 1.0): 9.20%
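The "average utterance WER (capped at 1.0)" figures differ from the corpus-level WER because each utterance is scored on its own, clipped at 1.0, and then averaged with equal weight. The sketch below shows one plausible way to compute such a metric. The function names and the handling of empty references are assumptions, not the notebook's code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def utterance_wer(ref, hyp):
    ref_words, hyp_words = ref.split(), hyp.split()
    if not ref_words:
        # Assumed convention: empty reference scores 0 if hypothesis is empty, else 1.
        return 0.0 if not hyp_words else 1.0
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def mean_capped_wer(pairs, cap=1.0):
    """Average of per-utterance WER, each clipped to `cap`."""
    scores = [min(utterance_wer(ref, hyp), cap) for ref, hyp in pairs]
    return sum(scores) / len(scores)


def corpus_wer(pairs):
    """Total word errors over total reference words."""
    errors = sum(edit_distance(ref.split(), hyp.split()) for ref, hyp in pairs)
    return errors / sum(len(ref.split()) for ref, _ in pairs)


# Toy pairs for illustration only: one short utterance with heavy insertions.
pairs = [
    ("yes", "yes please thank you"),
    ("the quick brown fox", "the quick brown fox"),
]
print(f"mean capped utterance WER: {mean_capped_wer(pairs):.2f}")
print(f"corpus WER: {corpus_wer(pairs):.2f}")
```

Capping limits the influence of short utterances with many inserted words, which can otherwise score far above 100% on their own.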
Uganda and Kenya Comparison
The same shared model performs materially better on the Kenyan test split than on the Ugandan test split.
- Kenya vs Uganda normalized WER: 14.56% vs 21.33%
- Kenya vs Uganda normalized CER: 9.46% vs 12.81%
- Absolute normalized WER gap: 6.77 points in favor of Kenya
Relative to the Uganda-only English Parakeet 0.6B run
(KasuleTrevor/cdli-parakeet-en-finetune), the Uganda score in this mixed
Uganda+Kenya run improved slightly:
- Uganda-only normalized WER: 21.72%
- UG+KE model on Uganda normalized WER: 21.33%
- Absolute Uganda improvement: 0.39 WER points
That means the mixed-country training did not hurt performance on the Uganda test split and appears to improve cross-country generalization, with the strongest gains visible on Kenya.
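The gaps quoted in this section are absolute differences in WER percentage points. For readers who prefer relative terms, the sketch below recomputes them from the reported scores. The helper names are illustrative only.

```python
# Normalized WER values (percent) as reported in this card.
UGANDA_ONLY_MODEL_ON_UGANDA = 21.72
MIXED_MODEL_ON_UGANDA = 21.33
MIXED_MODEL_ON_KENYA = 14.56


def absolute_gap(higher, lower):
    """Difference in WER percentage points, rounded to two decimals."""
    return round(higher - lower, 2)


def relative_reduction(baseline, new):
    """Reduction relative to the baseline, in percent."""
    return (baseline - new) / baseline * 100


country_gap = absolute_gap(MIXED_MODEL_ON_UGANDA, MIXED_MODEL_ON_KENYA)
uganda_gain = absolute_gap(UGANDA_ONLY_MODEL_ON_UGANDA, MIXED_MODEL_ON_UGANDA)
uganda_gain_rel = relative_reduction(UGANDA_ONLY_MODEL_ON_UGANDA, MIXED_MODEL_ON_UGANDA)

print(f"Kenya vs Uganda gap: {country_gap} points")
print(f"Uganda gain from mixed training: {uganda_gain} points "
      f"({uganda_gain_rel:.1f}% relative)")
```

The Uganda gain is small in both absolute and relative terms, so it is best read as "no regression" rather than a clear improvement.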
Error Analysis Highlights
Per-country grouped analysis files were produced for:
- severity
- disorder type
- etiology
- speaker
High-level patterns from the notebook outputs:
- Uganda showed the expected severity gradient, with mild speakers performing best and severe speakers worst.
- Kenya also showed a severity gradient, but absolute scores were better than Uganda across all three severity bands.
- In Kenya, the best disorder bucket was "Dysphonia" and the hardest was the mixed "Stuttering (Disfluency Disorders), Dysarthria" bucket.
- In Uganda, "Voice disorder" had the highest mean WER among the reported disorder groups.
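A grouped breakdown of this kind amounts to averaging a per-utterance score within each metadata group. The sketch below uses only the standard library. The column names (`severity`, `utt_wer`) and the sample rows are made up for illustration and do not reflect the actual schema or values of the uploaded CSV files.

```python
import csv
import io
from collections import defaultdict

# Synthetic rows for illustration only; NOT real results from this model.
SAMPLE_CSV = """severity,utt_wer
mild,0.05
mild,0.15
moderate,0.30
severe,0.60
severe,0.80
"""


def mean_by_group(rows, group_col, value_col):
    """Mean of `value_col` for each distinct value of `group_col`."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum, count]
    for row in rows:
        totals[row[group_col]][0] += float(row[value_col])
        totals[row[group_col]][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
by_severity = mean_by_group(rows, "severity", "utt_wer")

# Print groups from easiest to hardest.
for group, score in sorted(by_severity.items(), key=lambda item: item[1]):
    print(f"{group}: {score:.2f}")
```

The same helper could be pointed at a disorder type, etiology, or speaker column, assuming the scored prediction files carry such columns.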
Usage
from nemo.collections.asr.models import ASRModel
model = ASRModel.from_pretrained("KasuleTrevor/cdli-parakeet-en-ug-ke")
predictions = model.transcribe(["path/to/audio.wav"])
print(predictions[0].text if hasattr(predictions[0], "text") else predictions[0])
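Since the training audio was 16 kHz and clips were filtered to 0.2 to 40.0 s, it can help to inspect a file before transcribing it. The helper below is a hedged sketch using Python's standard `wave` module, which only reads uncompressed PCM WAV files. The mono requirement and the idea of flagging clips outside the training length range are assumptions rather than documented constraints of this model.

```python
import wave

EXPECTED_SAMPLE_RATE = 16000  # the card reports 16 kHz audio


def describe_wav(path):
    """Return basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / rate,
        }


def matches_training_conditions(info, min_s=0.2, max_s=40.0):
    """True if the clip resembles the training audio (assumed mono, 16 kHz)."""
    return (
        info["sample_rate"] == EXPECTED_SAMPLE_RATE
        and info["channels"] == 1  # assumption: mono input
        and min_s <= info["duration_s"] <= max_s
    )
```

A file that fails the check can be resampled, downmixed, or split with an audio tool of your choice before being passed to `model.transcribe`.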
Files
- EN-PARAKEET-TDT-UG-KE.nemo: exported NeMo checkpoint
- results/EN-PARAKEET-TDT-UG-KE/: uploaded result tables and breakdown CSVs
  - train_mix_summary_ug_ke.json
  - country_summary_ug_ke.csv
  - test_predictions_scored_uganda.csv
  - test_predictions_scored_kenya.csv
  - severity_breakdown_combined_ug_ke.csv
  - disorder_breakdown_combined_ug_ke.csv
  - etiology_breakdown_combined_ug_ke.csv
Notes
- Validation for early stopping and checkpoint selection was Uganda-only, even though training used both Uganda and Kenya.
- Source datasets are gated. Review the dataset terms before requesting access.
- Result artifacts for both countries were uploaded to the Hub under results/EN-PARAKEET-TDT-UG-KE/.