🗺️ Roadmap
🧱 Phase 1: Foundation (0-2 months)
- Finalize governance and contribution docs
- Define Pashto text normalization policy
- Prepare data schema and validation checklist
- Publish baseline ASR/TTS experiment templates
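As an illustration of what the data schema and validation checklist could look like in practice, here is a minimal Python sketch. Everything in it is an assumption made for the example, not a decided project schema: the `ClipRecord` field names, the allowed `environment` values, and the 30-second duration cap are all hypothetical placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical controlled vocabulary; the real list would come from the schema doc.
ALLOWED_ENVIRONMENTS = {"quiet", "noisy", "studio"}
# Hypothetical upper bound on clip length, in seconds.
MAX_DURATION_S = 30.0


@dataclass
class ClipRecord:
    """One recorded utterance plus its metadata (illustrative fields only)."""
    clip_id: str
    transcript: str
    speaker_id: str
    dialect: str
    environment: str
    duration_s: float


def validate(record: ClipRecord) -> list[str]:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems = []
    # Required text fields must be non-empty after trimming whitespace.
    for field_name in ("clip_id", "transcript", "speaker_id", "dialect"):
        if not getattr(record, field_name).strip():
            problems.append(f"{field_name} is empty")
    if record.environment not in ALLOWED_ENVIRONMENTS:
        problems.append(f"unknown environment: {record.environment!r}")
    if not (0 < record.duration_s <= MAX_DURATION_S):
        problems.append(f"duration out of range: {record.duration_s}")
    return problems


good = ClipRecord("clip-0001", "سلام", "spk-01", "kandahari", "quiet", 2.4)
bad = ClipRecord("clip-0002", "  ", "spk-02", "", "outdoors", 0.0)
good_problems = validate(good)
bad_problems = validate(bad)
```

Returning a list of problems rather than raising on the first failure lets a validation campaign report every issue with a clip in a single pass.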
Phase 2: Data Scale (2-4 months)
- Community data campaigns (recording + validation)
- Curate and release dataset versions (v0.1, v0.2)
- Improve metadata quality (speaker, dialect, environment)
🤖 Phase 3: Baseline Models (4-6 months)
- Train and release first open ASR baseline
- Train and release first open TTS baseline
- Publish reproducible training/eval scripts
🧪 Phase 4: Benchmark & Demos (6-9 months)
- Release fixed evaluation benchmark
- Launch public leaderboard (WER/CER + TTS quality eval)
- Integrate models into desktop/app demos
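For reference, the WER and CER metrics named for the leaderboard are both edit distances normalized by reference length. The sketch below shows the standard definition in plain Python. It is not the project's evaluation script: the function names are hypothetical, text is assumed to be already normalized, words are split on whitespace, and CER here counts spaces as characters, which is one of several conventions.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference items into an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


example_wer = wer("a b c d", "a x c")  # one substitution + one deletion over 4 words
example_cer = cer("abcd", "abxd")      # one substitution over 4 characters
```

Because both metrics divide by the reference length, they can exceed 1.0 when the hypothesis contains many insertions. A fixed benchmark would also need to pin the text normalization applied before scoring so that leaderboard numbers stay comparable.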
🌱 Phase 5: Community Maturity (9+ months)
- Regular release cadence
- Contributor mentoring and review rotations
- Long-term maintenance and quality governance