# Project Purpose
## Why this project exists
Pashto remains underrepresented in open speech and language AI resources. This project exists to close that gap through community collaboration.
## Mission
Create high-quality open resources that enable Pashto to work reliably in:
- Speech recognition (ASR)
- Text-to-speech (TTS)
- Translation and NLP tooling
## What success looks like
- Public Pashto datasets with clear quality standards
- Reproducible baseline models and training pipelines
- Public benchmark/leaderboard for fair model comparison
- Open desktop/API demos that real users can run
## Non-commercial commitment
This initiative is community-first and public-benefit oriented. The project is not being built for proprietary lock-in or short-term commercialization.
## Principles
- Openness: data/model/process transparency
- Inclusivity: dialect and accent diversity
- Quality: strong labeling/review standards
- Reproducibility: scripts, configs, and documented experiments
- Continuity: release cadence and long-term maintenance
## Scope (v1 foundation)
- Build core repository and contributor workflows
- Launch Pashto data collection and validation pipeline
- Publish ASR and TTS baselines
- Publish first benchmark set and metrics
## Out of scope (for now)
- Closed paid APIs as the only path
- Private datasets without reproducible provenance
- Productization before core language quality is established