| # Project Purpose |
|
|
| ## Why this project exists |
| Pashto remains underrepresented in openly available speech and language resources for AI. This project exists to close that gap through community collaboration. |
|
|
| ## Mission |
| Create high-quality open resources that enable Pashto to work reliably in: |
| - Speech recognition (ASR) |
| - Text-to-speech (TTS) |
| - Translation and NLP tooling |
|
|
| ## What success looks like |
| - Public Pashto datasets with clear quality standards |
| - Reproducible baseline models and training pipelines |
| - Public benchmark and leaderboard for fair model comparison |
| - Open desktop/API demos that real users can run |
|
|
| ## Non-commercial commitment |
| This initiative is community-first and public-benefit oriented. The project is not being built for proprietary lock-in or short-term commercialization. |
|
|
| ## Principles |
| - Openness: transparent data, models, and processes |
| - Inclusivity: dialect and accent diversity |
| - Quality: strong labeling/review standards |
| - Reproducibility: scripts, configs, and documented experiments |
| - Continuity: release cadence and long-term maintenance |
|
|
| ## Scope (v1 foundation) |
| - Build core repository and contributor workflows |
| - Launch Pashto data collection and validation pipeline |
| - Publish ASR and TTS baselines |
| - Publish first benchmark set and metrics |
|
|
| ## Out of scope (for now) |
| - Closed, paid APIs as the only access path |
| - Private datasets without reproducible provenance |
| - Productization before core language quality is established |
|
|