# 📘 Project Purpose

## ❓ Why this project exists

Pashto remains underrepresented in open AI speech/language resources. This project exists to close that gap through community collaboration.

## 🌟 Mission

Create high-quality open resources that enable Pashto to work reliably in:

- Speech recognition (ASR)
- Text-to-speech (TTS)
- Translation and NLP tooling

## ✅ What success looks like

- Public Pashto datasets with clear quality standards
- Reproducible baseline models and training pipelines
- Public benchmark/leaderboard for fair model comparison
- Open desktop/API demos that real users can run

## 🕊️ Non-commercial commitment

This initiative is community-first and public-benefit oriented. The project is not being built for proprietary lock-in or short-term commercialization.

## 🧭 Principles

- Openness: data/model/process transparency
- Inclusivity: dialect and accent diversity
- Quality: strong labeling/review standards
- Reproducibility: scripts, configs, and documented experiments
- Continuity: release cadence and long-term maintenance

## 📦 Scope (v1 foundation)

- Build core repository and contributor workflows
- Launch Pashto data collection and validation pipeline
- Publish ASR and TTS baselines
- Publish first benchmark set and metrics

## 🚫 Out of scope (for now)

- Closed paid APIs as the only path
- Private datasets without reproducible provenance
- Productization before core language quality is established