---
license: apache-2.0
language:
- ps
tags:
- pashto
- asr
- tts
- nlp
---

# 🌍 Pukhto/Pashto Open Language Project

Community-led open-source project to make Pashto a first-class language in AI speech and language tooling.

## 🔗 Project Links

- GitHub: [https://github.com/Musawer1214/Pukhto_Pashto](https://github.com/Musawer1214/Pukhto_Pashto)
- Hugging Face: [https://huggingface.co/Musawer14/Pukhto_Pashto](https://huggingface.co/Musawer14/Pukhto_Pashto)

## 🎯 Core Goal

- Build open datasets, benchmarks, and models for Pashto ASR, TTS, and NLP.
- Keep work reproducible, transparent, and contribution-friendly.
- Focus on public good and broad accessibility.

## 📚 Featured External Dataset

- Common Voice Scripted Speech 24.0 - Pashto
  - Source: [https://datacollective.mozillafoundation.org/datasets/cmj8u3pnb00llnxxbfvxo3b14](https://datacollective.mozillafoundation.org/datasets/cmj8u3pnb00llnxxbfvxo3b14)
  - Project integration guide: [docs/common_voice_pashto_24.md](docs/common_voice_pashto_24.md)

## 🙌 Contribute Through Mozilla Common Voice

- Speak: [https://commonvoice.mozilla.org/ps/speak](https://commonvoice.mozilla.org/ps/speak)
- Write: [https://commonvoice.mozilla.org/ps/write](https://commonvoice.mozilla.org/ps/write)
- Listen: [https://commonvoice.mozilla.org/ps/listen](https://commonvoice.mozilla.org/ps/listen)
- Review: [https://commonvoice.mozilla.org/ps/review](https://commonvoice.mozilla.org/ps/review)

## 🌐 Community Resource Profiles

- Hugging Face (external Pashto resource profile): [https://huggingface.co/ihanif](https://huggingface.co/ihanif)
  - Use this profile as a reference point for Pashto ASR/TTS datasets, models, and community experiments.

## 🚀 Start Here

- 📘 Purpose: `PROJECT_PURPOSE.md`
- 🤝 Contributing: `CONTRIBUTING.md`
- 🗺️ Roadmap: `ROADMAP.md`
- 🏛️ Governance: `GOVERNANCE.md`
- 💬 Community coordination: `community/COMMUNICATION.md`

## 🧩 Initial Workstreams

- `data/` Pashto data collection, cleaning, metadata
- `asr/` speech-to-text baselines and experiments
- `tts/` text-to-speech baselines and experiments
- `benchmarks/` fixed test sets and evaluation scripts
- `apps/desktop/` app integration references
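
As an illustration of what an evaluation script under `benchmarks/` might compute for ASR, here is a minimal sketch of word error rate (WER): the word-level edit distance between a reference transcript and a model hypothesis, divided by the number of reference words. The function name `word_error_rate`, the whitespace tokenization, and the sample sentences are assumptions for illustration only; they are not taken from this project's code, and a real benchmark would likely also normalize Pashto text before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion (reference word missing)
                    curr[j - 1] + 1,     # insertion (extra hypothesis word)
                    prev[j - 1] + cost,  # substitution or exact match
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences: one of four words differs.
print(word_error_rate("زه ښوونځي ته ځم", "زه کور ته ځم"))  # 0.25
```

Scoring every system against the same fixed test set with the same function is what makes results comparable across experiments.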