---
license: apache-2.0
language:
- ps
tags:
- pashto
- asr
- tts
- nlp
---

# Pukhto/Pashto Open Language Project

Community-led open-source project to make Pashto a first-class language in AI speech and language tooling.

## Project Links
- GitHub: [Pukhto_Pashto](https://github.com/Musawer1214/Pukhto_Pashto)
- Hugging Face: [Musawer14/Pukhto_Pashto](https://huggingface.co/Musawer14/Pukhto_Pashto)
- GitHub Pages (About): [Pukhto_Pashto Site](https://musawer1214.github.io/Pukhto_Pashto/)

## Core Goal
- Build open datasets, benchmarks, and models for Pashto ASR, TTS, and NLP.
- Keep work reproducible, transparent, and contribution-friendly.
- Focus on public good and broad accessibility.

## Documentation Map
- Purpose: [PROJECT_PURPOSE.md](PROJECT_PURPOSE.md)
- Contributing: [CONTRIBUTING.md](CONTRIBUTING.md)
- Roadmap: [ROADMAP.md](ROADMAP.md)
- Governance: [GOVERNANCE.md](GOVERNANCE.md)
- License policy: [LICENSE_POLICY.md](LICENSE_POLICY.md)
- Changelog: [CHANGELOG.md](CHANGELOG.md)
- Community: [community/COMMUNICATION.md](community/COMMUNICATION.md)
- Docs home: [docs/README.md](docs/README.md)
- Release process: [docs/release_process.md](docs/release_process.md)
- Release checklist: [docs/release_checklist.md](docs/release_checklist.md)
- Workstreams: [docs/workstreams.md](docs/workstreams.md)
- Resource index: [docs/resource_catalog.md](docs/resource_catalog.md)
- Structured resources: [resources/README.md](resources/README.md)

## Verified Resource Catalog
The project tracks validated external resources in:
- [docs/resource_catalog.md](docs/resource_catalog.md) (master index)
- [resources/datasets/README.md](resources/datasets/README.md)
- [resources/models/README.md](resources/models/README.md)
- [resources/benchmarks/README.md](resources/benchmarks/README.md)
- [resources/tools/README.md](resources/tools/README.md)

## Featured Dataset: Common Voice Pashto
- Dataset: Common Voice Scripted Speech 24.0 - Pashto
- Source: [Mozilla Data Collective - Common Voice Pashto 24.0](https://datacollective.mozillafoundation.org/datasets/cmj8u3pnb00llnxxbfvxo3b14)
- Integration guide: [docs/common_voice_pashto_24.md](docs/common_voice_pashto_24.md)

## Contribute Through Mozilla Common Voice
- Speak: [commonvoice.mozilla.org/ps/speak](https://commonvoice.mozilla.org/ps/speak)
- Write: [commonvoice.mozilla.org/ps/write](https://commonvoice.mozilla.org/ps/write)
- Listen: [commonvoice.mozilla.org/ps/listen](https://commonvoice.mozilla.org/ps/listen)
- Review: [commonvoice.mozilla.org/ps/review](https://commonvoice.mozilla.org/ps/review)

## Workspaces
- [data/](data/README.md): datasets, curation, metadata, quality
- [asr/](asr/README.md): ASR baselines and experiments
- [tts/](tts/README.md): TTS baselines and experiments
- [benchmarks/](benchmarks/README.md): benchmark sets and evaluation
- [experiments/](experiments/README.md): reproducible run cards
- [apps/desktop/](apps/desktop/README.md): user-facing integration references
- [models/](models/README.md): model layout and release conventions