metadata
license: apache-2.0
language:
- ps
tags:
- pashto
- asr
- tts
- nlp
Pukhto/Pashto Open Language Project
A community-led, open-source project to make Pashto a first-class language in AI speech and language tooling.
Project Links
- GitHub: Pukhto_Pashto
- Hugging Face: Musawer14/Pukhto_Pashto
- GitHub Pages (About): Pukhto_Pashto Site
Core Goal
- Build open datasets, benchmarks, and models for Pashto ASR, TTS, and NLP.
- Keep work reproducible, transparent, and contribution-friendly.
- Focus on public good and broad accessibility.
Documentation Map
- Purpose: PROJECT_PURPOSE.md
- Contributing: CONTRIBUTING.md
- Roadmap: ROADMAP.md
- Governance: GOVERNANCE.md
- License policy: LICENSE_POLICY.md
- Changelog: CHANGELOG.md
- Community: community/COMMUNICATION.md
- Docs home: docs/README.md
- Release process: docs/release_process.md
- Release checklist: docs/release_checklist.md
- Workstreams: docs/workstreams.md
- Resource index: docs/resource_catalog.md
- Structured resources: resources/README.md
Verified Resource Catalog
The project tracks validated external resources in:
- docs/resource_catalog.md (master index)
- resources/datasets/README.md
- resources/models/README.md
- resources/benchmarks/README.md
- resources/tools/README.md
Featured Dataset: Common Voice Pashto
- Dataset: Common Voice Scripted Speech 24.0 - Pashto
- Source: Mozilla Data Collective - Common Voice Pashto 24.0
- Integration guide: docs/common_voice_pashto_24.md
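For orientation, here is a minimal sketch of reading transcripts from a locally extracted copy of the dataset. It assumes the usual Common Voice release layout: a tab-separated file such as `validated.tsv` with `path` and `sentence` columns, next to a `clips/` directory of audio files. Those file and column names are assumptions, not taken from this project, and the `read_clips` helper is hypothetical. Check docs/common_voice_pashto_24.md for the layout the project actually relies on.

```python
import csv
from pathlib import Path


def read_clips(tsv_path, clips_dir):
    """Yield (audio_path, sentence) pairs from a Common Voice style TSV file.

    Assumes the TSV has 'path' and 'sentence' columns (an assumption based on
    the usual Common Voice layout, not confirmed by this project).
    """
    clips_dir = Path(clips_dir)
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quotation marks inside sentences as literal text.
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            sentence = (row.get("sentence") or "").strip()
            if not row.get("path") or not sentence:
                continue  # skip rows with no audio file name or no transcript
            yield clips_dir / row["path"], sentence


# Example usage with hypothetical local paths:
# for audio_path, sentence in read_clips("ps/validated.tsv", "ps/clips"):
#     print(audio_path, sentence)
```

The helper only pairs file names with transcripts. It does not check that the audio files exist or decode them, so it can be run before any audio tooling is installed.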
Contribute Through Mozilla Common Voice
- Speak: commonvoice.mozilla.org/ps/speak
- Write: commonvoice.mozilla.org/ps/write
- Listen: commonvoice.mozilla.org/ps/listen
- Review: commonvoice.mozilla.org/ps/review
Workspaces
- data/: datasets, curation, metadata, quality
- asr/: ASR baselines and experiments
- tts/: TTS baselines and experiments
- benchmarks/: benchmark sets and evaluation
- experiments/: reproducible run cards
- apps/desktop/: user-facing integration references
- models/: model layout and release conventions