---
layout: default
title: About Pukhto Pashto
---

# About This Repository

`Pukhto_Pashto` is a community-led open project focused on making Pashto a first-class language in speech and language AI.

## Mission

- Build open Pashto datasets for ASR, TTS, and NLP.
- Publish reproducible baseline models and evaluation workflows.
- Keep progress transparent, contributor-friendly, and public-benefit oriented.

## What Is In This Repository

- `data/`: dataset workflows, metadata, and normalization seeds.
- `asr/`: ASR baselines and experiment notes.
- `tts/`: TTS baselines and quality tracking.
- `benchmarks/`: benchmark schema, result format, and metric guidance.
- `experiments/`: reproducible run cards and experiment records.
- `docs/`: policies, roadmap, release process, and operating guides.
- `resources/`: verified external Pashto datasets, models, tools, benchmarks, and papers.

## Search Resources

- Search UI: [Pashto Resource Search](search/)
- Resource index docs: [resource_catalog.md](resource_catalog.md)
- Machine-readable catalog: [../resources/catalog/resources.json](../resources/catalog/resources.json)

## Project References

- Repository: [Musawer1214/Pukhto_Pashto](https://github.com/Musawer1214/Pukhto_Pashto)
- Hugging Face: [Musawer14/Pukhto_Pashto](https://huggingface.co/Musawer14/Pukhto_Pashto)
- Purpose: [PROJECT_PURPOSE.md](../PROJECT_PURPOSE.md)
- Roadmap: [ROADMAP.md](../ROADMAP.md)
- Contributing: [CONTRIBUTING.md](../CONTRIBUTING.md)

## Contributing

You can help by improving documentation, validating normalization rows, sharing verified resources, or contributing data and evaluation workflows.

For contributor workflow and standards, start at:

- [docs/README.md](README.md)
- [community/COMMUNICATION.md](../community/COMMUNICATION.md)

## License

This project is released under Apache 2.0. See [LICENSE](../LICENSE).