| --- |
| layout: default |
| title: Pashto Language Resources Hub |
| description: Open-source Pashto (Pukhto/Pushto) datasets, models, benchmarks, ASR, TTS, NLP, and MT resources. |
| --- |
| |
| # Pashto Language Resources Hub |
|
|
| `pashto-language-resources` is a community-led open-source project focused on making Pashto (also written as Pukhto/Pushto) a first-class language in speech and language AI. |
|
|
| ## Mission |
|
|
| - Build open Pashto datasets and resource indexes for ASR, TTS, NLP, and MT. |
| - Publish reproducible baseline models, benchmark schemas, and evaluation workflows. |
| - Keep progress transparent, contributor-friendly, and public-benefit oriented. |
|
|
| ## What Is In This Repository |
|
|
| - `data/`: dataset workflows, metadata, and normalization seeds. |
| - `asr/`: ASR baselines and experiment notes. |
| - `tts/`: TTS baselines and quality tracking. |
| - `benchmarks/`: benchmark schema, result format, and metric guidance. |
| - `experiments/`: reproducible run cards and experiment records. |
| - `docs/`: policies, roadmap, release process, and operating guides. |
| - `resources/`: verified external Pashto datasets, models, tools, benchmarks, and papers. |
|
|
| ## Search Pashto Resources |
|
|
| - Search UI: [Pashto Resource Search](search/) |
| - Resource index docs: [resource_catalog.md](resource_catalog.md) |
| - Machine-readable catalog (GitHub): [resources.json source](https://github.com/Musawer1214/pashto-language-resources/blob/main/resources/catalog/resources.json) |
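
The machine-readable catalog can also be searched offline. The snippet below is a minimal sketch, not the project's own search implementation. It assumes a local copy of `resources.json` parses to a list of JSON objects. The schema is not documented on this page, so the field names in `SAMPLE_ENTRIES` (`title`, `category`, `tags`) are hypothetical, and the search deliberately inspects every string value rather than relying on specific keys.

```python
import json


def iter_strings(value):
    """Yield every string nested anywhere inside a decoded JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def search_catalog(entries, term):
    """Return entries where any string value contains `term`, ignoring case."""
    needle = term.casefold()
    return [
        entry
        for entry in entries
        if any(needle in text.casefold() for text in iter_strings(entry))
    ]


def load_catalog(path):
    """Load a local copy of the catalog (assumed here to be a JSON list)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical entries for illustration only; the real field names may differ.
SAMPLE_ENTRIES = [
    {"title": "Example Pashto speech corpus", "category": "dataset", "tags": ["ASR"]},
    {"title": "Example Pukhto voice model", "category": "model", "tags": ["TTS"]},
]
```

For example, `search_catalog(SAMPLE_ENTRIES, "asr")` returns only the first sample entry. With a downloaded copy of the catalog, `search_catalog(load_catalog("resources.json"), "tts")` would work the same way, provided the file's top level is a list.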
|
|
| ## Intent Pages |
|
|
| - Pashto datasets: [pashto_datasets.md](pashto_datasets.md) |
| - Pashto ASR resources: [pashto_asr.md](pashto_asr.md) |
| - Pashto TTS resources: [pashto_tts.md](pashto_tts.md) |
|
|
| ## Project References |
|
|
| - Repository: [Musawer1214/pashto-language-resources](https://github.com/Musawer1214/pashto-language-resources) |
| - Hugging Face: [Musawer14/pashto-language-resources](https://huggingface.co/Musawer14/pashto-language-resources) |
| - Purpose: [PROJECT_PURPOSE.md](../PROJECT_PURPOSE.md) |
| - Roadmap: [ROADMAP.md](../ROADMAP.md) |
| - Contributing: [CONTRIBUTING.md](../CONTRIBUTING.md) |
|
|
| ## SEO Operations |
|
|
| - GitHub topics checklist: [github_topics_checklist.md](github_topics_checklist.md) |
| - Backlink strategy: [backlink_strategy.md](backlink_strategy.md) |
| - Release notes v0.1.1: [release_v0.1.1.md](release_v0.1.1.md) |
| - Release notes v0.1.2: [release_v0.1.2.md](release_v0.1.2.md) |
|
|
| ## Contributing |
|
|
| You can help by improving documentation, validating normalization rows, sharing verified resources, or contributing data, model, and evaluation workflows. |
|
|
| For the contributor workflow and standards, start with: |
| - [docs/README.md](README.md) |
| - [community/COMMUNICATION.md](../community/COMMUNICATION.md) |
|
|
| ## Search Terms |
|
|
| This project is relevant to searches like: |
|
|
| - Pashto datasets |
| - Pashto ASR model |
| - Pashto TTS resources |
| - Pashto NLP benchmark |
| - Pashto language technology |
| - Pukhto language resources |
|
|
| ## License |
|
|
| This project is released under Apache 2.0. See [LICENSE](../LICENSE). |
|
|