---
license: apache-2.0
language:
- ps
- en
tags:
- pashto
- pukhto
- pushto
- asr
- tts
- nlp
- machine-translation
- language-resources
- low-resource-languages
- speech-recognition
---

# Pashto Language Resources Hub (Pukhto/Pashto)

Open-source repository for Pashto language technology resources: datasets, models, benchmarks, ASR, TTS, NLP, and machine translation (MT).

This project curates verified Pashto resources and maintains reproducible tooling for discovery, validation, and documentation.

## Start Here

- Main resource search: [Pashto Resource Search](https://musawer1214.github.io/pashto-language-resources/search/)
- Project site: [Pashto Language Resources Hub](https://musawer1214.github.io/pashto-language-resources/)
- GitHub repository: [Musawer1214/pashto-language-resources](https://github.com/Musawer1214/pashto-language-resources)
- Hugging Face mirror: [Musawer14/pashto-language-resources](https://huggingface.co/Musawer14/pashto-language-resources)

## If You Searched For

This repository is relevant to these search intents:

- Pashto datasets
- Pashto ASR model
- Pashto TTS resources
- Pashto NLP benchmark
- Pashto machine translation resources
- Pukhto language technology
- Pushto AI resources

## Current Scope

- Build open Pashto datasets, benchmarks, and model references for ASR, TTS, NLP, and MT.
- Track practical tools, apps, and academic papers for Pashto integration in technology.
- Keep everything transparent, reproducible, and contribution-friendly.

## Resource System

The resource pipeline is machine-readable and searchable:

- Canonical catalog: [resources/catalog/resources.json](resources/catalog/resources.json)
- Catalog schema: [resources/schema/resource.schema.json](resources/schema/resource.schema.json)
- Candidate feed (auto-generated): [resources/catalog/pending_candidates.json](resources/catalog/pending_candidates.json)
- Search UI source: [docs/search/index.html](docs/search/index.html)
- Search data export: [docs/search/resources.json](docs/search/resources.json)
- Resource index docs: [docs/resource_catalog.md](docs/resource_catalog.md)
- Automation docs: [docs/resource_automation.md](docs/resource_automation.md)
- Cycle runbook: [docs/resource_cycle_runbook.md](docs/resource_cycle_runbook.md)

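
As a minimal sketch of how the canonical catalog might be consumed, the snippet below filters entries by `status`. The top-level layout of `resources.json` is not described in this README, so the code accepts either a bare list or an object with a `resources` key. Both shapes, the helper names, and every field other than `status` are assumptions; the catalog schema is the place to check the actual structure.

```python
import json
from pathlib import Path
from typing import Any

CATALOG_PATH = Path("resources/catalog/resources.json")


def load_catalog(path: Path = CATALOG_PATH) -> Any:
    """Parse the catalog JSON file (run from the repository root)."""
    return json.loads(path.read_text(encoding="utf-8"))


def extract_entries(catalog: Any) -> list[dict]:
    """Return the entry list from a parsed catalog.

    Assumption: the catalog is either a bare list of entries or an
    object that keeps them under a "resources" key.
    """
    if isinstance(catalog, list):
        return catalog
    if isinstance(catalog, dict):
        return list(catalog.get("resources", []))
    raise TypeError(f"unexpected catalog type: {type(catalog).__name__}")


def verified_entries(catalog: Any) -> list[dict]:
    """Keep only the entries whose status is "verified"."""
    return [e for e in extract_entries(catalog) if e.get("status") == "verified"]
```

Under those assumptions, calling `verified_entries(load_catalog())` from the repository root would return the verified subset of the catalog.
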
## How New Resources Are Added

1. Automated discovery runs daily from `.github/workflows/resource_sync.yml` and updates `resources/catalog/pending_candidates.json` in a review PR.
2. Manual review checks quality, Pashto evidence, and license compatibility before promoting entries into `resources/catalog/resources.json` with `status: verified`.
3. Validation and regeneration run `python scripts/validate_resource_catalog.py` and `python scripts/generate_resource_views.py`, then commit the generated updates.

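
The promotion in step 2 is a manual decision, but its mechanical part can be sketched as below. This is an illustration, not the repository's tooling: the `promote` helper and the `id` field used to detect duplicates are hypothetical, and only `status: verified` comes from the process described above.

```python
from copy import deepcopy


def promote(candidate: dict, catalog_entries: list[dict]) -> list[dict]:
    """Return a new catalog list with the reviewed candidate appended.

    Assumption: every entry carries a unique "id" (hypothetical field name).
    """
    existing_ids = {entry.get("id") for entry in catalog_entries}
    if candidate.get("id") in existing_ids:
        raise ValueError(f"already in catalog: {candidate.get('id')!r}")
    promoted = deepcopy(candidate)
    promoted["status"] = "verified"  # set only after manual review passes
    return [*catalog_entries, promoted]
```

Returning a new list rather than mutating the inputs leaves both the candidate feed and the loaded catalog unchanged until the result is written back deliberately.
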
Shortcut wrapper:
- `python scripts/run_resource_cycle.py --limit 25`

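
A wrapper of this kind could chain the validation and regeneration commands from step 3 roughly as follows. This is a sketch, not the contents of `scripts/run_resource_cycle.py`: the `run_steps` helper and its injectable `runner` parameter are hypothetical, and the `--limit` option is not modeled.

```python
import subprocess
import sys
from typing import Callable, Sequence

# Commands taken from step 3, run with the current interpreter.
STEPS: list[list[str]] = [
    [sys.executable, "scripts/validate_resource_catalog.py"],
    [sys.executable, "scripts/generate_resource_views.py"],
]


def run_steps(
    steps: Sequence[Sequence[str]] = STEPS,
    runner: Callable[[Sequence[str]], int] = subprocess.call,
) -> int:
    """Run each command in order and stop at the first non-zero exit code."""
    for command in steps:
        exit_code = runner(command)
        if exit_code != 0:
            return exit_code
    return 0
```

Stopping at the first failure keeps the generated views from being rebuilt from a catalog that did not validate.
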
## Quickstart

```bash
python -m pip install -e ".[dev]"
python scripts/validate_resource_catalog.py
python scripts/generate_resource_views.py
python scripts/check_links.py
python -m pytest -q
```

## Discoverability And SEO

- Playbook: [docs/discoverability_seo.md](docs/discoverability_seo.md)
- Docs hub: [docs/README.md](docs/README.md)
- Resource search page: [docs/search/index.html](docs/search/index.html)
- Citation metadata: [CITATION.cff](CITATION.cff)
- Platform sync policy: [docs/platform_sync_policy.md](docs/platform_sync_policy.md)
- GitHub topics checklist: [docs/github_topics_checklist.md](docs/github_topics_checklist.md)
- Backlink strategy: [docs/backlink_strategy.md](docs/backlink_strategy.md)
- Intent page: [Pashto datasets](docs/pashto_datasets.md)
- Intent page: [Pashto ASR](docs/pashto_asr.md)
- Intent page: [Pashto TTS](docs/pashto_tts.md)
- Release notes: [v0.1.1](docs/release_v0.1.1.md)
- Release notes: [v0.1.2](docs/release_v0.1.2.md)

## Documentation Map

- Purpose: [PROJECT_PURPOSE.md](PROJECT_PURPOSE.md)
- Contributing: [CONTRIBUTING.md](CONTRIBUTING.md)
- Roadmap: [ROADMAP.md](ROADMAP.md)
- Governance: [GOVERNANCE.md](GOVERNANCE.md)
- License policy: [LICENSE_POLICY.md](LICENSE_POLICY.md)
- Changelog: [CHANGELOG.md](CHANGELOG.md)
- Community: [community/COMMUNICATION.md](community/COMMUNICATION.md)
- Docs hub: [docs/README.md](docs/README.md)
- Resource index: [docs/resource_catalog.md](docs/resource_catalog.md)
- Resource automation: [docs/resource_automation.md](docs/resource_automation.md)

## Resource Sections

- Datasets: [resources/datasets/README.md](resources/datasets/README.md)
- Models: [resources/models/README.md](resources/models/README.md)
- Benchmarks: [resources/benchmarks/README.md](resources/benchmarks/README.md)
- Tools: [resources/tools/README.md](resources/tools/README.md)
- Papers: [resources/papers/README.md](resources/papers/README.md)
- Projects: [resources/projects/README.md](resources/projects/README.md)
- Code: [resources/codes/README.md](resources/codes/README.md)

## Workspaces

- [data/](data/README.md): datasets, curation, metadata, quality
- [asr/](asr/README.md): ASR baselines and experiments
- [tts/](tts/README.md): TTS baselines and experiments
- [benchmarks/](benchmarks/README.md): benchmark sets and evaluation
- [experiments/](experiments/README.md): reproducible run cards
- [apps/desktop/](apps/desktop/README.md): user-facing integration references
- [models/](models/README.md): model layout and release conventions