metadata
license: apache-2.0
language:
- ps
- en
tags:
- pashto
- pukhto
- pushto
- asr
- tts
- nlp
- machine-translation
- language-resources
- low-resource-languages
- speech-recognition
Pashto Language Resources Hub (Pukhto/Pashto)
Open-source repository for Pashto language technology resources: datasets, models, benchmarks, ASR, TTS, NLP, and machine translation (MT).
This project curates verified Pashto resources and maintains reproducible tooling for discovery, validation, and documentation.
Start Here
- Main resource search: Pashto Resource Search
- Project site: Pashto Language Resources Hub
- GitHub repository: Musawer1214/pashto-language-resources
- Hugging Face mirror: Musawer14/pashto-language-resources
If You Searched For
This repository is relevant to these search intents:
- Pashto datasets
- Pashto ASR model
- Pashto TTS resources
- Pashto NLP benchmark
- Pashto machine translation resources
- Pukhto language technology
- Pushto AI resources
Current Scope
- Build open Pashto datasets, benchmarks, and model references for ASR, TTS, NLP, and MT.
- Track practical tools, apps, and academic papers for Pashto integration in technology.
- Keep everything transparent, reproducible, and contribution-friendly.
Resource System
Machine-readable and searchable resource pipeline:
- Canonical catalog: resources/catalog/resources.json
- Catalog schema: resources/schema/resource.schema.json
- Candidate feed (auto-generated): resources/catalog/pending_candidates.json
- Search UI source: docs/search/index.html
- Search data export: docs/search/resources.json
- Resource index docs: docs/resource_catalog.md
- Automation docs: docs/resource_automation.md
- Cycle runbook: docs/resource_cycle_runbook.md
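As an illustration of how the machine-readable catalog could be consumed, here is a minimal Python sketch that loads the canonical catalog and keeps only verified entries. The file path and the status: verified value come from this README. The top-level layout (a JSON array, or an object with a "resources" key) is an assumption, so check resources/schema/resource.schema.json for the actual structure.

```python
import json
from pathlib import Path

# Path taken from the README; run from the repository root.
CATALOG_PATH = Path("resources/catalog/resources.json")


def load_entries(path=CATALOG_PATH):
    """Return catalog entries as a list of dicts.

    Assumption: the file is either a JSON array of entries or an object
    that holds them under a "resources" key (hypothetical key name).
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        data = data.get("resources", [])
    # Skip anything that is not an object, so later .get() calls are safe.
    return [entry for entry in data if isinstance(entry, dict)]


def verified_only(entries):
    """Keep entries whose "status" field is exactly "verified"."""
    return [entry for entry in entries if entry.get("status") == "verified"]


if __name__ == "__main__":
    if CATALOG_PATH.exists():
        entries = load_entries()
        print(f"{len(verified_only(entries))} of {len(entries)} entries are verified")
    else:
        print(f"{CATALOG_PATH} not found; run this from the repository root")
```

This only reads the catalog. Schema validation stays with scripts/validate_resource_catalog.py.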
How New Resources Are Added
- Auto discovery runs daily from .github/workflows/resource_sync.yml and updates resources/catalog/pending_candidates.json in a review PR.
- Manual review checks quality, Pashto evidence, and license compatibility before promoting entries into resources/catalog/resources.json with status: verified.
- Regeneration and validation runs python scripts/validate_resource_catalog.py and python scripts/generate_resource_views.py, then commits generated updates.
Shortcut wrapper:
python scripts/run_resource_cycle.py --limit 25
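The promotion step described above can be pictured with a small Python sketch. This is not the repository's actual tooling: `promote` is a hypothetical helper, entries are assumed to be dicts, and the `url` field used to detect duplicates is an assumed identifier. Only the status: verified value comes from this README. The sketch models what happens after a human reviewer has approved a candidate and does not replace that review.

```python
from copy import deepcopy


def promote(candidate, catalog, key="url"):
    """Return a new catalog list with the candidate added as verified.

    Assumptions (not from the README): entries are dicts, and `key` names
    a field that uniquely identifies a resource.
    """
    if key not in candidate:
        raise ValueError(f"candidate has no {key!r} field")
    # Do not add a second entry for a resource that is already cataloged.
    if any(entry.get(key) == candidate[key] for entry in catalog):
        return list(catalog)
    entry = deepcopy(candidate)  # leave the pending candidate untouched
    entry["status"] = "verified"
    return list(catalog) + [entry]
```

Returning a new list instead of mutating the input makes the change easy to inspect in a diff before it is written back to disk.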
Quickstart
python -m pip install -e ".[dev]"
python scripts/validate_resource_catalog.py
python scripts/generate_resource_views.py
python scripts/check_links.py
python -m pytest -q
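For local use, the checks above can also be chained from Python so that the run stops at the first failure. The following is a sketch under stated assumptions: the commands are copied from the Quickstart, while `run_checks` and its `runner` parameter are hypothetical names introduced here. The install step is left out, so pytest and the project presumably need to be installed first.

```python
import subprocess
import sys

# Commands copied from the Quickstart. sys.executable keeps every step
# on the same interpreter (and virtual environment) as this script.
CHECKS = [
    [sys.executable, "scripts/validate_resource_catalog.py"],
    [sys.executable, "scripts/generate_resource_views.py"],
    [sys.executable, "scripts/check_links.py"],
    [sys.executable, "-m", "pytest", "-q"],
]


def run_checks(checks=CHECKS, runner=subprocess.run):
    """Run each command in order; return the first failing command, or None."""
    for cmd in checks:
        result = runner(cmd)
        if result.returncode != 0:
            return cmd  # stop early so later steps do not run on bad state
    return None
```

Calling `run_checks()` from the repository root returns `None` when every step exits with status 0, and otherwise returns the command that failed. The `runner` argument exists only so the loop can be exercised without launching real processes.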
Discoverability And SEO
- Playbook: docs/discoverability_seo.md
- Docs hub: docs/README.md
- Resource search page: docs/search/index.html
- Citation metadata: CITATION.cff
- Platform sync policy: docs/platform_sync_policy.md
- GitHub topics checklist: docs/github_topics_checklist.md
- Backlink strategy: docs/backlink_strategy.md
- Intent page: Pashto datasets
- Intent page: Pashto ASR
- Intent page: Pashto TTS
- Release notes: v0.1.1
- Release notes: v0.1.2
Documentation Map
- Purpose: PROJECT_PURPOSE.md
- Contributing: CONTRIBUTING.md
- Roadmap: ROADMAP.md
- Governance: GOVERNANCE.md
- License policy: LICENSE_POLICY.md
- Changelog: CHANGELOG.md
- Community: community/COMMUNICATION.md
- Docs hub: docs/README.md
- Resource index: docs/resource_catalog.md
- Resource automation: docs/resource_automation.md
Resource Sections
- Datasets: resources/datasets/README.md
- Models: resources/models/README.md
- Benchmarks: resources/benchmarks/README.md
- Tools: resources/tools/README.md
- Papers: resources/papers/README.md
- Projects: resources/projects/README.md
- Code: resources/codes/README.md
Workspaces
- data/: datasets, curation, metadata, quality
- asr/: ASR baselines and experiments
- tts/: TTS baselines and experiments
- benchmarks/: benchmark sets and evaluation
- experiments/: reproducible run cards
- apps/desktop/: user-facing integration references
- models/: model layout and release conventions