---
license: apache-2.0
language:
- ps
- en
tags:
- pashto
- pukhto
- pushto
- asr
- tts
- nlp
- machine-translation
- language-resources
- low-resource-languages
- speech-recognition
---
# Pashto Language Resources Hub (Pukhto/Pashto)
Open-source repository for Pashto language technology resources: datasets, models, benchmarks, ASR, TTS, NLP, and machine translation (MT).
This project curates verified Pashto resources and maintains reproducible tooling for discovery, validation, and documentation.
## Start Here
- Main resource search: [Pashto Resource Search](https://musawer1214.github.io/pashto-language-resources/search/)
- Project site: [Pashto Language Resources Hub](https://musawer1214.github.io/pashto-language-resources/)
- GitHub repository: [Musawer1214/pashto-language-resources](https://github.com/Musawer1214/pashto-language-resources)
- Hugging Face mirror: [Musawer14/pashto-language-resources](https://huggingface.co/Musawer14/pashto-language-resources)
## If You Searched For
This repository is relevant to these search intents:
- Pashto datasets
- Pashto ASR model
- Pashto TTS resources
- Pashto NLP benchmark
- Pashto machine translation resources
- Pukhto language technology
- Pushto AI resources
## Current Scope
- Build open Pashto datasets, benchmarks, and model references for ASR, TTS, NLP, and MT.
- Track practical tools, apps, and academic papers that help integrate Pashto into technology.
- Keep everything transparent, reproducible, and contribution-friendly.
## Resource System
Machine-readable and searchable resource pipeline:
- Canonical catalog: [resources/catalog/resources.json](resources/catalog/resources.json)
- Catalog schema: [resources/schema/resource.schema.json](resources/schema/resource.schema.json)
- Candidate feed (auto-generated): [resources/catalog/pending_candidates.json](resources/catalog/pending_candidates.json)
- Search UI source: [docs/search/index.html](docs/search/index.html)
- Search data export: [docs/search/resources.json](docs/search/resources.json)
- Resource index docs: [docs/resource_catalog.md](docs/resource_catalog.md)
- Automation docs: [docs/resource_automation.md](docs/resource_automation.md)
- Cycle runbook: [docs/resource_cycle_runbook.md](docs/resource_cycle_runbook.md)
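As a rough illustration of how the machine-readable catalog might be consumed, the sketch below loads the catalog file and keeps only verified entries. The `status` field and its `verified` value come from this README; everything else is an assumption: the top-level layout (a plain list, or an object with a `resources` key) is a guess, and `load_entries` and `verified_entries` are hypothetical helper names, not part of the repository. Check [resources/schema/resource.schema.json](resources/schema/resource.schema.json) for the real structure.

```python
import json
from pathlib import Path


def load_entries(catalog_path):
    """Return the catalog entries as a list of dicts.

    Assumption: the file is either a JSON list of entries or a JSON object
    that holds the list under a "resources" key. The real layout is defined
    by the catalog schema and may differ.
    """
    data = json.loads(Path(catalog_path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        data = data.get("resources", [])
    # Ignore anything that is not a JSON object so the filter below is safe.
    return [entry for entry in data if isinstance(entry, dict)]


def verified_entries(entries):
    """Keep only entries whose "status" field equals "verified"."""
    return [entry for entry in entries if entry.get("status") == "verified"]
```

From the repository root, `verified_entries(load_entries("resources/catalog/resources.json"))` would then return the verified subset, provided the layout assumption holds.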
## How New Resources Are Added
1. Automated discovery runs daily from `.github/workflows/resource_sync.yml` and updates `resources/catalog/pending_candidates.json` in a review PR.
2. Manual review checks quality, Pashto evidence, and license compatibility before promoting entries into `resources/catalog/resources.json` with `status: verified`.
3. The validation and regeneration step runs `python scripts/validate_resource_catalog.py` and `python scripts/generate_resource_views.py`, then commits the generated updates.
Shortcut wrapper:
- `python scripts/run_resource_cycle.py --limit 25`
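The validate-then-regenerate part of the steps above can be sketched as a small Python driver. This is a minimal illustration, not the contents of `scripts/run_resource_cycle.py`: the two script paths come from this README, while `CYCLE_STEPS`, `run_steps`, and the stop-on-first-failure behavior are assumptions made for the example.

```python
import subprocess
import sys

# Script paths as listed in step 3 above.
CYCLE_STEPS = [
    "scripts/validate_resource_catalog.py",
    "scripts/generate_resource_views.py",
]


def run_steps(steps=CYCLE_STEPS, runner=subprocess.run):
    """Run each script in order and stop at the first failure.

    Hypothetical helper: returns the list of scripts that exited with
    status 0. The runner is injectable so the logic can be exercised
    without launching real processes.
    """
    completed = []
    for script in steps:
        result = runner([sys.executable, script])
        if result.returncode != 0:
            # Assumed policy: do not regenerate views from an invalid catalog.
            break
        completed.append(script)
    return completed
```

Stopping at the first non-zero exit code keeps generated views from being rebuilt on top of a catalog that failed validation; whether the real wrapper behaves this way is not stated here.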
## Quickstart
```bash
python -m pip install -e ".[dev]"
python scripts/validate_resource_catalog.py
python scripts/generate_resource_views.py
python scripts/check_links.py
python -m pytest -q
```
## Discoverability And SEO
- Playbook: [docs/discoverability_seo.md](docs/discoverability_seo.md)
- Docs hub: [docs/README.md](docs/README.md)
- Resource search page: [docs/search/index.html](docs/search/index.html)
- Citation metadata: [CITATION.cff](CITATION.cff)
- Platform sync policy: [docs/platform_sync_policy.md](docs/platform_sync_policy.md)
- GitHub topics checklist: [docs/github_topics_checklist.md](docs/github_topics_checklist.md)
- Backlink strategy: [docs/backlink_strategy.md](docs/backlink_strategy.md)
- Intent page: [Pashto datasets](docs/pashto_datasets.md)
- Intent page: [Pashto ASR](docs/pashto_asr.md)
- Intent page: [Pashto TTS](docs/pashto_tts.md)
- Release notes: [v0.1.1](docs/release_v0.1.1.md)
- Release notes: [v0.1.2](docs/release_v0.1.2.md)
## Documentation Map
- Purpose: [PROJECT_PURPOSE.md](PROJECT_PURPOSE.md)
- Contributing: [CONTRIBUTING.md](CONTRIBUTING.md)
- Roadmap: [ROADMAP.md](ROADMAP.md)
- Governance: [GOVERNANCE.md](GOVERNANCE.md)
- License policy: [LICENSE_POLICY.md](LICENSE_POLICY.md)
- Changelog: [CHANGELOG.md](CHANGELOG.md)
- Community: [community/COMMUNICATION.md](community/COMMUNICATION.md)
- Docs hub: [docs/README.md](docs/README.md)
- Resource index: [docs/resource_catalog.md](docs/resource_catalog.md)
- Resource automation: [docs/resource_automation.md](docs/resource_automation.md)
## Resource Sections
- Datasets: [resources/datasets/README.md](resources/datasets/README.md)
- Models: [resources/models/README.md](resources/models/README.md)
- Benchmarks: [resources/benchmarks/README.md](resources/benchmarks/README.md)
- Tools: [resources/tools/README.md](resources/tools/README.md)
- Papers: [resources/papers/README.md](resources/papers/README.md)
- Projects: [resources/projects/README.md](resources/projects/README.md)
- Code: [resources/codes/README.md](resources/codes/README.md)
## Workspaces
- [data/](data/README.md): datasets, curation, metadata, quality
- [asr/](asr/README.md): ASR baselines and experiments
- [tts/](tts/README.md): TTS baselines and experiments
- [benchmarks/](benchmarks/README.md): benchmark sets and evaluation
- [experiments/](experiments/README.md): reproducible run cards
- [apps/desktop/](apps/desktop/README.md): user-facing integration references
- [models/](models/README.md): model layout and release conventions