musaw committed on
Commit 9003457
Parent(s): 53fd6b7

docs(seo): add intent pages, topics checklist, backlink plan, and release drafts

Files changed:
- .github/release_template.md +2 -0
- CHANGELOG.md +7 -3
- README.md +25 -2
- docs/README.md +11 -0
- docs/backlink_strategy.md +64 -0
- docs/discoverability_seo.md +8 -0
- docs/github_topics_checklist.md +45 -0
- docs/index.md +13 -2
- docs/pashto_asr.md +31 -0
- docs/pashto_datasets.md +30 -0
- docs/pashto_tts.md +31 -0
- docs/platform_sync_policy.md +9 -0
- docs/release_checklist.md +2 -1
- docs/release_v0.1.1.md +38 -0
- docs/release_v0.1.2.md +33 -0
- pyproject.toml +2 -1

.github/release_template.md CHANGED
@@ -19,3 +19,5 @@
 ## References
 - Changelog: [CHANGELOG.md](../CHANGELOG.md)
 - Release checklist: [docs/release_checklist.md](../docs/release_checklist.md)
+- Release notes draft v0.1.1: [docs/release_v0.1.1.md](../docs/release_v0.1.1.md)
+- Release notes draft v0.1.2: [docs/release_v0.1.2.md](../docs/release_v0.1.2.md)

CHANGELOG.md CHANGED
@@ -7,13 +7,17 @@ and this project uses milestone-style tags (for example `v0.1`, `v0.2`).
 
 ## [Unreleased]
 ### Added
--
+- Hugging Face model-card metadata at the top of `README.md`.
+- GitHub topics checklist (`docs/github_topics_checklist.md`).
+- Backlink strategy plan (`docs/backlink_strategy.md`).
+- SEO intent landing pages for Pashto datasets, ASR, and TTS.
+- Release note drafts for `v0.1.1` and `v0.1.2`.
 
 ### Changed
--
+- Expanded documentation map to include SEO operations and intent pages.
 
 ### Fixed
--
+- Finalized hardcoded URL coverage for the `pashto-language-resources` slug across docs and release notes.
 
 ## [v0.1] - 2026-02-14
 ### Added

README.md CHANGED
@@ -1,3 +1,21 @@
+---
+license: apache-2.0
+language:
+- ps
+- en
+tags:
+- pashto
+- pukhto
+- pushto
+- asr
+- tts
+- nlp
+- machine-translation
+- language-resources
+- low-resource-languages
+- speech-recognition
+---
+
 # Pashto Language Resources Hub (Pukhto/Pashto)
 
 Open-source repository for Pashto language technology resources: datasets, models, benchmarks, ASR, TTS, NLP, and machine translation (MT).
@@ -68,6 +86,13 @@ python -m pytest -q
 - Resource search page: [docs/search/index.html](docs/search/index.html)
 - Citation metadata: [CITATION.cff](CITATION.cff)
 - Platform sync policy: [docs/platform_sync_policy.md](docs/platform_sync_policy.md)
+- GitHub topics checklist: [docs/github_topics_checklist.md](docs/github_topics_checklist.md)
+- Backlink strategy: [docs/backlink_strategy.md](docs/backlink_strategy.md)
+- Intent page: [Pashto datasets](docs/pashto_datasets.md)
+- Intent page: [Pashto ASR](docs/pashto_asr.md)
+- Intent page: [Pashto TTS](docs/pashto_tts.md)
+- Release notes: [v0.1.1](docs/release_v0.1.1.md)
+- Release notes: [v0.1.2](docs/release_v0.1.2.md)
 
 ## Documentation Map
 
@@ -101,5 +126,3 @@ python -m pytest -q
 - [experiments/](experiments/README.md): reproducible run cards
 - [apps/desktop/](apps/desktop/README.md): user-facing integration references
 - [models/](models/README.md): model layout and release conventions
-
-
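
The YAML block added at the top of `README.md` is what Hugging Face reads as model-card metadata. Below is a minimal sketch of a pre-push check that the keys used in this commit (`license`, `language`, `tags`) are present. It is a hypothetical helper, not a script in this repository, and it only understands the flat `key: value` / `- item` subset used here; a real check would more likely use a full YAML parser.

```python
from __future__ import annotations

# Keys this commit sets in the README front matter (assumed to be the ones worth checking).
REQUIRED_KEYS = ("license", "language", "tags")


def parse_front_matter(text: str) -> dict[str, object]:
    """Parse a flat YAML front matter block (scalars and '- item' lists only).

    Returns {} when the text does not start with '---' or the block is never closed.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    data: dict[str, object] = {}
    current_list: list[str] | None = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            return data  # closing delimiter reached
        if stripped.startswith("- ") and current_list is not None:
            current_list.append(stripped[2:].strip())  # item of the currently open list
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            continue  # not a key line; ignored in this sketch
        value = value.strip()
        if value:
            data[key.strip()] = value
            current_list = None
        else:
            current_list = []  # 'key:' with no value opens a list
            data[key.strip()] = current_list
    return {}


def missing_metadata(text: str) -> list[str]:
    """Return the required keys that are absent or empty in the front matter."""
    meta = parse_front_matter(text)
    return [key for key in REQUIRED_KEYS if not meta.get(key)]
```

Running `missing_metadata(open("README.md", encoding="utf-8").read())` before a push should return an empty list when the metadata block is intact.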

docs/README.md CHANGED
@@ -17,6 +17,8 @@ This folder is the main documentation entry point for contributors.
 - Common Voice Pashto integration: [common_voice_pashto_24.md](common_voice_pashto_24.md)
 - Discoverability and SEO playbook: [discoverability_seo.md](discoverability_seo.md)
 - Platform sync policy: [platform_sync_policy.md](platform_sync_policy.md)
+- GitHub topics checklist: [github_topics_checklist.md](github_topics_checklist.md)
+- Backlink strategy: [backlink_strategy.md](backlink_strategy.md)
 - Release process: [release_process.md](release_process.md)
 - Release checklist: [release_checklist.md](release_checklist.md)
 - Platforms and publish flow: [platforms.md](platforms.md)
@@ -24,6 +26,15 @@ This folder is the main documentation entry point for contributors.
 - Resource automation: [resource_automation.md](resource_automation.md)
 - Resource cycle runbook: [resource_cycle_runbook.md](resource_cycle_runbook.md)
 
+## Intent landing pages
+- Pashto datasets page: [pashto_datasets.md](pashto_datasets.md)
+- Pashto ASR page: [pashto_asr.md](pashto_asr.md)
+- Pashto TTS page: [pashto_tts.md](pashto_tts.md)
+
+## Release notes
+- v0.1.1 draft notes: [release_v0.1.1.md](release_v0.1.1.md)
+- v0.1.2 draft notes: [release_v0.1.2.md](release_v0.1.2.md)
+
 ## Resource tracking
 - Master resource index: [resource_catalog.md](resource_catalog.md)
 - GitHub Pages search: [search/index.html](search/index.html)
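
Every entry added to `docs/README.md` is a relative link, so a renamed or missing file silently breaks the hub. The sketch below shows one way to resolve relative link targets against the linking file and compare them with the set of files that exist. The function names are hypothetical, and this is not the repository's `scripts/check_links.py`, whose behavior this commit does not show.

```python
import posixpath
import re

# Inline Markdown links [text](target); reference-style links are out of scope for this sketch.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def relative_link_targets(markdown: str, doc_path: str) -> list[str]:
    """Resolve relative link targets in `markdown` against the directory of `doc_path`."""
    base = posixpath.dirname(doc_path)
    targets = []
    for target in LINK_RE.findall(markdown):
        if "://" in target or target.startswith(("#", "mailto:")):
            continue  # external URL or in-page anchor
        path = target.split("#", 1)[0]  # drop any fragment such as '#section'
        targets.append(posixpath.normpath(posixpath.join(base, path)))
    return targets


def broken_links(markdown: str, doc_path: str, existing: set[str]) -> list[str]:
    """Return resolved targets that are not in the set of known repository paths."""
    return [t for t in relative_link_targets(markdown, doc_path) if t not in existing]
```

`existing` would typically be built from a walk of the repository with paths made relative to the root; `../CHANGELOG.md` linked from `docs/README.md` resolves to `CHANGELOG.md`.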

docs/backlink_strategy.md ADDED
@@ -0,0 +1,64 @@
+# Backlink Strategy
+
+Goal: increase discoverability of Pashto AI resources by earning high-quality links to this repository and its searchable pages.
+
+## Priority Destination URLs
+
+- Home page: [Pashto Language Resources home](https://musawer1214.github.io/pashto-language-resources/)
+- Search page: [Pashto resource search](https://musawer1214.github.io/pashto-language-resources/search/)
+- Repository: [Musawer1214/pashto-language-resources](https://github.com/Musawer1214/pashto-language-resources)
+
+## Link Building Targets
+
+- Pashto/NLP community resource lists.
+- University lab pages working on low-resource languages.
+- Research paper reproducibility pages (appendix/code links).
+- Awesome-list maintainers for speech/NLP resources.
+- Hugging Face model and dataset cards that reference this hub.
+
+## Outreach Plays
+
+1. Documentation backlinks
+   - Add this repository to relevant READMEs and curated resource pages.
+   - Ask collaborators to link specific intent pages (datasets, ASR, TTS).
+
+2. Research backlinks
+   - When publishing results, include repository + exact page links in `Code` and `Data availability` sections.
+
+3. Community backlinks
+   - Share monthly update posts with direct links to release notes and search page.
+
+## Anchor Text Guidance
+
+Preferred anchors:
+- Pashto language resources
+- Pashto ASR resources
+- Pashto TTS resources
+- Pashto datasets
+
+Avoid repeated exact-match overuse; vary naturally.
+
+## Tracking
+
+Track monthly in a simple sheet:
+
+- Source URL
+- Destination URL
+- Anchor text
+- Domain type (academic, community, code, media)
+- Date added
+- Follow-up status
+
+## KPI Targets (Quarterly)
+
+- 15+ new relevant referring pages.
+- 3+ new links from academic/lab domains.
+- Increased branded queries for `pashto-language-resources`.
+- Higher impressions for `pashto datasets`, `pashto asr`, `pashto tts`.
+
+## Related Docs
+
+- [Discoverability and SEO](discoverability_seo.md)
+- [GitHub topics checklist](github_topics_checklist.md)
+- [Release process](release_process.md)
+
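
The backlink plan lists six tracking columns and quarterly KPI targets (15+ referring pages, 3+ academic/lab links) and warns against exact-match anchor overuse. The sketch below keeps the sheet as CSV and computes those two counts plus a rough anchor-concentration signal. The class and field names are illustrative, and it assumes one row per link, with the rows passed in covering a single quarter.

```python
import csv
import io
from collections import Counter
from dataclasses import asdict, dataclass, fields


@dataclass
class BacklinkRow:
    source_url: str
    destination_url: str
    anchor_text: str
    domain_type: str  # academic, community, code, or media
    date_added: str  # ISO date string, e.g. "2026-03-01"
    follow_up_status: str


def to_csv(rows: list[BacklinkRow]) -> str:
    """Serialize rows to CSV with one column per sheet field."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(BacklinkRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()


def quarterly_kpis(rows: list[BacklinkRow]) -> dict[str, object]:
    """Compute KPI counts for rows assumed to belong to one quarter."""
    referring = {r.source_url for r in rows}
    academic = {r.source_url for r in rows if r.domain_type == "academic"}
    anchors = Counter(r.anchor_text.strip().lower() for r in rows)
    top_share = max(anchors.values()) / len(rows) if rows else 0.0
    return {
        "referring_pages": len(referring),
        "academic_pages": len(academic),
        # A high share suggests one exact-match anchor is being overused.
        "top_anchor_share": top_share,
        "meets_targets": len(referring) >= 15 and len(academic) >= 3,
    }
```

Deduplicating by `source_url` means two links from the same page count as one referring page, which matches the "referring pages" wording of the KPI.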

docs/discoverability_seo.md CHANGED
@@ -66,6 +66,13 @@ Keep these updated when renaming slug or domain.
 - Conference/demo pages for Pashto language technology
 - Ask contributors to link specific resource pages in blog posts or papers.
 
+## SEO Operation Assets
+
+- GitHub topics checklist: [github_topics_checklist.md](github_topics_checklist.md)
+- Backlink strategy: [backlink_strategy.md](backlink_strategy.md)
+- Intent page: [Pashto datasets](pashto_datasets.md)
+- Intent page: [Pashto ASR](pashto_asr.md)
+- Intent page: [Pashto TTS](pashto_tts.md)
 ## 6) Indexing Checklist (After Push)
 
 1. Push all changes to `main`.
@@ -82,3 +89,4 @@ Keep these updated when renaming slug or domain.
 - Search page
 6. Recheck search visibility after 1 to 3 weeks.
 
+

docs/github_topics_checklist.md ADDED
@@ -0,0 +1,45 @@
+# GitHub Topics Checklist
+
+Use this checklist to keep repository topics aligned with search intent and discoverability.
+
+## Recommended Topics
+
+- pashto
+- pukhto
+- pushto
+- asr
+- tts
+- nlp
+- machine-translation
+- speech-recognition
+- language-resources
+- low-resource-languages
+- multilingual
+- dataset-curation
+
+## Manual Update Steps
+
+1. Open repository home page.
+2. In the right `About` panel, click the gear icon.
+3. Add or update topics.
+4. Save changes.
+
+## Monthly Audit Checklist
+
+- [ ] Topics match project scope (ASR, TTS, NLP, MT, resources).
+- [ ] Synonyms are present (pashto, pukhto, pushto).
+- [ ] No stale or misleading topics remain.
+- [ ] README keywords and topics are still consistent.
+- [ ] GitHub Pages home and search links are present in About website/docs.
+
+## Validation Commands
+
+```bash
+rg -n "pashto|pukhto|pushto|asr|tts|nlp|machine-translation" README.md docs
+```
+
+## Related Docs
+
+- [Discoverability and SEO](discoverability_seo.md)
+- [Backlink strategy](backlink_strategy.md)
+- [Platform sync policy](platform_sync_policy.md)
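
The `rg` validation command in the topics checklist lists matches but does not say which recommended topics are absent. Below is a hypothetical Python helper for the inverse check. It treats a hyphenated topic such as `machine-translation` as matched by either the hyphenated or the spaced form, since README prose usually uses spaces.

```python
import re

# Recommended topics copied from the checklist; keep in sync if the list changes.
RECOMMENDED_TOPICS = (
    "pashto", "pukhto", "pushto", "asr", "tts", "nlp",
    "machine-translation", "speech-recognition", "language-resources",
    "low-resource-languages", "multilingual", "dataset-curation",
)


def topics_missing_from_text(text: str, topics: tuple[str, ...] = RECOMMENDED_TOPICS) -> list[str]:
    """Return topics with no whole-word match in `text` (hyphenated or spaced form)."""
    lowered = text.lower()
    missing = []
    for topic in topics:
        variants = {topic, topic.replace("-", " ")}
        if not any(re.search(rf"\b{re.escape(v)}\b", lowered) for v in variants):
            missing.append(topic)
    return missing
```

For example, calling `topics_missing_from_text(Path("README.md").read_text(encoding="utf-8"))` (with `from pathlib import Path`) would list candidates to review. A topic can legitimately be absent from the prose, so the output works better as an audit prompt than as a hard failure.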

docs/index.md CHANGED
@@ -30,6 +30,12 @@ description: Open-source Pashto (Pukhto/Pashto) datasets, models, benchmarks, AS
 - Resource index docs: [resource_catalog.md](resource_catalog.md)
 - Machine-readable catalog (GitHub): [resources.json source](https://github.com/Musawer1214/pashto-language-resources/blob/main/resources/catalog/resources.json)
 
+## Intent Pages
+
+- Pashto datasets: [pashto_datasets.md](pashto_datasets.md)
+- Pashto ASR resources: [pashto_asr.md](pashto_asr.md)
+- Pashto TTS resources: [pashto_tts.md](pashto_tts.md)
+
 ## Project References
 
 - Repository: [Musawer1214/pashto-language-resources](https://github.com/Musawer1214/pashto-language-resources)
@@ -38,6 +44,13 @@ description: Open-source Pashto (Pukhto/Pashto) datasets, models, benchmarks, AS
 - Roadmap: [ROADMAP.md](../ROADMAP.md)
 - Contributing: [CONTRIBUTING.md](../CONTRIBUTING.md)
 
+## SEO Operations
+
+- GitHub topics checklist: [github_topics_checklist.md](github_topics_checklist.md)
+- Backlink strategy: [backlink_strategy.md](backlink_strategy.md)
+- Release notes v0.1.1: [release_v0.1.1.md](release_v0.1.1.md)
+- Release notes v0.1.2: [release_v0.1.2.md](release_v0.1.2.md)
+
 ## Contributing
 
 You can help by improving documentation, validating normalization rows, sharing verified resources, or contributing data, model, and evaluation workflows.
@@ -60,5 +73,3 @@ This project is relevant to searches like:
 ## License
 
 This project is released under Apache 2.0. See [LICENSE](../LICENSE).
-
-

docs/pashto_asr.md ADDED
@@ -0,0 +1,31 @@
+---
+title: Pashto ASR Resources
+description: Pashto automatic speech recognition resources, datasets, models, and benchmark links.
+keywords: pashto asr, pukhto speech recognition, pushto asr model
+---
+
+# Pashto ASR Resources
+
+This page targets search intent around Pashto automatic speech recognition.
+
+## Start Here
+
+- Search all resources: [Pashto resource search](search/index.html)
+- ASR workspace: [asr/README.md](../asr/README.md)
+- Models index: [resources/models/README.md](../resources/models/README.md)
+- Benchmarks index: [resources/benchmarks/README.md](../resources/benchmarks/README.md)
+
+## What You Can Find
+
+- Pashto speech datasets and transcription references.
+- ASR model checkpoints and evaluation links.
+- Benchmark notes and reproducible experiment pointers.
+
+## Related Intent Pages
+
+- [Pashto datasets](pashto_datasets.md)
+- [Pashto TTS resources](pashto_tts.md)
+
+## Contribution
+
+For ASR additions, include task definition, split info, and WER/CER style metrics when available.
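
The ASR page asks contributors for WER/CER-style metrics. Both metrics divide the Levenshtein edit distance between reference and hypothesis by the reference length. WER counts over words; CER counts over characters. Below is a minimal sketch. It assumes plain whitespace tokenization and does no Pashto-specific text normalization (diacritics, letter variants), which real evaluations usually apply first. CER conventions also differ on whether spaces count; this version keeps single spaces.

```python
from collections.abc import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance where substitutions, insertions, and deletions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,  # deletion
                curr[j - 1] + 1,  # insertion
                prev[j - 1] + (r != h),  # substitution (or free match)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over whitespace-normalized strings (single spaces count)."""
    ref = " ".join(reference.split())
    hyp = " ".join(hypothesis.split())
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `wer("زه ښه یم", "زه یم")` is 1/3: one deleted word out of three reference words. WER can exceed 1.0 when the hypothesis contains many insertions.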

docs/pashto_datasets.md ADDED
@@ -0,0 +1,30 @@
+---
+title: Pashto Datasets
+description: Curated Pashto datasets for ASR, TTS, NLP, MT, and language technology benchmarking.
+keywords: pashto datasets, pukhto dataset, pushto data, pashto nlp resources
+---
+
+# Pashto Datasets
+
+This page is for people searching for Pashto datasets across speech and text tasks.
+
+## Start Here
+
+- Search all resources: [Pashto resource search](search/index.html)
+- Dataset index: [resources/datasets/README.md](../resources/datasets/README.md)
+- Catalog overview: [resource_catalog.md](resource_catalog.md)
+
+## Dataset Coverage
+
+- Speech datasets for ASR and TTS.
+- Text corpora for NLP and MT.
+- Benchmark-ready subsets and metadata references.
+
+## Related Intent Pages
+
+- [Pashto ASR resources](pashto_asr.md)
+- [Pashto TTS resources](pashto_tts.md)
+
+## Contribution
+
+To add a dataset, follow [dataset_guidelines.md](dataset_guidelines.md) and submit a PR with evidence, license, and task tags.
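
The dataset contribution rule asks for evidence, license, and task tags in each PR. Below is a sketch of a pre-PR sanity check on a single dataset entry. The field names (`name`, `url`, `license`, `tasks`, `evidence`) and the task vocabulary are assumptions chosen for illustration. They are not the schema of `resources/catalog/resources.json` or the rules in `dataset_guidelines.md`.

```python
# Hypothetical schema for illustration only; align with the real catalog before use.
REQUIRED_FIELDS = ("name", "url", "license", "tasks", "evidence")
KNOWN_TASKS = {"asr", "tts", "nlp", "mt"}


def check_dataset_entry(entry: dict) -> list[str]:
    """Return human-readable problems; an empty list means the entry passes."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not entry.get(field)]
    unknown = set(entry.get("tasks") or []) - KNOWN_TASKS
    if unknown:
        problems.append(f"unknown task tags: {sorted(unknown)}")
    url = entry.get("url") or ""
    if url and not url.startswith("https://"):
        problems.append("url should use https")
    return problems
```

Returning a list of messages instead of raising on the first issue lets a reviewer see every gap in one pass.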

docs/pashto_tts.md ADDED
@@ -0,0 +1,31 @@
+---
+title: Pashto TTS Resources
+description: Pashto text-to-speech resources, datasets, voices, and model references.
+keywords: pashto tts, pukhto text to speech, pushto speech synthesis
+---
+
+# Pashto TTS Resources
+
+This page targets search intent around Pashto text-to-speech resources.
+
+## Start Here
+
+- Search all resources: [Pashto resource search](search/index.html)
+- TTS workspace: [tts/README.md](../tts/README.md)
+- Dataset index: [resources/datasets/README.md](../resources/datasets/README.md)
+- Models index: [resources/models/README.md](../resources/models/README.md)
+
+## What You Can Find
+
+- TTS-ready Pashto corpora and audio-text pairs.
+- Voice model references and synthesis tooling.
+- Evaluation and benchmark references for speech quality.
+
+## Related Intent Pages
+
+- [Pashto datasets](pashto_datasets.md)
+- [Pashto ASR resources](pashto_asr.md)
+
+## Contribution
+
+For TTS additions, include voice/language details, licensing, and synthesis quality evidence.
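
The TTS page asks for "synthesis quality evidence". One common form is a mean opinion score (MOS) from a listening test on a 1 to 5 scale. The sketch below reports the mean with a normal-approximation confidence interval (z = 1.96 for roughly 95%). This is an assumption about how such evidence might be summarized, not a format the project prescribes. With few ratings, a t-based interval would be wider.

```python
import math
from statistics import mean, stdev


def mos_with_ci(scores: list[float], z: float = 1.96) -> tuple[float, float, float]:
    """Mean opinion score with a normal-approximation confidence interval.

    `scores` are listener ratings on a 1-5 scale. The interval is not clipped to the scale.
    """
    if len(scores) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("ratings must be on a 1-5 scale")
    m = mean(scores)
    half_width = z * stdev(scores) / math.sqrt(len(scores))  # z * standard error
    return m, m - half_width, m + half_width
```

Reporting the interval alongside the mean, plus the number of listeners and utterances, makes scores from different Pashto voices easier to compare.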

docs/platform_sync_policy.md CHANGED
@@ -20,6 +20,14 @@ The goal is to keep content equivalent while respecting platform differences.
 - Keep absolute links pinned to the final slug `pashto-language-resources`.
 - Keep one shared `README.md` that is valid on both platforms.
 
+## Shared Markdown Subset (GitHub + Hugging Face)
+
+Use a lowest-common-denominator style in shared docs:
+
+- Standard Markdown headings, bullet lists, links, and fenced code blocks.
+- Relative links for internal files whenever possible.
+- YAML front matter only where needed (`README.md` for HF metadata, docs pages for Jekyll SEO).
+- Avoid GitHub-only HTML widgets and avoid HF-specific custom blocks in shared files.
 ## Known Platform Differences
 
 - GitHub accepts regular Git binary blobs in many repos.
@@ -49,3 +57,4 @@ If GitHub and Hugging Face histories diverge:
 - Keep GitHub `main` as canonical source history.
 - Sync Hugging Face using a content snapshot commit based on `hf/main`.
 - Do not rewrite remote history unless explicitly required.
+
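
The sync policy pins absolute links to the final `pashto-language-resources` slug. Below is a hypothetical scan that flags GitHub and GitHub Pages URLs under the `Musawer1214` account that point at any other slug. It only covers the two URL shapes used in these docs and does not look at Hugging Face URLs.

```python
import re

SLUG = "pashto-language-resources"
# GitHub repository URLs and GitHub Pages URLs for the Musawer1214 account.
URL_RE = re.compile(
    r"https://(?:github\.com/Musawer1214|musawer1214\.github\.io)/([A-Za-z0-9._-]+)"
)


def stale_slug_urls(text: str) -> list[str]:
    """Return URL prefixes whose repository slug is not the final slug."""
    stale = []
    for match in URL_RE.finditer(text):
        # Tolerate a trailing sentence period and a '.git' clone suffix.
        slug = match.group(1).rstrip(".").removesuffix(".git")
        if slug != SLUG:
            stale.append(match.group(0).rstrip("."))
    return stale
```

Only the URL prefix up to the slug is reported, which is enough to locate the link with a text search.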

docs/release_checklist.md CHANGED
@@ -1,4 +1,4 @@
-#
+# Release Checklist
 
 Use this checklist before tagging a new release.
 
@@ -13,3 +13,4 @@ Use this checklist before tagging a new release.
 - [ ] Run tests (`python -m pytest -q`).
 - [ ] Re-check key external resource links in [resource_catalog.md](resource_catalog.md).
 - [ ] Verify README rendering on GitHub and Hugging Face after push.
+

docs/release_v0.1.1.md ADDED
@@ -0,0 +1,38 @@
+# Release Notes Draft: v0.1.1
+
+Status: draft
+Target date: TBD
+
+## Summary
+
+v0.1.1 focuses on discoverability improvements and cross-platform consistency between GitHub and Hugging Face.
+
+## Highlights
+
+- Added HF model-card YAML metadata to `README.md`.
+- Added SEO operations docs:
+  - `docs/github_topics_checklist.md`
+  - `docs/backlink_strategy.md`
+- Added intent landing pages:
+  - `docs/pashto_datasets.md`
+  - `docs/pashto_asr.md`
+  - `docs/pashto_tts.md`
+- Expanded docs hub and index references for new SEO content.
+
+## Validation Checklist
+
+- [ ] `python scripts/check_links.py`
+- [ ] `python scripts/validate_resource_catalog.py`
+- [ ] `python -m pytest -q`
+
+## Release Commands
+
+```bash
+git tag v0.1.1
+git push origin v0.1.1
+```
+
+## Compare
+
+- GitHub compare: [v0.1...v0.1.1](https://github.com/Musawer1214/pashto-language-resources/compare/v0.1...v0.1.1)
+
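
The v0.1.1 draft lists three validation commands to run before `git tag`. The sketch below runs them in order and stops at the first failure. It assumes it is run from the repository root, that the two `scripts/*.py` files exist there as the checklist implies, and that `sys.executable` is an acceptable stand-in for `python`. Tagging itself is left as a manual step.

```python
import subprocess
import sys
from collections.abc import Sequence

# Commands from the draft's validation checklist.
VALIDATION_STEPS = (
    (sys.executable, "scripts/check_links.py"),
    (sys.executable, "scripts/validate_resource_catalog.py"),
    (sys.executable, "-m", "pytest", "-q"),
)


def run_validation(steps: Sequence[Sequence[str]] = VALIDATION_STEPS) -> bool:
    """Run each step in order; return False at the first non-zero exit code."""
    for cmd in steps:
        print("$", " ".join(cmd), flush=True)
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"failed with exit code {result.returncode}; do not tag", file=sys.stderr)
            return False
    return True
```

Tag and push only when `run_validation()` returns `True`. Stopping early keeps the first failure at the bottom of the output, where it is easiest to read.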

docs/release_v0.1.2.md ADDED
@@ -0,0 +1,33 @@
+# Release Notes Draft: v0.1.2
+
+Status: draft
+Target date: TBD
+
+## Summary
+
+v0.1.2 is reserved for post-indexing improvements after monitoring search performance and backlink growth.
+
+## Planned Scope
+
+- Refine high-intent docs based on Search Console query data.
+- Expand verified resource coverage with citation-quality metadata.
+- Publish first KPI review for backlinks and discoverability metrics.
+- Tighten internal linking between docs hub, search page, and resource sections.
+
+## Validation Checklist
+
+- [ ] `python scripts/check_links.py`
+- [ ] `python scripts/validate_resource_catalog.py`
+- [ ] `python -m pytest -q`
+
+## Release Commands
+
+```bash
+git tag v0.1.2
+git push origin v0.1.2
+```
+
+## Compare
+
+- GitHub compare: [v0.1.1...v0.1.2](https://github.com/Musawer1214/pashto-language-resources/compare/v0.1.1...v0.1.2)
+

pyproject.toml CHANGED
@@ -3,7 +3,7 @@ requires = ["setuptools>=68", "wheel"]
 build-backend = "setuptools.build_meta"
 
 [project]
-name = "
+name = "pashto-language-resources"
 version = "0.1.0"
 description = "Open Pashto language resources for ASR, TTS, NLP, and benchmarks"
 requires-python = ">=3.10"
@@ -34,3 +34,4 @@ python_files = ["test_*.py"]
 [tool.setuptools]
 packages = []
 
+