metadata
license: apache-2.0
language:
  - ps
  - en
tags:
  - pashto
  - pukhto
  - pushto
  - asr
  - tts
  - nlp
  - machine-translation
  - language-resources
  - low-resource-languages
  - speech-recognition

Pashto Language Resources Hub (Pukhto/Pashto)

Open-source repository for Pashto language technology resources: datasets, models, benchmarks, automatic speech recognition (ASR), text-to-speech (TTS), natural language processing (NLP), and machine translation (MT).

This project curates verified Pashto resources and maintains reproducible tooling for discovery, validation, and documentation.

Start Here

If You Searched For

This repository is relevant to these search intents:

  • Pashto datasets
  • Pashto ASR model
  • Pashto TTS resources
  • Pashto NLP benchmark
  • Pashto machine translation resources
  • Pukhto language technology
  • Pushto AI resources

Current Scope

  • Build open Pashto datasets, benchmarks, and model references for ASR, TTS, NLP, and MT.
  • Track practical tools, apps, and academic papers relevant to integrating Pashto into software and technology.
  • Keep everything transparent, reproducible, and contribution-friendly.

Resource System

Machine-readable and searchable resource pipeline:

How New Resources Are Added

  1. Automated discovery runs daily via .github/workflows/resource_sync.yml, which writes new candidates to resources/catalog/pending_candidates.json and proposes the changes in a pull request for review.
  2. Manual review checks each candidate's quality, evidence that it actually covers Pashto, and license compatibility before the entry is promoted into resources/catalog/resources.json with status: verified.
  3. Validation and regeneration: python scripts/validate_resource_catalog.py checks the catalog, python scripts/generate_resource_views.py rebuilds the generated views, and the resulting updates are committed.
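The review-and-promote step can be sketched roughly as follows. This is a minimal illustration, not the contents of scripts/validate_resource_catalog.py: it assumes a hypothetical schema in which each entry is a JSON object with id, url, license, and status fields, and that the reviewer supplies the set of ids they approved. The helper names validate_entry and promote are also hypothetical.

```python
# Hypothetical schema: these field names are assumptions, not the project's real schema.
REQUIRED_FIELDS = ("id", "url", "license")
ALLOWED_STATUSES = {"pending", "verified"}


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one catalog entry (empty if valid)."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if not entry.get(name)]
    url = entry.get("url", "")
    if url and not url.startswith(("http://", "https://")):
        errors.append(f"url is not http(s): {url}")
    status = entry.get("status", "pending")
    if status not in ALLOWED_STATUSES:
        errors.append(f"unknown status: {status}")
    return errors


def promote(pending: list[dict], catalog: list[dict], approved_ids: set[str]) -> list[dict]:
    """Return a new catalog with reviewer-approved, valid candidates marked 'verified'."""
    known = {entry["id"] for entry in catalog}
    promoted = list(catalog)  # shallow copy; the input catalog is left unchanged
    for candidate in pending:
        cid = candidate.get("id")
        if cid not in approved_ids or cid in known:
            continue  # not approved by a reviewer, or already catalogued
        entry = {**candidate, "status": "verified"}
        if validate_entry(entry):
            continue  # invalid entries stay pending for another review round
        promoted.append(entry)
        known.add(cid)
    return promoted
```

For example, promote(pending, [], {"example-asr-corpus"}) adds only that candidate, and only if it has a license and an http(s) URL. Keeping validation as a separate function lets the same checks run both at promotion time and over the whole catalog in CI.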

Shortcut wrapper:

  • python scripts/run_resource_cycle.py --limit 25
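A wrapper of this kind usually chains the cycle's steps and stops at the first failure. The sketch below shows that pattern with the standard subprocess module. It is hypothetical and does not reproduce scripts/run_resource_cycle.py: the STEPS list is an assumption, and the discovery step and the handling of --limit (whose exact meaning this README does not document) are left out.

```python
import subprocess
import sys

# Hypothetical step list: the real wrapper may run different or additional steps.
STEPS = [
    [sys.executable, "scripts/validate_resource_catalog.py"],
    [sys.executable, "scripts/generate_resource_views.py"],
]


def run_steps(steps: list[list[str]]) -> int:
    """Run each command in order; return the first non-zero exit code, or 0 if all pass."""
    for cmd in steps:
        print("running:", " ".join(cmd), flush=True)
        completed = subprocess.run(cmd)
        if completed.returncode != 0:
            print(f"step failed with exit code {completed.returncode}", file=sys.stderr)
            return completed.returncode  # later steps are skipped
    return 0
```

Calling sys.exit(run_steps(STEPS)) from the repository root would propagate the failing step's exit code, so CI marks the run as failed without regenerating views from an invalid catalog.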

Quickstart

python -m pip install -e ".[dev]"
python scripts/validate_resource_catalog.py
python scripts/generate_resource_views.py
python scripts/check_links.py
python -m pytest -q
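A link checker such as scripts/check_links.py has two jobs: find URLs in the documentation or catalog, then probe them. The following is a hypothetical standard-library stand-in, not the project's script; extract_urls and check_url are illustrative names, and the URL regex is deliberately rough. Some servers reject HEAD requests, so a real checker may need a GET fallback.

```python
import re
import urllib.request

# Rough URL pattern: stops at whitespace, quotes, angle brackets, and closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)\]]+")


def extract_urls(text: str) -> list[str]:
    """Return unique URLs in first-seen order, with trailing punctuation trimmed."""
    seen: dict[str, None] = {}
    for match in URL_PATTERN.findall(text):
        seen.setdefault(match.rstrip(".,;:"), None)
    return list(seen)


def check_url(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request gets a non-error HTTP response."""
    try:
        # Request() raises ValueError for malformed URLs, so build it inside the try.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError (4xx/5xx) and socket timeouts.
        return False
```

A typical loop reads each Markdown or JSON file, calls extract_urls on its text, and reports every URL for which check_url returns False.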

Discoverability And SEO

Documentation Map

Resource Sections

Workspaces

  • data/: datasets, curation, metadata, quality
  • asr/: ASR baselines and experiments
  • tts/: TTS baselines and experiments
  • benchmarks/: benchmark sets and evaluation
  • experiments/: reproducible run cards
  • apps/desktop/: user-facing integration references
  • models/: model layout and release conventions