# Resource Automation This repository uses a semi-automated process to keep Pashto resources current while preserving human review. ## Goals - Discover new Pashto-relevant resources from trusted public endpoints. - Keep a machine-readable canonical catalog. - Prevent unreviewed low-confidence resources from directly entering verified lists. ## Covered source types - Kaggle datasets - Hugging Face datasets - Hugging Face models - Hugging Face Spaces (projects) - GitHub repositories (projects and code) - GitLab repositories (projects and code) - Zenodo records - Dataverse datasets - DataCite DOI records - Research-paper endpoints (arXiv, Semantic Scholar, OpenAlex, Crossref) ## Files involved - Canonical verified catalog: [../resources/catalog/resources.json](../resources/catalog/resources.json) - Candidate feed: [../resources/catalog/pending_candidates.json](../resources/catalog/pending_candidates.json) - Catalog schema: [../resources/schema/resource.schema.json](../resources/schema/resource.schema.json) - Search export: [search/resources.json](search/resources.json) ## Scripts - Validate catalog: `python scripts/validate_resource_catalog.py` - Generate markdown and search index: `python scripts/generate_resource_views.py` - Sync new candidates: `python scripts/sync_resources.py --limit 20` - Full run wrapper: `python scripts/run_resource_cycle.py --limit 25` ## GitHub Actions - CI (`.github/workflows/ci.yml`) enforces: - catalog validation - generated file consistency - markdown link checks - tests - Resource Sync (`.github/workflows/resource_sync.yml`) runs daily and opens a PR with candidate updates. ## Review flow 1. Inspect candidate entries in `resources/catalog/pending_candidates.json`. 2. Select useful items and move them into `resources/catalog/resources.json`. 3. Set `status` to `verified` only after checking evidence and license. 4. Run: - `python scripts/validate_resource_catalog.py` - `python scripts/generate_resource_views.py` 5. Commit and open PR. ## Runbook - Reusable process guide: [resource_cycle_runbook.md](resource_cycle_runbook.md)