# Resource Automation
This repository uses a semi-automated process to keep Pashto resources current while preserving human review.
## Goals
- Discover new Pashto-relevant resources from trusted public endpoints.
- Keep a machine-readable canonical catalog.
- Prevent unreviewed low-confidence resources from directly entering verified lists.
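The last goal can be pictured as a simple gate on the `status` field. This is a minimal sketch, not the repository's actual logic: it assumes catalog entries are dictionaries with a `status` key, and the `id` key and function name are hypothetical.

```python
def verified_only(entries):
    """Return only the entries a reviewer has explicitly marked as verified.

    Assumes each entry is a dict; entries without a "status" key are
    treated as unverified and left out.
    """
    return [entry for entry in entries if entry.get("status") == "verified"]


# Illustrative data only; the "id" key is a hypothetical field name.
sample = [
    {"id": "a", "status": "verified"},
    {"id": "b", "status": "candidate"},
    {"id": "c"},
]
print([entry["id"] for entry in verified_only(sample)])
```

Filtering on an explicit `verified` value, rather than excluding known "bad" values, means a new or misspelled status is left out of verified lists by default.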
## Covered source types
- Kaggle datasets
- Hugging Face datasets
- Hugging Face models
- Hugging Face Spaces (projects)
- GitHub repositories (projects and code)
- GitLab repositories (projects and code)
- Zenodo records
- Dataverse datasets
- DataCite DOI records
- Research-paper endpoints (arXiv, Semantic Scholar, OpenAlex, Crossref)
## Files involved
- Canonical verified catalog: [../resources/catalog/resources.json](../resources/catalog/resources.json)
- Candidate feed: [../resources/catalog/pending_candidates.json](../resources/catalog/pending_candidates.json)
- Catalog schema: [../resources/schema/resource.schema.json](../resources/schema/resource.schema.json)
- Search export: [search/resources.json](search/resources.json)
## Scripts
- Validate catalog: `python scripts/validate_resource_catalog.py`
- Generate markdown and search index: `python scripts/generate_resource_views.py`
- Sync new candidates: `python scripts/sync_resources.py --limit 20`
- Full run wrapper: `python scripts/run_resource_cycle.py --limit 25`
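To show how a full-cycle wrapper could chain the individual scripts, here is a hedged sketch. It is not the contents of `run_resource_cycle.py`: the step order (sync, validate, generate) and the stop-on-first-failure behavior are assumptions, and `build_commands` and `run_cycle` are hypothetical names.

```python
import subprocess
import sys
from typing import List


def build_commands(limit: int) -> List[List[str]]:
    """Return the script invocations in the order this sketch assumes."""
    py = sys.executable  # reuse the interpreter that runs this sketch
    return [
        [py, "scripts/sync_resources.py", "--limit", str(limit)],
        [py, "scripts/validate_resource_catalog.py"],
        [py, "scripts/generate_resource_views.py"],
    ]


def run_cycle(limit: int = 25) -> None:
    """Run each step from the repository root, stopping at the first failure."""
    for cmd in build_commands(limit):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)
```

Calling `run_cycle(limit=25)` from the repository root would run the three steps in sequence under these assumptions.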
## GitHub Actions
- CI (`.github/workflows/ci.yml`) enforces:
  - catalog validation
  - generated file consistency
  - markdown link checks
  - tests
- Resource Sync (`.github/workflows/resource_sync.yml`) runs daily and opens a PR with candidate updates.
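A "generated file consistency" check is commonly done by regenerating the output and comparing it with the committed file. How `ci.yml` actually performs this check is not shown here; the following is a minimal sketch with a hypothetical helper name.

```python
from pathlib import Path


def is_up_to_date(path, expected_text):
    """Return True if the file at `path` exists and matches `expected_text` exactly."""
    target = Path(path)
    if not target.exists():
        return False
    return target.read_text(encoding="utf-8") == expected_text
```

A CI step built on this idea would fail the job whenever the function returns `False`, which signals that the generated views were not refreshed after a catalog change.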
## Review flow
1. Inspect candidate entries in `resources/catalog/pending_candidates.json`.
2. Select useful items and move them into `resources/catalog/resources.json`.
3. Set `status` to `verified` only after checking evidence and license.
4. Run:
   - `python scripts/validate_resource_catalog.py`
   - `python scripts/generate_resource_views.py`
5. Commit and open a PR.
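Steps 2 and 3 can be sketched in memory as follows. This is illustrative only: it assumes both files hold lists of dictionaries, and the `id` key and the `promote` helper are hypothetical; only the `status` field and its `verified` value come from the flow above.

```python
def promote(pending, catalog, resource_id, verified=False):
    """Move one entry from the pending list to the catalog list.

    Returns new (pending, catalog) lists; the inputs are not modified.
    """
    matches = [e for e in pending if e.get("id") == resource_id]
    if not matches:
        raise KeyError(f"no pending candidate with id {resource_id!r}")
    entry = dict(matches[0])  # copy so the original candidate is untouched
    # Set verified=True only after a person has checked evidence and license.
    if verified:
        entry["status"] = "verified"
    remaining = [e for e in pending if e.get("id") != resource_id]
    return remaining, catalog + [entry]
```

After writing the updated lists back to their JSON files, the validation and generation scripts from step 4 still need to run before committing.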
## Runbook
- Reusable process guide: [resource_cycle_runbook.md](resource_cycle_runbook.md)