# Scripts

Automation scripts for quality checks, resource catalog validation, and search index generation.

## Available scripts
- `validate_normalization.py`: validate the format and rules of the normalization seed TSV.
- `check_links.py`: check that markdown links are well-formed and clickable, with an optional online reachability check.
- `validate_resource_catalog.py`: validate `resources/catalog/resources.json`.
- `generate_resource_views.py`: generate `resources/*/README.md`, `resources/README.md`, and `docs/search/resources.json` from the catalog.
- `sync_resources.py`: collect new candidate Pashto resources from Kaggle, Hugging Face (datasets/models/spaces), GitHub, GitLab, OpenAlex, Crossref, Zenodo, Dataverse, DataCite, arXiv, and Semantic Scholar into `resources/catalog/pending_candidates.json`.
- `promote_candidates.py`: auto-promote valid, non-duplicate entries from `pending_candidates.json` into `resources/catalog/resources.json`.
- `review_existing_resources.py`: review current catalog resources, remove stale or deleted entries only when there is a strong reason, and log each removal in `resources/catalog/removal_log.json`.
- `run_resource_cycle.py`: run the full repeatable resource cycle with one command.

## Usage

Validate normalization seed file:
```bash
python scripts/validate_normalization.py data/processed/normalization_seed_v0.1.tsv
```

Validate resource catalog:
```bash
python scripts/validate_resource_catalog.py
```

Generate markdown and search index from catalog:
```bash
python scripts/generate_resource_views.py
```

Sync candidate resources for maintainer review:
```bash
python scripts/sync_resources.py --limit 20
```

Review existing resources and remove stale entries before discovery:
```bash
python scripts/review_existing_resources.py
```

Run stricter relevance cleanup mode:
```bash
python scripts/review_existing_resources.py --enforce-pashto-relevance
```

Auto-promote valid candidates into verified catalog:
```bash
python scripts/promote_candidates.py
```

Auto-promote while skipping online URL availability checks:
```bash
python scripts/promote_candidates.py --skip-url-check
```
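
To illustrate what "valid, non-duplicate" promotion can mean in practice, here is a minimal sketch. It is not the code in `promote_candidates.py`. The `title` and `url` fields, the `normalize_url` rule, and the function names are all assumptions made for this example, and the real catalog schema may differ.

```python
from __future__ import annotations

from urllib.parse import urlsplit

# Hypothetical required fields; the real catalog schema may differ.
REQUIRED_FIELDS = ("title", "url")


def normalize_url(url: str) -> str:
    """Lowercase scheme and host and drop a trailing slash so near-identical URLs compare equal."""
    parts = urlsplit(url.strip())
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path.rstrip('/')}{query}"


def is_valid(entry: dict) -> bool:
    """Valid here means: every required field is a non-empty string and the URL is http(s)."""
    for field in REQUIRED_FIELDS:
        value = entry.get(field)
        if not isinstance(value, str) or not value.strip():
            return False
    return urlsplit(entry["url"].strip()).scheme.lower() in ("http", "https")


def promote(catalog: list[dict], candidates: list[dict]) -> list[dict]:
    """Return candidates that are valid and not already present in the catalog or earlier in the batch."""
    # Assumes catalog entries were validated already and therefore all carry a "url".
    seen = {normalize_url(entry["url"]) for entry in catalog}
    promoted = []
    for entry in candidates:
        if not is_valid(entry):
            continue
        key = normalize_url(entry["url"])
        if key in seen:
            continue
        seen.add(key)
        promoted.append(entry)
    return promoted
```

Comparing normalized URLs rather than raw strings keeps entries that differ only by letter case or a trailing slash from being promoted twice. This sketch performs no network requests, so it corresponds to running with online URL checks skipped.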

Run full repeatable cycle:
```bash
python scripts/run_resource_cycle.py --limit 25
```

Run discovery only:
```bash
python scripts/run_resource_cycle.py --discover-only --limit 25
```
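
For readers who want to see how the individual scripts could be chained, here is a rough sketch of one plausible cycle. It is not the implementation of `run_resource_cycle.py`. The step order after discovery and the assumption that `--discover-only` runs only `sync_resources.py` are guesses; only "review before discovery" comes from the usage notes in this README. `build_cycle` and `run_cycle` are hypothetical names.

```python
from __future__ import annotations

import subprocess
import sys


def build_cycle(limit: int = 25, discover_only: bool = False) -> list[list[str]]:
    """Build the command list for one cycle (assumed order, see the note above)."""
    py = sys.executable
    discover = [py, "scripts/sync_resources.py", "--limit", str(limit)]
    if discover_only:
        # Assumption: discovery-only mode skips review, promotion, and regeneration.
        return [discover]
    return [
        [py, "scripts/review_existing_resources.py"],  # review before discovery
        discover,
        [py, "scripts/promote_candidates.py"],
        [py, "scripts/validate_resource_catalog.py"],
        [py, "scripts/generate_resource_views.py"],
    ]


def run_cycle(steps: list[list[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it and stop on the first failure."""
    for cmd in steps:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_cycle(build_cycle(limit=25), dry_run=True)
```

With `dry_run=True` the sketch only prints the commands, which makes it safe to run from any directory. Passing `check=True` to `subprocess.run` stops the chain on the first failing step, so a failed validation would prevent views from being regenerated.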

Check markdown link format:
```bash
python scripts/check_links.py
```

Check markdown link format and verify URLs online:
```bash
python scripts/check_links.py --online
```
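
As an illustration of an offline link format check, the sketch below scans markdown text for inline links with empty text or an empty target. It is an assumption about what such a check might cover, not a description of the rules in `check_links.py`. The regular expression handles only simple inline links of the form `[text](target)` and skips images.

```python
from __future__ import annotations

import re

# Inline links only; the negative lookbehind skips images such as ![alt](src).
LINK_RE = re.compile(r"(?<!!)\[([^\]]*)\]\(([^)]*)\)")


def find_bad_links(markdown: str) -> list[tuple[int, str]]:
    """Return (line number, problem) pairs for links with empty text or an empty target."""
    problems = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        for text, target in LINK_RE.findall(line):
            if not text.strip():
                problems.append((lineno, "empty link text"))
            elif not target.strip():
                problems.append((lineno, "empty link target"))
    return problems


if __name__ == "__main__":
    sample = "See [docs](https://example.org).\nBroken [](https://example.org) and [here]()."
    for lineno, problem in find_bad_links(sample):
        print(f"line {lineno}: {problem}")
```

A check like this needs no network access, which is why it pairs naturally with a separate, optional online pass.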