# Resource Automation

This repository uses a semi-automated process to keep Pashto resources current while preserving human review.

## Goals
- Discover new Pashto-relevant resources from trusted public endpoints.
- Keep a machine-readable canonical catalog.
- Keep unreviewed, low-confidence resources out of the verified lists until a human has checked them.

## Covered source types
- Kaggle datasets
- Hugging Face datasets
- Hugging Face models
- Hugging Face Spaces (projects)
- GitHub repositories (projects and code)
- GitLab repositories (projects and code)
- Zenodo records
- Dataverse datasets
- DataCite DOI records
- Research-paper endpoints (arXiv, Semantic Scholar, OpenAlex, Crossref)

## Files involved
- Canonical verified catalog: [../resources/catalog/resources.json](../resources/catalog/resources.json)
- Candidate feed: [../resources/catalog/pending_candidates.json](../resources/catalog/pending_candidates.json)
- Catalog schema: [../resources/schema/resource.schema.json](../resources/schema/resource.schema.json)
- Search export: [search/resources.json](search/resources.json)

## Scripts
- Validate catalog: `python scripts/validate_resource_catalog.py`
- Generate markdown and search index: `python scripts/generate_resource_views.py`
- Sync new candidates: `python scripts/sync_resources.py --limit 20`
- Full run wrapper: `python scripts/run_resource_cycle.py --limit 25`
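
For a manual run, the commands above can also be chained from Python. The following is a minimal sketch, assuming the scripts are run from the repository root and report failure through a non-zero exit code. The helper names `build_commands` and `run_all` are hypothetical and not part of this repository, and the order (sync, validate, generate) is an assumption rather than a description of what `run_resource_cycle.py` does.

```python
import subprocess
import sys


def build_commands(limit=20):
    """Return the documented commands as argument lists.

    The sync -> validate -> generate order is an assumption.
    """
    return [
        [sys.executable, "scripts/sync_resources.py", "--limit", str(limit)],
        [sys.executable, "scripts/validate_resource_catalog.py"],
        [sys.executable, "scripts/generate_resource_views.py"],
    ]


def run_all(commands, runner=subprocess.run):
    """Run each command in turn and stop at the first non-zero exit code."""
    for command in commands:
        result = runner(command)
        if result.returncode != 0:
            return result.returncode
    return 0
```

Calling `run_all(build_commands(limit=20))` from the repository root would run the three steps and return the first failing exit code, or `0` if all succeed. The `runner` parameter exists only so the sequencing can be exercised without launching real processes.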

## GitHub Actions
- CI (`.github/workflows/ci.yml`) enforces:
  - catalog validation
  - generated file consistency
  - markdown link checks
  - tests
- Resource Sync (`.github/workflows/resource_sync.yml`) runs daily and opens a PR with candidate updates.
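
The "generated file consistency" check amounts to regenerating the derived files and comparing them with what is committed. Below is a minimal sketch of that comparison, assuming the generated content is available as a string. `is_up_to_date` is a hypothetical helper, and the actual CI job may implement the check differently (for example, by inspecting `git diff` after regeneration).

```python
from pathlib import Path


def is_up_to_date(path, expected_text):
    """Return True if the file at `path` exists and matches `expected_text` exactly."""
    target = Path(path)
    if not target.is_file():
        return False
    return target.read_text(encoding="utf-8") == expected_text
```

A check built this way fails whenever someone edits the catalog without rerunning the generator, which is the situation the CI step is meant to catch.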

## Review flow
1. Inspect candidate entries in `resources/catalog/pending_candidates.json`.
2. Select useful items and move them into `resources/catalog/resources.json`.
3. Set `status` to `verified` only after checking evidence and license.
4. Run:
   - `python scripts/validate_resource_catalog.py`
   - `python scripts/generate_resource_views.py`
5. Commit the changes and open a PR.
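
Steps 2 and 3 can be sketched as a small helper. This is an illustration only: it assumes each file holds a JSON list of objects with a hypothetical `id` field alongside the `status` field mentioned above. The real layout is whatever `resources/schema/resource.schema.json` defines, and `promote_candidate` is not a script in this repository.

```python
import copy


def promote_candidate(pending, catalog, candidate_id, evidence_checked, license_checked):
    """Move one candidate into the catalog, marking it verified only after review.

    Returns new (pending, catalog) lists; the input lists are left unmodified.
    The `id` field is an assumed identifier, not confirmed by the schema.
    """
    if not (evidence_checked and license_checked):
        raise ValueError("check evidence and license before marking as verified")
    matches = [item for item in pending if item.get("id") == candidate_id]
    if len(matches) != 1:
        raise KeyError(f"expected exactly one candidate with id {candidate_id!r}")
    if any(item.get("id") == candidate_id for item in catalog):
        raise ValueError(f"{candidate_id!r} is already in the catalog")
    entry = copy.deepcopy(matches[0])
    entry["status"] = "verified"
    remaining = [item for item in pending if item.get("id") != candidate_id]
    return remaining, catalog + [entry]
```

The helper refuses to set `status` to `verified` unless both review flags are true, mirroring step 3. The validation and generation scripts from step 4 still need to be run on the result.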

## Runbook
- Reusable process guide: [resource_cycle_runbook.md](resource_cycle_runbook.md)