musaw committed on
Commit 204c5d9 · 1 Parent(s): 7d9f55b

chore(resources): run discovery and promote reviewed Pashto resources for v1.0.1

CHANGELOG.md CHANGED
@@ -3,7 +3,12 @@
3
  All notable changes to this project will be documented in this file.
4
 
5
  The format is inspired by [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)
6
- and this project uses semantic version tags (for example `v1.0.0`, `v1.0.1`).
7
 
8
  ## [Unreleased]
9
  ### Added
@@ -15,6 +20,24 @@ and this project uses semantic version tags (for example `v1.0.0`, `v1.0.1`).
15
  ### Fixed
16
  - None yet.
17
 
18
  ## [v1.0.0] - 2026-02-18
19
  ### Added
20
  - Release notes index under `docs/releases/` with `docs/releases/v1.0.0.md`.
@@ -37,4 +60,3 @@ and this project uses semantic version tags (for example `v1.0.0`, `v1.0.1`).
37
  - Structured `resources/` folder for datasets, models, benchmarks, and tools.
38
  - Link-check script and normalization-validator tests.
39
  - Documentation hub and model/release/process guidance.
40
-
 
3
  All notable changes to this project will be documented in this file.
4
 
5
  The format is inspired by [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)
6
+ and this project uses semantic version tags with a fixed role per figure:
7
+
8
+ - `vMAJOR.CODE.RESOURCE`
9
+ - `MAJOR`: major project milestones.
10
+ - `CODE`: code fixes and implementation updates.
11
+ - `RESOURCE`: resource-catalog updates.
12
 
13
  ## [Unreleased]
14
  ### Added
 
20
  ### Fixed
21
  - None yet.
22
 
23
+ ## [v1.0.1] - 2026-02-18
24
+ ### Added
25
+ - Promoted 6 high-confidence, non-duplicate Hugging Face resources to verified catalog:
26
+ - `ihanif/pashto_speech_2k`
27
+ - `ihanif/pashto_speech_3k`
28
+ - `koochikoo25/Pashto-Concatenated`
29
+ - `koochikoo25/Whisper-medium-pashto`
30
+ - `afaaaak/urdu_pashto_translator`
31
+ - `DrSaqlainHassan/PashtoTokenixer`
32
+
33
+ ### Changed
34
+ - Updated `resources/catalog/resources.json` to version `1.0.1` with `updated_on: 2026-02-18`.
35
+ - Regenerated resource indexes and search payload from the updated catalog.
36
+ - Refreshed pending candidate feed from full discovery sync.
37
+
38
+ ### Fixed
39
+ - Kept only high-confidence Pashto-centric resources in promotion scope for this cycle.
40
+
41
  ## [v1.0.0] - 2026-02-18
42
  ### Added
43
  - Release notes index under `docs/releases/` with `docs/releases/v1.0.0.md`.
 
60
  - Structured `resources/` folder for datasets, models, benchmarks, and tools.
61
  - Link-check script and normalization-validator tests.
62
  - Documentation hub and model/release/process guidance.
 
CITATION.cff CHANGED
@@ -2,7 +2,7 @@ cff-version: 1.2.0
2
  message: "If you use this repository, please cite it."
3
  title: "Pashto Language Resources Hub (Pukhto/Pashto)"
4
  type: software
5
- version: 1.0.0
6
  date-released: 2026-02-18
7
  license: Apache-2.0
8
  repository-code: "https://github.com/Musawer1214/pashto-language-resources"
@@ -19,3 +19,4 @@ keywords:
19
  - NLP
20
  - machine translation
21
  - language resources
 
 
2
  message: "If you use this repository, please cite it."
3
  title: "Pashto Language Resources Hub (Pukhto/Pashto)"
4
  type: software
5
+ version: 1.0.1
6
  date-released: 2026-02-18
7
  license: Apache-2.0
8
  repository-code: "https://github.com/Musawer1214/pashto-language-resources"
 
19
  - NLP
20
  - machine translation
21
  - language resources
22
+
README.md CHANGED
@@ -73,7 +73,7 @@ python -m pytest -q
73
  ## Releases
74
 
75
  - Release notes index: [docs/releases/README.md](docs/releases/README.md)
76
- - Latest release notes: [v1.0.0](docs/releases/v1.0.0.md)
77
  - Changelog: [CHANGELOG.md](CHANGELOG.md)
78
 
79
  ## Contributing
@@ -81,3 +81,4 @@ python -m pytest -q
81
  - Contribution guide: [CONTRIBUTING.md](CONTRIBUTING.md)
82
  - Community communication: [community/COMMUNICATION.md](community/COMMUNICATION.md)
83
  - Resource guidelines: [docs/dataset_guidelines.md](docs/dataset_guidelines.md)
 
 
73
  ## Releases
74
 
75
  - Release notes index: [docs/releases/README.md](docs/releases/README.md)
76
+ - Latest release notes: [v1.0.1](docs/releases/v1.0.1.md)
77
  - Changelog: [CHANGELOG.md](CHANGELOG.md)
78
 
79
  ## Contributing
 
81
  - Contribution guide: [CONTRIBUTING.md](CONTRIBUTING.md)
82
  - Community communication: [community/COMMUNICATION.md](community/COMMUNICATION.md)
83
  - Resource guidelines: [docs/dataset_guidelines.md](docs/dataset_guidelines.md)
84
+
docs/index.md CHANGED
@@ -43,7 +43,7 @@ description: Open-source Pashto (Pukhto/Pashto) datasets, models, benchmarks, AS
43
  ## Releases
44
 
45
  - Release notes index: [releases/README.md](releases/README.md)
46
- - Current release notes: [v1.0.0](releases/v1.0.0.md)
47
  - Changelog: [../CHANGELOG.md](../CHANGELOG.md)
48
 
49
  ## Contribution
@@ -57,3 +57,4 @@ Start here:
57
  ## License
58
 
59
  This project is released under Apache 2.0. See [LICENSE](../LICENSE).
 
 
43
  ## Releases
44
 
45
  - Release notes index: [releases/README.md](releases/README.md)
46
+ - Current release notes: [v1.0.1](releases/v1.0.1.md)
47
  - Changelog: [../CHANGELOG.md](../CHANGELOG.md)
48
 
49
  ## Contribution
 
57
  ## License
58
 
59
  This project is released under Apache 2.0. See [LICENSE](../LICENSE).
60
+
docs/release_process.md CHANGED
@@ -18,8 +18,18 @@
18
 
19
  ## Versioning
20
 
21
- - Use semantic version tags: `vMAJOR.MINOR.PATCH`.
22
- - Example: `v1.0.0`, `v1.0.1`, `v1.1.0`, `v2.0.0`.
23
 
24
  ## Release Notes Location
25
 
 
18
 
19
  ## Versioning
20
 
21
+ Use three-figure tags with fixed meaning:
22
+
23
+ - `vMAJOR.CODE.RESOURCE`
24
+ - `MAJOR`: major milestones and large project-level changes.
25
+ - `CODE`: code fixes, implementation changes, and internal patch updates.
26
+ - `RESOURCE`: resource-catalog updates after candidate discovery and review.
27
+
28
+ Examples:
29
+
30
+ - `v1.0.1`: resource update release.
31
+ - `v1.1.1`: code-fix release.
32
+ - `v2.0.0`: next major milestone.
33
 
34
  ## Release Notes Location
35
 
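A minimal sketch of how the `vMAJOR.CODE.RESOURCE` rule above could be applied in code. The names `ProjectVersion`, `parse_tag`, `format_tag`, and `bump` are hypothetical and do not appear in the repository. The reset behaviour is an assumption inferred from the listed examples: `v1.1.1` suggests a code bump leaves the resource figure untouched, and `v2.0.0` suggests a major bump resets both lower figures.

```python
import re
from typing import NamedTuple


class ProjectVersion(NamedTuple):
    major: int     # major milestones and large project-level changes
    code: int      # code fixes and implementation changes
    resource: int  # resource-catalog updates


# Hypothetical pattern for tags such as v1.0.1.
_TAG_PATTERN = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag: str) -> ProjectVersion:
    """Split a vMAJOR.CODE.RESOURCE tag into its three figures."""
    match = _TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a vMAJOR.CODE.RESOURCE tag: {tag!r}")
    major, code, resource = (int(part) for part in match.groups())
    return ProjectVersion(major, code, resource)


def format_tag(version: ProjectVersion) -> str:
    """Render the three figures back into a tag string."""
    return f"v{version.major}.{version.code}.{version.resource}"


def bump(version: ProjectVersion, kind: str) -> ProjectVersion:
    """Increment one figure; the reset rules are assumed, not documented."""
    if kind == "resource":
        return version._replace(resource=version.resource + 1)
    if kind == "code":
        # Assumption: the resource figure is kept (v1.0.1 -> v1.1.1).
        return version._replace(code=version.code + 1)
    if kind == "major":
        # Assumption: both lower figures reset (-> v2.0.0).
        return ProjectVersion(version.major + 1, 0, 0)
    raise ValueError(f"unknown bump kind: {kind!r}")


if __name__ == "__main__":
    current = parse_tag("v1.0.0")
    print(format_tag(bump(current, "resource")))  # v1.0.1
```

Under these assumptions, chaining `bump(..., "resource")` and then `bump(..., "code")` from `v1.0.0` reproduces the `v1.0.1` and `v1.1.1` examples.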
docs/releases/README.md CHANGED
@@ -4,9 +4,13 @@ This directory stores versioned release notes for the project.
4
 
5
  ## Current
6
 
7
  - [v1.0.0](v1.0.0.md)
8
 
9
  ## Notes
10
 
11
- - `v1.0.0` is the first official stable release tag.
12
  - Earlier bootstrap history (`v0.1`) remains documented in [../../CHANGELOG.md](../../CHANGELOG.md).
 
4
 
5
  ## Current
6
 
7
+ - [v1.0.1](v1.0.1.md)
8
+
9
+ ## Previous
10
+
11
  - [v1.0.0](v1.0.0.md)
12
 
13
  ## Notes
14
 
15
+ - `v1.0.1` is a resource-catalog release (third figure update).
16
  - Earlier bootstrap history (`v0.1`) remains documented in [../../CHANGELOG.md](../../CHANGELOG.md).
docs/releases/v1.0.1.md ADDED
@@ -0,0 +1,36 @@
1
+ # Release Notes: v1.0.1
2
+
3
+ Release date: 2026-02-18
4
+ Tag: `v1.0.1`
5
+
6
+ ## Summary
7
+
8
+ Resource-catalog release with new reviewed Pashto resources promoted from discovery candidates and fully regenerated search/index artifacts.
9
+
10
+ ## Promoted Resources
11
+
12
+ - `ihanif/pashto_speech_2k` (dataset)
13
+ - `ihanif/pashto_speech_3k` (dataset)
14
+ - `koochikoo25/Pashto-Concatenated` (dataset)
15
+ - `koochikoo25/Whisper-medium-pashto` (model)
16
+ - `afaaaak/urdu_pashto_translator` (project)
17
+ - `DrSaqlainHassan/PashtoTokenixer` (project)
18
+
19
+ ## Validation
20
+
21
+ - `python scripts/validate_resource_catalog.py`
22
+ - `python scripts/generate_resource_views.py`
23
+ - `python scripts/check_links.py`
24
+ - `python scripts/validate_normalization.py data/processed/normalization_seed_v0.1.tsv`
25
+ - `python -m pytest -q`
26
+
27
+ ## Versioning Rule Applied
28
+
29
+ This release follows the project versioning rule:
30
+
31
+ - `vMAJOR.CODE.RESOURCE`
32
+ - Resource-only updates increment the third figure.
33
+
34
+ ## Compare
35
+
36
+ - [GitHub compare: v1.0.0...v1.0.1](https://github.com/Musawer1214/pashto-language-resources/compare/v1.0.0...v1.0.1)
docs/search/resources.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
- "generated_on": "2026-02-17T00:00:00Z",
3
- "count": 95,
4
  "resources": [
5
  {
6
  "id": "dataset-common-voice-ps-v24",
@@ -2357,6 +2357,171 @@
2357
  "markers": [
2358
  "pashto"
2359
  ]
2360
  }
2361
  ]
2362
  }
 
1
  {
2
+ "generated_on": "2026-02-18T00:00:00Z",
3
+ "count": 101,
4
  "resources": [
5
  {
6
  "id": "dataset-common-voice-ps-v24",
 
2357
  "markers": [
2358
  "pashto"
2359
  ]
2360
+ },
2361
+ {
2362
+ "id": "dataset-hf-ihanif-pashto-speech-2k",
2363
+ "title": "ihanif/pashto_speech_2k",
2364
+ "url": "https://huggingface.co/datasets/ihanif/pashto_speech_2k",
2365
+ "category": "dataset",
2366
+ "source": "huggingface",
2367
+ "status": "verified",
2368
+ "summary": "Pashto synthetic speech dataset with paired audio-text samples for low-resource ASR baselines.",
2369
+ "primary_use": "ASR training and controlled synthetic-speech evaluation",
2370
+ "tasks": [
2371
+ "asr"
2372
+ ],
2373
+ "tags": [
2374
+ "pashto",
2375
+ "speech",
2376
+ "dataset",
2377
+ "asr",
2378
+ "huggingface"
2379
+ ],
2380
+ "evidence_text": "Dataset metadata includes language:ps and Pashto speech dataset card details.",
2381
+ "evidence_url": "https://huggingface.co/datasets/ihanif/pashto_speech_2k",
2382
+ "markers": [
2383
+ "language:ps",
2384
+ "pashto",
2385
+ "speech"
2386
+ ]
2387
+ },
2388
+ {
2389
+ "id": "dataset-hf-ihanif-pashto-speech-3k",
2390
+ "title": "ihanif/pashto_speech_3k",
2391
+ "url": "https://huggingface.co/datasets/ihanif/pashto_speech_3k",
2392
+ "category": "dataset",
2393
+ "source": "huggingface",
2394
+ "status": "verified",
2395
+ "summary": "Pashto synthetic speech parquet dataset with audio-text pairs and language metadata.",
2396
+ "primary_use": "ASR training and reproducible speech-data experimentation",
2397
+ "tasks": [
2398
+ "asr"
2399
+ ],
2400
+ "tags": [
2401
+ "pashto",
2402
+ "speech",
2403
+ "dataset",
2404
+ "asr",
2405
+ "huggingface",
2406
+ "parquet"
2407
+ ],
2408
+ "evidence_text": "Dataset metadata includes language:ps and task category automatic speech recognition.",
2409
+ "evidence_url": "https://huggingface.co/datasets/ihanif/pashto_speech_3k",
2410
+ "markers": [
2411
+ "language:ps",
2412
+ "automatic-speech-recognition",
2413
+ "pashto"
2414
+ ]
2415
+ },
2416
+ {
2417
+ "id": "dataset-hf-koochikoo25-pashto-concatenated",
2418
+ "title": "koochikoo25/Pashto-Concatenated",
2419
+ "url": "https://huggingface.co/datasets/koochikoo25/Pashto-Concatenated",
2420
+ "category": "dataset",
2421
+ "source": "huggingface",
2422
+ "status": "verified",
2423
+ "summary": "Pashto concatenated audio-text dataset with predefined train-validation-test splits.",
2424
+ "primary_use": "ASR dataset preparation and split-based benchmark experiments",
2425
+ "tasks": [
2426
+ "asr"
2427
+ ],
2428
+ "tags": [
2429
+ "pashto",
2430
+ "speech",
2431
+ "dataset",
2432
+ "asr",
2433
+ "huggingface"
2434
+ ],
2435
+ "evidence_text": "Dataset title explicitly states Pashto and card metadata exposes audio-text features and splits.",
2436
+ "evidence_url": "https://huggingface.co/datasets/koochikoo25/Pashto-Concatenated",
2437
+ "markers": [
2438
+ "Pashto",
2439
+ "audio",
2440
+ "transcription"
2441
+ ]
2442
+ },
2443
+ {
2444
+ "id": "model-hf-koochikoo25-whisper-medium-pashto",
2445
+ "title": "koochikoo25/Whisper-medium-pashto",
2446
+ "url": "https://huggingface.co/koochikoo25/Whisper-medium-pashto",
2447
+ "category": "model",
2448
+ "source": "huggingface",
2449
+ "status": "verified",
2450
+ "summary": "Whisper medium fine-tuned checkpoint for Pashto automatic speech recognition.",
2451
+ "primary_use": "Pashto ASR baseline modeling and transcription comparison",
2452
+ "tasks": [
2453
+ "asr"
2454
+ ],
2455
+ "tags": [
2456
+ "pashto",
2457
+ "asr",
2458
+ "model",
2459
+ "whisper",
2460
+ "huggingface"
2461
+ ],
2462
+ "evidence_text": "Model tags include ps and automatic-speech-recognition with a Pashto model name.",
2463
+ "evidence_url": "https://huggingface.co/koochikoo25/Whisper-medium-pashto",
2464
+ "markers": [
2465
+ "ps",
2466
+ "automatic-speech-recognition",
2467
+ "pashto"
2468
+ ]
2469
+ },
2470
+ {
2471
+ "id": "project-hf-space-afaaaak-urdu-pashto-translator",
2472
+ "title": "afaaaak/urdu_pashto_translator",
2473
+ "url": "https://huggingface.co/spaces/afaaaak/urdu_pashto_translator",
2474
+ "category": "project",
2475
+ "source": "huggingface",
2476
+ "status": "verified",
2477
+ "summary": "Interactive Urdu-to-Pashto translation Space with a runnable web demo.",
2478
+ "primary_use": "Translation demo and bilingual usability testing",
2479
+ "tasks": [
2480
+ "mt",
2481
+ "translation",
2482
+ "demo"
2483
+ ],
2484
+ "tags": [
2485
+ "pashto",
2486
+ "project",
2487
+ "huggingface-space",
2488
+ "translation",
2489
+ "demo"
2490
+ ],
2491
+ "evidence_text": "Space metadata title is Urdu Pashto Translator and the slug includes pashto.",
2492
+ "evidence_url": "https://huggingface.co/spaces/afaaaak/urdu_pashto_translator",
2493
+ "markers": [
2494
+ "Pashto",
2495
+ "translator"
2496
+ ]
2497
+ },
2498
+ {
2499
+ "id": "project-hf-space-drsaqlainhassan-pashto-tokenixer",
2500
+ "title": "DrSaqlainHassan/PashtoTokenixer",
2501
+ "url": "https://huggingface.co/spaces/DrSaqlainHassan/PashtoTokenixer",
2502
+ "category": "project",
2503
+ "source": "huggingface",
2504
+ "status": "verified",
2505
+ "summary": "Pashto parts-of-speech identification Space for interactive NLP exploration.",
2506
+ "primary_use": "Pashto NLP demo for token and part-of-speech analysis",
2507
+ "tasks": [
2508
+ "nlp",
2509
+ "pos-tagging",
2510
+ "demo"
2511
+ ],
2512
+ "tags": [
2513
+ "pashto",
2514
+ "project",
2515
+ "huggingface-space",
2516
+ "nlp",
2517
+ "demo"
2518
+ ],
2519
+ "evidence_text": "Space card title states Pashto Parts of Speech Identifier and the slug contains Pashto.",
2520
+ "evidence_url": "https://huggingface.co/spaces/DrSaqlainHassan/PashtoTokenixer",
2521
+ "markers": [
2522
+ "Pashto",
2523
+ "parts-of-speech"
2524
+ ]
2525
  }
2526
  ]
2527
  }
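The header fields in this payload (`count`, `resources`) have to stay in step with the entries appended above. The sketch below shows one way to check that. It is not the project's `scripts/validate_resource_catalog.py`, whose contents are not shown in this commit; `check_payload` is a hypothetical helper, and it relies only on the `count`, `resources`, and `id` keys visible in the diff.

```python
import json
from collections import Counter


def check_payload(payload: dict) -> list[str]:
    """Return a list of human-readable problems; empty means consistent."""
    problems = []
    resources = payload.get("resources", [])

    # The declared count should equal the number of listed entries.
    declared = payload.get("count")
    if declared != len(resources):
        problems.append(
            f"count is {declared!r} but {len(resources)} resources are listed"
        )

    # Every entry should carry a unique id.
    id_counts = Counter(entry.get("id") for entry in resources)
    for resource_id, seen in sorted(id_counts.items(), key=lambda kv: str(kv[0])):
        if resource_id is None:
            problems.append(f"{seen} resource(s) without an id")
        elif seen > 1:
            problems.append(f"duplicate id: {resource_id}")
    return problems


if __name__ == "__main__":
    # Tiny inline sample shaped like the payload above (not the real file).
    sample = json.loads(
        '{"count": 2, "resources": ['
        '{"id": "dataset-hf-ihanif-pashto-speech-2k"},'
        '{"id": "dataset-hf-ihanif-pashto-speech-3k"}]}'
    )
    print(check_payload(sample))  # []
```

To run it against the real file, one would load `docs/search/resources.json` with `json.load` and pass the result to `check_payload`.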
pyproject.toml CHANGED
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
4
 
5
  [project]
6
  name = "pashto-language-resources"
7
- version = "1.0.0"
8
  description = "Open Pashto language resources for ASR, TTS, NLP, and benchmarks"
9
  requires-python = ">=3.10"
10
  readme = "README.md"
@@ -36,3 +36,4 @@ packages = []
36
 
37
 
38
 
 
 
4
 
5
  [project]
6
  name = "pashto-language-resources"
7
+ version = "1.0.1"
8
  description = "Open Pashto language resources for ASR, TTS, NLP, and benchmarks"
9
  requires-python = ">=3.10"
10
  readme = "README.md"
 
36
 
37
 
38
 
39
+
resources/README.md CHANGED
@@ -3,12 +3,12 @@
3
  Structured, Pashto-focused resource tracking lives in this folder.
4
 
5
  ## Sections
6
- - Datasets (35): [datasets/README.md](datasets/README.md)
7
- - Models (16): [models/README.md](models/README.md)
8
  - Benchmarks (4): [benchmarks/README.md](benchmarks/README.md)
9
  - Tools (0): [tools/README.md](tools/README.md)
10
  - Papers (24): [papers/README.md](papers/README.md)
11
- - Projects (15): [projects/README.md](projects/README.md)
12
  - Code (1): [codes/README.md](codes/README.md)
13
 
14
  ## Machine-Readable Catalog
@@ -22,4 +22,4 @@ Structured, Pashto-focused resource tracking lives in this folder.
22
  - Run `python scripts/validate_resource_catalog.py` before opening a PR.
23
  - Run `python scripts/generate_resource_views.py` after catalog changes.
24
 
25
- Verified resource count: `95`
 
3
  Structured, Pashto-focused resource tracking lives in this folder.
4
 
5
  ## Sections
6
+ - Datasets (38): [datasets/README.md](datasets/README.md)
7
+ - Models (17): [models/README.md](models/README.md)
8
  - Benchmarks (4): [benchmarks/README.md](benchmarks/README.md)
9
  - Tools (0): [tools/README.md](tools/README.md)
10
  - Papers (24): [papers/README.md](papers/README.md)
11
+ - Projects (17): [projects/README.md](projects/README.md)
12
  - Code (1): [codes/README.md](codes/README.md)
13
 
14
  ## Machine-Readable Catalog
 
22
  - Run `python scripts/validate_resource_catalog.py` before opening a PR.
23
  - Run `python scripts/generate_resource_views.py` after catalog changes.
24
 
25
+ Verified resource count: `101`
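As a quick arithmetic cross-check, the per-section counts in this README should sum to the verified total, and the changes should match the six promoted resources. The sketch below hard-codes the figures shown in this diff; the variable names are illustrative only.

```python
# Section counts copied from the old and new versions of resources/README.md.
before = {"Datasets": 35, "Models": 16, "Benchmarks": 4, "Tools": 0,
          "Papers": 24, "Projects": 15, "Code": 1}
after = {"Datasets": 38, "Models": 17, "Benchmarks": 4, "Tools": 0,
         "Papers": 24, "Projects": 17, "Code": 1}

# Sections whose count changed in this commit.
deltas = {name: after[name] - before[name]
          for name in after if after[name] != before[name]}

print(sum(before.values()))  # 95
print(sum(after.values()))   # 101
print(deltas)                # {'Datasets': 3, 'Models': 1, 'Projects': 2}
```

The three deltas add up to 6, which matches the three datasets, one model, and two projects listed in the v1.0.1 release notes.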
resources/catalog/pending_candidates.json CHANGED
@@ -1,5 +1,5 @@
1
  {
2
- "generated_on": "2026-02-18T05:19:47.871594+00:00",
3
  "sources": [
4
  "kaggle-datasets",
5
  "huggingface-datasets",
@@ -15,90 +15,21 @@
15
  "arxiv",
16
  "semantic-scholar"
17
  ],
18
- "candidate_count": 108,
19
  "candidates": [
20
  {
21
- "id": "candidate-s2-pushto-pakhto-nasar-kay-da-matbooa-tarjumo-yova-tanqeedi-mutala-jaiza",
22
- "title": "(Pushto) Pakhto Nasar Kay Da Matbooa Tarjumo Yova Tanqeedi Mutala/Jaiza.",
23
- "url": "https://www.semanticscholar.org/paper/0da0e8535262d1f26f04dd6bc2f091474cab4150",
24
  "category": "paper",
25
  "source": "other",
26
  "status": "candidate",
27
- "summary": "Candidate paper returned from Semantic Scholar search for Pashto.",
28
- "primary_use": "Needs maintainer review before promotion to verified catalog.",
29
- "tasks": [],
30
- "pashto_evidence": {
31
- "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
32
- "evidence_url": "https://www.semanticscholar.org/paper/0da0e8535262d1f26f04dd6bc2f091474cab4150",
33
- "markers": [
34
- "pashto"
35
- ]
36
- },
37
- "tags": [
38
- "pashto",
39
- "candidate",
40
- "paper"
41
- ]
42
- },
43
- {
44
- "id": "candidate-s2-a-dictionary-of-the-pukhto-pushto-or-language-of-the-afghans",
45
- "title": "A Dictionary of the Pukhto, Pushto, or Language of the Afghans",
46
- "url": "https://www.semanticscholar.org/paper/777c0aa56991f55826339915363de2ceb8dd7141",
47
- "category": "paper",
48
- "source": "other",
49
- "status": "candidate",
50
- "summary": "Candidate paper returned from Semantic Scholar search for Pashto.",
51
- "primary_use": "Needs maintainer review before promotion to verified catalog.",
52
- "tasks": [],
53
- "pashto_evidence": {
54
- "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
55
- "evidence_url": "https://www.semanticscholar.org/paper/777c0aa56991f55826339915363de2ceb8dd7141",
56
- "markers": [
57
- "pashto"
58
- ]
59
- },
60
- "tags": [
61
- "pashto",
62
- "candidate",
63
- "paper"
64
- ]
65
- },
66
- {
67
- "id": "candidate-s2-a-dictionary-of-the-pukhto-pushto-or-language-of-the-afghans-with-remarks-on-the",
68
- "title": "A dictionary of the Pukhto, Pushto, or language of the Afghans; with remarks on the originality of the language, and its affinity to the Semitic and other Oriental tongues, etc.",
69
- "url": "https://www.semanticscholar.org/paper/d12502a6c245ff6f537bf68d9db4b449dca827bb",
70
- "category": "paper",
71
- "source": "other",
72
- "status": "candidate",
73
- "summary": "Candidate paper returned from Semantic Scholar search for Pashto.",
74
- "primary_use": "Needs maintainer review before promotion to verified catalog.",
75
- "tasks": [],
76
- "pashto_evidence": {
77
- "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
78
- "evidence_url": "https://www.semanticscholar.org/paper/d12502a6c245ff6f537bf68d9db4b449dca827bb",
79
- "markers": [
80
- "pashto"
81
- ]
82
- },
83
- "tags": [
84
- "pashto",
85
- "candidate",
86
- "paper"
87
- ]
88
- },
89
- {
90
- "id": "candidate-s2-a-grammar-of-the-puk-h-to-or-pus-h-to-language",
91
- "title": "A grammar of the Puk̲h̲to or Pus̲'h̲to language",
92
- "url": "https://www.semanticscholar.org/paper/99c46409a55ac0bf68e2c530a377becfcb46dd47",
93
- "category": "paper",
94
- "source": "other",
95
- "status": "candidate",
96
- "summary": "Candidate paper returned from Semantic Scholar search for Pashto.",
97
  "primary_use": "Needs maintainer review before promotion to verified catalog.",
98
  "tasks": [],
99
  "pashto_evidence": {
100
  "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
101
- "evidence_url": "https://www.semanticscholar.org/paper/99c46409a55ac0bf68e2c530a377becfcb46dd47",
102
  "markers": [
103
  "pashto"
104
  ]
@@ -279,6 +210,99 @@
279
  "farsi"
280
  ]
281
  },
282
  {
283
  "id": "candidate-hf-dataset-arsalagrey-pashto",
284
  "title": "arsalagrey/pashto",
@@ -372,6 +396,53 @@
372
  "datacite"
373
  ]
374
  },
375
  {
376
  "id": "candidate-zenodo-dataset-clitic-particles-and-the-typology-of-2p-languages",
377
  "title": "Clitic Particles and the Typology of 2P Languages",
@@ -420,6 +491,52 @@
420
  "zenodo"
421
  ]
422
  },
423
  {
424
  "id": "candidate-zenodo-paper-depiction-of-women-s-cries-in-pashto-landai-poetry",
425
  "title": "Depiction of Women's Cries in Pashto Landai Poetry",
@@ -684,6 +801,29 @@
684
  "zenodo"
685
  ]
686
  },
687
  {
688
  "id": "candidate-zenodo-paper-evaluation-of-antibacterial-activity-of-zizyphus-jujuba",
689
  "title": "EVALUATION OF ANTIBACTERIAL ACTIVITY OF ZIZYPHUS JUJUBA",
@@ -756,10 +896,33 @@
756
  "zenodo"
757
  ]
758
  },
759
  {
760
  "id": "candidate-datacite-paper-fairness-evaluation-and-inference-level-mitigation-in-llms",
761
  "title": "Fairness Evaluation and Inference Level Mitigation in LLMs",
762
- "url": "https://figshare.mq.edu.au/articles/thesis/Fairness_Evaluation_and_Inference_Level_Mitigation_in_LLMs/31093552/1",
763
  "category": "paper",
764
  "source": "datacite",
765
  "status": "candidate",
@@ -768,7 +931,7 @@
768
  "tasks": [],
769
  "pashto_evidence": {
770
  "evidence_text": "DataCite metadata includes Pashto markers in title or description.",
771
- "evidence_url": "https://figshare.mq.edu.au/articles/thesis/Fairness_Evaluation_and_Inference_Level_Mitigation_in_LLMs/31093552/1",
772
  "markers": [
773
  "pashto"
774
  ]
@@ -834,7 +997,7 @@
834
  {
835
  "id": "candidate-datacite-project-female-birth-control-part-ii-pashto",
836
  "title": "Female Birth Control Part II [Pashto]",
837
- "url": "https://zenodo.org/doi/10.5281/zenodo.18325401",
838
  "category": "project",
839
  "source": "datacite",
840
  "status": "candidate",
@@ -843,7 +1006,7 @@
843
  "tasks": [],
844
  "pashto_evidence": {
845
  "evidence_text": "DataCite metadata includes Pashto markers in title or description.",
846
- "evidence_url": "https://zenodo.org/doi/10.5281/zenodo.18325401",
847
  "markers": [
848
  "pashto"
849
  ]
@@ -856,18 +1019,18 @@
856
  ]
857
  },
858
  {
859
- "id": "candidate-datacite-paper-framing-political-bias-in-multilingual-llms-across-pakistani-languages",
860
- "title": "Framing Political Bias in Multilingual LLMs Across Pakistani Languages",
861
- "url": "https://arxiv.org/abs/2506.00068",
862
  "category": "paper",
863
- "source": "datacite",
864
  "status": "candidate",
865
- "summary": "Large Language Models (LLMs) increasingly shape public discourse, yet most evaluations of political and economic bias have focused on high-resource, Western languages and contexts. This leaves critical blind spots in low-resource, multiling",
866
  "primary_use": "Needs maintainer review before promotion to verified catalog.",
867
  "tasks": [],
868
  "pashto_evidence": {
869
- "evidence_text": "DataCite metadata includes Pashto markers in title or description.",
870
- "evidence_url": "https://arxiv.org/abs/2506.00068",
871
  "markers": [
872
  "pashto"
873
  ]
@@ -875,14 +1038,37 @@
875
  "tags": [
876
  "pashto",
877
  "candidate",
878
- "paper",
879
- "datacite"
880
  ]
881
  },
882
  {
883
- "id": "candidate-datacite-paper-from-scarcity-to-scale-a-release-level-analysis-of-the-pashto-common-voice-datas",
884
- "title": "From Scarcity to Scale: A Release-Level Analysis of the Pashto Common Voice Dataset",
885
- "url": "https://arxiv.org/abs/2602.14062",
 
886
  "category": "paper",
887
  "source": "datacite",
888
  "status": "candidate",
@@ -927,6 +1113,29 @@
927
  "crossref"
928
  ]
929
  },
930
  {
931
  "id": "candidate-gh-project-haroon-blip-khan-pukhtoon",
932
  "title": "Haroon-blip/khan-pukhtoon",
@@ -1002,6 +1211,29 @@
1002
  "space"
1003
  ]
1004
  },
1005
  {
1006
  "id": "candidate-hf-dataset-ihanif-pashto-speech-2k",
1007
  "title": "ihanif/pashto_speech_2k",
@@ -1099,6 +1331,53 @@
1099
  "beautify"
1100
  ]
1101
  },
1102
  {
1103
  "id": "candidate-zenodo-paper-is-the-pushto-a-semitic-language",
1104
  "title": "Is the Pushto a Semitic Language",
@@ -1123,6 +1402,53 @@
1123
  "zenodo"
1124
  ]
1125
  },
1126
  {
1127
  "id": "candidate-openalex-knn-and-ann-based-recognition-of-handwritten-pashto-letters-using-zoning-feature",
1128
  "title": "KNN and ANN-based Recognition of Handwritten Pashto Letters using Zoning Features",
@@ -1170,6 +1496,29 @@
1170
  "dataset"
1171
  ]
1172
  },
1173
  {
1174
  "id": "candidate-zenodo-paper-language-barrier-and-its-effect-on-learning-at-the-public-primary-school-level-i",
1175
  "title": "Language Barrier and its Effect on Learning at the Public Primary School Level in Lahore",
@@ -1194,6 +1543,80 @@
1194
  "zenodo"
1195
  ]
1196
  },
1197
  {
1198
  "id": "candidate-gh-project-lecramyajiv-ttf-x2",
1199
  "title": "lecramyajiv/ttf-x2",
@@ -1374,18 +1797,18 @@
1374
  ]
1375
  },
1376
  {
1377
- "id": "candidate-hf-model-musawer14-pukhto-pashto",
1378
- "title": "Musawer14/pashto-language-resources",
1379
- "url": "https://huggingface.co/Musawer14/pashto-language-resources",
1380
- "category": "model",
1381
- "source": "huggingface",
1382
  "status": "candidate",
1383
- "summary": "Candidate model returned from Hugging Face search for Pashto.",
1384
  "primary_use": "Needs maintainer review before promotion to verified catalog.",
1385
  "tasks": [],
1386
  "pashto_evidence": {
1387
- "evidence_text": "Matched by Pashto keyword in Hugging Face search results.",
1388
- "evidence_url": "https://huggingface.co/Musawer14/pashto-language-resources",
1389
  "markers": [
1390
  "pashto"
1391
  ]
@@ -1393,7 +1816,7 @@
1393
  "tags": [
1394
  "pashto",
1395
  "candidate",
1396
- "model"
1397
  ]
1398
  },
1399
  {
@@ -1450,7 +1873,7 @@
1450
  {
1451
  "id": "candidate-datacite-dataset-navoiy-terra-corpus-v1-0-first-computational-corpus-of-alisher-navoi-works-with-",
1452
  "title": "NAVOIY-TERRA Corpus v1.0: First Computational Corpus of Alisher Navoi Works with Nine-Language Semantic Annotations",
1453
- "url": "https://zenodo.org/doi/10.5281/zenodo.18602634",
1454
  "category": "dataset",
1455
  "source": "datacite",
1456
  "status": "candidate",
@@ -1459,7 +1882,7 @@
1459
  "tasks": [],
1460
  "pashto_evidence": {
1461
  "evidence_text": "DataCite metadata includes Pashto markers in title or description.",
1462
- "evidence_url": "https://zenodo.org/doi/10.5281/zenodo.18602634",
1463
  "markers": [
1464
  "pashto"
1465
  ]
@@ -1496,18 +1919,18 @@
1496
  ]
1497
  },
1498
  {
1499
- "id": "candidate-s2-negotiating-pakhto-proverbs-islam-and-the-construction-of-identity-among-pashtun",
1500
- "title": "Negotiating Pakhto: Proverbs, Islam and the Construction of Identity among Pashtuns",
1501
- "url": "https://www.semanticscholar.org/paper/8a503f164e0c1f5be13866dad00539c7e5b1cabc",
1502
- "category": "paper",
1503
- "source": "other",
1504
  "status": "candidate",
1505
- "summary": "Candidate paper returned from Semantic Scholar search for Pashto.",
1506
  "primary_use": "Needs maintainer review before promotion to verified catalog.",
1507
  "tasks": [],
1508
  "pashto_evidence": {
1509
- "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
1510
- "evidence_url": "https://www.semanticscholar.org/paper/8a503f164e0c1f5be13866dad00539c7e5b1cabc",
1511
  "markers": [
1512
  "pashto"
1513
  ]
@@ -1515,7 +1938,8 @@
1515
  "tags": [
1516
  "pashto",
1517
  "candidate",
1518
- "paper"
 
1519
  ]
1520
  },
1521
  {
@@ -1574,7 +1998,7 @@
1574
  {
1575
  "id": "candidate-datacite-paper-only-2-of-141-global-languages-employ-a-labial-for-tongue-in-1st-position-challe",
1576
  "title": "Only 2 of 141 Global Languages Employ a Labial for \"Tongue\" in 1st position Challenging Saussure's Arbitrariness With Near Universal Embodied Iconicity for Tongue Vs Mouth in \"inverse\" Control",
1577
- "url": "https://zenodo.org/doi/10.5281/zenodo.17807676",
1578
  "category": "paper",
1579
  "source": "datacite",
1580
  "status": "candidate",
@@ -1583,7 +2007,7 @@
1583
  "tasks": [],
1584
  "pashto_evidence": {
1585
  "evidence_text": "DataCite metadata includes Pashto markers in title or description.",
1586
- "evidence_url": "https://zenodo.org/doi/10.5281/zenodo.17807676",
1587
  "markers": [
1588
  "pashto"
1589
  ]
@@ -1692,18 +2116,18 @@
1692
  ]
1693
  },
1694
  {
1695
- "id": "candidate-s2-pashto-pashto-english-english-pashto-dictionary-phrasebook",
1696
- "title": "Pashto : Pashto-English, English-Pashto dictionary & phrasebook",
1697
- "url": "https://www.semanticscholar.org/paper/8ff77d35396d17225d97772e577e472a2ab1c47a",
1698
  "category": "paper",
1699
- "source": "other",
1700
  "status": "candidate",
1701
- "summary": "Candidate paper returned from Semantic Scholar search for Pashto.",
1702
  "primary_use": "Needs maintainer review before promotion to verified catalog.",
1703
  "tasks": [],
1704
  "pashto_evidence": {
1705
- "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
1706
- "evidence_url": "https://www.semanticscholar.org/paper/8ff77d35396d17225d97772e577e472a2ab1c47a",
1707
  "markers": [
1708
  "pashto"
1709
  ]
@@ -1711,15 +2135,40 @@
1711
  "tags": [
1712
  "pashto",
1713
  "candidate",
1714
- "paper"
 
1715
  ]
1716
  },
1717
  {
1718
- "id": "candidate-kaggle-dataset-abdulbasitkh-pashto-isolated-alphabets-and-numerals",
1719
- "title": "Pashto Isolated Alphabets and Numerals",
1720
- "url": "https://www.kaggle.com/datasets/abdulbasitkh/pashto-isolated-alphabetss-and-numerals",
1721
- "category": "dataset",
1722
- "source": "kaggle",
 
1723
  "status": "candidate",
1724
  "summary": "Pashto Islated Alphabets and Numerals Handwritten and Printed",
1725
  "primary_use": "Needs maintainer review before promotion to verified catalog.",
@@ -1786,29 +2235,6 @@
1786
  "kaggle"
1787
  ]
1788
  },
1789
- {
1790
- "id": "candidate-s2-pashto-poetry-and-militancy-in-khyber-pakhtunkhwa-after-9-11-thematic-analysis-o",
1791
- "title": "PASHTO POETRY AND MILITANCY IN KHYBER PAKHTUNKHWA AFTER 9/11: THEMATIC ANALYSIS OF PASHTO POETRY IN RESISTING MILITANCY",
1792
- "url": "https://www.semanticscholar.org/paper/e81d4e7ac6cd7519643bf5d5c0bdfd9be554a8f2",
1793
- "category": "paper",
1794
- "source": "other",
1795
- "status": "candidate",
1796
- "summary": "The present study sheds light on Pashto or Pakhto Poetry and Militancy in Khyber Pakhtunkhwa after 9/11. The fieldwork for this study was conducted in the Peshawar district of Khyber Pakhtunkhwa, Pakistan, from December 2020 to April 2021.",
1797
- "primary_use": "Needs maintainer review before promotion to verified catalog.",
1798
- "tasks": [],
1799
- "pashto_evidence": {
1800
- "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
1801
- "evidence_url": "https://www.semanticscholar.org/paper/e81d4e7ac6cd7519643bf5d5c0bdfd9be554a8f2",
1802
- "markers": [
1803
- "pashto"
1804
- ]
1805
- },
1806
- "tags": [
1807
- "pashto",
1808
- "candidate",
1809
- "paper"
1810
- ]
1811
- },
1812
  {
1813
  "id": "candidate-crossref-pashto-tappa",
1814
  "title": "Pashto Tappa",
@@ -1929,29 +2355,6 @@
1929
  "kaggle"
1930
  ]
1931
  },
1932
- {
1933
- "id": "candidate-s2-persian-loanwords-and-calques-in-pashto",
1934
- "title": "Persian loanwords and calques in Pashto",
1935
- "url": "https://www.semanticscholar.org/paper/ed232f1c2abd6e6f8a49f04de8ac76bf922521ea",
1936
- "category": "paper",
1937
- "source": "other",
1938
- "status": "candidate",
1939
- "summary": "Candidate paper returned from Semantic Scholar search for Pashto.",
1940
- "primary_use": "Needs maintainer review before promotion to verified catalog.",
1941
- "tasks": [],
1942
- "pashto_evidence": {
1943
- "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
1944
- "evidence_url": "https://www.semanticscholar.org/paper/ed232f1c2abd6e6f8a49f04de8ac76bf922521ea",
1945
- "markers": [
1946
- "pashto"
1947
- ]
1948
- },
1949
- "tags": [
1950
- "pashto",
1951
- "candidate",
1952
- "paper"
1953
- ]
1954
- },
1955
  {
1956
  "id": "candidate-openalex-persian-urdu-and-pashto-a-comparative-orthographic-analysis",
1957
  "title": "Persian, Urdu, and Pashto: A comparative orthographic analysis",
@@ -2000,6 +2403,30 @@
2000
  "zenodo"
2001
  ]
2002
  },
2003
  {
2004
  "id": "candidate-arxiv-psocr-benchmarking-large-multimodal-models-for-optical-character-recognition-in-",
2005
  "title": "PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language",
@@ -2047,6 +2474,82 @@
2047
  "github"
2048
  ]
2049
  },
2050
  {
2051
  "id": "candidate-dataverse-dataset-rats-language-identification",
2052
  "title": "RATS Language Identification",
@@ -2143,6 +2646,53 @@
2143
  "dataverse"
2144
  ]
2145
  },
2146
  {
2147
  "id": "candidate-openalex-separating-phonology-from-syntax-a-reanalysis-of-pashto-cliticization",
2148
  "title": "Separating phonology from syntax: a reanalysis of Pashto cliticization",
@@ -2167,6 +2717,33 @@
2167
  "openalex"
2168
  ]
2169
  },
2170
  {
2171
  "id": "candidate-gh-project-shawanonymouse-pakhtoon",
2172
  "title": "ShawAnonymouse/Pakhtoon",
@@ -2269,7 +2846,7 @@
2269
  {
2270
  "id": "candidate-crossref-summaries-in-pashto",
2271
  "title": "Summaries in Pashto",
2272
- "url": "https://doi.org/10.1097/01.wtf.0000416393.66575.49",
2273
  "category": "paper",
2274
  "source": "crossref",
2275
  "status": "candidate",
@@ -2278,7 +2855,7 @@
2278
  "tasks": [],
2279
  "pashto_evidence": {
2280
  "evidence_text": "Matched by explicit Pashto marker in title from Crossref search.",
2281
- "evidence_url": "https://doi.org/10.1097/01.wtf.0000416393.66575.49",
2282
  "markers": [
2283
  "pashto"
2284
  ]
@@ -2290,6 +2867,52 @@
2290
  "crossref"
2291
  ]
2292
  },
2293
  {
2294
  "id": "candidate-hf-project-tasal9-pashto-base-bloom-space",
2295
  "title": "tasal9/pashto-base-bloom-space",
@@ -2362,6 +2985,29 @@
2362
  "openalex"
2363
  ]
2364
  },
2365
  {
2366
  "id": "candidate-openalex-the-grammar-of-clitics-evidence-from-pashto-and-other-languages",
2367
  "title": "The grammar of clitics : evidence from Pashto and other languages",
@@ -2459,18 +3105,18 @@
2459
  ]
2460
  },
2461
  {
2462
- "id": "candidate-s2-the-social-structure-and-organization-of-a-pakhto-speaking-community-in-afghanis",
2463
- "title": "The Social Structure and Organization of A Pakhto Speaking Community in Afghanistan.",
2464
- "url": "https://www.semanticscholar.org/paper/306e9a04b8835de6e906303b5e27d43a6994cb1d",
2465
  "category": "paper",
2466
- "source": "other",
2467
  "status": "candidate",
2468
- "summary": "Candidate paper returned from Semantic Scholar search for Pashto.",
2469
  "primary_use": "Needs maintainer review before promotion to verified catalog.",
2470
  "tasks": [],
2471
  "pashto_evidence": {
2472
- "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
2473
- "evidence_url": "https://www.semanticscholar.org/paper/306e9a04b8835de6e906303b5e27d43a6994cb1d",
2474
  "markers": [
2475
  "pashto"
2476
  ]
@@ -2478,7 +3124,8 @@
2478
  "tags": [
2479
  "pashto",
2480
  "candidate",
2481
- "paper"
2482
  ]
2483
  },
2484
  {
@@ -2601,6 +3248,52 @@
2601
  "dataverse"
2602
  ]
2603
  },
2604
  {
2605
  "id": "candidate-zenodo-paper-resource",
2606
  "title": "بلوچستان میں \" فقہ اسلامی \" کے فروغ و ارتقا٫ کا تحقیقی جائزہ",
@@ -2627,5 +3320,3 @@
2627
  }
2628
  ]
2629
  }
2630
-
2631
-
 
1
  {
2
+ "generated_on": "2026-02-18T11:04:12.305454+00:00",
3
  "sources": [
4
  "kaggle-datasets",
5
  "huggingface-datasets",
 
15
  "arxiv",
16
  "semantic-scholar"
17
  ],
18
+ "candidate_count": 137,
19
  "candidates": [
20
  {
21
+ "id": "candidate-s2-a-comparison-of-pashto-and-turkmen-languages-vowel",
22
+ "title": "A Comparison of Pashto and Turkmen Languages Vowel",
23
+ "url": "https://www.semanticscholar.org/paper/9acff7425bcd0d6fe8ebca8945dd0f450cbd1ebb",
24
  "category": "paper",
25
  "source": "other",
26
  "status": "candidate",
27
+ "summary": "Vowel sounds, which do not obstruct the vocal tract when articulated, are fundamental to language's movement, sound, and rhythm. They form the core of syllables, and the interplay between these syllables and sounds constructs words, phrases",
28
  "primary_use": "Needs maintainer review before promotion to verified catalog.",
29
  "tasks": [],
30
  "pashto_evidence": {
31
  "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
32
+ "evidence_url": "https://www.semanticscholar.org/paper/9acff7425bcd0d6fe8ebca8945dd0f450cbd1ebb",
33
  "markers": [
34
  "pashto"
35
  ]
 
210
  "farsi"
211
  ]
212
  },
213
+ {
214
+ "id": "candidate-s2-an-acoustic-analysis-of-consonants-of-khattak-dialect-of-pashto",
215
+ "title": "An Acoustic Analysis of consonants of Khattak Dialect of Pashto",
216
+ "url": "https://www.semanticscholar.org/paper/ed06d206e60a62c2bebdd487b4f8dea253a9a0a8",
217
+ "category": "paper",
218
+ "source": "other",
219
+ "status": "candidate",
220
+ "summary": "Pashto, an ancient language written in Perso-Arabic script, is predominantly spoken in Pakistan's Khyber Pakhtunkhwa Province and Afghanistan. Despite its wide usage, more research is needed on the consonantal sounds of the Khattak dialect.",
221
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
222
+ "tasks": [],
223
+ "pashto_evidence": {
224
+ "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
225
+ "evidence_url": "https://www.semanticscholar.org/paper/ed06d206e60a62c2bebdd487b4f8dea253a9a0a8",
226
+ "markers": [
227
+ "pashto"
228
+ ]
229
+ },
230
+ "tags": [
231
+ "pashto",
232
+ "candidate",
233
+ "paper"
234
+ ]
235
+ },
236
+ {
237
+ "id": "candidate-zenodo-paper-an-analysis-of-freudian-concept-of-mourning-in-pashto-tappas-on-the-theme-of-mig",
238
+ "title": "AN ANALYSIS OF FREUDIAN CONCEPT OF MOURNING IN PASHTO TAPPAS ON THE THEME OF MIGRATION",
239
+ "url": "https://zenodo.org/records/11124039",
240
+ "category": "paper",
241
+ "source": "zenodo",
242
+ "status": "candidate",
243
+ "summary": "Folk literature of any nation is its collective asset and is the preserver of its social history and culture. The most important genre of Pashto folk poetry is tappa. Tappa is composed of a couplet. It covers all aspects of the Pashtuns’ wa",
244
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
245
+ "tasks": [],
246
+ "pashto_evidence": {
247
+ "evidence_text": "Zenodo metadata includes Pashto markers in title or description.",
248
+ "evidence_url": "https://zenodo.org/records/11124039",
249
+ "markers": [
250
+ "pashto"
251
+ ]
252
+ },
253
+ "tags": [
254
+ "pashto",
255
+ "candidate",
256
+ "paper",
257
+ "zenodo"
258
+ ]
259
+ },
260
+ {
261
+ "id": "candidate-s2-analysing-deep-meaning-of-proverbs-in-pashto-language",
262
+ "title": "Analysing Deep Meaning of Proverbs in Pashto Language",
263
+ "url": "https://www.semanticscholar.org/paper/1a804a9701c5103ed38df3350da61abdf5df2b57",
264
+ "category": "paper",
265
+ "source": "other",
266
+ "status": "candidate",
267
+ "summary": "As other ancient languages of the world, Pashto is one of them having  rich folkloric literature. One of the most important part of this literature is proverbs, which makes a special part of history of this language. These proverbs shows  d",
268
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
269
+ "tasks": [],
270
+ "pashto_evidence": {
271
+ "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
272
+ "evidence_url": "https://www.semanticscholar.org/paper/1a804a9701c5103ed38df3350da61abdf5df2b57",
273
+ "markers": [
274
+ "pashto"
275
+ ]
276
+ },
277
+ "tags": [
278
+ "pashto",
279
+ "candidate",
280
+ "paper"
281
+ ]
282
+ },
283
+ {
284
+ "id": "candidate-s2-animals-as-metaphors-an-analysis-of-gender-representation-in-pashto-proverbs",
285
+ "title": "ANIMALS AS METAPHORS: AN ANALYSIS OF GENDER REPRESENTATION IN PASHTO PROVERBS",
286
+ "url": "https://www.semanticscholar.org/paper/8cc1765aa09f175f567d8fd607953638391f89da",
287
+ "category": "paper",
288
+ "source": "other",
289
+ "status": "candidate",
290
+ "summary": "Pashto proverbs are a significant genre in the Pashto folklore and communication. The proverbs play an important role in framing the social cognition of the Pashtun community. Few past studies have been carried out on the various social asp",
291
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
292
+ "tasks": [],
293
+ "pashto_evidence": {
294
+ "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
295
+ "evidence_url": "https://www.semanticscholar.org/paper/8cc1765aa09f175f567d8fd607953638391f89da",
296
+ "markers": [
297
+ "pashto"
298
+ ]
299
+ },
300
+ "tags": [
301
+ "pashto",
302
+ "candidate",
303
+ "paper"
304
+ ]
305
+ },
306
  {
307
  "id": "candidate-hf-dataset-arsalagrey-pashto",
308
  "title": "arsalagrey/pashto",
 
396
  "datacite"
397
  ]
398
  },
399
+ {
400
+ "id": "candidate-zenodo-paper-challenging-gender-roles-a-feminist-analysis-of-ghani-khan-s-the-pathans",
401
+ "title": "CHALLENGING GENDER ROLES: A FEMINIST ANALYSIS OF GHANI KHAN'S THE PATHANS",
402
+ "url": "https://zenodo.org/records/11216862",
403
+ "category": "paper",
404
+ "source": "zenodo",
405
+ "status": "candidate",
406
+ "summary": "The present research aims to analyse the representation of gender dynamics in Ghani Khan’s The Pathans who is also known as Lewanai Phalsafi (The Lunatic Philosopher), is a towering literary figure in Pashto literature. He is commonly known",
407
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
408
+ "tasks": [],
409
+ "pashto_evidence": {
410
+ "evidence_text": "Zenodo metadata includes Pashto markers in title or description.",
411
+ "evidence_url": "https://zenodo.org/records/11216862",
412
+ "markers": [
413
+ "pashto"
414
+ ]
415
+ },
416
+ "tags": [
417
+ "pashto",
418
+ "candidate",
419
+ "paper",
420
+ "zenodo"
421
+ ]
422
+ },
423
+ {
424
+ "id": "candidate-s2-cinematic-misnomers-examining-the-effects-of-pashto-movie-titles-on-the-percepti",
425
+ "title": "Cinematic Misnomers: Examining the Effects of Pashto Movie Titles on the Perception of Pashtun Identity",
426
+ "url": "https://www.semanticscholar.org/paper/1b4c38ce4ceb6ac7846062bb589351cc88a36617",
427
+ "category": "paper",
428
+ "source": "other",
429
+ "status": "candidate",
430
+ "summary": "The current research is a critical study of the impacts of inappropriate and misleading titles of Pashtu movies on the perception of Pashtun identity. Because most of the titles are abusive and immoral in nature and do not conform to the st",
431
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
432
+ "tasks": [],
433
+ "pashto_evidence": {
434
+ "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
435
+ "evidence_url": "https://www.semanticscholar.org/paper/1b4c38ce4ceb6ac7846062bb589351cc88a36617",
436
+ "markers": [
437
+ "pashto"
438
+ ]
439
+ },
440
+ "tags": [
441
+ "pashto",
442
+ "candidate",
443
+ "paper"
444
+ ]
445
+ },
446
  {
447
  "id": "candidate-zenodo-dataset-clitic-particles-and-the-typology-of-2p-languages",
448
  "title": "Clitic Particles and the Typology of 2P Languages",
 
491
  "zenodo"
492
  ]
493
  },
494
+ {
495
+ "id": "candidate-s2-cross-linguistic-influence-pashto-speakers-learning-english-phonology-grammar-an",
496
+ "title": "Cross-Linguistic Influence: Pashto Speakers Learning English Phonology, Grammar, and Vocabulary",
497
+ "url": "https://www.semanticscholar.org/paper/b36e77a2c2609641b3c601a811b6abaa62a323ce",
498
+ "category": "paper",
499
+ "source": "other",
500
+ "status": "candidate",
501
+ "summary": "This research investigates the influence of Pashto as a mother tongue on the acquisition of English as a second language among Pashto-speaking learners. A wide-scale investigation of several dimensions across phonology grammar and lexicon d",
502
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
503
+ "tasks": [],
504
+ "pashto_evidence": {
505
+ "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
506
+ "evidence_url": "https://www.semanticscholar.org/paper/b36e77a2c2609641b3c601a811b6abaa62a323ce",
507
+ "markers": [
508
+ "pashto"
509
+ ]
510
+ },
511
+ "tags": [
512
+ "pashto",
513
+ "candidate",
514
+ "paper"
515
+ ]
516
+ },
517
+ {
518
+ "id": "candidate-s2-deictic-field-time-of-action-in-the-semantics-of-the-pashto-language-the-time-fi",
519
+ "title": "DEICTIC FIELD “TIME OF ACTION” IN THE SEMANTICS OF THE PASHTO LANGUAGE, THE “TIME” FIELD: BACKGROUND OF THE PROBLEM",
520
+ "url": "https://www.semanticscholar.org/paper/3358d828c2ff07a45d614fd1d81cf44d5c55cad8",
521
+ "category": "paper",
522
+ "source": "other",
523
+ "status": "candidate",
524
+ "summary": "The article examines the semantic modeling of the category of time in language through the lens of deictic field theory, with a focus on Pashto adverbs. It outlines four major approaches to modeling semantic fields - phenomenological, lexic",
525
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
526
+ "tasks": [],
527
+ "pashto_evidence": {
528
+ "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
529
+ "evidence_url": "https://www.semanticscholar.org/paper/3358d828c2ff07a45d614fd1d81cf44d5c55cad8",
530
+ "markers": [
531
+ "pashto"
532
+ ]
533
+ },
534
+ "tags": [
535
+ "pashto",
536
+ "candidate",
537
+ "paper"
538
+ ]
539
+ },
540
  {
541
  "id": "candidate-zenodo-paper-depiction-of-women-s-cries-in-pashto-landai-poetry",
542
  "title": "Depiction of Women's Cries in Pashto Landai Poetry",
 
801
  "zenodo"
802
  ]
803
  },
804
+ {
805
+ "id": "candidate-s2-essential-skills-for-a-lexicographer-based-on-pashto-lexicography",
806
+ "title": "Essential Skills for a Lexicographer: Based on Pashto Lexicography",
807
+ "url": "https://www.semanticscholar.org/paper/8fc45aa567cb78713e2fef41d5e748e8ee1d8470",
808
+ "category": "paper",
809
+ "source": "other",
810
+ "status": "candidate",
811
+ "summary": "How Pashto dictionaries meet rules of modern lexicography? Lexicography is a division of linguistic working on recording and developing data of languages. Pashto is one of the languages which do not have many resources in lexicography. Most",
812
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
813
+ "tasks": [],
814
+ "pashto_evidence": {
815
+ "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
816
+ "evidence_url": "https://www.semanticscholar.org/paper/8fc45aa567cb78713e2fef41d5e748e8ee1d8470",
817
+ "markers": [
818
+ "pashto"
819
+ ]
820
+ },
821
+ "tags": [
822
+ "pashto",
823
+ "candidate",
824
+ "paper"
825
+ ]
826
+ },
827
  {
828
  "id": "candidate-zenodo-paper-evaluation-of-antibacterial-activity-of-zizyphus-jujuba",
829
  "title": "EVALUATION OF ANTIBACTERIAL ACTIVITY OF ZIZYPHUS JUJUBA",
 
896
  "zenodo"
897
  ]
898
  },
899
+ {
900
+ "id": "candidate-s2-exploring-the-impacts-of-emotion-through-language-learning-on-pashto-speakers-yo",
901
+ "title": "Exploring the Impacts of Emotion through Language Learning on Pashto Speakers Young Adulthood in District Peshawar",
902
+ "url": "https://www.semanticscholar.org/paper/4549649112553aabccfac8b918c7e98cdbdd0f09",
903
+ "category": "paper",
904
+ "source": "other",
905
+ "status": "candidate",
906
+ "summary": "The current study explores the emotional experiences of Pashto speakers learning a second language, with a focus on how emotions are expressed, understood, and influenced by cultural and linguistic factors. While language learning is often",
907
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
908
+ "tasks": [],
909
+ "pashto_evidence": {
910
+ "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
911
+ "evidence_url": "https://www.semanticscholar.org/paper/4549649112553aabccfac8b918c7e98cdbdd0f09",
912
+ "markers": [
913
+ "pashto"
914
+ ]
915
+ },
916
+ "tags": [
917
+ "pashto",
918
+ "candidate",
919
+ "paper"
920
+ ]
921
+ },
922
  {
923
  "id": "candidate-datacite-paper-fairness-evaluation-and-inference-level-mitigation-in-llms",
924
  "title": "Fairness Evaluation and Inference Level Mitigation in LLMs",
925
+ "url": "https://figshare.mq.edu.au/articles/thesis/Fairness_Evaluation_and_Inference_Level_Mitigation_in_LLMs/31093552",
926
  "category": "paper",
927
  "source": "datacite",
928
  "status": "candidate",
 
931
  "tasks": [],
932
  "pashto_evidence": {
933
  "evidence_text": "DataCite metadata includes Pashto markers in title or description.",
934
+ "evidence_url": "https://figshare.mq.edu.au/articles/thesis/Fairness_Evaluation_and_Inference_Level_Mitigation_in_LLMs/31093552",
935
  "markers": [
936
  "pashto"
937
  ]
 
997
  {
998
  "id": "candidate-datacite-project-female-birth-control-part-ii-pashto",
999
  "title": "Female Birth Control Part II [Pashto]",
1000
+ "url": "https://zenodo.org/doi/10.5281/zenodo.18325402",
1001
  "category": "project",
1002
  "source": "datacite",
1003
  "status": "candidate",
 
1006
  "tasks": [],
1007
  "pashto_evidence": {
1008
  "evidence_text": "DataCite metadata includes Pashto markers in title or description.",
1009
+ "evidence_url": "https://zenodo.org/doi/10.5281/zenodo.18325402",
1010
  "markers": [
1011
  "pashto"
1012
  ]
 
1019
  ]
1020
  },
1021
  {
1022
+ "id": "candidate-s2-fragments-of-life-in-death-world-an-analysis-of-pashto-poetry-as-a-non-violent-r",
1023
+ "title": "Fragments of life in ‘death world’: an analysis of Pashto poetry as a non-violent resistance to necropolitics",
1024
+ "url": "https://www.semanticscholar.org/paper/9726f372b07f677fad23e2ee27a7f50f985e8ed8",
1025
  "category": "paper",
1026
+ "source": "other",
1027
  "status": "candidate",
1028
+ "summary": "Candidate paper returned from Semantic Scholar search for Pashto.",
1029
  "primary_use": "Needs maintainer review before promotion to verified catalog.",
1030
  "tasks": [],
1031
  "pashto_evidence": {
1032
+ "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
1033
+ "evidence_url": "https://www.semanticscholar.org/paper/9726f372b07f677fad23e2ee27a7f50f985e8ed8",
1034
  "markers": [
1035
  "pashto"
1036
  ]
 
1038
  "tags": [
1039
  "pashto",
1040
  "candidate",
1041
+ "paper"
1042
  ]
1043
  },
1044
  {
1045
+ "id": "candidate-datacite-paper-framing-political-bias-in-multilingual-llms-across-pakistani-languages",
1046
+ "title": "Framing Political Bias in Multilingual LLMs Across Pakistani Languages",
1047
+ "url": "https://arxiv.org/abs/2506.00068",
1048
+ "category": "paper",
1049
+ "source": "datacite",
1050
+ "status": "candidate",
1051
+ "summary": "Large Language Models (LLMs) increasingly shape public discourse, yet most evaluations of political and economic bias have focused on high-resource, Western languages and contexts. This leaves critical blind spots in low-resource, multiling",
1052
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
1053
+ "tasks": [],
1054
+ "pashto_evidence": {
1055
+ "evidence_text": "DataCite metadata includes Pashto markers in title or description.",
1056
+ "evidence_url": "https://arxiv.org/abs/2506.00068",
1057
+ "markers": [
1058
+ "pashto"
1059
+ ]
1060
+ },
1061
+ "tags": [
1062
+ "pashto",
1063
+ "candidate",
1064
+ "paper",
1065
+ "datacite"
1066
+ ]
1067
+ },
1068
+ {
1069
+ "id": "candidate-datacite-paper-from-scarcity-to-scale-a-release-level-analysis-of-the-pashto-common-voice-datas",
1070
+ "title": "From Scarcity to Scale: A Release-Level Analysis of the Pashto Common Voice Dataset",
1071
+ "url": "https://arxiv.org/abs/2602.14062",
1072
  "category": "paper",
1073
  "source": "datacite",
1074
  "status": "candidate",
 
1113
  "crossref"
1114
  ]
1115
  },
1116
+ {
1117
+ "id": "candidate-s2-gemination-in-pashto",
1118
+ "title": "Gemination in Pashto",
1119
+ "url": "https://www.semanticscholar.org/paper/ccf72dc1bcd0a0cd3a4b97cc7fe1830c37922c64",
1120
+ "category": "paper",
1121
+ "source": "other",
1122
+ "status": "candidate",
1123
+ "summary": "The purpose of the present study was to analyze gemination in Pashto. For this purpose, first, data was collected generally from elder native speakers who speak the Yousafzai dialect. The collected data then was verified and discussed sever",
1124
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
1125
+ "tasks": [],
1126
+ "pashto_evidence": {
1127
+ "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
1128
+ "evidence_url": "https://www.semanticscholar.org/paper/ccf72dc1bcd0a0cd3a4b97cc7fe1830c37922c64",
1129
+ "markers": [
1130
+ "pashto"
1131
+ ]
1132
+ },
1133
+ "tags": [
1134
+ "pashto",
1135
+ "candidate",
1136
+ "paper"
1137
+ ]
1138
+ },
1139
  {
1140
  "id": "candidate-gh-project-haroon-blip-khan-pukhtoon",
1141
  "title": "Haroon-blip/khan-pukhtoon",
 
1211
  "space"
1212
  ]
1213
  },
1214
+ {
1215
+ "id": "candidate-s2-identifying-cultural-and-semantic-translation-errors-in-pashto-english-proverbs-",
1216
+ "title": "Identifying Cultural and Semantic Translation Errors in Pashto–English Proverbs Translation: A Comparative Study of ChatGPT, Gemini, and Google Translations",
1217
+ "url": "https://www.semanticscholar.org/paper/dfd31f726fb5b8be2c457d4b73f904196deae0a3",
1218
+ "category": "paper",
1219
+ "source": "other",
1220
+ "status": "candidate",
1221
+ "summary": "Machine Translation (MT) has advanced rapidly with the emergence of neural and AI- powered systems, yet translating culturally embedded figurative language particularly proverbs continue to pose significant challenges, especially in low-res",
1222
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
1223
+ "tasks": [],
1224
+ "pashto_evidence": {
1225
+ "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
1226
+ "evidence_url": "https://www.semanticscholar.org/paper/dfd31f726fb5b8be2c457d4b73f904196deae0a3",
1227
+ "markers": [
1228
+ "pashto"
1229
+ ]
1230
+ },
1231
+ "tags": [
1232
+ "pashto",
1233
+ "candidate",
1234
+ "paper"
1235
+ ]
1236
+ },
1237
  {
1238
  "id": "candidate-hf-dataset-ihanif-pashto-speech-2k",
1239
  "title": "ihanif/pashto_speech_2k",
 
1331
  "beautify"
1332
  ]
1333
  },
1334
+ {
1335
+ "id": "candidate-s2-introduction-to-pashto-word-s-characteristics",
1336
+ "title": "Introduction to Pashto Word’s Characteristics",
1337
+ "url": "https://www.semanticscholar.org/paper/6eb3febbb368a7eaccc6290bcd77683ed3d624aa",
1338
+ "category": "paper",
1339
+ "source": "other",
1340
+ "status": "candidate",
1341
+ "summary": "This study investigates the distinctive characteristics of Pashto words, focusing on their phonological, morphological, and semantic features. Pashto, an Eastern Iranian language spoken primarily in Afghanistan and Pakistan, exhibits a rich",
1342
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
1343
+ "tasks": [],
1344
+ "pashto_evidence": {
1345
+ "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
1346
+ "evidence_url": "https://www.semanticscholar.org/paper/6eb3febbb368a7eaccc6290bcd77683ed3d624aa",
1347
+ "markers": [
1348
+ "pashto"
1349
+ ]
1350
+ },
1351
+ "tags": [
1352
+ "pashto",
1353
+ "candidate",
1354
+ "paper"
1355
+ ]
1356
+ },
1357
+ {
1358
+ "id": "candidate-datacite-project-introduction-to-postpartum-care-for-refugee-women-pashto",
1359
+ "title": "Introduction to Postpartum Care for Refugee women [Pashto]",
1360
+ "url": "https://zenodo.org/doi/10.5281/zenodo.18324878",
1361
+ "category": "project",
1362
+ "source": "datacite",
1363
+ "status": "candidate",
1364
+ "summary": "Candidate record returned from DataCite DOI search for Pashto.",
1365
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
1366
+ "tasks": [],
1367
+ "pashto_evidence": {
1368
+ "evidence_text": "DataCite metadata includes Pashto markers in title or description.",
1369
+ "evidence_url": "https://zenodo.org/doi/10.5281/zenodo.18324878",
1370
+ "markers": [
1371
+ "pashto"
1372
+ ]
1373
+ },
1374
+ "tags": [
1375
+ "pashto",
1376
+ "candidate",
1377
+ "project",
1378
+ "datacite"
1379
+ ]
1380
+ },
1381
  {
1382
  "id": "candidate-zenodo-paper-is-the-pushto-a-semitic-language",
1383
  "title": "Is the Pushto a Semitic Language",
 
1402
  "zenodo"
1403
  ]
1404
  },
1405
+ {
1406
+ "id": "candidate-openalex-isolated-handwritten-pashto-character-recognition-using-a-i-k-i-nn-classificatio",
1407
+ "title": "Isolated Handwritten Pashto Character Recognition Using a <i>K</i>‐NN Classification Tool based on Zoning and HOG Feature Extraction Techniques",
1408
+ "url": "https://doi.org/10.1155/2021/5558373",
1409
+ "category": "paper",
1410
+ "source": "openalex",
1411
+ "status": "candidate",
1412
+ "summary": "Candidate paper returned from OpenAlex works search for Pashto.",
1413
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
1414
+ "tasks": [],
1415
+ "pashto_evidence": {
1416
+ "evidence_text": "Matched by explicit Pashto marker in title from OpenAlex works search.",
1417
+ "evidence_url": "https://doi.org/10.1155/2021/5558373",
1418
+ "markers": [
1419
+ "pashto"
1420
+ ]
1421
+ },
1422
+ "tags": [
1423
+ "pashto",
1424
+ "candidate",
1425
+ "paper",
1426
+ "openalex"
1427
+ ]
1428
+ },
1429
+ {
1430
+ "id": "candidate-hf-model-jawaria-wav2vec2-large-xls-r-300m-pashto-colab-final-1",
1431
+ "title": "Jawaria/wav2vec2-large-xls-r-300m-pashto-colab-final-1",
1432
+ "url": "https://huggingface.co/Jawaria/wav2vec2-large-xls-r-300m-pashto-colab-final-1",
1433
+ "category": "model",
1434
+ "source": "huggingface",
1435
+ "status": "candidate",
1436
+ "summary": "Candidate model returned from Hugging Face search for Pashto.",
1437
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
1438
+ "tasks": [],
1439
+ "pashto_evidence": {
1440
+ "evidence_text": "Matched by Pashto keyword in Hugging Face search results.",
1441
+ "evidence_url": "https://huggingface.co/Jawaria/wav2vec2-large-xls-r-300m-pashto-colab-final-1",
1442
+ "markers": [
1443
+ "pashto"
1444
+ ]
1445
+ },
1446
+ "tags": [
1447
+ "pashto",
1448
+ "candidate",
1449
+ "model"
1450
+ ]
1451
+ },
1452
  {
1453
  "id": "candidate-openalex-knn-and-ann-based-recognition-of-handwritten-pashto-letters-using-zoning-feature",
1454
  "title": "KNN and ANN-based Recognition of Handwritten Pashto Letters using Zoning Features",
 
1496
  "dataset"
1497
  ]
1498
  },
1499
+ {
1500
+ "id": "candidate-hf-model-koochikoo25-whisper-medium-pashto",
1501
+ "title": "koochikoo25/Whisper-medium-pashto",
1502
+ "url": "https://huggingface.co/koochikoo25/Whisper-medium-pashto",
1503
+ "category": "model",
1504
+ "source": "huggingface",
1505
+ "status": "candidate",
1506
+ "summary": "Candidate model returned from Hugging Face search for Pashto.",
1507
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
1508
+ "tasks": [],
1509
+ "pashto_evidence": {
1510
+ "evidence_text": "Matched by Pashto keyword in Hugging Face search results.",
1511
+ "evidence_url": "https://huggingface.co/koochikoo25/Whisper-medium-pashto",
1512
+ "markers": [
1513
+ "pashto"
1514
+ ]
1515
+ },
1516
+ "tags": [
1517
+ "pashto",
1518
+ "candidate",
1519
+ "model"
1520
+ ]
1521
+ },
1522
  {
1523
  "id": "candidate-zenodo-paper-language-barrier-and-its-effect-on-learning-at-the-public-primary-school-level-i",
1524
  "title": "Language Barrier and its Effect on Learning at the Public Primary School Level in Lahore",
 
1543
  "zenodo"
1544
  ]
1545
  },
1546
+ {
1547
+ "id": "candidate-s2-language-of-resistance-in-pashto-poetry-during-the-war-on-terror",
1548
+ "title": "Language of Resistance in Pashto Poetry during the War on Terror",
1549
+ "url": "https://www.semanticscholar.org/paper/23dbf301cdadbb3e1e309ed232baf5cfb2b6414b",
1550
+ "category": "paper",
1551
+ "source": "other",
1552
+ "status": "candidate",
1553
+ "summary": "The paper explores the compelling nature of Pashto poetry as a weapon of resistance in the War on Terror, how it has been used to reveal Pashtun identity, political protest, and cultural strength. With military activities dismantling the Pa",
1554
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
1555
+ "tasks": [],
1556
+ "pashto_evidence": {
1557
+ "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
1558
+ "evidence_url": "https://www.semanticscholar.org/paper/23dbf301cdadbb3e1e309ed232baf5cfb2b6414b",
1559
+ "markers": [
1560
+ "pashto"
1561
+ ]
1562
+ },
1563
+ "tags": [
1564
+ "pashto",
1565
+ "candidate",
1566
+ "paper"
1567
+ ]
1568
+ },
1569
+ {
1570
+ "id": "candidate-crossref-le-verbe-pashto",
1571
+ "title": "Le verbe pashto",
1572
+ "url": "https://doi.org/10.29091/9783954907083",
1573
+ "category": "paper",
1574
+ "source": "crossref",
1575
+ "status": "candidate",
1576
+ "summary": "Candidate paper returned from Crossref search for Pashto.",
1577
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
1578
+ "tasks": [],
1579
+ "pashto_evidence": {
1580
+ "evidence_text": "Matched by explicit Pashto marker in title from Crossref search.",
1581
+ "evidence_url": "https://doi.org/10.29091/9783954907083",
1582
+ "markers": [
1583
+ "pashto"
1584
+ ]
1585
+ },
1586
+ "tags": [
1587
+ "pashto",
1588
+ "candidate",
1589
+ "paper",
1590
+ "crossref"
1591
+ ]
1592
+ },
1593
+ {
1594
+ "id": "candidate-gh-project-lecramyajiv-fonts-arabic-extra",
1595
+ "title": "lecramyajiv/fonts-arabic-extra",
1596
+ "url": "https://github.com/lecramyajiv/fonts-arabic-extra",
1597
+ "category": "project",
1598
+ "source": "github",
1599
+ "status": "candidate",
1600
+ "summary": "Extra Arabic fonts for Slackware Linux",
1601
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
1602
+ "tasks": [],
1603
+ "pashto_evidence": {
1604
+ "evidence_text": "Repository metadata (name/description/topics) includes Pashto markers.",
1605
+ "evidence_url": "https://github.com/lecramyajiv/fonts-arabic-extra",
1606
+ "markers": [
1607
+ "pashto"
1608
+ ]
1609
+ },
1610
+ "tags": [
1611
+ "pashto",
1612
+ "candidate",
1613
+ "project",
1614
+ "github",
1615
+ "arabic",
1616
+ "fonts",
1617
+ "kufi"
1618
+ ]
1619
+ },
1620
  {
1621
  "id": "candidate-gh-project-lecramyajiv-ttf-x2",
1622
  "title": "lecramyajiv/ttf-x2",
 
1797
  ]
1798
  },
1799
  {
1800
+ "id": "candidate-s2-multilingual-interplay-and-the-influence-of-the-official-languages-on-the-use-an",
1801
+ "title": "Multilingual interplay and the influence of the official languages on the use and transmission of the regional language Pashto: a case study of a Pashtun family in Pakistan",
1802
+ "url": "https://www.semanticscholar.org/paper/2b42be99fa7ad002efd3cf1d1c75834b69108a07",
1803
+ "category": "paper",
1804
+ "source": "other",
1805
  "status": "candidate",
1806
+ "summary": "ABSTRACT The impact of English and Urdu in Pakistan on the intergenerational transmission and use of the regional language, Pashto, in the family domain is not well known. This paper, therefore, examines language use patterns in a middle-cl",
1807
  "primary_use": "Needs maintainer review before promotion to verified catalog.",
1808
  "tasks": [],
1809
  "pashto_evidence": {
1810
+ "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
1811
+ "evidence_url": "https://www.semanticscholar.org/paper/2b42be99fa7ad002efd3cf1d1c75834b69108a07",
1812
  "markers": [
1813
  "pashto"
1814
  ]
 
1816
  "tags": [
1817
  "pashto",
1818
  "candidate",
1819
+ "paper"
1820
  ]
1821
  },
1822
  {
 
1873
  {
1874
  "id": "candidate-datacite-dataset-navoiy-terra-corpus-v1-0-first-computational-corpus-of-alisher-navoi-works-with-",
1875
  "title": "NAVOIY-TERRA Corpus v1.0: First Computational Corpus of Alisher Navoi Works with Nine-Language Semantic Annotations",
1876
+ "url": "https://zenodo.org/doi/10.5281/zenodo.18602635",
1877
  "category": "dataset",
1878
  "source": "datacite",
1879
  "status": "candidate",
 
1882
  "tasks": [],
1883
  "pashto_evidence": {
1884
  "evidence_text": "DataCite metadata includes Pashto markers in title or description.",
1885
+ "evidence_url": "https://zenodo.org/doi/10.5281/zenodo.18602635",
1886
  "markers": [
1887
  "pashto"
1888
  ]
 
1919
  ]
1920
  },
1921
  {
1922
+ "id": "candidate-datacite-project-negation-in-pashto",
1923
+ "title": "Negation in Pashto",
1924
+ "url": "https://zenodo.org/doi/10.5281/zenodo.18233956",
1925
+ "category": "project",
1926
+ "source": "datacite",
1927
  "status": "candidate",
1928
+ "summary": "In this paper, we explore negation in Pashto an Eastern Iranian language spoken mainly in Pakistan and Afghanistan. Based on the Yousafzai dialect of Pashto,with the questionnaire provided by the editors as our main instrument, we investi",
1929
  "primary_use": "Needs maintainer review before promotion to verified catalog.",
1930
  "tasks": [],
1931
  "pashto_evidence": {
1932
+ "evidence_text": "DataCite metadata includes Pashto markers in title or description.",
1933
+ "evidence_url": "https://zenodo.org/doi/10.5281/zenodo.18233956",
1934
  "markers": [
1935
  "pashto"
1936
  ]
 
1938
  "tags": [
1939
  "pashto",
1940
  "candidate",
1941
+ "project",
1942
+ "datacite"
1943
  ]
1944
  },
1945
  {
 
1998
  {
1999
  "id": "candidate-datacite-paper-only-2-of-141-global-languages-employ-a-labial-for-tongue-in-1st-position-challe",
2000
  "title": "Only 2 of 141 Global Languages Employ a Labial for \"Tongue\" in 1st position Challenging Saussure's Arbitrariness With Near Universal Embodied Iconicity for Tongue Vs Mouth in \"inverse\" Control",
2001
+ "url": "https://zenodo.org/doi/10.5281/zenodo.17791741",
2002
  "category": "paper",
2003
  "source": "datacite",
2004
  "status": "candidate",
 
2007
  "tasks": [],
2008
  "pashto_evidence": {
2009
  "evidence_text": "DataCite metadata includes Pashto markers in title or description.",
2010
+ "evidence_url": "https://zenodo.org/doi/10.5281/zenodo.17791741",
2011
  "markers": [
2012
  "pashto"
2013
  ]
 
2116
  ]
2117
  },
2118
  {
2119
+ "id": "candidate-openalex-pashto-free-relatives-and-triply-filled-comp-evidence-for-a-headed-analysis",
2120
+ "title": "Pashto free relatives and triply-filled Comp: Evidence for a headed analysis",
2121
+ "url": "https://doi.org/10.1016/s0024-3841(96)00032-0",
2122
  "category": "paper",
2123
+ "source": "openalex",
2124
  "status": "candidate",
2125
+ "summary": "Candidate paper returned from OpenAlex works search for Pashto.",
2126
  "primary_use": "Needs maintainer review before promotion to verified catalog.",
2127
  "tasks": [],
2128
  "pashto_evidence": {
2129
+ "evidence_text": "Matched by explicit Pashto marker in title from OpenAlex works search.",
2130
+ "evidence_url": "https://doi.org/10.1016/s0024-3841(96)00032-0",
2131
  "markers": [
2132
  "pashto"
2133
  ]
 
2135
  "tags": [
2136
  "pashto",
2137
  "candidate",
2138
+ "paper",
2139
+ "openalex"
2140
  ]
2141
  },
2142
  {
2143
+ "id": "candidate-crossref-pashto-handwritten-books",
2144
+ "title": "Pashto Handwritten Books",
2145
+ "url": "https://doi.org/10.1163/9789004737358_003",
2146
+ "category": "paper",
2147
+ "source": "crossref",
2148
+ "status": "candidate",
2149
+ "summary": "Candidate paper returned from Crossref search for Pashto.",
2150
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
2151
+ "tasks": [],
2152
+ "pashto_evidence": {
2153
+ "evidence_text": "Matched by explicit Pashto marker in title from Crossref search.",
2154
+ "evidence_url": "https://doi.org/10.1163/9789004737358_003",
2155
+ "markers": [
2156
+ "pashto"
2157
+ ]
2158
+ },
2159
+ "tags": [
2160
+ "pashto",
2161
+ "candidate",
2162
+ "paper",
2163
+ "crossref"
2164
+ ]
2165
+ },
2166
+ {
2167
+ "id": "candidate-kaggle-dataset-abdulbasitkh-pashto-isolated-alphabets-and-numerals",
2168
+ "title": "Pashto Isolated Alphabets and Numerals",
2169
+ "url": "https://www.kaggle.com/datasets/abdulbasitkh/pashto-isolated-alphabetss-and-numerals",
2170
+ "category": "dataset",
2171
+ "source": "kaggle",
2172
  "status": "candidate",
2173
  "summary": "Pashto Islated Alphabets and Numerals Handwritten and Printed",
2174
  "primary_use": "Needs maintainer review before promotion to verified catalog.",
 
2235
  "kaggle"
2236
  ]
2237
  },
 
2238
  {
2239
  "id": "candidate-crossref-pashto-tappa",
2240
  "title": "Pashto Tappa",
 
2355
  "kaggle"
2356
  ]
2357
  },
 
2358
  {
2359
  "id": "candidate-openalex-persian-urdu-and-pashto-a-comparative-orthographic-analysis",
2360
  "title": "Persian, Urdu, and Pashto: A comparative orthographic analysis",
 
2403
  "zenodo"
2404
  ]
2405
  },
2406
+ {
2407
+ "id": "candidate-datacite-paper-psocr-benchmarking-large-multimodal-models-for-optical-character-recognition-in-",
2408
+ "title": "PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language",
2409
+ "url": "https://arxiv.org/abs/2505.10055",
2410
+ "category": "paper",
2411
+ "source": "datacite",
2412
+ "status": "candidate",
2413
+ "summary": "This paper evaluates the performance of Large Multimodal Models (LMMs) on Optical Character Recognition (OCR) in the low-resource Pashto language. Natural Language Processing (NLP) in Pashto faces several challenges due to the cursive natur",
2414
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
2415
+ "tasks": [],
2416
+ "pashto_evidence": {
2417
+ "evidence_text": "DataCite metadata includes Pashto markers in title or description.",
2418
+ "evidence_url": "https://arxiv.org/abs/2505.10055",
2419
+ "markers": [
2420
+ "pashto"
2421
+ ]
2422
+ },
2423
+ "tags": [
2424
+ "pashto",
2425
+ "candidate",
2426
+ "paper",
2427
+ "datacite"
2428
+ ]
2429
+ },
2430
  {
2431
  "id": "candidate-arxiv-psocr-benchmarking-large-multimodal-models-for-optical-character-recognition-in-",
2432
  "title": "PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language",
 
2474
  "github"
2475
  ]
2476
  },
2477
+ {
2478
+ "id": "candidate-gh-project-pukhtoonmafia009-pukhtoonmafia009",
2479
+ "title": "Pukhtoonmafia009/Pukhtoonmafia009",
2480
+ "url": "https://github.com/Pukhtoonmafia009/Pukhtoonmafia009",
2481
+ "category": "project",
2482
+ "source": "github",
2483
+ "status": "candidate",
2484
+ "summary": "Config files for my GitHub profile.",
2485
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
2486
+ "tasks": [],
2487
+ "pashto_evidence": {
2488
+ "evidence_text": "Repository metadata (name/description/topics) includes Pashto markers.",
2489
+ "evidence_url": "https://github.com/Pukhtoonmafia009/Pukhtoonmafia009",
2490
+ "markers": [
2491
+ "pashto"
2492
+ ]
2493
+ },
2494
+ "tags": [
2495
+ "pashto",
2496
+ "candidate",
2497
+ "project",
2498
+ "github",
2499
+ "config",
2500
+ "github-config"
2501
+ ]
2502
+ },
2503
+ {
2504
+ "id": "candidate-gh-project-pukhtoonmalang-pukhtoon1",
2505
+ "title": "PukhtoonMalang/Pukhtoon1",
2506
+ "url": "https://github.com/PukhtoonMalang/Pukhtoon1",
2507
+ "category": "project",
2508
+ "source": "github",
2509
+ "status": "candidate",
2510
+ "summary": "Pukhtoom1",
2511
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
2512
+ "tasks": [],
2513
+ "pashto_evidence": {
2514
+ "evidence_text": "Repository metadata (name/description/topics) includes Pashto markers.",
2515
+ "evidence_url": "https://github.com/PukhtoonMalang/Pukhtoon1",
2516
+ "markers": [
2517
+ "pashto"
2518
+ ]
2519
+ },
2520
+ "tags": [
2521
+ "pashto",
2522
+ "candidate",
2523
+ "project",
2524
+ "github"
2525
+ ]
2526
+ },
2527
+ {
2528
+ "id": "candidate-gh-project-pukhtoonyar406-pukhtoonyar406",
2529
+ "title": "pukhtoonyar406/pukhtoonyar406",
2530
+ "url": "https://github.com/pukhtoonyar406/pukhtoonyar406",
2531
+ "category": "project",
2532
+ "source": "github",
2533
+ "status": "candidate",
2534
+ "summary": "Config files for my GitHub profile.",
2535
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
2536
+ "tasks": [],
2537
+ "pashto_evidence": {
2538
+ "evidence_text": "Repository metadata (name/description/topics) includes Pashto markers.",
2539
+ "evidence_url": "https://github.com/pukhtoonyar406/pukhtoonyar406",
2540
+ "markers": [
2541
+ "pashto"
2542
+ ]
2543
+ },
2544
+ "tags": [
2545
+ "pashto",
2546
+ "candidate",
2547
+ "project",
2548
+ "github",
2549
+ "config",
2550
+ "github-config"
2551
+ ]
2552
+ },
2553
  {
2554
  "id": "candidate-dataverse-dataset-rats-language-identification",
2555
  "title": "RATS Language Identification",
 
2646
  "dataverse"
2647
  ]
2648
  },
2649
+ {
2650
+ "id": "candidate-s2-resolution-of-ellipses-in-wh-constructions-in-pashto-language",
2651
+ "title": "Resolution of Ellipses in WH-constructions in Pashto Language",
2652
+ "url": "https://www.semanticscholar.org/paper/b9d84d79be0e90e026bbd596276697eeca5d9474",
2653
+ "category": "paper",
2654
+ "source": "other",
2655
+ "status": "candidate",
2656
+ "summary": "The Pashto language has a question structure consisting of a WH-word and an answer to the question, this is called WH-structure. The resolution of ellipsis occurs in most cases in both written and spoken language in its WH construction. In",
2657
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
2658
+ "tasks": [],
2659
+ "pashto_evidence": {
2660
+ "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
2661
+ "evidence_url": "https://www.semanticscholar.org/paper/b9d84d79be0e90e026bbd596276697eeca5d9474",
2662
+ "markers": [
2663
+ "pashto"
2664
+ ]
2665
+ },
2666
+ "tags": [
2667
+ "pashto",
2668
+ "candidate",
2669
+ "paper"
2670
+ ]
2671
+ },
2672
+ {
2673
+ "id": "candidate-openalex-scale-and-rotation-invariant-recognition-of-cursive-pashto-script-using-sift-fea",
2674
+ "title": "Scale and rotation invariant recognition of cursive Pashto script using SIFT features",
2675
+ "url": "https://doi.org/10.1109/icet.2010.5638470",
2676
+ "category": "paper",
2677
+ "source": "openalex",
2678
+ "status": "candidate",
2679
+ "summary": "Candidate paper returned from OpenAlex works search for Pashto.",
2680
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
2681
+ "tasks": [],
2682
+ "pashto_evidence": {
2683
+ "evidence_text": "Matched by explicit Pashto marker in title from OpenAlex works search.",
2684
+ "evidence_url": "https://doi.org/10.1109/icet.2010.5638470",
2685
+ "markers": [
2686
+ "pashto"
2687
+ ]
2688
+ },
2689
+ "tags": [
2690
+ "pashto",
2691
+ "candidate",
2692
+ "paper",
2693
+ "openalex"
2694
+ ]
2695
+ },
2696
  {
2697
  "id": "candidate-openalex-separating-phonology-from-syntax-a-reanalysis-of-pashto-cliticization",
2698
  "title": "Separating phonology from syntax: a reanalysis of Pashto cliticization",
 
2717
  "openalex"
2718
  ]
2719
  },
2720
+ {
2721
+ "id": "candidate-gh-project-shahzamanpatan-pashto-baran",
2722
+ "title": "ShahZamanPatan/Pashto-Baran",
2723
+ "url": "https://github.com/ShahZamanPatan/Pashto-Baran",
2724
+ "category": "project",
2725
+ "source": "github",
2726
+ "status": "candidate",
2727
+ "summary": "پښتو باران يوه پښتو ليکبڼه ده چې په ځانګړې توګه د پښتو ژبې وېبپاڼو لپاره د نازنين او اېکس بي کيهان ليکبڼو تر اغېز لاندې ډيزاين شوې ده تاسو کولی شئ ياده ليکبڼه هرځای کې له وړيا سوداګريزې کارونې جواز سره د پښتو، اردو، عربي، فارسي، کهوار، سرائ",
2728
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
2729
+ "tasks": [],
2730
+ "pashto_evidence": {
2731
+ "evidence_text": "Repository metadata (name/description/topics) includes Pashto markers.",
2732
+ "evidence_url": "https://github.com/ShahZamanPatan/Pashto-Baran",
2733
+ "markers": [
2734
+ "pashto"
2735
+ ]
2736
+ },
2737
+ "tags": [
2738
+ "pashto",
2739
+ "candidate",
2740
+ "project",
2741
+ "github",
2742
+ "fonts",
2743
+ "freepashtofonts",
2744
+ "pashto"
2745
+ ]
2746
+ },
2747
  {
2748
  "id": "candidate-gh-project-shawanonymouse-pakhtoon",
2749
  "title": "ShawAnonymouse/Pakhtoon",
 
2846
  {
2847
  "id": "candidate-crossref-summaries-in-pashto",
2848
  "title": "Summaries in Pashto",
2849
+ "url": "https://doi.org/10.1097/01.wtf.0000437933.40809.39",
2850
  "category": "paper",
2851
  "source": "crossref",
2852
  "status": "candidate",
 
2855
  "tasks": [],
2856
  "pashto_evidence": {
2857
  "evidence_text": "Matched by explicit Pashto marker in title from Crossref search.",
2858
+ "evidence_url": "https://doi.org/10.1097/01.wtf.0000437933.40809.39",
2859
  "markers": [
2860
  "pashto"
2861
  ]
 
2867
  "crossref"
2868
  ]
2869
  },
2870
+ {
2871
+ "id": "candidate-s2-switching-selves-online-pashto-english-bilingualism-identity-and-expression-in-p",
2872
+ "title": "SWITCHING SELVES ONLINE:PASHTO-ENGLISH BILINGUALISM,IDENTITY, AND EXPRESSION IN PAKISTAN’S DIGITAL DISCOURSE",
2873
+ "url": "https://www.semanticscholar.org/paper/7a330c5fb416a1105866a895748b4336f8ef8100",
2874
+ "category": "paper",
2875
+ "source": "other",
2876
+ "status": "candidate",
2877
+ "summary": "The language in modern digital realms goes beyond its message carrying center; it serves as a mirror of itself in identity, emotion, and cultural location. The current paper examines what happens when Pashto-English bilinguals in Pakistan n",
2878
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
2879
+ "tasks": [],
2880
+ "pashto_evidence": {
2881
+ "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
2882
+ "evidence_url": "https://www.semanticscholar.org/paper/7a330c5fb416a1105866a895748b4336f8ef8100",
2883
+ "markers": [
2884
+ "pashto"
2885
+ ]
2886
+ },
2887
+ "tags": [
2888
+ "pashto",
2889
+ "candidate",
2890
+ "paper"
2891
+ ]
2892
+ },
2893
+ {
2894
+ "id": "candidate-s2-syntax-and-morphology-of-baniswola-pashto-investigating-universal-and-dialectal-",
2895
+ "title": "Syntax and morphology of Baniswola Pashto: investigating universal and dialectal variations",
2896
+ "url": "https://www.semanticscholar.org/paper/9f725b3b282cf05f9089002d474010c6021001f9",
2897
+ "category": "paper",
2898
+ "source": "other",
2899
+ "status": "candidate",
2900
+ "summary": "Candidate paper returned from Semantic Scholar search for Pashto.",
2901
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
2902
+ "tasks": [],
2903
+ "pashto_evidence": {
2904
+ "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
2905
+ "evidence_url": "https://www.semanticscholar.org/paper/9f725b3b282cf05f9089002d474010c6021001f9",
2906
+ "markers": [
2907
+ "pashto"
2908
+ ]
2909
+ },
2910
+ "tags": [
2911
+ "pashto",
2912
+ "candidate",
2913
+ "paper"
2914
+ ]
2915
+ },
2916
  {
2917
  "id": "candidate-hf-project-tasal9-pashto-base-bloom-space",
2918
  "title": "tasal9/pashto-base-bloom-space",
 
2985
  "openalex"
2986
  ]
2987
  },
2988
+ {
2989
+ "id": "candidate-s2-the-development-and-evaluation-of-an-automatic-clitic-generator-for-pashto-langu",
2990
+ "title": "The development and evaluation of an automatic clitic generator for Pashto language",
2991
+ "url": "https://www.semanticscholar.org/paper/3d95449d67799fcac83f855984cb0c29cc500d7b",
2992
+ "category": "paper",
2993
+ "source": "other",
2994
+ "status": "candidate",
2995
+ "summary": "Candidate paper returned from Semantic Scholar search for Pashto.",
2996
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
2997
+ "tasks": [],
2998
+ "pashto_evidence": {
2999
+ "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
3000
+ "evidence_url": "https://www.semanticscholar.org/paper/3d95449d67799fcac83f855984cb0c29cc500d7b",
3001
+ "markers": [
3002
+ "pashto"
3003
+ ]
3004
+ },
3005
+ "tags": [
3006
+ "pashto",
3007
+ "candidate",
3008
+ "paper"
3009
+ ]
3010
+ },
3011
  {
3012
  "id": "candidate-openalex-the-grammar-of-clitics-evidence-from-pashto-and-other-languages",
3013
  "title": "The grammar of clitics : evidence from Pashto and other languages",
 
3105
  ]
3106
  },
3107
  {
3108
+ "id": "candidate-crossref-topicalization-in-pashto",
3109
+ "title": "Topicalization in Pashto",
3110
+ "url": "https://doi.org/10.31703/gssr.2020(v-i).17",
3111
  "category": "paper",
3112
+ "source": "crossref",
3113
  "status": "candidate",
3114
+ "summary": "Candidate paper returned from Crossref search for Pashto.",
3115
  "primary_use": "Needs maintainer review before promotion to verified catalog.",
3116
  "tasks": [],
3117
  "pashto_evidence": {
3118
+ "evidence_text": "Matched by explicit Pashto marker in title from Crossref search.",
3119
+ "evidence_url": "https://doi.org/10.31703/gssr.2020(v-i).17",
3120
  "markers": [
3121
  "pashto"
3122
  ]
 
3124
  "tags": [
3125
  "pashto",
3126
  "candidate",
3127
+ "paper",
3128
+ "crossref"
3129
  ]
3130
  },
3131
  {
 
3248
  "dataverse"
3249
  ]
3250
  },
3251
+ {
3252
+ "id": "candidate-s2-validation-of-the-pashto-version-of-the-premature-ejaculation-diagnostic-tool-pe",
3253
+ "title": "Validation of the Pashto Version of the Premature Ejaculation Diagnostic Tool (PEDT)",
3254
+ "url": "https://www.semanticscholar.org/paper/4ecd042988e98a5fe82dce2f9c6c179bf635559b",
3255
+ "category": "paper",
3256
+ "source": "other",
3257
+ "status": "candidate",
3258
+ "summary": "Objective: The research goal involved evaluating the validity of the Pashto version of the Premature Ejaculation Diagnostic Tool (PEDT) by assessing its relationship with clinical premature ejaculation (PE) diagnosis and intravaginal ejacul",
3259
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
3260
+ "tasks": [],
3261
+ "pashto_evidence": {
3262
+ "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
3263
+ "evidence_url": "https://www.semanticscholar.org/paper/4ecd042988e98a5fe82dce2f9c6c179bf635559b",
3264
+ "markers": [
3265
+ "pashto"
3266
+ ]
3267
+ },
3268
+ "tags": [
3269
+ "pashto",
3270
+ "candidate",
3271
+ "paper"
3272
+ ]
3273
+ },
3274
+ {
3275
+ "id": "candidate-s2-word-order-in-pashto-sentences",
3276
+ "title": "Word Order in Pashto Sentences",
3277
+ "url": "https://www.semanticscholar.org/paper/ed2025e7f944cfa86bdd4c24508e1676b788ae02",
3278
+ "category": "paper",
3279
+ "source": "other",
3280
+ "status": "candidate",
3281
+ "summary": "This article attempts to show the order of words in Pashto phrases and sentences. Their original position is explained, and if the same structure or word changes its position, it is also explained. No specific article or book has been publi",
3282
+ "primary_use": "Needs maintainer review before promotion to verified catalog.",
3283
+ "tasks": [],
3284
+ "pashto_evidence": {
3285
+ "evidence_text": "Matched by explicit Pashto marker in paper title from Semantic Scholar search.",
3286
+ "evidence_url": "https://www.semanticscholar.org/paper/ed2025e7f944cfa86bdd4c24508e1676b788ae02",
3287
+ "markers": [
3288
+ "pashto"
3289
+ ]
3290
+ },
3291
+ "tags": [
3292
+ "pashto",
3293
+ "candidate",
3294
+ "paper"
3295
+ ]
3296
+ },
3297
  {
3298
  "id": "candidate-zenodo-paper-resource",
3299
  "title": "بلوچستان میں \" فقہ اسلامی \" کے فروغ و ارتقا٫ کا تحقیقی جائزہ",
 
3320
  }
3321
  ]
3322
  }
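
One candidate added above (`ShahZamanPatan/Pashto-Baran`) lists the `pashto` tag twice, once from the discovery defaults and once from the repository topics. A minimal sketch of an order-preserving tag cleanup follows; `dedupe_tags` is a hypothetical helper for illustration, not a function from this project's discovery scripts.

```python
def dedupe_tags(tags: list[str]) -> list[str]:
    """Drop repeated tags while keeping the first-seen order."""
    # dict preserves insertion order, so fromkeys() keeps the first
    # occurrence of each tag and discards later repeats.
    return list(dict.fromkeys(tags))


# Tags copied from the Pashto-Baran candidate entry in this diff.
sample = [
    "pashto", "candidate", "project", "github",
    "fonts", "freepashtofonts", "pashto",
]

print(dedupe_tags(sample))
# ['pashto', 'candidate', 'project', 'github', 'fonts', 'freepashtofonts']
```

Keeping first-seen order rather than sorting leaves the default tags (`pashto`, `candidate`, category, source) at the front, which matches how the other entries are laid out.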
 
 
resources/catalog/resources.json CHANGED
@@ -1,6 +1,6 @@
1
  {
2
- "version": "1.0.0",
3
- "updated_on": "2026-02-17",
4
  "resources": [
5
  {
6
  "id": "dataset-common-voice-ps-v24",
@@ -2559,6 +2559,189 @@
2559
  "speech",
2560
  "translation"
2561
  ]
 
2562
  }
2563
  ]
2564
  }
 
1
  {
2
+ "version": "1.0.1",
3
+ "updated_on": "2026-02-18",
4
  "resources": [
5
  {
6
  "id": "dataset-common-voice-ps-v24",
 
2559
  "speech",
2560
  "translation"
2561
  ]
2562
+ },
2563
+ {
2564
+ "id": "dataset-hf-ihanif-pashto-speech-2k",
2565
+ "title": "ihanif/pashto_speech_2k",
2566
+ "url": "https://huggingface.co/datasets/ihanif/pashto_speech_2k",
2567
+ "category": "dataset",
2568
+ "source": "huggingface",
2569
+ "status": "verified",
2570
+ "summary": "Pashto synthetic speech dataset with paired audio-text samples for low-resource ASR baselines.",
2571
+ "primary_use": "ASR training and controlled synthetic-speech evaluation",
2572
+ "license": "mit",
2573
+ "tasks": [
2574
+ "asr"
2575
+ ],
2576
+ "pashto_evidence": {
2577
+ "evidence_text": "Dataset metadata includes language:ps and Pashto speech dataset card details.",
2578
+ "evidence_url": "https://huggingface.co/datasets/ihanif/pashto_speech_2k",
2579
+ "markers": [
2580
+ "language:ps",
2581
+ "pashto",
2582
+ "speech"
2583
+ ]
2584
+ },
2585
+ "tags": [
2586
+ "pashto",
2587
+ "speech",
2588
+ "dataset",
2589
+ "asr",
2590
+ "huggingface"
2591
+ ]
2592
+ },
2593
+ {
2594
+ "id": "dataset-hf-ihanif-pashto-speech-3k",
2595
+ "title": "ihanif/pashto_speech_3k",
2596
+ "url": "https://huggingface.co/datasets/ihanif/pashto_speech_3k",
2597
+ "category": "dataset",
2598
+ "source": "huggingface",
2599
+ "status": "verified",
2600
+ "summary": "Pashto synthetic speech parquet dataset with audio-text pairs and language metadata.",
2601
+ "primary_use": "ASR training and reproducible speech-data experimentation",
2602
+ "license": "mit",
2603
+ "tasks": [
2604
+ "asr"
2605
+ ],
2606
+ "pashto_evidence": {
2607
+ "evidence_text": "Dataset metadata includes language:ps and task category automatic speech recognition.",
2608
+ "evidence_url": "https://huggingface.co/datasets/ihanif/pashto_speech_3k",
2609
+ "markers": [
2610
+ "language:ps",
2611
+ "automatic-speech-recognition",
2612
+ "pashto"
2613
+ ]
2614
+ },
2615
+ "tags": [
2616
+ "pashto",
2617
+ "speech",
2618
+ "dataset",
2619
+ "asr",
2620
+ "huggingface",
2621
+ "parquet"
2622
+ ]
2623
+ },
2624
+ {
2625
+ "id": "dataset-hf-koochikoo25-pashto-concatenated",
2626
+ "title": "koochikoo25/Pashto-Concatenated",
2627
+ "url": "https://huggingface.co/datasets/koochikoo25/Pashto-Concatenated",
2628
+ "category": "dataset",
2629
+ "source": "huggingface",
2630
+ "status": "verified",
2631
+ "summary": "Pashto concatenated audio-text dataset with predefined train-validation-test splits.",
2632
+ "primary_use": "ASR dataset preparation and split-based benchmark experiments",
2633
+ "license": "cc-by-nd-4.0",
2634
+ "tasks": [
2635
+ "asr"
2636
+ ],
2637
+ "pashto_evidence": {
2638
+ "evidence_text": "Dataset title explicitly states Pashto and card metadata exposes audio-text features and splits.",
2639
+ "evidence_url": "https://huggingface.co/datasets/koochikoo25/Pashto-Concatenated",
2640
+ "markers": [
2641
+ "Pashto",
2642
+ "audio",
2643
+ "transcription"
2644
+ ]
2645
+ },
2646
+ "tags": [
2647
+ "pashto",
2648
+ "speech",
2649
+ "dataset",
2650
+ "asr",
2651
+ "huggingface"
2652
+ ]
2653
+ },
2654
+ {
2655
+ "id": "model-hf-koochikoo25-whisper-medium-pashto",
2656
+ "title": "koochikoo25/Whisper-medium-pashto",
2657
+ "url": "https://huggingface.co/koochikoo25/Whisper-medium-pashto",
2658
+ "category": "model",
2659
+ "source": "huggingface",
2660
+ "status": "verified",
2661
+ "summary": "Whisper medium fine-tuned checkpoint for Pashto automatic speech recognition.",
2662
+ "primary_use": "Pashto ASR baseline modeling and transcription comparison",
2663
+ "license": "apache-2.0",
2664
+ "tasks": [
2665
+ "asr"
2666
+ ],
2667
+ "pashto_evidence": {
2668
+ "evidence_text": "Model tags include ps and automatic-speech-recognition with a Pashto model name.",
2669
+ "evidence_url": "https://huggingface.co/koochikoo25/Whisper-medium-pashto",
2670
+ "markers": [
2671
+ "ps",
2672
+ "automatic-speech-recognition",
2673
+ "pashto"
2674
+ ]
2675
+ },
2676
+ "tags": [
2677
+ "pashto",
2678
+ "asr",
2679
+ "model",
2680
+ "whisper",
2681
+ "huggingface"
2682
+ ]
2683
+ },
2684
+ {
2685
+ "id": "project-hf-space-afaaaak-urdu-pashto-translator",
2686
+ "title": "afaaaak/urdu_pashto_translator",
2687
+ "url": "https://huggingface.co/spaces/afaaaak/urdu_pashto_translator",
2688
+ "category": "project",
2689
+ "source": "huggingface",
2690
+ "status": "verified",
2691
+ "summary": "Interactive Urdu-to-Pashto translation Space with a runnable web demo.",
2692
+ "primary_use": "Translation demo and bilingual usability testing",
2693
+ "license": "mit",
2694
+ "tasks": [
2695
+ "mt",
2696
+ "translation",
2697
+ "demo"
2698
+ ],
2699
+ "pashto_evidence": {
2700
+ "evidence_text": "Space metadata title is Urdu Pashto Translator and the slug includes pashto.",
2701
+ "evidence_url": "https://huggingface.co/spaces/afaaaak/urdu_pashto_translator",
2702
+ "markers": [
2703
+ "Pashto",
2704
+ "translator"
2705
+ ]
2706
+ },
2707
+ "tags": [
2708
+ "pashto",
2709
+ "project",
2710
+ "huggingface-space",
2711
+ "translation",
2712
+ "demo"
2713
+ ]
2714
+ },
2715
+ {
2716
+ "id": "project-hf-space-drsaqlainhassan-pashto-tokenixer",
2717
+ "title": "DrSaqlainHassan/PashtoTokenixer",
2718
+ "url": "https://huggingface.co/spaces/DrSaqlainHassan/PashtoTokenixer",
2719
+ "category": "project",
2720
+ "source": "huggingface",
2721
+ "status": "verified",
2722
+ "summary": "Pashto parts-of-speech identification Space for interactive NLP exploration.",
2723
+ "primary_use": "Pashto NLP demo for token and part-of-speech analysis",
2724
+ "license": "apache-2.0",
2725
+ "tasks": [
2726
+ "nlp",
2727
+ "pos-tagging",
2728
+ "demo"
2729
+ ],
2730
+ "pashto_evidence": {
2731
+ "evidence_text": "Space card title states Pashto Parts of Speech Identifier and the slug contains Pashto.",
2732
+ "evidence_url": "https://huggingface.co/spaces/DrSaqlainHassan/PashtoTokenixer",
2733
+ "markers": [
2734
+ "Pashto",
2735
+ "parts-of-speech"
2736
+ ]
2737
+ },
2738
+ "tags": [
2739
+ "pashto",
2740
+ "project",
2741
+ "huggingface-space",
2742
+ "nlp",
2743
+ "demo"
2744
+ ]
2745
  }
2746
  ]
2747
  }
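
The six promoted entries above all carry the same set of keys, including a `license` field that the candidate entries lack. The sketch below checks that every `verified` entry has those keys filled in. The field list is inferred from this diff only, so treating it as the project's required schema is an assumption, and `missing_fields` and `check_catalog` are hypothetical names rather than existing project code.

```python
# Keys observed on the verified entries in this diff. Whether the real
# schema requires exactly these is an assumption.
REQUIRED_FIELDS = (
    "id", "title", "url", "category", "source", "status",
    "summary", "primary_use", "license", "tasks", "pashto_evidence", "tags",
)


def missing_fields(entry: dict) -> list[str]:
    """Return the required keys that are absent or empty in one entry."""
    return [name for name in REQUIRED_FIELDS if not entry.get(name)]


def check_catalog(catalog: dict) -> dict[str, list[str]]:
    """Map entry id to its missing keys, for verified entries only."""
    problems = {}
    for entry in catalog.get("resources", []):
        if entry.get("status") != "verified":
            continue  # candidates may legitimately have empty fields
        missing = missing_fields(entry)
        if missing:
            problems[entry.get("id", "<no id>")] = missing
    return problems


# One promoted entry copied from the diff above.
sample = {
    "id": "model-hf-koochikoo25-whisper-medium-pashto",
    "title": "koochikoo25/Whisper-medium-pashto",
    "url": "https://huggingface.co/koochikoo25/Whisper-medium-pashto",
    "category": "model",
    "source": "huggingface",
    "status": "verified",
    "summary": "Whisper medium fine-tuned checkpoint for Pashto automatic speech recognition.",
    "primary_use": "Pashto ASR baseline modeling and transcription comparison",
    "license": "apache-2.0",
    "tasks": ["asr"],
    "pashto_evidence": {
        "evidence_text": "Model tags include ps and automatic-speech-recognition with a Pashto model name.",
        "evidence_url": "https://huggingface.co/koochikoo25/Whisper-medium-pashto",
        "markers": ["ps", "automatic-speech-recognition", "pashto"],
    },
    "tags": ["pashto", "asr", "model", "whisper", "huggingface"],
}

print(check_catalog({"resources": [sample]}))  # {}
```

To run this against the real file, one would load `resources/catalog/resources.json` with `json.load` and pass the result to `check_catalog`; an empty dict means no verified entry is missing a key.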
resources/datasets/README.md CHANGED
@@ -18,10 +18,13 @@
18
  | IARPA Babel Pashto Language Pack IARPA-babel104b-v0.4bY | [dataverse](https://hdl.handle.net/11272.1/AB2/GLFN3X) | [Dataverse metadata includes Pashto markers in dataset title or description. (`pashto`)](https://hdl.handle.net/11272.1/AB2/GLFN3X) | Pashto speech dataset for ASR and language identification experiments |
19
  | ihanif/pashto_asr_wer | [huggingface](https://huggingface.co/datasets/ihanif/pashto_asr_wer) | [Matched by Pashto keyword in Hugging Face search results. (`pashto`)](https://huggingface.co/datasets/ihanif/pashto_asr_wer) | ASR training and evaluation data source |
 
18
  | IARPA Babel Pashto Language Pack IARPA-babel104b-v0.4bY | [dataverse](https://hdl.handle.net/11272.1/AB2/GLFN3X) | [Dataverse metadata includes Pashto markers in dataset title or description. (`pashto`)](https://hdl.handle.net/11272.1/AB2/GLFN3X) | Pashto speech dataset for ASR and language identification experiments |
19
  | ihanif/pashto_asr_wer | [huggingface](https://huggingface.co/datasets/ihanif/pashto_asr_wer) | [Matched by Pashto keyword in Hugging Face search results. (`pashto`)](https://huggingface.co/datasets/ihanif/pashto_asr_wer) | ASR training and evaluation data source |
20
  | ihanif/pashto_speech_20k | [huggingface](https://huggingface.co/datasets/ihanif/pashto_speech_20k) | [Matched by Pashto keyword in Hugging Face search results. (`pashto`)](https://huggingface.co/datasets/ihanif/pashto_speech_20k) | ASR training and evaluation data source |
21
+ | ihanif/pashto_speech_2k | [huggingface](https://huggingface.co/datasets/ihanif/pashto_speech_2k) | [Dataset metadata includes language:ps and Pashto speech dataset card details. (`language:ps`, `pashto`, `speech`)](https://huggingface.co/datasets/ihanif/pashto_speech_2k) | ASR training and controlled synthetic-speech evaluation |
22
+ | ihanif/pashto_speech_3k | [huggingface](https://huggingface.co/datasets/ihanif/pashto_speech_3k) | [Dataset metadata includes language:ps and task category automatic speech recognition. (`language:ps`, `automatic-speech-recognition`, `pashto`)](https://huggingface.co/datasets/ihanif/pashto_speech_3k) | ASR training and reproducible speech-data experimentation |
23
  | ihanif/pashto_speech_5k | [huggingface](https://huggingface.co/datasets/ihanif/pashto_speech_5k) | [Matched by Pashto keyword in Hugging Face search results. (`pashto`)](https://huggingface.co/datasets/ihanif/pashto_speech_5k) | ASR training and evaluation data source |
24
  | ihanif/pashto_speech_ds | [huggingface](https://huggingface.co/datasets/ihanif/pashto_speech_ds) | [Matched by Pashto keyword in Hugging Face search results. (`pashto`)](https://huggingface.co/datasets/ihanif/pashto_speech_ds) | ASR training and evaluation data source |
25
  | ihanif/pashto_speech_parquet_10k | [huggingface](https://huggingface.co/datasets/ihanif/pashto_speech_parquet_10k) | [Matched by Pashto keyword in Hugging Face search results. (`pashto`)](https://huggingface.co/datasets/ihanif/pashto_speech_parquet_10k) | ASR training and evaluation data source |
26
  | Katib's Pashto Text Imagebase (KPTI) | [kaggle](https://www.kaggle.com/datasets/hassanamin/katibs-pashto-text-imagebase-kpti) | [Kaggle dataset title/subtitle includes Pashto keyword. (`Pashto`)](https://www.kaggle.com/datasets/hassanamin/katibs-pashto-text-imagebase-kpti) | OCR training and evaluation data source |
27
+ | koochikoo25/Pashto-Concatenated | [huggingface](https://huggingface.co/datasets/koochikoo25/Pashto-Concatenated) | [Dataset title explicitly states Pashto and card metadata exposes audio-text features and splits. (`Pashto`, `audio`, `transcription`)](https://huggingface.co/datasets/koochikoo25/Pashto-Concatenated) | ASR dataset preparation and split-based benchmark experiments |
28
  | oowais/pushto-text-to-speech-dataset | [huggingface](https://huggingface.co/datasets/oowais/pushto-text-to-speech-dataset) | [Matched by Pashto keyword in Hugging Face search results. (`pashto`)](https://huggingface.co/datasets/oowais/pushto-text-to-speech-dataset) | ASR training and evaluation data source |
29
  | OPUS-100 | [huggingface](https://huggingface.co/datasets/Helsinki-NLP/opus-100) | [Dataset viewer includes en-ps split. (`en-ps`)](https://huggingface.co/datasets/Helsinki-NLP/opus-100/viewer/en-ps) | Machine translation training and evaluation |
30
  | OSCAR Corpus | [huggingface](https://huggingface.co/datasets/oscar-corpus/oscar) | [Dataset includes unshuffled_deduplicated_ps split. (`unshuffled_deduplicated_ps`)](https://huggingface.co/datasets/oscar-corpus/oscar) | Language modeling and lexicon expansion |
resources/models/README.md CHANGED
@@ -15,6 +15,7 @@
15
  | ijazulhaq/bert-base-pashto | [huggingface](https://huggingface.co/ijazulhaq/bert-base-pashto) | [Matched by Pashto keyword in Hugging Face search results. (`pashto`)](https://huggingface.co/ijazulhaq/bert-base-pashto) | Pashto model baseline for downstream NLP tasks |
16
  | ijazulhaq/bert-base-pashto-v1 | [huggingface](https://huggingface.co/ijazulhaq/bert-base-pashto-v1) | [Matched by Pashto keyword in Hugging Face search results. (`pashto`)](https://huggingface.co/ijazulhaq/bert-base-pashto-v1) | Pashto model baseline for downstream NLP tasks |
17
  | koochikoo25/pashto-whisper-large | [huggingface](https://huggingface.co/koochikoo25/pashto-whisper-large) | [Matched by Pashto keyword in Hugging Face search results. (`pashto`)](https://huggingface.co/koochikoo25/pashto-whisper-large) | Pashto ASR baseline and model comparison |
18
+ | koochikoo25/Whisper-medium-pashto | [huggingface](https://huggingface.co/koochikoo25/Whisper-medium-pashto) | [Model tags include ps and automatic-speech-recognition with a Pashto model name. (`ps`, `automatic-speech-recognition`, `pashto`)](https://huggingface.co/koochikoo25/Whisper-medium-pashto) | Pashto ASR baseline modeling and transcription comparison |
19
  | PashtoBERT | [huggingface](https://huggingface.co/mdarhri/pashto-bert) | [Model card states training on Pashto corpus data. (`Pashto`)](https://huggingface.co/mdarhri/pashto-bert) | Pashto NLP baseline encoder |
20
  | wav2vec2 XLS-R 300M Pashto | [huggingface](https://huggingface.co/ihanif/wav2vec2-xls-r-300m-pashto) | [Model tags include pashto and ps, and model index references FLEURS config ps_af. (`pashto`, `ps`, `ps_af`)](https://huggingface.co/ihanif/wav2vec2-xls-r-300m-pashto) | Pashto ASR baseline and comparative experiments |
21
  | Whisper Base Pashto | [huggingface](https://huggingface.co/ihanif/whisper-base-pashto) | [Model ID includes Pashto and card metadata references FLEURS config ps_af. (`Pashto`, `ps_af`)](https://huggingface.co/api/models/ihanif/whisper-base-pashto) | Pashto ASR baseline and speed-accuracy comparison |
resources/projects/README.md CHANGED
@@ -4,8 +4,10 @@
4
 
5
  | Resource | Link | Pashto Evidence | Primary Use |
6
  |---|---|---|---|
7
+ | afaaaak/urdu_pashto_translator | [huggingface](https://huggingface.co/spaces/afaaaak/urdu_pashto_translator) | [Space metadata title is Urdu Pashto Translator and the slug includes pashto. (`Pashto`, `translator`)](https://huggingface.co/spaces/afaaaak/urdu_pashto_translator) | Translation demo and bilingual usability testing |
8
  | afaqalinagra/PASHTO-ASR-MODEL | [huggingface](https://huggingface.co/spaces/afaqalinagra/PASHTO-ASR-MODEL) | [Matched by Pashto keyword in Hugging Face Spaces search. (`pashto`)](https://huggingface.co/spaces/afaqalinagra/PASHTO-ASR-MODEL) | Interactive Pashto demo and quick qualitative validation |
9
  | Aizazayyubi/pashto_asr | [huggingface](https://huggingface.co/spaces/Aizazayyubi/pashto_asr) | [Matched by Pashto keyword in Hugging Face Spaces search. (`pashto`)](https://huggingface.co/spaces/Aizazayyubi/pashto_asr) | Interactive Pashto ASR demo for qualitative evaluation |
10
+ | DrSaqlainHassan/PashtoTokenixer | [huggingface](https://huggingface.co/spaces/DrSaqlainHassan/PashtoTokenixer) | [Space card title states Pashto Parts of Speech Identifier and the slug contains Pashto. (`Pashto`, `parts-of-speech`)](https://huggingface.co/spaces/DrSaqlainHassan/PashtoTokenixer) | Pashto NLP demo for token and part-of-speech analysis |
11
  | Fazlullahmamond/Pashto-Typing | [github](https://github.com/Fazlullahmamond/Pashto-Typing) | [Repository metadata (name/description/topics) includes Pashto markers. (`pashto`)](https://github.com/Fazlullahmamond/Pashto-Typing) | Interactive Pashto demo and quick qualitative validation |
12
  | ihanif/wav2vec-pashto-asr | [huggingface](https://huggingface.co/spaces/ihanif/wav2vec-pashto-asr) | [Matched by Pashto keyword in Hugging Face Spaces search. (`pashto`)](https://huggingface.co/spaces/ihanif/wav2vec-pashto-asr) | Interactive Pashto demo and quick qualitative validation |
13
  | ihanif/wav2vec2-bert-pashto-asr | [huggingface](https://huggingface.co/spaces/ihanif/wav2vec2-bert-pashto-asr) | [Matched by Pashto keyword in Hugging Face Spaces search. (`pashto`)](https://huggingface.co/spaces/ihanif/wav2vec2-bert-pashto-asr) | Interactive Pashto demo and quick qualitative validation |