Musawer14
/

pashto-language-resources

+---
+name: Normalization row task (good first issue)
+about: Add new Pashto text-normalization rows to the seed dataset
+title: "[Data][Normalization] Add rows for "
+labels: data, good first issue
+assignees: ''
+---
+## Goal
+Add new high-quality rows to:
+- `data/processed/normalization_seed_v0.1.tsv`
+## What to add
+- [ ] 5-20 new rows with unique `id` values
+- [ ] `raw_text` and `normalized_text` filled for every row
+- [ ] short `note` explaining the normalization change
+## Quality checklist
+- [ ] no empty fields
+- [ ] no duplicate `id`
+- [ ] punctuation/spacing normalized consistently
+- [ ] meaning preserved (no semantic rewrite)
+## Validation
+Run:
+```bash
+python scripts/validate_normalization.py data/processed/normalization_seed_v0.1.tsv
+```
+Paste result here:
+```
+# validation output
+```
+## Acceptance criteria
+- [ ] validator passes
+- [ ] rows are realistic Pashto examples
+- [ ] PR links this issue