musaw committed fbc2945 (1 parent: 379266c)

chore(community): add good first issue template for normalization rows

.github/ISSUE_TEMPLATE/normalization_row_task.md ADDED
@@ -0,0 +1,38 @@
+ ---
+ name: Normalization row task (good first issue)
+ about: Add new Pashto text-normalization rows to the seed dataset
+ title: "[Data][Normalization] Add rows for "
+ labels: data, good first issue
+ assignees: ''
+ ---
+
+ ## Goal
+ Add new high-quality rows to:
+ - `data/processed/normalization_seed_v0.1.tsv`
+
+ ## What to add
+ - [ ] 5-20 new rows with unique `id` values
+ - [ ] `raw_text` and `normalized_text` filled for every row
+ - [ ] short `note` explaining the normalization change
+
+ ## Quality checklist
+ - [ ] no empty fields
+ - [ ] no duplicate `id`
+ - [ ] punctuation/spacing normalized consistently
+ - [ ] meaning preserved (no semantic rewrite)
+
+ ## Validation
+ Run:
+ ```bash
+ python scripts/validate_normalization.py data/processed/normalization_seed_v0.1.tsv
+ ```
+
+ Paste result here:
+ ```
+ # validation output
+ ```
+
+ ## Acceptance criteria
+ - [ ] validator passes
+ - [ ] rows are realistic Pashto examples
+ - [ ] PR links this issue
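Contributors who want to catch obvious problems before running the validator can approximate the mechanical parts of the quality checklist in a few lines of Python. This is a minimal sketch, not `scripts/validate_normalization.py` (whose contents are not part of this commit). It assumes the TSV has a header row containing the column names the template mentions (`id`, `raw_text`, `normalized_text`, `note`) and no quoted fields; `check_rows` and `check_file` are hypothetical helper names.

```python
import csv

# Hypothetical sketch: column names are taken from the issue template; the real
# validator (scripts/validate_normalization.py) may check more, or differently.
REQUIRED_COLUMNS = ("id", "raw_text", "normalized_text", "note")


def check_rows(lines):
    """Return human-readable problems for an iterable of TSV lines (header first)."""
    # Assumption: plain tab-separated values with no quoting.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]

    problems = []
    seen_ids = set()
    for row in reader:
        # line_num counts physical lines read, so skipped blank lines don't shift it.
        line_no = reader.line_num
        # "no empty fields": short rows yield None, so treat None as empty.
        for col in REQUIRED_COLUMNS:
            if not (row.get(col) or "").strip():
                problems.append(f"line {line_no}: empty {col}")
        # "no duplicate id": empty ids are already reported above.
        row_id = (row.get("id") or "").strip()
        if row_id and row_id in seen_ids:
            problems.append(f"line {line_no}: duplicate id {row_id!r}")
        seen_ids.add(row_id)
        # Rough proxy for "spacing normalized consistently": flags leading,
        # trailing, or doubled ASCII spaces only.
        normalized = row.get("normalized_text") or ""
        if normalized != normalized.strip() or "  " in normalized:
            problems.append(f"line {line_no}: stray whitespace in normalized_text")
    return problems


def check_file(path):
    """Read a UTF-8 TSV file and run check_rows on its lines."""
    with open(path, encoding="utf-8", newline="") as handle:
        return check_rows(handle)
```

Calling `check_file("data/processed/normalization_seed_v0.1.tsv")` returns an empty list when none of these checks fire. The sketch cannot judge punctuation consistency, whether the rows are realistic Pashto, or whether meaning was preserved, so those checklist items still need a human reviewer.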