# 📚 Dataset Guidelines

## 🏷️ Minimum metadata
- Speaker ID (anonymized)
- Approximate age band
- Gender (optional/self-declared)
- Dialect/region
- Recording environment and device class
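The fields above could be captured as one record per utterance or speaker. The sketch below assumes that approach. The guideline lists the information but does not define a schema, so the class name `UtteranceMetadata`, its field names, and the helper `missing_fields` are all hypothetical. Gender defaults to `None` because it is optional and self-declared. Every other field is treated as required.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema: field names are illustrative, not defined by the guideline.
@dataclass(frozen=True)
class UtteranceMetadata:
    speaker_id: str               # anonymized ID, never a real name
    age_band: str                 # approximate band, e.g. "25-34" (labels are an assumption)
    dialect_region: str
    recording_environment: str    # e.g. "quiet room"
    device_class: str             # e.g. "smartphone"
    gender: Optional[str] = None  # optional; record only if self-declared

# Every field except gender is treated as required.
REQUIRED = ("speaker_id", "age_band", "dialect_region",
            "recording_environment", "device_class")

def missing_fields(meta: UtteranceMetadata) -> list[str]:
    """Return the names of required fields that are empty or whitespace-only."""
    return [name for name in REQUIRED if not getattr(meta, name).strip()]
```

For example, `missing_fields(UtteranceMetadata("spk_0042", "25-34", "", "quiet room", "smartphone"))` returns `["dialect_region"]`.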

## 🎧 Audio quality basics
- Prefer clean speech sampled at 16 kHz or higher
- Avoid clipping and heavy background noise
- Keep each transcript aligned with its spoken content
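The first two checks can be partly automated. The sketch below assumes 16-bit PCM WAV input and uses only the standard-library `wave` and `array` modules. It flags files below 16 kHz and files with too many full-scale samples, which is a common sign of clipping. The 0.1% threshold and the function name `check_wav` are hypothetical choices, not part of the guideline. Background-noise estimation and transcript alignment are not covered here.

```python
import array
import sys
import wave

MIN_SAMPLE_RATE = 16_000      # "16 kHz or higher" from the guideline
FULL_SCALE = 32767            # max magnitude of signed 16-bit PCM
MAX_CLIPPED_FRACTION = 0.001  # hypothetical tolerance: 0.1% of samples

def check_wav(path: str) -> list[str]:
    """Return human-readable problems found in a PCM WAV file (empty list = OK)."""
    problems = []
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        width = wf.getsampwidth()
        frames = wf.readframes(wf.getnframes())
    if rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    if width != 2:
        # This sketch only inspects 16-bit samples.
        problems.append(f"expected 16-bit PCM, got {8 * width}-bit")
        return problems
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if samples:
        # Count samples at (or beyond) full scale in either direction.
        clipped = sum(1 for s in samples if s >= FULL_SCALE or s <= -FULL_SCALE)
        fraction = clipped / len(samples)
        if fraction > MAX_CLIPPED_FRACTION:
            problems.append(f"{fraction:.2%} of samples at full scale (likely clipping)")
    return problems
```

Multi-channel files are handled the same way, because the clipped fraction is computed over all interleaved samples.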

## ✍️ Text policy
- Use agreed normalization rules
- Keep punctuation consistent
- Track alternate spellings in a glossary
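The guideline does not spell out the agreed normalization rules. The sketch below shows how such rules and a spelling glossary might be applied, with every rule in it assumed. It uses Unicode NFC normalization, maps typographic quotes to ASCII as an example of a consistent punctuation rule, and collapses whitespace. The glossary entries, `PUNCT_MAP`, `normalize`, and `glossary_hits` are hypothetical placeholders to be replaced by the project's actual rules.

```python
import re
import unicodedata

# Hypothetical glossary: alternate spelling -> canonical spelling.
GLOSSARY = {
    "colour": "color",
    "grey": "gray",
}

# Hypothetical punctuation rule: typographic quotes become ASCII quotes.
PUNCT_MAP = str.maketrans({"\u2018": "'", "\u2019": "'",
                           "\u201c": '"', "\u201d": '"'})

def normalize(text: str) -> str:
    """Apply NFC, the punctuation map, and whitespace collapsing."""
    text = unicodedata.normalize("NFC", text)
    text = text.translate(PUNCT_MAP)
    return re.sub(r"\s+", " ", text).strip()

def glossary_hits(text: str) -> dict[str, str]:
    """Report alternate spellings found in text, mapped to their canonical form."""
    words = re.findall(r"[\w']+", text.lower())
    return {w: GLOSSARY[w] for w in words if w in GLOSSARY}
```

`glossary_hits` only reports variants and leaves the transcript unchanged. This fits "track" rather than silently rewriting what the speaker said.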