# 📚 Dataset Guidelines
## 🏷️ Minimum metadata
- Speaker ID (anonymized)
- Approximate age band
- Gender (optional/self-declared)
- Dialect/region
- Recording environment and device class
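The minimum metadata above could be captured as one record per recording. The following is a minimal sketch, not a prescribed schema: the class name, field names, and the `AGE_BANDS` values are illustrative assumptions that a project would replace with its own agreed conventions.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical age bands; replace with whatever bands the project agrees on.
AGE_BANDS = {"under-18", "18-29", "30-44", "45-59", "60+"}


@dataclass(frozen=True)
class RecordingMetadata:
    speaker_id: str               # anonymized ID, never a real name
    age_band: str                 # approximate band, not an exact age
    dialect_region: str
    environment: str              # e.g. "quiet room", "street"
    device_class: str             # e.g. "smartphone", "headset"
    gender: Optional[str] = None  # optional and self-declared


def validate(meta: RecordingMetadata) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name in ("speaker_id", "dialect_region", "environment", "device_class"):
        if not getattr(meta, name).strip():
            problems.append(f"{name} is required")
    if meta.age_band not in AGE_BANDS:
        problems.append(f"age_band must be one of {sorted(AGE_BANDS)}")
    return problems
```

Keeping `gender` as the only field with a default mirrors the list: it is the one item marked optional, so a record is valid without it.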
## 🎧 Audio quality basics
- Prefer clean speech sampled at 16 kHz or higher
- Avoid clipping and heavy background noise
- Keep each transcript aligned with the spoken content
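Two of the audio checks above (sample rate and clipping) can be automated with the Python standard library. This is a rough sketch that assumes 16-bit PCM WAV input; the function name `check_wav` and the `MAX_CLIPPED_FRACTION` tolerance are illustrative assumptions, and background noise and transcript alignment are not checked here.

```python
import array
import sys
import wave

MIN_SAMPLE_RATE = 16_000      # minimum sample rate, in Hz
CLIP_LEVEL = 32_767           # full scale for 16-bit PCM
MAX_CLIPPED_FRACTION = 0.001  # hypothetical tolerance, tune per project


def check_wav(path: str) -> list[str]:
    """Return a list of quality problems found in a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            return ["only 16-bit PCM is handled by this sketch"]
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    problems = []
    if rate < MIN_SAMPLE_RATE:
        problems.append(f"sample rate {rate} Hz is below {MIN_SAMPLE_RATE} Hz")
    if samples:
        # Count samples sitting at (or beyond) full scale.
        clipped = sum(1 for s in samples if abs(s) >= CLIP_LEVEL)
        if clipped / len(samples) > MAX_CLIPPED_FRACTION:
            problems.append(f"{clipped} of {len(samples)} samples are clipped")
    return problems
```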
## ✍️ Text policy
- Use the agreed normalization rules
- Keep punctuation consistent
- Track alternate spellings in a glossary
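As an illustration of the text policy above, a normalization pass might look like the sketch below. The specific rules (NFC normalization, collapsing whitespace, straightening curly quotes) and the `GLOSSARY` entries are assumptions chosen for the example, not the project's agreed rules.

```python
import re
import unicodedata

# Hypothetical glossary: alternate spelling -> agreed canonical spelling.
GLOSSARY = {"colour": "color", "e-mail": "email"}

# Map curly quotes to straight ones so punctuation stays consistent.
QUOTES = str.maketrans({"“": '"', "”": '"', "‘": "'", "’": "'"})


def normalize(text: str, glossary: dict[str, str] = GLOSSARY) -> str:
    """Apply example normalization rules and glossary substitutions."""
    text = unicodedata.normalize("NFC", text)
    text = re.sub(r"\s+", " ", text).strip()
    text = text.translate(QUOTES)

    def swap(match: re.Match) -> str:
        word = match.group(0)
        # Lookup is case-insensitive; the output uses the glossary's casing.
        return glossary.get(word.lower(), word)

    return re.sub(r"[\w-]+", swap, text)
```

Keeping the glossary as plain data means new alternate spellings can be recorded without touching the normalization code.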