# 📘 Project Purpose

## ❓ Why this project exists
Pashto remains underrepresented in open AI speech/language resources. This project exists to close that gap through community collaboration.

## 🌟 Mission
Create high-quality open resources that enable Pashto to work reliably in:
- Speech recognition (ASR)
- Text-to-speech (TTS)
- Translation and NLP tooling

## ✅ What success looks like
- Public Pashto datasets with clear quality standards
- Reproducible baseline models and training pipelines
- Public benchmark/leaderboard for fair model comparison
- Open desktop/API demos that real users can run

## 🕊️ Non-commercial commitment
This initiative is community-first and public-benefit oriented. The project is not being built for proprietary lock-in or short-term commercialization.

## 🧭 Principles
- Openness: data/model/process transparency
- Inclusivity: dialect and accent diversity
- Quality: strong labeling/review standards
- Reproducibility: scripts, configs, and documented experiments
- Continuity: release cadence and long-term maintenance

## 📦 Scope (v1 foundation)
- Build core repository and contributor workflows
- Launch Pashto data collection and validation pipeline
- Publish ASR and TTS baselines
- Publish first benchmark set and metrics
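
As an illustration of what a benchmark metric could look like, here is a minimal sketch of word error rate (WER), a common ASR metric. This document does not name the project's metrics, so the choice of WER, the function name `word_error_rate`, and the whitespace tokenization are all assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace. A real Pashto benchmark
    would likely normalize text (punctuation, character variants) first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder tokens stand in for a Pashto transcript.
    print(word_error_rate("a b c d", "a x c"))
```

Because the score is normalized by reference length, it can exceed 1.0 when the hypothesis contains many inserted words, which is worth stating in any published leaderboard.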

## 🚫 Out of scope (for now)
- Closed, paid APIs as the only access path
- Private datasets without reproducible provenance
- Productization before core language quality is established