
# 📘 Project Purpose

## ❓ Why this project exists
Pashto remains underrepresented in open speech and language resources for AI. This project exists to close that gap through community collaboration.

## 🌟 Mission
Create high-quality open resources that enable Pashto to work reliably in:
- Speech recognition (ASR)
- Text-to-speech (TTS)
- Translation and NLP tooling

## ✅ What success looks like
- Public Pashto datasets with clear quality standards
- Reproducible baseline models and training pipelines
- Public benchmark/leaderboard for fair model comparison
- Open desktop/API demos that real users can run

## 🕊️ Non-commercial commitment
This initiative is community-first and oriented toward public benefit. It is not being built for proprietary lock-in or short-term commercialization.

## 🧭 Principles
- Openness: transparent data, models, and processes
- Inclusivity: dialect and accent diversity
- Quality: strong labeling and review standards
- Reproducibility: scripts, configs, and documented experiments
- Continuity: release cadence and long-term maintenance

## 📦 Scope (v1 foundation)
- Build core repository and contributor workflows
- Launch Pashto data collection and validation pipeline
- Publish ASR and TTS baselines
- Publish first benchmark set and metrics

## 🚫 Out of scope (for now)
- Closed, paid APIs as the only way to use the project's resources
- Private datasets without reproducible provenance
- Productization before core language quality is established