---
license: other
datasets:
- Remeinium/CleanSinhalaTextCorpus
language:
- si
pipeline_tag: feature-extraction
library_name: fasttext
tags:
- sinhala
- fasttext
- vectors
- embedding
- nlp
- low-resource-languages
- remeinium
---

# UgannA Siyabasa V2 — FastText Sinhala Embedding Model 🇱🇰

> UgannA_SiyabasaV2 (උගන්නැ සියබස) is the first public FastText embedding model released by Remeinium Corp. The name comes from Kumaratunga Munidasa’s timeless quote:
> “උගන්නැ සියබස – මත් වන්නැ එහි රසයෙන්” (“Learn Sinhala – be intoxicated with its beauty.”)

Just as Munidasa envisioned nurturing the Sinhala language, this model is a step toward teaching it to machines.

# 📌 Key Features
- Type: FastText
- Vector size: 300 dimensions
- File size: ~3.94 GB
- Training data: ~17 GB of processed Sinhala text

# 🔧 Usage
```python
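import fasttext

# Load the model from a local path to the downloaded .bin file.
# fasttext.load_model() reads from disk; it does not fetch from the Hugging Face Hub.
model = fasttext.load_model("Remeinium/UgannA_Siyabasa/UgannA_Siyabasa.bin")

# Get the 300-dimensional vector for a word
vector = model.get_word_vector("අම්මා")

# Get the 10 nearest neighbors as (similarity, word) tuples
neighbors = model.get_nearest_neighbors("අම්මා", k=10)
print(neighbors)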
```
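
To compare two specific words rather than list neighbors, one option is cosine similarity over their vectors. The following is a minimal sketch, not part of the released tooling: the helper names `cosine_similarity` and `word_similarity` are hypothetical, and `model` is assumed to be a fastText model loaded as in the snippet above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (lists or NumPy arrays)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def word_similarity(model, word_a, word_b):
    """Compare two words using vectors from a loaded fastText model (hypothetical helper)."""
    return cosine_similarity(
        model.get_word_vector(word_a),
        model.get_word_vector(word_b),
    )
```

With a loaded model, a call such as `word_similarity(model, "අම්මා", "තාත්තා")` returns a value between -1 and 1. Because fastText `.bin` models compose vectors from character n-grams, `get_word_vector` also returns a vector for words that did not appear in the training data.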

# Use the API
- Test live: [Embedding Playground](https://huggingface.co/spaces/Remeinium/Embedding_Siyabasa)
- API docs: [Go to API Console](https://esdocs.ai.remeinium.com)

# 📂 Training Data
- Processed and cleaned training corpus: ~17 GB
- Preprocessing: tokenization, normalization, deduplication
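
The exact preprocessing pipeline is not documented here. As an illustration only, the three steps listed above could look like the sketch below. It assumes Unicode NFC normalization, whitespace tokenization, and exact line-level deduplication; the function names are hypothetical, and the actual corpus may have been prepared differently.

```python
import unicodedata


def normalize_line(line: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", line)
    return " ".join(text.split())


def tokenize(line: str) -> list[str]:
    """Split a normalized line on whitespace (a deliberately simple tokenizer)."""
    return line.split()


def deduplicate(lines):
    """Yield each non-empty normalized line once, preserving first-seen order."""
    seen = set()
    for line in lines:
        cleaned = normalize_line(line)
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            yield cleaned


# The first two lines differ only in spacing, so they collapse to one entry.
raw = ["අම්මා  ගෙදර   ඉන්නවා", "අම්මා ගෙදර ඉන්නවා", "", "තාත්තා වැඩට ගියා"]
corpus = list(deduplicate(raw))
tokens = [tokenize(line) for line in corpus]
```

Whitespace splitting leaves the zero-width joiner (U+200D) untouched, which matters for Sinhala because that character takes part in forming conjunct letters.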

# 🗜️ License
This model is released under the **[Remeinium Open Model License (ROML)](https://huggingface.co/Remeinium/UgannA_SiyabasaV2/blob/main/LICENCE)**.  
It permits research and commercial use with attribution.  
See the LICENSE file for full terms.

# ⚠️ Limitations
- May reflect cultural and linguistic biases present in the training sources.
- Optimized for Sinhala; not multilingual.

# 🤝 Collaboration
You are welcome to:
- Use this model for research and your own projects
- Share improvements, benchmarks, or downstream applications
- Contact us: 📧 support@remeinium.com