small-100 Singlish → Sinhala (ONNX INT8)
Fine-tuned and quantized small-100 seq2seq model for translating Roman-script Sinhala (Singlish) mixed with English into pure Sinhala Unicode script. Exported to ONNX and quantized to INT8 for fast CPU inference.
What it does
Takes informal Singlish text typed by Sri Lankans on social media and converts it to Sinhala script. Where a common Sinhala equivalent exists, English words are translated to their Sinhala meaning rather than transliterated phonetically.
| Input (Singlish) | Output (Sinhala) |
|---|---|
| mama hungry, kanna yamu | මම බඩගිනියි, කෑම කමු |
| bro API eka call karanna puluwanda | බ්රෝ API එක කෝල් කරන්න පුළුවන්ද |
| today weather eka honda ne | අද කාලගුණය හොඳයි නේ |
| mage salary eka late una | මගේ වැටුප ප්රමාද වුණා |
| traffic eka godak awul | වාහන තදබදය ගොඩක් අවුල් |
Training details
- Base model: alirezamsh/small100 (encoder-decoder seq2seq)
- Fine-tuning: LoRA (r=64, alpha=128) on combined Singlish-Sinhala dataset
- Training data:
  - Real social media code-mix: ~26,700 rows × 4 upsample
  - Phonetic transliteration replay: 20,000 rows
  - Ad hoc pairs replay: 5,000 rows
  - Total: ~211,000 training samples
- Evaluation: Golden test set of 2,549 real Singlish sentences
- Best character error rate (CER): ~13.8% on the golden test set
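For reference, character error rate is commonly computed as the character-level edit distance between the model output and the reference, divided by the reference length. The sketch below assumes that definition and counts Unicode code points, so a Sinhala vowel sign counts as its own character. The card does not state which implementation or text normalization produced the reported figure, and the helper names `edit_distance` and `cer` are illustrative only.

```python
def edit_distance(hyp: str, ref: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(
                prev[j] + 1,         # drop a character from the hypothesis
                curr[j - 1] + 1,     # insert a missing reference character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(hypotheses: list[str], references: list[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    total_edits = sum(edit_distance(h, r) for h, r in zip(hypotheses, references))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars if total_chars else 0.0
```

Averaging per-sentence CER instead of pooling edits over the corpus gives a different number, so the two should not be compared directly.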
Export details
- Exported from merged LoRA fp32 model to ONNX
- INT8 dynamic quantization using AVX512 VNNI kernels via optimum
- Each ONNX file quantized separately: encoder, decoder, decoder_merged
- Greedy decoding (num_beams=1) recommended for CPU inference
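The quantization step described above can be sketched roughly as follows. This is not the author's actual script: the ONNX file names are assumed to be the defaults optimum writes when exporting a seq2seq model, the function name `quantize_export` is hypothetical, and all quantization options other than `is_static=False` are left at their library defaults.

```python
# Assumed file names (optimum's usual seq2seq export layout); adjust if yours differ.
ONNX_FILES = ("encoder_model.onnx", "decoder_model.onnx", "decoder_model_merged.onnx")


def quantize_export(onnx_dir: str, out_dir: str) -> None:
    """Dynamically quantize each exported ONNX file to INT8, one file at a time."""
    # Imported lazily so this module loads even where optimum is not installed.
    from optimum.onnxruntime import ORTQuantizer
    from optimum.onnxruntime.configuration import AutoQuantizationConfig

    # is_static=False selects dynamic quantization, which needs no calibration data.
    qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False)
    for name in ONNX_FILES:
        quantizer = ORTQuantizer.from_pretrained(onnx_dir, file_name=name)
        quantizer.quantize(save_dir=out_dir, quantization_config=qconfig)
```

Calling `quantize_export("onnx_fp32", "onnx_int8")` with two hypothetical directory names would process the three files in turn. The tokenizer and config files still need to be copied into the output directory separately.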
Installation
pip install "optimum[onnxruntime]" transformers sentencepiece
Usage
Basic inference
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer
model_id = "savinugunarathna/small-100-Singlish-Sinhala-CodeMix2-ONNX-INT8"
model = ORTModelForSeq2SeqLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.src_lang = "en"
tgt_lang_id = tokenizer.lang_code_to_id["si"]
def translate(text):
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=128,
    )
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        num_beams=1,  # greedy decoding, fastest on CPU
        forced_bos_token_id=tgt_lang_id,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
print(translate("mama hungry, kanna yamu"))
# → මම බඩගිනියි, කෑම කමු
Batch inference
texts = [
    "mama hungry, kanna yamu",
    "bro API eka call karanna puluwanda",
    "today weather eka honda ne",
]
inputs = tokenizer(
    texts,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=128,
)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    num_beams=1,
    forced_bos_token_id=tgt_lang_id,
)
for out in outputs:
    print(tokenizer.decode(out, skip_special_tokens=True).strip())
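Padding a very long list into a single tensor can use a lot of memory, so larger inputs are usually processed in fixed-size chunks. A minimal sketch, assuming a hypothetical `chunked` helper and an arbitrary default chunk size of 16. Here `translate_batch` stands for any function that wraps the tokenizer and `model.generate` calls above; it is not something this repository provides.

```python
from typing import Callable, Iterator


def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_all(
    texts: list[str],
    translate_batch: Callable[[list[str]], list[str]],
    size: int = 16,
) -> list[str]:
    """Run `translate_batch` chunk by chunk, preserving the input order."""
    results: list[str] = []
    for chunk in chunked(texts, size):
        results.extend(translate_batch(chunk))
    return results
```

A suitable chunk size depends on sentence length and available memory, so 16 is only a starting point.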
FastAPI server example
from fastapi import FastAPI
from pydantic import BaseModel
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer
app = FastAPI()
model_id = "savinugunarathna/small-100-Singlish-Sinhala-CodeMix2-ONNX-INT8"
model = ORTModelForSeq2SeqLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.src_lang = "en"
tgt_lang_id = tokenizer.lang_code_to_id["si"]
class Request(BaseModel):
    text: str

@app.post("/translate")
def translate(req: Request):
    inputs = tokenizer(req.text, return_tensors="pt", truncation=True, max_length=128)
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        num_beams=3,  # beam search; set num_beams=1 for the fastest CPU inference
        forced_bos_token_id=tgt_lang_id,
    )
    result = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
    return {"input": req.text, "output": result}
Run with:
uvicorn app:app --host 0.0.0.0 --port 8000
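Once the server is running, the endpoint can be called from any HTTP client. The sketch below uses only the Python standard library and assumes the server is reachable at `http://localhost:8000`; the helper names `build_request` and `translate_remote` are illustrative.

```python
import json
import urllib.request


def build_request(text: str, url: str = "http://localhost:8000/translate") -> urllib.request.Request:
    """Build a JSON POST request whose body matches the server's request model."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate_remote(text: str) -> str:
    """Send the request and return the `output` field of the JSON response."""
    with urllib.request.urlopen(build_request(text)) as resp:
        return json.loads(resp.read().decode("utf-8"))["output"]
```

`translate_remote("mama hungry, kanna yamu")` would return the translated string, provided the server is up.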
Performance
Benchmarked on CPU (single thread, max_new_tokens=64); the decoding strategy is listed per row:
| Setup | Latency per sentence |
|---|---|
| PyTorch fp32 + beam=3 | ~3-5 seconds |
| ONNX INT8 + greedy | ~200-400ms |
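To reproduce numbers of this kind on your own hardware, a simple timing harness is enough. The sketch below is an assumption about how such a measurement could be done, not the script behind the table: `measure_latency` is a hypothetical helper, `fn` is any single-sentence translation function such as `translate` above, and thread settings are left to the caller.

```python
import statistics
import time
from typing import Callable


def measure_latency(
    fn: Callable[[str], str],
    sentences: list[str],
    warmup: int = 3,
) -> dict[str, float]:
    """Time `fn` on each sentence and return latency statistics in milliseconds."""
    if not sentences:
        raise ValueError("need at least one sentence to benchmark")
    for sentence in sentences[:warmup]:
        fn(sentence)  # warm-up calls are not timed
    timings_ms = []
    for sentence in sentences:
        start = time.perf_counter()
        fn(sentence)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }
```

Latency grows with output length, so results are only comparable when both setups are measured on the same sentences.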
Limitations
- Optimized for short conversational sentences (5-20 words)
- Very long sentences (50+ words) may see quality degradation
- Rare English words without common Sinhala equivalents may be transliterated rather than translated
- Conjunct consonants (e.g. ශ්‍රී, ක්‍රිකට්) may occasionally render without the zero-width joiner (ZWJ), depending on the font
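One way to spot the conjunct issue in model output is to look for a virama (U+0DCA) followed directly by ර or ය with no zero-width joiner (U+200D) in between, since the rakaransaya and yansaya forms are encoded as virama + ZWJ + letter. The sketch below only flags candidates and is a heuristic: some words legitimately keep the two letters separate, so hits should be reviewed rather than rewritten automatically. The function name is hypothetical.

```python
AL_LAKUNA = "\u0DCA"  # Sinhala virama (hal kirima)
ZWJ = "\u200D"        # zero-width joiner
RAYANNA = "\u0DBB"    # ර, forms rakaransaya after virama + ZWJ
YAYANNA = "\u0DBA"    # ය, forms yansaya after virama + ZWJ


def find_unjoined_conjuncts(text: str) -> list[int]:
    """Return indices of viramas directly followed by ර or ය (no ZWJ between)."""
    hits = []
    for i in range(len(text) - 1):
        if text[i] == AL_LAKUNA and text[i + 1] in (RAYANNA, YAYANNA):
            hits.append(i)
    return hits
```

An empty list means no candidates were found; it does not guarantee that the text renders correctly in every font.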
Related models
- small-100-Singlish-Sinhala-CodeMix2 — full fp32 PyTorch version of this model
- Small100-Singlish-Sinhala-Merged — base merged model before code-mix fine-tuning