small-100 Singlish → Sinhala (ONNX INT8)

A fine-tuned small-100 seq2seq model that translates Roman-script Sinhala (Singlish) mixed with English into Sinhala Unicode script. The model is exported to ONNX and quantized to INT8 for fast CPU inference.

What it does

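Takes informal Singlish text, as typed by Sri Lankans on social media, and converts it to Sinhala script. English words are translated to their Sinhala meaning rather than transliterated phonetically, although slang and technical terms may be transliterated or kept as-is (see "bro", "API", and "call" in the examples).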

Input (Singlish)	Output (Sinhala)
mama hungry, kanna yamu	මම බඩගිනියි, කෑම කමු
bro API eka call karanna puluwanda	බ්‍රෝ API එක කෝල් කරන්න පුළුවන්ද
today weather eka honda ne	අද කාලගුණය හොඳයි නේ
mage salary eka late una	මගේ වැටුප ප්‍රමාද වුණා
traffic eka godak awul	වාහන තදබදය ගොඩක් අවුල්

Training details

Base model: alirezamsh/small100 (encoder-decoder seq2seq)
Fine-tuning: LoRA (r=64, alpha=128) on a combined Singlish-Sinhala dataset
Training data:
- Real social media code-mix: ~26,700 rows, upsampled 4×
- Phonetic transliteration replay: 20,000 rows
- Adhoc pairs replay: 5,000 rows
- Total: ~211,000 training samples
Evaluation: Golden test set of 2,549 real Singlish sentences
Best CER: ~13.8% on golden test set
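
The CER figure above is a character error rate. The following is a minimal sketch of how such a score is commonly computed, assuming the standard definition (character-level Levenshtein distance divided by reference length, aggregated over the whole test set). The exact normalization and aggregation used for the golden test set are not stated here, and the names `edit_distance` and `cer` are illustrative.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            ))
        prev = curr
    return prev[-1]


def cer(references: list[str], hypotheses: list[str]) -> float:
    """Corpus-level CER: total edits divided by total reference characters."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars if total_chars else 0.0
```

Note that `len()` counts Unicode code points, so a Sinhala letter with a vowel sign counts as two characters; a grapheme-based CER would give different numbers.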

Export details

Exported from merged LoRA fp32 model to ONNX
INT8 dynamic quantization using AVX512 VNNI kernels via optimum
Each ONNX file quantized separately: encoder, decoder, decoder_merged
Greedy decoding (num_beams=1) recommended for CPU inference
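
The per-file quantization described above can be sketched with optimum's `ORTQuantizer` and `AutoQuantizationConfig.avx512_vnni`. This is a sketch under assumptions, not the exact script used for this model: the ONNX file names are the ones optimum's seq2seq export typically writes, `per_channel=False` is a guess, and `quantize_all` and its directory arguments are hypothetical.

```python
from pathlib import Path

# Assumed file names (typical optimum seq2seq export); adjust to your export.
ONNX_FILES = ["encoder_model.onnx", "decoder_model.onnx", "decoder_model_merged.onnx"]


def quantize_all(src_dir: str, dst_dir: str) -> list[Path]:
    """Apply dynamic INT8 quantization to each ONNX file separately."""
    # Imported lazily so this module loads even without optimum installed.
    from optimum.onnxruntime import ORTQuantizer
    from optimum.onnxruntime.configuration import AutoQuantizationConfig

    # is_static=False selects dynamic quantization (no calibration data needed).
    # per_channel=False is an assumption, not a documented setting of this model.
    qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)

    for name in ONNX_FILES:
        quantizer = ORTQuantizer.from_pretrained(src_dir, file_name=name)
        quantizer.quantize(save_dir=dst_dir, quantization_config=qconfig)

    # Return whatever ONNX files ended up in the output directory.
    return sorted(Path(dst_dir).glob("*.onnx"))
```

Quantizing each file on its own keeps the encoder and decoder graphs independent, so a single file can be re-quantized with different settings if it turns out to hurt quality.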

Installation

pip install "optimum[onnxruntime]" transformers sentencepiece

Usage

Basic inference

from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer

model_id = "savinugunarathna/small-100-Singlish-Sinhala-CodeMix2-ONNX-INT8"

model    = ORTModelForSeq2SeqLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.src_lang = "en"                      # source language tag for the input text
tgt_lang_id = tokenizer.lang_code_to_id["si"]  # token ID of the Sinhala target-language tag

def translate(text):
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=128,
    )
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        num_beams=1,           # greedy — fastest on CPU
        forced_bos_token_id=tgt_lang_id,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True).strip()

print(translate("mama hungry, kanna yamu"))
# → මම බඩගිනියි, කෑම කමු

Batch inference

texts = [
    "mama hungry, kanna yamu",
    "bro API eka call karanna puluwanda",
    "today weather eka honda ne",
]

inputs = tokenizer(
    texts,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=128,
)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    num_beams=1,
    forced_bos_token_id=tgt_lang_id,
)
for out in outputs:
    print(tokenizer.decode(out, skip_special_tokens=True).strip())
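
For inputs larger than the three sentences above, splitting the list into fixed-size batches keeps padding and memory use bounded. The helper below is a generic sketch: `batched`, `translate_many`, and the default `batch_size=16` are illustrative choices, and `translate_batch` stands for any function that wraps the tokenizer and `model.generate` calls shown above.

```python
from typing import Callable, Iterator


def batched(items: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def translate_many(
    texts: list[str],
    translate_batch: Callable[[list[str]], list[str]],
    batch_size: int = 16,
) -> list[str]:
    """Run `translate_batch` over `texts` in chunks, preserving input order."""
    results: list[str] = []
    for chunk in batched(texts, batch_size):
        results.extend(translate_batch(chunk))
    return results
```

Sorting the texts by length before batching can reduce padding further, at the cost of having to restore the original order afterwards.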

FastAPI server example

from fastapi import FastAPI
from pydantic import BaseModel
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer

app       = FastAPI()
model_id  = "savinugunarathna/small-100-Singlish-Sinhala-CodeMix2-ONNX-INT8"
model     = ORTModelForSeq2SeqLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.src_lang  = "en"
tgt_lang_id = tokenizer.lang_code_to_id["si"]

class Request(BaseModel):
    text: str

@app.post("/translate")
def translate(req: Request):
    inputs = tokenizer(req.text, return_tensors="pt", truncation=True, max_length=128)
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        num_beams=3,           # beam search; set num_beams=1 (greedy) for lowest CPU latency
        forced_bos_token_id=tgt_lang_id,
    )
    result = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
    return {"input": req.text, "output": result}

Run with:

uvicorn app:app --host 0.0.0.0 --port 8000
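
Once the server is running, the endpoint can be called from any HTTP client. Below is a minimal standard-library sketch, assuming the server is reachable at `http://localhost:8000` and returns the `{"input": ..., "output": ...}` JSON shown above; `build_translate_request` and `translate_remote` are hypothetical helper names.

```python
import json
import urllib.request


def build_translate_request(
    text: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a POST request for the /translate endpoint with a JSON body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/translate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate_remote(text: str, base_url: str = "http://localhost:8000") -> str:
    """Send the request and return the "output" field of the JSON response."""
    request = build_translate_request(text, base_url)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))["output"]
```

Calling `translate_remote("mama hungry, kanna yamu")` requires the server to be up; building the request separately makes the payload easy to inspect without a network call.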

Performance

Benchmarked on CPU (single thread, max_new_tokens=64), with the decoding strategy shown in each row:

Setup	Latency per sentence
PyTorch fp32 + beam=3	~3-5 s
ONNX INT8 + greedy	~200-400 ms
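
To reproduce per-sentence latency numbers on your own hardware, a small timing harness is enough. This is a generic sketch and not the script behind the figures above: `measure_latency_ms` is a hypothetical helper, the warm-up count is an arbitrary choice, and `fn` would be a function such as the `translate` helper from the usage section.

```python
import statistics
import time
from typing import Callable, Sequence


def measure_latency_ms(
    fn: Callable[[str], object],
    sentences: Sequence[str],
    warmup: int = 3,
) -> dict[str, float]:
    """Time `fn` on each sentence and report mean and median latency in ms."""
    if not sentences:
        raise ValueError("sentences must not be empty")

    # Warm-up calls are excluded so one-time setup costs do not skew results.
    for sentence in sentences[:warmup]:
        fn(sentence)

    timings = []
    for sentence in sentences:
        start = time.perf_counter()
        fn(sentence)
        timings.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean_ms": statistics.mean(timings),
        "median_ms": statistics.median(timings),
    }
```

The median is reported alongside the mean because a few slow outliers (for example, unusually long sentences) can pull the mean up noticeably.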

Limitations

Optimized for short conversational sentences (5-20 words)
Very long sentences (50+ words) may see quality degradation
Rare English words without common Sinhala equivalents may be transliterated rather than translated
Conjunct consonants (e.g. ශ්‍රී, ක්‍රිකට්) may occasionally render without the zero-width joiner (ZWJ), depending on the font
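
To check whether the conjunct issue above comes from the text rather than the font, the output can be scanned for a virama followed directly by ර with no zero-width joiner in between. This is a heuristic sketch that assumes the usual Unicode encoding of the ර conjunct (consonant + U+0DCA + U+200D + U+0DBB). It only covers that one conjunct form, a match is a candidate for inspection rather than a confirmed error, and `count_unjoined_ra` is a hypothetical helper name.

```python
import re

AL_LAKUNA = "\u0DCA"  # Sinhala virama (hal kirima)
ZWJ = "\u200D"        # zero-width joiner
RA = "\u0DBB"         # Sinhala letter ra

# Matches a virama that is immediately followed by ra, i.e. with no ZWJ between.
_UNJOINED_RA = re.compile(f"{AL_LAKUNA}(?={RA})")


def count_unjoined_ra(text: str) -> int:
    """Count virama + ra sequences that lack the joining ZWJ."""
    return len(_UNJOINED_RA.findall(text))
```

A non-zero count on model output suggests the joiner was lost in the text itself, which no font choice will repair.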

Related models

small-100-Singlish-Sinhala-CodeMix2 — full fp32 PyTorch version of this model
Small100-Singlish-Sinhala-Merged — base merged model before code-mix fine-tuning
