small-100 Singlish → Sinhala (ONNX INT8)

A fine-tuned, quantized small-100 seq2seq model that translates Roman-script Sinhala (Singlish) mixed with English into Sinhala Unicode script. The model is exported to ONNX and quantized to INT8 for fast CPU inference.


What it does

Takes informal Singlish text, as typed by Sri Lankans on social media, and converts it to Sinhala script. English words are translated to their Sinhala meaning rather than transliterated phonetically.

Input (Singlish) | Output (Sinhala)
mama hungry, kanna yamu | මම බඩගිනියි, කෑම කමු
bro API eka call karanna puluwanda | බ්‍රෝ API එක කෝල් කරන්න පුළුවන්ද
today weather eka honda ne | අද කාලගුණය හොඳයි නේ
mage salary eka late una | මගේ වැටුප ප්‍රමාද වුණා
traffic eka godak awul | වාහන තදබදය ගොඩක් අවුල්
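
A quick sanity check on model output is to measure how much of it is actually in Sinhala script. The sketch below is not part of the model or its tooling; `sinhala_ratio` is a hypothetical helper that assumes the Sinhala Unicode block (U+0D80 to U+0DFF) is the only target script.

```python
SINHALA_START, SINHALA_END = 0x0D80, 0x0DFF  # Sinhala Unicode block


def is_sinhala(ch: str) -> bool:
    """True if the character lies in the Sinhala Unicode block."""
    return SINHALA_START <= ord(ch) <= SINHALA_END


def sinhala_ratio(text: str) -> float:
    """Fraction of letters and Sinhala signs that are in the Sinhala block.

    Whitespace, digits, punctuation, and zero-width joiners are ignored.
    """
    counted = [ch for ch in text if ch.isalpha() or is_sinhala(ch)]
    if not counted:
        return 0.0
    return sum(is_sinhala(ch) for ch in counted) / len(counted)
```

Output that keeps an English token, such as "API" in the second example row, scores below 1.0, so any threshold should be chosen with that in mind.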

Training details

  • Base model: alirezamsh/small100 (encoder-decoder seq2seq)
  • Fine-tuning: LoRA (r=64, alpha=128) on a combined Singlish-Sinhala dataset
  • Training data:
    • Real social media code-mix: ~26,700 rows × 4 upsample
    • Phonetic transliteration replay: 20,000 rows
    • Adhoc pairs replay: 5,000 rows
    • Total: ~211,000 training samples
  • Evaluation: Golden test set of 2,549 real Singlish sentences
  • Best CER (character error rate): ~13.8% on the golden test set
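
For readers who want to compute a comparable metric, here is a minimal sketch of character error rate. The card does not say how CER was computed, so this assumes a corpus-level score (total edits divided by total reference characters) over Unicode code points; `levenshtein` and `corpus_cer` are illustrative names, not the author's evaluation code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insert, delete, and substitute each cost 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        previous = current
    return previous[-1]


def corpus_cer(references: list[str], hypotheses: list[str]) -> float:
    """Total character edits divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    ref_chars = sum(len(r) for r in references)
    return edits / ref_chars if ref_chars else 0.0
```

Averaging per-sentence CER instead, or counting grapheme clusters rather than code points, would give different numbers for Sinhala text.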

Export details

  • Exported to ONNX from the fp32 model with the LoRA weights merged in
  • INT8 dynamic quantization via optimum, using its AVX512 VNNI configuration
  • Each ONNX file quantized separately: encoder, decoder, decoder_merged
  • Greedy decoding (num_beams=1) recommended for CPU inference
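
The quantization step above can be sketched roughly as follows with Optimum's `ORTQuantizer`. This is an illustration under assumptions, not the exact script used for this model: the ONNX file names follow Optimum's usual seq2seq export layout, `per_channel=False` is a guess, and `quantize_all` is a hypothetical helper.

```python
from pathlib import Path

# Assumed file names; the card only lists "encoder, decoder, decoder_merged".
ONNX_FILES = (
    "encoder_model.onnx",
    "decoder_model.onnx",
    "decoder_model_merged.onnx",
)


def quantize_all(onnx_dir: str, save_dir: str) -> list[str]:
    """Apply dynamic INT8 quantization to each exported ONNX file in turn."""
    # Imported here so the file list can be inspected without Optimum installed.
    from optimum.onnxruntime import ORTQuantizer
    from optimum.onnxruntime.configuration import AutoQuantizationConfig

    # is_static=False selects dynamic quantization (no calibration data needed).
    # per_channel=False is an assumption; the card does not state it.
    qconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)

    quantized = []
    for file_name in ONNX_FILES:
        if not (Path(onnx_dir) / file_name).exists():
            continue  # skip files this particular export did not produce
        quantizer = ORTQuantizer.from_pretrained(onnx_dir, file_name=file_name)
        quantizer.quantize(save_dir=save_dir, quantization_config=qconfig)
        quantized.append(file_name)
    return quantized
```

Each file gets its own quantizer because `ORTQuantizer` works on one ONNX graph at a time.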

Installation

pip install "optimum[onnxruntime]" transformers sentencepiece

Usage

Basic inference

from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer

model_id = "savinugunarathna/small-100-Singlish-Sinhala-CodeMix2-ONNX-INT8"

model    = ORTModelForSeq2SeqLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.src_lang = "en"
tgt_lang_id = tokenizer.lang_code_to_id["si"]

def translate(text):
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=128,
    )
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
        num_beams=1,           # greedy — fastest on CPU
        forced_bos_token_id=tgt_lang_id,
    )
    return tokenizer.decode(outputs[0], skip_special_tokens=True).strip()

print(translate("mama hungry, kanna yamu"))
# → මම බඩගිනියි, කෑම කමු

Batch inference

# Reuses model, tokenizer, and tgt_lang_id from the basic inference example
texts = [
    "mama hungry, kanna yamu",
    "bro API eka call karanna puluwanda",
    "today weather eka honda ne",
]

inputs = tokenizer(
    texts,
    return_tensors="pt",
    padding=True,
    truncation=True,
    max_length=128,
)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    num_beams=1,
    forced_bos_token_id=tgt_lang_id,
)
for out in outputs:
    print(tokenizer.decode(out, skip_special_tokens=True).strip())

FastAPI server example

from fastapi import FastAPI
from pydantic import BaseModel
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from transformers import AutoTokenizer

app       = FastAPI()
model_id  = "savinugunarathna/small-100-Singlish-Sinhala-CodeMix2-ONNX-INT8"
model     = ORTModelForSeq2SeqLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.src_lang  = "en"
tgt_lang_id = tokenizer.lang_code_to_id["si"]

class TranslateRequest(BaseModel):
    text: str

@app.post("/translate")
def translate(req: TranslateRequest):
    inputs = tokenizer(req.text, return_tensors="pt", truncation=True, max_length=128)
    outputs = model.generate(
        **inputs,
        max_new_tokens=128,
        num_beams=3,           # beam search; set num_beams=1 (greedy) for lowest CPU latency
        forced_bos_token_id=tgt_lang_id,
    )
    result = tokenizer.decode(outputs[0], skip_special_tokens=True).strip()
    return {"input": req.text, "output": result}

Save the code above as app.py, install the server dependencies (pip install fastapi uvicorn), and run with:

uvicorn app:app --host 0.0.0.0 --port 8000
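
Once the server is running, the endpoint can be called from any HTTP client. Below is a minimal standard-library sketch; `build_translate_request` and `translate_remote` are hypothetical helper names, and the base URL assumes the server is reachable on localhost at the port used above.

```python
import json
import urllib.request


def build_translate_request(
    text: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a POST request whose JSON body matches the server's request model."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/translate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def translate_remote(text: str, base_url: str = "http://localhost:8000") -> str:
    """Send the request and return the "output" field of the JSON response."""
    request = build_translate_request(text, base_url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))["output"]
```

With the server up, `translate_remote("mama hungry, kanna yamu")` returns the translated string.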

Performance

Benchmarked on CPU (single thread, max_new_tokens=64), with the decoding strategy listed for each setup:

Setup | Latency per sentence
PyTorch fp32 + beam=3 | ~3-5 seconds
ONNX INT8 + greedy | ~200-400 ms
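
To get comparable numbers on your own hardware, per-sentence latency can be measured with a small timing loop. This is a sketch, not the script behind the figures above; `benchmark` is a hypothetical helper, and the warmup and repeat counts are arbitrary defaults.

```python
import statistics
import time
from typing import Callable, Sequence


def benchmark(
    fn: Callable[[str], object],
    sentences: Sequence[str],
    warmup: int = 2,
    repeats: int = 3,
) -> dict:
    """Time fn on each sentence and summarize the latency in milliseconds."""
    if not sentences:
        raise ValueError("need at least one sentence to benchmark")
    # Warm-up calls are not timed; the first runs often include one-off setup cost.
    for sentence in sentences[:warmup]:
        fn(sentence)
    timings_ms = []
    for _ in range(repeats):
        for sentence in sentences:
            start = time.perf_counter()
            fn(sentence)
            timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "n": len(timings_ms),
        "mean_ms": statistics.mean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }
```

For example, `benchmark(translate, texts)` would time the `translate` function from the basic inference example on the batch example sentences.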

Limitations

  • Optimized for short conversational sentences (5-20 words)
  • Very long sentences (50+ words) may see quality degradation
  • Rare English words without common Sinhala equivalents may be transliterated rather than translated
  • Conjunct consonants (e.g. ශ්‍රී, ක්‍රිකට්) may occasionally render without the zero-width joiner (ZWJ), depending on the font
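
To tell a font problem from a missing joiner in the text itself, the output string can be inspected directly. The sketch below assumes the standard Sinhala encoding, in which rakaransaya and yansaya are written as al-lakuna (U+0DCA) followed by ZWJ (U+200D) and then ර or ය; the helper names are hypothetical.

```python
VIRAMA = "\u0DCA"  # Sinhala al-lakuna (hal kirima)
ZWJ = "\u200D"     # zero-width joiner
RA = "\u0DBB"      # ර, forms rakaransaya after VIRAMA + ZWJ
YA = "\u0DBA"      # ය, forms yansaya after VIRAMA + ZWJ


def count_joined_conjuncts(text: str) -> int:
    """Number of al-lakuna + ZWJ sequences present in the text."""
    return text.count(VIRAMA + ZWJ)


def find_unjoined_candidates(text: str) -> list[int]:
    """Indices of an al-lakuna directly followed by ර or ය with no ZWJ between.

    These are only candidates: a bare al-lakuna before ර or ය can be a
    legitimate spelling, so each hit needs checking against the intended word.
    """
    return [
        i
        for i in range(len(text) - 1)
        if text[i] == VIRAMA and text[i + 1] in (RA, YA)
    ]
```

If `find_unjoined_candidates` returns an empty list for a string that still displays incorrectly, the text is likely fine and the font or renderer is the cause.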
