KATIB 0.8B v0.1 — Arabic OCR Model

KATIB (كاتب) is a fine-tuned Arabic OCR model built on Qwen3.5-0.8B, designed to accurately transcribe Arabic text from images — including printed documents and handwritten content.

Despite having only 0.8B parameters, KATIB outperforms the 2B-parameter Qari-OCR v0.3 on the benchmark below and comes close to Qari-OCR v0.2.2.1, while running at roughly 2× the speed with about half the memory footprint.

✨ Highlights

🏆 Outperforms Qari-OCR v0.3 (2B) on WER, CER, and BLEU
🥈 Competitive with Qari-OCR v0.2.2.1 (2B), a stronger model, at 40% of the parameter count
✍️ Enhanced handwriting support — better generalization to real-world Arabic scripts
⚡ 2× faster inference compared to 2B-parameter alternatives
🪶 Lightweight — deployable on modest hardware

📊 Benchmark Results

Evaluated on an Arabic OCR test set. Lower WER/CER is better; higher BLEU is better.

Model	Size	WER ↓	CER ↓	BLEU ↑
KATIB 0.8B v0.1 (ours)	0.8B	0.2386	0.0648	0.5819
NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct	2B	0.2643	0.0782	0.5520
NAMAA-Space/Qari-OCR-0.2.2.1-VL-2B-Instruct	2B	0.1993	0.0498	0.6402
Qwen/Qwen3.5-0.8B (base, no fine-tune)	0.8B	2.5834	1.9487	0.0256

WER = Word Error Rate | CER = Character Error Rate | BLEU = Bilingual Evaluation Understudy Score
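As a rough illustration of how WER and CER are defined, the sketch below computes both from a Levenshtein edit distance. The card does not say which implementation or text normalization produced the numbers above, so the function names here (`edit_distance`, `wer`, `cer`) are hypothetical and the results may not match the reported scores exactly.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character Error Rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One wrong final letter in the last word of a three-word reference.
    print(wer("كتب الولد الدرس", "كتب الولد الدرش"))
    print(cer("كتب الولد الدرس", "كتب الولد الدرش"))
    # Extra hallucinated words push WER above 1.0, since insertions are
    # counted against a short reference.
    print(wer("مرحبا بالعالم", "مرحبا بالعالم العربي الجميل الكبير"))
```

Because the denominator is the reference length, both metrics can exceed 1.0 when the model emits far more text than the image contains.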

Key Takeaways

KATIB beats Qari v0.3 across all three metrics — despite being 2.5× smaller.
KATIB trails Qari v0.2.2.1 by about 0.04 WER (0.2386 vs. 0.1993), 0.015 CER (0.0648 vs. 0.0498), and 0.058 BLEU (0.5819 vs. 0.6402, roughly 6 points on a 0-100 scale), a strong result for a model of this size.
The base Qwen model without fine-tuning is essentially unusable for Arabic OCR (WER > 2.5, meaning more than 2.5 word errors per reference word), demonstrating the value of domain-specific fine-tuning.
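The takeaways above can be checked with a little arithmetic on the benchmark table. The sketch below recomputes the size ratio and the relative gaps between KATIB and the two Qari models; the dictionary layout and the `relative_change` helper are illustrative names, and the figures are copied directly from the table.

```python
# Scores copied from the benchmark table (size in billions of parameters).
RESULTS = {
    "KATIB 0.8B v0.1": {"size_b": 0.8, "wer": 0.2386, "cer": 0.0648, "bleu": 0.5819},
    "Qari-OCR v0.3": {"size_b": 2.0, "wer": 0.2643, "cer": 0.0782, "bleu": 0.5520},
    "Qari-OCR v0.2.2.1": {"size_b": 2.0, "wer": 0.1993, "cer": 0.0498, "bleu": 0.6402},
}


def relative_change(ours, theirs):
    """Signed change of `ours` relative to `theirs` (negative means lower)."""
    return (ours - theirs) / theirs


if __name__ == "__main__":
    katib = RESULTS["KATIB 0.8B v0.1"]
    for name in ("Qari-OCR v0.3", "Qari-OCR v0.2.2.1"):
        other = RESULTS[name]
        print(name)
        print(f"  size ratio: {other['size_b'] / katib['size_b']:.1f}x")
        for metric in ("wer", "cer", "bleu"):
            change = relative_change(katib[metric], other[metric])
            print(f"  {metric.upper()}: {change:+.1%}")
```

For WER and CER a negative change favors KATIB; for BLEU a positive change does.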

🚀 Quick Start

from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import torch

model_id = "oddadmix/Katib-Qwen3.5-0.8B-0.1"

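# Load the processor (tokenizer + image preprocessing) and the model weights.
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision; see Precision under Training Details
    device_map="auto",          # place weights on the available device (requires accelerate)
)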

# Convert to RGB so grayscale, palette, or RGBA scans are handled uniformly.
image = Image.open("arabic_document.jpg").convert("RGB")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Free OCR"}
        ]
    }
]

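# Render the chat template to a prompt string, then tokenize it together with the image.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

# Generate without tracking gradients; raise max_new_tokens for long or dense pages.
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=512)

# Drop the prompt tokens so only the newly generated transcription is decoded.
result = processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(result)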
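For more than one image, the Quick Start steps can be wrapped in small helpers. This is a minimal sketch, not part of the released code: `build_messages`, `transcribe`, `find_images`, and `IMAGE_SUFFIXES` are hypothetical names, and `transcribe` assumes a `model` and `processor` loaded exactly as in the Quick Start.

```python
from pathlib import Path

# Assumed set of image extensions to pick up; adjust to your data.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"}


def build_messages(image, prompt="Free OCR"):
    """Build the single-turn chat message used in the Quick Start."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": prompt},
            ],
        }
    ]


def transcribe(model, processor, image, max_new_tokens=512):
    """Run one image through the model and return the decoded text."""
    import torch  # imported lazily so the helpers above work without torch

    text = processor.apply_chat_template(
        build_messages(image), tokenize=False, add_generation_prompt=True
    )
    inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Keep only the tokens generated after the prompt.
    new_tokens = output[0][inputs["input_ids"].shape[1]:]
    return processor.decode(new_tokens, skip_special_tokens=True)


def find_images(folder):
    """Return image files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
```

With the model loaded, a folder can then be processed by looping over `find_images("scans")`, opening each path with `Image.open(path).convert("RGB")`, and passing the image to `transcribe`. Images are handled one at a time here to keep memory use predictable on modest hardware.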

🧪 Training Details

Detail	Value
Base Model	Qwen/Qwen3.5-0.8B
Fine-tuning Method	Supervised Fine-Tuning (SFT)
Language	Arabic (Modern Standard Arabic; printed and handwritten)
Task	Optical Character Recognition (OCR)
Precision	float16 / bfloat16
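To put the precision setting in context, the memory needed just to hold the weights can be estimated as parameter count times bytes per parameter. The sketch below uses the nominal sizes from this card (0.8B and 2B); real parameter counts differ somewhat, and activations, the KV cache, and framework overhead are not included, so treat the output as a lower bound. `weight_memory_gb` is a hypothetical helper name.

```python
# Bytes used to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params, dtype="float16"):
    """Approximate weight storage in GB (1 GB = 1e9 bytes), weights only."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for label, params in (("0.8B model", 0.8e9), ("2B model", 2.0e9)):
        for dtype in ("float16", "float32"):
            print(f"{label} in {dtype}: ~{weight_memory_gb(params, dtype):.1f} GB")
```

float16 and bfloat16 both use two bytes per parameter, so switching between them does not change this estimate.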

📋 Intended Use

✅ Arabic document digitization
✅ Handwritten Arabic text recognition
✅ Arabic printed text extraction from images
✅ Low-resource / edge deployment scenarios
❌ Not intended for non-Arabic languages
❌ Not a general-purpose vision-language model

⚠️ Limitations

Performance may degrade on very low-quality or heavily degraded scans.
Accuracy may drop on dialectal Arabic and on mixed-language (Arabic + Latin) text.
Extreme cursive or stylized calligraphy has not been extensively evaluated.

📄 Citation

If you use KATIB in your research or application, please consider citing this model:

@misc{katib2025,
  title     = {KATIB 0.8B v0.1: A Lightweight Arabic OCR Model},
  author    = {oddadmix},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/oddadmix/Katib-Qwen3.5-0.8B-0.1}
}
