KATIB 0.8B v0.1: Arabic OCR Model
KATIB (كاتب) is a fine-tuned Arabic OCR model built on Qwen3.5-0.8B, designed to accurately transcribe Arabic text from images, including printed documents and handwritten content.
Despite having only 0.8B parameters, KATIB outperforms the 2B-parameter Qari-OCR v0.3 on WER, CER, and BLEU and comes close to the stronger Qari-OCR v0.2.2.1, while running at 2× the speed with half the memory footprint.
Highlights
- Outperforms Qari-OCR v0.3 (2B) on WER, CER, and BLEU
- Competitive with Qari-OCR v0.2.2.1 (2B), a stronger model, at less than half the size
- Enhanced handwriting support: better generalization to real-world Arabic scripts
- 2× faster inference compared to 2B-parameter alternatives
- Lightweight: deployable on modest hardware
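The size and memory claims above can be sanity-checked with simple arithmetic. The sketch below estimates weight memory only, assuming the nominal parameter counts (0.8B and 2B) and 2 bytes per parameter for float16/bfloat16; the helper name `weight_memory_gb` is hypothetical, and real usage also includes activations, the KV cache, and framework overhead.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights only, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Nominal parameter counts (assumption: exact counts may differ slightly).
katib_gb = weight_memory_gb(0.8e9)  # 0.8B parameters at 2 bytes each
qari_gb = weight_memory_gb(2.0e9)   # 2B parameters at 2 bytes each

print(f"0.8B model weights: ~{katib_gb:.1f} GB")
print(f"2B model weights:   ~{qari_gb:.1f} GB")
print(f"size ratio: {katib_gb / qari_gb:.2f}")
```

Under these assumptions the weights of the 0.8B model take roughly 40% of the memory of a 2B model.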
Benchmark Results
Evaluated on an Arabic OCR test set. Lower WER/CER is better; higher BLEU is better.
| Model | Size | WER ↓ | CER ↓ | BLEU ↑ |
|---|---|---|---|---|
| KATIB 0.8B v0.1 (ours) | 0.8B | 0.2386 | 0.0648 | 0.5819 |
| NAMAA-Space/Qari-OCR-v0.3-VL-2B-Instruct | 2B | 0.2643 | 0.0782 | 0.5520 |
| NAMAA-Space/Qari-OCR-0.2.2.1-VL-2B-Instruct | 2B | 0.1993 | 0.0498 | 0.6402 |
| Qwen/Qwen3.5-0.8B (base, no fine-tune) | 0.8B | 2.5834 | 1.9487 | 0.0256 |
WER = Word Error Rate | CER = Character Error Rate | BLEU = Bilingual Evaluation Understudy Score
Key Takeaways
- KATIB beats Qari v0.3 across all three metrics, despite being 2.5× smaller.
- KATIB comes close to Qari v0.2.2.1 on WER (0.2386 vs. 0.1993) and CER (0.0648 vs. 0.0498), with a BLEU gap of only about 6 points (0.5819 vs. 0.6402), a strong result for a model of this size.
- The base Qwen model without fine-tuning is essentially unusable for Arabic OCR (WER > 2.5), demonstrating the value of domain-specific fine-tuning.
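A WER above 1 may look surprising, but it happens when a model emits far more words than the reference contains (for example, repeated or hallucinated text), because every extra word counts as an insertion. The sketch below uses the standard edit-distance definitions of WER and CER; it is not the evaluation script behind the table, whose text normalization is not described here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


reference = "السلام عليكم"
repeated = "السلام عليكم عليكم عليكم عليكم"  # three extra words

print(wer(reference, reference))  # 0.0
print(wer(reference, repeated))   # 1.5 (3 insertions / 2 reference words)
print(cer("كتاب", "كتب"))         # 0.25 (1 deletion / 4 reference characters)
```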
Quick Start
```python
from transformers import AutoProcessor, AutoModelForImageTextToText
from PIL import Image
import torch

model_id = "oddadmix/Katib-Qwen3.5-0.8B-0.1"

# Load the processor (tokenizer + image preprocessing) and the model.
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto"
)

image = Image.open("arabic_document.jpg")

# Single-turn chat message: one image plus the OCR instruction.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Free OCR"}
        ]
    }
]

# Render the chat template to a prompt string, then encode it with the image.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the prompt.
result = processor.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(result)
```
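When transcribing several images, the message structure from the Quick Start can be factored into a small helper. This is a minimal sketch: `build_ocr_messages` is a hypothetical name, not part of `transformers`, and the default prompt simply reuses the "Free OCR" instruction shown above.

```python
from typing import Any, Dict, List


def build_ocr_messages(image: Any, prompt: str = "Free OCR") -> List[Dict[str, Any]]:
    """Build a single-turn chat message holding one image and one instruction."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image},
                {"type": "text", "text": prompt},
            ],
        }
    ]


# A placeholder object stands in for a PIL image in this illustration.
placeholder_image = object()
messages = build_ocr_messages(placeholder_image)
print(messages[0]["content"][1]["text"])  # Free OCR
```

The returned list can be passed to `processor.apply_chat_template` exactly as in the Quick Start.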
Training Details
| Detail | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-0.8B |
| Fine-tuning Method | Supervised Fine-Tuning (SFT) |
| Language | Arabic (Modern Standard; printed and handwritten) |
| Task | Optical Character Recognition (OCR) |
| Precision | float16 / bfloat16 |
Intended Use
- ✅ Arabic document digitization
- ✅ Handwritten Arabic text recognition
- ✅ Arabic printed text extraction from images
- ✅ Low-resource / edge deployment scenarios
- ❌ Not intended for non-Arabic languages
- ❌ Not a general-purpose vision-language model
Limitations
- Performance may degrade on very low-quality or heavily degraded scans.
- Dialectal Arabic and mixed-language (Arabic + Latin) text may reduce accuracy.
- Extreme cursive or stylized calligraphy has not been extensively evaluated.
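Because mixed Arabic and Latin text may reduce accuracy, a downstream application might flag transcriptions with a notable share of Latin letters for manual review. The sketch below is one possible heuristic using only the standard library; `script_ratio`, `needs_review`, and the 0.2 threshold are hypothetical choices, not part of the model or its evaluation.

```python
import unicodedata
from typing import Dict


def script_ratio(text: str) -> Dict[str, float]:
    """Share of alphabetic characters that are Arabic, Latin, or other."""
    counts = {"arabic": 0, "latin": 0, "other": 0}
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, spaces, punctuation, and combining marks
        name = unicodedata.name(ch, "")
        if name.startswith("ARABIC"):
            counts["arabic"] += 1
        elif name.startswith("LATIN"):
            counts["latin"] += 1
        else:
            counts["other"] += 1
    total = sum(counts.values()) or 1  # avoid division by zero on empty text
    return {script: n / total for script, n in counts.items()}


def needs_review(text: str, latin_threshold: float = 0.2) -> bool:
    """Flag text whose Latin-letter share exceeds an arbitrary threshold."""
    return script_ratio(text)["latin"] > latin_threshold


print(needs_review("كتاب"))      # False: Arabic letters only
print(needs_review("كتاب PDF"))  # True: 3 of 7 letters are Latin
```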
Citation
If you use KATIB in your research or application, please consider citing this model:
```bibtex
@misc{katib2025,
  title = {KATIB 0.8B v0.1: A Lightweight Arabic OCR Model},
  author = {oddadmix},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/oddadmix/Katib-Qwen3.5-0.8B-0.1}
}
```
Datasets used to train oddadmix/Katib-Qwen3.5-0.8B-0.1 include:
- oddadmix/qari-0.2.2-diacritics-dataset-large
- oddadmix/qari-0.2.2-news-dataset-large