LightOnOCR-2-1B for German (Line-Level)
This model is a fine-tuned version of lightonai/LightOnOCR-2-1B-base, trained specifically for line-level OCR of German shorthand manuscripts.
Model Description
- Base Model: lightonai/LightOnOCR-2-1B-base
- Training Data: medieval-data/german-shorthand-line
- Task: Line-level text transcription from document images
- Language: German (de)
- Architecture: Vision-Language Model (1B parameters)
This is a line-level model: it expects cropped line images as input, not full pages. Each image should contain a single line of text.
Evaluation Results
Evaluated on 50 samples from the test set:
| Metric | Base Model | Finetuned | Improvement |
|---|---|---|---|
| CER (%) | 381.26 | 21.89 | +359.37 |
| WER (%) | 494.99 | 37.41 | +457.58 |
| Perfect Matches | 0 | 0 | +0 |
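Lower CER/WER is better; a higher perfect-match count is better. The Improvement column gives the absolute reduction in percentage points (base minus fine-tuned). Error rates above 100% are possible because edits are counted relative to the reference length, so an output much longer than the reference can need more edits than the reference has characters or words.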
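For readers unfamiliar with these metrics, the following is a minimal sketch of how CER and WER are commonly computed from the Levenshtein edit distance. It is an illustration under assumptions, not the evaluation script used for the table above: the function names are hypothetical, no text normalization is applied, and the reference is assumed to be non-empty.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference item
                curr[j - 1] + 1,     # insert a hypothesis item
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character error rate in percent, relative to the reference length."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def wer(ref, hyp):
    """Word error rate in percent, splitting both strings on whitespace."""
    ref_words = ref.split()
    return 100.0 * edit_distance(ref_words, hyp.split()) / len(ref_words)


# Two extra characters against an 11-character reference.
print(round(cer("verschieden", "verschiedenes"), 2))  # 18.18
# An unrelated, much longer output pushes the rate far above 100%.
print(cer("ab", "12/12/1998"))  # 500.0
```

The second call shows why the base model can score above 100%: every character of the long, unrelated output counts as a substitution or an insertion against a short reference.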
Example Outputs
| # | Ground Truth | Base Model | Finetuned |
|---|---|---|---|
| 1 | (Haupt der seligen Irmeng. gefunden. Im ... | 12/12/1998 10:00 AM 10:00 AM 10:00 AM 10... | (Haupt der seitdem Jänner 12 20 bei Daue... |
| 2 | Schw. Reinh.: Ist vom Lagerdienst freige... | Schw. Reinh. : 2d 9.20 16 09 J. 6 | Schw. Reinh.: Ist vom Lagerdienst frei g... |
| 3 | Klage daß im Naz.heim den Kranken die Ko... | `$$ \begin{aligned} & \text { 22 e 2 haz....` | Klage daß im Naz.heim den Kranken die Ko... |
| 4 | Irene: Stimmung sehr verschieden. Kommen... | Irene: Stimmung sehr verschiedenes. Münd... | |
| 5 | Zwei Schwestern Calabrien: M. Cristina u... | 226 Kolabrie: M. Cisneros, Urode | Zwei Schwestern Katalrien: M. Cristina u... |
✓ = exact match
Usage
Installation
# Requires transformers from source
pip install git+https://github.com/huggingface/transformers
pip install pillow torch
Python Usage
import torch
from transformers import LightOnOcrForConditionalGeneration, LightOnOcrProcessor
from PIL import Image
# Load model and processor
model_id = "wjbmattingly/LightOnOCR-2-1B-german-shorthand-line"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32
processor = LightOnOcrProcessor.from_pretrained(model_id)
model = LightOnOcrForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=dtype,
).to(device)
# Load your line image
image = Image.open("your_image.jpg").convert("RGB")
# Prepare input
messages = [{"role": "user", "content": [{"type": "image"}]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(
    text=[text],
    images=[[image]],
    return_tensors="pt",
    padding=True,
    size={"longest_edge": 700},
).to(device)
inputs["pixel_values"] = inputs["pixel_values"].to(dtype)
# Generate transcription
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode output
input_length = inputs["input_ids"].shape[1]
generated_ids = outputs[0, input_length:]
transcription = processor.decode(generated_ids, skip_special_tokens=True)
print(transcription)
Batch Inference
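# Requires the datasets package (pip install datasets).
# Reuses processor, model, device, and dtype from the Python Usage snippet.
from datasets import load_dataset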
# Load dataset
dataset = load_dataset("medieval-data/german-shorthand-line", split="train[:10]")
# Process batch
images = [[img.convert("RGB")] for img in dataset["image"]]
messages = [{"role": "user", "content": [{"type": "image"}]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
texts = [text] * len(images)
inputs = processor(
    text=texts,
    images=images,
    return_tensors="pt",
    padding=True,
    size={"longest_edge": 700},
).to(device)
inputs["pixel_values"] = inputs["pixel_values"].to(dtype)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
predictions = processor.batch_decode(outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
for pred, gt in zip(predictions, dataset["text"]):
    print(f"Prediction: {pred}")
    print(f"Ground Truth: {gt}")
    print()
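When transcribing more than a handful of lines, passing every image to the processor in one call may exhaust memory. One possible approach, sketched here with a hypothetical `chunks` helper and an arbitrary chunk size, is to split the image list into fixed-size slices and run the batch code once per slice.

```python
def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Ten items split into slices of four; the last slice is shorter.
print(list(chunks(list(range(10)), 4)))
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each slice can then take the place of `images` in the batch snippet, with `texts` rebuilt as `[text] * len(batch)` so the two lists stay the same length.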
Training Details
- Base Model: lightonai/LightOnOCR-2-1B-base
- Training Method: Fine-tuning with frozen language model backbone
- Optimizer: AdamW (fused)
- Learning Rate: 6e-5 with linear decay
- Precision: bfloat16
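As a rough illustration of the learning-rate entry above, a linear decay from the peak rate of 6e-5 can be written as a function of the training step. This is a sketch under assumptions: the card does not state the total number of steps or whether warmup was used, so `total_steps` is a hypothetical parameter and no warmup is modeled.

```python
def linear_decay_lr(step, total_steps, peak_lr=6e-5):
    """Learning rate at `step`, decaying linearly from peak_lr to zero."""
    if total_steps < 1:
        raise ValueError("total_steps must be at least 1")
    # Fraction of training still remaining, clamped at zero past the end.
    remaining = max(0.0, 1.0 - step / total_steps)
    return peak_lr * remaining


# Hypothetical run of 1000 steps: full rate at the start, zero at the end.
for step in (0, 250, 500, 1000):
    print(step, linear_decay_lr(step, total_steps=1000))
```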
Limitations
- This model is trained on line-level images. For full-page transcription, first segment the page into individual lines, then transcribe each line crop separately.
- Performance may vary on document styles not represented in the training data.
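To make the first limitation concrete, the sketch below turns line bounding boxes into cropped line images for this model. It assumes the boxes come from a separate layout or line-segmentation tool, which this model card does not provide. The helper names and the padding value are hypothetical, and `page` is expected to be a `PIL.Image.Image`.

```python
def padded_box(box, page_size, pad=4):
    """Expand (left, top, right, bottom) by `pad` pixels, clamped to the page."""
    left, top, right, bottom = box
    width, height = page_size
    return (
        max(0, left - pad),
        max(0, top - pad),
        min(width, right + pad),
        min(height, bottom + pad),
    )


def crop_lines(page, boxes, pad=4):
    """Return one cropped image per line box, in the order the boxes are given."""
    return [page.crop(padded_box(box, page.size, pad)) for box in boxes]


# A box near the top-right corner is padded but never leaves the page.
print(padded_box((10, 2, 100, 30), page_size=(105, 200), pad=8))
# (2, 0, 105, 38)
```

The resulting crops can be converted with `.convert("RGB")` and passed to the usage code above in place of `your_image.jpg`.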
Citation
If you use this model, please cite:
@misc{lightonocr2_finetuned_2026,
  title = {LightOnOCR Fine-tuned for German},
  author = {William Mattingly},
  year = {2026},
  howpublished = {\url{https://huggingface.co/wjbmattingly/LightOnOCR-2-1B-german-shorthand-line}}
}
And the original LightOnOCR paper:
@misc{lightonocr2_2026,
  title = {LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR},
  author = {Said Taghadouini and Adrien Cavaill\`{e}s and Baptiste Aubertin},
  year = {2026},
  howpublished = {\url{https://arxiv.org/pdf/2601.14251}}
}
Acknowledgments
- LightOn AI for the excellent LightOnOCR base model
- The creators of the medieval-data/german-shorthand-line dataset