---
language:
- de
license: apache-2.0
library_name: transformers
base_model: lightonai/LightOnOCR-2-1B-base
tags:
- ocr
- vision-language
- lightonocr
- document-understanding
- german
- shorthand
- manuscript
- medieval
datasets:
- medieval-data/german-shorthand-line
pipeline_tag: image-text-to-text
---

# LightOnOCR-2-1B for German (Line-Level)

<p align="center">
  <img src="https://huggingface.co/lightonai/LightOnOCR-2-1B-base/resolve/main/lightonocr-banner.png" alt="LightOnOCR Banner" width="600"/>
</p>

This model is a **fine-tuned version of [lightonai/LightOnOCR-2-1B-base](https://huggingface.co/lightonai/LightOnOCR-2-1B-base)**, trained specifically for **line-level OCR** of German shorthand manuscripts.

## Model Description

- **Base Model:** [lightonai/LightOnOCR-2-1B-base](https://huggingface.co/lightonai/LightOnOCR-2-1B-base)
- **Training Data:** [medieval-data/german-shorthand-line](https://huggingface.co/datasets/medieval-data/german-shorthand-line)
- **Task:** Line-level text transcription from document images
- **Language:** German (de)
- **Architecture:** Vision-Language Model (1B parameters)

This is a **line-level model**: it expects cropped line images as input, not full pages. Each image should contain a single line of text.

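## Evaluation Results

Evaluated on 50 samples from the test set:

| Metric | Base Model | **Finetuned** | Improvement |
|--------|------------|---------------|-------------|
| CER (%) | 381.26 | **21.89** | +359.37 |
| WER (%) | 494.99 | **37.41** | +457.58 |
| Perfect Matches | 0 | **0** | +0 |

*Lower CER/WER is better. More perfect matches is better. The Improvement column is the absolute change in the favorable direction (percentage points for CER and WER). Error rates above 100% are possible because every inserted character or word counts as an error, so an output much longer than the reference can contain more errors than the reference has characters or words.*
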
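The card does not state how CER and WER were computed. As a minimal sketch, assuming the common definition (Levenshtein edit distance divided by the reference length, with whitespace tokenization for WER), the following shows how a long unrelated output pushes CER above 100%. The function names `edit_distance`, `cer`, and `wer` are illustrative and not taken from the evaluation script.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: substitutions, deletions, and insertions cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate in percent; assumes a non-empty reference."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def wer(ref: str, hyp: str) -> float:
    """Word error rate in percent; assumes whitespace tokenization."""
    ref_words = ref.split()
    return 100.0 * edit_distance(ref_words, hyp.split()) / len(ref_words)


reference = "frei"
print(cer(reference, "frei"))               # identical strings -> 0.0
print(cer(reference, "frei g"))             # two inserted characters -> 50.0
print(cer(reference, "10:00 AM 10:00 AM"))  # 17 edits against 4 characters -> 425.0
```

Because the denominator is the reference length, a hypothesis that repeats unrelated text accumulates insertions without bound, which is consistent with the triple-digit base-model figures in the table.
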
### Example Outputs

| # | Ground Truth | Base Model | **Finetuned** |
|---|--------------|------------|---------------|
| 1 | (Haupt der seligen Irmeng. gefunden. Im ... | 12/12/1998 10:00 AM 10:00 AM 10:00 AM 10... | (Haupt der seitdem Jänner 12 20 bei Daue... |
| 2 | Schw. Reinh.: Ist vom Lagerdienst freige... | Schw. Reinh. : 2d 9.20 16 09 J. 6 | Schw. Reinh.: Ist vom Lagerdienst frei g... |
| 3 | Klage daß im Naz.heim den Kranken die Ko... | `$$ \begin{aligned} & \text { 22 e 2 haz....` | Klage daß im Naz.heim den Kranken die Ko... |
| 4 | Irene: Stimmung sehr verschieden. Kommen... | | Irene: Stimmung sehr verschiedenes. Münd... |
| 5 | Zwei Schwestern Calabrien: M. Cristina u... | 226 *Kolabrie: M. Cisneros, Urode* | Zwei Schwestern Katalrien: M. Cristina u... |

*✓ = exact match*

## Usage

### Installation

```bash
# Requires transformers installed from source
pip install git+https://github.com/huggingface/transformers
pip install pillow torch
# Only needed for the Batch Inference example
pip install datasets
```

### Python Usage

```python
import torch
from transformers import LightOnOcrForConditionalGeneration, LightOnOcrProcessor
from PIL import Image

# Load model and processor
model_id = "wjbmattingly/LightOnOCR-2-1B-german-shorthand-line"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

processor = LightOnOcrProcessor.from_pretrained(model_id)
model = LightOnOcrForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=dtype,
).to(device)

# Load your line image
image = Image.open("your_image.jpg").convert("RGB")

# Prepare input
messages = [{"role": "user", "content": [{"type": "image"}]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = processor(
    text=[text],
    images=[[image]],
    return_tensors="pt",
    padding=True,
    size={"longest_edge": 700},
).to(device)
inputs["pixel_values"] = inputs["pixel_values"].to(dtype)

# Generate transcription
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt
input_length = inputs["input_ids"].shape[1]
generated_ids = outputs[0, input_length:]
transcription = processor.decode(generated_ids, skip_special_tokens=True)
print(transcription)
```

### Batch Inference

```python
from datasets import load_dataset

# Reuses `processor`, `model`, `device`, and `dtype` from the Python Usage example

# Load dataset
dataset = load_dataset("medieval-data/german-shorthand-line", split="train[:10]")

# Process batch: one single-image list per sample, same prompt for every sample
images = [[img.convert("RGB")] for img in dataset["image"]]
messages = [{"role": "user", "content": [{"type": "image"}]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
texts = [text] * len(images)

inputs = processor(
    text=texts,
    images=images,
    return_tensors="pt",
    padding=True,
    size={"longest_edge": 700},
).to(device)
inputs["pixel_values"] = inputs["pixel_values"].to(dtype)

outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Drop the prompt tokens before decoding
predictions = processor.batch_decode(outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)

for pred, gt in zip(predictions, dataset["text"]):
    print(f"Prediction: {pred}")
    print(f"Ground Truth: {gt}")
    print()
```

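The batch example above sends ten images through the model in a single call. For larger collections, memory use can be bounded by processing fixed-size mini-batches instead. The following is a minimal sketch of such chunking in plain Python; the helper name `chunked`, the file names, and the batch size of 3 are hypothetical and should be adapted to the available hardware.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Stand-in for a list of line-image paths (hypothetical file names).
line_paths = [f"line_{i:03d}.png" for i in range(7)]
batches = list(chunked(line_paths, batch_size=3))
print([len(b) for b in batches])  # -> [3, 3, 1]
```

Each yielded batch can then be opened, converted to RGB, and passed through the processor and `model.generate` exactly as in the batch example, with the predictions collected across batches.
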
## Training Details

- **Base Model:** [lightonai/LightOnOCR-2-1B-base](https://huggingface.co/lightonai/LightOnOCR-2-1B-base)
- **Training Method:** Fine-tuning with frozen language model backbone
- **Optimizer:** AdamW (fused)
- **Learning Rate:** 6e-5 with linear decay
- **Precision:** bfloat16

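The learning-rate entry above can be illustrated with a small schedule function. This is only a sketch under stated assumptions: the card does not give the number of training steps, whether warmup was used, or the final learning rate, so the code assumes no warmup, decay to zero, and a hypothetical `total_steps` of 1000.

```python
def linear_decay_lr(step: int, total_steps: int, peak_lr: float = 6e-5) -> float:
    """Learning rate after `step` optimizer steps, decaying linearly to zero."""
    if total_steps < 1:
        raise ValueError("total_steps must be at least 1")
    # Fraction of training still remaining, clamped so the rate never goes negative.
    remaining = max(0.0, 1.0 - step / total_steps)
    return peak_lr * remaining


total_steps = 1000  # hypothetical; not stated in the card
for step in (0, 250, 500, 1000):
    print(step, linear_decay_lr(step, total_steps))
```

Under these assumptions the rate starts at 6e-5, reaches half that value at the midpoint, and is zero at the final step.
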
## Limitations

- This model is trained on **line-level images**. For full-page transcription, first segment the page into individual lines, then transcribe each line image separately.
- Performance may vary on document styles not represented in the training data.

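The card does not say how pages were segmented into lines for the training data. As an illustration of the segmentation step only, the sketch below uses a horizontal projection profile, a simple classical technique substituted here for demonstration: count the dark pixels in each pixel row and treat runs of inked rows as text lines. The function name `find_line_bands` and its thresholds are hypothetical, and slanted or overlapping handwriting will usually need a dedicated layout or line-detection model instead.

```python
from typing import List, Tuple


def find_line_bands(
    row_ink: List[int], min_ink: int = 1, min_height: int = 3
) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges where the ink count stays >= min_ink.

    row_ink[y] is the number of dark pixels in pixel row y; bottom is exclusive.
    Bands shorter than min_height rows are discarded as noise.
    """
    bands: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(row_ink):
        if ink >= min_ink and start is None:
            start = y  # entering a text band
        elif ink < min_ink and start is not None:
            if y - start >= min_height:
                bands.append((start, y))  # leaving a sufficiently tall band
            start = None
    if start is not None and len(row_ink) - start >= min_height:
        bands.append((start, len(row_ink)))  # band touching the bottom edge
    return bands


# Toy profile: two text lines (rows 2-5 and 9-13) separated by blank rows,
# plus a one-row speck at row 16 that is too short to count as a line.
profile = [0, 0, 5, 8, 7, 4, 0, 0, 0, 6, 9, 9, 8, 3, 0, 0, 1, 0]
bands = find_line_bands(profile)
print(bands)  # -> [(2, 6), (9, 14)]
```

With Pillow, each band could then be cut from a page image using `page.crop((0, top, page.width, bottom))` and the resulting crops passed to the model one line at a time.
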
## Citation

If you use this model, please cite:

```bibtex
@misc{lightonocr2_finetuned_2026,
  title = {LightOnOCR Fine-tuned for German},
  author = {William Mattingly},
  year = {2026},
  howpublished = {\url{https://huggingface.co/wjbmattingly/LightOnOCR-2-1B-german-shorthand-line}}
}
```

And the original LightOnOCR paper:

```bibtex
@misc{lightonocr2_2026,
  title = {LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR},
  author = {Said Taghadouini and Adrien Cavaill\`{e}s and Baptiste Aubertin},
  year = {2026},
  howpublished = {\url{https://arxiv.org/pdf/2601.14251}}
}
```

## Acknowledgments

- [LightOn AI](https://www.lighton.ai/) for the excellent LightOnOCR base model
- The creators of the [medieval-data/german-shorthand-line](https://huggingface.co/datasets/medieval-data/german-shorthand-line) dataset