---
language:
  - de
license: apache-2.0
library_name: transformers
base_model: lightonai/LightOnOCR-2-1B-base
tags:
  - ocr
  - vision-language
  - lightonocr
  - document-understanding
  - german
  - shorthand
  - manuscript
  - medieval
datasets:
  - medieval-data/german-shorthand-line
pipeline_tag: image-text-to-text
---

# LightOnOCR-2-1B for German (Line-Level)

<p align="center">
  <img src="https://huggingface.co/lightonai/LightOnOCR-2-1B-base/resolve/main/lightonocr-banner.png" alt="LightOnOCR Banner" width="600"/>
</p>

This model is a **fine-tuned version of [lightonai/LightOnOCR-2-1B-base](https://huggingface.co/lightonai/LightOnOCR-2-1B-base)** trained specifically for **line-level OCR** of German shorthand manuscripts.

## Model Description

- **Base Model:** [lightonai/LightOnOCR-2-1B-base](https://huggingface.co/lightonai/LightOnOCR-2-1B-base)
- **Training Data:** [medieval-data/german-shorthand-line](https://huggingface.co/datasets/medieval-data/german-shorthand-line)
- **Task:** Line-level text transcription from document images
- **Language:** German (de)
- **Architecture:** Vision-Language Model (1B parameters)

This is a **line-level model**: it expects cropped line images as input, not full pages. Each image should contain a single line of text.

## Evaluation Results

Evaluated on 50 samples from the test set:

| Metric | Base Model | **Fine-tuned** | Improvement (points) |
|--------|------------|----------------|----------------------|
| CER (%) | 381.26 | **21.89** | +359.37 |
| WER (%) | 494.99 | **37.41** | +457.58 |
| Perfect Matches | 0 | **0** | +0 |

*Lower CER/WER is better. More perfect matches are better.*
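
The sketch below shows one common way to compute these two metrics: Levenshtein edit distance divided by the reference length, at character level for CER and at word level for WER. It is an illustration only. The evaluation script behind the table is not part of this card, so the function names (`edit_distance`, `cer`, `wer`) and the absence of any text normalization are assumptions. It also shows why an error rate can exceed 100%: when the output is much longer than the reference, the number of edits is larger than the reference length.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference item
                curr[j - 1] + 1,     # insert a hypothesis item
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate in percent, relative to the reference length."""
    return 100.0 * edit_distance(reference, hypothesis) / max(len(reference), 1)


def wer(reference, hypothesis):
    """Word error rate in percent, using whitespace tokenization (an assumption)."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


# Four extra characters against a two-character reference: 4 / 2 = 200%.
print(cer("ab", "abcdef"))  # 200.0
```

Whether predictions are stripped or otherwise normalized before scoring changes the numbers, so that choice is worth reporting alongside any results computed this way.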

### Example Outputs

| # | Ground Truth | Base Model | **Fine-tuned** |
|---|--------------|------------|----------------|
| 1 | (Haupt der seligen Irmeng. gefunden. Im ... | 12/12/1998 10:00 AM 10:00 AM 10:00 AM 10... | (Haupt der seitdem Jänner 12 20 bei Daue... |
| 2 | Schw. Reinh.: Ist vom Lagerdienst freige... | Schw. Reinh. : 2d 9.20 16 09 J. 6 | Schw. Reinh.: Ist vom Lagerdienst frei g... |
| 3 | Klage daß im Naz.heim den Kranken die Ko... | `$$ \begin{aligned} & \text { 22 e 2 haz....` | Klage daß im Naz.heim den Kranken die Ko... |
| 4 | Irene: Stimmung sehr verschieden. Kommen... |  | Irene: Stimmung sehr verschiedenes. Münd... |
| 5 | Zwei Schwestern Calabrien: M. Cristina u... | 226  *Kolabrie: M. Cisneros, Urode* | Zwei Schwestern Katalrien: M. Cristina u... |

*✓ = exact match*

## Usage

### Installation

```bash
# Requires transformers from source
pip install git+https://github.com/huggingface/transformers
# datasets is only needed for the batch inference example
pip install pillow torch datasets
```

### Python Usage

```python
import torch
from transformers import LightOnOcrForConditionalGeneration, LightOnOcrProcessor
from PIL import Image

# Load model and processor
model_id = "wjbmattingly/LightOnOCR-2-1B-german-shorthand-line"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

processor = LightOnOcrProcessor.from_pretrained(model_id)
model = LightOnOcrForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=dtype,
).to(device)

# Load your line image
image = Image.open("your_image.jpg").convert("RGB")

# Prepare input
messages = [{"role": "user", "content": [{"type": "image"}]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = processor(
    text=[text],
    images=[[image]],
    return_tensors="pt",
    padding=True,
    size={"longest_edge": 700},
).to(device)
inputs["pixel_values"] = inputs["pixel_values"].to(dtype)

# Generate transcription
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode output
input_length = inputs["input_ids"].shape[1]
generated_ids = outputs[0, input_length:]
transcription = processor.decode(generated_ids, skip_special_tokens=True)

print(transcription)
```

### Batch Inference

```python
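import torch
from datasets import load_dataset

# Reuses `processor`, `model`, `device`, and `dtype` from the Python Usage example above.
# Load the first 10 training samples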
dataset = load_dataset("medieval-data/german-shorthand-line", split="train[:10]")

# Process batch
images = [[img.convert("RGB")] for img in dataset["image"]]
messages = [{"role": "user", "content": [{"type": "image"}]}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
texts = [text] * len(images)

inputs = processor(
    text=texts,
    images=images,
    return_tensors="pt",
    padding=True,
    size={"longest_edge": 700},
).to(device)
inputs["pixel_values"] = inputs["pixel_values"].to(dtype)

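# Disable gradient tracking during inference, as in the single-image example
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Drop the prompt tokens and decode only the newly generated ones
prompt_length = inputs["input_ids"].shape[1]
predictions = processor.batch_decode(outputs[:, prompt_length:], skip_special_tokens=True)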

for pred, gt in zip(predictions, dataset["text"]):
    print(f"Prediction: {pred}")
    print(f"Ground Truth: {gt}")
    print()
```

## Training Details

- **Base Model:** [lightonai/LightOnOCR-2-1B-base](https://huggingface.co/lightonai/LightOnOCR-2-1B-base)
- **Training Method:** Fine-tuning with frozen language model backbone
- **Optimizer:** AdamW (fused)
- **Learning Rate:** 6e-5 with linear decay
- **Precision:** bfloat16
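
As a rough illustration of the learning rate entry above, the sketch below computes a linearly decaying rate that starts at 6e-5. The total number of training steps, the decay target of zero, and the lack of a warmup phase are all assumptions, since none of them is stated in this card; `linear_decay_lr` is a hypothetical helper, not the training code.

```python
def linear_decay_lr(step, total_steps, peak_lr=6e-5):
    """Learning rate at `step`, decaying linearly from `peak_lr` to 0.

    Assumes no warmup and a final rate of zero; the real schedule may differ.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    step = min(max(step, 0), total_steps)  # clamp to the valid range
    return peak_lr * (total_steps - step) / total_steps


# Hypothetical run of 1000 optimizer steps
for step in (0, 250, 500, 1000):
    print(step, linear_decay_lr(step, total_steps=1000))
```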

## Limitations

- This model is trained on **line-level images**. For full-page transcription, you need to first segment the page into individual lines.
- Performance may vary on document styles not represented in the training data.
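
For the segmentation step mentioned in the first limitation, the sketch below shows a deliberately naive approach: a horizontal projection profile that groups consecutive rows containing ink into line bands. It is not the method used to build the training data, and handwritten pages with touching or slanted lines will usually need a dedicated layout model. The helper `find_line_bands` and its thresholds are hypothetical; the input is a list of per-row ink counts, which could be derived from a binarized page image.

```python
def find_line_bands(row_ink, min_ink=1, min_height=2):
    """Return (top, bottom) row ranges, bottom exclusive, for runs of inked rows.

    row_ink: number of dark pixels in each pixel row of a binarized page.
    min_ink: rows with fewer dark pixels count as blank.
    min_height: runs shorter than this are discarded as noise.
    """
    bands = []
    start = None
    for y, ink in enumerate(row_ink):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            if y - start >= min_height:
                bands.append((start, y))   # a text line ends
            start = None
    # Close a band that runs to the bottom edge of the page
    if start is not None and len(row_ink) - start >= min_height:
        bands.append((start, len(row_ink)))
    return bands


# Toy profile: two text lines separated by blank rows
print(find_line_bands([0, 0, 5, 9, 7, 0, 0, 4, 8, 0]))  # [(2, 5), (7, 9)]
```

With Pillow, each band can then be cropped from a page image with `page.crop((0, top, page.width, bottom))` and passed to the model as a single line image.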

## Citation

If you use this model, please cite:

```bibtex
@misc{lightonocr2_finetuned_2026,
  title = {LightOnOCR Fine-tuned for German},
  author = {William Mattingly},
  year = {2026},
  howpublished = {\url{https://huggingface.co/wjbmattingly/LightOnOCR-2-1B-german-shorthand-line}}
}
```

And the original LightOnOCR paper:

```bibtex
@misc{lightonocr2_2026,
  title = {LightOnOCR: A 1B End-to-End Multilingual Vision-Language Model for State-of-the-Art OCR},
  author = {Said Taghadouini and Adrien Cavaill\`{e}s and Baptiste Aubertin},
  year = {2026},
  howpublished = {\url{https://arxiv.org/pdf/2601.14251}}
}
```

## Acknowledgments

- [LightOn AI](https://www.lighton.ai/) for the excellent LightOnOCR base model
- The creators of the [medieval-data/german-shorthand-line](https://huggingface.co/datasets/medieval-data/german-shorthand-line) dataset