---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-VL-4B-Instruct
pipeline_tag: image-to-text
tags:
- vision-language-model
- latex-ocr
- formula-recognition
- qwen3-vl
- fine-tuned
---

# Qwen3-VL-4B-LaTeX-OCR

A 4B-parameter vision-language model fine-tuned from [Qwen/Qwen3-VL-4B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct) for **LaTeX formula recognition from images**. This model converts mathematical formula images into accurate LaTeX code.

## Model Highlights

- **Specialized for LaTeX OCR**: Trained to accurately transcribe mathematical formulas from images to LaTeX
- **Multi-Format Support**: Handles inline formulas, display equations, matrices, and complex multi-line expressions
- **Improved Accuracy**: Fine-tuned specifically to improve formula recognition relative to the base model
- **Vision-Language Architecture**: Leverages Qwen3-VL's visual understanding capabilities

## Model Description

| Property | Value |
|----------|-------|
| **Base Model** | Qwen/Qwen3-VL-4B-Instruct |
| **Model Type** | Vision-Language Model (Image-to-Text) |
| **Parameters** | 4B |
| **Language** | English |
| **License** | Apache 2.0 |
| **Developer** | [Kassadin88](https://huggingface.co/Kassadin88) |

## Training Data

Trained on the [LaTeX-OCR dataset](https://huggingface.co/datasets/linxy/LaTeX_OCR) for converting images of mathematical formulas to LaTeX. The dataset contains rendered LaTeX formulas paired with their source LaTeX code.

### Data Composition

| Type | Description |
|------|-------------|
| **Inline formulas** | Simple expressions like $E = mc^2$ |
| **Display equations** | Centered equations with equation numbering |
| **Matrices** | Matrix and array environments |
| **Multi-line expressions** | Aligned, gathered, and cases environments |
| **Complex formulas** | Nested fractions, integrals, summations, and tensor notation |

## Quick Start

### Using Transformers

```python
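# Requires a transformers release with Qwen3-VL support.
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image

model_name = "Kassadin88/Qwen3-VL-4B-LaTeX-OCR"

# Qwen3-VL checkpoints load through the image-text-to-text auto class,
# not AutoModelForCausalLM.
model = AutoModelForImageTextToText.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_name)

image = Image.open("formula.png").convert("RGB")
messages = [
    {"role": "user", "content": [
        {"type": "image"},  # placeholder; the pixels are passed to the processor below
        {"type": "text", "text": "Transcribe the formula in the image to LaTeX."}
    ]}
]

# Render the chat template to a prompt string, then encode text and image together.
prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = processor(
    text=[prompt],
    images=[image],
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Drop the prompt tokens so only the generated LaTeX is decoded.
generated = outputs[0][inputs["input_ids"].shape[1]:]
result = processor.decode(generated, skip_special_tokens=True)
print(result)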
```

### Using vLLM (Recommended for Production)

```bash
vllm serve Kassadin88/Qwen3-VL-4B-LaTeX-OCR \
    --port 8000 \
    --max-model-len 4096 \
    --trust-remote-code
```
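Once running, the server exposes vLLM's OpenAI-compatible API. The following is a minimal, standard-library-only client sketch. It assumes the server above is listening on `localhost:8000` and serves the model under its repository name (vLLM's default when `--served-model-name` is not set). It also uses greedy decoding (`temperature` 0.0) as an illustrative choice. `build_request` and `transcribe` are hypothetical helper names, not part of vLLM.

```python
import base64
import json
import urllib.request

# Assumption: the `vllm serve` command above is running locally on port 8000.
API_URL = "http://localhost:8000/v1/chat/completions"
MODEL_NAME = "Kassadin88/Qwen3-VL-4B-LaTeX-OCR"


def build_request(image_bytes: bytes, prompt: str, mime: str = "image/png") -> dict:
    """Build an OpenAI-style chat request with the image inlined as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": MODEL_NAME,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{encoded}"}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": 512,
        "temperature": 0.0,  # illustrative: greedy decoding for deterministic transcription
    }


def transcribe(path: str, prompt: str = "Transcribe the formula in the image to LaTeX.") -> str:
    """Send one formula image to the server and return the generated text."""
    with open(path, "rb") as f:
        payload = build_request(f.read(), prompt)
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(transcribe("formula.png"))` sends a single image. Inlining the image as a data URL avoids having to host the file somewhere the server can reach.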

## Usage Tips

### For Best Results

- Use clean, high-resolution images
- Crop images tightly around the formula to reduce background noise
- For multi-page documents, crop and process each formula separately rather than passing whole pages
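The cropping tip can be automated. The sketch below uses Pillow, which the Quick Start already relies on. It assumes dark formula glyphs on a light background. The `threshold` and `pad` defaults are illustrative values, not settings tuned for this model, and `padded_bbox` / `crop_to_formula` are hypothetical helper names.

```python
def padded_bbox(bbox, pad, width, height):
    """Expand a (left, upper, right, lower) box by `pad` pixels, clamped to the image bounds."""
    left, upper, right, lower = bbox
    return (
        max(left - pad, 0),
        max(upper - pad, 0),
        min(right + pad, width),
        min(lower + pad, height),
    )


def crop_to_formula(path, pad=16, threshold=200):
    """Crop a dark-on-light formula image to its ink plus a small margin.

    Assumes dark glyphs on a light background; `pad` and `threshold` are illustrative.
    """
    # Imported here so padded_bbox stays usable without Pillow installed.
    from PIL import Image

    original = Image.open(path).convert("RGB")
    gray = original.convert("L")
    # Map dark "ink" pixels to 255 and background to 0 so getbbox() finds the formula.
    mask = gray.point(lambda v: 255 if v < threshold else 0)
    bbox = mask.getbbox()
    if bbox is None:  # blank image: nothing to crop
        return original
    return original.crop(padded_bbox(bbox, pad, original.width, original.height))
```

The returned image can be passed in place of `image` in the Quick Start, for example `images=[crop_to_formula("formula.png")]`. Keeping a small margin avoids clipping sub- and superscripts that sit at the edge of the detected ink.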

### Example Prompt

```python
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Convert this mathematical formula to LaTeX."}
    ]}
]
```

## Limitations

- May struggle with handwritten formulas or low-quality images
- Complex multi-line derivations with mixed text and math may require manual review
- Not designed for general OCR tasks (text recognition from documents)
- Limited to mathematical notation; does not handle chemical equations or circuit diagrams
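Since some outputs need manual review, a cheap structural check can help decide which ones to inspect first. The sketch below is a hypothetical helper, `needs_review`, and is not part of this model or any library. It flags transcriptions whose unescaped braces do not balance or whose `\begin`/`\end` environments are unclosed or mismatched. It only catches structural breakage, so a result that passes can still be a wrong transcription.

```python
import re


def needs_review(latex: str) -> bool:
    r"""Return True if the LaTeX string looks structurally malformed.

    Checks that unescaped braces balance and that every \begin{env} is closed
    by a matching \end{env} in the right order.
    """
    depth = 0
    i = 0
    while i < len(latex):
        ch = latex[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \{ , \} or the second \ of \\
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:  # closing brace with no matching opener
                return True
        i += 1
    if depth != 0:
        return True

    # Environments must close in last-opened, first-closed order.
    stack = []
    for kind, env in re.findall(r"\\(begin|end)\{([^}]*)\}", latex):
        if kind == "begin":
            stack.append(env)
        elif not stack or stack.pop() != env:
            return True
    return bool(stack)
```

For example, `needs_review(r"\frac{a}{b")` returns `True`, while `needs_review(r"\frac{a}{b}")` returns `False`.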

## Citation

```bibtex
@misc{qwen3-vl-4b-latex-ocr,
    author = {Kassadin88},
    title = {Qwen3-VL-4B-LaTeX-OCR: A Fine-Tuned Vision-Language Model for LaTeX OCR},
    year = {2026},
    publisher = {HuggingFace},
    url = {https://huggingface.co/Kassadin88/Qwen3-VL-4B-LaTeX-OCR}
}
```

## Acknowledgments

- **Base Model**: [Qwen Team](https://github.com/QwenLM/Qwen3-VL) for Qwen3-VL
- **Training Data**: [linxy](https://huggingface.co/datasets/linxy/LaTeX_OCR) for the LaTeX-OCR dataset

---

**Note:** This model is intended for research and educational purposes. Please use responsibly.