---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3-VL-4B-Instruct
pipeline_tag: image-to-text
tags:
- vision-language-model
- latex-ocr
- formula-recognition
- qwen3-vl
- fine-tuned
---

# Qwen3-VL-4B-LaTeX-OCR

A 4B parameter vision-language model fine-tuned from [Qwen/Qwen3-VL-4B-Instruct](Qwen/Qwen3-VL-4B-Instruct_URL) for **LaTeX formula recognition from images**. This model converts mathematical formula images into accurate LaTeX code.

## Model Highlights

- **Specialized for LaTeX OCR**: Trained to accurately transcribe mathematical formulas from images to LaTeX
- **Multi-Format Support**: Handles inline formulas, display equations, matrices, and complex multi-line expressions
- **High Accuracy**: Significantly improved formula recognition over the base model
- **Vision-Language Architecture**: Leverages Qwen3-VL's visual understanding capabilities

## Model Description

| Property | Value |
|----------|-------|
| **Base Model** | Qwen/Qwen3-VL-4B-Instruct |
| **Model Type** | Vision-Language Model (Image-to-Text) |
| **Parameters** | 4B |
| **Language** | English |
| **License** | Apache 2.0 |
| **Developer** | [Kassadin88](https://huggingface.co/Kassadin88) |

## Training Data

Trained on the [LaTeX-OCR dataset](https://huggingface.co/datasets/linxy/LaTeX_OCR) for mathematical formula image to LaTeX conversion. The dataset contains rendered LaTeX formulas paired with their source LaTeX code.

### Data Composition

| Type | Description |
|------|-------------|
| **Inline formulas** | Simple expressions like $E = mc^2$ |
| **Display equations** | Centered equations with equation numbering |
| **Matrices** | Matrix and array environments |
| **Multi-line expressions** | Aligned, gathered, and cases environments |
| **Complex formulas** | Nested fractions, integrals, summations, and tensor notation |

## Quick Start

### Using Transformers

```python
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image

model_name = "Kassadin88/Qwen3-VL-4B-LaTeX-OCR"

# Qwen3-VL is an image-text-to-text model, so it is loaded with
# AutoModelForImageTextToText rather than AutoModelForCausalLM.
# This needs a transformers release that includes Qwen3-VL support.
model = AutoModelForImageTextToText.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype="auto"
)
processor = AutoProcessor.from_pretrained(
    model_name,
    trust_remote_code=True
)

image = Image.open("formula.png").convert("RGB")
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Transcribe the formula in the image to LaTeX."}
    ]}
]

# Render the chat template to text, then let the processor pair it with the image.
prompt = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the echoed prompt.
generated = outputs[0][inputs["input_ids"].shape[1]:]
result = processor.decode(generated, skip_special_tokens=True)
print(result)
```

### Using vLLM (Recommended for Production)

```bash
vllm serve Kassadin88/Qwen3-VL-4B-LaTeX-OCR \
  --port 8000 \
  --max-model-len 4096 \
  --trust-remote-code
```

## Usage Tips

### For Best Results

- Use high-resolution, clean images
- Crop images tightly around the formula to reduce background noise
- For multi-page documents, process one formula at a time

### Example Prompt

```python
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Convert this mathematical formula to LaTeX."}
    ]}
]
```

## Limitations

- May struggle with handwritten formulas or low-quality images
- Complex multi-line derivations with mixed text and math may require manual review
- Not designed for general OCR tasks (text recognition from documents)
- Limited to mathematical notation;
  does not handle chemical equations or circuit diagrams

## Citation

```bibtex
@misc{qwen3-vl-4b-latex-ocr,
  author = {Kassadin88},
  title = {Qwen3-VL-4B-LaTeX-OCR: A Fine-Tuned Vision-Language Model for LaTeX OCR},
  year = {2026},
  publisher = {HuggingFace},
  url = {https://huggingface.co/Kassadin88/Qwen3-VL-4B-LaTeX-OCR}
}
```

## Acknowledgments

- **Base Model**: [Qwen Team](https://github.com/QwenLM/Qwen3-VL) for Qwen3-VL
- **Training Data**: [linxy](https://huggingface.co/datasets/linxy/LaTeX_OCR) for the LaTeX-OCR dataset

---

**Note:** This model is intended for research and educational purposes. Please use responsibly.
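
For the vLLM deployment described under Quick Start, a client can talk to the server through its OpenAI-compatible chat endpoint. The following is a minimal sketch using only the Python standard library, not an official client. It assumes the server runs locally on port 8000 as in the `vllm serve` command; `API_URL`, `build_payload`, and `transcribe` are illustrative names, and the greedy `temperature` setting is a choice made here, not something this model card prescribes.

```python
import base64
import json
import urllib.request

# Assumptions: a local server started with `vllm serve ... --port 8000`.
API_URL = "http://localhost:8000/v1/chat/completions"
MODEL_NAME = "Kassadin88/Qwen3-VL-4B-LaTeX-OCR"


def build_payload(image_bytes: bytes, prompt: str, mime: str = "image/png") -> dict:
    """Build an OpenAI-style chat request with the image inlined as a data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": MODEL_NAME,
        "max_tokens": 512,
        "temperature": 0.0,  # greedy decoding; an assumption for repeatable output
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:{mime};base64,{encoded}"},
                    },
                    {"type": "text", "text": prompt},
                ],
            }
        ],
    }


def transcribe(
    image_path: str,
    prompt: str = "Transcribe the formula in the image to LaTeX.",
) -> str:
    """Send one formula image to the server and return the generated text."""
    with open(image_path, "rb") as fh:
        payload = build_payload(fh.read(), prompt)
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # The first choice carries the model's reply in the OpenAI response format.
    return body["choices"][0]["message"]["content"]
```

With the server running, `transcribe("formula.png")` should return the model's LaTeX string for that image. Sending one cropped formula per request matches the usage tips above.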