---
license: apache-2.0
base_model:
- Qwen/Qwen3.5-2B
library_name: transformers
pipeline_tag: image-text-to-text
tags:
- qwen
- qwen3.5
- vision-language
- handwritten-math
- math-ocr
- latex-ocr
- image-to-text
- sft
- dpo
---

# Qwen3.5-2B-MathParser-pro

## Model Summary

Qwen3.5-2B-MathParser-pro is a compact vision-language model for handwritten mathematical formula OCR. It is optimized to transcribe single-line and multi-line handwritten mathematical expressions into LaTeX, with a focus on local deployment.

This 2B release is intended for lower-memory local deployment. The companion release is `Qwen3.5-4B-MathParser-pro`.

## Intended Use

- Handwritten mathematical formula recognition
- Multi-line LaTeX transcription
- OCR for mathematical expressions and derivations
- Research and application prototyping around handwritten math parsing

This model is not intended to be a general mathematical reasoning model. It should be used as an OCR/transcription model.

## Training Recipe

The model follows a two-stage MathParser training recipe:

1. **Stage 1 SFT** builds a stable handwritten mathematical OCR base and teaches direct LaTeX transcription.
2. **Stage 2 DPO v34** prefers concise, stable, line-count-faithful transcriptions and reduces malformed outputs, repetition, max-token runaway, and very low-similarity failures.

The released weights are fully merged model weights, not LoRA adapters.

## Evaluation

Evaluation set: 756 multi-line handwritten mathematical formula samples.

Metrics:

- **Avg Sim / Median Sim**: normalized edit similarity, higher is better.
- **Line Match**: exact line-count match with ground truth.
- **Within +/-1**: predicted line count differs from ground truth by at most one.
- **Runaway**: max-token or obviously overlong/repetitive generations, lower is better.
- **Bad <0.50**: samples with similarity below 0.50, lower is better.

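
The metric definitions above can be sketched in a few lines of Python. This is only an illustration, not the evaluation script behind the reported numbers: the card does not state the exact normalization or how lines are counted, so the sketch assumes similarity is `1 - edit_distance / max(len)` at character level and that lines are separated by the LaTeX row break `\\`. The function names (`levenshtein`, `similarity`, `count_lines`, `summarize`) are hypothetical.

```python
from statistics import mean, median


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def similarity(pred: str, ref: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(pred), len(ref))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(pred, ref) / longest


def count_lines(latex: str) -> int:
    # Assumed rule: count non-empty segments between LaTeX row breaks
    # (two consecutive backslashes).
    return sum(1 for part in latex.split("\\\\") if part.strip())


def summarize(pairs: list[tuple[str, str]]) -> dict:
    """Aggregate (prediction, reference) pairs into the table's columns."""
    sims = [similarity(pred, ref) for pred, ref in pairs]
    diffs = [abs(count_lines(pred) - count_lines(ref)) for pred, ref in pairs]
    return {
        "samples": len(pairs),
        "avg_sim": mean(sims),
        "median_sim": median(sims),
        "line_match": sum(d == 0 for d in diffs),
        "within_1": sum(d <= 1 for d in diffs),
        "bad_below_0_50": sum(s < 0.50 for s in sims),
    }


if __name__ == "__main__":
    demo_pairs = [
        ("x=1 \\\\ y=2", "x=1 \\\\ y=2"),  # exact transcription
        ("x=1", "x=1 \\\\ y=2"),           # second line missing
    ]
    print(summarize(demo_pairs))
```

Under these assumptions, Line Match, Within +/-1, and Bad <0.50 are counts out of the 756 samples, which is how they appear in the table. Runaway is left out of the sketch because the card describes it only qualitatively.
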
| Model | Samples | Avg Sim | Median Sim | Line Match | Within +/-1 | Runaway | Bad <0.50 |
|---|---:|---:|---:|---:|---:|---:|---:|
| Qwen3.5-0.8B Base | 756 | 0.544843 | 0.580742 | 149 | 235 | 108 | 262 |
| Qwen3.5-2B Base | 756 | 0.599258 | 0.651649 | 252 | 392 | 19 | 236 |
| Qwen3.5-4B Base | 756 | 0.534456 | 0.541674 | 264 | 368 | 5 | 295 |
| Qwen3.5-2B SFT | 756 | 0.906516 | 0.952732 | 550 | 706 | 13 | 25 |
| Qwen3.5-2B SFT+DPO | 756 | 0.916060 | 0.951464 | 569 | 714 | 3 | 15 |
| Qwen3.5-4B SFT | 756 | 0.942045 | 0.966546 | 612 | 730 | 0 | 2 |
| Qwen3.5-4B SFT+DPO | 756 | 0.942878 | 0.968560 | 611 | 730 | 0 | 1 |

For this release, the main result is:

| Release | Avg Sim | Median Sim | Line Match | Within +/-1 | Runaway | Bad <0.50 |
|---|---:|---:|---:|---:|---:|---:|
| Qwen3.5-2B-MathParser-pro | 0.916060 | 0.951464 | 569 | 714 | 3 | 15 |

## Figures

![Overall average similarity](figures/overall_avg_similarity.png)

![Error reduction](figures/error_reduction.png)

![Bucket average similarity](figures/bucket_avg_similarity.png)

![Model size quality tradeoff](figures/model_size_quality_tradeoff.png)

## Usage

```python
from PIL import Image
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "sugartai/Qwen3.5-2B-MathParser-pro"

processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    trust_remote_code=True,
    dtype=torch.bfloat16,
    device_map="auto",
).eval()

image = Image.open("formula.png").convert("RGB")

messages = [
    {
        "role": "system",
        "content": "You are a handwritten mathematical OCR model. Return only the LaTeX transcription.",
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Transcribe the handwritten mathematical formula into LaTeX only."},
        ],
    },
]

text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
).to(model.device)

eos_ids = [processor.tokenizer.eos_token_id]
pad_id = processor.tokenizer.pad_token_id
if pad_id is not None and pad_id not in eos_ids:
    eos_ids.append(pad_id)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=1536,
        do_sample=False,
        num_beams=1,
        eos_token_id=eos_ids,
        pad_token_id=pad_id if pad_id is not None else eos_ids[0],
    )

new_ids = output_ids[:, inputs["input_ids"].shape[1]:]
print(processor.decode(new_ids[0], skip_special_tokens=True))
```

## Limitations

- The model is specialized for handwritten mathematical OCR and LaTeX transcription.
- It is not a general reasoning or theorem-proving model.
- Very noisy images, unusual notation, extreme layout variation, or out-of-distribution handwriting may degrade quality.
- The reported metrics are from an internal 756-sample multi-line handwritten formula evaluation set.

## License

This model is released under Apache 2.0, following the base model license of `Qwen/Qwen3.5-2B`.

## Citation

If you use this model, please cite or link this model page and the Qwen3.5 base model.