How to use from the
Use from the
PEFT library
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
model = PeftModel.from_pretrained(base_model, "maru979/qwen2.5-3b-teacher-ocr-rebuilder")

Qwen2.5-3B Teacher OCR Rebuilder

This repository publishes a LoRA adapter for a narrow OCR post-processing task in a teacher-assistant workflow.

It is not a solver and not a teaching model.
It is a conservative pre-processor that rewrites noisy OCR text into a JSON object that downstream services can safely consume.

Task

Input: raw OCR text from a math exam question

Output:

{
  "stem": "cleaned problem statement",
  "answer_raw": "raw answer if clearly visible, otherwise empty",
  "solution_raw": "",
  "ocr_notes": ["risk tag 1", "risk tag 2"]
}

Intended boundary

This adapter is designed to sit on a separate line:

OCR -> OCR rebuilder -> existing GPT teaching chain

It should:

  • improve stem
  • improve answer_raw
  • reduce hallucinated answers
  • add conservative OCR risk notes

It should not:

  • replace your main GPT explanation model
  • solve the math problem
  • generate a polished solution_raw

At the current stage, solution_raw is intentionally kept empty.

Why this adapter exists

The base model can often emit valid JSON, but it tends to:

  • hallucinate answers when gold should be empty
  • drift away from the intended field semantics
  • over-talk beyond the strict OCR rebuild task

This adapter is optimized for a more conservative behavior.

Main test comparison

Evaluation setting:

  • base model: Qwen/Qwen2.5-3B-Instruct
  • adapter: current best stage-1 protocol-only LoRA
  • prompt: conservative non-solver prompt
  • generation: max_new_tokens=192
  • test set: 30 held-out samples

Metric table

Metric Base model Stage-1 adapter
JSON parse rate 80.00% 76.67%
stem exact match 0.00% 16.67%
answer_raw exact match 16.67% 60.00%
empty-answer hallucination 23.33% 0.00%

Visual comparison

JSON parse rate

Base model      80.00%  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
Stage-1 adapter 76.67%  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ

answer_raw exact match

Base model      16.67%  β–ˆβ–ˆβ–ˆ
Stage-1 adapter 60.00%  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ

empty-answer hallucination (lower is better)

Base model      23.33%  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
Stage-1 adapter  0.00%

Semantic fidelity

We also measured average character-level similarity against gold labels on the same held-out test set.

Field Base model Stage-1 adapter
stem avg similarity 0.4898 0.7217
answer_raw avg similarity 0.4058 0.6667
ocr_notes avg similarity 0.1597 0.2391

Visual comparison

stem average similarity

Base model      0.4898  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
Stage-1 adapter 0.7217  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ

answer_raw average similarity

Base model      0.4058  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ
Stage-1 adapter 0.6667  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ

What this means

The adapter gives up a small amount of parse rate, but buys back the behaviors that matter most for this task:

  • much better answer_raw
  • much better stem
  • zero hallucinated answers on gold-empty cases

For an OCR rebuilding module that feeds a larger teaching system, this tradeoff is usually worth it.

Dataset summary

The project used two task buckets during development:

  • single_problem_rebuild: 204 synthetic/curated samples
  • multi_problem_fragment_rebuild: 102 synthetic/curated samples

The released adapter comes from a stage-1 protocol-only training setup that focused on:

  • one JSON object only
  • fixed field schema
  • conservative extraction
  • no solution_raw generation

Stage-1 smoke subset:

  • train: 32
  • dev: 8

Known limitations

  1. solution_raw is intentionally weak and currently fixed to empty.
  2. ocr_notes is helpful but not yet fully normalized.
  3. Multi-problem mixed fragments are harder than single-problem OCR cleanup.
  4. This is a task adapter, not a general OCR foundation model.

Deployment

This repository includes a handler.py for Hugging Face Inference Endpoints custom deployment.

Recommended input:

{
  "inputs": "raw OCR text"
}

Recommended output:

{
  "stem": "...",
  "answer_raw": "...",
  "solution_raw": "",
  "ocr_notes": ["..."],
  "meta": {
    "raw_ocr_notes": ["model raw notes"]
  }
}

Local usage

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

base_model = "Qwen/Qwen2.5-3B-Instruct"
adapter_model = "maru979/qwen2.5-3b-teacher-ocr-rebuilder"

tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter_model)
model.eval()

If you deploy this adapter as an endpoint, prefer the included handler.py instead of directly exposing raw generation.

Downloads last month
27
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for maru979/qwen2.5-3b-teacher-ocr-rebuilder

Base model

Qwen/Qwen2.5-3B
Adapter
(1276)
this model