Unlimited-OCR CrispEmbed GGUF

GGUF conversions of baidu/Unlimited-OCR for use with CrispEmbed.

Model

Unlimited-OCR is a 3.3B-parameter vision-language model (VLM) for full-page OCR. Its architecture:

SAM ViT-B (12 layers, 768d) — image encoder with windowed + global attention
CLIP-L/14 (24 layers, 1024d) — receives SAM features as patch embeddings (dual-encoder "DeepLIP")
Fusion — concat CLIP + SAM features (2048d) → linear projection (1280d)
DeepSeek-V2 MoE decoder (12 layers, 1280d, 64 routed experts top-6, 2 shared experts, layer 0 dense)
Tokenizer — GPT-2 BPE, 129,280 vocab
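
To make the decoder's expert routing concrete, here is a minimal sketch of selecting the top 6 of 64 routed experts for a single token. The expert counts come from the list above; the softmax-then-top-k ordering, the renormalization of the kept weights, and the name `route_token` are assumptions for illustration, since the source does not say how router scores are normalized.

```python
import math
import random

NUM_ROUTED_EXPERTS = 64  # from the architecture list
TOP_K = 6                # routed experts activated per token


def route_token(logits, top_k=TOP_K):
    """Return (expert_index, weight) pairs for the top_k highest-scoring experts.

    Assumption: softmax over all router logits, then renormalize the kept
    weights so they sum to 1. The real model may normalize differently.
    """
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    kept = sum(probs[i] for i in top)
    return [(i, probs[i] / kept) for i in top]


# Random logits stand in for the output of a router weight matrix.
random.seed(0)
router_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_ROUTED_EXPERTS)]
selected = route_token(router_logits)
print(selected)
```

The two shared experts are not part of this selection: in DeepSeek-style MoE layers they process every token, and the routed experts' outputs are added on top.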

Files

File	Quant	Size	Notes
`unlimited-ocr-f16.gguf`	F16	6.4 GB	16-bit floats, reference quality
`unlimited-ocr-q8_0.gguf`	Q8_0	3.5 GB	High quality, about 1.8x smaller than F16
`unlimited-ocr-q5_k.gguf`	Q5_K	2.4 GB	Best quality among the K-quants (near-perfect pages)
`unlimited-ocr-q4_k.gguf`	Q4_K	2.2 GB	Recommended: reads full pages, matches the HF model
`unlimited-ocr-q3_k.gguf`	Q3_K	2.0 GB	Smaller, still very good (slightly more character errors)

All quantizations keep the vision encoder (SAM v.* + CLIP c.*), the MoE router (*.mlp_gate.weight), the projector, the token embeddings, and the lm_head at Q8_0 or higher. Keeping the lm_head at Q8_0 is essential: when it is quantized to Q4_K, the output projection flips a borderline greedy pick early in generation, the error snowballs into a hallucination, and full-page OCR fails. With the lm_head protected, the q4_k file reads full document pages identically to the unquantized HF model.
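
The protection rule can be pictured as a name-based filter applied per tensor during conversion. The sketch below is not the CrispEmbed converter. The patterns `v.*`, `c.*`, and `*.mlp_gate.weight` are taken from the paragraph above, while the names used for the projector, token embeddings, and lm_head (`proj.*`, `token_embd.weight`, `lm_head.weight`) and the function `quant_for` are hypothetical, because the source does not give them.

```python
from fnmatch import fnmatchcase

# Patterns quoted in the text: vision encoder (SAM, CLIP) and MoE router.
PROTECTED_PATTERNS = ["v.*", "c.*", "*.mlp_gate.weight"]

# Hypothetical names: the source does not state how the projector,
# token embeddings, or lm_head tensors are named in the GGUF files.
PROTECTED_PATTERNS += ["proj.*", "token_embd.weight", "lm_head.weight"]

# Targets that are already at or above Q8_0 need no special handling.
HIGH_PRECISION_TARGETS = {"F16", "Q8_0"}


def quant_for(tensor_name, target="Q4_K"):
    """Return the quantization type to use for one tensor."""
    if target in HIGH_PRECISION_TARGETS:
        return target
    if any(fnmatchcase(tensor_name, p) for p in PROTECTED_PATTERNS):
        return "Q8_0"  # protected tensors never drop below Q8_0
    return target


# The tensor names below are illustrative only.
for name in ["v.blk.0.attn_q.weight", "blk.3.mlp_gate.weight",
             "blk.3.ffn_down_exps.weight", "lm_head.weight"]:
    print(name, "->", quant_for(name))
```

Matching on names keeps the rule independent of tensor shapes, so the same filter could apply to every quantization target below Q8_0.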

Quality vs. size: because the protected Q8_0 tensors dominate the file size, the smaller quants save little. q5_k is the best of the K-quants, q4_k (recommended) is excellent, q3_k is good, and q2_k is not recommended because its 2-bit experts collapse into repetition on dense body text.
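
The diminishing returns are easy to check from the sizes in the Files table. This short calculation uses only those published sizes; the helper name `ratio_vs_f16` is introduced here for illustration.

```python
# File sizes in GB, copied from the Files table.
SIZES_GB = {"f16": 6.4, "q8_0": 3.5, "q5_k": 2.4, "q4_k": 2.2, "q3_k": 2.0}


def ratio_vs_f16(quant):
    """Return how many times smaller a file is than the F16 reference."""
    return SIZES_GB["f16"] / SIZES_GB[quant]


for name, size in SIZES_GB.items():
    print(f"{name:5s} {size:.1f} GB  ratio vs f16: {ratio_vs_f16(name):.2f}")

# Going from q5_k down to q3_k frees comparatively little space.
saved_q5_to_q3 = SIZES_GB["q5_k"] - SIZES_GB["q3_k"]
print(f"q5_k -> q3_k saves {saved_q5_to_q3:.1f} GB")
```

Dropping from q5_k to q3_k saves about 0.4 GB, which is why the recommendation favors quality over the last few hundred megabytes.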

Usage with CrispEmbed

# Auto-download and run
crispembed --ocr-pipeline image.png --ocr-engine unlimited_ocr -m unlimited-ocr

# Or with explicit path
crispembed --ocr-pipeline image.png --ocr-engine unlimited_ocr \
  --ocr-rec /path/to/unlimited-ocr-q4_k.gguf \
  -m /path/to/unlimited-ocr-q4_k.gguf
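
For scripting, the same two invocations can be wrapped from Python. This is a minimal sketch that uses only the flags shown above. The helper names `build_ocr_command` and `run_ocr` are hypothetical, and it assumes the recognized text is written to standard output, which the source does not state.

```python
import subprocess


def build_ocr_command(image, model="unlimited-ocr", binary="crispembed"):
    """Build the argument list for one OCR run, mirroring the commands above."""
    cmd = [binary, "--ocr-pipeline", str(image), "--ocr-engine", "unlimited_ocr"]
    if str(model).endswith(".gguf"):
        # Explicit-path form: the same file is passed to --ocr-rec and -m.
        cmd += ["--ocr-rec", str(model)]
    cmd += ["-m", str(model)]
    return cmd


def run_ocr(image, **kwargs):
    """Run crispembed and return its stdout.

    Assumption: the OCR text is printed to standard output.
    """
    result = subprocess.run(build_ocr_command(image, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout


# Printing the command does not require crispembed to be installed.
print(" ".join(build_ocr_command("image.png")))
```

Calling `run_ocr("image.png", model="/path/to/unlimited-ocr-q4_k.gguf")` would execute the explicit-path form and raise `subprocess.CalledProcessError` if the tool exits with a non-zero status.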

License

MIT (same as the original model)

Credits

Original model: Baidu
GGUF conversion: CrispEmbed
