Unlimited-OCR CrispEmbed GGUF

GGUF conversions of baidu/Unlimited-OCR for use with CrispEmbed.

Model

Unlimited-OCR is a 3.3B-parameter vision-language model (VLM) for full-page OCR. Architecture:

  • SAM ViT-B (12 layers, 768d): image encoder with windowed + global attention
  • CLIP-L/14 (24 layers, 1024d): receives SAM features as patch embeddings (dual-encoder "DeepLIP")
  • Fusion: concat CLIP + SAM features (2048d) → linear projection (1280d)
  • DeepSeek-V2 MoE decoder (12 layers, 1280d, 64 routed experts top-6, 2 shared experts, layer 0 dense)
  • Tokenizer: GPT-2 BPE, 129,280 vocab
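The "64 routed experts top-6, 2 shared experts" entry can be illustrated with a minimal routing sketch in plain Python. It is not the CrispEmbed or Hugging Face implementation: the function names are made up for illustration, and using the raw softmax scores as gate weights (without renormalizing over the selected experts) is an assumption.

```python
import math

NUM_ROUTED = 64   # routed experts per MoE layer (from the architecture list)
TOP_K = 6         # routed experts activated per token
NUM_SHARED = 2    # shared experts, always active


def route_token(router_logits, top_k=TOP_K):
    """Return (expert_index, gate_weight) pairs for one token.

    Assumption: gate weights are the softmax scores of the selected
    experts, without renormalization over the selected set.
    """
    if len(router_logits) != NUM_ROUTED:
        raise ValueError(f"expected {NUM_ROUTED} router logits")
    # Numerically stable softmax over all routed experts.
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the top_k highest-scoring experts.
    ranked = sorted(range(NUM_ROUTED), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:top_k]]


def active_experts_per_token():
    """Experts evaluated per token in an MoE layer: routed top-k plus shared."""
    return TOP_K + NUM_SHARED


# Toy logits: expert i gets logit i / 10, so the highest indices win.
logits = [i / 10 for i in range(NUM_ROUTED)]
selected = route_token(logits)
print([i for i, _ in selected])      # indices of the six chosen experts
print(active_experts_per_token())    # routed top-6 plus 2 shared
```

Under these assumptions each token passes through 8 of the 66 experts in an MoE layer, which is why the router weights matter so much for output quality.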

Files

File                      Quant  Size    Notes
unlimited-ocr-f16.gguf    F16    6.4 GB  Unquantized (16-bit), reference quality
unlimited-ocr-q8_0.gguf   Q8_0   3.5 GB  High quality, about 2x compression
unlimited-ocr-q5_k.gguf   Q5_K   2.4 GB  Best quantized quality (near-perfect pages)
unlimited-ocr-q4_k.gguf   Q4_K   2.2 GB  Recommended: reads full pages, matches the HF model
unlimited-ocr-q3_k.gguf   Q3_K   2.0 GB  Smaller; very good (slightly more character errors)

All quantizations keep the vision encoder (SAM v.* + CLIP c.*), the MoE router (*.mlp_gate.weight), the projector, the token embeddings, and the lm_head at Q8_0 or higher. Keeping the lm_head at Q8_0 is essential: at Q4_K, the output projection flips a borderline greedy pick early in generation, which snowballs into a hallucination and makes full-page OCR fail. With the lm_head protected, the q4_k file reads full document pages identically to the unquantized HF model.
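The protection rule can be sketched as a per-tensor type selector. This is an illustration, not the conversion script used for these files. Only the patterns v.*, c.* and *.mlp_gate.weight come from the description above; the projector, token-embedding and lm_head tensor names below are hypothetical placeholders, as is the function name choose_quant.

```python
from fnmatch import fnmatchcase

# "v.*", "c.*" and "*.mlp_gate.weight" come from the model card.
# The last three names are HYPOTHETICAL placeholders; check the real
# tensor names in the GGUF file before relying on them.
PROTECTED_PATTERNS = [
    "v.*",                # SAM vision encoder
    "c.*",                # CLIP vision encoder
    "*.mlp_gate.weight",  # MoE router
    "proj.*",             # projector (hypothetical name)
    "token_embd.weight",  # token embeddings (hypothetical name)
    "lm_head.weight",     # output projection (hypothetical name)
]

# Rough precision ordering, highest first, used to enforce the Q8_0 floor.
PRECISION_ORDER = ["F16", "Q8_0", "Q5_K", "Q4_K", "Q3_K"]


def choose_quant(tensor_name, target):
    """Return the quant type for one tensor, never below Q8_0 if protected."""
    if target not in PRECISION_ORDER:
        raise ValueError(f"unknown quant type: {target}")
    protected = any(fnmatchcase(tensor_name, p) for p in PROTECTED_PATTERNS)
    below_floor = PRECISION_ORDER.index(target) > PRECISION_ORDER.index("Q8_0")
    if protected and below_floor:
        return "Q8_0"
    return target


print(choose_quant("lm_head.weight", "Q4_K"))
print(choose_quant("blk.3.ffn_up.weight", "Q4_K"))
```

In this sketch a protected tensor is raised to Q8_0 only when the target is lower, so an F16 export is left untouched.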

Quality vs size (the protected Q8_0 tensors dominate the file size, so smaller quants save little): q5_k ≈ best, q4_k (recommended) excellent, q3_k good, q2_k not recommended (its 2-bit experts collapse into repetition on dense body text).
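To put numbers on "smaller quants save little", the short calculation below uses only the file sizes listed under Files. The helper names are illustrative, and the sizes are rounded to 0.1 GB, so the ratios are approximate.

```python
# File sizes in GB, copied from the Files table.
SIZES_GB = {"f16": 6.4, "q8_0": 3.5, "q5_k": 2.4, "q4_k": 2.2, "q3_k": 2.0}


def compression_ratio(quant):
    """Size of the F16 file divided by the size of the given quant."""
    return SIZES_GB["f16"] / SIZES_GB[quant]


def saving_gb(larger, smaller):
    """Disk space saved by choosing `smaller` over `larger`, in GB."""
    return round(SIZES_GB[larger] - SIZES_GB[smaller], 1)


for name in ("q8_0", "q5_k", "q4_k", "q3_k"):
    print(f"{name}: {compression_ratio(name):.2f}x smaller than f16")

print("q8_0 -> q4_k saves", saving_gb("q8_0", "q4_k"), "GB")
print("q4_k -> q3_k saves", saving_gb("q4_k", "q3_k"), "GB")
```

Going from q8_0 to q4_k saves 1.3 GB, while dropping further to q3_k saves only another 0.2 GB.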

Usage with CrispEmbed

# Auto-download and run
crispembed --ocr-pipeline image.png --ocr-engine unlimited_ocr -m unlimited-ocr

# Or with explicit path
crispembed --ocr-pipeline image.png --ocr-engine unlimited_ocr \
  --ocr-rec /path/to/unlimited-ocr-q4_k.gguf \
  -m /path/to/unlimited-ocr-q4_k.gguf
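For scripted use, the same invocations can be assembled from Python. This is a minimal sketch: the flags are exactly those shown above, while the helper names build_command and run_ocr are made up here, and it is an assumption that crispembed is on PATH and writes the recognized text to stdout.

```python
import subprocess


def build_command(image, model="unlimited-ocr", explicit_path=False):
    """Build the crispembed argument list for one image.

    With explicit_path=True, `model` is a path to a .gguf file and is
    passed to both --ocr-rec and -m, mirroring the second example above.
    """
    cmd = ["crispembed", "--ocr-pipeline", str(image),
           "--ocr-engine", "unlimited_ocr"]
    if explicit_path:
        cmd += ["--ocr-rec", str(model)]
    cmd += ["-m", str(model)]
    return cmd


def run_ocr(image, **kwargs):
    """Run crispembed and return its stdout.

    Assumption: crispembed is on PATH and prints the OCR result to stdout.
    """
    result = subprocess.run(build_command(image, **kwargs),
                            capture_output=True, text=True, check=True)
    return result.stdout


print(build_command("image.png"))
print(build_command("image.png", "/path/to/unlimited-ocr-q4_k.gguf",
                    explicit_path=True))
```

Passing the arguments as a list avoids shell quoting problems with image paths that contain spaces.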

License

MIT (same as the original model)

Credits
