---
pipeline_tag: image-text-to-text
language:
- multilingual
tags:
- baidu
- vision-language
- ocr
- custom_code
license: mit
library_name: CrispEmbed
---
# Unlimited-OCR CrispEmbed GGUF

GGUF conversions of [baidu/Unlimited-OCR](https://huggingface.co/baidu/Unlimited-OCR) for use with [CrispEmbed](https://github.com/CrispStrobe/CrispEmbed).

## Model

Unlimited-OCR is a 3.3B-parameter vision-language model (VLM) for full-page OCR. Architecture:

- **SAM ViT-B** (12 layers, 768d) — image encoder with windowed + global attention
- **CLIP-L/14** (24 layers, 1024d) — receives SAM features as patch embeddings (dual-encoder "DeepLIP")
- **Fusion** — concat CLIP + SAM features (2048d) → linear projection (1280d)
- **DeepSeek-V2 MoE decoder** (12 layers, 1280d, 64 routed experts top-6, 2 shared experts, layer 0 dense)
- **Tokenizer** — GPT-2 BPE, 129,280 vocab

## Files

| File | Quant | Size | Notes |
|------|-------|------|-------|
| `unlimited-ocr-f16.gguf` | F16 | 6.4 GB | Unquantized (16-bit float), reference quality |
| `unlimited-ocr-q8_0.gguf` | Q8_0 | 3.5 GB | High quality, about 1.8x smaller than F16 |
| `unlimited-ocr-q5_k.gguf` | Q5_K | 2.4 GB | Best quantized quality (near-perfect pages) |
| `unlimited-ocr-q4_k.gguf` | Q4_K | 2.2 GB | Recommended — reads full pages, matches the HF model |
| `unlimited-ocr-q3_k.gguf` | Q3_K | 2.0 GB | Smaller; very good (slightly more char errors) |

All quantized files keep the vision encoder (SAM `v.*` + CLIP `c.*`), the MoE
router (`*.mlp_gate.weight`), the projector, the token embeddings, **and the
`lm_head`** at Q8_0 or higher. Protecting the `lm_head` is essential: when the
output projection is quantized to Q4_K, it flips a borderline greedy pick early
in generation, the error snowballs into a hallucination, and full-page OCR
fails. With the `lm_head` protected, the q4_k file reads full document pages
identically to the unquantized HF model.

Quality vs size (the protected Q8_0 tensors dominate the file size, so smaller
quants save little): **q5_k** is best, **q4_k** is excellent, **q3_k** is good,
and **q2_k** is not recommended (its 2-bit experts collapse into repetition on
dense body text).
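To make the diminishing returns concrete, the sketch below computes each file's size ratio relative to the F16 file. The sizes are hard-coded from the Files table above, and the function name `quant_ratios` is only illustrative; nothing here reads the GGUF files themselves.

```bash
#!/usr/bin/env bash
# Illustrative only: sizes (GB) are copied by hand from the Files table.
quant_ratios() {
  awk 'BEGIN {
    ref = 6.4  # unlimited-ocr-f16.gguf
    n = split("q8_0:3.5 q5_k:2.4 q4_k:2.2 q3_k:2.0", rows, " ")
    for (i = 1; i <= n; i++) {
      split(rows[i], kv, ":")
      # Print: quant name, size, and how many times smaller than F16.
      printf "%s %.1f GB %.1fx\n", kv[1], kv[2], ref / kv[2]
    }
  }'
}

quant_ratios
```

Going from q5_k to q3_k saves only 0.4 GB, which is why q4_k or q5_k is usually the better trade.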

## Usage with CrispEmbed

```bash
# Auto-download and run
crispembed --ocr-pipeline image.png --ocr-engine unlimited_ocr -m unlimited-ocr

# Or with explicit path
crispembed --ocr-pipeline image.png --ocr-engine unlimited_ocr \
  --ocr-rec /path/to/unlimited-ocr-q4_k.gguf \
  -m /path/to/unlimited-ocr-q4_k.gguf
```
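For several pages, the explicit-path command can be wrapped in a loop. This is a minimal sketch, not part of CrispEmbed: the helper `ocr_dir` and the `CRISPEMBED_BIN` override are hypothetical names, and it assumes that `crispembed` writes the recognized text to stdout, which this README does not state.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run the explicit-path command on every PNG in a directory.
# Assumption: crispembed prints the recognized text to stdout.
ocr_dir() {
  local model=$1 dir=$2 img
  for img in "$dir"/*.png; do
    [ -e "$img" ] || continue  # skip when the glob matched nothing
    # CRISPEMBED_BIN is an illustrative override, not a CrispEmbed setting.
    "${CRISPEMBED_BIN:-crispembed}" --ocr-pipeline "$img" --ocr-engine unlimited_ocr \
      --ocr-rec "$model" -m "$model" > "${img%.png}.txt"
  done
}
```

Calling `ocr_dir /path/to/unlimited-ocr-q4_k.gguf ./scans` would then write one `.txt` file next to each PNG in `./scans`, provided the stdout assumption holds.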

## License

MIT (same as the original model)

## Credits

- Original model: [Baidu](https://huggingface.co/baidu/Unlimited-OCR)
- GGUF conversion: [CrispEmbed](https://github.com/CrispStrobe/CrispEmbed)