# Unlimited-OCR CrispEmbed GGUF

GGUF conversions of [baidu/Unlimited-OCR](https://huggingface.co/baidu/Unlimited-OCR) for use with [CrispEmbed](https://github.com/CrispStrobe/CrispEmbed).

## Model

Unlimited-OCR is a 3.3B-parameter vision-language model (VLM) for full-page OCR. Architecture:

- **SAM ViT-B** (12 layers, 768d) — image encoder with windowed + global attention
- **CLIP-L/14** (24 layers, 1024d) — receives SAM features as patch embeddings (dual-encoder "DeepLIP")
- **Fusion** — concat CLIP + SAM features (2048d) → linear projection (1280d)
- **DeepSeek-V2 MoE decoder** (12 layers, 1280d, 64 routed experts top-6, 2 shared experts, layer 0 dense)
- **Tokenizer** — GPT-2 BPE, 129,280 vocab

## Files

| File | Quant | Size | Notes |
|------|-------|------|-------|
| `unlimited-ocr-f16.gguf` | F16 | 6.4 GB | Unquantized half precision, reference quality |
| `unlimited-ocr-q8_0.gguf` | Q8_0 | 3.5 GB | High quality, about 1.8x compression |
| `unlimited-ocr-q4_k.gguf` | Q4_K | 2.0 GB | Good quality, 3.1x compression |

All quantizations keep the vision encoder weights (SAM `v.*` and CLIP `c.*`) and the MoE router weights at Q8_0 or higher to protect OCR accuracy. Projector weights are also kept at Q8_0.
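
After downloading one of the files above, a quick sanity check can catch truncated or mislabeled downloads. The sketch below relies only on the fact that GGUF files begin with the four ASCII bytes `GGUF`. The helper name `is_gguf` is made up for illustration and is not part of CrispEmbed.

```bash
# Return 0 if the file starts with the 4-byte GGUF magic ("GGUF"), 1 otherwise.
# is_gguf is a hypothetical helper name, not a CrispEmbed command.
is_gguf() {
  local file="$1"
  [ -r "$file" ] || return 1
  # Read only the first four bytes; strip NULs so the shell does not warn.
  [ "$(head -c 4 -- "$file" | tr -d '\0')" = "GGUF" ]
}

# Example: check a downloaded file before passing it to crispembed.
# is_gguf unlimited-ocr-q4_k.gguf && echo "looks like a GGUF file"
```

This only inspects the header, so it does not prove that the rest of the file is intact.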

## Usage with CrispEmbed

```bash
# Auto-download and run
crispembed --ocr-pipeline image.png --ocr-engine unlimited_ocr -m unlimited-ocr

# Or with explicit path
crispembed --ocr-pipeline image.png --ocr-engine unlimited_ocr \
  --ocr-rec /path/to/unlimited-ocr-q4_k.gguf \
  -m /path/to/unlimited-ocr-q4_k.gguf
```
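
To process a folder of page scans, the explicit-path invocation above can be wrapped in a loop. This is a minimal sketch that reuses only the flags shown above and assumes one `crispembed` call per image. The function name `ocr_batch` and the `CRISPEMBED_BIN` variable are hypothetical conveniences, not CrispEmbed features.

```bash
# Run the OCR pipeline on every PNG in a directory, one image at a time.
# ocr_batch and CRISPEMBED_BIN are hypothetical names used for this sketch.
ocr_batch() {
  local model="$1" dir="$2" img
  local bin="${CRISPEMBED_BIN:-crispembed}"
  for img in "$dir"/*.png; do
    [ -e "$img" ] || continue   # skip the literal pattern when nothing matches
    "$bin" --ocr-pipeline "$img" --ocr-engine unlimited_ocr \
      --ocr-rec "$model" -m "$model" || return 1   # stop at the first failure
  done
}

# Example:
# ocr_batch /path/to/unlimited-ocr-q4_k.gguf ./scans
```

Each call starts a new process and therefore reloads the model, so this form suits small batches.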

## License

MIT (same as the original model)

## Credits

- Original model: [Baidu](https://huggingface.co/baidu/Unlimited-OCR)
- GGUF conversion: [CrispEmbed](https://github.com/CrispStrobe/CrispEmbed)