# Unlimited-OCR CrispEmbed GGUF

GGUF conversions of [baidu/Unlimited-OCR](https://huggingface.co/baidu/Unlimited-OCR) for use with [CrispEmbed](https://github.com/CrispStrobe/CrispEmbed).

## Model

Unlimited-OCR is a 3.3B-parameter vision-language model (VLM) for full-page OCR.

Architecture:

- **SAM ViT-B** (12 layers, 768d): image encoder with windowed and global attention
- **CLIP-L/14** (24 layers, 1024d): takes the SAM features as its patch embeddings (dual-encoder "DeepLIP")
- **Fusion**: CLIP and SAM features are concatenated (2048d), then a linear projection maps them to 1280d
- **DeepSeek-V2 MoE decoder** (12 layers, 1280d, 64 routed experts with top-6 routing, 2 shared experts; layer 0 is dense)
- **Tokenizer**: GPT-2-style BPE, 129,280-token vocabulary

## Files

| File | Quant | Size | Notes |
|------|-------|------|-------|
| `unlimited-ocr-f16.gguf` | F16 | 6.4 GB | Unquantized half-precision reference |
| `unlimited-ocr-q8_0.gguf` | Q8_0 | 3.5 GB | High quality, ~1.8x smaller than F16 |
| `unlimited-ocr-q4_k.gguf` | Q4_K | 2.0 GB | Good quality, ~3.1x smaller than F16 |

In the quantized files, the vision encoder weights (SAM `v.*` and CLIP `c.*`), the MoE router weights, and the projector weights are never quantized below Q8_0, to preserve OCR accuracy.

## Usage with CrispEmbed

```bash
# Auto-download the model and run OCR on an image
crispembed --ocr-pipeline image.png --ocr-engine unlimited_ocr -m unlimited-ocr

# Or point to a local GGUF file explicitly
crispembed --ocr-pipeline image.png --ocr-engine unlimited_ocr \
  --ocr-rec /path/to/unlimited-ocr-q4_k.gguf \
  -m /path/to/unlimited-ocr-q4_k.gguf
```

## License

MIT (same as the original model)

## Credits

- Original model: [Baidu](https://huggingface.co/baidu/Unlimited-OCR)
- GGUF conversion: [CrispEmbed](https://github.com/CrispStrobe/CrispEmbed)