---
pipeline_tag: image-text-to-text
language:
- multilingual
tags:
- baidu
- vision-language
- ocr
- custom_code
license: mit
library_name: CrispEmbed
---

# Unlimited-OCR CrispEmbed GGUF

GGUF conversions of [baidu/Unlimited-OCR](https://huggingface.co/baidu/Unlimited-OCR) for use with [CrispEmbed](https://github.com/CrispStrobe/CrispEmbed).

## Model

Unlimited-OCR is a 3.3B-parameter vision-language model (VLM) for full-page OCR. Architecture:

- **SAM ViT-B** (12 layers, 768d): image encoder with windowed + global attention
- **CLIP-L/14** (24 layers, 1024d): receives SAM features as patch embeddings (dual-encoder "DeepLIP")
- **Fusion**: concat CLIP + SAM features (2048d) → linear projection (1280d)
- **DeepSeek-V2 MoE decoder** (12 layers, 1280d, 64 routed experts with top-6 routing, 2 shared experts, layer 0 dense)
- **Tokenizer**: GPT-2 BPE, 129,280 vocab

## Files

| File | Quant | Size | Notes |
|------|-------|------|-------|
| `unlimited-ocr-f16.gguf` | F16 | 6.4 GB | Unquantized 16-bit float, reference quality |
| `unlimited-ocr-q8_0.gguf` | Q8_0 | 3.5 GB | High quality, about 1.8x smaller than F16 |
| `unlimited-ocr-q5_k.gguf` | Q5_K | 2.4 GB | Best quantized quality (near-perfect pages) |
| `unlimited-ocr-q4_k.gguf` | Q4_K | 2.2 GB | Recommended; reads full pages, matches the HF model |
| `unlimited-ocr-q3_k.gguf` | Q3_K | 2.0 GB | Smaller; very good (slightly more character errors) |

All quantized files keep the vision encoder (SAM `v.*` + CLIP `c.*`), the MoE router (`*.mlp_gate.weight`), the projector, the token embeddings, **and the `lm_head`** at Q8_0 or higher.

Keeping the `lm_head` at Q8_0 is essential. When the output projection is quantized to Q4_K, it flips a borderline greedy pick early in generation, which snowballs into a hallucination, and full-page OCR fails. With the `lm_head` protected, the q4_k file reads full document pages identically to the unquantized HF model.

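The protection rule above can be sketched as a per-tensor type selection. This is a minimal illustration, not the conversion code CrispEmbed actually uses. The `v.*`, `c.*`, and `*.mlp_gate.weight` patterns come from the text; the names `proj.*`, `token_embd.weight`, and `output.weight` for the projector, token embeddings, and `lm_head` are assumptions, as are the function name `pick_quant` and the precision ordering table.

```python
from fnmatch import fnmatchcase

# Tensor-name patterns that must stay at Q8_0 or higher.
PROTECTED_PATTERNS = [
    "v.*",                # SAM vision encoder (from the model card)
    "c.*",                # CLIP encoder (from the model card)
    "*.mlp_gate.weight",  # MoE router (from the model card)
    "proj.*",             # projector (ASSUMED name)
    "token_embd.weight",  # token embeddings (ASSUMED name)
    "output.weight",      # lm_head (ASSUMED name)
]

# Coarse ordering of the quant types listed in the Files table,
# from lowest to highest precision (illustrative only).
PRECISION_RANK = {"Q3_K": 0, "Q4_K": 1, "Q5_K": 2, "Q8_0": 3, "F16": 4}


def pick_quant(tensor_name: str, target: str, floor: str = "Q8_0") -> str:
    """Return the quant type for one tensor.

    Protected tensors are never stored below `floor`; every other tensor
    uses the requested `target` type.
    """
    is_protected = any(fnmatchcase(tensor_name, p) for p in PROTECTED_PATTERNS)
    if is_protected and PRECISION_RANK[target] < PRECISION_RANK[floor]:
        return floor
    return target


if __name__ == "__main__":
    # The tensor names below are hypothetical examples.
    for name in ["v.blk.0.attn.weight", "blk.3.mlp_gate.weight",
                 "blk.3.ffn_down_exps.weight", "output.weight"]:
        print(f"{name}: {pick_quant(name, 'Q4_K')}")
```

Under these assumptions, only the expert and attention weights of the decoder drop to the target type, which is consistent with the smaller quants saving relatively little space.
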

Quality vs size (the protected Q8_0 tensors dominate, so smaller quants save little): **q5_k** is the best, **q4_k** (recommended) is excellent, **q3_k** is good, and **q2_k** is not recommended (its 2-bit experts collapse into repetition on dense body text).

## Usage with CrispEmbed

```bash
# Auto-download and run
crispembed --ocr-pipeline image.png --ocr-engine unlimited_ocr -m unlimited-ocr

# Or with explicit path
crispembed --ocr-pipeline image.png --ocr-engine unlimited_ocr \
    --ocr-rec /path/to/unlimited-ocr-q4_k.gguf \
    -m /path/to/unlimited-ocr-q4_k.gguf
```

## License

MIT (same as the original model)

## Credits

- Original model: [Baidu](https://huggingface.co/baidu/Unlimited-OCR)
- GGUF conversion: [CrispEmbed](https://github.com/CrispStrobe/CrispEmbed)