--- license: mit base_model: baidu/Unlimited-OCR base_model_relation: quantized pipeline_tag: image-text-to-text library_name: gguf tags: - gguf - llama.cpp - deepseek-ocr - ocr - vision-language - multimodal - image-text-to-text - quantized - imatrix - document-parsing language: - multilingual --- # Unlimited-OCR — GGUF GGUF quantizations of [**baidu/Unlimited-OCR**](https://huggingface.co/baidu/Unlimited-OCR), a 3B vision-language OCR model that pushes **DeepSeek-OCR** one step further (one-shot, long-horizon document parsing). This repo contains a full spread of **K-quants and i-quants** of the language model plus the **vision projector (mmproj)** needed for image input. > ⚠️ **Requires a DeepSeek-OCR–aware llama.cpp build (PR [#17400](https://github.com/ggml-org/llama.cpp/pull/17400)).** > Unlimited-OCR uses the DeepSeek-OCR architecture (a SAM+CLIP *DeepEncoder* vision tower > with a DeepSeek-V2 MoE text decoder). Support is **not yet merged into upstream `main`** — > stock llama.cpp will not load these files. Build the PR branch (instructions below). ## Files Every run needs **two** files: one language model GGUF (pick a quant) **plus** the shared vision projector. The projector is fp16 and identical for all quants. | File | Quant | Bits | Size | Notes | |---|---|---|---|---| | `Unlimited-OCR-BF16.gguf` | BF16 | 16 | 5.47 GiB | Full-precision conversion. The base every quant is made from; reference quality. | | `Unlimited-OCR-Q8_0.gguf` | Q8_0 | 8 | 2.91 GiB | Near-lossless. Best quality short of BF16; recommended if you have the disk/RAM. | | `Unlimited-OCR-Q6_K.gguf` | Q6_K | 6 | 2.43 GiB | Very high quality, essentially indistinguishable from Q8_0 for OCR. | | `Unlimited-OCR-Q5_K_M.gguf` | Q5_K_M | 5 | 2.07 GiB | High quality. Great balance when you can spare a bit more than Q4. | | `Unlimited-OCR-Q5_K_S.gguf` | Q5_K_S | 5 | 1.95 GiB | High quality, slightly smaller than Q5_K_M. | | `Unlimited-OCR-Q4_K_M.gguf` | Q4_K_M | 4 | 1.82 GiB | **Recommended default** — best overall size/quality trade-off. | | `Unlimited-OCR-Q4_K_S.gguf` | Q4_K_S | 4 | 1.68 GiB | Slightly smaller than Q4_K_M with a small quality cost. | | `Unlimited-OCR-Q3_K_M.gguf` | Q3_K_M | 3 | 1.45 GiB | Compact. Usable when memory is tight; some quality loss. | | `Unlimited-OCR-IQ4_XS.gguf` | IQ4_XS | 4 | 1.53 GiB | i-quant: smaller than Q4_K_S at similar quality (built with imatrix). | | `Unlimited-OCR-IQ4_NL.gguf` | IQ4_NL | 4 | 1.59 GiB | i-quant (non-linear): 4-bit tuned for ARM/edge; good on Jetson/Apple. | | `Unlimited-OCR-IQ3_M.gguf` | IQ3_M | 3 | 1.35 GiB | i-quant: solid 3-bit quality for the size (imatrix). | | `Unlimited-OCR-IQ3_XXS.gguf` | IQ3_XXS | 3 | 1.24 GiB | i-quant: very small 3-bit; noticeable quality loss but runnable. | | `Unlimited-OCR-IQ2_M.gguf` | IQ2_M | 2 | 1.15 GiB | i-quant: smallest here; experimental, lowest quality — for tight memory only. | **Vision projector (required for all of the above):** | File | Type | Size | |---|---|---| | `mmproj-Unlimited-OCR-F16.gguf` | F16 | 774.27 MiB | *Sizes are the on-disk GGUF sizes. The vision encoder is kept at F16 (not quantized) — it is small and quantizing it hurts OCR accuracy. i-quants were built with an importance matrix (imatrix) computed from a general-text calibration set.* ## Build llama.cpp with DeepSeek-OCR support ```bash git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp git fetch origin pull/17400/head:pr17400 && git checkout pr17400 cmake -B build -DCMAKE_BUILD_TYPE=Release # add -DGGML_CUDA=ON for NVIDIA cmake --build build -j --target llama-mtmd-cli llama-server ``` ## Run Download a quant + the projector: ```bash huggingface-cli download sahilchachra/Unlimited-OCR-GGUF \ --include "Unlimited-OCR-Q4_K_M.gguf" "mmproj-Unlimited-OCR-F16.gguf" --local-dir ./uocr ``` **CLI (one image → markdown, with layout grounding):** ```bash ./build/bin/llama-mtmd-cli \ -m ./uocr/Unlimited-OCR-Q4_K_M.gguf \ --mmproj ./uocr/mmproj-Unlimited-OCR-F16.gguf \ --image document.png \ -p "<|grounding|>Convert the document to markdown." \ --chat-template deepseek-ocr --temp 0 ``` **Server (OpenAI-compatible, with vision):** ```bash ./build/bin/llama-server \ -m ./uocr/Unlimited-OCR-Q4_K_M.gguf \ --mmproj ./uocr/mmproj-Unlimited-OCR-F16.gguf \ --chat-template deepseek-ocr -c 8192 ``` ### Prompts - `"<|grounding|>Convert the document to markdown."` — document → markdown **with** bounding-box grounding - `"\nFree OCR."` — plain text dump, no layout - `"document parsing."` — Unlimited-OCR's native parsing prompt - `"<|grounding|>OCR this image."` — OCR with detection boxes ## About the model - **Architecture:** `DeepseekOCRForCausalLM` — *DeepEncoder* vision (SAM-ViT-B + CLIP-L/14, 1024×1024 input, 16× downsample) → linear projector → **DeepSeek-V2 MoE** text decoder (12 layers, hidden 1280, 64 routed + 2 shared experts, 6 experts/token). - **Task:** multilingual OCR / document parsing — single image, multi-page, and PDF (one-shot long-horizon parsing). The original supports *gundam* (crop) and *base* resolution modes. - **License:** MIT (inherited from the base model). ## How these were made 1. Converted `baidu/Unlimited-OCR` to GGUF with the PR #17400 `convert_hf_to_gguf.py`. The converter targets DeepSeek-OCR, so the config's top-level `architectures` was set to `DeepseekOCRForCausalLM` and `language_config.architectures` to `DeepseekV2ForCausalLM` (the model is otherwise byte-identical to DeepSeek-OCR's tensor layout). 2. Exported the text decoder (BF16) and the vision tower (`--mmproj`, F16) separately. 3. Built an importance matrix from a general-text corpus and produced the K-/i-quants with `llama-quantize`. 4. **Verified**: the BF16 GGUF + mmproj correctly OCR a test document (text + grounding boxes) via `llama-mtmd-cli` before quantizing. ## Limitations - Needs the PR #17400 llama.cpp build until DeepSeek-OCR support lands in `main`. - Very low-bit i-quants (IQ3_XXS, IQ2_M) trade real accuracy for size — prefer **Q4_K_M** or higher for production OCR. - The vision encoder runs in fp16 regardless of the chosen text quant. ## Credits - Base model: [baidu/Unlimited-OCR](https://huggingface.co/baidu/Unlimited-OCR) (MIT) — builds on [deepseek-ai/DeepSeek-OCR](https://github.com/deepseek-ai/DeepSeek-OCR). - GGUF / DeepSeek-OCR llama.cpp support: [ggml-org/llama.cpp#17400](https://github.com/ggml-org/llama.cpp/pull/17400). - Quantized by [sahilchachra](https://huggingface.co/sahilchachra).