---
license: mit
base_model: baidu/Unlimited-OCR
base_model_relation: quantized
pipeline_tag: image-text-to-text
library_name: gguf
tags:
  - gguf
  - llama.cpp
  - deepseek-ocr
  - ocr
  - vision-language
  - multimodal
  - image-text-to-text
  - quantized
  - imatrix
  - document-parsing
language:
  - multilingual
---

# Unlimited-OCR — GGUF

GGUF quantizations of [**baidu/Unlimited-OCR**](https://huggingface.co/baidu/Unlimited-OCR),
a 3B vision-language OCR model that extends **DeepSeek-OCR** to one-shot, long-horizon
document parsing. This repo contains a range of **K-quants and i-quants** of the language
model plus the **vision projector (mmproj)** needed for image input.

> ⚠️ **Requires a DeepSeek-OCR–aware llama.cpp build (PR [#17400](https://github.com/ggml-org/llama.cpp/pull/17400)).**
> Unlimited-OCR uses the DeepSeek-OCR architecture (a SAM+CLIP *DeepEncoder* vision tower
> with a DeepSeek-V2 MoE text decoder). Support is **not yet merged into upstream `main`** —
> stock llama.cpp will not load these files. Build the PR branch (instructions below).

## Files

Every run needs **two** files: one language model GGUF (pick a quant) **plus** the shared
vision projector. The projector is F16 and identical for all quants.

| File | Quant | Bits | Size | Notes |
|---|---|---|---|---|
| `Unlimited-OCR-BF16.gguf` | BF16 | 16 | 5.47 GiB | Unquantized conversion. The base every quant is made from; reference quality. |
| `Unlimited-OCR-Q8_0.gguf` | Q8_0 | 8 | 2.91 GiB | Near-lossless. Best quality short of BF16; recommended if you have the disk/RAM. |
| `Unlimited-OCR-Q6_K.gguf` | Q6_K | 6 | 2.43 GiB | Very high quality, essentially indistinguishable from Q8_0 for OCR. |
| `Unlimited-OCR-Q5_K_M.gguf` | Q5_K_M | 5 | 2.07 GiB | High quality. Great balance when you can spare a bit more than Q4. |
| `Unlimited-OCR-Q5_K_S.gguf` | Q5_K_S | 5 | 1.95 GiB | High quality, slightly smaller than Q5_K_M. |
| `Unlimited-OCR-Q4_K_M.gguf` | Q4_K_M | 4 | 1.82 GiB | **Recommended default** — best overall size/quality trade-off. |
| `Unlimited-OCR-Q4_K_S.gguf` | Q4_K_S | 4 | 1.68 GiB | Slightly smaller than Q4_K_M with a small quality cost. |
| `Unlimited-OCR-Q3_K_M.gguf` | Q3_K_M | 3 | 1.45 GiB | Compact. Usable when memory is tight; some quality loss. |
| `Unlimited-OCR-IQ4_XS.gguf` | IQ4_XS | 4 | 1.53 GiB | i-quant: smaller than Q4_K_S at similar quality (built with imatrix). |
| `Unlimited-OCR-IQ4_NL.gguf` | IQ4_NL | 4 | 1.59 GiB | i-quant (non-linear 4-bit): often chosen for ARM/edge targets such as Jetson or Apple silicon. |
| `Unlimited-OCR-IQ3_M.gguf` | IQ3_M | 3 | 1.35 GiB | i-quant: solid 3-bit quality for the size (imatrix). |
| `Unlimited-OCR-IQ3_XXS.gguf` | IQ3_XXS | 3 | 1.24 GiB | i-quant: very small 3-bit; noticeable quality loss but runnable. |
| `Unlimited-OCR-IQ2_M.gguf` | IQ2_M | 2 | 1.15 GiB | i-quant: smallest here; experimental, lowest quality — for tight memory only. |

**Vision projector (required for all of the above):**

| File | Type | Size |
|---|---|---|
| `mmproj-Unlimited-OCR-F16.gguf` | F16 | 774.27 MiB |

*Sizes are the on-disk GGUF sizes. The vision encoder is kept at F16 (not quantized) — it is
small and quantizing it hurts OCR accuracy. i-quants were built with an importance matrix
(imatrix) computed from a general-text calibration set.*
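
Because every run loads one language-model GGUF plus the projector, the practical download size is the sum of the two. A minimal sketch of that arithmetic, using sizes from the tables above (`total_gib` is a hypothetical helper name; runtime memory will be higher once the KV cache and compute buffers are allocated):

```bash
# total_gib MODEL_GIB MMPROJ_MIB -> combined on-disk size in GiB (1 GiB = 1024 MiB)
total_gib() {
  awk -v model="$1" -v mmproj="$2" 'BEGIN { printf "%.2f\n", model + mmproj / 1024 }'
}

total_gib 1.82 774.27   # Q4_K_M + mmproj
total_gib 2.91 774.27   # Q8_0   + mmproj
```

With the listed sizes this prints `2.58` and `3.67`, so even the recommended default needs roughly 2.6 GiB of disk once the projector is included.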

## Build llama.cpp with DeepSeek-OCR support

```bash
git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
git fetch origin pull/17400/head:pr17400 && git checkout pr17400
cmake -B build -DCMAKE_BUILD_TYPE=Release        # add -DGGML_CUDA=ON for NVIDIA
cmake --build build -j --target llama-mtmd-cli llama-server
```

## Run

Download a quant + the projector:
```bash
huggingface-cli download sahilchachra/Unlimited-OCR-GGUF \
  --include "Unlimited-OCR-Q4_K_M.gguf" "mmproj-Unlimited-OCR-F16.gguf" --local-dir ./uocr
```

**CLI (one image → markdown, with layout grounding):**
```bash
./build/bin/llama-mtmd-cli \
  -m ./uocr/Unlimited-OCR-Q4_K_M.gguf \
  --mmproj ./uocr/mmproj-Unlimited-OCR-F16.gguf \
  --image document.png \
  -p "<|grounding|>Convert the document to markdown." \
  --chat-template deepseek-ocr --temp 0
```

**Server (OpenAI-compatible, with vision):**
```bash
./build/bin/llama-server \
  -m ./uocr/Unlimited-OCR-Q4_K_M.gguf \
  --mmproj ./uocr/mmproj-Unlimited-OCR-F16.gguf \
  --chat-template deepseek-ocr -c 8192
```

### Prompts
- `"<|grounding|>Convert the document to markdown."` — document → markdown **with** bounding-box grounding
- `"<image>\nFree OCR."` — plain text dump, no layout
- `"<image>document parsing."` — Unlimited-OCR's native parsing prompt
- `"<|grounding|>OCR this image."` — OCR with detection boxes
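
To apply one of these prompts to a folder of page images, the CLI call shown earlier can be wrapped in a loop. A minimal sketch, assuming PNG inputs and that `llama-mtmd-cli` writes the generated text to stdout; `ocr_dir`, `LLAMA_CLI`, `MODEL`, `MMPROJ`, and `DEFAULT_PROMPT` are illustrative names, not part of llama.cpp:

```bash
LLAMA_CLI="${LLAMA_CLI:-./build/bin/llama-mtmd-cli}"
MODEL="${MODEL:-./uocr/Unlimited-OCR-Q4_K_M.gguf}"
MMPROJ="${MMPROJ:-./uocr/mmproj-Unlimited-OCR-F16.gguf}"
DEFAULT_PROMPT='<|grounding|>Convert the document to markdown.'

# ocr_dir IN_DIR OUT_DIR [PROMPT]: run OCR on every *.png in IN_DIR and
# write one .md file per image into OUT_DIR.
ocr_dir() {
  local in_dir="$1" out_dir="$2" prompt="${3:-$DEFAULT_PROMPT}" img name
  mkdir -p "$out_dir"
  for img in "$in_dir"/*.png; do
    [ -e "$img" ] || continue            # skip if the glob matched nothing
    name=$(basename "$img" .png)
    "$LLAMA_CLI" -m "$MODEL" --mmproj "$MMPROJ" --image "$img" \
      -p "$prompt" --chat-template deepseek-ocr --temp 0 \
      > "$out_dir/$name.md"
  done
}

# Usage: ocr_dir ./pages ./out '<|grounding|>OCR this image.'
```

Each image is processed in a separate CLI invocation, so the model is reloaded per page; for large batches the server route avoids that cost.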

## About the model

- **Architecture:** `DeepseekOCRForCausalLM` — *DeepEncoder* vision (SAM-ViT-B + CLIP-L/14,
  1024×1024 input, 16× downsample) → linear projector → **DeepSeek-V2 MoE** text decoder
  (12 layers, hidden 1280, 64 routed + 2 shared experts, 6 experts/token).
- **Task:** multilingual OCR / document parsing — single image, multi-page, and PDF (one-shot
  long-horizon parsing). The original supports *gundam* (crop) and *base* resolution modes.
- **License:** MIT (inherited from the base model).
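
The architecture numbers above determine how many vision tokens the decoder sees per image and how many experts run per token. A back-of-the-envelope sketch, assuming a 16-pixel ViT patch size (the SAM-ViT-B default, not stated in this card), that the 16× downsample applies to the patch-token count, and that "6 experts/token" counts routed experts only; the helper names are illustrative:

```bash
# vision_tokens SIDE PATCH DOWNSAMPLE -> tokens passed to the decoder for a square image
vision_tokens() {
  local side="$1" patch="$2" down="$3"
  local per_side=$(( side / patch ))          # patches along one edge
  echo $(( per_side * per_side / down ))      # patch tokens after compression
}

# active_experts ROUTED_PER_TOKEN SHARED -> experts evaluated for each token
active_experts() {
  echo $(( $1 + $2 ))
}

vision_tokens 1024 16 16   # 64 x 64 = 4096 patches, / 16 -> 256
active_experts 6 2         # 6 routed (of 64) + 2 shared -> 8
```

Under these assumptions a 1024×1024 page costs 256 context tokens before any text is generated, which is worth keeping in mind when choosing the server's `-c` value for multi-page input.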

## How these were made

1. Converted `baidu/Unlimited-OCR` to GGUF with the PR #17400 `convert_hf_to_gguf.py`. The
   converter targets DeepSeek-OCR, so the config's top-level `architectures` was set to
   `DeepseekOCRForCausalLM` and `language_config.architectures` to `DeepseekV2ForCausalLM`
   (the model's tensor layout is otherwise identical to DeepSeek-OCR's).
2. Exported the text decoder (BF16) and the vision tower (`--mmproj`, F16) separately.
3. Built an importance matrix from a general-text corpus and produced the K-/i-quants with
   `llama-quantize`.
4. **Verified**: the BF16 GGUF + mmproj correctly OCR a test document (text + grounding boxes)
   via `llama-mtmd-cli` before quantizing.
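
Step 3 can be sketched as a loop over `llama-quantize`. This is a reconstruction under assumptions, not the exact commands used for this repo: the file names, the `run` and `quantize_all` helpers, the `DRY_RUN` switch, and applying the imatrix to every type (rather than only the i-quants) are all illustrative. The importance matrix itself would come from a prior `llama-imatrix` run over the calibration text.

```bash
QUANT_BIN="${QUANT_BIN:-./build/bin/llama-quantize}"
QUANT_TYPES="Q8_0 Q6_K Q5_K_M Q5_K_S Q4_K_M Q4_K_S Q3_K_M IQ4_XS IQ4_NL IQ3_M IQ3_XXS IQ2_M"

# run CMD...: print the command, then execute it unless DRY_RUN=1.
run() {
  echo "+ $*"
  [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

# quantize_all BF16_GGUF IMATRIX OUT_DIR: produce one GGUF per type in QUANT_TYPES.
quantize_all() {
  local src="$1" imatrix="$2" out_dir="$3" qtype
  for qtype in $QUANT_TYPES; do
    run "$QUANT_BIN" --imatrix "$imatrix" "$src" \
      "$out_dir/Unlimited-OCR-$qtype.gguf" "$qtype"
  done
}

# Usage: DRY_RUN=1 quantize_all Unlimited-OCR-BF16.gguf imatrix.dat ./out
```

Note that the build command earlier in this card only builds the `llama-mtmd-cli` and `llama-server` targets, so `llama-quantize` would need to be added to `--target` before running this.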

## Limitations

- Needs the PR #17400 llama.cpp build until DeepSeek-OCR support lands in `main`.
- Very low-bit i-quants (IQ3_XXS, IQ2_M) give up noticeable accuracy for size; prefer
  **Q4_K_M** or higher for production OCR.
- The vision encoder runs in F16 regardless of the chosen text quant.

## Credits

- Base model: [baidu/Unlimited-OCR](https://huggingface.co/baidu/Unlimited-OCR) (MIT) — builds on
  [deepseek-ai/DeepSeek-OCR](https://github.com/deepseek-ai/DeepSeek-OCR).
- GGUF / DeepSeek-OCR llama.cpp support: [ggml-org/llama.cpp#17400](https://github.com/ggml-org/llama.cpp/pull/17400).
- Quantized by [sahilchachra](https://huggingface.co/sahilchachra).