---
license: mit
base_model: deepseek-ai/DeepSeek-OCR-2
tags:
  - ocr
  - vision-language
  - fp8
  - quantized
  - deepseek
library_name: transformers
---

# DeepSeek-OCR-2-FP8

FP8 dynamically quantized version of [deepseek-ai/DeepSeek-OCR-2](https://huggingface.co/deepseek-ai/DeepSeek-OCR-2) for faster inference.

## Model Details

- **Base Model**: deepseek-ai/DeepSeek-OCR-2
- **Architecture**: deepseek_vl_v2 (3B parameters)
- **Quantization**: FP8 Dynamic (llmcompressor)
- **Model Size**: ~3.5 GB (vs. ~6 GB in BF16)
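
The size figures above follow roughly from the parameter count and the bytes stored per weight. The sketch below is back-of-the-envelope arithmetic only: `estimate_weights_gb` is a hypothetical helper, and the parameter count is the rounded "3B" from the list, not an exact number.

```python
def estimate_weights_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 3e9  # rounded "3B parameters" (assumption: exact count not stated)

bf16_gb = estimate_weights_gb(NUM_PARAMS, 2)  # BF16 stores 2 bytes per weight
fp8_gb = estimate_weights_gb(NUM_PARAMS, 1)   # FP8 stores 1 byte per weight

print(f"BF16: ~{bf16_gb:.1f} GB, FP8 lower bound: ~{fp8_gb:.1f} GB")
```

The FP8 estimate is a lower bound. The gap up to the reported ~3.5 GB presumably comes from tensors that stay in higher precision (for example the ignored `lm_head` and any non-`Linear` parameters) plus the stored quantization scales.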

## Quantization

Quantized using [llmcompressor](https://github.com/vllm-project/llmcompressor) with the `FP8_DYNAMIC` scheme:

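```python
from transformers import AutoModel
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Assumed loading step: the recipe below expects an already-loaded `model`.
model = AutoModel.from_pretrained(
    "deepseek-ai/DeepSeek-OCR-2",
    torch_dtype="auto",
    trust_remote_code=True,
)

# Quantize all Linear layers to FP8; keep lm_head in its original precision.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],
)
oneshot(model=model, recipe=recipe)  # no calibration dataset is passed
```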
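
To illustrate what "dynamic" means in this scheme, here is a minimal pure-Python sketch of FP8 (E4M3) quantization where the activation scale is computed from the data at run time instead of from a calibration set. It is a teaching aid, not llmcompressor's implementation: the helper names are hypothetical, and real kernels operate on tensors, typically with one scale per token.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the E4M3 format


def round_to_e4m3(value: float) -> float:
    """Round to the nearest value representable with 3 mantissa bits (E4M3)."""
    if value == 0.0:
        return 0.0
    magnitude = min(abs(value), FP8_E4M3_MAX)  # saturate instead of overflowing
    _, exponent = math.frexp(magnitude)        # magnitude = m * 2**exponent, 0.5 <= m < 1
    exponent = max(exponent - 1, -6)           # -6 is the smallest normal exponent
    step = 2.0 ** (exponent - 3)               # spacing between neighbouring FP8 values
    return math.copysign(round(magnitude / step) * step, value)


def quantize_dynamic(row):
    """Quantize one activation row using a scale derived from the row itself."""
    peak = max(abs(x) for x in row)
    scale = peak / FP8_E4M3_MAX if peak > 0 else 1.0
    return [round_to_e4m3(x / scale) for x in row], scale


def dequantize(codes, scale):
    """Map FP8 codes back to approximate real values."""
    return [q * scale for q in codes]


row = [0.1, -2.0, 3.5]
codes, scale = quantize_dynamic(row)
restored = dequantize(codes, scale)
print(codes, scale, restored)
```

In this example the largest entry maps exactly onto the FP8 maximum, while the small entry `0.1` comes back as `0.1015625`. That rounding error is the precision traded for the smaller storage format.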

## Usage

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "richarddavison/DeepSeek-OCR-2-FP8",
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "richarddavison/DeepSeek-OCR-2-FP8",
    trust_remote_code=True
)
```

## Requirements

- transformers==4.46.3
- torch>=2.0
- flash-attn (recommended)
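
Because `transformers` is pinned to an exact version, it can help to check the environment before loading the model. The following is a small sketch using only the standard library; the function names are hypothetical, and it assumes the packages are installed under the distribution names `transformers`, `torch`, and `flash-attn`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def major(version_string: str) -> int:
    """Extract the major version number, e.g. '2.1.0+cu121' -> 2."""
    return int(version_string.split(".")[0])


def check_environment():
    """Collect human-readable problems instead of raising on the first one."""
    problems = []
    transformers_version = installed_version("transformers")
    if transformers_version != "4.46.3":
        problems.append(f"transformers is {transformers_version}, expected 4.46.3")
    torch_version = installed_version("torch")
    if torch_version is None or major(torch_version) < 2:
        problems.append(f"torch is {torch_version}, expected >=2.0")
    if installed_version("flash-attn") is None:
        problems.append("flash-attn is not installed (optional, recommended)")
    return problems


for problem in check_environment():
    print("warning:", problem)
```

An empty result from `check_environment()` means all three checks passed. A missing `flash-attn` is reported only as a note, since the list above marks it as recommended rather than required.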

## License

MIT (same as base model)