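---
license: mit
base_model: deepseek-ai/DeepSeek-OCR-2
tags:
- ocr
- vision-language
- fp8
- quantized
- deepseek
library_name: transformers
---

# DeepSeek-OCR-2-FP8

An FP8 dynamically quantized version of [deepseek-ai/DeepSeek-OCR-2](https://huggingface.co/deepseek-ai/DeepSeek-OCR-2), intended to reduce memory use and speed up inference.

## Model Details

- **Base Model**: deepseek-ai/DeepSeek-OCR-2
- **Architecture**: `deepseek_vl_v2` (3B parameters)
- **Quantization**: FP8 Dynamic (llmcompressor `FP8_DYNAMIC` scheme)
- **Model Size**: ~3.5 GB (vs. ~6 GB in BF16)

## Quantization

Quantized using [llmcompressor](https://github.com/vllm-project/llmcompressor) with the `FP8_DYNAMIC` scheme. Every `Linear` layer except `lm_head` stores its weights in FP8 with per-channel scales, and its input activations are quantized to FP8 per token at runtime, so no calibration dataset is needed:

```python
import torch
from transformers import AutoModel, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Loading and saving steps are a sketch; the recipe itself is the one used for this checkpoint
model_id = "deepseek-ai/DeepSeek-OCR-2"
model = AutoModel.from_pretrained(model_id, torch_dtype=torch.bfloat16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# FP8 weights + dynamic per-token FP8 activations; lm_head is left unquantized
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])
oneshot(model=model, recipe=recipe)  # no dataset argument: FP8_DYNAMIC needs no calibration

# Save the quantized checkpoint in the compressed-tensors format
model.save_pretrained("DeepSeek-OCR-2-FP8", save_compressed=True)
tokenizer.save_pretrained("DeepSeek-OCR-2-FP8")
```

## Usage

```python
from transformers import AutoModel, AutoTokenizer

# The checkpoint reuses the base model's custom modeling code, hence trust_remote_code=True
model = AutoModel.from_pretrained(
    "richarddavison/DeepSeek-OCR-2-FP8",
    device_map="auto",
    torch_dtype="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "richarddavison/DeepSeek-OCR-2-FP8",
    trust_remote_code=True,
)
model = model.eval()
```

For the prompt format and the inference call, follow the [base model card](https://huggingface.co/deepseek-ai/DeepSeek-OCR-2).

## Requirements

- transformers==4.46.3
- torch>=2.0
- compressed-tensors (required by transformers to load llmcompressor checkpoints)
- flash-attn (recommended)

## License

MIT (same as base model)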
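## FP8_DYNAMIC arithmetic (illustrative sketch)

To make the `FP8_DYNAMIC` scheme in the recipe above more concrete, here is a minimal pure-Python sketch of its arithmetic: weights get one static scale per output channel (computed once), and activations get one scale per token (recomputed for every input). It assumes the E4M3 FP8 format (`float8_e4m3fn`, largest finite value 448). The helpers `round_to_e4m3`, `quantize_rows` and `fp8_linear` and the toy weights are hypothetical illustrations, not llmcompressor, compressed-tensors or vLLM APIs, and real kernels do this on the GPU rather than in Python loops.

```python
import math

# Assumption: the FP8 format is E4M3 ("float8_e4m3fn"), largest finite value 448.
FP8_E4M3_MAX = 448.0


def round_to_e4m3(v: float) -> float:
    """Round v to the nearest E4M3 value (ties to even), saturating at +/-448."""
    if v == 0.0:
        return 0.0
    mag = min(abs(v), FP8_E4M3_MAX)
    _, e = math.frexp(mag)        # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)          # below 2**-6 the format is subnormal (fixed spacing)
    step = 2.0 ** (exp - 3)       # 3 mantissa bits -> 8 steps per binade
    rounded = min(round(mag / step) * step, FP8_E4M3_MAX)  # round() is ties-to-even
    return math.copysign(rounded, v)


def quantize_rows(rows):
    """Quantize each row with its own scale so that the row's max |value| maps to 448.

    For a weight matrix the rows are output channels (scales computed once, offline).
    For activations the rows are tokens (scales recomputed for every input).
    """
    q_rows, scales = [], []
    for row in rows:
        amax = max(abs(x) for x in row)
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        q_rows.append([round_to_e4m3(x / scale) for x in row])
        scales.append(scale)
    return q_rows, scales


def fp8_linear(x, w_q, w_scales):
    """Approximate y = x @ W.T from FP8-valued operands, rescaled per token and per channel."""
    x_q, x_scales = quantize_rows(x)  # dynamic: scales depend on this particular input
    return [
        [sum(a * b for a, b in zip(xq_row, wq_row)) * xs * ws
         for wq_row, ws in zip(w_q, w_scales)]
        for xq_row, xs in zip(x_q, x_scales)
    ]


# Hypothetical toy layer: 2 output channels, 3 input features.
weight = [[0.1, -0.2, 0.3], [0.7, 0.05, -0.4]]
w_q, w_scales = quantize_rows(weight)   # done once, at quantization time

tokens = [[1.0, 2.0, 3.0], [-0.5, 0.25, 4.0]]
y = fp8_linear(tokens, w_q, w_scales)   # activation scales computed here, per token
print(y)
```

The only extra runtime work this sketch shows is one max-reduction per token to choose the activation scale. That is the trade-off that lets `FP8_DYNAMIC` skip calibration: scales are computed on every forward pass instead of being fixed from a calibration dataset.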