---
license: other
license_name: deepseek-license
license_link: LICENSE
base_model: deepseek-ai/DeepSeek-V2-Lite
tags:
- deepseek
- mla
- moe
- fp8
- group-quantization
- compressed-tensors
library_name: transformers
---

# DeepSeek-V2-Lite-FP8-Group

Per-group FP8 quantized version of [deepseek-ai/DeepSeek-V2-Lite](https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite), created with [llm-compressor](https://github.com/vllm-project/llm-compressor).

## Quantization Details

| Property | Value |
|----------|-------|
| Base model | deepseek-ai/DeepSeek-V2-Lite |
| Parameters | 15.7B total (2.4B active) |
| Architecture | DeepSeek-V2 (MLA + MoE, 64 experts, top-6) |
| Quantization | Per-group FP8 (E4M3), dynamic activations |
| Weight strategy | Group, group_size=64 |
| Activation strategy | Per-token, dynamic |
| Format | compressed-tensors (float-quantized) |
| Ignored layers | lm_head |
| Model size | ~16 GB |
| Tool | llm-compressor 0.10.0 |

This model uses the same per-group FP8 quantization scheme as DeepSeek-V3 (`weight_block_size: [1, 64]`), which makes it useful for testing and validating group FP8 inference paths (e.g., MLA attention + group FP8 fusion in vLLM) without needing the 671B-parameter model.

## Evaluation

GSM8K accuracy (100 samples, via the lm_eval harness):

| Model | exact_match |
|-------|-------------|
| Baseline (BF16) | 0.300 |
| FP8-Group (this model) | 0.330 |

No accuracy degradation was observed from group FP8 quantization on this 100-sample subset.

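As a rough check on how much a 100-sample score can move by chance, the sketch below computes the binomial standard error of each reported accuracy and of their difference. It is not part of the original evaluation: the helper name `binomial_standard_error` is hypothetical, and treating the two runs as independent is a simplifying assumption.

```python
import math


def binomial_standard_error(accuracy: float, n_samples: int) -> float:
    """Standard error of an accuracy estimated from n_samples pass/fail trials."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


n = 100                            # samples used in the table above
baseline, quantized = 0.300, 0.330  # exact_match values from the table

se_baseline = binomial_standard_error(baseline, n)
se_quantized = binomial_standard_error(quantized, n)

# Assumption: the two runs are treated as independent measurements.
se_diff = math.sqrt(se_baseline**2 + se_quantized**2)

print(f"baseline  : {baseline:.3f} +/- {se_baseline:.3f}")
print(f"quantized : {quantized:.3f} +/- {se_quantized:.3f}")
print(f"difference: {quantized - baseline:+.3f} (standard error {se_diff:.3f})")
```

Each standard error comes out at roughly 0.046, so the 0.03 gap between the two rows is smaller than the noise of either measurement. A larger sample would be needed to resolve small accuracy differences in either direction.
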
## Usage

### With vLLM

```bash
vllm serve carlyou/DeepSeek-V2-Lite-FP8-Group --trust-remote-code
```

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "carlyou/DeepSeek-V2-Lite-FP8-Group",
    torch_dtype="auto",
    trust_remote_code=True,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "carlyou/DeepSeek-V2-Lite-FP8-Group",
    trust_remote_code=True,
)
```

## Reproduction

```bash
pip install llmcompressor transformers
python quantize.py --model deepseek-ai/DeepSeek-V2-Lite --scheme fp8-group
```

See [carlyou/llm-quant](https://github.com/carlyou/llm-quant) for the quantization script.
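
For readers who want to see what per-group FP8 weight quantization does numerically, here is a minimal pure-Python sketch. It is not the code in `quantize.py` or in llm-compressor. The helper names `round_to_e4m3` and `quantize_row` are hypothetical, and the scale choice (group absolute maximum divided by 448, the largest finite E4M3 value) is an assumed, commonly used symmetric convention.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the E4M3 format


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    x = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x))
    # floor(log2(|x|)), clamped to the smallest normal exponent (-6) so that
    # subnormals share a fixed step of 2**-9.
    exponent = max(math.frexp(abs(x))[1] - 1, -6)
    step = 2.0 ** (exponent - 3)  # 3 mantissa bits: 8 steps per power of two
    return round(x / step) * step


def quantize_row(row, group_size=64):
    """Fake-quantize one weight row: one scale per group of `group_size` values.

    Returns (scales, dequantized_values). Illustrative only.
    """
    scales, dequantized = [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        amax = max(abs(v) for v in group)
        # Assumed convention: map the group's largest magnitude onto 448.
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        scales.append(scale)
        dequantized.extend(round_to_e4m3(v / scale) * scale for v in group)
    return scales, dequantized


# Two groups with very different magnitudes each get their own scale.
row = [0.01 * i for i in range(64)] + [10.0 * i for i in range(64)]
scales, dequantized = quantize_row(row)
print("scales:", scales)
print("max abs error:", max(abs(a - b) for a, b in zip(row, dequantized)))
```

Because each group of 64 weights carries its own scale, a few large values in one part of a row do not reduce the resolution available to small values elsewhere in the same row. That is the motivation for group-wise rather than per-tensor scaling.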