---
license: apache-2.0
language:
- en
- zh
base_model:
- deepseek-ai/DeepSeek-V2-Lite
pipeline_tag: text-generation
---
# DeepSeek-V2-Lite NVFP4 Quantized
This repository contains an NVFP4-quantized version of **DeepSeek-V2-Lite**, prepared using **TensorRT Model Optimizer**.
The model has approximately **15.6B total parameters** and is quantized to **NVIDIA 4-bit floating-point precision (NVFP4)**.
This is my first quantized model.
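
As a rough illustration of what 4-bit quantization means for a model of this size, the sketch below estimates weight storage from the parameter count stated above. It assumes NVFP4 stores 4-bit values plus one 8-bit scale per block of 16 weights (about 4.5 bits per weight) and that every weight is quantized. Real checkpoints usually keep some layers in higher precision, so the actual files will be larger. The function and constant names are illustrative only.

```python
def estimate_weight_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB for a given average bit width."""
    return num_params * bits_per_weight / 8 / 2**30


TOTAL_PARAMS = 15.6e9  # total parameter count stated in this model card

# Assumption: 4-bit values plus one 8-bit scale shared by 16 weights.
NVFP4_BITS = 4 + 8 / 16
BF16_BITS = 16

if __name__ == "__main__":
    print(f"BF16 : {estimate_weight_gib(TOTAL_PARAMS, BF16_BITS):.1f} GiB")
    print(f"NVFP4: {estimate_weight_gib(TOTAL_PARAMS, NVFP4_BITS):.1f} GiB")
```

This covers weights only. Activations, the KV cache, and runtime overhead add to the total.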
## Model Details
- **Developed by:** Krisakorn Chanthasang
- **Model type:** Large Language Model for text generation
- **Languages:** English, Chinese
- **License:** Apache 2.0
- **Base model:** deepseek-ai/DeepSeek-V2-Lite
- **Quantization format:** NVFP4
- **Quantization tool:** TensorRT Model Optimizer
## Base Model Information
For details about the original model architecture, training data, and intended usage, see the official base model page:
https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite
## Requirements
This model requires hardware and inference software with **NVFP4 support**.
### Hardware Requirements
- NVIDIA Blackwell GPU or newer
- Tested/quantized using: **NVIDIA RTX PRO 6000 Blackwell Workstation**
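
A quick way to check whether a local GPU meets the requirement above is to read its compute capability. The sketch below is a minimal example, not an official check. It assumes that Blackwell-generation GPUs report a compute capability with major version 10 or higher, and that the installed `nvidia-smi` supports the `compute_cap` query field. The helper names are hypothetical.

```python
import subprocess


def supports_nvfp4(compute_cap: str) -> bool:
    """Return True if a compute capability string such as "12.0" looks like Blackwell or newer.

    Assumption: Blackwell-generation GPUs report major version 10 or higher.
    """
    major = int(compute_cap.strip().split(".")[0])
    return major >= 10


def query_compute_caps() -> list:
    """Ask nvidia-smi for the compute capability of each visible GPU.

    Returns an empty list if nvidia-smi is missing or the query fails.
    """
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return [line.strip() for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    caps = query_compute_caps()
    if not caps:
        print("No NVIDIA GPU detected (or nvidia-smi is unavailable).")
    for index, cap in enumerate(caps):
        verdict = "likely OK" if supports_nvfp4(cap) else "too old for NVFP4"
        print(f"GPU {index}: compute capability {cap} -> {verdict}")
```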
## How to Use
Use this model with an inference engine that supports **NVFP4** quantized models.
Compatibility depends on your runtime, GPU architecture, and TensorRT/NVIDIA software stack.
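
Before loading the checkpoint into an engine, it can help to confirm that the downloaded files actually declare NVFP4. The sketch below assumes the checkpoint directory contains a `hf_quant_config.json` file with a `quantization.quant_algo` field. That layout is an assumption about the export, not something this card states, so check the repository files and adjust the path and keys if they differ. The function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def quant_algo_from_config(config: dict) -> Optional[str]:
    """Extract the quantization algorithm name from a parsed config dict.

    Assumption: the name lives under config["quantization"]["quant_algo"].
    """
    return config.get("quantization", {}).get("quant_algo")


def load_quant_algo(model_dir: str) -> Optional[str]:
    """Read hf_quant_config.json from a local checkpoint directory, if present."""
    path = Path(model_dir) / "hf_quant_config.json"
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as handle:
        return quant_algo_from_config(json.load(handle))


if __name__ == "__main__":
    # "./DeepSeek-V2-Lite-NVFP4" is a placeholder for your local download path.
    algo = load_quant_algo("./DeepSeek-V2-Lite-NVFP4")
    if algo is None:
        print("No quantization config found; inspect the checkpoint files manually.")
    else:
        print(f"Checkpoint declares quantization algorithm: {algo}")
```

If the reported algorithm is not `NVFP4`, or no config is found, consult your inference engine's documentation for how it detects the quantization format.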
## Notes
This is a quantized derivative of DeepSeek-V2-Lite. Accuracy, throughput, and memory usage may differ from the original model depending on the inference engine and hardware used.