---
license: apache-2.0
language:
- en
- zh
base_model:
- deepseek-ai/DeepSeek-V2-Lite
pipeline_tag: text-generation
---

# DeepSeek-V2-Lite NVFP4 Quantized

This repository contains an NVFP4-quantized version of **DeepSeek-V2-Lite**, prepared using **TensorRT Model Optimizer**.

The model has approximately **15.6B total parameters** and is quantized to **NVIDIA 4-bit floating-point precision (NVFP4)**.

This is my first quantized model.

## Model Details

- **Developed by:** Krisakorn Chanthasang
- **Model type:** Large Language Model for text generation
- **Languages:** English, Chinese
- **License:** Apache 2.0
- **Base model:** deepseek-ai/DeepSeek-V2-Lite
- **Quantization format:** NVFP4
- **Quantization tool:** TensorRT Model Optimizer

## Base Model Information

For details about the original model architecture, training data, and intended usage, see the official base model page:

https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite

## Requirements

This model requires hardware and inference software with **NVFP4 support**.

### Hardware Requirement

- NVIDIA Blackwell GPU or newer
- Quantized and tested on: **NVIDIA RTX PRO 6000 Blackwell Workstation**
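
A quick way to check the hardware requirement before downloading the weights is to look at the GPU's CUDA compute capability. The sketch below is illustrative only: the helper name `is_blackwell_or_newer` is hypothetical, and the threshold assumes that Blackwell-generation GPUs report a compute capability major version of 10 or higher (for example 10.0 on data-center parts and 12.0 on RTX-class parts). PyTorch is optional here and is used only to query the local device.

```python
def is_blackwell_or_newer(capability):
    """Return True if a (major, minor) compute capability looks like Blackwell or newer.

    Assumption: Blackwell-generation GPUs report major >= 10.
    """
    major, _minor = capability
    return major >= 10


if __name__ == "__main__":
    try:
        import torch  # optional; only needed to query the local GPU
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        cap = torch.cuda.get_device_capability(0)
        print(f"Compute capability {cap}: Blackwell or newer = {is_blackwell_or_newer(cap)}")
    else:
        print("No CUDA device visible (or PyTorch not installed); cannot check.")
```

Passing this check does not guarantee that a given inference engine can load the checkpoint; it only rules out GPUs that are clearly too old.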

## How to Use

Load this model with an inference engine that supports **NVFP4**-quantized checkpoints.

Compatibility depends on your runtime, GPU architecture, and TensorRT/NVIDIA software stack.

## Notes

This is a quantized derivative of DeepSeek-V2-Lite. Accuracy, throughput, and memory usage may differ from the original model depending on the inference engine and hardware used.
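
To get a rough sense of the memory difference, the following back-of-the-envelope sketch compares weight storage in BF16 and NVFP4. It makes several assumptions that are not measurements of this checkpoint: that the base weights are stored in 16 bits, that NVFP4 stores 4-bit values with one 8-bit scale per block of 16 values, and that every one of the roughly 15.6B parameters is quantized. In practice some layers are usually left in higher precision, so the real file size will be larger than this estimate. The function name `estimate_weight_gib` is hypothetical.

```python
def estimate_weight_gib(num_params, bits_per_value, block_size=None, scale_bits=0):
    """Estimate weight storage in GiB.

    If block_size is given, add one scale of `scale_bits` bits per block
    of `block_size` values (block-scaled formats such as NVFP4).
    """
    total_bits = num_params * bits_per_value
    if block_size:
        total_bits += (num_params / block_size) * scale_bits
    return total_bits / 8 / 2**30


TOTAL_PARAMS = 15.6e9  # approximate total parameter count from this model card

# Assumption: base weights stored as 16-bit values.
bf16_gib = estimate_weight_gib(TOTAL_PARAMS, 16)

# Assumption: 4-bit values plus one 8-bit scale per 16-value block,
# applied to every parameter (an optimistic lower bound).
nvfp4_gib = estimate_weight_gib(TOTAL_PARAMS, 4, block_size=16, scale_bits=8)

print(f"BF16 weights:  ~{bf16_gib:.1f} GiB")
print(f"NVFP4 weights: ~{nvfp4_gib:.1f} GiB (lower bound)")
print(f"Ratio: {nvfp4_gib / bf16_gib:.3f}")
```

The estimate covers weights only; activations, the KV cache, and runtime overhead add to the total and depend on the inference engine and batch size.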