license: apache-2.0
language:
  - en
  - zh
base_model:
  - deepseek-ai/DeepSeek-V2-Lite
pipeline_tag: text-generation

DeepSeek-V2-Lite NVFP4 Quantized

This repository contains an NVFP4-quantized version of DeepSeek-V2-Lite, prepared using TensorRT Model Optimizer.

The model has approximately 15.6B total parameters and is quantized to NVFP4, NVIDIA's 4-bit floating-point format.
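As a rough guide to what 4-bit quantization means for weight storage, the sketch below compares BF16 with NVFP4 for the parameter count stated above. It assumes the commonly described NVFP4 layout (4-bit values plus one 8-bit FP8 scale per block of 16 values, about 4.5 bits per weight) and assumes every weight is quantized. Quantization tools often keep some layers in higher precision, so treat the NVFP4 figure as a lower bound, not the size of this checkpoint.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumption: NVFP4 stores 4-bit values plus one 8-bit scale per 16-value block.
# The additional per-tensor scale is negligible and ignored here.
NVFP4_BITS_PER_WEIGHT = 4 + 8 / 16  # 4.5 bits
BF16_BITS_PER_WEIGHT = 16

TOTAL_PARAMS = 15.6e9  # parameter count stated in this README

bf16_gb = weight_footprint_gb(TOTAL_PARAMS, BF16_BITS_PER_WEIGHT)
nvfp4_gb = weight_footprint_gb(TOTAL_PARAMS, NVFP4_BITS_PER_WEIGHT)

print(f"BF16 weights:  ~{bf16_gb:.1f} GB")
print(f"NVFP4 weights: ~{nvfp4_gb:.1f} GB (lower bound, all layers quantized)")
```

Under these assumptions the weights shrink from roughly 31 GB to roughly 9 GB. Activation memory and the KV cache are not included.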

This is my first quantized model.

Model Details

Developed by: Krisakorn Chanthasang
Model type: Large Language Model for text generation
Languages: English, Chinese
License: Apache 2.0
Base model: deepseek-ai/DeepSeek-V2-Lite
Quantization format: NVFP4
Quantization tool: TensorRT Model Optimizer

Base Model Information

For details about the original model architecture, training data, and intended usage, see the official base model page:

https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite

Requirements

This model requires hardware and inference software with NVFP4 support.

Hardware Requirement

NVIDIA Blackwell GPU or newer
Tested and quantized on: NVIDIA RTX PRO 6000 Blackwell Workstation Edition
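One way to check the hardware requirement before downloading is to look at the GPU's CUDA compute capability. The sketch below assumes that Blackwell-generation GPUs report a major version of 10 or higher (for example 10.x for data center parts and 12.x for RTX Blackwell cards) and that earlier generations report 9.x or lower. The function name `has_native_nvfp4` is hypothetical, and PyTorch is used only to query the device.

```python
def has_native_nvfp4(compute_capability: tuple[int, int]) -> bool:
    """Return True if the compute capability looks like Blackwell or newer.

    Assumption: Blackwell and later GPUs report major version >= 10.
    """
    major, _minor = compute_capability
    return major >= 10


def describe_gpu() -> str:
    """Report whether the first visible CUDA device passes the check."""
    try:
        import torch  # optional dependency, used only for the device query
    except ImportError:
        return "PyTorch is not installed; cannot query the GPU."
    if not torch.cuda.is_available():
        return "No CUDA device is visible."
    cap = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    verdict = "looks compatible" if has_native_nvfp4(cap) else "is likely too old"
    return f"{name} (compute capability {cap[0]}.{cap[1]}) {verdict} for NVFP4."


if __name__ == "__main__":
    print(describe_gpu())
```

Passing this check is necessary but not sufficient: the inference engine and driver stack must also support NVFP4.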

How to Use

Use this model with an inference engine that supports NVFP4 quantized models.

Compatibility depends on your runtime, GPU architecture, and TensorRT/NVIDIA software stack.
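As an illustration only, here is a minimal sketch of loading the checkpoint with vLLM, one engine that can load NVFP4 checkpoints exported by TensorRT Model Optimizer. It has not been verified against this repository. The model path is a placeholder, `build_engine_kwargs` is a hypothetical helper, and the explicit `quantization="modelopt_fp4"` setting and `trust_remote_code=True` are assumptions that may be unnecessary or unsupported depending on your vLLM version.

```python
def build_engine_kwargs(model_path: str) -> dict:
    """Collect the vLLM engine arguments assumed for an NVFP4 checkpoint."""
    return {
        "model": model_path,
        # Assumption: the checkpoint uses the ModelOpt NVFP4 export format.
        "quantization": "modelopt_fp4",
        # Assumption: custom model code may be needed for DeepSeek-V2 models.
        "trust_remote_code": True,
    }


def generate(model_path: str, prompt: str, max_tokens: int = 64) -> str:
    """Run a single prompt through vLLM and return the generated text."""
    # Imported lazily so this module can be inspected without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**build_engine_kwargs(model_path))
    params = SamplingParams(temperature=0.7, max_tokens=max_tokens)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text


# Example call (requires vLLM and a Blackwell GPU; the path is a placeholder):
# print(generate("path/to/this-checkpoint", "Explain NVFP4 in one sentence."))
```

If loading fails, check the engine's documentation for the NVFP4 support status of your GPU and software versions.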

Notes

This is a quantized derivative of DeepSeek-V2-Lite. Accuracy, throughput, and memory usage may differ from the original model depending on the inference engine and hardware used.