DeepSeek-V2-Lite NVFP4 Quantized

This repository contains an NVFP4-quantized version of DeepSeek-V2-Lite, prepared using TensorRT Model Optimizer.

The model has approximately 15.6B total parameters and is quantized to NVIDIA 4-bit floating-point precision (NVFP4).
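As a rough illustration of what 4-bit storage means for weight memory, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate, not a measurement of this checkpoint. It assumes the standard NVFP4 layout (4-bit values plus one 8-bit scale shared by each block of 16 values) and, unrealistically, that every one of the 15.6B parameters is quantized. Quantizers often leave some layers in higher precision, so the real files will likely be larger than this figure. The helper names are made up for the example.

```python
def nvfp4_bits_per_weight(block_size: int = 16, scale_bits: int = 8) -> float:
    """Effective bits per weight: a 4-bit payload plus one scale shared per block."""
    return 4 + scale_bits / block_size


def weight_gigabytes(num_params: float, bits_per_weight: float) -> float:
    """Convert a parameter count and a bit width into gigabytes (10**9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 15.6e9  # total parameter count quoted in this model card

bf16_gb = weight_gigabytes(TOTAL_PARAMS, 16)
# Assumption: all parameters are stored in NVFP4 (best case for savings).
nvfp4_gb = weight_gigabytes(TOTAL_PARAMS, nvfp4_bits_per_weight())

print(f"BF16 weights:  {bf16_gb:.1f} GB")
print(f"NVFP4 weights: {nvfp4_gb:.1f} GB (lower bound, everything quantized)")
```

Under these assumptions the weights shrink by a factor of roughly 3.6 relative to BF16. Activations and the KV cache are not included in either number.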

This is my first quantized model.

Model Details

  • Developed by: Krisakorn Chanthasang
  • Model type: Large Language Model for text generation
  • Languages: English, Chinese
  • License: Apache 2.0
  • Base model: deepseek-ai/DeepSeek-V2-Lite
  • Quantization format: NVFP4
  • Quantization tool: TensorRT Model Optimizer

Base Model Information

For details about the original model architecture, training data, and intended usage, see the official base model page:

https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite

Requirements

This model requires hardware and inference software with NVFP4 support.

Hardware Requirements

  • NVIDIA Blackwell GPU or newer
  • Quantized and tested on: NVIDIA RTX PRO 6000 Blackwell Workstation Edition
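A quick way to check the hardware requirement from Python is to look at the GPU's CUDA compute capability. The sketch below is a heuristic, not an official NVFP4 support test. It assumes that Blackwell-generation GPUs report a major compute capability of 10 or higher (Hopper reports 9.x), and it uses PyTorch only if it happens to be installed. The function names are illustrative.

```python
from typing import Optional, Tuple

# Assumption: Blackwell GPUs report compute capability 10.x or higher.
BLACKWELL_MIN_MAJOR = 10


def looks_like_blackwell_or_newer(capability: Tuple[int, int]) -> bool:
    """Return True if the (major, minor) capability is at or above the Blackwell threshold."""
    major, _minor = capability
    return major >= BLACKWELL_MIN_MAJOR


def detect_capability() -> Optional[Tuple[int, int]]:
    """Query device 0 through PyTorch; return None if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


cap = detect_capability()
if cap is None:
    print("No CUDA device (or no PyTorch) found; cannot check the GPU generation.")
elif looks_like_blackwell_or_newer(cap):
    print(f"Compute capability {cap[0]}.{cap[1]}: looks like Blackwell or newer.")
else:
    print(f"Compute capability {cap[0]}.{cap[1]}: older than Blackwell.")
```

Passing this check does not guarantee that a given inference engine ships NVFP4 kernels for that GPU, so the runtime's own documentation remains the final word.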

How to Use

Use this model with an inference engine that supports NVFP4-quantized checkpoints.

Compatibility depends on your runtime, GPU architecture, and TensorRT/NVIDIA software stack.
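One possible way to load the model is through vLLM, sketched below. This has not been verified against this checkpoint: it assumes a vLLM build that includes the `modelopt_fp4` quantization method, that the method supports this model's mixture-of-experts layers, and that the repository name matches the one on the model page. `build_engine_kwargs` and `generate` are hypothetical helpers written for this example, and vLLM is imported only when `generate` is called.

```python
# Repository name as shown on the Hugging Face model page.
MODEL_ID = "Beambutbetter/Deepseek-V2-Lite-16B-NVFP4"


def build_engine_kwargs(model_id: str = MODEL_ID) -> dict:
    """Collect the arguments passed to vllm.LLM in one place."""
    return {
        "model": model_id,
        # Assumption: the installed vLLM recognizes ModelOpt NVFP4 checkpoints
        # under this method name.
        "quantization": "modelopt_fp4",
        # The base model ships custom configuration code on the Hub.
        "trust_remote_code": True,
    }


def generate(prompt: str, max_tokens: int = 64) -> str:
    """Run one greedy completion. Requires vLLM and a supported GPU."""
    from vllm import LLM, SamplingParams  # imported lazily on purpose

    llm = LLM(**build_engine_kwargs())
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text
```

Calling `generate("Write a haiku about GPUs.")` on a suitable machine would download the weights and return the completion text. If the engine rejects the quantization method, the installed version most likely lacks NVFP4 support for this architecture.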

Notes

This is a quantized derivative of DeepSeek-V2-Lite. Accuracy, throughput, and memory usage may differ from the original model depending on the inference engine and hardware used.
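To give a feel for where the accuracy difference comes from, here is a simplified round-trip through a 4-bit floating-point grid. It is a teaching sketch, not the procedure TensorRT Model Optimizer uses: it keeps the per-block scale as an ordinary Python float, whereas NVFP4 stores that scale in FP8 and adds a per-tensor scale, and it involves no calibration. The function names and the sample weights are invented for the example.

```python
from typing import List

# Magnitudes representable by a 4-bit E2M1 float (sign is stored separately).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # NVFP4 shares one scale across 16 consecutive values


def fake_quantize_block(block: List[float]) -> List[float]:
    """Quantize one block to the E2M1 grid and immediately dequantize it."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Simplification: the scale stays a Python float instead of being rounded to FP8.
    scale = amax / E2M1_GRID[-1]
    restored = []
    for x in block:
        nearest = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        restored.append(nearest * scale if x >= 0 else -nearest * scale)
    return restored


def fake_quantize(values: List[float]) -> List[float]:
    """Apply the block-wise round-trip to a flat list of weights."""
    restored: List[float] = []
    for start in range(0, len(values), BLOCK_SIZE):
        restored.extend(fake_quantize_block(values[start:start + BLOCK_SIZE]))
    return restored


weights = [0.01 * i for i in range(-16, 16)]  # made-up sample values
restored = fake_quantize(weights)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max absolute round-trip error: {max_err:.4f}")
```

The grid is coarsest near the largest magnitude in each block, so a single outlier raises the rounding error for its 15 neighbors. How much that matters for model quality depends on the layer and the task.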
