metadata
license: apache-2.0
language:
  - en
  - zh
base_model:
  - deepseek-ai/DeepSeek-V2-Lite
pipeline_tag: text-generation

DeepSeek-V2-Lite NVFP4 Quantized

This repository contains an NVFP4-quantized version of DeepSeek-V2-Lite, prepared using TensorRT Model Optimizer.

The model has approximately 15.6B total parameters and is quantized to NVIDIA 4-bit floating-point precision (NVFP4).
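
As a rough illustration of what 4-bit storage means for this parameter count, the sketch below estimates weight memory. It assumes the usual NVFP4 layout (4-bit values in blocks of 16 that share one 8-bit scale, so about 4.5 bits per weight) and, unrealistically, that every parameter is quantized. In practice some layers usually stay in higher precision, so the real checkpoint will be larger than this lower bound. The function names are illustrative only.

```python
GIB = 2**30  # bytes per GiB


def nvfp4_weight_gib(num_params: float, block_size: int = 16,
                     value_bits: int = 4, scale_bits: int = 8) -> float:
    """Lower-bound weight size in GiB if every parameter were stored as NVFP4.

    Assumption: each block of `block_size` 4-bit values shares one 8-bit scale,
    giving value_bits + scale_bits / block_size = 4.5 bits per weight.
    The small per-tensor scale is ignored.
    """
    bits_per_weight = value_bits + scale_bits / block_size
    return num_params * bits_per_weight / 8 / GIB


def bf16_weight_gib(num_params: float) -> float:
    """Weight size in GiB at 16 bits per parameter (BF16/FP16)."""
    return num_params * 16 / 8 / GIB


if __name__ == "__main__":
    params = 15.6e9  # total parameter count stated in this model card
    print(f"BF16 : {bf16_weight_gib(params):.1f} GiB")
    print(f"NVFP4: {nvfp4_weight_gib(params):.1f} GiB (lower bound)")
```

Under these assumptions the weights shrink by a factor of about 3.6 relative to 16-bit storage. The KV cache and activations are not included in this estimate.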

This is my first quantized model.

Model Details

  • Developed by: Krisakorn Chanthasang
  • Model type: Large Language Model for text generation
  • Languages: English, Chinese
  • License: Apache 2.0
  • Base model: deepseek-ai/DeepSeek-V2-Lite
  • Quantization format: NVFP4
  • Quantization tool: TensorRT Model Optimizer

Base Model Information

For details about the original model architecture, training data, and intended usage, see the official base model page:

https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite

Requirements

This model requires hardware and inference software with NVFP4 support.

Hardware Requirements

  • NVIDIA Blackwell GPU or newer
  • Quantized and tested on: NVIDIA RTX PRO 6000 Blackwell Workstation Edition
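
The hardware requirement can be checked before downloading the weights. The sketch below is one way to do that, assuming that "Blackwell or newer" corresponds to a CUDA compute capability of 10.0 or higher (Hopper reports 9.0 and Ada reports 8.9). The helper names are hypothetical, and passing this check does not guarantee that a given inference engine ships NVFP4 kernels for your GPU.

```python
from typing import Optional


def supports_nvfp4(major: int, minor: int = 0) -> bool:
    """Return True for compute capability 10.0 or higher.

    Assumption: Blackwell-generation GPUs report major version 10 or above
    (for example 10.x for data-center parts and 12.x for RTX/workstation parts).
    """
    return (major, minor) >= (10, 0)


def local_gpu_supports_nvfp4(device_index: int = 0) -> Optional[bool]:
    """Check the local GPU, or return None if PyTorch/CUDA is unavailable."""
    try:
        import torch  # optional dependency, only needed for the local check
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    major, minor = torch.cuda.get_device_capability(device_index)
    return supports_nvfp4(major, minor)


if __name__ == "__main__":
    result = local_gpu_supports_nvfp4()
    if result is None:
        print("Could not query a CUDA device.")
    else:
        print("NVFP4-capable GPU detected." if result else "GPU predates Blackwell.")
```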

How to Use

Load this model with an inference engine that supports NVFP4-quantized checkpoints.

Whether it runs depends on the engine version, the GPU architecture, and the installed NVIDIA software stack (driver, CUDA, TensorRT).
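
As one possible starting point, here is a minimal sketch using vLLM. It has not been verified against this checkpoint. It assumes a vLLM release that supports Model Optimizer NVFP4 checkpoints through the `modelopt_fp4` quantization method and that supports this model architecture in that format. The repository ID below is a placeholder and must be replaced with this repository's actual ID.

```python
# Placeholder: replace with the actual Hugging Face repo ID of this model.
MODEL_ID = "your-namespace/DeepSeek-V2-Lite-NVFP4"


def build_engine_kwargs(model_id: str) -> dict:
    """Keyword arguments for vllm.LLM (a sketch, not a verified configuration)."""
    return {
        "model": model_id,
        # Assumption: the checkpoint uses the Model Optimizer NVFP4 format.
        # vLLM may also detect this from the checkpoint's own config files.
        "quantization": "modelopt_fp4",
        # Assumption: harmless if the architecture is natively supported.
        "trust_remote_code": True,
    }


def generate(prompt: str, model_id: str = MODEL_ID, max_tokens: int = 64) -> str:
    """Generate a completion; requires vLLM and an NVFP4-capable GPU."""
    from vllm import LLM, SamplingParams  # imported lazily: heavy dependency

    llm = LLM(**build_engine_kwargs(model_id))
    params = SamplingParams(temperature=0.7, max_tokens=max_tokens)
    outputs = llm.generate([prompt], params)
    return outputs[0].outputs[0].text


if __name__ == "__main__":
    print(generate("Explain 4-bit quantization in one sentence."))
```

If loading fails, check the engine's release notes for NVFP4 and DeepSeek-V2 support before assuming the checkpoint is at fault.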

Notes

This is a quantized derivative of DeepSeek-V2-Lite. Accuracy, throughput, and memory usage may differ from the original model depending on the inference engine and hardware used.