DeepSeek-V4-Flash Dynamic IQ1_M GGUF

This repository contains a GGUF-quantized checkpoint of DeepSeek-V4-Flash, produced with a dynamic routed-MoE IQ1_M recipe.

Files

dsv4-dynamic-iq1m-antirez.gguf — full dynamic IQ1_M GGUF checkpoint.
metadata/dsv4_dynamic_iq1m_complete.tensor_types.txt — exact tensor-type recipe used for quantization.
checksums.txt — SHA256 checksums for the uploaded artifact and metadata.
logs/quantize_antirez_q8base.log — final quantization log.
logs/quantize_antirez_dryrun_q8base.log — dry-run quantization log.

The imatrix/calibration file is intentionally not included in this upload.

Quantization recipe

The routed expert recipe is:

ffn_gate_exps: all 43 layers -> iq1_m
ffn_up_exps: all 43 layers -> iq1_m
ffn_down_exps: layers 0-5 -> q2_k
ffn_down_exps: layers 6-42 -> iq1_m
Non-routed tensors keep the types listed in the complete tensor-type recipe (metadata/dsv4_dynamic_iq1m_complete.tensor_types.txt).
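
As a rough illustration of the routed-expert rules above, the following Python sketch expands them into a per-tensor mapping and counts the resulting types. The `blk.{layer}.<name>.weight` naming is an assumption based on common llama.cpp conventions, and `routed_expert_types` is a hypothetical helper; the recipe file under metadata/ remains the authoritative source.

```python
from collections import Counter

N_LAYERS = 43  # layer count stated in the recipe above


def routed_expert_types(n_layers: int = N_LAYERS) -> dict[str, str]:
    """Map each routed-expert tensor name to its target quantization type."""
    types: dict[str, str] = {}
    for layer in range(n_layers):
        # Assumed llama.cpp-style tensor names; the real names are in the recipe file.
        types[f"blk.{layer}.ffn_gate_exps.weight"] = "iq1_m"
        types[f"blk.{layer}.ffn_up_exps.weight"] = "iq1_m"
        # Only the down projections of layers 0-5 stay at q2_k.
        types[f"blk.{layer}.ffn_down_exps.weight"] = "q2_k" if layer <= 5 else "iq1_m"
    return types


counts = Counter(routed_expert_types().values())
print(counts["iq1_m"], counts["q2_k"])  # prints: 123 6
```

The two totals (43 + 43 + 37 = 123 and 6) match the iq1_m and q2_k rows of the dtype distribution reported in this README.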

The final validated dtype distribution was:

f16:   359 tensors
f32:   492 tensors
i32:     3 tensors
iq1_m: 123 tensors
q2_k:    6 tensors
q8_0:  345 tensors
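
One way to re-check this distribution locally is to read the tensor types from the file and compare them with the table. The sketch below is not the validation script used for this upload; it assumes the `gguf` Python package from llama.cpp is installed and that the installed version recognizes the IQ1_M type. `tensor_type_names` and `histogram_diff` are hypothetical helper names.

```python
from collections import Counter

# Distribution reported in this README.
EXPECTED = {"f16": 359, "f32": 492, "i32": 3, "iq1_m": 123, "q2_k": 6, "q8_0": 345}


def tensor_type_names(path: str) -> list[str]:
    """Return the lower-cased ggml type name of every tensor in a GGUF file."""
    from gguf import GGUFReader  # lazy import; assumes `pip install gguf`

    reader = GGUFReader(path)
    return [t.tensor_type.name.lower() for t in reader.tensors]


def histogram_diff(type_names, expected=EXPECTED) -> dict[str, tuple[int, int]]:
    """Return {type: (found, expected)} for every type whose count differs."""
    found = Counter(type_names)
    return {
        name: (found.get(name, 0), expected.get(name, 0))
        for name in sorted(set(found) | set(expected))
        if found.get(name, 0) != expected.get(name, 0)
    }
```

Calling `histogram_diff(tensor_type_names("dsv4-dynamic-iq1m-antirez.gguf"))` should return an empty dict if the file matches the table; any non-empty result lists the types whose counts disagree.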

The checkpoint was quantized from a full routed-F16 GGUF source with llama.cpp's llama-quantize, using an antirez imatrix. The final command passes Q8_0 as the positional base type only to activate the complete per-tensor overrides; the trailing 32 is the thread count:

llama-quantize \
  --imatrix .tmp/DeepSeek-V4-Flash-chat-v2-routed-moe-ds4-1p5m.dat \
  --tensor-type-file dsv4_dynamic_iq1m_complete.tensor_types.txt \
  .tmp/dsv4-source-f16-full-routed.gguf \
  .tmp/dsv4-dynamic-iq1m-antirez.gguf \
  Q8_0 32

Runtime status

This checkpoint is intended for ongoing development of device-side IQ1_M routed-MoE inference.

Current local validation showed:

GGUF dtype/recipe validation passed.
Loader smoke test passed with routed raw expert binding.
IQ1_M CPU/reference reader sanity passed with finite outputs.
Short prefill cross-entropy (CE) and rank diagnostics produced finite logits, but quality is weaker than the Q2 baseline.

Important caveat: until IQ1_M routed-MoE CUDA/native operators are implemented, runtimes that do not support IQ1_M raw blocks on device may fall back to a very slow CPU/Python reference path.

Checksums

See checksums.txt.
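
A minimal sketch for verifying a download against that file follows. It assumes checksums.txt uses the usual `sha256sum` layout (one `<hex digest>  <relative path>` pair per line), which this README does not confirm; `sha256_of` and `verify` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a multi-gigabyte GGUF never has to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(checksum_file: Path) -> dict[str, bool]:
    """Check every '<hex digest>  <relative path>' line (assumed sha256sum layout)."""
    results: dict[str, bool] = {}
    for raw in checksum_file.read_text().splitlines():
        line = raw.strip()
        if not line:
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with a leading '*'
        # Paths are resolved relative to the directory holding the checksum file.
        results[name] = sha256_of(checksum_file.parent / name) == expected.lower()
    return results
```

Running `verify(Path("checksums.txt"))` from the repository root returns a mapping from each listed file to True or False. If the file is in `sha256sum` format, `sha256sum -c checksums.txt` performs the same check from the shell.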

License and attribution

This is a derived quantized checkpoint of deepseek-ai/DeepSeek-V4-Flash. Please follow the license and usage terms of the original model.
