DeepSeek-V4-Flash Dynamic IQ1_M GGUF
This repository contains a GGUF quantized checkpoint for DeepSeek-V4-Flash using a dynamic routed-MoE IQ1_M recipe.
Files
- dsv4-dynamic-iq1m-antirez.gguf: full dynamic IQ1_M GGUF checkpoint.
- metadata/dsv4_dynamic_iq1m_complete.tensor_types.txt: exact tensor-type recipe used for quantization.
- checksums.txt: SHA256 checksums for the uploaded artifact and metadata.
- logs/quantize_antirez_q8base.log: final quantization log.
- logs/quantize_antirez_dryrun_q8base.log: dry-run quantization log.
The imatrix/calibration file is intentionally not included in this upload.
Quantization recipe
The routed expert recipe is:
- ffn_gate_exps: all 43 layers -> iq1_m
- ffn_up_exps: all 43 layers -> iq1_m
- ffn_down_exps: layers 0-5 -> q2_k
- ffn_down_exps: layers 6-42 -> iq1_m
- non-routed tensors are kept according to the complete tensor-type recipe.
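The routed part of the recipe can be written out as a small Python sketch. The tensor names (blk.N.ffn_*_exps.weight) follow the usual llama.cpp GGUF naming and are an assumption here; the authoritative names and types are the ones in the tensor-type recipe file, whose exact on-disk format this sketch does not try to reproduce.

```python
from collections import Counter

N_LAYERS = 43                    # layer count stated in the recipe
Q2K_DOWN_LAYERS = range(0, 6)    # layers 0-5 keep ffn_down_exps at q2_k


def routed_expert_types(n_layers=N_LAYERS):
    """Map assumed routed-expert tensor names to the type named in the recipe."""
    types = {}
    for layer in range(n_layers):
        # Tensor names are hypothetical; check them against the recipe file.
        types[f"blk.{layer}.ffn_gate_exps.weight"] = "iq1_m"
        types[f"blk.{layer}.ffn_up_exps.weight"] = "iq1_m"
        down_type = "q2_k" if layer in Q2K_DOWN_LAYERS else "iq1_m"
        types[f"blk.{layer}.ffn_down_exps.weight"] = down_type
    return types


counts = Counter(routed_expert_types().values())
print(counts)  # Counter({'iq1_m': 123, 'q2_k': 6})
```

The counts follow directly from the recipe: 43 + 43 + 37 = 123 tensors at iq1_m and 6 at q2_k.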
The final validated dtype distribution was:
f16: 359 tensors
f32: 492 tensors
i32: 3 tensors
iq1_m: 123 tensors
q2_k: 6 tensors
q8_0: 345 tensors
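A distribution like this can be re-checked locally. The sketch below compares observed tensor-type names against the counts listed above; diff_distribution and gguf_type_names are hypothetical helper names, and the reader part assumes the gguf Python package from llama.cpp is installed. Whether a given gguf version reads this particular checkpoint has not been verified here.

```python
from collections import Counter

# Counts copied from the validated distribution above (1328 tensors in total).
EXPECTED = {"f16": 359, "f32": 492, "i32": 3, "iq1_m": 123, "q2_k": 6, "q8_0": 345}


def diff_distribution(type_names, expected=EXPECTED):
    """Return {dtype: (observed, expected)} for every dtype whose count differs."""
    observed = Counter(name.lower() for name in type_names)
    result = {}
    for dtype in sorted(set(observed) | set(expected)):
        seen, want = observed.get(dtype, 0), expected.get(dtype, 0)
        if seen != want:
            result[dtype] = (seen, want)
    return result


def gguf_type_names(path):
    """Yield the type name of each tensor in a GGUF file (needs the `gguf` package)."""
    from gguf import GGUFReader  # imported lazily so the checker works without it

    reader = GGUFReader(path)
    for tensor in reader.tensors:
        yield tensor.tensor_type.name  # enum name such as "IQ1_M"
```

Calling diff_distribution(gguf_type_names("dsv4-dynamic-iq1m-antirez.gguf")) should return an empty dict if the file matches the list above.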
The checkpoint was quantized from a full routed-F16 GGUF source with llama.cpp's llama-quantize and an antirez imatrix. The final command passes Q8_0 as the positional base type only to activate the complete per-tensor overrides; the trailing 32 is the thread count:
llama-quantize \
--imatrix .tmp/DeepSeek-V4-Flash-chat-v2-routed-moe-ds4-1p5m.dat \
--tensor-type-file dsv4_dynamic_iq1m_complete.tensor_types.txt \
.tmp/dsv4-source-f16-full-routed.gguf \
.tmp/dsv4-dynamic-iq1m-antirez.gguf \
Q8_0 32
Runtime status
This checkpoint is intended for ongoing development of device-side IQ1_M routed-MoE inference.
Current local validation showed:
- GGUF dtype/recipe validation passed.
- Loader smoke passed with routed raw expert binding.
- IQ1_M CPU/reference reader sanity passed with finite outputs.
- Short prefill CE/rank diagnostics produced finite logits, but quality is weaker than the Q2 baseline.
Important caveat: until IQ1_M routed-MoE CUDA/native operators are implemented, runtimes that do not support IQ1_M raw blocks on device may fall back to a very slow CPU/Python reference path.
Checksums
See checksums.txt.
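A download can be verified against that file with a few lines of Python. This is a minimal sketch that assumes checksums.txt uses the common sha256sum layout (hex digest, whitespace, relative path); the layout of the actual file is not confirmed here, and verify_checksums is a hypothetical helper name.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large GGUF files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(checksum_file, root="."):
    """Return the listed paths whose SHA256 digest does not match.

    Assumes sha256sum-style lines: "<hex digest>  <relative path>".
    """
    mismatches = []
    for line in Path(checksum_file).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum prefixes binary-mode names with '*'
        if sha256_of(Path(root) / name) != expected.lower():
            mismatches.append(name)
    return mismatches
```

An empty list from verify_checksums("checksums.txt") means every listed file matched its recorded digest.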
License and attribution
This is a derived quantized checkpoint of deepseek-ai/DeepSeek-V4-Flash. Please follow the license and usage terms of the original model.