DeepSeek-V4-Flash Dynamic IQ1_M GGUF

This repository contains a GGUF quantized checkpoint for DeepSeek-V4-Flash using a dynamic routed-MoE IQ1_M recipe.

Files

  • dsv4-dynamic-iq1m-antirez.gguf: full dynamic IQ1_M GGUF checkpoint.
  • metadata/dsv4_dynamic_iq1m_complete.tensor_types.txt: exact tensor-type recipe used for quantization.
  • checksums.txt: SHA256 checksums for the uploaded artifact and metadata.
  • logs/quantize_antirez_q8base.log: final quantization log.
  • logs/quantize_antirez_dryrun_q8base.log: dry-run quantization log.

The imatrix/calibration file is intentionally not included in this upload.

Quantization recipe

The routed expert recipe is:

  • ffn_gate_exps: all 43 layers -> iq1_m
  • ffn_up_exps: all 43 layers -> iq1_m
  • ffn_down_exps: layers 0-5 -> q2_k
  • ffn_down_exps: layers 6-42 -> iq1_m
  • non-routed tensors: types as listed in the complete tensor-type recipe (metadata/dsv4_dynamic_iq1m_complete.tensor_types.txt).

The final validated dtype distribution was:

f16:   359 tensors
f32:   492 tensors
i32:     3 tensors
iq1_m: 123 tensors
q2_k:    6 tensors
q8_0:  345 tensors
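
As a quick cross-check, the routed-expert part of this tally can be re-derived from the recipe. The sketch below is illustrative only: the function names are hypothetical, and it assumes one tensor per layer for each of the three routed-expert kinds. Under that assumption the recipe yields 123 iq1_m and 6 q2_k tensors, which matches the distribution above and suggests that no non-routed tensor uses either type.

```python
from collections import Counter

N_LAYERS = 43  # layer count stated in the routed expert recipe
ROUTED_KINDS = ("ffn_gate_exps", "ffn_up_exps", "ffn_down_exps")


def routed_expert_type(kind: str, layer: int) -> str:
    """Return the type the recipe assigns to one routed-expert tensor."""
    if kind in ("ffn_gate_exps", "ffn_up_exps"):
        return "iq1_m"
    if kind == "ffn_down_exps":
        # Layers 0-5 keep the down projection at q2_k; layers 6-42 use iq1_m.
        return "q2_k" if layer <= 5 else "iq1_m"
    raise ValueError(f"not a routed-expert tensor kind: {kind}")


def expected_routed_counts(n_layers: int = N_LAYERS) -> Counter:
    """Tally routed-expert tensors per type, assuming one tensor per kind per layer."""
    counts = Counter()
    for layer in range(n_layers):
        for kind in ROUTED_KINDS:
            counts[routed_expert_type(kind, layer)] += 1
    return counts


if __name__ == "__main__":
    print(dict(expected_routed_counts()))
```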

The checkpoint was quantized from a full routed-F16 GGUF source with the llama-quantize tool from llama.cpp, using an antirez imatrix. The final command passes Q8_0 as the positional base type only to activate the complete per-tensor overrides; the trailing 32 is the thread count:

llama-quantize \
  --imatrix .tmp/DeepSeek-V4-Flash-chat-v2-routed-moe-ds4-1p5m.dat \
  --tensor-type-file dsv4_dynamic_iq1m_complete.tensor_types.txt \
  .tmp/dsv4-source-f16-full-routed.gguf \
  .tmp/dsv4-dynamic-iq1m-antirez.gguf \
  Q8_0 32

Runtime status

This checkpoint is intended for ongoing development of device-side IQ1_M routed-MoE inference.

Current local validation showed:

  • GGUF dtype/recipe validation passed.
  • Loader smoke test passed with routed raw expert binding.
  • IQ1_M CPU/reference reader sanity check passed with finite outputs.
  • Short prefill CE/rank diagnostics produced finite logits, but quality is weaker than the Q2 baseline.
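
The dtype validation step can be approximated locally. This is a minimal sketch, not the validation script used for this upload: `diff_tally` and `tally_from_gguf` are hypothetical helpers, and `tally_from_gguf` assumes the optional `gguf` Python package from llama.cpp is installed and that its dtype names, lowercased, match the labels used in this README.

```python
from collections import Counter
from typing import Mapping

# Dtype distribution reported in this README.
PUBLISHED = {"f16": 359, "f32": 492, "i32": 3, "iq1_m": 123, "q2_k": 6, "q8_0": 345}


def diff_tally(observed: Mapping[str, int],
               expected: Mapping[str, int] = PUBLISHED) -> dict:
    """Return {dtype: (observed, expected)} for every dtype whose counts differ."""
    keys = sorted(set(observed) | set(expected))
    return {
        k: (observed.get(k, 0), expected.get(k, 0))
        for k in keys
        if observed.get(k, 0) != expected.get(k, 0)
    }


def tally_from_gguf(path: str) -> Counter:
    """Count tensors per dtype in a GGUF file (assumes `pip install gguf`)."""
    from gguf import GGUFReader  # imported lazily so diff_tally works without it

    reader = GGUFReader(path)
    return Counter(t.tensor_type.name.lower() for t in reader.tensors)


# Example usage, once the checkpoint is downloaded:
#   mismatches = diff_tally(tally_from_gguf("dsv4-dynamic-iq1m-antirez.gguf"))
#   An empty dict means the file matches the published distribution.
```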

Important caveat: until IQ1_M routed-MoE CUDA/native operators are implemented, runtimes that do not support IQ1_M raw blocks on device may fall back to a very slow CPU/Python reference path.

Checksums

See checksums.txt.
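
A downloaded copy can be checked against that file with a few lines of Python. The sketch below assumes checksums.txt uses the common `sha256sum` layout (one `<hex digest>  <relative path>` entry per line), which this README does not confirm; the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a multi-gigabyte GGUF never has to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def parse_checksums(text: str) -> dict:
    """Parse '<hex digest>  <path>' lines (assumed sha256sum format) into {path: digest}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, name = line.split(maxsplit=1)
        # sha256sum marks binary-mode entries with a leading '*'.
        entries[name.lstrip("*")] = digest.lower()
    return entries


def verify(checksum_file, root=".") -> dict:
    """Return {path: True/False} for every entry listed in the checksum file."""
    root = Path(root)
    entries = parse_checksums(Path(checksum_file).read_text())
    return {name: sha256_of(root / name) == digest for name, digest in entries.items()}


# Example usage from the repository root:
#   for name, ok in verify("checksums.txt").items():
#       print("OK" if ok else "MISMATCH", name)
```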

License and attribution

This is a derived quantized checkpoint of deepseek-ai/DeepSeek-V4-Flash. Please follow the license and usage terms of the original model.
