DeepSeek V4 Flash IQ2_XXS SSD Sidecar for ds4-ssd

This repository contains a prebuilt SSD-streaming sidecar package for Anemll/ds4-ssd, an alpha fork of antirez's DwarfStar 4 (ds4) runtime for DeepSeek V4 Flash.

This is not a standalone Transformers model. It is meant to be used with ds4-ssd sidecar mode: dense tensors are stored in dense/model-dense.gguf, while routed-MoE expert tensors are stored in layer-major sidecar files and streamed from SSD through DS4's slot-bank cache.

Contents

dsv4-iq2xxs-expert-major/
  manifest.json
  dense/
    model-dense.gguf
    flashmoe-package.json
  layer_000.bin
  ...
  layer_042.bin

• Sidecar layout: layer_major_expert
• Architecture: deepseek4
• Expert count: 256
• Active routed experts per token: 6
• Expert quantization: IQ2_XXS / Q2_K, as recorded in manifest.json
• Dense model: GGUF at dense/model-dense.gguf
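The expert counts above help explain why SSD streaming is practical: each token touches only a small slice of each layer's routed experts. The quick calculation below uses only the two counts listed here. It ignores the dense tensors and any expert reuse across tokens, which the slot-bank cache may be able to exploit, so treat it as a rough per-layer figure rather than a measured I/O rate.

```python
# Per-layer share of routed experts needed by one token, using the counts
# listed for this package (256 experts, 6 active per token).
from fractions import Fraction

EXPERT_COUNT = 256
ACTIVE_PER_TOKEN = 6

share = Fraction(ACTIVE_PER_TOKEN, EXPERT_COUNT)
print(f"{share} of routed experts per layer per token ({float(share):.2%})")
```

This prints 3/128, roughly 2.34% of each layer's routed experts per token.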

Download

hf download anemll/dsv4-iq2xxs-expert-major \
  --local-dir /path/to/dsv4-iq2xxs-expert-major

The package is large. Place it on a fast local SSD, because routed-expert tensors are read from disk during inference.
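After the download finishes, checking that every file from the Contents listing arrived can save a confusing startup failure. A minimal sketch follows; missing_sidecar_files is a hypothetical helper, not part of ds4-ssd. It assumes the 43 layer files layer_000.bin through layer_042.bin shown in the listing, checks only that files exist and that the dense GGUF starts with the standard "GGUF" magic bytes, and does not validate manifest.json or the sidecar data itself.

```python
# Hypothetical helper: verify a downloaded sidecar directory contains the
# files from the package listing. Presence and GGUF magic only.
from pathlib import Path

# layer_000.bin ... layer_042.bin, as listed in the package contents
LAYER_FILES = [f"layer_{i:03d}.bin" for i in range(43)]
REQUIRED = [
    "manifest.json",
    "dense/model-dense.gguf",
    "dense/flashmoe-package.json",
    *LAYER_FILES,
]


def missing_sidecar_files(sidecar_dir: str) -> list[str]:
    """Return missing or malformed paths (an empty list means the layout looks complete)."""
    root = Path(sidecar_dir)
    problems = [rel for rel in REQUIRED if not (root / rel).is_file()]
    dense = root / "dense" / "model-dense.gguf"
    if dense.is_file():
        with dense.open("rb") as f:
            # GGUF files begin with the 4-byte magic b"GGUF"
            if f.read(4) != b"GGUF":
                problems.append("dense/model-dense.gguf (bad GGUF magic)")
    return problems
```

For example, missing_sidecar_files("/path/to/dsv4-iq2xxs-expert-major") should return an empty list for a complete download.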

Run with ds4-ssd

git clone https://github.com/Anemll/ds4-ssd
cd ds4-ssd
make

export DS4_SIDECAR_DIR=/path/to/dsv4-iq2xxs-expert-major

./ds4 \
  -m "$DS4_SIDECAR_DIR/dense/model-dense.gguf" \
  --moe-sidecar "$DS4_SIDECAR_DIR" \
  --moe-mode slot-bank \
  --moe-slot-bank 8 \
  --ctx 9000 \
  -p "Hello"

--ctx is the KV window. DS4_METAL_PREFILL_CHUNK= is the prefill chunk cap used by the alpha sidecar path. Leave DS4_METAL_GRAPH_RAW_CAP unset so DS4 can auto-size the raw KV graph cap.
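If you launch ./ds4 from a script, a DS4_METAL_GRAPH_RAW_CAP exported earlier in the shell can silently carry over. A minimal sketch, assuming you start ds4 through Python's subprocess module; sidecar_env is a hypothetical helper, not part of ds4-ssd. It deliberately does not set DS4_METAL_PREFILL_CHUNK, because this README does not give a value for it.

```python
# Hypothetical launcher helper: build an environment for ./ds4 in which
# DS4_METAL_GRAPH_RAW_CAP is unset, so DS4 can auto-size the raw KV graph cap.
import os
from typing import Dict, Mapping, Optional


def sidecar_env(sidecar_dir: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    env = dict(os.environ if base is None else base)
    # Same variable the shell example exports.
    env["DS4_SIDECAR_DIR"] = sidecar_dir
    # Drop any inherited override; absent means DS4 chooses the cap itself.
    env.pop("DS4_METAL_GRAPH_RAW_CAP", None)
    return env
```

Pass the result as env= to subprocess.run when starting ./ds4.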

To verify that SSD streaming is active, startup logs should include:

applied sidecar tuning profile
Flash-MoE sidecar loaded
Flash-MoE slot banks allocated
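The check above can be automated once the startup output is captured (for example by redirecting both stdout and stderr of ./ds4 to a file, since this README does not say which stream carries these lines). A minimal sketch; missing_markers is a hypothetical helper, and it uses substring matching because the exact prefixes of the log lines are not documented here.

```python
# Hypothetical check: report which SSD-streaming startup markers are absent
# from captured ds4 output. Marker strings are copied from this README.
EXPECTED_MARKERS = (
    "applied sidecar tuning profile",
    "Flash-MoE sidecar loaded",
    "Flash-MoE slot banks allocated",
)


def missing_markers(log_text: str) -> list[str]:
    """Return the expected markers not found anywhere in log_text."""
    return [m for m in EXPECTED_MARKERS if m not in log_text]
```

An empty result suggests sidecar mode came up; any returned marker points at the stage that did not.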

If you pass only a full resident GGUF with -m, DS4 is not in SSD-streaming mode. Sidecar mode requires both -m "$DS4_SIDECAR_DIR/dense/model-dense.gguf" and --moe-sidecar "$DS4_SIDECAR_DIR".
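Because sidecar mode needs both flags, deriving the -m path from the sidecar directory avoids accidentally pointing -m at a full resident GGUF. A minimal sketch; ds4_sidecar_argv is a hypothetical helper, and its flags and defaults are copied from the example command above with nothing else assumed about the ds4 CLI.

```python
# Hypothetical helper: assemble the sidecar-mode ds4 command so that -m and
# --moe-sidecar always refer to the same package directory.
import os


def ds4_sidecar_argv(
    sidecar_dir: str,
    prompt: str,
    ctx: int = 9000,
    slot_bank: int = 8,
    binary: str = "./ds4",
) -> list[str]:
    dense = os.path.join(sidecar_dir, "dense", "model-dense.gguf")
    return [
        binary,
        "-m", dense,                   # dense tensors only
        "--moe-sidecar", sidecar_dir,  # routed experts streamed from SSD
        "--moe-mode", "slot-bank",
        "--moe-slot-bank", str(slot_bank),
        "--ctx", str(ctx),             # KV window
        "-p", prompt,
    ]
```

From the ds4-ssd checkout, subprocess.run(ds4_sidecar_argv("/path/to/dsv4-iq2xxs-expert-major", "Hello"), check=True) would mirror the shell example.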

Smoke Test

From the ds4-ssd checkout:

DS4_SIDECAR_DIR=/path/to/dsv4-iq2xxs-expert-major make sidecar-smoke

The alpha smoke test runs a 16K prefill prompt and generates one deterministic token.

Provenance

This sidecar was created from the DeepSeek V4 Flash GGUF model distributed by antirez/deepseek-v4-gguf for use with the ds4-ssd runtime. Use this artifact in accordance with the upstream model and GGUF distribution terms.

For runtime documentation, build instructions, and release notes, see Anemll/ds4-ssd.
