DeepSeek V4 Flash IQ2_XXS SSD Sidecar for ds4-ssd
This repository contains a prebuilt SSD-streaming sidecar package for
Anemll/ds4-ssd, an alpha fork of antirez's
DwarfStar 4 (ds4) runtime for DeepSeek V4 Flash.
This is not a standalone Transformers model. It is meant to be used with
ds4-ssd sidecar mode: dense tensors are stored in dense/model-dense.gguf,
while routed-MoE expert tensors are stored in layer-major sidecar files and
streamed from SSD through DS4's slot-bank cache.
Contents
dsv4-iq2xxs-expert-major/
  manifest.json
  dense/
    model-dense.gguf
  flashmoe-package.json
  layer_000.bin
  ...
  layer_042.bin
- Sidecar layout: layer_major_expert
- Architecture: deepseek4
- Expert count: 256
- Active routed experts per token: 6
- Expert quantization: IQ2_XXS / Q2_K, as recorded in manifest.json
- Dense model: GGUF under dense/model-dense.gguf
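Only 6 of the 256 routed experts are active per token. That means each MoE layer needs just a small slice of its routed-expert weights for any given token, which is what makes streaming those weights from SSD practical.

The shell sketch below works through the arithmetic. It makes a few assumptions:

- active_expert_fraction is an illustrative helper, not part of ds4-ssd.
- It counts routed experts only, not the dense tensors in model-dense.gguf.
- It assumes all routed experts in a layer are the same size.

```shell
#!/usr/bin/env bash
# Illustrative helper: percentage of routed experts touched per token in one
# MoE layer. Assumes every routed expert in a layer has the same size and
# ignores the dense tensors stored in dense/model-dense.gguf.
active_expert_fraction() {
  local active=$1 total=$2
  awk -v a="$active" -v t="$total" 'BEGIN { printf "%.2f%%\n", 100 * a / t }'
}

# Values from the list above: 6 active routed experts out of 256.
active_expert_fraction 6 256
```

Under those assumptions, each layer reads about 2.34% of its routed-expert weights for a given token.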
Download
hf download anemll/dsv4-iq2xxs-expert-major \
--local-dir /path/to/dsv4-iq2xxs-expert-major
The package is large, so place it on a fast local SSD.
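After the download finishes, a quick layout check can catch an incomplete transfer before you start ds4-ssd. This is a hypothetical helper, not part of ds4-ssd. It assumes the following layout, based on the listing under Contents:

- manifest.json sits at the package root.
- The layer files run contiguously from layer_000.bin through layer_042.bin, also at the root.
- The dense model is at dense/model-dense.gguf.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ds4-ssd): sanity-check a downloaded package.
# Assumes manifest.json and layer_000.bin..layer_042.bin sit at the package root
# and the dense GGUF is at dense/model-dense.gguf, as listed under "Contents".
check_sidecar_package() {
  local dir=$1 missing=0 f i
  # Package manifest and dense model.
  for f in manifest.json dense/model-dense.gguf; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  # Layer-major expert files, assumed contiguous from 000 through 042.
  i=0
  while [ "$i" -le 42 ]; do
    f=$(printf 'layer_%03d.bin' "$i")
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
    i=$((i + 1))
  done
  return "$missing"
}
```

Usage: check_sidecar_package /path/to/dsv4-iq2xxs-expert-major && echo "package layout OK"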
Run with ds4-ssd
git clone https://github.com/Anemll/ds4-ssd
cd ds4-ssd
make
export DS4_SIDECAR_DIR=/path/to/dsv4-iq2xxs-expert-major
./ds4 \
-m "$DS4_SIDECAR_DIR/dense/model-dense.gguf" \
--moe-sidecar "$DS4_SIDECAR_DIR" \
--moe-mode slot-bank \
--moe-slot-bank 8 \
--ctx 9000 \
-p "Hello"
--ctx is the KV window. DS4_METAL_PREFILL_CHUNK= is the prefill
chunk cap used by the alpha sidecar path. Leave DS4_METAL_GRAPH_RAW_CAP unset
so DS4 can auto-size the raw KV graph cap.
To verify that SSD streaming is active, check that the startup logs include:
applied sidecar tuning profile
Flash-MoE sidecar loaded
Flash-MoE slot banks allocated
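To script this check, capture the startup output to a file and search it for the three lines above. This is a minimal sketch. check_sidecar_log is a hypothetical helper name, and the sketch assumes you capture both stdout and stderr from ./ds4 (for example with 2>&1 | tee ds4.log), since this card does not say which stream the messages go to.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check a captured ds4 startup log for the sidecar markers.
# Capture the log yourself first, e.g.: ./ds4 ... 2>&1 | tee ds4.log
check_sidecar_log() {
  local log=$1 status=0 marker
  for marker in \
    "applied sidecar tuning profile" \
    "Flash-MoE sidecar loaded" \
    "Flash-MoE slot banks allocated"; do
    # -F: match the marker as a fixed string; -q: report only via exit status.
    if ! grep -qF "$marker" "$log"; then
      echo "not found: $marker" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: check_sidecar_log ds4.log && echo "SSD streaming active"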
If you pass only a full resident GGUF with -m, DS4 is not in SSD-streaming
mode. Sidecar mode requires both -m "$DS4_SIDECAR_DIR/dense/model-dense.gguf"
and --moe-sidecar "$DS4_SIDECAR_DIR".
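Sidecar mode needs both flags, so it can help to wrap the launch in a small shell function that always passes them together. This is a hypothetical helper, not part of ds4-ssd; run_ds4_sidecar is an illustrative name. The sketch makes these choices:

- It assumes it is run from the ds4-ssd checkout, so that ./ds4 resolves.
- It reuses the --moe-mode slot-bank --moe-slot-bank 8 values from the example above.
- It clears DS4_METAL_GRAPH_RAW_CAP with env -u, so DS4 can auto-size the raw KV graph cap.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: always launch ds4 in sidecar mode by passing both
# -m (dense GGUF) and --moe-sidecar. DS4_METAL_GRAPH_RAW_CAP is removed from
# the child's environment so DS4 auto-sizes the raw KV graph cap.
# The slot-bank values mirror the "Run with ds4-ssd" example.
run_ds4_sidecar() {
  local dir=${DS4_SIDECAR_DIR-}
  if [ -z "$dir" ]; then
    echo "DS4_SIDECAR_DIR is not set" >&2
    return 1
  fi
  local dense="$dir/dense/model-dense.gguf"
  if [ ! -f "$dense" ]; then
    echo "dense GGUF not found: $dense" >&2
    return 1
  fi
  env -u DS4_METAL_GRAPH_RAW_CAP ./ds4 \
    -m "$dense" \
    --moe-sidecar "$dir" \
    --moe-mode slot-bank \
    --moe-slot-bank 8 \
    "$@"
}
```

Usage, from the ds4-ssd checkout with DS4_SIDECAR_DIR exported: run_ds4_sidecar --ctx 9000 -p "Hello". Any extra arguments are passed through to ./ds4 unchanged.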
Smoke Test
From the ds4-ssd checkout:
DS4_SIDECAR_DIR=/path/to/dsv4-iq2xxs-expert-major make sidecar-smoke
The alpha smoke test uses a 16K prefill prompt and generates one deterministic token.
Provenance
This sidecar was created from the DeepSeek V4 Flash GGUF model distributed by
antirez/deepseek-v4-gguf for
use with the ds4-ssd runtime. Use this artifact according to the upstream model
and GGUF distribution terms.
For runtime documentation, build instructions, and release notes, see Anemll/ds4-ssd.