How to use from SGLang
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "RemySkye/distil-qwen35-4b-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RemySkye/distil-qwen35-4b-GGUF",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "RemySkye/distil-qwen35-4b-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "RemySkye/distil-qwen35-4b-GGUF",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
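
The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server started above is reachable at localhost:30000 and that the response follows the usual OpenAI-style completions shape (`choices[0].text`). The helper names `build_completion_request` and `complete` are hypothetical, introduced here for illustration.

```python
import json
import urllib.request

# Values copied from the curl example above.
BASE_URL = "http://localhost:30000"
MODEL = "RemySkye/distil-qwen35-4b-GGUF"


def build_completion_request(prompt, max_tokens=512, temperature=0.5):
    """Build the POST request for /v1/completions (hypothetical helper)."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, timeout=60):
    """Send the request and return the first completion's text.

    Assumes an OpenAI-style response body: {"choices": [{"text": ...}]}.
    """
    request = build_completion_request(prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


# Usage, once the server is running:
# print(complete("Once upon a time,"))
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.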

distil-qwen35-4b - GGUF

Static quantizations of distil-qwen35-4b in GGUF format.

Available Quantizations

Approximate bits per weight (BPW) and file size in decimal GB, ordered roughly from highest precision to lowest.

| File | Approx. BPW | Approx. size (GB) |
| --- | --- | --- |
| distil-qwen35-4b-bf16.gguf | 16.00 | 8.42 |
| distil-qwen35-4b-q8_0.gguf | 8.51 | 4.48 |
| distil-qwen35-4b-q6_k.gguf | 6.57 | 3.46 |
| distil-qwen35-4b-q5_1.gguf | 6.09 | 3.21 |
| distil-qwen35-4b-q5_k_m.gguf | 5.83 | 3.07 |
| distil-qwen35-4b-q5_0.gguf | 5.67 | 2.99 |
| distil-qwen35-4b-q4_k_m.gguf | 5.13 | 2.71 |
| distil-qwen35-4b-q4_1.gguf | 5.24 | 2.77 |
| distil-qwen35-4b-q4_0.gguf | 4.82 | 2.54 |
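
The BPW and size columns are related by simple arithmetic: file size in bytes is roughly the parameter count times bits per weight, divided by 8. The sketch below checks the listed figures against that relation. It assumes the parameter count can be backed out from the bf16 row (16 bits per weight) and ignores file metadata; `estimate_size_gb` is a hypothetical helper, not part of any library.

```python
# (file, approx. BPW, listed size in decimal GB), copied from the table above.
QUANTS = [
    ("distil-qwen35-4b-bf16.gguf", 16.00, 8.42),
    ("distil-qwen35-4b-q8_0.gguf", 8.51, 4.48),
    ("distil-qwen35-4b-q6_k.gguf", 6.57, 3.46),
    ("distil-qwen35-4b-q5_1.gguf", 6.09, 3.21),
    ("distil-qwen35-4b-q5_k_m.gguf", 5.83, 3.07),
    ("distil-qwen35-4b-q5_0.gguf", 5.67, 2.99),
    ("distil-qwen35-4b-q4_k_m.gguf", 5.13, 2.71),
    ("distil-qwen35-4b-q4_1.gguf", 5.24, 2.77),
    ("distil-qwen35-4b-q4_0.gguf", 4.82, 2.54),
]


def estimate_size_gb(n_params, bpw):
    """Estimated size in decimal GB: parameters * bits per weight / 8 bits per byte."""
    return n_params * bpw / 8 / 1e9


# Assumption: infer the parameter count from the bf16 row (about 4.21e9).
_, bf16_bpw, bf16_gb = QUANTS[0]
N_PARAMS = bf16_gb * 1e9 * 8 / bf16_bpw

for name, bpw, listed_gb in QUANTS:
    est = estimate_size_gb(N_PARAMS, bpw)
    print(f"{name:30s} listed {listed_gb:.2f} GB, estimated {est:.2f} GB")
```

Under these assumptions every estimate lands within 0.02 GB of the listed size, so the same formula gives a quick way to judge whether a given quantization fits in available memory before downloading it.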

Benchmark Performance

| Benchmark | Qwen 3.5 4B (Baseline) | iotaminer/distil-qwen35-4b | Delta |
| --- | --- | --- | --- |
| GSM8K (math) | 74.0 | 84.0 | +10.0 |
| ARC-Challenge | 54.0 | 59.0 | +5.0 |
| WinoGrande | 75.0 | 79.0 | +4.0 |
| IFEval | 19.0 | 23.0 | +4.0 |
| TruthfulQA MC2 | 49.1 | 51.6 | +2.4 |
| HellaSwag | 68.0 | 69.0 | +1.0 |
| MMLU-Pro | 57.2 | 52.9 | -4.3 |
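
The Delta column is the distilled model's score minus the baseline's. As a small sanity check, the sketch below recomputes each delta from the two listed scores and reports any row where the result differs from the listed delta. The function names are hypothetical. Only TruthfulQA MC2 differs (recomputed +2.5 versus listed +2.4); that may simply mean the listed delta was computed from unrounded scores, which the source does not state.

```python
# (benchmark, baseline score, distilled score, listed delta), copied from the table above.
ROWS = [
    ("GSM8K (math)", 74.0, 84.0, 10.0),
    ("ARC-Challenge", 54.0, 59.0, 5.0),
    ("WinoGrande", 75.0, 79.0, 4.0),
    ("IFEval", 19.0, 23.0, 4.0),
    ("TruthfulQA MC2", 49.1, 51.6, 2.4),
    ("HellaSwag", 68.0, 69.0, 1.0),
    ("MMLU-Pro", 57.2, 52.9, -4.3),
]


def recompute_deltas(rows):
    """Return {benchmark: distilled - baseline}, rounded to one decimal place."""
    return {name: round(distilled - baseline, 1) for name, baseline, distilled, _ in rows}


def mismatches(rows, tol=0.05):
    """List benchmarks whose recomputed delta differs from the listed one by more than tol."""
    deltas = recompute_deltas(rows)
    return [name for name, _, _, listed in rows if abs(deltas[name] - listed) > tol]


for name, delta in recompute_deltas(ROWS).items():
    print(f"{name:16s} {delta:+.1f}")
print("Rows differing from the listed delta:", mismatches(ROWS))
```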

Format: GGUF
Model size: 4B params
Architecture: qwen35