GGUF Quantizations — Ollama, llama.cpp, LM Studio

by purplesquirrelnetworks - opened

Available Quantizations

| File | Size | RAM Needed | Best For |
|---|---|---|---|
| f16 | 16.1 GB | 20+ GB | Reference, re-quantization |
| Q8_0 | 8.5 GB | 12+ GB | High-quality local inference |
| Q5_K_M | 5.7 GB | 8+ GB | Balanced quality/speed |
| Q4_K_M | 4.9 GB | 6+ GB | Memory-constrained devices |
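
As a rough way to read the table, the sketch below picks the largest quant whose "RAM Needed" figure fits a given amount of memory. The function name `pick_quant` is hypothetical, and it treats "N+ GB" as "at least N GB"; actual headroom depends on context length and whatever else is running.

```shell
# Hypothetical helper: map available RAM (whole GB) to a quant from the table.
# Thresholds are the "RAM Needed" column, read as "at least N GB".
pick_quant() {
  ram_gb=$1
  if [ "$ram_gb" -ge 20 ]; then
    echo f16
  elif [ "$ram_gb" -ge 12 ]; then
    echo Q8_0
  elif [ "$ram_gb" -ge 8 ]; then
    echo Q5_K_M
  elif [ "$ram_gb" -ge 6 ]; then
    echo Q4_K_M
  else
    # Below the smallest listed requirement.
    echo "no listed quant fits ${ram_gb} GB" >&2
    return 1
  fi
}
```

For example, `pick_quant 16` selects Q8_0, since 16 GB clears the 12 GB line but not the 20 GB one.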

Ollama Quick Start

An Ollama Modelfile is included in this repo:

huggingface-cli download purplesquirrelnetworks/purple-squirrel-r1-gguf \
  Modelfile purple-squirrel-r1-Q5_K_M.gguf --local-dir .
ollama create purple-squirrel-r1 -f Modelfile
ollama run purple-squirrel-r1

To use a different quant, download that .gguf file as well and edit the FROM line in the Modelfile to point to it.
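
A minimal sketch of that edit, assuming the other files follow the same naming as the Q5_K_M one (purple-squirrel-r1-<QUANT>.gguf) and that the Modelfile has a single FROM line pointing at a local file. The helper name `set_quant` is hypothetical, and the contents of the Modelfile are not shown in this post, so check the result before running `ollama create`.

```shell
# Hypothetical helper: rewrite the Modelfile's FROM line for another quant.
# Assumes file names of the form purple-squirrel-r1-<QUANT>.gguf.
set_quant() {
  quant=$1
  modelfile=${2:-Modelfile}
  # Write to a temp file and move it back, which avoids the
  # differing `sed -i` syntax between macOS and GNU sed.
  sed "s|^FROM .*|FROM ./purple-squirrel-r1-${quant}.gguf|" "$modelfile" \
    > "$modelfile.tmp" && mv "$modelfile.tmp" "$modelfile"
}

# Example (not run here):
#   huggingface-cli download purplesquirrelnetworks/purple-squirrel-r1-gguf \
#     purple-squirrel-r1-Q4_K_M.gguf --local-dir .
#   set_quant Q4_K_M
#   ollama create purple-squirrel-r1 -f Modelfile
```

Only the FROM line is touched; any other lines in the Modelfile pass through unchanged.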

Fine-tuned from the DeepSeek-R1-Distill-Llama-8B base with MLX LoRA on Apple Silicon.
