
Google Gemma 3 4B Instruction-Tuned GGUF Quantized Models

This repository contains GGUF quantized versions of Google's Gemma 3 4B instruction-tuned model at four quantization levels (Q8_0, Q6_K, Q4_K, Q2_K), so you can trade output quality for file size to fit your hardware.

Quantization Results

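Quant   File Size   Compression Ratio   Size Reduction
Q8_0    4.1 GB      53%                 47%
Q6_K    3.2 GB      41%                 59%
Q4_K    2.5 GB      32%                 68%
Q2_K    1.7 GB      22%                 78%

Percentages are relative to the unquantized model: Compression Ratio is each file's size as a share of the original, and Size Reduction is the share saved (the two add up to 100%).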
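The two percentage columns are complementary. The sketch below assumes both are measured against a single unquantized baseline. It checks that each row sums to 100% and backs out the baseline size that row implies. The helper names are illustrative and not part of any tool.

```python
# Sketch: how the "Compression Ratio" and "Size Reduction" columns relate.
# Assumption: both percentages are measured against the same unquantized
# baseline, so ratio = quant_size / baseline and reduction = 100% - ratio.

TABLE = {
    # quant: (size_gb, compression_ratio_pct, size_reduction_pct), copied from the table
    "Q8_0": (4.1, 53, 47),
    "Q6_K": (3.2, 41, 59),
    "Q4_K": (2.5, 32, 68),
    "Q2_K": (1.7, 22, 78),
}


def implied_baseline_gb(size_gb: float, ratio_pct: float) -> float:
    """Back out the baseline size implied by one row: size / ratio."""
    return size_gb / (ratio_pct / 100)


def check_rows(table: dict) -> dict:
    """Return the implied baseline (GB) per quant, asserting ratio + reduction == 100%."""
    baselines = {}
    for quant, (size_gb, ratio, reduction) in table.items():
        assert ratio + reduction == 100, f"{quant}: columns do not sum to 100%"
        baselines[quant] = round(implied_baseline_gb(size_gb, ratio), 2)
    return baselines


if __name__ == "__main__":
    for quant, base in check_rows(TABLE).items():
        print(f"{quant}: implied baseline ~ {base} GB")
```

With the table's rounded values, every row implies a baseline of roughly 7.7 to 7.8 GB. The columns are therefore internally consistent, and the small spread comes from rounding.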

Quality vs Size Trade-offs

  • Q8_0: Near-lossless quality, minimal degradation compared to F16
  • Q6_K: Very good quality, slight degradation in some rare cases
  • Q4_K: Decent quality, noticeable degradation but still usable for most tasks
  • Q2_K: Heavily reduced quality, substantial degradation but smallest file size

Recommendations

  • For maximum quality: Use Q8_0 (or the original F16 weights, which are not included in this repository)
  • For balanced performance: Use Q6_K
  • For minimum size: Use Q2_K
  • For most use cases: Q4_K provides a good balance of quality and size
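The size side of these recommendations can be expressed as a simple filter. This is a hypothetical helper, not part of llama.cpp. It walks the quants from highest to lowest quality and returns the first one whose file size fits in a memory budget after adding an assumed `overhead_gb` reserve for the KV cache and runtime buffers. The 1 GB default is a placeholder, not a measured value.

```python
# Hypothetical helper: pick the highest-quality quant whose file fits a memory budget.
# Assumptions (not from the model card): file sizes come from the table above, and
# `overhead_gb` is a placeholder reserve for the KV cache and runtime buffers,
# which grow with --ctx-size and must be tuned for your setup.

from typing import Optional

# Ordered from highest to lowest quality, following the trade-offs listed above.
QUANT_SIZES_GB = [
    ("Q8_0", 4.1),
    ("Q6_K", 3.2),
    ("Q4_K", 2.5),
    ("Q2_K", 1.7),
]


def pick_quant(budget_gb: float, overhead_gb: float = 1.0) -> Optional[str]:
    """Return the best-quality quant whose file size plus overhead fits in budget_gb."""
    for name, size_gb in QUANT_SIZES_GB:
        if size_gb + overhead_gb <= budget_gb:
            return name
    return None  # nothing fits; consider a smaller context or more memory


if __name__ == "__main__":
    for budget in (8.0, 4.0, 3.0, 2.0):
        print(budget, "GB ->", pick_quant(budget))
```

For example, `pick_quant(4.0)` returns `"Q4_K"` with the default reserve. Actual memory use depends on context size and backend, so treat the result as a starting point rather than a guarantee.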

Usage with llama.cpp

These models can be used with llama.cpp and its various interfaces. Example:

# Running with llama-gemma3-cli (llama-gemma3-cli.exe on Windows); adjust paths as needed
./llama-gemma3-cli --model gemma-3-4b-it-q4k.gguf --ctx-size 4096 --temp 0.7 --prompt "Write a short story about a robot who discovers it has feelings."
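To call the same command from a script, here is a minimal Python sketch that uses the standard `subprocess` module. It reuses only the flags from the shell example. The binary name and model filename are the example's placeholders and may need adjusting to match your build and download.

```python
# Sketch: launch the llama.cpp example above from Python via subprocess.
# Assumptions: the binary is named llama-gemma3-cli (llama-gemma3-cli.exe on Windows)
# and is on PATH or given by path; the model filename is the one from the example
# and may differ from the files in this repository.

import shutil
import subprocess
from typing import List


def build_cli_args(binary: str, model_path: str, prompt: str,
                   ctx_size: int = 4096, temp: float = 0.7) -> List[str]:
    """Build the argument list using only the flags shown in the shell example."""
    return [
        binary,
        "--model", model_path,
        "--ctx-size", str(ctx_size),
        "--temp", str(temp),
        "--prompt", prompt,
    ]


def run(args: List[str]) -> int:
    """Run the CLI if the binary can be found; return its exit code (127 if missing)."""
    if shutil.which(args[0]) is None:
        print(f"{args[0]} not found; adjust the binary path")
        return 127
    # Passing a list (no shell=True) avoids quoting problems with the prompt text.
    return subprocess.run(args, check=False).returncode


if __name__ == "__main__":
    argv = build_cli_args(
        "llama-gemma3-cli",
        "gemma-3-4b-it-q4k.gguf",
        "Write a short story about a robot who discovers it has feelings.",
    )
    run(argv)
```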

License

This model is released under the same Gemma license as the original model.

Original Model Information

This quantized set is derived from Google's Gemma 3 4B instruction-tuned model.

Model Specifications

  • Architecture: Gemma 3
  • Size Label: 4B
  • Type: Instruction-tuned
  • Context Length: 131K tokens
  • Embedding Length: 2560
  • Languages: Multilingual

Citation & Attribution

@article{gemma_2025,
    title={Gemma 3},
    url={https://goo.gle/Gemma3Report},
    publisher={Kaggle},
    author={Gemma Team},
    year={2025}
}

@misc{gemma3_quantization_2025,
    title={Quantized Versions of Google's Gemma 3 4B Model},
    author={Lex-au},
    year={2025},
    month={March},
    note={Quantized models (Q8_0, Q6_K, Q4_K, Q2_K) derived from Google's Gemma 3 4B},
    url={https://huggingface.co/lex-au}
}