Instructions to use lex-au/Google.Gemma-3-4b-it-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use lex-au/Google.Gemma-3-4b-it-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="lex-au/Google.Gemma-3-4b-it-GGUF",
	filename="Google.Gemma-3-4b-Q2_K.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use lex-au/Google.Gemma-3-4b-it-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf lex-au/Google.Gemma-3-4b-it-GGUF:Q2_K
# Run inference directly in the terminal:
llama-cli -hf lex-au/Google.Gemma-3-4b-it-GGUF:Q2_K

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf lex-au/Google.Gemma-3-4b-it-GGUF:Q2_K
# Run inference directly in the terminal:
llama-cli -hf lex-au/Google.Gemma-3-4b-it-GGUF:Q2_K

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf lex-au/Google.Gemma-3-4b-it-GGUF:Q2_K
# Run inference directly in the terminal:
./llama-cli -hf lex-au/Google.Gemma-3-4b-it-GGUF:Q2_K

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf lex-au/Google.Gemma-3-4b-it-GGUF:Q2_K
# Run inference directly in the terminal:
./build/bin/llama-cli -hf lex-au/Google.Gemma-3-4b-it-GGUF:Q2_K

Use Docker

docker model run hf.co/lex-au/Google.Gemma-3-4b-it-GGUF:Q2_K

LM Studio
Jan
Ollama
How to use lex-au/Google.Gemma-3-4b-it-GGUF with Ollama:
```
ollama run hf.co/lex-au/Google.Gemma-3-4b-it-GGUF:Q2_K
```

Unsloth Studio

How to use lex-au/Google.Gemma-3-4b-it-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for lex-au/Google.Gemma-3-4b-it-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for lex-au/Google.Gemma-3-4b-it-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for lex-au/Google.Gemma-3-4b-it-GGUF to start chatting

Docker Model Runner
How to use lex-au/Google.Gemma-3-4b-it-GGUF with Docker Model Runner:
```
docker model run hf.co/lex-au/Google.Gemma-3-4b-it-GGUF:Q2_K
```

Lemonade

How to use lex-au/Google.Gemma-3-4b-it-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull lex-au/Google.Gemma-3-4b-it-GGUF:Q2_K

Run and chat with the model

lemonade run user.Google.Gemma-3-4b-it-GGUF-Q2_K

List all available models

lemonade list

Google Gemma 3 4B Instruction-Tuned GGUF Quantized Models

This repository contains GGUF quantized versions of Google's Gemma 3 4B instruction-tuned model, optimized for efficient deployment across various hardware configurations.

Quantization Results

Model	Size	Compression Ratio	Size Reduction
Q8_0	4.1 GB	53%	47%
Q6_K	3.2 GB	41%	59%
Q4_K	2.5 GB	32%	68%
Q2_K	1.7 GB	22%	78%

Quality vs Size Trade-offs

Q8_0: Near-lossless quality, minimal degradation compared to F16
Q6_K: Very good quality, slight degradation in some rare cases
Q4_K: Decent quality, noticeable degradation but still usable for most tasks
Q2_K: Heavily reduced quality, substantial degradation but smallest file size

Recommendations

For maximum quality: Use F16 or Q8_0
For balanced performance: Use Q6_K
For minimum size: Use Q2_K
For most use cases: Q4_K provides a good balance of quality and size

Usage with llama.cpp

These models can be used with llama.cpp and its various interfaces. Example:

# Running with llama-gemma3-cli.exe (adjust paths as needed)
./llama-gemma3-cli --model gemma-3-4b-it-q4k.gguf --ctx-size 4096 --temp 0.7 --prompt "Write a short story about a robot who discovers it has feelings."

License

This model is released under the same Gemma license as the original model.

Original Model Information

This quantized set is derived from Google's Gemma 3 4B instruction-tuned model.

Model Specifications

Architecture: Gemma 3
Size Label: 4B
Type: Instruction-tuned
Context Length: 131K tokens
Embedding Length: 2560
Languages: Support for multiple languages

Citation & Attribution

@article{gemma_2025,
    title={Gemma 3},
    url={https://goo.gle/Gemma3Report},
    publisher={Kaggle},
    author={Gemma Team},
    year={2025}
}

@misc{gemma3_quantization_2025,
    title={Quantized Versions of Google's Gemma 3 27B Model},
    author={Lex-au},
    year={2025},
    month={March},
    note={Quantized models (Q8_0, Q6_K, Q4_K, Q2_K) derived from Google's Gemma 3 4B},
    url={https://huggingface.co/lex-au}
}

Downloads last month: 55

GGUF

Model size

4B params

Architecture

gemma3

Hardware compatibility

2-bit

6-bit

8-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for lex-au/Google.Gemma-3-4b-it-GGUF

Base model

google/gemma-3-4b-pt

Finetuned

google/gemma-3-4b-it

Quantized

(464)

this model

Collection including lex-au/Google.Gemma-3-4b-it-GGUF

Gemma 3

Collection

Collection of quants for Google's Gemma 3 • 3 items • Updated Apr 18, 2025