GGUF
conversational
How to use from the
Use from the
llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="ddh0/gemma-4-it-GGUF",
	filename="",
)
llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

These are miscellaneous GGUF quantizations of the instruct-tuned Gemma 4 series of models, released by Google.

For more information about Gemma, you should refer to the original model cards.

The chat template baked into these GGUFs is technically outdated, however, inference in llama.cpp should still work exactly as it should, thanks to these fixes:

For the latest official chat template, refer to the original model repo.

Downloads last month
25,830
GGUF
Model size
12B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

8-bit

16-bit

32-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ddh0/gemma-4-it-GGUF

Quantized
(231)
this model