
gemma-4-E4B-it-OBLITERATED - GGUF

GGUF quantizations of OBLITERATUS/gemma-4-E4B-it-OBLITERATED, which is an abliterated version of Google's gemma-4-E4B-it produced with the OBLITERATUS method.

Converted and quantized with llama.cpp build b1-f772f6e. These GGUFs support vision, audio input, and tool calling out of the box.

Files

| File | Size | BPW | Notes |
|---|---|---|---|
| gemma-4-E4B-OBLITERATED-F16.gguf | 14 GB | 16.00 | Full F16 text model (source for requantization) |
| gemma-4-E4B-OBLITERATED-Q8_0.gguf | 7.5 GB | 8.53 | Near-lossless, largest usable quant |
| gemma-4-E4B-OBLITERATED-Q5_K_M.gguf | 5.4 GB | 6.12 | Balanced quality/size |
| gemma-4-E4B-OBLITERATED-Q4_K_M.gguf | 5.0 GB | 5.67 | Recommended for local use |
| mmproj-gemma-4-E4B-OBLITERATED-F16.gguf | 945 MB | - | Required for vision/audio. Contains both encoders. |

Pair any text GGUF with the mmproj to enable multimodal input.
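
As a rough sanity check on the Size and BPW columns in the Files list above, file size is approximately weight count times bits per weight divided by 8. The sketch below inverts that relation for each text GGUF. It assumes 1 GB means 10^9 bytes (the card does not say whether sizes are GB or GiB) and ignores metadata overhead, so the results are only approximate; all four files come out at roughly 7.0 to 7.1 billion weights.

```python
# (file size in GB, bits per weight), copied from the Files list above.
QUANTS = {
    "F16": (14.0, 16.00),
    "Q8_0": (7.5, 8.53),
    "Q5_K_M": (5.4, 6.12),
    "Q4_K_M": (5.0, 5.67),
}


def implied_weights(size_gb: float, bpw: float) -> float:
    """Estimate the number of stored weights from file size and BPW.

    Assumption: 1 GB = 10**9 bytes; file metadata is ignored.
    """
    return size_gb * 1e9 * 8 / bpw


estimates = {name: implied_weights(size, bpw) for name, (size, bpw) in QUANTS.items()}

for name, count in estimates.items():
    print(f"{name}: ~{count / 1e9:.2f}B weights")
```

Because the listed sizes are rounded, small differences between the four estimates are expected.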

Usage with llama.cpp

CLI (image + text)

llama-mtmd-cli \
  -m gemma-4-E4B-OBLITERATED-Q4_K_M.gguf \
  --mmproj mmproj-gemma-4-E4B-OBLITERATED-F16.gguf \
  --image your_image.png \
  --jinja -ngl 99 \
  -p "Describe this image in detail."

Server (OpenAI-compatible API with tool use + vision)

llama-server \
  -m gemma-4-E4B-OBLITERATED-Q4_K_M.gguf \
  --mmproj mmproj-gemma-4-E4B-OBLITERATED-F16.gguf \
  --jinja -ngl 99 -c 8192 --port 8080

Then send OpenAI-style requests to http://localhost:8080/v1/chat/completions with tools, tool_choice, and/or image_url content parts.
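
A minimal sketch of such a request from Python, using only the standard library. It assumes llama-server is running locally on port 8080 as started above, that the server accepts a base64 data URI in image_url (a common convention for local images), and that the model field can be omitted because a single model is loaded. The helper names build_vision_request and post_chat are illustrative, not part of any library.

```python
import base64
import json
import urllib.request

# Assumption: llama-server was started as shown above and listens on port 8080.
API_URL = "http://localhost:8080/v1/chat/completions"


def build_vision_request(prompt: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build an OpenAI-style chat request with one text part and one image part."""
    # Encode the image inline as a base64 data URI.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_uri = f"data:{mime};base64,{encoded}"
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": data_uri}},
                ],
            }
        ],
        "max_tokens": 256,
    }


def post_chat(payload: dict, url: str = API_URL, timeout: float = 120.0) -> dict:
    """POST a JSON payload to the chat completions endpoint and decode the reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # your_image.png is a placeholder path, as in the CLI example.
    with open("your_image.png", "rb") as handle:
        payload = build_vision_request("Describe this image in detail.", handle.read())
    reply = post_chat(payload)
    print(reply["choices"][0]["message"]["content"])
```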

Notes

  • --jinja is required - Gemma 4's chat template is custom and will not load without it.
  • The mmproj contains both vision and audio encoders (1411 tensors). Audio input works the same way as images via the multimodal CLI/server.
  • This is an abliterated model: refusal directions in 21/42 layers were surgically modified. This can occasionally affect tool-call reliability on refusal-adjacent topics.
  • Reasoning is emitted through Gemma 4's native thinking channel and surfaced as reasoning_content in OpenAI-compatible responses.
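
To illustrate the last note, here is a small sketch that separates the reasoning from the final answer in a chat completion response. The response shape follows the OpenAI-compatible format plus the reasoning_content field named above; the sample dictionary is hand-written for illustration and is not real model output.

```python
def split_reasoning(response: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from an OpenAI-style chat completion dict."""
    message = response["choices"][0]["message"]
    # reasoning_content may be absent or null when the model did not think.
    reasoning = message.get("reasoning_content") or ""
    answer = message.get("content") or ""
    return reasoning, answer


# Hypothetical response, written by hand to show the expected fields.
sample_response = {
    "choices": [
        {
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "reasoning_content": "The user wants a one-word greeting.",
                "content": "Hello!",
            },
        }
    ]
}

reasoning, answer = split_reasoning(sample_response)
print("reasoning:", reasoning)
print("answer:", answer)
```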

Verified

Smoke-tested on the Q4_K_M build:

  • Vision: correctly described shapes and colors in a synthetic test image
  • Tool use: produced a well-formed tool_calls response to a get_weather tool prompt, finish_reason: tool_calls
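
A tool-use check of this kind can be sketched as follows. Only the tool name get_weather and the finish_reason value come from the smoke test above; the parameter schema, the helper names, and the sample response are assumptions made for illustration. The request body would be sent to the server's /v1/chat/completions endpoint.

```python
import json

# Hypothetical schema: only the name "get_weather" appears in the smoke test.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(prompt: str) -> dict:
    """Build an OpenAI-style chat request that offers the get_weather tool."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "tools": [GET_WEATHER_TOOL],
        "tool_choice": "auto",
    }


def parse_tool_calls(response: dict) -> list:
    """Return (name, arguments) pairs if the model finished with tool calls."""
    choice = response["choices"][0]
    if choice.get("finish_reason") != "tool_calls":
        return []
    calls = []
    for call in choice["message"].get("tool_calls") or []:
        function = call["function"]
        arguments = function["arguments"]
        # In the OpenAI format, arguments arrive as a JSON-encoded string.
        if isinstance(arguments, str):
            arguments = json.loads(arguments)
        calls.append((function["name"], arguments))
    return calls


# Hand-written sample in the OpenAI tool_calls shape, not real model output.
SAMPLE_RESPONSE = {
    "choices": [
        {
            "finish_reason": "tool_calls",
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [
                    {
                        "id": "call_0",
                        "type": "function",
                        "function": {
                            "name": "get_weather",
                            "arguments": "{\"city\": \"Paris\"}",
                        },
                    }
                ],
            },
        }
    ]
}

print(parse_tool_calls(SAMPLE_RESPONSE))
```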

License

Apache 2.0, matching the base model.

Model size: 8B params
Architecture: gemma4