How to use from
Lemonade
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull rdhorner/gemma-4-E4B-it-OBLITERATED-GGUF:
Run and chat with the model
lemonade run user.gemma-4-E4B-it-OBLITERATED-GGUF-
List all available models
lemonade list
Quick Links

gemma-4-E4B-it-OBLITERATED - GGUF

GGUF quantizations of OBLITERATUS/gemma-4-E4B-it-OBLITERATED, which is an abliterated version of Google's gemma-4-E4B-it produced with the OBLITERATUS method.

Converted and quantized with llama.cpp build b1-f772f6e. These GGUFs support vision, audio input, and tool calling out of the box.

Files

File Size BPW Notes
gemma-4-E4B-OBLITERATED-F16.gguf 14 GB 16.00 Full F16 text model (source for requantization)
gemma-4-E4B-OBLITERATED-Q8_0.gguf 7.5 GB 8.53 Near-lossless, largest usable quant
gemma-4-E4B-OBLITERATED-Q5_K_M.gguf 5.4 GB 6.12 Balanced quality/size
gemma-4-E4B-OBLITERATED-Q4_K_M.gguf 5.0 GB 5.67 Recommended for local use
mmproj-gemma-4-E4B-OBLITERATED-F16.gguf 945 MB - Required for vision/audio. Contains both encoders.

Pair any text GGUF with the mmproj to enable multimodal input.

Usage with llama.cpp

CLI (image + text)

llama-mtmd-cli \
  -m gemma-4-E4B-OBLITERATED-Q4_K_M.gguf \
  --mmproj mmproj-gemma-4-E4B-OBLITERATED-F16.gguf \
  --image your_image.png \
  --jinja -ngl 99 \
  -p "Describe this image in detail."

Server (OpenAI-compatible API with tool use + vision)

llama-server \
  -m gemma-4-E4B-OBLITERATED-Q4_K_M.gguf \
  --mmproj mmproj-gemma-4-E4B-OBLITERATED-F16.gguf \
  --jinja -ngl 99 -c 8192 --port 8080

Then send OpenAI-style requests to http://localhost:8080/v1/chat/completions with tools, tool_choice, and/or image_url content parts.

Notes

  • --jinja is required - Gemma 4's chat template is custom and will not load without it.
  • The mmproj contains both vision and audio encoders (1411 tensors). Audio input works the same way as images via the multimodal CLI/server.
  • This is an abliterated model: refusal directions in 21/42 layers were surgically modified. This can occasionally affect tool-call reliability on refusal-adjacent topics.
  • Reasoning is emitted through Gemma 4's native thinking channel and surfaced as reasoning_content in OpenAI-compatible responses.

Verified

Smoke-tested on the Q4_K_M build:

  • Vision: correctly described shapes and colors in a synthetic test image
  • Tool use: produced a well-formed tool_calls response to a get_weather tool prompt, finish_reason: tool_calls

License

Apache 2.0, matching the base model.

Downloads last month
592
GGUF
Model size
8B params
Architecture
gemma4
Hardware compatibility
Log In to add your hardware

4-bit

5-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for rdhorner/gemma-4-E4B-it-OBLITERATED-GGUF

Quantized
(26)
this model