---
license: apache-2.0
library_name: gguf
pipeline_tag: image-text-to-text
base_model: OBLITERATUS/gemma-4-E4B-it-OBLITERATED
base_model_relation: quantized
quantized_by: rdhorner
tags:
  - gguf
  - gemma4
  - abliterated
  - vision
  - audio
  - multimodal
  - tool-use
  - tools
  - function-calling
  - llama.cpp
  - conversational
lm_studio:
  param_count: 8b
  use_case: tools
  release_date: "16-04-2026"
  model_creator: OBLITERATUS
  prompt_template: Gemma 4
  base_model: gemma4
  original_repo: OBLITERATUS/gemma-4-E4B-it-OBLITERATED
---

# gemma-4-E4B-it-OBLITERATED - GGUF

GGUF quantizations of [OBLITERATUS/gemma-4-E4B-it-OBLITERATED](https://huggingface.co/OBLITERATUS/gemma-4-E4B-it-OBLITERATED), which is an abliterated version of Google's `gemma-4-E4B-it` produced with the OBLITERATUS method.

Converted and quantized with [llama.cpp](https://github.com/ggml-org/llama.cpp) build `b1-f772f6e`. These GGUFs support **vision**, **audio** input, and **tool calling** out of the box.

## Files

| File | Size | BPW | Notes |
|---|---|---|---|
| `gemma-4-E4B-OBLITERATED-F16.gguf` | 14 GB | 16.00 | Full F16 text model (source for requantization) |
| `gemma-4-E4B-OBLITERATED-Q8_0.gguf` | 7.5 GB | 8.53 | Near-lossless, largest usable quant |
| `gemma-4-E4B-OBLITERATED-Q5_K_M.gguf` | 5.4 GB | 6.12 | Balanced quality/size |
| `gemma-4-E4B-OBLITERATED-Q4_K_M.gguf` | 5.0 GB | 5.67 | **Recommended** for local use |
| `mmproj-gemma-4-E4B-OBLITERATED-F16.gguf` | 945 MB | - | **Required** for vision/audio. Contains both encoders. |

Pair any text GGUF with the mmproj to enable multimodal input.

## Usage with llama.cpp

### CLI (image + text)
```bash
llama-mtmd-cli \
  -m gemma-4-E4B-OBLITERATED-Q4_K_M.gguf \
  --mmproj mmproj-gemma-4-E4B-OBLITERATED-F16.gguf \
  --image your_image.png \
  --jinja -ngl 99 \
  -p "Describe this image in detail."
```

### Server (OpenAI-compatible API with tool use + vision)
```bash
llama-server \
  -m gemma-4-E4B-OBLITERATED-Q4_K_M.gguf \
  --mmproj mmproj-gemma-4-E4B-OBLITERATED-F16.gguf \
  --jinja -ngl 99 -c 8192 --port 8080
```

Then send OpenAI-style requests to `http://localhost:8080/v1/chat/completions` with `tools`, `tool_choice`, and/or `image_url` content parts.

## Notes

- `--jinja` is **required** - Gemma 4's chat template is custom and will not load without it.
- The mmproj contains both vision and audio encoders (1411 tensors). Audio input works the same way as images via the multimodal CLI/server.
- This is an *abliterated* model: refusal directions in 21/42 layers were surgically modified. This can occasionally affect tool-call reliability on refusal-adjacent topics.
- Reasoning is emitted through Gemma 4's native thinking channel and surfaced as `reasoning_content` in OpenAI-compatible responses.

## Verified

Smoke-tested on the Q4_K_M build:
- **Vision**: correctly described shapes and colors in a synthetic test image
- **Tool use**: produced a well-formed `tool_calls` response to a `get_weather` tool prompt, `finish_reason: tool_calls`

## License

Apache 2.0, matching the base model.