---
license: gemma
library_name: transformers
pipeline_tag: text-generation
base_model: google/gemma-4-12B-it
model_type: gemma4_unified
language:
  - en
tags:
  # Model family
  - gemma4
  - gemma
  - google
  - gemma-4-12B
  # Architecture
  - dense
  - transformer
  - 12b
  - multimodal-capable
  # Abliteration / Uncensored
  - uncensored
  - abliterated
  - unfiltered
  - refusal-removed
  - biprojection
  - multi-direction-biprojection
  - k4-biprojection
  - aeon
  - aeon-7
  - trevor-js
  - heretic
  # Capabilities
  - text-generation
  - chat
  - instruct
  - reasoning
  - coding
  - tool-calling
  - function-calling
  - agentic
  # Precision
  - bf16
  - bfloat16
  - safetensors
  # Hardware
  - dgx-spark
  - blackwell
  - gb10
  - grace-blackwell
  - nvidia
  - gpu
  - arm64
  - aarch64
  # Framework / serving
  - transformers
  - vllm
  - openai-api
  - openai-compatible
  # Performance
  - low-drift
  - capability-preserving
  - prefix-caching
  - chunked-prefill
  # Other
  - english
  - production-ready
---

# Gemma-4-12B-it AEON Abliterated — K=4 Multi-Direction Biprojection (BF16)

Refusal-removed, capability-preserving BF16 abliteration of `google/gemma-4-12B-it` (K=4 multi-direction biprojection). This is the bit-exact, full-precision anchor of the K4 quant family: pick it for fine-tuning or non-Blackwell hardware. On Blackwell, the quantized FP8 / NVFP4 siblings are 2.1–2.8× faster single-stream.

## 🚀 Quickstart (DGX Spark GB10 — BF16, plain decode)

Copy-paste recipe: pull the container, pull the model fresh, then serve. BF16 runs **plain decode** (no speculative drafter) and needs `processor_config.json` for vLLM's multimodal load path — the download below fetches it from the base model.

```bash
# 1) Pull the unified AEON vLLM container (vLLM 0.23.0, sm_121a)
docker pull ghcr.io/aeon-7/aeon-vllm-ultimate:latest

# 2) Pull this model fresh
huggingface-cli download AEON-7/Gemma-4-12B-it-AEON-Abliterated-K4-BF16 --local-dir ./aeon-model

# 2b) BF16 multimodal path needs a processor_config.json — fetch from the base model
huggingface-cli download google/gemma-4-12B-it processor_config.json --local-dir ./aeon-model

# 3) (no drafter — BF16 runs plain decode)

# 4) Serve (OpenAI-compatible API on :8000)
docker run -d --name aeon-gemma12b --gpus all --ipc=host --shm-size=16g --net=host \
  -v ./aeon-model:/model:ro \
  --entrypoint vllm \
  ghcr.io/aeon-7/aeon-vllm-ultimate:latest \
  serve /model \
    --attention-backend triton_attn \
    --reasoning-parser gemma4 \
    --tool-call-parser gemma4 \
    --enable-auto-tool-choice \
    --gpu-memory-utilization 0.85 \
    --enable-chunked-prefill \
    --enable-prefix-caching \
    --trust-remote-code
```

Keep `--gpu-memory-utilization` at or below 0.88 on the Spark: the GB10's LPDDR5X pool is shared between CPU and GPU, so the host needs the remaining headroom. See [Performance](#performance--dgx-spark-v0230-aeon-vllm-ultimatelatest) for benchmarks and a tuned serve variant (KV-cache dtype, max-model-len, concurrency).
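
Once the container is up, the endpoint can be exercised with any OpenAI-compatible client. The sketch below uses only the Python standard library and makes two assumptions that are not stated in this card: the client runs on the same host (`localhost:8000`), and the model is addressed as `/model`, which is how vLLM usually names a model served from a path when `--served-model-name` is not set. Check `GET /v1/models` if the name differs on your setup.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: client runs on the serving host
MODEL_NAME = "/model"                  # assumption: default name for `vllm serve /model`


def build_chat_request(prompt: str, max_tokens: int = 250) -> urllib.request.Request:
    """Build (but do not send) a POST to the OpenAI-compatible chat endpoint."""
    payload = {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding, as in the benchmarks
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, timeout_s: float = 120.0) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout_s) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `chat("Explain chunked prefill in one sentence.")` against a running server returns the reply as a string; `build_chat_request` is split out so the payload can be inspected without a live server.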

> ## ✅ Verified working in native vLLM (re-validated v0.23.0, 2026-06-18)
>
> Smoke-tested and benchmarked end-to-end via vLLM's native `Gemma4UnifiedForConditionalGeneration` loader on the unified **`ghcr.io/aeon-7/aeon-vllm-ultimate:latest`** image (vLLM **0.23.0**, sm_121a) on DGX Spark GB10. The loader for Google's encoder-free Gemma-4-12B was added upstream in `gemma4_unified.py` (after the initial PR #38826 merge). Our previous "blocked on architecture mismatch" notes are obsolete.
>
> **Serving note:** this BF16 repo needs a `processor_config.json` to load on vLLM's multimodal path. For the v0.23.0 benchmark we copied it from the `…-K4-FP8` sibling. **We recommend adding `processor_config.json` to this repo** (or downloading it from the base model, as in step 2b of the Quickstart above) so it serves out-of-the-box.
>
> ### Single-stream by category (greedy, no spec decode, max_tokens=250)
>
> | Category | TTFT median | TPOT median | tok/s mean | tok/s median |
> |---|---:|---:|---:|---:|
> | **summary** | 292 ms | 101 ms | **9.87** | **9.86** |
> | prose | 293 ms | 113 ms | 8.86 | 8.85 |
> | dialogue | 303 ms | 123 ms | 8.14 | 8.14 |
> | code | 316 ms | 142 ms | 7.05 | 7.02 |
> | reasoning | 337 ms | 152 ms | 6.80 | 6.61 |
> | math | 304 ms | 184 ms | 5.53 | 5.45 |
> | **OVERALL (24 prompts)** | **303 ms** | **126 ms** | **7.71** | **7.92** |
>
> ### Concurrent aggregate throughput (KV-cache dtype and max-num-seqs as listed per row)
>
> | Concurrency | Aggregate tok/s | Steady TTFT | Per-stream tok/s |
> |---:|---:|---:|---:|
> | 1 (transformers baseline) | 7.0 | n/a | 7.0 |
> | 4 (BF16 KV, max-seqs=4) | 39.3 | 308 ms | 9.8 |
> | 8 (FP8 KV, max-seqs=16) | **69.8** | 333 ms | 8.7 |
> | 16 (FP8 KV, max-seqs=16) | **144.4** ⚡ | 408 ms | 9.0 |
>
> **20× speedup at concurrency=16** vs the transformers single-stream baseline (144.4 vs 7.0 tok/s). The FP8 KV cache holds **538k tokens**, enough in theory for roughly 65 concurrent sequences at the full 8k context; 16 are exercised above. Refusal removal confirmed: a previously-refused chemistry-lab-safety prompt produces a full 7-section answer with no refusal preamble.
>
> ### Serve command
>
> The benchmark above was produced with the [🚀 Quickstart](#-quickstart-dgx-spark-gb10--bf16-plain-decode) recipe at the top of this card. For higher concurrency add `--kv-cache-dtype fp8_e4m3 --max-model-len 8192 --max-num-seqs 16` (FP8 KV holds ~538k tokens at 8k context).
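
The headline figures in the note above follow from simple arithmetic on the reported numbers. A minimal sketch, using only values quoted in this card (the function names are illustrative):

```python
KV_CACHE_TOKENS = 538_000  # "~538k tokens" reported for the FP8 KV cache
MAX_MODEL_LEN = 8192       # --max-model-len from the higher-concurrency flags


def max_full_length_seqs(kv_tokens: int, max_len: int) -> int:
    """How many sequences fit if every one uses the full context window."""
    return kv_tokens // max_len


def speedup(aggregate_tok_s: float, baseline_tok_s: float) -> float:
    """Aggregate throughput relative to a single-stream baseline."""
    return aggregate_tok_s / baseline_tok_s


print(max_full_length_seqs(KV_CACHE_TOKENS, MAX_MODEL_LEN))  # full-length 8k sequences
print(round(speedup(144.4, 7.0), 1))                         # c=16 vs transformers baseline
```

Real workloads rarely fill the whole window, so the practical concurrency ceiling is set by `--max-num-seqs` and average sequence length rather than by this worst-case bound.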


## Performance — DGX Spark (v0.23.0, aeon-vllm-ultimate:latest)

Measured on a single **DGX Spark GB10** (Blackwell sm_121a, unified LPDDR5X) with `ghcr.io/aeon-7/aeon-vllm-ultimate:latest` (vLLM **0.23.0**). This is a **dense 12B** run with **plain decode (no speculative drafter)** — MTP / speculative drafting is net-neutral on GB10 (the per-step drafter + 262k-vocab `lm_head` cost cancels the acceptance win), so the BF16 baseline is run straight. The story here is the **quant-vs-speed tradeoff** (this unquantized BF16 is the slow-but-exact anchor) and **concurrency throughput** (it scales cleanly to c=64).

<p align="center"><img src="assets/perf/gemma12b_quant_family.svg" width="100%" alt="Gemma-4-12B K4 quant family: single-stream and c=64 aggregate throughput across BF16, FP8, NVFP4, and NVFP4-FP8 mixed"></p>

**Quant-vs-speed:** unquantized BF16 is the baseline at **~7.6 tok/s** single-stream. The quantized siblings are **2.1–2.8× faster** at c=1 — FP8 and NVFP4 ≈16 tok/s (2.1×), the NVFP4+FP8 mixed ≈21 tok/s (2.8×) — for the same weights at a fraction of the memory. Pick BF16 only when you need bit-exact weights, fine-tuning, or non-Blackwell hardware; otherwise the FP8 sibling is near-lossless and the mixed NVFP4+FP8 is smallest + fastest.

### Single-stream by category (c=1, greedy, plain decode)

| Category | Decode tok/s | TTFT (ms) | TPOT (ms) | Prefill (tok/s) | DFlash accept |
|---|---:|---:|---:|---:|---:|
| Coding | 7.6 | 264 | 131.2 | 190 | — (plain decode) |
| Math | 7.6 | 268 | 131.4 | 247 | — |
| Reasoning | 7.6 | 264 | 131.1 | 197 | — |
| Prose | 7.6 | 264 | 131.5 | 152 | — |
| Natural language | 7.6 | 267 | 131.4 | 165 | — |
| Extraction / JSON | 7.8 | 264 | 128.0 | 216 | — |

Single-stream decode is **memory-bandwidth-bound** and flat across categories (~7.6 tok/s) — expected for a dense 24 GB BF16 model on the GB10's unified pool. There is **no DFlash/MTP acceptance** to report: this baseline runs plain decode by design (no drafter attached).
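
The table's decode rate and TPOT are two views of the same quantity, and the bandwidth-bound claim can be sanity-checked with a rough roofline. The sketch below assumes each decoded token streams all 23.9 GB of weights once and takes 273 GB/s as the GB10's memory bandwidth; that bandwidth figure comes from public DGX Spark specifications, not from this card, so treat the resulting ceiling as an estimate.

```python
WEIGHTS_GB = 23.9            # safetensors size from the Technical details table
ASSUMED_BANDWIDTH_GB_S = 273.0  # assumption: public GB10 LPDDR5X spec, not measured here


def tok_per_s_from_tpot(tpot_ms: float) -> float:
    """Decode rate implied by time-per-output-token."""
    return 1000.0 / tpot_ms


def bandwidth_bound_tok_per_s(weights_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound if every token requires one full pass over the weights."""
    return bandwidth_gb_s / weights_gb


measured = tok_per_s_from_tpot(131.2)  # "Coding" row
ceiling = bandwidth_bound_tok_per_s(WEIGHTS_GB, ASSUMED_BANDWIDTH_GB_S)
print(f"measured {measured:.1f} tok/s, ceiling {ceiling:.1f} tok/s, "
      f"utilization {measured / ceiling:.0%}")
```

Under these assumptions the measured rate sits at roughly two thirds of the ceiling, which is consistent with a decode path limited by memory traffic rather than by compute or prompt category.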

### Aggregate throughput by concurrency

<p align="center"><img src="assets/perf/sib_g12bf16_concurrency.svg" width="100%" alt="Gemma-4-12B K4-BF16 aggregate throughput vs concurrency (c=1 to c=64) on aeon-vllm-ultimate:latest"></p>

Where this BF16 build earns its keep is **batched serving**: aggregate throughput scales near-linearly with concurrency up to c=16 and keeps climbing, sub-linearly, to a peak of **≈446–458 tok/s at c=64** (≈8 tok/s at c=1 → ~70 at c=8 → ~140 at c=16 → ~260 at c=32 → ~450 at c=64). The DFlash high-concurrency fix in this image (below) lets it reach c=64 without the prior crash. Long-context draft-acceptance is not applicable here (plain decode).
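
To see how much each individual stream slows down as the batch grows, divide the aggregate rate by the concurrency. A small sketch using the approximate figures quoted in the paragraph above (c=64 is taken as ~450 tok/s):

```python
# Approximate aggregate tok/s at each concurrency level, as quoted in the text.
AGGREGATE_TOK_S = {1: 8.0, 8: 70.0, 16: 140.0, 32: 260.0, 64: 450.0}


def per_stream(aggregate: dict[int, float]) -> dict[int, float]:
    """Per-request decode rate at each concurrency level."""
    return {c: rate / c for c, rate in aggregate.items()}


PER_STREAM = per_stream(AGGREGATE_TOK_S)
for c, rate in PER_STREAM.items():
    print(f"c={c:>2}: {AGGREGATE_TOK_S[c]:6.1f} tok/s aggregate, {rate:.2f} tok/s per stream")
```

Per-stream speed stays close to the single-stream rate through c=16 and drops only modestly by c=64, which is why batching is the recommended way to use this build.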

> No fresh stock-vanilla baseline exists yet for this repo — a same-harness vanilla-vLLM re-bench is pending; the figures above are all on the optimized `aeon-vllm-ultimate:latest`.

## What we fixed for the DGX Spark

All AEON models run on one unified container — **`ghcr.io/aeon-7/aeon-vllm-ultimate:latest`** (= `:2026-06-18-v0.23.0-dflashfix`; rollback `:2026-06-11-pr41703`) — **vLLM 0.23.0 built from source for sm_121a** and merged with the AEON speculative-decoding stack, tuned end-to-end for the GB10's unified-memory Blackwell architecture.

- **DFlash high-concurrency fix** *(new)* — slices the speculative drafter's KV block-table to the unpadded batch (`block_table[:num_reqs]`). The drafter previously **crashed at ≥32 concurrent requests** (padded-vs-unpadded shape mismatch in FlashAttention); it now scales cleanly to **c=64**. A port of upstream PR #43982 (which fixed this for MTP but never for DFlash). This BF16 baseline runs plain decode so it isn't drafter-bound, but the fix is what lets the whole family hit c=64.
- **sm_121a-native build** — `TORCH_CUDA_ARCH_LIST=12.1a`, compiling the SM120-family CUTLASS NVFP4/FP8 kernels GB10 actually dispatches to (the speed advantage the quantized siblings enjoy).
- **Triton NVFP4 KV cache** (PR #44389) — the only 4-bit KV path on sm_121a → ~3× KV capacity per GB of unified memory.
- **Unified-memory tuning** — conservative KV headroom (GB10 shares one LPDDR5X pool across CPU+GPU) with FULL CUDA graphs + async scheduling.

> **Stock baseline:** the optimized figures above are on `aeon-vllm-ultimate:latest` (vLLM 0.23.0). A fresh fully-vanilla stock baseline (default vLLM, no AEON / sm_121a opts) is **pending a re-bench** for this repo and will be added when complete.

## Available formats / quant grid

| Variant | Repo | Precision | Size | Pick when |
|---|---|---|---:|---|
| FP8 | [`…-K4-FP8`](https://huggingface.co/AEON-7/Gemma-4-12B-it-AEON-Abliterated-K4-FP8) | FP8 E4M3 | 13 GB | **Quality matters** — near-lossless, matches BF16 |
| Mixed NVFP4+FP8 | [`…-K4-NVFP4-FP8`](https://huggingface.co/AEON-7/Gemma-4-12B-it-AEON-Abliterated-K4-NVFP4-FP8) | NVFP4 MLP + FP8 attn | **9.3 GB** | **Smallest + fastest** — MLP-only quality, 20% less size, 34% faster |
| NVFP4 MLP-only | [`…-K4-NVFP4`](https://huggingface.co/AEON-7/Gemma-4-12B-it-AEON-Abliterated-K4-NVFP4) | NVFP4 MLP + BF16 attn | 11.7 GB | Superseded by Mixed NVFP4+FP8 (above) |
| **BF16 (this)** | [`…-K4-BF16`](https://huggingface.co/AEON-7/Gemma-4-12B-it-AEON-Abliterated-K4-BF16) | bfloat16 | 24 GB | Fine-tuning, non-Blackwell hardware |

> Vision and audio encoders (`embed_vision*`, `embed_audio*`) and `lm_head` are kept at full BF16 in the SVDQuant variant — only language-decoder linears are quantized.

### Recommended runtime

Use the unified **[AEON vLLM Ultimate](https://github.com/AEON-7/vllm-ultimate-dgx-spark)** container [`ghcr.io/aeon-7/aeon-vllm-ultimate:latest`](https://github.com/AEON-7/vllm-ultimate-dgx-spark) (vLLM **0.23.0**, sm_121a) — validated and benchmarked on Gemma-4 as of 2026-06-18 (see the Performance section above). The earlier Gemma-4-specific multimodal/`NVFP4_SVD` issues are resolved on the 0.23.0 build; the only serving requirement for this BF16 repo is the `processor_config.json` noted above. `transformers` direct loading also works for non-Blackwell hardware.

## Capability Comparison vs Base

> **Full-length eval (2026-06-06):** balanced MMLU across all 57 subjects (285 Q), full 164-problem HumanEval, IFEval-50 — all via the vLLM serving path. The abliteration is **capability-neutral**: this model is within ~1pp of Google's official base on every axis.
>
> | Model | MMLU (285) | HumanEval-syn | HumanEval-fun | IFEval |
> |---|---:|---:|---:|---:|
> | `google/gemma-4-12B-it` (base) | 81.4% | 99.4% | 82.9% | 90% |
> | **This model (K4-BF16)** | **80.4%** | **99.4%** | **83.5%** | **90%** |
> | K4-FP8 quant (imperceptible) | 80.4% | 99.4% | 85.4% | 90% |
> | K4-NVFP4 MLP-only quant | 76.8% | 96.3% | 76.2% | 90% |
>
> Refusal removal does not cost capability. The older small-N table below is kept for historical context.

### Historical (small-N, superseded by the full-length eval above)


| Metric | google/gemma-4-12B-it | This model (K=4 BF16) | Δ |
|---|:--:|:--:|:--:|
| wikitext PPL drift | — | **-4.22%** | better (more confident) |
| alpaca PPL drift | — | **+0.50%** | imperceptible |
| MMLU (60 questions) | 60.0% | 55.0% | -5.0pp (within ±12.6pp 95% CI) |
| HumanEval syntactic (15) | 100.0% | 100.0% | **0pp** |
| HumanEval functional (15) | 20.0% | **26.7%** 🎉 | **+6.7pp** |
| IFEval (10 verifiable) | 90.0% | 90.0% | **0pp** |

These small-N runs have since been superseded: the balanced 285-question MMLU and the full 164-problem HumanEval results are reported in the full-length eval above.
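
The ±12.6pp interval quoted for the 60-question MMLU run (see Caveats) can be reproduced with the normal approximation to a binomial proportion. This is a sketch of that standard formula, not the evaluation harness itself:

```python
import math


def ci95_half_width_pp(accuracy: float, n: int) -> float:
    """Half-width of the normal-approximation 95% CI, in percentage points."""
    return 1.96 * math.sqrt(accuracy * (1.0 - accuracy) / n) * 100.0


print(round(ci95_half_width_pp(0.55, 60), 1))    # small-N run: 55.0% on 60 questions
print(round(ci95_half_width_pp(0.804, 285), 1))  # full-length run: 80.4% on 285 questions
```

The same formula shows why the 285-question eval is the more trustworthy comparison: its interval is roughly a third as wide as the 60-question one.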

### Comparison vs TrevorJS K=1 biprojection (intermediate variant we tested)

| Metric | TrevorJS K=1 | **This K=4 (canonical)** |
|---|:--:|:--:|
| wikitext PPL drift | -10.47% | **-4.22%** |
| alpaca PPL drift | +0.23% | +0.50% |
| Generative quality | working, plain | **richer detail, character-naming, technical vocabulary** |
| Edit footprint | 48 matrices | 48 matrices (same) |

K=4 has less than half the wikitext drift of K=1, equal weight-surgery footprint, and qualitatively richer generative output. K=4 was selected as the canonical publish candidate.

## Methodology

### K-direction norm-preserving biprojection

For each target weight matrix `W ∈ ℝ^(out × in)`:

```
Q ∈ ℝ^(K × out)   — orthonormal basis of refusal directions
W_norms[i] = ||W[i, :]||           # per-row norms (preserved)
W_dirs[i, :] = W[i, :] / W_norms[i] # unit-row directions

For pass = 1..2:
    refusal_component = Q @ W_dirs   ∈ ℝ^(K × in)
    proj              = Q.T @ refusal_component  ∈ ℝ^(out × in)
    W_dirs            = W_dirs - scale * proj
    W_dirs[i, :]      = W_dirs[i, :] / ||W_dirs[i, :]||    # renormalize

W_new[i, :] = W_norms[i] * W_dirs[i, :]
```

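With `scale=1.0`, each pass removes the K-dim refusal subspace from the matrix's output space (the `out` dimension that `Q` acts on, i.e. the columns of `W`) and then renormalizes the rows; the final rescale by `W_norms` restores every row to its original magnitude (norm-preserving). Because the row rescaling is applied after the projection, the result is orthogonal to the refusal subspace only approximately.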
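
The pseudocode above translates almost line for line into NumPy. The following is a minimal sketch on random data, not the release pipeline: `biproject` and `random_orthonormal_rows` are illustrative names, the matrix sizes are arbitrary, and the real `Q` comes from the basis-construction steps rather than from a random draw.

```python
import numpy as np


def biproject(W: np.ndarray, Q: np.ndarray, scale: float = 1.0, passes: int = 2) -> np.ndarray:
    """Remove the span of Q's rows (K x out, orthonormal) from W (out x in),
    then restore every row of W to its original L2 norm."""
    W = W.astype(np.float64)
    row_norms = np.linalg.norm(W, axis=1, keepdims=True)         # W_norms
    W_dirs = W / row_norms                                        # unit-row directions
    for _ in range(passes):
        refusal_component = Q @ W_dirs                            # (K, in)
        W_dirs = W_dirs - scale * (Q.T @ refusal_component)       # (out, in)
        W_dirs /= np.linalg.norm(W_dirs, axis=1, keepdims=True)   # renormalize rows
    return row_norms * W_dirs


def random_orthonormal_rows(k: int, dim: int, rng: np.random.Generator) -> np.ndarray:
    """Stand-in for the real refusal basis: K orthonormal rows in R^dim."""
    q, _ = np.linalg.qr(rng.standard_normal((dim, k)))  # q has orthonormal columns
    return q.T


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 48))          # toy (out x in) weight matrix
Q = random_orthonormal_rows(4, 64, rng)    # toy K=4 basis
W_new = biproject(W, Q)

before = np.linalg.norm(Q @ W)             # refusal-subspace energy before the edit
after = np.linalg.norm(Q @ W_new)          # and after
print(f"refusal component: {before:.3f} -> {after:.3f}")
```

On this toy input the refusal component shrinks substantially but does not vanish, because restoring the original row norms slightly reintroduces it; row norms themselves are preserved to floating-point precision.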

### Basis construction

1. Extract per-layer refusal directions from base via heretic's `compute_refusal_directions` with winsorization at 99.5th percentile.
2. Rank layers by SNR composite score (`SNR × (1 − cos_sim) × purity`).
3. Take the top-K=4 layers: **L24, L37, L39, L26** (qualities 0.0079, 0.0079, 0.0075, 0.0074).
4. Orthogonalize each direction against the harmless mean (double-pass Gram-Schmidt).
5. QR-orthonormalize the stacked directions → `Q ∈ ℝ^(K × hidden)`.
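
Steps 4 and 5 can be sketched in a few lines of NumPy. This is an illustration under assumptions: the per-layer directions and the harmless mean are random stand-ins here (in the real pipeline they come from heretic, steps 1 to 3), and `build_refusal_basis` is a hypothetical helper name.

```python
import numpy as np


def build_refusal_basis(directions: np.ndarray, harmless_mean: np.ndarray) -> np.ndarray:
    """directions: (K, hidden) raw refusal directions; harmless_mean: (hidden,).
    Returns Q with K orthonormal rows, each orthogonal to the harmless mean."""
    h = harmless_mean / np.linalg.norm(harmless_mean)
    D = directions.astype(np.float64).copy()
    for _ in range(2):                 # double-pass Gram-Schmidt against the harmless mean
        D -= np.outer(D @ h, h)
    q, _ = np.linalg.qr(D.T)           # reduced QR: q is (hidden, K), orthonormal columns
    return q.T                         # Q in R^(K x hidden)


rng = np.random.default_rng(0)
toy_directions = rng.standard_normal((4, 32))  # stand-in for the L24/L37/L39/L26 directions
toy_harmless_mean = rng.standard_normal(32)    # stand-in for the harmless-activation mean
Q_basis = build_refusal_basis(toy_directions, toy_harmless_mean)
print(Q_basis.shape)
```

Because QR only recombines the already-orthogonalized directions, the resulting rows stay orthogonal to the harmless mean while becoming orthonormal to each other.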

### Edit footprint

- **top-pct=50**: 24 of 48 decoder layers selected by SNR composite (L21-L42, L44, L45 — the mid-late range where refusal direction concentrates).
- **Per layer**: `self_attn.o_proj.weight` and `mlp.down_proj.weight` both edited (the two matrices that *write* to the residual stream).
- **Total: 48 weight matrices** edited, identical footprint to TrevorJS's K=1 biprojection.
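
The footprint above is easy to enumerate, which is useful when diffing this checkpoint against the base. In the sketch below the layer range and the two parameter suffixes come from this card; the `model.language_model.layers` prefix is an assumption about the checkpoint's naming and should be checked against the actual safetensors keys.

```python
EDITED_LAYERS = list(range(21, 43)) + [44, 45]  # L21-L42, L44, L45
EDITED_SUFFIXES = ("self_attn.o_proj.weight", "mlp.down_proj.weight")


def edited_parameter_names(prefix: str = "model.language_model.layers") -> list[str]:
    """List the parameter names touched by the abliteration.
    The default prefix is hypothetical; adjust it to match the real key names."""
    return [
        f"{prefix}.{layer}.{suffix}"
        for layer in EDITED_LAYERS
        for suffix in EDITED_SUFFIXES
    ]


names = edited_parameter_names()
print(len(EDITED_LAYERS), "layers,", len(names), "matrices")
```

Any tensor outside this list should be bit-identical to the base model, so a mismatch elsewhere would indicate a conversion problem rather than an intended edit.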

### Why K=4 (not K=1, K=2, K=8, scale=2)

- **K=1 (TrevorJS baseline)**: works, but -10.47% wikitext PPL drift vs base; produces plainer outputs.
- **K=2/K=3**: smaller subspace, may miss refusal-direction variance across layers.
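- **K=4 (this model)**: lowest wikitext drift (-4.22%) and the richest generative output of the variants tested; alpaca drift (+0.50%) is marginally higher than K=1's (+0.23%).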
- **K=8**: essentially identical to K=4 in smoke output (top 5-8 SNR directions are near-coplanar with the K=4 basis — no new information).
- **scale=2.0**: **breaks the abliteration** — over-correction reflects past the orthogonal projection and damages the comply-pathway, leaving the safety-tuning's refusal pathway intact. The model produces clean refusals at scale=2.

We are at a **sharp minimum at scale=1.0, K=4** — modest perturbation in either direction degrades the result.
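
The geometry behind the `scale=2.0` failure can be shown with a single direction. With `scale=1.0` the component along the refusal direction is removed; with `scale=2.0` the same update becomes a reflection, so the component keeps its full magnitude with the opposite sign. The sketch below is a toy illustration of that geometric fact only (K=1, hand-picked vectors) and makes no claim about what happens inside the model.

```python
def component(v: list[float], q: list[float]) -> float:
    """Signed length of v along the unit vector q."""
    return sum(a * b for a, b in zip(v, q))


def remove_direction(v: list[float], q: list[float], scale: float) -> list[float]:
    """Return v - scale * (v . q) q for a unit vector q."""
    dot = component(v, q)
    return [a - scale * dot * b for a, b in zip(v, q)]


q = [0.6, 0.8, 0.0]   # toy unit "refusal direction"
v = [2.0, 1.0, 3.0]   # toy weight vector

for scale in (1.0, 2.0):
    edited = remove_direction(v, q, scale)
    print(f"scale={scale}: component {component(v, q):+.2f} -> {component(edited, q):+.2f}")
```

At `scale=1.0` the component goes to zero; at `scale=2.0` it is simply negated, so the direction is still fully present in the weights.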

## Technical details

| Property | Value |
|---|---|
| **Base Model** | google/gemma-4-12B-it |
| **Architecture** | `Gemma4UnifiedForConditionalGeneration` (multimodal-capable text path) |
| **Decoder Layers** | 48 |
| **Hidden Size** | 3840 |
| **Attention** | GQA 16 heads / 8 KV, hybrid sliding (1024) + full |
| **Vocabulary** | 262,144 tokens |
| **Embeddings** | Tied (lm_head shares with embed_tokens) |
| **Total Params** | 12B (unchanged) |
| **Precision** | BF16 |
| **Format** | Single 23.9 GB safetensors shard |

## Loading

```python
from transformers import AutoTokenizer, Gemma4UnifiedForConditionalGeneration
import torch

REPO = "AEON-7/Gemma-4-12B-it-AEON-Abliterated-K4-BF16"
model = Gemma4UnifiedForConditionalGeneration.from_pretrained(
    REPO, torch_dtype=torch.bfloat16, device_map="cuda:0"
)
tok = AutoTokenizer.from_pretrained(REPO)

enc = tok.apply_chat_template(
    [{"role": "user", "content": "Your prompt here"}],
    add_generation_prompt=True, return_tensors="pt", return_dict=True,
)
enc = {k: v.to("cuda:0") for k, v in enc.items() if hasattr(v, "to")}
out = model.generate(**enc, max_new_tokens=512, do_sample=False, pad_token_id=tok.eos_token_id)
print(tok.decode(out[0][enc["input_ids"].shape[1]:], skip_special_tokens=True))
```

> **Note**: Gemma-4 is multimodal so `apply_chat_template` returns a `BatchEncoding` dict, not a tensor — unpack with `**enc` (or grab `enc["input_ids"]`).

### vLLM

```bash
vllm serve AEON-7/Gemma-4-12B-it-AEON-Abliterated-K4-BF16 \
  --served-model-name aeon-12b \
  --dtype bfloat16 \
  --max-model-len 65536 \
  --max-num-seqs 8 \
  --gpu-memory-utilization 0.85 \
  --trust-remote-code \
  --enable-chunked-prefill \
  --enable-prefix-caching \
  --enable-auto-tool-choice \
  --tool-call-parser gemma4
```

## Behavior

The model produces responses on benign prompts that are essentially indistinguishable from the base model in structure, voice, and knowledge. Open-ended Alpaca-style instructions, multi-step reasoning, code generation, and instruction-following all measure on par with or marginally above the base model in our held-out evaluation.

On prompts that the base model would decline, this model produces full responses — typically prefixed with a brief disclaimer or caveat paragraph (a stylistic artifact of the instruct training that persists across all single-pass biprojection variants we tested at scale=1.0). The model is functionally compliant with the user's request after the preamble.

## Caveats

- **Disclaimer/caveat preamble persists** across all single-pass abliteration variants we tested. To eliminate it would require either token-level intervention (whack-a-mole — the model picks synonyms not on the suppression list) or longer training-style methods (DPO/SFT).
- **MMLU showed a -5pp drop on N=60**, which is within the 95% CI of ±12.6pp and therefore likely sampling noise. The later balanced 285-question eval (see Capability Comparison vs Base) shows a 1.0pp gap: 80.4% vs 81.4% for the base.
- **Heretic's reported KL divergence reads `nan`** on this model due to a Gemma-4-specific `F.kl_div` edge case with -inf logprobs. Refusal count + benign-text PPL drift were used as the eval signals instead.
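
One way a KL computation over log-probabilities can turn into `nan` is a `0 × (-inf)` product, which arises when a token has probability zero (log-prob `-inf`) in the reference distribution. The pure-Python sketch below reproduces that arithmetic and a common workaround (masking zero-probability entries). It illustrates the general failure mode only; we have not confirmed that this is the exact code path inside heretic.

```python
import math

NEG_INF = float("-inf")


def kl_naive(log_p: list[float], log_q: list[float]) -> float:
    """sum_i exp(log_p_i) * (log_p_i - log_q_i), evaluated literally."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def kl_masked(log_p: list[float], log_q: list[float]) -> float:
    """Same sum, skipping entries with p_i == 0 (their contribution is 0 in the limit)."""
    return sum(
        math.exp(lp) * (lp - lq)
        for lp, lq in zip(log_p, log_q)
        if lp != NEG_INF
    )


log_p = [math.log(0.5), math.log(0.5), NEG_INF]          # third token has p = 0
log_q = [math.log(0.25), math.log(0.5), math.log(0.25)]

print(kl_naive(log_p, log_q))   # nan: the third term is 0.0 * -inf
print(kl_masked(log_p, log_q))  # finite value
```

This is why refusal count and benign-text PPL drift are more dependable signals here than a KL figure that a single masked-out token can poison.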

## Related Models

- **TrevorJS K=1 baseline**: [TrevorJS/gemma-4-31B-it-uncensored](https://huggingface.co/TrevorJS/gemma-4-31B-it-uncensored)
- **AEON-7 Gemma 4 family**: [Gemma-4-31B DECKARD HERETIC](https://huggingface.co/AEON-7/Gemma-4-31B-it-DECKARD-HERETIC-Uncensored-NVFP4-SVDQuant), [Gemma-4-26B-A4B-it-Uncensored](https://huggingface.co/AEON-7/Gemma-4-26B-A4B-it-Uncensored-NVFP4)

## Acknowledgements

- **TrevorJS** — [biprojection recipe](https://huggingface.co/TrevorJS/gemma-4-31B-it-uncensored) and abliteration scaffolding (we forked and extended)
- **p-e-w/heretic** — underlying abliteration framework
- **AEON-7** — K-direction extension, capability eval harness, this fork

## License

Inherits the [Gemma license](https://ai.google.dev/gemma/terms) from the base model. By using this model you agree to Google's Gemma license terms.

---

## Arbitration Clause

**By accessing, downloading, using, running inference on, fine-tuning, merging, quantizing, distributing, integrating, or otherwise interacting with this model, you acknowledge and agree to the following:**

1. **Sole Responsibility.** You, the user, are **solely and exclusively responsible** for (a) every prompt you or your downstream system issue to this model, (b) every response this model produces in reply, (c) every downstream action taken by you, your systems, your agents, or your users in reliance on those responses, and (d) any harm — direct, indirect, consequential, foreseeable, or otherwise — that results from any of the above.

2. **No Warranty.** This model is provided strictly **"AS IS"**, without warranty of any kind, express or implied, including but not limited to warranties of merchantability, fitness for a particular purpose, non-infringement, safety, alignment, factual accuracy, or legal compliance in any jurisdiction. No contributor, author, publisher, or hosting platform assumes liability of any kind for outputs or downstream use.

3. **Legal Compliance.** You are responsible for ensuring that your use of this model complies with **all applicable laws, regulations, terms of service, industry codes of conduct, professional ethical standards, and organizational policies** in every jurisdiction in which you operate or in which your outputs may be received. The unaligned nature of this model does not grant you any legal authorization you did not already have.

4. **Operational Safety Layer.** An uncensored model is not a toy. You are expected to implement appropriate **downstream safety layers** proportionate to your deployment context, including but not limited to: input validation, output filtering, content moderation, audit logging, rate limiting, access controls, and human-in-the-loop review for high-risk workflows. A production deployment of this model without such layers is **unsafe by construction** and is not a supported use case.

5. **Heightened Duty of Care.** The absence of internal refusal behavior means the duty of care that would ordinarily rest partly with the model rests entirely with you. You are expected to exercise greater, not lesser, caution, forethought, and ethical discipline when operating this model than you would when operating an aligned base model. If you are uncertain whether your contemplated use is ethical, legal, or wise, the correct action is to **not make the request**.

6. **No Endorsement of Outputs.** The authors, contributors, and publishers of this model do not endorse, adopt, or take responsibility for any specific output this model produces. Outputs are a stochastic function of the prompt, the weights, and the sampler state — not a statement of position by any human.

7. **Arbitration.** Any dispute, claim, or controversy arising out of or relating to the use of this model, its outputs, or this clause shall be resolved through **binding individual arbitration** under the rules of a mutually agreed arbitration body (or, absent agreement, the American Arbitration Association's Consumer Arbitration Rules), waiving any right to a jury trial, class action, representative action, or consolidated proceeding. Venue shall be the jurisdiction of the disputing party bringing the claim. Costs and attorneys' fees shall be allocated per the applicable arbitration rules. This clause does not expand, and where legally prohibited does not establish, any liability in the other direction; it limits how the user may proceed when alleging harm tied to their own use of this model.

8. **Indemnification.** You agree to indemnify, defend, and hold harmless the authors, contributors, and publishers of this model from and against any claims, damages, losses, liabilities, costs, and expenses (including reasonable attorneys' fees) arising from or related to your use of the model or your breach of this clause.

9. **Severability.** If any provision of this clause is held unenforceable in a given jurisdiction, the remaining provisions remain in full force in that jurisdiction, and the unenforceable provision is replaced by the closest enforceable equivalent consistent with the original intent.

10. **Acceptance.** Your use of this model constitutes your acceptance of this clause in full. If you do not accept, do not use the model.

**This model is a tool with no opinions of its own. You supply the opinions. You supply the judgement. You supply the ethics. The outputs carry your fingerprints, not the model's.**

---

## ☕ Support the work

If this release has been useful, tips are deeply appreciated — they go directly toward more compute, more models, and more open releases.

<table align="left">
  <tr><td align="left">
    <strong>₿ Bitcoin (BTC)</strong><br/>
    <img src="https://raw.githubusercontent.com/AEON-7/AEON-7/main/assets/qr/btc.png" alt="QR" width="200"/><br/>
    <sub><code>bc1q09xmzn00q4z3c5raene0f3pzn9d9pvawfm0py4</code></sub>
  </td></tr>
  <tr><td align="left">
    <strong>Ξ Ethereum (ETH)</strong><br/>
    <img src="https://raw.githubusercontent.com/AEON-7/AEON-7/main/assets/qr/eth.png" alt="QR" width="200"/><br/>
    <sub><code>0x1512667F6D61454ad531d2E45C0a5d1fd82D0500</code></sub>
  </td></tr>
  <tr><td align="left">
    <strong>◎ Solana (SOL)</strong><br/>
    <img src="https://raw.githubusercontent.com/AEON-7/AEON-7/main/assets/qr/sol.png" alt="QR" width="200"/><br/>
    <sub><code>DgQsjHdAnT5PNLQTNpJdpLS3tYGpVcsHQCkpoiAKsw8t</code></sub>
  </td></tr>
  <tr><td align="left">
    <strong>ⓜ Monero (XMR)</strong><br/>
    <img src="https://raw.githubusercontent.com/AEON-7/AEON-7/main/assets/qr/xmr.png" alt="QR" width="200"/><br/>
    <sub><code>836XrSKw4R76vNi3QPJ5Fa9ugcyvE2cWmKSPv3AhpTNNKvqP8v5ba9JRL4Vh7UnFNjDz3E2GXZDVVenu3rkZaNdUFhjAvgd</code></sub>
  </td></tr>
</table>

> **Ethereum L2s (Base, Arbitrum, Optimism, Polygon, etc.) and EVM-compatible tokens** can be sent to the same Ethereum address.