---
library_name: mlx
license: apache-2.0
license_link: https://ai.google.dev/gemma/docs/gemma_4_license
pipeline_tag: text-generation
tags:
  - mlx
  - lora
  - adapters
  - gemma4
  - reasoning
  - sft
  - opus
  - claude-code
  - chain-of-thought
  - tool-use
  - ravenx
  - apple-silicon
  - turboquant
  - kv-cache-compression
  - long-context
base_model: deadbydawn101/gemma-4-E4B-mlx-4bit
base_model_relation: adapter
language:
  - en
---

# gemma-4-E4B — Opus Reasoning + Claude Code LoRA

## 🧠 Opus Reasoning + Claude Code LoRA

LoRA adapters trained on **Claude Opus 4.6 reasoning traces** and **Claude Code tool-use patterns** — applied on top of `deadbydawn101/gemma-4-E4B-mlx-4bit` to give Gemma 4 a reasoning-heavy, structured assistant style.

> **What this means:** these adapters teach the model to think before answering — using `<think>` tags for chain-of-thought, multi-step reasoning, and tool-invocation patterns extracted from real Claude Code sessions.

## What's in this LoRA

| Source | Examples | Description |
|--------|--------:|-------------|
| **Crownelius/Opus-4.6-Reasoning-2100x-formatted** | 2,054 | Claude Opus 4.6 reasoning traces formatted with `<think>` tags |
| **Claude Code tool-use patterns** | 140 files | Real Claude Code agentic patterns — file read/write, bash, search loops |
| **Total** | **2,163** | SFT dataset: assistant completions only (`--train-on-completions`) |

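Training on **completions only** means the loss is computed on the assistant tokens alone: prompt tokens provide context but contribute no gradient, so the model is optimized for the *response style* rather than for reproducing prompts. The intent is for that style to carry over to new prompts.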
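To make the completions-only idea concrete, here is a minimal pure-Python sketch of a prompt-masked loss. It is an illustration only, not the training code used for these adapters; the helper names `completion_mask` and `masked_mean_loss` and the per-token loss values are hypothetical.

```python
def completion_mask(prompt_len: int, total_len: int) -> list[int]:
    """Return 1 for completion tokens (counted in the loss) and 0 for prompt tokens."""
    if not 0 <= prompt_len <= total_len:
        raise ValueError("prompt_len must be between 0 and total_len")
    return [0] * prompt_len + [1] * (total_len - prompt_len)


def masked_mean_loss(token_losses: list[float], mask: list[int]) -> float:
    """Average the per-token loss over the unmasked (completion) positions only."""
    kept = [loss for loss, keep in zip(token_losses, mask) if keep]
    if not kept:
        raise ValueError("no completion tokens to train on")
    return sum(kept) / len(kept)


# Hypothetical per-token losses: 3 prompt tokens followed by 2 completion tokens.
token_losses = [5.0, 4.0, 6.0, 1.0, 3.0]
mask = completion_mask(prompt_len=3, total_len=5)
loss = masked_mean_loss(token_losses, mask)
print(mask, loss)
```

The three prompt positions are ignored, so only the last two values contribute to the average.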

## Adapter Details

| Property | Value |
|----------|-------|
| **Base model** | `deadbydawn101/gemma-4-E4B-mlx-4bit` |
| **Adapter type** | LoRA (MLX SFT) |
| **File size** | **658.8 MB** |
| **Rank** | 8 |
| **Alpha** | 16.0 |
| **Dropout** | 0.0 |
| **Trainable params** | 325M / 7,993M total (4.07%) |
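
The figures in this table can be sanity-checked with a few lines of arithmetic. This sketch assumes the conventional LoRA definition, in which the low-rank update is scaled by `alpha / rank`; whether the training tool applies that ratio or a directly configured scale is not stated here. The 2-bytes-per-parameter storage assumption is likewise a guess, used only to show that the file size is of the expected order.

```python
rank = 8
alpha = 16.0
trainable_params = 325e6  # "325M" from the table (rounded)
total_params = 7_993e6    # "7,993M" from the table

# Conventional LoRA scaling: the low-rank update is multiplied by alpha / rank.
lora_scale = alpha / rank

# Share of parameters that receive gradients.
trainable_pct = 100 * trainable_params / total_params

# Assumption: 16-bit storage (2 bytes per parameter), 1 MB = 10**6 bytes.
approx_size_mb = trainable_params * 2 / 1e6

print(f"scale={lora_scale}, trainable={trainable_pct:.2f}%, approx size={approx_size_mb:.0f} MB")
```

Under those assumptions the estimate lands close to the reported 658.8 MB; the remaining gap is plausibly rounding in the "325M" figure plus file metadata.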

## Training Config

| Setting | Value |
|---------|------:|
| Iterations | 1,000 |
| Batch size | 2 + grad accum ×4 (eff. batch 8) |
| Learning rate | 1e-5 |
| Max seq length | 2,048 |
| Peak GPU memory | 7.876 GB |
| Hardware | Apple M4 Max 128GB |
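
As a rough guide to how many times the dataset was seen, the following sketch combines the settings above with the 2,163-example total. The card does not say whether an "iteration" is a full optimizer step (8 examples) or a single micro-batch (2 examples), so both readings are computed; `dataset_passes` is a hypothetical helper.

```python
def dataset_passes(iterations: int, examples_per_iteration: int, dataset_size: int) -> float:
    """Number of passes over the dataset implied by the iteration count."""
    return iterations * examples_per_iteration / dataset_size


batch_size = 2
grad_accum = 4
effective_batch = batch_size * grad_accum  # examples per optimizer step
dataset_size = 2_163
iterations = 1_000

# Reading A (assumption): one iteration = one optimizer step over the effective batch.
passes_if_step = dataset_passes(iterations, effective_batch, dataset_size)

# Reading B (assumption): one iteration = one micro-batch of `batch_size` examples.
passes_if_micro = dataset_passes(iterations, batch_size, dataset_size)

print(f"effective batch: {effective_batch}")
print(f"passes (optimizer-step reading): {passes_if_step:.2f}")
print(f"passes (micro-batch reading):    {passes_if_micro:.2f}")
```

Depending on the reading, training covered somewhat under one pass or a little under four passes over the data.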

## Training Curve

Training loss dropped sharply within the first 20 iterations and kept falling toward zero:

```
Iter 10   →  2.277
Iter 20   →  0.097   ← rapid style acquisition
Iter 50   →  0.00063
Iter 100  →  0.0000398
Iter 200  →  0.0000067  (checkpoint saved)
Iter 1000 →  ~3.5e-7  (final)
```

## Quickstart (MLX)

### Install base model + adapters

```bash
pip install mlx-lm
```

```python
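from huggingface_hub import snapshot_download
from mlx_lm import load, generate

# adapter_path is read as a local directory, so download the adapter repo first
adapter_dir = snapshot_download("deadbydawn101/gemma-4-E4B-opus-reasoning-claude-code-lora")

# Load the base model and apply the LoRA adapters
model, tokenizer = load(
    "deadbydawn101/gemma-4-E4B-mlx-4bit",
    adapter_path=adapter_dir,
)

# Format the conversation with the model's chat template before generating
messages = [{"role": "user", "content": "Solve this step by step: A train leaves Chicago at 60mph..."}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
response = generate(model, tokenizer, prompt=prompt, max_tokens=1024, verbose=True)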
```
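
Because the adapters are trained to reason inside `<think>` tags, it can be handy to separate the reasoning from the final answer after generation. The helper below is a minimal sketch: `split_reasoning` is a hypothetical name, and it assumes the model closes the block with `</think>`, which this card does not state explicitly.

```python
import re

# Assumption: reasoning is wrapped as <think> ... </think> in the generated text.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no <think> block is found."""
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Remove the first <think> block and keep everything around it as the answer.
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer


# Hand-written sample string, not real model output.
sample = "<think>60 mph for 2 hours is 120 miles.</think>\nThe train travels 120 miles."
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

In practice you would pass the string returned by `generate` instead of `sample`.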

### CLI

```bash
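# Download the adapters to a local directory, then point --adapter-path at it
huggingface-cli download deadbydawn101/gemma-4-E4B-opus-reasoning-claude-code-lora --local-dir ./adapters

mlx_lm.generate \
  --model deadbydawn101/gemma-4-E4B-mlx-4bit \
  --adapter-path ./adapters \
  --prompt "Write a Python function to find prime numbers and explain your reasoning." \
  --max-tokens 1024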
```

## Intended Use

Best for prompts where you want the model to:
- **Think step by step** before responding
- Handle **multi-step problems** (math, logic, code debugging)
- Follow **agentic tool-use patterns** (read → reason → act → verify)
- Produce well-structured, deliberate completions

Not ideal for:
- Short creative tasks (adds reasoning overhead)
- Casual chitchat

## Files

| File | Description |
|------|-------------|
| `adapters.safetensors` | LoRA weights (658.8 MB) |
| `adapter_config.json` | Config: `rank=8, alpha=16, dropout=0.0` |


## ⚡ TurboQuant-MLX Compatibility

Works alongside **[TurboQuant-MLX](https://github.com/DeadByDawn101/turboquant-mlx)** — combine LoRA fine-tuning with 4.6x KV cache compression for long-context reasoning with Claude-style behavior.

→ [TurboQuant-MLX on GitHub](https://github.com/DeadByDawn101/turboquant-mlx)

## Related Models

| Model | Size | Description |
|-------|------|-------------|
| [deadbydawn101/gemma-4-E4B-mlx-4bit](https://huggingface.co/deadbydawn101/gemma-4-E4B-mlx-4bit) | 4.86 GB | Base model — load this first |
| [deadbydawn101/gemma-4-E2B-Heretic-Uncensored-mlx-4bit](https://huggingface.co/deadbydawn101/gemma-4-E2B-Heretic-Uncensored-mlx-4bit) | 3.34 GB | 2B uncensored abliterated variant |

---

*Trained and released by [deadbydawn101](https://huggingface.co/deadbydawn101) · RavenX AI*