---
license: other
library_name: transformers
tags:
- gemma-4
- agent
- tool-use
- fine-tuned
- claude-conversations
- coding
- autonomous-agent
base_model: 0xSero/gemma-4-21b-a4b-it-REAP
---

# gemma-4-21b-reap-harness-ready

This is a LoRA fine-tune of [`0xSero/gemma-4-21b-a4b-it-REAP`](https://huggingface.co/0xSero/gemma-4-21b-a4b-it-REAP), trained on Claude conversations that include tool use.

## Attribution & Licenses

### Base Model
This model is based on:
- **Gemma 4** by Google DeepMind
- **0xSero/gemma-4-21b-a4b-it-REAP** - A specialized fine-tune of Gemma 4

Gemma 4 is licensed under the **Gemma License**: https://ai.google.dev/gemma/terms

### Training Data
- **Dataset**: Private Claude conversations (agent-dataset-unsloth)
- **Source**: Conversations generated using Anthropic's Claude (Claude Code)
- **License**: Private dataset - not for redistribution

### Training Framework
This model was fine-tuned using:
- **Transformers** by Hugging Face (Apache 2.0)
- **PEFT** (Parameter-Efficient Fine-Tuning) by Hugging Face (Apache 2.0)
- **bitsandbytes** for 4-bit quantization (MIT)
- **Unsloth** for optimized training (Apache 2.0)

### Developer
**Fine-tuned by**: Austin Dixson  
**Training Date**: April 2025  
**Status**: Active development - iteration 1/10

## Training Details

- **Base Model**: 0xSero/gemma-4-21b-a4b-it-REAP
- **Training Steps**: 325/1500 (22% complete)
- **Loss**: ~2.708
- **Dataset**: Private Claude conversations (agent-dataset-unsloth)
- **Training Method**: LoRA (Low-Rank Adaptation)
  - Rank (r): 16
  - Alpha: 16
  - Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
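
As a rough illustration of what these LoRA settings imply, the sketch below computes the standard LoRA scaling factor (alpha / r) and the number of trainable parameters a rank-16 adapter adds to a single linear layer. The `LoraSettings` class and `lora_param_count` helper are hypothetical names, and the 4096 × 4096 layer shape is a placeholder, not the actual shape of any layer in this model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """LoRA hyperparameters as listed in Training Details."""
    r: int = 16
    alpha: int = 16
    target_modules: tuple = (
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    )

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update B @ A by alpha / r.
        return self.alpha / self.r


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    # A has shape (r, d_in) and B has shape (d_out, r).
    return r * (d_in + d_out)


settings = LoraSettings()
# Placeholder layer shape, chosen only for illustration.
example_params = lora_param_count(4096, 4096, settings.r)
print(f"scaling = {settings.scaling}, adapter params per layer = {example_params}")
```

With rank and alpha both set to 16, the scaling factor is 1, so the adapter update is added to the frozen weights without extra amplification or damping.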

## Capabilities

This model has been fine-tuned for:
- **One-shot coding** - Writing code from single examples
- **Tool-driven agent loops** - Using tools autonomously
- **Function calling** - OpenAI-style function calling
- **Autonomous research** - Self-directed problem solving
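
This card does not document the exact tool-call format the model emits. As a minimal sketch, assuming the model's output has already been parsed into an OpenAI-style tool call (a function name plus a JSON-encoded arguments string), a harness could declare and dispatch a tool along these lines. The `get_weather` tool, the `REGISTRY` mapping, and the `dispatch_tool_call` helper are all hypothetical.

```python
import json


def get_weather(city: str) -> str:
    # Hypothetical tool body; a real harness would call an external service here.
    return f"Weather lookup requested for {city}"


# OpenAI-style tool declaration that would be passed to the model.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Maps tool names to the Python callables that implement them.
REGISTRY = {"get_weather": get_weather}


def dispatch_tool_call(tool_call: dict) -> dict:
    """Run one tool call and wrap the result as a tool message."""
    name = tool_call["function"]["name"]
    if name not in REGISTRY:
        raise KeyError(f"Unknown tool: {name}")
    # Arguments arrive as a JSON string in the OpenAI-style format.
    arguments = json.loads(tool_call["function"]["arguments"])
    result = REGISTRY[name](**arguments)
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": result}


example_call = {
    "id": "call_0",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Lisbon"}'},
}
reply = dispatch_tool_call(example_call)
print(reply)
```

In an agent loop, the returned tool message would be appended to the conversation and the model called again until it stops requesting tools.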

## Tools Integrated

- divideandconquer
- PinchBench
- WildClawBench
- hotAsianIntern

## Usage

```python
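from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# 4-bit NF4 quantization, matching the training configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load base model in 4-bit
base_model = AutoModelForCausalLM.from_pretrained(
    "0xSero/gemma-4-21b-a4b-it-REAP",
    device_map="auto",
    torch_dtype=torch.float16,
    quantization_config=bnb_config,
)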

# Load LoRA adapters
model = PeftModel.from_pretrained(base_model, "austindixson/gemma-4-21b-reap-harness-ready")
tokenizer = AutoTokenizer.from_pretrained("austindixson/gemma-4-21b-reap-harness-ready")

# Use the model
prompt = "How do I create a REST API in Python?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Configuration

- **Max Sequence Length**: 2048 tokens
- **Batch Size**: 2 per device × 4 gradient accumulation = 8 effective batch
- **Learning Rate**: 2e-4
- **Quantization**: 4-bit (NF4 quantization)
- **Optimizer**: AdamW 8-bit
- **Scheduler**: cosine with 10 warmup steps
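
To make the numbers above concrete, the sketch below reproduces the effective batch size and a cosine schedule with linear warmup. It assumes a single device, a decay to zero, and a schedule horizon of 1500 steps (the planned step count from Training Details); the actual trainer settings may differ. `lr_at` is a hypothetical helper, not a library function.

```python
import math

PER_DEVICE_BATCH = 2
GRAD_ACCUM_STEPS = 4
MAX_SEQ_LEN = 2048
BASE_LR = 2e-4
WARMUP_STEPS = 10
TOTAL_STEPS = 1500  # assumption: the schedule spans the planned 1500 steps

# Sequences per optimizer step on one device.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS
# Upper bound on tokens per optimizer step (every sequence at max length).
max_tokens_per_step = effective_batch * MAX_SEQ_LEN


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch: {effective_batch}")
print(f"max tokens per optimizer step: {max_tokens_per_step}")
for step in (0, 10, 325, 1500):
    print(f"step {step}: lr = {lr_at(step):.3e}")
```

Under these assumptions the step-325 checkpoint is still early in the decay, so its learning rate remains close to the 2e-4 peak.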

## Hardware

Trained on an H100 GPU (80GB HBM3), with the base model loaded in 4-bit to reduce memory use.
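
A back-of-the-envelope estimate shows why 4-bit loading helps. The sketch below assumes roughly 21 billion parameters (inferred from the model name, not confirmed by this card) and counts only the raw weight storage. It ignores quantization constants, activations, LoRA weights, and optimizer state, so real usage will be higher. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Raw weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 21e9  # assumption: taken from the "21b" in the model name

four_bit_gb = weight_memory_gb(NUM_PARAMS, 4)
sixteen_bit_gb = weight_memory_gb(NUM_PARAMS, 16)

print(f"4-bit weights:  ~{four_bit_gb:.1f} GB")
print(f"16-bit weights: ~{sixteen_bit_gb:.1f} GB")
```

Either figure fits within 80GB, but the 4-bit footprint leaves considerably more room for activations and adapter training state.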

## Iteration Plan

This model is part of a 10-iteration workflow:
- **Cycle per iteration**: Train → Benchmark → Auto-research → Prune → Deploy
- **Current status**: First iteration checkpoint (step 325)

## License

This model inherits the license from the base Gemma 4 model.
See the [Gemma License](https://ai.google.dev/gemma/terms) for usage terms.

---

## Acknowledgments

- **Google DeepMind** for creating the Gemma 4 model
- **0xSero** for the REAP fine-tune of Gemma 4
- **Anthropic** for Claude (Claude Code) used to generate training data
- **Hugging Face** for the Transformers and PEFT libraries
- **The bitsandbytes maintainers** for the 4-bit quantization library
- **Unsloth** for the optimized training framework