austindixson
/

gemma-4-21b-reap-harness-ready

+---
+license: other
+library_name: transformers
+tags:
+- gemma-4
+- agent
+- tool-use
+- fine-tuned
+- claude-conversations
+- coding
+- autonomous-agent
+base_model: 0xSero/gemma-4-21b-a4b-it-REAP
+---
+# gemma-4-21b-reap-harness-ready
+This is a fine-tuned version of [`0xSero/gemma-4-21b-a4b-it-REAP`](https://huggingface.co/0xSero/gemma-4-21b-a4b-it-REAP) trained on Claude conversations with tool use capabilities.
+## Attribution & Licenses
+### Base Model
+This model is based on:
+- **Gemma 4** by Google DeepMind
+- **0xSero/gemma-4-21b-a4b-it-REAP** - A specialized fine-tune of Gemma 4
+Gemma 4 is licensed under the **Gemma License**: https://ai.google.dev/gemma/terms
+### Training Data
+- **Dataset**: Private Claude conversations (agent-dataset-unsloth)
+- **Source**: Conversations generated using Anthropic's Claude (Claude Code)
+- **License**: Private dataset - not for redistribution
+### Training Framework
+This model was fine-tuned using:
+- **Transformers** by Hugging Face (Apache 2.0)
+- **PEFT** (Parameter-Efficient Fine-Tuning) by Hugging Face (Apache 2.0)
+- **bitsandbytes** for 4-bit quantization (MIT)
+- **Unsloth** for optimized training (Apache 2.0)
+### Developer
+**Fine-tuned by**: Austin Dixson
+**Training Date**: April 2025
+**Status**: Active development - iteration 1/10
+## Training Details
+- **Base Model**: 0xSero/gemma-4-21b-a4b-it-REAP
+- **Training Steps**: 325/1500 (22% complete)
+- **Loss**: ~2.708
+- **Dataset**: Private Claude conversations (agent-dataset-unsloth)
+- **Training Method**: LoRA (Low-Rank Adaptation)
+  - Rank (r): 16
+  - Alpha: 16
+  - Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
+## Capabilities
+This model has been fine-tuned for:
+- **One-shot coding** - Writing code from single examples
+- **Tool-driven agent loops** - Using tools autonomously
+- **Function calling** - OpenAI-style function calling
+- **Autonomous research** - Self-directed problem solving
+## Tools Integrated
+- divideandconquer
+- PinchBench
+- WildClawBench
+- hotAsianIntern
+## Usage
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from peft import PeftModel
+import torch
+# Load base model in 4-bit
+base_model = AutoModelForCausalLM.from_pretrained(
+    "0xSero/gemma-4-21b-a4b-it-REAP",
+    device_map="auto",
+    torch_dtype=torch.float16,
+    load_in_4bit=True,
+)
+# Load LoRA adapters
+model = PeftModel.from_pretrained(base_model, "austindixson/gemma-4-21b-reap-harness-ready")
+tokenizer = AutoTokenizer.from_pretrained("austindixson/gemma-4-21b-reap-harness-ready")
+# Use the model
+prompt = "How do I create a REST API in Python?"
+inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=512)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
+## Training Configuration
+- **Max Sequence Length**: 2048 tokens
+- **Batch Size**: 2 per device × 4 gradient accumulation = 8 effective batch
+- **Learning Rate**: 2e-4
+- **Quantization**: 4-bit (NF4 quantization)
+- **Optimizer**: AdamW 8-bit
+- **Scheduler**: cosine with 10 warmup steps
+## Hardware
+Trained on H100 GPU (80GB HBM3) with 4-bit quantization for memory efficiency.
+## Iteration Plan
+This model is part of a 10x iteration workflow:
+1. Train → Benchmark → Auto-research → Prune → Deploy
+2. Current status: First iteration checkpoint (step 325)
+## License
+This model inherits the license from the base Gemma 4 model.
+See the [Gemma License](https://ai.google.dev/gemma/terms) for usage terms.
+---
+## Acknowledgments
+- **Google DeepMind** for creating the Gemma 4 model
+- **0xSero** for the REAP fine-tune of Gemma 4
+- **Anthropic** for Claude (Claude Code) used to generate training data
+- **Hugging Face** for the Transformers, PEFT, and Bitsandbytes libraries
+- **Unsloth** for the optimized training framework