---
license: other
library_name: transformers
tags:
- gemma-4
- agent
- tool-use
- fine-tuned
- claude-conversations
- coding
- autonomous-agent
base_model: 0xSero/gemma-4-21b-a4b-it-REAP
---

# gemma-4-21b-reap-harness-ready

This is a fine-tuned version of [`0xSero/gemma-4-21b-a4b-it-REAP`](https://huggingface.co/0xSero/gemma-4-21b-a4b-it-REAP), trained on Claude conversations that include tool use.

## Attribution & Licenses

### Base Model

This model is based on:

- **Gemma 4** by Google DeepMind
- **0xSero/gemma-4-21b-a4b-it-REAP** - A specialized fine-tune of Gemma 4

Gemma 4 is licensed under the **Gemma License**: https://ai.google.dev/gemma/terms

### Training Data

- **Dataset**: Private Claude conversations (agent-dataset-unsloth)
- **Source**: Conversations generated using Anthropic's Claude (Claude Code)
- **License**: Private dataset - not for redistribution

### Training Framework

This model was fine-tuned using:

- **Transformers** by Hugging Face (Apache 2.0)
- **PEFT** (Parameter-Efficient Fine-Tuning) by Hugging Face (Apache 2.0)
- **bitsandbytes** for 4-bit quantization (MIT)
- **Unsloth** for optimized training (Apache 2.0)

### Developer

- **Fine-tuned by**: Austin Dixson
- **Training Date**: April 2025
- **Status**: Active development - iteration 1/10

## Training Details

- **Base Model**: 0xSero/gemma-4-21b-a4b-it-REAP
- **Training Steps**: 325/1500 (22% complete)
- **Loss**: ~2.708
- **Dataset**: Private Claude conversations (agent-dataset-unsloth)
- **Training Method**: LoRA (Low-Rank Adaptation)
  - Rank (r): 16
  - Alpha: 16
  - Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj

## Capabilities

This model has been fine-tuned for:

- **One-shot coding** - Writing code from a single example
- **Tool-driven agent loops** - Using tools autonomously
- **Function calling** - OpenAI-style function calling
- **Autonomous research** - Self-directed problem solving

## Tools Integrated

- divideandconquer
- PinchBench
- WildClawBench
- hotAsianIntern

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

# 4-bit NF4 quantization, matching the training configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load the base model in 4-bit
base_model = AutoModelForCausalLM.from_pretrained(
    "0xSero/gemma-4-21b-a4b-it-REAP",
    device_map="auto",
    torch_dtype=torch.float16,
    quantization_config=bnb_config,
)

# Attach the LoRA adapters from this repository
model = PeftModel.from_pretrained(base_model, "austindixson/gemma-4-21b-reap-harness-ready")
tokenizer = AutoTokenizer.from_pretrained("austindixson/gemma-4-21b-reap-harness-ready")

# Generate a response
prompt = "How do I create a REST API in Python?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Configuration

- **Max Sequence Length**: 2048 tokens
- **Batch Size**: 2 per device × 4 gradient accumulation steps = 8 effective
- **Learning Rate**: 2e-4
- **Quantization**: 4-bit (NF4)
- **Optimizer**: AdamW 8-bit
- **Scheduler**: Cosine with 10 warmup steps

## Hardware

Trained on an H100 GPU (80GB HBM3) with 4-bit quantization for memory efficiency.

## Iteration Plan

This model is part of a 10-iteration workflow:

1. Train → Benchmark → Auto-research → Prune → Deploy
2. Current status: First iteration checkpoint (step 325)

## License

This model inherits the license of the base Gemma 4 model. See the [Gemma License](https://ai.google.dev/gemma/terms) for usage terms.

---

## Acknowledgments

- **Google DeepMind** for creating the Gemma 4 model
- **0xSero** for the REAP fine-tune of Gemma 4
- **Anthropic** for Claude (Claude Code), used to generate the training data
- **Hugging Face** for the Transformers and PEFT libraries
- The **bitsandbytes** maintainers for the 4-bit quantization library
- **Unsloth** for the optimized training framework
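
## Appendix: Learning-Rate Schedule Sketch

The Training Configuration section lists a cosine scheduler with 10 warmup steps and a learning rate of 2e-4. The sketch below illustrates what such a schedule could look like over the 1500 planned steps. It assumes linear warmup from 0 and cosine decay to 0, which is a common shape but is not confirmed by this card; the function name `lr_at` and the constants are illustrative, not taken from the training script.

```python
import math

PEAK_LR = 2e-4       # learning rate from Training Configuration
WARMUP_STEPS = 10    # warmup steps from Training Configuration
TOTAL_STEPS = 1500   # planned steps from Training Details


def lr_at(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS, total_steps=TOTAL_STEPS):
    """Assumed schedule: linear warmup from 0, then cosine decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


# Effective batch size: per-device batch × gradient accumulation steps
effective_batch = 2 * 4

if __name__ == "__main__":
    for step in (0, 5, 10, 325, 1500):
        print(f"step {step:>4}: lr = {lr_at(step):.3e}")
    print("effective batch:", effective_batch)
```

Under these assumptions, the step-325 checkpoint sits early in the decay phase, so its learning rate is still close to the peak value.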