Commit 35cafbd by austindixson · verified · 1 Parent(s): 6299d3b

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +129 -3
README.md CHANGED
@@ -1,3 +1,129 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: other
+ library_name: transformers
+ tags:
+ - gemma-4
+ - agent
+ - tool-use
+ - fine-tuned
+ - claude-conversations
+ - coding
+ - autonomous-agent
+ base_model: 0xSero/gemma-4-21b-a4b-it-REAP
+ ---
+
+ # gemma-4-21b-reap-harness-ready
+
+ This repository contains LoRA adapters fine-tuned from [`0xSero/gemma-4-21b-a4b-it-REAP`](https://huggingface.co/0xSero/gemma-4-21b-a4b-it-REAP) on Claude conversations that include tool use.
+
+ ## Attribution & Licenses
+
+ ### Base Model
+ This model is based on:
+ - **Gemma 4** by Google DeepMind
+ - **0xSero/gemma-4-21b-a4b-it-REAP** - A specialized fine-tune of Gemma 4
+
+ Gemma 4 is licensed under the **Gemma License**: https://ai.google.dev/gemma/terms
+
+ ### Training Data
+ - **Dataset**: Private Claude conversations (agent-dataset-unsloth)
+ - **Source**: Conversations generated using Anthropic's Claude (Claude Code)
+ - **License**: Private dataset - not for redistribution
+
+ ### Training Framework
+ This model was fine-tuned using:
+ - **Transformers** by Hugging Face (Apache 2.0)
+ - **PEFT** (Parameter-Efficient Fine-Tuning) by Hugging Face (Apache 2.0)
+ - **bitsandbytes** for 4-bit quantization (MIT)
+ - **Unsloth** for optimized training (Apache 2.0)
+
+ ### Developer
+ - **Fine-tuned by**: Austin Dixson
+ - **Training Date**: April 2025
+ - **Status**: Active development - iteration 1/10
+
+ ## Training Details
+
+ - **Base Model**: 0xSero/gemma-4-21b-a4b-it-REAP
+ - **Training Steps**: 325/1500 (22% complete)
+ - **Loss**: ~2.708
+ - **Dataset**: Private Claude conversations (agent-dataset-unsloth)
+ - **Training Method**: LoRA (Low-Rank Adaptation)
+   - Rank (r): 16
+   - Alpha: 16
+   - Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
+
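To make the LoRA settings listed above more concrete, here is a minimal sketch of how rank and alpha set the scale of the low-rank update and how many trainable parameters one adapted layer gains. It assumes standard LoRA scaling (`alpha / r`). The layer dimensions `D_IN` and `D_OUT` are hypothetical placeholders, not the real shapes of this model.

```python
def lora_scale(r: int, alpha: int) -> float:
    """Scaling factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / r


def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added to one linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * d_in + d_out * r


R, ALPHA = 16, 16          # values from the training details
D_IN, D_OUT = 4096, 4096   # hypothetical layer shape, for illustration only

scale = lora_scale(R, ALPHA)
added = lora_params(D_IN, D_OUT, R)
full = D_IN * D_OUT

print(f"update scale: {scale}")
print(f"LoRA params per layer: {added} ({added / full:.2%} of the full matrix)")
```

With rank equal to alpha, the update is applied at scale 1.0, so the adapter's contribution is neither amplified nor damped relative to the learned `B @ A` product.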
+ ## Capabilities
+
+ This model has been fine-tuned for:
+ - **One-shot coding** - Writing code from single examples
+ - **Tool-driven agent loops** - Using tools autonomously
+ - **Function calling** - OpenAI-style function calling
+ - **Autonomous research** - Self-directed problem solving
+
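As an illustration of what "OpenAI-style function calling" usually means on the harness side, the sketch below defines one tool in the OpenAI `tools` schema and dispatches a tool call whose arguments arrive as a JSON string. The `get_weather` tool, the `REGISTRY` mapping, and the `dispatch` helper are hypothetical. The card does not document how this model formats its tool calls, so treat the message shape as an assumption.

```python
import json

# Hypothetical tool definition in the OpenAI "tools" format.
GET_WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def get_weather(city: str) -> dict:
    """Stand-in implementation; a real tool would query a weather service."""
    return {"city": city, "forecast": "unknown"}


# Maps tool names to the Python callables that implement them.
REGISTRY = {"get_weather": get_weather}


def dispatch(tool_call: dict) -> str:
    """Run one tool call and return a JSON string to feed back to the model."""
    name = tool_call["function"]["name"]
    # In the OpenAI format, arguments are a JSON-encoded string, not a dict.
    args = json.loads(tool_call["function"]["arguments"])
    if name not in REGISTRY:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(REGISTRY[name](**args))


example_call = {
    "id": "call_0",
    "type": "function",
    "function": {"name": "get_weather", "arguments": '{"city": "Oslo"}'},
}
result = dispatch(example_call)
print(result)
```

An agent loop would append `result` to the conversation as a tool message and call the model again until it stops requesting tools.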
+ ## Tools Integrated
+
+ - divideandconquer
+ - PinchBench
+ - WildClawBench
+ - hotAsianIntern
+
+ ## Usage
+
+ ```python
+ import torch
+ from peft import PeftModel
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+ # 4-bit NF4 quantization, matching the training configuration
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_compute_dtype=torch.float16,
+ )
+
+ # Load the base model in 4-bit
+ base_model = AutoModelForCausalLM.from_pretrained(
+     "0xSero/gemma-4-21b-a4b-it-REAP",
+     device_map="auto",
+     torch_dtype=torch.float16,
+     quantization_config=bnb_config,
+ )
+
+ # Attach the LoRA adapters and load the tokenizer from this repository
+ model = PeftModel.from_pretrained(base_model, "austindixson/gemma-4-21b-reap-harness-ready")
+ tokenizer = AutoTokenizer.from_pretrained("austindixson/gemma-4-21b-reap-harness-ready")
+
+ # Generate a completion for a plain-text prompt
+ prompt = "How do I create a REST API in Python?"
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=512)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ ## Training Configuration
+
+ - **Max Sequence Length**: 2048 tokens
+ - **Batch Size**: 2 per device × 4 gradient accumulation steps = effective batch size of 8
+ - **Learning Rate**: 2e-4
+ - **Quantization**: 4-bit NF4
+ - **Optimizer**: AdamW 8-bit
+ - **Scheduler**: cosine with 10 warmup steps
+
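The batch-size arithmetic and the learning-rate schedule above can be sketched in plain Python. This is a rough illustration, assuming linear warmup followed by a half-cosine decay to zero over the planned 1500 steps (the shape commonly used for a "cosine" scheduler) and a single device. Neither assumption is confirmed by the card.

```python
import math

BASE_LR = 2e-4
WARMUP_STEPS = 10
TOTAL_STEPS = 1500      # planned run length from the training details
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 4
NUM_DEVICES = 1         # assumption: one GPU


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (linear warmup, then cosine decay)."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
print(f"effective batch size: {effective_batch}")

for step in (0, 10, 325, 1500):
    print(f"step {step:>4}: lr = {lr_at(step):.3e}")
```

Under these assumptions the checkpoint at step 325 sits early in the decay, so the learning rate is still close to its peak.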
+ ## Hardware
+
+ Trained on an H100 GPU (80 GB HBM3), with the base model loaded in 4-bit to reduce memory use.
+
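A back-of-the-envelope estimate shows why 4-bit loading matters on an 80 GB card. The sketch below assumes the "21b" in the model name means roughly 21 billion parameters and counts weight storage only. It ignores quantization constants, activations, LoRA weights, and optimizer state, so real usage will be higher.

```python
PARAMS = 21e9  # assumption: "21b" in the model name read as ~21 billion parameters

# Approximate storage cost per parameter for each format.
BYTES_PER_PARAM = {"fp16": 2.0, "nf4": 0.5}


def weight_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


for fmt, nbytes in BYTES_PER_PARAM.items():
    print(f"{fmt}: ~{weight_gb(PARAMS, nbytes):.1f} GB of weights")
```

Under these assumptions, 4-bit weights take about a quarter of the fp16 footprint, which leaves more of the 80 GB for activations and optimizer state.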
+ ## Iteration Plan
+
+ This model is part of a planned 10-iteration workflow:
+ - Each iteration: Train → Benchmark → Auto-research → Prune → Deploy
+ - Current status: first iteration checkpoint (step 325)
+
+ ## License
+
+ This model inherits the license from the base Gemma 4 model.
+ See the [Gemma License](https://ai.google.dev/gemma/terms) for usage terms.
+
+ ---
+
+ ## Acknowledgments
+
+ - **Google DeepMind** for creating the Gemma 4 model
+ - **0xSero** for the REAP fine-tune of Gemma 4
+ - **Anthropic** for Claude (Claude Code) used to generate training data
+ - **Hugging Face** for the Transformers and PEFT libraries
+ - The **bitsandbytes** maintainers for 4-bit quantization
+ - **Unsloth** for the optimized training framework