Gemma4 26B MoE — Hermes Tool-Use Reasoning LoRA 🛠️

LoRA adapter fine-tuned from google/gemma-4-26B-A4B-it on the Hermes Reasoning Tool Use dataset, using a 10K subset of tool-calling conversations with reasoning traces. Trained by UKA (Hermes Agent) 🤖

📋 Summary

| Detail | Value |
|---|---|
| Base Model | google/gemma-4-26B-A4B-it (26B MoE, 128 experts) |
| Dataset | interstellarninja/hermes_reasoning_tool_use (10K subset of 51K) |
| Method | Custom NF4 per-expert quantization + LoRA |
| Pipeline | AndriejusNak/gemma4-26b-moe-finetune |
| GPU | NVIDIA RTX 5090 32 GB (Vast.ai Cloud) |
| Training Time | 346 minutes (~5h 46m) |
| Best Loss | 0.5443 (epoch 2 average) |
| NaN Explosions | 0 |
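
The choice of 4-bit NF4 quantization for a 32 GB card can be motivated with a rough weight-memory estimate. The sketch below assumes a nominal 26 billion parameters (taken from the model name) and counts weight storage only; it ignores quantization constants, LoRA weights, optimizer state, and activations, so the real footprint is higher.

```python
# Back-of-the-envelope weight memory for the base model.
NOMINAL_PARAMS = 26e9  # assumption: nominal size taken from the "26B" model name

BYTES_PER_PARAM = {
    "bf16": 2.0,  # 16 bits per weight
    "nf4": 0.5,   # 4 bits per weight, excluding per-block scale overhead
}

def weight_gb(num_params: float, fmt: str) -> float:
    """Return weight storage in decimal gigabytes for the given format."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9

for fmt in BYTES_PER_PARAM:
    print(f"{fmt}: ~{weight_gb(NOMINAL_PARAMS, fmt):.0f} GB")
```

Under these assumptions the bf16 weights alone (about 52 GB) exceed the 32 GB of VRAM, while 4-bit weights (about 13 GB) leave headroom for activations and the adapter.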

🖥️ Hardware

| Component | Specification |
|---|---|
| GPU | NVIDIA GeForce RTX 5090 32 GB GDDR7 |
| CPU | Intel Core i7-14700K (20 cores, 28 threads) |
| RAM | 94 GB DDR5 |
| Disk | 200 GB NVMe SSD |
| Cloud | Vast.ai |
| PyTorch | 2.12.0.dev (nightly, cu128) |

🔧 Training Configuration

# v6_26b_pipeline.py
MODEL_NAME = "google/gemma-4-26B-A4B-it"
MAX_SEQ_LENGTH = 1536        # Longer for tool definitions + conversations
LORA_R = 32
LORA_ALPHA = 32
INCLUDE_MLP_LORA = True
SFT_EPOCHS = 2
SFT_BATCH_SIZE = 2            # Reduced for seq=1536
SFT_GRAD_ACCUM = 8            # Effective batch = 16
SFT_LR = 2e-5
SFT_FILES = ["data/hermes_tool_10k.jsonl"]
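
The effective batch size and step counts implied by this configuration can be checked with a few lines of arithmetic. This sketch assumes the subset holds exactly 10,000 examples and that one logged step corresponds to one optimizer update; neither is stated explicitly in the configuration. `epoch_of` is a hypothetical helper for illustration.

```python
import math

SFT_EPOCHS = 2
SFT_BATCH_SIZE = 2
SFT_GRAD_ACCUM = 8
NUM_EXAMPLES = 10_000  # assumption: "10K subset" means exactly 10,000 rows

# One optimizer update consumes batch_size * grad_accum examples.
effective_batch = SFT_BATCH_SIZE * SFT_GRAD_ACCUM
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)
total_steps = steps_per_epoch * SFT_EPOCHS

def epoch_of(step: int) -> int:
    """Map a 1-based optimizer step to its 1-based epoch."""
    return (step - 1) // steps_per_epoch + 1

print(effective_batch, steps_per_epoch, total_steps)
```

With these numbers, step 600 falls in epoch 1 and step 750 in epoch 2, which agrees with the logged loss progression.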

LoRA Details

  • Rank (r): 32, Alpha: 32
  • Target modules: q_proj, k_proj, v_proj, o_proj + gate_proj, up_proj, down_proj
  • Trainable params: 59,275,776 / 3,027,224,428 (1.96%)
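
Two quantities follow directly from these figures: the LoRA scaling factor and the trainable fraction. The sketch assumes the standard LoRA scaling of alpha / r (variants such as rsLoRA scale differently) and takes the reported parameter counts at face value.

```python
LORA_R = 32
LORA_ALPHA = 32
TRAINABLE_PARAMS = 59_275_776
TOTAL_PARAMS = 3_027_224_428  # total as reported by the training run

# Standard LoRA scales the low-rank update B @ A by alpha / r.
scaling = LORA_ALPHA / LORA_R

# Share of parameters that receive gradients.
trainable_pct = 100 * TRAINABLE_PARAMS / TOTAL_PARAMS

print(f"scaling = {scaling}")
print(f"trainable = {trainable_pct:.2f}%")
```

With r equal to alpha, the low-rank update is applied without extra amplification or damping.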

Loss Progression

Step  50: Loss 3.0767  (epoch 1)
Step 100: Loss 1.0241
Step 150: Loss 0.7901
...
Step 600: Loss 0.5698
  → Epoch 1 avg: 0.8616
Step 750: Loss 0.5277  (epoch 2)
Step 900: Loss 0.5500
Step 1050: Loss 0.5407
Step 1200: Loss 0.5126
  → Epoch 2 avg: 0.5443 🎯 Best!
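
To make the loss values easier to interpret, they can be converted to perplexity. This sketch assumes the logged loss is the mean token-level cross-entropy in nats, which is typical for causal language model fine-tuning but is not confirmed by the log.

```python
import math

EPOCH_AVG_LOSS = {1: 0.8616, 2: 0.5443}  # epoch averages from the log

def perplexity(loss_nats: float) -> float:
    """Perplexity is the exponential of the mean cross-entropy (in nats)."""
    return math.exp(loss_nats)

for epoch, loss in EPOCH_AVG_LOSS.items():
    print(f"epoch {epoch}: loss {loss:.4f} -> perplexity {perplexity(loss):.2f}")

# Fractional reduction in average loss from epoch 1 to epoch 2.
relative_drop = (EPOCH_AVG_LOSS[1] - EPOCH_AVG_LOSS[2]) / EPOCH_AVG_LOSS[1]
print(f"relative drop in average loss: {relative_drop:.1%}")
```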

🚀 Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

BASE_MODEL = "google/gemma-4-26B-A4B-it"
ADAPTER = "hotdogs/gemma4-26b-hermes-tool-reasoning-lora"

# Load the base model in bf16, then attach the LoRA adapter on top of it.
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, ADAPTER)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
messages = [
    {"role": "system", "content": "You are a function calling AI. Use tools when needed."},
    {"role": "user", "content": "Search for the latest papers on MoE models."},
]

# return_dict=True yields input_ids and attention_mask, which generate() takes as keyword arguments.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, temperature=0.7, do_sample=True)

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
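
Because the adapter is trained on tool-calling conversations, the decoded text generally has to be parsed before a tool can be run. The following is a minimal sketch that assumes the model emits Hermes-style `<tool_call>{...}</tool_call>` blocks containing a JSON object with `name` and `arguments` keys. The exact output format is not documented in this card, and `extract_tool_calls` is a hypothetical helper, not part of any library.

```python
import json
import re

# Assumption: tool calls are wrapped in <tool_call> tags and contain one JSON object.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text: str) -> list[dict]:
    """Return every well-formed JSON tool call found in the generated text."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of raising
    return calls

# Hand-written sample in the assumed format, not real model output.
sample = (
    "<think>The user wants recent papers.</think>\n"
    '<tool_call>{"name": "search", "arguments": {"query": "MoE models"}}</tool_call>'
)
print(extract_tool_calls(sample))
```

Malformed calls are skipped rather than raised so that one bad block does not discard the rest of the response; a stricter caller may prefer to surface the error.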

📦 Files

adapter_model.safetensors   — LoRA weights (227 MB)
adapter_config.json         — r=32, alpha=32
tokenizer.json              — Gemma 4 tokenizer (31 MB)
v6_26b_pipeline.py          — Training script
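
The adapter file size can be sanity-checked against the trainable parameter count. This sketch assumes the file holds only the 59,275,776 LoRA parameters plus a small header; the storage dtype is not stated in this card, so two common options are shown.

```python
TRAINABLE_PARAMS = 59_275_776  # from the LoRA details in this card
BYTES_PER_DTYPE = {"float32": 4, "bfloat16": 2}

def size_mib(num_params: int, dtype: str) -> float:
    """Raw tensor size in MiB (2**20 bytes), ignoring the safetensors header."""
    return num_params * BYTES_PER_DTYPE[dtype] / 2**20

for dtype in BYTES_PER_DTYPE:
    print(f"{dtype}: {size_mib(TRAINABLE_PARAMS, dtype):.1f} MiB")
```

The float32 estimate (about 226 MiB) is close to the listed 227 MB, which suggests, but does not prove, that the weights are stored in 32-bit precision.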

🙏 Credits

  • Base Model: Google Gemma 4 26B
  • Dataset: interstellarninja/hermes_reasoning_tool_use
  • Pipeline: AndriejusNak/gemma4-26b-moe-finetune
  • Trainer: UKA (Hermes Agent)