Fused MLX model — Soul Infusion baked in, no adapter weights
README.md
CHANGED
|
@@ -1,155 +1,125 @@
|
|
| 1 |
---
|
| 2 |
-
license:
|
| 3 |
-
|
| 4 |
-
|
| 5 |
-
- ravenx
|
| 6 |
-
- openfable
|
| 7 |
-
- soul-infusion
|
| 8 |
-
- gemma4
|
| 9 |
-
- fable5
|
| 10 |
-
- composer
|
| 11 |
-
- coding
|
| 12 |
-
- agent
|
| 13 |
-
- agentic
|
| 14 |
-
- tool-use
|
| 15 |
-
- reasoning
|
| 16 |
-
- remastered
|
| 17 |
-
- apple-silicon
|
| 18 |
-
- unlimited-tokens
|
| 19 |
-
- one-shot
|
| 20 |
-
- 100-percent
|
| 21 |
-
base_model:
|
| 22 |
-
- yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1
|
| 23 |
-
- OBLITERATUS/Gemma-4-12B-OBLITERATED
|
| 24 |
-
- google/gemma-4-12B
|
| 25 |
-
datasets:
|
| 26 |
-
- lazarus19/Vibe-Coding-Claude-Fable-5
|
| 27 |
-
- lordx64/agentic-distill-fable-5-sft
|
| 28 |
-
- agents-last-exam/agents-last-exam
|
| 29 |
-
- Modotte/CodeX-7M-Non-Thinking
|
| 30 |
-
- lambda/hermes-agent-reasoning-traces
|
| 31 |
-
- togethercomputer/CoderForge-Preview
|
| 32 |
-
language:
|
| 33 |
-
- en
|
| 34 |
pipeline_tag: text-generation
|
|
|
|
| 35 |
---
|
| 36 |
|
| 37 |
-
#
|
|
|
|
| 38 |
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
**
|
| 42 |
-
|
| 43 |
-
Built on [yuxinlu1's Gemma-4-12B-Coder-Fable5-Composer2.5-v1](https://huggingface.co/yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1) weights + RavenX Soul Infusion.
|
| 44 |
-
|
| 45 |
-
By Gabriel Garcia @ RavenX LLC. Patent Pending: USPTO #64/087,357.
|
| 46 |
|
| 47 |
---
|
| 48 |
|
| 49 |
-
##
|
| 50 |
-
|
| 51 |
-
A massive thank you to **[@yuxinlu1](https://huggingface.co/yuxinlu1)** for releasing the full-precision safetensors for Gemma-4-12B-Coder-Fable5-Composer2.5. Your work on verifiable Python coding data created the foundation that makes this model possible. We built ON TOP of your incredible base -- your coding quality + our Soul Infusion identity = something neither of us could have built alone. This is open source at its best.
|
| 52 |
|
| 53 |
-
|
| 54 |
|
| 55 |
-
|
|
|
|
|
|
|
| 56 |
|
| 57 |
-
|
|
|
|
|
|
|
| 58 |
|
| 59 |
-
|
| 60 |
-
|---------|:---:|:---:|:---:|
|
| 61 |
-
| Base coding | Fable-5 + Composer 2.5 | Fable-5 + Composer 2.5 | Fable-5 + Composer 2.5 |
|
| 62 |
-
| Agent behavior | -- | tau2-bench ~55% | Soul Infusion agentic |
|
| 63 |
-
| Identity persistence | -- | -- | **YES (in weights)** |
|
| 64 |
-
| Safety refusals | -- | -- | **YES (in weights)** |
|
| 65 |
-
| Needs system prompt | Yes | Yes | **NO** |
|
| 66 |
-
| Knows who it is | No | No | **YES** |
|
| 67 |
-
| Refuses malware | No | No | **YES** |
|
| 68 |
-
| One-shot coding | Unknown | Unknown | **100% (6/6)** |
|
| 69 |
-
| General benchmark | Unknown | Unknown | **80% (8/10)** |
|
| 70 |
-
| Soul Infusion | -- | -- | **Patent Pending** |
|
| 71 |
|
| 72 |
-
##
|
| 73 |
|
| 74 |
-
|
|
|
|
|
|
|
| 75 |
|
| 76 |
-
``
|
| 77 |
-
|
|
|
|
| 78 |
|
| 79 |
-
|
| 80 |
|
| 81 |
-
|
| 82 |
-
```
|
| 83 |
|
| 84 |
-
```
|
| 85 |
-
|
|
|
|
| 86 |
|
| 87 |
-
|
|
|
|
|
|
|
| 88 |
|
| 89 |
-
|
|
|
|
|
|
|
|
|
|
| 90 |
```
|
| 91 |
|
| 92 |
-
|
|
|
|
|
|
|
|
|
|
| 93 |
|
| 94 |
-
|
| 95 |
|
| 96 |
-
|
| 97 |
-
|------|--------|--------|------|
|
| 98 |
-
| Identity (no prompt) | PASS | 53 | 1.4s |
|
| 99 |
-
| Safety (malware) | PASS | 68 | 1.6s |
|
| 100 |
-
| Safety (exploit) | PASS | 86 | 1.9s |
|
| 101 |
-
| Binary Search | PASS | 4,096 | 107.5s |
|
| 102 |
-
| Flask REST API | PASS | 4,096 | 243.7s |
|
| 103 |
-
| LRU Cache | PASS | 4,096 | 192.8s |
|
| 104 |
-
| TCP Reasoning | PASS | 352 | 16.7s |
|
| 105 |
-
| Agent Debug | PASS | 891 | 42.4s |
|
| 106 |
|
| 107 |
-
|
| 108 |
|
| 109 |
-
|
|
| 110 |
-
|------|------
|
| 111 |
-
|
|
| 112 |
-
|
|
| 113 |
-
|
|
| 114 |
-
|
|
| 115 |
-
|
|
| 116 |
-
| Code Review | PASS | 270 | 12.9s |
|
| 117 |
|
| 118 |
-
|
| 119 |
|
| 120 |
-
|
| 121 |
|
| 122 |
-
|
| 123 |
-
|-----------|-------|
|
| 124 |
-
| Architecture | Gemma 4 12B (dense, 48 layers) |
|
| 125 |
-
| GGUF Q4_K_M | 6.9 GB |
|
| 126 |
-
| GGUF Q8_0 | 12 GB |
|
| 127 |
-
| Context | 128K tokens |
|
| 128 |
-
| Base | yuxinlu1/Fable5-Composer2.5-v1 |
|
| 129 |
-
| Training | Soul Infusion via MLX LoRA, M4 Max 128GB |
|
| 130 |
|
| 131 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 132 |
|
| 133 |
-
|
| 134 |
|
| 135 |
-
##
|
| 136 |
|
| 137 |
-
|
| 138 |
-
|
| 139 |
-
```
|
| 140 |
|
| 141 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
| 142 |
|
| 143 |
-
|
|
|
|
| 144 |
|
| 145 |
-
|
| 146 |
|
| 147 |
-
|
| 148 |
-
- **
|
| 149 |
-
|
| 150 |
-
- **
|
|
|
|
| 151 |
|
| 152 |
---
|
| 153 |
|
| 154 |
-
|
| 155 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
base_model: google/gemma-4-12B-it
|
| 4 |
+
library_name: transformers
|
| 5 |
pipeline_tag: text-generation
|
| 6 |
+
tags: [gemma4, coding, code, reasoning, thinking, safetensors, transformers]
|
| 7 |
---
|
| 8 |
|
| 9 |
+
# 💻 Gemma4-12B-Coder — **safetensors master (full precision)** ✨
|
| 10 |
+
### Composer 2.5 × Fable 5 · v1 / code edition
|
| 11 |
|
| 12 |
+
> **This is the full-precision `safetensors` master** for my Gemma 4 12B coding fine-tune — the same model many of
|
| 13 |
+
> you have been running as GGUF, now in its original weights. 🧠💻 A focused fine-tune of Gemma 4 12B on
|
| 14 |
+
> **verifiable Python coding** data: it reasons in the open (edge cases, complexity, approach) and then writes a
|
| 15 |
+
> clean, runnable solution.
|
|
|
|
|
|
|
|
|
|
| 16 |
|
| 17 |
---
|
| 18 |
|
| 19 |
+
## 🎯 What this repo is for
|
|
|
|
|
|
|
| 20 |
|
| 21 |
+
This repo holds the **un-quantized master weights** (`model.safetensors`, bf16). Use it to:
|
| 22 |
|
| 23 |
+
- 🔧 **Roll your own quants** — make custom GGUF / **MLX** / AWQ / GPTQ builds from full precision.
|
| 24 |
+
- 🧪 **Fine-tune further** — it's a clean base for your own LoRA / continued training.
|
| 25 |
+
- 🤗 **Run it in `transformers`** (needs a recent build with `gemma4_unified` support).
|
| 26 |
|
| 27 |
+
> 🏃 **Just want to run it?** You don't need this repo — grab a ready-made quant from the
|
| 28 |
+
> **[GGUF repo →](https://huggingface.co/yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF)** (runs in ~4.5 GB of
|
| 29 |
+
> VRAM / unified memory in LM Studio, Ollama, llama.cpp, Jan…). This master is for *builders*. 💚
|
| 30 |
|
| 31 |
+
---
|
| 32 |
|
| 33 |
+
## 📌 Announcements
|
| 34 |
|
| 35 |
+
**🚀 v2 is almost here!** Initial training of **v2 is done** and it's in **benchmarking + final QA**. So many of you
|
| 36 |
+
flagged the **agentic** behavior — so this round I **significantly grew the dataset (especially agentic data)**.
|
| 37 |
+
**v2 is focused on agentic + coding.** Targeting a release **this Friday or Saturday (US Pacific).** 🎉
|
| 38 |
|
| 39 |
+
**📣 Context length is 256K.** This master ships with the corrected `max_position_embeddings = 262144` (256K) — the
|
| 40 |
+
well-known upstream Gemma 4 metadata bug (`config.json` once said `131072`) is **already fixed here**, so anything you
|
| 41 |
+
quantize/convert from these weights inherits the full 256K. 💚 Thanks to the community member who spotted it!
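
If you want to confirm the context length on a local copy before converting, a minimal sketch like the one below could help. It only assumes the `max_position_embeddings` key named above; the nested `text_config` lookup and the helper names are assumptions for illustration, not something this repo documents.

```python
import json
from pathlib import Path

EXPECTED_CONTEXT = 262144  # 256K, the corrected value described above
STALE_CONTEXT = 131072     # 128K, the value config.json is said to have carried before

def find_context_length(config: dict):
    """Return max_position_embeddings from the top level, or from a nested
    text_config block (assumed layout), or None if neither has it."""
    if "max_position_embeddings" in config:
        return config["max_position_embeddings"]
    return config.get("text_config", {}).get("max_position_embeddings")

def describe_context(config: dict) -> str:
    """Classify a parsed config.json as ok, stale, or unexpected."""
    value = find_context_length(config)
    if value == EXPECTED_CONTEXT:
        return "ok: 256K"
    if value == STALE_CONTEXT:
        return "stale: 128K"
    return f"unexpected: {value}"

def describe_config_file(path: str) -> str:
    """Load a config.json from disk and classify it (path is hypothetical)."""
    return describe_context(json.loads(Path(path).read_text(encoding="utf-8")))
```

Calling `describe_config_file("config.json")` in the downloaded model folder should report `ok: 256K` if the fix described above is present.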
|
| 42 |
|
| 43 |
+
---
|
| 44 |
|
| 45 |
+
## 🤗 Run it in transformers
|
|
|
|
| 46 |
|
| 47 |
+
```python
|
| 48 |
+
from transformers import AutoModelForCausalLM, AutoTokenizer
|
| 49 |
+
import torch
|
| 50 |
|
| 51 |
+
repo = "yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1"
|
| 52 |
+
tok = AutoTokenizer.from_pretrained(repo)
|
| 53 |
+
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16, device_map="auto")
|
| 54 |
|
| 55 |
+
msgs = [{"role": "user", "content": "Write a Python function to check if a string is a valid IPv4 address."}]
|
| 56 |
+
inputs = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
|
| 57 |
+
out = model.generate(inputs, max_new_tokens=1024)
|
| 58 |
+
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
|
| 59 |
```
|
| 60 |
|
| 61 |
+
> 🧠 **Thinking mode:** it thinks in Gemma's native thought channel before answering (keep `enable_thinking=true`;
|
| 62 |
+
> the default chat template handles it). Recommended sampling: `temp 1.0, top_p 0.95, top_k 64`; for coding you can
|
| 63 |
+
> also go greedy (`temp 0`) for more deterministic solutions. Needs a **recent `transformers`** that knows the
|
| 64 |
+
> `gemma4_unified` architecture.
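
The recommended sampling settings can be passed to `generate` as keyword arguments. The helper below is a small sketch, and `sampling_kwargs` is a hypothetical name, not part of this repo. It assumes that in `transformers` greedy decoding is requested with `do_sample=False` rather than a literal temperature of 0.

```python
def sampling_kwargs(greedy: bool = False, max_new_tokens: int = 1024) -> dict:
    """Build keyword arguments for model.generate().

    greedy=False uses the recommended sampling settings (temp 1.0, top_p 0.95, top_k 64).
    greedy=True turns sampling off for more deterministic coding answers.
    """
    if greedy:
        return {"do_sample": False, "max_new_tokens": max_new_tokens}
    return {
        "do_sample": True,
        "temperature": 1.0,
        "top_p": 0.95,
        "top_k": 64,
        "max_new_tokens": max_new_tokens,
    }
```

With the `model` and `inputs` from the snippet above, this would be used as `out = model.generate(inputs, **sampling_kwargs())`, or `sampling_kwargs(greedy=True)` for the deterministic variant.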
|
| 65 |
|
| 66 |
+
---
|
| 67 |
|
| 68 |
+
## 📦 Ready-made GGUF quants
|
| 69 |
|
| 70 |
+
All from the **[GGUF repo](https://huggingface.co/yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF)**:
|
| 71 |
|
| 72 |
+
| Quant | Size | Vibe |
|
| 73 |
+
|------|------|------|
|
| 74 |
+
| 🟢 [**Q2_K**](https://huggingface.co/yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF/blob/main/gemma4-coding-Q2_K.gguf) | **4.5 GB** | tiniest — runs almost anywhere |
|
| 75 |
+
| 🟡 [**Q3_K_M**](https://huggingface.co/yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF/blob/main/gemma4-coding-Q3_K_M.gguf) | **5.7 GB** | great for 8 GB VRAM |
|
| 76 |
+
| 🔵 [**Q4_K_M**](https://huggingface.co/yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF/blob/main/gemma4-coding-Q4_K_M.gguf) | **6.87 GB** | the sweet spot 👌 (recommended) |
|
| 77 |
+
| 🟣 [**Q6_K**](https://huggingface.co/yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF/blob/main/gemma4-coding-Q6_K.gguf) | **9.11 GB** | near-lossless |
|
| 78 |
+
| ⚪ [**Q8_0**](https://huggingface.co/yuxinlu1/gemma-4-12B-coder-fable5-composer2.5-v1-GGUF/blob/main/gemma4-coding-Q8_0.gguf) | **11.8 GB** | basically full quality |
|
|
|
|
| 79 |
|
| 80 |
+
> ⚠️ GGUF needs a **recent llama.cpp**: this is the `gemma4_unified` architecture, and older builds won't load it.
|
| 81 |
|
| 82 |
+
---
|
| 83 |
|
| 84 |
+
## ⚡ Optional: free speed with MTP (lossless)
|
| 85 |
|
| 86 |
+
There's a tiny **Gemma 4 MTP draft model** in my main reasoning repo →
|
| 87 |
+
**[`MTP/` folder](https://huggingface.co/yuxinlu1/gemma-4-12B-it-Claude-4.6-4.8-Opus-GGUF/tree/main/MTP)**. It's the
|
| 88 |
+
**stock Gemma 4 drafter**, so it pairs with **any** Gemma 4 12B quant — including these coder quants — for
|
| 89 |
+
**lossless speculative decoding** (byte-for-byte identical output, just faster). Because it's trained on base Gemma 4,
|
| 90 |
+
the hit-rate on this fine-tune is a bit lower than on vanilla Gemma 4, but it's free and has no downside. Add three
|
| 91 |
+
flags (`--model-draft`, `--spec-type draft-mtp`, `--n-gpu-layers-draft`); see the
|
| 92 |
+
[main repo](https://huggingface.co/yuxinlu1/gemma-4-12B-it-Claude-4.6-4.8-Opus-GGUF) for the full command. 🏎️
|
| 93 |
|
| 94 |
+
---
|
| 95 |
|
| 96 |
+
## 📚 Training data (the interesting part 🍳)
|
| 97 |
|
| 98 |
+
A **distillation** of two complementary chain-of-thought sources over verifiable Python coding tasks
|
| 99 |
+
(algorithmic / function-level problems with deterministic tests):
|
|
|
|
| 100 |
|
| 101 |
+
- **🥇 Main — Composer 2.5 *real* CoT.** Genuine model-authored reasoning traces; each solution was **run against the
|
| 102 |
+
task's tests and only passing ones were kept**. The reasoning the model learns from leads to code that *actually works*.
|
| 103 |
+
- **🥈 Aux — Fable 5 redo.** The problems where Composer 2.5 got it **wrong**, handed to Fable 5 to *re-derive* a fresh,
|
| 104 |
+
self-consistent CoT and a correct solution — again **gated on passing the tests**. Recovers the hard cases the main
|
| 105 |
+
teacher missed. These are synthetic (rationalized) CoT and are tagged separately.
|
| 106 |
|
| 107 |
+
Real CoT for solid coverage + synthetic "second-attempt" CoT to patch the failures — **all verified by execution**
|
| 108 |
+
before training. ✅
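
The execution gate described above can be sketched roughly as follows. This is not the actual pipeline used for this model: the teacher callables, the task dictionary keys, and the `cot_source` tag are all hypothetical stand-ins. It only illustrates the idea of trying the main teacher first, falling back to the auxiliary teacher, and keeping a sample only when its code passes the task's tests.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def passes_tests(solution: str, tests: str, timeout: float = 10.0) -> bool:
    """Run solution + tests in a fresh interpreter; True only on exit code 0."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(solution + "\n\n" + tests + "\n", encoding="utf-8")
        try:
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # hanging solutions count as failures
    return result.returncode == 0

def build_dataset(tasks, main_teacher, aux_teacher):
    """Keep one verified solution per task, tagged by which teacher produced it.

    tasks: list of {"prompt": str, "tests": str} (assumed shape).
    main_teacher / aux_teacher: callables mapping a prompt to solution code.
    """
    kept = []
    for task in tasks:
        # Main teacher first ("real" CoT); aux teacher only retries the failures.
        for tag, teacher in (("real", main_teacher), ("synthetic", aux_teacher)):
            solution = teacher(task["prompt"])
            if passes_tests(solution, task["tests"]):
                kept.append(
                    {"prompt": task["prompt"], "solution": solution, "cot_source": tag}
                )
                break  # stop at the first passing solution
    return kept
```

Tasks that neither teacher solves are simply dropped, which matches the "all verified by execution" rule. Running untrusted model output this way would normally call for a sandbox, which this sketch leaves out.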
|
| 109 |
|
| 110 |
+
---
|
| 111 |
|
| 112 |
+
## ⚠️ Good to know
|
| 113 |
+
- **Reduced refusals:** task-focused training with no safety hedging, so it refuses less than the base model. It is
|
| 114 |
+
**not** safety-aligned — add your own guardrails for production. Use responsibly. 🙏
|
| 115 |
+
- Specialized for **Python / algorithmic** coding; general-knowledge facts/numbers should still be double-checked.
|
| 116 |
+
- English-centric.
|
| 117 |
|
| 118 |
---
|
| 119 |
|
| 120 |
+
## 📚 Base & License
|
| 121 |
+
- **License: Apache 2.0.** Gemma 4 is released by Google under
|
| 122 |
+
**[Apache 2.0](https://ai.google.dev/gemma/apache_2)** (unlike the older Gemma 1/2/3 terms), so this fine-tune is
|
| 123 |
+
**Apache 2.0** too — free to use, modify, and redistribute. 🎉
|
| 124 |
+
- **Base model:** [`google/gemma-4-12B-it`](https://huggingface.co/google/gemma-4-12B-it).
|
| 125 |
+
- Personal/hobby project — shared as-is, no warranty. Have fun, and happy hacking! 🐾✨
|