---
base_model: unsloth/gemma-4-31B-it
tags:
- text-generation-inference
- transformers
- unsloth
- gemma4
- trl
- lora
license: apache-2.0
language:
- en
datasets:
- PawanKrd/claude-fable-5-code
- victor/fable-5-boeing-747-trace
- armand0e/claude-fable-5-claude-code
---

# Gemma 4 31B it - Claude Fable 5 Distilled

This model was trained on Claude Code traces, plus some chat data provided by the community. I recommend using it with Claude Code or pi, though other harnesses should also work without issues.

The data for this model was easily extracted, formatted, and masked for training with [Teich](https://github.com/TeichAI/teich).

## 📋 Stage Details & Benchmarks

*Reasoning was left untouched.*

*Benchmarks coming soon.*

> **Deep Dive Analysis:** For more comprehensive insights into the base capabilities of the Gemma 4 architecture, please refer to [this Analysis Document](https://huggingface.co/TeichAI/gemma-4-31B-it-Claude-Opus-Distill/resolve/main/Gemma%204%20Analysis.pdf).

## 🌟 Core Skills & Capabilities

Thanks to its robust base model and high-effort reasoning distillation, this model is highly optimized for the following use cases:

1. **💻 Coding:** Advanced code generation, debugging, and software architecture planning.
2. **🔬 Science:** Deep scientific reasoning, hypothesis evaluation, and analytical problem-solving.
3. **🔎 Deep Research:** Navigating complex, multi-step research queries and synthesizing large amounts of information.
4. **🧠 General Purpose:** Capable instruction-following for everyday tasks that require high logical coherence.

## Getting Started

You can use all Gemma 4 models with the latest version of Transformers.
To get started, install the necessary dependencies in your environment: `pip install -U transformers torch accelerate`

Once everything is installed, you can load the model with the code below:

```python
from transformers import AutoProcessor, AutoModelForCausalLM

# Repo ID of this fine-tune (matches the URL in the citation below)
MODEL_ID = "TeichAI/Gemma-4-31B-Fable-5-Agent-Distill"

# Load the processor (tokenizer + chat template) and the model
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    dtype="auto",       # use the checkpoint's native precision
    device_map="auto",  # place layers across the available devices
)
```

Once the model is loaded, you can start generating output:

```python
# Prompt
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a short joke about saving RAM."},
]

# Render the chat template into a prompt string and tokenize it
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
inputs = processor(text=text, return_tensors="pt").to(model.device)
input_len = inputs["input_ids"].shape[-1]

# Generate output
outputs = model.generate(**inputs, max_new_tokens=1024)

# Decode only the new tokens; keep special tokens so the response can be parsed
response = processor.decode(outputs[0][input_len:], skip_special_tokens=False)

# Parse output
result = processor.parse_response(response)
print(result)
```

To enable reasoning, set `enable_thinking=True`; the `parse_response` function will then take care of parsing the thinking output.

## Best Practices

For the best performance, use these configurations and best practices:

### 1. Sampling Parameters

Use the following standardized sampling configuration across all use cases:

* `temperature=1.0`
* `top_p=0.95`
* `top_k=64`

### 2. Thinking Mode Configuration

Compared to Gemma 3, the models use standard `system`, `assistant`, and `user` roles. To properly manage the thinking process, use the following control tokens:

* **Trigger Thinking:** Thinking is enabled by including the `<|think|>` token at the start of the system prompt. To disable thinking, remove the token.
* **Standard Generation:** When thinking is enabled, the model outputs its internal reasoning followed by the final answer, using this structure: `<|channel>thought\n`**[Internal reasoning]**``
* **Disabled Thinking Behavior:** For all models except the E2B and E4B variants, if thinking is disabled, the model still generates the tags, but with an empty thought block: `<|channel>thought\n`**[Final answer]**

> [!Note]
> Many libraries, such as Transformers and llama.cpp, handle the complexities of the chat template for you.

## 🙏 Acknowledgements

- **Google**: For providing an exceptional open-weights model. Read more about Gemma 4 on the [Google Innovation Blog](https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/).
- **Unsloth**: For assembling ready-to-use, cutting-edge fine-tuning environments that make this work possible.
- **PawanKrd**, **victor**, and **armand0e**: For creating and sharing their awesome Fable datasets with the community.

## 📖 Citation

If you use this model in your research or projects, please cite:

```bibtex
@misc{teichai_gemma4_31b_fable_5_agent_distilled,
  title        = {TeichAI/Gemma-4-31B-Fable-5-Agent-Distill},
  author       = {TeichAI},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/TeichAI/Gemma-4-31B-Fable-5-Agent-Distill}}
}
```