Instructions to use deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF with libraries, inference providers, notebooks, and local apps. The sections below show how to get started.

Libraries

How to use deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF with llama-cpp-python:

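# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Downloads the GGUF file from the Hub on first use (requires huggingface-hub)
llm = Llama.from_pretrained(
	repo_id="deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF",
	filename="gemma-4-26B-A4B-it-cerebellum-v6.1-templatefix.gguf",
	n_ctx=8192,       # context window; the library default is only 512 tokens
	n_gpu_layers=-1,  # offload all layers to the GPU if llama-cpp-python was built with GPU support
)

response = llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
# The reply follows the OpenAI chat-completion layout
print(response["choices"][0]["message"]["content"])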
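To print the reply as it is generated instead of waiting for the whole completion, `create_chat_completion` also accepts `stream=True`, in which case it returns an iterator of OpenAI-style chunks. The sketch below assumes an `llm` object like the one created above; `iter_stream_text` and `stream_answer` are hypothetical helper names, not part of llama-cpp-python.

```python
from typing import Any, Iterable, Iterator


def iter_stream_text(chunks: Iterable[dict]) -> Iterator[str]:
    """Yield the text pieces from OpenAI-style streaming chat chunks.

    The first chunk usually carries only {"role": "assistant"} and the last
    one an empty delta, so chunks without "content" are skipped.
    """
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        text = delta.get("content")
        if text:
            yield text


def stream_answer(llm: Any, question: str) -> str:
    """Print the reply as it is generated and return the full text.

    `llm` is expected to be a llama_cpp.Llama instance like the one above.
    """
    stream = llm.create_chat_completion(
        messages=[{"role": "user", "content": question}],
        stream=True,  # returns an iterator of chunks instead of a single dict
    )
    pieces = []
    for piece in iter_stream_text(stream):
        print(piece, end="", flush=True)
        pieces.append(piece)
    print()
    return "".join(pieces)
```

For example, `stream_answer(llm, "What is the capital of France?")` prints the answer piece by piece and returns the joined text.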

llama.cpp

How to use deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF:F16
# Run inference directly in the terminal:
llama-cli -hf deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF:F16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF:F16
# Run inference directly in the terminal:
llama-cli -hf deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF:F16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF:F16
# Run inference directly in the terminal:
./llama-cli -hf deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF:F16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF:F16
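Once `llama-server` is running, any HTTP client can talk to its OpenAI-compatible API. The following is a minimal standard-library sketch, assuming the server's default address `http://127.0.0.1:8080` and no API key; `build_chat_request` and `chat` are illustrative names. Because llama-server answers with the model it was started with, the sketch omits the `model` field; add one if your setup requires it.

```python
import json
import urllib.request

# Assumption: llama-server's default address, no API key configured.
BASE_URL = "http://127.0.0.1:8080/v1"


def build_chat_request(base_url: str, prompt: str, max_tokens: int = 1024) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /chat/completions route."""
    payload = {
        # No "model" field: llama-server serves the model it was launched with.
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, prompt: str, timeout: float = 300.0) -> str:
    """Send one user message and return the assistant's reply text."""
    request = build_chat_request(base_url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(chat(BASE_URL, "What is the capital of France?"))` sends a single turn and prints the reply.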

Use Docker

docker model run hf.co/deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF:F16

LM Studio
Jan

vLLM

How to use deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
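The same endpoint can also stream the answer when the request body includes `"stream": true`; the server then replies with server-sent events, one `data: {...}` line per chunk, ending with `data: [DONE]`. A minimal standard-library sketch of consuming that stream, assuming vLLM's default port 8000 as in the curl call above; `iter_sse_content` and `stream_chat` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text deltas from OpenAI-style server-sent events.

    Each event arrives as a line like b'data: {...}'; the stream ends with
    b'data: [DONE]'. Blank separator lines are skipped.
    """
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        # A final usage-only chunk may carry an empty choices list.
        for choice in chunk.get("choices", [])[:1]:
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


def stream_chat(base_url: str, model: str, prompt: str, timeout: float = 300.0) -> Iterator[str]:
    """POST a streaming chat request and yield the reply piece by piece."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    request = urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Iterating the HTTP response yields it line by line.
        yield from iter_sse_content(response)
```

Usage: `for piece in stream_chat("http://localhost:8000/v1", "deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF", "What is the capital of France?"): print(piece, end="", flush=True)`.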

Ollama
How to use deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF with Ollama:
```
ollama run hf.co/deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF:F16
```
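Ollama also exposes a local REST API. The sketch below posts to its `/api/chat` endpoint with streaming turned off, assuming the default address `http://localhost:11434` and that the model is registered under the same `hf.co/...` name used with `ollama run` (check `ollama list` if it differs); the helper names are illustrative.

```python
import json
import urllib.request

# Assumptions: Ollama's default address and the model name used by `ollama run` above.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"
MODEL = "hf.co/deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF:F16"


def build_payload(prompt: str, model: str = MODEL) -> dict:
    """Request body for a single, non-streamed chat turn."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a line-per-chunk stream
    }


def extract_reply(body: dict) -> str:
    """Pull the assistant text out of a non-streamed /api/chat response."""
    return body["message"]["content"]


def ask_ollama(prompt: str, url: str = OLLAMA_CHAT_URL, timeout: float = 300.0) -> str:
    """Send one user message to Ollama and return the reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))
```

With Ollama running, `print(ask_ollama("What is the capital of France?"))` returns the full answer in one piece.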

Unsloth Studio

How to use deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF to start chatting

Pi

How to use deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF:F16

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF:F16"
        }
      ]
    }
  }
}
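If `~/.pi/agent/models.json` already lists other providers, pasting the snippet in by hand risks overwriting them. The sketch below merges the `llama-cpp` entry into whatever is already there instead; it assumes the file uses exactly the layout shown above, and `add_llama_cpp_provider` is a hypothetical helper, not part of Pi.

```python
import json
from pathlib import Path
from typing import Optional

# Assumption: models.json has the layout shown above; unrelated keys are preserved.
MODEL_ID = "deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF:F16"
PROVIDER_SETTINGS = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
}


def add_llama_cpp_provider(path: Optional[Path] = None) -> dict:
    """Merge the llama-cpp provider into models.json, keeping other providers."""
    path = path or Path.home() / ".pi" / "agent" / "models.json"
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    providers = config.setdefault("providers", {})
    provider = providers.setdefault("llama-cpp", {})
    provider.update(PROVIDER_SETTINGS)
    models = provider.setdefault("models", [])
    # Add the model only once, so running the script twice is harmless.
    if not any(m.get("id") == MODEL_ID for m in models):
        models.append({"id": MODEL_ID})
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling `add_llama_cpp_provider()` with no argument writes to the default Pi location; passing a path is handy for trying it on a copy first.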

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF:F16

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF:F16
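Hermes can only answer once llama-server has finished loading the model. llama-server exposes a `GET /health` endpoint that returns HTTP 200 when it is ready (and 503 while the model is still loading), so a small readiness check can run before `hermes` is started. A standard-library sketch, assuming the base URL configured above; the helper names are hypothetical.

```python
import time
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def health_url(base_url: str) -> str:
    """Map an OpenAI-style base URL like http://host:8080/v1 to http://host:8080/health."""
    parts = urlsplit(base_url)
    return urlunsplit((parts.scheme, parts.netloc, "/health", "", ""))


def server_is_ready(base_url: str, timeout: float = 2.0) -> bool:
    """True if llama-server answers GET /health with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError (e.g. 503 while loading) and socket timeouts are all OSErrors.
        return False


def wait_until_ready(base_url: str, total_timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll /health until it succeeds or total_timeout seconds have passed."""
    deadline = time.monotonic() + total_timeout
    while time.monotonic() < deadline:
        if server_is_ready(base_url):
            return True
        time.sleep(interval)
    return False
```

For example, `wait_until_ready("http://127.0.0.1:8080/v1")` returns `True` once the server is up, or `False` after two minutes without a healthy response.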

Run Hermes

hermes

Docker Model Runner
How to use deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF with Docker Model Runner:
```
docker model run hf.co/deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF:F16
```

Lemonade

How to use deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull deucebucket/Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF:F16

Run and chat with the model

lemonade run user.Gemma-4-26B-A4B-it-Cerebellum-v6-GGUF-F16

List all available models

lemonade list

What about uncensored/abliterated version?

by tima2431 - opened 29 days ago

tima2431

29 days ago

Hi! Just wanted to say this Cerebellum v6 is a very cool model; it works so well. I was wondering if you plan to do an abliterated/uncensored version or something? I really liked how smart it is, I just want it without all the censorship and refusals. Keep up the work!

deucebucket

Owner 28 days ago

Thank you! I'm glad you like it! I'm actually working on this. I'm just trying not to follow the known methods, at least not without stumbling into them, so it may take me longer, but I want to avoid the "sanity" drop-off of abliterated/uncensored models if possible.

Koitenshin

23 days ago

If you're looking for a base model with little drop-off, I'd recommend taking a look at coder3101's stuff.
https://huggingface.co/coder3101/gemma-4-26B-A4B-it-heretic

deucebucket

Owner 19 days ago

Uploaded a separate Heretic/Cerebellum GGUF repo here:

https://huggingface.co/deucebucket/Gemma-4-26B-A4B-it-Heretic-Cerebellum-GGUF

This uses coder3101/gemma-4-26B-A4B-it-heretic as the source checkpoint and applies the Gemma 4 26B Cerebellum tensor recipe. I kept it separate from the regular Cerebellum repo and included the mmproj file plus current benchmark JSONs in the repo.

Current local results are listed on the model card: ARC-Challenge 95.48%, HellaSwag 83.49%, MMLU Redux 71.42%, vision smoke 6/6, and the project refusal harness measured 1/45 refused.

tima2431

17 days ago

I just started testing it and so far it's just amazing; for my tasks and dialogues it works just fine! You have very cool models! <3

deucebucket

Owner 17 days ago

Yeah, I've been using it since; thanks for the suggestion! It's now my new daily driver. Gemma 4 certainly has a lot of personality and knowledge packed in.

tima2431

16 days ago

Yeah, it's my daily driver now too! Honestly, the quality is insane; for my tasks it feels almost on par with Gemini 2.5 Pro, but completely uncensored, which is exactly what I needed. Hitting 35-45 t/s on an RTX 5060 is pure gold. Keep it going!

Also, one quick question, since I couldn't find any info on this anywhere: when running this (and other models based on Gemma 4) through standard llama.cpp at long context lengths (25k+ tokens), the model sometimes stops using its reasoning/chain-of-thought phase entirely. It just prints 'enough;' or a similar word and skips straight to the answer, even if I force reasoning parameters at launch. Have you noticed this context degradation too, or do you happen to know a fix or sampler tweak for it?

deucebucket

Owner 16 days ago

Yeah, I've seen this, and it was kind of funny. I was using it in OpenCode and it just stopped working. I asked if it was complete; it worked for another 3 minutes, then said "no" and stopped again. The long think followed by that abrupt answer made me laugh. Currently I'm kicking around Qwen 3.5 9B to find out on a small scale whether that's something I can actually improve. This might also have something to do with the chat template, which I'm still working on updating. There are also branch versions of llama.cpp that seemingly fix the thinking loops, but as far as I've seen that fix hasn't made it to main yet.

Koitenshin

15 days ago

You're welcome for the suggestion, and thanks for putting it through your crunching process. 😄

I like Gemma 4 26B-A4B, but without your GGUFs I have to close every single open process on my PC.

I've been testing this one extensively, and so far it feels far more capable than your other v6. I haven't seen a single wrong token... yet.

deucebucket

Owner 15 days ago

I did also notice Heretic performed better on all the tests I put it through, so that's definitely worth noting. No clue what in the breakdown also improved accuracy.
