Instructions to use prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF with Transformers:

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF", dtype="auto")

llama-cpp-python

How to use prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF",
	filename="gemma-4-26B-A4B-Heretic-Stable.BF16.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF:Q4_K_M

Use Docker

docker model run hf.co/prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF:Q4_K_M

LM Studio
Jan
Ollama
How to use prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF with Ollama:
```
ollama run hf.co/prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF:Q4_K_M
```

Unsloth Studio

How to use prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF to start chatting

How to use prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF with Docker Model Runner:
```
docker model run hf.co/prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF:Q4_K_M
```

Lemonade

How to use prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull prithivMLmods/gemma-4-26B-A4B-Heretic-Stable-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.gemma-4-26B-A4B-Heretic-Stable-GGUF-Q4_K_M

List all available models

lemonade list

gemma-4-26B-A4B-Heretic-Stable-GGUF

gemma-4-26B-A4B-Heretic-Stable is an abliterated evolution built on top of google/gemma-4-26B-A4B-it. This model applies advanced refusal direction analysis and abliteration-based training strategies to significantly reduce internal refusal behaviors while preserving the reasoning and instruction-following strengths of the original architecture. The result is a powerful 26B parameter language model optimized for detailed responses and improved instruction adherence.

This model is materialized for research and learning purposes only. The model has reduced internal refusal behaviors, and any content generated by it is used at the user’s own risk. The authors and hosting page disclaim any liability for content generated by this model. Users are responsible for ensuring that the model is used in a safe, ethical, and lawful manner.

Evaluation [Self Reported]

Metric	Result
Refusal Rate (harm_bench)	0 / 500
Test Setup	500 random harmful prompts
Inference Pipeline	Transformers
Inference Type	text-generation
Dataset	harm_bench

Note: This model was tested on 500 randomly sampled harmful prompts based on the harm_bench dataset. The result shows 0 refusals out of 500. For more details, refer to the dataset page linked above.

Model Files

File Name	Quant Type	File Size	File Link
gemma-4-26B-A4B-Heretic-Stable.BF16.gguf	BF16	50.5 GB	Download
gemma-4-26B-A4B-Heretic-Stable.F16.gguf	F16	50.5 GB	Download
gemma-4-26B-A4B-Heretic-Stable.Q2_K.gguf	Q2_K	10.6 GB	Download
gemma-4-26B-A4B-Heretic-Stable.Q3_K_L.gguf	Q3_K_L	13.8 GB	Download
gemma-4-26B-A4B-Heretic-Stable.Q3_K_M.gguf	Q3_K_M	13.3 GB	Download
gemma-4-26B-A4B-Heretic-Stable.Q3_K_S.gguf	Q3_K_S	12.2 GB	Download
gemma-4-26B-A4B-Heretic-Stable.Q4_0.gguf	Q4_0	14.4 GB	Download
gemma-4-26B-A4B-Heretic-Stable.Q4_K_M.gguf	Q4_K_M	16.8 GB	Download
gemma-4-26B-A4B-Heretic-Stable.Q4_K_S.gguf	Q4_K_S	15.5 GB	Download
gemma-4-26B-A4B-Heretic-Stable.Q5_0.gguf	Q5_0	17.5 GB	Download
gemma-4-26B-A4B-Heretic-Stable.Q5_K_M.gguf	Q5_K_M	19.1 GB	Download
gemma-4-26B-A4B-Heretic-Stable.Q5_K_S.gguf	Q5_K_S	18 GB	Download
gemma-4-26B-A4B-Heretic-Stable.Q6_K.gguf	Q6_K	22.6 GB	Download
gemma-4-26B-A4B-Heretic-Stable.Q8_0.gguf	Q8_0	26.9 GB	Download
gemma-4-26B-A4B-Heretic-Stable.mmproj-bf16.gguf	mmproj-bf16	1.19 GB	Download
gemma-4-26B-A4B-Heretic-Stable.mmproj-f16.gguf	mmproj-f16	1.19 GB	Download
gemma-4-26B-A4B-Heretic-Stable.mmproj-q8_0.gguf	mmproj-q8_0	806 MB	Download