How to use from
Hermes Agent
Start the MLX server
# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "TaterTotterson/Gemma-4-26B-A4B-IT-UD-Q4_K_XL-mlx-Tater-NoThink"
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default TaterTotterson/Gemma-4-26B-A4B-IT-UD-Q4_K_XL-mlx-Tater-NoThink
Run Hermes
hermes
Quick Links

Gemma-4-26B-A4B-IT UD-Q4_K_XL MLX - Tater NoThink

This is a Tater NoThink repack of Brooooooklyn/Gemma-4-26B-A4B-IT-UD-Q4_K_XL-mlx.

What changed

  • The model weights are unchanged.
  • chat_template.jinja now forces enable_thinking = false.
  • The model card is tagged for Tater Picks auto-discovery.

Why

Tater works best with models that answer directly and do not emit hidden reasoning or thinking blocks. This repo keeps the same MLX quantized model but makes the default chat template run in NoThink mode for runtimes that use the embedded template.

Source

License

This model inherits the Gemma Terms of Use from the base model.

Downloads last month
-
Safetensors
Model size
5B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support