How to use from the llama-cpp-python library
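# !pip install llama-cpp-python

from llama_cpp import Llama

# Download a GGUF file from the Hub and load it.
llm = Llama.from_pretrained(
    repo_id="gkraker04/Ornstein-Hermes-3.6-27B-SABER",
    filename="",  # set this to the name (or glob pattern) of the GGUF file to load
)

# Run one chat turn and print the assistant's reply.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?",
        }
    ]
)
print(response["choices"][0]["message"]["content"])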

Ornstein-Hermes-3.6-27b-SABER_Adjacent-Quants

Adjacent Quantizations

These are the adjacent GGUF quantizations of GestaltLabs/Ornstein-Hermes-3.6-27b-SABER, a SABER-edited version of GestaltLabs/Ornstein-Hermes-3.6-27b.

@DJLougen has since uploaded the smaller GGUF quantizations, but before he did I saw:

The included imatrix file was generated from DJLougen/Acta-Synthetic. It is included for reproducibility and for users who want to regenerate adjacent quantizations.

That sent me down a shallow rabbit hole to figure out just how to do that. These are the fruit of that labor.

These were all converted from the full SABER Hugging Face checkpoint. I only intend to fill the gaps @DJLougen left, but I may come back and redo all the quants from the full checkpoint.

Enjoy!

Ornstein-Hermes-3.6-27B SABER GGUF

GGUF quantizations of GestaltLabs/Ornstein-Hermes-3.6-27b-SABER, a SABER-edited version of GestaltLabs/Ornstein-Hermes-3.6-27b.

SABER is a controlled refusal-shaping workflow. The release target is to reduce broad over-refusal while preserving ordinary model behavior and visible boundaries for severe criminal, coercive, or interpersonal-harm requests. The selected checkpoint was chosen as a Pareto point over refusal rate and behavioral drift.
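As a rough illustration of what choosing a Pareto point over refusal rate and drift can mean, the sketch below filters a set of candidate checkpoints down to the non-dominated ones. The candidate labels and numbers are invented for illustration and are not SABER results, and the actual selection rule used for this release is not described here.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str            # hypothetical checkpoint label
    refusal_rate: float  # lower is better
    kld_drift: float     # lower is better


def pareto_front(cands):
    """Keep candidates that no other candidate beats on both metrics."""
    front = []
    for c in cands:
        dominated = any(
            o.refusal_rate <= c.refusal_rate
            and o.kld_drift <= c.kld_drift
            and (o.refusal_rate < c.refusal_rate or o.kld_drift < c.kld_drift)
            for o in cands
        )
        if not dominated:
            front.append(c)
    return front


# Toy data only; none of these values come from the SABER runs.
candidates = [
    Candidate("a", 0.20, 0.001),
    Candidate("b", 0.05, 0.010),
    Candidate("c", 0.05, 0.050),  # dominated by "b": same refusal rate, more drift
    Candidate("d", 0.01, 0.200),
]
front_names = [c.name for c in pareto_front(candidates)]
print(front_names)
```

Every checkpoint left on the front trades one metric against the other, so picking a single release checkpoint still needs a judgment about how much drift is acceptable.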

Source Checkpoint

| Field | Value |
| --- | --- |
| Source repo | GestaltLabs/Ornstein-Hermes-3.6-27b-SABER |
| Base model | GestaltLabs/Ornstein-Hermes-3.6-27b |
| SABER run | ornstein_hermes36_27b_svd_a850_g25_retry_biggpu |
| Expanded refusal eval | 1 / 349 refusals |
| Refusal rate | 0.29% |
| KLD mean | 11.2216 |
| Base-vs-base KLD mean | 11.2206 |
| KLD delta over base-vs-base | +0.0010 |
| KLD prompts | 149 |
| Tokens scored for KLD | 3,347 |

The one retained refusal in the expanded evaluation was an illegal-drug-sales request. This is an observed result on the current evaluation set, not a universal guarantee about future behavior.
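The derived figures in the table can be rechecked from the raw counts. This short sketch uses only the numbers reported above; the variable names are mine.

```python
# Values copied from the Source Checkpoint table.
refusals, eval_prompts = 1, 349
kld_mean = 11.2216
base_vs_base_kld_mean = 11.2206
kld_prompts, kld_tokens = 149, 3347

refusal_rate_pct = 100 * refusals / eval_prompts
kld_delta = kld_mean - base_vs_base_kld_mean
tokens_per_prompt = kld_tokens / kld_prompts

print(f"Refusal rate: {refusal_rate_pct:.2f}%")           # Refusal rate: 0.29%
print(f"KLD delta over base-vs-base: {kld_delta:+.4f}")   # +0.0010
print(f"Tokens scored per KLD prompt: {tokens_per_prompt:.1f}")
```

The delta is about four orders of magnitude smaller than either mean, so the comparison against the base-vs-base control is the meaningful figure.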

Quantization Files

| File | Quant | Size | Notes |
| --- | --- | --- | --- |
| Ornstein-Hermes-3.6-27b-SABER-IQ4_XS.gguf | IQ4_XS | 15G | Compact imatrix-assisted 4-bit option. |
| Ornstein-Hermes-3.6-27b-SABER-IQ2_M.gguf | IQ2_M | 9G | Smallest emergency 2-bit option; expect the most quality loss. |
| Ornstein-Hermes-3.6-27b-SABER-Q3_K_M.gguf | Q3_K_M | 13G | Small 3-bit option; expect more quality loss than the 4-bit files. |
| Ornstein-Hermes-3.6-27b-SABER-Q4_K_M.gguf | Q4_K_M | 16G | General-purpose recommended starting point. |
| Ornstein-Hermes-3.6-27b-SABER-Q5_K_M.gguf | Q5_K_M | 18G | Balanced high-quality option. |
| Ornstein-Hermes-3.6-27b-SABER-Q6_K.gguf | Q6_K | 21G | Strong quality/size option for high-memory local inference. |
| Ornstein-Hermes-3.6-27b-SABER-Q8_0.gguf | Q8_0 | 27G | Highest quality quant in this suite; largest runtime file. |

The included imatrix file was generated from DJLougen/Acta-Synthetic. It is included for reproducibility and for users who want to regenerate adjacent quantizations.

Recommended File

Start with Q4_K_M for normal desktop use. Use Q5_K_M or Q6_K if you have enough VRAM/RAM and want a higher-quality local run. Use Q3_K_M when file size matters more. Q8_0 is mainly for high-memory systems or as a near-lossless GGUF reference.
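If you would rather choose by memory budget, a small helper along these lines may be useful. It is only a sketch: the sizes are the file sizes listed in the table, the function name is hypothetical, and the 2 GB headroom for KV cache and runtime buffers is an assumption that depends on context length and backend.

```python
# File sizes in GB, as listed in the Quantization Files table.
QUANT_SIZES_GB = {
    "IQ2_M": 9,
    "Q3_K_M": 13,
    "IQ4_XS": 15,
    "Q4_K_M": 16,
    "Q5_K_M": 18,
    "Q6_K": 21,
    "Q8_0": 27,
}


def largest_quant_that_fits(memory_gb, headroom_gb=2.0):
    """Return the largest listed quant whose file fits in memory_gb after
    reserving headroom_gb (an assumed allowance, not a measured one),
    or None if no listed file fits."""
    budget = memory_gb - headroom_gb
    fitting = [(size, name) for name, size in QUANT_SIZES_GB.items() if size <= budget]
    return max(fitting)[1] if fitting else None


print(largest_quant_that_fits(24))  # Q6_K
print(largest_quant_that_fits(16))  # Q3_K_M
print(largest_quant_that_fits(8))   # None
```

Treat the result as an upper bound on file size and check real memory use on your own hardware.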

llama.cpp Compatibility

These files were produced with llama.cpp commit from a BF16 GGUF conversion of the SABER checkpoint. The model uses the qwen35 GGUF architecture path in current llama.cpp.

Example:

For chat-style use, prefer a frontend or wrapper that applies the tokenizer chat template from the GGUF metadata.
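One way to follow this advice from Python is to let llama-cpp-python handle the prompt formatting. The sketch below assumes a locally downloaded GGUF file. The path, the helper names, `n_ctx=4096`, and `max_tokens=256` are illustrative choices, not settings from this release.

```python
def build_messages(user_prompt, system_prompt=None):
    """Build an OpenAI-style message list for create_chat_completion."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


def chat_once(model_path, user_prompt, system_prompt=None, n_ctx=4096):
    """Load a local GGUF file and return the reply to a single chat turn."""
    # Imported lazily so build_messages works without llama-cpp-python installed.
    from llama_cpp import Llama

    # With no chat_format argument, llama-cpp-python is expected to use the
    # chat template stored in the GGUF metadata when one is present.
    llm = Llama(model_path=model_path, n_ctx=n_ctx, verbose=False)
    result = llm.create_chat_completion(
        messages=build_messages(user_prompt, system_prompt),
        max_tokens=256,
    )
    return result["choices"][0]["message"]["content"]


# Usage (hypothetical local path; point it at the GGUF file you downloaded):
# print(chat_once("./Ornstein-Hermes-3.6-27b-SABER-Q4_K_M.gguf",
#                 "What is the capital of France?"))
```

Passing structured messages instead of a hand-built prompt string keeps role markers and special tokens consistent with what the model saw in training.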

Conversion and Quantization Notes

The Q8_0 GGUF was converted from the full SABER Hugging Face checkpoint. The lower-bit recovery quants were generated from the published Q8_0 GGUF with --allow-requantize and the included Acta-Synthetic imatrix so the missing files could be restored quickly. Importance-matrix calibration used Acta-Synthetic conversational text.
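For readers who want to regenerate a quant the same way, the sketch below assembles a `llama-quantize` command line without running it. The `--allow-requantize` and `--imatrix` flags are the ones named above. The imatrix file name is a placeholder, since the actual name is not given here, and the helper function is hypothetical.

```python
import shlex


def build_requantize_cmd(src_gguf, dst_gguf, quant_type, imatrix_path,
                         binary="llama-quantize"):
    """Assemble a llama-quantize command line as an argument list."""
    return [
        binary,
        "--allow-requantize",       # the input is already quantized (Q8_0)
        "--imatrix", imatrix_path,  # importance matrix for the low-bit types
        src_gguf,
        dst_gguf,
        quant_type,
    ]


cmd = build_requantize_cmd(
    "Ornstein-Hermes-3.6-27b-SABER-Q8_0.gguf",
    "Ornstein-Hermes-3.6-27b-SABER-IQ4_XS.gguf",
    "IQ4_XS",
    "imatrix.dat",  # placeholder name, not the real file name
)
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files and the binary is on `PATH`.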

Method Summary

SABER edits refusal behavior through activation/weight-space refusal directions. For this checkpoint, the run used SVD extraction, multi-layer candidate selection, iterative ablation, and KLD-based drift measurement.
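The core operation behind refusal-direction editing can be sketched as projecting a direction out of a vector. The example below is not the SABER implementation. It swaps the SVD extraction named above for a simpler difference-of-means direction, works on toy 3-dimensional vectors, and uses invented function names and a hypothetical `strength` parameter.

```python
import math


def mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def refusal_direction(refused_acts, complied_acts):
    """Difference-of-means direction, normalized to unit length."""
    diff = [a - b for a, b in zip(mean(refused_acts), mean(complied_acts))]
    norm = math.sqrt(dot(diff, diff))
    return [x / norm for x in diff]


def ablate(h, direction, strength=1.0):
    """Remove `strength` times the component of h along `direction`."""
    coeff = dot(h, direction)
    return [x - strength * coeff * d for x, d in zip(h, direction)]


# Toy activations; real ones would come from a chosen model layer.
refused = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
complied = [[0.0, 1.0, 0.0], [0.0, 1.0, 2.0]]
r = refusal_direction(refused, complied)

h = [3.0, 1.0, 0.5]
h_ablated = ablate(h, r)           # full removal
h_partial = ablate(h, r, 0.5)      # partial removal
print(dot(h, r), dot(h_partial, r), dot(h_ablated, r))
```

With `strength=1.0` the ablated vector has no remaining component along the direction, while smaller values leave a proportional share of it.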

Run configuration:

Selected layers:

Total directions ablated: .

Attribution and Related Work

This release builds on the refusal-direction and abliteration research lineage. Relevant prior work and inspirations include:

SABER's contribution in this release is the controlled-refusal-shaping workflow: multi-candidate refusal extraction, separability/entanglement-aware ranking, differential ablation strength, and explicit Pareto selection over refusal behavior and KLD drift.

Limitations

  • Results are specific to the current evaluation set, prompts, and generation settings.
  • The KLD value should be interpreted relative to the base-vs-base control, not as an absolute standalone score.
  • Quantization changes numerical behavior; validate the specific GGUF file you deploy.
  • The model inherits constraints, limitations, and licensing considerations from the base model.
  • This is a model-editing research artifact with dual-use implications.
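
To make the KLD bullet above concrete, here is a minimal sketch of a mean per-token KL divergence compared against a control run. The logits are toy values and the use of natural-log units is an assumption. The evaluation harness behind the reported numbers is not described here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token position."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p_logits, q_logits):
    """KL(P || Q) in nats for one token position."""
    p, q = softmax(p_logits), softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_kld(ref_positions, test_positions):
    """Average KL divergence over aligned token positions."""
    klds = [kl_divergence(p, q) for p, q in zip(ref_positions, test_positions)]
    return sum(klds) / len(klds)


# Toy logits over a 3-token vocabulary at two positions.
reference = [[2.0, 0.0, -1.0], [0.5, 0.5, 0.0]]
control = [[2.0, 0.05, -1.0], [0.5, 0.45, 0.0]]  # stands in for a second base run
edited = [[1.8, 0.1, -1.0], [0.4, 0.6, 0.0]]     # stands in for the edited model

delta = mean_kld(reference, edited) - mean_kld(reference, control)
print(f"KLD delta over control: {delta:+.6f}")
```

Reporting the delta instead of the raw mean removes whatever divergence the measurement setup produces even when nothing has been edited.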

Model size: 27B params
Architecture: qwen35