Instructions to use llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF", dtype="auto")

llama-cpp-python

How to use llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF",
	filename="Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-BF16.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": "What is the capital of France?"
		}
	]
)
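
`create_chat_completion` returns an OpenAI-style response dictionary rather than a plain string. Below is a minimal sketch of pulling the reply text out of it, assuming the usual `choices[0].message.content` layout. `extract_reply` is a hypothetical helper name, and the sample dictionary is hand-written for illustration, not real model output.

```python
from typing import Any


def extract_reply(response: dict[str, Any]) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    content = choices[0].get("message", {}).get("content")
    return (content or "").strip()


# Hand-written stand-in for the dict that llm.create_chat_completion(...) returns.
sample_response = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": " Paris. "},
            "finish_reason": "stop",
        }
    ]
}

reply = extract_reply(sample_response)
print(reply)
```

With a loaded model, the same helper would be called as `extract_reply(llm.create_chat_completion(messages=[...]))`.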

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF:Q4_K_M

Use Docker

docker model run hf.co/llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF:Q4_K_M

LM Studio
Jan

vLLM

How to use llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
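
The same request can be issued from Python with only the standard library. This is a sketch that mirrors the curl call above. The helper names `build_chat_request` and `ask` are hypothetical, and `ask` only works while the server is running on port 8000.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # same host and port as the curl call
MODEL_ID = "llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {"model": MODEL_ID, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, timeout: float = 120.0) -> str:
    """Send the request and return the reply text (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


request = build_chat_request("What is the capital of France?")
print(request.get_method(), request.full_url)
```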

Use Docker

docker model run hf.co/llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF:Q4_K_M

SGLang

How to use llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF with Ollama:
```
ollama run hf.co/llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF:Q4_K_M
```

Unsloth Studio

How to use llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF to start chatting

How to use llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF:Q4_K_M

Run Hermes

hermes

Atomic Chat
Docker Model Runner
How to use llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF with Docker Model Runner:
```
docker model run hf.co/llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF:Q4_K_M
```

Lemonade

How to use llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-GGUF-Q4_K_M

List all available models

lemonade list

llmfan46 committed 2 days ago

Commit

4d1a6db

verified ·

1 Parent(s): 934223c

Upload folder using huggingface_hub

Files changed (10)

.gitattributes +8 -0
Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-BF16.gguf +3 -0
Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q4_K_M.gguf +3 -0
Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q4_K_S.gguf +3 -0
Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q5_K_M.gguf +3 -0
Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q5_K_S.gguf +3 -0
Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q6_K.gguf +3 -0
Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q8_0.gguf +3 -0
Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-mmproj-BF16.gguf +3 -0
README.md +570 -0

.gitattributes CHANGED Viewed

@@ -33,3 +33,11 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-BF16.gguf filter=lfs diff=lfs merge=lfs -text
+Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-mmproj-BF16.gguf filter=lfs diff=lfs merge=lfs -text
+Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text

Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-BF16.gguf ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:748bf63e017747bc982166e3e92f6af6b7332aa980fb9245b148b5aa25aaed64
+size 17920698112

Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q4_K_M.gguf ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1ee1cd89574b1b2e57447d6dab8328a6c39b20d18557a322d928215a8a8a96de
+size 6216967936

Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q4_K_S.gguf ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:41a2f094e227eb8496ba86aa003dd1820f4ec44f8c84873dd06c362643521517
+size 5939488512

Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q5_K_M.gguf ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:28ded3a1ed97ae624200f4cea638c6b0590fe8b7d90ffadce3a8faf85c13e65b
+size 6621325056

Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q5_K_S.gguf ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dda4d89a6643ad489bd2091dca5c24baca036faece90303aadf39528bea3c2a2
+size 6458664704

Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q6_K.gguf ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:56d1774415e7c4607ba95d07d0d1cccafbd175ec5360b61cc4db170214b43167
+size 7458301696

Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q8_0.gguf ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a62cc2cbdd647f4e28f1d2128bb0c993061300fd02949f7781e85ed7c64d1154
+size 9910888192

Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-mmproj-BF16.gguf ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0616764a8e5f4db57bca7d355eaaa3d14176a0bcf6f500df037ab94b5e8d29df
+size 921705248
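
Each three-line entry above is a Git LFS pointer: the spec version, the SHA-256 of the real file, and its size in bytes. The sketch below parses one pointer and checks a downloaded file against it. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the pointer text is copied from the BF16 entry in this commit.

```python
import hashlib
from pathlib import Path

POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:748bf63e017747bc982166e3e92f6af6b7332aa980fb9245b148b5aa25aaed64
size 17920698112
"""


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: Path, pointer: dict[str, str], chunk_size: int = 1 << 20) -> bool:
    """Check a downloaded file against the pointer's size and SHA-256 digest."""
    if path.stat().st_size != int(pointer["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return pointer["oid"] == f"sha256:{digest.hexdigest()}"


pointer = parse_lfs_pointer(POINTER_TEXT)
size_gib = int(pointer["size"]) / 2**30
print(f"{size_gib:.2f} GiB")
```

Comparing the size first avoids hashing a multi-gigabyte file that is obviously truncated.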

README.md ADDED Viewed

	@@ -0,0 +1,570 @@

+---
+license: apache-2.0
+base_model:
+- llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic
+language:
+- en
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- qwen3.5
+- reasoning
+- uncensored
+- long-context
+- 1M-context
+- function-calling
+- tool-use
+- sft
+- full-fine-tune
+- cybersecurity
+- biomedical
+- agentic
+- heretic
+- uncensored
+- decensored
+- abliterated
+- mpoa
+---
+<div style="background-color: #ff4444; color: white; padding: 20px; border-radius: 10px; text-align: center; margin: 20px 0;">
+<h2 style="color: white; margin: 0 0 10px 0;">🚨⚠️ I HAVE REACHED HUGGING FACE'S FREE STORAGE LIMIT ⚠️🚨</h2>
+<p style="font-size: 18px; margin: 0 0 15px 0;">I can no longer upload new models unless I can cover the cost of additional storage.<br>I host <b>70+ free models</b> as an independent contributor and this work is unpaid.<br><b>Without your support, no more new models can be uploaded.</b></p>
+<p style="font-size: 20px; margin: 0;">
+<a href="https://patreon.com/LLMfan46" style="color: white; text-decoration: underline;">🎉 Patreon (Monthly)</a> &nbsp;|&nbsp;
+<a href="https://ko-fi.com/llmfan46" style="color: white; text-decoration: underline;">☕ Ko-fi (One-time)</a>
+</p>
+<p style="font-size: 16px; margin: 10px 0 0 0;">Every contribution goes directly toward Hugging Face storage fees to keep models free for everyone.</p>
+</div>
+---
+### **85% fewer refusals** (11/100 Uncensored vs 73/100 Original) while preserving model quality (0.0123 KL divergence).
+## ❤️ Support My Work
+Creating these models takes significant time, work, and compute. If you find them useful, consider supporting me:
+![image/png](https://huggingface.co/llmfan46/Omega-Darker-Gaslight_The-Final-Forgotten-Fever-Dream-24B-ultra-uncensored-heretic-v1/resolve/main/waifu001.webp)
+| Platform | Link | What you get |
+|----------|------|--------------|
+| 🎉 Patreon | [Monthly support](https://patreon.com/LLMfan46) | Priority model requests |
+| ☕ Ko-fi | [One-time tip](https://ko-fi.com/llmfan46) | My eternal gratitude |
+Your help will motivate me and will go toward improving my workflow and covering fees for storage and compute. It may even help with uncensoring bigger models on rented cloud GPUs.
+-----
+GGUF quantizations of [llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic](https://huggingface.co/llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic).
+# This is a decensored version of [empero-ai/Qwythos-9B-Claude-Mythos-5-1M](https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M), made using [Heretic](https://heretic-project.org/) v1.2.0 with a variant of the [Magnitude-Preserving Orthogonal Ablation (MPOA)](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration) method
+## Abliteration parameters
+| Parameter | Value |
+| :-------- | :---: |
+| **direction_index** | 20.52 |
+| **attn.out_proj.max_weight** | 1.74 |
+| **attn.out_proj.max_weight_position** | 29.99 |
+| **attn.out_proj.min_weight** | 1.02 |
+| **attn.out_proj.min_weight_distance** | 24.58 |
+| **mlp.down_proj.max_weight** | 1.98 |
+| **mlp.down_proj.max_weight_position** | 19.18 |
+| **mlp.down_proj.min_weight** | 1.65 |
+| **mlp.down_proj.min_weight_distance** | 11.05 |
+| **attn.o_proj.max_weight** | 1.98 |
+| **attn.o_proj.max_weight_position** | 23.72 |
+| **attn.o_proj.min_weight** | 0.76 |
+| **attn.o_proj.min_weight_distance** | 13.31 |
+## Targeted components
+  * attn.o_proj
+  * attn.out_proj
+  * mlp.down_proj
+## Performance
+| Metric | This model | Original model ([Qwythos-9B-Claude-Mythos-5-1M](https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M)) |
+| :----- | :--------: | :---------------------------: |
+| **KL divergence** | <span style="color:darkgoldenrod">0.0123</span> | 0 *(by definition)* |
+| **Refusals** | ✅ <span style="color:darkgreen">11/100</span> | ❌ <span style="color:blue">73/100</span> |
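
The two metrics in the table can be reproduced or illustrated with a few lines of arithmetic. This is a sketch only: the refusal counts come from the table, while the toy distributions are made up to show why the original model scores a KL divergence of 0 against itself. The README does not describe how Heretic samples the distributions it compares, so that part is not modeled here.

```python
import math


def relative_reduction(before: int, after: int) -> float:
    """Fraction by which a count dropped relative to its starting value."""
    return (before - after) / before


def kl_divergence(p: list[float], q: list[float]) -> float:
    """KL(p || q) in nats for two discrete distributions over the same outcomes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Refusal counts from the table: 73/100 (original) vs 11/100 (this model).
reduction = relative_reduction(before=73, after=11)
print(f"{reduction:.1%}")

# Toy next-token distributions, invented for illustration only.
original = [0.70, 0.20, 0.10]
modified = [0.65, 0.23, 0.12]
self_kl = kl_divergence(original, original)
shifted_kl = kl_divergence(original, modified)
print(self_kl, shifted_kl)
```

The reduction works out to 62/73, which the headline rounds to 85%.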
+## MMLU test results:
+<span style="color:blue">Original:</span>
+============================================================
+- Total questions: 7021
+- Correct: 5408
+- **Accuracy: 0.7703 (77.03%)**
+- Parse failures: 0
+============================================================
+**Tested subject scores:**
+- professional_law: 0.5885 (462/785)
+- moral_scenarios: 0.5339 (236/442)
+- miscellaneous: 0.8851 (339/383)
+- professional_psychology: 0.8259 (261/316)
+- high_school_psychology: 0.9593 (259/270)
+- high_school_macroeconomics: 0.8477 (167/197)
+- elementary_mathematics: 0.7011 (129/184)
+- moral_disputes: 0.8046 (140/174)
+- prehistory: 0.8372 (144/172)
+- philosophy: 0.7673 (122/159)
+- high_school_biology: 0.9276 (141/152)
+- professional_accounting: 0.6713 (96/143)
+- clinical_knowledge: 0.8286 (116/140)
+- high_school_microeconomics: 0.9485 (129/136)
+- nutrition: 0.8296 (112/135)
+- professional_medicine: 0.8582 (115/134)
+- conceptual_physics: 0.8359 (107/128)
+- high_school_mathematics: 0.5591 (71/127)
+- human_aging: 0.7586 (88/116)
+- security_studies: 0.7857 (88/112)
+- high_school_statistics: 0.7748 (86/111)
+- marketing: 0.9450 (103/109)
+- high_school_world_history: 0.9057 (96/106)
+- sociology: 0.8932 (92/103)
+- high_school_government_and_politics: 0.9505 (96/101)
+- high_school_geography: 0.9293 (92/99)
+- high_school_chemistry: 0.7526 (73/97)
+- high_school_us_history: 0.9158 (87/95)
+- virology: 0.5281 (47/89)
+- college_medicine: 0.8295 (73/88)
+- world_religions: 0.8409 (74/88)
+- high_school_physics: 0.6429 (54/84)
+- electrical_engineering: 0.7654 (62/81)
+- astronomy: 0.9494 (75/79)
+- logical_fallacies: 0.8553 (65/76)
+- high_school_european_history: 0.9041 (66/73)
+- anatomy: 0.7887 (56/71)
+- college_biology: 0.9375 (60/64)
+- human_sexuality: 0.8125 (52/64)
+- formal_logic: 0.6719 (43/64)
+- public_relations: 0.7049 (43/61)
+- international_law: 0.9000 (54/60)
+- college_physics: 0.6842 (39/57)
+- college_mathematics: 0.5455 (30/55)
+- econometrics: 0.6296 (34/54)
+- jurisprudence: 0.8491 (45/53)
+- high_school_computer_science: 0.8654 (45/52)
+- machine_learning: 0.6154 (32/52)
+- medical_genetics: 0.8627 (44/51)
+- global_facts: 0.3725 (19/51)
+- management: 0.9200 (46/50)
+- us_foreign_policy: 0.9400 (47/50)
+- college_chemistry: 0.5532 (26/47)
+- abstract_algebra: 0.5957 (28/47)
+- business_ethics: 0.6957 (32/46)
+- college_computer_science: 0.8000 (36/45)
+- computer_security: 0.7907 (34/43)
+<span style="color:darkgreen">Heretic:</span>
+============================================================
+- Total questions: 7021
+- Correct: 5431
+- **Accuracy: 0.7735 (77.35%)**
+- Parse failures: 0
+============================================================
+**Tested subject scores:**
+- professional_law: 0.6000 (471/785)
+- moral_scenarios: 0.5294 (234/442)
+- miscellaneous: 0.8930 (342/383)
+- professional_psychology: 0.8259 (261/316)
+- high_school_psychology: 0.9630 (260/270)
+- high_school_macroeconomics: 0.8426 (166/197)
+- elementary_mathematics: 0.7011 (129/184)
+- moral_disputes: 0.8161 (142/174)
+- prehistory: 0.8430 (145/172)
+- philosophy: 0.7673 (122/159)
+- high_school_biology: 0.9276 (141/152)
+- professional_accounting: 0.6783 (97/143)
+- clinical_knowledge: 0.8286 (116/140)
+- high_school_microeconomics: 0.9559 (130/136)
+- nutrition: 0.8370 (113/135)
+- professional_medicine: 0.8657 (116/134)
+- conceptual_physics: 0.8359 (107/128)
+- high_school_mathematics: 0.5748 (73/127)
+- human_aging: 0.7586 (88/116)
+- security_studies: 0.7857 (88/112)
+- high_school_statistics: 0.7838 (87/111)
+- marketing: 0.9358 (102/109)
+- high_school_world_history: 0.9057 (96/106)
+- sociology: 0.8932 (92/103)
+- high_school_government_and_politics: 0.9406 (95/101)
+- high_school_geography: 0.9293 (92/99)
+- high_school_chemistry: 0.7526 (73/97)
+- high_school_us_history: 0.9053 (86/95)
+- virology: 0.5281 (47/89)
+- college_medicine: 0.8295 (73/88)
+- world_religions: 0.8523 (75/88)
+- high_school_physics: 0.6429 (54/84)
+- electrical_engineering: 0.7654 (62/81)
+- astronomy: 0.9367 (74/79)
+- logical_fallacies: 0.8553 (65/76)
+- high_school_european_history: 0.9178 (67/73)
+- anatomy: 0.7887 (56/71)
+- college_biology: 0.9531 (61/64)
+- human_sexuality: 0.7969 (51/64)
+- formal_logic: 0.6562 (42/64)
+- public_relations: 0.7049 (43/61)
+- international_law: 0.9000 (54/60)
+- college_physics: 0.6491 (37/57)
+- college_mathematics: 0.5636 (31/55)
+- econometrics: 0.6296 (34/54)
+- jurisprudence: 0.8491 (45/53)
+- high_school_computer_science: 0.8654 (45/52)
+- machine_learning: 0.6346 (33/52)
+- medical_genetics: 0.8627 (44/51)
+- global_facts: 0.4314 (22/51)
+- management: 0.9200 (46/50)
+- us_foreign_policy: 0.9400 (47/50)
+- college_chemistry: 0.5319 (25/47)
+- abstract_algebra: 0.6170 (29/47)
+- business_ethics: 0.7174 (33/46)
+- college_computer_science: 0.8222 (37/45)
+- computer_security: 0.8140 (35/43)
+MMLU (Massive Multitask Language Understanding) is a benchmark of multiple-choice questions across 57 subjects (math, history, law, medicine, etc.).
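
To put the gap between the two runs in context, the sketch below recomputes both accuracies from the totals reported above and compares their difference with a binomial standard error. This is a rough, unpaired estimate. Both runs answer the same questions, so a paired test would be the more precise tool. The counts are the only inputs taken from the results.

```python
import math

TOTAL = 7021
CORRECT = {"original": 5408, "heretic": 5431}


def accuracy(correct: int, total: int) -> float:
    return correct / total


def standard_error(p: float, n: int) -> float:
    """Binomial standard error of a proportion p measured on n questions."""
    return math.sqrt(p * (1 - p) / n)


acc = {name: accuracy(count, TOTAL) for name, count in CORRECT.items()}
delta = acc["heretic"] - acc["original"]
se = standard_error(acc["original"], TOTAL)

print(f"delta = {delta:+.4f}, one standard error = {se:.4f}")
```

The difference of 23 questions is smaller than one standard error, which is consistent with the claim that ablation left MMLU accuracy essentially unchanged.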
+-----
+## Quantizations
+For the K-quants below, small SSM tensors are kept at higher precision where useful.
+- `Q8_0` and `Q4_K` quants keep `ssm_alpha`, `ssm_beta`, and `ssm_out` as `BF16`.
+- `Q6_K` and `Q5_K` quants keep `ssm_alpha`, `ssm_beta`, and `ssm_out` as `Q8_0`.
+This helps preserve the hybrid/SSM blocks with a small file-size increase.
+| Filename | Quant | Description |
+|----------|-------|-------------|
+| Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-BF16.gguf | BF16 | Full precision |
+| Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q8_0.gguf | Q8_0 | Near-lossless, recommended |
+| Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q6_K.gguf | Q6_K | Excellent quality |
+| Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q5_K_M.gguf | Q5_K_M | Good balance |
+| Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q4_K_M.gguf | Q4_K_M | Good for limited VRAM |
+| Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q4_K_S.gguf | Q4_K_S | Smaller Q4 |
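
For a rough sense of what each quantization costs on disk, the sketch below converts the file sizes from this commit's LFS pointers into GiB and an approximate bits-per-weight figure. The estimate assumes the BF16 file averages 16 bits per weight and ignores metadata and any tensors stored at other precisions, so treat the numbers as indicative only.

```python
# File sizes in bytes, copied from the Git LFS pointers in this commit.
SIZES_BYTES = {
    "BF16": 17_920_698_112,
    "Q8_0": 9_910_888_192,
    "Q6_K": 7_458_301_696,
    "Q5_K_M": 6_621_325_056,
    "Q5_K_S": 6_458_664_704,
    "Q4_K_M": 6_216_967_936,
    "Q4_K_S": 5_939_488_512,
}


def estimated_bits_per_weight(quant: str) -> float:
    """Scale 16 bits by the file-size ratio to the BF16 file (an approximation)."""
    return 16.0 * SIZES_BYTES[quant] / SIZES_BYTES["BF16"]


for quant, size in SIZES_BYTES.items():
    gib = size / 2**30
    print(f"{quant:7s} {gib:6.2f} GiB  ~{estimated_bits_per_weight(quant):.2f} bits/weight")
```

The estimates come out above the nominal bit widths, which fits the note that some tensors are kept at higher precision.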
+## Vision Projector
+| Filename | Quant | Description |
+|----------|-------|-------------|
+| Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-mmproj-BF16.gguf | BF16 | Native precision |
+A vision projector file is required for vision/multimodal capabilities. Use it alongside any quantization above.
+## Usage
+Works with llama.cpp, LM Studio, Ollama, and other GGUF-compatible tools.
+-----
+<p align="center">
+  <img src="assets/qwythos.png" alt="Qwythos-9B" width="640"/>
+</p>
+# Qwythos-9B
+**Developed by [Empero](https://empero.org)**
+**Qwythos-9B** is a full-parameter reasoning model built on top of a **deeply uncensored Qwen3.5-9B base** and post-trained on **over 500 million tokens** of high-quality Claude Mythos and Claude Fable traces, with chain-of-thought generated in-house by Empero AI's internal tool **rethink**.
+The result is a compact, fast, **dramatically more capable** 9B reasoning model. Headline capabilities:
+- **🔭 1,048,576-token context** — Qwythos ships with **YaRN rope-scaling enabled by default** for a **full 1M-token context window** out of the box. One of the longest context windows available in any 9B-class open-weight model, suitable for whole-codebase reasoning, multi-document research, and long agentic trajectories.
+- **📈 Dominates the base** under matched evaluation: **+34 pts MMLU, +30 pts gsm8k-strict, +19 pts gsm8k-flex.**
+- **🛠 Native function calling** per Qwen3.5's spec — no extra wrapper, no tool-specific fine-tune required.
+- **🎯 Self-corrects with tools** — when given a Python executor and a web search tool, Qwythos produced source-cited, factually-correct answers on **7 of 7** test prompts spanning math, cybersecurity, clinical pharmacology, and biochemistry.
+Qwythos is intentionally **uncensored**. It is designed to engage seriously with technically demanding questions across cybersecurity, red-teaming methodology, biology, pharmacology, and clinical medicine — domains where over-aligned models tend to refuse, hedge into uselessness, or surface boilerplate disclaimers in place of substance.
+---
+## Headline results
+<p align="center">
+  <img src="assets/qwythos_eval_chart.svg" alt="Qwythos vs. base Qwen3.5-9B across seven benchmarks" width="900"/>
+</p>
+**Same harness. Same sampling. Same prompts. The wins are real.**
+| Task | Metric | Base Qwen3.5-9B | **Qwythos-9B** | Δ |
+|---|---|---:|---:|---:|
+| gsm8k | exact_match (flexible) | 0.670 | **0.860** | **+0.190** |
+| gsm8k | exact_match (strict) | 0.510 | **0.810** | **+0.300** |
+| mmlu | acc | 0.232 | **0.575** | **+0.343** |
+| arc_challenge | acc | 0.470 | **0.490** | +0.020 |
+| arc_challenge | acc_norm | 0.400 | **0.410** | +0.010 |
+| gpqa_diamond (CoT, 0-shot) | exact_match (flexible) | 0.630 | 0.580 | −0.050 |
+All numbers produced with [`lm-evaluation-harness`](https://github.com/EleutherAI/lm-evaluation-harness), HF backend, `--apply_chat_template`, Qwen3.5 sampling (`temperature=0.6, top_p=0.95, top_k=20`), `--limit 100`. Full per-task and per-subject (MMLU) breakdown in [`evals/lm_eval_results.md`](evals/lm_eval_results.md). Raw `results*.json` and per-sample `samples_*.jsonl` are available on request.
+The **MMLU +34.3** lift is the headline. Qwythos posts **0.575 mean across all 57 subjects, peaking at 0.78 on government/politics, 0.77 on college biology, 0.74 on conceptual physics** — placing it well above what most 9B reasoning models deliver under the same evaluation conditions. Absolute MMLU numbers for any 9B model are sensitive to harness, few-shot count, and chat-template handling; what matters in this comparison is that both models were evaluated with identical settings.
+---
+## Capability: Native tool use with self-correction
+Qwythos supports **OpenAI/Qwen3.5-style function calling out of the box** — no extra wrapper, no fine-tune-on-tools needed. Pass `tools=[...]` to the chat template and the model emits valid `<tool_call>` blocks per Qwen3.5's spec, with required parameters honored.
+We evaluated tool use on a 7-prompt harness combining capability demos with **deliberately hard factual-recall prompts where closed-book sampling fails:**
+| Prompt | Tool selected | Outcome |
+|---|---|---|
+| Compute `sin(π/7) × cos(π/11)` to 10 dp | `python_executor` | ✅ `0.4163083990` (correct, single call) |
+| Count primes below 100,000 | `python_executor` | ✅ `9592` (correct, wrote and ran a sieve) |
+| Latest stable CPython 3 release | `web_search` | ✅ Found 3.14.6 (June 2026), 3.15 in beta, cited source |
+| **Hashcat mode for Kerberos TGS-REP** | `web_search` | ✅ **`-m 13100`** with 4 corroborating sources |
+| **CVE for PrintNightmare** | `web_search` | ✅ **CVE-2021-34527** (and correctly distinguished from CVE-2021-1675 / CVE-2021-34481 variants) |
+| **Is physostigmine indicated for organophosphate poisoning?** | `web_search` | ✅ **"NOT indicated — would be harmful. Physostigmine is for the anticholinergic toxidrome."** Cited LITFL toxicology. |
+| **DPP-4 cleavage site in GLP-1 / semaglutide modification** | `web_search` | ✅ **Ala⁸–Glu⁹ cleavage, α-aminoisobutyric acid (Aib) at position 8 in semaglutide** — cited Wikipedia and pharma source |
+**7 of 7 succeeded.** Tool selection was always sensible (math → Python; facts → search). The four bottom rows are particularly important: they are the **four hardest specialty facts** to recall closed-book — and Qwythos, given the right tools, **searched, integrated multiple sources, and produced source-cited correct answers** in every case.
+Full transcripts with the model's reasoning, every tool call issued, every result returned, and the final integrated answer are in [`evals/tool_test_outputs.md`](evals/tool_test_outputs.md).
+This makes Qwythos **deployment-ready for retrieval-augmented agentic settings**, where the model verifies its specifics rather than fabricating them.
+---
+## Capability: 1,048,576-token context window
+Qwythos ships with **YaRN rope-scaling configured by default** for a **1,048,576-token (≈1M) context window** — a 4× extension over the 262,144-token native architecture. The configuration is baked into `config.json` and applies automatically at load time; no separate flag, post-processing step, or YaRN-specific tokenizer is required:
+```json
+"rope_parameters": {
+  "rope_type": "yarn",
+  "factor": 4.0,
+  "original_max_position_embeddings": 262144,
+  "mrope_interleaved": true,
+  "mrope_section": [11, 11, 10],
+  "rope_theta": 10000000
+},
+"max_position_embeddings": 1048576
+```
+This is the **official Qwen3.5 recipe for 1M context**, matching the configuration documented in Qwen's own model card and the vLLM/SGLang deployment recipes. Long-context inference was validated on this checkpoint via in-house smoke testing at ~137k tokens.
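
The relationship between the YaRN factor and the advertised window can be checked directly from the fragment above. This sketch parses the same JSON (wrapped in braces so it is a valid document) and confirms that `factor` times `original_max_position_embeddings` equals `max_position_embeddings`.

```python
import json

CONFIG_FRAGMENT = """
{
  "rope_parameters": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144,
    "mrope_interleaved": true,
    "mrope_section": [11, 11, 10],
    "rope_theta": 10000000
  },
  "max_position_embeddings": 1048576
}
"""

config = json.loads(CONFIG_FRAGMENT)
rope = config["rope_parameters"]

# Extended window = scaling factor x native window.
extended = int(rope["factor"] * rope["original_max_position_embeddings"])
print(extended, extended == config["max_position_embeddings"])
```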
+**What 1M context unlocks:**
+- **Whole-codebase reasoning.** A 1M-token window comfortably fits multi-hundred-thousand-line repositories — enabling cross-file refactoring, defect-finding, and architectural review *without* RAG chunking.
+- **Long agentic trajectories.** Multi-round tool-use sessions with verbose tool outputs (large web-search hit sets, paginated API responses, long Python tracebacks) stay in-context across dozens of turns.
+- **Multi-document research.** A typical research session (10–20 papers + notes + the user's working draft) fits in one prompt — synthesize across all of them in a single forward pass.
+- **Long-form scientific reasoning.** Chains of `<think>` reasoning over multi-paper biomedical or pharmacological corpora.
+**Serving at 1M:**
+```bash
+# vLLM
+vllm serve empero-ai/Qwythos-9B-Claude-Mythos-5-1M --max-model-len 1010000
+# SGLang
+SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python -m sglang.launch_server \
+  --model-path empero-ai/Qwythos-9B-Claude-Mythos-5-1M --context-length 1010000
+```
+**Practical notes:**
+- The full 1M window benefits from tensor-parallel multi-GPU or aggressive KV-cache offload — a single H100/H200 comfortably handles **256k–512k**. Below ~256k tokens of context, the hybrid Gated-DeltaNet attention stack keeps memory growth sub-quadratic, so long contexts are dramatically cheaper than they'd be on a pure full-attention model of similar size.
+- Static YaRN at factor=4.0 introduces a small short-context quality cost (a known YaRN trade-off across the industry). For workloads that *never* exceed the native 262k window and want maximum short-context fidelity, restore `rope_parameters.rope_type` to `"default"` from the included `config.json.pre_yarn` backup.
+### Reproducing the tool harness
+The harness is a small ~150-line Python file:
+- `python_executor(code)` — runs Python in a subprocess (12s timeout, captured stdout/stderr)
+- `web_search(query, max_results)` — DuckDuckGo via the `ddgs` package
+Pass both as `tools=` to `apply_chat_template` and parse `<tool_call>` blocks from the model's output. The parser handles Qwen3.5's chat-template format:
+```
+<tool_call>
+<function=NAME>
+  <parameter=PARAM>value</parameter>
+</function>
+</tool_call>
+```
+Empero will release the reference harness on GitHub.
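
Until that harness is published, the pieces described above can be approximated as follows. This is a sketch under stated assumptions, not Empero's implementation: `parse_tool_calls` is a hypothetical regex parser for the block format shown, `python_executor` mirrors the described subprocess tool with its 12-second timeout, and the sample model output is hand-written. Parameter values are returned as raw strings, and the subprocess is not sandboxed.

```python
import re
import subprocess
import sys

TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<function=([^>\s]+)>(.*?)</function>\s*</tool_call>", re.DOTALL
)
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Extract every <tool_call> block as {'name': ..., 'arguments': {...}}."""
    calls = []
    for name, body in TOOL_CALL_RE.findall(text):
        args = {key: value.strip() for key, value in PARAM_RE.findall(body)}
        calls.append({"name": name, "arguments": args})
    return calls


def python_executor(code: str, timeout: float = 12.0) -> str:
    """Run code in a fresh interpreter and return combined stdout and stderr."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"timed out after {timeout} s"
    return done.stdout + done.stderr


# Hand-written example of model output containing one tool call.
model_output = """<think>I should compute this.</think>
<tool_call>
<function=python_executor>
  <parameter=code>print(2 + 2)</parameter>
</function>
</tool_call>"""

calls = parse_tool_calls(model_output)
result = python_executor(calls[0]["arguments"]["code"])
print(calls[0]["name"], result.strip())
```

In a real loop, the result string would be appended to the conversation as a tool message before the next generation step.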
+---
+## Sampling recommendations
+Qwythos was trained as a reasoning model and inherits Qwen3.5's thinking-mode behavior. Use these settings as defaults:
+```python
+gen_kwargs = dict(
+    do_sample=True,
+    temperature=0.6,    # Qwen3.5 thinking-mode recommended
+    top_p=0.95,
+    top_k=20,
+    repetition_penalty=1.05,
+    max_new_tokens=16384,  # generous budget for the <think> reasoning block + final answer
+)
+```
+**Why these:** in a controlled retest (see [`evals/retest_outputs.md`](evals/retest_outputs.md)), we evaluated multiple sampling configurations against the three most-difficult factual prompts. **Greedy decoding and very-low-temperature sampling (T≤0.3) degenerated into repetition loops** — a known failure mode for reasoning models on this class of prompts. **Qwen3.5's recommended setting (T=0.6) cleanly avoids this** and delivers the best factual reliability we measured: across the three retest prompts, **zero of the six errors flagged in closed-book review recurred at T=0.6** — including the safety-relevant physostigmine claim, the misattributed CVE, and the incorrect hashcat hash-mode.
+Use `repetition_penalty=1.05` — a small deviation from Qwen's default of 1.0 that prevents rare non-terminating reasoning loops on long generations.
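For GGUF inference through llama-cpp-python, the same settings go under slightly different parameter names. The sketch below only renames keys; `to_llama_cpp_kwargs` is a hypothetical helper, and dropping `do_sample` rests on the assumption that `create_chat_completion` has no direct counterpart for it.

```python
gen_kwargs = dict(
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=20,
    repetition_penalty=1.05,
    max_new_tokens=16384,
)


def to_llama_cpp_kwargs(hf_kwargs: dict) -> dict:
    """Translate Transformers-style generation kwargs into llama-cpp-python names."""
    rename = {
        "temperature": "temperature",
        "top_p": "top_p",
        "top_k": "top_k",
        "repetition_penalty": "repeat_penalty",
        "max_new_tokens": "max_tokens",
    }
    # Keys without an entry in `rename` (here: do_sample) are dropped.
    return {rename[k]: v for k, v in hf_kwargs.items() if k in rename}


print(to_llama_cpp_kwargs(gen_kwargs))
# Intended use, with `llm` being a loaded llama_cpp.Llama instance:
# llm.create_chat_completion(messages=messages, **to_llama_cpp_kwargs(gen_kwargs))
```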
+---
+## Domain coverage
+Qwythos is a **general-purpose reasoning model with explicit emphasis on cybersecurity, biomedical, and quantitative reasoning**. From the qualitative sample-generations review across 25 prompts spanning these domains (full transcripts in [`evals/sample_generations.md`](evals/sample_generations.md)):
+- **Cybersecurity** — produces detailed defender-oriented walkthroughs of SQL injection mitigations, TLS handshake structure, EDR/process-injection detection, Linux hardening, MITRE ATT&CK ransomware kill chains.
+- **Red-team methodology** — clean explanations of engagement phases, scoping, rules of engagement, evidence handling, reporting. Especially strong on social-engineering pretext analysis and phishing-resistant defenses.
+- **Biology / biochemistry** — step-by-step mechanisms for CRISPR-Cas9, mRNA vaccines, SARS-CoV-2 spike protein, antibiotic-resistance mechanisms, organophosphate AChE inhibition.
+- **Pharmacology** — strong on receptor pharmacology fundamentals (agonism, antagonism, partial agonism with worked examples), statin mechanism, opioid respiratory depression at the brainstem level, beta-blocker indications, therapeutic-window reasoning for narrow-index drugs.
+- **Clinical medicine** — ACS chest-pain differential and workup, type-2 diabetes pathophysiology and drug-class targeting, sepsis recognition (qSOFA) and bundle.
+- **Math** — strong at gsm8k-style multi-step word problems, minerva-style competition math; **86% gsm8k**, integer arithmetic verified by `python_executor` when invoked.
+**The uncensored base means Qwythos engages substantively** with these prompts rather than refusing, hedging, or burying answers in disclaimer boilerplate. Reasoning appears in the `<think>` block, and the final answer follows it.
+---
+## Model details
+- **Base model:** [`Qwen/Qwen3.5-9B`](https://huggingface.co/Qwen/Qwen3.5-9B) — a dense, natively multimodal architecture with a hybrid attention stack (3:1 Gated DeltaNet linear-attention to Gated full-attention), ~152k vocabulary, long native context.
+- **Fine-tune type:** full parameter (all text-backbone weights trained). The vision tower was frozen — training was text-only, so vision behavior is inherited from the base and was not tuned or tested.
+- **Objective:** supervised fine-tuning, assistant-only loss (the model is scored only on the assistant/completion tokens; prompts are masked).
+- **Context length:** **1,048,576 tokens (≈1M) — YaRN rope-scaling enabled by default in `config.json`.** Native architectural context is 262,144 tokens; YaRN factor 4.0 extends this to the full 1M window without any retraining or runtime flag, matching Qwen's official long-context recipe.
+- **License:** Apache 2.0.
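The context-length figures above follow from a single multiplication. As a quick check, using only the numbers stated in the list (the helper names are illustrative):

```python
NATIVE_CONTEXT = 262_144  # native window stated in the list above
YARN_FACTOR = 4.0         # static YaRN factor stated in the list above


def extended_context(native: int, factor: float) -> int:
    """Window size after scaling the native context by a static factor."""
    return int(native * factor)


def exceeds_native(n_tokens: int) -> bool:
    """True when a prompt is too long for the native window and relies on YaRN."""
    return n_tokens > NATIVE_CONTEXT


print(extended_context(NATIVE_CONTEXT, YARN_FACTOR))  # 1048576
print(exceeds_native(200_000), exceeds_native(300_000))  # False True
```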
+## Training data
+Qwythos was post-trained on **over 500 million tokens** of high-quality reasoning data drawn from:
+- **Claude Mythos and Claude Fable traces** — long, multi-turn problem-solving conversations spanning code, math, science reasoning, biomedical analysis, and agentic tool use.
+- **Chain-of-thought generated in-house by `rethink`**, Empero AI's internal CoT-generation tool. `rethink` produces deliberately structured `<think>`-block reasoning that walks through hypothesis, verification, and conclusion before the final answer is committed — directly shaping Qwythos's reason-then-answer behavior.
+All data was normalized to Qwen3.5's chat format. Training used assistant-only loss so the model is scored only on completion tokens.
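Assistant-only loss is commonly implemented by masking labels. The sketch below illustrates the idea on plain Python lists and is not the actual training code: it assumes a per-token `assistant_mask` is already available, and it uses `-100`, the label value that PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def mask_non_assistant(input_ids: list[int], assistant_mask: list[int]) -> list[int]:
    """Copy input_ids into labels, replacing non-assistant positions with IGNORE_INDEX."""
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must have the same length")
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(input_ids, assistant_mask)]


# Toy example: three prompt tokens followed by two assistant tokens.
labels = mask_non_assistant([11, 12, 13, 21, 22], [0, 0, 0, 1, 1])
print(labels)  # [-100, -100, -100, 21, 22]
```

Only the two assistant positions keep real labels, so only they contribute to the loss.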
+## Training procedure
+Full-parameter supervised fine-tuning with [TRL](https://github.com/huggingface/trl):
+| Hyperparameter | Value |
+|---|---|
+| Schedule | 2-phase curriculum: broad reasoning corpus → focused agentic + coding |
+| Effective batch size | 16 |
+| Max sequence length | 128,000 (no truncation) |
+| Learning rate | 1e-5 → 5e-6 cosine across phases |
+| Optimizer | paged AdamW (8-bit) |
+| Precision | bf16 |
+| Loss | chunked NLL, assistant-only |
+Held-out validation loss decreased monotonically across both phases (final eval_loss ≈ 0.709, mean token accuracy 0.799 on a curated holdout). No overfitting observed.
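For intuition, the final loss can be converted to a per-token perplexity. This assumes the reported `eval_loss` is a mean negative log-likelihood in nats over the scored (assistant) tokens, which the card does not state explicitly.

```python
import math

eval_loss = 0.709  # final held-out loss reported above
perplexity = math.exp(eval_loss)
print(f"{perplexity:.2f}")  # 2.03
```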
+---
+## How to use
+The base is multimodal, so load the checkpoint with `AutoModelForImageTextToText` even for text-only inference:
+```python
+import torch
+from transformers import AutoModelForImageTextToText, AutoTokenizer
+model_id = "empero-ai/Qwythos-9B-Claude-Mythos-5-1M"
+tok = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForImageTextToText.from_pretrained(
+    model_id, dtype=torch.bfloat16, device_map="auto"
+)
+messages = [
+    {"role": "user",
+     "content": "Walk through the biochemistry of how organophosphate nerve agents inhibit acetylcholinesterase, the resulting cholinergic toxicity, and the medical antidotes."}
+]
+text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+inputs = tok(text, return_tensors="pt").to(model.device)
+out = model.generate(
+    **inputs, max_new_tokens=16384, do_sample=True,
+    temperature=0.6, top_p=0.95, top_k=20, repetition_penalty=1.05,
+)
+# Output opens with <think>...</think> reasoning, then the final answer.
+print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
+```
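To show only the final answer to end users, the decoded text can be split on the closing `</think>` tag. This is a minimal sketch: `split_reasoning` is an illustrative helper, and it assumes the opening `<think>` tag may be missing from the decoded text if the chat template already placed it in the prompt.

```python
def split_reasoning(decoded: str) -> tuple[str, str]:
    """Split decoded output into (reasoning, answer) around the closing </think> tag."""
    head, sep, tail = decoded.partition("</think>")
    if not sep:
        if "<think>" in decoded:
            # Reasoning never closed (likely hit max_new_tokens): no final answer yet.
            return decoded.replace("<think>", "", 1).strip(), ""
        # No reasoning tags at all: treat everything as the answer.
        return "", decoded.strip()
    # Drop the opening tag if present, then trim both parts.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


reasoning, answer = split_reasoning("<think>\nstep 1\n</think>\n\nParis.")
print(answer)  # Paris.
```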
+### With tools (function calling)
+```python
+TOOLS = [
+    {"type": "function", "function": {
+        "name": "python_executor",
+        "description": "Execute Python code and return stdout.",
+        "parameters": {"type": "object",
+                       "properties": {"code": {"type": "string"}},
+                       "required": ["code"]}}},
+    {"type": "function", "function": {
+        "name": "web_search",
+        "description": "Search the web for current facts and citations.",
+        "parameters": {"type": "object",
+                       "properties": {"query": {"type": "string"},
+                                      "max_results": {"type": "integer"}},
+                       "required": ["query"]}}},
+]
+# Reuses `tok` and `messages` from the previous example.
+text = tok.apply_chat_template(messages, tools=TOOLS, tokenize=False, add_generation_prompt=True)
+# ... then parse <tool_call><function=...><parameter=...>...</parameter></function></tool_call> blocks
+```
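Once a tool call has been parsed into a name and an argument dictionary, it still has to be executed. The sketch below shows one way to do that for `python_executor`, following the description earlier in this card (subprocess, 12-second timeout, captured stdout/stderr). The `dispatch` helper, the `REGISTRY` mapping, and the bracketed error strings are hypothetical, not the reference harness.

```python
import subprocess
import sys


def python_executor(code: str, timeout: float = 12.0) -> str:
    """Run code in a fresh Python subprocess and return its captured stdout/stderr."""
    # Note: a subprocess is not a sandbox; isolate it before running untrusted code.
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"[timeout after {timeout}s]"
    return proc.stdout + (("\n[stderr]\n" + proc.stderr) if proc.stderr else "")


def dispatch(name: str, arguments: dict, registry: dict) -> str:
    """Look up a parsed tool call by name and invoke it with its arguments."""
    if name not in registry:
        return f"[unknown tool: {name}]"
    return registry[name](**arguments)


REGISTRY = {"python_executor": python_executor}
print(dispatch("python_executor", {"code": "print(6 * 7)"}, REGISTRY))
```

The returned string would typically be appended to `messages` as a tool-role turn before the chat template is applied again for the next generation step.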
+**Requirements:** a recent `transformers` (Qwen3.5 support) plus the Gated DeltaNet kernels ([`flash-linear-attention`](https://github.com/fla-org/flash-linear-attention) and a CUDA-matched `causal_conv1d` build) — without them the linear-attention layers fall back to slow, memory-hungry PyTorch ops.
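A quick pre-flight check can tell whether the optional kernels are importable before loading the model. This sketch assumes the packages install under the top-level module names `fla` (flash-linear-attention) and `causal_conv1d`; adjust the names if your builds differ.

```python
import importlib.util


def missing_packages(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names, see the note above.
missing = missing_packages(["fla", "causal_conv1d"])
if missing:
    print("Fast kernels unavailable, expect the slow fallback path:", ", ".join(missing))
```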
+---
+## Limitations
+Qwythos is a focused 9B reasoning model. A few characteristics are worth knowing to get the best out of it:
+- **It's a reasoning model.** Every answer opens with a `<think>` block before the final response. Allow generous `max_new_tokens` (16,384 recommended) and parse/strip the `<think>...</think>` span for end users.
+- **Use recommended sampling.** At greedy decoding or very-low-temperature (T≤0.3) sampling, the model can enter repetition loops on long generations — a known reasoning-model failure mode. Use `temperature=0.6, top_p=0.95, top_k=20, repetition_penalty=1.05` for consistently crisp results.
+- **Verify specifics in safety-critical contexts.** Like all closed-book LLMs in this weight class, Qwythos can over-commit to specific identifiers (CVEs, hashcat modes, exact biochem positions, drug-label numerics) it isn't certain about. **The tool-augmented path (Python executor + web search) cleanly resolves this** in our evaluation — for deployments where exact identifiers matter, pair Qwythos with retrieval or function calling.
+- **Uncensored.** Qwythos inherits a deeply uncensored base and does not refuse or hedge on technically demanding questions. Add your own application-level review/safety layer for end-user-facing deployments where that matters.
+- **Text-only fine-tune.** The base is multimodal, but only the text path was trained. Vision behavior is inherited from the base and was not evaluated here.
+---
+## Stay in the loop
+Sign up for the Empero newsletter at **[empero.org](https://empero.org)** for releases, evals, and research notes on Qwythos and future open-weight models from the lab.
+## Support / Donate
+If this model helped you, consider supporting the project:
+- **BTC**: `bc1qx6zepu6sfkvshgdmc4ewu6pk6rpadvpgffpp7v`
+- **LTC**: `ltc1qv2mefzps2vtjcpwfx8xxdrpplrcvltswm68r7x`
+- **XMR**: `42Dbm5xg5Nq26fdyzfEU7KBnAJfhi7Cvz5J2ex5CzHXkfKuNEJzYCcmJ1GTbgjFZ5MBx72sdG1G9239Cd6rsZfv4QeDkYJY`
+---
+## Provenance & licensing
+Weights are released under **Apache-2.0**, inherited from the Qwen3.5-9B base. Shared for research and experimentation, as-is.
+## Acknowledgements
+- Developed and released by [Empero](https://empero.org)
+- Base model: [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) (Alibaba Qwen team)
+- Training: [TRL](https://github.com/huggingface/trl) + [Transformers](https://github.com/huggingface/transformers)
+- Linear-attention kernels: [flash-linear-attention](https://github.com/fla-org/flash-linear-attention), [causal_conv1d](https://github.com/Dao-AILab/causal-conv1d)
+- Evaluation: [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) (EleutherAI)