Upload folder using huggingface_hub
- .gitattributes +8 -0
- Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-BF16.gguf +3 -0
- Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q4_K_M.gguf +3 -0
- Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q4_K_S.gguf +3 -0
- Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q5_K_M.gguf +3 -0
- Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q5_K_S.gguf +3 -0
- Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q6_K.gguf +3 -0
- Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q8_0.gguf +3 -0
- Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-mmproj-BF16.gguf +3 -0
- README.md +570 -0
@@ -33,3 +33,11 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-BF16.gguf filter=lfs diff=lfs merge=lfs -text
+Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-mmproj-BF16.gguf filter=lfs diff=lfs merge=lfs -text
+Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q8_0.gguf filter=lfs diff=lfs merge=lfs -text

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:748bf63e017747bc982166e3e92f6af6b7332aa980fb9245b148b5aa25aaed64
+size 17920698112

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1ee1cd89574b1b2e57447d6dab8328a6c39b20d18557a322d928215a8a8a96de
+size 6216967936

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:41a2f094e227eb8496ba86aa003dd1820f4ec44f8c84873dd06c362643521517
+size 5939488512

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:28ded3a1ed97ae624200f4cea638c6b0590fe8b7d90ffadce3a8faf85c13e65b
+size 6621325056

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:dda4d89a6643ad489bd2091dca5c24baca036faece90303aadf39528bea3c2a2
+size 6458664704

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:56d1774415e7c4607ba95d07d0d1cccafbd175ec5360b61cc4db170214b43167
+size 7458301696

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:a62cc2cbdd647f4e28f1d2128bb0c993061300fd02949f7781e85ed7c64d1154
+size 9910888192

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0616764a8e5f4db57bca7d355eaaa3d14176a0bcf6f500df037ab94b5e8d29df
+size 921705248

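
Each three-line hunk above is a Git LFS pointer file rather than the model weights themselves. As a minimal sketch of how such a pointer can be read, the snippet below parses the first pointer in this commit. The helper name `parse_lfs_pointer` is hypothetical and only illustrates the `key value` layout; it is not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file ("key value" per line) into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# First pointer from this commit, copied verbatim from the hunk above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:748bf63e017747bc982166e3e92f6af6b7332aa980fb9245b148b5aa25aaed64
size 17920698112
"""

info = parse_lfs_pointer(POINTER)
print(f"{info['hash_algo']} digest, {info['size_bytes'] / 1e9:.2f} GB")
```

The `size` field is the byte count of the real file, so it can be used to check a download before loading it.
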
@@ -0,0 +1,570 @@
---
license: apache-2.0
base_model:
- llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- qwen3.5
- reasoning
- uncensored
- long-context
- 1M-context
- function-calling
- tool-use
- sft
- full-fine-tune
- cybersecurity
- biomedical
- agentic
- heretic
- uncensored
- decensored
- abliterated
- mpoa
---
<div style="background-color: #ff4444; color: white; padding: 20px; border-radius: 10px; text-align: center; margin: 20px 0;">
<h2 style="color: white; margin: 0 0 10px 0;">🚨⚠️ I HAVE REACHED HUGGING FACE'S FREE STORAGE LIMIT ⚠️🚨</h2>
<p style="font-size: 18px; margin: 0 0 15px 0;">I can no longer upload new models unless I can cover the cost of additional storage.<br>I host <b>70+ free models</b> as an independent contributor and this work is unpaid.<br><b>Without your support, no more new models can be uploaded.</b></p>
<p style="font-size: 20px; margin: 0;">
<a href="https://patreon.com/LLMfan46" style="color: white; text-decoration: underline;">Patreon (Monthly)</a> |
<a href="https://ko-fi.com/llmfan46" style="color: white; text-decoration: underline;">Ko-fi (One-time)</a>
</p>
<p style="font-size: 16px; margin: 10px 0 0 0;">Every contribution goes directly toward Hugging Face storage fees to keep models free for everyone.</p>
</div>

---

### **85% fewer refusals** (11/100 Uncensored vs 73/100 Original) while preserving model quality (0.0123 KL divergence).

## ❤️ Support My Work
Creating these models takes significant time, work, and compute. If you find them useful, consider supporting me:

![Thank_You](https://cdn-uploads.huggingface.co/production/uploads/6950a4db6b78bcd088c0cf39/ogL2J4PMb8lRmoucmSU1c.png)

| Platform | Link | What you get |
|----------|------|--------------|
| Patreon | [Monthly support](https://patreon.com/LLMfan46) | Priority model requests |
| Ko-fi | [One-time tip](https://ko-fi.com/llmfan46) | My eternal gratitude |

Your help will motivate me and go toward improving my workflow and covering fees for storage and compute. It may even help with uncensoring bigger models on rented cloud GPUs.

-----

GGUF quantizations of [llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic](https://huggingface.co/llmfan46/Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic).

# This is a decensored version of [empero-ai/Qwythos-9B-Claude-Mythos-5-1M](https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M), made using [Heretic](https://heretic-project.org/) v1.2.0 with a variant of the [Magnitude-Preserving Orthogonal Ablation (MPOA)](https://huggingface.co/blog/grimjim/norm-preserving-biprojected-abliteration) method

## Abliteration parameters

| Parameter | Value |
| :-------- | :---: |
| **direction_index** | 20.52 |
| **attn.out_proj.max_weight** | 1.74 |
| **attn.out_proj.max_weight_position** | 29.99 |
| **attn.out_proj.min_weight** | 1.02 |
| **attn.out_proj.min_weight_distance** | 24.58 |
| **mlp.down_proj.max_weight** | 1.98 |
| **mlp.down_proj.max_weight_position** | 19.18 |
| **mlp.down_proj.min_weight** | 1.65 |
| **mlp.down_proj.min_weight_distance** | 11.05 |
| **attn.o_proj.max_weight** | 1.98 |
| **attn.o_proj.max_weight_position** | 23.72 |
| **attn.o_proj.min_weight** | 0.76 |
| **attn.o_proj.min_weight_distance** | 13.31 |

## Targeted components

* attn.o_proj
* attn.out_proj
* mlp.down_proj

## Performance

| Metric | This model | Original model ([Qwythos-9B-Claude-Mythos-5-1M](https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M)) |
| :----- | :--------: | :---------------------------: |
| **KL divergence** | <span style="color:darkgoldenrod">0.0123</span> | 0 *(by definition)* |
| **Refusals** | ✅ <span style="color:darkgreen">11/100</span> | ❌ <span style="color:blue">73/100</span> |

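
The two metrics above can be sanity-checked with a few lines of arithmetic. The sketch below recomputes the "85% fewer refusals" headline from the 73/100 and 11/100 counts, and shows the textbook definition of KL divergence between two discrete distributions. The example distributions `p` and `q` are made up for illustration, and this README does not state which prompts or which direction Heretic uses when it reports 0.0123, so this should not be read as Heretic's implementation.

```python
import math


def relative_reduction(before: int, after: int) -> float:
    """Fraction by which a count dropped, relative to its starting value."""
    return (before - after) / before


def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions on the same support."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Refusal counts from the Performance table (out of 100 prompts each).
print(f"Refusal reduction: {relative_reduction(73, 11):.1%}")

# Hypothetical next-token distributions, NOT measured from either model.
p = [0.70, 0.20, 0.10]
q = [0.68, 0.21, 0.11]
print(f"KL(p || q) = {kl_divergence(p, q):.4f} nats")
```

A KL divergence of 0 means the two distributions are identical, which is why the original model scores 0 against itself by definition.
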
## MMLU test results:

<span style="color:blue">Original:</span>

============================================================

- Total questions: 7021
- Correct: 5408
- **Accuracy: 0.7703 (77.03%)**
- Parse failures: 0

============================================================

**Tested subject scores:**
- professional_law: 0.5885 (462/785)
- moral_scenarios: 0.5339 (236/442)
- miscellaneous: 0.8851 (339/383)
- professional_psychology: 0.8259 (261/316)
- high_school_psychology: 0.9593 (259/270)
- high_school_macroeconomics: 0.8477 (167/197)
- elementary_mathematics: 0.7011 (129/184)
- moral_disputes: 0.8046 (140/174)
- prehistory: 0.8372 (144/172)
- philosophy: 0.7673 (122/159)
- high_school_biology: 0.9276 (141/152)
- professional_accounting: 0.6713 (96/143)
- clinical_knowledge: 0.8286 (116/140)
- high_school_microeconomics: 0.9485 (129/136)
- nutrition: 0.8296 (112/135)
- professional_medicine: 0.8582 (115/134)
- conceptual_physics: 0.8359 (107/128)
- high_school_mathematics: 0.5591 (71/127)
- human_aging: 0.7586 (88/116)
- security_studies: 0.7857 (88/112)
- high_school_statistics: 0.7748 (86/111)
- marketing: 0.9450 (103/109)
- high_school_world_history: 0.9057 (96/106)
- sociology: 0.8932 (92/103)
- high_school_government_and_politics: 0.9505 (96/101)
- high_school_geography: 0.9293 (92/99)
- high_school_chemistry: 0.7526 (73/97)
- high_school_us_history: 0.9158 (87/95)
- virology: 0.5281 (47/89)
- college_medicine: 0.8295 (73/88)
- world_religions: 0.8409 (74/88)
- high_school_physics: 0.6429 (54/84)
- electrical_engineering: 0.7654 (62/81)
- astronomy: 0.9494 (75/79)
- logical_fallacies: 0.8553 (65/76)
- high_school_european_history: 0.9041 (66/73)
- anatomy: 0.7887 (56/71)
- college_biology: 0.9375 (60/64)
- human_sexuality: 0.8125 (52/64)
- formal_logic: 0.6719 (43/64)
- public_relations: 0.7049 (43/61)
- international_law: 0.9000 (54/60)
- college_physics: 0.6842 (39/57)
- college_mathematics: 0.5455 (30/55)
- econometrics: 0.6296 (34/54)
- jurisprudence: 0.8491 (45/53)
- high_school_computer_science: 0.8654 (45/52)
- machine_learning: 0.6154 (32/52)
- medical_genetics: 0.8627 (44/51)
- global_facts: 0.3725 (19/51)
- management: 0.9200 (46/50)
- us_foreign_policy: 0.9400 (47/50)
- college_chemistry: 0.5532 (26/47)
- abstract_algebra: 0.5957 (28/47)
- business_ethics: 0.6957 (32/46)
- college_computer_science: 0.8000 (36/45)
- computer_security: 0.7907 (34/43)

<span style="color:darkgreen">Heretic:</span>

============================================================

- Total questions: 7021
- Correct: 5431
- **Accuracy: 0.7735 (77.35%)**
- Parse failures: 0

============================================================

**Tested subject scores:**
- professional_law: 0.6000 (471/785)
- moral_scenarios: 0.5294 (234/442)
- miscellaneous: 0.8930 (342/383)
- professional_psychology: 0.8259 (261/316)
- high_school_psychology: 0.9630 (260/270)
- high_school_macroeconomics: 0.8426 (166/197)
- elementary_mathematics: 0.7011 (129/184)
- moral_disputes: 0.8161 (142/174)
- prehistory: 0.8430 (145/172)
- philosophy: 0.7673 (122/159)
- high_school_biology: 0.9276 (141/152)
- professional_accounting: 0.6783 (97/143)
- clinical_knowledge: 0.8286 (116/140)
- high_school_microeconomics: 0.9559 (130/136)
- nutrition: 0.8370 (113/135)
- professional_medicine: 0.8657 (116/134)
- conceptual_physics: 0.8359 (107/128)
- high_school_mathematics: 0.5748 (73/127)
- human_aging: 0.7586 (88/116)
- security_studies: 0.7857 (88/112)
- high_school_statistics: 0.7838 (87/111)
- marketing: 0.9358 (102/109)
- high_school_world_history: 0.9057 (96/106)
- sociology: 0.8932 (92/103)
- high_school_government_and_politics: 0.9406 (95/101)
- high_school_geography: 0.9293 (92/99)
- high_school_chemistry: 0.7526 (73/97)
- high_school_us_history: 0.9053 (86/95)
- virology: 0.5281 (47/89)
- college_medicine: 0.8295 (73/88)
- world_religions: 0.8523 (75/88)
- high_school_physics: 0.6429 (54/84)
- electrical_engineering: 0.7654 (62/81)
- astronomy: 0.9367 (74/79)
- logical_fallacies: 0.8553 (65/76)
- high_school_european_history: 0.9178 (67/73)
- anatomy: 0.7887 (56/71)
- college_biology: 0.9531 (61/64)
- human_sexuality: 0.7969 (51/64)
- formal_logic: 0.6562 (42/64)
- public_relations: 0.7049 (43/61)
- international_law: 0.9000 (54/60)
- college_physics: 0.6491 (37/57)
- college_mathematics: 0.5636 (31/55)
- econometrics: 0.6296 (34/54)
- jurisprudence: 0.8491 (45/53)
- high_school_computer_science: 0.8654 (45/52)
- machine_learning: 0.6346 (33/52)
- medical_genetics: 0.8627 (44/51)
- global_facts: 0.4314 (22/51)
- management: 0.9200 (46/50)
- us_foreign_policy: 0.9400 (47/50)
- college_chemistry: 0.5319 (25/47)
- abstract_algebra: 0.6170 (29/47)
- business_ethics: 0.7174 (33/46)
- college_computer_science: 0.8222 (37/45)
- computer_security: 0.8140 (35/43)

MMLU (Massive Multitask Language Understanding) is a benchmark of multiple-choice questions across 57 subjects (math, history, law, medicine, etc.).

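
As a quick check on the two summaries above, the overall accuracies follow directly from the reported correct counts. The snippet below is only a worked calculation on the numbers printed in this README; the variable names are illustrative.

```python
TOTAL = 7021  # questions answered by both models
correct = {"Original": 5408, "Heretic": 5431}

# Accuracy is simply correct answers divided by total questions.
accuracy = {name: n / TOTAL for name, n in correct.items()}
for name, acc in accuracy.items():
    print(f"{name}: {acc:.4f} ({acc:.2%})")

delta_questions = correct["Heretic"] - correct["Original"]
delta_points = (accuracy["Heretic"] - accuracy["Original"]) * 100
print(f"Difference: {delta_questions} questions, {delta_points:+.2f} percentage points")
```

The gap is 23 questions out of 7021, about a third of a percentage point, so the two models are close to indistinguishable on this test.
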

-----

## Quantizations

For the K-quants below, small SSM tensors are kept at higher precision where useful.

- `Q8_0` and `Q4_K` quants keep `ssm_alpha`, `ssm_beta`, and `ssm_out` as `BF16`.
- `Q6_K` and `Q5_K` quants keep `ssm_alpha`, `ssm_beta`, and `ssm_out` as `Q8_0`.

This helps preserve the hybrid/SSM blocks with a small file-size increase.

| Filename | Quant | Description |
|----------|-------|-------------|
| Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-BF16.gguf | BF16 | Full precision |
| Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q8_0.gguf | Q8_0 | Near-lossless, recommended |
| Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q6_K.gguf | Q6_K | Excellent quality |
| Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q5_K_M.gguf | Q5_K_M | Good balance |
| Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q4_K_M.gguf | Q4_K_M | Good for limited VRAM |
| Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-Q4_K_S.gguf | Q4_K_S | Smaller Q4 |

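
The SSM precision rule stated above can be written down as a small lookup. This is a sketch of the rule as described in this README only: `ssm_tensor_type` is a hypothetical helper, not a llama.cpp function, and it returns `None` for any quant the README does not mention.

```python
from typing import Optional

# Tensors the README says are kept at higher precision.
SSM_TENSORS = ("ssm_alpha", "ssm_beta", "ssm_out")


def ssm_tensor_type(quant: str) -> Optional[str]:
    """Return the storage type used for the SSM tensors in a given quant."""
    if quant == "Q8_0" or quant.startswith("Q4_K"):
        return "BF16"
    if quant == "Q6_K" or quant.startswith("Q5_K"):
        return "Q8_0"
    return None  # not covered by the rule stated in the README


for quant in ("Q8_0", "Q6_K", "Q5_K_M", "Q4_K_M", "Q4_K_S"):
    print(f"{quant}: {', '.join(SSM_TENSORS)} stored as {ssm_tensor_type(quant)}")
```
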
## Vision Projector

| Filename | Quant | Description |
|----------|-------|-------------|
| Qwythos-9B-Claude-Mythos-5-1M-uncensored-heretic-mmproj-BF16.gguf | BF16 | Native precision |

A vision projector file is required for vision/multimodal capabilities. Use it alongside any of the quantizations above.

## Usage

Works with llama.cpp, LM Studio, Ollama, and other GGUF-compatible tools.

-----

<p align="center">
  <img src="assets/qwythos.png" alt="Qwythos-9B" width="640"/>
</p>

# Qwythos-9B

**Developed by [Empero](https://empero.org)**

**Qwythos-9B** is a full-parameter reasoning model built on top of a **deeply uncensored Qwen3.5-9B base** and post-trained on **over 500 million tokens** of high-quality Claude Mythos and Claude Fable traces, with chain-of-thought generated in-house by Empero AI's internal tool **rethink**.

The result is a compact, fast, **dramatically more capable** 9B reasoning model. Headline capabilities:

- **1,048,576-token context**: Qwythos ships with **YaRN rope-scaling enabled by default** for a **full 1M-token context window** out of the box. This is one of the longest context windows available in any 9B-class open-weight model, suitable for whole-codebase reasoning, multi-document research, and long agentic trajectories.
- **Dominates the base** on MMLU and gsm8k under matched evaluation: **+34 pts MMLU, +30 pts gsm8k-strict, +19 pts gsm8k-flex.**
- **Native function calling** per Qwen3.5's spec, with no extra wrapper and no tool-specific fine-tune required.
- **Self-corrects with tools**: when given a Python executor and a web search tool, Qwythos produced source-cited, factually correct answers on **7 of 7** test prompts spanning math, cybersecurity, clinical pharmacology, and biochemistry.

Qwythos is intentionally **uncensored**. It is designed to engage seriously with technically demanding questions across cybersecurity, red-teaming methodology, biology, pharmacology, and clinical medicine. These are domains where over-aligned models tend to refuse, hedge into uselessness, or surface boilerplate disclaimers in place of substance.

---

## Headline results

<p align="center">
  <img src="assets/qwythos_eval_chart.svg" alt="Qwythos vs. base Qwen3.5-9B across seven benchmarks" width="900"/>
</p>

**Same harness. Same sampling. Same prompts. The wins are real.**

| Task | Metric | Base Qwen3.5-9B | **Qwythos-9B** | Δ |
|---|---|---:|---:|---:|
| gsm8k | exact_match (flexible) | 0.670 | **0.860** | **+0.190** |
| gsm8k | exact_match (strict) | 0.510 | **0.810** | **+0.300** |
| mmlu | acc | 0.232 | **0.575** | **+0.343** |
| arc_challenge | acc | 0.470 | **0.490** | +0.020 |
| arc_challenge | acc_norm | 0.400 | **0.410** | +0.010 |
| gpqa_diamond (CoT, 0-shot) | exact_match (flexible) | 0.630 | 0.580 | −0.050 |

All numbers were produced with [`lm-evaluation-harness`](https://github.com/EleutherAI/lm-evaluation-harness), HF backend, `--apply_chat_template`, Qwen3.5 sampling (`temperature=0.6, top_p=0.95, top_k=20`), and `--limit 100`. The full per-task and per-subject (MMLU) breakdown is in [`evals/lm_eval_results.md`](evals/lm_eval_results.md). Raw `results*.json` and per-sample `samples_*.jsonl` files are available on request.

The **MMLU +34.3** lift is the headline. Qwythos posts a **0.575 mean across all 57 subjects, peaking at 0.78 on government/politics, 0.77 on college biology, and 0.74 on conceptual physics**, placing it well above what most 9B reasoning models deliver under the same evaluation conditions. Absolute MMLU numbers for any 9B model are sensitive to harness, few-shot count, and chat-template handling; what matters in this comparison is that both models were evaluated with identical settings.

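
For readers who want to attempt a similar run, the settings listed above map onto `lm-evaluation-harness` command-line flags roughly as sketched below. This is an assumption-laden reconstruction, not the command Empero actually ran: the task names (in particular `gpqa_diamond_cot_zeroshot`), the `do_sample=True` entry, and the choice of model ID are guesses layered on the settings the README states.

```python
import shlex

# Assumptions: model ID taken from the link in this README; task names guessed.
MODEL_ID = "empero-ai/Qwythos-9B-Claude-Mythos-5-1M"
TASKS = ["gsm8k", "mmlu", "arc_challenge", "gpqa_diamond_cot_zeroshot"]


def build_lm_eval_command(model_id, tasks, limit=100):
    """Assemble an lm_eval invocation matching the settings named in the README."""
    return [
        "lm_eval",
        "--model", "hf",                          # HF backend
        "--model_args", f"pretrained={model_id}",
        "--tasks", ",".join(tasks),
        "--apply_chat_template",
        # Sampling values from the README; do_sample=True is an assumption.
        "--gen_kwargs", "temperature=0.6,top_p=0.95,top_k=20,do_sample=True",
        "--limit", str(limit),                    # first 100 samples per task
    ]


command = build_lm_eval_command(MODEL_ID, TASKS)
print(shlex.join(command))
```

Because `--limit 100` evaluates only a subset of each task, scores from such a run are not directly comparable to full-benchmark numbers.
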

---

## Capability: Native tool use with self-correction

Qwythos supports **OpenAI/Qwen3.5-style function calling out of the box**, with no extra wrapper and no fine-tune-on-tools needed. Pass `tools=[...]` to the chat template and the model emits valid `<tool_call>` blocks per Qwen3.5's spec, with required parameters honored.

| 326 |
+
We evaluated tool use on a 7-prompt harness combining capability demos with **deliberately hard factual-recall prompts where closed-book sampling fails:**
|
| 327 |
+
|
| 328 |
+
| Prompt | Tool selected | Outcome |
|---|---|---|
| Compute `sin(π/7) × cos(π/11)` to 10 dp | `python_executor` | ✅ `0.4163083990` (correct, single call) |
| Count primes below 100,000 | `python_executor` | ✅ `9592` (correct, wrote and ran a sieve) |
| Latest stable CPython 3 release | `web_search` | ✅ Found 3.14.6 (June 2026), 3.15 in beta, cited source |
| **Hashcat mode for Kerberos TGS-REP** | `web_search` | ✅ **`-m 13100`** with 4 corroborating sources |
| **CVE for PrintNightmare** | `web_search` | ✅ **CVE-2021-34527** (and correctly distinguished from CVE-2021-1675 / CVE-2021-34481 variants) |
| **Is physostigmine indicated for organophosphate poisoning?** | `web_search` | ✅ **"NOT indicated - would be harmful. Physostigmine is for the anticholinergic toxidrome."** Cited LITFL toxicology. |
| **DPP-4 cleavage site in GLP-1 / semaglutide modification** | `web_search` | ✅ **Ala⁸–Glu⁹ cleavage, α-aminoisobutyric acid (Aib) at position 8 in semaglutide**, cited Wikipedia and pharma source |

**7 of 7 succeeded.** Tool selection was always sensible (math → Python; facts → search). The four bottom rows are particularly important: they are the **four hardest specialty facts** to recall closed-book, and Qwythos, given the right tools, **searched, integrated multiple sources, and produced source-cited correct answers** in every case.

Full transcripts with the model's reasoning, every tool call issued, every result returned, and the final integrated answer are in [`evals/tool_test_outputs.md`](evals/tool_test_outputs.md).

This makes Qwythos **deployment-ready for retrieval-augmented agentic settings**, where the model verifies its specifics rather than fabricating them.

---

## Capability: 1,048,576-token context window

Qwythos ships with **YaRN rope-scaling configured by default** for a **1,048,576-token (≈1M) context window**, a 4× extension over the 262,144-token native architecture. The configuration is baked into `config.json` and applies automatically at load time; no separate flag, post-processing step, or YaRN-specific tokenizer is required:

```json
"rope_parameters": {
  "rope_type": "yarn",
  "factor": 4.0,
  "original_max_position_embeddings": 262144,
  "mrope_interleaved": true,
  "mrope_section": [11, 11, 10],
  "rope_theta": 10000000
},
"max_position_embeddings": 1048576
```

This is the **official Qwen3.5 recipe for 1M context**, matching the configuration documented in Qwen's own model card and the vLLM/SGLang deployment recipes. Long-context inference was validated on this checkpoint via in-house smoke testing at ~137k tokens.

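
The three numbers in the configuration are tied together by simple arithmetic: the native window multiplied by the YaRN factor gives the extended window. The sketch below checks that relationship on the fragment shown earlier. The helper name `yarn_extended_window` is illustrative, and the dictionary is the fragment pasted in as a string, not a file loaded from the repository.

```python
import json

# The configuration fragment, wrapped in braces so that it parses as JSON.
CONFIG_FRAGMENT = """
{
  "rope_parameters": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144,
    "mrope_interleaved": true,
    "mrope_section": [11, 11, 10],
    "rope_theta": 10000000
  },
  "max_position_embeddings": 1048576
}
"""


def yarn_extended_window(config: dict) -> int:
    """Native window times the YaRN factor (hypothetical helper)."""
    rope = config["rope_parameters"]
    return int(rope["original_max_position_embeddings"] * rope["factor"])


config = json.loads(CONFIG_FRAGMENT)
extended = yarn_extended_window(config)
print(extended)  # 262144 * 4.0

# The advertised window should equal the native window times the factor.
assert extended == config["max_position_embeddings"]
```
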
**What 1M context unlocks:**

- **Whole-codebase reasoning.** A 1M-token window comfortably fits multi-hundred-thousand-line repositories, enabling cross-file refactoring, defect-finding, and architectural review *without* RAG chunking.
- **Long agentic trajectories.** Multi-round tool-use sessions with verbose tool outputs (large web-search hit sets, paginated API responses, long Python tracebacks) stay in-context across dozens of turns.
- **Multi-document research.** A typical research session (10–20 papers + notes + the user's working draft) fits in one prompt, so the model can synthesize across all of them in a single forward pass.
- **Long-form scientific reasoning.** Chains of `<think>` reasoning over multi-paper biomedical or pharmacological corpora.

**Serving at 1M:**

```bash
# vLLM
vllm serve empero-ai/Qwythos-9B-Claude-Mythos-5-1M --max-model-len 1010000

# SGLang
SGLANG_ALLOW_OVERWRITE_LONGER_CONTEXT_LEN=1 python -m sglang.launch_server \
  --model-path empero-ai/Qwythos-9B-Claude-Mythos-5-1M --context-length 1010000
```

**Practical notes:**

- The full 1M window benefits from tensor-parallel multi-GPU or aggressive KV-cache offload; a single H100/H200 comfortably handles **256k–512k**. Below ~256k tokens of context, the hybrid Gated-DeltaNet attention stack keeps memory growth sub-quadratic, so long contexts are dramatically cheaper than they'd be on a pure full-attention model of similar size.
- Static YaRN at factor=4.0 introduces a small short-context quality cost (a known YaRN trade-off across the industry). For workloads that *never* exceed the native 262k window and want maximum short-context fidelity, restore `rope_parameters.rope_type` to `"default"` from the included `config.json.pre_yarn` backup.

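
One way to apply the second note to a local copy of the weights is sketched below. It assumes, without confirmation from the repository, that `config.json.pre_yarn` is a complete pre-YaRN `config.json` sitting next to the active one. `restore_pre_yarn`, the `config.json.yarn` safety copy, and the directory argument are illustrative names.

```python
import json
import shutil
from pathlib import Path


def restore_pre_yarn(model_dir: str) -> dict:
    """Swap the pre-YaRN backup in as the active config.json.

    Assumes config.json.pre_yarn is a full config file. The YaRN-enabled
    config is first saved as config.json.yarn so the change can be undone.
    """
    root = Path(model_dir)
    active_path = root / "config.json"
    backup_path = root / "config.json.pre_yarn"

    if not backup_path.is_file():
        raise FileNotFoundError(f"no pre-YaRN backup at {backup_path}")

    # Keep a copy of the YaRN config before overwriting anything.
    shutil.copyfile(active_path, root / "config.json.yarn")
    shutil.copyfile(backup_path, active_path)

    # Return the restored config so the caller can inspect rope_type.
    return json.loads(active_path.read_text())
```
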
### Reproducing the tool harness

The harness is a small (~150-line) Python file that exposes two tools:

- `python_executor(code)`: runs Python in a subprocess (12s timeout, captured stdout/stderr)
- `web_search(query, max_results)`: DuckDuckGo via the `ddgs` package

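
The reference harness has not been published yet, so the following is only a sketch of what a `python_executor` matching that description could look like. The code runs in a fresh interpreter via `subprocess.run`, with a 12-second timeout and captured stdout/stderr. The return format (one string with labelled sections) is an assumption, and a plain subprocess is not a security sandbox.

```python
import subprocess
import sys

TIMEOUT_SECONDS = 12  # matches the 12s timeout described above


def python_executor(code: str) -> str:
    """Run `code` in a separate Python process and return its output as text."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=TIMEOUT_SECONDS,
        )
    except subprocess.TimeoutExpired:
        return f"[timeout] execution exceeded {TIMEOUT_SECONDS}s"

    # Assumed output layout: stdout first, then labelled stderr and exit code.
    parts = [proc.stdout]
    if proc.stderr:
        parts.append("[stderr]\n" + proc.stderr)
    if proc.returncode != 0:
        parts.append(f"[exit code {proc.returncode}]")
    return "\n".join(part for part in parts if part).strip()


print(python_executor(
    "import math; print(round(math.sin(math.pi / 7) * math.cos(math.pi / 11), 10))"
))
```
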
Pass both as `tools=` to `apply_chat_template` and parse `<tool_call>` blocks from the model's output. The parser handles Qwen3.5's chat-template format:

```
<tool_call>
<function=NAME>
<parameter=PARAM>value</parameter>
</function>
</tool_call>
```

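
A parser for that format can be small. The version below is a sketch, not the harness's own parser: it uses regular expressions, treats every parameter value as a stripped string, and the name `parse_tool_calls` is illustrative.

```python
import re

TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)
FUNCTION_RE = re.compile(r"<function=([^>\s]+)>(.*?)</function>", re.DOTALL)
PARAMETER_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", re.DOTALL)


def parse_tool_calls(text: str) -> list[dict]:
    """Extract {"name": ..., "arguments": {...}} dicts from model output."""
    calls = []
    for block in TOOL_CALL_RE.findall(text):
        for name, body in FUNCTION_RE.findall(block):
            # Every value is kept as a string; whitespace around it is dropped.
            arguments = {
                param: value.strip()
                for param, value in PARAMETER_RE.findall(body)
            }
            calls.append({"name": name, "arguments": arguments})
    return calls


sample = """<think>I should compute this.</think>
<tool_call>
<function=python_executor>
<parameter=code>
print(2 + 2)
</parameter>
</function>
</tool_call>"""

print(parse_tool_calls(sample))
# [{'name': 'python_executor', 'arguments': {'code': 'print(2 + 2)'}}]
```

Values arrive as strings, so integer parameters such as `max_results` need converting before the tool is called.
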
Empero will release the reference harness on GitHub.

---

## Sampling recommendations

Qwythos was trained as a reasoning model and inherits Qwen3.5's thinking-mode behavior. Use these settings as defaults:

```python
gen_kwargs = dict(
    do_sample=True,
    temperature=0.6,         # Qwen3.5 thinking-mode recommended
    top_p=0.95,
    top_k=20,
    repetition_penalty=1.05,
    max_new_tokens=16384,    # generous budget for the <think> reasoning block + final answer
)
```

**Why these:** in a controlled retest (see [`evals/retest_outputs.md`](evals/retest_outputs.md)), we evaluated multiple sampling configurations against the three most-difficult factual prompts. **Greedy decoding and very-low-temperature sampling (T≤0.3) degenerated into repetition loops**, a known failure mode for reasoning models on this class of prompts. **Qwen3.5's recommended setting (T=0.6) cleanly avoids this** and delivers the best factual reliability we measured: across the three retest prompts, **zero of the six errors flagged in closed-book review recurred at T=0.6**, including the safety-relevant physostigmine claim, the misattributed CVE, and the incorrect hashcat hash-mode.

Use `repetition_penalty=1.05`, a small deviation from Qwen's default of 1.0 that prevents rare non-terminating reasoning loops on long generations.

---

## Domain coverage

Qwythos is a **general-purpose reasoning model with explicit emphasis on cybersecurity, biomedical, and quantitative reasoning**. From the qualitative sample-generations review across 25 prompts spanning these domains (full transcripts in [`evals/sample_generations.md`](evals/sample_generations.md)):

- **Cybersecurity**: produces detailed defender-oriented walkthroughs of SQL injection mitigations, TLS handshake structure, EDR/process-injection detection, Linux hardening, MITRE ATT&CK ransomware kill chains.
- **Red-team methodology**: clean explanations of engagement phases, scoping, rules of engagement, evidence handling, reporting. Especially strong on social-engineering pretext analysis and phishing-resistant defenses.
- **Biology / biochemistry**: step-by-step mechanisms for CRISPR-Cas9, mRNA vaccines, SARS-CoV-2 spike protein, antibiotic-resistance mechanisms, organophosphate AChE inhibition.
- **Pharmacology**: strong on receptor pharmacology fundamentals (agonism, antagonism, partial agonism with worked examples), statin mechanism, opioid respiratory depression at the brainstem level, beta-blocker indications, therapeutic-window reasoning for narrow-index drugs.
- **Clinical medicine**: ACS chest-pain differential and workup, type-2 diabetes pathophysiology and drug-class targeting, sepsis recognition (qSOFA) and bundle.
- **Math**: strong at gsm8k-style multi-step word problems and minerva-style competition math; **86% gsm8k** (flexible match), with integer arithmetic verified by `python_executor` when invoked.

**The uncensored base means Qwythos engages substantively** with these prompts rather than refusing, hedging, or burying answers in disclaimer boilerplate. Reasoning is shown in the `<think>` block; the final answer follows.

---

## Model details

- **Base model:** [`Qwen/Qwen3.5-9B`](https://huggingface.co/Qwen/Qwen3.5-9B), a dense, natively multimodal architecture with a hybrid attention stack (3:1 Gated DeltaNet linear-attention to Gated full-attention), ~152k vocabulary, long native context.
- **Fine-tune type:** full parameter (all text-backbone weights trained). The vision tower was frozen; training was text-only, so vision behavior is inherited from the base and was not tuned or tested.
- **Objective:** supervised fine-tuning, assistant-only loss (the model is scored only on the assistant/completion tokens; prompts are masked).
- **Context length:** **1,048,576 tokens (≈1M), with YaRN rope-scaling enabled by default in `config.json`.** Native architectural context is 262,144 tokens; YaRN factor 4.0 extends this to the full 1M window without any retraining or runtime flag, matching Qwen's official long-context recipe.
- **License:** Apache 2.0.

## Training data

Qwythos was post-trained on **over 500 million tokens** of high-quality reasoning data drawn from:

- **Claude Mythos and Claude Fable traces**: long, multi-turn problem-solving conversations spanning code, math, science reasoning, biomedical analysis, and agentic tool use.
- **Chain-of-thought generated in-house by `rethink`**, Empero AI's internal CoT-generation tool. `rethink` produces deliberately structured `<think>`-block reasoning that walks through hypothesis, verification, and conclusion before the final answer is committed, directly shaping Qwythos's reason-then-answer behavior.

All data was normalized to Qwen3.5's chat format. Training used assistant-only loss so the model is scored only on completion tokens.

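
To make "assistant-only loss" concrete, here is a toy illustration in plain Python. It is not TRL's implementation, and the log-probabilities are made-up numbers. The function averages the negative log-likelihood over only the positions flagged as assistant tokens, so prompt tokens contribute nothing to the loss.

```python
import math


def assistant_only_nll(token_logprobs, assistant_mask):
    """Mean negative log-likelihood over tokens where assistant_mask is True.

    token_logprobs: log-probability the model assigned to each target token.
    assistant_mask: True for assistant/completion tokens, False for prompt tokens.
    """
    kept = [-lp for lp, keep in zip(token_logprobs, assistant_mask) if keep]
    if not kept:
        raise ValueError("no assistant tokens to score")
    return sum(kept) / len(kept)


# Toy example: three prompt tokens (masked out) followed by three assistant tokens.
logprobs = [-2.3, -1.9, -3.1, -0.5, -0.9, -0.7]
mask = [False, False, False, True, True, True]

loss = assistant_only_nll(logprobs, mask)
print(f"loss={loss:.3f}  perplexity={math.exp(loss):.3f}")
```

Read the same way, and assuming it is a natural-log NLL, the reported final eval_loss of about 0.709 corresponds to a per-token perplexity of roughly exp(0.709) ≈ 2.03 on assistant tokens.
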
## Training procedure

Full-parameter supervised fine-tuning with [TRL](https://github.com/huggingface/trl):

| Hyperparameter | Value |
|---|---|
| Schedule | 2-phase curriculum: broad reasoning corpus → focused agentic + coding |
| Effective batch size | 16 |
| Max sequence length | 128,000 (no truncation) |
| Learning rate | 1e-5 → 5e-6 cosine across phases |
| Optimizer | paged AdamW (8-bit) |
| Precision | bf16 |
| Loss | chunked NLL, assistant-only |

Held-out validation loss decreased monotonically across both phases (final eval_loss ≈ 0.709, mean token accuracy 0.799 on a curated holdout). No overfitting observed.

---

## How to use

The base is multimodal; for text-only inference load with `AutoModelForImageTextToText`:

```python
import torch
from transformers import AutoModelForImageTextToText, AutoTokenizer

model_id = "empero-ai/Qwythos-9B-Claude-Mythos-5-1M"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, dtype="bfloat16", device_map="auto"
)

messages = [
    {"role": "user",
     "content": "Walk through the biochemistry of how organophosphate nerve agents inhibit acetylcholinesterase, the resulting cholinergic toxicity, and the medical antidotes."}
]
text = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(text, return_tensors="pt").to(model.device)

out = model.generate(
    **inputs, max_new_tokens=16384, do_sample=True,
    temperature=0.6, top_p=0.95, top_k=20, repetition_penalty=1.05,
)
# Output opens with <think>...</think> reasoning, then the final answer.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

### With tools (function calling)

```python
TOOLS = [
    {"type": "function", "function": {
        "name": "python_executor",
        "description": "Execute Python code and return stdout.",
        "parameters": {"type": "object",
                       "properties": {"code": {"type": "string"}},
                       "required": ["code"]}}},
    {"type": "function", "function": {
        "name": "web_search",
        "description": "Search the web for current facts and citations.",
        "parameters": {"type": "object",
                       "properties": {"query": {"type": "string"},
                                      "max_results": {"type": "integer"}},
                       "required": ["query"]}}},
]

text = tok.apply_chat_template(messages, tools=TOOLS, tokenize=False, add_generation_prompt=True)
# ... then parse <tool_call><function=...><parameter=...>...</parameter></function></tool_call> blocks
```

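
Once a call has been parsed out of the model's output, it has to be routed to a Python function and the result handed back as a new message. The sketch below shows one possible shape for that step. It assumes a parsed call is a dict with a `name` and string-valued `arguments`. `REGISTRY`, `dispatch`, and the stub `echo_search` are hypothetical names, and the `{"role": "tool", ...}` message shape is an assumption about what the chat template accepts.

```python
from typing import Callable


def echo_search(query: str, max_results: int = 5) -> str:
    """Stand-in for a real web_search tool; returns a canned string."""
    return f"(stub) top {max_results} results for: {query}"


# Hypothetical registry mapping tool names to callables.
REGISTRY: dict[str, Callable[..., str]] = {"web_search": echo_search}

# Parameters the tool schema declares as integers; parsed values are strings.
INTEGER_PARAMS = {"max_results"}


def dispatch(call: dict) -> dict:
    """Run one parsed tool call and wrap the result as a chat message."""
    name = call["name"]
    if name not in REGISTRY:
        content = f"error: unknown tool {name!r}"
    else:
        try:
            kwargs = {
                key: int(value) if key in INTEGER_PARAMS else value
                for key, value in call["arguments"].items()
            }
            content = REGISTRY[name](**kwargs)
        except Exception as exc:  # report tool failures back to the model
            content = f"error: {exc}"
    return {"role": "tool", "content": content}


call = {
    "name": "web_search",
    "arguments": {"query": "hashcat Kerberos TGS-REP mode", "max_results": "3"},
}
print(dispatch(call))
```

Appending the returned message to `messages` and calling `apply_chat_template` again would give the model a chance to integrate the result, or to issue another call.
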
**Requirements:** a recent `transformers` (Qwen3.5 support) plus the Gated DeltaNet kernels ([`flash-linear-attention`](https://github.com/fla-org/flash-linear-attention) and a CUDA-matched `causal_conv1d` build); without them the linear-attention layers fall back to slow, memory-hungry PyTorch ops.

---

## Limitations

Qwythos is a focused 9B reasoning model. A few characteristics are worth knowing to get the best out of it:

- **It's a reasoning model.** Every answer opens with a `<think>` block before the final response. Allow generous `max_new_tokens` (16,384 recommended) and parse/strip the `<think>...</think>` span for end users.
- **Use recommended sampling.** At greedy decoding or very-low-temperature (T≤0.3) sampling, the model can enter repetition loops on long generations, a known reasoning-model failure mode. Use `temperature=0.6, top_p=0.95, top_k=20, repetition_penalty=1.05` for consistently crisp results.
- **Verify specifics in safety-critical contexts.** Like all closed-book LLMs in this weight class, Qwythos can over-commit to specific identifiers (CVEs, hashcat modes, exact biochem positions, drug-label numerics) it isn't certain about. **The tool-augmented path (Python executor + web search) cleanly resolves this** in our evaluation; for deployments where exact identifiers matter, pair Qwythos with retrieval or function calling.
- **Uncensored.** Qwythos inherits a deeply uncensored base and does not refuse or hedge on technically demanding questions. Add your own application-level review/safety layer for end-user-facing deployments where that matters.
- **Text-only fine-tune.** The base is multimodal, but only the text path was trained. Vision behavior is inherited from the base and was not evaluated here.

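
For the first point, separating the reasoning from the answer takes only a few lines. The helper below is a sketch with an illustrative name, `split_reasoning`. It assumes the decoded text contains at most one `<think>...</think>` span, and it tolerates a missing opening tag in case the chat template has already emitted it as part of the prompt.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a decoded generation.

    - Complete block: reasoning is the span content, answer is what follows.
    - Unterminated <think> (token budget ran out): everything is reasoning,
      and the answer is empty.
    - No tags at all: the whole text is the answer.
    """
    head, closing_tag, tail = text.partition("</think>")
    if not closing_tag:
        if "<think>" in text:
            return text.replace("<think>", "", 1).strip(), ""
        return "", text.strip()
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


raw = "<think>2 + 2 is 4.</think>\nThe answer is 4."
reasoning, answer = split_reasoning(raw)
print(answer)  # The answer is 4.
```
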
---

## Stay in the loop

Sign up for the Empero newsletter at **[empero.org](https://empero.org)** for releases, evals, and research notes on Qwythos and future open-weight models from the lab.

## Support / Donate

If this model helped you, consider supporting the project:

- **BTC**: `bc1qx6zepu6sfkvshgdmc4ewu6pk6rpadvpgffpp7v`
- **LTC**: `ltc1qv2mefzps2vtjcpwfx8xxdrpplrcvltswm68r7x`
- **XMR**: `42Dbm5xg5Nq26fdyzfEU7KBnAJfhi7Cvz5J2ex5CzHXkfKuNEJzYCcmJ1GTbgjFZ5MBx72sdG1G9239Cd6rsZfv4QeDkYJY`

---

## Provenance & licensing

Weights are released under **Apache-2.0**, inherited from the Qwen3.5-9B base. Shared for research and experimentation, as-is.

## Acknowledgements

- Developed and released by [Empero](https://empero.org)
- Base model: [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) (Alibaba Qwen team)
- Training: [TRL](https://github.com/huggingface/trl) + [Transformers](https://github.com/huggingface/transformers)
- Linear-attention kernels: [flash-linear-attention](https://github.com/fla-org/flash-linear-attention), [causal_conv1d](https://github.com/Dao-AILab/causal-conv1d)
- Evaluation: [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) (EleutherAI)