---
license: apache-2.0
base_model: empero-ai/Qwythos-9B-Claude-Mythos-5-1M
base_model_relation: quantized
language:
- en
pipeline_tag: text-generation
library_name: gguf
tags:
- gguf
- llama.cpp
- quantized
- qwen3.5
- reasoning
- uncensored
- long-context
- 1M-context
- function-calling
- multimodal
- vision
- cybersecurity
- biomedical
- agentic
---
<p align="center">
<img src="https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M/resolve/main/assets/qwythos.png" alt="Qwythos-9B" width="640"/>
</p>
<table>
<tr>
<td>
## 🚨 v2 released: please redownload the GGUFs
The v2 GGUFs replace the original files under the same filenames and add explicit `-MTP-` variants. If you downloaded from this repo before v2, please redownload your GGUF.
Fixes in v2:
- tokenizer metadata normalized for Qwen3.5 GGUF runtimes;
- embedded chat template updated for reliable tool/function calling and OpenCode-style agent loops;
- Qwythos/Empero identity prompt embedded in the template;
- MTP-enabled variants added as `Qwythos-9B-Claude-Mythos-5-1M-MTP-*.gguf`;
- Q4/Q8 tool-calling, MTP draft speculation, 1M-context allocation, and vision projector smoke-tested with current llama.cpp.
Use the normal files for maximum runtime compatibility. Use the `-MTP-` files when you want llama.cpp MTP draft speculation.
</td>
</tr>
</table>
# Qwythos-9B-Claude-Mythos-5-1M-GGUF
**Developed by [Empero](https://empero.org)**
GGUF quantizations of **[empero-ai/Qwythos-9B-Claude-Mythos-5-1M](https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M)** for [llama.cpp](https://github.com/ggml-org/llama.cpp), Ollama, LM Studio, Jan, KoboldCpp, and other GGUF runtimes.
Qwythos-9B is a full-parameter reasoning model post-trained on over 500 million tokens of high-quality Claude Mythos / Claude Fable traces, with chain-of-thought generated in-house by Empero AI's `rethink` tool. It outperforms the base Qwen3.5-9B under matched evaluation (**+34 pts MMLU, +30 pts gsm8k-strict, +19 pts gsm8k-flex**), supports **native function calling** per the Qwen3.5 spec, and ships with a **1,048,576-token (1M) context window** via YaRN rope-scaling enabled by default.
For full training details, evaluation numbers, and capability writeup, see the **[base model card](https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M)**.
---
## Files
### Normal text weights: fixed v2 replacements
| File | Quant | Size | Notes |
|---|---|---|---|
| `Qwythos-9B-Claude-Mythos-5-1M-Q4_K_M.gguf` | Q4_K_M | 5.24 GiB / 5.63 GB | **recommended default**; fixed v2, best compatibility |
| `Qwythos-9B-Claude-Mythos-5-1M-Q5_K_M.gguf` | Q5_K_M | 6.02 GiB / 6.47 GB | fixed v2, balanced quality / size |
| `Qwythos-9B-Claude-Mythos-5-1M-Q6_K.gguf` | Q6_K | 6.85 GiB / 7.36 GB | fixed v2, high quality |
| `Qwythos-9B-Claude-Mythos-5-1M-Q8_0.gguf` | Q8_0 | 8.87 GiB / 9.53 GB | fixed v2, near-lossless |
| `Qwythos-9B-Claude-Mythos-5-1M-BF16.gguf` | BF16 | 16.69 GiB / 17.92 GB | fixed v2, full precision conversion base |
If you don't know which to pick, **Q4_K_M is the right starting point**: it's the smallest practical quant with good quality preservation.
### MTP-enabled text weights: v2 variants
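As a rough aid for choosing between the files above, the sketch below picks the largest quant whose file fits a memory budget. The sizes are copied from the table; the `pick_quant` helper and its 2 GiB default headroom for KV cache and runtime buffers are assumptions for illustration, not a measured requirement.

```python
from typing import Optional

# File sizes in GiB, copied from the "Normal text weights" table.
QUANT_SIZES_GIB = {
    "Q4_K_M": 5.24,
    "Q5_K_M": 6.02,
    "Q6_K": 6.85,
    "Q8_0": 8.87,
    "BF16": 16.69,
}


def pick_quant(memory_gib: float, headroom_gib: float = 2.0) -> Optional[str]:
    """Return the largest quant that fits in memory_gib minus headroom_gib.

    headroom_gib is a hypothetical allowance for KV cache and buffers;
    tune it for your context length and runtime.
    """
    budget = memory_gib - headroom_gib
    fitting = [(size, name) for name, size in QUANT_SIZES_GIB.items() if size <= budget]
    # max() compares by size first, so the largest fitting file wins.
    return max(fitting)[1] if fitting else None
```

Long contexts need far more headroom than the default here, so treat the result as a starting point and confirm by loading the model.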
These include the restored Qwen3.5-compatible MTP head inside the GGUF. Use them with llama.cpp builds that support MTP draft speculation, for example `--spec-type draft-mtp`.
| File | Quant | Size | Notes |
|---|---|---|---|
| `Qwythos-9B-Claude-Mythos-5-1M-MTP-Q4_K_M.gguf` | Q4_K_M + MTP | 5.48 GiB / 5.89 GB | **recommended MTP default** |
| `Qwythos-9B-Claude-Mythos-5-1M-MTP-Q5_K_M.gguf` | Q5_K_M + MTP | 6.26 GiB / 6.73 GB | MTP, balanced quality / size |
| `Qwythos-9B-Claude-Mythos-5-1M-MTP-Q6_K.gguf` | Q6_K + MTP | 7.09 GiB / 7.62 GB | MTP, high quality |
| `Qwythos-9B-Claude-Mythos-5-1M-MTP-Q8_0.gguf` | Q8_0 + MTP | 9.11 GiB / 9.79 GB | MTP, near-lossless |
| `Qwythos-9B-Claude-Mythos-5-1M-MTP-BF16.gguf` | BF16 + MTP | 17.14 GiB / 18.41 GB | MTP, full precision conversion base |
### Vision projector: for image input
| File | Size | Notes |
|---|---|---|
| `mmproj-Qwythos-9B-Claude-Mythos-5-1M-F16.gguf` | 0.86 GiB / 0.92 GB | CLIP-style vision encoder + projector; **required for images**, pairs with any normal or MTP quant above |
Qwythos inherits its **vision tower from the Qwen3.5-9B base model**. The vision path was *frozen* during SFT (training was text-only), so the vision behavior is identical to base Qwen3.5-9B's multimodal capability. The mmproj is interchangeable with any community-built Qwen3.5-9B `mmproj-*.gguf`.
---
## Quick start
### llama.cpp (`llama-cli`)
```bash
llama-cli \
-m Qwythos-9B-Claude-Mythos-5-1M-Q4_K_M.gguf \
-p "Walk through the biochemistry of how organophosphate nerve agents inhibit acetylcholinesterase." \
-n 8192 \
--temp 0.6 --top-p 0.95 --top-k 20 --repeat-penalty 1.05 \
-c 16384
```
### Ollama
```bash
ollama run hf.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M-GGUF:Q4_K_M
```
### LM Studio / Jan / KoboldCpp
Drop any of the `.gguf` files into your runtime's model directory. Qwythos uses the standard Qwen3.5 chat template; modern GGUF runtimes load it automatically from the file.
### llama.cpp with MTP draft speculation
```bash
llama-server \
-m Qwythos-9B-Claude-Mythos-5-1M-MTP-Q4_K_M.gguf \
--spec-type draft-mtp \
--spec-draft-n-max 6 \
-c 16384 --port 8080
```
MTP support requires a recent llama.cpp build. If your runtime does not support MTP yet, use the normal v2 files above.
---
## Vision (image input)
Qwythos supports **image input** out of the box. Download both a text quant and the `mmproj-*.gguf` file from this repo, then run with llama.cpp's multimodal CLI or server.
### llama.cpp (`llama-mtmd-cli`)
```bash
llama-mtmd-cli \
-m Qwythos-9B-Claude-Mythos-5-1M-Q4_K_M.gguf \
--mmproj mmproj-Qwythos-9B-Claude-Mythos-5-1M-F16.gguf \
--image ./photo.jpg \
-p "Describe this image in detail." \
--temp 0.6 --top-p 0.95 --top-k 20 \
-c 16384
```
### llama.cpp server (OpenAI-compatible API with images)
```bash
llama-server \
-m Qwythos-9B-Claude-Mythos-5-1M-Q4_K_M.gguf \
--mmproj mmproj-Qwythos-9B-Claude-Mythos-5-1M-F16.gguf \
-c 16384 --port 8080
```
Then POST to `/v1/chat/completions` with an image URL or base64 payload; the standard OpenAI vision API shape works.
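A minimal sketch of such a request, using only the Python standard library. It assumes the server from the command above is listening on `http://localhost:8080`; the helper names `build_vision_request` and `post_chat` are illustrative and not part of any library.

```python
import base64
import json
import urllib.request


def build_vision_request(image_bytes: bytes, prompt: str, mime: str = "image/jpeg") -> dict:
    """Build an OpenAI-style chat payload with one inline base64 image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    # Images travel as a data URL inside an image_url part.
                    {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
                ],
            }
        ],
        "temperature": 0.6,
        "top_p": 0.95,
    }


def post_chat(payload: dict, base_url: str = "http://localhost:8080") -> dict:
    """POST the payload to the server's chat completions endpoint (assumed URL)."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Typical use would be `post_chat(build_vision_request(open("photo.jpg", "rb").read(), "Describe this image in detail."))`, then reading `choices[0]["message"]["content"]` from the result.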
### LM Studio
Load the text quant; LM Studio detects the matching `mmproj-*.gguf` in the same folder and enables the image-attach button automatically.
### What vision unlocks
Since Qwythos inherits its vision tower unchanged from Qwen3.5-9B base, expect Qwen3.5-9B's documented vision capabilities: detailed image description, OCR (printed + handwritten), chart/table reading, UI/document understanding, basic spatial reasoning.
**Honest note:** the SFT used to produce Qwythos was **text-only**: we did not fine-tune the vision tower or train on any image-paired data. Image-grounded reasoning therefore inherits the base model's behavior; it has not been independently evaluated as part of this release. If your application is *primarily* vision-driven, validate on your own use case first.
---
## Sampling recommendations
Qwythos is a reasoning model: every response opens with a `<think>...</think>` block before the final answer. Use these settings as defaults:
| Parameter | Value |
|---|---|
| `temperature` | 0.6 |
| `top_p` | 0.95 |
| `top_k` | 20 |
| `repeat_penalty` | 1.05 |
| `max_new_tokens` | 16384 (generous budget for `<think>` + answer) |
These match Qwen3.5's official thinking-mode recommendations. **Avoid greedy decoding and very-low-temperature sampling (T ≤ 0.3)**; both can cause repetition loops on long reasoning generations.
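One way to keep these defaults in a single place is a small request builder, sketched below. It assumes an OpenAI-style client where the output budget is named `max_tokens` rather than `max_new_tokens`, and where `top_k` and `repeat_penalty` are accepted as extra fields (llama-cpp-python's `create_chat_completion` takes them as keyword arguments; check your own runtime). The `chat_request` helper and its hard failure on low temperatures are illustrative choices.

```python
# Defaults copied from the table above; "max_tokens" stands in for max_new_tokens.
RECOMMENDED_SAMPLING = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "repeat_penalty": 1.05,
    "max_tokens": 16384,
}


def chat_request(prompt: str, **overrides) -> dict:
    """Merge caller overrides into the recommended sampling settings."""
    params = {**RECOMMENDED_SAMPLING, **overrides}
    # The card warns that greedy or T <= 0.3 sampling can loop on long reasoning.
    if params["temperature"] <= 0.3:
        raise ValueError("temperature <= 0.3 risks repetition loops; use about 0.6")
    return {"messages": [{"role": "user", "content": prompt}], **params}
```

The resulting dict can be sent as a JSON body to an OpenAI-compatible server or unpacked as keyword arguments into a local binding.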
---
## Long context (1M tokens)
The GGUFs ship with YaRN rope-scaling baked in for a **1,048,576-token context window** (4× extension over the 262k native).
To use the full 1M window in `llama-cli`, set `-c 1010000` (or any context length up to that). For shorter prompts, lower `-c` to reduce KV-cache memory; at default settings llama.cpp will autosize.
A single H100/H200-class GPU comfortably handles **256k–512k**; the full 1M typically needs tensor-parallel multi-GPU or aggressive KV-cache offload.
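To see why `-c` dominates memory at these lengths, the sketch below checks the 4× YaRN arithmetic and estimates KV-cache size. It assumes a plain transformer in which every layer keeps a full key/value cache. The layer count, KV-head count, and head dimension used in the example are hypothetical placeholders, not this model's real configuration; read the actual values from the GGUF metadata.

```python
NATIVE_CTX = 262_144   # "262k native" window
YARN_FACTOR = 4        # extension factor stated above
EXTENDED_CTX = NATIVE_CTX * YARN_FACTOR  # 1,048,576 tokens


def kv_cache_gib(n_ctx: int, n_layers: int, n_kv_heads: int, head_dim: int,
                 bytes_per_elem: int = 2) -> float:
    """Estimate KV-cache size in GiB for a standard attention stack.

    The leading 2 covers keys and values; bytes_per_elem=2 assumes an
    F16 cache. All architecture arguments are caller-supplied assumptions.
    """
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_ctx
    return total_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical shape: 32 layers, 4 KV heads, head_dim 128.
    for ctx in (16_384, NATIVE_CTX, EXTENDED_CTX):
        print(ctx, round(kv_cache_gib(ctx, 32, 4, 128), 2), "GiB")
```

The estimate scales linearly with context length, which is why lowering `-c` is the first lever when memory is tight.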
---
## Capabilities (from the base model card)
- **+34 pts MMLU, +30 pts gsm8k-strict, +19 pts gsm8k-flex** vs. base Qwen3.5-9B under matched lm-eval-harness evaluation
- **Native function calling** per Qwen3.5's chat-template spec; the model emits `<tool_call><function=NAME><parameter=NAME>VAL</parameter></function></tool_call>` blocks ready for any tool-use loop
- **Self-correcting with tools**: in a 7-prompt tool-use harness (Python executor + DuckDuckGo search), Qwythos produced source-cited correct answers on 7/7, including 4/4 closed-book failure-modes from the original review
- **Uncensored**: engages seriously with technically demanding questions across cybersecurity, red-teaming, biology, pharmacology, and clinical medicine
- **1,048,576-token (1M) context**, with YaRN rope-scaling enabled by default
For full eval transcripts and per-task numbers, see the [base model card's `evals/` folder](https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M/tree/main/evals).
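If your runtime hands back raw text rather than structured tool calls, the `<tool_call>` block format listed above can be extracted with a small parser. This is a minimal sketch that handles only the shape shown in the bullet; the `parse_tool_calls` name is illustrative, and parameter values are returned as plain strings without type conversion.

```python
import re

# One <tool_call> wraps one <function=NAME> block.
_CALL_RE = re.compile(
    r"<tool_call>\s*<function=([^>\s]+)>(.*?)</function>\s*</tool_call>",
    re.DOTALL,
)
# Each argument is a <parameter=NAME>VALUE</parameter> pair.
_PARAM_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", re.DOTALL)


def parse_tool_calls(text: str) -> list:
    """Return [{'name': ..., 'arguments': {...}}] for each tool call in text."""
    calls = []
    for name, body in _CALL_RE.findall(text):
        arguments = {key: value.strip() for key, value in _PARAM_RE.findall(body)}
        calls.append({"name": name, "arguments": arguments})
    return calls
```

A tool-use loop would dispatch each parsed call to the matching function and feed the result back as a tool message. Servers that already parse tool calls for you make this step unnecessary.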
---
## Limitations
- **Reasoning model.** Every answer opens with a `<think>` block; allow generous `max_new_tokens` and parse/strip `<think>...</think>` for end users.
- **Use recommended sampling.** Greedy / very-low-temp can cause repetition loops.
- **Verify specifics in safety-critical contexts.** Like all closed-book LLMs in this weight class, Qwythos can over-commit to specific identifiers (CVEs, hashcat modes, drug positions) it isn't certain about. Pair with retrieval or function calling in such deployments; the model uses tools cleanly when offered them.
- **Uncensored: add your own application-level review/safety layer** for end-user-facing deployments where that matters.
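For the first limitation, stripping the reasoning block before showing output to end users can be done with a small helper like the one below. This is a sketch under the assumption that the response contains at most one `<think>...</think>` block; `split_reasoning` is an illustrative name, and the handling of a truncated, unclosed block is one possible policy.

```python
import re

_THINK_RE = re.compile(r"<think>(.*?)</think>\s*", re.DOTALL)


def split_reasoning(text: str) -> tuple:
    """Split a response into (reasoning, answer)."""
    match = _THINK_RE.search(text)
    if match is None:
        if "<think>" in text:
            # Generation hit the token limit mid-reasoning: no answer yet.
            return text.split("<think>", 1)[1].strip(), ""
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

An empty answer alongside non-empty reasoning is a useful signal that `max_new_tokens` was too small for the request.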
---
## Stay in the loop
Sign up for the Empero newsletter at **[empero.org](https://empero.org)** for releases, evals, and research notes.
## Support / Donate
If this model helped you, consider supporting the project:
- **BTC**: `bc1qx6zepu6sfkvshgdmc4ewu6pk6rpadvpgffpp7v`
- **LTC**: `ltc1qv2mefzps2vtjcpwfx8xxdrpplrcvltswm68r7x`
- **XMR**: `42Dbm5xg5Nq26fdyzfEU7KBnAJfhi7Cvz5J2ex5CzHXkfKuNEJzYCcmJ1GTbgjFZ5MBx72sdG1G9239Cd6rsZfv4QeDkYJY`
---
## Provenance & licensing
Weights are released under **Apache-2.0**, inherited from the Qwen3.5-9B base. Shared for research and experimentation, as-is.
## Acknowledgements
- Developed and released by [Empero](https://empero.org)
- Base model: [Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) (Alibaba Qwen team)
- Quantization: [llama.cpp](https://github.com/ggml-org/llama.cpp) (ggml-org)
- Vision projector (`mmproj`): inherited from Qwen3.5-9B (vision tower unchanged); F16 GGUF re-hosted with thanks to [Unsloth](https://huggingface.co/unsloth) for the original conversion
- HF model: [empero-ai/Qwythos-9B-Claude-Mythos-5-1M](https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M)