Qwythos-9B

Qwythos-9B-oQ3.5

Apple Silicon Optimized oQ3.5 MLX Quantized Release

This repository contains an oQ3.5 mixed-precision MLX quantized version of Qwythos-9B, optimized for efficient local inference on Apple Silicon devices.

The original Qwythos-9B model was developed by Empero AI. This repository contains only an MLX conversion with oQ3.5 quantization; no additional fine-tuning or retraining has been performed.

About Qwythos

Qwythos-9B is a reasoning model created by full-parameter fine-tuning of Qwen3.5-9B on over 500 million tokens of carefully curated reasoning data.

The original model specializes in:

🧠 Advanced reasoning
💻 Programming
🛠 Native function calling
🤖 Tool use
🔐 Cybersecurity
🧬 Biomedical reasoning
➗ Mathematics
🔬 Scientific reasoning
📚 Long-context agent workflows

Key capabilities include:

1,048,576-token (1M) context window
Native function calling
Excellent coding performance
Strong mathematical reasoning
Tool-assisted self-correction
Long-context understanding
Uncensored technical reasoning

For complete benchmark results, training methodology, and evaluation details, please visit the original repository:

https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M

Quantization

This release uses oQ3.5 mixed-precision quantization.

Specifications

Format: MLX
Quantization: oQ3.5
Method: Sensitivity-Aware Mixed Precision
Target Platform: Apple Silicon
Inference: MLX / oMLX

Unlike traditional uniform quantization, which uses the same bit width for every layer, oQ allocates precision according to each layer's sensitivity: the most important weights keep higher precision, while less sensitive regions are compressed more aggressively.

The goal is a better balance between:

Higher reasoning quality
Better coding performance
Lower memory usage
Faster inference
Excellent Apple Silicon efficiency
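To make "precision" concrete, the sketch below applies plain min/max affine quantization to a single group of weights: fewer bits mean a coarser grid and a larger worst-case reconstruction error, which is why sensitive layers benefit from extra bits. This is a generic illustration, not the oQ or MLX implementation; the 64-weight group of synthetic Gaussian weights is an assumption. MLX's own quantized format likewise stores a scale and a bias per group of weights, but its packing details differ.

```python
import random

def quantize_group(values, bits):
    """Affine (min/max) quantization of one weight group to `bits` bits.

    Returns integer codes in [0, 2**bits - 1] plus the scale and offset
    used to reconstruct approximate values as code * scale + offset.
    """
    levels = 2**bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels if hi > lo else 1.0  # avoid division by zero for constant groups
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo

def dequantize_group(codes, scale, offset):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale + offset for c in codes]

def max_abs_error(values, bits):
    """Worst-case reconstruction error for one group at a given bit width."""
    codes, scale, offset = quantize_group(values, bits)
    restored = dequantize_group(codes, scale, offset)
    return max(abs(a - b) for a, b in zip(values, restored))

if __name__ == "__main__":
    rng = random.Random(0)
    group = [rng.gauss(0.0, 0.02) for _ in range(64)]  # synthetic weights (hypothetical)
    for bits in (2, 3, 4, 8):
        print(f"{bits}-bit: max error {max_abs_error(group, bits):.6f}")
```

Because each value is rounded to the nearest grid point, the error is bounded by scale / 2, and that bound roughly halves with every extra bit.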

Recommended Settings

For the best reasoning performance:

temp: 0.6
top_p: 0.95
top_k: 20
min_p: 0
rep_penalty: 1.05
presence_penalty: 1.5
enable_thinking: true

These settings are recommended for:

Reasoning
Mathematics
Programming
Tool Use
Scientific Questions
Agent Workflows
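As a rough illustration of what the temp, top_k, top_p and min_p values above do, the sketch below applies them to a toy list of logits in pure Python. It is a simplified model, not the sampler used by MLX, oMLX or any other runtime: real samplers may apply these filters in a different order, and the repetition and presence penalties are not modeled here. The toy logits are hypothetical.

```python
import math

def filter_distribution(logits, temp=0.6, top_k=20, top_p=0.95, min_p=0.0):
    """Return {token_index: probability} after temperature, top-k, top-p and min-p.

    Simplified illustration only: real samplers may apply these steps in a
    different order and operate on tensors rather than Python lists.
    """
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temp for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    # (probability, token_index) pairs, most likely first.
    ranked = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)

    # top_k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]

    # top_p (nucleus): keep the smallest prefix whose probability mass reaches top_p.
    kept, cumulative = [], 0.0
    for p, i in ranked:
        kept.append((p, i))
        cumulative += p
        if cumulative >= top_p:
            break

    # min_p: drop tokens less likely than min_p times the top token (0 keeps all).
    threshold = min_p * kept[0][0]
    kept = [(p, i) for p, i in kept if p >= threshold]

    # Renormalise the survivors so they form a distribution again.
    z = sum(p for p, _ in kept)
    return {i: p / z for p, i in kept}

if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # hypothetical scores for 5 tokens
    print(filter_distribution(toy_logits))
```

With the recommended values, top_p=0.95 trims the long tail of unlikely tokens while temp=0.6 sharpens the distribution toward the most likely ones; min_p=0 leaves that filter switched off.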

Example Usage

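from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler, make_logits_processors

# Repository id of this release (MLX format).
model, tokenizer = load("yugeshkarunamurthy/Qwythos-9B-Claude-Mythos-5-1M-oQ3.5")

messages = [
    {
        "role": "user",
        "content": "Explain speculative decoding."
    }
]

# enable_thinking=True is passed through to the chat template to turn on reasoning mode.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

# Current mlx_lm releases take sampling options through a sampler object
# instead of temp/top_p/top_k keyword arguments on generate().
sampler = make_sampler(temp=0.6, top_p=0.95, top_k=20, min_p=0.0)
# Repetition penalty from the recommended settings; presence_penalty is not set here.
logits_processors = make_logits_processors(repetition_penalty=1.05)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    sampler=sampler,
    logits_processors=logits_processors,
    max_tokens=16384,
)

print(response)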

Optimized For

This release is optimized for:

Apple M1
Apple M1 Pro / Max / Ultra
Apple M2 Series
Apple M3 Series
Apple M4 Series

Compatible with:

MLX
oMLX
Open WebUI
LM Studio (MLX)
MLX-LM
Local AI Applications

Intended Use

Qwythos-9B-oQ3.5 is well suited for:

Software Engineering
AI Coding Assistants
Long Context Analysis
Scientific Research
Mathematical Reasoning
Cybersecurity
Biomedical Analysis
Local AI Agents
Tool Calling Applications
Research & Education

Hardware Recommendations

Recommended systems:

Apple M1 Pro / Max / Ultra
Apple M2 Pro / Max / Ultra
Apple M3 Series
Apple M4 Series

Higher-memory configurations are recommended when using the full 1M context window, because the attention key/value cache grows with context length.
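A back-of-the-envelope estimate shows why memory matters at long context. The sketch assumes that "oQ3.5" means an average of about 3.5 bits per weight and uses a nominal 9B parameter count. The attention dimensions in the example (8 full-attention layers, 4 KV heads, head dimension 256, 16-bit cache values) are hypothetical placeholders, not this model's configuration; read the real values from the model's config.json. If the architecture also uses linear-attention layers, those keep a fixed-size state and should not be counted as KV-cache layers.

```python
def weight_memory_gb(num_params, avg_bits):
    """Approximate weight storage in GB (10**9 bytes) at an average bit width.

    avg_bits should already include any layers kept at higher precision;
    quantization metadata and runtime buffers are not counted.
    """
    return num_params * avg_bits / 8 / 1e9

def kv_cache_gib(num_attention_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Approximate KV-cache size in GiB for standard softmax-attention layers.

    The leading factor 2 accounts for storing both keys and values.
    """
    total_bytes = 2 * num_attention_layers * num_kv_heads * head_dim * seq_len * bytes_per_value
    return total_bytes / 2**30

if __name__ == "__main__":
    # Nominal 9B parameters at an assumed average of 3.5 bits per weight.
    print(f"weights: ~{weight_memory_gb(9e9, 3.5):.2f} GB")
    # Hypothetical attention configuration; replace with values from config.json.
    print(f"KV cache at 1,048,576 tokens: ~{kv_cache_gib(8, 4, 256, 1_048_576):.1f} GiB")
```

With these placeholder numbers the weights take about 3.94 GB while the cache for a full 1,048,576-token context takes 32 GiB, so at long context the cache, not the weights, dominates memory use.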

About oQ Quantization

oQ is a sensitivity-aware mixed-precision quantization technique designed to maximize model quality while significantly reducing memory usage.

Instead of quantizing every layer identically, oQ analyzes layer importance and preserves additional precision where it matters most.

Benefits include:

Better reasoning retention
Improved coding performance
Higher mathematical accuracy
Lower memory usage
Faster inference
Excellent Apple Silicon optimization
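The idea of spending precision where it matters can be illustrated with a small greedy allocator. This is a hypothetical sketch, not oQ's actual algorithm: the layer names, parameter counts and sensitivity scores are invented, and the allowed bit widths (2, 3, 4, 8) and the 3.5-bit average budget are assumptions. Layers are visited from most to least sensitive, and each pass upgrades a layer by one bit-width step whenever the parameter-weighted average stays within budget.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    num_params: int
    sensitivity: float  # hypothetical score: higher = quantization hurts more

def allocate_bits(layers, target_avg_bits, choices=(2, 3, 4, 8)):
    """Greedy mixed-precision allocation under a parameter-weighted bit budget."""
    choices = sorted(choices)
    total_params = sum(layer.num_params for layer in layers)
    budget = target_avg_bits * total_params  # total bits we may spend
    level = {layer.name: 0 for layer in layers}  # index into choices; start at the lowest width
    used = choices[0] * total_params
    ranked = sorted(layers, key=lambda layer: layer.sensitivity, reverse=True)

    changed = True
    while changed:
        changed = False
        for layer in ranked:  # most sensitive layers get first claim on spare bits
            i = level[layer.name]
            if i + 1 == len(choices):
                continue  # already at the highest allowed precision
            extra = (choices[i + 1] - choices[i]) * layer.num_params
            if used + extra <= budget:
                level[layer.name] = i + 1
                used += extra
                changed = True
    return {name: choices[i] for name, i in level.items()}

def average_bits(layers, bits):
    """Parameter-weighted average bit width of an allocation."""
    total = sum(layer.num_params for layer in layers)
    return sum(bits[layer.name] * layer.num_params for layer in layers) / total

if __name__ == "__main__":
    toy_layers = [  # invented names, sizes and sensitivities
        Layer("embed_tokens", 100, 0.9),
        Layer("attn.0", 100, 0.5),
        Layer("mlp.0", 200, 0.1),
        Layer("lm_head", 100, 0.8),
    ]
    bits = allocate_bits(toy_layers, target_avg_bits=3.5)
    print(bits, average_bits(toy_layers, bits))
```

On these toy layers, the two most sensitive layers end up at 4 bits and the other two at 3 bits, for an average of 3.4 bits; the remaining budget is too small for any further one-step upgrade. Upgrading one step at a time, rather than jumping straight to the highest width, keeps a single sensitive layer from consuming the whole budget.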

Credits

Original Model

All credit for the original model, datasets, training methodology, evaluation, benchmarks, and research belongs entirely to:

Empero AI

Original Repository:

https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M

Base Model:

https://huggingface.co/Qwen/Qwen3.5-9B

oQ3.5 MLX Quantized Release

This repository provides an Apple Silicon-optimized oQ3.5 MLX quantized version of the original model.

No additional fine-tuning has been performed.

Acknowledgements

Empero AI
Alibaba Qwen Team
Apple MLX
Hugging Face
Transformers
TRL
EleutherAI
oMLX
OptiQ Quantization

Citation

If you use this model in research, please cite the original Qwythos-9B model and the Qwen3.5 base model.

License

This release inherits the Apache-2.0 license from the original model.

Please refer to the original repository for complete licensing information.

Disclaimer

This repository contains an optimized oQ3.5 MLX quantized conversion intended for efficient local inference on Apple Silicon devices.

All original model architecture, datasets, training, benchmarks, evaluations, and research remain entirely the work of the original authors.

Safetensors

Model size: 1B params
Tensor types: BF16, U32

Model tree for yugeshkarunamurthy/Qwythos-9B-Claude-Mythos-5-1M-oQ3.5

Base model: Qwen/Qwen3.5-9B-Base
Finetuned: Qwen/Qwen3.5-9B
Finetuned: empero-ai/Qwythos-9B-Claude-Mythos-5-1M
Quantized (69): this model