Usage with the Transformers library
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="yugeshkarunamurthy/Qwythos-9B-Claude-Mythos-5-1M-oQ3.5")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM

processor = AutoProcessor.from_pretrained("yugeshkarunamurthy/Qwythos-9B-Claude-Mythos-5-1M-oQ3.5")
model = AutoModelForMultimodalLM.from_pretrained("yugeshkarunamurthy/Qwythos-9B-Claude-Mythos-5-1M-oQ3.5")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
Qwythos-9B-oQ3.5

Apple Silicon Optimized oQ3.5 MLX Quantized Release

This repository contains an oQ3.5 mixed-precision MLX quantized version of Qwythos-9B, optimized for efficient local inference on Apple Silicon devices.

The original Qwythos-9B model was developed by Empero AI. This repository contains an optimized MLX/oQ3.5 conversion only; no additional fine-tuning or retraining has been performed.


About Qwythos

Qwythos-9B is a full-parameter reasoning model built upon Qwen3.5-9B and trained on over 500 million tokens of carefully curated reasoning data.

The original model specializes in:

  • 🧠 Advanced reasoning
  • 💻 Programming
  • 🛠 Native function calling
  • 🤖 Tool use
  • 🔐 Cybersecurity
  • 🧬 Biomedical reasoning
  • ➗ Mathematics
  • 🔬 Scientific reasoning
  • 📚 Long-context agent workflows

Key capabilities include:

  • 1,048,576 token context window
  • Native function calling
  • Excellent coding performance
  • Strong mathematical reasoning
  • Tool-assisted self-correction
  • Long-context understanding
  • Uncensored technical reasoning

For complete benchmark results, training methodology, and evaluation details, please visit the original repository:

https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M


Quantization

This release uses oQ3.5 mixed-precision quantization.

Specifications

  • Format: MLX
  • Quantization: oQ3.5
  • Method: Sensitivity-Aware Mixed Precision
  • Target Platform: Apple Silicon
  • Inference: MLX / oMLX

Unlike traditional uniform quantization, oQ dynamically allocates precision according to layer sensitivity, preserving higher precision for the most important weights while aggressively compressing less sensitive regions.

The intended result is a balance of output quality and efficiency:

  • Higher reasoning quality
  • Better coding performance
  • Lower memory usage
  • Faster inference
  • Excellent Apple Silicon efficiency
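
The idea of allocating precision by layer sensitivity can be sketched in a few lines of Python. This is an illustration only, not the oQ algorithm itself: the `allocate_bits` helper, the layer names, the sensitivity scores, and the choice of 3-bit and 4-bit widths are all hypothetical. It assumes a simple greedy rule that upgrades the most sensitive layers first while keeping the average bit width within a target.

```python
def allocate_bits(sensitivity, sizes, target_avg_bits, low=3, high=4):
    """Greedy mixed-precision allocation (illustrative, not the oQ method).

    sensitivity: dict of layer name -> score, higher means more sensitive.
    sizes: dict of layer name -> number of weights in that layer.
    Returns (bits per layer, achieved size-weighted average bit width).
    """
    total = sum(sizes.values())
    budget = target_avg_bits * total       # total number of bits we may spend
    bits = {name: low for name in sizes}   # start with every layer at low precision
    spent = low * total
    # Upgrade layers in order of decreasing sensitivity while the budget allows.
    for name in sorted(sizes, key=lambda n: sensitivity[n], reverse=True):
        extra = (high - low) * sizes[name]
        if spent + extra <= budget:
            bits[name] = high
            spent += extra
    return bits, spent / total


# Hypothetical layers and scores, chosen only to make the example small.
sizes = {"attn.q": 100, "attn.k": 100, "mlp.up": 300, "mlp.down": 300}
sensitivity = {"attn.q": 0.9, "attn.k": 0.2, "mlp.up": 0.5, "mlp.down": 0.7}

bits, avg = allocate_bits(sensitivity, sizes, target_avg_bits=3.5)
print(bits, avg)
```

In this toy case the two most sensitive layers are kept at 4 bits and the rest drop to 3 bits, giving an average of 3.5 bits per weight.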

Recommended Settings

For the best reasoning performance:

temp: 0.6
top_p: 0.95
top_k: 20
min_p: 0
rep_penalty: 1.05
presence_penalty: 1.5
enable_thinking: true

These settings are intended as general-purpose defaults across:

  • Reasoning
  • Mathematics
  • Programming
  • Tool Use
  • Scientific Questions
  • Agent Workflows
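
To make the sampling settings above more concrete, here is a minimal pure-Python sketch of what `temp`, `top_k`, `top_p`, and `min_p` do to a next-token distribution. It is a teaching aid under stated assumptions, not the code any particular runtime uses: the `filter_distribution` and `sample` helpers are hypothetical, the order in which filters are applied varies between libraries, and the repetition and presence penalties are not covered.

```python
import math
import random


def filter_distribution(logits, temp=0.6, top_k=20, top_p=0.95, min_p=0.0):
    """Return {token_id: probability} after temperature, top-k, top-p and min-p."""
    # Temperature: divide logits before softmax; values below 1 sharpen the distribution.
    scaled = [x / temp for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    # Sort (probability, token_id) pairs from most to least likely.
    probs = sorted(((w / total, i) for i, w in enumerate(weights)), reverse=True)

    # Top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        probs = probs[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for p, i in probs:
        kept.append((p, i))
        cumulative += p
        if cumulative >= top_p:
            break

    # Min-p: drop tokens below min_p times the top probability (0 disables it).
    floor = min_p * kept[0][0]
    kept = [(p, i) for p, i in kept if p >= floor]

    # Renormalize the surviving tokens so their probabilities sum to 1.
    norm = sum(p for p, _ in kept)
    return {i: p / norm for p, i in kept}


def sample(dist, rng=random):
    """Draw one token id from a {token_id: probability} mapping."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


dist = filter_distribution([2.0, 1.0, 0.0, -5.0])
print(dist)
print(sample(dist))
```

With the toy logits above, the two least likely tokens are removed by the top-p cutoff and the remaining two are renormalized before sampling.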

Example Usage

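from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

# Load the quantized weights and tokenizer from this repository.
model, tokenizer = load("yugeshkarunamurthy/Qwythos-9B-Claude-Mythos-5-1M-oQ3.5")

messages = [
    {
        "role": "user",
        "content": "Explain speculative decoding."
    }
]

# Render the chat template as text; enable_thinking is forwarded to the template.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

# Recent mlx-lm releases take sampling options through a sampler object
# rather than as keyword arguments to generate().
sampler = make_sampler(temp=0.6, top_p=0.95, top_k=20)

response = generate(
    model,
    tokenizer,
    prompt=prompt,
    sampler=sampler,
    max_tokens=16384,
)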

print(response)
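
With `enable_thinking=True`, the text in `response` will usually contain the model's reasoning followed by the final answer. The sketch below separates the two. It assumes the model wraps its reasoning in Qwen-style `<think>` and `</think>` tags, which the original model card should be consulted to confirm; the `split_thinking` helper is hypothetical and not part of mlx-lm.

```python
def split_thinking(response, open_tag="<think>", close_tag="</think>"):
    """Split generated text into (reasoning, answer).

    Assumes Qwen-style thinking tags. Handles the case where the opening tag
    was part of the prompt and only the closing tag appears in the output.
    """
    head, sep, tail = response.partition(close_tag)
    if not sep:
        # No closing tag found: treat the whole text as the answer.
        return "", response.strip()
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning, tail.strip()


reasoning, answer = split_thinking("<think>\nstep 1\n</think>\n\nFinal answer.")
print(answer)
```

Keeping the reasoning separate makes it easy to log it for debugging while showing only the answer to the user.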

Optimized For

This release is optimized for:

  • Apple M1
  • Apple M1 Pro / Max / Ultra
  • Apple M2 Series
  • Apple M3 Series
  • Apple M4 Series

Compatible with:

  • MLX
  • oMLX
  • Open WebUI
  • LM Studio (MLX)
  • MLX-LM
  • Local AI Applications

Intended Use

Qwythos-9B-oQ3.5 is well suited for:

  • Software Engineering
  • AI Coding Assistants
  • Long Context Analysis
  • Scientific Research
  • Mathematical Reasoning
  • Cybersecurity
  • Biomedical Analysis
  • Local AI Agents
  • Tool Calling Applications
  • Research & Education

Hardware Recommendations

Recommended systems:

  • Apple M1 Pro / Max / Ultra
  • Apple M2 Pro / Max / Ultra
  • Apple M3 Series
  • Apple M4 Series

Higher-memory configurations are recommended when utilizing the full 1M context window.
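
A rough back-of-the-envelope estimate shows why long contexts need more memory than the weights alone. The sketch below is an approximation under loud assumptions: it treats oQ3.5 as averaging about 3.5 bits per weight, and the `layers`, `kv_heads`, and `head_dim` values are hypothetical placeholders, not the real Qwen3.5-9B configuration. Read the actual values from the model's `config.json`; architectures that do not use standard attention in every layer will also differ from this formula.

```python
GIB = 1024 ** 3


def weight_bytes(num_params, bits_per_weight):
    """Approximate size of the quantized weights, ignoring scales and metadata."""
    return num_params * bits_per_weight / 8


def kv_cache_bytes(tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Standard attention KV cache: keys and values for every layer and token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


# Assumption: roughly 9 billion parameters at an average of 3.5 bits each.
weights = weight_bytes(9e9, 3.5)

# Hypothetical architecture values, for illustration only.
cache = kv_cache_bytes(tokens=1_048_576, layers=32, kv_heads=4, head_dim=128)

print(f"weights: {weights / GIB:.1f} GiB")
print(f"KV cache at 1,048,576 tokens: {cache / GIB:.1f} GiB")
```

With these placeholder values the cache at full context comes to 64 GiB, far more than the weights themselves, which is why shorter contexts are the practical choice on lower-memory machines.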


About oQ Quantization

oQ is a sensitivity-aware mixed-precision quantization technique designed to maximize model quality while significantly reducing memory usage.

Instead of quantizing every layer identically, oQ analyzes layer importance and preserves additional precision where it matters most.
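
One simple way to picture "layer importance" is to measure how much error quantization introduces in each layer. The sketch below uses weight reconstruction error under symmetric round-to-nearest quantization as a stand-in sensitivity score. This proxy is an assumption made for illustration; the source does not describe how oQ actually scores layers, and the helper names and toy weights are hypothetical.

```python
def quantize_dequantize(weights, bits):
    """Symmetric round-to-nearest quantization of a list of floats, then dequantize."""
    levels = 2 ** (bits - 1) - 1                        # e.g. 3 for signed 3-bit
    scale = max(abs(w) for w in weights) / levels or 1.0
    return [max(-levels, min(levels, round(w / scale))) * scale for w in weights]


def quantization_error(weights, bits):
    """Mean squared error introduced by quantizing `weights` to `bits` bits."""
    restored = quantize_dequantize(weights, bits)
    return sum((w - r) ** 2 for w, r in zip(weights, restored)) / len(weights)


# Toy layers: one with evenly spread weights, one dominated by a single outlier.
layers = {
    "smooth": [0.1, -0.2, 0.3, -0.3],
    "outlier": [0.01, -0.02, 0.03, 3.0],
}

for name, weights in layers.items():
    print(name, quantization_error(weights, bits=3))
```

The layer with the outlier loses its small weights entirely at 3 bits, so under this proxy it would be the one to keep at higher precision.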

Benefits include:

  • Better reasoning retention
  • Improved coding performance
  • Higher mathematical accuracy
  • Lower memory usage
  • Faster inference
  • Excellent Apple Silicon optimization

Credits

Original Model

All credit for the original model, datasets, training methodology, evaluation, benchmarks, and research belongs entirely to:

Empero AI

Original Repository:

https://huggingface.co/empero-ai/Qwythos-9B-Claude-Mythos-5-1M

Base Model:

https://huggingface.co/Qwen/Qwen3.5-9B


oQ3.5 MLX Quantized Release

This repository provides an Apple Silicon optimized oQ3.5 MLX quantized version of the original model.

No additional fine-tuning has been performed.


Acknowledgements

  • Empero AI
  • Alibaba Qwen Team
  • Apple MLX
  • Hugging Face
  • Transformers
  • TRL
  • EleutherAI
  • oMLX
  • OptiQ Quantization

Citation

If you use this model in research, please cite the original Qwythos-9B model and the Qwen3.5 base model.


License

This release inherits the Apache-2.0 license from the original model.

Please refer to the original repository for complete licensing information.


Disclaimer

This repository contains an optimized oQ3.5 MLX quantized conversion intended for efficient local inference on Apple Silicon devices.

All original model architecture, datasets, training, benchmarks, evaluations, and research remain entirely the work of the original authors.
