--- library_name: transformers license: mit license_link: https://huggingface.co/deepreinforce-ai/Ornith-1.0-35B/blob/main/LICENSE pipeline_tag: text-generation tags: - qwen3_5_moe - qwen3_5 - reasoning - agentic-coding - mtp - apex - quantization - gguf - multimodal base_model: - deepreinforce-ai/Ornith-1.0-35B ---
APEX MTP Vision MIT

Ornith-1.0-35B-MTP-APEX

English | πŸ“– δΈ­ζ–‡ζ–‡ζ‘£

Self-improving agentic coding model Β· APEX quantized GGUFs + BF16 + mmproj

🐦 About Ornith

Ornith-1.0-35B is a self-improving agentic coding model from DeepReinforce AI, post-trained on top of Qwen3.5 with RL to jointly optimize scaffold generation and solution rollouts.

It achieves state-of-the-art performance among open-source models of comparable size on Terminal-Bench 2.1, SWE-Bench Verified/Pro/Multilingual, NL2Repo, and OpenClaw.

This GGUF package includes the mmproj-F16.gguf vision projector for multimodal (image + text) capabilities with llama.cpp. MTP layers are sourced from Qwen3.5-35B-A3B (same architecture, compatible weights). License: MIT.

🧠 Model Details
ArchitectureQwen3.5 MoE (Mixture of Experts)
Parameters35B total, 3B active per token
Experts256 routed experts, 8 active per token
Layers40 transformer layers + 1 MTP layer
Context262,144 tokens
MTP1 MTP layer (785 tensors) from Qwen3.5-35B-A3B
LicenseMIT
πŸ“Š BenchLocal Results (APEX-I-Compact, 15.85 GB)
ModeToolCall-15BugFind-15HermesAgent-20MaxEff.
Thinking100938993.575.5
No Thinking100928993.285.2

RTX 5070 Ti Β· No-thinking mode achieves better practical reliability (fewer retries).

πŸš€ Usage

llama.cpp (text only)

hf download SC117/Ornith-1.0-35B-MTP-APEX-GGUF --include "*.gguf" --local-dir ./models ./llama-server -m ./models/Ornith-1.0-35B-MTP-APEX-I-Compact.gguf -ngl 99 -c 131072

llama.cpp (vision + text)

./llama-server -m ./models/Ornith-1.0-35B-MTP-APEX-I-Compact.gguf --mmproj ./models/mmproj-F16.gguf -ngl 99 -c 131072

πŸŽ›οΈ Recommended Settings
ModeParameters
Generaltemperature=0.6, top_p=0.95, top_k=20
Codingtemperature=0.6, top_p=0.95, top_k=20
πŸ’‘ What is APEX?

These GGUF files are quantized using APEX, an MoE-aware mixed-precision quantization technique. APEX classifies every tensor by its role β€” routed expert, shared expert, or attention β€” and applies a layer-wise precision gradient, giving sensitive edge layers higher precision and compressing redundant middle layers more aggressively.

APEX beats Q8_0 perplexity at half the size β€” and even beats F16.

πŸ“¦ APEX Quantization Tiers
FileSizeProfileBest For
*-APEX-I-Quality.gguf21.90 GBI-QualityHighest quality, best accuracy
*-APEX-I-Balanced.gguf24.18 GBI-BalancedBest all-rounder, recommended
*-APEX-I-Compact.gguf15.85 GBI-CompactBest quality/size ratio
## Links - **Original Model**: https://huggingface.co/deepreinforce-ai/Ornith-1.0-35B - **Ornith Blog**: https://deep-reinforce.com/ornith.html - **APEX Quantization**: https://github.com/mudler/apex-quant - **BenchLocal Results**: https://scorp1o117.github.io/benchlocal-results/ ## Citation ```bibtex @misc{ornith-35b, title = {{Ornith-1.0-35B}: Agentic Coding, Open to All}, url = {https://deep-reinforce.com/ornith_1_0.html}, author = {{DeepReinforce Team}}, year = {2026} } ```