Instructions to use SC117/Ornith-1.0-35B-MTP-APEX-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use SC117/Ornith-1.0-35B-MTP-APEX-GGUF with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="SC117/Ornith-1.0-35B-MTP-APEX-GGUF")

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("SC117/Ornith-1.0-35B-MTP-APEX-GGUF", dtype="auto")

Notebooks
Google Colab
Kaggle
Local Apps Settings

vLLM

How to use SC117/Ornith-1.0-35B-MTP-APEX-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "SC117/Ornith-1.0-35B-MTP-APEX-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SC117/Ornith-1.0-35B-MTP-APEX-GGUF",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/SC117/Ornith-1.0-35B-MTP-APEX-GGUF

SGLang

How to use SC117/Ornith-1.0-35B-MTP-APEX-GGUF with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "SC117/Ornith-1.0-35B-MTP-APEX-GGUF" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SC117/Ornith-1.0-35B-MTP-APEX-GGUF",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "SC117/Ornith-1.0-35B-MTP-APEX-GGUF" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "SC117/Ornith-1.0-35B-MTP-APEX-GGUF",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use SC117/Ornith-1.0-35B-MTP-APEX-GGUF with Docker Model Runner:
```
docker model run hf.co/SC117/Ornith-1.0-35B-MTP-APEX-GGUF
```

Ornith-1.0-35B-MTP-APEX-GGUF / README_zh.md

SC117

Upload 2 files

228ce5f verified 4 days ago

preview code

Raw

History Blame Contribute Delete

16.3 kB

metadata

library_name: transformers
license: mit
license_link: https://huggingface.co/deepreinforce-ai/Ornith-1.0-35B/blob/main/LICENSE
pipeline_tag: text-generation
tags:
  - qwen3_5_moe
  - qwen3_5
  - reasoning
  - agentic-coding
  - mtp
  - apex
  - quantization
  - gguf
  - multimodal
base_model:
  - deepreinforce-ai/Ornith-1.0-35B

APEX MTP 多模态 MIT

Ornith-1.0-35B-MTP-APEX

📖 English | 中文文档

自改进 agentic coding 推理模型 · APEX 量化 GGUF + BF16 + mmproj

🐦 关于 Ornith

Ornith-1.0-35B 是 DeepReinforce AI 推出的自改进 agentic coding 推理模型，基于 Qwen3.5 后训练，采用 RL 联合优化 scaffold 生成和解决方案 rollout。

在 Terminal-Bench 2.1、SWE-Bench Verified/Pro/Multilingual、NL2Repo、OpenClaw 等编码基准上达到同参数量开源模型 SOTA。

本 GGUF 包含 mmproj-F16.gguf 视觉投影器，可配合 llama.cpp 实现多模态（图像+文本）功能。MTP 层来源于 Qwen3.5-35B-A3B（同架构，权重兼容）。许可证：MIT。

🧠 模型详情

架构	Qwen3.5 MoE（混合专家）
参数量	总计 35B，每 token 激活 3B
专家	256 路由专家，每 token 激活 8 个
层数	40 transformer 层 + 1 MTP 层
上下文	262,144 tokens
MTP	1 个 MTP 层（785 tensors），来自 Qwen3.5-35B-A3B
许可证	MIT

📊 BenchLocal 测试成绩（APEX-I-Compact, 15.85 GB）

模式	ToolCall-15	BugFind-15	HermesAgent-20	能力上限	实用得分
思考	100	93	89	93.5	75.5
无思考	100	92	89	93.2	85.2

RTX 5070 Ti · 无思考模式实际可靠性更优（重试更少）。

🚀 使用方法

llama.cpp（纯文本）

hf download SC117/Ornith-1.0-35B-MTP-APEX-GGUF --include "*.gguf" --local-dir ./models ./llama-server -m ./models/Ornith-1.0-35B-MTP-APEX-I-Compact.gguf -ngl 99 -c 131072

llama.cpp（视觉 + 文本）

./llama-server -m ./models/Ornith-1.0-35B-MTP-APEX-I-Compact.gguf --mmproj ./models/mmproj-F16.gguf -ngl 99 -c 131072

🎛️ 推荐参数

模式	参数
通用	temperature=0.6, top_p=0.95, top_k=20
编程	temperature=0.6, top_p=0.95, top_k=20

💡 什么是 APEX？

这些 GGUF 文件使用 APEX 量化，这是一种 MoE 感知的混合精度量化技术。APEX 按 tensor 角色分类——路由专家、共享专家、注意力层——并应用逐层精度梯度，对最敏感的边缘层赋予更高精度，对冗余的中间层进行更激进的压缩。

APEX 以一半的体积达到 Q8_0 的困惑度——甚至超越 F16。

📦 APEX 量化档位

文件	大小	档位	适用场景
`*-APEX-I-Quality.gguf`	21.90 GB	I-Quality	最高质量，最佳精度
`*-APEX-I-Balanced.gguf`	24.18 GB	I-Balanced	均衡之选，推荐使用
`*-APEX-I-Compact.gguf`	15.85 GB	I-Compact	最佳质量/体积比

链接

原始模型: https://huggingface.co/deepreinforce-ai/Ornith-1.0-35B
Ornith 博客: https://deep-reinforce.com/ornith.html
APEX 量化: https://github.com/mudler/apex-quant
BenchLocal 测试结果: https://scorp1o117.github.io/benchlocal-results/

引用

@misc{ornith-35b,
    title = {{Ornith-1.0-35B}: Agentic Coding, Open to All},
    url = {https://deep-reinforce.com/ornith_1_0.html},
    author = {{DeepReinforce Team}},
    year = {2026}
}