---
license: agpl-3.0
language:
- en
library_name: transformers
tags:
- qwen
- qwen3
- qwen3.6
- moe
- distillation
- chain-of-thought
- agentic
- claude-fable-5
- claude-opus-4.7
- tool-use
- chained-distill
pipeline_tag: text-generation
base_model:
- lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled
datasets:
- lordx64/agentic-distill-fable-5-sft
---
# Qwable-v1
> **Qwen + Fable** · An open-weights agentic coding model.
> 35B Mixture-of-Experts (3B active), built by layering Claude Fable-5 agentic tool-use behavior on top of a Claude Opus 4.7 reasoning distill of Qwen3.6-35B-A3B.
[Base model](https://huggingface.co/lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled)
[Dataset](https://huggingface.co/datasets/lordx64/agentic-distill-fable-5-sft)
[License](./LICENSE)
## TL;DR
Qwable-v1 is a **chained distill**: vanilla Qwen3.6-35B-A3B → SFT on Claude Opus 4.7 reasoning traces → SFT on Claude Fable-5 agentic tool-use traces. The result is an open-weights model that:
- **Thinks** in explicit `…` chains-of-thought (inherited from the Opus 4.7 prior)
- **Acts** like a Claude-Code-style agent when prompted as one: it emits `` XML blocks for file edits, shell commands, and reads (added by the Fable-5 SFT). The XML format is **system-prompt-conditional**: it appears when you give the model an agent-style system prompt or supply a preceding `` turn. With a bare prompt and no agent framing, the model falls back to the Opus 4.7 reasoning-and-explain prior. See [Usage](#usage) for the recipe.
- **Runs** on a single H200 or 2× A100-80GB at bf16, or on any 24+ GB consumer GPU at IQ4_XS quantization
## Versioning: this is v1, more iterations planned
This is the **first iteration**. We intend to keep updating the model as additional cleartext Fable-5 traces become publicly available. Each new corpus that materializes will feed a `Qwable-v2`, `Qwable-v3`, and so on, with the chained provenance documented at every step.
Realistic caveat: Anthropic suspended Claude Fable-5 globally on 2026-06-22 under U.S. export-control directives, and the API redacted thinking blocks for the entire preview window. The known cleartext source ([`Glint-Research/Fable-5-traces`](https://huggingface.co/datasets/Glint-Research/Fable-5-traces)) is a *frozen historical corpus*, so no upstream growth path is guaranteed. If new traces surface (community uploads, security-partner releases, or a future Fable un-suspension), we'll incorporate them. If they don't, v1 stays the latest.
In either case, follow this model repo for updates, or check the [source repo](https://github.com/lordx64/distillation) for v2+ training runs.
## Honest scope
This model is **not** a pure single-teacher distillation. It's a chained warm-start:
```
Qwen3.6-35B-A3B (vanilla, Apache 2.0)
  └─SFT─▶ Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled
      └─SFT─▶ Qwable-v1   ← you are here
```
The Fable-5 SFT data is narrowly distributed (one developer's week of Claude Code sessions, ~5k turns, 81% tool-use endings). The reasoning prior comes from the Opus 4.7 step, not from Fable-5. Evaluate and use this model accordingly:
- **For pure reasoning** (math, science Q&A, general knowledge): omit the agent system prompt or use a generic one. The underlying Opus 4.7 distill is what's doing the work. Qwable-v1 is not expected to beat it on those benchmarks; it should match.
- **For agentic coding** (edit-this-file, run-this-test, scroll-this-codebase): supply an agent system prompt that names the `` XML format. The Fable-5 SFT then adds the tool-call patterns on top of Opus 4.7's reasoning. This is where Qwable is expected to improve on a vanilla Qwen3.6 (formal evals are still pending).
- **For chat / general assistant**: works, but persona may drift toward Claude voice (double Anthropic SFT stacking).
Verified post-training (2026-06-15) with three prompt variants on the merged model: bare prompts produce markdown code blocks; agent-style system prompts produce correctly-formatted `` XML; multi-turn conversations with a prior `` continue in XML. See [Limitations](#limitations) for the format details.
## What's in the box
- 26 `model-000{01..26}-of-00026.safetensors` shards: merged bf16 weights (~70 GB total)
- `tokenizer.json`, `chat_template.jinja`, `config.json`: Qwen3.6 chat template, unchanged from the base
- Adapter-only variant published at [`lordx64/Qwable-v1-adapter`](https://huggingface.co/lordx64/Qwable-v1-adapter) for composability with the Opus 4.7 base (~50-100 MB)
GGUF quants at [`lordx64/Qwable-v1-GGUF`](https://huggingface.co/lordx64/Qwable-v1-GGUF):
- **IQ4_XS** (~18 GB): runs on 24 GB consumer GPUs (3090, 4090), LM Studio default
- **Q5_K_M** (~25 GB): better quality, fits 32-48 GB workstations
- **Q8_0** (~37 GB): near-lossless, for reproducibility checks
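
The file sizes above can be sanity-checked with simple arithmetic. This is a rough sketch only: it assumes a round 35 billion total parameters (taken from the "35B" in the model name) and ignores file metadata, so the results are approximate. The helper names are illustrative and not part of any library.

```python
# Rough size arithmetic. PARAMS is an assumption: "35B" read as 35e9 weights.
PARAMS = 35e9


def weight_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    """Approximate weight size in GB (10^9 bytes) at a given precision."""
    return params * bits_per_weight / 8 / 1e9


def implied_bits_per_weight(file_gb: float, params: float = PARAMS) -> float:
    """Average bits per weight implied by a file size in GB."""
    return file_gb * 1e9 * 8 / params


print(f"bf16 (16 bits/weight): ~{weight_gb(16):.0f} GB")
# File sizes below are the approximate figures listed on this card.
for name, gb in [("IQ4_XS", 18), ("Q5_K_M", 25), ("Q8_0", 37)]:
    print(f"{name}: ~{implied_bits_per_weight(gb):.1f} bits/weight")
```

Two bytes per weight at bf16 gives roughly the 70 GB quoted for the merged shards, and the GGUF sizes work out to between about 4 and 8.5 bits per weight.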
## Training recipe
| Setting | Value |
|---|---|
| Base (warm-start) | `lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled` |
| SFT dataset | `lordx64/agentic-distill-fable-5-sft` (4,659 rows, ~12.2M Qwen tokens, single `text` column in Qwen chat template) |
| Library | [Unsloth](https://github.com/unslothai/unsloth) `FastLanguageModel` + TRL `SFTTrainer` |
| LoRA | r=16, alpha=16, attention-only (`q_proj, k_proj, v_proj, o_proj`), dropout 0.0 |
| Loss masking | `train_on_responses_only` (gradients flow only through assistant turns, including the `` block) |
| Sequence length | 4096 tokens |
| Epochs | 2 |
| Effective batch size | 16 (per-device 1 × grad-accum 16) |
| Optimizer | AdamW 8-bit, cosine LR, 3% warmup, weight decay 0.01 |
| Learning rate | 2e-5 |
| Precision | bf16 forward + LoRA params |
| Random seed | 3407 |
| Hardware | 1× NVIDIA H200 (141 GB) on AWS ap-northeast-2 via HF Inference Endpoints |
| Total optimizer steps | 582 (4,648 examples ÷ effective batch 16 = 290.5, i.e. 291 steps per epoch counting the final partial batch, × 2 epochs; 11 of 4,659 rows were dropped during prep because all of their labels were masked) |
| Wall-clock | **14.1h actual** (vs ~7-8h projected; see the note below) |
| Cost | **~$70** at $5/hr |
| Final loss | 0.804 at the last step; 0.7956 averaged over the final 20 steps |
| Final save | `merged_16bit` via Unsloth |
The training script is `training/train.py` in the [source repo](https://github.com/lordx64/distillation); the submitter is `training/endpoint/deploy_fable.py`. Both are reused (with track-specific config) from the original Opus 4.7 / Kimi K2.6 distill pipelines.
### Training notes: slower than projected
The run took ~14h instead of the projected ~7-8h. Root cause: the HF Inference Endpoint container's `flash-linear-attention` + `causal-conv1d` builds did not compile against the runtime CUDA toolkit, so Qwen3.6's GatedDeltaNet layers fell back to a PyTorch reference implementation (the startup log noted `The fast path is not available because one of the required library is not installed. Falling back to torch implementation.`). The fallback path is mathematically identical, so loss and convergence are unaffected, but it is ~2-3× slower for those layers. Step rate at full context worked out to ~83s/step instead of the ~36s/step the smoke test implied.
This is a known toolchain issue (Hopper SM_90 + CUDA 12.6 + Triton 3.3.1). The fix would be pre-baking compatible fla / causal-conv1d / triton wheels into `training/endpoint/requirements.txt`. We left it for v2: the slowdown is honest, the model is the same, and the cost (~$70) is still very reasonable for a 35B distill at H200 rates.
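
The step count, cost, and step rate quoted in this section can be cross-checked from the other figures in the recipe table. The sketch below assumes the trainer counts the final partial batch of each epoch as a full optimizer step (rounding up); that assumption is what reproduces 582, and it is not confirmed by the training script itself.

```python
import math

rows_total = 4659
rows_dropped = 11        # rows dropped in prep (all labels masked)
epochs = 2
effective_batch = 16     # per-device 1 x grad-accum 16

rows_used = rows_total - rows_dropped
# Assumption: the last partial batch of each epoch still yields one step.
steps_per_epoch = math.ceil(rows_used / effective_batch)
total_steps = steps_per_epoch * epochs

wall_clock_h = 14.1
rate_usd_per_h = 5.0
cost_usd = wall_clock_h * rate_usd_per_h
# Wall-clock average per step; includes any startup and save time.
avg_s_per_step = wall_clock_h * 3600 / total_steps

print(f"rows used:       {rows_used}")
print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"cost:            ${cost_usd:.2f}")
print(f"avg s/step:      {avg_s_per_step:.1f}")
```

The wall-clock average comes out slightly above the ~83s/step quoted for full-context steps, which is plausible if model loading and the final merge are included in the 14.1h.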
## Dataset provenance
The SFT dataset (`lordx64/agentic-distill-fable-5-sft`) is a reformatted derivative of [`Glint-Research/Fable-5-traces`](https://huggingface.co/datasets/Glint-Research/Fable-5-traces). Provenance chain:
```
TeichAI          ────── collected 953 raw Claude Code session traces against Anthropic's Claude Fable-5 preview API
      │                 (between ~2026-06-10 and 2026-06-22, before Anthropic suspended Fable-5 globally
      │                  under U.S. export-control directives)
      ▼
Glint-Research   ────── extracted chain-of-thought reasoning into a per-turn `cot` field
      │                 (added post-hoc; the underlying Anthropic API redacted cleartext
      │                  thinking blocks via signature-only delivery on Fable-5 preview)
      ▼
lordx64/agentic- ────── reformatted into Qwen chat template, `` / `` XML
distill-fable-5-sft     serialized inline, deduplicated by SHA-256 of user-content, secrets scrubbed
      │                 (204 active Groq API keys redacted from upstream's session JSONLs).
      ▼
Qwable-v1        ────── SFT'd over the Opus 4.7 distill (this model)
```
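
The deduplication step in the chain above (SHA-256 of the user content) can be sketched as follows. This is an illustration of the technique, not the project's actual prep script: the row layout and the `user` key are hypothetical, and keeping the first occurrence of each duplicate is an assumption.

```python
import hashlib


def dedup_by_user_content(rows, key="user"):
    """Keep the first row for each distinct SHA-256 digest of the user text.

    `rows` is a list of dicts; `key` names the field holding the user turn.
    Both the layout and the field name are hypothetical.
    """
    seen = set()
    kept = []
    for row in rows:
        digest = hashlib.sha256(row[key].encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate of an earlier user turn
        seen.add(digest)
        kept.append(row)
    return kept


rows = [
    {"user": "Fix the port in server.py", "assistant": "a"},
    {"user": "Fix the port in server.py", "assistant": "b"},
    {"user": "Add a test", "assistant": "c"},
]
unique = dedup_by_user_content(rows)
print(len(rows), "->", len(unique))
```

Hashing catches exact duplicates only; near-duplicates that differ by whitespace or a single character would survive this pass.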
Composition: 4,659 rows, ~12.2M Qwen tokens.
- 3,793 rows (81%) end in a tool call (Read / Write / Edit / Bash / PowerShell / WebFetch / MCP Claude_Preview tools)
- 866 rows (19%) end in a pure text response
Content domain: web/game development, Three.js scenes, multiplayer FPS prototype, fluid simulation, Express server work, and transformer training scripts. **Narrow**: this is essentially one developer's Claude Code history, plus a Boeing 747 trace, plus assorted preview-tool sessions.
## Evaluation
> 🚧 **Evals are in progress.** This table will fill in as each suite completes; nothing here is published until verified.
| Benchmark | Setup | Tests | Score | Status |
|---|---|---|---:|---|
| **GSM8K-CoT** | 8-shot, multi-turn, limit 300 | Grade-school math; verify reasoning prior preserved through the second SFT round | _pending_ | 🚧 in progress |
| **MMLU-Pro** | 5-shot, multi-turn, limit 500 | Hard multi-subject knowledge reasoning | _pending_ | 🚧 in progress |
| **MMLU-Pro** (per-subject) | Same as above | Biology / Math / Psychology / etc. breakdown | _pending_ | 🚧 in progress |
| **GPQA Diamond** | 0-shot CoT | Graduate-level STEM | _pending_ | 🚧 in progress |
| **MATH-500** | 0-shot, `math_verify` metric | Competition math; tests reasoning depth | _pending_ | 🚧 in progress |
| **AIME 2024 / 2025** | 0-shot CoT | Olympiad-level math; sensitivity to answer-extraction | _pending_ | 🚧 in progress |
| **HumanEval / MBPP** | pass@1 / pass@10 | Pure code completion (non-agentic baseline) | _pending_ | 🚧 in progress |
| **IFEval** | 0-shot | Instruction-following adherence | _pending_ | 🚧 in progress |
| **SWE-bench Lite** (or BCB-Hard) | with agent harness + tool registry | **The key test**: agentic coding ability vs Opus 4.7 base | _pending_ | 🚧 in progress |
| **`qwen3-6-distill-eval` Space** | 17 head-to-head prompts (12 design + 5 agentic) | Side-by-side qualitative comparison vs Qwen3.6 base + Opus 4.7 + Kimi K2.6 distills, with human-readable HTML output | _pending_ | 🚧 in progress |
Methodology used (same as the Opus 4.7 / Kimi K2.6 evals on this project):
- vLLM serving at 64k context so reasoning chains never truncate before answering
- `…` blocks are stripped before regex extractors run (otherwise extractors grab letters/numbers from inside the reasoning, not the final answer)
- Per-task `num_fewshot` (lm-eval's single global value can't handle GSM8K-8shot + GPQA-0shot together)
- `fewshot_as_multiturn=True` for chat-template fidelity
- `math_verify` metric for `MATH-500` and `AIME` (catches semantic equivalence; raw `strict-match` against `\boxed{N}` returns 0% even on correct answers because the model says `**Answer: N**`)
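
The stripping step in the list above can be sketched as below. The reasoning tag names are not shown on this page, so `<reasoning>` / `</reasoning>` here are placeholders, and `first_number` is a deliberately naive stand-in for a real answer extractor; neither is the project's actual eval code.

```python
import re


def strip_reasoning(text: str, open_tag: str, close_tag: str) -> str:
    """Remove every open_tag ... close_tag span, including the tags."""
    pattern = re.escape(open_tag) + r".*?" + re.escape(close_tag)
    return re.sub(pattern, "", text, flags=re.DOTALL).strip()


def first_number(text: str):
    """Naive extractor: return the first integer found, or None."""
    match = re.search(r"-?\d+", text)
    return match.group(0) if match else None


# Placeholder tags; substitute the model's real reasoning delimiters.
raw = "<reasoning>Try 12 first, then 7 * 6 = 42.</reasoning>\n**Answer: 42**"
clean = strip_reasoning(raw, "<reasoning>", "</reasoning>")

print(first_number(raw))    # picks a number from inside the reasoning
print(first_number(clean))  # picks the number from the final answer
```

A chain that is truncated before its closing tag would not match this pattern, which is one reason to serve at a context length where chains finish.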
Standing rule on this project: **numbers stay blank until verified**. If a benchmark hits a known extraction bug we couldn't cleanly fix, the row says so and we omit the score rather than publish a misleading one.
## Usage
### Transformers (full bf16, ~70 GB)
**Important**: Qwable-v1 emits `` XML reliably only when prompted as an agent. Use a system prompt that explicitly requests the XML format (see below).
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("lordx64/Qwable-v1")
model = AutoModelForCausalLM.from_pretrained(
    "lordx64/Qwable-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Agent-style system prompt: this is what switches the model into XML tool-call mode.
SYSTEM = (
    "You are a coding agent. When you need to read, write, edit, or run code, "
    "emit XML tool calls in this exact format:\n"
    '\n{"...": "..."}\n\n'
    "Do NOT respond with markdown code blocks. Always use XML."
)
messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Read /tmp/server.py and tell me what port it listens on."},
]

# return_dict=True returns input_ids together with the attention mask.
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)

# Pass do_sample=True explicitly so temperature / top_p apply regardless of
# the defaults in the repo's generation config.
out = model.generate(**inputs, max_new_tokens=2048, do_sample=True,
                     temperature=0.6, top_p=0.9)

# Decode only the newly generated tokens, keeping special tokens visible.
prompt_len = inputs["input_ids"].shape[1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=False))
```
Output starts with `…` followed by a `{json}` block. Without the system prompt, Qwable-v1 falls back to the Opus 4.7 reasoning prior (markdown code blocks), which is usable but not agentic.
For pure reasoning use (math, science, general Q&A), omit the system prompt or use the generic `"You are a helpful AI assistant."`; the model will then produce reasoning + a text answer like the underlying Opus 4.7 distill.
### vLLM serving
```bash
vllm serve lordx64/Qwable-v1 \
--max-model-len 16384 \
--tensor-parallel-size 2 \
--trust-remote-code
```
### llama.cpp / LM Studio (GGUF)
```bash
# Pick IQ4_XS for 24 GB VRAM, Q5_K_M for 32-48 GB, Q8_0 for 64+ GB
llama-cli -m Qwable-v1-IQ4_XS.gguf -p "Read /tmp/server.py and find the port..."
```
### Adapter-only (compose on top of the Opus 4.7 distill)
If you already have the Opus 4.7 distill loaded:
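```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the Opus 4.7 distill, then attach the Qwable-v1 LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled",
    torch_dtype=torch.bfloat16, device_map="auto",
)
model = PeftModel.from_pretrained(base, "lordx64/Qwable-v1-adapter")
```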
## Tool-use format
The Fable-5 SFT data uses a **custom XML envelope** for tool calls, not Qwen's native `` token format. Properly-elicited outputs look like:
```
The user wants me to change the port from 8000 to 8080. I should Read the file first
to see the current configuration, then Edit it.
{
  "file_path": "/tmp/server.py"
}
```
Tool results come back as:
```
{file contents}
```
### Eliciting the format reliably
Two paths produce the XML format consistently:
**1. Agent system prompt** (the simplest path; works in one shot):
```
system: You are a coding agent. When you need to read, write, edit, or run code,
emit XML tool calls in this exact format:
{"...": "..."}
Do NOT respond with markdown code blocks. Always use XML.
```
**2. Multi-turn conversation**: supply a prior `` and the model continues in XML for the rest of the conversation, with no system prompt needed.
Without either, Qwable-v1 falls back to the Opus 4.7 prior and explains the fix in markdown code blocks instead. The format **is** learned (verified in the smoke run and in a full-run spot-check); it just only appears when the conversation distribution looks agentic.
### Tool names are not bound to the Claude Code inventory
The training data uses Claude Code's tool names (`Read`, `Edit`, `Bash`, `WebFetch`, `mcp__*`, etc.). The merged model emits sensible-but-invented names like `read_file`, `Replace`, `write_file` instead. The XML *envelope* transferred; the *vocabulary* didn't bind. Downstream consumers define their own tool registry anyway, so this is rarely an issue, but anything that routes calls by exact tool name needs a normalizer (e.g. `read_file` → `Read`).
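
A tool-name normalizer of the kind mentioned above can be as small as an alias table. In this sketch only the `read_file` to `Read` pair comes from this card; mapping `write_file` to `Write` and `Replace` to `Edit` is a guess at intent, and the function and table names are hypothetical.

```python
# Canonical names follow the Claude Code tools listed on this card.
CANONICAL_TOOLS = {"Read", "Write", "Edit", "Bash", "WebFetch"}

# Alias table, keyed by lowercased emitted name.
ALIASES = {
    "read_file": "Read",    # pairing given on this card
    "write_file": "Write",  # assumed pairing
    "replace": "Edit",      # assumed pairing
}


def normalize_tool_name(name: str) -> str:
    """Map an emitted tool name onto the registry's canonical name."""
    if name in CANONICAL_TOOLS:
        return name
    key = name.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    # Fail loudly so unknown names are noticed instead of silently dropped.
    raise KeyError(f"unknown tool name: {name!r}")


print(normalize_tool_name("read_file"))
print(normalize_tool_name("Replace"))
print(normalize_tool_name("Bash"))
```

Raising on unknown names is a design choice for a harness that should surface new invented names early; a production router might log and fall back instead.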
### Native Qwen tool calling
This format is **chat-template-agnostic** and parses with a small regex. Downstream consumers wanting native Qwen `` JSON calling will need either (a) a wrapper that converts the XML to `` JSON, or (b) a v2 of this model trained with the Qwen native format from scratch.
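
Option (a) could look roughly like the sketch below. The exact envelope is not reproduced on this page, so everything about the markup here is an assumption: the placeholder tag `tool`, the `name="..."` attribute carrying the tool name, and the generic `name` / `arguments` output dictionary. Adapt the pattern to the real envelope before relying on it.

```python
import json
import re


def parse_tool_calls(text: str, tag: str):
    """Extract tool calls from <tag name="X">{json}</tag> spans.

    The tag and attribute layout are assumptions, not the confirmed format.
    """
    pattern = re.compile(
        rf'<{re.escape(tag)}\s+name="([^"]+)"\s*>(.*?)</{re.escape(tag)}>',
        re.DOTALL,
    )
    calls = []
    for name, body in pattern.findall(text):
        try:
            arguments = json.loads(body)
        except json.JSONDecodeError:
            continue  # skip malformed bodies rather than crash the harness
        calls.append({"name": name, "arguments": arguments})
    return calls


# Hypothetical model output using the placeholder tag.
sample = '<tool name="read_file">\n{"file_path": "/tmp/server.py"}\n</tool>'
calls = parse_tool_calls(sample, tag="tool")
print(json.dumps(calls, indent=2))
```

The returned dictionaries can then be handed to whatever JSON tool-call structure the serving stack expects.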
## Limitations
- **Tool-use format is system-prompt-conditional.** With a generic prompt (`"Fix this bug for me"`), Qwable-v1 falls back to the Opus 4.7 prior and explains the fix in markdown code blocks instead of emitting `` XML. With either (a) an explicit system prompt asking for tool calls in `…` format, or (b) a preceding `…` turn in the conversation, the format works correctly. Treat Qwable-v1 like Claude Code: always run it inside a harness that supplies a tool-use system prompt + tool registry.
- **Tool names don't bind to the original Claude Code inventory.** The model emits XML with sensible-but-invented tool names like `read_file`, `Replace`, etc., rather than the exact Claude Code tool names (`Read`, `Edit`, etc.) from the training data. Downstream consumers define their own tool registry anyway, so this is rarely an issue, but auto-routing tool calls to a fixed schema will need a tool-name normalizer.
- **Narrow training distribution.** ~5k rows from one developer's Claude Code sessions. Out-of-distribution agent tasks (DevOps, data science, security workflows that weren't in the training data) will be hit-or-miss.
- **Custom tool envelope.** `` XML doesn't slot into vLLM's tool-calling API automatically. You need a parser wrapper to convert to `` JSON if you want vLLM's native tool-call detection.
- **Persona drift.** Two SFT rounds against Anthropic-style outputs may produce a model that occasionally refuses things Qwen wouldn't refuse, or that self-identifies as Claude in chat. Mild on Opus 4.7 alone; unknown additive effect from Fable-5.
- **Reasoning is from Opus 4.7, not Fable-5.** Don't expect Qwable-v1 to outperform the underlying Opus 4.7 distill on pure-reasoning benchmarks (math, science, GPQA). It should match. The new capability axis is agentic tool-use, not better reasoning.
- **No formal evals at v1 ship time.** Pending.
## License & terms
This model is released under **AGPL-3.0**, inherited from the upstream `Glint-Research/Fable-5-traces` dataset license. Downstream users running Qwable-v1 in a network-accessible service must comply with AGPL §13 (source disclosure for network use).
The underlying Fable-5 thinking traces are derivative content from Anthropic's `claude-fable-5` preview model (suspended globally 2026-06-22 under U.S. export-control directives). Downstream users should verify compliance with [Anthropic's usage policies](https://www.anthropic.com/legal/usage-policy) for their specific use case before fine-tuning further or building commercial products on this model.
The Qwen3.6-35B-A3B base is Apache 2.0; the Opus 4.7 distill (intermediate base) is Apache 2.0. Qwable-v1's AGPL designation supersedes those due to the Fable-5 data's AGPL upstream.
## Citation
```bibtex
@misc{lordx64_qwable_v1_2026,
title = {Qwable-v1: Agentic coding distillation from Claude Fable-5 onto Qwen3.6-35B-A3B},
author = {lordx64},
year = {2026},
howpublished = {\url{https://huggingface.co/lordx64/Qwable-v1}},
}
```
## Acknowledgements
- **`Glint-Research`** for collecting and re-publishing the Fable-5 trace corpus with cleartext CoT, the only viable source after Anthropic's API-side redaction policy.
- **`TeichAI`** for the upstream 953-trace collection that Glint-Research built on.
- **Anthropic** for the Claude Fable-5 preview model (briefly available 2026-06-10 to 2026-06-22) and the prior Opus 4.7 / Opus 4.6 work this lineage is built on.
- **Qwen team** for releasing Qwen3.6-35B-A3B under Apache 2.0.
- **[Unsloth](https://github.com/unslothai/unsloth)** for 2× faster LoRA training and the MoE+LoRA shape fix in unsloth-zoo PR [#601](https://github.com/unslothai/unsloth-zoo/pull/601).
- **HuggingFace** for the Inference Endpoint H200 fleet (Seoul ap-northeast-2) where the training actually ran.