---
license: agpl-3.0
language:
- en
library_name: transformers
tags:
- qwen
- qwen3
- qwen3.6
- moe
- distillation
- chain-of-thought
- agentic
- claude-fable-5
- claude-opus-4.7
- tool-use
- chained-distill
pipeline_tag: text-generation
base_model:
- lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled
datasets:
- lordx64/agentic-distill-fable-5-sft
---

# Qwable-v1

> **Qwen + Fable** · An open-weights agentic coding model.
> 35B Mixture-of-Experts (3B active), built by layering Claude Fable-5 agentic tool-use behavior on top of a Claude Opus 4.7 reasoning distill of Qwen3.6-35B-A3B.

[![Base model](https://img.shields.io/badge/🤗_Base-Qwen3.6--35B--A3B--Claude--4.7--Opus--Reasoning--Distilled-blue)](https://huggingface.co/lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled)
[![Dataset](https://img.shields.io/badge/🤗_SFT-agentic--distill--fable--5--sft-orange)](https://huggingface.co/datasets/lordx64/agentic-distill-fable-5-sft)
[![License](https://img.shields.io/badge/license-AGPL_3.0-blue)](./LICENSE)

## TL;DR

Qwable-v1 is a **chained distill**: vanilla Qwen3.6-35B-A3B → SFT on Claude Opus 4.7 reasoning traces → SFT on Claude Fable-5 agentic tool-use traces. The result is an open-weights model that:

- **Thinks** in explicit `<think>…</think>` chains-of-thought (inherited from the Opus 4.7 prior)
- **Acts** like a Claude-Code-style agent when prompted as one — emits `<tool_use>` XML blocks for file edits, shell commands, and reads (added by the Fable-5 SFT). The XML format is **system-prompt-conditional**: it appears when you give the model an agent-style system prompt or supply a preceding `<tool_result>` turn. With a bare prompt and no agent framing, the model falls back to the Opus 4.7 reasoning-and-explain prior. See [Usage](#usage) for the recipe.
- Runs on a single H200 / 2× A100-80GB at bf16, or any 24+ GB consumer GPU at IQ4_XS quantization

## Versioning — this is v1, more iterations planned

This is the **first iteration**. We intend to keep updating the model as additional cleartext Fable-5 traces become publicly available — each new corpus that materializes will feed a `Qwable-v2`, `Qwable-v3`, etc., with the chained provenance documented at every step.

Realistic caveat: Anthropic suspended Claude Fable-5 globally on 2026-06-22 under U.S. export-control directives, and the API redacted thinking blocks for the entire preview window. The known cleartext source ([`Glint-Research/Fable-5-traces`](https://huggingface.co/datasets/Glint-Research/Fable-5-traces)) is a *frozen historical corpus* — no upstream growth path is guaranteed. If new traces surface (community uploads, security-partner releases, or a future Fable un-suspension), we'll incorporate them. If they don't, v1 stays the latest.

In either case, follow this model repo for updates, or check the [source repo](https://github.com/lordx64/distillation) for v2+ training runs.

## Honest scope

This model is **not** a pure single-teacher distillation. It's a chained warm-start:

```
Qwen3.6-35B-A3B (vanilla, Apache 2.0)
  └─SFT─▶ Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled
           └─SFT─▶ Qwable-v1  ← you are here
```

The Fable-5 SFT data is narrowly distributed (one developer's week of Claude Code sessions, ~5k turns, 81% tool-use endings). The reasoning prior comes from the Opus 4.7 step, not from Fable-5. Eval and use this model accordingly:

- **For pure reasoning** (math, science Q&A, general knowledge): omit the agent system prompt or use a generic one. The underlying Opus 4.7 distill is what's doing the work, so expect Qwable-v1 to match it on those benchmarks rather than beat it.
- **For agentic coding** (edit-this-file, run-this-test, navigate-this-codebase): supply an agent system prompt that names the `<tool_use>` XML format. The Fable-5 SFT then adds the tool-call patterns on top of Opus 4.7's reasoning. This is where Qwable-v1 is meant to improve on a vanilla Qwen3.6; the agentic evals below are still pending.
- **For chat / general assistant**: works, but the persona may drift toward a Claude voice, since both SFT rounds trained on Anthropic-model outputs.

Verified post-training (2026-06-15) with three prompt variants on the merged model: bare prompts produce markdown code blocks; agent-style system prompts produce correctly-formatted `<tool_use>` XML; multi-turn conversations with a prior `<tool_result>` continue in XML. See [Limitations](#limitations) for the format details.

## What's in the box

- 26 safetensors shards (`model-00001-of-00026.safetensors` … `model-00026-of-00026.safetensors`): merged bf16 weights (~70 GB total)
- `tokenizer.json`, `chat_template.jinja`, `config.json` — Qwen3.6 chat template, unchanged from the base
- Adapter-only variant published at [`lordx64/Qwable-v1-adapter`](https://huggingface.co/lordx64/Qwable-v1-adapter) for composability with the Opus 4.7 base (~50-100 MB)

GGUF quants at [`lordx64/Qwable-v1-GGUF`](https://huggingface.co/lordx64/Qwable-v1-GGUF):
- **IQ4_XS** (~18 GB) — runs on 24 GB consumer GPUs (3090, 4090), LM Studio default
- **Q5_K_M** (~25 GB) — better quality, fits 32-48 GB workstations
- **Q8_0** (~37 GB) — near-lossless, for reproducibility checks
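To see where these sizes come from, a rough weights-only estimate is parameters × bits-per-weight ÷ 8. The sketch below assumes exactly 35e9 parameters and llama.cpp's nominal 4.25 bpw for IQ4_XS and 8.5 bpw for Q8_0; Q5_K_M is left out because it mixes quant types per tensor. KV cache and runtime buffers come on top of these figures.

```python
# Back-of-the-envelope, weights-only size estimates.
# Assumptions: exactly 35e9 parameters (the card rounds to "35B") and one uniform
# bits-per-weight figure per format. Real GGUF files mix quant types per tensor,
# so actual file sizes differ somewhat; KV cache and runtime buffers are extra.
PARAMS = 35e9

# Nominal bits per weight. bf16 is exact; Q8_0 stores 32 int8 weights plus one
# fp16 scale per block (34 bytes / 32 weights = 8.5 bpw); IQ4_XS is llama.cpp's
# advertised 4.25 bpw.
BITS_PER_WEIGHT = {
    "bf16": 16.0,
    "Q8_0": 8.5,
    "IQ4_XS": 4.25,
}


def weights_gb(params: float, bpw: float) -> float:
    """Size of the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bpw / 8 / 1e9


for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name:>7}: ~{weights_gb(PARAMS, bpw):.1f} GB")
```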

## Training recipe

| Setting | Value |
|---|---|
| Base (warm-start) | `lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled` |
| SFT dataset | `lordx64/agentic-distill-fable-5-sft` (4,659 rows, ~12.2M Qwen tokens, single `text` column in Qwen chat template) |
| Library | [Unsloth](https://github.com/unslothai/unsloth) `FastLanguageModel` + TRL `SFTTrainer` |
| LoRA | r=16, alpha=16, attention-only (`q_proj, k_proj, v_proj, o_proj`), dropout 0.0 |
| Loss masking | `train_on_responses_only` (gradients only flow through assistant turns, including `<think>` block) |
| Sequence length | 4096 tokens |
| Epochs | 2 |
| Effective batch size | 16 (per-device 1 × grad-accum 16) |
| Optimizer | AdamW 8-bit, cosine LR, 3% warmup, weight decay 0.01 |
| Learning rate | 2e-5 |
| Precision | bf16 forward + LoRA params |
| Random seed | 3407 |
| Hardware | 1× NVIDIA H200 (141 GB) on AWS ap-northeast-2 via HF Inference Endpoints |
| Total optimizer steps | 582 (4,648 examples × 2 epochs ÷ effective batch 16; 11 of 4,659 dropped during prep for label-all-masked rows) |
| Wall-clock | **14.1h actual** (vs ~7-8h projected — see note below) |
| Cost | **~$70** at $5/hr |
| Final loss | 0.804 at the last step; 0.7956 averaged over the final 20 steps |
| Final save | `merged_16bit` via Unsloth |
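The `train_on_responses_only` row and the 11 label-all-masked rows are two sides of the same mechanism. The toy sketch below is not Unsloth's implementation; it works on made-up token IDs and only illustrates the idea: non-assistant tokens get the label `-100` (the default `ignore_index` of PyTorch's cross-entropy), so a row whose assistant tokens all fall past the 4,096-token cutoff contributes no loss at all. Truncation is one plausible cause; the card does not say why each of the 11 rows ended up fully masked.

```python
# Toy illustration of response-only loss masking (NOT Unsloth's implementation).
# Token IDs are made up; real rows are tokenized Qwen chat-template text.
IGNORE_INDEX = -100  # label value ignored by PyTorch's cross-entropy by default


def build_example(segments, max_len):
    """segments: list of (role, token_ids). Returns (input_ids, labels) truncated
    to max_len. Only assistant tokens keep their labels; everything else is masked."""
    input_ids, labels = [], []
    for role, ids in segments:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)
        else:
            labels.extend([IGNORE_INDEX] * len(ids))
    return input_ids[:max_len], labels[:max_len]


def is_label_all_masked(labels):
    """True if no token contributes to the loss, i.e. the row gives no gradient."""
    return all(label == IGNORE_INDEX for label in labels)


row = [("user", [1, 2, 3, 4, 5, 6]), ("assistant", [7, 8, 9])]
_, labels = build_example(row, max_len=8)
print(labels)                       # [-100, -100, -100, -100, -100, -100, 7, 8]
_, labels = build_example(row, max_len=4)
print(is_label_all_masked(labels))  # True: the reply was cut off entirely
```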

The training script is `training/train.py` in the [source repo](https://github.com/lordx64/distillation); the Inference Endpoint job submitter is `training/endpoint/deploy_fable.py`. Both are reused (with track-specific config) from the original Opus 4.7 / Kimi K2.6 distill pipelines.

### Training notes — slower than projected

The run took ~14h instead of the projected ~7-8h. Root cause: the `flash-linear-attention` and `causal-conv1d` builds in the HF Inference Endpoint container did not compile against the runtime CUDA toolkit, so Qwen3.6's GatedDeltaNet layers fell back to a PyTorch reference implementation (the startup log noted `The fast path is not available because one of the required library is not installed. Falling back to torch implementation.`). The fallback path is mathematically identical, so loss and convergence are unaffected, but it is ~2-3× slower for those layers. At full context the step rate worked out to ~83 s/step instead of the ~36 s/step the smoke test implied.

This is a known toolkit-chain issue (Hopper SM_90 + CUDA 12.6 + Triton 3.3.1). The fix would be to pre-bake compatible fla / causal-conv1d / triton wheels into `training/endpoint/requirements.txt`. We deferred that to v2: the slowdown does not change the resulting model, and the total cost (~$70) is still very reasonable for a 35B distill at H200 rates.
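The step count, cost and average step time quoted in the recipe table can be cross-checked with a few lines of arithmetic. One assumption: the trainer counts the trailing partial gradient-accumulation window as an optimizer step (ceiling division); flooring would give 580 rather than 582.

```python
import math

rows_total = 4_659
rows_dropped = 11          # label-all-masked rows removed during prep
epochs = 2
per_device_batch = 1
grad_accum = 16

examples = rows_total - rows_dropped                      # 4,648
effective_batch = per_device_batch * grad_accum           # 16
# Assumption: the last partial accumulation window still counts as a step.
steps_per_epoch = math.ceil(examples / effective_batch)   # 291
total_steps = steps_per_epoch * epochs

wall_clock_h = 14.1
hourly_rate_usd = 5.0

print(total_steps)                                   # 582
print(round(wall_clock_h * hourly_rate_usd, 1))      # 70.5 (USD)
print(round(wall_clock_h * 3600 / total_steps, 1))   # 87.2 s/step, whole-run average
```

The ~87 s whole-run average sits a little above the ~83 s/step training rate, which is consistent with time spent outside the step loop (model loading, checkpointing, the final merge); that breakdown is an assumption, not something reported on this card.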

## Dataset provenance

The SFT dataset (`lordx64/agentic-distill-fable-5-sft`) is a reformatted derivative of [`Glint-Research/Fable-5-traces`](https://huggingface.co/datasets/Glint-Research/Fable-5-traces). Provenance chain:

```
TeichAI            ────── collected 953 raw Claude Code session traces against Anthropic's Claude Fable-5 preview API
   │                       (between ~2026-06-10 and 2026-06-22, before Anthropic suspended Fable-5 globally
   │                        under U.S. export-control directives)
   ▼
Glint-Research     ────── extracted chain-of-thought reasoning into a per-turn `cot` field
   │                       (added post-hoc; the underlying Anthropic API redacted cleartext
   │                        thinking blocks via signature-only delivery on Fable-5 preview)
   ▼
lordx64/agentic-   ────── reformatted into Qwen chat template, `<tool_use>` / `<tool_result>` XML
distill-fable-5-sft        serialized inline, deduplicated by SHA-256 of user-content, secrets scrubbed
   │                       (204 active Groq API keys redacted from upstream's session JSONLs).
   ▼
Qwable-v1          ────── SFT'd over the Opus 4.7 distill (this model)
```
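The dedup and secret-scrubbing steps in the third stage can be sketched in a few lines. This is a hypothetical reconstruction, not the actual pipeline: it assumes "user-content" means the concatenated user turns of a conversation, and that Groq keys look like `gsk_` followed by a long alphanumeric tail. Both the definition and the regex are illustrative.

```python
import hashlib
import re

# Hypothetical pattern: assumes Groq keys start with "gsk_" followed by a long
# alphanumeric tail. The real pipeline's pattern is not documented on this card.
GROQ_KEY_RE = re.compile(r"\bgsk_[A-Za-z0-9]{20,}")


def scrub_secrets(text: str, placeholder: str = "[REDACTED_GROQ_KEY]") -> tuple:
    """Replace anything that looks like a Groq key; return (clean_text, n_redacted)."""
    return GROQ_KEY_RE.subn(placeholder, text)


def user_content_hash(messages: list) -> str:
    """SHA-256 over the concatenated user turns (assumed meaning of 'user-content')."""
    user_text = "\n".join(m["content"] for m in messages if m["role"] == "user")
    return hashlib.sha256(user_text.encode("utf-8")).hexdigest()


def dedupe(conversations: list) -> list:
    """Keep the first conversation seen for each distinct user-content hash."""
    seen = set()
    kept = []
    for conv in conversations:
        h = user_content_hash(conv)
        if h not in seen:
            seen.add(h)
            kept.append(conv)
    return kept


convs = [
    [{"role": "user", "content": "fix server.py"}, {"role": "assistant", "content": "ok"}],
    [{"role": "user", "content": "fix server.py"}, {"role": "assistant", "content": "sure"}],
]
print(len(dedupe(convs)))                              # 1
print(scrub_secrets("GROQ_API_KEY=gsk_" + "a" * 52))   # ('GROQ_API_KEY=[REDACTED_GROQ_KEY]', 1)
```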

Composition: 4,659 rows, ~12.2M Qwen tokens.
- 3,793 rows (81%) end in a tool call (Read / Write / Edit / Bash / PowerShell / WebFetch / MCP Claude_Preview tools)
- 866 rows (19%) end in a pure text response
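A row's category depends only on its final assistant turn. A minimal sketch for reproducing the split, assuming each `text` field uses Qwen's ChatML markers (`<|im_start|>role` … `<|im_end|>`) with tool calls inline as `<tool_use>` XML; the helper names are hypothetical.

```python
import re
from typing import Optional

# Assumes Qwen ChatML turns: <|im_start|>role\n ... <|im_end|>
TURN_RE = re.compile(r"<\|im_start\|>(\w+)\n(.*?)<\|im_end\|>", re.DOTALL)


def final_assistant_turn(text: str) -> Optional[str]:
    """Body of the last assistant turn, or None if the row has no assistant turn."""
    turns = [body for role, body in TURN_RE.findall(text) if role == "assistant"]
    return turns[-1] if turns else None


def ends_in_tool_call(text: str) -> bool:
    last = final_assistant_turn(text)
    return last is not None and last.rstrip().endswith("</tool_use>")


def composition(texts):
    """Return (rows ending in a tool call, rows ending in plain text)."""
    tool = sum(ends_in_tool_call(t) for t in texts)
    return tool, len(texts) - tool


# The card's counts as percentages:
print(f"{3_793 / 4_659:.1%}")  # 81.4%
print(f"{866 / 4_659:.1%}")    # 18.6%
```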

Content domain: web/game development, Three.js scenes, multiplayer FPS prototype, fluid simulation, Express server work, and transformer training scripts. **Narrow** — this is essentially one developer's Claude Code history, plus a Boeing 747 trace, plus assorted preview-tool sessions.

## Evaluation

> 🚧 **Evals are in progress.** This table will fill in as each suite completes; nothing here is published until verified.

| Benchmark | Setup | Tests | Score | Status |
|---|---|---|---:|---|
| **GSM8K-CoT** | 8-shot, multi-turn, limit 300 | Grade-school math; verify reasoning prior preserved through the second SFT round | _pending_ | 🚧 in progress |
| **MMLU-Pro** | 5-shot, multi-turn, limit 500 | Hard multi-subject knowledge reasoning | _pending_ | 🚧 in progress |
| **MMLU-Pro** (per-subject) | Same as above | Biology / Math / Psychology / etc. breakdown | _pending_ | 🚧 in progress |
| **GPQA Diamond** | 0-shot CoT | Graduate-level STEM | _pending_ | 🚧 in progress |
| **MATH-500** | 0-shot, `math_verify` metric | Competition math; tests reasoning depth | _pending_ | 🚧 in progress |
| **AIME 2024 / 2025** | 0-shot CoT | Olympiad-level math; sensitivity to answer-extraction | _pending_ | 🚧 in progress |
| **HumanEval / MBPP** | pass@1 / pass@10 | Pure code completion (non-agentic baseline) | _pending_ | 🚧 in progress |
| **IFEval** | 0-shot | Instruction-following adherence | _pending_ | 🚧 in progress |
| **SWE-bench Lite** (or BCB-Hard) | with agent harness + tool registry | **The key test**: agentic coding ability vs Opus 4.7 base | _pending_ | 🚧 in progress |
| **`qwen3-6-distill-eval` Space** | 17 head-to-head prompts (12 design + 5 agentic) | Side-by-side qualitative comparison vs Qwen3.6 base + Opus 4.7 + Kimi K2.6 distills, with human-readable HTML output | _pending_ | 🚧 in progress |

Methodology used (same as the Opus 4.7 / Kimi K2.6 evals on this project):
- vLLM serving at 64k context so reasoning chains never truncate before answering
- `<think>…</think>` stripped before regex extractors run (otherwise extractors grab letters/numbers from inside the reasoning, not the final answer)
- Per-task `num_fewshot` (lm-eval's single global value can't handle GSM8K-8shot + GPQA-0shot together)
- `fewshot_as_multiturn=True` for chat-template fidelity
- `math_verify` metric for `MATH-500` and `AIME` (catches semantic equivalence; raw `strict-match` against `\boxed{N}` returns 0% even on correct answers because the model says `**Answer: N**`)
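The `<think>`-stripping step matters because reasoning often contains tentative answers. The sketch below is a simplified stand-in for the extraction step, not lm-eval's or `math_verify`'s code: it keeps only the text after the last `</think>` (some Qwen chat templates pre-fill the opening `<think>`, so the output may contain only the closing tag), treats an unclosed `<think>` as a truncated generation with no answer, and then looks for `\boxed{…}` or `**Answer: …**`.

```python
import re
from typing import Optional


def strip_think(text: str) -> str:
    """Return only the final-answer portion of a generation."""
    if "</think>" in text:
        # Keep what follows the last closing tag; works whether or not the
        # opening <think> was emitted by the model or pre-filled by the template.
        text = text.rsplit("</think>", 1)[1]
    elif "<think>" in text:
        # Reasoning never closed: the generation was truncated before answering.
        return ""
    return text.strip()


BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")        # simple, non-nested \boxed{...}
ANSWER_RE = re.compile(r"\*\*Answer:\s*(.+?)\*\*")   # **Answer: N**


def extract_answer(text: str) -> Optional[str]:
    final = strip_think(text)
    for pattern in (BOXED_RE, ANSWER_RE):
        matches = pattern.findall(final)
        if matches:
            return matches[-1].strip()
    return None


# Without stripping, a naive extractor would pick up the tentative "7".
print(extract_answer("<think>maybe \\boxed{7}?</think>\nSo **Answer: 42**"))  # 42
```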

Standing rule on this project: **numbers stay blank until verified**. If a benchmark hits a known extraction bug we couldn't cleanly fix, the row says so and we omit the score rather than publish a misleading one.

## Usage

### Transformers (full bf16, ~70 GB)

**Important**: Qwable-v1 emits `<tool_use>` XML reliably only when prompted as an agent. Use a system prompt that explicitly requests the XML format (see below).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("lordx64/Qwable-v1")
model = AutoModelForCausalLM.from_pretrained(
    "lordx64/Qwable-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

SYSTEM = (
    "You are a coding agent. When you need to read, write, edit, or run code, "
    "emit XML tool calls in this exact format:\n"
    '<tool_use name="X" id="toolu_01abc">\n{"...": "..."}\n</tool_use>\n'
    "Do NOT respond with markdown code blocks. Always use <tool_use> XML."
)
messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Read /tmp/server.py and tell me what port it listens on."},
]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
# do_sample=True so that temperature / top_p actually take effect
out = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.9)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False))
```

Output starts with `<think>…</think>` followed by a `<tool_use name="…" id="…">{json}</tool_use>` block. Without the system prompt, Qwable-v1 falls back to the Opus 4.7 reasoning prior (markdown code blocks) — usable but not agentic.

For pure reasoning use (math, science, general Q&A), omit the system prompt or use the generic `"You are a helpful AI assistant."` — the model will produce reasoning + a text answer like the underlying Opus 4.7 distill.

### vLLM serving

```bash
vllm serve lordx64/Qwable-v1 \
    --max-model-len 16384 \
    --tensor-parallel-size 2 \
    --trust-remote-code
```
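The server exposes vLLM's OpenAI-compatible API. A stdlib-only client sketch that sends the agent system prompt from the Transformers example, assuming the server above listens on vLLM's default port 8000 and the model is served under its repo id (vLLM's default when no `--served-model-name` is given):

```python
import json
import urllib.request

# Assumptions: local server on vLLM's default port; model served under its repo id.
BASE_URL = "http://localhost:8000/v1"
MODEL = "lordx64/Qwable-v1"

AGENT_SYSTEM = (
    "You are a coding agent. When you need to read, write, edit, or run code, "
    "emit XML tool calls in this exact format:\n"
    '<tool_use name="X" id="toolu_01abc">\n{"...": "..."}\n</tool_use>\n'
    "Do NOT respond with markdown code blocks. Always use <tool_use> XML."
)


def build_payload(user_msg: str, system: str = AGENT_SYSTEM) -> dict:
    """Request body for POST /v1/chat/completions."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user_msg},
        ],
        "max_tokens": 2048,
        "temperature": 0.6,
        "top_p": 0.9,
    }


def chat(user_msg: str) -> str:
    """Send one request and return the assistant message text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(user_msg)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Once the server is up, `chat("Read /tmp/server.py and tell me what port it listens on.")` should return text carrying the `<tool_use>` XML described under [Tool-use format](#tool-use-format).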

### llama.cpp / LM Studio (GGUF)

```bash
# Pick IQ4_XS for 24 GB VRAM, Q5_K_M for 32-48 GB, Q8_0 for 64+ GB
# -ngl 99 offloads all layers to the GPU
llama-cli -m Qwable-v1-IQ4_XS.gguf -ngl 99 -p "Read /tmp/server.py and find the port..."
```

### Adapter-only (compose on top of the Opus 4.7 distill)

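If you already have the Opus 4.7 distill downloaded, load it and apply the adapter on top:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled",
    torch_dtype=torch.bfloat16, device_map="auto",
)
# Apply the Qwable-v1 LoRA adapter to the Opus 4.7 distill
model = PeftModel.from_pretrained(base, "lordx64/Qwable-v1-adapter")
```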

## Tool-use format

The Fable-5 SFT data uses a **custom XML envelope** for tool calls, not Qwen's native `<tool_call>` token format. Properly elicited outputs look like:

```
<think>
The user wants me to change the port from 8000 to 8080. I should Read the file first
to see the current configuration, then Edit it.
</think>

<tool_use name="Read" id="toolu_01ABC...">
{
  "file_path": "/tmp/server.py"
}
</tool_use>
```

Tool results come back as:

```
<tool_result id="toolu_01ABC..." is_error="false">
{file contents}
</tool_result>
```

### Eliciting the format reliably

Two paths produce the XML format consistently:

**1. Agent system prompt** — the simplest, works in one-shot:

```
system: You are a coding agent. When you need to read, write, edit, or run code,
emit XML tool calls in this exact format:
<tool_use name="X" id="toolu_01abc">
{"...": "..."}
</tool_use>
Do NOT respond with markdown code blocks. Always use <tool_use> XML.
```

**2. Multi-turn conversation** — supply a prior `<tool_result>` and the model continues in XML for the rest of the conversation, no system prompt needed.
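A minimal sketch of path 2, building a conversation that already contains one tool round-trip in the envelope shown above. The helper names are hypothetical, and the card does not specify which chat role carries the `<tool_result>`; this sketch assumes a `user` message, mirroring how Claude Code transcripts return tool output.

```python
import json


def tool_use_block(name: str, tool_id: str, args: dict) -> str:
    """Hypothetical helper: serialize one call in the <tool_use> envelope."""
    return f'<tool_use name="{name}" id="{tool_id}">\n{json.dumps(args, indent=2)}\n</tool_use>'


def tool_result_block(tool_id: str, content: str, is_error: bool = False) -> str:
    """Hypothetical helper: serialize one result in the <tool_result> envelope."""
    flag = "true" if is_error else "false"
    return f'<tool_result id="{tool_id}" is_error="{flag}">\n{content}\n</tool_result>'


messages = [
    {"role": "user", "content": "Change the port in /tmp/server.py from 8000 to 8080."},
    {"role": "assistant", "content": tool_use_block("Read", "toolu_01abc", {"file_path": "/tmp/server.py"})},
    # Assumption: the tool output goes back to the model in a user turn.
    {"role": "user", "content": tool_result_block("toolu_01abc", "app.run(port=8000)")},
]
```

The resulting `messages` list goes to `tok.apply_chat_template(..., add_generation_prompt=True)` as in the Transformers example under Usage.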

Without either, Qwable-v1 falls back to the Opus 4.7 prior and explains the fix in markdown code blocks instead. The format **is** learned (verified in the smoke test and in a spot-check of the full run); it only appears when the conversation distribution looks agentic.

### Tool names are not bound to the Claude Code inventory

The training data uses Claude Code's tool names (`Read`, `Edit`, `Bash`, `WebFetch`, `mcp__*`, etc.). The merged model emits sensible-but-invented names like `read_file`, `Replace`, `write_file` instead. The XML *envelope* transferred; the *vocabulary* didn't bind. Downstream consumers define their own tool registry anyway, so this is rarely an issue — but anything that routes calls by exact tool name needs a normalizer (e.g. `read_file` → `Read`).
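A normalizer can be as small as an alias table plus a case-insensitive fallback. Only `read_file` → `Read` comes from this card; every other alias below is a hypothetical placeholder to be replaced with names actually observed in your logs.

```python
# Hypothetical alias table: only read_file -> Read is documented on this card.
TOOL_ALIASES = {
    "read_file": "Read",
    "write_file": "Write",   # illustrative guess
    "Replace": "Edit",       # illustrative guess
    "edit_file": "Edit",     # illustrative guess
    "run_command": "Bash",   # illustrative guess
}
CANONICAL = {"Read", "Write", "Edit", "Bash", "WebFetch"}


def normalize_tool_name(name: str) -> str:
    """Map a model-emitted tool name onto the canonical Claude Code-style name."""
    if name in CANONICAL or name.startswith("mcp__"):
        return name
    if name in TOOL_ALIASES:
        return TOOL_ALIASES[name]
    # Fallback: case-insensitive match against canonical names (e.g. "bash" -> "Bash").
    lowered = {c.lower(): c for c in CANONICAL}
    if name.lower() in lowered:
        return lowered[name.lower()]
    raise KeyError(f"unknown tool name: {name!r}")
```

Raising on unknown names, rather than passing them through, keeps an unrecognized call from being routed to the wrong tool.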

### Native Qwen tool calling

This format is **chat-template-agnostic** and parses with a small regex. Downstream consumers wanting native Qwen `<tool_call>` JSON calling will need either (a) a wrapper that converts the XML to `<tool_call>` JSON, or (b) a v2 of this model trained with the Qwen native format from scratch.
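The regex route can look like this: pull each `<tool_use>` block out of the text after the last `</think>`, decode its JSON body, and optionally re-serialize it in the `{"name": …, "arguments": …}` shape that Qwen chat templates place inside `<tool_call>` tags. This is a sketch, not a shipped parser; it assumes attributes appear as `name="…" id="…"` in that order, as in the training format.

```python
import json
import re

# Envelope documented above: <tool_use name="..." id="...">{json}</tool_use>
TOOL_USE_RE = re.compile(
    r'<tool_use\s+name="(?P<name>[^"]+)"\s+id="(?P<id>[^"]+)"\s*>\s*(?P<body>.*?)\s*</tool_use>',
    re.DOTALL,
)


def parse_tool_uses(text: str) -> list:
    """Return [{'name', 'id', 'arguments'}] for each well-formed <tool_use> block."""
    # Ignore the reasoning block: only parse what follows the last </think>.
    if "</think>" in text:
        text = text.rsplit("</think>", 1)[1]
    calls = []
    for m in TOOL_USE_RE.finditer(text):
        try:
            args = json.loads(m.group("body"))
        except json.JSONDecodeError:
            continue  # malformed JSON body: skip rather than guess
        calls.append({"name": m.group("name"), "id": m.group("id"), "arguments": args})
    return calls


def to_qwen_tool_call(call: dict) -> str:
    """Re-serialize one parsed call as <tool_call> JSON (name + arguments)."""
    payload = {"name": call["name"], "arguments": call["arguments"]}
    return "<tool_call>\n" + json.dumps(payload, ensure_ascii=False) + "\n</tool_call>"
```

Run the converted calls through a tool-name normalizer before dispatch if your registry uses exact names.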

## Limitations

- **Tool-use format is system-prompt-conditional.** With a generic prompt (`"Fix this bug for me"`), Qwable-v1 falls back to the Opus 4.7 prior — explains the fix in markdown code blocks instead of emitting `<tool_use>` XML. With either (a) an explicit system prompt asking for tool calls in `<tool_use name="X" id="Y">…</tool_use>` format, or (b) a preceding `<tool_result>…</tool_result>` turn in the conversation, the format works correctly. Treat Qwable-v1 like Claude Code: always run it inside a harness that supplies a tool-use system prompt + tool registry.
- **Tool names don't bind to the original Claude Code inventory.** The model emits XML with sensible-but-invented tool names like `read_file`, `Replace`, etc., rather than the exact Claude Code tool names (`Read`, `Edit`, etc.) from the training data. Downstream consumers define their own tool registry anyway, so this is rarely an issue — but auto-routing tool calls to a fixed schema will need a tool-name normalizer.
- **Narrow training distribution.** ~5k rows from one developer's Claude Code sessions. Out-of-distribution agent tasks (DevOps, data science, security workflows that weren't in the training data) will be hit-or-miss.
- **Custom tool envelope.** `<tool_use>` XML doesn't slot into vLLM's tool-calling API automatically. You need a parser wrapper that converts it to `<tool_call>` JSON if you want vLLM's native tool-call detection.
- **Persona drift.** Two SFT rounds against Anthropic-style outputs may produce a model that occasionally refuses things Qwen wouldn't refuse, or that self-identifies as Claude in chat. Mild on Opus 4.7 alone; unknown additive effect from Fable-5.
- **Reasoning is from Opus 4.7, not Fable-5.** Don't expect Qwable-v1 to outperform the underlying Opus 4.7 distill on pure-reasoning benchmarks (math, science, GPQA). It should match. The new capability axis is agentic tool-use, not better reasoning.
- **No formal evals at v1 ship time.** Pending.

## License & terms

This model is released under **AGPL-3.0**, inherited from the upstream `Glint-Research/Fable-5-traces` dataset license. Downstream users running Qwable-v1 in a network-accessible service must comply with AGPL §13 (source disclosure for network use).

The underlying Fable-5 thinking traces are derivative content from Anthropic's `claude-fable-5` preview model (suspended globally 2026-06-22 under U.S. export-control directives). Downstream users should verify compliance with [Anthropic's usage policies](https://www.anthropic.com/legal/usage-policy) for their specific use case before fine-tuning further or building commercial products on this model.

The Qwen3.6-35B-A3B base is Apache 2.0; the Opus 4.7 distill (intermediate base) is Apache 2.0. Qwable-v1's AGPL designation supersedes those due to the Fable-5 data's AGPL upstream.

## Citation

```bibtex
@misc{lordx64_qwable_v1_2026,
  title  = {Qwable-v1: Agentic coding distillation from Claude Fable-5 onto Qwen3.6-35B-A3B},
  author = {lordx64},
  year   = {2026},
  howpublished = {\url{https://huggingface.co/lordx64/Qwable-v1}},
}
```

## Acknowledgements

- **`Glint-Research`** for collecting and re-publishing the Fable-5 trace corpus with cleartext CoT — the only viable source after Anthropic's API-side redaction policy.
- **`TeichAI`** for the upstream 953-trace collection that Glint-Research built on.
- **Anthropic** for the Claude Fable-5 preview model (briefly available 2026-06-10 to 2026-06-22) and the prior Opus 4.7 / Opus 4.6 work this lineage is built on.
- **Qwen team** for releasing Qwen3.6-35B-A3B under Apache 2.0.
- **[Unsloth](https://github.com/unslothai/unsloth)** for 2× faster LoRA training and the MoE+LoRA shape fix in unsloth-zoo PR [#601](https://github.com/unslothai/unsloth-zoo/pull/601).
- **HuggingFace** for the Inference Endpoint H200 fleet (Seoul ap-northeast-2) where the training actually ran.