---
license: agpl-3.0
language:
- en
library_name: transformers
tags:
- qwen
- qwen3
- qwen3.6
- moe
- distillation
- chain-of-thought
- agentic
- claude-fable-5
- claude-opus-4.7
- tool-use
- chained-distill
pipeline_tag: text-generation
base_model:
- lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled
datasets:
- lordx64/agentic-distill-fable-5-sft
---

# Qwable-v1

> **Qwen + Fable** · An open-weights agentic coding model.
> 35B Mixture-of-Experts (3B active), built by layering Claude Fable-5 agentic tool-use behavior on top of a Claude Opus 4.7 reasoning distill of Qwen3.6-35B-A3B.

[![Base model](https://img.shields.io/badge/🤗_Base-Qwen3.6--35B--A3B--Claude--4.7--Opus--Reasoning--Distilled-blue)](https://huggingface.co/lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled)
[![Dataset](https://img.shields.io/badge/🤗_SFT-agentic--distill--fable--5--sft-orange)](https://huggingface.co/datasets/lordx64/agentic-distill-fable-5-sft)
[![License](https://img.shields.io/badge/license-AGPL_3.0-blue)](./LICENSE)

## TL;DR

Qwable-v1 is a **chained distill**: vanilla Qwen3.6-35B-A3B → SFT on Claude Opus 4.7 reasoning traces → SFT on Claude Fable-5 agentic tool-use traces. The result is an open-weights model that:

- **Thinks** in explicit `…` chains-of-thought (inherited from the Opus 4.7 prior)
- **Acts** like a Claude-Code-style agent when prompted as one: it emits `` XML blocks for file edits, shell commands, and reads (added by the Fable-5 SFT). The XML format is **system-prompt-conditional**: it appears when you give the model an agent-style system prompt or supply a preceding `` turn. With a bare prompt and no agent framing, the model falls back to the Opus 4.7 reasoning-and-explain prior. See [Usage](#usage) for the recipe.
- Runs on a single H200 / 2× A100-80GB at bf16, or any 24+ GB consumer GPU at IQ4_XS quantization

## Versioning: this is v1, more iterations planned

This is the **first iteration**.
We intend to keep updating the model as additional cleartext Fable-5 traces become publicly available; each new corpus that materializes will feed a `Qwable-v2`, `Qwable-v3`, etc., with the chained provenance documented at every step.

Realistic caveat: Anthropic suspended Claude Fable-5 globally on 2026-06-22 under U.S. export-control directives, and the API redacted thinking blocks for the entire preview window. The known cleartext source ([`Glint-Research/Fable-5-traces`](https://huggingface.co/datasets/Glint-Research/Fable-5-traces)) is a *frozen historical corpus*: no upstream growth path is guaranteed. If new traces surface (community uploads, security-partner releases, or a future Fable un-suspension), we'll incorporate them. If they don't, v1 stays the latest. In either case, follow this model repo for updates, or check the [source repo](https://github.com/lordx64/distillation) for v2+ training runs.

## Honest scope

This model is **not** a pure single-teacher distillation. It's a chained warm-start:

```
Qwen3.6-35B-A3B (vanilla, Apache 2.0)
  └─SFT─▶ Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled
            └─SFT─▶ Qwable-v1   ← you are here
```

The Fable-5 SFT data is narrowly distributed (one developer's week of Claude Code sessions, ~5k turns, 81% tool-use endings). The reasoning prior comes from the Opus 4.7 step, not from Fable-5. Evaluate and use this model accordingly:

- **For pure reasoning** (math, science Q&A, general knowledge): omit the agent system prompt or use a generic one. The underlying Opus 4.7 distill is what's doing the work. Qwable-v1 won't beat it on those benchmarks; it'll match.
- **For agentic coding** (edit-this-file, run-this-test, scroll-this-codebase): supply an agent system prompt that names the `` XML format. The Fable-5 SFT then adds the tool-call patterns on top of Opus 4.7's reasoning. This is where Qwable outperforms a vanilla Qwen3.6.
- **For chat / general assistant**: works, but persona may drift toward Claude voice (double Anthropic SFT stacking).

Verified post-training (2026-06-15) with three prompt variants on the merged model: bare prompts produce markdown code blocks; agent-style system prompts produce correctly-formatted `` XML; multi-turn conversations with a prior `` continue in XML. See [Limitations](#limitations) for the format details.

## What's in the box

- 26 `model-0000{1..26}-of-00026.safetensors` shards: merged bf16 weights (~70 GB total)
- `tokenizer.json`, `chat_template.jinja`, `config.json`: Qwen3.6 chat template, unchanged from the base
- Adapter-only variant published at [`lordx64/Qwable-v1-adapter`](https://huggingface.co/lordx64/Qwable-v1-adapter) for composability with the Opus 4.7 base (~50-100 MB)

GGUF quants at [`lordx64/Qwable-v1-GGUF`](https://huggingface.co/lordx64/Qwable-v1-GGUF):

- **IQ4_XS** (~18 GB): runs on 24 GB consumer GPUs (3090, 4090), LM Studio default
- **Q5_K_M** (~25 GB): better quality, fits 32-48 GB workstations
- **Q8_0** (~37 GB): near-lossless, for reproducibility checks

## Training recipe

| Setting | Value |
|---|---|
| Base (warm-start) | `lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled` |
| SFT dataset | `lordx64/agentic-distill-fable-5-sft` (4,659 rows, ~12.2M Qwen tokens, single `text` column in Qwen chat template) |
| Library | [Unsloth](https://github.com/unslothai/unsloth) `FastLanguageModel` + TRL `SFTTrainer` |
| LoRA | r=16, alpha=16, attention-only (`q_proj, k_proj, v_proj, o_proj`), dropout 0.0 |
| Loss masking | `train_on_responses_only` (gradients only flow through assistant turns, including the `` block) |
| Sequence length | 4096 tokens |
| Epochs | 2 |
| Effective batch size | 16 (per-device 1 × grad-accum 16) |
| Optimizer | AdamW 8-bit, cosine LR, 3% warmup, weight decay 0.01 |
| Learning rate | 2e-5 |
| Precision | bf16 forward + LoRA params |
| Random seed | 3407 |
| Hardware | 1× NVIDIA H200 (141 GB), `nvidia-h200` x1 instance on AWS ap-northeast-2 via HF Inference Endpoints |
| Total optimizer steps | 582 (4,648 examples ÷ effective batch 16 ≈ 291 steps per epoch × 2 epochs; 11 of 4,659 rows dropped during prep because all their labels were masked) |
| Wall-clock | **14.1h actual** (vs ~7-8h projected; see the training notes below) |
| Cost | **~$70** at $5/hr |
| Final loss | 0.804 at the last step; 0.7956 averaged over the final 20 steps |
| Final save | `merged_16bit` via Unsloth |

The training script is `training/train.py` in the [source repo](https://github.com/lordx64/distillation); the submitter is `training/endpoint/deploy_fable.py`. Both are reused (with track-specific config) from the original Opus 4.7 / Kimi K2.6 distill pipelines.

### Training notes: slower than projected

The run took ~14h instead of the projected ~7-8h. Root cause: the HF Inference Endpoint container's `flash-linear-attention` + `causal-conv1d` builds did not compile against the runtime CUDA toolkit, so Qwen3.6's GatedDeltaNet layers fell back to a PyTorch reference implementation (the startup log noted `The fast path is not available because one of the required library is not installed. Falling back to torch implementation.`). The fallback path is mathematically identical (loss and convergence are unaffected) but ~2-3× slower for those layers. Step rate at full context worked out to ~83s/step instead of the ~36s/step the smoke test implied.

This is a known toolchain issue (Hopper SM_90 + CUDA 12.6 + Triton 3.3.1). The fix would be pre-baking compatible fla / causal-conv1d / triton wheels into `training/endpoint/requirements.txt`. We left it for v2: the slowdown affects only wall-clock time, the model is the same, and the cost (~$70) is still very reasonable for a 35B distill at H200 rates.

## Dataset provenance

The SFT dataset (`lordx64/agentic-distill-fable-5-sft`) is a reformatted derivative of [`Glint-Research/Fable-5-traces`](https://huggingface.co/datasets/Glint-Research/Fable-5-traces).
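The provenance chain in this section mentions two preparation steps: deduplication by SHA-256 of the user content, and scrubbing of leaked Groq API keys. A minimal sketch of how such steps could look is shown here. It is not the project's actual prep script: the row fields `user` and `assistant`, the function names, the redaction placeholder, and the assumption that Groq keys match `gsk_` followed by alphanumerics are all illustrative.

```python
import hashlib
import re

# Assumption: Groq API keys look like "gsk_" followed by a long alphanumeric run.
GROQ_KEY = re.compile(r"gsk_[A-Za-z0-9]{20,}")


def scrub(text: str) -> str:
    """Replace anything that looks like a Groq API key with a placeholder."""
    return GROQ_KEY.sub("[REDACTED_GROQ_KEY]", text)


def dedup_by_user_content(rows):
    """Keep the first row for each distinct user message, scrubbing secrets.

    `rows` is a list of dicts with hypothetical `user` and `assistant` fields.
    Rows are compared by the SHA-256 digest of the raw user content.
    """
    seen = set()
    kept = []
    for row in rows:
        digest = hashlib.sha256(row["user"].encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # a row with identical user content was already kept
        seen.add(digest)
        kept.append(
            {**row, "user": scrub(row["user"]), "assistant": scrub(row["assistant"])}
        )
    return kept
```

Hashing the user content rather than the whole row means two sessions that asked the same question count as duplicates even if the assistant replies differ; whether the real pipeline resolves such collisions by keeping the first row is an assumption of this sketch.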
Provenance chain:

```
TeichAI ────── collected 953 raw Claude Code session traces against Anthropic's Claude Fable-5 preview API
   │           (between ~2026-06-10 and 2026-06-22, before Anthropic suspended Fable-5 globally
   │           under U.S. export-control directives)
   ▼
Glint-Research ────── extracted chain-of-thought reasoning into a per-turn `cot` field
   │           (added post-hoc; the underlying Anthropic API redacted cleartext
   │           thinking blocks via signature-only delivery on Fable-5 preview)
   ▼
lordx64/agentic- ────── reformatted into Qwen chat template, `` / `` XML
distill-fable-5-sft     serialized inline, deduplicated by SHA-256 of user-content, secrets scrubbed
   │           (204 active Groq API keys redacted from upstream's session JSONLs).
   ▼
Qwable-v1 ────── SFT'd over the Opus 4.7 distill (this model)
```

Composition: 4,659 rows, ~12.2M Qwen tokens.

- 3,793 rows (81%) end in a tool call (Read / Write / Edit / Bash / PowerShell / WebFetch / MCP Claude_Preview tools)
- 866 rows (19%) end in a pure text response

Content domain: web/game development, Three.js scenes, multiplayer FPS prototype, fluid simulation, Express server work, and transformer training scripts. **Narrow**: this is essentially one developer's Claude Code history, plus a Boeing 747 trace, plus assorted preview-tool sessions.

## Evaluation

> 🚧 **Evals are in progress.** This table will fill in as each suite completes; nothing here is published until verified.

| Benchmark | Setup | What it tests | Score | Status |
|---|---|---|---:|---|
| **GSM8K-CoT** | 8-shot, multi-turn, limit 300 | Grade-school math; verify reasoning prior preserved through the second SFT round | _pending_ | 🚧 in progress |
| **MMLU-Pro** | 5-shot, multi-turn, limit 500 | Hard multi-subject knowledge reasoning | _pending_ | 🚧 in progress |
| **MMLU-Pro** (per-subject) | Same as above | Biology / Math / Psychology / etc. breakdown | _pending_ | 🚧 in progress |
| **GPQA Diamond** | 0-shot CoT | Graduate-level STEM | _pending_ | 🚧 in progress |
| **MATH-500** | 0-shot, `math_verify` metric | Competition math; tests reasoning depth | _pending_ | 🚧 in progress |
| **AIME 2024 / 2025** | 0-shot CoT | Olympiad-level math; sensitivity to answer-extraction | _pending_ | 🚧 in progress |
| **HumanEval / MBPP** | pass@1 / pass@10 | Pure code completion (non-agentic baseline) | _pending_ | 🚧 in progress |
| **IFEval** | 0-shot | Instruction-following adherence | _pending_ | 🚧 in progress |
| **SWE-bench Lite** (or BCB-Hard) | with agent harness + tool registry | **The key test**: agentic coding ability vs Opus 4.7 base | _pending_ | 🚧 in progress |
| **`qwen3-6-distill-eval` Space** | 17 head-to-head prompts (12 design + 5 agentic) | Side-by-side qualitative comparison vs Qwen3.6 base + Opus 4.7 + Kimi K2.6 distills, with human-readable HTML output | _pending_ | 🚧 in progress |

Methodology used (same as the Opus 4.7 / Kimi K2.6 evals on this project):

- vLLM serving at 64k context so reasoning chains never truncate before answering
- `…` stripped before regex extractors run (otherwise extractors grab letters/numbers from inside the reasoning, not the final answer)
- Per-task `num_fewshot` (lm-eval's single global value can't handle GSM8K-8shot + GPQA-0shot together)
- `fewshot_as_multiturn=True` for chat-template fidelity
- `math_verify` metric for `MATH-500` and `AIME` (catches semantic equivalence; raw `strict-match` against `\boxed{N}` returns 0% even on correct answers because the model says `**Answer: N**`)

Standing rule on this project: **numbers stay blank until verified**. If a benchmark hits a known extraction bug we couldn't cleanly fix, the row says so and we omit the score rather than publish a misleading one.

## Usage

### Transformers (full bf16, ~70 GB)

**Important**: Qwable-v1 emits `` XML reliably only when prompted as an agent.
Use a system prompt that explicitly requests the XML format (see below).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tok = AutoTokenizer.from_pretrained("lordx64/Qwable-v1")
model = AutoModelForCausalLM.from_pretrained(
    "lordx64/Qwable-v1",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Agent-style system prompt: this is what triggers the XML tool-call format.
SYSTEM = (
    "You are a coding agent. When you need to read, write, edit, or run code, "
    "emit XML tool calls in this exact format:\n"
    '\n{"...": "..."}\n\n'
    "Do NOT respond with markdown code blocks. Always use XML."
)

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": "Read /tmp/server.py and tell me what port it listens on."},
]

# return_dict=True returns input_ids together with the attention mask.
inputs = tok.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# do_sample=True is needed for temperature / top_p to take effect.
out = model.generate(
    **inputs, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.9
)

# Decode only the newly generated tokens, keeping special tokens in the text.
prompt_len = inputs["input_ids"].shape[1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=False))
```

Output starts with `…` followed by a `{json}` block. Without the system prompt, Qwable-v1 falls back to the Opus 4.7 reasoning prior (markdown code blocks), which is usable but not agentic.

For pure reasoning use (math, science, general Q&A), omit the system prompt or use the generic `"You are a helpful AI assistant."`; the model will produce reasoning + a text answer like the underlying Opus 4.7 distill.

### vLLM serving

```bash
vllm serve lordx64/Qwable-v1 \
  --max-model-len 16384 \
  --tensor-parallel-size 2 \
  --trust-remote-code
```

### llama.cpp / LM Studio (GGUF)

```bash
# Pick IQ4_XS for 24 GB VRAM, Q5_K_M for 32-48 GB, Q8_0 for 64+ GB
llama-cli -m Qwable-v1-IQ4_XS.gguf -p "Read /tmp/server.py and find the port..."
```

### Adapter-only (compose on top of the Opus 4.7 distill)

If you already have the Opus 4.7 distill loaded:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the intermediate Opus 4.7 distill, then attach the Qwable-v1 LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "lordx64/Qwen3.6-35B-A3B-Claude-4.7-Opus-Reasoning-Distilled",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "lordx64/Qwable-v1-adapter")
```

## Tool-use format

The Fable-5 SFT data uses a **custom XML envelope** for tool calls, not Qwen's native `` token format. Properly-elicited outputs look like:

```
The user wants me to change the port from 8000 to 8080. I should Read the file first to see the current configuration, then Edit it.

{
  "file_path": "/tmp/server.py"
}
```

Tool results come back as:

```
{file contents}
```

### Eliciting the format reliably

Two paths produce the XML format consistently:

**1. Agent system prompt** (the simplest; works in one shot):

```
system: You are a coding agent. When you need to read, write, edit, or run code,
emit XML tool calls in this exact format:

{"...": "..."}

Do NOT respond with markdown code blocks. Always use XML.
```

**2. Multi-turn conversation**: supply a prior `` and the model continues in XML for the rest of the conversation, no system prompt needed.

Without either, Qwable-v1 falls back to the Opus 4.7 prior and explains the fix in markdown code blocks instead. The format **is** learned (verified in smoke-test and full-run spot-checks); it only appears when the conversation distribution looks agentic.

### Tool names are not bound to the Claude Code inventory

The training data uses Claude Code's tool names (`Read`, `Edit`, `Bash`, `WebFetch`, `mcp__*`, etc.). The merged model emits sensible-but-invented names like `read_file`, `Replace`, `write_file` instead. The XML *envelope* transferred; the *vocabulary* didn't bind. Downstream consumers define their own tool registry anyway, so this is rarely an issue, but anything that routes calls by exact tool name needs a normalizer (e.g.
`read_file` → `Read`).

### Native Qwen tool calling

This format is **chat-template-agnostic** and parses with a small regex. Downstream consumers wanting native Qwen `` JSON calling will need either (a) a wrapper that converts the XML to `` JSON, or (b) a v2 of this model trained with the Qwen native format from scratch.

## Limitations

- **Tool-use format is system-prompt-conditional.** With a generic prompt (`"Fix this bug for me"`), Qwable-v1 falls back to the Opus 4.7 prior: it explains the fix in markdown code blocks instead of emitting `` XML. With either (a) an explicit system prompt asking for tool calls in `…` format, or (b) a preceding `…` turn in the conversation, the format works correctly. Treat Qwable-v1 like Claude Code: always run it inside a harness that supplies a tool-use system prompt + tool registry.
- **Tool names don't bind to the original Claude Code inventory.** The model emits XML with sensible-but-invented tool names like `read_file`, `Replace`, etc., rather than the exact Claude Code tool names (`Read`, `Edit`, etc.) from the training data. Downstream consumers define their own tool registry anyway, so this is rarely an issue, but auto-routing tool calls to a fixed schema will need a tool-name normalizer.
- **Narrow training distribution.** ~5k rows from one developer's Claude Code sessions. Out-of-distribution agent tasks (DevOps, data science, security workflows that weren't in the training data) will be hit-or-miss.
- **Custom tool envelope.** `` XML doesn't slot into vLLM's tool-calling API automatically. You need a parser wrapper to convert to `` JSON if you want vLLM's native tool-call detection.
- **Persona drift.** Two SFT rounds against Anthropic-style outputs may produce a model that occasionally refuses things Qwen wouldn't refuse, or that self-identifies as Claude in chat. Mild on Opus 4.7 alone; unknown additive effect from Fable-5.
- **Reasoning is from Opus 4.7, not Fable-5.** Don't expect Qwable-v1 to outperform the underlying Opus 4.7 distill on pure-reasoning benchmarks (math, science, GPQA). It should match. The new capability axis is agentic tool-use, not better reasoning.
- **No formal evals at v1 ship time.** Pending.

## License & terms

This model is released under **AGPL-3.0**, inherited from the upstream `Glint-Research/Fable-5-traces` dataset license. Downstream users running Qwable-v1 in a network-accessible service must comply with AGPL §13 (source disclosure for network use).

The underlying Fable-5 thinking traces are derivative content from Anthropic's `claude-fable-5` preview model (suspended globally 2026-06-22 under U.S. export-control directives). Downstream users should verify compliance with [Anthropic's usage policies](https://www.anthropic.com/legal/usage-policy) for their specific use case before fine-tuning further or building commercial products on this model.

The Qwen3.6-35B-A3B base is Apache 2.0; the Opus 4.7 distill (intermediate base) is Apache 2.0. Qwable-v1's AGPL designation supersedes those due to the Fable-5 data's AGPL upstream.

## Citation

```bibtex
@misc{lordx64_qwable_v1_2026,
  title        = {Qwable-v1: Agentic coding distillation from Claude Fable-5 onto Qwen3.6-35B-A3B},
  author       = {lordx64},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/lordx64/Qwable-v1}},
}
```

## Acknowledgements

- **`Glint-Research`** for collecting and re-publishing the Fable-5 trace corpus with cleartext CoT, the only viable source after Anthropic's API-side redaction policy.
- **`TeichAI`** for the upstream 953-trace collection that Glint-Research built on.
- **Anthropic** for the Claude Fable-5 preview model (briefly available 2026-06-10 to 2026-06-22) and the prior Opus 4.7 / Opus 4.6 work this lineage is built on.
- **Qwen team** for releasing Qwen3.6-35B-A3B under Apache 2.0.
- **[Unsloth](https://github.com/unslothai/unsloth)** for 2× faster LoRA training and the MoE+LoRA shape fix in unsloth-zoo PR [#601](https://github.com/unslothai/unsloth-zoo/pull/601).
- **HuggingFace** for the Inference Endpoint H200 fleet (Seoul ap-northeast-2) where the training actually ran.