lordx64 committed 17 days ago

Commit 44aabe8 (verified) · 1 parent: 8695eea

Card: tighten tool-use claims throughout — system-prompt-conditional, tool-name vocab not bound, recipe for eliciting XML format added in three places

Files changed (1)

README.md +32 -5


@@ -36,7 +36,7 @@ datasets:
 Qwable-v1 is a **chained distill**: vanilla Qwen3.6-35B-A3B → SFT on Claude Opus 4.7 reasoning traces → SFT on Claude Fable-5 agentic tool-use traces. The result is an open-weights model that:
 - **Thinks** in explicit `<think>…</think>` chains-of-thought (inherited from the Opus 4.7 prior)
-- **Acts** like a Claude-Code-style agent, emitting `<tool_use>` XML blocks for file edits, shell commands, and reads (added by the Fable-5 SFT)
 - Runs on a single H200 / 2× A100-80GB at bf16, or any 24+ GB consumer GPU at IQ4_XS quantization
 ## Versioning — this is v1, more iterations planned
@@ -59,10 +59,12 @@ Qwen3.6-35B-A3B (vanilla, Apache 2.0)
 The Fable-5 SFT data is narrowly distributed (one developer's week of Claude Code sessions, ~5k turns, 81% tool-use endings). The reasoning prior comes from the Opus 4.7 step, not from Fable-5. Eval and use this model accordingly:
-- **For pure reasoning** (math, science Q&A, general knowledge): the underlying Opus 4.7 distill is what's doing the work. Qwable-v1 won't beat it on those benchmarks.
-- **For agentic coding** (edit-this-file, run-this-test, scroll-this-codebase): the Fable-5 SFT adds the tool-call patterns. This is where Qwable should outperform the base.
 - **For chat / general assistant**: works, but persona may drift toward Claude voice (double Anthropic SFT stacking).
 ## What's in the box
 - 26 `model-0000{1..26}-of-00026.safetensors` shards — merged bf16 weights (~70 GB total)
@@ -215,7 +217,7 @@ model = PeftModel.from_pretrained(base, "lordx64/Qwable-v1-adapter")
 ## Tool-use format
-The Fable-5 SFT data uses a **custom XML envelope** for tool calls, not Qwen's native `<tool_call>` token format. Outputs look like:
 ```
 <think>
@@ -238,7 +240,32 @@ Tool results come back as:
 </tool_result>
 ```
-This format is **chat-template-agnostic** and parses with a small regex. Downstream consumers wanting native Qwen tool calling will need either (a) a wrapper that converts the XML to `<tool_call>` JSON, or (b) a v2 of this model trained with the Qwen native format from scratch.
 ## Limitations

 Qwable-v1 is a **chained distill**: vanilla Qwen3.6-35B-A3B → SFT on Claude Opus 4.7 reasoning traces → SFT on Claude Fable-5 agentic tool-use traces. The result is an open-weights model that:
 - **Thinks** in explicit `<think>…</think>` chains-of-thought (inherited from the Opus 4.7 prior)
+- **Acts** like a Claude-Code-style agent when prompted as one — emits `<tool_use>` XML blocks for file edits, shell commands, and reads (added by the Fable-5 SFT). The XML format is **system-prompt-conditional**: it appears when you give the model an agent-style system prompt or supply a preceding `<tool_result>` turn. With a bare prompt and no agent framing, the model falls back to the Opus 4.7 reasoning-and-explain prior. See [Usage](#usage) for the recipe.
 - Runs on a single H200 / 2× A100-80GB at bf16, or any 24+ GB consumer GPU at IQ4_XS quantization
 ## Versioning — this is v1, more iterations planned
 The Fable-5 SFT data is narrowly distributed (one developer's week of Claude Code sessions, ~5k turns, 81% tool-use endings). The reasoning prior comes from the Opus 4.7 step, not from Fable-5. Eval and use this model accordingly:
+- **For pure reasoning** (math, science Q&A, general knowledge): omit the agent system prompt or use a generic one. The underlying Opus 4.7 distill is what's doing the work. Qwable-v1 won't beat it on those benchmarks; it'll match.
+- **For agentic coding** (edit-this-file, run-this-test, scroll-this-codebase): supply an agent system prompt that names the `<tool_use>` XML format. The Fable-5 SFT then adds the tool-call patterns on top of Opus 4.7's reasoning. This is where Qwable outperforms a vanilla Qwen3.6.
 - **For chat / general assistant**: works, but persona may drift toward Claude voice (double Anthropic SFT stacking).
+Verified post-training (2026-06-15) with three prompt variants on the merged model: bare prompts produce markdown code blocks; agent-style system prompts produce correctly formatted `<tool_use>` XML; multi-turn conversations with a prior `<tool_result>` continue in XML. See [Limitations](#limitations) for the format details.
 ## What's in the box
 - 26 `model-0000{1..26}-of-00026.safetensors` shards — merged bf16 weights (~70 GB total)
 ## Tool-use format
+The Fable-5 SFT data uses a **custom XML envelope** for tool calls, not Qwen's native `<tool_call>` token format. Properly elicited outputs look like:
 ```
 <think>
 </tool_result>
 ```
+### Eliciting the format reliably
+Two paths produce the XML format consistently:
+**1. Agent system prompt** — the simplest, works in one-shot:
+```
+system: You are a coding agent. When you need to read, write, edit, or run code,
+emit XML tool calls in this exact format:
+<tool_use name="X" id="toolu_01abc">
+{"...": "..."}
+</tool_use>
+Do NOT respond with markdown code blocks. Always use <tool_use> XML.
+```
+**2. Multi-turn conversation** — supply a prior `<tool_result>` and the model continues in XML for the rest of the conversation, no system prompt needed.
+Without either, Qwable-v1 falls back to the Opus 4.7 prior and explains the fix in markdown code blocks instead. The format **is** learned (verified by spot checks on the smoke run and the full run); it only appears when the conversation distribution looks agentic.
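As a rough illustration of the first path, the sketch below assembles a chat-message list that leads with the agent system prompt quoted in the card. The helper name `build_agent_messages` and its `prior_turns` parameter are hypothetical, and the role/content dict layout is the common chat-message convention rather than anything the card specifies.

```python
# The system prompt text is copied from the card; everything else here is an
# illustrative assumption, not the author's tooling.
AGENT_SYSTEM_PROMPT = (
    "You are a coding agent. When you need to read, write, edit, or run code,\n"
    "emit XML tool calls in this exact format:\n"
    '<tool_use name="X" id="toolu_01abc">\n'
    '{"...": "..."}\n'
    "</tool_use>\n"
    "Do NOT respond with markdown code blocks. Always use <tool_use> XML."
)


def build_agent_messages(user_text, prior_turns=None):
    """Return chat messages that start with the agent system prompt."""
    messages = [{"role": "system", "content": AGENT_SYSTEM_PROMPT}]
    # Optional earlier turns, e.g. one that already carries a <tool_result>.
    messages.extend(prior_turns or [])
    messages.append({"role": "user", "content": user_text})
    return messages
```

A list built this way could then be handed to a tokenizer's `apply_chat_template(..., add_generation_prompt=True)` in Transformers; that the model's chat template accepts a system role is assumed here, not confirmed by the card.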
+### Tool names are not bound to the Claude Code inventory
+The training data uses Claude Code's tool names (`Read`, `Edit`, `Bash`, `WebFetch`, `mcp__*`, etc.). The merged model emits sensible-but-invented names like `read_file`, `Replace`, `write_file` instead. The XML *envelope* transferred; the *vocabulary* didn't bind. Downstream consumers define their own tool registry anyway, so this is rarely an issue — but anything that routes calls by exact tool name needs a normalizer (e.g. `read_file` → `Read`).
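A normalizer of the kind mentioned above might look like the following minimal sketch. Only the `read_file` → `Read` mapping comes from the card; the other alias entries, the `Write` target name, and the function name `normalize_tool_name` are assumptions for illustration, and a real table would be built from the names your own registry expects.

```python
# Registry names the router accepts. Read/Edit/Bash/WebFetch appear in the card;
# "Write" is an assumed extra entry.
KNOWN_TOOLS = {"Read", "Edit", "Bash", "WebFetch", "Write"}

# Hypothetical alias table keyed by lower-cased emitted name.
ALIASES = {
    "read_file": "Read",    # mapping given in the card
    "write_file": "Write",  # assumed
    "replace": "Edit",      # assumed
}


def normalize_tool_name(name):
    """Map an emitted tool name onto a registry name, or return it unchanged."""
    if name in KNOWN_TOOLS:
        return name
    return ALIASES.get(name.lower(), name)
```

Unknown names pass through untouched, so the caller can still decide whether to reject them or log them for a later alias entry.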
+### Native Qwen tool calling
+This format is **chat-template-agnostic** and parses with a small regex. Downstream consumers wanting native Qwen `<tool_call>` JSON calling will need either (a) a wrapper that converts the XML to `<tool_call>` JSON, or (b) a v2 of this model trained with the Qwen native format from scratch.
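To make "parses with a small regex" concrete, here is one possible sketch, together with a wrapper along the lines of option (a). It assumes the attribute order shown in the card's example (`name` before `id`) and a JSON body. The function names are hypothetical, and the `{"name", "arguments"}` layout inside `<tool_call>` is the Hermes-style convention used by earlier Qwen chat templates, assumed here rather than confirmed for this model.

```python
import json
import re

# Matches <tool_use name="..." id="..."> BODY </tool_use>, body may span lines.
TOOL_USE_RE = re.compile(
    r'<tool_use\s+name="(?P<name>[^"]+)"\s+id="(?P<id>[^"]+)"\s*>'
    r"\s*(?P<body>.*?)\s*</tool_use>",
    re.DOTALL,
)


def parse_tool_calls(text):
    """Return a list of {"name", "id", "input"} dicts found in model output."""
    calls = []
    for match in TOOL_USE_RE.finditer(text):
        try:
            payload = json.loads(match.group("body"))
        except json.JSONDecodeError:
            payload = None  # keep the call, but flag the unparseable body
        calls.append(
            {"name": match.group("name"), "id": match.group("id"), "input": payload}
        )
    return calls


def to_qwen_tool_call(call):
    """Render one parsed call as a <tool_call> JSON block (assumed layout)."""
    body = json.dumps({"name": call["name"], "arguments": call["input"]})
    return f"<tool_call>\n{body}\n</tool_call>"
```

Text outside the envelope, such as a leading `<think>` block, is simply ignored by the pattern, which is what keeps the parser independent of the chat template.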
 ## Limitations