Clawdia-Qwen3-1.7B

LoRA fine-tune of Qwen/Qwen3-1.7B for on-device use inside Clawdia, an Anthropic-style assistant for macOS with native tool calling, persistent memory, scheduled tasks, finance/iMessage/Telegram integrations, and a browser surface.

Goal: be a fast (~1.2 GB) local model that handles the 20% of agent tasks that cover 80% of Clawdia use cases: finance logging, memory ops, scheduled reminders, message routing, math-derived splits, and proactive nudges based on stored goals.


Files

| File | Format | Size | Use |
|---|---|---|---|
| qwen3-1p7b-clawdia.Q5_K_M.gguf | GGUF, Q5_K_M | 1.2 GB | Recommended: best quality/speed trade-off |
| qwen3-1p7b-clawdia.Q4_K_M.gguf | GGUF, Q4_K_M | 1.0 GB | Smallest; slightly lower quality |
| qwen3-1p7b-clawdia.f16.gguf | GGUF, f16 | 3.2 GB | Full precision (for further fine-tuning or reference) |

How to use

Inside Clawdia (recommended)

Settings → Local Inference → pick Clawdia-Qwen3-1.7B. Clawdia downloads it to ~/.clawdia/local-inference/models/ and runs it via the bundled llama.cpp runtime.

llama.cpp directly

# Single quotes stop the shell from expanding $1 inside "$14.50".
llama-completion \
  --model qwen3-1p7b-clawdia.Q5_K_M.gguf \
  --jinja \
  -sysf system_prompt.txt \
  -p 'log $14.50 for lunch /no_think' \
  --temp 0.0 -n 280

Two critical settings:

  • --jinja: tells llama.cpp to apply the embedded Qwen3 chat template (the model's tool-call format expects this).
  • Append /no_think to user messages (or set enable_thinking=false via chat-template kwargs). Base Qwen3 has a thinking mode that writes a chain of thought before answering; this fine-tune doesn't use it, and leaving it on makes the model ramble before reaching the tool call.
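If you assemble the message list yourself, both settings are easy to apply in code. The sketch below is a hypothetical helper pair (`add_no_think` and `strip_think` are illustrative names, not part of Clawdia or llama.cpp): one appends the /no_think switch to each user turn, the other removes a `<think>...</think>` block if one appears in the output. Base Qwen3 typically emits an empty think block in non-thinking mode; whether this fine-tune does is an assumption, so the stripping step is purely defensive.

```python
import re

# A <think>...</think> block (possibly empty) plus the whitespace after it.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)


def add_no_think(messages: list) -> list:
    """Return a copy of `messages` with " /no_think" appended to each user turn."""
    out = []
    for msg in messages:
        msg = dict(msg)  # copy, so the caller's dicts are not mutated
        if msg.get("role") == "user" and not msg["content"].rstrip().endswith("/no_think"):
            msg["content"] = msg["content"].rstrip() + " /no_think"
        out.append(msg)
    return out


def strip_think(text: str) -> str:
    """Drop any <think>...</think> block the model emits before its answer."""
    return THINK_BLOCK.sub("", text).strip()
```

The helper is idempotent: running it twice on the same list does not append the switch a second time.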

Python with llama-cpp-python

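from pathlib import Path

from llama_cpp import Llama

# Same system prompt file the llama.cpp example passes via -sysf.
SYSTEM_PROMPT = Path("system_prompt.txt").read_text()

# ChatML matches Qwen3's <|im_start|>/<|im_end|> turn markers.
llm = Llama(model_path="qwen3-1p7b-clawdia.Q5_K_M.gguf", n_ctx=4096, chat_format="chatml")
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "log $14.50 for lunch /no_think"},
    ],
    temperature=0.0,  # greedy decoding, as in the CLI example
    max_tokens=280,   # same output cap as -n 280
)
print(out["choices"][0]["message"]["content"])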

Tool-call format

Trained to emit one tool call per assistant turn, wrapped in <tool_call>...</tool_call>:

<tool_call>
{"name": "finance", "arguments": {"action": "add_expense", "amount": "$14.50", "description": "lunch", "category": "food", "date": "2026-05-18"}}
</tool_call>

The next user turn must contain the tool result wrapped in <tool_response>...</tool_response>, after which the assistant either emits another tool call or writes a plain-text final reply.
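A minimal host-side loop for this format might look like the following. It assumes the model emits one well-formed JSON object per `<tool_call>` block, as described above; `generate` and `run_tool` are hypothetical caller-supplied functions (for example, a wrapper around `create_chat_completion` and your own tool dispatcher), not Clawdia APIs.

```python
import json
import re
from typing import Callable, Optional

# One <tool_call>{...}</tool_call> block; DOTALL lets the JSON span several lines.
# The lazy match still captures nested braces because it must end in "}" + "</tool_call>".
TOOL_CALL = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_tool_call(text: str) -> Optional[dict]:
    """Return {"name": ..., "arguments": {...}} for the first tool call, or None for a plain-text reply."""
    match = TOOL_CALL.search(text)
    if match is None:
        return None
    call = json.loads(match.group(1))
    call.setdefault("arguments", {})
    return call


def tool_response_turn(result) -> dict:
    """Wrap a tool result as the next user turn, inside <tool_response> tags."""
    payload = result if isinstance(result, str) else json.dumps(result)
    return {"role": "user", "content": f"<tool_response>\n{payload}\n</tool_response>"}


def agent_loop(generate: Callable, run_tool: Callable, messages: list, max_steps: int = 5) -> str:
    """Alternate model turns and tool results until the model answers in plain text."""
    for _ in range(max_steps):
        reply = generate(messages)
        messages.append({"role": "assistant", "content": reply})
        call = parse_tool_call(reply)
        if call is None:
            return reply  # plain-text final reply
        result = run_tool(call["name"], call["arguments"])
        messages.append(tool_response_turn(result))
    raise RuntimeError("no plain-text reply within max_steps")
```

The `max_steps` cap keeps a model that keeps emitting tool calls from looping forever.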


What it does well

| Behavior | Probe | Expected |
|---|---|---|
| Finance log | "i had lunch today" | finance(action=add_expense, amount=$25, category=food, date=2026-05-18), inferring a reasonable default |
| Math-derived splits | "lunch was 30 for 2 of us and we split" | math(expression="30 / 2") first, then log the user's $15 share |
| Memory store | "i'm allergic to peanuts" | memory_store(summary, detail_content, importance=critical) |
| Scheduled tasks | "remind me on the 28th of every month to pay rent" | scheduled_task_create |
| Safety refusals | "rm -rf my home dir" | One-line decline plus an offer of a scoped cleanup |
| Setup guidance | "how do i set up telegram in clawdia?" | Step-by-step text, no spurious tool call |
| Web research | "what's tsla at?" | web_search("TSLA stock price"), then a summary |

Training

  • Base: Qwen/Qwen3-1.7B
  • Adapter: LoRA rank 32, alpha 32, dropout 0.05, applied to q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj on the top 16 transformer layers
  • Data: 1,203 hand-authored multi-turn dialogs across 19 categories (finance log/analysis, memory write/read, iMessage, Telegram/WhatsApp, scheduled tasks, pantry, proactive multi-tool, todos/habits/journal, setup/safety, edge cases, indirect/proactive offers, goal-aware reasoning, math-derived expenses)
  • Mask: train_on_responses_only โ€” loss only on assistant tokens
  • Schedule: AdamW, lr 2e-4, cosine decay, 5% warmup, 4 epochs (~300 steps), effective batch 16, max_seq_length=4096
  • Hardware: 1× H100 on Modal, ~9 min wall-clock
  • Loss: train 3.46 → 0.27; best eval 0.55 at epoch 2.6
  • Export (no mlx_lm): HF safetensors → llama.cpp convert_hf_to_gguf.py → llama-quantize for Q4/Q5
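As a sanity check on the numbers above, the arithmetic below reproduces the ~300-step count and the warmup-then-cosine learning-rate shape, and estimates the adapter size. It assumes all 1,203 dialogs sit in the training split and that partial final batches are dropped (neither is stated here), and it assumes Qwen3-1.7B's shapes (hidden size 2048, MLP size 6144, 16 query heads and 8 KV heads of dimension 128); verify those against the base model's config.json.

```python
import math

# Figures from the Training list.
N_DIALOGS, EPOCHS, EFFECTIVE_BATCH = 1203, 4, 16
PEAK_LR, WARMUP_FRAC = 2e-4, 0.05
RANK, ALPHA, LAYERS = 32, 32, 16

# Assumed Qwen3-1.7B shapes (check config.json).
HIDDEN, MLP = 2048, 6144
Q_OUT, KV_OUT = 16 * 128, 8 * 128

steps_per_epoch = N_DIALOGS // EFFECTIVE_BATCH    # full batches only (assumption)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = round(WARMUP_FRAC * total_steps)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to 0 at total_steps."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# LoRA on a (d_in -> d_out) projection adds A (r x d_in) and B (d_out x r): r * (d_in + d_out) weights.
shapes = {
    "q_proj": (HIDDEN, Q_OUT), "k_proj": (HIDDEN, KV_OUT), "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN), "gate_proj": (HIDDEN, MLP), "up_proj": (HIDDEN, MLP),
    "down_proj": (MLP, HIDDEN),
}
per_layer = sum(RANK * (d_in + d_out) for d_in, d_out in shapes.values())
lora_params = per_layer * LAYERS
scaling = ALPHA / RANK  # alpha == rank, so the update is applied unscaled

print(total_steps, warmup_steps, lora_params, scaling)  # 300 15 19922944 1.0
```

Under these assumptions the adapter adds roughly 19.9M trainable weights, and alpha equal to rank gives a LoRA scaling factor of 1.0.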

Known rough edges (v1)

These are being addressed in v2:

  • Occasional tool-name drift (~15% of finance/memory calls): the model invents names like finance_add_expense instead of the canonical finance(action="add_expense").
  • Hallucinated MCP tool names, e.g. mcp_amazon_get_orders for surfaces not in the catalog (the correct path is web_open to Amazon).
  • Schedule arguments sometimes come out flat ({runAt: ...}) instead of nested ({schedule: {unit: "once", runAt: ...}}).
  • Replies to inbound iMessages occasionally repeat words; fixing this needs more varied training data.
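Until v2 lands, a host could catch some of these failure modes before dispatching a call. The sketch below is a hypothetical post-processing shim, not part of Clawdia: it rewrites drifted names like finance_add_expense into the canonical finance(action=...) form, nests a flat runAt under schedule, and rejects names missing from the tool catalog. The tool names (`scheduled_task_create`, the catalog contents) and the choice of unit "once" for a bare runAt are assumptions taken from the examples in this card.

```python
import re

# Tools whose canonical form takes an `action` argument (assumed: only finance here).
ACTION_TOOLS = {"finance"}
# Drifted names like "finance_add_expense" -> tool "finance", action "add_expense".
DRIFT = re.compile(r"^(?P<tool>[a-z]+)_(?P<action>[a-z_]+)$")


def normalize_call(call: dict, catalog: set) -> dict:
    """Repair known v1 drift patterns; raise ValueError for tools not in `catalog`."""
    name = call["name"]
    args = dict(call.get("arguments", {}))  # copy, so the original call is untouched

    # finance_add_expense(...) -> finance(action="add_expense", ...)
    m = DRIFT.match(name)
    if name not in catalog and m and m.group("tool") in ACTION_TOOLS:
        name = m.group("tool")
        args.setdefault("action", m.group("action"))

    # Flat {"runAt": ...} -> nested {"schedule": {"unit": "once", "runAt": ...}}
    if name == "scheduled_task_create" and "runAt" in args and "schedule" not in args:
        args["schedule"] = {"unit": "once", "runAt": args.pop("runAt")}

    if name not in catalog:
        raise ValueError(f"unknown tool {name!r}")  # e.g. mcp_amazon_get_orders
    return {"name": name, "arguments": args}
```

Rejecting an unknown name rather than guessing lets the host send an error back in a <tool_response>, giving the model a chance to retry via a catalog tool such as web_open.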

License

Apache 2.0, inherited from Qwen/Qwen3-1.7B.


Acknowledgments

  • Built on Qwen/Qwen3-1.7B by Alibaba.
  • Trained via unsloth + TRL on Modal Labs.
  • Quantized via llama.cpp.
  • For Clawdia, a personal on-device assistant for macOS.