tool_call_validator_zh

Traditional Chinese tool-call validator (guardrail): a LoRA fine-tune of Qwen2.5-3B-Instruct

Overview

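This is a Traditional Chinese model fine-tuned for tool-call validation. It was trained with LoRA on top of Qwen/Qwen2.5-3B-Instruct and can:

Read a user prompt together with the descriptions of several candidate tools
Select the best-fitting tool through semantic matching, or abstain when no candidate fits
Emit structured reasoning alongside the decision (intent identification, keyword signals, and a conclusion)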

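It is designed to run as an independent validator in parallel with the serving model: whenever the serving model makes a tool-call decision, this guardrail produces its own independent judgment, which a downstream decision mechanism (a human reviewer or arbitration logic) can use as a reference.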
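The downstream arbitration step is left open by this card. As one illustration only, a hypothetical policy could compare the serving model's chosen tool with the guardrail's verdict. The function name `arbitrate` and the accept / review / block outcomes are assumptions for this sketch, not part of the model or its released code:

```python
from typing import Optional


def arbitrate(serving_tool: Optional[str], guard: dict) -> str:
    """HYPOTHETICAL policy: combine the serving model's tool call with the
    guardrail output. Returns "accept", "review", or "block"."""
    signal = guard.get("signal")
    confidence = guard.get("confidence")
    guard_tool = guard.get("selected_tool")

    if signal == "commit" and guard_tool == serving_tool:
        # Both models picked the same tool.
        return "accept"
    if signal == "abstain" and serving_tool is None:
        # Both models agree that no tool fits.
        return "accept"
    if confidence == "high":
        # Confident disagreement: stop the call.
        return "block"
    # Medium/low-confidence disagreement: escalate to a human reviewer.
    return "review"
```

Keying the block/review split on `confidence` is one design choice among many; a stricter deployment might route every disagreement to review.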

Output Format

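{
  "reasoning": {
    "intent_summary": "<30-60 Chinese characters: the identified user intent>",
    "key_signals": "<20-40 Chinese characters: keywords and semantic signals in the user request>",
    "conclusion": "<30-60 Chinese characters: why X was selected, or why no tool matches>"
  },
  "selected_tool": "<a candidate tool name, or null when abstaining>",
  "signal": "commit | abstain",
  "confidence": "high | medium | low"
}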

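Field	Description
`selected_tool`	On commit, always one of the candidate tools; on abstain, `null`
`signal`	`commit` (a tool was clearly selected) / `abstain` (no suitable tool in the candidate list)
`confidence`	`high` / `medium` / `low`, the model's self-assessed strength of its decision
`reasoning.intent_summary`	A concise description of the user's intent
`reasoning.key_signals`	The keywords / semantic signals that drove the decision
`reasoning.conclusion`	The specific reason for selecting (or rejecting) a tool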

Performance (three-tier evaluation)

Three tiers are compared on the 100-sample in-distribution holdout:

Metric	L1 base	L2 adapter	L3 +Filter
Format Validity	100.0%	100.0%	100.0%
Tool Accuracy	57.0%	100.0%	100.0%
Signal Accuracy	73.0%	100.0%	100.0%
Confidence Accuracy	48.0%	99.0%	99.0%
False Alarm Rate	0.0%	0.0%	0.0%
Miss Rate	40.9%	0.0%	0.0%

L1 base: base Qwen2.5-3B-Instruct (no fine-tuning, no filter)
L2 adapter: LoRA adapter applied, no filter
L3 adapter + Filter: LoRA adapter + schema validation + provenance check
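The card does not spell out how the six metrics are computed. The sketch below uses ASSUMED definitions: the three accuracies are exact matches against a labeled holdout record; Miss Rate is the share of gold-`commit` samples on which the model abstains (the "should commit but abstains" case); False Alarm Rate is the share of gold-`abstain` samples on which the model commits. The function name `evaluate` is hypothetical.

```python
def evaluate(preds: list, golds: list) -> dict:
    """Compute the six holdout metrics under ASSUMED definitions.

    preds[i] is the parsed model output (None if parsing failed);
    golds[i] is the labeled record with the same keys.
    """
    n = len(golds)
    fmt_ok = sum(p is not None for p in preds)
    # Treat an unparseable output as an abstain with no tool and no confidence.
    empty = {"selected_tool": None, "signal": "abstain", "confidence": None}
    preds = [p if p is not None else empty for p in preds]

    def rate(hits: int, total: int) -> float:
        return hits / total if total else 0.0

    def matches(key: str) -> int:
        return sum(p.get(key) == g.get(key) for p, g in zip(preds, golds))

    commit_idx = [i for i, g in enumerate(golds) if g["signal"] == "commit"]
    abstain_idx = [i for i, g in enumerate(golds) if g["signal"] == "abstain"]
    false_alarms = sum(preds[i].get("signal") == "commit" for i in abstain_idx)
    misses = sum(preds[i].get("signal") == "abstain" for i in commit_idx)
    return {
        "format_validity": rate(fmt_ok, n),
        "tool_accuracy": rate(matches("selected_tool"), n),
        "signal_accuracy": rate(matches("signal"), n),
        "confidence_accuracy": rate(matches("confidence"), n),
        # Gold says abstain, model committed anyway.
        "false_alarm_rate": rate(false_alarms, len(abstain_idx)),
        # Gold says commit, model abstained.
        "miss_rate": rate(misses, len(commit_idx)),
    }
```

Note that under these definitions Miss Rate and False Alarm Rate use different denominators from the accuracies, which is why a 27-point signal-accuracy gap can correspond to a 40.9% miss rate.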

Three key findings

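Fine-tuning adds +27 to +51 percentage points (L1 → L2): the base model is overly conservative (miss rate 40.9%: it abstains when it should commit), and its confidence levels are close to guesswork (48%). Fine-tuning corrects all of these on the holdout.
The filter contributes nothing (L2 ≡ L3): the same behavior observed in memory_2 IC Firewall. After fine-tuning, outputs contain no format errors and selected_tool is always in the candidate list. The filter is still kept as a safety net for OOD inputs.
Confidence is the dimension that gains the most from fine-tuning (+51 points): the base model cannot assign high/medium/low levels that match the labels, while the fine-tuned model matches them 99% of the time.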

Quick Start

import json
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "Qwen/Qwen2.5-3B-Instruct"
adapter = "GOSHUNCLE/tool_call_validator_zh"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter)
model.eval()

# Abbreviated placeholder: replace with the full system prompt from inference.py
SYSTEM_PROMPT = """你是工具選擇守門員（Tool Selection Guardrail）。
（完整 system prompt 見 inference.py）"""

def detect(user_prompt: str, tools: list) -> dict:
    tools_block = "\n".join(f"{i+1}. {t['name']}: {t['description']}"
                              for i, t in enumerate(tools))
    user_msg = f"使用者請求：\n{user_prompt}\n\n候選工具：\n{tools_block}"
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_msg},
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.inference_mode():
        outputs = model.generate(**inputs, max_new_tokens=384, do_sample=False,
                                  pad_token_id=tokenizer.pad_token_id)
    text = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
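    # Slice out the JSON object between the first "{" and the last "}"
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError(f"No JSON object in model output: {text!r}")
    return json.loads(text[start:end+1])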

# Example
result = detect(
    user_prompt="請幫我查一下今天台北的 PM2.5 空氣品質指數。",
    tools=[
        {"name": "web_search", "description": "透過搜尋引擎即時取得網路上最新資訊"},
        {"name": "calendar_view", "description": "查看使用者的行事曆"},
        {"name": "calculator", "description": "進行數值與數學運算"},
    ],
)
print(json.dumps(result, ensure_ascii=False, indent=2))
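The `find`/`rfind` slice in `detect()` assumes the output holds exactly one JSON object. If the model appends trailing text that itself contains a brace, the slice spans too much and `json.loads` fails. A more defensive variant, sketched here with the standard library's `json.JSONDecoder.raw_decode` (the helper name `extract_first_json` is hypothetical), parses only the first complete object:

```python
import json


def extract_first_json(text: str) -> dict:
    """Return the first complete JSON object in `text`, ignoring what follows."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one value starting at `start` and stops there.
            obj, _end = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        # Not a valid object at this brace; try the next one.
        start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")
```

Inside `detect()`, the slicing lines could then be replaced with `return extract_first_json(text)`.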

Inference Safeguards

Although L2 ≡ L3 shows that the filter never triggered on in-distribution data, we recommend keeping the following safety layers in production:

Filter 1: Schema Validation

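Checks that the model's JSON output has the expected structure:

signal must be commit or abstain
confidence must be high / medium / low
reasoning must contain all three fields (intent_summary, key_signals, conclusion)
selected_tool must not be null when signal is commit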

On invalid output, fall back to: {"signal": "abstain", "confidence": "low", "selected_tool": null}
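The schema rules map directly onto a small check. This is a minimal sketch, not the reference implementation in inference.py; the names `validate_schema`, `FALLBACK`, and `REASONING_KEYS` are hypothetical:

```python
# Fallback returned for any output that fails validation.
FALLBACK = {"signal": "abstain", "confidence": "low", "selected_tool": None}
REASONING_KEYS = ("intent_summary", "key_signals", "conclusion")


def validate_schema(output: object) -> dict:
    """Return `output` unchanged if it matches the schema, else the fallback."""
    if not isinstance(output, dict):
        return dict(FALLBACK)
    reasoning = output.get("reasoning")
    ok = (
        output.get("signal") in ("commit", "abstain")
        and output.get("confidence") in ("high", "medium", "low")
        and isinstance(reasoning, dict)
        and all(key in reasoning for key in REASONING_KEYS)
        # A commit must name a tool.
        and not (output["signal"] == "commit" and output.get("selected_tool") is None)
    )
    return output if ok else dict(FALLBACK)
```

Returning a fresh copy of the fallback keeps callers from accidentally mutating the shared constant.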

Filter 2: Provenance Check

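Checks that, on commit, selected_tool appears verbatim in the input candidate list. If it does not, fall back to abstain. This layer prevents the model from hallucinating nonexistent tool names on OOD inputs.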
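A minimal sketch of the provenance check, assuming candidates are passed as the same list of `{"name": ..., "description": ...}` dicts used in the Quick Start; the function name `check_provenance` is hypothetical. "Verbatim" is taken to mean exact, case-sensitive string equality:

```python
FALLBACK = {"signal": "abstain", "confidence": "low", "selected_tool": None}


def check_provenance(output: dict, tools: list) -> dict:
    """Fall back to abstain if a committed tool is not an exact candidate name."""
    names = {t["name"] for t in tools}
    selected = output.get("selected_tool")
    if output.get("signal") == "commit" and not (isinstance(selected, str) and selected in names):
        return dict(FALLBACK)
    return output
```

Running it after schema validation means it only ever sees well-formed outputs.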

See inference.py for the full implementation.

Limitations

Limitation A: In-distribution holdout

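The training data and the holdout share the same templates and slot pool, so the 100% hit rate reflects in-distribution performance only. Generalization to real-world colloquial phrasing (OOD) has not been tested. In practice, rely on the confidence signal and the filters as a safety net.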

Limitation B: Limited to 8 tools

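The training data covers only 8 synthetic, fictional tools (web_search / knowledge_qa / news_lookup / fact_check / translator / calculator / calendar_view / summarizer); scenarios beyond these 8 tools have not been validated. By design, however, the model should be able to match semantically against any tool description, because descriptions were inserted into the prompt dynamically during training.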

Limitation C: Reasoning is written in formal Chinese

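The reasoning in the training samples leans toward a formal, translation-like written style, so it may read somewhat stiffly for highly colloquial inputs.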

Deployment Notes

Gradio + huggingface_hub compatibility shim

If you integrate this model into a Gradio app (including an HF Space), add the following monkey-patch before import gradio to avoid ImportError: cannot import name 'HfFolder' from 'huggingface_hub':

# === Compat shim: huggingface_hub >= 1.0 removed HfFolder, but gradio (4.x and 5.x) still uses it ===
import huggingface_hub as _hf_hub
if not hasattr(_hf_hub, "HfFolder"):
    class _HfFolderShim:
        @staticmethod
        def get_token():
            try: return _hf_hub.get_token()
            except Exception: return None
        @staticmethod
        def save_token(token):
            try: _hf_hub.login(token=token)
            except Exception: pass
        @staticmethod
        def delete_token():
            try: _hf_hub.logout()
            except Exception: pass
    _hf_hub.HfFolder = _HfFolderShim

import gradio as gr  # safe now

See the Demo Space app.py for a complete example.

Recommended deployment platforms

Platform	Latency / sample	Use case
HF free CPU Space (2 vCPU, 16 GB)	90-180 s	Demo / validation
HF T4 GPU Space (~$0.40/hr)	1-3 s	Light production
Local NVIDIA GPU (RTX 3060+)	1-2 s	Self-host
Local CPU (Intel Core Ultra 7+)	30-60 s	Offline batch

GGUF quantization (not implemented; v2 backlog)

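For faster CPU inference, consider merging the LoRA adapter into the base model and converting the result to GGUF Q4; CPU inference is estimated to drop to roughly 5-10 s per sample.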

Disclaimer

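The tool names in the training data (the 8 tools such as web_search) are synthetic and fictional, used to demonstrate the methodology. All slot-pool content, such as stock tickers, people, and places, consists of examples drawn from public information and implies no commercial relationship.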

English

This is a LoRA fine-tune of Qwen2.5-3B-Instruct for Traditional Chinese tool-call validation (guardrail). The model:

Reads a user prompt and a list of candidate tools (with descriptions)
Selects the most appropriate tool via semantic matching, or abstains if none is suitable
Outputs structured reasoning (intent summary, key signals, conclusion)

It is designed to run as an independent validator in parallel with a serving LLM that produces actual tool calls. The guardrail's output serves as a reference for downstream arbitration (human review or programmatic logic).

Performance Summary

Metric	L1 base	L2 adapter	L3 +Filter
Format Validity	100.0%	100.0%	100.0%
Tool Accuracy	57.0%	100.0%	100.0%
Signal Accuracy	73.0%	100.0%	100.0%
Confidence Accuracy	48.0%	99.0%	99.0%
False Alarm Rate	0.0%	0.0%	0.0%
Miss Rate	40.9%	0.0%	0.0%

The base Qwen2.5-3B-Instruct achieves 57% tool accuracy and 48% confidence accuracy. After LoRA fine-tuning on 600 synthetic samples (Traditional Chinese), the model reaches 100% tool accuracy and 99% confidence accuracy on the in-distribution holdout. The two-layer post-processing filter (Schema + Provenance) is retained as a safety net for out-of-distribution inputs.

Training Details

Item	Value
Base model	Qwen/Qwen2.5-3B-Instruct
Method	LoRA (r=16, alpha=32, dropout=0.05)
Target modules	q_proj, k_proj, v_proj, o_proj
Training data	600 synthetic samples (Traditional Chinese)
Validation data	100 in-distribution holdout samples
Epochs	3
Batch size	2 × grad_accum 4 (effective 8)
Learning rate	2e-4 (cosine schedule, warmup 5%)
Max length	1024
Hardware	Google Colab T4 (15 GB VRAM, fp16)
Training time	~4.4 hours
Best eval_loss	0.0051
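To put the table's settings in perspective, the sketch below derives the optimizer-step count and a linear-warmup + cosine-decay learning-rate curve. It ASSUMES a single GPU, no dropped batches, and warmup computed as ceil(total_steps × 0.05), which to our understanding matches how Hugging Face `TrainingArguments.warmup_ratio` is applied; the helper `schedule` is hypothetical and not the training script. Under these assumptions the run takes 225 optimizer steps, 12 of them warmup.

```python
import math


def schedule(num_samples=600, per_device_batch=2, grad_accum=4, epochs=3,
             warmup_ratio=0.05, peak_lr=2e-4):
    """Step count and LR curve implied by the Training Details table (assumptions above)."""
    micro_batches_per_epoch = math.ceil(num_samples / per_device_batch)  # 300
    steps_per_epoch = micro_batches_per_epoch // grad_accum              # 75
    total_steps = steps_per_epoch * epochs                               # 225
    warmup_steps = math.ceil(total_steps * warmup_ratio)                 # 12

    def lr_at(step: int) -> float:
        if step < warmup_steps:
            # Linear warmup from 0 up to peak_lr.
            return peak_lr * step / max(1, warmup_steps)
        # Cosine decay from peak_lr down to 0 over the remaining steps.
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))

    return total_steps, warmup_steps, lr_at
```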

Deployment Notes

Gradio compatibility shim

If you integrate this model into a Gradio app (including HF Spaces), add this monkey-patch before import gradio to avoid ImportError: cannot import name 'HfFolder' from 'huggingface_hub':

# Compat shim: huggingface_hub >= 1.0 removed HfFolder, but gradio (4.x and 5.x) still imports it
import huggingface_hub as _hf_hub
if not hasattr(_hf_hub, "HfFolder"):
    class _HfFolderShim:
        @staticmethod
        def get_token():
            try: return _hf_hub.get_token()
            except Exception: return None
        @staticmethod
        def save_token(token):
            try: _hf_hub.login(token=token)
            except Exception: pass
        @staticmethod
        def delete_token():
            try: _hf_hub.logout()
            except Exception: pass
    _hf_hub.HfFolder = _HfFolderShim

import gradio as gr  # safe now

See full example in Demo Space app.py.

Inference latency by platform

Platform	Latency / sample	Use case
HF free CPU Space (2 vCPU, 16 GB)	90-180 s	Demo / validation
HF T4 GPU Space (~$0.40/hr)	1-3 s	Light production
Local NVIDIA GPU (RTX 3060+)	1-2 s	Self-host
Local CPU (Intel Core Ultra 7+)	30-60 s	Offline batch

Methodology Inheritance

This model inherits the methodology from GOSHUNCLE/ic_content_firewall_zh (IC design industry content firewall):

Dual-track data synthesis (handwritten seed + template-based expansion)
Three-tier evaluation design (base / adapter / adapter+filter)
Filter philosophy (Schema validation + Provenance check as a minimal, sufficient set)
Open-source minimal disclosure strategy

License

Apache 2.0. See LICENSE.

Citation

If this model contributes to your research or product, please cite:

@misc{tool_call_validator_zh_2026,
  author = {GOSHUNCLE},
  title  = {tool_call_validator_zh: Traditional Chinese Tool Call Validator (LoRA fine-tune of Qwen2.5-3B)},
  year   = {2026},
  url    = {https://huggingface.co/GOSHUNCLE/tool_call_validator_zh},
}
