# M21: Tool Calls (LLM Tool-Use)

**Spec version:** v1.0 (Phase 2)

**Depends on:** M03 (bus), M04 (LLM, extended), X06 (WebSocket, for in-stream tool loops), X04 (config), X03 (observability)

**Depended on by:** M08 UI (ask tab gains tool-augmented mode), M22 mobile (same), any future agent applications

---

## 1. Responsibility

Let the LLM call other HearthNet capabilities mid-generation. Specifically:

- Declare `tools` in the `llm.chat@2.0` request
- Receive `tool_call_delta` and `tool_call` stream frames from the LLM
- Execute each tool call against the bus
- Feed results back into the LLM via `tool_result` (WebSocket) or via a follow-up `llm.chat` call (SSE)
- Provide `llm.tools.call@1.0` as a convenience that wraps the bus dispatch

This module is **a protocol + helper**, not a service in the usual sense. The actual LLM work lives in M04; M21 documents how the tool flow is structured and provides utilities both sides need.

---

## 2. File layout

```
hearthnet/services/llm/
└── tools.py          # ToolDefinition, ToolCall, ToolResult, ToolExecutor

hearthnet/services/auth/   (already in M16)
    # the auth service also registers llm.tools.call@1.0 as a wrapper capability
```

`tools.py` is small (~250 LOC). It lives inside the LLM service package because tool usage is intrinsic to chat completion.

---

## 3. Public API

### 3.1 Data types

```python
# hearthnet/services/llm/tools.py

from dataclasses import dataclass
from typing import AsyncIterator, Awaitable, Callable


@dataclass(frozen=True)
class ToolDefinition:
    """A tool the caller offers to the LLM. Translated by the LLM backend
    into its native tool format."""
    name: str                               # short identifier visible to the LLM
    description: str                        # human-readable, drives LLM selection
    parameters_schema: dict                 # JSON Schema for arguments
    bound_capability: str | None            # if set, ToolExecutor dispatches via bus
    bound_version: tuple[int, int] | None
    side_effects: bool                      # "is this a write?"
    # (side_effects affects retry semantics)


@dataclass(frozen=True)
class ToolCall:
    """A request from the LLM to execute a tool."""
    id: str                                 # opaque, generated by the LLM
    name: str
    arguments: dict                         # validated against parameters_schema


@dataclass(frozen=True)
class ToolResult:
    """The result of executing a ToolCall, fed back to the LLM."""
    tool_call_id: str
    name: str
    content: str | dict                     # serialisable; if dict, becomes JSON
    is_error: bool
```

### 3.2 `ToolExecutor`

```python
class ToolExecutor:
    """Wraps the orchestration loop: forward LLM tool calls to the bus,
    collect results, re-inject into the LLM."""

    def __init__(
        self,
        bus: CapabilityBus,
        tools: list[ToolDefinition],
        *,
        max_iterations: int = 6,
        per_tool_timeout_seconds: int = 30,
    ): ...

    @property
    def native_definitions(self) -> list[dict]:
        """Returns tools in the request schema's format (CAP2 §4.23 input.tools)."""

    async def dispatch(self, call: ToolCall) -> ToolResult:
        """Validate call.arguments against the tool's parameters_schema.
        If bound_capability: bus.call(bound_capability, bound_version, {input: call.arguments}).
        Returns ToolResult. Catches and surfaces errors as is_error=True."""

    async def run_chat_with_tools(
        self,
        chat_request_body: dict,
        *,
        stream_to: Callable[[dict], Awaitable[None]] | None = None,
    ) -> dict:
        """Orchestrator helper. Loops:
          1. call bus.stream("llm.chat", (2, 0), body)
          2. accumulate text + tool_call frames
          3. on each complete tool_call frame: dispatch, append tool_result message
          4. re-call llm.chat with extended messages
          5. stop when no more tool calls OR max_iterations reached
        Returns the final assistant message."""
```

### 3.3 Wire-level frames (recap of CAP2 §4.23 and §5.1 with the tool flow)

LLM emits:

```
event: token
data: {"text":"I'll search "}

event: tool_call_delta
data: {"id":"tc_1","name":"rag.query","arguments_delta":"{\"query\":\""}

event: tool_call_delta
data: {"id":"tc_1","arguments_delta":"Regenwasser\""}

event: tool_call
data: {"id":"tc_1","name":"rag.query","arguments":{"query":"Regenwasser","corpus":"niederrhein-emergency"}}
```

Caller dispatches and replies, either over WebSocket or by re-calling `llm.chat` with the tool result added to the messages.

WebSocket:

```
client → {"type":"tool_result","tool_call_id":"tc_1","body":{"chunks":[...]}}
```

SSE fallback:

```
caller re-calls llm.chat with messages = original_messages + [
  {"role":"assistant","content":"...","tool_calls":[{"id":"tc_1","name":"rag.query","arguments":{...}}]},
  {"role":"tool","tool_call_id":"tc_1","content":""}
]
```

In the SSE fallback, the `content` field of the `tool` message carries the serialised tool result (left empty in this example).

Both paths converge: the LLM continues and eventually emits `done`.

---

## 4. Behaviour

### 4.1 Tool selection heuristics

The LLM picks tools based on:

- Tool descriptions (descriptive English/German helps)
- `tool_choice` parameter:
  - `"auto"` (default): LLM decides
  - `"none"`: forbid tool use even if tools are declared
  - `"required"`: must call at least one tool
  - `{"name":"rag.query"}`: must call specifically this tool

Backends translate these to their native API.

### 4.2 Built-in tools

When `ToolExecutor` is instantiated by the UI, it can auto-include a set of standard tools bound to common bus capabilities:

| Tool name | Bound to | Use case |
|-----------|----------|----------|
| `search_corpus` | `rag.query@1.0` | Search a corpus |
| `list_corpora` | `rag.list_corpora@1.0` | What's available |
| `translate` | `trans.text@1.0` | Translate snippets |
| `find_neighbour` | (custom: list peers in current community) | "Wer ist da?" ("Who is there?") |
| `list_marketplace` | `market.list@1.0` | Active posts |
| `describe_image` | `img.describe@1.0` | Inspect uploaded images |
| `transcribe_audio` | `stt.transcribe@1.0` | Chained voice input |

These are *suggested defaults*. Real applications pick what fits.

### 4.3 Validation

`ToolExecutor.dispatch` validates `call.arguments` against `parameters_schema` before calling the bus. Invalid arguments yield `ToolResult(is_error=True, content="invalid_arguments: ...")`. The LLM sees the error and typically self-corrects.

### 4.4 Iteration limits

`max_iterations` (default 6) prevents runaway tool loops. Once the limit is reached, `ToolExecutor.run_chat_with_tools` injects a final `tool` message saying "iteration limit reached; finalise your answer" and forces `tool_choice="none"` on the next call.

### 4.5 Side-effect tools

Tools with `side_effects=True` (like `market.post`, `chat.send`) require explicit confirmation. By default, `ToolExecutor` raises `ToolError("requires_confirmation")` on side-effect calls, expecting the orchestrator (UI) to present a confirmation dialog.

UI flow:

```
LLM emits tool_call to market.post
ToolExecutor sees side_effects=True
emits a 'confirmation_required' frame upstream
UI shows "Allow LLM to post this?"
user clicks yes → orchestrator calls ToolExecutor.dispatch_confirmed(call)
```

### 4.6 Parallel tool calls

Some LLMs (e.g. Claude, GPT-4) can emit multiple `tool_call` frames in one turn. `ToolExecutor` dispatches them in parallel (bounded by `max_concurrent=4`; see `config.llm.tools_max_parallel` in §6). Results are submitted together in the next LLM turn.

### 4.7 Tool call composition (tools that call tools)

A `bound_capability` may itself be `llm.tools.call@1.0`. This allows defining higher-level tools as compositions of bus capabilities + LLM reasoning. The recursion depth is capped at `max_iterations`.

### 4.8 Trust and tokens

Tool dispatch goes through the bus and inherits the caller's trust level. The LLM cannot escalate privileges by emitting a tool call: the tool runs with the caller's permissions.
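
The dispatch rules from §4.3 and §4.5, together with the per-tool timeout, can be illustrated with a minimal sketch. It is not the `ToolExecutor` implementation: `Tool`, `Result`, `BusCall`, `required_args` and the `confirmed` flag are hypothetical stand-ins, and the required-key check only approximates real JSON Schema validation. The bus is modelled as a plain async callable, so the call runs with whatever context the caller supplied.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable

# Hypothetical stand-in for the bus: (capability, version, body) -> result.
BusCall = Callable[[str, tuple, dict], Awaitable[Any]]


class ToolError(Exception):
    """Raised when a call needs the orchestrator's attention."""


@dataclass(frozen=True)
class Tool:  # trimmed-down ToolDefinition (assumption)
    name: str
    required_args: tuple  # stand-in for parameters_schema
    bound_capability: str
    bound_version: tuple
    side_effects: bool = False


@dataclass(frozen=True)
class Result:  # trimmed-down ToolResult (assumption)
    tool_call_id: str
    name: str
    content: Any
    is_error: bool


async def dispatch(tool: Tool, call_id: str, arguments: dict, bus_call: BusCall,
                   *, confirmed: bool = False, timeout_seconds: float = 30) -> Result:
    """Validate, gate side effects, then call the bus with a timeout."""
    # §4.3: invalid arguments become an error result the LLM can react to.
    missing = [key for key in tool.required_args if key not in arguments]
    if missing:
        return Result(call_id, tool.name, f"invalid_arguments: missing {missing}", True)

    # §4.5: side-effect tools are refused until the user has confirmed.
    if tool.side_effects and not confirmed:
        raise ToolError("requires_confirmation")

    try:
        body = await asyncio.wait_for(
            bus_call(tool.bound_capability, tool.bound_version, {"input": arguments}),
            timeout=timeout_seconds,
        )
    except asyncio.TimeoutError:
        return Result(call_id, tool.name, "timeout", True)
    except Exception as exc:  # surface bus failures instead of crashing the loop
        return Result(call_id, tool.name, f"internal_error: {exc}", True)
    return Result(call_id, tool.name, body, False)
```

In this sketch an orchestrator would catch `ToolError`, show the confirmation dialog, and retry with `confirmed=True`, which plays the role of `dispatch_confirmed` in the UI flow.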
For cross-community tool calls, the caller must hold an appropriate token (M16).

### 4.9 LLM backend translation

Backends translate `ToolDefinition` to their native protocol:

| Backend | Native format |
|---------|--------------|
| `AnthropicApiBackend` | Anthropic Messages tools |
| `OpenAiApiBackend` | OpenAI function calling |
| `OllamaBackend` (some models) | Ollama tool calls |
| `LlamaCppBackend` (with grammar) | JSON-Schema grammar constraint |
| `MinicpmVBackend` | MiniCPM tool format |
| `NemotronBackend` | OpenAI-compatible |
| `OpenBmbBackend` | OpenAI-compatible |
| Others | Tools ignored; backend emits a notice on `tool_choice="required"` |

This translation lives inside each backend's `chat()` method.

---

## 5. `llm.tools.call@1.0` capability

Convenience wrapper. Used when a caller wants to invoke a bus capability as if it were a tool result, without going through the LLM.

The capability is already specified in [CAP2 §4.24](../CAPABILITY_CONTRACT_v2.md). The handler lives in `M04.LlmService.handle_tools_call`:

```python
async def handle_tools_call(self, req: RouteRequest) -> dict:
    """1. Validate target_body against target_capability's request schema (via bus.schema)
       2. bus.call(target_capability, target_version, target_body)
       3. Return result"""
```

Mostly used by orchestrators that want a single audit-trail capability for "tool execution".

---

## 6. Configuration

```python
config.llm.tools_enabled = True
config.llm.tools_max_iterations = 6
config.llm.tools_per_tool_timeout_seconds = 30
config.llm.tools_max_parallel = 4
config.llm.tools_default_set = ["search_corpus", "list_corpora", "translate", "list_marketplace"]
config.llm.tools_require_confirmation_for_side_effects = True
```

---

## 7. Errors

| Condition | Wire code |
|-----------|-----------|
| Tool not in declared set | `bad_request` |
| Tool arguments fail schema | `bad_request` |
| Tool execution timed out | `timeout` |
| Tool returned `internal_error` | propagated as `internal_error` |
| Iteration limit reached | (graceful: final answer forced) |
| Caller's token doesn't cover bound capability | `token_scope_insufficient` |

---

## 8. Tests

### Unit

- `test_tool_definition_to_native_format` (per backend)
- `test_dispatch_validates_arguments`
- `test_side_effect_tool_requires_confirmation`
- `test_iteration_limit_forces_finalisation`
- `test_parallel_tool_calls_collected`

### Integration

- `test_search_corpus_tool_used_for_grounded_answer`: LLM is asked a question, calls `rag.query`, answers
- `test_translate_chain`: user types in DE, LLM uses the `trans.text` tool internally
- `test_market_post_requires_confirmation`
- `test_recursive_tool_call_limited_by_max_iterations`

### Manual

- Confirm Anthropic Claude, OpenAI GPT-4, Ollama Mistral, MiniCPM-V all produce well-formed `tool_call` frames on the same test prompt.

---

## 9. Cross-references

| What | Where |
|------|-------|
| `llm.chat@2.0` tools field | [CAP2 §4.23](../CAPABILITY_CONTRACT_v2.md) |
| Tool-call stream frames | [CAP2 §5.1, X06 §6.6](../cross-cutting/X06-websocket.md) |
| `llm.tools.call@1.0` | [CAP2 §4.24](../CAPABILITY_CONTRACT_v2.md) |
| M04 backend extensions | M04 (extended in Phase 2) |
| Token scope for cross-community tool dispatch | [M16 §5.2](M16-tokens.md) |
| Confirmation UI hook | M08 ext |

---

## 10. Open questions

1. **Tool result streaming.** Currently a tool result is atomic. For long-running tool calls (e.g. `img.generate`), the LLM has to wait. Phase 2.5 may stream tool progress back.
2. **Tool memoisation.** Repeated `search_corpus(q)` in one chat could be cached. Defer.
3. **Tool authority lineage.** When a tool is called by an LLM running on Node A on behalf of User on Node B, which token does the tool inherit?
   Currently the user's. This may be insufficient for federation. Phase 2.5.
4. **Tool calls that issue tokens.** Could a tool be "issue me a token for capability X"? Probably yes; specify carefully to avoid privilege escalation. Defer.
5. **Tool selection telemetry.** Which tools does the LLM actually pick? Useful for tuning descriptions. Log to trace ring buffer; surface in observability dashboard.
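
For open question 5, one possible shape of a tool-selection trace is sketched below. It is an illustration only: `ToolSelectionEvent`, `ToolSelectionTrace` and their methods are hypothetical names and do not reflect the X03 observability API. The sketch keeps the most recent selections in a fixed-size ring buffer and derives per-tool counts and error rates from it.

```python
import time
from collections import Counter, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSelectionEvent:
    tool_name: str
    is_error: bool
    timestamp: float


class ToolSelectionTrace:
    """Fixed-size ring buffer of recent tool selections (hypothetical helper)."""

    def __init__(self, capacity: int = 1024):
        # deque(maxlen=...) silently drops the oldest entry once full.
        self._events = deque(maxlen=capacity)

    def record(self, tool_name: str, *, is_error: bool = False) -> None:
        """Append one selection; call this after each dispatched tool call."""
        self._events.append(ToolSelectionEvent(tool_name, is_error, time.time()))

    def counts(self) -> Counter:
        """How often each tool appears in the retained window."""
        return Counter(event.tool_name for event in self._events)

    def error_rate(self, tool_name: str) -> float:
        """Share of retained calls to tool_name that ended in an error."""
        hits = [event for event in self._events if event.tool_name == tool_name]
        if not hits:
            return 0.0
        return sum(event.is_error for event in hits) / len(hits)
```

A dashboard could poll `counts()` to see which tool descriptions are rarely chosen, and `error_rate()` to spot tools whose schemas the LLM often gets wrong.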