# M21 — Tool Calls (LLM Tool-Use)

**Spec version:** v1.0 (Phase 2)
**Depends on:** M03 (bus), M04 (LLM, extended), X06 (WebSocket, for in-stream tool loops), X04 (config), X03 (observability)
**Depended on by:** M08 UI (ask tab gains tool-augmented mode), M22 mobile (same), any future agent applications

---

## 1. Responsibility

Let the LLM call other HearthNet capabilities mid-generation. Specifically:

- Declare `tools` in the `llm.chat@2.0` request
- Receive `tool_call_delta` and `tool_call` stream frames from the LLM
- Execute each tool call against the bus
- Feed results back into the LLM via `tool_result` (WebSocket) or via a follow-up `llm.chat` call (SSE)
- Provide `llm.tools.call@1.0` as a convenience that wraps the bus dispatch

This module is **a protocol + helper**, not a service in the usual sense. The actual LLM work lives in M04; M21 documents how the tool flow is structured and provides utilities both sides need.

---

## 2. File layout

```
hearthnet/services/llm/
└── tools.py             # ToolDefinition, ToolCall, ToolResult, ToolExecutor

hearthnet/services/auth/    (already in M16)
# the auth service also registers llm.tools.call@1.0 as a wrapper capability
```

`tools.py` is small (~250 LOC). It lives inside the LLM service package because tool usage is intrinsic to chat completion.

---

## 3. Public API

### 3.1 Data types

```python
# hearthnet/services/llm/tools.py
from dataclasses import dataclass
from typing import AsyncIterator, Awaitable, Callable

@dataclass(frozen=True)
class ToolDefinition:
    """A tool the caller offers to the LLM.
       Translated by the LLM backend into its native tool format."""
    name:               str              # short identifier visible to the LLM
    description:        str              # human-readable, drives LLM selection
    parameters_schema:  dict             # JSON Schema for arguments
    bound_capability:   str | None       # if set, ToolExecutor dispatches via bus
    bound_version:      tuple[int, int] | None
    side_effects:       bool             # "is this a write?" — affects retry semantics

@dataclass(frozen=True)
class ToolCall:
    """A request from the LLM to execute a tool."""
    id:           str                    # opaque, generated by the LLM
    name:         str
    arguments:    dict                   # validated against parameters_schema

@dataclass(frozen=True)
class ToolResult:
    """The result of executing a ToolCall, fed back to the LLM."""
    tool_call_id: str
    name:         str
    content:      str | dict             # serialisable; if dict, becomes JSON
    is_error:     bool
```

### 3.2 `ToolExecutor`

```python
class ToolExecutor:
    """Wraps the orchestration loop: forward LLM tool calls to the bus,
       collect results, re-inject into the LLM."""

    def __init__(
        self,
        bus: CapabilityBus,
        tools: list[ToolDefinition],
        *,
        max_iterations: int = 6,
        per_tool_timeout_seconds: int = 30,
    ):
        ...

    @property
    def native_definitions(self) -> list[dict]:
        """Returns tools in the request schema's format (CAP2 §4.23 input.tools)."""

    async def dispatch(self, call: ToolCall) -> ToolResult:
        """Validate call.arguments against the tool's parameters_schema.
        If bound_capability: bus.call(bound_capability, bound_version, {"input": call.arguments}).
        Returns ToolResult. Catches and surfaces errors as is_error=True."""

    async def run_chat_with_tools(
        self,
        chat_request_body: dict,
        *,
        stream_to: Callable[[dict], Awaitable[None]] | None = None,
    ) -> dict:
        """Orchestrator helper. Loops:
        1. call bus.stream("llm.chat", (2,0), body)
        2. accumulate text + tool_call frames
        3. on each completed tool_call frame: dispatch, append tool_result message
        4. re-call llm.chat with extended messages
        5. stop when no more tool calls OR max_iterations reached
        Returns the final assistant message."""
```

### 3.3 Wire-level frames (recap of CAP2 §4.23 and §5.1 with the tool flow)

LLM emits:

```
event: token
data: {"text":"I'll search "}

event: tool_call_delta
data: {"id":"tc_1","name":"rag.query","arguments_delta":"{\"query\":\""}

event: tool_call_delta
data: {"id":"tc_1","arguments_delta":"Regenwasser\""}

event: tool_call
data: {"id":"tc_1","name":"rag.query","arguments":{"query":"Regenwasser","corpus":"niederrhein-emergency"}}
```
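A caller consuming this stream could fold the frames as sketched below. This is a minimal sketch that assumes frames arrive as `(event, data)` pairs already parsed from SSE or WebSocket, and that the `tool_call` frame carries the authoritative arguments while `tool_call_delta` fragments only serve incremental display. The function name `assemble_tool_calls` is hypothetical.

```python
def assemble_tool_calls(frames):
    """Fold (event, data) frames into text, completed tool calls, and pending deltas."""
    text = ""
    pending = {}    # tool call id -> {"name": ..., "arguments_text": ...}
    complete = []
    for event, data in frames:
        if event == "token":
            text += data["text"]
        elif event == "tool_call_delta":
            entry = pending.setdefault(data["id"], {"name": None, "arguments_text": ""})
            entry["name"] = data.get("name") or entry["name"]
            entry["arguments_text"] += data.get("arguments_delta", "")
        elif event == "tool_call":
            # The completed frame supersedes any buffered fragments for this id.
            pending.pop(data["id"], None)
            complete.append({"id": data["id"], "name": data["name"],
                             "arguments": data["arguments"]})
    return text, complete, pending

frames = [
    ("token", {"text": "I'll search "}),
    ("tool_call_delta", {"id": "tc_1", "name": "rag.query", "arguments_delta": '{"query":"'}),
    ("tool_call_delta", {"id": "tc_1", "arguments_delta": 'Regenwasser"'}),
    ("tool_call", {"id": "tc_1", "name": "rag.query",
                   "arguments": {"query": "Regenwasser", "corpus": "niederrhein-emergency"}}),
]
text, complete, pending = assemble_tool_calls(frames)
```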

Caller dispatches and replies (over WebSocket OR by re-calling `llm.chat` with the tool result added to messages):

WebSocket:
```
client → {"type":"tool_result","tool_call_id":"tc_1","body":{"chunks":[...]}}
```

SSE fallback:
```
caller re-calls llm.chat with messages = original_messages + [
    {"role":"assistant","content":"...","tool_calls":[{"id":"tc_1","name":"rag.query","arguments":{...}}]},
    {"role":"tool","tool_call_id":"tc_1","content":"<JSON of tool result>"}
]
```

Both paths converge: LLM continues and eventually emits `done`.
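For the SSE fallback, the message extension can be sketched as a pure function. The name `extend_messages` and the `(tool_call_id, content)` pair shape are assumptions for illustration. The message dictionaries follow the layout shown above.

```python
import json

def extend_messages(original_messages, assistant_text, tool_calls, results):
    """Return a new list: original messages, the assistant turn, one tool message per result."""
    extended = list(original_messages)
    extended.append({"role": "assistant", "content": assistant_text,
                     "tool_calls": tool_calls})
    for tool_call_id, content in results:
        # Tool content is sent as a string; dict results are serialised to JSON.
        body = content if isinstance(content, str) else json.dumps(content)
        extended.append({"role": "tool", "tool_call_id": tool_call_id, "content": body})
    return extended
```

The original list is left untouched, so a caller can retry the follow-up `llm.chat` call without rebuilding state.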

---

## 4. Behaviour

### 4.1 Tool selection heuristics

The LLM picks tools based on:
- Tool descriptions (descriptive English/German helps)
- `tool_choice` parameter:
  - `"auto"` (default): LLM decides
  - `"none"`: forbid tool use even if tools are declared
  - `"required"`: must call at least one tool
  - `{"name":"rag.query"}`: must call specifically this tool

Backends translate these to their native API.
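Before a request reaches a backend, the `tool_choice` value could be checked along these lines. This is a sketch only: `check_tool_choice` is a hypothetical helper, and rejecting `"required"` when no tools are declared is an assumption rather than something the contract states.

```python
def check_tool_choice(tool_choice, declared_names):
    """Return a normalised tool_choice or raise ValueError for an unusable value."""
    if tool_choice is None:
        return "auto"  # default per the list above
    if tool_choice in ("auto", "none", "required"):
        if tool_choice == "required" and not declared_names:
            raise ValueError("tool_choice 'required' needs at least one declared tool")
        return tool_choice
    if isinstance(tool_choice, dict) and set(tool_choice) == {"name"}:
        if tool_choice["name"] not in declared_names:
            raise ValueError(f"tool '{tool_choice['name']}' is not in the declared set")
        return tool_choice
    raise ValueError(f"unsupported tool_choice: {tool_choice!r}")
```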

### 4.2 Built-in tools

When `ToolExecutor` is instantiated by the UI, it can auto-include a set of standard tools bound to common bus capabilities:

| Tool name | Bound to | Use case |
|-----------|----------|----------|
| `search_corpus` | `rag.query@1.0` | Search a corpus |
| `list_corpora` | `rag.list_corpora@1.0` | What's available |
| `translate` | `trans.text@1.0` | Translate snippets |
| `find_neighbour` | (custom: list peers in current community) | "Who is around?" |
| `list_marketplace` | `market.list@1.0` | Active posts |
| `describe_image` | `img.describe@1.0` | Inspect uploaded images |
| `transcribe_audio` | `stt.transcribe@1.0` | Voice-input chained |

These are *suggested defaults*. Real applications pick what fits.

### 4.3 Validation

`ToolExecutor.dispatch` validates `call.arguments` against `parameters_schema` before calling the bus. Invalid args → `ToolResult(is_error=True, content="invalid_arguments: ...")`. The LLM sees the error and typically self-corrects.
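The validation step might look like the following sketch. It deliberately checks only a small subset of JSON Schema (`required` and top-level property `type`) using the standard library. A real implementation would more likely delegate to a full JSON Schema validator. The helper names and error wording are assumptions.

```python
_TYPES = {"string": str, "integer": int, "number": (int, float),
          "boolean": bool, "object": dict, "array": list}

def _matches(value, type_name):
    # bool is a subclass of int in Python but is not a JSON integer/number.
    if type_name in ("integer", "number") and isinstance(value, bool):
        return False
    expected = _TYPES.get(type_name)
    return expected is None or isinstance(value, expected)

def validation_errors(arguments, schema):
    """Collect human-readable problems; an empty list means the arguments pass."""
    errors = []
    for key in schema.get("required", []):
        if key not in arguments:
            errors.append(f"missing required property '{key}'")
    properties = schema.get("properties", {})
    for key, value in arguments.items():
        spec = properties.get(key)
        if spec is not None and not _matches(value, spec.get("type")):
            errors.append(f"property '{key}' must be of type {spec['type']}")
    return errors

def invalid_arguments_result(tool_call_id, name, errors):
    """Shape of the error result the LLM would see (mirrors ToolResult fields)."""
    return {"tool_call_id": tool_call_id, "name": name, "is_error": True,
            "content": "invalid_arguments: " + "; ".join(errors)}
```

Returning the specific failures in `content` gives the LLM enough detail to correct its next attempt.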

### 4.4 Iteration limits

`max_iterations` (default 6) prevents runaway tool loops. After the limit, `ToolExecutor.run_chat_with_tools` injects a final `tool` message saying "iteration limit reached; finalise your answer" and forces `tool_choice="none"` on the next call.
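The limit logic can be sketched as a loop over injected callables. Here `call_llm(messages, tool_choice)` and `dispatch(call)` are hypothetical stand-ins for the bus calls, and the exact shape of the injected notice message is an assumption.

```python
import asyncio

LIMIT_NOTICE = "iteration limit reached; finalise your answer"

async def run_with_limit(call_llm, dispatch, messages, max_iterations=6):
    """Loop LLM turns until no tool calls remain or the iteration limit is hit.

    call_llm(messages, tool_choice) -> {"content": str, "tool_calls": list}
    dispatch(call) -> str (tool result content)
    """
    messages = list(messages)
    for _ in range(max_iterations):
        reply = await call_llm(messages, "auto")
        if not reply["tool_calls"]:
            return reply  # the model produced a final answer on its own
        messages.append({"role": "assistant", "content": reply["content"],
                         "tool_calls": reply["tool_calls"]})
        for call in reply["tool_calls"]:
            content = await dispatch(call)
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": content})
    # Limit reached: tell the model and forbid further tool use.
    messages.append({"role": "tool", "content": LIMIT_NOTICE})
    return await call_llm(messages, "none")
```

With `max_iterations=3`, a model that keeps requesting tools is called three times with `"auto"` and once more with `"none"`.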

### 4.5 Side-effect tools

Tools declared with `side_effects=True` (such as `market.post` or `chat.send`) require explicit user confirmation. While `config.llm.tools_require_confirmation_for_side_effects` is `True` (the default), `ToolExecutor` raises `ToolError("requires_confirmation")` on side-effect calls and expects the orchestrator (the UI) to present a confirmation dialog.

UI flow:
```
LLM emits tool_call to market.post
ToolExecutor sees side_effects=True
emits a 'confirmation_required' frame upstream
UI shows "Allow LLM to post this?"
user clicks yes → orchestrator calls ToolExecutor.dispatch_confirmed(call)
```
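On the orchestrator side, the flow above might be handled as in this sketch. `GatedExecutor` is a stripped-down stand-in for `ToolExecutor`, `ask_user` is a hypothetical UI hook, and the `declined_by_user` result is an assumption about what the LLM should see when the user says no.

```python
import asyncio

class ToolError(Exception):
    pass

class GatedExecutor:
    """Minimal stand-in for ToolExecutor's confirmation gate."""

    def __init__(self, side_effect_tools, run_tool):
        self._side_effect_tools = set(side_effect_tools)
        self._run_tool = run_tool

    async def dispatch(self, call):
        if call["name"] in self._side_effect_tools:
            raise ToolError("requires_confirmation")
        return await self._run_tool(call)

    async def dispatch_confirmed(self, call):
        # Only reached after the user has explicitly approved the call.
        return await self._run_tool(call)

async def dispatch_with_confirmation(executor, call, ask_user):
    """Try a normal dispatch; on a confirmation request, defer to the user."""
    try:
        return await executor.dispatch(call)
    except ToolError as err:
        if str(err) != "requires_confirmation":
            raise
    if await ask_user(call):
        return await executor.dispatch_confirmed(call)
    return {"is_error": True, "content": "declined_by_user"}
```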

### 4.6 Parallel tool calls

Some LLMs (for example Claude and GPT-4) can emit multiple `tool_call` frames in one turn. `ToolExecutor` dispatches them in parallel, bounded by `config.llm.tools_max_parallel` (default 4), and submits all results together in the next LLM turn.
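Bounded parallel dispatch of this kind is commonly built from `asyncio.Semaphore` and `asyncio.gather`. The sketch below assumes a `dispatch(call)` coroutine and a hypothetical `dispatch_all` helper. It is not taken from the actual `ToolExecutor` code.

```python
import asyncio

async def dispatch_all(calls, dispatch, max_parallel=4):
    """Run dispatch(call) for every call, at most max_parallel at a time.

    Results come back in the same order as `calls`, regardless of finish order.
    """
    semaphore = asyncio.Semaphore(max_parallel)

    async def bounded(call):
        async with semaphore:
            return await dispatch(call)

    return await asyncio.gather(*(bounded(call) for call in calls))
```

Because `gather` preserves input order, each result can be matched to its `tool_call_id` by position as well as by id.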

### 4.7 Tool call composition (tools that call tools)

A `bound_capability` may itself be `llm.tools.call@1.0`. This allows defining higher-level tools as compositions of bus capabilities + LLM reasoning. Recursion limit = `max_iterations`.

### 4.8 Trust and tokens

Tool dispatch goes through the bus with the caller's trust level and permissions, so the LLM cannot escalate privileges by emitting a tool call. For cross-community tool calls, the caller must hold an appropriate token (M16).

### 4.9 LLM backend translation

Backends translate `ToolDefinition` to their native protocol:

| Backend | Native format |
|---------|--------------|
| `AnthropicApiBackend` | Anthropic Messages tools |
| `OpenAiApiBackend` | OpenAI function calling |
| `OllamaBackend` (some models) | Ollama tool calls |
| `LlamaCppBackend` (with grammar) | JSON-Schema grammar constraint |
| `MinicpmVBackend` | MiniCPM tool format |
| `NemotronBackend` | OpenAI-compatible |
| `OpenBmbBackend` | OpenAI-compatible |
| Others | Tools ignored; backend emits a notice on `tool_choice="required"` |

This translation lives inside each backend's `chat()` method.
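For two of the formats in the table, the translation is mostly a matter of renaming fields. The sketch below shows the general shape of an OpenAI function-calling tool entry and an Anthropic Messages tool entry. The function names are hypothetical and the real backends may add further fields or name sanitisation.

```python
def to_openai_tool(tool):
    """Map a ToolDefinition-like dict to an OpenAI function-calling tool entry."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool["description"],
            "parameters": tool["parameters_schema"],
        },
    }

def to_anthropic_tool(tool):
    """Map a ToolDefinition-like dict to an Anthropic Messages tool entry."""
    return {
        "name": tool["name"],
        "description": tool["description"],
        "input_schema": tool["parameters_schema"],
    }

definition = {
    "name": "search_corpus",
    "description": "Search a named corpus and return matching chunks.",
    "parameters_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}
openai_tool = to_openai_tool(definition)
anthropic_tool = to_anthropic_tool(definition)
```

Fields such as `bound_capability` and `side_effects` stay on the HearthNet side and are not sent to the model provider in this sketch.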

---

## 5. `llm.tools.call@1.0` capability

Convenience wrapper, used when a caller wants to invoke a bus capability as if it were a tool result, without going through the LLM. The contract is specified in [CAP2 §4.24](../CAPABILITY_CONTRACT_v2.md).

The handler lives in `M04.LlmService.handle_tools_call`:

```python
async def handle_tools_call(self, req: RouteRequest) -> dict:
    """1. Validate target_body against target_capability's request schema (via bus.schema)
       2. bus.call(target_capability, target_version, target_body)
       3. Return result"""
```

Mostly used by orchestrators that want a single audit-trail capability for "tool execution".

---

## 6. Configuration

```python
config.llm.tools_enabled            = True
config.llm.tools_max_iterations     = 6
config.llm.tools_per_tool_timeout_seconds = 30
config.llm.tools_max_parallel       = 4
config.llm.tools_default_set        = ["search_corpus","list_corpora","translate","list_marketplace"]
config.llm.tools_require_confirmation_for_side_effects = True
```

---

## 7. Errors

| Condition | Wire code |
|-----------|-----------|
| Tool not in declared set | `bad_request` |
| Tool arguments fail schema | `bad_request` |
| Tool execution timed out | `timeout` |
| Tool returned `internal_error` | propagated as `internal_error` |
| Iteration limit reached | (graceful — final answer forced) |
| Caller's token doesn't cover bound capability | `token_scope_insufficient` |
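The timeout and internal-error rows can be illustrated with a small wrapper around `asyncio.wait_for`. This is a sketch under assumptions: `run_tool` stands in for the bus call, and surfacing the wire code as the `content` of an error result is one possible convention, not the specified one.

```python
import asyncio

async def dispatch_with_timeout(run_tool, call, timeout_seconds=30):
    """Run one tool call; convert timeouts and failures into error results."""
    try:
        content = await asyncio.wait_for(run_tool(call), timeout=timeout_seconds)
    except asyncio.TimeoutError:
        return {"tool_call_id": call["id"], "is_error": True, "content": "timeout"}
    except Exception as exc:
        # Any other failure is surfaced to the LLM instead of aborting the chat.
        return {"tool_call_id": call["id"], "is_error": True,
                "content": f"internal_error: {exc}"}
    return {"tool_call_id": call["id"], "is_error": False, "content": content}
```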

---

## 8. Tests

### Unit
- `test_tool_definition_to_native_format` — per backend
- `test_dispatch_validates_arguments`
- `test_side_effect_tool_requires_confirmation`
- `test_iteration_limit_forces_finalisation`
- `test_parallel_tool_calls_collected`

### Integration
- `test_search_corpus_tool_used_for_grounded_answer` — LLM is asked a question, calls rag.query, answers
- `test_translate_chain` — user types in DE, LLM uses trans.text tool internally
- `test_market_post_requires_confirmation`
- `test_recursive_tool_call_limited_by_max_iterations`

### Manual
- Confirm Anthropic Claude, OpenAI GPT-4, Ollama Mistral, MiniCPM-V all produce well-formed tool_call frames on the same test prompt.

---

## 9. Cross-references

| What | Where |
|------|-------|
| `llm.chat@2.0` tools field | [CAP2 §4.23](../CAPABILITY_CONTRACT_v2.md) |
| Tool-call stream frames | [CAP2 §5.1](../CAPABILITY_CONTRACT_v2.md), [X06 §6.6](../cross-cutting/X06-websocket.md) |
| `llm.tools.call@1.0` | [CAP2 §4.24](../CAPABILITY_CONTRACT_v2.md) |
| M04 backend extensions | M04 (extended in Phase 2) |
| Token scope for cross-community tool dispatch | [M16 §5.2](M16-tokens.md) |
| Confirmation UI hook | M08 ext |

---

## 10. Open questions

1. **Tool result streaming.** Currently a tool result is atomic. For long-running tool calls (e.g. `img.generate`), the LLM has to wait. Phase 2.5 may stream tool progress back.
2. **Tool memoisation.** Repeated `search_corpus(q)` in one chat could be cached. Defer.
3. **Tool authority lineage.** When a tool is called by an LLM running on Node A on behalf of a user on Node B, which token does the tool inherit? Currently the user's. This may be insufficient for federation. Phase 2.5.
4. **Tool calls that issue tokens.** Could a tool be "issue me a token for capability X"? Probably yes; specify carefully to avoid privilege escalation. Defer.
5. **Tool selection telemetry.** Which tools does the LLM actually pick? Useful for tuning descriptions. Log to trace ring buffer; surface in observability dashboard.