---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-8B/blob/main/LICENSE
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-8B-Base


### ToolOrchestra — `agentic/ToolOrchestra/`
Our code hub is:https://github.com/LMIS-ORG/slime-agentic/tree/main

Reproduces the core idea of [ToolOrchestra](https://arxiv.org/abs/2511.21689): an **Orchestrator-Expert** multi-agent framework for RL training. A central Orchestrator LLM learns to route tasks to the best specialized expert model and the corresponding tools through multi-turn tool calls. GRPO is applied to the Orchestrator's decision trajectory, enabling it to improve tool-use and routing capabilities without manually annotated intermediate steps.

#### Architecture


```
Input question
  │
  ▼
Orchestrator LLM                        ← Decide which tool to call (loss_mask=1)
  │
  └─► for turn in range(max_turns):
        │
        ├─ parse_tool_call()            ← Parse <tool_call> from model output
        │
        ├─ tool call                    ← Call retrieval / external tool (loss_mask=0)
        │    └─ FAISS retrieval service (port 8000)
        │
        ├─ call_expert ──────────────► Expert LLM routing (loss_mask=0)
        │                               └─ specialist models on separate ports
        │
        └─ answer ──────────────────► Final answer → stop loop
  │
  ▼
GenerationOutput
  - token_ids + log_probs  (all turns concatenated)
  - loss_mask: Orchestrator output = 1 / tool result = 0
```

#### Results

| Model | Dataset | Baseline (Qwen3-8B) | ToolOrchestra (Ours) | Improvement |
|---|---|---|---|---|
| Qwen3-8B | τ²-Bench | 0.278 | 0.388 | +0.110 |