---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-8B/blob/main/LICENSE
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-8B-Base
---

### ToolOrchestra — `agentic/ToolOrchestra/`

Code: https://github.com/LMIS-ORG/slime-agentic/tree/main

Reproduces the core idea of [ToolOrchestra](https://arxiv.org/abs/2511.21689): an **Orchestrator-Expert** multi-agent framework for RL training. A central Orchestrator LLM learns, through multi-turn tool calls, to route each task to the most suitable specialized expert model and its corresponding tools. GRPO is applied to the Orchestrator's decision trajectory, so the Orchestrator improves its tool-use and routing capabilities without manually annotated intermediate steps.

#### Architecture

```
Input question
    │
    ▼
Orchestrator LLM  ← Decide which tool to call (loss_mask=1)
    │
    └─► for turn in range(max_turns):
            │
            ├─ parse_tool_call()  ← Parse from model output
            │
            ├─ tool call  ← Call retrieval / external tool (loss_mask=0)
            │     └─ FAISS retrieval service (port 8000)
            │
            ├─ call_expert ──────────────► Expert LLM routing (loss_mask=0)
            │     └─ specialist models on separate ports
            │
            └─ answer ──────────────────► Final answer → stop loop
                    │
                    ▼
GenerationOutput
  - token_ids + log_probs (all turns concatenated)
  - loss_mask: Orchestrator output = 1 / tool result = 0
```

#### Results

| Model | Dataset | Baseline (Qwen3-8B) | ToolOrchestra (Ours) | Improvement |
|---|---|---|---|---|
| Qwen3-8B | τ²-Bench | 0.278 | 0.388 | +0.110 |
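
#### Sketch: rollout loop and loss mask

The multi-turn loop in the Architecture diagram can be illustrated with a minimal, self-contained sketch. It is not the repository's implementation. The `<tool_call>` tag format, the `generate` / `tokenize` callables, the `tools` registry, and the toy stubs are all assumptions made for illustration; only the names `parse_tool_call`, `GenerationOutput`, `token_ids`, `log_probs`, `loss_mask`, and `max_turns` come from the diagram. The point it shows is that Orchestrator tokens are appended with `loss_mask=1`, while tool or expert results are appended with `loss_mask=0`, so they are excluded from the policy loss.

```python
import json
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Assumed tag format; the real parse_tool_call() may use a different one.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


@dataclass
class GenerationOutput:
    token_ids: List[int] = field(default_factory=list)
    log_probs: List[float] = field(default_factory=list)
    loss_mask: List[int] = field(default_factory=list)
    answer: Optional[str] = None


def parse_tool_call(text: str) -> Optional[dict]:
    """Return {"name": ..., "arguments": {...}}, or None if no valid call is found."""
    match = TOOL_CALL_RE.search(text)
    if match is None:
        return None
    try:
        call = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    return call if isinstance(call, dict) and "name" in call else None


def rollout(
    question: str,
    generate: Callable[[str], Tuple[str, List[int], List[float]]],
    tokenize: Callable[[str], List[int]],
    tools: Dict[str, Callable[[dict], str]],
    max_turns: int = 4,
) -> GenerationOutput:
    out = GenerationOutput()
    context = question
    for _ in range(max_turns):
        # Orchestrator turn: these tokens are trained on (loss_mask = 1).
        text, ids, lps = generate(context)
        out.token_ids += ids
        out.log_probs += lps
        out.loss_mask += [1] * len(ids)
        context += text

        call = parse_tool_call(text)
        if call is None:
            break  # nothing to execute; stop the loop
        args = call.get("arguments", {})
        if call["name"] == "answer":
            out.answer = args.get("content")
            break  # final answer ends the loop

        # Tool / expert result: part of the context, not trained on (loss_mask = 0).
        handler = tools.get(call["name"])
        result = handler(args) if handler else "unknown tool"
        result_ids = tokenize(result)
        out.token_ids += result_ids
        out.log_probs += [0.0] * len(result_ids)  # placeholder values (assumption)
        out.loss_mask += [0] * len(result_ids)
        context += result
    return out


# --- Toy stubs so the sketch runs without a model or a retrieval service ---
SCRIPT = [
    '<tool_call>{"name": "search", "arguments": {"query": "x"}}</tool_call>',
    '<tool_call>{"name": "answer", "arguments": {"content": "42"}}</tool_call>',
]


def toy_tokenize(text: str) -> List[int]:
    return [ord(ch) for ch in text]  # one "token" per character


def demo() -> GenerationOutput:
    turns = iter(SCRIPT)

    def toy_generate(_context: str) -> Tuple[str, List[int], List[float]]:
        text = next(turns)
        ids = toy_tokenize(text)
        return text, ids, [-0.1] * len(ids)

    tools = {"search": lambda args: f"[retrieved: {args['query']}]"}
    return rollout("What is x?", toy_generate, toy_tokenize, tools)
```

In this sketch, a `call_expert` route would simply be another entry in `tools` whose handler forwards the request to a specialist model. Its reply would be masked out in the same way as a retrieval result.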