Igriscodes
/

qwen3-1.7b-tool

 ---
 license: mpl-2.0
+base_model: Qwen/Qwen3-1.7B
+tags:
+- tool-use
+- function-calling
+- reinforcement-learning
+- ppo
+- mcp
+- trl
+- peft
+- low-rank-adaptation
+model_creator: Igriscodes
+pipeline_tag: text-generation
+language:
+- en
+metrics:
+- reward
 ---
+# qwen-tool
+This model is a fine-tuned version of `Qwen/Qwen3-1.7B`, optimized for complex function calling and multi-step tool use via the **Model Context Protocol (MCP)**.
+The model was aligned using **Proximal Policy Optimization (PPO)** in a closed-loop agentic environment. Training used execution-based feedback from an MCP server, with the aim of reducing tool hallucinations, enforcing strict JSON formatting, and teaching the model to self-correct after execution errors.
+## Model Details
+- **Developed by:** [Igriscodes](https://github.com/Igriscodes)
+- **Base Model:** `Qwen/Qwen3-1.7B`
+- **License:** Mozilla Public License 2.0 (MPL 2.0)
+- **Training Framework:** Hugging Face `trl` & `peft` (LoRA)
+- **Alignment Method:** PPO (Proximal Policy Optimization) with Execution-Based Reward Guidance
+## Intended Uses & Limitations
+### Intended Use Cases
+- **Structured Tool Calling:** Interfacing natively with Model Context Protocol (MCP) servers.
+- **Multi-step Agentic Tasks:** Iterative problem-solving across math, web searching, database queries, and data processing.
+- **Error-Resilient Agents:** Handling tool-execution errors gracefully by rewriting the tool-call payload based on the exception returned by the environment.
+---
+## Training Architecture & Alignment Loop
+The model was trained as the **Policy (Actor)** within a custom `gymnasium` environment (`MCPGymEnv`). The environment runs an execution loop between the model's text outputs and a mock MCP server backend.
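The loop described above can be illustrated with a minimal sketch. `ToyToolEnv` and `scripted_policy` are hypothetical stand-ins, not the actual `MCPGymEnv` or the trained model. The sketch mirrors the gymnasium `reset`/`step` return shapes without importing the library, and it assumes every action is valid JSON with `name` and `arguments` fields that names a registered tool.

```python
import json
from typing import Any, Callable, Dict, Tuple


class ToyToolEnv:
    """Hypothetical stand-in for MCPGymEnv (gymnasium-style reset/step)."""

    def __init__(self, tools: Dict[str, Callable[..., Any]], task: str,
                 expected: Any, max_steps: int = 4):
        self.tools = tools
        self.task = task
        self.expected = expected
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> Tuple[str, dict]:
        self.steps = 0
        return self.task, {}

    def step(self, action: str) -> Tuple[str, float, bool, bool, dict]:
        self.steps += 1
        out_of_steps = self.steps >= self.max_steps
        call = json.loads(action)        # assumed: action is valid JSON
        tool = self.tools[call["name"]]  # assumed: the tool is registered
        try:
            result = tool(**call["arguments"])
        except Exception as exc:
            # Return the error text as the next observation so the policy
            # can repair its call on the following step.
            return f"error: {exc}", -0.5, False, out_of_steps, {}
        done = result == self.expected
        reward = 10.0 if done else 0.0
        return f"result: {result}", reward, done, out_of_steps and not done, {}


def add(a, b):
    return a + b


def scripted_policy(observation: str) -> str:
    """Stand-in for the model: first call is broken, second is repaired."""
    if observation.startswith("error"):
        return json.dumps({"name": "add", "arguments": {"a": 2, "b": 3}})
    return json.dumps({"name": "add", "arguments": {"a": 2}})  # missing "b"


env = ToyToolEnv({"add": add}, task="Add 2 and 3.", expected=5)
obs, _ = env.reset()
rewards = []
terminated = truncated = False
while not (terminated or truncated):
    obs, reward, terminated, truncated, _ = env.step(scripted_policy(obs))
    rewards.append(reward)
print(rewards)
```

The first call fails inside the tool, the error message becomes the next observation, and the second call succeeds and ends the episode.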
+### Reward Specification Matrix
+The PPO agent was optimized against a dense reward signal driven by execution feedback:
+| Trigger Status | Reward | Evaluation Logic |
+| :--- | :--- | :--- |
+| **Success** | `+10.0` | Tool executed cleanly; returned data matches the expected task state. |
+| **Tool Execution** | `0.0` | Tool ran successfully, but the overarching objective is incomplete. |
+| **Tool Error** | `-0.5` | The target tool was called but raised a runtime exception (e.g., bad arguments). |
+| **Invalid JSON** | `-0.8` | Failed to output a syntactically valid JSON tool-call schema. |
+| **Structural Fail** | `-1.0` | Severe divergence from agentic system instructions or tool hallucination. |
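As a rough illustration of how the table above could be applied to a single model output, the sketch below scores one raw string. The function name `score_output`, the `name`/`arguments` field names, and the rule that a non-object payload counts as a structural failure are assumptions made for this example, not the project's actual reward code.

```python
import json
from typing import Any, Callable, Dict

# Reward values copied from the table above.
REWARDS = {
    "success": 10.0,
    "tool_execution": 0.0,
    "tool_error": -0.5,
    "invalid_json": -0.8,
    "structural_fail": -1.0,
}


def score_output(raw: str, tools: Dict[str, Callable[..., Any]],
                 expected: Any) -> float:
    """Score one raw model output against a registry of callable tools."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return REWARDS["invalid_json"]

    # Assumed rule: valid JSON that is not a call to a registered tool
    # (for example a hallucinated tool name) is a structural failure.
    if not isinstance(call, dict):
        return REWARDS["structural_fail"]
    name = call.get("name")
    args = call.get("arguments", {})
    if not isinstance(name, str) or name not in tools or not isinstance(args, dict):
        return REWARDS["structural_fail"]

    try:
        result = tools[name](**args)
    except Exception:
        return REWARDS["tool_error"]

    if result == expected:
        return REWARDS["success"]
    return REWARDS["tool_execution"]


tools = {"divide": lambda a, b: a / b}
samples = [
    '{"name": "divide", "arguments": {"a": 10, "b": 2}}',  # matches expected
    '{"name": "divide", "arguments": {"a": 9, "b": 3}}',   # runs, wrong value
    '{"name": "divide", "arguments": {"a": 1, "b": 0}}',   # raises in the tool
    '{"name": "divide", "arguments": ',                    # truncated JSON
    '{"name": "search", "arguments": {}}',                 # unregistered tool
]
for sample in samples:
    print(score_output(sample, tools, expected=5.0))
```

Checking for invalid JSON first and for an unknown tool second keeps each failure mode mapped to exactly one row of the table.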
+### Hyperparameters & Efficiency Stack
+- **Quantization:** 4-bit NormalFloat (NF4) via `bitsandbytes` (for base model loading).
+- **PEFT Adaptation:** LoRA targeted all linear layers (`q_proj`, `v_proj`, `k_proj`, `o_proj`, etc.).
+- **Memory Optimization:** 8-bit Paged AdamW optimizer, gradient checkpointing, and parallel rollout sampling, used to keep the combined memory footprint of the actor, critic, and reference models manageable.
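A configuration along these lines could express the stack above. This is a sketch, not the project's training script: the LoRA rank, alpha, dropout, and the bfloat16 compute dtype are placeholder assumptions, since the card does not state them. It requires `torch`, `transformers`, `peft`, and `bitsandbytes` to be installed.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization for loading the base model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumed; not stated in the card
)

# LoRA on all linear layers; "all-linear" is PEFT's shorthand for that.
lora_config = LoraConfig(
    r=16,               # assumed rank
    lora_alpha=32,      # assumed scaling factor
    lora_dropout=0.05,  # assumed dropout
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

# Keyword arguments accepted by transformers.TrainingArguments.
training_kwargs = dict(
    optim="paged_adamw_8bit",     # 8-bit Paged AdamW
    gradient_checkpointing=True,  # trade compute for activation memory
)
```

The `bnb_config` object would be passed as `quantization_config` when loading `Qwen/Qwen3-1.7B` with `AutoModelForCausalLM.from_pretrained`, and `lora_config` would be handed to the trainer or to `peft.get_peft_model`.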
+---
+## Acknowledgements
+We express our gratitude to the following organizations, communities, and tools that made this project possible:
+*   **[Qwen (Alibaba Cloud)](https://github.com/QwenLM/Qwen)** - For providing the foundational **Qwen3** model weights and architecture.
+*   **[Hugging Face](https://huggingface.co/)** - For the incredible ecosystem and libraries used to load, manage, and train the model.
+*   **[PyTorch](https://pytorch.org/)** - For the robust deep learning framework that powered the underlying tensor computations and GPU acceleration during fine-tuning.
+*   **[Google Gemini 3](https://geminicli.com/)** - For assistance in optimizing and debugging the fine-tuning scripts.
+## License
+[Mozilla Public License Version 2.0](https://github.com/Igriscodes/qwen-tool/blob/main/LICENSE) - Feel free to use and modify under the terms of the license.