---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen3.5-35B-A3B
pipeline_tag: image-text-to-text
library_name: transformers
tags:
- multimodal
- action
- agent
- pytorch
- computer use
- gui agents
- moe
---

# **Holo3: Foundational Models for Navigation and Computer Use Agents**

## **Model Description**

**Holo3** is our latest generation of large-scale Vision-Language Models (VLMs) specifically optimized for **GUI Agents**. Like its predecessors, it operates across diverse digital environments (web, desktop, and mobile) by interpreting visual interfaces, reasoning over complex content, and executing precise actions.

Holo3 achieves **state-of-the-art performance on OSWorld-Verified**, setting a new benchmark for computer use agents. While it retains the world-class web navigation capabilities of **Holo2**, the new **Holo3-35B-A3B** architecture is designed to thrive in realistic business environments.

* **Developed by:** [**H Company**](https://www.hcompany.ai/)
* **Model type:** Vision-Language Model for Navigation and Computer Use Agents
* **Architecture:** Sparse Mixture-of-Experts (MoE) with 35B total / 3B active parameters
* **Fine-tuned from model:** Qwen/Qwen3.5-35B-A3B
* **Blog Post:** [hcompany.ai/holo3](https://www.hcompany.ai/holo3)
* **Quickstart:** [hub.hcompany.ai/quickstart](https://hub.hcompany.ai/quickstart)
* **License:** Apache 2.0 License
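
The perceive, reason, act cycle described above can be sketched as a simple control loop. This is a minimal illustration, not Holo3's actual interface: the `Action` schema, the `Environment` protocol, and the `Policy` signature are all hypothetical names introduced here, since this card does not specify the model's prompt or action format.

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass
class Action:
    # Hypothetical action schema; the card does not define Holo3's real one.
    kind: str            # e.g. "click", "type", "done"
    argument: str = ""   # free-form payload such as coordinates or text


class Environment(Protocol):
    # Hypothetical wrapper around a browser, desktop, or mobile session.
    def screenshot(self) -> bytes: ...
    def execute(self, action: Action) -> None: ...


# A policy maps (task, current screenshot, past actions) to the next action.
# In practice this would wrap a call to a vision-language model.
Policy = Callable[[str, bytes, List[Action]], Action]


def run_episode(
    task: str, env: Environment, policy: Policy, max_steps: int = 20
) -> List[Action]:
    """Run one agent episode and return the actions that were chosen."""
    history: List[Action] = []
    for _ in range(max_steps):
        observation = env.screenshot()                # perceive the screen
        action = policy(task, observation, history)   # decide the next step
        history.append(action)
        if action.kind == "done":                     # policy signals completion
            break
        env.execute(action)                           # act on the interface
    return history
```

The step budget (`max_steps`) is a design choice in this sketch: it bounds the episode so that a policy that never emits `done` cannot loop indefinitely.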

---

## **Get Started**

Explore our [Quickstart guide](https://hub.hcompany.ai/quickstart) to learn how to integrate with our inference API.

---

## **Training Strategy**

**Holo3-35B-A3B** is based on the **Qwen3.5** architecture and has been reinforced to strengthen its core agentic pillars: perception and decision-making. The training pipeline uses a carefully curated mix of open-source datasets, large-scale synthetic trajectories, and high-quality human-annotated samples to ensure reliable multi-step reasoning.

---

## **Results**

### **State-of-the-Art Navigation (OSWorld-Verified)**

To benchmark **Holo3** on computer use and web navigation, we used the OSWorld and WebArena benchmarks. **Holo3-35B-A3B** achieves a **77.8%** score on OSWorld-Verified. Remarkably, it achieves this with only **3B active parameters**, providing SOTA performance at a fraction of the inference cost of leading proprietary models.

### **Enterprise Readiness (H Corporate Benchmark)**

To measure real-world utility, we developed the **H Corporate Benchmark**, a dedicated evaluation suite of 486 multi-step tasks across four categories: E-commerce, Business Software, Collaboration, and Multi-App workflows. Holo3 consistently outperforms significantly larger competitors in these dense, business-logic environments.

### **UI Localization & Grounding**

A world-class agent must see before it can act. Holo3 excels at localizing interaction elements and understanding their functions, as evidenced by top-tier performance on **ScreenSpot-Pro** and **OSWorld-G**.

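
To make the localization claim concrete, the sketch below scores predicted click points against target bounding boxes, where a prediction counts as correct when the point falls inside the box. This is a common scoring convention for GUI grounding benchmarks and is assumed here; the card does not state the exact protocol used for ScreenSpot-Pro or OSWorld-G. The names `Box` and `click_accuracy` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Box:
    # Target element bounds in pixels (hypothetical representation).
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        """Return True if the point (x, y) lies inside or on the box edge."""
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def click_accuracy(
    predictions: Sequence[Tuple[float, float]], targets: Sequence[Box]
) -> float:
    """Fraction of predicted click points that land inside their target box."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("at least one example is required")
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return hits / len(targets)


# Usage: two of the three clicks fall inside their target elements.
predictions = [(50, 50), (300, 40), (10, 10)]
targets = [Box(0, 0, 100, 100), Box(0, 0, 100, 100), Box(5, 5, 15, 15)]
score = click_accuracy(predictions, targets)
```
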
**Table 1: Evaluation results on computer use and grounding benchmarks.**

---

## **Citation**

```bibtex
@misc{hai2025holo3modelfamily,
  title={Holo3 - Open Foundation Models for Navigation and Computer Use Agents},
  author={H Company},
  year={2026},
  url={https://huggingface.co/Hcompany/Holo3-35B-A3B},
}
```