Text Generation
Transformers
Safetensors
PyTorch
nvidia
nemotron-3
latent-moe
mtp
conversational
8-bit precision
Instructions to use nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4") messages = [ {"role": "user", "content": "Who are you?"}, ] pipe(messages)# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4", dtype="auto") - Inference
- HuggingChat
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker
docker model run hf.co/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4
- SGLang
How to use nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/chat/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4", "messages": [ { "role": "user", "content": "What is the capital of France?" } ] }' - Docker Model Runner
How to use nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 with Docker Model Runner:
docker model run hf.co/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4
Upload 5 files
Browse files- README.md +80 -44
- bias.md +10 -0
- explainability.md +14 -0
- privacy.md +5 -0
- safety.md +9 -0
README.md
CHANGED
|
@@ -55,9 +55,9 @@ track_downloads: true
|
|
| 55 |
</a>
|
| 56 |
</div>
|
| 57 |
|
| 58 |
-
<div
|
| 59 |
<a href="https://openmdw.ai/license/1-1/" style="margin: 2px;">
|
| 60 |
-
<img alt="License" src="https://img.shields.io/badge/License-OpenMDW-1.1-f5de53
|
| 61 |
</a>
|
| 62 |
</div>
|
| 63 |
|
|
@@ -74,8 +74,8 @@ track_downloads: true
|
|
| 74 |
| **Supported Languages** | English, French, Spanish, Italian, German, Japanese, Korean, Hindi, Korean, Brazilian Portuguese, and Chinese |
|
| 75 |
| **Best For** | Frontier reasoning, complex agentic workflows, long-context analysis, tool use, multilingual reasoning, high-stakes RAG |
|
| 76 |
| **Reasoning Mode** | Configurable on/off via chat template (`enable_thinking=True/False`) |
|
| 77 |
-
| **License** | [
|
| 78 |
-
| **Release Date** | June
|
| 79 |
|
| 80 |
|
| 81 |
## Quick Start
|
|
@@ -105,7 +105,7 @@ NVIDIA Nemotron™ is a family of open models with open weights, training data,
|
|
| 105 |
|
| 106 |
The model employs a hybrid **Latent Mixture-of-Experts (LatentMoE)** architecture, utilizing interleaved Mamba-2 and MoE layers, along with select Attention layers. Like the Super model, the Ultra model incorporates **Multi-Token Prediction (MTP)** layers for faster text generation and improved quality, and it is trained using an **NVFP4** pre-training recipe to maximize compute efficiency. The model has **55B active parameters** and **550B parameters in total**.
|
| 107 |
|
| 108 |
-
The supported languages include: English, French, Spanish, Italian, German, Japanese, Korean, Hindi, Korean, Brazilian Portuguese, and Chinese
|
| 109 |
|
| 110 |
This model is ready for commercial and non-commercial use.
|
| 111 |
|
|
@@ -113,11 +113,52 @@ This model is ready for commercial and non-commercial use.
|
|
| 113 |
|
| 114 |
**Governing Download Terms:** Use of this model is governed by the [OpenMDW-1.1 model license](https://openmdw.ai/license/1-1/).
|
| 115 |
|
| 116 |
-
**Governing Download Terms with NIM:** The NIM container is governed by the [NVIDIA Software License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-software-license-agreement/) and [Product-Specific Terms for AI Products](https://www.nvidia.com/en-us/agreements/enterprise-software/product-specific-terms-for-ai-products/). Use of this model is governed by the [NVIDIA Nemotron Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-nemotron-open-model-license/).
|
| 117 |
-
|
| 118 |
### Benchmarks
|
| 119 |
|
| 120 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 121 |
|
| 122 |
All evaluation results were collected via [Nemo Evaluator SDK](https://github.com/NVIDIA-NeMo/Evaluator). We used three main evaluation harnesses: [Nemo Gym](https://github.com/NVIDIA-NeMo/Gym), [Nemo Skills](https://github.com/NVIDIA-NeMo/Skills), and [Harbor](https://github.com/harbor-framework/harbor) with extended sandboxing support via AWS ECS on Nemo Evaluator. In addition, the evaluations also used dedicated open-source packaged containers for ScaleAI Multi Challenge Multi Turn Instruction Following and KernelBench. For reproducibility purposes, more details on the evaluation settings and pinned containers can be found in the [Nemo Evaluator SDK examples folder](https://github.com/NVIDIA-NeMo/Evaluator/blob/main/examples/nemotron/nemotron-3-ultra) and the [reproducibility tutorial for Nemotron 3 Ultra](https://github.com/NVIDIA-NeMo/Evaluator/blob/main/examples/nemotron/nemotron-3-ultra/reproducibility.md).
|
| 123 |
|
|
@@ -131,8 +172,7 @@ NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 is a frontier-scale general purpose reas
|
|
| 131 |
|
| 132 |
### Release Date
|
| 133 |
|
| 134 |
-
|
| 135 |
-
Hugging Face - 06/04/2026 via [Hugging Face]()
|
| 136 |
|
| 137 |
## Reference(s)
|
| 138 |
|
|
@@ -160,30 +200,26 @@ Stage 2: Supervised Fine-Tuning
|
|
| 160 |
|
| 161 |
* The model was further fine-tuned on synthetic code, math, science, tool calling, instruction following, structured outputs, and general knowledge data. This stage incorporated data designed to support long-range retrieval and multi-document aggregation. All datasets are disclosed in the [Training and Evaluation Datasets](#training-and-evaluation-datasets) section of this document. Major portions of the fine-tuning corpus are released in the [Nemotron-Post-Training-v3](https://huggingface.co/collections/nvidia/nemotron-post-training-v3) collection. [Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner) is one of the libraries used to prepare these corpora.
|
| 162 |
|
| 163 |
-
Stage 3:
|
| 164 |
-
|
| 165 |
-
* The model underwent **Multi-Domain On-Policy Distillation (MOPD)** to improve reasoning across many task types while staying efficient. This technique uses strong teacher models to guide training on the model's own generated attempts (on-policy rollouts), helping recover accuracy and improve performance across coding, math, instruction following, tool use, and agentic workflows. By distilling teacher signal onto the student's own trajectories rather than offline traces, MOPD better aligns the student's behavior with what it would actually produce at inference time, yielding stronger gains than purely off-policy distillation.
|
| 166 |
-
|
| 167 |
-
Stage 4: Reinforcement Learning
|
| 168 |
|
| 169 |
* The model underwent multi-environment reinforcement learning using asynchronous GRPO (Group Relative Policy Optimization) across math, code, science, instruction following, multi-step tool use, multi-turn conversations, and structured output environments. It utilized an asynchronous RL architecture that fully decouples training from inference across separate GPU devices, leveraging in-flight weight updates and MTP to accelerate rollout generation. Conversational quality was further refined through RLHF. All datasets are disclosed in the [Training and Evaluation Datasets](#training-and-evaluation-datasets) section of this document. The RL environments and datasets are released as part of [NeMo Gym](https://github.com/NVIDIA-NeMo/Gym).
|
| 170 |
* Software used for reinforcement learning: [NeMo RL](https://github.com/NVIDIA-NeMo/RL), [NeMo Gym](https://github.com/NVIDIA-NeMo/Gym)
|
| 171 |
|
| 172 |
-
|
| 173 |
|
| 174 |
-
The
|
| 175 |
|
| 176 |
-
## Computational Load (Internal Only: For NVIDIA Models Only; please add as an HTML comment and remove the fields below from the published model card)
|
| 177 |
|
| 178 |
-
|
| 179 |
-
|
|
|
|
| 180 |
|
| 181 |
## Input
|
| 182 |
|
| 183 |
- **Input Type(s):** Text
|
| 184 |
- **Input Format(s):** String
|
| 185 |
- **Input Parameters:** One-Dimensional (1D): Sequences
|
| 186 |
-
- **Other Properties Related to Input:** Maximum context length up to 1M tokens. Supported languages include: English, French, Spanish, Italian, German, Japanese, Korean, Hindi, Korean, Brazilian Portuguese, and Chinese
|
| 187 |
|
| 188 |
## Output
|
| 189 |
|
|
@@ -196,7 +232,7 @@ Our AI models are designed and optimized to run on NVIDIA GPU-accelerated system
|
|
| 196 |
|
| 197 |
## Software Integration
|
| 198 |
|
| 199 |
-
- Runtime Engine(s): NeMo
|
| 200 |
- Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere - A100; NVIDIA Blackwell; NVIDIA Hopper - H100-80GB
|
| 201 |
- Operating System(s): Linux
|
| 202 |
|
|
@@ -239,7 +275,7 @@ ray status --address=${RAY_HEAD_IP}:${RAY_PORT}
|
|
| 239 |
|
| 240 |
### **vLLM**
|
| 241 |
|
| 242 |
-
**Recommended container:** `vllm/vllm-openai:v0.
|
| 243 |
|
| 244 |
For more detailed information, please see this cookbook.
|
| 245 |
|
|
@@ -378,7 +414,7 @@ docker run -d --name nemotron-ultra-sglang \
|
|
| 378 |
|
| 379 |
### **TRT-LLM**
|
| 380 |
|
| 381 |
-
**Container:** `docker pull nvcr.io/nvidia/tensorrt-llm/release:1.3.
|
| 382 |
|
| 383 |
For more detailed information, please see this cookbook.
|
| 384 |
|
|
@@ -716,15 +752,15 @@ print(result)
|
|
| 716 |
|
| 717 |
# Training
|
| 718 |
|
| 719 |
-
**Data Modality:** Text
|
| 720 |
-
**The total size:** 53.8 TiB (14.8 trillion tokens)
|
| 721 |
-
**Total number of datasets:** 226
|
| 722 |
-
**Dataset partition:** *Training [100%], testing [0%], validation [0%]*
|
| 723 |
-
**Time period for training data collection:** 2013 to 2026
|
| 724 |
-
**Time period for testing data collection:** 2013 to 2026
|
| 725 |
-
**Time period for validation data collection:** 2013 to 2026
|
| 726 |
-
**Data Collection Method by dataset:** Hybrid: Automated, Human, Synthetic
|
| 727 |
-
**Labeling Method by dataset:** Hybrid: Automated, Human, Synthetic
|
| 728 |
|
| 729 |
NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 is pre-trained on a large corpus of high-quality curated and synthetically-generated data. It is trained in the English language, as well as 11 other languages and 43 programming languages. Our sources cover a variety of document types such as: webpages, dialogue, articles, and other written materials. The corpus spans domains including legal, math, science, finance, and more. We also include a small portion of question-answering, and alignment style data to improve model accuracy. The model was pre-trained for approximately 20 trillion tokens.
|
| 730 |
|
|
@@ -820,7 +856,7 @@ The foundation of the model is trained on the **Nemotron-3-Ultra** corpus, compr
|
|
| 820 |
|
| 821 |
The English Common Crawl data was downloaded from the Common Crawl Foundation (see their FAQ for details on their crawling) and includes the snapshots CC-MAIN-2013-20 through CC-MAIN-2025-13. The data was subsequently deduplicated and filtered in various ways described in the Nemotron-CC paper. Additionally, we extracted data for fifteen languages from the following three Common Crawl snapshots: CC-MAIN-2024-51, CC-MAIN-2025-08, CC-MAIN-2025-18. The fifteen languages included were Arabic, Chinese, Danish, Dutch, French, German, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish, and Thai. As we did not have reliable multilingual model-based quality classifiers available, we applied just heuristic filtering instead—similar to what we did for lower quality English data in the Nemotron-CC pipeline, but selectively removing some filters for some languages that did not work well. Deduplication was done in the same way as for Nemotron-CC.
|
| 822 |
|
| 823 |
-
The GitHub Crawl was collected using the GitHub REST API and the Amazon S3 API. Each crawl was operated in accordance with the rate limits set by its respective source, either GitHub or S3. We collect raw source code and subsequently remove any having a license which does not exist in our permissive-license set (for additional details, refer to the [technical report](https://
|
| 824 |
|
| 825 |
| Dataset | Modality | Dataset Size | Collection Period | Collecting Organisation |
|
| 826 |
| :---- | :---- | :---- | :---- | :---- |
|
|
@@ -1037,23 +1073,23 @@ The following table depicts our sample distribution.
|
|
| 1037 |
|
| 1038 |
## Evaluation Datasets:
|
| 1039 |
|
| 1040 |
-
**
|
| 1041 |
* Hybrid: Automated, Human, Synthetic
|
| 1042 |
|
| 1043 |
-
**
|
| 1044 |
* Hybrid: Automated, Human, Synthetic
|
| 1045 |
|
| 1046 |
-
**
|
| 1047 |
|
| 1048 |
## Testing Datasets:
|
| 1049 |
|
| 1050 |
-
**
|
| 1051 |
* Hybrid: Automated, Human, Synthetic
|
| 1052 |
|
| 1053 |
-
**
|
| 1054 |
* Hybrid: Automated, Human, Synthetic
|
| 1055 |
|
| 1056 |
-
**
|
| 1057 |
|
| 1058 |
</details>
|
| 1059 |
|
|
@@ -1063,7 +1099,7 @@ The following table depicts our sample distribution.
|
|
| 1063 |
* **Test Hardware:**
|
| 1064 |
* NVIDIA Hopper
|
| 1065 |
* H100
|
| 1066 |
-
*
|
| 1067 |
* NVIDIA Grace Blackwell
|
| 1068 |
* GB200
|
| 1069 |
* GB300
|
|
@@ -1084,11 +1120,11 @@ Please report model quality, risk, security vulnerabilities or NVIDIA AI Concern
|
|
| 1084 |
## Citation
|
| 1085 |
|
| 1086 |
```bibtex
|
| 1087 |
-
@misc{
|
| 1088 |
-
title = {
|
| 1089 |
author = {{NVIDIA}},
|
| 1090 |
year = {2025},
|
| 1091 |
-
url = {https://
|
| 1092 |
note = {White Paper}
|
| 1093 |
}
|
| 1094 |
```
|
|
|
|
| 55 |
</a>
|
| 56 |
</div>
|
| 57 |
|
| 58 |
+
<div style="text-align: center; line-height: 1;">
|
| 59 |
<a href="https://openmdw.ai/license/1-1/" style="margin: 2px;">
|
| 60 |
+
<img alt="License" src="https://img.shields.io/badge/License-OpenMDW--1.1-f5de53" style="display: inline-block; vertical-align: middle;"/>
|
| 61 |
</a>
|
| 62 |
</div>
|
| 63 |
|
|
|
|
| 74 |
| **Supported Languages** | English, French, Spanish, Italian, German, Japanese, Korean, Hindi, Korean, Brazilian Portuguese, and Chinese |
|
| 75 |
| **Best For** | Frontier reasoning, complex agentic workflows, long-context analysis, tool use, multilingual reasoning, high-stakes RAG |
|
| 76 |
| **Reasoning Mode** | Configurable on/off via chat template (`enable_thinking=True/False`) |
|
| 77 |
+
| **License** | [OpenMDW License Agreement, version 1.1](https://raw.githubusercontent.com/OpenMDW/OpenMDW/refs/heads/main/1.1/LICENSE.OpenMDW-1.1) |
|
| 78 |
+
| **Release Date** | June 4, 2026 |
|
| 79 |
|
| 80 |
|
| 81 |
## Quick Start
|
|
|
|
| 105 |
|
| 106 |
The model employs a hybrid **Latent Mixture-of-Experts (LatentMoE)** architecture, utilizing interleaved Mamba-2 and MoE layers, along with select Attention layers. Like the Super model, the Ultra model incorporates **Multi-Token Prediction (MTP)** layers for faster text generation and improved quality, and it is trained using an **NVFP4** pre-training recipe to maximize compute efficiency. The model has **55B active parameters** and **550B parameters in total**.
|
| 107 |
|
| 108 |
+
The supported languages include: English, French, Spanish, Italian, German, Japanese, Korean, Hindi, Korean, Brazilian Portuguese, and Chinese.
|
| 109 |
|
| 110 |
This model is ready for commercial and non-commercial use.
|
| 111 |
|
|
|
|
| 113 |
|
| 114 |
**Governing Download Terms:** Use of this model is governed by the [OpenMDW-1.1 model license](https://openmdw.ai/license/1-1/).
|
| 115 |
|
|
|
|
|
|
|
| 116 |
### Benchmarks
|
| 117 |
|
| 118 |
+
| Benchmark | N-3-Ultra <br> 550B-A55B | MiniMax-2.7 <br> 230B-A10B | GLM-5.1 <br> 744B-A40B | Kimi-K2.6 <br> 1T-A32B | Qwen-3.5 <br> 397B-17B | DS-v4-Pro <br> 1.6T-A49B | DS-v4-Flash <br> 284B-A13B |
|
| 119 |
+
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
|
| 120 |
+
| **Agentic** | | | | | | | |
|
| 121 |
+
| Terminal Bench 2.1 | 56.4 | 55.5 | 59.3 | 67.2 | 49.9 | 49.2 | 54.2 |
|
| 122 |
+
| GDPVal | 46.7 | 47.6 | 54.7 | 50.4 | 34.6 | 54.6 | 50.2 |
|
| 123 |
+
| SWE-Bench Verified | 71.9 | 72.2 | 73.8 | 69.5 | 69.9 | 74.0 | 72.4 |
|
| 124 |
+
| SWE-Bench Multilingual | 67.7 | 69.2 | 73.8 | 65.9 | 67.7 | 71.9 | 72.1 |
|
| 125 |
+
| ProfBench (Search) | 56.0 | 52.0 | 46.0 | 56.0 | 53.0 | 59.9 | 57.0 |
|
| 126 |
+
| PinchBench | 90.0 | 77.6 | 81.2 | 90.2 | 86.6 | 88.6 | 91.3 |
|
| 127 |
+
| TauBench V3 | | | | | | | |
|
| 128 |
+
| Airline | 81.5 | 75.3 | 85.0 | 85.8 | 76.5 | 80.8 | 80.8 |
|
| 129 |
+
| Retail | 86.4 | 84.9 | 84.1 | 82.9 | 88.5 | 88.9 | 89.1 |
|
| 130 |
+
| Telecom | 92.9 | 89.6 | 96.9 | 97.8 | 98.0 | 96.3 | 98.3 |
|
| 131 |
+
| Banking | 22.6 | 14.6 | 12.8 | 23.1 | 20.9 | 25.9 | 26.7 |
|
| 132 |
+
| Average | 70.9 | 66.1 | 69.7 | 72.4 | 71.0 | 73.2 | 73.7 |
|
| 133 |
+
| BrowseComp | 44.4 | 54.1 | 59.4 | 61.3 | 40.5 | 59.4 | 46.9 |
|
| 134 |
+
| Vals.ai Financial Agent 1.1 | | | | | | | |
|
| 135 |
+
| without web search | 60.1 | 51.3 | 60.2 | 54.0 | 61.3 | 58.9 | 58.4 |
|
| 136 |
+
| with web search | 53.7 | 50.5 | 60.7 | 58.8 | 59.0 | 62.3 | 60.1 |
|
| 137 |
+
| **Reasoning and Knowledge** | | | | | | | |
|
| 138 |
+
| IOI 2025 | 570.0 | -- | 456.5 | 585.0 | 441.3 | 580.1 | -- |
|
| 139 |
+
| LiveCodeBench (v6) | 89.0 | 77.2 | 85.7 | 90.2 | 79.3 | 92.5 | 90.9 |
|
| 140 |
+
| IMOAnswerBench (no tools) | 88.6 | 68.3 | 86.8 | 91.1 | 83.1 | 93.0 | 91.1 |
|
| 141 |
+
| IMOAnswerBench (with tools) | 92.3 | 75.1 | 91.1 | 93.71 | 84.51 | 85.4 | 89.6 |
|
| 142 |
+
| Apex-Shortlist (no tools) | 74.9 | 28.9 | 71.1 | 77.4 | 61.4 | 85.8 | 82.4 |
|
| 143 |
+
| Apex-Shortlist (with tools) | 84.8 | 51.9 | 79.0 | 73.2 | 60.4 | 86.5 | 82.0 |
|
| 144 |
+
| GPQA (no tools) | 87.0 | 86.6 | 86.1 | 91.0 | 87.1 | 87.8 | 88.5 |
|
| 145 |
+
| SciCode (subtask) | 44.6 | 38.3 | 47.7 | 52.0 | 48.0 | 50.5 | 48.2 |
|
| 146 |
+
| HLE (no tools) | 26.7 | 23.1 | 27.2 | 34.8 | 28.5 | 37.7 | 32.2 |
|
| 147 |
+
| HLE (with tools) | 37.4 | -- | 50.4 | 54.0 | 48.3 | 48.2 | 45.1 |
|
| 148 |
+
| CritPt (no tools) | 3.1 | 0.6 | 3.7 | 9.1 | 2.4 | 14.0 | 10.6 |
|
| 149 |
+
| MMLU-Pro | 86.8 | 81.9 | 85.9 | 88.1 | 88.3 | 87.5 | 86.4 |
|
| 150 |
+
| OmniScience Accuracy | 24.1 | 20.5 | 31.3 | 35.5 | 35.9 | 46.8 | 39.9 |
|
| 151 |
+
| OmniScience Non-Hallucination | 78.7 | 74.4 | 66.8 | 67.1 | 7.4 | 5.7 | 2.8 |
|
| 152 |
+
| **Chat & Instruction Following** | | | | | | | |
|
| 153 |
+
| IFBench (prompt loose) | 81.7 | 74.6 | 76.6 | 73.7 | 78.2 | 79.1 | 82.0 |
|
| 154 |
+
| Multi-Challenge | 63.8 | 42.5 | 63.0 | 63.1 | 63.9 | 64.1 | 63.5 |
|
| 155 |
+
| **Long Context** | | | | | | | |
|
| 156 |
+
| AA-LCR | 65.4 | 69.8 | 66.9 | 70.2 | 68.3 | 67.3 | 62.7 |
|
| 157 |
+
| RULER (1M) | 94.7 | -- | -- | -- | 90.1 | 94.2 | 87.7 |
|
| 158 |
+
| Longbench v2 (≤ 1M) | 61.9 | -- | -- | -- | 68.9 | 62.1 | 57.0 |
|
| 159 |
+
| **Multilingual** | | | | | | | |
|
| 160 |
+
| MMLU-ProX (avg en/de/fr/es/it/ja/zh/hi/pt/ko) | 83.0 | 78.4 | 85.8 | 85.0 | 86.4 | 85.6 | 84.3 |
|
| 161 |
+
| WMT24++ (en→xx) | 83.7 | 82.8 | 84.4 | 84.5 | 86.8 | 85.9 | 85.9 |
|
| 162 |
|
| 163 |
All evaluation results were collected via [Nemo Evaluator SDK](https://github.com/NVIDIA-NeMo/Evaluator). We used three main evaluation harnesses: [Nemo Gym](https://github.com/NVIDIA-NeMo/Gym), [Nemo Skills](https://github.com/NVIDIA-NeMo/Skills), and [Harbor](https://github.com/harbor-framework/harbor) with extended sandboxing support via AWS ECS on Nemo Evaluator. In addition, the evaluations also used dedicated open-source packaged containers for ScaleAI Multi Challenge Multi Turn Instruction Following and KernelBench. For reproducibility purposes, more details on the evaluation settings and pinned containers can be found in the [Nemo Evaluator SDK examples folder](https://github.com/NVIDIA-NeMo/Evaluator/blob/main/examples/nemotron/nemotron-3-ultra) and the [reproducibility tutorial for Nemotron 3 Ultra](https://github.com/NVIDIA-NeMo/Evaluator/blob/main/examples/nemotron/nemotron-3-ultra/reproducibility.md).
|
| 164 |
|
|
|
|
| 172 |
|
| 173 |
### Release Date
|
| 174 |
|
| 175 |
+
Hugging Face - 06/04/2026 via [Hugging Face](https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4)
|
|
|
|
| 176 |
|
| 177 |
## Reference(s)
|
| 178 |
|
|
|
|
| 200 |
|
| 201 |
* The model was further fine-tuned on synthetic code, math, science, tool calling, instruction following, structured outputs, and general knowledge data. This stage incorporated data designed to support long-range retrieval and multi-document aggregation. All datasets are disclosed in the [Training and Evaluation Datasets](#training-and-evaluation-datasets) section of this document. Major portions of the fine-tuning corpus are released in the [Nemotron-Post-Training-v3](https://huggingface.co/collections/nvidia/nemotron-post-training-v3) collection. [Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner) is one of the libraries used to prepare these corpora.
|
| 202 |
|
| 203 |
+
Stage 3: Reinforcement Learning
|
|
|
|
|
|
|
|
|
|
|
|
|
| 204 |
|
| 205 |
* The model underwent multi-environment reinforcement learning using asynchronous GRPO (Group Relative Policy Optimization) across math, code, science, instruction following, multi-step tool use, multi-turn conversations, and structured output environments. It utilized an asynchronous RL architecture that fully decouples training from inference across separate GPU devices, leveraging in-flight weight updates and MTP to accelerate rollout generation. Conversational quality was further refined through RLHF. All datasets are disclosed in the [Training and Evaluation Datasets](#training-and-evaluation-datasets) section of this document. The RL environments and datasets are released as part of [NeMo Gym](https://github.com/NVIDIA-NeMo/Gym).
|
| 206 |
* Software used for reinforcement learning: [NeMo RL](https://github.com/NVIDIA-NeMo/RL), [NeMo Gym](https://github.com/NVIDIA-NeMo/Gym)
|
| 207 |
|
| 208 |
+
Stage 4: Multi-Domain On-Policy Distillation (MOPD)
|
| 209 |
|
| 210 |
+
* The model underwent **Multi-Domain On-Policy Distillation (MOPD)** to improve reasoning across many task types while staying efficient. This technique uses strong teacher models to guide training on the model's own generated attempts (on-policy rollouts), helping recover accuracy and improve performance across coding, math, instruction following, tool use, and agentic workflows. By distilling teacher signal onto the student's own trajectories rather than offline traces, MOPD better aligns the student's behavior with what it would actually produce at inference time, yielding stronger gains than purely off-policy distillation.
|
| 211 |
|
|
|
|
| 212 |
|
| 213 |
+
NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 model is a result of the above work.
|
| 214 |
+
|
| 215 |
+
The end-to-end training recipe is available in the [NVIDIA Nemotron Developer Repository](https://github.com/NVIDIA-NeMo/Nemotron). Evaluation results can be replicated using the [NeMo Evaluator SDK](https://github.com/NVIDIA-NeMo/Evaluator). [Data Designer](https://github.com/NVIDIA-NeMo/DataDesigner) is one of the libraries used to prepare the pre and post training datasets. More details on the datasets and synthetic data generation methods can be found in the technical report [NVIDIA Nemotron 3 Ultra Technical Report](https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Ultra-Technical-Report.pdf).
|
| 216 |
|
| 217 |
## Input
|
| 218 |
|
| 219 |
- **Input Type(s):** Text
|
| 220 |
- **Input Format(s):** String
|
| 221 |
- **Input Parameters:** One-Dimensional (1D): Sequences
|
| 222 |
+
- **Other Properties Related to Input:** Maximum context length up to 1M tokens. Supported languages include: English, French, Spanish, Italian, German, Japanese, Korean, Hindi, Korean, Brazilian Portuguese, and Chinese.
|
| 223 |
|
| 224 |
## Output
|
| 225 |
|
|
|
|
| 232 |
|
| 233 |
## Software Integration
|
| 234 |
|
| 235 |
+
- Runtime Engine(s): NeMo 26.04.01
|
| 236 |
- Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere - A100; NVIDIA Blackwell; NVIDIA Hopper - H100-80GB
|
| 237 |
- Operating System(s): Linux
|
| 238 |
|
|
|
|
| 275 |
|
| 276 |
### **vLLM**
|
| 277 |
|
| 278 |
+
**Recommended container:** `vllm/vllm-openai:v0.22.0`
|
| 279 |
|
| 280 |
For more detailed information, please see this cookbook.
|
| 281 |
|
|
|
|
| 414 |
|
| 415 |
### **TRT-LLM**
|
| 416 |
|
| 417 |
+
**Container:** `docker pull nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc17`
|
| 418 |
|
| 419 |
For more detailed information, please see this cookbook.
|
| 420 |
|
|
|
|
| 752 |
|
| 753 |
# Training
|
| 754 |
|
| 755 |
+
**Data Modality:** Text
|
| 756 |
+
**The total size:** 53.8 TiB (14.8 trillion tokens)
|
| 757 |
+
**Total number of datasets:** 226
|
| 758 |
+
**Dataset partition:** *Training [100%], testing [0%], validation [0%]*
|
| 759 |
+
**Time period for training data collection:** 2013 to 2026
|
| 760 |
+
**Time period for testing data collection:** 2013 to 2026
|
| 761 |
+
**Time period for validation data collection:** 2013 to 2026
|
| 762 |
+
**Data Collection Method by dataset:** Hybrid: Automated, Human, Synthetic
|
| 763 |
+
**Labeling Method by dataset:** Hybrid: Automated, Human, Synthetic
|
| 764 |
|
| 765 |
NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4 is pre-trained on a large corpus of high-quality curated and synthetically-generated data. It is trained in the English language, as well as 11 other languages and 43 programming languages. Our sources cover a variety of document types such as: webpages, dialogue, articles, and other written materials. The corpus spans domains including legal, math, science, finance, and more. We also include a small portion of question-answering, and alignment style data to improve model accuracy. The model was pre-trained for approximately 20 trillion tokens.
|
| 766 |
|
|
|
|
| 856 |
|
| 857 |
The English Common Crawl data was downloaded from the Common Crawl Foundation (see their FAQ for details on their crawling) and includes the snapshots CC-MAIN-2013-20 through CC-MAIN-2025-13. The data was subsequently deduplicated and filtered in various ways described in the Nemotron-CC paper. Additionally, we extracted data for fifteen languages from the following three Common Crawl snapshots: CC-MAIN-2024-51, CC-MAIN-2025-08, CC-MAIN-2025-18. The fifteen languages included were Arabic, Chinese, Danish, Dutch, French, German, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish, and Thai. As we did not have reliable multilingual model-based quality classifiers available, we applied just heuristic filtering instead—similar to what we did for lower quality English data in the Nemotron-CC pipeline, but selectively removing some filters for some languages that did not work well. Deduplication was done in the same way as for Nemotron-CC.
|
| 858 |
|
| 859 |
+
The GitHub Crawl was collected using the GitHub REST API and the Amazon S3 API. Each crawl was operated in accordance with the rate limits set by its respective source, either GitHub or S3. We collect raw source code and subsequently remove any having a license which does not exist in our permissive-license set (for additional details, refer to the [technical report](https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Ultra-Technical-Report.pdf)).
|
| 860 |
|
| 861 |
| Dataset | Modality | Dataset Size | Collection Period | Collecting Organisation |
|
| 862 |
| :---- | :---- | :---- | :---- | :---- |
|
|
|
|
| 1073 |
|
| 1074 |
## Evaluation Datasets:
|
| 1075 |
|
| 1076 |
+
**Data Collection Method by dataset** <br>
|
| 1077 |
* Hybrid: Automated, Human, Synthetic
|
| 1078 |
|
| 1079 |
+
**Labeling Method by dataset** <br>
|
| 1080 |
* Hybrid: Automated, Human, Synthetic
|
| 1081 |
|
| 1082 |
+
**Properties:** This corpus comprises a mix of high-quality standard benchmarks and test suites for modern agentic AI as outlined in the benchmark section of the model card.
|
| 1083 |
|
| 1084 |
## Testing Datasets:
|
| 1085 |
|
| 1086 |
+
**Data Collection Method by dataset** <br>
|
| 1087 |
* Hybrid: Automated, Human, Synthetic
|
| 1088 |
|
| 1089 |
+
**Labeling Method by dataset** <br>
|
| 1090 |
* Hybrid: Automated, Human, Synthetic
|
| 1091 |
|
| 1092 |
+
**Properties:** This corpus comprises a mix of high-quality standard benchmarks and test suites for modern agentic AI as outlined in the benchmark section of the model card.
|
| 1093 |
|
| 1094 |
</details>
|
| 1095 |
|
|
|
|
| 1099 |
* **Test Hardware:**
|
| 1100 |
* NVIDIA Hopper
|
| 1101 |
* H100
|
| 1102 |
+
* H200
|
| 1103 |
* NVIDIA Grace Blackwell
|
| 1104 |
* GB200
|
| 1105 |
* GB300
|
|
|
|
| 1120 |
## Citation
|
| 1121 |
|
| 1122 |
```bibtex
|
| 1123 |
+
@misc{nvidia_nemotron_3_ultra_2026,
|
| 1124 |
+
title = {Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning},
|
| 1125 |
author = {{NVIDIA}},
|
| 1126 |
year = {2025},
|
| 1127 |
+
url = {https://research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Ultra-Technical-Report.pdf},
|
| 1128 |
note = {White Paper}
|
| 1129 |
}
|
| 1130 |
```
|
bias.md
ADDED
|
@@ -0,0 +1,10 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
| Field | Response |
|
| 2 |
+
| :---- | :---- |
|
| 3 |
+
| Participation considerations from adversely impacted groups [protected classes](https://www.senate.ca.gov/content/protected-classes) in model design and testing: | None |
|
| 4 |
+
| Bias Metric (If Measured): | [BBQ Accuracy Scores in Ambiguous Contexts](https://github.com/nyu-mll/BBQ/) |
|
| 5 |
+
| Which characteristic (feature) show(s) the greatest difference in performance?: | The model shows high variance in the characteristics when it is used with a high temperature. |
|
| 6 |
+
| Measures taken to mitigate against unwanted bias: | Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) employed to calibrate the model’s reasoning capabilities to maintain logical consistency and appropriate complexity when interacting with or interpreting data from diverse age demographics. |
|
| 7 |
+
| If using internal data, description of methods implemented in data acquisition or processing, if any, to address the prevalence of identifiable biases in the training, testing, and validation data: | The training datasets contain a large amount of synthetic data generated by LLMs. We manually curated prompts. |
|
| 8 |
+
| Tools used to assess statistical imbalances and highlight patterns that may introduce bias into AI models: | [BBQ](https://github.com/nyu-mll/BBQ/) |
|
| 9 |
+
| Tools used to assess statistical imbalances and highlight patterns that may introduce bias into AI models: | These datasets, such as web-scraped finance reasoning data derived from SEC EDGAR filings, science and math problem datasets, OpenResearcher/source-document datasets, Common Crawl, CC-News, Wikimedia, and long-context document datasets, do not collectively or exhaustively represent all demographic groups (and proportionally therein). For instance, these datasets do not contain explicit mentions of demographic classes such as age, gender, or ethnicity in approximately 97% to 99.9% of finance reasoning samples and in over 85% of samples across the broader assessed datasets. In the subset where such terms are present, these datasets contain notable representational skews. For example, ethnicity mentions are often dominated by Middle Eastern contexts (found in finance documents) or "White," "Two or more," and "Black or African American" as the most frequent ethnic identifiers, while references categorized as male-only significantly outnumber those categorized as female-only. Furthermore, gender is explicitly mentioned in approximately 12% of samples across the broader dataset assessment, yet in only 0.9% of finance-specific samples. Dataset-level results vary by source type, with long-context/source-document datasets containing higher explicit demographic mention rates compared to certain web-scraped sources. To mitigate these imbalances, we recommend considering evaluation techniques such as bias audits, fine-tuning with demographically balanced datasets, and mitigation strategies such as counterfactual data augmentation to align with the desired model behavior. This evaluation used a 3,000-sample subset per dataset, identified as the optimal threshold for maximizing embedder accuracy. |
|
| 10 |
+
| Unwanted Bias Testing: | Constrained to English-language inputs. Multi-lingual parity is not currently claimed or guaranteed. |
|
explainability.md
ADDED
|
@@ -0,0 +1,14 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
| Field | Response |
|
| 2 |
+
| :---- | :---- |
|
| 3 |
+
| Intended Task/Domain: | Text generation, reasoning, and chat |
|
| 4 |
+
| Model Type: | Text-to-text Mamba2-Transformer Hybrid |
|
| 5 |
+
| Intended Users: | Generative AI creators working with conversational AI models and image content. |
|
| 6 |
+
| Output: | Text |
|
| 7 |
+
| Tools used to evaluate datasets to identify synthetic data and ensure data authenticity. | We used a Gemma-3 4B-based filtering model fine-tuned on [Nemotron Content Safety Dataset v2](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0) to ensure the quality of synthetic data. |
|
| 8 |
+
| Describe how the model works: | Generates text by predicting the next word or token based on the context provided in the input sequence using multiple self-attention layers. |
|
| 9 |
+
| Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Age, Disability Status, Gender Identity, Nationality, Physical Appearance, Ethnicity, Socioeconomic Status, Sexual Orientation, Religion |
|
| 10 |
+
| Technical Limitations & Mitigation: | This model performs particularly well in instruction following regimes, as such may be strongly influenced by untrusted inputs and should be paired with appropriate guardrails and data filtering to better align use-case behaviors when exposed to such data. |
|
| 11 |
+
| Verified to have met prescribed NVIDIA quality standards: | Yes |
|
| 12 |
+
| Performance Metrics: | Accuracy, Throughput, and User-side throughput |
|
| 13 |
+
| Potential Known Risks: | The model was optimized explicitly for instruction following and as such is more susceptible to prompt injection and jailbreaking in various forms as a result of its instruction tuning. This means that the model should be paired with additional rails or system filtering to limit exposure to instructions from malicious sources -- either directly or indirectly by retrieval (e.g. via visiting a website) -- as they may yield outputs that can lead to harmful, system-level outcomes up to and including remote code execution in agentic systems when effective security controls including guardrails are not in place. The model may generate answers that may be inaccurate, omit key information, include irrelevant or redundant text, or produce socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive. |
|
| 14 |
+
| Licensing: | Use of this model is governed by the [OpenMDW License Agreement, version 1.1](https://raw.githubusercontent.com/OpenMDW/OpenMDW/refs/heads/main/1.1/LICENSE.OpenMDW-1.1) (OpenMDW-1.1). |
|
privacy.md
ADDED
|
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
| Privacy Information |
|
| 2 |
+
| :--- |
|
| 3 |
+
| Nemotron 3 Ultra was trained on large-scale publicly available data that may contain images, audio-video, and text relating to people. NVIDIA collected and used this data in compliance with applicable data protection and privacy laws. This model was not designed to derive insights or otherwise learn from any personal data contained in the datasets. |
|
| 4 |
+
| NVIDIA uses a combination of filters, data minimization techniques, and other guardrails to help prevent personal data from being recited by our models. We employ automated tools and data processing techniques during pre-training or training to identify and filter certain categories of personal data. |
|
| 5 |
+
| Please review NVIDIA's [Privacy Policy](https://www.nvidia.com/en-us/about-nvidia/privacy-policy/) for more information. |
|
safety.md
ADDED
|
@@ -0,0 +1,9 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
| Field | Response |
|
| 2 |
+
| :---- | :---- |
|
| 3 |
+
| Model Application Field(s): | Chat, Instruction Following, Chatbot Development, Code Generation, Reasoning, Customer Service |
|
| 4 |
+
| Describe the life critical impact (if present). | Not Applicable |
|
| 5 |
+
| Description of methods implemented in data acquisition or processing, if any, to address other types of potentially harmful data in the training, testing, and validation data: | We used a guard model for content safety to exclude potentially harmful data from training. |
|
| 6 |
+
| Description of any methods implemented in data acquisition or processing, if any, to address illegal or harmful content in the training data, including, but not limited to, child sexual abuse material (CSAM) and non-consensual intimate imagery (NCII) | We used a Gemma-3 4B-based guard model trained on [Nemotron Content Safety Dataset v2](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0) for content safety to exclude potentially illegal or harmful content from the training. |
|
| 7 |
+
| Use Case Restrictions: | Use of this model is governed by the [OpenMDW License Agreement, version 1.1](https://raw.githubusercontent.com/OpenMDW/OpenMDW/refs/heads/main/1.1/LICENSE.OpenMDW-1.1) (OpenMDW-1.1).|
|
| 8 |
+
| Model and dataset restrictions: | The Principle of least privilege (PoLP) is applied limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints adhered to. |
|
| 9 |
+
| This AI model was developed based on our policies to ensure responsible data handling and risk mitigation. The datasets used for training have been scanned for harmful content and illegal content, consistent with our policies including scanning for Child Sexual Abuse Material (CSAM). Ongoing review and monitoring mechanisms are in place based on our policies and to maintain data integrity. | True. We use [Nemotron Content Safety Dataset V2](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0) and an internal safety dataset specialized for minority sexuality for content safety evaluation to ensure the safety of this model. |
|