Instructions to use mlx-community/Nemotron-3-Ultra-550B-A55B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use mlx-community/Nemotron-3-Ultra-550B-A55B with MLX:
    # Make sure mlx-lm is installed
    # pip install --upgrade mlx-lm

    # Generate text with mlx-lm
    from mlx_lm import load, generate

    model, tokenizer = load("mlx-community/Nemotron-3-Ultra-550B-A55B")

    prompt = "Write a story about Einstein"
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

    text = generate(model, tokenizer, prompt=prompt, verbose=True)
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- LM Studio
- Pi
How to use mlx-community/Nemotron-3-Ultra-550B-A55B with Pi:
Start the MLX server
    # Install MLX LM:
    uv tool install mlx-lm

    # Start a local OpenAI-compatible server:
    mlx_lm.server --model "mlx-community/Nemotron-3-Ultra-550B-A55B"
Configure the model in Pi
    # Install Pi:
    npm install -g @mariozechner/pi-coding-agent

    # Add to ~/.pi/agent/models.json:
    {
      "providers": {
        "mlx-lm": {
          "baseUrl": "http://localhost:8080/v1",
          "api": "openai-completions",
          "apiKey": "none",
          "models": [
            { "id": "mlx-community/Nemotron-3-Ultra-550B-A55B" }
          ]
        }
      }
    }
Run Pi
    # Start Pi in your project directory:
    pi
- Hermes Agent
How to use mlx-community/Nemotron-3-Ultra-550B-A55B with Hermes Agent:
Start the MLX server
    # Install MLX LM:
    uv tool install mlx-lm

    # Start a local OpenAI-compatible server:
    mlx_lm.server --model "mlx-community/Nemotron-3-Ultra-550B-A55B"
Configure Hermes
    # Install Hermes:
    curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
    hermes setup

    # Point Hermes at the local server:
    hermes config set model.provider custom
    hermes config set model.base_url http://127.0.0.1:8080/v1
    hermes config set model.default mlx-community/Nemotron-3-Ultra-550B-A55B
Run Hermes
hermes
- MLX LM
How to use mlx-community/Nemotron-3-Ultra-550B-A55B with MLX LM:
Generate or start a chat session
    # Install MLX LM
    uv tool install mlx-lm

    # Interactive chat REPL
    mlx_lm.chat --model "mlx-community/Nemotron-3-Ultra-550B-A55B"
Run an OpenAI-compatible server
    # Install MLX LM
    uv tool install mlx-lm

    # Start the server (it listens on port 8080 unless --port is given)
    mlx_lm.server --model "mlx-community/Nemotron-3-Ultra-550B-A55B"

    # Calling the OpenAI-compatible server with curl
    curl -X POST "http://localhost:8080/v1/chat/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "mlx-community/Nemotron-3-Ultra-550B-A55B",
        "messages": [
          {"role": "user", "content": "Hello"}
        ]
      }'
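The same request can be sent from Python. This is a minimal sketch using only the standard library. It assumes the server is reachable at `http://localhost:8080/v1`, the base URL used in the Pi configuration above. The helper names `build_chat_request` and `chat` are illustrative and not part of mlx-lm.

```python
import json
import urllib.request

# Assumption: mlx_lm.server is running locally on port 8080, the port the
# Pi and Hermes configurations point at. Change this if you passed --port.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "mlx-community/Nemotron-3-Ultra-550B-A55B"


def build_chat_request(user_text, model=MODEL_ID, base_url=BASE_URL):
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_text, timeout=600):
    """Send one user message and return the assistant's reply text."""
    request = build_chat_request(user_text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-style chat responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Usage (requires the server to be running):
# print(chat("Hello"))
```

Building the request separately from sending it makes it easy to inspect the URL and payload when a call fails.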
Getting gibberish output
Try going into settings, which I guess is the button in the bottom-left.
It's beside the label "1 CONVERSATION"; the row reads:
"settings button" "1 CONVERSATION" "other button"
Above it there is "DELETE ALL CHATS".
Please open the settings and send me a screenshot.
My whole priority right now is to solve this problem.
Once you do, I may suggest changing the temp to 1 or 0.7.
Just show me the settings.
Also, did you try other models?
Did they work?
Did you try other models on the same setup?
Try
mlx-community/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16-mlx-8Bit
It's well known and a lot of people have it running.
If it doesn't work either, the problem is probably not the model.
If it does work, the problem is with "mlx-community/Nemotron-3-Ultra-550B-A55B".
I only brought up your settings to look for the ones called "top-p", "top-k" and "temp".
Also, the model I suggested is just a suggestion, not a demand. You can try any other model; I'm not forcing you to try a specific one if you don't want to.
No, I didn't mean it that way.
I'm sorry if it looked like I did.
In the meantime, just try any other model.
Don't let time go to waste!!
I hope from the bottom of my heart that your setup works for running the models you want.
Please let me know if you have any issues.
As I said, my whole priority right now is to fix this problem, which may also affect other users.
I really don't want people to lose interest in digital intelligence...
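For anyone unsure what "temp", "top-k" and "top-p" actually change, here is a rough, self-contained sketch of the usual definitions applied to a toy list of logits. It is an illustration only, not the code any particular app or mlx-lm runs, and the function name `sampling_distribution` is made up for this example.

```python
import math


def sampling_distribution(logits, temp=1.0, top_k=0, top_p=1.0):
    """Return the token probabilities a sampler would draw from.

    logits: raw scores, one per token id.
    temp:   > 0; lower values sharpen the distribution.
    top_k:  keep only the k most likely tokens (0 disables the filter).
    top_p:  keep the smallest set of top tokens whose mass reaches top_p.
    """
    if temp <= 0:
        raise ValueError("temp must be positive in this sketch")

    # Temperature: divide the logits before the softmax.
    scaled = [x / temp for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Rank token ids from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k best-ranked tokens.
    if top_k > 0:
        ranked = ranked[:top_k]

    # Top-p (nucleus): stop once the kept mass reaches top_p.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalise over the surviving tokens; everything else gets 0.
    kept_mass = sum(probs[i] for i in kept)
    result = [0.0] * len(probs)
    for i in kept:
        result[i] = probs[i] / kept_mass
    return result
```

Lowering `temp` toward 0.7 concentrates probability on the top tokens, which is why it is a common first thing to try when output looks random. It will not help if the logits themselves are wrong.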
I tried with another model.
https://huggingface.co/mlx-community/Nemotron-3-Ultra-550B-A55B-4bit
Same issue. Works with pipeline parallel but not tensor parallel.
I think this is an issue with EXO and not with the model.
https://huggingface.co/mlx-community/Nemotron-3-Ultra-550B-A55B-4bit
is the same model as
https://huggingface.co/mlx-community/Nemotron-3-Ultra-550B-A55B
because
https://huggingface.co/mlx-community/Nemotron-3-Ultra-550B-A55B
is also 4-bit.
They're all the same...
Use a different model, not just a different repository for the same model...
I checked with https://huggingface.co/mlx-community/MiniMax-M2.7-8bit
It works fine in EXO with tensor parallel.
Nice, now we know it's the model...
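The elimination reasoning in this thread can be written down as a small sketch. The `diagnose` function and the short model labels are hypothetical; the results are only the ones reported above (coherent output or gibberish for each model and parallelism mode).

```python
def diagnose(results, model, mode):
    """Suggest where a fault lies from coherent/gibberish observations.

    results maps (model, mode) -> True if the output was coherent.
    """
    if results.get((model, mode), True):
        return "no fault observed"

    # Did the same model work in some other mode?
    same_model_other_mode = any(
        ok for (m, md), ok in results.items() if m == model and md != mode
    )
    # Did some other model work in the same mode?
    other_model_same_mode = any(
        ok for (m, md), ok in results.items() if m != model and md == mode
    )

    if same_model_other_mode and other_model_same_mode:
        return "specific to this model in this mode"
    if other_model_same_mode:
        return "points at the model"
    if same_model_other_mode:
        return "points at the mode or runtime"
    return "inconclusive"


# Observations reported in the thread (labels shortened for readability).
results = {
    ("Nemotron-3-Ultra", "pipeline"): True,
    ("Nemotron-3-Ultra", "tensor"): False,
    ("MiniMax-M2.7-8bit", "tensor"): True,
}
print(diagnose(results, "Nemotron-3-Ultra", "tensor"))
```

With only the first two observations the sketch points at the mode or runtime, which matches the earlier guess about EXO. Adding the MiniMax result narrows it to this model under tensor parallel.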
I recommend using this model:
https://huggingface.co/spicyneuron/Nex-N2-Pro-MLX-5.3bit-vision
or, better, the full-precision one:
https://huggingface.co/usermma/Nex-N2-mini-mlx-fp16
This is good until someone (or you) converts the model you want to MLX.
You have plenty of unified memory,
so you could do it easily, and much better than anyone else; in under half an hour
the model would be converted!
This one is full precision, good for your hardware.
It's currently the best one to use... "unless you are ready to convert models to MLX on your own hardware"
https://huggingface.co/usermma/Nex-N2-mini-mlx-fp16
and it's great too.
Or try converting it yourself from
nex-agi/Nex-N2-mini
Maybe you'll learn something new... who knows?
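A rough sketch of what converting it yourself could look like with the `mlx_lm.convert` command-line tool. It assumes mlx-lm is installed and supports this model's architecture, which is not confirmed here. The helper `build_convert_command` and the output directory name are hypothetical, and nothing is executed until you call `subprocess.run` yourself.

```python
import shlex


def build_convert_command(hf_path, mlx_path, quantize=False, q_bits=4):
    """Return the argv list for an mlx_lm.convert run (nothing is executed)."""
    command = ["mlx_lm.convert", "--hf-path", hf_path, "--mlx-path", mlx_path]
    if quantize:
        # -q enables quantization; --q-bits sets the bits per weight.
        command += ["-q", "--q-bits", str(q_bits)]
    return command


# Full-precision conversion (no quantization), as preferred in this thread.
# "Nex-N2-mini-mlx" is a hypothetical local output directory.
command = build_convert_command("nex-agi/Nex-N2-mini", "Nex-N2-mini-mlx")
print(shlex.join(command))

# To actually run it:
# import subprocess
# subprocess.run(command, check=True)
```

Passing `quantize=True` adds the quantization flags if the full-precision weights turn out to be too large for the available memory.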
Look at Nex-N2-mini: it would be fast on your hardware. Or there's Nex-N2-Pro, which is slower but better.
| Benchmark | Nex-N2-mini | Nex-N2-Pro | GPT-5.5 | Opus 4.7 | Kimi-K2.6 | GLM-5.1 | MiniMax M3 | DeepSeek-V4-Pro |
|---|---|---|---|---|---|---|---|---|
| **Agent** | | | | | | | | |
| BrowseComp | 74.1 | 83.7 | 84.4 | 79.8 | 83.2 | 79.3 | 83.5 | 83.4 |
| GDPval | 1402 | 1585 | 1769 | 1753 | 1481 | 1535 | - | 1554 |
| Toolathlon | 33.3 | 51.9 | 55.6 | 52.8 | 50.0 | 40.7 | - | 51.8 |
| WildClawBench | 47.7 | 53.5 | 58.2 | 62.2 | - | 48.2 | - | 43.7 |
| WideSearch | 62.0 | 75.6 | - | - | 80.8 | - | - | - |
| TAU3 | 65.9 | 71.1 | - | - | - | 70.6 | - | - |
| **Coding & SWE** | | | | | | | | |
| SWE-Bench Pro | 50.2 | 58.8 | 58.6 | 64.3 | 58.6 | 58.4 | 59.0 | 55.4 |
| Terminal-Bench 2.1 | 60.7 | 75.3 | 83.4 | 69.7 | - | 58.7 | 66.0 | 72.0 |
| DeepSWE | 8.0 | 33.6 | 70 | 54 | 24 | 18 | - | 8 |
| SWE-Bench Verified | 74.4 | 80.8 | 82.9 | 87.6 | 80.2 | - | 80.5 | 80.6 |
| SWE Atlas QnA | 31.5 | 37.9 | 45.4 | 45.2 | - | - | 37.9 | - |
| SWE Atlas RF | 30.0 | 32.9 | 44.8 | 48.6 | - | - | - | - |
| SWE Atlas TW | 23.3 | 40.0 | 42.6 | 38.2 | - | - | 30.8 | - |
| **General & Reasoning** | | | | | | | | |
| GPQA Diamond | 82.6 | 90.7 | 93.6 | 94.2 | 90.5 | 86.2 | - | 90.1 |
| IFEval | 89.1 | 94.0 | - | - | 94.5 | 94.5 | - | 91.9 |
| Apex | 9.4 | 36.5 | - | - | 24.0 | 11.5 | - | 38.3 |
May I know what you are working on? I may be able to help with some ideas... or even whole plans...
It looks like you are working on something big that would make the world a much better place...
If I had the same hardware as you, I would try everything, not just stick to one model series like "Nemotron".
No quants, full precision only:
| Benchmark | Nex-N2-mini | Nex-N2-Pro | Nemotron-3-Ultra |
|---|---|---|---|
| **Agentic** | | | |
| Terminal Bench 2.1 | 60.7 | 75.3 | 56.4 |
| GDPVal | 1402 | 1585 | 46.7 |
| SWE-Bench Verified | 74.4 | 80.8 | 71.9 |
| BrowseComp | 74.1 | 83.7 | 44.4 |
| **Reasoning and Knowledge** | | | |
| Apex-Shortlist (no tools) | 9.4 | 36.5 | 74.9 |
It looks like there is a typo in the GDPVal row...







