Instructions to use mlx-community/Nemotron-3-Ultra-550B-A55B with libraries, inference providers, notebooks, and local apps.

Libraries

How to use mlx-community/Nemotron-3-Ultra-550B-A55B with MLX:

# Make sure mlx-lm is installed
# pip install --upgrade mlx-lm

# Generate text with mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Nemotron-3-Ultra-550B-A55B")

prompt = "Write a story about Einstein"
messages = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)

Local Apps

Pi

How to use mlx-community/Nemotron-3-Ultra-550B-A55B with Pi:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/Nemotron-3-Ultra-550B-A55B"

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "mlx-lm": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "mlx-community/Nemotron-3-Ultra-550B-A55B"
        }
      ]
    }
  }
}
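Pasting the snippet above over an existing ~/.pi/agent/models.json would drop any providers already configured there. Below is a minimal Python sketch that merges the "mlx-lm" entry instead. It assumes the file is a single JSON object with an optional "providers" map, as in the example. `add_mlx_provider` is a hypothetical helper and is not part of Pi:

```python
import copy
import json
from pathlib import Path

# Provider entry copied from the models.json example above.
MLX_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "mlx-community/Nemotron-3-Ultra-550B-A55B"}],
}


def add_mlx_provider(config_path: Path) -> dict:
    """Hypothetical helper: merge the "mlx-lm" provider into a Pi models.json.

    Creates the file (and parent directories) if missing; any other
    providers already in the file are kept unchanged.
    """
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text())
    providers = config.setdefault("providers", {})
    # Overwrite only the "mlx-lm" entry, using a copy so the template stays intact.
    providers["mlx-lm"] = copy.deepcopy(MLX_PROVIDER)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, call `add_mlx_provider(Path.home() / ".pi" / "agent" / "models.json")`. Because it replaces only the "mlx-lm" key, you can safely run it more than once.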

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use mlx-community/Nemotron-3-Ultra-550B-A55B with Hermes Agent:

Start the MLX server

# Install MLX LM:
uv tool install mlx-lm
# Start a local OpenAI-compatible server:
mlx_lm.server --model "mlx-community/Nemotron-3-Ultra-550B-A55B"

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default mlx-community/Nemotron-3-Ultra-550B-A55B

Run Hermes

hermes

MLX LM

How to use mlx-community/Nemotron-3-Ultra-550B-A55B with MLX LM:

Generate or start a chat session

# Install MLX LM
uv tool install mlx-lm
# Interactive chat REPL
mlx_lm.chat --model "mlx-community/Nemotron-3-Ultra-550B-A55B"

Run an OpenAI-compatible server

# Install MLX LM
uv tool install mlx-lm
# Start the server
mlx_lm.server --model "mlx-community/Nemotron-3-Ultra-550B-A55B"
# Calling the OpenAI-compatible server with curl
curl -X POST "http://localhost:8080/v1/chat/completions" \
   -H "Content-Type: application/json" \
   --data '{
     "model": "mlx-community/Nemotron-3-Ultra-550B-A55B",
     "messages": [
       {"role": "user", "content": "Hello"}
     ]
   }'
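The same request can be sent from Python using only the standard library. This sketch makes two assumptions: mlx_lm.server is running on its default port, 8080 (the port the Pi and Hermes configs above point at), and the reply follows the OpenAI chat-completions shape (`choices[0].message.content`). The helper names are illustrative:

```python
import json
import urllib.request

# Assumption: mlx_lm.server was started without --port, so it listens on 8080.
BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "mlx-community/Nemotron-3-Ultra-550B-A55B"


def build_chat_request(prompt: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request as the curl example (no network I/O)."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, base_url: str = BASE_URL, timeout: float = 600.0) -> str:
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response body: choices[0].message.content.
    """
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For example, run `print(chat("Hello"))` once the server is up. A model this large can take a while to answer, so the timeout is set generously.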

Getting gibberish output

by sjthapa - opened about 8 hours ago

Discussion

sjthapa

about 8 hours ago

I tried running this model on 2× M3 Ultra Macs (256 GB RAM each) with EXO.
The model loads successfully and inference starts, but the output is complete gibberish.
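Here is a rough back-of-the-envelope check on why this model needs two machines. The sketch makes several assumptions:

- It reads 550B as the total parameter count, taken from the model name.
- It assumes MLX-style 4-bit affine quantization, with one 16-bit scale and one 16-bit bias per group of 64 weights. That works out to about 4.5 bits per weight in total.
- It counts weights only, not the KV cache or activations.

```python
from typing import Optional


def weight_memory_gb(total_params: float, bits_per_weight: float,
                     group_size: Optional[int] = 64, scale_bits: int = 16) -> float:
    """Rough size of the weights alone, in decimal GB (1e9 bytes).

    Assumes group quantization storing one scale and one bias (each
    `scale_bits` wide) per `group_size` weights; pass group_size=None for
    unquantized weights. Ignores KV cache, activations, and any layers
    kept at higher precision.
    """
    effective_bits = bits_per_weight
    if group_size:
        # Per-group scale + bias overhead, spread across the group's weights.
        effective_bits += 2 * scale_bits / group_size
    return total_params * effective_bits / 8 / 1e9
```

Under those assumptions, `weight_memory_gb(550e9, 4)` gives about 309 GB. That is more than a single 256 GB Mac holds, but it fits within the 512 GB of the two machines together.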

usermma

MLX Community org about 7 hours ago

Try going into Settings. I think it's the button in the bottom-left corner, next to the text "1 CONVERSATION", in this row:

"settings button" "1 CONVERSATION" "other button"

Above it there is "DELETE ALL CHATS".

Please open it and send me a screenshot.

My top priority right now is solving this problem.

Once you've done that, I may suggest setting the temperature to 1 or 0.7.

Just show me the settings.

Also, did you try other models?

Did they work?

sjthapa

about 7 hours ago

Here are the screenshots of the settings menu.

I just found out that the model gives proper output when pipeline parallelism is used, but gibberish output with tensor parallelism.
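The two modes split the work differently:

- **Pipeline parallelism** places whole layers on different machines and passes activations from one to the next.
- **Tensor parallelism** splits each weight matrix across machines and sums the partial results at every layer.

The pure-Python toy below uses a two-device split and a two-layer MLP. It is a generic Megatron-style illustration, not EXO's or MLX's actual code. It shows that both modes should reproduce the single-device output. So gibberish that appears only in tensor-parallel mode suggests a problem in how layers are sharded or recombined, rather than in the weights themselves.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """y = W x, with W stored as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v: Vector) -> Vector:
    return [max(0.0, vi) for vi in v]


def single_device(w1: Matrix, w2: Matrix, x: Vector) -> Vector:
    """Reference: the whole two-layer MLP on one device."""
    return matvec(w2, relu(matvec(w1, x)))


def pipeline_parallel(w1: Matrix, w2: Matrix, x: Vector) -> Vector:
    """Device 0 owns layer 1, device 1 owns layer 2; only h crosses the link."""
    h = relu(matvec(w1, x))  # computed on device 0, then sent to device 1
    return matvec(w2, h)     # computed on device 1


def tensor_parallel(w1: Matrix, w2: Matrix, x: Vector, n_dev: int = 2) -> Vector:
    """Every device owns a slice of both layers; partial outputs are summed.

    Layer 1 is split along its output rows, so each device computes its own
    slice of h (ReLU is element-wise, so it can run locally). Layer 2 is split
    along the matching input columns, so each device produces a partial y;
    the final sum stands in for the all-reduce between machines.
    """
    if len(w1) % n_dev:
        raise ValueError("hidden size must divide evenly across devices")
    shard = len(w1) // n_dev
    partials = []
    for d in range(n_dev):
        lo, hi = d * shard, (d + 1) * shard
        h_d = relu(matvec(w1[lo:hi], x))    # device d: rows lo..hi-1 of W1
        w2_d = [row[lo:hi] for row in w2]   # device d: columns lo..hi-1 of W2
        partials.append(matvec(w2_d, h_d))  # device d: partial output
    # All-reduce: element-wise sum of the partial outputs from every device.
    return [sum(vals) for vals in zip(*partials)]
```

With small integer weights, all three functions agree exactly. With real floating-point weights, the summation order differs between modes, so tiny numerical differences are normal. Wholesale gibberish is not.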

usermma

MLX Community org about 7 hours ago

Did you try other models on the same setup?

Try

mlx-community/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16-mlx-8Bit

It's well known, and a lot of people are running it.

If it doesn't work either, the problem is probably not the model.

If it does work, the problem is with "mlx-community/Nemotron-3-Ultra-550B-A55B".

usermma

MLX Community org about 6 hours ago

I only looked at your settings to check the values called "top-p", "top-k", and "temp".

Also, the model I suggested is just a suggestion, not a requirement. You can try any other model. I didn't mean to push you into trying a specific model you don't want.

No, I didn't.

I'm sorry if it looked like I did.

In the meantime, just try any other model.

Don't let the time go to waste!!
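The "temp", "top-k", and "top-p" settings mentioned above are standard sampling controls:

- **Temperature** divides the logits before the softmax.
- **Top-k** keeps only the k most likely tokens.
- **Top-p** keeps the smallest set of tokens whose probabilities add up to at least p.

The sketch below is a generic illustration under those definitions, not the code EXO or mlx-lm actually runs. Real implementations differ in details such as the order in which the filters are applied.

```python
import math
import random
from typing import Dict, Optional


def sampling_distribution(logits: Dict[str, float], temperature: float = 1.0,
                          top_k: Optional[int] = None,
                          top_p: Optional[float] = None) -> Dict[str, float]:
    """Token probabilities after temperature, then top-k, then top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0 (temperature 0 usually means greedy argmax)")
    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = {tok: val / temperature for tok, val in logits.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = sorted(((tok, e / total) for tok, e in exps.items()),
                   key=lambda item: item[1], reverse=True)
    # Top-k: keep only the k most likely tokens, then renormalize.
    if top_k is not None:
        probs = probs[:top_k]
        norm = sum(p for _, p in probs)
        probs = [(tok, p / norm) for tok, p in probs]
    # Top-p (nucleus): keep the smallest prefix whose probability mass reaches top_p.
    if top_p is not None:
        kept, mass = [], 0.0
        for tok, p in probs:
            kept.append((tok, p))
            mass += p
            if mass >= top_p:
                break
        probs = kept
    norm = sum(p for _, p in probs)
    return {tok: p / norm for tok, p in probs}


def sample(logits: Dict[str, float], rng: Optional[random.Random] = None, **kwargs) -> str:
    """Draw one token from the filtered distribution."""
    rng = rng or random.Random()
    dist = sampling_distribution(logits, **kwargs)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

Lowering the temperature from 1 to 0.7 concentrates probability on the most likely tokens. This is why it is a common first adjustment when output looks noisy, although it cannot fix output that is broken at the model or sharding level.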

usermma

MLX Community org about 6 hours ago

I hope with all my heart that your setup works for running the models you want.

Please let me know if you have any issues.

As I said, my whole priority right now is to fix this problem, since other users may be affected by it too.

I really don't want people to lose interest in digital intelligence...

sjthapa

about 5 hours ago

I tried with another model.
https://huggingface.co/mlx-community/Nemotron-3-Ultra-550B-A55B-4bit

Same issue. Works with pipeline parallel but not tensor parallel.
I think this is an issue with EXO and not with the model.

usermma

MLX Community org about 5 hours ago

https://huggingface.co/mlx-community/Nemotron-3-Ultra-550B-A55B-4bit

is the same model as

https://huggingface.co/mlx-community/Nemotron-3-Ultra-550B-A55B

because

https://huggingface.co/mlx-community/Nemotron-3-Ultra-550B-A55B

is also 4-bit.

They're all the same...

Use a different model, not just a different repository for the same model.....

sjthapa

about 4 hours ago

I checked with https://huggingface.co/mlx-community/MiniMax-M2.7-8bit

It works fine in EXO with tensor parallel.

usermma

MLX Community org about 4 hours ago

Nice, now we know it's from the model....

I recommend using this model:

https://huggingface.co/spicyneuron/Nex-N2-Pro-MLX-5.3bit-vision
or,
better, the full-precision one:
https://huggingface.co/usermma/Nex-N2-mini-mlx-fp16

These are good options until someone (or you) converts the model you want to MLX.

You have plenty of unified memory,

so you could do it easily, and better than most people: in less than half an hour,
the model would be converted!

This one is full precision, a good fit for your hardware,
and currently the best one to use... "unless you are ready to convert models to MLX on your own hardware"

https://huggingface.co/usermma/Nex-N2-mini-mlx-fp16

It's also great.

Or try converting it yourself from
nex-agi/Nex-N2-mini

Maybe you'll learn something new.... who knows?
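To convert a Hugging Face checkpoint yourself, use mlx-lm's `mlx_lm.convert` command. The minimal sketch below only assembles the command line. It assumes the `--hf-path`, `--mlx-path`, `-q`, `--q-bits`, and `--dtype` options found in recent mlx-lm releases, so check `mlx_lm.convert --help` for your version. `build_convert_cmd`, `convert`, and the output directory names are hypothetical:

```python
import subprocess
from typing import List, Optional


def build_convert_cmd(hf_path: str, mlx_path: str,
                      q_bits: Optional[int] = None,
                      dtype: str = "float16") -> List[str]:
    """Hypothetical helper: build an mlx_lm.convert invocation.

    q_bits=None keeps full-precision weights saved as `dtype`; an int such as
    4 or 8 requests quantization with that many bits per weight.
    """
    cmd = ["mlx_lm.convert", "--hf-path", hf_path, "--mlx-path", mlx_path]
    if q_bits is None:
        cmd += ["--dtype", dtype]
    else:
        cmd += ["-q", "--q-bits", str(q_bits)]
    return cmd


def convert(hf_path: str, mlx_path: str, **kwargs) -> None:
    """Run the conversion; requires mlx-lm installed with mlx_lm.convert on PATH."""
    subprocess.run(build_convert_cmd(hf_path, mlx_path, **kwargs), check=True)
```

For example, `convert("nex-agi/Nex-N2-mini", "Nex-N2-mini-mlx-fp16")` would produce a full-precision copy, and adding `q_bits=8` would produce a smaller 8-bit one.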

usermma

MLX Community org about 4 hours ago

Look at Nex-N2-mini: it would be fast on your hardware. Or try Nex-N2-Pro, which is slower but better.

Benchmark	Nex-N2-mini	Nex-N2-Pro	GPT-5.5	Opus 4.7	Kimi-K2.6	GLM-5.1	MiniMax M3	DeepSeek-V4-Pro
Agent
BrowseComp	74.1	83.7	84.4	79.8	83.2	79.3	83.5	83.4
GDPval	1402	1585	1769	1753	1481	1535	-	1554
Toolathlon	33.3	51.9	55.6	52.8	50.0	40.7	-	51.8
WildClawBench	47.7	53.5	58.2	62.2	-	48.2	-	43.7
WideSearch	62.0	75.6	-	-	80.8	-	-	-
TAU3	65.9	71.1	-	-	-	70.6	-	-
Coding & SWE
SWE-Bench Pro	50.2	58.8	58.6	64.3	58.6	58.4	59.0	55.4
Terminal-Bench 2.1	60.7	75.3	83.4	69.7	-	58.7	66.0	72.0
DeepSWE	8.0	33.6	70	54	24	18	-	8
SWE-Bench Verified	74.4	80.8	82.9	87.6	80.2	-	80.5	80.6
SWE Atlas QnA	31.5	37.9	45.4	45.2	-	-	37.9	-
SWE Atlas RF	30.0	32.9	44.8	48.6	-	-	-	-
SWE Atlas TW	23.3	40.0	42.6	38.2	-	-	30.8	-
General & Reasoning
GPQA Diamond	82.6	90.7	93.6	94.2	90.5	86.2	-	90.1
IFEval	89.1	94.0	-	-	94.5	94.5	-	91.9
Apex	9.4	36.5	-	-	24.0	11.5	-	38.3

usermma

MLX Community org about 4 hours ago

May I ask what you are working on? I may be able to help with some ideas... or even whole plans...

It looks like you are working on something big that would make the world a much better place...

If I had the same hardware as you, I would try everything, not just stick to one model series like "Nemotron".

usermma

MLX Community org about 3 hours ago

No quants, only full-precision models:

Benchmark	Nex-N2-mini	Nex-N2-Pro	Nemotron-3-Ultra
Agentic
Terminal Bench 2.1	60.7	75.3	56.4
GDPVal	1402	1585	46.7
SWE-Bench Verified	74.4	80.8	71.9
BrowseComp	74.1	83.7	44.4
Reasoning and Knowledge
Apex-Shortlist (no tools)	9.4	36.5	74.9

usermma

MLX Community org about 3 hours ago

Looks like there is a typo in the GDPVal row....
