Getting gibberish output

#1
by sjthapa - opened

I tried running this model on 2× M3 Ultra Macs (256 GB RAM each) with EXO.
The model loads successfully and inference starts, but the output is complete gibberish.

Screenshot 2026-06-05 at 8.03.43 PM

MLX Community org

Try going into Settings. I'm guessing it's the button at the bottom left.

It's beside the label "1 CONVERSATION", in a row laid out like this:

"settings button" "1 CONVERSATION" "other button"

Above that row there is "DELETE ALL CHATS".

Please open it and send me a screenshot.

Solving this problem is my top priority right now.

Once you have, I may suggest changing the temperature to 1 or 0.7.

Just show me the settings first.

Also, did you try other models?

Did they work?
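
For readers wondering why temperature is one of the first settings to check: the sketch below shows, under simplified assumptions, how the sampling temperature reshapes next-token probabilities. The logits are made-up illustrative values and `softmax_with_temperature` is a hypothetical helper, not part of EXO or MLX. A very high temperature flattens the distribution, which is one way a model can drift into incoherent text.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits to probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for four candidate tokens (illustrative values only).
logits = [4.0, 2.0, 1.0, 0.5]

probs_default = softmax_with_temperature(logits, temperature=1.0)
probs_cooler = softmax_with_temperature(logits, temperature=0.7)
probs_hot = softmax_with_temperature(logits, temperature=5.0)

# Lower temperature concentrates probability on the top token;
# higher temperature spreads it across all candidates.
print(probs_cooler[0], probs_default[0], probs_hot[0])
```

Values such as 0.7 or 1 keep most of the probability on the likeliest tokens, which is why they are common starting points when ruling out sampling settings as the cause.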

Here are the screenshots of the settings menu.

Screenshot 2026-06-05 at 9.35.31 PM
Screenshot 2026-06-05 at 9.35.41 PM
Screenshot 2026-06-05 at 9.35.56 PM
Screenshot 2026-06-05 at 9.36.03 PM
Screenshot 2026-06-05 at 9.36.12 PM
Screenshot 2026-06-05 at 9.36.20 PM

I just found out that the model gives proper output when pipeline parallelism is used, but gives gibberish output with tensor parallelism.
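
To make the difference between the two modes concrete, here is a toy sketch in plain Python. It is not EXO's implementation, and this thread does not establish the actual cause of the gibberish. It only illustrates, with made-up weights, that pipeline parallelism keeps each layer whole on one device, while tensor parallelism splits a layer across devices and depends on the partial results being combined correctly.

```python
def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

# Two tiny layers; all values are illustrative only.
W1 = [[1.0, 2.0, 0.0, 1.0],
      [0.0, 1.0, 3.0, 2.0]]   # 2 outputs, 4 inputs
W2 = [[2.0, 1.0],
      [1.0, -1.0]]            # 2 outputs, 2 inputs
x = [1.0, 2.0, 3.0, 4.0]

# Reference: both layers on a single device.
reference = matvec(W2, matvec(W1, x))

# Pipeline parallelism: device A runs layer 1, device B runs layer 2.
hidden_on_a = matvec(W1, x)
pipeline_out = matvec(W2, hidden_on_a)

# Tensor parallelism: layer 1 is split along its input dimension,
# so each device holds half the columns of W1 and half of x.
W1_a = [row[:2] for row in W1]
W1_b = [row[2:] for row in W1]
partial_a = matvec(W1_a, x[:2])
partial_b = matvec(W1_b, x[2:])

# Correct: sum the partial results (the "all-reduce" step).
hidden_tp = [a + b for a, b in zip(partial_a, partial_b)]
tensor_out = matvec(W2, hidden_tp)

# Faulty: device A's partial result is used without the reduction.
broken_out = matvec(W2, partial_a)

print(reference, pipeline_out, tensor_out, broken_out)
```

In this toy case the pipeline and correctly reduced tensor-parallel results match the single-device reference, while the faulty variant does not. A model-specific sharding problem of this general kind would be consistent with output that is fine in one mode and garbled in the other, though that remains an assumption here.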


MLX Community org

Did you try other models on the same setup?

Try

mlx-community/Qwen3.6-27B-AEON-Ultimate-Uncensored-BF16-mlx-8Bit

It's well known and a lot of people have run it.

If it doesn't work, the problem is probably not the model.

If it does work, the problem is with "mlx-community/Nemotron-3-Ultra-550B-A55B".

MLX Community org

I only looked through your settings to find the "top-p", "top-k", and "temp" options.

Also, the model I suggested is just a suggestion, not a requirement. You can try any other model. I'm not forcing you to try a specific model you don't want.

No, I'm not.

I'm sorry if it looked like I was.

In the meantime, just try any other model.

Don't let time go to waste!!

MLX Community org

I sincerely hope the setup works for running the models you want.

Please let me know if you have any issues.

As I said, my top priority right now is fixing this problem, which may affect other users too.

I really don't want people to lose interest in digital intelligence...

I tried with another model.
https://huggingface.co/mlx-community/Nemotron-3-Ultra-550B-A55B-4bit

Same issue. Works with pipeline parallel but not tensor parallel.
I think this is an issue with EXO and not with the model.

MLX Community org

https://huggingface.co/mlx-community/Nemotron-3-Ultra-550B-A55B-4bit

is the same model as

https://huggingface.co/mlx-community/Nemotron-3-Ultra-550B-A55B

because

https://huggingface.co/mlx-community/Nemotron-3-Ultra-550B-A55B

is also 4-bit.

They're all the same...

Use a different model, not just a different repository for the same model.

I checked with https://huggingface.co/mlx-community/MiniMax-M2.7-8bit

It works fine in EXO with tensor parallel.
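
The results reported so far can be lined up to see what they do and do not rule out. The sketch below simply tabulates the observations from this thread; `summarize` is a hypothetical helper, and treating the original model as `Nemotron-3-Ultra-550B-A55B` is an assumption based on the repository named earlier in the discussion.

```python
# Observations reported in this thread: (model, parallelism mode) -> output OK?
observations = {
    ("Nemotron-3-Ultra-550B-A55B", "pipeline"): True,
    ("Nemotron-3-Ultra-550B-A55B", "tensor"): False,
    ("Nemotron-3-Ultra-550B-A55B-4bit", "pipeline"): True,
    ("Nemotron-3-Ultra-550B-A55B-4bit", "tensor"): False,
    ("MiniMax-M2.7-8bit", "tensor"): True,
}

def summarize(results):
    """Group models by parallelism mode into working and failing lists."""
    summary = {}
    for (model, mode), ok in results.items():
        bucket = summary.setdefault(mode, {"working": [], "failing": []})
        bucket["working" if ok else "failing"].append(model)
    for bucket in summary.values():
        bucket["working"].sort()
        bucket["failing"].sort()
    return summary

summary = summarize(observations)

# Tensor parallel works for at least one model and fails for others, so the
# failure is tied to a model/mode combination, not to the mode as a whole.
tensor_mixed = bool(summary["tensor"]["working"]) and bool(summary["tensor"]["failing"])
print(summary["tensor"], tensor_mixed)
```

Read this way, the evidence points at the combination of the Nemotron repositories and tensor parallel, without yet showing whether the fix belongs in the converted weights or in how EXO shards this architecture.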

MLX Community org

Nice, now we know the problem comes from the model.

I recommend trying this model:

https://huggingface.co/spicyneuron/Nex-N2-Pro-MLX-5.3bit-vision
or, better, the full-precision version:
https://huggingface.co/usermma/Nex-N2-mini-mlx-fp16

This is a good option until you or someone else converts the model you want to MLX.

You have plenty of unified memory,

so you could do it more easily than most people. In less than half an hour,
the model would be converted!

This one is full precision and a good fit for your hardware.
It's currently the best option to use, unless you are ready to convert models to MLX on your own hardware:

https://huggingface.co/usermma/Nex-N2-mini-mlx-fp16

It's also great.

Or try converting it yourself from
nex-agi/Nex-N2-mini

Maybe you'll learn something new... who knows?

MLX Community org

Look at Nex-N2-mini. It would be fast on your hardware. Or there is Nex-N2-Pro, which is slower but better.

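| Benchmark | Nex-N2-mini | Nex-N2-Pro | GPT-5.5 | Opus 4.7 | Kimi-K2.6 | GLM-5.1 | MiniMax M3 | DeepSeek-V4-Pro |
|---|---|---|---|---|---|---|---|---|
| Agent | | | | | | | | |
| BrowseComp | 74.1 | 83.7 | 84.4 | 79.8 | 83.2 | 79.3 | 83.5 | 83.4 |
| GDPval | 1402 | 1585 | 1769 | 1753 | 1481 | 1535 | - | 1554 |
| Toolathlon | 33.3 | 51.9 | 55.6 | 52.8 | 50.0 | 40.7 | - | 51.8 |
| WildClawBench | 47.7 | 53.5 | 58.2 | 62.2 | - | 48.2 | - | 43.7 |
| WideSearch | 62.0 | 75.6 | - | - | 80.8 | - | - | - |
| TAU3 | 65.9 | 71.1 | - | - | - | 70.6 | - | - |
| Coding & SWE | | | | | | | | |
| SWE-Bench Pro | 50.2 | 58.8 | 58.6 | 64.3 | 58.6 | 58.4 | 59.0 | 55.4 |
| Terminal-Bench 2.1 | 60.7 | 75.3 | 83.4 | 69.7 | - | 58.7 | 66.0 | 72.0 |
| DeepSWE | 8.0 | 33.6 | 70 | 54 | 24 | 18 | - | 8 |
| SWE-Bench Verified | 74.4 | 80.8 | 82.9 | 87.6 | 80.2 | - | 80.5 | 80.6 |
| SWE Atlas QnA | 31.5 | 37.9 | 45.4 | 45.2 | - | - | 37.9 | - |
| SWE Atlas RF | 30.0 | 32.9 | 44.8 | 48.6 | - | - | - | - |
| SWE Atlas TW | 23.3 | 40.0 | 42.6 | 38.2 | - | - | 30.8 | - |
| General & Reasoning | | | | | | | | |
| GPQA Diamond | 82.6 | 90.7 | 93.6 | 94.2 | 90.5 | 86.2 | - | 90.1 |
| IFEval | 89.1 | 94.0 | - | - | 94.5 | 94.5 | - | 91.9 |
| Apex | 9.4 | 36.5 | - | - | 24.0 | 11.5 | - | 38.3 |
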
MLX Community org

May I ask what you are working on? I may be able to help with some ideas, or even whole plans.

It looks like you are working on something big that would make the world a much better place.

If I had the same hardware as you, I would try everything, not just stick to one model series like "Nemotron".

MLX Community org

No quants, full precision only:

| Benchmark | Nex-N2-mini | Nex-N2-Pro | Nemotron-3-Ultra |
|---|---|---|---|
| Agentic | | | |
| Terminal Bench 2.1 | 60.7 | 75.3 | 56.4 |
| GDPVal | 1402 | 1585 | 46.7 |
| SWE-Bench Verified | 74.4 | 80.8 | 71.9 |
| BrowseComp | 74.1 | 83.7 | 44.4 |
| Reasoning and Knowledge | | | |
| Apex-Shortlist (no tools) | 9.4 | 36.5 | 74.9 |

MLX Community org

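It looks like there is a typo in the GDPVal row: the Nemotron-3-Ultra value of 46.7 does not appear to be on the same scale as 1402 and 1585.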
