Text Generation
Transformers
Safetensors
PyTorch
nemotron_h
nvidia
conversational
custom_code
Eval Results
Instructions to use nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16", trust_remote_code=True)
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    pipe(messages)

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16", trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained("nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16", trust_remote_code=True)
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    # Render the chat template and tokenize it in one step
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=40)
    # Decode only the newly generated tokens, skipping the prompt
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
        "messages": [
          {
            "role": "user",
            "content": "What is the capital of France?"
          }
        ]
      }'
Use Docker
docker model run hf.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
- SGLang
How to use nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
      --model-path "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16" \
      --host 0.0.0.0 \
      --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
        "messages": [
          {
            "role": "user",
            "content": "What is the capital of France?"
          }
        ]
      }'
Use Docker images
    docker run --gpus all \
      --shm-size 32g \
      -p 30000:30000 \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      --env "HF_TOKEN=<secret>" \
      --ipc=host \
      lmsysorg/sglang:latest \
      python3 -m sglang.launch_server \
        --model-path "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
      -H "Content-Type: application/json" \
      --data '{
        "model": "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
        "messages": [
          {
            "role": "user",
            "content": "What is the capital of France?"
          }
        ]
      }'
- Docker Model Runner
How to use nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 with Docker Model Runner:
docker model run hf.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16
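The curl calls in the vLLM and SGLang sections can also be issued from Python. The following is a minimal sketch using only the standard library, assuming a server is already running locally on the port shown in those sections (8000 for vLLM, 30000 for SGLang) and that it returns the usual OpenAI-style chat completion body with the reply under `choices[0].message.content`. The helper names `build_chat_request`, `extract_reply`, and `chat` are hypothetical and exist only for this illustration.

```python
import json
import urllib.request

MODEL_ID = "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16"


def build_chat_request(base_url, user_message, model=MODEL_ID):
    """Build a POST request for the /v1/chat/completions route (hypothetical helper)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant text, assuming an OpenAI-style response body."""
    return response_body["choices"][0]["message"]["content"]


def chat(base_url, user_message, timeout=60):
    """Send one user message to a running server and return the reply text."""
    request = build_chat_request(base_url, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))


# Example call (requires a running server, so it is left commented out):
# print(chat("http://localhost:8000", "What is the capital of France?"))   # vLLM
# print(chat("http://localhost:30000", "What is the capital of France?"))  # SGLang
```

Building the request separately from sending it keeps the payload easy to inspect without a live server.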
TemporalMesh Transformer: 29.4 PPL at 48% compute — beats Mamba, new open-source architecture
#64 opened 15 days ago by vigneshwar234
MLX 4-bit quant available (M4 Pro benchmarks included)
#63 opened 15 days ago by BrendanL79
KeyError: 'shape' when using apply_chat_template with tools
#61 opened 20 days ago by OnalBusra
Prompt template for nemotron model
#60 opened 2 months ago by falcon25051997
What a hell? Useless for story
#58 opened 3 months ago by kellysan · 1 comment
math-oai.yaml file for aime eval
#56 opened 3 months ago by Michalea
Update nano_v3_reasoning_parser.py
#54 opened 4 months ago by HyzeAI
Add GPQA evaluation result
#43 opened 5 months ago by burtenshaw
what is the implementation of the bench "AIME25 (with tools)"?
#42 opened 5 months ago by YF-T · 1 comment
[Research] Adaptive-K Routing Validation: 33% Compute Savings on Nemotron 3 Nano
#41 opened 5 months ago by Gabrobals · ❤️ 3
Correct `get_decoder`/`set_decoder`
#40 opened 5 months ago by kylemylonakisprotopia · 3 comments
Is this model going to be seriously considered? Seeking Official Channels to Contact the Model’s Developers or an Active Community
#36 opened 6 months ago by j3st3r666 · 1 comment
Inquiry about Nemotron 3 Nano technical report training details
#34 opened 6 months ago by andresnowak · 1 comment
Failure in basic question, is it any good at programming
#31 opened 6 months ago by engrtipusultan · 👀 1 · 9 comments
Recommended parameters?
#30 opened 6 months ago by leonsarmiento · 1 comment
Fix streaming output when enable_thinking is disabled
#29 opened 6 months ago by Kwindla · 1 comment
No, Bad Logic (((
#28 opened 6 months ago by BuBaLoM · 2 comments
vLLM implementation for reasoning budget
#27 opened 6 months ago by lssj14 · 2 comments
Tool calling issue: got "True" as a String instead of a valid JSON format such as true (the primitive, unquoted value)
#25 opened 6 months ago by j3st3r666 · 3 comments
Recommended way of fine-tuning?
#17 opened 6 months ago by devon-kindo · 2 comments
Unexpected... "Performance"?
#15 opened 6 months ago by ponzles · 👍 2 · 9 comments
doesn't do kv caching on transformers
#14 opened 6 months ago by adaface-neurips · 🔥➕ 3 · 4 comments
Does not work with dgx spark
#13 opened 6 months ago by sotaaa · 🔥 1 · 6 comments
Actual context length
#12 opened 6 months ago by yuchsiao · 6 comments
I really hope this model works
#8 opened 6 months ago by BVEsun · 1 comment
Simple minesweeper game is failing.
#7 opened 6 months ago by robert1968 · 1 comment
Good model but it is very flawed in recalling input
#5 opened 6 months ago by cmp-nct · 6 comments
Problem working with long text
#4 opened 6 months ago by Kosh69 · 5 comments
Tool calling with reasoning parsing broken
#3 opened 6 months ago by nephepritou · 11 comments