Mesh LLM

Nemotron-3-Nano-30B-A3B-UD-Q4_K_XL

Distributed GGUF inference package for Mesh LLM


GGUF layer package for running Nemotron-3-Nano-30B-A3B-UD-Q4_K_XL across a local Mesh LLM cluster.

This package is derived from unsloth/Nemotron-3-Nano-30B-A3B-GGUF. It contains the original GGUF distribution, split into per-layer artifacts for distributed inference.

Highlights

| Run locally | Pool multiple machines | OpenAI-compatible | Package variant |
| --- | --- | --- | --- |
| Private inference on your hardware | Split layers across peers | Serve /v1/chat/completions locally | UD-Q4_K_XL layer package |

Model Overview

| Property | Value |
| --- | --- |
| Source model | unsloth/Nemotron-3-Nano-30B-A3B-GGUF |
| Model id | unsloth/Nemotron-3-Nano-30B-A3B-GGUF:UD-Q4_K_XL |
| Family | Nemotron |
| Parameter scale | 30B-A3B |
| Quantization | UD-Q4_K_XL |
| Layer count | 52 |
| Activation width | 2688 |
| Package size | 21.7 GB |
| Source file | Nemotron-3-Nano-30B-A3B-UD-Q4_K_XL.gguf |
| Package repo | meshllm/Nemotron-3-Nano-30B-A3B-UD-Q4_K_XL-layers |

Recommended Use

  • Local and private inference with Mesh LLM.
  • Multi-machine serving when the full GGUF is too large for one host.
  • OpenAI-compatible chat/completions workflows through Mesh LLM's local API.

For upstream architecture details, chat template guidance, sampling recommendations, license terms, and benchmark notes, see the source model card: unsloth/Nemotron-3-Nano-30B-A3B-GGUF.

Quickstart

# Run this on each machine that should contribute memory/compute.
mesh-llm serve --model "meshllm/Nemotron-3-Nano-30B-A3B-UD-Q4_K_XL-layers" --split
# Check the mesh and discover the OpenAI-compatible model name.
curl -s http://localhost:3131/api/status
curl -s http://localhost:3131/v1/models
# Send an OpenAI-compatible chat request.
curl -s http://localhost:3131/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "unsloth/Nemotron-3-Nano-30B-A3B-GGUF:UD-Q4_K_XL",
    "messages": [{"role": "user", "content": "Write a tiny hello-world function in Rust."}],
    "max_tokens": 128
  }'
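
The status and model-listing calls above can be wrapped in a small helper that waits for the local API before asking for model ids. This is a minimal sketch, not part of Mesh LLM: the function names and the MESH_URL variable are hypothetical, and it assumes /v1/models returns an OpenAI-style body (a data array of objects that each carry an id field), which this card does not confirm. The grep/sed extraction is only an approximation; jq is more robust where it is installed.

```sh
# Hypothetical helpers for the Quickstart endpoints; names are illustrative only.
# Assumption: the API listens on the port shown in the Quickstart.
MESH_URL="${MESH_URL:-http://localhost:3131}"

# Print every "id" string value found in a JSON document read from stdin,
# one per line. This matches "id" keys anywhere in the document, so it is
# only suitable for simple OpenAI-style model listings.
extract_model_ids() {
  grep -o '"id"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed 's/^"id"[[:space:]]*:[[:space:]]*"\(.*\)"$/\1/'
}

# Retry a GET until it succeeds or the attempt budget is used up.
# Arguments: URL, attempts (default 30), delay in seconds (default 2).
wait_for_api() {
  _url="$1"; _attempts="${2:-30}"; _delay="${3:-2}"; _i=0
  while [ "$_i" -lt "$_attempts" ]; do
    curl -sf -o /dev/null "$_url" && return 0
    _i=$((_i + 1))
    sleep "$_delay"
  done
  return 1
}

# Wait for the status endpoint, then list the advertised model ids.
list_mesh_models() {
  wait_for_api "$MESH_URL/api/status" || {
    echo "Mesh LLM API not reachable at $MESH_URL" >&2
    return 1
  }
  curl -sf "$MESH_URL/v1/models" | extract_model_ids
}
```

After sourcing these functions, calling list_mesh_models prints one model id per line. An id from that list is what goes in the "model" field of the chat request.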

Package Variant

| Property | Value |
| --- | --- |
| Format | layer-package |
| Canonical source ref | unsloth/Nemotron-3-Nano-30B-A3B-GGUF@main/Nemotron-3-Nano-30B-A3B-UD-Q4_K_XL.gguf |
| Source revision | main |
| Source SHA-256 | 627f5b04aedc97f967332f331bd75b7a4ed2f33ca83e6ee74b44235cc1887890 |
| Skippy ABI | 0.1.24 |
| Package manifest SHA-256 | 7540109f46fcf9e157ad4f6095491a6f7a664f5d493fee3035ef6bc852adb041 |

What Is Included

| Artifact | Path | Contents | SHA-256 |
| --- | --- | --- | --- |
| Manifest | model-package.json | Package schema, source identity, checksums | 7540109f46fcf9e157ad4f6095491a6f7a664f5d493fee3035ef6bc852adb041 |
| Metadata | shared/metadata.gguf | 0 tensors, 7.5 MB | 11260b6d5f64a5c4b9649b3e9e88b05ba66b8476c350a0523a69a42250faa297 |
| Embeddings | shared/embeddings.gguf | 1 tensor, 238.5 MB | 9857d6af8b842a5a871a15c67fea5b14977b3d06080140ed92d089fa7736f36b |
| Output head | shared/output.gguf | 2 tensors, 364.5 MB | 33abed99bf276faf461576898c21e7f3572725d1141cf33db64c16e8c2e1b8e6 |
| Transformer layers | layers/layer-*.gguf | 52 layer artifacts, 398 tensors, 21.1 GB | see model-package.json |
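
The checksums listed above can be compared against local copies of the artifacts. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the files are assumed to be already downloaded under the paths shown in the table, and either sha256sum or shasum is assumed to be on the PATH. It is not a Mesh LLM command.

```sh
# Hypothetical helpers; not part of Mesh LLM or skippy-model-package.

# Print the lowercase hex SHA-256 digest of a file.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1   # fallback, e.g. on macOS
  fi
}

# Compare a file against an expected digest.
# Returns 0 on match, 1 on mismatch, 2 if the file does not exist.
verify_sha256() {
  _file="$1"; _expected="$2"
  [ -f "$_file" ] || { echo "missing: $_file" >&2; return 2; }
  _actual=$(sha256_of "$_file")
  if [ "$_actual" = "$_expected" ]; then
    echo "OK   $_file"
  else
    echo "FAIL $_file (got $_actual)" >&2
    return 1
  fi
}

# Example calls, using digests copied from the table above:
#   verify_sha256 model-package.json 7540109f46fcf9e157ad4f6095491a6f7a664f5d493fee3035ef6bc852adb041
#   verify_sha256 shared/metadata.gguf 11260b6d5f64a5c4b9649b3e9e88b05ba66b8476c350a0523a69a42250faa297
```

The per-layer digests are not listed in this card; they are recorded in model-package.json, whose schema is not described here, so checking the layer files would require reading that manifest first.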

Validation

This package was generated by the Mesh LLM HF Jobs splitter from mesh-llm ref main. Each artifact is checksummed as it is written, uploaded to this repository, and removed from the job workspace before the next artifact is produced.

skippy-model-package write-package "/source/Nemotron-3-Nano-30B-A3B-UD-Q4_K_XL.gguf" --out-dir "/tmp/meshllm-layer-job-meshllm_Nemotron-3-Nano-30B-A3B-UD-Q4_K_XL-layers-193/package"

