gemma-4-31B-it-UD-Q4_K_XL

Distributed GGUF inference package for Mesh LLM

GGUF layer package for running gemma-4-31B-it-UD-Q4_K_XL across a local Mesh LLM cluster.

This package is derived from unsloth/gemma-4-31B-it-GGUF. It splits the original GGUF file into per-layer artifacts for distributed inference.

Highlights

  • Run locally: private inference on your hardware.
  • Pool multiple machines: split layers across peers.
  • OpenAI-compatible: serve /v1/chat/completions locally.
  • Package variant: UD-Q4_K_XL layer package.

Model Overview

| Property | Value |
|---|---|
| Source model | unsloth/gemma-4-31B-it-GGUF |
| Model id | unsloth/gemma-4-31B-it-GGUF:UD-Q4_K_XL |
| Family | Gemma |
| Parameter scale | 31B |
| Quantization | UD-Q4_K_XL |
| Layer count | 60 |
| Activation width | 5376 |
| Package size | 18.4 GB |
| Source file | gemma-4-31B-it-UD-Q4_K_XL.gguf |
| Package repo | meshllm/gemma-4-31B-it-UD-Q4_K_XL-layers |

Recommended Use

  • Local and private inference with Mesh LLM.
  • Multi-machine serving when the full GGUF is too large for one host.
  • OpenAI-compatible chat/completions workflows through Mesh LLM's local API.

For upstream architecture details, chat template guidance, sampling recommendations, license terms, and benchmark notes, see the source model card: unsloth/gemma-4-31B-it-GGUF.

Quickstart

# Run this on each machine that should contribute memory/compute.
mesh-llm serve --model "meshllm/gemma-4-31B-it-UD-Q4_K_XL-layers" --split

# Check the mesh and discover the OpenAI-compatible model name.
curl -s http://localhost:3131/api/status
curl -s http://localhost:3131/v1/models

# Send an OpenAI-compatible chat request.
curl -s http://localhost:3131/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "unsloth/gemma-4-31B-it-GGUF:UD-Q4_K_XL",
    "messages": [{"role": "user", "content": "Write a tiny hello-world function in Rust."}],
    "max_tokens": 128
  }'
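
The status and model-list endpoints above may not answer right after startup. A minimal sketch of a readiness poll follows, assuming the same port 3131 as the quickstart. `wait_for_endpoint` is a hypothetical helper written for this example and is not part of Mesh LLM.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mesh LLM): poll a URL until it returns
# a successful HTTP status or the attempts run out.
# Arguments: URL, max attempts (default 30), delay in seconds (default 2).
wait_for_endpoint() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -s: quiet, -f: treat HTTP errors as failures, --max-time: avoid hanging.
    if curl -sf --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

One way to use it is `wait_for_endpoint http://localhost:3131/v1/models && curl -s http://localhost:3131/v1/models`, so that the model name is only queried once the API responds.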

Package Variant

| Property | Value |
|---|---|
| Format | layer-package |
| Canonical source ref | unsloth/gemma-4-31B-it-GGUF@main/gemma-4-31B-it-UD-Q4_K_XL.gguf |
| Source revision | main |
| Source SHA-256 | dd69c1566b0d559f303424e2b00e830837bb6c8b25e5baa08c75f11709bd41db |
| Skippy ABI | 0.1.24 |
| Package manifest SHA-256 | 7a796307aba9e91653c3f3ea8b5309106b52bd86b6cd0264ed324b8d30b159b5 |
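
The digests above can be checked against a locally downloaded file. The sketch below is one way to do that with standard tools. `verify_sha256` is a hypothetical helper for illustration, and this card does not say whether Mesh LLM performs the same check itself.

```shell
#!/bin/sh
# Hypothetical helper: compare a file's SHA-256 digest with an expected hex string.
# Returns 0 on match, 1 on mismatch, 2 if no hashing tool is available.
verify_sha256() {
  file=$1
  # Normalize the expected digest to lowercase hex.
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum < "$file" | awk '{print $1}')
  elif command -v shasum >/dev/null 2>&1; then
    actual=$(shasum -a 256 < "$file" | awk '{print $1}')
  else
    echo "no SHA-256 tool found" >&2
    return 2
  fi
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}
```

For example, with the manifest downloaded into the current directory, `verify_sha256 model-package.json 7a796307aba9e91653c3f3ea8b5309106b52bd86b6cd0264ed324b8d30b159b5` should report a match if the file is intact.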

What Is Included

| Artifact | Path | Contents | SHA-256 |
|---|---|---|---|
| Manifest | model-package.json | Package schema, source identity, checksums | 7a796307aba9e91653c3f3ea8b5309106b52bd86b6cd0264ed324b8d30b159b5 |
| Metadata | shared/metadata.gguf | 1 tensor, 15.1 MB | 37e07678a83b7883582da4e9c4f1f206f0e6b183308a4065023b9205ab607a7e |
| Embeddings | shared/embeddings.gguf | 2 tensors, 939.1 MB | 4200bc676a7a75e12716cbbbbb9089d99d01e58e33735e29b9fbd3291c5d1ab5 |
| Output head | shared/output.gguf | 2 tensors, 15.1 MB | c79c3dbfc31a55162b379fd4e7f8df6695ed19c1d368cc71be2ebc0039b68f6f |
| Transformer layers | layers/layer-*.gguf | 60 layer artifacts, 890 tensors, 17.5 GB | see model-package.json |
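
The layer figures above allow a rough capacity estimate per machine. The sketch below divides a memory budget by the average layer size (17.5 GB over 60 layers). It is a planning aid only: it assumes equally sized layers, ignores the shared artifacts, KV cache, and runtime overhead, and does not describe how Mesh LLM actually assigns layers. `layers_that_fit` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: estimate how many transformer layers fit in a
# memory budget given in GB. Uses the table's totals (60 layers, 17.5 GB)
# and assumes all layers are the same size, which is an approximation.
layers_that_fit() {
  awk -v budget="$1" -v total=17.5 -v layers=60 'BEGIN {
    n = int(budget * layers / total)   # budget divided by average layer size
    if (n > layers) n = layers         # cannot hold more layers than exist
    if (n < 0) n = 0
    print n
  }'
}
```

For example, `layers_that_fit 8` prints 27, which suggests that three hosts with about 8 GB free for weights could cover all 60 layers under these assumptions.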

Validation

This package was generated by the Mesh LLM HF Jobs splitter from mesh-llm ref main. Each artifact is checksummed as it is written, uploaded to this repository, and removed from the job workspace before the next artifact is produced.

skippy-model-package write-package "/source/gemma-4-31B-it-UD-Q4_K_XL.gguf" --out-dir "/tmp/meshllm-layer-job-meshllm_gemma-4-31B-it-UD-Q4_K_XL-layers-199/package"
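
On the consuming side, a quick completeness check is to count the layer files in a local copy of the package. The sketch below assumes the layers/layer-*.gguf layout listed under What Is Included and takes the directory as an argument, since this card does not state where Mesh LLM stores downloads. `check_layer_count` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: count layer artifacts under <dir>/layers and compare
# with the expected number (default 60, the layer count of this package).
# Prints the count; returns 0 only if it equals the expected value.
check_layer_count() {
  dir=$1
  expected=${2:-60}
  count=0
  for f in "$dir"/layers/layer-*.gguf; do
    # Skip the unexpanded pattern when no files match.
    if [ -e "$f" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
  [ "$count" -eq "$expected" ]
}
```

Calling `check_layer_count /path/to/package` on a complete copy should print 60 and return success. A lower count points to a partial download.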
