Mesh LLM

Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL

Distributed GGUF inference package for Mesh LLM

GGUF layer package for running Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL across a local Mesh LLM cluster.

This package is derived from unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF and splits the original GGUF distribution into per-layer artifacts for distributed inference.

Highlights

| Run locally | Pool multiple machines | OpenAI-compatible | Package variant |
| --- | --- | --- | --- |
| Private inference on your hardware | Split layers across peers | Serve /v1/chat/completions locally | UD-Q4_K_XL layer package |

Model Overview

| Property | Value |
| --- | --- |
| Source model | unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF |
| Model id | unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF:UD-Q4_K_XL |
| Family | Llama |
| Parameter scale | 17B |
| Quantization | UD-Q4_K_XL |
| Layer count | 48 |
| Activation width | 5120 |
| Package size | 58.3 GB |
| Source file | UD-Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf |
| Package repo | meshllm/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-layers |

Recommended Use

  • Local and private inference with Mesh LLM.
  • Multi-machine serving when the full GGUF is too large for one host.
  • OpenAI-compatible chat/completions workflows through Mesh LLM's local API.

For upstream architecture details, chat template guidance, sampling recommendations, license terms, and benchmark notes, see the source model card: unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF.

Quickstart

# Run this on each machine that should contribute memory/compute.
mesh-llm serve --model "meshllm/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-layers" --split
# Check the mesh and discover the OpenAI-compatible model name.
curl -s http://localhost:3131/api/status
curl -s http://localhost:3131/v1/models
# Send an OpenAI-compatible chat request.
curl -s http://localhost:3131/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF:UD-Q4_K_XL",
    "messages": [{"role": "user", "content": "Write a tiny hello-world function in Rust."}],
    "max_tokens": 128
  }'
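
The request above hardcodes the prompt inside the JSON body. As a minimal sketch, the same call can be wrapped in shell functions so the model name and prompt become arguments. The function names (`json_escape`, `chat_payload`, `mesh_chat`) and the `MESH_URL` variable are hypothetical helpers, not part of Mesh LLM. Only the endpoint, port, and JSON fields are taken from the quickstart.

```shell
#!/bin/sh
# Hypothetical helpers; only the URL and JSON fields come from the quickstart.
MESH_URL="${MESH_URL:-http://localhost:3131}"

# json_escape TEXT: escape backslashes and double quotes for use inside a JSON string.
# Newlines and control characters are NOT handled, so keep prompts on one line.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# chat_payload MODEL PROMPT [MAX_TOKENS]: print the request body (default 128 tokens).
chat_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"max_tokens":%s}' \
    "$(json_escape "$1")" "$(json_escape "$2")" "${3:-128}"
}

# mesh_chat MODEL PROMPT [MAX_TOKENS]: POST the body and print the raw JSON response.
mesh_chat() {
  curl -s "$MESH_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "$(chat_payload "$1" "$2" "${3:-128}")"
}
```

For example, `mesh_chat "unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF:UD-Q4_K_XL" "Write a tiny hello-world function in Rust."` sends the same request as the quickstart. If `jq` is installed and the response follows the usual OpenAI chat-completions shape, piping the output through `jq -r '.choices[0].message.content'` should print only the reply text.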

Package Variant

| Property | Value |
| --- | --- |
| Format | layer-package |
| Canonical source ref | unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF@main/UD-Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf |
| Source revision | main |
| Source SHA-256 | bc2e19afa97799f31a9e99818f235bb3a16fa12b695cf49d3ab7c93eca1d4d45 |
| Skippy ABI | 0.1.24 |
| Package manifest SHA-256 | b26056cb7558565a16ce21ea4155a9d893779340ca2bcd9be2c9342f34b5dc4e |

What Is Included

| Artifact | Path | Contents | SHA-256 |
| --- | --- | --- | --- |
| Manifest | model-package.json | Package schema, source identity, checksums | b26056cb7558565a16ce21ea4155a9d893779340ca2bcd9be2c9342f34b5dc4e |
| Metadata | shared/metadata.gguf | 1 tensor, 12.5 MB | ac53081e0403a186f0b763a3997c65fb5b851aa1f4f300808a61f4c8ab63d56c |
| Embeddings | shared/embeddings.gguf | 2 tensors, 567.4 MB | 8b97e7c7373b64dc41db22d34b1db1f750b90e55ac78b09af3f72d64938c8851 |
| Output head | shared/output.gguf | 3 tensors, 821.8 MB | f3e37c93e4cfacfb719eb0af4c73aa9823dda79a9bd1cc05520a3af717cf807c |
| Transformer layers | layers/layer-*.gguf | 48 layer artifacts, 672 tensors, 57.0 GB | see model-package.json |
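
The layer figures above allow a rough capacity estimate when planning a mesh. The sketch below divides the 57.0 GB of layer artifacts by the 48 layers to get an average layer size, then estimates how many layers fit in a given memory budget. It is a back-of-the-envelope aid only: it assumes all layers are the same size, ignores the KV cache, activations, runtime overhead, and the shared artifacts, and says nothing about how Mesh LLM actually assigns layers to peers. The function names are hypothetical.

```shell
#!/bin/sh
# Figures from the artifact table; everything else is an illustrative assumption.
LAYER_TOTAL_GB=57.0   # combined size of layers/layer-*.gguf
LAYER_COUNT=48

# avg_layer_gb: mean size of one layer artifact, in GB.
avg_layer_gb() {
  awk -v t="$LAYER_TOTAL_GB" -v n="$LAYER_COUNT" 'BEGIN { printf "%.2f\n", t / n }'
}

# layers_for_budget MEM_GB: number of average-sized layers that fit in MEM_GB,
# capped at the layer count.
layers_for_budget() {
  awk -v mem="$1" -v t="$LAYER_TOTAL_GB" -v n="$LAYER_COUNT" 'BEGIN {
    k = int(mem / (t / n))
    if (k > n) k = n
    if (k < 0) k = 0
    print k
  }'
}

# hosts_needed MEM_GB: minimum number of identical hosts needed to hold all layers.
hosts_needed() {
  k=$(layers_for_budget "$1")
  if [ "$k" -eq 0 ]; then
    echo "n/a"   # budget is smaller than one average layer
    return 1
  fi
  echo $(( (LAYER_COUNT + k - 1) / k ))   # integer ceiling of LAYER_COUNT / k
}
```

With these numbers, `avg_layer_gb` prints 1.19, `layers_for_budget 24` prints 20, and `hosts_needed 24` prints 3. Treat the result as a lower bound on hardware, since real memory use will be higher than the weights alone.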

Validation

This package was generated by the Mesh LLM HF Jobs splitter from mesh-llm ref main. Each artifact is checksummed as it is written, uploaded to this repository, and removed from the job workspace before the next artifact is produced.

skippy-model-package write-package "/source/UD-Q4_K_XL/Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-00001-of-00002.gguf" --out-dir "/tmp/meshllm-layer-job-meshllm_Llama-4-Scout-17B-16E-Instruct-UD-Q4_K_XL-layers-200/package"
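
To confirm that a downloaded artifact matches the SHA-256 digests listed in this card, a check along the following lines can be used. This is a sketch with hypothetical function names. It relies on the standard `sha256sum` tool, falls back to `shasum -a 256` where that is what the system provides, and assumes the artifact has already been downloaded to a local path.

```shell
#!/bin/sh
# sha256_of FILE: print the lowercase hex SHA-256 digest of FILE.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'   # common on macOS
  fi
}

# verify_sha256 FILE EXPECTED: succeed only if FILE's digest equals EXPECTED.
verify_sha256() {
  actual=$(sha256_of "$1")
  if [ "$actual" = "$2" ]; then
    echo "OK  $1"
  else
    echo "MISMATCH  $1 (got ${actual:-nothing})" >&2
    return 1
  fi
}
```

For example, `verify_sha256 model-package.json b26056cb7558565a16ce21ea4155a9d893779340ca2bcd9be2c9342f34b5dc4e` checks the manifest against the digest in the Package Variant table. The per-layer digests are recorded in model-package.json rather than in this card.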
