Mesh LLM

Kimi-K2.5-UD-Q4_K_XL

Distributed GGUF inference package for Mesh LLM

Website GitHub Discord

GGUF layer package for running Kimi-K2.5-UD-Q4_K_XL across a local Mesh LLM cluster.

This package is derived from unsloth/Kimi-K2.5-GGUF and keeps the original GGUF distribution split into per-layer artifacts for distributed inference.

Highlights

| Run locally | Pool multiple machines | OpenAI-compatible | Package variant |
| --- | --- | --- | --- |
| Private inference on your hardware | Split layers across peers | Serve /v1/chat/completions locally | UD-Q4_K_XL layer package |

Model Overview

| Property | Value |
| --- | --- |
| Source model | unsloth/Kimi-K2.5-GGUF |
| Model id | unsloth/Kimi-K2.5-GGUF:UD-Q4_K_XL |
| Family | Kimi |
| Parameter scale | not recorded |
| Quantization | UD-Q4_K_XL |
| Layer count | 61 |
| Activation width | 7168 |
| Package size | 579.7 GB |
| Source file | UD-Q4_K_XL/Kimi-K2.5-UD-Q4_K_XL-00001-of-00013.gguf |
| Package repo | meshllm/Kimi-K2.5-UD-Q4_K_XL-layers |

Recommended Use

  • Local and private inference with Mesh LLM.
  • Multi-machine serving when the full GGUF is too large for one host.
  • OpenAI-compatible chat/completions workflows through Mesh LLM's local API.
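
For a rough sense of how many machines a multi-machine setup might need, the layer count and layer artifact size reported in this card can be turned into a back-of-the-envelope estimate. This is only a sketch and not Mesh LLM's actual placement logic: it assumes all 61 layers are equal in size (they may not be), it ignores KV cache, the shared artifacts, and runtime overhead, and `layers_that_fit` and the host memory figures are hypothetical.

```shell
#!/usr/bin/env bash
# Rough capacity estimate. NOT Mesh LLM's placement algorithm.
# Assumption: every layer has the same size; real layers may differ.
LAYERS_GB=578.2   # total size of layers/layer-*.gguf reported in this card
LAYER_COUNT=61    # layer count reported in this card

# layers_that_fit MEM_GB
# Prints how many whole layers fit in MEM_GB, capped at LAYER_COUNT.
layers_that_fit() {
  awk -v mem="$1" -v total="$LAYERS_GB" -v n="$LAYER_COUNT" 'BEGIN {
    per = total / n            # average size of one layer in GB
    k = int(mem / per)         # whole layers that fit
    if (k > n) k = n
    print k
  }'
}

# Hypothetical hosts with 192, 256 and 512 GB of free memory.
for mem in 192 256 512; do
  echo "${mem} GB free -> about $(layers_that_fit "$mem") layers"
done
```

Treat the result as an upper bound on layers per host; real deployments need headroom beyond the weights themselves.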

For upstream architecture details, chat template guidance, sampling recommendations, license terms, and benchmark notes, see the source model card: unsloth/Kimi-K2.5-GGUF.

Quickstart

# Run this on each machine that should contribute memory/compute.
mesh-llm serve --model "meshllm/Kimi-K2.5-UD-Q4_K_XL-layers" --split
# Check the mesh and discover the OpenAI-compatible model name.
curl -s http://localhost:3131/api/status
curl -s http://localhost:3131/v1/models
# Send an OpenAI-compatible chat request.
curl -s http://localhost:3131/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "unsloth/Kimi-K2.5-GGUF:UD-Q4_K_XL",
    "messages": [{"role": "user", "content": "Write a tiny hello-world function in Rust."}],
    "max_tokens": 128
  }'
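
When the quickstart is scripted, it can help to wait until the local API answers before sending the first request. The loop below is a minimal sketch, not a Mesh LLM feature: `wait_for_http` is a hypothetical helper, and it only checks that the URL returns a successful HTTP response. It does not confirm that the model is fully loaded, because this card does not describe the contents of /api/status.

```shell
#!/usr/bin/env bash
# wait_for_http URL [TRIES] [DELAY_SECONDS]
# Hypothetical helper: poll URL until curl gets a successful response.
# Returns 0 on success, 1 if every attempt fails.
wait_for_http() {
  url="$1"
  tries="${2:-30}"
  delay="${3:-2}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f: treat HTTP errors as failures; --max-time: bound each attempt.
    if curl -fsS --max-time 5 "$url" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (uses the status URL from the quickstart above):
# wait_for_http http://localhost:3131/api/status 60 5 && echo "API is answering"
```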

Package Variant

| Property | Value |
| --- | --- |
| Format | layer-package |
| Canonical source ref | unsloth/Kimi-K2.5-GGUF@ef35f0cc9d9ac0781e716e0fd64214626457de85/UD-Q4_K_XL/Kimi-K2.5-UD-Q4_K_XL-00001-of-00013.gguf |
| Source revision | ef35f0cc9d9ac0781e716e0fd64214626457de85 |
| Source SHA-256 | 848fffb3ac4b5da143119ee75cb8aaf58dfb48b162b4944e2dd382462684b111 |
| Skippy ABI | 0.1.22 |
| Package manifest SHA-256 | 0b37305d9dba8dff1c54b39bf4f53a9d0d4b0a2a91682045eccd1ab8c6d38f6b |

What Is Included

| Artifact | Path | Contents | SHA-256 |
| --- | --- | --- | --- |
| Manifest | model-package.json | Package schema, source identity, checksums | 0b37305d9dba8dff1c54b39bf4f53a9d0d4b0a2a91682045eccd1ab8c6d38f6b |
| Metadata | shared/metadata.gguf | 0 tensors, 6.6 MB | 9ea05bdc559d2618a45053b59cf6ca843ca5de46455a8ee2217587eb93949fca |
| Embeddings | shared/embeddings.gguf | 1 tensor, 636.6 MB | fb4d8cabfcb926ed1e8482b6e15b6a7181a5217dc2f596319a3fe85f2f2b7e0b |
| Output head | shared/output.gguf | 2 tensors, 925.4 MB | 4b1e609106802f3afef0ae37c76af4ba856d9a747d48a7459970f3507e4a0544 |
| Transformer layers | layers/layer-*.gguf | 61 layer artifacts, 1093 tensors, 578.2 GB | see model-package.json |
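
Since each shared artifact is listed with a SHA-256 digest, a downloaded copy can be checked against it before use. The helper below is a minimal sketch and not part of the Mesh LLM tooling: `verify_sha256` is a hypothetical name, and it assumes either `sha256sum` (GNU coreutils) or `shasum` is available on the PATH.

```shell
#!/usr/bin/env bash
# verify_sha256 FILE EXPECTED_HEX
# Hypothetical helper (not part of Mesh LLM): compare a file's SHA-256
# digest with an expected lowercase hex string.
verify_sha256() {
  file="$1"
  expected="$2"
  if command -v sha256sum >/dev/null 2>&1; then
    actual="$(sha256sum "$file" | awk '{print $1}')"
  else
    # Fallback for systems that ship shasum instead (for example macOS).
    actual="$(shasum -a 256 "$file" | awk '{print $1}')"
  fi
  if [ "$actual" = "$expected" ]; then
    echo "OK  $file"
  else
    echo "MISMATCH  $file (got $actual)" >&2
    return 1
  fi
}

# Example: check the output head against the digest listed in this card.
# verify_sha256 shared/output.gguf 4b1e609106802f3afef0ae37c76af4ba856d9a747d48a7459970f3507e4a0544
```

Digests for the individual layer files are recorded in model-package.json rather than in this card, so checking those would require reading the manifest first.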

Validation

Generated by the Mesh LLM HF Jobs splitter from mesh-llm ref main. Each artifact is checksummed as it is written, uploaded to this repository, and removed from the job workspace before the next artifact is produced.

skippy-model-package write-package "/source/UD-Q4_K_XL/Kimi-K2.5-UD-Q4_K_XL-00001-of-00013.gguf" --out-dir "/tmp/meshllm-layer-job-meshllm_Kimi-K2.5-UD-Q4_K_XL-layers-197/package"
