MLX GGUF quantized-tensor out-of-bounds read (proof of concept)

This repository hosts a crafted GGUF model file that triggers an out-of-bounds read in Apple MLX (mlx Python package) when the file is loaded with the standard mlx.core.load() API. The repo is gated; it is a security proof of concept, not a usable model.

Summary

mlx.core.load("x.gguf") parses GGUF tensors through MLX's C++ loader. For quantized tensor types (Q8_0, Q4_0, Q4_1) it calls gguf_load_quantized, which dequantizes the data in extract_q8_0_data / extract_q4_0_data / extract_q4_1_data (mlx/io/gguf_quants.cpp). Those loops iterate over the block count derived from the tensor's file-declared shape and read bytes_per_block (34 for Q8_0) from the memory-mapped file for each block, with no check that the declared shape's data actually fits in the file. A GGUF whose declared quantized dimension is large while the real tensor-data section is tiny makes the loop read far past the end of the mapping.
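The arithmetic behind that over-read can be sketched as follows. This is an illustration only, not MLX's code: the function name is hypothetical, the 32-elements-per-block figure and the Q4_0/Q4_1 block sizes are assumptions taken from the usual ggml block layouts, and only the 34-byte Q8_0 block size comes from the description above.

```python
import math

# (elements per block, bytes per block) for each quantized type.
# Q8_0 = 34 bytes is stated in the summary; the other values are assumed
# from the common ggml block layouts and should be checked against the source.
BLOCK_LAYOUT = {
    "Q4_0": (32, 18),
    "Q4_1": (32, 20),
    "Q8_0": (32, 34),
}


def bytes_read_by_extractor(shape, qtype):
    """Return how many bytes a per-block loop would read for a declared shape.

    Hypothetical helper: it mirrors the reasoning in the text (block count
    derived from the declared shape, one fixed-size read per block).
    """
    elems_per_block, bytes_per_block = BLOCK_LAYOUT[qtype]
    n_elements = math.prod(shape)
    n_blocks = n_elements // elems_per_block
    return n_blocks * bytes_per_block


# A single well-formed Q8_0 block versus the over-declared dimension.
print(bytes_read_by_extractor([32], "Q8_0"))
print(bytes_read_by_extractor([32_000_000], "Q8_0"))
```

Comparing the second figure with the number of tensor-data bytes actually present in the file gives the size of the out-of-bounds region.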

Affected

mlx 0.31.2 (current latest on PyPI) and current main.
Verified on Linux x86_64 (mlx[cpu]), Python 3.13.

Files

evil.gguf - one Q8_0 tensor declaring dim[0] = 32,000,000 (1,000,000 blocks of 34 bytes, so 34,000,000 bytes of expected data) but carrying only 34 data bytes.
baseline.gguf - identical structure with a well-formed single block; loads fine (control).
verify.py - rebuilds both files and loads each in a child process, showing the differential.

Reproduce

pip install "mlx[cpu]"
python verify.py

Expected:

  load baseline.gguf -> exit 0
  load evil.gguf     -> SIGSEGV
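One way to observe that differential without crashing the harness itself is to load each file in a separate interpreter and inspect the child's return code. The sketch below is not the repository's verify.py; the helper names are hypothetical, and it assumes a POSIX system where a negative return code from subprocess means the child was killed by a signal.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Format a subprocess return code as 'exit N' or a signal name."""
    if returncode < 0:
        # On POSIX, -N means the child was terminated by signal N.
        try:
            return signal.Signals(-returncode).name
        except ValueError:
            return f"signal {-returncode}"
    return f"exit {returncode}"


def run_child(code):
    """Run a Python snippet in a fresh interpreter; return its return code."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode


def load_in_child(path):
    """Load a GGUF file with mlx.core.load() in an isolated child process."""
    return run_child(f"import mlx.core as mx; mx.load({path!r})")


def main():
    # File names as listed in the Files section.
    for name in ("baseline.gguf", "evil.gguf"):
        print(f"load {name} -> {describe_exit(load_in_child(name))}")
```

Calling main() from the directory containing both files prints one line per file in the format shown above.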

Precise attribution (valgrind):

valgrind python -c "import mlx.core as mx; mx.load('evil.gguf')"
# Invalid read of size 16 / of size 2
#   at mlx::core::extract_q8_0_data(...)
#   by mlx::core::gguf_load_quantized(...)
#   by mlx::core::load_arrays(...)
#   by mlx::core::load_gguf(...)

Impact

Loading an untrusted .gguf with mlx.core.load() reads out-of-bounds heap/mmap memory into the dequantized output arrays. A large declared dimension walks off the mapping and crashes the process (denial of service at model-load time); a smaller over-declared dimension reads adjacent heap bytes into caller-visible arrays (information disclosure). No flags or non-default options are required.

Fix

In gguf_load_quantized (or gguf_get_tensor in the vendored gguflib), validate before the extractor loops run that the byte size implied by the tensor's declared element/block count fits between the tensor's data offset and the end of the file.
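The shape of such a check can be sketched in Python. It is a model of the logic only, not a patch for MLX or gguflib: the function and parameter names are hypothetical, and the block sizes other than Q8_0's 34 bytes are assumed from the usual ggml layouts.

```python
import math

ELEMS_PER_BLOCK = 32  # assumed block width for Q4_0, Q4_1 and Q8_0
BYTES_PER_BLOCK = {"Q4_0": 18, "Q4_1": 20, "Q8_0": 34}  # Q4 sizes are assumptions


def check_tensor_fits(shape, qtype, data_offset, file_size):
    """Raise ValueError unless the declared tensor data lies inside the file.

    Returns the number of bytes the tensor occupies when the check passes.
    """
    n_elements = math.prod(shape)
    if n_elements % ELEMS_PER_BLOCK != 0:
        raise ValueError("element count is not a whole number of blocks")
    needed = (n_elements // ELEMS_PER_BLOCK) * BYTES_PER_BLOCK[qtype]
    # Compare against the remaining bytes rather than computing
    # data_offset + needed; in C++ the sum could overflow a 64-bit integer.
    if data_offset > file_size or needed > file_size - data_offset:
        raise ValueError(
            f"tensor needs {needed} bytes at offset {data_offset}, "
            f"but the file is only {file_size} bytes"
        )
    return needed


# Example with made-up offsets: 34 data bytes starting at offset 96.
print(check_tensor_fits([32], "Q8_0", data_offset=96, file_size=130))
try:
    check_tensor_fits([32_000_000], "Q8_0", data_offset=96, file_size=130)
except ValueError as err:
    print("rejected:", err)
```

A C++ version would also need to guard the multiplication of block count by block size against overflow, which Python's arbitrary-precision integers hide here.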
