MLX GGUF quantized-tensor out-of-bounds read (proof of concept)

This repository hosts a crafted GGUF model file that triggers an out-of-bounds read in Apple MLX (mlx Python package) when the file is loaded with the standard mlx.core.load() API. The repo is gated; it is a security proof of concept, not a usable model.

Summary

mlx.core.load("x.gguf") parses GGUF tensors through MLX's C++ loader. For quantized tensor types (Q8_0, Q4_0, Q4_1) it calls gguf_load_quantized, which dequantizes the data in extract_q8_0_data / extract_q4_0_data / extract_q4_1_data (mlx/io/gguf_quants.cpp). Each extractor loops over a block count derived from the tensor's file-declared shape and, for every block, reads bytes_per_block bytes (34 for Q8_0) from the memory-mapped file. Nothing checks that the data implied by the declared shape actually fits in the file, so a GGUF whose declared quantized dimension is large while its real tensor-data section is tiny makes the loop read far past the end of the mapping.
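
The size mismatch can be made concrete with a little arithmetic. The sketch below is illustrative Python, not MLX code. The helper name required_bytes is hypothetical, and the block layouts are the standard GGML ones (32 weights per block); only the 34-byte Q8_0 figure comes from this README. It assumes a one-dimensional tensor of 32,000,000 elements, matching the evil.gguf description.

```python
# Illustrative only: not part of MLX. Block layouts follow the standard
# GGML quantization formats, which pack 32 weights into each block.
WEIGHTS_PER_BLOCK = 32
BYTES_PER_BLOCK = {
    "Q8_0": 34,  # 2-byte fp16 scale + 32 int8 weights
    "Q4_0": 18,  # 2-byte fp16 scale + 16 bytes of packed 4-bit weights
    "Q4_1": 20,  # 2-byte fp16 scale + 2-byte fp16 min + 16 packed bytes
}


def required_bytes(num_elements: int, qtype: str) -> int:
    """Bytes an extractor loop would read for a tensor of this declared size."""
    num_blocks = num_elements // WEIGHTS_PER_BLOCK
    return num_blocks * BYTES_PER_BLOCK[qtype]


declared = required_bytes(32_000_000, "Q8_0")
print(declared)       # 34000000 bytes implied by the declared shape
print(declared - 34)  # 33999966 bytes beyond the 34 actually present
```

Under these assumptions the declared shape implies 1,000,000 blocks, so the loop tries to read about 34 MB from a data section that holds a single 34-byte block.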

Affected

mlx 0.31.2 (current latest on PyPI) and current main.
Verified on Linux x86_64 (mlx[cpu]), Python 3.13.

Files

evil.gguf - one Q8_0 tensor declaring dim[0] = 32,000,000 but carrying only 34 data bytes.
baseline.gguf - identical structure with a well-formed single block; loads fine (control).
verify.py - rebuilds both files and loads each in a child process, showing the differential.

Reproduce

pip install "mlx[cpu]"
python verify.py

Expected:

  load baseline.gguf -> exit 0
  load evil.gguf     -> SIGSEGV
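
The reason for loading each file in a child process is that a crash then surfaces as a return code instead of taking down the harness. The following is a minimal sketch of that pattern, not the contents of verify.py; the helper names describe and load_in_child are hypothetical, and it relies on the POSIX convention that subprocess reports death by signal as a negative return code.

```python
import signal
import subprocess
import sys


def describe(returncode: int) -> str:
    """Render a child's return code in the style of the expected output."""
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by
        # the signal with that number.
        return signal.Signals(-returncode).name
    return f"exit {returncode}"


def load_in_child(path: str) -> str:
    """Load one GGUF file in a fresh interpreter and describe the outcome."""
    # The path is passed as an argument, so inside the child sys.argv[1]
    # holds it (sys.argv[0] is "-c").
    code = "import sys, mlx.core as mx; mx.load(sys.argv[1])"
    proc = subprocess.run([sys.executable, "-c", code, path])
    return describe(proc.returncode)
```

Calling load_in_child on baseline.gguf and then on evil.gguf gives the two result strings to compare; the parent process survives either way.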

Precise attribution (valgrind):

valgrind python -c "import mlx.core as mx; mx.load('evil.gguf')"
# Invalid read of size 16 / of size 2
#   at mlx::core::extract_q8_0_data(...)
#   by mlx::core::gguf_load_quantized(...)
#   by mlx::core::load_arrays(...)
#   by mlx::core::load_gguf(...)

Impact

Loading an untrusted .gguf with mlx.core.load() reads out-of-bounds heap/mmap memory into the dequantized output arrays. The outcome depends on how far the declared size overshoots the real data: a large declared dimension walks off the mapping and crashes the process (denial of service at model-load time), while a smaller over-declared dimension reads adjacent heap bytes into caller-visible arrays (information disclosure). No flags or non-default options are required.

Fix

In gguf_load_quantized (or gguf_get_tensor in the vendored gguflib), validate the tensor before the extractor loops run: the byte size implied by its declared element/block count must fit within offset .. file_size.
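
The shape of that check can be sketched as follows. This is illustrative Python rather than the C++ patch itself; the function name check_quantized_tensor and its parameters are hypothetical, and it assumes 32 weights per block as in the standard GGML formats.

```python
def check_quantized_tensor(offset: int, num_elements: int,
                           bytes_per_block: int, file_size: int,
                           weights_per_block: int = 32) -> None:
    """Raise ValueError unless the declared tensor data lies inside the file."""
    if num_elements < 0 or num_elements % weights_per_block != 0:
        raise ValueError("element count is not a whole number of blocks")
    if offset < 0 or offset > file_size:
        raise ValueError("tensor offset lies outside the file")
    needed = (num_elements // weights_per_block) * bytes_per_block
    available = file_size - offset
    # Compare against the remaining bytes instead of computing
    # offset + needed. In C++ this form avoids overflow in the addition;
    # the multiplication above would need its own overflow check there.
    if needed > available:
        raise ValueError(
            f"tensor needs {needed} bytes, only {available} available")


# Hypothetical layout: tensor data starts at byte 128 of a 162-byte file.
check_quantized_tensor(128, 32, 34, 162)  # one well-formed block: passes
try:
    check_quantized_tensor(128, 32_000_000, 34, 162)
except ValueError as err:
    print(err)  # tensor needs 34000000 bytes, only 34 available
```

Rejecting the tensor at this point means a malformed file fails with an ordinary load error before any extractor touches the mapping.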