TheMrEvil
/

mlx-gguf-q8-oob-read-poc

Upload README.md with huggingface_hub

f6009b3 verified 19 days ago

	# MLX GGUF quantized-tensor out-of-bounds read (proof of concept)

	This repository hosts a crafted GGUF model file that triggers an out-of-bounds read in
	Apple MLX (`mlx` Python package) when the file is loaded with the standard
	`mlx.core.load()` API. The repo is gated; it is a security proof of concept, not a usable model.

	## Summary

	`mlx.core.load("x.gguf")` parses GGUF tensors through MLX's C++ loader. For quantized
	tensor types (`Q8_0`, `Q4_0`, `Q4_1`) it calls `gguf_load_quantized`, which dequantizes the
	data in `extract_q8_0_data` / `extract_q4_0_data` / `extract_q4_1_data`
	(`mlx/io/gguf_quants.cpp`). Each extractor loops over a block count derived from the tensor's
	file-declared shape and reads `bytes_per_block` bytes (34 for Q8_0) from the memory-mapped file for
	each block, without checking that the declared shape's data actually fits in the file. A GGUF that
	declares a large quantized dimension but carries only a tiny tensor-data section makes the loop
	read far past the end of the mapping.
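
The size mismatch can be made concrete with a little arithmetic. The sketch below is illustrative only and is not code from MLX: the helper names are hypothetical, and it assumes the standard Q8_0 layout of a 2-byte fp16 scale followed by 32 int8 weights per block (which gives the 34 bytes quoted above). It uses the `evil.gguf` numbers listed under Files.

```python
# Q8_0 layout (assumed here): 2-byte fp16 scale + 32 int8 weights = 34 bytes
# per block of 32 elements.
Q8_0_BLOCK_ELEMS = 32
Q8_0_BLOCK_BYTES = 34


def declared_q8_0_bytes(num_elements: int) -> int:
    """Bytes an unchecked loader would read for a Q8_0 tensor of this size."""
    if num_elements % Q8_0_BLOCK_ELEMS != 0:
        raise ValueError("Q8_0 element count must be a multiple of 32")
    return (num_elements // Q8_0_BLOCK_ELEMS) * Q8_0_BLOCK_BYTES


def overread_bytes(num_elements: int, bytes_present: int) -> int:
    """Bytes read past the real tensor data (hypothetical helper)."""
    return max(0, declared_q8_0_bytes(num_elements) - bytes_present)


if __name__ == "__main__":
    # evil.gguf: dim[0] = 32,000,000 declared, 34 data bytes present.
    print("declared bytes:", declared_q8_0_bytes(32_000_000))
    print("bytes past the data:", overread_bytes(32_000_000, 34))
    # baseline.gguf-style tensor: one well-formed block, nothing over-read.
    print("baseline over-read:", overread_bytes(32, 34))
```

With one million declared blocks and a single block on disk, almost the entire read lands outside the tensor data.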

	## Affected

	* `mlx` 0.31.2 (the latest release on PyPI at the time of writing) and current `main`.
	* Verified on Linux x86_64 (`mlx[cpu]`), Python 3.13.

	## Files

	* `evil.gguf` - one `Q8_0` tensor declaring `dim[0] = 32,000,000` but carrying only 34 data bytes.
	* `baseline.gguf` - identical structure with a well-formed single block; loads fine (control).
	* `verify.py` - rebuilds both files and loads each in a child process, showing the differential.

	## Reproduce

	```
	pip install "mlx[cpu]"
	python verify.py
	```

	Expected:

	```
	load baseline.gguf -> exit 0
	load evil.gguf -> SIGSEGV
	```
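
The differential above depends on loading each file in a separate process, so that a crash in the loader does not take down the harness. The following is a minimal sketch of that pattern, not the contents of `verify.py`: the function names are hypothetical, and it assumes a POSIX system, where `subprocess` reports a child killed by a signal as a negative return code.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Render a child's return code in the style of the expected output."""
    if returncode < 0:
        # POSIX: a negative return code means the child died from that signal.
        try:
            return signal.Signals(-returncode).name
        except ValueError:
            return f"signal {-returncode}"
    return f"exit {returncode}"


def run_in_child(snippet: str) -> str:
    """Run a Python snippet in a fresh interpreter and describe how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return describe_exit(proc.returncode)


def load_in_child(path: str) -> str:
    """Load a GGUF file with mlx.core.load() in an isolated child process."""
    return run_in_child(f"import mlx.core as mx; mx.load({path!r})")


if __name__ == "__main__":
    for name in ("baseline.gguf", "evil.gguf"):
        print(f"load {name} -> {load_in_child(name)}")
```

Isolating the load this way is also a reasonable stopgap for anyone who must open untrusted `.gguf` files before a fix is available, although it only contains the crash and does not address the information-disclosure case.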

	To attribute the fault precisely, run the load under valgrind:

	```
	valgrind python -c "import mlx.core as mx; mx.load('evil.gguf')"
	# Invalid read of size 16 / of size 2
	# at mlx::core::extract_q8_0_data(...)
	# by mlx::core::gguf_load_quantized(...)
	# by mlx::core::load_arrays(...)
	# by mlx::core::load_gguf(...)
	```

	## Impact

	Loading an untrusted `.gguf` with `mlx.core.load()` reads out-of-bounds heap/mmap memory into the
	dequantized output arrays. A large declared dimension walks off the mapping and crashes the process
	(denial of service at model-load time); a smaller over-declared dimension reads adjacent heap bytes
	into caller-visible arrays (information disclosure). No flags or non-default options are required.

	## Fix

	In `gguf_load_quantized` (or `gguf_get_tensor` in the vendored gguflib), validate that the tensor's
	declared element/block count implies a byte size that fits within `offset .. file_size` before the
	extractor loops run.
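
The check described above amounts to a few lines of arithmetic. The sketch below models it in Python for readability; it is not a patch for MLX or gguflib, the function and parameter names are hypothetical, and a C++ implementation would additionally need overflow-checked multiplication, which Python's arbitrary-precision integers make unnecessary here. It validates only the total element count, not per-dimension alignment.

```python
def quantized_tensor_fits(shape, data_offset, file_size,
                          block_elems=32, block_bytes=34):
    """Return True if the declared tensor data lies within the file.

    shape        -- dimensions as declared in the GGUF tensor info
    data_offset  -- absolute byte offset of the tensor data in the file
    file_size    -- total size of the file (or mapping) in bytes
    block_elems  -- elements per quantized block (32 for Q8_0)
    block_bytes  -- bytes per quantized block (34 for Q8_0)
    """
    num_elements = 1
    for dim in shape:
        if dim < 0:
            return False
        num_elements *= dim

    # A quantized tensor must consist of whole blocks.
    if num_elements % block_elems != 0:
        return False

    # The data must start inside the file...
    if data_offset < 0 or data_offset > file_size:
        return False

    # ...and every declared block must end inside it.
    needed = (num_elements // block_elems) * block_bytes
    return needed <= file_size - data_offset


if __name__ == "__main__":
    # One well-formed block at offset 100 in a 134-byte file: accepted.
    print(quantized_tensor_fits([32], 100, 134))
    # Same file, but the shape declares 32,000,000 elements: rejected.
    print(quantized_tensor_fits([32_000_000], 100, 134))
```

Rejecting the tensor before any extractor runs addresses both the crash and the smaller over-read, because no block is touched unless all of them are known to be in bounds.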