# MLX GGUF quantized-tensor out-of-bounds read (proof of concept)

This repository hosts a crafted GGUF model file that triggers an out-of-bounds read in
**Apple MLX** (`mlx` Python package) when the file is loaded with the standard
`mlx.core.load()` API. The repo is gated; it is a security proof of concept, not a usable model.

## Summary

`mlx.core.load("x.gguf")` parses GGUF tensors through MLX's C++ loader. For quantized
tensor types (`Q8_0`, `Q4_0`, `Q4_1`) it calls `gguf_load_quantized`, which dequantizes the
data in `extract_q8_0_data` / `extract_q4_0_data` / `extract_q4_1_data`
(`mlx/io/gguf_quants.cpp`). The loops in those functions iterate over a block count derived from
the tensor's **file-declared shape** and read `bytes_per_block` bytes (34 for `Q8_0`) from the
memory-mapped file for each block, without checking that the data implied by the declared shape
actually fits in the file. A GGUF file whose declared quantized dimension is large while its real
tensor-data section is tiny therefore makes the loop read far past the end of the mapping.
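
To make the size mismatch concrete, here is a minimal arithmetic sketch. It is not code from MLX or from this repository. It assumes the usual GGML `Q8_0` layout of 32 elements per 34-byte block (the 32 is an assumption taken from the GGML format, not stated above), and it uses the 32,000,000-element dimension and 34 data bytes of the crafted file; the function name is made up for illustration.

```python
# Illustrative arithmetic only; not code from MLX or from this repository.
Q8_0_BLOCK_ELEMS = 32  # elements per Q8_0 block (assumed from the GGML layout)
Q8_0_BLOCK_BYTES = 34  # bytes_per_block for Q8_0, as quoted in the summary


def q8_0_bytes_needed(n_elements: int) -> int:
    """Bytes a loop reads when it walks every Q8_0 block of n_elements."""
    n_blocks = n_elements // Q8_0_BLOCK_ELEMS
    return n_blocks * Q8_0_BLOCK_BYTES


declared = q8_0_bytes_needed(32_000_000)  # size implied by the declared shape
present = 34                              # tensor-data bytes really in the file
print(declared, present, declared - present)
# 34000000 34 33999966
```

Under these assumptions the declared shape implies about 34 MB of block data while the file carries a single block, so nearly all reads land outside the tensor data.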

## Affected

* `mlx` 0.31.2 (the latest release on PyPI at the time of writing) and current `main`.
* Verified on Linux x86_64 (`mlx[cpu]`), Python 3.13.

## Files

* `evil.gguf` - one `Q8_0` tensor declaring `dim[0] = 32,000,000` but carrying only 34 data bytes.
* `baseline.gguf` - identical structure with a well-formed single block; loads fine (control).
* `verify.py` - rebuilds both files and loads each in a child process, showing the differential.

## Reproduce

```
pip install "mlx[cpu]"
python verify.py
```

Expected:

```
load baseline.gguf -> exit 0
load evil.gguf -> SIGSEGV
```
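
The child-process differential can be sketched roughly as follows. This is not the repository's `verify.py`; it is a minimal sketch of the general technique, assuming a POSIX system where a child killed by a signal is reported through a negative `returncode`. The helper names `describe_exit`, `run_child`, and `load_in_child` are hypothetical.

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Turn a subprocess return code into 'exit N' or a signal name (POSIX)."""
    if returncode < 0:
        try:
            return signal.Signals(-returncode).name
        except ValueError:
            return f"signal {-returncode}"
    return f"exit {returncode}"


def run_child(code: str) -> str:
    """Run a Python snippet in a fresh interpreter and describe how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return describe_exit(proc.returncode)


def load_in_child(path: str) -> str:
    """Load a GGUF file with mlx.core.load() in a child, so a crash stays isolated."""
    return run_child(f"import mlx.core as mx; mx.load({path!r})")


if __name__ == "__main__":
    for name in ("baseline.gguf", "evil.gguf"):
        print(f"load {name} -> {load_in_child(name)}")
```

Running the load in a separate interpreter keeps the harness alive when the loader crashes, which is what makes the control and the crafted file directly comparable.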

Precise attribution (valgrind):

```
valgrind python -c "import mlx.core as mx; mx.load('evil.gguf')"
# Invalid read of size 16 / of size 2
#   at mlx::core::extract_q8_0_data(...)
#   by mlx::core::gguf_load_quantized(...)
#   by mlx::core::load_arrays(...)
#   by mlx::core::load_gguf(...)
```

## Impact

Loading an untrusted `.gguf` with `mlx.core.load()` reads out-of-bounds heap/mmap memory into the
dequantized output arrays. The outcome depends on how far the declared dimension overshoots the
real data:

* A large declared dimension walks off the mapping and crashes the process (denial of service at
  model-load time).
* A smaller over-declared dimension reads adjacent heap bytes into caller-visible arrays
  (information disclosure).

No flags or non-default options are required.

## Fix

In `gguf_load_quantized` (or `gguf_get_tensor` in the vendored gguflib), validate that the tensor's
declared element/block count implies a byte size that fits within `offset .. file_size`, and reject
the file if it does not, before the extractor loops run.
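
The shape of such a check can be sketched in Python for illustration. The real fix belongs in the C++ loader, and nothing below is MLX or gguflib code. The helper `tensor_fits` is hypothetical, the block sizes are assumed from the standard GGML layouts (32 elements per block; 18, 20, and 34 bytes for `Q4_0`, `Q4_1`, and `Q8_0`), `offset` is assumed to be the absolute byte offset of the tensor data, and the offsets in the usage lines are made-up values.

```python
import math

# Hypothetical helper for illustration; the real check belongs in the C++ loader.
BLOCK_ELEMS = 32  # elements per block for Q4_0, Q4_1 and Q8_0 (assumed GGML layout)
BYTES_PER_BLOCK = {"Q4_0": 18, "Q4_1": 20, "Q8_0": 34}


def tensor_fits(qtype: str, shape: tuple[int, ...], offset: int, file_size: int) -> bool:
    """Return True if the declared tensor lies entirely inside the file.

    `offset` is assumed to be the absolute byte offset of the tensor data.
    """
    if not shape or any(d <= 0 for d in shape):
        return False
    n_elements = math.prod(shape)
    if n_elements % BLOCK_ELEMS != 0:
        return False  # a quantized tensor must be a whole number of blocks
    needed = (n_elements // BLOCK_ELEMS) * BYTES_PER_BLOCK[qtype]
    if offset < 0 or offset > file_size:
        return False
    # Compare against the remaining bytes instead of computing offset + needed.
    return needed <= file_size - offset


print(tensor_fits("Q8_0", (32,), 96, 130))          # True: one 34-byte block fits
print(tensor_fits("Q8_0", (32_000_000,), 96, 130))  # False: ~34 MB declared, 34 bytes present
```

Python integers do not overflow, so this sketch glosses over one point: a C++ version would also need to guard the element-count and byte-size multiplications against overflow before comparing against the remaining file size.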