locateanything-3b-mxfp4-mlx

MLX quantization of nvidia/LocateAnything-3B for Apple Silicon.

Variant: Block float MX FP4
Disk size: 2401 MB
Quantized by: sahilchachra

Note on effective bpw: mlx-vlm's quantizers only act on the language tower's linear weights. The vision encoder and embeddings stay at the source dtype (bf16), so the variant name describes the LM-tower quantization only, while the on-disk size reflects the mix of quantized LM weights and bf16 vision/embedding weights.
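As a rough illustration of this note, the effective bits per weight of the whole checkpoint can be estimated from the two disk sizes reported in the Performance table. This is a sketch under stated assumptions: the FP16 baseline is taken to store every parameter in 16 bits, and file metadata is ignored. The 4.25 figure is the nominal MXFP4 rate (4-bit elements plus one shared 8-bit scale per block of 32). The function names are illustrative and not part of mlx-vlm.

```python
def nominal_mxfp4_bpw(element_bits: int = 4, scale_bits: int = 8, block_size: int = 32) -> float:
    """Nominal bits per weight for a block-scaled format: elements plus amortized scale."""
    return element_bits + scale_bits / block_size


def effective_bpw(quant_disk_mb: float, fp16_disk_mb: float) -> float:
    """Estimate bits per weight over the whole checkpoint.

    Assumption: the baseline uses 16 bits per parameter, so the size ratio
    times 16 approximates the average bits per parameter after quantization.
    """
    return 16.0 * quant_disk_mb / fp16_disk_mb


# Disk sizes as reported on this card.
QUANT_DISK_MB = 2401
FP16_DISK_MB = 6725

nominal = nominal_mxfp4_bpw()
effective = effective_bpw(QUANT_DISK_MB, FP16_DISK_MB)

print(f"nominal MXFP4 (LM tower only): {nominal:.2f} bpw")
print(f"effective (whole checkpoint):  {effective:.2f} bpw")
```

The gap between the two numbers is what the unquantized vision encoder and embeddings contribute.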

Benchmark results

Evaluated on Apple M5 Pro with MLX. Model loaded once; performance and quality measured in a single pass.

Performance

| Metric | This model | FP16 baseline |
| --- | --- | --- |
| Decode tok/s (steady-state) | 140.25 | 48.49 |
| Prefill tok/s (steady-state) | 837.58 | 480.19 |
| Peak memory (GB) | 2.675 | 7.082 |
| Disk size (MB) | 2401 | 6725 |

Warmed, short-prompt, chat-templated, thinking disabled. Represents steady-state decode for typical chat use; long thinking traces will be slower due to KV-cache growth.
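To put the performance numbers side by side, the sketch below computes the ratio of this model to the FP16 baseline for each metric. The values are copied from this card; the dictionary keys are illustrative labels.

```python
# (this model, FP16 baseline) pairs as reported on this card.
results = {
    "decode_tok_s": (140.25, 48.49),
    "prefill_tok_s": (837.58, 480.19),
    "peak_memory_gb": (2.675, 7.082),
    "disk_size_mb": (2401, 6725),
}

# Ratio > 1 means this model's value is larger than the baseline's.
# For throughput that is a speedup; for memory and disk, < 1 is a saving.
ratios = {name: this / base for name, (this, base) in results.items()}

for name, value in ratios.items():
    print(f"{name:>15}: {value:.2f}x of baseline")
```

On these figures, decode throughput is a little under 3x the baseline and peak memory is a little under 0.4x.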

Quality

| Benchmark | This model | FP16 baseline | n |
| --- | --- | --- | --- |
| RefCOCOg (grounding, Acc@0.5) | 75.5% | 80.0% | 200 |
| RefCOCOg (grounding, Acc@0.75) | 66.5% | 71.0% | 200 |
| RefCOCOg (grounding, mean IoU) | 0.7091 | 0.7499 | 200 |
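For readers unfamiliar with the metrics, here is a minimal sketch of how Acc@threshold and mean IoU are commonly computed for grounding. It assumes boxes in (x1, y1, x2, y2) form and counts a prediction as correct when IoU is at least the threshold. The evaluation script used for this card is not shown, so its exact conventions may differ; the toy boxes are made up for illustration.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at(preds: Sequence[Box], gts: Sequence[Box], threshold: float) -> float:
    """Fraction of predictions whose IoU with the ground truth meets the threshold."""
    hits = sum(iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(preds)


def mean_iou(preds: Sequence[Box], gts: Sequence[Box]) -> float:
    """Average IoU over all prediction / ground-truth pairs."""
    return sum(iou(p, g) for p, g in zip(preds, gts)) / len(preds)


# Toy data: one exact match, one box shifted by half its width.
preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
gts = [(0, 0, 10, 10), (5, 0, 15, 10)]

acc_05 = accuracy_at(preds, gts, 0.5)
miou = mean_iou(preds, gts)
print(f"Acc@0.5 = {acc_05:.2f}, mean IoU = {miou:.4f}")
```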

Usage

Note: this base model's mlx-vlm support currently lives in a pull request and is not yet in a released mlx-vlm. Until it merges, install from the PR branch and pin compatible transformers / tokenizers:

pip install "git+https://github.com/beshkenadze/mlx-vlm@feat/locateanything-3b"
pip install "transformers<5.5" "tokenizers<0.22" torchvision

The PR branch has a known issue: the custom LocateAnythingProcessor isn't auto-registered with transformers, so both mlx_vlm.load(...) and python -m mlx_vlm.generate ... fail with an "Unrecognized processing class" error. As a workaround until the PR merges, register the classes manually before loading:

import transformers
from mlx_vlm.models.locateanything.processing_locateanything import (
    LocateAnythingProcessor,
)
from mlx_vlm.models.locateanything.image_processing_locateanything import (
    LocateAnythingImageProcessor,
)

# Workaround: expose the custom classes as attributes of the transformers
# module so that loading can find them.
transformers.LocateAnythingProcessor = LocateAnythingProcessor
transformers.LocateAnythingImageProcessor = LocateAnythingImageProcessor

from mlx_vlm import load, generate

model, processor = load("sahilchachra/locateanything-3b-mxfp4-mlx")
# The phrase to ground goes inside <ref>...</ref>.
prompt = "<image-1>\nPlease locate <ref>the remote</ref> in the image."
response = generate(model, processor, prompt=prompt,
                    image="path/to/image.jpg", max_tokens=128, verbose=True)
print(response.text)  # e.g. <box><520><160><580><392></box>

Output coordinates are normalized to the range 0..1000. To get pixel-space (x1, y1, x2, y2), divide each value by 1000, then multiply x values by the image width and y values by the image height.
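The conversion to pixel coordinates can be sketched as follows. The regular expression assumes the output looks exactly like the example in the usage snippet (`<box><x1><y1><x2><y2></box>`); the helper name `boxes_to_pixels` and the 1280x720 image size are hypothetical and chosen only for illustration.

```python
import re
from typing import List, Tuple

# Assumed output format, taken from the example on this card.
BOX_PATTERN = re.compile(r"<box><(\d+)><(\d+)><(\d+)><(\d+)></box>")


def boxes_to_pixels(
    text: str, width: int, height: int, scale: int = 1000
) -> List[Tuple[float, float, float, float]]:
    """Parse every <box>...</box> in `text` and rescale it to pixel coordinates."""
    boxes = []
    for match in BOX_PATTERN.finditer(text):
        x1, y1, x2, y2 = (int(v) for v in match.groups())
        boxes.append(
            (
                x1 * width / scale,   # x values scale with image width
                y1 * height / scale,  # y values scale with image height
                x2 * width / scale,
                y2 * height / scale,
            )
        )
    return boxes


# Hypothetical 1280x720 image with the example output from this card.
pixel_boxes = boxes_to_pixels("<box><520><160><580><392></box>", width=1280, height=720)
print(pixel_boxes)
```

In practice the width and height would come from the loaded image, for example `PIL.Image.open(path).size`, and `text` from `response.text`.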

All variants in this collection

| Model | Variant |
| --- | --- |
| sahilchachra/locateanything-3b-mxfp4-mlx | Block float MX FP4 ← this model |

Notes

  • Requires Apple Silicon (M1 or later) with MLX
  • Benchmarks run on Apple M5 Pro, 24 GB unified memory
  • License: see nvidia/LocateAnything-3B for the original model's license

Original model

See nvidia/LocateAnything-3B for full model details and intended use.
