locateanything-3b-mxfp4-mlx

MLX quantization of nvidia/LocateAnything-3B for Apple Silicon.

Variant: Block float MX FP4
Disk size: 2401 MB
Quantized by: sahilchachra

Note on effective bits per weight (bpw): mlx-vlm's quantizers act only on the language tower's linear weights. The vision encoder and embeddings stay at the source dtype (bf16). The variant name therefore describes the LM-tower quantization, while the on-disk size reflects a mix of 4-bit and bf16 weights and implies an average above 4 bpw.
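
A rough way to quantify this: MXFP4 stores each 4-bit element with one shared 8-bit scale per 32-element block, so quantized tensors cost about 4.25 bits per weight. The sketch below compares that figure with the average implied by the disk sizes in the Performance table. It assumes the FP16 checkpoint stores every weight at 16 bits and ignores file-format overhead; the helper names are illustrative and not part of mlx-vlm.

```python
# MXFP4 (OCP Microscaling): 4-bit E2M1 elements plus one shared 8-bit E8M0 scale per block.
ELEMENT_BITS = 4
SCALE_BITS = 8
BLOCK_SIZE = 32


def mxfp4_bits_per_weight() -> float:
    """Storage cost of one MXFP4-quantized weight, including its share of the block scale."""
    return ELEMENT_BITS + SCALE_BITS / BLOCK_SIZE


def implied_average_bpw(quantized_mb: float, fp16_mb: float) -> float:
    """Average bits per weight implied by disk size.

    Assumption: every weight in the FP16 checkpoint takes 16 bits and
    metadata/file overhead is negligible.
    """
    return 16.0 * quantized_mb / fp16_mb


def quantized_fraction(avg_bpw: float, quant_bpw: float, full_bpw: float = 16.0) -> float:
    """Share of weights at quant_bpw, assuming all remaining weights stay at full_bpw (bf16)."""
    return (full_bpw - avg_bpw) / (full_bpw - quant_bpw)


nominal = mxfp4_bits_per_weight()
average = implied_average_bpw(2401, 6725)  # disk sizes in MB from the Performance table
frac = quantized_fraction(average, nominal)
print(f"MXFP4 tensors: {nominal:.2f} bpw")          # 4.25
print(f"implied model average: {average:.2f} bpw")  # 5.71
print(f"share of weights quantized: {frac:.0%}")    # 88%
```

Treat the last figure as an estimate only; the card does not break down parameter counts per component.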

Benchmark results

Evaluated on Apple M5 Pro with MLX. Model loaded once; performance and quality measured in a single pass.

Performance

Metric	This model	FP16 baseline
Decode tok/s (steady-state)	140.25	48.49
Prefill tok/s (steady-state)	837.58	480.19
Peak memory (GB)	2.675	7.082
Disk size (MB)	2401	6725

Measured after warm-up, with a short chat-templated prompt and thinking disabled. These figures represent steady-state decode for typical chat use; long thinking traces will be slower because the KV cache grows.
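
For a quick comparison, the ratios implied by the Performance table can be computed directly. The values in the sketch are copied from the table; the script only does arithmetic on them and does not re-run any benchmark.

```python
# (this model, FP16 baseline) pairs copied from the Performance table.
PERF = {
    "decode tok/s": (140.25, 48.49),
    "prefill tok/s": (837.58, 480.19),
    "peak memory (GB)": (2.675, 7.082),
    "disk size (MB)": (2401, 6725),
}


def ratio_vs_baseline(metric: str) -> float:
    """Quantized value divided by the FP16 baseline value for one metric."""
    quantized, baseline = PERF[metric]
    return quantized / baseline


for name in PERF:
    print(f"{name}: {ratio_vs_baseline(name):.2f}x FP16")
# decode tok/s: 2.89x FP16
# prefill tok/s: 1.74x FP16
# peak memory (GB): 0.38x FP16
# disk size (MB): 0.36x FP16
```

In other words, decode runs at roughly 2.9 times the FP16 speed while using a little under 40% of the peak memory.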

Quality

Benchmark	This model	FP16 baseline	n
RefCOCOg (grounding, Acc@0.5)	75.5%	80.0%	200
RefCOCOg (grounding, Acc@0.75)	66.5%	71.0%	200
RefCOCOg (grounding, mean IoU)	0.7091	0.7499	200
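
The grounding metrics follow the usual definitions: intersection-over-union (IoU) between the predicted and reference boxes, Acc@t as the fraction of samples whose IoU reaches threshold t, and mean IoU averaged over all n samples. The sketch below assumes those standard definitions and counts IoU >= t as a hit. The card does not publish its evaluation script, so details such as tie handling or coordinate rounding may differ.

```python
from typing import Dict, Sequence, Tuple

# Axis-aligned box in (x1, y1, x2, y2) form, with x2 >= x1 and y2 >= y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_metrics(
    preds: Sequence[Box], refs: Sequence[Box], thresholds: Sequence[float] = (0.5, 0.75)
) -> Dict[str, float]:
    """Acc@t (share of samples with IoU >= t) for each threshold, plus mean IoU."""
    if len(preds) != len(refs) or not preds:
        raise ValueError("need the same, non-zero number of predictions and references")
    ious = [iou(p, r) for p, r in zip(preds, refs)]
    metrics = {f"Acc@{t}": sum(v >= t for v in ious) / len(ious) for t in thresholds}
    metrics["mean IoU"] = sum(ious) / len(ious)
    return metrics


# Toy example: one exact match and one box shifted by half its width.
print(grounding_metrics([(0, 0, 10, 10), (0, 0, 10, 10)], [(0, 0, 10, 10), (5, 0, 15, 10)]))
```

The toy example has IoUs of 1.0 and 1/3, giving Acc@0.5 = Acc@0.75 = 0.5 and a mean IoU of about 0.667.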

Usage

Note: mlx-vlm support for this base model currently lives in a pull request and is not yet in an mlx-vlm release. Until it merges, install from the PR branch and pin compatible versions of transformers and tokenizers:
```shell
pip install "git+https://github.com/beshkenadze/mlx-vlm@feat/locateanything-3b"
pip install "transformers<5.5" "tokenizers<0.22" torchvision
```

The PR branch has a known issue: the custom LocateAnythingProcessor is not auto-registered with transformers, so both mlx_vlm.load(...) and python -m mlx_vlm.generate ... fail with an "Unrecognized processing class" error. Until the PR merges, work around it by registering the classes manually before loading:

```python
import transformers

# Workaround for the PR branch: register the custom processor classes
# on the transformers namespace before calling mlx_vlm.load().
from mlx_vlm.models.locateanything.processing_locateanything import (
    LocateAnythingProcessor,
)
from mlx_vlm.models.locateanything.image_processing_locateanything import (
    LocateAnythingImageProcessor,
)
transformers.LocateAnythingProcessor = LocateAnythingProcessor
transformers.LocateAnythingImageProcessor = LocateAnythingImageProcessor

from mlx_vlm import load, generate

model, processor = load("sahilchachra/locateanything-3b-mxfp4-mlx")

# Prompt format used by this card: image placeholder, then the phrase to locate in <ref> tags.
prompt = "<image-1>\nPlease locate <ref>the remote</ref> in the image."
response = generate(
    model,
    processor,
    prompt=prompt,
    image="path/to/image.jpg",
    max_tokens=128,
    verbose=True,
)
print(response.text)  # e.g. <box><520><160><580><392></box>
```

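Output coordinates are normalized to the range 0 to 1000. To get pixel-space (x1, y1, x2, y2), divide each x value by 1000 and multiply by the image width, and divide each y value by 1000 and multiply by the image height.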
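
A minimal post-processing sketch, assuming the response uses exactly the <box><x1><y1><x2><y2></box> layout shown in the example output above. The regex and the helper name boxes_to_pixels are illustrative and not part of mlx-vlm; the function scales each coordinate by the image size divided by 1000.

```python
import re
from typing import List, Tuple

# Assumed format, taken from the usage example: <box><x1><y1><x2><y2></box>.
BOX_PATTERN = re.compile(r"<box><(\d+)><(\d+)><(\d+)><(\d+)></box>")


def boxes_to_pixels(
    text: str, width: int, height: int, scale: int = 1000
) -> List[Tuple[float, float, float, float]]:
    """Convert every 0..scale normalized box found in `text` to pixel (x1, y1, x2, y2)."""
    boxes = []
    for match in BOX_PATTERN.finditer(text):
        x1, y1, x2, y2 = (int(v) for v in match.groups())
        boxes.append(
            (x1 * width / scale, y1 * height / scale, x2 * width / scale, y2 * height / scale)
        )
    return boxes


# Example: the sample output from the usage snippet, on a 640x480 image.
print(boxes_to_pixels("<box><520><160><580><392></box>", 640, 480))
# [(332.8, 76.8, 371.2, 188.16)]
```

The function returns an empty list when no box is found, so callers can detect a failed localization instead of crashing on a missing match.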

All variants in this collection

Model	Variant
sahilchachra/locateanything-3b-mxfp4-mlx	Block float MX FP4 ← this model

Notes

Requires Apple Silicon (M1 or later) with MLX
Benchmarks run on Apple M5 Pro, 24 GB unified memory
License: see nvidia/LocateAnything-3B for the original model's license

Original model

See nvidia/LocateAnything-3B for full model details and intended use.

Model tree for sahilchachra/locateanything-3b-mxfp4-mlx

Qwen/Qwen2.5-3B (base model) → Qwen/Qwen2.5-3B-Instruct (finetuned) → nvidia/LocateAnything-3B (finetuned) → this model (quantized)