How to use sahilchachra/locateanything-3b-mxfp4-mlx with MLX:

```bash
# Download the model from the Hub
# (the extra is quoted so that shells such as zsh do not expand the brackets)
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir locateanything-3b-mxfp4-mlx sahilchachra/locateanything-3b-mxfp4-mlx
```
locateanything-3b-mxfp4-mlx
MLX quantization of nvidia/LocateAnything-3B for Apple Silicon.
Variant: Block float MX FP4
Disk size: 2401 MB
Quantized by: sahilchachra
Note on effective bpw (bits per weight): mlx-vlm's quantizers only act on the language tower's linear weights. The vision encoder and embeddings stay at the source dtype (bf16), so the headline variant name reflects the LM-tower quantization, while the on-disk size averages the quantized and unquantized parts of the model.
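As a rough illustration of the note above, the average bits per weight can be estimated from the two disk sizes reported in this card. This is a back-of-the-envelope sketch, not a measurement: it assumes the FP16 baseline stores every weight in 16 bits and that non-weight file overhead is negligible. The function name `effective_bpw` is made up for this example.

```python
def effective_bpw(quant_mb: float, baseline_mb: float, baseline_bits: float = 16.0) -> float:
    """Estimate average bits per weight from on-disk sizes.

    Assumption: the baseline checkpoint spends `baseline_bits` on every weight,
    so the size ratio scales directly to an average bit width.
    """
    return baseline_bits * quant_mb / baseline_mb


# Disk sizes taken from this card: 2401 MB (this model) vs 6725 MB (FP16 baseline).
bpw = effective_bpw(2401, 6725)
print(f"{bpw:.2f} bits per weight")  # prints "5.71 bits per weight"
```

Under these assumptions the average lands well above 4 bits, which is consistent with the vision encoder and embeddings staying in bf16.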
Benchmark results
Evaluated on Apple M5 Pro with MLX. Model loaded once; performance and quality measured in a single pass.
Performance
| Metric | This model | FP16 baseline |
|---|---|---|
| Decode tok/s (steady-state) | 140.25 | 48.49 |
| Prefill tok/s (steady-state) | 837.58 | 480.19 |
| Peak memory (GB) | 2.675 | 7.082 |
| Disk size (MB) | 2401 | 6725 |
Measurement conditions: warmed-up model, short prompt, chat template applied, thinking disabled. The figures represent steady-state decode for typical chat use; long thinking traces will be slower due to KV-cache growth.
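To make the comparison easier to read, the ratios between the two columns can be computed directly from the table. The sketch below only re-derives numbers already reported above; the dictionary keys and the `ratio` helper are illustrative names, not part of any tool used for the benchmark.

```python
# Figures copied from the performance table: (this model, FP16 baseline).
perf = {
    "decode_tok_s": (140.25, 48.49),
    "prefill_tok_s": (837.58, 480.19),
    "peak_memory_gb": (2.675, 7.082),
    "disk_mb": (2401, 6725),
}


def ratio(metric: str) -> float:
    """Return this model's value divided by the FP16 baseline value."""
    quant, base = perf[metric]
    return quant / base


for name in perf:
    print(f"{name}: {ratio(name):.2f}x of baseline")
```

On these figures, decode runs at roughly 2.9x and prefill at roughly 1.7x the baseline throughput, with peak memory at about 38% and disk size at about 36% of the baseline.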
Quality
| Benchmark | This model | FP16 baseline | n |
|---|---|---|---|
| RefCOCOg (grounding, Acc@0.5) | 75.5% | 80.0% | 200 |
| RefCOCOg (grounding, Acc@0.75) | 66.5% | 71.0% | 200 |
| RefCOCOg (grounding, mean IoU) | 0.7091 | 0.7499 | 200 |
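For readers unfamiliar with the grounding metrics, the sketch below shows how IoU and Acc@threshold are commonly computed for axis-aligned boxes. It assumes the usual definition (a prediction counts as correct when its IoU with the ground-truth box is at least the threshold) and is not the evaluation script used for this card; the function names and the toy boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def accuracy_at(preds, gts, threshold):
    """Fraction of predictions whose IoU with the ground truth meets the threshold."""
    hits = sum(iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(preds)


# Toy data: one exact match and one box shifted by half its width (IoU = 1/3).
preds = [(0, 0, 10, 10), (0, 0, 10, 10)]
gts = [(0, 0, 10, 10), (5, 0, 15, 10)]
print(accuracy_at(preds, gts, 0.5))  # prints 0.5
```

With n = 200, each sample accounts for 0.5 percentage points of accuracy, so small differences between the two columns correspond to only a handful of images.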
Usage
Note: this base model's mlx-vlm support currently lives in a pull request and is not yet in a released mlx-vlm. Until it merges, install from the PR branch and pin compatible transformers / tokenizers:

```bash
pip install "git+https://github.com/beshkenadze/mlx-vlm@feat/locateanything-3b"
pip install "transformers<5.5" "tokenizers<0.22" torchvision
```
The PR branch has a known issue: the custom LocateAnythingProcessor
isn't auto-registered with transformers, so both mlx_vlm.load(...) and
python -m mlx_vlm.generate ... fail with an "Unrecognized processing
class" error. As a workaround until the PR merges, register the classes
manually before loading:
```python
import transformers
from mlx_vlm.models.locateanything.processing_locateanything import (
    LocateAnythingProcessor,
)
from mlx_vlm.models.locateanything.image_processing_locateanything import (
    LocateAnythingImageProcessor,
)

# Expose the custom classes on the transformers namespace so they can be resolved by name.
transformers.LocateAnythingProcessor = LocateAnythingProcessor
transformers.LocateAnythingImageProcessor = LocateAnythingImageProcessor

from mlx_vlm import load, generate

model, processor = load("sahilchachra/locateanything-3b-mxfp4-mlx")

# Wrap the phrase to ground in <ref>...</ref>.
prompt = "<image-1>\nPlease locate <ref>the remote</ref> in the image."
response = generate(model, processor, prompt=prompt,
                    image="path/to/image.jpg", max_tokens=128, verbose=True)
print(response.text)  # e.g. <box><520><160><580><392></box>
```
Output coordinates are normalized to the range 0..1000. To get pixel-space
(x1, y1, x2, y2), divide each value by 1000, then multiply x values by the
image width and y values by the image height.
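The conversion described above can be sketched as follows. This assumes the model's output matches the example format shown in this card (`<box><x1><y1><x2><y2></box>`) with values on a 0..1000 scale; the helper names `parse_boxes` and `to_pixels` and the 640x480 image size are hypothetical and chosen only for illustration.

```python
import re

# Matches one box in the format shown above: <box><x1><y1><x2><y2></box>
BOX_RE = re.compile(r"<box><(\d+)><(\d+)><(\d+)><(\d+)></box>")


def parse_boxes(text):
    """Return every box found in the text as a tuple of normalized integers."""
    return [tuple(int(v) for v in m.groups()) for m in BOX_RE.finditer(text)]


def to_pixels(box, width, height, scale=1000):
    """Map a normalized (x1, y1, x2, y2) box to pixel coordinates."""
    x1, y1, x2, y2 = box
    return (
        x1 * width / scale,
        y1 * height / scale,
        x2 * width / scale,
        y2 * height / scale,
    )


boxes = parse_boxes("<box><520><160><580><392></box>")
for box in boxes:
    # Assumed image size of 640x480 pixels, for illustration only.
    print(box, "->", to_pixels(box, width=640, height=480))
```

In practice the width and height would come from the loaded image, for example from `Image.open(path).size` when using Pillow.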
All variants in this collection
| Model | Variant |
|---|---|
| sahilchachra/locateanything-3b-mxfp4-mlx | Block float MX FP4 ← this model |
Notes
- Requires Apple Silicon (M1 or later) with MLX
- Benchmarks run on Apple M5 Pro, 24 GB unified memory
- License: see nvidia/LocateAnything-3B for the original model's license
Original model
See nvidia/LocateAnything-3B for full model details and intended use.