How to use anerjy/step37-mlx-vlm with MLX:

```python
# Make sure mlx-vlm is installed
# pip install --upgrade mlx-vlm

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model, processor = load("anerjy/step37-mlx-vlm")
config = load_config("anerjy/step37-mlx-vlm")

# Prepare input
image = ["http://images.cocodataset.org/val2017/000000039769.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=1
)

# Generate output
output = generate(model, processor, formatted_prompt, image)
print(output)
```
anerjy committed
Commit a3971c4 · Parent(s): 88161f1
Scrub remaining privacy markers from README + language.py docstring
- README.md: replace personal "restart engine" with
generic "restart your engine process"; replace "engine settings file"
(2 places) with "your engine's model settings file"; replace
"engine log" with "<your-engine-log>"; replace
"[engine]" log prefix with generic "[engine]".
- language.py: replace "2026-05-31 REDACTED-IP fan-out reported 130 such
events" docstring data point with generic "production fan-out
workload reported 100+ such events" — no internal client IPs.
Found by second-pass audit after user flagged ongoing privacy concerns.
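
The commit message does not say how the audit was performed. As a minimal sketch of one way to catch leftover IPv4 literals in docstrings or READMEs, the helper below scans text with a regular expression and can substitute a placeholder. The function names are hypothetical, the placeholder string mirrors the `REDACTED-IP` marker quoted above, and the pattern is deliberately simple: it can also match dotted version strings such as `1.2.3.4`, so hits need human review.

```python
import re

# Four dot-separated groups of 1-3 digits; range-checked separately below.
_IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def _is_valid_ipv4(candidate: str) -> bool:
    """Return True when every octet is in the range 0-255."""
    return all(0 <= int(part) <= 255 for part in candidate.split("."))


def find_ipv4_literals(text: str) -> list[str]:
    """Hypothetical audit helper: list IPv4-looking literals found in text."""
    return [m.group() for m in _IPV4_PATTERN.finditer(text) if _is_valid_ipv4(m.group())]


def scrub_ipv4(text: str, placeholder: str = "REDACTED-IP") -> str:
    """Hypothetical scrub helper: replace valid IPv4 literals with a placeholder."""
    return _IPV4_PATTERN.sub(
        lambda m: placeholder if _is_valid_ipv4(m.group()) else m.group(),
        text,
    )


if __name__ == "__main__":
    # 192.0.2.10 is from the TEST-NET-1 documentation range.
    sample = "2026-05-31 192.0.2.10 fan-out reported 130 such events"
    print(find_ipv4_literals(sample))
    print(scrub_ipv4(sample))
```

A scan like this only flags candidates; deciding whether to replace a data point with a generic phrase, as this commit does, remains a manual step.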
- README.md +4 -4
- mlx_vlm/models/step3p7/language.py +2 -2
README.md
CHANGED
````diff
@@ -183,7 +183,7 @@ lm_head.
 
 The MTP path activates automatically when these settings are set on
 `Step-3.7-Flash-4bit` (or your equivalent model id) in
-
+your engine's model settings file:
 
 ```json
 {
@@ -195,7 +195,7 @@ The MTP path activates automatically when these settings are set on
 `mtp_enabled` and `turboquant_kv_enabled` are mutually exclusive per oMLX
 (TurboQuant patches the attention path that MTP relies on). Bench results
 above are with TurboQuant OFF. Restart oMLX after changing settings:
-
+restart your engine process. Verify activation in the log:
 
 ```
 [engine] Native MTP patch applied for ... (model_type=step3p7, active)
@@ -227,9 +227,9 @@ python scripts/rewrite_mtp_shard.py \
 Gemma 4 spec dec on MLX was a useful template for diagnosing this one.
 
 To experiment, set `mtp_enabled: True` and `turboquant_kv_enabled: False`
-(they're mutually exclusive per oMLX) in
+(they're mutually exclusive per oMLX) in your engine's model settings file
 for `Step-3.7-Flash-4bit`, restart oMLX, and check
-`grep "MTP\[" your
+`grep "MTP\[" <your-engine-log>` for accept rate. To run the
 shard rewrite yourself:
 
 ```bash
````
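
The README text in this diff states two checkable conditions: `mtp_enabled` and `turboquant_kv_enabled` must not both be on, and the engine log should contain a `Native MTP patch applied` line after a restart. The sketch below illustrates both checks. It assumes the per-model settings have already been loaded into a dictionary (the layout of the settings file is not shown in the diff), and the function names are hypothetical, not part of oMLX or mlx-vlm.

```python
from typing import Iterable, Mapping


def check_mtp_settings(model_settings: Mapping[str, object]) -> list[str]:
    """Return a list of problems with the MTP-related flags (empty if none).

    `model_settings` is assumed to be the settings mapping for one model,
    e.g. the entry for `Step-3.7-Flash-4bit`.
    """
    problems: list[str] = []
    mtp_on = bool(model_settings.get("mtp_enabled", False))
    turboquant_on = bool(model_settings.get("turboquant_kv_enabled", False))

    if mtp_on and turboquant_on:
        problems.append("mtp_enabled and turboquant_kv_enabled are mutually exclusive")
    if not mtp_on:
        problems.append("mtp_enabled is off, so the MTP path will not activate")
    return problems


def mtp_patch_logged(log_lines: Iterable[str]) -> bool:
    """Return True if any log line reports that the native MTP patch was applied."""
    return any("Native MTP patch applied" in line for line in log_lines)


if __name__ == "__main__":
    settings = {"mtp_enabled": True, "turboquant_kv_enabled": False}
    print(check_mtp_settings(settings))

    log = ["[engine] Native MTP patch applied for ... (model_type=step3p7, active)"]
    print(mtp_patch_logged(log))
```

In practice the log lines would come from reading `<your-engine-log>` after restarting the engine process; the `grep "MTP\["` command in the README remains the documented way to inspect the accept rate.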
mlx_vlm/models/step3p7/language.py
CHANGED
````diff
@@ -521,8 +521,8 @@ def _patch_omlx_emit_prefill_boundary_snapshot() -> None:
 ever written), reconstruction reports ``1/N blocks, 512 tokens``,
 walk-back sees RotatingKVCache placeholder in block 0, ``Rejecting
 cache to prevent stale sliding-window state``, request re-prefills the
-full prompt.
-single day,
+full prompt. A production fan-out workload reported 100+ such events
+in a single day, several GB of prefix-cache reuse lost. With this patch +
 ``PyramidKVCacheHandler`` + 2-tuple ``.state`` (Invariants 11-12 below),
 PKV-on cache reuse is identical to PKV-off: 97.8% cached on a same-
 prompt resend, 20 K prefill cold→warm 60 s → 2.6 s (23× speedup).
````
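
The 23× figure in the docstring follows directly from the two quoted prefill times. A short worked check, using only the numbers stated above (the function name is illustrative):

```python
def prefill_speedup(cold_seconds: float, warm_seconds: float) -> float:
    """Ratio of cold-prefill time to warm (cache-hit) prefill time."""
    if warm_seconds <= 0:
        raise ValueError("warm_seconds must be positive")
    return cold_seconds / warm_seconds


if __name__ == "__main__":
    # 60 s cold and 2.6 s warm are the values quoted in the docstring.
    print(f"{prefill_speedup(60.0, 2.6):.1f}x")  # 23.1x, which the docstring rounds to 23x
```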