Scrub remaining privacy markers from README + language.py docstring

- README.md: replace the personal "restart engine" command with the generic
  "restart your engine process"; replace "engine settings file" (2 places)
  with "your engine's model settings file"; replace "engine log" with
  "<your-engine-log>"; replace the "[engine]" log prefix with the generic
  "[engine]".
- language.py: replace the "2026-05-31 REDACTED-IP fan-out reported 130 such
  events" docstring data point with the generic "production fan-out
  workload reported 100+ such events", and "~7 GB" with "several GB", so
  the docstring carries no internal client IPs.

Found by second-pass audit after user flagged ongoing privacy concerns.
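
A second-pass audit like this one could be partly automated. The following is a minimal sketch, not the tooling actually used for this commit: the function name `find_privacy_markers` and both regular expressions are hypothetical, chosen only to flag IPv4-like tokens and ISO-style dates of the kind removed from the language.py docstring.

```python
import re

# Hypothetical patterns for illustration; a real audit would tune these.
# The IPv4 pattern is deliberately loose and will also flag dotted
# four-part version strings, so hits need manual review.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def find_privacy_markers(text):
    """Return (line_number, kind, match) for each IPv4-like or ISO-date-like token."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in (("ipv4", IPV4), ("date", ISO_DATE)):
            for m in pattern.finditer(line):
                hits.append((lineno, kind, m.group(0)))
    return hits


if __name__ == "__main__":
    # 192.0.2.10 is from a range reserved for documentation examples.
    sample = "full prompt. 2026-05-31 192.0.2.10 fan-out reported 130 such events"
    for hit in find_privacy_markers(sample):
        print(hit)
```

Running it over the text of README.md and language.py before committing would surface candidates for manual review; it cannot catch free-text identifiers such as personal script names or log paths.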

Files changed (2)

README.md +4 -4
mlx_vlm/models/step3p7/language.py +2 -2

README.md CHANGED

@@ -183,7 +183,7 @@ lm_head.
 The MTP path activates automatically when these settings are set on
 `Step-3.7-Flash-4bit` (or your equivalent model id) in
-`your engine settings file`:
 ```json
 {
@@ -195,7 +195,7 @@ The MTP path activates automatically when these settings are set on
 `mtp_enabled` and `turboquant_kv_enabled` are mutually exclusive per oMLX
 (TurboQuant patches the attention path that MTP relies on). Bench results
 above are with TurboQuant OFF. Restart oMLX after changing settings:
-`bash restart your engine`. Verify activation in the log:
 ```
 [engine] Native MTP patch applied for ... (model_type=step3p7, active)
@@ -227,9 +227,9 @@ python scripts/rewrite_mtp_shard.py \
   Gemma 4 spec dec on MLX was a useful template for diagnosing this one.
 To experiment, set `mtp_enabled: True` and `turboquant_kv_enabled: False`
-(they're mutually exclusive per oMLX) in `your engine settings file`
 for `Step-3.7-Flash-4bit`, restart oMLX, and check
-`grep "MTP\[" your engine log` for accept rate. To run the
 shard rewrite yourself:
 ```bash

 The MTP path activates automatically when these settings are set on
 `Step-3.7-Flash-4bit` (or your equivalent model id) in
+your engine's model settings file:
 ```json
 {
 `mtp_enabled` and `turboquant_kv_enabled` are mutually exclusive per oMLX
 (TurboQuant patches the attention path that MTP relies on). Bench results
 above are with TurboQuant OFF. Restart oMLX after changing settings:
+restart your engine process. Verify activation in the log:
 ```
 [engine] Native MTP patch applied for ... (model_type=step3p7, active)
   Gemma 4 spec dec on MLX was a useful template for diagnosing this one.
 To experiment, set `mtp_enabled: True` and `turboquant_kv_enabled: False`
+(they're mutually exclusive per oMLX) in your engine's model settings file
 for `Step-3.7-Flash-4bit`, restart oMLX, and check
+`grep "MTP\[" <your-engine-log>` for accept rate. To run the
 shard rewrite yourself:
 ```bash

mlx_vlm/models/step3p7/language.py CHANGED

@@ -521,8 +521,8 @@ def _patch_omlx_emit_prefill_boundary_snapshot() -> None:
     ever written), reconstruction reports ``1/N blocks, 512 tokens``,
     walk-back sees RotatingKVCache placeholder in block 0, ``Rejecting
     cache to prevent stale sliding-window state``, request re-prefills the
-    full prompt. 2026-05-31 REDACTED-IP fan-out reported 130 such events in a
-    single day, ~7 GB of prefix-cache reuse lost. With this patch +
     ``PyramidKVCacheHandler`` + 2-tuple ``.state`` (Invariants 11-12 below),
     PKV-on cache reuse is identical to PKV-off: 97.8% cached on a same-
     prompt resend, 20 K prefill cold→warm 60 s → 2.6 s (23× speedup).

     ever written), reconstruction reports ``1/N blocks, 512 tokens``,
     walk-back sees RotatingKVCache placeholder in block 0, ``Rejecting
     cache to prevent stale sliding-window state``, request re-prefills the
+    full prompt. A production fan-out workload reported 100+ such events
+    in a single day, several GB of prefix-cache reuse lost. With this patch +
     ``PyramidKVCacheHandler`` + 2-tuple ``.state`` (Invariants 11-12 below),
     PKV-on cache reuse is identical to PKV-off: 97.8% cached on a same-
     prompt resend, 20 K prefill cold→warm 60 s → 2.6 s (23× speedup).