anerjy committed on
Commit a3971c4 · 1 Parent(s): 88161f1

Scrub remaining privacy markers from README + language.py docstring

- README.md: replace personal "restart engine" with
generic "restart your engine process"; replace "engine settings file"
(2 places) with "your engine's model settings file"; replace
"engine log" with "<your-engine-log>"; replace
"[engine]" log prefix with generic "[engine]".
- language.py: replace "2026-05-31 REDACTED-IP fan-out reported 130 such
events" docstring data point with generic "production fan-out
workload reported 100+ such events" — no internal client IPs.

Found by second-pass audit after user flagged ongoing privacy concerns.
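
A second-pass audit of this kind could be approximated with a small scanner. The following is only a minimal sketch, not the tooling actually used for this commit: the function name `find_privacy_markers` and both regular expressions are hypothetical, and they only catch IPv4-like strings and ISO dates such as the ones removed from the docstring.

```python
import re

# Hypothetical patterns for illustration only; a real audit would cover more.
# Note: the IPv4 pattern also matches dotted version strings such as 1.2.3.4.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
ISO_DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def find_privacy_markers(text):
    """Return (line_number, kind, match) tuples for IPv4-like strings and ISO dates."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # Patterns are checked in a fixed order: IPv4 first, then dates.
        for kind, pattern in (("ipv4", IPV4), ("date", ISO_DATE)):
            for match in pattern.finditer(line):
                hits.append((lineno, kind, match.group(0)))
    return hits
```

Running it over the text of `README.md` and `language.py` before committing would list candidate lines for manual review; matches are hints, not proof of a leak.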

Files changed (2)
  1. README.md +4 -4
  2. mlx_vlm/models/step3p7/language.py +2 -2
README.md CHANGED
@@ -183,7 +183,7 @@ lm_head.
 
 The MTP path activates automatically when these settings are set on
 `Step-3.7-Flash-4bit` (or your equivalent model id) in
-`your engine settings file`:
+your engine's model settings file:
 
 ```json
 {
@@ -195,7 +195,7 @@ The MTP path activates automatically when these settings are set on
 `mtp_enabled` and `turboquant_kv_enabled` are mutually exclusive per oMLX
 (TurboQuant patches the attention path that MTP relies on). Bench results
 above are with TurboQuant OFF. Restart oMLX after changing settings:
-`bash restart your engine`. Verify activation in the log:
+restart your engine process. Verify activation in the log:
 
 ```
 [engine] Native MTP patch applied for ... (model_type=step3p7, active)
@@ -227,9 +227,9 @@ python scripts/rewrite_mtp_shard.py \
 Gemma 4 spec dec on MLX was a useful template for diagnosing this one.
 
 To experiment, set `mtp_enabled: True` and `turboquant_kv_enabled: False`
-(they're mutually exclusive per oMLX) in `your engine settings file`
+(they're mutually exclusive per oMLX) in your engine's model settings file
 for `Step-3.7-Flash-4bit`, restart oMLX, and check
-`grep "MTP\[" your engine log` for accept rate. To run the
+`grep "MTP\[" <your-engine-log>` for accept rate. To run the
 shard rewrite yourself:
 
 ```bash
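
The README text in this diff states that `mtp_enabled` and `turboquant_kv_enabled` are mutually exclusive. A pre-flight check along these lines could catch a bad combination before restarting the engine. This is a sketch under assumptions: the helper name `check_mtp_settings` is hypothetical, and it assumes the two keys sit in a flat per-model dict, which the diff does not show (the JSON body is elided).

```python
def check_mtp_settings(settings):
    """Return True if MTP is enabled; raise if it conflicts with TurboQuant KV.

    `settings` is assumed to be a flat dict of per-model options; the real
    layout of the engine's model settings file is not shown in the diff.
    """
    mtp = bool(settings.get("mtp_enabled", False))
    turboquant = bool(settings.get("turboquant_kv_enabled", False))
    if mtp and turboquant:
        raise ValueError(
            "mtp_enabled and turboquant_kv_enabled are mutually exclusive"
        )
    return mtp
```

Loading the settings file with `json.load` and passing the entry for `Step-3.7-Flash-4bit` (or your equivalent model id) to this function would be one way to use it.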
mlx_vlm/models/step3p7/language.py CHANGED
@@ -521,8 +521,8 @@ def _patch_omlx_emit_prefill_boundary_snapshot() -> None:
 ever written), reconstruction reports ``1/N blocks, 512 tokens``,
 walk-back sees RotatingKVCache placeholder in block 0, ``Rejecting
 cache to prevent stale sliding-window state``, request re-prefills the
-full prompt. 2026-05-31 REDACTED-IP fan-out reported 130 such events in a
-single day, ~7 GB of prefix-cache reuse lost. With this patch +
+full prompt. A production fan-out workload reported 100+ such events
+in a single day, several GB of prefix-cache reuse lost. With this patch +
 ``PyramidKVCacheHandler`` + 2-tuple ``.state`` (Invariants 11-12 below),
 PKV-on cache reuse is identical to PKV-off: 97.8% cached on a same-
 prompt resend, 20 K prefill cold→warm 60 s → 2.6 s (23× speedup).