OpenMed Persian PII Google mBERT MLX INT4

Verified 4-bit (INT4) MLX export of google-bert/bert-base-multilingual-cased, fine-tuned for Persian/Iranian PII token classification.
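
The verification record lists `bits: 4` and `group_size: 64`. To make those two numbers concrete, the sketch below shows generic group-wise affine quantization in pure Python: each run of 64 weights shares one scale and one bias, and each weight is stored as a 4-bit code. It is an illustration under assumptions, not MLX's `mx.quantize` implementation, whose exact rounding and scale/bias handling may differ. The 16-bit size of each scale and bias is also assumed.

```python
"""Illustrative group-wise 4-bit affine quantization (a generic scheme, not MLX's kernel)."""
import math
from typing import List, Tuple

BITS = 4
GROUP_SIZE = 64
LEVELS = (1 << BITS) - 1  # 15, so codes run from 0 to 15

Group = Tuple[List[int], float, float]  # (codes, scale, bias)


def quantize_group(w: List[float]) -> Group:
    """Map one group of floats to 4-bit codes sharing a single scale and bias."""
    lo, hi = min(w), max(w)
    scale = (hi - lo) / LEVELS if hi > lo else 1.0  # constant group: any scale works
    codes = [min(LEVELS, max(0, round((x - lo) / scale))) for x in w]
    return codes, scale, lo


def quantize(weights: List[float]) -> List[Group]:
    """Split a flat weight row into GROUP_SIZE chunks and quantize each chunk."""
    if len(weights) % GROUP_SIZE != 0:
        raise ValueError("row length must be a multiple of GROUP_SIZE")
    return [quantize_group(weights[i:i + GROUP_SIZE])
            for i in range(0, len(weights), GROUP_SIZE)]


def dequantize(groups: List[Group]) -> List[float]:
    """Reconstruct approximate weights as scale * code + bias."""
    out: List[float] = []
    for codes, scale, bias in groups:
        out.extend(scale * c + bias for c in codes)
    return out


def bits_per_weight(bits: int = BITS, group_size: int = GROUP_SIZE,
                    param_bits: int = 16) -> float:
    """Average storage per weight, assuming one scale and one bias of param_bits per group."""
    return bits + 2 * param_bits / group_size


if __name__ == "__main__":
    row = [math.sin(i) for i in range(128)]  # toy weights, two groups of 64
    rec = dequantize(quantize(row))
    err = max(abs(a - b) for a, b in zip(row, rec))
    print(f"max abs error {err:.4f}, {bits_per_weight()} bits per weight")
```

Under these assumptions each weight costs 4 + 2 × 16 / 64 = 4.5 bits instead of 32 for float32. Because codes are rounded to the nearest level, the error on any weight is at most half of its group's scale.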

Metrics

Dense held-out test F1: 0.9708

Runtime evaluation of this INT4 export on a held-out slice (test split, first 2,000 rows, max_length=256):

{
  "model": "artifacts/google-mbert-pii-4bit/mlx-custom",
  "dataset": "data/final_splits_audited/combined_clean",
  "split": "test",
  "rows": 2000,
  "max_length": 256,
  "batch_size": 16,
  "precision": 0.9730430274753759,
  "recall": 0.976142494961971,
  "f1": 0.9745902969333118,
  "accuracy": 0.9946134593879415
}
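
The reported slice F1 is consistent with the harmonic mean of the reported precision and recall, F1 = 2PR / (P + R). The short check below recomputes it, assuming precision, recall, and F1 all come from the same pooled (micro-averaged) counts, which the card does not state explicitly.

```python
"""Consistency check: F1 as the harmonic mean of the reported precision and recall."""


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean 2PR / (P + R); defined as 0.0 when both inputs are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the runtime slice evaluation above.
reported = {
    "precision": 0.9730430274753759,
    "recall": 0.976142494961971,
    "f1": 0.9745902969333118,
}

recomputed = f1_score(reported["precision"], reported["recall"])
print(f"recomputed F1 {recomputed:.6f} vs reported {reported['f1']:.6f}")
```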

Fixture/runtime verification (INT4 logits checked against the unquantized MLX model):

{
  "status": "converted_mlx_int4",
  "weights": "artifacts/google-mbert-pii-4bit/mlx-custom/weights.safetensors",
  "bits": 4,
  "group_size": 64,
  "max_length": 256,
  "verification": {
    "name": "mlx_int4",
    "shape": [
      2,
      256,
      39
    ],
    "argmax_match_rate_vs_unquantized_mlx": 0.966796875,
    "max_abs_diff_vs_unquantized_mlx": 9.43897533416748,
    "mean_abs_diff_vs_unquantized_mlx": 0.10117268562316895
  }
}
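
The verification record compares INT4 logits of shape [2, 256, 39] (read here as batch × token positions × labels) against the unquantized MLX model. The match rate 0.966796875 equals 495/512, i.e. 495 of the 2 × 256 = 512 token positions agree on their top label if every position is counted. The pure-Python sketch below shows one way to compute the three reported statistics; the function names are hypothetical, and the real check presumably ran on MLX arrays rather than nested lists.

```python
"""Compare INT4 and unquantized logits with the three statistics in the verification record."""
from typing import Dict, List

Logits = List[List[List[float]]]  # [batch][position][label]


def argmax(row: List[float]) -> int:
    """Index of the largest logit; the first maximum wins on ties."""
    return max(range(len(row)), key=row.__getitem__)


def compare_logits(ref: Logits, quant: Logits) -> Dict[str, float]:
    """Argmax agreement per token position plus max/mean absolute logit difference."""
    if len(ref) != len(quant):
        raise ValueError("batch sizes differ")
    positions = matches = values = 0
    diff_sum = diff_max = 0.0
    for ref_seq, q_seq in zip(ref, quant):
        if len(ref_seq) != len(q_seq):
            raise ValueError("sequence lengths differ")
        for ref_row, q_row in zip(ref_seq, q_seq):
            if len(ref_row) != len(q_row):
                raise ValueError("label counts differ")
            positions += 1
            matches += int(argmax(ref_row) == argmax(q_row))
            for a, b in zip(ref_row, q_row):
                d = abs(a - b)
                diff_sum += d
                diff_max = max(diff_max, d)
                values += 1
    return {
        "argmax_match_rate": matches / positions,
        "max_abs_diff": diff_max,
        "mean_abs_diff": diff_sum / values,
    }
```

A large maximum difference (about 9.4 here) next to a small mean (about 0.10 over all logits) suggests that big deviations are rare outliers. The argmax agreement, rather than raw logit distance, is what decides the model's per-token label before any rule postprocessing.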

Runtime Contract

Use this model behind the same production wrapper as the ONNX/CoreML releases:

sliding-window inference, usually max_length=256 and stride around 96;
offset-based span reconstruction;
whitespace trimming and overlap de-duplication;
deterministic regex/rule assists for email, phone, national ID, postal code, date, and card number detection, plus IMEI exclusion;
cue-word correction around Persian labels such as کد ملی (national ID), شماره تماس (contact number), کدپستی (postal code), and ایمیل (email).
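
The windowing, offset-based reconstruction, trimming, and de-duplication steps above can be sketched in plain Python. This is a hypothetical illustration, not the project's wrapper. It assumes BIO-style labels (the card does not state the tag scheme), treats `stride` as the number of tokens shared by consecutive windows (the Hugging Face tokenizers convention), reserves two positions for [CLS] and [SEP], and stands in for the model with a caller-supplied `predict_window` function. `offsets` are per-token (start, end) character offsets such as a fast tokenizer's offset mapping.

```python
"""Hypothetical sliding-window token classification with offset-based span reconstruction."""
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (char_start, char_end, entity_type)


def make_windows(n_tokens: int, max_length: int = 256, stride: int = 96,
                 special_tokens: int = 2) -> List[Tuple[int, int]]:
    """Token-index windows; consecutive windows share `stride` tokens (assumed overlap semantics)."""
    content = max_length - special_tokens  # positions left after [CLS] and [SEP]
    step = content - stride
    if step <= 0:
        raise ValueError("stride must be smaller than the usable window length")
    windows, start = [], 0
    while True:
        end = min(start + content, n_tokens)
        windows.append((start, end))
        if end >= n_tokens:
            return windows
        start += step


def bio_to_spans(labels: List[str], offsets: List[Tuple[int, int]]) -> List[Span]:
    """Group B-/I- tagged tokens into character spans using tokenizer offsets."""
    spans: List[Span] = []
    prev = None  # entity type of the previous token; None after an "O"
    for label, (start, end) in zip(labels, offsets):
        if label == "O":
            prev = None
            continue
        prefix, _, kind = label.partition("-")
        if prefix == "I" and prev == kind:
            spans[-1] = (spans[-1][0], end, kind)  # extend the open span
        else:
            spans.append((start, end, kind))  # B- tag, or an I- tag with no open span
        prev = kind
    return spans


def trim(span: Span, text: str) -> Span:
    """Shrink a span so it neither starts nor ends on whitespace."""
    start, end, kind = span
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return (start, end, kind)


def merge_spans(spans: List[Span]) -> List[Span]:
    """Drop empty and duplicate spans; merge overlapping spans of the same type."""
    merged: List[Span] = []
    for start, end, kind in sorted({s for s in spans if s[0] < s[1]}):
        if merged and merged[-1][2] == kind and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end), kind)
        else:
            merged.append((start, end, kind))
    return merged


def predict_spans(text: str, offsets: List[Tuple[int, int]],
                  predict_window: Callable[[int, int], List[str]],
                  max_length: int = 256, stride: int = 96) -> List[Span]:
    """Run the (hypothetical) model per window, then rebuild and de-duplicate spans."""
    spans: List[Span] = []
    for start, end in make_windows(len(offsets), max_length, stride):
        labels = predict_window(start, end)  # BIO labels for tokens start..end-1
        spans.extend(bio_to_spans(labels, offsets[start:end]))
    return merge_spans([trim(s, text) for s in spans])
```

With the defaults, each window holds 254 content tokens and advances by 158, so an entity cut at one window's edge normally appears whole in the next window, and `merge_spans` unions the partial and complete copies.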

"""Minimal MLX wrapper contract.

The source project includes a custom BERT token-classification MLX runtime script. Load weights.safetensors into a module of the same shape, tokenize with the bundled tokenizer, run sliding windows, then reconstruct spans from offsets and apply the same regex/rule postprocessing used by the ONNX/CoreML packages.
"""
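
The regex/rule postprocessing itself is not included in this card. As a hypothetical stand-in, the sketch below shows how deterministic assists and Persian cue words can cooperate: a bare 10-digit number is ambiguous between a national ID and a postal code, so the nearest preceding cue word (کد ملی or کدپستی) decides the label, with the widely used Iranian national-ID checksum as a fallback. The regexes, label names, and 20-character cue window are assumptions; dates, card numbers, and IMEI exclusion are omitted for brevity.

```python
"""Hypothetical rule assists: regex detectors plus Persian cue-word disambiguation."""
import re
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (char_start, char_end, label)

# Persian (U+06F0..U+06F9) and Arabic-Indic (U+0660..U+0669) digits -> ASCII, one char each,
# so offsets found in the normalized string are valid in the original text.
DIGITS = str.maketrans("۰۱۲۳۴۵۶۷۸۹٠١٢٣٤٥٦٧٨٩", "01234567890123456789")

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
MOBILE = re.compile(r"(?<!\d)(?:\+98|0098|0)9\d{9}(?!\d)")  # assumed Iranian mobile format
TEN_DIGITS = re.compile(r"(?<!\d)\d{10}(?!\d)")  # national ID and postal code are both 10 digits

# Cue words named in the card: کد ملی (national ID) and کدپستی (postal code).
CUES = {"کد ملی": "NATIONAL_ID", "کدپستی": "POSTAL_CODE"}
CUE_WINDOW = 20  # hypothetical look-back distance in characters


def valid_national_id(digits: str) -> bool:
    """Iranian national-ID checksum: weights 10..2 on the first nine digits, mod 11."""
    if len(digits) != 10 or not digits.isdecimal() or len(set(digits)) == 1:
        return False
    remainder = sum(int(d) * (10 - i) for i, d in enumerate(digits[:9])) % 11
    check = int(digits[9])
    return check == remainder if remainder < 2 else check == 11 - remainder


def nearest_cue(context: str) -> Optional[str]:
    """Label of the cue word that occurs closest to the end of `context`, if any."""
    best, best_pos = None, -1
    for cue, label in CUES.items():
        pos = context.rfind(cue)
        if pos > best_pos:
            best, best_pos = label, pos
    return best


def rule_spans(text: str) -> List[Span]:
    """Deterministic spans for emails, mobile numbers, and cue-resolved 10-digit IDs."""
    norm = text.translate(DIGITS)
    spans = [(m.start(), m.end(), "EMAIL") for m in EMAIL.finditer(norm)]
    spans += [(m.start(), m.end(), "PHONE") for m in MOBILE.finditer(norm)]
    for m in TEN_DIGITS.finditer(norm):
        label = nearest_cue(norm[max(0, m.start() - CUE_WINDOW):m.start()])
        if label is None and valid_national_id(m.group()):
            label = "NATIONAL_ID"  # no cue nearby, but the checksum passes
        if label is not None:
            spans.append((m.start(), m.end(), label))
    return sorted(spans)
```

Because every Persian or Arabic-Indic digit maps to exactly one ASCII digit, offsets computed on the normalized string index the original text directly, so these rule spans can be merged with model spans by character position.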

Compact and cleaner on mixed Persian/Latin/email text.

Reza2kn/openmed-persian-pii-google-mbert-mlx-int4 is part of the OpenMed Persian🇮🇷 collection: OpenMed Persian models and datasets for Persian/Iranian privacy, PII masking, and medical-adjacent language infrastructure.