OpenMed Persian PII TookaBERT Large CoreML INT4

4-bit palettized CoreML export of a Persian PII token-classification model trained on the cleaned OpenMed Persian PII corpus.

Status

This repo contains a verified CoreML INT4 artifact for Apple deployment.

Important runtime contract:

  • model.4bit-palettized.mlpackage is fixed-shape batch 1, sequence length 256.
  • Inputs are int32: input_ids, attention_mask, and token_type_ids.
  • The verified CoreML graph uses an all-ones attention mask. Fill attention_mask with 1 for every position, including padded positions, then ignore special/pad offsets during span construction.
  • Use sliding windows for long text and merge by original character offsets.
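
The contract above can be sketched as a small input builder. This is a minimal illustration, not the code in inference_coreml.py: `build_inputs` and the `pad_id=0` default are hypothetical, the real pad ID should come from the shipped tokenizer, and the all-zero `token_type_ids` assume single-segment input.

```python
import numpy as np

MAX_LENGTH = 256  # fixed sequence length from the runtime contract


def build_inputs(token_ids, pad_id=0):
    """Pad one window of token IDs to the fixed (1, 256) int32 shape.

    pad_id=0 is an assumption; read the real value from the tokenizer.
    """
    if len(token_ids) > MAX_LENGTH:
        raise ValueError("window exceeds 256 tokens; split it with sliding windows first")

    input_ids = np.full((1, MAX_LENGTH), pad_id, dtype=np.int32)
    input_ids[0, : len(token_ids)] = token_ids

    # All-ones mask for every position, including padding, per the verified contract.
    attention_mask = np.ones((1, MAX_LENGTH), dtype=np.int32)

    # Single-segment input, so every token type is 0 (assumption).
    token_type_ids = np.zeros((1, MAX_LENGTH), dtype=np.int32)

    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "token_type_ids": token_type_ids,
    }
```

Because the mask no longer marks padding, the caller has to remember how many real tokens each window held and discard predictions beyond that count.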

Metrics

Original dense held-out test F1: 0.9800

CoreML contract quality check: PyTorch evaluation with the same all-ones attention behavior on the first 2,000 held-out test rows:

{
  "model_dir": "models/final_runs/dense-lowlr-combined-clean-20260530T222847-0700/PartAI__TookaBERT-Large",
  "dataset": "data/final_splits_audited/combined_clean",
  "split": "test",
  "rows": 2000,
  "max_length": 256,
  "batch_size": 32,
  "attention_mode": "all_ones",
  "device": "cuda",
  "precision": 0.9764901296875999,
  "recall": 0.978365230749536,
  "f1": 0.9774267809182761,
  "accuracy": 0.9945124547030911
}

CoreML parity verification:

{
  "attention_mode": "all_ones",
  "batch_size": 1,
  "max_length": 256,
  "fp32_argmax_match_rate": 1.0,
  "int4_argmax_match_rate": 0.9921875,
  "int4_max_abs_diff_vs_torch": 7.133697509765625
}
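
For readers who want to reproduce figures of this kind, the sketch below shows one plausible way to compute an argmax match rate and a maximum absolute logit difference between a reference and a candidate model. `parity_metrics` is a hypothetical helper; the exact procedure behind verification.json is not documented here. Note that 0.9921875 equals 254/256, which would correspond to two differing positions if the fixture were a single 256-token window (an assumption).

```python
import numpy as np


def parity_metrics(reference_logits, candidate_logits):
    """Compare two logit arrays of shape (batch, seq_len, num_labels)."""
    ref = np.asarray(reference_logits, dtype=np.float32)
    cand = np.asarray(candidate_logits, dtype=np.float32)
    if ref.shape != cand.shape:
        raise ValueError("logit arrays must have the same shape")

    # Fraction of token positions where both models pick the same label.
    match = np.argmax(ref, axis=-1) == np.argmax(cand, axis=-1)

    return {
        "argmax_match_rate": float(match.mean()),
        "max_abs_diff": float(np.abs(ref - cand).max()),
    }
```

A large absolute logit difference can coexist with a high match rate, because only the ranking of labels at each position affects the predicted tag.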

Recommended Inference Wrapper

Production use should wrap the CoreML model with:

  1. Sliding windows at max_length=256, with overlap/stride around 96.
  2. Token-offset span reconstruction; ignore special tokens and zero-length offsets.
  3. Whitespace trimming and overlap de-duplication.
  4. High-precision regex/rule assists for emails, Iranian mobile/phone numbers, national IDs, postal codes, dates, card numbers, and IMEI exclusion.
  5. Cue-word correction for labels near کد ملی, گواهینامه, گذرنامه, کدپستی, شماره تماس, and ایمیل.
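
Steps 1 to 3 can be sketched in plain Python as follows. This is an illustrative outline rather than the shipped wrapper, and it rests on three assumptions: each window reserves two positions for special tokens (254 content tokens), the figure 96 is read as an overlap in tokens (as with the `stride` argument of Hugging Face tokenizers), and spans are `(start, end, label)` tuples in original character offsets. All function names are hypothetical.

```python
def window_ranges(n_tokens, content_len=254, overlap=96):
    """Return (start, end) token ranges covering n_tokens with the given overlap."""
    step = content_len - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than content_len")
    ranges, start = [], 0
    while True:
        end = min(start + content_len, n_tokens)
        ranges.append((start, end))
        if end >= n_tokens:
            return ranges
        start += step


def trim_span(text, start, end):
    """Shrink a character span so it neither starts nor ends on whitespace."""
    while start < end and text[start].isspace():
        start += 1
    while end > start and text[end - 1].isspace():
        end -= 1
    return start, end


def merge_spans(spans):
    """Merge duplicate or overlapping spans that carry the same label."""
    merged = []
    for start, end, label in sorted(spans):
        if merged and merged[-1][2] == label and start < merged[-1][1]:
            prev_start, prev_end, _ = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), label)
        else:
            merged.append((start, end, label))
    return merged
```

Merging on character offsets rather than token indices means predictions from different windows line up even though each window restarts its token positions at zero.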

See inference_coreml.py and CoreMLWrapperContract.swift for minimal wrapper contracts.

Known Edge Cases

  • Do not use a standard padding attention mask with this CoreML export. The verified path uses all-ones attention because the transformer mask path, as converted by coremltools, did not preserve PyTorch parity.
  • Long documents require sliding-window inference.
  • Obfuscated contacts and verbal/spaced phone numbers need deterministic normalization/rules.
  • IMEI-like device IDs can look like card numbers; validate card numbers before masking as CREDITCARDNUMBER.
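
One way to validate a card-number candidate is sketched below. It assumes card numbers are 16 digits long and carry a Luhn check digit. IMEIs are 15 digits and also use a Luhn check digit, so the length check does most of the work in separating the two. `looks_like_card_number` is a hypothetical helper, not part of the shipped wrapper.

```python
def looks_like_card_number(candidate):
    """Return True for a 16-digit string that passes the Luhn checksum.

    Accepts ASCII, Persian, and Arabic-Indic digits; spaces and hyphens
    between digit groups are ignored.
    """
    compact = candidate.replace(" ", "").replace("-", "")
    if len(compact) != 16 or not compact.isdecimal():
        return False

    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, ch in enumerate(reversed(compact)):
        d = int(ch)  # int() understands any Unicode decimal digit
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

A span that fails this check can be relabeled or dropped by the rule layer instead of being masked as CREDITCARDNUMBER.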

The underlying checkpoint is the best Persian-script dense model, but quality under this CoreML contract drops slightly compared with ONNX. Trim leading whitespace from spans in postprocessing.

Files

  • model.4bit-palettized.mlpackage: verified 4-bit CoreML model.
  • verification.json: fixture-level CoreML parity verification.
  • coreml_allones_hf_eval_test_2000.json: quality check for the CoreML attention contract.
  • tokenizer/config files from the trained checkpoint.
  • reports/: ad hoc edge-case reports from the dense model.