// Minimal Swift contract for the OpenMed Persian PII CoreML INT4 export.
// Tokenization is intentionally external: use the bundled tokenizer files or a
// compatible WordPiece tokenizer, then feed one padded 256-token window at a time.
//
// Required input tensors:
// - input_ids: Int32 shaped [1, 256]
// - attention_mask: Int32 shaped [1, 256], filled with 1 for every position
// - token_type_ids: Int32 shaped [1, 256], usually all 0
//
// Required postprocessing:
// - ignore special tokens and zero-length/pad offsets
// - trim whitespace from predicted spans
// - merge overlapping sliding-window spans
// - add regex/rule assists for phone, email, national ID, postal code, card, date
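
// Illustrative sketch of the contract above, not the export's official API.
// Every name here (PIISpan, padWindow, makeWindowInputs, trimmed,
// mergeOverlapping, makeInt32Tensor) is hypothetical. Assumptions: the pad
// token id is 0 (read the real value from the bundled tokenizer files), span
// offsets are Character offsets into the source text with an exclusive end,
// and only spans with the same label are merged.
import Foundation
#if canImport(CoreML)
import CoreML
#endif

let windowLength = 256

struct PIISpan: Equatable {
    var start: Int    // inclusive Character offset (assumed)
    var end: Int      // exclusive Character offset (assumed)
    var label: String
}

/// Pads or truncates token ids to exactly one 256-token window.
func padWindow(_ ids: [Int32], padID: Int32 = 0) -> [Int32] {
    let clipped = Array(ids.prefix(windowLength))
    return clipped + [Int32](repeating: padID, count: windowLength - clipped.count)
}

/// Builds the three flat input buffers, each holding 256 values for shape [1, 256].
func makeWindowInputs(ids: [Int32]) -> (inputIDs: [Int32], attentionMask: [Int32], tokenTypeIDs: [Int32]) {
    let inputIDs = padWindow(ids)
    let attentionMask = [Int32](repeating: 1, count: windowLength)  // 1 for every position
    let tokenTypeIDs = [Int32](repeating: 0, count: windowLength)   // usually all 0
    return (inputIDs, attentionMask, tokenTypeIDs)
}

/// Shrinks a span so it neither starts nor ends on whitespace.
func trimmed(_ span: PIISpan, in chars: [Character]) -> PIISpan {
    var s = span
    while s.start < s.end, chars[s.start].isWhitespace { s.start += 1 }
    while s.end > s.start, chars[s.end - 1].isWhitespace { s.end -= 1 }
    return s
}

/// Drops zero-length spans and merges same-label spans that overlap,
/// such as duplicates produced by adjacent sliding windows.
func mergeOverlapping(_ spans: [PIISpan]) -> [PIISpan] {
    let sorted = spans.sorted { ($0.start, $0.end) < ($1.start, $1.end) }
    var merged: [PIISpan] = []
    for span in sorted where span.end > span.start {
        if let i = merged.lastIndex(where: { $0.label == span.label }),
           span.start < merged[i].end {
            merged[i].end = max(merged[i].end, span.end)
        } else {
            merged.append(span)
        }
    }
    return merged
}

#if canImport(CoreML)
/// Copies 256 Int32 values into an MLMultiArray shaped [1, 256].
func makeInt32Tensor(_ values: [Int32]) throws -> MLMultiArray {
    precondition(values.count == windowLength, "expected one full window")
    let array = try MLMultiArray(shape: [1, NSNumber(value: windowLength)], dataType: .int32)
    for (i, value) in values.enumerated() {
        array[i] = NSNumber(value: value)
    }
    return array
}
#endif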