richiejp committed 6b34ced (verified; 1 parent: c34e972)

Add/update model card

Files changed (1): `README.md` (added, +171 -0)

---
license: apache-2.0
base_model: OpenMed/privacy-filter-multilingual
base_model_relation: quantized
pipeline_tag: token-classification
library_name: gguf
tags:
- gguf
- llama-cpp
- localai
- token-classification
- pii
- ner
- privacy
- redaction
- multilingual
- openai-privacy-filter
language:
- ar
- bn
- de
- en
- es
- fr
- hi
- it
- ja
- ko
- nl
- pt
- te
- tr
- vi
- zh
---

# privacy-filter-multilingual: GGUF (F16)

GGUF conversion of [`OpenMed/privacy-filter-multilingual`](https://huggingface.co/OpenMed/privacy-filter-multilingual),
a multilingual PII **token-classification** model (a fine-tune of
[`openai/privacy-filter`](https://huggingface.co/openai/privacy-filter)). It labels every
token with a BIOES tag over **54 PII categories (217 classes)** across **16 languages**. The GGUF
packaging lets it be served locally, with **no Python**, as the encoder/NER tier of a PII redactor.

For the full model description, label space, evaluation, limitations, and citations, see the
**[source model card](https://huggingface.co/OpenMed/privacy-filter-multilingual)**; this
card covers only the GGUF packaging and how to run it.

---

## ⚠️ Requires a patched llama.cpp: does NOT run on stock builds

> **Read this before downloading.** This GGUF uses a **custom architecture,
> `openai-privacy-filter`, that is not (yet) part of upstream llama.cpp.** It will **fail to
> load** in stock `llama.cpp`, `llama-cpp-python`, Ollama, LM Studio, or any other off-the-shelf
> GGUF runtime. You will see an error like `unknown model architecture: 'openai-privacy-filter'`.
>
> It runs **only** on a build that carries the patches that **[LocalAI PR #10160](https://github.com/mudler/LocalAI/pull/10160)**
> ships in `backend/cpp/llama-cpp/patches/` (applied automatically by the backend's
> `prepare.sh`). Those five patches add:
>
> 1. **TOKEN_CLS pooling** (`LLAMA_POOLING_TYPE_TOKEN_CLS`): a per-token classification head
>    (a reduced subset of upstream **[PR #19725](https://github.com/ggml-org/llama.cpp/pull/19725)**, still open);
> 2. the **`openai-privacy-filter` architecture** registration;
> 3. the **HF→GGUF converter** for it;
> 4. the **bidirectional banded-attention graph** + loader (non-causal symmetric sliding
>    window, attention sinks, interleaved RoPE, exact YaRN `truncate=false` frequencies);
> 5. an **all-SWA no-cache mask fix**, required for an encoder that uses sliding-window attention on every layer.
>
> These are **carry-patches against a pinned llama.cpp commit**, not upstream features. Until
> the architecture is upstreamed (which depends on PR #19725 landing first), running this model
> means using LocalAI's vendored backend. At the time of writing, support lives on a LocalAI
> feature branch, so use a LocalAI build that includes the `openai-privacy-filter` patches.

---

## Use with LocalAI

Install from the LocalAI model gallery (the entry sets `backend: llama-cpp`,
`embeddings: true`, and `known_usecases: [token_classify]`):

```bash
local-ai models install privacy-filter-multilingual
```

The model is **not** a chat/completion model; it exposes the gRPC `TokenClassify` RPC. It is a
**PII detector**: it carries its own detection policy in a `pii_detection:` block, and other
models opt in by listing it in `pii.detectors`. The gallery entry ships a sensible default
policy (mask everything detected; block credentials, financial secrets, and crypto), along these lines:

```yaml
# the detector model (this GGUF): the policy lives here
name: privacy-filter-multilingual
backend: llama-cpp
embeddings: true
known_usecases: [token_classify]
pii_detection:
  min_score: 0.5
  default_action: mask   # mask | block | allow
  entity_actions:        # per-category overrides of default_action
    PASSWORD: block
    CREDITCARD: block
    CVV: block
```

```yaml
# any chat or cloud-proxy model: opt in and reference the detector(s)
name: my-assistant
backend: llama-cpp
pii:
  enabled: true
  detectors:
    - privacy-filter-multilingual
```

LocalAI runs the model's **constrained BIOES Viterbi** decode in the backend and returns
entity spans with **UTF-8 byte offsets**; the redactor then masks or blocks each span according to
the detector's `pii_detection` policy. With multiple detectors, their hits are unioned and the strongest action wins.
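
To make the policy semantics concrete, here is a minimal Python sketch (an illustration, not LocalAI's Go implementation) of how `min_score`, `default_action`, and `entity_actions` could resolve each detected span to an action, and how hits from several detectors might be unioned with the strongest action winning. Assumptions: `block` > `mask` > `allow`; `min_score` is compared against a per-span confidence; hits are merged only when their byte ranges are identical and are assumed not to overlap; masking substitutes a `[CATEGORY]` placeholder. The `Span`/`Policy` types, the placeholder format, and the category names `FIRSTNAME`/`PHONE` are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed ordering used to pick the "strongest" action when detectors disagree.
STRENGTH = {"allow": 0, "mask": 1, "block": 2}


@dataclass(frozen=True)
class Span:
    category: str  # category name without the BIOES prefix, e.g. "PASSWORD"
    start: int     # UTF-8 byte offset, inclusive
    end: int       # UTF-8 byte offset, exclusive
    score: float   # detector confidence in [0, 1]


@dataclass
class Policy:
    min_score: float = 0.5
    default_action: str = "mask"
    entity_actions: dict = field(default_factory=dict)

    def action_for(self, span: Span) -> str:
        if span.score < self.min_score:
            return "allow"  # below threshold: leave the text alone
        return self.entity_actions.get(span.category, self.default_action)


def merge_hits(per_detector):
    """Union (policy, spans) pairs from several detectors; strongest action wins per byte range."""
    best = {}
    for policy, spans in per_detector:
        for span in spans:
            action = policy.action_for(span)
            key = (span.start, span.end)
            if key not in best or STRENGTH[action] > STRENGTH[best[key][1]]:
                best[key] = (span, action)
    return [best[key] for key in sorted(best)]


def redact(text: str, hits) -> str:
    """Reject the request if any hit blocks; otherwise mask byte ranges (assumed non-overlapping)."""
    if any(action == "block" for _, action in hits):
        raise PermissionError("request blocked by PII policy")
    data = text.encode("utf-8")
    # Work from the end so earlier byte offsets stay valid after each substitution.
    for span, action in sorted(hits, key=lambda hit: hit[0].start, reverse=True):
        if action == "mask":
            data = data[: span.start] + f"[{span.category}]".encode("utf-8") + data[span.end :]
    return data.decode("utf-8")


if __name__ == "__main__":
    policy = Policy(entity_actions={"PASSWORD": "block"})
    text = "Call Zoë at 555-0100"  # "ë" is two bytes in UTF-8, so offsets are byte-based
    hits = merge_hits([(policy, [Span("FIRSTNAME", 5, 9, 0.9), Span("PHONE", 13, 21, 0.8)])])
    print(redact(text, hits))  # Call [FIRSTNAME] at [PHONE]
```

Working in byte offsets, as the backend reports them, avoids any ambiguity between characters and code units when the text is not ASCII.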

> Load note: the model must be loaded with **TOKEN_CLS pooling**. This is the GGUF's default, and
> the LocalAI gallery config with `embeddings: true` handles it. If you drive `llama-embedding`
> directly for testing, do **not** pass `--pooling none`: that overrides the model default, and
> you get raw hidden states instead of label logits.
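
A quick way to tell which case you are in: with TOKEN_CLS pooling each token yields 217 label logits, while `--pooling none` yields 640-wide hidden states (`d_model=640`). The Python sketch below assumes you have already parsed the per-token vectors into lists of floats (extracting them from `llama-embedding` output is not covered here) and that `id2label` is the ordered label list from `classifier.output_labels`; the function name is hypothetical. Its per-token argmax is only a sanity check; the real decode is the constrained BIOES Viterbi described above.

```python
import math

D_MODEL = 640  # hidden size: the per-token width you get with --pooling none


def rows_to_labels(rows, id2label):
    """Map per-token output rows to (label, probability) pairs via softmax + argmax.

    Raises ValueError if the rows look like raw hidden states instead of label logits.
    """
    results = []
    for row in rows:
        if len(row) == D_MODEL:
            raise ValueError("got 640-wide hidden states; was --pooling none passed?")
        if len(row) != len(id2label):
            raise ValueError(f"expected {len(id2label)} logits per token, got {len(row)}")
        m = max(row)  # subtract the max for a numerically stable softmax
        exps = [math.exp(x - m) for x in row]
        best = max(range(len(row)), key=row.__getitem__)
        results.append((id2label[best], exps[best] / sum(exps)))
    return results
```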

## Files

| File | Precision | Size | Notes |
|---|---|---|---|
| `privacy-filter-multilingual-f16.gguf` | F16 | ~2.7 GB | 156 tensors; 217 `classifier.output_labels`; `pooling_type = TOKEN_CLS`. Validated artifact. |

F16 is the validated, shipped precision. Quantized variants are deferred until they can be
evaluated with a **task metric (span-F1 per language) and KL divergence against F16**. Perplexity
is meaningless for a classifier, so no naively quantized GGUF is published here yet.
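
The card does not define either metric precisely, so the Python sketch below is one plausible reading: exact-match span F1 (a predicted `(byte_start, byte_end, category)` counts only if it equals a gold span), micro-averaged per language, and the mean per-token KL divergence KL(P_F16 ‖ P_quant) between the two models' softmax label distributions. The function names and input shapes are hypothetical.

```python
import math
from collections import defaultdict


def span_f1_by_language(examples):
    """Micro-averaged exact-match span F1 per language.

    examples: iterable of (lang, gold_spans, pred_spans), spans as (byte_start, byte_end, category).
    """
    counts = defaultdict(lambda: [0, 0, 0])  # lang -> [true positives, #predicted, #gold]
    for lang, gold, pred in examples:
        gold, pred = set(gold), set(pred)
        c = counts[lang]
        c[0] += len(gold & pred)
        c[1] += len(pred)
        c[2] += len(gold)
    scores = {}
    for lang, (tp, n_pred, n_gold) in counts.items():
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_gold if n_gold else 0.0
        total = precision + recall
        scores[lang] = 2 * precision * recall / total if total else 0.0
    return scores


def mean_token_kl(p_f16, p_quant, eps=1e-12):
    """Mean over tokens of KL(P_f16 || P_quant); each row is a softmax distribution over labels."""
    total = 0.0
    for p_row, q_row in zip(p_f16, p_quant):
        # Terms with p == 0 contribute nothing; eps guards against division by zero.
        total += sum(p * math.log(p / max(q, eps)) for p, q in zip(p_row, q_row) if p > 0)
    return total / len(p_f16)
```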

## Architecture & conversion

gpt-oss-style sparse **MoE** (8 layers, `d_model=640`, 128 experts, top-4 routing, ~50M active
parameters per token), **bidirectional banded attention** (symmetric sliding window of 128, attention
sinks retained), **interleaved (GPT-J) RoPE** with YaRN (θ=150000, factor 32), the o200k (`o200k_base`)
tokenizer, and a 217-way token-classification head (HF `score` → GGUF `cls.output`).
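
Two of these pieces are less common than the rest. The Python sketch below shows, for a single head, interleaved (GPT-J) RoPE, which rotates adjacent pairs `(x[2i], x[2i+1])` rather than NeoX-style split halves, and bidirectional banded attention with a per-head sink logit that joins the softmax denominator without contributing a value. It is illustrative only: YaRN scaling is omitted, and the `half_window` parameter with an inclusive `|i - j| <= half_window` boundary is an assumption (the card gives only the window of 128 and the converter's `n_swa = 2·sliding_window` mapping).

```python
import math


def rope_interleaved(x, pos, theta=150000.0):
    """GPT-J-style RoPE: rotate adjacent pairs (x[2i], x[2i+1]) by pos * theta**(-2i/d).

    Plain RoPE only; the model's YaRN frequency rescaling (factor 32) is deliberately omitted.
    """
    d = len(x)
    out = list(x)
    for i in range(d // 2):
        angle = pos * theta ** (-2 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def banded_sink_attention(q_idx, q, keys, values, half_window, sink_logit):
    """Attention output for query position q_idx over a symmetric band, with an attention sink.

    Keys farther than half_window positions away (in either direction) are masked out.
    The sink logit joins the softmax denominator but has no value vector, so it can
    absorb probability mass instead of forcing it onto real tokens.
    """
    scale = 1.0 / math.sqrt(len(q))
    logits = {
        j: scale * sum(a * b for a, b in zip(q, k))
        for j, k in enumerate(keys)
        if abs(q_idx - j) <= half_window  # bidirectional: looks both left and right
    }
    m = max(max(logits.values()), sink_logit)  # subtract the max for numerical stability
    weights = {j: math.exp(s - m) for j, s in logits.items()}
    denom = sum(weights.values()) + math.exp(sink_logit - m)
    dim = len(values[0])
    return [sum(w * values[j][t] for j, w in weights.items()) / denom for t in range(dim)]
```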

The conversion reproduces the HF reference **exactly at F16**, i.e. up to F16 rounding:
token-for-token argmax match (12/12 on the parity prompt set), **full-logit cosine = 1.0**, and
every layer's residual-stream cosine = 1.0 (relerr ≈ 2e-4). The two load-bearing conversion
choices, the expert `gate_up` `chunk(2)` split and the `n_swa = 2·sliding_window` window mapping,
are both confirmed by that parity. See LocalAI's `backend/cpp/llama-cpp/patches/README.md` for the
full provenance.
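
The parity numbers above come down to a few simple comparisons. A minimal Python sketch, assuming relerr means the relative L2 error ‖test − ref‖ / ‖ref‖ (the card does not say which norm is used); F16 storage is simulated with the `struct` module's half-precision `'e'` format, which shows why errors of a few 1e-4 are expected from F16 alone (its unit roundoff is 2^-11 ≈ 4.9e-4). Function names are hypothetical.

```python
import math
import struct


def to_f16(xs):
    """Round each value to IEEE half precision and back (simulates F16 storage)."""
    return [struct.unpack("<e", struct.pack("<e", x))[0] for x in xs]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rel_err(ref, test):
    """Relative L2 error ||test - ref|| / ||ref||."""
    diff = math.sqrt(sum((t - r) ** 2 for r, t in zip(ref, test)))
    return diff / math.sqrt(sum(r * r for r in ref))


def argmax_agreement(ref_logits, test_logits):
    """Fraction of tokens whose argmax label is the same in both models."""
    def argmax(row):
        return max(range(len(row)), key=row.__getitem__)
    same = sum(argmax(r) == argmax(t) for r, t in zip(ref_logits, test_logits))
    return same / len(ref_logits)


if __name__ == "__main__":
    ref = [math.sin(i) for i in range(1, 100)]  # stand-in for an F32 activation vector
    f16 = to_f16(ref)
    print(f"cosine={cosine(ref, f16):.6f} relerr={rel_err(ref, f16):.1e}")
```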

## Label space

`O` plus `B-`/`I-`/`E-`/`S-` for each of 54 categories (1 + 54×4 = 217), spanning identity,
contact, address, dates/time, government IDs, financial, crypto, vehicle, digital, and auth
entities. The ordered `id2label` table is embedded in the GGUF (`classifier.output_labels`).
See the [source card](https://huggingface.co/OpenMed/privacy-filter-multilingual#label-space-54-categories)
for the full list.
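
LocalAI runs a constrained BIOES Viterbi decode over this label space (see *Use with LocalAI*), but the decoder itself is not shown in this card. The following is a minimal Python sketch of the standard technique, assuming per-token log-scores (for example, log-softmax of the head's logits) and hard constraints: inside a `B-x`/`I-x` entity only `I-x` or `E-x` may follow, and a sequence may not start on `I-`/`E-` or end on `B-`/`I-`. The `build_labels` order and the category names are illustrative (read the real ordered table from `classifier.output_labels`), and `to_spans` assumes you already have each token's UTF-8 byte offsets.

```python
import math


def build_labels(categories):
    """'O' plus B-/I-/E-/S- per category (illustrative order; the real order is in the GGUF)."""
    return ["O"] + [f"{prefix}-{cat}" for cat in categories for prefix in "BIES"]


def allowed(prev, cur):
    """BIOES transition constraint between two label strings."""
    if prev[:2] in ("B-", "I-"):
        # Inside an entity: must continue (I-) or close (E-) the same category.
        return cur in (f"I-{prev[2:]}", f"E-{prev[2:]}")
    # After O, E-x or S-x: stay outside or open a new entity (B- or single-token S-).
    return cur == "O" or cur[:2] in ("B-", "S-")


def viterbi_bioes(emissions, labels):
    """Highest-scoring label sequence that obeys BIOES constraints.

    emissions: one list of per-label log-scores per token.
    """
    n, neg = len(labels), -math.inf
    start_ok = [lab == "O" or lab[:2] in ("B-", "S-") for lab in labels]
    end_ok = [lab == "O" or lab[:2] in ("E-", "S-") for lab in labels]
    score = [emissions[0][j] if start_ok[j] else neg for j in range(n)]
    back = []
    for t in range(1, len(emissions)):
        new, ptr = [neg] * n, [0] * n
        for j in range(n):
            for i in range(n):
                if score[i] > neg and allowed(labels[i], labels[j]):
                    s = score[i] + emissions[t][j]
                    if s > new[j]:
                        new[j], ptr[j] = s, i
        score = new
        back.append(ptr)
    best = max((j for j in range(n) if end_ok[j]), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return [labels[j] for j in reversed(path)]


def to_spans(tags, offsets):
    """Convert decoded tags plus per-token (byte_start, byte_end) offsets into entity spans."""
    spans, start = [], None
    for tag, (b0, b1) in zip(tags, offsets):
        if tag.startswith("S-"):
            spans.append((tag[2:], b0, b1))
        elif tag.startswith("B-"):
            start = b0
        elif tag.startswith("E-"):
            spans.append((tag[2:], start, b1))
    return spans
```

The double loop costs O(T·L²) for T tokens and L = 217 labels, which is fine for a sketch; a production decoder would precompute the allowed-transition matrix once.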

## Limitations & intended use

Identical to the [source model](https://huggingface.co/OpenMed/privacy-filter-multilingual#limitations--intended-use):
multilingual but uneven (strongest on de/es/fr/it/hi/te/en; weaker on CJK), trained on
synthetic AI4Privacy data, **not** a substitute for legal/compliance review, and **not** a
clinical PHI model. Use it as one tier behind deterministic regex pre-filters and human review.

## License

**Apache-2.0**, inherited from `openai/privacy-filter` and `OpenMed/privacy-filter-multilingual`.

## Credits & citation

Conversion and runtime support by the **LocalAI** project. The model itself is by **OpenMed**,
fine-tuned from **OpenAI**'s `privacy-filter` on **AI4Privacy** datasets; please cite all of
them (BibTeX in the [source card](https://huggingface.co/OpenMed/privacy-filter-multilingual#citation)).