ogmatrixllm committed on
Commit 105c9ff · verified · 1 Parent(s): 8595792

Update README.md

Files changed (1)
  1. README.md +332 -332
README.md CHANGED
@@ -1,332 +1,332 @@
1
- ---
2
- license: mit
3
- language:
4
- - en
5
- tags:
6
- - text-classification
7
- - ai-text-detection
8
- - deberta-v3
9
- - binary-classification
10
- - nlp
11
- datasets:
12
- - liamdugan/raid
13
- - artem9k/ai-text-detection-pile
14
- - gsingh1-py/train
15
- - cc_news
16
- - blog_authorship_corpus
17
- - webis/tldr-17
18
- - ChristophSchuhmann/essays-with-instructions
19
- - HuggingFaceH4/stack-exchange-preferences
20
- - pile-of-law/pile-of-law
21
- metrics:
22
- - accuracy
23
- - f1
24
- - precision
25
- - recall
26
- - roc_auc
27
- pipeline_tag: text-classification
28
- model-index:
29
- - name: GLYPH
30
-   results:
31
-   - task:
32
-       type: text-classification
33
-       name: AI-Generated Text Detection
34
-     metrics:
35
-     - name: Accuracy
36
-       type: accuracy
37
-       value: 0.9885
38
-     - name: F1
39
-       type: f1
40
-       value: 0.9901
41
-     - name: Precision
42
-       type: precision
43
-       value: 0.9851
44
-     - name: Recall
45
-       type: recall
46
-       value: 0.9952
47
-     - name: ROC-AUC
48
-       type: roc_auc
49
-       value: 0.9990
50
-     - name: MCC
51
-       type: mcc
52
-       value: 0.9765
53
- ---
54
-
55
- # GLYPH — High-Accuracy AI Text Detector
56
-
57
- GLYPH is a binary text classifier built on [DeBERTa-v3-base](https://huggingface.co/microsoft/deberta-v3-base) that distinguishes human-written text from AI-generated text. It achieves **98.85% accuracy**, **0.999 ROC-AUC**, and **0.990 F1** on a held-out test set spanning 10 human writing domains and 14 AI model families — from GPT-2 (1.5B) through GPT-4 (~1T).
58
-
59
- The model was trained on ~50K texts covering academic papers, news articles, blog posts, Reddit discussions, legal filings, Wikipedia, student essays, and technical Q&A on the human side, and outputs from 24 distinct AI model configurations across 10 model families on the AI side. It produces well-separated, high-confidence predictions (mean confidence 0.976) and remains accurate even at the strictest decision thresholds.
60
-
61
- ## Key Results
62
-
63
- | Metric | Value |
64
- |---|---|
65
- | **Accuracy** | 98.85% |
66
- | **F1 Score** | 0.9901 |
67
- | **Precision** | 98.51% |
68
- | **Recall** | 99.52% |
69
- | **ROC-AUC** | 0.9990 |
70
- | **Average Precision** | 0.9993 |
71
- | **MCC** | 0.9765 |
72
- | **Human Accuracy** | 97.94% |
73
- | **AI Accuracy** | 99.52% |
74
- | **Mean Confidence** | 0.976 |
75
- | **F1 @ 0.95 threshold** | 0.987 |
76
-
77
- All metrics evaluated on a held-out test set of 5,050 texts (2,136 human / 2,914 AI) with no overlap in source texts, split hashes, or temporal leakage with the training set.
78
-
79
- ## Per-Source Performance
80
-
81
- ### Human Text Sources
82
-
83
- | Source | Domain | n | Accuracy | Confidence |
84
- |---|---|---|---|---|
85
- | PubMed Abstracts | Biomedical research | 300 | **100.0%** | 0.988 |
86
- | Blog / Opinion | Personal blogs | 200 | **100.0%** | 0.987 |
87
- | Reddit Writing | Informal / social | 300 | **100.0%** | 0.985 |
88
- | Wikipedia | Encyclopedic | 500 | **99.8%** | 0.987 |
89
- | CC-News | Journalism | 392 | **99.5%** | 0.981 |
90
- | arXiv Abstracts | Academic / scientific | 444 | **90.8%** | 0.948 |
91
-
92
- arXiv abstracts are the hardest category — highly formulaic academic prose with structural similarity to AI output. Even so, detection accuracy is 90.8% with 94.8% mean confidence, and the remaining errors are concentrated in a small subset of unusually short or template-heavy abstracts.
93
-
94
- ### AI Model Families
95
-
96
- | Model | Family | Params | n | Accuracy | F1 |
97
- |---|---|---|---|---|---|
98
- | GPT-3.5-Turbo | OpenAI | 175B | 223 | **100.0%** | 1.000 |
99
- | GPT-4 | OpenAI | ~1T | 215 | **100.0%** | 1.000 |
100
- | Llama-2-70B-Chat | Meta | 70B | 191 | **100.0%** | 1.000 |
101
- | MPT-30B | MosaicML | 30B | 211 | **100.0%** | 1.000 |
102
- | MPT-30B-Chat | MosaicML | 30B | 191 | **100.0%** | 1.000 |
103
- | Mistral-7B-Instruct-v0.1 | Mistral AI | 7B | 194 | **100.0%** | 1.000 |
104
- | Mistral-7B-v0.1 | Mistral AI | 7B | 203 | **100.0%** | 1.000 |
105
- | Llama-3.1-8B-Instruct | Meta | 8B | 238 | **99.6%** | 0.998 |
106
- | Phi-3.5-Mini-Instruct | Microsoft | 3.8B | 238 | **99.6%** | 0.998 |
107
- | Command-Chat | Cohere | 52B | 198 | **99.5%** | 0.997 |
108
- | Text-Davinci-002 | OpenAI | 175B | 176 | **99.4%** | 0.997 |
109
- | Llama-3.2-3B-Instruct | Meta | 3B | 238 | **99.2%** | 0.996 |
110
- | GPT-2-XL | OpenAI | 1.5B | 198 | **98.5%** | 0.992 |
111
- | Cohere Command | Cohere | 52B | 200 | **97.5%** | 0.987 |
112
-
113
- Detection is robust across four generations of language models (GPT-2 through GPT-4), three access paradigms (open-weight, API-only, and proprietary), and parameter counts spanning three orders of magnitude (1.5B to ~1T).
114
-
115
- ### Performance by Text Length
116
-
117
- | Length Bucket | n | Accuracy | F1 |
118
- |---|---|---|---|
119
- | Very Long (>2000 words) | 103 | **100.0%** | 1.000 |
120
- | Long (500–2000 words) | 862 | **99.9%** | 0.999 |
121
- | Short (50–150 words) | 1,976 | **98.5%** | 0.989 |
122
- | Medium (150–500 words) | 1,634 | **98.8%** | 0.989 |
123
- | Very Short (<50 words) | 475 | **98.1%** | 0.899 |
124
-
125
- Performance degrades gracefully with shorter inputs. Even on texts under 50 words — where the model has minimal signal — accuracy remains above 98%.
126
-
127
- ### Threshold Sensitivity
128
-
129
- The model produces well-calibrated, high-confidence outputs. Performance holds across aggressive decision thresholds:
130
-
131
- | P(AI) Threshold | F1 | Precision |
132
- |---|---|---|
133
- | 0.50 (default) | 0.990 | 0.985 |
134
- | 0.60 | 0.991 | 0.987 |
135
- | 0.70 | 0.992 | 0.990 |
136
- | 0.80 | 0.992 | 0.992 |
137
- | 0.90 | 0.991 | 0.993 |
138
- | 0.95 | 0.987 | 0.996 |
139
-
140
- At a 0.95 threshold, precision reaches 99.6% with only a 0.3% drop in F1 — suitable for high-stakes applications where false accusations of AI usage carry serious consequences.
141
-
142
- ## Architecture
143
-
144
- | Component | Details |
145
- |---|---|
146
- | Base model | `microsoft/deberta-v3-base` (184M parameters) |
147
- | Architecture | DeBERTa-v3 with disentangled attention and enhanced mask decoder |
148
- | Task head | Linear classifier (768 → 2) with 0.15 dropout |
149
- | Tokenizer | SentencePiece (slow tokenizer, `use_fast=False`) |
150
- | Max sequence length | 512 tokens |
151
- | Output | `[P(human), P(AI)]` softmax probabilities |
152
-
153
- DeBERTa-v3 was chosen over RoBERTa and BERT alternatives due to its disentangled attention mechanism, which separately encodes content and position. This is particularly relevant for AI text detection: language models have characteristic positional dependencies in how they distribute tokens across a sequence, and disentangled attention gives the classifier direct access to these patterns.
154
-
155
- ## Training
156
-
157
- ### Configuration
158
-
159
- | Parameter | Value |
160
- |---|---|
161
- | Trainable parameters | 184,423,682 (100% — all layers unfrozen) |
162
- | Optimizer | AdamW (weight decay 0.01) |
163
- | Learning rate | 2e-5 (cosine schedule) |
164
- | Warmup | 10% of total steps |
165
- | Effective batch size | 64 (16 × 4 gradient accumulation) |
166
- | Precision | bf16 mixed precision |
167
- | Gradient checkpointing | Enabled (non-reentrant) |
168
- | Label smoothing | 0.05 |
169
- | Class weights | human=1.182, ai=0.867 |
170
- | Epochs | 8 (early-stopped at 3.17) |
171
- | Best checkpoint | Epoch 1.19 (by validation F1) |
172
- | Training time | ~49 minutes on RTX 4070 Ti 12GB |
173
- | Final train loss | 0.186 |
174
- | Final eval loss | 0.150 |
175
-
176
- ### Why Fully Unfrozen?
177
-
178
- Initial experiments with 4 frozen encoder layers (standard practice from PAN-CLEF 2025 literature) yielded only 80% accuracy with severe human-side bias — the model classified 44% of human texts as AI. Freezing 4 of 12 layers in DeBERTa-base locks 33% of the network, far more aggressive than the 21% reported for DeBERTa-large. Unfreezing all layers with cosine LR decay and 10% warmup resolved the bias entirely, lifting human accuracy from 55.6% to 97.9% without sacrificing AI detection (97.4% → 99.5%).
179
-
180
- ### Dataset Composition
181
-
182
- **Total: 50,458 texts** (40,364 train / 5,044 validation / 5,050 test)
183
-
184
- Stratified by source with hash-based deduplication to prevent data leakage.
185
-
186
- #### Human Sources (10 domains, ~29K target)
187
-
188
- | Domain | Source | Target Count | Text Type |
189
- |---|---|---|---|
190
- | Academic (STEM) | arXiv API | 5,000 | Abstracts across 8 categories (cs.CL, cs.AI, cs.LG, physics, math, q-bio, econ, stat) |
191
- | Academic (Medical) | PubMed API | 3,000 | Biomedical research abstracts |
192
- | Encyclopedic | Wikipedia API | 5,000 | Article sections across 10 topic categories |
193
- | Journalism | CC-News (HuggingFace) | 4,000 | News articles |
194
- | Literary / Creative | Project Gutenberg | 2,000 | Public domain book excerpts |
195
- | Informal / Social | Reddit (webis/tldr-17) | 3,000 | Writing-focused subreddit posts |
196
- | Student / Educational | PERSUADE corpus | 2,000 | Student essays |
197
- | Technical / Q&A | StackExchange | 2,000 | Technical answers |
198
- | Blog / Opinion | Blog Authorship Corpus | 2,000 | Personal blog posts |
199
- | Legal / Formal | Pile of Law | 1,000 | Legal opinions and case summaries |
200
-
201
- #### AI Sources (24 model configurations across 10 families)
202
-
203
- **Locally generated via LM Studio (8 models, Q4_K_M quantization):**
204
-
205
- | Model | Family | Parameters |
206
- |---|---|---|
207
- | Llama-3.1-8B-Instruct | Meta Llama | 8B |
208
- | Llama-3.2-3B-Instruct | Meta Llama | 3B |
209
- | Mistral-7B-Instruct-v0.3 | Mistral AI | 7B |
210
- | Qwen2.5-7B-Instruct | Alibaba Qwen | 7B |
211
- | Qwen2.5-14B-Instruct | Alibaba Qwen | 14B |
212
- | Gemma-2-9B-Instruct | Google | 9B |
213
- | Phi-3.5-Mini-Instruct | Microsoft | 3.8B |
214
- | DeepSeek-V2-Lite-Chat | DeepSeek | 16B (MoE) |
215
-
216
- Local generation used 4 temperature/sampling configurations (default, creative, precise, varied) across 6 prompt strategies (direct, continue, rewrite, expand, style_mimic, question_answer) with a system prompt enforcing natural human-like output — no markdown, no meta-commentary, no self-referential AI language.
217
-
218
- **HuggingFace datasets (16 additional model families):**
219
-
220
- | Dataset | Models Added | Reference |
221
- |---|---|---|
222
- | RAID (ACL 2024) | ChatGPT-3.5, GPT-4, GPT-3-Davinci, Cohere Command, Llama-2-70B-Chat, Mistral-7B-v0.1, Mixtral-8x7B, MPT-30B, GPT-2-XL | [liamdugan/raid](https://huggingface.co/datasets/liamdugan/raid) |
223
- | AI Text Detection Pile | GPT-2/3/J/ChatGPT (mixed) | [artem9k/ai-text-detection-pile](https://huggingface.co/datasets/artem9k/ai-text-detection-pile) |
224
- | NYT Multi-Model | GPT-4o, Yi-Large, Qwen-2-72B, Llama-3-8B, Gemma-2-9B, Mistral-7B | [gsingh1-py/train](https://huggingface.co/datasets/gsingh1-py/train) |
225
-
226
- This combination ensures coverage of proprietary API models (GPT-3.5, GPT-4, GPT-4o, Cohere), large open models exceeding consumer GPU VRAM (Llama-2-70B, Qwen-2-72B, Mixtral-8x7B, Yi-Large), older architectures (GPT-2, GPT-3, GPT-J), and mixture-of-experts models (Mixtral, DeepSeek-V2-Lite). RAID data was filtered to non-adversarial generations only (`attack=="none"`) for training data quality.
227
-
228
- ## Usage
229
-
230
- ### With Transformers
231
-
232
- ```python
233
- from transformers import AutoTokenizer, AutoModelForSequenceClassification
234
- import torch
235
-
236
- model_name = "ogmatrixai/glyph" # Replace with your repo path
237
- tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
238
- model = AutoModelForSequenceClassification.from_pretrained(model_name)
239
- model.eval()
240
-
241
- text = "Your text to classify here..."
242
-
243
- inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
244
- with torch.no_grad():
245
-     logits = model(**inputs).logits
246
- probs = torch.softmax(logits, dim=-1)
247
-
248
- p_human, p_ai = probs[0].tolist()
249
- label = "AI-generated" if p_ai > 0.5 else "Human-written"
250
- confidence = max(p_human, p_ai)
251
-
252
- print(f"{label} (confidence: {confidence:.1%})")
253
- ```
254
-
255
- ### With Pipeline
256
-
257
- ```python
258
- from transformers import pipeline
259
-
260
- detector = pipeline(
261
-     "text-classification",
262
-     model="ogmatrixai/glyph", # Replace with your repo path
263
-     tokenizer=AutoTokenizer.from_pretrained("ogmatrixai/glyph", use_fast=False),
264
- )
265
-
266
- result = detector("Your text here...")
267
- print(result)
268
- # [{'label': 'LABEL_1', 'score': 0.98}] # LABEL_0 = human, LABEL_1 = AI
269
- ```
270
-
271
- ### Important Notes
272
-
273
- - **Tokenizer**: Always use `use_fast=False`. The fast tokenizer for DeBERTa-v3 has a confirmed regression in `transformers>=4.47` ([#42583](https://github.com/huggingface/transformers/issues/42583)) that crashes on load.
274
- - **Max length**: The model was trained with `max_length=512`. Longer texts should be truncated or chunked with predictions aggregated.
275
- - **Labels**: `LABEL_0` = human, `LABEL_1` = AI-generated.
276
-
277
- ## Limitations and Ethical Considerations
278
-
279
- ### Known Limitations
280
-
281
- 1. **English only.** GLYPH was trained exclusively on English text. Performance on other languages is untested and likely degraded.
282
-
283
- 2. **Training distribution.** The model has seen outputs from 24 specific AI model configurations. Novel architectures, heavily fine-tuned models, or future model families may evade detection. AI text detection is fundamentally adversarial — no static detector provides permanent robustness.
284
-
285
- 3. **arXiv abstracts remain the hardest domain** at 90.8% accuracy. Highly formulaic academic writing with rigid structural conventions shares surface features with AI-generated text. Users in academic integrity contexts should treat borderline predictions on scientific abstracts with appropriate caution.
286
-
287
- 4. **Short texts (<50 words)** have reduced F1 (0.899) despite high accuracy (98.1%). With minimal token-level signal, the model occasionally produces confident but incorrect predictions. For short-form content, consider requiring higher confidence thresholds.
288
-
289
- 5. **Adversarial attacks.** The training data includes only non-adversarial AI outputs. Paraphrasing attacks, homoglyph substitution, targeted prompt engineering, and watermark-removal techniques were not included. Dedicated adversarial robustness (e.g., RAID adversarial subsets) is a planned enhancement.
290
-
291
- 6. **Mixed authorship.** GLYPH classifies at the document level. It does not detect partial AI usage (e.g., AI-written paragraphs embedded in a human-written essay). Sentence-level or span-level detection requires a different approach.
292
-
293
- 7. **512-token window.** Texts are truncated at 512 tokens. For long documents, this means classification is based on the opening ~350–400 words only. Sliding-window aggregation is recommended for long-form content.
294
-
295
- ### Ethical Considerations
296
-
297
- AI text detection carries real consequences — academic penalties, professional reputation damage, content moderation decisions. False positives (human text classified as AI) are particularly harmful. While GLYPH's false positive rate is low (2.06% on the test set, 44 out of 2,136 human texts), no detector achieves zero false positives.
298
-
299
- **Recommendations for responsible deployment:**
300
-
301
- - Never use GLYPH as the sole basis for punitive action. Use it as one signal among many (metadata, behavioral patterns, stylometric analysis).
302
- - Apply a high confidence threshold (≥0.95) for consequential decisions. At this threshold, precision reaches 99.6%.
303
- - Provide users with the confidence score, not just a binary label. A text scored at P(AI)=0.52 is fundamentally different from one scored at P(AI)=0.99.
304
- - Maintain an appeals process. Statistical classifiers will always produce errors.
305
- - Acknowledge the base rate problem. In populations where AI usage is rare, even a 2% FPR produces many false accusations relative to true detections.
306
-
307
- ## Training Infrastructure
308
-
309
- | Component | Specification |
310
- |---|---|
311
- | GPU | NVIDIA GeForce RTX 4070 Ti (12GB VRAM) |
312
- | CPU | Intel Core i7-14700K (20 cores) |
313
- | RAM | 48GB DDR5 |
314
- | Framework | PyTorch 2.6+ / HuggingFace Transformers |
315
- | Precision | bf16 mixed precision |
316
- | Total training time | 49 minutes |
317
- | Experiment tracking | Weights & Biases |
318
-
319
- ## Citation
320
-
321
- ```bibtex
322
- @misc{glyph2026,
323
- title={GLYPH: High-Accuracy AI Text Detection with DeBERTa-v3},
324
- author={OGMatrixAI},
325
- year={2026},
326
- url={https://huggingface.co/ogmatrixai/glyph}
327
- }
328
- ```
329
-
330
- ## Acknowledgments
331
-
332
- Training data incorporates the [RAID benchmark](https://huggingface.co/datasets/liamdugan/raid) (Dugan et al., ACL 2024), the [AI Text Detection Pile](https://huggingface.co/datasets/artem9k/ai-text-detection-pile), and the [NYT Multi-Model dataset](https://huggingface.co/datasets/gsingh1-py/train). Human text sources include arXiv, PubMed, Wikipedia, CC-News, Project Gutenberg, Reddit, StackExchange, Blog Authorship Corpus, PERSUADE, and Pile of Law. The base model is [DeBERTa-v3-base](https://huggingface.co/microsoft/deberta-v3-base) by Microsoft Research.
 
1
+ ---
2
+ license: mit
3
+ language:
4
+ - en
5
+ tags:
6
+ - text-classification
7
+ - ai-text-detection
8
+ - deberta-v3
9
+ - binary-classification
10
+ - nlp
11
+ datasets:
12
+ - liamdugan/raid
13
+ - artem9k/ai-text-detection-pile
14
+ - gsingh1-py/train
15
+ - cc_news
16
+ - blog_authorship_corpus
17
+ - webis/tldr-17
18
+ - ChristophSchuhmann/essays-with-instructions
19
+ - HuggingFaceH4/stack-exchange-preferences
20
+ - pile-of-law/pile-of-law
21
+ metrics:
22
+ - accuracy
23
+ - f1
24
+ - precision
25
+ - recall
26
+ - roc_auc
27
+ pipeline_tag: text-classification
28
+ model-index:
29
+ - name: GLYPH
30
+   results:
31
+   - task:
32
+       type: text-classification
33
+       name: AI-Generated Text Detection
34
+     metrics:
35
+     - name: Accuracy
36
+       type: accuracy
37
+       value: 0.9885
38
+     - name: F1
39
+       type: f1
40
+       value: 0.9901
41
+     - name: Precision
42
+       type: precision
43
+       value: 0.9851
44
+     - name: Recall
45
+       type: recall
46
+       value: 0.9952
47
+     - name: ROC-AUC
48
+       type: roc_auc
49
+       value: 0.9990
50
+     - name: MCC
51
+       type: mcc
52
+       value: 0.9765
53
+ ---
54
+
55
+ # GLYPH — High-Accuracy AI Text Detector
56
+
57
+ GLYPH is a binary text classifier built on [DeBERTa-v3-base](https://huggingface.co/microsoft/deberta-v3-base) that distinguishes human-written text from AI-generated text. It achieves **98.85% accuracy**, **0.999 ROC-AUC**, and **0.990 F1** on a held-out test set spanning 6 human text sources and 14 AI models, from GPT-2-XL (1.5B) through GPT-4 (~1T).
58
+
59
+ The model was trained on ~50K texts. The human side covers academic papers, news articles, blog posts, Reddit discussions, legal filings, Wikipedia, student essays, and technical Q&A; the AI side covers outputs from 24 distinct AI model configurations across 10 model families. It produces well-separated, high-confidence predictions (mean confidence 0.976) and keeps an F1 of 0.987 at a 0.95 decision threshold.
60
+
61
+ ## Key Results
62
+
63
+ | Metric | Value |
64
+ |---|---|
65
+ | **Accuracy** | 98.85% |
66
+ | **F1 Score** | 0.9901 |
67
+ | **Precision** | 98.51% |
68
+ | **Recall** | 99.52% |
69
+ | **ROC-AUC** | 0.9990 |
70
+ | **Average Precision** | 0.9993 |
71
+ | **MCC** | 0.9765 |
72
+ | **Human Accuracy** | 97.94% |
73
+ | **AI Accuracy** | 99.52% |
74
+ | **Mean Confidence** | 0.976 |
75
+ | **F1 @ 0.95 threshold** | 0.987 |
76
+
77
+ All metrics were evaluated on a held-out test set of 5,050 texts (2,136 human / 2,914 AI) that shares no source texts or split hashes with the training set and has no temporal leakage from it.
78
+
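The headline metrics can be cross-checked from confusion-matrix counts. The sketch below assumes TP=2,900, FN=14, TN=2,092, and FP=44. Only the 44 false positives and the two class totals are stated in this README; the remaining counts are inferred from the reported recall, so treat them as a reconstruction rather than published figures.

```python
import math

# Assumed counts (inferred, not published): 2,914 AI texts at 99.52% recall,
# 2,136 human texts with 44 false positives.
TP, FN = 2900, 14
TN, FP = 2092, 44


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute standard binary-classification metrics, with AI as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Matthews correlation coefficient
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


metrics = binary_metrics(TP, FN, TN, FP)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these assumed counts, the computed values agree with the table to four decimal places, which suggests the reported figures are mutually consistent.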
79
+ ## Per-Source Performance
80
+
81
+ ### Human Text Sources
82
+
83
+ | Source | Domain | n | Accuracy | Confidence |
84
+ |---|---|---|---|---|
85
+ | PubMed Abstracts | Biomedical research | 300 | **100.0%** | 0.988 |
86
+ | Blog / Opinion | Personal blogs | 200 | **100.0%** | 0.987 |
87
+ | Reddit Writing | Informal / social | 300 | **100.0%** | 0.985 |
88
+ | Wikipedia | Encyclopedic | 500 | **99.8%** | 0.987 |
89
+ | CC-News | Journalism | 392 | **99.5%** | 0.981 |
90
+ | arXiv Abstracts | Academic / scientific | 444 | **90.8%** | 0.948 |
91
+
92
+ arXiv abstracts are the hardest category — highly formulaic academic prose with structural similarity to AI output. Even so, detection accuracy is 90.8% with 94.8% mean confidence, and the remaining errors are concentrated in a small subset of unusually short or template-heavy abstracts.
93
+
94
+ ### AI Model Families
95
+
96
+ | Model | Family | Params | n | Accuracy | F1 |
97
+ |---|---|---|---|---|---|
98
+ | GPT-3.5-Turbo | OpenAI | 175B | 223 | **100.0%** | 1.000 |
99
+ | GPT-4 | OpenAI | ~1T | 215 | **100.0%** | 1.000 |
100
+ | Llama-2-70B-Chat | Meta | 70B | 191 | **100.0%** | 1.000 |
101
+ | MPT-30B | MosaicML | 30B | 211 | **100.0%** | 1.000 |
102
+ | MPT-30B-Chat | MosaicML | 30B | 191 | **100.0%** | 1.000 |
103
+ | Mistral-7B-Instruct-v0.1 | Mistral AI | 7B | 194 | **100.0%** | 1.000 |
104
+ | Mistral-7B-v0.1 | Mistral AI | 7B | 203 | **100.0%** | 1.000 |
105
+ | Llama-3.1-8B-Instruct | Meta | 8B | 238 | **99.6%** | 0.998 |
106
+ | Phi-3.5-Mini-Instruct | Microsoft | 3.8B | 238 | **99.6%** | 0.998 |
107
+ | Command-Chat | Cohere | 52B | 198 | **99.5%** | 0.997 |
108
+ | Text-Davinci-002 | OpenAI | 175B | 176 | **99.4%** | 0.997 |
109
+ | Llama-3.2-3B-Instruct | Meta | 3B | 238 | **99.2%** | 0.996 |
110
+ | GPT-2-XL | OpenAI | 1.5B | 198 | **98.5%** | 0.992 |
111
+ | Cohere Command | Cohere | 52B | 200 | **97.5%** | 0.987 |
112
+
113
+ Detection is robust across four generations of language models (GPT-2 through GPT-4), three access paradigms (open-weight, API-only, and proprietary), and parameter counts spanning three orders of magnitude (1.5B to ~1T).
114
+
115
+ ### Performance by Text Length
116
+
117
+ | Length Bucket | n | Accuracy | F1 |
118
+ |---|---|---|---|
119
+ | Very Long (>2000 words) | 103 | **100.0%** | 1.000 |
120
+ | Long (500–2000 words) | 862 | **99.9%** | 0.999 |
121
+ | Medium (150–500 words) | 1,634 | **98.8%** | 0.989 |
122
+ | Short (50–150 words) | 1,976 | **98.5%** | 0.989 |
123
+ | Very Short (<50 words) | 475 | **98.1%** | 0.899 |
124
+
125
+ Performance degrades gradually with shorter inputs. On texts under 50 words, where the model has minimal signal, accuracy remains above 98%, but F1 drops to 0.899.
126
+
127
+ ### Threshold Sensitivity
128
+
129
+ The model produces high-confidence outputs, and F1 stays between 0.987 and 0.992 across decision thresholds from 0.50 to 0.95:
130
+
131
+ | P(AI) Threshold | F1 | Precision |
132
+ |---|---|---|
133
+ | 0.50 (default) | 0.990 | 0.985 |
134
+ | 0.60 | 0.991 | 0.987 |
135
+ | 0.70 | 0.992 | 0.990 |
136
+ | 0.80 | 0.992 | 0.992 |
137
+ | 0.90 | 0.991 | 0.993 |
138
+ | 0.95 | 0.987 | 0.996 |
139
+
140
+ At a 0.95 threshold, precision reaches 99.6% while F1 drops by only 0.003 (0.990 to 0.987), which suits high-stakes applications where false accusations of AI usage carry serious consequences.
141
+
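A stricter threshold can be applied on top of the softmax output without touching the model. The helper below is a minimal sketch; the name `flag_text`, the returned dictionary layout, and the use of `>=` for the comparison are illustrative assumptions, not part of the released code.

```python
def flag_text(p_ai: float, threshold: float = 0.95) -> dict:
    """Apply a decision threshold to P(AI) and keep the raw score with the flag."""
    if not 0.0 <= p_ai <= 1.0:
        raise ValueError("p_ai must be a probability in [0, 1]")
    return {
        "flagged_as_ai": p_ai >= threshold,
        "p_ai": p_ai,
        "threshold": threshold,
    }


# A borderline score is not flagged at 0.95, a confident one is.
print(flag_text(0.52))
print(flag_text(0.99))
```

Returning the score alongside the flag keeps borderline cases visible to whoever reviews the decision.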
142
+ ## Architecture
143
+
144
+ | Component | Details |
145
+ |---|---|
146
+ | Base model | `microsoft/deberta-v3-base` (184M parameters) |
147
+ | Architecture | DeBERTa-v3 with disentangled attention and enhanced mask decoder |
148
+ | Task head | Linear classifier (768 → 2) with 0.15 dropout |
149
+ | Tokenizer | SentencePiece (slow tokenizer, `use_fast=False`) |
150
+ | Max sequence length | 512 tokens |
151
+ | Output | `[P(human), P(AI)]` softmax probabilities |
152
+
153
+ DeBERTa-v3 was chosen over RoBERTa and BERT alternatives due to its disentangled attention mechanism, which separately encodes content and position. This is particularly relevant for AI text detection: language models have characteristic positional dependencies in how they distribute tokens across a sequence, and disentangled attention gives the classifier direct access to these patterns.
154
+
155
+ ## Training
156
+
157
+ ### Configuration
158
+
159
+ | Parameter | Value |
160
+ |---|---|
161
+ | Trainable parameters | 184,423,682 (100% — all layers unfrozen) |
162
+ | Optimizer | AdamW (weight decay 0.01) |
163
+ | Learning rate | 2e-5 (cosine schedule) |
164
+ | Warmup | 10% of total steps |
165
+ | Effective batch size | 64 (16 × 4 gradient accumulation) |
166
+ | Precision | bf16 mixed precision |
167
+ | Gradient checkpointing | Enabled (non-reentrant) |
168
+ | Label smoothing | 0.05 |
169
+ | Class weights | human=1.182, ai=0.867 |
170
+ | Epochs | 8 (early-stopped at 3.17) |
171
+ | Best checkpoint | Epoch 1.19 (by validation F1) |
172
+ | Training time | ~49 minutes on RTX 4070 Ti 12GB |
173
+ | Final train loss | 0.186 |
174
+ | Final eval loss | 0.150 |
175
+
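The README does not say how the class weights were derived. One plausible reading, offered here as an assumption, is the common "balanced" inverse-frequency formula `n_total / (n_classes * n_class)`. Applied to the test-split class counts (2,136 human / 2,914 AI), which should mirror the training ratio if the split is stratified, it reproduces the listed values:

```python
def balanced_class_weights(counts: dict) -> dict:
    """Inverse-frequency weights: n_total / (n_classes * n_class)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}


# Test-split counts used as a stand-in for the training ratio (assumption).
weights = balanced_class_weights({"human": 2136, "ai": 2914})
print({label: round(w, 3) for label, w in weights.items()})
# {'human': 1.182, 'ai': 0.867}
```

The minority class (human) gets the larger weight, so each human example contributes more to the loss.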
176
+ ### Why Fully Unfrozen?
177
+
178
+ Initial experiments with 4 frozen encoder layers (standard practice from PAN-CLEF 2025 literature) yielded only 80% accuracy with severe human-side bias — the model classified 44% of human texts as AI. Freezing 4 of 12 layers in DeBERTa-base locks 33% of the network, far more aggressive than the 21% reported for DeBERTa-large. Unfreezing all layers with cosine LR decay and 10% warmup resolved the bias entirely, lifting human accuracy from 55.6% to 97.9% without sacrificing AI detection (97.4% → 99.5%).
179
+
180
+ ### Dataset Composition
181
+
182
+ **Total: 50,458 texts** (40,364 train / 5,044 validation / 5,050 test)
183
+
184
+ Stratified by source with hash-based deduplication to prevent data leakage.
185
+
186
+ #### Human Sources (10 domains, ~29K target)
187
+
188
+ | Domain | Source | Target Count | Text Type |
189
+ |---|---|---|---|
190
+ | Academic (STEM) | arXiv API | 5,000 | Abstracts across 8 categories (cs.CL, cs.AI, cs.LG, physics, math, q-bio, econ, stat) |
191
+ | Academic (Medical) | PubMed API | 3,000 | Biomedical research abstracts |
192
+ | Encyclopedic | Wikipedia API | 5,000 | Article sections across 10 topic categories |
193
+ | Journalism | CC-News (HuggingFace) | 4,000 | News articles |
194
+ | Literary / Creative | Project Gutenberg | 2,000 | Public domain book excerpts |
195
+ | Informal / Social | Reddit (webis/tldr-17) | 3,000 | Writing-focused subreddit posts |
196
+ | Student / Educational | PERSUADE corpus | 2,000 | Student essays |
197
+ | Technical / Q&A | StackExchange | 2,000 | Technical answers |
198
+ | Blog / Opinion | Blog Authorship Corpus | 2,000 | Personal blog posts |
199
+ | Legal / Formal | Pile of Law | 1,000 | Legal opinions and case summaries |
200
+
201
+ #### AI Sources (24 model configurations across 10 families)
202
+
203
+ **Locally generated via LM Studio (8 models, Q4_K_M quantization):**
204
+
205
+ | Model | Family | Parameters |
206
+ |---|---|---|
207
+ | Llama-3.1-8B-Instruct | Meta Llama | 8B |
208
+ | Llama-3.2-3B-Instruct | Meta Llama | 3B |
209
+ | Mistral-7B-Instruct-v0.3 | Mistral AI | 7B |
210
+ | Qwen2.5-7B-Instruct | Alibaba Qwen | 7B |
211
+ | Qwen2.5-14B-Instruct | Alibaba Qwen | 14B |
212
+ | Gemma-2-9B-Instruct | Google | 9B |
213
+ | Phi-3.5-Mini-Instruct | Microsoft | 3.8B |
214
+ | DeepSeek-V2-Lite-Chat | DeepSeek | 16B (MoE) |
215
+
216
+ Local generation used 4 temperature/sampling configurations (default, creative, precise, varied) across 6 prompt strategies (direct, continue, rewrite, expand, style_mimic, question_answer) with a system prompt enforcing natural human-like output — no markdown, no meta-commentary, no self-referential AI language.
217
+
218
+ **HuggingFace datasets (16 additional model configurations):**
219
+
220
+ | Dataset | Models Added | Reference |
221
+ |---|---|---|
222
+ | RAID (ACL 2024) | ChatGPT-3.5, GPT-4, GPT-3-Davinci, Cohere Command, Llama-2-70B-Chat, Mistral-7B-v0.1, Mixtral-8x7B, MPT-30B, GPT-2-XL | [liamdugan/raid](https://huggingface.co/datasets/liamdugan/raid) |
223
+ | AI Text Detection Pile | GPT-2/3/J/ChatGPT (mixed) | [artem9k/ai-text-detection-pile](https://huggingface.co/datasets/artem9k/ai-text-detection-pile) |
224
+ | NYT Multi-Model | GPT-4o, Yi-Large, Qwen-2-72B, Llama-3-8B, Gemma-2-9B, Mistral-7B | [gsingh1-py/train](https://huggingface.co/datasets/gsingh1-py/train) |
225
+
226
+ This combination ensures coverage of proprietary API models (GPT-3.5, GPT-4, GPT-4o, Cohere), large open models exceeding consumer GPU VRAM (Llama-2-70B, Qwen-2-72B, Mixtral-8x7B, Yi-Large), older architectures (GPT-2, GPT-3, GPT-J), and mixture-of-experts models (Mixtral, DeepSeek-V2-Lite). RAID data was filtered to non-adversarial generations only (`attack=="none"`) for training data quality.
227
+
228
+ ## Usage
229
+
230
+ ### With Transformers
231
+
232
+ ```python
233
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
234
+ import torch
235
+
236
+ model_name = "ogmatrixllm/glyph" # Replace with your repo path
237
+ tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
238
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
239
+ model.eval()
240
+
241
+ text = "Your text to classify here..."
242
+
243
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
244
+ with torch.no_grad():
245
+     logits = model(**inputs).logits
246
+ probs = torch.softmax(logits, dim=-1)
247
+
248
+ p_human, p_ai = probs[0].tolist()
249
+ label = "AI-generated" if p_ai > 0.5 else "Human-written"
250
+ confidence = max(p_human, p_ai)
251
+
252
+ print(f"{label} (confidence: {confidence:.1%})")
253
+ ```
254
+
255
+ ### With Pipeline
256
+
257
+ ```python
258
+ from transformers import AutoTokenizer, pipeline
259
+
260
+ detector = pipeline(
261
+     "text-classification",
262
+     model="ogmatrixllm/glyph", # Replace with your repo path
263
+     tokenizer=AutoTokenizer.from_pretrained("ogmatrixllm/glyph", use_fast=False),
264
+ )
265
+
266
+ result = detector("Your text here...")
267
+ print(result)
268
+ # [{'label': 'LABEL_1', 'score': 0.98}] # LABEL_0 = human, LABEL_1 = AI
269
+ ```
270
+
271
+ ### Important Notes
272
+
273
+ - **Tokenizer**: Always use `use_fast=False`. The fast tokenizer for DeBERTa-v3 has a confirmed regression in `transformers>=4.47` ([#42583](https://github.com/huggingface/transformers/issues/42583)) that crashes on load.
274
+ - **Max length**: The model was trained with `max_length=512`. Longer texts should be truncated or chunked with predictions aggregated.
275
+ - **Labels**: `LABEL_0` = human, `LABEL_1` = AI-generated.
276
+
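For texts longer than 512 tokens, chunking with aggregated predictions could look like the sketch below. The function names, the 510-token window (leaving room for `[CLS]` and `[SEP]`), the 255-token stride, and mean aggregation of P(AI) are all assumptions for illustration; the README does not prescribe a particular scheme.

```python
from typing import List, Tuple


def window_spans(n_tokens: int, window: int = 510, stride: int = 255) -> List[Tuple[int, int]]:
    """Return overlapping (start, end) token spans that cover the whole sequence."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if n_tokens <= window:
        return [(0, n_tokens)]
    spans = []
    start = 0
    while True:
        end = min(start + window, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break
        start += stride
    return spans


def mean_p_ai(chunk_probs: List[float]) -> float:
    """Aggregate per-chunk P(AI) values with a simple mean (one option among several)."""
    return sum(chunk_probs) / len(chunk_probs)


def score_long_text(text: str, tokenizer, model, window: int = 510, stride: int = 255) -> float:
    """Score a long text chunk by chunk; expects a loaded tokenizer and model."""
    import torch  # imported lazily so the helpers above work without torch installed

    ids = tokenizer(text, add_special_tokens=False)["input_ids"]
    probs = []
    for start, end in window_spans(len(ids), window, stride):
        # Wrap each chunk as [CLS] tokens [SEP], the single-sequence format.
        chunk = [tokenizer.cls_token_id] + ids[start:end] + [tokenizer.sep_token_id]
        with torch.no_grad():
            logits = model(input_ids=torch.tensor([chunk])).logits
        probs.append(torch.softmax(logits, dim=-1)[0, 1].item())  # index 1 = P(AI)
    return mean_p_ai(probs)
```

With the tokenizer and model loaded as in the Transformers example, `score_long_text(text, tokenizer, model)` would return a single P(AI) for the whole document. Taking the maximum over chunks instead of the mean would be more sensitive to isolated AI-like passages, at the cost of more false positives.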
277
+ ## Limitations and Ethical Considerations
278
+
279
+ ### Known Limitations
280
+
281
+ 1. **English only.** GLYPH was trained exclusively on English text. Performance on other languages is untested and likely degraded.
282
+
283
+ 2. **Training distribution.** The model has seen outputs from 24 specific AI model configurations. Novel architectures, heavily fine-tuned models, or future model families may evade detection. AI text detection is fundamentally adversarial — no static detector provides permanent robustness.
284
+
285
+ 3. **arXiv abstracts remain the hardest domain** at 90.8% accuracy. Highly formulaic academic writing with rigid structural conventions shares surface features with AI-generated text. Users in academic integrity contexts should treat borderline predictions on scientific abstracts with appropriate caution.
286
+
287
+ 4. **Short texts (<50 words)** have reduced F1 (0.899) despite high accuracy (98.1%). With minimal token-level signal, the model occasionally produces confident but incorrect predictions. For short-form content, consider requiring higher confidence thresholds.
288
+
289
+ 5. **Adversarial attacks.** The training data includes only non-adversarial AI outputs. Paraphrasing attacks, homoglyph substitution, targeted prompt engineering, and watermark-removal techniques were not included. Dedicated adversarial robustness (e.g., RAID adversarial subsets) is a planned enhancement.
290
+
291
+ 6. **Mixed authorship.** GLYPH classifies at the document level. It does not detect partial AI usage (e.g., AI-written paragraphs embedded in a human-written essay). Sentence-level or span-level detection requires a different approach.
292
+
293
+ 7. **512-token window.** Texts are truncated at 512 tokens. For long documents, this means classification is based on the opening ~350–400 words only. Sliding-window aggregation is recommended for long-form content.
294
+
295
+ ### Ethical Considerations
296
+
297
+ AI text detection carries real consequences — academic penalties, professional reputation damage, content moderation decisions. False positives (human text classified as AI) are particularly harmful. While GLYPH's false positive rate is low (2.06% on the test set, 44 out of 2,136 human texts), no detector achieves zero false positives.
298
+
299
+ **Recommendations for responsible deployment:**
300
+
301
+ - Never use GLYPH as the sole basis for punitive action. Use it as one signal among many (metadata, behavioral patterns, stylometric analysis).
302
+ - Apply a high confidence threshold (≥0.95) for consequential decisions. At this threshold, precision reaches 99.6%.
303
+ - Provide users with the confidence score, not just a binary label. A text scored at P(AI)=0.52 is fundamentally different from one scored at P(AI)=0.99.
304
+ - Maintain an appeals process. Statistical classifiers will always produce errors.
305
+ - Acknowledge the base rate problem. In populations where AI usage is rare, even a 2% FPR produces many false accusations relative to true detections.
306
+
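The base rate problem can be made concrete with Bayes' rule. The sketch below uses the test-set recall (99.52%) and false positive rate (2.06%) reported above; the prevalence values are hypothetical, and real-world rates for either quantity may differ from the test set.

```python
def positive_predictive_value(prevalence: float, tpr: float = 0.9952, fpr: float = 0.0206) -> float:
    """P(text is AI | detector flags it), for a given share of AI texts in the population."""
    true_positives = prevalence * tpr
    false_positives = (1.0 - prevalence) * fpr
    return true_positives / (true_positives + false_positives)


# Hypothetical prevalence levels, chosen only for illustration.
for prevalence in (0.01, 0.10, 0.50):
    ppv = positive_predictive_value(prevalence)
    print(f"prevalence={prevalence:.0%}  P(AI | flagged)={ppv:.1%}")
```

At a hypothetical 1% prevalence, only about one in three flagged texts would actually be AI-generated, even with the detector's high recall and low false positive rate.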
307
+ ## Training Infrastructure
308
+
309
+ | Component | Specification |
310
+ |---|---|
311
+ | GPU | NVIDIA GeForce RTX 4070 Ti (12GB VRAM) |
312
+ | CPU | Intel Core i7-14700K (20 cores) |
313
+ | RAM | 48GB DDR5 |
314
+ | Framework | PyTorch 2.6+ / HuggingFace Transformers |
315
+ | Precision | bf16 mixed precision |
316
+ | Total training time | 49 minutes |
317
+ | Experiment tracking | Weights & Biases |
318
+
319
+ ## Citation
320
+
321
+ ```bibtex
322
+ @misc{glyph2026,
323
+ title={GLYPH: High-Accuracy AI Text Detection with DeBERTa-v3},
324
+ author={OGMatrix},
325
+ year={2026},
326
+ url={https://huggingface.co/ogmatrixllm/glyph}
327
+ }
328
+ ```
329
+
330
+ ## Acknowledgments
331
+
332
+ Training data incorporates the [RAID benchmark](https://huggingface.co/datasets/liamdugan/raid) (Dugan et al., ACL 2024), the [AI Text Detection Pile](https://huggingface.co/datasets/artem9k/ai-text-detection-pile), and the [NYT Multi-Model dataset](https://huggingface.co/datasets/gsingh1-py/train). Human text sources include arXiv, PubMed, Wikipedia, CC-News, Project Gutenberg, Reddit, StackExchange, Blog Authorship Corpus, PERSUADE, and Pile of Law. The base model is [DeBERTa-v3-base](https://huggingface.co/microsoft/deberta-v3-base) by Microsoft Research.