---
license: apache-2.0
base_model: unsloth/gemma-4-E2B-it
tags:
- gemma-4
- unsloth
- qlora
- lora
- peft
- scam-detection
- safety
- text-classification
- function-calling
language:
- en
library_name: peft
pipeline_tag: text-generation
---

# Scam Sentinel: Fine-tuned Gemma 4 E2B for Multimodal Scam Risk Detection

LoRA adapter fine-tuned on **Gemma 4 E2B-it** for the **Gemma 4 Good Hackathon (2026)**, Safety & Trust + Unsloth tracks.

> This is not a final forensic deepfake detector. It is a multimodal scam risk assistant that combines phone call transcript analysis, conversation patterns, and verification workflows.

## Headline Results (300-sample real evaluation, apples-to-apples)

All three rows use the **same 300-sample real test set, no RAG, identical v3 system prompt**. The only variables are base-model size and the presence of the LoRA adapter.

| Setup | Size | Accuracy | Precision | Recall | F1 | FPR |
|---|---|---|---|---|---|---|
| Gemma 4 E4B base | ~8B | 53.0% | 46.9% | 97.6% | 63.4% | 78.9% |
| Gemma 4 **E2B base** | ~5B | 41.7% | 41.4% | 96.8% | 58.0% | **97.7%** |
| **Gemma 4 E2B + QLoRA (this adapter)** | ~5B | **89.7%** | **98.0%** | 76.8% | **86.1%** | **1.1%** |

**Key findings**

1. **Same-size apples-to-apples (E2B base → E2B + QLoRA)**: F1 jumps **+28.1 pt** (58.0 → 86.1), FPR collapses **88×** (97.7% → 1.1%), Precision more than doubles (41.4% → 98.0%).
2. **Untuned Gemma 4 base is unusable for this task**: both base models flag the vast majority of normal messages as suspicious (FPR 78.9% and 97.7%). The instruction-tuned base has no domain prior for scam vs. normal text.
3. **Fine-tuning beats raw scale**: the fine-tuned 5B model outperforms the larger 8B base by **+22.7 F1 points** (63.4 → 86.1).
4. **Recall trade-off is intentional**: 96.8% (E2B base) → 76.8% (fine-tuned). See "Design Rationale" below: the production cascade's Stage 1 retains high recall.
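
The deltas quoted in the key findings follow from the headline table. As a quick sanity check, the sketch below recomputes F1 from the reported precision and recall and derives the gains and the FPR reduction factor. The helper name `f1_from_pr` and the `reported` dictionary are illustrative and not part of the project code.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions)."""
    return 2 * precision * recall / (precision + recall)


# Reported (precision, recall, FPR) per setup, copied from the headline table.
reported = {
    "E4B base": (0.469, 0.976, 0.789),
    "E2B base": (0.414, 0.968, 0.977),
    "E2B + QLoRA": (0.980, 0.768, 0.011),
}

# F1 in percent, rounded to one decimal like the table.
f1 = {name: round(100 * f1_from_pr(p, r), 1) for name, (p, r, _) in reported.items()}
print(f1)

# Finding 1: same-size gain in F1 points (E2B base -> E2B + QLoRA).
same_size_gain = round(f1["E2B + QLoRA"] - f1["E2B base"], 1)

# Finding 3: gain of the fine-tuned model over the larger base model.
vs_larger_gain = round(f1["E2B + QLoRA"] - f1["E4B base"], 1)

# FPR reduction factor; the card quotes it truncated to a whole number.
fpr_factor = reported["E2B base"][2] / reported["E2B + QLoRA"][2]
print(same_size_gain, vs_larger_gain, int(fpr_factor))
```

Because the inputs are already rounded to one decimal, small discrepancies against metrics computed from raw counts are possible.
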
## Model Details

- **Developed by**: Alice0914 (Gemma 4 Good Hackathon submission)
- **Base model**: [unsloth/gemma-4-E2B-it](https://huggingface.co/unsloth/gemma-4-E2B-it) (~5B params, MatFormer architecture)
- **Adapter type**: LoRA (PEFT), 28.7M trainable params (0.56% of base)
- **Training framework**: [Unsloth](https://github.com/unslothai/unsloth) + TRL SFTTrainer
- **Quantization at training**: 4-bit NF4 (QLoRA)
- **License**: Apache 2.0
- **Language**: English
- **Project**: [Scam Sentinel GitHub repo](https://github.com/Alice0914/scam-sentinel)

## Intended Use

### Direct Use

Analyze SMS, email, or transcribed phone-call messages and output structured JSON containing:

- `risk_level`: `safe` / `low` / `medium` / `high` / `critical`
- `patterns`: Detected scam patterns (urgency, impersonation, secrecy, etc.)
- `user_message`: Plain-language explanation answering "Is this a scam? Why? What to do? How to verify?"
- `tool_calls`: Function calls into 12 protective tools (notify family, suggest callback, block payment, etc.)

### Out-of-Scope Use

- Voice authenticity / deepfake audio detection (use a dedicated audio model)
- Languages other than English
- Real-time telephony interception (requires phone-system integration)
- Replacement for human judgment in financial decisions

## How to Use

```python
from unsloth import FastLanguageModel

# Load the adapter on top of the 4-bit quantized base model
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Alice0914/gemma4-e2b-scam-sentinel",
    max_seq_length=1024,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch to inference mode

# (Load the full system prompt from the project repo)
system_prompt = "..."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "ANALYZE THIS INPUT:\n\nTEXT: Mom, send $500 right now\nMETADATA: {\"channel\": \"sms\"}"},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text=text, return_tensors="pt").to("cuda")

# Set do_sample explicitly so the temperature applies regardless of the default generation config
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.3)

# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

## Training Details

### Training Data

- **3,100 chat-formatted samples** (system + user + assistant)
- Generated from 80 hand-written seeds + 571 real UCI SMS Spam samples × Gemma-4 paraphrased variants
- 8 categories: family_impersonation, prosecutor_scam, bec_scam, romance_scam, package_scam, bank_phishing, phishing_link, normal
- Assistant responses follow a 5-step Chain-of-Thought (IDENTIFY → ASSESS → EXPLAIN → DECIDE TOOLS → ANSWER) + JSON output format
- Train / dev split: 3,100 / 771 (stratified by category)

### Training Hyperparameters

- Method: QLoRA (4-bit NF4 base + LoRA r=16)
- LoRA: `r=16`, `alpha=32`, `dropout=0.05`, `target_modules="all-linear"`
- Batch: 1 × grad_accum 8 (effective batch 8)
- Epochs: 2 (~775 steps)
- Learning rate: 2e-4, cosine schedule, `warmup_ratio=0.03`
- Optimizer: `paged_adamw_8bit`
- Precision: bf16 (compute) / NF4 (base weights)
- Max sequence length: 1024
- Random seed: 3407

### Hardware

- **GPU**: Google Colab Pro L4 (22.5 GB VRAM)
- **Training time**: ~50 minutes for 2 epochs
- **Framework versions**: PEFT 0.19.1, transformers ≥4.50, trl ≥1.4, Unsloth (latest from GitHub)

---

## Evaluation

### Testing Data

- Held-out set of **300 hand-labeled real samples**
- Distribution: 175 safe / 7 low / 79 medium / 26 high / 13 critical
- Sources: FTC consumer-fraud reports, UCI SMS Spam Collection (training-disjoint subset), custom edge cases
- The evaluation set is disjoint from training via the `seeds_real.jsonl` filter,
verified by hash check

### Metrics (this adapter)

Binary danger-vs-safe (matches the project's baseline reporting protocol):

| Metric | Value |
|---|---|
| Accuracy | 89.7% |
| Precision | 98.0% |
| Recall | 76.8% |
| F1 | 86.1% |
| FPR | 1.1% |
| JSON parsing success | 95.3% (286/300) |

- Strict 5-class match: **69.0%** (the model occasionally over-classifies within the dangerous range, e.g., medium → high, which is the preferred failure mode for a safety-critical app)

### What Fine-tuning Changed (E2B base → E2B + QLoRA)

| Behavior | Base (E2B ~5B) | Fine-tuned (E2B ~5B) | Δ |
|---|---|---|---|
| FPR | 97.7% | 1.1% | **88× reduction** |
| Precision | 41.4% | 98.0% | +56.6 pt |
| Accuracy | 41.7% | 89.7% | +48.0 pt |
| F1 | 58.0% | 86.1% | +28.1 pt |
| Recall | 96.8% | 76.8% | −20.0 pt (intentional trade-off) |

- The base instruction-tuned model has no in-domain prior for "what does a normal message look like?", so it flags 97.7% of safe messages as suspicious
- Fine-tuning re-calibrates the decision boundary using 3,100 in-domain examples
- The recall reduction is a deliberate trade-off favoring user trust over raw catch rate

### Fine-tuning vs Raw Scale (E4B base → E2B + QLoRA)

| Behavior | E4B base (~8B) | E2B + QLoRA (~5B) | Δ |
|---|---|---|---|
| F1 | 63.4% | 86.1% | +22.7 pt |
| FPR | 78.9% | 1.1% | 72× reduction |
| Precision | 46.9% | 98.0% | +51.1 pt |

- A fine-tuned **smaller** model decisively outperforms a **larger** base model on this task
- Demonstrates that domain adaptation dominates scale for safety-critical classification with limited training compute
- Total cost: one Colab L4 session, ~50 minutes

> **Note on comparison fairness**: All three setups use the same 300-sample test set and identical v3 system prompt; no RAG. Base models use Ollama Q4_K_M quantization; the fine-tune uses Unsloth NF4 (4-bit). Both are 4-bit; quantization differences contribute marginally, and the +28.1 F1 / 88× FPR delta is dominated by the adapter, not quantization or size.
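
To make the binary protocol in this Evaluation section concrete, here is a minimal sketch of how five-class `risk_level` labels can be collapsed into danger-vs-safe and scored. It assumes that only `safe` counts as the negative class; that reading is consistent with the 175 safe / 125 non-safe split and the reported percentages, but this card does not state it explicitly. The function names and the toy labels are hypothetical and are not taken from the project's evaluation script.

```python
from typing import Dict, Sequence


def is_danger(risk_level: str) -> bool:
    # Assumption: every level other than "safe" counts as dangerous.
    return risk_level != "safe"


def binary_metrics(gold: Sequence[str], pred: Sequence[str]) -> Dict[str, float]:
    """Accuracy, precision, recall, F1, FPR (danger-vs-safe) plus strict 5-class match."""
    tp = fp = tn = fn = 0
    for g, p in zip(gold, pred):
        g_pos, p_pos = is_danger(g), is_danger(p)
        if g_pos and p_pos:
            tp += 1
        elif g_pos:
            fn += 1
        elif p_pos:
            fp += 1
        else:
            tn += 1

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    strict = sum(g == p for g, p in zip(gold, pred)) / len(gold)

    return {
        "accuracy": (tp + tn) / len(gold),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "fpr": fpr,
        "strict_match": strict,
    }


# Toy labels for illustration only (not real evaluation data).
gold = ["safe", "safe", "medium", "high", "critical", "low"]
pred = ["safe", "low", "high", "high", "safe", "medium"]
metrics = binary_metrics(gold, pred)
print(metrics)
```

Note how `medium → high` still counts as a true positive in the binary view while lowering the strict 5-class match, which is why the two numbers can diverge. Under the same assumption, the reported 1.1% FPR and 76.8% recall would correspond to roughly 2 false positives out of 175 safe samples and 96 true positives out of 125 non-safe samples.
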
### Design Rationale: Precision over Recall

In the Scam Sentinel production system, this adapter is **Stage 2 of a two-stage cascade**:

- **Stage 1**: a fast classifier (e.g., gemma3:4b) ensures every potentially dangerous message is escalated (recall 99%+)
- **Stage 2**: this fine-tuned adapter provides high-confidence reasoning and tool calls only when action is warranted (precision 98%)
- Stage 1 handles "catch everything"
- Stage 2's job is to *justify action*: blocking payments, alerting family, demanding callback verification
- With 1.1% FPR, users can trust the downstream actions triggered when this model flags a message
- Higher recall at this stage would re-introduce the user-trust collapse seen in the base model (FPR 97.7%), making the product unusable in real deployment regardless of recall

---

## Bias, Risks, and Limitations

### Language

- **English-only**: Trained on English text; performance on other languages is not validated

### Classification Behavior

- **Over-classification bias within the dangerous range**: The model leans toward "more dangerous" classifications (e.g., medium → high)
- This is intentional: once false positives on safe messages are eliminated, the safer error mode within non-safe messages is to over-classify
- Downstream tools (wait timer, callback verification) make over-classification cheap to recover from

### Recall Trade-off

- Some borderline messages may be missed
- Recommended deployment pairs this adapter with a high-recall first-pass classifier (cascade Stage 1)

### Training Data Provenance

- **Synthetic origin**: 80% of training data was Gemma-paraphrased from hand-written seeds and real UCI SMS spam
- Evaluation uses only real held-out data, so that overfitting to the synthetic style would show up in the results

### Tool Calls Are Advisory

- The 12 protective tools are recommended actions
- Downstream systems must enforce safety policies independently
- The model does not execute actions; it returns structured intent

---

## Citation

Project:
Scam Sentinel, a submission for the Gemma 4 Good Hackathon (2026).

```bibtex
@misc{scam-sentinel-2026,
  author = {Alice0914},
  title = {Scam Sentinel: Multimodal Scam Risk Assistant with Fine-tuned Gemma 4 E2B},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Alice0914/gemma4-e2b-scam-sentinel}},
  note = {Submission for the Gemma 4 Good Hackathon}
}
```

## Framework Versions

- PEFT 0.19.1
- Unsloth (latest from GitHub)
- transformers ≥4.50
- trl ≥1.4