---
license: apache-2.0
base_model: unsloth/gemma-4-E2B-it
tags:
  - gemma-4
  - unsloth
  - qlora
  - lora
  - peft
  - scam-detection
  - safety
  - text-classification
  - function-calling
language:
  - en
library_name: peft
pipeline_tag: text-generation
---

# Scam Sentinel — Fine-tuned Gemma 4 E2B for Multimodal Scam Risk Detection

LoRA adapter fine-tuned on **Gemma 4 E2B-it** for the **Gemma 4 Good Hackathon (2026)**, Safety & Trust + Unsloth tracks.

> This is not a final forensic deepfake detector. It is a multimodal scam risk assistant that combines phone call transcript analysis, conversation patterns, and verification workflows.

## Headline Results (300-sample real evaluation, apples-to-apples)

All three rows use the **same 300-sample real test set, no RAG, identical v3 system prompt**. The only variables are base-model size and the presence of the LoRA adapter.

| Setup | Size | Accuracy | Precision | Recall | F1 | FPR |
|---|---|---|---|---|---|---|
| Gemma 4 E4B base | ~8B | 53.0% | 46.9% | 97.6% | 63.4% | 78.9% |
| Gemma 4 **E2B base** | ~5B | 41.7% | 41.4% | 96.8% | 58.0% | **97.7%** |
| **Gemma 4 E2B + QLoRA (this adapter)** | ~5B | **89.7%** | **98.0%** | 76.8% | **86.1%** | **1.1%** |

**Key findings**

1. **Same-size apples-to-apples (E2B base → E2B + QLoRA)**: F1 jumps **+28.1 pt** (58.0 → 86.1), FPR collapses **88×** (97.7% → 1.1%), Precision more than doubles (41.4% → 98.0%).
2. **Untuned Gemma 4 base is unusable for this task**: both base models flag the vast majority of normal messages as suspicious (FPR 78.9% and 97.7%). The instruction-tuned base has no domain prior for scam vs. normal text.
3. **Fine-tuning beats raw scale**: the fine-tuned 5B model outperforms the larger 8B base by **+22.7 F1 points** (63.4 → 86.1).
4. **Recall trade-off is intentional**: 96.8% (E2B base) → 76.8% (fine-tuned). See "Design rationale" below — the production cascade's Stage 1 retains high recall.

## Model Details

- **Developed by**: Alice0914 (Gemma 4 Good Hackathon submission)
- **Base model**: [unsloth/gemma-4-E2B-it](https://huggingface.co/unsloth/gemma-4-E2B-it) (~5B params, MatFormer architecture)
- **Adapter type**: LoRA (PEFT) — 28.7M trainable params (0.56% of base)
- **Training framework**: [Unsloth](https://github.com/unslothai/unsloth) + TRL SFTTrainer
- **Quantization at training**: 4-bit NF4 (QLoRA)
- **License**: Apache 2.0
- **Language**: English
- **Project**: [Scam Sentinel GitHub repo](https://github.com/Alice0914/scam-sentinel)

## Intended Use

### Direct Use
Analyze SMS, email, or transcribed phone-call messages and output structured JSON containing:
- `risk_level`: `safe` / `low` / `medium` / `high` / `critical`
- `patterns`: Detected scam patterns (urgency, impersonation, secrecy, etc.)
- `user_message`: Plain-language explanation answering "Is this a scam? Why? What to do? How to verify?"
- `tool_calls`: Function calls into 12 protective tools (notify family, suggest callback, block payment, etc.)
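
Because the response combines reasoning text with a JSON object, callers need to pull the JSON out before using the fields above. The following is a minimal sketch of one way to do that, not the project's own parser. The helper name `extract_report` and the sample string are hypothetical, and the check on `risk_level` assumes the five values listed above are the only valid ones.

```python
import json
from typing import Optional

# The five risk levels named in this model card.
RISK_LEVELS = {"safe", "low", "medium", "high", "critical"}


def extract_report(generated_text: str) -> Optional[dict]:
    """Return the first JSON object in the text that carries a valid risk_level.

    Hypothetical helper: scans for '{' and tries to decode a JSON object there,
    so reasoning text before or after the object is ignored.
    """
    decoder = json.JSONDecoder()
    for start, char in enumerate(generated_text):
        if char != "{":
            continue
        try:
            candidate, _ = decoder.raw_decode(generated_text, start)
        except json.JSONDecodeError:
            continue
        if not isinstance(candidate, dict):
            continue
        level = candidate.get("risk_level")
        if isinstance(level, str) and level in RISK_LEVELS:
            return candidate
    return None


# Made-up model output, for illustration only.
sample = (
    "ANSWER: high risk.\n"
    '{"risk_level": "high", "patterns": ["urgency"], '
    '"user_message": "Call back on a known number.", "tool_calls": []}'
)
report = extract_report(sample)
print(report["risk_level"])  # prints: high
```

Returning `None` rather than raising lets the caller decide how to treat an unparseable response, for example by retrying or by falling back to a conservative default.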

### Out-of-Scope Use
- Voice authenticity / deepfake audio detection (use a dedicated audio model)
- Languages other than English
- Real-time telephony interception (requires phone-system integration)
- Replacement for human judgment in financial decisions

## How to Use

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Alice0914/gemma4-e2b-scam-sentinel",
    max_seq_length=1024,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

# (Load the full system prompt from the project repo)
system_prompt = "..."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "ANALYZE THIS INPUT:\n\nTEXT: Mom, send $500 right now\nMETADATA: {\"channel\": \"sms\"}"},
]
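# Render the chat messages into the model's prompt format.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text=text, return_tensors="pt").to("cuda")
# temperature only takes effect when sampling is enabled, so set do_sample explicitly.
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.3)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))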
```


## Training Details

### - Training Data

- **3,100 chat-formatted samples** (system + user + assistant)
- Built from 80 hand-written seeds and 571 real UCI SMS Spam samples, expanded with Gemma 4 paraphrased variants
- 8 categories: family_impersonation, prosecutor_scam, bec_scam, romance_scam, package_scam, bank_phishing, phishing_link, normal
- Assistant responses follow a 5-step Chain-of-Thought (IDENTIFY → ASSESS → EXPLAIN → DECIDE TOOLS → ANSWER) + JSON output format
- Train / dev split: 3,100 / 771 (stratified by category)

### - Training Hyperparameters

- Method: QLoRA (4-bit NF4 base + LoRA r=16)
- LoRA: `r=16`, `alpha=32`, `dropout=0.05`, `target_modules="all-linear"`
- Batch: 1 × grad_accum 8 (effective batch 8)
- Epochs: 2 (~775 steps)
- Learning rate: 2e-4, cosine schedule, `warmup_ratio=0.03`
- Optimizer: `paged_adamw_8bit`
- Precision: bf16 (compute) / NF4 (base weights)
- Max sequence length: 1024
- Random seed: 3407
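
The step count follows from the batch settings above. The short calculation below is a sketch that assumes the 3,100 figure is the number of training samples and that no partial batch is added per epoch; the way a trainer rounds warmup steps can differ from the `math.ceil` used here.

```python
import math

train_samples = 3100      # training samples reported in this card
per_device_batch = 1
grad_accum = 8
epochs = 2
warmup_ratio = 0.03

# One optimizer step consumes per_device_batch * grad_accum samples.
effective_batch = per_device_batch * grad_accum
total_steps = train_samples * epochs // effective_batch
# Rounding up is an assumption; check the trainer in use for its exact rule.
warmup_steps = math.ceil(warmup_ratio * total_steps)

print(effective_batch, total_steps, warmup_steps)  # prints: 8 775 24
```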

### - Hardware

- **GPU**: Google Colab Pro L4 (22.5 GB VRAM)
- **Training time**: ~50 minutes for 2 epochs
- **Framework versions**: PEFT 0.19.1, transformers ≥4.50, trl ≥1.4, Unsloth (latest from GitHub)

---

## Evaluation

### - Testing Data

- Held-out set of **300 hand-labeled real samples**
- Distribution: 175 safe / 7 low / 79 medium / 26 high / 13 critical
- Sources: FTC consumer-fraud reports, UCI SMS Spam Collection (training-disjoint subset), custom edge cases
- The evaluation set is disjoint from training via the `seeds_real.jsonl` filter — verified by hash check
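
A hash-based disjointness check of this kind can be sketched as follows. This is an illustration rather than the project's script: the function names, the lowercase and whitespace normalization, and the sample texts are all assumptions, and the actual `seeds_real.jsonl` filter may work differently.

```python
import hashlib


def text_fingerprint(text: str) -> str:
    """Hash a message after lowercasing and collapsing whitespace (assumed normalization)."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def overlapping_texts(train_texts, eval_texts):
    """Return evaluation texts whose fingerprint also appears in the training set."""
    train_hashes = {text_fingerprint(t) for t in train_texts}
    return [t for t in eval_texts if text_fingerprint(t) in train_hashes]


# Made-up examples: the first held-out message differs only in case and spacing.
train = ["Your package is on hold. Pay the fee now.", "See you at dinner tonight"]
held_out = ["your package is on hold.  pay the fee now.", "Lunch tomorrow?"]

leaks = overlapping_texts(train, held_out)
print(len(leaks))  # prints: 1
```

An exact-match hash only catches duplicates up to the chosen normalization; paraphrased near-duplicates would need a fuzzier comparison.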

### - Metrics (this adapter)

Binary danger-vs-safe (matches the project's baseline reporting protocol):

| Metric | Value |
|---|---|
| Accuracy | 89.7% |
| Precision | 98.0% |
| Recall | 76.8% |
| F1 | 86.1% |
| FPR | 1.1% |
| JSON parsing success | 95.3% (286/300) |

- Strict 5-class match: **69.0%** (model occasionally over-classifies within the dangerous range, e.g., medium → high — the correct failure mode for a safety-critical app)
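
The binary metrics in the table can be reproduced from a confusion matrix. The counts below are not reported in this card; they are inferred from the percentages and the label distribution (175 safe versus 7 + 79 + 26 + 13 = 125 non-safe), assuming every non-safe level counts as "danger". How the 14 unparsed responses were scored is not stated, so treat the counts as an illustration.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard binary classification metrics from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),
    }


# Inferred counts (assumption): 125 dangerous and 175 safe samples.
metrics = binary_metrics(tp=96, fp=2, fn=29, tn=173)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
# accuracy: 89.7%
# precision: 98.0%
# recall: 76.8%
# f1: 86.1%
# fpr: 1.1%
```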

### - What Fine-tuning Changed (E2B base → E2B + QLoRA)

| Behavior | Base (E2B ~5B) | Fine-tuned (E2B ~5B) | Δ |
|---|---|---|---|
| FPR | 97.7% | 1.1% | **88× reduction** |
| Precision | 41.4% | 98.0% | +56.6 pt |
| Accuracy | 41.7% | 89.7% | +48.0 pt |
| F1 | 58.0% | 86.1% | +28.1 pt |
| Recall | 96.8% | 76.8% | −20.0 pt (intentional trade-off) |

- The base instruction-tuned model has no in-domain prior for "what does a normal message look like?" — it flags 97.7% of safe messages as suspicious
- Fine-tuning re-calibrates the decision boundary using 3,100 in-domain examples
- The recall reduction is a deliberate trade-off favoring user trust over raw catch rate

### - Fine-tuning vs Raw Scale (E4B base → E2B + QLoRA)

| Behavior | E4B base (~8B) | E2B + QLoRA (~5B) | Δ |
|---|---|---|---|
| F1 | 63.4% | 86.1% | +22.7 pt |
| FPR | 78.9% | 1.1% | 72× reduction |
| Precision | 46.9% | 98.0% | +51.1 pt |

- A fine-tuned **smaller** model decisively outperforms a **larger** base model on this task
- On this task, domain adaptation mattered more than model scale, even with limited training compute
- Total cost: one Colab L4 session, ~50 minutes

> **Note on comparison fairness**: All three setups use the same 300-sample test set and identical v3 system prompt; no RAG. Base models use Ollama Q4_K_M quantization; the fine-tune uses Unsloth NF4 (4-bit). Both are 4-bit; quantization differences contribute marginally — the +28.1 F1 / 88× FPR delta is dominated by the adapter, not quantization or size.

### - Design Rationale: Precision over Recall

In the Scam Sentinel production system, this adapter is **Stage 2 of a two-stage cascade**:

- **Stage 1** — a fast classifier (e.g., gemma3:4b) ensures every potentially dangerous message is escalated (recall 99%+)
- **Stage 2** — this fine-tuned adapter provides high-confidence reasoning and tool calls only when action is warranted (precision 98%)

- Stage 1 handles "catch everything"
- Stage 2's job is to *justify action* — blocking payments, alerting family, demanding callback verification
- With a 1.1% FPR, this model rarely flags a safe message, so users can trust the downstream actions it triggers
- Trading FPR for higher recall at this stage would risk the user-trust collapse seen in the base model (FPR 97.7%), which makes the product unusable in real deployment no matter how high the recall is
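
The control flow of the cascade described above can be sketched in a few lines. Everything here is hypothetical: `run_cascade`, the keyword-based Stage 1 stub, and the Stage 2 stub stand in for the real classifier and for this adapter, and only show how the two stages hand off.

```python
from typing import Callable


def run_cascade(
    message: str,
    stage1_is_suspicious: Callable[[str], bool],
    stage2_analyze: Callable[[str], dict],
) -> dict:
    """Run Stage 2 only on messages that Stage 1 escalates."""
    if not stage1_is_suspicious(message):
        return {"escalated": False, "action_warranted": False, "report": None}
    report = stage2_analyze(message)
    return {
        "escalated": True,
        "action_warranted": report.get("risk_level") != "safe",
        "report": report,
    }


def keyword_stage1(message: str) -> bool:
    # Stub for a high-recall first pass: deliberately over-triggers.
    return any(word in message.lower() for word in ("urgent", "send", "verify", "$"))


def stub_stage2(message: str) -> dict:
    # Stub for the fine-tuned adapter: returns a structured report.
    risky = "gift card" in message.lower()
    return {"risk_level": "high" if risky else "safe", "tool_calls": []}


scam = run_cascade("Send $500 in gift cards right now", keyword_stage1, stub_stage2)
benign = run_cascade("Can you send me the recipe?", keyword_stage1, stub_stage2)
skipped = run_cascade("See you at 6", keyword_stage1, stub_stage2)
print(scam["action_warranted"], benign["action_warranted"], skipped["escalated"])
# prints: True False False
```

The second example shows the intended division of labor: Stage 1 escalates a harmless message, and Stage 2 declines to act on it.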

---

## Bias, Risks, and Limitations

### - Language

- **English-only**: Trained on English text; performance on other languages is not validated

### - Classification Behavior

- **Over-classification bias within the dangerous range**: The model leans toward "more dangerous" classifications (e.g., medium → high)
- This is intentional — once false positives on safe messages are eliminated, the safer error mode within non-safe messages is to over-classify
- Downstream tools (wait timer, callback verification) make over-classification cheap to recover from

### - Recall Trade-off

- Some borderline messages may be missed
- Recommended deployment pairs this adapter with a high-recall first-pass classifier (cascade Stage 1)

### - Training Data Provenance

- **Synthetic origin**: 80% of training data was Gemma-paraphrased from hand-written seeds and real UCI SMS spam
- Evaluation uses only real held-out data, so overfitting to the synthetic style would show up in the metrics

### - Tool Calls Are Advisory

- The 12 protective tools are recommended actions
- Downstream systems must enforce safety policies independently
- The model does not execute actions — it returns structured intent
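
One way a downstream system might enforce its own policy is to check each returned tool call against an allowlist before anything runs. The sketch below is an assumption throughout: the tool identifiers, the `name` key, and the confirmation rule are hypothetical and are not taken from the project's tool schema.

```python
# Hypothetical tool identifiers; the real names of the 12 tools may differ.
ALLOWED_TOOLS = {"notify_family", "suggest_callback", "block_payment"}
NEEDS_USER_CONFIRMATION = {"block_payment"}


def review_tool_calls(tool_calls):
    """Split model-proposed tool calls into approved, pending, and rejected lists."""
    approved, pending, rejected = [], [], []
    for call in tool_calls:
        name = call.get("name") if isinstance(call, dict) else None
        if not isinstance(name, str) or name not in ALLOWED_TOOLS:
            rejected.append(call)   # unknown or malformed: never execute
        elif name in NEEDS_USER_CONFIRMATION:
            pending.append(call)    # high-impact: wait for explicit user consent
        else:
            approved.append(call)
    return approved, pending, rejected


proposed = [
    {"name": "suggest_callback", "arguments": {}},
    {"name": "block_payment", "arguments": {"amount": 500}},
    {"name": "delete_account", "arguments": {}},
]
approved, pending, rejected = review_tool_calls(proposed)
print(len(approved), len(pending), len(rejected))  # prints: 1 1 1
```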

---

## Citation

Project: Scam Sentinel — submission for the Gemma 4 Good Hackathon (2026).

```bibtex
@misc{scam-sentinel-2026,
  author = {Alice0914},
  title = {Scam Sentinel: Multimodal Scam Risk Assistant with Fine-tuned Gemma 4 E2B},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Alice0914/gemma4-e2b-scam-sentinel}},
  note = {Submission for the Gemma 4 Good Hackathon}
}
```

## Framework Versions
- PEFT 0.19.1
- Unsloth (latest from GitHub)
- transformers ≥4.50
- trl ≥1.4