# Comprehensive System Audit & Optimization Strategy

**Date**: 2026-02-02
**Honeypot Version**: Sentinel-Scam-Honeypot v2
**Auditor**: Antigravity Agent

---

## STEP 1 — FULL CODEBASE UNDERSTANDING

### Current Execution Flow
1.  **Ingestion**: `Orchestrator.process_message` receives the user message.
2.  **Safeguard**: `llm_client.check_safeguard` scans for prompt injections (Synchronous, LLM-backed).
3.  **Parallel Intelligence (Async)**:
    *   `ScamDetector.detect()`: Checks Regex (FAST-PATH) -> Falls back to LLM.
    *   `IntelligenceExtractor.extract()`: Checks Regex -> Calls LLM for deep extraction.
4.  **Adaptive Analysis**: `AdaptiveStrategy.analyze` runs local keyword matching.
5.  **State Machine**: `ConversationManager.determine_phase` calculates the conversation phase.
6.  **Persona Logic**: `PersonaEngine.select_persona` calls **LLM (Structured)** to decide/mutate persona.
7.  **Response Generation**: `PersonaEngine.generate_response` calls **LLM (Fast Chat)** for the reply.
8.  **Background Enrichment**: `EnrichmentService` runs in the background, calling **LLM (Structured)** for forensic analysis.
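
The parallel stage in step 3 can be sketched as follows. This is a minimal illustration, not the project's actual code: `detect_scam` and `extract_intel` are hypothetical stand-ins for `ScamDetector.detect()` and `IntelligenceExtractor.extract()`, and their bodies are placeholder logic with no LLM call.

```python
import asyncio


async def detect_scam(message: str) -> dict:
    # Hypothetical stand-in for ScamDetector.detect(): keyword check only,
    # the LLM fallback is omitted.
    await asyncio.sleep(0)
    return {"is_scam": "otp" in message.lower()}


async def extract_intel(message: str) -> dict:
    # Hypothetical stand-in for IntelligenceExtractor.extract():
    # collects purely numeric tokens as "entities".
    await asyncio.sleep(0)
    return {"entities": [word for word in message.split() if word.isdigit()]}


async def process_message(message: str) -> dict:
    # Both analyses run concurrently; gather returns results in argument order.
    detection, intel = await asyncio.gather(
        detect_scam(message),
        extract_intel(message),
    )
    return {"detection": detection, "intel": intel}


result = asyncio.run(process_message("Send OTP 4821 now"))
```

Because the two coroutines are independent, the latency of this stage is roughly that of the slower call rather than the sum of both.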

### Critical Observations
-   **Mandatory LLM Steps**: Persona Selection, Response Generation.
-   **Optional LLM Steps**: Safeguard (can be regex), Scam Detection (can be regex), Extraction (can be regex).
-   **Redundancy**: Persona selection runs on *every single message*, even when the persona should not change. Scam detection runs even when the scam type is already known.
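
One way to remove this redundancy is to keep per-session state and only call the LLM when that state is missing or stale. The sketch below is an assumption about how that could look: `SessionState`, `resolve_scam_type`, and `resolve_persona` are hypothetical names, and re-selecting the persona only on a phase change is one possible policy, not something the current code does.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SessionState:
    # Hypothetical per-conversation cache.
    scam_type: Optional[str] = None
    persona: Optional[str] = None
    persona_phase: Optional[str] = None


def resolve_scam_type(state: SessionState, message: str,
                      detect: Callable[[str], Optional[str]]) -> Optional[str]:
    # Call the detector only while the scam type is still unknown.
    if state.scam_type is None:
        state.scam_type = detect(message)
    return state.scam_type


def resolve_persona(state: SessionState, phase: str,
                    select: Callable[[str], str]) -> str:
    # Sticky persona: re-select only when none is set or the phase changed.
    if state.persona is None or state.persona_phase != phase:
        state.persona = select(phase)
        state.persona_phase = phase
    return state.persona
```

In use, `detect` and `select` would wrap the existing LLM-backed calls, so a steady-state message in an unchanged phase triggers neither.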

---

## STEP 2 — API CALL AUDIT

**Calls Per Message**: 5 to 15 (the upper end is the worst case)

| Component | Role | Input Tokens (approx.) | Output | Frequency | Necessity |
|-----------|------|--------------|--------|-----------|-----------|
| **Safeguard** | `SAF_GUARD` | ~200 | Bool | 1/msg | **Wasteful** (Regex > LLM for known attacks) |
| **Scam Detect** | `FAST_CHAT` | ~500 | JSON | 1/msg | **Redundant** (Once detected, stop calling) |
| **Intel Extract**| `SMART` | ~1000 | JSON | 1/msg | **Wasteful** (Only call if new entities found) |
| **Persona Select**| `STRUCTURED` | ~1500 | JSON | 1/msg | **Critical Waste** (Persona should be sticky) |
| **Response** | `FAST_CHAT` | ~2000 | Text | 1-3/msg | **Mandatory** (Main output) |
| **Enrichment** | `FORENSIC` | ~3000 | JSON | 1/msg | **Optional** (Defer to batch) |
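
To make the table's cost concrete, the snippet below sums the approximate input tokens per message. The figures are taken from the table; the function name and the `skip` parameter are illustrative, and the estimate ignores output tokens and retry traffic.

```python
# Approximate input tokens per call, copied from the table above.
INPUT_TOKENS = {
    "Safeguard": 200,
    "Scam Detect": 500,
    "Intel Extract": 1000,
    "Persona Select": 1500,
    "Response": 2000,
    "Enrichment": 3000,
}


def tokens_per_message(response_calls: int = 1, skip: tuple = ()) -> int:
    # Every component is called once per message except Response,
    # which the table lists as 1-3 calls per message.
    total = 0
    for name, tokens in INPUT_TOKENS.items():
        if name in skip:
            continue
        total += tokens * (response_calls if name == "Response" else 1)
    return total


baseline = tokens_per_message()                  # 8200
worst_listed = tokens_per_message(response_calls=3)  # 12200
trimmed = tokens_per_message(
    skip=("Safeguard", "Scam Detect", "Intel Extract", "Persona Select")
)                                                # 5000
```

Under these numbers, skipping the four calls flagged as wasteful or redundant would cut input tokens for a single-response message from about 8,200 to about 5,000.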

**Why the count is high**:
1.  **Aggressive Fallbacks**: If `gpt-oss` fails, the system retries with `llama-3` multiple times.
2.  **No State Cache**: The system "forgets" that it already detected the scam 2 seconds ago.
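
A capped fallback chain is one way to bound the retry cost from point 1. The sketch below is hypothetical: `call_with_budget`, `AllModelsFailed`, and the `max_attempts` default are assumptions, and it tries each model at most once under a shared attempt budget rather than reproducing the current retry behavior.

```python
from typing import Callable, Optional, Sequence, Tuple


class AllModelsFailed(Exception):
    """Raised when the attempt budget is spent without a successful call."""


def call_with_budget(models: Sequence[str],
                     call: Callable[[str], str],
                     max_attempts: int = 2) -> Tuple[str, int]:
    # Try models in order, one attempt each, stopping once the budget is spent.
    attempts = 0
    last_error: Optional[Exception] = None
    for model in models:
        if attempts >= max_attempts:
            break
        attempts += 1
        try:
            return call(model), attempts
        except Exception as exc:  # a real client would catch narrower errors
            last_error = exc
    raise AllModelsFailed(f"gave up after {attempts} attempt(s)") from last_error
```

The caller gets both the reply and the number of attempts used, which makes the per-message call count easy to log and audit.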

---

## STEP 3 — BOT-LIKE BEHAVIOR ROOT CAUSE

The "Dumb Bot" feeling comes from three specific failures:

1.  **The "Crash-to-Static" Pipeline**:
    *   When the LLM fails (quota/type error), the code falls back to `_static_response`.
    *   This response is generic ("Wait main check karta hoon", Hinglish for "Wait, I'm checking") and erases the persona's distinct voice.
    *   **Correction**: Fallbacks must be *persona-aware*.

2.  **Safety Tuning Leakage**:
    *   Models such as Llama-3 are safety-tuned. Even with role prompts, they resist playing a victim who is being "scammed".
    *   They often output refusals or cautions such as "I cannot comply" or "Please verify".
    *   **Correction**: Use lower-safety models or "jailbreak" style prompting for the honeypot role.

3.  **Prompt Overload**:
    *   `RESPONSE_GENERATION_PROMPT` sends the full conversation history.
    *   As history grows, the model pays less attention to the immediate instruction ("Be hyper-realistic").
    *   Result: It reverts to default "Helpful Assistant" tone.
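
Two of these failures lend themselves to small code-level mitigations: a persona-aware fallback (failure 1) and a bounded history window with the instruction placed last (failure 3). The sketch below is illustrative only. The persona keys, the fallback lines, and the helper names are all hypothetical, and whether trimming history restores instruction-following is an assumption that would need testing.

```python
import random
from typing import List

# Hypothetical persona keys and lines; real ones would come from PersonaEngine.
PERSONA_FALLBACKS = {
    "elderly_retiree": [
        "One minute beta, I am finding my spectacles.",
        "Hold on, my phone screen went dark again.",
    ],
    "busy_shopkeeper": [
        "Customer is here, give me two minutes.",
        "Network is weak in the shop, say that again?",
    ],
}

# The generic line quoted in the audit, used only as a last resort.
GENERIC_FALLBACK = "Wait main check karta hoon"


def fallback_response(persona: str) -> str:
    # Prefer a line in the persona's voice; fall back to the generic one.
    lines = PERSONA_FALLBACKS.get(persona)
    if not lines:
        return GENERIC_FALLBACK
    return random.choice(lines)


def build_prompt(instruction: str, history: List[str], max_turns: int = 6) -> str:
    # Keep only the most recent turns and place the instruction last,
    # so it sits closest to the point where generation starts.
    recent = history[-max_turns:] if max_turns > 0 else []
    return "\n".join([*recent, instruction])
```

For example, `build_prompt("Be hyper-realistic", history, max_turns=3)` yields the last three turns followed by the instruction, regardless of how long the conversation has run.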

---