# 🏗️ SCAM HONEYPOT - Complete Architecture Documentation

## 📁 Project Structure Overview

```
sentinel-scam-honeypot/
├── app/                          # Main application code
│   ├── agents/                   # 🤖 AI Agents (brain of the system)
│   ├── api/                      # 🌐 REST API endpoints
│   ├── core/                     # 🧠 Core components (LLM, memory, prompts)
│   ├── decoys/                   # 🪤 Fake endpoints to trap scammers
│   ├── enforcement/              # 🚔 Law enforcement simulation
│   ├── intelligence/             # 📊 Threat intelligence modules
│   ├── templates/                # 💻 HTML templates
│   ├── utils/                    # 🔧 Utility functions
│   ├── main.py                   # FastAPI entry point
│   └── config.py                 # Configuration settings
├── dashboard.py                  # 📈 Streamlit analytics dashboard
├── simulate_attack.py            # ⚔️ Red vs Blue simulation
├── verify_honeypot.py            # ✅ System verification script
├── Dockerfile                    # 🐳 Docker deployment
├── requirements.txt              # 📦 Python dependencies
└── README.md                     # 📖 Project documentation
```

---

## 🎯 System Architecture Diagram

```mermaid
flowchart TB
    subgraph Input["📥 Input Layer"]
        A[Scammer Message] --> B[FastAPI Routes]
        B --> C{API Key Valid?}
        C -->|No| D[401 Unauthorized]
        C -->|Yes| E[Rate Limiter]
        E -->|Exceeded| F[429 Too Many Requests]
        E -->|OK| G[GUVI Handler]
    end

    subgraph Orchestrator["🤖 Orchestrator Layer"]
        G --> H[HoneypotOrchestrator]
        H --> I[Scam Detector]
        H --> J[Intel Extractor]
        H --> K[Emotional Analyzer]
        I --> L[LLM Client]
        L --> M[Groq/OpenAI/Anthropic]
    end

    subgraph Response["💬 Response Generation"]
        I --> N[Persona Engine]
        N --> O[Adaptive Strategy]
        O --> P[Engagement Delayer]
        P --> Q[Response Text]
    end

    subgraph Intelligence["📊 Intelligence Layer"]
        J --> R[Threat Engine]
        K --> R
        R --> S[Campaign Tracker]
        S --> T[Risk Scorer]
    end

    subgraph Storage["💾 Persistence Layer"]
        H --> U[SQLite/PostgreSQL]
        H --> V[Audit Logger]
        V --> W[SIEM Export]
    end

    subgraph Output["📤 Output Layer"]
        Q --> X[API Response]
        T --> X
        X --> Y[GUVI Callback]
        X --> Z[Stakeholder Exports]
        Z --> AA[CERT-In STIX 2.1]
        Z --> AB[TRAI UCC Report]
        Z --> AC[NPCI Fraud Report]
        Z --> AD[NCRP Complaint]
    end

    style Input fill:#e3f2fd
    style Orchestrator fill:#fff3e0
    style Response fill:#e8f5e9
    style Intelligence fill:#fce4ec
    style Storage fill:#f3e5f5
    style Output fill:#e0f7fa
```

---

## 🔄 Agent Interaction Flow

```mermaid
sequenceDiagram
    participant S as Scammer
    participant API as FastAPI
    participant O as Orchestrator
    participant SD as ScamDetector
    participant IE as IntelExtractor
    participant EA as EmotionalAnalyzer
    participant PE as PersonaEngine
    participant ED as EngagementDelayer
    participant DB as Database
    participant CB as Callback

    S->>API: POST /api/guvi/analyze
    API->>API: Verify API Key
    API->>API: Rate Limit Check
    API->>O: Process Message
    
    par Detection
        O->>SD: Detect Scam Type
        O->>IE: Extract Intelligence
        O->>EA: Analyze Emotions
    end
    
    SD-->>O: {is_scam, type, confidence}
    IE-->>O: {phones, upis, urls}
    EA-->>O: {urgency, fear, greed}
    
    O->>PE: Generate Response
    PE->>ED: Add Delays
    ED-->>PE: Delayed Response
    PE-->>O: Victim Response
    
    O->>DB: Store Conversation
    O-->>API: Response Payload
    API-->>S: JSON Response
    
    opt Scam Confirmed
        API->>CB: Send to GUVI
    end
```

---

## 🤖 AGENTS FOLDER (`app/agents/`)

The agents are the **brain** of the honeypot system; each one has a specific role.

### 1. `orchestrator.py` - Main Controller
| Aspect | Description |
|--------|-------------|
| **Purpose** | Coordinates the other agents in this folder to process scam messages |
| **What it does** | Receives message → Runs detection → Selects persona → Generates response → Computes risk → Returns result |
| **Connects to** | All other agents, LLM client, memory store |
| **Key class** | `HoneypotOrchestrator` |
| **Key method** | `process_message(message, conversation_id)` |
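
The flow in the table above can be sketched as follows. This is a minimal illustration, not the project's implementation: only `HoneypotOrchestrator` and `process_message(message, conversation_id)` come from the table, while the injected callables, their return shapes, and the stub lambdas are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class HoneypotOrchestrator:
    """Illustrative only: each agent is injected as a plain callable (an assumption)."""

    detect: Callable[[str], dict]               # stand-in for the scam detector
    extract: Callable[[str], dict]              # stand-in for the intel extractor
    respond: Callable[[dict, List[dict]], str]  # stand-in for the persona engine
    score: Callable[[dict], dict]               # stand-in for the risk scorer
    conversations: Dict[str, List[dict]] = field(default_factory=dict)

    def process_message(self, message: str, conversation_id: str) -> dict:
        history = self.conversations.setdefault(conversation_id, [])
        detection = self.detect(message)          # 1. run detection
        intel = self.extract(message)             # 2. extract entities
        reply = self.respond(detection, history)  # 3. persona-driven reply
        risk = self.score(detection)              # 4. compute risk
        history.append({"scammer": message, "victim": reply})
        return {
            "conversation_id": conversation_id,
            "detection": detection,
            "intelligence": intel,
            "reply": reply,
            "risk": risk,
        }


# Hypothetical stubs, just enough to exercise the flow.
orchestrator = HoneypotOrchestrator(
    detect=lambda m: {"is_scam": "prize" in m.lower(), "scam_type": "lottery", "confidence": 0.9},
    extract=lambda m: {"urls": [w for w in m.split() if w.startswith("http")]},
    respond=lambda d, h: "Arre, which prize? Please explain slowly." if d["is_scam"] else "Who is this?",
    score=lambda d: {"score": round(d["confidence"] * 100)},
)
result = orchestrator.process_message("You won a prize! Visit http://example.test", "conv-1")
```

Injecting the agents as callables keeps the sketch testable without an LLM; the real orchestrator presumably holds agent instances instead.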

### 2. `scam_detector.py` - Scam Detection Agent
| Aspect | Description |
|--------|-------------|
| **Purpose** | Detects if a message is a scam and classifies the type |
| **What it does** | Hybrid detection using keywords + LLM classification |
| **Contains** | `SCAM_DATABASE` with 10 scam types (lottery, job, banking, etc.) |
| **Connects to** | LLM client, orchestrator |
| **Key method** | `detect(message) → {is_scam, scam_type, confidence}` |
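
One way to read "keywords + LLM classification" is a keyword pass that answers on its own when the evidence is strong and defers to the LLM otherwise. The sketch below assumes that ordering; the keyword lists, the thresholds, and the `llm_classify` callable are hypothetical, and only the `SCAM_DATABASE` name and the return keys come from the table.

```python
from typing import Callable, Optional

# Hypothetical excerpt; the real SCAM_DATABASE is described as holding 10 scam types.
SCAM_DATABASE = {
    "lottery": ["lottery", "prize", "winner"],
    "banking": ["kyc", "account blocked", "otp"],
    "job": ["work from home", "registration fee"],
}


def detect(message: str, llm_classify: Optional[Callable[[str], dict]] = None) -> dict:
    """Keyword pass first; fall back to the LLM when keyword evidence is weak."""
    text = message.lower()
    hits = {
        scam_type: sum(keyword in text for keyword in keywords)
        for scam_type, keywords in SCAM_DATABASE.items()
    }
    best_type, best_hits = max(hits.items(), key=lambda item: item[1])

    if best_hits >= 2:
        # Strong keyword evidence: answer without calling the LLM.
        confidence = min(0.5 + 0.2 * best_hits, 0.95)
        return {"is_scam": True, "scam_type": best_type, "confidence": confidence}
    if llm_classify is not None:
        # Weak or no keyword evidence: defer to the (assumed) LLM classifier.
        return llm_classify(message)
    if best_hits == 1:
        return {"is_scam": True, "scam_type": best_type, "confidence": 0.5}
    return {"is_scam": False, "scam_type": None, "confidence": 0.0}
```

For example, `detect("You are the lottery winner, claim your prize")` matches three lottery keywords and returns without consulting the LLM.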

### 3. `persona_engine.py` - Persona Agent
| Aspect | Description |
|--------|-------------|
| **Purpose** | Generates believable victim responses to engage scammers |
| **What it does** | Selects persona based on scam type, generates Hinglish/Hindi responses |
| **Contains** | `PERSONAS` dict with 10 personas (Sharma Uncle, Rahul Kumar, etc.) |
| **Response phases** | hook → engage → extract → stall → self_correct |
| **Key method** | `generate_response(scam_type, phase, history)` |
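
Persona selection and phase progression might look like the sketch below. The persona names and the phase order come from the table; the mapping from scam type to persona, the `style` descriptions, and the assumption that phases advance strictly in order are illustrative guesses.

```python
PHASES = ["hook", "engage", "extract", "stall", "self_correct"]

# Hypothetical mapping: the names appear in the table, the assignments do not.
PERSONAS = {
    "banking": {"name": "Sharma Uncle", "style": "elderly, anxious about his account"},
    "job": {"name": "Rahul Kumar", "style": "young job seeker, eager to please"},
}
DEFAULT_PERSONA = PERSONAS["banking"]


def select_persona(scam_type: str) -> dict:
    """Pick the persona for a scam type, falling back to a default."""
    return PERSONAS.get(scam_type, DEFAULT_PERSONA)


def next_phase(phase: str) -> str:
    """Advance one step through PHASES; stay on the last phase once reached."""
    index = PHASES.index(phase)
    return PHASES[min(index + 1, len(PHASES) - 1)]
```

The real engine may move between phases non-linearly (for example, returning to `stall` after `self_correct`); the linear walk here is only the simplest reading of the arrow notation.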

### 4. `adaptive_strategy.py` - Strategy Agent
| Aspect | Description |
|--------|-------------|
| **Purpose** | Adapts honeypot behavior based on scammer actions |
| **What it does** | Analyzes scammer behavior, determines phase, adjusts strategy |
| **Behaviors detected** | pushing_payment, building_trust, aggressive, confused |
| **Connects to** | Persona engine, orchestrator |
| **Key method** | `adapt_strategy(scammer_message, history)` |
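
A simple way to detect the four behaviors is cue matching, sketched below. The behavior names and the `adapt_strategy(scammer_message, history)` signature come from the table; the cue phrases, the behavior-to-phase mapping, and the return shape are assumptions for illustration.

```python
# Hypothetical cue phrases for each behavior named in the table.
BEHAVIOR_CUES = {
    "pushing_payment": ["pay", "transfer", "send money", "upi"],
    "building_trust": ["trust me", "official", "verified", "government"],
    "aggressive": ["immediately", "arrest", "last warning", "blocked"],
    "confused": ["what do you mean", "i don't understand", "??"],
}

# Hypothetical mapping from detected behavior to the phase the persona should use.
PHASE_FOR_BEHAVIOR = {
    "pushing_payment": "extract",
    "building_trust": "engage",
    "aggressive": "stall",
    "confused": "self_correct",
}


def adapt_strategy(scammer_message: str, history: list) -> dict:
    """Score each behavior by cue matches and suggest a phase for the reply."""
    text = scammer_message.lower()
    scores = {
        behavior: sum(cue in text for cue in cues)
        for behavior, cues in BEHAVIOR_CUES.items()
    }
    behavior = max(scores, key=scores.get)
    if scores[behavior] == 0:
        # No cue matched: open with the hook, otherwise keep engaging.
        return {"behavior": None, "phase": "hook" if not history else "engage"}
    return {"behavior": behavior, "phase": PHASE_FOR_BEHAVIOR[behavior]}
```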

### 5. `intelligence_extractor.py` - Intel Agent
| Aspect | Description |
|--------|-------------|
| **Purpose** | Extracts actionable intelligence from messages |
| **What it does** | Regex-based extraction of phone numbers, UPI IDs, bank account details, and URLs |
| **Connects to** | Orchestrator, threat engine |
| **Key method** | `extract(message) → {phone_numbers, upi_ids, ...}` |
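
The regex-based extraction could be sketched as below. The patterns are simplified assumptions, not the project's actual expressions: the phone pattern targets 10-digit Indian mobile numbers with an optional `+91` prefix, and the UPI pattern rejects handles followed by a dot and a letter so that email addresses are not reported as UPI IDs. The `urls` key is also assumed; the table only names `phone_numbers` and `upi_ids`.

```python
import re

# Simplified, illustrative patterns (assumptions, not the project's real ones).
PHONE_RE = re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)")
UPI_RE = re.compile(r"\b[\w.\-]{2,}@[a-zA-Z]{2,}\b(?!\.[a-zA-Z])")
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def _unique(items):
    """Remove duplicates while keeping first-seen order."""
    return list(dict.fromkeys(items))


def extract(message: str) -> dict:
    """Pull phone numbers, UPI IDs, and URLs out of a message."""
    phones = [re.sub(r"\D", "", match)[-10:] for match in PHONE_RE.findall(message)]
    upi_ids = UPI_RE.findall(message)
    urls = [url.rstrip(".,;)") for url in URL_RE.findall(message)]
    return {
        "phone_numbers": _unique(phones),
        "upi_ids": _unique(upi_ids),
        "urls": _unique(urls),
    }
```

Phone numbers are normalized to their last 10 digits so that `+91 9876543210` and `9876543210` are treated as the same entity.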

### 6. `conversation_manager.py` - Memory Manager
| Aspect | Description |
|--------|-------------|
| **Purpose** | Manages multi-turn conversation state |
| **What it does** | Tracks history, phase progression, trust evolution |
| **Connects to** | Memory store, orchestrator |
| **Key methods** | `get_conversation(id)`, `update_conversation(...)` |

---

## 🌐 API FOLDER (`app/api/`)

### 1. `routes.py` - API Endpoints
| Aspect | Description |
|--------|-------------|
| **Purpose** | Defines all REST API endpoints |
| **Key endpoints** | `/api/v1/analyze`, `/api/guvi/analyze`, `/api/v1/scam-types` |
| **Security** | `verify_api_key()` checks the `x-api-key` header |
| **Connects to** | Orchestrator, GUVI handler, schemas |
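
The gatekeeping done before a request reaches the orchestrator (API key check, then rate limit) can be sketched without any web framework. Everything here is an assumption except the `verify_api_key` name and the 401/429 outcomes shown in the architecture diagram: the function signature, the placeholder key, and the sliding-window limiter are illustrative.

```python
import hmac
import time
from collections import defaultdict, deque
from typing import Optional

EXPECTED_API_KEY = "change-me"  # hypothetical; a real key would come from config


def verify_api_key(provided: Optional[str]) -> bool:
    """Constant-time comparison of the x-api-key header value."""
    if provided is None:
        return False
    return hmac.compare_digest(provided.encode(), EXPECTED_API_KEY.encode())


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)

    def allow(self, client_id: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()  # drop requests that fell out of the window
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


def check_request(api_key: Optional[str], client_id: str,
                  limiter: SlidingWindowLimiter, now: Optional[float] = None) -> int:
    """Return the HTTP status the input layer would produce."""
    if not verify_api_key(api_key):
        return 401
    if not limiter.allow(client_id, now):
        return 429
    return 200
```

In a FastAPI app the same checks would typically live in a dependency that raises an HTTP error instead of returning a status code.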

### 2. `schemas.py` - Pydantic Models
| Aspect | Description |
|--------|-------------|
| **Purpose** | Request/response validation models |
| **Key models** | `AnalyzeRequest`, `AnalyzeResponse`, `GUVIInputRequest`, `GUVIOutputResponse` |
| **Connects to** | Routes, GUVI handler |

---

## 🧠 CORE FOLDER (`app/core/`)

### 1. `llm_client.py` - LLM Client
| Aspect | Description |
|--------|-------------|
| **Purpose** | Unified interface to multiple LLM providers |
| **Supports** | OpenAI, Anthropic, Groq, OpenRouter |
| **Fallback** | Falls back to mock responses if no API key is configured |
| **Key method** | `generate(prompt) → response` |
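
The mock fallback can be pictured as a provider lookup that returns `"mock"` when no key is found. In this sketch the environment variable names, the provider priority order, and the mock reply format are assumptions; only the provider list and `generate(prompt)` come from the table, and the real provider calls are deliberately left out.

```python
import os
from typing import Mapping, Optional

# Assumed environment variable names and priority order.
PROVIDER_ENV_KEYS = {
    "groq": "GROQ_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}


def select_provider(env: Mapping[str, str]) -> str:
    """Return the first provider with a non-empty key, or 'mock' if none is set."""
    for provider, env_key in PROVIDER_ENV_KEYS.items():
        if env.get(env_key):
            return provider
    return "mock"


class LLMClient:
    """Illustrative client: only the mock path is implemented."""

    def __init__(self, env: Optional[Mapping[str, str]] = None):
        self.provider = select_provider(os.environ if env is None else env)

    def generate(self, prompt: str) -> str:
        if self.provider == "mock":
            return f"[mock response] {prompt[:40]}"
        # Real provider SDK calls are out of scope for this sketch.
        raise NotImplementedError(f"provider {self.provider!r} is not wired up here")
```

Passing `env` explicitly makes the fallback easy to exercise in tests without touching the process environment.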

### 2. `memory.py` - Conversation Memory
| Aspect | Description |
|--------|-------------|
| **Purpose** | In-memory conversation storage |
| **Contains** | `ConversationMemory` class with TTL support |
| **Stores** | History, phase, trust_score, aggregated_intelligence |
| **Key method** | `get_or_create(conversation_id)` |
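
A minimal sketch of TTL-backed conversation storage follows. The class name, the `get_or_create` method, and the stored field names come from the table; the default TTL, the initial values, and the sliding-expiry behavior (each access refreshes the timer) are assumptions.

```python
import time
from typing import Optional


class ConversationMemory:
    """In-memory store whose entries expire after ttl_seconds of inactivity."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl_seconds = ttl_seconds
        self._store = {}

    @staticmethod
    def _new_state() -> dict:
        # Field names follow the table; initial values are assumptions.
        return {
            "history": [],
            "phase": "hook",
            "trust_score": 0.0,
            "aggregated_intelligence": {},
        }

    def get_or_create(self, conversation_id: str, now: Optional[float] = None) -> dict:
        now = time.monotonic() if now is None else now
        entry = self._store.get(conversation_id)
        if entry is None or now - entry["last_seen"] > self.ttl_seconds:
            # Missing or expired: start a fresh conversation state.
            entry = {"state": self._new_state(), "last_seen": now}
            self._store[conversation_id] = entry
        entry["last_seen"] = now  # every access refreshes the TTL
        return entry["state"]
```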

### 3. `prompts.py` - LLM Prompts
| Aspect | Description |
|--------|-------------|
| **Purpose** | System prompts for LLM interactions |
| **Contains** | `SCAM_DETECTION_PROMPT`, `RESPONSE_GENERATION_PROMPT`, `PHASE_GOALS` |

---

## 🪤 DECOYS FOLDER (`app/decoys/`)

### 1. `fake_endpoints.py` - Decoy Portals
| Aspect | Description |
|--------|-------------|
| **Purpose** | Fake banking/UPI pages to trap scammers |
| **Endpoints** | `/decoys/upi/status`, `/decoys/bank/kyc-portal`, `/decoys/secure/otp-generate` |
| **Why** | Scammers follow these links believing the pages are genuine |

### 2. `victim_profiles.py` - Synthetic Victims
| Aspect | Description |
|--------|-------------|
| **Purpose** | Fake victim data for honeypot responses |
| **Contains** | Synthetic names, bank accounts, UPI IDs |
| **Why** | No real PII is ever used |

---

## 📊 INTELLIGENCE FOLDER (`app/intelligence/`)

### 1. `threat_engine.py` - Threat Intelligence
| Aspect | Description |
|--------|-------------|
| **Purpose** | Generates threat intelligence reports |
| **Creates** | Campaign IDs, IOCs, TTPs (MITRE ATT&CK) |
| **Key method** | `generate_threat_intel(scam_type, entities)` |

### 2. `risk_scorer.py` - Risk Scoring
| Aspect | Description |
|--------|-------------|
| **Purpose** | Computes weighted risk score with explainability |
| **Factors** | Keywords, payment requests, threat level, campaign match |
| **Key method** | `compute_risk(detection_result) → {score, explanation}` |
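
A weighted score with a per-factor explanation might be computed as in the sketch below. The four factors and the `{score, explanation}` return shape come from the table; the weights, the 0 to 100 scale, and the assumption that each factor arrives as a value between 0 and 1 inside `detection_result` are illustrative.

```python
# Assumed weights; they sum to 1.0 so the score lands on a 0-100 scale.
WEIGHTS = {
    "keywords": 0.30,
    "payment_request": 0.30,
    "threat_level": 0.25,
    "campaign_match": 0.15,
}


def compute_risk(detection_result: dict) -> dict:
    """Combine factor values (each clamped to [0, 1]) into a score and explanation."""
    contributions = {}
    for factor, weight in WEIGHTS.items():
        value = float(detection_result.get(factor, 0.0))
        value = min(max(value, 0.0), 1.0)
        contributions[factor] = round(weight * value * 100, 1)

    score = round(sum(contributions.values()), 1)
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    explanation = [f"{name}: +{points}" for name, points in ranked if points > 0]
    return {"score": score, "explanation": explanation}
```

Listing the contributions in descending order gives a reader the largest risk drivers first, which is one simple form of explainability.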

### 3. `campaign_tracker.py` - Campaign Clustering
| Aspect | Description |
|--------|-------------|
| **Purpose** | Groups scam messages into campaigns |
| **Uses** | Entity similarity to cluster related attacks |
| **Key method** | `get_or_create_campaign(entities)` |
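
Entity-similarity clustering can be sketched with Jaccard similarity over the extracted entities. The method name comes from the table; the choice of Jaccard, the threshold value, and the `CAMP-0001` ID format are assumptions for the example.

```python
import itertools


class CampaignTracker:
    """Assign messages to campaigns by overlap of their extracted entities."""

    def __init__(self, threshold: float = 0.3):
        self.threshold = threshold
        self.campaigns = {}            # campaign id -> set of known entities
        self._ids = itertools.count(1)

    @staticmethod
    def _flatten(entities: dict) -> set:
        # {"upi_ids": ["a@ybl"]} becomes {"upi_ids:a@ybl"}
        return {f"{kind}:{value}" for kind, values in entities.items() for value in values}

    def get_or_create_campaign(self, entities: dict) -> str:
        incoming = self._flatten(entities)
        best_id, best_similarity = None, 0.0
        for campaign_id, known in self.campaigns.items():
            union = incoming | known
            similarity = len(incoming & known) / len(union) if union else 0.0
            if similarity > best_similarity:
                best_id, best_similarity = campaign_id, similarity

        if best_id is not None and best_similarity >= self.threshold:
            self.campaigns[best_id] |= incoming  # grow the campaign's entity set
            return best_id

        campaign_id = f"CAMP-{next(self._ids):04d}"  # hypothetical ID format
        self.campaigns[campaign_id] = incoming
        return campaign_id
```

Two messages that share a phone number or UPI ID end up in the same campaign even when their other entities differ, as long as the overlap clears the threshold.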

### 4. `telemetry.py` - Request Telemetry
| Aspect | Description |
|--------|-------------|
| **Purpose** | Captures IP, geo, device fingerprint |
| **Uses** | ip-api.com for geolocation |
| **Key method** | `capture_telemetry(request)` |

### 5. `scammer_profiler.py` - Behavioral Profiling
| Aspect | Description |
|--------|-------------|
| **Purpose** | Builds behavioral profiles of scammers |
| **Tracks** | Aggression, persistence, tactics used |

### 6. `engagement_metrics.py` - Metrics Tracking
| Aspect | Description |
|--------|-------------|
| **Purpose** | Tracks honeypot engagement statistics |
| **Metrics** | Duration, message count, intelligence extracted |

### 7. `honeytokens.py` - Honeytoken Generator
| Aspect | Description |
|--------|-------------|
| **Purpose** | Generates fake credentials as bait |
| **Creates** | Fake UPI IDs, bank accounts, phone numbers |

---

## 🚔 ENFORCEMENT FOLDER (`app/enforcement/`)

### 1. `police_api.py` - Cyber Police Simulation
| Aspect | Description |
|--------|-------------|
| **Purpose** | Simulates NCRP (cybercrime.gov.in) integration |
| **Creates** | Report IDs, priority levels, recommended actions |
| **Classes** | `CyberPoliceAPI`, `ActionRecommendationAPI` |

### 2. `awareness.py` - Public Awareness
| Aspect | Description |
|--------|-------------|
| **Purpose** | Generates scam awareness content |
| **Creates** | Warning messages, educational tips |

---

## 🔧 UTILS FOLDER (`app/utils/`)

### 1. `guvi_handler.py` - GUVI Format Translator
| Aspect | Description |
|--------|-------------|
| **Purpose** | Translates GUVI format ↔ internal format |
| **Why** | GUVI uses different field names (`sessionId` vs. `conversation_id`) |
| **Key method** | `process_guvi_message(request) → GUVIOutputResponse` |
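
The translation step amounts to renaming keys in both directions. In the sketch below only the `sessionId` / `conversation_id` pair is taken from the table; pairing `scamDetected` with `is_scam` is an assumption based on names used elsewhere in this document, and the helper functions are hypothetical.

```python
# sessionId <-> conversation_id is from the table; the second pair is assumed.
GUVI_TO_INTERNAL = {
    "sessionId": "conversation_id",
    "scamDetected": "is_scam",
}
INTERNAL_TO_GUVI = {internal: guvi for guvi, internal in GUVI_TO_INTERNAL.items()}


def rename_keys(payload: dict, mapping: dict) -> dict:
    """Rename mapped keys and pass every other key through unchanged."""
    return {mapping.get(key, key): value for key, value in payload.items()}


def to_internal(guvi_payload: dict) -> dict:
    return rename_keys(guvi_payload, GUVI_TO_INTERNAL)


def to_guvi(internal_payload: dict) -> dict:
    return rename_keys(internal_payload, INTERNAL_TO_GUVI)
```

Keeping one mapping and deriving its inverse ensures the two directions cannot drift apart.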

### 2. `callback_client.py` - GUVI Callback Sender
| Aspect | Description |
|--------|-------------|
| **Purpose** | Sends final result to GUVI evaluation endpoint |
| **Endpoint** | `POST https://hackathon.guvi.in/api/updateHoneyPotFinalResult` |
| **Trigger** | Sent automatically when `scamDetected` is `true` |

### 3. `extractors.py` - Entity Extractors
| Aspect | Description |
|--------|-------------|
| **Purpose** | Regex patterns for entity extraction |
| **Extracts** | Phone, UPI, bank account, IFSC, email, URL |
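
Two of the listed entity types, IFSC codes and email addresses, could be matched as sketched below. The IFSC pattern follows the standard 11-character layout (four letters, a zero, then six alphanumeric characters); the email pattern is deliberately loose. Both are illustrative assumptions rather than the project's actual expressions, and the function name is hypothetical.

```python
import re

# IFSC: 4 uppercase letters, a literal 0, then 6 alphanumeric characters.
IFSC_RE = re.compile(r"\b[A-Z]{4}0[A-Z0-9]{6}\b")
# Loose email pattern; good enough for triage, not for strict validation.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def extract_bank_identifiers(text: str) -> dict:
    """Return IFSC codes and email addresses found in text."""
    return {
        "ifsc_codes": IFSC_RE.findall(text),
        "emails": EMAIL_RE.findall(text),
    }
```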

### 4. `logger.py` - Structured Logging
| Aspect | Description |
|--------|-------------|
| **Purpose** | Consistent logging across all agents |
| **Class** | `AgentLogger` |

---

## 🔗 HOW COMPONENTS CONNECT

```
┌─────────────────────────────────────────────────────────────────────┐
│                           USER REQUEST                               │
│                    POST /api/guvi/analyze                            │
└──────────────────────────────┬──────────────────────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────────────┐
│  routes.py → verify_api_key() → guvi_handler.py                      │
└──────────────────────────────┬──────────────────────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    ORCHESTRATOR (orchestrator.py)                    │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐    │
│  │ Scam        │ │ Intel       │ │ Persona     │ │ Adaptive    │    │
│  │ Detector    │ │ Extractor   │ │ Engine      │ │ Strategy    │    │
│  └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘    │
│         │               │               │               │           │
│         ▼               ▼               ▼               ▼           │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                    LLM CLIENT (llm_client.py)               │    │
│  │     Groq / OpenAI / Anthropic / OpenRouter / Mock           │    │
│  └─────────────────────────────────────────────────────────────┘    │
│         │               │               │               │           │
│         ▼               ▼               ▼               ▼           │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐    │
│  │ Memory      │ │ Threat      │ │ Risk        │ │ Campaign    │    │
│  │ Store       │ │ Engine      │ │ Scorer      │ │ Tracker     │    │
│  └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘    │
└──────────────────────────────┬──────────────────────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    RESPONSE + CALLBACK                               │
│  GUVIOutputResponse → callback_client.py → GUVI Evaluation          │
└─────────────────────────────────────────────────────────────────────┘
```

---

## 📊 ROOT FILES

| File | Purpose |
|------|---------|
| `app/main.py` | FastAPI app entry point, startup/shutdown events |
| `app/config.py` | Environment variables, feature flags |
| `dashboard.py` | Streamlit analytics UI with live charts |
| `simulate_attack.py` | Red Team vs Blue Team simulation script |
| `verify_honeypot.py` | Quick verification of all endpoints |
| `Dockerfile` | Container deployment for Hugging Face Spaces |
| `requirements.txt` | Python dependencies |
| `README.md` | Project documentation with API examples |

---

## 🔑 KEY DATA FLOWS

### 1. Message Analysis Flow
```
Message → ScamDetector → PersonaEngine → AdaptiveStrategy → Response
```

### 2. Intelligence Flow
```
Message → IntelExtractor → ThreatEngine → CampaignTracker → Report
```

### 3. Risk Scoring Flow
```
DetectionResult → RiskScorer → Explanation → AnalyzeResponse
```

### 4. GUVI Callback Flow
```
ScamDetected=true → CallbackClient → hackathon.guvi.in → Evaluation
```

---

*Generated for GUVI India AI Impact Buildathon 2025*