
🏗️ SCAM HONEYPOT - Complete Architecture Documentation

📁 Project Structure Overview

```
sentinel-scam-honeypot/
├── app/                          # Main application code
│   ├── agents/                   # 🤖 AI Agents (brain of the system)
│   ├── api/                      # 🌐 REST API endpoints
│   ├── core/                     # 🧠 Core components (LLM, memory, prompts)
│   ├── decoys/                   # 🪤 Fake endpoints to trap scammers
│   ├── enforcement/              # 🚔 Law enforcement simulation
│   ├── intelligence/             # 📊 Threat intelligence modules
│   ├── templates/                # 💻 HTML templates
│   ├── utils/                    # 🔧 Utility functions
│   ├── main.py                   # FastAPI entry point
│   └── config.py                 # Configuration settings
├── dashboard.py                  # 📈 Streamlit analytics dashboard
├── simulate_attack.py            # ⚔️ Red vs Blue simulation
├── verify_honeypot.py            # ✅ System verification script
├── Dockerfile                    # 🐳 Docker deployment
├── requirements.txt              # 📦 Python dependencies
└── README.md                     # 📖 Project documentation
```

🎯 System Architecture Diagram

```mermaid
flowchart TB
    subgraph Input["📥 Input Layer"]
        A[Scammer Message] --> B[FastAPI Routes]
        B --> C{API Key Valid?}
        C -->|No| D[401 Unauthorized]
        C -->|Yes| E[Rate Limiter]
        E -->|Exceeded| F[429 Too Many Requests]
        E -->|OK| G[GUVI Handler]
    end

    subgraph Orchestrator["🤖 Orchestrator Layer"]
        G --> H[HoneypotOrchestrator]
        H --> I[Scam Detector]
        H --> J[Intel Extractor]
        H --> K[Emotional Analyzer]
        I --> L[LLM Client]
        L --> M[Groq/OpenAI/Anthropic]
    end

    subgraph Response["💬 Response Generation"]
        I --> N[Persona Engine]
        N --> O[Adaptive Strategy]
        O --> P[Engagement Delayer]
        P --> Q[Response Text]
    end

    subgraph Intelligence["📊 Intelligence Layer"]
        J --> R[Threat Engine]
        K --> R
        R --> S[Campaign Tracker]
        S --> T[Risk Scorer]
    end

    subgraph Storage["💾 Persistence Layer"]
        H --> U[SQLite/PostgreSQL]
        H --> V[Audit Logger]
        V --> W[SIEM Export]
    end

    subgraph Output["📤 Output Layer"]
        Q --> X[API Response]
        T --> X
        X --> Y[GUVI Callback]
        X --> Z[Stakeholder Exports]
        Z --> AA[CERT-In STIX 2.1]
        Z --> AB[TRAI UCC Report]
        Z --> AC[NPCI Fraud Report]
        Z --> AD[NCRP Complaint]
    end

    style Input fill:#e3f2fd
    style Orchestrator fill:#fff3e0
    style Response fill:#e8f5e9
    style Intelligence fill:#fce4ec
    style Storage fill:#f3e5f5
    style Output fill:#e0f7fa
```

🔄 Agent Interaction Flow

```mermaid
sequenceDiagram
    participant S as Scammer
    participant API as FastAPI
    participant O as Orchestrator
    participant SD as ScamDetector
    participant IE as IntelExtractor
    participant EA as EmotionalAnalyzer
    participant PE as PersonaEngine
    participant ED as EngagementDelayer
    participant DB as Database
    participant CB as Callback

    S->>API: POST /api/guvi/analyze
    API->>API: Verify API Key
    API->>API: Rate Limit Check
    API->>O: Process Message

    par Detection
        O->>SD: Detect Scam Type
        O->>IE: Extract Intelligence
        O->>EA: Analyze Emotions
    end

    SD-->>O: {is_scam, type, confidence}
    IE-->>O: {phones, upis, urls}
    EA-->>O: {urgency, fear, greed}

    O->>PE: Generate Response
    PE->>ED: Add Delays
    ED-->>PE: Delayed Response
    PE-->>O: Victim Response

    O->>DB: Store Conversation
    O-->>API: Response Payload
    API-->>S: JSON Response

    opt Scam Confirmed
        API->>CB: Send to GUVI
    end
```

🤖 AGENTS FOLDER (app/agents/)

The brain of the honeypot system. Each agent has a specific role.

1. orchestrator.py - Main Controller

| Aspect | Description |
| --- | --- |
| Purpose | Coordinates all 6 agents to process scam messages |
| What it does | Receives message → Runs detection → Selects persona → Generates response → Computes risk → Returns result |
| Connects to | All other agents, LLM client, memory store |
| Key class | `HoneypotOrchestrator` |
| Key method | `process_message(message, conversation_id)` |
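
The fan-out step, where detection, extraction, and emotional analysis run side by side, might look roughly like the sketch below. It assumes the agents are async callables; the three stub functions and the shape of the returned dict are hypothetical stand-ins, not the project's actual code.

```python
import asyncio

# Hypothetical stand-ins for the real agents; names and return shapes are
# assumptions based on the diagrams in this document.
async def detect(message: str) -> dict:
    is_scam = "lottery" in message.lower()
    return {"is_scam": is_scam,
            "scam_type": "lottery" if is_scam else None,
            "confidence": 0.9 if is_scam else 0.1}

async def extract(message: str) -> dict:
    return {"urls": [w for w in message.split() if w.startswith("http")]}

async def analyze_emotions(message: str) -> dict:
    return {"urgency": 1.0 if "now" in message.lower() else 0.0}

async def process_message(message: str, conversation_id: str) -> dict:
    # Run the three analysis agents concurrently and wait for all of them.
    detection, intel, emotions = await asyncio.gather(
        detect(message), extract(message), analyze_emotions(message)
    )
    return {
        "conversation_id": conversation_id,
        "detection": detection,
        "intelligence": intel,
        "emotions": emotions,
    }

result = asyncio.run(
    process_message("You won a lottery! Pay now at http://x.example", "c1")
)
```

Persona selection, response generation, and risk scoring would follow once these three results are available.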

2. scam_detector.py - Scam Detection Agent

| Aspect | Description |
| --- | --- |
| Purpose | Detects if a message is a scam and classifies the type |
| What it does | Hybrid detection using keywords + LLM classification |
| Contains | `SCAM_DATABASE` with 10 scam types (lottery, job, banking, etc.) |
| Connects to | LLM client, orchestrator |
| Key method | `detect(message) → {is_scam, scam_type, confidence}` |
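
One way to read "hybrid" is a cheap keyword pass first, with the LLM consulted only when the keywords are inconclusive. The sketch below follows that reading. The keyword lists, the threshold of two hits, the confidence values, and the `llm_classify` callable are all illustrative assumptions; the real `SCAM_DATABASE` is not reproduced here.

```python
from typing import Callable, Optional

# Illustrative keyword table; not the project's actual SCAM_DATABASE.
KEYWORDS = {
    "lottery": ["lottery", "prize", "winner"],
    "banking": ["kyc", "account blocked", "otp"],
}

def detect(message: str,
           llm_classify: Optional[Callable[[str], dict]] = None) -> dict:
    text = message.lower()
    # Stage 1: count keyword hits per scam type.
    hits = {t: sum(k in text for k in kws) for t, kws in KEYWORDS.items()}
    best_type, best_hits = max(hits.items(), key=lambda kv: kv[1])
    if best_hits >= 2:
        # Strong keyword evidence: no LLM call needed.
        return {"is_scam": True, "scam_type": best_type, "confidence": 0.9}
    if llm_classify is not None:
        # Stage 2: defer ambiguous messages to the (hypothetical) LLM classifier.
        return llm_classify(message)
    # No LLM available: fall back to a low-confidence keyword verdict.
    return {"is_scam": best_hits == 1,
            "scam_type": best_type if best_hits else None,
            "confidence": 0.5 if best_hits else 0.1}
```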

3. persona_engine.py - Persona Agent

| Aspect | Description |
| --- | --- |
| Purpose | Generates believable victim responses to engage scammers |
| What it does | Selects persona based on scam type, generates Hinglish/Hindi responses |
| Contains | `PERSONAS` dict with 10 personas (Sharma Uncle, Rahul Kumar, etc.) |
| Response phases | hook → engage → extract → stall → self_correct |
| Key method | `generate_response(scam_type, phase, history)` |

4. adaptive_strategy.py - Strategy Agent

| Aspect | Description |
| --- | --- |
| Purpose | Adapts honeypot behavior based on scammer actions |
| What it does | Analyzes scammer behavior, determines phase, adjusts strategy |
| Behaviors detected | `pushing_payment`, `building_trust`, `aggressive`, `confused` |
| Connects to | Persona engine, orchestrator |
| Key method | `adapt_strategy(scammer_message, history)` |
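
As a rough illustration of behavior detection driving the conversation phase, the sketch below matches cue phrases and maps each behavior to a next phase. The four behavior names and the phase names come from this document; the cue phrases, the behavior-to-phase mapping, and the return shape are assumptions made for the example.

```python
# Cue phrases and the behavior-to-phase mapping are illustrative assumptions.
BEHAVIOR_CUES = {
    "pushing_payment": ["pay", "transfer", "send money", "upi"],
    "aggressive": ["immediately", "arrest", "police", "last warning"],
    "building_trust": ["trust me", "official", "don't worry"],
    "confused": ["what?", "who are you", "not understand"],
}
NEXT_PHASE = {
    "pushing_payment": "extract",
    "aggressive": "stall",
    "building_trust": "engage",
    "confused": "self_correct",
}

def adapt_strategy(scammer_message: str, history: list) -> dict:
    text = scammer_message.lower()
    for behavior, cues in BEHAVIOR_CUES.items():
        if any(cue in text for cue in cues):
            return {"behavior": behavior, "phase": NEXT_PHASE[behavior]}
    # No cue matched: open with the hook on the first turn, else keep engaging.
    return {"behavior": "unknown", "phase": "hook" if not history else "engage"}
```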

5. intelligence_extractor.py - Intel Agent

| Aspect | Description |
| --- | --- |
| Purpose | Extracts actionable intelligence from messages |
| What it does | Regex-based extraction of phone, UPI, bank, URLs |
| Connects to | Orchestrator, threat engine |
| Key method | `extract(message) → {phone_numbers, upi_ids, ...}` |
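
A minimal sketch of regex-based extraction follows. The patterns are simplified assumptions (Indian 10-digit mobile numbers with an optional +91 prefix, UPI handles of the form `name@provider`, and http/https URLs) and are not the project's actual expressions.

```python
import re

# Simplified patterns for illustration; real-world formats vary more widely.
PHONE_RE = re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)")
# A UPI handle has no dotted domain after "@", which separates it from an email.
UPI_RE = re.compile(r"\b[\w.-]{2,}@[a-zA-Z]{2,}\b(?!\.[a-zA-Z])")
URL_RE = re.compile(r"https?://\S+")

def extract(message: str) -> dict:
    return {
        "phone_numbers": PHONE_RE.findall(message),
        "upi_ids": UPI_RE.findall(message),
        "urls": URL_RE.findall(message),
    }

sample = "Call +91 9876543210 or pay scammer@okaxis at http://fake-bank.example/kyc"
found = extract(sample)
```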

6. conversation_manager.py - Memory Manager

| Aspect | Description |
| --- | --- |
| Purpose | Manages multi-turn conversation state |
| What it does | Tracks history, phase progression, trust evolution |
| Connects to | Memory store, orchestrator |
| Key method | `get_conversation(id)`, `update_conversation(...)` |

🌐 API FOLDER (app/api/)

1. routes.py - API Endpoints

| Aspect | Description |
| --- | --- |
| Purpose | Defines all REST API endpoints |
| Key endpoints | `/api/v1/analyze`, `/api/guvi/analyze`, `/api/v1/scam-types` |
| Security | `verify_api_key()` with `x-api-key` header |
| Connects to | Orchestrator, GUVI handler, schemas |
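
The input-layer gate from the architecture diagram (401 on a bad key, 429 when the rate limit is exceeded) can be sketched without any web framework. The key value, the limits, and the `check_request` helper are hypothetical; the project's `verify_api_key()` and its rate limiter may work differently.

```python
import hmac
import time
from collections import defaultdict, deque
from typing import Optional

API_KEY = "change-me"   # hypothetical; a real key would be loaded from config
MAX_REQUESTS = 3        # illustrative limit per client
WINDOW_SECONDS = 60.0   # illustrative sliding-window length

_recent = defaultdict(deque)  # client id -> timestamps of accepted requests

def check_request(client_id: str, provided_key: str,
                  now: Optional[float] = None) -> int:
    """Return the HTTP status the input layer would answer with."""
    # Constant-time comparison avoids leaking key contents through timing.
    if not hmac.compare_digest(provided_key.encode(), API_KEY.encode()):
        return 401
    now = time.monotonic() if now is None else now
    window = _recent[client_id]
    # Drop timestamps that have fallen out of the sliding window.
    while window and now - window[0] >= WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_REQUESTS:
        return 429
    window.append(now)
    return 200
```

The `now` parameter exists only so the window logic can be exercised deterministically; callers would normally omit it.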

2. schemas.py - Pydantic Models

| Aspect | Description |
| --- | --- |
| Purpose | Request/response validation models |
| Key models | `AnalyzeRequest`, `AnalyzeResponse`, `GUVIInputRequest`, `GUVIOutputResponse` |
| Connects to | Routes, GUVI handler |

🧠 CORE FOLDER (app/core/)

1. llm_client.py - LLM Client

| Aspect | Description |
| --- | --- |
| Purpose | Unified interface to multiple LLM providers |
| Supports | OpenAI, Anthropic, Groq, OpenRouter |
| Fallback | Uses mock responses if no API key |
| Key method | `generate(prompt) → response` |
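
The mock fallback could be arranged along these lines: pick the first provider that has a key configured, otherwise answer with a canned response. The environment variable names, the provider order, and the mock output format are assumptions, and the real provider calls are deliberately left out.

```python
import os
from typing import Optional

# Environment variable names are assumptions; the project's config may differ.
PROVIDER_ENV = {
    "groq": "GROQ_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}

def pick_provider(env: Optional[dict] = None) -> str:
    env = os.environ if env is None else env
    # Return the first provider with a key set, else fall back to "mock".
    for name, var in PROVIDER_ENV.items():
        if env.get(var):
            return name
    return "mock"

def generate(prompt: str, env: Optional[dict] = None) -> str:
    provider = pick_provider(env)
    if provider == "mock":
        # Canned response so the rest of the pipeline works without any key.
        return f"[mock] {prompt[:40]}"
    raise NotImplementedError(f"wire up the {provider} SDK here")
```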

2. memory.py - Conversation Memory

| Aspect | Description |
| --- | --- |
| Purpose | In-memory conversation storage |
| Contains | `ConversationMemory` class with TTL support |
| Stores | History, phase, `trust_score`, `aggregated_intelligence` |
| Key method | `get_or_create(conversation_id)` |
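
A TTL-backed `get_or_create` might be sketched as follows. The stored fields mirror the table above, while the default values, the expiry-on-access policy, and the `now` test hook are assumptions.

```python
import time
from typing import Optional

class ConversationMemory:
    """Minimal in-memory store with per-conversation expiry (illustrative)."""

    def __init__(self, ttl_seconds: float = 3600.0):
        self.ttl = ttl_seconds
        self._store = {}  # conversation_id -> (last_access, state)

    def get_or_create(self, conversation_id: str,
                      now: Optional[float] = None) -> dict:
        now = time.monotonic() if now is None else now
        entry = self._store.get(conversation_id)
        if entry is None or now - entry[0] > self.ttl:
            # Missing or expired: start a fresh conversation state.
            state = {"history": [], "phase": "hook", "trust_score": 0.0,
                     "aggregated_intelligence": {}}
        else:
            state = entry[1]
        self._store[conversation_id] = (now, state)  # refresh last access
        return state
```

Because the store lives in process memory, state is lost on restart and is not shared between worker processes.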

3. prompts.py - LLM Prompts

| Aspect | Description |
| --- | --- |
| Purpose | System prompts for LLM interactions |
| Contains | `SCAM_DETECTION_PROMPT`, `RESPONSE_GENERATION_PROMPT`, `PHASE_GOALS` |

🪤 DECOYS FOLDER (app/decoys/)

1. fake_endpoints.py - Decoy Portals

| Aspect | Description |
| --- | --- |
| Purpose | Fake banking/UPI pages to trap scammers |
| Endpoints | `/decoys/upi/status`, `/decoys/bank/kyc-portal`, `/decoys/secure/otp-generate` |
| Why | Scammers click these links thinking they're real |

2. victim_profiles.py - Synthetic Victims

| Aspect | Description |
| --- | --- |
| Purpose | Fake victim data for honeypot responses |
| Contains | Synthetic names, bank accounts, UPI IDs |
| Why | No real PII is ever used |

📊 INTELLIGENCE FOLDER (app/intelligence/)

1. threat_engine.py - Threat Intelligence

| Aspect | Description |
| --- | --- |
| Purpose | Generates threat intelligence reports |
| Creates | Campaign IDs, IOCs, TTPs (MITRE ATT&CK) |
| Key method | `generate_threat_intel(scam_type, entities)` |

2. risk_scorer.py - Risk Scoring

| Aspect | Description |
| --- | --- |
| Purpose | Computes weighted risk score with explainability |
| Factors | Keywords, payment requests, threat level, campaign match |
| Key method | `compute_risk(detection_result) → {score, explanation}` |
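
A weighted score with a per-factor explanation can be as simple as the sketch below. The four factor names follow the table; the weights, the 0-to-1 input scale, and the explanation format are assumptions, and the input here is a plain dict of factor values rather than the project's detection result object.

```python
# Weights are illustrative assumptions; they sum to 1.0 so the score
# stays in [0, 1] when every factor is in [0, 1].
WEIGHTS = {
    "keywords": 0.3,
    "payment_request": 0.3,
    "threat_level": 0.2,
    "campaign_match": 0.2,
}

def compute_risk(factors: dict) -> dict:
    """Each factor is a value in [0, 1]; missing factors count as 0."""
    contributions = {name: weight * float(factors.get(name, 0.0))
                     for name, weight in WEIGHTS.items()}
    score = round(sum(contributions.values()), 3)
    # Explain the score by listing non-zero contributions, largest first.
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    explanation = [f"{name}: +{value:.2f}" for name, value in ranked if value > 0]
    return {"score": score, "explanation": explanation}
```

Keeping the per-factor contributions alongside the total is what makes the score explainable: a reviewer can see which signals drove it.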

3. campaign_tracker.py - Campaign Clustering

| Aspect | Description |
| --- | --- |
| Purpose | Groups scam messages into campaigns |
| Uses | Entity similarity to cluster related attacks |
| Key method | `get_or_create_campaign(entities)` |
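
"Entity similarity" is not defined further in this document. One plausible reading is Jaccard overlap between the entities in a new message and those already attached to a campaign, which the sketch below assumes. The threshold, the campaign ID format, and the class layout are hypothetical.

```python
import itertools

class CampaignTracker:
    """Groups messages whose extracted entities overlap (illustrative)."""

    def __init__(self, threshold: float = 0.3):
        self.threshold = threshold
        self.campaigns = {}  # campaign_id -> set of entities seen so far
        self._ids = itertools.count(1)

    def get_or_create_campaign(self, entities: set) -> str:
        for cid, known in self.campaigns.items():
            union = known | entities
            # Jaccard similarity: shared entities / all entities.
            if union and len(known & entities) / len(union) >= self.threshold:
                known |= entities  # grow the campaign's entity set
                return cid
        cid = f"CAMP-{next(self._ids):04d}"
        self.campaigns[cid] = set(entities)
        return cid
```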

4. telemetry.py - Request Telemetry

| Aspect | Description |
| --- | --- |
| Purpose | Captures IP, geo, device fingerprint |
| Uses | ip-api.com for geolocation |
| Key method | `capture_telemetry(request)` |

5. scammer_profiler.py - Behavioral Profiling

| Aspect | Description |
| --- | --- |
| Purpose | Builds behavioral profiles of scammers |
| Tracks | Aggression, persistence, tactics used |

6. engagement_metrics.py - Metrics Tracking

| Aspect | Description |
| --- | --- |
| Purpose | Tracks honeypot engagement statistics |
| Metrics | Duration, message count, intelligence extracted |

7. honeytokens.py - Honeytoken Generator

| Aspect | Description |
| --- | --- |
| Purpose | Generates fake credentials as bait |
| Creates | Fake UPI IDs, bank accounts, phone numbers |

🚔 ENFORCEMENT FOLDER (app/enforcement/)

1. police_api.py - Cyber Police Simulation

| Aspect | Description |
| --- | --- |
| Purpose | Simulates NCRP (cybercrime.gov.in) integration |
| Creates | Report IDs, priority levels, recommended actions |
| Classes | `CyberPoliceAPI`, `ActionRecommendationAPI` |

2. awareness.py - Public Awareness

| Aspect | Description |
| --- | --- |
| Purpose | Generates scam awareness content |
| Creates | Warning messages, educational tips |

🔧 UTILS FOLDER (app/utils/)

1. guvi_handler.py - GUVI Format Translator

| Aspect | Description |
| --- | --- |
| Purpose | Translates GUVI format ↔ internal format |
| Why | GUVI uses different field names (`sessionId` vs `conversation_id`) |
| Key method | `process_guvi_message(request) → GUVIOutputResponse` |
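
The field-name translation can be illustrated with a pair of key maps. Only the `sessionId` / `conversation_id` pair is taken from this document; the `message` entry and the `rename_keys` helper are hypothetical placeholders.

```python
# Only the sessionId <-> conversation_id pair comes from this document;
# the "message" entry is a hypothetical placeholder.
GUVI_TO_INTERNAL = {
    "sessionId": "conversation_id",
    "message": "message",
}
INTERNAL_TO_GUVI = {v: k for k, v in GUVI_TO_INTERNAL.items()}

def rename_keys(payload: dict, mapping: dict) -> dict:
    # Unknown keys pass through unchanged so no data is silently dropped.
    return {mapping.get(k, k): v for k, v in payload.items()}

internal = rename_keys({"sessionId": "s1", "message": "hi"}, GUVI_TO_INTERNAL)
external = rename_keys(internal, INTERNAL_TO_GUVI)
```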

2. callback_client.py - GUVI Callback Sender

| Aspect | Description |
| --- | --- |
| Purpose | Sends final result to GUVI evaluation endpoint |
| Endpoint | `POST https://hackathon.guvi.in/api/updateHoneyPotFinalResult` |
| Trigger | Auto-sends when `scamDetected = true` |
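
The trigger rule can be sketched with the standard library alone. The example builds the POST request but does not send it; the payload fields other than `scamDetected` and the `build_callback` helper are assumptions, since the callback schema is not given here.

```python
import json
import urllib.request

CALLBACK_URL = "https://hackathon.guvi.in/api/updateHoneyPotFinalResult"

def build_callback(result: dict):
    """Return a ready-to-send POST request, or None if no scam was detected."""
    if not result.get("scamDetected"):
        return None
    body = json.dumps(result).encode("utf-8")
    return urllib.request.Request(
        CALLBACK_URL,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )

# Sending is left to the caller, for example:
#   urllib.request.urlopen(req, timeout=5)
req = build_callback({"scamDetected": True, "sessionId": "s1"})
```

A production client would likely add a timeout, retries, and error logging so that a failed callback does not block the API response.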

3. extractors.py - Entity Extractors

| Aspect | Description |
| --- | --- |
| Purpose | Regex patterns for entity extraction |
| Extracts | Phone, UPI, bank account, IFSC, email, URL |

4. logger.py - Structured Logging

| Aspect | Description |
| --- | --- |
| Purpose | Consistent logging across all agents |
| Class | `AgentLogger` |

🔗 HOW COMPONENTS CONNECT

```
┌──────────────────────────────────────────────────────────────────────┐
│                           USER REQUEST                               │
│                    POST /api/guvi/analyze                            │
└──────────────────────────────┬───────────────────────────────────────┘
                               ▼
┌──────────────────────────────────────────────────────────────────────┐
│  routes.py → verify_api_key() → guvi_handler.py                      │
└──────────────────────────────┬───────────────────────────────────────┘
                               ▼
┌──────────────────────────────────────────────────────────────────────┐
│                    ORCHESTRATOR (orchestrator.py)                    │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐     │
│  │ Scam        │ │ Intel       │ │ Persona     │ │ Adaptive    │     │
│  │ Detector    │ │ Extractor   │ │ Engine      │ │ Strategy    │     │
│  └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘     │
│         │               │               │               │            │
│         ▼               ▼               ▼               ▼            │
│  ┌─────────────────────────────────────────────────────────────┐     │
│  │                    LLM CLIENT (llm_client.py)               │     │
│  │     Groq / OpenAI / Anthropic / OpenRouter / Mock           │     │
│  └─────────────────────────────────────────────────────────────┘     │
│         │               │               │               │            │
│         ▼               ▼               ▼               ▼            │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐     │
│  │ Memory      │ │ Threat      │ │ Risk        │ │ Campaign    │     │
│  │ Store       │ │ Engine      │ │ Scorer      │ │ Tracker     │     │
│  └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘     │
└──────────────────────────────┬───────────────────────────────────────┘
                               ▼
┌──────────────────────────────────────────────────────────────────────┐
│                    RESPONSE + CALLBACK                               │
│  GUVIOutputResponse → callback_client.py → GUVI Evaluation           │
└──────────────────────────────────────────────────────────────────────┘
```

📊 ROOT FILES

| File | Purpose |
| --- | --- |
| `main.py` | FastAPI app entry point, startup/shutdown events |
| `config.py` | Environment variables, feature flags |
| `dashboard.py` | Streamlit analytics UI with live charts |
| `simulate_attack.py` | Red Team vs Blue Team simulation script |
| `verify_honeypot.py` | Quick verification of all endpoints |
| `Dockerfile` | Container deployment for HF Spaces |
| `requirements.txt` | Python dependencies |
| `README.md` | Project documentation with API examples |

🔑 KEY DATA FLOWS

1. Message Analysis Flow

Message → ScamDetector → PersonaEngine → AdaptiveStrategy → Response

2. Intelligence Flow

Message → IntelExtractor → ThreatEngine → CampaignTracker → Report

3. Risk Scoring Flow

DetectionResult → RiskScorer → Explanation → AnalyzeResponse

4. GUVI Callback Flow

ScamDetected=true → CallbackClient → hackathon.guvi.in → Evaluation

Generated for GUVI India AI Impact Buildathon 2025