🏗️ SCAM HONEYPOT - Complete Architecture Documentation

📁 Project Structure Overview

sentinel-scam-honeypot/
├── app/                          # Main application code
│   ├── agents/                   # 🤖 AI Agents (brain of the system)
│   ├── api/                      # 🌐 REST API endpoints
│   ├── core/                     # 🧠 Core components (LLM, memory, prompts)
│   ├── decoys/                   # 🪤 Fake endpoints to trap scammers
│   ├── enforcement/              # 🚔 Law enforcement simulation
│   ├── intelligence/             # 📊 Threat intelligence modules
│   ├── templates/                # 💻 HTML templates
│   ├── utils/                    # 🔧 Utility functions
│   ├── main.py                   # FastAPI entry point
│   └── config.py                 # Configuration settings
├── dashboard.py                  # 📈 Streamlit analytics dashboard
├── simulate_attack.py            # ⚔️ Red vs Blue simulation
├── verify_honeypot.py            # ✅ System verification script
├── Dockerfile                    # 🐳 Docker deployment
├── requirements.txt              # 📦 Python dependencies
└── README.md                     # 📖 Project documentation

🎯 System Architecture Diagram

flowchart TB
    subgraph Input["📥 Input Layer"]
        A[Scammer Message] --> B[FastAPI Routes]
        B --> C{API Key Valid?}
        C -->|No| D[401 Unauthorized]
        C -->|Yes| E[Rate Limiter]
        E -->|Exceeded| F[429 Too Many Requests]
        E -->|OK| G[GUVI Handler]
    end

    subgraph Orchestrator["🤖 Orchestrator Layer"]
        G --> H[HoneypotOrchestrator]
        H --> I[Scam Detector]
        H --> J[Intel Extractor]
        H --> K[Emotional Analyzer]
        I --> L[LLM Client]
        L --> M[Groq/OpenAI/Anthropic/OpenRouter]
    end

    subgraph Response["💬 Response Generation"]
        I --> N[Persona Engine]
        N --> O[Adaptive Strategy]
        O --> P[Engagement Delayer]
        P --> Q[Response Text]
    end

    subgraph Intelligence["📊 Intelligence Layer"]
        J --> R[Threat Engine]
        K --> R
        R --> S[Campaign Tracker]
        S --> T[Risk Scorer]
    end

    subgraph Storage["💾 Persistence Layer"]
        H --> U[SQLite/PostgreSQL]
        H --> V[Audit Logger]
        V --> W[SIEM Export]
    end

    subgraph Output["📤 Output Layer"]
        Q --> X[API Response]
        T --> X
        X --> Y[GUVI Callback]
        X --> Z[Stakeholder Exports]
        Z --> AA[CERT-In STIX 2.1]
        Z --> AB[TRAI UCC Report]
        Z --> AC[NPCI Fraud Report]
        Z --> AD[NCRP Complaint]
    end

    style Input fill:#e3f2fd
    style Orchestrator fill:#fff3e0
    style Response fill:#e8f5e9
    style Intelligence fill:#fce4ec
    style Storage fill:#f3e5f5
    style Output fill:#e0f7fa

🔄 Agent Interaction Flow

sequenceDiagram
    participant S as Scammer
    participant API as FastAPI
    participant O as Orchestrator
    participant SD as ScamDetector
    participant IE as IntelExtractor
    participant EA as EmotionalAnalyzer
    participant PE as PersonaEngine
    participant ED as EngagementDelayer
    participant DB as Database
    participant CB as Callback

    S->>API: POST /api/guvi/analyze
    API->>API: Verify API Key
    API->>API: Rate Limit Check
    API->>O: Process Message
    
    par Detection
        O->>SD: Detect Scam Type
        O->>IE: Extract Intelligence
        O->>EA: Analyze Emotions
    end
    
    SD-->>O: {is_scam, type, confidence}
    IE-->>O: {phones, upis, urls}
    EA-->>O: {urgency, fear, greed}
    
    O->>PE: Generate Response
    PE->>ED: Add Delays
    ED-->>PE: Delayed Response
    PE-->>O: Victim Response
    
    O->>DB: Store Conversation
    O-->>API: Response Payload
    API-->>S: JSON Response
    
    opt Scam Confirmed
        API->>CB: Send to GUVI
    end
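
The `par Detection` block implies that the three analyzers can run concurrently. A minimal sketch of that fan-out using `asyncio.gather` follows. The function names and stub bodies are hypothetical stand-ins, since the real agents' interfaces are not shown in this document; only the result shapes are taken from the diagram.

```python
import asyncio

# Hypothetical stubs that only mimic the result shapes shown in the diagram.
async def detect_scam(message: str) -> dict:
    hit = "lottery" in message.lower()
    return {"is_scam": hit, "type": "lottery" if hit else None,
            "confidence": 0.9 if hit else 0.1}

async def extract_intel(message: str) -> dict:
    return {"phones": [], "upis": [], "urls": []}

async def analyze_emotions(message: str) -> dict:
    return {"urgency": 0.0, "fear": 0.0, "greed": 0.0}

async def run_detection(message: str) -> dict:
    # gather() runs the three coroutines concurrently and returns their
    # results in the same order the coroutines were passed in.
    detection, intel, emotions = await asyncio.gather(
        detect_scam(message), extract_intel(message), analyze_emotions(message)
    )
    return {"detection": detection, "intel": intel, "emotions": emotions}

result = asyncio.run(run_detection("You won a LOTTERY, pay the fee now"))
```

With real agents the benefit comes from overlapping I/O-bound work such as LLM calls; CPU-bound regex extraction gains little from this pattern.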

🤖 AGENTS FOLDER (`app/agents/`)

The agents are the brain of the honeypot system, and each one has a specific role.

1. `orchestrator.py` - Main Controller

Aspect	Description
Purpose	Coordinates the other agents to process scam messages
What it does	Receives message → Runs detection → Selects persona → Generates response → Computes risk → Returns result
Connects to	All other agents, LLM client, memory store
Key class	`HoneypotOrchestrator`
Key method	`process_message(message, conversation_id)`

2. `scam_detector.py` - Scam Detection Agent

Aspect	Description
Purpose	Detects if a message is a scam and classifies the type
What it does	Hybrid detection using keywords + LLM classification
Contains	`SCAM_DATABASE` with 10 scam types (lottery, job, banking, etc.)
Connects to	LLM client, orchestrator
Key method	`detect(message) → {is_scam, scam_type, confidence}`
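
How the keyword and LLM signals are combined is not specified here. One plausible arrangement, sketched below under that assumption, is to trust strong keyword evidence directly and defer to the LLM only when the keyword score is weak but non-zero. The keyword lists, the `threshold` parameter, and the `llm_classify` callback are all hypothetical.

```python
from typing import Callable, Optional

# Illustrative keyword lists only; the real SCAM_DATABASE contents are not shown here.
KEYWORDS = {
    "lottery": ["lottery", "prize", "winner"],
    "banking": ["kyc", "account blocked", "otp"],
}

def keyword_score(message: str) -> tuple:
    """Return (best scam type, fraction of that type's keywords found)."""
    text = message.lower()
    best_type, best = None, 0.0
    for scam_type, words in KEYWORDS.items():
        score = sum(w in text for w in words) / len(words)
        if score > best:
            best_type, best = scam_type, score
    return best_type, best

def detect(message: str,
           llm_classify: Optional[Callable[[str], dict]] = None,
           threshold: float = 0.5) -> dict:
    scam_type, score = keyword_score(message)
    # Assumed policy: weak but non-zero keyword evidence is ambiguous,
    # so those messages are handed to the optional LLM classifier.
    if llm_classify is not None and 0.0 < score < threshold:
        return llm_classify(message)
    is_scam = score >= threshold
    return {"is_scam": is_scam,
            "scam_type": scam_type if is_scam else None,
            "confidence": round(score, 2)}
```

Calling `detect("Your KYC is pending, share the OTP")` matches two of the three banking keywords, so the keyword path alone classifies it without an LLM call.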

3. `persona_engine.py` - Persona Agent

Aspect	Description
Purpose	Generates believable victim responses to engage scammers
What it does	Selects a persona based on scam type and generates Hinglish/Hindi responses
Contains	`PERSONAS` dict with 10 personas (Sharma Uncle, Rahul Kumar, etc.)
Response phases	hook → engage → extract → stall → self_correct
Key method	`generate_response(scam_type, phase, history)`
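
The phase sequence and persona selection can be sketched as a small lookup plus a bounded step function. The persona names below come from this document, but the mapping from scam type to persona, the default persona, and the rule that the last phase repeats are assumptions made for illustration.

```python
PHASES = ["hook", "engage", "extract", "stall", "self_correct"]

# Hypothetical mapping: which persona serves which scam type is not stated
# in this document.
PERSONA_BY_SCAM_TYPE = {"banking": "Sharma Uncle", "job": "Rahul Kumar"}
DEFAULT_PERSONA = "Sharma Uncle"

def select_persona(scam_type: str) -> str:
    """Pick a persona for the scam type, falling back to a default."""
    return PERSONA_BY_SCAM_TYPE.get(scam_type, DEFAULT_PERSONA)

def next_phase(current: str) -> str:
    """Advance one phase; the final phase repeats instead of wrapping around."""
    index = PHASES.index(current)
    return PHASES[min(index + 1, len(PHASES) - 1)]
```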

4. `adaptive_strategy.py` - Strategy Agent

Aspect	Description
Purpose	Adapts honeypot behavior based on scammer actions
What it does	Analyzes scammer behavior, determines phase, adjusts strategy
Behaviors detected	pushing_payment, building_trust, aggressive, confused
Connects to	Persona engine, orchestrator
Key method	`adapt_strategy(scammer_message, history)`
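
As a rough illustration of behavior detection, the sketch below counts cue phrases per behavior and maps the winning behavior to a conversation phase. The cue phrases, the behavior-to-phase mapping, and the simplified signature (a `current_phase` argument instead of the full `history`) are all assumptions; the real agent may work quite differently.

```python
# Hypothetical cue phrases for each behavior named in the table above.
BEHAVIOR_CUES = {
    "pushing_payment": ("pay", "transfer", "send money", "upi"),
    "aggressive": ("immediately", "arrest", "last warning"),
    "building_trust": ("trust me", "official", "don't worry"),
    "confused": ("what do you mean", "i don't understand", "??"),
}

# Assumed policy for which phase answers which behavior.
PHASE_BY_BEHAVIOR = {
    "pushing_payment": "extract",
    "aggressive": "stall",
    "building_trust": "engage",
    "confused": "self_correct",
}

def classify_behavior(message: str) -> str:
    text = message.lower()
    counts = {b: sum(cue in text for cue in cues)
              for b, cues in BEHAVIOR_CUES.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unknown"

def adapt_strategy(scammer_message: str, current_phase: str = "hook") -> dict:
    behavior = classify_behavior(scammer_message)
    # Keep the current phase when no behavior is recognised.
    phase = PHASE_BY_BEHAVIOR.get(behavior, current_phase)
    return {"behavior": behavior, "phase": phase}
```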

5. `intelligence_extractor.py` - Intel Agent

Aspect	Description
Purpose	Extracts actionable intelligence from messages
What it does	Regex-based extraction of phone numbers, UPI IDs, bank accounts, and URLs
Connects to	Orchestrator, threat engine
Key method	`extract(message) → {phone_numbers, upi_ids, ...}`
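
A minimal sketch of regex-based extraction is shown below. The patterns are illustrative assumptions rather than the project's own: the phone pattern targets 10-digit Indian mobile numbers with an optional `+91` prefix, and the UPI pattern rejects handles followed by a dotted domain so that ordinary email addresses are not reported as UPI IDs. The `urls` key is also assumed.

```python
import re

# Indian mobile number: optional +91 prefix, then 10 digits starting with 6-9.
PHONE_RE = re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)")
# UPI ID: name@handle, where the handle is NOT followed by ".tld" (that would be an email).
UPI_RE = re.compile(r"\b[\w.-]{2,}@[A-Za-z]{2,}\b(?!\.[A-Za-z])")
URL_RE = re.compile(r"https?://\S+")

def extract(message: str) -> dict:
    return {
        "phone_numbers": PHONE_RE.findall(message),
        "upi_ids": UPI_RE.findall(message),
        # Trim sentence punctuation that \S+ picks up at the end of a URL.
        "urls": [u.rstrip(".,;)") for u in URL_RE.findall(message)],
    }

intel = extract(
    "Call +91 9876543210 or pay scammer@ybl, then open http://fake-bank.example/kyc."
)
```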

6. `conversation_manager.py` - Memory Manager

Aspect	Description
Purpose	Manages multi-turn conversation state
What it does	Tracks history, phase progression, trust evolution
Connects to	Memory store, orchestrator
Key methods	`get_conversation(id)`, `update_conversation(...)`

🌐 API FOLDER (`app/api/`)

1. `routes.py` - API Endpoints

Aspect	Description
Purpose	Defines all REST API endpoints
Key endpoints	`/api/v1/analyze`, `/api/guvi/analyze`, `/api/v1/scam-types`
Security	`verify_api_key()` with x-api-key header
Connects to	Orchestrator, GUVI handler, schemas
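
The gatekeeping order from the architecture diagram (API key first, then rate limit) can be sketched without any framework. The sliding-window algorithm, the class name `SlidingWindowLimiter`, and the helper `check_request` are assumptions for illustration; the document does not say how `verify_api_key()` or the rate limiter are implemented.

```python
import hmac
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most max_requests per client within a rolling time window."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = defaultdict(deque)  # client_id -> timestamps of accepted requests

    def allow(self, client_id: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True

def check_request(api_key: str, expected_key: str,
                  limiter: SlidingWindowLimiter, client_id: str,
                  now: float = None) -> int:
    """Return the HTTP status the input layer would produce."""
    # compare_digest avoids leaking key contents through timing differences.
    if not hmac.compare_digest(api_key.encode(), expected_key.encode()):
        return 401
    if not limiter.allow(client_id, now):
        return 429
    return 200
```

Checking the key before the rate limit means unauthenticated traffic never consumes a legitimate client's quota.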

2. `schemas.py` - Pydantic Models

Aspect	Description
Purpose	Request/response validation models
Key models	`AnalyzeRequest`, `AnalyzeResponse`, `GUVIInputRequest`, `GUVIOutputResponse`
Connects to	Routes, GUVI handler

🧠 CORE FOLDER (`app/core/`)

1. `llm_client.py` - LLM Client

Aspect	Description
Purpose	Unified interface to multiple LLM providers
Supports	OpenAI, Anthropic, Groq, OpenRouter
Fallback	Falls back to mock responses if no API key is configured
Key method	`generate(prompt) → response`
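
The mock fallback can be sketched as a provider lookup over environment variables. The variable names and the preference order below are assumptions (they follow the naming commonly used by each provider's SDK, not anything confirmed for this project), and real provider calls are deliberately left out.

```python
import os

# Assumed environment variable names, checked in an assumed preference order.
PROVIDER_ENV_VARS = {
    "groq": "GROQ_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}

def select_provider(env=None) -> str:
    """Return the first provider with a configured key, else 'mock'."""
    env = os.environ if env is None else env
    for provider, var in PROVIDER_ENV_VARS.items():
        if env.get(var):
            return provider
    return "mock"

def generate(prompt: str, env=None) -> str:
    provider = select_provider(env)
    if provider == "mock":
        # Canned reply so the pipeline keeps working without any API key.
        return "[mock] " + prompt[:40]
    # Real provider calls are out of scope for this sketch.
    raise NotImplementedError(f"call to {provider} not implemented in this sketch")
```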

2. `memory.py` - Conversation Memory

Aspect	Description
Purpose	In-memory conversation storage
Contains	`ConversationMemory` class with TTL support
Stores	History, phase, trust_score, aggregated_intelligence
Key method	`get_or_create(conversation_id)`
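
A minimal sketch of TTL-based conversation storage follows. The class and method names match the table above, but the internals are assumptions: expiry is measured from the last access, the initial phase is `hook`, and the clock is injectable so expiry can be exercised without waiting.

```python
import time

class ConversationMemory:
    """In-memory store whose entries expire ttl_seconds after their last access."""

    def __init__(self, ttl_seconds: float = 3600.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # conversation_id -> (last_access, state)

    def get_or_create(self, conversation_id: str) -> dict:
        now = self._clock()
        entry = self._store.get(conversation_id)
        if entry is None or now - entry[0] > self._ttl:
            # Missing or expired: start a fresh conversation state.
            state = {"history": [], "phase": "hook",
                     "trust_score": 0.0, "aggregated_intelligence": {}}
        else:
            state = entry[1]
        self._store[conversation_id] = (now, state)  # refresh last access
        return state
```

Because the state lives only in process memory, it is lost on restart and is not shared between worker processes.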

3. `prompts.py` - LLM Prompts

Aspect	Description
Purpose	System prompts for LLM interactions
Contains	`SCAM_DETECTION_PROMPT`, `RESPONSE_GENERATION_PROMPT`, `PHASE_GOALS`

🪤 DECOYS FOLDER (`app/decoys/`)

1. `fake_endpoints.py` - Decoy Portals

Aspect	Description
Purpose	Fake banking/UPI pages to trap scammers
Endpoints	`/decoys/upi/status`, `/decoys/bank/kyc-portal`, `/decoys/secure/otp-generate`
Why	Scammers click these links thinking they're real

2. `victim_profiles.py` - Synthetic Victims

Aspect	Description
Purpose	Fake victim data for honeypot responses
Contains	Synthetic names, bank accounts, UPI IDs
Why	No real PII is ever used

📊 INTELLIGENCE FOLDER (`app/intelligence/`)

1. `threat_engine.py` - Threat Intelligence

Aspect	Description
Purpose	Generates threat intelligence reports
Creates	Campaign IDs, IOCs, TTPs (MITRE ATT&CK)
Key method	`generate_threat_intel(scam_type, entities)`

2. `risk_scorer.py` - Risk Scoring

Aspect	Description
Purpose	Computes weighted risk score with explainability
Factors	Keywords, payment requests, threat level, campaign match
Key method	`compute_risk(detection_result) → {score, explanation}`
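
To make "weighted risk score with explainability" concrete, the sketch below combines the four listed factors linearly and records each factor's contribution. The weights, the 0 to 100 scale, and the assumption that every factor arrives as a value between 0 and 1 are all illustrative; the document does not give the real formula.

```python
# Assumed weights; they sum to 1.0 so the score tops out at 100.
WEIGHTS = {
    "keywords": 0.30,
    "payment_request": 0.30,
    "threat_level": 0.25,
    "campaign_match": 0.15,
}

def compute_risk(signals: dict) -> dict:
    score = 0.0
    explanation = []
    for factor, weight in WEIGHTS.items():
        # Clamp each signal to [0, 1]; missing factors count as 0.
        value = min(max(float(signals.get(factor, 0.0)), 0.0), 1.0)
        contribution = weight * value
        score += contribution
        if contribution > 0:
            explanation.append(
                f"{factor}: {value:.2f} x {weight:.2f} = {contribution:.3f}"
            )
    return {"score": round(score * 100, 1), "explanation": explanation}
```

Returning the per-factor lines alongside the score lets a reviewer see why a message was rated high without re-running the pipeline.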

3. `campaign_tracker.py` - Campaign Clustering

Aspect	Description
Purpose	Groups scam messages into campaigns
Uses	Entity similarity to cluster related attacks
Key method	`get_or_create_campaign(entities)`
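
"Entity similarity" is not defined further in this document. One simple reading is Jaccard similarity over the sets of extracted entities, sketched below. The threshold value, the `CAMP-0001` ID format, and the choice to merge new entities into the matched campaign are assumptions.

```python
import itertools

class CampaignTracker:
    """Group messages into campaigns by overlap of their extracted entities."""

    def __init__(self, threshold: float = 0.3):
        self._threshold = threshold
        self._campaigns = {}  # campaign_id -> set of known entities
        self._ids = itertools.count(1)

    @staticmethod
    def _jaccard(a: set, b: set) -> float:
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    def get_or_create_campaign(self, entities) -> str:
        current = set(entities)
        best_id, best_sim = None, 0.0
        for campaign_id, known in self._campaigns.items():
            sim = self._jaccard(current, known)
            if sim > best_sim:
                best_id, best_sim = campaign_id, sim
        if best_id is not None and best_sim >= self._threshold:
            # Similar enough: fold the new entities into the existing campaign.
            self._campaigns[best_id] |= current
            return best_id
        new_id = f"CAMP-{next(self._ids):04d}"
        self._campaigns[new_id] = current
        return new_id
```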

4. `telemetry.py` - Request Telemetry

Aspect	Description
Purpose	Captures IP address, geolocation, and device fingerprint
Uses	ip-api.com for geolocation
Key method	`capture_telemetry(request)`

5. `scammer_profiler.py` - Behavioral Profiling

Aspect	Description
Purpose	Builds behavioral profiles of scammers
Tracks	Aggression, persistence, tactics used

6. `engagement_metrics.py` - Metrics Tracking

Aspect	Description
Purpose	Tracks honeypot engagement statistics
Metrics	Duration, message count, intelligence extracted

7. `honeytokens.py` - Honeytoken Generator

Aspect	Description
Purpose	Generates fake credentials as bait
Creates	Fake UPI IDs, bank accounts, phone numbers
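
A honeytoken is most useful when each one is unique and recorded, so that a later sighting can be traced back to the conversation where it was handed out. The sketch below assumes that design; the function names, the `hp` prefix, and the fictitious `@examplebank` handle are hypothetical.

```python
import secrets

def make_honeytoken_upi(registry: dict, conversation_id: str) -> str:
    """Create a unique fake UPI ID and remember which conversation received it."""
    token = f"hp{secrets.token_hex(4)}@examplebank"  # fictitious handle
    registry[token] = conversation_id
    return token

def find_reused_tokens(message: str, registry: dict) -> list:
    """Return the conversations whose honeytokens appear in a new message."""
    return [conv for token, conv in registry.items() if token in message]
```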

🚔 ENFORCEMENT FOLDER (`app/enforcement/`)

1. `police_api.py` - Cyber Police Simulation

Aspect	Description
Purpose	Simulates NCRP (cybercrime.gov.in) integration
Creates	Report IDs, priority levels, recommended actions
Classes	`CyberPoliceAPI`, `ActionRecommendationAPI`

2. `awareness.py` - Public Awareness

Aspect	Description
Purpose	Generates scam awareness content
Creates	Warning messages, educational tips

🔧 UTILS FOLDER (`app/utils/`)

1. `guvi_handler.py` - GUVI Format Translator

Aspect	Description
Purpose	Translates GUVI format ↔ internal format
Why	GUVI uses different field names (`sessionId` vs `conversation_id`)
Key method	`process_guvi_message(request) → GUVIOutputResponse`
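
The translation step can be sketched as a pair of key-renaming helpers. Only the `sessionId` / `conversation_id` pair is confirmed above; pairing `scamDetected` with `is_scam` is an assumption based on names used elsewhere in this document, and the real GUVI schema likely has more fields.

```python
# GUVI field name -> internal field name. Only the first pair is confirmed
# by this document; the second is an assumed correspondence.
GUVI_TO_INTERNAL = {
    "sessionId": "conversation_id",
    "scamDetected": "is_scam",
}
INTERNAL_TO_GUVI = {v: k for k, v in GUVI_TO_INTERNAL.items()}

def to_internal(payload: dict) -> dict:
    """Rename known GUVI keys; pass unknown keys through unchanged."""
    return {GUVI_TO_INTERNAL.get(k, k): v for k, v in payload.items()}

def to_guvi(result: dict) -> dict:
    """Rename known internal keys back to GUVI names."""
    return {INTERNAL_TO_GUVI.get(k, k): v for k, v in result.items()}
```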

2. `callback_client.py` - GUVI Callback Sender

Aspect	Description
Purpose	Sends final result to GUVI evaluation endpoint
Endpoint	`POST https://hackathon.guvi.in/api/updateHoneyPotFinalResult`
Trigger	Auto-sends when `scamDetected = true`
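
The trigger rule can be sketched with the standard library by building the request only when `scamDetected` is true. The payload shape is an assumption (the callback schema is not given here), and the sketch stops short of sending so that it has no network side effects.

```python
import json
import urllib.request

CALLBACK_URL = "https://hackathon.guvi.in/api/updateHoneyPotFinalResult"

def build_callback_request(result: dict):
    """Return a POST request for the callback, or None if no scam was detected."""
    if not result.get("scamDetected"):
        return None
    body = json.dumps(result).encode("utf-8")
    return urllib.request.Request(
        CALLBACK_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_callback_request({"sessionId": "abc", "scamDetected": True})
```

Sending would then be a call such as `urllib.request.urlopen(request, timeout=10)`, ideally wrapped in error handling so a failed callback does not break the API response.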

3. `extractors.py` - Entity Extractors

Aspect	Description
Purpose	Regex patterns for entity extraction
Extracts	Phone, UPI, bank account, IFSC, email, URL

4. `logger.py` - Structured Logging

Aspect	Description
Purpose	Consistent logging across all agents
Class	`AgentLogger`
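
A minimal sketch of what a structured `AgentLogger` might look like is given below. Only the class name comes from this document; emitting one JSON object per event, the `honeypot.` logger namespace, and the method names are assumptions.

```python
import json
import logging

class AgentLogger:
    """Emit one JSON object per log event, tagged with the agent's name."""

    def __init__(self, agent_name: str):
        self._agent = agent_name
        self._log = logging.getLogger(f"honeypot.{agent_name}")

    def format_event(self, event: str, **fields) -> str:
        # sort_keys keeps the output stable, which simplifies searching and diffing.
        return json.dumps({"agent": self._agent, "event": event, **fields},
                          sort_keys=True)

    def info(self, event: str, **fields) -> None:
        self._log.info(self.format_event(event, **fields))

line = AgentLogger("scam_detector").format_event("detected", confidence=0.9)
```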

🔗 HOW COMPONENTS CONNECT

┌─────────────────────────────────────────────────────────────────────┐
│                           USER REQUEST                               │
│                    POST /api/guvi/analyze                            │
└──────────────────────────────┬──────────────────────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────────────┐
│  routes.py → verify_api_key() → guvi_handler.py                      │
└──────────────────────────────┬──────────────────────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    ORCHESTRATOR (orchestrator.py)                    │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐    │
│  │ Scam        │ │ Intel       │ │ Persona     │ │ Adaptive    │    │
│  │ Detector    │ │ Extractor   │ │ Engine      │ │ Strategy    │    │
│  └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘    │
│         │               │               │               │           │
│         ▼               ▼               ▼               ▼           │
│  ┌─────────────────────────────────────────────────────────────┐    │
│  │                    LLM CLIENT (llm_client.py)               │    │
│  │     Groq / OpenAI / Anthropic / OpenRouter / Mock           │    │
│  └─────────────────────────────────────────────────────────────┘    │
│         │               │               │               │           │
│         ▼               ▼               ▼               ▼           │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐    │
│  │ Memory      │ │ Threat      │ │ Risk        │ │ Campaign    │    │
│  │ Store       │ │ Engine      │ │ Scorer      │ │ Tracker     │    │
│  └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘    │
└──────────────────────────────┬──────────────────────────────────────┘
                               ▼
┌─────────────────────────────────────────────────────────────────────┐
│                    RESPONSE + CALLBACK                               │
│  GUVIOutputResponse → callback_client.py → GUVI Evaluation          │
└─────────────────────────────────────────────────────────────────────┘

📊 ROOT FILES

File	Purpose
`main.py`	FastAPI app entry point, startup/shutdown events
`config.py`	Environment variables, feature flags
`dashboard.py`	Streamlit analytics UI with live charts
`simulate_attack.py`	Red Team vs Blue Team simulation script
`verify_honeypot.py`	Quick verification of all endpoints
`Dockerfile`	Container deployment for HF Spaces
`requirements.txt`	Python dependencies
`README.md`	Project documentation with API examples

🔑 KEY DATA FLOWS

1. Message Analysis Flow

Message → ScamDetector → PersonaEngine → AdaptiveStrategy → Response

2. Intelligence Flow

Message → IntelExtractor → ThreatEngine → CampaignTracker → Report

3. Risk Scoring Flow

DetectionResult → RiskScorer → Explanation → AnalyzeResponse

4. GUVI Callback Flow

ScamDetected=true → CallbackClient → hackathon.guvi.in → Evaluation

Generated for GUVI India AI Impact Buildathon 2025
