	# 🏗️ SCAM HONEYPOT - Complete Architecture Documentation

	## 📁 Project Structure Overview

	```
	sentinel-scam-honeypot/
	├── app/ # Main application code
	│ ├── agents/ # 🤖 AI Agents (brain of the system)
	│ ├── api/ # 🌐 REST API endpoints
	│ ├── core/ # 🧠 Core components (LLM, memory, prompts)
	│ ├── decoys/ # 🪤 Fake endpoints to trap scammers
	│ ├── enforcement/ # 🚔 Law enforcement simulation
	│ ├── intelligence/ # 📊 Threat intelligence modules
	│ ├── templates/ # 💻 HTML templates
	│ ├── utils/ # 🔧 Utility functions
	│ ├── main.py # FastAPI entry point
	│ └── config.py # Configuration settings
	├── dashboard.py # 📈 Streamlit analytics dashboard
	├── simulate_attack.py # ⚔️ Red vs Blue simulation
	├── verify_honeypot.py # ✅ System verification script
	├── Dockerfile # 🐳 Docker deployment
	├── requirements.txt # 📦 Python dependencies
	└── README.md # 📖 Project documentation
	```

	---

	## 🎯 System Architecture Diagram

	```mermaid
	flowchart TB
	subgraph Input["📥 Input Layer"]
	A[Scammer Message] --> B[FastAPI Routes]
	B --> C{API Key Valid?}
	C -->|No| D[401 Unauthorized]
	C -->|Yes| E[Rate Limiter]
	E -->|Exceeded| F[429 Too Many Requests]
	E -->|OK| G[GUVI Handler]
	end

	subgraph Orchestrator["🤖 Orchestrator Layer"]
	G --> H[HoneypotOrchestrator]
	H --> I[Scam Detector]
	H --> J[Intel Extractor]
	H --> K[Emotional Analyzer]
	I --> L[LLM Client]
	L --> M[Groq/OpenAI/Anthropic]
	end

	subgraph Response["💬 Response Generation"]
	I --> N[Persona Engine]
	N --> O[Adaptive Strategy]
	O --> P[Engagement Delayer]
	P --> Q[Response Text]
	end

	subgraph Intelligence["📊 Intelligence Layer"]
	J --> R[Threat Engine]
	K --> R
	R --> S[Campaign Tracker]
	S --> T[Risk Scorer]
	end

	subgraph Storage["💾 Persistence Layer"]
	H --> U[SQLite/PostgreSQL]
	H --> V[Audit Logger]
	V --> W[SIEM Export]
	end

	subgraph Output["📤 Output Layer"]
	Q --> X[API Response]
	T --> X
	X --> Y[GUVI Callback]
	X --> Z[Stakeholder Exports]
	Z --> AA[CERT-In STIX 2.1]
	Z --> AB[TRAI UCC Report]
	Z --> AC[NPCI Fraud Report]
	Z --> AD[NCRP Complaint]
	end

	style Input fill:#e3f2fd
	style Orchestrator fill:#fff3e0
	style Response fill:#e8f5e9
	style Intelligence fill:#fce4ec
	style Storage fill:#f3e5f5
	style Output fill:#e0f7fa
	```
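
The rate-limiting step in the input layer can be pictured as a per-client sliding-window counter. This is a minimal sketch and not the project's implementation: the class name `SlidingWindowLimiter`, its parameters, and the injectable `clock` are assumptions made for illustration.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical limiter: allow at most max_requests per window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock
        self.hits = defaultdict(deque)  # client_id -> timestamps of accepted requests

    def allow(self, client_id):
        now = self.clock()
        recent = self.hits[client_id]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # the caller would answer 429 Too Many Requests
        recent.append(now)
        return True


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
limiter = SlidingWindowLimiter(2, 60, clock=lambda: fake_now[0])
first, second, third = (limiter.allow("client-a") for _ in range(3))
fake_now[0] = 61.0
after_window = limiter.allow("client-a")
```

Injecting the clock keeps the sketch testable without sleeping; a deployed limiter would more likely key on the API key or client IP.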

	---

	## 🔄 Agent Interaction Flow

	```mermaid
	sequenceDiagram
	participant S as Scammer
	participant API as FastAPI
	participant O as Orchestrator
	participant SD as ScamDetector
	participant IE as IntelExtractor
	participant EA as EmotionalAnalyzer
	participant PE as PersonaEngine
	participant ED as EngagementDelayer
	participant DB as Database
	participant CB as Callback

	S->>API: POST /api/guvi/analyze
	API->>API: Verify API Key
	API->>API: Rate Limit Check
	API->>O: Process Message

	par Detection
	O->>SD: Detect Scam Type
	O->>IE: Extract Intelligence
	O->>EA: Analyze Emotions
	end

	SD-->>O: {is_scam, type, confidence}
	IE-->>O: {phones, upis, urls}
	EA-->>O: {urgency, fear, greed}

	O->>PE: Generate Response
	PE->>ED: Add Delays
	ED-->>PE: Delayed Response
	PE-->>O: Victim Response

	O->>DB: Store Conversation
	O-->>API: Response Payload
	API-->>S: JSON Response

	opt Scam Confirmed
	API->>CB: Send to GUVI
	end
	```

	---

	## 🤖 AGENTS FOLDER (`app/agents/`)

	The agents are the brain of the honeypot system; each one has a specific role.

	### 1. `orchestrator.py` - Main Controller
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Coordinates the other agents to process scam messages \|
	\| What it does \| Receives message → Runs detection → Selects persona → Generates response → Computes risk → Returns result \|
	\| Connects to \| All other agents, LLM client, memory store \|
	\| Key class \| `HoneypotOrchestrator` \|
	\| Key method \| `process_message(message, conversation_id)` \|
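
The orchestration steps in the table above, together with the parallel detection block in the sequence diagram, can be sketched with `asyncio.gather`. The two agent functions below are hypothetical stand-ins with made-up logic; only the name `process_message(message, conversation_id)` and the result field names come from this document.

```python
import asyncio


async def detect(message: str) -> dict:
    # Hypothetical stand-in for the scam detector: a single keyword check.
    is_scam = "lottery" in message.lower()
    return {
        "is_scam": is_scam,
        "scam_type": "lottery" if is_scam else None,
        "confidence": 0.9 if is_scam else 0.1,
    }


async def extract(message: str) -> dict:
    # Hypothetical stand-in for the intel extractor: URLs only.
    return {"urls": [word for word in message.split() if word.startswith("http")]}


async def process_message(message: str, conversation_id: str) -> dict:
    # Detection and extraction are independent, so they run concurrently.
    detection, intel = await asyncio.gather(detect(message), extract(message))
    # Placeholder reply; the real system delegates this to the persona engine.
    reply = "Please tell me more." if detection["is_scam"] else None
    return {
        "conversation_id": conversation_id,
        "detection": detection,
        "intelligence": intel,
        "reply": reply,
    }


result = asyncio.run(
    process_message("You won a lottery! Claim at http://example.test", "conv-1")
)
```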

	### 2. `scam_detector.py` - Scam Detection Agent
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Detects if a message is a scam and classifies the type \|
	\| What it does \| Hybrid detection using keywords + LLM classification \|
	\| Contains \| `SCAM_DATABASE` with 10 scam types (lottery, job, banking, etc.) \|
	\| Connects to \| LLM client, orchestrator \|
	\| Key method \| `detect(message) → {is_scam, scam_type, confidence}` \|
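
A hybrid detector of this kind might score keywords first and consult the LLM only when the keyword evidence is weak. The sketch below is an assumption about how the two signals could be combined: the keyword lists, the threshold, and the `llm_classify` callback are illustrative and are not taken from `SCAM_DATABASE`.

```python
from typing import Callable, Optional, Tuple

# Illustrative keyword lists; the real SCAM_DATABASE is not shown in this document.
KEYWORDS = {
    "lottery": ["lottery", "prize", "winner"],
    "banking": ["kyc", "account blocked", "otp"],
}


def keyword_score(message: str) -> Tuple[Optional[str], float]:
    text = message.lower()
    best_type, best_hits = None, 0
    for scam_type, words in KEYWORDS.items():
        hits = sum(1 for word in words if word in text)
        if hits > best_hits:
            best_type, best_hits = scam_type, hits
    # Two or more keyword hits count as full confidence in this sketch.
    return best_type, min(1.0, best_hits / 2)


def detect(message: str,
           llm_classify: Optional[Callable[[str], Tuple[str, float]]] = None) -> dict:
    scam_type, confidence = keyword_score(message)
    if llm_classify is not None and confidence < 1.0:
        # Keyword evidence is weak or absent: let the LLM overrule it if more confident.
        llm_type, llm_confidence = llm_classify(message)
        if llm_confidence > confidence:
            scam_type, confidence = llm_type, llm_confidence
    is_scam = confidence >= 0.5
    return {"is_scam": is_scam,
            "scam_type": scam_type if is_scam else None,
            "confidence": confidence}


strong = detect("Your KYC is pending, share the OTP now")
benign = detect("Hello, how are you?")
escalated = detect("Congratulations dear customer", llm_classify=lambda m: ("lottery", 0.8))
```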

	### 3. `persona_engine.py` - Persona Agent
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Generates believable victim responses to engage scammers \|
	\| What it does \| Selects persona based on scam type, generates Hinglish/Hindi responses \|
	\| Contains \| `PERSONAS` dict with 10 personas (Sharma Uncle, Rahul Kumar, etc.) \|
	\| Response phases \| hook → engage → extract → stall → self_correct \|
	\| Key method \| `generate_response(scam_type, phase, history)` \|

	### 4. `adaptive_strategy.py` - Strategy Agent
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Adapts honeypot behavior based on scammer actions \|
	\| What it does \| Analyzes scammer behavior, determines phase, adjusts strategy \|
	\| Behaviors detected \| pushing_payment, building_trust, aggressive, confused \|
	\| Connects to \| Persona engine, orchestrator \|
	\| Key method \| `adapt_strategy(scammer_message, history)` \|
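
One way to picture the strategy step is a cue-based behavior classifier feeding a behavior-to-phase table. The behavior and phase names below come from this document, but the cue phrases and the mapping between behaviors and phases are assumptions made for illustration.

```python
# Illustrative cue phrases; the real detection rules are not shown in this document.
BEHAVIOR_CUES = {
    "pushing_payment": ["pay", "send money", "transfer"],
    "aggressive": ["immediately", "arrest", "last warning"],
    "confused": ["what do you mean", "i don't understand"],
    "building_trust": ["trust me", "official", "don't worry"],
}

# Assumed mapping from detected behavior to the next response phase.
NEXT_PHASE = {
    "pushing_payment": "extract",
    "aggressive": "stall",
    "confused": "self_correct",
    "building_trust": "engage",
}


def classify_behavior(scammer_message):
    text = scammer_message.lower()
    for behavior, cues in BEHAVIOR_CUES.items():
        if any(cue in text for cue in cues):
            return behavior  # first matching behavior wins
    return None


def adapt_strategy(scammer_message, history):
    behavior = classify_behavior(scammer_message)
    # With no recognizable behavior, open with "hook" and otherwise keep engaging.
    default_phase = "hook" if not history else "engage"
    return {"behavior": behavior, "phase": NEXT_PHASE.get(behavior, default_phase)}


payment = adapt_strategy("Send money to this account now", ["hi"])
opening = adapt_strategy("Hello sir", [])
```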

	### 5. `intelligence_extractor.py` - Intel Agent
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Extracts actionable intelligence from messages \|
	\| What it does \| Regex-based extraction of phone, UPI, bank, URLs \|
	\| Connects to \| Orchestrator, threat engine \|
	\| Key method \| `extract(message) → {phone_numbers, upi_ids, ...}` \|
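
Regex-based extraction along these lines could look like the sketch below. The patterns are assumptions, not the project's own: they target Indian mobile numbers, UPI-style `name@handle` IDs (rejecting e-mail addresses), and HTTP(S) URLs.

```python
import re

# Indian mobile number: optional +91 prefix, then 10 digits starting with 6-9.
PHONE_RE = re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)")
# UPI-style ID: name@handle, where the handle is not followed by a domain suffix.
UPI_RE = re.compile(r"\b[A-Za-z0-9._-]{2,}@[A-Za-z]{2,}(?!\.?\w)")
URL_RE = re.compile(r"https?://\S+")


def extract(message: str) -> dict:
    phones = [re.sub(r"\D", "", match)[-10:] for match in PHONE_RE.findall(message)]
    urls = [url.rstrip(".,;)") for url in URL_RE.findall(message)]
    return {
        "phone_numbers": phones,
        "upi_ids": UPI_RE.findall(message),
        "urls": urls,
    }


intel = extract(
    "Send Rs 500 to scammer@okaxis or call +91 9876543210, "
    "details at http://example.test/pay"
)
email_only = extract("Write to user@example.com")
```

Phone numbers are normalized to their last ten digits so that `+91 9876543210` and `9876543210` compare equal downstream.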

	### 6. `conversation_manager.py` - Memory Manager
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Manages multi-turn conversation state \|
	\| What it does \| Tracks history, phase progression, trust evolution \|
	\| Connects to \| Memory store, orchestrator \|
	\| Key methods \| `get_conversation(id)`, `update_conversation(...)` \|

	---

	## 🌐 API FOLDER (`app/api/`)

	### 1. `routes.py` - API Endpoints
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Defines all REST API endpoints \|
	\| Key endpoints \| `/api/v1/analyze`, `/api/guvi/analyze`, `/api/v1/scam-types` \|
	\| Security \| `verify_api_key()` with x-api-key header \|
	\| Connects to \| Orchestrator, GUVI handler, schemas \|
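
The `x-api-key` check can be sketched without any web framework. The signature below (a headers dict in, an HTTP status code out) is an assumption for illustration; the real `verify_api_key()` is presumably wired into FastAPI and may look quite different.

```python
import hmac


def verify_api_key(headers: dict, expected_key: str) -> int:
    """Return 200 when the x-api-key header matches, otherwise 401."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get("x-api-key", "")
    # compare_digest avoids leaking the position of the first mismatch via timing.
    matches = hmac.compare_digest(supplied.encode(), expected_key.encode())
    return 200 if matches else 401


ok_status = verify_api_key({"X-API-Key": "secret-123"}, "secret-123")
bad_status = verify_api_key({"x-api-key": "wrong"}, "secret-123")
missing_status = verify_api_key({}, "secret-123")
```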

	### 2. `schemas.py` - Pydantic Models
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Request/response validation models \|
	\| Key models \| `AnalyzeRequest`, `AnalyzeResponse`, `GUVIInputRequest`, `GUVIOutputResponse` \|
	\| Connects to \| Routes, GUVI handler \|

	---

	## 🧠 CORE FOLDER (`app/core/`)

	### 1. `llm_client.py` - LLM Client
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Unified interface to multiple LLM providers \|
	\| Supports \| OpenAI, Anthropic, Groq, OpenRouter \|
	\| Fallback \| Falls back to mock responses when no API key is configured \|
	\| Key method \| `generate(prompt) → response` \|
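
The provider selection with a mock fallback might be structured as follows. This is a sketch under stated assumptions: the environment variable names, the selection order, and the mock output format are illustrative, and the real provider calls are deliberately left out.

```python
import os


class LLMClient:
    # Assumed environment variable names, checked in this order.
    PROVIDER_ENV = {
        "groq": "GROQ_API_KEY",
        "openai": "OPENAI_API_KEY",
        "anthropic": "ANTHROPIC_API_KEY",
        "openrouter": "OPENROUTER_API_KEY",
    }

    def __init__(self, env=None):
        env = os.environ if env is None else env
        # Pick the first provider with a key configured, else fall back to the mock.
        self.provider = next(
            (name for name, var in self.PROVIDER_ENV.items() if env.get(var)),
            "mock",
        )

    def generate(self, prompt: str) -> str:
        if self.provider == "mock":
            return "[mock] " + prompt[:40]
        # A real implementation would call the selected provider's SDK here.
        raise NotImplementedError(f"provider call for {self.provider} omitted in this sketch")


offline = LLMClient(env={})
configured = LLMClient(env={"OPENAI_API_KEY": "sk-test"})
mock_reply = offline.generate("Classify this message")
```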

	### 2. `memory.py` - Conversation Memory
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| In-memory conversation storage \|
	\| Contains \| `ConversationMemory` class with TTL support \|
	\| Stores \| History, phase, trust_score, aggregated_intelligence \|
	\| Key method \| `get_or_create(conversation_id)` \|
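
An in-memory store with TTL expiry could be as small as the sketch below. The stored field names follow the table above; the default TTL, the initial values, and the injectable `clock` are assumptions.

```python
import time


class ConversationMemory:
    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # conversation_id -> conversation state

    def get_or_create(self, conversation_id):
        now = self._clock()
        entry = self._store.get(conversation_id)
        # Missing or expired conversations start from a clean state.
        if entry is None or now - entry["last_seen"] > self._ttl:
            entry = {
                "history": [],
                "phase": "hook",
                "trust_score": 0.0,
                "aggregated_intelligence": {},
                "last_seen": now,
            }
            self._store[conversation_id] = entry
        entry["last_seen"] = now
        return entry


fake_now = [0.0]
memory = ConversationMemory(ttl_seconds=10, clock=lambda: fake_now[0])
memory.get_or_create("conv-1")["history"].append("hello")
fake_now[0] = 5.0
still_there = memory.get_or_create("conv-1")["history"]
fake_now[0] = 20.0
after_expiry = memory.get_or_create("conv-1")["history"]
```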

	### 3. `prompts.py` - LLM Prompts
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| System prompts for LLM interactions \|
	\| Contains \| `SCAM_DETECTION_PROMPT`, `RESPONSE_GENERATION_PROMPT`, `PHASE_GOALS` \|

	---

	## 🪤 DECOYS FOLDER (`app/decoys/`)

	### 1. `fake_endpoints.py` - Decoy Portals
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Fake banking/UPI pages to trap scammers \|
	\| Endpoints \| `/decoys/upi/status`, `/decoys/bank/kyc-portal`, `/decoys/secure/otp-generate` \|
	\| Why \| Scammers click these links thinking they're real \|

	### 2. `victim_profiles.py` - Synthetic Victims
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Fake victim data for honeypot responses \|
	\| Contains \| Synthetic names, bank accounts, UPI IDs \|
	\| Why \| No real PII is ever used \|

	---

	## 📊 INTELLIGENCE FOLDER (`app/intelligence/`)

	### 1. `threat_engine.py` - Threat Intelligence
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Generates threat intelligence reports \|
	\| Creates \| Campaign IDs, IOCs, TTPs (MITRE ATT&CK) \|
	\| Key method \| `generate_threat_intel(scam_type, entities)` \|

	### 2. `risk_scorer.py` - Risk Scoring
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Computes weighted risk score with explainability \|
	\| Factors \| Keywords, payment requests, threat level, campaign match \|
	\| Key method \| `compute_risk(detection_result) → {score, explanation}` \|
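
A weighted score with a per-factor explanation might be computed as shown here. The four factors are the ones named in the table; the weights, the 0 to 1 signal scale, and the input shape are assumptions rather than the project's actual values.

```python
# Assumed weights; they sum to 1.0 so the score stays in [0, 1].
WEIGHTS = {
    "keywords": 0.30,
    "payment_request": 0.30,
    "threat_level": 0.25,
    "campaign_match": 0.15,
}


def compute_risk(signals: dict) -> dict:
    score = 0.0
    explanation = []
    for factor, weight in WEIGHTS.items():
        # Clamp each signal to [0, 1]; missing signals contribute nothing.
        value = max(0.0, min(1.0, float(signals.get(factor, 0.0))))
        contribution = weight * value
        score += contribution
        if contribution > 0:
            explanation.append(f"{factor}: +{contribution:.2f}")
    return {"score": round(score, 2), "explanation": explanation}


partial = compute_risk({"keywords": 1.0, "payment_request": 1.0})
maximal = compute_risk({factor: 1.0 for factor in WEIGHTS})
```

Recording each factor's contribution alongside the total is what makes the score explainable to a reviewer.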

	### 3. `campaign_tracker.py` - Campaign Clustering
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Groups scam messages into campaigns \|
	\| Uses \| Entity similarity to cluster related attacks \|
	\| Key method \| `get_or_create_campaign(entities)` \|
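
Entity-similarity clustering can be illustrated with Jaccard overlap between entity sets. The similarity measure, the 0.5 threshold, and the `CAMP-0001` ID format are assumptions; the document only states that related attacks are grouped by entity similarity.

```python
import itertools


class CampaignTracker:
    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self._campaigns = {}  # campaign_id -> set of known entities
        self._counter = itertools.count(1)

    def get_or_create_campaign(self, entities):
        entities = set(entities)
        for campaign_id, known in self._campaigns.items():
            union = known | entities
            # Jaccard similarity: shared entities divided by all entities.
            if union and len(known & entities) / len(union) >= self.threshold:
                known |= entities  # the campaign learns the new entities
                return campaign_id
        campaign_id = f"CAMP-{next(self._counter):04d}"
        self._campaigns[campaign_id] = entities
        return campaign_id


tracker = CampaignTracker()
first = tracker.get_or_create_campaign({"9876543210", "scammer@okaxis"})
related = tracker.get_or_create_campaign(
    {"9876543210", "scammer@okaxis", "http://example.test"}
)
unrelated = tracker.get_or_create_campaign({"other@ybl"})
```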

	### 4. `telemetry.py` - Request Telemetry
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Captures IP, geo, device fingerprint \|
	\| Uses \| ip-api.com for geolocation \|
	\| Key method \| `capture_telemetry(request)` \|

	### 5. `scammer_profiler.py` - Behavioral Profiling
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Builds behavioral profiles of scammers \|
	\| Tracks \| Aggression, persistence, tactics used \|

	### 6. `engagement_metrics.py` - Metrics Tracking
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Tracks honeypot engagement statistics \|
	\| Metrics \| Duration, message count, intelligence extracted \|

	### 7. `honeytokens.py` - Honeytoken Generator
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Generates fake credentials as bait \|
	\| Creates \| Fake UPI IDs, bank accounts, phone numbers \|
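
Bait credentials of this kind could be generated from a seeded random source so that each token is reproducible and traceable. Everything in this sketch is hypothetical, including the `@honeybank` handle, the field lengths, and the function name.

```python
import random
import string


def make_honeytoken(rng: random.Random) -> dict:
    handle = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
    account = "".join(rng.choice(string.digits) for _ in range(12))
    phone = "9" + "".join(rng.choice(string.digits) for _ in range(9))
    return {
        "upi_id": f"{handle}@honeybank",  # made-up handle, not a real UPI provider
        "bank_account": account,
        "phone_number": phone,
    }


token = make_honeytoken(random.Random(42))
same_seed_token = make_honeytoken(random.Random(42))
```

Seeding per conversation would let a later sighting of the same token be tied back to the session that handed it out.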

	---

	## 🚔 ENFORCEMENT FOLDER (`app/enforcement/`)

	### 1. `police_api.py` - Cyber Police Simulation
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Simulates NCRP (cybercrime.gov.in) integration \|
	\| Creates \| Report IDs, priority levels, recommended actions \|
	\| Classes \| `CyberPoliceAPI`, `ActionRecommendationAPI` \|

	### 2. `awareness.py` - Public Awareness
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Generates scam awareness content \|
	\| Creates \| Warning messages, educational tips \|

	---

	## 🔧 UTILS FOLDER (`app/utils/`)

	### 1. `guvi_handler.py` - GUVI Format Translator
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Translates GUVI format ↔ internal format \|
	\| Why \| GUVI uses different field names (sessionId vs conversation_id) \|
	\| Key method \| `process_guvi_message(request) → GUVIOutputResponse` \|
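
The field-name translation can be sketched as a pair of renaming functions. Only `sessionId`, `conversation_id`, `scamDetected`, and `is_scam` appear in this document; the `message` field and the function names are assumptions.

```python
def to_internal(guvi_request: dict) -> dict:
    """Rename GUVI field names to the internal ones; other keys pass through."""
    renamed = {"sessionId": "conversation_id"}
    return {renamed.get(key, key): value for key, value in guvi_request.items()}


def to_guvi(internal_result: dict) -> dict:
    """Rename internal field names back to GUVI's camelCase."""
    renamed = {"conversation_id": "sessionId", "is_scam": "scamDetected"}
    return {renamed.get(key, key): value for key, value in internal_result.items()}


internal = to_internal({"sessionId": "s-1", "message": "You won a prize"})
external = to_guvi({"conversation_id": "s-1", "is_scam": True})
```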

	### 2. `callback_client.py` - GUVI Callback Sender
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Sends final result to GUVI evaluation endpoint \|
	\| Endpoint \| `POST https://hackathon.guvi.in/api/updateHoneyPotFinalResult` \|
	\| Trigger \| Auto-sends when `scamDetected = true` \|
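
The trigger rule can be sketched by building the callback request without sending it. The endpoint URL is the one listed above; the payload shape beyond `sessionId` and `scamDetected`, and the `build_callback` helper itself, are assumptions.

```python
import json
import urllib.request

CALLBACK_URL = "https://hackathon.guvi.in/api/updateHoneyPotFinalResult"


def build_callback(result: dict):
    """Return a POST request for confirmed scams, or None when nothing should be sent."""
    if not result.get("scamDetected"):
        return None
    body = json.dumps(result).encode("utf-8")
    return urllib.request.Request(
        CALLBACK_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


skipped = build_callback({"sessionId": "s-1", "scamDetected": False})
request = build_callback({"sessionId": "s-1", "scamDetected": True})
# Sending is left to the caller, e.g. urllib.request.urlopen(request, timeout=5).
```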

	### 3. `extractors.py` - Entity Extractors
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Regex patterns for entity extraction \|
	\| Extracts \| Phone, UPI, bank account, IFSC, email, URL \|

	### 4. `logger.py` - Structured Logging
	\| Aspect \| Description \|
	\|--------\|-------------\|
	\| Purpose \| Consistent logging across all agents \|
	\| Class \| `AgentLogger` \|

	---

	## 🔗 HOW COMPONENTS CONNECT

	```
	┌─────────────────────────────────────────────────────────────────────┐
	│                            USER REQUEST                             │
	│                       POST /api/guvi/analyze                        │
	└──────────────────────────────┬──────────────────────────────────────┘
	                               ▼
	┌─────────────────────────────────────────────────────────────────────┐
	│           routes.py → verify_api_key() → guvi_handler.py            │
	└──────────────────────────────┬──────────────────────────────────────┘
	                               ▼
	┌─────────────────────────────────────────────────────────────────────┐
	│                   ORCHESTRATOR (orchestrator.py)                    │
	│   ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐   │
	│   │ Scam        │ │ Intel       │ │ Persona     │ │ Adaptive    │   │
	│   │ Detector    │ │ Extractor   │ │ Engine      │ │ Strategy    │   │
	│   └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘   │
	│          │               │               │               │          │
	│          ▼               ▼               ▼               ▼          │
	│   ┌─────────────────────────────────────────────────────────────┐   │
	│   │                 LLM CLIENT (llm_client.py)                  │   │
	│   │        Groq / OpenAI / Anthropic / OpenRouter / Mock        │   │
	│   └─────────────────────────────────────────────────────────────┘   │
	│          │               │               │               │          │
	│          ▼               ▼               ▼               ▼          │
	│   ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐   │
	│   │ Memory      │ │ Threat      │ │ Risk        │ │ Campaign    │   │
	│   │ Store       │ │ Engine      │ │ Scorer      │ │ Tracker     │   │
	│   └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘   │
	└──────────────────────────────┬──────────────────────────────────────┘
	                               ▼
	┌─────────────────────────────────────────────────────────────────────┐
	│                         RESPONSE + CALLBACK                         │
	│      GUVIOutputResponse → callback_client.py → GUVI Evaluation      │
	└─────────────────────────────────────────────────────────────────────┘
	```

	---

	## 📊 ROOT FILES

	\| File \| Purpose \|
	\|------\|---------\|
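	\| `app/main.py` \| FastAPI app entry point, startup/shutdown events (lives under `app/`, not the repository root) \|
	\| `app/config.py` \| Environment variables, feature flags (lives under `app/`, not the repository root) \|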
	\| `dashboard.py` \| Streamlit analytics UI with live charts \|
	\| `simulate_attack.py` \| Red Team vs Blue Team simulation script \|
	\| `verify_honeypot.py` \| Quick verification of all endpoints \|
	\| `Dockerfile` \| Container deployment for HF Spaces \|
	\| `requirements.txt` \| Python dependencies \|
	\| `README.md` \| Project documentation with API examples \|

	---

	## 🔑 KEY DATA FLOWS

	### 1. Message Analysis Flow
	```
	Message → ScamDetector → PersonaEngine → AdaptiveStrategy → Response
	```

	### 2. Intelligence Flow
	```
	Message → IntelExtractor → ThreatEngine → CampaignTracker → Report
	```

	### 3. Risk Scoring Flow
	```
	DetectionResult → RiskScorer → Explanation → AnalyzeResponse
	```

	### 4. GUVI Callback Flow
	```
	ScamDetected=true → CallbackClient → hackathon.guvi.in → Evaluation
	```

	---

	Generated for GUVI India AI Impact Buildathon 2025