# πŸ—οΈ SCAM HONEYPOT - Complete Architecture Documentation
## πŸ“ Project Structure Overview
```
sentinel-scam-honeypot/
├── app/                    # Main application code
│   ├── agents/             # 🤖 AI Agents (brain of the system)
│   ├── api/                # 🌐 REST API endpoints
│   ├── core/               # 🧠 Core components (LLM, memory, prompts)
│   ├── decoys/             # 🪤 Fake endpoints to trap scammers
│   ├── enforcement/        # 🚔 Law enforcement simulation
│   ├── intelligence/       # 📊 Threat intelligence modules
│   ├── templates/          # 💻 HTML templates
│   ├── utils/              # 🔧 Utility functions
│   ├── main.py             # FastAPI entry point
│   └── config.py           # Configuration settings
├── dashboard.py            # 📈 Streamlit analytics dashboard
├── simulate_attack.py      # ⚔️ Red vs Blue simulation
├── verify_honeypot.py      # ✅ System verification script
├── Dockerfile              # 🐳 Docker deployment
├── requirements.txt        # 📦 Python dependencies
└── README.md               # 📖 Project documentation
```
---
## 🎯 System Architecture Diagram
```mermaid
flowchart TB
    subgraph Input["📥 Input Layer"]
        A[Scammer Message] --> B[FastAPI Routes]
        B --> C{API Key Valid?}
        C -->|No| D[401 Unauthorized]
        C -->|Yes| E[Rate Limiter]
        E -->|Exceeded| F[429 Too Many Requests]
        E -->|OK| G[GUVI Handler]
    end

    subgraph Orchestrator["🤖 Orchestrator Layer"]
        G --> H[HoneypotOrchestrator]
        H --> I[Scam Detector]
        H --> J[Intel Extractor]
        H --> K[Emotional Analyzer]
        I --> L[LLM Client]
        L --> M[Groq/OpenAI/Anthropic/OpenRouter]
    end

    subgraph Response["💬 Response Generation"]
        I --> N[Persona Engine]
        N --> O[Adaptive Strategy]
        O --> P[Engagement Delayer]
        P --> Q[Response Text]
    end

    subgraph Intelligence["📊 Intelligence Layer"]
        J --> R[Threat Engine]
        K --> R
        R --> S[Campaign Tracker]
        S --> T[Risk Scorer]
    end

    subgraph Storage["💾 Persistence Layer"]
        H --> U[SQLite/PostgreSQL]
        H --> V[Audit Logger]
        V --> W[SIEM Export]
    end

    subgraph Output["📤 Output Layer"]
        Q --> X[API Response]
        T --> X
        X --> Y[GUVI Callback]
        X --> Z[Stakeholder Exports]
        Z --> AA[CERT-In STIX 2.1]
        Z --> AB[TRAI UCC Report]
        Z --> AC[NPCI Fraud Report]
        Z --> AD[NCRP Complaint]
    end

    style Input fill:#e3f2fd
    style Orchestrator fill:#fff3e0
    style Response fill:#e8f5e9
    style Intelligence fill:#fce4ec
    style Storage fill:#f3e5f5
    style Output fill:#e0f7fa
```
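The Input Layer answers callers that exceed their budget with `429 Too Many Requests`, but the limiter's algorithm and limits are not documented here. A minimal sketch, assuming a per-API-key sliding-window log; `SlidingWindowRateLimiter`, `limit`, and `window_s` are hypothetical names:

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowRateLimiter:
    """Hypothetical limiter: at most `limit` requests per `window_s` seconds per key."""

    def __init__(self, limit: int, window_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so the behavior can be tested
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller should answer 429 Too Many Requests
        hits.append(now)
        return True
```

A `False` from `allow()` corresponds to the `429` branch above. The sliding log is exact but stores up to `limit` timestamps per key; a fixed-window counter is cheaper if approximate limits are acceptable.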
---
## 🔄 Agent Interaction Flow
```mermaid
sequenceDiagram
    participant S as Scammer
    participant API as FastAPI
    participant O as Orchestrator
    participant SD as ScamDetector
    participant IE as IntelExtractor
    participant EA as EmotionalAnalyzer
    participant PE as PersonaEngine
    participant ED as EngagementDelayer
    participant DB as Database
    participant CB as Callback

    S->>API: POST /api/guvi/analyze
    API->>API: Verify API Key
    API->>API: Rate Limit Check
    API->>O: Process Message

    par Scam detection
        O->>SD: Detect Scam Type
        SD-->>O: {is_scam, type, confidence}
    and Intel extraction
        O->>IE: Extract Intelligence
        IE-->>O: {phones, upis, urls}
    and Emotion analysis
        O->>EA: Analyze Emotions
        EA-->>O: {urgency, fear, greed}
    end

    O->>PE: Generate Response
    PE->>ED: Add Delays
    ED-->>PE: Delayed Response
    PE-->>O: Victim Response
    O->>DB: Store Conversation
    O-->>API: Response Payload
    API-->>S: JSON Response

    opt Scam Confirmed
        API->>CB: Send to GUVI
    end
```
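The `par` block shows detection, intel extraction, and emotion analysis running concurrently. A minimal sketch of that fan-out with `asyncio.gather`, assuming the agents expose async methods; the three coroutines are hypothetical stand-ins with toy logic, not the real agents:

```python
import asyncio
from typing import Any, Dict


# Hypothetical stand-ins for ScamDetector, IntelExtractor and EmotionalAnalyzer.
async def detect_scam(message: str) -> Dict[str, Any]:
    is_scam = "otp" in message.lower()
    return {"is_scam": is_scam, "type": "banking" if is_scam else None,
            "confidence": 0.9 if is_scam else 0.1}


async def extract_intel(message: str) -> Dict[str, Any]:
    return {"phones": [], "upis": [], "urls": []}


async def analyze_emotions(message: str) -> Dict[str, Any]:
    lowered = message.lower()
    return {"urgency": "urgent" in lowered, "fear": "blocked" in lowered,
            "greed": "prize" in lowered}


async def run_detection(message: str) -> Dict[str, Dict[str, Any]]:
    # Run the three agents concurrently; results come back in argument order.
    detection, intel, emotions = await asyncio.gather(
        detect_scam(message), extract_intel(message), analyze_emotions(message)
    )
    return {"detection": detection, "intel": intel, "emotions": emotions}


result = asyncio.run(run_detection("URGENT: share OTP or account blocked"))
```

Because `gather` preserves argument order, the orchestrator can unpack the results positionally. If one agent raises, `gather` propagates the first exception unless `return_exceptions=True` is passed.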
---
## 🤖 AGENTS FOLDER (`app/agents/`)
The **brain** of the honeypot system. Each agent has a specific role.
### 1. `orchestrator.py` - Main Controller
| Aspect | Description |
|--------|-------------|
| **Purpose** | Coordinates the other agents to process each scam message |
| **What it does** | Receives message → Runs detection → Selects persona → Generates response → Computes risk → Returns result |
| **Connects to** | All other agents, LLM client, memory store |
| **Key class** | `HoneypotOrchestrator` |
| **Key method** | `process_message(message, conversation_id)` |
### 2. `scam_detector.py` - Scam Detection Agent
| Aspect | Description |
|--------|-------------|
| **Purpose** | Detects if a message is a scam and classifies the type |
| **What it does** | Hybrid detection using keywords + LLM classification |
| **Contains** | `SCAM_DATABASE` with 10 scam types (lottery, job, banking, etc.) |
| **Connects to** | LLM client, orchestrator |
| **Key method** | `detect(message) → {is_scam, scam_type, confidence}` |
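
The table describes the detector as hybrid: keywords first, LLM classification second. A minimal sketch of that split, assuming a hit-count threshold decides when keywords alone are conclusive; `SCAM_KEYWORDS`, `threshold`, the confidence formula, and the `llm_classify` callback are all hypothetical:

```python
from typing import Callable, Dict, List, Optional

# Hypothetical keyword lists; the real SCAM_DATABASE covers 10 scam types.
SCAM_KEYWORDS: Dict[str, List[str]] = {
    "lottery": ["lottery", "prize", "winner", "lucky draw"],
    "job": ["work from home", "part time job", "registration fee"],
    "banking": ["kyc", "otp", "account blocked", "debit card"],
}


def detect(message: str,
           llm_classify: Optional[Callable[[str], Dict]] = None,
           threshold: int = 2) -> Dict:
    """Keyword pass first; defer to an LLM classifier when keywords are inconclusive."""
    text = message.lower()
    # Count keyword hits per scam type and keep the best-scoring type.
    scores = {t: sum(kw in text for kw in kws) for t, kws in SCAM_KEYWORDS.items()}
    best_type, hits = max(scores.items(), key=lambda item: item[1])
    if hits >= threshold:
        return {"is_scam": True, "scam_type": best_type,
                "confidence": min(1.0, 0.5 + 0.2 * hits)}
    if llm_classify is not None:
        return llm_classify(message)
    return {"is_scam": False, "scam_type": None, "confidence": 0.0}
```

Escalating only inconclusive messages to the LLM keeps latency and API cost down for obvious scams.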
### 3. `persona_engine.py` - Persona Agent
| Aspect | Description |
|--------|-------------|
| **Purpose** | Generates believable victim responses to engage scammers |
| **What it does** | Selects persona based on scam type, generates Hinglish/Hindi responses |
| **Contains** | `PERSONAS` dict with 10 personas (Sharma Uncle, Rahul Kumar, etc.) |
| **Response phases** | hook → engage → extract → stall → self_correct |
| **Key method** | `generate_response(scam_type, phase, history)` |
### 4. `adaptive_strategy.py` - Strategy Agent
| Aspect | Description |
|--------|-------------|
| **Purpose** | Adapts honeypot behavior based on scammer actions |
| **What it does** | Analyzes scammer behavior, determines phase, adjusts strategy |
| **Behaviors detected** | pushing_payment, building_trust, aggressive, confused |
| **Connects to** | Persona engine, orchestrator |
| **Key method** | `adapt_strategy(scammer_message, history)` |
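
A minimal sketch of behavior detection feeding phase selection, assuming simple cue phrases per behavior. The cue lists and the behavior-to-phase mapping are hypothetical, though the phase names match the persona engine's phases (hook, engage, extract, stall, self_correct):

```python
from typing import Dict, List

# Hypothetical cue phrases for the four behaviors listed above.
BEHAVIOR_CUES: Dict[str, List[str]] = {
    "pushing_payment": ["pay now", "send money", "transfer", "upi id"],
    "aggressive": ["police", "arrest", "last warning", "legal action"],
    "building_trust": ["trust me", "don't worry", "i am from", "official"],
    "confused": ["what?", "i don't understand", "who is this"],
}

# Hypothetical mapping from detected behavior to the persona's next phase.
BEHAVIOR_TO_PHASE: Dict[str, str] = {
    "pushing_payment": "extract",   # ask for their UPI ID or account "to pay"
    "aggressive": "stall",          # act scared and slow things down
    "building_trust": "engage",     # play along and build rapport
    "confused": "self_correct",     # walk back whatever confused them
}


def adapt_strategy(scammer_message: str, history: List[str]) -> Dict[str, str]:
    text = scammer_message.lower()
    counts = {b: sum(cue in text for cue in cues) for b, cues in BEHAVIOR_CUES.items()}
    behavior = max(counts, key=counts.get)
    if counts[behavior] == 0:
        # No cue matched: open with the hook on the first turn, otherwise keep engaging.
        return {"behavior": "unknown", "phase": "hook" if not history else "engage"}
    return {"behavior": behavior, "phase": BEHAVIOR_TO_PHASE[behavior]}
```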
### 5. `intelligence_extractor.py` - Intel Agent
| Aspect | Description |
|--------|-------------|
| **Purpose** | Extracts actionable intelligence from messages |
| **What it does** | Regex-based extraction of phone, UPI, bank, URLs |
| **Connects to** | Orchestrator, threat engine |
| **Key method** | `extract(message) → {phone_numbers, upi_ids, ...}` |
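
A minimal sketch of regex extraction for the three entity types in the sequence diagram, assuming Indian mobile numbers (optional `+91`, then 10 digits starting with 6 to 9), `name@handle` UPI IDs, and `http(s)` URLs. The patterns are illustrative, not the project's `extractors.py`:

```python
import re
from typing import Dict, List

# Illustrative patterns (assumptions, not the project's own regexes).
PHONE_RE = re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)")
# (?!\.) rejects emails such as john@gmail.com, whose domain continues with a dot.
UPI_RE = re.compile(r"\b[\w.-]{2,256}@[a-zA-Z]{2,64}\b(?!\.)")
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def extract(message: str) -> Dict[str, List[str]]:
    return {
        "phone_numbers": PHONE_RE.findall(message),
        "upi_ids": UPI_RE.findall(message),
        # Strip punctuation that ends a sentence rather than the URL.
        "urls": [u.rstrip(".,;:!?)") for u in URL_RE.findall(message)],
    }
```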
### 6. `conversation_manager.py` - Memory Manager
| Aspect | Description |
|--------|-------------|
| **Purpose** | Manages multi-turn conversation state |
| **What it does** | Tracks history, phase progression, trust evolution |
| **Connects to** | Memory store, orchestrator |
| **Key methods** | `get_conversation(id)`, `update_conversation(...)` |
---
## 🌐 API FOLDER (`app/api/`)
### 1. `routes.py` - API Endpoints
| Aspect | Description |
|--------|-------------|
| **Purpose** | Defines all REST API endpoints |
| **Key endpoints** | `/api/v1/analyze`, `/api/guvi/analyze`, `/api/v1/scam-types` |
| **Security** | `verify_api_key()` checks the `x-api-key` request header |
| **Connects to** | Orchestrator, GUVI handler, schemas |
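
A minimal sketch of the comparison inside `verify_api_key()`, assuming the expected key comes from configuration. In FastAPI this would presumably sit in a dependency that reads the `x-api-key` header and raises `HTTPException(status_code=401)` on failure; only the comparison is shown, to keep the sketch dependency-free:

```python
import hmac
from typing import Optional


def verify_api_key(provided: Optional[str], expected: str) -> bool:
    """Return True only if the x-api-key header value matches the configured key."""
    if not provided or not expected:
        return False  # missing header or unconfigured key: reject
    # compare_digest avoids content-based short-circuiting, which makes
    # guessing the key through response timing harder.
    return hmac.compare_digest(provided.encode(), expected.encode())
```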
### 2. `schemas.py` - Pydantic Models
| Aspect | Description |
|--------|-------------|
| **Purpose** | Request/response validation models |
| **Key models** | `AnalyzeRequest`, `AnalyzeResponse`, `GUVIInputRequest`, `GUVIOutputResponse` |
| **Connects to** | Routes, GUVI handler |
---
## 🧠 CORE FOLDER (`app/core/`)
### 1. `llm_client.py` - LLM Client
| Aspect | Description |
|--------|-------------|
| **Purpose** | Unified interface to multiple LLM providers |
| **Supports** | OpenAI, Anthropic, Groq, OpenRouter |
| **Fallback** | Uses mock responses if no API key |
| **Key method** | `generate(prompt) → response` |
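
A minimal sketch of provider selection with a mock fallback, assuming each provider is enabled by an API-key environment variable. The priority order, the variable names, and the `backends` callables are assumptions; only the list of providers comes from the table:

```python
import os
from typing import Callable, Dict, List, Mapping, Tuple

# Assumed priority order and conventional environment variable names.
PROVIDERS: List[Tuple[str, str]] = [
    ("groq", "GROQ_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("anthropic", "ANTHROPIC_API_KEY"),
    ("openrouter", "OPENROUTER_API_KEY"),
]


def pick_provider(env: Mapping[str, str] = os.environ) -> str:
    """Return the first provider whose API key is set, or "mock" if none is."""
    for name, var in PROVIDERS:
        if env.get(var):
            return name
    return "mock"


def mock_reply(prompt: str) -> str:
    # Canned response so the honeypot keeps working without any API key.
    return f"[mock] {prompt[:40]}"


def generate(prompt: str, backends: Dict[str, Callable[[str], str]],
             env: Mapping[str, str] = os.environ) -> str:
    backend = backends.get(pick_provider(env), mock_reply)
    try:
        return backend(prompt)
    except Exception:
        # Provider error (network, quota, bad key): degrade to the mock reply.
        return mock_reply(prompt)
```

Falling back on provider errors as well as missing keys means a quota or network failure degrades the reply instead of failing the request.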
### 2. `memory.py` - Conversation Memory
| Aspect | Description |
|--------|-------------|
| **Purpose** | In-memory conversation storage |
| **Contains** | `ConversationMemory` class with TTL support |
| **Stores** | History, phase, trust_score, aggregated_intelligence |
| **Key method** | `get_or_create(conversation_id)` |
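
A minimal sketch of a TTL-backed `get_or_create`, assuming expiry is measured from a conversation's last access. `TTLConversationStore`, `ConversationState`, and `ttl_s` are hypothetical names; the state fields mirror the **Stores** row:

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ConversationState:
    # Field types are assumptions; names follow the "Stores" row above.
    history: List[Dict[str, str]] = field(default_factory=list)
    phase: str = "hook"
    trust_score: float = 0.0
    aggregated_intelligence: Dict[str, List[str]] = field(default_factory=dict)
    last_seen: float = 0.0


class TTLConversationStore:
    """Hypothetical in-memory store that forgets conversations idle longer than ttl_s."""

    def __init__(self, ttl_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self.clock = clock
        self._items: Dict[str, ConversationState] = {}

    def get_or_create(self, conversation_id: str) -> ConversationState:
        now = self.clock()
        state = self._items.get(conversation_id)
        if state is None or now - state.last_seen > self.ttl_s:
            # Missing or expired: start a fresh conversation.
            state = ConversationState()
            self._items[conversation_id] = state
        state.last_seen = now
        return state
```

Expired entries are only replaced when their ID is requested again; a long-running store would also purge stale entries periodically to bound memory.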
### 3. `prompts.py` - LLM Prompts
| Aspect | Description |
|--------|-------------|
| **Purpose** | System prompts for LLM interactions |
| **Contains** | `SCAM_DETECTION_PROMPT`, `RESPONSE_GENERATION_PROMPT`, `PHASE_GOALS` |
---
## 🪤 DECOYS FOLDER (`app/decoys/`)
### 1. `fake_endpoints.py` - Decoy Portals
| Aspect | Description |
|--------|-------------|
| **Purpose** | Fake banking/UPI pages to trap scammers |
| **Endpoints** | `/decoys/upi/status`, `/decoys/bank/kyc-portal`, `/decoys/secure/otp-generate` |
| **Why** | Scammers click these links thinking they're real |
### 2. `victim_profiles.py` - Synthetic Victims
| Aspect | Description |
|--------|-------------|
| **Purpose** | Fake victim data for honeypot responses |
| **Contains** | Synthetic names, bank accounts, UPI IDs |
| **Why** | No real PII is ever used |
---
## 📊 INTELLIGENCE FOLDER (`app/intelligence/`)
### 1. `threat_engine.py` - Threat Intelligence
| Aspect | Description |
|--------|-------------|
| **Purpose** | Generates threat intelligence reports |
| **Creates** | Campaign IDs, IOCs, TTPs (MITRE ATT&CK) |
| **Key method** | `generate_threat_intel(scam_type, entities)` |
### 2. `risk_scorer.py` - Risk Scoring
| Aspect | Description |
|--------|-------------|
| **Purpose** | Computes weighted risk score with explainability |
| **Factors** | Keywords, payment requests, threat level, campaign match |
| **Key method** | `compute_risk(detection_result) → {score, explanation}` |
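
A minimal sketch of a weighted, explainable score, assuming each factor arrives normalized to [0, 1]. The weights, the 0 to 100 scale, and the explanation format are hypothetical:

```python
from typing import Dict, List, Tuple

# Hypothetical weights for the four factors listed above (they sum to 1.0).
WEIGHTS: Dict[str, float] = {
    "keywords": 0.3,
    "payment_request": 0.3,
    "threat_level": 0.25,
    "campaign_match": 0.15,
}


def compute_risk(detection_result: Dict[str, float]) -> Dict[str, object]:
    """Weighted sum of factor values clamped to [0, 1], scaled to 0-100, with reasons."""
    contributions: List[Tuple[str, float]] = []
    for factor, weight in WEIGHTS.items():
        value = min(max(float(detection_result.get(factor, 0.0)), 0.0), 1.0)
        contributions.append((factor, weight * value * 100))
    score = round(sum(c for _, c in contributions))
    # Explain the score by listing contributing factors, largest first.
    ranked = sorted(contributions, key=lambda item: item[1], reverse=True)
    explanation = [f"{name}: +{c:.0f}" for name, c in ranked if c > 0]
    return {"score": score, "explanation": explanation}
```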
### 3. `campaign_tracker.py` - Campaign Clustering
| Aspect | Description |
|--------|-------------|
| **Purpose** | Groups scam messages into campaigns |
| **Uses** | Entity similarity to cluster related attacks |
| **Key method** | `get_or_create_campaign(entities)` |
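
A minimal sketch of entity-similarity clustering, assuming entities are flattened into a set of strings (phones, UPI IDs, URLs) and compared with Jaccard similarity. The threshold and the `CAMP-0001` ID format are hypothetical:

```python
import itertools
from typing import Dict, Optional, Set


class CampaignTracker:
    """Hypothetical clustering of messages by Jaccard similarity of their entities."""

    def __init__(self, threshold: float = 0.3) -> None:
        self.threshold = threshold
        self._campaigns: Dict[str, Set[str]] = {}
        self._ids = itertools.count(1)

    @staticmethod
    def jaccard(a: Set[str], b: Set[str]) -> float:
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    def get_or_create_campaign(self, entities: Set[str]) -> str:
        best_id: Optional[str] = None
        best_sim = 0.0
        for cid, known in self._campaigns.items():
            sim = self.jaccard(entities, known)
            if sim > best_sim:
                best_id, best_sim = cid, sim
        if best_id is not None and best_sim >= self.threshold:
            # Same campaign: grow its entity set so later messages can link to it.
            self._campaigns[best_id] |= entities
            return best_id
        new_id = f"CAMP-{next(self._ids):04d}"
        self._campaigns[new_id] = set(entities)
        return new_id
```

Merging entities into the matched campaign lets a later message that shares only the newly added entity still link to it.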
### 4. `telemetry.py` - Request Telemetry
| Aspect | Description |
|--------|-------------|
| **Purpose** | Captures IP, geo, device fingerprint |
| **Uses** | ip-api.com for geolocation |
| **Key method** | `capture_telemetry(request)` |
### 5. `scammer_profiler.py` - Behavioral Profiling
| Aspect | Description |
|--------|-------------|
| **Purpose** | Builds behavioral profiles of scammers |
| **Tracks** | Aggression, persistence, tactics used |
### 6. `engagement_metrics.py` - Metrics Tracking
| Aspect | Description |
|--------|-------------|
| **Purpose** | Tracks honeypot engagement statistics |
| **Metrics** | Duration, message count, intelligence extracted |
### 7. `honeytokens.py` - Honeytoken Generator
| Aspect | Description |
|--------|-------------|
| **Purpose** | Generates fake credentials as bait |
| **Creates** | Fake UPI IDs, bank accounts, phone numbers |
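
A minimal sketch, assuming tokens should be reproducible per session so a token that surfaces later can be traced to the conversation that leaked it. The `@decoybank` handle and the function name are hypothetical; `random.Random` is used deliberately instead of `secrets` because reproducibility, not unpredictability, is the goal:

```python
import random
import string
from typing import Dict


def make_honeytokens(session_id: str, seed: int = 0) -> Dict[str, str]:
    """Return a fake UPI ID, account number and phone number for one session.

    Values are reproducible per (session_id, seed), so a sighting can be
    mapped back to the session that handed the token out.
    """
    rng = random.Random(f"{session_id}:{seed}")
    name = "".join(rng.choices(string.ascii_lowercase, k=6))
    return {
        "upi_id": f"{name}{rng.randint(10, 99)}@decoybank",  # hypothetical handle
        "bank_account": "".join(rng.choices(string.digits, k=12)),
        # Indian mobile format: 10 digits starting with 6-9.
        "phone": str(rng.randint(6, 9)) + "".join(rng.choices(string.digits, k=9)),
    }
```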
---
## 🚔 ENFORCEMENT FOLDER (`app/enforcement/`)
### 1. `police_api.py` - Cyber Police Simulation
| Aspect | Description |
|--------|-------------|
| **Purpose** | Simulates NCRP (cybercrime.gov.in) integration |
| **Creates** | Report IDs, priority levels, recommended actions |
| **Classes** | `CyberPoliceAPI`, `ActionRecommendationAPI` |
### 2. `awareness.py` - Public Awareness
| Aspect | Description |
|--------|-------------|
| **Purpose** | Generates scam awareness content |
| **Creates** | Warning messages, educational tips |
---
## 🔧 UTILS FOLDER (`app/utils/`)
### 1. `guvi_handler.py` - GUVI Format Translator
| Aspect | Description |
|--------|-------------|
| **Purpose** | Translates GUVI format ↔ internal format |
| **Why** | GUVI uses different field names (e.g. `sessionId` instead of `conversation_id`) |
| **Key method** | `process_guvi_message(request) → GUVIOutputResponse` |
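
A minimal sketch of the key renaming, assuming flat payloads. `sessionId` → `conversation_id` comes from the table, and `scamDetected` ↔ `is_scam` plausibly pairs two field names used elsewhere in this document; `text` → `message` is a hypothetical placeholder:

```python
from typing import Any, Dict

# GUVI name -> internal name ("text" -> "message" is hypothetical).
GUVI_TO_INTERNAL: Dict[str, str] = {
    "sessionId": "conversation_id",
    "scamDetected": "is_scam",
    "text": "message",
}
INTERNAL_TO_GUVI: Dict[str, str] = {v: k for k, v in GUVI_TO_INTERNAL.items()}


def rename_keys(payload: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Rename top-level keys; keys without a mapping pass through unchanged."""
    return {mapping.get(key, key): value for key, value in payload.items()}


def to_internal(guvi_request: Dict[str, Any]) -> Dict[str, Any]:
    return rename_keys(guvi_request, GUVI_TO_INTERNAL)


def to_guvi(internal_result: Dict[str, Any]) -> Dict[str, Any]:
    return rename_keys(internal_result, INTERNAL_TO_GUVI)
```

Nested GUVI objects would need the same renaming applied recursively.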
### 2. `callback_client.py` - GUVI Callback Sender
| Aspect | Description |
|--------|-------------|
| **Purpose** | Sends final result to GUVI evaluation endpoint |
| **Endpoint** | `POST https://hackathon.guvi.in/api/updateHoneyPotFinalResult` |
| **Trigger** | Auto-sends when `scamDetected = true` |
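
A minimal sketch of the trigger using only the standard library, assuming the final result dict is posted as JSON. The payload schema GUVI expects is not documented here, so the body is passed through unchanged; `maybe_send_callback` and its injectable `send` parameter are hypothetical:

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Optional

CALLBACK_URL = "https://hackathon.guvi.in/api/updateHoneyPotFinalResult"


def build_request(payload: Dict[str, Any]) -> urllib.request.Request:
    return urllib.request.Request(
        CALLBACK_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def _post(request: urllib.request.Request) -> Any:
    # Real network call; tests inject a fake sender instead.
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def maybe_send_callback(result: Dict[str, Any],
                        send: Callable[[urllib.request.Request], Any] = _post) -> Optional[Any]:
    """POST the final result only when scamDetected is true; otherwise do nothing."""
    if not result.get("scamDetected"):
        return None
    return send(build_request(result))
```

Injecting `send` keeps the gating logic testable without network access.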
### 3. `extractors.py` - Entity Extractors
| Aspect | Description |
|--------|-------------|
| **Purpose** | Regex patterns for entity extraction |
| **Extracts** | Phone, UPI, bank account, IFSC, email, URL |
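
A minimal sketch of the remaining patterns, assuming uppercase IFSC codes and that account numbers follow an `A/c` or `account` cue (a bare 9 to 18 digit pattern would also match phone numbers). IFSC codes are 11 characters: four letters, the digit `0`, then six alphanumerics. The patterns and function name are otherwise hypothetical:

```python
import re
from typing import Dict, List

# IFSC: 4 letters (bank), the digit 0, then 6 alphanumerics (branch).
IFSC_RE = re.compile(r"\b[A-Z]{4}0[A-Z0-9]{6}\b")
# Deliberately simple; does not cover every address RFC 5322 allows.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# Hypothetical: 9-18 digits after an "A/c" or "account" cue, so bare
# 10-digit phone numbers are not mistaken for accounts.
ACCOUNT_RE = re.compile(r"(?i)\b(?:a/c|account)\b\D{0,20}?(\d{9,18})(?!\d)")


def extract_financial(text: str) -> Dict[str, List[str]]:
    return {
        "bank_accounts": ACCOUNT_RE.findall(text),
        "ifsc_codes": IFSC_RE.findall(text),
        "emails": EMAIL_RE.findall(text),
    }
```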
### 4. `logger.py` - Structured Logging
| Aspect | Description |
|--------|-------------|
| **Purpose** | Consistent logging across all agents |
| **Class** | `AgentLogger` |
---
## 🔗 HOW COMPONENTS CONNECT
```
┌──────────────────────────────────────────────────────────────────────┐
│                             USER REQUEST                             │
│                        POST /api/guvi/analyze                        │
└──────────────────────────────────┬───────────────────────────────────┘
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│            routes.py → verify_api_key() → guvi_handler.py            │
└──────────────────────────────────┬───────────────────────────────────┘
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│                    ORCHESTRATOR (orchestrator.py)                    │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │    Scam     │  │    Intel    │  │   Persona   │  │  Adaptive   │  │
│  │  Detector   │  │  Extractor  │  │   Engine    │  │  Strategy   │  │
│  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘  │
│         │                │                │                │         │
│         ▼                ▼                ▼                ▼         │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │                   LLM CLIENT (llm_client.py)                   │  │
│  │         Groq / OpenAI / Anthropic / OpenRouter / Mock          │  │
│  └──────┬────────────────┬────────────────┬────────────────┬──────┘  │
│         │                │                │                │         │
│         ▼                ▼                ▼                ▼         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  │
│  │   Memory    │  │   Threat    │  │    Risk     │  │  Campaign   │  │
│  │    Store    │  │   Engine    │  │   Scorer    │  │   Tracker   │  │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘  │
└──────────────────────────────────┬───────────────────────────────────┘
                                   ▼
┌──────────────────────────────────────────────────────────────────────┐
│                         RESPONSE + CALLBACK                          │
│      GUVIOutputResponse → callback_client.py → GUVI Evaluation       │
└──────────────────────────────────────────────────────────────────────┘
```
---
## 📊 ROOT FILES AND ENTRY POINTS
| File | Purpose |
|------|---------|
| `app/main.py` | FastAPI app entry point, startup/shutdown events |
| `app/config.py` | Environment variables, feature flags |
| `dashboard.py` | Streamlit analytics UI with live charts |
| `simulate_attack.py` | Red Team vs Blue Team simulation script |
| `verify_honeypot.py` | Quick verification of all endpoints |
| `Dockerfile` | Container deployment for Hugging Face Spaces |
| `requirements.txt` | Python dependencies |
| `README.md` | Project documentation with API examples |
---
## 🔑 KEY DATA FLOWS
### 1. Message Analysis Flow
```
Message → ScamDetector → PersonaEngine → AdaptiveStrategy → Response
```
### 2. Intelligence Flow
```
Message → IntelExtractor → ThreatEngine → CampaignTracker → Report
```
### 3. Risk Scoring Flow
```
DetectionResult → RiskScorer → Explanation → AnalyzeResponse
```
### 4. GUVI Callback Flow
```
scamDetected=true → CallbackClient → hackathon.guvi.in → Evaluation
```
---
*Generated for GUVI India AI Impact Buildathon 2025*