# Topic 13: Intelligence Extraction Pipeline

**Audit Date**: 2026-02-01

**Auditor**: Agent Antigravity

**Scope**: Data Extraction & Forensics

---

## 1. The Hybrid Architecture

The system uses a **Dual-Pass Strategy** so that data missed by one pass can be caught by the other.

| Pass | Technology | Purpose | Latency |
| :--- | :--- | :--- | :--- |
| **Pass 1** | **Regex (Deterministic)** | Phone numbers, emails, UPI IDs | < 10 ms |
| **Pass 2** | **LLM (Semantic)** | Names, context, "hidden" intents | ~1.5 s |

### **A. Chain of Verification (CoVe)**

* **File**: `intelligence_extractor.py` (line 112).
* **Logic**: The prompt asks the LLM to *verify* its own extractions against the message context.
* **Result**: Sharply reduces "hallucinated" phone numbers, i.e. numbers that appear in the LLM output but not in the source text.

---

## 2. Forensic Capabilities

### **A. Math Forensics (`math_forensics`)**

* **Trigger**: Keywords such as "ROI", "Interest", or "Profit".
* **Action**: Calls `groq/compound-mini` (a tool-capable model).
* **Goal**: Checks whether the promised returns are mathematically implausible (e.g., "Double money in 2 days" implies a 50% simple daily ROI).
* **Flag**: `forensic_flag: RED_FLAG` adds +30 to the Risk Score.

### **B. Artifact Extraction**

Supported data types:

* ✅ **Financial**: UPI IDs, bank account numbers, IFSC codes, credit card numbers.
* ✅ **Identity**: PAN card numbers, Aadhaar numbers (masked), names.
* ✅ **Digital**: URLs, emails, crypto addresses.
* ✅ **Technical**: OTPs, APKs (e.g., remote access trojans).

---

## 3. PII Safety (Privacy)

* **Masking**: Confirmed presence of a `mask_pii` function.
* **Logs**: All logs pass through `mask_intelligence` before being written to disk or console.
* **Reporting**: Only "Law Enforcement" exports (simulated) receive raw, unmasked data.

---

## 4. Assessment

The extraction pipeline is **robust** and **redundant**. If the regex pass misses an item (e.g., an irregular input such as "pay tm at 99.99"), the LLM pass catches it; if the LLM hallucinates, the verification step filters the result out.
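The Dual-Pass Strategy and verification step from Section 1 can be sketched as follows. This is a minimal Python illustration, not the code in `intelligence_extractor.py`: the regex patterns, the function names `regex_pass`, `is_grounded`, and `merge`, and the shape of the LLM output (a dict of lists) are all hypothetical. The verification shown is a deterministic grounding check (an LLM candidate survives only if it occurs in the source text), which is a simpler substitute for the prompt-based CoVe step the audit describes, not CoVe itself.

```python
import re

# Hypothetical Pass 1 patterns (Indian mobile numbers, emails, UPI handles).
# The actual patterns in intelligence_extractor.py are not reproduced in this audit.
PATTERNS = {
    "phone": re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # A UPI handle has no dotted domain or hyphen after the '@' part,
    # which keeps it from matching a prefix of an email address.
    "upi": re.compile(r"[\w.-]+@[A-Za-z]+(?![\w-]|\.[\w-])"),
}


def regex_pass(text: str) -> dict[str, set[str]]:
    """Pass 1: deterministic extraction with the patterns above."""
    return {kind: set(p.findall(text)) for kind, p in PATTERNS.items()}


def digits(s: str) -> str:
    """Strip everything except digits."""
    return re.sub(r"\D", "", s)


def is_grounded(kind: str, value: str, text: str) -> bool:
    """Keep an LLM candidate only if it can be found in the source text.

    Phone numbers are compared on their last 10 digits so that formatting
    differences ("98765 43210" vs "9876543210") do not cause rejections.
    This is a coarse check: digits from adjacent numbers are concatenated.
    """
    if kind == "phone":
        d = digits(value)[-10:]
        return len(d) == 10 and d in digits(text)
    return value.lower() in text.lower()


def merge(text: str, llm_output: dict[str, list[str]]) -> dict[str, set[str]]:
    """Union of Pass 1 and Pass 2, dropping ungrounded LLM candidates."""
    result = regex_pass(text)
    for kind, values in llm_output.items():
        kept = {v for v in values if is_grounded(kind, v, text)}
        result.setdefault(kind, set()).update(kept)
    return result


msg = "Send Rs 500 to ravi.k@okaxis or call +91 98765 43210. Mail: help@fraud-site.com"
llm_candidates = {"phone": ["+91 98765 43210", "+91 91234 56789"]}  # second is not in msg
print(merge(msg, llm_candidates))
```

In this sample message the spaced number "+91 98765 43210" slips past the Pass 1 phone pattern; the first LLM candidate is kept because its digits occur in the text, and the second is dropped as ungrounded.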
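The arithmetic behind the Math Forensics example in Section 2 can be made concrete with a short sketch. It assumes the claim has already been parsed into a multiple and a duration in days. According to the audit, `math_forensics` delegates the calculation to `groq/compound-mini`, so this deterministic check only illustrates the math, not that tool call. The 1%-per-day threshold, the `"OK"` flag value, and the names `check_promise` and `ForensicResult` are hypothetical; only `RED_FLAG` and the +30 risk increment come from the audit.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the audit does not state the threshold math_forensics uses.
MAX_PLAUSIBLE_DAILY_ROI = 0.01  # 1% per day, compounded

RED_FLAG_RISK_DELTA = 30  # from the audit: RED_FLAG adds +30 to the Risk Score


@dataclass
class ForensicResult:
    simple_daily_roi: float
    compound_daily_roi: float
    forensic_flag: str
    risk_delta: int


def check_promise(multiple: float, days: float) -> ForensicResult:
    """Evaluate a claim of the form 'multiply your money by `multiple` in `days` days'."""
    if multiple <= 0 or days <= 0:
        raise ValueError("multiple and days must be positive")
    simple = (multiple - 1) / days          # linear (non-compounded) daily return
    compound = multiple ** (1 / days) - 1   # equivalent compounded daily rate
    flagged = compound > MAX_PLAUSIBLE_DAILY_ROI
    return ForensicResult(
        simple_daily_roi=simple,
        compound_daily_roi=compound,
        forensic_flag="RED_FLAG" if flagged else "OK",
        risk_delta=RED_FLAG_RISK_DELTA if flagged else 0,
    )


print(check_promise(2, 2))    # "Double money in 2 days"
print(check_promise(1.05, 365))  # 5% over a year
```

For "Double money in 2 days", `check_promise(2, 2)` gives a simple daily ROI of 0.5 (the 50% quoted in Section 2) and a compounded rate of about 41.4% per day. Either figure is far above the assumed threshold, so the result carries `RED_FLAG` and a +30 risk delta.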
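Section 3 confirms that `mask_pii` and `mask_intelligence` exist but does not show their rules. The sketch below is a hypothetical stand-in, assuming Indian mobile numbers and 12-digit Aadhaar numbers keep only their last four digits and emails keep only the first character of the local part; `mask_text` and its patterns are illustrative, not the project's implementation.

```python
import re

# Hypothetical masking rules; the real mask_pii / mask_intelligence are not shown in this audit.
PHONE = re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{5}(\d{4})(?!\d)")
AADHAAR = re.compile(r"(?<!\d)(\d{4})[\s-]?(\d{4})[\s-]?(\d{4})(?!\d)")
EMAIL = re.compile(r"([\w.+-])[\w.+-]*@([\w-]+(?:\.[\w-]+)+)")


def mask_text(text: str) -> str:
    """Mask phone numbers, Aadhaar numbers, and emails before logging."""
    # Phones first: "+919876543210" is 12 digits and would otherwise be
    # caught by the Aadhaar rule.
    text = PHONE.sub(lambda m: "XXXXXX" + m.group(1), text)
    text = AADHAAR.sub(lambda m: "XXXX XXXX " + m.group(3), text)
    text = EMAIL.sub(lambda m: m.group(1) + "***@" + m.group(2), text)
    return text


print(mask_text("Aadhaar 1234 5678 9012, call 9876543210, mail ravi.kumar@example.com"))
```

Running the phone rule before the Aadhaar rule is the one ordering choice that matters here: a number written as "+919876543210" contains 12 consecutive digits and would otherwise be masked as an Aadhaar number.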