# Stabilization Walkthrough: LLM Client & Forensic Service

## Goal

Resolve critical recurring errors in the Groq API integration that prevented reliable intelligence extraction and honeypot operation:

1. **400 Bad Request Loop**: `groq/compound` and other models failing on strict JSON schemas.
2. **413 Payload Too Large**: `FAST_CHAT` (Llama-3.1-8b) exceeding its 6k TPM (tokens-per-minute) limit when sent the full conversation history.
3. **Service Crashes**: `enrichment_service` dying when the LLM returns chatty or malformed non-JSON responses.
4. **Missed Intelligence**: Regex whitelist excluding test/scam domains like `fakebank`.

## Changes Implemented

### 1. Robust LLM Client (`app/core/llm_client.py`)

- **Auto-Downgrade Strategy**: If `generate_structured` encounters a `400 Bad Request` while using `json_schema` mode, it automatically:
  1. Logs the failure.
  2. Adds the model to a local `schema_failed_models` blacklist.
  3. Retries the request immediately using `json_object` mode (or raw fallback).
- **Crash-Proof Indentation**: Fixed a critical `SyntaxError` (`'await' outside function`) by rewriting the retry loop with strictly enforced indentation.

```python
# Pseudo-code of the fix (this branch runs inside the retry loop)
if response.status_code == 400 and is_schema_model:
    print(f"[RECOVERY] Schema Mode Failed on {model}. Downgrading...")
    schema_failed_models.add(model)
    continue  # Retry loop will now pick json_object
```

### 2. Optimized Persona Engine for Fast Chat (`app/agents/persona_engine.py`)

- **History Truncation**: Modified `_llm_generate` to detect `FAST_CHAT` usage.
- **Tier Compliance**: Enforced strict limits for Groq's Developer Plan (6k TPM):
  - Reduced context window to **last 2 turns** (was 3).
  - Truncated individual message content to **300 chars**.
  - Prevents `413 Payload Too Large` from locking up the honeypot.

### 3. Forensic Service Resilience (`app/intelligence/enrichment_service.py`)

- **Tolerant Parsing**: Wrapped `json.loads` in a robust `try-except` block.
- **Regex Fallback**: If standard parsing fails (common with Llama models returning "Here is your JSON: {...}"), it extracts the JSON object using regex.
- **Crash Prevention**: Returns a safe "fallback" dictionary instead of raising an unhandled exception, so the pipeline continues even if forensic enrichment fails.

### 4. Intelligence Extraction (`app/utils/extractors.py`)

- **Whitelist Expansion**: Added `fakebank`, `fraud`, `example`, and `test` to the UPI domain whitelist.
- **Impact**: Ensures valid-format test indicators (e.g., `scammer@fakebank`) are correctly extracted as UPI IDs instead of being ignored.

## Verification Results

A comprehensive verification script, `verify_all_fixes.py`, confirmed:

1. **Regex**: Correctly extracts `scammer.fraud@fakebank` and `+91` numbers.
2. **Rate Limits**: `groq/compound` TPD is correctly set to 1 billion (effectively unlimited).
3. **LLM Stability**: Calling `generate_structured` with a tricky schema on `FAST_CHAT` no longer crashes. It returns a response, and even if the model outputs chatty text (e.g., "busy hoon abhi", Hinglish for "I'm busy right now"), the system catches the JSON error gracefully.

## Next Steps

- **Monitor Telemetry**: Watch for `[RECOVERY] Schema Mode Failed` logs to identify whether schema mode should be permanently disabled for specific models in the registry.
- **Schema Simplification**: If 400 errors persist even with fallbacks, consider simplifying the JSON schemas used for `FORENSIC_SEARCH`.
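
## Appendix: Tolerant Parsing Sketch

The three-stage behavior described under Forensic Service Resilience (strict `json.loads`, then regex extraction, then a safe fallback dictionary) could look roughly like the sketch below. This is an illustration, not the code in `enrichment_service.py`: the function name `parse_llm_json`, the `FALLBACK` payload, and the regex pattern are all assumptions made for the example.

```python
import copy
import json
import re
from typing import Any

# Hypothetical fallback payload; the real service's fallback fields are not
# shown in this walkthrough.
FALLBACK: dict[str, Any] = {"status": "fallback", "data": {}}

# Greedy match from the first "{" to the last "}" so nested objects survive.
_JSON_OBJECT = re.compile(r"\{.*\}", re.DOTALL)


def parse_llm_json(raw: object) -> dict[str, Any]:
    """Parse an LLM reply as a JSON object without ever raising."""
    if not isinstance(raw, str):
        return copy.deepcopy(FALLBACK)
    try:
        # Stage 1: the reply is already clean JSON.
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Stage 2: pull the object out of chatty text such as
        # 'Here is your JSON: {...}'.
        match = _JSON_OBJECT.search(raw)
        if match is None:
            return copy.deepcopy(FALLBACK)
        try:
            parsed = json.loads(match.group(0))
        except json.JSONDecodeError:
            return copy.deepcopy(FALLBACK)
    # Stage 3: anything that is not a JSON object (list, number, string)
    # is treated as a failed enrichment.
    return parsed if isinstance(parsed, dict) else copy.deepcopy(FALLBACK)


if __name__ == "__main__":
    print(parse_llm_json('Here is your JSON: {"upi": "scammer@fakebank"}'))
    print(parse_llm_json("busy hoon abhi"))
```

The greedy pattern is a deliberate simplification: it handles a single object wrapped in prose, but a reply containing two separate objects would fail stage 2 and fall through to the fallback. Returning a deep copy keeps callers from mutating the shared default.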