# Stabilization Walkthrough: LLM Client & Forensic Service

## Goal
Resolve critical recurring errors in the Groq API integration preventing reliable intelligence extraction and honeypot operation:
1.  **400 Bad Request Loop**: `groq/compound` and other models failing on strict JSON schemas.
2.  **413 Payload Too Large**: `FAST_CHAT` (Llama-3.1-8b) overflowing context limits (6k TPM) with full conversation history.
3.  **Service Crashes**: `enrichment_service` dying when LLM returns chatty/malformed non-JSON responses.
4.  **Missed Intelligence**: Regex whitelist excluding test/scam domains like `fakebank`.

## Changes Implemented

### 1. Robust LLM Client (`app/core/llm_client.py`)
-   **Auto-Downgrade Strategy**: If `generate_structured` encounters a `400 Bad Request` while using `json_schema` mode, it automatically:
    1.  Logs the failure.
    2.  Adds the model to a local `schema_failed_models` blacklist.
    3.  Retries the request immediately using `json_object` mode (or raw fallback).
-   **Crash-Proof Indentation**: Fixed a critical `SyntaxError` ('await outside function') by rewriting the retry loop with strictly enforced indentation.

```python
# Pseudo-code of the fix
if response.status_code == 400 and is_schema_model:
    print(f"[RECOVERY] Schema Mode Failed on {model}. Downgrading...")
    schema_failed_models.add(model)
    continue # Retry loop will now pick json_object
```

### 2. Optimized Persona Engine for Fast Chat (`app/agents/persona_engine.py`)
-   **History Truncation**: Modified `_llm_generate` to detect `FAST_CHAT` usage.
-   **Tier Compliance**: Enforced strict limits for Groq's Developer Plan (6k TPM):
    -   Reduced context window to **last 2 turns** (was 3).
    -   Truncated individual message content to **300 chars**.
    -   Prevents `413 Payload Too Large` from locking up the honeypot.

### 3. Forensic Service Resilience (`app/intelligence/enrichment_service.py`)
-   **Tolerant Parsing**: Wrapped `json.loads` in a robust `try-except` block.
-   **Regex Fallback**: If standard parsing fails (common with Llama models returning "Here is your JSON: {...}"), it extracts the JSON object using regex.
-   **Crash Prevention**: Returns a safe "fallback" dictionary instead of raising an unhandled exception, ensuring the pipeline continues even if forensic enrichment fails.

### 4. Intelligence Extraction (`app/utils/extractors.py`)
-   **Whitelist Expansion**: Added `fakebank`, `fraud`, `example`, and `test` to the UPI domain whitelist.
-   **Impact**: Ensures valid-format test indicators (e.g., `scammer@fakebank`) are correctly extracted as UPI IDs instead of being ignored.

## Verification Results

A comprehensive verification script `verify_all_fixes.py` confirmed:
1.  **Regex**: Correctly extracts `scammer.fraud@fakebank` and `+91` numbers.
2.  **Rate Limits**: `groq/compound` TPD is correctly set to 1 Billion (Unlimited).
3.  **LLM Stability**: Calling `generate_structured` with a tricky schema on `FAST_CHAT` no longer crashes. It returns a response, and even if the model outputs chatty text (e.g., "busy hoon abhi"), the system catches the JSON error gracefully.

## Next Steps
-   **Monitor Telemetry**: Watch for `[RECOVERY] Schema Mode Failed` logs to identify if we need to permanently disable schema mode for specific models in the registry.
-   **Schema Simplification**: If 400 errors persist even with fallbacks, consider simplifying the JSON schemas used for `FORENSIC_SEARCH`.