Spaces:

AvinashAnalytics
/

sentinel-scam-honeypo

Paused

Deployment Ready: Fixed scam detection low confidence, added production audit report, optimized throttles

1838600 5 months ago

1.93 kB

Topic 10: Cost Control & Token Optimization Strategy

Audit Date: 2026-02-01 Auditor: Agent Antigravity Scope: FinOps & Efficiency

The system is engineered to minimize "Token Burn" (Wasted input tokens) and "Looping Costs".

Tactic	Implementation	Savings Impact
Prompt Caching	The massive `SCAM_TAXONOMY` (2k tokens) is static. Groq caches likely 90% of inputs.	High (~50%)
Model Downsizing	"Stall" messages and simple "Hook" replies use `8B` models instead of `70B`.	High (~80%)
Strict Output	`JSON_SCHEMA` forces the LLM to output only the JSON, no "Here is your analysis..." chatter.	Medium (~20%)

Max Context: config.MAX_CONVERSATION_LENGTH = 50.
- Why? Prevents infinite loops where an attacker keeps the bot talking forever.
Max Tokens: LLM_MAX_TOKENS = 500.
- Why? Prevents the model from generating 4-page essays when a 1-line reply is needed.
Rate Limits: 30 Requests/Minute.
- Calculation: 30 Req * 2k In / 500 Out = ~75k Tokens/Min Max.
- Cost: At Groq prices (~$0.50/M), max burn is ~$0.04/min. Sustainable.

File: app/core/llm_client.py
Logic: _switchboard(role)
- If Status == Hook (Simple): Use FAST_MODEL (Cheap).
- If Status == Extract (Complex): Use SMART_MODEL (Expensive).
- Result: You don't pay for a PhD (70B) to say "Hello" (8B).

The system includes code to switch providers if RateLimitError (429) occurs.