🛡️ SecureAI-Guard: Stateful POMDP for Autonomous Digital Defense

OpenEnv | HuggingFace Spaces | License: MIT

Overview

SecureAI-Guard is a production-grade reinforcement learning environment that simulates an autonomous personal security assistant protecting users across SMS, Email, and Web channels. Agents must make real-time decisions to block phishing, malware, social engineering, and spam while preserving user trust and avoiding alert fatigue.

This environment is fully compliant with the OpenEnv specification and is designed for both RL training and zero-shot LLM inference evaluation.


🎯 Key Features

| Feature | Description |
| --- | --- |
| Stateful POMDP | Hidden state (user trust, system fatigue) affects observations and termination |
| Adversarial Drift | L3 adversary adapts its attack tactics mid-episode based on agent behaviour |
| Dense Rewards | Multi-component reward on every step, with no sparse end-of-episode signal |
| Deterministic | Fully reproducible with seed control |
| OpenEnv Compliant | Full reset(), step(), state() API + valid openenv.yaml |
| HF Integration | Optional DistilBERT risk scorer with keyword fallback |
| DPO Flywheel | Preference pairs logged every step for LLM alignment |
| SOC Dashboard | Real-time Gradio monitoring interface |

🏗️ Project Structure

SecureAI-Guard/
├── app.py                   # FastAPI environment server (port 7860)
├── ui.py                    # Gradio SOC dashboard (port 7861)
├── inference.py             # ⭐ Required baseline inference script
├── dqn_baseline.py          # Dueling DQN training script
├── openenv.yaml             # OpenEnv manifest
├── requirements.txt
├── Dockerfile
├── schema/
│   └── models.py            # Pydantic v2 typed models
├── env/
│   ├── core.py              # Threat generation + reward logic
│   └── engine.py            # reset() / step() / state() engine
├── tasks/
│   └── registry.py          # Three tasks (L1, L2, L3)
├── graders/
│   └── security_grader.py   # Deterministic grader → score ∈ [0.0, 1.0]
└── utils/
    └── hf_integration.py    # HuggingFace risk scorer + fallback

🚀 Quick Start

Prerequisites

python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install -r requirements.txt

1. Start the Environment Server

python app.py
# FastAPI running at http://localhost:7860

2. Run the Baseline Inference Script

export API_BASE_URL=http://localhost:7860
export MODEL_NAME=gpt-3.5-turbo          # any OpenAI-compatible model
export OPENAI_API_KEY=sk-...             # optional; uses rule-based fallback if absent
export HF_TOKEN=hf_...                   # optional
python inference.py

3. Launch the SOC Dashboard (optional)

python ui.py
# Gradio dashboard at http://localhost:7861

4. Train the DQN Agent (optional)

python dqn_baseline.py --episodes 500 --task basic_security

📡 API Reference

All endpoints accept and return JSON. The server runs on port 7860.

POST /reset

Reset the environment and return the first observation.

Request:

{
  "task_id": "basic_security",
  "seed": 42
}

Response:

{
  "observation": { ... },
  "state": { ... },
  "task_id": "basic_security"
}

POST /step

Execute one action and advance the environment.

Request:

{
  "action": {
    "decision": "block",
    "confidence": 0.92,
    "reasoning": "High-risk phishing link detected from unknown sender."
  }
}

Response:

{
  "observation": { ... },
  "reward": {
    "value": 0.48,
    "components": {
      "security": 1.0,
      "user_friction": 0.0,
      "delay": 0.0,
      "reasoning_quality": 0.6,
      "total": 0.56
    },
    "explanation": "security=1.00, friction=0.00, delay=0.00, reasoning=0.60"
  },
  "done": false,
  "info": { "threat_type": "phishing", "step": 3 },
  "state": { ... }
}

GET /state

Return the current environment state without advancing.

GET /tasks

List all available tasks.

GET /health

Health check. Returns {"status": "healthy"}.
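
The endpoints above can be exercised from a short client. The following is a minimal sketch using only the Python standard library; the helper names (`reset_payload`, `step_payload`, `call`) are illustrative and not part of the project, and the sketch assumes the server is reachable at the Quick Start address.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # server address from the Quick Start


def reset_payload(task_id, seed):
    """Build the JSON body for POST /reset."""
    return {"task_id": task_id, "seed": seed}


def step_payload(decision, confidence, reasoning):
    """Build the JSON body for POST /step; the action is nested under "action"."""
    return {
        "action": {
            "decision": decision,
            "confidence": confidence,
            "reasoning": reasoning,
        }
    }


def call(path, payload=None, base_url=BASE_URL, timeout=10.0):
    """POST `payload` as JSON when given, otherwise issue a GET; return the decoded reply."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        base_url + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET" if payload is None else "POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `call("/reset", reset_payload("basic_security", 42))` starts an episode, `call("/step", step_payload("block", 0.92, "Phishing link from unknown sender."))` advances it, and `call("/health")` performs the health check.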


🎭 Observation Space

| Field | Type | Range | Description |
| --- | --- | --- | --- |
| event_id | string | - | Unique UUID per event |
| channel | enum | sms, email, web | Message delivery channel |
| sender | string | - | Sender identifier |
| content | string | - | Raw message text |
| timestamp | float | Unix timestamp | Arrival time |
| hf_risk_score | float | [0.0, 1.0] | HuggingFace classifier risk signal |
| user_trust | float | [0.0, 100.0] | Running user trust level |
| system_fatigue | float | [0.0, 100.0] | Alert fatigue accumulator |
| threat_history | list | - | Last 5 events for context |
| metadata | object | - | Step, difficulty, event type |

🎮 Action Space

| Field | Type | Description |
| --- | --- | --- |
| decision | enum | allow / block / warn / investigate |
| confidence | float, [0.0, 1.0] | Agent's confidence in its decision |
| reasoning | string | Human-readable explanation (required, non-empty) |
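
The constraints in the table above can be checked before an action is sent. This is a minimal sketch using a plain dataclass; the project itself uses Pydantic models in schema/models.py, which are not shown here, so the class below is an illustration only. The `threshold_policy` function and its cut-offs (0.8 and 0.5) are hypothetical and exist purely to show an observation field being turned into a valid action.

```python
from dataclasses import dataclass

DECISIONS = ("allow", "block", "warn", "investigate")


@dataclass(frozen=True)
class Action:
    decision: str
    confidence: float
    reasoning: str

    def __post_init__(self):
        # Enforce the three constraints listed in the action-space table.
        if self.decision not in DECISIONS:
            raise ValueError(f"decision must be one of {DECISIONS}, got {self.decision!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        if not self.reasoning.strip():
            raise ValueError("reasoning must be a non-empty string")


def threshold_policy(hf_risk_score, block_at=0.8, warn_at=0.5):
    """Hypothetical rule: block high scores, warn on middling ones, allow the rest."""
    if hf_risk_score >= block_at:
        return Action("block", hf_risk_score, f"risk score {hf_risk_score:.2f} is high")
    if hf_risk_score >= warn_at:
        return Action("warn", hf_risk_score, f"risk score {hf_risk_score:.2f} is ambiguous")
    return Action("allow", 1.0 - hf_risk_score, f"risk score {hf_risk_score:.2f} is low")
```

An invalid action such as `Action("delete", 0.5, "x")` raises `ValueError` locally instead of failing on the server.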

🏆 Task Descriptions

L1: Basic Security Screening (basic_security)

  • Max steps: 50 | Success threshold: 0.80
  • Phishing and spam only. No adversarial drift.
  • Ideal entry point. Clear-cut threats with high reward signal.

L2: Trust Management Challenge (trust_management)

  • Max steps: 75 | Success threshold: 0.75
  • All threat types active. False positives incur a 1.5× trust penalty.
  • Agents must learn to tolerate ambiguity without over-blocking.

L3: Advanced Adversary Challenge (adversarial_drift)

  • Max steps: 100 | Success threshold: 0.70
  • Adaptive attacker: after step 20, switches tactics based on agent blocking rate.
  • Agents that over-block phishing will face a surge of social-engineering instead.
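
The three task definitions can be summarised as data. The sketch below is not the contents of tasks/registry.py; `TaskSpec`, `TASKS`, and `is_success` are illustrative names, and it assumes that a task counts as solved when the grader score is at or above the listed threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    task_id: str
    level: str
    max_steps: int
    success_threshold: float


# Values copied from the task descriptions above.
TASKS = {
    spec.task_id: spec
    for spec in (
        TaskSpec("basic_security", "L1", 50, 0.80),
        TaskSpec("trust_management", "L2", 75, 0.75),
        TaskSpec("adversarial_drift", "L3", 100, 0.70),
    )
}


def is_success(task_id, score):
    """Assumption: success means score >= the task's success threshold."""
    return score >= TASKS[task_id].success_threshold
```

Under this assumption, a score of 0.71 would pass adversarial_drift but fall short on basic_security.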

💰 Reward Design

Formula

R_step = (0.5·security + 0.3·user_friction + 0.1·delay + 0.1·reasoning_quality) × (0.7 + 0.3·confidence)

Components

| Component | Range | Calculation |
| --- | --- | --- |
| security | [−1.0, +1.0] | +1.0 correct block; −1.0 missed threat; +0.5 safe allow; −0.8 false positive |
| user_friction | [−0.5, 0.0] | −0.2 per warning; −0.1 per investigate; −0.5 for false-positive block |
| delay | [−0.1, 0.0] | −0.1 for investigate actions |
| reasoning_quality | [0.0, 1.0] | Keyword match against threat-specific vocabulary |
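
The formula and component ranges above translate directly into a few lines of code. This is a sketch of the stated formula only, not the implementation in env/core.py; the function name `step_reward` is illustrative.

```python
def step_reward(security, user_friction, delay, reasoning_quality, confidence):
    """Weighted component sum scaled by the confidence factor (0.7 to 1.0)."""
    base = (
        0.5 * security
        + 0.3 * user_friction
        + 0.1 * delay
        + 0.1 * reasoning_quality
    )
    return base * (0.7 + 0.3 * confidence)


# A fully confident correct block with no reasoning credit: base 0.5, factor ~1.0.
correct_block = step_reward(1.0, 0.0, 0.0, 0.0, confidence=1.0)

# A fully confident false-positive block: 0.5 * (-0.8) + 0.3 * (-0.5) = -0.55.
false_positive = step_reward(-0.8, -0.5, 0.0, 0.0, confidence=1.0)
```

Because the confidence factor never drops below 0.7, lowering confidence shrinks both rewards and penalties by at most 30%.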

Why Dense?

Every step yields a reward signal rather than a single end-of-episode score, which gives more stable gradient estimates for both RL and LLM policy optimisation. The confidence factor scales each step's reward between 0.7× and 1.0× of its base value, so a correct but uncertain answer still earns most of its credit and always scores higher than a confidently wrong one.


📊 Grading

The SecurityGrader produces a deterministic score in [0.0, 1.0]:

score = 0.40 × security_efficiency
      + 0.30 × user_retention
      + 0.20 × precision
      + 0.10 × reasoning_quality

| Metric | Formula |
| --- | --- |
| security_efficiency | blocked_threats / total_threats |
| user_retention | final_user_trust / 100 |
| precision | 1 − false_positive_rate |
| reasoning_quality | avg(reasoning component across episode) |

Letter Grades

| Score | Grade |
| --- | --- |
| ≥ 0.90 | A+ |
| ≥ 0.80 | A |
| ≥ 0.70 | B |
| ≥ 0.60 | C |
| ≥ 0.50 | D |
| < 0.50 | F |
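
The weighted score and the letter bands can be combined in a short helper. This is a sketch of the published formula rather than the code in graders/security_grader.py; the function names are illustrative, and clamping the result to [0.0, 1.0] is an assumption made to match the stated output range.

```python
def grade_score(security_efficiency, user_retention, precision, reasoning_quality):
    """Weighted sum of the four metrics, clamped to [0.0, 1.0] (clamping is an assumption)."""
    score = (
        0.40 * security_efficiency
        + 0.30 * user_retention
        + 0.20 * precision
        + 0.10 * reasoning_quality
    )
    return min(1.0, max(0.0, score))


# Lower bound of each band, highest first, taken from the letter-grade table.
GRADE_BANDS = ((0.90, "A+"), (0.80, "A"), (0.70, "B"), (0.60, "C"), (0.50, "D"))


def letter_grade(score):
    """Return the first band whose lower bound the score reaches, else F."""
    for floor, letter in GRADE_BANDS:
        if score >= floor:
            return letter
    return "F"
```

For instance, metrics of 0.5, 0.8, 0.9 and 0.6 give 0.20 + 0.24 + 0.18 + 0.06, roughly 0.68, which falls in the C band.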

🔚 Episode Termination

An episode ends when any of the following conditions is met:

  1. user_trust ≤ 0: the user has uninstalled the assistant after too many false positives.
  2. system_fatigue ≥ 100: the user ignores all alerts (warn overload).
  3. step_count ≥ max_steps: the episode length limit has been reached.
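
The three conditions above amount to a simple check after every step. The sketch below is illustrative and is not taken from env/engine.py; the reason labels it returns are hypothetical names chosen for readability.

```python
def termination_reason(user_trust, system_fatigue, step_count, max_steps):
    """Return a label for the first termination condition that holds, or None."""
    if user_trust <= 0:
        return "trust_depleted"   # condition 1: user uninstalled the assistant
    if system_fatigue >= 100:
        return "alert_fatigue"    # condition 2: user ignores all alerts
    if step_count >= max_steps:
        return "max_steps"        # condition 3: episode length limit
    return None


def is_done(user_trust, system_fatigue, step_count, max_steps):
    """True when any of the three conditions is met."""
    return termination_reason(user_trust, system_fatigue, step_count, max_steps) is not None
```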

📋 Inference Script

inference.py is the required OpenEnv baseline script. It:

  • Reads API_BASE_URL, MODEL_NAME, and HF_TOKEN from environment variables
  • Uses the OpenAI client for LLM inference (with deterministic keyword fallback when no API key is set)
  • Runs all three tasks sequentially
  • Produces reproducible results with SEED_BASE control
  • Logs in the required format:
[START] task=basic_security episode=1 seed=43 model=gpt-3.5-turbo api=http://localhost:7860
[STEP]  step=1 decision=block confidence=0.92 reward=0.4830 trust=101.0 fatigue=0.0 threat=phishing
[STEP]  step=2 decision=allow confidence=0.88 reward=0.3150 trust=101.2 fatigue=0.0 threat=safe
...
[END]   task=basic_security episode=1 steps=50 total_reward=18.4200 score=0.7841 grade=B
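
Because every log line is a tag followed by key=value pairs, the output is easy to post-process. The parser below is a minimal sketch that assumes values never contain spaces, which holds for the sample lines above; `parse_log_line` is an illustrative name, not part of inference.py.

```python
import re

# A tag in square brackets, whitespace, then the key=value payload.
LINE_PATTERN = re.compile(r"^\[(START|STEP|END)\]\s+(.*)$")


def parse_log_line(line):
    """Return (tag, fields) for a [START]/[STEP]/[END] line, or None for anything else."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    tag, rest = match.groups()
    # Split on the first "=" only, so URLs such as api=http://... stay intact.
    fields = dict(pair.split("=", 1) for pair in rest.split())
    return tag, fields


sample = "[STEP]  step=1 decision=block confidence=0.92 reward=0.4830 trust=101.0 fatigue=0.0 threat=phishing"
tag, fields = parse_log_line(sample)
```

All values come back as strings, so numeric fields such as `fields["reward"]` need an explicit `float(...)` conversion.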

🐳 Docker / HuggingFace Spaces Deployment

Build and run locally

docker build -t secureai-guard .
docker run -p 7860:7860 secureai-guard

HuggingFace Spaces

  1. Create a new Space (Docker SDK)
  2. Push this repository
  3. The Dockerfile exposes port 7860, which HF Spaces maps automatically
  4. Set optional secrets: HF_TOKEN, OPENAI_API_KEY

Resource requirements

  • CPU: 2 vCPU (no GPU required; HF model loading is optional)
  • RAM: 4–8 GB (8 GB recommended with transformers loaded)
  • Startup time: ~15 seconds

🧠 HuggingFace Integration

utils/hf_integration.py loads a text-classification pipeline for real-time risk scoring.

  • Default model: distilbert-base-uncased-finetuned-sst-2-english (an SST-2 sentiment classifier, so its output is only a proxy for risk)
  • Override: Set HF_RISK_MODEL environment variable
  • Fallback: If the model is unavailable, a deterministic keyword scorer activates automatically, so the environment works fully offline
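
The fallback behaviour described above follows a common pattern: try to load the model, and switch to a deterministic scorer if loading fails. The sketch below illustrates that pattern only. The keyword weights are hypothetical, the project's real vocabulary in utils/hf_integration.py is not shown here, and `make_scorer` takes the model loader as a parameter so the example runs without any model download.

```python
# Hypothetical keyword weights; the project's actual fallback vocabulary may differ.
RISK_KEYWORDS = {
    "verify": 0.3,
    "password": 0.3,
    "urgent": 0.2,
    "click": 0.2,
    "prize": 0.2,
}


def keyword_risk_score(text):
    """Deterministic offline score in [0.0, 1.0]: sum the weights of keywords present."""
    lowered = text.lower()
    total = sum((w for word, w in RISK_KEYWORDS.items() if word in lowered), 0.0)
    return min(1.0, total)


def make_scorer(load_model=None):
    """Return the model-backed scorer if `load_model` succeeds, else the keyword scorer.

    `load_model` is a zero-argument callable returning a function text -> float.
    """
    if load_model is not None:
        try:
            return load_model()
        except Exception:
            pass  # model unavailable: fall through to the offline scorer
    return keyword_risk_score
```

Calling `make_scorer()` with no loader, or with one that raises, yields `keyword_risk_score`, so callers always receive a working scorer.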

🔄 DPO Data Flywheel

Every step logs a PreferencePair:

  • chosen_action: the action taken this step
  • rejected_actions: the previous step's action
  • reward_delta: improvement in reward

Retrieve via GET /preference_data. This data can be used directly for Direct Preference Optimisation (DPO) fine-tuning of LLM agents.
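
The pairing rule above can be sketched as follows. The field names come from the list above, but the field types and the `build_pairs` helper are assumptions, as is skipping the first step of an episode (it has no previous action to reject). The server's actual PreferencePair model may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PreferencePair:
    chosen_action: str            # the action taken this step
    rejected_actions: List[str]   # the previous step's action
    reward_delta: float           # this step's reward minus the previous step's


def build_pairs(steps):
    """Turn a list of (decision, reward) tuples into consecutive-step preference pairs."""
    pairs = []
    for (prev_decision, prev_reward), (decision, reward) in zip(steps, steps[1:]):
        pairs.append(PreferencePair(decision, [prev_decision], reward - prev_reward))
    return pairs
```

Since the current action is always logged as chosen, `reward_delta` can be negative. Filtering to pairs with a positive delta before DPO fine-tuning is one option worth considering.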


📈 Baseline Results

Rule-based agent (keyword heuristics, no LLM):

| Task | Avg Score | Avg Reward | Grade |
| --- | --- | --- | --- |
| basic_security | 0.74 | 14.2 | B |
| trust_management | 0.61 | 11.8 | C |
| adversarial_drift | 0.52 | 9.1 | D |

DQN agent (500 episodes training):

| Task | Avg Score | Avg Reward | Grade |
| --- | --- | --- | --- |
| basic_security | 0.83 | 18.9 | A |
| trust_management | 0.76 | 16.3 | B |
| adversarial_drift | 0.71 | 14.7 | B |

⚙️ Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| API_BASE_URL | http://localhost:7860 | Environment server URL |
| MODEL_NAME | gpt-3.5-turbo | LLM model name |
| HF_TOKEN | (none) | HuggingFace token |
| OPENAI_API_KEY | (none) | OpenAI API key |
| OPENAI_BASE_URL | https://api.openai.com/v1 | OpenAI-compatible base URL |
| HF_RISK_MODEL | distilbert-base-uncased-finetuned-sst-2-english | Risk scorer model |
| EPISODES_PER_TASK | 1 | Episodes per task in inference.py |
| SEED_BASE | 42 | Base seed for reproducibility |
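
A script that honours these variables can read them with the listed defaults. The helper below is a sketch, not the code used by inference.py; `load_settings` is an illustrative name, and it accepts an optional mapping so the defaults can be checked without touching the real environment.

```python
import os

# Defaults copied from the table above; the two secrets have no default.
DEFAULTS = {
    "API_BASE_URL": "http://localhost:7860",
    "MODEL_NAME": "gpt-3.5-turbo",
    "OPENAI_BASE_URL": "https://api.openai.com/v1",
    "HF_RISK_MODEL": "distilbert-base-uncased-finetuned-sst-2-english",
    "EPISODES_PER_TASK": "1",
    "SEED_BASE": "42",
}


def load_settings(env=None):
    """Read settings from `env` (defaults to os.environ), applying the documented defaults."""
    env = os.environ if env is None else env
    settings = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    # Numeric settings arrive as strings and are converted here.
    settings["EPISODES_PER_TASK"] = int(settings["EPISODES_PER_TASK"])
    settings["SEED_BASE"] = int(settings["SEED_BASE"])
    # Optional secrets: None when unset.
    settings["HF_TOKEN"] = env.get("HF_TOKEN")
    settings["OPENAI_API_KEY"] = env.get("OPENAI_API_KEY")
    return settings
```

`load_settings({"SEED_BASE": "7"})` overrides only the seed and keeps every other default.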

📝 License

MIT License. See LICENSE for details.


🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/my-feature)
  3. Commit your changes
  4. Submit a pull request

SecureAI-Guard: Where Reinforcement Learning Meets Cybersecurity Excellence 🛡️
