---
title: EU AI Act Compliance Auditor
emoji: "π"
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 7860
tags:
- openenv
---
# EU AI Act Compliance Auditor
An MCP-based environment where LLM agents audit AI systems for EU AI Act compliance, from risk classification through violation identification to remediation planning. Scenarios are based on real articles of the regulation. Parameters are randomized on every reset to prevent memorization, so agents must learn the **audit process**, not specific answers.
## Why This Environment
The EU AI Act's major enforcement deadline is **August 2, 2026**, less than 4 months away at the time of writing. Companies deploying AI in Europe face fines of up to **EUR 35 million or 7% of worldwide annual turnover**, whichever is higher, for the most serious violations. Yet no automated compliance auditing benchmark exists. This environment fills that gap with 8 realistic scenarios covering the full spectrum of EU AI Act risk categories.
## Stats
| Metric | Value |
|--------|-------|
| Scenarios | 8 |
| MCP Tools | 11 |
| Reward Components | 6 |
| Difficulty Tiers | 3 (easy / medium / hard) |
| State Graph Nodes | 12 per scenario |
| Parameter Randomization | Company, region, version, dates per reset |
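
Per-reset parameter randomization can be sketched as follows. This is a minimal illustration, assuming a seeded `random.Random` per reset and small hypothetical value pools (`COMPANIES`, `REGIONS`); `ScenarioParams` and `randomize_params` are illustrative names, not the environment's actual code. Only surface details vary, while a scenario's ground-truth violations would stay fixed.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical value pools; the environment's real pools are not documented here.
COMPANIES = ["Acme Analytics GmbH", "Nordlicht AI AB", "Soleil Data SAS"]
REGIONS = ["DE", "SE", "FR", "NL", "IT"]


@dataclass(frozen=True)
class ScenarioParams:
    """Surface details that change on every reset (hypothetical structure)."""
    company: str
    region: str
    version: str
    deployment_date: date


def randomize_params(seed: int) -> ScenarioParams:
    """Draw one episode's surface details from a seeded RNG."""
    rng = random.Random(seed)  # local RNG: the same seed reproduces the same episode
    version = f"{rng.randint(1, 5)}.{rng.randint(0, 9)}.{rng.randint(0, 20)}"
    # Arbitrary illustrative window: a deployment date within 2024-2025.
    deployment = date(2024, 1, 1) + timedelta(days=rng.randint(0, 730))
    return ScenarioParams(
        company=rng.choice(COMPANIES),
        region=rng.choice(REGIONS),
        version=version,
        deployment_date=deployment,
    )
```

Seeding a local generator rather than the global `random` module keeps episodes reproducible for debugging while still giving the agent different surface details across seeds.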
## Tools (MCP Interface)
### Investigation
| Tool | Description |
|------|-------------|
| `get_system_overview` | Gather system description, deployer info, deployment context |
| `classify_system` | Classify risk level (prohibited / high_risk / limited_risk / minimal_risk) |
| `check_documentation` | Review Annex IV technical documentation completeness |
| `audit_training_data` | Check bias, representativeness, data governance (Article 10) |
| `verify_human_oversight` | Verify Article 14 human-in-the-loop mechanisms |
| `check_transparency` | Check Article 50 transparency obligations |
| `assess_risk_management` | Review risk management system (Article 9) |
| `check_logging` | Verify automatic logging and traceability (Article 12) |
### Resolution
| Tool | Description |
|------|-------------|
| `submit_finding` | Report a compliance violation (call per finding) |
| `recommend_fix` | Propose remediation with priority |
| `verify_compliance` | Final determination; triggers the terminal reward |
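
The Methodology reward component expects these tools to be used in the order overview, classify, investigate, find, fix, verify. One way such an ordering could be scored is sketched below, assuming each tool maps to a single phase and a call counts as "in order" if it never steps back to an earlier phase than one already reached. `PHASE_OF_TOOL` and `methodology_score` are hypothetical names, and this rule is an illustration rather than the engine's actual scoring.

```python
from typing import Sequence

# Audit phases: 0 overview, 1 classify, 2 investigate, 3 find, 4 fix, 5 verify.
PHASE_OF_TOOL = {
    "get_system_overview": 0,
    "classify_system": 1,
    "check_documentation": 2,
    "audit_training_data": 2,
    "verify_human_oversight": 2,
    "check_transparency": 2,
    "assess_risk_management": 2,
    "check_logging": 2,
    "submit_finding": 3,
    "recommend_fix": 4,
    "verify_compliance": 5,
}


def methodology_score(calls: Sequence[str]) -> float:
    """Fraction of tool calls that do not return to an earlier audit phase."""
    if not calls:
        return 0.0
    highest = -1  # furthest phase reached so far
    in_order = 0
    for tool in calls:
        phase = PHASE_OF_TOOL[tool]
        if phase >= highest:
            in_order += 1
            highest = phase
        # A call to an earlier phase counts as out of order; `highest` is unchanged.
    return in_order / len(calls)
```

For example, calling `classify_system` before `get_system_overview` and then `verify_compliance` would score 2/3 under this rule, since the overview call steps backwards.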
## Scenarios
### Easy
- **Customer Service Chatbot**: Limited-risk system missing AI disclosure (Article 50)
- **Music Recommendation Engine**: Minimal-risk system where only a voluntary code of conduct applies
### Medium
- **AI Resume Screener**: High-risk hiring AI (Annex III) with gender bias, missing oversight, and incomplete documentation
- **Credit Scoring Model**: High-risk fintech system with opaque features and no right to human review
- **Emergency Triage AI**: Medical device with age bias and no prospective clinical validation
### Hard
- **Citizen Wellness App**: A **PROHIBITED** social scoring system disguised as a voluntary wellness tool. The agent must identify it as prohibited under Article 5(1)(c).
- **AI Content Studio**: Deepfake generation platform missing every Article 50 transparency obligation
- **Corporate AI Portfolio**: Multi-system audit of 4 interconnected AI systems sharing a data lake. The agent must identify compound risks and cross-system data flow issues.
## 6-Component Reward
| Component | Weight | Description |
|-----------|--------|-------------|
| Classification | 20% | Correct risk category identification |
| Finding Completeness | 25% | Recall of ground-truth violations |
| Finding Precision | 15% | Penalizes false positives and findings based on red herrings |
| Remediation Quality | 15% | Correct fixes in priority order |
| Methodology | 15% | Followed the correct audit sequence (overview → classify → investigate → find → fix → verify) |
| Efficiency | 10% | Queries used vs optimal path |

All rewards are clamped to the range [0.01, 0.99] for OpenEnv validator compliance.
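
A minimal sketch of how the six components could combine, assuming each component is already normalized to [0, 1] and the total is a plain weighted sum followed by clamping. The weights come from the reward table; the helper names (`finding_recall`, `finding_precision`, `efficiency_score`, `total_reward`) and the conventions for empty sets are hypothetical and not taken from `engine.py`.

```python
# Hypothetical sketch of the terminal reward; not the actual engine.py code.
WEIGHTS = {
    "classification": 0.20,
    "finding_completeness": 0.25,
    "finding_precision": 0.15,
    "remediation_quality": 0.15,
    "methodology": 0.15,
    "efficiency": 0.10,
}


def finding_recall(submitted: set[str], ground_truth: set[str]) -> float:
    """Share of ground-truth violations the agent reported."""
    if not ground_truth:
        return 1.0  # assumed convention: nothing to find means full recall
    return len(submitted & ground_truth) / len(ground_truth)


def finding_precision(submitted: set[str], ground_truth: set[str]) -> float:
    """Share of reported findings that are real violations; red herrings lower it."""
    if not submitted:
        return 1.0  # assumed convention: no findings means no false positives
    return len(submitted & ground_truth) / len(submitted)


def efficiency_score(queries_used: int, optimal_queries: int) -> float:
    """1.0 on (or under) the optimal path, shrinking as extra queries are spent."""
    if queries_used <= 0:
        return 0.0
    return min(1.0, optimal_queries / queries_used)


def total_reward(components: dict[str, float]) -> float:
    """Weighted sum of the six components, clamped to [0.01, 0.99]."""
    raw = sum(weight * components[name] for name, weight in WEIGHTS.items())
    return min(0.99, max(0.01, raw))
```

Because the weights sum to 1.0, a perfect audit would reach a raw score of 1.0 and be clamped to 0.99, while an empty episode would be lifted to 0.01.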
## Quick Start
```bash
# Install
pip install "openenv-core[core]" fastmcp gradio httpx openai
# Run locally
uvicorn server.app:app --host 0.0.0.0 --port 7860
# Run inference
export API_BASE_URL="https://integrate.api.nvidia.com/v1"
export MODEL_NAME="google/gemma-4-31b-it"
export HF_TOKEN="your-key"
python inference.py --space https://Itachi1824-compliance-auditor-env.hf.space
# Docker
docker build -t compliance-env . && docker run -p 7860:7860 compliance-env
```
## API
### Standard OpenEnv
- `POST /reset`: Start a new episode
- `POST /step`: Execute an action
- `GET /state`: Get the episode state
- `GET /health`: Health check
### Custom HTTP Session API
- `POST /api/reset`: Create a session; returns the tools and the initial observation
- `POST /api/call_tool`: Call an audit tool in a session
- `POST /api/close`: End a session
## Architecture
```
compliance_env/
├── server/
│   ├── app.py            # FastAPI + sessions + Gradio UI
│   ├── environment.py    # MCP environment with 11 tools
│   └── engine.py         # State graph + 6-component reward
├── scenarios/
│   └── registry.py       # 8 scenarios with state graphs
├── client.py             # HTTP client for inference
├── inference.py          # OpenAI function-calling agent
├── models.py             # Pydantic observation/state models
├── Dockerfile            # Port 7860, python:3.11-slim
└── openenv.yaml          # OpenEnv manifest with tasks
```