Space: Itachi1824/compliance-auditor-env

981 kB

1 contributor

History: 35 commits

Itachi-1824

feat: 10-model leaderboard — step-3.5-flash leads at 0.381, 6 tiers of differentiation

99c0073 about 2 months ago

| Name | Size | Last commit | Updated |
| --- | --- | --- | --- |
| outputs | - | feat: 10-model leaderboard — step-3.5-flash leads at 0.381, 6 tiers of differentiation | about 2 months ago |
| scenarios | - | feat: investigation-grade overhaul + procedural generation | 2 months ago |
| scripts | - | feat: /grader endpoint, validate-submission.sh, session cleanup, final polish | 2 months ago |
| server | - | fix: query budget 500, sync all hardcoded refs across codebase | 2 months ago |
| tests | - | feat: playground all-fields-visible grid, loop detection, 200 query budget | 2 months ago |
| .dockerignore | 53 Bytes | fix: 3 showstoppers (dockerignore, END format, efficiency gaming) + license + unused imports | 2 months ago |
| .gitattributes | 1.58 kB | fix: brutal audit — reset tool_call_counts, date dedup, unused vars, playground overhaul with scenario picker + status dashboard | 2 months ago |
| .gitignore | 115 Bytes | feat: eu ai act compliance auditor — mcp-based openenv environment | 2 months ago |
| Dockerfile | 814 Bytes | fix: mount gradio at /web (hf iframe path), disable openenv default ui | 2 months ago |
| LICENSE | 1.06 kB | fix: 3 showstoppers (dockerignore, END format, efficiency gaming) + license + unused imports | 2 months ago |
| README.md | 9.64 kB | fix: python 3.11 f-string compat, inference OPENENV_BASE_URL fix, README action/observation spaces | 2 months ago |
| __init__.py | 60 Bytes | feat: eu ai act compliance auditor — mcp-based openenv environment | 2 months ago |
| benchmark_all.py | 7.37 kB | feat: industrial 50-model benchmark with rate limiting and resume | 2 months ago |
| benchmark_leaderboard.py | 7.13 kB | feat: investigation-grade overhaul + procedural generation | 2 months ago |
| client.py | 3.28 kB | fix: brutal audit — reset tool_call_counts, date dedup, unused vars, playground overhaul with scenario picker + status dashboard | 2 months ago |
| evaluate_models.py | 5.57 kB | fix: brutal audit — reset tool_call_counts, date dedup, unused vars, playground overhaul with scenario picker + status dashboard | 2 months ago |
| inference.py | 19.3 kB | fix: python 3.11 f-string compat, inference OPENENV_BASE_URL fix, README action/observation spaces | 2 months ago |
| models.py | 1.91 kB | fix: typed Action model, OPENAI_API_KEY support, proper spec compliance | 2 months ago |
| openenv.yaml | 917 Bytes | feat: investigation-grade overhaul + procedural generation | 2 months ago |
| pyproject.toml | 942 Bytes | feat: eu ai act compliance auditor — mcp-based openenv environment | 2 months ago |
| run_benchmark.py | 5.24 kB | feat: 10-model leaderboard — step-3.5-flash leads at 0.381, 6 tiers of differentiation | about 2 months ago |
| uv.lock | 580 kB | feat: eu ai act compliance auditor — mcp-based openenv environment | 2 months ago |