---
title: Cyber Duel Tiny
emoji: ⚡
colorFrom: yellow
colorTo: red
sdk: docker
hardware: a10g-small
app_file: app.py
pinned: false
---
Cyber Duel Tiny
A 270M-parameter combat advisor that replaces the Gemma 3 4B base model in Duel of Albion.
What it does
Given the player's last 5 moves, the model recommends a counter-move from the 9 legal options (jab, cross, low_kick, roundhouse, uppercut, parry, backstep, clinch, throw), conditioned on both fighters' stats (speed, power, range, weight, stance), stamina, distance, and round.
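As a rough illustration of the input contract described above, the sketch below validates a comma-separated move history against the nine legal moves and keeps the most recent five. The names `LEGAL_MOVES`, `HISTORY_LEN`, and `parse_sequence` are hypothetical, and whether the real service truncates or rejects longer histories is an assumption.

```python
# Hypothetical helper: names and truncation behaviour are assumptions,
# only the move list and the "last 5 moves" window come from the README.
LEGAL_MOVES = (
    "jab", "cross", "low_kick", "roundhouse", "uppercut",
    "parry", "backstep", "clinch", "throw",
)
HISTORY_LEN = 5  # the model is conditioned on the player's last 5 moves


def parse_sequence(sequence: str) -> list[str]:
    """Split a comma-separated move history and check every move is legal."""
    moves = [m.strip() for m in sequence.split(",") if m.strip()]
    illegal = [m for m in moves if m not in LEGAL_MOVES]
    if illegal:
        raise ValueError(f"illegal moves: {illegal}")
    # Keep only the most recent moves (assumed behaviour for long histories).
    return moves[-HISTORY_LEN:]
```

Calling `parse_sequence("jab,cross,low_kick,jab,cross")` returns the five moves as a list, while an unknown move such as `"headbutt"` raises `ValueError`.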
API
POST /predict — counter-move recommendation.
POST /predict
Content-Type: application/json

{
  "sequence": "jab,cross,low_kick,jab,cross",
  "player": {"name": "monk", "speed": 5, "power": 2, "range": 3, "weight": 0.8, "stance": "low", "stamina": 100, "hp": 100},
  "npc": {"name": "brute", "speed": 1, "power": 5, "range": 2, "weight": 1.4, "stance": "hunched", "stamina": 100, "hp": 100},
  "round": 3,
  "distance": "close",
  "playerId": "ab12-...",      // optional, enables online RL
  "playerPrevMove": "jab"      // optional, back-fills the previous log row
}

Response:
{
  "reasoning": "The player is alternating jab and cross before finishing low...",
  "counterMove": "throw",
  "sequence": "jab,cross,low_kick,jab,cross",
  "adapterScope": "user"       // "user" once you have a personalised adapter
}
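A minimal client sketch for the request above, using only the Python standard library. The function names, the `base_url` parameter, and the timeout are hypothetical; the field names are taken from the example payload. The two optional fields are only sent when provided.

```python
import json
import urllib.request


def build_predict_payload(sequence, player, npc, round_no, distance,
                          player_id=None, player_prev_move=None):
    """Assemble the /predict JSON body from the fields shown above."""
    payload = {
        "sequence": ",".join(sequence),
        "player": player,
        "npc": npc,
        "round": round_no,
        "distance": distance,
    }
    # Optional fields: omit them entirely rather than sending null.
    if player_id is not None:
        payload["playerId"] = player_id
    if player_prev_move is not None:
        payload["playerPrevMove"] = player_prev_move
    return payload


def predict(base_url, payload, timeout=30.0):
    """POST the payload to <base_url>/predict and decode the JSON reply."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Note that standard JSON has no comment syntax, so the `//` annotations in the example are documentation only and must not appear in a real request body.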
GET /health — {ready, has_token, online_rl_enabled, user_adapters_cached, buffered_users}
GET /me?playerId=... — {rounds_logged, next_retrain_in, cooldown_left_sec, adapter_scope, online_rl_enabled}
POST /forget body {"playerId": "..."} — deletes the user's adapter + log (privacy / GDPR).
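The status and deletion endpoints can be called in the same way. The sketch below only builds the requests, so it can be inspected without a running Space; `me_request`, `forget_request`, and `send` are hypothetical names, and the assumptions that `/forget` expects a JSON `Content-Type` header and replies with JSON are not confirmed by this README.

```python
import json
import urllib.parse
import urllib.request


def me_request(base_url, player_id):
    """Build GET /me?playerId=... with the id safely URL-encoded."""
    query = urllib.parse.urlencode({"playerId": player_id})
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/me?{query}", method="GET"
    )


def forget_request(base_url, player_id):
    """Build POST /forget with the body {"playerId": "..."}."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/forget",
        data=json.dumps({"playerId": player_id}).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def send(req, timeout=30.0):
    """Send a prepared request and decode the reply (assumed to be JSON)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```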
Online RL
If the request includes a playerId and the Space was started with the
MODAL_WEBHOOK_URL and MODAL_WEBHOOK_SECRET env vars set, the Space
will:
- Log (state, model_move, player_next_move) to the cyber-duel-tiny-logs Hugging Face dataset (private).
- After 25 fresh rows have been flushed, POST to the Modal webhook to trigger a per-user DPO retrain.
- The new LoRA delta is uploaded to cyber-duel-tiny-users/<uid>/ and loaded on the next /predict for that player.
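The log-then-retrain loop in the list above can be sketched as a small per-user buffer. This is an illustration under stated assumptions, not the code in app.py: the class `RoundBuffer` and its callbacks are hypothetical, the dataset write and webhook call are injected as plain functions, and the cooldown reported by `/me` is not modelled.

```python
from collections import defaultdict

RETRAIN_THRESHOLD = 25  # "25 fresh rows" from the list above


class RoundBuffer:
    """Buffer per-user rows, flush them, and trigger a retrain at the threshold."""

    def __init__(self, flush_rows, trigger_retrain, threshold=RETRAIN_THRESHOLD):
        self._flush_rows = flush_rows            # e.g. append rows to the logs dataset
        self._trigger_retrain = trigger_retrain  # e.g. POST to the Modal webhook
        self._threshold = threshold
        self._pending = defaultdict(list)        # rows not yet flushed, per user
        self._fresh = defaultdict(int)           # rows flushed since the last retrain

    def log(self, player_id, state, model_move, player_next_move):
        self._pending[player_id].append({
            "state": state,
            "model_move": model_move,
            "player_next_move": player_next_move,
        })

    def flush(self, player_id):
        """Flush pending rows; return True if a retrain was triggered."""
        rows = self._pending.pop(player_id, [])
        if not rows:
            return False
        self._flush_rows(player_id, rows)
        self._fresh[player_id] += len(rows)
        if self._fresh[player_id] >= self._threshold:
            self._trigger_retrain(player_id)
            self._fresh[player_id] = 0  # start counting fresh rows again
            return True
        return False
```

Counting only flushed rows toward the threshold mirrors the wording "after 25 fresh rows have been flushed": rows still sitting in memory do not trigger a retrain.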
The default base adapter (Sathvik0101/cyber-duel-tiny-adapter) is used
as the starting point for every per-user delta, and the global base
gets refreshed weekly by Modal's retrain_global_base job.
Training
Trained on Modal with LoRA + DPO (verifiable rewards from the in-game
combat resolver). See modal/app.py in the training repo.
How to redeploy
- Add HF_TOKEN as a Space Secret so the gated gemma-3-270m-it weights can be downloaded.
- (Optional) Add MODAL_WEBHOOK_URL and MODAL_WEBHOOK_SECRET to enable online per-user RL.
- Update the ADAPTER_MODEL env var to point to the latest adapter release.
- The Space hot-reloads on push; if the env vars changed, a manual restart may be needed to pick up the new code.
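A startup check along the following lines can make a misconfigured deploy obvious. It is a sketch, not the logic in app.py: `load_config` is a hypothetical name, the result keys reuse the `has_token` and `online_rl_enabled` fields reported by `/health`, and falling back to `Sathvik0101/cyber-duel-tiny-adapter` when `ADAPTER_MODEL` is unset is an assumption.

```python
import os

# The default base adapter named in this README; using it as the fallback
# when ADAPTER_MODEL is unset is an assumption.
DEFAULT_ADAPTER = "Sathvik0101/cyber-duel-tiny-adapter"


def load_config(env=None):
    """Read the deployment env vars and derive the feature flags."""
    env = os.environ if env is None else env
    url = env.get("MODAL_WEBHOOK_URL", "")
    secret = env.get("MODAL_WEBHOOK_SECRET", "")
    return {
        "has_token": bool(env.get("HF_TOKEN")),
        # Online RL needs both the webhook URL and its secret.
        "online_rl_enabled": bool(url and secret),
        "adapter_model": env.get("ADAPTER_MODEL") or DEFAULT_ADAPTER,
    }
```

Passing a plain dict instead of `os.environ` keeps the function easy to test, for example `load_config({"HF_TOKEN": "x"})`.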