sankalphs/duel-tiny-fighter

Reinforcement Learning

sankalphs committed 14 days ago

Commit 65bf006 · verified · 1 Parent(s): a27e70f

Add model card

Files changed (1)

README.md +88 -0

README.md ADDED

	@@ -0,0 +1,88 @@

+---
+license: mit
+tags:
+  - fighting-game
+  - tiny-model
+  - reinforcement-learning
+  - game-ai
+library_name: torch
+---
+# Duel Tiny Fighter (78,863 parameters)
+A policy network that selects NPC moves in real time for a 3D fighting game.
+It runs on CPU in <1 ms per inference and is conditioned on strategy weights supplied by Nemotron.
+## Architecture
+| Layer | Shape | Notes |
+|-------|-------|-------|
+| Linear | 168 → 256 | One-hot move history + scalars |
+| LayerNorm | 256 | Stable at batch=1 inference |
+| ReLU + Dropout(0.1) | | |
+| Linear | 256 → 128 | |
+| LayerNorm | 128 | |
+| ReLU + Dropout(0.1) | | |
+| Linear | 128 → 15 | Logits over 15 moves |
+
+**Total parameters:** 78,863
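
The total can be cross-checked from the table alone. The following is a minimal sketch in plain Python (no `torch` needed) that assumes every `Linear` layer has a bias and every `LayerNorm` has a learnable scale and shift; the helper names are illustrative and not part of `tiny_fighter`.

```python
# Cross-check of the parameter count implied by the architecture table.
# Assumptions: Linear layers carry a bias; LayerNorm is elementwise-affine.


def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def layernorm_params(width: int) -> int:
    """One scale and one shift parameter per feature."""
    return 2 * width


LAYERS = [
    ("Linear 168 -> 256", linear_params(168, 256)),
    ("LayerNorm 256", layernorm_params(256)),
    ("Linear 256 -> 128", linear_params(256, 128)),
    ("LayerNorm 128", layernorm_params(128)),
    ("Linear 128 -> 15", linear_params(128, 15)),
]

# ReLU and Dropout have no learnable parameters, so they are omitted.
TOTAL_PARAMS = sum(count for _, count in LAYERS)

if __name__ == "__main__":
    for name, count in LAYERS:
        print(f"{name:<20} {count:>7,}")
    print(f"{'Total':<20} {TOTAL_PARAMS:>7,}")
```

Under those assumptions the per-layer counts add up to the figure stated in the card.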
+## Move Vocabulary
+`jab`, `cross`, `hook`, `kick`, `uppercut`, `block`, `parry`, `dodge`,
+`advance`, `retreat`, `grapple`, `throw`, `sweep`, `feint`, `wait`
+## Input Features (168-dim)
+- Last 5 NPC moves (5 × 15 one-hot = 75)
+- Last 5 player moves (5 × 15 one-hot = 75)
+- HP difference, stamina difference (2)
+- Distance one-hot (3)
+- Strategy weights: aggression, defense, parry_affinity, kick_affinity, grapple_affinity (5)
+- Normalised round number (1)
+- Absolute HP and stamina for both fighters (4)
+- Padding (3) to reach 168
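
The widths listed above can be turned into slice offsets. This is a sketch only: the real encoder is `state_to_features` in `tiny_fighter`, and the block order, the block names, and the choice to leave unused history slots all-zero are assumptions made for illustration.

```python
# Hypothetical feature layout; order and names are assumed, not taken from tiny_fighter.
MOVES = [
    "jab", "cross", "hook", "kick", "uppercut", "block", "parry", "dodge",
    "advance", "retreat", "grapple", "throw", "sweep", "feint", "wait",
]
INPUT_DIM = 168
HISTORY_SLOTS = 5

FEATURE_BLOCKS = [
    ("npc_move_history", HISTORY_SLOTS * len(MOVES)),
    ("player_move_history", HISTORY_SLOTS * len(MOVES)),
    ("hp_and_stamina_diff", 2),
    ("distance_one_hot", 3),
    ("strategy_weights", 5),
    ("round_normalised", 1),
    ("absolute_hp_and_stamina", 4),
]


def feature_offsets(blocks, input_dim):
    """Map each block name to its (start, end) slice; the remainder is padding."""
    offsets, start = {}, 0
    for name, width in blocks:
        offsets[name] = (start, start + width)
        start += width
    if start > input_dim:
        raise ValueError("feature blocks exceed the input dimension")
    offsets["padding"] = (start, input_dim)
    return offsets


def one_hot_history(moves, slots=HISTORY_SLOTS):
    """One-hot encode up to `slots` recent moves; unused slots stay all-zero (assumed)."""
    vec = [0.0] * (slots * len(MOVES))
    for slot, move in enumerate(moves[-slots:]):
        vec[slot * len(MOVES) + MOVES.index(move)] = 1.0
    return vec


OFFSETS = feature_offsets(FEATURE_BLOCKS, INPUT_DIM)

if __name__ == "__main__":
    for name, (start, end) in OFFSETS.items():
        print(f"{name:<26} [{start:>3}, {end:>3})")
```

The named blocks cover 165 dimensions, which leaves 3 for padding.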
+## Inference
+```python
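+import torch
+from tiny_fighter import TinyFighter, state_to_features, make_move_mask
+
+# Load the trained weights on CPU. strict=False tolerates missing or
+# unexpected keys in the checkpoint instead of raising an error.
+model = TinyFighter()
+model.load_state_dict(torch.load("tiny_fighter.pt", map_location="cpu"), strict=False)
+model.eval()  # disables dropout for inference
+
+# Encode the current game state and Nemotron's strategy weights as 168 features.
+feats = state_to_features(
+    last_npc_moves=["jab", "block"],
+    last_player_moves=["cross", "retreat"],
+    player_hp=80.0, npc_hp=50.0,
+    player_stamina=60.0, npc_stamina=40.0,
+    distance="mid",
+    aggression=0.7, defense=0.3,
+    parry_affinity=0.4, kick_affinity=0.6,
+    grapple_affinity=0.2,
+)
+mask = make_move_mask("mid")  # move mask for the current distance
+
+with torch.inference_mode():
+    logits = model.predict(feats, mask)
+    # Softmax is monotonic, so argmax over logits picks the same move.
+    move = logits.argmax(-1).item()
+print(f"Selected: {model.MOVES[move]}")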
+```
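
The card does not show how `predict` applies the move mask. One common approach, sketched below in plain Python, sets the logits of disallowed moves to negative infinity before taking the argmax. The boolean mask convention (`True` means allowed) and the helper names are assumptions, not the `tiny_fighter` API.

```python
import math


def apply_mask(logits, mask):
    """Replace logits of disallowed moves with -inf (mask: True = allowed, assumed)."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("mask forbids every move")
    return [x if ok else -math.inf for x, ok in zip(logits, mask)]


def softmax(values):
    """Numerically stable softmax; entries at -inf receive probability 0."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def masked_argmax(logits, mask):
    """Index of the highest-scoring allowed move."""
    masked = apply_mask(logits, mask)
    return max(range(len(masked)), key=masked.__getitem__)


if __name__ == "__main__":
    example_logits = [2.0, 5.0, 1.0]
    example_mask = [True, False, True]  # the top-scoring move is not allowed
    probs = softmax(apply_mask(example_logits, example_mask))
    print(masked_argmax(example_logits, example_mask), probs)
```

Because softmax preserves ordering, greedy selection gives the same move whether it is taken over masked logits or over the resulting probabilities; the probabilities only matter if moves are sampled instead.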
+## Training
+Trained on 20k procedurally generated (state, strategy_weights) → move examples
+using supervised learning on CPU. The model learns to map Nemotron's strategic
+direction (aggressive/defensive/grappling) into concrete move probabilities.
+## Part of Duel of Nemotron
+- **Strategist:** Nemotron 3 Nano 4B (fine-tuned, Modal A10)
+- **Executor:** This tiny model (CPU, <1ms)
+- **Game:** React + Three.js 3D fighting game
+
+Built for the [Build Small Hackathon](https://huggingface.co/build-small-hackathon)
+by [@sankalphs](https://huggingface.co/sankalphs).