---
title: "Teaching an LLM to Think Strategically: Game Theory with SFT + GRPO"
emoji: 🎮
colorFrom: blue
colorTo: purple
sdk: static
pinned: false
tags:
  - game-theory
  - reinforcement-learning
  - grpo
  - rlvr
  - qwen
  - fine-tuning
---

This Space hosts our blog post about training Qwen2.5-7B on game theory using a 3-phase pipeline (SFT → GRPO → Formulator). See the full article rendered in the Space, or browse our resources:

| Resource | Link |
|---|---|
| 📊 Solver Dataset | [Alogotron/GameTheory-Bench](https://huggingface.co/datasets/Alogotron/GameTheory-Bench) |
| 📊 Formulator Dataset | [Alogotron/GameTheory-Formulator](https://huggingface.co/datasets/Alogotron/GameTheory-Formulator) |
| 🧠 Phase 1 Model | [Alogotron/GameTheory-Solver](https://huggingface.co/Alogotron/GameTheory-Solver) |
| 🧠 Phase 2 Model | [Alogotron/GameTheory-Reasoner](https://huggingface.co/Alogotron/GameTheory-Reasoner) |
| 🎮 Demo | [Alogotron/GameTheory-Solver-Demo](https://huggingface.co/spaces/Alogotron/GameTheory-Solver-Demo) |