---
title: "Teaching an LLM to Think Strategically: Game Theory with SFT + GRPO"
emoji: 🎮
colorFrom: blue
colorTo: purple
sdk: static
pinned: false
tags:
- game-theory
- reinforcement-learning
- grpo
- rlvr
- qwen
- fine-tuning
---

This Space hosts our blog post on training Qwen2.5-7B on game theory with a three-phase pipeline (SFT → GRPO → Formulator).

Read the full article as rendered in the Space, or browse the related resources:

| Resource | Link |
|---|---|
| 📊 Solver Dataset | [Alogotron/GameTheory-Bench](https://huggingface.co/datasets/Alogotron/GameTheory-Bench) |
| 📊 Formulator Dataset | [Alogotron/GameTheory-Formulator](https://huggingface.co/datasets/Alogotron/GameTheory-Formulator) |
| 🧠 Phase 1 Model | [Alogotron/GameTheory-Solver](https://huggingface.co/Alogotron/GameTheory-Solver) |
| 🧠 Phase 2 Model | [Alogotron/GameTheory-Reasoner](https://huggingface.co/Alogotron/GameTheory-Reasoner) |
| 🎮 Demo | [Alogotron/GameTheory-Solver-Demo](https://huggingface.co/spaces/Alogotron/GameTheory-Solver-Demo) |