arxiv:2509.23023

Deceive, Detect, and Disclose: Large Language Models Play Mini-Mafia

Published on May 14

Abstract

A simplified social deduction game called Mini-Mafia is introduced as a benchmark to analytically evaluate language model interactions through measurable parameters of deception, disclosure, and detection capabilities.

Large language models are increasingly deployed in multi-agent settings whose outcomes hinge on social intelligence, motivating evaluations of their interactive capabilities; yet existing studies remain overwhelmingly empirical, leaving us without a theoretical understanding of how agent interactions determine collective outcomes. To address this, we introduce Mini-Mafia, a four-player simplification of the social deduction game Mafia in which a fixed night phase reduces the game to a single critical exchange among a mafioso, a detective, and a villager. In this setting, we show that the mafia win-rate p is predicted by the analytical formula logit(p) = v × (m - d), where m, d, and v represent the mafioso's deception, the detective's disclosure, and the villager's detection capabilities. We turn this analytical framework into the Mini-Mafia Benchmark, where Bayesian inference over gameplay data yields per-model estimates of the intrinsic parameters m, d, and v. For I models, only 3I parameters suffice to predict the outcomes of all I^3 tournament combinations, and in 5-fold cross-validation the formula achieves a 76.6% Brier-score reduction over a random baseline. The benchmark also reveals counterintuitive results: Grok 3 Mini is the strongest detector and GPT-5 Mini the strongest discloser, both ahead of DeepSeek V3.1, Claude Opus 4, and Claude Sonnet 4, while Claude Sonnet 4 is the weakest detector, near random chance. Together, these results show that Mini-Mafia, a simple but nontrivial multi-agent system, admits an analytical description and serves as a principled benchmark for language model interactions.
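
To make the formula concrete, the following is a minimal Python sketch of how the win-rate prediction, the parameter count, and a Brier score could be computed. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the parameter values in the usage note are invented for the example, and the Brier score shown is the standard mean squared error between predicted probabilities and binary outcomes, which may differ in detail from the evaluation used in the paper.

```python
import math


def mafia_win_probability(m: float, d: float, v: float) -> float:
    """Invert logit(p) = v * (m - d) to obtain the mafia win-rate p.

    m: mafioso's deception, d: detective's disclosure,
    v: villager's detection (symbols as defined in the abstract).
    """
    return 1.0 / (1.0 + math.exp(-v * (m - d)))


def parameter_counts(num_models: int) -> tuple[int, int]:
    """Return (parameters, tournament combinations) for I models.

    Each model has one m, one d, and one v, giving 3I parameters,
    while every (mafioso, detective, villager) assignment gives I^3 combinations.
    """
    return 3 * num_models, num_models ** 3


def brier_score(predictions: list[float], outcomes: list[int]) -> float:
    """Mean squared error between predicted win probabilities and 0/1 outcomes."""
    if len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)
```

For example, with hypothetical values m = d the formula gives p = 0.5 regardless of v, and any m > d with v > 0 pushes the mafia win-rate above 0.5. With five models, `parameter_counts(5)` returns 15 parameters against 125 tournament combinations.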
