Lie Detection Model Organisms Collection Model organisms trained to reason about lying in their chain of thought (CoT), then lie in their text output. • 17 items • Updated 2 days ago
Apollo-Style Deception Probes Collection Lie detection probes trained following the approach of 'Detecting Strategic Deception Using Linear Probes'. • 65 items • Updated 2 days ago
Targeted Apollo Deception Probes Collection Lie detection probes trained following the approach of 'Building Better Deception Probes Using Targeted Instruction Pairs'. • 46 items • Updated about 9 hours ago
Catch a Liar: Unrelated Questions Classifier Collection Classifiers trained following the approach of 'How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions'. • 63 items • Updated about 9 hours ago
Did You Lie Probes Collection Probes for the forthcoming paper 'Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms'. • 64 items • Updated about 9 hours ago
Were You Honest Probes Collection Probes for the forthcoming paper 'Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms'. • 0 items • Updated 2 days ago
Were You Truthful Probes Collection Probes for the forthcoming paper 'Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms'. • 31 items • Updated about 6 hours ago