ai-safety-institute's Collections

Did You Lie Probes

Probes for the forthcoming paper "Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms".