ai-safety-institute's Collections

Were You Truthful Probes

Probes for the forthcoming paper "Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms".