Add diagnostic eval metrics, why-distribution tracking, and generic character filter 349b999 Claude committed on Feb 10