Simplify Stage3 chunking to interleave-only and add eval diagnostics 3c18372 Food Desert committed on Feb 24
eval with structural inference + strong_implied + implications 50 images 3bb67c1 Food Desert committed on Feb 12
eval with structural inference + strong_implied + implications 08add8e Food Desert committed on Feb 12
Rewrite structural inference prompt for better Llama 3.1 8B performance 46fe384 Claude committed on Feb 12
Default min_why to strong_implied; add retrieval gap analysis script 4968635 Claude committed on Feb 11
Normalize GT annotations: expand implications, exclude non-evaluable tags 14e5c38 Claude committed on Feb 11
Remove data/eval_results/ from .gitignore so eval results are tracked 3edd051 Claude committed on Feb 10
Add --min-why threshold to filter Stage 3 selections by confidence level 09a248d Claude committed on Feb 10
Add diagnostic eval metrics, why-distribution tracking, and generic character filter 349b999 Claude committed on Feb 10