Benchmarks as Microscopes: A Call for Model Metrology

Jan 1, 2024 · Michael Saxon, Ari Holtzman, Peter West, William Yang Wang, Naomi Saphra

Type: Conference paper
Publication: Conference on Language Modeling (COLM)
Last updated on Jan 1, 2024

Large Language Models · Evaluation · Position