Causation Does Not Imply Correlation: A Study of Circuit Mechanisms and Model Behaviors
Jan 1, 2024
Jenny Kaufmann, Victoria R. Li, Martin Wattenberg, David Alvarez-Melis, Naomi Saphra
Type: Preprint
Publication: NeurIPS Workshop on Scientific Methods for Understanding Deep Learning
Last updated on Jan 1, 2024
Tags: Training Dynamics, Interpretability, Science of Deep Learning, Random Variation
Author profile: Naomi Saphra, Research Fellow