Publications

(2025). Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon. International Conference on Learning Representations (ICLR).
(2025). PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs. International Conference on Learning Representations (ICLR).
(2025). How to visualize training dynamics in neural networks. Blog Post Track at International Conference on Learning Representations (ICLR BlogPosts).
(2025). Distributional Scaling Laws for Emergent Capabilities.
(2024). Understanding biological active sensing behaviors by interpreting learned artificial agent policies. Workshop on Interpretable Policies in Reinforcement Learning @RLC-2024.
(2024). Transcendence: Generative Models Can Outperform The Experts That Train Them. Neural Information Processing Systems (NeurIPS).
(2024). TRAM: Bridging Trust Regions and Sharpness Aware Minimization. International Conference on Learning Representations (ICLR).
(2024). Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs. International Conference on Learning Representations (ICLR).
(2024). Mechanistic? EMNLP BlackboxNLP Workshop.