Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs

Most interpretability research in NLP focuses on understanding the behavior and features of a fully trained model. However, certain …

Angelica Chen, Ravid Shwartz-Ziv, Kyunghyun Cho, Matthew L. Leavitt, Naomi Saphra

TRAM: Bridging Trust Regions and Sharpness Aware Minimization

By reducing the curvature of the loss surface in the parameter space, sharpness-aware minimization (SAM) yields widespread robustness …

Tom Sherborne, Naomi Saphra, Pradeep Dasigi, Hao Peng

Dynamic Masking Rate Schedules for MLM Pretraining

Zachary Ankner, Naomi Saphra, Davis Blalock, Jonathan Frankle, Matthew L. Leavitt

Attribute Diversity Determines the Systematicity Gap in VQA

Ian Berlot-Attwell, A. Michael Carrell, Kumar Krishna Agrawal, Yash Sharma, Naomi Saphra

First Tragedy, then Parse: History Repeats Itself in the New Era of Large Language Models

Many NLP researchers are experiencing an existential crisis triggered by the astonishing success of ChatGPT and other systems based on …

Naomi Saphra, Eve Fleisig, Kyunghyun Cho, Adam Lopez