Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs

Jan 1, 2024 · Angelica Chen, Ravid Schwartz-Ziv, Kyunghyun Cho, Matthew Leavitt, Naomi Saphra

Type: Conference paper
Publication: International Conference on Learning Representations (ICLR)
Last updated on Jan 1, 2024
Tags: Interpretability, Training Dynamics

Authors: Naomi Saphra, Research Fellow