TRAM: Bridging Trust Regions and Sharpness Aware Minimization

Tom Sherborne, Naomi Saphra, Pradeep Dasigi, Hao Peng
Jan 1, 2024

Type: Conference paper
Publication: International Conference on Learning Representations (ICLR)
Last updated on Jan 1, 2024

Language Model Training Generalization