TRAM: Bridging Trust Regions and Sharpness Aware Minimization

Jan 1, 2024

Tom Sherborne, Naomi Saphra, Pradeep Dasigi, Hao Peng

Type

Conference paper

Publication

International Conference on Learning Representations (ICLR)


Language Model Training Generalization

Naomi Saphra

Authors

Research Fellow
