Naomi Saphra

Gradient Descent Spectator

they / she

I am a research fellow at the Kempner Institute at Harvard University. I am interested in NLP training dynamics: how models learn to encode linguistic patterns or other structure, and how we can encode useful inductive biases into the training process. Previously, I earned a PhD from the University of Edinburgh with a thesis on Training Dynamics of Neural Language Models; worked at NYU, Google, and Facebook; and attended Johns Hopkins University and Carnegie Mellon University. Outside of research, I play roller derby under the name Gaussian Retribution, perform standup comedy, and shepherd disabled programmers into the world of code dictation.

  • Artificial Intelligence
  • Natural Language Processing
  • Training Dynamics
  • Compositional Generalization
  • Animal Communication
  • PhD in Informatics, 2021

    University of Edinburgh

  • MEng in Computer Science, 2015

    Johns Hopkins University

  • BSc in Computer Science, 2013

    Carnegie Mellon University

Recent Publications

(2024). TRAM: Bridging Trust Regions and Sharpness Aware Minimization. International Conference on Learning Representations (ICLR).

(2024). Dynamic Masking Rate Schedules for MLM Pretraining. European Association for Computational Linguistics (EACL).

(2023). Attribute Diversity Determines the Systematicity Gap in VQA. arXiv preprint.

(2023). State-of-the-art generalisation research in NLP: a taxonomy and review. Nature Machine Intelligence.

(2023). Interpretability Creationism. The Gradient.
