Naomi Saphra

Gradient Descent Spectator

they / she

I am currently a postdoctoral researcher at NYU with Kyunghyun Cho, and an incoming 2023 Kempner Fellow at Harvard. I am interested in NLP training dynamics: how models learn to encode linguistic patterns or other structure, and how we can encode useful inductive biases into the training process. Previously, I earned a PhD from the University of Edinburgh on Training Dynamics of Neural Language Models, worked at Google and Facebook, and attended Johns Hopkins University and Carnegie Mellon University. Outside of research, I play roller derby under the name Gaussian Retribution, do standup comedy, and shepherd disabled programmers into the world of code dictation.

Interests

  • Artificial Intelligence
  • Natural Language Processing
  • Training Dynamics
  • Compositionality
  • Generalization
  • Loss Surfaces

Education

  • PhD in Informatics, 2021

    University of Edinburgh

  • MEng in Computer Science, 2015

    Johns Hopkins University

  • BSc in Computer Science, 2013

    Carnegie Mellon University

Recent Publications

(2023). State-of-the-art generalisation research in NLP: a taxonomy and review. Nature Machine Intelligence.


(2023). Dynamic Masking Rate Schedules for MLM Pretraining. arXiv preprint arXiv:2305.15096.


(2023). Interpretability Creationism. The Gradient.

(2023). Latent State Transitions in Training Dynamics. ICML High-Dimensional Learning Dynamics Workshop.


(2022). One Venue, Two Conferences: The Separation of Chinese and American Citation Networks. NeurIPS Workshop on Cultures in AI.