Michael Hu

I am a fourth-year PhD student at the NYU Center for Data Science, advised by Kyunghyun Cho and Tal Linzen. I am supported by the NSF GRFP.

I study how to train and adapt large language models.

Curriculum learning: [Aioli], [pre-pretraining]
Online adaptation: in-context active learning

I’m also interested in ML x {cognitive science, visualization}.

Cognitive science: [On human-scale LMs], [BabyLM]
Visualization: [Latent state models of training dynamics], [How to visualize training dynamics]

Previously, I completed a BSE at Princeton CS, where I spent two lovely years working with Karthik Narasimhan and Tom Griffiths. I then joined Yobi AI for two years as the first employee.

In my spare time, I enjoy cooking, running, and playing basketball.

news

Jul 6, 2025	New preprints: “Scaling Laws Are Unreliable for Downstream Tasks” and “RELIC: Evaluating Compositional Instruction Following via Language Recognition”.
Jun 26, 2025	Gave talks on “Between Circuits and Chomsky” at École Normale Supérieure CoML, FLaNN, Ryco Lab Reading Group, and CDS Seminar.
Mar 15, 2025	“Aioli: A Unified Optimization Framework for Language Model Data Mixing” and “How to visualize training dynamics in neural networks” accepted to ICLR 2025.
Feb 27, 2025	New preprint: “Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases”.
Jul 17, 2024	New preprint: “The importance of human-scale language modeling for psycholinguistics”.

selected publications

2025

ACL

Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases

Michael Y. Hu, Jackson Petty, Chuan Shi, and 2 more authors

In Proceedings of the Association for Computational Linguistics, Jul 2025

ICLR

Aioli: A Unified Optimization Framework for Language Model Data Mixing

Mayee F. Chen*, Michael Y. Hu*, Nicholas Lourie, and 2 more authors

ICLR, Jul 2025

JML

Bigger is not always better: The importance of human-scale language modeling for psycholinguistics

Ethan Wilcox, Michael Y. Hu, Aaron Mueller, and 6 more authors

Journal of Memory and Language, Jul 2025


2023

TMLR

Latent State Models of Training Dynamics

Michael Y. Hu, Angelica Chen, Naomi Saphra, and 1 more author

Transactions on Machine Learning Research, Jul 2023
