Michael Hu

I am a fourth-year PhD student at the NYU Center for Data Science, advised by Kyunghyun Cho and Tal Linzen. I am supported by the NSF GRFP.
I study how to train and adapt large language models.
- Curriculum learning: [Aioli], [pre-pretraining]
- Online adaptation: in-context active learning
I’m also interested in ML x {cognitive science, visualization}.
- Cognitive science: [On human-scale LMs], [BabyLM]
- Visualization: [Latent state models of training dynamics], [How to visualize training dynamics]
Previously, I completed a BSE at Princeton CS, where I spent two lovely years working with Karthik Narasimhan and Tom Griffiths. I then spent two years at Yobi AI as its first employee.
In my spare time, I enjoy cooking, running, and playing basketball.
news
- Jul 6, 2025: New preprints: “Scaling Laws Are Unreliable for Downstream Tasks” and “RELIC: Evaluating Compositional Instruction Following via Language Recognition”.
- Jun 26, 2025: Gave talks on “Between Circuits and Chomsky” at École Normale Supérieure CoML, FLaNN, Ryco Lab Reading Group, and CDS Seminar.
- Mar 15, 2025: “Aioli: A Unified Optimization Framework for Language Model Data Mixing” and “How to visualize training dynamics in neural networks” accepted to ICLR 2025.
- Feb 27, 2025: New preprint: “Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases”.
- Jul 17, 2024: New preprint: “The importance of human-scale language modeling for psycholinguistics”.
selected publications
2025
- ACL: Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases. In Proceedings of the Association for Computational Linguistics, Jul 2025
- JML: Bigger is not always better: The importance of human-scale language modeling for psycholinguistics. Journal of Memory and Language, Jul 2025
2023
- TMLR: Latent State Models of Training Dynamics. Transactions on Machine Learning Research, Jul 2023