Young Researcher Series Seminar: Keyon Vafa (Harvard) - Evaluating the Implicit World Models of Generative Models
Abstract: The challenge of evaluation is drawing conclusions about a model’s capabilities from a small amount of data. While there are many benchmarks that allow us to quantify a model’s performance on different types of tasks, it is unclear how to turn these results into robust conclusions about a model’s understanding or its capabilities. This talk will propose theoretically grounded definitions and metrics that test for a model’s implicit understanding, or its world model. We will focus on two settings: one where models are designed to perform a single task, and another where a foundation model is intended to perform many tasks. These exercises demonstrate that models can make highly accurate predictions with incoherent world models, revealing their fragility.
Speakers
Keyon Vafa
Keyon Vafa is a postdoctoral fellow at Harvard University. His research focuses on developing new methodology to evaluate and improve generative models in AI. Keyon completed his PhD in computer science at Columbia University, where he was an NSF GRFP Fellow and the recipient of the Morton B. Friedman Memorial Prize for excellence in engineering. He organized the NeurIPS 2024 Workshop on Behavioral Machine Learning and the ICML 2025 Workshop on Assessing World Models, and he is a member of the early career board of the Harvard Data Science Review.