Peter West (U of B.C.) - Can Helpful Assistants be Unpredictable? Limits of Aligned LLMs

Date & Time:

March 6, 2025 1:00 pm – 2:00 pm

Location:

Crerar 346, 5730 S. Ellis Ave., Chicago, IL

Abstract: The majority of public-facing language models have undergone some form of alignment, a family of techniques (e.g., reinforcement learning from human feedback) that aim to make models safer, more honest, and better at following instructions. In this talk, I will investigate the downsides of aligning LLMs. While the process improves model performance across a broad range of benchmark tasks, particularly those for which a “correct” answer is clear, it seems to diminish some of the most interesting aspects of LLMs, including unpredictability and the generation of text that humans find creative.

Speakers

Peter West

Assistant Professor, University of British Columbia

Peter is an Assistant Professor at UBC and a recent postdoc at the Stanford Institute for Human-Centered AI working in Natural Language Processing. His research broadly studies the capabilities and limits of large language models (and other generative AI systems). His work has been recognized with multiple awards, including best method paper at NAACL 2022, outstanding paper at ACL 2023, and outstanding paper at EMNLP 2023.
