New Asst. Prof. Eric Jonas Brings Machine Learning to Scientific Measurement
Machine learning often enters scientific experiments in the final chapter, analyzing large data sources to extract meaningful signals or create a predictive model. Eric Jonas wants the approach to play a much earlier and more proactive role: guiding how scientists collect information about the world, from telescopes scanning the cosmos to microscopes looking inside cells to detectors tracking the spray of subatomic particles.
Before joining UChicago CS as assistant professor this summer, Jonas navigated through neuroscience and computer science as well as industry and academia, exploring applied math, signal processing, computer architectures, and scalable computation. But his main research goal has become using computational approaches to help scientists get the most out of their instruments, creating positive feedback loops that enable and accelerate discovery.
“There are many parts of the physical world that we understand very well,” Jonas said. “But for almost all of those systems we often don't have enough data. So how do we increase our rate of acquisition, and how do we do a better job of measuring the things that matter? There are ways of taking our prior knowledge about the world — our physics or chemistry or biology knowledge — and exploiting that to make better, smarter measurements.”
Jonas started down this path as a student at MIT, where despite an early love of computer science, he found himself drawn to a different field: neuroscience. “I always thought of the EECS side of things as a tool,” he said, “and I wanted to use this tool that I happen to be very good at for other things.”
Neuroscience provided a plethora of computational challenges, as researchers developed techniques such as functional MRI and electrophysiology to measure the lively activity of human and animal brains. Jonas found himself adapting and developing new machine learning methods to try to make sense of this flood of incredibly complex information.
The work proved both rewarding and frustrating. Jonas would later publish a provocative paper with neuroscientist Konrad Kording that found cutting-edge neuroscientific techniques struggled to make sense of an early, roughly 3,500-transistor microprocessor, much less the estimated 100 billion neurons of the human brain. But the experience inspired his shift into the measurement side of science as a postdoctoral researcher at UC Berkeley’s Center for Computational Imaging and RISELab.
There, he applied machine learning to improve a broad range of data collection methods, including super-resolution cellular microscopy and NASA probes looking for solar flares. In another project, he used neural networks to simulate the NMR spectra of millions of molecules, cutting compute time by several orders of magnitude relative to standard computational chemistry methods, and then used that data to train a new model that automates the reverse step: inferring a molecule's structure from its spectrum.
Video: Hyperspectral data from three AIA channels and magnetogram data from HMI, both aboard SDO, as an active region moves across the sun.
“Once you've captured this notion in code, you can start doing things like incorporating it into active measurement loops,” Jonas said. “If I get a spectrum and my structure elucidation method says it's one of these five structures, well now I'm going to try to devise an experiment that will uniquely tell me which of these five. You can start doing this closed-loop, thus accelerating the measurement process. That's really the goal of all of this work.”
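The closed loop Jonas describes can be illustrated with a small Python sketch. It is not his method: the candidate labels, the experiment names, and the table of predicted outcomes are all hypothetical, and the sketch assumes noise-free, discrete outcomes with every candidate equally likely at the start. At each step it picks the experiment expected to leave the least uncertainty, "measures" it, and discards the candidates that disagree with the result.

```python
from collections import Counter
from math import log2


def expected_remaining_entropy(predictions):
    """Expected uncertainty (in bits) left after observing the outcome.

    `predictions` maps each candidate to its predicted outcome. Assumes a
    uniform prior over candidates and noise-free outcomes.
    """
    n = len(predictions)
    group_sizes = Counter(predictions.values()).values()
    return sum((size / n) * log2(size) for size in group_sizes)


def choose_experiment(table, candidates):
    """Pick the experiment whose predictions best split the candidates."""
    return min(
        table,
        key=lambda exp: expected_remaining_entropy(
            {c: table[exp][c] for c in candidates}
        ),
    )


def closed_loop(table, candidates, measure):
    """Alternate between choosing an experiment and pruning candidates."""
    remaining = list(candidates)
    history = []
    while len(remaining) > 1:
        exp = choose_experiment(table, remaining)
        outcome = measure(exp)
        narrowed = [c for c in remaining if table[exp][c] == outcome]
        if len(narrowed) == len(remaining):
            break  # no available experiment can separate what is left
        remaining = narrowed
        history.append(exp)
    return remaining, history


# Hypothetical predicted outcomes: experiment -> candidate -> outcome.
table = {
    "exp_a": {"A": 1, "B": 1, "C": 2, "D": 2, "E": 3},
    "exp_b": {"A": 1, "B": 2, "C": 1, "D": 2, "E": 1},
}
true_structure = "D"  # stands in for the real sample on the instrument

remaining, history = closed_loop(
    table, ["A", "B", "C", "D", "E"], lambda exp: table[exp][true_structure]
)
print(remaining, history)
```

With real instruments the outcomes are noisy and continuous, so a practical version would replace the exact-match pruning with a probabilistic update. The control flow (predict, choose, measure, narrow) would stay the same.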
Along the way, Jonas also found a productive side project in PyWren, a package he wrote to help less CS-savvy scientists use cloud computing resources more easily. Services such as AWS Lambda give users almost instantaneous access to hundreds or thousands of cores for running computing jobs in parallel, but often require deep technical knowledge to deploy. In another provocative paper titled “Occupy the Cloud: Distributed Computing for the 99%,” Jonas argued for software that reduces these barriers, a challenge he has attempted to address with PyWren and its sibling, NumPyWren.
“We think of simulation often as this thing that lives in these big HPC environments, but in fact, everyone who writes a probabilistic model of their data is basically doing some form of simulation,” Jonas said. “All that simulation really is, is the computer executing your model of how the world works, and it's almost always embarrassingly parallel. I think there's a real chance to lower that activation energy and let everyone use these sorts of systems.”
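To make "embarrassingly parallel" concrete, here is a minimal Python sketch. It does not use PyWren; it relies only on the standard library, and the noisy-measurement model, its parameters, and the function names are assumptions made for illustration. Each run gets its own seeded random number generator, so runs share no state and can execute in any order or on any worker.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def simulate(seed, n_samples=1000, true_value=2.0, noise_sd=0.5):
    """One independent run of a toy model: noisy readings of a fixed value."""
    rng = random.Random(seed)  # private generator, so runs share no state
    samples = [rng.gauss(true_value, noise_sd) for _ in range(n_samples)]
    return sum(samples) / n_samples


def run_all(seeds, max_workers=8):
    """Map the simulation over all seeds; results come back in seed order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(simulate, seeds))


results = run_all(range(32))
print(len(results), min(results), max(results))
```

Threads keep the sketch portable but will not speed up CPU-bound Python code. Because the runs are independent, the same `map` call can be handed to a process pool or to a serverless backend without changing `simulate`, which is the property that makes this kind of workload easy to scale out.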
At UChicago, Jonas hopes to continue these research threads in collaboration with scientists across campus as well as at Argonne and Fermilab. After helping teach the CS 121 – Computer Science with Applications course this fall, he’s planning courses and workshops on machine learning for inverse problems and measurement that will fold into the University’s traditional strength in developing new technologies for understanding nature and the universe.
“Chicago has this large history of instrumentation ranging from Millikan measuring the charge of the electron 100 years ago, to Fermi building fission reactors, to Fermilab, to the Advanced Photon Source at Argonne,” Jonas said. “There's this whole trajectory of people who are trying to advance science by basically building things so you can just see what you're curious about. So it's a very exciting place to come in as a machine learning person.”