Earl Lawrence, a statistician with a long history of tackling his field’s biggest applied challenges, is now heading up the race to develop AI for science.
March 31, 2025
I got the email on the Sunday after last Thanksgiving, just after returning home to Santa Fe from a family trip to Michigan. Jason Pruet, the director of the Lab’s National Security AI Office, had invited me to a meeting to discuss leadership for the Lab’s upcoming artificial intelligence investment. He didn’t directly state that he was going to ask me to do it, but I had a sense that my career was about to pivot.
By the fall of 2023, AI’s potential had become too great to ignore. The latest iterations of industry-produced AI were performing tasks that were unthinkable just five years earlier. It wasn’t just that they were writing coherent text or generating images of people who have never existed; they were also doing things that we should be doing at the national labs. Microsoft had just released a new AI climate model that performed well in both weather and short-term climate prediction. Google had used deep learning to discover more than two million new potential inorganic crystals, orders of magnitude more than all human efforts had ever discovered. Within the next year or so, OpenAI would release a model capable of complex reasoning at a level comparable to Ph.D. students.
I’ll admit that I was still skeptical. I’m a curmudgeonly statistician who had long rolled his eyes at even the term machine learning because it just felt like rebranded statistics (in retrospect, I was just being a jerk). Could AI really solve science’s most complex problems? How could we trust these models with sensitive, classified data? Still, it was clear that if the government didn’t start investing in AI for science, national security, and the public good, we risked ceding that ground to industry and adversaries.
All of that is why in the fall of 2023, Lab leadership put out a call for white papers from a handful of researchers who were experienced in machine learning. The task was to describe how a Lab operation dedicated to building AI for science might look, and how one could approach the technical problems of building AI for national security science. By then, I had been at the Lab for almost 20 years, with most of my career spent using statistics to help answer scientific questions that ranged from nuclear physics and cosmology to Martian geology. For the past two years, I’d been working as the machine-learning project leader for the Advanced Simulation and Computing (ASC) program. But what proved most helpful for writing the white paper that I submitted was that I had spent most of 2022 as the Lab’s point person for a series of workshops on AI for the Department of Energy (DOE). More than 1,200 scientists from around the country attended, and we condensed their feedback into a 206-page report called “Advanced Research Directions on AI for Science, Energy, and Security.” It was a strategic plan for how—and, really, why—the DOE needed to start investing in AI on a large scale.

That night after Thanksgiving, after our teenage sons had gone to bed, my wife, Jessie, and I sat around trying to see who could finish the New York Times crossword puzzle first (she did, as usual). I told her that I should probably decide right now whether I was going to say yes to this position. It would be a sudden change. If I took it, I had no idea what I’d be doing after the project ended in a few years. But there was never any doubt in my mind. I’m pathologically unable to overstate things, and yet this felt like a once-in-a-lifetime opportunity. There’s an obvious, albeit grandiose, analogy with the Manhattan Project—a coordinated, multidisciplinary team working on a technology with the potential to alter the world. It’s hard to imagine that kind of impact, but it’s not bad motivation.
Statistical roots
I come from a long line of card players on both sides of the family. My dad and his dad were both semi-degenerate poker players from the era when that meant lots of games in the backroom of bars after they closed. I learned how to play about 20 variations of poker from him when I was a kid, and each one was its own kind of education in playing the odds. My dad eventually bought one of those bars, and when I was 12, he went to prison after killing someone in a fight. On my mom’s side, their card affliction was a game called euchre. My grandparents, aunts, and uncles could play so fast that it was nearly impossible for an outsider to follow. When I was 14, my mom died in a car accident, and my younger sister and I moved in with my mom’s brother and his family in Bad Axe, Michigan. I eventually learned to keep up with the pace of play. The games gave me a card player’s love of thinking about randomness and uncertainty.
That background explains why my first statistics class at the University of Michigan felt so natural. Starting college felt a bit like finally having a normal life after an eventful childhood. I hadn’t yet been able to decide what I wanted to study, and statistics seemed to suggest I didn’t have to choose. There’s a cliche that being a statistician means you get to play in everyone else’s backyard. If I became a statistician, I could study a little bit of everything for the rest of my life.

I was coming of age in statistics at a time when the field was starting to realize that it needed to move from pencil and paper to computers. At the time, the internet was kind of a big deal. File sharing was taking over. We were in the midst of the digital revolution, and data science was exploding. My education revolved mostly around traditional statistics approaches, things like linear regression and Gaussian processes. These techniques made assumptions about the data-generating process and the resulting distributions. They were very good at finding relationships within the smaller, more confined datasets they were developed to handle. But by the early aughts, all kinds of industries were realizing that if you had a ton of data and could extract insights from it, there was an advantage to be had.
For me, that translated into opportunities. After graduating with a degree in statistics, I started my Ph.D. program at the University of Michigan in the fall of 2000. My advisor, who came from Bell Labs, was always interested in real-world problems—applied statistics. My first internship was with a psychiatrist who was using data from a motion-capture camera to see if he could tell who was depressed from the way they moved. My next internship was at Bell Labs, where I used statistics to test the efficacy of fiber-optic cables. The cable manufacturer never gave me any actual data, just a PowerPoint presentation with an image of data in it. Most of the applied statistics that I did that summer involved hacking the presentation to get the time-series data out.
Leo Breiman, a professor at the University of California, Berkeley, described the statistics shift to big data early on. In 2001, he published a paper called “Statistical Modeling: The Two Cultures,” which pushed the field toward machine learning. Breiman argued that there were too many academic statisticians proving esoteric stuff and advocated for more of an applied approach, where statisticians develop new methods to solve actual problems. To be honest, the paper made me bristle a bit. At that point, my only exposure to statistics had been at a university program that was steeped in using statistics to solve real-world problems. It felt like Breiman was just telling us a bunch of obvious stuff that everyone should already know. But it also resonated. I was never interested in abstract theory. I wanted to solve real-world problems—preferably timely and relevant ones.
A pioneering statistics approach
My dissertation was on models of internet traffic, a topic I picked because it let me work with a lot of cool graphs (I realize how cool I must sound here). But it was also relevant, and once I’d finished, I had offers from Penn State, IBM, and an internet search start-up based in Mountain View, CA, that had just gone public (I might have erred in turning this one down). I also had one from Los Alamos. By then, the Lab’s statistics group was approaching 50 years old and was known in the statistics community for sitting between industry and academia. It was the real-world version of what first appealed to me about statistics—a way to explore all these different sciences, cranked up to 11. When I got the offer, my wife, who I’d met at the University of Michigan and who had degrees in law and urban planning, and I decided that Santa Fe was a cool town. Why not go live in the mountains for a bit? We never thought we’d be here beyond five years.

One of my first projects at the Lab defined the rest of my career. After the discovery that the expansion of the universe is accelerating due to dark energy, scientists needed a way to model how the universe evolved under different conditions. The computational models for these predictions were astronomically expensive. Each simulation evolved the positions of billions of particles from just after the big bang to the present day. Only a few places could really run these models at high resolution, and Los Alamos was one of them. By the time I came on board in 2006, the team had run simulations for 37 universes on one of the Lab’s workhorse supercomputers. It had taken them a full year. My job was to find a way to predict the results of new simulations without the slow, resource-heavy computations. Around that time, a new methodology called emulation was being developed that made it possible to speed this process up. My mentor at the Lab, Dave Higdon, was one of the pioneers in this field.
Specifically, we used Gaussian processes, which allowed me to use the simulation’s inputs to predict its outputs without having to run the full simulations. At the time, my first son had just been born, and I was so sleep-deprived that I could barely focus on the code, let alone get it right the first time. But that project—exhausting as it was—was exactly why I love doing statistics at the Lab: we tackled a big, messy, fascinating problem and created a tool that could make a real difference. When I finished the project in 2008, I dubbed it the Cosmic Emu. Scientists anywhere could download it off the internet, adjust the emulator’s parameters to whatever their needs were, and in seconds get thousands of predictions from their laptops.
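To give a flavor of what an emulator does, here is a minimal Gaussian-process emulator sketched in Python. This is an illustration, not the actual Cosmic Emu code: the five input parameters and the stand-in output function are invented for the example, whereas the real emulator predicted the outputs of the cosmological simulations from their input parameters.

```python
# A minimal sketch of a Gaussian-process emulator, in the spirit of the
# Cosmic Emu (not the actual code). Assumes we already have a small design
# of simulation runs: input parameters X and one summary output y per run.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Hypothetical design: 37 simulated "universes," each described by
# 5 input parameters, with one scalar summary statistic per run.
rng = np.random.default_rng(0)
X_train = rng.uniform(size=(37, 5))          # simulation inputs
y_train = np.sin(X_train @ np.ones(5))       # stand-in for expensive outputs

# Fit a GP: a smooth prior over functions lets us interpolate between runs.
kernel = ConstantKernel() * RBF(length_scale=np.ones(5))
emulator = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
emulator.fit(X_train, y_train)

# Predict the outputs of new, never-run simulations in a fraction of a
# second, with an uncertainty estimate attached to every prediction.
X_new = rng.uniform(size=(1000, 5))
y_pred, y_std = emulator.predict(X_new, return_std=True)
```

The fitted Gaussian process interpolates between the handful of expensive runs and reports its own uncertainty, which is what makes it practical to hand thousands of predictions to anyone with a laptop.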
Sixteen years later, I still get notes about the Cosmic Emu. Sometimes people write to tell me about a bug in the code that needs to be fixed; sometimes they have a cosmology question and I have to politely tell them I have no idea what they’re asking about because I’m a statistician and don’t know any science. But the Cosmic Emu put my feet under me at the Lab. After that, I applied emulators to problems of predicting materials and grid performance. I had one job where I helped a team working to identify the chemical composition of rocks on Mars, and another where I used emulators to predict when space junk would get in the way of satellites.
For the next 15 years, I essentially kept applying what I already knew about emulators to a wide range of problems, laying the groundwork for (what I’m now more willing to call) machine learning that could solve complex scientific problems. And then came AI, and a whole suite of advances that thrilled and scared me.
From emulators to AI
Scale is what seems to set AI apart from the work we’ve done before. The data, computing, and models are way larger than they used to be. Industry is training models with hundreds of billions of parameters on all the text on the internet. It takes hundreds of thousands of GPUs, the specialized processors originally developed to handle complex graphical computations.
For the most part, when I say AI, I’m talking about foundation models: large models pre-trained to perform a generic task that can be quickly adapted to perform new tasks. The idea is that the model has learned some very general representation of the data and the relationships among them so it can generalize to new tasks. This means that a model like GPT-4 can be trained on a huge volume of text to predict the next word in a sequence and then be adapted to write poems and programs, while a model like BERT (Bidirectional Encoder Representations from Transformers) learns by filling in masked words and can be adapted to tasks like search and text classification.
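To make the pretrain-then-adapt pattern concrete, here is a minimal sketch using the open-source Hugging Face transformers library, with BERT standing in for the foundation model. The two-label task and the example sentence are hypothetical; in practice the new head would be fine-tuned on labeled data.

```python
# A minimal sketch of adapting a pretrained foundation model to a new task.
# The two-label task is hypothetical; only the small new head would be trained.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2   # new, randomly initialized task head
)

# Freeze the pretrained encoder so only the task head gets updated later.
for param in model.base_model.parameters():
    param.requires_grad = False

# The adapted model already runs end to end; fine-tuning the head comes next.
inputs = tokenizer("The emulator matched the full simulation.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # (1, 2): one score per hypothetical label
```

The important part is that the expensive, general-purpose pre-training happens once, and the adaptation step is comparatively cheap.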
The key innovation behind these models is the transformer. This architecture grew out of work on building models that could translate from one language to another. In one paper, Yoshua Bengio and collaborators added a component to their neural network model that improved translation by letting the model use other words in the sequence to figure out the best translation. They pointed out that the model could pay attention to the context of the words. Later, in their paper “Attention Is All You Need,” researchers from Google (that internet search start-up from earlier) introduced the transformer, which relies on attention mechanisms alone and dispenses with the recurrent networks that earlier translation models were built on.
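As a rough illustration of what attention actually computes, here is a minimal sketch of scaled dot-product attention, the core operation in the transformer. The token count and embedding size are made up for the example.

```python
# A minimal sketch of scaled dot-product attention, the operation at the
# heart of the transformer. Toy dimensions and random inputs, for illustration.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Each output row is a weighted average of the value vectors V,
    with weights set by how well the query matches each key."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of each query to each key
    weights = softmax(scores, axis=-1)   # attention weights sum to 1 per query
    return weights @ V

# Toy example: a "sentence" of 4 tokens, each embedded in 8 dimensions.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
out = attention(tokens, tokens, tokens)  # self-attention: queries = keys = values
print(out.shape)                          # (4, 8): one context-aware vector per token
```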
Large language models implicitly represent an interesting hypothesis: I speak, therefore I think (GPT-4o, which speaks Latin better than I do, translates this as “Loquor, ergo cogito”). That seems unlikely, and at a minimum, there is an extra step required to take a sequence model that works for language and apply it to scientific data. Our simulation data, for example, is often 3D fields evolving in time. The key idea here is the vision transformer. Researchers from Google again applied transformers to images by dividing images into patches, computing an order for these patches, and then applying the transformer model to them. In 2023, researchers at Microsoft took this idea and used it to build a model for climate prediction using datasets of climate simulations and observations. Given the current state of the simulation, the model can predict a varying number of time steps ahead. It’s like the language approach (predict the next word), but with much more complex data.
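A sketch of the patch idea, assuming a single 2D slice of a field with made-up dimensions, looks something like this: the field is cut into patches, each patch is flattened into a token, and the resulting sequence is what the attention machinery operates on.

```python
# A rough sketch of the vision-transformer idea: cut an image (or a 2D slice
# of a simulation field) into patches, flatten each patch into a token, and
# hand the sequence to a transformer. Sizes are made up for illustration.
import numpy as np

def patchify(field, patch=16):
    """Split an (H, W, C) array into a sequence of flattened patches."""
    H, W, C = field.shape
    assert H % patch == 0 and W % patch == 0, "field must divide evenly into patches"
    rows, cols = H // patch, W // patch
    patches = (field
               .reshape(rows, patch, cols, patch, C)
               .transpose(0, 2, 1, 3, 4)          # group pixels by patch position
               .reshape(rows * cols, patch * patch * C))
    return patches                                 # one "token" per patch, in raster order

field = np.random.default_rng(0).normal(size=(64, 64, 3))   # e.g., one time slice
tokens = patchify(field)
print(tokens.shape)   # (16, 768): 16 patch tokens, ready for an attention stack
```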
That climate model is the inspiration for the kind of work that we want to do here at Los Alamos. It’s also where things start to look familiar to me. The Cosmic Emu takes the inputs to the simulation and uses them to predict the outputs. It never learns anything about what’s happening inside the model. It’s just not powerful enough. These new models can learn from the entire sequence of a simulation run. With a large and varied amount of data, they should be able to learn something about the physics that governs these simulations. If we can capture that in one or a small number of foundation models, we would have something that we can use for the Lab’s entire mission space. We’ve got a ways to go and we don’t know if it’ll work, but the path is clear.
Even before the pandemic, I found myself being pulled deeper into AI. In 2019, the DOE hosted a series of conferences with scientists from all the labs to discuss AI, partly because of the big push for exascale computing that, among other things, would eventually bring Venado, the Lab’s new supercomputer, to Los Alamos. I helped organize and run the sessions at the opening workshop. There was a sense that AI could be the next big thing, but then the pandemic hit and sort of killed the momentum.
In the meantime, I became the group leader for Statistical Sciences in 2020 and then the ASC Machine Learning project leader in 2021. Both roles gave me a new perspective on the Lab’s inner workings. In particular, the ASC role provided a great view on the diverse ways that scientists were applying machine learning at Los Alamos and across the DOE.
At the same time, AI was changing. Google’s paper on BERT came out in 2019. OpenAI’s paper on GPT-3 came out in 2020. The more-often-downloaded-than-actually-read 200+ page paper “On the Opportunities and Risks of Foundation Models” came out in 2021. ChatGPT would come out by the end of 2022.
The landscape had changed so much that the DOE organized a new set of workshops in 2022. As the ASC Machine Learning project leader, I played a big role in organizing the Lab’s participation. The conclusions were intriguing, even when written in sanitized bureaucratic language. One line in the 2023 AI for Science, Energy, and Security report that was generated in the conference’s wake said, “As AI capabilities begin to transform nearly every aspect of science, energy, and security, establishing leadership in AI and in the underlying capabilities, including high-performance computing, will be intimately tied to the nation’s future and its role in the global order.”
After the conference, Jason Pruet and Rick Stevens, Jason’s counterpart at Argonne National Laboratory, started to approach Congress and DOE leadership for funding for AI research. Near the end of 2023, the Lab declared AI a signature institutional commitment, steering money and energy toward developing the technology for the Lab. And on the Monday morning after Thanksgiving break, I headed to the office for a meeting with Pruet, where I accepted a new full-time position leading the Lab’s effort to develop AI for science.
AI for mission
To my younger son’s great delight, I named the AI project Science fAIr; we had just finished his sixth-grade version. After enough prodding from him and colleagues, we changed the name to ArtIMis, or Artificial Intelligence for Mission. It’s an ambitious project. We now have 100 researchers from more than 12 different divisions on it. These are the Lab’s best minds in data science. The project’s goal is to develop transformative AI capabilities for a broad set of the Lab’s missions. Six months into a three-year project, we’ve really just begun.

The first thrust and our most ambitious goal is to design foundation models for Lab science. The challenge is whether we can pre-train models on large and diverse scientific datasets and then adapt them to solve real Lab problems. Current models struggle with multimodal data, but this is a potential strength at the Lab, where we can bring together many types of simulations and experiments.
When trained on smaller datasets, current models aren’t as good at finding patterns and making predictions. One question we have is whether they can do better when the data quality is higher. Our pilot projects include a biosecurity model for identifying new therapeutics, a materials science model to predict material fracturing, and a multi-physics model for predicting the behavior of complex systems. If successful, we aim to produce both the models themselves and a repeatable process for building them. Will these models be reliable? How much data is enough? Can we maintain accuracy with limited datasets? Can we develop a foundation model that works across different disciplines? We’re hoping to answer these questions, and others.
The second thrust, Design, Discovery, and Control, focuses on optimizing processes and driving innovation. Done right, this would let us use AI to accelerate the design of new experiments, automate lab work, and control manufacturing. The challenge is designing methods that can handle the complexity of our models and the many interacting parameters governed by physical laws. But we have a long history of expertise here, and AI is well suited to these problems. Our first pilot project aims to integrate AI into energy system planning.
At the moment, it feels like we’re building a new division from scratch. To support these two main thrusts, we have additional teams working on critical aspects like benchmarking AI performance, developing data management strategies, and assessing AI risk. Each team is iterating in real time, coordinating efforts as we push forward.

As I write this, the Nobel Prize in Physics has just been awarded to Geoffrey Hinton and John Hopfield for their work on AI. It’s exciting to see AI and data science recognized like this. After the announcement, Dr. Hopfield drew an analogy, like the one I made earlier about the Manhattan Project, between AI and the splitting of the atom, which brought both enriching and terrifying consequences. This feels especially relevant when working at the nexus of AI and national security. If we succeed, we could rapidly advance the pace of science on topics that benefit all of society. There are also scary parts that we at Los Alamos understand better than most.
People also ask
- Why is AI valuable for science? Modern science relies on big data, with vast amounts of information used to answer hard scientific questions. But discovering patterns hidden in complex data takes time. With AI that’s been trained on multiple scientific disciplines and optimized for accuracy, huge amounts of data can be processed quickly, revealing patterns—and answering questions—that scientists may have otherwise missed.
- Why is AI valuable for national security? Keeping the nation safe relies on the ability to decipher patterns that emerge from diverse information sources. AI can process huge amounts of data quickly, which is invaluable for anticipating and reacting to emerging threats, but it’s also helpful for streamlining costs, strategic planning, and scheduling maintenance.