Translate

Translate Fortran to C++ with AI and RAG

Kyle Dickman, Science Writer


Scientists are using artificial intelligence and large language models to rewrite old code in modern languages.

March 31, 2025


Across the Lab, much of the work being done on AI is focused on developing new models to interpret scientific data. But Dan O’Malley, a coder in Earth and Environmental Sciences, is harnessing the power of existing large language models (LLMs) to translate and modernize useful codes. Specifically, he and his 20-person team aim to demonstrate that AI can translate some of the tens of millions of lines of Lab code written in Fortran, an older programming language, into C++, a language better suited to modern computers. “Being dependent on Fortran alone is risky these days,” says O’Malley. “Nobody wants to throw away the code their teams have spent years or decades developing, but translating the code into C++ is time-consuming and not the most exciting work.”

That’s where AI comes in. O’Malley is taking open-source LLMs, running them on Lab computers, and augmenting them with a technique called retrieval-augmented generation (RAG), in which generative models are enhanced with data from external sources. The idea is to teach the models to translate from Fortran to C++ using what’s known as few-shot learning. LLMs learn through exposure to patterns. O’Malley takes pieces of Fortran code that have been carefully and skillfully translated into C++ by a human coder and feeds both versions to the LLM. The model observes the underlying logic of the translation: when, where, and possibly why a human translator opted for one approach over another, for example. Because the model is already pretrained on a huge, general dataset, often just a single example is enough to dramatically improve its ability to pick up on code-translation patterns.
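To make the few-shot idea concrete, here is a minimal sketch, in Python, of how a retrieval-augmented prompt for Fortran-to-C++ translation might be assembled. The tiny example corpus, the bag-of-words similarity (a stand-in for a real embedding model), and all function names are illustrative assumptions, not O’Malley’s actual pipeline.

```python
# Minimal sketch of retrieval-augmented, few-shot prompting for Fortran-to-C++
# translation. The tiny corpus, the bag-of-words similarity (standing in for a
# real embedding model), and the prompt format are illustrative assumptions.
import math
from collections import Counter

# A small "retrieval corpus" of human-made (Fortran, C++) translation pairs.
EXAMPLES = [
    {
        "fortran": (
            "subroutine scale(n, a, x)\n"
            "  integer :: n, i\n"
            "  real(8) :: a, x(n)\n"
            "  do i = 1, n\n"
            "    x(i) = a * x(i)\n"
            "  end do\n"
            "end subroutine scale"
        ),
        "cpp": (
            "void scale(double a, std::vector<double>& x) {\n"
            "  for (double& xi : x) xi *= a;\n"
            "}"
        ),
    },
    # ...more curated pairs would be added here...
]

def similarity(a: str, b: str) -> float:
    """Cosine similarity over word counts: a toy stand-in for embeddings."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_prompt(fortran_source: str, k: int = 1) -> str:
    """Retrieve the k most similar human-translated pairs and prepend them as
    few-shot examples ahead of the code we actually want translated."""
    ranked = sorted(
        EXAMPLES,
        key=lambda ex: similarity(fortran_source, ex["fortran"]),
        reverse=True,
    )
    shots = "\n\n".join(
        f"Fortran:\n{ex['fortran']}\n\nC++:\n{ex['cpp']}" for ex in ranked[:k]
    )
    return f"{shots}\n\nFortran:\n{fortran_source}\n\nC++:\n"

# The finished prompt would be sent to a locally hosted, open-source LLM,
# which completes the C++ translation in the style of the examples.
new_code = "subroutine shift(n, b, y)\n  ! ...body omitted...\nend subroutine shift"
print(build_prompt(new_code))
```

In a production pipeline, the word-count similarity would be replaced by a code-aware embedding model, and the assembled prompt would go to one of the open-source LLMs running on Lab computers.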

O’Malley began the project six months ago by exposing the LLMs to small datasets of between 1,000 and 1,200 lines of code. “When I feed it good translations, it can pick up the style of different coders, and then replicate the style of each coder in its own translation,” O’Malley says. If the team can establish a reliable methodology within the next few years, the timesaving technique could be shared with coders and scientists across the Lab.

“There’s almost no field of science that isn’t being changed by AI, and all our missions at the Laboratory are using it in some capacity.”
—Thom Mason, Director of Los Alamos National Laboratory, Santa Fe New Mexican, 2024

People also ask

  • What is retrieval-augmented generation? Retrieval-augmented generation, or RAG, is a customizable way to train (you can think of that as teaching) powerful AI models. At the most basic level, large language models, the term behind programs like ChatGPT or BERT, generate answers by referencing enormous general datasets: essentially, the internet. RAG, though, lets programmers focus the models. So instead of generating answers from its entire training dataset (e.g., the internet), a model keys in on the datasets provided by programmers, allowing it to “learn” more deeply and provide better answers to specialized questions. A minimal sketch of this pattern appears after this list.
  • How good is AI translation? The rule of thumb is that AI translation is cheap and quick, but without additional human tinkering, the models can miss cultural references, idioms, or metaphors. For translation between computer programming languages, that rule of thumb holds, and in both applications, AI is a powerful tool in the hands of a skilled translator.
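To illustrate the retrieval step described above, here is a toy sketch. The corpus, the question, and the keyword-overlap retriever are all illustrative assumptions; a real system would use a learned embedding model and a vector database.

```python
# Toy illustration of retrieval-augmented generation: before the model answers,
# we fetch the most relevant passage from a programmer-supplied corpus and put
# it into the prompt, so the answer is grounded in that focused dataset.
# The corpus, question, and keyword-overlap retriever are illustrative only.

CORPUS = [
    "Fortran stores multidimensional arrays in column-major order.",
    "C++ standard containers such as std::vector manage their own memory.",
    "MPI is a message-passing standard widely used in high-performance computing.",
]

def retrieve(query: str, corpus: list[str]) -> str:
    """Return the passage sharing the most words with the query (a toy retriever)."""
    query_words = set(query.lower().split())
    return max(corpus, key=lambda doc: len(query_words & set(doc.lower().split())))

question = "How does Fortran lay out multidimensional arrays in memory?"
context = retrieve(question, CORPUS)

# The retrieved context is prepended so the model "keys in" on the provided
# dataset rather than answering from its general training alone.
prompt = f"Context: {context}\n\nQuestion: {question}\nAnswer:"
print(prompt)
```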