A New Approach to Determine if Molecules from Cone Snail Venom are Toxic or Therapeutic
April 7, 2025
Rebecca McDonald
Tucked away in the shallow sand of the world’s coral reefs, hundreds of species of tiny, venomous cone snails hide, waiting to paralyze their prey. For this vital task, cone snail venom contains a multitude of toxins: some are strong enough to kill humans, while others could ease their pain. Scientists have studied many of these so-called conotoxins, and at least one, ziconotide, is now an approved pain medication because it disrupts the transmission of neurological pain signals. There are thousands of unique toxins in cone snail venom, any one of which could lead to a new therapeutic drug.
The first step to finding promising drug candidates is to sort the toxins into pharmacological families based on their mechanism of action. Families of conotoxins are defined by the biological receptor they target: some target calcium channels, others sodium channels, and so on. Because toxins in the same family share a mechanism of action, their known characteristics can help predict the toxicity of a new conotoxin and whether it could be useful as a drug candidate. Now, scientists at Los Alamos have demonstrated an approach to fast-track this sorting process. By combining machine learning (ML), a subfield of artificial intelligence (AI), with highly focused datasets, the Los Alamos team can quickly sort conotoxins into their appropriate pharmacological families.
A conotoxin’s relationship with its biological target is based on the physical structures of the molecules: a conotoxin fits into its target receptor like a key in a lock. A conotoxin’s structure is largely dictated by its particular sequence of amino acid building blocks, so scientists often compare amino acid sequences to sort and characterize proteins. In fact, the Nobel Prize-winning AlphaFold AI platform uses amino acid sequences to predict the structures of proteins. However, predicting the structure, and therefore the function, of conotoxins is especially tricky because of a natural process called post-translational modification.
A protein’s amino acid sequence is encoded in its DNA, but after the protein is synthesized (translated), the cell can chemically modify it, for example by forming extra bonds between cysteine building blocks or by attaching chemical groups to individual amino acids. These changes can alter the protein’s shape and therefore its function. Because these modifications happen after the protein is made, they are not encoded in the DNA, which makes it much harder for scientists to predict a protein’s function from its DNA or amino acid sequence alone. Put simply, a protein’s DNA sequence can suggest that it belongs to one family of toxins, but post-translational modifications can shift it to an entirely different family.

“Even with all the current advances such as the invention of AlphaFold and other AI tools, conotoxin structures are especially difficult to predict because of the post-translational modifications,” says Los Alamos biochemist Bob Williams. This is where experimental data can help the computer models. Experimental techniques such as mass spectrometry and nuclear magnetic resonance can show scientists where post-translational modifications occur and how they affect protein structure. These additional data can help scientists sort conotoxins into the correct family.
Using Los Alamos capabilities in synthetic chemistry, nuclear magnetic resonance, and mass spectrometry, and with funding from the Laboratory Directed Research & Development program, Williams and Los Alamos biochemist Hau Nguyen studied the three-dimensional interactions between individual conotoxin molecules and their biological targets to gather experimental data on the effects of post-translational modifications. Getting reliable experimental data, however, is labor-intensive, so the team turned to ML for help, enlisting the expertise of Los Alamos colleagues Duc Truong and Lyman Monroe. Their goal was to use the small experimental dataset of known conotoxin characteristics and post-translational modifications to build a sorting tool for unknown samples.
“We had to find the right tools for the right task,” says Truong. He explains that the small amount of experimental data available for conotoxins created an imbalance in the data used to train the computer model: only a few entries in the dataset are conotoxins, while many more are not. Imagine, for example, that you have 20 photos of cats and 200 photos of dogs, and you are training a computer to sort new photos into cat or dog bins. The computer might simply label every photo a dog: because dogs make up 200 of the 220 training photos, that guess is right about 91 percent of the time, even though the model never learns what makes a cat a cat. If the model is instead trained on 200 cat photos and 200 dog photos, it has a much better chance of learning the features that actually distinguish the two.
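The arithmetic behind the cat-and-dog trap can be checked in a few lines of Python. This is a toy sketch using the hypothetical photo counts above, not the team’s conotoxin data or code; the function `always_dog` is a made-up stand-in for a model that has learned nothing except which class is more common.

```python
# Toy illustration of class imbalance, using the hypothetical counts from the
# cat-and-dog example (20 cats, 200 dogs). No real images or conotoxin data.
labels = ["cat"] * 20 + ["dog"] * 200


def always_dog(_photo) -> str:
    """Hypothetical degenerate classifier: ignores its input, answers 'dog'."""
    return "dog"


predictions = [always_dog(None) for _ in labels]

# Overall accuracy: fraction of photos labelled correctly.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall on the minority class: fraction of real cats the model finds.
cat_predictions = [p for p, y in zip(predictions, labels) if y == "cat"]
cat_recall = cat_predictions.count("cat") / len(cat_predictions)

print(f"accuracy:   {accuracy:.3f}")    # accuracy:   0.909
print(f"cat recall: {cat_recall:.3f}")  # cat recall: 0.000
```

A high accuracy score alone can therefore hide a model that never identifies the rare class, which is why per-class measures such as recall are worth checking on imbalanced data.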

To address the conotoxin data imbalance and improve the tool’s sorting ability, the team combined two widely used data-balancing techniques into an approach called SMOTE-Tomek. SMOTE stands for Synthetic Minority Oversampling Technique: the Los Alamos team added synthetic, computer-generated data based on their experimental work (akin to adding 180 hand-drawn cats to the 20 real cat photos) to balance the algorithm’s training set. SMOTE creates each synthetic example by interpolating between a real minority-class example and one of its nearest minority-class neighbors, so the new examples resemble the real ones without simply duplicating them. These extra data can help the model learn the small differences between proteins by providing roughly equal numbers of conotoxin and non-conotoxin entries.
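The interpolation step at the heart of SMOTE can be sketched in plain Python. This is a minimal illustration, not the Los Alamos implementation: it assumes each example has already been turned into a numeric feature vector (the article does not say how the team encoded conotoxins), measures similarity with Euclidean distance, and uses a hypothetical function name, `smote`.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Hypothetical minimal SMOTE sketch (not the team's code): create n_new
    synthetic points, each lying on the line segment between a random minority
    sample and one of its k nearest minority-class neighbours (Euclidean)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # can't have more neighbours than other points
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k minority samples closest to `base`, excluding base itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nbr = rng.choice(neighbours)
        gap = rng.random()  # random position along base -> nbr, in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nbr)))
    return synthetic


# Usage: top up 20 toy 2-D "cat" feature vectors with 180 synthetic ones,
# mirroring the 20 real plus 180 hand-drawn cats in the analogy.
rng = random.Random(42)
cats = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(20)]
new_cats = smote(cats, n_new=180)
print(len(cats) + len(new_cats))  # 200
```

Because every new point sits between two real minority examples, this fills in the minority region instead of copying rows verbatim, which is the usual motivation for SMOTE over simple duplication.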
Next, the Tomek part of the approach (named for its inventor Ivan Tomek) involved removing so-called Tomek links: pairs of examples from opposite groups that are each other’s closest match in the data (e.g., a pointy-eared dog that the computer might mistake for a cat, sitting right next to an actual cat). Removing these borderline cases creates a clearer distinction between the two groups. After the initial training, the ML model successfully sorted unknown conotoxins into their pharmacological families. Finally, the team used various experimental approaches to check the model’s predictions and confirm that it was classifying conotoxins correctly. And it was.
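Finding and removing Tomek links can also be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the team’s code: it assumes numeric feature vectors compared by Euclidean distance, and the function names are hypothetical. It drops both members of each link, one common choice when Tomek links follow SMOTE; another variant removes only the majority-class member. For real projects, the open-source imbalanced-learn library provides `SMOTE`, `TomekLinks`, and a combined `SMOTETomek` resampler, though the article does not say which implementation the team used.

```python
import math

# Hypothetical helper names for illustration; not the Los Alamos software.


def nearest(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: math.dist(points[i], points[j]),
    )


def tomek_links(points, labels):
    """Index pairs (i, j), i < j, forming Tomek links: samples of opposite
    classes that are each other's nearest neighbour."""
    nn = [nearest(i, points) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def remove_tomek_links(points, labels):
    """Drop both members of every Tomek link; return the cleaned data."""
    drop = {idx for pair in tomek_links(points, labels) for idx in pair}
    keep = [i for i in range(len(points)) if i not in drop]
    return [points[i] for i in keep], [labels[i] for i in keep]


# Usage: a toy dataset where one cat and one dog sit almost on top of each other.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.2, 5.0), (2.0, 2.0), (2.05, 2.0)]
labels = ["cat", "cat", "dog", "dog", "cat", "dog"]
print(tomek_links(points, labels))            # [(4, 5)]
print(remove_tomek_links(points, labels)[1])  # ['cat', 'cat', 'dog', 'dog']
```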
“This work has exciting implications for drug discovery, given that conotoxins affect a diversity of pharmacological targets,” says Nguyen. “Our ability to rapidly and accurately predict signature characteristics of conotoxins, their classes, and biological targets could enable the identification of novel, potentially therapeutic molecules. In addition, as we add more experimental data on conotoxin structures and toxicities we will expand training datasets and increase the prediction accuracy.”
Monroe says that, beyond predicting the structure and function of conotoxins, the Los Alamos team’s approach is versatile and widely applicable. Although the scientists would always prefer to have more data, they were still able to accurately predict signature characteristics of conotoxins from a small dataset. That matters because small experimental datasets are common when working with biological samples. Furthermore, the team used its ML platform to develop a new software tool that could potentially be adapted to a portable device, making it useful for a broad range of scientific and national security pursuits.
People Also Ask
Are all cone snails deadly?
No. All cone snails are venomous, but only two species (Conus textile and Conus geographus) are known to have actually killed humans.
What happens during drug discovery?
Drug discovery is a complex and lengthy process, often spanning several years or even a decade, and requiring millions of dollars in investment. During this time, researchers identify chemicals with potential therapeutic properties and then test them for effectiveness and safety. AI/ML is showing potential to speed the process up tremendously.