AI Generates Tunable Proteins for Materials with Controlled Properties

Two new machine-learning algorithms can help scientists specifically design materials for desired use cases.

A new machine-learning system can generate designs for proteins that have certain structural features and do not exist in nature. These proteins could be used to make materials that have mechanical properties similar to those of existing materials, like polymers, but with a much smaller carbon footprint. Image source: Jose-Luis Olivares/MIT, with figures courtesy of the researchers; caption courtesy of MIT

Artificial intelligence (AI) and machine learning are getting a lot of publicity these days thanks to advanced applications like ChatGPT. However, one place scientists have been using machine learning for some time is in materials science, where algorithms can help researchers find materials with properties they seek for various purposes.

A new application of this kind comes from MIT, where scientists are using AI to design new proteins with specific features beyond those found in nature, they said.

Machine-learning algorithms designed by the team—composed of researchers from MIT, the MIT-IBM Watson AI Lab, and Tufts University—have developed proteins that can then be used to make materials that have certain mechanical properties, such as stiffness or elasticity, they said.

What's more, these materials potentially can replace materials made from petroleum or ceramics to promote more environmentally friendly applications, the researchers said.

“For the applications we are interested in, like sustainability, medicine, food, health, and materials design, we are going to need to go beyond what nature has done,” noted Markus Buehler, a professor of civil and environmental engineering and of mechanical engineering at MIT, who led the project.

Proteins are formed by chains of amino acids folded together in 3D patterns. The sequence of amino acids determines the mechanical properties of the protein. While scientists have identified thousands of proteins created through evolution, they estimate that a massive number of amino-acid sequences remain undiscovered.

Designing proteins beyond ones already found in nature is "such a huge design space that you can’t just sort it out with a pencil and paper," Buehler said. So the team turned to AI to help them "figure out the language of life, the way amino acids are encoded by DNA, and then come together to form protein structures," he said.

“Before we had deep learning, we really couldn’t do this,” said Buehler, who is also a member of the MIT-IBM Watson AI Lab.

Two Machine-Learning Models

Researchers already have designed deep-learning models that can predict the 3D structure of a protein from its amino-acid sequence, a scientific shortcut to protein discovery. The inverse problem, predicting an amino-acid sequence that meets the targets of a design, historically has been more difficult, they said.

To tackle this challenge, the researchers built their models on attention-based diffusion models, a recent development in machine learning that can learn very long-range relationships.

The team developed two models: one that operates on the overall structural properties of the protein and one that operates at the amino-acid level, they said. Both work by combining amino-acid structures to generate proteins.

The models are connected to an algorithm that predicts protein folding, which the researchers use to determine each generated protein’s 3D structure. They then calculate the protein’s resulting properties and check them against the design specifications for the type of protein they are seeking, they said.
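
The generate, predict, and check workflow described above can be sketched in a few lines of Python. This is a toy illustration only, not the researchers' code: the random sequence generator stands in for the diffusion models, and the "helix fraction" score based on an arbitrarily chosen residue set stands in for the folding predictor and property calculation. Every function name, parameter, and threshold here is hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
# Arbitrary residue set chosen for illustration; NOT a real structure predictor.
TOY_HELIX_RESIDUES = set("AELMQKRH")


def generate_candidate(length, rng):
    """Hypothetical stand-in for the generative model: a random sequence."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def predict_helix_fraction(sequence):
    """Toy stand-in for folding prediction plus property calculation."""
    return sum(res in TOY_HELIX_RESIDUES for res in sequence) / len(sequence)


def design_loop(target, tolerance, length=40, n_candidates=500, seed=0):
    """Generate candidates, score each one, keep those near the target."""
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    accepted = []
    for _ in range(n_candidates):
        seq = generate_candidate(length, rng)
        score = predict_helix_fraction(seq)
        if abs(score - target) <= tolerance:
            accepted.append((seq, score))
    return accepted


hits = design_loop(target=0.5, tolerance=0.05)
print(len(hits), "candidates met the toy specification")
```

In the actual system, according to the researchers, the generator is conditioned on the design targets rather than sampling at random, so the check step serves as validation of the output rather than as a brute-force filter.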

Overall, the models work by learning biochemical relationships that control how proteins form, the researchers said. In this way, they can produce new proteins that could enable unique applications, Buehler said.

“In the biomedical industry, you might not want a protein that is completely unknown because then you don’t know its properties,” he explained. “But in some applications, you might want a brand-new protein that is similar to one found in nature, but does something different. We can generate a spectrum with these models, which we control by tuning certain knobs.”

Realistic Design

The researchers tested their models by comparing the new proteins to known proteins that have similar structural properties. Many overlapped with existing amino-acid sequences, by about 50 to 60 percent in most cases, while some had entirely new sequences.
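
The article does not say how the overlap was measured. One common way to express sequence overlap is percent identity: the share of positions at which two aligned sequences carry the same amino acid. The sketch below assumes the two sequences are already aligned and of equal length, and the example sequences are made up for illustration.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions with identical residues in two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Made-up ten-residue sequences, not taken from the study.
designed = "MKTAYIAKQR"
natural = "MKSAHIVKQL"
print(percent_identity(designed, natural))  # prints 60.0
```

Real comparisons typically run an alignment step first, since designed and natural proteins rarely have the same length.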

To ensure the predicted proteins are reasonable to design and synthesize, the researchers tried to trick the models by inputting physically impossible design targets. The models, however, generated the closest synthesizable solution—a promising result because it means that whatever comes out of the model is likely to be something that can be synthesized in the real world, the researchers said.

The team published a paper on their work in the journal Chem. The researchers next plan to validate some of the new protein designs by creating them in a lab. They also aim to continue improving and refining the models so they can develop amino-acid sequences that meet more criteria, such as biological functions.

About the Author

Elizabeth Montalbano

Elizabeth Montalbano has been a professional journalist covering the telecommunications, technology and business sectors since 1998. Prior to her work at Design News, she wrote news, features and opinion articles for Phone+, CRN (now ChannelWeb), the IDG News Service, Informationweek and CNNMoney, among other publications. Born and raised in Philadelphia, she also has lived and worked in Phoenix, Arizona; San Francisco; and New York City. She currently resides in Lagos, Portugal. Montalbano has a bachelor's degree in English/Communications from De Sales University and a master's degree in creative writing from Arizona State University.
