Back in December 2020, DeepMind surprised the world of biology by solving a daunting 50-year-old challenge with AlphaFold, an artificial intelligence tool that predicts the structure of proteins. Last week the London-based company published complete information about this tool and released its source code.
The firm has now announced that it has used its AI to predict the shape of almost every protein in the human body, as well as the shapes of hundreds of thousands of other proteins found in 20 of the most widely studied organisms, including yeast, fruit flies, and mice. This breakthrough could enable biologists around the world to better understand diseases and develop new drugs.
At the moment, the treasure trove consists of 350,000 newly predicted protein structures. DeepMind says it will predict and release more than 100 million more structures in the next few months, covering more or less all proteins known to science.
“Protein folding is a problem I’ve been watching for more than 20 years,” says DeepMind co-founder Demis Hassabis. “It was a huge project for us. I would say that this is the biggest thing we have done so far. And this is in some ways the most exciting because it should have the biggest impact in the world outside of AI.”
Proteins are made up of long chains of amino acids that twist into complex knots. Knowing the shape of a protein's knot can reveal what that protein does, which is critical for understanding how diseases work and developing new drugs, or for identifying organisms that can help tackle pollution and climate change.
The database should make life easier for biologists. AlphaFold may be available for researchers to use, but not everyone will want to run the software themselves. “It’s much easier to grab a structure from a database than it is to run it on your computer,” says David Baker of the Institute for Protein Design at the University of Washington, whose lab has built its own protein structure prediction tool, called RoseTTAFold, based on AlphaFold’s approach.
In the past few months, Baker’s team has been working with biologists who were previously stuck trying to figure out the shape of the proteins they were studying. “There are a lot of interesting biological studies that are really accelerating,” he says. A publicly available database containing hundreds of thousands of ready-made protein shapes should be an even bigger accelerator.
“It looks amazingly impressive,” says Tom Ellis, a synthetic biologist at Imperial College London who studies the yeast genome and is excited to try the database. But he cautions that most of the predicted shapes have not yet been verified in the laboratory.
In the new version of AlphaFold, each prediction comes with a confidence score, which the tool uses to indicate how close it thinks the predicted shape is to the actual one. Using this measure, DeepMind found that AlphaFold predicted the shapes of 36% of human proteins with an accuracy down to the level of individual atoms. That is good enough for drug development, says Hassabis.
Previously, after decades of work, only 17% of the proteins in the human body had had their structures determined in the laboratory. If AlphaFold’s predictions are as accurate as DeepMind says, the tool has more than doubled that number in just a few weeks.
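The "more than doubled" claim can be checked with a quick back-of-the-envelope calculation. The sketch below is only one reading of the figures quoted above: it assumes the comparison is simply the 36% of atomic-level predictions against the 17% of laboratory-determined structures, and it ignores any overlap between the two sets. The variable names are illustrative and do not come from DeepMind.

```python
# Figures quoted in the article, as percentages of human proteins.
lab_determined_pct = 17    # structures identified in the laboratory
alphafold_atomic_pct = 36  # AlphaFold predictions rated accurate to the atomic level

# Assumption: "doubled" compares the two percentages directly.
ratio = alphafold_atomic_pct / lab_determined_pct
more_than_doubled = ratio > 2

print(f"ratio = {ratio:.2f}, more than doubled: {more_than_doubled}")
```

Under that reading, the predicted share is a little over twice the experimentally determined share, which is consistent with the statement in the text.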
Even predictions that are not accurate at the atomic level are still useful. For more than half of the proteins in the human body, AlphaFold has predicted a shape that should be good enough for researchers to work out the protein's function. The rest of AlphaFold’s current predictions are either incorrect or belong to the third of proteins in the human body that have no fixed structure until they bind to others. “They are flexible,” says Hassabis.
“The fact that it can be applied at this level of quality is impressive,” says Mohammed AlQuraishi, a systems biologist at Columbia University who has developed his own software to predict protein structure. He also points out that having structures for most proteins in the body will make it possible to study how these proteins work as a system, and not just individually. “This is what I find most exciting,” he says.
DeepMind is releasing its tools and predictions for free and will not say whether it plans to make money from them in the future. It is not ruling out the possibility, however. To build and run the database, DeepMind is partnering with the European Molecular Biology Laboratory, an international research institute that already hosts a large database of protein information.
For now, AlQuraishi can’t wait to see what researchers do with the new data. “It’s pretty impressive,” he says. “I don’t think any of us thought we’d be here that quickly. It’s overwhelming.”