01 December 2020

Folding in silico

Artificial intelligence has solved the problem of protein folding

Sergey Vasiliev, Naked Science

Artificial intelligence has solved a problem that has been one of the most pressing in biology for about half a century: predicting the tertiary structure of a protein from its primary structure. Now, given the amino acid sequence of a large protein molecule, it will be possible to calculate its three-dimensional spatial configuration. The achievement is reported in a press release from the British startup DeepMind, "AlphaFold: a solution to a 50-year-old grand challenge in biology".


The properties and functions of proteins are determined by their three-dimensional structure, and many important discoveries about how exactly they work were made on the basis of such structures. For decades, structures have been determined by methods such as X-ray crystallography, nuclear magnetic resonance and cryo-electron microscopy, all of which are slow, complex and labor-intensive. Even these methods do not always succeed; as a result, 3D structures have so far been determined for only about 170 thousand of the roughly 200 million proteins known to science.

Meanwhile, in nature, the tertiary structure of a protein is determined by its primary structure, the sequence of amino acids that form the chain of the molecule: the chain takes its shape spontaneously, by itself. This process is called protein folding. It is not surprising that scientists have been striving for many years to model it mathematically. The task turned out to be so difficult that even supercomputers did not help much: the number of variants that would need to be calculated for molecules consisting of hundreds of amino acids is astronomical.
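The scale of that search space can be illustrated with a back-of-the-envelope estimate in the spirit of Levinthal's well-known argument. The sketch below is purely illustrative: the figure of 3 conformations per amino acid, the chain length of 100 and the sampling rate of 10^13 conformations per second are assumptions chosen for the example, not values from the press release.

```python
# Rough estimate of the conformational search space for a protein chain.
# All three constants below are illustrative assumptions.
STATES_PER_RESIDUE = 3        # assumed conformations per amino acid
CHAIN_LENGTH = 100            # assumed number of amino acids
SAMPLES_PER_SECOND = 10**13   # assumed (very optimistic) sampling rate
SECONDS_PER_YEAR = 365 * 24 * 3600


def conformation_count(states_per_residue: int, chain_length: int) -> int:
    """Number of chain conformations if every residue varies independently."""
    return states_per_residue ** chain_length


def years_to_enumerate(total: int, samples_per_second: int) -> float:
    """Years needed to visit every conformation once at the given rate."""
    return total / samples_per_second / SECONDS_PER_YEAR


total = conformation_count(STATES_PER_RESIDUE, CHAIN_LENGTH)
years = years_to_enumerate(total, SAMPLES_PER_SECOND)
print(f"conformations: {total:.2e}")
print(f"years to enumerate: {years:.2e}")
```

Under these assumptions the count is roughly 5 × 10^47 conformations, and enumerating them one by one would take on the order of 10^27 years, which is why brute-force calculation is not an option.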

To stimulate work in this direction, the CASP competition (Critical Assessment of protein Structure Prediction) has been held every two years since 1994. Teams developing prediction algorithms all over the world receive the amino acid sequences of about a hundred proteins whose structures are not yet known and try to calculate them using their models. In parallel, scientists determine the same structures in the laboratory using the "classical" methods of structural biology. The predicted and experimental structures are then compared by calculating a measure of their agreement, GDT (Global Distance Test).

GDT values from 90 to 100 are considered an accurate prediction of the structure, and for short peptides consisting of several dozen amino acids this was achieved back in the 1990s. However, for proteins containing hundreds of amino acids, GDT remained at a "shameful" level of about 20 for many years. Only a few years ago, using the most sophisticated algorithms, this figure was raised to 40, which is still not enough.
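To make the metric concrete, here is a minimal sketch of a GDT-style score in the form usually called GDT_TS: the percentage of residues whose predicted position lies within 1, 2, 4 and 8 Å of the experimental one, averaged over the four cutoffs. It is a simplification, not the CASP implementation: it assumes the two structures are already superimposed and compares one point per residue, whereas the real procedure also searches for the superposition that maximizes each count. The function name `gdt_ts` and the toy coordinates are hypothetical.

```python
import math


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two pre-superimposed structures.

    predicted, reference: equal-length sequences of (x, y, z) tuples,
    one point per residue, in angstroms.
    Returns a score from 0 to 100.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("structures must be non-empty and of equal length")

    # Distance between each predicted residue and its experimental position.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]

    # Fraction of residues that fall within each distance cutoff.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy example: four residues, predicted positions off by 0.5, 1.5, 3 and 10 A.
reference = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
predicted = [(0.5, 0.0, 0.0), (5.5, 0.0, 0.0), (11.0, 0.0, 0.0), (22.0, 0.0, 0.0)]
print(gdt_ts(predicted, reference))  # 56.25
```

In the toy example one residue out of four is within 1 Å, two are within 2 Å, and three are within both 4 Å and 8 Å, so the score is the average of 25, 50, 75 and 75. A perfect prediction scores 100.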


Average GDT results at past contests and in 2020; the red line shows the results of AlphaFold. The values on the horizontal axis correspond to the difficulty of the proteins being modeled / ©Chris Bickel, Science.

Since 2018, the AlphaFold project, developed by the British company DeepMind, has also taken part in the CASP competition. Even then AlphaFold led the ranking, achieving a GDT of up to 60 even for the most complex structures. By the 2020 contest, the AI had been improved and trained on 170 thousand known protein structures. During the tests, it predicted folding with an average GDT of more than 92, and of over 87 for the most complex molecules.

Experts have already called this one of the most important breakthroughs of recent years. Perhaps neural networks will soon make it possible to calculate the three-dimensional structures of proteins on the fly, as needed. A task that was once so difficult that the authors of some of this work were awarded the Nobel Prize will become routine.

Portal "Eternal youth" http://vechnayamolodost.ru

