28 August 2019

"Beard" is looking for mutations

Russian scientists have created an AI capable of predicting harmful mutations

RIA Novosti

Molecular biologists and mathematicians from Skoltech and MIPT, together with colleagues from India, have developed a technique for building machine learning systems that can predict which mutations in a given protein molecule will be harmful. It is described in the journal PLoS One (Popov et al., Prediction of disease-associated mutations in the transmembrane regions of proteins with known 3D structure).

"In this work, we combined 'one-dimensional' information about the amino acid sequences of proteins with three-dimensional data on their structure to create a model that identifies substitutions in membrane proteins that are directly related to various diseases," says Peter Popov from Skoltech.

Complex protein molecules in our bodies can consist of up to several thousand amino acids, and their chains are often folded into intricate shapes by interactions between individual "links" of these peptide chains. Biologists have not yet fully worked out the rules by which proteins take on a particular shape, rules that would make it possible to determine a molecule's shape from its formula alone.

As a result, scientists have to determine the structure of each protein "manually": either with computer simulations, or by freezing protein molecules with liquid nitrogen and helium and probing them with extremely powerful X-ray lasers.

Such approaches, as Popov and his colleagues note, do not let scientists predict quickly or accurately which "typos" in the genes that encode these proteins will change how they work, and which will leave the properties of these enzymes, receptors or signaling molecules unaffected. This greatly complicates the search for new mutations and makes it a very expensive and time-consuming undertaking.

According to the Skoltech press service, the Russian and Indian mathematicians and biologists have greatly simplified this task by creating a technique that uses artificial intelligence to search for such changes in protein structure.

To do this, the scientists developed a machine learning system called BorodaTM (BOosted RegressiOn trees for Disease-Associated mutations in TransMembrane proteins; VM's note). It analyzes the linear sequence and three-dimensional structure of proteins that have already been studied, in which beneficial, neutral and harmful mutations are marked, and looks for common patterns among those mutations.
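The "boosted regression trees" in the system's name refers to a family of methods in which many small decision trees are fitted one after another, each correcting the errors left by the previous ones. As a rough illustration only, the Python sketch below boosts one-split trees ("stumps") on squared error. It is not the BorodaTM implementation: the two features, the six toy data points, the number of rounds and the learning rate are all invented for the example.

```python
# Minimal gradient boosting with one-split regression trees ("stumps").
# Everything here is a toy assumption, not the authors' code or data.

def fit_stump(xs, residuals):
    """Find the single feature/threshold split that minimizes squared error."""
    best = None
    for j in range(len(xs[0])):
        values = sorted({x[j] for x in xs})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for x, r in zip(xs, residuals) if x[j] <= t]
            right = [r for x, r in zip(xs, residuals) if x[j] > t]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            err = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or err < best[0]:
                best = (err, j, t, left_mean, right_mean)
    if best is None:
        # No feature has two distinct values: fall back to a constant stump.
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]

def stump_predict(stump, x):
    j, t, left_value, right_value = stump
    return left_value if x[j] <= t else right_value

def fit_boosted(xs, ys, n_rounds=20, learning_rate=0.3):
    """Fit stumps sequentially, each one on the current residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return base, learning_rate, stumps

def predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)

# Hypothetical features per mutation: [change in hydropathy, buried (1) or exposed (0)].
xs = [[-8.3, 1.0], [-7.0, 1.0], [-6.5, 0.0], [0.4, 0.0], [-0.3, 1.0], [0.7, 0.0]]
ys = [1, 1, 1, 0, 0, 0]  # 1 = harmful, 0 = harmless (made-up labels)

model = fit_boosted(xs, ys)
print(predict(model, [-7.5, 1.0]))  # score above 0.5 is read as "harmful"
print(predict(model, [0.1, 0.0]))   # score below 0.5 is read as "harmless"
```

In practice, a library implementation of gradient-boosted trees would normally be used instead of hand-written stumps; the sketch only shows the idea of fitting each new tree to what the previous ones got wrong.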

In this analysis, the AI took into account not only where such "typos" are located in the DNA, but also how they changed the physical properties of the protein, including its hydrophilicity, polarity, number of hydrogen bonds, stability and other characteristics. This lets the algorithm learn quickly and accurately to predict how similar changes in the structure of other proteins will alter their function and properties.
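To make the idea of describing a mutation through physical properties more concrete, here is a small Python sketch that turns a single amino acid substitution into two numbers: the change in hydropathy (using the published Kyte-Doolittle scale) and the change in side-chain charge. The choice of these two features, the simplified charge table and the function name `substitution_features` are assumptions made for illustration; the article does not list the exact features BorodaTM computes.

```python
# Kyte-Doolittle hydropathy scale (higher value = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charge near neutral pH (assumption: histidine is
# treated as neutral, all unlisted residues have charge 0).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

def substitution_features(wild_type, mutant):
    """Return toy physicochemical deltas for one amino acid substitution."""
    return {
        "delta_hydropathy": KYTE_DOOLITTLE[mutant] - KYTE_DOOLITTLE[wild_type],
        "delta_charge": CHARGE.get(mutant, 0) - CHARGE.get(wild_type, 0),
    }

# Example: leucine (hydrophobic) replaced by arginine (charged).
features = substitution_features("L", "R")
print(features)
```

A substitution like this one, which puts a charged residue where a hydrophobic one used to be, produces large deltas, which is the kind of signal a model can learn to associate with harmful mutations inside a membrane.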

To demonstrate its effectiveness, the scientists "trained" the system to look for harmful mutations in so-called transmembrane proteins: molecules that are embedded in cell membranes and play an important role in recognizing "external" chemical signals. Disruptions to their structure and function very often lead to serious diseases.

To train the AI, the scientists collected data on how the structure and function of about 60 such molecules change when 400 harmful and 150 harmless mutations appear in them. This relatively small data set was enough for BorodaTM to learn to predict whether an arbitrary mutation is harmful with 72% accuracy.
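For readers who want to put an accuracy figure in context, the short Python sketch below shows how accuracy is computed and what a trivial "always predict the larger class" rule would score on a set with the class sizes quoted above. The counts come from the article; everything else, including the helper name `accuracy`, is illustrative. The article does not say how the 72% was measured (for example, on held-out data or by cross-validation), so the two numbers are not necessarily comparable.

```python
# Class sizes as quoted in the article.
harmful = 400
harmless = 150
total = harmful + harmless

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# True labels: 1 = harmful, 0 = harmless.
labels = [1] * harmful + [0] * harmless

# A trivial rule that ignores the input and always answers "harmful".
always_harmful = [1] * total

majority_baseline = max(harmful, harmless) / total
print(f"share of the larger class: {majority_baseline:.3f}")
print(f"accuracy of the trivial rule: {accuracy(always_harmful, labels):.3f}")
```

On imbalanced sets like this one, metrics that account for class sizes (such as balanced accuracy or the area under the ROC curve) are often reported alongside plain accuracy.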

As the scientists suggest, the performance of the "beard" ("boroda" is Russian for "beard") can be improved significantly if the set of mutations and proteins used for training is expanded. Even so, BorodaTM is already noticeably better than other artificial intelligence systems at predicting the properties of transmembrane proteins, although it is inferior to them when it comes to the soluble parts of proteins.

This shortcoming, according to Popov and his colleagues, can be addressed in the same way: by expanding the set of examples and adapting the machine learning system to work with such molecules. All this, the scientists hope, will make the search for disease-causing mutations faster and cheaper, and will also help to discover beneficial changes in the structure of various genes.

Portal "Eternal youth" http://vechnayamolodost.ru
