08 June 2021

Big Data Analysis

Russian scientist – about research at the intersection of biology, medicine and computer science

Arseniy Skrynnikov, RT

Bioinformatics is one of the most actively developing interdisciplinary scientific fields in the world. Alexey Sergushichev, director of the Scientific and Educational Center for Genomic Diversity at ITMO University, said this in an interview with RT.

He described what sets bioinformatics apart as a field at the junction of medicine, biology and computer science, and spoke about training specialists, the effort to decode the genomes of all vertebrate species, the conservation of rare animal species and the search for genes "guilty" of diseases.

– Alexey Alexandrovich, what is bioinformatics? What are the challenges facing researchers in this field of scientific activity?

– Bioinformatics is one of the most actively developing interdisciplinary scientific fields in the world. It originated in the middle of the twentieth century, but the real boom occurred with the development of technology at the beginning of this century.

Bioinformatics is based on big data analysis, solving fundamental problems and developing computational methods for biology and medicine.

– You are holding an Olympiad in bioinformatics. What is it?

– To identify talented bioinformaticians, various competitions are held, with tasks based on real experiments. Among them is the online Olympiad Bioinformatics Contest. In June, students and specialists in bioinformatics, computer science and biology from all over the world will compete for the title of the best bioinformatician.

– How do people become bioinformaticians? What are the prospects for specialists in this field?

– Usually people come to bioinformatics either from biology or from IT. For programmers, bioinformatics is mostly an opportunity to tackle more interesting tasks, while for biologists it becomes a necessity and makes them more in-demand and better-paid candidates on the job market.

– What is the peculiarity of bioinformatics? Please provide examples of application.

– The main trend in the development of bioinformatics is working with large amounts of data, including public data posted on the Internet. We are searching for tools that make it faster and easier to navigate through a giant array of information. When developing various bioinformatics tools, deep learning is increasingly being introduced, that is, the use of artificial intelligence methods.

For example, at the end of 2020, Google DeepMind developed a program based on deep neural networks that predicts the three-dimensional structure of a protein. Researchers had been trying to solve this problem for half a century, and such a significant breakthrough turned out to be quite unexpected.

– What benefits can bioinformatics bring in medical research?

– Diseases, or a predisposition to baldness, obesity or poor eyesight, may be associated with certain genes. The task of computational genetics is to determine which part of the genome, out of a large number of "suspects", is responsible for the changes. To do this, researchers usually compare the genetic information of two groups of people: patients with a disease, for example schizophrenia or Crohn's disease, and healthy people. Genes are then ranked by their likely influence on the disease, based on the differences found in their activity and expression levels, and biologists confirm or refute these connections.
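
The comparison of two groups described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method used by any particular laboratory: the function name `rank_genes`, the gene names and all the numbers are made up, and the score is a simple Welch-style t statistic chosen for brevity.

```python
from statistics import mean, stdev

def rank_genes(cases, controls):
    """Rank genes by how strongly their activity differs between two groups.

    cases, controls: dict mapping gene name -> list of activity values,
    one value per person. Returns gene names, most different first.
    """
    scores = {}
    for gene in cases:
        a, b = cases[gene], controls[gene]
        # Difference in group means, scaled by its standard error
        se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
        scores[gene] = abs(mean(a) - mean(b)) / se if se > 0 else 0.0
    return sorted(scores, key=scores.get, reverse=True)

# Toy, made-up values purely for illustration (hypothetical genes)
cases = {"GENE_A": [5.1, 5.3, 4.9, 5.2], "GENE_B": [2.0, 2.1, 1.9, 2.2]}
controls = {"GENE_A": [3.0, 3.2, 2.9, 3.1], "GENE_B": [2.1, 2.0, 2.0, 2.1]}

ranking = rank_genes(cases, controls)
print(ranking)
```

In this toy data the hypothetical GENE_A differs sharply between the groups and is ranked first. Real studies involve many thousands of genes, corrections for multiple testing and, as the interview notes, experimental confirmation by biologists.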

At first glance, it may seem easy to sequence the genomes of healthy people once and then reuse this data in studies of various diseases. However, personal information, including genomic data, is now treated very seriously around the world, so genomes cannot all be made publicly available, and the genomes of healthy people become inaccessible to researchers.

– Is this kind of work being carried out in Russia?

– Yes, a software tool developed by our laboratory together with the laboratory of the pioneer of computational genomics Mark Daly helps to partially solve the described problem.

To understand which genes are "guilty" of certain diseases, you need to compare the genomes of two groups of people: healthy and sick. But it is important that all these people come from the same population: roughly speaking, Africans should be compared with Africans, and Europeans with Europeans. Such data is very hard to find for healthy people, because it is usually not publicly available, and sequencing genomes from scratch is expensive. Here, given the data of patients from a particular population, the program itself selects the most suitable option and provides non-personalized data sufficient for analysis.

Also, together with the Almazov National Medical Research Centre of the Ministry of Health of Russia, we study people with congenital pathologies that standard genetic tests cannot detect. For example, for cardiomyopathies (disorders of the heart muscle), a set of genes and mutations with a well-established link to the disease has already been identified. Their presence is detected using special genetic panels (a fixed set of mutations that are checked in the patient). In most cases a causal mutation can be found among the known candidates, but sometimes patients have a "non-standard set", and then the panels can no longer cope. In that case you have to find all the mutations in the genes and try to determine which ones are associated with the particular disease.

– How many such mutations can there be in human genes? And how do you find out which one is the cause of the disease?

– A person can have a great many mutations: several million in total, of which about 30 thousand affect protein-coding regions and are usually easier to interpret. We aggregate different kinds of data and rank mutations by how likely they are to be causal. Then scientists look at the results and try to work out which explanation is closest to the truth. For example, one way to experimentally test for genetic diseases related to the heart is to introduce a mutation into a special transparent zebrafish (Danio rerio) and observe whether it leads to any changes. If, for example, muscle fibers grow incorrectly in the fish, this may confirm the link between the mutation and disorders of heart development in humans as well.
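
The idea of aggregating several kinds of evidence into one ranking can be illustrated with a short Python sketch. Everything in it is an assumption made for illustration: the `Variant` fields, the weights in `causal_score` and the three example variants are hypothetical and do not describe the laboratory's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    name: str
    is_coding: bool          # does it fall in a protein-coding region?
    population_freq: float   # how common it is in the general population (0..1)
    damage_score: float      # hypothetical predicted harmfulness (0..1)

def causal_score(v):
    """Combine the evidence into one number; the weights are illustrative only."""
    # Variants seen in 1% or more of people get no credit for rarity
    rarity = 1.0 - min(v.population_freq * 100, 1.0)
    coding = 1.0 if v.is_coding else 0.0
    return 0.4 * rarity + 0.3 * coding + 0.3 * v.damage_score

def rank_variants(variants):
    """Return variants ordered from most to least likely causal."""
    return sorted(variants, key=causal_score, reverse=True)

# Made-up example variants
variants = [
    Variant("var1", True, 0.0001, 0.9),
    Variant("var2", False, 0.2, 0.1),
    Variant("var3", True, 0.05, 0.5),
]

ranked = [v.name for v in rank_variants(variants)]
print(ranked)
```

A rare, protein-coding variant with a high predicted damage score ends up at the top of the list, which is the kind of shortlist that scientists would then check experimentally.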

– How does bioinformatics help geneticists?

– Bioinformatics helps geneticists to determine the right gene and its connection with the disorder, and then the question arises: "What mechanism is this related to?" This is the task of more classical biology, which studies various molecular processes. For example, there is a certain disease, specific genes, and you need to understand what proteins they produce, how these proteins react to different external stimuli and interact with each other, what exactly leads to the activation of the immune system and how it reacts to viruses.

Globally, we want to find out exactly which processes lead to non-trivial conditions, disorders and diseases in order to be able to influence them and thereby prevent or treat diseases.

– And modern algorithms are being developed for this?

– That's right. At the end of the last century, each gene and each reaction it is involved in had to be considered separately. Today, thanks to the development of various experimental methods, scientists have learned to see the relationships between all genes simultaneously, getting a comprehensive view of the cell. Bioinformatics, in turn, makes it possible to analyze all this data and present it in such a way that one can conclude which genes are most important and what they interact with. For example, one of our projects in this area is an algorithm that identifies important protein interactions related to the disease of interest.

Now we are developing software for visualization and interactive analysis of information from open sources, which makes it easier for biologists to form hypotheses, analyze data and confirm results. We also improved a standard method for analyzing gene activity, which was very popular and which we used in our own research. We were able to speed it up significantly and make it easier to use. Today, this method is among the world's top bioinformatics tools.

– Bioinformatics also has tasks that are not directly related to humans. How does it help to preserve endangered species of animals?

– The first breakthrough in the development of genetics was obtaining the human genome in the early 2000s.

At that time, the first results of human genome sequencing were published in the journals Nature and Science.

This event opened a new, "post-genomic" age in biology and medicine. On the other hand, it was clear that a lot of further work was required to understand in detail how our genome works. To do this, it is important to study not only humans, but also animal genomes.

In this regard, the Genome 10K community was founded, with the aim of determining the genomes of 10 thousand animal species. This community then organized a project to determine the genome sequences of all existing vertebrate species (approximately 70 thousand). Just recently, within the framework of this project, an article with the first results was published, describing a methodology for producing genomes efficiently. While the project to obtain the human genome required $3 billion, high-quality genomes can now be obtained for only a few thousand dollars.

Studying animal genomes allows us to understand the dynamics of how species develop. When only a small population remains, it is mostly dominated by individuals with rare gene mutations. When they are crossed, animals with diseases are born, which significantly increases the risk that the species will go extinct.

One recent story illustrating the possibilities that open up when studying the genomes of rare animals is the news about the cloning of a black-footed ferret, a species that was once considered extinct. The cloning was done using frozen genetic material taken from an individual that died more than 30 years ago. A female domestic ferret acted as the surrogate mother.

– Have similar studies been conducted in Russia?

– There are practically no such projects in Russia at the moment, although there are many nature reserves with unique fauna in our country. We will try to develop this direction. In our center, we are paying increasing attention to endangered animal species. We are interested in rare species of antelopes, cheetahs, gazelles and oryx.

We get the data for assembly, that is, for combining a large number of short DNA fragments into one or more sequences, from our partners. This work is led by the scientific director of the center, American geneticist Stephen O'Brien, who has worked in animal genomics for many years. Research groups of Stephen's students and colleagues all over the world work on interesting animals, including studying their genomes. To sequence, for example, the genome of a shark, its DNA is extracted from the blood. The DNA is then split into parts, the sequences of the small fragments are determined, and the fragments are stitched back together. These projects are an important step in the conservation of endangered animal species.
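
The "gluing" of short fragments can be shown with a toy Python sketch. It is a simplified greedy approach chosen here only for illustration, not the method of the project mentioned in the interview: the function names and the reads are made up, and real assemblers also have to deal with sequencing errors, repeats and reads from both DNA strands.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; several sequences remain
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags

# Three made-up short reads that overlap end to end
reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
contigs = greedy_assemble(reads)
print(contigs)
```

When some fragments share no overlap, the function returns several sequences instead of one, which matches the description of assembly as producing "one or more sequences".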

– What awaits bioinformatics in a few years?

– It is quite difficult to say anything definite. I hope that in the near future the role of bioinformatics will no longer be limited to analytics and will shift towards specific predictions. I hope that in ten years we will be able to develop tools that formulate experimentally verifiable hypotheses on their own and, perhaps, even test them using automated laboratory equipment.

Portal "Eternal youth" http://vechnayamolodost.ru

