03 February 2015

How to benefit from decoded genomes?

The Interpretation Game

Svetlana Belyaeva, "Search" No. 4-5 – 2015

Modern biology is moving ever closer to mathematics and computer science. So much biological data has already been collected that it cannot be analyzed without the latest information technologies. And the coming era of personal genomic data, when the genetic code of almost any person will be read, raises a new question: how to make the most of the information received?

Our interlocutor is Alexander Kanapin, a graduate of the Faculty of Biology at Moscow State University who is now a leading specialist in computational genomics at the Department of Oncology at Oxford University and a participant in the Genomics England project. The first question that the "Search" correspondent asked the scientist concerned the progress of this project.

– "Genomics England" is a state program, run by the UK government, for sequencing 100 thousand individual patient genomes, – said Alexander Kanapin. – Similar projects have been launched in the USA and China. The British program was adopted in 2012 and is being implemented step by step. We are now working with pilot data, the results of the very first experiments to reach us. The essence of the project is as follows: biological samples from 100 thousand patients across the country are collected and stored under specified conditions (in biobanks), and their genomes are then gradually sequenced in full. Of particular interest are hereditary diseases, a wide range of cancers, and infectious diseases: AIDS, hepatitis C and tuberculosis. Our research group takes part in the project's computational genomics work.

– Your scientific field is growing rapidly now.

– Absolutely true. This is due both to the large amount of genomic data already available (as you know, the cost of reading the genomes of individual organisms, including humans, has fallen significantly in recent years) and to the need to apply methods of computational mathematics to its analysis.

– Can we say that the era of Big Data has come to bioinformatics and that scientists today are drowning in the information they have obtained?

– Absolutely. One of the main challenges facing bioinformatics was formulated by British Professor George Church: "A genome for a thousand, its interpretation for a million" ("1K Genome and 1M Interpretation"). Hundreds of genomes are now being worked on, so the amount of data is growing, but this information is still very far from practical use. This is largely due to the kind of data that modern technologies produce.

So far, we can say that we are "looking under the streetlight." We have a reference sequence, the famous "Human Genome", which was decoded almost 15 years ago and has been refined and improved ever since. The problem is that it is not an individual genome but an averaged "consensus" assembled from the genomes of people of different races and origins (dozens of donors were involved in the project). Today, when scientists read the genome of a particular person, they compare it with this "consensus". In general, the genome sequence gives a static picture: a description of the set of "genetic instructions" we were born with. How they will work during a person's lifetime, and how and to what extent they will manifest themselves, depends on a huge number of external factors. In addition, data analysis boils down mainly to finding differences between people and, through these differences, to explaining the function of a particular part of the genome, for example in a hereditary disease.
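The comparison with the "consensus" described above can be pictured with a minimal sketch. It assumes the individual's sequence has already been aligned to the reference and has the same length; real analyses also have to handle insertions, deletions and read errors. The function name and the toy sequences are hypothetical and chosen only for illustration.

```python
def find_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """List positions (1-based) where the sample differs from the reference.

    Assumption: both sequences are already aligned and equally long.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


# Toy data, not taken from any real genome.
reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"
for position, ref_base, sample_base in find_differences(reference, sample):
    print(f"position {position}: {ref_base} -> {sample_base}")
```

Each reported difference is only a candidate: whether it matters for a disease is exactly the interpretation problem discussed in the interview.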

Today, reading a complete human genome costs about $1,000-1,500, and the price is falling rapidly. The more patients with a particular disease are analyzed, the more reliable the statistics will be and the more accurately the genetic markers corresponding to this disease can be determined. It is not surprising that genetic diagnostics is currently experiencing a new boom.

– And yet, how useful can this information be in practice? Many people can afford to have their genome read for $1,000, but what will this knowledge give them, and are there enough specialists who can interpret it?

– Indeed, the study of individual genomes is becoming common practice. In most sequencing techniques, the genome is divided into short fragments (several hundred letters, or nucleotides, long), which are read very quickly by one physicochemical method or another. As for whether specialists are ready to use this information for the benefit of the patient, we must admit that doctors are quite conservative: to introduce something new into their practice, one has to spend a long time methodically convincing them to use a given method. There is another problem: the standardization of the available information. In the UK, the Genomics England project is largely aimed at solving it.
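The idea of dividing a genome into short fragments can be sketched as follows. This is a simplification under stated assumptions: fragments here are cut at regular, overlapping intervals, whereas real sequencers sample fragments at random positions and with errors. The function name, read length and step are hypothetical.

```python
def fragment(sequence: str, read_length: int, step: int) -> list[str]:
    """Cut a sequence into overlapping fragments of a fixed length.

    Assumption: regular spacing is used only to keep the sketch deterministic.
    """
    if read_length <= 0 or step <= 0:
        raise ValueError("read_length and step must be positive")
    return [
        sequence[start:start + read_length]
        for start in range(0, len(sequence) - read_length + 1, step)
    ]


# Toy sequence; real fragments are several hundred nucleotides long.
reads = fragment("ACGTACGTAC", read_length=4, step=2)
print(reads)
```

Reassembling the original sequence from such fragments, or placing them onto a reference, is the computational step that follows the reading itself.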

– Why is this important?

– There is such a thing as the aggressive marketing of the companies that produce sequencers. Some suggest that almost every polyclinic (Russian ones included) should have its own sequencer. But the problem is not buying the latest equipment; it is that particular software is built into it, and each manufacturer's software is different. It turns out that each company decides, for you and for the doctor, which methods are used to analyze your data and which part of your genome deserves attention.

– How long will the Genomics England project run, and when will its results become available?

– For now it is planned to run until 2018, and sequencing should be completed by 2017. By then, all 100 thousand genomes should be read and "deposited" in the Data Center, which is currently being set up. I think the subsequent analysis of the genomes will take at least another five years.

– And then what? How will its results enrich humanity?

– I can give the example of an article whose authors, after analyzing the genomes of 20 patients, identified a new mutation in one of the genes in people suffering from a rare hereditary disease (congenital myasthenic syndrome). As a result, clinical recommendations for its treatment were developed. This, in my opinion, is the goal to strive for.

– How is your interaction with your Russian colleagues developing?

– It takes place mainly through the European consortium ELIXIR, established in 2006 and designed to coordinate international efforts in the collection, storage and quality of biological and medical data, including the development of standards, data security, data exchange protocols and training. I represent Russia in this consortium; Russia is not yet a full member and has observer status. The consortium is not so much scientific as infrastructural. The idea is that all biological information, however it is obtained, should not be chaotic and fragmented but should comply with uniform standards that allow correct comparisons.

Based on the experience of ELIXIR, I am trying to convey to the Ministry of Education and Science that such projects cannot be funded like conventional scientific grants, where money is given for three to five years, some work is done, and then everything stops.

A fairly well-known scientist in the field of bioinformatics, Ewan Birney, once compared the work of bioinformaticians with the work of plumbers. You must agree that if the water supply is turned off, it will be bad for everyone... There is nothing offensive in such a comparison: just as a modern city cannot function without an uninterrupted water supply network, ordinary experiments in biology and medicine cannot be planned without the information provided by bioinformatics resources: databases, analysis tools and the like.

That is, we need an infrastructure that is created and maintained at the state level in the same way as municipal services are: electricity, water supply, telephone communication and everything else. The challenge is to create a similar infrastructure in Russia, which requires its own standards and certification.

A lot here depends on computer programs and algorithms. I will give a rather anecdotal example. I talked to people who write data analysis programs, and they told me that analyzing the same patient's data with different versions of their programs gave different results. "So what should we tell the patient?" they asked in the end, perplexed. The way out is to formulate standards and carry out certification so that new methods can reach domestic medicine.
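The situation in this example, where two versions of a program disagree on the same patient's data, can be made measurable with a minimal sketch. It assumes that each version's result is a set of variants written as (chromosome, position, reference letter, observed letter); the function name and all data below are hypothetical and not taken from any real tool or patient.

```python
Variant = tuple[str, int, str, str]  # (chromosome, position, reference, observed)


def compare_calls(calls_a: set[Variant], calls_b: set[Variant]) -> dict:
    """Summarize agreement between two sets of variant calls."""
    shared = calls_a & calls_b
    union = calls_a | calls_b
    return {
        "shared": shared,
        "only_a": calls_a - calls_b,
        "only_b": calls_b - calls_a,
        # Share of all reported variants that both versions agree on.
        "concordance": len(shared) / len(union) if union else 1.0,
    }


# Hypothetical output of two versions of the same analysis program.
version_1 = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"), ("chr3", 300, "C", "A")}
version_2 = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"), ("chr4", 400, "T", "G")}

report = compare_calls(version_1, version_2)
print(f"concordance: {report['concordance']:.2f}")
print("only in version 1:", sorted(report["only_a"]))
print("only in version 2:", sorted(report["only_b"]))
```

A check of this kind does not say which version is right; it only shows where they diverge, which is the sort of question that standards and certification would have to settle.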

Portal "Eternal youth" http://vechnayamolodost.ru 03.02.2015
