26 June 2014

The third revolution in Biomedicine

Life is a computer

Andrey Konstantinov, "Russian Reporter" No. 24-2014

Once the main tools of biologists were a net and a magnifying glass. Then came the microscope and test tubes. Now information technology is becoming the main tool for understanding life. To get into the secrets of bioinformatics, we talked to several Nobel laureates, found out why the human genome has still not been fully deciphered, saw how Phystech is turning into biotech and physicists into biologists, and even almost understood how scientists read the genetic code and reprogram living cells.


Phillip Sharp received the Nobel Prize "for the discovery of the discontinuous structure of the gene"
Photo: Michael Buckner/Getty Images for Entertainment Industry Foundation/Fotobank

We listen to Nobel laureate Phillip Sharp:

– We call this the third revolution in biomedicine. The first was the discovery of the structure of DNA, the second was the decoding of the genome. Now the third revolution is taking place: the merging of life science with mathematics and information technology. Once physicists gave engineers the electron, and the IT revolution began around the world. Then biologists gave engineers the gene, and together they will create the future.

Two Nobel laureates appeared together at the "Therapy of the Future" conference in Skolkovo. The first is Phillip Sharp, who received the prize for the discovery of the discontinuous structure of genes. The second is Shinya Yamanaka, who managed to turn ordinary cells into stem cells, that is, into cells that can develop into any tissue of the body.

Besides the Nobel Prize, the reserved Japanese and the charismatic American have something else in common: both talk about how information is transmitted and processed inside living cells and between them. For them, biological systems are a kind of living computer, and the language of computer science is quite appropriate here. The science that studies biological "programs" and "computers" is called bioinformatics.

From the point of view of bioinformatics, the nucleus of each cell is something like a microscopic flash drive of huge capacity. Inside the nucleus, on DNA molecules, programs are written, the same for all cells of the body, but each cellular computer executes them in its own way, depending on external signals. If we can read the code and understand the principles of the cellular computer, we will be able to control the program ourselves. Could there be a more tempting prospect than this?

In support of his words, Sharp shows a map of the surroundings of the Massachusetts Institute of Technology, surrounded by a dense ring of the world's largest biotech and IT companies, striving even physically to be as close as possible to the laboratories in which discoveries are made.

Reprogrammers

Shinya Yamanaka was the first in the world to obtain induced pluripotent stem cells, for which he was awarded the Nobel Prize



Photo: Noah Berger/Bloomberg via Getty Images/Fotobank

Changing genetic programs is one of the hottest topics in modern biology. Probably everyone has seen photos of rabbits and pigs glowing green. It looks funny, but it is achieved by inserting a fluorescent protein gene into the animal's genome. This is roughly like adding a program from Google to your Windows installation. In the same category are the treatment of hereditary diseases and the creation of bacteria that produce fuel and medicines.

But it is possible not only to supplement the "operating system" of a living organism with "applications" from other manufacturers. It is possible to make an existing program work differently.

In 2006, the success of all the "reprogrammers" of living cells was eclipsed by Shinya Yamanaka. He took a cell from the skin of a mouse's tail and turned it into a pluripotent one – one that can generate any cells in the body. A real paradise for doctors. The following year Yamanaka was able to repeat his success on a human cell.

I ask the Nobel laureate:

– How is an ordinary cell reprogrammed into one that can become any tissue?

– The essence is very simple: we introduce four proteins. Each of them is a signal that switches on certain genes, triggering a complex cascade of reactions, as a result of which a small fraction of the cells in the original tissue become pluripotent. It is something like restarting a cellular computer, but, generally speaking, we do not understand exactly how this happens; there is still a great deal of work to do.

– How did you understand that it was these four proteins that were needed?

– By trial and error, you could say: we had 24 candidates, and we experimented on mice until we found the combination of four proteins that worked.

– And how do you then turn a pluripotent cell into a neuron or a blood cell?

– We used the knowledge science has accumulated about how embryonic cells turn into brain or heart cells. So far we can make only some cell types: lung cells, for example, we do not yet know how to make, and pancreatic cells turn out immature. I repeat: we do not yet know even a tenth of how our body works. Everything is still ahead.
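To get a feel for the search over 24 candidates that Yamanaka describes, one can count how many different sets of four can be drawn from them. The sketch below is only that arithmetic: the article does not say how the experiments were actually organized, and the function name is ours.

```python
from math import comb

def count_candidate_sets(candidates: int, chosen: int) -> int:
    """Number of distinct unordered sets of `chosen` items out of `candidates`."""
    return comb(candidates, chosen)

# 24 candidates and 4 working proteins are the figures from the interview.
four_protein_sets = count_candidate_sets(24, 4)
print(four_protein_sets)  # 10626
```

Even this small pool gives more than ten thousand possible foursomes.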

– Is it not dangerous at all? If we know so little about how these cells work… What if they start multiplying like a cancerous tumor?

– Mutations occur during the production of pluripotent cells. But we use the latest sequencing technologies to read their genome, track harmful mutations and select only safe cells. Seven years ago, when we started, the risk of mutations was very high, but the technology is improving rapidly, and now this risk is much less. It is not zero yet, but we have come close to the stage of clinical trials.

– Do I understand correctly that the transplantation of such cells will allow treating diseases, correcting mutations?

– Many laboratories around the world are working on this now. No one has started clinical trials yet, but there have been successful experiments on animals. In theory, everything should work, but it will reach the clinics in ten years, not earlier. Ideally, we need to learn how to make the body produce such cells itself, rather than transplanting them, so that it treats and rejuvenates itself when necessary. That is my dream.

The human genome has not yet been read

The project to decode the human genome ranks almost on a par with the flight to the Moon.

However, people reached the Moon half a century ago, and we still do not have the promised lunar settlements and mines: there are too many technical difficulties. Something similar is happening with the genome. The main problem is the huge amount of data.

We talked with Alla Lapidus, Deputy Director of the Laboratory of Algorithmic Biology of the Academic University of the Russian Academy of Sciences, established in 2011 as part of the megagrant program.

– Our task is to create programs with which doctors and biologists will analyze the human genome, – says Alla. – They are badly needed, because the amount of data produced is simply incredible. Decoding just the small part of the genome that encodes proteins yields half a terabyte of data. If you compare the genome to a book, you have to read a novel 10,000 times longer than "War and Peace". That is no easy task for a doctor at a local clinic. And this is only the raw data needed for the initial analysis of a real patient. Can you imagine how big the data storage must be at a clinical center that sees 3-4 thousand patients? – exclaims Alla. Today it is much cheaper to sequence DNA than to store the resulting information. Not to analyze it, just to store it. That is why IT specialists are studying new methods of data compression.
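A back-of-the-envelope estimate of the storage in question, using only the figures from the interview (half a terabyte per patient, 3-4 thousand patients). The function name is ours, and the estimate assumes no compression and no extra copies.

```python
TB_PER_PATIENT = 0.5  # figure from the interview: protein-coding part only

def storage_needed_tb(patients: int, tb_per_patient: float = TB_PER_PATIENT) -> float:
    """Raw storage in terabytes, assuming no compression and no backups."""
    return patients * tb_per_patient

for patients in (3000, 4000):
    print(patients, "patients:", storage_needed_tb(patients), "TB")
```

Under these assumptions the initial data alone comes to between 1,500 and 2,000 terabytes.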

– Actually, the entire human genome has not yet been read, – admits Alla. – Modern decoding methods are based on reading small pieces and then reconstructing the entire chromosome from them. Because of this, regions with many repeats are difficult to reconstruct. Three years ago, the US National Institutes of Health allocated three and a half million dollars to close the gaps in the human genome. I know the woman who received this grant very well, she is exceptionally talented: if she doesn't do it, no one will. A couple of months ago a new official assembly of the human genome was published. And it still has holes.

– Where are these "blank spots" of our genome?
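Why repeats are hard to reconstruct can be shown with a toy example. The sequences below are invented, and treating the reads as the set of all fragments of a fixed length is a simplification (real assembly programs also use how often each fragment occurs, among other things). Two "genomes" that differ only in the number of repeated TA units yield exactly the same set of short reads, so the length of the repeat cannot be recovered from them. A read long enough to span the shorter repeat together with its flanks tells the two apart.

```python
def reads(sequence: str, length: int) -> set[str]:
    """All distinct fragments of the given length (an idealized read set)."""
    return {sequence[i:i + length] for i in range(len(sequence) - length + 1)}

# Invented toy sequences: same flanks, different number of "TA" repeats.
genome_short_repeat = "AC" + "TA" * 3 + "GG"
genome_long_repeat = "AC" + "TA" * 5 + "GG"

# Reads shorter than the repeat: the two genomes are indistinguishable.
print(reads(genome_short_repeat, 4) == reads(genome_long_repeat, 4))  # True

# Reads that span the shorter repeat: the difference becomes visible.
print(reads(genome_short_repeat, 8) == reads(genome_long_repeat, 8))  # False
```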

– There are a lot of repeats at the ends of chromosomes. These regions cannot be resolved by current methods, so they remain undeciphered. But technology is gradually letting us read longer and longer pieces of DNA at once; someday we will know the whole genome.

Science has now come to understand that it is not enough just to find out the sequence of letters in our book of life. New projects have appeared: "Human Transcriptome" and "Human Proteome".

– If you track only the changes in the genome, you will learn little, – Alla explains to me. – Say you have found a point in the genome that has changed, but what does this mutation mean, how does it change the way the body works? Does it suppress protein synthesis or, on the contrary, boost it? Such things need to be examined not at the DNA level but at the RNA level, because the amount of protein produced is determined by the level of RNA. Do you know the central dogma of biology: RNA is read from DNA, and protein is read from RNA? The number of RNA molecules produced from each gene shows how active the gene is, how much protein it produces. The information about all RNAs, showing how actively the genes work, is called the transcriptome; it is the next stage of analysis after the genome.

– So the transcriptome is needed not only by scientists but also by doctors?
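The "DNA to RNA to protein" chain mentioned above can be sketched in a few lines. This is a deliberately simplified illustration: the input sequence is invented, it is treated as the coding strand (so transcription reduces to replacing T with U), and the codon table contains only the five codons this example needs out of 64.

```python
# Partial codon table: only the codons used below (the full table has 64).
CODON_TABLE = {"AUG": "M", "UUU": "F", "GGC": "G", "AAA": "K", "UAA": "*"}

def transcribe(coding_dna: str) -> str:
    """DNA coding strand -> mRNA (simplified: replace T with U)."""
    return coding_dna.replace("T", "U")

def translate(mrna: str) -> str:
    """mRNA -> one-letter amino acid string, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)

mrna = transcribe("ATGTTTGGCAAATAA")  # invented example sequence
print(mrna)             # AUGUUUGGCAAAUAA
print(translate(mrna))  # MFGK
```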

– Based on transcriptomics data, a doctor can prescribe a medicine that restores the level of a substance that has fallen because of a mutation or, on the contrary, inhibits the production of this substance. For example, with different types of disorders the thyroid gland can produce too much or too little of its hormone. And the third stage of the analysis is the proteome, that is, the totality of all proteins present in the body. We need to see where these proteins are and in what quantities: where they should be, or in places where nobody expects or needs them.

– That is, in addition to the genome decoding data, which there is already nowhere to store, modern clinics also need information about the transcriptome and the proteome?

– Exactly. A lot of data arises at each stage, and there is no way to bring it together and store it. This significantly hinders the development of personalized medicine. We need a comprehensive approach to analyzing the body, and software products that help carry out this analysis. That is why we biologists entered into a natural alliance with mathematicians and learned to listen to IT specialists.

– And have you already released a product?

– Yes, we started assembling the genomes of microorganisms and released a program for their analysis. This tool is two years old, and it is already used by thousands of laboratories around the world.

– For microorganisms like E. coli?

– For any bacteria. With E. coli, everything is simple: you pour various nutrients into a jar, add the bacteria, and by morning a lot of material has grown. But many organisms that live in our body are unculturable: they cannot be grown in the laboratory. And they are the vast majority: today we can grow no more than 5-6% of microorganisms. So we have to sequence the whole crowd of bacteria living inside us at once (this is called metagenomics), and then the task arises of isolating each member of the community. Modern technologies make it possible to separate all these cells and isolate a single one. But it contains very little DNA: you have to amplify it somehow, and only then sequence it and collect the necessary data.

– Why is this done?

– Say you had an operation and inflammation set in, or you eat poorly and a disease began because of changes in the microorganisms of the gastrointestinal tract. Or take autistic children: it has been proven that the microflora of their gastrointestinal tract differs greatly from the norm, and all autistic children have stomach problems. Researchers are studying which is primary and which is secondary here: whether the disease causes these changes or the changes themselves cause the disease, for example if some microorganism secretes toxins that affect the brain. New challenges keep arising from the need to sequence the genomes of different organisms: different fungi or animals require their own approaches. But in the end we want to create a universal tool suitable for any organism, or something close to it.

A cell instead of an atom

It turns out that MIPT, the citadel of Russian physics, is now also doing biology at full tilt.

Last year the Faculty of Biological and Medical Physics was established there, the Genomic Center and the Center for Cellular Technologies were launched, and MIPT's Center for Living Systems was opened. At the "PhystechBio-2014" conference, which also featured a couple of Nobel laureates, we meet its organizer, the head of the Center for Living Systems, Professor Andrey Ivashchenko.

– You are Phystech, not biotech! Why have living systems become the most important thing right now?

– It's just that the 21st century is the century of studying living systems; it is a global megatrend. The place that physics occupied in the 20th century has now been taken by biology, and in general this division is no longer relevant. To be honest, I'm already tired of explaining why we do medicine at Phystech. "Pill makers, what are you turning Phystech into? You are physicists, you have to make rockets and nuclear bombs!" – they tell us. I'm even afraid to admit it, but we are also going to take up aquaculture and agricultural technology, because biotech is flourishing there too. I think in twenty years most of agriculture will be in the ocean: there is not enough land, while the ocean has plenty of water and plankton, and far more biomass than the land surface.

– But we are traditionally strong in physics and mathematics! Won't it turn out like the conversion under Gorbachev, when factories began producing pots instead of rockets?

– We still make rockets; they have just moved to second place. Physics is now very much needed as a foundation, a tool for chemistry and biology. Look, the most groundbreaking medical centers have grown out of natural science universities. And we are going to create our own research hospital, which will ensure that scientific achievements are translated into practice. By the way, the best answer to the question of what physicists and mathematicians are doing in biology is the work of Nobel laureate Michael Levitt, one of the fathers of bioinformatics. He is a physicist by education and received the prize for modeling protein molecules. For a long time the Nobel Committee could not decide which prize to award: chemistry, physics, or physiology or medicine.

Bioinformatics in a multidimensional world

Michael Levitt received the Nobel Prize "for computer modeling of chemical systems"



Photo: Pascal Le Segretain/Getty Images/Fotobank

Despite the jaw-breaking formulations used to describe his research, Michael Levitt turns out to be a very pleasant, sincere and sociable person.

– How did it happen that you received the Nobel Prize?

– The answer to this question is very simple: I liked playing with computers. But in the late 60s there were no personal computers, and computers in general were very rare. I chose my profession and my laboratory so that I could play as much as I wanted.

– What is bioinformatics today?

– The book of life is written in the language of computer science, and it should be noted that we are still very far from understanding it. The methods for analyzing the information in DNA are generally the same as for analyzing documents or books. Say you write something in Word and then click the button that tracks all the changes made. Word runs an algorithm that compares the new text with the old one; the same kind of program is used to compare two DNA strands. Such algorithms are universal, so we can say that Google, by creating the rules for working with big data, has done the most for bioinformatics.

– And do you manage to process everything?

– Right now the amount of data is growing faster than the ability of our programs to process it. This stems from a misconception shared by many people working in this field: that you can simply take the data and start analyzing it without any model of the object or process that generated it. There is now a lot of talk about big data analysis applied, for example, to shopping: from information about who buys what, people try to predict who will buy what in the future. But correlations between strings of numbers are not enough; you need to understand people's psychology and build a model of their behavior. Good data analysis always requires such a model of the behavior of the object under study. That is difficult, and the lack of good models of biological objects is a serious problem for bioinformatics.
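The comparison Levitt describes can be tried with the difflib module from Python's standard library, which implements a general-purpose sequence-matching algorithm. It is used here only as a stand-in: we do not claim it is the algorithm Word uses, and real DNA comparison relies on specialized alignment tools. The sequences and the function name are invented for the illustration.

```python
from difflib import SequenceMatcher

def differences(old: str, new: str) -> list[tuple[str, str, str]]:
    """List (operation, old_fragment, new_fragment) for every non-matching span."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# Invented DNA fragments that differ in a single letter.
print(differences("ACGTACGT", "ACGTTCGT"))  # [('replace', 'A', 'T')]

# The very same function works on ordinary text.
print(differences("war and peace", "war and piece"))
```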

– What, besides genome analysis, can this science do?

– These days people most often talk about deciphering DNA, a long one-dimensional chain of symbols. But I, for example, worked in a completely different field: modeling large molecules, I dealt with three-dimensional space, or even four-dimensional, if you count how these molecules change over time. The field of 3D modeling is somewhat like architecture; it is very complex. It is easy to compare two chains of symbols and much harder to compare two three-dimensional objects.

– So it's just boring for you to work with the genome?

– Of course not! Now my group is working on extremely complex, almost unsolvable problems – I love these. For example, we collect information about the decoded genomes of all species living on the planet. We have about 20 thousand such genomes. We already know about 500 common functions performed by different parts of them, and we are trying to compare them.

– What's next?

– I think the most interesting thing is connected with modeling the life of an organism as a whole, that is, processes such as aging or evolution.

– You are one of the first who came to biology from physics and mathematics. Now many people do this – do they have any friction or misunderstanding with biologists?

– Physicists and mathematicians started coming to biology back in the 50s. Even John von Neumann, a mathematician who stood at the origins of computer science, took a very serious interest in biology. There is no disagreement between mathematicians and biologists, but their styles of thinking really do differ.

– In what way?

– Mathematical thinking rests on a small number of basic principles, axioms, while the thinking of biologists rests on history. After all, biology is the history of what has happened to genetic information over the past four billion years. This makes the mathematical and biological approaches very different. Physicists cannot even imagine that the speed of light might change with the age of the universe; they are used to constants. In biology, everything changes with time. The role of chance is great here: if you went back in a time machine and replayed everything, history would turn out differently because of some unexpected trifles… It seems to me that schools should teach less addition of fractions and other number theory and much more basic statistics: it is far more important! Today, for most people, a probability of one in a thousand and one in a million are about the same. But the difference is huge: it is like having a thousand dollars or a million. It seems to me that failing to understand such things is a serious problem.

– So the union of two fields of knowledge is difficult, but possible?

– Biologists and mathematicians have a lot to learn from each other. It's great that today, thanks to the Internet, learning has become so easy. Recently I became interested in graph theory – first I found programs that showed me everything, and then specialists with whom I was able to discuss my questions. To understand something that you didn't understand before – what could be better!

– They say that biology today is what physics once was, in the sense of the unifying role it has begun to play in science.

– Biology is beautiful and useful. How we see, hear, think, fall ill and recover: all of this is biology, and no science is more important. And it deals with extremely complex processes, which may take place at a level not yet accessible even to nanoengineers. Or scientists cannot find any way to synthesize some molecule, yet an ordinary plant or even a bacterium can do it! The synthetic abilities of plants simply amaze me. I also think that biology today is a focal point, a key area of science that gathers specialists from a wide variety of fields under its wing.

– Maybe the traditional boundaries between sciences will soon be erased altogether?

– The division into mathematics, physics and biology is outdated; I think it will change. It was created as if for self-defense, to wall oneself off from others: if I am a mathematician, I supposedly should not listen to chemists. But that's crazy! As if you were a nationalist and paid attention only to the statements of people with a certain skin color. A representative of any science can contribute something valuable to solving a problem. It's like in life: everyone eventually turns out to be useful in some way. I have physicists, chemists, biologists, mathematicians and computer scientists in my group, and when they work together on something, they want to learn from each other.

– Probably, looking at you, they dream of the Nobel Prize?

– You know, it's a strange thing. You suddenly get a reward not for what you are doing, but for what you did a long time ago. So it does not bring that much joy, because I live not in the past but in the future; I believe the most interesting things are still ahead. What could be worse than saying, "That's it, I'm done, it's time to retire"?

Basic concepts of the Book of life

  • Bioinformatics is the study of information processes in living systems. Today it means primarily mathematical methods of genome analysis and, secondly, modeling of the spatial structure of protein molecules.
  • The genome is the hereditary information recorded on DNA molecules, a copy of which is contained in every cell of the body. The human genome consists of 23 pairs of chromosomes located in the cell nucleus, plus separate mitochondrial DNA. In total, human DNA contains about three billion "letters". We read this "text" by sequencing.
  • A gene is a "record" on DNA that encodes the structure of an RNA molecule; the RNA is copied from this record and sent off to perform various functions: to produce a certain protein, to take part in the exchange of information, and so on. A person has about 25-30 thousand genes, which is only 1.5% of the DNA. The rest of its "records" still look like nonsense to us.
  • A transcriptome is a set of RNAs present in the body. The genome of all cells is the same (there is a slight difference due to mutations in the germ cells of the 23rd chromosome), and the transcriptome is always different: it depends on which genes are currently active and which are not.
  • A proteome is a set of proteins synthesized by the body. Their composition is studied using mass spectrometry. There are many times more proteins than genes, because the same chain of amino acids that make up a protein can fold into different spatial configurations with completely different properties.
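The figures in the list above lend themselves to a quick calculation. The sketch assumes the most compact encoding of a four-letter alphabet, two bits per letter, which is our assumption rather than anything stated in the article; the genome length and the 1.5% share come from the text.

```python
GENOME_LETTERS = 3_000_000_000  # "about three billion letters" (from the text)
BITS_PER_LETTER = 2             # assumption: 4 possible letters fit in 2 bits

def packed_size_megabytes(letters: int) -> float:
    """Size of a sequence stored at 2 bits per letter, in megabytes (10**6 bytes)."""
    return letters * BITS_PER_LETTER / 8 / 10**6

# Genes make up about 1.5% of DNA according to the text (15 per 1000).
gene_letters = GENOME_LETTERS * 15 // 1000

print(packed_size_megabytes(GENOME_LETTERS))  # 750.0
print(gene_letters)                           # 45000000
```

Under this assumption the three billion letters themselves occupy about 750 megabytes, and the gene-coding part comes to about 45 million letters.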

Portal "Eternal youth" http://vechnayamolodost.ru
26.06.2014
