29 April 2022

The reason for the interview

About the human genome, the immortal jellyfish and the hair of Tsarevich Alexei

Nadezhda Markina, PCR.news

Evgeny Rogaev is known to a wide readership as the scientist who settled the question of the authenticity of the remains of the royal family. He helped pioneer the introduction of DNA analysis into forensic medical examination in our country, participated in the discovery of genes key to Alzheimer's disease, was one of the first in Russia to start working with ancient DNA, and sequenced the mitochondrial genome of a mammoth. The occasion for this interview was the publication of the most complete version of the human genome.

Evgeny Rogaev is a Doctor of Biological Sciences, Corresponding Member of the Russian Academy of Sciences, Head of the Laboratory of Evolutionary Genomics at the Institute of General Genetics of the Russian Academy of Sciences, Professor at the University of Massachusetts Medical School, Director of the Scientific Center for Genetics and Life Sciences at Sirius University.

The most complete human genome

Recently, an article was published in Science: the Telomere-to-Telomere (T2T) consortium published the complete version of the human genome.

You and your colleagues participated in this work. Are you a member of the consortium?

Yes, we are.

Ivan Alexandrov was the most involved in the work itself; Lev Uralsky, Fyodor Gusev and I were also co-authors of the consortium's publication.

What is your role in this work? Which parts of the human genome did you sequence?

Our group's role was to take part in the analysis of centromeric alpha-like DNA repeats.

Most of the unread genome lay in centromeric regions. These are special sections of chromosomes that are necessary for cell division, and they usually consist of tandemly repeated DNA sequence units. Interestingly, my very first scientific work, my thesis, was devoted to cloning and sequencing the first chromosome fragment containing these alpha-like centromeric DNA repeats. We published an article back in 1986. A few years ago, we returned to this topic to some extent. Centromeres in all chromosomes used to be considered homogeneous, consisting of identical repeats. But it turned out that they are not completely homogeneous: different chromosomes differ from each other in these domains. The structure there is rather complex, and these regions occupy many thousands of nucleotides on chromosomes. How exactly they are arranged, and what their sequence is, was very difficult to determine. A lot of time passed, and the original reference genome still had "holes" in these areas. By a rough estimate, the version of the human nuclear genome that has been continually extended and improved since 2001 was missing about 8% of the genomic DNA sequence.

What is included in this 8%? As a rule, these are sections of the genome with a complex structure: repeats, inversions, duplications and so on. When we use modern sequencing methods that read short sequences (short reads), such sections are very difficult to place and orient.
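The difficulty with short reads can be illustrated with a toy sketch. The sequences below are made up for illustration (the flanks, the repeat unit and the read lengths are all hypothetical): a read that lies entirely inside a tandem repeat array matches at several positions, while a read long enough to reach unique flanking sequence on both sides matches only once.

```python
def find_all(reference: str, read: str) -> list[int]:
    """Return every start position where `read` matches `reference` exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


# Hypothetical toy reference: unique flank + tandem repeat array + unique flank.
LEFT_FLANK = "ACGTTGCA"
REPEAT_UNIT = "GATTACA"
RIGHT_FLANK = "TTGGCCAA"
reference = LEFT_FLANK + REPEAT_UNIT * 5 + RIGHT_FLANK

# A short read that lies entirely inside the repeat array.
short_read = REPEAT_UNIT + REPEAT_UNIT[:3]
# A long read that spans the whole array and part of both unique flanks.
long_read = reference[4:47]

short_hits = find_all(reference, short_read)  # several equally good placements
long_hits = find_all(reference, long_read)    # a single placement

print("short read placements:", short_hits)
print("long read placements:", long_hits)
```

Real aligners tolerate mismatches and work on vastly larger references, but the ambiguity shown here is the same one that left gaps in repeat-rich regions.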

A couple of years ago, we published a paper on the epigenetic profiles of the neurogenome. In it, we wanted to find epigenetic differences in neurons between different people, including patients with schizophrenia. Using a special chromatin marker, we looked at which parts of the genome have active promoters in neurons of the cerebral cortex, and thus scanned the entire genome. When we started mapping these active gene promoters, we found that, in addition to thousands of signals at the sites of known genes, many signals mapped to genomic sites where the reference genome marks no genes. By integrating these chromatin promoter signals with transcriptomic data (analysis of RNA transcripts of brain cells from many individuals), we showed that it is possible to discover new genes that have not yet been annotated. For the most part we identified non-protein-coding genes in this way, but also some protein-coding ones. During the two years we spent on this article, some of these genes were gradually added to the reference genome sequence, which confirmed the validity of our approach.

Since 2001, the genome has been updated several times, most recently in 2019?

Yes, but there were still pieces of genomic DNA sequence that could not be fitted into the whole sequence, into a contig.

Now there are none left; everything is in place. You may ask: what made this possible? First of all, long-read sequencing methods appeared. Two platforms provide them: PacBio (Pacific Biosciences) can read tens of thousands of nucleotides, and Oxford Nanopore provides ultra-long reads of up to a million nucleotides (although for this you still need to be able to isolate DNA without breaking it). These platforms had a disadvantage: a high error rate, up to 13-15% for PacBio. Still, it was possible to make a rough draft of a long sequence and then clean it up using Illumina sequencing. Not so long ago, however, PacBio introduced a technology that "cleans" the DNA sequence of errors through many cycles of repeated reading of the same molecule, and this was used in the consortium's work.
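The idea of cleaning a sequence by reading it many times can be sketched as a simple majority vote. This is a deliberately simplified illustration, not the platform's actual algorithm: it assumes the repeated passes are already aligned, have equal length and contain only substitution errors, and the sequences are invented.

```python
from collections import Counter


def majority_consensus(passes: list[str]) -> str:
    """Column-wise majority vote over equal-length, already aligned passes."""
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("this sketch assumes equal-length, pre-aligned passes")
    consensus = []
    for column in zip(*passes):
        # Pick the most frequent base observed at this position.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Hypothetical molecule and five noisy passes over it.
true_molecule = "ACGTACGTAC"
noisy_passes = [
    "ACGTACGTAC",
    "ACCTACGTAC",  # substitution error at position 2
    "ACGTACGAAC",  # substitution error at position 7
    "TCGTACGTAC",  # substitution error at position 0
    "ACGTACGTAC",
]

consensus = majority_consensus(noisy_passes)
print(consensus == true_molecule)
```

Because the errors in individual passes fall at different positions, each one is outvoted by the correct bases from the other passes.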

Which cell genome was sequenced?

As described in the article, a special homozygous human cell line was used.

When we start assembling a complete genome, heterozygosity, where paired chromosomes carry different alleles, is a serious obstacle. I think it would be ideal to take a sperm cell, with its haploid set. But in that case you need to work with a single cell. Another option is to take a diploid homozygote. For this, a special cell line was used that is associated with a rare pathology of pregnancy called a hydatidiform mole. In this pathology, the maternal set of chromosomes is eliminated from the fertilized egg, and the genetic material of the sperm is duplicated. As a result of cell divisions, cells are obtained with a diploid set of chromosomes that comes only from the father, in a homozygous state.

And with an X chromosome.

With a Y chromosome, such cells do not survive; only 46,XX can be obtained.

With such a cell line, the problem of genetic heterogeneity is eliminated, and it becomes much easier to assemble a genomic sequence. Since this is a cell line maintained in the laboratory, some genomic changes are possible during cell culture, but they are insignificant. The only thing missing from this genome is the Y chromosome.

Most of the newly sequenced sections are non-coding?

Yes, most of them are centromeric and pericentromeric chromosomal regions that contain families of DNA repeats.

But genes may also be found among the repeats in the pericentromeric regions. The previous version of the genome lacked the normal structure of correctly oriented ribosomal genes. These are very important genes that are actively expressed and whose copy number is highly polymorphic among different people. Now it has been clearly established how they are arranged; in the new version of the genome, in this cell line, there are about 200 ribosomal genes. In addition, among the new genes, previously missed paralogous genes (genes that have similar copies) were identified. That, in short, is the whole story. This sequence can now be used as the latest version of the reference genome. It does still contain areas where, let's say, the possibility of error is acknowledged, but there are far fewer of them.

DNA is a guide to the past

Let's move on to your work with ancient DNA.

Where do you isolate and sequence it? And how expensive is it to study an ancient genome?

All the work is carried out in Russia.

This covers the isolation of ancient DNA, the preparation of so-called genomic libraries, genomic sequencing, and analysis. The main institution in the project is Sirius; we have a fully equipped platform for genomic sequencing there, and everything works. Moscow genetic laboratories that have the conditions for isolating ancient DNA are also involved. Of course, such work is carried out within a consortium, in cooperation with our colleagues, archaeologists and anthropologists, who professionally describe samples and offer them for research.

The cost of sequencing an ancient genome is an order of magnitude higher than for a modern one. Do you know why? No, it is not just the poor preservation of ancient DNA. Nor is it the need to create particularly clean conditions in special laboratory rooms. We know how to effectively control and clean up possible contamination with modern DNA and errors associated with damage to ancient DNA. This is done both at the level of experimental work and at the level of bioinformatic analysis. The main costs go not to DNA isolation or the preparation of genomic libraries, but to the chemical reagents used to determine nucleotide sequences on a modern deep sequencing platform. When working with ancient DNA, most of the reagents are spent not on sequencing endogenous human DNA, but on the DNA of microbes that abundantly contaminate the bone. Previously, it was believed that DNA is best preserved in teeth, but it turned out that bone fragments from the temporal areas of the skull, primarily the auditory bones, are much better suited for this. They may contain the largest proportion of human genomic DNA relative to contaminating microbial DNA.
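The cost argument above comes down to simple arithmetic, which can be sketched as follows. All numbers here are hypothetical and chosen only for illustration (genome size, target coverage, read length, and especially the 10% endogenous fraction); the sketch also ignores the fact that ancient DNA fragments are shorter than modern ones.

```python
def reads_needed(
    genome_size_bp: int,
    target_coverage: float,
    read_length_bp: int,
    endogenous_fraction: float,
) -> float:
    """Total reads to sequence so that the useful (endogenous) ones reach the target coverage."""
    if not 0 < endogenous_fraction <= 1:
        raise ValueError("endogenous_fraction must be in (0, 1]")
    useful_bases = genome_size_bp * target_coverage
    # Only a fraction of all reads comes from the organism of interest;
    # the rest is spent on contaminating microbial DNA.
    return useful_bases / (read_length_bp * endogenous_fraction)


# Hypothetical parameters, for illustration only.
GENOME_SIZE_BP = 3_100_000_000
COVERAGE = 10
READ_LENGTH_BP = 100

modern = reads_needed(GENOME_SIZE_BP, COVERAGE, READ_LENGTH_BP, endogenous_fraction=1.0)
ancient = reads_needed(GENOME_SIZE_BP, COVERAGE, READ_LENGTH_BP, endogenous_fraction=0.1)

ratio = ancient / modern
print(f"ancient sample needs about {ratio:.0f}x more reads")
```

Under these assumed numbers, a sample in which only a tenth of the DNA is human requires ten times as many reads, and therefore roughly ten times as much reagent, for the same coverage. This is why bones with a higher endogenous fraction are so valuable.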

Tell us about your project to study the genetic history of Russia. As I understand it, the time frame in this project is very wide, from the Paleolithic to the formation of the Old Russian community. What samples do you work with? What archaeological cultures?

The project is called "Genetic history of the ancient population of the Russian Plain".

It is premature to talk about the results now, because we have not published much yet. The project itself is as follows. The European part of our country is occupied by the East European, or Russian, Plain, stretching from the Barents and White Seas in the north to the southern seas, and to the Ural Mountains in the east. In this vast geographical area, we would like to explore all the epochs for which collections of anthropological and archaeological materials have been found and assembled, from the Paleolithic to the Middle Ages, and to reconstruct the history of the numerous cultures and peoples that succeeded each other or coexisted on this territory over centuries and millennia. We decided to start, on the one hand, with the historical period of the Scythians, which began in the early Iron Age, and on the other hand, with the period of formation of the East Slavic tribes and the Old Russian community in the early Middle Ages in the Eastern European part. These are the two conditional historical points that interest us now. There is a time interval between them, less than a thousand years. It also needs to be filled in, but first we need to deal with the biological samples (bone fragments) from burials of the Scythian culture and from medieval archaeological sites on this territory. We have already obtained genomic data on a large number of ancient samples.

The key question of the project is the formation of the Old Russian community. We are now exploring extensive collections from the early Middle Ages together with our leading archaeologists and anthropologists and their staff. Of particular interest are the northern burials and archaeological sites associated with Northern Rus'; this work is handled by Nikolai Andreevich Makarov and the Institute of Archaeology. Many samples have been recovered from them, including during recent excavations and archaeological research using modern methods, which is especially valuable. Also, thanks to cooperation with Alexandra Petrovna Buzhilova, we have the opportunity to explore valuable collections from the Research Institute and Museum of Anthropology of Moscow State University. They include samples from the Early Middle Ages and previous eras found on the territory of the Russian Plain. Genetic studies of ancient samples will help clarify the history of the Eastern Slavs. And to understand the genetic history in the context of modern populations, it is necessary to create a good database of genomes of modern Russians, Belarusians, Ukrainians and other peoples living in adjacent territories. No such comprehensive data exist yet, so it would be timely for our country to launch a separate project on sequencing modern genomes of the peoples of Russia.

Of particular interest are very ancient samples from the Paleolithic period, more than 20 thousand years old. These are extremely rare, isolated finds. Unfortunately, most such finds from our territory have already been transferred to Western laboratories. We are very interested in the genetic study of such samples, which can and should be studied in Russia.

You recently published an article on the genome of an ancient resident of Yaroslavl from the time of the Golden Horde invasion.

This is just a small piece of work. Based on the analysis of the mitochondrial lineage, it showed that his origin was more likely Caucasian than Asian, as one might have assumed from some anthropological descriptions.

The Riddle of the Immortal Jellyfish

You are the director of the Scientific Center of Genetics and Life Sciences of Sirius University.

What genomic projects are going on there?

If we talk about our own projects, there is the "centenarian genome".

We sequence and analyze the genomes of one hundred centenarians, residents of Russia, in order to find genetic and epigenetic variants associated with longevity and with protection from age-related diseases. The project on de novo assembly of the sable and marten genomes is nearing completion, and it has produced interesting results on interspecific hybridization. And the project on the "immortal" jellyfish is our favorite, although probably the most labor-intensive.

Is that the jellyfish that has an endless life cycle because it can turn back into a polyp? Are you interested in it simply as a biological phenomenon, or is there a connection between its immortality and your study of centenarians, for example?

What counts as immortality?

We can say that we are all a continuation of the immortal germline, which is passed down from generation to generation. Somatic cells, by contrast, are mortal, and animals always have an ontogenetically limited lifespan. But we know of a mechanism by which differentiated somatic cells can be converted back into undifferentiated cells by manipulating transcription factors in vitro.

On the left is the "immortal" jellyfish Turritopsis dohrnii, on the right is the polyp from which the jellyfish bud.

Restoring pluripotency?

Yes. We can do it in vitro, but we don't see it happening in vivo.

Moreover, skin fibroblasts can be directly converted into muscle cells using only one transcription factor, again under in vitro conditions. This process is called transdifferentiation. That is why we became interested in coelenterates, first of all Turritopsis dohrnii. This is a unique organism for which an amazing capacity for reverse development has been shown. As in other coelenterates, its jellyfish buds off from the polyp, but under certain conditions the jellyfish can turn back into a polyp. This transformation occurs through somatic cells; germ cells are not involved. The question is how exactly the genome is reprogrammed so that such a return to the original state occurs.

Are you trying to figure it out by examining its genome? And where do you get the animals from?

We have learned how to keep and breed these jellyfish in the laboratory.

This is a major success, because it is a very labor-intensive process. The triggering of reverse development cycles is also being studied in the laboratory. And we have already obtained a large volume of genomic and epigenomic data.

"Royal illness" and the prince's hair

And what do you find most interesting in research?

When a problem can be solved with a well-chosen, elegant methodological approach.

Take, for example, the work on identifying the gene for the "royal blood disease". As we established, it was hemophilia B, which affected members of the royal families of Europe, including Tsarevich Alexei Romanov. Of course, finding the mutation, and thus solving a historical riddle, was interesting in itself. But what was not covered in the media, and was described in the supplementary materials of the article, is the approach we used to prove the pathogenicity of the mutation we found. It was necessary to somehow prove that the mutation, which lies in a non-coding site at the border with an exon, is not just a harmless genetic polymorphism. First, we cloned the genomic region containing the mutant variant of the gene, using degraded genomic DNA isolated from a historical sample. Then we placed this region, containing the exon and the site with the mutation, between two other exons in a special plasmid. After that, we transfected cultures of hamster and human cells with this plasmid and showed that the mutation affects splicing, which proves its pathogenic nature.

So you have found the mechanism by which this mutation leads to hemophilia?

Yes, we showed that this mutation disrupts splicing and causes a shift in the reading frame.

And using deep sequencing methods, we showed in cell cultures that the transcripts change. It was such a good idea.
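Why a splicing error can shift the reading frame is easy to show on a toy example. The sequences below are invented and are not the actual gene or mutation; the sketch simply assumes that aberrant splicing leaves two extra bases in the transcript, which changes every codon downstream.

```python
def codons(seq: str) -> list[str]:
    """Split a coding sequence into complete three-base codons, dropping any leftover bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical normally spliced coding sequence.
normal = "ATGGCTGAAACCGGT"

# Hypothetical aberrant transcript: two extra bases retained after the first six.
aberrant = normal[:6] + "AG" + normal[6:]

normal_codons = codons(normal)
aberrant_codons = codons(aberrant)

print(normal_codons)
print(aberrant_codons)
```

The codons before the insertion are unchanged, while those after it are read in a different frame, so the protein downstream of that point is altered. An insertion or deletion whose length is a multiple of three would not have this effect.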

Tell us about another recent article, on the portrait of Tsarevich Alexei.

This is an article about establishing the authenticity of a biological sample in a museum exhibit, which helped determine the museum value of the portrait.

The story is as follows. The State Historical Museum held a watercolor painting, evidently a portrait of Tsarevich Alexei, the son of the last Russian Emperor Nicholas II and his wife Alexandra Feodorovna. A small case containing several hairs was mounted in the portrait frame. We were asked whether genetic analysis could establish their origin. The hairs were very thin and had no roots, and it seemed unlikely that we would get an informative DNA result. But at some point, in between other work, we tried to conduct such an analysis. At the same time, we were also preparing the latest conclusions for the Investigative Committee on the identification of the Romanov royal family. It turned out that DNA could be extracted from a single hair, but in an extremely small amount and as very small fragments of degraded DNA. Most of these fragments were even shorter than the minimum required for genomic analysis of ancient samples. Despite this, we were able to reconstruct informative sections of mitochondrial DNA. It turned out to be the same mitochondrial type that we had previously identified in Alexandra Feodorovna and her children, including Tsarevich Alexei. Alexandra inherited this variant of the mitochondrial genome from her grandmother, the British Queen Victoria. This work was an example of using a genetic approach to determine the museum status of an exhibit.

"If there is knowledge, it may pay off someday"

I can't help but ask about Alzheimer's disease.

After all, you took part in the very first studies of its genetic nature, when the presenilin genes were discovered?

We did this back in the mid-1990s, when we worked as part of an international consortium.

Most of this work was carried out in Canada. Presenilins are enzymes that cleave the precursor protein of beta-amyloid, which forms amyloid plaques in the brain and in blood vessels. Later, in our Moscow laboratory, we found a whole family of genes and proteins that are remotely homologous to presenilins but have completely different functions. It became clear that there are families of genes for special cellular enzymes, aspartyl proteases, which have an unusual ability to "cut" other proteins, in particular receptors, inside membranes, and thereby regulate different cellular processes.

You said that the molecular mechanism of Alzheimer's disease is now clear. Why, then, has it not been possible to find a treatment? Do you have a guess?

I can only speculate about it.

We know the primary mechanism of the disease. The key stage is the cleavage of the amyloid precursor by presenilin enzymes. In fact, two enzymes are needed for this: one cleaves the precursor protein at the outer surface of the membrane, and the other, namely presenilin, cleaves it inside the membrane. As a result, short pathogenic beta-peptides 40-42 amino acids long are formed, and these form amyloid plaques. This always happens in Alzheimer's disease. Mutations in the presenilin genes enhance such cleavage or shift it toward a more fibrillogenic peptide. But even without mutations, the mechanism is the same. There are also neurofibrillary tangles in neurons; they are associated not only with Alzheimer's disease but also with other dementias, and they are a potential target as well. We can develop a drug against presenilins, against the beta-peptide itself, or against plaques once they have formed. A priori, it is better to act on the primary mechanism. Presenilin inhibitors do exist. But the fact is that presenilins are very important for other processes: the cleavage of many other transmembrane proteins, including cellular receptors. They cleave the receptor to release the intracellular signaling part of the protein. In general, the discovery of presenilins is important not only for understanding the mechanisms of Alzheimer's disease, but also for understanding the fundamental mechanisms of cellular signal transduction. If we suppress presenilins, this signal transduction will be disrupted, which can lead to very negative effects. Therefore, existing presenilin inhibitors are not currently considered promising therapeutic molecules.

There have been many clinical trials of potential drugs that act directly on the beta-peptide or on amyloid plaques, and they have not led to anything good. Articles appear periodically, including in prominent journals, reporting the development of monoclonal antibody variants that eliminate plaques. This effect has been convincingly shown in mice, but it is not particularly pronounced in clinical trials. One explanation is that neuronal degeneration has already begun in Alzheimer's patients, and it is too late to act. So today, most pharmaceutical companies and drug agencies no longer support the development of drugs against the beta-peptide.

The risk of Alzheimer's disease (abbreviated AD in the scientific literature) is increased in carriers of the ε4 variant of the APOE gene, which occurs in 10% of the European population, including in Russia. One of the approaches that we applied with colleagues from the University of Massachusetts Medical School is the development of small interfering RNAs that suppress the activity of APOE in the brain. So far, this has been done in animal models. We also took part in a study with colleagues from the National Institutes of Health that used transgenic mouse models of AD. A chronic inflammatory process develops with age. One of the questions is whether it plays a role in the development of AD, and whether it can somehow be suppressed. In the experiment, we suppressed the formation of immune B cells and found a decrease in pathogenic processes and an improvement in cognitive abilities in the model mice.

In general, Alzheimer's disease is an example of the fact that, even when the mechanism of a disease has largely been elucidated, drug development aimed at the primary target does not always work. But if there is knowledge, it may pay off someday.

Portal "Eternal youth" http://vechnayamolodost.ru