05 November 2020

From telomere to telomere

St. Petersburg State University scientists took part in the creation of a new human reference genome

The international Telomere-to-Telomere consortium, which includes scientists from St. Petersburg State University, has published the first version of a new human reference genome. For the first time, and with the participation of University researchers, it became possible to decipher the centromeres: regions of DNA packed with repeats that make up about 2% of the entire genome. With the new reference, scientists will be able to find more links between mutations and diseases, which in turn should make various treatments more effective.

The first human genome assembly was obtained about 20 years ago. The Human Genome Project (HGP) cost several billion dollars and more than ten years of hard work by many specialists around the world. Even so, the resulting assembly was far from complete: almost 10% of the human genome remained unassembled because of problems at various stages of the research, from biological experiments to the algorithmic assembly itself. Over the next 20 years the reference genome was improved many times, yet even its latest version, GRCh38, still contained unresolved sequences totaling about 161 million base pairs, roughly 5% of the genome.

"One of the main obstacles to assembly is long stretches of repetitive sequence. With sequencing technologies that produce only short fragments, it was impossible to determine exactly where in the genome such repeats are located and how many copies of them there are. But in the 2010s, new sequencing technologies developed by Pacific Biosciences and Oxford Nanopore came into wide use. The reads they produce are much longer than those of earlier generations of sequencers, reaching tens and even hundreds of thousands of bases," said Alla Mikheenko, one of the project's authors and a researcher at the Center for Algorithmic Biotechnology laboratory of St. Petersburg State University.
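To make the repeat problem concrete, here is a toy Python illustration. It is not how real assemblers work, and the sequences are made up: plain exact string matching shows that a read lying entirely inside a tandem repeat matches the genome in several places, whereas a read that spans the whole repeat and reaches into the unique sequence on both sides matches in exactly one place.

```python
def occurrences(genome: str, read: str) -> list[int]:
    """Return every start position where `read` matches `genome` exactly (overlaps allowed)."""
    positions = []
    start = genome.find(read)
    while start != -1:
        positions.append(start)
        start = genome.find(read, start + 1)
    return positions


# Toy genome: unique flanking sequence around a unit repeated four times.
left_flank = "GATTACACCGT"
repeat_unit = "ACGTTGCA"
right_flank = "TTGCAAGCTA"
genome = left_flank + repeat_unit * 4 + right_flank

# A "short read" that falls entirely inside the repeat.
short_read = genome[len(left_flank) + 2 : len(left_flank) + 12]
# A "long read" that covers the whole repeat plus unique sequence on both sides.
long_read = genome[5 : len(genome) - 3]

print("short read matches at", occurrences(genome, short_read))  # [13, 21, 29]
print("long read matches at", occurrences(genome, long_read))    # [5]
```

The short read fits equally well into three of the repeat copies, so its true origin cannot be decided; the long read is anchored by the unique flanks and has a single possible placement.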

That is why only now, almost 20 years after the first assembly of the human genome, is science finally ready to close all the gaps in the reference genome. To do this, researchers from different countries joined forces in the international Telomere-to-Telomere (T2T) consortium, led by Adam Phillippy of the US National Institutes of Health (NIH) and Karen Miga of the University of California, Santa Cruz (UCSC). The consortium's name refers to telomeres, the regions located at the ends of each chromosome. Accordingly, the goal of T2T is to assemble each chromosome "from telomere to telomere", that is, from one end to the other.
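As a rough illustration of what "telomere to telomere" means for an assembled sequence, the Python sketch below checks two things: that the sequence contains no N characters (the usual placeholder for unknown bases in reference files), and that both ends consist mostly of the human telomeric repeat TTAGGG or its reverse complement CCCTAA, as it reads on the opposite strand. The function names, the 1,000-base window and the 50% threshold are hypothetical choices for illustration, not criteria used by the T2T consortium.

```python
TELOMERE_MOTIF = "TTAGGG"     # human telomeric repeat unit
TELOMERE_MOTIF_RC = "CCCTAA"  # the same repeat read from the opposite strand


def telomeric_fraction(segment: str) -> float:
    """Fraction of `segment` covered by non-overlapping copies of either telomeric motif."""
    segment = segment.upper()
    copies = segment.count(TELOMERE_MOTIF) + segment.count(TELOMERE_MOTIF_RC)
    return copies * len(TELOMERE_MOTIF) / max(len(segment), 1)


def looks_telomere_to_telomere(chromosome: str, window: int = 1000,
                               min_fraction: float = 0.5) -> bool:
    """Hypothetical heuristic: no unknown bases, and both ends are mostly telomeric repeats."""
    chromosome = chromosome.upper()
    if "N" in chromosome:            # an unresolved gap is still present
        return False
    if len(chromosome) < 2 * window:
        return False
    start, end = chromosome[:window], chromosome[-window:]
    return (telomeric_fraction(start) >= min_fraction
            and telomeric_fraction(end) >= min_fraction)


complete = "CCCTAA" * 200 + "ACGT" * 500 + "TTAGGG" * 200
gapped = "CCCTAA" * 200 + "N" * 100 + "TTAGGG" * 200
print(looks_telomere_to_telomere(complete))  # True
print(looks_telomere_to_telomere(gapped))    # False
```

Accepting either motif orientation at either end keeps the sketch independent of which strand the assembly happens to be written on.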

The first version of the new reference genome created by T2T was published in the fall of 2020. The consortium is now preparing a major scientific publication that will describe in detail the methods used to assemble the genome and check it for errors. Meanwhile, researchers around the world face an enormous amount of work analyzing the new reference genome.

Tatiana Dvorkina, one of the project's authors and an employee of the Center for Algorithmic Biotechnology laboratory of St. Petersburg State University: "Our group, led by Professor Pavel Pevzner, worked primarily on one of the most complex regions of the human genome, whose assembly until recently was fundamentally impossible: the centromeres. These are regions several million letters long in which the same sequence can be repeated several thousand times. Centromeres are involved in the most important cellular processes, for example, cell division."

The first program capable of automatically assembling centromeres was created in Pavel Pevzner's laboratory at the University of California, San Diego, by his graduate student Andrey Bzikadze. Then a group led by Sergey Nurk of the NIH created HiCanu, a program capable of assembling genomes from the long, highly accurate reads produced by Pacific Biosciences sequencers. Both programs were used in the project, making it possible to obtain centromere sequences for all chromosomes. Notably, both scientists, Andrey Bzikadze and Sergey Nurk, defended their dissertations (master's and PhD, respectively) at St. Petersburg State University.

The TandemTools program, developed by Alla Mikheenko and Alexey Gurevich of the Center for Algorithmic Biotechnology at St. Petersburg State University, made it possible to find important errors in the first versions of the centromere assemblies, correct the assembly algorithm and eventually obtain correct sequences, which were included in the published genome assembly. Another program, StringDecomposer, developed by laboratory employee Tatiana Dvorkina, was used to study the structure of centromeres, and the results of her work may shed light on many important questions about the evolution of the human genome.

Tatiana Dvorkina: "It is important to understand that assembling a high-quality genome for every person is, on the one hand, an incredibly difficult and expensive task and, on the other hand, completely unnecessary. Any two people are more than 99.9% genetically identical. We can sequence a person's DNA, compare the resulting fragments with a known standard (or reference) and find the differences."
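The "compare with a reference and find differences" step can be sketched in Python as follows. This is a deliberately simplified, hypothetical example: it assumes each read has already been placed on the reference at a known offset and differs from it only by single-letter substitutions, whereas real pipelines rely on dedicated read aligners and statistical variant callers. The function name and the `min_depth` parameter are illustrative.

```python
from collections import Counter


def call_substitutions(reference: str,
                       aligned_reads: list[tuple[int, str]],
                       min_depth: int = 2) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, observed_base) wherever most reads disagree with the reference."""
    # Pile up the bases that the reads report at each reference position.
    pileup = [Counter() for _ in reference]
    for offset, read in aligned_reads:
        for i, base in enumerate(read):
            pos = offset + i
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    differences = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        if depth < min_depth:  # too few reads to trust this position
            continue
        base, votes = counts.most_common(1)[0]
        # Report only a strict majority that differs from the reference base.
        if base != reference[pos] and votes * 2 > depth:
            differences.append((pos, reference[pos], base))
    return differences


reference = "ACGTACGTAC"
# Reads from a hypothetical person whose DNA has T instead of A at position 4.
reads = [(0, "ACGTTCG"), (2, "GTTCGTA"), (4, "TCGTAC")]
print(call_substitutions(reference, reads))  # [(4, 'A', 'T')]
```

Requiring agreement among several reads, rather than trusting a single one, is a simple way to keep occasional sequencing errors from being reported as differences.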

Differences from the reference genome can be either "harmful" (for example, mutations that cause genetic diseases) or, conversely, "useful": some mutations, for instance, reduce the risk of developing cancer or cardiovascular disease. Searching for "harmful" mutations is very important both for understanding how various diseases develop and for predicting risks and developing treatments.

Today anyone can sequence their genome and learn which genetic diseases they carry, whether there is a risk of passing these diseases on to their children, and whether they have an increased risk of developing Alzheimer's disease or cancer. Scientists obtained all of this knowledge by comparing the genomes of thousands of people with the reference.

"New articles reporting links between particular regions of the genome and diseases are published literally every day. That is why it is so important that the reference genome sequence be complete and error-free. Otherwise, the mechanisms behind some diseases will remain unclear, and developing treatments for them will be difficult," said Alla Mikheenko.

Portal "Eternal youth" http://vechnayamolodost.ru

