30 June 2022

Journey to the center of the genome

What meaning geneticists found in the "meaningless" stretches of DNA

Polina Loseva, N+1

A person's blueprint is their genes, scientists believed when they launched the project to decipher the human genome. And when it turned out that eight percent of it could not yet be read, no one was upset: there were no genes there anyway, only meaningless stretches, "junk DNA". Only now has it become possible to assemble and decipher them. And it turned out that they tell a story that the genes themselves have long since scrambled.

In 1879, the German biologist Walther Flemming peered at preparations from salamander testes and the abdominal fat of newborn kittens to understand how living things reproduce. He knew that every living cell has a nucleus. But he noticed that nuclei divide differently from the cell itself. They form no constriction that pinches them into two halves; instead, the ball-shaped nucleus turns into a chaotic tangle of thick threads.

centromere1.jpg

Flemming's drawings. Preparation for cell division in the testes of a salamander: (1) nuclei of non-dividing cells; (2) a nucleus in section at the start of division; (3-5) the "tangle" stage, where the nuclear envelope is gone and only the "thread" remains. Walther Flemming / Journal of Cell Biology, 1965.

Flemming had no idea what these threads were made of or what they were for. He called them "chromatic filaments" because his dyes bound to them, and he wondered why each daughter cell receives an equal share of them at every division. Today we call these threads chromosomes.

Flemming noticed that the tangle stage is followed by a star stage, in which each thread bends in half and points its apex toward the center of the cell. After that, the threads all line up in one row, then turn around and move symmetrically toward the poles of the cell. Flemming suggested that the "tails" of a thread must somehow differ from its apex, but he did not pursue this; instead he set out to find whether chromosomes move the same way in different organisms.

centromere2.jpg

Stages of division of a salamander testis cell: (37) nucleus; (38-39) transformation into a tangle; (40) star; (41-42) the "equatorial plate" (today called the metaphase plate), with the apexes pointing toward the center; (43-47) the threads moving apart to the poles, with the apexes pointing away from the center. Walther Flemming / Journal of Cell Biology, 1965.

By the beginning of the 21st century, geneticists had worked out what Flemming had been watching. They found that chromosomes are duplicated before the nucleus divides, so it is in fact two identical sets that move to the poles. The region where a chromosome bends was named the centromere, and scientists found that a protein structure, the kinetochore, assembles on it, and that motor proteins latch onto the kinetochore to pull the chromosomes apart.

But they still did not know what was inside the centromere.

Minor DNA

In 1986, the US Department of Energy (the same one that had run the Manhattan Project) invited 43 geneticists to Santa Fe to consider whether it was now feasible to decipher human DNA in full. The scientists conferred and judged the task quite achievable, with one caveat. "Some parts of chromosomes may not be amenable to sequencing," the founders of the Human Genome Project concluded. What worried them was that human DNA contains many repeated fragments (together they make up more than half of the genome), which the restriction enzymes used to cut DNA for sequencing cannot handle.

This, however, stopped no one, because the unreadable pieces of the genome interested hardly anyone at the time. "Back then, all the attention was on genes," recalls Ivan Alexandrov, a senior researcher at the Institute of General Genetics of the Russian Academy of Sciences. The demand was primarily for euchromatin, the active regions of chromosomes where RNA is synthesized. These were "clinically interesting": genes could be found there and their activity linked to specific diseases.

Heterochromatin (the compacted, inactive part of chromosomes) can of course contain "harmful" regions too; for example, copies of transposons, genetic parasites, often sit in these tangles. But this threat is usually kept under control: the tight packing of these regions keeps the transposons from being expressed, that is, from multiplying and jumping elsewhere.

centromere3.png

Eu- and heterochromatin in an electron micrograph of a cell (a), in a diagram (b) and within chromosomes (c). Dark centromeres are visible on condensed chromosomes ready for division (M phase). K. Laurence Jost et al. / Chromosoma, 2012.

More often, though, these sequences, although silent in the sense that they do not produce protein, also do useful work. Take telomeres, the caps on the ends of chromosomes: thousands of repetitions of the same "word", TTAGGG. The word means nothing; the point is that with each round of chromosome copying, several dozen telomeric repeats are lost, while the genes inside the chromosome stay intact and can keep working.

Centromeres look much the same: a meaningless string of letters. In them, however, whole phrases several hundred nucleotides long are repeated, and the copies can differ quite a lot from one another. This is the second (and main) reason why the Human Genome Project had no way to read these regions.

At that time, sequencing relied on the shotgun method. You took a small piece of DNA, made many copies of it, cut them into short pieces with restriction enzymes and read each piece separately. Since each copy is cut up differently, a single DNA fragment yields several different sets of short pieces. Then bioinformaticians come into play: they look for overlapping sequences and, by laying them over one another, restore the full text.
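The overlap-and-merge step can be illustrated with a toy sketch. The Python below is not the software the Human Genome Project used; it is a minimal greedy assembler, and the function names `overlap` and `greedy_assemble`, the minimum overlap of three letters and the example fragments are all assumptions made for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> str:
    """Repeatedly merge the two fragments that overlap the most."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates
    # Drop fragments that lie entirely inside another fragment.
    reads = [r for r in reads if not any(r != o and r in o for o in reads)]
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing overlaps any more; the pieces stay separate
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return max(reads, key=len)


# Two copies of the same hypothetical 15-letter text, cut in different places.
reads = ["ATGGCGT", "ACGTTAGC",        # copy 1
         "ATGG", "CGTACGTT", "AGC"]    # copy 2
assembled = greedy_assemble(reads)
print(assembled)
```

The sketch works here only because the fragments from the second copy bridge the cut made in the first one, and because the text contains no long repeats.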

centromere4.png

The sequence of stages in shotgun sequencing. Chial, H / Nature Education, 2008.

In 1998, Francis Collins reported: we have finished a general map of the genome; give us another five years and we will decipher it. But he did not promise to read a person's genetic text in full: by 2003, his team expected to present 90 percent of human DNA to the world. The remaining ten, the head of the Human Genome Project said, echoing his predecessors from 1986, could not be read without a technological leap.

The shotgun method worked well for the unique sequences that make up genes in euchromatin. But it is powerless against the repeats that abound in heterochromatin, because the restriction enzymes cut DNA into fragments about 200 nucleotides long. That is 33 telomeric repeats (out of several thousand) and a little more than one centromeric repeat (out of several hundred or several thousand).
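Why short fragments fail on repeats can be shown on a toy case. The sketch below is an illustration under assumed inputs: the repeat `GATTACA`, the two "texts" and the fragment lengths are hypothetical, and `fragments` is a helper written for this example. It builds two different sequences in which the same repeat occurs three times, separated by single-letter "typos", and compares the sets of fragments they produce.

```python
from collections import Counter


def fragments(seq: str, k: int) -> Counter:
    """Every overlapping fragment of length k, with how often it occurs."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


REPEAT = "GATTACA"  # hypothetical 7-letter repeat

# Same repeats, same "typos" (T and G), but in a different order.
seq_a = "CC" + REPEAT + "T" + REPEAT + "G" + REPEAT + "AA"
seq_b = "CC" + REPEAT + "G" + REPEAT + "T" + REPEAT + "AA"

# Fragments shorter than the repeat cannot tell the two texts apart.
same_short = fragments(seq_a, 5) == fragments(seq_b, 5)

# Fragments long enough to span a whole repeat plus both neighbors can.
same_long = fragments(seq_a, 9) == fragments(seq_b, 9)

print(seq_a != seq_b, same_short, same_long)
```

With five-letter fragments both texts yield exactly the same dataset, so no algorithm could decide between them; with nine-letter fragments the datasets differ.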

"Imagine that we have some newspaper," explains bioinformatician Olga Kunyavskaya from St. Petersburg State University, who annotates centrometers today. — One hundred copies. They were cut into small pieces, and cut in different ways, and we need to restore the entire edition. And we have a section in the newspaper where one word with minor typos is repeated 1500 times. To understand in what order they go [one after the other] is almost impossible. It can be proved algorithmically that this cannot be done. You can come up with several sequences that, after sequencing, will give the same dataset."

In addition, centromeres were still considered genetically meaningless. "It was a shame," says Alexandrov. "Crick came up with this 'junk DNA' back then, and [ever since] it was believed that repeats are some kind of unnecessary DNA that reproduces itself parasitically, and that the [real] centromere is something like a gene hidden inside this junk DNA. Back in the eighties it became clear that this was not so, that all these repeats are the centromere, but this dogma [of a gene somewhere in the depths of junk DNA] held on for another 15 years, until about the early 2000s."

In 2001, the Human Genome Project published its first draft. In 2003, the data were polished and released, although the text was still incomplete. The scientists honestly admitted that they had not dealt with heterochromatin at all, so part of the sequence remained undeciphered. That, however, did not prevent them from announcing the start of the genomic era in biology.

Not everyone took this manifesto seriously. The American scientists Steven Henikoff and Harmit Malik retorted snidely that no one had even begun to read the very part of the chromosome that humanity, in the person of Walther Flemming, had been the first to examine. And the "readers" themselves agreed that it would be good to close the gaps: without that, one cannot even say for sure how long our genome actually is. But the time for the new technologies Collins was hoping for had not yet come.

A handle for the chromosome

The question of genome length was not the only one that heterochromatin sequencing had to answer, and not even the main one. For example, it was completely unclear how the system of chromosome movement that Flemming saw during division works at all. Or rather, how the cell manages to grab each chromosome strictly by its apex.

Every linear chromosome has technical regions, and as a rule they are very similar in all eukaryotes. Telomeres, for example, are built in much the same way in all animals; their repeats often match letter for letter and only the number differs (mice, for example, have 7-8 thousand telomeric repeats, and humans at least three times fewer). This is no accident: technical proteins bind to the technical regions of DNA. If these regions mutated too quickly, the proteins would stop recognizing them. And if a telomere changed beyond recognition, chromosomes would start to break down during cell division, and the cells would perish with them.

Centromeres, on the contrary, mutate very quickly. Even in closely related species these sequences are very different. Fluorescent probes that bind to human centromeres do not react at all to chimpanzee centromeres (although with telomeres everything works). And the number of repeats can vary severalfold: within the cells of a single person, the centromeres of chromosomes inherited from the mother and from the father can differ in length so much that the difference is visible under a microscope.

centromere5.png

Pairs of homologous chromosomes in human leukocytes. The centromeres are stained dark. On the left, the person inherited chromosomes with centromeres of similar size; on the right, with centromeres of different lengths (arrows). Top: pairs of chromosome 1; bottom: pairs of chromosome 9. Ann P. Craig-Holmes, Margery W. Shaw / Science, 1971.

Henikoff and Malik called the fact that centromeres are both a vital and a highly variable part of the genome the "centromere paradox". It is remarkable that the diverse forms of life that rely on centromeres to multiply their cells are still viable. If centromeres change too quickly, there is a risk that kinetochore proteins will stop recognizing them and assembling in the right place, which means the chromosomes will be distributed unevenly between daughter cells. It is like trying to overhaul a running airplane engine in mid-flight: how do you do it without going into a nosedive? And how does the plane manage to keep flying, that is, how do animal cells (and the animals themselves) manage to keep reproducing?

Henikoff and Malik also noted that no one really knows how motor proteins determine where to grab a chromosome in order to drag it into a new cell. At first it was thought to depend on the nucleotide sequence: wherever certain fragments of text occur, the packaging protein CENP-A sticks. A DNA strand is wound around it, and one centromeric repeat (171 base pairs) fits exactly one spool.

But from time to time, geneticists encounter people in whom CENP-A winds DNA in the wrong place. This happens, for example, when a piece of a chromosome breaks off but, instead of being lost during division, becomes one more chromosome, the 47th. After that it lives on: it is duplicated and passed from mother cell to daughter cells as a separate chromosome. How, why and where such a neocentromere appears is completely unclear. As a rule, it is a region with almost no genes. But it has no centromeric repeats either, and yet CENP-A settles there again and again. So the text is not the whole story.

centromere6.png

Situations in which neocentromeres (red) arise. The usual centromeres are shown in yellow. Owen J. Marshall et al. / The American Journal of Human Genetics, 2008.

Then the suspicion arose that the centromere may be not a genetic entity but a purely functional one. What matters is not the target sequence but the context: there simply should be no genes around the CENP-A "landing site", and no histones, the proteins on which the rest of the DNA is wound. Histones are similar in structure to CENP-A but stick to all sorts of nucleotide sequences, not only centromeric ones. Why all of a cell's DNA is wound on histones while the centromere is wound strictly on CENP-A is not really known. But since each chromosome should have only one centromere, it is logical to imagine that stretches of DNA compete with each other for the right to become it. And the winners are those with more repeats that are sticky for CENP-A.

In addition, Henikoff and Malik suggested that not only stretches of DNA compete in this arena, but also homologous chromosomes. This follows from the way female germ cells form in humans: the precursor of an egg divides twice, and only one of its four descendants becomes an egg. So in each division, one of the two copies of a chromosome is dragged into the future egg and the other into an auxiliary cell, which from the point of view of inheritance means nowhere. And which copy ends up where, according to Henikoff and Malik's hypothesis, depends on how firmly the kinetochore (CENP-A and other proteins) is attached to the chromosome.

Thus the centromere turns out to be subject to natural selection, which favors the chromosome with the strongest kinetochore. Everything else, genes included, is just a passenger that reaches the next generation only if its centromere wins. Each chromosome tends to make its centromere stickier for CENP-A, and that, Henikoff and Malik argued, is why centromeres mutate so quickly.

CENP-A, in turn, changes too. It tends to become more universal and to interact well with every centromere in the genome, so that no chromosome becomes too successful and upsets the distribution of hereditary material. If the kinetochore grips one chromosome harder (and recruits more motor proteins) and another more weakly, the chromosomes will not be shared out equally during division, and the cell's descendants will not survive. It turns out that centromeres are forever preening to please the protein, while the protein keeps changing its preferences to maintain the status quo. That, by the way, is why CENP-A also differs greatly between closely related species. The aircraft's engine changes together with the aircraft, and so the flight continues.

CENP-A "pecks" at a very specific 171-nucleotide sequence. But that sequence, of course, comes in variants, and these can be more or less tempting for the protein. The number of times the sequence is repeated seems to matter as well: the more repeats, the higher the chance that CENP-A will land there. So over time the chromosome accumulates more and more centromeric repeats (monomers), which is what eventually left the Human Genome Project with an incomplete reading of DNA on its hands. In place of the centromeres and other pieces of heterochromatin, the genome text had to be given a placeholder (whatever could somehow be assembled).

"They weren't real sequels," explains Andrei Bzikadze of the University of California, San Diego, who is filling in these gaps 20 years later. — There is no biological formation, be it a cell line or a person who would have such DNA sequences. Individual words and sentences in these areas resembled real ones, but their order was not real."

And in some places nothing could be assembled at all, and the sequence text simply contained runs of the letter N.

centromere7.png

This is how one of the "blank spots" on human chromosome 21 ends in the 2013 assembly. Genome Reference Consortium.
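Since unassembled stretches appear in the text as runs of the letter N, such blank spots are easy to locate in a sequence. A minimal sketch, assuming the sequence is already available as a plain Python string; the function name `find_gaps` and the example string are hypothetical.

```python
import re


def find_gaps(sequence: str) -> list[tuple[int, int]]:
    """Return (start, end) positions of every run of N; end is exclusive."""
    return [(m.start(), m.end()) for m in re.finditer(r"N+", sequence)]


toy_sequence = "ACGTNNNNNACGTNNAC"  # made-up example, not real genome data
gaps = find_gaps(toy_sequence)
unread = sum(end - start for start, end in gaps)
print(gaps, unread)
```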

Reading technique

To find out what is written inside a centromere, it had to be examined whole, or at least in large pieces, so as not to get lost among individual strokes. But in the 1990s, biologists armed with the shotgun method could read only short fragments of a few hundred nucleotides.

How it worked: we take many copies of the same piece of DNA and begin building complementary strands on them, adding matching nucleotides one at a time. Besides the usual adenine, guanine, thymine and cytosine, the solution contains the same nucleotides carrying a fluorescent label (a separate color for each type). Whenever such a labeled nucleotide is incorporated into a growing strand, synthesis stops: nothing more can be attached to it. The solution thus fills with a set of truncated DNA pieces 1, 2, 3 and so on nucleotides long, each ending in a glowing label that signals which letter came last.

Then this whole set of pieces can be loaded into a gel slab or a capillary and made to migrate under an electric current. The shorter the piece, the farther it travels, so the result is a ladder: the pieces are arranged by size, long and heavy ones closer to the start, short ones closer to the finish. Each glows in one of four colors. After that you can look at them (with the naked eye or with a specially built instrument) and read off the sequence of colors, translating it into letters as you go.
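The logic of this reading method can be mimicked in a few lines. The sketch below is an idealized model, not a simulation of real chemistry: it assumes exactly one terminated piece of every possible length, ignores strand orientation, and uses a hypothetical template; the names `terminated_fragments` and `read_ladder` are made up for the example.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_fragments(template: str) -> list[tuple[int, str]]:
    """One truncated copy per length, recorded as (length, label of last letter)."""
    copy = "".join(COMPLEMENT[base] for base in template)
    return [(length, copy[length - 1]) for length in range(1, len(copy) + 1)]


def read_ladder(fragments: list[tuple[int, str]]) -> str:
    """Separate pieces by length, as the gel does, and read the labels in order."""
    return "".join(label for _, label in sorted(fragments))


template = "TACGGATC"                # hypothetical piece of DNA
mixture = terminated_fragments(template)
random.Random(0).shuffle(mixture)    # in solution the pieces are in no order
decoded = read_ladder(mixture)
print(decoded)
```

Sorting by length plays the role of the electric current: once the pieces are ordered by size, the colors at their ends spell out the newly built strand.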

centromere8.jpg

This is how sequences were obtained in the Human Genome Project. To make separation easier, the reaction is run in four solutions, each with labeled nucleotides of only one type. Left: a gel with four "ladders" of DNA fragments; right: the flashes corresponding to the different labeled nucleotides; middle: the final sequence. Abizar / wikimedia commons / CC BY-SA 3.0.

This method works well when we are interested in short stretches of DNA. When the pieces in the solution are too long, accuracy drops: to the instrument, a DNA strand 1,000 nucleotides long differs from one of 1,001 nucleotides far less than a strand of three nucleotides differs from one of four. So it is no longer possible to tell in what order the letters come.
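One crude way to put numbers on this comparison is to look at the relative difference in length between neighboring pieces. This is only an illustration of the proportions involved, under the assumption that relative length is what matters; it is not a model of how gels actually resolve fragments, and `relative_difference` is a name invented here.

```python
def relative_difference(shorter: int, longer: int) -> float:
    """How much longer one piece is than the other, as a fraction of the shorter."""
    return (longer - shorter) / shorter


short_pair = relative_difference(3, 4)         # about one third
long_pair = relative_difference(1000, 1001)    # one thousandth
print(f"{short_pair:.1%} versus {long_pair:.1%}")
```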

To learn to read long sequences, a fundamentally different sequencing technology had to be created. It was invented in 2003 but took several more years to refine. The idea was not to make many copies, break them into pieces and then assemble a genetic puzzle from them, but to read the sequence on the fly, while it is being copied. That is, to use DNA polymerase as the sequencer.

To do this, researchers had to learn to do the following:

  • anchor the DNA polymerase to the bottom of the vessel (otherwise it drifts about in the solution and is hard to observe);

  • find dyes that do not stop the reaction (that is, do not prevent nucleotides from being added to the growing chain one by one);

  • focus the microscope on a single molecule.

The new type of sequencer was built in 2009. The technique looked like this: a DNA polymerase is anchored to the bottom of the vessel and pulls a DNA strand through itself, building a copy alongside it. Nucleotides with attached labels float in the solution, and all of them glow, but the instrument does not notice this because it is focused only on the polymerase's active site. Each time the polymerase grabs a new nucleotide and holds it long enough to attach it to the growing chain, the sequencer registers the glowing label and adds a new letter to the decoded sequence. Then the polymerase finally attaches the nucleotide, and its "tail" with the label falls off as it does so. The signal stops until the polymerase grabs the next nucleotide.

centromere9.png

This is how genomes are sequenced now. (A) General layout of the setup: at the bottom of the well is a polymerase that pulls a DNA chain through itself, building a copy alongside it; (C) how it works: the polymerase grabs nucleotides one after another, and the instrument records their glow. John Eid et al. / Science, 2009.

Now geneticists could put their shotguns aside and switch to heavier-caliber weapons, so as no longer to restore the centromere's portrait from ashes but to assemble it, like a construction kit, from large fragments. It took another 10 years to refine the method and learn to process really long DNA fragments. Such stretches of DNA are now closed into a ring so that the polymerase passes over the same place several times in a row. The results can then be compared, typos found, and a fairly clean sequence obtained.

centromere10.jpg

If you fold the DNA into a ring, you can read the same place several times in a row to eliminate accidental errors. Aaron M. Wenger et al. / Nature Biotechnology, 2019.
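The idea of cleaning up a read by going around the ring several times can be sketched as a majority vote. This is a simplified illustration, not the consensus procedure of any real instrument: it assumes the passes are already aligned, equally long and differ only by substituted letters, and both the function name `consensus` and the example passes are hypothetical.

```python
from collections import Counter


def consensus(passes: list[str]) -> str:
    """For each position, keep the letter that most passes agree on."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )


# Five passes over the same hypothetical stretch; three contain one typo each.
passes = [
    "ACGTTGCA",
    "ACCTTGCA",   # typo at the third letter
    "ACGTTGAA",   # typo at the seventh letter
    "TCGTTGCA",   # typo at the first letter
    "ACGTTGCA",
]
clean = consensus(passes)
print(clean)
```

Because the errors fall in different places on different passes, each one is outvoted by the other four readings.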

After that, geneticists joined forces in the Telomere-to-Telomere (T2T) consortium to read the DNA from cover to cover, including the "functional regions that had been left out". In 2019, the consortium assembled the X sex chromosome, then chromosome 8, which has one of the shortest centromeres. And in 2021 it finished work on the human genome.

Even before the T2T sequences appeared, it was clear that human centromeres are built from small repeats, monomers. These, in turn, are grouped into clusters of about fifteen, called HORs (higher-order repeats), and these clusters are themselves repeated many times. But neighboring monomers within one HOR may differ from each other. Neighboring HORs may differ too. And on different chromosomes in one person's genome, HORs can be so different that they are assigned to different families. In short, to make sense of a structure millions of nucleotides long, an algorithm had to be written to annotate it.
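In the most idealized case, where every monomer has already been given a type label, finding the cluster length reduces to finding the shift at which the string of labels matches itself. The sketch below shows only that simplest case and is not the annotation algorithm developed at St. Petersburg State University; the function `hor_period`, the 90 percent threshold, the five-monomer cluster and the label `X` for a variant monomer are all assumptions made for illustration.

```python
from typing import Optional


def hor_period(monomers: list[str], min_match: float = 0.9) -> Optional[int]:
    """Smallest shift at which the row of monomer labels mostly matches itself."""
    n = len(monomers)
    for shift in range(1, n // 2 + 1):
        matches = sum(monomers[i] == monomers[i + shift] for i in range(n - shift))
        if matches / (n - shift) >= min_match:
            return shift
    return None


# Six copies of a hypothetical five-monomer cluster, with one variant monomer.
monomers = list("ABCDE" * 6)
monomers[7] = "X"
period = hor_period(monomers)
print(period)
```

Real centromeres break every simplification made here: the labels themselves are unknown in advance, and clusters may be irregular, hybrid, longer or shorter.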

"This task [only] seems simple,— says Kunyavskaya. — The difficulty is that we don't know what we're looking for. There are variations of HOR, there are hybrid monomers. There are HORS with an irregular structure, there are longer or shorter ones. It is unclear how to find the canonical HOR. It is unclear which monomers are the main ones and which are variations. For example, what is a chair? It's something on four legs. And if it's a bar stool? So it's something they're sitting on. And if I sit on the table? Etc. No one knows how to properly: how to annotate? What should I call it? What should I take for a base?".

centromere11.png

Canonical HORs and their variants in the centromeres of different chromosomes. Nicolas Altemose et al. / Science, 2022.

What the scientists from St. Petersburg State University have now managed to do is just one possible way of annotating the centromere, worked out on one particular genome. The main thing, Kunyavskaya notes, is that it is reproducible: the same algorithm can then be applied to annotate other centromeres, and after that one can judge how plausible and useful the result is. "Right now I have annotation runs going for an orangutan, a human and a chimpanzee," she says, "and they produce different monomers and different structures. And the most interesting thing is to compare two people, to see where they are the same and where they differ, [to understand] how they evolved."

And here, it turned out, centromeres can tell what the genes keep silent about.

Chromochronology

The smallest centromeres described are those of yeast, where they take up only 125 base pairs. In humans, centromeres stretch for millions of base pairs, several orders of magnitude more than a motor protein needs to grab a chromosome. And now that their structure has finally been pieced together, we can picture why they are so long.

"Logic suggests that at first this thing was not so big," says Ivan Alexandrov. — At first there were several repetitions, then they began to grow, and here is a huge cluster." And although, according to the scientist, "no one has seen this directly," he and his colleagues have no other way to explain the appearance of such a complex structure.

Geneticists now describe the life cycle of a centromere as follows. First there is a stretch of DNA to which kinetochore proteins bind. It consists of small monomers. At some point one of the monomers acquires mutations, literally 3-5 point substitutions, that make it even stickier for the proteins. "The kinetochore sits where the sum of affinities (that is, the strength of 'attraction'; N+1 note) for it is greater," explains Alexandrov. "And if something appears that it prefers to sit on, it starts sitting there all the time. And since it multiplies whatever it sits on, that thing starts to grow."

The new sticky monomer is accidentally duplicated. The exact mechanism is still unknown, but the kinetochore itself is suspected of being directly or indirectly responsible. Two monomers attract the kinetochore even more strongly. Then those two monomers are duplicated, then the four, and selfish selection only encourages all this. Moreover, these repeats can jump to a neighboring chromosome, because similar regions on different chromosomes can stick together. This is possible only for sequences that are very close to one another, and it works perfectly for centromeric repeats, which differ only by point substitutions.

As a result, monomers multiply, jump from chromosome to chromosome, gather into clusters and spread as whole clusters; all this works as long as they attract kinetochore proteins better than the others do. So a center of attraction arises in the middle of the centromere and begins to grow. And the repeats on which the kinetochore used to sit do not vanish but simply shift toward the edge. Forgotten by the kinetochore and no longer needed for anything, they turn into an inactive zone of the centromere.

centromere12.png

Chromosomes and the structure of their centromeres. (A) The red segment is the "living" part; the orange parts at its edges are "dying"; the yellow ones farther out are "dead"; beyond them lie regions of the centromere unrelated to kinetochore binding. Nicolas Altemose et al. / Science, 2022.

Here, in the dustbin of history, monomers and their clusters begin to mutate at an incredible rate, tens of times faster than before. Over a million years they can accumulate about 10 percent divergence from one another. Exactly how they manage this is not really known, but scientists suspect that some special "hypermutability" mechanism is at work. "The usual rate of evolution [of active centromeric repeats] is 0.2 percent per million years," Alexandrov explains. "To accumulate one percent of mutations you need five million years, and for 10 percent it is already 50. The whole of primate evolution would not be enough [to accumulate so many differences at that pace, as a dead centromere does]."
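The arithmetic in the quote is easy to check. The short calculation below uses only the figures given in the text (0.2 percent per million years for active repeats, about 10 percent per million years for inactive ones); the variable and function names are invented for the example.

```python
ACTIVE_RATE = 0.2      # percent divergence per million years, active repeats
INACTIVE_RATE = 10.0   # percent divergence per million years, inactive repeats


def million_years_to_reach(percent: float, rate: float = ACTIVE_RATE) -> float:
    """Time needed to accumulate a given divergence at a constant rate."""
    return percent / rate


time_for_1_percent = million_years_to_reach(1)     # about 5 million years
time_for_10_percent = million_years_to_reach(10)   # about 50 million years
speed_up = INACTIVE_RATE / ACTIVE_RATE             # about 50-fold
print(time_for_1_percent, time_for_10_percent, speed_up)
```

The ratio of the two rates, roughly fifty, is what "tens of times faster" amounts to.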

As a result, inactive monomers and hybrids differ greatly both from their neighbors on the chromosome and from repeats on other chromosomes. And since they no longer resemble anything else, it is harder for them to stick to other chromosomes. They stop jumping from one chromosome to another, stop duplicating, and the repeats start to be lost. So the centromere grows at its center and, conversely, shrinks at its edges. And what was once a living centromere leaves behind something like tree rings.

Alexandrov therefore believes that the centromere can serve as a source of phylogenetic information, and a more accurate one than others. "Any DNA fragment contains mutations that arose at different times. But there the old and new mutations are all mixed together in one fragment; they are not separated in space. Imagine that the [cultural] layers under a city were liquid and everything sank to the bottom. Would we be able to understand anything [about the course of history]? We could probably [guess something] just from the complexity of the ceramics. [Say] the most primitive is the oldest, and the most refined is the most recent. But of course there would be mistakes: if barbarians conquered the city and simple ceramics came back, we would think they were the oldest when in fact they were not. The centromere, by contrast, is like Schliemann's dig: Troy II, Troy III, Troy IV. When the layers lie one beneath another, everything is clear."

The remains of old centromeres are, of course, kept in the chromosome without any archaeological oversight: they break down and mutate. But then even thousand-year-old ceramics usually reach us as potsherds. And each monomer accumulates its point changes independently of its neighbors. So they can be compared with one another (this is where bioinformaticians come in) to discard the newer mutations and restore the original sequence.

The centromere has another advantage. The rest of the chromosome recombines regularly: in the precursors of germ cells, a pair of homologous chromosomes, one from the mother and one from the father, come together and exchange small segments. This is how genetic diversity arises, and with it problems for the study of evolution. "The human genome consists of pieces of different ages, but recombination keeps grinding them up," explains Alexandrov. "And we cannot find in a person a large piece of DNA [among the genes] that is just as it was in, say, Adam."

This does not happen in centromeres: recombination is suppressed there. "It is a giant piece, 10 megabases (millions of base pairs; N+1 note) of old DNA," the scientist continues. "And in one chromosome you can see a piece of DNA that has never recombined since Adam's time; that is Adam's DNA. And in another, say, Abraham's DNA. And in a third, Vanya Alexandrov's."

By examining these layers in human chromosomes, one can reconstruct the primate evolutionary tree. That is what Alexandrov and his colleagues did. The outer layer of old centromeres contains monomers similar to those of today's New World monkeys (the platyrrhines, or broad-nosed monkeys, such as howlers and capuchins), which means it is left over from our common ancestor with them. Closer to the center are layers that capuchins lack but that are found in all Old World monkeys (the catarrhines, or narrow-nosed monkeys), for example rhesus macaques and baboons. The layers that follow apparently appeared later: they can be found only in apes. Then comes a layer characteristic of hominids, that is, humans, orangutans, gorillas and chimpanzees. And only the last two share the newest centromere layers with us.

centromere13.png

If you superimpose the centromeres of different primates, you can see in which branch and in what order new layers appeared. A marmoset, for example, lacks some layers that a macaque has. And humans have layers characteristic of different groups of primates, but they are shifted toward the edges, into the zone of the dead centromere. Diagram: Ivan Alexandrov
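The reasoning behind such a comparison can be reduced to counting shared layers. The sketch below is a toy illustration rather than the method Alexandrov's group used: the layer names and the assignment of layers to species are hypothetical placeholders, and `branching_order` is a helper invented for the example. It assumes that the fewer layers a species shares with humans, the earlier its branch split off.

```python
# Hypothetical layer sets, from oldest ("layer_1") to newest ("layer_4").
layers = {
    "chimpanzee": {"layer_1", "layer_2", "layer_3", "layer_4"},
    "capuchin":   {"layer_1"},
    "orangutan":  {"layer_1", "layer_2", "layer_3"},
    "macaque":    {"layer_1", "layer_2"},
    "human":      {"layer_1", "layer_2", "layer_3", "layer_4"},
}


def branching_order(layers: dict[str, set[str]], reference: str = "human") -> list[str]:
    """Other species ordered from the earliest to the latest split from the reference."""
    shared_with_reference = layers[reference]
    others = [species for species in layers if species != reference]
    return sorted(others, key=lambda s: len(layers[s] & shared_with_reference))


order = branching_order(layers)
print(order)
```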

For most layers in human centromeres, it is possible to pick up a branch of primates from which they presumably appeared. But some layers in this sense arouse suspicion. This is, for example, the SF9 layer (it is red in the graph above) — we do not yet know of such animals that would already have it, and SF7 (green), which dates back to macaques, was not. Or S10 (brown). "There were many different taxa of primates," suggests Alexandrov, "but they did not survive. And we can restore these missing links. We can say: this layer was, although there were no living representatives left."

There are other, even more ancient layers beyond SF12. But according to the geneticist, only a few fragments of them remain, and there is nothing to compare them with. "If we go further back," says Alexandrov, "the next branch [known to us] is the tarsiers, then the lemurs. They have different repeats in their centromeres. It seems that over a hundred million years nearly everything is lost, and if a single piece remains, it is difficult to prove that it really was a centromere. [So] you can shine this flashlight back about a hundred million years."

Such centromeric dating, Alexandrov is sure, can help determine much more accurately which lineage a fossil belongs to. "If DNA could be extracted from million-year-old bones, then in this way it would be possible to classify all primate remains and bring about a revolution in anthropology. And since there is progress here, maybe it will happen someday."

centromere14.png

The evolutionary tree of the apes. Dbachmann / wikimedia commons / CC BY-SA 4.0

Alexandrov suggests starting with Gigantopithecus. "It used to be considered a giant human," says the scientist, "but then conventional methods showed that it was a branch of the orangutan lineage. It was so advanced, so anatomically similar [to a human]. Gigantopithecus became extinct not so long ago, and in China, they say, its teeth are sold in the markets as dragon teeth to increase potency. Take such a tooth, extract some small fragments from it (repeats are good because small fragments are enough) and say: the blue layer is characteristic of the orangutan. The gorilla-human-chimpanzee clade has further layers, and Gigantopithecus should not have them; its last layer will be the blue one. And then we will be able to say for sure that this is not the human line. I keep urging someone to do this, but I have not persuaded anyone yet."

The same approach could be used to trace much closer relatives of modern people, such as Neanderthals and Denisovans. "The evolutionary history is very long, and the dots are very far apart," Alexandrov complains. "The Denisovans are a single dot in a blank field. If we added a Neanderthal, an African and a non-African to that dot, we could understand much more."

According to the scientist, Neanderthals and Denisovans have the same genes as modern humans. But the shortened versions of a given HOR may differ. In some rare people, Alexandrov and his colleagues have already found centromeres consisting of these Neanderthal and Denisovan variants; apparently, these are the consequences of ancient interbreeding between populations. And since centromeres do not recombine, it can be assumed that a long piece of Neanderthal or Denisovan DNA survives in such chromosomes. "And if you find such a centromere," the scientist says, "you will have 15 megabases of Denisovan DNA at once. You can study them and draw a lot of interesting conclusions about how it evolved."

A place to step forward

But so far there have been no interesting conclusions. "It all became clear only now," Alexandrov explains, "from [the work of] T2T. This is the first time that large centromeres have been assembled [in full]." Until now, geneticists had to be content with individual short sequences, models and approximate reconstructions. That was not enough to look for subtle differences between individual monomers, although it was enough to picture in general terms the mechanisms of centromere evolution, its overall structure and the arrangement of its annual rings.

Now that fully deciphered centromeres are available, they can be used to test the many assumptions and models that have accumulated over the past thirty years. "What has been done in these Science papers is like Tutankhamun's tomb," says Alexandrov. "There was probably nothing new in it, nothing that no one had ever seen. As separate items, it all already existed somewhere. But here we went in and saw everything together in pristine condition, and that is a first."

But in order to fully understand what this means, geneticists will have to find and describe many more such tombs. The complete genome that the T2T consortium has now assembled is the set of chromosomes of a single person, or rather, of a single human tumor. It was chosen not because it most resembles an average human cell, but because it is homozygous: instead of one maternal and one paternal set of chromosomes, it has a doubled paternal set. This helps to avoid confusion during sequencing, because there is no need to work out which chromosome of a pair each piece belongs to.

centromere15.png

Chromosomes of the CHM13 tumor. Karen H. Miga et al. / Nature, 2020.

To move on to the next stage, it is necessary to work out the sequencing technique for heterozygous genomes. This, according to Alexandrov, is the next task that T2T faces. After that, it will be possible to sequence many more different centromeres, compare them with each other and get an idea of what is happening to them in the populations of modern people.

Judging by preliminary data, a lot of interesting things are happening there.

For example, back in 2012 it turned out that the kinetochore can form in different places on chromosome 17. Moreover, this can happen even within a single human cell: on the maternal chromosome it sits in the usual place, on the "live" centromere, while on the paternal one it sits somewhere off to the side, on an inactive region. "There are some toxic shortened versions [of the genes] that the kinetochore doesn't like," says Alexandrov. "And if a person has a lot of them, it moves away, provided there is some other landing pad available."

In the current genome assembly, traces of kinetochore jumps were found on another chromosome, the X sex chromosome. Judging by these traces, the kinetochore jumps within the bounds of the live centromere: its characteristic proteins are detected at one edge, then at the other, or even in the middle of the active set of genes. Over time, such kinetochore maps will surely be built for the other human chromosomes as well.

centromere16.png

Variants of CENP-A landing on the X chromosome are designated as CDR (these are areas with a characteristic proportion of methylation, where CENP-A is usually detected). Nicolas Altemose et al. / Science, 2022.

This wandering of the kinetochore along the chromosome is, apparently, evolution in action. Different landing sites compete with one another to attract proteins, and one can expect that over time one of them will die off, gradually shrinking and disappearing. The other will win and begin to grow, until a new area more attractive to the kinetochore appears inside it or next to it.

Alexandrov suspects that the inconstancy of the kinetochore is not accidental. By itself, the structure of the centromere — a lot of repeats, a "dead" zone with a stock of old repeats, an increased accumulation of mutations on its side — is perfectly suited for the kinetochore to stick to the chromosome in several places at once. "One gets the feeling," he argues, "that the centromere is specially designed to provide a change of scenery in order to generate diversity."

The geneticist has his own explanation of why this is necessary. "A new centromere is [a] new species," he argues. "When an old species exists in a territory and a new one arises there, this cannot happen unless a barrier to interbreeding is established between them." And if the two have active centromeres in different places or with different sequences, their descendants will have problems forming germ cells. The two chromosomes will line up asymmetrically opposite each other in the center of the progenitor cell, and when they begin to be pulled apart from either side, the tension may be uneven. In that case, one of the chromosomes may end up in the wrong place. One of the cells will get an extra chromosome and the other will be short of one (these conditions are called aneuploidies), and if this happens during the formation of germ cells, the organism produces fewer offspring or is even infertile.

Therefore, Alexandrov concludes, the tendency of centromeres to rapid evolution may be necessary in order to quickly form and separate new species. And this, he suspects, may be especially characteristic of individual taxa, for example, primates. At least, no such diverse genes, accelerated mutation of dead centromeres, or traces of such frequent jumps of the kinetochore have yet been found in other groups of animals.

Whether this constant "change of scenery" really has a special purpose cannot be tested yet. But one can imagine the price we have to pay for it. If a population already contains chromosomes with different landing sites, this may be one of the causes of infertility or chromosomal abnormalities. "For the X chromosome, it is one [anomaly] per hundred people," says Alexandrov. "They are very frequent, and apparently often go undiagnosed. But in the case of X, carriers of these aneuploidies survive because the extra X chromosomes are inactivated, whereas for the other [chromosomes] the result is infertility."

***

In 1990, Robert Sinsheimer, one of the driving forces behind the Human Genome Project, compared the sequencing of the genome to the discovery of America, whose 500th anniversary was then approaching. The project, in his view, was to tell us a great deal about "who we are and how we have become who we are thanks to evolution." True, Sinsheimer believed that the answer to this question, like the record of our past, lay in the genes, and he hardly gave a thought to the "meaningless" parts of DNA that silently serve the work and reproduction of genetic "meaning".

Deciphering these parts dragged on for twenty years. It did not add any sense to them, but it did show their significance: they preserve our past. And if you look closely, they hold pieces of the future too.

Portal "Eternal youth" http://vechnayamolodost.ru

