06 February 2008

The ranks of genes have thinned

As the human genome has been studied, the number of genes in it has shrunk almost by half.

How many genes does human DNA encode? Over the past twenty years, this question has received a wide range of answers. When the "draft" sequence of the human genome was published in 2001, the figure of 35,000 was generally accepted, but today's gene catalogs list approximately 24,500 genes. A new bioinformatic study shows that the number of "real" genes is smaller still: about 20,500. The remaining entries in current databases are non-coding DNA sequences that got there by mistake.

It is common knowledge that the much-publicized Human Genome Project has been completed and that the DNA sequence that makes us human is publicly available [1]. Sensation aside, however, the data obtained back then are still being refined. Not only do they remain incomplete (some chromosome regions are difficult to sequence), but they also fail to give a clear picture even of how many genes the genome contains, let alone of what all those genes do.

"Genetic atlases" (Ensembl, for example) contain information about the genes that make up the genomes of various organisms. But how does one pick out, from a genetic text many megabytes long, the sequences that correspond to genes, that is, the chromosome regions from which genetic information is read? After all, the overwhelming majority of DNA does not encode anything, even though it may perform important functions such as regulating the transcription of "true" genes. (Because of this, such DNA has received the unflattering name "junk".)

Genes are usually found (and, accordingly, entered into the "atlases") by bioinformatic methods that identify open reading frames (ORFs) in the DNA text: stretches corresponding to the chromosome regions where RNA polymerase binds and mRNA is synthesized. Roughly speaking, any sufficiently long ORF identified by computer (>300 base pairs in spliced form) becomes a gene in the "genetic atlas".
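The length rule described above can be illustrated with a minimal sketch. It is not the pipeline used by Ensembl or any other catalog: it assumes an already spliced sequence, scans only the forward strand, and treats an ORF as a run from an ATG start codon to the first in-frame stop codon. The function name `find_orfs` and the parameter `min_len` are hypothetical.

```python
# Simplified illustration only: real gene predictors also handle introns,
# the reverse strand, alternative start codons and statistical scoring.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=300):
    """Return sorted (start, end) pairs of forward-strand ORFs.

    `end` is exclusive and includes the stop codon. Only ORFs of at
    least `min_len` base pairs are reported (300 bp is the threshold
    mentioned in the article).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the current ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_len:
                    orfs.append((start, end))
                start = None  # look for the next ATG in this frame
    return sorted(orfs)


# Toy sequence: ATG + 100 alanine codons + stop, with flanking bases.
toy = "CC" + "ATG" + "GCT" * 100 + "TAA" + "GG"
print(find_orfs(toy))
```

As the following paragraphs explain, passing such a length filter does not by itself prove that a sequence is a functioning gene.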

However, life is always more complicated than the scheme: not all such ORFs correspond to genes, apparently because of the way DNA is packaged in the chromosome, the obligate suppression of promoters, and a number of other poorly understood reasons. As a result, genetic databases contain both "real" genes (whose existence is confirmed at the level of the protein product) and "potential" ones. Among the latter there are probably both genes whose activity is extremely low in a given place and at a given time, and non-coding sequences that contain a spurious reading frame and only confuse researchers. (Incidentally, an exact list of genes is needed, if only to make it possible to run large-scale experiments covering absolutely all human proteins [2].)

The three main genetic "atlases" (Ensembl, Vega and RefSeq) currently contain about 24,500 genes in total. But as early as 2002, after the human and mouse genomes were compared, it became clear that quite a few human genes have no homologs in the mouse genome, and vice versa. Studies also show that so many genes could hardly have arisen and disappeared in the course of mammalian evolution. The conclusion was obvious: the annotation of genetic texts is far from perfect. Yet there are currently no standard mechanisms for "cleaning" the gene banks to remove the "impostors" from them.

A solution to this problem, based on the evolutionary relatedness of genes and on comparing several genomes with one another, has been proposed by a group of American scientists led by Michele Clamp and Eric Lander [3]. A gene was considered "true" if homologs of it could be identified in the genomes of two other mammals, the mouse and the dog (or if "true" homologous genes were found in the human genome itself). By this criterion, of the 22,218 genes in the Ensembl v35 atlas, about 19,000 were recognized as "real" and 1,177 as "orphan genes". Approximately 1,500 more were classified as retrotransposons or pseudogenes; in addition, obvious annotation errors were identified for some genes.
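The sorting rule described above can be sketched as a small decision function. This is an illustration of the criterion as the article states it, not the authors' actual code: the class `Candidate`, its fields, and the `min_species` parameter are hypothetical, and whether a hit in just one of the two reference genomes is enough is an assumption made here.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A catalog entry with precomputed comparison results (hypothetical)."""
    name: str
    homolog_species: set = field(default_factory=set)  # e.g. {"mouse", "dog"}
    has_valid_human_paralog: bool = False
    is_pseudogene_or_retrotransposon: bool = False


def classify(c, reference_species=("mouse", "dog"), min_species=1):
    """Sort a candidate into one of the three groups named in the article.

    Assumption: a homolog in `min_species` of the reference genomes,
    or a "true" homolog within the human genome, makes a gene "real".
    """
    if c.is_pseudogene_or_retrotransposon:
        return "retrotransposon_or_pseudogene"
    hits = c.homolog_species & set(reference_species)
    if len(hits) >= min_species or c.has_valid_human_paralog:
        return "real"
    return "orphan"


examples = [
    Candidate("geneA", homolog_species={"mouse", "dog"}),
    Candidate("geneB", has_valid_human_paralog=True),
    Candidate("geneC"),
    Candidate("geneD", is_pseudogene_or_retrotransposon=True),
]
for c in examples:
    print(c.name, classify(c))
```

The hard part in practice is deciding what counts as a homolog at all; here that decision is assumed to have been made upstream and is simply passed in as data.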

Of course, the mere fact that the "orphan" genes have no mouse or dog homologs does not mean that they are not genes at all. These genes may have originated during primate evolution and be specific to this group of mammals, or they may have been lost in mice and dogs but preserved in primates. Fortunately, this hypothesis can also be tested: the genomes of the macaque [4] and the chimpanzee are already known. But even here, using a technique capable of accurately detecting the relatedness of genes, the scientists only confirmed the "orphan" label: for the vast majority of these genes, no homologs were found in the genomes of either of these human "relatives".

A similar analysis carried out on all three gene catalogs mentioned above "removed" about 5,000 DNA sequences from the list of protein-coding genes, bringing the total number of "real" genes to 20,470. The researchers admit that this figure will gradually grow (although it is unlikely to exceed 21,000): the minimum ORF length requirement probably filters out the genes of small peptide hormones, and the Y chromosome, the mitochondrial chromosome and the remaining stretches of sequenced DNA not yet assigned to chromosomes were not included in the analysis.

Besides offering a substantially revised list of genes, the study also defines a procedure for adding genes to the catalogs in the future. The scientists propose admitting a new candidate gene that has no homologs in the genomes of other mammals only if its existence can be proven experimentally (for example, by identifying its protein product). In the work described, incidentally, only 12 of the 1,177 "orphans" were "rehabilitated" in this way: for them, a literature search turned up evidence of expression in vivo. In all other cases, the recommendation is not to trust the "impostor".
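The admission rule proposed above is simple enough to write down directly. The sketch below is only a restatement of the rule as described in the article; the function name `admit_to_catalog` and its boolean arguments are hypothetical, and the two counts are the figures quoted in the text.

```python
def admit_to_catalog(has_mammalian_homolog, has_experimental_evidence):
    """Admit a candidate gene if it has homologs in other mammals or,
    failing that, if its existence is confirmed experimentally."""
    return has_mammalian_homolog or has_experimental_evidence


# Figures quoted in the article.
orphans_total = 1177
orphans_rehabilitated = 12

share = orphans_rehabilitated / orphans_total
print(f"Rehabilitated orphans: {share:.2%}")
print(admit_to_catalog(False, False))  # an orphan with no evidence is rejected
```

In other words, roughly one "orphan" in a hundred had published evidence of expression that would let it back into the catalog under this rule.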

"Without the genomes of other primates at hand, we would hardly have been able to hammer the last nail into the coffin of these 'orphan' genes," says Clamp, emphasizing the importance of genome sequencing [5].

The work also produced fairly strong evidence that, since primates diverged from the other branches of mammals, few new genes have appeared in the genome (and few old ones have been lost). In effect, this means that humans differ from mice and dogs not in the number, function and structure of genes as such, but in something much more subtle that lies beyond the properties of the genome accessible to us today. What exactly that is will most likely take more than one generation of biologists to find out.

Literature
[1] Biomolecule, "Human genome: how it was and how it will be";
[2] Biomolecule, "A footboard for the AIDS virus";
[3] Clamp M., Fry B., Kamal M., Xie X., Cuff J., Lin M.F., Kellis M., Lindblad-Toh K., Lander E.S. (2007). Distinguishing protein-coding and noncoding genes in the human genome. Proc. Natl. Acad. Sci. U.S.A. 104, 19428-19433;
[4] Biomolecule, "The time of monkey research: the Rhesus macaque genome has been decoded";
[5] ScienceDaily, "Human Gene Count Tumbles Again".

Author: Anton Chugunov, "Biomolecule"

Portal "Eternal youth" www.vechnayamolodost.ru
