19 November 2009

Bioinformatics is not a science

What can bioinformatics do

M. S. Gelfand,

Doctor of Biological Sciences, Candidate of Physical and Mathematical Sciences, Institute of Information Transmission Problems of the Russian Academy of Sciences
"Chemistry and Life" No. 9, 2009
The article is published on the website "Elements of Science".

Everyone knows that bioinformatics has something to do with computers, DNA and proteins, and that it is the cutting edge of science.

Not everyone, even among biologists, can boast of more detailed knowledge. Mikhail Sergeevich Gelfand told Chemistry and Life about some of the problems that modern bioinformatics solves (the interview was recorded by Elena Kleshchenko).

Information in biology

In recent decades, many new scientific disciplines with fashionable names have appeared: bioinformatics, genomics, proteomics, systems biology and others.

But in fact, bioinformatics, like, say, proteomics, is not a science but a set of convenient technologies and the specific tasks that are solved with them. One could say that everyone who measures protein concentrations by mass spectrometry or studies protein-protein interactions works in the field of proteomics. But it is possible that over time this division will become less important: the technology used will matter less than the way of thinking, of asking questions. And in this sense bioinformatics, as the oldest of these sciences – it is all of 25 years old – plays a cementing role, because no matter how the data are obtained, they all end up in a computer. It cannot be otherwise: a bacterial genome is millions of nucleotides long, the genome of a higher animal hundreds of millions or billions. Transcriptomics, which studies the activity of genes, produces data on the concentrations of tens of thousands of messenger RNAs; proteomics, on hundreds of thousands of peptides and protein-protein interactions. You cannot work with that much information by hand. We still remember how we printed nucleotide sequences on paper, cut out the printed lines, shifted them against each other and performed alignment in this artisanal way, looking for similar regions. That was possible when tens or hundreds of nucleotides or amino acids were involved, but with the current volume of data special tools are needed. Bioinformatics provides a set of such tools – in practical terms it is an applied science that serves the interests of biologists.
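
What that "artisanal" procedure amounts to in code can be shown with a minimal sketch: global alignment of two short sequences by dynamic programming (the Needleman–Wunsch algorithm, which underlies many of the standard tools). The scoring values and sequences here are illustrative assumptions, not taken from the article.

    # A toy global aligner (Needleman-Wunsch); scores are illustrative assumptions.
    def align(a, b, match=1, mismatch=-1, gap=-2):
        n, m = len(a), len(b)
        # score[i][j] = best score for aligning a[:i] against b[:j]
        score = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            score[i][0] = i * gap
        for j in range(1, m + 1):
            score[0][j] = j * gap
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                diag = score[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
                score[i][j] = max(diag, score[i-1][j] + gap, score[i][j-1] + gap)
        return score[n][m]

    print(align("GATTACA", "GATCACA"))  # similar sequences get a high score (here: 5)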

Since my own work mainly concerns the analysis of genomic data, we will focus mostly on genomics. Even before the latest generation of sequencers appeared, data volumes had begun to outpace Moore's law: the nucleotide sequences of genomes accumulated faster than the power of computers grew. It would not be a big exaggeration to say that in recent years biology has begun to turn into a "data-rich" science. Relatively speaking, in "classical" molecular biology one experiment established one biological fact: the amino acid sequence of a protein, its function, the way the corresponding gene is regulated. Now such facts are obtained on an industrial scale. Molecular biology is moving along the path already taken by astrophysics and high-energy physics. When there is a constantly working radio telescope or accelerator, the problem of acquiring data is solved, and the problems of storing and processing it come to the fore.

The same thing is happening in biology, and very quickly, and it is not always easy to adapt; those who manage it, however, come out the winners. At our seminar, one biologist told how he and his colleagues studied a certain protein using traditional methods of experimental biology. This is a difficult task: knowing that a certain function is performed in the cell, find the protein responsible for it. They found this protein, studied it, and became convinced that there must be another protein with similar properties, since the presence of the first one did not explain all the observed facts. Finding the second protein against the background of the first was even harder, but they coped with that too. And then the human genome was published – and, having gained access to its sequence, they found a dozen more such proteins...

It does not follow from this example that practical molecular biology has exhausted itself. Rather, it has learned to use new tools: to interpret not only the bands in a gel after electrophoresis, the concentrations of mRNAs and proteins, or, say, the growth rate of bacteria, but also the huge arrays of data stored in computers. Note that an element of interpretation is inevitably present in classical biology as well. When a researcher claims that protein A triggers the transcription of gene B, he does not directly observe the protein interacting with the regulatory region of the gene, but draws this conclusion from the position of bands on a gel and other experimental data. In bioinformatics the situation is essentially the same, only taken to the extreme: the ready-made data sit in a computer, and among them you need to find the puzzle pieces from which a picture can be assembled.

Primary data processing belongs to the field of technical bioinformatics. The sequencer does not "read" DNA molecules by itself; its output is fluorescence curves whose peaks still need to be turned into a nucleotide sequence. This problem has to be solved anew for each new sequencing device, and bioinformatics solves it. In addition, as already mentioned, the data obtained must be stored somewhere, convenient access to it must be provided, and so on. These are purely technical problems, but they are very important.

A more complex and interesting occupation of bioinformatics is deriving specific statements from genomic data: protein A has such-and-such a function, gene B is activated under such-and-such conditions, genes C, D and E are expressed together and their products form a complex. This is what we do, and this is the practical application of our science. Our users are other biologists, to whom we report facts of interest to them.

Location and regulation

How can we draw conclusions about the functions of proteins and genes from a sequence of nucleotides?

The first consideration seems banal: if a protein is similar to some other one that has already been studied, then it is very likely that it does about the same thing. In fact, it is not so banal: the first serious success in this direction of bioinformatics was the assertion that viral oncogenes are "corrupted" genes of the organism itself.

Making such a comparison is not difficult now. There are data banks of nucleotide and protein sequences (they were described in more detail in Chemistry and Life, 2001, No. 2). A general idea of how this should be organized emerged in the late 1980s, and in this sense bioinformatics was ready for the flood of genomic data. Today this is a standard Internet service: you paste your sequence into a window, press a button, and a few seconds later you are told which sequences from the database it resembles.
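
For illustration, such a search can also be run from a script rather than a web page; a minimal sketch using Biopython's interface to NCBI's online BLAST service (assuming Biopython is installed and the machine is online; the query sequence is made up):

    from Bio.Blast import NCBIWWW, NCBIXML

    # Send a made-up nucleotide query to NCBI's online BLAST, against the nt database.
    query = "AGCTGATCGATCGTACGATCGATCGATCGGCTAGCTAGCTAGCTAGCTAGCATCG"
    handle = NCBIWWW.qblast("blastn", "nt", query)

    # Parse the XML reply and print the best-matching database sequences.
    record = NCBIXML.read(handle)
    for alignment in record.alignments[:5]:
        best_hsp = alignment.hsps[0]
        print(alignment.title, "E-value:", best_hsp.expect)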

Then more subtle considerations come in. It is known, for example, that in bacteria genes are often organized into operons, that is, they are transcribed as a single messenger RNA. There are various evolutionary theories explaining why functionally related genes came to form operons. The first is that it is convenient and useful, and therefore supported by evolution. If proteins share a common function, for example, they are responsible for different steps in the processing of one substance, it is logical for them to appear in the cell at the same time, in response to the same signal (with a shared mRNA, regulation is naturally the same for all) and in equal quantities. The second explanation is less trivial and more beautiful: genes whose products have related functions benefit from being close together because of horizontal transfer. This is a very significant mechanism of bacterial evolution: stretches of the genome of one bacterium end up in another, which can thereby acquire new useful traits. Clearly, if only one gene of a metabolic pathway moves into the new genome, the corresponding protein will be useless: there is no substrate for the reaction it catalyzes, and its product, in turn, has nothing to process it further. An additional confirmation of this theory is that bacteria have genomic loci in which genes of the same metabolic pathway lie on different DNA strands and are therefore transcribed in different directions; such genes cannot share a common mRNA, so here the increased probability of joint transfer evidently plays the main role.

The fact that two genes lie side by side in one genome does not say much about their functional relationship; it may be an accident. However, we are able to identify the same gene in different organisms. The sequences, of course, do not match nucleotide for nucleotide and may differ quite significantly, but there are certain rules that allow us to assert that this is the same gene in, say, E. coli and the hay bacillus Bacillus subtilis. So if a pair of genes lies side by side not in one genome but in fifty, and in representatives of different taxonomic groups (that is, the arrangement is not simply inherited from a common ancestor), it means they really do gravitate toward each other. If evolution had not supported their proximity, it would not have been preserved, and so we can assume that they are functionally related.
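
This reasoning is easy to mechanize; a minimal sketch, on made-up toy data, that counts in how many genomes two genes are adjacent (real inputs would be orthology-mapped gene orders taken from genome annotations):

    # Gene orders per genome, written as lists of orthologous-group IDs
    # (hypothetical toy data; the gene and genome names are invented).
    genomes = {
        "E_coli":       ["trpA", "trpB", "aroK", "fliC"],
        "B_subtilis":   ["fliC", "trpA", "trpB", "aroK"],
        "H_pylori":     ["aroK", "trpB", "trpA"],
        "M_genitalium": ["trpA", "fliC", "trpB"],
    }

    def adjacent_in(order, g1, g2):
        """True if genes g1 and g2 sit next to each other in this gene order."""
        pairs = zip(order, order[1:])
        return any({a, b} == {g1, g2} for a, b in pairs)

    count = sum(adjacent_in(order, "trpA", "trpB") for order in genomes.values())
    print(f"trpA and trpB are neighbors in {count} of {len(genomes)} genomes")
    # If the pair stays adjacent across distant taxa, a functional link is likely.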

The second consideration is similar to the first. Not all bacteria have the same set of genes: for example, if a gene encodes an enzyme needed to process some carbohydrate, a bacterium that does not eat this carbohydrate will not have it. But a bacterium that does feed on this carbohydrate will have the whole necessary set: the enzymes and also a transporter protein that carries the carbohydrate into the cell. Functionally related genes are present in a genome on an "all or nothing" principle: as already mentioned, it makes no sense to have only a fragment of a metabolic pathway, and bacteria are economical creatures – whatever brings no benefit quickly disappears from their genome. Therefore, if we make a table with genes in the rows and genomes in the columns, and mark with pluses and minuses whether each gene is present or absent in each genome, we will see groups of genes serving the same function. And an unknown gene with the same pattern of pluses and minuses as some group can most likely be assigned to that group.
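
Such a table of pluses and minuses is known as a phylogenetic profile; a minimal sketch, with made-up presence/absence data, of how an unknown gene gets assigned to the group whose profile it matches:

    # Presence (1) / absence (0) of genes across genomes -- a phylogenetic profile.
    # Toy data: rows are genes, columns are five hypothetical genomes.
    profiles = {
        "sugar_kinase":      (1, 0, 1, 1, 0),
        "sugar_transporter": (1, 0, 1, 1, 0),
        "unknown_gene":      (1, 0, 1, 1, 0),
        "dna_polymerase":    (1, 1, 1, 1, 1),
    }

    def profile_distance(p, q):
        """Hamming distance: in how many genomes do the two patterns disagree?"""
        return sum(a != b for a, b in zip(p, q))

    query = profiles["unknown_gene"]
    for gene, prof in profiles.items():
        if gene != "unknown_gene":
            print(gene, profile_distance(query, prof))
    # Distance 0 to the sugar genes suggests the unknown gene belongs
    # to the same functional subsystem.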

The third consideration concerns the regulation of gene activity. Next to a gene there are usually sites with which certain proteins interact: they can start transcription, block it, or control its intensity – in other words, the activity of the gene at any given moment depends on them. Some regulatory regions are very well identified by characteristic sequences of "letters", but this is rare. For example, we recognize transcription factor binding sites in genomes with low accuracy, and together with the correct sites we collect a lot of "garbage" – similar short stretches that in reality have nothing to do with gene regulation. But since genes that work together are regulated together, the real binding sites appear in front of the same genes in a dozen genomes, while the random ones are scattered here and there with no pattern in their location. The result is a powerful filter for weeding out the "garbage". And if a familiar site is consistently found in front of a gene of unknown function, it becomes clear that this gene belongs to the functional subsystem controlled by the same regulator and serving the same function.
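
A minimal sketch of this two-step logic: find crude motif matches in upstream regions, then keep only sites that recur in front of the same gene across genomes (the motif and all sequences here are invented for illustration):

    # Step 1: find crude motif matches; step 2: keep only those that recur
    # in front of the same (orthologous) gene in several genomes.
    MOTIF = "TTGACA"

    def is_match(site, motif, max_mismatch=1):
        return sum(a != b for a, b in zip(site, motif)) <= max_mismatch

    def scan(seq, motif=MOTIF):
        """Positions where the motif matches with at most one mismatch."""
        w = len(motif)
        return [i for i in range(len(seq) - w + 1) if is_match(seq[i:i+w], motif)]

    # Upstream regions of the same gene in three genomes (toy data).
    upstream = {
        "genome1": "ACGTTTGACAGGCT",
        "genome2": "GGTTGACATTACGA",
        "genome3": "CCCTTGACTAAGTT",
    }

    hits = {g: scan(seq) for g, seq in upstream.items()}
    print(hits)
    if all(hits.values()):
        print("Site found upstream in every genome: likely a real binding site.")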

For me the most interesting thing is to study the evolution of regulatory systems, but a by-product of this work is a multitude of functional predictions. The research unfolds like a detective story: each clue on its own is very small, but when there are many of them and they all point to the same conclusion, you can make confident statements. There was a case when we described a regulatory system in detail – the transcription factor, its binding sites, the fact that it would be a repressor rather than an activator, that binding would require the cooperative interaction of two dimers – just by looking at the letters of the genome. Subsequently all of this, down to the details, turned out to be correct.

The ribosome as a zinc depot

A central role in one of these studies was played by Ekaterina Panina, at the time a student of the Faculty of Mechanics and Mathematics (Mekhmat) of Moscow State University; she later entered graduate school at the University of California, Los Angeles, and became a real experimental biologist.

She came to us in her third year and said she wanted to do this kind of biology. By the time she graduated from the Mekhmat, she had published several papers in serious journals.

Fig. 1. When zinc is in excess (a), bacterial ribosomes store it, and when it is scarce (b), they give it up to proteins. If there are plenty of zinc ions, both the ribosomal proteins and the zinc-dependent enzymes have enough; in this case the synthesis of a ribosomal protein that does not contain zinc is switched off (the rectangle is the zinc repressor, the blue arrow is the protein's gene). When zinc is scarce, this protein is synthesized and replaces the zinc-containing proteins in ribosomes, which then give up their zinc to the enzymes. Image: "Chemistry and Life"

The bacterial cell needs zinc ions: for example, they are part of some enzymes as cofactors. Accordingly, there is molecular machinery serving all the processes related to zinc. We studied the zinc repressor (zinc in large quantities is toxic to the cell, so switching off its transport at sufficient concentrations is no less important than being able to extract it from the environment) using the approach described in the previous section. If there is a potential zinc repressor site in front of a gene, that gene may be related to zinc metabolism. This is how we once "computed" the zinc transporter – a transmembrane protein that brings zinc into the cell.

So, in 2002 Katya noticed that potential zinc repressor sites for some reason often turned up in front of ribosomal protein genes. She shared this observation with her supervisor, and I said that since there are more than a hundred ribosomal protein genes in the genome, and the sites occurred in front of different genes, this was a coincidence. But Katya did not believe in coincidence and found a recently published article by Eugene Koonin (on his model of the origin of the cell, see the article by M. A. Shkrob in the August issue). It showed that some ribosomal proteins contain a zinc-binding motif – the so-called zinc ribbon: three or four cysteines at the right distances from each other and in the right context. An important observation of Koonin and colleagues was that the same protein carries these zinc motifs in some organisms but not in others, where it apparently functions normally without zinc. And some bacteria have the same protein in two variants, with and without the zinc ribbon.

And Katya noticed that in the latter case, when there are two variants of the protein in one genome, the one without the zinc ribbon is repressed by the zinc repressor. In other words, in the presence of zinc the cell expresses the protein variant that needs zinc, and in its absence, the one that does not.

The basis of any cell's existence is heavy industry, the production of the means of production, just as we were taught in lectures on the political economy of socialism. About 70% of the cell's protein belongs to ribosomes, the organelles needed to make other proteins. On the other hand, zinc is a cofactor of enzymes vital to the cell, such as DNA polymerase. If zinc becomes scarce, the ribosomal proteins take it all, the enzymes are left with nothing, and the cell dies. But the cell has a backup copy of the ribosomal protein, one that does not need zinc. We proposed that under zinc deficiency the cell switches on the synthesis of such proteins, and they are built into some of the ribosomes in place of the zinc-containing ones. This releases a certain amount of zinc. Maybe the ribosomes then work somewhat less efficiently, maybe they do not work at all – but for zinc to suffice for the vital enzymes, which are present in far fewer copies, it is worth sacrificing a small fraction of the ribosomes.

We wrote an article, but for a year no reputable journal would accept for publication the crazy theory of ribosomes as a zinc depot. But Katya's find seemed very beautiful to me, and for the only time in my life I took advantage of the fact that my grandfather, as a member of the US National Academy of Sciences, had the right to submit articles for publication in the Proceedings of the National Academy of Sciences of the USA. He sent the article to Koonin for review (and, it seems, to someone else), and Koonin gave a positive report. The article was published in PNAS and, as it soon turned out, very timely: six months later an article appeared by Japanese biologists who had shown the same thing experimentally. One can guess that they had been working on it for a long time, and they were probably somewhat offended that a computational prediction anticipated their results.

Note that this whole story rests on very small individual observations (the protein has cysteines – it has no cysteines; there is a potential repressor site – there is no site...). But together these little things allowed us to draw a non-trivial conclusion that turned out to be absolutely correct. In general, when we publish articles, we try to state as clearly as possible which of our predictions we consider reliable and which may be wrong. So far, among those we were sure of, not one has proved wrong (dozens have already been tested), while among the weaker ones there have indeed been misses, though also not often.

A screwdriver with a removable bit

No less beautiful was the work on transporter proteins (I took part in it only at the early stages, so I have every right to praise it without being a braggart).

Transporters are a gold mine for bioinformatics, because identifying a transporter, especially a bacterial one, is fairly easy. They have several hydrophobic helices passing through the membrane; between them is a channel through which an ion or molecule needed for the cell's life gets inside. Transmembrane segments can be found in a protein sequence using special programs. And if an unknown bacterial protein has five or six such segments, it is almost certainly a transporter (because the other transmembrane proteins, such as participants in the respiratory chain or rhodopsin, are well known). It remains to establish what substance it carries.
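
Such programs essentially slide a window along the protein and average amino-acid hydrophobicity; a minimal sketch using the standard Kyte–Doolittle scale (the window size and threshold are commonly used values, and the example sequence is made up):

    # Kyte-Doolittle hydropathy values for the 20 amino acids.
    KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
          "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
          "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
          "Y": -1.3, "V": 4.2}

    def hydropathy_peaks(protein, window=19, threshold=1.6):
        """Window start positions whose mean hydropathy suggests a membrane helix."""
        scores = [sum(KD[aa] for aa in protein[i:i + window]) / window
                  for i in range(len(protein) - window + 1)]
        return [i for i, s in enumerate(scores) if s > threshold]

    # A made-up sequence: a hydrophobic run flanked by hydrophilic stretches.
    protein = "MKKRDDSTN" + "LLIVALVFAGLLIVALVFA" + "DDKRQNSTE"
    print(hydropathy_peaks(protein))
    # Five or six well-separated peaks in a bacterial protein would point
    # to a transporter.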

Fig. 2. This is what the transport systems predicted in silico that carry cobalt (Cbi) and nickel (Nik) into the cell look like. On the left is the arrangement of the proteins in the membrane (inner side below); on the right, the arrangement of their genes in the locus. Homologous proteins are indicated by the same letters and colors: O (the ATPase) and Q (a transmembrane protein) are universal components, while CbiM, CbiN, NikN, NikL and NikK are additional ones and may differ. Everyone was surprised by the bioinformatic conjecture that the basic CbiM–CbiN module remains active even without the ATPase (after a drawing by D. Rodionov from an article in the Journal of Bacteriology, 2006, vol. 188, no. 1, pp. 317–327). Image: "Chemistry and Life"

Studying the specificity of transporters experimentally is no great pleasure. With enzymes it is much easier – almost a routine task that can be entrusted to a robot: you overexpress the enzyme (that is, force it to be synthesized in large quantities), then offer it five hundred different substrates and see with which of them the reaction goes.

A transporter, of course, can also be overexpressed. But for it to work, it must immediately integrate into a membrane; otherwise the hydrophobic segments stick to one another and the protein forms non-functional aggregates. So one has to make many membrane vesicles, embed the proteins in them in the correct orientation, and then see whether the desired substance gets inside the vesicles. Besides, transporters differ. Some pump useful substances into the cell against the concentration gradient and expend the energy of ATP, which is split by a special protein – an ATPase. Others carry out secondary transport: while letting in the "right" molecule, they simultaneously let a hydrogen, potassium or sodium ion pass along its concentration gradient. If the transporter is ATP-dependent, then for it to work one must assemble a structure of several proteins, including the ATPase. And if it is a secondary transporter, one still has to guess which ion's concentration should be raised inside the vesicle. Hence it is clear that the biochemistry of transporters is a science for the strong-minded, and there is little experimental data on them.

On the other hand, determining the specificity of transporters by bioinformatic methods is much easier. It is enough to apply the already familiar logic: if, for example, the synthesis of the protein is regulated by a zinc repressor, it is most likely a zinc transporter, and if its gene lies in the same locus as the ribose catabolism genes, it obviously carries ribose into the cell... That is how we once found the riboflavin transporter: there is a protein of unknown function, it has six potential transmembrane segments, and it is regulated together with the genes of the riboflavin pathway – so it is a transporter of either riboflavin or its precursor. But since some genomes contained this transporter and riboflavin-dependent proteins while lacking any way to synthesize riboflavin from precursors, it could only be a riboflavin transporter.

Testing a specific prediction experimentally is much easier than starting from scratch. I always explain to students that a bioinformatician is a completely defenseless being, like the character in an adventure novel who knows where the treasure is buried. While he keeps silent, everyone looks after him and takes care of him, but once he has spoken, he is no longer needed. As soon as the bioinformatician has said "this protein has such-and-such a function", it depends solely on the decency of the experimenters whether they will include him as a co-author after they test this statement. And the statements, as the reader has already seen, are extremely simple and specific; it is enough to say them out loud once.

With the same simple conclusions began a story that was more complicated but also more interesting. We were studying the regulation of the biotin biosynthesis pathway (biotin is vitamin H, or B7, a cofactor of many important enzymes). The biotin transporter was unknown at the time. In the course of this work we discovered a transport protein that is regulated together with the genes of the biotin pathway and sometimes co-localized with them. Then everything went as with riboflavin: there are organisms that lack the biotin pathway but have proteins that depend on biotin as a cofactor, and the same potential transporter is there – therefore, it is a biotin transporter.

As already mentioned, transporters either are ATP-dependent or carry out secondary transport. The biotin transporter stood alone – no ATPase gene was visible nearby – which meant it was a secondary transporter. But then we saw that in some genomes certain ATPases do turn up next to the biotin transporter. What this meant was unclear at that stage, so we just mentioned it in the article in a single phrase.

Around the same time we were studying the regulation of the cobalamin pathway. Cobalamin, or vitamin B12, is also a cofactor of important enzymes – a very large molecule with a metabolic pathway of corresponding complexity. What matters for this story is that at the center of the cobalamin molecule there is a cobalt ion, which again is brought into the cell by transporters. We found many such transporters and published an article about them – and soon received a letter from Thomas Eitinger of the Institute of Microbiology at Humboldt University (Berlin). He urged us to take into account that any cobalt transporter can also transport nickel, and vice versa, because their specificity is weak. We replied that we consider transporters from the point of view of their functional role in the cell: if a protein's gene is in one operon with a large set of cobalamin synthesis genes, then of course the cell needs the protein as a cobalt transporter, even if in vitro it can be made to carry nickel. And if we see a transporter gene in one operon with a nickel-dependent urease, then it is certainly a nickel transporter.

Fig. 3. There is an extensive group of bacterial transporters that contain a universal ATPase component, common to all, which supplies energy for transport (red; it corresponds to CbiO in Fig. 2), a common transmembrane protein (blue, CbiQ), and an additional protein that provides specificity – determines the kind of substance being carried (like CbiMN). The additional component can also work as an independent transporter (after a drawing by D. Rodionov). Image: "Chemistry and Life"

Prospects for joint work emerged, and Dmitry Rodionov, who was doing this work, applied together with his German colleagues for a small joint grant and went to Berlin for three months. (Dmitry graduated from MEPhI, after which he took up genomics with us; he then worked in the USA, and now he has won a grant from the "Molecular and Cellular Biology" academic program to create a new group and is returning to Moscow.)

By that time we had started (by e-mail) a large project on the comparative genomics of nickel and cobalt transporters, in which we classified them, first, by regulation and, second, by localization together with cobalt- or nickel-dependent functional proteins. In one of these nickel-cobalt families certain oddities were observed. On the one hand, the ATPases and the transmembrane proteins forming the channel for the ion were, as expected, located side by side and regulated together. On the other hand, the same operon could contain yet another transmembrane protein. Moreover, these "extraneous" proteins differed quite strongly between the cobalt and nickel transport systems and were not homologous, unlike the ATPases and the shared transmembrane proteins. And in addition, the ATPase and the transmembrane protein turned out to be homologous to those "extra" biotin-system proteins that were sometimes present and sometimes absent in the previous study.

I still do not know how Dima talked his German colleagues into the next crazy experiment. He suggested to "classical" biochemists who had spent their whole lives studying cobalt and nickel transport in bacteria: let us switch off the ATPase and the transmembrane protein homologous to the biotin-system ones, leaving only the single unique component. After all, the biotin transporter manages without its ATPase and "main" transmembrane protein – sometimes they are there, sometimes they are not – so maybe the nickel-cobalt transporter does not really need them either, and the lone non-homologous transmembrane protein will cope on its own? Why respectable German biochemists agreed to this strange act – depriving a seemingly ordinary ATP-dependent transporter of its ATPase and seeing what happens – remains unknown. In any case, Dima was right: the lone transmembrane protein worked as a cobalt transporter – less efficiently, but it worked. This was the first example of a dual system that works as ATP-dependent when an ATPase is present and as ion-dependent when it is not.

Later the Berlin colleagues did the same with biotin: they took a bacterium whose biotin transporter has an ATPase and a transmembrane protein, switched off their genes – and showed that this protein alone also works as a biotin transporter, though with less power than in the presence of the ATPase.

By that time Dmitry Rodionov was already working as a postdoc in the laboratory of Andrei Osterman at the Burnham Institute for Medical Research in La Jolla. Osterman is a wonderful person, a biochemist who understood the power of bioinformatic methods, learned to use them and found many new enzymes with their help. And when Dmitry entered this circle of biochemists and began to talk with them, it turned out that there are several dozen similar transporters carrying different substrates – cofactors, amino acids, ions. (Incidentally, the riboflavin transporter turned out to be one of them.) Different research groups had studied these transporters independently, having no idea that they belong to the same family.

It also became clear how such an organization is possible. Cobalt and nickel transporters do not occur separately from their ATPase (unless it is removed experimentally). But there is another class of bacterial transporters that use one and the same ATPase – like a screwdriver with a removable bit. In this case the universal ATPase and transmembrane protein can be encoded together with ribosomal proteins, that is, they are expressed constantly and in large quantities, while the proteins that give the transporters their specificity are scattered here and there in the corresponding operons. And in the absence of the ATPase such a protein somehow manages to work as a secondary transporter, which is why in the genomes of some organisms we see only it.

Bioinformatics and the theory of evolution

These "applied" discoveries are very important and useful, but for us bioinformaticians they are not the main thing.

The main thing that the industrial revolution in biology has brought us is the opportunity to discuss evolution at a different level. Even banal statements, say, about the percent similarity of the human and chimpanzee genomes are not as trivial as they may seem. Molecular evolution is instructive in that it strikingly confirms Darwinian ideas about the nature of things.

The data obtained by molecular biologists now seriously influence taxonomy – the classification of plants and animals. At first, botanists and zoologists were skeptical about molecular genealogical trees, which show the degree of kinship between species based on comparison of nucleotide sequences; and it must be admitted that the first molecular trees were not very successful. Now convergence is happening before our eyes: classical and molecular taxonomies are moving toward each other. It is already clear that molecular trees, if built according to certain rules, are close enough to reality and may well become grounds for revising orthodox taxonomic ideas based on morphology – the comparison of organisms' external features. And, oddly enough, it turns out that species seemingly forced together on the basis of gene similarity do in fact share common traits. A good molecular tree, it turns out, does not contradict the morphological construction; it is simply that other traits prove to be the leading ones.

As for bacteria, in the era of classical biology they were classified by cell shape and metabolic properties: which sugars they can utilize, which amino acids and cofactors they can synthesize themselves and which they must get from the environment, and so on. This taxonomy was very weak, since bacteria have very few morphological and functional traits compared with higher organisms. Today the taxonomy of bacteria seems to be based entirely on molecular data, and species names are being revised en masse. But the most impressive achievement in this field was, of course, the work of Carl Woese, who in 1977, on the basis of molecular taxonomy, postulated the existence of archaebacteria (now called archaea) – the third domain of life, distinct from eukaryotes and "true" bacteria.

It cannot be said that all the problems of bacterial systematics have now been solved. To a large extent, the very notion of what a bacterial species is has been destroyed. It was found, for example, that two strains of E. coli – representatives of the same species – may differ in up to a third of their genes, that is, genes present in one strain and absent in the other. Much that is unexpected and interesting is already known about bacterial evolution. In particular, it turned out that horizontal transfer – the exchange of genetic material – can occur between taxonomically distant creatures. For example, Methanosarcina is a typical archaeon, yet a third of its genes are of bacterial origin, and these genes serve almost all of its metabolism, while the mechanisms of transcription, translation and replication and the membrane structure in Methanosarcina are characteristic of archaea. This example shows how exciting it is now to study the evolution of bacteria.

In my opinion, the most interesting thing is the evolution of regulatory systems. We know quite a lot about these systems in bacteria and can picture how regulatory systems change: how a local regulator suddenly begins to control dozens of genes or changes its specificity, how regulatory cascades are rebuilt. And this can be very important from a fundamental point of view, because here one can go much further. The difference between a human and a chimpanzee, or even a mouse, is hardly a matter of the gene set: in mammals it is practically the same, if compared by the set of functions. The reason lies rather in regulation: which genes are active, when, and in which tissues.

Most likely, the "leaps" of evolution – any sharp changes in morphological traits – are produced precisely at the level of regulation. We already know such examples in bacteria, yeast and other relatively simple organisms. Most bacteria have a single iron repressor that responds to the presence of iron ions and regulates many genes: proteins that ensure the storage and transport of iron, iron-dependent enzymes. But other bacteria have three different repressors that have divided these functional groups among themselves: one regulates iron storage, another transport and synthesis, and the third the enzymes. This is in fact a radical change: where there was a single response to iron, there are now three different ones.

There are wonderful experimental studies performed on multicellular organisms. Why is the sea urchin the only echinoderm with a solid skeleton? The answer was suggested by Eric Davidson of the California Institute of Technology. He studied the regulatory cascade responsible for the development of this skeleton and then found the same cascade in the starfish – only there it switches on much later, so only the bases of the spines develop, not connected to one another. In the sea urchin the same cascade switches on a certain number of cell divisions earlier, accordingly captures a larger number of cells, and a solid skeleton develops. Thus a purely regulatory change produces a completely new trait.

I hope that comparative analysis of regulation will provide answers to the question that worries paleontologists and morphologists at the current stage of development of the synthetic theory of evolution: how does the accumulation of small changes produce radically new traits? It seems this can be explained by the rewiring of regulation. We already know how to do such analysis on simple organisms, and sooner or later the turn of more complex ones will come. And when that happens, there will be a third big breakthrough in this direction – if Darwinian natural selection is counted as the first, and the union of evolutionary biology with genetics as the second.

Portal "Eternal youth" http://vechnayamolodost.ru19.11.2009
