26 February 2008

Sons of Adam: Genes, history and geography

Genography: a winding path around the world

Until recently, the question of where entire tribes and peoples came from was answered with data from history, archaeology, linguistics and other none-too-precise sciences. Anyone could trace their own pedigree through registry-office archives, revision lists (the census records of the Russian Empire), parish registers and the like, for three or five, at best 10–15 generations. The exceptions were hereditary aristocrats, including those who trace their lineage to Rurik, who in 862 was invited to "come and reign and rule over us", or to Ali (602–661), cousin and son-in-law of the Prophet Muhammad. But "when Adam delved and Eve span, who was then the gentleman?"

Now any of us can trace our ancestry back to the progenitors of mankind: not the biblical ones, but the molecular-genetic "mitochondrial Eve" and "Y-chromosomal Adam". The branches of the family tree of every one of the 6.6 billion people now living on Earth converge on these two, who lived in South-East Africa (by the most widely accepted, if conditional and approximate, estimates, 150–170 and 60–80 thousand years ago, respectively). Over the millennia since, our DNA has accumulated, alongside the record of kinship with them, notes on the routes by which their descendants scattered around the world, like visas in a passport.

The evolution of humans and their relatives, including our Neanderthal cousins (it seems that if we did have children together, their descendants died out without leaving any trace in the genome of modern humans), is an interesting topic, but a separate one. Let's start right at the "end of the beginning": about 200,000 years ago a near-modern subspecies of Homo sapiens took shape in Africa, and about 50,000 years ago the tribes of our ancestors began to settle outside Africa.

DNA analysis can answer many questions. By what routes, over millennia of great and inconspicuous migrations, conquests, unions and splits, did each people reach its present home? How have races and tribes mixed over that time? How have your own ancestors been carried around the world since they left Africa, the cradle of peoples, tens of thousands of years ago? From which groups of those migrants do you descend, separately on the paternal and maternal lines? How many generations ago did the closest ancestor live from whom both you and some reigning monarch descend in a direct line? Do you have a close relative living on another continent, or on the next street?

Ethnogenomics began back in the 1980s, when mainframe computers were living out their last days and DNA analysis methods were several orders of magnitude slower and more expensive than today's. Now the simplest but reasonably complete individual molecular-genealogy test, on 12 Y-chromosome markers, costs $100–150, and a more than sufficient one, on 37 Y-DNA markers plus a complete mitochondrial DNA test, about $400. A number of commercial firms and public institutions hold databases of tens of thousands of DNA samples, and most of them (with certain restrictions on access to personal information) are open. Because people are so interested in their pedigrees, data accumulate in these databases faster than scientists can process them, especially since researchers still have their own planned work to do. The most ambitious of these studies is the Genographic Project, launched in 2005. The main requirements for participants are belonging to a people or tribe whose history is known from ethnography and linguistics, knowing at least four generations of ancestors who lived in the same region, and being male. (This is not sexism: a single sample from a man, a scraping from the inside of the cheek, yields both Y-chromosome and mitochondrial DNA.)

The project's goal is to collect at least 100,000 DNA samples within five years in order to clarify the routes of human migration across the Earth. Such a huge sample is in fact a drop in the ocean compared with the real diversity of races and tribes, but the results can be refined as data accumulate. Even the draft picture drawn up by project participants under the guidance of Dr. Spencer Wells is an exciting sight, especially as an interactive map on the National Geographic website, with detailed commentary and a wealth of additional information. But to understand what is written there (in English), you will have to get to grips with the terminology.

A crash course in genetics

The nucleus of every human cell except eggs and sperm contains a diploid (double) set of 46 chromosomes: 22 pairs of autosomes (non-sex chromosomes, sometimes loosely called "somatic") and one pair of sex chromosomes, XX in women and XY in men. Germ cells carry a haploid, or single, set (Greek terms are unavoidable here: chromosomal genealogy rests on identifying individual haplotypes that belong to different haplogroups).

When germ cells form from diploid precursor cells, their autosomes (and, in women, the X chromosomes too) exchange segments. It is rather like shuffling two decks of cards with different-colored backs, not too thoroughly, and then dealing them back into two complete sets without regard to the color of the backs. As a result, we get on average a quarter of our genome from each of our two grandfathers and two grandmothers, an eighth from each great-grandparent, and so on.
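The halving described above can be written out directly. This is a minimal Python sketch for the autosomes only; the function names are illustrative, not from any genealogy tool. At n generations back there are 2^n ancestor slots, each contributing on average 1/2^n of your autosomal DNA. These are expected values only: because segments are exchanged in blocks, the actual share from any one ancestor scatters around the average.

```python
def ancestors(generations_back: int) -> int:
    """Number of ancestor slots n generations back (1 = parents, 2 = grandparents, ...)."""
    return 2 ** generations_back

def expected_share(generations_back: int) -> float:
    """Average fraction of autosomal DNA contributed by any one of those ancestors."""
    return 0.5 ** generations_back

for n in range(1, 6):
    print(f"{n} generation(s) back: {ancestors(n)} ancestors, about {expected_share(n):.2%} each")
```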

Our chromosomes carry genes not only from Adam and Eve but also from all their close and distant relatives who lived 70,000–80,000 years ago, when our species' numbers fell to a critical low of about 10,000 individuals, and from still more distant ancestors, back to the first mammals and even the first multicellular animals. But from them we received only autosomes and X chromosomes, whose genes have spread throughout the population through constant mixing. The Y chromosome and mitochondrial DNA, by contrast, pass almost unchanged from generation to generation. That "almost" is the basis of molecular genealogy, which studies the history of mutations that arose in ancestors and were preserved in the DNA of their descendants.

Large mutations, such as the relocation, duplication or, conversely, loss of a large chromosome segment carrying one or more genes, usually lead to nothing good. Often the same is true of the far more common single-nucleotide changes, single nucleotide polymorphisms or SNPs (pronounced "snips"), when they occur within one of the roughly 21,000 human genes. (The abbreviations SNP, and STR and DYS, which come up later, are worth memorizing: molecular genealogy is helpless without them.)

Beneficial mutations are much rarer, and they persist in later generations. Harmful ones are removed from the population together with their carriers, either at the embryo stage or, in the case of a severe hereditary disease, before the carrier has a chance to have offspring. Thanks to stabilizing selection and to occasional crashes in our species' numbers that reduced its genetic diversity, the DNA of any two randomly chosen people is 99.9% identical. All our differences, from skin color and eye shape to height and susceptibility to particular diseases, are mainly determined by gene polymorphisms: small differences in the nucleotide sequences of almost identical genes and, accordingly, in the structure and function of the proteins they encode.

Random mutations occur all the time, and the ones counted as polymorphisms are, first, those that arose long ago and are therefore found in more than 1% of people in a given population (the threshold is arbitrary: one could just as well treat variants rarer than 2 or 3% as merely random). Second, polymorphisms have no noticeable effect on their carriers' health, or are even adaptive. Here too, however, the boundary is blurry. A classic example is the SNP that replaces a single amino acid and makes the β-subunit of the hemoglobin molecule defective. This mutation is found in southern India, around the Mediterranean, and in West Africans and their descendants on other continents. Heterozygotes for this gene, who carry a healthy variant on one chromosome and a "spoiled" one on the other, are less likely to suffer from malaria and experience symptoms of anemia only in extreme conditions, such as at high altitude. Homozygotes for the mutant gene develop a serious disease, sickle cell anemia.
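The frequency criterion, and how arbitrary its cut-off is, can be shown in a few lines. This is a minimal sketch assuming we simply count carriers in a population sample; `is_polymorphism` is a hypothetical helper, and it captures only the first criterion in the text. The second one (no noticeable effect on health) cannot be read off counts, as the sickle-cell example shows.

```python
def is_polymorphism(carriers: int, population: int, threshold: float = 0.01) -> bool:
    """Classify a variant as a polymorphism if it is carried by more than
    `threshold` of the sampled population (the conventional 1% by default)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return carriers / population > threshold

# A variant carried by 15 of 1,000 people (1.5%):
print(is_polymorphism(15, 1000))                  # True under the 1% convention
print(is_polymorphism(15, 1000, threshold=0.03))  # False if the line is drawn at 3%
```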

SNPs are fairly common: when chromosomes are copied, they arise with a probability of about 10^-8 per nucleotide per generation. With a haploid genome of 3 billion (3×10^9) nucleotides, random point mutations therefore give each set of chromosomes a child inherits about 30 new single-nucleotide differences from the parent's, on average. Fortunately, most of these mutations fall not in genes but in so-called "junk DNA", the roughly 95% of the human genome that encodes neither proteins nor functional RNAs. These mutations affect nothing and are invisible to selection, so they are preserved along with the rest of the genetic junk. They, like short tandem repeats (more on those later), serve in molecular genealogy as chromosomal markers, that is, distinguishing features.
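The "about 30" figure is just genome size times mutation rate. The sketch below reproduces that arithmetic with the numbers from the text and, as an added modeling assumption not made in the article, treats the count of new point mutations as Poisson-distributed (many independent rare events), to show how much it varies from child to child. The helper names are hypothetical.

```python
import math

HAPLOID_GENOME = 3e9  # nucleotides, as in the text
SNP_RATE = 1e-8       # new point mutations per nucleotide per generation, as in the text

def expected_new_snps(genome_size: float = HAPLOID_GENOME, rate: float = SNP_RATE) -> float:
    """Expected number of new point mutations in one copied haploid genome."""
    return genome_size * rate

def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k events) for a Poisson count with mean lam, computed in log space."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

lam = expected_new_snps()
print(f"expected new SNPs: {lam:.0f}")
print(f"P(20..40 new SNPs): {sum(poisson_pmf(k, lam) for k in range(20, 41)):.3f}")
```

Working in log space avoids overflowing `lam ** k` and `k!` for larger k.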

Because the genetic deck is reshuffled with every child, tracing your pedigree through autosomal markers is possible, but difficult and very approximate. For example, determining a person's race with almost 100% accuracy, and even then only for typical members of that race, requires analyzing several hundred autosomal DNA markers. With several thousand, one can tell which region the genome's owner comes from, to the precision of "Southeast Asia" or "Northern Europe"; in contact zones between races and peoples, such as between the Volga and the Urals or throughout the Americas, this is almost impossible because the gene pools have mixed. Markers on mitochondrial DNA and the Y chromosome, by contrast, make it possible to trace your lineage very precisely.

Let's start with Adam: male molecular genealogy is a little easier to understand. And let us say at once that women, too, can learn everything described below about their origins on the male line. All it takes is analyzing the DNA of their father (or brother, or paternal uncle: any direct male-line relative).

How boys differ from girls

About 300 million years ago, in the first mammals, one of the chromosomes carrying, among others, several genes that determine maleness began to lose its other genes, and with them the ability to exchange segments with its paired chromosome. Humans have only 27 genes left on the Y chromosome, most of which are active only in the testes. The rest of the Y-chromosome DNA may be of little use except to ethnography and genealogy, but for these purposes it is much better suited than the autosomes. Mutations, whether the loss, substitution or addition of single nucleotides (SNPs) or changes in the number of repeats of short three- or four-nucleotide motifs (short tandem repeats, STRs), occur in different parts of the Y chromosome at different rates: on average, about one mutation per 500 generations. In the most conservative regions, mutations occur once every 100,000 years. (Y-chromosomal Adam lived about 80,000 years ago, but "once every 100,000 years" is the probability of a mutation in that region over an unbroken series of generations; it can be calculated that even so rare an event happens in a given generation in about one father-son pair in 3,000.) The time and place in which our common direct ancestor lived were calculated by analyzing the distribution of Y-chromosome markers in populations from different regions of the planet. These markers are used both in ethnogenomic research and to trace individual genealogies, like molecular stamps in registration papers and on birth certificates.
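The "one father-son pair in 3,000" figure comes from spreading a once-per-100,000-years event over generations. A minimal sketch, assuming the rate is converted by simple division (valid only because the per-generation probability is tiny) and trying several generation lengths, since the article does not fix one here; the function name is hypothetical.

```python
def per_generation_probability(years_between_events: float, generation_years: float) -> float:
    """Convert 'one event per N years' into an approximate chance per father-son transmission.

    Simple division; a reasonable approximation only for rare events.
    """
    return generation_years / years_between_events

# "Once every 100,000 years" at several assumed generation lengths (years).
for gen in (25, 30, 35):
    p = per_generation_probability(100_000, gen)
    print(f"{gen}-year generations: about 1 father-son pair in {round(1 / p):,}")
```

With 30-year generations this gives roughly 1 in 3,300, in line with the article's "about one in 3,000".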

Molecular genealogy close-up

The generally accepted classification of Y-chromosome lineages is based on the order in which SNP markers appeared on it. The family tree of modern men has 18 main branches, labeled with Latin letters from A, the first branch at the root, to R, the most recent. The classification takes into account about 250 markers, by which some 170 terminal clusters are currently distinguished, each with its own set of successively acquired mutations. This nomenclature allows new branches to be added to the scheme as new markers are identified, without changing the overall topology of the tree.

It is hopeless to try to list the codes of all the markers and the order in which they appeared for each of the main haplogroups and their ever finer subgroups: each modern branch of the molecular family tree differs from Adam by about two dozen mutations in different parts of the Y chromosome. Exactly where they lie and what happened in each (the loss of one nucleotide, the substitution of another, the loss of a third, or a back mutation that happened to return a lost guanine to its old place, which is why the "Slavic" subgroup R1a1 carries its oldest marker in the same form as Adam did) we will leave to the specialists, although many amateur enthusiasts (for example, on the forum of the "DNA Tree" site) are well versed in such subtleties.

History and geography are easier to follow. For example, the populations of ancient Russian cities most often carry haplogroups R1a, I1b and N. Very loosely, their carriers can be called descendants of the eastern, southern and northern Slavs, respectively. In fact, the mutation that defines the R branch is thought to have appeared in northwestern Asia 30,000–35,000 years ago: in a tribe whose men (or most of them) belonged to the earlier haplogroup P, a boy was born with a copying error in his Y chromosome, a single nucleotide substitution of guanine for adenine. His descendants spread into Europe and western Asia, mixing with local tribes along the way, but all his direct male-line heirs kept this mark, the M207 marker. It has also been found in men of one isolated tribe in Cameroon, who are most likely descended from part of a prehistoric tribe that returned from Eurasia to Africa. About 25,000 years ago, one of the great-great-...-grandsons of the founder of haplogroup R, among those gradually moving south, acquired another mutation, M127; its carriers, haplogroup R2, make up 5–10% of the population of South Central Asia, Pakistan and northern India. Another branch of this group turned west, toward the lands from which the last glacier was retreating. The R1 subgroup, in which the M173 mutation was added to the earlier marks on the Y chromosome about 30,000 years ago, is the most common in Europe and western Eurasia. The R1a (M17) subgroup arose in the steppes of present-day Ukraine and southern Russia 10,000–15,000 years ago. The tribes of the Kurgan culture, known from archaeological data, were among the first to domesticate the horse; thanks to this they spread across almost the entire continent, conquering the natives, mixing with them and imposing on them their language and religion, which became the basis of Proto-Indo-European (Indo-Aryan) culture.
Direct descendants of the Indo-Aryans include 45% of men in India's upper castes and 40% of Poles, northern Russians, Ukrainians, Belarusians, Latvians and Lithuanians. The subgroup is somewhat less common among the southern and western Slavs, Icelanders (Rurik, too, was a Viking), eastern Germans and the peoples of Russia who speak Finno-Ugric languages (millennia of living side by side have long since mixed "the Rus, the Mordva and the Chud"). Despite all the later mixing of tribes, R1a carriers make up 55% of the male population in southern Russian cities and up to 70% in central Russia. But if anyone decides to take pride in their Aryan blood on the strength of belonging to this subgroup, we remind you: your relatives live in the Kalahari Desert and onward around the globe to Greenland, Australia and Tierra del Fuego. And check your maternal line by mitochondrial DNA, and then by autosomal markers. Incidentally, among Kazakhs, who are typical Mongoloids in every other respect, R1a carriers are no rarer than in Russia. Yes, we are Scythians…
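The way SNP markers nest into branches can be illustrated with a toy tree. This sketch uses only the R-branch markers named above (M207 for R, M173 for R1, M17 for R1a, M127 for R2, as given in the text); the real tree uses about 250 markers. `Branch` and `assign` are hypothetical names, and the sketch assumes a man's tested SNPs are given as a set of markers he carries. It also shows the nomenclature's key property: a new branch is added with one `add` call, without touching the rest of the tree.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Branch:
    name: str
    snp: Optional[str]  # defining marker; None for the root, whose marker we leave unnamed
    children: list = field(default_factory=list)

    def add(self, name: str, snp: str) -> "Branch":
        """Attach a new sub-branch; existing branches are left unchanged."""
        child = Branch(name, snp)
        self.children.append(child)
        return child

def assign(node: Branch, derived: set) -> str:
    """Walk down from node, following the child whose defining SNP the man carries."""
    for child in node.children:
        if child.snp in derived:
            return assign(child, derived)
    return node.name

p = Branch("P", None)
r = p.add("R", "M207")
r1 = r.add("R1", "M173")
r1.add("R1a", "M17")
r.add("R2", "M127")

print(assign(p, {"M207", "M173", "M17"}))  # R1a
print(assign(p, {"M207", "M127"}))         # R2
```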

Ethnogenomics works with mutation probabilities derived from approximate calculations, population samples and other statistical averages. Even the number of millennia in these calculations depends on the generation length, anywhere from 20 to 35 years, that the authors choose. The finest time resolution a haplogroup can offer, even one with the longest alphanumeric label, is set by the probability of a single new SNP appearing: at best about 5,000 years, far too coarse to reveal the details of an individual pedigree. One can only say that more than a third of the inhabitants of northern India, almost half of Russians and almost a third of Norwegians are related to one another somewhere around the two-hundredth to three-hundredth generation.
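How much the choice of generation length matters is easy to see by converting the "two-hundredth to three-hundredth generation" into years under each of the lengths the text mentions. A minimal sketch with a hypothetical helper name:

```python
def generations_to_years(generations: int, years_per_generation: int) -> int:
    """Convert a generation count to years under a chosen generation length."""
    return generations * years_per_generation

# The 200th and 300th generations at 20, 25, 30 and 35 years per generation.
for gens in (200, 300):
    spans = [generations_to_years(gens, g) for g in (20, 25, 30, 35)]
    print(gens, "generations ->", spans, "years")
```

The same 200 generations come out anywhere from 4,000 to 7,000 years, and 300 generations from 6,000 to 10,500 years, depending on that single assumption.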

This is what the tops of the intertwined branches of the Y-chromosome tree look like when viewed "from orbit", without the finer detail of subgroups.

Molecular genealogy up to...

Detailed molecular-genealogy studies use short tandem repeats (STRs). Amid the other gibberish in DNA there are occasional meaningless repeats of three or four nucleotides, for example TAGATAGATAGATAGATAGATAGATAGATAGA...: thymine-adenine-guanine-adenine. A DNA-genealogy enthusiast will immediately recognize in these letters the first of the most commonly used Y-chromosome markers, DYS19 (DNA Y-chromosome Segment No. 19). During sperm formation, the enzymes that copy the precursor cell's DNA occasionally skip one of the repeats in such a tandem or add an extra one. If that sperm fathers a boy, he and all his sons, grandsons and great-grandsons will carry the new count at this marker until the direct male line dies out, or until the next mutation in the same STR. A later mutation may widen the difference, making the branches of the tree easier to tell apart, or it may restore the original state. But when a large number of markers is used, the degree of kinship can be established quite accurately.
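Counting the repeats in a tandem such as DYS19 amounts to finding the longest uninterrupted run of the motif. This is a minimal sketch on a made-up fragment; `longest_tandem_run` is a hypothetical helper. Real genotyping labs follow formal allele-naming conventions and work from fragment lengths or sequencing reads, so this only illustrates the idea of a repeat count.

```python
def longest_tandem_run(sequence: str, motif: str) -> int:
    """Length, in motif units, of the longest uninterrupted run of `motif` in `sequence`."""
    if not motif:
        raise ValueError("motif must be non-empty")
    k = len(motif)
    best = 0
    for start in range(len(sequence) - k + 1):
        run, pos = 0, start
        while sequence[pos:pos + k] == motif:
            run += 1
            pos += k
        best = max(best, run)
    return best

# A made-up fragment: flanking bases around 14 copies of the DYS19 motif TAGA.
fragment = "CCTG" + "TAGA" * 14 + "GTCA"
print(longest_tandem_run(fragment, "TAGA"))  # 14
```

A sperm-copying slip that skips or adds one repeat would change this count to 13 or 15, which is exactly the kind of difference genealogical STR tests record.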

Every man on Earth carries tandem repeats, of varying length, at the DYS19, DYS388, DYS390, DYS391, DYS392 and DYS393 loci. For example, the TAGA quadruplet (DYS19) can be repeated 10 to 19 times, the ATA triplet (DYS388) 10 to 16 times, and so on. In 98% of men, the basic set of six common markers inherited from chromosomal Adam is supplemented by DYS385, with 7 to 28 repeats of "GAAA"; 34% also have DYS438 and DYS439, and so on. The most informative of the nearly five hundred STRs found on the Y chromosome were selected as genealogical markers. Standard genealogical DNA tests cover 12, 25, 37 or 67 markers, although six are often enough to assign the owner of a haplotype (an individual set of markers) to one of the haplogroups from A to R by the combination of repeat counts. For example, the so-called "Atlantic modal haplotype", the most common in western Europe, looks like this: DYS19=14, DYS388=12, DYS390=24, DYS391=11, DYS392=13, DYS393=13. A man with this haplotype almost certainly belongs to haplogroup R1b or one of its subgroups.
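Comparing a haplotype with the Atlantic modal haplotype comes down to comparing repeat counts marker by marker. This sketch uses one simple convention, the total difference in repeat counts over the markers both haplotypes report; that is an assumption for illustration, since genealogy services use several slightly different ways of counting. `step_distance` and the sample are hypothetical, and the code does not assign a haplogroup; it only measures closeness.

```python
# Atlantic modal haplotype, repeat counts as given in the text.
ATLANTIC_MODAL = {"DYS19": 14, "DYS388": 12, "DYS390": 24,
                  "DYS391": 11, "DYS392": 13, "DYS393": 13}

def step_distance(a: dict, b: dict) -> int:
    """Total difference in repeat counts over the markers both haplotypes report."""
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared)

# A hypothetical man who is one repeat short at DYS390.
sample = {**ATLANTIC_MODAL, "DYS390": 23}
print(step_distance(sample, ATLANTIC_MODAL))  # 1
```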


The smaller the differences in repeat counts across all the DYS markers analyzed, the more likely it is that the two men are relatives, or namesakes (who are then also relatives, only more distant ones). In a 37-marker test, if each marker can take about six equally likely values, the probability of all of them matching by chance is 1/6^37, or 1 chance in about 6×10^28 (the Earth's population, recall, is 6.6×10^9).

Even with 12 markers, the probability of a chance match is less than 1 in two billion, and if someone's DYS values match your haplotype completely, you can be sure you have found a relative in the direct male line: a brother, father, uncle or "cousin" no more distant than the tenth to fifteenth generation.
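The figures above follow from a simplified model in which each marker independently takes one of about six equally likely repeat counts, so the chance of a full random match is (1/6)^n. This sketch just reproduces that arithmetic; `chance_match_odds` is a hypothetical helper, and real repeat counts are far from equally likely, so treat the output as an illustration of the model rather than a forensic estimate.

```python
def chance_match_odds(n_markers: int, values_per_marker: int = 6) -> int:
    """Return N in '1 chance in N' that two unrelated men match on every marker.

    Simplified model (an assumption): each marker independently takes one of
    `values_per_marker` equally likely repeat counts.
    """
    return values_per_marker ** n_markers

for n in (12, 25, 37, 67):
    print(f"{n} markers: 1 chance in {chance_match_odds(n):.2e}")
```

For 12 markers this gives 6^12 = 2,176,782,336, the "more than two billion" above; for 37 markers, about 6.2×10^28.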

A couple of years ago, an American teenager conceived in vitro with sperm from an anonymous donor found his father this way. The database, however, turned up two men with the same surname. (That is not surprising: in England, for example, with its extensive databases, a surname can be predicted from DNA in 34% of cases on average. The method is no use for Smiths and Browns, but bearers of rarer surnames can be identified fairly accurately, and if the DNA comes from a crime scene, the circle of suspects can be narrowed down.)

From his mother, the boy knew the date and place of the donor's birth. He searched another database, found the names of all the men born in the right place at the right time, and worked out which of the two was his biological father. Whether they met, and whether either was glad of the discovery, history does not say.

And what if you find in such a database a set of numbers that differs from yours by 1, 2 or 5 units? Then you can estimate how many generations ago your closest common ancestor lived. A change of one repeat in any given marker occurs on average once every 500 generations, roughly once every 15,000 years. In a 37-marker haplotype, a mutation can therefore be expected roughly once every 13 generations, or every 300–400 years. Since average mutation rates have also been calculated for each individual DYS marker, the estimate of when the lines diverged can be refined further. It will still be only a probability, but it gives you somewhere to start digging into paper records, from family archives to chronicles. In any case, you will learn a lot of interesting things, especially if you add mitochondrial DNA analysis to the Y-chromosome data. But read about maternal kinship in the next article (Eve's Daughters: Genes, history and geography).
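The rule of thumb above can be turned into a rough estimate. This sketch uses the article's average rate of one repeat change per marker every 500 generations and counts each unit of difference as one mutation. It also adds a common simplification that the article does not spell out: since the common ancestor, mutations accumulate independently along both men's lines, so the observed difference is divided by twice the haplotype's mutation rate. The helper names are hypothetical, the 25–30-year generation lengths are assumed, and real calculators use per-marker rates instead of one average.

```python
# Average STR mutation rate quoted in the article: one repeat change
# per marker every 500 generations.
MUTATION_RATE = 1 / 500

def generations_per_mutation(n_markers: int, rate: float = MUTATION_RATE) -> float:
    """Mean number of generations between mutations anywhere in an n-marker haplotype."""
    return 1 / (n_markers * rate)

def tmrca_generations(differences: int, n_markers: int, rate: float = MUTATION_RATE) -> float:
    """Rough number of generations back to the common male-line ancestor.

    Assumes mutations build up independently along both lines (hence the
    factor of 2) and that every marker mutates at the same average rate.
    """
    return differences / (2 * n_markers * rate)

print(f"37 markers: one mutation every {generations_per_mutation(37):.1f} generations")
for d in (1, 2, 5):
    g = tmrca_generations(d, 37)
    print(f"{d} difference(s): ~{g:.0f} generations, ~{g * 25:.0f}-{g * 30:.0f} years")
```

The first line reproduces the article's "about once in 13 generations"; the loop shows how quickly the estimated common ancestor recedes as the difference grows.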

We thank Anatoly Klesov for his help in writing the article.

Alexander Chubenko
Portal "Eternal youth"  www.vechnayamolodost.ru

The journal version of the article was published in Popular Mechanics No. 10-2007

