20 May 2022

Sequencing Workshop

Almost everything about modern sequencing in one day

Elena Kleshchenko, PCR.news

On May 16, 2022, the training course "Introduction to NGS technologies. Working with sequencing data" began. The course is co-organized by the ANO "Institute of Synthetic Biology" and the Moscow Center for Innovative Technologies in Healthcare. The first day was devoted to introductory lectures on the development of sequencing technologies and the current state of the industry.

NGS1.jpg

Dmitry Krivosheev, head of educational projects at the Genotek medical genetics center, recalled Arthur C. Clarke's third law: any sufficiently advanced technology is indistinguishable from magic. This fully applies to sequencing.

The twentieth century saw the birth of genetics and gene research, and the twenty-first has become the century of genomics, the analysis of huge volumes of genetic information. In the near future, genetic sequence data will probably make up the largest share of the new data accumulating in the world.

The graph below from the US National Human Genome Research Institute (NHGRI) is often shown when talking about the history of sequencing. The cost of reading DNA has been falling faster than Moore's law would predict: in 20 years it has dropped from billions to thousands of dollars per genome, and it continues to decline. This drop was largely driven by NGS. Sequencing is becoming more and more accessible and is finding new applications.

NGS2.jpg
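
To put the comparison with Moore's law in numbers, here is a small Python calculation. It is a sketch under stated assumptions: the endpoints of one billion and one thousand dollars per genome over 20 years are illustrative round figures taken from the description above, not values read off the NHGRI graph, and Moore's law is taken as a halving of cost every two years.

```python
import math

# Assumed, illustrative endpoints: the text only says "from billions to
# thousands of dollars per genome" over roughly 20 years.
START_COST_USD = 1e9
END_COST_USD = 1e3
YEARS = 20


def halving_time(start: float, end: float, years: float) -> float:
    """Years per halving of cost, assuming a constant exponential decline."""
    halvings = math.log2(start / end)
    return years / halvings


def moores_law_factor(years: float, doubling_period: float = 2.0) -> float:
    """Cost reduction if cost halves every `doubling_period` years."""
    return 2 ** (years / doubling_period)


sequencing_factor = START_COST_USD / END_COST_USD
print(f"Sequencing cost reduction: {sequencing_factor:,.0f}x")
print(f"Implied halving time: {halving_time(START_COST_USD, END_COST_USD, YEARS):.2f} years")
print(f"Moore's-law reduction over {YEARS} years: {moores_law_factor(YEARS):,.0f}x")
```

Under these assumptions the cost halves roughly once a year, twice as fast as the Moore's-law pace, and over 20 years the gap compounds to about a thousandfold.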

Companies working in consumer (direct-to-consumer, DTC) genetics have emerged, such as Ancestry.com and 23andMe, where individuals can learn where their ancestors came from and how genetics affects their health and abilities. In the Russian Federation, this niche is occupied by Genotek, the speaker stressed.

Interestingly, the early development of Ancestry is connected with the Mormon Church: genealogical connections play an important role in Mormon religious views. James Sorenson, a well-known Mormon billionaire and philanthropist, created a genetic database that was later acquired by Ancestry, and the company continues to cooperate with the Mormons.

Most consumer genetics companies do not sequence genomes but use DNA microarrays (microchips). These detect single-nucleotide polymorphisms, or SNPs (there are about 4-5 million of them in the human genome), and the results then undergo bioinformatic analysis. Some single-nucleotide substitutions can cause serious diseases, such as sickle cell anemia, but most have virtually no effect on health, so they are not eliminated by selection and can serve as markers for population genetics.

Each of us is an incredible mixture of genetic markers from different ethnic groups, so it is impossible to create an ethnically targeted weapon, Dmitry Krivosheev noted. There are no "Russian genes" or "Swiss genes" characteristic of one and only one ethnic group. It is only possible to determine what share of the markers in a particular person's genome matches the markers most common among members of a certain ethnic group.

Identifying Neanderthal components in the genomes of Europeans has become a routine DTC genetics service. "I have more Neanderthal DNA than the average [Genotek] client," Dmitry said. "How does it help in life? Not at all."

The speaker then spoke about other applications of genetic tests, such as pharmacogenomics and personalized medicine, and about how genome-wide association studies (GWAS) help find links between individual SNPs and certain complex traits. Information about a "genetic predisposition" to sporting success or to a particular creative activity should be treated with caution: such complex traits are unlikely to be determined by genetics. On the other hand, there have been fairly successful attempts to assess the influence of genetic factors on intelligence. For example, a polygenic predictor of "predisposition" to higher education based on 74 SNPs turned out to be fairly accurate (in this study, genetic factors explained 11-13% of the differences in educational attainment and 7-10% of the differences in cognitive ability).
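
Polygenic predictors are commonly built as a weighted sum: for each SNP, the number of copies of the effect allele (0, 1 or 2) is multiplied by an effect size estimated in a GWAS. The Python sketch below shows only that arithmetic; the SNP names and weights are hypothetical and are not those of the 74-SNP predictor mentioned above.

```python
# Hypothetical SNP identifiers and per-allele effect sizes (weights);
# a real predictor uses weights estimated in a GWAS.
EFFECT_SIZES: dict[str, float] = {
    "rs_hypothetical_1": 0.02,
    "rs_hypothetical_2": -0.01,
    "rs_hypothetical_3": 0.03,
}


def polygenic_score(dosages: dict[str, int], weights: dict[str, float]) -> float:
    """Sum over SNPs of effect size times the number of effect alleles (0, 1 or 2).
    SNPs missing from `dosages` are skipped here; real pipelines usually impute them."""
    score = 0.0
    for snp, beta in weights.items():
        dosage = dosages.get(snp)
        if dosage is None:
            continue
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {snp} must be 0, 1 or 2, got {dosage}")
        score += beta * dosage
    return score


person = {"rs_hypothetical_1": 2, "rs_hypothetical_2": 1, "rs_hypothetical_3": 0}
print(round(polygenic_score(person, EFFECT_SIZES), 4))
```

A score on its own means little: it is interpreted by comparing it with the distribution of scores in a reference population.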

During the coronavirus pandemic, Genotek took part in the COVID-19 Host Genetics Initiative. An invitation was posted on the well-known Yandex page with Russian COVID-19 statistics, asking people who had been seriously ill to join a genetic study searching for the factors responsible for severe COVID-19.

Genome-wide association studies also make it possible to search for valuable traits in plants and animals, which matters for agriculture. As for genetic modification, "everyone was waiting for the legislation to change, but the technology changed faster," Dmitry Krivosheev said. Gene editing now makes it possible to introduce small, precise changes into the genome, and such changes are regulated less strictly. But this work, too, requires sequencing.

Finally, metagenomic sequencing, the reading of all genomes in a sample, opened a new stage in microbiology: it makes it possible to assess the state of soil or gut microbial communities without cultivation.

Sequencing and bioinformatics are the future of biology. "You are the happiest people," Dmitry Krivosheev told the participants of the course and wished them good luck on behalf of Genotek.

Dmitry Shcherbinin, a specialist in structural bioinformatics and a teacher, recalled the history of sequencing technologies, which began in the mid-twentieth century. The English biochemist Frederick Sanger (1918-2013) is the only person to have received two Nobel Prizes in Chemistry, for developing methods for determining the sequences of proteins and DNA. Sanger sequencing (pictured below) and Maxam-Gilbert sequencing, now referred to as first-generation methods, appeared almost simultaneously, but the latter, which is based on chemical cleavage of DNA rather than on synthesis of a complementary strand, never became widespread. Meanwhile, automated sequencing that combines Sanger's chain-termination synthesis method with capillary electrophoresis remains the gold standard today. (Sanger also proposed another method, the "plus and minus" system, but it quickly fell out of use.)

NGS3.jpg

Then came next-generation sequencing (NGS), also known as high-throughput sequencing. After single-molecule sequencing methods appeared (Pacific Biosciences and Oxford Nanopore), NGS methods came to be called the "second generation".

It is called "high-throughput" because millions or even billions of reads are generated simultaneously. NGS methods, like Sanger sequencing, perform sequencing by synthesis (SBS): they are based on building a strand complementary to the one that needs to be read. The sample is broken into small fragments, preferably at random, without bias toward particular sites (which is why ultrasound is often used). Each fragment is attached to a solid surface and amplified. Amplification methods, which create multiple copies of a nucleic acid fragment, originate from the polymerase chain reaction invented by Kary Mullis. A set of copies of a single fragment localized on a chip is usually called a cluster. Complementary strands are built on these copies, and as each next nucleotide is added, a signal is recorded in one way or another.
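
Random fragmentation followed by reading each fragment through synthesis of its complementary strand can be illustrated with a toy Python model. It is a simplification under stated assumptions: fragments have a fixed length, every cycle adds exactly one nucleotide, nothing goes wrong, and the template is written in the order the polymerase reads it (3' to 5'), so the read comes out 5' to 3'. Real instruments differ mainly in how the per-cycle signal is produced.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def random_fragments(genome: str, n: int, length: int, seed: int = 0) -> list[str]:
    """Cut `n` fragments of fixed `length` at uniformly random positions,
    a crude stand-in for unbiased shearing (e.g. by ultrasound)."""
    if length > len(genome):
        raise ValueError("fragment longer than genome")
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - length + 1) for _ in range(n)]
    return [genome[s:s + length] for s in starts]


def sequence_by_synthesis(template: str) -> str:
    """Idealized, error-free SBS: one cycle per template base; each cycle adds
    the complementary nucleotide and records it as that cycle's signal."""
    read = []
    for base in template:
        incorporated = COMPLEMENT[base]  # base pairing: A-T, G-C
        read.append(incorporated)        # the signal detected in this cycle
    return "".join(read)


genome = "ATGCGTACGTTAGCCGATAGCTAGGCTTACGATCG"
for fragment in random_fragments(genome, n=3, length=10):
    print(fragment, "->", sequence_by_synthesis(fragment))
```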

Bioinformatic approaches play an important role in selecting primers for amplification, the speaker noted. Primers should bind only to their target sites, should not form secondary structures, and should have a suitable annealing temperature (the two primers of a pair should have similar temperatures). There are now services available for primer design, such as NCBI Primer-BLAST.
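
Some of these primer checks can be sketched in a few lines of Python. This is an illustration, not a substitute for dedicated tools: it estimates melting temperature with the Wallace rule (Tm = 2(A+T) + 4(G+C), a rough rule of thumb for short oligonucleotides; the annealing temperature is usually set a few degrees below Tm), flags a possible hairpin or primer dimer when the 3' end is complementary to another part of the primer, and uses an assumed 5 °C limit for the Tm difference within a pair. The primer sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(primer: str) -> float:
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in deg C: Tm = 2*(A+T) + 4*(G+C) (Wallace rule)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def self_complementary_3prime(primer: str, k: int = 4) -> bool:
    """Crude flag for hairpins/primer dimers: the reverse complement of the
    last k bases occurs in the primer, so the 3' end could pair with it."""
    return reverse_complement(primer[-k:]) in primer.upper()


def check_pair(fwd: str, rev: str, max_tm_diff: int = 5) -> dict:
    """Summarize simple checks for a primer pair; max_tm_diff is an assumed limit."""
    tm_fwd, tm_rev = wallace_tm(fwd), wallace_tm(rev)
    return {
        "tm": (tm_fwd, tm_rev),
        "gc": (round(gc_content(fwd), 2), round(gc_content(rev), 2)),
        "tm_balanced": abs(tm_fwd - tm_rev) <= max_tm_diff,
        "self_complementary_3prime": (
            self_complementary_3prime(fwd),
            self_complementary_3prime(rev),
        ),
    }


# Hypothetical primer pair
print(check_pair("AGCGTTACCGATGACT", "TCGGATCCATGAGCAA"))
```

Checking that primers bind only to their targets requires searching the whole genome, which is what tools like Primer-BLAST add on top of such local checks.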

Dmitry Shcherbinin also spoke about 454 sequencing, developed at 454 Life Sciences: the technology has not been supported since 2016, but a number of sequencers still run on it. Its peculiarity is that the fragments are amplified not on a chip but on microbeads placed in droplets of reagent solution. When a nucleotide is added to the growing chain, pyrophosphate is released and converted into ATP, and the light emitted by the enzyme luciferase is recorded (hence the name pyrosequencing). The problems of this technology are its high cost and low accuracy in homopolymer regions.
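
Why homopolymers are hard for pyrosequencing can be shown with a toy flowgram model in Python. Nucleotides are supplied one type at a time in a fixed cyclic order (TACG here is an assumption), and a run of identical bases is incorporated in a single flow, so the light intensity grows with the length of the run. The base caller must turn intensities back into counts; this sketch simply rounds, and the noisy intensity values are hypothetical.

```python
FLOW_ORDER = "TACG"  # assumed cyclic order in which nucleotides are supplied


def flowgram(sequence: str, flow_order: str = FLOW_ORDER) -> list[int]:
    """Ideal signal per flow: the number of bases incorporated in that flow.
    A homopolymer of length n is incorporated at once, giving about n units of light."""
    if set(sequence) - set(flow_order):
        raise ValueError("sequence contains bases not in the flow order")
    signals, i, flow = [], 0, 0
    while i < len(sequence):
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        while i < len(sequence) and sequence[i] == nucleotide:
            run += 1
            i += 1
        signals.append(run)
        flow += 1
    return signals


def call_bases(signals: list[float], flow_order: str = FLOW_ORDER) -> str:
    """Decode measured intensities by rounding each to a whole number of bases."""
    return "".join(
        flow_order[flow % len(flow_order)] * max(0, round(signal))
        for flow, signal in enumerate(signals)
    )


print(flowgram("TTAC"))          # [2, 1, 1]
print(call_bases([0.1, 6.4]))    # a 7-base run of A measured as 6.4 is called as 6 As
```

For a single base the step from 1 to 2 is large compared with typical noise, but for long runs the relative difference between n and n+1 shrinks, so length errors concentrate in homopolymers.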

Ion semiconductor sequencing is similar to 454, but instead of pyrophosphate it detects protons, or more precisely the pH change caused by the release of protons during synthesis of the complementary strand. The Ion Torrent instrument from Thermo Fisher yields up to 130 million reads per run and handles homopolymer sequences more successfully. Interestingly, both 454 Life Sciences and Ion Torrent Systems were founded by the same person, American geneticist and entrepreneur Jonathan Rothberg.

Later, the leading position was taken by the Illumina/Solexa technology: 80-90% of the sequences in the GenBank database were obtained with Illumina. The library is prepared in the usual way: adapters are attached to the fragments to be sequenced, and these carry the sites for attachment to the substrate, indexes (barcodes) that identify fragments from a single sample, and primer binding sites. The complementary strand is synthesized using terminating nucleotides with a fluorescent label: after each nucleotide is added, a laser excites fluorescence, which is detected. The fluorophore and terminator are then cleaved off, so that the next nucleotide can be added. Illumina technology is also notable because it allows a fragment to be read from both ends.

NGS4.jpg
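
The role of the indexes (barcodes) can be illustrated by demultiplexing: each read's index sequence is compared with the indexes of the samples pooled in the run. The Python sketch below assigns a read to the closest index if it differs by at most one base and otherwise marks it as undetermined; the index sequences, sample names and the one-mismatch tolerance are assumptions for illustration, not Illumina's own software.

```python
# Hypothetical sample sheet: index (barcode) sequence -> sample name.
SAMPLE_INDEXES = {
    "ACGTACGT": "sample_1",
    "TGCATGCA": "sample_2",
    "GATCGATC": "sample_3",
}


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("index reads and indexes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, indexes: dict[str, str], max_mismatches: int = 1) -> str:
    """Assign a read to the sample with the closest index, tolerating up to
    `max_mismatches` sequencing errors; otherwise (or on a tie) 'undetermined'."""
    ranked = sorted((hamming(index_read, idx), name) for idx, name in indexes.items())
    best_distance, best_sample = ranked[0]
    if best_distance > max_mismatches:
        return "undetermined"
    if len(ranked) > 1 and ranked[1][0] == best_distance:
        return "undetermined"  # ambiguous between two samples
    return best_sample


for read_index in ("ACGTACGT", "ACGTACGA", "AAAAAAAA"):
    print(read_index, "->", assign_sample(read_index, SAMPLE_INDEXES))
```

Tolerating a mismatch is only safe when the indexes in a pool differ from each other by several bases, as the hypothetical ones here do.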

A relative disadvantage of Illumina technology is the high cost of a single run, which is why many instrument owners try to collect more orders to fill each run. At the same time, the cost per nucleotide is not very high.

The ABI SOLiD (Sequencing by Oligonucleotide Ligation and Detection) technology, created by Life Technologies, significantly reduces the cost. In each step it decodes two nucleotides at once using 8-nucleotide probes, which are joined together by DNA ligase. This method is also slightly faster, but the reads are short, and palindromic sequences that form secondary structures cause problems.
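
The two-base principle of SOLiD is usually described as color-space encoding: each of four colors stands for a set of four dinucleotides, so every base is interrogated twice. Assuming the standard published encoding, with bases coded A=0, C=1, G=2, T=3, the color of a dinucleotide equals the bitwise XOR of the two codes, which the Python sketch below uses to encode and decode a read.

```python
# Two-bit codes for bases; with this assignment the color of a dinucleotide
# equals the XOR of the two codes (AA, CC, GG, TT -> 0; AC, CA, GT, TG -> 1; ...).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"  # BASES[CODE[b]] == b


def encode_colors(seq: str) -> list[int]:
    """One color (0-3) per pair of adjacent bases."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def decode_colors(first_base: str, colors: list[int]) -> str:
    """Translate colors back to bases, starting from a known first base
    (in SOLiD it comes from the adapter)."""
    bases = [first_base]
    for color in colors:
        bases.append(BASES[CODE[bases[-1]] ^ color])
    return "".join(bases)


colors = encode_colors("ATGGC")
print(colors)                      # [3, 1, 0, 3]
print(decode_colors("A", colors))  # ATGGC
```

Because each base is decoded from the previous one, a single miscalled color changes every downstream base (decoding [3, 2, 0, 3] from "A" gives "ATCCG" instead of "ATGGC"), whereas a true single-base substitution changes two adjacent colors. This difference is what allows color-space analysis to distinguish measurement errors from real SNPs.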

Maria Logacheva (Skoltech) called the third-generation technology from Pacific Biosciences the pinnacle of modern advances in sequencing. Sanger sequencing requires relatively large amounts of homogeneous material. The strength of NGS methods is that they can work with heterogeneous templates. PacBio, also known as SMRT (single molecule real-time) sequencing, captures the signal from a single molecule, which makes amplification unnecessary.

The technology is based on a flow cell with wells about 100 nm in size. At the bottom of each well sits a DNA polymerase bound to a library fragment. Like most modern sequencing methods except Oxford Nanopore, PacBio technology is based on synthesis of a complementary strand, and each added nucleotide is detected by its fluorescent signal.

The advantages of the technology include very long reads (10-50 kb). In practice, their length is limited only by DNA extraction: it is difficult to obtain longer fragments with magnetic beads and columns. Difficult regions are read without problems, and modified nucleotides in the template can be identified (for example, if 5-methylcytosine is present instead of cytosine, incorporation takes longer).
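
Detection of modified bases from polymerase kinetics is often summarized as an IPD ratio: the mean inter-pulse duration (the pause before the next nucleotide is incorporated) at a template position in native DNA, divided by the same value in an unmodified control of the same sequence. The Python sketch below computes that ratio and flags positions above a cutoff; the IPD values and the cutoff of 2.0 are hypothetical.

```python
from statistics import mean

# Hypothetical IPDs (arbitrary units) per template position:
# (native sample measurements, unmodified control measurements).
IPDS = {
    101: ([0.31, 0.28, 0.35], [0.30, 0.29, 0.33]),
    102: ([0.92, 0.85, 1.10], [0.31, 0.27, 0.34]),
}


def ipd_ratio(sample: list[float], control: list[float]) -> float:
    """Mean IPD in native DNA divided by mean IPD in the unmodified control."""
    return mean(sample) / mean(control)


def flag_modified(data: dict[int, tuple[list[float], list[float]]], threshold: float = 2.0) -> list[int]:
    """Positions whose IPD ratio reaches `threshold` (a hypothetical cutoff)."""
    return sorted(
        pos for pos, (sample, control) in data.items()
        if ipd_ratio(sample, control) >= threshold
    )


for pos, (sample, control) in IPDS.items():
    print(pos, round(ipd_ratio(sample, control), 2))
print("flagged:", flag_modified(IPDS))
```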

The disadvantage of PacBio is its high error rate (10-15%), which is inevitable when reading a single molecule. But accuracy can be increased by circular consensus sequencing: during library preparation the fragments are circularized, and the polymerase goes around the circle many times. Since PacBio produces almost no systematic errors (unlike 454 with its homopolymers or Illumina with GGC trinucleotides), the consensus is accurate.
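
Why circular consensus works when errors are random can be shown with a small Python simulation using a column-wise majority vote. It is a deliberate simplification: the passes are assumed to be already aligned, errors are substitutions only (in real data insertions and deletions are common), and the 12% error rate is simply a value within the 10-15% range quoted above.

```python
import random
from collections import Counter


def majority_consensus(passes: list[str]) -> str:
    """Column-wise majority vote across passes over the same circular insert.
    Assumes passes are pre-aligned and of equal length (substitutions only)."""
    if not passes:
        raise ValueError("need at least one pass")
    if len({len(p) for p in passes}) != 1:
        raise ValueError("passes must be aligned to equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*passes))


def noisy_pass(seq: str, error_rate: float, rng: random.Random) -> str:
    """Copy `seq`, replacing each base by a different random base with
    probability `error_rate` (random, not systematic, errors)."""
    return "".join(
        rng.choice([b for b in "ACGT" if b != base]) if rng.random() < error_rate else base
        for base in seq
    )


def errors(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(42)
insert = "".join(rng.choice("ACGT") for _ in range(1000))
passes = [noisy_pass(insert, 0.12, rng) for _ in range(9)]
consensus = majority_consensus(passes)
print("errors in one pass:", errors(passes[0], insert))
print("errors in 9-pass consensus:", errors(consensus, insert))
```

If the errors were systematic, every pass would repeat the same mistake and voting could not remove it, which is why the absence of systematic errors matters.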

The newest and most productive instrument based on this technology is the Sequel II (2019). It can produce up to 500 billion nucleotides per run, with a run time of 10-20 hours. In practice, the useful output is lower than it seems, the speaker noted, because circular consensus sequencing reads each molecule many times. But the accuracy is very high (up to 99.99%). Maria Logacheva showed data from a 2021 article confirming that sequencing the genomes of E. coli and Staphylococcus aureus at more than 50x coverage can achieve an exact match with the reference.
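
The coverage figure translates directly into the amount of sequence needed. Mean coverage is C = N·L/G (N reads of length L over a genome of size G), and under the Lander-Waterman model the fraction of bases with no reads at all is about e^-C. The Python sketch assumes an E. coli genome size of about 4.6 Mb and counts consensus bases, not raw ones.

```python
import math


def required_bases(genome_size_bp: int, coverage: float) -> float:
    """Total read bases needed for a target mean coverage: C = N*L/G, so N*L = C*G."""
    return coverage * genome_size_bp


def fraction_uncovered(coverage: float) -> float:
    """Lander-Waterman (Poisson) estimate of the fraction of bases with no reads: e^(-C).
    Assumes reads are placed uniformly at random, which real data only approximates."""
    return math.exp(-coverage)


ECOLI_GENOME_BP = 4_600_000  # approximate E. coli genome size (assumption: ~4.6 Mb)
print(f"{required_bases(ECOLI_GENOME_BP, 50) / 1e6:.0f} Mb of consensus reads for 50x")
print(f"expected uncovered fraction at 50x: {fraction_uncovered(50):.1e}")
```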

These advantages explain the technology's wide use. PacBio is used to sequence complex regions of the genome with GC-rich repeats, for example in fragile X syndrome or amyotrophic lateral sclerosis. Another important example is transcriptomics, especially when alternative splicing events must be tracked across distant regions of a gene (although Maria Logacheva noted that Oxford Nanopore is more often used for transcriptome analysis). The technology is also used to detect epigenetic marks and, last but not least, for de novo genome assembly. (Maria recently talked about assembling plant genomes at DNA Day in Pushchino.) PacBio technology made it possible to sequence the genome of the drought-tolerant grass Oropetium thomaeum, as well as the axolotl genome, which is 10 times larger than the human genome.

Igor Shapovalov (Albiogen) spoke about Illumina, the flagship of NGS: more than 17,000 of the company's instruments have been delivered worldwide, including more than 250 in Russia and the CIS. Albiogen is part of the R-Pharm group of companies and is the official distributor of Illumina in the Russian Federation.

The technology was developed by Solexa, which acquired molecular cluster technology from Manteia in 2004. In 2005, the genome of phage phiX-174 was read using sequencing by synthesis, and in 2006 the first commercial sequencer, the Genome Analyzer, entered the market. (Illumina acquired Solexa a year later.) Thus, the technology is already 16 years old.

The arrival of the Genome Analyzer on the market was a revolution: data volumes increased dramatically. In the current Illumina line, the smallest model, the iSeq 100, can be considered an analogue of that first instrument: it yields 1.2 Gb in less than a day, whereas a similar result on the Genome Analyzer required four days. The most productive model is the NovaSeq 6000 (3000 Gb in two days).

The technology continues to develop, the speaker noted. There are new options for using fluorophores, new approaches to the arrangement of clusters that increase their density, short-wave lasers, super-resolution optics, etc. Devices become easier to use, reagent kits are replaced with cartridges. At the same time, the data quality remains high.

Igor Shapovalov stressed that all Illumina libraries, regardless of how they were prepared, look the same and can be sequenced on any of the company's instruments. He listed three main methods: Bead-Linked Transposomes (enzymes bound to magnetic beads; the simplest and fastest method, suitable for a variety of materials), TruSeq Ligation (provides high quality and yields libraries from DNA and RNA), and AmpliSeq for Illumina, developed at Thermo Fisher and based on multiplex PCR.

The first two methods produce whole-genome libraries, while targeting with PCR or with probes complementary to the sequences of interest captures specific regions of the genome. Exome sequencing kits and the various panels used in medicine are built on the probe principle.

The main fields of application of NGS are oncology, microbiology, reproductive health, agriculture, genetic diseases, molecular and cellular biology.

Searching for mutations in oncology one at a time gives unsatisfactory results: testing mutations sequentially, from frequent to rarer ones, takes a lot of time and biopsy material and may ultimately cost more than a single NGS study. Today there are various NGS panels: for identifying hereditary predisposition, for specific groups of cancers, and for studying DNA and RNA separately or in parallel. For example, the TruSight Oncology 500 panel covers 523 genes.

The two main directions of NGS research in reproductive health are noninvasive screening of pregnant women for trisomies and monosomies using maternal blood (noninvasive prenatal testing, NIPT) and screening of embryos during IVF, also to detect chromosomal abnormalities.

Routine research of agricultural plants and animals mainly uses biochips, but biochips cannot be created without NGS, the speaker stressed.

As for human genetic diseases, each orphan disease is rare, but there are many of them, so in total a considerable number of people suffer from them. Many mutations arise randomly in poorly described genes, and often only NGS makes it possible to bring the "diagnostic odyssey" to a successful end.

Igor Shapovalov reminded the audience that Illumina has instruments for in vitro diagnostics (NextSeq 550Dx, MiSeqDx) registered as medical devices.

Finally, in fundamental research in molecular and cellular biology, there is now great demand for multi-omics studies, which combine genomics with transcriptomics, proteomics, epigenetics, etc.

"Let's hope that sequencing keeps getting cheaper and that everyone will have a genome-wide analysis," Igor Shapovalov said in conclusion. Audience members asked about current Illumina supplies. "We are working on deliveries; the best way to get an answer is to write to us," the speaker replied.

Timur Yagudin (Skyjin LLC) presented the third-generation technology from the British company Oxford Nanopore. This technology does not involve strand synthesis: its principle is to pass a DNA or RNA strand through a pore, which changes the ion current through the pore; these changes are recorded and saved as raw data in the Fast5 format.

The company was founded in 2005, and the principles of the technology were formulated back in 1996, but all components of the system (membranes, nanopores, motor proteins) are constantly being modified. The nanopore may be, for example, CsgG, a 9-subunit transmembrane lipoprotein from E. coli, but this is not the only option.

During library preparation, a motor protein is attached to the 5' end (contrary to popular belief, it slows the molecule's passage through the pore rather than accelerating it), and a tether, a lipid-like molecule that brings the library to the membrane with pores, is attached to the 3' end.

The speaker listed the main advantages of nanopore sequencing: long reads, fast sample preparation without amplification, and real-time analysis with the ability to stop the run and reload the library or top up the ATP-containing buffer. Nanopore sequencing can detect DNA and RNA modifications without bisulfite conversion (six modifications can be recognized so far, and others will be added in the future) and can sequence hard-to-reach regions.

During the lecture, Timur Yagudin took a standard flow cell out of his pocket and showed it to the audience; it yields about 30 gigabases of data. The smallest device, the MinION, is not much bigger, he noted.

NGS5.jpg

The speaker described sample preparation for RNA and DNA, presented the range of devices and listed the applications in which nanopore sequencing is now making a significant contribution: microbiology, transcriptomics and de novo genome assembly. The new Q20 chemistry, which increases accuracy, makes it possible to assemble genomes without supplementing them with short reads. Trial kits managed to arrive in Russia before the crisis, Timur Yagudin noted, and performed very well.

Among the interesting new products from ONT are a 3000-channel flow cell with an output of at least 2500 Gb and the Duplex method, which sequences the template and complementary strands of a DNA molecule one after the other and raises read accuracy to Q30. It can now be said that criticism of nanopore sequencing accuracy is no longer relevant. One drawback is that sample purity matters a lot for nanopores, since contaminants affect the sequencing process more strongly than they affect bridge PCR.
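
Q20 and Q30 are Phred quality scores, Q = -10·log10(p), where p is the probability that a base call is wrong. A short Python conversion shows what these numbers mean: Q20 corresponds to a 1% error rate (99% accuracy), Q30 to 0.1% (99.9%), and Q40 to the 99.99% accuracy mentioned earlier for PacBio consensus reads.

```python
import math


def phred_to_error(q: float) -> float:
    """Probability that a base call is wrong: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_to_phred(p: float) -> float:
    """Phred quality score: Q = -10 * log10(p)."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    p = phred_to_error(q)
    print(f"Q{q}: error rate {p:.2%}, accuracy {1 - p:.2%}")
```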

Valentin Zhuzhin, a representative of the Chinese company SESANA, presented its GeneMind platform. The company offers the high-throughput GenoLab M DNA and RNA sequencing system, which is based on strand synthesis (the technology is called SURF-seq) and supports the NGS protocols available on the market. Chinese authors have published comparisons of it with the NovaSeq 6000. Fast delivery of instruments to Russia is possible.

Roman Younes presented the sequencing platform of the Chinese company MGI, which now has more than 1,500 employees on all continents and more than 2,000 instruments. The MGI sequencer line includes the DNBSEQ-G400, DNBSEQ-G50 and DNBSEQ-T7.

Instead of a cluster, the technology creates a DNA nanoball (DNB): a long strand of DNA containing 300-500 copies of the fragment. DNBSEQ technology avoids problems associated with amplification. Both DNA isolation and sample preparation can be carried out manually or automatically.

Roman Younes spoke about the CoolMPS sequencing chemistry, which differs in an important way from, for example, Illumina's. Cleaving off a conventional fluorescent label leaves a chemical "scar", and the accumulation of scars reduces read quality. CoolMPS uses unlabeled nucleotides, which are detected with four types of bispecific antibodies that recognize both the nucleotide itself and the blocking group. Unlike with traditional chemistry, accuracy does not decrease as read length increases. MGI has also developed an elegant method for paired-end sequencing.

At the end of the introductory part, Dmitry Shcherbinin spoke again. His second talk was devoted to microarrays (also called microchips). Strictly speaking, microarrays are not a sequencing method, but they are closely related to sequencing. Like sequencing, microarrays can be used to analyze gene expression, determine SNPs, identify organisms and study alternative splicing.

Microarrays essentially grew out of Southern blotting: they are also based on the principle of complementarity. Oligonucleotides are arranged on the substrate in a defined layout, and labeled nucleic acids from the sample hybridize with them. After hybridization, laser scanning, detection with a CCD camera and image analysis are carried out. The positions of the fluorescent spots on the chip characterize the sequence.
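
The complementarity principle behind microarray readout can be sketched in Python: a spot lights up when a labeled sample fragment contains the reverse complement of the probe at that spot. The probe layout and sample fragments are hypothetical, and exact matching is a strong simplification; real hybridization tolerates some mismatches and depends on temperature and washing stringency.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical chip layout: (row, column) of a spot -> immobilized probe.
PROBES = {
    (0, 0): "ACGTTGCA",
    (0, 1): "GGGAAATT",
    (1, 0): "TTTTCCCC",
}


def lit_spots(sample_fragments: list[str], probes: dict[tuple[int, int], str]) -> set[tuple[int, int]]:
    """Spots where at least one labeled fragment contains the exact reverse
    complement of the probe, i.e. could form a perfect duplex with it."""
    hits = set()
    for spot, probe in probes.items():
        target = reverse_complement(probe)
        if any(target in fragment for fragment in sample_fragments):
            hits.add(spot)
    return hits


sample = ["CCAATTTCCCGG", "TTTTTTTTTTTT"]
print(sorted(lit_spots(sample, PROBES)))
```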

The first prototype of the Affymetrix microarray appeared in 1989 (its products are now sold under the Applied Biosystems brand, which belongs to Thermo Fisher), and commercial production of chips and scanners began in 1994. In 1997, genome-wide expression in yeast was studied on microarrays. Now researchers can order chips designed for their own tasks. Microarray production is automated and resembles the production of computer microchips.

There are different types of microarrays, the speaker noted, and not all of them are based on nucleic acids. Chips can detect proteins (antibodies and antigens, ligands), cells and glycans. There are original detection methods, such as binding an antibody with attached DNA to a captured protein; subsequent rolling circle amplification then grows a long DNA tail from the antibody.

Among the disadvantages of microchips, the speaker mentioned the high cost, a large amount of information that complicates processing, as well as, what is sometimes forgotten, a limited shelf life.

This ended the review lectures, and the sample preparation classes for NGS began.


Portal "Eternal youth" http://vechnayamolodost.ru

