25 November 2021

The doctor from the machine

"Artificial intelligence in medicine"

N+1

Machine learning is increasingly being used in medicine. In the foreseeable future, algorithms will not replace doctors, but they will help them with routine work and compensate for human shortcomings: people get tired, grow lazy and look for ways to simplify their work. In the book "Artificial Intelligence in Medicine: How Smart Technologies Change the Approach to Treatment" (Alpina Publisher), translated into Russian by Alexander Anvaer, professor of molecular medicine, cardiologist and researcher Eric Topol describes the algorithms that are changing modern diagnostics and treatment. N+1 invites its readers to read an excerpt explaining how machine learning simplifies the study of the genomic basis of disease.


The most important discoveries

The huge amounts of data available today in biology and medicine urgently call for machine learning and artificial intelligence. Take, for example, The Cancer Genome Atlas (TCGA), which contains multidimensional biological data covering a variety of "-omics": genomics, proteomics, and so on. In total, the atlas holds more than 2.5 petabytes of information extracted from data on more than 30 thousand patients. No single person can view and analyze all of it. Oncologist Robert Darnell, who now works in neuroscience at Rockefeller University, put it this way: "We, as biologists, can only point, for example, at the biological basis of autism. The power of a machine that can ask a trillion questions where we only have time to ask ten changes the rules of the game."

However, unlike the tangible, visible changes already being felt from the use of artificial intelligence in branches of medicine such as radiology and pathological anatomy (that is, wherever recognition of complex images is required), science stands somewhat apart: artificial intelligence does not yet encroach on the status of scientists; for now it can only help them. As Tim Appenzeller put it in an article for the journal Science, artificial intelligence is still the scientists' "apprentice." But it can already offer them very tangible help: the cover of a 2017 issue of Science announced that "artificial intelligence transforms science." It turns out that AI has not only "spawned neuroscience" (as we will soon see for ourselves) but has also "reset the process of discovery." In fact, Science saw something genuinely new beyond the horizon, "the prospect of fully automated science," which, according to the authors of the article, meant that "the tireless apprentice may very soon become an equal colleague."

AI "colleague" is, in my opinion, a matter of the rather distant future, but its penetration into science is happening at a rapid pace, regardless of whether it will ever be able to displace scientists. And indeed, AI in the application to biological sciences is developing faster than in the application to healthcare. After all, basic science data does not always require validation based on clinical trials. Fundamental science does not need approval from the medical community, it does not need to be put into practice, it does not have to comply with the strict requirements of regulatory legislation. However, despite the fact that science is not always able to break into clinical practice, ultimately all advanced achievements — whether it is the discovery of new, more effective drugs or the identification of biochemical mechanisms responsible for health and disease — will somehow affect medical practitioners. Let's see what our "apprentice" has achieved.

Biological "-omics" and cancer

In genomics and biology, artificial intelligence is an indispensable partner of scientists, since machines have vision capable of distinguishing things inaccessible to the human eye and sifting through huge amounts of data incomprehensible to the human mind.

Data-rich genomics is an ideal field for the application of computational methods. Each of us is a treasure trove of genetic data: we each carry a diploid chromosome set (one copy from the father and one from the mother), and the genome contains 3.2 billion pairs of nucleotides drawn from four letters, A (adenine), C (cytosine), G (guanine) and T (thymine), of which 98.5 percent do not encode any proteins. More than ten years after the complete decoding of the human genome, the function of all this material remains unclear. One of the first attempts to apply deep learning to the genome, DeepSEA, was devoted to elucidating the function of elements that do not participate in protein coding. In 2015, Jian Zhou and Olga Troyanskaya of Princeton University published an algorithm that, after training on catalogued data for tens of thousands of non-protein-coding sequences, proved able to predict exactly how DNA sequences interact with chromatin. Chromatin consists of large macromolecules that provide the "packaging" of DNA for storage and also help unwind the strand for transcription into RNA and, ultimately, translation into protein. The interaction between chromatin and DNA sequences therefore plays an important regulatory role. Xiaohui Xie, a computer scientist at the University of California, Irvine, called it "an important milestone on the path of applying deep learning to genomics."
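
To give a sense of what such a model looks like in code, here is a minimal sketch of a DeepSEA-style setup (a toy written in PyTorch with invented layer sizes and a random input, not the published architecture): a DNA window is one-hot encoded over the four bases and passed through a one-dimensional convolutional network that outputs, in one pass, probabilities for hundreds of chromatin features.

```python
import random
import torch
import torch.nn as nn

BASES = "ACGT"

def one_hot(seq: str) -> torch.Tensor:
    """Encode a DNA string as a 4 x L tensor, one channel per base."""
    idx = torch.tensor([BASES.index(b) for b in seq])
    x = torch.zeros(4, len(seq))
    x[idx, torch.arange(len(seq))] = 1.0
    return x

class ChromatinNet(nn.Module):
    """Toy 1D CNN mapping a sequence window to many chromatin-feature probabilities."""
    def __init__(self, window: int = 1000, n_features: int = 919):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(4, 64, kernel_size=8), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(64, 128, kernel_size=8), nn.ReLU(), nn.MaxPool1d(4),
        )
        with torch.no_grad():  # infer the flattened size with a dummy pass
            flat = self.conv(torch.zeros(1, 4, window)).numel()
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(flat, n_features), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.conv(x))

# Toy usage: score one random 1,000-bp window for all chromatin features at once.
window = "".join(random.choice(BASES) for _ in range(1000))
model = ChromatinNet()
probs = model(one_hot(window).unsqueeze(0))   # shape (1, 919), values in [0, 1]
```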

Another proof of this concept, one of the first, came from a study of the genetic basis of autism spectrum disorders (ASD). Before that study, only 65 genes had been linked to ASD with a high degree of confidence. The algorithms made it possible to identify 2,500 genes that likely contribute to the symptoms of ASD or are even its root cause. The algorithms even made it possible to map the interactions of the genes involved.

Deep learning also helps to solve the fundamental problem of interpreting variants found in complete human genome sequences. The most widely used program for this is the Genome Analysis Toolkit (GATK). At the end of 2017, Google Brain developed and released the DeepVariant system to complement GATK and other previously developed tools. DeepVariant does not use a statistical approach to identify mutations and errors or to calculate the probability that a given combination of nucleotides is real or erroneous. Instead, the system turns reference genome data into visualizations known as "pileup images," uses them to train a convolutional neural network, and then builds the same kind of visualizations for newly sequenced genomes in which scientists want to identify variants. Unfortunately, although DeepVariant is publicly available, it is difficult to use: it demands massive computation and puts a much greater load on the processor than GATK.
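
The trick of recasting variant calling as image classification can be sketched roughly as follows (a deliberately simplified toy, not DeepVariant itself; the encoding and network sizes are invented): reads overlapping a candidate site are stacked into a small multi-channel "image," and a convolutional network classifies the site as homozygous reference, heterozygous, or homozygous variant.

```python
import numpy as np
import torch
import torch.nn as nn

BASE_CODE = {"A": 1, "C": 2, "G": 3, "T": 4}

def pileup_image(reads, quals, width=15, max_reads=20):
    """Stack aligned read fragments around a candidate site into a
    (2, max_reads, width) array: channel 0 encodes the base, channel 1 the quality."""
    img = np.zeros((2, max_reads, width), dtype=np.float32)
    for r, (read, qual) in enumerate(zip(reads[:max_reads], quals[:max_reads])):
        for c, (base, q) in enumerate(zip(read[:width], qual[:width])):
            img[0, r, c] = BASE_CODE[base] / 4.0   # crude base encoding
            img[1, r, c] = q / 60.0                # scaled base quality
    return torch.from_numpy(img)

class GenotypeCNN(nn.Module):
    """Small 2D CNN classifying a pileup image as hom-ref / het / hom-alt."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 3),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Toy usage: two reads that agree with the reference at every position.
reads = ["ACGTACGTACGTACG", "ACGTACGTACGTACG"]
quals = [[40] * 15, [38] * 15]
img = pileup_image(reads, quals).unsqueeze(0)               # shape (1, 2, 20, 15)
genotype_probs = torch.softmax(GenotypeCNN()(img), dim=1)   # shape (1, 3)
```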

Determining the potential pathogenicity of a detected variant is not an easy task, and if the variant lies in a part of the genome that does not encode proteins, the matter becomes even more confusing. And although more than ten AI algorithms now target this problem, identifying the genome variants that cause disease remains the most important unsolved problem. The Princeton team mentioned above has taken another important step forward in the application of deep learning to genomics by beginning to predict the effect of variants in non-coding genome elements on gene expression and disease risk. A team of scientists at Illumina applied deep learning to primate genomes to improve the accuracy of predicting disease-causing mutations in the human genome.
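
One common way such models are turned into variant-ranking tools is "in silico mutagenesis": the model scores the window containing the reference allele, then scores the same window with the alternative allele substituted in, and a large predicted change is taken as a hint of functional impact. Here is a hedged sketch of that comparison, written so it could reuse the toy ChromatinNet above as the scoring function:

```python
def variant_effect(score_fn, window: str, pos: int, alt: str) -> float:
    """Largest predicted shift in any regulatory feature when one base is substituted.
    score_fn maps a DNA string to a tensor of predictions (e.g. the toy model above).
    This illustrates the idea only; it is not a validated pathogenicity score."""
    ref_scores = score_fn(window)
    mutated = window[:pos] + alt + window[pos + 1:]
    alt_scores = score_fn(mutated)
    return (alt_scores - ref_scores).abs().max().item()

# Toy usage with the sketch above:
#   effect = variant_effect(lambda s: model(one_hot(s).unsqueeze(0)), window, 500, "T")
```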

Genomics (the study of DNA) is not the only "-omics" ripe for machine and deep learning. Deep learning is already being applied to every level of biological information, including data on gene expression, transcription factors and RNA-binding proteins, proteomics and metagenomics (in particular, the gut microbiome), as well as data on individual cells. DeepSequence and DeepVariant are artificial intelligence tools that help to understand the functional effect of mutations and to accurately identify genome variants, respectively, and they perform these tasks better than all previous models. The DeepBind algorithm is used to predict the functional activity of transcription factors. The DeFine program quantifies the binding of transcription factors to DNA and helps assess the pathogenic role of sequence variants in regions of the genome that do not encode proteins. Work has been done to predict the specificity of DNA- and RNA-binding proteins, to identify sequences encoding particular protein scaffolds from sequences of amino acid residues, and to determine DNase I hypersensitivity in many cell types. Epigenomes have been analyzed with the DeepCpG algorithm, which can predict the degree of base methylation in individual cells. With its help, DNA binding sites in chromatin and methylation sites have also been predicted, and deep neural networks have been refined for the especially complex analysis of nucleotide sequence data from the RNA of individual cells. Within the different "-omics" and in the spaces between them, the number of interactions seems endless, and scientists are increasingly using machine learning to understand and evaluate the myriad ways genes interact within a single cell.
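
As a flavour of the single-cell methylation task that DeepCpG addresses, here is a deliberately simplified sketch (scikit-learn, synthetic labels, bag-of-k-mers features instead of the tool's actual neural architecture): predict whether a given CpG site is methylated in a cell from the DNA sequence flanking it.

```python
from itertools import product
import numpy as np
from sklearn.linear_model import LogisticRegression

BASES = "ACGT"
KMERS = {"".join(p): i for i, p in enumerate(product(BASES, repeat=3))}

def kmer_features(flank: str) -> np.ndarray:
    """Bag-of-3-mers encoding of the sequence flanking a CpG site."""
    x = np.zeros(len(KMERS))
    for i in range(len(flank) - 2):
        x[KMERS[flank[i:i + 3]]] += 1
    return x

# Synthetic stand-in data: 200 flanking sequences and observed methylation calls (0/1).
rng = np.random.default_rng(0)
flanks = ["".join(rng.choice(list(BASES), 50)) for _ in range(200)]
labels = rng.integers(0, 2, size=200)

X = np.stack([kmer_features(f) for f in flanks])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
p_methylated = clf.predict_proba(X[:1])[0, 1]   # probability the first site is methylated
```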

The application of AI to genome editing has particularly impressive prospects. Microsoft Research has developed an algorithmic tool called Elevation that has proved capable of predicting ineffective substitutions across the human genome when attempting to edit it: it thus helps to choose the optimal places for editing sections of DNA and to design guide RNAs for CRISPR editing (the abbreviation stands for "clustered regularly interspaced short palindromic repeats," fragments of DNA). The algorithm has outperformed other CRISPR algorithms created using deep learning. Such algorithms not only increase the accuracy of results in experimental biology but also play a key role in the many clinical trials that already use the CRISPR system for genome editing (in diseases such as hemophilia, sickle cell anemia and thalassemia).
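
The guide-design part of this workflow is easy to picture: scan a target region for 20-nucleotide spacers followed by an NGG PAM sequence, score each candidate, and keep the best ones. The sketch below uses a made-up scoring rule in place of a trained model such as Elevation, purely to show the shape of the pipeline.

```python
import re

def candidate_guides(target: str):
    """Yield (position, spacer) pairs: 20-nt spacers followed by an NGG PAM, forward strand only."""
    for m in re.finditer(r"(?=([ACGT]{20})[ACGT]GG)", target):
        yield m.start(1), m.group(1)

def toy_score(spacer: str) -> float:
    """Placeholder efficiency score (a real tool would apply a trained model
    over sequence features); here we simply prefer moderate GC content."""
    gc = (spacer.count("G") + spacer.count("C")) / len(spacer)
    return 1.0 - abs(gc - 0.5)

target = "AAAAGACCTGAATCGGCATTACCATGGAAAA"   # toy sequence with one NGG site
ranked = sorted(candidate_guides(target), key=lambda g: toy_score(g[1]), reverse=True)
for pos, spacer in ranked:
    print(f"guide at position {pos}: {spacer}  score={toy_score(spacer):.2f}")
```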

It is probably not surprising, then, that image recognition has come to play a central role in cellular analysis (especially since this is one of deep learning's greatest strengths): sorting cells by shape, classifying cell types, determining their origin, identifying rare cells in the blood, or distinguishing living cells from dead ones. The inner workings of cells are the focus of DCell, a deep learning algorithm that predicts cell growth, gene interactions and other functions.

Cancer is a genomic disease, so it is not surprising that oncology benefits especially from the introduction of artificial intelligence. Beyond help with interpreting DNA sequence data from tumor cells (which has been done for glioblastoma, a malignant brain tumor), we have gained new tools for understanding the origin and biophysics of malignant neoplasms.

DNA methylation data from malignant tumors have proved very useful in one application of AI in oncology: tumor classification. Pathologists traditionally diagnose brain tumors from histological preparations. This diagnosis is quite difficult: there are many rare forms of cancer that create big problems for a pathologist who has not seen them before; tumor tissue is a mosaic of cells of different types; and a biopsy, as a rule, does not capture all the cell types present in the tumor. In addition, the visual evaluation of a preparation is inevitably subjective. In 2018, David Capper and his colleagues at the Charité hospital in Berlin studied genome-wide methylation in tumor samples: their study showed an accuracy of about 93 percent in classifying all 82 types of malignant brain tumors, significantly exceeding the results of pathologists. The machine-determined degree of DNA methylation led to a revised classification for more than 70 percent of human-labeled tumors, which means a change in the predicted disease outcome and in treatment tactics. These data will find wide application both in biological cancer research and in clinical practice.
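
Conceptually, the classification step is a standard supervised-learning problem: each tumor is represented by its methylation levels at many CpG sites, and a classifier trained on reference cases assigns a new sample to a tumor class with an associated confidence. Below is a generic sketch on synthetic data (a random forest over made-up "beta values"; not the Capper et al. pipeline).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: methylation "beta values" (0..1) at 2,000 CpG sites
# for 400 reference tumors spread over 10 made-up tumor classes.
rng = np.random.default_rng(42)
n_samples, n_cpgs, n_classes = 400, 2000, 10
y = rng.integers(0, n_classes, size=n_samples)
class_profiles = rng.random((n_classes, n_cpgs))   # each class gets its own profile
X = np.clip(class_profiles[y] + rng.normal(0, 0.1, (n_samples, n_cpgs)), 0, 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# For a new biopsy the classifier returns class probabilities; low-confidence
# calls can be flagged for conventional pathological review.
probs = clf.predict_proba(X_test[:1])[0]
print("predicted class:", probs.argmax(), "confidence:", round(probs.max(), 2))
print("held-out accuracy:", clf.score(X_test, y_test))
```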

With the help of artificial intelligence, we have learned a lot about the evolution of cancer. Using transfer learning, scientists were able to decipher hidden signals of tumor evolution in 178 patients, which seriously affected these patients' prognoses. In the modern world, overflowing with cheap AI hype, this finding was presented on the front page of the British tabloid Daily Express as a "robot war against cancer." Artificial intelligence tools have also helped to detect oncogenic somatic mutations and to understand the complex interactions among the genes of cancer cells.
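
Transfer learning, the technique mentioned here, means taking a model already trained on a large related dataset and retraining only a small part of it on the limited cohort of interest, which is exactly the regime of a 178-patient study. Below is a generic PyTorch sketch of the pattern (invented sizes and synthetic data, not the cited study's model).

```python
import torch
import torch.nn as nn

# Hypothetical feature extractor pretrained on a large, related dataset;
# the layer sizes here are invented for illustration.
pretrained = nn.Sequential(
    nn.Linear(1000, 256), nn.ReLU(),
    nn.Linear(256, 64), nn.ReLU(),
)
for p in pretrained.parameters():   # freeze the transferred layers
    p.requires_grad = False

head = nn.Linear(64, 2)             # small new head trained on the small cohort
model = nn.Sequential(pretrained, head)

X = torch.randn(178, 1000)          # synthetic stand-in features for 178 patients
y = torch.randint(0, 2, (178,))     # synthetic binary outcome labels

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(50):             # fine-tune only the new head
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
```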

A final clear and instructive example of studying cancer with the help of artificial intelligence is its application to a complex biological system to predict the malignancy of its constituent cells. Using frog tadpoles as a model, scientists injected them with combinations of three reagents to identify the combination that causes melanocytes to become malignant in some tadpoles and leads to the growth of a cancer-like tumor. Although not every tadpole in the population developed a tumor, something else was curious: all the melanocytes of any given tadpole behaved the same way, either all becoming malignant or all developing normally. The scientists then tried to find a combination of reagents that would produce intermediate forms, in which only some of the body's cells become malignant.

After running several calibration experiments, the authors used artificial intelligence models to conduct 576 virtual experiments simulating the embryonic development of tadpoles under various combinations of reagents. All of the simulations but one were unsuccessful. But a needle was found in this haystack: the artificial intelligence algorithms yielded a model that predicted a tumor-like phenotype in which not all cells develop in the same way, and the model was subsequently verified. Daniel Lobo of the University of Maryland, Baltimore County, an author of the study, commented: "Even with a complete model describing the exact mechanism controlling the system, a person could not independently find the exact combination of drugs that would lead to the desired result. This work has served as proof of how an AI system can help us pinpoint the interventions needed to obtain a specific result."

Portal "Eternal youth" http://vechnayamolodost.ru

