14 November 2022

Healing algorithms

How Artificial Intelligence Revolutionized Biomedicine

Daniil Kuznetsov, Naked Science

In movies, artificial intelligence and robots are usually portrayed as insidious and malicious, and almost never as treating deadly diseases or rejuvenating the human body. Yet biomedicine is one of the most important applications of AI. Over the past five years it has seen many impressive breakthroughs, and AI could already help millions of people today. However, the conservatism of many old-school doctors and their distrust of new technologies hold back the widespread adoption of such systems. What discoveries has artificial intelligence made in molecular biology, and how will they affect cancer treatment and life extension?

AI looks inside the cell

Molecular biology has long been a "wet" science: scientists had to work mainly in laboratories, dripping solutions and drugs into test tubes.

A new era began in 1990 with the launch of the Human Genome Project. The more than 30 years since then have been marked by several key trends.

The first is the development of sequencing technologies, that is, "reading" the sequence of nucleotides (the elementary letters in the code of DNA and RNA molecules), together with a steady fall in their cost. This is no exaggeration: it took 13 years and about three billion dollars to obtain the sequence (the decoded "text") of the first complete human genome, or almost six billion in today's money. Today anyone can have the same done in a week or two for between 600 and 1,000 dollars.

The second is the advent of the "era of epigenetics". Although this science is about a century old, its heyday, and with it a paradigm shift in our understanding of heredity, also came after 1990. It became clear, again largely thanks to sequencing, that what matters is not so much which genes and mutations are present in an organism's genome as which of them are active at a given moment, how, and why.

The third trend, which unites and builds on the first two, is the emergence and rise of all kinds of "omics". The central dogma of molecular biology says that genetic information is realized along the path from DNA through RNA to proteins.
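The path from DNA through RNA to protein can be illustrated with a toy sketch. The Python below is only an illustration, not a bioinformatics tool: the function names, the example sequence and the five-codon subset of the standard genetic code are all chosen for this example, and the input is assumed to be the coding strand of a gene.

```python
# Toy illustration of the DNA -> RNA -> protein flow.
# Only five codons of the standard genetic code are included,
# just enough to translate the example sequence below.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the start codon
    "GCC": "A",  # alanine
    "AAA": "K",  # lysine
    "UGG": "W",  # tryptophan
    "UAA": "*",  # stop codon
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCAAATGGTAA")
protein = translate(mrna)
print(mrna, protein)
```

In these terms, the genome is the full DNA text, the transcriptome is the set of mRNA strings present at a given moment, and the proteome is the set of resulting amino acid chains.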

All the genes in our DNA together form the genome. The set of RNAs produced from all currently expressed (active) genes is the transcriptome. All the proteins synthesized from mRNA make up the proteome. All the signaling pathways in a cell in which the expressed proteins take part form the interactome, and all the metabolite molecules form the metabolome. It is also important that proteins must not only be synthesized but also undergo folding, that is, packing into a characteristic three-dimensional structure on which their properties depend.

"Omics" have generated huge amounts of data in molecular biology. In the new era, the key researchers in the field have become "dry" bioinformaticians, specialists in the analysis of large omics data sets. Often these people have never set foot in a laboratory, but they are well versed in processing data sets and finding patterns in them. One of the best methods for this is machine learning, and big data is the main fuel for artificial intelligence systems. As a result, AI quickly became both a widespread research method in biology and an applied technology that turns scientific discoveries into useful medical products for patients and doctors.

AI defeats cancer

While the genome is a relatively stable characteristic of the cells of our body (allowing for the mutations that may occur in it), all the other "omes" change depending on the type of cell, tissue and organ, the state of the body, environmental factors, and even psychological stress.

For example, in everyday usage cancer is thought of as a single disease. Strictly speaking, doctors apply the term "cancer" only to malignant tumors of epithelial tissue, the carcinomas. However, malignant neoplasms can arise in other tissues too: bone, connective or muscle tissue (sarcomas), nervous tissue (gliomas), cells of the lymphatic system (lymphomas), blood and bone marrow (leukemias), and so on.

But even that is not the main point. Solid tumors of the same type in the uterus or breast of two women may present with exactly the same symptoms, yet differ dramatically in their genomic, transcriptomic and proteomic profiles. So if both are treated with the same standard methods, therapy may work in one case and fail in the other.

Omics data and artificial intelligence have opened the way to personalized, precision medicine, which treats not a disease in general but a specific patient and the particular form of pathology he or she has, based on the patient's unique profile of active genes and expressed proteins here and now.

An excellent example of the precision approach, with AI used both for research and for individual diagnosis and selection of the most effective treatment, is the work of the Russian biomedical startup Oncobox, a resident of the Skolkovo Foundation. One of the company's co-founders and its Director of Science is Anton Buzdin, Doctor of Biological Sciences from IBH RAS, and its researchers include leading Russian onco-bioinformaticians from MIPT and Sechenov University.

There are over 160 targeted drugs for the treatment of solid tumors. Each acts on its own specific molecular targets in cancer cells, which is why effectiveness differs between groups of patients. Oncobox has developed a dedicated diagnostic test to support a well-founded choice of targeted drug for each patient.

It includes whole-exome next-generation sequencing (NGS) of tumor biomaterial obtained by needle biopsy or after surgical removal of the tumor. Such sequencing makes it possible to "read" over 22,500 protein-coding genes and to identify in them all the leading ("driver") mutations that can cause a tumor to develop in the patient.

Next comes the determination of the tumor mutational burden (the number of mutations per million nucleotides) and a transcriptomic analysis of gene activity based on mRNA expression levels. This stage reveals differences in gene expression between tumor and normal tissue. The transcriptomic data show which genes are suppressed and which are active and could serve as targets for targeted drugs.
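The two quantities in this step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Oncobox's actual pipeline: the function names, gene labels, expression values, the pseudocount and the cutoff of one log2 unit are all hypothetical.

```python
import math


def tumor_mutational_burden(mutation_count: int, sequenced_bases: int) -> float:
    """Mutations per million sequenced nucleotides."""
    if sequenced_bases <= 0:
        raise ValueError("sequenced_bases must be positive")
    return mutation_count / (sequenced_bases / 1_000_000)


def log2_fold_change(tumor: float, normal: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of tumor to normal expression.

    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    return math.log2((tumor + pseudocount) / (normal + pseudocount))


# Hypothetical numbers for illustration only.
tmb = tumor_mutational_burden(mutation_count=300, sequenced_bases=30_000_000)

# gene label -> (expression in tumor, expression in normal tissue)
expression = {"GENE_A": (255.0, 31.0), "GENE_B": (7.0, 63.0), "GENE_C": (15.0, 15.0)}
changes = {gene: log2_fold_change(t, n) for gene, (t, n) in expression.items()}

# Hypothetical cutoff: more than a twofold change in either direction.
active = [g for g, c in changes.items() if c > 1.0]
suppressed = [g for g, c in changes.items() if c < -1.0]
print(tmb, active, suppressed)
```

Genes in the `active` list are the kind that could be considered as candidate drug targets. A real analysis would also need normalization and statistical testing, which this sketch omits.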

The test is completed by two pieces of the Russian company's own know-how. The first is interactomic analysis, in which bioinformatic algorithms determine the changes in molecular pathways specific to a particular tumor and model how most of the antitumor drugs on the market would act on them. In the second, final step, artificial intelligence combines the genomic, transcriptomic and interactomic data to build an individual effectiveness rating of more than 160 targeted drugs.

The attending physician should pay attention to the top 5-10 positions in the rating. These often include both conventional drugs used in the "gold standard" therapy for the given tumor type and completely unexpected ones. To simplify greatly: the patient may have ovarian cancer, while the system recommends a drug for lung cancer.

The problem is that old-school clinicians usually refuse to prescribe such medications because they are not included in the standard guidelines. Behind this lie not only prejudice and a poor understanding of how modern precision medicine and artificial intelligence work, but also certain legal concerns. In the late stages of cancer, however, doctors can prescribe drugs off-label (non-standard, including experimental ones), and patients often respond well to such therapy. Nevertheless, the question of whether oncologists will trust a "second opinion" from AI, and whether they can prescribe these drugs on its basis, remains open.

AI overcomes aging

The science of life extension (longevity science) is now also hard to imagine without artificial intelligence.

For instance, Alexander Zhavoronkov, a former visiting professor at MIPT and head of the bioinformatics laboratory of the FNCC DGOI, launched the startups Insilico Medicine and Deep Longevity, where deep learning is used to search for the means of "eternal youth". Even Kai-Fu Lee, the well-known AI visionary and evangelist from China, who has about 70 million followers on social networks, has invested in these companies.

Deep learning models are actively used to identify biological markers that could serve as objective indicators of age. The sets of such indicators found by a neural network are called deep aging clocks (DACs). They come in very different types: genetic, epigenetic, proteomic and psychological (based on questionnaire responses), as well as clocks based on a general blood test, electrocardiography and encephalography, or even just photographs of the face.

Deep Longevity alone has identified about 17 DACs. For example, a team of scientists led by Zhavoronkov used machine learning to study the transcriptomes of skeletal muscle cells. By tracking age-related changes in gene activity, they were able to show that the main role in aging is played by genes involved in maintaining the balance of calcium ions and in a number of intracellular signaling pathways, including interaction with neurotransmitters.

Any of the DACs can point to a biological target for an anti-aging drug, and neural networks help in the search for such drugs too. They screen pharmacological databases that hold information on the properties of millions of known molecules. By comparing these molecules and their many combinations, AI identifies candidate substances that could act on a particular biological target. Neural networks can also predict which substances already used in pharmacology may have an unexplored "anti-aging" effect, and which chemical modifications would be needed to enhance it.

As a result, thanks to AI, molecular screening, which used to require many real, resource-intensive experiments, has become a task solved by relatively short computations in silico ("in silicon"), that is, on a computer using machine learning. Generative adversarial networks (GANs), in which two neural networks compete within one model (roughly speaking, the first proposes solutions and the second rejects them), can even generate candidate molecules with the desired structure and functions "from scratch".

The best-known such models are SeqGAN, RANC and ATNC. In 2017, Alexander Zhavoronkov's Insilico Medicine also presented its druGAN model, capable of generating small compounds with a predetermined ability to target cancer tumors.

AI predicts DNA

Over the past two years, huge breakthroughs have occurred in natural language processing (NLP).

Generative language models have advanced greatly: GPT-3 and LaMDA for English, Sber's ruGPT-3 and Yandex's YaLM 100B for Russian, and the multilingual BLOOM and mGPT. All of them can work not only with natural languages but also with other sign systems: programming languages, musical notation, mathematical expressions, and so on.

But the DNA code is also a kind of "language", or at least a sign system with its own alphabet, its own rules for combining letters into "words", and its own grammar of "expressions". In many ways this is of course a metaphor, but a productive one, because the decoded human genome can be handled as text using modern NLP models.

In the spring of 2022, scientists from the Bioinformatics research group at the Russian institute AIRI (Artificial Intelligence Research Institute) made a world-class breakthrough. They presented GENA-LM, a transformer language model that was the first to be trained on the latest T2T-CHM13 data set, which contains the most complete information on the human DNA sequence to date.

The point is that the Human Genome Project, completed in 2003, did not sequence the whole genome but only 85% of it: the so-called euchromatin, that is, the genes themselves and the regions between them. The remaining, auxiliary part, the heterochromatin, was fully deciphered only in the spring of 2022.

In GENA-LM, an encoder converts input sequences into vector representations, which a decoder then works with. The developers supplemented this system with the BigBird attention mechanism, which makes processing of particularly long sequences more efficient. During training, the model's task was to predict the hidden 15% of a sequence from the visible 85%.
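The training task of hiding 15% of a sequence and predicting it from the rest can be sketched as follows. This shows only the masking step, under stated assumptions: the split into fixed 3-letter tokens, the `[MASK]` symbol, the function name and the toy sequence are hypothetical and do not reproduce GENA-LM's actual tokenizer or training code.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_fraction=0.15, seed=0):
    """Hide a fraction of tokens.

    Returns the masked sequence and a dict {position: original token}
    that a model would be trained to predict.
    """
    rng = random.Random(seed)
    n_masked = max(1, round(len(tokens) * mask_fraction))
    positions = sorted(rng.sample(range(len(tokens)), n_masked))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets


# Toy DNA string of 60 letters, split into 20 three-letter tokens
# (a simplifying assumption made for this sketch).
sequence = (
    "ATGGCCAAATGGTAA"
    "GCTTACGGATCCGAT"
    "TACAGGCATGCAAGC"
    "TTGGCACTGGCCGTC"
)
tokens = [sequence[i:i + 3] for i in range(0, len(sequence), 3)]

masked, targets = mask_tokens(tokens)
print(" ".join(masked))
print(targets)
```

During training, the model sees `masked` and is scored on how well it recovers the tokens stored in `targets`.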

A language model that "understands" the hidden patterns in the human DNA sequence will help researchers better understand how DNA works and what dangerous disruptions can arise in it. With GENA-LM it is already possible to find sites that activate or, conversely, suppress individual genes and entire gene cascades. All of this will also help advance precision diagnostics and therapy.

AI folds proteins

No discussion of the impact of AI on molecular biology can bypass DeepMind's famous AlphaFold 2 transformer model. Introduced at the end of 2020, by July 2022 it had generated three-dimensional structures for more than 200 million proteins: in the developers' own words, "the whole protein universe."

And this is no empty claim. The publicly available data set includes proteins of archaea and bacteria, plants, fungi and animals, that is, of all four kingdoms of living organisms distinguished by biologists.

Proteins are the key molecules of life. They are encoded in DNA sequences, but their properties and functions are largely determined by their complex spatial shape. That shape arises during folding and depends on the sequence of amino acids that make up the protein, on the conditions under which the chain folds, and on a number of other factors.

Before the advent of artificial intelligence, determining how a protein works required establishing its structure experimentally. This took a great deal of effort and resources, and a single study could last more than a year. Even so, over several decades scientists around the world managed to collect structural data on almost 200 thousand proteins.

This data set was used to train the AlphaFold 2 model. As a result, in a year and a half AI exceeded by three orders of magnitude what the world's entire community of molecular biologists had achieved over the whole time of its existence.
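The "three orders of magnitude" figure follows from the two numbers quoted in the text, as a quick check shows. The variable names are chosen for this example, and the rounded counts of 200 thousand and 200 million stand in for "almost 200 thousand" and "more than 200 million".

```python
import math

experimental_structures = 200_000    # collected experimentally over decades
predicted_structures = 200_000_000   # generated by AlphaFold 2 by July 2022

ratio = predicted_structures / experimental_structures
orders_of_magnitude = math.log10(ratio)
print(ratio, orders_of_magnitude)
```

A ratio of about a thousand corresponds to three powers of ten.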

Instead of a conclusion

AI has radically changed biological science, and a firework display of discoveries is under way in the laboratories of institutes and universities and in the R&D departments of private companies. But while the medicine of the first third of the 21st century is already visible there, in the real health systems of different countries we are seeing, at best, the end of the 20th.

"Each area of AI application in biology gives rise to a whole field of application in practical health care. The task of the biomedical cluster of the Skolkovo Foundation, at the stage when an understanding of the practical use of a particular concept is taking shape, is to support the team so that the technology reaches the market and can prove its viability. This involves extensive, time-consuming work with the models by which health systems operate in different countries and with the life sciences industry as a whole. Only the education of medical workers in the field of AI and the organic transformation of state regulatory policies, standards and the legislative framework can change the situation," says Sergey Voinov, Director of Acceleration for Digital Medicine at the biomedical cluster of the Skolkovo Foundation.

Portal "Eternal youth" http://vechnayamolodost.ru
