02 March 2018

Genomics in the Cloud

Microsoft opens access to cloud tools for genomic research

Press Center

To make discoveries that lead to breakthroughs in treating childhood cancers, researchers around the world need to be able to share genomic data and study it together. That is why, in 2010, computational biologist Jinghui Zhang and colleagues at St. Jude Children's Research Hospital began uploading anonymized genomes of their patients' healthy and cancer cells to public repositories.

"We realized that downloading the data and using it in research is very difficult because of its sheer volume," says Zhang. "That's why our hospital started looking for other ways to share data with the global research community." This led to a collaboration with the genomics group at Microsoft. At that time, Microsoft was starting to build cloud computing resources that would compare billions of fragments of raw genomic data against a reference genome and then determine how the sequenced genome differs from the reference, a process known as alignment and variant calling.

On Wednesday, February 28, Microsoft announced that it was opening access to the Microsoft Genomics service, the result of its work in this area.

Variants are what make individuals unique. They are markers of a wide range of traits, from physical characteristics to susceptibility to disease. To understand what the variants mean, researchers use a method called a genome-wide association study. The more genomic data researchers collect and analyze, the better they can untangle the complex biology behind it and the faster they can move toward treatments for cancer and other diseases.

Dr. Zhang's team is building a pipeline for genome alignment and variant calling together with researchers from Microsoft, using DNAnexus, a secure cloud platform built on Microsoft Azure.

To date, the researchers have processed about 500 terabytes of genomic data and stored it in Azure for further analysis. For comparison, it would take about 750,000 standard CDs to hold 500 terabytes.

The St. Jude genomic data, processed by the pipeline and stored in the cloud, became the basis for the data-sharing platform that the research hospital is building together with DNAnexus and Microsoft. The goal of this effort is to give researchers around the world the opportunity to take part in the search for treatments for childhood cancers, which are diagnosed in approximately 175,000 children under the age of 15 every year.

"The opportunity to conduct experiments with real data together with such researchers is a great success for us," says Geralyn Miller, head of the Microsoft genomic research group.

Easy acquisition of quality data

The Microsoft Genomics service is part of the Microsoft Healthcare NExT initiative, which aims to accelerate innovation in healthcare through artificial intelligence (AI) and cloud computing.

In genomics, the path to these goals begins with reliable and accurate data. "We know that we need high-quality data, and if we make it much easier to obtain such data, then we will be able to transfer biological information to the cloud for analysis and, hopefully, make the work more productive and efficient," says Bob Davidson, senior software architect in the Microsoft genomics group.

Davidson explained that the Microsoft Genomics service is an essential building block for the next generation of AI-based tools that will help make breakthroughs in understanding and effectively treating cancer and other diseases. For example, by analyzing the genomic data of a patient's tumor and healthy tissue, a doctor will be able to choose the most appropriate treatment by comparing the results with data on other cancer patients, including their treatments and outcomes.

Miller notes that a common pipeline for processing genomic data helps reduce noise and distortions that degrade data quality, and get a stronger signal for AI elements of precision medicine.

"We're making this stage publicly available," Miller says. "We want people to be able to easily go through it and get a consistent set of data at the output."

Secondary analysis, the stage of the sequencing workflow in which alignment and variant calling are performed, is an ideal job for the cloud. Making this stage publicly available became feasible as the cost of sequencing a single human genome fell: in 2001 it cost $100 million, while today it costs less than $1,000, comparable to other common medical tests. Experts expect the lower price to drive a sharp increase in demand, with more than 100 million human genomes sequenced by 2025.
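The two halves of secondary analysis (matching reads to a reference, then finding where they differ) can be illustrated with a deliberately tiny sketch. It is not how the Microsoft Genomics service or any production pipeline works: real aligners use indexed references and handle insertions and deletions, and real variant callers are statistical. The function names and the sequences below are hypothetical, chosen only for illustration.

```python
# Toy illustration only: exhaustive, ungapped matching of one short read
# against a short reference string, followed by a naive difference report.

def align_read(read, reference):
    """Return (offset, mismatches) for the placement with the fewest mismatches."""
    best_offset, best_mismatches = -1, len(read) + 1
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    return best_offset, best_mismatches


def call_variants(read, reference, offset):
    """List (position, reference_base, read_base) where the aligned read differs."""
    return [
        (offset + i, reference[offset + i], base)
        for i, base in enumerate(read)
        if base != reference[offset + i]
    ]


reference = "ACGTTAGCCGATAGGCTA"  # hypothetical reference fragment
read = "GCCGTTAG"                 # hypothetical read carrying one substitution

offset, mismatches = align_read(read, reference)
variants = call_variants(read, reference, offset)
print(offset, mismatches, variants)
```

In this toy case the read lands at offset 6 with a single mismatch, which is reported as one A-to-T substitution at reference position 10. A real genome involves billions of such fragments, which is where the compute demand comes from.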

But this creates another problem, one that Microsoft and DNAnexus are prepared to address. Storing a single human genome takes about 100 GB of disk space, and as the number of sequenced genomes grows, gigabytes of data will turn into petabytes and exabytes. By 2025, human genome data is expected to require 40 exabytes of storage. An exabyte is 1,000 petabytes, roughly the capacity of 1.5 billion standard CDs.
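The storage figures in this article can be checked with a few lines of arithmetic. The sketch below assumes decimal units and a 700 MB "standard" CD; the CD capacity is an assumption, since the article does not state it.

```python
# Back-of-the-envelope check of the storage figures quoted in the article.
GB = 10**9
TB = 10**12
PB = 10**15
EB = 10**18
CD_BYTES = 700 * 10**6  # assumption: a 700 MB CD

genome_bytes = 100 * GB       # about 100 GB per human genome (from the article)
projected_bytes = 40 * EB     # 40 exabytes expected by 2025 (from the article)
processed_bytes = 500 * TB    # 500 TB processed so far (from the article)

petabytes_per_exabyte = EB // PB
genomes_in_projection = projected_bytes // genome_bytes
cds_per_exabyte = EB / CD_BYTES
cds_for_processed = processed_bytes / CD_BYTES

print(f"1 EB = {petabytes_per_exabyte} PB")
print(f"40 EB holds {genomes_in_projection:,} genomes at 100 GB each")
print(f"1 EB is about {cds_per_exabyte / 1e9:.2f} billion CDs")
print(f"500 TB is about {cds_for_processed:,.0f} CDs")
```

Under these assumptions an exabyte comes to about 1.43 billion CDs and 500 terabytes to about 714,000 CDs, in line with the rounded figures of 1.5 billion and 750,000 quoted in the text.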

"Genomic data is really big data that requires very intensive calculations," says Miller. Processing of a single human genome takes several hundred hours of processor time. Modern laptops are usually equipped with quad-core processors, whereas hundreds of thousands of processors are available in data centers, which makes genomic data processing "an ideal job for the cloud."

In addition, processing genomic data is subject to a number of legal and ethical requirements intended to protect confidentiality and data security. Microsoft has a network of Azure data centers distributed around the world, and the Microsoft Genomics service is currently available in the United States, Western Europe and Southeast Asia. Microsoft Genomics is ISO certified, confirming that the service complies with certain international standards of security, confidentiality and quality. Microsoft also covers the service under its HIPAA business associate agreement, which obliges companies to handle personal medical data responsibly. The applicable security principles and rules are described on the Microsoft Trust Center website.

Ecosystem of partners

DNAnexus, a genomic data management company, is working with St. Jude Children's Research Hospital to build an Azure-based data-sharing platform. DNAnexus will integrate the Microsoft Genomics service with other genomic data analysis and visualization tools, giving researchers an interface to the tools and datasets and creating a secure ecosystem for collaboration.

"We achieve the greatest success when our scientists solve scientific problems together with our clients' scientists and then move the data onto this platform. They run certain tests, after which the main work begins," says Richard Daly, CEO of DNAnexus. "In this case, our team is actively working with St. Jude and Microsoft to define requirements and build solutions based on them."

Miller, Davidson and their colleagues from the Microsoft Genomic Research Group view the Microsoft Genomics service as the first of many tools to be included in an Azure-based ecosystem that unites all partners, including DNAnexus. For example, as Miller notes, one question facing the St. Jude researchers has not yet been resolved: how to exchange and collaborate on different types of data produced by different organizations using different tools.

"The Microsoft Genomics service is distinguished by its focus on research," says Miller. "We have enough expertise to try new things and implement ideas that have arisen in laboratories."

Portal "Eternal youth" http://vechnayamolodost.ru
