19 June 2012

Cloud bioinformatics for "-omics"

Analysis in the clouds

Marina Astvatsaturyan, Mednovosti

On July 1, 2012, DELSA Global, a new global alliance aimed at improving the efficiency of data use in biomedicine and the life sciences, is launching several large-scale scientific and applied projects, the DELSA Endorsed Projects.

These are international projects in which huge amounts of data stored in online cloud storage will be exchanged, analyzed and distributed. One of these projects is based on proteomics, a science that studies proteins and their interactions in various organisms and habitats, including humans. Biomedicine has high hopes for proteomics.

Proteomics, etc.

Proteomics research is currently being conducted in many scientific centers around the world, each of which applies its own approach. As a result, comparing raw data obtained in different places and stored on local hard drives often runs into fundamental difficulties.

As one of the experts, Eugene Kolker, co-founder and President of DELSA Global, Chief Data Officer at Seattle Children's Hospital and a professor of biomedical informatics at the University of Washington, explains, "it's like comparing apples and oranges." However, data that is freely available on the web and appropriately attributed can become comparable and, therefore, informative for a wide range of specialists.

To develop a simple user interface to proteomics databases, it is necessary to know in which organism, in which tissues and under what conditions a particular protein is expressed. To meet this need, Eugene Kolker's laboratory created a unique, publicly available protein database: MOPED, the Model Organism Protein Expression Database, which is used by more than 2,000 laboratories around the world. By comparing their own data with MOPED data, which cover proteins of different organisms, including humans, users obtain a statistically reliable result. In particular, when they detect a protein associated with a particular disorder, they can use a tool like MOPED to tell a genuinely new protein from one already discovered by other researchers.
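The comparison described above, checking one's own findings against a reference database to separate already known proteins from new ones, can be sketched as a simple set lookup. This is only an illustration: the function name and the identifiers are hypothetical placeholders, the code does not use MOPED's actual interface, and it leaves out the statistical side of the comparison entirely.

```python
def split_known_and_new(observed, reference):
    """Split observed protein identifiers into (known, new), both sorted."""
    reference_set = set(reference)
    unique_observed = set(observed)
    known = sorted(p for p in unique_observed if p in reference_set)
    new = sorted(p for p in unique_observed if p not in reference_set)
    return known, new


# Hypothetical placeholder identifiers, not real database records.
reference_ids = ["PROT_A", "PROT_B", "PROT_C"]
observed_ids = ["PROT_B", "PROT_X", "PROT_C"]

known, new = split_known_and_new(observed_ids, reference_ids)
print("already known:", known)
print("candidates for a new finding:", new)
```

In practice the reference side would be a large shared repository rather than a short list, which is exactly where the storage and access questions raised later in the article come in.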

For example, the creators of the MOPED database themselves, together with colleagues from the University of Pennsylvania, USA, and the University of Strasbourg, France, discovered two protein molecules associated with the development of type 2 diabetes. These turned out to be regulatory proteins that can be used to restore the body's insulin production as needed. One of these proteins is currently undergoing preclinical testing.

The MOPED database is also used by Russian researchers who are members of the DELSA alliance. This is a team of scientists led by Alexander Archakov, Academician of the Russian Academy of Medical Sciences and Director of the V.N. Orekhovich State Research Institute of Biomedical Chemistry of the Russian Academy of Medical Sciences. The team is taking part in the global project that succeeded the Human Genome Project: the Human Proteome Project, in which Russia is responsible for the proteins of the human body encoded by chromosome 18.

One of the new large-scale projects of DELSA Global is the Global Protein Atlas. The members of the international alliance spoke about it for the first time quite recently, at their meeting in May this year in Bethesda, USA. Its purpose is to characterize, on the basis of genomic data, all kinds of protein molecules by a number of parameters: in which tissue a particular protein is expressed, in which disease, in which environment and at what concentration. To implement the project, DELSA participants will use, among other sources, the data of another large-scale project, the Human Microbiome Project, in which researchers found that the human body is inhabited by about 10,000 different species of microbes. Together, these microbes carry about 8 million protein-coding genes. Special conditions will definitely be needed to process and store this information.

Salvation in cloud technologies

Bioinformatics technologies make it possible to answer the questions of where to store, how to store and how to provide access to the data produced by proteomics, genomics, microbiomics and the other so-called "-omics".

"If this data is stored on a computer to which there is no access at all, or to which there is access but at an inadequate speed, then no one will be able to analyze it, and from the taxpayer's point of view this is work that goes nowhere," says Andrey Lisitsa, corresponding member of the Russian Academy of Medical Sciences and Deputy Director for Research at the V.N. Orekhovich State Research Institute of Biomedical Chemistry of the Russian Academy of Medical Sciences.

"A person generates terabytes of data, from which material can be collected for one or two articles in the particular field in which he is a specialist, but everything else turns out to be closed to the scientific community. Therefore, the condition for any highly effective experiment now is the placement of data in so-called public repositories," the scientist believes.

Eugene Kolker expresses a similar point of view: "From 20 to 40 thousand laboratories around the world produce data measured in exabytes (an exabyte is a unit of information equal to 10^18 or 2^60 bytes), of which only 10 percent, no more, is used. The rest is not used because it is not posted anywhere and is unavailable. Clearly something needs to be done differently. And who can do something reasonable with a huge amount of data that does not lie anywhere? Companies like Google, like Amazon, like Yandex, like the Chinese search engine Baidu. They have other technologies, they have distribution centers, and recently they have become extremely efficient in using cloud technologies. These companies are able to analyze data, although biological data is very diverse; this is not an analysis of data about our purchases and trips," explains Kolker.

The public repositories that Andrey Lisitsa is talking about provide so-called cloud services, or cloud technologies. They provide ubiquitous and convenient on-demand network access to a shared data pool. "Clouds are a very reasonable partner for scientific research," Kolker believes.
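The two figures given for an exabyte in the quote above, 10^18 and 2^60 bytes, are not the same number. The short sketch below only does the arithmetic to show how far apart they are; the variable names are illustrative. Strictly speaking, 2^60 bytes is the binary unit usually called an exbibyte.

```python
# Decimal exabyte versus the binary unit (exbibyte), both in bytes.
EXABYTE = 10**18
EXBIBYTE = 2**60

difference = EXBIBYTE - EXABYTE
relative = difference / EXABYTE  # how much larger the binary unit is

print(f"10^18 = {EXABYTE:,} bytes")
print(f"2^60  = {EXBIBYTE:,} bytes")
print(f"2^60 is larger by {relative:.1%}")
```

At the scale of storage discussed here the gap is noticeable: the binary figure exceeds the decimal one by roughly 15 percent.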

The further development of proteomics depends on improving bioinformatics methods and on building cluster computing systems that run complex data processing algorithms.

However, no manipulation of data will by itself reveal the meaning hidden in it without human participation. Andrey Lisitsa calls the opposite view a wonderful illusion: "On the left we have a repository in which these data are piled up, on the right we have a powerful computing cluster; the 'tadpoles' upload algorithms that they have first developed, the cluster takes the data, puts it through a meat grinder and gives us answers to the fundamental question of how life is organized." So far things are not that simple, and the human scientist cannot be excluded from this process at any stage.

Portal "Eternal youth" http://vechnayamolodost.ru 19.06.2012
