Distributed computing for personalized medicine
The Cancer Institute of New Jersey (CINJ), several US universities and IBM have joined forces to create grid-based information technologies for high-throughput analysis of tissue microarrays in order to improve the accuracy and speed of cancer diagnosis.
Until a few years ago, all patients with the same diagnosis were treated according to the same protocol, for example, a standard chemotherapy regimen. If this did not have the desired effect, patients were switched to another drug, which, if necessary, was replaced by a third, and so on. Doctors are now gradually moving from "trial and error" treatment to personalized medicine: the selection of the optimal treatment method based on the results of genetic tests.
The joint project, which received a $2.5 million grant from the National Institutes of Health last October, combines tissue microarrays, pattern recognition algorithms and grid computing to advance personalized medicine.
According to David Foran, director of the CINJ Center for Biomedical Imaging and Informatics, each microarray used in the work carries samples of various tissue types. The specialists participating in the project have developed software that distinguishes these heterogeneous tissue fragments and detects specific cancer biomarkers in them. A dedicated computer imaging system, also created within the project, determines where the markers are located in a particular tissue and which cellular compartment they belong to (for example, the nucleus or the cytoplasm). All these data are compared with the clinical treatment outcomes of specific patients.
As the proof of concept needed to secure funding for the project, CINJ scientists conducted a retrospective study of tissue samples from more than 100,000 patients with known diagnoses and analyzed the results using specialized software. IBM programmers made these programs accessible over the network and processed the data on the World Community Grid (WCG), an IBM-created distributed computing network that acts as a "virtual supercomputer" designed for this kind of work. Calculations of this scale would take 2,900 years on a conventional desktop computer, while WCG completed them in less than 6 months.
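The two figures quoted above imply a rough lower bound on how much faster the grid was than a single desktop. The sketch below only restates that arithmetic; the function name is illustrative, and the result assumes ideal scaling, which real distributed workloads rarely achieve.

```python
# Rough arithmetic based on the two figures quoted in the article.
DESKTOP_YEARS = 2900          # estimated time on one conventional desktop
GRID_YEARS_UPPER_BOUND = 0.5  # "less than 6 months" on World Community Grid


def min_speedup(serial_years: float, parallel_years: float) -> float:
    """Return the ratio of serial time to parallel time."""
    if parallel_years <= 0:
        raise ValueError("parallel_years must be positive")
    return serial_years / parallel_years


speedup = min_speedup(DESKTOP_YEARS, GRID_YEARS_UPPER_BOUND)
# Because the grid took *less* than 6 months, this is a lower bound.
print(f"Speedup of at least {speedup:,.0f}x over a single desktop")
```

Under these assumptions, the grid did the work of at least several thousand desktops running around the clock.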
Once the analysis was complete, the researchers compared the identified biomarker profiles with the results of diagnosis and treatment. They found pronounced correlations between biomarker expression profiles and various disease types and stages.
The experts now plan to expand the range of diseases studied, enlarge the reference library of known gene expression profiles and create a clinical decision support system that will allow oncologists around the world to download the information contained in the database and compare it with the results of the analysis of their own samples. In the future, calculations will be carried out using caGrid, an open software infrastructure being developed as the core grid architecture of the cancer Biomedical Informatics Grid (caBIG) program funded by the National Cancer Institute (NCI). In addition, IBM is providing a high-performance supercomputer to the new CINJ High-Performance Data Analysis Center for processing information contained in digitized archives of tumor tissue samples and genomic data.
According to Joel Saltz, professor in the Department of Biomedical Informatics and the Department of Computer Engineering and Design at Ohio State University, whose specialists developed most of the caGrid software, one of the project's main objectives is to develop a caGrid-compatible infrastructure for integrating and comparing data from tissue microarrays and virtual samples with data obtained in various experiments.
To make different data sets compatible, mechanisms are needed to standardize the use of biological terms and to translate the complex data structures produced by various experiments into XML. The caGrid infrastructure is designed to integrate databases and computational procedures into a worldwide programming environment. However, to take full advantage of this environment, potential users currently need to know the procedure names and the query language well. Professor Saltz's group is working on these issues, developing standard data models and a system of formalized definitions of biomedical concepts that are consistent with caBIG processes and help avoid the formation of isolated "islands of information".
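As a loose illustration of what translating an experimental record into XML can look like, the sketch below serializes one tissue core using Python's standard library. The element names, attribute names and sample values are hypothetical and chosen only for this example; they are not the caGrid or caBIG data model, which the article does not describe.

```python
import xml.etree.ElementTree as ET


def core_to_xml(core: dict) -> str:
    """Serialize one tissue core record into an XML string.

    The schema here (tissueCore, tissueType, biomarker) is hypothetical.
    """
    root = ET.Element("tissueCore", id=str(core["id"]))
    ET.SubElement(root, "tissueType").text = core["tissue_type"]
    markers = ET.SubElement(root, "biomarkers")
    for marker in core["biomarkers"]:
        element = ET.SubElement(
            markers,
            "biomarker",
            name=marker["name"],
            localization=marker["localization"],
        )
        # Staining intensity is stored as the element's text content.
        element.text = str(marker["intensity"])
    return ET.tostring(root, encoding="unicode")


# Made-up example record, not real patient data.
example_core = {
    "id": "A1",
    "tissue_type": "breast",
    "biomarkers": [
        {"name": "ER", "localization": "nucleus", "intensity": 0.82},
        {"name": "HER2", "localization": "cytoplasm", "intensity": 0.15},
    ],
}

xml_text = core_to_xml(example_core)
print(xml_text)
```

Agreeing on one such structure and one controlled vocabulary for values like tissue type and localization is what lets records from different laboratories be queried together.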
The complexity and scale of the work call for cross-sector cooperation among a large number of organizations. Plans for the coming year include the creation of a working prototype of the system, available to specialists from the universities of Arizona, Pennsylvania and Ohio, Rutgers University (New Brunswick, New Jersey) and the Cancer Institute of New Jersey. This system will serve as a testbed for continuous improvement and refinement of the software, a stage for which the project team has allotted three years. Foran states that in the fourth year of work, they hope to provide the scientific and clinical communities of the world with a complete and accessible software product.
While caGrid is being developed, anyone can donate the unused capacity of their PC to other distributed computing projects, for example through BOINC (Berkeley Open Infrastructure for Network Computing), which offers a convenient interface in several languages, including Russian.
Portal "Eternal youth" www.vechnayamolodost.ru based on Bio-IT World materials 18.02.2008