30 October 2015

-Omics and aging: from biomarkers to systems biology (4)

Translated by Evgenia Ryabtseva
(Conclusion; the beginning of the article was published in the previous parts.)

Analysis of data-driven "-omics" networks

Despite their successful application, all the approaches presented so far rely on predefined, static networks. The first step toward overcoming the limitations of such networks is to construct networks directly from the data being analyzed.

Weighted gene co-expression network analysis

In weighted gene co-expression network analysis (WGCNA; Zhang & Horvath, 2005), networks of gene-gene interactions are constructed directly from transcriptomic data.
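
As a rough illustration of how such a network can be built from expression data alone, the sketch below computes an unsigned, soft-thresholded adjacency, a_ij = |cor(x_i, x_j)|^beta, which is the weighting scheme described by Zhang & Horvath (2005). The gene names, the expression values and the choice beta = 6 are assumptions made for this example; the sketch is not a substitute for the WGCNA R package.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency a_ij = |cor(x_i, x_j)| ** beta.

    expr maps a gene name to its expression values across samples.
    beta is the soft-thresholding power (6 is an illustrative choice).
    """
    genes = sorted(expr)
    adjacency = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            adjacency[(g, h)] = abs(pearson(expr[g], expr[h])) ** beta
    return adjacency


# Hypothetical toy expression values for three genes in five samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.1, 9.8],  # tracks geneA closely
    "geneC": [3.0, 1.0, 4.0, 1.0, 5.0],  # only loosely related to geneA
}
adjacency = soft_threshold_adjacency(expr)
for pair, weight in adjacency.items():
    print(pair, round(weight, 4))
```

Raising the correlation to a power keeps all pairs in the network but pushes weak correlations toward zero, so that strongly co-expressed genes dominate the module structure.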

Miller et al. (2008) applied this method to the previously mentioned data set of gene expression in frontal cortex samples from 30 people of different ages. The resulting network was compared with a network derived from the transcriptomic profile of Alzheimer's disease. The comparison revealed substantial overlap between healthy aging and Alzheimer's disease, which suggests that the two processes may share a common molecular basis. Three modules characteristic of Alzheimer's disease overlapped with modules of the aging network; these contained mainly genes associated with the regulation of synaptic function, molecular transport and transcription.

Graphical Gaussian models

Despite the successful application of weighted gene co-expression network analysis to transcriptomic data, Krumsiek et al. (2011) demonstrated that ordinary correlations are not suitable for analyzing metabolomic data from large cohort studies.

They analyzed metabolite concentrations in more than 1,000 samples and found significant correlations for more than half of all pairs among 151 metabolites, even with the conservative Bonferroni correction at a significance level of 0.01. This is largely due to indirect associations, which the Pearson correlation coefficient cannot distinguish from direct ones.
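
To give a sense of how strict this correction is, the short calculation below derives the per-test threshold for all pairs among 151 metabolites at a significance level of 0.01. It uses only the two numbers given in the text; the variable names are illustrative.

```python
from math import comb

n_metabolites = 151
alpha = 0.01

# Every unordered pair of metabolites is tested once.
n_pairs = comb(n_metabolites, 2)

# Bonferroni: divide the significance level by the number of tests.
bonferroni_threshold = alpha / n_pairs

print(n_pairs)                        # 11325
print(f"{bonferroni_threshold:.2e}")  # 8.83e-07
```

Each of the 11,325 pairwise correlations therefore has to reach a p-value below roughly 8.8 × 10^-7 to count as significant.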

Graphical models, also known as conditional independence graphs, have been proposed to overcome this problem and to obtain biologically meaningful networks from metabolomic data (Steuer, 2006; Krumsiek et al., 2011), as well as from other "-omics" data (de la Fuente et al., 2004; Yuan et al., 2011; Mangin et al., 2012). Graphical models are probabilistic models in which an edge between two variables represents their conditional dependence given all other variables in the model. Conversely, the absence of an edge indicates that the corresponding variables are conditionally independent. Several algorithms for constructing graphical models from purely binary data are freely available as R packages (Wainwright et al., 2006; Höfling & Tibshirani, 2009; Guo et al., 2010; Ravikumar et al., 2010).

Their counterparts for purely continuous data are Gaussian graphical models, in which the graph is drawn on the basis of partial correlations. The partial correlation of two variables X and Y given a set of variables Z quantifies the part of the correlation between X and Y that is not explained by Z. There are several algorithms for constructing graphical Gaussian models (d'Aspremont et al., 2006; Meinshausen & Bühlmann, 2006; Yuan & Lin, 2007; Friedman et al., 2008; Mazumder & Hastie, 2012). Some of them, such as the well-established graphical lasso (Friedman et al., 2008; Mazumder & Hastie, 2012), use regularization to further reduce the number of edges in the graph. This allows researchers to focus on a smaller number of high-confidence interactions.
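
The difference between an ordinary and a partial correlation can be seen in a minimal simulation. In the sketch below, two variables X and Y are both driven by a third variable Z and have no direct link; the data, the seed and the noise levels are assumptions chosen for the example. It uses the first-order formula for a single conditioning variable; with many conditioning variables, partial correlations are usually derived from the inverse covariance matrix instead.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y given one variable z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Simulated data: Z drives both X and Y; X and Y are not directly linked.
rng = random.Random(0)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = pearson(x, y)                  # strong, but purely indirect
r_xy_given_z = partial_corr(x, y, z)  # close to zero

print(round(r_xy, 2), round(r_xy_given_z, 2))
```

A correlation network would connect X and Y, whereas a graphical Gaussian model would keep only the links X-Z and Y-Z.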

Graphical Gaussian models can be used to reconstruct biological mechanisms from metabolomic and transcriptomic data, but they have not yet been applied to the study of aging. Their application could help reduce an overabundance of results to a smaller number of meaningful associations. The main disadvantage of graphical models is that they can only be used for purely Gaussian or purely binary data. Shin et al. (2014) worked around this problem by first constructing a graphical model from metabolite concentrations, then adding genetic variants as nodes and connecting them to their associated metabolites. The resulting intuitive diagram illustrates the genetic control of metabolism. Strictly speaking, however, it is no longer a graphical model, and its edges do not represent conditional dependence.

Mixed graphical models

Recent developments make it possible to integrate different types of data while retaining the advantages of Gaussian graphical models; the result is mixed graphical models (Tur & Castelo, 2012; Chen et al., 2013; Fellinghauer et al., 2013; Lee & Hastie, 2015). Fellinghauer et al. (2013) proposed a very flexible algorithm based on stability selection (Meinshausen & Bühlmann, 2010). It uses conventional methods such as random forests or regression models to rank interactions between variables of different types. It can therefore handle many kinds of data, such as disease states, metabolite levels and genetic variants. Because it relies on stability selection, the approach has built-in error control. Mixed graphical models are a powerful tool for the multivariate analysis of multicomponent data sets, but they have not yet been used in biological research. Applying these models could shed light on the complex relationships between aging and disease.
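
The idea behind stability selection can be sketched in a few lines: a selection procedure is run on many random half-samples, and only associations selected in a large fraction of runs are kept. The sketch below is not the algorithm of Fellinghauer et al. (2013); as a deliberately simple stand-in for random forests or regression models, it selects a pair whenever its absolute Pearson correlation exceeds a cutoff. The variable names, the simulated data, the cutoff of 0.5 and the frequency threshold of 0.8 are all assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def selection_frequencies(data, n_rounds=100, cutoff=0.5, seed=0):
    """Fraction of random half-samples in which each pair has |r| > cutoff."""
    rng = random.Random(seed)
    names = sorted(data)
    n = len(data[names[0]])
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    counts = dict.fromkeys(pairs, 0)
    for _ in range(n_rounds):
        idx = rng.sample(range(n), n // 2)  # half-sample without replacement
        for a, b in pairs:
            r = pearson([data[a][i] for i in idx], [data[b][i] for i in idx])
            if abs(r) > cutoff:
                counts[(a, b)] += 1
    return {pair: c / n_rounds for pair, c in counts.items()}


# Simulated toy cohort: one metabolite depends on age, one variable is noise.
rng = random.Random(1)
n = 200
age = [rng.uniform(20, 80) for _ in range(n)]
metabolite = [0.05 * a + rng.gauss(0, 0.5) for a in age]
noise = [rng.gauss(0, 1) for _ in range(n)]

freq = selection_frequencies(
    {"age": age, "metabolite": metabolite, "noise": noise}
)
stable_edges = [pair for pair, f in freq.items() if f >= 0.8]
print(stable_edges)
```

Because an association has to reappear across many subsamples, chance findings in a single subsample rarely survive, which is the source of the error control mentioned above.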

Gaussian graphical models and mixed graphical models are undirected. They are therefore unsuitable for inferring the direction of causality. In epidemiological studies, Mendelian randomization is a traditional approach to inferring causality from observational data. It uses the fixed nature of genetic variants to divide the study population into groups, thereby mimicking a randomized controlled trial (for details, see Brion et al., 2014). Mendelian randomization can be used to follow up on interactions of interest (edges) previously identified with graphical models. However, the method relies on robust associations with genetic variants and on the assumption that the genetic variant in question is not associated with any confounding factors. Because of these limitations, it is not suitable for large-scale networks.
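
A minimal simulation can show why a genetic variant helps here. The sketch below uses the Wald ratio, one common Mendelian randomization estimator (the text does not say which estimator is meant): the effect of the variant on the outcome is divided by its effect on the exposure. All effect sizes, the uniform genotype distribution and the unobserved confounder are assumptions made for illustration.

```python
import random


def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


rng = random.Random(42)
n = 20000
true_effect = 0.5  # assumed causal effect of the exposure on the outcome

# Toy genotype: allele count 0, 1 or 2, independent of the confounder.
genotype = [rng.choice([0, 1, 2]) for _ in range(n)]
confounder = [rng.gauss(0, 1) for _ in range(n)]  # unobserved in practice
exposure = [0.8 * g + u + rng.gauss(0, 1) for g, u in zip(genotype, confounder)]
outcome = [
    true_effect * x + u + rng.gauss(0, 1) for x, u in zip(exposure, confounder)
]

# The naive regression is biased because the confounder affects both variables.
naive_estimate = slope(exposure, outcome)

# Wald ratio: variant-outcome effect divided by variant-exposure effect.
wald_estimate = slope(genotype, outcome) / slope(genotype, exposure)

print(round(naive_estimate, 2), round(wald_estimate, 2))
```

The naive estimate overstates the effect, while the ratio recovers a value close to the assumed 0.5. This only works because the simulated variant is strongly associated with the exposure and unrelated to the confounder, which are exactly the assumptions named above.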

Bayesian networks

Another approach that can identify causality from observational data under certain assumptions is based on Bayesian networks.

Like graphical Gaussian models, Bayesian networks are probabilistic models in which edges represent conditional dependence between variables. However, Bayesian networks are directed acyclic graphs, that is, they distinguish the influence of X on Y from the influence of Y on X. The acyclicity of the causal graph, in turn, is an assumption that may not hold for biological systems. By applying Bayesian networks to high-throughput transcriptomic data, Friedman et al. (2000) demonstrated the potential of this method to extract biologically meaningful associations without prior information. Several tools are available, such as the R package bnlearn, for learning the structure of Bayesian networks from binary, continuous and even mixed data (Scutari, 2010).
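
The two defining properties of a Bayesian network, a directed acyclic structure and a joint distribution that factorizes into one conditional probability per node, can be illustrated with a toy example. The three-node network and all probabilities below are hypothetical; the sketch only evaluates a given network and does not learn its structure from data, which is what packages such as bnlearn do.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical network of binary variables: age -> damage -> disease.
# Each node is mapped to the list of its parents.
parents = {"age": [], "damage": ["age"], "disease": ["damage"]}

# P(node is True | states of its parents), keyed by the parents' states.
p_true = {
    "age": {(): 0.3},
    "damage": {(True,): 0.6, (False,): 0.1},
    "disease": {(True,): 0.5, (False,): 0.05},
}


def is_acyclic(graph):
    """True if the directed graph (node -> parents) contains no cycle."""
    try:
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


def joint_probability(state):
    """P(state) as the product of P(node | parents) over all nodes."""
    prob = 1.0
    for node, node_parents in parents.items():
        p = p_true[node][tuple(state[q] for q in node_parents)]
        prob *= p if state[node] else 1.0 - p
    return prob


print(is_acyclic(parents))                   # True
print(is_acyclic({"x": ["y"], "y": ["x"]}))  # False: a feedback loop
print(joint_probability({"age": True, "damage": True, "disease": False}))
```

The second check shows the limitation mentioned above: a feedback loop between two variables, which is common in biology, cannot be represented as a single directed acyclic graph.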

Overview of systems biology methods and their application in the study of aging

Method: Enrichment (over-representation) analysis
Prerequisites: Predefined modules (e.g. gene sets from Gene Ontology)
Applicable to: genomics, transcriptomics, proteomics, metabolomics
Availability: several R-packages (for example, GSEABase, GAGE, MSEA), DAVID and Enrichr online tools
Application: Lu et al. (2004), de Magalhães et al. (2009)

Method: Network mapping
Prerequisites: A predefined network, such as a protein-protein interaction network, a gene regulatory network or a metabolic network
Applicable to: data of all "-omics"
Availability: R-package igraph, Cytoscape with various additional modules
Application: Wang et al. (2009), Bell et al. (2009), West et al. (2013), Faisal & Milenković (2014)

Method: Negative-positive networks
Prerequisites: A protein-protein interaction network
Applicable to: transcriptomics
Availability: – 
Application: Xue et al. (2007)

Method: Weighted gene co-expression network analysis (WGCNA)
Prerequisites: –
Applicable to: transcriptomics (and possibly other continuous data)
Availability: WGCNA R-package
Application: Miller et al. (2008)

Method: Graphical Gaussian models
Prerequisites: –
Applicable to: any multivariate, normally distributed (Gaussian) data
Availability: Multiple R-packages (e.g. ggm or glasso)
Application: applied to metabolomic data by Krumsiek et al. (2011)

Method: Mixed graphical models
Prerequisites: –
Applicable to: binary, continuous and mixed data.
Availability: 
Application: –

Method: Bayesian networks
Prerequisites: –
Applicable to: binary, continuous and mixed data.
Availability: Multiple R-packages (e.g. bnlearn, gRain, abn, deal)
Application: applied to transcriptomic data by Friedman et al. (2000)

The methods presented here are only a selection of the available network inference approaches. A number of other methods are widely used to model biological systems, such as Boolean networks (Shmulevich et al., 2002) and systems of differential equations (Chen et al., 1999; Lorenz et al., 2009).

The development of new approaches facilitates network inference from multicomponent data sets, and the studies mentioned above demonstrate their applicability in biological research. However, most network inference methods rely on large sample sizes and usually require that the number of samples exceed the number of variables. In the analysis of "-omics" data, especially genomic and transcriptomic data, this is often impossible. This situation is known as the "n much smaller than p" problem (n ≪ p).

Another common problem is the overfitting of models with a large number of parameters. Several approaches, such as regularization, have been proposed to mitigate these limitations and reduce overfitting. Nevertheless, to avoid spurious results, findings should be carefully cross-validated and replicated in independent cohorts. Finally, many high-throughput methods suffer from considerable technical variation and strong batch effects. Before combining different data sets, researchers should carefully normalize all measurements according to established standards.
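
Why too few samples are a problem, and how regularization helps, can be shown with a very small example. With n = 2 samples and p = 3 variables, the sample covariance matrix is singular and cannot be inverted, so partial correlations cannot be computed from it. The sketch below uses a ridge-style penalty (adding a constant to the diagonal) as the simplest stand-in for regularization; the graphical lasso mentioned earlier uses an L1 penalty instead. The data values and the penalty of 0.1 are assumptions.

```python
from fractions import Fraction


def covariance_matrix(samples):
    """Sample covariance matrix (rows are samples, columns are variables)."""
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in samples)
            / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


# Hypothetical measurements: n = 2 samples, p = 3 variables.
# Fractions keep the arithmetic exact, so "singular" means exactly zero.
samples = [
    [Fraction(1), Fraction(4), Fraction(2)],
    [Fraction(3), Fraction(1), Fraction(7)],
]
cov = covariance_matrix(samples)

# Ridge-style regularization: add a small constant to the diagonal.
lam = Fraction(1, 10)
ridge = [
    [cov[i][j] + (lam if i == j else 0) for j in range(3)] for i in range(3)
]

print(det3(cov))    # 0: the matrix cannot be inverted
print(det3(ridge))  # positive: the regularized matrix is invertible
```

The penalty makes the estimate usable, but at the price of bias, which is one reason why replication in independent cohorts remains necessary.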

Modeling biological systems

The ultimate goal of systems biology is not only the qualitative study but also the quantitative modeling of the organism, in order to facilitate computer-simulated experiments, hypotheses and predictions.


The first and so far the only attempt to model an entire organism was made by Karr et al. (2012). They created a model of a Mycoplasma genitalium cell that can simulate the cell cycle and predict metabolite concentrations. However, this model is far from perfect (Freddolino & Tavazoie, 2012) and too primitive to be adapted to more complex organisms. At present, modeling eukaryotic cells and whole organisms is not feasible. Processes such as aging are likewise too complex to be modeled in full. Nevertheless, some efforts have been made to model smaller subsystems, as well as particular aspects of the aging process.

For example, Gillespie et al. (2004) simulated yeast aging based on the accumulation of extrachromosomal ribosomal DNA circles. Oda & Kitano (2006) combined the results of several hundred studies to build a model of the Toll-like receptor signaling network. The same group created a similar model of epidermal growth factor receptor signaling (Oda et al., 2005). Both studies revealed global architectures that resemble a bow tie and have a single key regulator. However, both networks are purely qualitative descriptions without kinetic parameters and therefore cannot be used for computer simulations.

Other groups focused on even smaller subsystems, which makes quantitative modeling easier. One study examined the effect of elevated cortisol levels on hippocampal activity (McAuley et al., 2009). A quantitative model was built to simulate the age-related decline of hippocampal function and its acceleration under acute and chronic increases in cortisol levels. Simulations using ordinary differential equations showed that a chronic increase in cortisol levels leads to a faster decline of hippocampal function than acute bursts, but is more amenable to treatment. Sozou & Kirkwood (2001) modeled cellular senescence based on data on telomere shortening and oxidative stress. The same group described the effect of chaperone proteins and the age-related accumulation of misfolded proteins (Proctor et al., 2005). Other groups studied further aspects of the aging process, such as mitochondrial fusion and fission and the accumulation of defective mitochondria (Kowald et al., 2005; Figge et al., 2012), the incomplete replication of epigenetic information (Przybilla et al., 2014) and age-related disorders of lipid metabolism (McAuley & Mooney, 2015). Fitting the kinetics of such models to experimental observations makes it possible to formulate plausible hypotheses about the causes of aging.
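
To show what simulating such a model involves, the sketch below integrates a single ordinary differential equation with the forward Euler method. It is a purely hypothetical one-variable toy and not the model of McAuley et al. (2009): a function level F declines at a rate that increases with the cortisol level, dF/dt = -(k_base + k_stress * C(t)) * F. The rate constants, the time scale and both exposure profiles are invented for illustration.

```python
def simulate(cortisol, k_base=0.005, k_stress=0.01, t_end=100.0, dt=0.01):
    """Forward-Euler integration of dF/dt = -(k_base + k_stress * C(t)) * F.

    cortisol is a function of time returning the (toy) cortisol level C(t).
    Returns the function level F at t_end, starting from F(0) = 1.
    """
    f = 1.0
    for step in range(round(t_end / dt)):
        t = step * dt
        f += dt * -(k_base + k_stress * cortisol(t)) * f
    return f


def chronic(t):
    """Permanently elevated level (hypothetical units)."""
    return 1.5


def acute(t):
    """Baseline level of 1.0 with one short burst between t = 40 and t = 50."""
    return 3.0 if 40.0 <= t < 50.0 else 1.0


f_chronic = simulate(chronic)
f_acute = simulate(acute)
print(round(f_chronic, 3), round(f_acute, 3))
```

In this toy, the chronic profile ends with the lower function level simply because its total exposure was chosen to be larger; real models couple many such equations and fit their rate constants to experimental data.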

In contrast to the previously described methods, which build large-scale networks from data (top-down approach), these approaches model small subsystems in great detail on the basis of well-established prior knowledge (bottom-up approach). Such bottom-up models provide mechanistic insight into the aging process that cannot be gained from individual association studies. Moreover, they facilitate the generation of new hypotheses and the testing of existing ones.

Conclusions and objectives

Major recent advances in "-omics" technologies allow millions of biological parameters to be assessed simultaneously.

Association studies have revealed many associations between these "-omics" data and aging, as well as age-related diseases. After decades of reductionist research, specialists have begun to apply network analysis and integrated analysis of "-omics" data to study aging at the systems level. As a result, some studies now also take into account the effects of interactions between variables. However, given the complexity of the aging process, new methods are needed to further unravel the numerous interactions.

Systems biology already offers such methods, but their application to real biological problems lags behind. For example, Gaussian graphical models have already been extended to mixed data types and could be used in the study of aging. In addition, a number of studies have developed models of processes that contribute to aging. These models provide detailed information about important components of the aging process and their interactions. Future research building on these results should aim to integrate these components in order to gain a more complete, systems-level understanding of the aging process.

However, in many cases the possibilities are limited by the available data. Researchers face problems such as incomplete data, asynchronous experiments, strong batch effects and insufficient sample sizes. Another issue is the limited number of available multi-omics data sets, which complicates the replication of results; replication is further hampered by the variety of methods, protocols and platforms in use. Because replication is critical for preventing spurious results and false-positive findings, researchers should more often consider approaches such as splitting the available data into separate discovery and replication sets.
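
Such a split is simple to implement. The sketch below assigns sample identifiers at random to two disjoint sets; the function name, the 50/50 ratio and the fixed seed are assumptions, and in practice the split might also need to be balanced for age, sex or batch.

```python
import random


def split_discovery_replication(sample_ids, discovery_fraction=0.5, seed=0):
    """Randomly split sample IDs into disjoint discovery and replication sets."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the split reproducible
    cut = int(len(ids) * discovery_fraction)
    return ids[:cut], ids[cut:]


# Hypothetical cohort of 100 samples identified by the numbers 0..99.
discovery, replication = split_discovery_replication(range(100))
print(len(discovery), len(replication))  # 50 50
```

Associations are then searched for in the discovery set only and are reported as findings if they reappear in the replication set.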

Despite these difficulties, several large population studies have produced multi-omics data suitable for analysis with systems biology approaches. For example, the GTEx project aims to collect gene expression and methylation data from multiple tissues (The GTEx Consortium, 2013). At the same time, the development of new methods should help in the analysis of existing, partially incomplete data sets and facilitate the analysis of multi-tissue and multi-organ data, thereby making it easier to study true systemic effects.

Solving these problems and developing integrated models of aging should improve our understanding of the aging process, which, in turn, will allow us to develop strategies for improving health in old age.

For a list of references, see the original article.

Portal "Eternal youth" http://vechnayamolodost.ru