30 October 2015

-Omics and aging: from biomarkers to systems biology (3)

(Continuation, the beginning of the article is here.)

From "-omics" to systems biology

Most of the studies mentioned above look for pairwise associations between age (or age-related diseases) and a single "-omics" data type. However, there are pronounced mutual dependencies both between and within the various "-omics" data types (see Fig. 1).

Figure 1. Interdependencies of the "-omics" data. The diagram shows mutual dependencies that can be seen in almost every dataset obtained with one of the "-omics" technologies. Solid lines indicate the biological processes that cause the dependence, while dotted lines indicate identified associations. Correlations can be observed between almost all levels of biological organization.

Given the central dogma of molecular biology, genomic, transcriptomic and proteomic data are correlated by definition. Moreover, the concentrations of metabolites are influenced by genetic variants (Shin et al., 2014) and epigenetic factors (Petersen et al., 2014), mediated by changes in gene expression or enzymatic activity. Methylation levels not only affect gene expression (Jaenisch & Bird, 2003), but also correlate with genetic variants (Bell et al., 2012) and environmental factors (Breitling et al., 2011). The authors of this review have recently demonstrated that even the composition of the microbiome is partially controlled by the host genes (Goodrich et al., 2014). In short, all levels of "-omics" data are influenced by genetics as well as by the environment and aging.

However, correlations are detected not only between, but also within each of the data types. For example, in genomics, linkage disequilibrium (the correlated occurrence of SNPs) is a ubiquitous phenomenon. Transcription factors often co-regulate the expression of multiple genes (Allocco et al., 2004), and correlations have also been observed between the methylation profiles of so-called CpG islands (Bell et al., 2012). Metabolites are interconnected through a network of biochemical reactions, which produces strong correlations between them (Krumsiek et al., 2011). Even phenotypes often form groups. Comorbidity, the disproportionately frequent co-occurrence of diseases, is seen for many pathologies, possibly because of shared mechanisms underlying their development (Goh et al., 2007).

These biological correlations can cause inconsistencies in association studies, and this is an important subject of current research. For example, the review authors identified 153 age-associated metabolites, whereas subsequent analysis showed that only 22 of them were independently associated with age (Menni et al., 2013b). Similarly, 21 out of 24 evaluated immunoglobulin G glycans correlated with age, but only 3 of them explained 58% of the variability (Kristic et al., 2013). The same has been demonstrated for epigenetic data (Weidner et al., 2014). Using all available data types yields huge lists of associations with aging, but the cause-and-effect relationships that are interesting from a biological point of view are often lost in this mass of results. To reconstruct the processes involved in aging at the system level, approaches are needed that simultaneously take into account the information obtained at all "-omics" levels (Valdes et al., 2013).

Despite the development of high-throughput technologies and the availability of more and more data, the integration of "-omics" data remains a difficult task. In addition to the limited availability of "multi-omics" datasets for the same samples, technical limitations complicate the integration process. While genomics and transcriptomics can capture practically the entire range of variants and transcripts, other "-omics" (such as proteomics and metabolomics) measure only a small fraction of all molecules. Many high-throughput technologies suffer from significant technical variation and strong batch effects. Strict quality control and careful normalization are therefore critical when analyzing this type of data. Moreover, the complexity of the organism should be taken into account. While the genome is more or less stable, all other "-omics" levels differ between cell types and change over time.

Many samples, such as whole blood, contain a mixture of different cell types with potentially different epigenomes and transcriptomes (Houseman et al., 2012; Jaffe & Irizarry, 2014). Finally, different organs and cells influence each other. The blood metabolome, for example, is strongly influenced by processes occurring in the liver and other organs, and to fully understand it, samples from different tissues need to be studied. This, in turn, is not always possible in epidemiological studies, since obtaining tissue samples often requires invasive procedures. Nevertheless, data integration is an important and active area of research. The first stage of data integration is the collection and joint interpretation of individual results. To facilitate the systematic analysis of aging, the Digital Aging Atlas (Craig et al., 2014) has collected more than 4,000 age-related changes identified using various technologies.

Introduction to systems biology

The goal of systems biology is to understand a system and its functions as a whole rather than as separate components (Cassman, 2005); the ultimate goal is the mathematical modeling of biological systems and the simulation of their outcomes.

The first step is to give a formal description of the complex interactions and dependencies between these components, which makes it possible to analyze and simulate the biological system in question. A technique widely used in systems biology is to translate biological interactions into mathematically well-defined networks (graphs).

For example, metabolites interact in chemical reactions, forming a network whose nodes denote metabolites and whose edges denote chemical reactions. Similarly, transcription factors bind to DNA to regulate gene expression, forming a gene regulatory network, and interacting proteins form a protein-protein interaction network (see Fig. 2B). These networks interact with each other, which makes data integration an important aspect of systems biology. One example of a phenotypic network was created by Goh et al. (2007), who used diseases as nodes and connected diseases that share genetic risk factors by edges (see Fig. 2A). In this way they showed that many diseases have common genetic risk variants and that similar pathologies are caused by the same genes.
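The construction of such a disease network can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of Goh et al. (2007): the disease and gene names are hypothetical placeholders, and the helper `build_disease_network` is introduced here only for the example. Two diseases are linked whenever they share at least one associated gene.

```python
from itertools import combinations

# Hypothetical disease -> associated genes mapping (placeholder names, not real data).
disease_genes = {
    "disease_A": {"GENE1", "GENE2"},
    "disease_B": {"GENE2", "GENE3"},
    "disease_C": {"GENE4"},
    "disease_D": {"GENE3", "GENE4"},
}

def build_disease_network(assoc):
    """Return an adjacency dict in which diseases sharing >= 1 gene are linked."""
    network = {disease: set() for disease in assoc}
    for d1, d2 in combinations(assoc, 2):
        if assoc[d1] & assoc[d2]:  # non-empty intersection: shared gene(s)
            network[d1].add(d2)
            network[d2].add(d1)
    return network

network = build_disease_network(disease_genes)
for disease in sorted(network):
    print(disease, "->", sorted(network[disease]))
```

The adjacency dictionary is the simplest graph representation; dedicated graph libraries offer the same structure with many analysis functions on top.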

Figure 2. Topological properties of biological networks. (A) A fragment of the human disease network (Goh et al., 2007). Nodes denote diseases, which are connected if they are associated with the same gene. Parkinson's disease links three otherwise isolated clusters of diseases (indicated by different colors) and accordingly has a low clustering coefficient (0%) and a high betweenness centrality (72%). (B) The immediate neighborhood of apolipoprotein D (ApoD) in the protein interaction network taken from the STRING database (Franceschini et al., 2013), using only experimentally confirmed interactions. Apolipoprotein D connects two clusters and, despite its low degree (2) and clustering coefficient (0%), is a central node (betweenness centrality 53%). For comparison, the LEPR protein is central to the blue cluster (degree 7, clustering coefficient 14%).

Graphs can be analyzed using a variety of well-established algorithms.

One common task is to identify modules, that is, subgraphs whose nodes share the same properties. In biological networks, modules correspond to functional units, such as the glycolysis pathway in the metabolic network. Modules usually interact with each other and together form a hierarchical structure in which the distribution of node degrees (the degree being the number of edges per node) follows a power law (Barabási & Oltvai, 2004). Hence, most nodes have few connections and only a few nodes have many. These highly connected nodes are called hubs (Albert et al., 2000; Jeong et al., 2001).
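The node degree and the identification of highly connected nodes (hubs) can be illustrated on a toy network. This is a sketch under stated assumptions: the edge list is invented, and the rule "degree at least twice the mean degree" is an arbitrary threshold chosen for the example, not a definition taken from the cited papers.

```python
from collections import Counter

# Toy undirected network as an edge list (illustrative only).
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"),
         ("a", "b"), ("d", "e")]

def node_degrees(edge_list):
    """Count the number of edges attached to each node."""
    deg = Counter()
    for u, v in edge_list:
        deg[u] += 1
        deg[v] += 1
    return deg

degrees = node_degrees(edges)

# Degree distribution: how many nodes have each degree.
degree_distribution = Counter(degrees.values())

# Assumed hub rule for this sketch: degree >= 2 * mean degree.
mean_degree = sum(degrees.values()) / len(degrees)
hubs = sorted(n for n, k in degrees.items() if k >= 2 * mean_degree)

print(dict(degree_distribution))
print(hubs)
```

On real networks the degree distribution would be inspected on a log-log scale to judge whether it is compatible with a power law.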

Several other parameters are used to describe the topology of networks and the topological characteristics of nodes. For example, the clustering coefficient indicates how densely the neighbors of a node are connected to each other, which makes it possible to identify nodes that are central to a cluster (for example, LEPR in Fig. 2B). Another parameter is the betweenness centrality, which measures the proportion of shortest paths between pairs of nodes that pass through a given node. It thus quantifies the importance of a node for connecting other nodes from different modules (for example, Parkinson's disease in Fig. 2A and apolipoprotein D in Fig. 2B). Like hubs, which have many connections, central nodes are considered key players in the network, connecting several modules and controlling flows in the network. Their particular importance has been demonstrated for many diseases and for the survival of the organism (Barabási & Oltvai, 2004; Joy et al., 2005; Yu et al., 2007).
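The two node-level measures described above, the clustering coefficient and the shortest-path-based centrality (commonly called betweenness centrality), can be computed by hand on a small example. The sketch below uses an invented toy graph in which a bridge node `x` joins two triangles, mimicking the situation of a low-degree node that connects two clusters. Betweenness is computed with Brandes' algorithm for unweighted graphs and normalized by the number of node pairs; the function names are introduced here for illustration only.

```python
from collections import deque

# Toy graph: two triangles (a, b, c) and (d, e, f) joined by bridge node x.
edges = [("a", "b"), ("a", "c"), ("b", "c"),
         ("d", "e"), ("d", "f"), ("e", "f"),
         ("c", "x"), ("x", "d")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def clustering(adj, v):
    """Fraction of possible links among the neighbors of v that exist."""
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))

def betweenness(adj):
    """Normalized betweenness centrality (Brandes' algorithm, unweighted)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):     # accumulate dependencies back to s
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    n = len(adj)
    # Ordered pairs were counted, so divide by (n-1)(n-2) ordered pairs.
    return {v: b / ((n - 1) * (n - 2)) for v, b in bc.items()}

bc = betweenness(adj)
print("x: degree", len(adj["x"]), "clustering", clustering(adj, "x"),
      "betweenness", round(bc["x"], 2))
```

In this toy graph the bridge node has only two neighbors and a clustering coefficient of zero, yet most shortest paths between the two triangles pass through it, which is the pattern the text describes for nodes linking separate modules.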

Many freely available software packages now exist for graph analysis and visualization. For example, a graph can be analyzed and visualized with the R package igraph (Csardi & Nepusz, 2006) or the standalone program Cytoscape (Shannon et al., 2003). Cytoscape also makes it easy to integrate biological databases such as Gene Ontology (Ashburner et al., 2000), Reactome (Croft et al., 2014), the Kyoto Encyclopedia of Genes and Genomes, KEGG (Kanehisa & Goto, 2000) or BioGRID (Chatr-Aryamontri et al., 2013) through third-party applications. Several methods have also been developed for identifying modules of nodes that are jointly affected by the condition under study. Freely available examples include jActiveModules, a plug-in for Cytoscape (Ideker et al., 2002), and the R package BioNet (Beisser et al., 2010).

Below is a selection of current methods for constructing and analyzing biological networks as an approach to systems biology, together with their contribution to the study of aging.

Enrichment and topology analysis in predefined networks

A popular approach to translating the results of association studies into the context of systems biology is to map the studied variables, such as age-associated genes, proteins or metabolites, onto known biological (reference) networks.

The neighborhood of these variables and their topological properties can be evaluated using experimentally determined protein-protein interactions, gene regulatory networks or metabolic networks. Instead of interpreting individual hits separately, the available information about their interactions and common functions can be used to identify modules that are jointly affected by the condition under study.

Several databases offer collections of experimentally identified interactions that can be used as predefined reference networks for enrichment and topology analysis. For protein-protein interactions, the Human Protein Reference Database provides information on more than 40,000 interactions (Keshava Prasad et al., 2009), the Database of Interacting Proteins provides information on more than 7,000 interactions (Xenarios et al., 2002), and the mammalian protein-protein interaction database of the Munich Information Center for Protein Sequences (MIPS) contains about 1,000 manually verified interactions between human proteins (Pagel et al., 2005). Gene regulatory networks are provided by ChIPBase (Yang et al., 2013), which contains data on six million transcription factor binding sites identified in more than 300 experiments. KEGG, among other things, provides information about metabolic reactions.

Enrichment (over-representation) analysis is a convenient way of exploiting existing knowledge from reference biological networks without directly analyzing the topology of the graph. Predefined (functional) modules within the reference networks are used to test whether the associated genes, proteins or metabolites are over-represented in these groups. For genes, researchers usually use Gene Ontology to divide them into groups based on biological process, molecular function or subcellular localization. For metabolites, the KEGG and Reactome databases provide curated information about biochemical pathways. The R packages GSEABase and GAGE (Luo et al., 2009) and the MSEA web service (Xia & Wishart, 2010) are just some examples of available implementations and variations of the original gene set enrichment analysis algorithm (Subramanian et al., 2005).
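The idea of testing whether a list of associated genes is concentrated in a predefined group can be illustrated with the simplest variant, a one-sided hypergeometric over-representation test. Note that this is not the rank-based gene set enrichment analysis of Subramanian et al. (2005), nor the method of any package named above; it is a minimal sketch, and the function name and the numbers in the example are assumptions made for illustration.

```python
from math import comb

def overrepresentation_p(n_universe, n_in_set, n_hits, n_hits_in_set):
    """One-sided hypergeometric p-value.

    Probability of finding at least `n_hits_in_set` members of a gene set of
    size `n_in_set` among `n_hits` genes drawn without replacement from a
    universe of `n_universe` genes.
    """
    upper = min(n_in_set, n_hits)
    total = comb(n_universe, n_hits)
    tail = sum(comb(n_in_set, k) * comb(n_universe - n_in_set, n_hits - k)
               for k in range(n_hits_in_set, upper + 1))
    return tail / total

# Invented example: 20 measured genes, a gene set of 5 genes,
# 4 age-associated genes, 3 of which belong to the gene set.
p_value = overrepresentation_p(n_universe=20, n_in_set=5,
                               n_hits=4, n_hits_in_set=3)
print(f"p = {p_value:.4f}")
```

In practice such a test is repeated for many gene sets, so the resulting p-values need to be corrected for multiple testing.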

In aging research, enrichment analysis has revealed overexpression of genes involved in immune responses, lysosomes and glycoprotein synthesis, and reduced expression of genes associated with mitochondria and oxidative phosphorylation in older compared with younger people (de Magalhães et al., 2009). In human brain tissue, genes associated with oxidative stress and DNA repair were found to be over-represented among the genes differentially expressed between young and elderly people (Lu et al., 2004). Enrichment analysis thus facilitates the identification of mechanisms that are important for the aging process. In this way, it helps to make sense of individual associations and to find biological interpretations for the observed molecular changes.

To move beyond predefined module definitions and allow a more detailed analysis of the network, the studied variables can be mapped directly onto known protein interaction, gene regulatory or metabolic networks. The modules can then be identified dynamically on the basis of the measured data. Moreover, additional topological properties of the variables can be evaluated.

A study of human protein-protein interaction networks has shown that homologs of aging-associated genes have higher node degrees and higher betweenness centrality than other genes (Bell et al., 2009). Moreover, the genes associated with aging are not spread evenly across the interactome (the totality of all protein interactions), but cluster in a few highly connected modules. These modules are enriched with genes involved in DNA damage repair and stress response (Kriete et al., 2011). The high connectivity of aging genes was used by Tacutu et al. (2012), who selected the "neighbors" of longevity-associated genes in protein-protein interaction networks as candidate longevity genes. Subsequent experiments on the worm C. elegans identified 30 new longevity-associated genes, which demonstrated the potential of systems biology in the search for candidate genes.

Using a modified protein-protein interaction network, Wang et al. (2009) demonstrated a close relationship between the genetic causes of aging and of diseases. These results suggest that aging is based not on random errors but on an organized process. Another protein-based approach to data integration was developed by West et al. (2013). The authors integrated epigenomic data by assigning DNA methylation sites to each protein in the network and then identifying modules of differentially methylated genes/proteins in the resulting network. This allowed them to avoid the predefined gene sets that are typical of enrichment analysis. The analysis revealed three differentially methylated modules that were replicated in several tissues. Two of them consisted mainly of genes regulating transcription, while the third contained genes associated with stem cell differentiation.

The disadvantage of experimentally determined protein-protein interaction and gene regulatory networks is that the underlying methods yield up to 50% false positives, while many real interactions go undetected (Huang & Bader, 2009; Marbach et al., 2012). More importantly, such reference networks take no account of the spatial and temporal characteristics of interactions. This limits the results to interactions that are already known and possibly inactive.

One way to circumvent the static nature of protein-protein interaction networks is known as negative-positive networks (Xia et al., 2006). Such networks integrate a protein-protein interaction network with transcriptomic data by restricting it to edges between (anti-)correlated proteins/genes. In this way, only the interactions (i.e. edges) that are active under the observed conditions are analyzed further. Xue et al. (2007) applied this method to the previously mentioned gene expression dataset from brain tissue and described two anticorrelated modules containing proteins associated with cell proliferation and differentiation. Two further modules, consisting of genes associated with protein processing and immune function, respectively, showed a weak correlation with the cell proliferation module.
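The filtering step behind this approach can be sketched as follows: keep only those reference interactions whose two partners show strongly correlated or anti-correlated expression, and label each retained interaction accordingly. This is a simplified illustration, not the published procedure; the protein names, expression values, the use of Pearson correlation and the cut-off of 0.8 are all assumptions made for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical reference interactions and expression profiles (4 samples each).
ppi_edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P4")]
expression = {
    "P1": [1.0, 2.0, 3.0, 4.0],
    "P2": [2.0, 4.1, 5.9, 8.0],  # rises with P1
    "P3": [4.0, 3.0, 2.1, 1.0],  # falls as P1 rises
    "P4": [1.0, 3.0, 1.0, 3.0],  # only weakly related to P2
}

def filter_by_coexpression(edges, expr, threshold=0.8):
    """Keep edges with |r| >= threshold and label their sign."""
    kept = []
    for u, v in edges:
        r = pearson(expr[u], expr[v])
        if abs(r) >= threshold:
            kept.append((u, v, "positive" if r > 0 else "negative"))
    return kept

active_edges = filter_by_coexpression(ppi_edges, expression)
print(active_edges)
```

The weakly correlated interaction is dropped, so the resulting network contains only interactions that appear to be active in the measured samples.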

The authors of a later study went a step further and restricted the protein-protein interaction network to genes that were highly expressed at different stages of aging, separately for each sample, which yielded a set of related dynamic networks instead of a single network. Even though the global properties of all these graphs are very similar, the centrality of several genes correlated with age (Faisal & Milenkovic, 2014).

The use of biological networks to analyze age-related changes has demonstrated a close relationship between aging and diseases at the molecular level. Moreover, aging has been shown to affect central genes, which are important for the integrity of the network (Bell et al., 2009). While enrichment analysis and analysis based on protein-protein interaction networks are widely used for genetic and transcriptomic data, they have not yet been applied to metabolomics data in aging research. Such an approach may be very promising for the systematic identification of metabolic pathways jointly affected by the aging process.
