Network analysis: tackling complex data to study plant metabolism


TIBTEC-1018; No. of Pages 8

Review

David Toubiana1,2, Alisdair R. Fernie1, Zoran Nikoloski1, and Aaron Fait2

1 Max-Planck-Institut für Molekulare Pflanzenphysiologie, Am Mühlenberg 1, 14476 Potsdam-Golm, Germany
2 Ben-Gurion University of the Negev, Jacob Blaustein Institutes for Desert Research, French Associates Institute for Agriculture and Biotechnology of Drylands, Midreshet Ben-Gurion, 84990, Israel

Incomplete knowledge of biochemical pathways makes the holistic description of plant metabolism a non-trivial undertaking. Sensitive analytical platforms, which are capable of accurately quantifying the levels of the various molecular entities of the cell, can assist in tackling this task. However, the ever-increasing amount of high-throughput data, often from multiple technologies, requires significant computational efforts for integrative analysis. Here we introduce the application of network analysis to study plant metabolism and describe the construction and analysis of correlation-based networks from (time-resolved) metabolomics data. By investigating the interactions between metabolites, network analysis can help to interpret complex datasets through the identification of key network components. The relationship between structural and biological roles of network components can be evaluated and employed to aid metabolic engineering.

Networks in biology

The representation of biological systems by networks (graphs) is commonly applied in modern biology to analyze the systemic interplay of biological components [1,2]. A network is formally described as a collection of nodes, representing the components of the network, and their relationships, given by a set of edges (Box 1a). For instance, in ecology, food webs are used to illustrate the feeding interaction patterns among species, as components of an ecosystem, whereas in neurology, networks capture the connections between neurons. Interest in the study of molecular networks has dramatically increased due to the parallel advances in analytical profiling technologies, the development of bioinformatics and biostatistics methods, and the increasing accessibility of high-throughput data in public databases.
Molecular networks can be employed to represent: (i) actual experimentally confirmed interactions between biological components, including genes, proteins, and metabolites; (ii) the coordinated changes in abundance of these components as a result of endogenous or environmental cues; or a combination of (i) and (ii). In other words, molecular networks can be established based on existing knowledge of molecular interactions [3], from the relationship between data profiles [4], or based on mapping of data onto networks [5,6]. In multivariate statistical analysis, large datasets are usually used to determine biological components that show differential behavior between conditions. Integrative network-based analysis aims at identifying coordinated changes in molecular processes. In this sense, network-based analysis of high-throughput data provides the means for generating biologically meaningful hypotheses and for planning perturbation (see Glossary) experiments to unveil the underlying regulatory mechanisms. Here we first review the advantages and limitations in high-throughput data acquisition with current metabolomics technologies, and then summarize the latest developments and the most commonly used approaches for network-based analysis, which can be readily applied in plant science. In particular, we illustrate the application of correlation-based networks

Corresponding authors: Nikoloski, Z. ([email protected]); Fait, A. ([email protected]).
Keywords: metabolic profiles; correlation-based metabolic networks; plant metabolism; regulation of cellular processes; high-throughput data acquisition.

Glossary

Bayes rule/theorem (BR): the BR is a statistical measure to calculate the likelihood of the occurrence of an event or condition, dependent on at least two conditional variables.

Community: collection of nodes, representing biochemical components, densely or strongly connected relative to their relation with the rest of the network, which can suggest group functionality.

Constraint-based approaches: these use biochemically meaningful constraints, including mass balance, thermodynamics (reaction reversibility), and the steady-state assumption, to obtain flux distributions optimizing an assumed systemic objective (e.g., biomass production, ATP consumption).

Correlation-based networks (CN): CNs are obtained by applying (Pearson) correlation on the data profiles from the considered biochemical components. In biological CNs, nodes correspond to the biological components, and edges are established based on correlations between the corresponding data profiles. CNs can be composed without any a priori information.

Dynamic time-warping (DTW): DTW is an SM that assesses the similarity between two sequences varying in time. DTW is a suitable SM for comparing the patterns of behavior of two biological objects, such as metabolites, across different time-points.

Euclidean distance (ED): ED measures the 'ordinary' distance between the corresponding coordinates of two points.

Knowledge-based approaches: approaches in which relationships between biochemical components are established based not only on data but also on a priori experimentally established dependencies. For instance, dependencies between metabolites based on their joint participation in reactions can be used as a priori knowledge.

Perturbation: changes in the abundance of cellular components or their interrelations due to alteration in environmental conditions (e.g., temperature, light, nutrient availability) and/or genetic manipulation (e.g., gene knockout or overexpression).

Similarity measure (SM): mathematical expression used to quantify the similarity between two data profiles. Depending on the nature of the data, different SMs can be used (e.g., Euclidean, Manhattan, Hamming distance, or Jaccard and Matching similarity). The Pearson correlation coefficient is often used with biological data.

0167-7799/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.tibtech.2012.10.011 Trends in Biotechnology xx (2012) 1–8


Trends in Biotechnology xxx xxxx, Vol. xxx, No. x

Box 1. Basic components of a graph
(a) Undirected graph: displayed are the nodes (colored circles), representing metabolites, and edges (connecting lines), denoting the relation between their corresponding data profiles, defining a network.
(b) Directed graph: in directed networks, edges have a source and a target node. The source–target relationship is indicated by an arrow.
(c) Hypergraph: in hypergraphs, more than two nodes can be related, rendering a richer structure capable of representing biochemical networks.
(d) Edge-weighted graph: a network can also include information about the strength of relations between the nodes, depicted by the difference in edge-width (Figure I).


Figure I.

(CNs) obtained from high-throughput data profiles to gain novel insights into the complex regulation of biochemical reactions.

Advantages and limitations of data acquisition in metabolomics

State-of-the-art analytical instrumentation is unable to provide a complete snapshot of the metabolome. The heterogeneity in physicochemical properties and the range of concentrations (spanning 7–9 orders of magnitude) of the metabolites in a cell require the parallel use of different analytical platforms. Nevertheless, the study of the metabolome offers valuable complementary information to the transcriptome and proteome approaches, particularly for non-model organisms whose genomes are still not assembled. The metabolome can be seen as a manifestation of the flow of biological information from genes to downstream processes and, in theory, more closely represents the physiological status of the plant cell at any given moment. Moreover, metabolites are generally conserved across kingdoms and are characterized by common biological roles. Their independence from genome information in turn enables the use of standardized reference libraries for any biological sample in relatively rapid and low-cost analyses. As a result, metabolite profiling has become a commonly used tool for the study of microbial [7], plant [8,9], and mammalian systems [10], propelling technological advances in the fields. The analysis of metabolite profiles can lead to the discovery of regulatory mechanisms underlying the activity of an organism, an organ, or a group of cells, deciphering their responses to the environment or to genetic alterations [8,11,12], exposing the natural variability of metabolic regulation [9,13,14], as well as resolving the progression of metabolic processes over time [15,16].

In eukaryotic cells, metabolites, enzymes, and entire metabolic pathways are distributed in specialized compartments, and therefore the study of regulatory mechanisms must account for this aspect of cellular organization. To acquire a higher resolution of subcellular metabolism, several metabolite profiling-based strategies have been developed. Noteworthy among these are: (i) flux analysis of isolated organelles [17–19], where harvested organelles are fed with labeled substrate and its metabolism is followed by measuring labeling patterns in the spectrum of identified metabolites [20–22]; (ii) optimized fractionation protocols, by which subcellular compartments are collected from the plant tissue at reasonable purity (often simultaneously) [23], retaining their metabolic functionality [24,25]; and (iii) the use of transgenic plants altered in their organellar transporters or organelle-specific metabolic genes [11,17]. In addition, computational attempts have been pursued to reconstruct metabolic networks in plant cells by taking into account compartmentalization and tissue-specificity, aided by the integration of existing knowledge on Arabidopsis reactions, enzyme subcellular localization, and tissue-specific protein expression data [26].

Plant metabolism can be represented by multiple network-based approaches

An accurate representation of biochemical reactions, involving metabolites and enzymes, specification of substrate–product relationships, as well as mass-balance and thermodynamic constraints, has proven fundamental to understanding cellular activity [19]. Constraint-based metabolic networks have proved to be valid for modeling and characterizing flux distributions, particularly in unicellular organisms. Constraint-based approaches rely on the assumption that the system is in a metabolic steady-state, characterized by constant metabolite levels, and determine the corresponding flux distributions based on the assumption that the cell/organism operates towards maximization of growth or minimization of ATP usage as likely systemic objectives in a given environment. The environment is characterized by specifying influxes of nutrients taken up by the cell as well as flux capacities for each considered (enzymatic) reaction. Although very attractive, the use of these approaches in the study of organisms with multiple cell types and complex cellular compartmentalization remains challenging [22]. In plants, constraint-based modeling has been used with initial motivating results to validate the predicted metabolic fluxes through integration of gene expression data [12,27] and to analyze principles of coordinated changes of reaction fluxes and metabolite concentrations [28]. At a structural level, biochemical reactions can be represented by a [directed (Box 1b)] hypergraph (Box 1c), in which

an arbitrary number of nodes, representing metabolites, are connected via a (directed) hyperedge [29,30] (Box 1c). Hypergraphs are used to organize and manage biochemical reactions in databases of metabolic networks, for example BioCyc [31], the Kyoto Encyclopedia of Genes and Genomes (KEGG) [32,33], and the MetRxn metabolite and reaction knowledgebase [34]. These network databases have been used to contextualize transcriptomics and proteomics data and have been directed at metabolic engineering [35]. Metabolic networks assembled from scientific knowledge have facilitated investigations of condition-specific steady-state metabolic flux distributions [36]. For instance, the metabolic compositions of Chlamydomonas reinhardtii under mixotrophic and autotrophic conditions were used to specify the condition-specific biomass functions which can in turn be used as objectives maximized by this unicellular organism [37]. Recent studies making use of this knowledge have shed light on resource allocation during seedling establishment [38–41], aromatic and flavor associated traits of fruits [42,43], as well as on developmental processes [44].

Although knowledge-based networks capture substrate–product relationships, they neglect the concerted regulation characteristic of cell metabolism. Traditionally, biochemical reactions are presented as parts of metabolic pathways, such as the tricarboxylic acid (TCA) cycle or the pentose phosphate pathway (PPP). However, biochemical reactions do not operate in isolation but instead function through concerted action via shared metabolites and common mechanisms of regulation. It is a common procedure to treat metabolic pathways as encapsulated stand-alone entities, particularly in the framework of kinetic modeling [44,45]. However, this approach overlooks contextualization of metabolic pathways to the entire metabolic network.
In fact, qualitative and quantitative changes in the metabolite levels of a plant organ or tissue are a consequence of endogenous (e.g., developmental) [46,47] and exogenous (e.g., environmental, stress) cues [48,49], which bring about shifts between metabolic (steady-) states. Perturbation experiments, in which a biological system is subjected to a series of environmental conditions or genetic alterations and subsequently analyzed for its metabolic profile, have been used to define and further refine a cellular objective [13,50–52]. The latter can be defined as a cellular product or process of relevance, such as oils and fats, proteins or specialized metabolites with health-promoting properties, and photosynthetic and resource use efficiency, respectively [53]. Complementary to the constraint-based and kinetic modeling approaches, metabolic network analysis constructed from metabolite profiling data of perturbation experiments facilitates the study and representation of coordinated changes in metabolite abundance. Nevertheless, the assumption that metabolites in a biological sample at a given moment are at steady-state precludes the integration of metabolic data from perturbation experiments into models with the purpose of elucidating fluxes. Addressing analytical limitations of current metabolomics approaches, as well as complexities inherent to the heterogeneity of the plant organism and cellular compartmentalization, will determine the success of this type of metabolic network-based analysis.
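The constraint-based approach described in this section (steady state, flux capacities, and an assumed systemic objective) is commonly written as a linear program. The statement below is a minimal sketch in notation that is assumed here rather than taken from the text: S is the stoichiometric matrix (metabolites by reactions), v the vector of reaction fluxes, c a weight vector selecting the objective (for example the biomass reaction), and v^min, v^max the flux capacities and nutrient influx bounds.

```latex
\begin{aligned}
\max_{v}\quad & c^{\top} v
  && \text{(assumed objective, e.g. growth)} \\
\text{subject to}\quad & S\,v = 0
  && \text{(steady state: constant metabolite levels)} \\
& v_j^{\min} \le v_j \le v_j^{\max}, \quad j = 1,\dots,n
  && \text{(flux capacities and nutrient influxes)}
\end{aligned}
```

Minimization of ATP usage fits the same form by choosing c so that the objective is the negative of the ATP-consuming flux.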


Construction of networks from metabolomics data

A network reconstructed from metabolite profiling data is a collection of nodes, which represent the metabolites, and edges, which capture the relationships between the metabolites (Box 1a). This network definition can be readily extended to integrate data profiles from other biochemical components (e.g., proteins and gene transcripts). A relationship between the data profiles of two metabolites can be quantified by applying different similarity measures and/or principles from probability theory. We point out that this formalism, a characteristic of classical graph theory, is limited to establishing edges only between two nodes, and cannot represent biochemical relationships with more than one substrate, product, or activator/inhibitor. Nevertheless, a correspondence between graph theory-based networks and biochemical networks obtained from KEGG indicated that graph theoretic properties contain biochemically meaningful information [54,55].

The choice of a similarity measure depends on the biological question to be answered. Different relationships are generally captured by applying different measures to pairs of metabolic profiles. For instance, correlation-like similarity measures extract linear relationships, whereas mutual information and its derivatives reveal nonlinearities [54]. By applying the Euclidean distance, one accounts for differences in (relative) abundance between metabolites, while other measures account for differences in the shape of profiles. Measures based on dynamic time-warping [9,56] specifically capture similarity or discordance in time and are, consequently, applicable to time-resolved profiles. If symmetric similarity measures are used, in which the order of the compared data profiles does not matter, the resulting edges are undirected (Box 1a,d). By contrast, asymmetric similarity measures result in directed edges (Box 1b).
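To make the contrast between these measures concrete, the following minimal Python sketch compares Pearson correlation, Euclidean distance, and a textbook dynamic time-warping cost on small profiles. The profile values and function names are hypothetical and chosen only for illustration; the DTW variant shown (absolute difference as local cost, no window constraint) is one common formulation and not necessarily the one used in the cited studies.

```python
import math

def pearson(x, y):
    """Pearson correlation: sensitive to linear co-variation, not to scale."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def euclidean(x, y):
    """Euclidean distance: sensitive to differences in abundance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def dtw(x, y):
    """Dynamic time-warping cost with |a - b| as the local cost."""
    n, m = len(x), len(y)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# Hypothetical profiles (not from the reviewed studies).
met_a = [1.0, 2.0, 3.0, 4.0]
met_b = [10.0, 20.0, 30.0, 40.0]       # same shape as met_a, ten-fold abundance
peak = [0.0, 1.0, 2.0, 1.0, 0.0]       # time-resolved profile
shifted = [0.0, 0.0, 1.0, 2.0, 1.0]    # same peak, delayed by one time-point

r_ab = pearson(met_a, met_b)           # close to 1: identical shape
d_ab = euclidean(met_a, met_b)         # large: very different abundance
d_shift = euclidean(peak, shifted)     # penalizes the delay point by point
w_shift = dtw(peak, shifted)           # smaller: warping absorbs most of the delay
```

All three functions are symmetric in their arguments, so a network built from them would have undirected edges.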
Relationships between metabolic profiles can also be established by using the Bayes rule of conditional dependence, which can reveal directed relationships between metabolic profiles, as has been carried out with transcriptomics data [56–58] and predicted fluxes [59]. We note that the structure of directed networks is expected to contain richer information about metabolic regulation in comparison to undirected networks. Nevertheless, the common occurrence in cellular metabolism of cyclic and parallel pathways and reversible reactions can challenge the determination of the direction of dependence in a relationship.

After applying similarity measures and/or principles from probability theory to all pairs of metabolic profiles, a similarity matrix is obtained. Constructing the [weighted (Box 1d)] network from the similarity matrix requires the application of a statistically sound threshold, which can be principally obtained in two ways: (i) Determine P values for all similarities (followed in [60,61]) and adjust them for multiple hypotheses testing (e.g., Bonferroni or local false-discovery rate [62]). Edges are then established only for entries of the similarity matrix that are statistically significant at a pre-specified level α (followed in [63]), often calculated with the aid of permutation tests. Motivated by the interest in the strongest relationships, one may further filter for entries in the similarity matrix above a fixed


Table 1. Network properties. The most common network properties used to investigate biological networks.

Columns: Network property and definition; Illustration.

Degree of node-connectivity: The degree of a node i, denoted by k_i, is the number of edges linking it to other nodes of the network. In directed networks, the degree of the node is the sum of its in-degree and out-degree, denoting the number of in-coming and out-going edges on the node, respectively.
Illustration: k = 5

(Geodesic) distance: The (geodesic) distance between two nodes i and j is the length of a shortest path (i.e., geodesic) between them. In unweighted graphs, the length of a path is the number of edges included in the path, while in an edge-weighted graph it is given by the sum of edge weights on the path.
Illustration: geodesic distance = 2 between e and b

Average path length: The average path length of a given graph is the average length of the geodesic distance between all pairs of nodes.
Illustration: average (geodesic distances) = 1.53

Closeness centrality: Closeness centrality is the reciprocal of the average path length between a given node i and all other nodes in a given connected graph.
Illustration: average path length of node d = 9/5; closeness centrality = (9/5)^-1 ≈ 0.56

Diameter: The diameter of a graph is the maximum geodesic distance between any pair of nodes.
Illustration: max (geodesic distances) = 2, read from the all-to-all geodesic distance matrix:

      a  b  c  d  e  f
  a   0  1  1  1  1  1
  b   1  0  1  2  2  2
  c   1  1  0  2  2  2
  d   1  2  2  0  2  2
  e   1  2  2  2  0  1
  f   1  2  2  2  1  0

Eccentricity: The eccentricity of a given node i is the maximum distance to any other node in the graph.
Illustration (figure labels a–f with values 2, 4, 4, 6, 5): max (max distances of f) = 6

Clique: A clique is a complete subnetwork in which each pair of nodes is connected by an edge.

Clustering coefficient: The clustering coefficient of a node i is the proportion of existing edges from all possible edges between the neighbors of i. It quantifies how close the subnetwork induced by i and its adjacent nodes is to a clique.
Illustration: neighbors b, c, d, e, f; 2/10 connections; clustering coefficient of node a = 0.2

Node/edge betweenness centrality: The node/edge betweenness centrality of a node i or edge l, respectively, is given by the number of geodesics (shortest paths) between any two nodes that contain the node/edge.
Illustration: geodesics passing through node: node betweenness centrality of a = 8; geodesics passing through edge: edge betweenness centrality of l = 4


threshold (e.g., as used in [56]). (ii) Obtain a threshold value that guarantees a pre-specified false-discovery rate. Edges are then established only for entries of the similarity matrix that are above the obtained threshold [64].

Structural properties of plant metabolic networks and functional roles of metabolites

Visualization of metabolic data profiles by network-based representation extends beyond the visualization of relationships of pairs of metabolites. Structural properties of graphs can be used for the interpretation of datasets and for generating hypotheses; some salient structural properties are illustrated in Table 1. For example, clustering of nodes and/or edges in a network can identify groups of nodes with similar chemical properties, and these are referred to as 'modules' or 'communities' [65]. The resulting communities can be tested for over-enrichment [66] of particular compound classes to establish further relations between metabolic processes. Finally, if networks are constructed from two different sets of data (e.g., from different genotypes or organs), structural properties can be used to determine the differences between the networks [67]. The pipeline for analysis of metabolomics data with the aid of network-based analysis is illustrated in Figure 1.
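As a worked illustration of the structural properties listed in Table 1, the sketch below recomputes several of them in plain Python. The edge list is an assumption: it is inferred from the all-to-all geodesic distance matrix in Table 1 by taking every pair at distance 1 as an edge. The function and variable names are hypothetical.

```python
from collections import deque
from itertools import combinations

# Edge list inferred from the distance-1 entries of the Table 1 matrix (assumption).
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("a", "f"),
         ("b", "c"), ("e", "f")]

def build_adjacency(edges):
    """Undirected, unweighted adjacency sets."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_distances(adj, source):
    """Geodesic distances from source via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def clustering_coefficient(adj, node):
    """Existing edges among the neighbors of node, over all possible ones."""
    nbrs = sorted(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)

def pairs_with_geodesic_through(dist, node):
    """Pairs (u, v) having at least one geodesic that contains node."""
    others = sorted(n for n in dist if n != node)
    return sum(1 for u, v in combinations(others, 2)
               if dist[u][node] + dist[node][v] == dist[u][v])

adj = build_adjacency(EDGES)
dist = {u: bfs_distances(adj, u) for u in adj}
pairs = list(combinations(sorted(adj), 2))

degree_a = len(adj["a"])
avg_path_length = sum(dist[u][v] for u, v in pairs) / len(pairs)
diameter = max(dist[u][v] for u, v in pairs)
closeness_d = (len(adj) - 1) / sum(dist["d"].values())
clustering_a = clustering_coefficient(adj, "a")
through_a = pairs_with_geodesic_through(dist, "a")
```

In this small graph every geodesic between non-adjacent nodes is unique and runs through node a, so counting pairs coincides with counting geodesics; in general graphs a full betweenness computation has to account for multiple shortest paths per pair.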

Correlation-based networks (CNs)

Edges are obtained by applying correlation-based measures to the investigated data profiles, including Pearson correlation, rank-based correlation coefficients, and partial correlation [68], which removes spurious relationships. Removal of spurious relationships is of particular importance when one attempts to establish causal relations between biochemical entities. Selection of threshold values for CN construction follows the procedure outlined above (e.g., [64,69]). CNs can potentially be used as a top-down approach, whereby regulatory mechanisms can be revealed by progressively supplementing data of different origin. A recent study demonstrated that changes in gene expression are associated with alterations in the levels of metabolites [62], although the relationship is far from linear. Therefore, integrating information on the metabolite and gene expression levels may be used to elucidate the structure and regulation of a metabolic network. CNs obtained from metabolomics and transcriptomics data gathered in time-resolved experiments of Arabidopsis rosette leaves and roots were used to study allosteric regulations in Arabidopsis, and resulted in the identification of flavonoid biosynthetic genes [70]. In a mapping population of Arabidopsis, to explore the natural variability of glucosinolate metabolism, metabolic

[Figure 1 panel labels: (a) plant population under a given condition; metabolic, transcriptomic, proteomic, or morphological screening; data matrices. (b) Tissue-specific normalization; normalized data matrices. (c) (Dis)similarity measures or probability theory laws; (dis)similarity matrices. (d) Network generation and threshold selection; network analysis. (e) Repeat experiment with second-season data; verification of posited hypotheses.]

Figure 1. Suggested pipeline for correlation-based network reconstruction. (a) Subject plant samples from a perturbation experiment to omics and phenotypic profiling. (b) Normalize data for subsequent similarity measures. Store results in data matrices. (c) Use normalized and transformed data to apply similarity measures in a pairwise manner, for example calculated correlation coefficients (Pearson, Spearman, or Kendall). Estimate P values for all coefficients. Next, determine threshold values for the resulting correlation coefficients and P values, storing results in adjacency matrices serving as the blueprint for the construction of networks. (d) Construct network and analyze network for graph-theoretic properties and infer biological meanings by comparing to known biochemical pathways. (e) Repeat analysis for a second season to verify posited hypotheses.
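Steps (c) and (d) of this pipeline can be sketched in a few lines of plain Python. The example is a minimal illustration under stated assumptions, not the authors' implementation: the metabolite names and values are hypothetical, P values are estimated with a simple permutation test, Bonferroni is used as the multiple-testing correction, and the optional `min_abs_r` filter stands in for the fixed threshold on the strongest relationships.

```python
import math
import random
from itertools import combinations

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def permutation_p_value(x, y, n_perm, rng):
    """Two-sided permutation P value for the Pearson correlation of x and y."""
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed:
            hits += 1
    # The +1 terms keep the estimate away from an impossible P value of zero.
    return (hits + 1) / (n_perm + 1)

def correlation_network(profiles, alpha=0.05, min_abs_r=0.0, n_perm=999, seed=0):
    """Return {(u, v): r} for pairs significant after Bonferroni correction."""
    rng = random.Random(seed)
    pairs = list(combinations(sorted(profiles), 2))
    corrected_alpha = alpha / len(pairs)  # Bonferroni over all tested pairs
    edges = {}
    for u, v in pairs:
        r = pearson(profiles[u], profiles[v])
        p = permutation_p_value(profiles[u], profiles[v], n_perm, rng)
        if p <= corrected_alpha and abs(r) >= min_abs_r:
            edges[(u, v)] = r  # the coefficient doubles as the edge weight
    return edges

# Hypothetical normalized profiles across eight samples.
profiles = {
    "met_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "met_B": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1],  # tracks met_A
    "met_C": [5.0, 1.0, 4.0, 8.0, 2.0, 7.0, 3.0, 6.0],     # unrelated pattern
}

network = correlation_network(profiles)
```

With many metabolites the Bonferroni-corrected level can fall below the smallest P value a permutation test can return, 1/(n_perm + 1), so `n_perm` has to grow with the number of pairs or an analytical P value be used instead.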


pathways were reconstructed by integrating metabolic quantitative trait loci (mQTL) with gene expression quantitative trait loci (eQTL) of biosynthetic genes [71,72]. The approach of integrating advanced multivariate strategies coupled with CN-based analysis to investigate the data profiles aided in the identification of clusters of metabolites and transcripts of metabolic genes or transcription factors based on underlying regulatory mechanisms. For example, transcripts of sulfur assimilation clustered with O-acetylserine, which is considered to be a positive regulator of these genes. In another study, metabolic CNs were employed with genome-wide association mapping in Arabidopsis to study the mode of inheritance of metabolic traits, and this identified a number of candidate naturally variable genes with significant impact upon plant metabolism [73]. Moreover, CNs have been employed on metabolomics data and morphological traits in support of the sink–source paradigm and the trade-off characteristics between vegetative and reproductive organs [74]. Based on the analysis, plant vegetative growth and harvest index were suggested to regulate pericarp sugar metabolism. In turn this interaction likely affected the availability of precursors for storage reserve accumulation in the seed. Furthermore, comparative analysis of fruit and seed CNs showed increased coordinated regulation of the seed metabolism in response to genetic introgression, suggesting a functional relevance of the ratios between metabolites with respect to the ontogeny of the plant organ [9,14,75]. The described CN-derived results may be useful for the enhancement of crop-breeding strategies. In another study, seed-specific metabolic modules have been identified via correlation-based network analysis, whose resilience to perturbation is indicative of the relevance of maintaining specific metabolite ratios within the seed [48].
Metabolites whose profiles result in significant correlations may not necessarily be of the same biochemical background. Thus, comparing CNs with biochemical knowledge may also inform on interdependence of biochemical pathways and suggest post-translational regulation and allosteric interactions [9]. By contrast, CN-based analysis has provided evidence that metabolites in the same cellular compartment likely share tighter relations, which are reflected in strong correlations [76]. Similarly, a remarkably high correlation across biological replicates was found between metabolite pairs involved in different reactions governed by the identical enzyme, due to enzyme promiscuity, the capability to catalyze more than one reaction [77]. Furthermore, analysis of CN structure and its properties allows the determination of communities of tightly interconnected components, which are mainly due to the tight regulation of metabolic processes [78]. Picking up on this principle, the covariance structure of metabolites and pattern of protein change were investigated by integrating metabolomic and proteomic data [16]. In this study, the analysis of correlation network topology from data of diurnal changes in cellular components in Arabidopsis rosettes allowed a degree of causality between network components to be established. The time-dependent shifts of cellular components integrated into correlation between


metabolites and transcripts suggested the occurrence of metabolite-induced gene expression together with the changes of metabolic abundance due to the alteration in gene expression. These studies show that network-based analysis can aid in discovering the regulatory mechanisms underlying the data profiles of biochemical components considered in the network analysis.

Concluding remarks

Networks reconstructed from metabolomics data provide a formal framework for investigating plant metabolism. Supplemented by datasets from other approaches, including transcripts and proteins, CNs can help to identify regulatory mechanisms and crosstalk between distinct metabolic pathways in response to perturbation. Moreover, they can hint at the existence of uncharacterized metabolic pathways through the comparison with model microbial systems. Comparative analysis of CNs across species and kingdoms, and the isolation of conserved metabolic interactions, may highlight phylogenetic relationships, which are not impeded by the lack of genetic data, and provide evolutionary insights. Finally, network properties can be used first to relate structural roles with biological implications, and then to unravel information that is not made available via traditional analysis of complex datasets. For example, identifying metabolic hubs and defining their role in the topology of the network may be useful for metabolic engineering. When integrating high-throughput data from multiple platforms, the assessment of network properties may unveil the existence of functional biological communities, which capture the interaction of biochemical components from different levels of biological organization. Despite these advantages, there are still several unsolved questions that should be the focus of future studies (Box 2).
Developing a standardized network-based framework will require: (i) setting an appropriate null model for determining biologically meaningful properties, which will have a strong effect on setting permissive threshold values for establishing edges, (ii) devising adequate measures for

Box 2. Outstanding questions
• Reducing the ratio of not annotated metabolites (NA) within the dataset. NAs account for a great portion of a metabolite profile, and eventually hinder the reconstruction and determination of biologically meaningful pathways, resulting in fragmentary reconstruction of metabolic networks [79].
• Timescale. When combining datasets from different types of biochemical components, such as transcript and metabolite profiles, linear relationships may not be the most appropriate. The cellular differential timescale intrinsic to the regulation of each molecular level (gene expression, protein synthesis, metabolic fluxes) might be missed by linear pairwise correlations. Similarly, when using datasets based on different developmental stages of a plant system with the aim to identify timely consecutive events, data alignment is challenging.
• Analytical heterogeneity. The heterogeneity of analytical platforms used to monitor the different cell components necessitates optimization for uniformity under scrutiny.
• Subcellular metabolic differentiation. Major efforts will be required to resolve the compartmentalization of metabolic processes.

TIBTEC-1018; No. of Pages 8

Review capturing the relationship between measured metabolite contents/concentrations as well as biologically meaningful network properties, and (iii) methods for network-level comparisons that identify modules of interest; these modules may be responsible for the rewiring of the network in the wake of adaptation to environmental perturbations or genetic alterations. Moreover, networks constructed from data have largely been analyzed in purely structural terms. Future analysis should consider the inclusion of additional information such as the edge-weights corresponding to the strength of the identified relationships. Nevertheless, the studies and examples reviewed here demonstrate the potential of network-based analysis to tackle complex data analysis and interpretation. Acknowledgments The authors would like to thank Hiro Nonogaki, Hillel Fromm, and Simon Barak for critical reading of the manuscript.

References

1 Barabasi, A.L. and Oltvai, Z.N. (2004) Network biology: understanding the cell’s functional organization. Nat. Rev. Genet. 5, 101–113
2 Yamada, T. and Bork, P. (2009) Evolution of biomolecular networks – lessons from metabolic and protein interactions. Nat. Rev. Mol. Cell Biol. 10, 791–803
3 Jeong, H. et al. (2000) The large-scale organization of metabolic networks. Nature 407, 651–654
4 Butte, A.J. et al. (2000) Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proc. Natl. Acad. Sci. U.S.A. 97, 12182–12186
5 Chuang, H-Y. et al. (2007) Network-based classification of breast cancer metastasis. Mol. Syst. Biol. 3, http://dx.doi.org/10.1038/msb4100180
6 Cline, M.S. et al. (2007) Integration of biological networks and gene expression data using Cytoscape. Nat. Protoc. 2, 2366–2382
7 Schwarz, D. et al. (2011) Metabolic and transcriptomic phenotyping of inorganic carbon acclimation in the cyanobacterium Synechococcus elongatus PCC 7942. Plant Physiol. 155, 1640–1655
8 Riedelsheimer, C. et al. (2012) Genome-wide association mapping of leaf metabolic profiles for dissecting complex traits in maize. Proc. Natl. Acad. Sci. U.S.A. 109, 8872–8877
9 Toubiana, D. et al. (2012) Metabolic profiling of a mapping population exposes new insights in the regulation of seed metabolism and seed, fruit, and plant relations. PLoS Genet. 8, e1002612
10 Kamleh, M.A. et al. (2009) Applications of mass spectrometry in metabolomic studies of animal model and invertebrate systems. Brief. Funct. Genomics Proteomics 8, 28–48
11 Suzuki, Y. et al. (2012) Metabolome analysis of photosynthesis and the related primary metabolites in the leaves of transgenic rice plants with increased or decreased Rubisco content. Plant Cell Environ. 35, 1369–1379
12 Hay, J. and Schwender, J. (2011) Computational analysis of storage synthesis in developing Brassica napus L. (oilseed rape) embryos: flux variability analysis in relation to C-13 metabolic flux analysis. Plant J. 67, 513–525
13 Roessner, U. et al. (2001) Metabolic profiling allows comprehensive phenotyping of genetically or environmentally modified plant systems. Plant Cell 13, 11–29
14 Schauer, N. et al. (2008) Mode of inheritance of primary metabolic traits in tomato. Plant Cell 20, 509–523
15 Kanani, H. et al. (2010) Individual vs. combinatorial effect of elevated CO2 conditions and salinity stress on Arabidopsis thaliana liquid cultures: comparing the early molecular response using time-series transcriptomic and metabolomic analyses. BMC Syst. Biol. 4, 177 http://dx.doi.org/10.1186/1752-0509-4-177
16 Gibon, Y. et al. (2006) Integration of metabolite with transcript and enzyme activity profiling during diurnal cycles in Arabidopsis rosettes. Genome Biol. 7, R76
17 Michaeli, S. et al. (2011) A mitochondrial GABA permease connects the GABA shunt and the TCA cycle, and is essential for normal carbon metabolism. Plant J. 67, 485–498

Trends in Biotechnology xxx xxxx, Vol. xxx, No. x

18 Williams, T.C.R. et al. (2011) Capturing metabolite channeling in metabolic flux phenotypes. Plant Physiol. 157, 981–984
19 Lewis, N.E. et al. (2012) Constraining the metabolic genotype–phenotype relationship using a phylogeny of in silico methods. Nat. Rev. Microbiol. 10, 291–305
20 Kruger, N.J. and Ratcliffe, R.G. (2012) Pathways and fluxes: exploring the plant metabolic network. J. Exp. Bot. 63, 2243–2246
21 Kruger, N.J. et al. (2012) Strategies for investigating the plant metabolic network with steady-state metabolic flux analysis: lessons from an Arabidopsis cell culture and other systems. J. Exp. Bot. 63, 2309–2323
22 Sweetlove, L.J. and Ratcliffe, R.G. (2011) Flux–balance modelling of plant metabolism. Front. Plant Sci. 2, 38 http://dx.doi.org/10.3389/fpls.2011.00038
23 Cox, B. and Emili, A. (2006) Tissue subcellular fractionation and protein extraction for use in mass-spectrometry-based proteomics. Nat. Protoc. 1, 1872–1878
24 Tohge, T. et al. (2011) Toward the storage metabolome: profiling the barley vacuole. Plant Physiol. 157, 1469–1482
25 Klie, S. et al. (2011) Analysis of the compartmentalized metabolome – a validation of the non-aqueous fractionation technique. Front. Plant Sci. 2, 27
26 Mintz-Oron, S. et al. (2012) Reconstruction of Arabidopsis metabolic network models accounting for subcellular compartmentalization and tissue-specificity. Proc. Natl. Acad. Sci. U.S.A. 109, 339–344
27 Williams, T.C.R. et al. (2010) A genome-scale metabolic model accurately predicts fluxes in central carbon metabolism under stress conditions. Plant Physiol. 154, 311–323
28 Kleessen, S. and Nikoloski, Z. (2012) Dynamic regulatory on/off minimization for biological systems under internal temporal perturbations. BMC Syst. Biol. 6, 16
29 Klamt, S. et al. (2009) Hypergraphs and cellular networks. PLoS Comput. Biol. 5, e1000385
30 Larhlimi, A. et al. (2011) Robustness of metabolic networks: a review of existing definitions. Biosystems 106, 1–8
31 Karp, P.D. et al. (2005) Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res. 33, 6083–6089
32 Aoki-Kinoshita, K.F. and Kanehisa, M. (2007) Gene annotation and pathway mapping in KEGG. Methods Mol. Biol. 396, 71–91
33 Kanehisa, M. and Goto, S. (2000) KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 28, 27–30
34 Kumar, A. et al. (2012) MetRxn: a knowledgebase of metabolites and reactions spanning metabolic models and databases. BMC Bioinform. 13, 6
35 Colijn, C. et al. (2009) Interpreting expression data with metabolic flux models: predicting Mycobacterium tuberculosis mycolic acid production. PLoS Comput. Biol. 5, e1000489
36 Schuetz, R. et al. (2012) Multidimensional optimality of microbial metabolism. Science 336, 601–604
37 Chang, R.L. et al. (2011) Metabolic network reconstruction of Chlamydomonas offers insight into light-driven algal metabolism. Mol. Syst. Biol. 7, 518
38 Buckeridge, M. et al. (2005) The role of exo-beta-galactanase in the mobilization of polysaccharides from the cotyledon cell walls of Lupinus angustifolius following germination. Ann. Bot. 96, 435–444
39 DeSilva, K. et al. (1993) Molecular characterization of a xyloglucan-specific endo-(1-4)-beta-D-glucanase (xyloglucan endo-transglycosylase) from nasturtium seeds. Plant J. 3, 701–711
40 Ruprecht, C. et al. (2011) Large-scale co-expression approach to dissect secondary cell wall formation across plant species. Front. Plant Sci. 2, 23
41 Nonogaki, H. (2008) Seed Germination and Reserve Mobilization, John Wiley & Sons http://dx.doi.org/10.1002/9780470015902.a0002047.pub2
42 Davidovich-Rikanati, R. et al. (2007) Enrichment of tomato flavor by diversion of the early plastidial terpenoid pathway. Nat. Biotechnol. 25, 899–901
43 Gonda, I. et al. (2010) Branched-chain and aromatic amino acid catabolism into aroma volatiles in Cucumis melo L. fruit. J. Exp. Bot. 61, 1111–1123
44 Hennig, L. (2007) Patterns of beauty – omics meets plant development. Trends Plant Sci. 12, 287–293
45 Morgan, J.A. and Rhodes, D. (2002) Mathematical modeling of plant metabolic pathways. Metab. Eng. 4, 80–89


46 Bier, M. et al. (2000) How yeast cells synchronize their glycolytic oscillations: A perturbation analytic treatment. Biophys. J. 78, 1087–1093
47 Colon, A.M. et al. (2010) A kinetic model describes metabolic response to perturbations and distribution of flux control in the benzenoid network of Petunia hybrida. Plant J. 62, 64–76
48 Yakir, E. et al. (2007) Regulation of output from the plant circadian clock. FEBS J. 274, 335–345
49 Lombardo, V.A. et al. (2011) Metabolic profiling during peach fruit development and ripening reveals the metabolic networks that underpin each developmental stage. Plant Physiol. 157, 1696–1710
50 Kaplan, F. et al. (2004) Exploring the temperature-stress metabolome of Arabidopsis. Plant Physiol. 136, 4159–4168
51 Shulaev, V. et al. (2008) Metabolomics for plant stress response. Physiol. Plant. 132, 199–208
52 Bundy, J.G. et al. (2009) Environmental metabolomics: a critical review and future perspectives. Metabolomics 5, 3–21
53 Weber, A.P.M. and Braeutigam, A. (2012) The role of membrane transport in metabolic engineering of plant primary metabolism. Curr. Opin. Biotechnol. http://dx.doi.org/10.1016/j.copbio.2012.09.010
54 Croes, D. et al. (2006) Inferring meaningful pathways in weighted metabolic networks. J. Mol. Biol. 356, 222–236
55 Varma, A. and Palsson, B.O. (1994) Metabolic flux balancing – basic concepts, scientific and practical use. Biotechnology 12, 994–998
56 Donner, S. et al. (2011) Unraveling gene-regulatory networks from time-resolved gene expression data – a measures comparison study. BMC Bioinform. 12, 292
57 Christin, C. et al. (2010) Time alignment algorithms based on selected mass traces for complex LC-MS data. J. Proteome Res. 9, 1483–1495
58 Tormene, P. et al. (2009) Matching incomplete time series with dynamic time warping: an algorithm and an application to poststroke rehabilitation. Artif. Intell. Med. 45, 11–34
59 Friedman, N. et al. (2000) Using Bayesian networks to analyze expression data. J. Comput. Biol. 7, 601–620
60 Li, Z. and Chan, C. (2004) Inferring pathways and networks with a Bayesian framework. FASEB J. 18, 746–748
61 Kim, H.U. et al. (2011) Framework for network modularization and Bayesian network analysis to investigate the perturbed metabolic network. BMC Syst. Biol. 5, 12
62 Zushi, K. and Matsuzoe, N. (2011) Utilization of correlation network analysis to identify differences in sensory attributes and organoleptic compositions of tomato cultivars grown under salt stress. Sci. Hortic. 129, 18–26



63 Fukushima, A. et al. (2011) Metabolomic correlation-network modules in Arabidopsis based on a graph-clustering approach. BMC Syst. Biol. 5, 12
64 Lisec, J. et al. (2011) Corn hybrids display lower metabolite variability and complex metabolite inheritance patterns. Plant J. 68, 326–336
65 Osorio, S. et al. (2012) Integrative comparative analyses of transcript and metabolite profiles from pepper and tomato ripening and development stages uncovers species-specific patterns of network regulatory behavior. Plant Physiol. 159, 1713–1729
66 Newman, M.E.J. (2012) Communities, modules and large-scale structure in networks. Nat. Phys. 8, 25–31
67 Subramanian, A. et al. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A. 102, 15545–15550
68 Ideker, T. and Krogan, N.J. (2012) Differential network biology. Mol. Syst. Biol. 8, 565
69 Prokhorov, A.V. (2001) Partial correlation coefficient. In Encyclopaedia of Mathematics (Hazewinkel, M., ed.), Springer, 978-1556080104
70 Sulpice, R. et al. (2010) Mild reductions in cytosolic NADP-dependent isocitrate dehydrogenase activity result in lower amino acid contents and pigmentation without impacting growth. Amino Acids 39, 1055–1066
71 Hirai, M.Y. et al. (2005) Elucidation of gene-to-gene and metabolite-to-gene networks in Arabidopsis by integration of metabolomics and transcriptomics. J. Biol. Chem. 280, 25590–25595
72 Caldana, C. et al. (2011) High-density kinetic analysis of the metabolomic and transcriptomic response of Arabidopsis to eight environmental conditions. Plant J. 67, 869–884
73 Wentzell, A.M. et al. (2007) Linking metabolic QTLs with network and cis-eQTLs controlling biosynthetic pathways. PLoS Genet. 3, 1687–1701
74 Chan, E.K.F. et al. (2010) The complex genetic architecture of the metabolome. PLoS Genet. 6, e1001198
75 Schauer, N. et al. (2006) Comprehensive metabolic profiling and phenotyping of interspecific introgression lines for tomato improvement. Nat. Biotechnol. 24, 447–454
76 Steuer, R. et al. (2003) Observing and interpreting correlations in metabolomic networks. Bioinformatics 19, 1019–1026
77 Camacho, D. et al. (2005) The origin of correlations in metabolomics data. Metabolomics 1, 53–63
78 Kose, F. et al. (2001) Visualizing plant metabolomic correlation networks using clique–metabolite matrices. Bioinformatics 17, 1198–1208
79 Kueger, S. et al. (2012) High-resolution plant metabolomics: from mass spectral features to metabolites and from whole-cell analysis to subcellular metabolite distributions. Plant J. 70, 39–50