Network News: Functional Modules Revealed during Early Embryogenesis in C. elegans



Developmental Cell, Vol. 9, 307–315, September, 2005, Copyright ©2005 by Elsevier Inc. DOI 10.1016/j.devcel.2005.08.008

Previews

The functional module is fast becoming the operational unit of the postgenomics era. A new report in Nature by Gunsalus and colleagues describes, using a multiply supported network, functional modules within early C. elegans embryos and identifies several new components of known molecular machines (Gunsalus et al., 2005).

How does a cell work? Until recently, biologists could only take a reductionist approach to address this question by investigating individual components of a cell, how they work, and the consequence to the cell upon their removal. However, it is difficult to imagine how this approach could yield an understanding of a cell on a global level. Imagine trying to understand how a jet airplane works by cataloguing the millions of parts, studying how individual parts work, and then asking what happens to the plane if various parts are removed. It simply doesn’t fly.

Instead, some have argued that an understanding of a cell might be better achieved by investigating, not individual genes, RNAs, or proteins, but a higher level of organization within a cell: the functional module (Hartwell et al., 1999). A functional module is composed of multiple molecules that work together in a cell as a distinct unit and has emergent properties not found among the individual components. A module may be a single physical entity (or a so-called protein or molecular machine [Alberts, 1998]) such as a ribosome. Alternatively, a module can be made from a number of separate physical entities, like a signal transduction pathway. By identifying the composition of modules within a cell and understanding how they behave individually and interact with each other, we will undoubtedly develop a better appreciation of how a cell works.
For example, it is much easier to understand how a jet airplane flies once you know that some components function together to generate lift, others to generate thrust, and still others to hold precious cargo.

Identifying all of the modules within a cell using conventional biochemical and genetic techniques is a daunting challenge. However, alternative approaches for identifying functional modules have been developed indirectly through efforts to visualize and interpret large datasets generated in the postgenomics era. One of the most versatile ways of visualizing large datasets is to represent relationships between genes and/or proteins on a network diagram or graph (Barabasi and Oltvai, 2004). Each gene or protein on a network graph is called a node, and relationships between the nodes are drawn as lines called edges (Figure 1). The edges within a network graph can be used to represent interactions between two nodes, such as physical or genetic interactions, or they can represent a high correlation of behavior between two nodes, such as mRNA expression levels. Similar to cluster analysis of microarray data, network analysis allows one to infer function by the “guilt-by-association” principle. That is, the function of any one gene or protein can be inferred from that of the other nodes linked to it on the graph. Functional modules should therefore be apparent on a network graph as groups of highly interconnected nodes that are only sparsely connected to the remainder of the graph (Figure 1).

Several computational tools have been developed to identify modules within network graphs (e.g., Bader and Hogue, 2003; Spirin and Mirny, 2003). However, due to the noise commonly associated with large datasets, false positive edges can interfere with the detection of isolated groups of highly interconnected nodes within the network. One way to reduce false positives and make modules more apparent is to combine large datasets and build network graphs whose nodes and edges are represented by independent observations (e.g., Lee et al., 2004) (Figure 1).

Figure 1. An Example of How an MSN Is Created from Three Distinct Networks
Only those edges supported by two or more networks are maintained in the MSN. Putative modules within the MSN are circled.

This approach has recently been used to identify functional modules during early embryogenesis in C. elegans. As reported in a recent issue of Nature, Gunsalus and colleagues generated a network graph based on the phenotypic similarities between 661 genes that result in early embryonic defects when disrupted by RNAi (Gunsalus et al., 2005; Sonnichsen et al., 2005). The embryonic phenotypes were digitally scored for 45 different attributes. Pairs of genes whose phenotypic scores correlate above a certain statistical threshold were then linked. This graph was then overlaid with a graph of C. elegans physical interactions between proteins (Li et al., 2004) and a graph that linked genes that share similar transcriptional profiles across a wide range of experimental conditions (Kim et al., 2001). The combined graph contained 31,173 edges in the largest unbroken network or main giant component. However, upon eliminating edges that were not supported by multiple experimental approaches, only 1,036 edges remained.

The resulting network graph contained many groups of highly interconnected nodes that were only minimally connected to other nodes. These densely interconnected regions of the “multiply supported network” (MSN) clearly broke down into two models of functional modules. First, dense regions principally supported by all three experimental approaches or those supported only by protein interaction and shared phenotype were predicted to make molecular machines such as the proteasome.
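
The edge-filtering step behind an MSN, in which an edge is kept only when two or more networks support it, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used by Gunsalus et al. (2005): the function names `build_msn` and `connected_groups`, the evidence labels, and the gene names are all hypothetical, and grouping genes by simple connectivity is a stand-in for the dedicated module-finding tools cited above.

```python
from collections import defaultdict


def build_msn(networks, min_support=2):
    """Keep edges supported by at least `min_support` of the input networks.

    `networks` maps an evidence label to an iterable of (gene_a, gene_b) pairs.
    Returns a dict mapping each retained edge (a frozenset of two genes)
    to the set of evidence labels that support it.
    """
    support = defaultdict(set)
    for label, edges in networks.items():
        for a, b in edges:
            if a != b:  # ignore self-loops
                support[frozenset((a, b))].add(label)  # edges are undirected
    return {edge: labels for edge, labels in support.items()
            if len(labels) >= min_support}


def connected_groups(msn):
    """Group genes that are linked through retained MSN edges.

    Assumption: plain connectivity is used here as a crude proxy for
    "highly interconnected" regions; it does not measure edge density.
    """
    neighbors = defaultdict(set)
    for edge in msn:
        a, b = tuple(edge)
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, groups = set(), []
    for start in neighbors:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first traversal from `start`
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(group)
    return groups


# Toy data with made-up gene names; not taken from the study.
networks = {
    "phenotype": [("g1", "g2"), ("g2", "g3"), ("g1", "g3"),
                  ("g4", "g5"), ("g3", "g4")],
    "interaction": [("g1", "g2"), ("g2", "g3"), ("g5", "g6")],
    "expression": [("g1", "g3"), ("g4", "g5"), ("g2", "g1")],
}
msn = build_msn(networks)
groups = connected_groups(msn)
```

Because each retained edge carries the set of evidence types that support it, the same structure can be used to ask which dense regions rest on all three approaches and which rest only on a particular pair, the distinction drawn between the two kinds of module.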
The other dense regions were primarily supported only by phenotypic and expression correlations and may represent functionally interdependent cellular processes such as chromosome maintenance and nucleocytoplasmic transport.

Although MSNs have been used before to define functional modules, what is novel about the approach used by Gunsalus et al. (2005) is that phenotypic data was used to focus on modules relevant to early embryogenesis. By applying stringent filters when constructing the coexpression and physical interaction networks, but a much less stringent one when constructing the phenotypic network, the authors ensured that the vast majority of the links within the MSN were supported by phenotypic data. As such, the functional modules predicted by the MSN are likely directly relevant to early embryogenesis. In support of this, the expression patterns of ten previously uncharacterized genes from various modules were investigated with GFP reporters, and all ten were expressed in early embryos. In addition, eight of the ten had subcellular localizations consistent with the predicted function of the module, further demonstrating the validity of this innovative approach.

Most physical interaction and coexpression datasets blend functional associations from a variety of stages and cell types. Functional modules predicted solely from these nonspecific datasets may never actually exist in their entirety in any one cell or at any one developmental stage. However, the work of Gunsalus and colleagues suggests that existing interaction and expression data may be adaptable to any specific cell type or stage to reveal relevant functional modules, provided there is a robust dataset that anchors a multiply supported network in the cells and stage of interest. The authors have therefore provided a new approach to monitor how the composition of a module changes over the lifetime of a cell or organism, which may represent a new beginning towards understanding how cells work.

Peter J. Roy¹ and Quaid Morris²
¹ Department of Medical Genetics and Microbiology
² Banting and Best Department of Medical Research
The Terrence Donnelly Centre for Cellular and Biomolecular Research
University of Toronto
160 College Street
Toronto, Ontario M5S 3E1
Canada

Selected Reading

Alberts, B. (1998). Cell 92, 291–294.
Bader, G.D., and Hogue, C.W. (2003). BMC Bioinformatics 4, 2.
Barabasi, A.L., and Oltvai, Z.N. (2004). Nat. Rev. Genet. 5, 101–113.
Gunsalus, K.C., Ge, H., Schetter, A.J., Goldberg, D.S., Han, J.D., Hao, T., Berriz, G.F., Bertin, N., Huang, J., Chuang, L.S., et al. (2005). Nature 436, 861–865.
Hartwell, L.H., Hopfield, J.J., Leibler, S., and Murray, A.W. (1999). Nature 402, C47–C52.
Kim, S.K., Lund, J., Kiraly, M., Duke, K., Jiang, M., Stuart, J.M., Eizinger, A., Wylie, B.N., and Davidson, G.S. (2001). Science 293, 2087–2092.
Lee, I., Date, S.V., Adai, A.T., and Marcotte, E.M. (2004). Science 306, 1555–1558.
Li, S., Armstrong, C.M., Bertin, N., Ge, H., Milstein, S., Boxem, M., Vidalain, P.O., Han, J.D., Chesneau, A., Hao, T., et al. (2004). Science 303, 540–543.
Sonnichsen, B., Koski, L.B., Walsh, A., Marschall, P., Neumann, B., Brehm, M., Alleaume, A.M., Artelt, J., Bettencourt, P., Cassin, E., et al. (2005). Nature 434, 462–469.
Spirin, V., and Mirny, L.A. (2003). Proc. Natl. Acad. Sci. USA 100, 12123–12128.