Molecular Cell, Vol. 12, 1343–1351, December, 2003, Copyright 2003 by Cell Press
Previews
Functional Genomics: All the King’s Horses AND All the King’s Men CAN Put Humpty Together Again
The integration of data from different kinds of high-throughput functional genomic analyses not only permits the accurate determination of the functions of novel genes, but also presents the prospect of a “systems” approach to biology in the future.
During the second half of the 20th century, biologists followed a rigorously reductionist approach as they “took Humpty apart,” dissecting living cells into their molecular components and defining the mechanism of action of each working part. The availability of complete genome sequences from the 1990s onward revealed that this approach had failed to uncover all of the cell’s working parts and that many of the molecular components that had been missed were of completely unknown function. This led to the development of comprehensive analytical techniques that aimed at elucidating the function of each and every gene in a genome. These comprehensive methods were given the suffix -ome (to correspond to genome), thus: transcriptome, proteome, metabolome, interactome, etc. As these techniques expand and start to be successful in defining gene function, it becomes evident that the agenda of 21st century biology will be “to put Humpty back together again” and define how all of these components act and interact to produce a living organism (Noble, 2003). This has meant a rebirth of the concept of “integrative” or “systems” biology and makes it imperative that we integrate the data from the different levels of functional genomic analysis in order to obtain the desired holistic view of the cell or organism.
The foregoing is a very high-minded reason for integrating functional genomic data. However, there is also a more practical (and more urgent) reason that compels us to pursue such a course.
This practical imperative is the need to cross-validate the data obtained using the different high-throughput approaches in order to obtain meaningful and reliable biological information. In the life sciences, as in many fields, increased speed is bought at the cost of reduced accuracy, and many of the large data sets that are available to the biologist carry a heavy burden of false positive results. Integrating two-hybrid data on protein-protein interactions with data on protein complex composition from affinity chromatography plus mass spectrometry, and coexpression data from transcriptome analyses (e.g., see von Mering et al., 2002), allows the production of a list of validated protein interactions in which one can have sufficient confidence to, say, ask a graduate student to spend four years studying them. This greater assurance is, again, bought at a price: as accuracy is increased, so the coverage (in this example, the proportion of all protein-protein interactions analyzed) is decreased. In other words, the old North of England adage that you “don’t get owt for nowt” is not only universally applicable, but also depressingly accurate.
There are two problems with this process of data integration. The first is that it is very tedious to conduct. The relevant data are scattered between various databases, all with different formats and only limited analytical tools, so it takes time and skill to mine the relevant information from each site in order to address a particular integrative query. One solution to this problem is to garner all of the different genetic, genomic, and functional genomic data relevant to a particular species from all of their disparate sites and incorporate them into a single data warehouse that is structured in such a way as to enable complex queries to be made over all of the data sets, thus facilitating their integration (Cornell et al., 2003). Before too long, however, too much data will be appearing far too quickly to make its incorporation into a single data warehouse a practical proposition, and we will have to send the analytical tools to the data, rather than the other way round. “Grid” technologies may facilitate such a process, although bioinformaticians seem rather allergic to these new developments in computer science.
The other (and more serious) problem with data integration is that it is very hard to standardize the experimental process. Therefore, in order to determine whether we are trying to integrate data sets that are truly comparable, we need a lot of information about their provenance. This has led to the requirement for standard descriptions of microarray (MIAME; Brazma et al., 2001) and proteomic (PEDRo; Taylor et al., 2003) experiments.
However, what is most difficult to standardize is the biology itself; thus, at least some of the apparent contradiction between early transcriptomic and proteomic data on yeast is attributable to the fact that the microarray analyses were performed in one laboratory and the proteome studies in another.
In this issue, Hazbun et al. (2003) solve nearly all of the problems of integrating functional genomic data in one go by performing all of the analyses themselves. This group of US West Coast workers chose to study 100 essential genes in Saccharomyces cerevisiae whose functions were still unknown. To uncover these functions, they have performed a number of independent experimental and bioinformatic analyses. They used a combination of TAP tagging and MudPIT mass spectrometry to identify all of the members of any protein complexes to which the products of these essential genes belonged. They followed this up with the prediction of binary protein-protein interactions using a high-throughput yeast two-hybrid approach. For many of the proteins, subcellular localization was determined by fusion to YFP (in at least 60% of the cases, these location assignments agreed with those of the recent comprehensive analysis by Huh et al. [2003]). Each open reading frame was subjected to bioinformatic analysis to predict the structural domains of the encoded proteins, as well as to more classical sequence similarity searches. Finally, the GO (Gene Ontology) Term Finder was used to annotate the ORFs according to their partners of known function (determined by the biochemical and genetic protein interaction screens), their subcellular location (as elucidated by fluorescence microscopy), and their molecular function (as inferred from the structural domain and sequence similarity searches). It was found that, of the 100 essential genes studied, 77 were provided with a functional annotation using at least one GO term, 48 with at least two terms, and 16 with all three.
Once again, “you don’t get owt for nowt,” and it appears that it may be a lot more effort to “put Humpty back together again” than it was to take him apart in the first place. However, the fact that the annotations of 16 of the investigated genes have been confirmed in independent studies by other laboratories demonstrates that this integrative approach will pay dividends and pave the way for a systems approach to biology.
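The trade-off between accuracy and coverage that arises when interaction data sets are integrated can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the protein names, the three toy data sets, and the helper names pair and supported_pairs are hypothetical, and the fraction of reported pairs retained serves only as a rough stand-in for coverage. It is not the procedure used by von Mering et al. (2002) or by Hazbun et al. (2003).

```python
from collections import Counter


def pair(a, b):
    """Represent an interaction as an unordered pair of protein names."""
    return frozenset((a, b))


# Hypothetical toy data: pairs reported by each of three independent methods.
two_hybrid = {pair("A", "B"), pair("A", "C"), pair("B", "D"), pair("E", "F")}
affinity_ms = {pair("A", "B"), pair("B", "D"), pair("C", "D"), pair("G", "H")}
coexpression = {pair("A", "B"), pair("C", "D"), pair("E", "F"), pair("I", "J")}

datasets = [two_hybrid, affinity_ms, coexpression]


def supported_pairs(datasets, min_support):
    """Return the pairs reported by at least `min_support` of the data sets."""
    counts = Counter(p for ds in datasets for p in ds)
    return {p for p, n in counts.items() if n >= min_support}


# Every pair reported by any method: the denominator for the stand-in coverage.
all_pairs = supported_pairs(datasets, 1)

for k in (1, 2, 3):
    kept = supported_pairs(datasets, k)
    # Demanding more independent support shrinks the list of retained pairs.
    print(f"support >= {k}: {len(kept)} pairs, "
          f"fraction retained = {len(kept) / len(all_pairs):.2f}")
```

With these toy sets, requiring agreement between more methods leaves progressively fewer pairs. Whether the surviving pairs really are more reliable cannot be shown by the sketch itself; that would need an independent reference set of known interactions.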
Selected Reading
Brazma, A., Hingamp, P., Quackenbush, J., Sherlock, G., Spellman, P., Stoeckert, C., Aach, J., Ansorge, W., Ball, C.A., Causton, H.C., et al. (2001). Nat. Genet. 29, 365–371.
Cornell, M., Paton, N.W., Hedeler, C., Kirby, P., Delneri, D., Hayes, A., and Oliver, S.G. (2003). Yeast 20, 1291–1306.
Hazbun, T.R., Malström, L., and Anderson, S. (2003). Mol. Cell 12, this issue, 1353–1365.
Huh, W.K., Falvo, J.V., Gerke, L.C., Carroll, A.S., Howson, R.W., Weissman, J.S., and O’Shea, E.K. (2003). Nature 425, 686–691.
Noble, D. (2003). Biochem. Soc. Trans. 31, 156–158.
Taylor, C.F., Paton, N.W., Garwood, K.L., Kirby, P.D., Stead, D.A., Yin, Z., Deutsch, E.W., Selway, L., Walker, J., Riba-Garcia, I., et al. (2003). Nat. Biotechnol. 21, 247–254.
von Mering, C., Krause, R., Snel, B., Cornell, M., Oliver, S.G., Fields, S., and Bork, P. (2002). Nature 417, 399–403.
Stephen Oliver
School of Biological Sciences
University of Manchester
2.205 Stopford Building
Oxford Road
Manchester M13 9PT
United Kingdom
Pinning NF-κB to the Nucleus
The peptidyl-prolyl isomerase Pin1 binds and stabilizes several phosphorylated transcription and mitosis regulators via its WW interacting domain. It is overexpressed in breast tumors and, as Ryo et al. (2003b) discuss in this issue of Molecular Cell, could account for their constitutive nuclear NF-κB expression. The proposed mechanism involves protection of Pin1-bound p65 against SOCS-1-mediated ubiquitination.
NF-κB activation, which has traditionally been linked to inflammation and immunity, has been implicated in cancer. By promoting proliferation and inhibiting apoptosis, NF-κB could tip the balance between proliferation and apoptosis toward malignant behavior in tumor cells (Orlowski and Baldwin, 2002). Numerous reports have documented a significant correlation between NF-κB activation and specific types of cancer; some forms of cancer could even be classified according to their NF-κB status (Davis et al., 2001). It is therefore not surprising that NF-κB inhibition, by way of interfering with its two key signaling events, IKK activation and IκB degradation, is considered a promising new avenue for cancer therapy (Amit and Ben-Neriah, 2003; Orlowski and Baldwin, 2002). Yet, direct proof for the role of NF-κB in oncogenesis in vivo is still lacking, and the mechanism of constitutive NF-κB activation in tumors is mostly elusive.
One of the strongest associations between NF-κB and tumors has been noted in breast cancer, again with no obvious clue to the mechanism involved. In this issue of Molecular Cell, Ryo et al. (2003b) now propose that the peptidyl-prolyl isomerase Pin1 could be the missing link in the NF-κB activation path in breast cancer. These authors show that Pin1 is elevated in a subclass of breast tumors, which are also characterized by constitutive nuclear expression of p65 (RelA). Previous analysis has shown that, compared with other Rel proteins, nuclear accumulation of p65 in breast cancer is a rather infrequent occurrence (Cogswell et al., 2000). Therefore, the Pin1-positive tumors may represent a unique breast cancer subtype. To explore the significance of its expression, Pin1-transfected HeLa and 293 cells were tested for NF-κB activation and found to upregulate their nuclear NF-κB activity, as well as the rate of nuclear translocation of p65. Pin1 overexpression has previously been implicated in the stabilization of other transcription factors such as β-catenin and p53, which was achieved by phosphodependent association of Pin1 with the target proteins (Ryo et al., 2003a). Based on their experience, Ryo et al. examined the association of Pin1 with p65 and observed specific binding of the two proteins, promoted by the phosphorylation of p65 on Thr254. Pin1 binding stabilized wt p65, but not a Thr254 mutant; in Pin1-deficient cell lines, the steady-state levels of p65 dropped significantly, along with reduced NF-κB activation in response to cytokines. Thr254-mutated p65 proved to be extremely unstable in HeLa and 293 cells, and the source