Classifying human cancer by analysis of gene expression

Review

TRENDS in Molecular Medicine

Vol.9 No.1 January 2003

Garret M. Hampton1 and Henry F. Frierson Jr2

1 Genomics Institute of the Novartis Research Foundation, 10675 John Jay Hopkins Drive, San Diego, CA 92121, USA
2 Department of Pathology, University of Virginia, Charlottesville, VA 22908, USA

With the development and application of DNA microarrays, the expression of almost all human genes can now be systematically examined in human malignancies. This can lead to the identification of candidate molecular targets for therapeutic intervention and biomarkers for early detection of these diseases. However, perhaps the most exciting result to come from this research has been the demonstration that patterns of gene expression can distinguish between tumors of different anatomical origin, and define new subgroups of cancer with similar histological appearance, but distinct molecular profiles. Some of these new molecular subclasses of tumor appear to correlate with clinical behavior. If substantiated in larger studies, this might form a basis for stratifying patients so that they receive optimal therapeutic treatment and follow-up.

The development of expressed-sequence-tag databases [1], along with a draft version of the human genome sequence [2,3], has led to the identification of the majority of genes transcribed in human cells. Coupled with technical progress in miniaturization and bioinformatics, these advances mean that we can now survey the expression of tens of thousands of genes simultaneously in tissues and cells in normal, diseased and experimentally manipulated states [4].

Cancer is a genetic disease, propagated by the acquisition of somatic alterations that influence gene expression. Hence, the application of DNA microarrays, in combination with methods that functionally validate genes, such as RNA interference (RNAi) [5], is likely to enable the identification of genetic changes that causally influence tumor growth and behavior. Although the number of changes in gene expression in malignant cells is large, pragmatic reductionist approaches, as well as validation in cell and animal models, have begun to yield valuable clues to the oncogenic changes that contribute to malignancy (for example, see [6]).
These approaches are also starting to identify possible targets for therapeutic intervention [7], and new biomarkers for recognizing the presence of cancer [8]. In addition, recent work has shown that the anatomical origin of primary tumors and their metastases can be predicted using gene expression signatures, and other studies suggest that the clinical behavior of a tumor (whether it will metastasize and, therefore, confer a poor prognosis) might be discernible by analysis of gene expression in the primary tumor. In this review, we discuss these recent advances in cancer biology, following a brief introduction to the ways in which large datasets are currently analyzed.

Corresponding author: Garret M. Hampton ([email protected]).

Computational analysis of microarray data

Although a reasonable first approach to examining the cancer transcriptome is to ask which genes show elevated or decreased expression in cancer compared with corresponding normal tissue, the more clinically relevant questions, such as whether altered levels of gene expression correlate with outcome in cancer patients, require ‘genome-scale’ informatics. Investigators have adopted two general strategies, commonly referred to as ‘unsupervised’ and ‘supervised’, to establish the degree of similarity between the expression levels of genes in multiple samples, as well as to visualize these relationships (see [9,10] for a detailed discussion of these methods, their implementation, and their relative advantages and disadvantages).

Unsupervised protocols typically make no prior assumptions about the samples under investigation, and simply seek to identify relationships between samples based on similarities in gene expression. Several such methods are commonly used and can be broadly described as either agglomerative, in which genes are brought together to form clusters based on similarities in their expression levels, or partitioning, in which genes with similar expression levels are partitioned into relatively homogeneous groups. The most widely used, and most easily visualized, are the agglomerative techniques, such as hierarchical clustering. In these protocols, a measure of distance between expression profiles (e.g. Euclidean distance or a distance based on Pearson correlation) [10] is used to identify and join together genes with similar profiles to form nodes. These are subsequently brought together in an iterative process to form a hierarchy of clusters.
The results of this process are typically displayed as a dendrogram representing the relatedness of the genes expressed within samples, or of the samples themselves. The genes are commonly represented by colors, the intensity of which is proportional to the level of gene expression (Fig. 1a). Samples can also be represented in other ways, such as a three-dimensional image in which relative distance is representative of sample relatedness. Hence, clustering brings together samples or genes with similar molecular profiles, often facilitating the discovery of previously unrecognized relationships between them.
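As a minimal sketch of the agglomerative procedure described above, the following Python code clusters a few toy expression profiles using a distance of one minus the Pearson correlation and average linkage. The gene names, the expression values, the choice of average linkage and the function names are illustrative assumptions, not taken from any of the studies cited here.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / sqrt(var_x * var_y)


def hierarchical_clustering(profiles):
    """Average-linkage agglomerative clustering.

    profiles: dict mapping a label to a list of expression values.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of labels.
    """
    clusters = [(label,) for label in profiles]
    merges = []
    while len(clusters) > 1:
        def linkage(pair):
            # Average distance over all cross-cluster pairs of profiles.
            a, b = pair
            dists = [pearson_distance(profiles[i], profiles[j])
                     for i in a for j in b]
            return sum(dists) / len(dists)

        # Join the two closest clusters into a new node.
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Toy data (hypothetical): four genes measured in five samples.
toy = {
    "gene1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene2": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises with gene1
    "gene3": [5.0, 4.1, 2.9, 2.0, 1.1],  # falls as gene1 rises
    "gene4": [9.8, 8.1, 6.0, 4.2, 1.9],  # falls as gene1 rises
}

for a, b, d in hierarchical_clustering(toy):
    print(a, "+", b, f"distance={d:.3f}")
```

Each tuple in the returned merge history corresponds to one node of a dendrogram. Passing sample profiles instead of gene profiles would cluster the samples in the same way.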

http://tmm.trends.com 1471-4914/02/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S1471-4914(02)00006-0

[Figure 1. Panel (a): unsupervised class discovery (microarray, clustering, heat map, multidimensional scaling). Panel (b): supervised class distinction (class A, class B and unknown class; microarray; ideal profile with correlated and uncorrelated genes; build a ‘classifier’ and test on independent samples; prediction: class A).]

Fig. 1. Unsupervised class discovery and supervised class distinction. (a) In the unsupervised class-discovery approach, RNA from tumor samples is profiled for expressed genes on a microarray (either cDNA- or oligonucleotide-based), and the individual samples are clustered on the basis of similarities in expression. The results of clustering can be visualized in many ways. One method is to use two-dimensional or hierarchical clustering ‘heat maps’, in which the levels of gene expression are represented by color: red signifies high relative expression, green signifies low relative expression. The similarity among tumor samples can be evaluated using a dendrogram (above the heat map), in which the distance between the branches and sub-branches illustrates the relative similarity. The relative similarity among samples can also be represented in three-dimensional space, either by multidimensional scaling, where the expression levels of each gene dictate the position of the samples in space, or via principal-components analysis where, for example, the complexity of gene expression is collapsed into ‘supergenes’. This might lead to the conclusion that there are one or more molecular classes among the tumors analyzed (e.g. class A and class B). (b) In a supervised approach, the groups of tumors and the ideal profile that would distinguish them are pre-defined (e.g. high expression in one group, low expression in another). One or more methods are then used to rank the genes of interest according to the ideal profile (i.e. to identify those genes whose expression is correlated with the class distinction), and a subset of these genes is used to build a classifier to predict the class of a blinded or test sample (e.g. predicted as class A). The processes used to infer the existence of a new classification of tumors in (a) can be verified by the supervised methods in (b). Part (b) is modified from [12].
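The gene-ranking step of the supervised approach in Fig. 1b can be sketched as follows. This is an illustrative example under stated assumptions: the ideal profile is encoded as 1 for class A and 0 for class B, genes are ranked by the absolute Pearson correlation with that profile, and the gene names and expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_genes(expression, labels, positive_class):
    """Rank genes by |correlation| with an idealized class profile.

    expression: dict mapping gene name to per-sample expression values.
    labels: class label of each sample, in the same order.
    The ideal profile is 1 for samples in positive_class and 0 otherwise.
    Returns (gene, correlation) pairs, best discriminators first.
    """
    ideal = [1.0 if lab == positive_class else 0.0 for lab in labels]
    scores = {gene: pearson(values, ideal)
              for gene, values in expression.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical training set: three class A and three class B tumors.
toy_labels = ["A", "A", "A", "B", "B", "B"]
toy_expression = {
    "geneX": [9.0, 8.0, 9.0, 1.0, 2.0, 1.0],  # high in A (correlated)
    "geneY": [1.0, 2.0, 1.0, 8.0, 9.0, 8.0],  # low in A (anti-correlated)
    "geneZ": [5.0, 1.0, 4.0, 5.0, 1.0, 4.0],  # same in both (uncorrelated)
}

for gene, r in rank_genes(toy_expression, toy_labels, "A"):
    print(f"{gene}: r = {r:+.3f}")
```

Genes at the top of the ranking, whether positively or negatively correlated with the ideal profile, are the candidates from which a classifier would be built. The uncorrelated gene falls to the bottom of the list.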

However, the statistical significance of newly discovered associations must be assessed with care because, to date, most expression studies have used relatively few samples compared with the numbers of genes whose expression is being assayed (e.g. 20–30 samples versus 10 000–20 000 genes) and, as a result, there are statistical problems related to the dimensionality of the data. With so many genes under consideration, chance alone is likely to yield some biologically meaningless associations among genes and samples. As a form of control, the same methods can be applied to a comparable, but independent, set of samples. Alternatively, ‘supervised’ approaches can be used to address the significance of the clusters generated by agglomerative or partitioning techniques.

Supervised methods, developed in statistics and machine learning, are designed to classify samples according to key properties, and can be used to test the strength of newly discovered sample relationships. They can also be used de novo to look for statistically significant differences between two or more groups of samples. These methods [10,11] rely on prior assumptions about the samples, such as knowing (and ‘learning’) the grade or stage of a tumor, or the clinical outcome of a patient (Fig. 1b). The process is guided by first specifying the class labels of the samples (i.e. which samples belong to which group). These labels might, for example, reflect groupings that were suggested by an agglomerative clustering experiment. Genes that have substantially different expression levels between defined groups of samples are then identified and used to predict the class of an unknown sample, typically by cross-validation in the original dataset, and subsequently by applying the set of marker genes to a comparable, but independent, set of samples (Fig. 1b). If the resulting classifier (for example, a ‘voting’ scheme in which each marker gene votes for a class) achieves a high predictive accuracy within both the original and test datasets, then the distinction between the groups is strong and, therefore, potentially biologically meaningful. Although genes that partition groups of samples are likely to be found in any dataset (because of high gene–sample dimensionality), cross-validation and independent testing of the partitions are reasonable safeguards against this problem.

Using gene expression to classify cancer

Molecular differences among tumors with similar histology

Some neoplasms with overlapping light-microscopic features have distinct genetic etiologies, different responses to therapeutic intervention, and varying overall clinical outcomes. Hence, accurate diagnosis of a malignancy is crucial for effective treatment and patient follow-up. Examples of such tumor classes include the acute leukemias, non-Hodgkin’s lymphomas, and neoplasms that comprise the class of small round blue cell tumors of childhood (such as neuroblastoma, rhabdomyosarcoma, Ewing’s sarcoma and Wilms’ tumor).

In practice, tumor subtypes like the acute leukemias are usually readily diagnosed by a combination of histopathology, histochemistry, immunophenotyping and cytogenetics, but there are examples of cases that are difficult to classify, and some that show a mixed lineage. Many acute leukemias can be partly distinguished by the presence of specific chromosomal translocations and, hence, it has been assumed that acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL) should be distinguishable by their gene expression profiles. Using a supervised learning approach, Golub and colleagues [12] identified a subset of genes expressed differentially between ALL and AML from among ~6500 genes whose expression was queried in a ‘training’ set of 38 samples.
Using a weighted correlation method, they demonstrated that a subset of as few as 50 marker genes could be used to vote for the likely class of an unknown (‘blinded’) leukemia sample, with an accuracy of >85%. The best discriminator genes included those encoding myeloperoxidase, CD33, CD11c and MB-1, reflecting myeloid or lymphoid differentiation. Using an unsupervised clustering method, called self-organizing maps (SOMs), the authors also demonstrated a separation of ALL samples into two highly predictable groups reflecting either B- or T-cell differentiation. Thus, they were able to recapitulate many of the known features of leukemias solely on the basis of gene expression.

Armstrong et al. subsequently demonstrated that the transcript profiles of mixed-lineage leukemias (MLL), which are characterized by translocations involving the mixed-lineage-leukemia gene ALL1, were distinct from those of both classical ALL and AML [13]. The pattern of gene expression in MLL suggested that the ALL1 protein specifies a gene expression program that leads to maturational arrest at an early lymphoid progenitor stage of development. Interestingly, the gene that most correlated with the MLL subtype was that encoding the receptor tyrosine kinase (RTK) FLT3, for which potent small-molecule inhibitors have been reported [14,15]. If upregulated expression of FLT3 is functionally relevant to MLL growth or differentiation, then therapies targeted at this molecule might help to address the poor prognosis of patients with MLL.

A more recent analysis of gene expression in 360 pediatric leukemias demonstrated the extent to which gene expression profiles can readily distinguish leukemias with different phenotypes (e.g. B-cell-ALL or T-cell-ALL), or with different oncogenic translocation products (e.g. E2A-PBX1, BCR-ABL, TEL-AML1 or ALL1) [16]. Indeed, the application of classification algorithms to these signatures resulted in almost 100% predictive accuracy among leukemias of different molecular subclasses. Importantly, these algorithms were able to predict the presence of alterations in the TEL gene in the absence of a positive reverse-transcriptase (RT)–PCR test for TEL-AML1 rearrangements, suggesting that these new molecular tools might significantly augment current diagnostic methods. Some of the genes that correlate with the differences between leukemias include other RTKs that might be amenable to therapeutic inhibition, such as the MER RTK in ALL with E2A-PBX1 translocations, and ABL, which, following BCR-ABL translocation, can be inhibited by the currently available anticancer drug Gleevec® (imatinib mesylate) (Novartis Pharmaceuticals, East Hanover, NJ, USA) [17].

By contrast with the B-ALLs, only ~30% of T-ALLs harbor translocations (involving the HOX11, TAL1 and LYL1 genes). However, gene expression patterns of T-ALLs with or without translocations cluster together, implying that the same oncogenes physically disrupted by chromosomal translocation in some T-ALLs are also affected in those T-ALLs without translocations [18].
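To make the ‘voting’ idea concrete, the sketch below implements a simple weighted-voting classifier of the kind described above for the ALL versus AML distinction, and evaluates it by leave-one-out cross-validation. It is a simplified illustration rather than the published implementation: the signal-to-noise weight, the toy expression values, the class names and all function names are assumptions introduced here.

```python
from statistics import mean, pstdev


def signal_to_noise(values_a, values_b):
    """(mean_a - mean_b) / (sd_a + sd_b); large magnitude = good marker."""
    spread = pstdev(values_a) + pstdev(values_b)
    return (mean(values_a) - mean(values_b)) / spread if spread else 0.0


def train(samples, labels, n_genes):
    """Select the n_genes with the largest |signal-to-noise| between A and B.

    samples: list of expression vectors, one list of gene values per sample.
    Returns a list of (gene_index, weight, midpoint) marker tuples.
    """
    model = []
    for g in range(len(samples[0])):
        a = [s[g] for s, lab in zip(samples, labels) if lab == "A"]
        b = [s[g] for s, lab in zip(samples, labels) if lab == "B"]
        model.append((g, signal_to_noise(a, b), (mean(a) + mean(b)) / 2))
    model.sort(key=lambda marker: abs(marker[1]), reverse=True)
    return model[:n_genes]


def predict(model, sample):
    """Each marker casts a weighted vote; the sign of the total decides."""
    total = sum(w * (sample[g] - mid) for g, w, mid in model)
    return "A" if total > 0 else "B"


def leave_one_out_accuracy(samples, labels, n_genes):
    """Withhold each sample in turn, retrain on the rest, predict it.

    Gene selection is repeated inside every fold, so the withheld sample
    never influences which marker genes are chosen.
    """
    correct = 0
    for i in range(len(samples)):
        rest = samples[:i] + samples[i + 1:]
        rest_labels = labels[:i] + labels[i + 1:]
        model = train(rest, rest_labels, n_genes)
        correct += predict(model, samples[i]) == labels[i]
    return correct / len(samples)


# Hypothetical data: 6 samples x 4 genes; genes 0 and 1 separate the classes.
toy_samples = [
    [9.0, 1.0, 5.0, 3.0], [8.0, 2.0, 4.0, 6.0], [9.5, 1.5, 6.0, 4.0],  # class A
    [2.0, 8.0, 5.5, 5.0], [1.0, 9.0, 4.5, 3.5], [1.5, 8.5, 5.0, 5.5],  # class B
]
toy_labels = ["A", "A", "A", "B", "B", "B"]

print("leave-one-out accuracy:",
      leave_one_out_accuracy(toy_samples, toy_labels, n_genes=2))
```

Repeating the gene selection within each fold matters when genes greatly outnumber samples: choosing markers on the full dataset and only then cross-validating would let the withheld sample leak into the classifier and inflate the apparent accuracy.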
Based on the gene expression profiles, it appears likely that aberrant expression of these oncogenes results in a block of normal thymocyte differentiation.

Molecular differences among carcinomas of diverse anatomical origin

Investigators have recently begun to create gene expression classifiers capable of distinguishing many of the common adult solid tumors [19–21]. Most malignant neoplasms can be easily distinguished histopathologically, especially with the use of ancillary techniques such as immunohistochemistry. Occasionally, however, cancers are poorly differentiated or undifferentiated and have overlapping light-microscopic features, rendering them indistinguishable. Because the efficacies of current cancer therapies are largely based on the location of the tumor, knowledge of the anatomical site of origin is one of the most important parameters in patient management.

Recently, the technical feasibility of creating a multiclass predictor was examined, by which the anatomical origin of a blinded test set of tumors can be predicted in the context of more than ten tumor classes (Fig. 2). Although computationally difficult, many ‘learning’ methods can be used to achieve the same outcome. However, a specific type of supervised learning algorithm, termed support vector machine learning, was empirically found to be the optimal approach in head-to-head tests [22]. In two recent studies that focused on building large multiclass predictors [20,21], primary tumors were used to ‘train’ the classifier, which was then applied to independent, blinded sets of test tumor samples, resulting in accuracy rates approaching 80–85%. Interestingly, when applied to metastatic lesions, both classifiers also performed well, correctly predicting the primary site of origin of the cancers in nine out of 12 and six out of eight cases, respectively. These data suggest that metastatic lesions probably retain many of the molecular features of the primary tumor, and that the source of cancers of undetermined origin might be predicted by these methods. Analyses of larger sample numbers will be required to rigorously test these observations, and the use of DNA microarrays with larger numbers of genes might improve the success rate of multiclass prediction.

[Figure 2. Panels (a) and (b): heat maps of gene expression across the 101 training tumors, grouped by carcinoma class (Pr, Bl, Br, Co, Ga, Ki, Li, Ov, P, La, Ls); panel (b) shows the 148 ‘classifier’ genes, with KLK6 and WT-1 marked. Panel (c): table of classifier accuracy, reproduced below. Panel (d): WT-1 immunohistochemistry, subpanels (i)–(iv).]

Tumor set       Tumor number    Correct (%)    Misclassified (%)    No call (%)
Training set    101             92             1                    7
Blinded set     75              85             0                    15

Fig. 2. An example of supervised multiclass prediction of human carcinomas. (a) Genes were selected on the basis of high average expression in carcinomas of one anatomical origin versus all other carcinomas, for each of the carcinoma classes: Pr, prostate; Bl, bladder and ureter; Br, breast; Co, colorectum; Ga, gastroesophagus; Ki, kidney; Li, liver; Ov, ovary; P, pancreas; La, lung adenocarcinoma; Ls, squamous carcinoma of the lung. The result was a large group of genes (~1000) from which classifiers could be identified. (b) The selection of classifier genes that distinguish each carcinoma class is based on assessment of the accuracy with which each gene could predict the origin of a tumor of unknown origin (blinded). This process was performed using a supervised classification method, known as ‘support vector machines’ (SVM), in combination with leave-one-out cross-validation. Each of the 101 tumors in the training set was successively withheld, and SVM used to choose classifier genes from the pool preselected in (a); the candidate classifier gene was then assessed for its ability to predict the origin of the withheld sample. Shown are the 148 highest-ranked classifier genes. These predicted, with the greatest accuracy, the origin of the 101 tumors in the training set. For example, the gene encoding kallikrein-6 (KLK6), and the Wilms’-tumor gene (WT-1), were linked with ovarian cancer. (c) The accuracy of the carcinoma classifier was judged by internal cross-validation, resulting in correct predictions for 92% of the tumors, using a conservative threshold that minimizes misclassification. The classifier was also used to predict the anatomical origin of 75 blinded independent tumors, including 12 metastatic lesions; the overall accuracy was 85%, including correct classification of nine out of 12 metastases. (d) Expression of classifier genes was validated by immunohistochemistry. Shown here (at the top) is the expression of WT-1 in the carcinoma set: the only appreciable expression was detected in ovarian carcinomas. A polyclonal antibody was used to stain a tissue microarray containing the cores of 229 carcinoma samples arrayed on a glass slide [panel (i)]. Expression of the WT-1 protein (indicated by dark blue coloration) was high in the normal surface ovarian epithelium (ii), as well as in ovarian carcinomas (iii). Expression in all other carcinomas was negative (iv). In total, WT-1 protein expression correctly predicted ovarian carcinomas in 18 out of 20 tested cases. The magnifications at which these pictures were taken are 3× in panel (i), 200× in panel (ii), 100× in panels (iii) and (iv), and 400× in the insets to panels (iii) and (iv). Adapted, with permission, from [20].

Although we can now begin to use the concept of molecular cancer signatures to create a new classification of malignant neoplasms based exclusively on gene expression patterns, it is also important to focus on the individual genes that are crucial in class prediction. Several genes identified in this way, such as those encoding prostate-specific antigen and carcinoembryonic antigen, have been used as cancer diagnostics for many years and, hence, other molecules discovered by these analyses might be similarly useful. Antibodies against the protein targets of some of the most predictive genes might be used as clinical reagents for augmenting histopathological diagnosis, or as serum markers for early detection. For example, kallikrein-6, a classifier of ovarian carcinomas, is detectable in the serum of patients with ovarian cancer, and is a potential marker of this disease [23].

Discovering molecular subgroups of cancer that correlate with clinical outcome

Cumulative evidence from studies employing unsupervised or supervised learning methods suggests that the underlying mechanisms that dictate the likelihood of metastasis, response to therapy and patient outcome might, in some cases, be embedded in the transcriptional program of the primary tumors. For example, Alizadeh et al. compared gene expression profiles among a series of diffuse large B-cell lymphomas (DLBCL) with those in activated B cells, germinal B cells, T cells, and normal lymph node and tonsil cells [24]. They found a major division within the DLBCL: some tumors had gene expression patterns characteristic of germinal B cells, whereas others had gene expression profiles similar to activated B cells. Patients who had DLBCL with a germinal-B-cell-like expression pattern had a significantly better overall survival rate than those with tumors having an activated-B-cell pattern.

Although subsequent work has confirmed the differences in gene expression between germinal- and activated-B-cell-like DLBCL, the prognostic significance of these B-cell-like profiles is controversial. In an independent study of DLBCL [25], no significant difference in the overall outcome was identified among patients grouped by the expression of germinal- or activated-B-cell signature genes. However, using supervised learning on patient outcome, the authors derived a prognostic gene expression signature, which they verified by cross-validation in their own dataset and in the dataset reported by Alizadeh et al. Most recently, clustering of the germinal- or activated-B-cell signature genes in 240 cases of DLBCL divided the tumors into three distinguishable groups, including the germinal- and activated-B-cell signatures, as well as a group with low expression of both sets of genes [26]. Again, evidence was found for a prognostic distinction between groups, with a markedly better survival rate among patients whose tumor expressed the germinal-B-cell-like signature.
However, the same authors also used a supervised learning approach to derive classifiers of patient outcome that were more robust than those derived by clustering [26]. The difference between these studies probably reflects the intention of the experiments [27]. The use of hierarchical clustering to identify subgroups of cancer relies on equally weighting all of the genes under consideration, seeking underlying distinctions (groups) within the data. The subsequent discovery of a difference in patient outcome that correlates with these patterns of gene expression is fortuitous, because there is no model being built or tested. By contrast, supervised approaches specifically model each gene for its correlation to a clinical parameter, placing disproportionate weight on those genes whose expression best divides the tumors, thereby creating (and ultimately testing) a hypothesis. Not surprisingly, these model-based approaches are likely to be the more reliable.

Classification studies based on patient outcome have also been applied to solid tumors. For example, in a subset of melanomas, the tumor cells express reduced levels of genes encoding proteins that are involved in motility, invasion and vasculogenic mimicry [28]. Although clinical data for these patients were limited, there were significantly fewer deaths among those having tumors with this specific pattern of gene expression. Several laboratories have also reported differences in gene expression among breast tumors, including differences related to known markers, such as estrogen receptor (ER) status and the HER2/neu oncogene, as well as the discovery of a basal-cell phenotype, which is associated with a poor prognosis [29,30].

Recent data suggest that classifiers can be used to predict which breast cancers are likely to metastasize to distant sites [31]. Clustering of ~5000 genes in 98 breast tumors from young, lymph-node-negative patients, some of whom subsequently developed distant metastases, led to a distinction predominantly based on ER status; the frequency of distant metastasis in patients with ER-negative tumors was significantly higher (70%) than in ER-positive patients (36%). Using a supervised classification method, the authors derived a ‘poor prognosis’ expression signature that correctly predicted the outcome for ~80% of the patients in the study, as well as for 17 of 19 other patients. The prognostic expression signature was superior to the classic adverse-prognosis factors (e.g. high grade, large tumor size, angio-invasion and negative ER status), and was capable of predicting, within acceptable error limits, those patients who would most benefit from adjuvant chemotherapy following surgery and those who would require surgical resection alone. Given that the majority of lymph-node-negative breast-cancer patients are treated with adjuvant chemotherapy, these molecular classifiers could significantly reduce the number of patients receiving unnecessary treatment.

Transcript profiles that define prognostic subgroups are also emerging for lung [32,33] and prostate carcinomas [34], and for childhood embryonal cancers of the central nervous system [35], suggesting that clinical behavior might be predictable across a diverse range of human tumor types. Thus, expression profiling studies are beginning to provide rational, molecular explanations for how malignant tumors of a specific grade or stage can have remarkably diverse responses to particular treatments, and different overall clinical outcomes.
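As a rough sanity check on the independent validation reported above for the breast-cancer signature (17 of 19 patients correctly predicted), one can ask how often a classifier with no information would do as well. The calculation below assumes, purely for illustration, that such a classifier is correct with probability 0.5 for each patient; the true chance rate depends on the balance of outcomes in the test set, which is not given here.

```python
from math import comb


def binomial_tail(successes, trials, p_chance):
    """P(X >= successes) for X ~ Binomial(trials, p_chance)."""
    return sum(
        comb(trials, k) * p_chance ** k * (1 - p_chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# 17 of 19 independent patients predicted correctly (figures from the text).
# Assumption: an uninformative classifier is right half the time.
p_value = binomial_tail(17, 19, 0.5)
print(f"accuracy = {17 / 19:.1%}, P(>=17 of 19 by chance) = {p_value:.2e}")
```

Under this assumption the tail probability is 191/524 288, so a result this good is unlikely to arise by chance. With only 19 patients, however, the accuracy estimate itself remains imprecise, which is consistent with the call for larger validation sets.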
Future prospects

The application of DNA microarrays to the study of human malignancy is beginning to yield significant insights into the molecular mechanisms underlying disease etiology. Preliminary studies suggest that this technology might also be used to classify cancers based on their gene expression profile, and stratify patients according to their likely response to therapy and clinical outcome. However, a major limitation of the studies carried out to date is the small number of samples that have been used. Significantly greater numbers of samples are required to rigorously validate tumor class prediction, and to confirm the discovery of new, clinically relevant molecular subclasses of cancer. This problem is clearly illustrated by the differences in the prognostic value of genes identified in DLBCL, as well as the fact that few studies of particular cancers have identified substantially overlapping gene sets that best distinguish subgroups of those cancers. These discrepancies are due, in part, to differences in the platforms used to generate the data (i.e. oligonucleotide or cDNA arrays), as well as the genes included on these arrays and the treatment of data.

The issue of sample number is the subject of several initiatives from the National Cancer Institute that are aimed at generating large datasets for cancer classification. The International Genomics Consortium has also proposed to profile the expression of the majority of human transcripts in a collection of ~10 000 human tumors [36]. The data will need to be derived from tumors that cover the clinical spectrum of cancer stages and grades, and to be accompanied by important clinical end points, such as patient survival. Progress is also being made in comparing data generated from different laboratories using different platforms [37]. Hence, in the short term, it should be possible to consolidate publicly available information, allowing specific hypotheses to be generated in one dataset, and validated in one or more independent sets.

Some of the initial prognostic gene expression studies suggest that tailored treatment might be a real possibility in the future. This new molecular knowledge reinforces the need for new cancer medicines. Although the development of therapies targeted at particular molecules, such as the BCR-ABL inhibitor Gleevec®, is encouraging [17], and the application of microarrays is pinpointing potential new targets, such as FLT3 in MLL [13], the cost of developing these drugs remains high, and the time required to get to market is substantial. In the near future, the joint use of DNA microarrays and new validation tools, such as RNAi [5], is likely to lead to the identification of multiple, genetically validated targets in human cancers, which should help to guide effective target selection for drug discovery efforts. Thus, although these are exciting times for translational cancer biology, the seed is merely sown, and much work remains to be done before the real fruits of these endeavors can be harvested.
Acknowledgements

We are grateful to our colleagues at the University of Virginia and at the Genomics Institute of the Novartis Research Foundation for discussions and suggestions, and to Drs John Hogenesch and Quinn Deveraux for critical evaluation of the manuscript. We apologize in advance to those authors whose work we have not cited in the interest of space and brevity.

References

1 Adams, M.D. et al. (1993) Rapid cDNA sequencing (expressed sequence tags) from a directionally cloned human infant brain cDNA library. Nat. Genet. 4, 373–380
2 Venter, J.C. et al. (2001) The sequence of the human genome. Science 291, 1304–1351
3 Lander, E.S. et al. (2001) Initial sequencing and analysis of the human genome. Nature 409, 860–921
4 Duggan, D.J. et al. (1999) Expression profiling using cDNA microarrays. Nat. Genet. 21, 10–14
5 Hannon, G.J. (2002) RNA interference. Nature 418, 244–251
6 Clark, E.A. et al. (2000) Genomic analysis of metastasis reveals an essential role for RhoC. Nature 406, 532–535
7 Welsh, J.B. et al. (2001) Analysis of gene expression identifies candidate markers and pharmacological targets in prostate cancer. Cancer Res. 61, 5974–5978
8 Kim, J.H. et al. (2002) Osteopontin as a potential diagnostic biomarker for ovarian cancer. J. Am. Med. Assoc. 287, 1671–1679
9 Sherlock, G. (2000) Analysis of large-scale gene expression data. Curr. Opin. Immunol. 12, 201–205
10 Quackenbush, J. (2001) Computational analysis of microarray data. Nat. Rev. Genet. 2, 418–427
11 Vapnik, V. (1998) Statistical Learning Theory, Wiley
12 Golub, T.R. et al. (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286, 531–537
13 Armstrong, S.A. et al. (2002) MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat. Genet. 30, 41–47
14 Kelly, L.M. et al. (2002) CT53518, a novel selective FLT3 antagonist for the treatment of acute myelogenous leukemia (AML). Cancer Cell 1, 421–432
15 Weisberg, E. et al. (2002) Inhibition of mutant FLT3 receptors in leukemia cells by the small molecule tyrosine kinase inhibitor PKC412. Cancer Cell 1, 433–443
16 Yeoh, E.J. et al. (2002) Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell 1, 133–143
17 Capdeville, R. et al. (2002) A rationally developed, targeted anticancer drug. Nat. Rev. Drug Discov. 1, 493–502
18 Ferrando, A.A. et al. (2002) Gene expression signatures define novel oncogenic pathways in T cell acute lymphoblastic leukemia. Cancer Cell 1, 75–87
19 Giordano, T.J. et al. (2001) Organ-specific molecular classification of primary lung, colon, and ovarian adenocarcinomas using gene expression profiles. Am. J. Pathol. 159, 1231–1238
20 Su, A.I. et al. (2001) Molecular classification of human carcinomas by use of gene expression signatures. Cancer Res. 61, 7388–7393
21 Ramaswamy, S. et al. (2001) Multiclass cancer diagnosis using tumor gene expression signatures. Proc. Natl Acad. Sci. U.S.A. 98, 15149–15154
22 Yeang, C.H. et al. (2001) Molecular classification of multiple tumor types. Bioinformatics 17, S316–S322
23 Diamandis, E.P. et al. (2000) Human kallikrein 6 (zyme/protease M/neurosin): a new serum biomarker of ovarian carcinoma. Clin. Biochem. 33, 579–583
24 Alizadeh, A.A. et al. (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403, 503–511
25 Shipp, M.A. et al. (2002) Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat. Med. 8, 68–74
26 Rosenwald, A. et al. (2002) The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N. Engl. J. Med. 346, 1937–1947
27 Veer, L.J. and De Jong, D. (2002) The microarray way to tailored cancer treatment. Nat. Med. 8, 13–14
28 Bittner, M. et al. (2000) Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 406, 536–540
29 Perou, C.M. et al. (2000) Molecular portraits of human breast tumours. Nature 406, 747–752
30 Sorlie, T. et al. (2001) Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc. Natl Acad. Sci. U.S.A. 98, 10869–10874
31 Veer, L.J. et al. (2002) Gene expression profiling predicts clinical outcome of breast cancer. Nature 415, 530–536
32 Bhattacharjee, A. et al. (2001) Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc. Natl Acad. Sci. U.S.A. 98, 13790–13795
33 Garber, M.E. et al. (2001) Diversity of gene expression in adenocarcinoma of the lung. Proc. Natl Acad. Sci. U.S.A. 98, 13784–13789
34 Singh, D. et al. (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1, 203–209
35 Pomeroy, S.L. et al. (2002) Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415, 436–442
36 Knight, J. (2001) Cancer comes under scrutiny in fresh genomics initiative. Nature 410, 855
37 Rhodes, D.R. et al. (2002) Meta-analysis of microarrays: interstudy validation of gene expression profiles reveals pathway dysregulation in prostate cancer. Cancer Res. 62, 4427–4433