Genes with Bimodal Expression Are Robust Diagnostic Targets that Define Distinct Subtypes of Epithelial Ovarian Cancer with Different Overall Survival


The Journal of Molecular Diagnostics, Vol. 14, No. 3, May 2012 Copyright © 2012 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved. DOI: 10.1016/j.jmoldx.2012.01.007


Dawn N. Kernagis, Allison H.S. Hall, and Michael B. Datto

From the Department of Pathology, Duke University Medical Center, Durham, North Carolina

In some cancer types, certain genes behave as molecular switches, with on and off expression states. These genes tend to define tumor subtypes associated with different treatments and different patient survival. We hypothesized that clinically relevant molecular switch genes exist in epithelial ovarian cancer. To test this hypothesis, we applied a bimodal discovery algorithm to a publicly available ovarian cancer expression microarray data set, GSE9891 [285 tumors: 246 malignant serous (MS), 20 endometrioid (EM), and 18 low malignant potential (LMP) ovarian carcinomas]. Genes with robust bimodal expression patterns were identified across all ovarian tumor types and also within selected subtypes: 73 bimodal genes demonstrated differential expression between LMP versus MS and EM; 22 bimodal genes distinguished MS from EM; and 14 genes had significant association with survival among MS tumors. When these genes were combined into a single survival score, the median survival for patients with a favorable versus unfavorable score was 65 versus 29 months (P < 0.0001, hazard ratio = 0.4221). Two independent data sets [high-grade, advanced-stage serous (n = 53) and advanced-stage (n = 119) ovarian tumors] validated the survival score performance. We conclude that genes with bimodal expression patterns not only define clinically relevant molecular subtypes of ovarian carcinoma but also provide ideal targets for translation into the clinical laboratory. (J Mol Diagn 2012, 14:214–222; DOI: 10.1016/j.jmoldx.2012.01.007)

Accepted for publication January 13, 2012. D.N.K. and A.H.S.H. contributed equally to this work.

CME Disclosure: None of the authors disclosed any relevant financial relationships. Supplemental material for this article can be found at http://jmd.amjpathol.org or at doi: 10.1016/j.jmoldx.2012.01.007. Address reprint requests to Michael B. Datto, M.D., Ph.D., Department of Pathology, Duke University Medical Center, Medical Center Box 3712, Durham, NC 27710.

A review of the expression-based targets that are currently used to define clinical decision trees reveals two different classes of genes. The first class is made up of genes with a continuous, or Gaussian, distribution of expression. In breast cancer, MKI67 (encoding antigen KI-67) is a gene in this class that is predictive of survival. Across all breast cancer samples, a small group of tumors has very high expression of KI-67 (poor prognosis) and another small group has very low expression (good prognosis), but most tumors fall somewhere in between.1 The second class is made up of genes with a discontinuous, or bimodal, distribution of expression. In breast cancer, examples include ER (estrogen receptor), PR (progesterone receptor), and ERBB2 [receptor tyrosine-protein kinase erbB-2 (synonym: human epidermal growth factor receptor 2, HER2)]. For these genes, some tumors have high levels of expression and others have little to no expression, with relatively few tumors in between. These genes can be considered molecular switches, with distinct on/off expression states defining clinically unique subtypes of breast cancer that have different overall prognoses and, most importantly, respond to different therapeutic regimens.2,3 In addition to their biological importance, genes with bimodal expression patterns are also excellent targets for clinical testing, given the robust and easily detectable differences between their low and high expression states.

The great majority of genome-wide expression-based discovery work considers neither these expression characteristics of a gene (bimodal versus Gaussian) nor the gene's ultimate utility as a clinical testing target, but rather starts with the clinical annotation of tumor samples. From this annotation (stage, survival, response to therapy), genes are found that correlate with some aspect of tumor behavior. Alternatively, in expression data sets that are large enough, unsupervised clustering approaches are used. These methods usually lead to models that consist of hundreds of genes and can be robustly applied only to other large data sets that are representative of all tumor types. In this work, we took a very different approach. We searched for genes with interesting (bimodal) expression patterns irrespective of clinical annotation and then determined whether these genes have clinical utility.

We applied this approach to epithelial ovarian carcinomas. In the United States, ovarian carcinoma has the highest mortality rate among gynecologic malignancies and is the fifth most common cause of cancer death in women.4 Serous carcinomas are the most common type of epithelial ovarian malignancy, representing 50% of the total. Prognosis for patients with serous ovarian carcinoma is determined primarily by tumor stage,5 and by histological grade and extent of surgical debulking in advanced cases.6 These prognostic methods, however, remain relatively inaccurate.7 The majority of patients present with high-grade, advanced-stage tumors and a correspondingly poor overall prognosis.4 Within this group, however, there is a subset with a durable response to chemotherapy, resulting in better survival.8,9 At present, there are no ancillary tests in general use that are able to distinguish between high-grade, advanced-stage tumors that are likely to be rapidly progressive and those that may be more effectively treated. Although there have been numerous studies of molecular differences among ovarian carcinomas, including genome-wide expression (microarray) studies,9–20 none of these has been translated into clinical practice, and a molecular subtyping platform for ovarian cancer, similar to that used in breast cancer, does not yet exist. Here we report our investigation of the presence and clinical relevance of genes with bimodal patterns of expression, using the largest publicly available ovarian cancer microarray data set.21 We demonstrate the clinical value of these molecular switches by describing their correlation with tumor type and overall patient survival.
Finally, given their robust bimodal nature and applicability across different data sets using different testing platforms, we propose that these genes are strong candidates for use in clinical diagnostic and prognostic testing.

Materials and Methods

Expression Microarray Data Preprocessing

The publicly available ovarian cancer microarray data set GSE9891 (Affymetrix GeneChip Human Genome U133 Plus 2.0 array; n = 285)21 was used for bimodal discovery. For survival score validation, two data sets were used: GSE18520 (laser-capture microdissected epithelial samples from high-grade, advanced-stage serous ovarian tumors; Affymetrix U133 Plus 2.0 array; n = 53)22 and a subset of the data set described by Bild et al9 [advanced-stage ovarian tumors; Affymetrix U133A array (see Supplemental Table S1 at http://jmd.amjpathol.org); n = 119]. Each array data set was independently normalized using the robust multichip average (RMA) implementation in the Expression Console version 1.1 software package from Affymetrix (Santa Clara, CA), and expression values were log2-transformed before further analysis.
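RMA combines background correction, quantile normalization across arrays, and median-polish summarization of probe intensities into probe-set values; the study used the Affymetrix Expression Console implementation. As a hedged illustration of only the quantile-normalization step (not the full RMA pipeline and not the Expression Console code), the Python sketch below forces every array to share the same value distribution. The function name `quantile_normalize` is hypothetical, and breaking ties by original position is a simplifying assumption.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of arrays (each a list of probe values).

    After normalization every array holds the same set of values: the mean,
    across arrays, of the values found at each rank. Ties are broken by
    original position (a simplification; real RMA code may differ).
    """
    n_probes = len(arrays[0])
    if any(len(a) != n_probes for a in arrays):
        raise ValueError("all arrays must have the same number of probes")
    # Sort each array, then average across arrays at every rank.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        # Positions of this array's values in ascending order of value.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


if __name__ == "__main__":
    # Two toy "arrays" with three probes each (hypothetical values).
    print(quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
    # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

Each array keeps its own rank order of probes, but the values themselves become identical across arrays, which removes array-wide intensity differences before any cross-sample comparison.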

Bimodal Index Calculation and Molecular Subtyping

Using the R-based algorithm described by Wang et al,23 a bimodal index (BI) was calculated for each probe set in GSE9891. HLA genes and genes with clear population copy number variation24 were excluded from further analysis. Uncentered correlation, complete linkage hierarchical cluster analysis with no further data adjustment was conducted using Cluster 3.0.25 Results were visualized using Java TreeView version 1.1.5r2.26 For each probe set, cutoff values for low, high, and indeterminate expression levels were calculated from the bimodal analysis output as follows: low cutoff = μ1 + 2σ and high cutoff = μ2 − 2σ, where μ1 and μ2 are the mean values of the low and high modes, respectively, and σ is the standard deviation shared by both modes. Expression values below the low cutoff were classified as low, values above the high cutoff as high, and values in between as indeterminate. If the low and high cutoffs overlapped, the midpoint between μ1 and μ2 was used. Using GraphPad QuickCalcs software (GraphPad Software, La Jolla, CA), Fisher's exact test P values were calculated for each probe set to identify those whose expression state (high versus low) was significantly associated with histological subtype.
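The BI itself was computed with the R code of Wang et al.23 As we understand that method, it fits a two-component normal mixture with a single shared standard deviation σ to each probe set's expression values and reports BI = [π(1 − π)]^1/2 · |μ2 − μ1|/σ, where π is the fraction of samples in the high mode. The Python sketch below is a hedged stand-in, not the authors' code: it uses a plain expectation-maximization (EM) loop, and the median-split initialization, the iteration limits, and the function names are assumptions made for illustration.

```python
import math
import statistics


def _normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_mode_mixture(values, max_iter=500, tol=1e-10):
    """EM fit of a two-component normal mixture with one shared sigma.

    Returns (mu1, mu2, sigma, pi_high): low-mode mean, high-mode mean,
    shared standard deviation, and estimated fraction in the high mode.
    The median-split initialization is an illustrative assumption.
    """
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    mu1 = statistics.fmean(xs[:half])
    mu2 = statistics.fmean(xs[half:])
    sigma = max(statistics.pstdev(xs) / 2.0, 1e-6)
    pi_high = 0.5
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability that each value is in the high mode.
        resp = []
        ll = 0.0
        for x in xs:
            low = (1.0 - pi_high) * _normal_pdf(x, mu1, sigma)
            high = pi_high * _normal_pdf(x, mu2, sigma)
            total = max(low + high, 1e-300)
            resp.append(high / total)
            ll += math.log(total)
        # M-step: update the mixing weight, both means, and the shared variance.
        n_high = sum(resp)
        if n_high <= 0.0 or n_high >= n:
            raise ValueError("degenerate fit: one mode has no samples")
        pi_high = n_high / n
        mu2 = sum(r * x for r, x in zip(resp, xs)) / n_high
        mu1 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / (n - n_high)
        var = sum((1.0 - r) * (x - mu1) ** 2 + r * (x - mu2) ** 2
                  for r, x in zip(resp, xs)) / n
        sigma = max(math.sqrt(var), 1e-6)
        if ll - prev_ll < tol:  # log-likelihood has stopped improving
            break
        prev_ll = ll
    return mu1, mu2, sigma, pi_high


def bimodal_index(values):
    """BI = sqrt(pi * (1 - pi)) * |mu2 - mu1| / sigma."""
    mu1, mu2, sigma, pi_high = fit_two_mode_mixture(values)
    return math.sqrt(pi_high * (1.0 - pi_high)) * abs(mu2 - mu1) / sigma


if __name__ == "__main__":
    # Hypothetical log2 values for one probe set: five "off", five "on".
    values = [2.8, 2.9, 3.0, 3.1, 3.2, 6.8, 6.9, 7.0, 7.1, 7.2]
    print(round(bimodal_index(values), 2))  # 14.14
```

The formula rewards both a large separation between modes (relative to σ) and a balanced split of samples; a gene expressed in only one or two tumors gets a small π(1 − π) term, which is one reason a BI threshold such as 1.9 favors switch-like genes with sizable on and off groups.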

Survival Analysis

Using 242 of the 246 MS samples from GSE9891 (4 samples were excluded because of missing clinical outcome data), individual Kaplan-Meier survival analysis was conducted on each of the top 125 bimodal probe sets (BI > 1.9). The 16 survival-associated probe sets, with high cutoff values above noise (log2 expression > 5.0), were then combined using a simple, equally weighted scoring algorithm to create a sum survival score. Each probe set for each sample was assigned a value of −1 (low expression), 1 (high expression), or 0 (indeterminate expression), and these values were summed to create the sum survival score; the value for the 207802_at probe set was subtracted from, rather than added to, the total because its association with survival is opposite to that of the other 15 probe sets. Cutoff values (low versus high) for the sum survival score were determined from its distribution and chosen to maximize predictive power in this training set (see Supplemental Table S2 at http://jmd.amjpathol.org). Kaplan-Meier analysis was conducted to evaluate the significance of survival differences between low and high sum scores using GraphPad Prism version 4.0 software (GraphPad Software, La Jolla, CA). Survival validation was conducted using two independent, publicly available data sets, GSE18520 (ref. 22) and the data set generated by Bild et al.9 Low, high, and indeterminate expression cutoff values for individual probe sets were applied directly to these validation sets without any further normalization or batch correction.
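The scoring rule above is simple enough to state directly in code. The Python sketch below is a hedged illustration, not the authors' software: the probe-set IDs come from Table 1, but the function names are hypothetical, and skipping probe sets that are absent from a sample's state dictionary is our assumption about how a platform lacking some probe sets (such as U133A, which carries only 9 of the 16) would be handled. The default group cutoffs are the training-set values (low < −8, high > −3); other cutoffs can be passed in.

```python
# Survival-significant probe sets from Table 1 (16 probe sets, 14 genes).
SURVIVAL_PROBE_SETS = [
    "232523_at", "204915_s_at", "229554_at", "206439_at",
    "219937_at", "230865_at", "228780_at", "218468_s_at",
    "229479_at", "218469_at", "209613_s_at", "228598_at",
    "203980_at", "207802_at", "1560698_a_at", "209612_s_at",
]
# 207802_at (CRISP3) has the opposite association with survival,
# so its value is subtracted instead of added.
REVERSED_PROBE_SETS = {"207802_at"}
STATE_VALUES = {"low": -1, "indeterminate": 0, "high": 1}


def sum_survival_score(states):
    """states: dict mapping probe-set ID -> 'low' | 'indeterminate' | 'high'.

    Probe sets missing from `states` are skipped (assumed handling for
    platforms that lack some probe sets).
    """
    score = 0
    for probe in SURVIVAL_PROBE_SETS:
        if probe not in states:
            continue
        value = STATE_VALUES[states[probe]]
        score += -value if probe in REVERSED_PROBE_SETS else value
    return score


def score_group(score, low_cut=-8, high_cut=-3):
    """Low scores are favorable; defaults are the training-set cutoffs."""
    if score < low_cut:
        return "low"
    if score > high_cut:
        return "high"
    return "indeterminate"


if __name__ == "__main__":
    sample = {probe: "low" for probe in SURVIVAL_PROBE_SETS}
    score = sum_survival_score(sample)
    print(score, score_group(score))  # -14 low
```

Because every probe set contributes at most one point, the score is transparent enough to recompute by hand from a list of per-gene on/off calls.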


[Figure 1 image. A: histogram of bimodal index scores across all genes (x axis, Bimodal Index Score; y axis, Frequency). B to D: histograms of expression (x axis, Log(2) Expression; y axis, Frequency) for HLA-DQB1 (BI = 2.35), GSTT1 (BI = 2.73), and NUMA1 (BI = 4.97); panel D marks μ1, μ2, the μ1 + 2σ and μ2 − 2σ cutoffs, and the low, indeterminate, and high expression regions.]

Figure 1. Bimodal gene expression in ovarian tumors (GSE9891). A: Distribution of bimodal index scores across all genes. Genes with a BI of >1.9 were selected for further evaluation. Genes demonstrating a robust bimodal expression distribution among all samples include HLA-DQB1 (B), GSTT1 (C), and NUMA1 (D). Cutoff values distinguishing low, high, and indeterminate gene expression of NUMA1 are shown.

Results

Genes with Robust Bimodal Patterns of Expression Are Present in Ovarian Cancer

Using the bimodality index algorithm described by Wang et al23 and the ovarian cancer data set published by Tothill et al,21 we calculated the BI for the expression of all genes across all 285 samples: 18 low malignant potential (LMP) tumors, 20 endometrioid tumors, 1 malignant adenocarcinoma, and 246 malignant serous (MS) tumors. Many genes with bimodal expression patterns were identified. We chose a BI cutoff of >1.9 to select genes for further evaluation. This cutoff produced a manageable number of genes (Figure 1A), and each gene with BI > 1.9 had a clear, visually recognizable, and robust bimodal distribution among tumor samples when evaluated by a simple histogram, as shown for representative genes in Figure 1. We subsequently excluded HLA genes because their population sequence variation could cause the spurious appearance of a bimodal distribution (Figure 1B). We also excluded genes, such as GSTT1, known to have a bimodal pattern of expression due to population copy number variation (Figure 1C). Ten genes were ultimately excluded, leaving 159 probe sets (see Supplemental Table S3 at http://jmd.amjpathol.org). For subsequent analysis, the low, high, and indeterminate expression cutoff values were determined from the bimodal expression pattern (Figure 1D), as described under Materials and Methods. Results from this analysis clearly demonstrate that bimodally expressed genes are present in epithelial ovarian carcinomas.
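The three-state call for a single probe set follows directly from the cutoff rule in Materials and Methods. The sketch below is a hedged Python illustration with hypothetical function names; it uses the Table 1 values for MEGF10 (232523_at: μ1 = 3.468748, μ2 = 6.847644, σ = 0.684446). Treating a value exactly on a cutoff as indeterminate, and using the midpoint as both cutoffs when they overlap, are our assumptions about the boundary handling.

```python
def expression_cutoffs(mu1, mu2, sigma):
    """Return (low_cut, high_cut) = (mu1 + 2*sigma, mu2 - 2*sigma).

    If the cutoffs overlap (low_cut > high_cut), both are replaced by the
    midpoint of mu1 and mu2 (assumed interpretation of the overlap rule).
    """
    low_cut = mu1 + 2.0 * sigma
    high_cut = mu2 - 2.0 * sigma
    if low_cut > high_cut:
        low_cut = high_cut = (mu1 + mu2) / 2.0
    return low_cut, high_cut


def expression_state(value, low_cut, high_cut):
    """Classify a log2 expression value as low, high, or indeterminate."""
    if value < low_cut:
        return "low"
    if value > high_cut:
        return "high"
    return "indeterminate"


if __name__ == "__main__":
    # MEGF10 (232523_at) parameters from Table 1.
    low_cut, high_cut = expression_cutoffs(3.468748, 6.847644, 0.684446)
    print(round(low_cut, 6), round(high_cut, 6))  # 4.83764 5.478752
    for value in (4.0, 5.0, 6.0):
        print(value, expression_state(value, low_cut, high_cut))
```

With well-separated modes, as for MEGF10, only a narrow band of expression values falls into the indeterminate zone, which is the practical advantage of a bimodal target.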

Bimodal Gene Expression Defines Distinct Ovarian Tumor Histological Subtypes

An unsupervised hierarchical cluster analysis on these 159 probe sets using all 285 samples revealed a distinct clustering of LMP samples apart from MS and endometrioid tumor samples (Figure 2). This suggests that histological subtype is the dominant contributor to variance in the expression of these bimodal genes. The LMP cluster is driven by a distinct set of covariant genes (Figure 2). Specifically, 79 of the 159 probe sets (73 genes) are strongly associated with the LMP subtype (P < 0.05, Fisher's exact test) (see Supplemental Table S4 at http://jmd.amjpathol.org). Next, RMA normalization and bimodal analysis were performed on the 266 endometrioid and MS samples. Of the 135 probe sets with BI > 1.9, 25 probe sets (22 genes) distinguished between endometrioid and MS tumors (P < 0.05, Fisher's exact test) (see Supplemental Table S5 at http://jmd.amjpathol.org). Thus, these data demonstrate that certain bimodal genes are associated with histologically distinct tumor subtypes of ovarian carcinoma.

Figure 2. Hierarchical cluster analysis of all 285 samples (GSE9891) using BI > 1.9 probe sets, after excluding probe sets for HLA genes and genes with large copy number variants. Genes are clustered on the left; arrays are clustered along the top. Sample types are denoted by color below the dendrogram: blue, low malignant potential serous; green, endometrioid; red, malignant serous.
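The subtype associations above were tested with Fisher's exact test (computed in GraphPad QuickCalcs). As a hedged, self-contained Python sketch of the same test, the function below (hypothetical name) builds the 2 × 2 table of expression state (high, low) by subtype (eg, LMP versus other) and sums the hypergeometric probabilities of every table with the same margins that is no more probable than the observed one. SciPy's `scipy.stats.fisher_exact` provides an equivalent two-sided test; the counts in the usage line are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Rows: expression state (high, low); columns: subtype (eg, LMP, other).
    Returns the two-sided P value.
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    # With all margins fixed, a table is determined by its top-left cell k;
    # comb(row1, k) * comb(row2, col1 - k) counts the ways to obtain it, and
    # dividing by comb(n, col1) gives its hypergeometric probability.
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    counts = {k: comb(row1, k) * comb(row2, col1 - k)
              for k in range(k_min, k_max + 1)}
    observed = counts[a]
    # Integer counts make the "no more probable than observed" test exact.
    extreme = sum(w for w in counts.values() if w <= observed)
    return extreme / comb(n, col1)


if __name__ == "__main__":
    # Hypothetical counts: high/low state of one probe set in LMP vs other tumors.
    print(fisher_exact_two_sided([[15, 40], [3, 227]]))
```

An exact test suits this setting because one of the subtypes is small (18 LMP and 20 endometrioid tumors), so some cells of the 2 × 2 table are expected to hold few samples.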

Bimodal Genes Define a Molecular Subtype of Ovarian Carcinoma Associated with Survival

To identify genes associated with survival without the confounding effect of different tumor types, we next renormalized and performed bimodal discovery on the 246 MS samples. In all, 125 probe sets were identified with a BI > 1.9 (see Supplemental Table S6 at http://jmd.amjpathol.org). Unsupervised hierarchical cluster analysis conducted using these probe sets identified distinct clusters of covariant genes and two major groupings of arrays, branches A and B (Figure 3A). When Kaplan-Meier analysis was performed on these two branches, a significant difference in survival was identified (P = 0.003, hazard ratio [HR] = 1.773) (Figure 3B). Thus, these bimodal genes define two distinct molecular types of tumors with different overall survival, and survival is one of the major contributors to variance in their expression.

Figure 3. A: RMA normalization and bimodal analysis were performed on MS tumors (GSE9891), and hierarchical cluster analysis was conducted for the 246 MS samples using BI > 1.9 probe sets, after excluding those associated with HLA genes or population copy number variants. B: Kaplan-Meier survival analysis identified a survival difference between patients in the two major branches of the hierarchical clustering dendrogram.

Cutoff values for low, high, and indeterminate expression levels for these 125 probe sets were determined as described above. Kaplan-Meier analysis was conducted on each probe set individually to determine which had a significant association between low versus high expression and survival. Sixteen probe sets (encoding 14 genes) were identified (Table 1). These probe sets had a high degree of concordance in their high versus indeterminate versus low states (Fleiss' κ = 0.68), and their molecular switch states were combined additively into a single sum survival score for each sample (see also Supplemental Figure S1 at http://jmd.amjpathol.org). Survival score cutoffs for low, high, and indeterminate were visually determined in accordance with the bimodal score distribution across all MS samples (Figure 4A). Kaplan-Meier analysis demonstrated a statistically significant survival difference among patients with low (score < −8), high (score > −3), and indeterminate (−8 ≤ score ≤ −3) survival scores (P = 0.0001) (Figure 4B). The median survival for patients with a favorable score was 65 months, compared with 29 months for patients with unfavorable or indeterminate survival scores (P < 0.0001, HR = 0.4221).

In this data set, the MS samples originated from different primary sites, with the majority being either ovarian primary (n = 200) or peritoneal primary (n = 34). To verify that the survival significance of the genes was not due to primary tumor site or associated with tumor pathology, we repeated the sum score survival analysis on the ovarian primary tumors alone. Our results showed a significant survival difference among low, high, and indeterminate survival score groups when only ovarian primary tumors were considered (P = 0.0001) (see Supplemental Figure S2 at http://jmd.amjpathol.org). Similarly, the survival significance of the sum score held when low-grade and high-grade tumors were analyzed separately (P = 0.0026, grade 1 and 2 tumors; P = 0.0274, grade 3 tumors), as well as when advanced-stage tumors (stage III-IIIc) were analyzed separately (P = 0.0027) (see Supplemental Figure S3 at http://jmd.amjpathol.org).

Table 1. Survival-Significant Genes among Malignant Serous Tumors

Affymetrix Probe ID  Gene       BI    μ1        μ2         σ         P value*
232523_at            MEGF10     2.09  3.468748  6.847644   0.684446  0.0003
204915_s_at          SOX11      1.91  4.804793  8.525937   0.799646  0.0008
229554_at            LUM        1.97  5.188884  8.094111   0.724050  0.0020
206439_at            EPYC       2.27  3.601230  8.473435   1.038651  0.0046
219937_at            TRHDE      2.81  3.097969  6.762696   0.301738  0.0110
230865_at            LIX1       1.94  3.472591  7.023606   0.834944  0.0120
228780_at            POU3F3     2.91  3.075755  8.625999   0.551117  0.0120
218468_s_at          GREM1      2.01  3.640580  7.162920   0.830929  0.0127
229479_at            unknown    2.37  3.828929  7.634728   0.782749  0.0145
218469_at            GREM1      1.94  3.916469  7.587012   0.908092  0.0285
209613_s_at          ADH1B      2.34  3.157311  7.94840    0.926028  0.0313
228598_at            DPP10      2.02  3.139758  6.729975   0.580631  0.0336
203980_at            FABP4      2.58  4.241476  10.02201   1.025201  0.0362
207802_at            CRISP3     2.29  3.230949  7.479999   0.608813  0.0364
1560698_a_at         LOC283392  2.08  3.324755  6.052015   0.293312  0.0420
209612_s_at          ADH1B      2.26  3.960957  8.695147   0.937286  0.0461

*Kaplan-Meier survival analysis (low versus high expression). BI, bimodal index; μ1, mean value of the low mode; μ2, mean value of the high mode; σ, standard deviation.

Figure 4. Survival scores were generated in the training set (GSE9891; n = 242). A: Score cutoffs for low, high, and indeterminate were established. B: Kaplan-Meier analysis was conducted among all groups. Training set-based cutoffs were applied to validate survival significance in the GSE18520 data set (U133 Plus 2.0 data set, epithelial samples from ovarian tumors; n = 53) (C) and in the data set of Bild et al9 (U133A data set, advanced-stage ovarian tumors; n = 119) (D).

Finally, we validated our sum survival score using two independent data sets. GSE18520 (ref. 22) contained 53 laser-capture microdissected samples of high-grade, advanced-stage papillary serous ovarian tumors. Probe set expression cutoff values determined in the training set were applied with no further correction, and the sum score for each sample was determined. Sum survival scores across all validation samples maintained a bimodal distribution (see Supplemental Figure S4 at http://jmd.amjpathol.org). Kaplan-Meier analysis demonstrated significant survival differences among the three survival score groups (low, high, and indeterminate) in the GSE18520 validation set (P = 0.0314) (Figure 4C). When samples with low survival sum scores were compared with samples with high or indeterminate scores, the median survival for patients with a favorable (low) score was 36 months, compared with 17 months for patients with unfavorable (high) or indeterminate survival scores (P = 0.0084, HR = 0.4565). In this data set, in which a good outcome is defined as survival past the median overall survival of 25 months and a bad outcome as death before median survival is reached, the sensitivity of an indeterminate or unfavorable sum score was 57.7% (95% CI = 36.9% to 76.6%) and its specificity was 79.2% (95% CI = 57.8% to 92.9%) (see Supplemental Table S7 at http://jmd.amjpathol.org).

The second validation data set (generated by Bild et al9) contained 135 advanced-stage tumors and used the Affymetrix U133A platform.9 Of these 135 arrays, 119 were used for subsequent analysis (see Supplemental Table S1 at http://jmd.amjpathol.org). Threshold values generated in the training set for individual probe sets were applied. However, only 9 of the 16 probe sets were present on the U133A platform, so sum score thresholds were re-established based on the distribution of the sum score across the samples in this data set and to give population sizes similar to those in the training data set. Sum score cutoffs were established for low (score < −4), high (score > −2), and indeterminate (−4 ≤ score ≤ −2) (see Supplemental Figure S5 at http://jmd.amjpathol.org).9 The survival scores maintained significance when the three groups were compared (P = 0.0082) and when favorable scores were compared with unfavorable or indeterminate scores (P = 0.0018, HR = 0.4849) (Figure 4D). In this data set, in which a good outcome is defined as survival past the median overall survival of 53 months and a bad outcome as death before median survival is reached, the sensitivity of an indeterminate or unfavorable sum score was 54.5% (95% CI = 40.6% to 68.0%) and its specificity was 70.7% (95% CI = 54.5% to 83.9%) (see Supplemental Table S7 at http://jmd.amjpathol.org). Taken together, these data show that certain genes with bimodal expression patterns define a clinically relevant molecular subtype in ovarian carcinoma. The robust clinical predictive power of these genes is demonstrated by their ability to predict patient survival in independent data sets containing advanced-stage tumor tissue as well as epithelial cells captured from high-grade, advanced-stage ovarian tumors.
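The survival comparisons above were run in GraphPad Prism. As a hedged illustration of what such a two-group comparison computes, the Python sketch below implements the Kaplan-Meier product-limit estimator, a median survival read off as the first event time at which the curve falls to 0.5 or below, and a two-group log-rank test (chi-square, 1 df). The hazard ratio is the O/E-ratio approximation (O_a/E_a)/(O_b/E_b); whether Prism 4 used exactly this estimator is an assumption, and the three-group comparisons in the text would need a k-group log-rank test (2 df), which is not shown. All function names and the toy data are hypothetical.

```python
import math


def kaplan_meier(times, events):
    """Product-limit estimate: [(time, survival)] at each event time.

    times: follow-up times (eg, months); events: 1 = death, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group everyone sharing this time (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def median_survival(curve):
    """First event time with S(t) <= 0.5, or None if the median is not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi2, p_value, hazard_ratio_a_vs_b)."""
    subjects = [(t, e, 0) for t, e in zip(times_a, events_a)]
    subjects += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in subjects if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for s in subjects if s[0] >= t)
        n_a = sum(1 for s in subjects if s[0] >= t and s[2] == 0)
        d = sum(1 for s in subjects if s[0] == t and s[1])
        d_a = sum(1 for s in subjects if s[0] == t and s[1] and s[2] == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance of d_a at this time
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail, chi-square 1 df
    total_deaths = sum(events_a) + sum(events_b)
    observed_b = total_deaths - observed_a
    expected_b = total_deaths - expected_a
    hazard_ratio = (observed_a / expected_a) / (observed_b / expected_b)
    return chi2, p_value, hazard_ratio


if __name__ == "__main__":
    curve = kaplan_meier([1, 2, 3, 4, 5], [1, 1, 0, 1, 0])
    print(curve, median_survival(curve))
```

In this framing, a hazard ratio below 1 for the favorable group (as in the reported values of 0.42 to 0.48) means fewer deaths were observed in that group than expected if both groups shared one survival curve.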

Discussion

The present study is the first to investigate the presence and clinical relevance of genes with bimodal patterns of expression in epithelial ovarian carcinoma. We have identified a subset of genes with distinct bimodal expression, based on a large data set of ovarian tumor samples. We also demonstrated that a number of genes with the strongest bimodal expression patterns are significantly associated with tumor type and/or overall patient survival. When combined into a single sum survival score, the top survival-significant genes identify a clinically distinct molecular subtype of MS ovarian carcinoma. The robustness of this approach is evident in the successful application of this survival score to two independent ovarian cancer data sets without any further normalization or batch correction.

A number of recent studies have identified significant genes and multigene expression signatures to distinguish histological subtypes11,13–15 or to stratify advanced-stage serous ovarian carcinoma by survival.9,16,17,21,27–29 Our approach, however, is unique. We started with a focus on translation and ultimate implementation in the clinical molecular laboratory. In this setting, the requirements are that i) the number of genes in a predictor should be small, ii) once genes are identified, the statistical approaches should be simple, and iii) the differences between high and low expression should be large, robust, and easy to distinguish. These simple requirements exclude many of the currently used approaches for array-based predictions. Approaches that involve self-organization of data sets through clustering of the entirety of expressed genes (or a subset of the most variant genes) do not meet all of these requirements. Often, the distilled lists of genes that distinguish subtypes are large. In addition, direct application of clustering-based predictors depends on having similarly large data sets that are representative of all possible phenotypes. Even then, the application of clustering-based algorithms to external data sets can be difficult. This can be seen in the original description of the publicly available GSE9891 data set,21 when the approach was applied to the data set of Bild et al9 used in our validation. It is also difficult to translate these types of approaches to the single-patient sample, and thus our second requirement is not met. This is made evident by the complex controversy around the implementation of cluster-based molecular subtyping of breast cancer.30 The second main approach to array-based discovery is to identify those genes most strongly associated with survival or tumor subtypes from all genes represented on the array. This subset of predictive genes is then combined using a predictive algorithm of some sort (eg, logistic regression or principal component analysis).
Although this approach can, like the clustering approaches, be quite successful, it runs the risk of identifying large sets of genes of which only a small fraction are good testing targets; thus, our third requirement is not met. In the present study, we focused on genes with a strong bimodal expression pattern. We did this for two reasons. First, bimodal analysis identifies genes that are ideal candidates for translation into robust, precise clinical diagnostic and prognostic tests. Second, by analogy to other tumor types (eg, breast adenocarcinoma), bimodal genes tend to define distinct molecular subtypes of cancer with different clinical behavior. From the perspective of clinical testing, genes with a continuous (Gaussian) distribution of expression make problematic testing targets. High and low expression can be difficult to standardize between laboratories or even within a single laboratory. However, the distinction between tumors with on versus off expression for a particular bimodal gene is relatively straightforward. This allows clear-cut decision-making boundaries and the development of precise, reliable testing methods that are less likely to be affected by issues such as methodological variation between laboratories, suboptimal tissue collection or preservation, or other sources of preanalytic variance. Bimodal genes may also be candidates for testing by less quantitative methods, such as immunohistochemistry, as in the case of ER, PR, and ERBB2 (HER2) in breast cancer.

In addition to being good testing targets, several of the bimodal genes that we identified have been previously described as having roles in tumorigenesis. Two of the 14 survival-significant genes described here (SOX11 and POU3F3) encode transcription factors that synergize to regulate transcription by binding to adjacent DNA binding elements during embryonic development of the central nervous system.31 Typically, SOX11 is not expressed in adult tissues. However, aberrant overexpression has been described in mantle cell lymphoma,32 malignant gliomas,33 and ovarian carcinoma.34 In fact, in the context of ovarian carcinoma, Brennan et al34 recently described SOX11 as a prognostic factor on its own. In their study, however, the correlation with survival (high expression, increased survival) was the reverse of what we observed (low expression, increased survival). By contrast, low expression of SOX11 has been associated with improved outcomes in mantle cell lymphoma and meningioma,35,36 in line with our observation that low SOX11 expression is associated with better survival. The reason for such discrepant findings is uncertain. GREM1 has been described as prognostic in renal cell carcinoma.37 Again, however, the association in renal cell carcinoma is the reverse of what we observed in our three ovarian cancer data sets. Given the potential proangiogenic function of the GREM1 protein through its interaction with VEGFR-2,35 it is certainly conceivable that high expression of GREM1 could be associated with poor outcomes and that these associations could be tumor specific.
High CRISP3 expression has been associated with a lower probability of recurrence-free survival in prostate cancer after radical prostatectomy.38 There are conflicting data on this point, however, and other studies have not found this association.39 Finally, internal deletions in DPP10 (exons 4 to 24) have been described in malignant pleural mesothelioma, and expression of either mutated or wild-type DPP10 correlated with improved survival in this tumor type.40 Whether the survival association seen in the present study results from this internal deletion remains an area for future study. All of these examples raise the possibility that we have described molecular switch genes that will be relevant not only in ovarian carcinoma, but also across multiple tumor types. There is overlap between the genes that we have identified and those identified as defining tumor subtypes in the original description of this data set by Tothill et al.21 However, although there is significant covariance in expression among the 14 genes that we identified, these genes did not define a single subtype in the previous work. Among the 14 survival-significant genes identified in the present study, up-regulation of SOX11, DPP10, and POU3F3 was seen in the C5 subclass described by Tothill et al.21 This is a high-grade serous subtype defined by genes associated with mesenchymal development. Increased expression of CRISP3 is associated with the LMP-related C3 subtype, which is consistent with its up-regulation in the good-prognosis group in the present study. The other nine genes overlapping between the present study and that of Tothill et al21 (MEGF10, ADH1B, LUM, FABP4, EPYC, TRHDE, GREM1, LIX1, and the unknown gene targeted by probe set 229479_at) showed increased expression in the C1 subtype, which was described as a stromal-associated subtype.

In summary, we have identified a small set of genes that have the potential to be robust prognostic markers in epithelial ovarian cancer. Initial validation demonstrates that a simple survival score based on these genes holds across varied data sets and even across different array platforms. We have also demonstrated the utility of a novel approach (bimodal gene discovery) in identifying clinically relevant expression targets. Finally, this work raises a number of intriguing cell biological and biochemical questions concerning these genes, from what transcription factors or genetic events drive their bimodal expression pattern to the implications of their expression for basic cell functions related to neoplasia.

References

1. Karn T, Metzler D, Ruckhäberle E, Hanker L, Gätje R, Solbach C, Ahr A, Schmidt M, Holtrich U, Kaufmann M, Rody A: Data-driven derivation of cutoffs from a pool of 3,030 Affymetrix arrays to stratify distinct clinical types of breast cancer. Breast Cancer Res Treat 2010, 120:567–579
2. Viale G, Regan MM, Maiorano E, Mastropasqua MG, Dell’Orto P, Rasmussen BB, Raffoul J, Neven P, Orosz Z, Braye S, Ohlschlegel C, Thürlimann B, Gelber RD, Castiglione-Gertsch M, Price KN, Goldhirsch A, Gusterson BA, Coates AS: Prognostic and predictive value of centrally reviewed expression of estrogen and progesterone receptors in a randomized trial comparing letrozole and tamoxifen adjuvant therapy for postmenopausal early breast cancer: BIG 1–98. J Clin Oncol 2007, 25:3846–3852
3. Abramson V, Arteaga CL: New strategies in HER2-overexpressing breast cancer: many combinations of targeted drugs available. Clin Cancer Res 2011, 17:952–958
4. Jemal A, Siegel R, Xu J, Ward E: Cancer statistics, 2010 [Erratum appeared in CA Cancer J Clin 2011, 61:133–134]. CA Cancer J Clin 2010, 60:277–300
5. Edge SB, Byrd DR, Compton CC, Fritz AG, Greene FL, Trotti A: AJCC Cancer Staging Manual, ed 7. New York, Springer, 2010
6. Clark TG, Stewart ME, Altman DG, Gabra H, Smyth JF: A prognostic model for ovarian cancer. Br J Cancer 2001, 85:944–952
7. Agarwal R, Kaye SB: Prognostic factors in ovarian cancer: how close are we to a complete picture? Ann Oncol 2005, 16:4–6
8. Menczer J, Golan A, Levy T: Platin sensitivity and long-term survival in stage III epithelial ovarian cancer patients. Eur J Gynaecol Oncol 2008, 29:473–475
9. Bild A, Yao G, Chang J, Wang Q, Potti A, Chasse D, Joshi M, Harpole D, Lancaster J, Berchuck A, Olson JA Jr, Marks JR, Dressman HK, West M, Nevins JR: Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature 2006, 439:353–357
10. Welsh JB, Zarrinkar PP, Sapinoso LM, Kern SG, Behling CA, Monk BJ, Lockhart DJ, Burger RA, Hampton GM: Analysis of gene expression profiles in normal and neoplastic ovarian tissue samples identifies candidate molecular markers of epithelial ovarian cancer. Proc Natl Acad Sci USA 2001, 98:1176–1181
11. Schwartz DR, Kardia SLR, Shedden KA, Kuick R, Michailidis G, Taylor JMG, Misek DE, Wu R, Zhai Y, Darrah DM, Reed H, Ellenson LH, Giordano TJ, Fearon ER, Hanash SM, Cho KR: Gene expression in ovarian cancer reflects both morphology and biological behavior, distinguishing clear cell from other poor-prognosis ovarian carcinomas. Cancer Res 2002, 62:4722–4729
12. Le Page C, Puiffe ML, Meunier L, Zietarska M, de Ladurantaye M, Tonin P, Provencher D, Mes-Masson AM: BMP-2 signaling in ovarian cancer and its association with poor prognosis. J Ovarian Res 2009, 2:4
13. Schaner ME, Ross DT, Ciaravino G, Sorlie T, Troyanskaya O, Diehn M, Wang YC, Duran GE, Sikic TL, Caldeira S, Skomedal H, Tu IP, Hernandez-Boussard T, Johnson SW, O’Dwyer PJ, Fero MJ, Kristensen GB, Borresen-Dale AL, Hastie T, Tibshirani R, van de Rijn M, Teng NN, Longacre TA, Botstein D, Brown PO, Sikic BI: Gene expression patterns in ovarian carcinomas. Mol Biol Cell 2003, 14:4376–4386
14. Bonome T, Lee JY, Park DC, Radonovich M, Pise-Masison C, Brady J, Gardner GJ, Hao K, Wong WH, Barrett JC, Lu KH, Sood AK, Gershenson DM, Mok SC, Birrer MJ: Expression profiling of serous low malignant potential, low-grade, and high-grade tumors of the ovary. Cancer Res 2005, 65:10602–10612
15. Gilks CB, Vanderhyden BC, Zhu S, van de Rijn M, Longacre TA: Distinction between serous tumors of low malignant potential and serous carcinomas based on global mRNA expression profiling. Gynecol Oncol 2005, 96:684–694
16. Berchuck A, Iversen ES, Lancaster JM, Pittman J, Luo J, Lee P, Murphy S, Dressman HK, Febbo PG, West M, Nevins JR, Marks JR: Patterns of gene expression that characterize long-term survival in advanced stage serous ovarian cancers. Clin Cancer Res 2005, 11:3686–3696
17. Spentzos D, Levine DA, Ramoni MF, Joseph M, Gu X, Boyd J, Libermann TA, Cannistra SA: Gene expression signature with independent prognostic significance in epithelial ovarian cancer [Erratum appeared in J Clin Oncol 2005, 23:248]. J Clin Oncol 2004, 22:4700–4710
18. Helleman J, Jansen MPHM, Span PN, van Staveren IL, Massuger LFAG, Meijer-van Gelder ME, Sweep FCGJ, Ewing PC, van der Burg MEL, Stoter G, Nooter K, Berns EMJJ: Molecular profiling of platinum resistant ovarian cancer [Erratum appeared in Int J Cancer 2006, 119:726]. Int J Cancer 2006, 118:1963–1971
19. Jazaeri AA, Awtrey CS, Chandramouli GVR, Chuang YE, Khan J, Sotiriou C, Aprelikova O, Yee CJ, Zorn KK, Birrer MJ, Barrett JC, Boyd J: Gene expression profiles associated with response to chemotherapy in epithelial ovarian cancers. Clin Cancer Res 2005, 11:6300–6310
20. Hartmann LC, Lu KH, Linette GP, Cliby WA, Kalli KR, Gershenson D, Bast RC, Stec J, Iartchouk N, Smith DI, Ross JS, Hoersch S, Shridhar V, Lillie J, Kaufmann SH, Clark EA, Damokosh AI: Gene expression profiles predict early relapse in ovarian cancer after platinum-paclitaxel chemotherapy. Clin Cancer Res 2005, 11:2149–2155
21. Tothill RW, Tinker AV, George J, Brown R, Fox SB, Lade S, Johnson DS, Trivett MK, Etemadmoghadam D, Locandro B, Traficante N, Fereday S, Hung JA, Chiew YE, Haviv I; Australian Ovarian Cancer Study Group, Gertig D, DeFazio A, Bowtell DD: Novel molecular subtypes of serous and endometrioid ovarian cancer linked to clinical outcome. Clin Cancer Res 2008, 14:5198–5208
22. Mok SC, Bonome T, Vathipadiekal V, Bell A, Johnson ME, Wong KK, Park DC, Hao K, Yip DK, Donninger H, Ozbun L, Samimi G, Brady J, Randonovich M, Pise-Masison CA, Barrett JC, Wong WH, Welch WR, Berkowitz RS, Birrer MJ: A gene signature predictive for outcome in advanced ovarian cancer identifies a survival factor: microfibril-associated glycoprotein 2. Cancer Cell 2009, 16:521–532
23. Wang J, Wen S, Symmans WF, Pusztai L, Coombes KR: The bimodality index: a criterion for discovering and ranking bimodal signatures from cancer gene expression profiling data. Cancer Inform 2009, 7:199–216
24. Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C: Detection of large-scale variation in the human genome. Nat Genet 2004, 36:949–951
25. Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 1998, 95:14863–14868
26. Saldanha AJ: Java Treeview—extensible visualization of microarray data. Bioinformatics 2004, 20:3246–3248
27. Crijns AP, Fehrmann RS, de Jong S, Gerbens F, Meersma GJ, Klip HG, Hollema H, Hofstra RM, te Meerman GJ, de Vries EG, van der Zee AG: Survival-related profile, pathways, and transcription factors in ovarian cancer. PLoS Med 2009, 6:e24
28. Denkert C, Budczies J, Darb-Esfahani S, Györffy B, Sehouli J, Könsgen D, Zeillinger R, Weichert W, Noske A, Buckendahl AC, Müller BM, Dietel M, Lage H: A prognostic gene expression index in ovarian cancer—validation across different independent data sets. J Pathol 2009, 218:273–280
29. Yoshihara K, Tajima A, Yahata T, Kodama S, Fujiwara H, Suzuki M, Onishi Y, Hatae M, Sueyoshi K, Kudo Y, Kotera K, Masuzaki H, Tashiro H, Katabuchi H, Inoue I, Tanaka K: Gene expression profile for predicting survival in advanced-stage serous ovarian cancer across two independent datasets. PLoS One 2010, 5:e9615
30. Weigelt B, Mackay A, A’Hern R, Natrajan R, Tan DS, Dowsett M, Ashworth A, Reis-Filho JS: Breast cancer molecular profiling with single sample predictors: a retrospective analysis. Lancet Oncol 2010, 11:339–349
31. Kuhlbrodt K, Herbarth B, Sock E, Enderich J, Hermans-Borgmeyer I, Wegner M: Cooperative function of POU proteins and SOX proteins in glial cells. J Biol Chem 1998, 273:16050–16057
32. Wang X, Asplund AC, Porwit A, Flygare J, Smith CI, Christensson B, Sander B: The subcellular Sox11 distribution pattern identifies subsets of mantle cell lymphoma: correlation to overall survival. Br J Haematol 2008, 143:248–252
33. Weigle B, Ebner R, Temme A, Schwind S, Schmitz M, Kiessling A, Rieger MA, Schackert G, Schackert HK, Rieber EP: Highly specific overexpression of the transcription factor SOX11 in human malignant gliomas. Oncol Rep 2005, 13:139–144
34. Brennan DJ, Ek S, Doyle E, Drew T, Foley M, Flannelly G, O’Connor DP, Gallagher WM, Kilpinen S, Kallioniemi OP, Jirstrom K, O’Herlihy C, Borrebaeck CA: The transcription factor Sox11 is a prognostic factor for improved recurrence-free survival in epithelial ovarian cancer. Eur J Cancer 2009, 45:1510–1517
35. Fernàndez V, Salamero O, Espinet B, Solé F, Royo C, Navarro A, Camacho F, Beà S, Hartmann E, Amador V, Hernández L, Agostinelli C, Sargent RL, Rozman M, Aymerich M, Colomer D, Villamor N, Swerdlow SH, Pileri SA, Bosch F, Piris MA, Montserrat E, Ott G, Rosenwald A, López-Guillermo A, Jares P, Serrano S, Campo E: Genomic and gene expression profiling defines indolent forms of mantle cell lymphoma. Cancer Res 2010, 70:1408–1418
36. Stuart JE, Lusis EA, Scheck AC, Coons SW, Lal A, Perry A, Gutmann DH: Identification of gene markers associated with aggressive meningioma by filtering across multiple sets of gene expression arrays. J Neuropathol Exp Neurol 2011, 70:1–12
37. van Vlodrop IJ, Baldewijns MM, Smits KM, Schouten LJ, van Neste L, van Criekinge W, van Poppel H, Lerut E, Schuebel KE, Ahuja N, Herman JG, de Bruine AP, van Engeland M: Prognostic significance of Gremlin1 (GREM1) promoter CpG island hypermethylation in clear cell renal cell carcinoma. Am J Pathol 2010, 176:575–584
38. Bjartell AS, Al-Ahmadie H, Serio AM, Eastham JA, Eggener SE, Fine SW, Udby L, Gerald WL, Vickers AJ, Lilja H, Reuter VE, Scardino PT: Association of cysteine-rich secretory protein 3 and beta-microseminoprotein with outcome after radical prostatectomy. Clin Cancer Res 2007, 13:4130–4138
39. Dahlman A, Rexhepaj E, Brennan DJ, Gallagher WM, Gaber A, Lindgren A, Jirström K, Bjartell A: Evaluation of the prognostic significance of MSMB and CRISP3 in prostate cancer using automated image analysis. Mod Pathol 2011, 24:708–719
40. Bueno R, De Rienzo A, Dong L, Gordon GJ, Hercus CF, Richards WG, Jensen RV, Anwar A, Maulik G, Chirieac LR, Ho KF, Taillon BE, Turcotte CL, Hercus RG, Gullans SR, Sugarbaker DJ: Second generation sequencing of the mesothelioma tumor genome. PLoS One 2010, 5:e10612