Using Microarray Analysis as a Prognostic and Predictive Tool in Oncology: Focus on Breast Cancer and Normal Tissue Toxicity
Dimitry S.A. Nuyten, MD,*,†,‡ and Marc J. van de Vijver, MD, PhD†,‡
Microarray analysis makes it possible to study the expression levels of tens of thousands of genes in a single experiment and is widely available for research purposes. Gene expression profiling is currently being used in many research projects aimed at identifying gene expression signatures in malignant tumors that are associated with prognosis and response to therapy. An important goal of such research is to develop gene expression-based diagnostic tests that can be used to guide therapy in cancer patients. Here we provide examples of studies using microarrays, focusing especially on breast cancer, in a wide range of fields including prediction of prognosis, distant metastasis and local recurrence, response to radiotherapy and chemotherapy, and normal tissue response.
Semin Radiat Oncol 18:105-114 © 2008 Elsevier Inc. All rights reserved.
Since the late 1990s, microarray analysis has been used to study multiple diseases, most prominently cancer. This method of analysis has provided researchers with a powerful tool to look at genome-wide gene expression.1-3 Differences in gene expression in tumors that appear to be similar according to clinical and pathological features can be detected and associated with response or resistance to therapeutic regimens, clinical outcome (eg, survival, metastasis, and local recurrences), and treatment toxicity. The first step in analyzing gene expression profiles is the extraction of messenger RNA (mRNA) from a frozen (tumor) tissue sample. The mRNA (or complementary DNA [cDNA]) is labeled with a fluorescent dye and hybridized to a glass slide that contains a library of complementary strands of DNA in the form of cDNA fragments or oligonucleotides. The array is scanned for the amount of fluorescently labeled RNA bound to each spot on the array (representing part of a gene), and, based on this, an expression value for each gene is calculated.
*Radiation Oncology, The Netherlands Cancer Institute, Amsterdam, The Netherlands. †Diagnostic Oncology, The Netherlands Cancer Institute, Amsterdam, The Netherlands. ‡Experimental Therapy, The Netherlands Cancer Institute, Amsterdam, The Netherlands. Address reprint requests to Dimitry S.A. Nuyten, MD, Divisions of Radiation Oncology, Diagnostic Oncology, and Experimental Therapy, The Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands. E-mail:
[email protected]
1053-4296/08/$-see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.semradonc.2007.10.007
To analyze the expression data obtained in this way, the most commonly used methods are unsupervised and supervised classification.4-8 Unsupervised methods analyze differences in gene expression between samples without upfront knowledge of clinical outcome (Fig 1). The supervised approach uses clinical data to build a predictive model for outcome (eg, metastasis, death, and therapy response) (Fig 2). Supervised approaches can also be used to find gene expression profiles that characterize specific clinical or pathological parameters such as histologic grade and proliferation and prediction of normal tissue toxicity. Although the unsupervised methods are primarily used to unravel biological differences between tumors and are not optimally suited to find classifiers predicting outcome, the supervised approaches are more suited to identify prognostic and predictive gene expression patterns. An additional method for the analysis of gene expression data is to apply gene expression signatures obtained by performing biological experiments on gene expression profiles in tumors. It has been shown that these gene expression profiles can have predictive value in various cancer types. In radiation oncology, microarray analysis can be applied to study general radiation response and radiation sensitivity and resistance both in vitro and in vivo for tumors and normal tissue (toxicity).
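The unsupervised route can be illustrated with a minimal sketch: genes are filtered for noise with a fold-change threshold, and tumors are then grouped by average-linkage hierarchical clustering on the distance 1 minus the Pearson correlation. The tumor names, the log2 ratios, the choice of average linkage, and the rule "at least one tumor above a 2-fold change" are all assumptions made for illustration; published analyses typically rely on dedicated clustering software and far larger gene sets.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def filter_genes(profiles, min_abs_log2=1.0):
    """Indices of genes with |log2 ratio| >= threshold in at least one tumor.

    A threshold of 1.0 on the log2 scale corresponds to a 2-fold change.
    """
    n_genes = len(next(iter(profiles.values())))
    return [g for g in range(n_genes)
            if any(abs(p[g]) >= min_abs_log2 for p in profiles.values())]


def cluster_tumors(profiles, n_groups=2):
    """Average-linkage agglomerative clustering on distance 1 - Pearson r."""
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]
    while len(clusters) > n_groups:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
                d = sum(dist[p] for p in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


# Hypothetical log2 expression ratios for 4 tumors and 4 genes.
tumors = {
    "T1": [1.2, 0.9, -1.1, 0.05],
    "T2": [1.0, 1.1, -0.8, -0.02],
    "T3": [-1.1, -0.7, 1.3, 0.03],
    "T4": [-0.9, -1.2, 1.0, 0.01],
}
kept = filter_genes(tumors)
filtered = {name: [values[g] for g in kept] for name, values in tumors.items()}
groups = cluster_tumors(filtered, n_groups=2)
print(kept, groups)
```

The groups returned by such a procedure carry no outcome information by construction; any association with survival or recurrence has to be tested afterward.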
[Figure 1 labels: microarray data (eg, 25,000 genes; 1 expression value per gene); gene selection using all genes (filtered for noise) or a selected list (hypothesis-driven analysis); correlation between expression values of different genes; unsupervised (2-dimensional) hierarchical clustering; visualization of results (heat map); groups formed on similarities in expression (based on the dendrogram); test for prognostic value or correlation with clinical and pathological features. Further flowchart labels: determining the optimal cutoff (optimizing sensitivity and specificity), need for (cross-)validation; validation on a second (retrospective) cohort; validation on an independent (retrospective) multicenter cohort; prospective (randomized) multicenter validation. Example plot: overall survival v years after surgery for quiescent and activated groups; overall survival at 15 years 77% v 46%, HR 3.6 (95% CI 2.3-5.6), P < 1 × 10^-8.]
Figure 1 Schematic visualization of unsupervised analysis. All genes can be used; commonly a 1.5- or 2-fold up- and downregulation filter is used to filter noise. As a second example, a gene list from a hypothesis-driven gene expression profile (example used, the wound-response signature,20,21 updated clinical data) is used. Similarities in expression values are calculated by using a (Pearson) correlation model and visualized in a heat map. Gene expression values are scaled red (upregulation), black (average expression), gray (missing values), and green (downregulation), visualizing the magnitude of expression compared with the mean expression value per gene (each column represents a patient, and each row represents a gene). The dendrogram on top visualizes the similarities between patients; a shorter distance indicates a higher correlation. Subsequently, patients with a similar pattern of gene expression can be grouped based on clusters from the dendrogram. These groups can be analyzed for correlation to outcome (eg, overall survival).
Radiation Response
Toxicity
It can be hypothesized that gene expression profiles in normal cells such as peripheral blood lymphocytes are associated with the reaction of normal tissues to radiation therapy. By using this approach, Hummerich et al9 studied normal tissue toxicity in a cohort of prostate cancer patients treated with intensity-modulated radiation therapy. They compared irradiated lymphocytes from patients with documented normal toxicity (grade 0-2, n = 40) with those of patients showing enhanced normal tissue radiosensitivity (grade 3-4, n = 18). By using a custom-made cDNA array containing 143 DNA repair or repair-related genes, they identified 19 genes differentially expressed in lymphocytes between the 2 patient cohorts. Results were subsequently validated by reverse-transcription polymerase chain reaction and revealed genes involved in DNA repair pathways (eg, ATM, BRCA1, BRCA2, ERCC1, POLH, and POLK [polymerase eta and kappa]). The study cohort unfortunately was too small to validate these results.
In a similar study, Svensson et al10 investigated a cohort of 21 prostate cancer patients who suffered severe complications (grade 3) and 17 patients with minor or no toxicity (grade 0-2). These patients were treated to an elective pelvic field (40-50 Gy) and received an additional boost to the prostate (70 Gy cumulative dose). Irradiated lymphocytes from these patients were used for microarray analysis on the Affymetrix HG-U133A platform (Affymetrix, Santa Clara, CA). By using a supervised approach with cross-validation on the 38-patient cohort, as described by Michiels et al,8 a cassette of 72 genes was identified. This whole-genome approach, in contrast to the custom-made DNA-repair array approach of Hummerich et al, showed stress signaling, apoptosis, development and protein metabolism, and ubiquitination to be the most important pathways differentially expressed between patients who experienced mild toxicity and patients who experienced excessive toxicity. Interestingly, they also analyzed 4 blood samples (out of the 38 original patients) taken a year after the original sample, and results were fully reproducible. In a small validation set of 12 patients, 8 patients were classified correctly. The authors conclude that an apoptotic response may protect against normal tissue toxicity and that variability in downstream targets in the identified pathways might explain variation in response between patients.
Vozenin-Broton et al11 studied late effects on the small intestine in 6 patients who underwent surgery for radiation enteritis by comparing gene expression in small bowel tissue removed at surgery with gene expression in specimens from 6 patients who underwent resection of a small segment of small bowel for colon carcinoma without prior radiotherapy. Patients in the radiation group typically underwent radiation treatment up to a dose of 45 Gy, with a variable interval of 1 to 16 months (75 months in 1 patient) before surgery. Differentially expressed genes between the 2 groups fell into the following functional groups: fibrosis (matrix metalloproteinase 1, 2, 3, and 14 and tissue inhibitor of metalloproteinases 2), growth factor/cytokine/chemokine related (tumor necrosis factor precursor and insulin-like growth factor binding protein 2), cell adhesion (integrin beta 4 and integrin-associated protein), intracellular signaling (transforming protein rhoB and ephrin type-A receptor 1 precursor [eph]), nuclear signaling, metabolic pathways (dipeptidyl peptidase IV), and stress response. This comprehensive approach leads to a hypothesis of the pathogenic pathway of radiation enteritis: a chronic inflammatory response and a stress response lead to changes in the extracellular matrix components, which in turn lead to vascular changes and fibrosis through contraction and differentiation of smooth muscle cells and fibroblasts.
Tumor Response
Harima et al12 examined “thermoradiotherapy” response (radiotherapy and hyperthermia) in a heterogeneous group of 19 cervical cancer patients (stage IIIA-IVB: 8 responders and 11 nonresponders) using a cDNA microarray platform. A 35-gene classifier was constructed by using a supervised approach and a permutation test as described by Golub et al.13 The classifier was submitted to a leave-one-out cross-validation procedure. Pathways involved in responsive tumors include apoptosis, whereas hypoxia (HIF1 and CA12), tumor cell invasion, and metastasis (cathepsin L and cathepsin B, plasminogen activator, and urokinase) were upregulated in resistant tumors. In a previous study from the same group (Kitahara et al14), 9 sensitive and 10 resistant cervical tumors treated by radiation therapy alone were analyzed, and a 62-gene classifier was identified using a similar approach. Although the 2 gene lists showed little overlap, the same processes are involved in both sensitive and resistant groups (eg, apoptosis and DNA repair). The authors concluded in the second study that this approach allows for better patient selection. It should be noted that these are studies with a small sample size and that proper validation is lacking.
In 19 locally advanced breast cancer patients, Helland et al15 analyzed gene expression profiles in biopsies taken during a course of radiation treatment (after 20 Gy; 10 fractions of 2 Gy), compared these with profiles in baseline biopsies, and validated the results using reverse-transcription polymerase chain reaction. They did not have data on clinical tumor response but focused on changes in gene expression during treatment. The strongest upregulation was seen for DDB2 (DNA repair) and CDKN1A (cell-cycle regulation).
Figure 2 Schematic visualization of supervised analysis. Microarray data and clinical data are used together upfront. A group of samples can be split into a test and a validation set. Correlation between expression values and outcome is calculated. The optimal number of genes (the classifier) to separate groups (eg, yes or no metastasis) is determined, and a cross-validation within the test set is performed. Patients are separated into groups based on the classifier by hierarchical clustering or based on correlation to a centroid, an average expression pattern representing the average of 1 group (eg, no metastasis). The results can be tested for robustness on the validation set. Subsequently, independent validations can be performed on archived material or in a prospective setting.
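A supervised workflow with leave-one-out cross-validation can be sketched as follows. This is a simplified illustration, not the weighted-voting scheme of Golub et al or the procedure of any study cited here: genes are ranked by the absolute Pearson correlation between expression and a binary outcome, a centroid is computed per outcome group, and the left-out sample is assigned to the group whose centroid it correlates with best. Gene selection is repeated inside every fold so that the left-out sample never influences the classifier. All data and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def select_genes(samples, labels, k):
    """Indices of the k genes most strongly correlated with the 0/1 outcome."""
    n_genes = len(samples[0])
    scores = [(abs(pearson([s[g] for s in samples], labels)), g)
              for g in range(n_genes)]
    scores.sort(reverse=True)
    return [g for _, g in scores[:k]]


def centroid(samples, genes):
    """Average expression per selected gene across a group of samples."""
    return [sum(s[g] for s in samples) / len(samples) for g in genes]


def classify(sample, centroids, genes):
    """Assign the label whose centroid correlates best with the sample."""
    values = [sample[g] for g in genes]
    return max(centroids, key=lambda label: pearson(values, centroids[label]))


def leave_one_out_accuracy(samples, labels, k):
    """Fraction of samples classified correctly when each is left out in turn."""
    correct = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        genes = select_genes(train, train_labels, k)  # redone in every fold
        centroids = {
            label: centroid(
                [s for s, l in zip(train, train_labels) if l == label], genes)
            for label in set(train_labels)
        }
        if classify(samples[i], centroids, genes) == labels[i]:
            correct += 1
    return correct / len(samples)


# Hypothetical data: 3 responders (1) and 3 nonresponders (0), 5 genes each.
samples = [
    [2.0, 1.5, -1.8, 0.1, -0.2],
    [1.7, 1.9, -1.5, -0.3, 0.2],
    [2.2, 1.4, -2.0, 0.2, 0.1],
    [-1.9, -1.6, 1.7, 0.0, -0.1],
    [-2.1, -1.3, 1.9, 0.3, 0.2],
    [-1.6, -1.8, 1.5, -0.2, 0.0],
]
labels = [1, 1, 1, 0, 0, 0]
print(leave_one_out_accuracy(samples, labels, k=3))
```

Selecting genes once on the full cohort and cross-validating only the classification step would leak outcome information into every fold and give overly optimistic accuracy, which is one reason small studies without independent validation are hard to interpret.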
Local Recurrence Prediction in Breast Cancer
Postmastectomy
It is still under debate which breast cancer patients should receive postmastectomy radiotherapy. There is evidence for a benefit even for T1 and T2 tumors with limited lymph node involvement, but these results have also been criticized.16,17 Cheng et al18 retrospectively sought a predictive profile for local-regional recurrence in mastectomy patients who did not receive adjuvant radiation treatment. At a minimal follow-up of 3 years, 92 patients (67 free of local-regional recurrence and 27 with a local-regional recurrence) were randomly split into a training and a validation group (2:1 ratio). Local-regional recurrences included chest wall (14), axilla (1), internal mammary chain (1), supraclavicular fossa (6), and a combination of these sites (5). The method applied by Cheng et al is based on the so-called metagene approach (a metagene is a subgroup of genes that are clustered together because of similarity in gene function or because they share the same pathway) and also includes a leave-one-out cross-validation step and multiple optimization steps. The optimal predictive set of genes (n = 34) in the training set was selected based on the (Pearson) correlation coefficient of each gene to local recurrence. In the validation set, the sensitivity of prediction of local-regional recurrence was 67%, whereas the specificity was 83%. The pathways represented in this signature are cell death, cell cycle and proliferation, DNA replication and repair, and immune response. These genes are involved in oncogenic processes (BLM, TCF3, RCHY1, and PTI1), proliferation (TPX2), cell-cycle regulation (CCNB1, GPS2, and FYN), cell-cell interaction (CMAH), cell morphology (CLCA2), and immune response (CCR1).
Breast-Conserving Therapy
Because young age is an independent risk factor for local recurrence after breast-conserving therapy, Kreike et al19 looked at gene expression profiles in patients under 51 years of age who had undergone breast-conserving therapy. In a series of 50 tumors (31 free of local recurrence after at least 11 years of follow-up; 19 true local recurrences, as also determined by loss of heterozygosity), they applied various methods of supervised and unsupervised analysis. One gene list seemed to be differentially expressed between tumors that did give rise to a local recurrence and tumors that did not, but this list was presumably driven by estrogen receptor (ER) status. When looking within the ER-positive subgroup (n = 39) or when analyzing the whole group with a gene list that did not contain the ER-driven genes, no such significant separation could be found.
Nuyten et al20 have used a different approach to look for a gene expression signature that could predict a local recurrence after breast-conserving therapy. When a standard supervised approach failed to predict a local recurrence, they applied a hybrid model. Starting from either a hypothesis-driven gene expression profile (the wound-response signature21 or the hypoxia signature22) or a supervised profile (the 70-gene prognosis profile23), all of which predict distant metastasis and survival in multiple datasets, they optimized the profile toward local recurrence prediction in a test set by incorporating the clinical data (72 tumors without a local recurrence and 9 tumors with a local recurrence). Subsequently, these classifiers were applied to a validation set (72 tumors without a local recurrence and 8 tumors with a local recurrence) to test the robustness of the optimized signatures (Fig 3). Only the supervised/optimized wound signature was found to be a robust predictor of local recurrence after breast-conserving therapy. In a multivariate model including clinical and pathological variables, the wound signature was not only an independent predictor of local recurrence but also the strongest predictor.
[Figure 3 labels: 161 patients treated with breast-conserving therapy (17 with local recurrence [LR], 144 without); training set, 81 patients (72 without LR, 9 with LR); validation set, 80 patients (72 without LR, 8 with LR); input gene list (wound signature, hypoxia signature, or 70 genes); calculate the LR-centroid as the average expression per gene from the gene list across LR patients in the training set (n = 9); calculate the Pearson correlation to the LR-centroid for all patients in the training set (n = 81) and look for the optimal correlation cutoff value to separate LR and no-LR patients; calculate the Pearson correlation to the LR-centroid (calculated in the training set LR patients) for all patients in the validation set (n = 80); apply the optimal correlation cutoff established in the training set to the patients in the validation set.]
Figure 3 Schematic representation of the optimization model used to build a local recurrence predictor based on previously established gene expression profiles.20
[Figure 4 panels: Kaplan-Meier curves with survival (0-1) on the vertical axis, years after surgery (0-20) on the horizontal axis, and numbers of patients at risk below each panel. 4a (left) and 4b (right): metastasis-free and overall survival for all patients (n = 295) stratified by the 70-gene prognosis profile (good v poor). 4c (left) and 4d (right): metastasis-free and overall survival stratified by the wound-response signature (quiescent v activated). 4e (left) and 4f (right): metastasis-free and overall survival stratified by the hypoxia-response signature (non-hypoxic v hypoxic).]
Figure 4 Kaplan-Meier curves for distant metastasis-free and overall survival for (A and B) the 70-gene prognosis profile, (C and D) the wound-response signature, and (E and F) the hypoxia-response profile. All gene expression profiles are analyzed in the NKI-295 dataset30 using updated clinical data (a median follow-up of 12 years for patients alive).
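The centroid-and-cutoff optimization used for local recurrence (LR) prediction can be sketched in a few lines. A centroid is averaged over the LR tumors of the training set, every training tumor is correlated to it, a correlation cutoff is chosen on the training set, and the frozen centroid and cutoff are then applied to validation tumors. The source does not state how the "optimal" cutoff was defined; the sketch assumes the cutoff that maximizes sensitivity + specificity - 1 (Youden's index) over midpoints between observed correlations. All expression values and function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def lr_centroid(lr_profiles):
    """Average expression per gene across training tumors that recurred."""
    n = len(lr_profiles)
    return [sum(p[g] for p in lr_profiles) / n
            for g in range(len(lr_profiles[0]))]


def best_cutoff(correlations, has_lr):
    """Cutoff maximizing sensitivity + specificity - 1 (assumed criterion)."""
    ordered = sorted(set(correlations))
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    n_pos = sum(has_lr)
    n_neg = len(has_lr) - n_pos
    best_j, best_c = -1.0, None
    for c in candidates:
        tp = sum(1 for r, y in zip(correlations, has_lr) if r >= c and y)
        tn = sum(1 for r, y in zip(correlations, has_lr) if r < c and not y)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_j, best_c = j, c
    return best_c


def predict_lr(profile, centroid, cutoff):
    """True when the tumor correlates with the LR-centroid above the cutoff."""
    return pearson(profile, centroid) >= cutoff


# Hypothetical training set: 2 tumors with LR, 3 without (4 genes each).
train_lr = [[1.0, 0.8, -0.9, 0.5], [0.9, 1.1, -1.0, 0.4]]
train_no_lr = [[-0.8, -1.0, 0.9, -0.3], [-1.0, -0.7, 1.1, -0.6],
               [0.1, -0.9, 0.8, 0.2]]

centroid = lr_centroid(train_lr)
train = train_lr + train_no_lr
train_labels = [True] * len(train_lr) + [False] * len(train_no_lr)
cutoff = best_cutoff([pearson(p, centroid) for p in train], train_labels)

# Validation tumors are scored with the frozen centroid and cutoff.
print(predict_lr([1.1, 0.7, -0.8, 0.6], centroid, cutoff))
print(predict_lr([-0.9, -0.8, 1.0, -0.2], centroid, cutoff))
```

Because the centroid and cutoff are both tuned on the training set, performance there is optimistic by construction; only the untouched validation set indicates whether the optimized signature is robust.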
Tumor Response to Chemotherapy
Neoadjuvant Treatment of Breast Cancer
Neoadjuvant treatment of breast cancer has been an attractive setting to study drug sensitivity in patients. The relatively large number of patients, the ability to biopsy at multiple time points, the availability of response evaluation by magnetic resonance imaging during treatment, and pathology evaluation of response at surgery after completing chemotherapy all favor this study design. Furthermore, a pathological complete response is considered to be a valid endpoint for response in view of its association with improved survival.24 Sotiriou et al25 were the first to report on a microarray profile for chemotherapy response (doxorubicin/cyclophosphamide) in a small series of patients (n = 10). By using a leave-one-out cross-validation approach, a 37-gene list was identified that separated patients with a response (complete clinical response or minimal residual disease) from those who did not respond (stable disease or progressive disease) to the chemotherapy regimen. Among the identified genes, DNA repair and cell death regulators (HMG1, COX17, PAPPA, BCL2-like2, GADD34, RPL27, and CD44) played an important role, as did the oncogene DLC2, the growth factor receptor KIT, and UBPH, which is part of the ubiquitin-proteasome pathway.
A somewhat larger cohort was analyzed by using microarrays of core biopsies by Chang et al.26 Patients with either a large primary tumor (>4 cm) or clinically positive lymph nodes were treated in a phase II study with neoadjuvant docetaxel. Eleven sensitive tumors and 13 resistant tumors were compared, and a 92-gene list was constructed using a leave-one-out cross-validation approach, showing a positive and negative predictive value of 92% and 83%, respectively. Biological pathways represented in this classifier include apoptosis, cell adhesion or cytoskeleton, protein transport, signal transduction, and cell cycle. A small set of 6 sensitive tumors was used as a validation set, and all of these tumors were classified correctly.
Hess et al27 studied the response to neoadjuvant paclitaxel, fluorouracil, doxorubicin, and cyclophosphamide in 133 stage I to III breast cancer patients. This study used a cross-validation approach, separated the 133 patients into a training (n = 82) and a validation (n = 51) set, and applied a variety of signature discovery tests. An optimal set of 30 genes was selected and applied to the validation set. An overall sensitivity of 93% (pathological complete response) was seen, outperforming a predictor based on clinical variables. The negative predictive value and area under the curve also favored the genetic classifier, although the difference did not reach statistical significance.
Hannemann et al28 analyzed 48 patients who were part of a randomized phase II trial in which 2 anthracycline-containing regimens (cyclophosphamide and doxorubicin-docetaxel) were compared in the neoadjuvant setting. Microarray analysis was performed by using an 18,000-element cDNA array, and pretreatment biopsies were compared with posttreatment (surgical) specimens. For 15 patients, both pre- and posttreatment samples were available, making it possible to look at differences in gene expression before and after treatment.
No clear response profile was discovered for either one of the treatment arms or for the combined group of patients who achieved a (near) pathological complete remission (10 patients, 20%). Neither unsupervised nor supervised methods showed a separation between responders and nonresponders (analysis based on pretreatment samples). When pre- and posttreatment samples were compared, remarkably few differences in overall gene expression were seen in patients showing stable disease. In unsupervised hierarchical cluster analysis, interestingly, samples from patients showing stable disease mostly clustered together. A similar observation has been made in other studies. In the Kreike study, local recurrences tended to cluster together with their primary tumors, showing a more similar pattern of gene expression between the recurrence and primary tumor from 1 patient than between tumors from different patients. Hannemann et al found remarkable differences in gene expression between pre- and posttreatment samples in responders. The genes identified were mostly involved in cell metabolism, whereas a few genes involved in proliferation and apoptosis were also present. A few other groups have described response profiles in the setting of neoadjuvant chemotherapy treatment of breast cancer; none of these profiles was validated independently, nor did any show the sensitivity and specificity that would be required for clinical decision making. Larger studies and independent validation series are needed before such profiles will be applicable in clinical decision making. Furthermore, profiles for specific drugs, in contrast to combination regimens of 2 or 3 drugs, are needed. Another factor not taken into account in these studies is the variability in drug metabolism between patients.
Drug Screens
A broader signature-based drug-screen approach was applied by Potti and coworkers,29 who used tumor cell-line chemotherapy response data (National Cancer Institute 60 cell line panel) to derive signatures for a specific drug that could subsequently be applied to human cancer datasets measuring chemotherapy response. Cell lines that were docetaxel sensitive or resistant were selected, and their gene expression profiles were analyzed on Affymetrix microarrays. Predictive gene expression profiles were validated on an independent set of 29 lung cancer cell lines for which docetaxel responsiveness data and gene expression profiles were available. The last step in the validation process was predicting response in the tumors described by Chang et al26 (24 breast cancer samples from patients treated with neoadjuvant docetaxel). The response for 22 of 24 samples in this dataset was predicted correctly by the cell line-derived gene list.
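The performance figures quoted for these response predictors (sensitivity, specificity, positive and negative predictive value, fraction classified correctly) all derive from the same 2 × 2 table of predicted versus observed response. The sketch below computes them from paired calls; the patient-level calls are hypothetical, and only the final line uses a figure from the text (22 of 24 samples predicted correctly).

```python
def diagnostic_metrics(predicted, observed):
    """Summary metrics for paired boolean calls (True = responder)."""
    tp = sum(1 for p, o in zip(predicted, observed) if p and o)
    fp = sum(1 for p, o in zip(predicted, observed) if p and not o)
    tn = sum(1 for p, o in zip(predicted, observed) if not p and not o)
    fn = sum(1 for p, o in zip(predicted, observed) if not p and o)
    return {
        "sensitivity": tp / (tp + fn),  # responders correctly identified
        "specificity": tn / (tn + fp),  # nonresponders correctly identified
        "ppv": tp / (tp + fp),          # predicted responders who responded
        "npv": tn / (tn + fn),          # predicted nonresponders who did not
        "accuracy": (tp + tn) / len(observed),
    }


# Hypothetical calls for 8 patients (not taken from any cited study).
predicted = [True, True, True, False, False, False, False, True]
observed = [True, True, False, False, False, False, True, True]
print(diagnostic_metrics(predicted, observed))

# Fraction correct for 22 of 24 samples, as reported for the cell line-derived list.
print(round(22 / 24, 3))
```

Predictive values depend on the proportion of responders in the cohort, so a positive predictive value reported in a small, enriched series does not transfer directly to an unselected population.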
Prognostic Gene Expression Signatures in Breast Cancer
Supervised Approaches
Based on the hypothesis that a tumor would have an intrinsic capacity to metastasize, regardless of size, and that this feature could be captured by gene expression profiling, researchers at the Netherlands Cancer Institute (NKI) performed a study that resulted in the identification of a 70-gene prognosis profile.23 An important goal of this study was to develop a gene expression profile that could identify, among breast cancer patients without lymph node metastases, those who are very unlikely to develop distant metastases (good prognosis group). The majority of these patients are cured by local therapy alone (surgery and radiation) and would not require adjuvant systemic therapy. A group of tumors from patients who were free of any recurrence of their breast cancer for at least 5 years was selected, together with a group of tumors from patients who developed distant metastases within 5 years. All patients were below the age of 52 and lymph node negative, and most did not undergo adjuvant hormonal treatment or chemotherapy. The activity of 25,000 genes was measured, and, by using a stepwise process of training and cross-validation, an optimal prognostic set of 70 genes was selected. For a new tumor sample, the correlation of these 70 genes to the average gene expression (“centroid”) of the original good-prognosis patients is calculated, and, based on this correlation, a tumor is classified as having either a good or a poor prognosis (eg, a poor prognosis profile corresponds to a 50% risk of developing distant metastases, and a good prognosis profile indicates a smaller than 10% risk of metastasis). The genes in this signature are involved in cell cycle, invasion, metastasis, angiogenesis, and signal transduction (eg, cyclin E2, MCM6, metalloproteinases MMP9 and MP1, RAB6B, PK428, ESM1, and the vascular endothelial growth factor receptor FLT1).
After the initial series of 78 patients, subsequent studies were performed to validate the prognostic value of this signature. In a study of 295 lymph node-negative and -positive breast cancer patients (including 61 patients from the original series), there was a large difference in survival at 10 years (95% for the good-prognosis group v 55% for the poor-prognosis group [updated results shown in Fig 4a and 4b; Nuyten unpublished data]).30 In a second validation series (by the Translating Molecular Knowledge into Early Breast Cancer Management: Building on the Breast International Group [TRANSBIG] consortium) of 307 early-stage lymph node-negative breast cancer patients from 5 different European centers who did not receive adjuvant systemic treatment, the 70-gene signature held up as an independent predictor of outcome.31
The clinical merit of this new diagnostic test, called MammaPrint (Agendia, Amsterdam, The Netherlands), is the main research question of a large phase III randomized trial called MINDACT.32 Risk assessment for lymph node-negative breast cancer patients will be determined by both the 70-gene profile and clinical criteria (assessed by using Adjuvant! Online33 [Adjuvant! Inc, San Antonio, TX]). For patients who are designated as having a good prognosis by both Adjuvant! Online and the genomic test, only adjuvant hormonal treatment is advised, and only for patients with hormone receptor-positive breast cancer. In the case of a concordant high-risk assessment, patients will be advised to undergo chemotherapy, plus hormonal therapy for endocrine-responsive disease. The merit of the genetic predictive assay is tested in the third group, in which the clinical and genomic criteria are discordant. These patients will be randomized to have the decision on adjuvant chemotherapy based on either the genomic or the clinical criteria.
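The allocation logic of this trial design can be written down as a small decision function. This is a simplified sketch of the three groups described in the text, not the trial protocol: risk is reduced to "low" or "high", the randomization is a plain coin flip, and hormonal therapy is assumed to follow hormone receptor status in every group, which the text states only for the concordant groups. The function and argument names are hypothetical.

```python
import random


def mindact_advice(genomic_risk, clinical_risk, hormone_receptor_positive,
                   rng=random):
    """Advised adjuvant treatment for one lymph node-negative patient.

    genomic_risk and clinical_risk are "low" or "high". rng is any object
    with a choice() method and is used only for discordant cases.
    """
    if genomic_risk == clinical_risk:
        basis = "concordant"
        decisive_risk = genomic_risk
    else:
        # Discordant group: randomize which criterion drives the decision.
        basis = rng.choice(["genomic", "clinical"])
        decisive_risk = genomic_risk if basis == "genomic" else clinical_risk
    return {
        "basis": basis,
        "chemotherapy": decisive_risk == "high",
        # Assumption: hormonal therapy follows receptor status in every group.
        "hormonal_therapy": hormone_receptor_positive,
    }


print(mindact_advice("low", "low", hormone_receptor_positive=True))
print(mindact_advice("high", "high", hormone_receptor_positive=False))
print(mindact_advice("low", "high", hormone_receptor_positive=True))
```

Only the discordant group is informative about the added value of the genomic test, because in the concordant groups both criteria lead to the same advice.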
The trial aims to prove equivalent outcome (5-year disease-free survival) for low genomic risk/high clinical risk patients for whom chemotherapy has been forgone and thereby ultimately to lower the number of patients who undergo the toxic treatment unnecessarily. Unfortunately, the majority of hormone receptor-positive breast cancer patients in the MINDACT trial will undergo adjuvant hormonal therapy, making it impossible to validate the prognostic value of the 70-gene profile in patients who are treated without any adjuvant systemic treatment.
Two key biological processes captured by the 70-gene signature, proliferation and cell cycle, are also the main biological determinants of the so-called Genomic Grade Index. Sotiriou et al34 used microarray analysis as a tool to further subclassify the intermediate-risk group of histologic grade II tumors. In a supervised analysis they captured part of the biology that explains the heterogeneity in outcome in these tumors. A 97-gene signature was found to classify histologic grade II tumors into high and low genomic grade. When applied to the full spectrum of breast cancer, including histologic grade I and grade III tumors, this signature added prognostic value to clinical variables.
Gene expression in breast cancers is largely driven by ER status. In a study of 115 lymph node-negative breast carcinomas, Wang et al35 used this difference to derive a signature consisting of 60 genes for the ER-positive subgroup and 16 genes for the ER-negative subgroup. Their results were validated in 171 patients, showing a 27% absolute difference in overall survival (95% v 63% at 80 months),36 and in the TRANSBIG validation series that was first assembled for the validation of the 70-gene prognosis profile.37 Biological processes underlying this 76-gene signature are cell cycle, proliferation, cell death, DNA replication, and repair. This signature was also tested in an independent validation series. In this series of 180 lymph node-negative tumors, the 76-gene signature showed a difference in outcome and a hazard ratio for metastasis and death comparable to those in the original study. Unfortunately, the ER-negative subgroup was too small for a robust validation.
It is of note that the gene expression signatures derived by the supervised approaches described previously share few genes. Furthermore, analyzing tens of thousands of genes does not allow for testing every single combination of genes because of exhaustion of (cross-)validation options and issues of data overfitting. Looking at pathways or metagenes underlying the different prognostic signatures reveals similarities in underlying mechanisms, in which ER-related genes, proliferation, and cell-cycle genes are key pathways involved in metastatic potential.38 This is further supported by a study performed by Fan et al,39 in which multiple previously established classifiers, both supervised and hypothesis driven, were all tested on 1 dataset. The 70-gene prognosis profile,23 a 21-gene recurrence score (RS),40 the intrinsic gene list,41-44 the wound-response signature,21,45 and a 2-gene “tamoxifen responsiveness” ratio46 were all analyzed on the NKI data. All classifiers except for the 21-gene RS are microarray based. The RS is a polymerase chain reaction-based prognostic test for early-stage, lymph node-negative, tamoxifen-treated breast cancer patients.
For this study, the RS was transformed to the oligonucleotide microarray platform. The different gene expression signatures not only show a strong predictive value for metastasis-free and overall survival, but, more importantly, they show a high concordance in classifying the same patients as having either a poor or a good prognosis.
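Concordance between two classifiers of this kind can be quantified in several ways. The following is a minimal sketch, not the method used by Fan et al, that assumes each signature yields a binary "poor"/"good" call per patient and reports the observed agreement together with Cohen's kappa, which corrects for agreement expected by chance. The function name and the example calls are hypothetical and serve only as illustration.

```python
from collections import Counter


def agreement_and_kappa(calls_a, calls_b):
    """Return (observed agreement, Cohen's kappa) for two paired label lists."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(calls_a)
    # Fraction of patients that both signatures classify identically.
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Agreement expected by chance, from each signature's label frequencies.
    freq_a, freq_b = Counter(calls_a), Counter(calls_b)
    labels = set(freq_a) | set(freq_b)
    expected = sum((freq_a[lab] / n) * (freq_b[lab] / n) for lab in labels)
    if expected == 1.0:
        # Both signatures assigned the same single label to every patient.
        return observed, 1.0
    return observed, (observed - expected) / (1.0 - expected)


# Made-up calls for 8 patients from two hypothetical signatures.
sig_a = ["poor", "poor", "good", "good", "poor", "good", "poor", "good"]
sig_b = ["poor", "poor", "good", "poor", "poor", "good", "poor", "good"]

observed, kappa = agreement_and_kappa(sig_a, sig_b)
print(observed, kappa)  # 0.875 0.75
```

In this toy example the two signatures disagree on 1 of 8 patients; a kappa well above zero indicates that the agreement is larger than the label frequencies alone would produce.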
Biological Gene Expression Profiling
Perou et al42,47 were the first to use microarrays to investigate gene expression patterns in breast cancer by unsupervised hierarchical cluster analysis. They hypothesized that differences in gene expression are responsible for various phenotypes in breast cancer patients. A subset of genes was selected based on large variation in expression between patients and small difference in expression between 2 biopsies from 1 patient (pre- and post-neoadjuvant chemotherapy treatment). By clustering tumors using this “intrinsic gene list,” 4 groups were identified: the basal type, the luminal/epithelial ER+, the normal like, and the Erb-B2 group. In a subsequent study by Sorlie et al,43,44 the luminal group was subdivided into the luminal A and luminal B subgroups,
and it was shown that these subgroups had a different prognosis. Basal-type tumors were shown to have a poor outcome, whereas the luminal A tumors have a relatively good prognosis. Clinically, these different types can be roughly translated into ER positive (luminal A and B); ER, PR, and HER2 negative (“triple negative,” basal subtype); and HER2 positive (Erb-B2 subtype).
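The two steps described above, selecting "intrinsic" genes and clustering tumors on them, can be sketched as follows. This is an illustrative sketch only: the scoring ratio (between-tumor variance over within-pair variance), the distance (1 minus the Pearson correlation), the average-linkage rule, and all names and expression values are assumptions made for the example and are not taken from the published analyses.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def intrinsic_genes(before, after, top_n):
    """Rank genes by between-tumor variance over within-pair variance.

    before[g][t] and after[g][t] hold the expression of gene g in the two
    biopsies of tumor t. Returns the indices of the top_n genes.
    """
    scores = []
    for g, (b, a) in enumerate(zip(before, after)):
        pair_means = [(x + y) / 2 for x, y in zip(b, a)]
        between = pvariance(pair_means)                      # across tumors
        within = mean((x - y) ** 2 / 2 for x, y in zip(b, a))  # inside pairs
        scores.append((between / (within + 1e-9), g))
    return [g for _, g in sorted(scores, reverse=True)[:top_n]]


def average_linkage(profiles, k):
    """Agglomerative clustering of tumor profiles into k clusters."""
    def dist(i, j):
        return 1 - pearson(profiles[i], profiles[j])

    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > k:
        best = None
        # Find the pair of clusters with the smallest mean pairwise distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = mean(dist(i, j) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Made-up log ratios: 5 genes (rows) x 4 tumors (columns), two biopsies each.
before = [[2.0, 1.8, -2.0, -1.9], [-1.5, -1.2, 1.6, 1.4],
          [-0.5, -0.8, 1.0, 0.7], [1.0, -1.0, 1.0, -1.0],
          [0.1, 0.0, 0.1, 0.0]]
after = [[2.1, 1.7, -1.9, -2.0], [-1.4, -1.3, 1.5, 1.5],
         [-0.6, -0.7, 0.9, 0.8], [-1.0, 1.0, -1.0, 1.0],
         [0.0, 0.1, 0.0, 0.1]]

selected = intrinsic_genes(before, after, top_n=3)
tumor_profiles = [[before[g][t] for g in selected] for t in range(4)]
groups = average_linkage(tumor_profiles, k=2)
print(selected, groups)
```

In the toy data, genes 3 and 4 are discarded because they change between paired biopsies or do not vary between tumors, and the remaining genes separate the 4 tumors into 2 groups. Real analyses work with thousands of genes and dedicated clustering software.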
Hypothesis-Driven Gene Expression Profiling
Hypothesis-driven gene expression signatures can be defined based on the outcome of in vitro or in vivo experiments using either cancer cell lines/tissues or benign cell lines/tissues. Based on a specific biological process that plays a role in cancer, gene signatures can be defined in such a model. These “functional” gene expression profiles can be applied to human cancer datasets. Tumors are classified or subcategorized based on similarities or differences in gene expression compared with the model. This model a priori represents a biological process in cancer and, therefore, could be used to unravel genotypic features of good and poor prognosis tumors. One example of this hypothesis-driven approach is the wound-response signature first described by Chang et al.21 “Tumors are wounds that do not heal” was the basis for the biological link between a wound-healing process and cancer.48 The in vitro model used for the wound-healing model is based on serum-activated fibroblasts. It was first shown that gene expression in a serum-activated fibroblast model exhibited similarities with processes involved in wound healing. Because proliferation is a general response of fibroblasts in cell culture exposed to serum and not exclusively linked to wound healing, a “core-serum response” list of 512 genes was defined from the full list of activated and repressed genes after serum exposure by taking out the cell-cycle–regulated genes. Visualizing the gene expression pattern of these genes in different types of tumors (breast, gastric, and lung cancer) revealed characteristic patterns of expression either similar or completely opposed to the in vitro wound model: a “wound activated” and a “wound quiescent” subtype. Pathways represented in this gene list capture the biology of a wound, which encompasses extracellular matrix remodeling, cell-cell signaling, angioinvasion, complement activation, induction of cell motility, and proteolysis. 
These different processes can provide a cancer cell with the necessary tools to display metastatic behavior. The wound-response signature was subsequently validated on the 295 patients from the NKI series described previously (updated results shown in Fig 4c and 4d; Nuyten, unpublished data).45 This hypothesis-driven signature not only showed a strong prediction of metastasis and overall survival, it was also an independent prognostic factor in a multivariate Cox-regression model. To facilitate the use of such a signature as a clinical test, Chang et al45 showed the scalability of the wound signature. By calculating a score for each tumor, representing the amplitude of concordance or
discordance with the wound model, a nonlinear linkage to outcome is shown (eg, at the end of the spectrum of correlation values, a small increase in score represents a larger increase in risk compared with the same increase in the score close to zero). Because the threshold of activated versus quiescent is scalable, it enables optimizing toward a specific endpoint in a specific kind of cancer. Another interesting biological process in cancer, especially in radiation oncology, is hypoxia. Hypoxia is a well-recognized phenomenon in cancer development and metastasis.49 After exposure to various conditions of hypoxia, different benign cell lines were profiled and used to create a hypoxia signature.21 Tumors can be assigned to the hypoxic or nonhypoxic group, either based on clustering or based on their average expression of the hypoxia-response genes. The hypoxia signature is also a powerful and independent predictor of metastasis and overall survival in breast cancer patients (updated results shown in Fig 4e and 4f; Nuyten, unpublished data). Interestingly, the wound signature and the hypoxia signature have very few overlapping genes, and, in a multivariate model, they predict outcome independently. An important additional purpose of these gene expression profiling studies is to gain more understanding of the biological and genetic mechanisms leading to the development of distant metastases. Adler et al50 used both DNA and RNA data to search for the genomic regulators that drive the activated wound signature. Introducing a new analysis method termed “SLAMS” (stepwise linkage analysis of microarray signatures), they identified 2 genes as drivers of the wound signature. These 2 genes, the well-known oncogene MYC and CSN5 (an activator of ubiquitin ligase complexes), work together in activating the wound-response gene expression signature. 
This is the first step in unraveling the genetic mechanisms that lead to specific gene expression patterns that may be associated with the metastatic process.
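The two scoring schemes mentioned above, a per-tumor correlation with a model signature and an average expression over the signature genes, can be illustrated with a short sketch. It assumes that the model (eg, the serum-activated fibroblast response) is summarized as a centroid vector over the signature genes and that a tumor is called "activated" when its Pearson correlation with that centroid exceeds an adjustable threshold. The centroid values, tumor values, thresholds, and function names are hypothetical and are not taken from the cited studies.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def signature_score(tumor, centroid):
    """Continuous score: correlation of a tumor with the model centroid."""
    return pearson(tumor, centroid)


def classify(score, threshold=0.0):
    """Dichotomize the continuous score at a tunable threshold."""
    return "activated" if score > threshold else "quiescent"


def mean_expression_score(values):
    """Alternative score: average expression of the signature genes."""
    return mean(values)


# Made-up centroid over 4 signature genes and 3 made-up tumors.
centroid = [1.0, 0.8, -0.6, -1.2]
tumors = {
    "A": [0.9, 1.1, -0.4, -1.0],   # resembles the model
    "B": [-0.7, -0.9, 0.5, 1.3],   # opposite of the model
    "C": [0.3, -0.2, 0.1, -0.1],   # weakly resembles the model
}

for name, values in tumors.items():
    s = signature_score(values, centroid)
    # The same tumor can change class when the threshold is moved.
    print(name, round(s, 2), classify(s, 0.0), classify(s, 0.5))
```

Tumor C illustrates the scalability of the threshold: it falls in the activated group at a cutoff of 0 but in the quiescent group at a cutoff of 0.5, so the cutoff can be tuned toward a specific endpoint while the underlying score stays continuous.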
Discussion
Introducing a microarray-based test into the clinic encompasses multiple complicated steps. After the identification of a profile and validation on 1 or 2 retrospective datasets, the profile must be transformed into a reproducible standardized test. One microarray-based test (MammaPrint) and 1 polymerase chain reaction–based test (Oncotype Dx; Genomic Health, Redwood City, CA) are now commercially available. The main question is whether these tests can already be used for patients or whether clinical introduction should be preceded by a prospective randomized validation trial. Both tests are currently being evaluated in prospective multicenter randomized clinical trials: the MINDACT trial (MammaPrint) and the TailorX trial (Oncotype Dx). For MammaPrint, a national multicenter study, the RASTER trial, has been conducted to investigate the logistics for a microarray (fresh frozen tissue) test in the setting of community hospitals. The large randomized trials will take at least 3 to 4 years for accrual of the required number of patients and another 3 years before preliminary results will become available. The question is whether we have to wait for these results before this promising technique
will be made available to patients. Clinical and pathological variables currently used for prognostication and subsequently for (adjuvant) treatment have never formally been tested for their value in treatment decision making in a prospective randomized controlled trial. Most of the prognostic profiles discussed in this review have been tested on one or a few larger datasets, but the predictive assays have only been validated on small numbers of samples. It would, therefore, appear wise to await the results from the large prospective studies (or, if such results become available, from sufficiently large retrospective studies). In addition to studies of larger numbers of patients, further development of bioinformatics approaches to analyze gene expression data will be explored. Supervised classification appears to be the best method to identify prognostic and predictive profiles, but adding biologically derived signatures will definitely have added value. The hypothesis-driven approaches might have more impact on understanding the biology of poor prognosis tumors. This can lead to the identification of pathways involved in poor prognosis and may hint toward new drug targets. The work done by Fan et al39 suggests involvement of multiple different mechanisms in poor prognosis, and Adler et al50 show how mechanisms driving a (biological) signature can be identified. So far, studies have focused on identifying profiles that predict sensitivity to a given drug. Starting at the other end of the spectrum, with a subgroup of tumors rather than a drug, may enable drug target identification in (small) subsets of tumors for an available drug or might lead to new targets for future drug development.
Conclusion
Microarray-based research has become a powerful high-throughput tool for researchers in many fields of biology and medicine. Oncology and other areas of medicine will benefit from diagnostic assays developed based on gene expression profiling studies.
References
1. Brown PO, Botstein D: Exploring the new world of the genome with DNA microarrays. Nat Genet 21:33-37, 1999
2. Schena M, Shalon D, Davis RW, et al: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270:467-470, 1995
3. DeRisi J, Penland L, Brown PO, et al: Use of a cDNA microarray to analyse gene expression patterns in human cancer. Nat Genet 14:457-460, 1996
4. Eisen MB, Spellman PT, Brown PO, et al: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A 95:14863-14868, 1998
5. Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A 98:5116-5121, 2001
6. Tibshirani R, Hastie T, Narasimhan B, et al: Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci U S A 99:6567-6572, 2002
7. Bair E, Tibshirani R: Semi-supervised methods to predict patient survival from gene expression data. PLoS Biol 2:E108, 2004
8. Michiels S, Koscielny S, Hill C: Prediction of cancer outcome with microarrays: A multiple random validation strategy. Lancet 365:488-492, 2005
9. Hummerich J, Werle-Schneider G, Popanda O, et al: Constitutive mRNA expression of DNA repair-related genes as a biomarker for clinical radio-resistance: A pilot study in prostate cancer patients receiving radiotherapy. Int J Radiat Biol 82:593-604, 2006
10. Svensson JP, Stalpers LJ, Esveldt-van Lange RE, et al: Analysis of gene expression using gene sets discriminates cancer patients with and without late radiation toxicity. PLoS Med 3:e422, 2006
11. Vozenin-Brotons MC, Milliat F, Linard C, et al: Gene expression profile in human late radiation enteritis obtained by high-density cDNA array hybridization. Radiat Res 161:299-311, 2004
12. Harima Y, Togashi A, Horikoshi K, et al: Prediction of outcome of advanced cervical cancer to thermoradiotherapy according to expression profiles of 35 genes selected by cDNA microarray analysis. Int J Radiat Oncol Biol Phys 60:237-248, 2004
13. Golub TR, Slonim DK, Tamayo P, et al: Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286:531-537, 1999
14. Kitahara O, Katagiri T, Tsunoda T, et al: Classification of sensitivity or resistance of cervical cancers to ionizing radiation according to expression profiles of 62 genes selected by cDNA microarray analysis. Neoplasia 4:295-303, 2002
15. Helland S, Johnsen H, Froyland C, et al: Radiation-induced effects on gene expression: An in vivo study on breast cancer. Radiother Oncol 80:230-235, 2006
16. Overgaard M, Nielsen HM, Overgaard J: Is the benefit of postmastectomy irradiation limited to patients with four or more positive nodes, as recommended in international consensus reports? A subgroup analysis of the DBCG 82 b&c randomized trials. Radiother Oncol 82:247-253, 2007
17. Pierce LJ: The use of radiotherapy after mastectomy: A review of the literature. J Clin Oncol 23:1706-1717, 2005
18. Cheng SH, Horng CF, West M, et al: Genomic prediction of locoregional recurrence after mastectomy in breast cancer. J Clin Oncol 24:4594-4602, 2006
19. Kreike B, Halfwerk H, Kristel P, et al: Gene expression profiles of primary breast carcinomas from patients at high risk for local recurrence after breast-conserving therapy. Clin Cancer Res 12:5705-5712, 2006
20. Nuyten DS, Kreike B, Hart AA, et al: Predicting a local recurrence after breast-conserving therapy by gene expression profiling. Breast Cancer Res 8:R62, 2006
21. Chang HY, Sneddon JB, Alizadeh AA, et al: Gene expression signature of fibroblast serum response predicts human cancer progression: Similarities between tumors and wounds. PLoS Biol 2:E7, 2004
22. Chi JT, Wang Z, Nuyten DS, et al: Gene expression programs in response to hypoxia: Cell type specificity and prognostic significance in human cancers. PLoS Med 3:e47, 2006
23. van’t Veer LJ, Dai H, van de Vijver MJ, et al: Gene expression profiling predicts clinical outcome of breast cancer. Nature 415:530-536, 2002
24. Fisher B, Bryant J, Wolmark N, et al: Effect of preoperative chemotherapy on the outcome of women with operable breast cancer. J Clin Oncol 16:2672-2685, 1998
25. Sotiriou C, Powles TJ, Dowsett M, et al: Gene expression profiles derived from fine needle aspiration correlate with response to systemic chemotherapy in breast cancer. Breast Cancer Res 4:R3, 2002
26. Chang J, Wooten E, Tsimelzon A, et al: Gene expression profiling for the prediction of therapeutic response to docetaxel in patients with breast cancer. Lancet 362:362-369, 2003
27. Hess KR, Anderson K, Symmans WF, et al: Pharmacogenomic predictor of sensitivity to preoperative chemotherapy with paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide in breast cancer. J Clin Oncol 24:4236-4244, 2006
28. Hannemann J, Oosterkamp HM, Bosch CA, et al: Changes in gene expression associated with response to neoadjuvant chemotherapy in breast cancer. J Clin Oncol 23:3331-3342, 2005
29. Potti A, Dressman HK, Bild A, et al: Genomic signatures to guide the use of chemotherapeutics. Nat Med 12:1294-1300, 2006
30. van de Vijver MJ, He YD, van’t Veer LJ, et al: A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med 347:1999-2009, 2002
31. Buyse M, Loi S, van’t Veer L, et al: Validation and clinical utility of a 70-gene prognostic signature for women with node-negative breast cancer. J Natl Cancer Inst 98:1183-1192, 2006
32. MINDACT. Available at: http://www.eortc.be/services/unit/mindact/MINDACT_websiteii.asp. Accessed November 29, 2007
33. Adjuvant! Online. Available at: http://www.adjuvantonline.co. Accessed
34. Sotiriou C, Wirapati P, Loi S, et al: Gene expression profiling in breast cancer: Understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst 98:262-272, 2006
35. Wang Y, Klijn JG, Zhang Y, et al: Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet 365:671-679, 2005
36. Foekens JA, Atkins D, Zhang Y, et al: Multicenter validation of a gene expression-based prognostic signature in lymph node-negative primary breast cancer. J Clin Oncol 24:1665-1671, 2006
37. Desmedt C, Piette F, Loi S, et al: Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG Multicenter Independent Validation Series. Clin Cancer Res 13:3207-3214, 2007
38. Desmedt C, Sotiriou C: Proliferation: The most prominent predictor of clinical outcome in breast cancer. Cell Cycle 5:2198-2202, 2006
39. Fan C, Oh DS, Wessels L, et al: Concordance among gene-expression-based predictors for breast cancer. N Engl J Med 355:560-569, 2006
40. Paik S, Shak S, Tang G, et al: A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med 351:2817-2826, 2004
41. Hu Z, Fan C, Oh DS, et al: The molecular portraits of breast tumors are conserved across microarray platforms. BMC Genomics 7:96, 2006
42. Perou CM, Sorlie T, Eisen MB, et al: Molecular portraits of human breast tumours. Nature 406:747-752, 2000
43. Sorlie T, Perou CM, Tibshirani R, et al: Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad Sci U S A 98:10869-10874, 2001
44. Sorlie T, Tibshirani R, Parker J, et al: Repeated observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A 100:8418-8423, 2003
45. Chang HY, Nuyten DS, Sneddon JB, et al: Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival. Proc Natl Acad Sci U S A 102:3738-3743, 2005
46. Ma XJ, Wang Z, Ryan PD, et al: A two-gene expression ratio predicts clinical outcome in breast cancer patients treated with tamoxifen. Cancer Cell 5:607-616, 2004
47. Perou CM, Jeffrey SS, van de Rijn M, et al: Distinctive gene expression patterns in human mammary epithelial cells and breast cancers. Proc Natl Acad Sci U S A 96:9212-9217, 1999
48. Dvorak HF: Tumors: Wounds that do not heal. Similarities between tumor stroma generation and wound healing. N Engl J Med 315:1650-1659, 1986
49. Brown JM, Wilson WR: Exploiting tumour hypoxia in cancer treatment. Nat Rev Cancer 4:437-447, 2004
50. Adler AS, Lin M, Horlings H, et al: Genetic regulators of large-scale transcriptional signatures in cancer. Nat Genet 38:421-430, 2006