Long noncoding RNA profiles identify five distinct molecular subtypes of colorectal cancer with clinical relevance


Molecular Oncology XXX (2014) 1–11 (MOLONC520 proof, 6 June 2014)

Available at www.sciencedirect.com (ScienceDirect)

www.elsevier.com/locate/molonc


Haoyan Chen a,*, Jie Xu a,*, Jie Hong a, Ruqi Tang a, Xi Zhang b, JingYuan Fang a,*

a State Key Laboratory of Oncogenes and Related Genes, Division of Gastroenterology and Hepatology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai Cancer Institute, Shanghai Institute of Digestive Disease, 145 Middle Shandong Road, Shanghai 200001, China

b Departments of Biochemistry and Molecular Biology, University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA

ARTICLE INFO

Article history:
Received 24 April 2014
Received in revised form 14 May 2014
Accepted 22 May 2014
Available online –

Keywords:
lncRNA
Consensus clustering
Colorectal cancer
Gene expression profiling
Somatic copy number alterations
Survival
Gene set enrichment analysis

ABSTRACT

Colorectal cancer (CRC) is a heterogeneous disease in terms of clinical behavior and response to therapy. Increasing evidence suggests that long noncoding RNAs (lncRNAs) are frequently aberrantly expressed in cancers, and some of them have been implicated in CRC biogenesis and prognosis. Using an lncRNA-mining approach, we constructed lncRNA expression profiles in approximately 889 CRC samples. By applying unsupervised consensus clustering to the lncRNA expression profiles, we identified five distinct molecular subtypes of CRC that differ in their biological pathways and are phenotypically distinct in their clinical outcome in both univariate and multivariate analysis. The prognostic significance of the lncRNA-based classifier was confirmed in independent patient cohorts. Further analysis revealed that most of the signature lncRNAs positively correlated with somatic copy number alterations (SCNAs). This lncRNA-based classification schema thus provides a molecular classification that is applicable to individual tumors and may influence treatment decisions.

© 2014 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

* Corresponding authors. State Key Laboratory for Oncogenes and Related Genes, School of Medicine, Shanghai Jiao Tong University, 145 Middle Shandong Rd, Shanghai, 200001, China. E-mail addresses: [email protected] (H. Chen), [email protected] (J. Xu), [email protected] (J.-Y. Fang).
http://dx.doi.org/10.1016/j.molonc.2014.05.010
1574-7891/© 2014 Federation of European Biochemical Societies. Published by Elsevier B.V. All rights reserved.

1. Introduction

Colorectal cancer (CRC) is the third most commonly diagnosed cancer, with an estimated 50,000 deaths per year in the United States (Siegel et al., 2014). The current AJCC TNM histopathological classification system offers a valuable basis for defining groups of patients for clinical assessment, and it predicts the clinical behavior of the respective neoplasm with direct impact on the treatment regimes applied (Marisa et al., 2013; O'Connell et al., 2004). A remarkable feature of CRC is the difference in prognosis between the early and late stages of the disease: stages I and II carry a moderate risk of relapse after surgical resection, whereas patients with stage III disease have a higher chance of recurrence. Recognized clinical risk factors for recurrence are emergency presentation, poorly differentiated tumor, depth of tumor invasion, and adjacent organ involvement (T4) (Dotan and Cohen, 2011; Gill et al., 2004; McMillan et al., 2010). However, these factors are insufficient to identify patients with stage II CRC at high risk of recurrence and subsequent metastasis, or patients with stage III CRC at low risk (Dotan and Cohen, 2011), leading to potential under-treatment or over-treatment (Gill et al., 2004).

Molecular markers such as mutations in the Kirsten ras gene (KRAS) and BRAF, as well as chromosomal and microsatellite instability (MSI), have been systematically analyzed for prognostic potential in CRC. So far, only KRAS mutation analysis has been used in clinical practice, as a predictive marker for the effect of EGFR antibodies in metastatic disease (Farina-Sarasqueta et al., 2010; Popat et al., 2005; Roth et al., 2010; Walther et al., 2008). Gene expression profiling-based assays have been successful as prognostic tools in breast cancer (Prat and Perou, 2011; Reis-Filho and Pusztai, 2011; Weigelt et al., 2010). However, no signature has been adopted in routine clinical practice in CRC, although a large number of gene expression profiling studies on prognosis have been performed (De Sousa et al., 2013; Marisa et al., 2013; Sadanandam et al., 2013; Sanz-Pamplona et al., 2012).

The roles of dysregulated functional long non-coding RNAs (lncRNAs) in human cancers have recently received considerable attention (Gibb et al., 2011; Mercer et al., 2009; Mitra et al., 2012; Prensner and Chinnaiyan, 2011; Villegas et al., 2014). LncRNAs are mRNA-like transcripts, ranging in length from 200 nucleotides (nt) to ~100 kilobases (kb), that lack significant protein-coding ability (Lipovich et al., 2010; Mercer et al., 2009). Increasing evidence suggests that these transcripts are frequently aberrantly expressed in cancers, and some of them have been implicated in diagnosis and prognostication (Qi and Du, 2013). As lncRNAs do not encode proteins, their functions are closely associated with their transcript abundance (Du et al., 2013). It has been reported that lncRNAs show higher specificity than protein-coding mRNAs (Hessels et al., 2003; Prensner et al., 2011) and have the advantage of being detectable in the blood (Lin et al., 2007) and urine (Hessels et al., 2003; Tinzl et al., 2004) of cancer patients by conventional PCR methods.

Several mRNA expression-based molecular classification systems have been established for CRC (Budinska et al., 2013; De Sousa et al., 2013; Marisa et al., 2013; Sadanandam et al., 2013; Wang et al., 2013). Perturbation of lncRNA expression can result in the development and progression of CRC (Ge et al., 2013; Li et al., 2013), yet few approaches take lncRNA molecular abnormalities into consideration. Currently, lncRNA profiles in most human cancers remain largely unknown, mainly because dedicated lncRNA arrays are lacking. Previous studies demonstrated that lncRNA profiling can be achieved by mining previously published gene expression microarray data, because a large group of lncRNA-specific probes is fortuitously represented on the commonly used microarray platforms (Du et al., 2013; Li et al., 2014).

In the present study, GATExplorer (Genomic and Transcriptomic Explorer) (Risueno et al., 2010) was used to process microarrays on a local computer for lncRNA expression profiling of a cohort of 556 patients from GSE39582 as well as three independent datasets from the Gene Expression Omnibus (GEO). By applying unsupervised consensus clustering to the lncRNA expression profiles, we identified five distinct molecular subtypes of CRC with differentiating biological signatures, which could be distinguished using Gene Set Enrichment Analysis (GSEA) and which are also phenotypically distinct in their disease-free survival (DFS) in both univariate and multivariate analysis.

Figure 1 – Unsupervised lncRNA expression profiling analysis of the discovery set of 443 colorectal cancers. (a) NMF consensus clustering analysis and cophenetic coefficient for k = 2 to k = 6 in the GSE39582 discovery dataset. The maximum cophenetic coefficient occurred at k = 5. (b) Silhouette plot for the GSE39582 discovery dataset, showing samples from the different subtypes and those with positive and negative silhouette scores. (c) Heatmap showing the five subtypes (k = 5) from NMF consensus clustering of the GSE39582 discovery dataset. (d) Heatmap of the 229 lncRNAs ordered by subtype, with annotations associated with each subtype.

Figure 2 – DFS comparison of the five CRC subtypes. Kaplan–Meier survival curves depicting differential survival for the GSE39582 discovery dataset (a), the validation dataset comprising GSE39582, GSE33113 and GSE17536/GSE17537 (b), and all datasets combined (c).

Table 1 – Characteristics of the five subtypes of CRC.

2. Materials and methods

2.1. CRC datasets preparation

Microarray data from the GSE39582, GSE17536/GSE17537, GSE14333 and GSE33113 datasets were downloaded directly from the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/). These corresponded to all available public datasets fulfilling the following criteria: gene expression data obtained using the same chip platform (Affymetrix U133 Plus 2.0 chips) with raw CEL files available, and patient outcome data available. An initial quality check showed that the majority of samples in GSE14333 overlapped with GSE17536/GSE17537, so GSE14333 was not included in the further analysis. The CRC samples in GSE39582 were split into a discovery set (n = 443) and an internal validation set (n = 123) as described before (Marisa et al., 2013). Within the discovery (n = 443) and the validation (n = 445) sets, 359 and 336 patients, respectively, with stage II–III CC and documented relapse-free survival (RFS) were available for survival analysis.

2.2. Microarray data processing and lncRNA profile mining

The raw CEL files were downloaded from the GEO database and background-adjusted using Robust Multichip Average. GATExplorer (Risueno et al., 2010) was used to process the microarrays on a local computer to obtain lncRNA expression values. GATExplorer provides a series of R packages, designed to be used with BioConductor tools, that make it simple to apply the probe mapping data included in GATExplorer. Files called ncRNA Mapper were also obtained from GATExplorer; these contain the probes that do not map to any coding region but do map to a database of human and mouse non-coding RNAs derived from RNAdb (Pang et al., 2007). A customized R script was used to calculate microarray expression values according to the re-mapping data (file ncrnamapperhgu133plus2cdf_3.0). Each lncRNA was required to have at least 3 probes mapping to the corresponding ncRNA entity. In our previous work (Hu et al., 2014), we validated this re-annotation method experimentally: six lncRNAs extracted by re-annotation of microarrays were validated by RT-qPCR in our CRC tissue samples, and the results were highly consistent with the microarray-derived data.

Figure 3 – Gene set enrichment analysis delineates biological pathways and processes that define five distinct subtypes. Cytoscape and Enrichment Map were used for visualization of the GSEA results. Enrichment results were mapped as a network of gene sets (nodes), which are grouped and annotated by their similarity to related gene sets. Node size is proportional to the total number of genes within each gene set. The proportion of genes shared between gene sets is represented by the thickness of the green line between nodes. The network map was manually curated by removing general and uninformative sub-networks, resulting in a simplified network map.
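The probe re-mapping step in Section 2.2 can be illustrated with a minimal Python sketch. It is not the authors' customized R script: the mapping table, probe identifiers and intensities are hypothetical, and using the median as the per-lncRNA summary is an assumption made for illustration. Only the minimum of 3 probes per lncRNA comes from the text.

```python
from collections import defaultdict
from statistics import median

MIN_PROBES = 3  # minimum number of probes per lncRNA, as stated in the text


def summarize_lncrnas(probe_to_lncrna, probe_values, min_probes=MIN_PROBES):
    """Collapse probe-level log2 intensities of one sample to one value per lncRNA.

    probe_to_lncrna: dict probe id -> lncRNA id (hypothetical mapping table)
    probe_values:    dict probe id -> log2 intensity
    Assumption: the median of the mapped probes is used as the summary.
    """
    grouped = defaultdict(list)
    for probe, lncrna in probe_to_lncrna.items():
        if probe in probe_values:
            grouped[lncrna].append(probe_values[probe])
    # Discard lncRNAs supported by fewer than `min_probes` probes.
    return {
        lncrna: median(values)
        for lncrna, values in grouped.items()
        if len(values) >= min_probes
    }


# Toy data: lncA has three probes and is kept, lncB has two and is dropped.
mapping = {"p1": "lncA", "p2": "lncA", "p3": "lncA", "p4": "lncB", "p5": "lncB"}
values = {"p1": 7.0, "p2": 8.0, "p3": 9.5, "p4": 5.0, "p5": 6.0}
profile = summarize_lncrnas(mapping, values)
print(profile)
```

In this toy example only lncA passes the probe-count filter, which mirrors how lncRNAs with too few mapped probes are excluded from the profile.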

2.3. NearestTemplatePrediction algorithm

We used the NearestTemplatePrediction (NTP) algorithm (Hoshida, 2010) from GenePattern to predict the class of a given sample by a predefined set of markers that are specific to each subtype.
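The idea behind nearest template prediction can be sketched as follows. This is a simplified illustration, not the GenePattern module: it assumes binary templates (1 for the marker genes of a subtype, 0 otherwise) and cosine distance, and it omits the permutation-based significance assessment. The gene names, subtype labels and expression values are hypothetical.

```python
import math


def cosine_distance(x, y):
    """1 minus the cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def nearest_template(sample, markers):
    """Assign a sample to the subtype whose marker template is closest.

    sample:  dict gene -> centered expression value (hypothetical input)
    markers: dict gene -> subtype in which the gene is up-regulated
    """
    genes = sorted(g for g in markers if g in sample)
    x = [sample[g] for g in genes]
    distances = {}
    for subtype in sorted({markers[g] for g in genes}):
        # Binary template: 1 for this subtype's markers, 0 for all others.
        template = [1.0 if markers[g] == subtype else 0.0 for g in genes]
        distances[subtype] = cosine_distance(x, template)
    best = min(distances, key=distances.get)
    return best, distances


markers = {"g1": "S1", "g2": "S1", "g3": "S2", "g4": "S2"}
sample = {"g1": 2.0, "g2": 1.5, "g3": -1.0, "g4": -0.5}
label, dist = nearest_template(sample, markers)
print(label, dist)
```

The toy sample has high expression of the hypothetical S1 markers, so its profile lies closest to the S1 template.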

2.4. Molecular subtype characterization

The published supervised signatures, including tumors with an up-regulated stem cell signature (Merlos-Suarez et al., 2011), tumors with an up-regulated crypt cell signature (Kosinski et al., 2007), BRAF-mutated-like tumors (Popovici et al., 2012) and serrated CRC tumors (Laiho et al., 2007), were annotated as described before (Marisa et al., 2013). Wnt pathway activity (Vermeulen et al., 2010) and FOLFIRI response signatures (Graudens et al., 2006) were predicted using the NTP algorithm as described by Sadanandam et al. (2013).

2.5. CGH alteration frequency profiles

CGH array data were downloaded from the NCBI Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/; accession number GSE40966). The raw copy-number data provided by array CGH for each sample were analyzed using the GISTIC2 algorithm (Mermel et al., 2011).

2.6. Identification of biological pathways distinguishing each subtype

GSEA was performed with the Java program (http://www.broadinstitute.org/gsea) using the MSigDB C2 CP (canonical pathways) gene set collection (1320 gene sets available). The GSEA outputs, visualized in Cytoscape (version 2.8.2) with the Enrichment Map software (Merico et al., 2010), were used to identify the biological processes discriminating one subtype from the others. To simplify the network map, a stringent threshold based on gene-set permutations, with an FDR cutoff of 0.5% and a p-value cutoff of 0.01, was applied in the Enrichment Map software as described (Merico et al., 2010).
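The filtering and network-building logic of an enrichment map can be sketched as below. This is an illustrative approximation rather than the Enrichment Map implementation: the gene sets, p-values and FDR values are hypothetical, the overlap coefficient is one of the similarity measures such tools offer, and the edge threshold of 0.5 is an assumption. Only the FDR (0.5%) and p-value (0.01) cutoffs are taken from the text.

```python
from itertools import combinations


def overlap_coefficient(a, b):
    """Shared genes divided by the size of the smaller gene set."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def enrichment_network(results, fdr_cutoff=0.005, p_cutoff=0.01, min_overlap=0.5):
    """Keep significant gene sets (nodes) and link those sharing many genes (edges)."""
    nodes = {
        name: r for name, r in results.items()
        if r["fdr"] < fdr_cutoff and r["p"] < p_cutoff
    }
    edges = []
    for first, second in combinations(sorted(nodes), 2):
        score = overlap_coefficient(nodes[first]["genes"], nodes[second]["genes"])
        if score >= min_overlap:  # assumed edge threshold
            edges.append((first, second, score))
    return nodes, edges


# Hypothetical GSEA output for three gene sets.
results = {
    "SET_A": {"p": 0.001, "fdr": 0.002, "genes": {"G1", "G2", "G3"}},
    "SET_B": {"p": 0.004, "fdr": 0.004, "genes": {"G2", "G3", "G4", "G5"}},
    "SET_C": {"p": 0.030, "fdr": 0.200, "genes": {"G1", "G6"}},
}
kept, edges = enrichment_network(results)
print(sorted(kept), edges)
```

SET_C fails both cutoffs and is dropped, while SET_A and SET_B share two of the three genes in the smaller set and are therefore connected.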

2.7. Statistical analysis

The Chi-squared test and logistic regression were used to study associations between the subtypes and anatomo-clinical features, common DNA alterations and previously published supervised signatures. Survival analysis was intentionally restricted to the subgroup of patients with stage II–III tumors because reliable prognostic biomarkers are most needed for these patients, as suggested by Marisa et al. (2013). Indeed, most stage I patients will not derive benefit from adjuvant chemotherapy because of their excellent prognosis after curative surgery, and most stage IV patients, who are already metastatic, will die from their disease and should therefore be analyzed independently for progression-free survival. RFS was defined as the time from surgery to the first recurrence and was censored at 5 y. Survival was analyzed according to the Kaplan–Meier method, and differences between survival distributions were assessed with the log-rank test. Univariate and multivariate models were computed using Cox proportional-hazards regression (R package survival).
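The censoring rule and the Kaplan–Meier estimate described in Section 2.7 can be written out in a few lines. The sketch below is a plain-Python illustration, not the R survival package used in the study, and the follow-up times and event indicators are hypothetical.

```python
def censor_at(times, events, horizon=5.0):
    """Administratively censor follow-up at `horizon` years (5 y in the text)."""
    censored_times, censored_events = [], []
    for t, e in zip(times, events):
        if t > horizon:
            censored_times.append(horizon)
            censored_events.append(0)  # event after the horizon is not counted
        else:
            censored_times.append(t)
            censored_events.append(e)
    return censored_times, censored_events


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Hypothetical relapse-free survival data (years; 1 = relapse, 0 = censored).
times = [1.0, 2.0, 2.0, 3.5, 6.0, 7.0]
events = [1, 1, 0, 1, 1, 0]
t5, e5 = censor_at(times, events)
curve = kaplan_meier(t5, e5)
print(curve)
```

The relapse recorded at 6 years is treated as censored at 5 years, so it does not lower the estimated curve; the last step of the curve is 4/9.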

3. Results

3.1. Unsupervised lncRNAs selection

The GSE39582 dataset, which has a large sample size of 556 cases, was adopted as the primary data source in this study. The patient and tumor characteristics of the different sets were described previously (Marisa et al., 2013). To remove potential multicenter batch effects, data were corrected using the ComBat method (Johnson et al., 2007) implemented in the R package sva, as described by Marisa et al. (2013). Probes showing little variation across patients are commonly removed from expression data as part of preprocessing; here, we kept only the 50% most variable probes. The subtypes of CRC were identified in row median-centered microarray datasets using the non-negative matrix factorization (NMF) algorithm (Brunet et al., 2004). According to the cophenetic coefficient (greater than 0.99) from the NMF consensus clustering of the GSE39582 discovery set, we chose k = 5 subtypes (Figure 1a), following the approach discussed by Brunet et al. (2004). Silhouette width (Rousseeuw, 1987) was computed to identify the most representative samples within each cluster (Figure 1b). Finally, we retained the samples with positive silhouette width (n = 376) as the core CRC dataset to build the classifier.
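Three of the preprocessing steps named in Section 3.1 (keeping the 50% most variable probes, row median centering, and retaining samples with positive silhouette width) can be sketched in plain Python. The NMF consensus clustering itself is not reproduced here. All data are toy values, and the Euclidean distance used for the silhouette is an assumption, since the text does not state the distance measure.

```python
import math
from statistics import mean, median, pvariance


def top_variable_rows(matrix, fraction=0.5):
    """Keep the given fraction of probes with the largest variance across samples."""
    ranked = sorted(matrix, key=lambda probe: pvariance(matrix[probe]), reverse=True)
    n_keep = max(1, int(len(ranked) * fraction))
    return {probe: matrix[probe] for probe in ranked[:n_keep]}


def median_center_rows(matrix):
    """Subtract each probe's median from its values (row median centering)."""
    centered = {}
    for probe, values in matrix.items():
        m = median(values)
        centered[probe] = [v - m for v in values]
    return centered


def silhouette_widths(points, labels):
    """Silhouette width per sample (Euclidean distance is an assumption)."""
    widths = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = [math.dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
               if l == lab and j != i]
        if not own:  # singleton cluster: width set to 0
            widths.append(0.0)
            continue
        a = mean(own)  # mean distance to the sample's own cluster
        b = min(       # mean distance to the nearest other cluster
            mean(math.dist(p, q) for q, l in zip(points, labels) if l == other)
            for other in set(labels) if other != lab
        )
        widths.append((b - a) / max(a, b))
    return widths


# Toy probe matrix: probe id -> values across four samples.
matrix = {"p1": [1, 2, 3, 10], "p2": [5, 5, 5, 5], "p3": [0, 4, 8, 12], "p4": [2, 2, 3, 3]}
centered = median_center_rows(top_variable_rows(matrix))

# Toy samples in two dimensions; the last one is labelled "A" but lies near "B".
points = [(0, 0), (0, 1), (5, 5), (5, 6), (4.5, 5.5)]
labels = ["A", "A", "B", "B", "A"]
widths = silhouette_widths(points, labels)
core = [i for i, w in enumerate(widths) if w > 0]
print(centered, core)
```

The misassigned fifth sample receives a negative silhouette width and is excluded from the core set, which is the same principle by which the core dataset is restricted to representative samples.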

3.2. Generation of the colorectal cancer classifier

To build the CRC classifier, we applied two filtering steps to select the most representative and predictive genes, as described by De Sousa et al. (2013). Briefly, significance analysis of microarrays (SAM) (Tusher et al., 2001) (R package siggenes, version 1.32.0) was used to identify genes that were significantly differentially expressed (false discovery rate (FDR) < 0.001) between each subtype and the other four. The AUC (area under the ROC curve; R package ROCR, version 1.0-4) was calculated to assess each gene's ability to separate one subtype from the others. The 229 retained genes with AUC > 0.85 were then used to train a classifier by prediction analysis for microarrays (PAM) (Tibshirani et al., 2002). To select the optimal threshold for centroid shrinkage, we performed tenfold cross-validation over a range of thresholds for 1000 iterations and selected the one yielding good performance (error rate < 2%) with the smallest number of genes. Of note, the gene filtering steps did not significantly affect the selection of signature genes, as observed from PAM classification using various cutoffs on SAM FDR and AUC (data not shown). Using this strategy, we built a classifier of 229 unique genes and used it to classify the CRC samples in the validation dataset (Supplementary Table A.1).

Table 2 – Univariate and multivariate Cox analyses including the classification and clinical annotations (discovery + validation datasets).

                                                   Univariate analysis              Multivariate analysis
Variables             Ref/level  n    n.event  H.R.   95% C.I.   P value    H.R.   95% C.I.   P value
TNM.Stage             II         695  191      1.91   1.44–2.54  7.69E-06   1.92   1.44–2.55  9.50E-06
ncRNA classification  S1         169  31       Ref    –          –          Ref    –          –
                      S2         123  49       2.53   1.61–3.95  5.45E-05   2.55   1.62–4.00  5.45E-05
                      S3         65   14       1.44   0.76–2.70  0.2592     1.45   0.77–2.72  0.248
                      S4         163  48       1.7    1.08–2.66  0.0214     1.63   1.04–2.57  0.0345
                      S5         175  49       1.59   1.02–2.49  0.0439     1.64   1.05–2.58  0.0317
Sex                   F          695  191      1.24   0.93–1.65  0.142      1.3    0.97–1.74  0.0772
Age                   –          694  191      1.001  0.99–1.01  0.821      1.009  0.99–1.02  0.1444

H.R.: Cox hazard ratio; 95% C.I.: 95 percent confidence interval of the H.R.

Figure 4 – Integrated view of signature lncRNAs regulated by copy number. (a) An ideogram of a normal karyotype is shown in the outer ring. The next outermost ring represents lncRNA expression levels (red represents ncRNAs whose expression level changes positively correlate with copy number; green represents ncRNAs without such a correlation). The next ring represents copy number as a function of genomic coordinates (red represents amplification and green represents deletion). (b) In the center of the figure, the box plot shows two representative lncRNAs, AK022220 and BC017275, that positively correlate with SCNA (p < 0.001).
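The AUC-based filtering step can be sketched as follows. This is a minimal illustration and not the ROCR-based procedure used in the study: it computes the AUC directly as the probability that a sample of the subtype of interest ranks above a sample from the other subtypes, and it assumes that only genes with higher expression in the subtype of interest are selected. Gene names, labels and expression values are hypothetical; the 0.85 cutoff is taken from the text.

```python
def auc(positives, negatives):
    """Probability that a positive value outranks a negative one (ties count 0.5)."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def select_markers(expression, labels, subtype, cutoff=0.85):
    """Return genes whose AUC for `subtype` versus all other samples exceeds `cutoff`."""
    selected = {}
    for gene, values in expression.items():
        pos = [v for v, lab in zip(values, labels) if lab == subtype]
        neg = [v for v, lab in zip(values, labels) if lab != subtype]
        score = auc(pos, neg)
        if score > cutoff:
            selected[gene] = score
    return selected


# Hypothetical expression values for six samples.
labels = ["S1", "S1", "S1", "S2", "S2", "S3"]
expression = {
    "lncX": [9.0, 8.0, 7.0, 3.0, 6.5, 2.0],  # separates S1 perfectly
    "lncY": [5.0, 4.0, 6.0, 5.0, 7.0, 3.0],  # uninformative for S1
}
s1_markers = select_markers(expression, labels, "S1")
print(s1_markers)
```

In this toy example lncX is retained as an S1 marker with an AUC of 1.0, whereas lncY has an AUC of 0.5 and is discarded.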

3.3. Identification of molecular subtypes of CRC by lncRNAs expression profiling

The unsupervised NMF consensus analysis of the lncRNA data from the 443 samples of the GSE39582 discovery set revealed five clusters of samples based on the most variable probe sets (n = 2817): S1 (n = 122, 27.5%), S2 (n = 53, 12%), S3 (n = 51, 11.5%), S4 (n = 118, 26.6%) and S5 (n = 99, 22.3%) (Figure 1c). These samples had been classified as C1–C6 based on mRNA expression profiles in a previous study (Marisa et al., 2013). Using the Chi-square test, we found that the subtype distribution obtained with lncRNAs was significantly different from the one derived from mRNA (p = 3.36E-55). Notably, we also observed considerable overlap between the two classification systems, as 60.2% (32 of 53) of S2 samples were classified as the C4 subtype and 72.5% (37 of 51) of S3 samples as C2.
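The comparison of two classification systems with a Chi-square test can be illustrated as below. The contingency table is a hypothetical 2 × 2 example, not the study's cross-tabulation; for the actual comparison of S1–S5 against C1–C6 the table would have 5 rows and 6 columns, giving (5 − 1)(6 − 1) = 20 degrees of freedom. The p-value, which would be read from the Chi-square distribution with these degrees of freedom, is not computed in this sketch.

```python
def chi_square(table):
    """Pearson Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical cross-tabulation: rows = lncRNA subtype, columns = mRNA subtype.
toy_table = [
    [30, 10],
    [10, 30],
]
stat, dof = chi_square(toy_table)
print(stat, dof)
```

A large statistic relative to the degrees of freedom indicates that the two classifications are not independent, which is compatible with both a significant test result and a substantial overlap between particular subtypes.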

3.4. Clinical and molecular relevance and signaling pathways of colorectal cancer subtypes

Associations with clinical characteristics and DNA alteration data are shown in Figure 1d and Supplementary Table A.2. Tumors classified as S2 and S4 were more frequently metastatic (18.9% and 18.6%, versus 4.9%, 11.8% and 10.1% in S1, S3 and S5, respectively). CRC classified as S3 was enriched for dMMR (78.0%), CIMP+ (53.5%), BRAF-mutant (37.0%) and proximal (70.6%) tumors, whereas CIN+ tumors were less frequent in S3 than in the other clusters (31.9% in S3 versus >75% in the other clusters). No association between clusters and TP53 or KRAS mutation was found.

We performed Gene Set Enrichment Analysis (Subramanian et al., 2005) to identify biological processes and signaling pathways distinguishing each subtype from the others. Gene sets were compiled from the BIOCARTA, Kyoto Encyclopedia of Genes and Genomes (KEGG) and REACTOME pathway databases. Significant gene sets (FDR < 0.005, p < 0.001) were visualized as interaction networks with Cytoscape and Enrichment Map (Figure 3) (Merico et al., 2010). The network map was manually curated by removing general and uninformative sub-networks and nodes, resulting in the simplified map shown in Figure 3. In S1, the DNA repair pathway and cell growth were up-regulated and focal adhesion was down-regulated. Similar to a previous report (Marisa et al., 2013), in S3, the subtype enriched for dMMR tumors, immune system and cell growth pathways were up-regulated. S2 showed up-regulation of focal adhesion, angiogenesis and extracellular matrix adhesion pathways and down-regulation of DNA repair and cell growth pathways. Most signaling pathways were down-regulated in S4 and S5.

These five molecular subtypes were further investigated using previously published supervised signatures to explore the subtype origins. As observed in Figure 1d, S2 appeared highly enriched for tumors displaying a "stem cell phenotype-like" gene expression profile (79.2%) and up-regulation of the bottom crypt signature (96.2%). Most of the tumors classified as S2 and S3 displayed a "serrated CRC phenotype-like" gene expression profile, whereas those in S1, S4 and S5 displayed a "conventional CRC phenotype-like" gene expression profile. A strong association between stem-like-subtype tumors and the high Wnt activity signature (Vermeulen et al., 2010) and the FOLFIRI (a combination of irinotecan, 5-fluorouracil and leucovorin) response signature (Graudens et al., 2006) has been reported (Sadanandam et al., 2013). Similarly, the FOLFIRI response signature was significantly associated with 94.3% (50 of 53) of stem cell phenotype-like CRC samples, compared with only 30–60% in the other subtypes, as assessed using the NTP algorithm.

A schematic summary of the subtype characteristics is shown in Table 1. The five subtypes were named according to their main biological characteristic as follows: S1, "Proliferation CRC"; S2, "Stem cell-like CRC"; S3, "BRAF mutant CRC"; S4, "Immune-system down CRC"; and S5, "KRAS mutant CRC".

3.5. Prognostic value of the five-subtype classification

Further investigation of the clinical relevance of our classification included a prognostic analysis based on RFS, restricted to stage II and III tumors as discussed by Marisa et al. (2013). The prognostic value of the five-subtype classification was significant in the discovery set (n = 359; p = 0.00998) and the validation set (n = 336; p = 0.0147) (Figure 2a), with a worse prognosis confirmed for patients with S2 tumors (Figure 2b). The five-subtype classification was also significant for the discovery and validation sets combined (p = 0.00139) (Figure 2c). Multivariate analysis (discovery and validation sets analyzed separately and merged) revealed that the lncRNA-derived subtypes, together with TNM stage, remained independent prognostic factors (Table 2 and Supplementary Table A.3).
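The relation between a Cox hazard ratio, its 95% confidence interval and the Wald p-value can be checked with a short calculation. The sketch below uses the univariate values reported in Table 2 for S2 versus S1 (H.R. 2.53, 95% C.I. 1.61–3.95); it assumes a Wald test on the log hazard ratio, which is a standard choice but is not stated explicitly in the text. Because the inputs are rounded, the recovered p-value only approximates the reported 5.45E-05.

```python
import math


def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover the coefficient, standard error, z statistic and two-sided p-value.

    Assumes the interval is exp(beta +/- z_crit * se), i.e. a Wald interval.
    """
    beta = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2.0 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return beta, se, z, p


beta, se, z, p = wald_from_ci(2.53, 1.61, 3.95)
print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.2e}")
```

The recovered p-value is of the order of 5 × 10⁻⁵, in line with the value given in Table 2, and the interval excludes 1, consistent with the worse prognosis of S2 tumors.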

3.6. Functional annotation of signature lncRNAs

Studies have suggested that genes with causal roles in oncogenesis are often located in SCNAs that are frequently altered across tumors (Beroukhim et al., 2010; Garraway et al., 2005). To reveal the lncRNAs that may have tumor-promoting or tumor-suppressing functions, we identified the lncRNAs that map to regions of recurrent somatic copy number alterations (SCNAs) (Figure 4a). Driver lncRNAs located in SCNAs should show corresponding gene expression changes (Akavia et al., 2010), as only those SCNAs that cause changes in transcript abundance could possibly alter lncRNA activity. We therefore integrated the SCNA and expression profiles of lncRNAs in tumors. The integrated Circos plot shows the signature ncRNAs regulated by copy number (CN). The next outermost histogram represents lncRNA expression as a function of genomic coordinates: red represents ncRNAs whose expression level changes positively correlate with copy number, and green represents ncRNAs without such a correlation. Among the 229 ncRNAs used to build the classifier, 38.9% (89/229) showed expression level changes that were positively correlated with SCNA (Figure 4a). As seen in Figure 4b, two representative lncRNAs, AK022220 and BC017275, showed a positive correlation with SCNA (p < 0.001). This may suggest that most of the ncRNAs selected to build the classifier are potential cancer drivers.
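The integration of copy number and expression can be illustrated with a per-lncRNA correlation. The text does not specify the statistic used, so the Pearson correlation and the threshold of 0.3 below are assumptions made for illustration, and the copy number and expression values are hypothetical. The final line simply re-computes the reported proportion 89/229.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def positively_correlated(copy_number, expression, r_min=0.3):
    """List lncRNAs whose expression rises with copy number (assumed threshold)."""
    return [
        gene for gene in sorted(copy_number)
        if pearson(copy_number[gene], expression[gene]) > r_min
    ]


# Hypothetical per-sample copy number states and log2 expression values.
copy_number = {"lncA": [-1, 0, 0, 1, 1], "lncB": [0, 1, -1, 0, 1]}
expression = {
    "lncA": [4.0, 5.0, 5.2, 6.1, 6.3],  # follows copy number
    "lncB": [5.1, 4.9, 5.0, 5.2, 4.8],  # does not follow copy number
}
hits = positively_correlated(copy_number, expression)
share = round(100 * 89 / 229, 1)  # proportion reported in the text
print(hits, share)
```

Only the toy lncA is flagged as positively correlated, and the recomputed proportion matches the 38.9% given for the signature ncRNAs.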

4.

Discussion

CRC of varying histological subtypes and grades of malignancy are traditionally diagnosed based on their histopathological features. In the past few decades, molecular expression profiles using microarray analyses have provided additional information to help distinguish the CRC subtypes (Budinska et al., 2013; De Sousa et al., 2013; Marisa et al.,

Please cite this article in press as: Chen, H., et al., Long noncoding RNA profiles identify five distinct molecular subtypes of colorectal cancer with clinical relevance, Molecular Oncology (2014), http://dx.doi.org/10.1016/j.molonc.2014.05.010

66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130

MOLONC520_proof ■ 6 June 2014 ■ 9/11 M O L E C U L A R O N C O L O G Y X X X ( 2 0 1 4 ) 1 e1 1

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65

2013; Sadanandam et al., 2013; Wang et al., 2013). Aberrantly expressed molecular markers have also been used to elucidate the mechanisms of CRC progression and malignant transformation. lncRNAs, as a novel class of functional molecules involved in cancer biology, may similarly yield valuable information (Ge et al., 2013; Kogo et al., 2011). In this study, to investigate the molecular features that best define subsets of CRC, we applied unsupervised consensus clustering to the lncRNA expression data of CRC tumors, without applying biologic or clinical information. The clinical heterogeneity of CRC, coupled with previous gene expression studies (Budinska et al., 2013; De Sousa et al., 2013; Marisa et al., 2013; Sadanandam et al., 2013; Wang et al., 2013), suggests that at least three molecular subtypes of CRC exist. Here we demonstrated that there are likely five primary subtypes of CRC by profiling LncRNA expression, although further subclassifications within these subtypes may be identified in much larger datasets. The subtypes derived by lncRNAs profiling were associated with a significant difference in survival outcome, with S2 patients having a markedly worse prognosis. While the TNM stage proved to be an independent predictor of survival, the LnRNAs based molecular classification had a similar effect size and was statistically significant at the P ¼ 0.01 level in the multivariable analysis. Future studies on larger numbers of patients are needed to validate the results of the preliminary multivariate analysis reported herein. The functions of lncRNAs were more likely to closely correlate with their transcript abundance as they do not encode proteins (Du et al., 2013). Thus lncRNAs may have higher specificity than protein-coding mRNAs in Hessels et al. (2003); Prensner et al. (2011), and be more suitable to serve as prognostic and/or predictive markers for CRC. 
In this study, using a set of 229 lncRNAs, the CRC samples could be successfully classified into five distinct molecular subtypes with distinct biological pathways and clinical outcomes. SCNAs are an important form of somatic genetic alteration in cancer, in which a genomic region is either amplified or deleted. Some of the genes within amplified (or deleted) regions exhibit increased (or decreased) expression levels, resulting in altered activity in cancer cells. Most of the signature lncRNAs showed a positive correlation between their SCNAs and expression levels, which we reasoned could lead to altered lncRNA activity in CRC.

Several limitations of this study should be acknowledged. First, we lack information on the mechanisms behind the signature lncRNAs, and experimental studies on these lncRNAs might provide important information to further our understanding of their functional roles. Second, Affymetrix HG-U133 Plus 2.0 arrays represent some, but not all, of the possible lncRNAs. The lncRNA candidates identified here may therefore not represent the complete lncRNA populations underlying CRC biological behavior. To accurately and comprehensively elucidate the role of lncRNAs in the classification of CRC, more comprehensive profiling studies and further laboratory and clinical research are needed. Last but not least, all the lncRNAs in this study were derived through the re-annotation algorithm and should be validated in further studies.

In summary, we identified five molecular subtypes of CRC based on lncRNA profiles. Although the possible functional pathways of many of the identified lncRNA genes are still poorly understood, our classification based on lncRNA profiles may provide an efficient tool for clinical prognosis evaluation and for the selection of gene therapy targets in CRC.
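As a rough illustration of the SCNA-expression relationship discussed above, the sketch below computes a per-lncRNA Pearson correlation between copy number and expression across matched samples. It is an assumption-laden example rather than the analysis performed in this study: the function name `per_gene_correlation`, the choice of Pearson correlation, and the toy matrices (rows are lncRNAs, columns are samples) are all hypothetical.

```python
import numpy as np


def per_gene_correlation(copy_number, expression):
    """Pearson r for each row (one lncRNA) across matched samples (columns).

    Rows with zero variance in either matrix yield NaN.
    """
    cn = np.asarray(copy_number, dtype=float)
    ex = np.asarray(expression, dtype=float)
    # Center each row so the dot product below is a covariance term.
    cn = cn - cn.mean(axis=1, keepdims=True)
    ex = ex - ex.mean(axis=1, keepdims=True)
    denom = np.sqrt((cn ** 2).sum(axis=1) * (ex ** 2).sum(axis=1))
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(denom > 0, (cn * ex).sum(axis=1) / denom, np.nan)


if __name__ == "__main__":
    # Toy values only, not data from the study.
    copy_number = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 1, 1, 1]]
    expression = [[2, 4, 6, 8], [4, 3, 2, 1], [1, 2, 3, 4]]
    r = per_gene_correlation(copy_number, expression)
    # Share of lncRNAs with a positive copy-number/expression correlation.
    print(np.nanmean(r > 0))
```

A rank-based measure such as Spearman correlation would be a reasonable alternative when copy-number values are discretized or expression is heavily skewed.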

Acknowledgments

We thank all the subjects of this study for their participation. This study was supported by the Shanghai Natural Science Foundation (Grant No. 13ZR14244000) and the National Natural Science Foundation of China (Grant No. 31371273) to HYC, and by grants from the National Natural Science Foundation of Key Program (No. 81320108024), the Ministry of Public Health, China (No. 200802094), and the Ministry of Education (No. 20120073110078) to JYF.

Appendix A. Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.molonc.2014.05.010.

R E F E R E N C E S

Akavia, U.D., Litvin, O., Kim, J., Sanchez-Garcia, F., Kotliar, D., Causton, H.C., Pochanard, P., Mozes, E., Garraway, L.A., Pe’er, D., 2010. An integrated approach to uncover drivers of cancer. Cell 143, 1005–1017.
Beroukhim, R., Mermel, C.H., Porter, D., Wei, G., Raychaudhuri, S., Donovan, J., Barretina, J., Boehm, J.S., Dobson, J., Urashima, M., Mc Henry, K.T., Pinchback, R.M., Ligon, A.H., Cho, Y.J., Haery, L., Greulich, H., Reich, M., Winckler, W., Lawrence, M.S., Weir, B.A., Tanaka, K.E., Chiang, D.Y., Bass, A.J., Loo, A., Hoffman, C., Prensner, J., Liefeld, T., Gao, Q., Yecies, D., Signoretti, S., Maher, E., Kaye, F.J., Sasaki, H., Tepper, J.E., Fletcher, J.A., Tabernero, J., Baselga, J., Tsao, M.S., Demichelis, F., Rubin, M.A., Janne, P.A., Daly, M.J., Nucera, C., Levine, R.L., Ebert, B.L., Gabriel, S., Rustgi, A.K., Antonescu, C.R., Ladanyi, M., Letai, A., Garraway, L.A., Loda, M., Beer, D.G., True, L.D., Okamoto, A., Pomeroy, S.L., Singer, S., Golub, T.R., Lander, E.S., Getz, G., Sellers, W.R., Meyerson, M., 2010. The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905.
Brunet, J.P., Tamayo, P., Golub, T.R., Mesirov, J.P., 2004. Metagenes and molecular pattern discovery using matrix factorization. Proc. Natl. Acad. Sci. U.S.A. 101, 4164–4169.
Budinska, E., Popovici, V., Tejpar, S., D’Ario, G., Lapique, N., Sikora, K.O., Di Narzo, A.F., Yan, P., Hodgson, J.G., Weinrich, S., Bosman, F., Roth, A., Delorenzi, M., 2013. Gene expression patterns unveil a new level of molecular heterogeneity in colorectal cancer. J. Pathology 231, 63–76.
De Sousa, E.M.F., Wang, X., Jansen, M., Fessler, E., Trinh, A., de Rooij, L.P., de Jong, J.H., de Boer, O.J., van Leersum, R., Bijlsma, M.F., Rodermond, H., van der Heijden, M., van Noesel, C.J., Tuynman, J.B., Dekker, E., Markowetz, F., Medema, J.P., Vermeulen, L., 2013. Poor-prognosis colon cancer is defined by a molecularly distinct subtype and develops from serrated precursor lesions. Nat. Medicine 19, 614–618.

Dotan, E., Cohen, S.J., 2011. Challenges in the management of stage II colon cancer. Semin. Oncology 38, 511–520.
Du, Z., Fei, T., Verhaak, R.G., Su, Z., Zhang, Y., Brown, M., Chen, Y., Liu, X.S., 2013. Integrative genomic analyses reveal clinically relevant long noncoding RNAs in human cancer. Nat. Struct. Molecular Biol. 20, 908–913.
Farina-Sarasqueta, A., van Lijnschoten, G., Moerland, E., Creemers, G.J., Lemmens, V.E., Rutten, H.J., van den Brule, A.J., 2010. The BRAF V600E mutation is an independent prognostic factor for survival in stage II and stage III colon cancer patients. Ann. Oncol.: Off. J. Eur. Soc. Med. Oncol./ESMO 21, 2396–2402.
Garraway, L.A., Widlund, H.R., Rubin, M.A., Getz, G., Berger, A.J., Ramaswamy, S., Beroukhim, R., Milner, D.A., Granter, S.R., Du, J., Lee, C., Wagner, S.N., Li, C., Golub, T.R., Rimm, D.L., Meyerson, M.L., Fisher, D.E., Sellers, W.R., 2005. Integrative genomic analyses identify MITF as a lineage survival oncogene amplified in malignant melanoma. Nature 436, 117–122.
Ge, X., Chen, Y., Liao, X., Liu, D., Li, F., Ruan, H., Jia, W., 2013. Overexpression of long noncoding RNA PCAT-1 is a novel biomarker of poor prognosis in patients with colorectal cancer. Med. Oncol. 30, 588.
Gibb, E.A., Brown, C.J., Lam, W.L., 2011. The functional role of long non-coding RNA in human carcinomas. Mol. Cancer 10, 38.
Gill, S., Loprinzi, C.L., Sargent, D.J., Thome, S.D., Alberts, S.R., Haller, D.G., Benedetti, J., Francini, G., Shepherd, L.E., Francois Seitz, J., Labianca, R., Chen, W., Cha, S.S., Heldebrant, M.P., Goldberg, R.M., 2004. Pooled analysis of fluorouracil-based adjuvant therapy for stage II and III colon cancer: who benefits and by how much? J. Clin. Oncol.: Off. J. Am. Soc. Clin. Oncol. 22, 1797–1806.
Graudens, E., Boulanger, V., Mollard, C., Mariage-Samson, R., Barlet, X., Gremy, G., Couillault, C., Lajemi, M., Piatier-Tonneau, D., Zaborski, P., Eveno, E., Auffray, C., Imbeaud, S., 2006. Deciphering cellular states of innate tumor drug responses. Genome Biol. 7, R19.
Hessels, D., Klein Gunnewiek, J.M., van Oort, I., Karthaus, H.F., van Leenders, G.J., van Balken, B., Kiemeney, L.A., Witjes, J.A., Schalken, J.A., 2003. DD3(PCA3)-based molecular urine analysis for the diagnosis of prostate cancer. Eur. Urol. 44, 8–15. Discussion 15–16.
Hoshida, Y., 2010. Nearest template prediction: a single-sample-based flexible class prediction with confidence assessment. PloS One 5, e15543.
Hu, Y., Chen, H.Y., Yu, C.Y., Xu, J., Wang, J.L., Qian, J., Zhang, X., Fang, J.Y., 2014. A long non-coding RNA signature to improve prognosis prediction of colorectal cancer. Oncotarget 5, 2230–2242.
Johnson, W.E., Li, C., Rabinovic, A., 2007. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127.
Kogo, R., Shimamura, T., Mimori, K., Kawahara, K., Imoto, S., Sudo, T., Tanaka, F., Shibata, K., Suzuki, A., Komune, S., Miyano, S., Mori, M., 2011. Long noncoding RNA HOTAIR regulates polycomb-dependent chromatin modification and is associated with poor prognosis in colorectal cancers. Cancer Res. 71, 6320–6326.
Kosinski, C., Li, V.S., Chan, A.S., Zhang, J., Ho, C., Tsui, W.Y., Chan, T.L., Mifflin, R.C., Powell, D.W., Yuen, S.T., Leung, S.Y., Chen, X., 2007. Gene expression patterns of human colon tops and basal crypts and BMP antagonists as intestinal stem cell niche factors. Proc. Natl. Acad. Sci. U.S.A. 104, 15418–15423.
Laiho, P., Kokko, A., Vanharanta, S., Salovaara, R., Sammalkorpi, H., Jarvinen, H., Mecklin, J.P., Karttunen, T.J., Tuppurainen, K., Davalos, V., Schwartz Jr., S., Arango, D., Makinen, M.J., Aaltonen, L.A., 2007. Serrated carcinomas form a subclass of colorectal cancer with distinct molecular basis. Oncogene 26, 312–320.
Li, L., Sun, R., Liang, Y., Pan, X., Li, Z., Bai, P., Zeng, X., Zhang, D., Zhang, L., Gao, L., 2013. Association between polymorphisms in long non-coding RNA PRNCR1 in 8q24 and risk of colorectal cancer. J. Exp. Clin. Cancer Res.: CR 32, 104.
Li, R., Qian, J., Wang, Y.Y., Zhang, J.X., You, Y.P., 2014. Long noncoding RNA profiles reveal three molecular subtypes in Glioma. CNS Neurosci. Therap.
Lin, R., Maeda, S., Liu, C., Karin, M., Edgington, T.S., 2007. A large noncoding RNA is a marker for murine hepatocellular carcinomas and a spectrum of human carcinomas. Oncogene 26, 851–858.
Lipovich, L., Johnson, R., Lin, C.Y., 2010. MacroRNA underdogs in a microRNA world: evolutionary, regulatory, and biomedical significance of mammalian long non-protein-coding RNA. Biochim. Biophys. Acta 1799, 597–615.
Marisa, L., de Reynies, A., Duval, A., Selves, J., Gaub, M.P., Vescovo, L., Etienne-Grimaldi, M.C., Schiappa, R., Guenot, D., Ayadi, M., Kirzin, S., Chazal, M., Flejou, J.F., Benchimol, D., Berger, A., Lagarde, A., Pencreach, E., Piard, F., Elias, D., Parc, Y., Olschwang, S., Milano, G., Laurent-Puig, P., Boige, V., 2013. Gene expression classification of colon cancer into molecular subtypes: characterization, validation, and prognostic value. PLoS Med. 10, e1001453.
McMillan, D.C., McArdle, C.S., Morrison, D.S., 2010. A clinical risk score to predict 3-, 5- and 10-year survival in patients undergoing surgery for Dukes B colorectal cancer. Br. J. Cancer 103, 970–974.
Mercer, T.R., Dinger, M.E., Mattick, J.S., 2009. Long non-coding RNAs: insights into functions. Nat. Rev. Genet. 10, 155–159.
Merico, D., Isserlin, R., Stueker, O., Emili, A., Bader, G.D., 2010. Enrichment map: a network-based method for gene-set enrichment visualization and interpretation. PloS One 5, e13984.
Merlos-Suarez, A., Barriga, F.M., Jung, P., Iglesias, M., Cespedes, M.V., Rossell, D., Sevillano, M., Hernando-Momblona, X., da Silva-Diz, V., Munoz, P., Clevers, H., Sancho, E., Mangues, R., Batlle, E., 2011. The intestinal stem cell signature identifies colorectal cancer stem cells and predicts disease relapse. Cell Stem Cell 8, 511–524.
Mermel, C.H., Schumacher, S.E., Hill, B., Meyerson, M.L., Beroukhim, R., Getz, G., 2011. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41.
Mitra, S.A., Mitra, A.P., Triche, T.J., 2012. A central role for long non-coding RNA in cancer. Front. Genet. 3, 17.
O’Connell, J.B., Maggard, M.A., Ko, C.Y., 2004. Colon cancer survival rates with the new American Joint Committee on Cancer sixth edition staging. J. Natl. Cancer Inst. 96, 1420–1425.
Pang, K.C., Stephen, S., Dinger, M.E., Engstrom, P.G., Lenhard, B., Mattick, J.S., 2007. RNAdb 2.0--an expanded database of mammalian non-coding RNAs. Nucl. Acids Res. 35, D178–D182.
Popat, S., Hubner, R., Houlston, R.S., 2005. Systematic review of microsatellite instability and colorectal cancer prognosis. J. Clin. Oncol.: Off. J. Am. Soc. Clin. Oncol. 23, 609–618.
Popovici, V., Budinska, E., Tejpar, S., Weinrich, S., Estrella, H., Hodgson, G., Van Cutsem, E., Xie, T., Bosman, F.T., Roth, A.D., Delorenzi, M., 2012. Identification of a poor-prognosis BRAF-mutant-like population of patients with colon cancer. J. Clin. Oncol.: Off. J. Am. Soc. Clin. Oncol. 30, 1288–1295.
Prat, A., Perou, C.M., 2011. Deconstructing the molecular portraits of breast cancer. Molecular Oncol. 5, 5–23.
Prensner, J.R., Chinnaiyan, A.M., 2011. The emergence of lncRNAs in cancer biology. Cancer Discovery 1, 391–407.
Prensner, J.R., Iyer, M.K., Balbin, O.A., Dhanasekaran, S.M., Cao, Q., Brenner, J.C., Laxman, B., Asangani, I.A., Grasso, C.S., Kominsky, H.D., Cao, X., Jing, X., Wang, X., Siddiqui, J., Wei, J.T., Robinson, D., Iyer, H.K., Palanisamy, N., Maher, C.A., Chinnaiyan, A.M., 2011. Transcriptome sequencing across a prostate cancer cohort identifies PCAT-1, an unannotated lincRNA implicated in disease progression. Nature Biotechnol. 29, 742–749.
Qi, P., Du, X., 2013. The long non-coding RNAs, a new cancer diagnostic and therapeutic gold mine. Modern Pathol.: Off. J. U.S. Canad. Acad. Pathol. Inc. 26, 155–165.
Reis-Filho, J.S., Pusztai, L., 2011. Gene expression profiling in breast cancer: classification, prognostication, and prediction. The Lancet 378, 1812–1823.
Risueno, A., Fontanillo, C., Dinger, M.E., De Las Rivas, J., 2010. GATExplorer: genomic and transcriptomic explorer; mapping expression probes to gene loci, transcripts, exons and ncRNAs. BMC Bioinform. 11, 221.
Roth, A.D., Tejpar, S., Delorenzi, M., Yan, P., Fiocca, R., Klingbiel, D., Dietrich, D., Biesmans, B., Bodoky, G., Barone, C., Aranda, E., Nordlinger, B., Cisar, L., Labianca, R., Cunningham, D., Van Cutsem, E., Bosman, F., 2010. Prognostic role of KRAS and BRAF in stage II and III resected colon cancer: results of the translational study on the PETACC-3, EORTC 40993, SAKK 60-00 trial. J. Clin. Oncol.: Off. J. Am. Soc. Clin. Oncol. 28, 466–474.
Rousseeuw, P.J., 1987. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Mathem. 20, 53–65.
Sadanandam, A., Lyssiotis, C.A., Homicsko, K., Collisson, E.A., Gibb, W.J., Wullschleger, S., Ostos, L.C., Lannon, W.A., Grotzinger, C., Del Rio, M., Lhermitte, B., Olshen, A.B., Wiedenmann, B., Cantley, L.C., Gray, J.W., Hanahan, D., 2013. A colorectal cancer classification system that associates cellular phenotype and responses to therapy. Nat. Med. 19, 619–625.
Sanz-Pamplona, R., Berenguer, A., Cordero, D., Riccadonna, S., Sole, X., Crous-Bou, M., Guino, E., Sanjuan, X., Biondo, S., Soriano, A., Jurman, G., Capella, G., Furlanello, C., Moreno, V., 2012. Clinical value of prognosis gene expression signatures in colorectal cancer: a systematic review. PloS One 7, e48877.
Siegel, R., Ma, J., Zou, Z., Jemal, A., 2014. Cancer statistics, 2014. CA: A Cancer J. Clinic. 64, 9–29.

Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., Mesirov, J.P., 2005. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl. Acad. Sci. U.S.A. 102, 15545–15550.
Tibshirani, R., Hastie, T., Narasimhan, B., Chu, G., 2002. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl. Acad. Sci. U.S.A. 99, 6567–6572.
Tinzl, M., Marberger, M., Horvath, S., Chypre, C., 2004. DD3PCA3 RNA analysis in urine--a new perspective for detecting prostate cancer. Eur. Urol. 46, 182–186. Discussion 187.
Tusher, V.G., Tibshirani, R., Chu, G., 2001. Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. U.S.A. 98, 5116–5121.
Vermeulen, L., De Sousa, E.M.F., van der Heijden, M., Cameron, K., de Jong, J.H., Borovski, T., Tuynman, J.B., Todaro, M., Merz, C., Rodermond, H., Sprick, M.R., Kemper, K., Richel, D.J., Stassi, G., Medema, J.P., 2010. Wnt activity defines colon cancer stem cells and is regulated by the microenvironment. Nature Cell Biol. 12, 468–476.
Villegas, V.E., Rahman, M.F., Fernandez-Barrena, M.G., Diao, Y., Liapi, E., Sonkoly, E., Stahle, M., Pivarcsi, A., Annaratone, L., Sapino, A., Ramirez Clavijo, S., Burglin, T.R., Shimokawa, T., Ramachandran, S., Kapranov, P., Fernandez-Zapico, M.E., Zaphiropoulos, P.G., 2014. Identification of novel non-coding RNA-based negative feedback regulating the expression of the oncogenic transcription factor GLI1. Molecular Oncol.
Walther, A., Houlston, R., Tomlinson, I., 2008. Association between chromosomal instability and prognosis in colorectal cancer: a meta-analysis. Gut 57, 941–950.
Wang, X., Markowetz, F., De Sousa, E.M.F., Medema, J.P., Vermeulen, L., 2013. Dissecting cancer heterogeneity--an unsupervised classification approach. Int. J. Biochem. Cell Biol. 45, 2574–2579.
Weigelt, B., Baehner, F.L., Reis-Filho, J.S., 2010. The contribution of gene expression profiling to breast cancer classification, prognostication and prediction: a retrospective of the last decade. J. Pathol. 220, 263–280.
