Accepted Manuscript

Genetic Alterations as Esophageal Tissues From Squamous Dysplasia to Carcinoma

Xi Liu, Min Zhang, Songmin Ying, Chong Zhang, Runhua Lin, Jiaxuan Zheng, Guohong Zhang, Dongping Tian, Yi Guo, Caiwen Du, Yuping Chen, Shaobin Chen, Xue Su, Juan Ji, Wanting Deng, Xiang Li, Shiyue Qiu, Ruijing Yan, Zexin Xu, Yuan Wang, Yuanning Guo, Jiancheng Cui, Shanshan Zhuang, Huan Yu, Qi Zheng, Moshe Marom, Sitong Sheng, Guoqiang Zhang, Songnian Hu, Ruiqiang Li, Min Su

PII: S0016-5085(17)30342-6
DOI: 10.1053/j.gastro.2017.03.033
Reference: YGAST 61065

To appear in: Gastroenterology
Accepted Date: 23 March 2017

Please cite this article as: Liu X, Zhang M, Ying S, Zhang C, Lin R, Zheng J, Zhang G, Tian D, Guo Y, Du C, Chen Y, Chen S, Su X, Ji J, Deng W, Li X, Qiu S, Yan R, Xu Z, Wang Y, Guo Y, Cui J, Zhuang S, Yu H, Zheng Q, Marom M, Sheng S, Zhang G, Hu S, Li R, Su M, Genetic Alterations as Esophageal Tissues From Squamous Dysplasia to Carcinoma, Gastroenterology (2017), doi: 10.1053/j.gastro.2017.03.033.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Manuscript Number: GASTRO 16-01758

Title: Genetic Alterations as Esophageal Tissues From Squamous Dysplasia to Carcinoma

Short Title: Genetic Evolution of Esophageal Carcinoma
Authors: Xi Liu1,#, Min Zhang2,#, Songmin Ying1,3,#, Chong Zhang1, Runhua Lin1, Jiaxuan Zheng1, Guohong Zhang4, Dongping Tian1, Yi Guo4, Caiwen Du4, Yuping Chen4, Shaobin Chen4, Xue Su1, Juan Ji1, Wanting Deng1, Xiang Li1, Shiyue Qiu1, Ruijing Yan1, Zexin Xu1, Yuan Wang1, Yuanning Guo1, Jiancheng Cui2, Shanshan Zhuang4, Huan Yu2, Qi Zheng2, Moshe Marom5, Sitong Sheng6, Guoqiang Zhang7, Songnian Hu7, Ruiqiang Li2, Min Su1,*

1 Institute of Clinical Pathology & Department of Pathology, Shantou University Medical College, Shantou 515041, Guangdong, China.
2 Novogene Co., LTD, Beijing 100083, China.
3 Department of Pharmacology, Zhejiang University School of Medicine, Hangzhou 310058, Zhejiang, China.
4 Cancer Hospital of Shantou University Medical College, Shantou 515041, Guangdong, China.
5 Guangdong Technion-Israel Institute of Technology, Shantou 515063, Guangdong, China.
6 HYK High-throughput Biotechnology Institute, 4/F, Building #11, Software Park, 2nd Central Keji Rd, Hi-Tech Industrial Park, Shenzhen 518060, China.
7 Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.

# Co-first author

Grant support: National Natural Science Foundation of China (NSFC)-Guangdong joint fund key project (U1132004), NSFC (31171226, 81572684), Guangdong International Cooperative Technical Innovation Platform (Certificates gihz1106), Guangdong Government under the Top-tier University Development Scheme for Research and Control of Infectious Diseases, Molecular Diagnosis and Personalized Medicine Center of Shantou University.

Abbreviations: BE (Barrett's esophagus), EAC (esophageal adenocarcinoma), ESCC (esophageal squamous cell carcinoma), ESSH (esophageal squamous simple hyperplasia), hTNF-α (human tumor necrosis factor-alpha), IEN (intraepithelial neoplasia), NF-κB (nuclear factor-kappa B), TRS (targeted sequencing), WES (whole-exome sequencing), WGS (whole-genome sequencing), γH2AX (phosphorylated H2AX)

*Correspondence: Min Su, Institute of Clinical Pathology & Department of Pathology, Shantou University Medical College, No.22 Xinling Road, Shantou, Guangdong, China. Phone: 86-0754-88900429; Fax: 86-0754-88900429; E-mail: [email protected]

Conflict of Interest: All authors declare no personal, professional, or financial conflicts of interest.
Accession codes: Binary sequence alignment/map (BAM) files are being uploaded to BioProject under accession number PRJNA317404.

Writing Assistance: Bruce AJ Ponder, Dante Neculai, and Zeev Ronai provided writing assistance free of charge.

Author Contributions:
Conception and design - Min Su, Xi Liu
Financial support - Min Su
Material support - Min Su, Dongping Tian, Yi Guo, Caiwen Du, Yuping Chen, Shaobin Chen, Shanshan Zhuang
Collection and assembly of data - Min Su, Dongping Tian, Xi Liu, Xue Su, Juan Ji, Wanting Deng, Xiang Li, Shiyue Qiu, Ruijing Yan, Zexin Xu, Yuan Wang, Chong Zhang, Jiaxuan Zheng, Guohong Zhang, Yi Guo, Caiwen Du, Yuping Chen, Shaobin Chen and Shanshan Zhuang
Analysis and interpretation of data - Ruiqiang Li, Min Zhang, Jiancheng Cui, Huan Yu, Qi Zheng, Xi Liu, Yuanning Guo, Runhua Lin, Songmin Ying, Sitong Sheng, Guoqiang Zhang, Songnian Hu and Moshe Marom
Drafting of the manuscript - All authors
Final approval of manuscript - All authors
Abstract:

BACKGROUND & AIMS: Esophageal squamous cell carcinoma (ESCC) is the most common subtype of esophageal cancer. Little is known about the genetic changes that occur in esophageal cells during development of ESCC. We performed next-generation sequence analyses of esophageal non-tumor, intraepithelial neoplasia (IEN), and ESCC tissues from the same patients to track genetic changes during tumor development.
METHODS: We performed whole-genome, whole-exome, or targeted sequence analyses of 227 esophageal tissue samples from 70 patients with ESCC undergoing resection at Shantou University Medical College in China from 2012 through 2015 (no patients had received chemotherapy or radiation therapy); we analyzed normal tissue, tissue with simple hyperplasia, dysplastic tissue (IEN), and ESCC tissue collected from different regions of the esophagus at the same time. We also obtained 1191 non-tumor esophageal biopsies from the Chaoshan region of China (a high-risk region for ESCC) and performed immunohistochemical and histologic analyses to detect inflammation.
RESULTS: IEN and ESCC tissues had similar mutations and copy number alterations, at similar frequencies; these differed from mutations detected in tissues with simple hyperplasia. IEN tissues had mutations associated with apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like (APOBEC)-mediated mutagenesis (a DNA damage mutational signature). Genetic analyses indicated that most ESCCs formed from early-stage IEN clones. Trunk mutations (mutations shared by more than 10% of paired IEN and ESCC tissues) were in genes that regulate DNA repair and cell apoptosis, proliferation, and adhesion. Mutations in TP53 and CDKN2A and copy number alterations in 11q (contains CCND1), 3q (contains SOX2), 2q (contains NFE2L2), and 9p (contains CDKN2A) were considered to be trunk variants; these were dominant mutations detected at high frequencies in clones of paired IEN and ESCC samples. In the esophageal biopsy samples from high-risk individuals (residing in the Chaoshan region), 68.9% had evidence of chronic inflammation; the level of inflammation correlated with atypical cell structures and markers of DNA damage.
CONCLUSION: We analyzed mutations and gene copy number changes in non-tumor, IEN, and ESCC samples collected from 70 patients. IEN and ESCC samples had similar mutations and markers of genomic instability, including an APOBEC-associated mutational signature. Genomic changes observed in precancerous lesions might be used to identify patients at risk for ESCC.

KEY WORDS: esophagus, sequencing, carcinogenesis, driver mutation
Introduction

Esophageal cancer is one of the most common cancers worldwide and can be divided into two histologic subtypes: esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EAC).1 The overall 5-year survival for esophageal cancer ranges from 15% to 25%, and disease diagnosed at earlier stages has better outcomes. Dysplasia (intraepithelial neoplasia, IEN) has been considered a precancerous lesion of ESCC,2,3 with a significantly increased risk of progressing to ESCC.4,5 Therefore, identifying mutations that occur during ESCC development could have implications for early diagnosis and potential therapeutic strategies.
The genomic landscape of ESCC, which frequently exhibits mutations in TP53, CDKN2A, and PIK3CA, has been well characterized by whole-genome sequencing (WGS) and whole-exome sequencing (WES).6-9 However, most studies of precancerous lesions of ESCC were limited to hotspot genes or allelic loss of tumor suppressor genes.10-12 The comprehensive genetic architecture of the carcinogenic process remains unknown.
Recent studies have provided evidence that chronic inflammation in the microenvironment is a strong risk factor for the development of digestive tract tumors.13,14 We have also reported a close association between chronic inflammation and esophageal precancerous lesions in patients with ESCC.15
Here, to explore the potential role of inflammation in the development of ESCC, we evaluated the prevalence of chronic inflammation in 1191 esophageal endoscopic non-tumor biopsies from the Chaoshan area in China (a high-risk region for ESCC). To identify genomic changes underlying the transition from non-dysplastic epithelium (simple hyperplasia, ESSH) and IEN to ESCC, we performed WGS, WES, and targeted sequencing (TRS) of matched samples (ESSH, IEN, and ESCC) obtained by microdissection from the same individuals. We aimed to determine the genomic alteration events that contribute to the carcinogenesis of ESCC.

Materials and Methods

Sample collection
All samples were obtained with the approval of the ethics committee of Shantou University Medical College. A total of 227 distinct samples from 70 individuals with ESCC were manually microdissected and then sequenced. Sequencing samples were collected from patients undergoing resection at the Cancer Hospital of Shantou University Medical College from 2012 to 2015; no patients had received chemotherapy or radiation (Supplementary Table 1 and Supplementary Fig. 1). A total of 1191 esophageal endoscopic non-tumor biopsies (including 170 biopsies from mass screening) were collected from individuals in the Chaoshan district of China (Supplementary Table 2).

Whole-exome and whole-genome sequencing
For WES, capture libraries were prepared from 2 µg genomic DNA (gDNA) using the Agilent SureSelect Human All Exon V5 kit (Agilent Technologies, CA, USA) following the manufacturer's recommendations, and the libraries were sequenced on an Illumina HiSeq 2500. For WGS, a paired-end DNA library was prepared from 1.5 µg gDNA according to the manufacturer's instructions (Illumina TruSeq Library Construction) and sequenced on an Illumina HiSeq X Ten.

Targeted Sequencing
A total of 126 genes were selected (Supplementary Materials and Methods). For each sample, 2 µg gDNA was used as input. Sequencing libraries were generated using the Agilent SureSelectXT Custom kit (Agilent Technologies, CA, USA) following the manufacturer's recommendations.

Data Analysis
Burrows-Wheeler Aligner (BWA) software was used to map the paired-end clean reads to the reference genome (UCSC hg19). GATK HaplotypeCaller (https://www.broadinstitute.org/gatk/) and VariantFiltration were used to call variants and identify single nucleotide polymorphisms (SNPs) and indels. The MuTect algorithm was used to identify somatic mutations in WES and WGS data.16 The MuSiC algorithm was used to identify significantly mutated genes (SMGs) separately in the ESCC and IEN cohorts.17 Control-FREEC was applied to analyze somatic copy number alterations (CNAs) and loss of heterozygosity (LOH).18 Allelic heterogeneity, clonal frequency, and clonal clusters were evaluated using PyClone version 0.12.7.19 Tumor content and ploidy were estimated using ABSOLUTE.20 Phylogenetic trees were constructed manually according to the numbers of somatic nonsynonymous mutations and were rooted at the germline state. Additional details can be found in Supplementary Materials and Methods.

Results

Chronic inflammation and DNA damage are positively correlated with cellular atypia in non-tumor samples
We obtained 1191 endoscopic esophageal biopsies from individuals without esophageal tumors (including 170 biopsies from mass screening) in the Chaoshan district of China, a region with a relatively high incidence of esophageal cancer. The prevalence of chronic inflammation was 68.85% (820/1191) (Supplementary Table 2). The level of inflammation was positively correlated with cellular atypia (rs = .464, P < .001) (Fig. 1A). We examined DNA damage status in 84 mass-screened non-tumor biopsies; in these samples, the degree of inflammation was positively correlated with histologic subtype (rs = .488, P < .001). The γH2AX (phosphorylated H2AX, a rapid and sensitive marker of DNA double-strand breaks) expression rate was significantly higher in IEN than in normal and ESSH samples (P < .001) (Fig. 1B). As expected, the γH2AX-positive rate also differed significantly between non-inflamed and inflamed tissues (Fig. 1C and D).
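The correlations in this section are reported as Spearman coefficients (rs) between ordinal grades, such as inflammation level and degree of cellular atypia. The source does not name the statistical software used, so the following is only a minimal standard-library sketch of Spearman's rank correlation, assuming that tied grades receive average ranks (ordinal grades contain many ties). It computes rs but no P value; the function names are hypothetical, and in practice `scipy.stats.spearmanr` performs the same calculation and also returns a P value.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Advance `end` over the run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rs(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sequence has no rank correlation")
    return cov / (var_x * var_y) ** 0.5
```

For hypothetical per-biopsy grades, `spearman_rs(inflammation_grades, atypia_grades)` ranks both vectors (averaging ties) before correlating them; perfectly concordant grades give 1.0 and fully reversed grades give -1.0.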
We next exposed an immortalized esophageal epithelial cell line (NE2) to human tumor necrosis factor-alpha (hTNF-α) and hydrogen peroxide (H2O2), which together represent a chronic inflammatory microenvironment, to test their effects on esophageal epithelial cells. Immunofluorescence staining showed that nuclear factor-kappa B (NF-κB) translocated to the nucleus and that nuclear γH2AX expression was greater with hTNF-α and H2O2 treatment than with control treatment (Supplementary Fig. 2). These data are consistent with our hypothesis that chronic inflammation promotes DNA damage.15

Genetic changes are frequent in IEN and ESCC
To investigate the genetic changes contributing to the carcinogenesis of ESCC, we sampled and extracted matched ESCC, IEN, and ESSH tissues from 70 patients (Supplementary Table 1). To avoid confounding by secondary inflammation induced by the tumor, we evaluated the degree of chronic inflammation in non-tumor tissue samples. The level of inflammation in the sequenced cohort was also positively correlated with cellular atypia (rs = .872, P < .001; Supplementary Fig. 3A). Matched tissues (ESSH, IEN, and ESCC) from 25 patients, with normal tissue or blood DNA as controls, were then subjected to WES or WGS, and we evaluated γH2AX expression in these non-tumor samples (Supplementary Table 3). γH2AX expression increased with the degree of inflammation and cellular atypia (Supplementary Fig. 3B and C).
Next, we compared somatic nucleotide variants and the CNA spectrum at different stages of ESCC development. The mean mutation rate in exonic regions was 3.55, 4.56, and 0.81 mutations/Mb in ESCC, IEN, and ESSH samples, respectively (Fig. 2A and Supplementary Table 4). Furthermore, the number of mutations in WES and WGS samples was positively associated with γH2AX expression (rs = .406, P = .008). The number of base pairs affected by CNAs was significantly higher in ESCC and IEN than in ESSH samples (758.10 and 420.10 vs 29.09 Mb, respectively) (Fig. 2B and Supplementary Table 5). Likewise, for the 7 matched WGS cases, the median number of structural variations (SVs) was lower in ESSH than in IEN and ESCC samples (104 compared with 449.43 and 329.71, respectively) (Supplementary Table 6).
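The per-stage mutation rates above are mutation counts normalized by the size of the sequenced target. As a minimal sketch, assuming the denominator is the number of callable exonic bases in each sample (the source does not state which denominator was used), the per-sample burden and its mean per histologic stage could be computed as follows; the function names and the tuple layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def mutations_per_mb(n_mutations: int, callable_bases: int) -> float:
    """Somatic mutation burden: mutation count divided by callable megabases."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_mutations / (callable_bases / 1_000_000)


def mean_burden_by_stage(samples: Iterable[Tuple[str, int, int]]) -> Dict[str, float]:
    """Average per-sample burden for each histologic stage.

    Each sample is (stage, n_somatic_mutations, callable_exonic_bases),
    e.g. ("IEN", 180, 40_000_000); the values here are illustrative only.
    """
    per_stage = defaultdict(list)
    for stage, n_mut, bases in samples:
        per_stage[stage].append(mutations_per_mb(n_mut, bases))
    return {stage: mean(rates) for stage, rates in per_stage.items()}
```

Averaging per-sample rates (rather than pooling all mutations and all bases) gives each sample equal weight, which matches reporting "a mean of ... mutations/Mb per sample"; pooling would instead weight samples by coverage.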
In addition to characterizing the common variation spectrum in IEN and ESCC, we performed genomic ploidy analysis using ABSOLUTE (Supplementary Fig. 4A).20 Polyploidy was evident in 68.0% (17/25) of ESCC and 55.6% (15/27) of IEN samples, whereas no ESSH sample was polyploid. In matched samples, ploidy events occurred early, at the IEN stage, in 52.2% of samples, whereas only 17.4% of samples acquired polyploidy during malignant progression from IEN to ESCC, indicating that ploidy events are early genetic events (Supplementary Fig. 4B). Both IEN and ESCC samples also showed greater genetic diversity than ESSH samples, as judged by the higher incidence of monoclonal clusters in ESSH samples (Supplementary Fig. 4C). Collectively, these observations suggest that increased genomic diversity begins at the IEN formation phase.

DNA damage mutation signature in IEN and ESCC
Analysis of the sequence context of mutations in ESSH, IEN, and ESCC samples identified C:G>T:A transitions as the predominant nucleotide substitutions at all stages (Fig. 2C). We also observed enrichment of C>T and C>G substitutions at TpCpX motifs in IEN and ESCC (Fig. 2D), which suggests an APOBEC (apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like)-mediated signature.21 To confirm this finding, we extracted three and four mutational signatures from the IEN and ESCC samples, respectively (Supplementary Fig. 5A and 5B). Notably, IEN and ESCC shared similar signatures A and B. On comparison with the COSMIC database (Supplementary Fig. 5C), signature A showed a high rate of C>T transitions at XpCpG, which may be initiated by spontaneous deamination of 5-methylcytosine; this signature has previously been found ubiquitously across tumor types and is highly correlated with age.22 In signature B, C>G transversions and C>T transitions at TpCpX were dominant. This signature is associated with the APOBEC family of cytidine deaminases, is frequently observed in several tumor types, and represents a DNA damage pattern.23-25 Signatures A and B in IEN and ESCC samples agreed well with the signatures reported in a previous ESCC sequencing study.26 However, signature C in IEN exhibited a high rate of T>A mutations, which differed from signature C in ESCC. On comparison with the COSMIC database, signature C in ESCC consisted mainly of C>T substitutions at XpCpG, which may be associated with defective DNA mismatch repair. Signature D in ESCC showed transcriptional strand bias for T>C and C>T substitutions. The etiology of signature C in IEN and of signatures C and D in ESCC remains unknown.
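The TpCpX enrichment used above to flag APOBEC-mediated mutagenesis can be illustrated by classifying each SNV by its trinucleotide context. The sketch below assumes the common convention of reporting substitutions on the pyrimidine (C or T) strand; it is not the authors' pipeline, it only counts motif matches, and it performs neither the signature extraction nor the COSMIC comparison described above. The function names and the (context, alt) input format are hypothetical.

```python
from typing import Iterable, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_pyrimidine_strand(context: str, alt: str) -> Tuple[str, str]:
    """Rewrite an SNV so that the mutated reference base is C or T.

    context: reference trinucleotide centred on the mutated base, e.g. "TCA".
    alt: alternate base, on the same strand as context.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or set(context + alt) - set("ACGT") or context[1] == alt:
        raise ValueError(f"invalid SNV: {context} > {alt}")
    if context[1] in "AG":  # purine reference: report the opposite strand instead
        return reverse_complement(context), alt.translate(_COMPLEMENT)
    return context, alt


def is_apobec_tcn(context: str, alt: str) -> bool:
    """True for C>T or C>G at a TpCpN site (G>A or G>C at NpGpA on the other strand)."""
    ctx, a = to_pyrimidine_strand(context, alt)
    return ctx[0] == "T" and ctx[1] == "C" and a in ("T", "G")


def apobec_fraction(snvs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of SNVs, given as (context, alt) pairs, that match the TpCpN motif."""
    snvs = list(snvs)
    if not snvs:
        raise ValueError("no SNVs supplied")
    return sum(is_apobec_tcn(c, a) for c, a in snvs) / len(snvs)
```

Collapsing to the pyrimidine strand means that a G>A change in the context TGA is counted as C>T at TCA, so motif counts do not depend on which DNA strand the variant caller reported.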
To evaluate and compare the composition of signatures in paired cases, we analyzed the IEN and ESCC samples together (Fig. 2E) and identified four combined signatures, which were similar to the signatures derived from the independent ESCC and IEN cohorts (Supplementary Fig. 5C). The contributions of IEN and ESCC to the four signatures did not differ significantly (signature A, P = .8169416; signature B, P = .3051342; signature C, P = .4628854; signature D, P = .2297273; Fig. 2F). There were nonetheless some exceptional cases. In EC1110, for instance, all mutations in the tumor sample (both ESCC-specific and shared mutations) were dominated by signature B, so we inferred that shared mutations contributed to signature B and IEN-specific mutations contributed to signatures A, C, and D. Hence, during malignant transformation in EC1110, signatures A, C, and D acted at the early stage, whereas the trunk and later mutation events arose from signature B, which is associated with DNA damage mutagenesis. To determine which mutations contributed to each signature, we analyzed the percentages of shared and unique mutations and their contributions to the signatures in paired IEN and ESCC samples (Fig. 2G).

Significantly mutated genes are shared between IEN and ESCC stages
We next identified significantly mutated genes in the independent ESCC and IEN cohorts (Fig. 3 and Supplementary Table 7); these were largely similar between ESCC and IEN samples. GISTIC analysis showed that large-scale chromosomal deletions at 9p21.3 (CDKN2A) and 2q35 (ASCL3, FEV) and amplifications at 11q13.3 (CCND1), 5p15.33, 8q24, 2q31.2 (NFE2L2), 8p11.23, 7q22.1, and 3q27 (SOX2) were common in both IEN and ESCC samples (Fig. 3B and Supplementary Table 8), suggesting that these regions or candidate genes are early CNA events. Although IEN and ESCC samples did not differ significantly in CNA size (P = .070), ESCC showed a widespread increase in CNAs. During malignant development, the amplified regions at 3q26-3q28, 8q24, and 12p13 had wider peak boundaries in ESCC than in IEN samples, suggesting that several cancer genes (ATR, MECOM, PIK3CA, BCL6, MYC, CCND2) were more recurrently affected by CNAs in ESCC. We also identified several significantly deleted regions in IEN samples (10p11.1, 21p12, and 4q35.2; q < .050) that were not significantly deleted in ESCC samples, which may imply that genes located in these regions play different roles in IEN and ESCC.
Furthermore, LOH at the TP53 locus was identified in 44% (11/25) of ESCC samples and 30% (8/27) of IEN samples. A similar incidence of LOH for CDKN2A (12% and 15%), FAT2 (24% and 11%), NOTCH family genes (32% and 19%), RB1 (16% and 11%), and YAP1 (12% and 11%) was seen in ESCC and IEN samples, respectively (Supplementary Table 9).
To identify mutations likely linked to ESCC development, we selected 126 candidate genes (see Supplementary Methods and Supplementary Table 10) and analyzed them by TRS in an independent validation cohort of 45 matched tissue sets (Supplementary Table 1). After integrating the TRS, WES, and WGS data, we found no marked difference in mutation frequencies between IEN and ESCC samples (Fig. 4A and Supplementary Table 11). As expected, TP53 was the most frequently mutated gene in both IEN and ESCC samples (71.2% and 74.3%, respectively).
HotNet2 was used to identify altered subnetworks from the recurrent point mutations and CNAs.27 TP53 signaling, NOTCH signaling, ERBB2-PI3K signaling, the DNA repair pathway, NFE2L2-KEAP1, and SWI/SNF complexes were significantly altered during ESCC progression (Fig. 4B).

Phylogenetic and clonal analyses reveal potential driver events
Given the large number of somatic mutations we identified, we considered alterations shared by matched IEN and ESCC samples to be trunk events that might allow us to identify “driver” genes. Because copy number variations typically affect many genes at once, to focus on genuinely cancer-related genes we further restricted trunk CNA genes to those located in the peak regions and included in the COSMIC Cancer Gene Census. Trunk mutations combined with CNAs detected in more than 10% of cases (7/68) are shown in Figure 5A. The trunk genes fall into three major categories: (a) DNA repair and apoptosis genes (as expected, the two most frequently altered trunk genes were TP53 and CDKN2A); (b) proliferation genes and oncogenes, whose high amplification frequency suggests that proliferative capacity is already activated at the IEN stage; and (c) cell adhesion and junction genes.
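The recurrence filter for trunk alterations can be written compactly. This sketch assumes that each case has already been reduced to the set of genes carrying an alteration shared by its IEN and ESCC samples (restricted, for CNAs, to peak-region genes in the Cancer Gene Census, as described above); the data structure and the function name are hypothetical, not the authors' code.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def recurrent_trunk_genes(
    shared_by_case: Dict[str, Set[str]], min_fraction: float = 0.10
) -> List[Tuple[str, int]]:
    """Genes altered in the trunk of more than `min_fraction` of cases.

    shared_by_case maps a case ID to the set of genes with an alteration
    (mutation or CNA) shared by that case's IEN and ESCC samples.
    Returns (gene, n_cases) pairs, most recurrent first, ties broken by name.
    """
    n_cases = len(shared_by_case)
    if n_cases == 0:
        raise ValueError("no cases supplied")
    counts = Counter(gene for genes in shared_by_case.values() for gene in set(genes))
    hits = [(gene, n) for gene, n in counts.items() if n / n_cases > min_fraction]
    return sorted(hits, key=lambda item: (-item[1], item[0]))
```

With 68 cases, as in Figure 5A, the strict "more than 10%" threshold admits a gene only if it is shared in at least 7 cases (7/68 is about 10.3%, whereas 6/68 is about 8.8%), which matches the 7/68 cutoff stated in the text.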
In contrast to ESSH samples, in which more than 70% of mutations had clonal frequencies below 20% (Supplementary Table 12), trunk mutated genes and SMGs had high clonal frequencies in IEN, indicating that key mutations arose at the IEN stage and gave rise to ESCC (Fig. 5B). Mutations in TP53 and CDKN2A showed high clonal frequency (>90%) at the IEN stage in more than 5% of cases (29/73 and 6/73 samples, respectively). Mutations in NOTCH1, FAT1, KMT2D, and ZFHX4, which were frequently trunk events, subsequently grew to clonal dominance during carcinogenesis. A previous report showed that ZFHX4 can regulate CHD4, a core member of the nucleosome remodeling and deacetylase (NuRD) complex, in glioblastoma.28
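The clonal frequencies above come from PyClone, which clusters mutations with a Bayesian model. As a much simpler illustration, and explicitly not PyClone's method, a single mutation's cancer cell fraction (CCF) can be approximated from its variant allele frequency (VAF), the tumor purity (for example, as estimated by ABSOLUTE), and the local copy number; the share of low-frequency mutations (below 20%) then follows directly. The function names, the default copy-number values, and the assumption that normal cells are diploid are ours.

```python
from typing import Iterable


def ccf_point_estimate(
    vaf: float, purity: float, total_cn: int = 2, multiplicity: int = 1
) -> float:
    """Approximate cancer cell fraction of an SNV from its allele frequency.

    Assumes normal cells are diploid and that a mutated tumor cell carries
    `multiplicity` mutant copies out of `total_cn` local copies, so that
    vaf = purity * multiplicity * ccf / (purity * total_cn + 2 * (1 - purity)).
    The estimate is capped at 1.0 to absorb sampling noise.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if total_cn < 1 or multiplicity < 1:
        raise ValueError("copy numbers must be >= 1")
    ccf = vaf * (purity * total_cn + 2 * (1 - purity)) / (purity * multiplicity)
    return min(ccf, 1.0)


def fraction_below(ccfs: Iterable[float], cutoff: float = 0.20) -> float:
    """Share of mutations whose clonal frequency is below `cutoff`."""
    values = list(ccfs)
    if not values:
        raise ValueError("no values supplied")
    return sum(v < cutoff for v in values) / len(values)
```

For example, a heterozygous mutation at VAF 0.25 in a copy-neutral region of a sample with 50% purity gives a CCF of 1.0 (present in all tumor cells), whereas the same VAF at 100% purity would correspond to a CCF of 0.5. Unlike PyClone, this point estimate ignores read-depth uncertainty and does not group mutations into clones.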
We determined clonal frequencies and constructed phylogenetic trees for the 25 cases subjected to WGS or WES (Fig. 6, Supplementary Fig. 6, and Supplementary Table 12). In case EC1117, the variants in ESSH were clonal mutations of extremely low frequency. Two clonal clusters were observed among the shared mutations. Clone 1, containing 19.1% of the assessed SNVs, was the initial clone and carried key mutated genes such as TP53, SYNE1, and NCOR1. The shared CNAs of CCND1, CDKN2A, and FGFR1 and the trunk mutations with high clonal frequencies indicated that these were early driver events in carcinogenesis. Surprisingly, many SNVs did not overlap in some paired samples; for instance, cases EC1000, EC1120, and EC1134 showed a low degree of overlap (1.3%, 3.6%, and 3.7%, respectively; Supplementary Fig. 6), reflecting genetic heterogeneity between IEN and ESCC. In case EC1000, the few overlapping mutations indicated that different lesion sites in the esophagus may have undergone independent clonal expansions and transformed as separate clones.
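The overlap percentages and the trunk-and-branch structure of the manually built trees both rest on partitioning each matched pair's SNVs into shared and private sets. The source does not state the denominator of the overlap percentage, so this sketch assumes shared SNVs divided by the union of SNVs in both lesions; it also assumes the input sets have already been filtered to somatic nonsynonymous mutations, as used for the phylogenetic trees. The SNV key format and the function names are hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical SNV key: (chromosome, position, reference base, alternate base).
SNV = Tuple[str, int, str, str]


def partition_pair(ien: Set[SNV], escc: Set[SNV]) -> Dict[str, Set[SNV]]:
    """Split a matched IEN/ESCC pair into trunk (shared) and private SNVs."""
    return {
        "trunk": ien & escc,
        "ien_private": ien - escc,
        "escc_private": escc - ien,
    }


def overlap_fraction(ien: Set[SNV], escc: Set[SNV]) -> float:
    """Shared SNVs as a fraction of all SNVs seen in either lesion (assumed definition)."""
    union = ien | escc
    if not union:
        raise ValueError("both samples are empty")
    return len(ien & escc) / len(union)


def branch_lengths(ien: Set[SNV], escc: Set[SNV]) -> Dict[str, int]:
    """Mutation counts for a two-leaf tree rooted at the germline state."""
    return {name: len(snvs) for name, snvs in partition_pair(ien, escc).items()}
```

In the resulting tree, the trunk runs from the germline root to the branch point and the two private sets form the IEN and ESCC branches; a pair such as EC1000, with very few shared SNVs, would have a very short trunk and long private branches.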
Discussion

Genomic analyses of ESCC and paired precancerous lesions led us to infer an initial genomic progression model of ESCC (Fig. 7). To the best of our knowledge, this is the first study to characterize the different stages of the ESCC carcinogenic process by genomic profiling. First, ESSH samples harbored few somatic alterations compared with IEN and ESCC samples, and most mutation sites in ESSH had extremely low clonal frequencies, which suggests that ESSH is an adaptive response. In contrast, we found no distinct barrier between IEN and ESCC in the genetic variant spectrum. The high frequency of mutations and large-scale chromosomal aberrations shared between IEN and ESCC indicates that genomic stability has already collapsed in IEN. Although IEN and ESCC have distinct histologic patterns, our analyses suggest that the genetic alterations in IEN may be analogous to those in tumor cells. The shared clones with high clonal frequencies and the key mutations identified in matched IEN and ESCC samples suggest that the initial tumor clones formed early, at the IEN stage.
Intriguingly, despite shared genetic changes in paired ESCC and IEN samples, a high proportion of private mutation sites provided evidence of genetic heterogeneity,12,29 similar to that reported for paired Barrett's esophagus and EAC samples.30,31 In addition, some cases showed high heterogeneity between paired IEN and ESCC samples (three pairs with <5% SNV overlap), implying the possibility of multifocal lesions in the esophagus. Microenvironmental factors, such as the composition of the microbiota, inflammation, or other environmental exposures, probably contribute to such multifocal lesions.32
The APOBEC-mediated signature was one of the major mutation types in both IEN and ESCC samples. Activated AID/APOBEC deaminases can convert deoxycytidine (dC) to deoxyuridine (dU) and ultimately initiate C>T/G hypermutation at the TpCpN motif.23,33 Because efficient dC deamination by APOBECs requires single-stranded DNA generated by DNA damage (DNA double-strand and single-strand breaks) or replication fork stalling, this signature is considered to reflect DNA damage-related mutagenesis.25,34 We also detected high expression of γH2AX, a visualized marker of DNA damage, in IEN, which supports DNA damage as an important mutation pattern in the esophageal carcinogenic process, as in multiple types of precancerous lesions.35-37 Our network and phylogenetic analyses showed that the majority of cases harbored defects in DNA damage repair and the cell cycle at an early stage that persisted during ESCC progression; such defects could help damaged cells escape the early barriers of tumorigenesis and could underlie initial tumor clone formation.37,38
Combining trunk mutation frequencies and clonal analysis, we identified TP53 and CDKN2A as early mutated driver genes in IEN. In a hypothesized progression model of EAC, TP53 mutations were also emphasized as early events in Barrett's esophagus (a precancerous lesion of EAC).30 However, we also found differences between the two types of esophageal carcinogenic processes: CNAs are rare in Barrett's epithelium,30,31 whereas a high degree of CNAs was observed in IEN. These findings are consistent with a study from The Cancer Genome Atlas showing that ESCC and EAC have distinct molecular characteristics; for example, amplifications of CCND1 and SOX2 and mutations in NOTCH1 were more common in ESCC than in EAC.39,40
Our analyses also found that more than 30% of ESCC cases harbored CCND1, SOX2, and MYC amplification as trunk events. Activation of proliferation genes and oncogenes at the IEN stage could aggravate DNA damage in precancerous lesion cells via stalling and collapse of DNA replication forks,41 and could also induce aberrant proliferation of the initial tumor cells and promote their progression, directly or indirectly. Another observation was the high frequency of variants in cell adhesion and junction genes. We propose that genetic defects in adhesion genes also play an important role in the initial process of carcinogenesis; for example, PCDH9, a trunk gene we identified, has been shown to suppress colony formation, metastasis, and invasion of tumor cells.42,43 Perturbed adhesion function at the IEN stage could facilitate infiltration by the initial tumor clonal cells.
A preliminary background investigation in the Chaoshan area showed a high prevalence of chronic inflammation (68.85%) in populations at high risk of ESCC. In our previous studies, we found that Helicobacter pylori (H. pylori) infection and the related chronic inflammation may contribute to the high incidence of ESCC and gastric cardia cancer (GCC) in the Chaoshan area and may lead to dysplasia of epithelial tissues.44, 45 The close correlation among cellular atypia, DNA damage status and inflammation suggests that chronic inflammation may be an important pathogenic factor contributing to the neoplastic process in this high-incidence region for esophageal cancer in China.
In the current study, we present a preliminary exploration of the genetic variations in the multistage carcinogenesis of ESCC. These data can be used to model key genetic events in esophageal tumorigenesis, may point to potential biomarkers for early detection, and provide new insights into the architecture of ESCC progression. However, we still lack direct evidence that inflammation is a pathogenic factor in ESCC development, and we have not yet identified the “last straw” events that drive the progression of IEN to infiltrating carcinoma; these remain to be explored further with a holistic approach to epigenetics.
Acknowledgments
We thank Bruce AJ Ponder, Dante Neculai and Zeev Ronai for their help with the revised manuscript, and all the staff and assistants in the Department of Pathology of Shantou University Medical College for their support in collecting samples.
References
1. Rustgi AK, El-Serag HB. Esophageal carcinoma. N Engl J Med 2014;371:2499-509.
2. Pennathur A, Farkas A, Krasinskas AM, et al. Esophagectomy for T1 Esophageal Cancer: Outcomes in 100 Patients and Implications for Endoscopic Therapy. Annals of Thoracic Surgery 2009;87:1048-1055.
3. Enzinger PC, Mayer RJ. Medical progress - Esophageal cancer. New England Journal of Medicine 2003;349:2241-2252.
4. Wang GQ, Abnet CC, Shen Q, et al. Histological precursors of oesophageal squamous cell carcinoma: results from a 13 year prospective follow up study in a high risk population. Gut 2005;54:187-92.
5. Dawsey SM, Lewin KJ, Wang GQ, et al. Squamous esophageal histology and subsequent risk of squamous cell carcinoma of the esophagus. A prospective follow-up study from Linxian, China. Cancer 1994;74:1686-92.
6. Song YM, Li L, Ou YW, et al. Identification of genomic alterations in oesophageal squamous cell cancer. Nature 2014;509:91.
7. Gao YB, Chen ZL, Li JG, et al. Genetic landscape of esophageal squamous cell carcinoma. Nat Genet 2014;46:1097-102.
8. Lin DC, Hao JJ, Nagata Y, et al. Genomic and molecular characterization of esophageal squamous cell carcinoma. Nat Genet 2014;46:467-73.
9. Sawada G, Niida A, Uchi R, et al. Genomic Landscape of Esophageal Squamous Cell Carcinoma in a Japanese Population. Gastroenterology 2016;150:1171-82.
10. Gao H, Wang LD, Zhou Q, et al. p53 tumor suppressor gene mutation in early esophageal precancerous lesions and carcinoma among high-risk populations in Henan, China. Cancer Res 1994;54:4342-6.
11. Roth MJ, Hu N, Emmert-Buck MR, et al. Genetic progression and heterogeneity associated with the development of esophageal squamous cell carcinoma. Cancer Res 2001;61:4098-104.
12. Liu M, Zhang F, Liu S, et al. Loss of heterozygosity analysis of microsatellites on multiple chromosome regions in dysplasia and squamous cell carcinoma of the esophagus. Exp Ther Med 2011;2:997-1001.
13. Shimizu T, Marusawa H, Matsumoto Y, et al. Accumulation of Somatic Mutations in TP53 in Gastric Epithelium With Helicobacter pylori Infection. Gastroenterology 2014;147:407.
14. Chiba T, Marusawa H, Ushijima T. Inflammation-Associated Cancer Development in Digestive Organs: Mechanisms and Roles for Genetic and Epigenetic Modulation. Gastroenterology 2012;143:550-563.
15. Lin R, Zhang C, Zheng J, et al. Chronic inflammation-associated genomic instability paves the way for human esophageal carcinogenesis. Oncotarget 2016.
16. Cibulskis K, Lawrence MS, Carter SL, et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nature Biotechnology 2013;31:213-219.
17. Dees ND, Zhang QY, Kandoth C, et al. MuSiC: Identifying mutational significance in cancer genomes. Genome Research 2012;22:1589-1598.
18. Boeva V, Popova T, Bleakley K, et al. Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics 2012;28:423-425.
19. Roth A, Khattra J, Yap D, et al. PyClone: statistical inference of clonal population structure in cancer. Nature Methods 2014;11:396.
20. Carter SL, Cibulskis K, Helman E, et al. Absolute quantification of somatic DNA alterations in human cancer. Nat Biotechnol 2012;30:413-21.
21. Nik-Zainal S, Alexandrov LB, Wedge DC, et al. Mutational Processes Molding the Genomes of 21 Breast Cancers. Cell 2012;149:979-993.
22. Wang K, Yuen ST, Xu J, et al. Whole-genome sequencing and comprehensive molecular profiling identify new driver mutations in gastric cancer. Nat Genet 2014;46:573-82.
23. Roberts SA, Lawrence MS, Klimczak LJ, et al. An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nature Genetics 2013;45:970.
24. Burns MB, Temiz NA, Harris RS. Evidence for APOBEC3B mutagenesis in multiple human cancers. Nat Genet 2013;45:977-83.
25. Rebhandl S, Huemer M, Greil R, et al. AID/APOBEC deaminases and cancer. Oncoscience 2015;2:320-33.
26. Zhang L, Zhou Y, Cheng CX, et al. Genomic Analyses Reveal Mutational Signatures and Frequently Altered Genes in Esophageal Squamous Cell Carcinoma. American Journal of Human Genetics 2015;96:597-611.
27. Leiserson MD, Vandin F, Wu HT, et al. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat Genet 2015;47:106-14.
28. Chudnovsky Y, Kim D, Zheng SY, et al. ZFHX4 Interacts with the NuRD Core Member CHD4 and Regulates the Glioblastoma Tumor-Initiating Cell State. Cell Reports 2014;6:313-324.
29. Shimada M, Yanagisawa A, Kato Y, et al. Genetic mechanisms in esophageal carcinogenesis: frequent deletion of 3p and 17p in premalignant lesions. Genes Chromosomes Cancer 1996;15:165-9.
30. Stachler MD, Taylor-Weiner A, Peng S, et al. Paired exome analysis of Barrett's esophagus and adenocarcinoma. Nat Genet 2015.
31. Ross-Innes CS, Becq J, Warren A, et al. Whole-genome sequencing provides new insights into the clonal architecture of Barrett's esophagus and esophageal adenocarcinoma. Nat Genet 2015.
32. Nasrollahzadeh D, Malekzadeh R, Ploner A, et al. Variations of gastric corpus microbiota are associated with early esophageal squamous cell carcinoma and squamous dysplasia. Scientific Reports 2015;5.
33. Alexandrov LB, Nik-Zainal S, Wedge DC, et al. Signatures of mutational processes in human cancer (vol 500, pg 415, 2013). Nature 2013;502.
34. Chelico L, Pham P, Goodman MF. Mechanisms of APOBEC3G-catalyzed processive deamination of deoxycytidine on single-stranded DNA. Nature Structural & Molecular Biology 2009;16:454-455.
35. Gorgoulis VG, Vassiliou LV, Karakaidos P, et al. Activation of the DNA damage checkpoint and genomic instability in human precancerous lesions. Nature 2005;434:907-13.
36. Ma N, Tagawa T, Hiraku Y, et al. 8-Nitroguanine formation in oral leukoplakia, a premalignant lesion. Nitric Oxide 2006;14:137-43.
37. Bartkova J, Horejsi Z, Koed K, et al. DNA damage response as a candidate anti-cancer barrier in early human tumorigenesis. Nature 2005;434:864-870.
38. Campisi J. Suppressing cancer: the importance of being senescent. Science 2005;309:886-7.
39. Agrawal N, Jiao YC, Bettegowda C, et al. Comparative Genomic Analysis of Esophageal Adenocarcinoma and Squamous Cell Carcinoma. Cancer Discovery 2012;2:899-905.
40. Cancer Genome Atlas Research Network, et al. Integrated genomic characterization of oesophageal carcinoma. Nature 2017;541:169-175.
41. Halazonetis TD, Gorgoulis VG, Bartek J. An oncogene-induced DNA damage model for cancer development. Science 2008;319:1352-5.
42. Wang CL, Tao BB, Li ST, et al. Characterizing the Role of PCDH9 in the Regulation of Glioma Cell Apoptosis and Invasion. Journal of Molecular Neuroscience 2014;52:250-260.
43. Chen Y, Xiang HG, Zhang YF, et al. Loss of PCDH9 is associated with the differentiation of tumor cells and metastasis and predicts poor survival in gastric cancer. Clinical & Experimental Metastasis 2015;32:417-428.
44. Li WS, Tian DP, Guan XY, et al. Esophageal intraepithelial invasion of Helicobacter pylori correlates with atypical hyperplasia. International Journal of Cancer 2014;134:2626-2632.
45. Wang YS, Liu SH, Zhang Y, et al. Helicobacter pylori infection and gastric cardia cancer in Chaoshan region. Microbes and Infection 2014;16:840-844.
Author names in bold designate shared co-first authorship.
Figure legends
Figure 1. DNA damage and activation of NF-κB correlated with inflammation.
(A) Correlation between inflammation score and histologic type (Spearman’s rank correlation test, two-sided). (B, C) Quantification of γH2AX-positive cells in tissue samples of different histologic types and with different inflammation scores (nonparametric test, two-sided). (D) The top panel shows hematoxylin and eosin (H&E) staining of esophageal epithelium. Immunohistochemistry (IHC) shows a stepwise increase in γH2AX and NF-κB expression with increasing cellular atypia and degree of inflammation. Scale bars, 100 µm.
Figure 2. Mutation spectrum shared by IEN and ESCC (ESCC, n = 25; IEN, n = 27; ESSH, n = 14).
(A, B) Mutation rates and number of CNA base pairs for ESSH, IEN and ESCC (Wilcoxon rank-sum test, two-sided). (C) Mutation spectra. Base substitutions are classified into six types, and the percentage of each type is calculated for each of the three cohorts. (D) Proportion of SNVs occurring in specific nucleotide motif contexts for each category of single-nucleotide substitution. (E) Three mutational signatures extracted from WGS and WES data of 25 paired ESCC and IEN samples. The major components contributing to each signature are highlighted by arrows. (F) Comparison of the percentage contribution of each signature between paired IEN and ESCC samples; P values were computed with a nonparametric test. (G) Percentage of unique and shared SNVs and relative contributions of mutational signatures in individual cases.
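Panel C groups base substitutions into six types. The legend does not describe how this grouping was computed, so the following is only a minimal sketch of the conventional approach, not the authors' pipeline: each substitution is collapsed onto the pyrimidine (C or T) reference base, so that complementary changes such as G>A and C>T fall into the same class. The function names are hypothetical.

```python
from collections import Counter

# Watson-Crick complements used to move a substitution onto the pyrimidine strand
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
SIX_CLASSES = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def substitution_class(ref: str, alt: str) -> str:
    """Map a single-base substitution to one of six pyrimidine-referenced classes (hypothetical helper)."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # A purine reference (A or G) is reported on the opposite strand, e.g. G>T becomes C>A
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum_percentages(snvs):
    """Percentage of SNVs in each of the six classes; snvs is an iterable of (ref, alt) pairs."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    return {c: (100.0 * counts[c] / total if total else 0.0) for c in SIX_CLASSES}


if __name__ == "__main__":
    example = [("C", "T"), ("G", "A"), ("T", "C"), ("A", "C")]
    print(spectrum_percentages(example))
```

With the four toy SNVs above, C>T and G>A both count as C>T, and A>C is reported as T>G. Signature analyses such as those in panels D and E extend the same idea by also recording the 5′ and 3′ flanking bases, again reverse-complementing the whole trinucleotide when the reference base is a purine.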
Figure 3. The variation landscape of ESCC, IEN and ESSH (ESCC, n = 25; IEN, n = 27; ESSH, n = 14).
(A) Significantly mutated genes from WES and WGS, together with genes selected from the literature (marked with *), are ranked in the top panel.6-9 Cancer genes (based on COSMIC analysis) within the CNA peak regions are shown in the bottom panel. Samples are grouped by clinical features. Altered sample counts are shown in the left panel. Based on mutation rate and frequency of altered genes, ESSH samples differed markedly from IEN and ESCC samples. CN, copy number. (B) GISTIC analysis of ESCC and IEN samples. G-scores of genomic amplifications and deletions in ESCC and IEN are plotted as curves. Genes located in the most significant regions are shown.
Figure 4. The frequency of recurrently mutated genes and gene networks in ESCC and IEN.
(A) Percentage of samples with recurrently mutated genes (mutated in ≥7 samples) identified in the ESCC and IEN cohorts (ESCC, n = 70; IEN, n = 73). (B) Subnetworks based on somatic mutations (deleterious mutations) and CNAs (CN ≤ 1 or ≥ 4 copies) were defined by HOTNET2 analysis and annotated with KEGG and Reactome pathways.
Figure 5. Clonal evolution and driver genes in paired ESCC and IEN cases.
(A) Frequency of trunk-altered genes detected in more than 10% of cases. (B) Clonal frequencies of somatic mutations in paired samples. Square shading encodes the clonal frequency (CF, the percentage of tumor cells).
Figure 6. Evolutionary process of individual cases.
(A) Comparison of the clonal frequencies of SNVs detected in IEN and ESCC samples. The x-axis and y-axis correspond to the clonal frequency in IEN and ESCC samples, respectively. The number of mutations in each clone cluster was calculated. Genes with extremely low clonal frequency are marked with an asterisk (*). (B) Inferred phylogenetic trees. The numbers indicate the number of nonsynonymous mutations. Trunk and branch lengths are proportional to the number of somatic mutations. Key mutations identified by clonal frequency analysis are marked with arrows. (C, D) Copy number alterations and B allele frequency (BAF) profiles across chromosomes. Cancer genes and frequently altered genes are labeled. Purple boxes indicate alterations shared between the samples. For the BAF profiles, allelic imbalance relative to the expected allelic ratio (0.5:0.5) of germline heterozygous single-nucleotide polymorphisms (SNPs) is plotted on the y-axis. Predicted BAF values are shown in black, and loss of heterozygosity (LOH) is shown in blue.
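The BAF panels in (C, D) plot allelic imbalance at germline heterozygous SNPs. As a minimal sketch, assuming per-SNP tumor read counts are available, BAF can be taken as the fraction of reads supporting one allele, and the imbalance as its distance from the expected 0.5. The class, the function names and the 0.35 cutoff below are hypothetical illustrations; dedicated tools such as Control-FREEC (ref. 18) estimate BAF and LOH over genomic segments jointly with copy number rather than with a single per-SNP threshold.

```python
from dataclasses import dataclass


@dataclass
class HetSnp:
    """Tumor read counts at a germline heterozygous SNP (hypothetical input record)."""
    chrom: str
    pos: int
    ref_reads: int
    alt_reads: int


def b_allele_frequency(snp: HetSnp) -> float:
    """Fraction of reads supporting the B allele (here taken to be the alternate allele)."""
    depth = snp.ref_reads + snp.alt_reads
    if depth == 0:
        raise ValueError(f"no reads at {snp.chrom}:{snp.pos}")
    return snp.alt_reads / depth


def allelic_imbalance(snp: HetSnp) -> float:
    """Distance from the balanced 0.5:0.5 ratio: 0 means balanced, 0.5 means one allele absent."""
    return abs(b_allele_frequency(snp) - 0.5)


def flag_imbalanced(snps, threshold: float = 0.35):
    """Return (chrom, pos) of SNPs whose imbalance exceeds a hypothetical LOH-like cutoff."""
    return [(s.chrom, s.pos) for s in snps if allelic_imbalance(s) > threshold]


if __name__ == "__main__":
    snps = [
        HetSnp("chr3", 100, 50, 50),   # balanced heterozygous site
        HetSnp("chr17", 200, 95, 5),   # strongly skewed, LOH-like
        HetSnp("chr1", 300, 40, 60),   # mild imbalance
    ]
    print(flag_imbalanced(snps))
```

In a tumor sample, normal-cell contamination pulls the BAF of an LOH region back toward 0.5, which is one reason segment-level models that account for purity are preferred over a fixed per-SNP cutoff like this one.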
Figure 7. The model of ESCC development.
A preliminary model of progression from hyperplasia through intraepithelial neoplasia to infiltrating carcinoma.