Genetic Alterations in Esophageal Tissues From Squamous Dysplasia to Carcinoma


Accepted Manuscript

Genetic Alterations as Esophageal Tissues From Squamous Dysplasia to Carcinoma

Xi Liu, Min Zhang, Songmin Ying, Chong Zhang, Runhua Lin, Jiaxuan Zheng, Guohong Zhang, Dongping Tian, Yi Guo, Caiwen Du, Yuping Chen, Shaobin Chen, Xue Su, Juan Ji, Wanting Deng, Xiang Li, Shiyue Qiu, Ruijing Yan, Zexin Xu, Yuan Wang, Yuanning Guo, Jiancheng Cui, Shanshan Zhuang, Huan Yu, Qi Zheng, Moshe Marom, Sitong Sheng, Guoqiang Zhang, Songnian Hu, Ruiqiang Li, Min Su

PII: S0016-5085(17)30342-6
DOI: 10.1053/j.gastro.2017.03.033
Reference: YGAST 61065

To appear in: Gastroenterology

Accepted Date: 23 March 2017

Please cite this article as: Liu X, Zhang M, Ying S, Zhang C, Lin R, Zheng J, Zhang G, Tian D, Guo Y, Du C, Chen Y, Chen S, Su X, Ji J, Deng W, Li X, Qiu S, Yan R, Xu Z, Wang Y, Guo Y, Cui J, Zhuang S, Yu H, Zheng Q, Marom M, Sheng S, Zhang G, Hu S, Li R, Su M, Genetic Alterations as Esophageal Tissues From Squamous Dysplasia to Carcinoma, Gastroenterology (2017), doi: 10.1053/j.gastro.2017.03.033.

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT Manuscript Number: GASTRO 16-01758

Title: Genetic Alterations as Esophageal Tissues From Squamous Dysplasia to Carcinoma

Short Title: Genetic Evolution of Esophageal Carcinoma

Authors: Xi Liu1,#, Min Zhang2,#, Songmin Ying1,3,#, Chong Zhang1, Runhua Lin1, Jiaxuan Zheng1, Guohong Zhang4, Dongping Tian1, Yi Guo4, Caiwen Du4, Yuping Chen4, Shaobin Chen4, Xue Su1, Juan Ji1, Wanting Deng1, Xiang Li1, Shiyue Qiu1, Ruijing Yan1, Zexin Xu1, Yuan Wang1, Yuanning Guo1, Jiancheng Cui2, Shanshan Zhuang4, Huan Yu2, Qi Zheng2, Moshe Marom5, Sitong Sheng6, Guoqiang Zhang7, Songnian Hu7, Ruiqiang Li2, Min Su1,*

1 Institute of Clinical Pathology & Department of Pathology, Shantou University Medical College, Shantou 515041, Guangdong, China.

2 Novogene Co., LTD, Beijing 100083, China.

3 Department of Pharmacology, Zhejiang University School of Medicine, Hangzhou 310058, Zhejiang, China.

4 Cancer Hospital of Shantou University Medical College, Shantou 515041, Guangdong, China.

5 Guangdong Technion-Israel Institute of Technology, Shantou 515063, Guangdong, China.

6 HYK High-throughput Biotechnology Institute, 4/F, Building #11, Software Park, 2nd Central Keji Rd, Hi-Tech Industrial Park, Shenzhen 518060, China.

7 Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.

Grant support: National Natural Science Foundation of China (NSFC)–Guangdong joint fund key project (U1132004), NSFC (31171226, 81572684), Guangdong International Cooperative Technical Innovation Platform (Certificates gihz1106), Guangdong Government under the Top-tier University Development Scheme for Research and Control of Infectious Diseases, Molecular Diagnosis and Personalized Medicine Center of Shantou University.

Abbreviations: BE (Barrett’s esophagus), EAC (esophageal adenocarcinoma), ESCC (esophageal squamous cell carcinoma), ESSH (esophageal squamous simple hyperplasia), hTNF-α (human tumor necrosis factor-alpha), IEN (intraepithelial neoplasia), NF-κB (nuclear factor-kappa B), TRS (targeted sequencing), WES (whole exome sequencing), WGS (whole-genome sequencing), γH2AX (phosphorylated H2AX)

# Co-first author

*Correspondence: Min Su, Institute of Clinical Pathology & Department of Pathology, Shantou University Medical College, No.22 Xinling Road, Shantou, Guangdong, China. Phone: 86-0754-88900429; Fax: 86-0754-88900429; E-mail: [email protected]

Conflict of Interest: All of the authors declare no personal, professional or financial conflicts of interest.

Accession codes: Binary sequence alignment/map (BAM) files are being uploaded to BioProject under accession number PRJNA317404.

Writing Assistance: Bruce AJ Ponder, Dante Neculai and Zeev Ronai provided assistance free of charge.

Author Contributions:

Conception and design - Min Su, Xi Liu

Financial support - Min Su

Material support - Min Su, Dongping Tian, Yi Guo, Caiwen Du, Yuping Chen, Shaobin Chen, Shanshan Zhuang

Collection and assembly of data - Min Su, Dongping Tian, Xi Liu, Xue Su, Juan Ji, Wanting Deng, Xiang Li, Shiyue Qiu, Ruijing Yan, Zexin Xu, Yuan Wang, Chong Zhang, Jiaxuan Zheng, Guohong Zhang, Yi Guo, Caiwen Du, Yuping Chen, Shaobin Chen and Shanshan Zhuang

Analysis and interpretation of data - Ruiqiang Li, Min Zhang, Jiancheng Cui, Huan Yu, Qi Zheng, Xi Liu, Yuanning Guo, Runhua Lin, Songmin Ying, Sitong Sheng, Guoqiang Zhang, Songnian Hu and Moshe Marom

Drafting of the manuscript - All authors

Final approval of manuscript - All authors

Abstract:

BACKGROUND & AIMS: Esophageal squamous cell carcinoma (ESCC) is the most common subtype of esophageal cancer. Little is known about the genetic changes that occur in esophageal cells during development of ESCC. We performed next-generation sequence analyses of esophageal non-tumor, intraepithelial neoplasia (IEN), and ESCC tissues from the same patients to track genetic changes during tumor development.

METHODS: We performed whole-genome, exome, or targeted sequence analyses of 227 esophageal tissue samples from 70 patients with ESCC undergoing resection at Shantou University Medical College in China from 2012 through 2015 (no patients had received chemotherapy or radiation therapy); we analyzed normal tissue, tissue with simple hyperplasia, dysplastic tissue (intraepithelial neoplasia, IEN), and ESCC tissues collected from different regions of the esophagus at the same time. We also obtained 1191 non-tumor esophageal biopsies from the Chaoshan region of China (a high-risk region for ESCC) and performed immunohistochemical and histologic analyses to detect inflammation.

RESULTS: IEN and ESCC tissues had similar mutations and copy number alterations, at similar frequencies; these differed from mutations detected in tissues with simple hyperplasia. IEN tissues had mutations associated with apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like (APOBEC)-mediated mutagenesis (a DNA damage mutational signature). Genetic analyses indicated that most ESCCs formed from early-stage IEN clones. Trunk mutations (mutations shared by more than 10% of paired IEN and ESCC tissues) were in genes that regulate DNA repair and cell apoptosis, proliferation, and adhesion. Mutations in TP53 and CDKN2A and copy number alterations in 11q (contains CCND1), 3q (contains SOX2), 2q (contains NFE2L2), and 9p (contains CDKN2A) were considered to be trunk variants; these were dominant mutations detected at high frequencies in clones of paired IEN and ESCC samples. In the esophageal biopsy samples from high-risk individuals (residing in the Chaoshan region), 68.9% had evidence of chronic inflammation; the level of inflammation correlated with atypical cell structures and markers of DNA damage.

CONCLUSION: We analyzed mutations and gene copy number changes in non-tumor, IEN, and ESCC samples collected from 70 patients. IEN and ESCC tissues had similar mutations and markers of genomic instability, including APOBEC-mediated mutational signatures. Genomic changes observed in precancerous lesions might be used to identify patients at risk for ESCC.

KEY WORDS: esophagus, sequencing, carcinogenesis, driver mutation


Introduction

Esophageal cancer is one of the most common cancers worldwide and can be divided into two histologic subtypes: esophageal squamous cell carcinoma (ESCC) and esophageal adenocarcinoma (EAC).1 The overall 5-year survival of patients with esophageal cancer ranges from 15% to 25%, and disease diagnosed at earlier stages has better outcomes. Dysplasia (intraepithelial neoplasia, IEN) has been considered a precancerous lesion of ESCC,2, 3 with a significantly increased risk of progression to ESCC.4, 5 Therefore, identifying the mutations that occur during ESCC development could have implications for early diagnosis and potential therapeutic strategies.

The genomic landscape of ESCC, which frequently exhibits mutations in TP53, CDKN2A, and PIK3CA, has been well characterized by whole-genome sequencing (WGS) and whole exome sequencing (WES).6-9 However, most studies of precancerous lesions of ESCC were limited to hotspot genes or the allelic loss of tumor suppressor genes.10-12 The panoramic genetic architecture of the carcinogenic process remains unknown.

Recent research has provided evidence that chronic inflammation in the microenvironment is a strong risk factor for the development of digestive tumors.13, 14 We have also reported a close association between chronic inflammation and esophageal precancerous lesions in patients with ESCC.15

124

Here, to explore the potential role of inflammation in the development of ESCC,

125

we evaluated the prevalence of chronic inflammation in 1191 esophageal endoscopic

6

ACCEPTED MANUSCRIPT non-tumor biopsies from the Chaoshan area in China (a high-risk region for ESCC).

127

To identify genomic changes underlying the transition from non-dysplastic epithelium

128

(simple hyperplasia, ESSH) and IEN to ESCC, we performed WGS, WES and

129

targeted sequencing (TRS) with matched samples (ESSH, IEN and ESCC) derived by

130

microdissection from the same individuals. We aimed to determine the genomic

131

alterations events that contribute to the carcinogenesis of ESCC.

132

Material and Methods

133

Sample collection

M AN U

SC

RI PT

126

All samples were obtained with the approval of the ethics committee of Shantou University Medical College. A total of 227 distinct samples from 70 individuals with ESCC were manually microdissected and then sequenced. Sequencing samples were collected from patients undergoing resection at the Cancer Hospital of Shantou University Medical College from 2012 to 2015; none of the patients had received chemotherapy or radiation (Supplementary Table 1 and Supplementary Fig. 1). A total of 1191 non-tumor esophageal endoscopic biopsies (170 of them from mass screening) were collected from individuals in the Chaoshan district of China (Supplementary Table 2).

Whole-exome and whole-genome sequencing

For WES, capture libraries were prepared from 2 µg of genomic DNA (gDNA) using the Agilent SureSelect Human All Exon V5 kit (Agilent Technologies, CA, USA) following the manufacturer’s recommendations, and the libraries were sequenced on an Illumina HiSeq 2500. For WGS, a paired-end DNA library was prepared from 1.5 µg of gDNA according to the manufacturer’s instructions (Illumina TruSeq Library Construction) and then sequenced on an Illumina HiSeq X Ten.

Targeted Sequencing

A total of 126 genes were selected (Supplementary Materials and Methods). For each sample, 2 µg of gDNA was used as input. Sequencing libraries were generated using the Agilent SureSelectXT Custom kit (Agilent Technologies, CA, USA) following the manufacturer’s recommendations.

Data Analysis

Burrows-Wheeler Aligner (BWA) software was used to map the paired-end clean reads to the reference genome (UCSC hg19). GATK HaplotypeCaller (https://www.broadinstitute.org/gatk/) and VariantFiltration were used for variant calling and to identify single nucleotide polymorphisms (SNPs) and indels. The MuTect algorithm was used to identify somatic mutations in WES and WGS data.16 The MuSiC algorithm was used to identify significantly mutated genes (SMGs) in the ESCC and IEN cohorts, respectively.17 Control-FREEC was applied to analyze somatic copy number alterations (CNAs) and loss of heterozygosity (LOH).18 Allelic heterogeneity, clonal frequency and clonal clusters were evaluated using PyClone version 0.12.7.19 Tumor content and ploidy were estimated using ABSOLUTE software.20 Phylogenetic trees were constructed manually according to the numbers of somatic nonsynonymous mutations and were rooted at the germline state. Additional details can be found in Supplementary Materials and Methods.
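The trees were built manually, so the exact procedure is not given here. The bookkeeping behind a two-lesion tree rooted at the germline state can still be sketched, assuming each lesion's somatic nonsynonymous calls are available as a set of variant keys. The key format, the function names and the toy calls below are hypothetical: shared mutations form the trunk, and mutations private to one lesion form that lesion's branch.

```python
from typing import Dict, Set, Tuple

# Hypothetical variant key: (chromosome, position, ref, alt)
Variant = Tuple[str, int, str, str]


def branch_lengths(ien: Set[Variant], escc: Set[Variant]) -> Dict[str, int]:
    """Trunk and private-branch lengths for a two-lesion tree rooted at germline.

    Trunk: somatic nonsynonymous mutations present in both lesions.
    Branches: mutations found in only one lesion.
    """
    return {
        "trunk": len(ien & escc),
        "IEN_branch": len(ien - escc),
        "ESCC_branch": len(escc - ien),
    }


def to_newick(lengths: Dict[str, int]) -> str:
    """Render the tree in Newick format, using mutation counts as branch lengths."""
    return "((IEN:{IEN_branch},ESCC:{ESCC_branch}):{trunk})germline;".format(**lengths)


if __name__ == "__main__":
    # Toy calls for one hypothetical matched pair (not data from this study)
    ien_calls = {("chr1", 100, "C", "T"), ("chr2", 200, "G", "A"), ("chr3", 300, "C", "G")}
    escc_calls = {("chr1", 100, "C", "T"), ("chr2", 200, "G", "A"), ("chr4", 400, "T", "A")}
    tree = branch_lengths(ien_calls, escc_calls)
    print(tree)             # {'trunk': 2, 'IEN_branch': 1, 'ESCC_branch': 1}
    print(to_newick(tree))  # ((IEN:1,ESCC:1):2)germline;
```

Emitting Newick is only a convenience so that standard tree viewers can draw the result. It is not a claim about how the published figures were produced.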

Results

Chronic inflammation and DNA damage are positively correlated with cellular atypia in non-tumor samples

We obtained 1191 endoscopic esophageal biopsies from non-tumor individuals (170 of them from mass screening) in the Chaoshan district of China, a region with a relatively high incidence of esophageal cancer. The prevalence of chronic inflammation was as high as 68.85% (820/1191) (Supplementary Table 2). The level of inflammation was positively correlated with cellular atypia (rs = .464, P < .001) (Fig. 1A). We examined the DNA damage status in 84 mass-screened non-tumor biopsies, and the degree of inflammation was positively correlated with histologic subtype (rs = .488, P < .001). The expression rate of γH2AX (phosphorylated H2AX, a rapid and sensitive marker of DNA double-strand breaks) was significantly higher in IEN than in normal and ESSH samples (P < .001) (Fig. 1B). Not surprisingly, the γH2AX-positive rate also differed significantly between non-inflamed and inflamed tissue (Fig. 1C and D).
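The rs values reported in this section are presumably Spearman rank correlations, which suit ordinal scores such as inflammation and atypia grades, where ties are common. A minimal sketch follows, assuming integer grades per biopsy (the grading scale and the example values are hypothetical). It computes rho as the Pearson correlation of tie-averaged ranks. A P value would additionally require a t approximation or a permutation test, which is not shown.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the current run of ties.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical grades for eight biopsies (0 = none ... 3 = severe)
    inflammation = [0, 0, 1, 1, 2, 2, 3, 3]
    atypia = [0, 1, 0, 1, 1, 2, 2, 2]
    print(round(spearman_rho(inflammation, atypia), 3))
```

Averaging ranks over ties, rather than breaking ties arbitrarily, keeps the coefficient independent of the input order of tied biopsies.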

We next exposed an immortalized esophageal epithelial cell line (NE2) to human tumor necrosis factor-alpha (hTNF-α) and hydrogen peroxide (H2O2), which together represent a chronic inflammatory microenvironment, to test their effect on esophageal epithelial cells. Immunofluorescence staining demonstrated that nuclear factor-kappa B (NF-κB) translocated to the cell nucleus and that nuclear γH2AX expression was greater with hTNF-α and H2O2 treatment than with control treatment (Supplementary Fig. 2). These data are consistent with our hypothesis that chronic inflammation promotes DNA damage.15

Genetic changes occur at high incidence in IEN and ESCC

To investigate the genetic changes contributing to the carcinogenesis of ESCC, we sampled and extracted matched ESCC, IEN and ESSH tissues from 70 patients (Supplementary Table 1). To exclude the effect of secondary inflammation induced by the tumor, we evaluated the degree of chronic inflammation in the non-tumor tissue samples. The level of inflammation in the sequenced cohort was also positively correlated with cellular atypia (rs = .872, P < .001, Supplementary Fig. 3A). Matched tissues (ESSH, IEN, and ESCC) from 25 patients, with normal tissues or blood DNA as controls, were then subjected to WES or WGS, and we evaluated the expression of γH2AX in these non-tumor samples (Supplementary Table 3). γH2AX expression increased with the degree of inflammation and cellular atypia (Supplementary Fig. 3B and C).

Next, we compared the nature of somatic nucleotide variants and the CNA spectrum at different stages of ESCC development. The mean exonic mutation rate was 3.55, 4.56 and 0.81 mutations/Mb in ESCC, IEN and ESSH samples, respectively (Fig. 2A and Supplementary Table 4). Furthermore, the number of mutations in WES and WGS samples was positively associated with γH2AX expression (rs = .406, P = .008). The number of base pairs affected by CNAs was significantly higher in ESCC and IEN than in ESSH samples (758.10 and 420.10 vs 29.09 Mb, respectively) (Fig. 2B and Supplementary Table 5). Likewise, for 7 matched WGS cases, the mean number of structural variations (SVs) was lower in ESSH than in IEN and ESCC samples (104 compared with 449.43 and 329.71, respectively) (Supplementary Table 6).
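Both burdens above are normalized quantities. This section specifies neither detail, so the sketch assumes two things. The mutation-rate denominator is taken to be the callable exonic territory in base pairs. CNA segments are taken to arrive as (chromosome, start, end) intervals in a half-open convention. The function names are hypothetical. Overlapping segments are merged so that no base pair is counted twice toward the CNA total.

```python
from typing import Dict, Iterable, List, Tuple

Segment = Tuple[str, int, int]  # (chromosome, start, end); half-open, 0-based


def mutations_per_mb(n_mutations: int, callable_bp: int) -> float:
    """Somatic mutation count divided by callable territory in megabases."""
    if callable_bp <= 0:
        raise ValueError("callable_bp must be positive")
    return n_mutations / (callable_bp / 1_000_000)


def cna_burden_mb(segments: Iterable[Segment]) -> float:
    """Total altered length in Mb after merging overlapping segments per chromosome."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in segments:
        by_chrom.setdefault(chrom, []).append((start, end))
    total_bp = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:              # overlaps or abuts: extend the run
                cur_end = max(cur_end, end)
            else:                             # gap: close the run and start anew
                total_bp += cur_end - cur_start
                cur_start, cur_end = start, end
        total_bp += cur_end - cur_start
    return total_bp / 1_000_000
```

For example, `mutations_per_mb(150, 50_000_000)` returns 3.0. The result depends on the denominator, so rates are only comparable across samples when the same callable territory is used.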

In addition to investigating the common variation spectrum in IEN and ESCC, we performed genomic ploidy analysis using ABSOLUTE (Supplementary Fig. 4A).20 Polyploidy was evident in 68.0% (17/25) of ESCC and 55.6% (15/27) of IEN samples, whereas no ESSH samples showed polyploidy. In matched samples, ploidy events occurred early, during IEN development, in 52.2% of samples, and only 17.4% of samples acquired polyploidy during malignant progression from IEN to ESCC, indicating that ploidy events are early genetic events (Supplementary Fig. 4B). We also found that both IEN and ESCC samples showed greater genetic diversity than ESSH samples, which more often contained monoclonal clusters (Supplementary Fig. 4C). Collectively, these observations suggest that increased genomic diversity begins at the IEN formation phase.
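The early-versus-late split of ploidy events in matched samples amounts to a small classification over paired calls. A minimal sketch follows, assuming each matched case is reduced to two booleans (IEN polyploid, ESCC polyploid) derived from ABSOLUTE output. The labels, the function names and the treatment of pairs that are polyploid in IEN only (counted as "early") are assumptions, not the authors' stated definitions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def ploidy_timing(ien_polyploid: bool, escc_polyploid: bool) -> str:
    """Label when polyploidy appears in a matched IEN/ESCC pair."""
    if ien_polyploid:
        return "early"   # already present at the IEN stage
    if escc_polyploid:
        return "late"    # acquired during progression from IEN to ESCC
    return "none"        # neither lesion is polyploid


def timing_percentages(pairs: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Percentage of matched pairs in each timing class, rounded to 0.1%."""
    counts = Counter(ploidy_timing(ien, escc) for ien, escc in pairs)
    n = sum(counts.values())
    if n == 0:
        return {}
    return {label: round(100 * c / n, 1) for label, c in counts.items()}
```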

DNA damage mutation signature in IEN and ESCC


Analysis of the sequence context of mutations in ESSH, IEN and ESCC samples identified C/G>T/A transitions as the predominant nucleotide substitutions at every stage (Fig. 2C). We also observed enrichment of C>T and C>G variations at TpCpX sites in IEN and ESCC (Fig. 2D), which suggests an APOBEC (apolipoprotein B mRNA editing enzyme, catalytic polypeptide-like)-mediated signature.21 To confirm this finding, we further characterized three and four mutation signature patterns in IEN and ESCC samples, respectively (Supplementary Fig. 5A and 5B). Of note, IEN and ESCC shared similar mutation signatures A and B. After comparison to the COSMIC database (Supplementary Fig. 5C), signature A presented a high rate of C>T transitions at XpCpG, which may be initiated by spontaneous deamination of 5-methylcytosine; this signature has previously been found ubiquitously across different tumor types and is highly correlated with aging.22 In signature B, C>G transversions and C>T transitions at TpCpX were the dominant mutations. This signature is associated with the APOBEC family of cytidine deaminases, is frequently observed in several tumor types, and represents a DNA damage pattern.23-25 Signatures A and B in IEN and ESCC samples agreed well with the signatures reported in ESCC sequencing studies.26 However, signature C in IEN exhibited a high rate of T>A mutations, which differed from signature C in ESCC. After comparison to the COSMIC database, signature C in ESCC consisted mainly of C>T mutations at XpCpG, which may be associated with defective DNA mismatch repair. Signature D in ESCC showed transcriptional strand bias for T>C and C>T substitutions. So far, the aetiology of signature C in IEN and of signatures C and D in ESCC remains unknown.
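The TpCpX enrichment rests on reading each substitution in its trinucleotide context. The sketch below assumes a hypothetical input format: each SNV is given as the reference trinucleotide centred on the mutated base, plus the alternate base. Following the usual pyrimidine-reference convention for mutational-signature counting, calls on a purine reference are reverse-complemented, so a G>A at TGA is read as a C>T at TCA. The function names are hypothetical, and this is not necessarily the authors' exact pipeline.

```python
from typing import Iterable, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def to_pyrimidine_ref(context: str, alt: str) -> Tuple[str, str]:
    """Re-express an SNV so that the mutated (middle) reference base is C or T.

    context: reference trinucleotide centred on the mutated base, e.g. "TCA".
    alt: alternate base at the middle position.
    """
    if context[1] in "CT":
        return context, alt
    return reverse_complement(context), alt.translate(_COMPLEMENT)


def is_tcn_apobec(context: str, alt: str) -> bool:
    """C>T or C>G at a TpCpN site, the context enriched by APOBEC activity."""
    ctx, a = to_pyrimidine_ref(context.upper(), alt.upper())
    return ctx[:2] == "TC" and a in ("T", "G")


def apobec_fraction(snvs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of SNVs that fall in the TpCpN C>T/C>G class."""
    calls = list(snvs)
    if not calls:
        return 0.0
    return sum(is_tcn_apobec(ctx, alt) for ctx, alt in calls) / len(calls)
```

A raw fraction like this only flags the motif. Formal signature extraction, as performed against COSMIC above, decomposes the full 96-channel spectrum instead.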

To evaluate and compare the composition of signatures in paired cases, we analyzed the IEN and ESCC samples together (Fig. 2E) and found that the combined signatures comprised four types, similar to the signatures derived from the independent ESCC and IEN cohorts (Supplementary Fig. 5C). We found no significant differences between the contributions of IEN and ESCC to the four signatures (signature A, P = .8169416; signature B, P = .3051342; signature C, P = .4628854; signature D, P = .2297273; Fig. 2F). There were still some special cases. For instance, in EC1110 all mutations in the tumor sample (those unique to ESCC as well as shared mutations) were dominated by signature B, so we can infer that the shared mutations contributed to signature B and the mutations specific to IEN contributed to signatures A, C and D. Hence, during malignant transformation in EC1110, signatures A, C and D played a role in the early stage, whereas the trunk and later mutation events derived from signature B, which is associated with DNA damage mutagenesis. To determine which mutations contributed to each signature, we analyzed the percentages of shared and unique mutations and their contributions to the signatures in paired IEN and ESCC samples (Fig. 2G).

Significantly mutated genes are shared between IEN and ESCC stages

Subsequently, we identified the significantly mutated genes in the independent ESCC and IEN cohorts (Fig. 3 and Supplementary Table 7); these were largely similar between ESCC and IEN samples. GISTIC analysis showed that large-scale chromosomal deletions at 9p21.3 (CDKN2A) and 2q35 (ASCL3, FEV) and amplifications at 11q13.3 (CCND1), 5p15.33, 8q24, 2q31.2 (NFE2L2), 8p11.23, 7q22.1 and 3q27 (SOX2) were common in both IEN and ESCC samples (Fig. 3B and Supplementary Table 8), suggesting that these regions or candidate genes were early CNA events. Although IEN and ESCC samples did not differ in CNA size (P = .070), ESCC showed a widespread increase in CNA status. During malignant development, amplifications of 3q26-3q28, 8q24 and 12p13 had wider peak boundaries in ESCC than in IEN samples, which suggests that several cancer genes (ATR, MECOM, PIK3CA, BCL6, MYC, CCND2) were more recurrently affected by CNAs in ESCC samples. We also identified several significant deletion regions in IEN samples (10p11.1, 21p12 and 4q35.2, q < .050) that were not significantly deleted in ESCC samples, which may imply that the genes located in these regions play different roles in IEN and ESCC.

Furthermore, LOH at the TP53 locus was identified in 44% (11/25) of ESCC samples and 30% (8/27) of IEN samples. A similar incidence of LOH for CDKN2A (12% and 15%), FAT2 (24% and 11%), NOTCH family genes (32% and 19%), RB1 (16% and 11%) and YAP1 (12% and 11%) was seen in ESCC and IEN samples, respectively (Supplementary Table 9).

To identify mutations likely linked to ESCC development, we selected 126 candidate genes (see Supplementary Methods and Supplementary Table 10) and analyzed them by TRS in an independent validation cohort of 45 matched tissues (Supplementary Table 1). After integrating the TRS, WES and WGS data, we found no distinct difference in mutation frequency between IEN and ESCC samples (Fig. 4A and Supplementary Table 11). As expected, TP53 was the most frequently mutated gene in both IEN and ESCC samples (71.2% and 74.3%, respectively).

HotNet2 was used to identify altered subnetworks from the recurrent point mutations and CNAs.27 TP53 signaling, NOTCH signaling, ERBB2-PI3K signaling, the DNA repair pathway, NFE2L2-KEAP1 and the SWI/SNF complexes were significantly altered during ESCC progression (Fig. 4B).

Phylogenetic and clonal analyses reveal potential driver events

Given the large number of somatic mutations we identified, we considered the alterations shared between matched IEN and ESCC samples to be trunk events, which might allow us to identify “driver” genes. Because copy number variations typically affect a constellation of genes, to focus on genuine cancer-related genes we further restricted the trunk CNA genes to those located in peak regions and included in the Cancer Gene Census (COSMIC). Trunk mutations combined with CNAs detected in more than 10% of cases (7/68) are shown in Figure 5A. The trunk genes can be divided into three major categories: (a) DNA repair and apoptosis genes (as expected, the top two trunk genes were TP53 and CDKN2A); (b) proliferation genes and oncogenes, whose high amplification frequency suggests that proliferative capability is already activated at the IEN stage; and (c) cell adhesion and junction genes.
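The "more than 10% of cases (7/68)" cutoff is a recurrence filter. With 68 cases, 7/68 ≈ 10.3% passes while 6/68 ≈ 8.8% does not. A sketch follows, assuming each case has already been reduced to the set of genes carrying a trunk alteration (mutation or CNA in a peak region). The function name and the placeholder gene symbols in the usage are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def recurrent_trunk_genes(
    trunk_by_case: Dict[str, Set[str]], min_fraction: float = 0.10
) -> List[Tuple[str, int]]:
    """Genes with a trunk alteration in more than `min_fraction` of cases,
    sorted by decreasing case count.

    Each case contributes a set, so a gene hit by both a mutation and a
    CNA in the same case is counted only once.
    """
    n_cases = len(trunk_by_case)
    if n_cases == 0:
        return []
    counts = Counter(gene for genes in trunk_by_case.values() for gene in genes)
    hits = [(g, c) for g, c in counts.items() if c / n_cases > min_fraction]
    return sorted(hits, key=lambda gc: (-gc[1], gc[0]))
```

Counting cases rather than alterations keeps a gene from passing the cutoff merely because it is hit by several events in a few patients.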

In ESSH samples, mutations had low clonal frequencies (more than 70% of mutations had clonal frequencies below 20%, Supplementary Table 12). By contrast, the high clonal frequencies of trunk mutated genes and SMGs in IEN indicated that key mutations arose at the IEN stage and gave rise to ESCC (Fig. 5B). Mutations in TP53 and CDKN2A showed high clonal frequency (>90%) at the IEN stage in more than 5% of cases (29/73 and 6/73 samples, respectively). Subsequently, mutations in NOTCH1, FAT1, KMT2D and ZFHX4, which were frequent trunk events, grew to clonal dominance during the carcinogenic process. A previous report showed that ZFHX4 can regulate CHD4, a core member of the nucleosome remodeling and deacetylase (NuRD) complex, in glioblastoma.28
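The clonal-frequency statements above reduce to threshold counts over per-mutation estimates. A sketch follows, assuming the clonal frequencies (cellular prevalence, 0 to 1, for example as estimated by PyClone) are already available as plain numbers. The 20% and 90% cutoffs are the ones quoted in the text. The function name is hypothetical.

```python
from typing import Dict, Sequence


def clonality_summary(
    clonal_freqs: Sequence[float], low: float = 0.20, high: float = 0.90
) -> Dict[str, float]:
    """Fractions of mutations below `low` and above `high` clonal frequency.

    clonal_freqs: per-mutation clonal frequency (cellular prevalence) in [0, 1].
    """
    if not clonal_freqs:
        raise ValueError("no clonal frequencies supplied")
    n = len(clonal_freqs)
    return {
        "fraction_below_low": sum(f < low for f in clonal_freqs) / n,
        "fraction_above_high": sum(f > high for f in clonal_freqs) / n,
    }
```

Under this reading, an ESSH-like sample would show a fraction_below_low value above 0.7, whereas a gene-level "clonal" call corresponds to an individual mutation exceeding the `high` cutoff.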

We constructed clonal frequency profiles and phylogenetic trees for the 25 cases subjected to WGS and WES (Fig. 6, Supplementary Fig. 6 and Supplementary Table 12). For case EC1117, the variants in ESSH were clonal mutations of extremely low frequency. Two clusters of clones were observed among the shared mutations. Clone 1, containing 19.1% of the SNVs assessed, was the initial clone and carried key mutated genes such as TP53, SYNE1 and NCOR1. The shared CNAs of CCND1, CDKN2A and FGFR1, together with the trunk mutations with high clonal frequencies, indicated that these were early driver events in carcinogenesis. Surprisingly, many SNVs did not overlap in some paired samples. For instance, cases EC1000, EC1120 and EC1134 showed a low degree of overlap (1.3%, 3.6% and 3.7%, respectively, Supplementary Fig. 6), reflecting genetic heterogeneity between IEN and ESCC. For case EC1000, the few overlapping mutations indicated that the different lesion sites in the esophagus may have undergone independent clonal expansions and become transformed as separate clones.
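The overlap percentages quoted for EC1000, EC1120 and EC1134 summarize how many SNVs the two lesions share. This section does not state the denominator. The sketch below assumes it is the union of SNVs in the pair (the Jaccard index). It uses the <5% overlap mentioned in the Discussion as a hypothetical cutoff for flagging pairs that might represent independent clones. Both the definition and the function names are assumptions, and a different denominator (such as the smaller of the two call sets) would give different numbers.

```python
from typing import Hashable, Set


def snv_overlap(ien: Set[Hashable], escc: Set[Hashable]) -> float:
    """Shared SNVs as a fraction of all SNVs seen in either lesion (Jaccard index)."""
    union = ien | escc
    if not union:
        return 0.0
    return len(ien & escc) / len(union)


def possibly_independent(
    ien: Set[Hashable], escc: Set[Hashable], cutoff: float = 0.05
) -> bool:
    """Flag pairs whose SNV overlap falls below `cutoff`."""
    return snv_overlap(ien, escc) < cutoff
```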

Discussion

Genomic analyses of ESCC and paired precancerous lesions led us to infer an initial genomic progression model of ESCC (Fig. 7). To the best of our knowledge, this is the first study to characterize the different stages of the ESCC carcinogenic process by genomic profiling. First, ESSH samples harbored few somatic alterations compared with IEN and ESCC samples, and most mutation sites in ESSH had extremely low clonal frequencies, which suggests that ESSH is an adaptive response. However, we found no distinct barrier between IEN and ESCC in the genetic variant spectrum. The high frequency of mutations and the large-scale chromosomal aberrations shared between IEN and ESCC indicate that genetic stability has already collapsed in IEN. Although IEN and ESCC have distinct histological patterns, our analyses suggest that the genetic alterations in IEN may be analogous to those in tumor cells. The shared clones with high clonal frequencies and the key mutations identified in matched IEN and ESCC samples suggest that the initial tumor clones formed early, at the IEN stage.

Intriguingly, despite the shared genetic changes in paired ESCC and IEN samples, a high proportion of private mutation sites provided evidence of genetic heterogeneity,12, 29 similar to that seen in paired Barrett’s esophagus and EAC samples.30,31 In addition, some cases showed high heterogeneity between paired IEN and ESCC samples (three pairs with <5% SNV overlap), implying the possibility of multifocal lesions in the esophagus. The microenvironment, such as the composition of the microbiota, inflammation, or other environmental factors, probably contributes to these multifocal lesions.32

APOBEC-mediated signatures were among the major mutation types in both IEN and ESCC samples. Activated AID/APOBEC deaminases can convert cytosine (dC) to uracil (dU) and ultimately initiate C>T/G hypermutation at TpCpN motifs.23, 33 Because efficient dC deamination by APOBECs requires single-stranded DNA generated by DNA damage (DNA double-strand and single-strand breaks) or by replication fork stalling, this signature is considered to reflect DNA damage-related mutagenesis.25, 34 Subsequently, we detected high expression of γH2AX, a visualized marker of DNA damage, in IEN, which confirms DNA damage as an important mutation pattern of the esophageal carcinogenic process as well as of multiple types of precancerous lesions.35-37 Analyses of the panoramic network and of the phylogenetic process showed that the majority of cases harbored defects in DNA damage repair and the cell cycle at an early stage that persisted during ESCC progression; such defects could help damaged cells escape the early barriers of tumorigenesis and could underlie initial tumor clone formation.37, 38

Combining trunk mutation frequencies with clonal analysis, we identified TP53 and CDKN2A as the early mutated driver genes in IEN. In a hypothesized progression model of EAC, TP53 mutations were likewise emphasized as early events in Barrett’s esophagus (a precancerous lesion of EAC).30 However, we also found that the two types of esophageal carcinogenic processes differ in some respects: CNAs are rare in Barrett’s epithelium,30, 31 whereas a high degree of CNAs was observed in IEN. These findings are consistent with a study from The Cancer Genome Atlas showing that ESCC and EAC have distinct molecular characteristics. For example, amplifications of CCND1 and SOX2 and mutations in NOTCH1 were more common in ESCC than in EAC.39, 40

Our analyses also found that more than 30% of ESCC cases harbored CCND1, SOX2, and MYC amplifications as trunk events. Activation of these proliferation genes and oncogenes at the IEN stage could aggravate DNA damage in precancerous lesion cells through stalling and collapse of DNA replication forks,41 while also inducing aberrant proliferation of the initial tumor cells and promoting their progression, directly or indirectly. Another observation was the high frequency of altered genes involved in cell adhesion and junctions. We propose that genetic defects in adhesion genes also play an important role in the initial process of carcinogenesis. For example, PCDH9, which we identified as a trunk gene, has been shown to suppress colony formation, metastasis, and invasion of tumor cells.42, 43 Perturbed adhesion function at the IEN stage could facilitate infiltration by the initial tumor clonal cells.


A preliminary background investigation in the Chaoshan area showed a high prevalence of chronic inflammation (68.85%) in populations at high risk of ESCC. In our previous studies, we found that Helicobacter pylori (H. pylori) infection and the related chronic inflammation may contribute to the high incidence of ESCC and gastric cardia cancer (GCC) in the Chaoshan area and may lead to dysplasia of epithelial tissues.44, 45 The close correlation among cellular atypia, DNA damage status, and inflammation suggests that chronic inflammation may be an important pathogenic factor contributing to the neoplastic process in this high-incidence region for esophageal cancer in China.


In the current study, we present a preliminary exploration of genetic variations in the multistage carcinogenesis of ESCC. The data can be used to model key genetic events in esophageal tumorigenesis, to identify potential biomarkers for early detection, and to provide new insights into the architecture of ESCC progression. However, we still lack direct evidence that inflammation is a pathogenic factor in ESCC development, and we have not yet identified the “last straw” events that drive the progression of IEN to infiltrating carcinoma; these remain to be explored through a holistic approach to epigenetics.


Acknowledgments


We thank Bruce AJ Ponder, Dante Neculai, and Zeev Ronai for their help in revising the manuscript, and all the staff and assistants in the Department of Pathology of Shantou University Medical College for their support in collecting samples.


References


1. Rustgi AK, El-Serag HB. Esophageal carcinoma. N Engl J Med 2014;371:2499-509.

2. Pennathur A, Farkas A, Krasinskas AM, et al. Esophagectomy for T1 esophageal cancer: outcomes in 100 patients and implications for endoscopic therapy. Ann Thorac Surg 2009;87:1048-1055.

3. Enzinger PC, Mayer RJ. Esophageal cancer. N Engl J Med 2003;349:2241-2252.

4. Wang GQ, Abnet CC, Shen Q, et al. Histological precursors of oesophageal squamous cell carcinoma: results from a 13 year prospective follow up study in a high risk population. Gut 2005;54:187-92.

5. Dawsey SM, Lewin KJ, Wang GQ, et al. Squamous esophageal histology and subsequent risk of squamous cell carcinoma of the esophagus. A prospective follow-up study from Linxian, China. Cancer 1994;74:1686-92.

6. Song YM, Li L, Ou YW, et al. Identification of genomic alterations in oesophageal squamous cell cancer. Nature 2014;509:91-+.

7. Gao YB, Chen ZL, Li JG, et al. Genetic landscape of esophageal squamous cell carcinoma. Nat Genet 2014;46:1097-102.

8. Lin DC, Hao JJ, Nagata Y, et al. Genomic and molecular characterization of esophageal squamous cell carcinoma. Nat Genet 2014;46:467-73.

9. Sawada G, Niida A, Uchi R, et al. Genomic landscape of esophageal squamous cell carcinoma in a Japanese population. Gastroenterology 2016;150:1171-82.

10. Gao H, Wang LD, Zhou Q, et al. p53 tumor suppressor gene mutation in early esophageal precancerous lesions and carcinoma among high-risk populations in Henan, China. Cancer Res 1994;54:4342-6.

11. Roth MJ, Hu N, Emmert-Buck MR, et al. Genetic progression and heterogeneity associated with the development of esophageal squamous cell carcinoma. Cancer Res 2001;61:4098-104.

12. Liu M, Zhang F, Liu S, et al. Loss of heterozygosity analysis of microsatellites on multiple chromosome regions in dysplasia and squamous cell carcinoma of the esophagus. Exp Ther Med 2011;2:997-1001.

13. Shimizu T, Marusawa H, Matsumoto Y, et al. Accumulation of somatic mutations in TP53 in gastric epithelium with Helicobacter pylori infection. Gastroenterology 2014;147:407-+.

14. Chiba T, Marusawa H, Ushijima T. Inflammation-associated cancer development in digestive organs: mechanisms and roles for genetic and epigenetic modulation. Gastroenterology 2012;143:550-563.

15. Lin R, Zhang C, Zheng J, et al. Chronic inflammation-associated genomic instability paves the way for human esophageal carcinogenesis. Oncotarget 2016.

16. Cibulskis K, Lawrence MS, Carter SL, et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol 2013;31:213-219.

17. Dees ND, Zhang QY, Kandoth C, et al. MuSiC: identifying mutational significance in cancer genomes. Genome Res 2012;22:1589-1598.

18. Boeva V, Popova T, Bleakley K, et al. Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics 2012;28:423-425.

19. Roth A, Khattra J, Yap D, et al. PyClone: statistical inference of clonal population structure in cancer. Nat Methods 2014;11:396-+.

20. Carter SL, Cibulskis K, Helman E, et al. Absolute quantification of somatic DNA alterations in human cancer. Nat Biotechnol 2012;30:413-21.

21. Nik-Zainal S, Alexandrov LB, Wedge DC, et al. Mutational processes molding the genomes of 21 breast cancers. Cell 2012;149:979-993.

22. Wang K, Yuen ST, Xu J, et al. Whole-genome sequencing and comprehensive molecular profiling identify new driver mutations in gastric cancer. Nat Genet 2014;46:573-82.

23. Roberts SA, Lawrence MS, Klimczak LJ, et al. An APOBEC cytidine deaminase mutagenesis pattern is widespread in human cancers. Nat Genet 2013;45:970-+.

24. Burns MB, Temiz NA, Harris RS. Evidence for APOBEC3B mutagenesis in multiple human cancers. Nat Genet 2013;45:977-83.

25. Rebhandl S, Huemer M, Greil R, et al. AID/APOBEC deaminases and cancer. Oncoscience 2015;2:320-33.

26. Zhang L, Zhou Y, Cheng CX, et al. Genomic analyses reveal mutational signatures and frequently altered genes in esophageal squamous cell carcinoma. Am J Hum Genet 2015;96:597-611.

27. Leiserson MD, Vandin F, Wu HT, et al. Pan-cancer network analysis identifies combinations of rare somatic mutations across pathways and protein complexes. Nat Genet 2015;47:106-14.

28. Chudnovsky Y, Kim D, Zheng SY, et al. ZFHX4 interacts with the NuRD core member CHD4 and regulates the glioblastoma tumor-initiating cell state. Cell Rep 2014;6:313-324.

29. Shimada M, Yanagisawa A, Kato Y, et al. Genetic mechanisms in esophageal carcinogenesis: frequent deletion of 3p and 17p in premalignant lesions. Genes Chromosomes Cancer 1996;15:165-9.

30. Stachler MD, Taylor-Weiner A, Peng S, et al. Paired exome analysis of Barrett's esophagus and adenocarcinoma. Nat Genet 2015.

31. Ross-Innes CS, Becq J, Warren A, et al. Whole-genome sequencing provides new insights into the clonal architecture of Barrett's esophagus and esophageal adenocarcinoma. Nat Genet 2015.

32. Nasrollahzadeh D, Malekzadeh R, Ploner A, et al. Variations of gastric corpus microbiota are associated with early esophageal squamous cell carcinoma and squamous dysplasia. Sci Rep 2015;5.

33. Alexandrov LB, Nik-Zainal S, Wedge DC, et al. Signatures of mutational processes in human cancer (vol 500, pg 415, 2013). Nature 2013;502.

34. Chelico L, Pham P, Goodman MF. Mechanisms of APOBEC3G-catalyzed processive deamination of deoxycytidine on single-stranded DNA. Nat Struct Mol Biol 2009;16:454-455.

35. Gorgoulis VG, Vassiliou LV, Karakaidos P, et al. Activation of the DNA damage checkpoint and genomic instability in human precancerous lesions. Nature 2005;434:907-13.

36. Ma N, Tagawa T, Hiraku Y, et al. 8-Nitroguanine formation in oral leukoplakia, a premalignant lesion. Nitric Oxide 2006;14:137-43.

37. Bartkova J, Horejsi Z, Koed K, et al. DNA damage response as a candidate anti-cancer barrier in early human tumorigenesis. Nature 2005;434:864-870.

38. Campisi J. Suppressing cancer: the importance of being senescent. Science 2005;309:886-7.

39. Agrawal N, Jiao YC, Bettegowda C, et al. Comparative genomic analysis of esophageal adenocarcinoma and squamous cell carcinoma. Cancer Discov 2012;2:899-905.

40. Cancer Genome Atlas Research Network, Analysis Working Group: Asan University, BC Cancer Agency, et al. Integrated genomic characterization of oesophageal carcinoma. Nature 2017;541:169-175.

41. Halazonetis TD, Gorgoulis VG, Bartek J. An oncogene-induced DNA damage model for cancer development. Science 2008;319:1352-5.

42. Wang CL, Tao BB, Li ST, et al. Characterizing the role of PCDH9 in the regulation of glioma cell apoptosis and invasion. J Mol Neurosci 2014;52:250-260.

43. Chen Y, Xiang HG, Zhang YF, et al. Loss of PCDH9 is associated with the differentiation of tumor cells and metastasis and predicts poor survival in gastric cancer. Clin Exp Metastasis 2015;32:417-428.

44. Li WS, Tian DP, Guan XY, et al. Esophageal intraepithelial invasion of Helicobacter pylori correlates with atypical hyperplasia. Int J Cancer 2014;134:2626-2632.

45. Wang YS, Liu SH, Zhang Y, et al. Helicobacter pylori infection and gastric cardia cancer in Chaoshan region. Microbes Infect 2014;16:840-844.

Author names in bold designate shared co-first authorship.

Figure legends

Figure 1. DNA damage and activation of NF-κB correlated with inflammation.

(A) Correlation between inflammation score and histologic type (Spearman’s rank correlation test, two-sided). (B, C) Quantification of γH2AX-positive cells in tissue samples of different histologic types and with different inflammation scores (nonparametric test, two-sided). (D) The top panel shows hematoxylin and eosin (H&E) staining of esophageal epithelium. Immunohistochemistry (IHC) confirms a stepwise increase in the expression of γH2AX and NF-κB with increasing cellular atypia and degree of inflammation. Scale bars, 100 µm.

Figure 2. Mutation spectrum shared by IEN and ESCC (ESCC, n = 25; IEN, n = 27; ESSH, n = 14).

(A, B) Mutation rates and number of CNA base pairs for ESSH, IEN, and ESCC (Wilcoxon rank-sum test, two-sided). (C) Analysis of mutation spectra. Base substitutions are classified into six types, and the percentage of each type is calculated in the three cohorts. (D) Proportion of SNVs occurring in specific nucleotide motif contexts for each category of single-nucleotide substitution. (E) Three mutational signatures extracted from WGS and WES data derived from 25 paired ESCC and IEN samples. The major components contributing to each signature are highlighted by arrows. (F) Comparison of the percentage contribution of signatures between paired IEN and ESCC samples; P values were computed by nonparametric test. (G) The percentage of unique and shared SNVs and the relative contributions of mutational signatures in individual cases.
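Panel C groups base substitutions into six types without naming them. A common convention, assumed here rather than stated in the legend, reports every substitution from the pyrimidine (C or T) of the mutated base pair, giving C>A, C>G, C>T, T>A, T>C, and T>G. The minimal Python sketch below shows that collapsing and the per-cohort percentages; the function names and the toy cohort are hypothetical and are not the authors' pipeline.

```python
from collections import Counter

# Watson-Crick complements, used to report purine-reference SNVs
# from the opposite (pyrimidine) strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# The six pyrimidine-centred substitution classes (assumed convention).
SIX_CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")


def substitution_class(ref: str, alt: str) -> str:
    """Map one SNV (reference base, alternate base) to its six-class label."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):  # purine reference: flip both bases to the other strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum_percentages(snvs):
    """Percentage of each six-class label among (ref, alt) pairs of one cohort."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in snvs)
    total = sum(counts.values())
    if total == 0:
        return {label: 0.0 for label in SIX_CLASSES}
    return {label: 100.0 * counts[label] / total for label in SIX_CLASSES}


# Hypothetical toy cohort: G>A is reported as C>T and A>G as T>C.
toy_cohort = [("C", "T"), ("G", "A"), ("A", "G"), ("C", "A")]
print(spectrum_percentages(toy_cohort))
```

Extending each label with its 5' and 3' neighbouring bases (reverse-complemented together with the pair) would give the 96 trinucleotide contexts from which signatures such as those in panel E are usually extracted.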

Figure 3. The variation landscape of ESCC, IEN, and ESSH (ESCC, n = 25; IEN, n = 27; ESSH, n = 14).

(A) Significantly mutated genes from WES and WGS and genes selected from the literature (marked with *) are ranked in the top panel.6-9 Cancer genes (based on COSMIC analysis) within the CNA peak regions are shown in the bottom panel. Samples are grouped by clinical features. Altered sample counts are shown in the left panel. Based on mutation rate and frequency of altered genes, ESSH samples differed markedly from IEN and ESCC samples. CN, copy number. (B) GISTIC analysis of ESCC and IEN samples. G-scores of genomic amplifications and deletions of ESCC and IEN are plotted as curves. Genes located in the most significant regions are shown.

Figure 4. The frequency of recurrently mutated genes and gene networks in ESCC and IEN.

(A) Percentage of samples with recurrently mutated genes (mutated in ≥7 samples) identified in the ESCC and IEN cohorts (ESCC, n = 70; IEN, n = 73). (B) Subnetworks based on somatic mutations (deleterious mutations) and CNAs (CN ≤ 1 or ≥ 4 copies) were defined by HotNet2 analysis and annotated with KEGG and Reactome pathways.

Figure 5. Clonal evolution and driver genes in paired ESCC and IEN cases.

(A) Frequency of trunk-altered genes detected in more than 10% of cases. (B) Clonal frequencies of somatic mutations in paired samples. Square shading encodes the clonal frequency (CF, the percentage of tumor cells).
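Panel A counts trunk alterations across cases. The minimal Python sketch below illustrates that bookkeeping under a simplifying assumption that is not taken from the paper: each sample is reduced to a set of altered gene symbols, and an alteration shared by the paired IEN and ESCC samples is treated as trunk. The real analysis also used clonal frequencies (panel B), which this toy version ignores; the case IDs and gene sets are hypothetical.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# Hypothetical input: case ID -> (genes altered in IEN, genes altered in ESCC).
PairedCases = Dict[str, Tuple[Set[str], Set[str]]]


def split_trunk_branch(ien: Set[str], escc: Set[str]):
    """Return (trunk, IEN-only branch, ESCC-only branch) for one paired case.

    Assumption: an alteration present in both lesions is treated as trunk.
    """
    trunk = ien & escc
    return trunk, ien - trunk, escc - trunk


def frequent_trunk_genes(cases: PairedCases, min_fraction: float = 0.10) -> Dict[str, float]:
    """Fraction of cases carrying each trunk gene, kept if above min_fraction."""
    counts: Counter = Counter()
    for ien, escc in cases.values():
        trunk, _, _ = split_trunk_branch(ien, escc)
        counts.update(trunk)  # each trunk gene counted once per case
    n_cases = len(cases)
    if n_cases == 0:
        return {}
    return {gene: n / n_cases for gene, n in counts.items() if n / n_cases > min_fraction}


# Hypothetical toy pairs, not data from this study.
cases = {
    "case_1": ({"TP53", "CDKN2A", "PCDH9"}, {"TP53", "CDKN2A", "NOTCH1"}),
    "case_2": ({"TP53"}, {"TP53", "MYC"}),
}
print(frequent_trunk_genes(cases, min_fraction=0.5))
```

The default `min_fraction=0.10` mirrors the "more than 10% of cases" threshold in the legend; with only two toy cases a higher threshold is used in the example so that the filter has a visible effect.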

Figure 6. Evolutionary process of individual cases.

(A) Clonal frequency comparison of SNVs detected in IEN and ESCC samples. The x-axis and y-axis correspond to the clonal frequency in IEN and ESCC samples, respectively. The number of mutations in each clone cluster was calculated. Genes with extremely low clonal frequency are marked with an asterisk (*). (B) The inferred phylogenetic trees. The numbers indicate the number of nonsynonymous mutations. Trunk and branch lengths are proportional to the number of somatic mutations. Key mutations identified by clonal frequency analysis are marked with arrows. (C, D) Copy number alterations and beta allele frequency (BAF) profiles across chromosomes. Cancer genes and frequently altered genes are labeled. Purple boxes indicate alterations shared between the paired samples. For BAF profiles, the allelic imbalance relative to the expected allelic ratio (0.5:0.5) of germline heterozygous single-nucleotide polymorphisms (SNPs) is plotted on the y-axis. Predicted BAF is shown in black and loss of heterozygosity (LOH) in blue.
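Panels C and D plot BAF at germline heterozygous SNPs, where a balanced locus is expected near 0.5. The minimal Python sketch below uses the common read-count definition BAF = B-allele reads / total reads and measures imbalance as |BAF - 0.5|; treating the alternate allele as "B" and the 0.2 cutoff are hypothetical choices for illustration. The paper cites Control-FREEC (reference 18) for copy-number and allelic-content estimation, and this sketch does not reproduce that tool's segmentation or LOH calling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HetSnp:
    """Read counts at one germline heterozygous SNP in the lesion sample."""
    position: int
    ref_reads: int
    alt_reads: int  # treated as the "B" allele (assumption)


def b_allele_frequency(snp: HetSnp) -> float:
    """BAF = B-allele reads / total reads at the SNP."""
    depth = snp.ref_reads + snp.alt_reads
    if depth == 0:
        raise ValueError(f"no coverage at position {snp.position}")
    return snp.alt_reads / depth


def allelic_imbalance(snp: HetSnp) -> float:
    """Deviation from the balanced 0.5:0.5 ratio, from 0 (balanced) to 0.5."""
    return abs(b_allele_frequency(snp) - 0.5)


def imbalanced_positions(snps: List[HetSnp], cutoff: float = 0.2) -> List[int]:
    """Positions whose imbalance exceeds a hypothetical cutoff."""
    return [s.position for s in snps if allelic_imbalance(s) > cutoff]


# Hypothetical read counts: one balanced SNP and two strongly skewed ones.
example = [HetSnp(100, 30, 30), HetSnp(200, 45, 5), HetSnp(300, 2, 48)]
print(imbalanced_positions(example))
```

In a pure sample, complete LOH pushes BAF toward 0 or 1, but contamination by normal cells pulls it back toward 0.5, so a fixed cutoff like this is only a rough screen; dedicated tools model tumor purity and copy number explicitly.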

Figure 7. Model of ESCC development.

A preliminary model of progression from hyperplasia through intraepithelial neoplasia to infiltrating carcinoma.
