Identifying potential DNA methylation markers in early-stage colorectal Cancer


Journal Pre-proof Identifying potential DNA methylation markers in early-stage colorectal Cancer

Xiaoyu Zhang, Shenmei Wan, Yanqi Yu, Weimei Ruan, Hong Wang, Linhao Xu, Chanjuan Wang, Shang Chen, Tianfeng Cao, Quanzhou Peng, Sihui Li, Tianliang Hu, Zeyu Jiang, Zhiwei Chen, Jian-Bing Fan

PII: S0888-7543(20)30306-2

DOI: https://doi.org/10.1016/j.ygeno.2020.06.007

Reference: YGENO 9578

To appear in: Genomics

Received date: 24 March 2020

Revised date: 29 April 2020

Accepted date: 3 June 2020

Please cite this article as: X. Zhang, S. Wan, Y. Yu, et al., Identifying potential DNA methylation markers in early-stage colorectal Cancer, Genomics (2019), https://doi.org/10.1016/j.ygeno.2020.06.007

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier.

Identifying Potential DNA Methylation Markers in Early-stage Colorectal Cancer

Xiaoyu Zhang1, Shenmei Wan1, Yanqi Yu1, Weimei Ruan2, Hong Wang2, Linhao Xu2, Chanjuan Wang1, Shang Chen1, Tianfeng Cao1, Quanzhou Peng1, Sihui Li2, Tianliang Hu2, Zeyu Jiang2, Zhiwei Chen2,3,* [email protected], Jian-Bing Fan1,2,* [email protected]

1 Department of Pathology, School of Basic Medical Science, Southern Medical University, Guangzhou, China, 510515.

2 AnchorDx Medical Co., Ltd, Unit 502, 3rd Luoxuan Road, International Bio-Island, Guangzhou, China, 510300.

3 AnchorDx, Inc., 46305 Landing Pkwy, Fremont, CA, United States 94538

* Corresponding authors.

ABSTRACT

Colorectal cancer (CRC) is the second leading malignancy worldwide. Accurate screening is pivotal to early CRC detection, yet current screening modalities involve invasive colonoscopy, while non-invasive FIT tests have limited sensitivity. We applied a DNA methylation assay to identify biomarkers for early-stage CRC detection, risk stratification and precancerous lesion screening at tissue level. A model of the biomarkers SFMBT2, ITGA4, THBD and ZNF304 showed 96.1% sensitivity and 87% specificity in CRC detection, with 100% sensitivity for advanced precancerous lesions and stage I CRC. Performance was further validated with the TCGA data set, in which the model showed a consistent AUC of 0.99 and exhibited specificity against other cancer types. KCNJ12, VAV3-AS1 and EVC were further identified for stage stratification (stage 0-I versus stage II-IV), with an AUC of 0.87, 83.0% sensitivity and 71.2% specificity. Additionally, the dual markers NEUROD1 and FAM72C showed 83.2% sensitivity and 77.4% specificity in differentiating non-advanced precancerous lesions from inflammatory bowel diseases.

Keywords: methylation profiling; methylation biomarkers; early-stage colorectal cancer

Abbreviations

CRC colorectal cancer;
APL advanced precancerous lesions;
IBD inflammatory bowel disease;
NAPL non-advanced precancerous lesions;
NPV negative predictive value;
HP hyperplastic polyps;
HGIN high grade intraepithelial neoplasia;
LGIN low grade intraepithelial neoplasia;
FFPE formalin-fixed paraffin-embedded;
H & E hematoxylin and eosin;
AUC area under curve;
CT computed tomography;
MRI magnetic resonance imaging;
ROC receiver operating characteristic;
BRCA breast invasive carcinoma;
COAD colon adenocarcinoma;
ESCA esophageal carcinoma;
LIHC liver hepatocellular carcinoma;
LUAD lung adenocarcinoma;
LUSC lung squamous cell carcinoma;
PRAD prostate adenocarcinoma;
READ rectum adenocarcinoma;
STAD stomach adenocarcinoma;
TCGA The Cancer Genome Atlas;
MSS microsatellite stable;
MSI microsatellite instable

INTRODUCTION

Colorectal cancer (CRC) is the fourth most common and second most deadly cancer globally, with an estimated 1.8 million new cases and 881,000 deaths in 2018 [1]. Patients diagnosed with localized CRC have much better survival, with a 5-year survival rate of 89.9% compared to 14.2% in patients diagnosed with distant CRC (stage IV) [2]. Therefore, screening of average-risk populations that enables early detection of CRC and early removal of polyps is of significant clinical utility. An estimated 60% reduction in annual mortality and a 73% increase in 5-year survival rate may be expected if regular screening is carried out [3]. The two mainstays of screening are non-invasive stool/blood-based screening tests and invasive visualization screening tests [4]. The gold standard of CRC diagnosis remains colonoscopy or sigmoidoscopy followed by pathological confirmation of the biopsied lesion [5, 6]. Despite the relatively high reliability of colonoscopy in detecting early stage CRC (stage I), the invasive test causes discomfort to patients and is costly, resulting in lower patient compliance than with non-invasive tests [5]. The non-invasive screening tests detect genome instability, somatic mutations and abnormal DNA methylation. The stool-based screening tests (e.g. Cologuard from Exact Sciences) have overall CRC sensitivities of 73.8% - 92.3% compared to colonoscopy as non-invasive alternatives for CRC screening [7]. However, these tests show decreased sensitivities of 46.2% - 69.2% against polyps with high-grade dysplasia, and even worse sensitivities of 23.8% - 42.4% for detection of advanced precancerous lesions [7]. Similarly, the blood-based DNA methylation test, the mSEPT9 assay (Epigenomics), offers limited sensitivities of 35% and 11% in detecting stage I CRC and advanced adenoma, respectively [8]. Identifying biomarkers associated with early stage CRC is therefore pivotal to non-invasive early detection of CRC.

In this study, we aim to identify biomarkers that can potentially facilitate systematic CRC diagnostic management. In this scheme, CRC is first detected with an emphasis on high sensitivity for early stage CRC (early detection markers). Subsequently, patients identified with CRC can be stratified by stage (stratification markers), and patients identified as non-CRC can be further assessed for non-advanced precancerous lesions (precancerous markers) to reduce excessive colonoscopy.

MATERIALS AND METHODS

Study populations

The formalin-fixed paraffin-embedded (FFPE) tissue samples used in this study were collected from the Southern Hospital of Southern Medical University from November 2017 to May 2019. A total of 95 participants were recruited, among whom 18 were diagnosed with advanced precancerous lesions (APL), including those with cancer in situ or adenoma with high grade dysplasia, 22 with stage I CRC, and 18 with stage II/III/IV CRC. The cohort also included tissue samples from participants diagnosed with non-advanced precancerous lesions (NAPL) or inflammatory bowel disease (IBD). The NAPL group included samples from patients with hyperplastic polyps (HP), tubular adenoma, low grade tubulovillous adenoma, or low grade intraepithelial neoplasia (LGIN). Representative hematoxylin and eosin (H & E) sections of all samples were reviewed for pathology confirmation and estimation of tumor content. Samples from CRC patients with a tumor content of less than 30% were excluded from the analysis. The patient characteristics, including age, gender, lesion histology, tumor location and staging information, are summarized in Table 1. The Human Methylation 450K array data sets and clinical characteristics of breast invasive carcinoma (BRCA), colon adenocarcinoma (COAD), esophageal carcinoma (ESCA), liver hepatocellular carcinoma (LIHC), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), prostate adenocarcinoma (PRAD), rectum adenocarcinoma (READ) and stomach adenocarcinoma (STAD), and the RNA-seq data of COAD and READ, were obtained from The Cancer Genome Atlas (TCGA).

Ethical approvals

The study was conducted under the approval of the local and regional institutional review boards of the Southern Medical University. Informed consent was obtained from all participants.

DNA extraction, bisulfite treatment and methylation analysis

Genomic DNA was extracted from the FFPE tissue samples with the AllPrep DNA/RNA FFPE Kit (Qiagen, Germany) according to the manufacturer's instructions. A total of 20 ng of genomic DNA from each sample was subjected to bisulfite treatment with the EZ-96-DNA Methylation-Direct MagPrep Kit (Zymo Research, United States). The bisulfite-modified DNA was further analyzed with a 21-gene DNA methylation panel according to the manufacturer's instructions (AnchorDx, China). The EpiTect PCR Control DNA Set (Qiagen, Germany) served as positive and negative controls. The qPCR reactions were carried out on the QuantStudio 3 Real-Time PCR System (Thermo Fisher, United States).

Data and statistical analysis

The ∆Ct was used to represent the co-methylation level in the target region of interest, where ∆Ct = Mean Ct (target region of interest) - Mean Ct (region of internal control). For regions with undetermined Ct values, an artificial ∆Ct of 35 was assigned. Heatmaps and clustering of the methylation markers were generated using the R package ComplexHeatmap. The clinical performance of individual markers and of the finalized classification models was evaluated using the R pROC package with the bootstrap method, using 2,000 and 100 resamplings, respectively. Methylation levels of individual markers in the TCGA cohort were calculated as the average methylation values of the CpG sites within the marker regions. The logistic regression-based models were constructed using the Python scikit-learn (sklearn) package. One-way ANOVA with Tukey post hoc comparison was used for statistical analysis of the distributions of risk probabilities among different test groups. Significance was determined at p < 0.05 using Prism 7.
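As an illustration of the ∆Ct definition above, the following minimal Python sketch computes ∆Ct from replicate Ct values. The function name `delta_ct`, the use of `None` for an undetermined Ct, and the rule that any undetermined target replicate triggers the artificial value are assumptions made for this example; the text specifies only the formula and the assigned value of 35.

```python
from statistics import mean

# Artificial ∆Ct assigned to regions with undetermined Ct values (from the text).
UNDETERMINED_DELTA_CT = 35.0


def delta_ct(target_cts, control_cts):
    """Return Mean Ct(target region) - Mean Ct(internal control region).

    `None` marks an undetermined Ct. Assumption for this sketch: if any
    target replicate is undetermined, the artificial ∆Ct is returned.
    """
    if any(ct is None for ct in target_cts):
        return UNDETERMINED_DELTA_CT
    return mean(target_cts) - mean(control_cts)


# Hypothetical replicate Ct values, for illustration only.
print(delta_ct([28.0, 29.0], [24.0, 25.0]))  # 4.0
print(delta_ct([None, 30.0], [24.0, 25.0]))  # 35.0
```

Because ∆Ct is inversely related to methylation level, a smaller value in this sketch corresponds to a more heavily methylated target region.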

RESULTS

Clinical features of the cohort

Clinical features of the cohort, including age and gender of all groups, lesion histology, tumor location and CRC staging, are shown in Table 1. There were no significant differences in any of the clinical features among the groups. The distributions of age and gender in all groups and of lesion histology were epidemiologically consistent with reported CRC data [5, 9-11]. To evaluate biomarkers for early-stage CRC detection more accurately, the cohort was designed to recruit more stage I CRC patients and patients with APL. All patients diagnosed with CRC in the cohort were microsatellite stable (MSS).

Identification of DNA methylation markers associated with CRC

We tested DNA methylation levels of 21 gene regions in tissue samples from the five groups: IBD, NAPL, APL, stage I CRC and stage II-IV CRC. Methylation was quantified with the targeted assay, with ∆Ct values inversely representing the DNA methylation level of each gene region. Heatmaps of methylation patterns with marker clustering from the different test groups are shown in Figure 1. An overall higher methylation level was observed in the CRC/APL group compared to the non-CRC group (IBD and NAPL), indicating the potential binary classification power of these methylation markers. We further evaluated the clinical performance of individual markers in the CRC group or the early stage CRC group (APL and stage I CRC) compared to the non-CRC group to identify the best performing markers associated with CRC and early stage CRC. The performance characteristics of the top markers are shown in Table 2. Among the top markers, SFMBT2 showed the highest area under curve (AUC), 0.96 and 0.97 for detecting CRC and early stage CRC, respectively. Marker C9orf50 detected CRC and early-stage CRC at mean sensitivities of 94.1% (95%CI: 80.4%-100.0%) and 97.2% (95%CI: 83.3%-100.0%) in 2,000 bootstrap samplings, and ZNF304 had a mean specificity of 100% in classifying both CRC and early stage CRC. We also observed highly consistent performance and marker clustering of these markers in detection of CRC and early stage CRC, suggesting that these top markers are DNA methylation signatures of CRC development.
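The mean sensitivities and 95% confidence intervals quoted above come from bootstrap resampling. The sketch below shows a generic percentile bootstrap for sensitivity in plain Python. It is not the pROC procedure used in the study, which may resample and choose thresholds differently; the function name `bootstrap_sensitivity`, the fixed per-case calls and the example counts are assumptions for illustration.

```python
import random


def bootstrap_sensitivity(calls, n_boot=2000, seed=0):
    """Percentile-bootstrap estimate of sensitivity.

    `calls` holds one boolean per true case: True if the marker called
    the case positive. Returns (mean, lower, upper), where lower and
    upper approximate the 2.5th and 97.5th percentiles.
    """
    rng = random.Random(seed)
    n = len(calls)
    stats = []
    for _ in range(n_boot):
        # Resample the cases with replacement and recompute sensitivity.
        resample = [calls[rng.randrange(n)] for _ in range(n)]
        stats.append(sum(resample) / n)
    stats.sort()
    lower = stats[int(0.025 * n_boot)]
    upper = stats[int(0.975 * n_boot) - 1]
    return sum(stats) / n_boot, lower, upper


# Hypothetical example: 48 of 51 cases detected (not data from the study).
mean_sens, ci_low, ci_high = bootstrap_sensitivity([True] * 48 + [False] * 3)
print(f"{mean_sens:.3f} ({ci_low:.3f}-{ci_high:.3f})")
```

With small groups, intervals of this kind are wide, which is consistent with the broad confidence intervals reported for individual markers.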

Development of diagnostic model for CRC early detection

Based on the performance of individual markers, we took the top performing markers (Table 2) forward to develop a methylation model for CRC early detection. To minimize marker redundancy, we limited the number of markers from the same hierarchical clustering group (Table 2) to at most two in the model. An exhaustive search of all marker combinations was used to identify the best performing marker combination with consistent performance characteristics in 100 splits of the data set using logistic regression. A methylation model combining SFMBT2, ITGA4, THBD and ZNF304 was finalized; the receiver operating characteristic (ROC) curves in the training and test sets, compared to individual markers, are shown in Figure 2A and 2B. The model achieved an AUC of 0.96, sensitivity of 96.1% and specificity of 87% in the test set at a cut-off threshold of 0.51, compared to individual markers' sensitivities of 85.7%, 85.7%, 84.1% and 70.0% and specificities of 87.0%, 87.0%, 87.0% and 100.0% for SFMBT2, ITGA4, THBD and ZNF304, respectively. The detection sensitivities of the model against different stages of CRC are shown in Figure 2C and Table 3. The distributions of CRC risk probability generated by the model in the different disease groups are shown in Figure 2D. The methylation model detected 100.0% of APL and stage I CRC, with 100.0% sensitivity for both T1 and T2 lesions, and 87.0% of stage II-IV CRC at tissue level, with significant differences (p<0.05) in risk probability compared to the IBD and NAPL groups. Compared to individual markers, the methylation model had superior sensitivity in detecting APL, indicating that combining multiple methylation markers enhances sensitivity for early CRC detection. The detection sensitivities with regard to tumor position were also evaluated, and the model showed high sensitivities against both left-sided (95.8% and 93.8%) and right-sided tumors (100.0%) in the training and test sets (Table 3).
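The redundancy constraint in the exhaustive search described in this section can be sketched as follows. This is a minimal illustration that only enumerates candidate panels with at most two markers per clustering group; the marker names `M1` to `M5`, their group labels and the function name `candidate_panels` are hypothetical, and the scoring step (fitting a logistic regression on repeated splits of the data) is not shown.

```python
from collections import Counter
from itertools import combinations


def candidate_panels(marker_groups, panel_size, max_per_group=2):
    """Yield marker panels with at most `max_per_group` markers drawn
    from any single hierarchical clustering group."""
    for panel in combinations(sorted(marker_groups), panel_size):
        per_group = Counter(marker_groups[m] for m in panel)
        if max(per_group.values()) <= max_per_group:
            yield panel


# Hypothetical markers and clustering groups, for illustration only.
groups = {"M1": "A", "M2": "A", "M3": "A", "M4": "B", "M5": "C"}
for panel in candidate_panels(groups, panel_size=4):
    print(panel)
# ('M1', 'M2', 'M4', 'M5')
# ('M1', 'M3', 'M4', 'M5')
# ('M2', 'M3', 'M4', 'M5')
```

In a full search, each surviving panel would then be scored on repeated train/test splits and the combination with the most consistent performance retained.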

Specificity and validation of 4-marker model for CRC detection

We further validated the performance and specificity of the four markers used in the CRC detection model with the TCGA data sets. The methylation levels of the CpG sites within the marker regions were compared between normal and primary tumor tissue samples in the corresponding COAD and READ data sets, as well as in other cancer types with high incidence and mortality rates or within the gastrointestinal system. Consistent with our tissue cohort results, distinct differences in the methylation levels of the four markers between normal and primary tumor tissues were observed (Figure 3A). A detection model built from the four markers in the COAD and READ data sets further showed an AUC of 0.99 (Figure 3C). Methylation levels of all four markers differentiated poorly between normal and tumor tissue in the BRCA, LIHC, LUAD, LUSC and PRAD data sets, revealing high specificities of these markers against non-gastrointestinal cancers (Figure 3A). Within the gastrointestinal system, ZNF304 exhibited relatively high specificity for CRC, with low methylation levels in both the ESCA and STAD data sets, whereas SFMBT2, ITGA4 and THBD showed mild levels of methylation in ESCA and STAD tumor samples (Figure 3A). The logistic regression model of the four markers based on the COAD and READ data sets was further used to evaluate performance in ESCA or STAD detection. The four-marker model showed AUCs of 0.74 and 0.80 against ESCA and STAD, respectively, indicating inferior performance compared with CRC detection (Figure 3C). The model also exhibited high sensitivities regardless of tumor microsatellite status and tumor location, with 98.3% and 92.4% for microsatellite instable (MSI) and MSS tumors, and 95.3% and 100.0% for left-sided and right-sided tumors, respectively (Figure 3D). For early stage CRC, the model detected 83.2% of T1 lesions and 100.0% of T2 lesions (Figure 3D). Additionally, methylation and gene expression of these four markers were inversely correlated in the COAD and READ data sets, suggesting that methylation of these genes led to functional gene suppression (Figure 3B).

Methylation markers associated with CRC progression for staging prediction

Given that standard CRC management depends heavily on the accuracy of pathological staging, which is mainly determined at the post-surgery tissue level, biomarkers for non-invasive stratification of disease progression will aid real-time and precise prediction of staging. We further compared the DNA methylation landscapes of the 21 markers between early stage CRC (APL and stage I CRC) and mid-late stage CRC (stage II-IV CRC) to identify markers that can indicate CRC progression. The top 8 markers for differentiating APL and stage I CRC from stage II-IV CRC are shown in Table 4, with AUCs ranging from 0.57 to 0.73, sensitivities ranging from 60.0% to 93.3% and specificities ranging from 44.4% to 80.6%. A disease stratification model with the markers KCNJ12, VAV3-AS1 and EVC was established using exhaustive searches of all possible top marker combinations with 100 splits of the data set (Figure 4A). The stratification model showed an enhanced sensitivity of 83.0%, specificity of 71.2% and AUC of 0.87 in the test set compared to individual markers (Figure 4B and 4C). Based on the stratification model, the APL population, stage I CRC group and stage II-IV groups can be classified into different risk levels consistent with CRC staging (Figure 4D).

Markers to identify NAPL from IBD populations

We also explored potential markers that distinguish NAPL from IBD populations. Individual marker analysis indicated that the top NAPL markers showed AUCs of 0.77-0.85, sensitivities of 71.4%-100.0% and specificities of 60.9%-91.3% in differentiating the NAPL and IBD groups (Table 4). A dual-marker NAPL model (NEUROD1 and FAM72C) was further established, with AUCs of 0.91 and 0.87 in the training and test data sets, respectively (Figure 5A and 5B). A sensitivity of 83.2% and a specificity of 77.4% were observed in the test set based on the threshold set on the training set. Based on the disease prevalence of NAPL (25%) in average-risk populations [12, 13], the adjusted NPV for the NAPL model was 93.3%, which was higher than that of the individual markers (89.9% and 87.5%) (Figure 5C).
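The prevalence-adjusted NPV can be checked with the standard Bayes expression NPV = specificity × (1 − prevalence) / [specificity × (1 − prevalence) + (1 − sensitivity) × prevalence]. The text does not state which formula was used, so the sketch below is an assumption; the function name `adjusted_npv` is illustrative. With the reported test-set sensitivity (83.2%), specificity (77.4%) and prevalence (25%), it reproduces the reported 93.3%.

```python
def adjusted_npv(sensitivity, specificity, prevalence):
    """Negative predictive value at a given disease prevalence."""
    true_negative = specificity * (1.0 - prevalence)
    false_negative = (1.0 - sensitivity) * prevalence
    return true_negative / (true_negative + false_negative)


# Values reported for the dual-marker NAPL model in the test set.
npv = adjusted_npv(sensitivity=0.832, specificity=0.774, prevalence=0.25)
print(f"{npv:.1%}")  # 93.3%
```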

DISCUSSIONS

Using targeted methylation profiling on tissue samples, we have identified a panel of biomarkers whose methylation levels were associated with CRC and APL. A methylation model based on the combination of the SFMBT2, ITGA4, THBD and ZNF304 gene regions showed sensitivities of 96.1% for CRC of all stages and 100.0% for early-stage CRC (APL and stage I CRC), with a specificity of 87.0%. This assay demonstrated superior overall clinical performance in the detection of CRC compared to multiple reported tissue-based methylation assays, including an assay panel of MGMT, RASSF1A and SEPT9 (96.6% sensitivity, 74.0% specificity), an assay panel of CMTM3, SSTR2 and MDFI (81.0% sensitivity and 91.0% specificity), and an assay panel of APC, RASSF1A, ITGA4, SEPT9 and VIM (82.7% sensitivity and 97.3% specificity) [14-16]. Additionally, the model showed similar sensitivities of 93.8%-100% in detection of both left-sided and right-sided tumors in our cohort, despite the reported genetic differences between them [17]. Because this study had a high percentage of left-sided tumors and included only MSS tumors, and so may not represent all tumor types of CRC, we further validated the four methylation markers with the TCGA cohorts. The four-marker model showed a consistent AUC of 0.99 in the COAD and READ cohorts from the Human Methylation 450K array data set (TCGA), with an overall 96.3% sensitivity and 100.0% specificity. In particular, the four-marker model showed high sensitivities on both left-sided and right-sided tumors (95.3% and 100.0%) and on both MSI and MSS tumors (98.3% and 92.4%). In addition, these markers showed high specificities against breast, liver, lung and prostate cancers. While tumor tissues of ESCA and STAD showed mild methylation in the SFMBT2 and ITGA4 regions, the four-marker classification model showed inferior AUCs of 0.74 and 0.80 in ESCA and STAD detection. The consistent performance and high specificity against other cancer types implied that these markers are not ethnicity specific. The evidence indicates the potential of the four markers for developing a non-invasive CRC screening assay, in terms of CRC specificity and desirable sensitivity for early stage CRC regardless of tumor side and microsatellite status. Methylation of the four markers was inversely correlated with their gene expression, with high methylation levels and low gene expression in tumor samples in the COAD and READ cohorts, indicating that methylation of the four gene regions may play a role in the corresponding functional gene suppression. In addition to the reported association of ITGA4 and ZNF304 hypermethylation with CRC and of THBD methylation with early stage CRC [10, 11, 18], the combination of these markers showed a favorable detection sensitivity for early-stage CRC, including APL and stage I CRC. Among them, SFMBT2 may have potential roles in the regulation of CRC tumorigenesis. As a polycomb group repressor, SFMBT2 has been reported to control cell growth via epigenetic regulation of HOXB13, and to repress cell migration and invasion by interacting with transcriptionally repressive histone marks in prostate cancer [19, 20].

In addition to early stage CRC detection, CRC treatment modalities and surveillance workups depend heavily on accurate disease staging which correlates clearly with

survival outcomes [6]. The current staging system relies mostly on histological examination of biopsies from colonoscopy in conjunction with computed tomography (CT) and magnetic resonance imaging (MRI). Due to the heterogeneous nature of tumors, such an approach may lead to inaccurate staging of the disease, which affects therapeutic planning and raises the urgency of biomarkers for homogeneous biopsy tests. For this purpose, we further identified a group of markers for resolving stage II-IV CRC from stage I CRC and APL. Among the top 8 markers for stage stratification from single marker performance analysis, five (VAV3-AS1, EVC, C9orf50, ITGA4 and SFMBT2) overlapped with the top markers for early stage CRC detection (Table 2 and 3), suggesting that these five methylation signatures may robustly reflect cancer initiation and progression. Two overlapping markers, VAV3-AS1 and EVC, in combination with the top stratification marker KCNJ12, further enhanced the sensitivity and specificity of disease stratification to 83.0% and 71.2%, respectively, in the current cohort. However, these markers exhibited inferior performance in the TCGA cohort (stage I versus stage II-IV), with a sensitivity of 52.6%, specificity of 78.4% and AUC of 0.73 in the test set (Supplementary Figure 1). This observation may indicate that these markers are ethnicity and race specific, and further multi-center studies are required for validation. While it has been reported that VAV3, EVC and KCNJ12 may play a role in tumour growth in a variety of cancers such as breast cancer and adult T-cell leukemia [21-23], the precise regulatory networks through which epigenetic control of these genes affects CRC progression still require further investigation. Interestingly, the methylation level of KCNJ12 was inversely correlated with staging classification (highest in the APL group and lowest in stage IV CRC), while no such trend was seen in the non-CRC control groups. The evidence indicates that alteration in KCNJ12 methylation may be a unique characteristic of CRC progression.

We also sought to identify a set of methylation markers that can distinguish NAPL, including hyperplastic polyps and adenoma with low grade dysplasia, from the IBD population. A classification model with dual markers (NEUROD1 and FAM72C) showed a sensitivity of 83.2% and an estimated NPV of 93.3% at 25% disease prevalence, suggesting a possible use of the dual markers as an alternative for excluding the IBD population from colonoscopy during regular screening. Consistent with current evidence, NEUROD1 has been implicated in regulating enteroendocrine cell differentiation via Wnt signalling, which affects intestinal and colon polyposis [24, 25].

In conclusion, we have identified three sets of methylation signatures associated with CRC (particularly early stage CRC), CRC stage stratification and NAPL, respectively, at tissue level. These markers may be further applied in the development of non-invasive diagnostic tests (stool- or blood-based tests) and require further study to validate their clinical utility for non-invasive CRC screening in average-risk populations.

ACKNOWLEDGEMENTS

We thank the members of Jian-Bing Fan's Laboratory at Southern Medical University for helpful discussions and the AnchorDx R&D team (AnchorDx Medical Co., Ltd.) for kindly providing the custom-made 21-gene DNA methylation panel and related reagent kits. Jian-Bing Fan's Laboratory is supported by The National Key Research and Development Program of China (Grant NO.2017YFC1309002), Science and Technology Planning Project of Guangdong Province, China (Grant NO.2017B020226005), Scheme of Guangzhou Economic and Technological Development District for Leading Talents in Innovation and Entrepreneurship (Grant NO.2017-L152), Scheme of Guangzhou for Leading Talents in Innovation and Entrepreneurship (Grant NO.2016007) and Scheme of Guangzhou for Leading Team in Innovation (Grant NO.201909010010), China.

CONFLICT OF INTERESTS

The authors declare no conflict of interests.

AUTHOR CONTRIBUTIONS

Xiaoyu Zhang: Conceptualization, Methodology, Investigation. Shenmei Wan: Methodology, Investigation, Resources. Yanqi Yu: Methodology, Investigation. Weimei Ruan: Data curation, Visualization, Writing - Original draft preparation. Hong Wang: Visualization, Investigation. Linhao Xu: Conceptualization, Methodology. Chanjuan Wang: Project administration. Shang Chen: Investigation. Tianfeng Cao: Resources. Quanzhou Peng: Resources. Sihui Li: Methodology. Tianliang Hu: Visualization. Zeyu Jiang: Supervision, Visualization. Zhiwei Chen: Supervision, Conceptualization, Writing - Review & Editing. Jian-Bing Fan: Supervision, Conceptualization, Funding acquisition.

Tables

Table 1. Patient characteristics of the CRC cohort

Characteristics | IBD | NAPL | APL | Stage I CRC | Stage II-IV CRC
Number of participants | 23 | 14 | 18 | 22 | 18
Measurement available | 23 | 14 | 18 | 18 | 15
Age [Median(Range), years] | 47(6-72) | 46(37-77) | 63(30-78) | 59(35-77) | 62(34-73)
Gender: Female (n, %) | 9(39.1%) | 3(21.4%) | 6(33.3%) | 5(27.8%) | 3(20.0%)
Gender: Male (n, %) | 14(60.9%) | 11(78.6%) | 12(66.7%) | 13(72.2%) | 12(80.0%)
CRC stage: Stage 0 | - | - | 18(100.0%) | - | -
CRC stage: Stage I | - | - | - | 18(100.0%) | -
CRC stage: Stage II | - | - | - | - | 6(40.0%)
CRC stage: Stage III | - | - | - | - | 6(40.0%)
CRC stage: Stage IV | - | - | - | - | 3(20.0%)
Lesion histology: Hyperplastic polyps (n, %) | - | 7(50.0%) | - | - | -
Lesion histology: Tubular adenoma (n, %) | - | 3(21.4%) | 8(44.4%) | - | -
Lesion histology: Tubulovillous adenoma (n, %) | - | 3(21.4%) | 9(50.0%) | - | -
Lesion histology: Villous adenoma (n, %) | - | - | 1(5.6%) | - | -
Lesion histology: LGIN (%) | - | 3(21.4%) | - | - | -
Lesion histology: HGIN (%) | - | - | 18(100.0%) | - | -
Tumor location: Left-sided (n, %) | - | - | 13(72.2%) | 17(94.4%) | 11(73.3%)
Tumor location: Right-sided (n, %) | - | - | 2(11.1%) | 1(5.6%) | 4(26.7%)
Tumor location: Transverse/Not available (n, %) | - | - | 3(16.7%) | - | -

Table 2. Features of top markers associated with CRC and early stage CRC IBD + NAPL vs APL + CRC markers

AUC

SFMBT2

0.96

VAV3-AS1

0.95

sensitivity

specificity

92.2%

94.6%

(78.4%-100.0%)

(86.5%-100.0%)

92.2%

89.2%

(82.4%-100.0%)

(75.7%-97.3%)

IBD + NAPL vs APL+ stage I CRC clustering group A A

AUC

sensitivity

specificity

0.97

91.7%

97.3%

(80.6%-100.0%)

(86.5%-100.0%)

91.7%

89.2%

(80.6%-100.0%)

(73.0%-97.3%)

0.94

clustering group A A


ITGA4

0.94

THBD

0.93

FBN1

0.92

EVC

0.90

C9orf50

0.89

TWIST1

0.85

ZNF304

0.82

90.2%

89.2%

(76.5%-98.0%)

(78.4%-100.0%)

90.2%

94.6%

(80.4%-96.1%)

(86.5%-100.0%)

86.3%

94.6%

(72.6%-96.1%)

(83.8%-100.0%)

86.3%

91.9%

(70.6%-96.1%)

(78.4%-100.0%)

84.3%

86.5%

(64.7%-98.0%)

(67.6%-100.0%)

94.1%

86.5%

(80.4%-100.0%)

(73.0%-97.3%)

72.6%

100.0%

(58.8%-88.2%)

(83.8%-100%)

66.7%

100.0%

(52.9%-80.4%)

(94.6%-100.0%)

0.94

B

0.94

A

0.94

A

0.94

B

0.90

B

0.93

C A B

0.90

86.1%

94.6%

(72.2%-100.0%)

(75.7%-100.0%)

88.9%

91.9%

(75%-97.2%)

(83.9%-100.0%)

91.7%

94.6%

(80.6%-100.0%)

(86.5%-100.0%)

88.9%

94.6%

(72.2%-97.2%)

(83.8%-100.0%)

88.9%

89.2%

(69.4%-97.2%)

(75.7%-100.0%)

91.7%

86.5%

(75%-100.0%)

(73.0%-100.0%)

97.2%

83.8%


0.94

94.6% (86.5%-100.0%)


KCNQ5

86.3% (76.5%-96.1%)

0.88


0.95

C

0.88

(83.3%-100.0%)

(70.3%-94.6%)

77.8%

94.6%

(61.0%-91.7%)

(78.4%-100.0%)

77.8%

100.0%

(63.9%-91. 7%)

(89.2%-100.0%)


ZNF132

Table 3. Features of diagnostic model for CRC detection

For CRC detection (IBD + NAPL vs APL + CRC), Mean (95%CI)

Data set | sensitivity | specificity
Train | 96.6%(96.0%-97.2%) | 95.4%(94.5%-96.2%)
Test | 96.1%(95.3%-96.9%) | 87%(85.2%-88.8%)

Detection sensitivities on early stage CRC, Mean (95%CI)

Data set | APL | Stage I CRC | T1 tumors | T2 tumors
Train | 90.4% (88.3%-92.5%) | 100.0% (100.0%-100.0%) | 100.0% (100.0%-100.0%) | 100.0% (100.0%-100.0%)
Test | 100.0% (100.0%-100.0%) | 100.0% (100.0%-100.0%) | 100.0% (100.0%-100.0%) | 100.0% (100.0%-100.0%)

Detection sensitivities with regards to tumor locations

Data set | Left-sided | Right-sided
Train | 95.8% (23/24) | 100% (4/4)
Test | 93.8% (15/16) | 100% (3/3)

Table 4. Performance characteristics of top markers associated with CRC progression or NAPL markers

AUC

sensitivity

specificity

B A A B B C A

B C

Top markers associated with CRC Progression

markers | AUC | sensitivity | specificity
KCNJ12 | 0.73 | 80.0%(40.0%-100.0%) | 69.4%(49.9%-100.0%)
FAM19A4 | 0.71 | 86.7%(60.0%-100.0%) | 66.7%(50.0%-86.1%)
ZFHX4-AS1 | 0.70 | 80.0%(46.7%-100.0%) | 66.7%(44.4%-91.7%)
VAV3-AS1 | 0.69 | 66.7%(26.7%-93.3%) | 77.8%(50.0%-100.0%)
EVC | 0.68 | 93.3%(33.3%-100.0%) | 55.6%(33.3%-97.2%)
C9orf50 | 0.63 | 60.0%(26.7%-93.3%) | 80.6%(36.1%-97.2%)
ITGA4 | 0.59 | 93.3%(46.7%-100.0%) | 44.4%(25.0%-83.3%)
SFMBT2 | 0.57 | 53.3%(26.7%-100.0%) | 80.6%(11.1%-97.2%)

82.6%(60.9%-100.0%)

Top markers associated with NAPL
Markers      AUC     Sensitivity               Specificity
             0.85    92.9% (57.1%-100.0%)
THBD         0.83    85.7% (50.0%-100.0%)
KCNQ5        0.82    85.7% (57.1%-100.0%)      82.6% (65.2%-100.0%)
C9orf50      0.82    78.6% (57.1%-100.0%)      91.3% (73.9%-100.0%)
ST8SIA4      0.81    85.7% (50.0%-100.0%)      82.6% (56.5%-100.0%)
TWIST1       0.81    78.6% (42.9%-100.0%)      82.6% (43.5%-100.0%)
VAV3-AS1     0.77    71.4% (42.9%-92.9%)       91.3% (78.3%-100.0%)
FAM72C       0.77    100.0% (42.9%-100.0%)     60.9% (39.1%-100.0%)


NEUROD1

82.6%(52.2%-100.0%)

Figure captions

Figure 1. Heatmap of 21 methylation markers in the cohort. The blue colour represents a small ∆Ct value and high methylation level, whereas the red colour indicates a large ∆Ct value and low methylation level. Markers are grouped with hierarchical clustering.

Figure 2. Diagnostic methylation model for CRC early detection. A, ROC curves of diagnostic methylation model and individual methylation markers in the training set; B, ROC curves of diagnostic methylation model and individual methylation markers in the test set; C, Detection sensitivities of methylation model in different CRC subgroups compared to individual markers in the test set; D, Distributions of CRC risk prediction among groups of IBD, NAPL, APL, stage I CRC and stage II-IV CRC in the test set. An asterisk indicates a statistically significant difference with a p value < 0.05 compared to IBD or NAPL groups.

Figure 3. Validation and specificity of four-marker diagnostic model in TCGA cohort. A, Methylation levels of four markers in TCGA cohorts; the methylation levels were expressed as average methylation values of CpG sites involved in the marker regions; each dot represented one tissue sample and the red lines indicated median values in the corresponding groups. B, Correlations of methylation profiles and RNA expression of the four markers in COAD and READ cohorts; each dot represented an individual tissue sample; the gene expression levels were presented as log fold changes. C, ROC curves of the four-marker models in COAD and READ, ESCA and STAD cohorts; the curve of COAD and STAD was depicted in the test set under a train-test split with a ratio of 6 to 4. D, Detection sensitivities of the four-marker models on MSS, MSI, left-sided, right-sided, T1 and T2 tumors in COAD and READ cohorts.

Figure 4. Methylation markers associated with CRC progression. A, ROC curves of CRC stratification models and individual methylation markers in the training set; B, ROC curves of CRC stratification models and individual methylation markers in the test set; C, Features of clinical performance of CRC stratification model compared to individual markers in the test set; D, Distributions of CRC stratification probabilities among CRC subgroups including APL, stage I CRC and stage II-IV CRC in the test set.

Figure 5. Methylation markers for classification of IBD and NAPL. A, ROC curves of dual-marker NAPL models and individual methylation markers in the training set; B, ROC curves of dual-marker NAPL models and individual methylation markers in the test set; C, Features of clinical performance of NAPL model compared to individual markers in the test set; the NPVs are adjusted based on a disease prevalence of 25%.
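The prevalence adjustment of the NPV mentioned in the Figure 5 caption can be illustrated with Bayes' rule. The following is a minimal sketch and not the authors' code: the function name and the sensitivity and specificity inputs are hypothetical, and only the 25% prevalence is taken from the caption.

```python
def adjusted_npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value at an assumed disease prevalence (Bayes' rule)."""
    # Fraction of the population that is disease-free and tests negative.
    true_negative = specificity * (1.0 - prevalence)
    # Fraction of the population that has the disease but tests negative.
    false_negative = (1.0 - sensitivity) * prevalence
    return true_negative / (true_negative + false_negative)


# Hypothetical inputs chosen only for illustration (not values from the paper).
example_npv = adjusted_npv(sensitivity=0.90, specificity=0.80, prevalence=0.25)
print(f"NPV at 25% prevalence: {example_npv:.3f}")
```

Because the NPV depends on prevalence, the same marker reported at a different assumed prevalence would give a different NPV, which is why the caption states the value used.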

Table 1. Patient characteristics of the CRC cohort. A hyphen indicates that the value is not applicable.

Table 2. Features of top markers associated with CRC and early stage CRC. The values are expressed as mean (range of 95% confidence intervals) from 2000 bootstrap samples.

Table 3. Features of diagnostic model for CRC detection. The values are expressed as mean (range of 95% confidence intervals) from 100 bootstrap samples.

Table 4. Performance characteristics of top markers associated with CRC progression or NAPL. The values are expressed as mean (range of 95% confidence intervals) from 2000 bootstrap samples.
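The bootstrap summaries described in the table captions can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' pipeline: it assumes case-level resampling with replacement and a percentile interval, neither of which is specified in the captions, and the function names and toy data are hypothetical.

```python
import random


def sens_spec(labels, preds):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def summarize(values):
    """Mean and 2.5th/97.5th percentiles (assumed percentile method)."""
    ordered = sorted(values)
    last = len(ordered) - 1
    mean = sum(ordered) / len(ordered)
    return mean, ordered[int(0.025 * last)], ordered[int(0.975 * last)]


def bootstrap_summary(labels, preds, n_boot=2000, seed=0):
    """Resample cases with replacement and summarize sensitivity and specificity."""
    rng = random.Random(seed)
    n = len(labels)
    sens_values, spec_values = [], []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        ps = [preds[i] for i in idx]
        # Skip the rare resample that lacks one of the two classes.
        if 0 < sum(ys) < n:
            sens, spec = sens_spec(ys, ps)
            sens_values.append(sens)
            spec_values.append(spec)
    return {"sensitivity": summarize(sens_values),
            "specificity": summarize(spec_values)}


# Toy data (hypothetical): 20 positives with 18 detected, 30 negatives with 27 called negative.
toy_labels = [1] * 20 + [0] * 30
toy_preds = [1] * 18 + [0] * 2 + [0] * 27 + [1] * 3
for name, (mean, low, high) in bootstrap_summary(toy_labels, toy_preds).items():
    print(f"{name}: {mean:.1%} ({low:.1%}-{high:.1%})")
```

With small groups the resampled sensitivity can only take a few discrete values, which is consistent with the wide intervals reported for individual markers.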

Reference List

[1] F. Bray, J. Ferlay, I. Soerjomataram, R.L. Siegel, L.A. Torre, A. Jemal, Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries, CA: A Cancer Journal for Clinicians, 68 (2018) 394-424.
[2] N.A. Howlader N, Krapcho M, Miller D, Brest A, Yu M, Ruhl J, Tatalovich Z, Mariotto A, Lewis DR, Chen HS, Feuer EJ, Cronin KA SEER Cancer Statistics Review, 1975-2016, in: N.C. Institute (Ed.), Bethesda, MD, 2018.
[3] L.-L. Song, Y.-M. Li, Current noninvasive tests for colorectal cancer screening: An overview of colorectal cancer screening tests, World J Gastrointest Oncol, 8 (2016) 793-800.
[4] J.K. Triantafillidis, C. Vagianos, A. Gikas, M. Korontzi, A. Papalois, Screening for colorectal cancer: the role of the primary care physician, European Journal of Gastroenterology & Hepatology, 29 (2017).
[5] E.J. Kuipers, W.M. Grady, D. Lieberman, T. Seufferlein, J.J. Sung, P.G. Boelens, C.J.H. van de Velde, T. Watanabe, Colorectal cancer, Nat Rev Dis Primers, 1 (2015) 15065-15065.
[6] G. Nakayama, C. Tanaka, Y. Kodera, Current Options for the Diagnosis, Staging and Therapeutic Management of Colorectal Cancer, Gastrointest Tumors, 1 (2013) 25-32.
[7] T.F. Imperiale, D.F. Ransohoff, S.H. Itzkowitz, T.R. Levin, P. Lavin, G.P. Lidgard, D.A. Ahlquist, B.M. Berger, Multitarget Stool DNA Testing for Colorectal-Cancer Screening, New England Journal of Medicine, 370 (2014) 1287-1297.
[8] T.R. Church, M. Wandell, C. Lofton-Day, S.J. Mongin, M. Burger, S.R. Payne, E. Castaños-Vélez, B.A. Blumenstein, T. Rösch, N. Osborn, D. Snover, R.W. Day, D.F. Ransohoff, I. Presept Clinical Study Steering Committee, T. Study, Prospective evaluation of methylated SEPT9 in plasma for detection of asymptomatic colorectal cancer, Gut, 63 (2014) 317-325.
[9] I. Blumenstein, W. Tacke, H. Bock, N. Filmann, E. Lieber, S. Zeuzem, J. Trojan, E. Herrmann, O. Schröder, Prevalence of colorectal cancer and its precursor lesions in symptomatic and asymptomatic patients undergoing total colonoscopy: results of a large prospective, multicenter, controlled endoscopy study, European Journal of Gastroenterology & Hepatology, 25 (2013).
[10] C. Ausch, Y.-H. Kim, K.D. Tsuchiya, S. Dzieciatkowski, M.K. Washington, C. Paraskeva, J. Radich, W.M. Grady, Comparative Analysis of PCR-Based Biomarker Assay Methods for Colorectal Polyp Detection from Fecal DNA, Clinical Chemistry, 55 (2009) 1559.
[11] C.P.E. Lange, M. Campan, T. Hinoue, R.F. Schmitz, A.E. van der Meulen-de Jong, H. Slingerland, P.J.M.J. Kok, C.M. van Dijk, D.J. Weisenberger, H. Shen, R.A.E.M. Tollenaar, P.W. Laird, Genome-scale discovery of DNA-methylation biomarkers for blood-based detection of colorectal cancer, PLoS One, 7 (2012) e50266-e50266.
[12] A. Giacosa, F. Frascio, F. Munizzi, Epidemiology of colorectal polyps, Techniques in Coloproctology, 8 (2004) s243-s247.
[13] A. Buda, M. De Bona, I. Dotti, P. Piselli, E. Zabeo, R. Barbazza, A. Bellumat, F. Valiante, E. Nardon, C.S. Probert, M. Pignatelli, G. Stanta, G.C. Sturniolo, M. De Boni, Prevalence of different subtypes of serrated polyps and risk of synchronous advanced colorectal neoplasia in average-risk population undergoing first-time colonoscopy, Clin Transl Gastroenterol, 3 (2012) e6-e6.
[14] O.I. Brovkina, M.G. Gordiev, A.N. Toropovskiy, D.S. Khodyrev, A.V. Nikitin, A.V. Averyanov, THE ROLE OF ABERRANT METHYLATED IN GENES APC, RASSF1A AND ITGA4 FOR DIAGNOSIS OF COLORECTAL CANCER, J Clin Pract, 8 (2017) 8-14.
[15] M. Freitas, F. Ferreira, S. Carvalho, F. Silva, P. Lopes, L. Antunes, S. Salta, F. Diniz, L.L. Santos, J.F. Videira, R. Henrique, C. Jerónimo, A novel DNA methylation panel accurately detects colorectal cancer independently of molecular pathway, J Transl Med, 16 (2018) 45-45.
[16] J. Li, C. Chen, X. Bi, C. Zhou, T. Huang, C. Ni, P. Yang, S. Chen, M. Ye, S. Duan, DNA methylation of CMTM3, SSTR2, and MDFI genes in colorectal cancer, Gene, 630 (2017) 1-7.
[17] B. Baran, N. Mert Ozupek, N. Yerli Tetik, E. Acar, O. Bekcioglu, Y. Baskin, Difference Between Left-Sided and Right-Sided Colorectal Cancer: A Focused Review of Literature, Gastroenterology Res, 11 (2018) 264-273.
[18] J.W. Moon, S.K. Lee, J.O. Lee, N. Kim, Y.W. Lee, S.J. Kim, H.J. Kang, J. Kim, H.S. Kim, S.-H. Park, Identification of novel hypermethylated genes and demethylating effect of vincristine in colorectal cancer, J Exp Clin Cancer Res, 33 (2014) 4-4.
[19] J. Gwak, J.Y. Shin, K. Lee, S.K. Hong, S. Oh, S.-H. Goh, W.S. Kim, B.G. Ju, SFMBT2 (Scm-like with four mbt domains 2) negatively regulates cell migration and invasion in prostate cancer cells, Oncotarget, 7 (2016) 48250-48264.
[20] K. Lee, W. Na, J.-H. Maeng, H. Wu, B.-G. Ju, Regulation of DU145 prostate cancer cell growth by Scm-like with four mbt domains 2, Journal of Biosciences, 38 (2013) 105-112.
[21] X. Chen, S.I. Chen, X.-A. Liu, W.-B. Zhou, R.-R. Ma, L. Chen, Vav3 oncogene is upregulated and a poor prognostic factor in breast cancer patients, Oncol Lett, 9 (2015) 2143-2148.
[22] R. Takahashi, M. Yamagishi, K. Nakano, T. Yamochi, T. Yamochi, D. Fujikawa, M. Nakashima, Y. Tanaka, K. Uchimaru, A. Utsunomiya, T. Watanabe, Epigenetic deregulation of Ellis Van Creveld confers robust Hedgehog signaling in adult T-cell leukemia, Cancer Sci, 105 (2014) 1160-1169.
[23] I. Lee, S.-J. Lee, T.M. Kang, W.K. Kang, C. Park, Unconventional Role of the Inwardly Rectifying Potassium Channel Kir2.2 as a Constitutive Activator of RelA in Cancer, Cancer Research, 73 (2013) 1056.
[24] H.J. Li, S.K. Ray, N.K. Singh, B. Johnston, A.B. Leiter, Basic helix-loop-helix transcription factors and enteroendocrine cell differentiation, Diabetes Obes Metab, 13 Suppl 1 (2011) 5-12.
[25] M. El-Salhy, K. Umezawa, J.G. Hatlebakk, O.H. Gilja, Abnormal differentiation of stem cells into enteroendocrine cells in rats with DSS-induced colitis, Mol Med Rep, 15 (2017) 2106-2112.

AUTHOR STATEMENT

Xiaoyu Zhang: Conceptualization, Methodology, Investigation. Shenmei Wan: Methodology, Investigation, Resources. Yanqi Yu: Methodology, Investigation. Weimei Ruan: Data curation, Visualization, Writing - Original draft preparation. Hong Wang: Visualization, Investigation. Linhao Xu: Conceptualization, Methodology. Chanjuan Wang: Project administration. Shang Chen: Investigation. Tianfeng Cao: Resources. Quanzhou Peng: Resources. Sihui Li: Methodology. Tianliang Hu: Visualization. Zeyu Jiang: Supervision, Visualization. Zhiwei Chen: Supervision, Conceptualization, Writing - Review & Editing. Jian-Bing Fan: Supervision, Conceptualization, Funding acquisition.

Highlights

Three sets of methylation markers for CRC management were identified at tissue level.

SFMBT2, ITGA4, THBD and ZNF304 were markers for early-stage CRC detection.

Methylation of KCNJ12, VAV3-AS1 and EVC was identified for CRC stage stratification.

NEUROD1 and FAM72C differentiated non-advanced precancerous lesions from IBD.



Figure 1

Figure 2

Figure 3

Figure 4

Figure 5