Tracking age-correlated DNA methylation markers in the young


Accepted Manuscript

Title: Tracking age-correlated DNA methylation markers in the young

Authors: Ana Freire-Aradas, Christopher Phillips, Lorena Girón-Santamaría, Ana Mosquera-Miguel, Antonio Gómez-Tato, M. Ángeles Casares de Cal, Jose Álvarez-Dios, Maria Victoria Lareu

PII: S1872-4973(18)30206-0
DOI: https://doi.org/10.1016/j.fsigen.2018.06.011
Reference: FSIGEN 1909

To appear in: Forensic Science International: Genetics

Received date: 3-4-2018
Revised date: 8-6-2018
Accepted date: 11-6-2018

Please cite this article as: Freire-Aradas A, Phillips C, Girón-Santamaría L, Mosquera-Miguel A, Gómez-Tato A, de Cal MAC, Álvarez-Dios J, Lareu MV, Tracking age-correlated DNA methylation markers in the young, Forensic Science International: Genetics (2018), https://doi.org/10.1016/j.fsigen.2018.06.011

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Ana Freire-Aradas¹*, Christopher Phillips¹, Lorena Girón-Santamaría¹, Ana Mosquera-Miguel¹, Antonio Gómez-Tato², M. Ángeles Casares de Cal², Jose Álvarez-Dios², Maria Victoria Lareu¹

¹ Forensic Genetics Unit, Institute of Forensic Sciences, University of Santiago de Compostela, Spain
² Faculty of Mathematics, University of Santiago de Compostela, Spain


*Corresponding author. E-mail address: [email protected] (A. Freire-Aradas)

Highlights

Candidate age-correlated CpG sites explored in public datasets of 398 young subjects (3-17 yrs).

3 most correlated genes selected and tested in 209 new DNAs (2-18 yrs) + 7 previous adult-informative age markers.

CpGs in KCNAB3 have strongest correlation of methylation status with pre-adult age ranges.

Quantile regression prediction model extended with 6 most young-informative CpGs has ±0.94 yr median absolute error.

Abstract

DNA methylation is the most extensively studied epigenetic signature, with a large number of studies reporting age-correlated CpG sites in overlapping genes. However, most of these studies lack sample coverage of individuals under 18 years old, and therefore little is known about the progression of DNA methylation patterns in children and adolescents. In the present study we aimed to select candidate age-correlated DNA methylation markers based on public datasets from Illumina BeadChip arrays and previous publications, and then to explore the resulting markers in 180 blood samples from donors aged between 2 and 18 years old using the EpiTYPER® DNA methylation analysis system. Our analyses identified six genes highly correlated with age in the young, in particular the gene KCNAB3, which shows potential as a highly informative and specific age biomarker for childhood and adolescence. We outline a preliminary age prediction model based on quantile regression that uses data from the six CpG sites most strongly correlated with age, with age ranges extended to include children and adolescents.

Keywords: DNA methylation; children; adolescents; individual age; Illumina; EpiTYPER®; age estimation


1. Introduction


Both human development and aging are biological processes regulated by several factors, including epigenetic changes [1,2]. Such alterations form the basis of an individual's epigenome, which can be defined as the assembly of reversible and mitotically heritable modifications that affect gene regulation without altering the underlying DNA sequence [3]. Chromatin remodeling, post-translational modification of histone tails, non-coding RNAs and DNA methylation are the main epigenetic categories [4–7]. Of these epigenetic changes, DNA methylation has been the most extensively explored [8–11], and a strong correlation between methylation levels and individual age is widely reported [12–15]. Patterns of both DNA hyper- and hypo-methylation are observed during the aging process [16], but they are not randomly distributed amongst methylation sites. In general, as an individual becomes older, DNA hypomethylation becomes more widely distributed across the genome (affecting promoters and exonic, intronic and intergenic regions), while DNA hypermethylation is more specifically localized at certain promoter-associated CpG islands [15].


To date, a large collection of age-correlated CpG sites applicable to age estimation from methylation analysis has been reported [13,17–24]. Several genes have been consistently identified as highly age-correlated in independent studies; notably, the ELOVL2 gene is widely reported [20,23,25–28]. Nevertheless, a common drawback of these studies is the lower limit of the age range of the samples used. Most studies cover adult ages (from 18 years old onwards), which means little is known about DNA methylation patterns in children and adolescents (herein, ‘the young’). The dynamic nature of DNA methylation is the result of complex interactions between genes and environment taking place during the lifetime of an individual. At all ages, individuals are continuously exposed to different environmental factors, but their impact on the young is higher, since the highest activation of the immune system and development of the individual occur during the pre-adult years of life [29]. Therefore, it is likely that the corresponding changes in DNA methylation linked to gene regulation, from early childhood to late youth, display different patterns to those observed during adulthood. Further research is needed to better understand the epigenetic changes that occur during the initial stages of life. Several studies have investigated the role of DNA methylation in the early years of life [30–33]; among these, Alisch et al. [34] undertook a comprehensive genome-wide DNA methylation study targeting samples from the young (covering a full range of ages between 3 and 17 years old) that examined a total of 27,578 CpG sites. That study has provided a useful reference dataset for subsequent investigations seeking to extend age studies into the early years of life.


Identification of the most age-correlated CpG sites active during childhood and adolescence would be informative in several biological areas. Since the most significant development of the immune system occurs during childhood [29] and an association between age-correlated markers and immune ontological functions is already reported [34], further research would improve the understanding of certain autoimmune diseases. Additionally, from a forensic point of view, prediction of the chronological age of the young would improve the identification of unidentified remains (victims of mass disasters or casualties in regions of conflict), and could potentially support legal hearings that attempt to estimate the age of an asylum seeker or that must discern whether to apply the penalties pertinent to young offenders. Although some initial analyses have included the young in forensic DNA methylation-based age prediction models [35], further investigations are still required to detail how DNA methylation patterns vary in children and adolescents. In the present study, we aimed to establish a manageable number of candidate age-correlated CpG sites applicable to the earliest periods of life before the onset of adulthood. Based on results from Alisch et al. [34] and from a previous study of forensic age estimation by Freire-Aradas et al. [27], an initial selection of DNA methylation markers was created. To reduce this marker set further and validate its usefulness for the targeted age ranges, a total of 209 individuals from 2 to 18 years old were analyzed using EpiTYPER® DNA methylation analysis. From a total of ten selected genes, six were used to develop a preliminary age prediction model. Of these, KCNAB3 was shown to be a highly specific predictor of age for childhood and adolescence.

2. Material and Methods

2.1. DNA samples and DNA methylation data

Peripheral blood DNA samples of 209 healthy European donors ranging from 2 to 18 years old (covering approximately 10 individuals per year-of-age at sampling and balanced to maintain a uniform male:female ratio) were provided by the BioBank IBSP-CV (PT13/0010/0064), integrated in the Spanish National Biobanks Network and in the Valencian Biobanking Network. Samples were processed following standard operating procedures with the appropriate approval of the Biobank Ethical and Scientific Committees. Additionally, ethical approval for the present study was granted from the ethics committee of investigation in Galicia, Spain (CAEI: 2013/543). From these, 180 donors were used as a training set for building a preliminary age prediction model, and the remaining 29 were used as a test set providing an independent group of samples to begin validation of the model.

DNA methylation data for a total of 398 blood samples from donors aged between 3 and 17 years old were compiled from GSE27097 (Illumina HumanMethylation27, HM27) [34] for in silico discovery of the most age-correlated CpG sites. Additionally, DNA methylation data for 723 blood samples (14-94 years old) obtained from GSE87571 (Illumina HumanMethylation450, HM450) [21] were used for comparative purposes.

2.2. CpG site selection

In addition to the candidate age-correlated CpG sites for young age ranges from the GSE27097 dataset [34], DNA methylation markers that previously displayed high correlation with individual age in adult age ranges were compiled. These previously reported markers comprised CpG sites in the genes ELOVL2, ASPA, PDE4C, FHL2, CCDC102B and MIR29B2CHG, plus position chr16:85395429 [27].

2.3. Agena Bioscience EpiTYPER® DNA methylation analysis

The Agena Bioscience EpiTYPER® system (San Diego, CA, USA) is a bisulfite-treatment-based method for detection and quantification of DNA methylation using MassARRAY® mass spectrometry [36]. Initially, sequences flanking the selected CpG sites were downloaded from the UCSC genome browser using the GRCh38/hg38 human genome assembly [37], selecting 300 base pairs (bp) 5’ upstream and 3’ downstream of the CpG of interest. Primers for the new candidate CpG sites analyzed with EpiTYPER® were designed using Agena Bioscience EpiDesigner software [38], applying the parameters: i) optimal primer melting temperature of 62°C, ii) optimal primer size of 25 bp, iii) optimal amplicon length of 300 bp. Both strands were selected for the design, using the EpiTYPER® T reaction mass cleavage (i.e. targeting methylated DNA for cleavage). PCR primers used for ELOVL2, ASPA, PDE4C, FHL2, CCDC102B, MIR29B2CHG and chr16:85395429 analyses were as previously reported in [27]. EpiTYPER® detects methylation levels at single or multiple CpG positions depending on the cleavage fragment. Cleavage fragments containing a single CpG therefore report the methylation level of that CpG site, while fragments with multiple closely positioned CpGs report the average methylation level of the CpG sites present on the fragment. Herein, we use the term CpG site to denote either a single CpG site or a cluster of CpGs in a short DNA fragment.

Twelve amplicons containing 60 CpG sites were analyzed with EpiTYPER®. Prior bisulfite conversion of 300 ng of genomic DNA was performed with the EZ DNA Methylation™ Kit (Zymo Research). Specific procedures for the EpiTYPER® workflow were as previously outlined [27]. All samples were PCR amplified and run on EpiTYPER® in duplicate. Mean methylation levels were estimated from replicates, discarding outlying samples (deviations of >10% between replicates). Additionally, four controls were run on each plate reaction: a human high methylated DNA control (EpigenDx Inc.); a human low methylated DNA control (EpigenDx Inc.); a non-template DNA control; and a non-transformed DNA control.

2.4. Statistical analysis

Methylation data (β-values) were obtained using EpiTYPER® software v.1.2.22 (Agena Bioscience), which calculates the proportions between the corresponding methylated and unmethylated signals. The in silico fragmentation patterns and internal bisulfite conversion of non-CpG cytosines were analyzed with the MassArray Bioconductor package [39]. Correlations between age and DNA methylation levels were assessed using the Spearman correlation test for the Illumina datasets and the full EpiTYPER® analysis (the strongest correlations are those close to -1 or 1, while values close to 0 indicate absence of correlation). In the following data descriptions, high and low correlations are compiled as absolute values. Average β-values were used for the preliminary CpG site screening, calculated as the mean values of the methylation levels detected for both extreme groups (i.e. 3 years old, N=4, and 18 years old, N=4). Differences in average β-values between both groups were termed Ø. Quantiles 0.1 and 0.9 (q10 and q90) were used to build a multivariate quantile regression model using the quantreg R package [40]. Validation of the prediction model was based on k-fold cross-validation (k=10), applying an R script developed in-house. For this test, the input data (N=180) were randomly split into k subsets of similar size (k=10) using the cvTools R package [41]. The quantile regression model was then tested 10 times. Each time the model was assessed, a different subset was retained as the testing set and the remaining subsets were used as the training set, so that each run maintained proportions of 10% and 90% of the input data for the testing and training sets, respectively. The predictive accuracy was measured using the following parameters: Median Absolute Error (MAE); Percentage of Predicted Error Relative to the Age (PPERA); correlation of the training and testing sets; and percentage of correct classifications. Since both the predicted age and the age prediction interval (minimum: MinPred and maximum: MaxPred) can be estimated when using quantile regression analysis, a correct classification can be described as a chronological age that falls inside the estimated age prediction interval. Predicted versus chronological age was plotted using the ggplot2 R package [42]. All calculations were performed using R software v.3.4.0.
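The k-fold partitioning described above can be illustrated with a minimal sketch. It is written in Python with the standard library only, not with the cvTools R package or the in-house R script used in the study, and the function names `k_fold_indices` and `train_test_splits` and the seed value are hypothetical choices made for illustration.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary illustrative value
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_samples, k, seed=0):
    """Yield (train, test) index lists, holding out each fold exactly once."""
    folds = k_fold_indices(n_samples, k, seed)
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# With N=180 and k=10, every run keeps 18 samples (10%) for testing
# and 162 samples (90%) for training.
splits = list(train_test_splits(n_samples=180, k=10))
```

In such a scheme each sample is used for testing exactly once, so the accuracy parameters can be summarized over all ten runs.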


3. Results


3.1. Selection and tracking of candidate age-correlated DNA methylation markers in the young from public Illumina datasets


The Spearman correlation (rs) test was applied to DNA methylation data from the GSE27097 dataset, which contains a total of 27,578 CpG sites in 398 blood-based DNA samples (3 to 17 years), to detect the most age-correlated CpG sites in young age ranges. Table 1 shows the top ten CpG sites obtained from this analysis. Among these sites, the lowest correlation was displayed by cg27210390 (TOM1L1, rs: -0.501) and the highest by cg14918082 (KCNAB3, rs: 0.656). To examine whether these selected CpG sites were also informative for age estimation in adults, the Spearman correlation for these ten CpG sites was calculated including DNA methylation data from dataset GSE87571, comprising 723 blood DNA samples ranging from 14 to 94 years old (Table 1). In this case, the highest correlation was displayed by cg09809672 (EDARADD, rs: -0.835). However, when combining all ages, cg14918082 completely changed its position and gave the lowest correlation amongst these CpGs (rs: 0.699). To better understand how DNA methylation patterns for the selected methylation markers progress during the extended human lifetime, dispersion plots were constructed, as shown in Fig. 1. DNA methylation patterns for most of these markers reveal exponential change (hyper- or hypo-methylation) from early childhood to young adulthood that stabilizes during adulthood (FLJ40365, SDS, PGLYRP2, HKR1, KCNAB3, PRKG2, FLJ46365 and TOM1L1). In contrast, EDARADD and ITGA2B show DNA methylation levels that decrease at a consistent rate as the individual ages. From these young-age indicative CpG sites, five were selected for the validation study and subsequently explored using EpiTYPER®: the four CpGs displaying the highest age-correlation for young age ranges, cg14918082 (KCNAB3), cg16744741 (PRKG2), cg25538571 (FLJ46365) and cg25809905 (ITGA2B); plus the CpG site with the highest age-correlation when including the group of adult samples, cg09809672 (EDARADD).
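The screening step above, which ranks CpG sites by the absolute value of their Spearman correlation with age, can be sketched as follows. This is a minimal illustration in standard-library Python, not the analysis code used in the study (which was run in R); the site names and β-values are invented for the example, and the helper names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of sorted positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's r_s: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def rank_sites_by_age_correlation(ages, beta_by_site):
    """Sort CpG sites by |r_s| with age, strongest first."""
    scores = {site: spearman(ages, betas) for site, betas in beta_by_site.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Invented example data: ages in years and beta-values for three hypothetical sites.
ages = [3, 5, 8, 11, 14, 17]
beta_by_site = {
    "site_up": [0.20, 0.26, 0.31, 0.40, 0.44, 0.52],    # hypermethylated with age
    "site_down": [0.80, 0.71, 0.74, 0.60, 0.55, 0.50],  # hypomethylated with age
    "site_flat": [0.50, 0.52, 0.49, 0.51, 0.50, 0.53],  # weakly related to age
}
ranking = rank_sites_by_age_correlation(ages, beta_by_site)
```

Ranking on the absolute value keeps hypermethylated and hypomethylated sites on the same scale, in line with the convention of compiling correlations as absolute values.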
Our previous study focusing on age estimation in adult age ranges led to a set of seven DNA methylation markers highly correlated with age: ELOVL2, ASPA, PDE4C, FHL2, CCDC102B, MIR29B2CHG and chr16:85395429 [27]. These markers had been selected after assessment of three Illumina HumanMethylation450 datasets. In order to explore how the corresponding DNA methylation patterns change during the progression from early childhood to young adulthood, these markers were also included in the present study. Dispersion diagrams based on GSE27097 are not shown, as all corresponding CpG sites, except ASPA, are absent from HM27 (Illumina HumanMethylation 27K).

3.2. Preliminary screening with EpiTYPER®

PCR primers for the candidate CpG sites were designed to produce a pool of five novel amplicons covering a total of 18 CpG sites. These were initially tested in eight individuals at the extremes of the target age ranges (i.e. 3 years old, N=4, and 18 years old, N=4) using EpiTYPER® analysis. Results from this preliminary screening are shown in Fig. 2. Ten CpG sites in KCNAB3 displayed the highest differences between the 3 year old and 18 year old test samples: CR_29_CpG_1.2 (Ø: -0.313), CR_29_CpG_3.4 (Ø: -0.333), CR_29_CpG_5.6 (Ø: -0.350), CR_29_CpG_7 (Ø: -0.313), CR_29_CpG_8.9.10 (Ø: -0.300), CR_29_CpG_11 (Ø: -0.313), CR_29_CpG_12 (Ø: 0.260), CR_29_CpG_13 (Ø: -0.278), CR_29_CpG_14 (Ø: -0.280), CR_29_CpG_15 (Ø: -0.288). In the upper half of Fig. 2, PRKG2 displayed large differences in DNA methylation levels between both age groups: CR_33_CpG_1 (Ø: 0.213), CR_33_CpG_3 (Ø: 0.168), CR_33_CpG_4 (Ø: 0.143), together with EDARADD: CR_35_CpG_2 (Ø: 0.190) and CR_35_CpG_3 (Ø: 0.225). The lowest methylation differences were observed for FLJ46365: CR_31_CpG_1.2 (Ø: 0.123) and ITGA2B: CR_7_CpG_1.2 (Ø: 0.068) and CR_7_CpG_4 (Ø: 0.128); therefore, these genes were not included in subsequent analyses.

3.3. Full EpiTYPER® analysis of samples from children and adolescents

A total of 10 amplicons covering 58 CpG sites were analyzed with EpiTYPER® for validation purposes. The three novel amplicons with the highest differences in DNA methylation levels from preliminary screening (KCNAB3, PRKG2 and EDARADD), plus the seven previously designed amplicons for ELOVL2, ASPA, PDE4C, FHL2, CCDC102B, MIR29B2CHG and chr16:85395429, were assessed in 180 Europeans ranging from 2 to 18 years old. The genomic coordinates of these markers are listed in Supplementary Table S1 and amplification primer information in Supplementary Table S2. Bisulfite conversion (BC) was externally and internally monitored. External BC was confirmed by assessment of two bisulfite conversion controls (human high- and low-methylated DNA: 100% and 0% methylated DNA, respectively), while internal BC was evaluated using internal bisulfite controls (CpX, X=A, C or T) present on each amplified sequence. A high rate of missing data was detected for CR_2_CpG_2 and this CpG site was removed from all subsequent analyses.


To detect the most age-correlated markers, Spearman correlation tests were performed for the remaining 57 CpG sites. Fig. 3 shows the corresponding bar plot, where positively age-correlated hypermethylated CpGs are shown in green, and negatively age-correlated hypomethylated CpGs in red. From these results the highest correlations between DNA methylation and age in the young were found in KCNAB3 and CCDC102B. The amplified fragment for KCNAB3 covers a total of 10 CpG sites, which all showed Spearman correlation coefficients higher than 0.7 (average rs: 0.730). For CCDC102B, a similar pattern was observed, with the three CpG sites in the amplicon showing correlation values higher than 0.7, but negatively correlated for this gene (average rs: 0.744). The hypomethylated levels displayed for EDARADD (average rs: -0.697) and MIR29B2CHG (average rs: 0.744) were also strongly correlated with age for all their CpG sites analyzed. The genes PRKG2 and ASPA, and chromosome region chr16:85395429, were similarly hypomethylated and have correlation values of more than -0.5 (average rs: -0.542, -0.503 and -0.585, respectively). However, it is notable that the methylation patterns seen in the well-established age-correlated genes ELOVL2, PDE4C and FHL2 were varied. The amplified fragments from all three genes contain at least one highly age-correlated CpG, while additional CpG sites had low levels of correlation with age. High levels of correlation were observed for three CpGs in ELOVL2 (CR_1_CpG_9 rs: 0.750, CR_1_CpG_11.12.13.14 rs: 0.791 and CR_1_CpG_15.16.17 rs: 0.847), one CpG in PDE4C (CR_4_CpG_27.28.29 rs: 0.611) and one in FHL2 (CR_12_1_CpG_3 rs: 0.677); all hypermethylated.

Fig. 4 shows the dispersion diagrams (DNA methylation compared to chronological age) for the highest age-correlated CpG site from each of the amplicons described above. Data points for males and females are depicted in blue and red, respectively. For CR_29_CpG_11 (KCNAB3), although some dispersion is observed, data points clearly describe a linear increase in DNA methylation with age. A similar pattern, but of hypomethylation, is seen in CR_35_CpG_3 (EDARADD) and CR_13_CpG_1 (CCDC102B). However, dispersion is much reduced in CR_1_CpG_15.16.17 (ELOVL2), CR_12_1_CpG_3 (FHL2), CR_21_CpG_9.10 (MIR29B2CHG) and CR_23_CpG_3 (no gene associated with this site), despite maintaining high correlation values with age (see corresponding rs coefficients in Fig. 4). Sites CR_2_CpG_3 (ASPA) and CR_4_CpG_27.28.29 (PDE4C), although age-correlated, are the CpG sites with the lowest differences in DNA methylation levels; these two sites therefore show weak DNA methylation correlation across the full range of ages, although dispersion is much lower for these sites. Finally, CR_33_CpG_3 (PRKG2) shows a general decrease of DNA methylation with age; nevertheless, its data points are dispersed and more randomly distributed than those of the other CpG sites. No statistically significant differences were observed between males and females for the aforementioned CpG sites, with the exception of CR_33_CpG_3 and CR_35_CpG_3 (p-value<0.01). Table 2 summarizes the properties of the selected 10 CpG sites.

3.4. Age estimation in young age range samples


From the initial test results, an age estimation model was explored for the young, based on the previous 180 donors ranging from 2 to 18 years old, using quantile regression analysis, a statistical approach we previously developed for adult age ranges [27]. To build a preliminary estimation method, we first examined a model using the corresponding CpG sites from the three young-specific markers (KCNAB3, PRKG2 and EDARADD). However, when using only these three CpG positions, despite obtaining a low median absolute error (MAE: ±1.74 years), instability in the model was observed in terms of low correlations for both training and testing sets (0.8079 and 0.7718, respectively). Subsequently, we increased the number of CpG sites for testing the model by choosing the CpG sites from Table 2 with age-correlations greater than 0.7. Specifically, model A comprised: CR_29_CpG_11 (KCNAB3); CR_35_CpG_3 (EDARADD); CR_1_CpG_15.16.17 (ELOVL2); CR_13_CpG_1 (CCDC102B); CR_21_CpG_9.10 (MIR29B2CHG); and CR_23_CpG_3 (no associated gene). Prediction performance parameters were calculated and are listed in Table 3. Model A gave a median absolute error of ±0.94 years and 77.78% correct classifications. Additionally, the predictive error relative to the age (PPERA) was 9.80%. The PPERA indicates the range of deviation of predicted age from chronological age as donors become older. Therefore, broader prediction intervals are expected for young adult subjects, while narrower prediction intervals are applicable in early childhood. Correlations were 0.9141 and 0.8925 for the training and testing set, respectively. In a second step, models B to D were tested to determine whether the accuracy of the model could be improved by the inclusion of additional CpG sites. The CpG sites included in these models were restricted to those in Table 2 with age correlations higher than 0.6. Therefore, PRKG2 and ASPA were removed from these analyses, as no CpG sites reached the established threshold. Model B included CR_12_1_CpG_3 (FHL2); model C included CR_4_CpG_27.28.29 (PDE4C); and model D included both CR_12_1_CpG_3 and CR_4_CpG_27.28.29. The corresponding performance parameters are described in Table 3. To test differences between the models, an ANOVA test was performed and no statistically significant differences were found between the models. Consequently, model A is proposed as an optimal model for age estimation in the young, due to its reduced prediction error (MAE: ±0.94), the highest level of correct classifications and the reduced number of six CpG sites used. The corresponding plot of predicted versus chronological age is shown in Fig. 5A, with data points in orange indicating subjects between 2 and 18 years old. The black diagonal line represents the 0.5 quantile regression line between predicted and chronological age, while the continuous grey line is the diagonal representing perfect correlation, and discontinuous grey lines delimit the prediction intervals (0.1 and 0.9 quantile regression lines). Most of the data points fit inside these estimated prediction intervals. However, it is important to note that there is a tendency for the predicted age of subjects under 10 years old to be slightly overestimated, while the opposite trend can be observed for individuals over 10 years old, with very slight underestimation (p-value<0.01). In order to further validate this preliminary model A, an independent group of 29 samples (3-18 years old) was assessed as a test set (Fig. 5B). From these, 18 out of 29 (62.07%) were correctly predicted (chronological age inside the prediction intervals) with a median absolute prediction error of 1.25 years. From this sample group, one outlier was identified, with a chronological age of six years old, but predicted to be 12.92 years old.
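Two of the performance parameters reported above, the median absolute error and the percentage of correct classifications, can be computed as in the following minimal sketch. It is standard-library Python rather than the R code used in the study; the function names and all numeric values are invented for illustration, with `min_pred` and `max_pred` standing in for the 0.1 and 0.9 quantile predictions (MinPred and MaxPred).

```python
from statistics import median


def median_absolute_error(chronological, predicted):
    """Median of |predicted age - chronological age| across samples."""
    return median(abs(p - c) for c, p in zip(chronological, predicted))


def correct_classification_rate(chronological, min_pred, max_pred):
    """Percentage of samples whose chronological age lies inside [MinPred, MaxPred]."""
    hits = sum(lo <= age <= hi
               for age, lo, hi in zip(chronological, min_pred, max_pred))
    return 100.0 * hits / len(chronological)


# Invented ages (years) for five hypothetical donors.
chronological = [3.0, 6.0, 9.0, 12.0, 15.0]
predicted = [3.5, 8.0, 9.0, 11.0, 13.5]   # stand-in for 0.5 quantile predictions
min_pred = [2.0, 6.5, 7.5, 9.5, 12.0]     # stand-in for 0.1 quantile predictions
max_pred = [5.0, 9.5, 10.5, 12.5, 14.5]   # stand-in for 0.9 quantile predictions

mae = median_absolute_error(chronological, predicted)
rate = correct_classification_rate(chronological, min_pred, max_pred)
```

Here the second and fifth donors fall outside their intervals, so three of five samples count as correctly classified. The same calculation gives the reported test-set figure, since 18 of 29 samples corresponds to 62.07%.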


4. Discussion


The correlation between DNA methylation levels and age has been widely explored by many genome-wide studies. Some of them established subsequent epigenetic age prediction systems using a high number of CpG sites [13,20]. However, forensic applications are constrained to use a reduced number of markers, and accordingly, reduced-scale age prediction models have been developed for these purposes [18,27,43,44]. Nevertheless, most of these published models were either based on adult samples (>18 years old) or restricted in their sample size of younger subjects. In order to enhance these models, it is important to improve the knowledge of DNA methylation changes seen during childhood and their progression into adolescence and young adulthood. We have identified ten genomic regions highly correlated with the individual age of the young. Of these, seven were used as the established markers in a previous forensic age prediction model that was initially developed for adult age estimation based on Illumina HM450 data. The remaining three genes are identified in this study as highly specific for young age ranges based on a dataset derived from HM27. Although a broader CpG coverage (offered by HM450) would be the best approach for marker selection applicable to age estimation tests extended to the young, epigenetic studies of children and adolescents form a small proportion of the body of methylation data available. Therefore, the study from Alisch et al. [34] was used as the best reference dataset for the present study, since it has the largest sample size and covers ages between 3 and 17 years with adequate numbers of samples per year of age (approximately 10 per year). Of the three new genes identified in the present study, KCNAB3 (potassium voltage-gated channel subfamily A regulatory beta subunit 3, chromosome 17) clearly shows a strong correlation of its methylation status with the age ranges of childhood and adolescence. This gene encodes a protein that forms a heterodimer with the potassium voltage-gated channel. Our study identified the corresponding cg14918082 in the dataset GSE27097 as the highest age-correlated CpG site. In total, 10 CpG sites were covered by EpiTYPER® analysis for KCNAB3. The highest correlation with age was confirmed for cg14918082, but strong levels of correlation occurred in the other sites, all hypermethylated and with very similar levels of correlation (rs>0.7). This observation is interesting: although closely positioned CpG sites commonly display similar methylation patterns, the 165 nucleotides (nt) separating the outermost CpGs of the KCNAB3 amplicon is a sufficient span to allow mixed levels of methylation. This suggests that the methylation of KCNAB3 during childhood and adolescence is not a random event but is biologically regulated, particularly taking into account that the amplicon analyzed in KCNAB3 is located close to the promoter region. The pattern of regulation in KCNAB3 suggests rapid changes during early life (levels of methylation change markedly between 2 and 18 years old), which then stabilize during adulthood (see Fig. 1). Although little is currently known about KCNAB3, correlations of the DNA methylation levels for this gene with pubertal development and altered reproductive hormone levels were recently reported [45].


PRKG2 (protein kinase, cGMP-dependent, type II, chromosome 4) and EDARADD (EDAR associated death domain, chromosome 1) are the other two genes presenting high correlation with age for methylation changes observed in the young. The protein encoded by PRKG2 belongs to the serine/threonine protein kinase family and plays a role in the regulation of fluid balance in the intestine, whereas the protein encoded by EDARADD is a death domain-containing protein found to interact with EDAR, a death domain receptor required for the development of hair, teeth and other ectodermal-derived tissues. Three CpG sites in PRKG2 and two in EDARADD were detected with EpiTYPER®, all hypomethylated, and a particularly high correlation was observed for CR_35_CpG_3 in EDARADD (rs> -0.7). From this gene, cg09809672 had been detected in the dataset GSE27097 as one of the most age-correlated CpG sites in the young, and although it was also tested in our study (CR_35_CpG_2), levels of correlation were higher in CR_35_CpG_3, a CpG site located 59 nt 3’ downstream. PRKG2 is similar to KCNAB3 in demonstrating marked changes during the first years of life, which subsequently slow down from about the age of 20 years onwards (see Fig. 1). In contrast, changes in methylation for EDARADD follow a linear progression from the very first years continuously through to the oldest studied ages. This gene was previously reported as an informative age marker in adults [18,28,46], and in a previous study [27], we suggested its potential genetic role in the earliest years. Besides these most age-correlated markers found in the young, it is noteworthy that further assessment of Illumina chip data with a higher marker coverage, e.g. HM450 or EPIC, could help to identify additional markers for age prediction in children and adolescents.


In addition to the above three highly age-specific markers obtained from assessments of GSE27097 and subsequent validation with EpiTYPER®, we explored the DNA methylation landscape in young ages for the markers already established for analysis of adults. The genes ELOVL2, ASPA, PDE4C, FHL2, CCDC102B and MIR29B2CHG, plus position chr16:85395429, identified in our study of adults [27], match the findings of other studies [20,25,43,44,46–48]. The maximum correlation with age was found in ELOVL2 (ELOVL fatty acid elongase 2, chromosome 6), and this gene is well established as the principal age predictor in human aging studies (summarized in [28]). The present study examined 13 CpG sites in ELOVL2 using EpiTYPER®. Three hypermethylated CpG sites gave the highest correlations: CR_1_CpG_9 together with CR_1_CpG_11.12.13.14 (rs > 0.7), and CR_1_CpG_15.16.17 (rs > 0.8), the latter corresponding to the highest correlation with age obtained in our study. From these results, we recommend the use of ELOVL2 as the key age predictor in young age range studies, as well as for adult analyses. DNA hypermethylation with age was also detected for PDE4C (phosphodiesterase 4C, chromosome 19) and FHL2 (four and a half LIM domains 2, chromosome 2), with the highest correlations (rs > 0.6) in CR_4_CpG_27.28.29 (PDE4C) and CR_12_1_CpG_3 (FHL2) in the young. Both of these CpG sites are the most highly correlated sites in each gene in adults [27]. Hypomethylation was observed in ASPA (aspartoacylase, chromosome 17), CCDC102B (coiled-coil domain containing 102B, chromosome 18), MIR29B2CHG (MIR29B2 and MIR29C host gene, previously called C1orf132) and chr16:85395429 (no associated gene) in the young. The gene ASPA had just one CpG site, and although correlated with age (rs < -0.5), it is the least age-correlated marker in the young, following the pattern of weaker correlation than other genes in adult studies [27]. However, it is important to take into account that inclusion of the ASPA CpG site improved the age prediction in the model built in this study, and since only one CpG site was tested with EpiTYPER®, additional sites should be explored in order to seek higher correlations. In CCDC102B, MIR29B2CHG and chr16:85395429, the three, seven and three CpG sites we tested, respectively, gave high correlations with young age ranges, particularly the first two, where substantial overlap between the CpG coverage was observed, as seen in KCNAB3. Of the three CpGs detected for chr16:85395429, CR_23_CpG_3 gave the highest correlation (rs < -0.7), consistent with previous observations in adults [27].
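The rs values quoted in this discussion are Spearman rank correlations between methylation level and chronological age. Purely as an illustration, the following minimal Python sketch shows how such a coefficient can be computed; the function names and the example values are hypothetical and are not code or data from this study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


def spearman(x, y):
    """Spearman rs: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ages (years) and methylation fractions for a hypomethylated site.
ages = [2, 5, 9, 14, 18]
methylation = [0.80, 0.71, 0.66, 0.52, 0.40]
print(spearman(ages, methylation))
```

A negative coefficient corresponds to hypomethylation with age and a positive one to hypermethylation, which is the sign convention used for the rs values in this section.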


After assessment of these results and taking into account the most age-correlated CpG per amplicon, we propose a preliminary age prediction model for young age ranges, testing six CpG sites and performing a quantile regression analysis, which results in a model providing an MAE of ±0.94 years. Validation was additionally performed in an independent dataset, obtaining an MAE of ±1.25 years. These deviations from chronological age are lower than those of the previous model based on adult populations (MAE: ±3.07 years [27]), and the main contributor to this age prediction accuracy is KCNAB3. This gene displays an exponential change in DNA methylation levels during the early stages of life, but stabilizes during adulthood. However, although the error is reduced, additional sampling is necessary, especially regarding the upper limit of the model (18 years), in order to provide further support for legal hearings, where a minimal error range should be targeted by improvements in the predictive model.
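For readers unfamiliar with the error measure, the MAE reported here is the median absolute error between predicted and chronological age (as defined in the Table 3 caption). A minimal Python sketch of that calculation follows; the function name and the example ages are hypothetical and do not come from the study data.

```python
from statistics import median


def median_absolute_error(chronological, predicted):
    """Median of |predicted - chronological| over all samples, in years."""
    return median(abs(p - c) for c, p in zip(chronological, predicted))


# Hypothetical chronological and predicted ages (years), for illustration only.
chronological_age = [3, 8, 12, 17]
predicted_age = [4, 7.5, 13.5, 16]
print(median_absolute_error(chronological_age, predicted_age))
```

Because the median is used rather than the mean, a small number of poorly predicted samples has limited influence on the reported value.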


The advantage of quantile regression analysis is the provision of the predicted age, complemented by two estimated age-specific prediction intervals (MinPred and MaxPred). By providing these prediction intervals, quantile regression can accommodate the heteroscedasticity usually found in these data, i.e. non-constant error variance with age [27,49]. For forensic tests, the chronological age is unknown, and an age prediction interval gives the most appropriate value to qualify the predictive performance of the test for investigative purposes. Although a preliminary age prediction model in the young is proposed, the main target of the present study is to explore changes in DNA methylation with chronological age in children and adolescents. From the analysis of such data, forthcoming challenges have emerged. First, once highly informative CpG sites are established either for the young or for adulthood, the current models should be rebuilt in order to accommodate the full human age range. This could be particularly useful in a forensic context, where no indications are usually present of the age of the donor of a contact trace (with very few exceptions, such as bite marks). Second, since DNA methylation patterns for some markers behave differently in different age ranges (e.g. the exponential change observed for KCNAB3), age-specific tests may be necessary in forensic analysis in order to achieve higher accuracy in predictions based on DNA. Third, technical validation of the preliminary age prediction model will require further optimization of methylation analysis for forensic use, where poor quality DNA in small quantities is commonly encountered. Since the amplicons used in EpiTYPER® are ~300 bp in length, such fragment sizes could be decreased when adapting CpG detection to different technologies, such as massively parallel sequencing, where smaller amplicon sizes can be used. Additionally, sensitivity studies should be made to try to reduce the minimum levels of input DNA. Furthermore, since bisulfite conversion is a critical step due to the degradation that it produces in the DNA, technologies working with alternative deamination methods will be of interest for future forensic age estimation tests.
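To illustrate how quantile regression yields a predicted age together with lower and upper bounds, a simplified Python sketch is given below. It is not the analysis used in this study: it assumes a single predictor (one methylation value) rather than six CpG sites, uses hypothetical data, and fits each quantile line by brute force, relying on the fact that an optimal quantile regression line can always be found passing through two of the data points. The 0.1, 0.5 and 0.9 quantile levels follow the Fig. 5 legend; all function and variable names are illustrative.

```python
from itertools import combinations


def pinball_loss(residual, tau):
    """Check (pinball) loss of one residual at quantile level tau."""
    return tau * residual if residual >= 0 else (tau - 1) * residual


def fit_quantile_line(x, y, tau):
    """Fit y ~ a + b*x at quantile tau; return (a, b).

    Tries the line through every pair of points and keeps the one with the
    smallest total pinball loss. Only practical for small samples.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line cannot be written as a + b*x
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = sum(pinball_loss(yk - (a + b * xk), tau) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    if best is None:
        raise ValueError("need at least two distinct predictor values")
    return best[1], best[2]


# Hypothetical methylation fractions and ages (years); not data from this study.
methylation = [0.10, 0.15, 0.22, 0.30, 0.34, 0.41, 0.47, 0.55]
age = [2, 4, 5, 8, 9, 13, 14, 18]

new_sample = 0.25  # methylation value of a hypothetical unknown sample
for label, tau in (("MinPred", 0.1), ("Predicted", 0.5), ("MaxPred", 0.9)):
    a, b = fit_quantile_line(methylation, age, tau)
    print(label, round(a + b * new_sample, 2))
```

Because each quantile line is fitted separately, the width of the interval between the 0.1 and 0.9 lines is free to change with the predictor, which is how the approach copes with error variance that is not constant across ages.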


Acknowledgements


This work was supported by CHRONOGEN (BIO2013-42188-R), a research project funded by the Ministry of Economy and Competitiveness, Spain, and co-financed with ERDF funds. AFA was supported by a postdoctoral grant funded by the Xunta de Galicia, Spain, as part of the Plan Galego de Investigación, Innovación e Crecemento 2011–2015, Axudas de apoio á etapa de formación postdoutoral, Plan I2C. LGS was supported by the training subprogram of the Ministry of Economy and Competitiveness, Spain (as part of the Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-20). The genotyping service generating methylation analysis data for this study was carried out at CEGEN‐PRB2‐ISCIII and is supported by grant PT13/0001, ISCIII‐SGEFI / FEDER. We particularly wish to acknowledge the sample volunteers and the BioBank IBSP-CV (PT13/0010/0064), integrated in the Spanish National Biobanks Network and Valencian Biobanking Network, for their collaboration.


References

[1]

W. Reik, W. Dean, J. Walter, Epigenetic reprogramming in mammalian development, Science 293 (2001) 1089–1093.

[2]

M.F. Fraga, E. Ballestar, M.F. Paz, S. Ropero, F. Setien, M.L. Ballestar, D. Heine-Suñer, J.C. Cigudosa, M. Urioste, J. Benitez, M. Boix-Chornet, A. Sanchez-Aguilera, C. Ling, E. Carlsson, P. Poulsen, A. Vaag, Z. Stephan, T.D. Spector, Y.-Z. Wu, C. Plass, M. Esteller, Epigenetic differences arise during the lifetime of monozygotic twins., Proc. Natl. Acad. Sci. U. S. A. 102 (2005) 10604–10609.

[3]

A. Riggs, V. Russo, R. Martienssen, Epigenetic mechanisms of gene regulation, Plainview, N.Y. Cold Spring Harbor Laboratory Press, 1996.

[4]

A. Saha, J. Wittmeyer, B.R. Cairns, Chromatin remodelling: the industrial revolution of DNA around histones, Nat. Rev. Mol. Cell Biol. 7 (2006) 437–447.

[5]

B.D. Strahl, C.D. Allis, The language of covalent histone modifications., Nature. 403 (2000) 41–45.

[6]

J.T. Lee, Epigenetic regulation by long noncoding RNAs., Science 338 (2012) 1435–1439.

[7]

P.A. Jones, Functions of DNA methylation: islands, start sites, gene bodies and beyond., Nat. Rev. Genet. 13 (2012) 484–492.

[8]

M.M. Suzuki, A. Bird, DNA methylation landscapes: provocative insights from epigenomics., Nat. Rev. Genet. 9 (2008) 465–476.

[9]

P.W. Laird, Principles and challenges of genomewide DNA methylation analysis, Nat. Rev. Genet. 11 (2010) 191–203.

[10]

Z.D. Smith, A. Meissner, DNA methylation: roles in mammalian development, Nat. Rev. Genet. 14 (2013) 204-220.

[11]

D. Schübeler, Function and information content of DNA methylation, Nature. 517 (2015) 321–326.

[12]

H. Heyn, N. Li, H.H.J. Ferreira, S. Moran, D.G. Pisano, A. Gomez, J. Diez, Distinct DNA methylomes of newborns and centenarians, Proc. Natl. Acad. Sci. U. S. A. 109 (2012) 10522–10527.

[13]

S. Horvath, DNA methylation age of human tissues and cell types, Genome Biol. 14 (2013) R115.

[14]

M.J. Jones, S.J. Goodman, M.S. Kobor, DNA methylation and healthy human aging, Aging Cell. 14 (2015)

[15]

M. Zampieri, F. Ciccarone, R. Calabrese, C. Franceschi, A. Bürkle, P. Caiafa, Reconfiguration of DNA methylation in aging, Mech. Ageing Dev. 151 (2015) 60–70.

[16]

L.N. Booth, A. Brunet, The Aging Epigenome, Mol. Cell. 62 (2016) 728–744.

[17]

V.K. Rakyan, T.A. Down, S. Maslau, T. Andrew, T.P. Yang, H. Beyan, P. Whittaker, O.T. McCann, S. Finer, A.M. Valdes, R.D. Leslie, P. Deloukas, T.D. Spector, Human aging-associated DNA hypermethylation occurs preferentially at bivalent chromatin domains, Genome Res. 20 (2010) 434–439.

[18]

S. Bocklandt, W. Lin, M. Sehl, F. Sánchez, J. Sinsheimer, S. Horvath, E. Vilain, Epigenetic predictor of age, PLoS One. 6 (2011) e14821.

[19]

J.T. Bell, P.C. Tsai, T.P. Yang, R. Pidsley, J. Nisbet, D. Glass, M. Mangino, G. Zhai, F. Zhang, A. Valdes, S.Y. Shin, E.L. Dempster, R.M. Murray, E. Grundberg, A.K. Hedman, A. Nica, K.S. Small, E.T. Dermitzakis, M.I. McCarthy, J. Mill, T.D. Spector, P. Deloukas, Epigenome-wide scans identify differentially methylated regions for age and age-related phenotypes in a healthy ageing population, PLoS Genet. 8 (2012).

[20]

G. Hannum, J. Guinney, L. Zhao, L. Zhang, G. Hughes, S. Sadda, B. Klotzle, M. Bibikova, J.B. Fan, Y. Gao, R. Deconde, M. Chen, I. Rajapakse, S. Friend, T. Ideker, K. Zhang, Genome-wide methylation profiles reveal quantitative views of human aging rates, Mol. Cell. 49 (2013) 359–367.

[21]

Å. Johansson, S. Enroth, U. Gyllensten, Continuous aging of the human DNA methylome throughout the human lifespan, PLoS One. 8 (2013) e67378.

[22]

J.L. Mcclay, K.A. Aberg, S.L. Clark, S. Nerella, G. Kumar, L.Y. Xie, A.D. Hudson, A. Harada, C.M. Hultman, P.K.E. Magnusson, P.F. Sullivan, E.J.C.G. Van den oord, A methylome-wide study of aging using massively parallel sequencing of the methyl-CpG-enriched genomic fraction from blood in over 700 subjects, Hum. Mol. Genet. 23 (2014) 1175–1185.

[23]

I. Florath, K. Butterbach, H. Müller, M. Bewerunge-Hudler, H. Brenner, Cross-sectional and longitudinal changes in DNA methylation with age: An epigenome-wide analysis revealing over 60 novel age-associated CpG sites, Hum. Mol. Genet. 23 (2014) 1186–1201.

[24]

Q. Tan, B.T. Heijmans, J. v. B. Hjelmborg, M. Soerensen, K. Christensen, L. Christiansen, Epigenetic drift in the aging genome: a ten-year follow-up in an elderly twin cohort, Int. J. Epidemiol. (2016) 1146-1158.

[25]

P. Garagnani, M.G. Bacalini, C. Pirazzini, D. Gori, C. Giuliani, D. Mari, A.M. Di Blasio, D. Gentilini, G. Vitale, S. Collino, S. Rezzi, G. Castellani, M. Capri, S. Salvioli, C. Franceschi, Methylation of ELOVL2 gene as a new epigenetic marker of age, Aging Cell. 11 (2012) 1132–1134.

[26]

R. Zbieć-Piekarska, M. Spólnicka, T. Kupiec, Z. Makowska, A. Spas, A. Parys-Proszek, K. Kucharczyk, R. Płoski, W. Branicki, Examination of DNA methylation status of the ELOVL2 marker may be useful for human age prediction in forensic science, Forensic Sci. Int. Genet. 14 (2015) 161–167.

[27]

A. Freire-Aradas, C. Phillips, A. Mosquera-Miguel, L. Girón-Santamaría, A. Gómez-Tato, M. Casares De Cal, J. Álvarez-Dios, J. Ansede-Bermejo, M. Torres-Español, P.M. Schneider, E. Pospiech, W. Branicki, Carracedo, M. V. Lareu, Development of a methylation marker set for forensic age estimation using analysis of public methylation data and the Agena Bioscience EpiTYPER system, Forensic Sci. Int. Genet. 24 (2016) 65–74.

[28]

A. Freire-Aradas, C. Phillips, M. V. Lareu, Forensic individual age estimation with DNA: From initial approaches to methylation tests, Forensic Sci Rev. 29 (2017) 121–144.

[29]

A. Goenka, T.R. Kollmann, Development of immunity in early life, J. Infect. 71 (2015) S112–S120.

[30]

D. Wang, X. Liu, Y. Zhou, H. Xie, X. Hong, H.-J. Tsai, G. Wang, R. Liu, X. Wang, Individual variation and longitudinal pattern of genome-wide DNA methylation from birth to the first two years of life., Epigenetics. 7 (2012) 594–605.

[31]

J.B. Herbstman, S. Wang, F.P. Perera, S.A. Lederman, J. Vishnevetsky, A.G. Rundle, L.A. Hoepner, L. Qu, D. Tang, Predictors and consequences of global DNA methylation in cord blood and at three years, PLoS One. 8 (2013) e72824.

[32]

N. Acevedo, L.E. Reinius, M. Vitezic, V. Fortino, C. Söderhäll, H. Honkanen, R. Veijola, O. Simell, J. Toppari, J. Ilonen, M. Knip, A. Scheynius, H. Hyöty, D. Greco, J. Kere, Age-associated DNA methylation changes in immune genes, histone modifiers and chromatin remodeling factors within 5 years after birth in human blood leukocytes, Clin. Epigenetics. 26 (2015) 7–34.

[33]

A.J. Simpkin, M. Suderman, T.R. Gaunt, O. Lyttleton, W.L. McArdle, S.M. Ring, K. Tilling, G. Davey Smith, C.L. Relton, Longitudinal analysis of DNA methylation associated with birth weight and gestational age, Hum. Mol. Genet. 24 (2015) 3752–3763.

[34]

R.S. Alisch, B.G. Barwick, P. Chopra, L.K. Myrick, G.A. Satten, K.N. Conneely, S.T. Warren, Age-associated DNA methylation in pediatric populations, Genome Res. 22 (2012) 623–632.

[35]

L. Shi, F. Jiang, F. Ouyang, J. Zhang, Z. Wang, X. Shen, DNA methylation markers in combination with skeletal and dental ages to improve age estimation in children, Forensic Sci. Int. Genet. 33 (2018) 1–9.

[36]

M. Ehrich, D. Correll, D. Van Den Boom, Introduction to EpiTYPER for Quantitative DNA methylation analysis using the MassARRAY ® system, Seq. Appl. Note. Doc. No. 8 (2006) 1–8. www.sequenom.com.

[37]

UCSC, genome browser, https://genome.ucsc.edu/cgibin/hgGateway?redirect=manual&source=genome.ucsc.edu. Accessed June 2018

[38]

EpiDesigner, software, (n.d.). http://www.epidesigner.com/start3.html. Accessed June 2018

[39]

R.F. Thompson, J.M. Greally, Package MassArray: Analytical Tools for MassArray Data, (2015).

[40]

R. Koenker, S. Portnoy, P.T. Ng, A. Zeileis, P. Grosjean, B.D. Ripley, Package “quantreg” (2017).

[41]

A. Alfons, Package “cvTools”: Cross-validation tools for regression models, (2015).

[42]

H. Wickham, W. Chang, Package “ggplot2,” (2016). doi:10.1093/bioinformatics/btr406.

[43]

C.I. Weidner, Q. Lin, C.M. Koch, L. Eisele, F. Beier, P. Ziegler, D.O. Bauerschlag, K.-H. Jöckel, R. Erbel, T.W. Mühleisen, M. Zenke, T.H. Brümmendorf, W. Wagner, Aging of blood can be tracked by DNA methylation changes at just three CpG sites., Genome Biol. 15 (2014) R24.


[44]

R. Zbieć-Piekarska, M. Spólnicka, T. Kupiec, A. Parys-Proszek, Z. Makowska, A. Pałeczka, K. Kucharczyk, R. Płoski, W. Branicki, Development of a forensically useful age prediction method based on DNA methylation analysis, Forensic Sci. Int. Genet. 17 (2015) 173–179.

[45]

K. Almstrup, M. Lindhardt Johansen, A.S. Busch, C.P. Hagen, J.E. Nielsen, J.H. Petersen, A. Juul, Pubertal development in healthy children is mirrored by DNA methylation patterns in peripheral blood, Sci. Rep. 6 (2016) 28657.

[46]

B. Bekaert, A. Kamalandua, S.C. Zapico, W. Van De Voorde, R. Decorte, Improved age determination of blood and teeth samples using a selected set of DNA methylation markers, Epigenetics. 10 (2015) 922–930.

[47]

J.L. Park, J.H. Kim, E. Seo, D.H. Bae, S.Y. Kim, H.C. Lee, K.M. Woo, Y.S. Kim, Identification and evaluation of age-correlated DNA methylation markers for forensic use, Forensic Sci. Int. Genet. 23 (2016) 64–70.

[48]

D. Zubakov, F. Liu, I. Kokmeijer, Y. Choi, J.B.J. van Meurs, W.F.J. van IJcken, A.G. Uitterlinden, A. Hofman, L. Broer, C.M. van Duijn, J. Lewin, M. Kayser, Human age estimation from blood using mRNA, DNA methylation, DNA rearrangement, and telomere length, Forensic Sci. Int. Genet. 24 (2016) 33–43.

[49]

I. Smeers, R. Decorte, W. Van de Voorde, B. Bekaert, Evaluation of three statistical prediction models for forensic age prediction based on DNA methylation, Forensic Sci. Int. Genet. 34 (2018) 128–133.


Figure legends

Fig. 1. Dispersion diagrams showing the methylation level compared to the chronological age for the top ten CpG sites that display the highest age-correlation in children and adolescents. Data points for both young (orange, N=398, 3-17 years old, GSE27097) and adult (green, N=723, 14-94 years old, GSE87571) samples are depicted in order to show changes in the DNA methylation patterns for these CpG sites during the human lifetime.


Fig. 2. Differences in average DNA methylation levels detected for four individuals aged 3 years versus four individuals aged 18 years using EpiTYPER® DNA methylation analysis. Five novel amplicons were explored: KCNAB3 (10 CpG sites), PRKG2 (3 CpG sites), FLJ46365 (1 CpG site), ITGA2B (2 CpG sites) and EDARADD (2 CpG sites).


Fig. 3. Bar plot representing the Spearman correlations for the 57 CpG sites contained in the ten amplicons analyzed: KCNAB3, PRKG2, EDARADD, ELOVL2, ASPA, PDE4C, FHL2, CCDC102B, MIR29B2CHG and chr16:85395429 (no gene associated), analyzed in 180 European individuals ranging from 2 to 18 years old using EpiTYPER® DNA methylation analysis. Positive correlations with age are depicted in green (hypermethylation), while negative correlations are depicted in red (hypomethylation).


Fig. 4. Dispersion diagrams representing the DNA methylation compared to the chronological age for the CpG sites presenting the highest age-correlation from each amplicon in 180 children/adolescent samples (2 to 18 years old) using EpiTYPER® DNA methylation analysis. Blue and red data points describe males and females, respectively. The corresponding Spearman correlation coefficient is also provided (rs).


Fig. 5. A. Predicted versus chronological age for a total of 180 subjects between 2 and 18 years old (training set). B. Predicted versus chronological age for a total of 29 additional subjects between 3 and 18 years old (testing set). Predictions were based on a quantile regression analysis using six CpG sites: CR_29_CpG_11 (KCNAB3), CR_35_CpG_3 (EDARADD), CR_1_CpG_15.16.17 (ELOVL2), CR_13_CpG_1 (CCDC102B), CR_21_CpG_9.10 (MIR29B2CHG) and CR_23_CpG_3 (no gene). The black diagonal line represents the 0.5 quantile regression line between predicted and chronological age, while the continuous grey line is the diagonal representing perfect correlation. The prediction intervals (0.1 and 0.9 quantile regression lines) are represented by discontinuous grey lines.


Table 1. List of the top ten age-correlated CpG sites in a study cohort of 398 young individuals from 3 to 17 years old, derived from blood DNA methylation data in dataset GSE27097. Corresponding age-correlations including blood DNA methylation data from 723 donors aged between 14 and 94 (GSE87571) are also reported.

| Gene | CpG ID | GRCh38 chromosome position | rs (GSE27097) | rs (GSE27097-GSE87571) |
|---|---|---|---|---|
| FLJ40365 | cg02489552 | chr19:15010719 | 0.511 | 0.832 |
| SDS | cg04123409 | chr12:113403812 | -0.516 | -0.732 |
| PGLYRP2 | cg07408456 | chr19:15479721 | -0.522 | -0.820 |
| EDARADD | cg09809672 | chr1:236394382 | -0.512 | -0.835 |
| HKR1 | cg12024906 | chr19:37334777 | 0.530 | 0.728 |
| KCNAB3 | cg14918082 | chr17:7929919 | 0.656 | 0.699 |
| PRKG2 | cg16744741 | chr4:81204871 | -0.624 | -0.752 |
| FLJ46365 | cg25538571 | chr8:48590150 | -0.582 | -0.750 |
| ITGA2B | cg25809905 | chr17:44390360 | -0.631 | -0.705 |
| TOM1L1 | cg27210390 | chr17:54901222 | -0.501 | -0.710 |

Table 2. List of the 10 selected CpG sites with the highest age-correlation values from each amplicon detected with EpiTYPER® DNA methylation analysis in 180 young subjects (2-18 years old).

| Gene | Internal code | CpG ID | GRCh38 chromosome position | rs |
|---|---|---|---|---|
| KCNAB3 | CR_29_CpG_11 | cg27162435 | chr17:7929845 | 0.752 |
| PRKG2 | CR_33_CpG_3 | none | chr4:81204908 | -0.563 |
| EDARADD | CR_35_CpG_3 | none | chr1:236394441 | -0.713 |
| ELOVL2 | CR_1_CpG_15.16.17 | none/none/none | chr6:11044634/31/28 | 0.847 |
| ASPA | CR_2_CpG_3 | cg02228185 | chr17:3476273 | -0.503 |
| PDE4C | CR_4_CpG_27.28.29 | none/cg01481989/none | chr19:18233127/31/33 | 0.611 |
| FHL2 | CR_12_1_CpG_3 | cg06639320 | chr2:105399282 | 0.677 |
| CCDC102B | CR_13_CpG_1 | none | chr18:68722210 | -0.775 |
| MIR29B2CHG | CR_21_CpG_9.10 | none | chr1:207823702/05 | -0.771 |
| no gene | CR_23_CpG_3 | cg07082267 | chr16:85395429 | -0.740 |

Table 3. Comparison of four age prediction models based on quantile regression analysis using a total of 180 DNA blood samples aged between 2 and 18 years old. MAE: Median Absolute Error (± years), PPERA: Percentage of Predicted Error Relative to Age.

| Model | Gene | CpG Sites | MAE (in years) | PPERA | % Correct Classifications | Correlation Training | Correlation Testing |
|---|---|---|---|---|---|---|---|
| A | KCNAB3, EDARADD, ELOVL2, CCDC102B, MIR29B2CHG, no gene | CR_29_CpG_11, CR_35_CpG_3, CR_1_CpG_15.16.17, CR_13_CpG_1, CR_21_CpG_9.10, CR_23_CpG_3 | ±0.94 | 9.80% | 77.78% | 0.9141 | 0.8925 |
| B | KCNAB3, EDARADD, ELOVL2, CCDC102B, MIR29B2CHG, no gene, FHL2 | CR_29_CpG_11, CR_35_CpG_3, CR_1_CpG_15.16.17, CR_13_CpG_1, CR_21_CpG_9.10, CR_23_CpG_3, CR_12_1_CpG_3 | ±0.87 | 8.79% | 75.56% | 0.9177 | 0.8912 |
| C | KCNAB3, EDARADD, ELOVL2, CCDC102B, MIR29B2CHG, no gene, PDE4C | CR_29_CpG_11, CR_35_CpG_3, CR_1_CpG_15.16.17, CR_13_CpG_1, CR_21_CpG_9.10, CR_23_CpG_3, CR_4_CpG_27.28.29 | ±0.98 | 9.37% | 73.33% | 0.9148 | 0.8899 |
| D | KCNAB3, EDARADD, ELOVL2, CCDC102B, MIR29B2CHG, no gene, FHL2, PDE4C | CR_29_CpG_11, CR_35_CpG_3, CR_1_CpG_15.16.17, CR_13_CpG_1, CR_21_CpG_9.10, CR_23_CpG_3, CR_12_1_CpG_3, CR_4_CpG_27.28.29 | ±0.91 | 9.06% | 73.33% | 0.9175 | 0.8909 |