Targeted exonic sequencing of GWAS loci in the high extremes of the plasma lipids distribution


Atherosclerosis 250 (2016) 63–68. Journal homepage: www.elsevier.com/locate/atherosclerosis


Aniruddh P. Patel a, b, c, Gina M. Peloso a, b, James P. Pirruccello a, b, c, d, Daniel B. Larach e, Matthew R. Ban d, Christopher T. Johansen d, Joseph B. Dube f Geesje M. Dallinge-Thie, Namrata Gupta b, Michael Boehnke g, Gonçalo R. Abecasis g, John J.P. Kastelein f, G. Kees Hovingh f, Robert A. Hegele d, Daniel J. Rader h, Sekar Kathiresan a, b, c, *

a Cardiovascular Research Center and Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA, USA
b Program in Medical and Population Genetics, Broad Institute, Cambridge, MA, USA
c Department of Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
d Department of Medicine, Western University, London, Ontario, Canada
e Department of Anesthesiology, University of Michigan Medical School, Ann Arbor, MI, USA
f Department of Vascular Medicine, Academic Medical Center, Amsterdam, The Netherlands
g Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
h Institute for Translational Medicine and Therapeutics, University of Pennsylvania, Philadelphia, PA, USA

Article info

Article history:
Received 15 September 2015
Received in revised form 12 April 2016
Accepted 13 April 2016
Available online 23 April 2016

Keywords: Genetic techniques; Genomics; Lipids; Human genetics

Abstract

Objective: Genome-wide association studies (GWAS) for plasma lipid levels have mapped numerous genomic loci, with each region often containing many protein-coding genes. Targeted re-sequencing of exons is a strategy to pinpoint causal variants and genes.

Methods: We performed solution-based hybrid selection of 9008 exons at 939 genes within 95 GWAS loci for plasma lipid levels and sequenced using next-generation sequencing technology individuals with extremely high as well as low to normal levels of low-density lipoprotein cholesterol (LDL-C, n = 311; mean low = 71 mg/dl versus high = 241 mg/dl), triglycerides (TG, n = 308; mean low = 75 mg/dl versus high = 1938 mg/dl), and high-density lipoprotein cholesterol (HDL-C, n = 684; mean low = 32 mg/dl versus high = 102 mg/dl). We identified 15,002 missense, nonsense, or splice site variants with a frequency <5%. We tested whether coding sequence variants, individually or aggregated within a gene, were associated with plasma lipid levels. To replicate findings, we performed sequencing in independent participants (n = 6424).

Results: Across discovery and replication sequencing, we found 6 variants with significant associations with plasma lipids. Of these, one was a novel association: p.Ser147Asn variant in APOA4 (14.3% frequency, TG OR = 0.49, P = 7.1 × 10⁻⁴) with TG. In gene-level association analyses where rare variants within each gene are collapsed, APOC3 (P = 2.1 × 10⁻⁵) and LDLR (P = 5.0 × 10⁻¹²) were associated with plasma lipids.

Conclusions: After sequencing genes from 95 GWAS loci in participants with extremely high plasma lipid levels, we identified one new coding variant associated with TG. These results provide insight regarding design of similar sequencing studies with respect to sample size, follow-up, and analysis methodology.

© 2016 Elsevier Ireland Ltd. All rights reserved.

1. Introduction

* Corresponding author: S. Kathiresan, 185 Cambridge Street, CPZN 5.252, Boston, MA 02114, USA.
http://dx.doi.org/10.1016/j.atherosclerosis.2016.04.011
0021-9150/© 2016 Elsevier Ireland Ltd. All rights reserved.

Low-density lipoprotein cholesterol (LDL-C), triglycerides (TG), and high-density lipoprotein cholesterol (HDL-C) are highly heritable risk factors for coronary heart disease (CHD) [1]. Genome-wide association studies (GWAS) have identified many new single nucleotide polymorphisms (SNPs) related to plasma lipid levels in the population [2–9]. Most associated SNPs are non-coding (intergenic or intronic) and fall in regions containing many protein-coding genes. It has been a major challenge to identify the causal gene and variant responsible for the observed associations.

One approach to pinpoint causal genes and variants at GWAS loci is to perform fine mapping through targeted sequencing. Sequencing may identify a protein-altering variant in a gene, which, if associated with LDL-C, TG, or HDL-C, might suggest that the gene influences plasma lipid variation. The discovery of rare, nonsense alleles that affect a trait may be particularly informative. Targeted sequencing of GWAS loci has been used to pinpoint independent rare variants and causal genes for diabetes mellitus [10], fetal hemoglobin [11], age-related macular degeneration [12–16], and Crohn's disease, among others [17].

To search for causal genes and variants at genomic regions implicated for plasma lipids, we focused on the first 95 GWAS loci reported for plasma lipid levels [5] and targeted 9008 exons at 939 genes within these genomic regions. We sequenced individuals with extremely high plasma lipid levels and healthy controls, and performed replication sequencing in independent samples. Our two major goals were: 1) to discover novel coding variants of large effect in GWAS loci associated with plasma lipids; 2) to determine at least one causal gene influencing plasma lipids at each GWAS locus.

2. Materials and methods

2.1. Ethics statement

All studies of individuals and all analyses of their samples were performed according to the Declaration of Helsinki and were approved by the local medical ethics and institutional review committees at the Broad Institute.

2.2. Discovery cohort selection

Individuals of European ancestry with an extremely high LDL-C, TG, or HDL-C level were recruited from lipid specialty clinics at the University of Pennsylvania, Amsterdam Medical Centre, and the University of Western Ontario. Healthy age- and sex-matched controls were recruited from the same medical centers independent of the lipid clinics. Individuals without history of liver disease or HIV and who were not pregnant, nursing, or taking hormone replacement therapy or niacin had ~40 cc of blood drawn. Plasma lipid levels were measured directly by standard protocols in clinical labs. LDL-C was calculated using the Friedewald equation (LDL-C = TC − HDL-C − (TG/5)) for those with TG < 400 mg/dl. If TG > 400 mg/dl, LDL-C was not calculated. Whole genomic DNA was extracted from the blood of these individuals.

The cut point percentiles were calculated based on the individuals of European descent in the Framingham Heart Study Offspring cohort stratified by age and sex. Individuals with plasma lipid levels greater than the 95th percentile were selected for targeted sequencing (LDL: n = 145, mean LDL-C = 241 mg/dl; TG: n = 143, mean TG = 1937 mg/dl; HDL: n = 353, mean HDL-C = 101 mg/dl). Healthy controls with plasma lipid levels less than the 25th percentile for LDL and HDL and less than the 50th percentile for TG were also sequenced (LDL: n = 166, mean LDL-C = 71 mg/dl; TG: n = 165, mean TG = 75 mg/dl; HDL: n = 331, mean HDL-C = 32 mg/dl) (Table 1). The paucity of individuals with very low TG led to the altered cutoff for the control group to maintain power.
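The Friedewald calculation described above can be illustrated with a minimal Python sketch. The function name and the constant are hypothetical, and treating a TG value of exactly 400 mg/dl as "not calculated" is an assumption, since the text only specifies TG < 400 and TG > 400.

```python
from typing import Optional

# Assumption: TG of exactly 400 mg/dl is treated like TG > 400 (no estimate).
TG_CUTOFF_MG_DL = 400.0


def friedewald_ldl(tc: float, hdl: float, tg: float) -> Optional[float]:
    """Return calculated LDL-C in mg/dl, or None when TG is too high.

    Implements LDL-C = TC - HDL-C - (TG / 5), with all inputs in mg/dl.
    """
    if tg >= TG_CUTOFF_MG_DL:
        return None  # LDL-C is not calculated for these individuals
    return tc - hdl - tg / 5.0


# Hypothetical lipid panels, for illustration only.
print(friedewald_ldl(tc=200.0, hdl=50.0, tg=150.0))   # 120.0
print(friedewald_ldl(tc=300.0, hdl=32.0, tg=1938.0))  # None
```

Returning `None` rather than a sentinel number keeps uncalculated values from entering downstream means by accident.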

Table 1. Characteristics of participants who underwent targeted sequencing with extremely high or low plasma lipid levels.

Characteristic | Low LDL (n = 166) | High LDL (n = 145) | Low TG (n = 165) | High TG (n = 143) | Low HDL (n = 331) | High HDL (n = 353)
LDL-C (mg/dl) | 70.8 (15.9) | 241.3 (41.2) | 116.7 (32.7) | 118.0 (83.0) | 103.1 (38.3) | 123.1 (36.0)
TG (mg/dl) | 90.5 (36.5) | 128.8 (45.7) | 74.8 (20.4) | 1937.8 (1907.5) | 155.4 (86.4) | 74.4 (30.4)
HDL-C (mg/dl) | 50.7 (12.7) | 53.5 (12.2) | 53.9 (14.9) | 32.2 (13.6) | 31.5 (4.9) | 101.3 (18.9)
Age | 41.7 (17.5) | 42.8 (15.1) | 48.7 (15.2) | 49.6 (12.2) | 62.9 (13.6) | 59.7 (12.2)
Female (%) | 47.4% | 48.0% | 58.2% | 28.7% | 45.9% | 49.6%
BMI | 18.3 (2.7) | 23.9 (3.4) | 25.6 (4.3) | 30.5 (4.8) | 28.9 (5.0) | 23.5 (3.2)
T2D (%) | 0.0% | 1.4% | 1.8% | 37.9% | 6.9% | 5.9%
Smoking (%) | 20.0% | 45.0% | 10.8% | 34.1% | 35.1% | 5.6%
HTN (%) | 0.0% | 40.7% | 10.4% | 46.4% | 60.4% | 27.4%
Statin use (%) | 0.0% | 0.0% | 3.0% | 0.0% | 28.2% | 18.9%
CAD (%) | 0.0% | 30.2% | N/A | N/A | 82.1% | 4.6%

Mean phenotypic characteristics of individuals with low TG (<50th percentile), high TG (>95th percentile), low LDL-C (<25th percentile), high LDL-C (>95th percentile), low HDL-C (<25th percentile), high HDL-C (>95th percentile) who underwent targeted sequencing. All individuals who underwent targeted sequencing were of European ancestry. Parentheses denote standard deviations.

2.3. Targeted sequencing

We sequenced the exons of all genes within 300 kb from the lead GWAS SNP identified in a GWAS meta-analysis involving >100,000 individuals (Supplementary Table 1) [5,9]. The 95 loci previously mapped for plasma lipids (P < 5 × 10⁻⁸) represented a total of 9008 exons at 939 genes. Exons were captured by solution-based hybridization [18]. To amplify exons, target-specific oligonucleotides 170 bases in length were designed to cover the entire coding sequence (hybrid selection bait size: 262,873 bases). These 170-mers were flanked with universal primer sequence to allow for PCR amplification. A T7 promoter was added in a second round of PCR, and in vitro transcription in the presence of biotin-UTP was performed to generate single-stranded hybridization bait to capture targets of interest from the DNA sample.

Genomic DNA from individuals was randomly sheared and ligated to Illumina sequencing adapters. The fragments of this sheared and ligated genomic DNA were PCR amplified for 12 cycles and hybridized with biotinylated RNA bait. The hybridized DNA was extracted and PCR amplified to generate 36-base sequencing reads off of the Illumina adaptor sequence at the ends of each fragment. Next-generation sequencing reactions were performed using Illumina Genome Analyzers. Base pairs were called and sequencing reads were aligned to the human genome reference GRCh37 (hg19). Sequencing metrics were calculated using the Picard data-processing pipeline with an output of Binary Alignment Map (BAM) files. The Genome Analysis Toolkit [19] suite was used to genotype all variants, calculate initial quality control metrics, and filter variants based on these values to result in an output of Variant Call Format (VCF) files, which were used for further quality control and analysis. Variants were annotated using SnpEff [20].

2.4. Discovery cohort quality control

Samples that failed in any step of the solution hybrid selection component of the targeted sequencing process were excluded. Population clustering was assessed through multidimensional scaling using pruned common variants (>5% minor allele frequency) with high call rates and that were not in linkage disequilibrium. Outliers on a plot of the first two principal components generated from multidimensional scaling were excluded to ensure population stratification did not confound the results. Samples with high heterozygosity rates (number of heterozygote sites/number of variants per sample) were excluded as presumptively contaminated, and those with high singleton counts (more than three interquartile ranges above the median) were excluded due to presumptive sequencing error. Variants with low mean depth (<8) and low call rate (<95%) were excluded.

2.5. Discovery cohort statistical analysis

Single variant association results for the discovery phase targeted sequencing analysis were computed using adaptive permutations on a dichotomous phenotype of high and low levels of plasma lipids using Fisher's exact test. Using a minor allele frequency cutoff of 5%, the variable threshold and C-alpha gene burden tests were used to identify significantly associated genes in each locus, with a Bonferroni corrected P value based on the total number of genes sequenced at the locus [21,22]. We filtered results to have a minor allele count of at least 5. All single variant associations and gene-based associations with a P value < 0.05 were compared with respective association results in the replication sequencing population. Multiple marginally significant associations within a GWAS locus underwent conditional analysis with respect to the strongest known association in the locus to confirm independent association results. All analyses were performed using R, GATK, PLINK, and PLINK/SEQ [23–26]. Power estimates were recalculated after the quality control phase.
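To make the single-variant test and the per-locus Bonferroni correction in Section 2.5 concrete, the following is a minimal sketch using only the Python standard library. It is not the authors' pipeline: the study used adaptive permutations in PLINK and PLINK/SEQ, which this sketch omits, and all function names and allele counts below are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]].

    Rows: high-lipid group, low-lipid group.
    Columns: alternate allele count, reference allele count.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x alternate alleles in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio for the alternate allele in the high-lipid group."""
    return float("inf") if b * c == 0 else (a * d) / (b * c)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold, e.g. n_tests = genes at one locus."""
    return alpha / n_tests


# Hypothetical allele counts: 143 high-TG individuals (286 alleles) and
# 165 low-TG individuals (330 alleles).
a, b, c, d = 52, 234, 10, 320
print(odds_ratio(a, b, c, d), fisher_exact_two_sided(a, b, c, d))
print(bonferroni_threshold(0.05, 12))  # threshold for a locus with 12 genes
```

The same 2x2 test could be applied to carrier counts collapsed across a gene, but that would be a simple collapsing burden test rather than the variable threshold or C-alpha tests used in the study.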

2.6. Replication studies

To replicate our findings, samples were obtained from 3 studies of European descent: the Ottawa Heart Study [27], PROCARDIS [28], and ATVB [29] (Supplementary Table 2). Each study was designed to understand the inherited basis for coronary heart disease and ascertained cases with either myocardial infarction or coronary revascularization at an early age and controls free of coronary heart disease. Plasma lipid levels were measured by standard protocols in clinical labs in these individuals.

For participants known to be on lipid lowering therapy, we accounted for treatment by dividing the individual's measured total cholesterol (TC) value by 0.8 to estimate the untreated value. Such an approach has been demonstrated to perform well in accounting for treatment effects in studies of quantitative traits [30]. Statins are the most widely used treatment to lower plasma lipids and a statin at average dose reduces total cholesterol by 20% [31]. LDL-C was calculated using the Friedewald equation (LDL-C = TC − HDL-C − (TG/5)) for those with TG < 400 mg/dl. If TG > 400 mg/dl, calculated LDL-C was set to missing. If TC was modified for medication use, the modified total cholesterol was used to calculate LDL-C.

Whole genomic DNA was extracted from the blood of these individuals for sequencing, which was performed at the Broad Institute as previously described [32]. Briefly, we sequenced the exomes of individuals within these cohorts to high coverage by performing solution-based hybrid selection of exons (Agilent) followed by massively parallel sequencing (Illumina Genome Analyzer II and HiSeq). The hybrid selection targeted 32.7 million bases spanning 188,260 exons from 18,560 genes. We used the Burrows-Wheeler Aligner to map 76-base-pair reads. Using the Genome Analysis Toolkit, we identified and genotyped autosomal single nucleotide variants (SNVs) and short insertion-deletion variants (indels) occurring in exons and canonical splice sites up to 2 bases from each intron-exon boundary. We performed several quality control steps to identify and remove outlier samples (based on missingness, total number of variants, singletons, doubletons, and Ti/Tv) and variants (based on VQSR).

Analysis was performed using the seqMeta package (http://cran.r-project.org/web/packages/seqMeta/index.html) in the R software package. Two analyses were performed: single variant association using linear regression analysis of plasma lipid levels on a continuous distribution and gene-based burden testing. For the gene-based test, a 1% MAF threshold was used with only nonsynonymous and nonsense variants. TG levels were log transformed due to skewness, and all traits were inverse normalized before analysis. Analyses were performed separately by study and myocardial infarction case-control status, adjusted for age, sex, and PCs of ancestry. Replication results were obtained through meta-analysis using the seqMeta package.
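The phenotype preparation described in Section 2.6 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, "inverse normalized" is taken to mean a rank-based inverse normal transform, and the (rank − 0.5)/n offset is one common convention that the text does not specify.

```python
import math
from statistics import NormalDist

STATIN_TC_FACTOR = 0.8  # from the text: treated TC is divided by 0.8


def untreated_tc(tc: float, on_lipid_lowering: bool) -> float:
    """Estimate untreated total cholesterol (mg/dl) for treated participants."""
    return tc / STATIN_TC_FACTOR if on_lipid_lowering else tc


def inverse_normal_transform(values: list[float]) -> list[float]:
    """Rank-based inverse normal transform; tied values share an average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    standard_normal = NormalDist()
    # Assumed offset: (rank - 0.5) / n keeps every quantile strictly in (0, 1).
    return [standard_normal.inv_cdf((r - 0.5) / n) for r in ranks]


# Hypothetical TG values in mg/dl: log transform for skewness, then normalize.
tg = [75.0, 162.8, 1938.0, 120.0]
log_tg = [math.log(x) for x in tg]
print(inverse_normal_transform(log_tg))
print(untreated_tc(200.0, on_lipid_lowering=True))
```

Because the transform depends only on ranks, it yields the same values whether or not TG is log transformed first; the log step is kept here only to mirror the order of operations in the text.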

3. Results

3.1. Discovery sequencing

Of the 1434 individuals of European descent who underwent targeted sequencing, 1303 individuals remained after quality control measures and phenotype modeling. The final targeted sequencing association analysis was performed on 311 individuals for LDL-C levels (166 low LDL-C individuals with mean LDL-C = 70.8 mg/dl and 145 high LDL-C individuals with mean LDL-C = 241.3 mg/dl), 308 individuals for TG levels (165 low TG individuals with mean TG = 74.8 mg/dl and 143 high TG individuals with mean TG = 1937.8 mg/dl), and 684 individuals for HDL-C levels (331 low HDL-C individuals with mean HDL-C = 31.6 mg/dl and 353 high HDL-C individuals with mean HDL-C = 101.8 mg/dl) (Table 1). Of the 262,873 targeted bases, 76% were covered at greater than 30-fold coverage whereas 81% were covered at greater than 20-fold coverage. Across the 1303 individuals, we identified 16,199 missense, nonsense, or essential splice site DNA sequence variants and of these, 15,002 were 'rare' (defined in this report as minor allele frequency <5%).

3.2. Single variant association analysis from targeted sequence data

First, we tested the association of individual variants discovered through targeted sequencing with plasma lipid levels using a case-control design. Quantile-quantile plots of the single variant association results from targeted sequencing show that most of the variants fall along the expected null distribution for each trait, indicating that each study is well calibrated (Supplementary Figures 1–3). A small fraction of nonsynonymous variants within the vicinity of the tested 95 GWAS loci (n = 114 coding variants) displayed nominal evidence for association (P < 0.05) (Supplementary Table 3). Of these, 7 were loss-of-function mutations (stop gained, frame-shift, splice site) and 107 were missense mutations.
The variants with the lowest P values for each trait were in genes with well-characterized roles in plasma lipid metabolism including APOB with LDL-C, LPL with TG, and CETP with HDL-C. The rare missense, nonsense, or splice-site variant with the strongest association evidence was in the TM6SF2 gene with LDL-C (p.Leu156Pro, 1.60% frequency, OR for high LDL-C of 0.05, P = 1.1 × 10⁻³), C6orf10 gene with TG (p.Gly463Val, 1.30% frequency, OR for high TG of 20.17, P = 8.6 × 10⁻⁴), and CETP gene with HDL-C (p.Arg408Gln, 3.2% frequency, OR for high HDL-C of 0.22, P = 1.0 × 10⁻⁵).

3.3. Replication of single variant results with additional sequencing

We sought to replicate single variants with nominal significance in a set of independent participants. Towards this end, we used sequences from 6424 individuals of European descent. These individuals had plasma lipid levels reflective of the general European population (mean LDL-C = 140.4 mg/dl, mean TG = 162.8 mg/dl, and mean HDL-C = 48.1 mg/dl) (Supplementary Table 2). After performing stringent quality control measures, we performed single variant analyses using linear regression for LDL-C, TG, and HDL-C levels in this cohort. Quantile-quantile plots were well calibrated (Supplementary Figure 4) for each trait.

Across discovery and replication, 6 variants were found to be associated with plasma lipid levels (Table 2) after accounting for the number of variants tested. Of these, 1 was found to be associated with LDL-C (p.Thr98Ile in APOB), 3 with HDL-C (p.Arg408Gln in CETP, p.Ser460X in LPL, and p.Ser19Trp in APOA5), and 4 with triglycerides (p.Ser460X in LPL, p.Leu256Pro in GCKR, p.Ser147Asn in APOA4, and p.Ser19Trp in APOA5). Only one nonsense variant, p.Ser460X in LPL, was found to have a significant association with both HDL-C and triglycerides, and only p.Arg408Gln in CETP had a MAF <5%. One of the results was novel whereas seven of the associations have been previously reported. We found a novel association between APOA4 p.Ser147Asn (MAF = 14.3%) and TG. Carriers of APOA4 p.Ser147Asn have a 51% reduced risk of high TG compared with non-carriers in the discovery set (P = 1.5 × 10⁻³), and on average have TG levels 9.8 mg/dl lower (P = 7.1 × 10⁻⁴) in the replication set [33–36].

3.4. Gene-level association analysis using sequence data

Our initial goal was to find and validate low-frequency variants in genes in the vicinity of known GWAS loci which may be contributing to the local GWAS signal. Power for detecting these associations is limited by the rarity of variants under study and the limited number of individuals that carry a particular variant. With the gene burden analysis approach, we aggregated putatively functional variation within a gene to test the proportion of variant carriers in the low lipid group versus the proportion of carriers in the high lipid group using the variable threshold (VT) and C-alpha gene-based tests. The C-alpha test is more powerful than VT when there are both protective and deleterious variants with different magnitudes in the same gene. The VT test is more powerful when the magnitudes and directions of effect for the individual variants in a gene are consistent.

Using burden testing collapsing missense, nonsense, and splice site mutations in each gene, we found nominally significant associations for 18 genes with LDL-C, 11 genes with TG, and 25 genes with HDL-C (P < 0.05, see Supplementary Table 4). Of these, only the APOC3 association with HDL (P = 2.1 × 10⁻⁵) and the LDLR association with LDL (P = 5.0 × 10⁻¹²) were found to replicate in independent participants after adjusting for the number of genes tested (Table 3). No replicating significant associations were identified using the C-alpha test.

4. Conclusion

In individuals with extremely high LDL-C, TG, or HDL-C and healthy controls, we sequenced the coding regions of 939 genes located at 95 GWAS loci for plasma lipids. We subsequently performed a replication analysis in independent samples. After performing single variant and gene-burden analyses across discovery and replication cohorts, we identified a coding variant at APOA4 as associated with plasma TG levels.

This study was successful in replicating several known genes with previously defined associations with plasma lipid levels. The functions of LPL, GCKR, APOA5, APOB, and CETP in plasma lipid metabolism have been well established [33–37]. The study also identified a novel genetic association between APOA4 and triglycerides. Although its precise function remains unknown, human apolipoprotein A4 is synthesized in the intestines and is secreted in chylomicrons [38–40]. Synthesis of APOA4 is increased during fat absorption, and it is thought to activate lecithin-cholesterol acyltransferase in vitro [41,42].
Further functional studies will be needed to fully elucidate the role of this protein in regulating plasma triglyceride levels.

The study permits several conclusions. First, since we were unable to pinpoint specific coding variants responsible for the genome-wide association signals for many plasma lipid loci, we can speculate that the GWAS association signals may truly be due to non-coding, regulatory variants. Of the 95 plasma lipid GWAS loci that were fine mapped, 88 loci (93%) remain without any replicating significant single coding variant or gene-based association. Only the p.Ser460X variant in LPL, the p.Thr98Ile variant in APOB, and the p.Ser147Asn variant in APOA4 were found to have identical minor allele frequencies and similar effect size estimates as the non-coding variants in their respective GWAS loci (LPL locus: rs12678919, frequency = 12%, HDL effect = +2.25 mg/dl, HDL P = 9.71 × 10⁻⁹⁸, TG effect = −13.64 mg/dl, TG P = 1.5 × 10⁻¹¹⁵; APOB locus: rs1367117, frequency = 30%, LDL effect = +4.16 mg/dl, LDL P = 4.08 × 10⁻⁹⁶; and APOA1 locus: rs964184, frequency = 13%, TG effect = +16.95 mg/dl, TG P = 6.71 × 10⁻²⁴⁰) [5]. This suggests that for these three loci, the identified coding variants may be responsible for the signal derived by the common non-coding variant in the GWAS association, but further functional studies need to be performed. For the remaining loci, intronic or intergenic SNPs in the vicinity of the plasma lipid GWAS loci may be involved in regulation and expression of coding genes involved in lipid metabolism [43].

Second, targeted sequencing may have limited utility in discovering causal variants at GWAS regions. The absence of rare coding variants of large effect in GWAS loci seen in this study is consistent with a recent targeted sequencing study for autoimmune diseases with even larger sample sizes [44]. Although targeted sequencing has previously been used to identify a few genes implicated in various diseases, hundreds of GWAS loci have collectively been fine mapped and the functional significance of the association signal has not been fully resolved for the vast majority of the loci [10–12,17]. Therefore revisiting and systematically studying the initially discovered non-coding variants in the implicated loci will be necessary to better understand the biologic underpinnings of these associations.

Some key limitations of the present study need to be considered. Since the discovery cohort is ascertained from the high extremes of the population while the validation cohort is ascertained from the general population, we could have missed replicating associations that only drive someone to an extreme phenotype and do not have an effect in the general population. Although sampling in the high extremes was performed to increase the power of finding rare functional variants [45], dichotomizing continuously varying plasma lipid levels may have led to loss of information. Additionally, the collective sample size of 1303 individuals may be too small to provide sufficient power to detect associations of rare alleles with more modest effect associated with these three traits. Based on the respective cohort sizes, this final targeted sequencing analysis has a calculated 80% power to identify 1% frequency variants with OR greater than 2.4 or less than 0.42 in LDL-C, 1% frequency variants with OR greater than 2.58 or less than 0.39 in TG, and 1% frequency variants with odds ratio greater than 1.85 or less than 0.54 in HDL-C (Supplementary Table 5) [46].

This study successfully identified common variants and genes previously implicated in plasma lipid metabolism as well as one new association with plasma triglyceride levels. However, we did not identify any new coding variants or genes associated with plasma lipids at the remaining 94 GWAS loci. Fine mapping of coding regions surrounding GWAS loci may have limited utility in the investigation of the cause of these association signals. These results provide insight regarding the design of similar sequencing studies for cardiovascular traits.

Table 2. Single variant association results including discovery and replication phases.

Trait | Gene | Chromosome:Position | rsID | Protein change | Ref/Alt allele | Alternate AF in low group | Alternate AF in high group | HWE P value | Targeted sequencing OR | Targeted sequencing P value | Replication sequencing effect size (mg/dl) | Replication sequencing P value | Locus
HDL | CETP | 16:57017319 | rs1800777 | R408Q | G/A | 0.05 | 0.01 | 0.53 | 0.22 | 1.0 × 10⁻⁵ | −3.9 ± 0.7 | 4.7 × 10⁻⁸ | CETP
HDL | LPL | 8:19819724 | rs328 | S460X | C/G | 0.08 | 0.13 | 0.85 | 1.66 | 3.6 × 10⁻³ | +2.9 ± 0.4 | 4.3 × 10⁻¹³ | LPL
HDL | APOA5 | 11:116662407 | rs3135506 | S19W | G/C | 0.07 | 0.04 | 0.15 | 0.61 | 4.2 × 10⁻² | −2.13 ± 0.5* | 8.9 × 10⁻⁵* | APOA1
LDL | APOB | 2:21263900 | rs1367117 | T98I | G/A | 0.26 | 0.39 | 0.14 | 1.83 | 2.7 × 10⁻⁴ | +4.5 ± 0.8 | 8.2 × 10⁻⁸ | APOB
TG | APOA5 | 11:116662407 | rs3135506 | S19W | G/C | 0.03 | 0.18 | 0.21 | 7.11 | 1.0 × 10⁻⁶ | +25.2 ± 4.2* | 2.3 × 10⁻⁹* | APOA1
TG | GCKR | 2:27730940 | rs1260326 | L256P | T/C | 0.63 | 0.45 | 0.91 | 0.48 | 4.0 × 10⁻⁶ | −10.7 ± 2.1 | 1.8 × 10⁻⁷ | GCKR
TG | APOA4 | 11:116692334 | rs5104 | S147N | C/T | 0.88 | 0.78 | 0.42 | 0.49 | 1.5 × 10⁻³ | −9.8 ± 2.9 | 7.1 × 10⁻⁴** | APOA1
TG | LPL | 8:19819724 | rs328 | S460X | C/G | 0.10 | 0.03 | 0.65 | 0.33 | 2.6 × 10⁻³ | −24.3 ± 3.2 | 2.1 × 10⁻¹⁴ | LPL

Association results for variants found to have P value < 0.05 in targeted sequencing single variant analysis and significant association in replication sequencing single variant analysis after Bonferroni correction for number of variants tested; Ref = reference allele, Alt = alternate allele, AF = allele frequency, HWE = Hardy-Weinberg equilibrium, OR = odds ratio, Locus = gene name assigned to plasma lipid GWAS SNP from Teslovich et al., Nature 2010.
*Result provided for rs35120633, which is a perfect proxy (r² = 1) for rs3135506.
**Conditional analysis performed for SNP rs5104 taking into account nearby SNP rs35120633 with stronger association, and P value remained significant at P = 1.5 × 10⁻⁴.

Table 3. Top gene-level association results after discovery and replication phases.

Trait | Position | Gene | Test | P value from targeted sequencing | P value from replication sequencing
HDL-C | chr11:116701353-116701613 | APOC3 | VT | 1.8 × 10⁻³ | 2.1 × 10⁻⁵
LDL-C | chr19:11200282-11241961 | LDLR | VT | 4.1 × 10⁻³ | 5.0 × 10⁻¹²

Association results with P < 0.05 in targeted sequencing gene-based analysis using variable threshold tests (VT) and replication sequencing gene-based analysis using burden tests.

Acknowledgements

APP and JPP are recipients of research fellowships from the Stanley J. Sarnoff Cardiovascular Research Foundation. Funding for this study was provided by NHLBI grant 5RC1HL099793 to DJR. GMP is supported by award number T32HL007208 from the NHLBI. NJS holds a Chair supported by the British Heart Foundation. SK is supported by a Research Scholar award from the Massachusetts General Hospital, R01HL107816, and a grant from Fondation Leducq. JJPK is holder of a lifetime achievement award from the Dutch Heart Foundation. GKH is a recipient of a Veni grant (project number 91612122) from the Netherlands Organisation for Scientific Research (NWO). This work is supported by the CardioVascular Research Initiative (CVON201119; Genius), European Union (TransCard: FP7-603091-2), and Fondation Leducq (Transatlantic Network, 2009-2014).

Appendix A. Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.atherosclerosis.2016.04.011.

References

[1] W.B. Kannel, T.R. Dawber, A. Kagan, N. Revotskie, J. Stokes 3rd, Factors of risk in the development of coronary heart disease – six year follow-up experience. The Framingham Study, Ann. Intern. Med. 55 (1961) 33–50.
[2] S. Kathiresan, O. Melander, C. Guiducci, et al., Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans, Nat. Genet. 40 (2008) 189–197.
[3] S. Kathiresan, C.J. Willer, G.M. Peloso, et al., Common variants at 30 loci contribute to polygenic dyslipidemia, Nat. Genet. 41 (2009) 56–65.
[4] R. Saxena, B.F. Voight, V. Lyssenko, N.P. Burtt, P.I.W. De Bakker, H. Chen, J.J. Roix, S. Kathiresan, J.N. Hirschhorn, M.J. Daly, T.E. Hughes, L. Groop, D. Altshuler, Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels, Science 316 (2007) 1331–1336.
[5] T.M. Teslovich, K. Musunuru, A.V. Smith, et al., Biological, clinical and population relevance of 95 loci for blood lipids, Nature 466 (2010) 707–713.
[6] C.J. Willer, S. Sanna, A.U. Jackson, et al., Newly identified loci that influence lipid concentrations and risk of coronary artery disease, Nat. Genet. 40 (2008) 161–169.
[7] Y.S. Aulchenko, S. Ripatti, I. Lindqvist, et al., Loci influencing lipid levels and coronary heart disease risk in 16 european population cohorts, Nat. Genet. 41 (2009) 47–55.
[8] C. Sabatti, S.K. Service, A.L. Hartikainen, et al., Genome-wide association analysis of metabolic traits in a birth cohort from a founder population, Nat. Genet. 41 (2009) 35–46.
[9] Global Lipids Genetics Consortium, C.J. Willer, E.M. Schmidt, et al., Discovery and refinement of loci associated with lipid levels, Nat. Genet. 45 (2013) 1274–1283.
[10] S. Nejentsev, N. Walker, D. Riches, M. Egholm, J.A. Todd, Rare variants of ifih1, a gene implicated in antiviral responses, protect against type 1 diabetes, Science 324 (2009) 387–389.
[11] G. Galarneau, C.D. Palmer, V.G. Sankaran, S.H. Orkin, J.N. Hirschhorn, G. Lettre, Fine-mapping at three loci known to affect fetal hemoglobin levels explains additional genetic variation, Nat. Genet. 42 (2010) 1049–1051.
[12] S. Raychaudhuri, O. Iartchouk, K. Chin, et al., A rare penetrant mutation in cfh confers high risk of age-related macular degeneration, Nat. Genet. 43 (2011) 1232–1236.
[13] H. Helgason, P. Sulem, M.R. Duvvari, et al., A rare nonsynonymous sequence variant in c3 is associated with high risk of age-related macular degeneration, Nat. Genet. 45 (2013) 1371–1376.
[14] J.M. Seddon, Y. Yu, E.C. Miller, et al., Rare variants in cfi, c3 and c9 are associated with high risk of advanced age-related macular degeneration, Nat. Genet. 45 (2013) 1366–1373.
[15] J.P.H. Van De Ven, S.C. Nilsson, P.L. Tan, et al., A functional variant in the cfi gene confers a high risk of age-related macular degeneration, Nat. Genet. 45 (2013) 813–817.

68

A.P. Patel et al. / Atherosclerosis 250 (2016) 63e68

[16] X. Zhan, D.E. Larson, C. Wang, et al., Identification of a rare coding variant in complement 3 associated with age-related macular degeneration, Nat. Genet. 45 (2013) 1375e1381. [17] M.A. Rivas, M. Beaudoin, A. Gardet, et al., Deep resequencing of gwas loci identifies independent rare variants associated with inflammatory bowel disease, Nat. Genet. 43 (2011) 1066e1073. [18] A. Gnirke, A. Melnikov, J. Maguire, P. Rogov, E.M. LeProust, W. Brockman, T. Fennell, G. Giannoukos, S. Fisher, C. Russ, S. Gabriel, D.B. Jaffe, E.S. Lander, C. Nusbaum, Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing, Nat. Biotechnol. 27 (2009) 182e189. [19] M.A. Depristo, E. Banks, R. Poplin, et al., A framework for variation discovery and genotyping using next-generation DNA sequencing data, Nat. Genet. 43 (2011) 491e501. [20] P. Cingolani, A. Platts, L. Wang le, M. Coon, T. Nguyen, L. Wang, S.J. Land, X. Lu, D.M. Ruden, A program for annotating and predicting the effects of single nucleotide polymorphisms, snpeff: snps in the genome of drosophila melanogaster strain w1118; iso-2; iso-3, Fly 6 (2012) 80e92. [21] A.L. Price, G.V. Kryukov, P.I.W. de Bakker, S.M. Purcell, J. Staples, L.J. Wei, S.R. Sunyaev, Pooled association tests for rare variants in exon-resequencing studies, Am. J. Hum. Genet. 86 (2010) 832e838. [22] B.M. Neale, M.A. Rivas, B.F. Voight, D. Altshuler, B. Devlin, M. Orho-Melander, S. Kathiresan, S.M. Purcell, K. Roeder, M.J. Daly, Testing for an unusual distribution of rare variants, PLoS Genet. 7 (2011) e1001322. [23] A. McKenna, M. Hanna, E. Banks, A. Sivachenko, K. Cibulskis, A. Kernytsky, K. Garimella, D. Altshuler, S. Gabriel, M. Daly, M.A. DePristo, The genome analysis toolkit: a mapreduce framework for analyzing next-generation DNA sequencing data, Genome Res. 20 (2010) 1297e1303. [24] S. Purcell, B. Neale, K. Todd-Brown, L. Thomas, M.A.R. Ferreira, D. Bender, J. Maller, P. Sklar, P.I.W. De Bakker, M.J. 
Daly, P.C. Sham, Plink: a tool set for whole-genome association and population-based linkage analyses, Am. J. Hum. Genet. 81 (2007) 559e575. [25] Plink/seq: A Library for the Analysis of Genetic Variation Data, 2012. [26] RDC Team, R: A Language and Environment for Statistical Computing, 2010. [27] R. McPherson, A. Pertsemlidis, N. Kavaslar, A. Stewart, R. Roberts, D.R. Cox, D.A. Hinds, L.A. Pennacchio, A. Tybjaerg-Hansen, A.R. Folsom, E. Boerwinkle, H.H. Hobbs, J.C. Cohen, A common allele on chromosome 9 associated with coronary heart disease, Science 316 (2007) 1488e1491. [28] R. Clarke, J.F. Peden, J.C. Hopewell, et al., Genetic variants associated with lp(a) lipoprotein level and coronary disease, N. Engl. J. Med. 361 (2009) 2518e2528. [29] S. Kathiresan, B.F. Voight, S. Purcell, et al., Genome-wide association of earlyonset myocardial infarction with single nucleotide polymorphisms and copy number variants, Nat. Genet. 41 (2009) 334e341. [30] M.D. Tobin, N.A. Sheehan, K.J. Scurrah, P.R. Burton, Adjusting for treatment effects in studies of quantitative traits: antihypertensive therapy and systolic blood pressure, Statistics Med. 24 (2005) 2911e2935. [31] C. Baigent, A. Keech, P.M. Kearney, L. Blackwell, G. Buck, C. Pollicino, A. Kirby, T. Sourjina, R. Peto, R. Collins, R. Simes, C cholesterol treatment trialists. Efficacy and safety of cholesterol-lowering treatment: prospective meta-

[32] [33]

[34]

[35]

[36]

[37]

[38]

[39]

[40]

[41]

[42] [43] [44]

[45] [46]

analysis of data from 90,056 participants in 14 randomised trials of statins, Lancet 366 (2005) 1267e1278. J. Crosby, G.M. Peloso, P.L. Auer, et al., Loss-of-function mutations in apoc3, triglycerides, and coronary disease, N. Engl. J. Med. 371 (2014) 22e31. I. Kondo, K. Berg, D. Drayna, R. Lawn, DNA polymorphism at the locus for human cholesteryl ester transfer protein (cetp) is associated with high density lipoprotein cholesterol and apolipoprotein levels, Clin. Genet. 35 (1989) 49e56. J.R. Patsch, S. Prasad, A.M. Gotto, W. Patsch, High density lipoprotein2. Relationship of the plasma levels of this lipoprotein species to its composition, to the magnitude of postprandial lipemia, and to the activities of lipoprotein lipase and hepatic lipase, J. Clin. Investig. 80 (1987) 341e347. D. Farrelly, K.S. Brown, A. Tieman, J. Ren, S.A. Lira, D. Hagan, R. Gregg, K.A. Mookhtiar, N. Hariharan, Mice mutant for glucokinase regulatory protein exhibit decreased liver glucokinase: a sequestration mechanism in metabolic regulation, Proc. Natl. Acad. Sci. U. S. A. 96 (1999) 14511e14516. L.F. Soria, E.H. Ludwig, H.R.G. Clarke, G.L. Vega, S.M. Grundy, B.J. McCarthy, Association between a specific apolipoprotein b mutation and familial defective apolipoprotein b-100, Proc. Natl. Acad. Sci. U. S. A. 86 (1989) 587e591. L.A. Pennacchio, M. Olivier, J.A. Hubacek, J.C. Cohen, D.R. Cox, J.C. Fruchart, R.M. Krauss, E.M. Rubin, An apolipoprotein influencing triglycerides in humans and mice revealed by comparative sequencing, Science 294 (2001) 169e173. P.H.R. Green, R.M. Glickman, J.W. Riley, E. Quinet, Human apolipoprotein a-iv. Intestinal origin and distribution in plasma, J. Clin. Investig. 65 (1980) 911e919. P.H.R. Green, R.M. Glickman, C.D. Saudek, C.B. Blum, A.R. Tall, Human intestinal lipoproteins. Studies in chyluric subjects, J. Clin. Investig. 64 (1979) 233e242. S.K. 
Karathanasis, Apolipoprotein multigene family: tandem organization of human apolipoprotein ai, ciii, and aiv genes, Proc. Natl. Acad. Sci. U. S. A. 82 (1985) 6374e6378. S.K. Karathanasis, P. Oettgen, I.A. Haddad, S.E. Antonarakis, Structure, evolution, and polymorphisms of the human apolipoprotein a4 gene (apoa4), Proc. Natl. Acad. Sci. U. S. A. 83 (1986) 8457e8461. A. Steinmetz, G. Utermann, Activation of lecithin:cholesterol acyltransferase by human apolipoprotein a-iv, J. Biol. Chem. 260 (1985) 2258e2264. EP Consortium, I. Dunham, A. Kundaje, et al., An integrated encyclopedia of DNA elements in the human genome, Nature 489 (2012) 57e74. K.A. Hunt, V. Mistry, N.A. Bockett, et al., Negligible impact of rare autoimmune-locus coding-region variants on missing heritability, Nature 498 (2013) 232e235. G.M. Peloso, D.J. Rader, S. Gabriel, S. Kathiresan, M.J. Daly, B.M. Neale, Phenotypic extremes in rare variant study designs, Eur. J. Hum. Genet. (2015). S. Purcell, S.S. Cherny, P.C. Sham, Genetic power calculator: design of linkage and association genetic mapping studies of complex traits, Bioinformatics 19 (2003) 149e150.