Accepted Manuscript
Screening for regulatory variants in 460kb encompassing the CFTR locus in cystic fibrosis patients
Jenny L. Kerschner, Sujana Ghosh, Alekh Paranjapye, Wilmel R. Cosme, Marie-Pierre Audrézet, Miyuki Nakakuki, Hiroshi Ishiguro, Claude Férec, Johanna Rommens, Ann Harris
PII: S1525-1578(18)30236-8
DOI: 10.1016/j.jmoldx.2018.08.011
Reference: JMDI 739
To appear in: The Journal of Molecular Diagnostics
Received Date: 2 June 2018
Revised Date: 18 July 2018
Accepted Date: 10 August 2018
Please cite this article as: Kerschner, JL, Ghosh, S, Paranjapye, A, Cosme, WR, Audrézet, M-P, Nakakuki, M, Ishiguro, H, Férec, C, Rommens, J, Harris A, Screening for regulatory variants in 460kb encompassing the CFTR locus in cystic fibrosis patients, The Journal of Molecular Diagnostics (2018), doi: https://doi.org/10.1016/j.jmoldx.2018.08.011. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Screening for regulatory variants in 460kb encompassing the CFTR locus in cystic fibrosis patients
Jenny L. Kerschner,* Sujana Ghosh,†‡ Alekh Paranjapye,* Wilmel R. Cosme,* Marie-Pierre Audrézet,§ Miyuki Nakakuki,¶ Hiroshi Ishiguro,¶ Claude Férec,§ Johanna Rommens,ǁ** and Ann Harris*†‡
From the Department of Genetics and Genome Sciences,* Case Western Reserve University, Cleveland, Ohio; the Human Molecular Genetics Program,† Lurie Children’s Research Center, Chicago, Illinois; the Department of Pediatrics,‡ Northwestern University Feinberg School of Medicine, Chicago, Illinois; INSERM U1078,§ Brest, France; Human Nutrition,¶ Nagoya University Graduate School of Medicine, Nagoya, Japan; the Program in Genetics and Genome Biology,ǁ Research Institute, The Hospital for Sick Children, Toronto, Ontario, Canada; and the Department of Molecular Genetics,** University of Toronto, Toronto, Ontario.
To whom correspondence should be addressed:
Ann Harris
Department of Genetics and Genome Sciences, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, Ohio, 44106
Email: [email protected]
Running Title: Identifying regulatory variants in CFTR
Disclosures: None declared.
Funding: Supported by the Cystic Fibrosis Foundation (Harris 14P0 and 16G0), the National Institutes of Health R01HL094585 (PI: AH), Genome Canada through the Ontario Genomics Institute (2004-OGI-3-05), and the Canadian Cystic Fibrosis Foundation (aka Cystic Fibrosis Canada).
ABSTRACT
It is estimated that up to 5% of cystic fibrosis transmembrane conductance regulator (CFTR) pathogenic alleles are unidentified. Some of these errors may lie in non-coding regions of the locus and impact gene expression. To identify regulatory element variants in the CFTR locus, SureSelect targeted enrichment of 460kb encompassing the gene was optimized to deep-sequence genomic DNA from 80 CF patients with an unequivocal clinical diagnosis but only one or no CFTR-coding region pathogenic variants. Bioinformatics tools were used to identify sequence variants and predict their impact, which was then assayed in transient reporter gene luciferase assays. The effects of five variants in the CFTR promoter and four in an intestinal enhancer of the gene were assayed in relevant cell lines. The initial analysis of sequence data revealed previously known CF-causing variants, validating the robustness of the SureSelect design, and showed that 85/160 CF alleles were undefined. Of a total of 1,737 variants revealed across the extended 460kb CFTR locus, 51 map to known CFTR cis-regulatory elements, and many of these are predicted to alter transcription factor occupancy. Four promoter variants and all those in the intestinal enhancer significantly repress reporter gene activity. These data suggest that CFTR regulatory elements may harbor novel CF disease-causing variants that warrant further investigation, both for genetic screening protocols and functional assays.
INTRODUCTION
Most disease-causing variants for monogenic disorders fall within the coding region or splice site sequences of genes. Although sequencing efforts to identify causal variants have focused mainly on the exome, it is estimated that 98% of the human genome is composed of non-coding DNA. It is likely that many pathogenic variants lie within these non-coding sequences, especially those that influence chromatin structure/organization or gene expression. Next-generation sequencing (NGS) protocols have facilitated the discovery and characterization of many cis-regulatory element variants that underlie human disease (1). However, generating and interpreting full genomic sequence data for disease-associated loci in patients is not yet common practice.
Cystic fibrosis (CF), a common life-shortening autosomal recessive disorder, is caused by pathogenic variants in the cystic fibrosis transmembrane conductance regulator (CFTR) gene, which encodes a cyclic AMP-activated chloride channel. Newborn screening for CF is routine in countries with advanced healthcare, as most of the estimated 70,000 individuals with CF are of European ancestry (2-6). Commercial tests for about 100 variants detect about 90% of CF-causal alleles in North America and Northern European countries. There are over 2000 documented CFTR variants listed in the CFTR Mutation Database (www.genet.sickkids.on.ca; last accessed June 02, 2018); however, as of December 2017, only 312 of the most frequently observed variants in the CF population have been annotated as disease-causing in the Clinical and Functional Translation of CFTR Database (CFTR2; available at https://cftr2.org; last accessed June 02, 2018). Notably, the variants listed in these databases largely include those that fall within the exons (coding regions) or at intron-exon boundary segments (encompassing splice sites) of the 189kb CFTR locus, which spans from 1kb upstream of the transcription start site to the 3’ untranslated region. However, even with extensive sequencing of these exonic and boundary regions, and identification of novel intronic cryptic splice sites by transcript analysis, it is estimated that 1% to 5% of CF alleles have unidentified causal variants, which are presumed to lie in regulatory non-coding regions of the CFTR locus (7).
We previously identified and characterized CFTR structural and cis-regulatory elements, which extend ~80kb upstream of the CFTR transcriptional start site and ~90kb downstream of the CFTR translational stop site, spanning a distance of ~350kb (8). The distal sites at -80.1kb (from the CFTR start codon) and +48.9kb (from the CFTR stop codon) function as topologically associating domain (TAD) boundaries, which isolate CFTR and its regulatory elements from those of other genes (9, 10). Many important CFTR cis-regulatory elements are found in the intergenic and intronic sequences encompassed by this TAD (8-12).
Our goal was to develop a robust targeted-enrichment approach to sequence the ~460kb region encompassing the CFTR TAD, thus including known structural and cis-regulatory elements. This pipeline was then applied to look for variants in the non-coding regions of CFTR in CF patients with an unequivocal diagnosis but incomplete causal variant information. The following were used: i) an archived cohort of Canadian CF patients with undefined pathogenic variants on one or both alleles after coding region evaluation by earlier multiplex heteroduplex analysis (13) and exclusion of large deletions in some patients; ii) CF patients from Brittany, France, and Japan in which all exons had been fully sequenced, and those from France also screened for large deletions (14, 15). Following deep sequencing and genotyping of 80 CF genomic DNA samples, known disease-causing variants were identified on 75 alleles, whereas 85 CFTR alleles still contained unknown pathogenic CF variants. Taking all alleles together, 1,737 variants (1,426 substitutions and 311 indels) were identified across the 460kb region, of which 51 occurred in 17 CFTR structural and cis-regulatory elements. Many of these alterations are predicted to alter the binding of transcription factors (TFs) known to regulate CFTR expression, or other TFs with a pivotal role in differentiated epithelia. Functional analysis of five variants found within the CFTR promoter region (2kb) showed that four of them impair its activity in airway and intestinal cell lines. Moreover, four variants in an intestinal intronic enhancer of CFTR also significantly reduced enhancer activity in transient assays in an intestinal cell line. Further characterization of these intronic and intergenic variants may reveal novel pathogenic mechanisms causing CF.
MATERIALS AND METHODS
Genomic DNA Samples
Genomic DNA (gDNA) samples from 80 de-identified CF patients, with one or both CFTR alleles having unknown pathogenic variants, were obtained from three CF centers in accordance with approved local guidelines. All patients were followed as having CF based on their clinical characteristics, with diagnosis based on elevated sweat chloride levels and/or other CF-presenting features including failure to thrive, progressive lung disease, and infection histories. gDNA samples were drawn from populations in Canada (n=72), Brittany, France (n=6), and Japan (n=2). Integrity of each gDNA sample was determined by Bioanalyzer (Agilent Technologies, Santa Clara, CA) prior to target enrichment and sequencing.
SureSelect Targeted Enrichment Design
Agilent’s SureDesign eArray tool (Agilent Technologies) was used to design ~120bp SureSelect biotinylated complementary RNA (cRNA) probes to span the ~460kb that encompasses the extended CFTR locus (GRCh37/hg19 chr7:116,970,000-117,429,999), including previously published cis-regulatory elements (8). The first design included 12,776 probes with a total coverage of 491,070 bp (9,158 in the repeat_moderate stringency_maximum_boost_1X set and an additional 3,618 in the least stringency_max_boost_1X set, to add coverage for regions omitted from the moderate stringency group). To reduce off-target hybridization, an optimized design of 12,053 probes covering 463,793 bp was generated by removing the 723 most repetitive probes from the least stringency group (9,158 in the repeat_moderate stringency_maximum_boost_1X set and an additional 2,895 in the least stringency_max_boost_1X set). SureSelect libraries (containing 24 or 48 samples) were prepared according to the manufacturer’s protocol, and up to 48 libraries were pooled and sequenced on a HiSeq2500 (Illumina Inc., San Diego, CA) for 100bp paired-end reads.
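As a quick consistency check on the design figures above, the target interval and probe counts can be tallied directly. The following is a minimal Python sketch, assuming the hg19 coordinates are 1-based and inclusive; the variable names are illustrative and are not part of the SureDesign tool.

```python
# Target interval quoted in the text (GRCh37/hg19, chr7).
# Assumption: coordinates are 1-based and inclusive.
TARGET_START = 116_970_000
TARGET_END = 117_429_999
target_bp = TARGET_END - TARGET_START + 1  # length of the enriched interval

# Probe counts quoted in the text for the two stringency groups.
moderate_probes = 9_158
least_probes_first = 3_618      # least stringency group, first design
least_probes_optimized = 2_895  # least stringency group, optimized design

first_design = moderate_probes + least_probes_first
optimized_design = moderate_probes + least_probes_optimized
removed_probes = least_probes_first - least_probes_optimized

print(f"target interval: {target_bp} bp")
print(f"first design: {first_design} probes")
print(f"optimized design: {optimized_design} probes")
print(f"probes removed between designs: {removed_probes}")
```

The interval length and the two design totals follow from the quoted numbers alone, so the check is independent of any sequencing data.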
Bioinformatic Variant Analysis
Base calls were determined by the Illumina CASAVA pipeline. After filtering for base quality and adapter sequences, the sequencing reads were aligned to the human reference genome (hg19) using bowtie2 v2.2.2 (16). The aligned reads were formatted for input into the Genome Analysis Toolkit (GATK) v3.3 (17), and the pipeline was modeled on the Broad Institute’s Best Practices Guideline for variant calling (18, 19), including local realignment, removal of PCR duplicates, and base quality recalibration. Single nucleotide variants and small insertion-deletion variants (indels) were called by the GATK HaplotypeCaller, and variants for each sample were consolidated with GenotypeGVCFs. The combined variant call format (VCF) file was then annotated with ANNOVAR v2015June17 (20) and filtered for quality with VCFtools v0.1.12.b (21). Large deletions/duplications were identified using custom Perl scripts and manually called using IGV v2.3.92 (http://www.broadinstitute.org/igv) (Supplemental Code S1).
CFTR and Variant Nomenclature
CFTR introns and exons are numbered using legacy nomenclature (22), for consistency with our previous work. All variants are numbered with respect to the A (+1) of the ATG start codon of the major CFTR transcript [LRG_663; NG_016465.4 (NM_000492.3)], following the recommendations of the Human Genome Variation Society (23).
In Silico Predictions
MatInspector v8.4 (Genomatix, Munich, Germany) was used to predict TF binding sites (Matrix Library 10.0) in reference genome versus variant sequences for CFTR cis-regulatory elements surrounding the variants of interest, using default matrix search parameters (core similarity: 0.75; matrix similarity: optimized).
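To illustrate the general idea of comparing predicted binding between reference and variant sequence, the sketch below scores a short window against a toy position weight matrix. It is not the MatInspector algorithm or its matrix library: the matrix, the 8-bp sequence, the substitution, and all function names are hypothetical, and a simple log-odds score stands in for MatInspector's core and matrix similarity measures.

```python
import math

# Hypothetical 4-position motif (base probabilities per position).
# This is NOT a MatInspector matrix; it only illustrates the comparison.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def window_score(window, pwm=PWM):
    """Log-odds score of one window that is as long as the motif."""
    return sum(math.log2(col[base] / BACKGROUND) for col, base in zip(pwm, window))


def best_score(seq, pwm=PWM):
    """Highest log-odds score over all windows in the sequence."""
    width = len(pwm)
    return max(window_score(seq[i:i + width], pwm)
               for i in range(len(seq) - width + 1))


def apply_substitution(seq, index, ref, alt):
    """Return seq with a single-base substitution at a 0-based index."""
    if seq[index] != ref:
        raise ValueError("reference base does not match the sequence")
    return seq[:index] + alt + seq[index + 1:]


reference = "TTAGCTCC"  # hypothetical reference window
variant = apply_substitution(reference, 3, "G", "C")

print("reference best score:", round(best_score(reference), 2))
print("variant best score:", round(best_score(variant), 2))
```

A lower best score for the variant sequence corresponds, in this toy setting, to a weakened or lost predicted site; a higher one would correspond to a gained site.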
Plasmid Construction
Site-directed mutagenesis on pGL3B.1963 (24), which contains an ~2kb CFTR promoter fragment (hg19 chr7:117,118,152-117,120,148), was performed using the QuikChange Lightning Multi Site-Directed Mutagenesis Kit (Agilent Technologies). Approximately 1700 bp encompassing the CFTR intron 11 enhancer element (25) (hg19 chr7:117,227,831-117,229,503) was PCR amplified and cloned into pSCB using the StrataClone Blunt PCR Cloning Kit (Agilent Technologies). The fragment was sub-cloned using SalI into the enhancer site of pGL3B.245, containing the 787bp minimal CFTR promoter (26, 27), and site-directed mutagenesis was performed. Primers are listed in Table 1, and all plasmids were verified using Sanger sequencing.
Cell Culture and Transient Luciferase Assays
Human colorectal carcinoma Caco2 (28) and bronchial epithelial 16HBE14o- (29) cell lines were cultured in Dulbecco’s modified Eagle’s medium, low glucose, supplemented with 10% fetal bovine serum (FBS). Using standard methods (25), cells were co-transfected with pGL3B luciferase reporter constructs and a modified pRL Renilla luciferase control vector using Lipofectin (Thermo Fisher Scientific, Waltham, MA). Cells were lysed after 48h and assayed on a GloMax Navigator (Promega Corp., Madison, WI) for firefly and Renilla luciferase activities using the Dual-Luciferase Reporter Assay Kit (Promega Corp.). Transfections were performed three times, in triplicate, using two different plasmid preparations.
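One common way to summarize dual-luciferase data of this kind is to divide each firefly reading by its paired Renilla reading and express the variant construct relative to the WT construct. The sketch below assumes that approach; the readings and function names are hypothetical, and the statistical analysis used in the study is not represented.

```python
from statistics import mean


def normalized_ratios(firefly, renilla):
    """Firefly/Renilla ratio for each paired well (transfection control)."""
    return [f / r for f, r in zip(firefly, renilla)]


def relative_activity(construct_ratios, wt_ratios):
    """Mean normalized activity of a construct relative to the WT mean."""
    return mean(construct_ratios) / mean(wt_ratios)


# Hypothetical triplicate readings (arbitrary luminescence units).
wt_ratios = normalized_ratios([1000.0, 1100.0, 900.0], [100.0, 100.0, 100.0])
variant_ratios = normalized_ratios([500.0, 450.0, 550.0], [100.0, 100.0, 100.0])

rel = relative_activity(variant_ratios, wt_ratios)
percent_reduction = (1.0 - rel) * 100.0

print(f"relative activity vs WT: {rel:.2f}")
print(f"reduction vs WT: {percent_reduction:.0f}%")
```

Normalizing to Renilla before comparing constructs keeps well-to-well differences in transfection efficiency from being read as promoter or enhancer effects.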
RESULTS
SureSelect Targeted Sequencing Study Design
Using a targeted enrichment approach (chr7:116,970,000-117,429,999) and 12,053 SureSelect biotinylated cRNA probes, 463kb encompassing the extended CFTR locus were deep-sequenced. This region extends beyond the -80.1kb and +48.9kb TAD boundaries that demarcate the limits of the CFTR locus, irrespective of cell type or CFTR expression (9, 10), and includes at least 20 other defined CFTR regulatory elements (Figure 1) (8). Genomic DNA isolated from 80 de-identified CF patients from Toronto, Canada (n=72), Brittany, France (n=6), and Japan (n=2) was analyzed. At the time of clinical CF diagnosis and initial genotype analysis, all patients had at least one undefined CF allele.
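The coordinate checks implied by this design can be sketched as follows. The target interval and the two cloned fragments (2kb promoter and intron 11 enhancer) use the hg19 coordinates given in Materials and Methods; the query positions and function names are hypothetical, 1-based inclusive coordinates are assumed, and the two fragments stand in for the full set of regulatory elements, whose coordinates are not listed here.

```python
# Enriched interval on chr7 (hg19), from Materials and Methods.
TARGET = (116_970_000, 117_429_999)

# Two cloned fragments with coordinates given in Materials and Methods.
# They are only examples; the full list of regulatory elements is not shown.
ELEMENTS = {
    "2kb promoter fragment": (117_118_152, 117_120_148),
    "intron 11 enhancer fragment": (117_227_831, 117_229_503),
}


def in_target(pos):
    """True if a chr7 position lies inside the enriched interval."""
    return TARGET[0] <= pos <= TARGET[1]


def overlapping_elements(pos):
    """Names of the listed elements that contain the position."""
    return [name for name, (start, end) in ELEMENTS.items() if start <= pos <= end]


# Hypothetical query positions, chosen only to exercise each branch.
for pos in (117_119_000, 117_300_000, 116_900_000):
    print(pos, in_target(pos), overlapping_elements(pos))
```

With a complete element list, the same membership test would separate variants in annotated regulatory elements from the remaining variants in the enriched interval.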
Analysis of CF Alleles
To first evaluate the sensitivity of the SureDesign approach, reliable detection of the known pathogenic alleles among the 80 CF patient gDNA samples was confirmed. Next, currently understood CFTR variants, which were not identified by earlier screening methods, were considered (13-15). These variants were classified as disease-causing, varying clinical significance, or unknown significance according to the CFTR2 database (which includes 374 variants; https://www.cftr2.org; last accessed December 2017). Eight pathogenic variants, S489X, C524X, W1282X, L558S, CFTRdele2,3, CFTRdele4-7, CFTRdup6b-10, and CFTRdele14b-17b, were newly identified in this patient cohort following SureDesign enrichment and deep-sequencing compared to original genotyping efforts (Table 2 and Supplemental Table S1) (30, 31). The large deletions and duplication were detected as an ~2-fold reduction or 2-fold increase in sequence read depth across the affected regions, and all were reported in the literature previously (32-36). Also, a previously identified ~7.2kb deletion in one CF patient was confirmed (31). The only common CFTR variant among the patient gDNAs that was not detected by the SureDesign approach was the T(n) polymorphic tract 5T (c.1210-12[5]) allele. However, this was likely due to sequence alignment issues that will be discussed later. Of 160 CF alleles in our sample set, 75 contained known CF-causing variants that affect the CFTR coding sequence or mRNA splicing (Table 2). Thus, 85 CF alleles in our analysis had unknown pathogenic CF variants.
Variation Analysis within the Extended CFTR Locus
To utilize the full depth of the data set, variant analysis was performed on all 160 CF alleles, irrespective of whether they carried defined or undefined causal variants. Substitution and small insertion/deletion (indel) variants identified by SureDesign deep sequencing in the extended CFTR locus were cross-referenced against dbSNP138 (Table 3). Of the 1,737 variants identified, 19% of substitutions and 67% of indels were not annotated in dbSNP138. The high percentage of unannotated indels is likely due to many of them occurring as more than two alleles. Next, the 20 regions of CFTR that were previously defined as regulatory were studied (Figure 1) (8, 25, 37). These regulatory elements cover approximately 20kb of genomic sequence, within which 51 total variants (37 substitutions and 11 indels) were observed in 17 elements (Table 3). Though four of these variants were seen in patients with two CF disease-causing variants (Supplemental Table S1), they were included in further analyses since they may nonetheless be functional in regulating CFTR expression. Many of the variants alter the predicted binding of TFs, either disrupting sites or creating novel ones. Table 4 summarizes these changes for selected known CFTR enhancer elements, including airway (-44kb, -35kb) and intestinal (intron 1 and intron 11) selective enhancers, and an enhancer that is common to both cell lineages (intron 23) (37-42). Changes in TF binding sites in the variant sequences as compared to the wild-type CFTR sequence were predicted using MatInspector v8.4 (Table 4). This approach successfully identified key factors driving CFTR cis-regulatory elements in our earlier work (39-43).
CFTR Promoter Variants Repress Promoter Activity
We previously used transient luciferase reporter gene assays to determine the extent of CFTR promoter sequence required to drive the most robust gene expression. An approximately 2kb fragment (1963bp) was defined previously that has strong promoter activity in airway and intestinal cell lines (24). Here, the effect of five promoter variants identified in our SureSelect screen on the activity of the 2kb CFTR promoter was investigated. All five variants (four substitutions and one indel) in the 2kb CFTR promoter (Figure 2A) are predicted to alter TF binding sites, and all, except c.-410G>C, were previously annotated in dbSNP138 (Table 4). Site-directed mutagenesis was used to independently introduce variants into the pGL3B.2kb CFTR promoter-luciferase construct. CFTR promoter constructs were transfected into airway (16HBE14o-) and intestinal (Caco2) cell lines that express high levels of endogenous CFTR transcript (25, 37), and luciferase expression was compared to that of the wild type (WT) promoter (Figure 2C). An additional variant detected (c.-966T>G) was not examined independently, as the 2kb WT promoter fragment contained the minor allele (c.-966G). This variant has a high minor allele frequency (MAF 0.2238) in the general population (Table 4). Among the five variants tested, four significantly repressed CFTR promoter activity, with c.-812T>G being the strongest repressor of promoter activity (53% in Caco2) and c.-869delT not significantly affecting promoter activity in either cell type (Figure 2C). Consistent with the CFTR promoter lacking tissue-specific control elements (44), the effects of the promoter variants examined here are cell-type independent, with little difference observed between 16HBE14o- and Caco2 cells for most variants.
Variants in the CFTR Intron 11 Intestinal Enhancer Repress Enhancer Activity
To determine whether sequence variants identified in cis-regulatory elements impact their function, four variants found in a robust intestine-selective enhancer element within intron 11 of CFTR were first evaluated (9, 25, 42, 43). These four substitutions within the 1,400bp encompassing the element were predicted to alter TF binding sites (Table 4). Two of the four SNPs were novel (not annotated in dbSNP138), whereas the annotated SNPs both had high minor allele frequencies. The intron 11 enhancer:CFTR promoter luciferase construct (25) (DHS11 short) used previously lacked the terminal 200bp of DHS11, which included the location of two of the four variants (Figure 2B). Hence, a new construct (pGL3B.245DHS11 long) was designed to assay their function. Site-directed mutagenesis was used to independently introduce variants into this construct, and plasmids were transfected into Caco2 cells. The four variants tested all reduce intron 11 enhancer activity in Caco2 cells by 37% to 63% (Figure 2D).
DISCUSSION
Among disease-causing variants currently annotated in the CFTR gene, only those in the promoter or that disrupt or create splice sites occur in non-coding regions. However, as 1% to 5% of CF patients have unknown molecular lesions, additional non-coding variants within the CFTR locus likely contribute to the pathogenicity of CF. Advances in NGS technologies have enabled whole CFTR locus sequencing to search for novel and/or noncoding variants (45, 46). Here we describe a targeted-enrichment method to deep-sequence 460kb of the CFTR locus in 80 CF patients with at least one unknown CF allele (Figure 1). This method was robust in identifying known CF alleles, but the disease-associated variant on 85 alleles in this cohort remains unknown (Table 2). We predicted that among the 85 alleles are uncharacterized variants in CFTR regulatory elements, including the promoter, enhancers, or other structural regulatory elements, which could reduce or abolish CFTR transcription (potentially Class I or V CFTR pathogenic variants resulting in no or low transcript synthesis) (47). Some variants may also cause disease by creating cryptic splice sites or altering splicing efficiency, which, though potentially detectable by in silico prediction programs, would need to be validated using relevant RNA samples and so are not considered here.
Overall, the SureDesign targeted enrichment and deep-sequencing method identified nearly all previously known CF variants present in our cohort, with the exception of the T(n) polymorphic tract 5T (c.1210-12[5]) variant. Previous genotyping efforts indicated that at least two individuals contained a 5T allele; however, the inability of our bioinformatic analysis to identify this variant is likely due to low confidence of mapping and alignment of the T(n) tract. Manual inspection of sequence reads that mapped to the three probes that spanned the T(n) tract revealed the 5T genotype in only the two expected patients, and no others. 5T contributes to complex CF alleles, and is pathogenic, for example, when found in cis with R117H (48). Although the phase of variants cannot be distinguished with this analysis, the only patient with an R117H allele was negative for the 5T allele (Supplemental Table S1). Therefore, the 5T alleles in isolation were not considered as CF-causing in this analysis.
The CFTR promoter is the most extensively studied region of the gene (44), with ~20 variants identified in the 2kb CFTR promoter in patients with CF or CF-related disorders (49-59). Indeed, four of the six variants identified in this study, c.-966T>G, c.-887C>T, c.-812T>G, and c.-8G>C (Table 4), were reported previously (49, 50, 54, 58, 59), though none are listed in the CFTR2 database as their disease-causing status has not been assayed. Here, it was shown that the c.-887T variant reduces CFTR promoter activity by 33% in intestinal and airway cell lines (Figure 2C). However, a previous study found that c.-887T had no effect on promoter activity in A549 (lung adenocarcinoma, CFTR mRNA+), Panc1 (pancreatic adenocarcinoma, CFTR mRNA+), or HepG2 (hepatocellular carcinoma, CFTR mRNA-) cells (58). This disparity could be a consequence of the promoter fragments assayed (2kb versus extended 6kb), or might suggest that the functional consequence of the c.-887T allele is cell-type dependent. Of note, the 16HBE14o- and Caco2 cells used in our experiments likely express substantially more CFTR transcript than A549 and Panc1 cells, implicating higher levels of activating TFs. The c.-887T allele is predicted to create binding motifs for zinc finger protein 652 (ZNF652) and also an inverted repeat 2 (IR2) negative glucocorticoid response element (nGRE). A C2H2-type zinc finger protein, ZNF652, functions as a transcriptional repressor (60, 61), like glucocorticoid receptor when bound to nGREs (62). Aberrant recruitment of either of these factors to the CFTR promoter may have negative consequences on expression.
This study also shows that the c.-812G variant decreased CFTR promoter activity 53% in Caco2 cells, though it did not significantly decrease CFTR promoter activity in 16HBE14o- cells (Figure 2C). Two recent studies also investigated the functional consequence of this variant. In one, c.-812G increased the extended 6kb promoter activity by ~1.5-fold in HepG2 cells, which do not express endogenous CFTR transcript (58). In the second, this variant decreased promoter activity in another CFTR mRNA+ bronchial epithelial cell line, Beas2B, possibly due to the creation of a potential binding motif for an E2F TF family member (59). E2F TFs serve as activators and repressors during development, differentiation, and the cell cycle (63). Although MatInspector analysis did not predict the creation of an E2F motif in the c.-812G sequence, it predicts loss of a deltaEF1 motif (Table 4), now known as zinc finger E-box binding homeobox 1 (ZEB1). ZEB1 is an important regulator of epithelial-to-mesenchymal transition in development and cancer, and like E2F family members, ZEB1 functions as both a transcriptional activator and repressor (64). Further work is required to determine how the c.-812T>G transversion can both activate and repress CFTR promoter activity.
The novel promoter variant observed in one patient, c.-410G>C, also reduces the activity of the CFTR promoter (Figure 2C). This substitution abolishes a putative X-linked zinc finger protein (ZFX) motif (Table 4), a transcriptional activator that plays an important role in maintaining stem cell pluripotency (65). Conversely, the c.-8G>C variant is predicted to gain a zinc finger protein 300 (ZNF300) consensus site (Table 4). ZNF300, a broadly expressed C2H2/KRAB (Krüppel-associated box) zinc finger factor, helps mediate the NF-κB immune response, and functions as a transcriptional repressor (66, 67).
Disruption of TF recruitment to CFTR structural and cis-regulatory elements can have a dramatic effect on CFTR expression and chromatin organization of the locus (8, 9, 11, 40, 42, 43); however, until recently it has been challenging to identify and investigate CF disease-associated variants in these regions. These data indicate that multiple TF binding sites (TFBS) are predicted to be lost or gained by the variants in CFTR enhancers identified here (Table 4). These enhancers are experimentally validated (25, 26, 37, 39-42) and are known to regulate CFTR expression. Moreover, the altered TFBS recruit factors that are relevant to lung and intestinal biology. For example, a nuclear factor I (NFI) motif is destroyed by the c.-35147G variant in the -35kb airway-selective enhancer. The four NFI family members bind the same consensus motif and play critical roles during development through activation and repression of target genes (68, 69). Notably, Nfib null mice die shortly after birth due to severe lung hypoplasia, which is a direct result of developmental defects in mesenchymal and epithelial lung cells in Nfib-/- embryos (70). Also, the c.53+10442G>C variant in the intron 1 intestine-selective enhancer is predicted to destroy a binding motif for hepatocyte nuclear factor 1 homeobox (HNF1), which was previously shown to be an important regulator of CFTR expression in human and mouse intestinal cells, in part through its interaction with the intron 1 enhancer element (41, 42, 71). Loss of binding sites for these factors could reduce CFTR expression to contribute to CF disease pathogenicity in patients carrying these alleles. The observations that variants in the intron 11 intestinal enhancer reduce luciferase reporter gene expression by 37% to 63% in Caco2 cells clearly demonstrate this point (Figure 2D), although in vitro assays are not always indicative of significance in vivo. The novel c.1679+566G>T substitution is predicted to destroy a putative site of occupancy by GATA binding protein 3 (GATA3). GATA3 acts as a pioneer factor to facilitate the opening of previously inaccessible chromatin (72), and is known to precede forkhead box A (FOXA) binding at some loci in breast cancer cells (73). Notably, FOXA factors, which also function as pioneer TFs, are crucial for expression of CFTR in intestinal cells, in part through binding to the intron 11 enhancer (42, 43). Furthermore, loss of a predicted caudal type homeobox 2 (CDX2) binding site may contribute to the reduced enhancer activity of the c.1679+1539T>C novel variant (Table 4, Figure 2D). Importantly, in intestinal cells, CDX2 binds at multiple sites within the CFTR locus, including in intron 11, and siRNA-mediated depletion of CDX2 reduces CFTR mRNA abundance (42).
One aspect of this study that warrants further discussion is whether novel regulatory SNPs (rSNPs) occur in cis or in trans with each other, and also with known pathogenic variants. Since our study cohort includes gDNA only from index cases and not their parents, phase cannot be readily established. Moreover, unequivocal phasing may require long-read sequencing of CFTR alleles to establish reference haplotypes and confirm imputations. However, this limitation does not detract from the utility of this study in defining rSNPs, since they may have an important impact irrespective of the haplotype. For example, where a functional rSNP impairs a CFTR tissue-specific enhancer, this on its own could reduce transcript abundance below a threshold required for normal CFTR channel activity. If that same rSNP was in cis with a known coding region variant that causes partial loss of function of CFTR, it might lead to a more severe phenotype. Though there is controversy about how much functional CFTR is required to prevent disease, it is possible that 5% of mean WT levels of CFTR transcript results in less-severe CF phenotypes, whereas 10% may protect against CF disease (74). It may also be important to know the impact of rSNPs in cis with another pathogenic variant when designing novel personalized therapeutics.
Identification of non-coding CFTR variants in the CF population will rapidly increase with the application of NGS screening protocols. The challenge will lie in determining the functional consequence of these variants, especially those that fall within tissue-specific regulatory elements. Few molecular diagnostic laboratories are currently equipped to perform these tests. However, the robust cellular assays for coding region pathogenic variants and splice-site errors developed through CFTR2 (75, 76), together with cell-specific enhancer assays in CF-relevant epithelial cell lines that express endogenous CFTR mRNA, as described here, provide a toolbox and reagents that can be utilized for testing many novel variants. For future studies, it will be important to use patient-derived or CRISPR-generated variant induced pluripotent stem cells (iPSCs) to test the effect of these polymorphisms on endogenous CFTR expression in the appropriate differentiated cell types.
ACKNOWLEDGEMENTS
We thank Dr. Pieter Faber and staff at the University of Chicago Genomics Core for sequencing; Dr. Ricky Chan, Case Western Reserve University Institute for Computational Biology, for bioinformatics variant analysis; the SureDesign team at Agilent for technical advice; and the late Dr. Julian Zielenski for his efforts toward the initial CFTR analysis of the Canadian samples.
J.L.K., S.G., and A.H. designed the experiments. J.L.K. and A.H. wrote the manuscript. J.L.K., A.P., S.G., and W.C. performed experiments. J.L.K. and S.G. analyzed the data. C.F., M.N., H.I., and J.R. shared gDNA samples.
REFERENCES
1. Scacheri CA, Scacheri PC: Mutations in the noncoding genome. Curr Opin Pediatr 2015, 27:659-664.
2. Programme WHG: The molecular genetic epidemiology of cystic fibrosis: report of a joint meeting of WHO/IECFTN/ICF(M)A/ECFS, Genoa, Italy, 19 June 2002. 2004.
3. Massie J, Clements B, Australian Paediatric Respiratory G: Diagnosis of cystic fibrosis after newborn screening: the Australasian experience--twenty years and five million babies later: a consensus statement from the Australasian Paediatric Respiratory Group. Pediatr Pulmonol 2005, 39:440-446.
4. Southern KW, Munck A, Pollitt R, Travert G, Zanolla L, Dankert-Roelse J, Castellani C, Group ECNSW: A survey of newborn screening for cystic fibrosis in Europe. J Cyst Fibros 2007, 6:57-65.
5. Ross LF: Newborn screening for cystic fibrosis: a lesson in public health disparities. J Pediatr 2008, 153:308-313.
6. Massie RJ, Curnow L, Glazner J, Armstrong DS, Francis I: Lessons learned from 20 years of newborn screening for cystic fibrosis. Med J Aust 2012, 196:67-70.
7. Castellani C, Cuppens H, Macek M, Jr., Cassiman JJ, Kerem E, Durie P, Tullis E, Assael BM, Bombieri C, Brown A, Casals T, Claustres M, Cutting GR, Dequeker E, Dodge J, Doull I, Farrell P, Ferec C, Girodon E, Johannesson M, Kerem B, Knowles M, Munck A, Pignatti PF, Radojkovic D, Rizzotti P, Schwarz M, Stuhrmann M, Tzetis M, Zielenski J, Elborn JS: Consensus on the use and interpretation of cystic fibrosis mutation analysis in clinical practice. J Cyst Fibros 2008, 7:179-196.
8. Gosalia N, Harris A: Chromatin Dynamics in the Regulation of CFTR Expression. Genes (Basel) 2015, 6:543-558.
9. Yang R, Kerschner JL, Gosalia N, Neems D, Gorsic LK, Safi A, Crawford GE, Kosak ST, Leir SH, Harris A: Differential contribution of cis-regulatory elements to higher order chromatin structure and expression of the CFTR locus. Nucleic Acids Res 2016, 44:3082-3094.
10. Smith EM, Lajoie BR, Jain G, Dekker J: Invariant TAD Boundaries Constrain Cell-Type-Specific Looping Interactions between Promoters and Distal Elements around the CFTR Locus. Am J Hum Genet 2016, 98:185-201.
11. Gosalia N, Neems D, Kerschner JL, Kosak ST, Harris A: Architectural proteins CTCF and cohesin have distinct roles in modulating the higher order structure and expression of the CFTR locus. Nucleic Acids Res 2014, 42:9612-9622.
12. Moisan S, Berlivet S, Ka C, Le Gac G, Dostie J, Ferec C: Analysis of long-range interactions in primary human cells identifies cooperative CFTR regulatory elements. Nucleic Acids Res 2016, 44:2564-2576.
13. Zielenski J, Aznarez I, Onay T, Tzounzouris J, Markiewicz D, Tsui LC: CFTR mutation detection by multiplex heteroduplex (mHET) analysis on MDE gel. Methods Mol Med 2002, 70:3-19.
14. Audrezet MP, Chen JM, Raguenes O, Chuzhanova N, Giteau K, Le Marechal C, Quere I, Cooper DN, Ferec C: Genomic rearrangements in the CFTR gene: extensive allelic heterogeneity and diverse mutational mechanisms. Hum Mutat 2004, 23:343-357.
15. Ferec C, Casals T, Chuzhanova N, Macek M, Jr., Bienvenu T, Holubova A, King C, McDevitt T, Castellani C, Farrell PM, Sheridan M, Pantaleo SJ, Loumi O, Messaoud T, Cuppens H, Torricelli F, Cutting GR, Williamson R, Ramos MJ, Pignatti PF, Raguenes O, Cooper DN, Audrezet MP, Chen JM: Gross genomic rearrangements involving deletions in the CFTR gene: characterization of six new events from a large cohort of hitherto unidentified cystic fibrosis chromosomes and meta-analysis of the underlying mechanisms. Eur J Hum Genet 2006, 14:567-576.
16. Langmead B, Salzberg SL: Fast gapped-read alignment with Bowtie 2. Nat Methods 2012, 9:357-359.
17. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA: The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 2010, 20:1297-1303.
18. DePristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, Philippakis AA, del Angel G, Rivas MA, Hanna M, McKenna A, Fennell TJ, Kernytsky AM, Sivachenko AY, Cibulskis K, Gabriel SB, Altshuler D, Daly MJ: A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 2011, 43:491-498.
19. Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, Del Angel G, Levy-Moonshine A, Jordan T, Shakir K, Roazen D, Thibault J, Banks E, Garimella KV, Altshuler D, Gabriel S, DePristo MA: From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr Protoc Bioinformatics 2013, 43:11.10.1-11.10.33.
20. Wang K, Li M, Hakonarson H: ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 2010, 38:e164.
21. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, Genomes Project Analysis G: The variant call format and VCFtools. Bioinformatics 2011, 27:2156-2158.
22. Tsui LC, Dorfman R: The cystic fibrosis gene: a molecular genetic perspective. Cold Spring Harb Perspect Med 2013, 3:a009472.
23. den Dunnen JT, Dalgleish R, Maglott DR, Hart RK, Greenblatt MS, McGowan-Jordan J, Roux AF, Smith T, Antonarakis SE, Taschner PE: HGVS Recommendations for the Description of Sequence Variants: 2016 Update. Hum Mutat 2016, 37:564-569.
24. Lewandowska MA, Costa FF, Bischof JM, Williams SH, Soares MB, Harris A: Multiple mechanisms influence regulation of the cystic fibrosis transmembrane conductance regulator gene promoter. Am J Respir Cell Mol Biol 2010, 43:334-341.
25. Ott CJ, Blackledge NP, Kerschner JL, Leir SH, Crawford GE, Cotton CU, Harris A: Intronic enhancers coordinate epithelial-specific looping of the active CFTR locus. Proc Natl Acad Sci U S A 2009, 106:19934-19939.
26. Smith AN, Barth ML, McDowell TL, Moulin DS, Nuthall HN, Hollingsworth MA, Harris A: A regulatory element in intron 1 of the cystic fibrosis transmembrane conductance regulator gene. J Biol Chem 1996, 271:9947-9954.
27. Phylactides M, Rowntree R, Nuthall H, Ussery D, Wheeler A, Harris A: Evaluation of potential regulatory elements identified as DNase I hypersensitive sites in the CFTR gene. Eur J Biochem 2002, 269:553-559.
28. Fogh J, Wright WC, Loveless JD: Absence of HeLa cell contamination in 169 cell lines derived from human tumors. J Natl Cancer Inst 1977, 58:209-214.
29. Cozens AL, Yezzi MJ, Kunzelmann K, Ohrui T, Chin L, Eng K, Finkbeiner WE, Widdicombe JH, Gruenert DC: CFTR expression and chloride secretion in polarized immortal human bronchial epithelial cells. Am J Respir Cell Mol Biol 1994, 10:38-47.
30. Bonini J, Varilh J, Raynal C, Theze C, Beyne E, Audrezet MP, Ferec C, Bienvenu T, Girodon E, Tuffery-Giraud S, Des Georges M, Claustres M, Taulan-Cadars M: Small-scale high-throughput sequencing-based identification of new therapeutic tools in cystic fibrosis. Genet Med 2015, 17:796-806.
31. Nakakuki M, Fujiki K, Yamamoto A, Ko SB, Yi L, Ishiguro M, Yamaguchi M, Kondo S, Maruyama S, Yanagimoto K, Naruse S, Ishiguro H: Detection of a large heterozygous deletion and a splicing defect in the CFTR transcripts from nasal swab of a Japanese case of cystic fibrosis. J Hum Genet 2012, 57:427-433.
32. Dork T, Macek M, Jr., Mekus F, Tummler B, Tzountzouris J, Casals T, Krebsova A, Koudova M, Sakmaryova I, Macek M, Sr., Vavrova V, Zemkova D, Ginter E, Petrova NV, Ivaschenko T, Baranov V, Witt M, Pogorzelski A, Bal J, Zekanowsky C, Wagner K, Stuhrmann M, Bauer I, Seydewitz HH, Neumann T, Jakubiczka S: Characterization of a novel 21-kb deletion, CFTRdele2,3(21 kb), in the CFTR gene: a cystic fibrosis mutation of Slavic origin common in Central and East Europe. Hum Genet 2000, 106:259-268.
33. Morral N, Nunes V, Casals T, Cobos N, Asensio O, Dapena J, Estivill X: Uniparental inheritance of microsatellite alleles of the cystic fibrosis gene (CFTR): identification of a 50 kilobase deletion. Hum Mol Genet 1993, 2:677-681.
34. Quemener S, Chen JM, Chuzhanova N, Benech C, Casals T, Macek M, Jr., Bienvenu T, McDevitt T, Farrell PM, Loumi O, Messaoud T, Cuppens H, Cutting GR, Stenson PD, Giteau K, Audrezet MP, Cooper DN, Ferec C: Complete ascertainment of intragenic copy number mutations (CNMs) in the CFTR gene and its implications for CNM formation at other autosomal loci. Hum Mutat 2010, 31:421-428.
35. Niel F, Martin J, Dastot-Le Moal F, Costes B, Boissier B, Delattre V, Goossens M, Girodon E: Rapid detection of CFTR gene rearrangements impacts on genetic counselling in cystic fibrosis. J Med Genet 2004, 41:e118.
36. Girardet A, Guittard C, Altieri JP, Templin C, Stremler N, Beroud C, des Georges M, Claustres M: Negative genetic neonatal screening for cystic fibrosis caused by compound heterozygosity for two large CFTR rearrangements. Clin Genet 2007, 72:374-377.
37. Zhang Z, Ott CJ, Lewandowska MA, Leir SH, Harris A: Molecular mechanisms controlling CFTR gene expression in the airway. J Cell Mol Med 2012, 16:1321-1330.
38. Genomes Project C, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR: A global reference for human genetic variation. Nature 2015, 526:68-74.
39. Zhang Z, Leir SH, Harris A: Oxidative stress regulates CFTR gene expression in human airway epithelial cells through a distal antioxidant response element. Am J Respir Cell Mol Biol 2015, 52:387-396.
40. Zhang Z, Leir SH, Harris A: Immune mediators regulate CFTR expression through a bifunctional airway-selective enhancer. Mol Cell Biol 2013, 33:2843-2853.
41. Ott CJ, Suszko M, Blackledge NP, Wright JE, Crawford GE, Harris A: A complex intronic enhancer regulates expression of the CFTR gene by direct interaction with the promoter. J Cell Mol Med 2009, 13:680-692.
42. Kerschner JL, Harris A: Transcriptional networks driving enhancer function in the CFTR gene. Biochem J 2012, 446:203-212.
43. Kerschner JL, Gosalia N, Leir SH, Harris A: Chromatin remodeling mediated by the FOXA1/A2 transcription factors activates CFTR expression in intestinal epithelial cells. Epigenetics 2014, 9:557-565.
44. McCarthy VA, Harris A: The CFTR gene and regulation of its expression. Pediatr Pulmonol 2005, 40:1-8.
45. Vecchio-Pagan B, Blackman SM, Lee M, Atalar M, Pellicore MJ, Pace RG, Franca AL, Raraigh KS, Sharma N, Knowles MR, Cutting GR: Deep resequencing of CFTR in 762 F508del homozygotes reveals clusters of non-coding variants associated with cystic fibrosis disease traits. Hum Genome Var 2016, 3:16038.
46. Straniero L, Solda G, Costantino L, Seia M, Melotti P, Colombo C, Asselta R, Duga S: Whole-gene CFTR sequencing combined with digital RT-PCR improves genetic diagnosis of cystic fibrosis. J Hum Genet 2016, 61:977-984.
47. Rowe SM, Miller S, Sorscher EJ: Cystic fibrosis. N Engl J Med 2005, 352:1992-2001.
48. Kiesewetter S, Macek M, Jr., Davis C, Curristin SM, Chu CS, Graham C, Shrimpton AE, Cashman SM, Tsui LC, Mickle J, Amos J, Highsmith WE, Shuber A, Witt DR, Crystal RG, Cutting GR: A mutation in CFTR produces different phenotypes depending on chromosomal background. Nat Genet 1993, 5:274-278.
49. Bienvenu T, Lacronique V, Raymondjean M, Cazeneuve C, Hubert D, Kaplan JC, Beldjord C: Three novel sequence variations in the 5' upstream region of the cystic fibrosis transmembrane conductance regulator (CFTR) gene: two polymorphisms and one putative molecular defect. Hum Genet 1995, 95:698-702.
50. Verlingue C, Vuillaumier S, Mercier B, Le Gac M, Elion J, Ferec C, Denamur E: Absence of mutations in the interspecies conserved regions of the CFTR promoter region in cystic fibrosis (CF) and CF related patients. J Med Genet 1998, 35:137-140.
51. Romey MC, Guittard C, Carles S, Demaille J, Claustres M, Ramsay M: First putative sequence alterations in the minimal CFTR promoter region. J Med Genet 1999, 36:263-264.
52. Romey MC, Guittard C, Chazalette JP, Frossard P, Dawson KP, Patton MA, Casals T, Bazarbachi T, Girodon E, Rault G, Bozon D, Seguret F, Demaille J, Claustres M: Complex allele [-102T>A+S549R(T>G)] is associated with milder forms of cystic fibrosis than allele S549R(T>G) alone. Hum Genet 1999, 105:145-150.
53. Romey MC, Pallares-Ruiz N, Mange A, Mettling C, Peytavi R, Demaille J, Claustres M: A naturally occurring sequence variation that creates a YY1 element is associated with increased cystic fibrosis transmembrane conductance regulator gene expression. J Biol Chem 2000, 275:3561-3567.
54. Wu CC, Alper OM, Lu JF, Wang SP, Guo L, Chiang HS, Wong LJ: Mutation spectrum of the CFTR gene in Taiwanese patients with congenital bilateral absence of the vas deferens. Hum Reprod 2005, 20:2470-2475.
55. Taulan M, Lopez E, Guittard C, Rene C, Baux D, Altierl JP, DesGeorges M, ClaustreS A, Romey MC: First functional polymorphism in CFTR promoter that results in decreased transcriptional activity and Sp1/USF binding. Biochem Bioph Res Co 2007, 361:775-781.
56. Lopez E, Viart V, Guittard C, Templin C, Rene C, Mechin D, Des Georges M, Claustres M, Romey-Chatelain MC, Taulan M: Variants in CFTR untranslated regions are associated with congenital bilateral absence of the vas deferens. J Med Genet 2011, 48:152-159.
57. Viart V, Des Georges M, Claustres M, Taulan M: Functional analysis of a promoter variant identified in the CFTR gene in cis of a frameshift mutation. Eur J Hum Genet 2012, 20:180-184.
58. Giordano S, Amato F, Elce A, Monti M, Iannone C, Pucci P, Seia M, Angioni A, Zarrilli F, Castaldo G, Tomaiuolo R: Molecular and functional analysis of the large 5' promoter region of CFTR gene revealed pathogenic mutations in CF and CFTR-related disorders. J Mol Diagn 2013, 15:331-340.
59. Bergougnoux A, Viart V, Miro J, Bommart S, Molinari N, des Georges M, Claustres M, Chiron R, Taulan-Cadars M: Should diffuse bronchiectasis still be considered a CFTR-related disorder? J Cyst Fibros 2015, 14:646-653.
60. Kumar R, Cheney KM, McKirdy R, Neilsen PM, Schulz RB, Lee J, Cohen J, Booker GW, Callen DF: CBFA2T3-ZNF652 corepressor complex regulates transcription of the E-box gene HEB. J Biol Chem 2008, 283:19026-19038.
61. Kumar R, Manning J, Spendlove HE, Kremmidiotis G, McKirdy R, Lee J, Millband DN, Cheney KM, Stampfer MR, Dwivedi PP, Morris HA, Callen DF: ZNF652, a novel zinc finger protein, interacts with the putative breast tumor suppressor CBFA2T3 to repress transcription. Mol Cancer Res 2006, 4:655-665.
62. Surjit M, Ganti KP, Mukherji A, Ye T, Hua G, Metzger D, Li M, Chambon P: Widespread negative response elements mediate direct repression by agonist-liganded glucocorticoid receptor. Cell 2011, 145:224-241.
63. Dimova DK, Dyson NJ: The E2F transcriptional network: old acquaintances with new faces. Oncogene 2005, 24:2810-2826.
64. Zhang P, Sun Y, Ma L: ZEB1: at the crossroads of epithelial-mesenchymal transition, metastasis and therapy resistance. Cell Cycle 2015, 14:481-487.
65. Galan-Caridad JM, Harel S, Arenzana TL, Hou ZE, Doetsch FK, Mirny LA, Reizis B: Zfx controls the self-renewal of embryonic and hematopoietic stem cells. Cell 2007, 129:345-357.
66. Gou D, Wang J, Gao L, Sun Y, Peng X, Huang J, Li W: Identification and functional analysis of a novel human KRAB/C2H2 zinc finger gene ZNF300. Biochim Biophys Acta 2004, 1676:203-209.
67. Wang T, Wang XG, Xu JH, Wu XP, Qiu HL, Yi H, Li WX: Overexpression of the human ZNF300 gene enhances growth and metastasis of cancer cells through activating NF-kB pathway. J Cell Mol Med 2012, 16:1134-1145.
68. Gronostajski RM: Roles of the NFI/CTF gene family in transcription and development. Gene 2000, 249:31-45.
69. Harris L, Genovesi LA, Gronostajski RM, Wainwright BJ, Piper M: Nuclear factor one transcription factors: Divergent functions in developmental versus adult stem cell populations. Dev Dyn 2015, 244:227-238.
70. Hsu YC, Osinski J, Campbell CE, Litwack ED, Wang D, Liu S, Bachurski CJ, Gronostajski RM: Mesenchymal nuclear factor I B regulates cell proliferation and epithelial differentiation during lung maturation. Dev Biol 2011, 354:242-252.
71. Mouchel N, Henstra SA, McCarthy VA, Williams SH, Phylactides M, Harris A: HNF1alpha is involved in tissue-specific regulation of CFTR gene expression. Biochem J 2004, 378:909-918.
72. Takaku M, Grimm SA, Shimbo T, Perera L, Menafra R, Stunnenberg HG, Archer TK, Machida S, Kurumizaka H, Wade PA: GATA3-dependent cellular reprogramming requires activation-domain dependent recruitment of a chromatin remodeler. Genome Biol 2016, 17:36.
73. Theodorou V, Stark R, Menon S, Carroll JS: GATA3 acts upstream of FOXA1 in mediating ESR1 binding by shaping enhancer accessibility. Genome Res 2013, 23:12-22.
74. Amaral MD: Processing of CFTR: traversing the cellular maze--how much CFTR needs to go through to avoid cystic fibrosis? Pediatr Pulmonol 2005, 39:479-491.
75. Sosnay PR, Siklosi KR, Van Goor F, Kaniecki K, Yu H, Sharma N, Ramalho AS, Amaral MD, Dorfman R, Zielenski J, Masica DL, Karchin R, Millen L, Thomas PJ, Patrinos GP, Corey M, Lewis MH, Rommens JM, Castellani C, Penland CM, Cutting GR: Defining the disease liability of variants in the cystic fibrosis transmembrane conductance regulator gene. Nat Genet 2013, 45:1160-1167.
76. Gottschalk LB, Vecchio-Pagan B, Sharma N, Han ST, Franca A, Wohler ES, Batista DA, Goff LA, Cutting GR: Creation and characterization of an airway epithelial cell line for stable expression of CFTR variants. J Cyst Fibros 2016, 15:285-294.
FIGURE LEGENDS
Figure 1. SureSelect targeted enrichment experimental design. Twelve-thousand and fifty-three SureSelect biotinylated probes span the 463kb CFTR locus, and encompass 20 previously identified CFTR cis-regulatory elements. Gaps in probe coverage correspond to highly repetitive regions that were excluded. Repeats shown correspond to the RepeatMasker track from the UCSC Genome Browser (hg19).
Figure 2. Promoter variants reduce activity of 2kb CFTR promoter in a cell-type independent manner. Schematic of five variants identified within the 2kb CFTR promoter (A) and four substitutions identified within the CFTR intron 11 cis-element (B). Stars represent individual patients with observed variant (blue: homozygous variant, yellow: heterozygous) and orange hexagons represent 10 patients heterozygous for observed variant. C: Luciferase expression vectors containing the 2kb CFTR promoter (WT or with variants) were transiently co-transfected into 16HBE14o- or Caco2 cells. Data are shown relative to the CFTR 2kb promoter parental vector, error bars represent standard error of the mean (n=12). pGL3B, lacking a promoter, is shown as a control. D: Luciferase expression vectors containing the 787bp minimal CFTR promoter and the DHS11 long (WT or with variants) cis-element were transiently co-transfected into Caco2 cells. Data are shown relative to the CFTR minimal promoter parental vector (pGL3B.245), error bars represent standard error of the mean (n=9). Luciferase expression levels were compared against pGL3B.2kb (C) and pGL3B.245-DHS11(long) (D) using unpaired t-tests, ****P < 0.0001, ***P < 0.001, **P < 0.01.
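The normalization and error bars described in the Figure 2 legend can be sketched as follows. This is an illustration only: the function names and readings are hypothetical, and the pooled-variance (equal-variance) form of the unpaired t statistic is an assumption, since the legend does not state which unpaired t-test variant was used.

```python
from math import sqrt
from statistics import mean, stdev, variance


def relative_expression(readings, parental_readings):
    """Express each luciferase reading relative to the mean of the parental vector."""
    parental_mean = mean(parental_readings)
    return [r / parental_mean for r in readings]


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / sqrt(len(values))


def unpaired_t(a, b):
    """Unpaired t statistic using pooled variance (assumed equal variances)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


# Hypothetical raw readings, not data from the study.
parental = [100.0, 110.0, 90.0]
variant = [60.0, 55.0, 65.0]

rel_parental = relative_expression(parental, parental)
rel_variant = relative_expression(variant, parental)
print(mean(rel_variant), sem(rel_variant), unpaired_t(rel_variant, rel_parental))
```

Converting the t statistic to a P value requires the t distribution with na + nb - 2 degrees of freedom, which a statistics package would normally supply.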
Table 1: Oligonucleotides.
Oligonucleotide | Sequence | Description
oJLK021 | 5'-GTAATTACGCAAAGCATTATCTCTTCTTACCTCCTTGCAGATTTTTT-3' | CFTR prom -887C>T
oJLK022 | 5'-CTCCTCTTACCTCCTTGCAGATTTTTTTTCTCTTTCAGTACG-3' | CFTR prom -869delT
oJLK023 | 5'-CCACCCTTGGAGTTCACGCACCTAAACCTGAAACT-3' | CFTR prom -812T>G
oJLK025 | 5'-GGATGGGCCTGCTGCTGGGCGGT-3' | CFTR prom -410G>C
oJLK026 | 5'-CCCCAGCGCCCCAGAGACCA-3' | CFTR prom -8G>C
SalI DHS11F | 5'-CGTCGACTGGAGAAGGTGGAATCACACTG-3' | DHS11-long forward cloning primer
oJLK048 | 5'-CGTCGACTTCTCTGTTTATACATGTAATTGTTGG-3' | DHS11-long reverse cloning primer
oJLK044 | 5'-GTCCAAGCATTTTAAAGCTGTCAAAGATATGTAAATATAGATAATGTATGTCAAG-3' | c.1679+566G>T (DHS11)
oJLK045 | 5'-ACTTTGAGGAACTAAAAATAATTGTCTATTCTTATTCTGATCAGAATGTGTAATG-3' | c.1679+1280G>A (DHS11)
oJLK046 | 5'-GATCCATTATGTAGCTCTTGCATGCTGTCTTCAAAAATAAGTTACA-3' | c.1679+1449A>G (DHS11)
oJLK047 | 5'-CCATTGGTTTTTAAAAAAATTTTTAAATTGGCTTCAAAAATTTCTTAATTGTGTGCTGAATACAATTTT-3' | c.1679+1539T>C (DHS11)
Table 2: Genotype Summary.
Category | CFTR Variants | # alleles (n=160)
CF-causing | | 75
Coding variants | G85E, S489X, I507del, F508del, C524X, G551D, c.3744delA*, W1282X, Q1313X | 59/75
Splice site variants | c.489+1G>T, c.1116+1G>A, c.1393-1G>A, c.1585-1G>A, c.1679+1634A>G, c.1680-877G>T, c.3718-2477C>T, c.3718-3T>G | 10/75
Large deletions/duplications† | CFTRdele2,3, CFTRdele4-7, CFTRdup6b-10, CFTRdele14b-17b, CFTRdele16-17b | 6/75
Varying Clinical Significance | R117H, 5T | 3‡
Unknown Significance | L558S | 1
*Also known as p.Lys1250ArgfsX9.
†CFTR legacy nomenclature, see Supplementary Table S1 for CFTR2 full nomenclature.
‡No patients with R117H; 5T genotype.
Newly detected variants in this study are in bold.
Table 3: Variant Summary.
Region | Size (bp) | Total Variants | Total Variants in dbSNP138 | Substitutions | Substitutions in dbSNP138 | Indels | Indels in dbSNP138
460kb CFTR Locus | | 1737 | 1253 | 1426 | 1152 | 311 | 101
-80.1 kb | 382 | 1 | 1 | 1 | 1 | 0 | -
-44 kb | 1200 | 1 | 1 | 1 | 1 | 0 | -
-35 kb | 1600 | 4 | 2 | 3 | 2 | 1 | 0
-20.9 kb | 395 | 0 | - | 0 | - | 0 | -
-3.4 kb | 1214 | 3 | 3 | 3 | 3 | 0 | -
promoter | 1999 | 6 | 5 | 5 | 4 | 1 | 1
185 + 10 kb (intron 1) | 1100 | 4 | 3 | 2 | 2 | 2 | 1
1716 + 13.2/13.7 kb (intron 10a,b) | 1300 | 6 | 3 | 1 | 1 | 5 | 2
1716 + 23 kb (intron 10c) | 700 | 1 | 0 | 1 | 0 | 0 | -
1811 + 0.8 kb (intron 11) | 1400 | 4 | 2 | 4 | 2 | 0 | -
3600 + 1.6 kb (intron 18a) | 1000 | 6 | 5 | 4 | 4 | 2 | 1
3600 + 10 kb (intron 18b) | 500 | 2 | 1 | 2 | 1 | 0 | -
3849 + 12.5 kb (intron 19) | 900 | 3 | 3 | 3 | 3 | 0 | -
4374 + 1.3 kb (intron 23) | 1400 | 1 | 1 | 1 | 1 | 0 | -
+6.8 kb/+7.0 kb | 399 | 0 | - | 0 | - | 0 | -
+15.6 kb | 1500 | 3 | 3 | 3 | 3 | 0 | -
+21.5 kb | 1000 | 2 | 2 | 2 | 2 | 0 | -
+36.6 kb | 1200 | 1 | 0 | 1 | 0 | 0 | -
+48.9 kb | 250 | 0 | - | 0 | - | 0 | -
+83.7 kb | 459 | 3 | 2 | 3 | 2 | 0 | -
Regulatory Regions Totals | 19898 | 51 | 37 | 40 | 32 | 11 | 5
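The counts in Table 3 satisfy a simple arithmetic relationship: substitutions plus indels equal the total number of variants. A minimal sketch of that check, using the two summary rows of the table, is shown here; the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantCounts:
    region: str
    total: int
    substitutions: int
    indels: int

    def is_consistent(self) -> bool:
        """True when substitutions and indels add up to the reported total."""
        return self.substitutions + self.indels == self.total


# Values taken from the summary rows of Table 3.
rows = [
    VariantCounts("460kb CFTR Locus", 1737, 1426, 311),
    VariantCounts("Regulatory Regions Totals", 51, 40, 11),
]

for row in rows:
    print(row.region, row.is_consistent())
```

The same check can be applied to each regulatory region row when transcribing the table.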
Table 4: Predicted TFBS Losses and Gains in CFTR Promoter and Enhancers.
Regulatory dbSNP138
Region
c.-43626G>A
MAF
Gain
(1000G)
V$HMX2.02
V$HOXC12.01
V$MEIS1A_HOXA9.01
V$CPHX.01
V$VMYB.05
V$OSNT.01
V$TST.01
V$SMARCA3.01
rs185018312
c.-35564T>G
rs6972168 V$HHEX.01 V$HBP1.01 rs6972819
V$TEAD.01
c.-35147T>G
MAF (This 38
Study)
0.0004
2/160
0.4295
65/160
-44 kb DHS37, 39
Predicted TFBS Predicted TFBS Loss
Variant
0.1458
1/160
N/A
1/160
0.2238
14/160
0.0164
1/160
0.0048
1/160
0.0012
3/160
N/A
1/160
0.0274
4/160
V$NF1.01 -35 kb DHS37,40
V$NF1.02
V$EOMES.02 V$ZBED1.01
c.-34893G>C
-35 novel.1
V$GZF1.01
V$ZBED1.02
c.-966T>G
V$PAX5.02
rs4148682
V$PRDM4.01
V$STAT.01 V$BRN2.01
V$RAR_RXR.01
c.-887C>T
rs34465975
V$ZFP652.01
V$TR4.02
V$IR2_NGRE.01
V$PPARG.02
c.-869delT
rs4148683
V$IRF4.01
V$IRF3.01
V$NKX25.05 V$AP1.02
2kb Promoter24
c.-812T>G
V$FXRE.01 V$AHRARNT.03 rs181008242 V$FXRE.01 V$TAXCREB.01 V$DELTAEF1.01 V$VDR_RXR.06 V$PSE.01 V$ZFX.01
c.-410G>C
. V$NRSF.02
c.-8G>C
rs1800501
V$ZNF300.01
c.53+9941A>C
rs35714998
V$PLZF.01
V$SOX9.06
0.0160
1/160
0.3073
65/160
V$ZEC.01
Intron 1 DHS26, 41 c.53+10442G>C
rs1557630
V$HNF1.03
V$PLZF.01 V$PAX8.01
c.1679+566G>T
11 novel.1
N/A
2/160
0.4265
95/160
0.4263
95/160
V$HMGIY.01
N/A
2/160
V$SOX9.09
0.1879
28/160
V$SRY.04 V$BRACH.01
V$PBX1_MEIS1.03 V$GATA3.02
V$E4BP4.01 V$PRE.01
c.1679+1449A>G rs213964
V$POU3F3.01
c.1679+1539T>C 11 novel.2
V$CDX2.01
c.4242+198T>C
rs1429568
V$FAST1.01
Intron 23 DHS37
c.1679+1280G>A rs213963
Intron 11 DHS25, 42
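The "MAF (This Study)" column of Table 4 reports allele counts out of 160 chromosomes (for example 65/160 or 2/160), whereas the 1000G column reports frequencies. A minimal sketch for converting a count field into a frequency, so the two columns can be compared on the same scale, follows; the function name is hypothetical.

```python
from fractions import Fraction


def allele_frequency(count_field: str) -> float:
    """Convert an 'observed/total' allele count such as '65/160' to a frequency."""
    observed, total = count_field.split("/")
    return float(Fraction(int(observed), int(total)))


print(allele_frequency("65/160"))  # 0.40625
print(allele_frequency("2/160"))   # 0.0125
```

Note that the cohort count is a frequency of the variant allele among 160 patient chromosomes, which is not necessarily the minor allele in the reference population.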
[Figure 1: UCSC Genome Browser view (hg19) of chr7:117,000,000-117,450,000, scale bar 200 kb. Tracks: CFTR cis-elements (-80.1kb, -44kb, -35kb, -20.9kb, -3.4kb, promoter, 185+10kb, 1716+13.2/13.7kb, 1716+23kb, 1811+0.8kb, 3600+1.6kb, 3600+10kb, 3849+12.5kb, 4374+1.3kb, +6.8kb/+7.0kb, +15.6kb, +21.5kb, +36.6kb, +48.9kb, +83.7kb); SureSelect probes; genes (ASZ1, CFTR, WNT2, CTTNBP2); repeats.]
[Figure 2: panels A-D. A: schematic of the 2kb CFTR promoter variants (-887C>T, -869delT, -812T>G, -410G>C, -8G>C). B: schematic of the intron 11 cis-element downstream of exon 11, showing DHS11 (long) and DHS11 (short) with variants c.1679+566G>T, c.1679+1280G>A, c.1679+1449A>G, and c.1679+1539T>C. C: relative luciferase expression in Caco2 and 16HBE14o- cells for pGL3B, pGL3B.CFTR 2kb prom., and each promoter variant. D: relative luciferase expression for pGL3B.245, pGL3B.245-DHS11 (long) WT, and each intron 11 variant.]