CRISPR Double Cutting through the Labyrinthine Architecture of 3D Genomes


Accepted Manuscript

CRISPR Double Cutting through the Labyrinthine Architecture of 3D Genomes

Haiyan Huang, Qiang Wu

PII: S1673-8527(16)30021-2
DOI: 10.1016/j.jgg.2016.03.006
Reference: JGG 440
To appear in: Journal of Genetics and Genomics
Received Date: 5 February 2016
Revised Date: 3 March 2016
Accepted Date: 16 March 2016

Please cite this article as: Huang, H., Wu, Q., CRISPR Double Cutting through the Labyrinthine Architecture of 3D Genomes, Journal of Genetics and Genomics (2016), doi: 10.1016/j.jgg.2016.03.006. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


CRISPR Double Cutting through the Labyrinthine Architecture of 3D Genomes


Haiyan Huang, Qiang Wu

Key Laboratory of Systems Biomedicine (Ministry of Education), Center for Comparative Biomedicine, Institute of Systems Biomedicine, Shanghai Jiao Tong University, 800 Dongchuan Road, Minhang, Shanghai 200240, China


Corresponding author. Tel/fax: +86 21 3420 4300. Email address: [email protected] (Q. Wu).


ABSTRACT

Genomes are organized into ordered and hierarchical topological structures in interphase nuclei. Within discrete territories of each chromosome, topologically associated domains (TADs) play important roles in various nuclear processes such as gene regulation. Inside TADs separated by relatively constitutive boundaries, distal elements regulate their gene targets through specific chromatin-looping contacts such as long-distance enhancer-promoter interactions. High-throughput sequencing studies have revealed millions of potential regulatory DNA elements, which are much more abundant than the mere ~20,000 genes they control. The recently emerged CRISPR-Cas9 genome editing technologies have enabled efficient and precise genetic and epigenetic manipulations of genomes. The multiplexed and high-throughput CRISPR capabilities facilitate the discovery and dissection of gene regulatory elements. Here, we describe the applications of CRISPR for genome, epigenome, and 3D genome editing, focusing on CRISPR DNA-fragment editing with Cas9 and a pair of sgRNAs to investigate topological folding of chromatin TADs and developmental gene regulation.


KEYWORDS: DNA-fragment editing; Chromatin architecture; Topological domain; CTCF; Enhancer; Insulator.


1. INTRODUCTION

The more than 2-meter human genome is folded exquisitely in the cell nucleus with a radius of about 5 micrometers. Dynamic genome folding is intricately related to many nuclear processes such as DNA replication, repair, rearrangement, and recombination, as well as chromatid separation during mitosis and chromosome segregation during meiosis. In particular, higher-order chromatin organization or DNA looping during interphase plays an important role in the dynamic transcriptional regulation during tissue development and disease progression (Levine et al., 2014; Xu and Corces, 2015).

Promoters are central for gene regulation. Wide varieties of core promoters with proximal elements and their cognate basal transcriptional machineries exist in higher eukaryotes for specific gene expression (Tjian and Maniatis, 1994; Levine et al., 2014). In addition, multiple distal regulatory elements that activate or repress promoters, such as enhancers or silencers, are essential for tissue- or cell-specific gene expression (de Laat and Duboule, 2013). They recruit additional specific molecular machineries to precisely control spatiotemporal gene expression patterns (Lagha et al., 2012; de Laat and Duboule, 2013).


Enhancers recruit lineage-specific transcription factors or cofactors which facilitate enhancer interactions with the basal RNA polymerase II transcriptional machinery (Tjian and Maniatis, 1994). Specifically, through long-distance DNA looping interactions, distal enhancers are brought close to promoters to activate cell-type specific transcription (Levine et al., 2014). Insulators function to prevent the spread of heterochromatin or block enhancer-promoter communications (Ali et al., 2016). The most extensively studied vertebrate insulator-binding protein (also known as chromosome architectural protein) is the CCCTC-binding factor (CTCF), which also plays an essential role in three-dimensional (3D) organization of chromatin (Handoko et al., 2011; Dixon et al., 2012; Ong and Corces, 2014; Rao et al., 2014; Xu and Corces, 2015).

In humans, less than 2% of the genome codes for proteins, and 50% is repeated sequences (International Human Genome Sequencing Consortium, 2001). The remaining approximately 48% of the genome contains key regulatory elements (ENCODE Project Consortium, 2012). The ENCODE project identified millions of putative regulatory elements, many of which were found to be physically associated with one another and with gene promoters (ENCODE Project Consortium, 2012). The landscape of long-range gene-element connectivity is complex. The looping size could range from hundreds of kilobases to several megabases, placing gene regulation in 3D context (Lieberman-Aiden et al., 2009; Sanyal et al., 2012).

Various high-throughput next-generation sequencing (NGS) techniques have been developed to study the relationship between higher-order chromatin organization and gene transcription. RNA-seq profiling reveals spatiotemporal gene expression patterns. In addition, DNA proximity-ligation methods, such as 3C (chromosome conformation capture), 4C (circularized chromosome conformation capture), ChIA-PET (chromatin interaction analysis with paired-end tag sequencing), 5C (chromosome conformation capture carbon copy), and Hi-C (high-throughput chromosome conformation capture), provide information on long-range chromatin interactions. These evolving genomic methods have begun to yield exciting insights into architectural mechanisms of 3D genome regulation. As substantial biases and technical artifacts commonly exist in NGS chromatin profiling data, further work needs to be done to validate these models (Meyer and Liu, 2014). The recently emerged clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) (CRISPR-Cas9) genome editing methods provide great opportunities to probe spatial DNA-looping interactions and perturb higher-order chromatin organization. In this review, we focus on the CRISPR-Cas9 genome-editing system (Lander, 2016; Wright et al., 2016), especially DNA-fragment editing (Li et al., 2015b), for mechanistic studies of 3D genome regulation.


2. CRISPR SYSTEMS AND DNA REPAIR PATHWAYS

2.1 CRISPR Genome Editing

Cells have evolved elaborate DNA repair mechanisms to maintain genome integrity and stability (Sancar and Rupp, 1983; Lindahl, 1993; Dzantiev et al., 2004; Chiruvella et al., 2013). Targeted DNA double-strand breaks (DSBs) could vastly stimulate genome editing through homologous recombination (HR) and non-homologous end joining (NHEJ) repair pathways (Carroll, 2014). Four types of programmable DNA-binding proteins have been engineered to introduce site-specific DNA DSBs: zinc finger nucleases (ZFNs) based on eukaryotic transcription factors, meganucleases derived from microbial mobile genetic elements, transcription activator-like effector nucleases (TALENs) from Xanthomonas bacteria, and, most recently, the RNA-guided DNA endonuclease Cas9 from the CRISPR system (Wei et al., 2013; Carroll, 2014; Sander and Joung, 2014).

Meganucleases, ZFNs, and TALENs all recognize cognate DNA sequences through protein-DNA interactions (Sander and Joung, 2014). The relationships between meganucleases and their target DNA sequence specificity are unclear (Sander and Joung, 2014). ZFNs and TALENs use a strategy of linking endonuclease catalytic domains to modular DNA-binding proteins, whose modules recognize 3-bp and 1-bp sequences, respectively. However, the design and construction of modular DNA-binding proteins for each targeting site are intrinsically difficult and labor-intensive. By contrast, Cas9 from the CRISPR system can be guided by a single guide RNA (sgRNA) through Watson-Crick base pairing with target DNA sequences. The CRISPR system is considerably easier to design, highly specific, very efficient, and well suited for high-throughput and multiplexed gene editing (Fig. 1A) (Carroll, 2014; Doudna and Charpentier, 2014; Hsu et al., 2014; Sander and Joung, 2014). This advantage has opened up its broad applications in basic biology research, biotechnology, and medicine.


CRISPR has become a widely used genome-editing tool after more than two dozen years of arduous research on mysterious bacterial repeats (Lander, 2016; Wright et al., 2016). CRISPR-Cas is an elaborate microbial adaptive defense system evolved to escape from phage conjugation or virus infections (Wright et al., 2016). CRISPR-Cas systems exist in about 50% of bacteria and 87% of archaea and can be classified into two classes, which are subdivided into five types (Makarova et al., 2015). The architecture of the CRISPR loci is usually composed of direct short-palindromic repeats separated by a fixed length of spacer sequences. Similar to the vertebrate adaptive immune systems, microbial CRISPR-Cas systems can remember the initial infection by generating records from exogenous infectious agents (Wright et al., 2016).


The CRISPR records are barcoding DNA sequences of spacers acquired from the foreign protospacer of naive infections. The key to distinguishing self from non-self is several nucleotides in the vicinity of the protospacer called the PAM (protospacer adjacent motif) (Fig. 1A). Thus, for each naive infection, the CRISPR array is increased by one copy of a spacer and one direct palindromic repeat. After acquisition of spacers, the microbe expresses and assembles an RNA-programmed enzymatic complex called the effector, which can cut foreign DNA upon reinfection (Lander, 2016; Wright et al., 2016). It is this interference or cutting mechanism that has been developed into tools for genome editing.


In the type II system (belonging to the class 2 CRISPR systems), Cas9 can be programmed by a single sgRNA to cleave targeted sites to generate DSBs for genome editing in a wide variety of species and cell types (Fig. 1A) (Jinek et al., 2012). The blunt ends of DSBs are usually ligated by DNA repair machineries through the NHEJ pathway to induce indel (insertion and deletion) mutations (Fig. 1A) (Cho et al., 2013; Cong et al., 2013; Hwang et al., 2013; Jiang et al., 2013; Jinek et al., 2013; Mali et al., 2013b). Since DNA repair is universal in many organisms, CRISPR genome editing tools have found applications in virtually every species such as fungi, plants, and animals (Jiang and Marraffini, 2015). Recently, a type V system (also belonging to the class 2 CRISPR systems) utilizing Cpf1 instead of Cas9 was shown to be useful for genome editing in human cells (Zetsche et al., 2015).
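To make the sgRNA-plus-PAM targeting rule concrete, the following is a minimal Python sketch that scans the forward strand of a sequence for candidate Cas9 target sites. It rests on assumptions not stated in the text above: a 20-nt protospacer, the NGG PAM of the commonly used Streptococcus pyogenes Cas9, and a blunt cut 3 bp upstream of the PAM. The function name `find_cas9_sites` and the example sequence are hypothetical and serve illustration only.

```python
import re

def find_cas9_sites(seq, protospacer_len=20):
    """Return (protospacer, pam, cut_index) for each forward-strand site.

    Assumptions (not from the review): NGG PAM immediately 3' of the
    protospacer, and a blunt cut 3 bp upstream of the PAM.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all found.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    sites = []
    for match in pattern.finditer(seq):
        protospacer, pam = match.group(1), match.group(2)
        # Index of the first base to the right of the assumed cut.
        cut_index = match.start() + protospacer_len - 3
        sites.append((protospacer, pam, cut_index))
    return sites

# Hypothetical 26-nt sequence: one 20-nt protospacer followed by a CGG PAM.
example = "GACCTGAAGTCCATGCAAGTCGGATC"
sites = find_cas9_sites(example)
```

A real design tool would also scan the reverse strand and score potential off-target sites; both are omitted here for brevity.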


2.2 CRISPR DNA-Fragment Editing

Geneticists have made great contributions to our understanding of fundamental biology by analyzing phenotype-genotype correlations through forward genetic screening. Reverse genetics through gene targeting and subsequent phenotypic analyses of the targeted mutants is routine in labs upon knowing gene sequences. Although frameshift mutations of small indels induced by CRISPR-Cas9 are very useful for protein-coding genes, it is challenging to use small indels to study vast numbers of noncoding regulatory elements. For investigation of gene regulatory elements and gene clusters, DNA-fragment editing or chromosomal engineering is often required (Fig. 1B). This includes DNA-fragment deletion, duplication, inversion, insertion, and substitution (Fig. 1B) (Mills and Bradley, 2001; Li et al., 2015b). DNA-fragment editing will also be useful to model human chromosomal structure variations in development and neurological diseases (Weckselblatt and Rudd, 2015).


Site-specific recombination of two different sites within a single chromosome results in deletion and inversion of the DNA fragment between the two sites (Golic and Golic, 1996; Kmita et al., 2000; Mills and Bradley, 2001; Spitz et al., 2005). By contrast, trans-allelic recombination between two non-allelic sites each on a homologous chromosome or chromatid (also known as non-allelic homologous recombination: NAHR) generates deletion and duplication of the DNA fragment between the two sites (Golic and Golic, 1996; Herault et al., 1998; Wu et al., 2007). Similarly, when two DSBs are induced by ZFN or TALEN, intra-chromosomal ligation results in deletion or inversion of the DNA fragment between the two targeted DSBs (Lee et al., 2010; Carlson et al., 2012; Gupta et al., 2013; Xiao et al., 2013), while inter-chromosomal or inter-chromatidal ligation results in deletion and duplication (Carlson et al., 2012; Lee et al., 2012; Xiao et al., 2013).


The CRISPR-Cas9 system has also been used to generate targeted DNA-fragment editing or chromosomal rearrangements. First, inter-chromosomal translocations and 12 Mb paracentric or 11 Mb pericentric intra-chromosomal inversions have been generated by Cas9 with a pair of sgRNAs in cultured human cells (Choi and Meyerson, 2014). Second, inter-chromosomal translocation induced by Cas9 with paired sgRNAs recapitulates fusion-gene expression in acute myeloid leukemia and Ewing’s sarcoma (Torres et al., 2014). Third, an 11 Mb chromosomal inversion generated by CRISPR in mice models human lung cancers (Maddalo et al., 2014). These cases suggest that DNA-fragment editing by Cas9 with a pair of sgRNAs is powerful for modelling human cancers.

Initial multiplexed CRISPR editing with two sgRNAs resulted in deletion of a short DNA fragment between the two targeting sites through concurrent DSBs (Fig. 1B) (Cong et al., 2013; Mali et al., 2013b). A series of DNA fragment deletions with sizes ranging from 1.3 kb to greater than 1 Mb suggest that the deletion frequency is inversely related to the length of the deleted DNA fragments (Canver et al., 2014). However, deletion of another series of similarly sized DNA fragments does not reveal the inverse correlation between deletion frequency and the size of the deleted DNA fragments (Li et al., 2015a). The discrepancy may be due to 3D genome folding: the actual physical closeness, rather than the linear distance, between the two DSBs determines the deletion frequency (Kraft et al., 2015; Li et al., 2015b). Consistently, translocations occur more frequently between DSBs in close proximity than between distant DSBs in cancer genomes (Zhang et al., 2012).


Inversion frequencies ranging from 0.7% to 4.2% of DNA fragments for six loci have been detected in mouse embryonic stem cells (ESCs) by Cas9 with a pair of sgRNAs (Kraft et al., 2015). In mice, very efficient DNA fragment inversion has been found in founders by pronuclear injection of Cas9 mRNA and a pair of sgRNAs (Li et al., 2015a). In addition, DNA fragment inversions can be germline transmitted at high frequency (Li et al., 2015a). Finally, inversion frequencies ranging from 0.72% to 23.28% of a series of DNA fragments spanning from several dozen bp to about 1 Mb have been observed in cultured human cells (Li et al., 2015a). Thus, CRISPR inversion of DNA fragments could be easily generated by Cas9 with a pair of sgRNAs (Fig. 1B).
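The sequence outcomes of cutting with a pair of sgRNAs can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: both cuts are blunt, ligation is precise (no indels at the junctions), and the duplication product stands for trans-allelic ligation between a chromatid cut at the downstream site and its homolog or sister cut at the upstream site. The function names and the toy sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def fragment_editing_products(seq, cut1, cut2):
    """Model precise ligation products after two blunt cuts.

    cut1 and cut2 are 0-based indices of the first base to the right of
    each cut, with cut1 < cut2. Junction indels are ignored (assumption).
    """
    if not 0 <= cut1 < cut2 <= len(seq):
        raise ValueError("expected 0 <= cut1 < cut2 <= len(seq)")
    left, fragment, right = seq[:cut1], seq[cut1:cut2], seq[cut2:]
    return {
        # Intra-chromosomal ligation of the two outer ends.
        "deletion": left + right,
        # The excised fragment is re-ligated in the opposite orientation.
        "inversion": left + reverse_complement(fragment) + right,
        # Trans-allelic ligation: left + fragment joined to fragment + right.
        "duplication": left + fragment + fragment + right,
    }

products = fragment_editing_products("AAAACCGTTTTT", 4, 8)
```

In this toy case the 4-bp fragment CCGT is removed, flipped to ACGG, or copied in tandem, which mirrors the deletion, inversion, and duplication classes discussed in this section.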


Tandem or segmental duplications (recurrent microduplications) could also be achieved by Cas9 with a pair of sgRNAs through trans-allelic ligation between the two non-allelic targeting DSBs in mouse ESCs (Kraft et al., 2015) as well as cultured human cells and founder mice (Fig. 1B) (Li et al., 2015a). However, the duplication efficiency is much lower than that of DNA fragment deletion and inversion (Kraft et al., 2015; Li et al., 2015a). Nevertheless, this CRISPR method could be useful to model human copy number variations (CNVs) as well as recurrent microdeletion and microduplication syndromes which are often associated with complex neuropsychiatric diseases such as schizophrenia and autism spectrum disorders (Tai et al., 2016).

Finally, DNA fragment insertions could be generated through CRISPR-Cas9 systems when a donor template is provided (Fig. 1B) (Cong et al., 2013; Mali et al., 2013a; Wang et al., 2013). In particular, since DSBs greatly induce homologous recombination (Orr-Weaver et al., 1981), introducing targeted DSBs by CRISPR-Cas9 between genomic sequences matching the two homologous arms of the donor template facilitates DNA fragment insertion or replacement (Fig. 1B) (Byrne et al., 2015).

In summary, DNA-fragment editing including deletions, inversions, duplications, insertions, and substitutions could be easily generated by the CRISPR-Cas9 system with a pair of sgRNAs (Fig. 1B) (Li et al., 2015b). We do not suggest multiplexing CRISPR genome editing with more than two sgRNAs because very complex combinatorial and composite DNA-fragment editing could be generated from presumed giga-Dalton repair centers (Jasin and Rothstein, 2013).


2.3 Precise vs Imprecise CRISPR Genome Editing

CRISPR genome editing is dependent on the sgRNA-programmed Cas9 to generate DNA blunt ends from targeted DSBs in a genome. Organisms have evolved several DNA repair pathways to ligate blunt-ended DSBs to maintain genome stability (Fig. 1A) (Jasin and Rothstein, 2013; Ceccaldi et al., 2016). Among the four known DNA repair pathways, it is generally thought that the repair of DSBs through NHEJ is mutagenic because NHEJ normally generates indels as a scar in the editing site (Cong et al., 2013; Mali et al., 2013a; Wang et al., 2013). However, recent studies suggest that NHEJ-mediated ligation of blunt ends may be intrinsically accurate because Ku70/80 protects the blunt ends (Jasin and Rothstein, 2013; Bétermier et al., 2014; Ceccaldi et al., 2016). Consistent with this, accurate and precise ligations were frequently observed in the case of DNA fragment deletions with a pair of sgRNAs (Canver et al., 2014; Li et al., 2015a). In addition, precise junctions of both upstream and downstream inversion ends were frequently observed in DNA fragment inversions induced by Cas9 and a pair of sgRNAs (Li et al., 2015a). Thus, the NHEJ DNA repair pathway could be utilized to achieve both mutagenic and precise genome editing (Fig. 1A). Finally, when a donor template with flanking homologous arms is present, efficient CRISPR genome editing could be achieved through the HR repair pathway (Fig. 1A) (Byrne et al., 2015).


NHEJ is independent of DSB end resection (Jasin and Rothstein, 2013; Ceccaldi et al., 2016). By contrast, HR is dependent on extensive resection of the DSB ends and strand invasion by the donor DNA (Fig. 1A) (Jasin and Rothstein, 2013; Ceccaldi et al., 2016). Two other known resection-dependent repair pathways are SSA (single strand annealing) and MMEJ (microhomology-mediated end joining) (Fig. 1A). Like HR, SSA requires extensive resection of DSB ends; however, unlike HR, which is an accurate repair pathway, SSA often results in large deletions (Fig. 1A) (Jasin and Rothstein, 2013; Ceccaldi et al., 2016). MMEJ could be utilized for efficient mutagenic CRISPR genome editing because it is dependent on resection and annealing of short microhomologous sequences of 5-25 bp near DSB ends (Fig. 1A) (McVey and Lee, 2008; Li et al., 2015a). For example, MMEJ can generate very efficient inversions and deletions. In the case of genome editing with Cas9 and a pair of sgRNAs, very efficient DNA fragment inversion could be achieved through MMEJ when short inverted repeats are present at or near the two DSBs (Li et al., 2015a). In addition, efficient DNA fragment deletion could be achieved through MMEJ when short direct repeats are present at the two DSBs or embedded nearby (Li et al., 2015a). In summary, both precise and imprecise genome editing could be achieved by CRISPR through various DNA repair pathways (Fig. 1A).
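As a rough illustration of the direct-repeat condition for MMEJ-mediated deletion, the Python sketch below measures how many bases are shared immediately to the right of two cut sites. It is a deliberately simplified model: only perfect matches starting exactly at the cuts are counted, repeats embedded further away and homology to the left of the cuts are ignored, and the 5-25 bp window is taken from the range quoted above. The function name and the toy sequence are hypothetical.

```python
def microhomology_at_cuts(seq, cut1, cut2, max_len=25):
    """Length of the direct repeat shared just 3' of both cut sites.

    cut1 and cut2 are 0-based indices of the first base to the right of
    each cut, with cut1 < cut2. Only perfect matches are counted.
    """
    k = 0
    while (k < max_len
           and cut1 + k < cut2
           and cut2 + k < len(seq)
           and seq[cut1 + k] == seq[cut2 + k]):
        k += 1
    return k

# Toy sequence: the 6-bp repeat ACGTAC follows both cut sites.
toy = "GGGG" + "ACGTAC" + "TTTTTT" + "ACGTAC" + "CCCC"
repeat_len = microhomology_at_cuts(toy, 4, 16)
# Whether the repeat falls in the 5-25 bp range cited for MMEJ.
within_mmej_range = 5 <= repeat_len <= 25
```

Annealing at such a repeat would leave one copy of it at the junction, so the deletion product has the same sequence as a precise end-to-end ligation of the two outer ends.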


Understanding the mechanisms of HR and NHEJ would help to control the choice between the two pathways in Cas9-mediated genome editing. For example, imprecise repair of target DNA cleavages by NHEJ could be partially alleviated by the use of HR-mediated repair. The efficiency of HR can be enhanced several-fold by repressing the key NHEJ molecules Ku70/80 or DNA ligase IV (Chu et al., 2015).


3. CRISPR DNA-FRAGMENT EDITING AND GENE REGULATION

Recent studies revealed that the human genome contains several millions of noncoding regulatory DNA elements (ENCODE Project Consortium, 2012; Thurman et al., 2012). Hierarchical higher-order chromatin organization and long-range DNA-looping interactions between proximal promoters and distal DNA elements, such as enhancers, insulators, silencers, and locus control regions (LCRs), are central for developmental gene regulation (Cremer and Cremer, 2001; Lieberman-Aiden et al., 2009; Dixon et al., 2012; Gibcus and Dekker, 2013). The CRISPR DNA-fragment editing systems are very useful to investigate spatiotemporal gene regulation and 3D genome organization through engineering regulatory DNA elements (Fig. 1B) (Canver et al., 2014; Canver et al., 2015; Guo et al., 2015; Kraft et al., 2015; Li et al., 2015a; Lupiáñez et al., 2015).


3.1 Enhancers

Enhancers were originally defined as remote DNA elements that activate a promoter in an orientation-independent manner in a heterologous reporter assay of transfection experiments (Banerji et al., 1981). However, in the endogenous chromosomal environment, inversion of an enhancer in the protocadherin (Pcdh) clusters decreases their gene expression, demonstrating an orientation-dependent enhancer function (Guo et al., 2015). Enhancer orientation likely influences higher-order chromatin conformation and long-distance DNA looping interactions which are important for gene regulation. Thus, enhancers may activate promoters in vivo in an orientation-dependent manner (Guo et al., 2015).

Enhancers are central for spatiotemporal gene expression during cellular differentiation, tissue morphogenesis, organ development, and disease progression through long-distance enhancer-promoter interactions (Bulger and Groudine, 2011; Levine et al., 2014). Although there are only about 20,000 genes in the human genome, many genes have two or more promoters (Zhang et al., 2004). In particular, the neuronal nitric oxide synthase (NOS1), glucocorticoid receptor (GR), and plectin genes may have more than a dozen promoters (Zhang et al., 2004). Finally, genome-wide studies have revealed that there are far more enhancers than promoters in the human genome (ENCODE Project Consortium, 2012). Specifically, there may exist up to a million noncoding DNA elements with characteristic features of enhancers, such as enriched H3K4me1 and H3K27ac histone marks, constrained positive selection, bidirectional transcription of eRNAs (enhancer RNAs), open and increased chromatin accessibility, reduced DNA methylation, abundant disease-association variants, and cell type-specific activation profiles (Heintzman et al., 2007; Xi et al., 2007; Visel et al., 2009; Hardison, 2010; Kim et al., 2010; Noonan and McCallion, 2010; Rada-Iglesias et al., 2011; Maurano et al., 2012; Tang et al., 2015). Clustered strong enhancers, known as super-enhancers, often stretch over a large region and are particularly important in determining cell-specific gene expression and cellular identity (Ernst et al., 2011; Whyte et al., 2013).


CRISPR DNA-fragment editing is useful in assessing the function of noncoding DNA elements such as enhancers (Guo et al., 2015; Li et al., 2015a; Li et al., 2015b). CRISPR-mediated deletion of regulatory elements and enhancers has revealed their new roles in gene regulation (Li et al., 2015a). CRISPR DNA-fragment editing has also generated mouse models of human limb malformations. For example, enhancers targeting one gene can be retargeted to another gene through de novo enhancer-promoter interactions by CRISPR DNA-fragment editing, resulting in gene dysregulation (Lupiáñez et al., 2015). Thus, alteration of TAD structures is important in the pathogenesis of human diseases (Ren and Dixon, 2015).


In a recent study, CRISPR DNA fragment deletion of a composite erythroid enhancer has confirmed its role in the activation of a repressor gene of the fetal globin gene (Canver et al., 2015). The authors then developed pooled CRISPR-Cas9 sgRNA libraries to perform in situ saturating mutagenesis of the human and mouse enhancers. They designed a library of all possible sgRNAs within the human composite enhancer (Canver et al., 2015). The sgRNAs within the library have a median gap between adjacent genomic cleavages of 4 bp. The library was then introduced into cultured cells through lentiviral transduction at low multiplicity such that nearly all selected cell clones contained a single integrant. Each sgRNA-targeted site was disrupted by a single DSB followed by NHEJ repair with indels. This CRISPR screening approach reveals critical minimal features and discrete vulnerabilities of the composite enhancer and is generally applicable to functional interrogation of noncoding genomic elements (Canver et al., 2015).

3.2 Silencers

Spatiotemporal expression of a given gene is tightly controlled during tissue development. In addition to enhancers, silencers, a class of noncoding DNA elements that negatively regulate gene expression, are also important for gene regulation (Maston et al., 2006). For example, a specific type of silencer called the neuron-restrictive silencer element (NRSE) is recognized by neuron-restrictive silencer factor (NRSF), also known as RE1 silencing transcription factor (REST), and inactivates promoters of neural genes in non-neuronal cells (Chong et al., 1995; Schoenherr and Anderson, 1995; Johnson et al., 2007). Among the 15849 NRSF binding sites identified in the K562 cell genome, 8592 or 54% of them have at least one CTCF binding site (CBS) within 2 kb (Guo et al., 2015). Since CTCF and the associated cohesin complex can loop distal DNA elements to cognate promoters, silencers may function in a manner similar to enhancers through long-distance chromatin looping interactions ().
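The kind of proximity count behind the 54% figure can be sketched as follows. This is not the analysis pipeline of Guo et al. (2015); it is a minimal Python illustration that assumes all positions lie on a single chromosome, represents each binding site by one coordinate, and uses made-up toy positions. The function name `fraction_near` is hypothetical.

```python
from bisect import bisect_left

def fraction_near(sites, anchors, window=2000):
    """Fraction of sites with at least one anchor within +/- window bp.

    Assumes single-chromosome coordinates with one position per site.
    """
    anchors = sorted(anchors)
    hits = 0
    for pos in sites:
        # First anchor at or beyond the left edge of the window.
        i = bisect_left(anchors, pos - window)
        if i < len(anchors) and anchors[i] <= pos + window:
            hits += 1
    return hits / len(sites) if sites else 0.0

# Toy coordinates (illustrative only): two of three sites have a nearby CBS.
nrsf_sites = [1000, 10000, 50000]
cbs_sites = [2500, 20000, 51999]
toy_fraction = fraction_near(nrsf_sites, cbs_sites)

# The percentage reported in the text follows directly from the counts.
reported_percent = round(100 * 8592 / 15849)
```

With real ChIP-seq peak files the same count would be done per chromosome, typically on peak intervals rather than single coordinates.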


3.3 Insulators

Insulators were originally identified as genetic elements in Drosophila that prevent position-effect variegation of gene expression and act as chromatin-domain barriers or enhancer blockers (Chung et al., 1993; Bell et al., 2001; Handoko et al., 2011; Ali et al., 2016). Insulator elements, which define the boundaries between chromatin domains, play pivotal roles in orchestrating proper long-range DNA-looping interactions between distal regulatory DNA elements (enhancers and silencers) and their cognate promoters (Ong and Corces, 2014). In mammals, CTCF has been found to be the major insulator-binding protein that plays a key role in weaving the genome fiber to form topological domains (Ong and Corces, 2014). ChIP-seq experiments have identified over 100,000 CBSs in mammalian genomes; however, whether all of these CBSs function as boundary insulators is not known (Kim et al., 2007; Shen et al., 2012; Wang et al., 2012). Nevertheless, the boundary CBS is important in gene expression control and tumorigenesis. For example, CRISPR disruption of a CBS, which functions as an insulation boundary for an oncogene, increases cell proliferation in a cellular model of human gliomas (Flavahan et al., 2016).

3.4 LCR

An LCR is operationally defined as an element that activates an entire locus (linked promoters) or gene cluster in a position-independent manner (Hardison et al., 1997). The well-characterized β-globin LCR consists of 5 HS sites (HS1-5) and, together with additional elements, controls exquisite sequential developmental expression of ε-, γ-, δ-, and β-globin genes during hematopoiesis (Orkin, 1990; Hardison et al., 1997). HS2 itself has an enhancer activity; however, the entire LCR element, with additional proximal regulatory elements, functions as a unit for controlling proper gene expression of the β-globin locus (Hardison et al., 1997). The LCR in the human CD2 locus establishes and maintains its open chromatin configuration even in a heterochromatin environment (Festenstein et al., 1996). An LCR may overcome heterochromatin-induced position-effect variegation and thus has functional similarity to an insulator. In summary, LCRs appear to be composite cis-regulatory elements that control developmental expression patterns of gene loci or gene clusters.


3.5 Composite DNA element

HS5-1 in the Pcdh clusters is a composite noncoding element functioning as enhancer, silencer, or insulator in different tissues. The HS5-1 noncoding element was originally discovered by DNase I digestion assays as a sensitive site required for maximal expression of members of the Pcdhα cluster in the brain (Ribich et al., 2006; Kehayova et al., 2011; Yokota et al., 2011). HS5-1 was then found to function as a silencer because its targeted deletion results in aberrant expression of the Pcdhα cluster in the kidney (Kehayova et al., 2011). In addition, its silencing activity is dependent on the NRSF/REST occupation of NRSE within HS5-1 (Kehayova et al., 2011). Finally, HS5-1 orientation determines its directional looping to the Pcdhα cluster as demonstrated by experiments through CRISPR DNA-fragment editing (Guo et al., 2015; Li et al., 2015a; Li et al., 2015b).


Inversion or deletion of HS5-1 results in dysregulation of not only members of the Pcdhα cluster, but also members of the Pcdhβγ clusters. Thus, HS5-1 functions as an insulator for proper cell-specific Pcdh gene expression in the brain (Guo et al., 2015; Li et al., 2015a; Li et al., 2015b). Several HS sites downstream of the Pcdhβγ gene clusters also form a very large composite enhancer (Yokota et al., 2011; Guo et al., 2012; Guo et al., 2015). In summary, HS5-1 is a composite noncoding DNA element functioning as enhancer, silencer, or insulator depending on specific cellular context. HS5-1 is both an enhancer for the Pcdhα cluster through CTCF/cohesin-mediated cell-specific chromatin looping within the Pcdhα CCD/subTAD (CCD: CTCF/cohesin-mediated chromatin domain) and an insulator for the Pcdhβγ clusters by virtue of its location in the invariant domain boundary (Guo et al., 2012; Guo et al., 2015). This configuration prevents improper chromatin-looping interactions of the HS5-1 enhancer with the Pcdhβγ promoters within the Pcdhβγ CCD/subTAD.

4. CRISPR EPIGENOME EDITING AND LOCUS IMAGING

Epigenetic modifications, such as DNA methylation, hydroxymethylation, histone modifications, and non-coding RNAs (ncRNAs), shape eukaryotic chromatin structures and control the development of multicellular organisms. These epigenetic modifications are reversible and are precisely regulated by epigenetic enzymes. Many epigenetic modifications alter chromatin plasticity and chromosome conformation. CRISPR systems have found many applications in epigenome editing and probing gene foci in 3D genomes.

4.1 Deactivated Cas9 (dCas9)

The discovery of the RNA-programmed Cas9-DNA interaction system has powered epigenome editing (Fig. 2). Epigenome editing directly alters chromatin marks at specific genomic loci by using engineered effectors consisting of a DNA recognition domain and a catalytic domain from a chromatin-modifying enzyme (Kungulovski and Jeltsch, 2016). For example, the nuclease-deactivated Cas9 (dead Cas9: dCas9) protein, mutated in its RuvC (D10A) and HNH (H841A) domains, has been used as a DNA recognition domain linked to a repressor or an activator through polypeptide bonds (Fig. 2A). In addition, dCas9 can be linked to a repressor or an activator through RNA binding modules and extended sgRNAs (Fig. 2B).

The Krüppel-associated box (KRAB) domain is a commonly used repressor effector. It recruits a heterochromatin-forming complex that causes histone methylation and deacetylation (Fig. 2A). The dCas9-KRAB fusion protein has effectively silenced noncoding RNAs or single genes when targeted to promoters, 5′ untranslated regions, as well as proximal and distal enhancer elements. For example, CRISPR targeting of the KRAB repressor (dCas9-KRAB) to HS2, an enhancer element within the LCR of the β-globin cluster, induced trimethylation of H3K9 in the HS2 enhancer and silenced the expression of the entire gene cluster (Thakore et al., 2015). Thus, targeted epigenetic editing of a single enhancer, HS2, silenced the expression of multiple globin genes.

The dCas9-histone lysine-specific demethylase 1 (LSD1) fusion protein allows effector-dependent silencing of functional, native enhancer elements (Fig. 2C) (Kearns et al., 2015). The dCas9-LSD1 method has been suggested as a rapid and powerful approach to dissect the function of distal cis-regulatory regions. Using this fusion protein, Kearns et al. (2015) functionally characterized the roles of novel enhancer elements in maintaining the ESC state. They generated stable N. meningitidis (Nm) dCas9-effector ESCs and delivered sgRNAs with a lentiviral system. Upon targeting, H3K4me2 (an LSD1 substrate) and H3K27ac (an active enhancer mark) were lost around the enhancer-sgRNA target site.

dCas9 has also been fused to activation domains (Fig. 2A), including the herpes simplex viral protein 16 tetramer (VP64) and the catalytic histone acetyltransferase core domain of the human E1A-associated protein p300 (p300Core) (Fig. 2C). The dCas9-VP64 activators require multiple activator domains or combinations of gRNAs to achieve high levels of gene induction through synergistic effects. They act as scaffolds for recruiting multiple components of the preinitiation complex and do not enzymatically modulate the chromatin state directly. In contrast, dCas9-p300Core guided by an individual sgRNA directly catalyzes acetylation of histone H3 lysine 27, leading to robust transcriptional activation of target genes (Fig. 2C) (Hilton et al., 2015).
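The dCas9 effectors discussed in this subsection can be summarized as a small lookup table. The sketch below is only an illustrative summary of the statements made here; the names `Effector`, `DCAS9_EFFECTORS`, and `effectors_with_outcome` are hypothetical and do not belong to any published tool.

```python
from typing import Dict, List, NamedTuple


class Effector(NamedTuple):
    outcome: str    # expected transcriptional outcome at the targeted element
    mechanism: str  # mechanism as described in the text of this subsection


# Hypothetical summary table; entries paraphrase the subsection above.
DCAS9_EFFECTORS: Dict[str, Effector] = {
    "KRAB": Effector("repression", "recruits a heterochromatin-forming complex"),
    "LSD1": Effector("repression", "loss of H3K4me2 and H3K27ac at the target enhancer"),
    "VP64": Effector("activation", "scaffold recruiting preinitiation-complex components"),
    "p300Core": Effector("activation", "acetylation of histone H3 lysine 27"),
}


def effectors_with_outcome(outcome: str) -> List[str]:
    """Return the effector names whose expected outcome matches `outcome`."""
    return sorted(name for name, eff in DCAS9_EFFECTORS.items() if eff.outcome == outcome)


repressors = effectors_with_outcome("repression")
activators = effectors_with_outcome("activation")
```

Grouping by outcome rather than by fusion strategy reflects how these tools are typically chosen in practice: the experimenter first decides whether an element should be silenced or activated, then picks an effector.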

4.2 CRISPR Labelling of Chromosomal Loci

The conformation and dynamics of native chromosomes play critical roles in regulating genome function. Genomic loci located megabases away or even on different chromosomes could be brought into proximity through long-range DNA interactions. Interactions between specific genomic sites can be visualized by fluorescence in situ hybridization (FISH), which involves denaturing dsDNA and hybridizing fluorescent nucleic acid probes.

Methods for visualization of specific chromosomal sites in living cells have been developed based on TALEs and CRISPR-dCas9. The dCas9 system provides a more universal and flexible platform for dynamic imaging of specific, endogenous genomic loci in living cells (Fig. 2D). Using an EGFP-tagged dCas9 and a sequence-specific, structure-optimized sgRNA that improves its interaction with dCas9, robust imaging of genomic loci can be achieved in living cells (Chen et al., 2013). In addition, the dCas9 fusion proteins have been used as probes to label sequence-specific genomic loci fluorescently without global DNA denaturing in tissue sections (Deng et al., 2015).

Finally, fusion of different fluorescent proteins to distinct dCas9s allowed simultaneous coloring of multiple loci to probe their spatial organization in the cell nucleus. Advances in orthogonal Cas9 proteins or modified sgRNAs further extend the usage of CRISPR imaging to a multi-color and multi-locus labeling system. Ma et al. designed multicolor versions of CRISPR using dCas9 from three bacterial orthologs: Sp dCas9-mCherry, Nm dCas9-GFP and S. thermophilus dCas9-GFP (Ma et al., 2015). Each pair of dCas9-fluorescent proteins and their cognate sgRNAs efficiently labelled multiple target loci in living human cells, allowing estimation of the inter-locus distances. Resolving the fluorescent signals of two loci on the same chromosome gives their spatial distance. By comparison with their linear distance along the chromosome, the DNA compaction could be inferred in living cells (Ma et al., 2015). At a minimum, the two-color CRISPR imaging system could be used to characterize long-range DNA interactions between two distal genomic loci, and has great potential to facilitate research on complex chromosomal architecture and spatial chromatin organization (Fig. 2D).

5. CTCF CRISPR ENGINEERING OF 3D GENOME

Prokaryotic chromosomal DNA is folded within a cell and eukaryotic chromatin fibers are folded to fit the limited volume of a nucleus (Badrinarayanan et al., 2015; Sexton and Cavalli, 2015). Although there might be many similar aspects between genome and protein folding, little is known about the general principles that establish, maintain, and reorganize hierarchical genome organization (Sexton and Cavalli, 2015). Recent studies, however, revealed that a class of ubiquitously expressed proteins known as architectural factors could construct an orderly folded genome (Sexton and Cavalli, 2015; Xu and Corces, 2015). These architectural proteins are enriched at boundaries between topologically associated domains (TADs) (Li et al., 2015c; Xu and Corces, 2015). The redistribution of architectural proteins in the genome, which alters TAD organization, could result in polycomb-mediated stress-induced gene silencing (Li et al., 2015c). Among many architectural proteins, CTCF emerged as a key player in the topological folding of 3D genomes (Xu and Corces, 2015). Thus, the linear genomic sequence may contain a CBS blueprint that codes for 3D genome folding (Nichols and Corces, 2015).

5.1 Chromatin Looping Orientation Determined by Directional CTCF Binding

CTCF is a highly conserved and ubiquitously expressed transcription factor that contains 11 zinc fingers (ZFs) (Baniahmad et al., 1990; Lobanenkov et al., 1990; Filippova et al., 1996; Ong and Corces, 2014). Since each ZF generally recognizes 3 bp, the CBS core sequences of 12 bp are recognized by the four fingers ZF4-7 (Renda et al., 2007). Genome-wide analyses revealed an 18- to 20-bp consensus motif (Holohan et al., 2007; Kim et al., 2007; Xie et al., 2007), which includes the 12-bp core sequences. Subsequent studies revealed a 42-bp CBS which consists of four modules (modules 1-4) or two motifs (M1 and M2) (Rhee and Pugh, 2011; Schmidt et al., 2012; Guo et al., 2015).

Initial electrophoretic mobility shift assays (EMSAs) revealed that CTCF ZF4-7 are essential for its strong binding to the core 12-bp DNA sequence in vitro in a preferred orientation (Quitschke et al., 2000; Renda et al., 2007). CTCF also recognizes a wide array of CBSs in vivo in a preferred orientation (Schmidt et al., 2012; Nakahashi et al., 2013). However, the human genome contains a large number of palindromic CBSs (Xie et al., 2007). Comprehensive biophysical analyses of CTCF recognition of CBS with a 17-bp palindromic core sequence revealed that module 1 is the key determinant in directional CTCF binding to CBSs with palindromic cores (Guo et al., 2015).

The human clustered Pcdh locus contains 53 highly similar variable exons, each of which is preceded by a distinct promoter, organized into three sequentially linked clusters (Pcdhα, β, and γ) that are highly conserved in vertebrates and cephalopods (Fig. 3A) (Wu and Maniatis, 1999; Wu, 2005; Albertin et al., 2015). They are expressed in many brain regions in a cell-specific manner and are required for specifying exquisite dendritic patterning and neuronal connectivity during brain development (Esumi et al., 2005; Zou et al., 2007; Chen et al., 2012; Garrett et al., 2012; Lefebvre et al., 2012; Suo et al., 2012; Thu et al., 2014).

In the human Pcdh gene clusters, a 21- to 23-bp conserved sequence element (CSE) was initially identified in the promoter region of each variable first exon (except Pcdhαc2, β1, γc4, and γc5) (Fig. 3A) (Wu et al., 2001). Subsequent studies revealed that CSEs were CTCF recognition sites, thus were actually CBSs (Fig. 3A) (Golan-Mashiach et al., 2011; Guo et al., 2012; Monahan et al., 2012). In addition, there is a second CBS (exonic CBS: eCBS) in the coding region of each alternate member (except Pcdhα1 which does not contain eCBS) of the Pcdhα cluster (Fig. 3A) (Golan-Mashiach et al., 2011; Guo et al., 2012; Monahan et al., 2012). Moreover, directional CTCF binding to paired CBSs in variable promoters and the HS5-1 enhancer is required for specific enhancer-promoter interactions for every activated member of the Pcdhα gene cluster (Fig. 3A) (Guo et al., 2012; Guo et al., 2015). Thus, CTCF/cohesin-mediated long-distance enhancer-promoter looping interactions are required for the Pcdhα promoter choice in the brain (Guo et al., 2012).

An important observation was that the anchoring CBS pairs of the Pcdh enhancer-promoter loops are in opposite orientations (Fig. 3A) (Guo et al., 2012). The same oppositely oriented anchoring pairs were observed in the immunoglobulin gene cluster (Alt et al., 2013). In both Pcdh and immunoglobulin gene clusters, a large number of forward CBSs are arrayed in tandem in the variable region. These forward-oriented CBSs are specifically looped to reverse CBSs located in the downstream regulatory region (Guo et al., 2012; Alt et al., 2013; Guo et al., 2015).
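The pairing pattern described here, forward CBSs upstream looping to reverse CBSs downstream, can be written down as a simple enumeration. The following is a minimal sketch under strong assumptions: the site names and coordinates are invented for illustration, and the function only lists candidate convergent pairs; which of them actually loop in a given cell depends on CTCF/cohesin occupancy and is not modeled.

```python
from typing import List, NamedTuple, Tuple


class CBS(NamedTuple):
    name: str
    position: int  # genomic coordinate in bp (hypothetical values below)
    strand: str    # "+" for a forward CBS, "-" for a reverse CBS


def convergent_pairs(sites: List[CBS]) -> List[Tuple[str, str]]:
    """List (forward, reverse) pairs in which the forward CBS lies upstream of the reverse CBS."""
    ordered = sorted(sites, key=lambda s: s.position)
    pairs = []
    for i, left in enumerate(ordered):
        if left.strand != "+":
            continue
        for right in ordered[i + 1:]:
            if right.strand == "-":
                pairs.append((left.name, right.name))
    return pairs


# Toy locus: two forward promoter CBSs and two reverse enhancer CBSs downstream.
sites = [
    CBS("promoter_1", 1_000, "+"),
    CBS("promoter_2", 5_000, "+"),
    CBS("enhancer_a", 90_000, "-"),
    CBS("enhancer_b", 91_000, "-"),
]
pairs = convergent_pairs(sites)
```

In this toy locus every promoter CBS is a candidate partner of both enhancer CBSs, whereas a reverse site placed upstream of a forward site yields no pair.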

Chromosome conformation capture (Dekker et al., 2002) and related 4C experiments (Simonis et al., 2006; Zhao et al., 2006) revealed that the HS5-1 enhancer, which contains two reverse-oriented CBSs, is in close contact with selected upstream Pcdhα variable promoters through multiple overlapping CTCF-mediated long-range DNA looping interactions (Guo et al., 2012). These multiple overlapping chromatin loops form a Pcdhα CCD/subTAD (Guo et al., 2015). Similarly, the super-enhancer regulatory region downstream of the Pcdhγ cluster, which contains five CBSs in reverse orientation in tandem, loops to multiple variable promoters of members of the Pcdhβ and γ gene clusters, forming the Pcdhβγ CCD/subTAD (Fig. 3A) (Guo et al., 2015). Thus, the three Pcdh clusters are spatially organized into two neighboring CCDs or subTADs (Fig. 3A) (Guo et al., 2015).

Genome-wide analyses revealed strong correlations between convergent CBSs and chromatin loops (Rao et al., 2014; Gómez-Marín et al., 2015; Guo et al., 2015; Tang et al., 2015; Vietri Rudan et al., 2015). Are the observed correlations between CBS orientations and chromatin contacts functional? CRISPR engineering of CBSs demonstrates that inversion of CBSs switches directions of long-range chromatin looping interactions. In particular, in situ inversion of an enhancer element containing two boundary CBSs by CRISPR DNA-fragment editing causes switching of looping orientations from upstream to downstream (Guo et al., 2015). Importantly, this alteration of CCDs causes extensive dysregulation of expression of members of all three Pcdh gene clusters, demonstrating the functional consequences of enhancer inversion and topological domain alteration (Guo et al., 2015).
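At the sequence level, inverting a DNA fragment between two cut sites replaces it with its reverse complement, which is why an inverted CBS reads in the opposite orientation. The sketch below illustrates only this bookkeeping. The 6-bp motif and the 14-bp sequence are hypothetical placeholders, not the CTCF consensus, and the cut positions are idealized (no indels at the junctions).

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_fragment(seq: str, cut_left: int, cut_right: int) -> str:
    """Reverse-complement seq[cut_left:cut_right] in place (0-based, half-open)."""
    if not 0 <= cut_left <= cut_right <= len(seq):
        raise ValueError("cut sites must satisfy 0 <= cut_left <= cut_right <= len(seq)")
    return seq[:cut_left] + reverse_complement(seq[cut_left:cut_right]) + seq[cut_right:]


def motif_strand(seq: str, motif: str):
    """Return "+" if the motif occurs as written, "-" if only its reverse complement occurs."""
    if motif in seq:
        return "+"
    if reverse_complement(motif) in seq:
        return "-"
    return None


TOY_MOTIF = "GGCACT"           # hypothetical, non-palindromic placeholder motif
locus = "TTAAGGCACTTTAA"       # hypothetical sequence carrying the motif on "+"
inverted = invert_fragment(locus, 2, 12)
restored = invert_fragment(inverted, 2, 12)
```

Here `motif_strand(locus, TOY_MOTIF)` is `"+"` and `motif_strand(inverted, TOY_MOTIF)` is `"-"`, and inverting the same interval twice returns the original sequence.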

Similar switching of genome topology was observed upon CRISPR-mediated inversion of boundary CBSs in the β-globin gene locus (Guo et al., 2015). However, insertion of a 50-bp inverted CBS only disrupted existing chromatin loops but did not result in formation of new loops (de Wit et al., 2015). Thus, in addition to CTCF and the associated cohesin complex, other architectural proteins are required for establishing proper directionality of genome looping (Xu and Corces, 2015; Ali et al., 2016). The CBS architecture rule observed in the Pcdh and β-globin loci is applicable to genome-wide topological chromatin organization (Guo et al., 2015). Overlapping CTCF/cohesin-mediated long-range chromatin loops between forward-reverse CBSs form CCD/subTADs (Guo et al., 2015; Tang et al., 2015), resulting in a boundary between neighboring topological domains with a pair of CBSs configured in a reverse-forward orientation (Gómez-Marín et al., 2015; Guo et al., 2015).
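The reverse-forward boundary signature can be illustrated by scanning an ordered list of CBS orientations. This is a deliberately simplified sketch: the function names are hypothetical, the orientation string is a toy example loosely patterned on the two Pcdh CCDs, and real domain calls rely on chromatin contact data rather than on motif orientation alone.

```python
from typing import List


def ccd_boundaries(strands: str) -> List[int]:
    """Return each index i where CBS i is reverse ("-") and CBS i+1 is forward ("+")."""
    return [
        i for i in range(len(strands) - 1)
        if strands[i] == "-" and strands[i + 1] == "+"
    ]


def split_domains(strands: str) -> List[str]:
    """Split the ordered CBS orientations into candidate domains at reverse-forward pairs."""
    domains, start = [], 0
    for i in ccd_boundaries(strands):
        domains.append(strands[start:i + 1])
        start = i + 1
    domains.append(strands[start:])
    return domains


# Toy array: forward CBSs, two reverse CBSs, more forward CBSs, five reverse CBSs.
toy_orientations = "+++--++++-----"
boundaries = ccd_boundaries(toy_orientations)
domains = split_domains(toy_orientations)
```

For this toy array a single boundary is found after the fifth site, giving two candidate domains, each made of forward sites followed by reverse sites.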

5.2 Multivalent CTCF Cellular Functions

CTCF associates with the cohesin complex to establish genome-wide chromatin-looping interactions (Parelho et al., 2008; Wendt et al., 2008; Handoko et al., 2011; Phillips-Cremins et al., 2013). For example, the CTCF/cohesin complex is thought to organize constitutive DNA-looping interactions at TAD boundaries (Dixon et al., 2012; Nora et al., 2012). However, the genome-wide CTCF/cohesin complex occupancy is plastic and the chromosomal architecture is dynamic during development and disease (Hou et al., 2010; Wang et al., 2012; Dixon et al., 2015). In the mouse β-globin locus, CTCF is required for long-distance chromatin interactions of CBSs flanking the β-globin genes, suggesting that CTCF organizes higher-order chromatin structures (Splinter et al., 2006). ChIA-PET experiments (Fullwood et al., 2009) revealed that CTCF organizes stem cell genomes into five distinct types of interacting chromatin domains (Handoko et al., 2011). In addition, genome-wide locations of CTCF, cohesin, and H3K4me3 are strongly enriched at transcription start sites (Jia et al., 2014). Finally, 4C-seq and RNAi experiments suggest that CTCF and the associated cohesin complex regulate the expression of a member of the non-clustered Pcdh family through promoter-promoter interactions (Jia et al., 2014).

CTCF has been implicated in a variety of cellular and developmental functions, including transcriptional activation, repression, and imprinting, RNA polymerase pausing, alternative splicing, DNA replication and repair, chromosome condensation and translocation, X chromosome inactivation, tumorigenesis, V(D)J recombination in the immune system, and promoter choice in the nervous system (Lobanenkov et al., 1990; Ong and Corces, 2014). However, the most prominent role of CTCF is functioning as an insulator-binding protein that restricts proper enhancer activation to cognate promoters and blocks enhancer activity from non-cognate promoters (Chung et al., 1993; Bell et al., 1999; Bell and Felsenfeld, 2000; Hark et al., 2000; Ong and Corces, 2014).

The multivalent biological functions and prominent insulator roles of CTCF may be related to its bridging ability in long-distance chromatin looping in the higher-order genome architecture. The seemingly contradictory cellular functions of CTCF as both activator and repressor of transcription could be explained by whether it loops an enhancer or a silencer in close contact with a promoter (Fig. 3B).

The oriented CBS looping of the CCD model provides a simple explanation for the pivotal role of CTCF in organizing chromatin during 3D genome folding and provides a unifying mechanism for multivalent and seemingly conflicting functions of CTCF in epigenomic regulation. Depending on the topology of chromatin looping, as well as CBS orientation and location, a CBS element bound by CTCF/cohesin could function as a chromatin barrier (Ong and Corces, 2014), a boundary between the marks of active euchromatin and repressive heterochromatin (Splinter et al., 2006), or an insulator between distinct topological chromatin domains (Handoko et al., 2011).

5.3 A CTCF Looping Mechanism for 3D Genome Folding

Do all of the more than 100,000 CBSs function as insulators, or do they code for 3D genomes? Does the genome-wide CBS organization have an instructional role in 3D genome folding? Inversion of boundary CBS elements switches the directionality of topological chromatin looping and alters gene expression (Guo et al., 2015). In addition, deletion of a boundary CBS, but not an internal CBS, results in merging of topological domains in the β-globin cluster (Guo et al., 2015). Genome-wide computational analysis revealed that oriented CTCF/cohesin binding and specific directional chromatin looping occur throughout mammalian genomes (Guo et al., 2015). In addition, haplotype variants and allelic interactions of CBSs regulate chromatin topology and gene expression (Tang et al., 2015). Thus, the location and orientation of genome-wide CBSs code for 3D genome architecture (Fig. 3C).

Every CBS in the genome has an orientation, but are all insulators oriented or not? The directionality of looping formation by oriented CTCF binding to CBSs may explain seemingly contradictory data previously obtained from reporter gene assays or transgenic mice experiments that addressed the question of whether insulators function in an orientation-dependent manner (Bell et al., 1999; Tanimoto et al., 1999; Bell and Felsenfeld, 2000; Hark et al., 2000; Saitoh et al., 2000). When reporter genes and CBS-containing insulator elements are randomly inserted into the mouse genome during transgenic experiments, the oriented CBS can always loop to the convergent CBSs in the flanking genomic regions; however, depending on the orientation of CBSs and transgenic insertions, sometimes CBSs loop away from the reporter genes and sometimes CBSs loop toward the reporter genes.

CRISPR engineering reveals an intricate relationship between CBSs, topological domains, and genome regulation. First, CRISPR disruption of CBSs demonstrated that super-enhancer control of cell identity gene expression occurs only within insulated neighborhoods (Dowen et al., 2014). Second, during limb development, telomeric and centromeric TADs in the HoxD gene cluster control sequential early gene expression for arm/forearm and late expression for hands, respectively (Andrey et al., 2013). A conformational change that switches the looping direction of HoxD genes between the distinct enhancer clusters of the two TADs underlies the collinearity of HoxD gene regulation (Andrey et al., 2013). Third, in the HoxA gene cluster, which also contains two TADs, CRISPR-mediated deletion of a boundary CBS resulted in spreading of the active chromatin to the next CBS in the repressive chromatin domain (Narendra et al., 2015). Finally, physical simulation of Hi-C data and CRISPR editing of CBSs led to an extrusion model for 3D genome folding (Sanborn et al., 2015). DNA bending by CTCF and extrusion of DNA loops by cohesin may be a mechanism for topological folding of the 3D genome (Imakaev et al., 2015; Nichols and Corces, 2015; Sanborn et al., 2015; Xu and Corces, 2015; Ali et al., 2016).
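The intuition behind an extrusion model can be conveyed with a toy one-dimensional sketch. It is not the simulation of Sanborn et al. (2015); it assumes, purely for illustration, a deterministic extruder whose left arm halts only at a forward ("+") CBS and whose right arm halts only at a reverse ("-") CBS, so that the resulting anchors are convergent. The function name `extrude`, the bin coordinates, and the perfectly efficient blocking are all assumptions.

```python
from typing import Dict, Optional, Tuple


def extrude(cbs: Dict[int, str], load_site: int, length: int) -> Tuple[Optional[int], Optional[int]]:
    """Return the (left, right) loop anchors reached from load_site on a lattice of `length` bins.

    cbs maps a bin index to "+" (forward CBS) or "-" (reverse CBS).
    An arm that runs off the lattice without being blocked is reported as None.
    """
    left = load_site
    while left >= 0 and cbs.get(left) != "+":   # left arm ignores reverse sites
        left -= 1
    right = load_site
    while right < length and cbs.get(right) != "-":  # right arm ignores forward sites
        right += 1
    return (left if left >= 0 else None, right if right < length else None)


# Toy lattice with two convergent pairs: (2, 5) and (8, 15).
toy_cbs = {2: "+", 5: "-", 8: "+", 15: "-"}
loop_inner = extrude(toy_cbs, load_site=3, length=20)
loop_outer = extrude(toy_cbs, load_site=10, length=20)
loop_skipping = extrude(toy_cbs, load_site=6, length=20)
```

Loading between the first pair gives anchors (2, 5) and loading between the second pair gives (8, 15). Loading between the reverse site at 5 and the forward site at 8 gives (2, 15), because in this toy model both arms pass sites that face away from them; this mirrors the orientation dependence discussed above, though real blocking is neither absolute nor deterministic.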

6. PERSPECTIVES

Forms determine function. Linear genomic sequences encode the 3D genome. In turn, the physical form of 3D genomes controls genome expression, leading to versatile biological function. Proteins are coded by linear gene coding sequences; however, when and where a gene gets expressed are largely determined by the spatial organization of noncoding genetic elements. CRISPR has found new and exciting avenues in probing and engineering 3D genomes, including editing noncoding genetic elements, modifying chromatin states, and labeling conformational interactions. Its elegant simplicity facilitates the discovery and dissection of millions of genomic DNA elements.

Recent exciting progress indicates that CRISPR manipulation of DNA elements could have a profound impact on genome folding and gene regulation. In particular, topological domains could be predicted based on the genome-wide locations and relative orientations of CBSs as well as directional CTCF binding to them. In addition, genome topology could be engineered by manipulating controlling elements and dissecting architectural proteins. The labyrinthine topology of 3D genomes is enormously complex; however, as technology develops, we may someday solve the Lego puzzle of 3D genomes. In the end, the era in which form follows function for 3D genomes will come.

ACKNOWLEDGMENTS

This work was supported by the National Natural Science Foundation for the Youth of China (No. 81302861); the National Natural Science Foundation of China (No. 91519302); and the Science and Technology Commission of Shanghai Municipality (No. 14JC1403600). We thank members of the Wu lab for discussions, Jia Shou for drawing Fig. 3C, and Diyu Chen for assistance.

REFERENCES

Albertin, C.B., Simakov, O., Mitros, T., Wang, Z.Y., Pungor, J.R., Edsinger-Gonzales, E., Brenner, S., Ragsdale, C.W., Rokhsar, D.S., 2015. The octopus genome and the evolution of cephalopod neural and morphological novelties. Nature 524, 220-224.
Ali, T., Renkawitz, R., Bartkuhn, M., 2016. Insulators and domains of gene expression. Curr. Opin. Genet. Dev. 37, 17-26.
Alt, F.W., Zhang, Y., Meng, F.L., Guo, C., Schwer, B., 2013. Mechanisms of programmed DNA lesions and genomic instability in the immune system. Cell 152, 417-429.
Andrey, G., Montavon, T., Mascrez, B., Gonzalez, F., Noordermeer, D., Leleu, M., Trono, D., Spitz, F., Duboule, D., 2013. A switch between topological domains underlies HoxD genes collinearity in mouse limbs. Science 340, 1234167.
Bétermier, M., Bertrand, P., Lopez, B.S., 2014. Is non-homologous end-joining really an inherently error-prone process? PLoS Genet. 10, e1004086.
Badrinarayanan, A., Le, T.B., Laub, M.T., 2015. Bacterial chromosome organization and segregation. Annu. Rev. Cell Dev. Biol. 31, 171-199.
Banerji, J., Rusconi, S., Schaffner, W., 1981. Expression of a beta-globin gene is enhanced by remote SV40 DNA sequences. Cell 27, 299-308.
Baniahmad, A., Steiner, C., Kohne, A.C., Renkawitz, R., 1990. Modular structure of a chicken lysozyme silencer: involvement of an unusual thyroid hormone receptor binding site. Cell 61, 505-514.
Bell, A.C., Felsenfeld, G., 2000. Methylation of a CTCF-dependent boundary controls imprinted expression of the Igf2 gene. Nature 405, 482-485.
Bell, A.C., West, A.G., Felsenfeld, G., 1999. The protein CTCF is required for the enhancer blocking activity of vertebrate insulators. Cell 98, 387-396.
Bell, A.C., West, A.G., Felsenfeld, G., 2001. Insulators and boundaries: versatile regulatory elements in the eukaryotic genome. Science 291, 447-450.
Bulger, M., Groudine, M., 2011. Functional and mechanistic diversity of distal transcription enhancers. Cell 144, 327-339.
Byrne, S.M., Ortiz, L., Mali, P., Aach, J., Church, G.M., 2015. Multi-kilobase homozygous targeted gene replacement in human induced pluripotent stem cells. Nucleic Acids Res. 43, e21.
Canver, M.C., Bauer, D.E., Dass, A., Yien, Y.Y., Chung, J., Masuda, T., Maeda, T., Paw, B.H., Orkin, S.H., 2014. Characterization of genomic deletion efficiency mediated by clustered regularly interspaced palindromic repeats (CRISPR)/Cas9 nuclease system in mammalian cells. J. Biol. Chem. 289, 21312-21324.
Canver, M.C., Smith, E.C., Sher, F., Pinello, L., Sanjana, N.E., Shalem, O., Chen, D.D., Schupp, P.G., Vinjamur, D.S., Garcia, S.P., Luc, S., Kurita, R., Nakamura, Y., Fujiwara, Y., Maeda, T., Yuan, G.C., Zhang, F., Orkin, S.H., Bauer, D.E., 2015. BCL11A enhancer dissection by Cas9-mediated in situ saturating mutagenesis. Nature 527, 192-197.
Carlson, D.F., Tan, W., Lillico, S.G., Stverakova, D., Proudfoot, C., Christian, M., Voytas, D.F., Long, C.R., Whitelaw, C.B., Fahrenkrug, S.C., 2012. Efficient TALEN-mediated gene knockout in livestock. Proc. Natl. Acad. Sci. USA 109, 17382-17387.
Carroll, D., 2014. Genome engineering with targetable nucleases. Annu. Rev. Biochem. 83, 409-439.
Ceccaldi, R., Rondinelli, B., D'Andrea, A.D., 2016. Repair pathway choices and consequences at the double-strand break. Trends Cell Biol. 26, 52-64.
Chen, B., Gilbert, L.A., Cimini, B.A., Schnitzbauer, J., Zhang, W., Li, G.W., Park, J., Blackburn, E.H., Weissman, J.S., Qi, L.S., Huang, B., 2013. Dynamic imaging of genomic loci in living human cells by an optimized CRISPR/Cas system. Cell 155, 1479-1491.
Chen, W.V., Alvarez, F.J., Lefebvre, J.L., Friedman, B., Nwakeze, C., Geiman, E., Smith, C., Thu, C.A., Tapia, J.C., Tasic, B., Sanes, J.R., Maniatis, T., 2012. Functional significance of isoform diversification in the protocadherin gamma gene cluster. Neuron 75, 402-409.
Chiruvella, K.K., Liang, Z., Wilson, T.E., 2013. Repair of double-strand breaks by end joining. Cold Spring Harb. Perspect. Biol. 5, a012757.
Cho, S.W., Kim, S., Kim, J.M., Kim, J.S., 2013. Targeted genome engineering in human cells with the Cas9 RNA-guided endonuclease. Nat. Biotechnol. 31, 230-232.
Choi, P.S., Meyerson, M., 2014. Targeted genomic rearrangements using CRISPR/Cas technology. Nat. Commun. 5, 3728.
Chong, J.A., Tapia-Ramirez, J., Kim, S., Toledo-Aral, J.J., Zheng, Y., Boutros, M.C., Altshuller, Y.M., Frohman, M.A., Kraner, S.D., Mandel, G., 1995. REST: a mammalian silencer protein that restricts sodium channel gene expression to neurons. Cell 80, 949-957.
Chu, V.T., Weber, T., Wefers, B., Wurst, W., Sander, S., Rajewsky, K., Kuhn, R., 2015. Increasing the efficiency of homology-directed repair for CRISPR-Cas9-induced precise gene editing in mammalian cells. Nat. Biotechnol. 33, 543-548.
Chung, J.H., Whiteley, M., Felsenfeld, G., 1993. A 5' element of the chicken beta-globin domain serves as an insulator in human erythroid cells and protects against position effect in Drosophila. Cell 74, 505-514.
Cong, L., Ran, F.A., Cox, D., Lin, S., Barretto, R., Habib, N., Hsu, P.D., Wu, X., Jiang, W., Marraffini, L.A., Zhang, F., 2013. Multiplex genome engineering using CRISPR/Cas systems. Science 339, 819-823.
Cremer, T., Cremer, C., 2001. Chromosome territories, nuclear architecture and gene regulation in mammalian cells. Nat. Rev. Genet. 2, 292-301.
de Laat, W., Duboule, D., 2013. Topology of mammalian developmental enhancers and their regulatory landscapes. Nature 502, 499-506.
de Wit, E., Vos, E.S., Holwerda, S.J., Valdes-Quezada, C., Verstegen, M.J., Teunissen, H., Splinter, E., Wijchers, P.J., Krijger, P.H., de Laat, W., 2015. CTCF binding polarity determines chromatin looping. Mol. Cell 60, 676-684.
Dekker, J., Rippe, K., Dekker, M., Kleckner, N., 2002. Capturing chromosome conformation. Science 295, 1306-1311.
Deng, W., Shi, X., Tjian, R., Lionnet, T., Singer, R.H., 2015. CASFISH: CRISPR/Cas9-mediated in situ labeling of genomic loci in fixed cells. Proc. Natl. Acad. Sci. USA 112, 11870-11875.
Dixon, J.R., Jung, I., Selvaraj, S., Shen, Y., Antosiewicz-Bourget, J.E., Lee, A.Y., Ye, Z., Kim, A., Rajagopal, N., Xie, W., Diao, Y., Liang, J., Zhao, H., Lobanenkov, V.V., Ecker, J.R., Thomson, J.A., Ren, B., 2015. Chromatin architecture reorganization during stem cell differentiation. Nature 518, 331-336.
Dixon, J.R., Selvaraj, S., Yue, F., Kim, A., Li, Y., Shen, Y., Hu, M., Liu, J.S., Ren, B., 2012. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376-380.
Doudna, J.A., Charpentier, E., 2014. Genome editing. The new frontier of genome engineering with CRISPR-Cas9. Science 346, 1258096.
Dowen, J.M., Fan, Z.P., Hnisz, D., Ren, G., Abraham, B.J., Zhang, L.N., Weintraub, A.S., Schuijers, J., Lee, T.I., Zhao, K., Young, R.A., 2014. Control of cell identity genes occurs in insulated neighborhoods in mammalian chromosomes. Cell 159, 374-387.
Dzantiev, L., Constantin, N., Genschel, J., Iyer, R.R., Burgers, P.M., Modrich, P., 2004. A defined human system that supports bidirectional mismatch-provoked excision. Mol. Cell 15, 31-41.
ENCODE Project Consortium, 2012. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74.
Ernst, J., Kheradpour, P., Mikkelsen, T.S., Shoresh, N., Ward, L.D., Epstein, C.B., Zhang, X., Wang, L., Issner, R., Coyne, M., Ku, M., Durham, T., Kellis, M., Bernstein, B.E., 2011. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43-49.
Esumi, S., Kakazu, N., Taguchi, Y., Hirayama, T., Sasaki, A., Hirabayashi, T., Koide, T., Kitsukawa, T., Hamada, S., Yagi, T., 2005. Monoallelic yet combinatorial expression of variable exons of the protocadherin-alpha gene cluster in single neurons. Nat. Genet. 37, 171-176.

Festenstein, R., Tolaini, M., Corbella, P., Mamalaki, C., Parrington, J., Fox, M., Miliou, A., Jones, M., Kioussis, D., 1996. Locus control region function and heterochromatin-induced position effect variegation. Science 271, 1123-1125.
Filippova, G.N., Fagerlie, S., Klenova, E.M., Myers, C., Dehner, Y., Goodwin, G., Neiman, P.E., Collins, S.J., Lobanenkov, V.V., 1996. An exceptionally conserved transcriptional repressor, CTCF, employs different combinations of zinc fingers to bind diverged promoter sequences of avian and mammalian c-myc oncogenes. Mol. Cell. Biol. 16, 2802-2813.
Flavahan, W.A., Drier, Y., Liau, B.B., Gillespie, S.M., Venteicher, A.S., Stemmer-Rachamimov, A.O., Suva, M.L., Bernstein, B.E., 2016. Insulator dysfunction and oncogene activation in IDH mutant gliomas. Nature 529, 110-114.
Fullwood, M.J., Liu, M.H., Pan, Y.F., Liu, J., Xu, H., Mohamed, Y.B., Orlov, Y.L., Velkov, S., Ho, A., Mei, P.H., Chew, E.G., Huang, P.Y., Welboren, W.J., Han, Y., Ooi, H.S., Ariyaratne, P.N., Vega, V.B., Luo, Y., Tan, P.Y., Choy, P.Y., Wansa, K.D., Zhao, B., Lim, K.S., Leow, S.C., Yow, J.S., Joseph, R., Li, H., Desai, K.V., Thomsen, J.S., Lee, Y.K., Karuturi, R.K., Herve, T., Bourque, G., Stunnenberg, H.G., Ruan, X., Cacheux-Rataboul, V., Sung, W.K., Liu, E.T., Wei, C.L., Cheung, E., Ruan, Y., 2009. An oestrogen-receptor-alpha-bound human chromatin interactome. Nature 462, 58-64.
Garrett, A.M., Schreiner, D., Lobas, M.A., Weiner, J.A., 2012. gamma-protocadherins control cortical dendrite arborization by regulating the activity of a FAK/PKC/MARCKS signaling pathway. Neuron 74, 269-276.
Gibcus, J.H., Dekker, J., 2013. The hierarchy of the 3D genome. Mol. Cell 49, 773-782.
Golan-Mashiach, M., Grunspan, M., Emmanuel, R., Gibbs-Bar, L., Dikstein, R., Shapiro, E., 2011. Identification of CTCF as a master regulator of the clustered protocadherin genes. Nucleic Acids Res. 40, 3378-3391.
Golic, K.G., Golic, M.M., 1996. Engineering the Drosophila genome: chromosome rearrangements by design. Genetics 144, 1693-1711.
Gómez-Marín, C., Tena, J.J., Acemel, R.D., Lopez-Mayorga, M., Naranjo, S., de la Calle-Mustienes, E., Maeso, I., Beccari, L., Aneas, I., Vielmas, E., Bovolenta, P., Nobrega, M.A., Carvajal, J., Gomez-Skarmeta, J.L., 2015. Evolutionary comparison reveals that diverging CTCF sites are signatures of ancestral topological associating domains borders. Proc. Natl. Acad. Sci. USA 112, 7542-7547.
Guo, Y., Monahan, K., Wu, H., Gertz, J., Varley, K.E., Li, W., Myers, R.M., Maniatis, T., Wu, Q., 2012. CTCF/cohesin-mediated DNA looping is required for protocadherin alpha promoter choice. Proc. Natl. Acad. Sci. USA 109, 21081-21086.
Guo, Y., Xu, Q., Canzio, D., Shou, J., Li, J., Gorkin, D.U., Jung, I., Wu, H., Zhai, Y., Tang, Y., Lu, Y., Wu, Y., Jia, Z., Li, W., Zhang, M.Q., Ren, B., Krainer, A.R., Maniatis, T., Wu, Q., 2015. CRISPR Inversion of CTCF sites alters genome topology and enhancer/promoter function. Cell 162, 900-910.

ACCEPTED MANUSCRIPT

AC C

EP

TE D

M AN U

SC

RI PT

Gupta, A., Hall, V.L., Kok, F.O., Shin, M., McNulty, J.C., Lawson, N.D., Wolfe, S.A., 2013. Targeted chromosomal deletions and inversions in zebrafish. Genome Res. 23, 1008-1017. Handoko, L., Xu, H., Li, G., Ngan, C.Y., Chew, E., Schnapp, M., Lee, C.W., Ye, C., Ping, J.L., Mulawadi, F., Wong, E., Sheng, J., Zhang, Y., Poh, T., Chan, C.S., Kunarso, G., Shahab, A., Bourque, G., Cacheux-Rataboul, V., Sung, W.K., Ruan, Y., Wei, C.L., 2011. CTCF-mediated functional chromatin interactome in pluripotent cells. Nat. Genet. 43, 630-638. Hardison, R., Slightom, J.L., Gumucio, D.L., Goodman, M., Stojanovic, N., Miller, W., 1997. Locus control regions of mammalian beta-globin gene clusters: combining phylogenetic analyses and experimental results to gain functional insights. Gene 205, 73-94. Hardison, R.C., 2010. Variable evolutionary signatures at the heart of enhancers. Nat. Genet. 42, 734-735. Hark, A.T., Schoenherr, C.J., Katz, D.J., Ingram, R.S., Levorse, J.M., Tilghman, S.M., 2000. CTCF mediates methylation-sensitive enhancerblocking activity at the H19/Igf2 locus. Nature 405, 486-489. Heintzman, N.D., Stuart, R.K., Hon, G., Fu, Y., Ching, C.W., Hawkins, R.D., Barrera, L.O., Van Calcar, S., Qu, C., Ching, K.A., Wang, W., Weng, Z., Green, R.D., Crawford, G.E., Ren, B., 2007. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat. Genet. 39, 311-318. Herault, Y., Rassoulzadegan, M., Cuzin, F., Duboule, D., 1998. Engineering chromosomes in mice through targeted meiotic recombination (TAMERE). Nat. Genet. 20, 381-384. Hilton, I.B., D'Ippolito, A.M., Vockley, C.M., Thakore, P.I., Crawford, G.E., Reddy, T.E., Gersbach, C.A., 2015. Epigenome editing by a CRISPR-Cas9based acetyltransferase activates genes from promoters and enhancers. Nat. Biotechnol. 33, 510-517. Holohan, E.E., Kwong, C., Adryan, B., Bartkuhn, M., Herold, M., Renkawitz, R., Russell, S., White, R., 2007. 
CTCF genomic binding sites in Drosophila and the organisation of the bithorax complex. PLoS Genet. 3, e112. Hou, C., Dale, R., Dean, A., 2010. Cell type specificity of chromatin organization mediated by CTCF and cohesin. Proc. Natl. Acad. Sci. USA 107, 3651-3656. Hsu, P.D., Lander, E.S., Zhang, F., 2014. Development and applications of CRISPR-Cas9 for genome engineering. Cell 157, 1262-1278. Hwang, W.Y., Fu, Y., Reyon, D., Maeder, M.L., Tsai, S.Q., Sander, J.D., Peterson, R.T., Yeh, J.R., Joung, J.K., 2013. Efficient genome editing in zebrafish using a CRISPR-Cas system. Nat. Biotechnol. 31, 227-229. Imakaev, M.V., Fudenberg, G., Mirny, L.A., 2015. Modeling chromosomes: Beyond pretty pictures. FEBS Lett. 589, 3031-3036. International Human Genome Sequencing Consortium, 2001. Initial sequencing and analysis of the human genome. Nature 409, 860-921. 24

ACCEPTED MANUSCRIPT

AC C

EP

TE D

M AN U

SC

RI PT

Jasin, M., Rothstein, R., 2013. Repair of strand breaks by homologous recombination. Cold Spring Harb. Perspect. Biol. 5, a012740. Jia, Z., Guo, Y., Tang, Y., Xu, Q., Li, B., Wu, Q., 2014. Regulation of the protocadherin Celsr3 gene and its role in globus pallidus development and connectivity. Mol. Cell. Biol. 34, 3895-3910. Jiang, W., Bikard, D., Cox, D., Zhang, F., Marraffini, L.A., 2013. RNA-guided editing of bacterial genomes using CRISPR-Cas systems. Nat. Biotechnol. 31, 233-239. Jiang, W., Marraffini, L.A., 2015. CRISPR-Cas: new tools for genetic manipulations from bacterial immunity systems. Annu. Rev. Microbiol. 69, 209-228. Jinek, M., Chylinski, K., Fonfara, I., Hauer, M., Doudna, J.A., Charpentier, E., 2012. A programmable dual-RNA-guided DNA endonuclease in adaptive bacterial immunity. Science 337, 816-821. Jinek, M., East, A., Cheng, A., Lin, S., Ma, E., Doudna, J., 2013. RNAprogrammed genome editing in human cells. eLife 2, e00471. Johnson, D.S., Mortazavi, A., Myers, R.M., Wold, B., 2007. Genome-wide mapping of in vivo protein-DNA interactions. Science 316, 1497-1502. Kearns, N.A., Pham, H., Tabak, B., Genga, R.M., Silverstein, N.J., Garber, M., Maehr, R., 2015. Functional annotation of native enhancers with a Cas9histone demethylase fusion. Nat. Methods 12, 401-403. Kehayova, P., Monahan, K., Chen, W., Maniatis, T., 2011. Regulatory elements required for the activation and repression of the protocadherin-alpha gene cluster. Proc. Natl. Acad. Sci. USA 108, 17195-17200. Kim, T.H., Abdullaev, Z.K., Smith, A.D., Ching, K.A., Loukinov, D.I., Green, R.D., Zhang, M.Q., Lobanenkov, V.V., Ren, B., 2007. Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome. Cell 128, 1231-1245. Kim, T.K., Hemberg, M., Gray, J.M., Costa, A.M., Bear, D.M., Wu, J., Harmin, D.A., Laptewicz, M., Barbara-Haley, K., Kuersten, S., MarkenscoffPapadimitriou, E., Kuhl, D., Bito, H., Worley, P.F., Kreiman, G., Greenberg, M.E., 2010. 
Widespread transcription at neuronal activity-regulated enhancers. Nature 465, 182-187. Kmita, M., Kondo, T., Duboule, D., 2000. Targeted inversion of a polar silencer within the HoxD complex re-allocates domains of enhancer sharing. Nat. Genet. 26, 451-454. Kraft, K., Geuer, S., Will, A.J., Chan, W.L., Paliou, C., Borschiwer, M., Harabula, I., Wittler, L., Franke, M., Ibrahim, D.M., Kragesteen, B.K., Spielmann, M., Mundlos, S., Lupianez, D.G., Andrey, G., 2015. Deletions, inversions, duplications: engineering of structural variants using CRISPR/Cas in mice. Cell Rep. 10, 833-839. Kungulovski, G., Jeltsch, A., 2016. Epigenome editing: state of the art, concepts, and perspectives. Trends Genet. 32, 101-113. Lagha, M., Bothma, J.P., Levine, M., 2012. Mechanisms of transcriptional 25

ACCEPTED MANUSCRIPT

AC C

EP

TE D

M AN U

SC

RI PT

precision in animal development. Trends Genet. 28, 409-416. Lander, E.S., 2016. The heroes of CRISPR. Cell 164, 18-28. Lee, H.J., Kim, E., Kim, J.S., 2010. Targeted chromosomal deletions in human cells using zinc finger nucleases. Genome Res. 20, 81-89. Lee, H.J., Kweon, J., Kim, E., Kim, S., Kim, J.S., 2012. Targeted chromosomal duplications and inversions in the human genome using zinc finger nucleases. Genome Res. 22, 539-548. Lefebvre, J.L., Kostadinov, D., Chen, W.V., Maniatis, T., Sanes, J.R., 2012. Protocadherins mediate dendritic self-avoidance in the mammalian nervous system. Nature 488, 517-521. Levine, M., Cattoglio, C., Tjian, R., 2014. Looping back to leap forward: transcription enters a new era. Cell 157, 13-25. Li, J., Shou, J., Guo, Y., Tang, Y., Wu, Y., Jia, Z., Zhai, Y., Chen, Z., Xu, Q., Wu, Q., 2015a. Efficient inversions and duplications of mammalian regulatory DNA elements and gene clusters by CRISPR/Cas9. J. Mol. Cell Biol. 7, 284298. Li, J., Shou, J., Wu, Q., 2015b. DNA fragment editing of genomes by CRISPR/Cas9. Hereditas(Beijing) 37, 992-1002. Li, L., Lyu, X., Hou, C., Takenaka, N., Nguyen, H.Q., Ong, C.T., CubenasPotts, C., Hu, M., Lei, E.P., Bosco, G., Qin, Z.S., Corces, V.G., 2015c. Widespread rearrangement of 3D chromatin organization underlies polycombmediated stress-induced silencing. Mol. Cell 58, 216-231. Lieberman-Aiden, E., van Berkum, N.L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B.R., Sabo, P.J., Dorschner, M.O., Sandstrom, R., Bernstein, B., Bender, M.A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L.A., Lander, E.S., Dekker, J., 2009. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289-293. Lindahl, T., 1993. Instability and decay of the primary structure of DNA. Nature 362, 709-715. Lobanenkov, V.V., Nicolas, R.H., Adler, V.V., Paterson, H., Klenova, E.M., Polotskaja, A.V., Goodwin, G.H., 1990. 
A novel sequence-specific DNA binding protein which interacts with three regularly spaced direct repeats of the CCCTC-motif in the 5'-flanking sequence of the chicken c-myc gene. Oncogene 5, 1743-1753. Lupiáñez, D.G., Kraft, K., Heinrich, V., Krawitz, P., Brancati, F., Klopocki, E., Horn, D., Kayserili, H., Opitz, J.M., Laxova, R., Santos-Simarro, F., GilbertDussardier, B., Wittler, L., Borschiwer, M., Haas, S.A., Osterwalder, M., Franke, M., Timmermann, B., Hecht, J., Spielmann, M., Visel, A., Mundlos, S., 2015. Disruptions of topological chromatin domains cause pathogenic rewiring of gene-enhancer interactions. Cell 161, 1012-1025. Ma, H., Naseri, A., Reyes-Gutierrez, P., Wolfe, S.A., Zhang, S., Pederson, T., 2015. Multicolor CRISPR labeling of chromosomal loci in human cells. Proc. Natl. Acad. Sci. USA 112, 3002-3007. 26

ACCEPTED MANUSCRIPT

AC C

EP

TE D

M AN U

SC

RI PT

Maddalo, D., Manchado, E., Concepcion, C.P., Bonetti, C., Vidigal, J.A., Han, Y.C., Ogrodowski, P., Crippa, A., Rekhtman, N., de Stanchina, E., Lowe, S.W., Ventura, A., 2014. In vivo engineering of oncogenic chromosomal rearrangements with the CRISPR/Cas9 system. Nature 516, 423-427. Makarova, K.S., Wolf, Y.I., Alkhnbashi, O.S., Costa, F., Shah, S.A., Saunders, S.J., Barrangou, R., Brouns, S.J., Charpentier, E., Haft, D.H., Horvath, P., Moineau, S., Mojica, F.J., Terns, R.M., Terns, M.P., White, M.F., Yakunin, A.F., Garrett, R.A., van der Oost, J., Backofen, R., Koonin, E.V., 2015. An updated evolutionary classification of CRISPR-Cas systems. Nat. Rev. Microbiol. 13, 722-736. Mali, P., Esvelt, K.M., Church, G.M., 2013a. Cas9 as a versatile tool for engineering biology. Nat. Methods 10, 957-963. Mali, P., Yang, L., Esvelt, K.M., Aach, J., Guell, M., DiCarlo, J.E., Norville, J.E., Church, G.M., 2013b. RNA-guided human genome engineering via Cas9. Science 339, 823-826. Maston, G.A., Evans, S.K., Green, M.R., 2006. Transcriptional regulatory elements in the human genome. Annu. Rev. Genomics Hum. Genet. 7, 29-59. Maurano, M.T., Humbert, R., Rynes, E., Thurman, R.E., Haugen, E., Wang, H., Reynolds, A.P., Sandstrom, R., Qu, H., Brody, J., Shafer, A., Neri, F., Lee, K., Kutyavin, T., Stehling-Sun, S., Johnson, A.K., Canfield, T.K., Giste, E., Diegel, M., Bates, D., Hansen, R.S., Neph, S., Sabo, P.J., Heimfeld, S., Raubitschek, A., Ziegler, S., Cotsapas, C., Sotoodehnia, N., Glass, I., Sunyaev, S.R., Kaul, R., Stamatoyannopoulos, J.A., 2012. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190-1195. McVey, M., Lee, S.E., 2008. MMEJ repair of double-strand breaks (director's cut): deleted sequences and alternative endings. Trends Genet. 24, 529-538. Meyer, C.A., Liu, X.S., 2014. Identifying and mitigating bias in next-generation sequencing methods for chromatin biology. Nat. Rev. Genet. 15, 709-721. 
Mills, A.A., Bradley, A., 2001. From mouse to man: generating megabase chromosome rearrangements. Trends Genet. 17, 331-339. Monahan, K., Rudnick, N.D., Kehayova, P.D., Pauli, F., Newberry, K.M., Myers, R.M., Maniatis, T., 2012. Role of CCCTC binding factor (CTCF) and cohesin in the generation of single-cell diversity of protocadherin-alpha gene expression. Proc. Natl. Acad. Sci. USA 109, 9125-9130. Nakahashi, H., Kwon, K.R., Resch, W., Vian, L., Dose, M., Stavreva, D., Hakim, O., Pruett, N., Nelson, S., Yamane, A., Qian, J., Dubois, W., Welsh, S., Phair, R.D., Pugh, B.F., Lobanenkov, V., Hager, G.L., Casellas, R., 2013. A genome-wide map of CTCF multivalency redefines the CTCF code. Cell Rep. 3, 1678-1689. Narendra, V., Rocha, P.P., An, D., Raviram, R., Skok, J.A., Mazzoni, E.O., Reinberg, D., 2015. Transcription. CTCF establishes discrete functional chromatin domains at the Hox clusters during differentiation. Science 347, 1017-1021. 27

ACCEPTED MANUSCRIPT

AC C

EP

TE D

M AN U

SC

RI PT

Nichols, M.H., Corces, V.G., 2015. A CTCF code for 3D genome architecture. Cell 162, 703-705. Noonan, J.P., McCallion, A.S., 2010. Genomics of long-range regulatory elements. Annu. Rev. Genomics Hum. Genet. 11, 1-23. Nora, E.P., Lajoie, B.R., Schulz, E.G., Giorgetti, L., Okamoto, I., Servant, N., Piolot, T., van Berkum, N.L., Meisig, J., Sedat, J., Gribnau, J., Barillot, E., Bluthgen, N., Dekker, J., Heard, E., 2012. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 485, 381-385. Ong, C.T., Corces, V.G., 2014. CTCF: an architectural protein bridging genome topology and function. Nat. Rev. Genet. 15, 234-246. Orkin, S.H., 1990. Globin gene regulation and switching: circa 1990. Cell 63, 665-672. Orr-Weaver, T.L., Szostak, J.W., Rothstein, R.J., 1981. Yeast transformation: a model system for the study of recombination. Proc. Natl. Acad. Sci. USA 78, 6354-6358. Parelho, V., Hadjur, S., Spivakov, M., Leleu, M., Sauer, S., Gregson, H.C., Jarmuz, A., Canzonetta, C., Webster, Z., Nesterova, T., Cobb, B.S., Yokomori, K., Dillon, N., Aragon, L., Fisher, A.G., Merkenschlager, M., 2008. Cohesins functionally associate with CTCF on mammalian chromosome arms. Cell 132, 422-433. Phillips-Cremins, J.E., Sauria, M.E., Sanyal, A., Gerasimova, T.I., Lajoie, B.R., Bell, J.S., Ong, C.T., Hookway, T.A., Guo, C., Sun, Y., Bland, M.J., Wagstaff, W., Dalton, S., McDevitt, T.C., Sen, R., Dekker, J., Taylor, J., Corces, V.G., 2013. Architectural protein subclasses shape 3D organization of genomes during lineage commitment. Cell 153, 1281-1295. Quitschke, W.W., Taheny, M.J., Fochtmann, L.J., Vostrov, A.A., 2000. Differential effect of zinc finger deletions on the binding of CTCF to the promoter of the amyloid precursor protein gene. Nucleic Acids Res. 28, 33703378. Rada-Iglesias, A., Bajpai, R., Swigut, T., Brugmann, S.A., Flynn, R.A., Wysocka, J., 2011. A unique chromatin signature uncovers early developmental enhancers in humans. Nature 470, 279-283. 
Rao, S.S., Huntley, M.H., Durand, N.C., Stamenova, E.K., Bochkov, I.D., Robinson, J.T., Sanborn, A.L., Machol, I., Omer, A.D., Lander, E.S., Aiden, E.L., 2014. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665-1680. Ren, B., Dixon, J.R., 2015. A CRISPR connection between chromatin topology and genetic disorders. Cell 161, 955-957. Renda, M., Baglivo, I., Burgess-Beusse, B., Esposito, S., Fattorusso, R., Felsenfeld, G., Pedone, P.V., 2007. Critical DNA binding interactions of the insulator protein CTCF: a small number of zinc fingers mediate strong binding, and a single finger-DNA interaction controls binding at imprinted loci. J. Biol. Chem. 282, 33336-33345. Rhee, H.S., Pugh, B.F., 2011. Comprehensive genome-wide protein-DNA 28

ACCEPTED MANUSCRIPT

AC C

EP

TE D

M AN U

SC

RI PT

interactions detected at single-nucleotide resolution. Cell 147, 1408-1419. Ribich, S., Tasic, B., Maniatis, T., 2006. Identification of long-range regulatory elements in the protocadherin-alpha gene cluster. Proc. Natl. Acad. Sci. USA 103, 19719-19724. Saitoh, N., Bell, A.C., Recillas-Targa, F., West, A.G., Simpson, M., Pikaart, M., Felsenfeld, G., 2000. Structural and functional conservation at the boundaries of the chicken beta-globin domain. EMBO J. 19, 2315-2322. Sanborn, A.L., Rao, S.S., Huang, S.C., Durand, N.C., Huntley, M.H., Jewett, A.I., Bochkov, I.D., Chinnappan, D., Cutkosky, A., Li, J., Geeting, K.P., Gnirke, A., Melnikov, A., McKenna, D., Stamenova, E.K., Lander, E.S., Aiden, E.L., 2015. Chromatin extrusion explains key features of loop and domain formation in wild-type and engineered genomes. Proc. Natl. Acad. Sci. USA 112, E6456-6465. Sancar, A., Rupp, W.D., 1983. A novel repair enzyme: UVRABC excision nuclease of Escherichia coli cuts a DNA strand on both sides of the damaged region. Cell 33, 249-260. Sander, J.D., Joung, J.K., 2014. CRISPR-Cas systems for editing, regulating and targeting genomes. Nat. Biotechnol. 32, 347-355. Sanyal, A., Lajoie, B.R., Jain, G., Dekker, J., 2012. The long-range interaction landscape of gene promoters. Nature 489, 109-113. Schmidt, D., Schwalie, P.C., Wilson, M.D., Ballester, B., Goncalves, A., Kutter, C., Brown, G.D., Marshall, A., Flicek, P., Odom, D.T., 2012. Waves of retrotransposon expansion remodel genome organization and CTCF binding in multiple mammalian lineages. Cell 148, 335-348. Schoenherr, C.J., Anderson, D.J., 1995. The neuron-restrictive silencer factor (NRSF): a coordinate repressor of multiple neuron-specific genes. Science 267, 1360-1363. Sexton, T., Cavalli, G., 2015. The role of chromosome domains in shaping the functional genome. Cell 160, 1049-1059. Shen, Y., Yue, F., McCleary, D.F., Ye, Z., Edsall, L., Kuan, S., Wagner, U., Dixon, J., Lee, L., Lobanenkov, V.V., Ren, B., 2012. 
A map of the cisregulatory sequences in the mouse genome. Nature 488, 116-120. Simonis, M., Klous, P., Splinter, E., Moshkin, Y., Willemsen, R., de Wit, E., van Steensel, B., de Laat, W., 2006. Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nat. Genet. 38, 1348-1354. Spitz, F., Herkenne, C., Morris, M.A., Duboule, D., 2005. Inversion-induced disruption of the Hoxd cluster leads to the partition of regulatory landscapes. Nat. Genet. 37, 889-893. Splinter, E., Heath, H., Kooren, J., Palstra, R.J., Klous, P., Grosveld, F., Galjart, N., de Laat, W., 2006. CTCF mediates long-range chromatin looping and local histone modification in the beta-globin locus. Genes Dev. 20, 23492354. Suo, L., Lu, H., Ying, G., Capecchi, M.R., Wu, Q., 2012. Protocadherin 29

ACCEPTED MANUSCRIPT

AC C

EP

TE D

M AN U

SC

RI PT

clusters and cell adhesion kinase regulate dendrite complexity through Rho GTPase. J. Mol. Cell Biol. 4, 362-376. Tai, D.J., Ragavendran, A., Manavalan, P., Stortchevoi, A., Seabra, C.M., Erdin, S., Collins, R.L., Blumenthal, I., Chen, X., Shen, Y., Sahin, M., Zhang, C., Lee, C., Gusella, J.F., Talkowski, M.E., 2016. Engineering microdeletions and microduplications by targeting segmental duplications with CRISPR. Nat. Neurosci. 19, 517-522. Tang, Z., Luo, O.J., Li, X., Zheng, M., Zhu, J.J., Szalaj, P., Trzaskoma, P., Magalska, A., Wlodarczyk, J., Ruszczycki, B., Michalski, P., Piecuch, E., Wang, P., Wang, D., Tian, S.Z., Penrad-Mobayed, M., Sachs, L.M., Ruan, X., Wei, C.L., Liu, E.T., Wilczynski, G.M., Plewczynski, D., Li, G., Ruan, Y., 2015. CTCF-mediated human 3D genome architecture reveals chromatin topology for transcription. Cell 163, 1611-1627. Tanimoto, K., Liu, Q., Bungert, J., Engel, J.D., 1999. Effects of altered gene order or orientation of the locus control region on human beta-globin gene expression in mice. Nature 398, 344-348. Thakore, P.I., D'Ippolito, A.M., Song, L., Safi, A., Shivakumar, N.K., Kabadi, A.M., Reddy, T.E., Crawford, G.E., Gersbach, C.A., 2015. Highly specific epigenome editing by CRISPR-Cas9 repressors for silencing of distal regulatory elements. Nat. Methods 12, 1143-1149. Thu, C.A., Chen, W.V., Rubinstein, R., Chevee, M., Wolcott, H.N., Felsovalyi, K.O., Tapia, J.C., Shapiro, L., Honig, B., Maniatis, T., 2014. Single-cell identity generated by combinatorial homophilic interactions between alpha, beta, and gamma protocadherins. Cell 158, 1045-1059. 
Thurman, R.E., Rynes, E., Humbert, R., Vierstra, J., Maurano, M.T., Haugen, E., Sheffield, N.C., Stergachis, A.B., Wang, H., Vernot, B., Garg, K., John, S., Sandstrom, R., Bates, D., Boatman, L., Canfield, T.K., Diegel, M., Dunn, D., Ebersol, A.K., Frum, T., Giste, E., Johnson, A.K., Johnson, E.M., Kutyavin, T., Lajoie, B., Lee, B.K., Lee, K., London, D., Lotakis, D., Neph, S., Neri, F., Nguyen, E.D., Qu, H., Reynolds, A.P., Roach, V., Safi, A., Sanchez, M.E., Sanyal, A., Shafer, A., Simon, J.M., Song, L., Vong, S., Weaver, M., Yan, Y., Zhang, Z., Zhang, Z., Lenhard, B., Tewari, M., Dorschner, M.O., Hansen, R.S., Navas, P.A., Stamatoyannopoulos, G., Iyer, V.R., Lieb, J.D., Sunyaev, S.R., Akey, J.M., Sabo, P.J., Kaul, R., Furey, T.S., Dekker, J., Crawford, G.E., Stamatoyannopoulos, J.A., 2012. The accessible chromatin landscape of the human genome. Nature 489, 75-82. Tjian, R., Maniatis, T., 1994. Transcriptional activation: a complex puzzle with few easy pieces. Cell 77, 5-8. Torres, R., Martin, M.C., Garcia, A., Cigudosa, J.C., Ramirez, J.C., RodriguezPerales, S., 2014. Engineering human tumour-associated chromosomal translocations with the RNA-guided CRISPR-Cas9 system. Nat. Commun. 5, 3964. Vietri Rudan, M., Barrington, C., Henderson, S., Ernst, C., Odom, D.T., Tanay, A., Hadjur, S., 2015. Comparative Hi-C reveals that CTCF underlies evolution 30

ACCEPTED MANUSCRIPT

AC C

EP

TE D

M AN U

SC

RI PT

of chromosomal domain architecture. Cell Rep. 10, 1297-1309. Visel, A., Blow, M.J., Li, Z., Zhang, T., Akiyama, J.A., Holt, A., Plajzer-Frick, I., Shoukry, M., Wright, C., Chen, F., Afzal, V., Ren, B., Rubin, E.M., Pennacchio, L.A., 2009. ChIP-seq accurately predicts tissue-specific activity of enhancers. Nature 457, 854-858. Wang, H., Maurano, M.T., Qu, H., Varley, K.E., Gertz, J., Pauli, F., Lee, K., Canfield, T., Weaver, M., Sandstrom, R., Thurman, R.E., Kaul, R., Myers, R.M., Stamatoyannopoulos, J.A., 2012. Widespread plasticity in CTCF occupancy linked to DNA methylation. Genome Res. 22, 1680-1688. Wang, H., Yang, H., Shivalila, C.S., Dawlaty, M.M., Cheng, A.W., Zhang, F., Jaenisch, R., 2013. One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-mediated genome engineering. Cell 153, 910918. Weckselblatt, B., Rudd, M.K., 2015. Human structural variation: mechanisms of chromosome rearrangements. Trends Genet. 31, 587-599. Wei, C., Liu, J., Yu, Z., Zhang, B., Gao, G., Jiao, R., 2013. TALEN or Cas9 rapid, efficient and specific choices for genome modifications. J. Genet. Genomics 40, 281-289. Wendt, K.S., Yoshida, K., Itoh, T., Bando, M., Koch, B., Schirghuber, E., Tsutsumi, S., Nagae, G., Ishihara, K., Mishiro, T., Yahata, K., Imamoto, F., Aburatani, H., Nakao, M., Imamoto, N., Maeshima, K., Shirahige, K., Peters, J.M., 2008. Cohesin mediates transcriptional insulation by CCCTC-binding factor. Nature 451, 796-801. Whyte, W.A., Orlando, D.A., Hnisz, D., Abraham, B.J., Lin, C.Y., Kagey, M.H., Rahl, P.B., Lee, T.I., Young, R.A., 2013. Master transcription factors and mediator establish super-enhancers at key cell identity genes. Cell 153, 307319. Wright, A.V., Nunez, J.K., Doudna, J.A., 2016. Biology and applications of CRISPR systems: harnessing nature's toolbox for genome engineering. Cell 164, 29-44. Wu, Q., 2005. Comparative genomics and diversifying selection of the clustered vertebrate protocadherin genes. Genetics 169, 2179-2188. 
Wu, Q., Maniatis, T., 1999. A striking organization of a large family of human neural cadherin-like cell adhesion genes. Cell 97, 779-790. Wu, Q., Zhang, T., Cheng, J.F., Kim, Y., Grimwood, J., Schmutz, J., Dickson, M., Noonan, J.P., Zhang, M.Q., Myers, R.M., Maniatis, T., 2001. Comparative DNA sequence analysis of mouse and human protocadherin gene clusters. Genome Res. 11, 389-404. Wu, S., Ying, G., Wu, Q., Capecchi, M.R., 2007. Toward simpler and faster genome-wide mutagenesis in mice. Nat. Genet. 39, 922-930. Xi, H., Shulha, H.P., Lin, J.M., Vales, T.R., Fu, Y., Bodine, D.M., McKay, R.D., Chenoweth, J.G., Tesar, P.J., Furey, T.S., Ren, B., Weng, Z., Crawford, G.E., 2007. Identification and characterization of cell type-specific and ubiquitous chromatin regulatory structures in the human genome. PLoS Genet. 3, e136. 31

ACCEPTED MANUSCRIPT

AC C

EP

TE D

M AN U

SC

RI PT

Xiao, A., Wang, Z., Hu, Y., Wu, Y., Luo, Z., Yang, Z., Zu, Y., Li, W., Huang, P., Tong, X., Zhu, Z., Lin, S., Zhang, B., 2013. Chromosomal deletions and inversions mediated by TALENs and CRISPR/Cas in zebrafish. Nucleic Acids Res. 41, e141. Xie, X., Mikkelsen, T.S., Gnirke, A., Lindblad-Toh, K., Kellis, M., Lander, E.S., 2007. Systematic discovery of regulatory motifs in conserved regions of the human genome, including thousands of CTCF insulator sites. Proc. Natl. Acad. Sci. USA 104, 7145-7150. Xu, C., Corces, V.G., 2015. Towards a predictive model of chromatin 3D organization. Seminars in cell & developmental biology. doi: 10.1016/j.semcdb.2015.11.013. Yokota, S., Hirayama, T., Hirano, K., Kaneko, R., Toyoda, S., Kawamura, Y., Hirabayashi, M., Hirabayashi, T., Yagi, T., 2011. Identification of the cluster control region for the protocadherin-beta genes located beyond the protocadherin-gamma cluster. J. Biol. Chem. 286, 31885-31895. Zetsche, B., Gootenberg, J.S., Abudayyeh, O.O., Slaymaker, I.M., Makarova, K.S., Essletzbichler, P., Volz, S.E., Joung, J., van der Oost, J., Regev, A., Koonin, E.V., Zhang, F., 2015. Cpf1 is a single RNA-guided endonuclease of a class 2 CRISPR-Cas system. Cell 163, 759-771. Zhang, T., Haws, P., Wu, Q., 2004. Multiple variable first exons: a mechanism for cell- and tissue-specific gene regulation. Genome Res. 14, 79-89. Zhang, Y., McCord, R.P., Ho, Y.J., Lajoie, B.R., Hildebrand, D.G., Simon, A.C., Becker, M.S., Alt, F.W., Dekker, J., 2012. Spatial organization of the mouse genome and its role in recurrent chromosomal translocations. Cell 148, 908921. Zhao, Z., Tavoosidana, G., Sjolinder, M., Gondor, A., Mariano, P., Wang, S., Kanduri, C., Lezcano, M., Sandhu, K.S., Singh, U., Pant, V., Tiwari, V., Kurukuti, S., Ohlsson, R., 2006. Circular chromosome conformation capture (4C) uncovers extensive networks of epigenetically regulated intra- and interchromosomal interactions. Nat. Genet. 38, 1341-1347. 
FIGURE LEGENDS

Fig. 1. CRISPR DNA-fragment editing for regulatory elements. A: The Streptococcus pyogenes Cas9 (SpCas9) endonuclease cleaves DNA via its RuvC and HNH nuclease domains, each of which nicks one DNA strand, together generating a blunt-ended double-strand break (DSB). Cas9 is targeted to specific DNA sequences by a programmable single guide RNA (sgRNA). The sgRNA binds through direct Watson-Crick base pairing to the target site upstream of a requisite NGG protospacer-adjacent motif (PAM; red). Cas9 generates a DSB 3 bp upstream of the PAM (red scissors). The resulting DSB can be repaired by one of four pathways (NHEJ, HR, MMEJ, and SSA). When DNA end resection is blocked by the Ku70/80 heterodimer, the NHEJ pathway is favored. However, when DNA end resection occurs, three DNA repair pathways (HR, MMEJ, and SSA) compete to repair the DSB. In the NHEJ pathway, Ku dimers bind to the two blunt ends. These blunt ends are processed and rejoined, resulting in random indels at the junction. In the HR pathway, Rad51 binds to the DSB ends and recruits accessory factors that direct recombination with the homology arms of an exogenous repair template. Both the MMEJ and SSA pathways require 3′ end resection or unwinding to reveal homologous sequences, although the length of homology required for MMEJ (5-25 bp) is shorter than that for SSA. MMEJ results in deletions and sometimes also nucleotide insertions, whereas SSA results only in deletions. NHEJ, nonhomologous end joining; HR, homologous recombination; MMEJ, microhomology-mediated end joining; SSA, single-strand annealing. B: CRISPR DNA-fragment editing by Cas9 with a pair of sgRNAs can result in in situ deletion, inversion, duplication, insertion, and substitution of DNA regulatory elements or gene clusters. RE, regulatory element.
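The cut-site rule and the paired-sgRNA outcomes described in this legend can be illustrated with a minimal Python sketch. It is not the authors' method. The function names (`cut_site`, `fragment_edits`), the toy sequence, and the 10-nt targeting sequences are hypothetical and chosen for brevity. The sketch assumes both target sites lie on the top strand, that each is followed by an NGG PAM, and that the two blunt ends are rejoined precisely without indels.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def cut_site(genome: str, protospacer: str) -> int:
    """Return the index of the blunt cut for a top-strand target site.

    The target must be followed directly by an NGG PAM. The cut falls
    3 bp upstream of the PAM, i.e. between genome[cut - 1] and genome[cut].
    """
    start = genome.find(protospacer)
    if start < 0:
        raise ValueError("target sequence not found on the top strand")
    pam_start = start + len(protospacer)
    pam = genome[pam_start:pam_start + 3]
    if len(pam) != 3 or pam[1:] != "GG":
        raise ValueError(f"no NGG PAM after the target (found {pam!r})")
    return pam_start - 3


def fragment_edits(genome: str, sg1: str, sg2: str) -> dict:
    """Idealized deletion and inversion products of two Cas9 cuts.

    Assumption: precise rejoining of blunt ends, no indels at junctions.
    """
    a, b = sorted((cut_site(genome, sg1), cut_site(genome, sg2)))
    left, fragment, right = genome[:a], genome[a:b], genome[b:]
    return {
        "deletion": left + right,
        "inversion": left + reverse_complement(fragment) + right,
    }


# Toy locus (hypothetical): two 10-nt targets, each followed by an NGG PAM.
toy = "TTTT" + "ACGTACGTAC" + "AGG" + "CCCCAAAA" + "GATTACAGAT" + "TGG" + "TTTT"
products = fragment_edits(toy, "ACGTACGTAC", "GATTACAGAT")
print(products["deletion"])
print(products["inversion"])
```

Duplication, insertion, and substitution are omitted because they require a second allele or a donor fragment, which this single-sequence sketch does not model.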

Fig. 2. CRISPR-mediated epigenome editing and locus imaging. A: Nuclease-deficient Cas9 (dCas9) can be programed by sgRNA to specific locus for gene repression. In addition, transcription repression by dCas9 can be enhanced by fusing dCas9 with repressor effectors, including MAXinteracting protein 1 (MXI1), Krüppel-associated box (KRAB), or four concatenated mSin3 domains (SID). By contrast, transcription activation can be achieved by fusing dCas9 with activator effectors, including viral protein 16 tetramer of the herpes simplex (VP64), the p65 activation domain of the nuclear factor-κB (NF-κB) (p65AD), and the Epstein-Barr virus R transactivator (Rta). B: CRISPR RNA-scaffold-based recruitment can also be used to repress or activate gene expression. This enables simultaneous repression and activation of genes in loci A and B, respectively. The sgRNA is engineered with additional RNA domains to recruit RNA-binding proteins that are fused to functional effectors. C: Epigenome editing can be achieved by fusing epigenetic regulators to dCas9. Fusion of the histone lysine-specific demethylase 1 (LSD1) to Neisseria meningitidis dCas9 removes active enhancer marks, histone 3 lysine 4 dimethylation (H3K4me2), from targeted distal enhancers, leading to transcription repression. By contrast, fusion of the histone acetyltransferase core domain of the human E1A-associated protein p300 with dCas9 catalyzes acetylation of histone 3 lysine 27 (H3K27ac) at target sites. D: The combinational usage of orthogonal S. pyogenes dCas9RFP and N. meningitidis dCas9-GFP fusions simultaneously labels loci A and B with distinct colored (red and green, respectively) fluorescence, allowing live cell imaging of the chromosomal interactions between loci A and B. Fig. 3. CRISPR engineering of 3D genomes. A: The three human protocadherin (Pcdh) gene clusters (α, β, and γ) are 33

ACCEPTED MANUSCRIPT

AC C

EP

TE D

M AN U

SC

RI PT

organized into two CCDs/subTADs (CTCF/cohesin-mediated chromatin domains or topologically associated domains). The Pcdh α and γ clusters contain variable (var) and constant (con) (consist of three exons) regions, similar to the genomic organization of the immunoglobulin, T-cell receptor, and Ugt1 gene clusters. The Pcdhβ cluster only has variable exons but with no constant exons. The Pcdhα variable region contains 13 highly-similar alternate variable exons (α1-α13), arrayed in tandem, and two divergent Ctype ubiquitous variable exons (αc1 and αc2). Each of the 15 Pcdhα variable exons is preceded by a separate promoter. Each of the Pcdhα promoters (except αc2) is flanked by an upstream conserved sequence element (CSE), which is recognized by directional CTCF binding and a downstream exonic CTCF-binding site (eCBS) (α1 and αc1 do not have eCBS). All of the Pcdhα variable tandem CBSs are in the forward orientation. HS5-1 and HS7, DNase I hypersensitive sites, are the two enhancers of the Pcdhα cluster. HS5-1 contains two CBSs (HS5-1a and HS5-1b) that are in the reverse orientation. Within the Pcdhα topological domain, specific long-distance chromatin looping interactions between the forward paired CBSs of a variable promoter and the reverse paired CBSs of the HS5-1 enhancer determine the stochastic promoter choice in the combinatorial expression of members of the Pcdhα gene clusters in the brain. Similarly, the Pcdhγ variable region contains 12 alternate a-type (γa1-γa12) and 7 alternate b-type (γb1-γb7), as well as 3 divergent c-type (γc3-γc5) ubiquitous variable exons. The Pcdhβ cluster contains 16 variable exons (β1-β16). Each of Pcdhβ and γ variable exons is preceded by a separate promoter. Each of the Pcdhβ and γ promoters (except β1, γc4, and γc5) contains a CSE, which is recognized by directional CTCF binding and thus is a CBS by definition. All of the Pcdhβ and γ variable CBSs are in the forward orientation. 
Downstream of the Pcdhγ cluster, there are linked clusters of strong enhancers (a super-enhancer) containing five reverse-oriented CBSs in tandem. Long-distance chromatin looping interactions between the forward CBSs in the variable region and the reverse CBSs in the super-enhancer region form the Pcdhβγ topological domain. Reverse-forward CBS pairs form the boundary between neighboring topological domains. By virtue of opposite looping, this boundary functions as an insulator that blocks enhancers within one topological domain from activating promoters within the neighboring domains. B: Long-distance CTCF/cohesin-mediated (probably with other architectural proteins) chromatin looping interactions bring a remote regulatory element (enhancer or silencer) into close contact with a promoter. Thus, CTCF could function as a transcription activator (in the case of looping enhancers) or a transcription repressor (in the case of looping silencers). C: 1D sequence to 3D genome. The genome-wide CTCF code, consisting of hundreds of thousands of CBSs, determines the ordered assembly of 3D topological genome architecture. Specifically, the linear genome sequence containing oriented CBSs plays a central role in the establishment of long-distance chromatin looping interactions between forward and reverse CBSs or CBS clusters. CTCF proteins mediate specific chromatin looping interactions between CBSs in the forward-reverse orientations, and thus promote the establishment of distinct topological domains within chromosome territories in the mammalian cell nucleus. The locations and relative orientations of CBSs determine 3D genome architecture and developmental gene expression.
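The orientation rule stated in this legend can be illustrated with a small toy sketch in Python. It is not taken from the original work: the coordinates, the site names, and the simplification that every forward-reverse CBS pair inside one domain counts as a candidate loop are assumptions made only for illustration. The sketch splits a linear list of oriented CBSs into domains at each reverse-forward junction (the boundary described above) and then lists the forward-reverse pairs within each domain.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class CBS:
    name: str          # illustrative label, not an annotated genomic feature
    position: int      # hypothetical linear coordinate (arbitrary units)
    orientation: str   # "F" for forward, "R" for reverse


def split_domains(sites):
    """Split sites into domains at every adjacent reverse-forward junction."""
    ordered = sorted(sites, key=lambda s: s.position)
    domains, current = [], []
    for site in ordered:
        # A reverse CBS followed by a forward CBS marks a domain boundary.
        if current and current[-1].orientation == "R" and site.orientation == "F":
            domains.append(current)
            current = []
        current.append(site)
    if current:
        domains.append(current)
    return domains


def candidate_loops(sites):
    """List forward-reverse CBS pairs that lie within the same domain."""
    loops = []
    for domain in split_domains(sites):
        # combinations() keeps the upstream site first in each pair.
        for upstream, downstream in combinations(domain, 2):
            if upstream.orientation == "F" and downstream.orientation == "R":
                loops.append((upstream.name, downstream.name))
    return loops


# Toy layout loosely modeled on the legend; all coordinates are made up.
sites = [
    CBS("alpha_var_1", 100, "F"),
    CBS("alpha_var_2", 200, "F"),
    CBS("HS5-1a", 300, "R"),
    CBS("HS5-1b", 310, "R"),
    CBS("beta_gamma_var", 400, "F"),
    CBS("super_enhancer_CBS", 500, "R"),
]

for pair in candidate_loops(sites):
    print(pair)
```

In this toy layout the reverse-forward junction between HS5-1b and the next forward site separates two domains, so no pair links an α promoter site to the downstream super-enhancer site. Real looping also depends on factors the sketch ignores, such as cohesin and CTCF occupancy.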


[Figure. Panel A: Cas9 with an sgRNA (RuvC and HNH domains; NGG PAM) generating a DNA double-strand break (DSB), with the repair pathways labeled NHEJ, HR, MMEJ, and SSA. Other labels: Ku70/80, RPA, Rad51, Rad52, "Homology (>30 bp)", "Microhomology (5-25 bp)", "DNA repair in any phase of cell cycle", "Precise repair in late S/G2 phases", "DNA repair in M/early S phases". Panel B: segments marked RE, with outcomes labeled DNA fragment deletion, DNA fragment inversion, DNA fragment duplication, DNA fragment insertion, DNA fragment substitution, and trans-allelic recombination.]

[Figure. Panel A: dCas9 (RuvC D10A, HNH H840A) with an sgRNA at an NGG PAM, shown with a repressor or an activator. Panel B: dCas9 with a repressor or an activator attached through an RNA binding module. Panel C: epigenetic repression (LSD1; H3K4me2, H3K4) and epigenetic activation (P300 core; H3K27me3, H3K27ac); other labels: proximal and distal regulatory elements, distal enhancer. Panel D: S. pyogenes dCas9-RFP (NGG PAM) and N. meningitidis dCas9-GFP (NNNNGATT PAM), with Locus A and Locus B.]

[Figure. Panel A: Human Protocadherin Gene Clusters. Pcdhα (Var: alternate α1-13, ubiquitous αc1-2; Con), Pcdhβ (β1-16), and Pcdhγ (Var: alternate γa1-12 and γb1-7, ubiquitous γc3-5; Con), with HS7, HS5-1, HS7L, HS5-1aL, HS5-1bL, HS18-20, CTCF binding sites, the α CCD/subTAD, and the βγ CCD/subTAD. Panel B: CTCF binding site, cohesin, enhancer, promoter, silencer. Panel C: CTCF binding site (CBS), chromosome, chromosome territory, nucleus, CBSs, topological domain.]