Methods for decoding Cas9 protospacer adjacent motif (PAM) sequences: A brief overview


Accepted Manuscript

Methods for decoding Cas9 protospacer adjacent motif (PAM) sequences: a brief overview

Tautvydas Karvelis, Giedrius Gasiunas, Virginijus Siksnys

PII: S1046-2023(16)30304-8
DOI: http://dx.doi.org/10.1016/j.ymeth.2017.03.006
Reference: YMETH 4158

To appear in: Methods

Received Date: 21 December 2016
Revised Date: 9 February 2017
Accepted Date: 3 March 2017

Please cite this article as: T. Karvelis, G. Gasiunas, V. Siksnys, Methods for decoding Cas9 protospacer adjacent motif (PAM) sequences: a brief overview, Methods (2017), doi: http://dx.doi.org/10.1016/j.ymeth.2017.03.006

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Methods for decoding Cas9 protospacer adjacent motif (PAM) sequences: a brief overview

Tautvydas Karvelis1, Giedrius Gasiunas1 and Virginijus Siksnys1*

1 Institute of Biotechnology, Vilnius University, Saulėtekio av. 7, LT-10257 Vilnius, Lithuania

* Corresponding author: V. Siksnys ([email protected])

Mail: Institute of Biotechnology, Vilnius University, Saulėtekio av. 7, LT-10257 Vilnius, Lithuania
Tel: +370-5-2234354

Manuscript Information:

Figures: 2 Tables: 1

Highlights

• PAM sequences are uniquely associated with each Cas9 protein.
• PAM is required to initiate base-pairing between guide RNA and DNA target.
• High-throughput screening approaches allow PAM identification.
• Cas9 with novel PAM requirements may expand toolbox for genome editing.


Abstract

Cas9, an RNA-guided DNA endonuclease, has recently emerged as a powerful tool for targeted genome manipulation. The Cas9 protein can be reprogrammed to cleave, bind or nick any DNA target simply by changing the crRNA sequence; however, a short nucleotide sequence, termed the PAM, is required to initiate crRNA hybridization to the DNA target. The PAM sequence is recognized by the Cas9 protein and must be determined experimentally for each Cas9 variant. Exploration of Cas9 orthologs could offer a diversity of PAM sequences and novel biochemical properties that may be beneficial for genome editing applications. Here we briefly review and compare Cas9 PAM identification assays that can be adopted for other PAM-dependent CRISPR-Cas systems.

Abbreviations

HEase – homing endonuclease; ZFN – zinc finger nuclease; TALEN – TAL-effector nuclease; DSB – double strand break; dCas9 – catalytically inactive (dead) Cas9; RNP – ribonucleoprotein.

Keywords

Genome editing; CRISPR; Cas9; high-throughput methods; PAM library.

1. Introduction

In recent years the bacterial Cas9 protein from Type II CRISPR-Cas (Clustered Regularly Interspaced Short Palindromic Repeats-CRISPR associated) systems has emerged as a powerful tool for targeted genome manipulation. Previous attempts to manipulate gene function relied on native or engineered nucleases [homing endonucleases (HEases), zinc finger nucleases (ZFNs) and TAL-effector nucleases (TALENs)] that recognize long nucleotide sequences and induce a double strand break (DSB) in the vicinity of the target sequence. Despite the differences in protein scaffolds, the target-site specificity of HEases, ZFNs, and TALENs is dictated by intricate interactions between amino acid residues in the DNA-binding domain and donor and acceptor atoms on the DNA base edges [1]. Therefore, re-engineering of nuclease specificity requires modification of amino acid residues at the protein-DNA interface. This process is time- and labor-consuming and often produces variants that are prone to off-target cleavage [1,2]. In contrast to HEases, ZFNs, and TALENs, the Cas9-dual RNA ribonucleoprotein (RNP) complex achieves target-site specificity through Watson-Crick base pairing between crRNA and target DNA, while the Cas9 protein executes double-stranded DNA cleavage using two active sites that act on separate DNA strands [3,4]. Over the past few years Cas9 has become a silver bullet for genome manipulation because it can be reprogrammed to cleave different DNA targets simply by changing the crRNA component [5]. Cas9 has not only revolutionized the genome editing field but has also been repurposed to control gene expression [6,7], build gene drives [8], and develop novel antimicrobial agents [9]. Although Cas9 RNP complex specificity is dictated by the base pairing between crRNA and the matching DNA target (protospacer), a short protospacer adjacent motif (PAM) sequence in the vicinity of the 3’-end of the protospacer is required to initiate the hybridization between crRNA and the DNA target [10,11].
In contrast to the protospacer sequence, which is recognized via Watson-Crick base pairing between crRNA and target DNA, the PAM sequence is read out by the Cas9 protein [12]. PAM sequences are uniquely associated with each Cas9 protein and license subsequent DNA unwinding and base pairing with the guide RNA [13–17]. In bacteria, the PAM is required for discrimination of “self” vs. “invader” DNA to avoid autoimmunity: the lack of the PAM in the vicinity of spacers in the CRISPR array makes host DNA resistant to Cas9 cleavage [18,19]. The absolute requirement of the PAM for Cas9 interference against invading DNA becomes a constraint that restricts the DNA sequence space available for genome editing applications. These targeting constraints can be important for efficient homology-directed repair, where DSBs within 10-20 bp of the target sequence are desirable [20]. Precise targeting is also required for allele alterations to specifically edit a single allele [21,22]. The exploration of thousands of Cas9 orthologs present in the sequence databases [23,24] could offer a diversity of PAM sequences and novel biochemical properties that may be beneficial for genome editing applications. To harness the diversity of Cas proteins for genome manipulations, guide sequence and PAM requirements have to be established for novel Cas9 orthologs. The same is true for the recently discovered PAM-dependent Cpf1 effector complexes of Type V CRISPR-Cas systems [25–27], which, similarly to Cas9, can be exploited for gene editing [25,28,29]. In this review, we provide a brief overview of methods used for deciphering PAM sequences for Cas9 and other CRISPR-Cas effector complexes.

2. PAM determination strategies

2.1. In silico PAM prediction

Although all key protein and RNA components of the Cas9 effector complex required for DNA cleavage are encoded in the CRISPR-Cas locus of the bacterial genome, a hint for the PAM specificity is hidden in the invader DNA. The CRISPR-Cas system of bacteria hijacks pieces of the alien DNA and incorporates them as spacers in the CRISPR array [31,32]. Therefore, the spacer content of the CRISPR array provides a historical record of past phage exposure [33–35]. The bioinformatic approach for decoding the PAM sequences for a Cas9 protein is based on extracting spacer sequences present in the CRISPR locus of bacteria (using, e.g., CRISPRFinder [36]) and finding matches in bacteriophage or plasmid sequence databases (e.g., GenBank [37]) with the help of BLAST [38] or more specific tools, e.g., CRISPRTarget [39]. By aligning protospacer sequences and analyzing the sequences flanking a DNA target (5’-end for Type I and Type V, and 3’-end for Type II CRISPR-Cas systems) it is possible to identify the consensus PAM sequence. However, the success rate is usually low because of the relatively small number of known phage genomes compared to the abundance of phages to which bacteria are exposed [40]. Nevertheless, this approach enabled the identification of PAM sequences for a set of Cas9 proteins that were successfully adopted in genome targeting applications (Table 1).
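The flank-alignment step described above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the procedure of the cited tools: protospacers are found by exact matching on one strand only (BLAST or CRISPRTarget would tolerate mismatches and search both strands), only the 3’ flank is analyzed as for Type II systems, and the function names, the spacer, the invader fragments and the `min_fraction` cut-off are all hypothetical.

```python
from collections import Counter

# IUPAC nucleotide codes keyed by the set of bases they stand for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M", frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V", frozenset("ACGT"): "N",
}

def flank_3prime(target, spacer, pam_len):
    """Return the pam_len bases 3' of the first exact protospacer match, or None."""
    start = target.find(spacer)
    if start < 0:
        return None
    end = start + len(spacer)
    flank = target[end:end + pam_len]
    return flank if len(flank) == pam_len else None

def consensus(flanks, min_fraction=0.25):
    """Per-position IUPAC consensus of equal-length flanks (A/C/G/T only).

    A base is kept at a position if it occurs in at least min_fraction of the
    flanks; with min_fraction <= 0.25 at least one base always qualifies.
    """
    out = []
    for column in zip(*flanks):
        counts = Counter(column)
        kept = frozenset(b for b, n in counts.items() if n / len(column) >= min_fraction)
        out.append(IUPAC[kept])
    return "".join(out)

# Hypothetical spacer and invader fragments, for illustration only.
spacer = "GATTACAGATTACA"
targets = [
    "CC" + spacer + "TGGAT",
    "A" + spacer + "AGGCC",
    spacer + "CGGTA",
    "TT" + spacer + "GGGAA",
]
flanks = [f for f in (flank_3prime(t, spacer, 3) for t in targets) if f]
print(consensus(flanks))
```

With only a handful of matching protospacers the consensus is fragile, which mirrors the low success rate noted above when few phage genomes are available.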


2.2. In vivo PAM library screening

This method is based on high-throughput PAM library interrogation using negative or positive selection screens in vivo. The principle of negative selection is implemented in plasmid depletion experiments [41–43]. Briefly, cells bearing plasmids encoding Cas9 and a protospacer-targeting guide RNA are co-transformed with a plasmid library containing an antibiotic resistance gene and a protospacer sequence flanked by a randomized PAM sequence (Figure 1A). Plasmids containing functional PAMs are cleaved by Cas9, leading to cell death under antibiotic selection. Deep sequencing of the cleavage-resistant plasmid pool isolated from the surviving cells reveals the set of depleted plasmids, which contain functional cleavage-permitting PAMs [41–43]. In contrast to the plasmid depletion experiments, where identification of functional PAMs relies on plasmid cleavage, the PAM-SCANR (PAM screen achieved by NOT-gate repression) method relies on the binding of catalytically inactive (dead) Cas9 (dCas9) to the DNA target containing a functional PAM [44] (Figure 1B). To associate Cas9 binding with a positive signal, a genetic circuit, a NOT-gate, is engineered. Briefly, protospacers flanked by the randomized PAMs are inserted within the promoter of lacI, to enable the control of gfp expression by the LacI protein. Therefore, only functional PAMs result in dCas9 binding and transcriptional repression of lacI, leading to gfp expression. Next, cells encoding dCas9 and the guide RNA on a single plasmid are transformed with the PAM library carrying the reporter construct. After isolation of GFP-positive cells using fluorescence activated cell sorting (FACS), plasmid purification and sequencing, the functional PAMs are established [44].

2.3. In vitro PAM library screening

An alternative high-throughput strategy for PAM determination relies on PAM library cleavage in vitro.
A PAM library in the form of plasmid DNA [45,46] or concatemeric repeats [47] is subjected to cleavage by the Cas9 RNP complex assembled in vitro [46,47] or in cell lysates [45]. Cas9 cleaves the target DNA, forming a DSB, only if a cleavage-permitting PAM is present. The resulting free DNA ends are captured by adapter ligation, followed by PCR amplification of the PAM-sided products (Figure 1C). The amplified library of functional PAMs is subjected to deep sequencing, and the PAMs licensing DNA cleavage are identified. A negative selection in vitro screen based on plasmid depletion has also been implemented for PAM determination [16,25]. In this case, the plasmid library remaining after in vitro cleavage is subjected to deep sequencing, and functional PAMs are identified by their depletion.

3. PAM sequence visualization

A set of PAM sequences obtained by bioinformatic analysis or high-throughput experimental approaches can be visualized in several ways. The most typical representation is a graphical sequence logo [48]. Multiple PAM sequences are aligned and a sequence logo is generated in such a way that the height of the stack of nucleotide symbols (in bits) indicates the sequence conservation at that position, and the height of each individual symbol represents its relative frequency (Figure 2A). Therefore, the sequence logo provides a more accurate picture of the PAM sequence landscape than a plain consensus sequence. However, the PAM profile in a sequence logo is just an “average PAM” and does not take into account the impact of each nucleotide on the PAM as a whole. To represent PAMs while preserving the information for each individual sequence, the “PAM wheel” method has been proposed [44]. PAM wheels are based on Krona plots [49] and contain information for all sequence combinations recognized as a PAM (Figure 2B). Each PAM position is represented by a circle, going from the center outwards, and the area of each nucleotide sector in a circle is proportional to the relative frequency of the nucleotide at that position. This method of PAM representation allows the relevance of different PAM variants to be evaluated.

4. Pros and cons of different methods for PAM characterization

Stringent PAM dependence of DNA cleavage by Cas9 restricts the sequence space available for genome editing applications in which accurate targeting, e.g., at single-nucleotide resolution, is required. Cas9 orthologs may show different PAM requirements; therefore, to harness the potential of Cas9 variants for genome manipulation, experimental methods for PAM characterization were developed. Despite the relatively simple and successful in silico PAM decoding for a handful of Cas9 proteins (Table 1), the bioinformatic approach is currently limited by the relatively low number of spacer sequences that match phage sequences present in databases. Therefore, current experimental strategies for PAM characterization rely on high-throughput interrogation of PAM libraries either in vivo or in vitro. These methods have enabled PAM sequence characterization for novel Cas9 orthologs and revealed the complex nature of PAM sequence recognition.
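To make the sequence-space restriction concrete, the following Python sketch locates matches of a degenerate (IUPAC) PAM in a DNA string and estimates how often a PAM is expected to occur by chance. It is an illustration only: the function names and the example sequence are hypothetical, the search covers the given strand only (a real target search would also scan the reverse complement and require room for a protospacer next to the PAM), and the frequency estimate assumes equal, independent base frequencies.

```python
import re

# Regular-expression character classes for the IUPAC nucleotide codes.
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def pam_positions(seq, pam):
    """0-based start indices where the IUPAC PAM matches seq (given strand only)."""
    pattern = "".join(IUPAC_REGEX[c] for c in pam)
    # A lookahead reports overlapping matches as well.
    return [m.start() for m in re.finditer(f"(?=({pattern}))", seq)]

def expected_site_frequency(pam):
    """Chance that a random position matches the PAM, assuming uniform bases."""
    p = 1.0
    for c in pam:
        p *= len(IUPAC_REGEX[c].strip("[]")) / 4
    return p

# Hypothetical sequence, for illustration only.
print(pam_positions("TTAGGAGGCGG", "NGG"))
print(expected_site_frequency("NGG"), expected_site_frequency("NNAGAAW"))
```

Under these assumptions a short PAM such as NGG is expected once every 16 positions per strand, whereas a longer PAM such as NNAGAAW is expected once every 512 positions, which illustrates why orthologs with different PAMs change the set of addressable sites.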


The first experimental hint that Cas9 may recognize degenerate PAM variants rather than a strict sequence came from Streptococcus pyogenes (Sp) Cas9 plasmid library depletion experiments [41]. In addition to the previously identified NGG PAM, SpCas9 targeting was also observed for NAG and NNGG sequences. Similar results were later reported for the Cas9s from Streptococcus thermophilus CRISPR1 (St1) and Neisseria meningitidis (Nm), targeting NNAGAAW and NNNNGATT, respectively [42]. In addition to the consensus PAM, both St1Cas9 and NmCas9 recognized a set of additional PAMs that were closely related to the consensus PAM sequence. Moreover, the analysis of depleted PAMs also revealed PAM sequence variations depending on the protospacer sequence [42]. These studies provided a proof of principle that high-throughput plasmid depletion screens can be used to decipher PAMs. However, because PAM data are obtained by negative selection in cells (by sequencing the plasmids that were not cleaved by Cas9), these screens require high sequencing coverage. The PAM-SCANR method, which relies on binding of dCas9 to the randomized PAM sequences and selection of GFP-positive clones, overcomes potential PAM bias problems of the negative selection screens [44]. FACS separation of the GFP-positive cells bearing plasmids with binding-permitting PAMs streamlines the analysis without sacrificing sensitivity. In addition, control of PAM recognition stringency by IPTG titration of the active LacI repressor concentration exposes “weak” PAMs. However, since selection relies solely on binding of the cleavage-deficient dCas9, the PAM-SCANR assay may report PAMs that differ from those required for DNA cleavage. Although in vivo screens report PAM sequences, varying Cas9 concentrations in different cell types and uncontrollable cleavage time may make identification of the most stringent PAMs difficult, as exemplified by the St1Cas9 case [42,43].
PAM library screening in vitro by the Cas9 RNP complex overcomes these limitations. In this assay, the randomized PAM library in a plasmid or concatemeric repeat background is subjected to cleavage by Cas9 RNP assembled using purified Cas9 protein or crude cell lysates. Subsequent PCR amplification of the cleavage products captured by adapter ligation allows cleavage-permitting PAMs to be identified. In contrast to in vivo screens, the PAM library and enzyme concentrations (or dilutions for effector complexes assembled using cell lysates) and the cleavage time can be strictly controlled, enabling PAM interrogation in a dose-dependent manner. The importance of reaction conditions in the PAM determination assay is illustrated by the St1Cas9 case. In silico analysis of spacer sequences acquired in the phage challenge experiments reported the St1Cas9 PAM consensus sequence NNAGAAW [35]. However, neither the in vivo methods nor the in vitro PAM determination assay using cell lysates recapitulated the PAM consensus identified in silico [42–45] (Table 1). Only the in vitro PAM library screen under stringent reaction conditions returned the NNAGAAW PAM consensus sequence for St1Cas9 [46]. Moreover, this method was successfully applied to the PAM characterization of Cas9s from Staphylococcus aureus [45] and Bacillus laterosporus [46], which were subsequently used as genome editing tools in vivo. The major limitation of the in vitro PAM library screen is the prerequisite of heterologous expression and isolation of the Cas9 protein. This limitation can be overcome using cell lysates or in vitro translation. Additionally, high-throughput PAM determination methods can be adopted for PAM characterization of Cas9 mutant variants with altered PAM recognition preferences obtained by directed evolution [43,50] and rational design [15,16] (Table 1).
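The read-out common to the library screens compared in this section can be sketched as a comparison of PAM frequencies between a selected pool and the unselected input library: functional PAMs are depleted in negative selection (plasmid depletion) and enriched in positive selection (GFP sorting or adapter ligation). The Python sketch below is an assumption-laden illustration rather than the analysis pipeline of any cited study; the pseudocount, the thresholds, the function names and the read counts are hypothetical.

```python
import math
from collections import Counter

def log2_fold_change(selected, control, pseudocount=1.0):
    """log2(selected frequency / control frequency) for every PAM seen in either pool.

    selected and control are lists of PAM strings, one per sequencing read.
    A pseudocount avoids division by zero for PAMs missing from one pool.
    """
    sel, ctl = Counter(selected), Counter(control)
    keys = set(sel) | set(ctl)
    sel_total = sum(sel.values()) + pseudocount * len(keys)
    ctl_total = sum(ctl.values()) + pseudocount * len(keys)
    return {
        pam: math.log2(((sel[pam] + pseudocount) / sel_total)
                       / ((ctl[pam] + pseudocount) / ctl_total))
        for pam in keys
    }

def functional_pams(scores, threshold, depleted):
    """PAMs called functional: score <= -threshold if depleted, else score >= threshold."""
    if depleted:
        return sorted(p for p, s in scores.items() if s <= -threshold)
    return sorted(p for p, s in scores.items() if s >= threshold)

# Hypothetical read counts: input library vs. plasmids recovered from survivors.
control = ["AGG"] * 100 + ["ATT"] * 100 + ["CGG"] * 100
survivors = ["AGG"] * 5 + ["ATT"] * 190 + ["CGG"] * 5
scores = log2_fold_change(survivors, control)
print(functional_pams(scores, threshold=2.0, depleted=True))
```

Because a depleted PAM is inferred from reads that are absent, the estimate for each PAM rests on few counts in the selected pool, which is one way to see why negative selection screens call for high sequencing coverage.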

5. Conclusions

Recently developed experimental methods for PAM determination, which rely mainly on high-throughput in vivo and in vitro assays, have been successfully applied to the characterization of novel Cas9 orthologs. The relatively small number of Cas9 proteins characterized to date has revealed a remarkable diversity of PAM sequences, expanding the target sequence space available for genome editing applications. Importantly, analysis of the PAM sequences identified by high-throughput assays often reveals a set of related PAM sequences that depends on protein concentration and reaction time. Detailed characterization of these degenerate PAM profiles should improve methods for off-target site prediction in genome editing applications.

Acknowledgment

This work was supported by the Agency for Science, Innovation and Technology (MITA) grant #31v-107.


References

[1] M. Porteus, Genome Editing: A New Approach to Human Therapeutics, Annu. Rev. Pharmacol. Toxicol. 56 (2016) 163–190.
[2] S. Chandrasegaran, D. Carroll, Origins of Programmable Nucleases for Genome Engineering, J. Mol. Biol. 428 (2015) 963–989.
[3] M. Jinek, K. Chylinski, I. Fonfara, M. Hauer, J.A. Doudna, E. Charpentier, A Programmable Dual-RNA-Guided DNA Endonuclease in Adaptive Bacterial Immunity, Science. 337 (2012) 816–821.
[4] G. Gasiunas, R. Barrangou, P. Horvath, V. Siksnys, Cas9-crRNA ribonucleoprotein complex mediates specific DNA cleavage for adaptive immunity in bacteria, Proc. Natl. Acad. Sci. 109 (2012) 2579–2586.
[5] S.H. Sternberg, J.A. Doudna, Expanding the Biologist’s Toolkit with CRISPR-Cas9, Mol. Cell. 58 (2015) 568–574.
[6] L.A. Gilbert, M.H. Larson, L. Morsut, Z. Liu, G.A. Brar, S.E. Torres, N. Stern-Ginossar, O. Brandman, E.H. Whitehead, J.A. Doudna, W.A. Lim, J.S. Weissman, L.S. Qi, CRISPR-Mediated Modular RNA-Guided Regulation of Transcription in Eukaryotes, Cell. 154 (2013) 442–451.
[7] L.S. Qi, M.H. Larson, L.A. Gilbert, J.A. Doudna, J.S. Weissman, A.P. Arkin, W.A. Lim, Repurposing CRISPR as an RNA-Guided Platform for Sequence-Specific Control of Gene Expression, Cell. 152 (2013) 1173–1183.
[8] K.M. Esvelt, A.L. Smidler, F. Catteruccia, G.M. Church, Concerning RNA-guided gene drives for the alteration of wild populations, Elife. 3 (2014).
[9] C.L. Beisel, A.A. Gomaa, R. Barrangou, A CRISPR design for next-generation antimicrobials, Genome Biol. 15 (2014) 516.

[10] M.D. Szczelkun, M.S. Tikhomirova, T. Sinkunas, G. Gasiunas, T. Karvelis, P. Pschera, V. Siksnys, R. Seidel, Direct observation of R-loop formation by single RNA-guided Cas9 and Cascade effector complexes, Proc. Natl. Acad. Sci. 111 (2014) 9798–803.
[11] S.H. Sternberg, S. Redding, M. Jinek, E.C. Greene, J.A. Doudna, DNA interrogation by the CRISPR RNA-guided endonuclease Cas9, Nature. 507 (2014) 62–7.
[12] V. Siksnys, G. Gasiunas, Rewiring Cas9 to Target New PAM Sequences, Mol. Cell. 61 (2016) 793–794.
[13] C. Anders, O. Niewoehner, A. Duerst, M. Jinek, Structural basis of PAM-dependent target DNA recognition by the Cas9 endonuclease, Nature. 513 (2014) 569–573.
[14] S. Hirano, H. Nishimasu, R. Ishitani, O. Nureki, Structural Basis for the Altered PAM Specificities of Engineered CRISPR-Cas9, Mol. Cell. 61 (2016) 886–894.
[15] C. Anders, K. Bargsten, M. Jinek, Structural Plasticity of PAM Recognition by Engineered Variants of the RNA-Guided Endonuclease Cas9, Mol. Cell. 61 (2016) 895–902.
[16] H. Hirano, J.S. Gootenberg, T. Horii, O.O. Abudayyeh, M. Kimura, P.D. Hsu, T. Nakane, R. Ishitani, I. Hatada, F. Zhang, H. Nishimasu, O. Nureki, Structure and Engineering of Francisella novicida Cas9, Cell. 164 (2016) 950–961.
[17] H. Nishimasu, L. Cong, W.X. Yan, R. Ishitani, F. Zhang, O. Nureki, Crystal Structure of Staphylococcus aureus Cas9, Cell. 162 (2015) 1113–1126.
[18] L.A. Marraffini, E.J. Sontheimer, Self versus non-self discrimination during CRISPR RNA-directed immunity, Nature. 463 (2010) 568–71.
[19] E.R. Westra, E. Semenova, K.A. Datsenko, R.N. Jackson, B. Wiedenheft, K. Severinov, S.J.J. Brouns, Type I-E CRISPR-cas systems discriminate target from non-target DNA through base pairing-independent PAM recognition, PLoS Genet. 9 (2013) e1003742.
[20] B. Elliott, C. Richardson, J. Winderbaum, J.A. Nickoloff, M. Jasin, Gene conversion tracts from double-strand break repair in mammalian cells, Mol. Cell. Biol. 18 (1998) 93–101.
[21] D.G. Courtney, J.E. Moore, S.D. Atkinson, E. Maurizi, E.H.A. Allen, D.M.L. Pedrioli, W.H.I. McLean, M.A. Nesbit, C.B.T. Moore, CRISPR/Cas9 DNA cleavage at SNP-derived PAM enables both in vitro and in vivo KRT12 mutation-specific targeting, Gene Ther. 23 (2016) 108–112.
[22] Y. Li, S. Mendiratta, K. Ehrhardt, N. Kashyap, M.A. White, L. Bleris, Exploiting the CRISPR/Cas9 PAM Constraint for Single-Nucleotide Resolution Interventions, PLoS One. 11 (2016) e0144970.
[23] I. Fonfara, A. Le Rhun, K. Chylinski, K.S. Makarova, A.L. Lécrivain, J. Bzdrenga, E.V. Koonin, E. Charpentier, Phylogeny of Cas9 determines functional exchangeability of dual-RNA and Cas9 among orthologous type II CRISPR-Cas systems, Nucleic Acids Res. 42 (2014) 2577–2590.
[24] K.S. Makarova, Y.I. Wolf, O.S. Alkhnbashi, F. Costa, S.A. Shah, S.J. Saunders, R. Barrangou, S.J.J. Brouns, E. Charpentier, D.H. Haft, P. Horvath, S. Moineau, F.J.M. Mojica, R.M. Terns, M.P. Terns, M.F. White, A.F. Yakunin, R.A. Garrett, J. van der Oost, R. Backofen, E.V. Koonin, An updated evolutionary classification of CRISPR–Cas systems, Nat. Rev. Microbiol. 13 (2015) 722–736.
[25] B. Zetsche, J.S. Gootenberg, O.O. Abudayyeh, I.M. Slaymaker, K.S. Makarova, P. Essletzbichler, S.E. Volz, J. Joung, J. van der Oost, A. Regev, E.V. Koonin, F. Zhang, Cpf1 Is a Single RNA-Guided Endonuclease of a Class 2 CRISPR-Cas System, Cell. 163 (2015) 759–771.
[26] S. Shmakov, O.O. Abudayyeh, K.S. Makarova, Y.I. Wolf, J.S. Gootenberg, E. Semenova, L. Minakhin, J. Joung, S. Konermann, K. Severinov, F. Zhang, E.V. Koonin, Discovery and Functional Characterization of Diverse Class 2 CRISPR-Cas Systems, Mol. Cell. 60 (2015) 385–397.
[27] S. Shmakov, A. Smargon, D. Scott, D. Cox, N. Pyzocha, W. Yan, O.O. Abudayyeh, J.S. Gootenberg, K.S. Makarova, Y.I. Wolf, K. Severinov, F. Zhang, E.V. Koonin, Diversity and evolution of class 2 CRISPR–Cas systems, Nat. Rev. Microbiol. 15 (2017) 169–182.
[28] J.K. Hur, K. Kim, K.W. Been, G. Baek, S. Ye, J.W. Hur, S.-M. Ryu, Y.S. Lee, J.-S. Kim, Targeted mutagenesis in mice by electroporation of Cpf1 ribonucleoproteins, Nat. Biotechnol. 34 (2016) 807–808.
[29] B.P. Kleinstiver, S.Q. Tsai, M.S. Prew, N.T. Nguyen, M.M. Welch, J.M. Lopez, Z.R. McCaw, M.J. Aryee, J.K. Joung, Genome-wide specificities of CRISPR-Cas Cpf1 nucleases in human cells, Nat. Biotechnol. 34 (2016) 869–874.
[30] B. Zetsche, M. Heidenreich, P. Mohanraju, I. Fedorova, J. Kneppers, E.M. DeGennaro, N. Winblad, S.R. Choudhury, O.O. Abudayyeh, J.S. Gootenberg, W.Y. Wu, D.A. Scott, K. Severinov, J. van der Oost, F. Zhang, Multiplex gene editing by CRISPR–Cpf1 using a single crRNA array, Nat. Biotechnol. 35 (2016) 31–34.
[31] G. Amitai, R. Sorek, CRISPR–Cas adaptation: insights into the mechanism of action, Nat. Rev. Microbiol. advance on (2016) 1–10.
[32] S.H. Sternberg, H. Richter, E. Charpentier, U. Qimron, Adaptation in CRISPR-Cas Systems, Mol. Cell. 61 (2016) 797–808.
[33] F.J.M. Mojica, C. Díez-Villaseñor, J. García-Martínez, C. Almendros, Short motif sequences determine the targets of the prokaryotic CRISPR defence system, Microbiology. 155 (2009) 733–40.
[34] R. Barrangou, C. Fremaux, H. Deveau, M. Richards, P. Boyaval, S. Moineau, D.A. Romero, P. Horvath, CRISPR provides acquired resistance against viruses in prokaryotes, Science. 315 (2007) 1709–12.
[35] P. Horvath, D.A. Romero, A.-C. Coûté-Monvoisin, M. Richards, H. Deveau, S. Moineau, P. Boyaval, C. Fremaux, R. Barrangou, Diversity, activity, and evolution of CRISPR loci in Streptococcus thermophilus, J. Bacteriol. 190 (2008) 1401–12.
[36] I. Grissa, G. Vergnaud, C. Pourcel, CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats, Nucleic Acids Res. 35 (2007) W52–W57.
[37] D.A. Benson, M. Cavanaugh, K. Clark, I. Karsch-Mizrachi, D.J. Lipman, J. Ostell, E.W. Sayers, GenBank, Nucleic Acids Res. 41 (2013) D36–42.
[38] G.M. Boratyn, C. Camacho, P.S. Cooper, G. Coulouris, A. Fong, N. Ma, T.L. Madden, W.T. Matten, S.D. McGinnis, Y. Merezhuk, Y. Raytselis, E.W. Sayers, T. Tao, J. Ye, I. Zaretskaya, BLAST: a more efficient report with usability improvements, Nucleic Acids Res. 41 (2013) W29–W33.
[39] A. Biswas, J.N. Gagnon, S.J.J. Brouns, P.C. Fineran, C.M. Brown, CRISPRTarget: bioinformatic prediction and analysis of crRNA targets, RNA Biol. 10 (2013) 817–27.
[40] M. Krupovic, D. Prangishvili, R.W. Hendrix, D.H. Bamford, Genomics of Bacterial and Archaeal Viruses: Dynamics within the Prokaryotic Virosphere, Microbiol. Mol. Biol. Rev. 75 (2011) 610–635.
[41] W. Jiang, D. Bikard, D. Cox, F. Zhang, L.A. Marraffini, RNA-guided editing of bacterial genomes using CRISPR-Cas systems, Nat. Biotechnol. 31 (2013) 233–239.
[42] K.M. Esvelt, P. Mali, J.L. Braff, M. Moosburner, S.J. Yaung, G.M. Church, Orthogonal Cas9 proteins for RNA-guided gene regulation and editing, Nat. Methods. 10 (2013) 1116–1121.
[43] B.P. Kleinstiver, M.S. Prew, S.Q. Tsai, V.V. Topkar, N.T. Nguyen, Z. Zheng, A.P.W. Gonzales, Z. Li, R.T. Peterson, J.J. Yeh, M.J. Aryee, J.K. Joung, Engineered CRISPR-Cas9 nucleases with altered PAM specificities, Nature. 523 (2015) 481–485.
[44] R.T. Leenay, K.R. Maksimchuk, R.A. Slotkowski, R.N. Agrawal, A.A. Gomaa, A.E. Briner, R. Barrangou, C.L. Beisel, Identifying and Visualizing Functional PAM Diversity across CRISPR-Cas Systems, Mol. Cell. 62 (2016) 137–147.
[45] F.A. Ran, L. Cong, W.X. Yan, D.A. Scott, J.S. Gootenberg, A.J. Kriz, B. Zetsche, O. Shalem, X. Wu, K.S. Makarova, E.V. Koonin, P.A. Sharp, F. Zhang, In vivo genome editing using Staphylococcus aureus Cas9, Nature. 520 (2015) 186–190.
[46] T. Karvelis, G. Gasiunas, J. Young, G. Bigelyte, A. Silanskas, M. Cigan, V. Siksnys, Rapid characterization of CRISPR-Cas9 protospacer adjacent motif sequence elements, Genome Biol. 16 (2015) 253.
[47] V. Pattanayak, S. Lin, J.P. Guilinger, E. Ma, J.A. Doudna, D.R. Liu, High-throughput profiling of off-target DNA cleavage reveals RNA-programmed Cas9 nuclease specificity, Nat. Biotechnol. 31 (2013) 839–843.
[48] G.E. Crooks, WebLogo: A Sequence Logo Generator, Genome Res. 14 (2004) 1188–1190.
[49] B.D. Ondov, N.H. Bergman, A.M. Phillippy, Interactive metagenomic visualization in a Web browser, BMC Bioinformatics. 12 (2011) 385.
[50] B.P. Kleinstiver, M.S. Prew, S.Q. Tsai, N.T. Nguyen, V.V. Topkar, Z. Zheng, J.K. Joung, Broadening the targeting range of Staphylococcus aureus CRISPR-Cas9 by modifying PAM recognition, Nat. Biotechnol. 33 (2015) 1293–1298.
[51] Y. Zhang, N. Heidrich, B.J. Ampattu, C.W. Gunderson, H.S. Seifert, C. Schoen, J. Vogel, E.J. Sontheimer, Processing-Independent CRISPR RNAs Limit Natural Transformation in Neisseria meningitidis, Mol. Cell. 50 (2013) 488–503.
[52] J.R. van der Ploeg, Analysis of CRISPR in Streptococcus mutans suggests frequent occurrence of acquired immunity against infection by M102-like bacteriophages, Microbiology. 155 (2009) 1966–1976.
[53] H. Chen, J. Choi, S. Bailey, Cut site selection by the two nuclease domains of the Cas9 RNA-guided endonuclease, J. Biol. Chem. 289 (2014) 13284–13294.


Table 1. Comparison of PAM determination methods.

Strategy: Bioinformatic analysis (in silico)
Principle: Spacer sequences in the CRISPR array are used to identify matching sequences (protospacers) in bacteriophage or plasmid DNA available in sequence databases, followed by analysis of the sequences flanking the protospacers.
Pros: + Sensitivity; + Can be used for different types of CRISPR-Cas systems; + Relatively simple procedure.
Cons: − A reliable PAM can be determined only if a sufficient number of protospacer sequences is known.
Characterized Cas9s (PAM consensus): S. pyogenes (NGG [33]); S. thermophilus CRISPR1 (NNAGAAW [35]); S. thermophilus CRISPR3 (NGGNG [35]); N. meningitidis (NNNNGATT [51]); S. agalactiae (NGG [33]); L. monocytogenes (NGG [33]); S. mutans (NGG [52]); C. jejuni (NNNNACA [23]); F. novicida (NG [23]); S. thermophilus LMG18311 (NNGYAAA [53]); P. multocida (GNNNCNNA [23]).

Strategy: Plasmid depletion (in vivo)
Principle: The PAM is identified by sequencing the plasmids from the PAM library that escape cleavage by the Cas9 effector complex; functional PAMs are depleted from this pool.
Pros: + Can be adopted for other CRISPR-Cas systems; + Sensitivity.
Cons: − Uncontrollable reaction conditions; − Requires high sequencing coverage; − Plasmid library requirement.
Characterized Cas9s (PAM consensus): S. pyogenes (NGG [41,43]; NGA (VQR variant)* [43]; NGAG (EQR variant)* [43]; NGCG (VRER variant)* [43]); S. aureus (NNGRRT [43,50]; NNNRRT (KKH variant)* [50]); S. thermophilus CRISPR1 (NNRGRA [42,43]); N. meningitidis (NNNNGNNT [42]); T. denticola (NAAAAN [42]).

Strategy: PAM-SCANR (in vivo)
Principle: dCas9 binds to a PAM library in the vicinity of the lacI repressor promoter, triggering a positive GFP signal in the cells.
Pros: + Can be adopted for other CRISPR-Cas systems; + Sensitivity; + Stringency of the screen can be tuned with IPTG.
Cons: − Reports PAMs for binding rather than cleavage; − Limited control of reaction conditions; − Plasmid library requirement.
Characterized Cas9s (PAM consensus): S. pyogenes (NGG [44]); S. thermophilus CRISPR1 (NNAGAA [44]).

Strategy: Adapter ligation assay (in vitro)
Principle: Capturing of DSBs by adapter ligation after PAM library cleavage in vitro.
Pros: + Sensitivity; + Reproducibility; + Controllable reaction conditions; + Versatility of DNA library; + Direct cleavage identification.
Cons: − Heterologous expression / protein purification are required; − Cannot be adopted for CRISPR-Cas systems that degrade DNA.
Characterized Cas9s (PAM consensus): S. pyogenes (NGG [45–47]); S. thermophilus CRISPR1 (NNRRRA [45]; NNAGAAW [46]); P. lavamentivorans (NNNCAT [45]); C. diphtheria (NGG [45]); S. pasteurianus (NNGTGA [45]); C. lari (NNGGG [45]); N. cinerea (NNNNGTA [45]); S. aureus (NNGRRT [45]); S. thermophilus CRISPR3 (NGGNG [46]); B. laterosporus (NNNNCNAA [46]).

Strategy: In vitro depletion assay
Principle: Analysis of the remaining PAM library after cleavage by the effector complex.
Pros: + Sensitivity; + Reproducibility; + Controllable reaction conditions; + Versatility of DNA library; + Can be adopted for other CRISPR-Cas systems.
Cons: − Heterologous expression / protein purification are required; − Requires high sequencing coverage.
Characterized Cas9s (PAM consensus): F. novicida (NGG [16]; YG (RHA variant)** [16]).

* Cas9 variants obtained using directed evolution approach.
** F. novicida Cas9 variant engineered using rational design.
Cas9s and PAMs for which activity was confirmed in vivo are underlined.


Figure legend

Fig. 1. Schematic representation of high-throughput strategies for PAM determination. (A) Plasmid depletion assay in vivo. Cells carrying a plasmid with Cas9 and guide RNA encoding sequences are transformed with a plasmid library bearing randomized PAM sequences and an antibiotic resistance gene. Plasmids containing permissive PAMs are cleaved by Cas9, resulting in cell death on medium supplemented with the appropriate antibiotic. Sequencing of the plasmids purified from surviving colonies allows depleted functional PAMs to be identified. (B) PAM-SCANR assay in vivo. Catalytically inactive Cas9 (dCas9) binds to the target sequence within the promoter of the lacI repressor. Transcriptional repression of lacI by dCas9 binding results in GFP expression. Isolation of GFP-positive cells is followed by sequencing of the target DNA, resulting in PAM identification. (C) PAM library cleavage in vitro. A plasmid library bearing randomized PAM sequences is subjected to cleavage by the Cas9 complex in vitro, the resulting DNA ends are captured by adapter ligation and PAM-sided products are amplified. Sequencing of the DNA fragments reveals cleavage-permitting PAMs.

Fig. 2. St1Cas9 PAM sequence visualization using sequencing data obtained from the adapter ligation assay [46]. (A) Graphical sequence logo representation [48]. The height of the stack (in bits) indicates the sequence conservation and the height of each individual symbol represents its relative frequency. (B) PAM representation using PAM wheels based on Krona plots [49]. Each PAM position is represented by a circle, going from the center outwards, and the area of each nucleotide sector in a circle is proportional to the relative frequency of the nucleotide at that position.
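The logo heights described for Fig. 2A can be reproduced with a short calculation: the stack height at a position is the maximum entropy for four nucleotides (2 bits) minus the observed Shannon entropy, and each symbol receives a share of the stack proportional to its frequency. The Python sketch below is a minimal illustration; the function name and the example PAMs are hypothetical, and no small-sample correction is applied, so the numbers may differ slightly from those drawn by a logo generator on small data sets.

```python
import math
from collections import Counter

def logo_heights(pams, alphabet="ACGT"):
    """Per-position symbol heights (bits) for aligned, equal-length PAMs.

    Returns one dict per PAM position mapping each nucleotide to its height;
    the values in a dict sum to the stack height (information content).
    """
    heights = []
    for column in zip(*pams):
        counts = Counter(column)
        n = len(column)
        freqs = {b: counts[b] / n for b in alphabet}
        entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
        info = math.log2(len(alphabet)) - entropy  # stack height in bits
        heights.append({b: f * info for b, f in freqs.items()})
    return heights

# Hypothetical functional PAMs, for illustration only.
for position, column in enumerate(logo_heights(["AGG", "CGG", "TGG", "GGA"]), start=1):
    print(position, {b: round(h, 3) for b, h in column.items()})
```

A position where all four nucleotides are equally frequent gets a stack height of 0 bits, and a fully conserved position gets 2 bits, matching the axis range of the logo.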

[Figure 1. Panels: (A) Plasmid depletion (in vivo); (B) PAM-SCANR (in vivo); (C) In vitro cleavage.]

[Figure 2. St1Cas9 PAM. Panels: (A) sequence logo, x-axis: PAM position (1 to 7), y-axis: Bits (0.0 to 2.0); (B) PAM wheel.]