Article No. mb981786
J. Mol. Biol. (1998) 279, 513-527
Recognition of Core-type DNA Sites by λ Integrase
Radhakrishna S. Tirumalai1, Hyock Joo Kwon2, Erica Healey Cardente1, Tom Ellenberger2 and Arthur Landy1*
1Department of Biology and Medicine, Brown University, Providence, RI 02912, USA
2Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
Escherichia coli phage λ integrase (Int) is a 40 kilodalton, 356 amino acid residue protein, which belongs to the λ Int family of site-specific recombinases. The amino-terminal domain (residues 1 to 64) of Int binds to "arm-type" DNA sites, distant from the sites of DNA cleavage. The carboxy-terminal fragment, termed C65 (residues 65 to 356), binds "core-type" DNA sites and catalyzes cleavage and ligation at these sites. It has been further divided into two smaller domains, encompassing residues 65 to 169 and 170 to 356, respectively. The latter has been characterized and its crystal structure has been determined. Although this domain catalyzes the cleavage and rejoining of DNA strands, it unexpectedly does not form electrophoretically stable complexes with core-type DNA. Here we have investigated the critical features of λ Int binding to core-type DNA sites, especially the role of the central 65 to 169 domain. To eliminate the complexities arising from λ Int's heterobivalency we studied Int C65, which was shown to be as competent as Int in binding to, and cleaving, core-type sites. Zero-length UV crosslinking was used to show that Ala125 and Ala126 make close contact with bases in the core-type DNA. Modification by pyridoxal 5'-phosphate was used to identify Lys103 at the protein-DNA interface. Since both of the identified loci are in the central domain, it was cloned and purified and found to bind to core-type DNA autonomously and specifically. The synergistic roles of the catalytic and the central, or core-binding (CB), domains in the interaction with core-type DNA are discussed for λ Int and related DNA recombinases.
© 1998 Academic Press Limited
*Corresponding author
Keywords: site-specific recombination; DNA binding protein; protein domain structure; photocrosslinking; protein modification
Abbreviations used: Int, integrase; CB, core-binding; IHF, integration host factor; PLP, pyridoxal-5'-phosphate; Hepes, N-(2-hydroxyethyl)piperazine-N'-2-ethanesulfonic acid; Mops, (3-N-morpholino)propanesulfonic acid; TCA, trichloroacetic acid; TFA, trifluoroacetic acid; HPLC, high performance liquid chromatography; PCR, polymerase chain reaction; IPTG, isopropyl-β-D-thiogalactopyranoside; DTT, dithiothreitol.
0022-2836/98/230513-15 $25.00/0
Introduction
Bacteriophage λ integrase (Int) is a heterobivalent site-specific DNA binding protein and a type I topoisomerase, which catalyzes the insertion and excision of the viral genome into and out of the host Escherichia coli genome. Int belongs to a large family of site-specific recombinases (the λ Int family) that rearrange DNA sequences having little or no sequence homology (for reviews, see Landy, 1993; Sadowski, 1993; Stark et al., 1992; Nash, 1996; Nunes-Düby et al., 1998). Int catalyzes integrative and excisive recombination between pairs of specific DNA target sites, attP/attB and attL/attR, respectively (Campbell, 1962). Each att site contains an inverted pair of "core-type" Int binding sites (9 bp each) separated by a 7 bp "overlap region". A reciprocal exchange of the "top" strands at the left boundary of the overlap region generates a Holliday junction recombination intermediate, which is then resolved by exchange of the "bottom" strands at the right boundary of the overlap region. DNA cleavage is mediated by a tyrosine hydroxyl that attacks the scissile phosphate, forming a 3' phosphotyrosine link to the nicked DNA. This covalent protein-DNA intermediate is resolved when the 5' terminal hydroxyl of the invading DNA strand attacks the phosphotyrosine
linkage and displaces the protein. The chemistry of DNA strand exchange and the general arrangement of the "core region" are common to all of the λ Int family members except that their overlap regions vary from 6 to 8 bp in length, and their core-type binding sites vary from 9 to 13 bp. For some family members, such as Cre, XerC/D and FLP, the core region is the entire (minimal) att site. For other family members, such as λ and HP1 integrases, the att sites are more complex and contain additional protein binding sites in viral DNA sequences that comprise flanking "arms". Some of these flanking sites bind to accessory proteins like IHF, Xis, and Fis, which bend DNA, whereas others bind to the N-terminal domain of Int, creating a higher-order complex in which Int bridges between the core and arm sequences of a sharply bent att DNA (Ross et al., 1979; Moitoso de Vargas et al., 1989; Richet et al., 1988; Kim et al., 1990).
The 356 amino acid λ Int protein, first purified by Kikuchi & Nash (1978), can be cleaved by limited proteolysis into two domains (Moitoso de Vargas et al., 1988). The amino-terminal domain (residues 1 to 64) binds with high affinity to the five arm-type sites. The carboxyl-terminal domain (residues 65 to 356) binds with low affinity to core-type sites located at the positions of strand cleavage, and it functions as a topoisomerase. This carboxy-terminal domain (C65) has been cloned, purified and characterized. C65 can be further dissected by proteolysis into two smaller domains, encompassing residues 65 to 169 and 170 to 356, respectively. The latter, termed C170 or the catalytic domain, has been characterized and its crystal structure has been determined (Tirumalai et al., 1997; Kwon et al., 1997).
The C170 domain contains all of the residues that have been identified as being conserved in the λ Int family of recombinases (Leong et al., 1984; Esposito & Scocca, 1997; Nunes-Düby et al., 1998; Argos et al., 1986; Abremski & Hoess, 1992; Blakely & Sherratt, 1996; Kwon et al., 1997). These include a very highly conserved triad of Arg212, His308 and Arg311 that has been suggested to activate the scissile phosphate for DNA cleavage (Chen et al., 1992; Pan et al., 1993); the active site nucleophile Tyr342 (Pargellis et al., 1988); and Glu174, which can be mutated to give a hyper-recombination phenotype (Lange-Gustafson & Nash, 1984). The C170 domain of λ Int is approximately the same size as the smallest λ Int family members, such as FimB and FimE of E. coli (227 and 209 amino acid residues, respectively; Klemm, 1986). Crystal structures of the catalytic domain of HP1 integrase (Dyda et al., 1994), the XerD recombinase (Subramanya et al., 1997), and the Cre recombinase complexed with its att site loxA (Guo et al., 1997) reveal protein folds resembling that of the λ Int C170 domain (Kwon et al., 1997). However, there are significant differences in the active sites of these recombinases, including different orientations of the tyrosine nucleophile and differences in nearby segments that are predicted to contact the DNA.
Although the catalytic domain of λ Int catalyzes the cleavage and rejoining of core-type DNA sequences, it does not form electrophoretically stable complexes with att DNAs (Tirumalai et al., 1997). This binding behavior was unexpected for the following reasons: several mutations affecting core-type DNA binding affinity (MacWilliams et al., 1996) and sequence recognition specificity (Dorgai et al., 1995) are located in this domain; the corresponding domains of Cre and FLP can bind to core-type sites (Hoess et al., 1990; Panigrahi & Sadowski, 1994); and the Cre-loxA cocrystal structure indicated extensive contacts between the catalytic domain and the core DNA sites (Guo et al., 1997).
The experiments reported here were designed to investigate the critical features of λ Int binding to core-type DNA sites and especially the role of the central 65 to 169 domain. To eliminate the complexities arising from λ Int's heterobivalency, we used Int C65, which lacks the amino-terminal domain that binds to arm-type DNA sites. C65 was shown to be just as competent as intact Int in binding to, and cleaving, core-type sites. Zero-length UV photocrosslinking was used to identify residues in Int C65 that make close contact with bases in the core-type DNA site. Modification by pyridoxal 5'-phosphate (PLP) was used to identify lysine residues at the protein-DNA interface. Each procedure identified a different region of DNA contact, although both are located in the central domain of λ Int. Consequently, the central domain of Int was cloned and purified. It had been shown that the central domain of Int is necessary for efficient binding to core-type DNA (Tirumalai et al., 1997). Here we show that the central domain can bind to core-type DNA autonomously.
Results
C65 is a monovalent analog for studying Int interactions with core-type DNA
Even though the two DNA binding domains of Int have different sequence specificities, it is very difficult to study the properties of one in the presence of the other. This is especially true for interactions with the core-type DNA sites as these are of considerably lower affinity than the arm-type sites. To circumvent this problem, we used the Int fragment C65 (residues 65 to 356), which lacks the amino-terminal arm-type binding domain, as a monovalent analog for Int-core interactions. As an assay for these interactions we used a half-att site suicide substrate, which consists of a single core-type binding site plus three bases of the top and seven bases of the bottom strands of the overlap region (Nunes-Düby et al., 1987; Tirumalai et al., 1997). When this radiolabeled substrate is cleaved by Int, a three-base oligomer from the top strand is lost by diffusion, thus trapping the covalent Int-DNA complex. These covalent complexes are distinguished from free DNA by polyacrylamide gel electrophoresis (PAGE) in the presence of SDS. In Figure 1 it can be seen that over a wide range of protein concentrations, C65 is just as efficient as full-length Int in specifically binding and cleaving a core-type DNA site. Cleavage with C170 requires much higher protein concentrations than those used here (Tirumalai et al., 1997). No covalent complex is formed between the suicide att DNA and a mutant Int protein in which the tyrosine nucleophile has been replaced by phenylalanine (Y342F or C65F). However, native complexes with this DNA can be detected as a retarded DNA band during PAGE. Using our native gel shift assay conditions, we had shown that C65F binds specifically to core-type DNA, in that binding is abolished by homologous competitor but it is resistant to equivalent amounts of heterologous competitor (Tirumalai et al., 1997). Thus, C65 is a reliable model for studying the interactions governing Int binding to core-type DNA.
Figure 1. Comparison of the kinetics of covalent complex formation by Int and IntC65 with a half-att suicide substrate. The 30/34 bp half-att suicide substrate (83.3 nM) was incubated with the indicated amounts (41.6, 83.3, 166.6 and 416.6 nM) of either full-length Int (A) or IntC65 (B), and at the indicated times aliquots were quenched in 0.1% SDS. The amount of covalent complex formed was quantitated by gel electrophoresis (see Materials and Methods). The percentage of covalent complex formed as a function of time is shown for each of the different protein concentrations.
Figure 2. UV-crosslinking of core-type DNA to C65F. 5 µM 32P-labeled 30/34-mer core-type DNA (lane 1) was mixed with 0.25, 0.5, 1, 2.5, 5 or 10 µM C65F (lanes 2 to 7) and irradiated with UV light as described in Materials and Methods. Lane 8 contains 5 µM 32P-labeled 30/34-mer core-type DNA mixed with 5 µM C65F, but not irradiated with UV light. Lane 9 contains the UV-independent covalent complex (>75% conversion) formed between 208 nM C65Y and 8.3 nM of 32P-labeled 30/34-mer (see Figure 1). The reactions were quenched by the addition of 0.1% SDS (w/v), analyzed by SDS-PAGE and autoradiography. The diffuse bands of free DNA are the result of UV irradiation (cf. lanes 1 to 7 with lane 8). The results were also quantitated with a phosphorimager (see the text).
Zero-length photo-crosslinking
To identify specific Int residues that make direct contact with core-site DNA we utilized UV-induced zero-length crosslinking. The C65F protein was incubated for 20 minutes at 25°C with 32P-labeled 30/34 bp core-site oligonucleotide at several different molar ratios and irradiated at 0°C with 254 nm light for 20 minutes. The reactions were fractionated by SDS-PAGE to separate free DNA from the more slowly migrating covalent protein-DNA complexes. As seen in Figure 2, the amount of covalent complex formation with C65F is dependent upon UV irradiation and the amount of covalent complex is proportional to the amount of protein. As expected, the UV-dependent C65F complexes have the same electrophoretic mobility in SDS-PAGE as the covalent phosphotyrosine complexes between C65 and this 30/34 (suicide) oligonucleotide (lane 9, Figure 2). In preparative scale reactions containing 150 µM each of protein and DNA in 1.0 ml, approximately
5% of the C65F protein is crosslinked to the DNA. Following UV irradiation, the DNA-crosslinked protein was separated from the non-crosslinked protein by chromatography on phosphocellulose in the presence of 0.3 M NaCl. The crosslinked protein-DNA complex and the free DNA flowed through the column, whereas the non-crosslinked protein was bound to the matrix. The non-crosslinked protein was eluted from the phosphocellulose column with 2 M NaCl (Figure 3). The crosslinked protein was separated from most (>95%) of the non-crosslinked DNA by repeated filtration and washing in a Centricon-30 concentrator. The crosslinked protein-DNA complex was denatured with 8 M urea, diluted to a final urea concentration of 2 M and digested overnight with endoproteinase GluC (V8 protease). The digestion conditions were chosen such that GluC cleaves specifically at the carboxy side of Glu (Allen, 1989) (see Materials and Methods). GluC was chosen instead of trypsin because the large number of Lys and Arg residues in the Int protein would generate many non-unique peptides. The resultant peptides were separated by anion-exchange HPLC, which takes advantage of the large negative charge on peptides that are crosslinked to DNA. We established that non-crosslinked peptides and non-crosslinked integrase protein, if any is present, do not bind to the column under these conditions (data not shown). Two major peaks (DNA-associated) eluted from the ion-exchange column (Figure 4A). Each peak was further purified by reverse phase HPLC on a C18 column (Figures 4B and C). Peptides from the two purified peaks were subjected to gas phase N-terminal amino acid sequencing to identify the amino acid residue that had been crosslinked. If the crosslink is stable under the conditions employed for gas phase sequencing, the crosslinked residue will not be extracted from the polybrene-coated support disk and consequently there will be a gap at that position in the sequence (Williams & Konigsberg, 1991). For peak I, 15 cycles of sequencing uniquely identified the GluC cleavage product, Ile-Ala-Ala-Met-Leu-Asn-Gly-Tyr-Ile-Asp-Glu, which corresponds to residues 124 to 134 of λ Int. Cycle 1 yielded a mixture of amino acids and therefore nothing can be said about the first residue of this peptide. The remaining amino acids were obtained in good yield except for Ala125 and Ala126, suggesting that these two residues are probably involved in photoinduced crosslinks with the DNA. Peak II did not yield an amino acid sequence and it is therefore probably the free DNA. This is consistent with the later elution of peak II on the anion-exchange column; free DNA would be expected to bind more tightly to the anion-exchange column than DNA bound to a peptide. Since the crosslinked complex was completely separated from free protein by phosphocellulose chromatography prior to peptide mapping and the crosslinked peptide was further resolved from free peptides by anion-exchange HPLC, the peptide in peak I is most likely part of the core-type DNA binding interface of λ integrase.
Modification of lysine by pyridoxal 5'-phosphate
Figure 3. Separation of photocrosslinked Int-DNA complex from free Int. 150 µM C65F was mixed with 150 µM 30/34-mer core-type DNA in 1 ml of buffer A and crosslinked with UV light as described in Materials and Methods. NaCl was added to 0.6 M and repeated centrifugal filtrations on Centricon-30 membranes (cut off 30,000 daltons) were used to remove unbound DNA. The sample was diluted to a NaCl concentration of 300 mM and loaded on a 2 ml phosphocellulose column equilibrated with buffer A plus 300 mM NaCl. The column was washed with 20 ml of buffer A plus 300 mM NaCl and eluted with 2.0 M NaCl (in buffer A); 1.0 ml fractions were collected. Small aliquots of each fraction were analyzed by SDS-PAGE and stained with Coomassie blue. The relative intensity of the stained gel bands is shown in arbitrary units.
To confirm the above findings and identify additional residues that make close contact with the core-type DNA we utilized a second approach that makes use of completely different chemistry to probe protein-DNA interactions. Pyridoxal 5'-phosphate (PLP) reacts under mild conditions preferentially and reversibly with the ε-amino group of lysine to form a Schiff base. The resulting Schiff base can be irreversibly reduced with sodium borohydride to form a stable pyridoxal 5'-phosphate imine adduct of lysine (Martial et al., 1975). An additionally useful feature of this reaction is that PLP is a nucleotide analog that preferentially interacts with lysine residues in the nucleotide-binding or DNA-binding pockets of many proteins (see Discussion). The ratio of specific (nucleotide analog-like) PLP interactions to non-specific modifications of surface lysine residues was maximized by working with the minimum PLP concentration that blocked binding of C65F protein to core-type DNA. The amount of Int bound to core-type DNA was monitored by a non-denaturing gel mobility shift assay using the 30/34 bp oligonucleotide described above (see also Materials and Methods). No covalent complexes could be formed with this substrate because C65F (Y342F) lacks the active site tyrosine nucleophile. As seen from the inhibition curve in Figure 5, 1.0 mM PLP almost completely inhibited the binding of C65F to the core site, and this concentration was used for all subsequent experiments.
To determine whether the PLP-induced loss of Int's DNA binding activity resulted from a specific modification of its DNA binding surface or a non-specific and general inactivation of the protein, we asked if preincubation with core DNA would protect Int against PLP inactivation. C65F was preincubated with unlabeled core-type oligomer, then treated with PLP, and the resultant Schiff base was reduced by sodium borohydride. Unreacted PLP is inactivated by borohydride reduction. After removing unlabeled DNA by binding it to a DE52 mini column, the PLP-treated protein was tested for its ability to bind 32P-labeled core-type oligomer in the gel shift assay. As seen in the third column of Table 1, preincubation of C65 with the core-type oligomer decreased the PLP inactivation by approximately 80%. Furthermore, this protection was lost if the preincubation was carried out at higher salt concentrations, 0.3 M and 0.5 M NaCl, where C65F binds very poorly to core-type DNA (Table 1, columns 4 and 5). The ability of DNA to protect only under salt conditions where it can bind to C65F suggests that the PLP-induced block of Int's DNA binding activity is a specific consequence of modifying lysine residues that contact the DNA, and not due to a global effect on protein stability.
PLP-modified lysine residue(s) of Int C65 that were protected by core-type DNA were identified on the basis of their resistance to cleavage by lysine-specific proteases such as endoproteinase LysC (Basu & Modak, 1987; Basu et al., 1988; Basu et al., 1989b; Tamura & Gellert, 1990), and by the fact that the reduced PLP-protein complex has an absorption shoulder at 325 nm (Martial et al., 1975; Basu et al., 1989a,b). Aliquots of Int C65F were modified with PLP in the presence or absence of core-type DNA, then denatured and digested with LysC (see Materials and Methods). The resulting peptides were separated on a C-18 reverse phase column and the two peptide maps were compared with a third peptide map of unmodified Int C65F protein generated in a similar manner (Figure 6). The peaks eluting from HPLC were monitored at 220 nm for the peptide bond and at 325 nm for the presence of reduced PLP. We consistently observed
Figure 4. Purification of the photocrosslinked DNA-peptides by anion exchange HPLC. The UV-irradiated mixture of IntC65F and core-type DNA was digested with endoproteinase GluC and the peptides were subjected to anion-exchange HPLC (see Materials and Methods). The two major peaks were pooled as indicated (A) and individually subjected to a second round of purification by reverse-phase HPLC on a C-18 matrix (peak I, B; peak II, C). A shallow acetonitrile gradient was employed under ion-pairing conditions (see Materials and Methods). The absorbance at 254 nm is plotted as a function of retention time. Peak I contains crosslinked DNA-peptide(s) and peak II contains free DNA (see the text).
Figure 5. Inactivation of Int C65F core-type DNA binding by modification with pyridoxal 5'-phosphate. The 32P-labeled 30/34-mer core-type DNA oligomer, 2.66 µM in 15 µl of binding buffer, was incubated for 20 minutes at 25°C with 1.33 µM of either IntC65F or IntC65F that had been treated with the indicated concentrations of pyridoxal 5'-phosphate for 30 minutes, as described in Materials and Methods. The amount of core-type DNA bound to Int in each sample was analyzed by the gel mobility shift assay. The amount of DNA binding by the PLP-treated Int relative to the untreated control Int is plotted as a function of the PLP concentration.
Figure 6. Strategy for identifying DNA-sensitive PLP-modified peptides. The first objective is to identify those peptides containing a lysine that has been modified by pyridoxal 5'-phosphate. This involves a comparison of untreated and PLP-treated Int. The second objective is to identify the subset of PLP-modified peptides for which the binding of core-type DNA prevents modification. This involves a comparison of PLP treatment in the presence and absence of protecting DNA.
the presence of a unique peptide eluting around 85 minutes in the LysC peptide map of the PLP-modified protein (peak 3, Figure 7B). This peak was absent from cleavage products of the unmodified protein (Figure 7A) or the protein treated with PLP in the presence of core-type DNA (Figure 7C). The unique peak, which was further purified by chromatography on a C-18 reverse-phase column (Figure 8A) using a shallow gradient of acetonitrile, was also associated with reduced PLP, as seen from the 325 nm shoulder in its absorbance spectrum (Figure 8B). In contrast, peak 2 is not associated with the 325 nm shoulder (Figure 8C). All of these properties are expected for a peptide containing a PLP-modified Lys residue that is protected by DNA binding. The purified peak was subjected to ten cycles of gas phase amino acid sequence analysis, which revealed that this peptide uniquely corresponds to residues 96 to 105, Thr-Leu-Ile-Asn-Tyr-Met-Ser-Lys-Ile-Lys, of λ integrase protein. No amino acid was detected during cycle 8 of the sequencing reaction, which is consistent with the modification of Lys103 by PLP and with the resistance of this modified lysine to cleavage by endo-LysC (Benesch et al., 1982; Basu et al., 1989b). We conclude that Lys103 is located at, or in the close vicinity of, the core-type DNA binding surface of the Int protein.
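The bookkeeping behind this assignment can be sketched in a few lines. The example below is only an illustration, not the authors' analysis: the function names are hypothetical, and the only inputs taken from the text are the peptide sequence, its starting residue number (96), the rule that LysC cuts on the carboxy side of lysine, and the assumption that a PLP-modified lysine is not cut.

```python
# Illustrative sketch; helper names are hypothetical, not from the paper.
# Taken from the text: the peptide sequence, its first residue number (96),
# and the idea that a PLP-modified lysine resists LysC cleavage.

PEPTIDE = ["Thr", "Leu", "Ile", "Asn", "Tyr", "Met", "Ser", "Lys", "Ile", "Lys"]
START = 96  # residue number of the N-terminal amino acid of the peptide


def lysc_fragments(residues, start, blocked=()):
    """Cut after every Lys, except lysines whose residue number is in `blocked`."""
    fragments, current = [], []
    for offset, amino_acid in enumerate(residues):
        number = start + offset
        current.append((number, amino_acid))
        if amino_acid == "Lys" and number not in blocked:
            fragments.append(current)
            current = []
    if current:  # trailing residues that do not end in a cleavable Lys
        fragments.append(current)
    return fragments


def residue_at_cycle(start, cycle):
    """Edman cycle 1 releases the N-terminal residue, cycle 2 the next, and so on."""
    return start + cycle - 1


def boundaries(fragments):
    """Summarize each fragment as (first residue number, last residue number)."""
    return [(frag[0][0], frag[-1][0]) for frag in fragments]


unmodified = boundaries(lysc_fragments(PEPTIDE, START))
modified = boundaries(lysc_fragments(PEPTIDE, START, blocked={103}))

print("unmodified Lys103:", unmodified)
print("PLP-modified Lys103:", modified)
print("residue expected at cycle 8:", residue_at_cycle(START, 8))
```

With Lys103 free, this stretch would be split after residue 103; with Lys103 blocked, it survives as a single 96 to 105 peptide whose eighth sequencing cycle falls on residue 103. The same numbering applied to the crosslinked GluC peptide, which starts at residue 124, places cycles 2 and 3 on Ala125 and Ala126.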
Table 1. Effect of preincubation with core-type DNA at different NaCl concentrations on the modification of C65F by PLP
                                   DNA binding (% of unmodified control)
DNA in preincubation   PLP   NaCl: 0.1 M   0.3 M   0.5 M
-                      -           100     100     100
-                      +           1       5.3     17
+                      +           78      52      16
C65F (6.25 µM) was preincubated in the absence or presence of 31.25 µM unlabeled core-type DNA, for ten minutes on ice, in PLP modification buffer containing the indicated amount of NaCl. Modification with 1 mM PLP and reduction with NaBH4 was then carried out as described in Materials and Methods. Each of the samples was passed through a DE-52 column to remove unlabeled DNA and tested for its ability to bind 32P-labeled core-type DNA in a gel mobility shift assay as described in Materials and Methods.
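One way to turn the Table 1 values into a single protection figure is sketched below. It assumes that, at each NaCl concentration, the three values correspond to untreated protein, protein treated with PLP alone, and protein preincubated with DNA before PLP treatment; that reading, the dictionary keys, and the definition of "protection" as the fractional decrease in PLP-induced loss of binding are assumptions made for illustration.

```python
# Sketch under stated assumptions: the assignment of the three values per
# NaCl concentration to the three treatments is inferred from the table
# legend, and "protection" is defined here for illustration only.

TABLE_1 = {
    "0.1 M": {"untreated": 100.0, "plp_only": 1.0, "dna_then_plp": 78.0},
    "0.3 M": {"untreated": 100.0, "plp_only": 5.3, "dna_then_plp": 52.0},
    "0.5 M": {"untreated": 100.0, "plp_only": 17.0, "dna_then_plp": 16.0},
}


def protection(row):
    """Fractional decrease in PLP inactivation caused by DNA preincubation."""
    lost_without_dna = row["untreated"] - row["plp_only"]
    lost_with_dna = row["untreated"] - row["dna_then_plp"]
    return 1.0 - lost_with_dna / lost_without_dna


for salt, row in TABLE_1.items():
    print(f"{salt} NaCl: protection = {protection(row):.1%}")
```

Under this reading the value at 0.1 M NaCl comes out close to 78%, in line with the "approximately 80%" quoted in the text; it is about 49% at 0.3 M and essentially zero at 0.5 M.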
Cloning the central domain
The residues that make close contact with the core-type DNA site are located in a previously identified 11 kDa central domain of λ Int, comprising residues 65 to 169 (Tirumalai et al., 1997). To better understand the functional capability of this domain, we cloned and over-expressed the region from Lys62 to Arg177. The over-expressed protein was purified by two column chromatography steps to approximately 95% purity, as described in Materials and Methods. To assess the functionality of the purified domain we determined its ability to bind to a full core-site, i.e. a pair of inverted core-type binding sites separated by the 7 bp overlap region. This core-site was constructed on a synthetic 29 bp oligomer that was labeled with 32P at its 5' termini. Protein binding to the labeled core site was assayed by a gel mobility shift assay. The purified domain efficiently bound the core DNA site at a protein concentration of 15 µM. With this substrate approximately 98% of the complex migrated as a specific band of the expected mobility. The remaining 2% was an aggregate that did not enter the gel. (In contrast to this target, binding to a half-att site, i.e. a single core-type site, yielded only an aggregate.) The specificity of binding to the full core site was demonstrated by its resistance to competition by an unlabeled heterologous competitor relative to the unlabeled homologous competitor (Figure 9).
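The normalization used to compare competitor lanes in Figure 9 can be illustrated with a short calculation. This is a sketch only: the control numbers (20% of the input DNA complexed, 98% of that in the specific band) are those given in the Figure 9 legend, whereas the function names and the competitor-lane values are hypothetical and are not data from the paper.

```python
# Sketch of the gel-shift normalization; competitor values are HYPOTHETICAL.

def specific_band_fraction(complexed_fraction, specific_share):
    """Fraction of the input DNA that runs as the specific complex."""
    return complexed_fraction * specific_share


def percent_of_control(band_fraction, control_fraction):
    """Express a lane's specific band relative to the no-competitor control."""
    if control_fraction <= 0:
        raise ValueError("control band fraction must be positive")
    return 100.0 * band_fraction / control_fraction


# No-competitor control, using the numbers from the Figure 9 legend.
control = specific_band_fraction(0.20, 0.98)

# Hypothetical lanes, included only to show how the normalization is applied.
hypothetical_lanes = {
    "homologous competitor": 0.049,
    "heterologous competitor": 0.176,
}

for label, fraction in hypothetical_lanes.items():
    print(f"{label}: {percent_of_control(fraction, control):.0f}% of control")
```

Setting the control band to 100% makes lanes comparable even though only about a fifth of the input DNA is complexed in the absence of competitor.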
Discussion
Biochemical identification of contacts between λ Int and core-type DNA
Figure 7. HPLC fractionation of the three LysC digests generated by the strategy shown in Figure 6. LysC digests were carried out on IntC65F that had not been treated with PLP (A) or treated with PLP in the absence (B) or presence (C) of core-type DNA, as outlined in Figure 6 and described in Materials and Methods. Each digest (10 nmol) was fractionated by reverse-phase HPLC on a C-18 column (see Materials and Methods). The Figure shows the 220 nm absorbance pattern of the column fractions. The region containing the unique peak of interest (peak 3) was pooled, along with peak 2 as a carrier (bracket), for further purification (Figure 8) and analysis (see the text). Only that region of the chromatogram containing differences between A, B and C is shown.
In the work reported here, three different experimental approaches have been used to study the regions of Int protein involved in binding to core-type DNA sites. Each of these methods points to residues within the domain bounded by the proteolytically sensitive sites at positions 64/65 and 169/170 of the Int protein. This domain was originally implicated in core-type binding because its removal from Int resulted in weakened DNA binding (Moitoso de Vargas et al., 1988; Tirumalai et al., 1997). It was not known how much, if any, of this effect resulted from the loss of primary interaction(s) with core-type DNA versus a secondary effect on the conformation of the C-terminal catalytic domain. The present results clearly show that the central 65 to 169 domain plays a critical and direct role in binding to the core-type DNA sites.
We have shown here and in previous experiments (Tirumalai et al., 1997) that the carboxy-terminal fragment of Int starting at Thr65 (C65) binds to and cleaves core-type sites with the same affinity and specificity as intact Int (Figure 1). Because C65 lacks the amino-terminal domain (1 to 64) that interacts with arm-type sites, it is the preferred protein for studying Int interactions with core-type sites. When C65F was incubated with a core-type DNA site and irradiated with UV light approximately 5% of the input protein and DNA (present at a 1:1 molar ratio) was converted to covalent complexes (Figure 2). Since C65F lacks the Tyr342 nucleophile, these complexes were not the result of a phospho-tyrosine linkage and they were completely dependent upon UV irradiation. The UV-photocatalyzed reaction between proteins and nucleic acids is thought to be mediated by a free radical that is generated by photoexcitation of a nucleic acid base. This is followed by abstraction of a hydrogen atom from a favorably positioned amino acid. A covalent bond is formed by the pairing of two radicals located on the base and the proximate amino acid. Although any base should be able to participate in this reaction, thymine appears to be the most reactive (Williams & Konigsberg, 1991). The proposed mechanism for this UV-induced reaction produces zero-length crosslinks and could involve virtually any amino acid.
In the sequence analysis of the UV-crosslinked peptide, the first cycle was contaminated with a mixture of amino acids, so it was not informative. The observed low yield of residues corresponding to Ala125 and Ala126 is expected if these amino acids are crosslinked to DNA and therefore do not elute from the polybrene filter (Williams & Konigsberg, 1991). However, the loss of photoinduced protein-DNA crosslinks during Edman peptide sequencing has been reported (Basu et al., 1992) and therefore additional residues of our Int peptide might also be capable of crosslinking to DNA. Additionally, other crosslinked peptides might not be resolved by our chromatography conditions. In the experiments reported here, the covalent Int-DNA complexes were purified by phosphocellulose chromatography and after proteolysis the covalent peptide-DNA complexes were purified by anion-exchange HPLC (Figures 3 and 4). The conclusion from these data is that the GluC peptide derived from λ Int positions 124 to 135 contains residues that are in close proximity to DNA in the core-type binding site.
The second experimental approach reported here took advantage of the stable adduct that can be formed between pyridoxal 5'-phosphate and the ε-amino group of lysine. Inactivation of C65's core-type binding activity could be prevented by binding C65 to core-type DNA before exposure to PLP. This protection was abolished at higher salt concentrations in which C65 cannot bind to DNA (Figure 5 and Table 1). In other words, the protecting DNA did not directly interfere with PLP's chemical reactivity and the DNA afforded protection only when complexed to C65.
Figure 8. Repurification of the unique PLP-modified LysC peptide. The pooled fractions indicated in Figure 7B were concentrated and rechromatographed on a C-18 column using a shallow acetonitrile gradient (see Materials and Methods). The column elution profile for the absorbance at 220 nm is shown as a function of retention time (A). The absorption spectra are shown for peaks 3 (B) and 2 (C). Peak 3 was pooled for sequence analysis.
Figure 9. Specificity of the CB domain binding to a full att site. 20 µM core-binding domain was mixed with 1 µM 32P-labeled 29-mer att site DNA (see Materials and Methods) in the presence or absence of the indicated µM concentrations of homologous (specific) or heterologous (non-specific) competitor in 15 µl binding buffer plus 50 mM NaCl and 0.5% (v/v) NP-40, at 25°C for 20 minutes. The reaction was analyzed by the gel mobility shift assay. In the control with no competitor, 20% of the input DNA was complexed with protein; 2% of this was in an aggregate at the top of the gel and 98% ran as a specific band of the expected mobility. The latter band was normalized to 100% and used to quantitate the sensitivity to competitor.
These experiments established that at least one lysine targeted by PLP is located at the protein interface that binds to the core of att DNAs. The PLP-modified peptide containing this critical lysine was identified on the basis of being unique to the PLP modification carried out in the absence of protecting core DNA. It contained a LysC-resistant lysine, Lys103, that appeared as a gap in the peptide sequence, as expected for the PLP modification (Benesch et al., 1982; Basu et al., 1989b). The identification of only one DNA-sensitive PLP adduct in λ Int is similar to other results obtained with pyridoxal 5′-phosphate (or the closely related compound 5′-diphospho-5′-adenosine) indicating that these compounds react preferentially with the ε-amino group of lysine residues that bind to nucleotides or to DNA. For several RNA and DNA polymerases and for DNA gyrase, the specificity of PLP modification has been demonstrated by kinetic and stoichiometric analyses of inhibition and by correlation of PLP modification sites with known crystal structures and specific biochemical defects (Martial et al.,
1975; Basu et al., 1988; Basu & Modak, 1987; Basu et al., 1989a,b; Tamura & Gellert, 1990). In most of these cases, only one or a very small number of the total lysine residues was modified by PLP and all of the modified residues were located at a protein surface involved in DNA or nucleotide binding. Nevertheless, our results do not exclude a possible role for other lysine residues in core binding, since the local environment of lysine in the uncomplexed proteins can influence its reactivity towards PLP.

Autonomous binding by the CB domain

The deletion analyses reported previously (Tirumalai et al., 1997) and the chemical modification studies described above have established that the domain spanning residues 65 to 169 of λ Int is directly involved in, and essential for, binding to core-type DNA. In order to determine whether this domain is sufficient (on its own) for specific binding to core-type DNA it was cloned, over-expressed and purified. The isolated domain binds efficiently to a 29 bp oligomer containing the full core region, which includes a pair of inverted core-type binding sites separated by the 7 bp overlap region. The specificity of binding was demonstrated by its resistance to competition by an unlabeled heterologous competitor relative to the unlabeled homologous competitor (Figure 9). Thus, this domain is both necessary and sufficient for specific binding to the core-type DNA sites. We propose the name CB domain (central, or core-binding domain) of λ Int, for this region sandwiched between the N-terminal arm-binding domain (residues 1 to 64) and the C-terminal catalytic domain (residues 170 to 356).
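The competition experiment described above comes down to comparing the fraction of labeled DNA found in the protein–DNA complex with and without unlabeled competitor. A minimal sketch of that arithmetic follows; the function names and the band intensities are hypothetical placeholders chosen for illustration, not values taken from Figure 9.

```python
def fraction_bound(complex_signal: float, free_signal: float) -> float:
    """Fraction of labeled DNA found in the shifted (protein-bound) band."""
    total = complex_signal + free_signal
    if total <= 0:
        raise ValueError("total signal must be positive")
    return complex_signal / total


def residual_binding(with_competitor: float, without_competitor: float) -> float:
    """Binding that survives competition, relative to the no-competitor lane."""
    if without_competitor <= 0:
        raise ValueError("reference lane must show some binding")
    return with_competitor / without_competitor


# Hypothetical phosphorimager band intensities (arbitrary units), illustration only.
no_competitor = fraction_bound(complex_signal=600.0, free_signal=400.0)
heterologous = fraction_bound(complex_signal=540.0, free_signal=460.0)
homologous = fraction_bound(complex_signal=120.0, free_signal=880.0)

# Specific binding shows up as more residual binding with the heterologous
# competitor than with the homologous one.
print("heterologous:", residual_binding(heterologous, no_competitor))
print("homologous:  ", residual_binding(homologous, no_competitor))
```

With real data the two signals would be the background-corrected band volumes for the complex and the free DNA in each lane.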
Relationship between the CB and catalytic domains in core recognition

Notwithstanding the photocrosslinking, chemical modification and autonomous binding results reported here for the CB domain, and the inability of the catalytic domain to form electrophoretically stable complexes with core DNA (Tirumalai et al., 1997), the CB domain cannot be the sole determinant of Int binding to core-type DNA. The catalytic domain encompasses the active site of DNA strand breakage and rejoining activities, and it includes a cluster of three highly conserved residues that are thought to activate the scissile phosphate at the site of DNA cleavage. Amino acid substitutions involving any of these residues, Arg212, His308 and Arg311, cause defects in DNA cleavage and/or ligation (Parsons et al., 1988; Wierzbicki et al., 1987; Friesen & Sadowski, 1992; Han et al., 1994) and in λ Int such mutants were shown to have greatly reduced affinity for core-type sites (MacWilliams et al., 1996). All three residues, along with a fourth (His333), are clustered together in the catalytic sites of λ Int, HP1 Int, XerD and Cre recombinases (Kwon et al., 1997; Hickman et al., 1997; Subramanya et al., 1997; Guo et al., 1997). The most
decisive evidence for the role of this quartet in catalysis and binding to core-type DNA comes from the cocrystal structure of Cre with its (core-type) DNA site (Guo et al., 1997). Here it is clearly seen that each member of the quartet makes at least one hydrogen bond with a non-bridging oxygen of the scissile phosphate, in either the pre- or post-cleavage DNA complexes.

Further evidence that the CB domain is not the sole determinant of λ Int binding to core-type DNA comes from the laboratory of Bob Weisberg, in which studies have identified the specific amino acid differences that are responsible for the distinctive recombination specificities of two closely related integrases from bacteriophage λ and HK022 (Yagil et al., 1995; Dorgai et al., 1995). The substitution of five residues in λ Int with the corresponding residues from HK022 switched the recombination specificity from λ to HK022 att sites, which differ from each other only in the core region. One of these residues, Asn99, is located in the CB domain, whereas four others, Ser282, Gly283, Arg287 and Glu319, are in the catalytic domain.

We suggest that there is no real dichotomy between the localization of several core-specificity mutants to the catalytic domain (Dorgai et al., 1995; Yagil et al., 1995) and the localization of core-binding interactions to the CB domain, as reported here. The specificity mutants were defined by their ability to carry out recombination while the interactions identified with the CB domain required stable DNA binding. It is likely that the protein–DNA "fit" required for efficient catalysis (DNA cleavage and ligation) is greater than that required for sequence-specific DNA binding. Furthermore, the discrimination of sequence specificity in the competition binding experiments is far less subtle than the discrimination assayed by recombination efficiency.
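The assignment of individual residues to the CB or catalytic domain follows directly from the domain boundaries used in this paper (residues 1 to 64, 65 to 169 and 170 to 356). As a small illustrative sketch, the lookup below applies those boundaries to the residues discussed in the text; the names `DOMAINS` and `domain_of` are hypothetical and introduced only for this example.

```python
# Domain boundaries of lambda Int as stated in the text (inclusive residue numbers).
DOMAINS = (
    ("arm-binding", 1, 64),
    ("CB", 65, 169),
    ("catalytic", 170, 356),
)


def domain_of(position: int) -> str:
    """Return the name of the domain that contains the given residue number."""
    for name, first, last in DOMAINS:
        if first <= position <= last:
            return name
    raise ValueError(f"residue {position} lies outside lambda Int (1 to 356)")


# Residues mentioned in the text, keyed by label.
RESIDUES = {
    "Asn99": 99, "Lys103": 103, "Ala125": 125, "Ala126": 126,
    "Arg212": 212, "Ser282": 282, "Gly283": 283, "Arg287": 287,
    "His308": 308, "Arg311": 311, "Glu319": 319, "His333": 333,
}

for label, position in RESIDUES.items():
    print(f"{label}: {domain_of(position)}")
```

Grouping the output by domain places the crosslinked and PLP-sensitive residues, together with Asn99, in the CB domain, and the conserved quartet and the remaining four specificity residues in the catalytic domain.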
Considering both the recombination specificity results from the Weisberg laboratory and the biochemical results reported here, it appears that in its interaction with core-type DNA, λ Int derives much of the binding energy and some of the sequence specificity from the CB domain.

Core recognition by different λ Int family members

The C65 domain of λ Int, which has been shorn of the 64-residue arm-type binding domain, is analogous to the full-length Cre and Xer recombinases, which bind exclusively to their sites of strand cleavage (core-type sites) and lack an auxiliary DNA binding domain. According to this alignment, the CB domain and catalytic domains of λ Int are analogous to the amino and carboxy-terminal domains, respectively, of XerD and Cre. In crystal structures of XerD and Cre, an α-helical amino-terminal domain of approximately 100 amino acid residues is joined by a flexible linker to the larger carboxy-terminal domain containing all
of the residues directly involved in DNA cleavage and ligation (Guo et al., 1997; Subramanya et al., 1997). Presumably, the flexible connectors of XerD and Cre correspond to the protease-sensitive site that separates the CB and catalytic domains of λ Int. The cocrystal structure of the Cre–loxA complex shows extensive protein–DNA contacts for both the amino and carboxy-terminal domains, which bind to opposite faces of the double helix and form a clamp around the core DNA (Guo et al., 1997). A similar arrangement has been suggested for the interaction of XerD with its DNA site (Subramanya et al., 1997). The carboxy-terminal domain of Cre interacts with the entire 13 base-pair core-type site plus the first two base-pairs of the overlap region, consistent with the prior finding that this domain binds autonomously to core-type sites (Hoess et al., 1990). This contrasts with the results reported here and previously (Tirumalai et al., 1997) for λ Int showing that the CB domain binds core-type DNA better than the catalytic domain.

The simplest explanation for this apparent difference between λ Int and Cre is that the two proteins have distributed core recognition differently between the two domains. In this regard it should be noted that λ Int and Cre represent the extremes of two different subgroups within the Int family. Cre is a monovalent protein that has evolved a strong core interaction that enables recombination with simple att sites composed only of core-type binding sites. λ Int, on the other hand, is a bivalent protein that has evolved weak core interactions so as to be dependent upon arm-type binding sites and accessory proteins. We suggest that the difference between λ Int and Cre in their overall affinity for core-type sites might be reflected in how the core-type binding energy is distributed, i.e. between the CB (amino) and catalytic (carboxyl) domains.
Further support for this suggestion comes from the observation that the "catalytic domain" of HP1 integrase (i.e. lacking the upstream CB domain) did not have detectable catalytic activity, presumably due to its very low (undetectable) affinity for core-type DNA (Hickman et al., 1997). Thus, HP1 and λ integrases, which are the only bivalent λ Int family members characterized at this level, both have catalytic domains that do not bind well to core-type sites, in contrast to the binding demonstrated for the catalytic domains from the monovalent autonomous integrases Cre and FLP (Hoess et al., 1990; Panigrahi & Sadowski, 1994).

Finally, it should be emphasized that even more striking than the apparent differences between Cre and λ Int is the apparent similarity in their interaction with core-type DNA. The pattern of utilizing two distinct domains for binding to core DNA sites is conserved in each of the five λ Int family members studied at this level. These five recombinases comprise a diverse cross-section of the family: completely autonomous, monovalent, prokaryotic and eukaryotic recombinases (Cre and
FLP); monovalent recombinases that utilize accessory proteins (the Xer family); and bivalent accessory-protein-dependent recombinases (λ and HP1 integrases). Thus, despite the lack of significant homology among λ Int family members in the region corresponding to the CB domain (or amino-terminal domain in monovalent recombinases), it appears that the two-domain structure for interacting with core-type DNA sites is a well-conserved motif. How the binding energy is apportioned between these two domains may depend upon whether the recombinase is monovalent (e.g. Cre and FLP) or bivalent (e.g. λ and HP1 integrases).
Materials and Methods

Materials

Plasmids

The plasmid pRT1 was used to overexpress λ Int protein under the control of a bacteriophage T7 promoter and was the parent for constructing all other expression plasmids used here except for the CB domain expression plasmid. pRT1 was constructed by inserting a 30 base-pair minicistron, or minigene (Schoner et al., 1986), between the T7 promoter and the λ Int gene, in the expression plasmid pLV356-7. pLV356-7 (L. M. de Vargas & A. Landy, unpublished results) contains the λ Int gene (5′ to 3′ of the sense strand) between the unique NdeI and HindIII restriction sites of the expression vector pT7-7 (Tabor & Richardson, 1985). The minigene was synthesized by the polymerase chain reaction (PCR) using the plasmid pLV356-7 as the template. One primer was complementary to the NdeI site and the Int gene at its 5′ end and complementary to the T7 promoter at its 3′ end; it contained the minigene as an intervening non-complementary loop. The minigene sequence contained a diagnostic ClaI restriction site. The other primer was complementary to the pT7-7 sequence and upstream of the unique XbaI restriction site. The 310 bp PCR product was purified using the Qiagen PCR purification kit (Qiagen Inc., Chatsworth, CA) and digested with NdeI and XbaI. The 69 bp NdeI-XbaI fragment was inserted into pLV356-7, between its unique NdeI and XbaI restriction sites. The resulting plasmid was sequenced to confirm the presence and sequence of the minigene (Sanger et al., 1977). pRT2 was made by swapping the Int gene of pRT1 with the IntY342F gene (called IntF) (Pargellis et al., 1988). The minigene resulted in a 20 to 25-fold enhancement in the expression levels of these proteins (data not shown).
DNA segments encoding different portions of Int or IntF were made as described and cloned into the pRT1 backbone to obtain the following plasmids: residues 65 to 356 are called C65 and C65F in pRT3 and pRT4, respectively, and residues 170 to 356 are called C170 and C170F in pRT7 and pRT8, respectively (Tirumalai et al., 1997). The Int gene fragment containing the central domain (CB domain) encompassed residues 62 to 177 and was made by PCR using the λ Int gene as the template. The promoter-proximal primer had a unique NdeI site at the region coding for Lys62 and the distal primer had a unique EcoRI site downstream of the stop codon following Arg177. The purified PCR product (Qiagen PCR kit) was digested with the restriction enzymes NdeI and EcoRI, further purified by gel electrophoresis through 1.8% agarose (w/v) and introduced into a pRSET-based expression vector (Invitrogen, Carlsbad, CA), under the control of a T7 promoter, at its multicloning NdeI-EcoRI restriction sites. The nucleotide sequence of the PCR-generated CB coding region was confirmed by nucleotide sequencing.

All of the expression plasmids were introduced by transformation into the host E. coli BL21 (DE3, pLysS), which contains T7 RNA polymerase under the control of a lac promoter (Studier & Moffatt, 1986). To overexpress the protein, the E. coli BL21 (DE3, pLysS) cells carrying the appropriate expression plasmid were grown at 37 °C to an A600 value of 0.4 to 0.6, at which point protein expression was induced by the addition of IPTG to a final concentration of 1.0 mM and the cells were grown for a further three hours. The cells were pelleted by centrifugation at 5000 g for ten minutes, washed with 50 mM Tris-HCl (pH 8.0), 10% (w/v) sucrose, 1 mM DTT, 1 mM EDTA and suspended in an equal amount (w/v) of 50 mM Tris-HCl (pH 8.0), 10% (v/v) glycerol, 1 mM EDTA and 1 mM DTT (lysis buffer). The cell suspension was frozen as droplets directly in liquid nitrogen and the resulting "popcorn" was stored at −80 °C.

Oligonucleotides

The synthetic oligonucleotides were either obtained from Operon Technologies, CA or synthesized on an Applied Biosystems 380A DNA synthesizer. They were first purified by HPLC and then by gel electrophoresis. When necessary, the DNA fragments were labeled at their 5′-OH termini with [γ-32P]ATP, using bacteriophage T4 polynucleotide kinase (Sambrook et al., 1989). The sequences were as follows:

(1) 30/34-mer single core-type site (half-att suicide substrate): top strand, 5′-AAGCTGAAGATCTTCTCGAGCAGCTTTCTA-3′; bottom strand, 3′-TTCGACTTCTAGAAGAGCTCGTCGAAAGATCTTG-5′.
(2) 30/34-mer heterologous competitor for the single core-type site: top strand, 5′-AAGCTGAAGATCTTCTCGAGACGTTCTCTA-3′; bottom strand, 3′-TTCGACTTCTAGAAGAGCTCTGCAAGAGATCTTG-5′.
(3) 29-mer full att site: top strand, 5′-CGTTCAGCTTTTTTATACTAAGTTGGCAT-3′; bottom strand, 3′-GCAAGTCGAAAAAATATGATTCAACCGTA-5′.
(4) 29-mer heterologous competitor for full att site: top strand, 5′-TGAACAGGTCACTATCAGTCAAAATAAAA-3′; bottom strand, 3′-ACTTGTCCAGTGATAGTCAGTTTTATTTT-5′.

Methods

Protein purification

The full-length Int and IntF proteins were expressed as insoluble proteins. They were purified from the insoluble fraction by a modification of the method used for the σ subunit of RNA polymerase (Gribskov & Burgess, 1983) as follows. The frozen cells (popcorn) were added to an equal weight of ice-cold lysis buffer (final 1 g wet
weight of cells per 3 ml lysis buffer) and thawed on ice for 30 minutes. To this lysate, protease-free RNase (Boehringer-Mannheim) was added at a concentration of ten units per g wet weight of the cells, and it was then incubated on ice for 30 minutes. At the end of 30 minutes, MgCl2 (10 mM) and CaCl2 (1 mM) were added and the lysate was digested with protease-free DNase (five units per g wet weight of cells) (Worthington) for 30 minutes. The digestion was terminated by the addition of EDTA to a final concentration of 20 mM. To this mixture NaCl was added to a final concentration of 0.6 M, and it was then briefly sonicated on ice for one minute to ensure a non-viscous homogenate. The sonicated homogenate was centrifuged at 12,000 g for 30 minutes. The resulting pellet was resuspended in lysis buffer plus 0.6 M NaCl containing 6 M guanidinium chloride (1 ml per g wet weight of cells) and briefly homogenized with a Potter-Elvehjem type homogenizer. The homogenate was centrifuged at 12,000 g for 30 minutes and the Int protein was renatured from the clear supernatant as follows. The supernatant was serially diluted with equal volumes of lysis buffer plus 0.6 M NaCl containing no denaturant to a final guanidinium chloride concentration of 120 mM. There was a five minute interval between additions and the mixture was gently stirred during and between additions. At this step it was finally diluted with lysis buffer without NaCl and guanidinium chloride, to a final NaCl concentration of 0.3 M and a guanidinium chloride concentration of 60 mM. This mixture was centrifuged at 12,000 g for 30 minutes and the supernatant was mixed with phosphocellulose (1 ml swollen matrix per g wet weight of cells) equilibrated with the same buffer (lysis buffer plus 0.3 M NaCl), overnight in the cold (0 to 4 °C). The phosphocellulose–protein mixture was poured into a column and washed with equilibration buffer.
The proteins were eluted with a gradient of NaCl (0.3 M to 2.0 M), analyzed by electrophoresis through 15% (w/v) polyacrylamide/0.1% (w/v) SDS (Laemmli, 1970) and stained with Coomassie blue. The active fractions were pooled and chromatographed on a SP-Sepharose column, developed with a NaCl gradient (0.3 M to 2.0 M). The active fractions were pooled and subjected to chromatography on a hydroxylapatite column, developed with a sodium phosphate (pH 8.0) gradient (10 mM to 100 mM). The proteins, after this step, were greater than 95% pure as judged by Coomassie or silver staining of an overloaded SDS-polyacrylamide gel (not shown). Protein concentration was estimated by the dye binding method (Bradford, 1976).

The C65, C170 and CB domains were expressed as soluble proteins and the initial purification steps were as follows. Frozen cells (popcorn) were diluted into an equal amount (w/v) of ice-cold lysis buffer (final 1 g wet weight of cells per 3 ml lysis buffer) and thawed on ice for 30 minutes. NaCl was added to a final concentration of 0.6 M and the lysate was briefly sonicated, until non-viscous. The homogenate was centrifuged at 12,000 g for 30 minutes. To the resulting supernatant a 40% stock solution of protamine sulfate was added to a final concentration of 10% to precipitate the DNA, on ice for 30 minutes with gentle stirring. The mixture was centrifuged at 12,000 g for 30 minutes. The resulting supernatant was diluted to a NaCl concentration of 0.3 M and subjected to anion-exchange chromatography on phosphocellulose. The column was developed with a gradient of NaCl from 0.3 M to 2.0 M. The remaining purification steps were as described above for full-length Int.
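The dilution steps in the renaturation of full-length Int can be checked with the usual relation C1·V1 = C2·V2. The sketch below is illustrative only: the function names are hypothetical, and the last step assumes that one volume of buffer lacking NaCl and guanidinium chloride was added per volume of mixture, which is what the stated halving of both concentrations implies.

```python
def diluted_concentration(c_initial: float, v_initial: float, v_added: float) -> float:
    """Concentration after adding v_added of solute-free buffer (C1*V1 = C2*V2)."""
    if v_initial <= 0 or v_added < 0:
        raise ValueError("volumes must be positive")
    return c_initial * v_initial / (v_initial + v_added)


def fold_dilution(c_initial: float, c_final: float) -> float:
    """Overall dilution factor needed to go from c_initial to c_final."""
    if c_initial <= 0 or c_final <= 0:
        raise ValueError("concentrations must be positive")
    return c_initial / c_final


# Renaturation: 6 M guanidinium chloride is brought stepwise to 120 mM.
print("overall fold dilution:", fold_dilution(6.0, 0.120))

# Final step (assumed 1:1 with buffer lacking NaCl and guanidinium chloride).
nacl_final = diluted_concentration(c_initial=0.6, v_initial=1.0, v_added=1.0)
gdn_final = diluted_concentration(c_initial=0.120, v_initial=1.0, v_added=1.0)
print("NaCl (M):", nacl_final)
print("guanidinium chloride (M):", gdn_final)
```

The same helper covers the dilution of the soluble-domain supernatant from 0.6 M to 0.3 M NaCl before phosphocellulose chromatography.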
Int cleavage and binding assays

The half-att site suicide substrate was labeled at the 5′-OH of its protruding bottom strand with [γ-32P]ATP. The cleavage reactions were carried out in a 120 μl reaction mixture consisting of 50 mM Mops-NaOH (pH 7.4), 100 mM NaCl, 10% (v/v) glycerol, 5 mM EDTA, 1 mM DTT (assay buffer) and indicated amounts of suicide substrate and enzyme. The reaction mix was incubated at 25 °C and at indicated times aliquots were withdrawn into 0.1% SDS and analyzed by electrophoresis through 15% (w/v) polyacrylamide, 0.1% (w/v) SDS (Laemmli, 1970). The gels were autoradiographed and quantitated by scanning using a phosphorimager (Fujix, Bas 1000).

Gel mobility shift assays were carried out in a volume of 15 μl. The reaction mixture consisted of 10 mM Tris-HCl (pH 7.5), 100 mM NaCl, 1 mM EDTA, 1 mM DTT, 10% Ficoll (binding buffer) and indicated amounts of protein (C65F or CB domain) and DNA. The protein was mixed with the 32P-labeled core-type DNA in the absence or presence of unlabeled competitor DNA, incubated at room temperature for ten minutes and loaded onto a native 8% (w/v) polyacrylamide gel (acrylamide:bis ratio, 30:1, w/w). The gel was run at 50 volts overnight. The gels were autoradiographed and bands corresponding to protein–DNA complex and DNA were quantitated by a Fuji phosphorimager.

UV photocrosslinking

The indicated amounts of C65F were incubated with the indicated amounts of 32P-labeled or unlabeled core-type DNA in 20 μl (analytical scale) or 1 ml (preparative scale) of 10 mM Tris-HCl (pH 7.5), 100 mM NaCl, 10% (v/v) glycerol, 1 mM EDTA, and 1 mM DTT (Buffer A) at room temperature for 20 minutes and on ice for ten minutes. The protein–DNA complex was layered on a pre-cooled microtiter plate (analytical) or plastic Petri plate (preparative scale) and irradiated with UV light (254 nm, Stratagene Crosslinker), on ice, for 20 minutes.
The reaction was quenched by the addition of SDS to 0.1% (w/v) and analyzed by electrophoresis through 15% (w/v) polyacrylamide, 0.1% (w/v) SDS (Laemmli, 1970). The gels were: (i) stained with Coomassie brilliant blue to visualize the protein bands; (ii) also subjected to autoradiography to visualize the protein–DNA complex and quantitated using a phosphorimager.

GluC digestion

The crosslinked protein–DNA complex, after separation from non-crosslinked protein and DNA (see Figure 3), was concentrated 200-fold by centrifugal concentration on Centricon-30 membranes, washed three times with 100 mM ammonium bicarbonate (pH 8.0) and denatured with 8 M urea in the same buffer (250 μl). This was then diluted to a final concentration of 2 M urea in the same buffer and GluC was added at a protein to protease ratio of 50:1 (w/w) and digested for 12 hours at 37 °C.

Anion-exchange HPLC of GluC peptides

The GluC digest of C65F photocrosslinked to core-type DNA was resolved by anion-exchange HPLC on a Sychrom AX 300 (250 mm × 4.6 mm) column, equilibrated with 20 mM sodium phosphate (pH 6.8), 20% (v/v) acetonitrile (equilibration buffer). The peptides were
eluted by increasing the concentration of NaCl in the equilibration buffer as follows: 0 to 30 minutes, no salt; 30 to 90 minutes, a gradient from 0 to 1.0 M NaCl; 90 to 110 minutes, 1 M NaCl. The flow rate was 1.0 ml/min. All HPLC analyses were conducted using a Varian 9012 inert solvent delivery system equipped with a polychrome 9065 diode array detector. Peptides were monitored at 254 nm.

Reverse-phase HPLC purification of GluC peptides

The two peaks (I and II) obtained by anion-exchange HPLC were individually subjected to reverse-phase HPLC on a C18 column (Vydac) under ion-pairing conditions. The column was equilibrated with 10 mM triethylammonium acetate (TEAA) (pH 7.0) for ten minutes. After a five minute wash with HPLC-grade water, the peptide(s) were eluted by increasing the concentration of acetonitrile in water from 0 to 50% over the period from 15 to 50 minutes. The peptides were monitored at 254 nm.

Modification by pyridoxal 5′-phosphate

C65F at a final concentration of 6.25 μM in 20 μl of PLP modification buffer (20 mM Hepes-KOH (pH 7.5), 10% glycerol, 100 mM NaCl, 1 mM DTT, 1 mM EDTA) was reacted with the indicated concentrations of PLP at 25 °C for 30 minutes. Some reactions (Table 1) were in 0.3 M or 0.5 M NaCl. The reaction was terminated by the addition of a freshly prepared solution of ice-cold NaBH4 (in 5 mM NaOH) to a final concentration of 10 mM. The tubes were kept on ice for 30 minutes after the addition of NaBH4.

LysC digestion

C65F protein, unmodified or modified with PLP in the presence or absence of DNA, was precipitated by the addition of ice-cold 100% TCA to a final concentration of 10%. After standing on ice for one hour the samples were centrifuged in the cold at 15,000 g for ten minutes. The precipitate was washed twice with 5% TCA and with 95% ethanol to remove traces of TCA and dried. The samples were suspended in 50 μl of 50 mM ammonium bicarbonate (pH 8.0), 8 M urea.
They were then diluted to a final urea concentration of 2 M and digested by the addition of endoproteinase LysC, at a protein:protease ratio of 50:1 (w/w), for 12 hours at 37 °C.

Reverse-phase HPLC of PLP-modified peptides

The LysC digest of C65F, unmodified or modified with PLP in the absence or presence of core-type DNA, was resolved by reverse-phase HPLC on a C18 column (Vydac), equilibrated with 0.1% TFA. The peptides were eluted by increasing the concentration of "acetonitrile buffer" (70% acetonitrile (v/v) in 0.1% TFA) as follows: 0 to 90 minutes, 0 to 40%; 90 to 120 minutes, 40 to 70%; 120 to 140 minutes, 70 to 100% acetonitrile buffer. The column was run at a rate of 0.7 ml/min, and monitored at 220 nm (peptide bond) and 325 nm (reduced pyridoxal 5′-phosphate). The region around 85 minutes, containing the pyridoxylated peptide, was pooled and further purified by reverse-phase HPLC on the same C18 column. The column was equilibrated with 0.1% TFA and a shallow gradient of acetonitrile in 0.1% TFA was applied as follows: 0 to 10 minutes, 0 to 20%; 10 to 90 minutes,
20 to 40%. The column was run at a flow rate of 0.7 ml/min and monitored at 220 and 325 nm.

Peptide sequencing

N-terminal amino acid sequencing of peptides using the Edman degradation reaction (Edman & Begg, 1967) was performed using an Applied Biosystems 470A gas phase sequenator at the W.M. Keck Foundation Biotechnology Resource Laboratory, Yale University, New Haven, CT. The resulting phenylthiohydantoin derivatives were analyzed using an on-line Applied Biosystems model 470A microbore HPLC.
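The first reverse-phase gradient used for the LysC digest (0 to 90 minutes, 0 to 40%; 90 to 120 minutes, 40 to 70%; 120 to 140 minutes, 70 to 100% acetonitrile buffer) can be turned into an acetonitrile concentration at any time point. The sketch below assumes that each segment is linear, which the text does not state explicitly; the names `FIRST_GRADIENT`, `percent_buffer` and `percent_acetonitrile` are hypothetical and used only for this illustration.

```python
# "Acetonitrile buffer" is 70% (v/v) acetonitrile in 0.1% TFA, as stated in the text.
ACETONITRILE_IN_BUFFER = 0.70

# (time in minutes, % acetonitrile buffer) breakpoints of the first gradient.
FIRST_GRADIENT = ((0.0, 0.0), (90.0, 40.0), (120.0, 70.0), (140.0, 100.0))


def percent_buffer(t_min: float, program=FIRST_GRADIENT) -> float:
    """Percent acetonitrile buffer at time t_min, assuming linear segments."""
    if not program[0][0] <= t_min <= program[-1][0]:
        raise ValueError("time lies outside the gradient program")
    for (t0, p0), (t1, p1) in zip(program, program[1:]):
        if t0 <= t_min <= t1:
            return p0 + (p1 - p0) * (t_min - t0) / (t1 - t0)
    raise ValueError("gradient program is not contiguous")


def percent_acetonitrile(t_min: float, program=FIRST_GRADIENT) -> float:
    """Percent (v/v) acetonitrile in the mobile phase at time t_min."""
    return percent_buffer(t_min, program) * ACETONITRILE_IN_BUFFER


# The pyridoxylated peptide was pooled from the region around 85 minutes.
print("buffer at 85 min (%):", round(percent_buffer(85.0), 1))
print("acetonitrile at 85 min (%):", round(percent_acetonitrile(85.0), 1))
```

Under the linearity assumption, the pooled region corresponds to a little under 40% acetonitrile buffer, that is, somewhat under 28% (v/v) acetonitrile.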
Acknowledgments We thank T. Oliveira for technical assistance, J. Boyles for assistance with manuscript preparation and other members of our research groups for their assistance and comments. We thank Bob Weisberg for communication of results prior to publication. The peptide sequence analyses were carried out by the W. M. Keck Biotechnology Resource Laboratory at Yale University, New Haven, CT. This work was supported by NIH grants GM33928 and AI13544 (A.L.), by the Lucille P. Markey Charitable Trust (T.E.) and a Howard Hughes Medical Institute Predoctoral Fellowship (H.J.K.).
References

Abremski, K. E. & Hoess, R. H. (1992). Evidence for a second conserved arginine residue in the integrase family of recombination proteins. Protein Eng. 5, 87–91.
Allen, G. (1989). Sequencing of Proteins and Peptides, Elsevier, Oxford.
Argos, W., Landy, A., Abremski, K., Egan, J. B., Haggård-Ljungquist, E., Hoess, R. H., Kahn, M. L., Kalionis, W., Narayana, S. V. L., Pierson, L. S. I., Sternberg, N. & Leong, J. M. (1986). The integrase family of site-specific recombinases: regional similarities and global diversity. EMBO J. 5, 433–440.
Basu, A. & Modak, M. J. (1987). Identification and amino acid sequence of the deoxynucleoside triphosphate binding site in Escherichia coli DNA polymerase I. Biochemistry, 26, 1704–1709.
Basu, A., Kedar, P., Wilson, S. H. & Modak, M. J. (1989a). Active-site modification of mammalian DNA polymerase β with pyridoxal 5′-phosphate: mechanism of inhibition and identification of lysine 71 in the deoxynucleoside binding pocket. Biochemistry, 28, 6305–6309.
Basu, A., Tirumalai, R. S. & Modak, M. J. (1989b). Substrate binding in human immunodeficiency virus reverse transcriptase. J. Biol. Chem. 264, 8746–8752.
Basu, A., Ahluwalia, K. K., Basu, S. & Modak, M. J. (1992). Identification of the primer binding domain in human immunodeficiency virus reverse transcriptase. Biochemistry, 31, 616–623.
Basu, S., Basu, A. & Modak, M. J. (1988). Pyridoxal 5′-phosphate mediated inactivation of Escherichia coli DNA polymerase I: identification of lysine-635 as an essential residue for the processive mode of DNA synthesis. Biochemistry, 27, 6710–6716.
Benesch, R., Benesch, R. E., Kwong, S., Acharya, A. S. & Manning, J. M. (1982). Labeling of hemoglobin with pyridoxal phosphate. J. Biol. Chem. 257, 1320–1324.
Blakely, G. W. & Sherratt, D. J. (1996). Cis and trans in site-specific recombination. Mol. Microbiol. 20, 234–237.
Bradford, M. M. (1976). A rapid and sensitive method for the quantitation of microgram quantities of protein utilizing the principle of protein-dye binding. Anal. Biochem. 72, 248–254.
Campbell, A. M. (1962). Episomes. Advan. Genet. 11, 101–145.
Chen, J.-W., Lee, J. & Jayaram, M. (1992). DNA cleavage in trans by the active site tyrosine during Flp recombination: Switching protein partners before exchanging strands. Cell, 69, 647–658.
Dorgai, L., Yagil, E. & Weisberg, R. A. (1995). Identifying determinants of recombination specificity: Construction and characterization of mutant bacteriophage integrases. J. Mol. Biol. 252, 178–188.
Dyda, F., Hickman, A. B., Jenkins, T. M., Engelman, A., Craigie, R. & Davies, D. R. (1994). Crystal structure of the catalytic domain of HIV-1 integrase: Similarity to other polynucleotidyl transferases. Science, 266, 1981–1986.
Edman, P. & Begg, G. (1967). A protein sequenator. Eur. J. Biochem. 1, 80–91.
Esposito, D. & Scocca, J. J. (1997). The integrase family of tyrosine recombinases: evolution of a conserved active site domain. Nucl. Acids Res. 25, 3605–3614.
Friesen, H. & Sadowski, P. D. (1992). Mutagenesis of a conserved region of the gene encoding the FLP recombinase of Saccharomyces cerevisiae. J. Mol. Biol. 225, 313–326.
Gribskov, M. & Burgess, R. R. (1983). Overexpression and purification of the sigma subunit of Escherichia coli RNA polymerase. Gene, 26, 109–118.
Guo, F., Gopaul, D. N. & Van Duyne, G. D. (1997). Structure of Cre recombinase complexed with DNA in a site-specific recombination synapse. Nature, 389, 40–46.
Han, Y. W., Gumport, R. I. & Gardner, J. F. (1994). Mapping the functional domains of bacteriophage lambda integrase protein. J. Mol. Biol. 235, 908–925.
Hickman, A. B., Waninger, S., Scocca, J. J. & Dyda, F. (1997). Molecular organization in site-specific recombination: The catalytic domain of bacteriophage HP1 integrase at 2.7 Å resolution. Cell, 89, 227–237.
Hoess, R., Abremski, K., Irwin, S., Kendall, M. & Mack, A. (1990). DNA specificity of the Cre recombinase resides in the 25 kDa carboxyl domain of the protein. J. Mol. Biol. 216, 873–882.
Kikuchi, Y. & Nash, H. A. (1978). The bacteriophage λ int gene product. J. Biol. Chem. 253, 7149–7157.
Kim, S., Moitoso de Vargas, L., Nunes-Düby, S. E. & Landy, A. (1990). Mapping of a higher order protein-DNA complex: Two kinds of long-range interactions in λ attL. Cell, 63, 773–781.
Klemm, P. (1986). Two regulatory fim genes, fimB and fimE, control the phase variation of type 1 fimbriae in Escherichia coli. EMBO J. 5, 1389–1393.
Kwon, H. J., Tirumalai, R. S., Landy, A. & Ellenberger, T. (1997). Flexibility in DNA recombination: Structure of the λ integrase catalytic core. Science, 276, 126–131.
Laemmli, U. K. (1970). Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature, 227, 680–685.
Landy, A. (1993). Mechanistic and structural complexity in the site-specific recombination pathways of Int and FLP. Curr. Opin. Genet. Dev. 3, 699–707.
Lange-Gustafson, B. J. & Nash, H. A. (1984). Purification and properties of Int-H, a variant protein involved in site-specific recombination of bacteriophage λ. J. Biol. Chem. 259, 12724–12732.
Leong, J. M., Nunes-Düby, S., Oser, A., Youderian, P., Susskind, M. M. & Landy, A. (1984). Site-specific recombination systems of phages φ80 and P22: binding sites of integration host factor and recombination-induced mutations. Cold Spring Harbor Symp. Quant. Biol. 49, 707–714.
MacWilliams, M. P., Gumport, R. I. & Gardner, J. F. (1996). Genetic analysis of the bacteriophage λ attL nucleoprotein complex. Genetics, 143, 1069–1079.
Martial, J., Zaldivar, J., Bull, P., Venegas, A. & Valenzuela, P. (1975). Inactivation of rat liver RNA polymerases I and II and yeast RNA polymerase I by pyridoxal 5′-phosphate. Evidence for the participation of lysyl residues at the active site. Biochemistry, 14, 4907–4911.
Moitoso de Vargas, L., Pargellis, C. A., Hasan, N. M., Bushman, E. W. & Landy, A. (1988). Autonomous DNA binding domains of λ integrase recognize different sequence families. Cell, 54, 923–929.
Moitoso de Vargas, L., Kim, S. & Landy, A. (1989). DNA looping generated by the DNA-bending protein IHF and the two domains of λ integrase. Science, 244, 1457–1461.
Nash, H. A. (1996). Site-specific recombination: Integration, excision, resolution, and inversion of defined DNA segments. In Escherichia coli and Salmonella (Neidhardt, F. C., Curtiss, R., III, Ingraham, J. L., Lin, E. C. C., Low, K. B., Magasanik, B., Reznikoff, W. S., Riley, M., Schaechter, M. & Umbarger, H. E., eds), pp. 2363–2376, ASM Press, Washington.
Nunes-Düby, S. E., Matsumoto, L. & Landy, A. (1987). Site-specific recombination intermediates trapped with suicide substrates. Cell, 50, 779–788.
Nunes-Düby, S., Tirumalai, R. S., Kwon, H. J., Ellenberger, T. & Landy, A. (1998). Similarities and differences among 105 members of the Int family of site-specific recombinases. Nucl. Acids Res. 26, 391–406.
Pan, G., Luetke, K. & Sadowski, P. D. (1993). Mechanism of cleavage and ligation by the FLP recombinase: Classification of mutations in the FLP protein using in vitro complementation analysis. Mol. Cell. Biol. 13, 3167–3175.
Panigrahi, G. B. & Sadowski, P. D. (1994). Interaction of the NH2- and COOH-terminal domains of the FLP recombinase with the FLP recognition target sequence. J. Biol. Chem. 269, 10940–10945.
Pargellis, C. A., Nunes-Düby, S. E., Moitoso de Vargas, L. & Landy, A. (1988). Suicide recombination substrates yield covalent λ integrase–DNA complexes and lead to identification of the active site tyrosine. J. Biol. Chem. 263, 7678–7685.
Parsons, R. L., Prasad, P. V., Harshey, R. M. & Jayaram, M. (1988). Step-arrest mutants of FLP recombinase: Implications for the catalytic mechanism of DNA recombination. Mol. Cell. Biol. 8, 3303–3310.
Richet, E., Abcarian, P. & Nash, H. A. (1988). Synapsis of attachment sites during lambda integrative recombination involves capture of a naked DNA by a protein-DNA complex. Cell, 52, 9–17.
Ross, W., Landy, A., Kikuchi, Y. & Nash, H. (1979). Interaction of Int protein with specific sites on λ att DNA. Cell, 18, 297–307.
Sadowski, P. D. (1993). Site-specific genetic recombination: hops, flips and flops. FASEB J. 7, 760–767.
Sambrook, J., Fritsch, E. F. & Maniatis, T. (1989). Molecular Cloning, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York.
Sanger, F., Nicklen, S. & Coulson, A. R. (1977). DNA sequencing with chain terminating inhibitors. Proc. Natl Acad. Sci. USA, 74, 5463–5467.
Schoner, B. E., Belagaje, R. M. & Schoner, R. G. (1986). Translation of a synthetic two-cistron mRNA in Escherichia coli. Proc. Natl Acad. Sci. USA, 83, 8506–8510.
Stark, W. M., Boocock, M. R. & Sherratt, D. J. (1992). Catalysis by site-specific recombinases. Trends Genet. 8, 432–439.
Studier, F. W. & Moffatt, B. A. (1986). Use of bacteriophage T7 RNA polymerase to direct selective high-level expression of cloned genes. J. Mol. Biol. 189, 113–130.
Subramanya, H. S., Arciszewska, L. K., Baker, R. A., Bird, L. E., Sherratt, D. J. & Wigley, D. B. (1997). Crystal structure of the site-specific recombinase, XerD. EMBO J. 16, 5178–5187.
Tabor, S. & Richardson, C. C. (1985). A bacteriophage T7 RNA polymerase/promoter system for controlled exclusive expression of specific genes. Proc. Natl Acad. Sci. USA, 82, 1074–1078.
Tamura, J. K. & Gellert, M. (1990). Characterization of the ATP binding site on Escherichia coli DNA gyrase. J. Biol. Chem. 265, 21342–21349.
Tirumalai, R. S., Healey, E. & Landy, A. (1997). The catalytic domain of λ site-specific recombinase. Proc. Natl Acad. Sci. USA, 94, 6104–6109.
Wierzbicki, A., Kendall, M., Abremski, K. & Hoess, R. (1987). A mutational analysis of the bacteriophage P1 recombinase Cre. J. Mol. Biol. 195, 785–794.
Williams, K. R. & Konigsberg, W. H. (1991). Identification of amino acid residues at interface of protein-nucleic acid complexes by photochemical crosslinking. Methods Enzymol. 208, 516–539.
Yagil, E., Dorgai, L. & Weisberg, R. (1995). Identifying determinants of recombination specificity: Construction and characterization of chimeric bacteriophage integrases. J. Mol. Biol. 252, 163–177.
Edited by M. Gottesman (Received 12 January 1998; received in revised form 18 February 1998; accepted 2 March 1998)