J. Mol. Biol. (1986) 189, 603-616
Structural and Regulatory Divergence Among Site-specific Recombination Genes of Lambdoid Phage

John M. Leong¹, Simone E. Nunes-Düby¹, Allen B. Oser¹, Cammie F. Lesser¹, Philip Youderian²†, Miriam M. Susskind²† and Arthur Landy¹‡

¹Division of Biology and Medicine, Brown University, Providence, RI 02912, U.S.A.
²Department of Molecular Genetics and Microbiology, University of Massachusetts, Worcester, MA 01605, U.S.A.

(Received 11 March 1985, and in revised form 28 January 1986)

The lambdoid bacteriophage φ80 and P22 have site-specific recombination systems similar to that of λ. Each of the three phage has a different insertion specificity, but structural analysis of their attachment sites suggests that the three recombination pathways share similar features. In this study, we have identified and sequenced the int and xis genes of φ80 and P22. φ80 int and xis were identified using a plasmid recombination assay in vivo, and the P22 genes were mapped using Tn1 insertion mutations. In all three phage, the site-specific recombination genes are located directly adjacent to the phage attachment site. Interestingly, the transcriptional orientation of the φ80 int gene is opposite to that of λ and P22 int, resulting in convergent transcription of φ80 int and xis. Because of its transcriptional orientation, φ80 int cannot be expressed by the major leftward promoter, PL, and the regulatory strategy of φ80 integration and excision must differ significantly from that of λ. The deduced amino acid sequences of the recombination proteins of the three systems show surprisingly little homology. Sequences homologous to the λ PI promoter are more conserved than the protein-coding sequences. Nevertheless, the Int proteins are locally related in the C-terminal sequences, particularly for a stretch of some 25 amino acid residues that lie approximately 50 residues from the C terminus. The Xis proteins can be aligned at their N termini.
1. Introduction
Integration and excision of a lambdoid phage genome during lysogeny and induction occur by reciprocal site-specific recombination events between specific attachment (att) sites on the phage and bacterial chromosomes (Campbell, 1962). We have been studying these recombination events in λ, φ80 and P22, three phage that integrate at different specific loci on the host chromosome (Matsushiro, 1963; Jessop, 1976; Chan & Botstein, 1976). The three phage share extensive nucleotide homology and similar genetic organization; recombinants viable in vivo have been isolated (for a review, see Campbell & Botstein, 1983). For each phage, a phage-encoded integrase (Int) protein and a host-encoded integration host factor (IHF)§ are required for integrative recombination (Franklin et al., 1965; Signer & Beckwith, 1966; Smith & Levine, 1967; Kikuchi & Nash, 1978; Miller & Friedman, 1977; Miller et al., 1979). Excision additionally requires the phage-encoded excisionase (Xis) protein (Guarneros & Echols, 1970; see below). Therefore, despite their different site specificities, these three recombination systems catalyze similar recombination events.

The partners in the integration event are the phage and the bacterial attachment sites, attP and attB, respectively. In each of the three systems attP and attB share a small (15 to 46 bp) region of homology, termed the “common core” (Landy & Ross, 1977; Leong et al., 1985a). The so-called “arm” sequences that lie on the left and right side of the attP core are termed P and P′, respectively. Similarly, the flanking sequences in attB are termed B and B′. The crossover event occurs within the common core, thus preserving an intact common core in each of the att sites that flank the integrated prophage, attL (BOP′) and attR (POB′). Site-specific recombination between attL and attR results in prophage excision, and the regeneration of attP and attB.

† Present address: Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, U.S.A.
‡ Author to whom reprint requests should be addressed.
§ Abbreviations used: IHF, integration host factor; kb, 10³ base-pairs; bp, base-pairs.
0022-2836/86/120603-14 $03.00/0 © 1986 Academic Press Inc. (London) Ltd.
Of the three site-specific recombination systems, λ has been studied most extensively (for reviews, see Nash, 1981; Campbell, 1983; Weisberg & Landy, 1983). The λ int gene lies to the right of attP (see Fig. 9), and codes for a 40.3 × 10³ Mr basic protein (Hoess et al., 1980; Davies, 1980) that has topoisomerase activity (Kikuchi & Nash, 1979) and is responsible for the actual cleavage and rejoining during strand-exchange (Craig & Nash, 1983). λ Int protein binds two unrelated families of recognition sequences; one family is found in the att site core regions and the other is found in the flanking arms, P and P′ (Hsu et al., 1980; Ross & Landy, 1982, 1983). The flanking arms also have binding sites for IHF (Craig & Nash, 1984), and several observations suggest the formation of a large recombinogenic protein-DNA complex that Better et al. (1982) have called an “intasome” (see also Pollock & Nash, 1984). The λ xis gene transcriptionally precedes and partially overlaps the int gene (see Fig. 9) (Hoess et al., 1980; Davies, 1980). Xis is an 8.6 × 10³ Mr basic protein that binds co-operatively to two tandemly repeated sites in the P arm (Yin et al., 1985). This binding stimulates Int binding to an adjacent site (Bushman et al., 1984) and is required for the formation of a large complex involving Int, Xis and attR (Better et al., 1983). In addition to being required for normal excisive recombination, Xis is inhibitory to integrative recombination (Nash, 1975; Abremski & Gottesman, 1982). Thus, it is not surprising that the levels of Xis and Int are controlled by an elaborate set of mechanisms that includes the transcriptional activator cII protein, the antiterminator N protein and a cis-acting negative regulatory sequence, sib (for a review, see Echols & Guarneros, 1983). These controls result in the expression of int only when required for integration or excision, and the repression of xis during phage integration (see also below).
Despite a general lack of sequence homology among the att sites of φ80, P22 and λ, several features suggest a similarity in the mechanisms of recombination in the three systems. In the λ system, the Int molecules that specifically bind to the core region recognize two sequences present as an inverted repetition (Ross & Landy, 1983), and in executing strand exchange these molecules make 7-bp staggered nicks within the core (Mizuuchi et al., 1981; Craig & Nash, 1983). Although the functional features of the φ80 and P22 systems are less well-defined, all available information is consistent with very similar mechanisms for strand exchange: genetic evidence suggests that φ80 Int generates staggered nicks within the core that are between 5 bp and 9 bp apart (Leong et al., 1985b), and the DNA sequences of the φ80 and P22 core regions reveal inverted repetitions that are likely candidates for Int recognition sequences (Leong et al., 1985a). Furthermore, the same host accessory protein, IHF, has been implicated in all three systems (Miller & Friedman, 1977; R. Weisberg, personal communication). As is the case for λ, IHF specifically binds to φ80 and P22 att DNA; although the number of attP IHF sites differs for the three phage, the relative spacing and orientation of sites are strikingly conserved (Craig & Nash, 1984; Leong et al., 1985a).

The site-specific recombination proteins of the three closely related phage are probably responsible for mechanistically similar events with different specificities. To begin to understand the nature of this specificity at the level of protein structure, we have identified and sequenced the int and xis genes of P22 and φ80. Given the similarities of the three systems, we find that the primary structures of their proteins are, in general, remarkably different. Interesting exceptions to this global divergence are apparent only when local homologies are used to align all three proteins in a few limited regions. Furthermore, although the events catalyzed by these proteins underlie key developmental decisions of lambdoid phage, the mechanisms of control of int and xis expression, and even the relative arrangement of genes, are not strictly conserved among the three phage.
2. Materials and Methods
(a) Phage strains
φ80h was a gift from C. Yanofsky. P22int-am137 has been described (Hilliker, 1974). The phage P22int::Tn1 and P22xis::Tn1 analyzed in this study were derived from phage originally made by Weinstock (1977). The original phage, P22sieA44Ap77 and P22sieA44Ap75, carry insertions of the translocatable ampicillin-resistance element Tn1. Both insertions were mapped genetically to the left end of the prophage chromosome, and result in an excision-deficient phenotype: P22sieA44Ap77 excises at 0.05% of the efficiency of the wild-type, and P22sieA44Ap75 at 0.5% (M. M. Susskind, unpublished results). P22 requires a terminal repetition of its genome for circularization and subsequent growth (Tye et al., 1974); the Tn1 insertions in these P22 mutants result in the packaging of genomes without terminal repetition. To restore terminal repetition, we crossed into each a compensating deletion, Δ184, in the region of the gene that encodes tail protein, using P22sieA44Δ184. P22sieA44Δ184 is derived from P22sieA44Ap7 by the method described by Weinstock et al. (1979). Lysogens of P22sieA44Ap75 and P22sieA44Ap77 (HB7419 and HB7421, respectively) were induced at mid-log phase growth with mitomycin C (1 μg/ml), grown for 25 min and infected with P22sieA44Δ184 at a multiplicity of infection of 1. The lysates were plated (with exogenous tail protein), and plaques were tested for ampicillin-resistant lysogens. P22sieA44Δ184Ap77 was shown to be int−; P22sieA44Δ184Ap75 was shown to be int+ xis− (see Results).
(b) Plasmids
pJL110 was derived from pTP84, a gift from A. Poteete, and carries the BamHI (0.481)-AvaI (0.551) fragment from P22 (Leong et al., 1985a). pJL115 is an identical construction, but is derived from P22int-am137. pJL10 carries the 5.4-kb SmaI (SmaI-F) fragment from φ80h (Leong et al., 1985a), cloned into the HincII site (at 3907) of pKT21, a derivative of pBR322 lacking the HincII site at position 650 (a gift from K. Talmadge and W. Gilbert). pJL20 carries the 1.65-kb SmaI (SmaI-G) fragment from φ80h cloned into the same site. pJL80 contains the XmnI (−5 in relation to attP core)-SphI (+1661) fragment from φ80. pSN14 was derived from pJL10 by an Int-mediated inversion (see Results). pCL14 (xis−) was derived from pSN14 by the deletion of the segment between the restriction sites ScaI (3846 in pBR322) and AvaI (+1492 in relation to attP). pCL15 carries the same deletion, but is derived from pJL10. The following derivatives of pJL10 are int− by virtue of a 4-base duplication at restriction sites within the gene: (i) pSN17 (XhoI: +749); (ii) pJL13 (BclI: +1321); (iii) pJL14 (BclI: +195); (iv) pJL16 (BamHI: +436).

(c) φ80 site-specific recombination in serially passaged cells
Escherichia coli HB101 carrying plasmids were diluted 1:500 and grown to saturation (i.e. “passaged”) in LB broth plus 20 μg tetracycline/ml at 30°C or 35°C. Successive passages were carried out each day. Plasmid DNA was isolated from 10-ml cultures, digested with the endonuclease ClaI, and separated by agarose gel electrophoresis.

(d) Testing integration and excision in P22
P22 phage were tested for their ability to integrate by the pick test of Smith & Levine (1967). Excision was tested by EcoRI restriction analysis of DNA packaged into phage particles after induction of a lysogen.
Lysogens were grown to 6 × 10⁸ cells/ml in LB medium or LB + 50 μg ampicillin/ml at 37°C and induced by the addition of mitomycin C (Sigma) to a final concentration of 2 μg/ml. Cultures were incubated at 37°C for 3 h, when visible lysis occurred. Tail-less P22 particles were purified as described above for P22 phage. Phage stocks were grown lytically as described by Youderian & Susskind (1980).

(e) Preparation and sequencing of labeled restriction fragments
Plasmid DNA was prepared and restriction fragments were 5′ or 3′ end-labeled as described (Ross & Landy, 1983). Sequencing reactions were performed as described by Maxam & Gilbert (1980). Sequence was determined across all restriction sites, and on both strands, except for the region of P22 from +1356 to +1408. The same strand of this region was sequenced from 2 different restriction sites.

(f) Primary structure comparisons
Computer algorithms similar to that of Needleman & Wunsch (1970) were used to generate the best pairwise alignments. Two scoring systems were used to generate and evaluate global alignments: one that only counts perfect identities (Sauer et al., 1982) and one that utilizes the mutation data (PAM) matrix to score each substitution, depending on the physical similarity of the 2 residues (ALIGN program; Dayhoff et al., 1978). Both systems penalize insertions in 1 sequence relative to the other: the former method used a gap penalty of 2.5 residue identities, and the latter program utilized a penalty parameter of 6 (see Dayhoff et al., 1978). Statistical significance was determined as described in Results. For φ80 Int, since the site of translation initiation is not known, the longest potential sequence was used. Local pairwise alignments were derived using the algorithm of Goad & Kanehisa (1982). Significance was determined by the same method as for the global alignments. Three-way alignments were generated using combinations of pairwise alignments.
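The identity-based global scoring described above can be illustrated with a minimal dynamic-programming sketch in the style of Needleman & Wunsch. This is not the program used in the study: the function name `global_identity_score` is hypothetical, and the sketch assumes that every gapped residue is charged the full penalty of 2.5 identities (a linear gap cost), a detail the text does not specify.

```python
def global_identity_score(seq_a, seq_b, gap_penalty=2.5):
    """Best global alignment score when only perfect identities count.

    Each identical residue pair scores 1, each mismatch scores 0, and
    each gapped residue costs `gap_penalty` (assumed linear gap cost).
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    score = [[0.0] * cols for _ in range(rows)]

    # Leading gaps: aligning a prefix against nothing costs one penalty per residue.
    for i in range(1, rows):
        score[i][0] = -gap_penalty * i
    for j in range(1, cols):
        score[0][j] = -gap_penalty * j

    for i in range(1, rows):
        for j in range(1, cols):
            identity = 1.0 if seq_a[i - 1] == seq_b[j - 1] else 0.0
            score[i][j] = max(
                score[i - 1][j - 1] + identity,   # align the two residues
                score[i - 1][j] - gap_penalty,    # gap in seq_b
                score[i][j - 1] - gap_penalty,    # gap in seq_a
            )
    return score[-1][-1]
```

With this scoring, a gap is only introduced when it recovers more than 2.5 additional identities, which is why weakly related sequences tend to align with few gaps. A PAM-based score would replace the 1/0 identity term with a substitution-matrix lookup.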
3. Results
(a) Maps of the φ80 and P22 att-int regions
The restriction maps of the φ80 and P22 att-int regions were determined by standard techniques, and are shown in Figure 1. Some of the restriction sites were identified previously (Moore & James, 1978; Jackson et al., 1978; Chisholm et al., 1980; Leong et al., 1985a).

Figure 1. Physical and genetic maps of the φ80 and P22 att-int regions. Map distances are given in bp; co-ordinate zero is in the core region of attP. The location and transcriptional orientation of the int and xis genes are indicated by broad arrows. The broken open box for φ80 int indicates uncertainty in the start site of translation. The sequencing strategy for each strand is represented by thin arrows below the map. (a) φ80. Asterisked restriction sites (*) indicate location of 4 bp duplications that abolish Int-mediated inversion (see the text). The open box above the map, labeled Δxis, indicates the extent and location of the smallest xis deletion (see Fig. 3). (b) P22. Broken lines above the map indicate the approximate locations of int− and xis− Tn1 insertions.

(b) φ80 int gene
The φ80 int gene was identified using a plasmid recombination assay in vivo (Leong et al., 1985b). The plasmid pJL10 carries the φ80 5.4-kb SmaI-F fragment. attP is located 1.65 kb from the right end of this fragment, and a secondary att site, attφ1.4, is situated 1.4 kb to the left of attP. Because attP and attφ1.4 are in opposite orientation, recombination between them inverts the intervening DNA segment, an event that can be monitored by examining the products of an appropriate restriction digestion. Serial dilution and growth of HB101 carrying pJL10 results in the gradual accumulation of plasmid with the segment in the inverted orientation (Fig. 2(a)).

Franklin (1967) previously mapped the φ80 int gene to the right of attP. We generated small duplications in this region of pJL10. If the duplications are within the int gene, inversion of the intervening segment should be abolished. Four separate 4-bp duplications were made in the region to the right of attP (Figs 1 and 5). pJL10 was linearized using restriction endonucleases that leave a four-base 5′ overhang; the staggered ends were filled in using Klenow fragment of DNA polymerase, and the plasmid was recircularized. In each case, a new six-base restriction site was created at the duplication, and each mutant plasmid was checked by the appropriate restriction digestion (data not shown). No plasmid inversion was detectable for any of the four resulting plasmids, even when passaging was extended fivefold longer than was required to detect inversion of pJL10 (Fig. 2(a)).

The region extending 1.8 kb to the right of attP was sequenced by the chemical degradation method (Maxam & Gilbert, 1980), and this analysis reveals an open reading frame starting at +135 and extending rightward for 1247 bp (Figs 1 and 5). It is the only reading frame that encompasses all four insertions; in fact, no other coding region on either strand spans even two mutations. Also, the predicted amino acid sequence is distantly related to both the λ and P22 Int proteins (see below). We conclude that this is the correct reading frame for φ80 int. It is intriguing that φ80 int is transcribed left to right, opposite to the transcriptional polarity of the λ and P22 int genes; regulatory implications of this fact are discussed below.

There are several potential int gene start codons, ATG or GTG, proximal to the mutations. Also, we cannot exclude the possibility that the mutation farthest upstream (+195) disrupts the expression of int rather than the actual coding sequence, although this insertion disrupts no canonical promoter sequence (Rosenberg & Court, 1979; Hawley & McClure, 1983). Depending on which start codon is used, φ80 Int is between 338 and 416 amino acids long (39.5 × 10³ to 48.4 × 10³ Mr) and thus is similar in size to the 356-residue λ Int. Definitive identification of the φ80 int start codon awaits N-terminal analysis of the purified protein. Like λ and P22 Int (see below), φ80 Int is highly basic, with at least 19.5% basic residues.
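The fill-in mutagenesis used to inactivate int can be illustrated with a short sketch. It assumes the standard recognition sequences and cut positions of the enzymes named in Materials and Methods (XhoI, C^TCGAG; BamHI, G^GATCC; BclI, T^GATCA); the function name and the toy flanking sequences are hypothetical and are not taken from the φ80 sequence.

```python
def fill_in_duplication(seq, site, cut_offset, overhang_len=4):
    """Cut at the first `site`, fill in the 5' overhang, and religate.

    `cut_offset` is the top-strand cut position within the site. Filling in
    both staggered ends and joining them duplicates the overhang bases.
    """
    cut = seq.index(site) + cut_offset
    overhang = seq[cut:cut + overhang_len]
    return seq[:cut] + overhang + seq[cut:]


# Toy sequences (hypothetical flanks around real recognition sites).
xho_mutant = fill_in_duplication("AACTCGAGTT", "CTCGAG", cut_offset=1)
bam_mutant = fill_in_duplication("TTGGATCCAA", "GGATCC", cut_offset=1)

# A 4-bp insertion is not a multiple of 3, so it shifts the reading frame.
frame_shift = (len(xho_mutant) - len("AACTCGAGTT")) % 3
```

In this sketch the filled-in XhoI site becomes CTCGATCGAG, which contains a PvuI site (CGATCG), and the filled-in BamHI site becomes GGATCGATCC, which contains a ClaI site (ATCGAT). This is consistent with the observation that a new six-base restriction site arises at each duplication, and the non-zero `frame_shift` shows why a duplication inside the coding sequence would be expected to abolish Int activity.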
(c) φ80 xis gene
The φ80 xis gene was also identified using a plasmid assay in vivo. The plasmid pSN14 is derived from pJL10 by Int-mediated inversion of the segment between attP and attφ1.4. Serial passage of HB101 carrying pSN14 results in the formation of plasmids with re-inverted segments, i.e. plasmids that have undergone a complete cycle of inversion and re-inversion (Fig. 2(b)). Since the re-inversion event is equivalent to an attL × attR recombination, we expect it to be dependent on Xis. To test this hypothesis and to map the xis gene, we constructed several deletions of pSN14 lacking DNA in the region to the right of the int gene. The deletions extend for varying distances leftward from the ScaI site in the vector DNA (position 3846 in pBR322, 60 bp from the border of the insert). The left deletion endpoints are +1486 (AvaI), +1563 (FokI) and +1564 (MaeIII) for pCL14, pSN15F and pSN15M, respectively (Figs 2(b) and 3). Additionally, in pSN20, a 57-bp deletion from the ScaI site to the vector-insert junction was made using an AhaII site (position 3902). Each of the deleted plasmids was tested for its ability to undergo re-inversion. As shown in Figure 3, only the three longest deletions abolish re-inversion. These deletions do not abolish Int activity, because when identical deletions are made in non-inverted plasmids, attP × attφ1.4 recombination still proceeds efficiently (data not shown). We infer that φ80 xis is located in the region just to the right of int. The xis genes of λ and P22 are situated in similar positions (see below).

Figure 2. Testing for φ80 integrative and excisive recombination in serially passaged cells. Plasmid DNA was isolated from serially passaged cells and digested with the endonuclease ClaI; upon inversion, the 3.4-kb and 2.2-kb restriction fragments (N) give rise to 3.8-kb and 1.8-kb fragments (In). The passage number is given above each lane. (a) φ80 int assayed by inversion. Serial passage of pJL10 (int+) results in the gradual accumulation of plasmids in inverted orientation. Serial passage of pSN17 (which has a 4-bp duplication at +749) does not give rise to inverted plasmids. (b) φ80 xis assayed by re-inversion of an inverted plasmid. pSN14 (xis+) gives rise to re-inverted plasmids (re-inverted products do not accumulate, presumably because Int promotes a second cycle of inversion). For deletion endpoints of pCL14, pSN15F and pSN20 see Fig. 3.
Figure 3. Deletion mapping of the φ80 xis gene. Four deletions were made to map φ80 xis. The continuous thin line represents φ80 DNA. The wavy thin line represents vector DNA (originally derived from pBR322). Numbers below φ80 DNA indicate position with respect to the common core region. Open bars represent the extent of DNA deleted in each plasmid. All 4 deletion plasmids were tested for their ability to undergo excisive recombination (see Fig. 2(b)). Additionally, ΔCL14 and ΔSN15F were tested for their effect on integrative recombination, as assayed by attP × attφ1.4 plasmid recombination similar to assays shown in Fig. 2(a). These results are tabulated at the right: −, no recombination detected; +, recombination detected; NT, not tested. The filled arrow indicates the position and transcription orientation of the deduced φ80 xis gene.

Deletion   Excision   Integration
ΔCL14      −          +
ΔSN15F     −          +
ΔSN15M     −          NT
ΔSN20      +          NT
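The logic of the deletion mapping can be written out as a small sketch. It uses only coordinates stated in the text and the Figure 5 legend (left endpoints +1486, +1563 and +1564; xis fMet codon at +1579); the names `LEFT_ENDPOINTS`, `removes_position` and `predicted_excision` are hypothetical, and pSN20 is given no endpoint because its deletion lies entirely in vector DNA.

```python
# Position of the xis fMet codon relative to the attP core (Figure 5 legend).
XIS_START_CODON = 1579

# Left endpoints of deletions entering the insert from the vector side.
# None marks pSN20, whose deletion removes vector DNA only.
LEFT_ENDPOINTS = {"pCL14": 1486, "pSN15F": 1563, "pSN15M": 1564, "pSN20": None}


def removes_position(left_endpoint, position):
    """True if a deletion running from the vector leftward to `left_endpoint`
    removes the insert co-ordinate `position`."""
    return left_endpoint is not None and position >= left_endpoint


def predicted_excision(plasmid):
    """Excision is predicted to survive only if the xis start codon is intact."""
    return not removes_position(LEFT_ENDPOINTS[plasmid], XIS_START_CODON)


predictions = {name: predicted_excision(name) for name in LEFT_ENDPOINTS}
```

Under these assumptions the three deletions that enter the insert all remove the xis start codon and are predicted to be excision-defective, whereas the vector-only deletion in pSN20 is not, which matches the pattern reported for re-inversion.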
Inspection of the sequence in this region reveals a start codon preceded by a canonical ribosome binding site and followed by an open reading frame of 64 codons (7.7 x lo3 M,; Figs 1 and 5; Shine & Dalgarno, 1974). No other reading frame (preceded by an AUG codon) that is affected by the smallest xas- deletion is longer than 23 codons, and none has a canonical ribosome binding site. This open reading frame, like t’he 1 and P22 xis genes, is transcribed right to left, and its predicted amino acid sequence shares significant homology to P22 Xis, as well as localized regions of homology to I Xis (see below). We conclude that this open reading frame is the 480 xis gene. The 480 xis and int genes have opposite transcriptional polarity and directly abut’ each other; t’heir coding sequences are separated by only 2 bp (Fig. 5). (We do not know the source of the xis transcript in pSN14 and pSN20; by DNA sequence there is no obvious promoter in the 480 insert DNA upstream from xis. and it is possible that xis is expressed from a promoter within the vector sequence.) The 65 residue 480 Xis is comparable in size to the 72residue 2 Xis protein, and both proteins are basic. (d) P22 int gene P22Ap77 was used to identify the P22 int gene. This phage was determined to be intby the pick test of Smith & Levine (1967). Rest’riction digestion of phage DNA localized the Tnl insertion between +261 (M&II) and +835 (HindIII; data not shown). DNA sequence analysis of the region 2.2 kb
B-
-
Dfus -
FG-
Figure 4. EcoRI restriction profile after induction of an excision-defective p22xG : : Tnl lysogen. EcoRI profiles of DNA from phage particles grown lytically (1~. lanes 1 and 3) or produced after induction of a lysogen (in. lanes 2 and 4). Lanrs 1 and 2, P22sieA44A184 @is’); lanes 3 and 4. P22sirA44A184Ap75 (xi.9 : : Tnl). Fragments were resolved 011 a 0.75% (w/v) agarose gel. The bands are lettered according to the system of Jackson et nl. (1978). with the hand labeled “fus” arising from the fusion of EcoRI fragments E and C by the Al84 deletion. fus : : Tnl is the same band with the Tnl insertion in .ris. P22sieAAI84 gives an identical profile whether grown lytically or by induction. In cont,rast, P22&eA44Al84Ap75 results in Ijackaging of DNA in situ and an rxcisiondefective EcoRT restriction profile (see Weaver & Lrvine. 1978).
to the right, of attP reveals a long open reading frame, transcribed right, to left on t,he prophage map, that, extends to +45 in t.he P’ arm (Figs 1 and 6). Confirmation that this is indeed the correctit reading frame for P22 int comes from analysis of P22int-nm137: this phage has undergone a c. G --+ T. A transition ate +230 cbreatinp a TAG stop codon in t)his reading frame (Fig. 6). Two AT(: codons (at’ posibions + 1205 and + 1037) are preceded by potential ribosome binding sites. and initiation at, either of these sit’es would give rise to a, protein of 387 or 331 amino acids (44.3 x IO3 ot 38.3 x IO3 && respectively: Fig. 6). (‘omparison of
Divergence Among Recombination Genes qf Lambdoid Phage
609
TIPIP~TGTCATTTGGCATATTACGAA~AATTCCGCGTAAAAACGTTCTG~TACGCTAAACCCTTATCCAGCAGGCTTTCAAGGATGTAAACCATAACACTCTGCGAACTAGTGTTACAT 50 C NH I3 ValGlyAsnPheValTyrThrPheValTyrProLysThrLysMe~CysThr~isSerMe~IleThrAspTh~LysLeuA~gLysAlaLeuGlyLysLysA~gAsp TGCGTGTAGCTTTGAGTGGGCAACTTTGTGTACACTTTTGTGTACCCAAAAACAAAAATGTGTACCCATTCAATGATCACCGACACA4AGCTCAGGAAGGCGCTCGGCA4GAAAAGAGAT
t
is0
*
ioo
200
AspIleGlullelleSerAspSerHisGlyLeuAsnAlsArg~leSerGlnAlaGlyLyslleSerPhePheTyrArgTyrArgTrp4'sGlyLysAlsvslLysLe~4snvs'G'yAsp GATATCGAGP!TATTTCTGATTCGCACGGGCTCAACGCCAGAATCAGCCAGGCCGGAAAA~TATCATTTTTCTATCGGTATCGCTGGGCCGGTA4AGCGGTAAAACTCAATGTTGGTGAT I 300 250
350
MetSerValA~sGlUA'sPheAS"TyrTrpIleGluArgHisCysIleAla4snGlyLe~ValLysvs'AspTyrTyrArgGlnValPheGl~Lys~isIleAlaGluPr~MetLysAsn ATGTCCGTTGCCGAAGCGTT~AATTACTGG4TTGAAAGGCACTGTATCGC~AACGGGCTAGTTA4AGTCG~TTACTATCGCC4GGTGTTTGAGAAACATATCGCCGAACCGATGA4GAAT 500 550 ValLysValAspAsnThr4laLysMetHisTrpIleAsnValPheAspSerlleGluSe~ArgVslMe~A'a~isTy~Me~LeuSerLeuCysLysArgAlaPheArgPheCysValAsn ~TCAAAGTCGAT4ACACAGCGAAAATGC4CTGGATCA4CGTCTTCGATTC~ATAGAAAGCAGGGTGATGGCTCATTACAlGCTTTCGCTGTGCAA4CGGG~GTTTAGGlTCTGCGTTAAC 600 650
7"O
ArgSerValI'eAlaSerAsnProLeuGluGlyLeuieuPrOSerAspVa'GlyGlnLysProLysLysArgThrArgArgMe~AspAspAspAspLe~A~gLysIleTyrGlnTrpLeu AGAAGTGTGATCGCCTCAA4CCCGCTCGAG~GATT4CTGCCAlCTGATGTCGGGCA4AAGCCT4AAAAGAGAACTCGCAGGATGGACGATGACGATClGCGCAAAATCTATCAGTGGTTG Boo * 750 LysSerHisMetSerIleGluSerValPheLeuValLysPhelleMe~LeuThrGlyCysArgThrAlaGlulleArgLeuSerGluArgSerTrpPheArgLeuAspAsp4snGluTrp AAAAGCCATA~GTCGATAGAGTCCGTTTTCCTGGTGAAATTTATTATGCTTACCGGAlGCCGT4CGGcTGAGATTCGACTTAGTGAG4GATCATGGTTTCGATTG~~ATG4TA4TGAGlGG I 350 400
450
ValValProAlaG'ySerTyrLysThrArgValHisIleArgArgGlyLeuSerAspAlaAlaValAs"LeuValArg4snHisLeuLysLysIle4snThrAsnHisLeuValThrSer ~TCGTGCCTG~GGGCAGTTA~AAAACTCGG~TACATATTA~~~~GGG4CT~TCAGACGCC~CCGTTAACCTGGTCAGAAATCACC~CAAG~AAATAAACA~CAATCACCTGGTGACTTC4 105c G'"A~gLYsIleAsPGlYG'YIleLYsAsPSerProvalHisSerPr~ValAlaSe~AsnTy~AlsA~gSerIleTrpAsnGlyThrGlyMe~AlaG'uTrpSerLeu~isAspMe~Arg CAACGTAAPATTGATGGCGG~ATCAAAGATTCGCCCGTTCATTCACCTGTGGCATCCAATTACGCCCGTT~TATTTGGAATGGAACAGGTATGGcAGAGTGGTCGCTTCATGATATG4GG I ' ' 1100 115c A~gTh~I'eAlsTh~As"LeuSerGluLe~GlyCysProProHisVslIleGl~LysLe~Le~G'yH~sGl"MetvslG'yVslMetAls~isTyrAsnLe~HisAspTy~I'eAspAsp CGGACGATAGCCACAPATCTCTCTGAGTTAGGTTGCCCGCCGCACGTAATTGA4AAGCTGCTCGGGCATC4G4TGGTGGGGGlTATGGCGCATTACAACC~TCATGACT4TATCGATGAT 1250 ' 1200
13c3
*
GlnLysHisT~pLeuArgvalTrpGlnSerHisLeuGluGluIleI'eG'YG'~P~~P~eSe~~~~ 1400 ~AGAAACACT~GCTCCGCGTTTGGCAGAGCZPTCTTGAAG~G4TCATCGGAGAGCCCTTC~GTTAATTTATCTTCTTTTT~TCCTCCCACTCTTTGATTGACTCAGAGCG~CAGCGGTTA 1350 ~~~As"I'eLysLysLYsAspGluTrpGluLysIleSerGluSerArgTrpArgAs"p ‘ 1500 1450 ~GGTTGCCGGdCCAGTCAGG~GGTGGGAAC~GGCATACGAAGCCCCGAGGCATTGTGTCT~cACTTTGCCATGACC4AAG~GTTTTGCGTGA4ATTTTGT~GCGACTGGT~AGGTCTGAC roAsnGlyProTrpAspPrOPrDProPheP~~CySvs'PheG'yA~gP~O~e~ThrAspAlaSerG'"TrpSerTrpLe~ThrLysA~gSerI'eLysTy~A~gSe~Th~Le~AspSe~T
1550
LACll4 ~TTACCAAAA;PTCATCJtATdGCTCT~~GTTGCCCGTTCGGGCCATT~AA4ATCTTT?TCA4CCAACCTGCCCGGG i600 hrValLe~lleAspAspMe~-F b
h %T
IA3
Figure 5. Nucleotide sequence of the &O int-ris region. Co-ordinat,e zero is in the attP core region (see Leong et al., 1985a). The sequence starts at, core and extends 59 bp beyond xis. Only the int coding strand of DKA is shown, with the predicted amino acid sequence of Int’ above the DNA sequence. The sbart’ codon for int is unidentified. and we have show-n the longest potentialintcodingsequence,from + 135to + 1382.Four-bpduplicationsatpositions + 195. +436, + 749and + 1321 abolish Int-mediated plasmid recombination and are designated by asterisks ( * ). The Dh’A strand shown is the antisense strand for xis, and the predicted amino acid sequence of Xis is shown below the DNA sequence. The left endpoints of the xiu- deletions ACL14, ASNl5F and ASN15M are indicated by bent arrows. The fMet codon for xis, at position + 1579. is boxed. and the potentia,l ribosome binding site for xis is overlined. TER indicates translation termination codon. Sequence hyphens have been omitted in t’his and subsequent Figures for clarity.
the deduced amino acid sequence with the N-terminal sequences of the Int proteins of two non-lambdoid phage, 186 and P2, reveals homologies that predict that the proximal ATG (at + 1205) is the start codon for P22 int (Argos et al., 1986). This start site is consistent with t’he electrophoretically determined molecular weight of P22 Int. 42 x lo3 (Youderian & Susskind, 1980). P22 Int is slightly larger than J. Int (which is 356
residues).
and is also basic. with
Slow!;, basic amino
acids. (e) P22 xis gene We also utilized a Tnl insert’ion to identify the P22 xis gene. P22Ap75 is integration proficient, as determined by the pick test of Smith & Levine (1979). but is excision defect’& by t’he test of
610
J. A’. Leong et al.
CTAACGCCAAGCTAAAATCCTTGCCTG~ATGGAATTACTACATAACCAACCACTACGGGATTATGCGTACAGGAATTGGTGGTCAATCGGGCCGTGGCTTTGGTTTCGTTATGTTATCAA I I 1700 1656
CGTACGG~GGCAAGCACCCGCAACGCGATGATTGTCTTATTTTTGCAATACCAAATA~CAAAGAAGAAAGGCATGGCGTAGTTGTTA~CCCTGATGCATTCAAGAAA~TAACTTACGGGC I # 1600 1550
1500
[Figure 6 sequence panel: P22 int-xis region, co-ordinates +1450 to 0, with the predicted Xis and Int amino acid sequences above the DNA sequence; int-am137 is marked near +230.]
Figure 6. Nucleotide sequence of the P22 int-xis region. Co-ordinate zero is in the attP core region (see Leong et al., 1985a). The sequence shown starts 295 bp upstream from xis, and extends to core. Only the int coding strand of DNA is shown, with the predicted amino acid sequence of the Int and Xis above the DNA sequence. The probable start codons of xis, at +1432, and for int, at +1205, are boxed. Potential ribosome binding sites upstream from these codons are underlined. TER indicates translation termination codon. The base change at +230 corresponding to int-am137 is boxed below the wild-type sequence.
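The co-ordinates in the Figure 6 legend can be cross-checked against the xis length and the int-xis overlap quoted in the text (116 codons, 121 bp). The sketch below is only an arithmetic check, not the authors' procedure. It assumes that each quoted co-ordinate is the first nucleotide of the start codon, that co-ordinates decrease along the coding strand, and that the 116 codons exclude the termination codon. The function name `orf_overlap` is hypothetical.

```python
def orf_overlap(upstream_start, upstream_codons, downstream_start):
    """Overlap in bp between two ORFs read in the same direction.

    Co-ordinates are assumed to DECREASE along the coding strand, as in
    Figure 6 (xis begins at +1432 and is read toward the core at 0).
    """
    # Last coding nucleotide of the upstream ORF (stop codon not counted).
    upstream_last = upstream_start - 3 * upstream_codons + 1
    # Nucleotides shared by [upstream_last, upstream_start] and the
    # downstream ORF, which starts at downstream_start and runs downward.
    return max(0, downstream_start - upstream_last + 1)


# Values taken from the Figure 6 legend and the text.
XIS_START = 1432   # probable xis start codon
XIS_CODONS = 116   # length of the longer open reading frame
INT_START = 1205   # probable int start codon

overlap = orf_overlap(XIS_START, XIS_CODONS, INT_START)
print(overlap)  # 121
```

Under these assumptions the legend co-ordinates reproduce the 121 bp overlap given in the text.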
Weaver & Levine (1978). Induction of a P22 int- lysogen results in packaging in situ of chromosomal DNA starting at the pac site on the prophage genome and extending rightward for several headfuls of DNA. EcoRI digestion of the DNA extracted from phage particles produced by such an induction results in a characteristic restriction profile (Fig. 4). When we induced a lysogen of P22Ap75, the DNA packaged into phage particles also gave an excision-defective profile. Since this phage is integration proficient, we conclude that the Tn1 insertion in P22Ap75 has inactivated the xis gene. The burst size upon induction of a lysogen of this phage is reduced approximately 200-fold (M. M. Susskind, unpublished results), similar to λ xis- mutants.
Restriction analysis of P22Ap75 DNA reveals that the insertion is between +1296 (HphI) and +1348 (HaeIII; data not shown). Inspection of the DNA sequence in this region reveals two coding regions longer than 35 amino acids, one of 56 and the other of 116 codons. Only the longer open reading frame is preceded by a recognizable ribosome binding site (Figs 1 and 6; Shine & Dalgarno, 1974), and since this open reading frame displays significant homology to the proposed φ80 Xis sequence (see below), we believe that it is P22 xis. As is the case in λ, the int and xis genes of P22 overlap (Fig. 6): this overlap is 121 bp in P22, compared to 20 bp in λ. P22 xis, like λ xis, codes for a basic protein (20 basic and 10 acidic residues), but is somewhat larger than λ xis (116 versus 72 codons).

(f) Comparison of the primary structures of the λ, P22 and φ80 recombination proteins

The best pairwise alignments of the predicted amino acid sequences of the λ, P22 and φ80 recombination proteins were determined by computer analysis using an algorithm similar to that of Needleman & Wunsch (1970; see Materials and Methods). These alignments were assigned a score, depending upon the extent of identities (or conservative substitutions) and the number of gaps introduced to arrive at the alignment. This score was t
our threshold for significance, and no pairwise alignment has greater than 23% identity. (Gaps were allowed but penalized: see Materials and Methods.) Indeed, when an alignment scoring system is used that counts only identical amino acids (ignoring conservative substitutions: see Materials and Methods), the P22 and λ Int alignment gives an insignificant score (approximately one standard deviation above the mean). Their relationship is only detected when a mutation data matrix (PAM) is used that scores each substitution depending on the frequency of that substitution in aligned sequences of closely related proteins (Dayhoff et al., 1978).

The relatedness of the three Int proteins is clearly illustrated only when local homologies are used to generate three-way alignments. The algorithm of Goad & Kanehisa (1982) was used to detect localized regions of homology, and these restricted alignments show a much greater statistical significance than the global comparisons (greater than seven standard deviations above the mean). To test these alignments, we combined two pairwise alignments (e.g. λ-φ80 and φ80-P22) to generate a single three-way alignment (e.g. λ-φ80-P22): if the two pairwise alignments are "correct", then all three proteins should align in the three-way comparison. We find that only the C-terminal halves of the proteins (approximately the last 180 residues) show localized regions of homology among all three Int proteins; no consistent alignment could be generated for the N-terminal halves. This pattern of C-terminal conservation is also supported by the detailed analysis of the three sequences using amino acid physical parameters thought to control protein folding (Argos et al., 1986). In particular, one stretch of 28 residues near each C terminus displays the best homology (Fig. 7(a)). Of the 28 positions, 12 (43%) show either complete identity among all three proteins, or identity for two of the three and a conservative substitution in the third,
[Figure 7 alignment panels: (a) C-terminal regions of the φ80, P22 and λ Int proteins; (b) N-terminal regions of the φ80, P22 and λ Xis proteins.]
Figure 7. Homologies among the primary structures of the λ, φ80 and P22 site-specific recombination proteins. Pairwise alignments were derived using an algorithm similar to that of Needleman & Wunsch (1970). Three-way alignments were generated by combining pairwise alignments. Broken lines symbolize gaps in 1 of the sequences relative to the other 2. Continuous underlines indicate identical residues; dotted underlines indicate conservative substitutions. Substitutions are considered to be conservative if both amino acids are in 1 of the following groups: Ser, Pro, Ala, Gly and Thr; Arg, Lys and His; Phe, Trp and Tyr; Asp, Glu, Gln and Asn; Ile, Leu, Met and Val; Cys (Dayhoff et al., 1978). (a) Int proteins. Numbers at the right end indicate C-terminal amino acids that are not shown. (b) Xis proteins. Numbers represent terminal or intervening amino acids that are not shown.
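The conservative-substitution groups in the Figure 7 legend, together with the criterion used in the text (identity in all three proteins, or identity in two with a conservative substitution in the third), can be written as a small classifier. This is a minimal sketch rather than the authors' program: the one-letter residue codes, the function names and the toy columns are all assumptions introduced for illustration, and the columns are not transcribed from the figure.

```python
# Conservative-substitution groups from the Figure 7 legend
# (after Dayhoff et al., 1978), written in one-letter code.
GROUPS = [
    set("SPAGT"),  # Ser, Pro, Ala, Gly, Thr
    set("RKH"),    # Arg, Lys, His
    set("FWY"),    # Phe, Trp, Tyr
    set("DEQN"),   # Asp, Glu, Gln, Asn
    set("ILMV"),   # Ile, Leu, Met, Val
    set("C"),      # Cys
]


def is_conservative(a, b):
    """True if a and b are different residues from the same group."""
    return a != b and any(a in g and b in g for g in GROUPS)


def classify_column(column, gap="-"):
    """Classify one column of a three-way alignment.

    Returns 'identical' (all three residues the same), 'conservative'
    (two identical, third a conservative substitution) or 'none'.
    """
    if len(column) != 3:
        raise ValueError("expected a three-residue column")
    if gap in column:
        return "none"
    x, y, z = column
    if x == y == z:
        return "identical"
    for p, q, r in ((x, y, z), (x, z, y), (y, z, x)):
        if p == q and is_conservative(p, r):
            return "conservative"
    return "none"


def count_conserved(columns):
    """Number of columns that are 'identical' or 'conservative'."""
    return sum(classify_column(c) != "none" for c in columns)


# Hypothetical toy columns (one residue per protein), for illustration only.
toy_columns = ["HHH", "DDE", "LFF", "RRR", "MLL", "A-A"]
print([classify_column(c) for c in toy_columns])
print(count_conserved(toy_columns), "of", len(toy_columns))
```

Applied to the 28 aligned positions of Figure 7(a), a count of this kind corresponds to the "12 of 28" figure quoted in the text, provided the same grouping and criterion are used.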
Among the three Xis proteins, only P22 and φ80 have significant global homology (approximately five standard deviations above the mean); the P22-λ and φ80-λ alignments give scores approximately one and two standard deviations above the mean, respectively. Furthermore, the sizes of the Xis proteins vary greatly, from 65 to 116 residues. Nevertheless, all three Xis proteins can be aligned in a 15 residue region at their N termini (Fig. 7(b)). Only one pairwise homology is apparent outside of this region (between λ and φ80 Xis in the center of the proteins; Fig. 7(b)).

(g) Potential int promoter sequences in φ80 and P22

The φ80 int gene is transcribed left to right, and thus cannot be expressed from the major leftward promoter, PL. Moreover, in the prophage state, the phage DNA to the left of the core is replaced by bacterial DNA. Therefore, assuming that φ80 does not depend on a fortuitous bacterial promoter in the B arm, a phage int promoter PI must lie between the core and the int gene if the phage is to be able to express int during prophage induction. Comparison of the DNA sequence in this region with the λ PI sequence reveals a candidate for φ80 PI. The sequence +119 to +155 shares extensive (22 out of 28) homology with the λ PI region (Fig. 8): this homology is all the more striking given the extent of divergence of the att sites and int genes of the two phage (Leong et al., 1985a). The homology includes the -35 region of λ PI, the region that is specifically bound by λ cII protein (Ho et al., 1983). It is interesting, however, that the two tetranucleotide repeats thought to be responsible for cII recognition in λ are not strictly conserved in sequence or in spacing (an 8 versus 6 bp spacing; Fig. 8).

Although less striking than the above-mentioned homology, the region in P22 just proximal to the int gene also shows homology to the λ PI sequence (Fig. 8). Only one of the two λ cII recognition units is present. Like λ PI, these sequences in φ80 and P22 would be judged to be weak promoters on the basis of their sequence alone (Rosenberg & Court, 1979; Hawley & McClure, 1983). Further analyses are required to determine if these sequence homologies to λ PI reflect functional homologies as well.
[Figure 8 alignment: φ80, λ and P22 sequences aligned against the λ PI promoter, with the -35 and -10 regions marked.]
Figure 8. φ80 and P22 sequences homologous to the λ PI promoter sequence. Larger letters indicate homologous bases that are conserved in at least 2 of the 3 sequences. Brackets below the λ sequence indicate -35 and -10 regions of the λ PI promoter. Arrows above the λ sequence indicate a tetranucleotide repeat recognized by λ cII protein (Ho et al., 1983). The φ80 sequence is from +119 to +155; the P22 sequence is from +1315 to +1253.
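A homology figure such as "22 out of 28" can be obtained by counting matching positions in a gapped pairwise alignment. The sketch below shows one way to do this. It assumes that positions where either sequence has a gap are excluded from the comparison, which the text does not state. The function name and the toy sequences are hypothetical and are not the Figure 8 sequences.

```python
def count_matches(seq_a, seq_b, gap="-"):
    """Return (matches, compared) for two aligned, equal-length sequences.

    Positions where either sequence carries a gap are skipped, so
    'compared' counts only base-versus-base positions.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return matches, compared


# Hypothetical aligned fragments, for illustration only.
matches, compared = count_matches("TTGC-GTG", "TTGCAGTA")
print(f"{matches} out of {compared}")  # 6 out of 7
```

Whether gapped positions count toward the denominator changes the reported fraction, so the convention should be stated alongside any such figure.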
4. Discussion

(a) Global diversity and localized conservation among the sequences of lambdoid site-specific recombination proteins

One might predict that the primary structure of the recombination proteins of λ, P22 and φ80 would reflect the fact that they are proteins that have several activities in common. First, the three Int proteins (and probably also the three Xis proteins) recognize specific DNA sequences. Although the proteins must differ somewhat in their DNA binding domains since they recognize different sequences, several repressors that recognize different operator sequences have similar amino acid sequences in their binding domains (Sauer et al., 1982; Takeda et al., 1983). Second, all three Int proteins presumably possess a site-specific cleavage and rejoining activity: φ80 and λ Int each generate a pair of staggered nicks in the core, and the distance between the nicks is either identical or very similar in both systems (Mizuuchi et al., 1981; Craig & Nash, 1983; Leong et al., 1985b). Third, IHF bound to λ attP increases the apparent binding affinity of Int for attP (Craig & Nash, 1984). If IHF directly interacts with λ Int, it probably also interacts with P22 and φ80 Int in a similar manner, and one might predict that the Int domain responsible for this proposed interaction would be conserved. Finally, all three systems have the interesting property of directionality: the presence or absence of Xis governs the direction of the reaction, and presumably the mechanism of Xis action is similar in each pathway.

It is intriguing, then, that the primary structures of these recombination proteins show only limited homology. In the N-terminal halves of the proteins, no region gives a consistent three-way alignment, and the three Xis proteins cannot be aligned outside their N termini. This divergence is in striking contrast to the extreme sequence conservation between two phage that share the same attB site, λ and phage 434. The last 120 bp of the 434 int gene have been sequenced, and comparison to λ int reveals no differences despite numerous differences in their attP sequences (Mascarenhas et al., 1981).

Examples of primary structure divergence among functionally and evolutionarily related enzymes have been observed before. The transcription antitermination proteins of λ, P22 and φ21 are presumably related, recognize specific (but different) nucleotide sequences, and presumably interact with the same host protein, RNA polymerase. However, beyond the observation that they are basic and approximately the same size, they show little sequence homology (Franklin, 1985). A particularly extreme example is the set of aminoacyl-tRNA synthetases, which show great sequence and size divergence (Webster et al., 1983). It is clear that mechanistically similar events can be catalyzed by proteins with very different primary structures. We suspect that the tertiary and quaternary structures of the site-specific recombination proteins and their higher-order complexes with att DNA will show considerably less diversity.

Despite the lack of global homology, the amino acid sequences of certain localized regions of the Int proteins are clearly related. All of the regions that can be aligned among all three Int proteins fall in the C-terminal halves of the proteins. This C-terminal homology is emphasized by the observation that sequences in this region (and only this region) are also conserved in several other Int proteins (Argos et al., 1986). These results suggest that the C-terminal halves of the Int proteins (particularly the region shown in Fig. 7) are under more stringent structural constraints than the N-terminal halves.

Interestingly, this pattern of conserved and divergent domains has been observed (in more extreme form) among a set of proteins whose relationship is similar to that of the Int proteins: the specificity subunits of the restriction enzymes EcoK, EcoB and EcoD are evolutionarily related, bind to specific but different DNA sequences, and interact with functionally interchangeable modification subunits (see Bickle, 1982). Nonetheless, sequence analysis reveals that only two regions, one of about 45 and another of about 90 amino acids, show any apparent sequence homology, and these two regions are highly conserved (Gough & Murray, 1983).

The repressor proteins of λ, 434 and P22 have homologous functions but recognize different DNA sequences. They each consist of two domains separated by a protease-sensitive "linker" region. Their C-terminal domains, responsible for dimerization, are more homologous in primary structure than their N-terminal domains, which are involved in the recognition of specific DNA sequences (Sauer et al., 1982). (Although less pronounced, the N-terminal domains also show significant homology to each other as well as to the Cro proteins of the three phage, particularly in the regions of the proteins that make intimate contact with operator DNA.) The site-specific recombination protein resolvase can also be cleaved into two functional domains (Abdel-Meguid et al., 1984). Among the family of recombinases related to this protein, the more conserved (N-terminal) domain is responsible for protein-protein interactions and also apparently for strand cleavage and rejoining; the more divergent (C-terminal) domain is responsible for DNA recognition (Reed et al., 1982; N. Grindley, personal communication). In this regard, one might speculate that the conserved and non-conserved regions of Int protein represent discrete structural domains with distinct functions.
(b) Regulation of integration and excision

λ employs a complex combination of positive and negative regulatory mechanisms to regulate the expression of int and xis (Fig. 9). These mechanisms include repression by λ repressor at PL, activation by cII protein at PI, antitermination by N protein and retro-regulation that is dependent upon the distal sib sequence (for reviews, see Herskowitz & Hagen, 1980; Echols & Guarneros, 1983). The net result of these elaborate controls is that Int is efficiently produced only when it is required for recombination, i.e. during the establishment of lysogeny (from PI) and during prophage induction (from PL). Xis, because it is translated from the PL transcript, and not from the PI transcript, is synthesized only during prophage induction and not during phage integration.

[Figure 9 diagram: maps of the att-int regions of λ, φ80 and P22 on a common co-ordinate scale from -200 to 1600.]

Figure 9. Comparison of the genetic organization of the att-int regions of λ, φ80 and P22. Co-ordinates are numbered with zero in the attP core region. Broad arrows indicate the extent and transcriptional orientation of genes. Broken open region indicates uncertainty in the exact site of translation initiation of φ80 int. Wavy lines represent RNA transcripts. The positive regulatory effects of cII protein and N protein are indicated (+). The negative regulatory effect of the sib sequence is indicated (-). Potential PI promoters in φ80 and P22 were identified by homology to λ PI (see Fig. 8).

The results reported here suggest that the regulatory mechanisms of φ80 and P22 are not identical with those of λ. Most notably, the transcriptional orientation of the φ80 int gene is opposite to that of λ and P22 int. (One might imagine that Int-mediated recombination with a secondary att site to the right of int may have resulted in this inversion; indeed, our plasmid inversion assay would predict that such events may be common; Leong et al., 1985b.) This arrangement precludes the possibility of a sib-like "retroregulation" that depends on a change in distal sequences upon viral integration and excision. Also, in P22, although the transcriptional orientation of int is identical with that of λ, no obvious sib-like palindrome is apparent in the P arm (Leong et al., 1985a).

Additionally, there is no compelling evidence in either φ80 or P22 for a positively activated int promoter similar to λ PI. λ PI is situated so as to express int but not xis (see Fig. 9): in P22, the recognition sequence for transcriptional activator protein c1 (S. Keilty and M. Rosenberg, personal communication) is not apparent in the analogous location or orientation. Furthermore, even in the
absence of c1, P22 int is expressed, albeit at somewhat lower levels than under c1+ conditions (Youderian & Susskind, 1980), raising the possibility that int is constitutively expressed. In φ80, notable sequence homology to λ PI is observed (Fig. 8), but it is not known whether this homology reflects the presence of a cII-like function (with a slightly different recognition sequence) or if it simply reflects a common evolutionary past that is no longer relevant in φ80 regulation. Because φ80 int is transcribed left to right and cannot be expressed from the PL transcript, φ80 PI (or a bacterial promoter in the B arm) must be active during prophage induction. Khesin et al. (as cited by Rybchin, 1984) have suggested that φ80 int is constitutively expressed to account for their ability to isolate tandem double lysogens. (In λ, the attP × attL and attP × attR recombinations that could lead to excision of one of the two tandem phage genomes can occur in the absence of Xis: Guarneros & Echols, 1973.) One possibility, then, is that φ80 and P22 int may both be constitutively expressed, and that integration and excision may be controlled by the differential expression of xis.

It is interesting that all three phage may have a mechanism for producing less Int protein during excision than during integration. λ xis and int overlap by 20 bp, and P22 int and xis overlap by 121 bp. In each case, translation of xis could sterically hinder the init
of site-specific functions
In all three phage, the site-specific recombination functions (attP, int and xis) are tightly clustered on a relatively small stretch of DNA (approximately 1.7 kb; Fig. 9). The int and xis coding sequences overlap in λ and P22, and directly abut each other in φ80; in all three phage, the two genes are directly adjacent to attP. This arrangement may increase the efficiency of recombination, because expression of int and xis would lead to high local concentrations of the proteins near their site of action. The clustering of functions also emphasizes the modular nature of the genomic organization of lambdoid phage (see Susskind & Botstein, 1978; Campbell & Botstein, 1983): such an organization is consistent with the notion that lambdoid phage evolved from pre-existing functional "cassettes". A complete alteration of insertion specificity can be accomplished by an exchange of only a small region of DNA between different phage or host-encoded transposition cassettes. A hybrid phage integrates and excises with an entirely different site specificity, yet still retains the genetic organization necessary for the proper control and execution of gene expression.
5. Conclusions

λ, φ80 and P22 are closely related phage with similar site-specific recombination functions. For all three phage, the regulated expression of these functions is essential for proper control of development. Furthermore, structural analysis of the att sites has suggested that the recombination functions of the three phage have similar mechanistic features. Nevertheless, this homology of function and mechanism is not obviously reflected in homology of the enzyme sequences or the regulation of their expression. When compared globally, the primary structures of the proteins are only distantly related. Homologies are detected only when local homologies are used to align the proteins from all three phage. Homology among the Xis proteins is found at their N termini, while significant homologies among the Int proteins are found only in their C-terminal halves. That the relative arrangements of int and xis also vary is intriguing considering the regulatory significance of the arrangement of these genes in λ. The different genetic arrangement in φ80 indicates an alternative pattern of regulation. Further study of the site-specific recombination systems of φ80 and P22 should provide new perspectives on both mechanistic and regulatory aspects of phage integration and excision.

We thank D. Lipman and R. Sauer for their very generous help in the computer analysis of protein sequences, W. McClure for the computer analysis of potential promoter sequences, and K. Franz for some of the sequence data. We are grateful to Y.-S. Ho, S. Keilty, M. Rosenberg, E. Ljungquist, W. Kalionis, J. Egan, N. Franklin, A. Poteete, N. Sternberg, K. Abremski, and R. Weisberg for helpful advice and/or communication of results prior to publication. The following investigators generously gave us bacterial, phage or plasmid strains: W. Dove, N. Franklin, E. Jackson, A. Matsushiro, Y. Nishimune, A. Poteete, K. Talmadge, and W. Gilbert. This work was supported by a grant to A.L. from the N.I.H. (AI13544). J.L. was the recipient of an N.S.F. graduate fellowship.
References

Abdel-Meguid, S. S., Grindley, N. D. G., Templeton, N. S. & Steitz, T. A. (1984). Proc. Nat. Acad. Sci., U.S.A. 81, 2001-2005.
Abremski, K. & Gottesman, S. (1982). J. Biol. Chem. 257, 9658-9662.
Argos, P., Landy, A., Abremski, K., Egan, J. B., Haggard-Ljungquist, E., Hoess, R. H., Kahn, M. L., Kalionis, W., Narayana, S. V. L., Pierson III, L. S., Sternberg, N. & Leong, J. M. (1986). EMBO J. 5, 433-440.
Better, M., Chi, L., Williams, R. C. & Echols, H. (1982). Proc. Nat. Acad. Sci., U.S.A. 79, 5837-5841.
Better, M., Wickner, S., Auerbach, J., Williams, R. & Echols, H. (1983). Cell, 32, 161-168.
Bickle, T. A. (1982). In Nucleases (Linn, S. M. & Roberts, R. J., eds), pp. 85-108, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
Bushman, W., Yin, S., Thio, L.-L. & Landy, A. (1984). Cell, 39, 699-706.
Campbell, A. (1962). Advan. Genet. 11, 101-116.
Campbell, A. (1983). In Mobile Genetic Elements (Shapiro, J., ed.), pp. 66-104, Academic Press, New York.
Campbell, A. & Botstein, D. (1983). In Lambda II (Hendrix, R. W., Roberts, J. W., Stahl, F. W. & Weisberg, R. A., eds), pp. 365-380, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
Chan, R. K. & Botstein, D. (1976). Genetics, 83, 433-458.
Chisholm, R. L., Deans, R. J., Jackson, E. N. & Rutila, J. E. (1980). Virology, 102, 172-189.
Coleman, J., Green, P. J. & Inouye, M. (1984). Cell, 34, 429-436.
Craig, N. & Nash, H. A. (1983). Cell, 35, 795-803.
Craig, N. & Nash, H. A. (1984). Cell, 39, 707-716.
Davies, R. W. (1980). Nucl. Acids Res. 8, 1765-1774.
Dayhoff, M. O., Schwartz, R. M. & Orcutt, B. L. (1978). In Atlas of Protein Sequence and Structure, vol. 5, suppl. 3 (Dayhoff, M. O., ed.), pp. 345-352, Nat. Biomed. Res. Found., MD.
Echols, H. & Guarneros, G. (1983). In Lambda II (Hendrix, R. W., Stahl, F. W., Roberts, J. W. & Weisberg, R. A., eds), pp. 72-92, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
Enquist, L. W., Kikuchi, A. & Weisberg, R. A. (1979). Cold Spring Harbor Symp. Quant. Biol. 43, 1115-1121.
Franklin, N. C. (1967). Genetics, 57, 301-318.
Franklin, N. C. (1985). J. Mol. Biol. 181, 85-91.
Franklin, N. C., Dove, W. F. & Yanofsky, C. (1965). Biochem. Biophys. Res. Comm. 18, 910-923.
Goad, W. & Kanehisa, M. (1982). Nucl. Acids Res. 10, 183-196.
Gough, J. A. & Murray, N. E. (1983). J. Mol. Biol. 166, 1-19.
Guarneros, G. & Echols, H. (1970). J. Mol. Biol. 47, 565-583.
Guarneros, G. & Echols, H. (1973). Virology, 52, 30-38.
Hawley, D. & McClure, W. (1983). Nucl. Acids Res. 11, 2237-2255.
Herskowitz, I. & Hagen, D. (1980). Annu. Rev. Genet. 15, 399-455.
Hilliker, S. (1979). Ph.D. thesis, MIT.
Ho, Y.-S., Wulff, D. L. & Rosenberg, M. (1983). Nature (London), 304, 703-708.
Hoess, R. H., Foeller, C., Bidwell, K. & Landy, A. (1980). Proc. Nat. Acad. Sci., U.S.A. 77, 2482-2486.
Hsu, P.-L., Ross, W. & Landy, A. (1980). Nature (London), 285, 85-91.
Izant, J. G. & Weintraub, H. (1984). Cell, 36, 1007-1015.
Jackson, E. N., Miller, H. I. & Adams, M. L. (1978). J. Mol. Biol. 118, 347-363.
Chem. 260, 4468-4477.
Leong, J. M., Nunes-Düby, S. E. & Landy, A. (1985b). Proc. Nat. Acad. Sci., U.S.A. 82, 6990-6994.
Mascarenhas, D., Kelley, R. & Campbell, A. (1981). Gene, 15, 151-157.
Matsushiro, A. (1963). Virology, 19, 475-491.
Maxam, A. & Gilbert, W. (1980). Meth. Enzymol. 65, 499-560.
Miller, H. I. & Friedman, D. I. (1977). In DNA Insertion Elements, Plasmids, and Episomes (Bukhari, A., Shapiro, J. & Adhya, S., eds), pp. 349-356, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
Miller, H. I., Kikuchi, A., Nash, H. A., Weisberg, R. A. & Friedman, D. I. (1979). Cold Spring Harbor Symp. Quant. Biol. 43, 1121-1127.
Mizuuchi, K., Weisberg, R., Enquist, L., Mizuuchi, M., Buraczynska, M., Foeller, C., Hsu, P.-L., Ross, W. & Landy, A. (1981). Cold Spring Harbor Symp. Quant. Biol. 45, 429-435.
Moore, S. &
Ward, D. F. & Murray, N. E. (1979). J. Mol. Biol. 133, 249-266.
Weaver, S. & Levine, M. (1978). J. Mol. Biol. 118, 389-411.
Webster, T. A., Gibson, B. W., Keng, T., Biemann, K. & Schimmel, P. (1983). J. Biol. Chem. 258, 10637-10641.
Weinstock, G. (1977). Ph.D. thesis, MIT.
Weinstock, G., Susskind, M. M. & Botstein, D. (1979). Genetics, 92, 685-710.
Weisberg, R. & Landy, A. (1983). In Lambda II (Hendrix, R. W., Stahl, F. W., Roberts, J. W. & Weisberg, R. A., eds), pp. 211-250, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.
Yin, S., Bushman, W. & Landy, A. (1985). Proc. Nat. Acad. Sci., U.S.A. 82, 1040-1044.
Youderian, P. & Susskind, M. M. (1980). Virology, 107, 258-269.

Edited by M. E. Gottesman