Gene, 73 (1988) 175-183 Elsevier
GEN 02740
Analysis of the nucleotide sequence of the P1 operon of Mycoplasma pneumoniae
(Recombinant DNA; primer extension; polycistronic message; phage vectors λ and M13)
Julia M. Inamine, Steve Loechel and Ping-chuan Hu
Departments of (a) Pediatrics and (b) Microbiology and Immunology and (c) Center for Environmental Medicine, University of North Carolina, Chapel Hill, NC 27599-7220 (U.S.A.)
Received 2 June 1988
Revised 8 August 1988
Accepted 10 August 1988
Received by publisher 30 August 1988
SUMMARY
The attachment of virulent Mycoplasma pneumoniae to the ciliated epithelium of the respiratory tract involves a surface protein designated P1. Our previous determination of the nucleotide sequence of the P1 attachment-protein gene revealed that it is flanked by open reading frames (ORFs) and that there is no obvious ribosome-binding site (RBS) or transcription termination sequence in the adjacent regions. We extended this analysis by cloning and sequencing the 18-kb region containing the P1 gene. This study indicates that the P1 gene is transcribed as part of a larger polycistronic message. The P1 operon is composed of the P1 gene and two predicted genes, designated ORF-4 and ORF-6. The gene order is ORF-4, P1, ORF-6 with intervening regions of 12 and 5 nt, respectively. ORF-4 and ORF-6 have respective coding capacities for proteins of Mr ≈ 28 000 and Mr ≈ 130 000. Putative promoter and RBS sequences which correspond closely to those found in Escherichia coli and Bacillus subtilis, as well as a sequence indicative of a transcription terminator, have been found in the flanking sequences. The transcription start point has been determined by primer extension of M. pneumoniae RNA.
INTRODUCTION
The specific attachment of M. pneumoniae to the respiratory ciliated epithelium is mediated by a surface protein designated P1 (Hu et al., 1977). The nucleotide sequence of the attachment-protein P1 gene has been determined by ourselves (Inamine et al., 1988) as well as by Su et al. (1987). The 4881 nt of the P1 structural gene are identical in these two publications. However, our study revealed that the P1 gene is flanked by ORFs, suggesting that it is transcribed as part of a larger polycistronic message, while the analysis by Su et al. (1987) indicated that the P1 gene is monocistronic in nature. This paper presents the cloning and sequencing of an 18-kb region containing the P1 gene. Nucleotide sequence analysis shows that the P1 gene is part of an operon.

Correspondence to: Dr. P.-c. Hu, Department of Pediatrics, 535 Burnett-Womack Bldg., CB No. 7220, University of North Carolina, Chapel Hill, NC 27599-7220 (U.S.A.) Tel. (919)966-2331.

Abbreviations: aa, amino acid(s); AMV, avian myeloblastosis virus; bp, base pair(s); dd, dideoxy; dNTP, deoxynucleotide triphosphate(s); kb, kilobase or 1000 bp; nt, nucleotide(s); oligo, oligodeoxyribonucleotide; ORF, open reading frame; P1, surface protein (Mr 169 758) of Mycoplasma pneumoniae that mediates attachment of the organism to respiratory ciliated epithelium, and that is also a major immunogen; P1 gene, gene coding for P1; RBS, ribosome-binding site.

0378-1119/88/$03.50 © 1988 Elsevier Science Publishers B.V. (Biomedical Division)
MATERIALS AND METHODS
(a) Bacteria, phage and media

Monolayer cultures of M. pneumoniae strain M-129 (ATCC29342) were grown in glass prescription bottles, rinsed, and harvested as previously described (Hu et al., 1977). E. coli strain NM539 was used to propagate derivatives of phage vector λEMBL3 (Frischauf et al., 1983), and strains JM101 (Messing, 1979) and JM109 (Yanisch-Perron et al., 1985) were the hosts for phages M13mp18 and M13mp19 (Norrander et al., 1983) and their derivatives. All E. coli strains were cultured with Luria broth (Ausubel et al., 1987). Phage λ derivatives were propagated in E. coli by standard procedures (Ausubel et al., 1987). Transformation of E. coli was performed by the method of Hanahan (1985). Recombinant phage λJI-7, containing the P1 attachment-protein gene within approx. 16 kb of M. pneumoniae DNA, and several independently isolated recombinant phages with DNA inserts that overlap that of λJI-7, were described or isolated as described previously (Inamine et al., 1988).

(b) Chemicals and enzymes

Restriction enzymes and other enzymes used in molecular cloning were purchased from a number of sources. M13 vectors were from Bethesda Research Laboratories, and λEMBL3 was from Promega Biotec. dNTPs and ddNTPs were from Pharmacia. Radioactive materials were from Amersham Corporation and ICN Biomedicals, Inc. Primer oligos were synthesized in the laboratory of Dr. Clyde Hutchinson III. The sources of other products and kits are given in the text. Unless otherwise stated, all were used according to the manufacturers' instructions.

(c) Nucleic acid techniques

Procedures for the isolation of DNA from phage and the isolation of DNA and RNA from M. pneumoniae were described previously (Inamine et al., 1988). Restriction fragments from the recombinant λ phages were purified from LE agarose (FMC Corporation) gels using a Geneclean Kit (BIO 101, Inc.), and cloned into M13 vectors by standard procedures (Ausubel et al., 1987). Nested deletions of M13 phage clones were produced by the exonuclease III method of Henikoff (1984) using the Erase-a-base System (Promega Biotec). Nucleotide sequences were determined by the dideoxy chain-termination method of Sanger et al. (1977) as previously described (Inamine et al., 1988).
RESULTS AND DISCUSSION
(a) Nucleotide sequencing of the 18-kb region containing the P1 gene

We had previously sequenced 10.83 kb of the region containing the P1 gene (Inamine et al., 1988), and extended this analysis by cloning appropriate restriction fragments into M13 phage vectors for sequencing. These restriction fragments were isolated from λJI-7 (Inamine et al., 1988) and other recombinant phages whose inserts were mapped to this region of the M. pneumoniae chromosome (Fig. 1). Independent sources of the fragments used for DNA sequencing were particularly needed for the ORF-6 region (see below) because deletions and point mutations were commonly found in clones isolated from E. coli. Sequence information was included in the database only when a consensus sequence had been obtained for a single fragment cloned from at least two different phages. The reason for the instability of this region in E. coli is unknown. The nucleotide sequence data were compiled and analyzed with the Beckman MicroGenie program.
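The acceptance rule described above (a fragment sequence is used only when clones from at least two different phages agree) can be sketched in a few lines. This is an illustrative sketch only: the function name, the phage labels and the toy sequences are hypothetical, and the authors compiled their data with the Beckman MicroGenie program, not with code like this.

```python
from collections import defaultdict

def accepted_consensus(reads):
    """Return the fragment sequence supported by >= 2 different phage sources.

    reads: list of (phage_source, sequence) pairs for ONE restriction fragment.
    Returns None when no sequence, or more than one sequence, meets the rule.
    """
    sources_by_sequence = defaultdict(set)
    for source, sequence in reads:
        sources_by_sequence[sequence.upper()].add(source)
    # Keep only sequences seen in clones from at least two independent phages.
    supported = [seq for seq, srcs in sources_by_sequence.items() if len(srcs) >= 2]
    return supported[0] if len(supported) == 1 else None

if __name__ == "__main__":
    # Hypothetical reads: two phages agree, a third clone carries a point mutation.
    reads = [("phage_A", "ACGTTGCA"), ("phage_B", "ACGTTGCA"), ("phage_C", "ACGTTGCT")]
    print(accepted_consensus(reads))
```

Counting distinct sources rather than distinct clones is the point of the rule: two clones from the same phage could share the same E. coli-derived mutation.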
Fig. 1. Partial restriction endonuclease maps of the cloned inserts of recombinant λ phages which span the 38-kb region containing the P1 operon. λJI-7 (i.e., No. 7) has been described previously (Inamine et al., 1988). The solid boxes identify the 4.4- and 1.7-kb HindIII fragments, or portions thereof, previously found to contain transcribed or untranscribed, respectively, P1 coding sequences (Inamine et al., 1988). Their relative locations, and the location of the P1 gene and P1 operon, are shown below on the composite map. The complete sequence of both strands of the 18-kb region indicated on the bottom was obtained from nested deletions of restriction fragments subcloned from the recombinant phages into M13 vectors. H, HindIII; S, SUE.
(b) Analysis of the 18-kb region containing the P1 gene

The 17 897 bp of sequence data were analyzed and seven complete ORFs were identified (Fig. 2). The unlabeled ORFs on the ends are incomplete. Consensus RBS sequences (see below) were found at the beginning of all the ORFs except ORF-5 (the P1 gene), ORF-6 and that labeled Mpl-1 1 (refer to Inamine et al., 1988).
Fig. 2. Functional map derived from the analysis of the nucleotide sequence of the 18-kb region containing the P1 operon (scale: 0 to 18 kb). ORFs are indicated by the open boxes with their limits given by nt positions and their 5' to 3' orientation shown by arrows. The P1 operon is composed of the P1 gene (ORF-5) flanked by the predicted genes designated ORF-4 and ORF-6; the estimated sizes of the encoded proteins are given within the respective boxes. The limits of the P1 operon are indicated by the horizontal arrows below the map. Relevant flanking and intervening nucleotide sequences are shown above their map locations. The -35, -10, transcription start point, and Shine-Dalgarno (S/D; RBS) sequences are underlined. Vertical arrows mark the beginning of the ATG codons and the end of the TAA or TAG stop codons, respectively, which delineate the ORFs of the P1 operon. The sequences indicative of a transcription terminator are depicted as a stem-and-loop structure (ΔG = -8.6 kcal/mol); K = kDa.
[Fig. 3 sequence panel. Labeled features: -35 and -10 regions; transcription initiation; S/D; beginning and end of the 28-kDa ORF; P1 gene (4884 nucleotides, from nt 1054 to nt 5937); beginning and end of the 130-kDa ORF; termination.]

Fig. 3. The nucleotide sequences of ORF-4 (labeled 28 kDa) and ORF-6 (labeled 130 kDa), with flanking sequences of the attachment-protein P1 gene (Inamine et al., 1988). The deduced amino acid sequence is shown below the nucleotide sequence. Only the nt are numbered, with Nos. 1 through 9691 being nt positions 3875 through 13 565, respectively, in Fig. 2. The nucleotide sequences which putatively serve as signals for transcription and translation are underlined, as are the Trp residues which correspond to the TGA codons.
The nucleotide sequences surrounding the P1 structural gene contain several notable features which indicate that the P1 gene is transcribed as part of a polycistronic message (included in Fig. 2). Neither a consensus RBS sequence (Shine and Dalgarno, 1974; Moran et al., 1982) nor a structure indicative of a transcription terminator (Platt, 1986) could be found in the regions immediately flanking the P1 gene. Rather, the P1 gene is flanked by ORFs. These predicted genes are designated ORF-4 and ORF-6 (previously referred to as ORF1 and ORF2, respectively, in Inamine et al., 1988) and their nucleotide sequences are shown in Fig. 3. The gene order is ORF-4, P1, ORF-6 with intervening regions of 12 and 5 nt, respectively. The 723 nt and 3654 nt of the coding regions of ORF-4 and ORF-6, respectively, represent 241 and 1218 aa with calculated Mr values of 27 788 and 130 441.

Northern-blot hybridizations of M. pneumoniae RNA were performed using the procedures of Ausubel et al. (1987) to see if a full-length P1 operon mRNA could be detected. Unfortunately, the half-life of the transcript is too short for this analysis. Hybridizations with DNA probes specific for each ORF of the operon resulted in smears rather than discrete bands (not shown). Interestingly, probes from the 5' end of the operon hybridized much less intensely than did 3' probes, suggesting that the partially intact transcripts which are present are derived primarily from the 3' end of the operon. This was confirmed by primer extension with a number of different oligos, where it was found that the amount of mRNA encoding the 3' half of the P1 gene exceeded that of ORF-4 and the 5' half of the P1 gene by at least ten-fold (not shown). This pattern of mRNA degradation is similar to that observed in E. coli, in which the mRNA encoding the promoter-proximal (5') genes of an operon is often degraded before that of the promoter-distal (3') genes (reviewed by Higgins and Smith, 1986).
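The ORF sizes and intergenic spacings quoted above can be checked with simple arithmetic. The sketch below assumes the two junction fragments were transcribed correctly from the Fig. 3 sequence (the ORF-4/P1 and P1/ORF-6 boundaries); the function names are illustrative and not part of the original analysis. TGA is deliberately left out of the stop set because it encodes Trp in M. pneumoniae.

```python
STOP_CODONS = {"TAA", "TAG"}  # TGA is read as Trp in M. pneumoniae, so it is not a stop here

def codon_count(nt_length):
    """Number of codons (amino acids) in a coding region of nt_length nucleotides."""
    if nt_length % 3 != 0:
        raise ValueError("coding length must be a multiple of 3")
    return nt_length // 3

def spacer_length(fragment):
    """Nucleotides between an upstream in-frame stop codon and the next ATG.

    fragment must begin in frame with the upstream ORF.
    """
    for i in range(0, len(fragment) - 2, 3):
        if fragment[i:i + 3] in STOP_CODONS:
            stop_end = i + 3
            start = fragment.find("ATG", stop_end)
            if start == -1:
                raise ValueError("no downstream ATG found")
            return start - stop_end
    raise ValueError("no in-frame stop codon found")

# Junction fragments as read from Fig. 3 (assumed to be free of transcription errors).
ORF4_P1_JUNCTION = "GCTGTATAATTTTTAACAACTATGCACCAA"   # ...Ala Val TAA | spacer | ATG (P1)
P1_ORF6_JUNCTION = "CCCGCTTAGTATTTATGAAATCG"          # ...Pro Ala TAG | spacer | ATG (ORF-6)

if __name__ == "__main__":
    for name, nt, mr in (("ORF-4", 723, 27788), ("ORF-6", 3654, 130441)):
        aa = codon_count(nt)
        print(name, aa, "aa; mean residue mass", round(mr / aa, 1))
    print("spacers:", spacer_length(ORF4_P1_JUNCTION), spacer_length(P1_ORF6_JUNCTION))
```

The mean residue masses that fall out of the quoted Mr values are in the range expected for proteins, which is a quick sanity check on the reported sizes.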
It appears, then, that the polycistronic mRNA encoding the P1 operon is unstable, probably due in large part to its size (predicted to be about 9600 nt).

The transcription start point was determined to be the AG doublet at nt positions 54 and 55 (approx. 50% each) by primer extension of total M. pneumoniae RNA with an oligo complementary to the sequences from nt positions 244-258 in Fig. 3. The results are shown in Fig. 4. No other bands could be seen either above or below the labeled bands on the gel. In addition, the nucleotide sequence of this region does not contain any hairpin or G + C-rich sequences which could have caused reverse transcriptase to terminate prematurely. By these criteria, the run-off transcript measured in vitro represents the true 5' terminus of the mRNA. This analysis is supported by the presence of sequences, beginning 27 nt (the '-35' region) and 11 nt (the '-10' region) upstream from the transcription start point, which are similar to the sequences of E. coli (Hawley and McClure, 1983) and B. subtilis (Moran et al., 1982) promoters, with the only major difference being that they are separated by 10 nt rather than 15-20 nt.

There are three possible ATG start codons for ORF-4, but the one beginning at nt position 316 is most likely because there is a Shine-Dalgarno sequence (GGAG) beginning 7 nt upstream. This is a good RBS based on its complementarity to the sequences of the 3' ends of 16S rRNAs from M. pneumoniae (U. Gobel, A.G. Geiser and E. Stanbridge, personal communication), as well as from E. coli, species of Bacillus and other mycoplasmas (Frydenberg and Christiansen, 1985; Iwami et al., 1984; and references therein). Following the TAA stop codon of ORF-6 are sequences of dyad symmetry that provide for the potential formation of a stem-and-loop structure (ΔG = -8.6 kcal/mol) indicative of a transcription terminator (Platt, 1986). All of the above features are compatible with the P1 gene being part of an operon.

(c) Analysis of ORF-4 and ORF-6

The amino acid sequences of ORF-4 and ORF-6 were deduced from the nucleotide sequence (Fig. 3). That of ORF-6 predicts that the 130-kDa protein does not contain any Cys residues, a notable feature confirmed previously for P1 protein (Inamine et al., 1988), while the predicted 28-kDa protein (ORF-4) contains three.
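Two of the regulatory signals discussed in section (b), the Shine-Dalgarno motif upstream of the ORF-4 start codon and the dyad symmetry downstream of ORF-6, can be located with simple string searches. The sketch below is illustrative: the two fragments are assumed to be faithful transcriptions of the Fig. 3 sequence, the function names are hypothetical, and the search only reports perfect inverted repeats. It does not compute a free energy, and whether the hit it reports is the exact stem-and-loop drawn in Fig. 2 is an assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string (uppercase A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]

def sd_distance(fragment, motif="GGAG"):
    """Distance (nt) from the first base of the motif to the first base of the next ATG."""
    sd = fragment.find(motif)
    if sd == -1:
        return None
    atg = fragment.find("ATG", sd + len(motif))
    return None if atg == -1 else atg - sd

def find_hairpins(seq, stem=6, min_loop=3, max_loop=10):
    """List (start, left_arm, loop, right_arm) for perfect inverted repeats."""
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            right = seq[j:j + stem]
            if len(right) == stem and right == reverse_complement(left):
                hits.append((i, left, seq[i + stem:j], right))
    return hits

# Fragments as read from Fig. 3 (assumed transcription-error free).
ORF4_START_REGION = "ACAGATTACCCCGGAGTTAATGCAACAAGCC"       # around the ATG at nt 316
ORF6_DOWNSTREAM = "GAAGCACAACTGTAAAAAGGTTGTGTTTTTTTTCAT"    # 3' of the ORF-6 stop codon

if __name__ == "__main__":
    print("GGAG to ATG:", sd_distance(ORF4_START_REGION), "nt")
    for start, left, loop, right in find_hairpins(ORF6_DOWNSTREAM):
        print("stem", left, "loop", loop, "stem", right, "at offset", start)
```

In the downstream fragment the inverted repeat is immediately followed by a run of T residues, the arrangement typical of factor-independent terminators.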
The amino acid sequences were also examined by computer analysis of the hydrophilicity values and predicted signal sequence cleavage sites (von Heijne, 1986). The first 25 aa of the P1 (Inamine et al., 1988) and 130-kDa (Fig. 3) protein sequences have the characteristics of signal peptides (Watson, 1984), and both contain Ser-Leu-Ala at aa positions 23-24-25 (corresponding to nt positions 6009-6017 of ORF-6 in Fig. 3) which conforms to the (-3, -1) rule (von Heijne, 1986). There are no possible cleavage sites predicted for the 28-kDa protein.

Fig. 4. Determination of the transcription start point. A 15-mer deoxyribonucleotide (complementary to nt positions 244-258 in Fig. 3) was end-labeled with [γ-32P]ATP and T4 polynucleotide kinase by a standard procedure (Ausubel et al., 1987). This 15-mer was used for primer extension of M. pneumoniae RNA using the GemSeq transcript sequencing system (Promega Biotec) with the following modifications. The annealing mixture contained 4 µg of total M. pneumoniae RNA and 0.5 pmol of primer in 5 mM Na4O7P2, 1 mM EDTA, pH 7, and was heated at 90°C for 3 min; the solution was adjusted to 50 mM NaCl before cooling slowly to room temperature. Primer extension was carried out with AMV reverse transcriptase in the absence of ddNTP. Size standards were produced by using the primer to sequence the appropriate M13 phage clone with the Sequenase kit (U.S. Biochemical Corp.) following the manufacturer's instructions. The reactions were resolved on a 6% polyacrylamide field gradient gel as previously described (Inamine et al., 1988), and the gel was dried before exposing to x-ray film. The respective sequencing reactions are shown in the lanes labeled T, G, C, A. The products of the primer extension reaction are in the last lane, with arrows pointing to the fragments ending at nt 54 (A) and nt 55 (G) according to the numbering system used in Fig. 3.

A computer search of the National Biomedical Research Foundation Protein Data Bank detected no significant homology between the 130-kDa and 28-kDa sequences and sequences in the database. However, extensive homology (58% with conservative replacements) was found in a comparison of the deduced C-terminal amino acid residues of P1 protein (Inamine et al., 1988) and the putative 130-kDa protein of ORF-6. The beginning of this homologous region (Fig. 5) is reminiscent of the membrane anchor sequences and proximal flanking regions found at the C terminus of group A streptococcal M proteins (Hollingshead et al., 1986; Mouw et al., 1988) in that it is characterized by regularly spaced Pro or Gly residues (aa 1509-1525 for P1 and aa 1102-1118 for 130-kDa proteins) followed by a stretch of nonpolar hydrophobic amino acids (aa 1526-1549 for P1 and aa 1119-1142 for 130-kDa proteins) that ends with several charged residues (Lys or Arg). These hydrophobic stretches are the only potential membrane-spanning regions of either the P1 or 130-kDa proteins based on the characteristics of segments of membrane proteins that are known to span lipid bilayers (Warren, 1981), although such segments are not interrupted by Pro residues as are the hydrophobic stretches in P1 and 130-kDa proteins. The above features, namely putative signal and stop-transfer sequences, are compatible with the known surface-exposed and membrane-bound nature of P1 protein and suggest that the predicted 130-kDa protein may also be located in the membrane.

We note that the homologous region of the P1 and 130-kDa protein sequences (Fig. 5) ends with residues that are extremely proline-rich (14/37 or 38% for P1 protein and 10/34 or 29% for 130-kDa protein). It will be of interest to determine if the putative 130-kDa protein is localized on the terminal tip structure of M. pneumoniae as is P1 protein (Hu et al., 1982; Feldner et al., 1982; Baseman et al., 1982); if so, then one could speculate that these proline-rich C termini may be involved in restricting the topographical distribution of these proteins in the membrane.

(d) Conclusions
Our nucleotide sequence analysis indicates that the P1 gene is transcribed as part of a polycistronic message rather than a monocistronic message as suggested by Su et al. (1987). We feel that this difference arises because of problems in their analysis: (1) no RBS was observed between the putative promoter and the proposed start codon for translation; (2) their imperfect inverted repeat sequence ('RNA terminator') has a calculated ΔG = 0 which indicates that it does not have the potential to form a stable stem-and-loop structure, and (3) the omission of one G in the 3'-flanking sequence following the P1 gene produces a TAG stop codon in an otherwise open reading frame.

Fig. 5. Comparison of the deduced C-terminal amino acid residues of P1 protein (top) and the putative 130-kDa protein (bottom). The codons of the P1 protein and 130-kDa protein represent 1628 and 1218 aa, respectively, beginning with the first codons of their ORFs; the aa positions included in this comparison are given on the left. Optimal alignments of the 119-aa residues of P1 protein and the 117-aa residues of the 130-kDa protein were obtained by introducing gaps in both sequences. Identical amino acids are indicated by the vertical lines, while conservative replacements are shown by colons. The terminal Pro residues are boxed.

The polygenic nature of the ORF-4/P1/ORF-6 transcript permits us to apply the term operon to these genes. In bacteria, genes contained within an operon are often involved in related functions. It is thus reasonable to assume that the predicted 28-kDa and 130-kDa proteins are involved in P1 processing, regulation or the attachment of M. pneumoniae to respiratory epithelium. The latter is certainly a possibility for the 130-kDa protein since it is predicted to be a membrane protein like P1.

Immediate progress in the analysis of the P1 operon with regard to the functions of the predicted 28-kDa and 130-kDa proteins is hampered by the presence of TGA (i.e., UGA) codons throughout all the ORFs. The P1 gene encodes 21 UGA codons (Inamine et al., 1988; Su et al., 1987), while ORF-4 and ORF-6 have one and seven, respectively (Fig. 3). According to the universal genetic code, UGA is a stop (opal) codon, but we have recently identified a tRNA capable of reading UGA as tryptophan in M. pneumoniae (J.M.I., J. Ho, S.L. and P.-c.H., manuscript in preparation). However, this means that the P1 operon will not be expressed in any host that utilizes the universal genetic code due to premature termination of translation at the UGA codons. This complication is readily addressed by current recombinant DNA methodologies whereby the UGA codons can be converted to the universal code equivalent (UGG) for tryptophan. We have used site-specific in vitro mutagenesis techniques to construct a universal code equivalent of the P1 gene, and will continue this work for ORF-4 and ORF-6. This will permit us to express these genes in E. coli for the production of specific antibodies and for structural/functional analyses.
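The effect of the UGA codons, and of converting them to UGG, can be illustrated on a short in-frame stretch of ORF-6. The sketch below is a string-level illustration only, not the site-specific mutagenesis procedure itself. It assumes the 24-nt fragment was transcribed correctly from Fig. 3, and the table and function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G (first base slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Mycoplasma-style code: identical except that TGA (UGA) is read as tryptophan.
MYCOPLASMA_CODE = dict(STANDARD_CODE, TGA="W")

def translate(dna, table):
    """Translate an in-frame DNA string, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = table[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

def recode_tga_to_tgg(dna):
    """Replace every in-frame TGA codon with TGG (the universal-code Trp codon)."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "".join("TGG" if codon == "TGA" else codon for codon in codons)

# In-frame ORF-6 fragment read from Fig. 3 (assumed transcription-error free).
ORF6_FRAGMENT = "GAACTCGTGGACTGAAAGCGGGTG"

if __name__ == "__main__":
    print("mycoplasma code:", translate(ORF6_FRAGMENT, MYCOPLASMA_CODE))
    print("universal code: ", translate(ORF6_FRAGMENT, STANDARD_CODE))
    print("after recoding: ", translate(recode_tga_to_tgg(ORF6_FRAGMENT), STANDARD_CODE))
```

Only codons in the reading frame are changed, so TGA triplets that happen to occur out of frame are left untouched.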
ACKNOWLEDGEMENTS
This work was supported by National Institutes of Health grants AI-20391 and HL-19171 and by Cooperative Agreement CR807392 from the Environmental Protection Agency. We thank B. Shambley and A. Thomas for secretarial assistance.
REFERENCES Ausubel, F.M.,Brent, R.,Kingston,R.E., Moore, D.D., Seidman, J.G., Smith, J.A. and Struhl, K. (Eds.), Current ProtocoIs in Molecular Biology, Wiley, New York, 1987. Baseman, J.B., Cole, R.M., Krause, DC. and Leith, D.K.: Molecular basis for c~so~tion of ~y~o~~~rn~ pneumoniae. J. Bacterial. 151 (1982) 1514-1522. Feldner, J., Gobel, U. and Bredt, W.: Mycoplasma pneumoniae adhesin localized to tip structure by monoclonal antibody. Nature 298 (1982) 765-767. Frischauf, A.M., Lehrach, H., Poustka, A. and Murray, N.: Lambda replacement vectors carrying polylinker sequences. J. Mol. Biol. 170 (1983) 827-842. Frydenberg, J. and Christiansen, C.: The sequence of 16s rRNA from Mycoplusmu strain PG50. DNA 4 (1985) 127-137. Ham&an, D.: Techniques for transformation of E. co&. In Glover, D.M. (Ed.), DNA Cloning, Vol. I. IRL Press, Oxford, 1985, pp. 109-135. Hawley, D.K. and McClure, W.R.: Compilation and analysis of Eschenkhiu coli promoter DNA sequences. Nucleic Acids Res. 11 (1983) 2237-2255. Hen&off, S.: Unidirectional digestion with exonuclease III creates targeted breakpoints for DNA sequencing. Gene 28 (1984) 351-359. Higgins, C.F. and Smith, N.H.: Messenger RNA processing, degradation and the control of gene expression. In Booth, I.R. and Higgins, C.F. (Eds.), Thirty-ninth Symposium of the Society for General Microbiology, Cambridge University Press, Cambridge, 1986, pp. 179-198. Hollingshead, SK., Fischetti, V. and Scott, J.R.: Complete nucleotide sequence of type 6 M protein of the group A Saeprocoe~. J. Biol. Chem. 261 (1986) 1677-1686. Hu, P.-c., Collier, A.M. and Baseman, J.B.: Surface parasitism by Myeoplasma pneumoniue of respiratory epithelium. J. Exp. Med. 145 (1977) 1328-1343. Hu, P.-c., Cole, R.M.,Huang, Y.S., Graham, J.A., Gardner,D.E., Collier, A.M. and Clyde Jr., W.A.: Mycoplasma pneumoniae infection: role of a surface protein in the attachment organelle. Science 216 (1982) 313-315.
Inamine, J.M., Denny, T.P., Loechel, S., Schaper, U., Huang, C.-h., Bott, K.F. and Hu, P.-c.: Nucleotide sequence of the P1 attachment-protein gene of Mycoplasma pneumoniae. Gene 64 (1988) 217-229.
Iwami, M., Muto, A., Yamao, F. and Osawa, S.: Nucleotide sequence of the rrnB 16S ribosomal RNA gene from Mycoplasma capricolum. Mol. Gen. Genet. 196 (1984) 317-322.
Messing, J.: A multipurpose cloning system based on the single-stranded DNA bacteriophage M13. Recombinant DNA Technical Bulletin, NIH Publ. No. 79-99, 2 (1979) 43-48.
Moran Jr., C.P., Lang, N., LeGrice, S.F.J., Lee, G., Stephens, M., Sonenshein, A.L., Pero, J. and Losick, R.: Nucleotide sequences that signal the initiation of transcription and translation in Bacillus subtilis. Mol. Gen. Genet. 186 (1982) 339-346.
Mouw, A.R., Beachey, E.H. and Burdett, V.: Molecular evolution of streptococcal M protein: cloning and nucleotide sequence of the type 24 M protein gene and relation to other genes of Streptococcus pyogenes. J. Bacteriol. 170 (1988) 676-684.
Norrander, J., Kempe, T. and Messing, J.: Construction of improved M13 vectors using oligodeoxyribonucleotide-directed mutagenesis. Gene 26 (1983) 101-106.
Platt, T.: Transcription termination and the regulation of gene expression. Annu. Rev. Biochem. 55 (1986) 339-372.
Sanger, F., Nicklen, S. and Coulson, A.R.: DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. USA 74 (1977) 5463-5467.
Shine, J. and Dalgarno, L.: The 3'-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proc. Natl. Acad. Sci. USA 71 (1974) 1342-1346.
Su, C.J., Tryon, V.V. and Baseman, J.B.: Cloning and sequence analysis of cytadhesin P1 gene from Mycoplasma pneumoniae. Infect. Immun. 55 (1987) 3023-3029.
von Heijne, G.: A new method for predicting signal sequence cleavage sites. Nucleic Acids Res. 14 (1986) 4683-4690.
Warren, G.: Membrane proteins: structure and assembly. In Finean, J.B. and Michell, R.H. (Eds.), Membrane Structure, Elsevier, Amsterdam, 1981, pp. 215-257.
Watson, M.E.E.: Compilation of published signal sequences. Nucleic Acids Res. 12 (1984) 5145-5164.
Yanisch-Perron, C., Vieira, J. and Messing, J.: Improved M13 phage cloning vectors and host strains: nucleotide sequences of the M13mp18 and pUC19 vectors. Gene 33 (1985) 103-119.
Communicated by R.E. Yasbin.