PLASMID 28, 86-91 (1992)
Nucleotide Sequence of Mycoplasma mycoides Subspecies Mycoides Plasmid pKMK1
KENDALL W. KING* AND KEVIN DYBVIG*,†,1
Departments of †Comparative Medicine and *Microbiology, University of Alabama at Birmingham, Birmingham, Alabama 35294
Received December 9, 1991; revised March 2, 1992

To facilitate the development of mycoplasmal cloning vectors, we have determined the nucleotide sequence of pKMK1, a cryptic plasmid isolated from Mycoplasma mycoides subsp. mycoides. It is 1875 bp in length and contains two open reading frames (ORFs) that share homology with ORFs from members of a large family of gram-positive bacterial plasmids that replicate via a single-stranded DNA intermediate. Putative origins of replication and candidate cloning sites have been identified. © 1992 Academic Press, Inc.
Because of inherent differences between the genetic code of mycoplasmas and that of other eubacteria (Dybvig, 1990; Yamao et al., 1985), existing systems for studying mycoplasmal genetics are limited. In all mycoplasmas for which the genetic code is known (except the genus Acholeplasma), the codon UGA encodes tryptophan rather than translation termination. Therefore, eubacterial genetic systems are not useful for functional studies requiring expression of cloned mycoplasmal genes containing UGA codons. There is also evidence that Escherichia coli may initiate translation of cloned mycoplasmal genes at incorrect sites (Notarnicola et al., 1990). Two plasmids have been isolated from different strains of Mycoplasma mycoides subspecies mycoides, opening up new possibilities for the development of mycoplasmal genetic systems. The DNA sequence of one of these plasmids, pADB201, has previously been reported (Bergemann et al., 1989). The second plasmid, pKMK1, was isolated from M. mycoides subsp. mycoides strain GM12, and initial characterization indicated that it is slightly larger than pADB201 and has a different restriction pattern (Dybvig and Khaled, 1990). We report in this paper the nucleotide sequence of pKMK1 and its potential open reading frames (ORFs), as well as the homology of pKMK1 with pADB201 and other members of the family of single-stranded (ss)² DNA plasmids.

Sequence data from this article have been deposited with the EMBL/GenBank Data Libraries under Accession No. M81470.
¹ To whom correspondence should be addressed.
² Abbreviations used: ss, single-stranded; GCG, Genetics Computer Group; ORF, open reading frame; RBS, ribosome binding site; P-O, plus-strand origin; M-O, minus-strand origin.
0147-619X/92 $5.00
Copyright © 1992 by Academic Press, Inc. All rights of reproduction in any form reserved.
SHORT COMMUNICATIONS

pKMK1 was randomly linearized using DNase I in the presence of Mn²⁺ (Heffron et al., 1978). The staggered ends generated were filled in with T4 DNA polymerase (Heffron et al., 1978), and XhoI linkers were ligated to the ends. Following digestion with XhoI, pKMK1 was cloned into the XhoI site of pJI3, a plasmid that replicates in E. coli (Burdett et al., 1982). The mixture of random constructs was transformed into E. coli DH1 cells (Hanahan, 1983), and the plasmid was isolated from a single clone by CsCl-gradient fractionation of cell lysates. Sequencing was performed by the dideoxynucleotide chain termination method using a double-stranded DNA template and the Sequenase 2.0 kit from United States Biochemical Corp. (Cleveland, OH). DNA primers, usually 20-mers, were supplied by the Oligonucleotide Synthesis Core Facility at the University of Alabama at Birmingham. Both strands of cloned pKMK1 were sequenced in their entirety. Uncloned pKMK1 was also sequenced through the region that received XhoI linkers to verify the sequence obtained from the cloned plasmid. Sequence fragment assembly and analysis were performed using the MacVector software package from International Biotechnologies, Inc. (New Haven, CT) and the Genetics Computer Group (GCG) programs (Devereux et al., 1984). pKMK1 DNA and predicted protein sequences were compared with published DNA and protein sequences in the GenBank and NBRF databases using the MacVector package and the GAP program from the GCG. Potential DNA secondary structures were analyzed using the FOLD (Zuker and Stiegler, 1981) and STEMLOOP programs from the GCG.

pKMK1 is 1875 bp in length (Fig. 1); in comparison, pADB201 has 1717 bp (Bergemann et al., 1989). Some of the restriction sites previously identified in pKMK1 (Dybvig and Khaled, 1990) are shown in Fig. 1, as well as selected unique restriction sites. The low G+C content of pKMK1, 29%, is similar to the estimated value for the genome of M. mycoides (Weisburg et al., 1989). Dybvig and Khaled (1990) previously reported a limited region of homology between pKMK1 and pADB201, identified by Southern blot analysis. Comparison of the nucleotide sequences reveals that, within this region, the homology between the two plasmids is extensive: between nucleotides 941 and 1673 of pKMK1, more than 90% of the nucleotides are identical to those of the corresponding region of pADB201.

A total of 11 ORFs were found that could encode polypeptides at least 40 amino acids long, beginning with an AUG, GUG, or UUG codon, taking into account the fact that UGA encodes tryptophan. Eight of the 11 ORFs began with GUG or UUG; however, a conventional ribosome binding site (RBS) could not be located for any of these
ORFs. The three remaining ORFs would all begin with an AUG-encoded methionine residue. The largest of these ORFs, located between nucleotide positions 293 and 967 (ORF 1), could encode a polypeptide 225 amino acids in length with a predicted molecular mass of 27.2 kDa. This polypeptide, designated polypeptide 1, would contain four tryptophan residues, three encoded by UGA codons and one by the conventional UGG codon. At the amino acid level, polypeptide 1 and polypeptide A of pADB201 are 35% identical and 58% similar by the parameters of the GAP program. ORF A has homology with genes encoding proteins involved in the replication of staphylococcal plasmid pE194 (repF) and streptococcal plasmid pLS1 (repB) (Bergemann et al., 1989). pE194 and pLS1 belong to a family of gram-positive bacterial plasmids that replicate via a single-stranded DNA intermediate (ssDNA plasmids), in which a rolling-circle mechanism is thought to be the mode of replication (Gruss and Ehrlich, 1989). Polypeptide 1 of pKMK1 is 25% identical to both the repF and repB gene products, and 49% and 45% similar to them, respectively. In Fig. 2, these four polypeptides have been optimally aligned with each other using the COMPARE and PRETTY programs from the GCG. The homology is more extensive in the amino-terminal domain of the polypeptides (Fig. 2). A second ORF, located between positions 67 and 225 and designated ORF 2, could encode a polypeptide of 53 amino acids. pADB201 contains an open reading frame, located just 5' of ORF A, encoding a small polypeptide of 52 amino acids. At the amino acid level, ORF 2 and this small polypeptide from pADB201 are 56% identical and 87% similar. ORF 2 and the COP protein of pE194, a protein involved in the replication of that plasmid (Byeon and Weisblum, 1990), are 35% identical and 55% similar. A third ORF is located between positions 1608 and 1817 and could encode a 70-amino-acid polypeptide.
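The ORF criteria used here (start at AUG, GUG, or UUG; at least 40 codons; UGA read as tryptophan) can be made concrete with a short script. The following is a minimal Python sketch, not the GCG or MacVector procedure used in this study. The helper names (`find_orfs`, `aa_length`, `gc_content`, `translate`) are hypothetical, and the sketch assumes a plain A/C/G/T string, a circular molecule, 1-based inclusive coordinates, and reported ORF coordinates that exclude the stop codon. Under that last assumption, the coordinates given above agree with the stated lengths: (967 - 293 + 1)/3 = 225, (225 - 67 + 1)/3 = 53, and (1817 - 1608 + 1)/3 = 70.

```python
# Minimal sketch of an ORF scan under the Mycoplasma genetic code (UGA = Trp).
# Hypothetical helper names; not the GCG/MacVector procedure used in the study.
from itertools import product

BASES = "TCAG"
# Standard code (NCBI table 1), codons enumerated in TCAG order.
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), STANDARD_AA)}
# Mycoplasma code (NCBI table 4): same amino acid assignments except TGA (UGA) = Trp.
MYCOPLASMA = dict(STANDARD, TGA="W")

STARTS = {"ATG", "GTG", "TTG"}  # AUG, GUG, UUG written as DNA


def gc_content(seq):
    """Fraction of G + C in a DNA string."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def translate(cds, table=MYCOPLASMA):
    """Translate codon by codon; stop codons are shown as '*'."""
    return "".join(table[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def aa_length(start, end):
    """Codons spanned by 1-based inclusive coordinates (assumed to exclude the stop)."""
    span = end - start + 1
    if span % 3:
        raise ValueError("coordinates do not span whole codons")
    return span // 3


def _scan(seq, table, min_aa):
    """Forward-strand ORFs of a circular sequence as 0-based (start, last_nt, protein)."""
    n = len(seq)
    ext = seq * 3  # unroll the circle so an ORF may run through position 1
    best = {}  # stop-codon position -> longest (start, protein) ending there
    for i in range(n):
        if ext[i:i + 3] not in STARTS:
            continue
        protein = ["M"]  # assumption: any of the three start codons is read as (f)Met
        for j in range(i + 3, i + n, 3):  # at most one trip around the circle
            aa = table[ext[j:j + 3]]
            if aa == "*":
                stop = j % n
                if len(protein) >= min_aa and (
                    stop not in best or len(protein) > len(best[stop][1])
                ):
                    best[stop] = (i, "".join(protein))
                break
            protein.append(aa)
    return [(i, (stop - 1) % n, prot) for stop, (i, prot) in best.items()]


def find_orfs(seq, min_aa=40, table=MYCOPLASMA):
    """ORFs on both strands as (strand, start, end, protein); 1-based, stop excluded."""
    seq = seq.upper()
    n = len(seq)
    orfs = [("+", i + 1, last + 1, p) for i, last, p in _scan(seq, table, min_aa)]
    rc = seq[::-1].translate(str.maketrans("ACGT", "TGCA"))
    # Index k on the reverse complement corresponds to position n - k on the given strand.
    orfs += [("-", n - i, n - last, p) for i, last, p in _scan(rc, table, min_aa)]
    return sorted(orfs, key=lambda orf: (orf[1], orf[0]))
```

For example, `aa_length(293, 967)` returns 225. Passing `table=STANDARD` to `find_orfs` instead of the default illustrates the point made in the introduction: an internal TGA that a mycoplasma reads as tryptophan truncates the ORF under the standard code. The RBS, promoter, and terminator searches described in the text are not modeled here.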
However, this putative polypeptide contains two CGG (arginine) codons, and recent evidence from Mycoplasma capricolum, a mycoplasma closely related to M. mycoides, suggests that this codon may not be used in some mycoplasmas (Andachi et al., 1989; Oba et al., 1991). In light of this evidence, and of the lack of a conventional ribosome binding site 5' of the start codon, this ORF may not be a gene. Using the GCG TERMINATOR program (Brendel and Trifonov, 1984a,b), a rho-independent terminator was located downstream of ORF 1 (Fig. 1); none could be located downstream of ORF 2. Conventional promoter sequences could be located upstream of ORF 2 (Fig. 1), but not upstream of ORF 1. Together, these observations suggest that ORF 2 and ORF 1 may be transcribed as a single message. A similar arrangement has been described for the replication proteins of pLS1 (Puyet et al., 1988) and lactococcal plasmid pFX2 (Xu et al., 1991).

FIG. 1. Nucleotide sequence of pKMK1 and amino acid sequence of ORF 1 and ORF 2. Putative -10 and -35 promoter sequences (solid line), ribosome binding sites (RBS) (dashed line), a rho-independent terminator (--TERM--) (bases paired in the stem of the putative terminator are double underlined), and plus-strand (+) and minus-strand (-) origins of replication are indicated.

Considerable effort has been made to locate the origins of replication of both plasmid strands for many of the ssDNA plasmids (Gruss and Ehrlich, 1989; Novick, 1989). Both origins lie on the same strand of DNA, and each can form a separate hairpin. The plus-strand origins (P-Os) of pE194 and related plasmids contain a consensus sequence, 5'-TACTACGA-3', within which lies the site at which the DNA is nicked to initiate replication (Gruss and Ehrlich, 1989; Novick, 1989). pKMK1 contains,
at nucleotide positions 1774 to 1782, a similar sequence, 5'-TACTACCGA-3'. Based upon analysis of this region using the GCG FOLD and STEMLOOP programs, the first five nucleotides of this sequence lie within a loop flanked by nucleotides that can form an 11-bp stem containing no mismatches (Fig. 3). This putative P-O is similar to those of pE194, pLS1, and pADB201 (Gruss and Ehrlich, 1989), except that it is not as G+C rich. Perhaps the stem of the putative pKMK1 P-O is longer than the stems of the other origins to compensate for the lack of G+C base pairs.

FIG. 2. Optimal alignment of predicted amino acid sequences of replication proteins from pKMK1 (ORF 1), pADB201 (ORF A), pLS1 (repB), and pE194 (repF) using the COMPARE and PRETTY programs from the University of Wisconsin Genetics Computer Group. Asterisks (*) represent identity between ORF 1 and the corresponding polypeptide. Spaces (-) have been introduced for optimal alignment.

FIG. 3. Potential hairpin-loop structures for plus-strand origins (P-Os) of pE194, pLS1, pADB201, and pKMK1.

The minus-strand origin (M-O), essential for the conversion of the single-stranded DNA intermediate to a double-stranded molecule, is larger than the P-O, orientation specific, and consists of imperfect palindromic sequences (Gruss and Ehrlich, 1989). A putative M-O of pADB201 has previously been identified (Bergemann, 1990; Bergemann and Finch, 1990) and shown to be very similar to the M-Os of Bacillus subtilis plasmids pBAA1 and pLS11 (Chang et al., 1987; Gruss and Ehrlich, 1989). These M-Os are composed of two pairs of palindromic sequences and one pair of imperfect palindromic sequences, with nucleotides separating the members of each pair. Since the M-O of pADB201 lies within the region sharing strong homology with pKMK1, it is highly probable that these sequences also represent the M-O of pKMK1. The putative M-O of pKMK1, therefore, extends from nucleotide position 1129 (the beginning of the first pair of palindromic sequences) to 1258 (the end of the third pair of palindromic sequences) (Fig. 1). Figure 4 shows the M-Os of plasmids pBAA1, pLS11, and pKMK1. The M-O of pKMK1 differs from the M-O of pADB201 only by a single G to A transition.

FIG. 4. Minus-strand origins (M-Os) for Bacillus subtilis plasmids pBAA1 and pLS11 and Mycoplasma mycoides subsp. mycoides plasmid pKMK1. (*) Denotes a G to A transition from the M. mycoides subsp. mycoides plasmid pADB201 M-O sequence to that of pKMK1.

The homology of ORF 1 and ORF 2 to polypeptides from pADB201 and other plasmids suggests that pKMK1 belongs to the large family of ssDNA plasmids, which replicate via a single-stranded DNA intermediate. The region of extensive homology between pKMK1 and pADB201 contains the M-O and may reflect a common origin of the two plasmids. The conservation of this region might also be related to the horizontal exchange and maintenance of sequence "cassettes" between closely related ssDNA plasmids (Novick, 1989). Determining the nucleotide sequence of pKMK1 is an important step in the development of cloning vectors for studying mycoplasmal genetics. Based on the sequence
data, candidate cloning sites can be identified for insertion of selectable markers into pKMK1. One such site is the unique EcoRI site (Fig. 1), which lies between the plus and minus origins of replication. Potential cloning vectors can be tested for their ability to replicate in M. mycoides subsp. mycoides using the polyethylene glycol-mediated transformation procedure we have recently described (King and Dybvig, 1991).

ACKNOWLEDGMENTS

This work was supported by Public Health Service Grant AI25640 from the National Institutes of Health. The Genetics Computer Group programs were made available to us through the University of Alabama at Birmingham Center for AIDS Research, supported by Public Health Service Grant P30 AI27767 from the National Institutes of Health. We thank L. Finch for providing valuable information and suggestions.
REFERENCES

ANDACHI, Y., YAMAO, F., MUTO, A., AND OSAWA, S. (1989). Codon recognition patterns as deduced from sequences of the complete set of transfer RNA species in Mycoplasma capricolum. J. Mol. Biol. 209, 37-54.
BERGEMANN, A. D. (1990). Ph.D. thesis. University of Melbourne.
BERGEMANN, A. D., AND FINCH, L. R. (1990). Analysis of a mycoplasma plasmid as a cloning vector. I.O.M. Lett. 1, 207-208.
BERGEMANN, A. D., WHITLEY, J. C., AND FINCH, L. R. (1989). Homology of mycoplasma plasmid pADB201 and staphylococcal plasmid pE194. J. Bacteriol. 171, 593-595.
BRENDEL, V., AND TRIFONOV, E. N. (1984a). A computer algorithm for testing potential prokaryotic terminators. Nucleic Acids Res. 12, 4411-4421.
BRENDEL, V., AND TRIFONOV, E. N. (1984b). Computer-aided mapping of DNA-protein interaction sites. In "Role of Data in Scientific Progress. Proceedings of the 9th International Conference of the Committee on Data for Science and Technology" (D. S. Glaeser, Ed.), pp. 115-118. North-Holland, Amsterdam.
BURDETT, V., INAMINE, J., AND RAJAGOPALAN, S. (1982). Multiple tetracycline resistance determinants in Streptococcus. In "Microbiology 1982" (D. Schlessinger, Ed.), pp. 155-158. American Society for Microbiology, Washington, DC.
BYEON, W.-H., AND WEISBLUM, B. (1990). Replication genes of plasmid pE194-cop and repF: Transcripts and encoded proteins. J. Bacteriol. 172, 5894-5900.
CHANG, S., CHANG, S.-Y., AND GRAY, O. (1987). Structural and genetic analyses of a par locus that regulates plasmid partition in Bacillus subtilis. J. Bacteriol. 169, 3952-3962.
DEVEREUX, J., HAEBERLI, P., AND SMITHIES, O. (1984). A comprehensive set of sequence analysis programs for the VAX. Nucleic Acids Res. 12, 387-395.
DYBVIG, K. (1990). Mycoplasmal genetics. Annu. Rev. Microbiol. 44, 81-104.
DYBVIG, K., AND KHALED, M. (1990). Isolation and characterization of a second cryptic plasmid from Mycoplasma mycoides subsp. mycoides. Plasmid 24, 153-155.
GRUSS, A., AND EHRLICH, S. D. (1989). The family of highly interrelated single-stranded deoxyribonucleic acid plasmids. Microbiol. Rev. 53, 231-241.
HANAHAN, D. (1983). Studies on transformation of Escherichia coli with plasmids. J. Mol. Biol. 166, 557-580.
HEFFRON, F., SO, M., AND MCCARTHY, B. J. (1978). In vitro mutagenesis of a circular DNA molecule by using synthetic restriction sites. Proc. Natl. Acad. Sci. USA 75, 6012-6016.
KING, K. W., AND DYBVIG, K. (1991). Plasmid transformation of Mycoplasma mycoides subsp. mycoides is promoted by high concentrations of polyethylene glycol. Plasmid 26, 108-115.
NOTARNICOLA, S. M., MCINTOSH, M. A., AND WISE, K. S. (1990). Multiple translational products from a Mycoplasma hyorhinis gene expressed in Escherichia coli. J. Bacteriol. 172, 2986-2995.
NOVICK, R. P. (1989). Staphylococcal plasmids and their replication. Annu. Rev. Microbiol. 43, 537-565.
OBA, T., YOSHIKI, A., MUTO, A., AND OSAWA, S. (1991). CGG: An unassigned or nonsense codon in Mycoplasma capricolum. Proc. Natl. Acad. Sci. USA 88, 921-925.
PUYET, A., DEL SOLAR, G. H., AND ESPINOSA, M. (1988). Identification of the origin and direction of replication of the broad-host-range plasmid pLS1. Nucleic Acids Res. 16, 115-133.
WEISBURG, W. G., TULLY, J. G., ROSE, D. L., PETZEL, J. P., OYAIZU, H., YANG, D., MANDELCO, L., SECHREST, J., LAWRENCE, T. G., VAN ETTEN, J., MANILOFF, J., AND WOESE, C. R. (1989). A phylogenetic analysis of the mycoplasmas: Basis for their classification. J. Bacteriol. 171, 6455-6467.
XU, F., PEARCE, L. E., AND YU, P.-L. (1991). Genetic analysis of a lactococcal plasmid replicon. Mol. Gen. Genet. 227, 33-39.
YAMAO, F., MUTO, A., KAWAUCHI, Y., IWAMI, M., IWAGAMI, S., AZUMI, Y., AND OSAWA, S. (1985). UGA is read as tryptophan in Mycoplasma capricolum. Proc. Natl. Acad. Sci. USA 82, 2306-2309.
ZUKER, M., AND STIEGLER, P. (1981). Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9, 133-148.

Communicated by Stuart B. Levy