FEMS Microbiology Letters 159 (1998) 201–207
Cloning and sequence analysis of the putative rifamycin polyketide synthase gene cluster from Amycolatopsis mediterranei
Thomas Schupp a,*, Christiane Toupet a, Nathalie Engel a, Stephen Goff b
a Novartis Pharma AG, Core Technology Area, K-681.3.44, CH 4002 Basel, Switzerland
b Novartis Crop Protection Inc., Biotechnology and Genomics Center, Research Triangle Park, NC, USA
Received 14 November 1997; revised 7 December 1997; accepted 9 December 1997
Abstract
The 54-kbp Type I polyketide synthase gene cluster, most probably involved in rifamycin biosynthesis by Amycolatopsis mediterranei, was cloned in E. coli and completely sequenced. The DNA encodes five closely packed, very large open reading frames reading in one direction. As expected from the chemical structure of rifamycins, ten polyketide synthase modules and a CoA ligase domain were identified in the five open reading frames, which contain one to three polyketide synthase modules each. The order of the functional domains on the DNA probably reflects the order in which they are used, because each of the modules contains the predicted acetate or propionate transferase, dehydratase, and β-ketoacyl-ACP reductase functions required for the respective step in rifamycin biosynthesis. © 1998 Federation of European Microbiological Societies. Published by Elsevier Science B.V.
Keywords: Amycolatopsis mediterranei; Rifamycin; Ansamycin; Polyketide synthase; Polyketide synthase gene cluster; Cloning; Sequence analysis
* Corresponding author.
1. Introduction
Rifamycins form an important group of macrocyclic antibiotics which inhibit bacterial DNA transcription very specifically by interacting with the DNA-dependent RNA polymerase [1]. These antibiotics are most effective against Gram-positive bacteria and are widely used clinically against Mycobacterium tuberculosis infections. Some rifamycin derivatives have been found to be effective against difficult-to-treat M. avium and M. intracellulare infections in patients with AIDS [2], and against pneumococci with decreased susceptibility to penicillin [3]. Rifamycins with lipophilic side chains were found to be active at higher concentrations against RNA tumor viruses [4]. The starting compound for semisynthetic and therapeutically useful rifamycins is rifamycin B, which is produced by the soil bacterium Amycolatopsis mediterranei, belonging to the order Actinomycetales [5]. Rifamycin B is a macrocyclic polyketide, containing a naphthoquinone chromophore spanned by a long aliphatic bridge. Incorporation studies with 13C-enriched precursors and biosynthetic analysis have demonstrated that 3-amino-5-hydroxybenzoic acid, derived from the shikimate pathway, and eight propionate and two acetate units are condensed to build both the chromophore and the aliphatic bridge of
0378-1097/98/$19.00 © 1998 Federation of European Microbiological Societies. Published by Elsevier Science B.V. PII S0378-1097(97)00569-7
FEMSLE 7995 5-2-98
T. Schupp et al. / FEMS Microbiology Letters 159 (1998) 201–207
the rifamycin B molecule [5]. Based on these data it was concluded that rifamycin B and the other rifamycins produced by A. mediterranei are synthesized by a polyketide synthase (PKS) that uses a 3-amino-5-hydroxybenzoic acid starter unit and sequentially adds acetate and propionate units. Ten successive condensation steps and the correct processing of the keto groups lead to the formation of the rifamycin ring system. To obtain information on the genes involved in rifamycin synthesis and their organization, we have undertaken, and describe here, the cloning and DNA sequence analysis of the putative rifamycin PKS gene cluster. This information will be important for improving our understanding of rifamycin biosynthesis and for the future application of molecular genetic techniques to rationally manipulate rifamycin biosynthesis, either to further increase the productivity of industrial strains or to produce modified rifamycins for clinical applications.
2. Materials and methods
2.1. Bacterial strains and culture conditions
A. mediterranei LBG A3136 wild type (from the Ciba-Geigy collection) was used in this work [5]. The strain was cultivated at 28°C in liquid medium NL148G [6] without glycine for vegetative growth and with 2.5 g l⁻¹ glycine for the isolation of chromosomal DNA. Escherichia coli HB101 and DH5α were used for cloning of DNA and propagation of plasmids, and E. coli XL-1BlueMR (Stratagene, La Jolla, USA) for phage lambda transfections. E. coli was grown at 37°C in Luria broth or on Luria agar [7] supplemented with the appropriate antibiotics when needed.
2.2. DNA isolation and cloning
Standard genetic techniques for in vitro DNA manipulation and for work with E. coli were as described [7]. Genomic DNA of A. mediterranei was isolated from cells grown for 48 h in 50 ml NL148G with 2.5 g l⁻¹ glycine, centrifuged at 3000×g for 30 min, and resuspended in 5 ml SET buffer (75 mM NaCl, 25 mM EDTA, 20 mM Tris, pH 7.5). High molecular weight DNA was isolated from these cells as described [8]. Southern blot analysis and cloning of DNA fragments were as described earlier [7,9]. Nick translation was done with the Gibco/BRL Kit (Life Technologies AG, Basel, CH) according to the manufacturer's instructions. To clone the 13-kbp BglII genomic A. mediterranei fragment, BglII fragments in the size range of 12–16 kbp were isolated from a 0.8% agarose gel by electroelution [7] and cloned into the E. coli positive selection vector pIJ4642, which was derived from pIJ666 [10] by removal of the BglII-BclI fragment containing the neo gene. Chloramphenicol-resistant colonies of E. coli HB101 transformed with this gene library were analysed by colony hybridization [7], using as probe the 32P-dCTP labeled 3.8-kbp PvuI fragment of the plasmid p98/1 encoding part of the soraphen PKS gene cluster [11]. Positive clones were analysed by Southern blot hybridization. A 5.7-kbp KpnI fragment, internal to the 13-kbp BglII fragment of A. mediterranei, was isolated and used as a probe to screen the A. mediterranei cosmid library.
2.3. Construction of an A. mediterranei genomic library
Genomic DNA of A. mediterranei was partially digested with Sau3AI and size fractionated on a sucrose density gradient. DNA fragments of 35 to 45 kbp were ligated to the BamHI-cut pWE15 cosmid vector (Stratagene). The ligated DNA was packaged into lambda phage particles by using the in vitro packaging kit from Stratagene, and transfected into E. coli XL-1BlueMR.
2.4. DNA sequencing
Intact cosmid clones or isolated plasmid subclone inserts were fragmented using an Aero-Mist Nebulizer (CIS-US, Bedford, MA, USA) with a nitrogen pressure of 1–2 pounds per square cm. These random fragments were treated with bacteriophage T4 DNA polymerase, T4 DNA kinase, and E. coli DNA polymerase in the presence of dNTPs to generate 5′-phosphorylated blunt-ended fragments. Fragments were then fractionated on 0.8% low-melting temperature agarose, and 1.5–2-kbp fragments were isolated from the agarose via warm phenol extraction [7].
These fragments were subcloned into blunt-ended, dephosphorylated pBlueScript KS (Stratagene) and introduced by transformation into E. coli DH5α. Plasmid DNA for sequencing was isolated from overnight liquid cultures grown in 96-well 2-ml deepblock plates [12]. Sequencing reactions were performed using the 'Perkin Elmer/Applied Biosystems Dye Terminator cycle sequencing ready reaction premix' (Catalog number 402122) and 20 cycles of linear amplification in a thermocycler. Sequencing reactions were ethanol precipitated, resuspended in formamide loading buffer, and run on an Applied Biosystems (ABI) Model 377 automated DNA sequencer according to the recommendations of the manufacturer. Gel files were tracked and extracted using ABI Analysis Software. Extracted chromatogram files were transferred to a SUN UltraSparc workstation, and sequence assembly and editing were accomplished using the PHRED/CROSSMATCH/PHRAP/CONSED suite (Phillip Green, University of Washington; Brent Ewing, Washington University in Saint Louis; and David Gordon, Washington University in Saint Louis) and the program GAP [13]. Following assembly of the initial sequencing reactions into multiple contigs, the remaining gaps between contigs were closed by primer-walking, longer reads, and opposite strand sequencing. Low quality areas were improved by additional sequencing coverage. DNA and protein sequences were analysed with the University of Wisconsin Genetics Computer Group programs [13].
3. Results and discussion
3.1. Cloning of a polyketide synthase (PKS) gene cluster from A. mediterranei
PKS genes were identified in the chromosome of A. mediterranei by hybridization to DNA of the Type I PKS gene cluster of Sorangium cellulosum responsible for the biosynthesis of the macrocyclic polyketide antibiotic soraphen A [11]. To clone these genes a 13-kbp BglII fragment from A. mediterranei which gave a hybridization signal in a Southern blot, using as probe a 3.8-kbp PvuI DNA fragment from the soraphen PKS gene cluster, was isolated and
cloned in E. coli (see Section 2). A cosmid library of A. mediterranei genomic DNA was constructed and screened for homology to the 13-kbp BglII fragment. Three cosmid clones that gave very strong hybridization signals were identified. Restriction and DNA hybridization analysis revealed that the three cosmid clones overlap and cover a region of about 61 kbp of the A. mediterranei genomic DNA. A continuous stretch of 54 kbp of this region from the A. mediterranei chromosome has significant DNA homology to the PKS genes of Sorangium cellulosum and also to PKS genes (eryA) of Saccharopolyspora erythraea governing the biosynthesis of erythromycin [14]. A restriction map of this region in the A. mediterranei chromosome is shown in Fig. 1A.
3.2. Sequence analysis of the 54 kbp PKS gene cluster
The cloned 54-kbp chromosomal region of A. mediterranei with homology to PKS genes from S. cellulosum and S. erythraea was completely sequenced (see Section 2). The nucleotide sequence obtained (EMBL No. xyz) was analyzed for open reading frames (ORFs) using the computer program Codonpreference [13]. This analysis revealed five closely packed, very large ORFs reading in the same direction (see Fig. 1B). All five ORFs showed a strong bias towards G or C in the third codon position, typical for translated genes in high G+C DNA. The overall G+C content in the translated region is 74%. The most probable translational start sites for the five ORFs were determined by identification of plausible ribosomal binding sites (RBS) with similarity to the 3′ end of the S. lividans 16S rRNA sequence. The deduced amino acid sequences of the five ORFs are highly repetitive. Each repeating unit corresponds to a catalytic module responsible for a specific round of polyketide chain extension. Ten well defined rifamycin PKS modules were identified by comparison with the PKS biosynthetic domains of the 6-deoxyerythronolide B synthase (DEBS) of S. erythraea [15].
ORF1 and ORF2 each contain three modules, ORF3 and ORF4 each code for a single module, and ORF5 encodes two modules. This matches exactly the number of condensation steps required to build up the rifamycin polyketide chain [5].
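The third-position G/C bias mentioned in Section 3.2 can be illustrated with a short calculation. The following is a minimal sketch, not the Codonpreference program used in this work: the function names and the toy ORF are invented for illustration, and the sequence is not taken from the rifamycin cluster.

```python
def gc3_fraction(orf: str) -> float:
    """Fraction of codons whose third base is G or C."""
    seq = orf.upper()
    if len(seq) < 3:
        raise ValueError("ORF must contain at least one complete codon")
    # Third codon positions are indices 2, 5, 8, ...; a trailing partial codon is ignored.
    usable = len(seq) - len(seq) % 3
    third_bases = seq[2:usable:3]
    return sum(base in "GC" for base in third_bases) / len(third_bases)


def gc_fraction(seq: str) -> float:
    """Overall G+C fraction of a nucleotide sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


# Toy ORF, invented for illustration only (hypothetical sequence).
toy_orf = "ATGGCCGACCTGGTCGAGCGCTGA"
print(f"GC overall: {gc_fraction(toy_orf):.2f}, GC3: {gc3_fraction(toy_orf):.2f}")
```

In a translated high G+C region the third-position value would be expected to exceed the overall G+C fraction, which is the pattern the ORF analysis relies on.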
Fig. 1. 53.8-kbp Type I polyketide synthase (PKS) gene cluster of A. mediterranei involved in rifamycin biosynthesis. A: Restriction map of the genomic region and overlapping inserts of the three cosmid clones used for cloning and sequencing. B: Organization of the ten Type I PKS modules, order of the functional domains and postulated biosynthetic intermediates of the rifamycin PKS. Enzymatic activities are indicated as follows: CL, CoA ligase; ACP, acyl carrier protein; KS, β-ketoacyl-ACP synthase; AT, acyltransferase; DH, dehydratase; KR, β-ketoacyl-ACP reductase. Shaded domains are possibly inactive. C: Structures of rifamycin B, protorifamycin I, protorifamycin (postulated direct product of the rifamycin PKS), and the early intermediate P8/1-OG.
3.3. Organization of the enzymatic domains in the PKS gene cluster
The localisation of the enzymatic domains in the multifunctional proteins, encoded by the five ORFs of the polyketide synthase gene cluster, was determined by computer assisted comparison with the well defined active domains of DEBS of S. erythraea [15]. The order of the active domains found in the 10 modules (Table 1) is exactly the same as found in other Type I PKS [15,16]. The range of homology of the different active domains as compared to the domains of DEBS responsible for the biosynthesis of erythromycin was as follows: KS 62–65%, AT 38–56%, DH 41–49%, KR 43–52%, ACP 50–54% (% amino acid identity over the domain length). Two classes of AT domains were found. Modules 2 and 9 contain a sequence motif typical for AT domains catalysing the transacylation of malonyl-CoA, and thus being responsible for incorporation
of acetate extender units [17]. The other 8 modules contain AT domains with sequence motifs typical for AT domains catalysing the transacylation of methylmalonyl-CoA, resulting in the incorporation of propionate extender units into the growing polyketide chain. Eight modules, namely numbers 1 and 4 to 10, contain KR domains with a high degree of identity to the KR domains of DEBS from S. erythraea. All these domains contain a potential motif for NADP(H) binding (GxGxxGxxxA) between amino acids 9 and 20, the same as was found in the KR domains of DEBS for erythromycin biosynthesis [15] and the KR domain of the soraphen PKS of S. cellulosum [11]. Module 3 contains a KR domain with lower identity to the KR domains of DEBS, and has an imperfect NADP(H) binding motif (GAEGLGRHAS). Therefore the KR domain of module 3 is probably inactive. No ketoreductase domain is present in module 2, which is the smallest module
Table 1
Deduced functions of the five ORFs in the PKS gene cluster

ORF (amino acids)   Position in gene cluster   Proposed functions
                    (start–end point)
ORF1 (4735)         1336–15543
  Loading domain                               CoA ligase, ACP
  Module 1                                     KS, AT(P), KR, ACP
  Module 2                                     KS, AT(A), ACP
  Module 3                                     KS, AT(P), ACP
ORF2 (5069)         15550–30759
  Module 4                                     KS, AT(P), DH, KR, ACP
  Module 5                                     KS, AT(P), KR, ACP
  Module 6                                     KS, AT(P), DH, KR, ACP
ORF3 (1763)         30769–36060
  Module 7                                     KS, AT(P), DH, KR, ACP
ORF4 (1728)         36139–41325
  Module 8                                     KS, AT(P), DH, KR, ACP
ORF5 (3413)         41373–51614
  Module 9                                     KS, AT(A), DH, KR, ACP
  Module 10                                    KS, AT(P), DH, KR, ACP

Predicted PKS enzymatic activities are indicated as follows: KS, β-ketoacyl-ACP synthase; AT(P), acyltransferase incorporating a propionate unit; AT(A), acyltransferase incorporating an acetate unit; DH, dehydratase; DH, dehydratase, probably not functional; KR, β-ketoacyl-ACP reductase; ACP, acyl carrier protein.
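The figures in Table 1 can be cross-checked with simple arithmetic. The sketch below transcribes the table and tests two things: that each ORF's coordinate span equals three nucleotides per amino acid plus one stop codon (an assumption about how the end points were counted, not something stated in the text), and that the AT assignments add up to ten modules with eight propionate and two acetate extender units. The names `TABLE_1`, `orf_is_consistent` and `summarize` are illustrative.

```python
# Values transcribed from Table 1: (ORF, amino acids, start, end, AT type per module).
# "P" = propionate-incorporating AT, "A" = acetate-incorporating AT.
TABLE_1 = [
    ("ORF1", 4735, 1336, 15543, ["P", "A", "P"]),
    ("ORF2", 5069, 15550, 30759, ["P", "P", "P"]),
    ("ORF3", 1763, 30769, 36060, ["P"]),
    ("ORF4", 1728, 36139, 41325, ["P"]),
    ("ORF5", 3413, 41373, 51614, ["A", "P"]),
]


def orf_is_consistent(amino_acids: int, start: int, end: int) -> bool:
    """True if the inclusive span equals 3 nt per residue plus one stop codon.

    Assumption: the end point in Table 1 includes the stop codon.
    """
    return end - start + 1 == 3 * (amino_acids + 1)


def summarize(table):
    """Return (all ORFs consistent, module count, propionate count, acetate count)."""
    consistent = all(orf_is_consistent(aa, s, e) for _, aa, s, e, _ in table)
    at_types = [t for *_, modules in table for t in modules]
    return consistent, len(at_types), at_types.count("P"), at_types.count("A")


print(summarize(TABLE_1))
```

Under the stated assumption, all five coordinate spans agree with the listed protein lengths, and the module tally agrees with the eight propionate and two acetate units reported from the incorporation studies [5].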
in the PKS, containing only the core enzymatic functions KS, AT, ACP (Table 1). Eight of the ten modules contain a domain which is 41–49% identical to the DH domain of DEBS. Six of these have a region with good similarity to the postulated DH active site motif (Table 2). Interestingly, modules 6, 7, and 8 seem to contain DH domains that are intact, even though the corresponding positions are hydroxylated in rifamycin B (Fig. 1). Enoyl reduction is probably not required for rifamycin biosynthesis, and none of the 10 modules contains a typical ER domain.
3.4. Loading domains in the N-terminal region of the ORF1 deduced protein
The protein deduced from ORF1 has at the N-terminal end a long amino acid extension not assigned to module 1. This leading region contains two enzymatic domains which are probably involved in starter unit activation and attachment to the PKS for rifamycin biosynthesis. The first domain, 507 amino acids at the N-terminal end, has 52% identity to the starter unit activation domain of the rapamycin PKS (RAPS) from S. hygroscopicus [16], which also has homology to ATP-dependent carboxylic acid-CoA ligases (CL). This similarity suggests that the CL domain in Fig. 1B may form the activated starter unit, 3-amino-5-hydroxybenzoyl-CoA [5]. The region downstream of CL resembles ACP domains of Type I PKS; its role may be the binding of the
activated aromatic starter unit to the PKS for rifamycin biosynthesis.
3.5. Evidence for involvement of the described PKS in rifamycin biosynthesis
Attempts to disrupt the sequenced PKS genes in A. mediterranei have so far been unsuccessful, probably because of low transformation and recombination frequencies. It was therefore not possible to test the involvement of these genes in rifamycin biosynthesis by this functional test. However, the described gene cluster of A. mediterranei and the deduced protein sequences match closely the features expected for the rifamycin PKS gene cluster. (i) There are 10 repetitive Type I PKS modules, as required for rifamycin biosynthesis. (ii) The positions of the acetate incorporating AT domains in modules 2 and 9, and of propionate incorporating AT domains in the other eight modules, are exactly as required for the synthesis of the rifamycin polyketide chain. (iii) The β-keto groups are correctly processed by the PKS for 7 of the 10 modules. (iv) The functional modules deduced from the PKS gene cluster are colinear with the order of the respective enzymatic steps needed for biosynthesis of the rifamycin ring system (see Fig. 1). (v) There is a specific loading domain, similar to the one of RAPS, for the activation of the aromatic starter unit. (vi) An early intermediate of rifamycin biosynthesis, P8/1-OG, detected in a mutant of A. mediterranei [18], is identical to the postulated
Table 2
DH domains in the ten modules of the putative rifamycin PKS: analysis and alignment of the active site motifs in comparison to the DH domain of DEBS

Module   Activity required for     Identity to DH    Active site motif   Comment
         rifamycin biosynthesis    domain of DEBS    H...G....P
DEBS                                                 HVVGGRTLVP
1        No                        42%               G......LVP          Deletion
2        No                        –                 –                   Absent
3        No                        –                 –                   Absent
4        Yes                       48%               HAIGGVVLIP          Active
5        No                        41%               G.......VP          Deletion
6        No                        44%               HAVGGVVILP          Active?
7        No                        49%               HTIGGVVLFP          Active?
8        No                        49%               HTLEDLVVVP          Active?
9        Yes                       49%               HVIGGVVLVA          Active
10       Yes                       47%               HAVRDVVIVP          Active
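The motif comparisons behind Table 2 and the KR analysis of Section 3.3 can be sketched as a position-by-position check. This is only an illustration under stated assumptions: it compares ungapped segments of equal length, it treats `.` and `x` as wildcards, and the helper name `motif_deviations` is hypothetical. It is a stricter criterion than the "good similarity" judgement used in the text, so a listed deviation does not by itself imply an inactive domain.

```python
def motif_deviations(motif: str, segment: str, wildcards: str = ".x") -> list:
    """List (position, expected, found) for conserved motif positions that differ.

    Positions are 1-based. Characters in `wildcards` match any residue.
    """
    if len(motif) != len(segment):
        raise ValueError("motif and segment must have the same length")
    return [
        (i, expected, found)
        for i, (expected, found) in enumerate(zip(motif, segment), start=1)
        if expected not in wildcards and expected != found
    ]


# DH active-site segments transcribed from Table 2 (entries with a full-length motif).
DH_MOTIF = "H...G....P"
DH_SEGMENTS = {
    "DEBS": "HVVGGRTLVP",
    4: "HAIGGVVLIP",
    6: "HAVGGVVILP",
    7: "HTIGGVVLFP",
    8: "HTLEDLVVVP",
    9: "HVIGGVVLVA",
    10: "HAVRDVVIVP",
}

for name, segment in DH_SEGMENTS.items():
    print(name, motif_deviations(DH_MOTIF, segment))

# KR NADP(H)-binding motif and the module 3 segment quoted in Section 3.3.
print(motif_deviations("GxGxxGxxxA", "GAEGLGRHAS"))
```

An empty list means every conserved position is present; for the module 3 KR segment the check reports deviations at two of the four conserved positions, in line with the description of that motif as imperfect.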
intermediate of a mutated PKS, in which only the modules 1 to 3 are active (Fig. 1C). The potential DH activities in modules 6, 7 and 8 are unexpected. It is possible that some of these may be active and that the corresponding hydroxyl groups are introduced later by the action of hydroxylases. Similar discrepancies exist in the case of rapamycin [16]. Protorifamycin, the postulated direct product of the rifamycin PKS encoded by the described gene cluster, has so far not been found in fermentations of A. mediterranei. However, protorifamycin I, detected in a mutant of A. mediterranei [19], differs only by the absence of a hydroxyl group in the naphthoquinone ring and the oxidation of one methyl group of the aliphatic ring (Fig. 1C). It can be assumed that in rifamycin B biosynthesis protorifamycin is transformed in two enzymatic reduction/oxidation steps into protorifamycin I. From protorifamycin I the biosynthetic pathway to rifamycin B, with the intermediates rifamycin W and rifamycin S, is well established by biosynthetic and genetic studies [5,20].
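One possible reading of point (iii) in Section 3.5 is that the three modules with unexpected DH domains account for the difference between 7 and 10. The sketch below tallies this from Table 2. It rests on two assumptions that are not stated in the text: that 'Active?' is counted as potentially functional, and that DH status alone decides whether a module's β-keto processing is as expected. The names `TABLE_2`, `dh_possibly_active` and `split_modules` are illustrative.

```python
# Per-module DH data transcribed from Table 2: (required for rifamycin, comment).
TABLE_2 = {
    1: (False, "Deletion"),
    2: (False, "Absent"),
    3: (False, "Absent"),
    4: (True, "Active"),
    5: (False, "Deletion"),
    6: (False, "Active?"),
    7: (False, "Active?"),
    8: (False, "Active?"),
    9: (True, "Active"),
    10: (True, "Active"),
}


def dh_possibly_active(comment: str) -> bool:
    """Treat 'Active' and 'Active?' as potentially functional (an assumption)."""
    return comment.startswith("Active")


def split_modules(table):
    """Return (modules that agree with the requirement, modules that do not)."""
    agree = [m for m, (req, c) in table.items() if dh_possibly_active(c) == req]
    differ = [m for m in table if m not in agree]
    return agree, differ


agree, differ = split_modules(TABLE_2)
print(len(agree), "of", len(TABLE_2), "agree; unexpected DH in modules", differ)
```

Under these assumptions the tally singles out modules 6, 7 and 8, the same modules whose potential DH activities are described above as unexpected.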
Acknowledgments
We thank T. Kieser and J. Ligon for helpful discussions and critical reading of the manuscript. We are grateful to R. Amstutz and U. Regenass for supporting this work.
References
[1] Wehrli, W. (1977) Ansamycins. Chemistry, Biosynthesis and Biological Activity. Topics Curr. Chem. 72, 21–49.
[2] Brogden, R.N. and Fitton, A. (1994) Rifabutin. A review of its antimicrobial activity, pharmacokinetic properties and therapeutic efficacy. Drugs 47, 983–1009.
[3] Oppenheim, B., Koornhof, H.J. and Austrian, R. (1986) Antibiotic-resistant pneumococcal disease in children at Baragwanath Hospital, Johannesburg. Pediatr. Infect. Dis. 5, 520–524.
[4] Szabo, C., Bissel, M.J. and Calvin, M. (1976) Inhibition of infectious rous sarcoma virus production by a rifamycin derivative. J. Virol. 18, 445–453.
[5] Ghisalba, O., Auden, A.L., Schupp, T. and Nüesch, J. (1984) The Rifamycins: Properties, Biosynthesis, and Fermentation. In: Biotechnology and Industrial Antibiotics (Vandamme, E.J., Ed.) pp. 281–327, Decker Inc., New York.
[6] Schupp, T. and Divers, M. (1986) Protoplast preparation and regeneration in Nocardia mediterranei. FEMS Microbiol. Lett. 36, 159–162.
[7] Maniatis, T., Fritsch, E.F. and Sambrook, J. (1982) Molecular Cloning: a Laboratory Manual. Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
[8] Pospiech, A. and Neumann, B. (1995) A versatile quick-prep of genomic DNA from Gram-positive bacteria. Trends Genet. 11, 217.
[9] Hopwood, D.A., Bibb, M.J., Chater, K.F., Kieser, T., Bruton, C., Kieser, H., Lydiate, D.J., Smith, C.P., Ward, J.M. and Schrempf, H. (1985) Genetic manipulation of Streptomyces. A laboratory manual. John Innes Foundation, Norwich, UK.
[10] Kieser, T. and Melton, R.E. (1988) Plasmid pIJ699, a multicopy positive-selection vector for Streptomyces. Gene 65, 83–91.
[11] Schupp, T., Toupet, C., Cluzel, B., Neff, S., Hill, S., Beck, J. and Ligon, J. (1995) A Sorangium cellulosum (Myxobacterium) gene cluster for the biosynthesis of the macrolide antibiotic soraphen A: cloning, characterization, and homology to polyketide synthase genes from Actinomycetes. J. Bacteriol., 3673–3679.
[12] Birnboim, H.C. (1983) A rapid alkaline extraction method for the isolation of plasmid DNA. Methods Enzymol. 100, 243–255.
[13] Genetics Computer Group (1996) Sequencing analysis software package, version 9. University of Wisconsin Genetics Computer Group, Madison.
[14] Donadio, S., Staver, M.J., McAlpine, J.B., Swanson, S.J. and Katz, L. (1991) Modular organization of the genes required for complex polyketide biosynthesis. Science 252, 675–679.
[15] Donadio, S. and Katz, L. (1992) Organization of the enzymatic domains in the multifunctional polyketide synthase involved in erythromycin formation in Saccharopolyspora erythraea. Gene 111, 51–60.
[16] Aparicio, J.F., Molnar, I., Schwecke, T., Konig, A., Haydock, S.F., Khaw, L.E., Staunton, J. and Leadlay, P.F. (1996) Organization of the biosynthetic gene cluster for rapamycin in Streptomyces hygroscopicus: analysis of the enzymatic domains in the modular polyketide synthase. Gene 169, 9–16.
[17] Haydock, S.F., Aparicio, J., Molnar, I., Schwecke, T., Khaw, L.E., Konig, A., Marsden, A.F., Galloway, I.S., Staunton, J. and Leadlay, P.F. (1995) Divergent sequence motifs correlated with the substrate specificity of (methyl)malonyl-CoA:acyl carrier protein transacylase domains in modular polyketide synthases. FEBS Lett. 374, 246–248.
[18] Ghisalba, O., Fuhrer, H., Richter, W.J. and Moss, S. (1981) A genetic approach to the biosynthesis of the rifamycin-chromophore in Nocardia mediterranei. III. Isolation and identification of an early aromatic ansamycin-precursor containing the seven-carbon amino starter-unit and three initial acetate/propionate-units of the ansa chain. J. Antibiot. 34, 58–63.
[19] Ghisalba, O., Traxler, P. and Nüesch, J. (1978) Early intermediates in the biosynthesis of ansamycins. I. Isolation and identification of protorifamycin I. J. Antibiot. 31, 1124–1131.
[20] Schupp, T. and Nüesch, J. (1979) Chromosomal mutations in the final step of rifamycin B biosynthesis. FEMS Microbiol. Lett. 6, 23–27.