ELSEVIER
FEMS Microbiology Letters 125 (1995) 3-14
MiniReview
Modular structure of genes encoding multifunctional peptide synthetases required for non-ribosomal peptide synthesis
Torsten Stachelhaus, Mohamed A. Marahiel *
Biochemie / Fachbereich Chemie, Philipps Universität Marburg, Hans-Meerwein-Str., 35032 Marburg, Germany
Received 2 September 1994; revised 19 October 1994; accepted 20 October 1994
Abstract
Peptide synthetases are large multienzyme complexes that catalyze the non-ribosomal synthesis of a structurally diverse family of bioactive peptides. They possess a multidomain structure and employ the thiotemplate mechanism to activate, modify and link together by amide or ester bonds the constituent amino acids of the peptide product. The domains, which represent the functional building units of peptide synthetases, appear to act as independent enzymes whose specific linkage order forms the protein-template that defines the sequence of the incorporated amino acids. Two types of domains have been characterized in peptide synthetases of bacterial and fungal origin: type I comprises about 600 amino acids and contains at least two modules involved in substrate recognition, adenylation and thioester formation, whereas type II domains carry in addition an insertion of about 430 amino acids that may function as an N-methyltransferase module. The role of other genes associated with bacterial operons encoding peptide synthetases is also discussed.
Keywords: Peptide synthetase; Domain structure; Module; Non-ribosomal peptide synthesis; Peptide antibiotic
* Corresponding author. Tel: (49) 6421-285722; Fax: (49) 6421-282191; E-mail: [email protected]
0378-1097/95/$09.50 © 1995 Federation of European Microbiological Societies. All rights reserved
SSDI 0378-1097(94)00459-5

1. Introduction
Peptides with biological activities produced by several bacterial and fungal species belong to a large and diverse family of natural products that includes antibiotics, enzyme inhibitors, plant or animal toxins and immunosuppressants. They are therefore of great benefit to medicine, agriculture, biological research and industry. Some of these peptides are synthesized ribosomally, but although gene-coded, they often undergo extensive posttranslational modification and proteolytic processing. Bioactive peptides synthesized ‘non-ribosomally’ on a protein-template, the subject of this review, are produced mainly by soil bacteria and filamentous fungi and are of linear, cyclic and branched linear structures (Fig. 1). They contain non-protein amino acids like D-amino acids or hydroxy acids and other amino acid constituents that can undergo extensive modifications, including N-methylation, acylation, glycosylation and covalent linkage to other unusual functional groups [1-3]. The linkage can be through peptide bonds or through the formation of lactones and esters. In the case of depsipeptides, such as enniatin, alternating peptide and ester bonds are formed between N-methylated amino acids and hydroxy acids.

Fig. 1. Primary structure of bacterial (I) and fungal (II) peptide antibiotics of non-ribosomal origin. (I) a) gramicidin S, b) tyrocidine, c) surfactin, d) bacitracin; (II) a) HC-toxin, b) enniatin A, c) cyclosporin A. Amino acid sequence and enzymes that catalyze the peptide synthesis are shown (see Table 1).

2. Multifunctional peptide synthetases and the thiotemplate mechanism
The biochemistry of non-ribosomal peptide synthesis by large enzyme complexes called multifunctional peptide synthetases has been investigated over the last 35 years. These studies established the ‘thiotemplate mechanism’ [3]. According to this model, large multisubunit enzymes ranging from 100 to over 1600 kDa (Table 1) accomplish the non-ribosomal peptide synthesis [1,4]. They are composed of distinct domains that catalyze the activation of constituent amino acids as acyladenylates and thioesterify the activated amino acids through a covalent interaction with specific thiol groups. In some cases, carboxy thioester-bound amino acids are racemized to the D-form or N-methylated. The peptide elongation reaction (transpeptidation) was originally thought to be mediated by a centrally located, enzyme-bound single cofactor, 4’-phosphopantetheine. However, recent biochemical and genetic studies on the biosynthesis of the cyclic peptide antibiotics tyrocidine, gramicidin S and surfactin have suggested a modified version of the single-cofactor thiotemplate mechanism [5,6], in which multiple cofactors of the 4’-phosphopantetheine type, covalently bound to the C-terminal region of each amino acid activating domain (see below), are proposed. Such cofactors represent possible candidates for a direct thioester linkage of activated amino acids and would facilitate peptide bond formation. Thus, in contrast to the original model, which suggested that a single 4’-phosphopantetheine cofactor would act as the sole ‘swinging arm’ able to transfer the growing peptide chain from one domain to the next, the later model assumes multiple 4’-phosphopantetheine cofactors which act during thioesterification and transpeptidation. Indeed, recent biochemical evidence supports the presence of one cofactor for each amino acid-activating domain, covalently attached to a conserved serine residue located within the thioester formation module (see below). These cofactors are believed to facilitate the ordered shift of carboxy thioester-activated amino acids between the aligned domains, resulting in the formation of a specific peptide chain with a defined sequence [7-9]. During this elongation reaction, intermediate peptides remain covalently attached to their postulated specific sites.
Table 1
Peptide antibiotics produced by Gram-positive bacteria and filamentous fungi, brief description of structure, producer organisms and enzymes involved in biosynthesis

Peptide antibiotic | Structure a | Organism(s) | Enzyme (MW (kDa)) b | Ref.
Tyrocidine | cyclic decapeptide [cyc(DPhe-Pro-Phe-DPhe-Asn-Gln-Tyr-Val-Orn-Leu)] | Bacillus brevis ATCC 8185 | TycA (122), TycB (230), TycC (450) | [1,2]
Gramicidin S | cyclic decapeptide [cyc(DPhe-Pro-Val-Orn-Leu)2] | Bacillus brevis ATCC 9999 | GrsT (29), GrsA (126), GrsB (510), Gsp (28) | [7]
Surfactin | acylated cyclic heptapeptide [cyc(acyl-Glu-Leu-DLeu-Val-Asp-DLeu-Leu)] | Bacillus subtilis ATCC 21332 | SrfA-A (402), SrfA-B (400), SrfA-C (144), SrfA-TE (25), Sfp (26) | [8,9]
Bacitracin | branched cyclic dodecapeptide [Ile-Cys-Leu-DGlu-Ile-(Lys-DOrn-Ile-DPhe-His-DAsp-Asn)cyc] | Bacillus licheniformis | BA1 (335), BA2 (240), BA3 (380) | [1,2]
Bialaphos | modified linear tripeptide [Ala-Ala-phosphinothricin] | Streptomyces hygroscopicus, Streptomyces viridochromogenes | phsA | [23]
β-Lactam | tripeptide [δ-(L-α-aminoadipyl)-Cys-DVal] | Penicillium chrysogenum, Acremonium chrysogenum, Aspergillus nidulans, Streptomyces clavuligerus, Nocardia lactamdurans, Lysobacter lactamgenus, Flavobacterium SC12,154 | AcvA (421) | [13]
HC-toxin | modified tetrapeptide [cyc(DPro-Ala-DAla-Aeo)] | Cochliobolus carbonum | Hts1 (575) | [14]
Enniatin B | cyclic hexadepsipeptide [cyc(DHIV-MeVal)3] | Fusarium scirpi | Esyn1 (347) | [15]
Cyclosporin A | modified and N-methylated cyclic undecapeptide [cyc(DAla-MeLeu-MeLeu-MeVal-MeBmt-Abu-Sar-MeLeu-Val-MeLeu-Ala)] | Tolypocladium niveum | SimA (1,689) | [16]

a Abbreviations used are: Orn, ornithine; Aeo, 2-amino-9,10-epoxy-8-oxodecanoic acid; DHIV, D-2-hydroxyisovaleric acid; MeBmt, (4R)-4-[(E)-2-butenyl]-4-methyl-L-threonine; Abu, α-aminobutyric acid; Sar, sarcosine.
b Molecular weights, with the exception of TycB, TycC and BA1-3, were deduced from the DNA sequence.

Termination of non-ribosomal peptide synthesis includes release of the thioester-bound peptide from the enzyme complex either by cyclization, by the action of a thioesterase or by transfer of the peptide chain to a functional group such as a phospholipid. Although amino acid activation by adenylation in the enzyme-catalyzed peptide synthesis resembles that of ribosomal synthesis catalyzed by aminoacyl-tRNA synthetases, the two activation mechanisms are unrelated. No sequence homology between the enzymes involved has been observed and a thiol function is not essential for aminoacyl-tRNA synthetases. In addition, peptide synthetases exhibit reduced specificity for substrate amino acids and for ATP-binding. Peptide synthetases, in contrast to aminoacyl-tRNA synthetases, can carry out the activation of substrate amino acids or analogues when 2’-dATP is substituted for ATP [10]. This finding helped to distinguish peptide synthetases during purification from aminoacyl-tRNA synthetases that activate the same amino acid, when the amino acid-dependent ATP-PPi exchange reaction was used. In fact, the thiotemplate mechanism, with its limited potential for peptide synthesis ranging from 2 to 20 amino acid residues, has very little in common with the ribosomal mechanism. Instead, it has more in common with the mechanisms of fatty acid and polyketide synthesis, since in these systems the cofactor 4’-phosphopantetheine and thioesterases are involved [11]. Another mechanism of non-ribosomal peptide synthesis, which uses aminoacyl phosphate-activated amino acids instead of carboxy thioester-linked aminoacyladenylates, is involved in the biosynthesis of the tripeptide glutathione and the antifungal cyclic peptide mycobacillin in Bacillus subtilis. The phosphate-activated intermediates of these peptides are not covalently attached to the corresponding multienzyme and the synthetases involved seem to bear little resemblance to those which use the thiotemplate mechanism [12]. The structural features of the latter class, which were revealed from the sequence of the encoding genes, are detailed below.
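The multiple-cofactor model described in this section can be illustrated with a small simulation. The sketch below is not taken from the cited studies: the class `Domain` and the function `assemble` are hypothetical names, and the chemistry is reduced to bookkeeping. Each domain loads its substrate onto its own 4’-phosphopantetheine arm, and the growing chain is then passed from arm to arm in domain order, so that every intermediate stays attached to exactly one domain. The substrates are the gramicidin S residues as written in Table 1.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Domain:
    """One amino acid-activating domain with its own 4'-phosphopantetheine arm."""
    substrate: str                     # amino acid recognized by this domain
    arm: Optional[List[str]] = None    # thioester-bound residue(s); None = empty

    def activate_and_load(self) -> None:
        # Adenylation followed by thioesterification, collapsed into one step.
        self.arm = [self.substrate]


def assemble(domains: List[Domain]) -> List[str]:
    """Return every enzyme-bound intermediate, ending with the full chain.

    Illustrative bookkeeping only: the chain on arm i is transferred onto
    the residue waiting on arm i+1 (transpeptidation).
    """
    for d in domains:
        d.activate_and_load()
    intermediates = []
    for donor, acceptor in zip(domains, domains[1:]):
        acceptor.arm = donor.arm + acceptor.arm   # peptide bond formation
        donor.arm = None                          # donor arm is emptied
        intermediates.append("-".join(acceptor.arm))
    return intermediates


if __name__ == "__main__":
    # Residues of one gramicidin S pentapeptide, as written in Table 1.
    line = [Domain(s) for s in ["DPhe", "Pro", "Val", "Orn", "Leu"]]
    for step in assemble(line):
        print(step)
```

Release of the finished chain (cyclization, thioesterase action or transfer to an acceptor) is deliberately left out, since the text describes it only in general terms.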
3. Multidomain structure of bacterial and fungal peptide synthetases
Isolation, sequencing and characterization of several genes encoding multifunctional peptide synthetases of bacterial and fungal origin required for non-ribosomal synthesis of tyrocidine, gramicidin S, surfactin, bialaphos, HC-toxin, enniatin, cyclosporin and the tripeptide precursor of β-lactams, δ-(L-α-aminoadipyl)-L-cysteinyl-D-valine (ACV), confirmed the multidomain arrangement of peptide synthetases and defined a domain of about 600 amino acid residues as a highly conserved and repeated functional unit. It was designated the type I domain. For peptide synthetases involved in the non-ribosomal synthesis of N-methylated cyclopeptides such as enniatin and the immunosuppressant cyclosporin, a second domain type has recently been discovered. The type II domain harbours, in addition to a type I domain, an N-methyltransferase function, and can therefore be considered a ‘hybrid’ between ‘normal’ peptide synthetase domains of type I and N-methyltransferases (see below). The organization of both functional units, either type I or type II, within a multifunctional peptide synthetase is colinear with the amino acid (or N-methylated amino acid) sequence of the corresponding peptide antibiotic (Fig. 2). As shown in Fig. 2, the genes encoding bacterial peptide synthetases required for the non-ribosomal synthesis of the cyclic peptides gramicidin S, surfactin and tyrocidine are organized in large operons of 20 kb or more [7-9]. The grs operon encodes two multienzymes, GrsA and GrsB, which are arranged in five homologous domains that activate the five amino acid constituents of the cyclic peptide gramicidin S. Similarly, three peptide synthetases, SrfA-A, SrfA-B and SrfA-C, are the products of the srfA operon. They comprise a total of seven amino acid activating domains that recognize, activate and in part racemize the cognate seven amino acids of surfactin. All domains encoded by the bacterial operons tyc, srfA and grs as well as those of acvA and hts1, which are of fungal origin, are of type I. The tripeptide δ-(L-α-aminoadipyl)-L-cysteinyl-D-valine (ACV) synthetase and the synthetase of HC-toxin, encoded by acvA and hts1, are multidomain peptide synthetases of about 420 and 570 kDa, respectively.
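The colinearity rule stated above, that the order of domains matches the order of residues in the product, can be written down as a simple lookup. The following is a minimal sketch under stated assumptions: the names `Domain`, `residue` and `predicted_peptide` are hypothetical, the assignment of Pro, Val, Orn and Leu to the four GrsB domains is read off the gramicidin S structure in Table 1, and the prefixes "D" and "Me" merely mirror the notation used in that table.

```python
from typing import Dict, List, NamedTuple


class Domain(NamedTuple):
    amino_acid: str           # substrate activated by this domain
    racemizes: bool = False   # True if a racemization module is present (L to D)
    domain_type: str = "I"    # "II" carries the N-methyltransferase insertion


def residue(domain: Domain) -> str:
    """Name of the residue a domain contributes, in the notation of Table 1."""
    name = domain.amino_acid
    if domain.domain_type == "II":
        name = "Me" + name
    if domain.racemizes:
        name = "D" + name
    return name


def predicted_peptide(operon: Dict[str, List[Domain]]) -> str:
    """Read the domains enzyme by enzyme, in gene order, and join the residues."""
    return "-".join(residue(d) for enzyme in operon.values() for d in enzyme)


# Assumed domain assignment for the grs operon (illustrative).
GRS = {
    "GrsA": [Domain("Phe", racemizes=True)],
    "GrsB": [Domain("Pro"), Domain("Val"), Domain("Orn"), Domain("Leu")],
}

if __name__ == "__main__":
    print(predicted_peptide(GRS))
    print(residue(Domain("Val", domain_type="II")))
```

The sketch predicts only the linear pentapeptide unit; dimerization and cyclization to gramicidin S are outside its scope.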
Fig. 2. Schematic diagram of the domain organization of peptide synthetases encoded by the bacterial operons grs, srfA, tyc and the fungal genes acvA, hts1, esyn1 and simA. Black boxes indicate the amino acid activating domains (acyladenylation module) and shaded areas show the location of the thioester formation module (PAN-binding), located at the C-terminal end of each type I domain. Type II domains, shown in esyn1 and simA, encode N-methyl peptide synthetases and have in addition an insertion (dotted area) that carries an N-methyltransferase module (SAM-binding). White boxes represent the non-conserved spacer regions that separate each amino acid-activating domain. The locations of thioesterase encoding genes (grsT, srfA-TE), the operons associated with these genes (gsp, sfp) and the comS gene are indicated.

The ACV synthetase, which contains three domains, initiates the biosynthesis of β-lactam antibiotics in prokaryotes and lower eukaryotes by assembling the tripeptide ACV, the key intermediate in the formation of all penicillins and cephalosporins. In fact, the multidomain structural model proposed for peptide synthetases emerged first from the determination of the primary structure of the acvA gene encoding the multienzyme ACV synthetase of Penicillium chrysogenum and later of Cephalosporium acremonium and Aspergillus nidulans [13]. A non-ribosomal synthesis of the cyclic tetrapeptide HC-toxin, produced by the fungal plant pathogen Cochliobolus carbonum race 1, by a multidomain enzyme was confirmed after the synthetase coding gene, hts1, was cloned and sequenced. Since the 15.7 kb hts1 gene encodes a single polypeptide containing four domains of type I that have homology with each other and with similar domains found in other prokaryotic and eukaryotic peptide synthetases, it is likely that the encoded synthetase has the ability to activate the four cognate amino acids of HC-toxin: L-2-amino-9,10-epoxy-8-oxodecanoic acid (Aeo), D-proline, L-alanine and D-alanine [14]. Interestingly, although the sizes of the adenylate-forming domains and the sizes of the spacer regions in between are quite similar for all peptide synthetases so far sequenced, the HC-toxin synthetase is an exception. The spacer region between the first and the second domain is 1010 amino acids instead of the usual 500 amino acid residues found in all other peptide synthetases. Due to this large interdomain region, hts1 encodes a 17% larger polypeptide than grsB, although both synthetases activate the same number of amino acids and each bears four domains. However, the order of domains in Hts1, based on the sequences of tryptic peptides and partial enzyme activities, is either D-Pro-L-Ala-D-Ala-Aeo or Aeo-D-Pro-L-Ala-D-Ala. It is therefore tempting to speculate, in the case of the second assignment, that the large spacer between the first and second domain might be responsible not only for the activation of Aeo, but also for its synthesis or modification. Irrespective of the amino acid activated by the different domains of type I found in diverse peptide synthetases, the domains share strong homology (30-80% identity) with each other within a multidomain enzyme and with similar domains of other peptide synthetases. These homologous domains consist of at least two functional modules: an amino acid-dependent adenylation module located near the N-terminal end (Fig.
2, black rectangle) and a thioester linkage module (4’-phosphopantetheine, PAN, binding site) at the carboxy terminal end (Fig. 2, shaded area). In domains that also racemize the constituent amino acid from the L to the D configuration, and only in those, a putative racemization module was also detected (see below, Fig. 2). In the synthesis of N-methylated cyclopeptides such as enniatin and cyclosporin, large single multifunctional enzymes that follow the thiotemplate mechanism and contain a mixture of type I and type II domains were found to be involved. The depsipeptide enniatin, for example, is synthesized via a repeated condensation of three dipeptide units of D-HIV-MeVal followed by final cyclization. The enniatin synthetase, which carries out this non-ribosomal condensation, is a two-domain multienzyme of 347 kDa, encoded by the 9.5 kb esyn1 gene [15]. The first domain, of type I, is necessary for activation and thioester formation of D-HIV (D-2-hydroxyisovaleric acid), whereas the second domain, responsible for valine activation and N-methylation, is of type II. It harbours at least three modules with an adenylation, thioester formation and N-methyltransferase function (see below). Seven such multimodule domains of type II were recently found within the sequence of the cyclosporin synthetase, encoded by the giant 45.8 kb simA gene [16]. The simA gene from Tolypocladium niveum represents the largest genomic open reading frame so far described. It encodes a single polypeptide composed of 15 281 amino acids (1 689 243 Da) arranged in eleven domains that correlate with the number of amino acid residues of cyclosporin A. Four domains are of type I and show a very similar arrangement to those detected in bacterial peptide synthetases. The other seven domains belong to type II and carry at the C-terminal boundary, in addition to type I modules, an insertion of approximately 430 amino acid residues. The arrangement of the modules is very similar to the MeVal-domain of the enniatin synthetase and shows over 50% similarity in all type II modules. The hypothesis that each domain in the peptide synthetases corresponds to the adenylation, thioesterification, racemization or N-methylation of one amino acid in the order shown in Fig.
2 has been supported for srfA, grs, tyc, esyn1, hts1 and simA by partial biochemical and genetic data as well as, in some cases, by determining the sequence and activity of module-containing peptide synthetase fragments. For acvA, a tentative assignment of the three amino acid-activating domains is shown.
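The gene and protein sizes quoted in this section can be cross-checked with simple arithmetic. The sketch below assumes an uninterrupted open reading frame (three base pairs per codon over the whole quoted gene length) and an average residue mass of about 110 Da; both are rule-of-thumb assumptions that are not stated in the text, and the helper name `estimate_from_gene` is hypothetical. The reported masses are those given for Esyn1, Hts1 and SimA in Table 1.

```python
AVG_RESIDUE_DA = 110.0  # assumed rule-of-thumb average residue mass


def estimate_from_gene(kb: float):
    """Return (codons, estimated mass in kDa) for an uninterrupted ORF of `kb` kilobases."""
    codons = int(round(kb * 1000)) // 3
    return codons, codons * AVG_RESIDUE_DA / 1000.0


# gene name -> (gene length in kb, reported enzyme mass in kDa), from the text
GENES = {"esyn1": (9.5, 347), "hts1": (15.7, 575), "simA": (45.8, 1689)}

if __name__ == "__main__":
    for name, (kb, reported) in GENES.items():
        codons, kda = estimate_from_gene(kb)
        print(f"{name}: ~{codons} codons, ~{kda:.0f} kDa estimated, {reported} kDa reported")
    # Average residue mass implied by the simA figures quoted in the text.
    print(round(1689243 / 15281, 1))
```

Under these assumptions the estimates agree with the reported masses to within about one percent, which is consistent with the gene lengths and enzyme sizes given above.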
4. Modular arrangement of functional domains
Sequence alignment of peptide synthetases revealed domains as the functional building units, whose occurrence and specific order dictate the number and sequence of the amino acids incorporated into the peptide product. For a better understanding of the structure-function relationship of this type of building block arrangement, putative modules and potentially important residues that might be involved in substrate adenylation, thioester formation, racemization and N-methylation were analysed [1,13].

4.1. Adenylation module
A thorough analysis of the sequence of a constantly increasing number of peptide synthetase genes and a search for related proteins in the database revealed the existence of a superfamily of adenylate-forming enzymes. The members of this gene family include all peptide synthetases so far sequenced and several adenylating enzymes, including luciferase, the enterobactin synthetases EntF and EntE as well as 4-coumarate CoA ligase, long chain fatty acid and CoA synthetases [7]. All have in common an ATP-dependent activation of a substrate carboxy group as acyladenylate. Shared among this group is a domain of about 500 amino acids that shows homology in the range of 20-80% identity between the different enzymes. It contains the strictly conserved sequences, core 1 to core 5, shown in Fig. 3. The function of core 1 is unknown. Cores 2 to 5 are believed to be involved in ATP-binding and hydrolysis. Core 2 has a glycine-rich sequence that contains a lysine, and thus resembles the Walker type A motif (GXXXXGKT/S), which forms a so-called phosphate-binding loop. Core 4 is conserved within a large family of ATPases. As suggested from the sequence analysis, the functions of core 2 and core 4 were confirmed by site-directed mutagenesis [17]. The core sequences 2 and 4 of TycA were mutated and analyzed for phenylalanine-dependent aminoacyl adenylate formation. Mutations of lysine (K) in core 2 to arginine or threonine caused a strong reduction of amino acid-dependent activation to 10 and 1% of wild-type level, respectively. Alteration of aspartate (D) in core 4 to asparagine or serine reduced the activation to 75 and 10% of wild-type level. These mutant domains indicated important roles for lysine in core 2 and for aspartate in core 4 within the adenylate-forming module of peptide synthetases. The role of the core sequences 3 and 5 in ATP-binding was investigated by photoaffinity labelling of the purified TycA domain, using 2-azidoadenosine triphosphate (2-azido-ATP), a substrate analogue that is active in the amino acid activation reaction [18]. After irradiation with ultraviolet light, labelled tryptic fragments were purified and sequenced. Since the identified sequences (core 3, core 5 and other closely associated sequences) are strongly conserved in all domains of peptide synthetases and in acyl CoA synthetases, a role in catalyzing aminoacyl adenylate formation was suggested. The involvement of these sequences in the nucleotide binding site was also approached by selective modification of the TycA domain with fluorescein 5’-isothiocyanate [19] and by site-specific mutation within the Pro-domain of GrsB. A mutation that converted the conserved first glycine in core 5 to aspartate suggested the essential role of this residue for aminoacyl-adenylation in peptide synthetases [20].

Fig. 3. A detailed map of the organization of sequence motifs within two domains of the grs operon involved in proline and phenylalanine activation and racemization. The relative location of the core sequences (core 1 to 6), their amino acid sequence in one letter code and putative functions are indicated. Racemization motifs A, B, C, and D are shown at the C-terminal end of the D-Phe domain. Also, the locations of the spacer motif in both domains are marked as are the acyladenylate and thioester modules.
Motifs shown in the figure (motif: sequence, putative function):
core 1: LKAGGAYVPID, unknown
core 2: YSGTTGXPKG, ATP binding
core 3: GELCIGGxGxARGYL, ATP binding
core 4: YxTGD, ATPase motif
core 5: KIRGxRIELGEIE, ATP binding
core 6: DNFYxLGGHSL, 4’-phosphopantetheine binding (thioester formation) motif
spacer: HHILXDGW
raceA: AYxTExNDILLTAxG, unknown
raceB: EGHGRExIIE, unknown
raceC: RTVGWFTSMYPxxLD, unknown
raceD: FNYLGQFD, unknown

4.2. Thioester formation module
Peptide synthetase domains of type I, which catalyze amino acid adenylation and thioester formation, represent an extension of the above-mentioned 500 residue adenylate-forming domain. They are about 600 amino acids long and possess an additional thioester module that bears the sequence of core 6 (Fig. 3). Core 6 is found at the C-terminal end of each amino acid activating domain that catalyzes the formation of amide or ester bonds, but not in the shorter domain of adenylate-forming enzymes. The core 6 sequence is the site of thioester formation in the domains of GrsB-Val and GrsB-Leu [5]. However, it does not contain a cysteine residue and the sequence resembles the binding site of the 4’-phosphopantetheine cofactor in acyl carrier proteins (ACP) of fatty acid and polyketide synthases. Based on these findings, the presence of multiple pantetheine binding sites in multidomain peptide synthetases was suggested. The cofactors are believed to be covalently attached to the serine (S) residue of core 6, to which the activated amino acid is then linked as a thioester. The essential role of this serine residue for cofactor binding and thioester formation was confirmed by site-directed mutagenesis. Replacement of the core 6 serine by alanine in the first to fourth domain of surfactin synthetases [21] and in the D-Phe domain of TycA either abolished surfactin production in vivo, or inhibited the thioester formation of phenylalanine in the mutagenized D-Phe domain. The phenylalanine-dependent ATP-PPi exchange activity (acyladenylate formation) was not affected in this mutant [17]. In addition, the serine to alanine substitution in the D-Phe domain of TycA prevented the incorporation of β-[3H]alanine, a precursor of 4’-phosphopantetheine, when the mutated gene was overexpressed in E. coli. However, the wild-type D-Phe domain seems to be a poor substrate for the holo-ACP synthetase of E. coli, which normally catalyzes the transfer of 4’-phosphopantetheine from coenzyme A to the ACP protein. Only about 2% of TycA overexpressed in E. coli could be charged with [14C]phenylalanine, owing to incomplete modification of TycA with 4’-phosphopantetheine [17]. Similarly, although several ACPs from heterologous sources were recognized by the E. coli ACP synthase and correctly modified, the 6-deoxyerythronolide B synthase 3 of Saccharopolyspora erythraea, involved in the synthesis of the polyketide erythromycin A, was not modified at all in E. coli, even though it bears the 4’-phosphopantetheine-binding motif of ACPs [22].

4.3. Racemization module and a spacer motif
By analyzing the sequence of the relatively non-homologous spacer regions which are located between the amino acid activating domains of the surfactin synthetases, Zuber and co-workers made two interesting observations [8]. They identified two regions of extended sequence similarity at the carboxy terminal end of each domain that catalyzes the racemization of its cognate amino acid from the L to the D-form. This putative racemization module spans a region of about 130 amino acids and contains only four highly conserved and ordered sequences (Fig. 3). Domains containing these sequences include the D-Phe domains of TycA and GrsA, the D-Leu domains of SrfA-A and SrfA-B as well as the third domain of ACV synthetase and the first domain of Hts1 that may activate D-Pro. However, the D-Ala domains of SimA and Hts1 synthetases possess no racemization motifs, but do carry the ‘spacer motifs’ (see below). Studies on the racemization of GrsA suggested the involvement of a basic group acting as proton donor/acceptor during racemization of the enzyme-bound thioesterified phenylalanine. Therefore, analysis of these putative racemization sites by directed mutagenesis would help to clarify their possible role. A conserved sequence designated ‘spacer motif’ was identified in the non-homologous region of peptide synthetases. In these multidomain enzymes, this motif is located at the N-terminal region upstream of the core 1 sequence. However, in domains that have the potential to initiate peptide synthesis, such as GrsA and TycA (Fig. 3), the same motif is located at the C-terminal region of the amino acid activating domain, downstream of core 6. The role of the spacer motif and the significance of its location are unknown.

4.4. N-methylation module of the type II domain
The type II domain, which is found only in N-methyl peptide synthetases, has a novel structure. It represents an extension of the type I domain, because it contains an insertion of about 430 amino acids. This insertion is only found in domains that catalyze the N-methylation of their cognate amino acids. These include the MeVal-domain of enniatin synthetase and seven domains (4 × MeLeu, MeVal, MeBmt, Sar) of cyclosporin synthetase [15,16]. Significant sequence similarity of the inserted fragment to adenine- and cytosine-specific DNA and RNA methyltransferases was found. A glycine-rich sequence (VLE/DXGXGXG), which is a common motif of S-adenosyl-methionine (SAM)-dependent methyltransferases, is also present in the N-terminal region of the inserted sequence of type II domains.
In addition, photoaffinity labelling of the enniatin synthetase using SAM followed by proteolysis and methyltransferase inhibition studies confirmed that a single methyltransferase module is present within the peptide synthetase. These findings suggest that the polypeptide sequence inserted in the type II domain between the adenylation and thioester formation modules may contain the N-methyltransferase activity.

5. Genes associated with antibiotic production
5.1. grsT/srfA-TE
The grsT gene, located at the 5’-end of the grs operon, encodes a protein of 29 kDa that shows strong homology to fatty acid thioesterase type II enzymes of vertebrate origin, which catalyze the release of fatty acids from multifunctional complexes. A GrsT homologue was also found to be encoded by srfA-TE, the fourth gene of the srfA operon. The srfA-TE gene is located at the 3’-end of the srfA operon, downstream of srfA-C (Fig. 2). It encodes a polypeptide of 25 kDa that shows 31.5% identity to GrsT [9]. Also, associated with the gene cluster encoding enzymes involved in the synthesis of the tripeptide antibiotic bialaphos in Streptomyces hygroscopicus are two open reading frames encoding putative thioesterases [23]. Both thioesterases are about 27 kDa and show about 31% identity to GrsT. All these putative thioesterases associated with peptide synthetases carry the sequence GHSXG presumed to form the active site within acyltransferases, lipases, fatty acid and polyketide synthases. Although the function of the putative thioesterases in non-ribosomal peptide synthesis is unknown, they could catalyze transfer of the growing peptide chain or hydrolyze the thioester bond to release the peptide product. Interestingly, at the carboxy terminal end of the ACV synthetase an integrated domain homologous to GrsT and SrfA-TE was found, but not in Hts1, Esyn1 and SimA. Clarification of the role of the separately encoded or synthetase-integrated putative thioesterases must await further biochemical and genetic analysis.

5.2. sfp/gsp
The sfp gene, which is located 4 kb downstream of the srfA operon of B. subtilis (Fig. 2), encodes a 244 amino acid protein required in addition to the srfA operon for surfactin production. Some B. subtilis strains unable to produce the lipopeptide surfactin were shown to have an intact srfA operon, but a defective sfp gene (sfp0). The Sfp- phenotype was found to be due to a frame shift mutation that causes the production of a truncated Sfp protein of 165 amino acids. The sfp homologue, the gsp gene, located immediately upstream of the 5’-end of the grs operon in B. brevis, encodes a protein of 237 amino acids that shows 34% identity to Sfp. The gsp gene of B. brevis complemented in trans the B. subtilis sfp0 mutation and promoted the production of the lipopeptide antibiotic surfactin [24]. The functional homology of Gsp and Sfp and the sequence similarity of the two proteins to EntD, a protein involved in synthesis of the siderophore enterobactin in E. coli, suggest that they represent a new class of proteins that might be involved in peptide secretion or in iron acquisition [24]. Both Gsp and Sfp were also found to complement an entD siderophore-deficient mutant unable to produce enterobactin and to grow at low iron concentration (Borchert and Marahiel, unpublished). Therefore, a possible involvement of Sfp and Gsp in the secretion of the Bacillus siderophore 2,3-dihydroxybenzoyl glycine or of peptide antibiotics was suggested. However, the actual function of this new class of proteins in non-ribosomal peptide synthesis remains unknown.

5.3. comS
The valine activation domain encoded by srfA-B is not only essential for non-ribosomal synthesis of the lipopeptide antibiotic surfactin, but is also required for the development of genetic competence in B. subtilis. Bacterial competence, which is genetically determined, may be defined as the physiological state that permits the uptake of exogenous DNA. In the complex regulatory pathway of competence development, the region encoding the Val-domain of SrfA-B was found to occupy a position intermediate between the induction of early and late competence genes. Recently it has been shown that the functions included in the Val-domain, such as adenylation and thioester formation, are not essential for competence induction.
Instead, a small open reading frame designated comS, which encodes a polypeptide of 46 amino acids, is embedded within the srfA-B gene (Val-domain), is translated in a different frame, and is the essential component required for competence (Fig. 2) [25,26]. This was underlined by the following experiment: an amber mutation in the comS reading frame, which prevents the development of competence, was suppressed in trans by a tRNA suppressor mutation (sup-3), indicating that comS translation is required for development of competence (P. Zuber, personal communication). Additional studies are needed to identify the ComS target in order to define its role in this regulatory pathway.
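The idea of a small reading frame embedded within a larger gene but translated in a different frame can be illustrated with a minimal sketch. The nucleotide sequence below is invented for illustration and is not taken from srfA-B or comS; the function names `find_orfs` and `embedded_pairs` are likewise hypothetical, and only the forward strand and ATG start codons are considered.

```python
# Toy illustration only: TOY_GENE is an invented sequence, not real srfA-B/comS DNA.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (frame, start, end) for each ATG-to-stop ORF on the forward strand.

    start is the index of the A of ATG; end is the index just past the stop codon.
    min_codons counts sense codons (ATG included, stop excluded).
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


def embedded_pairs(orfs):
    """Pairs (outer, inner) where inner lies inside outer but uses another frame."""
    return [(outer, inner) for outer in orfs for inner in orfs
            if outer[0] != inner[0]
            and outer[1] <= inner[1] and inner[2] <= outer[2]]


# Frame 0: ATG GCA CAT GGC TCT GAC TTA GCC GAA TAA (the "large" gene).
# Frame 1: an ATG at index 7 opens a short ORF that ends at a TAG at index 19.
TOY_GENE = "ATGGCACATGGCTCTGACTTAGCCGAATAA"

ORFS = find_orfs(TOY_GENE)
for outer, inner in embedded_pairs(ORFS):
    print("outer ORF", outer, "contains inner ORF", inner)
```

In this toy case the short frame-1 ORF sits entirely inside the frame-0 ORF, so a mutation can in principle disrupt one reading frame while leaving the other intact, which is the logic behind the amber-mutation experiment described above.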
6. Conclusions

Analysis of the primary structure of peptide synthetases involved in non-ribosomal synthesis of peptide antibiotics revealed a highly conserved and ordered domain structure. The repeated functional building units are composed of distinct modules involved in ATP-dependent acyladenylation, thioester formation and, in some cases, modification of the activated amino acids (N-methylation or racemization). The acyladenylation module, which comprises about 500 amino acids, is conserved within a superfamily of adenylating enzymes and is believed to be essential for amino acid recognition and activation. Type I domains bear, in addition to the acyladenylation module, a thioester-forming region of about 100 amino acid residues. The serine residue of the core 6 motif (LGGHSL), which is an integral part of this thioester formation module, is essential for binding of the cofactor 4’-phosphopantetheine and for thioester formation. Type II domains, found only in fungal N-methyl peptide synthetases, contain, in addition to the modules of type I domains, an insertion of about 430 amino acid residues that seems to catalyze the N-methylation of the cognate amino acids. Modules and domains act as independent enzymes, whose occurrence and specific linkage order form the protein-template that defines the number and sequence of the amino acids incorporated into the peptide product. These findings should promote the development and realization of programmed changes in peptide synthetases for the synthesis of diverse biologically active peptides.
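The modular picture summarized above can be expressed as a small data model. This is a minimal sketch, not a description of any real enzyme: the three-domain synthetase, the class `Domain`, and the helper names are hypothetical, the module sizes are the approximate values quoted in the text, and the core 6 motif is matched as the exact string LGGHSL for simplicity.

```python
from dataclasses import dataclass

# Approximate module sizes in amino acids, as quoted in the text.
ADENYLATION_AA = 500
THIOESTER_AA = 100
N_METHYL_INSERT_AA = 430
CORE6_MOTIF = "LGGHSL"


@dataclass(frozen=True)
class Domain:
    amino_acid: str               # substrate activated by this domain
    n_methylating: bool = False   # True for a type II domain

    @property
    def domain_type(self):
        return "II" if self.n_methylating else "I"

    @property
    def approx_length(self):
        length = ADENYLATION_AA + THIOESTER_AA
        if self.n_methylating:
            length += N_METHYL_INSERT_AA
        return length


def template_product(domains):
    """Read the peptide off the domain order, marking N-methylated residues."""
    return [("N-Me-" if d.n_methylating else "") + d.amino_acid for d in domains]


def cofactor_serines(protein):
    """0-based positions of the serine inside each exact LGGHSL match."""
    positions = []
    i = protein.find(CORE6_MOTIF)
    while i != -1:
        positions.append(i + CORE6_MOTIF.index("S"))
        i = protein.find(CORE6_MOTIF, i + 1)
    return positions


# Hypothetical synthetase: two type I domains flanking one type II domain.
SYNTHETASE = [Domain("Val"), Domain("Leu", n_methylating=True), Domain("Pro")]

print(template_product(SYNTHETASE))
print([(d.domain_type, d.approx_length) for d in SYNTHETASE])
print(cofactor_serines("MKTLGGHSLAAV"))  # invented 12-residue toy sequence
```

Reordering or swapping entries in `SYNTHETASE` changes the peptide that `template_product` returns, which mirrors the idea that the linkage order of the domains acts as the template for the product.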
Acknowledgements

We are grateful to G. Weber and P. Zuber for providing unpublished data, and D. Weiss and S.
Borchert for critical reading of the manuscript. Work in the authors’ laboratory was supported by the Deutsche Forschungsgemeinschaft and the Fonds der Chemischen Industrie.

References

[1] Zuber, P., Nakano, M.M. and Marahiel, M.A. (1993) Peptide antibiotics. In: Bacillus subtilis and other Gram-positive bacteria (Sonenshein, A.L., Hoch, J.A. and Losick, R., Eds.), pp. 897-916. Am. Soc. Microbiol., Washington, DC.
[2] Kleinkauf, H. and von Döhren, H. (1990) Nonribosomal biosynthesis of peptide antibiotics. Eur. J. Biochem. 192, 1-15.
[3] Lipmann, F. (1980) Bacterial production of antibiotic polypeptides by thiol-linked synthesis on protein templates. Adv. Microbiol. Physiol. 21, 227-266.
[4] Marahiel, M.A. (1992) Multidomain enzymes involved in peptide synthesis. FEBS Lett. 307, 40-43.
[5] Schlumbohm, W., Stein, T., Ullrich, C., Vater, J., Krause, M., Marahiel, M.A., Kruft, V. and Wittmann-Liebold, B. (1991) An active serine is involved in covalent substrate amino acid binding at each reaction center of gramicidin S synthetase. J. Biol. Chem. 266, 23135-23141.
[6] Stein, T., Vater, J., Kruft, V., Wittmann-Liebold, B., Franke, P., Panico, M., McDowell, R. and Morris, H.R. (1994) Detection of 4’-phosphopantetheine at the thioester binding site for L-valine of gramicidin S synthetase 2. FEBS Lett. 340, 39-44.
[7] Turgay, K., Krause, M. and Marahiel, M.A. (1992) Four homologous domains in the primary structure of GrsB are related to domains in a superfamily of adenylate-forming enzymes. Molec. Microbiol. 6, 529-546.
[8] Fuma, S., Fujishima, Y., Corbell, N., D’Souza, C., Nakano, M.M., Zuber, P. and Yamane, K. (1993) Nucleotide sequence of 5’ portion of srfA that contains the region required for competence establishment in Bacillus subtilis. Nucleic Acids Res. 21, 93-97.
[9] Cosmina, P., Rodriguez, F., de Ferra, F., Perego, M., Venema, G. and van Sinderen, D. (1993) Sequence and analysis of the genetic locus responsible for surfactin synthesis in Bacillus subtilis. Molec. Microbiol. 8, 821-831.
[10] Krause, M., Marahiel, M.A., von Döhren, H. and Kleinkauf, H. (1985) Molecular cloning of an ornithine-activating fragment of the gramicidin S synthetase 2 gene from Bacillus brevis and its expression in Escherichia coli. J. Bacteriol. 162, 1120-1125.
[11] Donadio, S. and Katz, L. (1992) Organization of the enzymatic domains in the multifunctional polyketide synthase involved in erythromycin formation in Saccharopolyspora erythraea. Gene 111, 51-60.
[12] Meister, A. (1988) Glutathione metabolism and its selective modification. J. Biol. Chem. 263, 17205-17208.
[13] Aharonowitz, Y., Bergmeyer, J., Cantoral, J.M., Cohen, G., Demain, A.L., Fink, U., Kinghorn, J., Kleinkauf, H., MacCabe, A., Palissa, H., Pfeifer, E., Schwecke, T., Liempt, H.v., Döhren, H.v., Wolfe, S. and Zhang, J. (1993) δ-(L-α-Aminoadipyl)-L-cysteinyl-D-valine synthetase, the multienzyme integrating the four primary reactions in β-lactam biosynthesis, as a model peptide synthetase. Bio/Technology 11, 807-810.
[14] Scott-Craig, J.S., Panaccione, D.G., Pocard, J.-A. and Walton, J.D. (1992) The cyclic peptide synthetase catalyzing HC-toxin production in the filamentous fungus Cochliobolus carbonum is encoded by a 15.7-kilobase open reading frame. J. Biol. Chem. 267, 26044-26049.
[15] Haese, A., Schubert, M., Herrmann, M. and Zocher, R. (1993) Molecular characterization of the enniatin synthetase gene encoding a multifunctional enzyme catalysing N-methyldepsipeptide formation in Fusarium scirpi. Molec. Microbiol. 7, 905-914.
[16] Weber, G., Schörgendorfer, K., Schneider-Scherzer, E. and Leitner, E. (1994) The peptide synthetase catalyzing cyclosporin production in Tolypocladium niveum is encoded by a giant 45.8-kilobase open reading frame. Current Genetics 26, 120-125.
[17] Gocht, M. and Marahiel, M.A. (1994) Analysis of core sequences in the D-Phe activating domain of the multifunctional peptide synthetase TycA by site-directed mutagenesis. J. Bacteriol. 176, 2654-2662.
[18] Pavela-Vrancic, M., Pfeifer, E., van Liempt, H., Schäfer, H.-J., von Döhren, H. and Kleinkauf, H. (1994) ATP binding in peptide synthetases: determination of contact sites of the adenine moiety by photoaffinity labeling of tyrocidine synthetase 1 with 2-azidoadenosine triphosphate. Biochemistry 33, 6276-6283.
[19] Pavela-Vrancic, M., Pfeifer, E., Schröder, W., von Döhren, H. and Kleinkauf, H. (1994) Identification of the ATP-binding site in tyrocidine synthetase 1 by selective modification with fluorescein 5’-isothiocyanate. J. Biol. Chem. 269, 14962-14966.
[20] Tokita, K., Hori, K., Kurotsu, T., Kanda, M. and Saito, Y. (1993) Effect of single base substitutions at glycine-870 codon of gramicidin S synthetase 2 gene on proline activation. J. Biochem. 114, 522-527.
[21] D’Souza, C., Nakano, M.M., Corbell, N. and Zuber, P. (1993) Amino-acylation site mutations in amino acid-activating domains of surfactin synthetase: effects on surfactin production and competence development in Bacillus subtilis. J. Bacteriol. 175, 3502-3510.
[22] Roberts, G.A., Staunton, J. and Leadlay, P. (1993) Heterologous expression in Escherichia coli of an intact multienzyme component of the erythromycin-producing polyketide synthase. Eur. J. Biochem. 214, 305-311.
[23] Raibaud, A., Zalacain, M., Holt, T.G., Tizard, R. and Thompson, C.J. (1991) Nucleotide sequence analysis reveals linked N-acetyl hydrolase, thioesterase, transport, and regulatory genes encoded by the bialaphos biosynthetic gene cluster of Streptomyces hygroscopicus. J. Bacteriol. 173, 4454-4463.
[24] Borchert, S., Stachelhaus, T. and Marahiel, M.A. (1994) Induction of surfactin production in Bacillus subtilis by gsp, a gene located upstream of the gramicidin S operon in Bacillus brevis. J. Bacteriol. 176, 2458-2462.
[25] D’Souza, C., Nakano, M.M. and Zuber, P. (1994) Identification of comS, a gene of the srfA operon that regulates the establishment of genetic competence in Bacillus subtilis. Proc. Natl. Acad. Sci. USA 91, 9397-9401.
[26] Hamoen, L.W., Eshuis, H., Jongbloed, J., Venema, G. and van Sinderen, D. (1994) A small gene, designated comS, located within the coding region of the fourth amino acid-activation domain of srfA, is required for competence development in Bacillus subtilis. Molec. Microbiol., in press.