Article
STAC—A New Domain Associated with Transmembrane Solute Transport and Two-Component Signal Transduction Systems Mateusz Korycinski, Reinhard Albrecht, Astrid Ursinus, Marcus D. Hartmann, Murray Coles, Jörg Martin, Stanislaw Dunin-Horkawicz and Andrei N. Lupas Department of Protein Evolution, Max Planck Institute for Developmental Biology, D-72076 Tübingen, Germany
Correspondence to Andrei N. Lupas:
[email protected] http://dx.doi.org/10.1016/j.jmb.2015.08.017 Edited by T. Yeates
Abstract Transmembrane receptors are integral components of sensory pathways in prokaryotes. These receptors share a common dimeric architecture, consisting in its basic form of an N-terminal extracellular sensor, transmembrane helices, and an intracellular effector. As an exception, we have identified an archaeal receptor family—exemplified by Af1503 from Archaeoglobus fulgidus—that is C-terminally shortened, lacking a recognizable effector module. Instead, a HAMP domain forms the sole extension for signal transduction in the cytosol. Here, we examine the gene environment of Af1503-like receptors and find a frequent association with transmembrane transport proteins. Furthermore, we identify and define a closely associated new protein domain family, which we characterize structurally using Af1502 from A. fulgidus. Members of this family are found both as stand-alone proteins and as domains within extant receptors. In general, the latter appear as connectors between the solute carrier 5 (SLC5)–like transmembrane domains and two-component signal transduction (TCST) domains. This is seen, for example, in the histidine kinase CbrA, which is a global regulator of metabolism, virulence, and antibiotic resistance in Pseudomonads. We propose that this newly identified domain family mediates signal transduction in systems regulating transport processes and name it STAC, for SLC and TCST-Associated Component. © 2015 MRC Laboratory of Molecular Biology. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Introduction
Prokaryotes have evolved sophisticated systems to sense their environment and respond to external stimuli. As a widely used principle, two-component signal transduction (TCST) systems serve to regulate phototaxis and chemotaxis, uptake of metabolites, stress response, and sporulation among other basic cellular processes. The two defining parts of a TCST system are a histidine kinase as the sensing and signal transmission component (membrane-bound or soluble) and a cytosolic response regulator as the output component. In transmembrane TCST systems, signal perception by an extracytoplasmic sensory domain triggers conformational changes that are propagated across the membrane to the histidine kinase module. In many kinases, additional intracellular domains, such as PAS and GAF, regulate this process. Upon reaching the kinase module, the conformational changes cause the phosphorylation of a conserved histidine in the dimerization and histidine phosphotransfer domain (DHp) by the catalytic ATP-binding domain (CA). The phosphate group is then transferred to a conserved aspartate in the receiver domain of the response regulator (REC), which eventually triggers the cellular response, frequently by changing gene expression (for reviews of TCST, see, for example, Refs. [1] and [2]). One of the least understood steps in this signaling cascade is how the signal actually crosses the lipid membrane. Previously, it was found that many of the aforementioned receptor proteins contain a small domain in direct continuation of the last transmembrane helix, named HAMP for its presence in histidine kinases, adenylyl cyclases, and methyl-accepting chemotaxis proteins [3]. HAMP occurs either as a single unit or in the form of poly-HAMP arrays [4] and is thought to function as an adaptor that directly
0022-2836/© 2015 MRC Laboratory of Molecular Biology. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). J Mol Biol (2015) 427, 3327–3339
A Domain Associated with Solute Transport and Signal Transduction
receives the signal from the membrane and transmits it to downstream domains. It was first characterized structurally from the putative receptor Af1503 of the archaeon Archaeoglobus fulgidus and its structure led to the proposal that signal propagation from the membrane domain to the kinase module proceeds by axial rotation of its constituent helices [5].
Af1503 has an unusual architecture in that it lacks an effector domain, its sole cytoplasmic domain being the HAMP domain. Nevertheless, both the full protein and the HAMP domain alone retain the ability to relay transmembrane signals, as judged from chimeric constructs with chemoreceptors, histidine kinases and adenylyl cyclases [5–9]. The gene for
Fig. 1. Genomic environment of Af1502-like proteins. Arrows show whether a gene is located on the plus (left to right) or on the minus (right to left) strand of the chromosome. Overlapping (i.e., translationally coupled) genes are offset relative to each other. Red denotes Af1502 and Af1502-like proteins; orange, TCST proteins; green, integral membrane proteins; blue, soluble components of transmembrane transport systems.
Af1503 is embedded in a larger operon, Af1505-1502, on the minus strand of the chromosome. It is translationally coupled at either end to Af1504 and Af1502, respectively, both of which encode proteins of unknown function (Fig. 1) [5]. The first gene in the operon, Af1505, encodes a putative metal-ion transporter and member of solute carrier family 41 (Mg2+ transporter E, MgtE; SLC41). Based on this operon structure, we surmise that Af1503 is the sensory component of a signaling and transport system and that Af1502 and Af1504 serve as additional regulatory components. Due to our long-standing interest in Af1503 as a model protein for prokaryotic transmembrane signal transduction [5,7–12], we embarked on an analysis of the individual proteins composing the operon. Here, we focus on Af1502 and characterize it structurally and bioinformatically. Its sequence and structural features identify it as the prototype of a novel domain family, STAC, which occurs either as a separate protein, such as Af1502, or embedded within hybrid receptors that combine membrane domains belonging to the solute carrier 5 (SLC5) family with cytosolic TCST domains. We propose that this new domain family is mechanistically involved in regulating solute transport across the cell membrane.
Results
Detection of Af1502-like proteins
Af1502 is a small protein of 68 residues, encoded by the last gene in the A. fulgidus operon Af1505-1502 (Fig. 1) [5]. It has no detectable homologs by sequence similarity searches, for example, with BLAST [13], HMMER [14], or HHblits [15]. Given the considerable divergence of the transmembrane sensor Af1503 from its nearest homologs (≤25% sequence identity), we considered that Af1502 might have also diverged substantially, beyond the detection sensitivity of current search methods. We therefore took advantage of the fact that the genes encoding Af1502 and Af1503 are translationally coupled, suggesting also a functional coupling of these proteins in the cell, and we examined the genomic context of Af1503-like proteins. We found that, indeed, the closest homologs of Af1503 from the methanogenic archaea Methanoperedens, Methanoplanus, Methanofollis, Methanospirillum, and Methanosphaerula were all followed closely on the chromosome by short proteins (Fig. 1) resembling Af1502 in size, predicted secondary structure, and pattern of hydrophobic residues (Fig. 2a). Iterative PSI-BLAST and HHblits searches with these proteins (see Materials and Methods), using newly identified sequences in each round as starting points for further searches, uncovered homologs in two prokaryotic branches, the archaeal Methanomicrobiales and the bacterial Myxococcales (Fig. 1). In the Methanomicrobiales, the proteins were found in tandem with Af1503 homologs in 10 of 12 detected occurrences; in the Myxococcales, they only formed such a tandem in 1 of 10 occurrences. In both branches, the proteins were frequently found in the genomic vicinity of TCST proteins (11 of 12 occurrences in Methanomicrobiales; 3 of 10 in Myxococcales) and of transmembrane transport components (5 of 12 occurrences in Methanomicrobiales; 3 of 10 in Myxococcales) (Fig. 1). The proteins have pairwise sequence identities of 10–22% to Af1502 (Fig. 2c), providing a rationale for the inability to connect them to Af1502 by sequence search methods. Nevertheless, their homology to Af1502 is strongly supported by their coupling to Af1503-like homologs, their genomic environment, and their size, predicted secondary structure, and pattern of hydrophobic residues, as seen in Figs. 1 and 2a.
Structure of Af1502
For the biochemical characterization of Af1502, we expressed the protein recombinantly in Escherichia coli. The purified protein migrated as a single species on gel size-exclusion columns and was determined to be monomeric by analytical gel filtration and static light scattering (Fig. 3f). The circular dichroism (CD) spectrum showed a typical α-helical shape with two characteristic negative peaks at 208 nm and 222 nm, which did not change upon heating to 95 °C (data not shown), indicating that the protein is well folded and exquisitely stable. For structure determination, we crystallized Af1502 as a selenomethionine (SeMet) derivative. The best-diffracting crystals grew in space group P2₁2₁2₁, with two monomers in the asymmetric unit, and data were collected at the selenium K-edge, yielding a dataset to a resolution of 1.6 Å. The structure was solved via the single-wavelength anomalous diffraction method and refined to an Rfree of 21.6%.
It shows a four-helical bundle, consisting of two α-hairpins connected by a central loop of nine residues (Fig. 3a–d). The structure confirms closely the consensus secondary structure prediction for the family (Fig. 2a). Its main conserved features are the occurrence of hydrophobic residues at core positions and a G(x)xxA pattern at the tip of the first hairpin. In this pattern, the glycine adopts unusual backbone angles (phi, 90°; psi, −180°) and the alanine is located at the point where the two helices of the hairpin are closest and generate a strong size constraint. Although clearly recognizable at the tip of both hairpins in the family consensus (Fig. 2a), the G(x)xxA pattern is missing in the second hairpin of Af1502, whose first helix is
substantially shorter than the family consensus and whose connector has a unique three-residue deletion (the connector is length invariant in all other members of the family). Given that this pattern and the hydrophobic register of the two hairpins are very
similar in the family consensus, we expect that other members will show a pseudosymmetry not seen in Af1502. A search for structurally similar domains using Dali [16] yielded many matches with Z-scores > 2,
which is hardly surprising given the large number of four-helix bundles in the Protein Data Bank (PDB). The best match was to the R1 subunit of ribonucleotide reductase (PDB ID: 6R1R, residues 18–92), at a Z-score of 6. Neither this nor any other match, however, suggested evolutionary or functional hypotheses. The two chains in the asymmetric unit form an extended interface of 580 Å² via helices 2 and 3 (Fig. 3e). Although Af1502 behaved exclusively as a monomer in solution (Fig. 3f), the presence of a potential dimer in the crystal structure prompted us to re-assess its oligomeric state, given that Af1502-like proteins are genetically coupled to dimeric receptors. We therefore mapped the conservation pattern of the family, as derived from the alignment in Fig. 2a, onto the Af1502 structure. Only the loops connecting helices 1 with 2 and 3 with 4 are well conserved in the family (Fig. 3b, top), not the potential interface (Fig. 3b, bottom). Analysis of 15N-labeled Af1502 by NMR spectroscopy over a 30-fold concentration range (0.033–1.0 mM) only showed small (but significant) shift changes in 15N heteronuclear single-quantum correlation (HSQC) spectra in the loop connecting helices 1 with 2. The apparent dissociation constant for this interaction was 100–200 mM. We conclude that any dimerization propensity of Af1502, if present at all, is very low.
Af1502 in the context of the Af1505-1502 operon
The close genomic association of Af1502-like proteins with Af1503-like membrane receptors suggested that the two proteins may be functionally coupled and possibly interact physically. As Af1502
is a cytosolic protein, the potential interaction partner would have to be the Af1503 HAMP domain. We therefore explored this possibility with purified proteins in vitro and, because HAMP can assume two different conformational states [5,7], we did so with wild-type HAMP and a mutant locked in the alternate state, A291F, but were unable to observe complex formation under any condition (data not shown). Refolding the proteins together from the unfolded state did not affect this outcome. We also considered that Af1502 might have a role in regulating the expression of the operon, even though its structure is not related to known DNA-binding proteins, but were unable to observe any interaction with DNA fragments covering the regions upstream of Af1504 and Af1505 (data not shown).
A novel signal transduction domain present in SLC5-like membrane transporters
During our search for Af1502-like proteins, we encountered matches that were not stand-alone but embedded within multi-domain proteins (Fig. 4). Although these matches were distant in sequence space (average sequence identity of 12% to Af1502), their length, predicted secondary structure, pattern of hydrophobic residues, and the presence of two G(x)xxA motifs showed them to be homologs of the stand-alone Af1502-like proteins (Fig. 2). Their main systematic difference to the stand-alone proteins is in the level of hydrophobicity at individual positions (Fig. 2b). Specifically, the embedded domains have a lower average hydrophobicity in buried and partly buried positions and a higher one in exposed
Fig. 2. Sequence properties of Af1502 homologs. (a) Alignment of stand-alone Af1502-like proteins (top) and of domains from multi-domain proteins homologous to Af1502 (bottom). Conserved residues (at least 50% identity at a given position within each group) are marked boldface and are summarized between the two alignments (h = hydrophobic), showing extensive agreement. Sequences are colored according to their predicted or, for Af1502, observed secondary structure: α-helices (α1–α4), red; loops (L1–L3), black; propensity for β-structure (blue) was very low throughout the sequences. Columns corresponding to the core residues of Af1502 are highlighted in cyan. Further domains of the multi-domain proteins are indicated in square brackets: CA, catalytic domain of histidine kinases; REC, phosphoacceptor domain of TCST response regulators; FHA, forkhead-associated domain; SLC5, solute carrier family 5 domain; PAS and GAF, ligand-binding domains of the profilin fold; PP2Cc, protein phosphatase; DHp, dimerization and histidine phosphotransfer domain; CC, coiled coils; DUF835, domain of unknown function. Organism abbreviations are as follows: Af, A. fulgidus; Mhun, Methanospirillum hungatei; Metli, Methanofollis liminatans; Metlim, Methanoplanus limicola; Mpet, Methanolacinia petroelaria; Mpal, Methanosphaerula palustris; Metfor, Methanoregula formicica; ANME2D, Candidatus Methanoperedens nitroreducens; AnAeK, Anaeromyxobacter sp. K; Adeh, Anaeromyxobacter dehalogenans; Anae190, Anaeromyxobacter sp. FW-109-C; COCOR, Corallococcus coralloides; A176, Myxococcus sp.; CAP, Chondromyces apiculatus; LILAB, Myxococcus fulvus; MYSTI, Myxococcus stipitatus; sce, Sorangium cellulosum; Hoch, Haliangium ochraceum; Sti aur, S. aurantiaca; Cys fus, C. fuscus; Des ole, D. oleovorans; Ter tur, T. turnerae; Fer fut, Ferrimonas futtsuensis; Tha sp., Thalassobium sp. 
R2A62; Chr sol, Chryseobacterium solincola; Alc pac, Alcanivorax pacificus; Zoo gan, Zooshikella ganghwensis; Des ace, Desulfobacca acetoxidans; Met the, Methanosaeta thermophila; Nep jap, Neptunomonas japonica; Mar rhi, Marinobacterium rhizophilum; Rhe sp., Rheinheimera sp. A13L; Sap gra, Saprospira grandis; Nov sp., Novosphingobium sp. MBES04; Fle sin, Flexistipes sinusarabici; gam pro, gamma proteobacterium IMCC3088; Aes sal, Aestuariibacter salexigens; Col psy, Colwellia psychrerythraea. (b) Average hydrophobicity of the two sequence groups (red: Af1502-like proteins; blue: domains homologous to Af1502). Where the average hydrophobicity differs between the two groups, the difference is marked by a filled bar in the color of the more hydrophobic group. (c) Heat map of sequence identities between the proteins in (a). Values were calculated based on the presented alignment.
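A per-position comparison of the kind shown in panel (b) can be illustrated with a short Python sketch. It averages Kyte and Doolittle hydropathy values [34] over each alignment column for two sequence groups and reports which group is more hydrophobic at that column. This is a minimal sketch under stated assumptions, not the procedure used to produce the figure: the function names and the toy sequences are hypothetical, and gaps and non-standard residues are simply skipped.

```python
# Kyte-Doolittle hydropathy scale (one value per standard amino acid).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_hydropathy(alignment):
    """Mean hydropathy per alignment column; gaps/unknown residues are ignored.

    `alignment` is a list of equally long, aligned sequences (hypothetical input).
    Columns without any scorable residue yield None.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    means = []
    for col in range(length):
        values = [KD[seq[col]] for seq in alignment if seq[col] in KD]
        means.append(sum(values) / len(values) if values else None)
    return means


def more_hydrophobic_group(group_a, group_b):
    """Per column: 'a' or 'b' for the more hydrophobic group, None if tied/undefined."""
    labels = []
    for a, b in zip(column_hydropathy(group_a), column_hydropathy(group_b)):
        if a is None or b is None or a == b:
            labels.append(None)
        else:
            labels.append("a" if a > b else "b")
    return labels


if __name__ == "__main__":
    # Toy four-column alignments, invented for illustration only.
    stand_alone = ["LIA-", "LVAG"]
    embedded = ["LKAA", "LRSG"]
    print(more_hydrophobic_group(stand_alone, embedded))  # [None, 'a', 'a', 'b']
```

Keeping the per-column averaging separate from the comparison makes it straightforward to substitute a different hydrophobicity scale or a different gap treatment.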
positions. We attribute this to their decreased solvent exposure and reduced folding requirements in the context of flanking domains. Most of the multi-domain proteins related to Af1502 belong to a large family broadly represented in Bacteria (e.g., Tertu_0572 from Teredinibacter turnerae, which was the first multi-domain protein we detected in our searches); this family is absent from Archaea (except Methanosaeta) and Eukaryotes.
The only experimentally characterized member of this family is the histidine kinase CbrA, a global regulator of metabolism, virulence, and antibiotic resistance in Pseudomonas species [17–21]. In these proteins, the Af1502-like domain is bracketed N-terminally by a transmembrane domain related to the sodium-solute symporter family 5 (SLC5) and C-terminally by an array of functionally different domains characteristic of TCST systems. Given
Fig. 3. Structure of Af1502. (a–d) Organized vertically, with rotation of the structure as indicated. (a) Cartoon representation in rainbow colors from blue (N-terminus) to red (C-terminus). (b) Surface representation colored according to sequence conservation. Conservation scores were mapped from orange (invariant) to white (highly variable). (c) Surface representation colored by electrostatic potential from negative −8 kT/e (red) to positive +8 kT/e (blue). (d) Cartoon representation indicating concentration-dependent chemical shift changes in NMR analysis. Colors correspond to the normalized average shift difference for backbone amides from white (no shift) to red (highest shift = 1.5 ppm). (e) Cartoon representation of the Af1502 crystallographic dimer, with the two monomers colored blue and green, respectively. (f) The refractive index (RI) chromatogram obtained from static light scattering combined with size-exclusion chromatography is shown in gray (left y-axis), and the calculated protein molar mass is shown in black (right y-axis). The calculated value of 7.5 kDa corresponds to a monomer (theoretical molecular mass: 8.3 kDa).
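To put the apparent dissociation constant of 100–200 mM reported in the text into perspective, the following Python sketch estimates the fraction of protein expected in the dimeric state over the NMR concentration range. It assumes a simple two-state monomer–dimer equilibrium with Kd = [M]²/[D] and total protein (in monomer units) c = [M] + 2[D]; this model and the function name are illustrative assumptions, not part of the original analysis.

```python
import math


def dimer_fraction(total_mM, kd_mM):
    """Fraction of protein chains in the dimer for a monomer-dimer equilibrium.

    Assumes Kd = [M]^2 / [D] and total = [M] + 2[D], all in mM.
    Solving 2[M]^2/Kd + [M] - total = 0 for the free monomer gives
    [M] = Kd * (sqrt(1 + 8*total/Kd) - 1) / 4.
    """
    if total_mM <= 0 or kd_mM <= 0:
        raise ValueError("concentrations must be positive")
    monomer = kd_mM * (math.sqrt(1.0 + 8.0 * total_mM / kd_mM) - 1.0) / 4.0
    return 1.0 - monomer / total_mM


if __name__ == "__main__":
    # Concentration range from the NMR experiment; Kd bounds from the text.
    for kd in (100.0, 200.0):
        for conc in (0.033, 1.0):
            frac = dimer_fraction(conc, kd)
            print(f"Kd = {kd:5.0f} mM, c = {conc:5.3f} mM -> dimer fraction {frac:.4f}")
```

Under these assumptions, only on the order of 1–2% of the chains would be dimeric even at the highest concentration used (1.0 mM), which is consistent with the conclusion that the dimerization propensity is very low.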
Fig. 4. Domain architectures of proteins containing the STAC domain. The domains are as follows: SLC5, solute carrier family 5 domain; STAC, SLC and TCST-associated component; CC, coiled coil; PAS and GAF, ligand-binding domains of the profilin fold; PP2Cc, protein phosphatase; GGDEF, diguanylate cyclase; EAL, diguanylate phosphodiesterase; DHp, dimerization and histidine phosphotransfer domain; CA, catalytic domain of histidine kinases; REC, phosphoacceptor domain of TCST response regulators; DUF835, domain of unknown function; FHA, forkhead-associated domain. Organism abbreviations are as follows: Ske sti, Skermanella stibiiresistens; Ter tur, T. turnerae; Ect hal, Ectothiorhodospira haloalkaliphila; Ple shi, Plesiomonas shigelloides; Des ace, D. acetoxidans; Des sp., Desulfuromonas sp.; Str toy, Streptomyces toyocaensis; Azo tol, Azoarcus toluclasticus; Vib mim, Vibrio mimicus; Are mal, Arenimonas malthae; Nit lac, Nitrincola lacisaponensis; Geo met, Geobacter metallireducens; Met har, Methanosaeta harundinacea; Hal sp., Halomonas sp.; Vib sp., Vibrio sp.; Vib pon, Vibrio ponticus; Bru neo, Brucella neotomae; Des ole, D. oleovorans; Cys fus, C. fuscus; Sti aur, S. aurantiaca; Arc ful, A. fulgidus.
that the stand-alone Af1502-like proteins are also associated with transmembrane transport and TCST, we decided to name the entire protein domain family STAC, for SLC and TCST-Associated Component. SLC5 proteins are responsible for the Na+-coupled symport of different solutes through the membrane [22,23]. The core of their fold consists of two 5-transmembrane units in antiparallel orientation, which between them form the transport channel. Transport proceeds by alternately opening the channel to the periplasmic and cytosolic sides. While SLC5 proteins may contain between 11 and 15 transmembrane helices, they always contain 13 in their STAC-associated form. To investigate the relationship of the STAC-associated proteins to the SLC5 family, we collected sequences homologous to the SLC5-like domain of Tertu_0572 (residues
1–505) using six PSI-BLAST iterations over the non-redundant NCBI protein database filtered at a maximum of 70% sequence identity (nr70). We obtained approximately 5500 sequences above the significance cutoff of E = 0.005, after removal of partial sequences. We clustered these in CLANS, a program that generates a force-directed layout from an all-against-all matrix of BLAST P values [24], using a cutoff of 1.0e-10 (see Materials and Methods). Although we extracted the full-length sequences from the database, only the transmembrane domain matches were considered for clustering. This analysis showed the STAC-containing proteins as a well-connected satellite group to the main SLC5 family (red cluster in Fig. 5). Of the 516 sequences forming this group, 15 consisted entirely of the SLC5-like domain (white dots in the satellite cluster), whereas the others contained a STAC domain,
Fig. 5. Cluster map of SLC5 domains, highlighting proteins containing a STAC domain (red) and proteins containing TCST domains, but no STAC domain (blue). Sequences were clustered in CLANS [24] at a BLAST P value cutoff of 1e-10 (see Materials and Methods for details). Each dot represents one protein sequence. BLAST connections are shown as gray lines; the darker a line is, the higher the similarity is.
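The effect of the P value cutoff on which sequences remain connected can be illustrated with a small Python sketch. CLANS [24] itself computes a force-directed layout; the code below does not reproduce that layout and instead swaps in a much simpler technique, grouping sequences into connected components (single linkage) after discarding all pairwise connections weaker than the cutoff. The function name, the input format, and the toy P values are hypothetical.

```python
from collections import defaultdict


def clusters_at_cutoff(pvalues, cutoff):
    """Group sequences connected by pairwise P values at or below `cutoff`.

    `pvalues` maps (id_a, id_b) tuples to a BLAST-style P value (assumed input
    format). Returns a list of sets of sequence ids, largest group first.
    """
    neighbors = defaultdict(set)
    nodes = set()
    for (a, b), p in pvalues.items():
        nodes.update((a, b))
        if p <= cutoff:  # keep only connections that pass the cutoff
            neighbors[a].add(b)
            neighbors[b].add(a)

    seen = set()
    groups = []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from `start`.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(neighbors[node] - seen)
        groups.append(group)
    return sorted(groups, key=len, reverse=True)


if __name__ == "__main__":
    # Invented P values between five hypothetical sequences.
    toy = {
        ("s1", "s2"): 1e-30,
        ("s2", "s3"): 1e-12,
        ("s3", "t1"): 1e-6,
        ("t1", "t2"): 1e-20,
    }
    print(clusters_at_cutoff(toy, 1e-10))  # two groups: s1-s3 and t1-t2
    print(clusters_at_cutoff(toy, 1e-5))   # one group of all five sequences
```

A stricter cutoff removes weak connections and can split a loosely attached group from the main cluster, whereas a more permissive cutoff merges them; the layout in the actual cluster map additionally reflects the strength of each retained connection.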
almost always followed by domains characteristic for TCST signaling (Fig. 4). Conversely, 12 of the 5124 sequences in the main SLC5 clusters also contained TCST-associated domains, however without an intervening STAC domain (blue dots in Fig. 5). As a complementary step, we searched the nr70 database for homologs of the STAC domain using residues 528–608 of Tertu_0572 as a query. The PSI-BLAST search converged after nine iterations, yielding 461 sequences after removal of partial sequences. Although their domain composition is diverse (Fig. 4), over 90% are histidine kinases, of which about half are hybrid kinases, that is, kinases combining the His kinase module with downstream REC domains in the same polypeptide (Fig. 6d). In almost all cases, the STAC domain is connected to downstream domains by a coiled coil of 3–5 heptads, the exception being the five sequences in which it is followed by GGDEF or DUF835 (Fig. 4). In proteins without further domains after STAC, it is mostly also followed by a C-terminal coiled coil. Of the 461 proteins in our search set, only 12 appeared to be cytosolic, carrying the STAC domain at their N-terminus (red dots in Fig. 6a), whereas 449 contained an N-terminal SLC5-like domain. All of the latter are also present in the STAC-containing satellite group of the SLC5 cluster map (Fig. 5), showing the congruence of the two analyses. Clustering in CLANS at a P value cutoff of 1.0e-5 showed that neither the absence of an SLC5-like domain nor the presence of an adaptor domain or the nature of the output domain is indicative of STAC subgroups (Fig. 6).
The STAC cluster map in Fig. 6 does not include any of the stand-alone STAC proteins of archaea and myxobacteria, as these are too divergent to be detected by PSI-BLAST searches. This was also the case for a few occurrences in multi-domain proteins, which nevertheless underscore the association of STAC with signal transduction domains. One of these occurrences, DoLe_0553 from Desulfococcus oleovorans (a δ-proteobacterium), consists of an N-terminal forkhead-associated domain (FHA) responsible for phosphopeptide recognition and a C-terminal STAC domain. DoLe_0553 is part of an operon with a Ser/Thr protein kinase and a diguanylate cyclase. The other occurrences form two protein groups comprising around 10 members each, both in Myxobacteria. One group defines a subfamily of RsbT proteins, in which the RsbT Ser/Thr protein kinase domain is preceded by a STAC domain (e.g., STAUR_1948 of Stigmatella aurantiaca). The second group consists of proteins that are either stand-alone STAC domains or contain two tandem REC domains, followed C-terminally by the STAC domain (e.g., D187_05738 of Cystobacter fuscus).
Discussion
We have defined a new protein domain family, STAC, associated with solute transport and TCST, and we have characterized it structurally and biophysically using Af1502 of A. fulgidus. The domain forms a four-helix bundle with its N- and C-termini in
Fig. 6. STAC domain cluster maps, colored according to the presence (green dots) or absence (red dots) of given domains. In (d), the proteins are colored by output domain as follows: pale yellow, hybrid histidine kinases; cyan, histidine kinases; orange, phosphatases; magenta, phosphodiesterases; white, no recognizable output domain. Sequences were clustered in CLANS [24] at a BLAST P value cutoff of 1e-5 (see Materials and Methods for details).
close proximity, is monomeric in solution, and does not show any clefts or crevices that would suggest small-molecule binding. Sequence conservation is low and largely limited to the arrangement of hydrophilic, hydrophobic, and small residues needed for the four-helical structure. From this, we conclude that the domain has little propensity for dimerization and does not bind small-molecule ligands. In their vast majority, STAC domains occur between N-terminal SLC5-like membrane domains and C-terminal TCST domains. Indeed, the number of STAC-containing proteins that differ from this architecture is so low that STAC can be reasonably seen as a mediator between solute transport and signal transduction. Given the monophyletic nature of STAC-containing SLC5 homologs and their evolutionary distance to the main part of the family (Fig. 5), we conclude that the association between SLC5 and STAC led to the emergence of a new subfamily with altered functionality, which now couples solute uptake to the generation of an intracellular signal, as substantiated very recently by Zhang et al. for CbrA of Pseudomonas fluorescens [20]. STAC occurs in the same position as HAMP, between the last transmembrane helix and the first TCST domain, but this similarity is superficial. It differs from HAMP in most other aspects, being monomeric, never occurring in more than one copy per protein, occasionally being found in stand-alone proteins, and having the N- and C-termini adjacent rather than diametrically opposed. The latter property in particular raises questions about the topological arrangement of STAC domains within their larger proteins, given the known dimeric structure of TCST receptors. N-terminally, the only crystallized homolog of the SLC5-like domain is the sodium-galactose symporter of Vibrio parahaemolyticus (PDB IDs: 2XQ2 and 3DH4), which forms a tightly packed parallel dimer [25]. Helix 13 of this symporter, homologous to the last transmembrane helix of
STAC-associated SLC5 domains, is located at the interface of the crystallographic dimer, close to the 2-fold axis. This allows a direct extension of the structure by the homodimeric downstream domains (coiled coil, DHp). The STAC domain can now be envisaged to lie along the 2-fold axis (Fig. 7a) or laterally extruded and interacting with the following coiled coil (Fig. 7b). We favor the second possibility because the STAC domain is always connected to the last transmembrane helix of the membrane domain by an extended linker comprising minimally 8 residues, but more commonly 20–30 residues. Regarding the role of the STAC domain, it could be a site of sensory input (like PAS or GAF), a modulator transmitting a conformational signal linearly between adjacent domains (like HAMP), or a module mediating interactions with other proteins or domains in the respective system (like REC). The absence of a binding site recognizable in sequence or structure argues against it being a receptor for small-molecule ligands and its repeated occurrence as a stand-alone protein against it transmitting conformational signals along the polypeptide chain. We therefore favor the possibility that it mediates protein–protein interactions and are particularly intrigued by its potential to act as a mobile “plug”, reversibly obstructing the cytosolic side of transport channels in response to stimuli. Experiments to explore these possibilities are currently ongoing in our department.
Materials and Methods
Bioinformatics
Operon architectures were explored in PubSeed † (February 2015) [26] and in the NCBI Gene and Genomes browsers ‡. Sequence similarity searches were carried out iteratively in the MPI Bioinformatics Toolkit § [27] and at NCBI || [28].
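The iterative search strategy, in which sequences found in one round seed the next, can be sketched in Python as follows. The `search` argument is a hypothetical callable standing in for a single PSI-BLAST or HHblits run that returns the identifiers of significant hits for one query; no actual search program is invoked, and the stopping rule (no new hits, or a maximum number of rounds) is an assumption made for illustration.

```python
def transitive_search(seeds, search, max_rounds=10):
    """Collect hits by re-seeding searches with newly found sequences.

    `seeds` is an iterable of starting sequence ids; `search(query)` is a
    hypothetical function returning ids of significant hits for one query.
    Stops when a round yields nothing new or after `max_rounds` rounds.
    """
    found = set(seeds)
    frontier = sorted(found)
    for _ in range(max_rounds):
        new_hits = set()
        for query in frontier:
            new_hits.update(hit for hit in search(query) if hit not in found)
        if not new_hits:
            break  # converged: no previously unseen sequences
        found |= new_hits
        frontier = sorted(new_hits)  # only new sequences seed the next round
    return found


if __name__ == "__main__":
    # Toy hit table replacing a real database search (invented identifiers).
    toy_hits = {"A": ["B"], "B": ["A", "C"], "C": [], "D": ["A"]}
    print(sorted(transitive_search(["A"], lambda q: toy_hits.get(q, []))))
    # ['A', 'B', 'C']
```

In the toy example, sequence C is reached only through B, mirroring how divergent homologs can become detectable via intermediate sequences even when they are not found directly from the initial query.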
Fig. 7. Schematic models for the position of STAC domains in CbrA-like histidine kinases (a) along the 2-fold axis and (b) laterally extruded. Domains are annotated as in Fig. 4.
Sequences newly identified in each iteration were used as starting points for further searches. For searches in the MPI Toolkit, PSI-BLAST was run on the non-redundant protein sequence database (nr) clustered at 70% sequence identity (nr70) with a threshold of E = 0.005. HHblits [15] was run on the nr database clustered at 20% sequence identity (nr20). For searches at NCBI, PSI-BLAST was run both on the full nr and on the myxobacterial, methanobacterial, or joined myxobacterial and methanobacterial subsets of nr. Proteins identified in the searches were clustered by pairwise BLAST P values [13] in CLANS [24]. Clustering was performed in default settings (attract = 10, repulse = 5, exponents = 1) with P value cutoffs as given in the text. Domain annotation was based on CD-Search [29] run on the Conserved Domain Database [30] (CDD from NCBI, February 2015) and confirmed by SMART [31,32] and HHpred [33]. Secondary structure was predicted using the meta-tool Quick2D ¶, in the MPI Toolkit. The plots in Fig. 2b were calculated using the hydrophobicity scale of Kyte and Doolittle [34]. The plots were prepared using the Python library Matplotlib [35]. Evaluation of the dimer interface in the Af1502 crystal structure was performed using EPPIC a [36] and PISA b [37]. EPPIC considered it of potential biological relevance, whereas PISA scored it as a crystallization artifact. A search for coevolution signals for residues at the interface was made with EVcouplings c [38] but was largely unsuccessful. Structure figures were prepared in PyMol (Schrödinger, LLC), sequence conservation patterns were visualized with ProtSkin d [39], and electrostatic potentials were analyzed with PDB2PQR [40,41] and APBS [42].
Protein expression and purification
Af1502 was expressed as a C-terminal fusion to glutathione S-transferase. The expression construct was made by PCR amplification of the appropriate region of A. fulgidus DSM 4304 genomic DNA (LGC Promochem) and cloning of the fragment between BamHI and XhoI restriction sites of plasmid pGEX4T-1 (GE Healthcare). For expression, BL21(DE3) Gold cells containing the construct were grown at 37 °C in LB medium. At OD600 ~ 0.6, expression was induced with 1 mM isopropyl-β-D-thiogalactoside for 4 h. After resuspension in phosphate-buffered saline (PBS), cells were lysed by French press. According to the manufacturer's instructions, the soluble fraction of the lysate was loaded onto a GSH FF column (25 ml; GE Healthcare) and subsequently treated with thrombin (Calbiochem) to cleave off glutathione S-transferase. The sample was loaded onto a Superdex G-75 26/60 gel size-exclusion chromatography column (GE Healthcare) and eluted with 20 mM Tris (pH 8.1) and 150 mM NaCl. To prepare protein labeled with SeMet for crystallography, we grew cells in M9 minimal medium in the presence of SeMet (50 μg/ml) and supplemented them with the amino acids Leu, Ile, Phe, Thr, Lys, and Val [43]. To prepare 15N-labeled samples for NMR, we grew cells in M9 minimal medium with 15NH4Cl (Euriso-top) as the sole nitrogen source. Recombinantly expressed protein was then purified as described above.
Biochemical and biophysical characterization
For secondary structure measurements, CD spectra were recorded at room temperature from 195 to 240 nm with a JASCO J-810 spectropolarimeter. Thermally induced protein denaturation was monitored by CD spectroscopy using a Peltier-controlled sample holder unit. Temperature profiles at 222 nm were recorded from 20 to 95 °C. To determine the native molecular mass of Af1502, we performed analytical gel size-exclusion chromatography in PBS on a Superose 6 column (GE Healthcare), calibrated with gel filtration standards. Static
light-scattering experiments were performed using a size-exclusion chromatography column (Wyatt) to which a miniDAWN Tristar Laser photometer (Wyatt) and a RI-2031 differential refractometer (JASCO) were coupled. Runs were performed in 30 mM Mops/NaOH and 150 mM NaCl (pH 7.2). Data analysis and molecular mass calculations were carried out with ASTRA V software (Wyatt).

Protein–protein and protein–DNA interaction assays

Potential interaction between Af1502 and the HAMP domain of Af1503 was monitored by analytical gel filtration and native polyacrylamide gel electrophoresis (PAGE). Gel filtration runs were performed on a Superdex 75 10/300 column (GE Healthcare) with the individual proteins and with preincubated stoichiometric mixtures thereof. Protein peaks were detected at 280 and 215 nm. For native PAGE, equimolar mixtures of Af1502 with wild-type HAMP or with the A291F HAMP mutant were incubated at room temperature or at 50, 60, or 70 °C. Mixtures of Af1502 and HAMP were also preincubated in 6 M guanidinium chloride or 8 M urea and subsequently refolded by dialysis against 20 mM Tris and 150 mM NaCl (pH 8), followed by analysis on 17% native PAGE gels. For DNA-binding studies, double-stranded promoter sequences of Af1504 (83 bp) and of Af1505 (147 bp) were generated by PCR. DNA-binding studies were performed by incubating 1 pmol of DNA with increasing amounts of Af1502 (0.5–10 pmol) for 30 min in 20 mM Hepes, 100 mM NaCl, 100 mM KCl, and 10% glycerol (pH 7.5) either at room temperature or at 50 °C. Mixtures were then analyzed on 2% agarose gels supplemented with Serva DNA stain G (Serva).

Crystallization

Crystallization of SeMet-labeled protein was performed at 22 °C in 96-well sitting-drop plates. Drops containing 400 nl of reservoir solution and 400 nl of protein solution at a concentration of 23 mg/ml were equilibrated against 50 μl reservoir solution. The best-diffracting crystals grew within 2 weeks with a reservoir solution containing 170 mM (NH4)2SO4, 25.5% (w/v) polyethylene glycol 4000, and 15% (v/v) glycerol. The crystals were briefly transferred to a separate droplet of reservoir solution and then directly flash-frozen in liquid nitrogen. Data were collected at 100 K at the selenium K-edge (0.978 Å) at beamline X10SA of the Swiss Light Source (Villigen, Switzerland), using a PILATUS 6M hybrid pixel detector (Dectris Ltd.). The best dataset was indexed, integrated, and scaled to a resolution of 1.6 Å using XDS [44]. Crystals belong to the orthorhombic space group P2₁2₁2₁ with unit cell dimensions a = 42.50 Å, b = 56.72 Å, and c = 62.91 Å and two monomers in the asymmetric unit. SHELXD [45] readily identified two of the three selenium sites of each monomer, and density modification with SHELXE [45] resulted in an electron density map of excellent quality that could be traced almost completely using ARP/WARP [46]. The structure was completed in cyclic manual modeling with Coot [47] and refinement with REFMAC5 [48]. Analysis with MolProbity [49] showed an excellent model geometry without any Ramachandran outliers. Data collection and refinement statistics are summarized in Table 1.
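As a quick consistency check on the data collection wavelength quoted above, the photon energy can be recovered from E = hc/λ. The sketch below is illustrative only: the function name is hypothetical and the constant hc ≈ 12.39842 keV·Å is a rounded textbook value, not something stated in the article.

```python
# Planck constant times speed of light in keV * Angstrom (rounded textbook value).
HC_KEV_ANGSTROM = 12.39842


def wavelength_to_energy_kev(wavelength_angstrom: float) -> float:
    """Convert an X-ray wavelength in Angstrom to photon energy in keV (E = hc / lambda)."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_angstrom


if __name__ == "__main__":
    # 0.978 Angstrom is the wavelength given in the Crystallization section.
    print(f"{wavelength_to_energy_kev(0.978):.3f} keV")
```

The result lies slightly above the tabulated selenium K absorption edge (about 12.66 keV), which is where anomalous scattering from SeMet is typically exploited.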
Table 1. Data collection and refinement statistics

PDB accession code                         5A1Q
Space group                                P2₁2₁2₁
Unit cell dimensions: a (Å)/b (Å)/c (Å)    42.50/56.72/62.91
Resolution range (Å)                       35.0–1.60 (1.69–1.60)
Completeness (%)                           99.4 (96.6)
Redundancy                                 6.71 (6.39)
I/σ(I)                                     16.65 (2.07)
Rmerge (%)                                 5.7 (75.9)
Rcryst/Rfree (%)                           17.4/21.6
Bond length/angle RMSD (Å/°)               0.020/1.95
MolProbity clashscore                      5.46
Ramachandran favored (%)                   100
Values in parentheses refer to the highest-resolution shell. Ramachandran plot statistics and clashscore are as determined by MolProbity [49].
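The unit cell parameters in Table 1 allow a rough packing estimate. The following is a minimal sketch, not the authors' procedure: it computes the orthorhombic cell volume and, given a monomer mass, the Matthews coefficient and the usual solvent-content approximation (1 − 1.23/V_M). The value Z = 4 is the number of asymmetric units in space group P2₁2₁2₁, and two monomers per asymmetric unit is taken from the text. The monomer mass of Af1502 is not given in this section, so `PLACEHOLDER_MASS_DA` is a hypothetical value used purely for illustration.

```python
def orthorhombic_cell_volume(a: float, b: float, c: float) -> float:
    """Volume in cubic Angstrom of an orthorhombic cell (all angles 90 degrees)."""
    return a * b * c


def matthews_coefficient(cell_volume: float, monomer_mass_da: float,
                         asu_per_cell: int, monomers_per_asu: int) -> float:
    """Matthews coefficient V_M in cubic Angstrom per Dalton."""
    return cell_volume / (asu_per_cell * monomers_per_asu * monomer_mass_da)


def solvent_fraction(v_m: float) -> float:
    """Approximate solvent fraction from V_M (standard 1.23 / V_M approximation)."""
    return 1.0 - 1.23 / v_m


if __name__ == "__main__":
    volume = orthorhombic_cell_volume(42.50, 56.72, 62.91)  # values from Table 1
    print(f"cell volume: {volume:.1f} cubic Angstrom")

    # ASSUMPTION: placeholder mass, NOT the real mass of Af1502.
    PLACEHOLDER_MASS_DA = 10_000.0
    v_m = matthews_coefficient(volume, PLACEHOLDER_MASS_DA,
                               asu_per_cell=4, monomers_per_asu=2)
    print(f"V_M = {v_m:.2f}, solvent fraction = {solvent_fraction(v_m):.2f}")
```

Substituting the actual monomer mass of the crystallized construct would be needed before drawing any conclusion about packing density.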
NMR spectroscopy

Protein labeled with 15N was concentrated to 1 mM in PBS and 90% H2O/10% D2O. All spectra were recorded at 25 °C on Bruker AVIII-600 and AVIII-800 spectrometers. The oligomeric state of the protein was examined by conducting a dilution series from 1 mM to 33 μM in four steps. Small but significant chemical shift changes allowed re-assignment of the spectra across this range. The apparent dissociation constant was estimated by plotting shift changes versus concentration for several residues. In order to map chemical shift changes to the structure, we completed backbone sequential assignments at a concentration where the protein was substantially monomeric. This was performed by tracing the strong contacts between sequential amide protons in 15N HSQC nuclear Overhauser effect spectroscopy (NOESY) and NNH-NOESY spectra, supplemented by sequential Hα contacts in HNHA and HNHB spectra. An almost complete set of aliphatic and aromatic sidechain assignments was obtained from a 15N HSQC total correlation spectroscopy spectrum and a 12C,14N-filtered two-dimensional NOESY spectrum. Comparison of expectation NOESY spectra back-calculated from the crystal structure (in-house software) showed an excellent match to experimental spectra, indicating that the crystal structure is a very good model for the protein in solution.
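The text does not specify how the apparent dissociation constant was extracted from the dilution series. One common approach, sketched here under stated assumptions, treats the protein as a monomer–dimer equilibrium (2M ⇌ D, Kd = [M]²/[D]) in fast exchange, so that the observed shift is a population-weighted average of a monomer shift and a dimer shift. All function names are hypothetical, the concentrations are illustrative, and the shifts are synthetic values generated from the model itself rather than measured data.

```python
import math


def dimer_fraction(total_conc: float, kd: float) -> float:
    """Fraction of protomers in dimers for 2M <=> D with Kd = [M]^2 / [D].

    Mass balance P_t = [M] + 2[D] gives 2[M]^2 + Kd[M] - Kd*P_t = 0.
    """
    monomer = (-kd + math.sqrt(kd * kd + 8.0 * kd * total_conc)) / 4.0
    return 1.0 - monomer / total_conc


def log_grid(lo: float, hi: float, n: int) -> list:
    """n logarithmically spaced values from lo to hi inclusive."""
    return [lo * (hi / lo) ** (i / (n - 1)) for i in range(n)]


def fit_kd(concs, shifts, kd_grid):
    """Grid search over Kd; for each Kd the monomer shift and the
    dimer-minus-monomer shift difference follow from linear least squares."""
    best = None
    n = len(concs)
    s_mean = sum(shifts) / n
    for kd in kd_grid:
        f = [dimer_fraction(c, kd) for c in concs]
        f_mean = sum(f) / n
        sff = sum((x - f_mean) ** 2 for x in f)
        sfs = sum((x - f_mean) * (y - s_mean) for x, y in zip(f, shifts))
        slope = sfs / sff                    # shift difference, dimer minus monomer
        intercept = s_mean - slope * f_mean  # pure monomer shift
        sse = sum((intercept + slope * x - y) ** 2 for x, y in zip(f, shifts))
        if best is None or sse < best[0]:
            best = (sse, kd, intercept, slope)
    return best[1], best[2], best[3]


if __name__ == "__main__":
    # ASSUMPTION: illustrative concentrations (mM) and synthetic, noise-free shifts (ppm).
    concs = [1.0, 0.33, 0.1, 0.033]
    shifts = [8.200 + 0.050 * dimer_fraction(c, 0.5) for c in concs]
    kd, delta_m, delta_diff = fit_kd(concs, shifts, log_grid(0.01, 10.0, 301))
    print(f"Kd ~ {kd:.2f} mM, monomer shift {delta_m:.3f} ppm, difference {delta_diff:.3f} ppm")
```

With only four concentrations and real measurement noise, such a fit constrains Kd loosely, which is consistent with the article describing the value as an apparent, estimated constant.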
Accession codes

Coordinates and structure factors have been deposited in the PDB under the accession code 5A1Q.
Acknowledgements

We thank Kerstin Bär for crystallization setups, Michael Hulko for cloning Af1502, Moritz Ammelburg and Vikram Alva for advice on the manuscript, and the staff of beamline X10SA/Swiss Light Source for
their continuous support. This work was supported by institutional funds of the Max Planck Society.

Received 23 May 2015; Received in revised form 7 August 2015; Accepted 19 August 2015
Available online 28 August 2015

Keywords: CbrA; histidine kinase; SLC; TCST; transmembrane receptor

Present address: S. Dunin-Horkawicz, Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, 02-109 Warsaw, Poland.

† pubseed.theseed.org/seedviewer.cgi.
‡ www.ncbi.nlm.nih.gov/gene/ and www.ncbi.nlm.nih.gov/genome/, respectively.
§ toolkit.tuebingen.mpg.de.
|| blast.ncbi.nlm.nih.gov/Blast.cgi.
¶ toolkit.tuebingen.mpg.de/quick2_d.
ᵃ www.eppic-web.org/ewui/.
ᵇ www.ebi.ac.uk/pdbe/pisa/.
ᶜ evfold.org/evfold-web/evfold.do.
ᵈ http://www.mcgnmr.mcgill.ca/ProtSkin/.

Abbreviations used: HSQC, heteronuclear single-quantum correlation; NOESY, nuclear Overhauser effect spectroscopy; PBS, phosphate-buffered saline; PDB, Protein Data Bank; SeMet, selenomethionine; TCST, two-component signal transduction.
References

[1] A.M. Stock, V.L. Robinson, P.N. Goudreau, Two-component signal transduction, Annu. Rev. Biochem. 69 (2000) 183–215.
[2] R. Gao, A.M. Stock, Biological insights from structures of two-component proteins, Annu. Rev. Microbiol. 63 (2009) 133–154.
[3] L. Aravind, C.P. Ponting, The cytoplasmic helical linker domain of receptor histidine kinase and methyl-accepting proteins is common to many prokaryotic signalling proteins, FEMS Microbiol. Lett. 176 (1999) 111–116.
[4] S. Dunin-Horkawicz, A.N. Lupas, Comprehensive analysis of HAMP domains: Implications for transmembrane signal transduction, J. Mol. Biol. 397 (2010) 1156–1174.
[5] M. Hulko, F. Berndt, M. Gruber, J.U. Linder, V. Truffault, A. Schultz, et al., The HAMP domain structure implies helix rotation in transmembrane signaling, Cell 126 (2006) 929–940.
[6] J.U. Linder, J.E. Schultz, Transmembrane receptor chimeras to probe HAMP domain function, Methods Enzymol. 471 (2010) 115–123.
[7] H.U. Ferris, S. Dunin-Horkawicz, L.G. Mondejar, M. Hulko, K. Hantke, J. Martin, et al., The mechanisms of HAMP-mediated signaling in transmembrane receptors, Structure 19 (2011) 378–385.
[8] H.U. Ferris, S. Dunin-Horkawicz, N. Hornig, M. Hulko, J. Martin, J.E. Schultz, et al., Mechanism of regulation of receptor histidine kinases, Structure 20 (2012) 56–66.
[9] L.G. Mondejar, A. Lupas, A. Schultz, J.E. Schultz, HAMP domain-mediated signal transduction probed with a mycobacterial adenylyl cyclase as a reporter, J. Biol. Chem. 287 (2012) 1022–1031.
[10] H.U. Ferris, M. Coles, A.N. Lupas, M.D. Hartmann, Crystallographic snapshot of the Escherichia coli EnvZ histidine kinase in an active conformation, J. Struct. Biol. 186 (2014) 376–379.
[11] M.D. Hartmann, S. Dunin-Horkawicz, M. Hulko, J. Martin, M. Coles, A.N. Lupas, A soluble mutant of the transmembrane receptor Af1503 features strong changes in coiled-coil periodicity, J. Struct. Biol. 186 (2014) 357–366.
[12] H.U. Ferris, K. Zeth, M. Hulko, S. Dunin-Horkawicz, A.N. Lupas, Axial helix rotation as a mechanism for signal regulation inferred from the crystallographic analysis of the E. coli serine chemoreceptor, J. Struct. Biol. 186 (2014) 349–356.
[13] S.F. Altschul, W. Gish, W. Miller, E.W. Myers, D.J. Lipman, Basic local alignment search tool, J. Mol. Biol. 215 (1990) 403–410.
[14] R.D. Finn, J. Clements, S.R. Eddy, HMMER Web server: Interactive sequence similarity searching, Nucleic Acids Res. 39 (2011) W29–W37.
[15] M. Remmert, A. Biegert, A. Hauser, J. Soding, HHblits: Lightning-fast iterative protein sequence searching by HMM-HMM alignment, Nat. Methods 9 (2012) 173–175.
[16] L. Holm, P. Rosenstrom, Dali server: Conservation mapping in 3D, Nucleic Acids Res. 38 (2010) W545–W549.
[17] T. Nishijyo, D. Haas, Y. Itoh, The CbrA-CbrB two-component regulatory system controls the utilization of multiple carbon and nitrogen sources in Pseudomonas aeruginosa, Mol. Microbiol. 40 (2001) 917–931.
[18] A.T. Yeung, M. Bains, R.E. Hancock, The sensor kinase CbrA is a global regulator that modulates metabolism, virulence, and antibiotic resistance in Pseudomonas aeruginosa, J. Bacteriol. 193 (2011) 918–931.
[19] M. Valentini, S.M. Garcia-Maurino, I. Perez-Martinez, E. Santero, I. Canosa, K. Lapouge, Hierarchical management of carbon sources is regulated similarly by the CbrA/B systems in Pseudomonas aeruginosa and Pseudomonas putida, Microbiology 160 (2014) 2243–2252.
[20] X.X. Zhang, P.B. Rainey, Dual involvement of CbrAB and NtrBC in the regulation of histidine utilization in Pseudomonas fluorescens SBW25, Genetics 178 (2008) 185–195.
[21] X.X. Zhang, J.C. Gauntlett, D.G. Oldenburg, G.M. Cook, P.B. Rainey, Role of the transporter-like sensor kinase CbrA in histidine uptake and signal transduction, J. Bacteriol. 197 (2015) 2867–2878.
[22] J. Abramson, E.M. Wright, Structure and function of Na(+)-symporters with inverted repeats, Curr. Opin. Struct. Biol. 19 (2009) 425–432.
[23] A. Watanabe, S. Choe, V. Chaptal, J.M. Rosenberg, E.M. Wright, M. Grabe, et al., The mechanism of sodium and substrate release from the binding pocket of vSGLT, Nature 468 (2010) 988–991.
[24] T. Frickey, A. Lupas, CLANS: A Java application for visualizing protein families based on pairwise similarity, Bioinformatics 20 (2004) 3702–3704.
[25] S. Faham, A. Watanabe, G.M. Besserer, D. Cascio, A. Specht, B.A. Hirayama, et al., The crystal structure of a sodium galactose transporter reveals mechanistic insights into Na+/sugar symport, Science 321 (2008) 810–814.
[26] R. Overbeek, T. Begley, R.M. Butler, J.V. Choudhuri, H.Y. Chuang, M. Cohoon, et al., The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes, Nucleic Acids Res. 33 (2005) 5691–5702.
[27] A. Biegert, C. Mayer, M. Remmert, J. Soding, A.N. Lupas, The MPI Bioinformatics Toolkit for protein sequence analysis, Nucleic Acids Res. 34 (2006) W335–W339.
[28] S.F. Altschul, T.L. Madden, A.A. Schaffer, J. Zhang, Z. Zhang, W. Miller, et al., Gapped BLAST and PSI-BLAST: A new generation of protein database search programs, Nucleic Acids Res. 25 (1997) 3389–3402.
[29] A. Marchler-Bauer, S.H. Bryant, CD-Search: Protein domain annotations on the fly, Nucleic Acids Res. 32 (2004) W327–W331.
[30] A. Marchler-Bauer, M.K. Derbyshire, N.R. Gonzales, S. Lu, F. Chitsaz, L.Y. Geer, et al., CDD: NCBI's conserved domain database, Nucleic Acids Res. 43 (2015) D222–D226.
[31] J. Schultz, F. Milpetz, P. Bork, C.P. Ponting, SMART, a simple modular architecture research tool: Identification of signaling domains, Proc. Natl. Acad. Sci. U. S. A. 95 (1998) 5857–5864.
[32] I. Letunic, T. Doerks, P. Bork, SMART: Recent updates, new developments and status in 2015, Nucleic Acids Res. 43 (2015) D257–D260.
[33] J. Soding, A. Biegert, A.N. Lupas, The HHpred interactive server for protein homology detection and structure prediction, Nucleic Acids Res. 33 (2005) W244–W248.
[34] J. Kyte, R.F. Doolittle, A simple method for displaying the hydropathic character of a protein, J. Mol. Biol. 157 (1982) 105–132.
[35] J.D. Hunter, Matplotlib: A 2D graphics environment, Comput. Sci. Eng. 9 (2007) 90–95.
[36] J.M. Duarte, A. Srebniak, M.A. Scharer, G. Capitani, Protein interface classification by evolutionary analysis, BMC Bioinf. 13 (2012) 334.
[37] E. Krissinel, K. Henrick, Inference of macromolecular assemblies from crystalline state, J. Mol. Biol. 372 (2007) 774–797.
[38] T.A. Hopf, L.J. Colwell, R. Sheridan, B. Rost, C. Sander, D.S. Marks, Three-dimensional structures of membrane proteins from genomic sequencing, Cell 149 (2012) 1607–1621.
[39] B. Ritter, A.Y. Denisov, J. Philie, C. Deprez, E.C. Tung, K. Gehring, et al., Two WXXF-based motifs in NECAPs define the specificity of accessory protein binding to AP-1 and AP-2, EMBO J. 23 (2004) 3701–3710.
[40] T.J. Dolinsky, P. Czodrowski, H. Li, J.E. Nielsen, J.H. Jensen, G. Klebe, et al., PDB2PQR: Expanding and upgrading automated preparation of biomolecular structures for molecular simulations, Nucleic Acids Res. 35 (2007) W522–W525.
[41] T.J. Dolinsky, J.E. Nielsen, J.A. McCammon, N.A. Baker, PDB2PQR: An automated pipeline for the setup of Poisson-Boltzmann electrostatics calculations, Nucleic Acids Res. 32 (2004) W665–W667.
[42] N.A. Baker, D. Sept, S. Joseph, M.J. Holst, J.A. McCammon, Electrostatics of nanosystems: Application to microtubules and the ribosome, Proc. Natl. Acad. Sci. U. S. A. 98 (2001) 10037–10041.
[43] G.D. Van Duyne, R.F. Standaert, P.A. Karplus, S.L. Schreiber, J. Clardy, Atomic structures of the human immunophilin FKBP12 complexes with FK506 and rapamycin, J. Mol. Biol. 229 (1993) 105–124.
[44] W. Kabsch, Automatic processing of rotation diffraction data from crystals of initially unknown symmetry and cell constants, J. Appl. Crystallogr. 26 (1993) 795–800.
[45] G.M. Sheldrick, A short history of SHELX, Acta Crystallogr. Sect. A: Found. Crystallogr. 64 (2008) 112–122.
[46] A. Perrakis, R. Morris, V.S. Lamzin, Automated protein model building combined with iterative structure refinement, Nat. Struct. Biol. 6 (1999) 458–463.
[47] P. Emsley, K. Cowtan, Coot: Model-building tools for molecular graphics, Acta Crystallogr. Sect. D: Biol. Crystallogr. 60 (2004) 2126–2132.
[48] G.N. Murshudov, A.A. Vagin, A. Lebedev, K.S. Wilson, E.J. Dodson, Efficient anisotropic refinement of macromolecular structures using FFT, Acta Crystallogr. Sect. D: Biol. Crystallogr. 55 (1999) 247–255.
[49] V.B. Chen, W.B. Arendall III, J.J. Headd, D.A. Keedy, R.M. Immormino, G.J. Kapral, et al., MolProbity: All-atom structure validation for macromolecular crystallography, Acta Crystallogr. Sect. D: Biol. Crystallogr. 66 (2010) 12–21.