The hidden lipoproteome of Staphylococcus aureus


Accepted Manuscript
Title: The hidden lipoproteome of Staphylococcus aureus
Authors: Anica Graf, Richard J. Lewis, Stephan Fuchs, Martin Pagels, Susanne Engelmann, Katharina Riedel, Jan Pané-Farré
PII: S1438-4221(17)30493-9
DOI: https://doi.org/10.1016/j.ijmm.2018.01.008
Reference: IJMM 51207
To appear in: International Journal of Medical Microbiology
Received date: 27-9-2017
Revised date: 28-11-2017
Accepted date: 27-1-2018

Please cite this article as: Graf, Anica, Lewis, Richard J., Fuchs, Stephan, Pagels, Martin, Engelmann, Susanne, Riedel, Katharina, Pané-Farré, Jan, The hidden lipoproteome of Staphylococcus aureus. International Journal of Medical Microbiology https://doi.org/10.1016/j.ijmm.2018.01.008 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Anica Grafᵃ, Richard J. Lewisᵇ, Stephan Fuchsᶜ, Martin Pagelsᵃ, Susanne Engelmannᵈ,ᵉ, Katharina Riedelᵃ and Jan Pané-Farréᵃ,*

ᵃ Institute of Microbiology, Department of Microbial Physiology and Molecular Biology, University of Greifswald, F.-L.-Jahn-Str. 15, 17489 Greifswald, Germany
ᵇ Institute for Cell and Molecular Biosciences, Faculty of Medical Sciences, University of Newcastle, Newcastle upon Tyne, NE2 4HH, UK
ᶜ FG13 Nosocomial Pathogens and Antibiotic Resistance, Robert Koch Institut (RKI), Burgstr. 37, 38855 Wernigerode, Germany
ᵈ Helmholtz Center for Infection Research GmbH, Microbial Proteomics, Inhoffenstraße 7, 38124 Braunschweig, Germany
ᵉ Institute for Microbiology, Department of Microbial Proteomics, Technical University Braunschweig, Spielmannstraße 7, 38106 Braunschweig, Germany

* Corresponding author at: University of Greifswald, Institute of Microbiology, F.-L.-Jahn-Str. 15, Greifswald, Germany. E-mail address: [email protected]


Abstract

Lipoproteins are attached to the outer leaflet of the membrane by a di- or tri-acylglyceryl moiety and are thus positioned at the membrane-cell wall interface. Consequently, lipoproteins are involved in many surface-associated functions, including cell wall synthesis, electron transport, nutrient uptake, surface stress responses and signal transduction, and they represent a reservoir of bacterial virulence factors. Inspection of 123 annotated Staphylococcus aureus genome sequences in the public domain revealed that this organism devotes about 2-3% of its coding capacity to lipoproteins, corresponding to about 70 lipoproteins per genome. Sixty of these lipoproteins were identified in at least 95% of the genomes analyzed and thus constitute the core lipoproteome of S. aureus. About 30% of the conserved staphylococcal lipoproteins are substrate-binding proteins of ABC transporters with roles in nutrient transport. With a few exceptions, much less is known about the function of the remaining lipoproteins, which represents a large gap in our knowledge of this functionally important group of proteins. Here, we summarize current knowledge and integrate information from genetic context analysis, expression and regulatory data, domain architecture, sequence and structural information, and phylogenetic distribution to provide potential starting points for the experimental evaluation of the biological function of the poorly characterized or uncharacterized lipoproteome of S. aureus.


Keywords


Staphylococcus, lipoprotein, structure, PepSY, Lpl, DUF


Introduction

Lipoproteins are an important class of surface proteins that are attached to membrane lipids in the outer leaflet of the Gram-positive bacterial cell membrane via a conserved cysteine at the N-terminus of the mature lipoprotein. Lipoproteins are first synthesized in the cytoplasm as preprolipoproteins and are secreted across the membrane either in the unfolded state by the Sec machinery, as is the case for the majority of lipoproteins, or in the folded state by the twin-arginine translocation (Tat) machinery (Goosens and van Dijl, 2017). In both cases, an N-terminal signal sequence determines to which of the two secretion pathways the protein is targeted (Szewczyk and Collet, 2016). After secretion, the preprolipoprotein is covalently attached via the sulfhydryl group of a cysteine to the diacylglyceryl moiety of phosphatidylglycerol in a reaction catalyzed by lipoprotein diacylglyceryl transferase (Lgt) (Tokunaga et al., 1982). The cysteine residue necessary for membrane attachment is located in a highly conserved four-amino-acid sequence motif, the lipobox [(L/V/I)(A/S/T/G)(G/A)C], at the C-terminus of the signal peptide. After lipidation, the signal peptide is cleaved from the diacylated prolipoprotein between G/A and the lipid-modified cysteine by lipoprotein signal peptidase II (Lsp), resulting in the release of the mature lipoprotein (Hussain et al., 1982). Recently, a third enzyme involved in lipoprotein maturation, apolipoprotein N-acyltransferase (Lnt), which was previously thought to be present only in Gram-negative bacteria, has been identified in mycobacteria and streptomycetes. Lnt catalyzes the N-acylation of the free amino group of the N-terminal cysteine, leading to the formation of triacylated lipoproteins (Buddelmeijer, 2015). Despite the absence of a clear Lnt homolog, N-acylation has also been reported in Staphylococcus aureus, where the lipoprotein SitC (the substrate-binding protein of the iron ABC transporter SitABC) occurs in a triacylated form, consistent with the presence of an enzyme with Lnt-like activity (Kurokawa et al., 2012). In addition to its function as a membrane anchor, the acyl moiety of lipoproteins represents an important pathogen-associated molecular pattern (PAMP) of Gram-positive bacteria and is a potent activator of the host’s innate immune system (Aliprantis et al., 1999). While triacylated lipoproteins are thought to mediate an immune response via TLR1/TLR2 heterodimers, diacylated lipoproteins signal via binding to TLR2/TLR6 heterodimers (Akira, 2003). In particular, for triacylated SitC, which is one of the most abundant lipoproteins in S. aureus and a model lipoprotein for analyzing immune signaling, TLR2/MyD88 activation has been shown to induce the production of tumor necrosis factor α and interleukin-6 in mouse peritoneal macrophages (Nguyen and Götz, 2016a; Schmaler et al., 2009; Stoll et al., 2005). However, the robust activation of the innate immune system by lipoproteins does not translate into a strong adaptive immune response (Vu et al., 2016).
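The lipobox consensus can be turned into a simple sequence scan. The sketch below is a minimal illustration only, assuming the four-residue consensus [(L/V/I)(A/S/T/G)(G/A)C] and a hypothetical cut-off (`max_cys_pos`) that restricts hits to the N-terminal signal-peptide region. It is far cruder than the LipoP, PROSITE PS00013 and G+LPP predictors used in this study and ignores the signal-peptide context that those tools evaluate.

```python
import re

# Lipobox consensus (L/V/I)(A/S/T/G)(G/A)C; the zero-width lookahead reports overlapping hits
LIPOBOX = re.compile(r"(?=[LVI][ASTG][GA]C)")


def find_lipobox(seq: str, max_cys_pos: int = 40) -> list[int]:
    """Return 1-based positions of candidate lipobox cysteines in seq.

    max_cys_pos is an illustrative assumption, not a published threshold:
    only cysteines within the N-terminal region are reported.
    """
    hits = []
    for match in LIPOBOX.finditer(seq.upper()):
        # Cys lies 3 residues after the motif start; +1 converts to 1-based numbering
        cys_pos = match.start() + 4
        if cys_pos <= max_cys_pos:
            hits.append(cys_pos)
    return hits


# Hypothetical N-terminal sequence, not a real S. aureus lipoprotein
print(find_lipobox("MKKIFGLLLLSLAVVLAACGNDK"))  # [19]
```

In the mature lipoprotein, the reported cysteine becomes residue +1 after Lsp cleavage.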

Lipoprotein biogenesis and maturation, as well as the roles of characterized lipoproteins in nutrient uptake (especially of iron) and in pathogenicity, are reasonably well understood in S. aureus (Nguyen and Götz, 2016b; Schmaler et al., 2010; Sheldon and Heinrichs, 2012), and the theoretical lipoproteome of strain USA300 has been investigated in some detail (Shahmirzadi et al., 2016). In the present study, we used an integrated approach to predict the theoretical lipoproteome of 123 annotated S. aureus strains. Following manual curation of the annotations, proteins were grouped according to their function and occurrence within the different S. aureus strains. Focusing on poorly characterized lipoproteins, we integrated genetic, structural, expression and regulatory data to identify the biological context associated with these proteins and thereby guide future functional analyses of the S. aureus lipoproteome.

Materials and Methods


Identification of lipoproteins from genomic data. Complete genome sequences of 123 annotated S. aureus strains were downloaded from RefSeq (O'Leary et al., 2016) in August 2016. The dataset comprises 123 chromosomal and 92 plasmid sequences with a total of 352,983 genes (Table S1). Non-coding genes and pseudogenes were excluded prior to further analysis. Amino acid sequences derived from the remaining 328,287 coding sequences (CDS) were used to predict (i) general physicochemical properties, (ii) transmembrane topology, (iii) signal peptides, (iv) bacterial lipobox features, and (v) gene orthology.

We used custom scripts to calculate theoretical molecular weights and isoelectric points (at pH 7.0). The hydrophilic/hydrophobic character was assessed by GRAVY (grand average of hydropathy), calculated using the GRAVY calculator (http://www.gravy-calculator.de). Positive GRAVY values indicate an increased probability of membrane association (Kyte and Doolittle, 1982; Målen et al., 2008). Signal peptides were predicted using SignalP 4.1 (Petersen et al., 2011) and Phobius v1.01 (Käll et al., 2007); the latter also predicts the presence of transmembrane helices. Lipoboxes were identified using LipoP 1.0 and the PROSITE pattern PS00013, G+LPP, and G+LPPv2 (Sigrist et al., 2013; Sutcliffe and Harrington, 2002, 2004). Although the LipoP algorithm was trained on sequences from Gram-negative bacteria, LipoP successfully recognizes lipoproteins from Gram-positive bacteria (Rahman et al., 2008).
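GRAVY is the mean Kyte-Doolittle hydropathy over all residues, and an average molecular weight is the sum of average residue masses plus one water molecule. The following minimal sketch is not the custom scripts or the GRAVY calculator used in the study; it assumes the standard Kyte-Doolittle scale and commonly tabulated average residue masses, ignores the lipid modification, and does not compute isoelectric points, for which several pKa scales are in use.

```python
# Kyte-Doolittle hydropathy index (Kyte and Doolittle, 1982)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Average residue masses in Da (free amino acid minus one water)
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "Q": 128.1307, "E": 129.1155, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def average_mass(seq: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide chain."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER
```

For example, `gravy("AILV")` averages 1.8, 4.5, 3.8 and 4.2 to 3.575. Non-standard letters such as X raise a `KeyError`, which a production script would need to handle explicitly.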

All protein sequences sharing at least 90% pairwise identity (alignment coverage of at least 90%) in a protein-protein BLAST 2.6.0+ search (Camacho et al., 2009) were clustered into orthologous groups using Proteinortho v5.15 (Lechner et al., 2011). Synteny criteria were used to separate highly paralogous sequences (equal weighting of contextual adjacencies and sequence similarity). Groups of orthologous lipoproteins were revised manually because of the extensive sequence similarity of multiple paralogs.
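The identity and coverage thresholds can be illustrated with a simplified single-linkage grouping of pairwise BLAST hits. This is not Proteinortho, which uses reciprocal best-hit graphs with graph-based cluster refinement and, here, synteny information; it is a minimal sketch assuming that hits are supplied as (query, subject, percent identity, alignment length) tuples and that coverage is measured against the longer of the two proteins. The protein identifiers are hypothetical.

```python
def cluster_by_identity(lengths, hits, min_identity=90.0, min_coverage=0.9):
    """Single-linkage grouping of proteins whose BLAST hits pass both thresholds.

    lengths: {protein_id: sequence length}
    hits: iterable of (query, subject, percent_identity, alignment_length)
    """
    parent = {protein: protein for protein in lengths}

    def find(x):
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, identity, aln_len in hits:
        coverage = aln_len / max(lengths[query], lengths[subject])
        if identity >= min_identity and coverage >= min_coverage:
            parent[find(query)] = find(subject)  # merge the two groups

    groups = {}
    for protein in lengths:
        groups.setdefault(find(protein), []).append(protein)
    return sorted(sorted(group) for group in groups.values())


lengths = {"a": 100, "b": 100, "c": 100, "d": 200}
hits = [("a", "b", 95.0, 100), ("b", "c", 85.0, 100), ("c", "d", 99.0, 100)]
print(cluster_by_identity(lengths, hits))  # [['a', 'b'], ['c'], ['d']]
```

Here b and c fail the identity threshold, and c and d fail the coverage threshold because the 100-residue alignment covers only half of d. Single linkage tends to chain paralogs together, which is one reason why manual revision of paralog-rich groups was needed.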


Bioinformatics analyses of lipoprotein sequences. Using BLAST and information curated in the published literature, the functional annotation of each predicted lipoprotein was evaluated manually. Protein domains were identified with the Pfam and SMART tools (Letunic et al., 2015). Coiled-coil predictions were only accepted if coiled-coil structures were detected by Coils (Lupas et al., 1991), Multicoil (Wolf et al., 1997) and Marcoil (Delorenzi and Speed, 2002). The PDB (www.rcsb.org) was searched for 3-D structural information using the sequences of the S. aureus lipoproteins as queries (Berman et al., 2000). If hits to non-S. aureus proteins were identified in the PDB search, the structure of the respective S. aureus lipoprotein was predicted with iTASSER (Yang et al., 2015). Conserved features of potential functional importance were identified using ConSurf (Ashkenazy et al., 2016); protein sequence alignments for ConSurf were constructed using Muscle or Clustal Omega (Edgar, 2004; Sievers et al., 2011). Sequences for alignment construction were extracted from UniRef90 to avoid unnecessary redundancy. Unless stated otherwise, only sequences from the genus Staphylococcus were considered in order to limit the analyses to likely orthologs and to avoid introducing background noise through the inclusion of remote homologs. Sequences and alignments are available from https://mikrobiologie.uni-greifswald.de.

Genetic context analysis and collection of expression and regulatory data. High-resolution transcriptional data available for S. aureus strain HG001 were used to identify co-transcription of lipoprotein-encoding genes with neighboring genes (Mäder et al., 2016). The literature was also searched extensively for global transcriptional and regulatory information to identify environmental conditions and regulators affecting lipoprotein expression. In total, over 53,000 regulatory events and 2,000 protein-DNA interactions were extracted from the literature and screened for information concerning lipoproteins; the relevant information is summarized in Tables S2 and S3.


Results and discussion

Inventory and distribution of S. aureus lipoproteins. To determine the repertoire of S. aureus lipoproteins, we searched the genomes of 123 fully sequenced S. aureus strains using an integrated approach. In a first step, all protein sequences derived from the 123 S. aureus genome sequences were grouped into orthologous clusters to deduce the S. aureus panproteome. Next, all proteins within each cluster were screened with LipoP, generating a list of 203 potential lipoprotein sequence clusters. Protein clusters for which LipoP predictions were consistent for less than 90% of the included protein sequences, as well as clusters that included proteins with transmembrane domains predicted by Phobius, were inspected manually for the presence of general signal peptide and lipobox features, resulting in a final list of 192 probable lipoprotein clusters. Finally, the list of predicted lipoproteins was compared with experimental data from studies mapping surface-associated S. aureus proteins to minimize the number of false-negative predictions (Table S1). However, global proteomic data of an S. aureus lgt mutant, which can no longer covalently link lipoproteins to the membrane and consequently accumulates lipoproteins in the culture medium, will be necessary to experimentally confirm the presence or absence of a lipobox. For clarity, in the following text, clusters of orthologous sequences will simply be referred to as sequence or protein.
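The consistency rule described above can be expressed as a small decision function. This is a minimal sketch, assuming that each cluster member carries a LipoP call and a Phobius transmembrane call as booleans. The `Member` type, the outcome labels, and the rule that a cluster enters the candidate list when at least one member is LipoP-positive are illustrative assumptions; the manual inspection step itself is not automated here.

```python
from dataclasses import dataclass


@dataclass
class Member:
    lipop_positive: bool  # LipoP predicts a lipoprotein (SpII) signal peptide
    phobius_tm: bool      # Phobius predicts a transmembrane helix


def triage(cluster: list[Member], min_consistency: float = 0.9) -> str:
    """Classify an orthologous cluster by the consistency of its predictions."""
    if not cluster:
        raise ValueError("empty cluster")
    fraction = sum(m.lipop_positive for m in cluster) / len(cluster)
    if fraction == 0:
        return "not a candidate"
    # Inconsistent LipoP calls or any predicted TM helix -> inspect by hand
    if fraction < min_consistency or any(m.phobius_tm for m in cluster):
        return "manual review"
    return "probable lipoprotein"


consistent = [Member(True, False)] * 10
mixed = [Member(True, False)] * 8 + [Member(False, False)] * 2
print(triage(consistent), "|", triage(mixed))  # probable lipoprotein | manual review
```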


The average number of predicted lipoproteins per strain was 70, indicating that S. aureus devotes about 2.6% of its coding capacity to lipoproteins, which is in the same range as for other Gram-positive bacteria (Sulzenbacher et al., 2006). Of these 70 lipoproteins, 40 were identified in at least 95% of all strains considered in this study. Lipoproteins belonging to the domain of unknown function (DUF) families DUF576, DUF1672 and DUF4467 were also identified in at least 95% of all strains. However, members of the DUF576 and DUF1672 families were usually detected as multiple paralogs encoded within large loci containing variable numbers of genes, which hampered a clear sorting of the individual sequences into distinct orthologous groups. Taking into account the average number of lipoproteins for each of these three DUF families (DUF576 = 15, DUF1672 = 3 and DUF4467 = 2), the total number of lipoproteins conserved in at least 95% of the strains increases to 60.
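The figures above can be checked with simple arithmetic. This sketch assumes that the 2.6% share refers to the mean number of CDS per strain, obtained by dividing the 328,287 CDS from the Methods (chromosomal plus plasmid) by the 123 strains; the exact denominator used by the authors is not stated.

```python
N_STRAINS = 123
TOTAL_CDS = 328_287      # CDS after removing non-coding genes and pseudogenes
MEAN_LIPOPROTEINS = 70   # mean number of predicted lipoproteins per strain

cds_per_strain = TOTAL_CDS / N_STRAINS  # 2669.0
share_percent = 100 * MEAN_LIPOPROTEINS / cds_per_strain

# "Present in at least 95% of strains" as a strain count (integer ceiling)
min_strains = -(-95 * N_STRAINS // 100)  # 117 of 123

# Core lipoproteome: 40 clustered proteins plus average DUF paralog counts
DUF_AVERAGES = {"DUF576": 15, "DUF1672": 3, "DUF4467": 2}
core_size = 40 + sum(DUF_AVERAGES.values())

print(f"{share_percent:.1f}% of coding capacity; "
      f"core threshold {min_strains} strains; core = {core_size}")
# 2.6% of coding capacity; core threshold 117 strains; core = 60
```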

Functional classification. In order to classify the 192 identified lipoproteins by function, the annotation provided at NCBI was inspected manually for every protein and complemented or corrected where necessary (Table S1). Almost 50% of the proteins in the S. aureus core lipoproteome are of unknown function. Based on their current annotation, we grouped these proteins into three classes: (I) proteins with homology to functionally characterized proteins or protein domains, (II) proteins with conserved domains of unknown function and (III) proteins without predicted domains or homology to characterized proteins. Proteins assigned to the latter group were usually only detected in staphylococci and thus appear to be genus-specific. Representative domain organizations and phylogenetic profiles of lipoproteins belonging to groups II and III are shown in Figures 1 and S1. We are aware that the genetic background may have a profound effect on gene expression and function in S. aureus. For the sake of clarity, and in the absence of a common nomenclature for S. aureus proteins, we refer below to the locus tags of the well-characterized S. aureus laboratory strain COL to avoid a confusing number of locus tags referring to the same protein/gene.


In silico analysis of lipoproteins of unknown function. About 30% of all conserved lipoproteins identified in this study are substrate-binding proteins (SBP) of ABC transporters. S. aureus lipoproteins involved in transport have been extensively reviewed elsewhere (Shahmirzadi et al., 2016; Sheldon and Heinrichs, 2012); we therefore discuss only one SBP briefly, SACOL_RS01035, which has not been studied experimentally. The remainder of the article focuses on uncharacterized lipoproteins not related to the SBP family.


Group I: Lipoproteins with function prediction based on homology to characterized proteins or protein domains.


SBP_bac_11 (SACOL_RS01035)


At least eight of the SBP encoded in S. aureus are devoted to the uptake of iron, reflecting the severe iron limitation these pathogenic bacteria experience in their host environment and the resulting need to sequester iron from a variety of sources with high efficiency (Sheldon and Heinrichs, 2012). The reliance of S. aureus on iron sequestration is reflected by the numerous strategies and the multitude of different SBP devoted to iron binding, and by the differential expression of these uptake systems. A comparative transcriptome analysis of S. aureus HG001 cultured under 44 different conditions showed that expression of the different iron uptake and utilization systems is tightly regulated (Mäder et al., 2016). For instance, whilst the transcription patterns of SACOL_RS05335 and SACOL_RS10495 (FuhD1) were very similar, with strongest induction in RPMI medium, transcription of SACOL_RS11970 (FuhD2) was generally high but decreased significantly under stationary phase conditions irrespective of the growth medium. Other predicted Fe-specific SBPs, such as SACOL_RS01035 and SACOL_RS03440, displayed only weak transcription, suggesting that their induction requires specific signals not covered even by the 44 conditions tested (Mäder et al., 2016). Together, these observations indicate that the transcription of iron acquisition systems is fine-tuned in response to individual needs and specific signals.

Despite extensive progress in our understanding of iron acquisition and utilization in S. aureus, several SBP predicted to have a role in iron uptake (e.g. SACOL_RS01035, SACOL_RS03440, SACOL_RS05335 and SACOL_RS11390) have not been investigated so far. SACOL_RS01035 is the first gene in a tricistronic operon that also encodes a classical two-component histidine kinase and response regulator of unknown target specificity. The interplay of SBPs with membrane-embedded signaling receptors is not unprecedented in the bacterial world. For instance, the Escherichia coli maltose-binding protein MalE, a component of the MalEFGK maltose ABC transporter, also controls taxis towards maltose. In its maltose-bound state, MalE interacts with the chemoreceptor Tar, thereby inhibiting CheA-dependent phosphorylation of CheY and resulting in swimming towards increasing maltose concentrations (Zhang et al., 1999). A further example is the citrate-binding protein BctC of Bordetella pertussis. Substrate-loaded BctC can either interact with the citrate transport system BctBA or stimulate the kinase activity of the histidine kinase BctE, which in turn phosphorylates its cognate response regulator BctD to activate transcription of the bctABC operon (Antoine et al., 2005). Indeed, the dual use of SBP as components of transport systems and of signal-sensing systems that regulate expression of the cognate permease components appears to be a common theme (Tetsch and Jung, 2009). In this context, it is interesting to note that transcription of SACOL_RS01035 is positively correlated with transcription of SACOL_RS03445, encoding the iron(III) ABC transporter permease FecD. The SBP encoded upstream of fecD (SACOL_RS03440) is transcribed as a single gene and does not form a transcription unit with fecD, suggesting that expression of fecD could be regulated independently of SACOL_RS03440, e.g. by SACOL_RS01035 and the linked two-component system. The SACOL_RS03440 ortholog is not annotated as a lipoprotein in S. aureus strain USA300 because the start codon of SAUSA300_RS03215 was incorrectly identified in this strain.
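Co-expression statements, such as the positive correlation between SACOL_RS01035 and SACOL_RS03445, are typically quantified across conditions with a correlation coefficient. The text does not state which measure was used. The sketch below assumes a Pearson correlation over per-condition (for example, log-transformed) expression values, and the example vectors are synthetic, not data from Mäder et al. (2016).

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two expression profiles over the same conditions."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("constant profile: correlation undefined")
    return cov / math.sqrt(var_x * var_y)


# Synthetic log2 expression values for two genes over five hypothetical conditions
gene_a = [5.1, 6.3, 7.0, 4.2, 8.8]
gene_b = [3.0, 4.1, 4.9, 2.2, 6.5]
print(round(pearson(gene_a, gene_b), 3))
```

With many conditions, a rank-based measure such as Spearman correlation is a common, more outlier-robust alternative.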


PepSY (SACOL_RS02240)

Two PepSY (Peptidase propeptide and YPEB domain) domains can be identified in the SACOL_RS02240 sequence. PepSY domains are widely distributed in the bacterial kingdom and usually occur in two major protein architectures: (I) in association with M4 peptidase domains or (II) as single or tandem repeats of PepSY domains with or without additional transmembrane domains. In both groups, many proteins carry signal peptides that target the proteins to the extracellular milieu and, in the presence of a lipobox, anchor the proteins to the bacterial cell membrane. This observation suggests a general extracellular function for PepSY domain proteins (Yeats et al., 2004). PepSY domains linked to a C-terminal M4 peptidase domain inhibit the proteolytic activity of the protease domain, preventing inappropriate intracellular activation (Braun et al., 2000; Tang et al., 2003). Based on these observations, it has been suggested that PepSY domains may generally inhibit peptide bond-interacting proteins (Yeats et al., 2004). Interestingly, ypeB from Bacillus subtilis, encoding a group II PepSY protein (Üstok et al., 2015), is co-transcribed with sleB, encoding the primary spore cortex-lytic enzyme required during spore germination. YpeB was shown to be important for SleB stabilization and localization (Moriyama et al., 1996). In Streptomyces coelicolor, deletion of the SACOL_RS02240 homolog SCO7434 leads to irregular sporulation septation and altered spore shape (Tzanis et al., 2014). In addition, SCO7434 was localized to the septum early during sporulation and distributed around the spore surface during later stages of spore formation. However, the molecular target of SCO7434 was not determined in this study. In S. coelicolor, transcription of SCO7434 is controlled by SigF, which, like SigB of S. aureus, belongs to the stress and sporulation sigma factor family (Tzanis et al., 2014). The gene encoding SACOL_RS02240 is transcribed from a SigB-dependent promoter that is induced following entry of S. aureus into stationary phase or during a shift to alkaline pH (Pané-Farré et al., 2006). Thus, PepSY domains not associated with M4 peptidases could be important in controlling cell (or spore) envelope integrity and composition under stress and/or during development. For instance, down-regulation of SACOL_RS02240 was observed in a cwrA mutant. CwrA (SACOL_RS13470) is a putative 63-amino-acid polypeptide of unknown function whose expression increases dramatically under cell wall stress conditions (Balibar et al., 2010). SACOL_RS02240 transcription was also down-regulated in a strain with a constitutively active WalKR two-component system, which controls cell wall metabolism (Delauné et al., 2012). Furthermore, peptidoglycan biosynthesis inhibitors (bacitracin and mersacidin) and molecules leading to membrane depolarization (CCCP, nisin and daptomycin) induced SACOL_RS02240 transcription (Muthaiyan et al., 2008; Sass et al., 2008; Utaida et al., 2003). Therefore, SACOL_RS02240 likely plays a role in the cell surface stress response and/or in envelope biosynthesis or function.

SACOL_RS02240 has 29% sequence identity to the dimeric, putative lipoprotein CD630_1622 from Clostridium difficile 630, for which a crystal structure is available (PDB: 4EXR); this was the only meaningful hit when the PDB was searched with the SACOL_RS02240 amino acid sequence (Fig. 2). CD630_1622 contains two tandem copies of PepSY-like domains, small domains consisting of a 4-stranded anti-parallel β-sheet with a single α-helix on one face of the β-sheet. The domains assemble by maintaining the anti-parallel directionality of the β-strands to form a concave 8-stranded β-sheet, with the two helices on the convex face. The two PepSY domains of CD630_1622 show an almost perfect structural alignment, suggesting that this protein may have evolved by duplication of a single domain (Fig. 2A and 2C). Using 4EXR as the template, the SACOL_RS02240 structure was modeled in iTASSER (Yang et al., 2015) to map the amino acid conservation of staphylococcal SACOL_RS02240 orthologs (Fig. 2B and 2D). This analysis identified six highly conserved negatively charged residues: Glu95 and Glu97 in β-strand 1 and Glu113 and Glu115 in β-strand 3 of PepSY1, and Asp155 and Asp177 at the ends of β-strands 1 and 3 of PepSY2, respectively (Fig. 2D). These residues form a patch of negative charge near the N-terminal end of the loop connecting the two PepSY domains of SACOL_RS02240 and the top of PepSY2 (Fig. 2E). Four highly conserved lysine residues, Lys60, Lys72, Lys137 and Lys179, were also found in sequence alignments of SACOL_RS02240 orthologs. Lys72 is situated at the C-terminal end of the first β-strand of PepSY1 and is thus in close spatial proximity to the glutamate cluster; Lys60 and Lys137 are located at equivalent positions on the α-helices of the PepSY1 and PepSY2 domains, respectively; and Lys179 is located on the loop between β-strands 3 and 4 of PepSY2. With the exception of Lys72, all of these lysines are exposed on the convex side of the molecule (Fig. 2D). Lys179 is immediately adjacent to a highly conserved Asp-Ala motif (Asp177-Ala178 in SACOL_RS02240), which is present in many members of the PepSY protein family. For pseudolysin from Pseudomonas aeruginosa, substitution of Ala183 of the Asp-Ala motif with valine leads to severe growth retardation, cell leakage and, ultimately, cell lysis (Braun et al., 2000), suggesting that this residue is important for the PepSY domain to exert its peptidase inhibitory function. Taken together, these observations suggest that SACOL_RS02240 may have a role in envelope formation, and they highlight amino acid residues that may be functionally important.
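Identifying invariant residues such as the glutamate cluster or the Asp-Ala motif starts from a multiple sequence alignment of orthologs. ConSurf computes phylogeny-aware Rate4Site scores; the sketch below substitutes a much simpler per-column Shannon entropy. It assumes that the alignment is given as equal-length gapped strings with the reference sequence first, and it reports invariant columns using the reference's residue numbering. The toy alignment is invented.

```python
import math
from collections import Counter


def column_entropy(column: list[str]) -> float:
    """Shannon entropy (bits) of the non-gap residues in one alignment column."""
    residues = [c for c in column if c != "-"]
    n = len(residues)
    return -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())


def invariant_positions(alignment: list[str], max_gap_fraction: float = 0.0):
    """Return (residue, reference position) for fully conserved columns.

    The first sequence is the reference; its residues are numbered from 1.
    """
    ref = alignment[0]
    if any(len(s) != len(ref) for s in alignment):
        raise ValueError("aligned sequences must have equal length")
    result, ref_pos = [], 0
    for col in range(len(ref)):
        if ref[col] == "-":
            continue  # column has no residue in the reference
        ref_pos += 1
        column = [s[col] for s in alignment]
        gap_fraction = column.count("-") / len(column)
        if gap_fraction <= max_gap_fraction and column_entropy(column) == 0.0:
            result.append((ref[col], ref_pos))
    return result


toy = ["MKEAE", "MKEVE", "MREAE"]  # invented three-sequence alignment
print(invariant_positions(toy))  # [('M', 1), ('E', 3), ('E', 5)]
```

Unlike Rate4Site, entropy ignores the phylogenetic relationships among sequences, so closely related sequences can make a column look more conserved than it really is.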


Polysacc_deac_1 (SACOL_RS14150)

Sequence analysis of SACOL_RS14150 revealed the presence of a C-terminal (residues 126 to 305) predicted polysaccharide deacetylase domain of type 1. For many pathogenic bacteria, deacetylation of surface molecules is crucial for adaptation to the host environment and for protection from host immune defense mechanisms. For instance, deacetylation of exopolysaccharides by secreted IcaB is important for biofilm formation by Staphylococcus epidermidis and for resistance to cationic antimicrobial peptides and to phagocytosis by neutrophils (Vuong et al., 2004). A role in host immune evasion was also reported for the monotopic polysaccharide deacetylase PgdA of Listeria monocytogenes and Streptococcus pneumoniae (Boneca et al., 2007; Vollmer and Tomasz, 2000). Examples of putative lipid-attached polysaccharide deacetylases include Bacillus anthracis BA0330 and BA0331, which do not modify peptidoglycan because of subtle adaptations of the active site, but are required for growth under high salt conditions and for cell shape maintenance, respectively (Arnaouteli et al., 2015). By contrast, Bd3279 from the predatory bacterium Bdellovibrio does modify the prey cell wall during predation, and its deacetylase activity is necessary for prey lysis (Lambert et al., 2016). Whilst all of these proteins share the Pfam Polysacc_deac_1 domain definition with SACOL_RS14150, none were identified in a BLASTP search with SACOL_RS14150 as the query. A search of the PDB with the SACOL_RS14150 sequence also failed to find the structures of BA0330 and Bd3279, but did identify the unpublished chitin deacetylases from Arthrobacter sp. Yc-rl1 (PDB: 5LFZ) and Aspergillus nidulans (PDB: 2Y8U), and many structures of E. coli penicillin G acylase (e.g. Hewitt et al., 2000, PDB: 1E3A). A separate search limited to the N-terminal region of SACOL_RS14150 (residues 23 to 125), excluding the C-terminal Polysacc_deac_1 domain, identified weak similarities to a region (residues 236 to 304) of the C-terminal aminotransferase-like domain of the B. subtilis transcriptional regulator GabR (PDB: 5X03). More distantly related homologs of the SACOL_RS14150 N-terminal region, including proteins from paenibacilli, Clostridia and enterococci, share two highly conserved sequence motifs, H(V/I/L)FFH(P/S)L and TXXEF, and two invariant tyrosines, corresponding to SACOL_RS14150 residues 58-HVFYHPL-64, 88-TVSEF-92, Y100 and Y104. These patterns were not responsible for identifying GabR in the PDB search, and none of these sequence-based matches provided sufficient coverage to generate a meaningful structural model of full-length SACOL_RS14150.
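The motif notation used above, with alternatives in parentheses separated by slashes and X for any residue, maps directly onto regular expressions. This is a minimal sketch; the helper names are hypothetical, and the test sequence is invented rather than taken from SACOL_RS14150.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Convert e.g. 'H(V/I/L)FFH(P/S)L' or 'TXXEF' into a regex string."""
    parts, i = [], 0
    while i < len(motif):
        if motif[i] == "(":
            end = motif.index(")", i)
            # (V/I/L) -> character class [VIL]
            parts.append("[" + "".join(motif[i + 1:end].split("/")) + "]")
            i = end + 1
        else:
            parts.append("." if motif[i] == "X" else motif[i])  # X = any residue
            i += 1
    return "".join(parts)


def find_motif(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for all, possibly overlapping, hits."""
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq.upper())]


print(motif_to_regex("H(V/I/L)FFH(P/S)L"))  # H[VIL]FFH[PS]L
print(find_motif("AATVSEFAA", "TXXEF"))     # [(3, 'TVSEF')]
```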


Group II: Lipoproteins with domains of unknown function.


For the majority of conserved, uncharacterized S. aureus lipoproteins, a DUF can be identified. The DUF-containing group II lipoproteins are usually not restricted to S. aureus but show a broader phylogenetic distribution in the Staphylococcaceae. However, their distribution rarely extends to other members of the Firmicutes, suggesting that the group II lipoproteins may have evolved to fulfill Staphylococcus-specific surface-associated functions. Three groups of DUF lipoproteins, DUF576, DUF1672 and DUF4467, are encoded as multiple paralogs, suggesting overlapping or redundant functions.

DUF576 (Lipoprotein-like (Lpl) proteins)

Lpls, also designated tandem-type lipoproteins because they usually occur as chromosomal clusters, were identified in all 123 S. aureus genome sequences analyzed in this study (Table S4). In almost all strains, at least three major lpl gene clusters can be identified at a conserved locus near the chromosomal origin of replication (Fig. S2). Additional lpl gene clusters, scattered over the entire chromosome, are present in many strains, with a maximum of five clusters per genome. The maximum number of Lpls encoded within a single genome was 23 and the minimum was five. The number of genes within a cluster ranged from one to 15. Despite the large variation in lpl gene cluster number and in total Lpl numbers per cluster and per genome, one third of the genomes investigated here are characterized by a common set of three gene clusters with sizes of 2-4, 3-6 and 9-11 lpl genes.
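Counting lpl gene clusters, as summarized above, amounts to grouping adjacent lpl genes along the chromosome. This sketch assumes that genes are supplied in chromosomal order as booleans (True for an lpl gene) and that a hypothetical `max_gap` of intervening non-lpl genes is tolerated within a cluster; the circularity of the chromosome is ignored.

```python
def lpl_clusters(is_lpl: list[bool], max_gap: int = 0) -> list[list[int]]:
    """Group indices of lpl genes into clusters of (nearly) adjacent genes."""
    clusters: list[list[int]] = []
    for i, flag in enumerate(is_lpl):
        if not flag:
            continue
        # Extend the current cluster if at most max_gap genes separate it from gene i
        if clusters and i - clusters[-1][-1] - 1 <= max_gap:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    return clusters


genes = [False, True, True, False, True, True, True, False, False, True]
print([len(c) for c in lpl_clusters(genes)])             # [2, 3, 1]
print([len(c) for c in lpl_clusters(genes, max_gap=1)])  # [5, 1]
```

The choice of `max_gap` matters: tolerating a single intervening gene, for example an interrupted pseudogene annotated as non-coding, merges the first two clusters in this example.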

The largest lpl gene cluster is located on the νSaα (non-phage and non-staphylococcal cassette chromosome) genomic island, upstream of the staphylococcal superantigen-like (ssl) gene cluster (Tsuru and Kobayashi, 2008). The νSaα islands, which have been identified in most S. aureus strains, were classified into types I to IV using the sequence of hsdS, which correlates with the νSaα genomic island structure. The hsdS and hsdR genes form a bicistronic operon encoded between the ssl and lpl gene clusters; together with the product of the chromosomally remote hsdM gene, HsdS and HsdR form a type I restriction-modification system.

In 89 of the S. aureus strains analyzed (72%), lpl pseudogenes were identified. The large number of pseudogenes suggests high genetic activity within the lpl gene clusters, constantly restructuring the repertoire of S. aureus Lpls. In this context, Tsuru and Kobayashi described a highly conserved central region in each lpl open reading frame that constitutes a starting point for multiple recombination events, leading to duplication and further diversification of these genes. Moreover, they suggested that the resulting tandem paralogous genes serve as a means of antigenic phase variation in response to rapidly changing environments (Tsuru and Kobayashi, 2008). The exact function of the Lpls is unknown, but deletion of the νSaα-encoded lpl gene cluster of S. aureus strain USA300 results in a diminished TLR2-dependent pro-inflammatory response and reduced host cell invasion, suggesting an important role for these lipoproteins in pathogenicity (Nguyen et al., 2015). Furthermore, Lpls may function as cyclomodulins, delaying the transition from the G2 to the M phase of the eukaryotic cell cycle (Nguyen et al., 2016). Whether this activity is related to host cell invasion remains unknown, but the effect on cell cycle progression was independent of the lipid modification of the prototypical Lpl1 used in that study. It is also interesting to note that Lpls are released from the cell surface in stationary phase (Schluepen et al., 2013). Recently, it has been shown that aspartate at the +2 position favors tight anchoring of lipoproteins to the cytoplasmic membrane, whereas serine at this position, as for instance in Lpl8, leads to increased release of the lipoprotein into the supernatant (Kumari et al., 2017). Once released from the surface, Lpls could represent targets for the adaptive immune system. The large number of Lpls encoded by all S. aureus strains could thus be explained by significant evolutionary pressure on this group of lipoproteins.

A

Alternatively, these proteins may have evolved to interact with a range of closely related target structures and therefore expansion of this protein group might reflect functional redundancy. The crystal structures for three Lpls from S. aureus are available (PDB: 4EGD; 4BIH; 4BIG) (Schluepen et al., 2013). The sequence identity of the crystallized proteins ranges between 95% and 50% with corresponding structures showing a root-mean-square deviation (r.m.s.d.) of around 1 Å on matched C atoms (Fig. 3A and 3B). The overall structure of these Lpls is a single / domain with a large 10stranded anti-parallel -sheet resembling half a -barrel, which forms the concave side of the

molecule, and five -helices form the convex side. Interestingly, this organization of secondary structure elements is somewhat reminiscent of the PepSY domain proteins, suggesting that Lpl might also have evolved by internal duplication of a gene encoding a small protein with a 4-stranded antiparallel -sheet (Fig. S4). The three S. aureus Lpl structures show significant structural homology, despite ~10% sequence identity, to the Mycobacterium tuberculosis lipoproteins LppX (PDB: 2BYO, r.m.s.d. 3.2 Å) and LprG (PDB: 4ZRA, r.m.s.d. 3.6 Å) (Fig. S3). LppX and LprG are lipid transport proteins that transfer phthiocerol dimycocerosate (PDIM) and triacylglyceride (TAG), respectively,

IP T

between membranes (Martinot et al., 2016; Sulzenbacher et al., 2006). It has been proposed that binding of the glycolipid lipoarabinomannan (LAM) by LprG is important for LAM surface display and

contributes to glycolipid-dependent phagolysosomal fusion (Drage et al., 2010; Shukla et al., 2014).

SC R

Lipophilic molecules are bound in a hydrophobic pocket located between the concave side of the

antiparallel -strands and the -helices (Drage et al., 2010; Martinot et al., 2016; Shukla et al., 2014; Sulzenbacher et al., 2006). However, the α-helices of LprG and Lpls are on opposite faces of the

U

concave β-sheet and consequently the cavity that can accommodate lipophilic ligands in LprG is absent from Lpl, making this region unlikely as a ligand-binding pocket (Schluepen et al., 2013).

N

Nevertheless, a conserved sequence motif (Lpl core region) within the N-terminal part of the Lpls

A

(Nguyen et al., 2015), from the C-terminal end of -helix 1 to the beginning of the first -strand, is in

M

close proximity to the lipid-binding site of LprG and LppX (Fig. S3). Residues of the core region (particularly within the loop preceding -strand 1) and from the loop between -strands 8 and 9 form

ED

a highly conserved surface-exposed region (Fig. 3C). The same residues form one side of a small tunnel that penetrates the entire Lpl molecule. By contrast, little sequence conservation was observed for surface-exposed residues on the concave side of the molecule (Fig. 3C). It remains to be

PT

seen whether the Lpls have a lipid-binding site similar to those of LppX and LprG, and the Lpls’ functions and the role of highly conserved loop regions in lipid or small molecule binding awaits

CC E

experimental support.
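The r.m.s.d. values quoted above are computed on matched Cα atoms after optimal rigid-body superposition. The following minimal sketch uses the Kabsch algorithm via NumPy's singular value decomposition and assumes that the residue correspondence between the two structures is already known; structure-comparison tools additionally infer that correspondence, which is not shown. The function name `kabsch_rmsd` and the coordinates are illustrative, not Lpl atoms.

```python
import numpy as np

def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """r.m.s.d. between two matched (N, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matched atoms")
    # Remove translation by centring both sets on their centroids
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Correct for a possible reflection so that R is a proper rotation
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q and average the squared per-atom distances
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Toy "Cα trace": rotate by 30° about z and translate; r.m.s.d. should be ~0
P = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [5.0, 3.6, 0.0],
              [4.0, 5.0, 3.0], [7.5, 6.0, 4.0]])
theta = np.deg2rad(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([10.0, -2.0, 5.0])
print(round(kabsch_rmsd(P, Q), 6))  # 0.0
```

Because the superposition removes all rigid-body differences, the remaining r.m.s.d. reflects only genuine differences in fold, which is why values around 1 Å indicate near-identical backbones while values above 3 Å indicate a shared but divergent fold.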

DUF1307 (SACOL_RS12940)

SACOL_RS12940 is transcribed as a monocistronic mRNA in S. aureus HG001 and is induced upon internalization by human monocytes (Mäder et al., 2016). Proteins of the DUF1307 family usually consist of a single domain, and a search of the PDB using the SACOL_RS12940 protein sequence as the query identified the single-domain E. coli lipoprotein YehR (PDB: 2JOE). YehR is composed of an antiparallel β-sheet on one side and a loop-rich region including three α-helices on the other side of the molecule, and shares a degree of structural similarity with the pneumococcal lipoprotein SP_0191 (PDB: 2MVB, r.m.s.d. 3.0 Å). A biological function has yet to be reported for either YehR or SP_0191. The YehR structure was used as a template to generate a model for SACOL_RS12940. In this model, a large hydrophobic pocket is formed by the antiparallel β-sheet at the back of the pocket, by α-helices 1 and 3, which line the pocket at either end of the β-sheet, and by the loop between α-helices 2 and 3 (Fig. S5). By comparison, the remaining surface of the molecule is relatively hydrophilic. The pattern of surface hydrophobicity is also partially reflected by the ConSurf analysis. Interestingly, a similarly hydrophobic cavity can also be identified in E. coli YehR and S. pneumoniae SP_0191, although in SP_0191 it takes the form of a groove running the length of the molecule (not shown). Taking into account the location of the cysteine required for membrane anchoring, the SACOL_RS12940 pocket would face away from the membrane and would thus be ideally positioned to interact with a partner ligand with complementary surface shape, size and charge properties.

DUF1672 (SACOL_RS07790, SACOL_RS07800, SACOL_RS07805, SACOL_RS07815)

With a calculated molecular mass of around 35 kDa, the DUF1672 proteins are the largest DUF lipoproteins encoded in the S. aureus genome. As with the Lpls, the genes encoding DUF1672 proteins are usually organized in clusters on the chromosome. The number of DUF1672 domain proteins encoded per genome varies from zero to five, with 70% of the strains encoding four DUF1672 proteins (e.g. SACOL_RS07790, SACOL_RS07800, SACOL_RS07805, SACOL_RS07815). Inspection of the S. aureus genomes revealed that the DUF1672 gene clusters display five major types of architecture that are most likely of common origin (Fig. S6). In 40% of the strains, a single gene cluster with one to five DUF1672 genes, associated with a further conserved gene of unknown function (SACOL_RS07810), can be identified. A variant of this organization is found in 50% of the strains, in which the SACOL_RS07810 gene is disrupted by the insertion of a prophage (e.g. the Panton-Valentine leucocidin-encoding prophage SA2usa in strain USA300; Diep et al., 2006). In the remaining 10% of the investigated strains, the SACOL_RS07810 gene and two linked DUF1672 genes have either moved to a different position in the chromosome, resulting in two DUF1672 gene clusters, or have been lost completely, resulting in strains with only two or no copies of DUF1672 genes. As observed for the Lpls, a significant number of DUF1672 sequences carry frameshift mutations.

An unpublished crystal structure of a DUF1672 protein from S. aureus strain Mu50 (SAV_RS08010, PDB: 4QPV) is available. The DUF1672 structure shows no homology to any other protein in the PDB, and hence the potential function of DUF1672 homologs is difficult to ascertain from structure alone. The structure of 4QPV has quasi two-fold symmetry centered on the vertex of the long α-helix that runs between residues 137 and 170 and is bent by almost 90° at residue His156. The carbonyl of His156 is flipped 90° out of the helical plane to act, in essence, as the N-cap for the latter half of the helix, forming a hydrogen bond to the amide nitrogen of Leu160. All of the residues immediately flanking the breakpoint of the α-helix are in thermodynamically favoured rotamers. The quasi two-fold symmetry superimposes the β-sheets in each half of the protein in addition to the first and last major α-helices. Mapping sequence conservation onto the surface reveals a striking accumulation of highly conserved residues covering almost one entire side of the molecule, together with a relatively prominent area of neutral charge (Fig. S7). The correlation between surface charge and sequence conservation in 4QPV suggests that the DUF1672 proteins of S. aureus interact with a large target molecule that contacts almost the entire face of the lipoprotein, indicating that this protein may act as a scaffold for interaction with another protein rather than performing a catalytic function.

DUF4467 (SACOL_RS01510, SACOL_RS08020, SACOL_RS12635)

Unlike the Lpls and the DUF1672 proteins, proteins sharing the DUF4467 domain are not organized in gene clusters but are scattered over the chromosome. Up to five DUF4467 protein-encoding genes can be identified per genome, with 50% of all strains carrying two DUF4467 genes and another 40% carrying three. With only a few exceptions, two DUF4467-containing genome organizations are present in all strains (Fig. S8). In the first organization, the DUF4467 protein-encoding gene (SACOL_RS12635) is co-transcribed with the thiol-disulfide oxidoreductase-encoding gene dsbA. The DsbA lipoprotein functions as a chaperone that is required for the introduction of disulphide bonds into extracellular proteins. The second organization (SACOL_RS01510) is characterized by a transposase and multiple short open reading frames, encoding many different DUF proteins, flanking the upstream and downstream regions of SACOL_RS01510. This genomic organization can be identified at two different chromosomal locations within a single genome in 20% of all strains. Transcriptome analyses showed up-regulation of SACOL_RS01510 transcripts following treatment with mupirocin, an inducer of the stringent response in S. aureus, and positive regulation by the alternative stress sigma factor SigB and by the cell density-dependent regulators agr and rot (Anderson et al., 2006; Bischoff et al., 2004; Cheung et al., 2011; Saïd-Salim et al., 2003; Sully et al., 2014). In addition, ChIP-on-chip experiments identified a CodY binding site upstream of the SACOL_RS01510 coding region (Majerczyk et al., 2010), suggesting a stationary phase- or starvation-related function for SACOL_RS01510. In the third organization, which is present in only 30% of all strains, the DUF4467 gene (SACOL_RS08020) appears to form an operon with at least four additional genes, encoding a putative endopeptidase with a CHAP (cysteine, histidine-dependent amidohydrolase/peptidase) domain (Bateman and Rawlings, 2003), an FtsK/SpoIIIE family protein and two transmembrane proteins of unknown function. Genes encoding conjugative transposon proteins often flank this gene segment. In two genomes, two and three copies of this region, respectively, were identified. Interestingly, members of the DUF4467 family display a fold similar to that of cystatin, a family of proteins that function as inhibitors of Cys-dependent proteases by binding directly to the protease active site (Turk et al., 2008). Thus, the co-occurrence of SACOL_RS08020 with CHAP domain proteins might reflect functional coupling, with SACOL_RS08020 controlling the enzymatic activity of the CHAP protein.

As can be expected from their occurrence at distinct chromosomal locations, the DUF4467 proteins show relatively little sequence identity to each other. Thus, DUF4467 proteins may have specific rather than redundant functions. To address this question, we constructed sequence alignments and phylogenetic trees for homologs of all three members of the DUF4467 protein family. This analysis showed that SACOL_RS08020 orthologs, which are associated with an FtsK/SpoIIIE family protein, are more closely related to SACOL_RS01510 orthologs, which cluster on the chromosome with genes encoding other DUF proteins, than to SACOL_RS12635, which is associated with DsbA. Although each group of DUF4467 proteins was distinguishable by a set of invariant amino acids, a multiple sequence alignment of all DUF4467 proteins showed almost no invariant residues (not shown). However, the overall distribution of charged residues followed a common pattern, and charged amino acids tended to localize in the same region of the polypeptide chain.

The unpublished crystal structure of the SACOL_RS01510 ortholog from S. aureus strain Mu50 (SAV_RS01690, PDB: 4EBG) is annotated as a dimer (Fig. S9). This annotation is supported by the observation that the N-termini, which would be the sites of attachment to the lipid anchor, are located on the same side of the dimer. Dimerization of SACOL_RS01510 leads to the formation of a prominent cleft pointing away from the membrane, which would thus be in an excellent position to interact with an unknown substrate. However, the amino acids forming this cleft are poorly conserved among S. aureus members of the DUF4467 family, and this poor conservation persists even when DUF4467 members are grouped by genomic organization. The latter analysis also failed to identify clear candidate amino acids lining the sides of the cleft that might provide substrate specificity. Nevertheless, a cluster of conserved negatively charged and polar amino acids can be identified in the last β-strand of the structure (Fig. S9). Finally, the rather small solvent-accessible surface area that is buried by 'dimerization' appears more reflective of a protein:protein contact formed in the crystal lattice (Ponstingl et al., 2000). It therefore remains to be seen whether SACOL_RS01510 dimerizes in vivo, and its function remains to be uncovered.

DUF4889 (SACOL_RS12495)

Transcription of the SACOL_RS12495-encoding gene was particularly strongly induced under anaerobic conditions, in the presence of CO2 and in non-adherent bacteria isolated from the supernatant of S9 cells after 1 hour of infection (Mäder et al., 2016). Using the SACOL_RS12495 protein sequence as a query, significant hits were detected in a large number of staphylococcal species. Interestingly, while a signal sequence could usually be identified in the sequences producing significant alignments, a lipobox often could not, suggesting that in some staphylococcal species this protein is attached to the membrane, whereas other staphylococci might instead secrete it into their surroundings. Both variants share a highly conserved C-terminal sequence motif characterized by three invariant histidine residues: HDDXPHGLMXXIH. In several SACOL_RS12495 orthologs, this motif is followed by a variable number of additional histidine residues. Histidine-rich sequences are often involved in metal ion binding, especially of iron, and it is conceivable that coordination or scavenging of metal ions, such as the growth-limiting nutrient iron, could be important for SACOL_RS12495 function.
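Motifs written with X as a wildcard, such as HDDXPHGLMXXIH, translate directly into regular expressions. A minimal sketch using Python's standard `re` module that locates the motif and counts the additional histidines that follow it; the function names and the example sequence are invented for illustration.

```python
import re

def motif_to_regex(motif: str) -> str:
    """Convert a motif in which X means 'any residue' into a regex pattern."""
    return "".join("[A-Z]" if c == "X" else re.escape(c) for c in motif.upper())

# Group 1: the motif itself; group 2: an optional run of extra histidines
HIS_MOTIF = re.compile("(" + motif_to_regex("HDDXPHGLMXXIH") + ")(H*)")

def find_his_motif(seq: str):
    """Return (0-based start, matched motif, number of extra trailing His) per hit."""
    return [(m.start(), m.group(1), len(m.group(2)))
            for m in HIS_MOTIF.finditer(seq.upper())]

# Invented example sequence: motif followed by three extra histidines
print(find_his_motif("MKTHDDAPHGLMKSIHHHH"))  # [(3, 'HDDAPHGLMKSIH', 3)]
```

Capturing the trailing histidine run separately makes it easy to compare orthologs that share the core motif but differ in the length of the His-rich extension.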

DUF4909 (SACOL_RS09465)

A particularly interesting S. aureus lipoprotein that has not been investigated so far is SACOL_RS09465. The respective gene in strain HG001 is transcribed at very low levels under many laboratory conditions but is induced when cells are cultured in human plasma over a prolonged time (Mäder et al., 2016). Transcription of SACOL_RS09465 correlated closely with that of fmhA (Mäder et al., 2016), which encodes a protein predicted to play a role in peptidoglycan synthesis (Tschierske et al., 1999). Other studies have reported that SACOL_RS09465 is up-regulated 10-fold following challenge with fusidic acid, which interferes with the function of elongation factor G in translocation and is thus an inhibitor of translation in bacteria (Delgado et al., 2008). An unpublished crystal structure is also available for the SACOL_RS09465 ortholog from S. aureus strain Mu50 (SAV_RS09645, PDB: 4LQZ). Although a sequence-based search did not reveal any hits in the PDB, significant structural homology was detected to a number of lipoproteins, including a lipoprotein from S. pneumoniae (SP_0198, 2.7 Å r.m.s.d., PDB: 5CYB), Treponema pallidum pallilysin (Tp0751, 3.1 Å r.m.s.d., PDB: 5JK2) and two proteins from Bacteroides uniformis (BACUNI_01346, 3.1 Å r.m.s.d., PDB: 4QRL; BACUNI_03114, 3.2 Å r.m.s.d., PDB: 2MHD), all of which share a lipocalin-like fold with SACOL_RS09465 (Fig. S10). Lipocalins are a diverse group of extracellular proteins that transport and store small, usually hydrophobic molecules within a ligand pocket surrounded by four loops at the open end of a calyx-shaped, 8-stranded antiparallel β-barrel (Flower et al., 2000). Furthermore, lipocalins are characterized by an N-terminal helix (usually a 3₁₀-helix) and a C-terminal helix followed by an additional β-strand. However, contrary to the classical lipocalin architecture, SACOL_RS09465 has a large loop (L5), containing a short helical stretch that includes a 3₁₀-helix, inserted between β-strands 5 and 6. β-strand 6 does not participate in the formation of the β-barrel; together with the extensive loop L5, it flanks one side of the β-barrel. A large number of conserved, surface-exposed amino acids are found in loop L5, β-strand 6 and the adjacent loop L6, the domain flanking the β-barrel, and the side of the β-barrel itself (Fig. S10B). It is tempting to speculate that at least some of these conserved residues interact with an unknown ligand, in line with the way other lipocalin-like proteins bind their ligands, and perhaps translate this interaction into conformational changes in loop L5 controlling access to the interior of the β-barrel.
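Calling residues "conserved and surface-exposed" combines a per-position conservation score from a multiple sequence alignment with per-residue solvent accessibility. ConSurf derives its scores from a phylogeny-based evolutionary rate model; as a simpler stand-in, the sketch below uses the Shannon entropy of each alignment column and assumes relative solvent accessibility (RSA, 0 to 1) values supplied separately, for example accessibilities from the crystal structure normalized per residue type. The function names, thresholds and toy alignment are illustrative assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence

def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [c for c in column.upper() if c not in "-."]
    if not residues:
        return math.inf  # all-gap column: treated as not conserved
    n = len(residues)
    # sum over residue types of p * log2(1/p), with p = k/n
    return sum((k / n) * math.log2(n / k) for k in Counter(residues).values())

def conserved_exposed(alignment: List[str], rsa: Sequence[float],
                      max_entropy: float = 0.5, min_rsa: float = 0.25) -> List[int]:
    """1-based positions of the reference (first) sequence that are both
    conserved (entropy <= max_entropy) and exposed (RSA >= min_rsa).
    rsa must hold one value per ungapped reference residue."""
    ref = alignment[0]
    hits, pos = [], 0
    for col, ref_res in enumerate(ref):
        if ref_res in "-.":
            continue  # gap in the reference: no residue number
        pos += 1
        column = "".join(seq[col] for seq in alignment)
        if column_entropy(column) <= max_entropy and rsa[pos - 1] >= min_rsa:
            hits.append(pos)
    return hits

# Toy alignment (reference first) and hypothetical RSA values
aln = ["MKLVD", "MKIVD", "MRLVD", "MKLAD"]
rsa = [0.9, 0.5, 0.1, 0.3, 0.1]
print(conserved_exposed(aln, rsa))  # [1]
```

In the toy example, position 5 is perfectly conserved but buried, so it is excluded; only the filter on both criteria singles out candidate surface sites for ligand interaction.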

Until recently, lipocalins had not been detected in Gram-positive bacteria, but a number of unpublished structures with the lipocalin fold deposited in the PDB (see above) and the structure of B. subtilis YxeF (PDB: 2JOZ) raised the possibility that this fold is more widely distributed than previously anticipated (Wu et al., 2012). Finally, a function as a host cell adhesion molecule that interacts with the extracellular matrix was demonstrated for the T. pallidum lipocalin homolog identified in our structure-based search (Parker et al., 2016). Thus, it will be important to clarify whether SACOL_RS09465 functions as a small-molecule binding protein.

DUF5067 (SACOL_RS04375)

Comprehensive transcriptome analyses of S. aureus strain HG001 showed that transcription of the DUF5067 lipoprotein gene SACOL_RS04375 is up-regulated during oxygen limitation, in non-adherent bacteria isolated from the supernatant of S9 cells after 1 hour of infection, upon internalization by human monocytes, and at elevated CO2 concentrations (Mäder et al., 2016). Furthermore, transcription of SACOL_RS04375 correlated strongly with the transcription of two downstream genes encoding small proteins of unknown function and, to a lesser extent, with transcription of a putative amidase (SACOL_RS01370), a lipoprotein annotated as a DUF1307 domain protein (SACOL_RS12940, see above) and two genes encoding small protein toxins (SACOL_RS01375 and SACOL_RS06860) involved in inter-organism conflicts (Zhang et al., 2012).

An unpublished structure of the SACOL_RS04375 ortholog from S. aureus strain NCTC8325 is available (SAOUHSC_0808; PDB: 3QFG). Despite the absence of any significant sequence similarity to any other entry in the PDB, 3QFG shares structural homology with several proteins from Gram-positive bacteria that are either predicted lipoproteins (e.g. YcdA and YjhA from B. subtilis, PDB: 4R4G and 3CFU; 3.1 Å r.m.s.d. each) or secreted proteins from M. tuberculosis (immunogenic protein MPT63, Rv1926c, PDB: 1LMI; r.m.s.d. 3.0 Å). Structural homology was not limited to members of the DUF5067 family (EF_1241, PDB: 2LLG; r.m.s.d. 3.2 Å) and the DUF5067 superfamily DUF4342 (YcdA, YjhA) but extended to the DUF1942 family (Rv1926c), indicating that these DUF families are structurally related (Fig. S11). Although proteins of the DUF5067 family are widely distributed in the bacterial world, including several staphylococcal species, the SACOL_RS04375 sequence appears to be highly specific for S. aureus. While a BLASTP search with SACOL_RS04375 as query produced significant hits in a very small number of strains from other staphylococcal species, these hits showed a steep decline in alignment significance and sequence identity, for example from an e-value of 4e-134 and 96% identity (Staphylococcus argenteus) to 8e-34 and 38% (Staphylococcus xylosus). Thus, restricting the amino acid sequence analysis to the genus Staphylococcus did not generate sufficient sequence diversity to provide meaningful information. Therefore, the NCBI protein database was searched with BLASTP using SACOL_RS04375 as query but excluding S. aureus from the searched species, in order to limit results to more distantly related sequences. A multiple sequence alignment was constructed from the list of hits and visualized with the WebLogo tool (data not shown). Although the alignment was of only moderate quality, including many gaps to accommodate the differences in length of the hits, this analysis confirmed that members of the DUF5067 family are characterized by large numbers of charged residues, a feature we also observed in several other lipoproteins discussed here. For instance, 47% of the residues of the mature SACOL_RS04375 are charged, with E/D contributing 25% and K/R contributing 22%. The high frequency of charged residues is particularly obvious in the C-terminal 39 amino acids (74%). The majority of negatively charged residues cluster on one side of the molecule, resulting in a negative surface potential surrounding a prominent groove formed between α-helix 1 and a large elongated loop between β-strand 7 and α-helix 2, the base of which is mostly hydrophobic. Lys126 and Asp179 straddle either side of the groove, are separated by 6.5 Å, and may define the end of this groove if it is used in macromolecular interactions. Another pair of residues, His111 and Asp129, straddles the groove at approximately its midpoint and may also be involved in defining specificity (Fig. S11B).

YkyA (SACOL_RS05625)

The ykyA gene appears to be transcribed as a single unit in S. aureus strain HG001 (Mäder et al., 2016), but transcription may extend into the downstream pyruvate dehydrogenase operon. Transcription of ykyA starts from alternative promoters in a growth condition-dependent manner, leading to a transcript with a long UTR that overlaps with that of SACOL_RS05620, which is encoded on the minus strand upstream of ykyA. Transcription of ykyA is strongly induced by a number of cell envelope-active antibiotics, including daptomycin, telavancin (a vancomycin derivative), enduracidin (an inhibitor of MurG) and mersacidin (a probable inhibitor of transglycosylation) (Muthaiyan et al., 2008; Sass et al., 2008; Song et al., 2012); YkyA therefore presumably plays some role in cell wall biology. The protein encoded by ykyA is annotated in KEGG as a potential hydrolase, but our sequence-based bioinformatics searches failed to reproduce this annotation. The structure of YkyA has been solved but not published (PDB: 2AP3), revealing that the protein folds into a four-helical bundle; an enzymatic function is unlikely, since most four-helical bundles perform only structural roles. The bundle is rather open, such that a groove about 50 Å long and between 7 and 15 Å wide, located between helices 1 and 4, runs along the length of the molecule. The four-helical bundle is a common fold in biology, and consequently 2AP3 produces a large number of hits during a search for structural relatives, most of which share little functional similarity. However, high-scoring matches characterized by long regions of sequence coverage included the ligand-binding domain of the aspartate receptor of E. coli and Salmonella enterica, the methyl-accepting chemotaxis protein Tar (e.g. PDB: 2LIG, 4Z9H). Analysis of sequence conservation among staphylococcal YkyA orthologs revealed a stretch of invariant amino acids forming the bottom of the cleft located between the N- and C-terminal helices of the four-helix bundle. Of particular interest are residues E32, D46 and K49 from helix 1; R78, E85 and S92 from helix 2; K125, Y126, H129, Y132, Y136, E143 and F147 from helix 3; and Q157, Y181 and K192 from helix 4, which are all part of the groove and could contribute to bundle stability and/or the interaction of the bundle with a target ligand (Fig. S12). The dimensions of the groove are consistent with those of an α-helix from another protein molecule, suggesting that YkyA (SACOL_RS05625) acts as a scaffold for macromolecular interactions.
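As a rough plausibility check of the proposal that the groove could hold a helix from a partner protein, and assuming canonical α-helix geometry (a rise of about 1.5 Å per residue and 3.6 residues per turn) with the helix lying along the full ~50 Å length of the groove, the groove would correspond to a helical segment of roughly:

```latex
n \approx \frac{L_{\text{groove}}}{h}
  = \frac{50\ \text{Å}}{1.5\ \text{Å residue}^{-1}}
  \approx 33\ \text{residues}
  \approx \frac{33}{3.6} \approx 9\ \text{turns}
```

This order-of-magnitude estimate only indicates that a partner helix of typical length would fit; it does not identify such a partner.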

Group III: Proteins with no detectable conserved protein domains.

The majority of proteins within this category are Staphylococcus-specific and frequently contain regions of low sequence complexity. For some proteins within this group, coiled-coil regions were predicted, usually spanning about 50 amino acids following the lipobox. No structural data are available for any of the proteins discussed here, possibly reflecting extended unstructured regions in at least some of them. Thus, careful evaluation of these proteins for well-structured regions, e.g. using limited proteolysis and/or circular dichroism spectroscopy, may be advisable prior to crystallization trials.

SACOL_RS02270

In S. aureus strain HG001, SACOL_RS02270 forms a bicistronic operon with a gene encoding a small protein of unknown function (Mäder et al., 2016). Transcriptional read-through from the upstream ahpCF operon, which encodes alkyl hydroperoxide reductase, was observed under certain growth conditions. Transcription of SACOL_RS02270 was strongly up-regulated in a cymR mutant (Soutourina et al., 2009): CymR is the master regulator of sulphur metabolism and has a parallel function as a thiol-based oxidation-sensing regulator of stress resistance and the oxidative stress response in S. aureus (Ji et al., 2012). However, no conserved cysteine residue was identified in the SACOL_RS02270 sequence, which suggests that SACOL_RS02270 does not have a direct role in oxidative stress sensing. The SACOL_RS02270 sequence is strongly conserved among orthologs, but no particular sequence motifs stand out. As with other lipoproteins discussed here, SACOL_RS02270 orthologs are relatively rich in charged amino acids. For example, the mature SACOL_RS02270 lipoprotein has 15% negatively (D/E) and 15% positively (K/R) charged residues (Fig. S13). Secondary structure predictions suggest an all-α-helical fold following a disordered N-terminal region of about 20 amino acids (data not shown).
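Charged-residue percentages such as those given for SACOL_RS02270 (and for SACOL_RS04375 above) are composition counts over the mature lipoprotein, i.e. the sequence that remains after signal peptide cleavage. A minimal sketch, assuming the position of the first mature residue (the lipobox cysteine) is supplied by the user, for example from a signal peptide predictor; the function name and the sequence are invented.

```python
def charge_composition(seq: str, mature_start: int = 1) -> dict:
    """Fractions of D/E, K/R and all charged residues in the mature protein.

    mature_start is the 1-based position of the first mature residue
    (for a lipoprotein, the lipobox cysteine); it must be supplied.
    His is counted as uncharged here, a common simplification.
    """
    mature = seq[mature_start - 1:].upper()
    if not mature:
        raise ValueError("empty mature sequence")
    n = len(mature)
    neg = sum(mature.count(a) for a in "DE")
    pos = sum(mature.count(a) for a in "KR")
    return {"D/E": neg / n, "K/R": pos / n, "charged": (neg + pos) / n}

# Invented example: a 4-residue "signal peptide" followed by a mature part starting at Cys
comp = charge_composition("MLKACKKDDEEAAQ", mature_start=5)
print({k: round(v, 2) for k, v in comp.items()})
# {'D/E': 0.4, 'K/R': 0.2, 'charged': 0.6}
```

Counting only the mature part matters because signal peptides have a biased composition (a positively charged n-region and a hydrophobic core) that would otherwise distort the percentages.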

SACOL_RS03785

Secondary structure prediction suggests a coiled-coil region between amino acids 20 and 77 of SACOL_RS03785. The sequence of SACOL_RS03785 is also particularly rich in charged amino acids, which are concentrated in the predicted coiled-coil region and the following 20 amino acids. Excluding the signal peptide, 29% of the residues are negatively charged, whilst 31% are positively charged. A strongly charged surface with positively and negatively charged areas might thus play an important role in SACOL_RS03785 function. One possibility could be the interaction with other strongly charged surface components such as lipoteichoic or wall teichoic acids. However, with a calculated length of about 40 Å for the SACOL_RS03785 monomer, the 57-amino-acid coiled-coil structure could not span the entire cell wall of staphylococci (20–40 nm). Thus, an interaction with wall teichoic acids might be restricted to their biogenesis, when they are still close to the membrane. A comparison of SACOL_RS03785 orthologs within the staphylococci shows high variation in sequence length and composition following the predicted coiled-coil region. This variable region is characterized by an over-representation of serine, glutamine and asparagine residues, suggesting that this part of the protein may be intrinsically disordered (Fig. S14), possibly reflecting a need to interact dynamically with species-specific target structures. In some Staphylococcus species, including Staphylococcus saprophyticus, Staphylococcus equorum and S. xylosus, the SACOL_RS03785 homolog appears to be truncated at the C-terminal end of the predicted coiled-coil structure. SACOL_RS03785 shows a significant chromosomal association with two acetyltransferases of the GNAT family, but these enzymes are cytosolic, and any post-translational modification (e.g. of conserved lysines) would have to occur prior to translocation across the membrane. The transcription pattern of SACOL_RS03785 closely follows transcription of divIVA, ltaS and vraFG over many different growth conditions (Mäder et al., 2016). DivIVA is an essential regulator of cell division in bacilli, although its function in the staphylococci is less well established (Pinho and Errington, 2004). LtaS is an essential lipoteichoic acid synthase (Gründling and Schneewind, 2007), and VraFG is an ABC transporter that senses cationic antimicrobial peptides (CAMPs) through the GraRS two-component system (Falord et al., 2012). GraRS in turn controls expression of genes that confer resistance against antimicrobial peptides by increasing the positive charge of the bacterial cell surface, through D-alanylation of teichoic acids and lysylination of phosphatidylglycerol, leading to electrostatic repulsion of CAMPs (Herbert et al., 2007). Together, these observations suggest a possible link between SACOL_RS03785 and cell envelope biogenesis or modification.
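Compositional biases such as the over-representation of serine, glutamine and asparagine in the variable C-terminal region can be located with a sliding window. This minimal sketch reports merged stretches in which the chosen residues make up at least a given fraction of each window; the function name, window size, threshold and example sequence are illustrative assumptions, and dedicated low-complexity or disorder predictors (e.g. SEG) use more elaborate criteria.

```python
from typing import List, Tuple

def enriched_regions(seq: str, residues: str, window: int = 20,
                     min_fraction: float = 0.5) -> List[Tuple[int, int]]:
    """Merged 1-based (start, end) stretches in which `residues` make up at
    least `min_fraction` of a sliding window of length `window`."""
    seq = seq.upper()
    target = set(residues.upper())
    regions: List[Tuple[int, int]] = []
    for i in range(len(seq) - window + 1):
        count = sum(1 for c in seq[i:i + window] if c in target)
        if count / window >= min_fraction:
            start, end = i + 1, i + window
            if regions and start <= regions[-1][1] + 1:
                # overlapping or adjacent window: extend the current stretch
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions

# Invented example: a 10-residue S/Q/N block (positions 31-40) in poly-Ala
seq = "A" * 30 + "SQNSQNSQNS" + "A" * 30
print(enriched_regions(seq, "SQN", window=10, min_fraction=0.8))  # [(29, 42)]
```

Because whole windows are reported, region boundaries can overhang the biased stretch by up to window − 1 residues; here they overhang by two residues on each side.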

SACOL_RS04130

The SACOL_RS04130 transcript in S. aureus strain HG001 starts with a long UTR overlapping the coding region of SACOL_RS04125 on the opposite strand. In addition, the SACOL_RS04130 transcript might extend to include SACOL_RS04135, encoding a small DUF2847 family protein of unknown function, and SACOL_RS04140, encoding a glycerate kinase family protein. However, no data are available at present to support a functional link between these proteins. Secondary structure prediction suggests that SACOL_RS04130 is an all-α-helical protein (data not shown). Two regions of the protein appear to be particularly well conserved. Region one, between residues 141 and 160, is characterized by three almost invariant aspartate residues, and region two, between residues 233 and 250, is flanked by two highly conserved arginine residues and features a tyrosine-rich YYXLXXYY motif (Fig. S15).
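Terms such as "almost invariant" or "highly conserved" describe how often a single residue occupies an alignment column across orthologs. A minimal sketch that reports, along a reference sequence, the positions whose column is dominated by one residue at or above a chosen identity threshold; the function name, threshold and toy alignment of four "orthologs" are illustrative assumptions.

```python
from collections import Counter
from typing import List, Tuple

def near_invariant_positions(alignment: List[str],
                             min_identity: float = 0.9) -> List[Tuple[int, str, float]]:
    """(1-based reference position, residue, fraction) for columns in which the
    most common residue reaches min_identity. Gaps count in the denominator,
    so a column with many gaps cannot look invariant."""
    ref = alignment[0]
    n_seq = len(alignment)
    out = []
    pos = 0
    for col, ref_res in enumerate(ref):
        if ref_res == "-":
            continue  # gap in the reference: no residue number
        pos += 1
        residue, count = Counter(seq[col] for seq in alignment).most_common(1)[0]
        fraction = count / n_seq
        if residue != "-" and fraction >= min_identity:
            out.append((pos, residue, fraction))
    return out

aln = ["MDAKD", "MDSKD", "MDAKE", "MDGRD"]
print(near_invariant_positions(aln, min_identity=0.75))
# [(1, 'M', 1.0), (2, 'D', 1.0), (4, 'K', 0.75), (5, 'D', 0.75)]
```

Lowering `min_identity` below 1.0 distinguishes "almost invariant" positions, such as the aspartates of region one, from strictly invariant ones.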

PT

SACOL_RS08095

SACOL_RS08095 is another lipoprotein for which a 50 amino acid coiled-coli region, located C-

CC E

terminal of the lipobox, is predicted with high confidence. Transcription data showed that SACOL_RS08095 is co-transcribed with SACOL_RS08100, a gene encoding a small protein of unknown function. Similar to SACOL_RS03785, the mature SACOL_RS08095 lipoprotein is relatively rich in positively- (21%) and negatively-charged (15%) amino acids. Particularly the lysine residues show a

A

high level of conservation among SACOL_RS08095 orthologs supporting some functional role for them. Further prominent sequence features are five highly conserved tyrosine residues. Two of these tyrosine residues are part of a strongly conserved SSXYY motif that is immediately downstream of the predicted coiled-coil region (Fig. S16). Secondary structure prediction suggests an all -helical protein (data not shown) with potentially disordered regions around residues 70 and 140. SACOL_RS12420

Coiled-coil prediction algorithms gave inconclusive results for SACOL_RS12420, but a large low-complexity region can be identified following the signal peptide. At its N-terminus, this low-complexity region is characterized by an accumulation of positively and negatively charged residues that are often arranged as Lys-Asp or Asp-Lys pairs. A similar clustering of lysine and aspartate residues was also observed in SACOL_RS03785, suggesting that both proteins may interact with related target structures. The charged region of about 48 amino acids in SACOL_RS12420 is followed by a 31-amino-acid sequence that consists almost exclusively of asparagine and glutamine residues. Comparison of SACOL_RS12420 orthologs encoded within the staphylococci revealed that, despite relaxed sequence conservation, an accumulation of consecutive positively and negatively charged residues immediately C-terminal of the signal peptide, followed by an asparagine- and glutamine-rich region, is a shared feature of this group of proteins. However, the length of the asparagine- and glutamine-rich low-complexity region varied considerably between species, ranging from 80 residues in S. aureus to 149 residues in S. xylosus. The particularly long regions usually had a significantly increased number of alanine and serine residues, suggesting that this part of the protein may be intrinsically disordered. The sequence following the low-complexity region is more consistently conserved, possibly pointing to a more defined structure of the protein's C-terminus (Fig. 4). This conclusion is supported by structure prediction, which indicates a high probability of a disordered region between amino acids 25 and 110 of SACOL_RS12420 but a well-ordered C-terminal domain. Consequently, structural analysis should focus on the C-terminal domain, because full-length constructs of the mature lipoprotein are unlikely to crystallize. The presence of a disordered, and thus highly flexible and elastic, N-terminal part suggests that SACOL_RS12420 might interact with a large and/or dynamic target, capturing it much as a squid captures prey with its tentacles: the charged residues would function as “suckers” and the ordered domain, which may have a catalytic function, as the “tentacular club”.
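The composition features described for SACOL_RS12420 (charged residues often arranged as Lys-Asp or Asp-Lys pairs, followed by an asparagine/glutamine-rich stretch) can be quantified with simple sequence statistics. The sketch below is a minimal illustration, not the procedure used in this study: the function names, the window size, the threshold and the choice of K/R and D/E as the charged residues (histidine is ignored) are all assumptions. Dedicated low-complexity filters such as SEG, or disorder predictors such as PrDOS, are better suited for real analyses.

```python
POSITIVE = frozenset("KR")  # assumption: His is not counted as positively charged
NEGATIVE = frozenset("DE")


def charge_fractions(seq: str) -> tuple[float, float]:
    """Fractions of positively (K, R) and negatively (D, E) charged residues."""
    seq = seq.upper()
    pos = sum(aa in POSITIVE for aa in seq)
    neg = sum(aa in NEGATIVE for aa in seq)
    return pos / len(seq), neg / len(seq)


def count_kd_pairs(seq: str) -> int:
    """Count adjacent Lys-Asp or Asp-Lys dipeptides (pairs may overlap)."""
    seq = seq.upper()
    return sum(seq[i:i + 2] in ("KD", "DK") for i in range(len(seq) - 1))


def enriched_regions(seq: str, residues: str, window: int = 15,
                     threshold: float = 0.6) -> list[tuple[int, int]]:
    """Merge all sliding windows in which the given residues make up at least
    `threshold` of the window; return 1-based inclusive (start, end) spans."""
    seq = seq.upper()
    targets = frozenset(residues.upper())
    spans: list[list[int]] = []  # 0-based, half-open [start, end)
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        if sum(aa in targets for aa in chunk) / window >= threshold:
            end = start + window
            if spans and start <= spans[-1][1]:
                spans[-1][1] = end  # touches or overlaps the previous span: extend it
            else:
                spans.append([start, end])
    return [(s + 1, e) for s, e in spans]


if __name__ == "__main__":
    # Hypothetical toy sequence: a 20-residue Asn run flanked by Ala.
    toy = "A" * 10 + "N" * 20 + "A" * 10
    print(enriched_regions(toy, "NQ", window=10))
```

For the toy sequence the script prints `[(7, 34)]`: the reported span overshoots the actual asparagine run (residues 11 to 30) by a few residues on each side, which is a typical side effect of window-based scans and one reason why the boundaries of low-complexity regions depend on the tool and parameters chosen.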

Lipoproteins of unknown function of the variable lipoproteome.


About 15% of the lipoproteins identified per genome were shared by only a subset of the investigated strains and thus belong to the variable lipoproteome of S. aureus. In many instances, these proteins showed a broad phylogenetic distribution, indicating that they are recent acquisitions originating from a variety of bacterial species. Examples include beta-lactamases, proteins with uncharacterized domains (e.g. DUF1541 and DUF1433) and a large number of proteins that display no conserved domain features (Fig. S1, Table S1). For these proteins, it remains to be determined whether they provide an advantage to S. aureus under specific environmental conditions or are rather non-functional hitchhikers, e.g. on mobile genetic elements, that will be lost during progressive strain evolution. A particularly unusual potential lipoprotein, present in only one of the S. aureus strains investigated here, is AUC50_RS00145. Proteins related to AUC50_RS00145 were also identified in other staphylococci, including several strains of S. hominis, S. capitis and particularly S. epidermidis. AUC50_RS00145 is characterized by a highly conserved 31-amino-acid sequence following the signal peptide that contains 24 serine residues, suggesting that this region of the lipoprotein is unstructured. Thus, the variable lipoproteome of S. aureus might not only attract the attention of the infection biologist but also holds intriguing examples for those interested in the function-structure relationships of intrinsically disordered proteins.

Conclusion

S. aureus devotes about 2-3% of its coding capacity to lipoproteins, corresponding to about 70 lipoproteins per strain. Of these, about 60 lipoproteins (85%) were identified in at least 95% of the 123 genome sequences investigated in this study. Almost half of these conserved lipoproteins are functionally uncharacterized or only poorly characterized, representing a large gap in our knowledge of S. aureus surface proteins.
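The split into a conserved and a variable lipoproteome rests on a presence threshold: a lipoprotein counts as conserved if an ortholog is found in at least 95% of the 123 genomes. A minimal sketch of that bookkeeping follows, assuming orthologs have already been grouped by some ortholog-detection step; the `presence` mapping, the group names and the genome identifiers are hypothetical. Integer arithmetic is used so that floating-point rounding cannot flip a group that sits exactly at the cutoff.

```python
def min_genomes_for_core(n_genomes: int, core_percent: int = 95) -> int:
    """Smallest number of genomes that satisfies the core cutoff, i.e.
    ceil(core_percent * n_genomes / 100) computed with integers only."""
    return (core_percent * n_genomes + 99) // 100


def classify_orthogroups(presence: dict[str, set[str]], n_genomes: int,
                         core_percent: int = 95) -> dict[str, str]:
    """Label each orthogroup 'core' if it is present in at least core_percent %
    of the genomes and 'variable' otherwise."""
    labels = {}
    for group, genomes in presence.items():
        is_core = 100 * len(genomes) >= core_percent * n_genomes
        labels[group] = "core" if is_core else "variable"
    return labels


if __name__ == "__main__":
    genomes = [f"genome_{i}" for i in range(123)]  # hypothetical identifiers
    presence = {
        "og_A": set(genomes[:117]),  # hypothetical orthogroup found in 117 genomes
        "og_B": set(genomes[:116]),  # hypothetical orthogroup found in 116 genomes
    }
    print(min_genomes_for_core(123))
    print(classify_orthogroups(presence, 123))
```

With 123 genomes the cutoff works out to 117 genomes (95% of 123 is 116.85), so the script prints `117` followed by `{'og_A': 'core', 'og_B': 'variable'}`. On the same basis, about 60 conserved out of about 70 lipoproteins per strain gives 60/70 ≈ 0.86, in line with the approximately 85% stated above.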

A large number of uncharacterized lipoproteins are transcribed in response to stress conditions that increase membrane permeability or interfere with various steps of peptidoglycan biosynthesis. In addition, transcription of the same lipoproteins is affected in strains carrying mutations in genes encoding regulators of cell wall biosynthesis and envelope stress responses. It will be important to investigate whether these effects are directly related to the function of the lipoproteins or are rather indirect effects that reflect changes in the physicochemical environment of the affected lipoproteins.

Some of the lipoproteins discussed here for which structural information is available share a common topology, suggesting that the PepSY proteins (4EXR), the Lpls (e.g. 4EGD) and the mycobacterial LprG (4ZRA) might have arisen from a common ancestor. In a first step, the duplication of a gene encoding a small protein with a 4-stranded antiparallel β-sheet as its main feature might have resulted in a two-domain protein in which the antiparallel directionality of the β-sheet was maintained. This 8-stranded β-sheet later served as a scaffold for the addition of further structural elements that were inserted into the loop regions linking the β-strands. Usually the β-sheet forms one side of the molecule, whilst the other structural elements are positioned on the opposite side. This fold might have proven particularly well suited to adopting a multitude of different functions at the membrane-cell wall interface. Structural comparison of the PepSY, Lpl and LprG proteins further suggests that the PepSY domain followed a route of evolution distinct from that of the Lpls and LprG, which might share a more closely related common ancestor. Clearly, more detailed sequence, structural and functional analyses will be required to verify the evolutionary origin of these proteins.

The inspection of molecular surfaces and conserved surface-exposed amino acids identified regions in many lipoproteins that are possibly involved in interactions with components of the staphylococcal cell envelope, small molecules or host structures. Some of these interactions might require high structural elasticity, as prominent disordered regions were predicted for several of the poorly characterized lipoproteins. This has to be taken into account if these proteins become the target of structural and functional investigations.


Funding

This work was supported by the Deutsche Forschungsgemeinschaft within the framework of SFB/Transregio 34 and GRK1870.

Conflict of interest

The authors declare that they have no conflicts of interest with the content of this article.


Acknowledgments:

We would like to thank Lukas Ratz for help with the analysis of Lpl sequences. This work was supported by the Deutsche Forschungsgemeinschaft within the framework of SFB/Transregio 34 and GRK1870.

References

Akira, S., 2003. Mammalian Toll-like receptors. Current Opinion in Immunology 15 (1), 5–11. Aliprantis, A.O., Yang, R.B., Mark, M.R., Suggett, S., Devaux, B., Radolf, J.D., Klimpel, G.R., Godowski, P., Zychlinsky, A., 1999. Cell activation and apoptosis by bacterial lipoproteins through toll-like receptor-2. Science (New York, N.Y.) 285 (5428), 736–739.


Anderson, K.L., Roberts, C., Disz, T., Vonstein, V., Hwang, K., Overbeek, R., Olson, P.D., Projan, S.J., Dunman, P.M., 2006. Characterization of the Staphylococcus aureus heat shock, cold shock, stringent, and SOS responses and their effects on log-phase mRNA turnover. J. Bacteriol. 188 (19), 6739–6756. Antoine, R., Huvent, I., Chemlal, K., Deray, I., Raze, D., Locht, C., Jacob-Dubuisson, F., 2005. The periplasmic binding protein of a tripartite tricarboxylate transporter is involved in signal transduction. Journal of Molecular Biology 351 (4), 799–809. Arnaouteli, S., Giastas, P., Andreou, A., Tzanodaskalaki, M., Aldridge, C., Tzartos, S.J., Vollmer, W., Eliopoulos, E., Bouriotis, V., 2015. Two Putative Polysaccharide Deacetylases Are Required for Osmotic Stability and Cell Shape Maintenance in Bacillus anthracis. The Journal of biological chemistry 290 (21), 13465–13478. Ashkenazy, H., Abadi, S., Martz, E., Chay, O., Mayrose, I., Pupko, T., Ben-Tal, N., 2016. ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules. Nucleic acids research 44 (W1), W344-50. Balibar, C.J., Shen, X., McGuire, D., Yu, D., McKenney, D., Tao, J., 2010. cwrA, a gene that specifically responds to cell wall damage in Staphylococcus aureus. Microbiology (Reading, England) 156 (Pt 5), 1372–1383. Bateman, A., Rawlings, N.D., 2003. The CHAP domain: a large family of amidases including GSP amidase and peptidoglycan hydrolases. Trends in biochemical sciences 28 (5), 234–237. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E., 2000. The Protein Data Bank. Nucleic acids research 28 (1), 235–242. Bischoff, M., Dunman, P., Kormanec, J., Macapagal, D., Murphy, E., Mounts, W., Berger-Bächi, B., Projan, S., 2004. Microarray-based analysis of the Staphylococcus aureus sigmaB regulon. J. Bacteriol. 186 (13), 4085–4099. 
Boneca, I.G., Dussurget, O., Cabanes, D., Nahori, M.-A., Sousa, S., Lecuit, M., Psylinakis, E., Bouriotis, V., Hugot, J.-P., Giovannini, M., Coyle, A., Bertin, J., Namane, A., Rousselle, J.-C., Cayet, N., Prévost, M.-C., Balloy, V., Chignard, M., Philpott, D.J., Cossart, P., Girardin, S.E., 2007. A critical role for peptidoglycan N-deacetylation in Listeria evasion from the host innate immune system. Proceedings of the National Academy of Sciences of the United States of America 104 (3), 997–1002. Braun, P., Bitter, W., Tommassen, J., 2000. Activation of Pseudomonas aeruginosa elastase in Pseudomonas putida by triggering dissociation of the propeptide-enzyme complex. Microbiology (Reading, England) 146 (Pt 10), 2565–2572. Buddelmeijer, N., 2015. The molecular mechanism of bacterial lipoprotein modification--how, when and why? FEMS microbiology reviews 39 (2), 246–261. Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., Madden, T.L., 2009. BLAST+: architecture and applications. BMC bioinformatics 10, 421. Cheung, G.Y.C., Wang, R., Khan, B.A., Sturdevant, D.E., Otto, M., 2011. Role of the accessory gene regulator agr in community-associated methicillin-resistant Staphylococcus aureus pathogenesis. Infection and immunity 79 (5), 1927–1935. Delauné, A., Dubrac, S., Blanchet, C., Poupel, O., Mäder, U., Hiron, A., Leduc, A., Fitting, C., Nicolas, P., Cavaillon, J.-M., Adib-Conquy, M., Msadek, T., 2012. The WalKR system controls major staphylococcal virulence genes and is involved in triggering the host inflammatory response. Infection and immunity 80 (10), 3438–3453.


Delgado, A., Zaman, S., Muthaiyan, A., Nagarajan, V., Elasri, M.O., Wilkinson, B.J., Gustafson, J.E., 2008. The fusidic acid stimulon of Staphylococcus aureus. The Journal of antimicrobial chemotherapy 62 (6), 1207–1214. Delorenzi, M., Speed, T., 2002. An HMM model for coiled-coil domains and a comparison with PSSM-based predictions. Bioinformatics 18 (4), 617–625. Diep, B.A., Gill, S.R., Chang, R.F., Phan, T.H., Chen, J.H., Davidson, M.G., Lin, F., Lin, J., Carleton, H.A., Mongodin, E.F., Sensabaugh, G.F., Perdreau-Remington, F., 2006. Complete genome sequence of USA300, an epidemic clone of community-acquired meticillin-resistant Staphylococcus aureus. Lancet (London, England) 367 (9512), 731–739. Drage, M.G., Tsai, H.-C., Pecora, N.D., Cheng, T.-Y., Arida, A.R., Shukla, S., Rojas, R.E., Seshadri, C., Moody, D.B., Boom, W.H., Sacchettini, J.C., Harding, C.V., 2010. Mycobacterium tuberculosis lipoprotein LprG (Rv1411c) binds triacylated glycolipid agonists of Toll-like receptor 2. Nature structural & molecular biology 17 (9), 1088–1095. Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic acids research 32 (5), 1792–1797. Falord, M., Karimova, G., Hiron, A., Msadek, T., 2012. GraXSR proteins interact with the VraFG ABC transporter to form a five-component system required for cationic antimicrobial peptide sensing and resistance in Staphylococcus aureus. Antimicrobial agents and chemotherapy 56 (2), 1047–1058. Flower, D.R., North, A.C., Sansom, C.E., 2000. The lipocalin protein family: structural and sequence overview. Biochimica et biophysica acta 1482 (1-2), 9–24. Goosens, V.J., van Dijl, J.M., 2017. Twin-Arginine Protein Translocation, in: Bagnoli, F., Rappuoli, R. (Eds.), Protein and Sugar Export and Assembly in Gram-positive Bacteria. Springer International Publishing, Cham, pp. 69–94. Gründling, A., Schneewind, O., 2007. Synthesis of glycerol phosphate lipoteichoic acid in Staphylococcus aureus.
Proceedings of the National Academy of Sciences of the United States of America 104 (20), 8478–8483. Herbert, S., Bera, A., Nerz, C., Kraus, D., Peschel, A., Goerke, C., Meehl, M., Cheung, A., Götz, F., 2007. Molecular basis of resistance to muramidase and cationic antimicrobial peptide activity of lysozyme in staphylococci. PLoS pathogens 3 (7), e102. Hewitt, L., Kasche, V., Lummer, K., Lewis, R.J., Murshudov, G.N., Verma, C.S., Dodson, G.G., Wilson, K.S., 2000. Structure of a slow processing precursor penicillin acylase from Escherichia coli reveals the linker peptide blocking the active-site cleft. Journal of Molecular Biology 302 (4), 887–898. Hussain, M., Ichihara, S., Mizushima, S., 1982. Mechanism of signal peptide cleavage in the biosynthesis of the major lipoprotein of the Escherichia coli outer membrane. The Journal of biological chemistry 257 (9), 5177–5182. Ishida, T., Kinoshita, K., 2007. PrDOS: prediction of disordered protein regions from amino acid sequence. Nucleic acids research 35 (Web Server issue), W460-4. Ji, Q., Zhang, L., Sun, F., Deng, X., Liang, H., Bae, T., He, C., 2012. Staphylococcus aureus CymR is a new thiol-based oxidation-sensing regulator of stress resistance and oxidative response. The Journal of biological chemistry 287 (25), 21102–21109. Käll, L., Krogh, A., Sonnhammer, E.L.L., 2007. Advantages of combined transmembrane topology and signal peptide prediction--the Phobius web server. Nucleic acids research 35 (Web Server issue), W429-32. Kumari, N., Götz, F., Nguyen, M.-T., 2017. Aspartate tightens the anchoring of staphylococcal lipoproteins to the cytoplasmic membrane. MicrobiologyOpen.


Kurokawa, K., Kim, M.-S., Ichikawa, R., Ryu, K.-H., Dohmae, N., Nakayama, H., Lee, B.L., 2012. Environment-mediated accumulation of diacyl lipoproteins over their triacyl counterparts in Staphylococcus aureus. J. Bacteriol. 194 (13), 3299–3306. Kurokawa, K., Lee, H., Roh, K.-B., Asanuma, M., Kim, Y.S., Nakayama, H., Shiratsuchi, A., Choi, Y., Takeuchi, O., Kang, H.J., Dohmae, N., Nakanishi, Y., Akira, S., Sekimizu, K., Lee, B.L., 2009. The Triacylated ATP Binding Cluster Transporter Substrate-binding Lipoprotein of Staphylococcus aureus Functions as a Native Ligand for Toll-like Receptor 2. The Journal of biological chemistry 284 (13), 8406–8411. Kyte, J., Doolittle, R.F., 1982. A simple method for displaying the hydropathic character of a protein. Journal of Molecular Biology 157 (1), 105–132. Lambert, C., Lerner, T.R., Bui, N.K., Somers, H., Aizawa, S.-I., Liddell, S., Clark, A., Vollmer, W., Lovering, A.L., Sockett, R.E., 2016. Interrupting peptidoglycan deacetylation during Bdellovibrio predator-prey interaction prevents ultimate destruction of prey wall, liberating bacterial-ghosts. Scientific reports 6, 26010. Lechner, M., Findeiss, S., Steiner, L., Marz, M., Stadler, P.F., Prohaska, S.J., 2011. Proteinortho: detection of (co-)orthologs in large-scale analysis. BMC bioinformatics 12, 124. Letunic, I., Doerks, T., Bork, P., 2015. SMART: recent updates, new developments and status in 2015. Nucleic acids research 43 (Database issue), D257-60. Lupas, A., van Dyke, M., Stock, J., 1991. Predicting coiled coils from protein sequences. Science (New York, N.Y.) 252 (5009), 1162–1164. Mäder, U., Nicolas, P., Depke, M., Pané-Farré, J., Debarbouille, M., van der Kooi-Pol, M.M., Guérin, C., Dérozier, S., Hiron, A., Jarmer, H., Leduc, A., Michalik, S., Reilman, E., Schaffer, M., Schmidt, F., Bessières, P., Noirot, P., Hecker, M., Msadek, T., Völker, U., van Dijl, J.M., 2016. 
Staphylococcus aureus Transcriptome Architecture: From Laboratory to Infection-Mimicking Conditions. PLoS genetics 12 (4), e1005962. Majerczyk, C.D., Dunman, P.M., Luong, T.T., Lee, C.Y., Sadykov, M.R., Somerville, G.A., Bodi, K., Sonenshein, A.L., 2010. Direct targets of CodY in Staphylococcus aureus. J. Bacteriol. 192 (11), 2861–2877. Målen, H., Berven, F.S., Søfteland, T., Arntzen, M.Ø., D'Santos, C.S., Souza, G.A. de, Wiker, H.G., 2008. Membrane and membrane-associated proteins in Triton X-114 extracts of Mycobacterium bovis BCG identified using a combination of gel-based and gel-free fractionation strategies. Proteomics 8 (9), 1859–1870. Martinot, A.J., Farrow, M., Bai, L., Layre, E., Cheng, T.-Y., Tsai, J.H., Iqbal, J., Annand, J.W., Sullivan, Z.A., Hussain, M.M., Sacchettini, J., Moody, D.B., Seeliger, J.C., Rubin, E.J., 2016. Mycobacterial Metabolic Syndrome: LprG and Rv1410 Regulate Triacylglyceride Levels, Growth Rate and Virulence in Mycobacterium tuberculosis. PLoS pathogens 12 (1), e1005351. Moriyama, R., Kudoh, S., Miyata, S., Nonobe, S., Hattori, A., Makino, S., 1996. A germination-specific spore cortex-lytic enzyme from Bacillus cereus spores: Cloning and sequencing of the gene and molecular characterization of the enzyme. Journal of bacteriology 178 (17), 5330–5332. Muthaiyan, A., Silverman, J.A., Jayaswal, R.K., Wilkinson, B.J., 2008. Transcriptional profiling reveals that daptomycin induces the Staphylococcus aureus cell wall stress stimulon and genes responsive to membrane depolarization. Antimicrobial agents and chemotherapy 52 (3), 980–990. Nguyen, M.T., Götz, F., 2016a. Lipoproteins of Gram-Positive Bacteria: Key Players in the Immune Response and Virulence. Microbiology and molecular biology reviews : MMBR 80 (3), 891–903.


Nguyen, M.T., Götz, F., 2016b. Lipoproteins of Gram-Positive Bacteria: Key Players in the Immune Response and Virulence. Microbiology and Molecular Biology Reviews 80 (3), 891–903. Nguyen, M.T., Kraft, B., Yu, W., Demircioglu, D.D., Demicrioglu, D.D., Hertlein, T., Burian, M., Schmaler, M., Boller, K., Bekeredjian-Ding, I., Ohlsen, K., Schittek, B., Götz, F., 2015. The νSaα Specific Lipoprotein Like Cluster (lpl) of S. aureus USA300 Contributes to Immune Stimulation and Invasion in Human Cells. PLoS pathogens 11 (6), e1004984. Nguyen, M.-T., Deplanche, M., Nega, M., Le Loir, Y., Peisl, L., Götz, F., Berkova, N., 2016. Staphylococcus aureus Lpl Lipoproteins Delay G2/M Phase Transition in HeLa Cells. Frontiers in cellular and infection microbiology 6, 201. O'Leary, N.A., Wright, M.W., Brister, J.R., Ciufo, S., Haddad, D., McVeigh, R., Rajput, B., Robbertse, B., Smith-White, B., Ako-Adjei, D., Astashyn, A., Badretdin, A., Bao, Y., Blinkova, O., Brover, V., Chetvernin, V., Choi, J., Cox, E., Ermolaeva, O., Farrell, C.M., Goldfarb, T., Gupta, T., Haft, D., Hatcher, E., Hlavina, W., Joardar, V.S., Kodali, V.K., Li, W., Maglott, D., Masterson, P., McGarvey, K.M., Murphy, M.R., O'Neill, K., Pujar, S., Rangwala, S.H., Rausch, D., Riddick, L.D., Schoch, C., Shkeda, A., Storz, S.S., Sun, H., Thibaud-Nissen, F., Tolstoy, I., Tully, R.E., Vatsan, A.R., Wallin, C., Webb, D., Wu, W., Landrum, M.J., Kimchi, A., Tatusova, T., DiCuccio, M., Kitts, P., Murphy, T.D., Pruitt, K.D., 2016. Reference sequence (RefSeq) database at NCBI: Current status, taxonomic expansion, and functional annotation. Nucleic acids research 44 (D1), D733-45. Pané-Farré, J., Jonas, B., Förstner, K., Engelmann, S., Hecker, M., 2006. The sigmaB regulon in Staphylococcus aureus and its regulation. International journal of medical microbiology : IJMM 296 (4-5), 237–258.
Parker, M.L., Houston, S., Pětrošová, H., Lithgow, K.V., Hof, R., Wetherell, C., Kao, W.-C., Lin, Y.-P., Moriarty, T.J., Ebady, R., Cameron, C.E., Boulanger, M.J., 2016. The Structure of Treponema pallidum Tp0751 (Pallilysin) Reveals a Non-canonical Lipocalin Fold That Mediates Adhesion to Extracellular Matrix Components and Interactions with Host Cells. PLoS pathogens 12 (9), e1005919. Petersen, T.N., Brunak, S., Heijne, G. von, Nielsen, H., 2011. SignalP 4.0: discriminating signal peptides from transmembrane regions. Nature methods 8 (10), 785–786. Pettersen, E.F., Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M., Meng, E.C., Ferrin, T.E., 2004. UCSF Chimera--a visualization system for exploratory research and analysis. Journal of computational chemistry 25 (13), 1605–1612. Pinho, M.G., Errington, J., 2004. A divIVA null mutant of Staphylococcus aureus undergoes normal cell division. FEMS microbiology letters 240 (2), 145–149. Ponstingl, H., Henrick, K., Thornton, J.M., 2000. Discriminating between homodimeric and monomeric proteins in the crystalline state. Proteins 41 (1), 47–57. Rahman, O., Cummings, S.P., Harrington, D.J., Sutcliffe, I.C., 2008. Methods for the bioinformatic identification of bacterial lipoproteins encoded in the genomes of Gram-positive bacteria. World Journal of Microbiology and Biotechnology 24 (11), 2377. Robert, X., Gouet, P., 2014. Deciphering key features in protein structures with the new ENDscript server. Nucleic acids research 42 (Web Server issue), W320-4. Saïd-Salim, B., Dunman, P.M., McAleese, F.M., Macapagal, D., Murphy, E., McNamara, P.J., Arvidson, S., Foster, T.J., Projan, S.J., Kreiswirth, B.N., 2003. Global regulation of Staphylococcus aureus genes by Rot. J. Bacteriol. 185 (2), 610–619. Sass, P., Jansen, A., Szekat, C., Sass, V., Sahl, H.-G., Bierbaum, G., 2008. The lantibiotic mersacidin is a strong inducer of the cell wall stress response of Staphylococcus aureus. BMC microbiology 8, 186.


Schluepen, C., Malito, E., Marongiu, A., Schirle, M., McWhinnie, E., Lo Surdo, P., Biancucci, M., Falugi, F., Nardi-Dei, V., Marchi, S., Fontana, M.R., Lombardi, B., Falco, M.G. de, Rinaudo, C.D., Spraggon, G., Nissum, M., Bagnoli, F., Grandi, G., Bottomley, M.J., Liberatori, S., 2013. Mining the bacterial unknown proteome: identification and characterization of a novel family of highly conserved protective antigens in Staphylococcus aureus. The Biochemical journal 455 (3), 273–284. Schmaler, M., Jann, N.J., Ferracin, F., Landolt, L.Z., Biswas, L., Götz, F., Landmann, R., 2009. Lipoproteins in Staphylococcus aureus mediate inflammation by TLR2 and iron-dependent growth in vivo. Journal of immunology (Baltimore, Md. : 1950) 182 (11), 7110–7118. Schmaler, M., Jann, N.J., Götz, F., Landmann, R., 2010. Staphylococcal lipoproteins and their role in bacterial survival in mice. International journal of medical microbiology : IJMM 300 (2-3), 155–160. Shahmirzadi, S.V., Nguyen, M.-T., Götz, F., 2016. Evaluation of Staphylococcus aureus Lipoproteins: Role in Nutritional Acquisition and Pathogenicity. Frontiers in microbiology 7, 1404. Sheldon, J.R., Heinrichs, D.E., 2012. The iron-regulated staphylococcal lipoproteins. Frontiers in cellular and infection microbiology 2, 41. Shukla, S., Richardson, E.T., Athman, J.J., Shi, L., Wearsch, P.A., McDonald, D., Banaei, N., Boom, W.H., Jackson, M., Harding, C.V., 2014. Mycobacterium tuberculosis lipoprotein LprG binds lipoarabinomannan and determines its cell envelope localization to control phagolysosomal fusion. PLoS pathogens 10 (10), e1004471. Sievers, F., Wilm, A., Dineen, D., Gibson, T.J., Karplus, K., Li, W., Lopez, R., McWilliam, H., Remmert, M., Söding, J., Thompson, J.D., Higgins, D.G., 2011. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Molecular systems biology 7, 539. Sigrist, C.J.A., Castro, E. de, Cerutti, L., Cuche, B.A., Hulo, N., Bridge, A., Bougueleret, L., Xenarios, I., 2013. New and continuing developments at PROSITE. Nucleic acids research 41 (Database issue), D344-7. Song, Y., Lunde, C.S., Benton, B.M., Wilkinson, B.J., 2012. Further insights into the mode of action of the lipoglycopeptide telavancin through global gene expression studies. Antimicrobial agents and chemotherapy 56 (6), 3157–3164. Soutourina, O., Poupel, O., Coppée, J.-Y., Danchin, A., Msadek, T., Martin-Verstraete, I., 2009. CymR, the master regulator of cysteine metabolism in Staphylococcus aureus, controls host sulphur source utilization and plays a role in biofilm formation. Molecular microbiology 73 (2), 194–211. Stivala, A., Wybrow, M., Wirth, A., Whisstock, J.C., Stuckey, P.J., 2011. Automatic generation of protein structure cartoons with Pro-origami. Bioinformatics (Oxford, England) 27 (23), 3315–3316. Stoll, H., Dengjel, J., Nerz, C., Götz, F., 2005. Staphylococcus aureus deficient in lipidation of prelipoproteins is attenuated in growth and immune activation. Infection and immunity 73 (4), 2411–2423. Sully, E.K., Malachowa, N., Elmore, B.O., Alexander, S.M., Femling, J.K., Gray, B.M., DeLeo, F.R., Otto, M., Cheung, A.L., Edwards, B.S., Sklar, L.A., Horswill, A.R., Hall, P.R., Gresham, H.D., 2014. Selective chemical inhibition of agr quorum sensing in Staphylococcus aureus promotes host defense with minimal impact on resistance. PLoS pathogens 10 (6), e1004174. Sulzenbacher, G., Canaan, S., Bordat, Y., Neyrolles, O., Stadthagen, G., Roig-Zamboni, V., Rauzier, J., Maurin, D., Laval, F., Daffé, M., Cambillau, C., Gicquel, B., Bourne, Y., Jackson, M., 2006. LppX is a lipoprotein required for the translocation of phthiocerol dimycocerosates to the surface of Mycobacterium tuberculosis. The EMBO journal 25 (7), 1436–1444.


Sutcliffe, I.C., Harrington, D.J., 2002. Pattern searches for the identification of putative lipoprotein genes in Gram-positive bacterial genomes. Microbiology (Reading, England) 148 (Pt 7), 2065–2077. Sutcliffe, I.C., Harrington, D.J., 2004. Putative lipoproteins of Streptococcus agalactiae identified by bioinformatic genome analysis. Antonie van Leeuwenhoek 85 (4), 305–315. Szewczyk, J., Collet, J.-F., 2016. The Journey of Lipoproteins Through the Cell: One Birthplace, Multiple Destinations. Advances in microbial physiology 69, 1–50. Tang, B., Nirasawa, S., Kitaoka, M., Marie-Claire, C., Hayashi, K., 2003. General function of N-terminal propeptide on assisting protein folding and inhibiting catalytic activity based on observations with a chimeric thermolysin-like protease. Biochemical and biophysical research communications 301 (4), 1093–1098. Tetsch, L., Jung, K., 2009. The regulatory interplay between membrane-integrated sensors and transport proteins in bacteria. Molecular microbiology 73 (6), 982–991. Tokunaga, M., Tokunaga, H., Wu, H.C., 1982. Post-translational modification and processing of Escherichia coli prolipoprotein in vitro. Proceedings of the National Academy of Sciences of the United States of America 79 (7), 2255–2259. Tschierske, M., Mori, C., Rohrer, S., Ehlert, K., Shaw, K.J., Berger-Bächi, B., 1999. Identification of three additional femAB-like open reading frames in Staphylococcus aureus. FEMS microbiology letters 171 (2), 97–102. Tsuru, T., Kobayashi, I., 2008. Multiple genome comparison within a bacterial species reveals a unit of evolution spanning two adjacent genes in a tandem paralog cluster. Molecular biology and evolution 25 (11), 2457–2473. Turk, V., Stoka, V., Turk, D., 2008. Cystatins: biochemical and structural properties, and medical relevance. Frontiers in bioscience : a journal and virtual library 13, 5406–5420. Tzanis, A., Dalton, K.A., Hesketh, A., den Hengst, C.D., Buttner, M.J., Thibessard, A., Kelemen, G.H., 2014. A sporulation-specific, sigF-dependent protein, SspA, affects septum positioning in Streptomyces coelicolor. Molecular microbiology 91 (2), 363–380. Üstok, F.I., Chirgadze, D.Y., Christie, G., 2015. Crystal structure of the PepSY-containing domain of the YpeB protein involved in germination of bacillus spores. Proteins 83 (10), 1914–1921. Utaida, S., Dunman, P.M., Macapagal, D., Murphy, E., Projan, S.J., Singh, V.K., Jayaswal, R.K., Wilkinson, B.J., 2003. Genome-wide transcriptional profiling of the response of Staphylococcus aureus to cell-wall-active antibiotics reveals a cell-wall-stress stimulon. Microbiology (Reading, England) 149 (Pt 10), 2719–2732. Vollmer, W., Tomasz, A., 2000. The pgdA gene encodes for a peptidoglycan N-acetylglucosamine deacetylase in Streptococcus pneumoniae. The Journal of biological chemistry 275 (27), 20496–20501. Vu, C.H., Kolata, J., Stentzel, S., Beyer, A., Gesell Salazar, M., Steil, L., Pane-Farre, J., Ruhmling, V., Engelmann, S., Gotz, F., van Dijl, J.M., Hecker, M., Mader, U., Schmidt, F., Volker, U., Broker, B.M., 2016. Adaptive immune response to lipoproteins of Staphylococcus aureus in healthy subjects. Proteomics. Vuong, C., Kocianova, S., Voyich, J.M., Yao, Y., Fischer, E.R., DeLeo, F.R., Otto, M., 2004. A crucial role for exopolysaccharide modification in bacterial biofilm formation, immune evasion, and virulence. The Journal of biological chemistry 279 (52), 54881–54886. Wolf, E., Kim, P.S., Berger, B., 1997. MultiCoil: a program for predicting two- and three-stranded coiled coils. Protein science : a publication of the Protein Society 6 (6), 1179–1189.


Wu, Y., Punta, M., Xiao, R., Acton, T.B., Sathyamoorthy, B., Dey, F., Fischer, M., Skerra, A., Rost, B., Montelione, G.T., Szyperski, T., 2012. NMR structure of lipoprotein YxeF from Bacillus subtilis reveals a calycin fold and distant homology with the lipocalin Blc from Escherichia coli. PloS one 7 (6), e37404. Yang, J., Yan, R., Roy, A., Xu, D., Poisson, J., Zhang, Y., 2015. The I-TASSER Suite: protein structure and function prediction. Nature methods 12 (1), 7–8. Yeats, C., Rawlings, N.D., Bateman, A., 2004. The PepSY domain: a regulator of peptidase activity in the microbial environment? Trends in biochemical sciences 29 (4), 169–172. Zhang, D., Souza, R.F. de, Anantharaman, V., Iyer, L.M., Aravind, L., 2012. Polymorphic toxin systems: Comprehensive characterization of trafficking modes, processing, mechanisms of action, immunity and ecology using comparative genomics. Biology direct 7, 18. Zhang, Y., Gardina, P.J., Kuebler, A.S., Kang, H.S., Christopher, J.A., Manson, M.D., 1999. Model of maltose-binding protein/chemoreceptor complex supports intrasubunit signaling mechanism. Proceedings of the National Academy of Sciences of the United States of America 96 (3), 939–944.

Legends to figures:

Figure 1: Protein domains and domain organization of uncharacterized lipoproteins. Blue boxes indicate the location and size of identified conserved protein domains. The phylogenetic distribution is summarized in the table on the right, and the sequence of the lipobox is shown on the left.

Figure 2: Structure prediction and sequence conservation of the PepSY domain protein SACOL_RS02240. (A) Superposition and sequence alignment of the PepSY1 (dark blue) and PepSY2 (cyan) domains of CD630_1622 from C. difficile (PDB: 4EXR). (B) Alignment of the CD630_1622 crystal structure (dark blue) and the predicted structure of SACOL_RS02240 (cyan). Only residues of the SACOL_RS02240 polypeptide chain that have an equivalent in the CD630_1622 structure are shown; the TM-score and r.m.s.d. of the SACOL_RS02240 model relative to 4EXR are 0.732 and 1.1 Å, respectively. (C) Sequence alignment of the CD630_1622 PepSY1 and PepSY2 domains, with the secondary structure elements of PepSY1 sketched above. (D) Amino acid conservation of SACOL_RS02240 orthologs projected onto the modeled SACOL_RS02240 structure. Conservation scores were calculated using ConSurf. Coloring reflects the level of conservation (blue = low, red = high). Highly conserved surface-exposed residues are shown as side chains and labeled according to the SACOL_RS02240 amino acid sequence. Unless otherwise stated, the ConSurf (Ashkenazy et al., 2016) analysis and structural superposition (using the MatchMaker tool) were generated with Chimera (Pettersen et al., 2004). The annotated alignment in this figure and others was generated with the ESPript tool (Robert and Gouet, 2014).
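The r.m.s.d. and TM-score quoted for Figure 2B summarize how closely the SACOL_RS02240 model matches the 4EXR crystal structure. As a hedged illustration only, and not the procedure used by Chimera/MatchMaker or I-TASSER, the following Python sketch computes both values for equivalent residue pairs (e.g. Cα atoms) that are assumed to be already superposed. The function names and toy coordinates are hypothetical; the d0 term is the standard TM-score length normalization, clamped to 0.5 Å for very short chains (an assumed convention).

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation between paired, already-superposed atoms."""
    if len(model) != len(reference) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(model, reference))
    return math.sqrt(squared / len(model))


def tm_score(model: Sequence[Coord], reference: Sequence[Coord],
             target_length: int) -> float:
    """TM-score of one fixed superposition, normalized by the target length.

    Only equivalent (aligned) residue pairs are passed in; unaligned target
    residues contribute zero. Dedicated tools also search for the
    superposition that maximizes this value, so this is a lower bound.
    """
    if len(model) != len(reference):
        raise ValueError("coordinate lists must be paired")
    # Length-dependent distance scale; only defined for chains longer than 15.
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8 if target_length > 15 else 0.5
    d0 = max(d0, 0.5)  # assumed lower bound for short chains
    total = sum(1.0 / (1.0 + (math.dist(a, b) / d0) ** 2)
                for a, b in zip(model, reference))
    return total / target_length


if __name__ == "__main__":
    # Hypothetical toy coordinates: the model is shifted 1 Å along z.
    ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    mod = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
    print(rmsd(mod, ref), tm_score(mod, ref, target_length=3))
```

Unlike the r.m.s.d., the TM-score is divided by the full target length, so residues without an equivalent in the model lower the score; this is why both values are usually reported together, as in Figure 2B.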

Figure 3: The Lpl fold is highly conserved among members of the Lpl family. (A) Superposition of available Lpl structures: 4EGD (blue), 4BIH (cyan) and 4BIG (gold). The r.m.s.d. values between the Lpl structures are 0.8 ± 0.05 Å. (B) Sequence alignment of crystallized Lpl proteins, with the secondary structure elements of 4EGD drawn above the alignment. (C) Conservation scores of surface-exposed amino acids were calculated with ConSurf, using the same color scheme as in Figure 2.

Figure 4: Amino acid conservation of SACOL_RS12420 orthologs. (A) The asterisk indicates the conserved cysteine residue of the lipobox. A grey bar underlines the sequence corresponding to SACOL_RS12420. The region showing a significant accumulation of consecutive charged amino acids is underlined in blue. The dashed green line marks the asparagine-, glutamine- and serine-rich part of the low-complexity region, which differs considerably in length among SACOL_RS12420 orthologs from different staphylococcal species. The C-terminal domain, which is probably well structured, is indicated by a solid green line. Large gaps introduced into the alignment by length differences in the disordered region among SACOL_RS12420 orthologs were excluded from the WebLogo. (B) Prediction of disorder probability using PrDOS: dotted line, disorder probability; red line, threshold (false positive rate set to 5%) (Ishida and Kinoshita, 2007).
