Functional assignment for essential hypothetical proteins of Staphylococcus aureus N315

Accepted Manuscript

Title: Functional assignment for essential hypothetical proteins of Staphylococcus aureus N315

Authors: Jyoti Prava, Pranavathiyani G, Archana Pan

PII: S0141-8130(17)32061-5

DOI: https://doi.org/10.1016/j.ijbiomac.2017.10.169

Reference: BIOMAC 8464

To appear in: International Journal of Biological Macromolecules

Received date: 23-6-2017

Revised date: 26-9-2017

Accepted date: 26-10-2017

Please cite this article as: Jyoti Prava, Pranavathiyani G, Archana Pan, Functional assignment for essential hypothetical proteins of Staphylococcus aureus N315, International Journal of Biological Macromolecules https://doi.org/10.1016/j.ijbiomac.2017.10.169 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Functional assignment for essential hypothetical proteins of Staphylococcus aureus N315

Jyoti Prava, Pranavathiyani G and Archana Pan* Centre for Bioinformatics, Pondicherry University, Pondicherry, India 605014.

*Corresponding Author: Archana Pan Centre for Bioinformatics Pondicherry University Pondicherry-605014, India Tel: +91-0413-2654584 Email: [email protected]

Abstract

Staphylococcus aureus, the causative agent of nosocomial infections worldwide, has acquired resistance to almost all antibiotics, stressing the need to develop novel drugs against this pathogen. In S. aureus N315, 302 genes have been identified as essential genes, indispensable for the growth and survival of the pathogen. The functions of 40 proteins encoded by S. aureus essential genes are unknown, and these proteins are therefore referred to as essential hypothetical proteins (EHPs). The present study aims to functionally characterize the EHPs using bioinformatics tools and databases, whose performance was assessed by receiver operating characteristic (ROC) curve analysis. Evaluation of physicochemical parameters, homology search against known proteins, domain analysis, subcellular localization analysis and virulence prediction assisted us in characterizing the EHPs. Functional assignment was made with high confidence for 35 EHPs. They belong to different functional classes, such as enzymes, binding proteins, miscellaneous proteins, helicases, transporters and virulence factors. Around 35% of the EHPs belong to the hydrolase family. A group of EHPs (32.5%) were predicted as virulence factors. Of the 35 annotated EHPs, 19 essential pathogen-specific proteins were considered probable drug targets. Two targets were found to be druggable and the others were novel targets. The outcome of the study could aid in identifying novel drugs for better treatment of S. aureus infections.

Keywords: Staphylococcus aureus, Essential hypothetical proteins, Functional annotation, Drug targets

Introduction

Staphylococcus aureus is a Gram-positive bacterium belonging to the phylum Firmicutes. The organism is the etiological agent of a wide variety of human diseases, ranging from mild skin infections and food poisoning to severe, life-threatening pneumonia, meningitis, endocarditis, osteomyelitis, sepsis and toxic shock syndrome. According to the World Health Organization, S. aureus infections pose a greater threat to mankind than cancer. Over the past decades, the incidence of S. aureus infections has increased dramatically owing to the emergence of antibiotic-resistant strains, such as methicillin-resistant S. aureus (MRSA) [1] and vancomycin-resistant S. aureus (VRSA) [2]. In particular, MRSA has overcome almost all antibiotics currently available on the market and is considered the major cause of nosocomial infections worldwide [1]. This scenario emphasizes the need to develop new drugs for preventing and controlling such infections.

A small fraction of the genes in a genome are absolutely necessary for the growth and survival of an organism and are thus considered the foundation of life. These genes are termed essential genes, and the proteins encoded by them are referred to as essential proteins. Theoretically, all essential genes/proteins in a genome are potential drug targets, as deletion or inactivation of such genes/proteins is lethal for the organism. Thus, the prediction of gene essentiality in a pathogenic microorganism could help to shortlist potential drug targets for designing antimicrobial agents. Understanding the function of essential genes/proteins is a necessary step towards exploring the basic principles of cell functionality, which in turn helps in understanding the pathogen system. The Database of Essential Genes (DEG) is a repository of essential genomic elements, such as protein-coding genes and non-coding RNAs, identified experimentally in several bacteria, archaea and eukaryotes [3]. A large number of essential genes of S. aureus N315 have been identified using the antisense RNA technique [4,5]. S. aureus N315 comprises 2624 protein-coding genes, of which 302 are reported as essential genes in DEG. The function of 40 proteins encoded by essential genes of this pathogen is unknown, opening an avenue for the functional characterization of these proteins. Proteins of unknown function are referred to as hypothetical proteins (HPs). Functional characterization of hypothetical proteins can lead to the identification of novel therapeutic targets, facilitating the process of drug repositioning [6]. Further classification of hypothetical proteins into different functional categories could shed light on their structures, activities and roles in metabolism [6]. Several bioinformatics tools and databases are available for the functional characterization of HPs [6]. These have been successfully used to annotate the function of uncharacterized HPs of various pathogens, including Vibrio cholerae O139, Candida dubliniensis, Chlamydia trachomatis, Leptospira interrogans, Rickettsia massiliae MTU5, Mycobacterium tuberculosis, Haemophilus influenzae, Rickettsia rickettsii, Neisseria meningitidis MC58, Mycobacterium leprae Br4923, Treponema pallidum and Borrelia burgdorferi [7–18].

In the present study, the functional characterization of 40 essential hypothetical proteins (EHPs) of S. aureus N315 was carried out using bioinformatics tools and databases, which enabled us to annotate the function of 35 EHPs. The performance of the function prediction tools was evaluated using receiver operating characteristic (ROC) curve analysis [19]. Furthermore, host non-homology analysis revealed that 19 of the 35 proteins are exclusively present in the pathogen and can thus be considered potential drug target candidates. Among the 19 proteins, two were found to be druggable and the rest were considered novel targets. The identified targets can be further validated experimentally to design and develop novel drugs for treating S. aureus infections.

Materials and Methods

Sequence retrieval and analysis

Essential genes of S. aureus N315 were retrieved from the Database of Essential Genes (DEG) [3]. We found a total of 302 essential genes, of which 40 encode hypothetical proteins. Proteins encoded by essential genes are considered essential proteins. Thus, these 40 proteins were termed essential hypothetical proteins (EHPs). The framework used for functional annotation of the EHPs is given in Fig. 1.

Functional assignment and domain analysis

To assign functions to all 40 EHPs of S. aureus, various publicly available bioinformatics databases and tools, such as BLAST, Pfam, InterProScan and the Conserved Domain Database (CDD), were used. BLAST is used to identify homologous proteins having identical or similar functions [20]. The Pfam database holds a large collection of annotated protein families, each represented by multiple sequence alignments and hidden Markov models [21]. InterProScan scans the input sequence for matches against the InterPro protein signature databases [22]; it combines different protein signature recognition methods from the InterPro consortium for motif discovery.

Functional motifs/domains present in the EHPs were determined using CDD [23], Pfam and InterProScan.

Physicochemical characterization

Theoretical physicochemical parameters of each protein, such as molecular weight, aliphatic index, isoelectric point, instability index and grand average of hydropathicity (GRAVY), were computed using Expasy's ProtParam server [24]. The predicted results are listed in Table S1.
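Two of these parameters are simple enough to reproduce by hand, which can help when reading Table S1. The following is a minimal sketch and not the ProtParam implementation: it assumes the standard Kyte-Doolittle hydropathy scale for GRAVY and Ikai's formula for the aliphatic index (coefficients 2.9 and 3.9 as commonly given). The function names and the toy peptide are illustrative assumptions, not data from this study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """GRAVY: mean Kyte-Doolittle hydropathy over all residues."""
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from mole percentages of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


# Hypothetical toy peptide, NOT one of the EHPs analysed in the paper.
toy_peptide = "MKVLAAGIVGL"
print(round(gravy(toy_peptide), 3), round(aliphatic_index(toy_peptide), 1))
```

A positive GRAVY value indicates an overall hydrophobic sequence and a negative value a hydrophilic one; a higher aliphatic index is usually read as a sign of greater thermostability.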

Subcellular localization analysis

The tools PSORTb, CELLO and SOSUI were used to predict the subcellular localization of the hypothetical proteins (Table S1). PSORTb predicts the subcellular localization of bacterial proteins and is currently available in both online and standalone versions [25]. CELLO is a Support Vector Machine (SVM) based online prediction system for possible subcellular localizations [26]. SOSUI distinguishes membrane proteins from soluble proteins on the basis of their amino acid sequences and predicts the transmembrane helices of the former [27].

Virulence factor prediction

The virulent nature of the EHPs was predicted using both VirulentPred and VICMpred (Table S1). VICMpred is a web server that functionally classifies bacterial proteins into four classes: virulence factors, information molecules, cellular process molecules and metabolism molecules. Both VirulentPred and VICMpred are SVM-based methods that use patterns together with the amino acid and dipeptide composition of bacterial protein sequences to predict virulence factors, with overall accuracies of 81.8% and 70.75%, respectively [28,29].

Evaluation of performance

The performance of the bioinformatics tools used for predicting protein function was evaluated by ROC curve analysis [19], one of the most widely used statistical methods for assessing the accuracy of diagnostic tests/tools. In the present study, five levels were considered for each tool to rate its efficiency. The input data contain two columns: the first holds a binary value, 0 or 1, depending on whether the prediction is a true negative (0) or a true positive (1); the second holds the efficiency rating, an integer from 1 to 5, where a higher number signifies a greater confidence level. The ROC curve analysis was performed for four tools applied to 100 protein sequences of S. aureus with known functions. The results were submitted to the online software in format-1 [30]. Upon executing the online ROC program, measures of the ROC curve, such as accuracy, sensitivity and specificity, were obtained and are reported in Tables S2 & S3.
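The two-column input described above (a 0/1 label plus a 1 to 5 confidence rating) is enough to trace an empirical ROC curve. The sketch below is one plausible way to do so and is not the online program used in the study: it assumes that a prediction is called positive when its rating reaches a cut-off, sweeps the cut-off from 5 down to 1, and estimates the area under the curve by the trapezoidal rule. The function names and the eight toy records are invented for illustration.

```python
def roc_points(records):
    """Return (false positive rate, true positive rate) pairs for cut-offs 5..1.

    records: list of (label, rating) with label in {0, 1} and rating in 1..5.
    Both classes must be present, otherwise the rates are undefined.
    """
    n_pos = sum(1 for label, _ in records if label == 1)
    n_neg = len(records) - n_pos
    points = [(0.0, 0.0)]
    for cut in (5, 4, 3, 2, 1):  # call "positive" when rating >= cut
        tp = sum(1 for label, rating in records if label == 1 and rating >= cut)
        fp = sum(1 for label, rating in records if label == 0 and rating >= cut)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def sensitivity_specificity(records, cut):
    """Sensitivity and specificity when ratings >= cut are called positive."""
    tp = sum(1 for label, rating in records if label == 1 and rating >= cut)
    fn = sum(1 for label, rating in records if label == 1 and rating < cut)
    tn = sum(1 for label, rating in records if label == 0 and rating < cut)
    fp = sum(1 for label, rating in records if label == 0 and rating >= cut)
    return tp / (tp + fn), tn / (tn + fp)


# Invented toy data, NOT the 100 S. aureus proteins used in the paper.
toy_records = [(1, 5), (1, 4), (1, 5), (0, 2), (0, 1), (1, 3), (0, 4), (0, 1)]
print(auc(roc_points(toy_records)))
print(sensitivity_specificity(toy_records, cut=3))
```

With only five rating levels the curve has at most five interior points, so the trapezoidal area is a coarse estimate; dedicated ROC software may fit a smooth (for example binormal) curve instead and report slightly different values.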

Target identification

From the functionally annotated proteins, pathogen-specific proteins, i.e., proteins present in the pathogen but absent in the host, were identified using host non-homology analysis. A subsequent druggability analysis examined whether these proteins are druggable.

Host non-homology analysis

All functionally annotated proteins were subjected to a protein BLAST (BLASTp) search [20] against the non-redundant database of the human proteome with an e-value threshold of 0.0001 and a bit score cut-off of 100 [31,32]. Protein sequences that showed no significant hits were selected for further analysis.
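The filtering step can be sketched as a small post-processing routine over BLAST results. The example below assumes the hits were saved in the default 12-column tabular layout of BLAST+ (`-outfmt 6`, with the e-value in column 11 and the bit score in column 12), and it assumes that a hit counts as significant only when it passes both cut-offs; the text does not state how the two thresholds were combined. The function name, query identifiers and hit lines are hypothetical.

```python
def non_homologous(query_ids, blast_tabular, max_evalue=1e-4, min_bitscore=100.0):
    """Return the queries that have no significant human hit, in input order."""
    with_hit = set()
    for line in blast_tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qseqid = fields[0]
        evalue, bitscore = float(fields[10]), float(fields[11])
        # Assumption: a hit is significant only if it passes BOTH cut-offs.
        if evalue <= max_evalue and bitscore >= min_bitscore:
            with_hit.add(qseqid)
    return [q for q in query_ids if q not in with_hit]


# Made-up hits for three hypothetical queries; EHP_C has no hit at all.
toy_hits = "\n".join([
    "EHP_A\thuman_1\t45.0\t200\t108\t2\t1\t200\t5\t204\t1e-50\t180",
    "EHP_B\thuman_2\t28.0\t60\t42\t1\t3\t62\t10\t69\t0.5\t25.1",
])
print(non_homologous(["EHP_A", "EHP_B", "EHP_C"], toy_hits))
```

Passing the full query list separately matters because queries without any alignment do not appear in the tabular output at all, yet they are exactly the pathogen-specific candidates of interest.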

Druggability analysis

A druggable target should have the potential to bind drug-like molecules with high affinity. DrugBank is a bioinformatics resource containing information on drugs and drug targets [33]. ChEMBL is a biological database comprising manually curated molecules with drug-like properties and biological activity against drug targets [34]. The short-listed host non-homologous proteins from the previous analysis were subjected to a homology search against the DrugBank and ChEMBL targets. The presence of a non-homologous protein in the DrugBank/ChEMBL target list with the same function serves as evidence of its druggability. Its absence, on the other hand, indicates the novelty of the protein as a target, and such a protein is thus referred to as a ‘novel target’ [35].
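The decision rule in this step reduces to a set-membership test once the homology search has been run. The sketch below assumes the search results have already been condensed into a set of protein identifiers that matched a target with the same function; the function name and the identifier EHP_X are hypothetical placeholders, while SA0940 and SA0021 are the two proteins reported as druggable in this study.

```python
def classify_targets(candidates, known_target_ids):
    """Label each candidate 'druggable' or 'novel target'.

    known_target_ids: candidates with a same-function match in the
    DrugBank/ChEMBL target lists (assumed to be precomputed elsewhere).
    """
    known = set(known_target_ids)
    return {pid: ("druggable" if pid in known else "novel target")
            for pid in candidates}


# EHP_X is a hypothetical placeholder for a candidate without a match.
labels = classify_targets(["SA0940", "SA0021", "EHP_X"], {"SA0940", "SA0021"})
for protein_id, label in labels.items():
    print(protein_id, label)
```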

Results and discussion

Essential genes/proteins are absolutely required for the survival of an organism, and identifying them therefore leads to a better understanding of the principles of life. A total of 302 essential genes of S. aureus N315 were retrieved from DEG, and 40 proteins encoded by these essential genes were identified as hypothetical proteins. Until now there has been no experimental study to characterise these hypothetical proteins; thus, an attempt was made to annotate their function using an in silico approach. Bioinformatics tools such as BLAST, Pfam, InterProScan and CDD were used to assign functions to these proteins, and the performance of these tools was assessed by ROC curve analysis (Table S2). The functions of 35 EHPs were assigned with high confidence (Table 1), and they fall into different functional categories, namely enzymes (hydrolases, oxidoreductases, ligases, lyases, isomerases, transferases, metalloenzymes, permeases), binding proteins, miscellaneous proteins, helicases, transporters and virulence factors (Fig. 2). Detailed knowledge of these functional groups is important for understanding the molecular basis of pathogenesis and host-pathogen interaction. Each group of proteins is described below.

Functional annotation

Hydrolases

A hydrolase is an enzyme that catalyses the hydrolysis of a chemical bond. The genomes of both Gram-negative and Gram-positive bacteria encode a wide variety of hydrolase enzymes responsible for the specific cleavage of different peptidoglycan bonds. Hydrolases are also involved in many other functions, such as peptidoglycan maturation, turnover, recycling, autolysis, and cleavage of the septum during cell division [36]. In the present study, the majority of the proteins (14 EHPs) were predicted as hydrolases. This hydrolase family includes a number of subfamilies (viz., metallo-beta-lactamase, isochorismatase, ParB, YchF-GTPase, GIY-YIG nuclease, HD domain, ribonuclease J, inosine triphosphate pyrophosphatase, GTPase HflX, EngA, P-loop nucleoside triphosphate hydrolase, HAD-hydrolase). The complete list of functional categories is shown in Table 2.

Beta-lactamase provides antibiotic resistance by breaking the structure of antibiotics. Metallo-beta-lactamase (SA0021) includes thiolesterases that belong to the glyoxalase-II family and bind two zinc ions per molecule as a cofactor. It catalyzes the hydrolysis of S-D-lactoyl-glutathione to form glutathione and D-lactic acid [37]. The systematic name of isochorismatase (SA0181) is isochorismate pyruvate-hydrolase; it acts specifically on ether bonds (ether hydrolases). It catalyses the reaction of isochorismate with water to produce 2,3-dihydroxy-2,3-dihydrobenzoate and pyruvate [38]. The ParB-like nuclease domain (SA0348) includes the Escherichia coli plasmid protein ParB and sulfiredoxin-1. ParB is localized to both poles of the pre-divisional cell following completion of DNA replication [39]. It has been reported that the parABS system is a broadly conserved molecular mechanism for plasmid partitioning and chromosome segregation in bacteria. It mainly consists of three components, namely the ParA ATPase, the ParB DNA-binding protein and the cis-acting parS sequence [40]. A GTPase is often described as a molecular switch. The YchF-GTPase protein domain (SA0351) is located at the C-terminus of the GTP-binding protein. It may be required for ribosome function or for signal transduction from the ribosome to downstream targets [41]. The GIY-YIG superfamily (SA0446) groups nucleases of approximately 100 amino acids with two short motifs, "GIY" and "YIG", in the N-terminal part, followed by an arginine residue in the centre and a glutamic acid residue in the C-terminal part. The GIY-YIG domain is implicated in cellular processes such as DNA cleavage, transfer of mobile genetic elements, restriction of foreign DNA, DNA repair and maintenance of genome stability [41,42]. The HD domain (SA0560) belongs to the superfamily of phosphohydrolases and participates mainly in nucleic acid metabolism and signal transduction. Its highly conserved residues are histidine and aspartate, which are essential for its activity [43]. Ribonuclease J proteins (SA0940) are about 50 to 77 kDa in size and carry three conserved histidine residues in the central region. They are related mainly to the N-terminal region of the beta-lactamase family. Ribonuclease J cleaves the 5'-leader sequence of certain mRNAs and may play a role in the maturation and stability of specific mRNAs [44].

Inosine triphosphate pyrophosphatase (ITPA) (SA0998) hydrolyses the non-canonical purine nucleotides inosine triphosphate (ITP), xanthosine 5'-triphosphate (XTP), 2'-deoxy-N-6-hydroxylaminopurine triphosphate (dHAPTP) and deoxyinosine triphosphate (dITP) to their respective monophosphate derivatives. ITPA acts on both the deoxyribose and ribose forms of these nucleotides. To avoid chromosomal lesions, it excludes non-canonical purines from the RNA and DNA precursor pools, thereby preventing their incorporation [45]. Mitochondrial GTPase (MTG1) (rbgA), which belongs to the MMR1/HSR1 GTP-binding protein family, is required for mitochondrial translation [46]. MMR_HSR1 functions as a 50S ribosome-binding GTPase. The full-length GTPase protein is required for complete activity; it interacts with the 50S ribosome and binds both adenine and guanine nucleotides, with a preference for guanine nucleotides [46,47]. The GTPase HflX (SA1147) family belongs to the conserved GTP-binding proteins that have pleiotropic effects. HflX is a membrane-associated protease pair possessing a housekeeping function. It is encoded downstream of the RNA chaperone Hfq and upstream of HflKC. The characteristic feature of this family is a conserved domain with a glycine-rich segment at the N-terminal region of the putative GTP-binding domain. The HflX family is a member of the translation factor superfamily, TRAFAC class, which belongs to the GTPase superclass of P-loop nucleoside triphosphatases [48]. The EngA protein belongs to the GTPase Der subfamily and shows GTP-binding and GTP hydrolysis activities as intrinsic biochemical properties [49]. P-loop nucleoside triphosphate hydrolase catalyses the hydrolysis of the beta-gamma phosphate bond of a bound nucleoside triphosphate (NTP). The energy from NTP hydrolysis induces conformational changes that are important for its biological function. HAD-hydrolase, subfamily IIB (SA1957) is part of the haloacid dehalogenase (HAD) superfamily of aspartate-nucleophile hydrolases. The Class II subfamilies possess a characteristic domain positioned between the second and third conserved catalytic motifs of the superfamily domain. It has a predicted structure of Helix-Sheet-Sheet-(Helix or Sheet)-Helix-Sheet-(variable)-Helix-Sheet-Sheet [50].

Oxidoreductases

One EHP (SA1509) was observed to belong to the oxidoreductase family. Ribonucleotide reductases (RNRs) (SA1509) are essential enzymes that catalyse the reduction of ribonucleotides to their respective deoxyribonucleotides, thus providing the precursors necessary for DNA synthesis. Proteins in this entry are orthologous to the novel transcription regulator NrdR [51].

Ligases

Two EHPs (SA0085 and SA0467) were predicted as ligases in the present study. A ligase catalyzes the joining of two large molecules by forming a new chemical bond. tRNA-dihydrouridine synthase (SA0085) catalyses the reduction of the 5,6-double bond of a uridine residue on tRNA. Most dihydrouridines are found in the D loop of tRNAs [52]. tRNA(Ile) lysidine synthetase (TilS) (SA0467) catalyses lysidine formation using lysine and ATP as substrates. In an ATP-dependent manner, it ligates lysine onto the cytidine at position 34 of the AUA codon-specific tRNA(Ile), which carries the anticodon CAU. The TilS substrate C-terminal domain represents the C-terminal domain of lysidine-tRNA(Ile) synthetase, which ligates lysine onto cytidine 34 [53].

Lyases

In our study, one EHP was predicted as a lyase. Pyridoxal phosphate (SA1031) is the active form of vitamin B6 (pyridoxine or pyridoxal). A number of pyridoxal-dependent decarboxylases share regions of sequence similarity, particularly around a conserved lysine residue, which provides the attachment site for the pyridoxal-phosphate (PLP) group [54]. These enzymes belong to the group II decarboxylases, which include aromatic-L-amino-acid decarboxylase, tyrosine decarboxylase and L-aspartate decarboxylase.

Isomerases

Herein, one EHP was identified as an isomerase. Alanine racemase provides the D-alanine required for cell wall biosynthesis (peptidoglycan biosynthesis) by isomerising L-alanine to D-alanine. The alanine racemase monomer is composed of two domains: an eight-stranded alpha/beta barrel at the N-terminus and a C-terminal domain composed essentially of beta-strands [55]. The alpha-D-phosphohexomutase superfamily (SA2279) is composed of four related enzymes (viz., phosphoglucomutase (PGM), phosphoglucomutase/phosphomannomutase (PGM/PMM), phosphoglucosamine mutase (PNGM) and phosphoacetylglucosamine mutase (PAGM)), each of which catalyses a phosphoryl transfer on its sugar substrates [56].

Transferases

Three EHPs were predicted as transferases. Transferases are involved in innumerable cellular reactions, including translation. rRNA small subunit methyltransferase I (SA0447) catalyses the 2'-O-methylation of the ribose of cytidine 1402 (C1402) in 16S rRNA using S-adenosyl-L-methionine (SAM or Ado-Met) as the methyl donor. RsmI proteins use the 30S subunit as a substrate, suggesting that the methylation reaction occurs at a late step of 30S assembly in the cell [57]. PlsY (SA1187) is a glycerol-3-phosphate acyltransferase (GPAT) that catalyses the transfer of an acyl group from acyl-ACP to glycerol-3-phosphate to form lysophosphatidic acid (LPA) [58]. Acyl-CoA N-acyltransferase (SA1252) has a 3-layer alpha/beta/alpha structure containing mixed beta-sheets; this fold is found in N-acetyltransferase (NAT) family members [59], autoinducer synthetases [60], leucyl/phenylalanyl-tRNA-protein transferase (LFTR) and ornithine decarboxylase antizyme.

Metalloenzymes

One EHP was predicted as a metalloenzyme. About one quarter to one third of all proteins are proposed to require metals to carry out their functions. Metalloproteins perform different functions in cells, such as storage and transport, and also act as enzymes and signal transduction proteins. The TatD family (SA0449) is related to the metalloenzyme superfamily, which includes TatD and many putative deoxyribonucleases and metal-dependent hydrolases [61].

Permeases

Two EHPs in this study were predicted as permeases. Permeases are membrane transport proteins that facilitate the diffusion of specific molecules into and out of cells. The lipoprotein NlpA family protein (SA0422, SA0771) is a component of a D-methionine permease, a binding-protein-dependent, ATP-driven transport system [62].

Nucleic acid binding proteins

Four EHPs were found to be binding proteins. Nucleoid-associated protein YbaB/EbfC (SA0437) is a family of DNA-binding proteins. Members of this family form homodimers, which bind DNA via a tweezer-like structure, leading to conformational changes in the DNA [63]. Sporulation regulator WhiA-like (SA0722) describes a family of DNA-binding proteins widely conserved in Gram-positive bacteria [64]. The family includes the sporulation regulator WhiA, which is required for expression of the ParB partitioning protein during sporogenesis [65]. HP1423-type RNA-binding proteins (SA0464) contain an S4 RNA-binding domain. The S4 domain is a small domain of 60-65 amino acid residues that mediates RNA binding [66]. The structure of HP1423 possesses the αL-RNA binding motif, which is characteristic of several RNA-binding protein families [67]. THUMP (SA1277) is an ancient domain with predicted RNA-binding capacity that probably functions by delivering a variety of RNA modification enzymes to their targets. The THUMP domain has 100-110 amino acid residues and adopts an alpha/beta fold similar to that found in the C-terminal domain of translation initiation factor 3 and ribosomal protein S8 [68].

Miscellaneous proteins

Three EHPs were identified as miscellaneous proteins. The impact protein (SA0703) is a translational regulator that ensures constant high levels of translation under amino acid starvation. It acts by interacting with Gcn1/Gcn1L1, thereby preventing activation of Gcn2 protein kinases (EIF2AK1 to 4) and the subsequent down-regulation of protein synthesis. It is evolutionarily conserved from eukaryotes to archaea [69]. CsbD (SA0772) is a bacterial general stress response protein whose expression is mediated by sigma-B, an alternative sigma factor [70].

The natural resistance-associated macrophage protein (NRAMP) family comprises Nramp1, Nramp2, and two yeast proteins (Smf1 and Smf2). The members of the NRAMP (SA0956) protein family have a conserved hydrophobic core with ten transmembrane domains [71]. Nramp1, an integral membrane protein, is reported to be expressed solely in cells associated with the immune system, and upon phagocytosis it is recruited to the membrane of the phagosome. Nramp2 is a transporter of divalent cations (viz., Fe2+, Mn2+, Zn2+), which is known to be expressed at high levels in the mammalian intestine and is a chief transferrin-independent iron uptake system in mammals [72]. The yeast proteins Smf1 and Smf2 have also been reported to transport divalent cations [73].

Helicases

DEAD-box ATP-dependent RNA helicase CshA (SA1885) is an enzyme that unwinds dsRNA in both 5'- and 3'-directions. It also has RNA-dependent ATPase activity and plays a role in ribosomal 50S subunit assembly [74]. DEAD-box helicases are involved in RNA metabolism, including nuclear transcription, pre-mRNA splicing, nucleocytoplasmic transport, ribosome biogenesis, translation, RNA decay and gene expression in organelles [75].

Transporters

Herein, two EHPs were found to be transporters. CbiQ includes various cobalt transport proteins, most of which are found in cobalamin (vitamin B12) biosynthesis operons. Energy-coupling factor (ECF) transporters are a subgroup of ATP-binding cassette (ABC) transporters involved in the uptake of vitamins and micronutrients in prokaryotes [76]. ECF transporters are protein complexes consisting of a conserved module (two peripheral ATPases and the integral membrane protein EcfT) and a non-conserved integral membrane protein responsible for substrate specificity (the S-component) [77].

Virulence proteins

Virulence factors are molecules produced by pathogenic bacteria, viruses, fungi and protozoa that make them effective and enable them to damage the host. VirulentPred predicted 13 EHPs as virulence factors. VICMpred indicated that, of the 40 EHPs, 17 are involved in cellular processes, 2 in information and storage and 18 in metabolism, while 3 are virulence factors.

Virulence factors are good drug targets that allow the design of a new type of therapeutic drug, i.e., antivirulence drugs. An antivirulence drug, targeting a virulence factor, renders the pathogen avirulent. It has been theorized that antivirulence drugs impose much weaker selection for resistance in the pathogen than traditional antibiotics [78].

Subcellular localization

Subcellular localization analysis of proteins helps to classify them as drug or vaccine targets. Proteins that reside in the cytoplasm are believed to be possible drug targets, whereas proteins residing in the membrane can act as possible vaccine targets. Using the subcellular localization prediction tools, 26 of the 40 HPs were found to be soluble cytoplasmic proteins and 6 were found to be membrane proteins. Details of each prediction result are shown in Table S1.

Potential drug target candidates

An ideal drug target must be essential and pathogen-specific. It should not have any close homolog in the human proteome, to minimize the risk of undesirable cross-reactivity of a potential drug with host proteins. Therefore, a host non-homology analysis was carried out to identify proteins that are non-homologous to the human proteome. The 35 hypothetical proteins functionally annotated with high confidence, as assessed through ROC curve analysis, were subjected to host non-homology analysis using a BLASTp search against the human proteome with an e-value threshold of 0.0001. Of the 35 proteins, 19 did not show any significant hit and were thus regarded as non-homologous, i.e., solely present in the pathogen. Hence, these proteins can be considered potential drug targets. To assess the druggability of the 19 shortlisted candidate proteins, a druggability analysis was carried out. Of the 19, two proteins (SA0940 & SA0021) were found to be druggable through the ChEMBL target search. Ribonuclease J (SA0940) has previously been reported as a drug target [79]. It possesses both endo- and exo-ribonuclease activities and plays a key role in pre-rRNA maturation and mRNA decay [80]. Beta-lactamase (SA0021) is a well-known target of broad-spectrum antibiotics, such as penicillin derivatives (penams), cephalosporins (cephems), monobactams and carbapenems, used in treating bacterial infections. The rest of the putative drug target candidates can be considered novel targets, which should be further validated experimentally.

Conclusions

Understanding the function of the essential genes of a pathogenic microorganism is of great importance in basic biology and medical science. In the present study, an in silico approach, a combination of different bioinformatics tools/databases, was used for the functional characterization of essential hypothetical proteins from S. aureus N315. ROC curve analysis showed that all four tools considered herein had similar accuracy levels, with only minute discrepancies, suggesting that these tools are reliable for characterizing hypothetical proteins with a high confidence level. The adopted methodology predicted the function of more than 87% of the hypothetical proteins, which belong to important functional categories. However, functional assignment for the rest of the EHPs was not possible owing to the lack of sufficient evidence. Subcellular localization analysis predicted the cellular location of these proteins, and 13 of them were found to be virulent in nature. Further, host non-homology analysis revealed that 19 proteins are pathogen-specific and can be probable drug target candidates. Ribonuclease J (SA0940) and beta-lactamase (SA0021) are two known targets among the 19 pathogen-specific proteins. The remaining proteins were considered ‘novel targets’, which need to be further validated experimentally. Structural analyses of these annotated proteins are underway in our laboratory.

Conflict of Interest

The authors declare that there is no conflict of interest.

Acknowledgement

JP and PG are thankful to Pondicherry University, Pondicherry for the pre-doctoral fellowship. The authors are indebted to the Centre for Bioinformatics, Pondicherry University, Pondicherry for providing the computational facility. The authors are thankful to Dr. R. Vishnu Vardhan, Department of Statistics, Pondicherry University, Pondicherry, for assisting in the ROC curve analysis. DBT, DST and DIT, UGC-SAP, Govt. of India support the research work carried out in the Centre for Bioinformatics.

References
[1] A.P. Fraise, Bailliere’s Clinical Infectious Diseases: International Practice and Research, Antibiotic Resistance, Vol. 5, no. 2: R. G. Finch and R. J. Williams, Eds. Bailliere Tindall, London, 1999. ISSN 1071-6564, £31.00, J. Antimicrob. Chemother. 46 (2000) 865–a–866.
[2] K. Hiramatsu, N. Aritaka, H. Hanaki, S. Kawasaki, Y. Hosoda, S. Hori, Y. Fukuchi, I. Kobayashi, Dissemination in Japanese hospitals of strains of Staphylococcus aureus heterogeneously resistant to vancomycin, Lancet. 350 (1997) 1670–1673.
[3] H. Luo, Y. Lin, F. Gao, C.-T. Zhang, R. Zhang, DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements, Nucleic Acids Res. 42 (2013) D574–D580.
[4] Y. Ji, B. Zhang, S.F. Van Horn, P. Warren, G. Woodnutt, M.K. Burnham, M. Rosenberg, Identification of critical staphylococcal genes using conditional phenotypes generated by antisense RNA, Science. 293 (2001) 2266–2269.
[5] R.A. Forsyth, R.J. Haselbeck, K.L. Ohlsen, R.T. Yamamoto, H. Xu, J.D. Trawick, D. Wall, L. Wang, V. Brown-Driver, J.M. Froelich, K.G. C, P. King, M. McCarthy, C. Malone, B. Misiner, D. Robbins, Z. Tan, Z.-Y. Zhu, G. Carr, D.A. Mosca, C. Zamudio, J.G. Foulkes, J.W. Zyskind, A genome-wide strategy for the identification of essential genes in Staphylococcus aureus, Mol. Microbiol. 43 (2002) 1387–1400.
[6] M. Shahbaaz, K. Bisetty, F. Ahmad, M.I. Hassan, Current Advances in the Identification and Characterization of Putative Drug and Vaccine Targets in the Bacterial Genomes, Curr. Top. Med. Chem. 16 (2016) 1040–1069.
[7] M.S. Islam, S.M. Shahik, M. Sohel, N.I.A. Patwary, M.A. Hasan, In Silico Structural and Functional Annotation of Hypothetical Proteins of Vibrio cholerae O139, Genomics Inform. 13 (2015) 53–59.
[8] K. Kumar, A. Prakash, M. Tasleem, A. Islam, F. Ahmad, M.I. Hassan, Functional annotation of putative hypothetical proteins from Candida dubliniensis, Gene. 543 (2014) 93–100.
[9] A.A. Turab Naqvi, S. Rahman, Rubi, F. Zeya, K. Kumar, H. Choudhary, M.S. Jamal, J. Kim, M.I. Hassan, Genome analysis of Chlamydia trachomatis for functional characterization of hypothetical proteins to discover novel drug targets, Int. J. Biol. Macromol. 96 (2017) 234–240.
[10] A.P. Bidkar, In-silico Structural and Functional Analysis of Hypothetical Proteins of Leptospira Interrogans, Biochemistry & Pharmacology: Open Access. 03 (2014). doi:10.4172/2167-0501.1000136.
[11] J. Hoskeri. H, J.H. H, Functional Annotation of Conserved Hypothetical Proteins in Rickettsia Massiliae MTU5, J. Comput. Sci. Syst. Biol. 03 (2010). doi:10.4172/jcsb.1000055.
[12] G.K. Mazandu, N.J. Mulder, Function prediction and analysis of Mycobacterium tuberculosis hypothetical proteins, Int. J. Mol. Sci. 13 (2012) 7283–7302.
[13] M. Shahbaaz, M. Imtaiyaz Hassan, F. Ahmad, Functional Annotation of Conserved Hypothetical Proteins from Haemophilus influenzae Rd KW20, PLoS One. 8 (2013) e84263.
[14] M. Shahbaaz, K. Bisetty, F. Ahmad, M. Hassan, Functional Insight into Putative Conserved Proteins of Rickettsia rickettsii and their Virulence Characterization, Curr. Proteomics. 12 (2015) 101–116.
[15] S. Kumar, Computational functional and structural annotation of hypothetical proteins of Neisseria Meningitidis MC58. Conference: International conference on Biochemistry, At Kuala Lumpur Malaysia, Volume: Biochem.Anal.Biochem 2016, 5:3(suppl) DOI: 10.4172/2161-1009.S1.005.
[16] A.A.T. Naqvi, F. Ahmad, M.I. Hassan, Identification of functional candidates amongst hypothetical proteins of Mycobacterium leprae Br4923, a causative agent of leprosy, Genome. 58 (2015) 25–42.
[17] A.A.T. Naqvi, M. Shahbaaz, F. Ahmad, M.I. Hassan, Identification of functional candidates amongst hypothetical proteins of Treponema pallidum ssp. pallidum, PLoS One. 10 (2015) e0124177.
[18] S. Khan, M.S. Jamal, F. Anjum, M. Rasool, A. Ansari, A. Islam, F. Ahmad, M.I. Hassan, Functional annotation of putative conserved proteins from Borrelia burgdorferi to find potential drug targets, Int. J. Comput. Biol. Drug Des. 9 (2016) 295.
[19] A.P. Bradley, The use of the area under the ROC curve in the evaluation of machine learning algorithms, Pattern Recognit. 30 (1997) 1145–1159.
[20] S.F. Altschul, W. Gish, W. Miller, E.W. Myers, D.J. Lipman, Basic local alignment search tool, J. Mol. Biol. 215 (1990) 403–410.
[21] R.D. Finn, P. Coggill, R.Y. Eberhardt, S.R. Eddy, J. Mistry, A.L. Mitchell, S.C. Potter, M. Punta, M. Qureshi, A. Sangrador-Vegas, G.A. Salazar, J. Tate, A. Bateman, The Pfam protein families database: towards a more sustainable future, Nucleic Acids Res. 44 (2016) D279–85.
[22] P. Jones, D. Binns, H.-Y. Chang, M. Fraser, W. Li, C. McAnulla, H. McWilliam, J. Maslen, A. Mitchell, G. Nuka, S. Pesseat, A.F. Quinn, A. Sangrador-Vegas, M. Scheremetjew, S.-Y. Yong, R. Lopez, S. Hunter, InterProScan 5: genome-scale protein function classification, Bioinformatics. 30 (2014) 1236–1240.
[23] A. Marchler-Bauer, Y. Bo, L. Han, J. He, C.J. Lanczycki, S. Lu, F. Chitsaz, M.K. Derbyshire, R.C. Geer, N.R. Gonzales, M. Gwadz, D.I. Hurwitz, F. Lu, G.H. Marchler, J.S. Song, N. Thanki, Z. Wang, R.A. Yamashita, D. Zhang, C. Zheng, L.Y. Geer, S.H. Bryant, CDD/SPARCLE: functional classification of proteins via subfamily domain architectures, Nucleic Acids Res. 45 (2017) D200–D203.
[24] E. Gasteiger, C. Hoogland, A. Gattiker, S. Duvaud, M.R. Wilkins, R.D. Appel, A. Bairoch, Protein Identification and Analysis Tools on the ExPASy Server, in: The Proteomics Protocols Handbook, 2005: pp. 571–607.
[25] N.Y. Yu, J.R. Wagner, M.R. Laird, G. Melli, S. Rey, R. Lo, P. Dao, S. Cenk Sahinalp, M. Ester, L.J. Foster, F.S.L. Brinkman, PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes, Bioinformatics. 26 (2010) 1608–1615.
[26] C.-S. Yu, C.-J. Lin, J.-K. Hwang, Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions, Protein Sci. 13 (2004) 1402–1406.
[27] T. Hirokawa, S. Boon-Chieng, S. Mitaku, SOSUI: classification and secondary structure prediction system for membrane proteins, Bioinformatics. 14 (1998) 378–379.
[28] S. Saha, G.P.S. Raghava, VICMpred: an SVM-based method for the prediction of functional proteins of Gram-negative bacteria using amino acid patterns and composition, Genomics Proteomics Bioinformatics. 4 (2006) 42–47.
[29] A. Garg, D. Gupta, VirulentPred: a SVM based prediction method for virulent proteins in bacterial pathogens, BMC Bioinformatics. 9 (2008) 62.
[30] J. Eng, ROC Analysis: Web-based Calculator for ROC Curves, (n.d.). from http://www.jrocfit.org.
[31] A. Jadhav, B. Shanmugham, A. Rajendiran, A. Pan, Unraveling novel broad-spectrum antibacterial targets in food and waterborne pathogens using comparative genomics and protein interaction network analysis, Infect. Genet. Evol. 27 (2014) 300–308.
[32] A. Jadhav, V. Ezhilarasan, O. Prakash Sharma, A. Pan, Clostridium-DT(DB): a comprehensive database for potential drug targets of Clostridium difficile, Comput. Biol. Med. 43 (2013) 362–367.
[33] V. Law, C. Knox, Y. Djoumbou, T. Jewison, A.C. Guo, Y. Liu, A. Maciejewski, D. Arndt, M. Wilson, V. Neveu, A. Tang, G. Gabriel, C. Ly, S. Adamjee, Z.T. Dame, B. Han, Y. Zhou, D.S. Wishart, DrugBank 4.0: shedding new light on drug metabolism, Nucleic Acids Res. 42 (2013) D1091–D1097.
[34] A.P. Bento, A. Gaulton, A. Hersey, L.J. Bellis, J. Chambers, M. Davies, F.A. Krüger, Y. Light, L. Mak, S. McGlinchey, M. Nowotka, G. Papadatos, R. Santos, J.P. Overington, The ChEMBL bioactivity database: an update, Nucleic Acids Res. 42 (2014) D1083–90.
[35] B. Shanmugham, A. Pan, Identification and Characterization of Potential Therapeutic Candidates in Emerging Human Pathogen Mycobacterium abscessus: A Novel Hierarchical In Silico Approach, PLoS One. 8 (2013) e59126.
[36] J.V. Höltje, From growth to autolysis: the murein hydrolases in Escherichia coli, Arch. Microbiol. 164 (1995) 243–254.
[37] A. Carfi, S. Pares, E. Duée, M. Galleni, C. Duez, J.M. Frère, O. Dideberg, The 3-D structure of a zinc metallo-beta-lactamase from Bacillus cereus reveals a new type of protein fold, EMBO J. 14 (1995) 4914–4921.
[38] I.G. Young, F. Gibson, Regulation of the enzymes involved in the biosynthesis of 2,3-dihydroxybenzoic acid in Aerobacter aerogenes and Escherichia coli, Biochimica et Biophysica Acta (BBA) - General Subjects. 177 (1969) 401–411.
[39] R.M. Figge, J. Easter, J.W. Gober, Productive interaction between the chromosome partitioning proteins, ParA and ParB, is required for the progression of the cell cycle in Caulobacter crescentus, Mol. Microbiol. 47 (2003) 1225–1237.
[40] J.A. Surtees, B.E. Funnell, Plasmid and Chromosome Traffic Control: How ParA and ParB Drive Partition, in: Current Topics in Developmental Biology, 2003: pp. 145–180.
[41] C.E. Caldon, P. Yoong, P.E. March, Evolution of a molecular switch: universal bacterial GTPases regulate ribosome function, Mol. Microbiol. 41 (2001) 289–297.
[42] S. Dunin-Horkawicz, M. Feder, J.M. Bujnicki, Phylogenomic analysis of the GIY-YIG nuclease superfamily, BMC Genomics. 7 (2006) 98.
[43] L. Aravind, E.V. Koonin, The HD domain defines a new superfamily of metal-dependent phosphohydrolases, Trends Biochem. Sci. 23 (1998) 469–472.
[44] R. Madhugiri, E. Evguenieva-Hackenberg, RNase J is involved in the 5’-end maturation of 16S rRNA and 23S rRNA in Sinorhizobium meliloti, FEBS Lett. 583 (2009) 2339–2342.
[45] N.E. Burgis, R.P. Cunningham, Substrate specificity of RdgB protein, a deoxyribonucleoside triphosphate pyrophosphohydrolase, J. Biol. Chem. 282 (2007) 3531–3538.
[46] A. Barrientos, D. Korr, K.J. Barwell, C. Sjulsen, C.D. Gajewski, G. Manfredi, S. Ackerman, A. Tzagoloff, MTG1 codes for a conserved protein required for mitochondrial translation, Mol. Biol. Cell. 14 (2003) 2292–2302.
[47] C. Vernet, M.T. Ribouchon, G. Chimini, P. Pontarotti, Structure and evolution of a member of a new subfamily of GTP-binding proteins mapping to the human MHC class I region, Mamm. Genome. 5 (1994) 100–105.
[48] D. Dutta, K. Bandyopadhyay, A.B. Datta, A.A. Sardesai, P. Parrack, Properties of HflX, an enigmatic protein from Escherichia coli, J. Bacteriol. 191 (2009) 2307–2314.
[49] R. Gopalaswamy, Cloning, overexpression, and characterization of a serine/threonine protein kinase pknI from Mycobacterium tuberculosis H37Rv, Protein Expr. Purif. (2004). doi:10.1016/s1046-5928(04)00110-x.
[50] E.V. Koonin, R.L. Tatusov, Computer analysis of bacterial haloacid dehalogenases defines a large superfamily of hydrolases with diverse specificity. Application of an iterative approach to database search, J. Mol. Biol. 244 (1994) 125–132.
[51] P. Reichard, From RNA to DNA, why so many ribonucleotide reductases?, Science. 260 (1993) 1773–1777.
[52] F. Xing, M.R. Martzen, E.M. Phizicky, A conserved family of Saccharomyces cerevisiae synthases effects dihydrouridine modification of tRNA, RNA. 8 (2002) 370–381.
[53] C. Fabret, E. Dervyn, B. Dalmais, A. Guillot, C. Marck, H. Grosjean, P. Noirot, Life without the essential bacterial tRNA Ile2-lysidine synthetase TilS: a case of tRNA gene recruitment in Bacillus subtilis, Mol. Microbiol. 80 (2011) 1062–1074.
[54] E. Sandmeier, T.I. Hale, P. Christen, Multiple evolutionary origin of pyridoxal-5’-phosphate-dependent amino acid decarboxylases, Eur. J. Biochem. 221 (1994) 997–1002.
[55] J.P. Shaw, G.A. Petsko, D. Ringe, Determination of the structure of alanine racemase from Bacillus stearothermophilus at 1.9-A resolution, Biochemistry. 36 (1997) 1329–1342.
[56] S. Levin, S.C. Almo, B.H. Satir, Functional diversity of the phosphoglucomutase superfamily: structural implications, Protein Eng. 12 (1999) 737–746.
[57] S. Kimura, T. Suzuki, Fine-tuning of the ribosomal decoding center by conserved methyl-modifications in the Escherichia coli 16S rRNA, Nucleic Acids Res. 38 (2010) 1341–1352.
[58] Y.-J. Lu, Y.-M. Zhang, K.D. Grimes, J. Qi, R.E. Lee, C.O. Rock, Acyl-phosphates initiate membrane phospholipid synthesis in Gram-positive pathogens, Mol. Cell. 23 (2006) 765–772.
[59] D.L. Burk, X-ray structure of the AAC(6’)-Ii antibiotic resistance enzyme at 1.8 A resolution; examination of oligomeric arrangements in GNAT superfamily members, Protein Sci. 12 (2003) 426–437.
[60] T.A. Gould, H.P. Schweizer, M.E.A. Churchill, Structure of the Pseudomonas aeruginosa acyl-homoserinelactone synthase LasI, Mol. Microbiol. 53 (2004) 1135–1146.
[61] L. Holm, C. Sander, An evolutionary treasure: unification of a broad set of amidohydrolases related to urease, Proteins. 28 (1997) 72–82.
[62] J. Gál, A. Szvetnik, R. Schnell, M. Kálmán, The metD D-methionine transporter locus of Escherichia coli is an ABC transporter gene cluster, J. Bacteriol. 184 (2002) 4930–4932.
[63] A.E. Cooley, S.P. Riley, K. Kral, M.C. Miller, E. DeMoll, M.G. Fried, B. Stevenson, DNA-binding by Haemophilus influenzae and Escherichia coli YbaB, members of a widely-distributed bacterial protein family, BMC Microbiol. 9 (2009) 137.
[64] K. Surdova, P. Gamba, D. Claessen, T. Siersma, M.J. Jonker, J. Errington, L.W. Hamoen, The conserved DNA-binding protein WhiA is involved in cell division in Bacillus subtilis, J. Bacteriol. 195 (2013) 5450–5460.
[65] J.A. Aínsa, N.J. Ryding, N. Hartley, K.C. Findlay, C.J. Bruton, K.F. Chater, WhiA, a protein of unknown function conserved among gram-positive bacteria, is essential for sporulation in Streptomyces coelicolor A3(2), J. Bacteriol. 182 (2000) 5470–5478.
[66] L. Aravind, E.V. Koonin, Novel predicted RNA-binding domains associated with the translation machinery, J. Mol. Evol. 48 (1999) 291–302.
[67] J.-H. Kim, S.J. Park, K.-Y. Lee, W.-S. Son, N.-Y. Sohn, A.-R. Kwon, B.-J. Lee, Solution structure of hypothetical protein HP1423 (Y1423_HELPY) reveals the presence of alphaL motif related to RNA binding, Proteins. 75 (2009) 252–257.
[68] L. Aravind, E.V. Koonin, THUMP--a predicted RNA-binding domain shared by 4-thiouridine, pseudouridine synthases and RNA methylases, Trends Biochem. Sci. 26 (2001) 215–217.
[69] K. Okamura, Y. Hagiwara-Takeuchi, T. Li, T.H. Vu, M. Hirai, M. Hattori, Y. Sakaki, A.R. Hoffman, T. Ito, Comparative genome analysis of the mouse imprinted gene impact and its nonimprinted human homolog IMPACT: toward the structural basis for species-specific imprinting, Genome Res. 10 (2000) 1878–1889.
[70] Z. Prágai, C.R. Harwood, Regulatory interactions between the Pho and sigma(B)-dependent general stress regulons of Bacillus subtilis, Microbiology. 148 (2002) 1593–1602.
[71] M. Cellier, G. Privé, A. Belouchi, T. Kwan, V. Rodrigues, W. Chia, P. Gros, Nramp defines a family of membrane proteins, Proc. Natl. Acad. Sci. U. S. A. 92 (1995) 10089–10093.
[72] G. Govoni, P. Gros, Macrophage NRAMP1 and its role in resistance to microbial infections, Inflamm. Res. 47 (1998) 277–284.
[73] E. Pinner, S. Gruenheid, M. Raymond, P. Gros, Functional complementation of the yeast divalent cation transporter family SMF by NRAMP2, a member of the mammalian natural resistance-associated macrophage protein family, J. Biol. Chem. 272 (1997) 28933–28938.
[74] M. Lehnik-Habrink, L. Rempeters, Á.T. Kovács, C. Wrede, C. Baierlein, H. Krebber, O.P. Kuipers, J. Stülke, DEAD-Box RNA helicases in Bacillus subtilis have multiple functions and act independently from each other, J. Bacteriol. 195 (2013) 534–544.
[75] J. de la Cruz, D. Kressler, P. Linder, Unwinding RNA in Saccharomyces cerevisiae: DEAD-box proteins and related families, Trends Biochem. Sci. 24 (1999) 192–198.
[76] L. Aravind, Y.I. Wolf, E.V. Koonin, The ATP-cone: an evolutionarily mobile, ATP-binding regulatory domain, J. Mol. Microbiol. Biotechnol. 2 (2000) 191–194.
[77] G.B. Erkens, M. Majsnerowska, J. ter Beek, D.J. Slotboom, Energy coupling factor-type ABC transporters for vitamin uptake in prokaryotes, Biochemistry. 51 (2012) 4390–4396.
[78] R.C. Allen, R. Popat, S.P. Diggle, S.P. Brown, Targeting virulence: can we make evolution-proof drugs?, Nat. Rev. Microbiol. 12 (2014) 300–308.
[79] Y. Redko, E. Galtier, H. Arnion, F. Darfeuille, O. Sismeiro, J.-Y. Coppée, C. Médigue, M. Weiman, S. Cruveiller, H. De Reuse, RNase J depletion leads to massive changes in mRNA abundance in Helicobacter pylori, RNA Biol. 13 (2016) 243–253.
[80] T.M. Eidem, C.M. Roux, P.M. Dunman, RNA decay: a novel therapeutic target in bacteria, Wiley Interdiscip. Rev. RNA. 3 (2012) 443–454.

Figure Legends Fig. 1: Complete framework used for functional annotation of essential hypothetical proteins. Fig. 2: The distribution of functional classes of essential hypothetical proteins.

Table Legends Table 1: The complete list of conserved domain data for essential hypothetical proteins in S. aureus N315. Table 2: List of functionally annotated essential hypothetical proteins in S. aureus N315.

Table 1: The complete list of conserved domain data for essential hypothetical proteins in S. aureus N315

| DEG AC. No. | Gene name | BLAST | InterProScan | Pfam | CDD | Pathogen specific |
|---|---|---|---|---|---|---|
| DEG10020008 | SA0021 | Metallo-beta-lactamase domain protein | Metallo-beta lactamase | Metallo-beta-lactamase superfamily | MBL-fold-metallo hydrolase domain | |
| DEG10020009 | SA0085 | tRNA-dihydrouridine synthase | tRNA-dihydrouridine synthase | Dihydrouridine synthase | Dihydrouridine synthase-like (DUS-like) FMN-binding domain | |
| DEG10020012 | SA0181 | Isochorismatase | Isochorismatase-like domain | Isochorismatase family | Cysteine hydrolases (also contains isochorismatase) | |
| DEG10020015 | SA0230 | Hypothetical | Unknown function | Protein of unknown function | Uncharacterized conserved protein YeaO, DUF488 family | |
| DEG10020017 | SA0348 | Chromosome partitioning protein ParB | ParB/RepB/Spo0J partition protein family | ParB-like nuclease domain | ParB-like nuclease domain | |
| DEG10020018 | SA0351 | GTP-binding protein YchF | Ribosome-binding ATPase YchF/Obg-like ATPase 1 | MMR_HSR1, YchFGTPase-C | YchF GTPase | |
| DEG10020025 | SA0422 | NLPA lipoprotein | Lipoprotein NLPA family | NLPA lipoprotein | PBP2_lipoprotein_GmpC | |
| DEG10020027 | SA0437 | Nucleoid associated protein, YbaB/EbfC family | Nucleoid-associated protein YbaB/EbfC family | YbaB DNA binding family | Hypothetical protein | |
| DEG10020029 | SA0446 | Hypothetical | Domain: GIY-YIG nuclease superfamily | GIY-YIG catalytic domain | Predicted endonuclease, GIY-YIG superfamily | |
| DEG10020030 | SA0447 | 16S rRNA (2'-O) methyltransferase | rRNA small subunit methyltransferase I | Tetrapyrrole methylase | 16S rRNA C1402 (ribose-2'-O) methylase RsmI | |
| DEG10020032 | SA0449 | Hydrolase TatD | TatD family | TatD DNase | TatD (DNase activity) | |
| DEG10020036 | SA0464 | Hypothetical | RNA-binding protein, HP1423 type | TatD_Dnase | HslR, heat shock protein, contains S4 domain | |
| DEG10020037 | SA0467 | tRNA(Ile)-lysidine synthase | tRNA(Ile)-lysidine synthase | ATP bind-3, TilS-C substrate domain | tRNA(Ile)-lysidine synthase TilS/MesJ | |
| DEG10020054 | SA0560 | HD domain containing protein | Domain: HD/PDEase domain | HD-domain | Metal dependent phosphohydrolases with conserved 'HD' motif | |
| DEG10020063 | SA0703 | YigZ family protein | Impact family | Uncharacterized protein | Uncharacterized protein family | |
| DEG10020069 | SA0722 | Sporulation regulation WhiA | Sporulation regulator WhiA-like | whiA_N-terminal | WhiA C-terminal HTH domain | |
| DEG10020074 | SA0732 | Hypothetical | Unknown | Unknown | Unknown | |
| DEG10020076 | SA0771 | Methionine ABC transporter substrate binding protein | Lipoprotein NlpA family | Lipoprotein | ABC-type metal ion transport system | |
| DEG10020077 | SA0772 | CsbD family protein | CsbD-like | CsbD-like | Uncharacterized conserved protein YjbJ | |
| DEG10020093 | SA0940 | Ribonuclease J | Ribonuclease J | Lactamase-B, Zn-dependent metallo hydrolase RNA species | RNAaseJ, MBL-fold metallo-hydrolase domain | |
| DEG10020098 | SA0956 | Manganese transporter | NRAMP family | NRAMP | Manganese transport protein MntH / Natural resistance-associated macrophage protein (NRAMP) | |
| DEG10020106 | SA0998 | Non-canonical purine NTP pyrophosphatase | Inosine triphosphate pyrophosphatase-like | Ham1p-like | Nucleoside-triphosphatase | |
| DEG10020111 | SA1031 | YqqS family pyridoxal phosphate enzyme | Uncharacterized protein family | Ala_racemase_N | Uncharacterized pyridoxal phosphate-containing protein | |
| DEG10020128 | rbgA | Ribosome biogenesis GTPase YlqF | GTPase, MTG1 | MMR_HSR (ribosome binding GTPase) | Circularly permuted YlqF GTPase | |
| DEG10020145 | SA1147 | GTPase HflX | GTPase HflX | GTP binding GTPase | HflX GTPase family | |
| DEG10020146 | SA1176 | DUF896 family protein | Unknown function | DUF896 | Bacterial protein of unknown function (DUF896) | |
| DEG10020148 | SA1187 | Glycerol-3-phosphate acyltransferase | Glycerol-3-phosphate acyltransferase, PlsY | G3P-acyltransferase | Putative glycerol-3-phosphate acyltransferase PlsY | |
| DEG10020155 | SA1252 | Acyltransferase | Domain: Acyl-CoA N-acyltransferase | Acyltransferase | Acetyltransferase (GNAT) family | |
| DEG10020158 | SA1277 | RNA methyltransferase | THUMP domain | THUMP | THUMP domain associated with S-adenosylmethionine-dependent methyltransferases | |
| DEG10020161 | engA | Ribosome biogenesis GTPase Der | GTP-binding protein EngA | MMR-HSR1 (ribosome binding GTPase) | EngA2 GTPase | |
| DEG10020177 | SA1445 | Putative cytosolic protein | Unknown function | DUF965 | Hypothetical protein | |
| DEG10020193 | SA1509 | NrdR family transcription regulator | Ribonucleotide reductase regulator NrdR-like | ATP-cone domain | Transcriptional regulator NrdR, contains Zn-ribbon and ATP-cone domains | |
| DEG10020225 | vga | ABC-transporter ATP-binding protein | Domain: P-loop containing nucleoside triphosphate hydrolase | ABC-transporter | ABC transporter C-terminal domain | |
| DEG10020228 | SA1885 | DEAD/DEAH box family ATP-dependent RNA helicase | DEAD-box ATP-dependent RNA helicase CshA | DEAD/DEAH box helicase | DEAD-box helicases, Helicase superfamily C-terminal domain | |
| DEG10020241 | SA1957 | Haloacid dehalogenase, caf-like hydrolase | HAD-superfamily hydrolase, subfamily IIB | Hydrolase | Haloacid dehalogenase-like hydrolases | |
| DEG10020245 | SA1966 | Hypothetical | YbbR-like | YbbR-like protein (Hypothetical) | YbbR-like protein | |
| DEG10020249 | SA2019 | Energy-coupling factor transporter protein EcfT | ABC/ECF transporter, transmembrane component | CbiQ | Energy-coupling factor transporter transmembrane protein EcfT | |
| DEG10020250 | SA2020 | Energy-coupling factor transporter ATPase | Energy-coupling factor transporter ATP-binding protein EcfA2 | ABC-transporter | ATP-binding cassette component of cobalt transport system | |
| DEG10020288 | SA2279 | Phosphoglucomutase | Alpha-D-phosphohexomutase superfamily | PGM-PMM-I | CD includes PGM2 (phosphoglucomutase 2) and PGM2L1 (phosphoglucomutase 2-like 1) | |
| DEG10020298 | vraD | ABC-transporter ATP-binding protein | Domain: P-loop containing nucleoside triphosphate hydrolase | ABC-transporter | ABC-type lipoprotein export system, ATPase component | |

Table 2: List of functionally annotated essential hypothetical proteins in S. aureus N315

| Gene name | Function | Enzyme / functional class |
|---|---|---|
| SA0021 | Metallo-beta lactamase | Hydrolase |
| SA0085 | tRNA-dihydrouridine synthase | Ligase |
| SA0181 | Isochorismatase | Hydrolase |
| SA0230 | Hypothetical | Unknown |
| SA0348 | ParB-like nuclease domain | Hydrolase |
| SA0351 | YchF GTPase | Hydrolase |
| SA0422 | NLPA lipoprotein | Permease |
| SA0437 | Nucleoid associated protein, YbaB/EbfC family | Binding protein |
| SA0446 | GIY-YIG nuclease superfamily | Hydrolase |
| SA0447 | rRNA small subunit methyltransferase I | Transferase |
| SA0449 | TatD family | Metallo-enzymes |
| SA0464 | RNA-binding protein, HP1423 type | Binding protein |
| SA0467 | tRNA(Ile)-lysidine synthase | Ligase |
| SA0560 | HD-domain | Hydrolase |
| SA0703 | Impact family | Miscellaneous |
| SA0722 | Sporulation regulator WhiA-like | Binding protein |
| SA0732 | Hypothetical | Unknown |
| SA0771 | NLPA lipoprotein | Permease |
| SA0772 | CsbD family protein | Miscellaneous |
| SA0940 | Ribonuclease J | Hydrolase |
| SA0956 | NRAMP family | Miscellaneous |
| SA0998 | Inosine triphosphate pyrophosphatase-like | Hydrolase |
| SA1031 | Pyridoxal phosphate-containing protein | Lyase |
| rbgA | GTPase, MTG1 | Hydrolase |
| SA1147 | GTPase HflX | Hydrolase |
| SA1176 | Unknown function | Unknown |
| SA1187 | Glycerol-3-phosphate acyltransferase, PlsY | Transferase |
| SA1252 | Acyl-CoA N-acyltransferase | Transferase |
| SA1277 | THUMP domain | Binding protein |
| engA | GTP-binding protein EngA | Hydrolase |
| SA1445 | Unknown function | Unknown |
| SA1509 | Ribonucleotide reductase regulator NrdR-like | Oxidoreductase |
| vga | P-loop containing nucleoside triphosphate hydrolase | Hydrolase |
| SA1885 | DEAD-box ATP-dependent RNA helicase CshA | Helicase |
| SA1957 | HAD-superfamily hydrolase, subfamily IIB | Hydrolase |
| SA1966 | Hypothetical | Unknown |
| SA2019 | CbiQ | Transport protein |
| SA2020 | Energy-coupling factor transporter ATP-binding protein EcfA2 | Transport protein |
| SA2279 | Alpha-D-phosphohexomutase superfamily | Isomerase |
| vraD | P-loop containing nucleoside triphosphate hydrolase | Hydrolase |

Fig. 1: Complete framework used for functional annotation of essential hypothetical proteins. DEG: Database of Essential Genes; EPs: Essential proteins; EHPs: Essential hypothetical proteins

[Pie chart. Classes shown: Hydrolases (35%), Oxidoreductases, Ligases, Lyases, Isomerases, Transferases, Metalloenzymes, Permeases, Binding proteins, Miscellaneous proteins, Helicases, Transporters, Unknown.]

Fig. 2: The distribution of functional classes of essential hypothetical proteins.