Accepted Manuscript
Title: Functional assignment for essential hypothetical proteins of Staphylococcus aureus N315
Authors: Jyoti Prava, Pranavathiyani G, Archana Pan
PII: S0141-8130(17)32061-5
DOI: https://doi.org/10.1016/j.ijbiomac.2017.10.169
Reference: BIOMAC 8464
To appear in: International Journal of Biological Macromolecules
Received date: 23-6-2017
Revised date: 26-9-2017
Accepted date: 26-10-2017
Please cite this article as: Jyoti Prava, Pranavathiyani G, Archana Pan, Functional assignment for essential hypothetical proteins of Staphylococcus aureus N315, International Journal of Biological Macromolecules https://doi.org/10.1016/j.ijbiomac.2017.10.169
Functional assignment for essential hypothetical proteins of Staphylococcus aureus N315
Jyoti Prava, Pranavathiyani G and Archana Pan* Centre for Bioinformatics, Pondicherry University, Pondicherry, India 605014.
*Corresponding Author: Archana Pan Centre for Bioinformatics Pondicherry University Pondicherry-605014, India Tel: +91-0413-2654584 Email:
[email protected]
Abstract
Staphylococcus aureus, the causative agent of nosocomial infections worldwide, has acquired resistance to almost all antibiotics, stressing the need to develop novel drugs against this pathogen. In S. aureus N315, 302 genes have been identified as essential, i.e., indispensable for growth and survival of the pathogen. The functions of 40 proteins encoded by these essential genes are unknown, and these proteins are therefore referred to as essential hypothetical proteins (EHPs). The present study aims to functionally characterize the EHPs using bioinformatics tools and databases, whose performance was assessed by receiver operating characteristic (ROC) curve analysis. Evaluation of physicochemical parameters, homology search against known proteins, domain analysis, subcellular localization analysis and virulence prediction assisted in characterizing the EHPs. Functions were assigned to 35 EHPs with high confidence. They belong to different functional classes, namely enzymes, binding proteins, miscellaneous proteins, helicases, transporters and virulence factors. Around 35% of the EHPs belong to the hydrolase family, and 32.5% were predicted as virulence factors. Of the 35 annotated EHPs, 19 essential pathogen-specific proteins were considered probable drug targets. Two targets were found to be druggable and the others were novel targets. The outcome of the study could aid in identifying novel drugs for better treatment of S. aureus infections.
Keywords: Staphylococcus aureus, Essential hypothetical proteins, Functional annotation, Drug targets
Introduction
Staphylococcus aureus is a Gram-positive bacterium belonging to the phylum Firmicutes. The organism is the etiological agent of a wide variety of human diseases, ranging from mild skin infections and food poisoning to severe, life-threatening pneumonia, meningitis, endocarditis, osteomyelitis, sepsis and toxic shock syndrome. According to the World Health Organization, S. aureus infections pose a greater threat to mankind than cancer. Over the past decades, the incidence of S. aureus infections has increased dramatically owing to the emergence of antibiotic-resistant strains, such as methicillin-resistant S. aureus (MRSA) [1] and vancomycin-resistant S. aureus (VRSA) [2]. In particular, MRSA has overcome almost all antibiotics currently available on the market and is considered the major cause of nosocomial infections worldwide [1]. This scenario emphasizes the need to develop new drugs for preventing and controlling such infections.
A small fraction of the genes in a genome is absolutely necessary for the growth and survival of an organism, and these genes are thus considered the foundation of life. They are termed essential genes, and the proteins encoded by them are referred to as essential proteins. Theoretically, all essential genes/proteins in a genome are potential drug targets, as deletion or inactivation of such genes/proteins is lethal for the organism. Thus, the prediction of gene essentiality in a pathogenic microorganism could help to shortlist potential drug targets for designing antimicrobial agents. Understanding the function of essential genes/proteins is a necessary step towards exploring the basic principles of cell functionality, which in turn helps to understand the pathogen as a system. The Database of Essential Genes (DEG) is a repository of essential genomic elements, such as protein-coding genes and non-coding RNAs, identified experimentally in several bacteria, archaea and eukaryotes [3]. A large number of essential genes of S. aureus N315 have been identified using the antisense RNA technique [4,5]. S. aureus N315 comprises 2624 protein-coding genes, of which 302 are reported as essential genes in DEG. The function of 40 proteins encoded by essential genes of this pathogen is unknown, opening an avenue to functionally characterize these proteins. Proteins of unknown function are referred to as hypothetical proteins (HPs). Functional characterization of hypothetical proteins can lead to the identification of novel therapeutic targets, facilitating the process of drug repositioning [6]. Further classification of hypothetical proteins into different functional categories could shed light on their structures, activities and roles in metabolism [6]. Several bioinformatics tools and databases are available for functional characterization of HPs [6]. These have been successfully used to annotate the function of uncharacterized HPs of various pathogens, including Vibrio cholerae O139, Candida dubliniensis, Chlamydia trachomatis, Leptospira interrogans, Rickettsia massiliae MTU5, Mycobacterium tuberculosis, Haemophilus influenzae, Rickettsia rickettsii, Neisseria meningitidis MC58, Mycobacterium leprae Br4923, Treponema pallidum and Borrelia burgdorferi [7–18]. In the present study, the functional characterization of 40 essential hypothetical proteins (EHPs) of S. aureus N315 was carried out using bioinformatics tools and databases, which enabled us to annotate the function of 35 EHPs. The performance of the function prediction tools was evaluated using receiver operating characteristic (ROC) curve analysis [19]. Furthermore, host non-homology analysis revealed that 19 of the 35 proteins are exclusively present in the pathogen and can thus be considered potential drug target candidates. Among these 19 proteins, two were found to be druggable and the rest were considered novel targets. The identified targets can be further validated experimentally to design and develop novel drugs for treating S. aureus infections.
Materials and Methods
Sequence retrieval and analysis
Essential genes of S. aureus N315 were retrieved from the Database of Essential Genes (DEG) [3]. A total of 302 essential genes were found, and 40 of the proteins they encode were annotated as hypothetical. Because proteins encoded by essential genes are considered essential proteins, these 40 proteins were termed essential hypothetical proteins (EHPs). The framework used for functional annotation of the EHPs is given in Fig. 1.
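The selection step described above can be sketched in a few lines. This is only an illustration under stated assumptions: it supposes that the essential-gene records are available as FASTA text whose header lines carry the product description, and that hypothetical proteins can be recognised by the keyword "hypothetical protein". Neither the header layout nor the keyword is taken from the DEG documentation, and the function names and toy records are invented.

```python
from typing import Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def select_hypothetical(
    text: str, keyword: str = "hypothetical protein"
) -> List[Tuple[str, str]]:
    """Keep records whose header mentions the keyword (case-insensitive).

    The keyword is an assumption about how the records are annotated.
    """
    key = keyword.lower()
    return [(h, s) for h, s in parse_fasta(text) if key in h.lower()]


# Toy input: identifiers and sequences are invented for illustration only.
EXAMPLE = """>toy_0001 DNA gyrase subunit A
MSELPQ
>toy_0002 hypothetical protein
MKKLLA
VIG
>toy_0003 conserved Hypothetical Protein
MTTQ
"""

if __name__ == "__main__":
    for header, seq in select_hypothetical(EXAMPLE):
        print(header, len(seq))
```

Applied to the real data set, a filter of this kind would be expected to reduce the 302 essential proteins to the 40 EHPs, provided the annotation text matches the assumed keyword.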
Functional assignment and domain analysis
Functions were assigned to all 40 EHPs of S. aureus using publicly available bioinformatics databases and tools, namely BLAST, Pfam, InterProScan and the Conserved Domain Database (CDD). BLAST is used to identify homologous proteins having identical or similar functions [20]. The Pfam database is a large collection of annotated protein families, each represented by multiple sequence alignments and hidden Markov models [21]. InterProScan scans an input sequence for matches against the InterPro protein signature databases, combining the different signature recognition methods of the InterPro consortium [22]. Functional motifs/domains present in the EHPs were determined using CDD [23], Pfam and InterProScan.
Physicochemical characterization
Theoretical physicochemical parameters of each protein, such as molecular weight, aliphatic index, isoelectric point, instability index and grand average of hydropathicity (GRAVY), were computed using ExPASy's ProtParam server [24]. The predicted values are listed in Table S1.
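Two of the parameters listed above have simple closed forms and can be reproduced offline. The sketch below is not the ProtParam implementation; it assumes the standard definitions, namely GRAVY as the mean Kyte-Doolittle hydropathy value per residue and the aliphatic index as X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)], where X denotes mole percent. The function names are illustrative.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean hydropathy over all residues.

    Raises KeyError if the sequence contains a non-standard residue.
    """
    seq = seq.upper()
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Aliphatic index from the mole percent of Ala, Val, Ile and Leu."""
    seq = seq.upper()
    n = len(seq)

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / n

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


if __name__ == "__main__":
    toy = "MKVLAAGIVALL"  # invented sequence, for illustration only
    print(round(gravy(toy), 3), round(aliphatic_index(toy), 2))
```

A positive GRAVY value indicates an overall hydrophobic protein and a negative value a hydrophilic one.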
Subcellular localization analysis
The tools PSORTb, CELLO and SOSUI were used to predict the subcellular localization of the hypothetical proteins (Table S1). PSORTb predicts the subcellular localization of bacterial proteins and is available both as an online version and as a standalone version [25]. CELLO is a support vector machine (SVM) based online prediction system for subcellular localization [26]. SOSUI distinguishes between membrane proteins and soluble proteins from amino acid sequences and predicts the transmembrane helices for the former [27].
Virulence factor prediction
The virulent nature of the EHPs was predicted using both VirulentPred and VICMpred (Table S1). VICMpred is a web server that functionally classifies bacterial proteins into virulence factors, information molecules, cellular process proteins and metabolism molecules. Both VirulentPred and VICMpred are SVM-based methods that use patterns as well as the amino acid and dipeptide composition of bacterial protein sequences to predict virulence factors, with overall accuracies of 81.8% and 70.75%, respectively [28,29].
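To make the composition-based features mentioned above concrete, the sketch below computes the 20-dimensional amino acid composition and the 400-dimensional dipeptide composition of a sequence as percentages. It illustrates the general kind of feature vector that can be supplied to an SVM; it is not the actual feature pipeline of VirulentPred or VICMpred, and the function names are illustrative.

```python
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so vectors are comparable.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def aa_composition(seq: str) -> List[float]:
    """Percentage of each of the 20 standard residues in the sequence."""
    seq = seq.upper()
    n = len(seq)
    return [100.0 * seq.count(aa) / n for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> List[float]:
    """Percentage of each ordered residue pair among all overlapping pairs.

    Requires a sequence of at least two residues; pairs containing
    non-standard residues are ignored in the counts.
    """
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = {dp: 0 for dp in DIPEPTIDES}
    for pair in pairs:
        if pair in counts:
            counts[pair] += 1
    return [100.0 * counts[dp] / len(pairs) for dp in DIPEPTIDES]


if __name__ == "__main__":
    toy = "MKVLAAGIVALL"  # invented sequence, for illustration only
    features = aa_composition(toy) + dipeptide_composition(toy)
    print(len(features))
```

Concatenating the two vectors gives a fixed-length representation regardless of protein length, which is what makes such features convenient for SVM classifiers.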
Evaluation of performance
The performance of the bioinformatics tools used for predicting protein function was evaluated using ROC curve analysis [19], one of the most widely used statistical methods for evaluating the accuracy of diagnostic tests and tools. For each tool, five levels were considered to rate its efficiency. The input data contain two columns: the first holds the binary value 0 or 1, depending on whether the prediction is a true negative (0) or a true positive (1); the second holds the efficiency rating, an integer from 1 to 5, where a higher number signifies a greater confidence level. The ROC curve analysis was performed for the four tools applied to 100 protein sequences of S. aureus with known functions. The results were submitted to the online software in format-1 [30]. Upon executing the online ROC program, measures such as accuracy, sensitivity and specificity were obtained; they are reported in Tables S2 & S3.
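The two-column input described above (a binary label and a 1 to 5 confidence rating) is sufficient to construct an empirical ROC curve. The sketch below is an assumption-laden illustration rather than a reproduction of the online program [30], which may fit a smoothed curve: it builds the empirical curve by sweeping the rating cut-off and integrates it with the trapezoidal rule. The function names and the toy data are invented.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], ratings: Sequence[int]) -> List[Tuple[float, float]]:
    """Empirical ROC points (false positive rate, true positive rate).

    A case is called positive when its rating is >= the cut-off; the
    cut-off is swept from the highest to the lowest observed rating.
    Both classes (0 and 1) must be present in `labels`.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for cut in sorted(set(ratings), reverse=True):
        tp = sum(1 for y, r in zip(labels, ratings) if y == 1 and r >= cut)
        fp = sum(1 for y, r in zip(labels, ratings) if y == 0 and r >= cut)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2.0
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


def sensitivity_specificity(
    labels: Sequence[int], ratings: Sequence[int], cut: int
) -> Tuple[float, float]:
    """Sensitivity and specificity when ratings >= cut are called positive."""
    tp = sum(1 for y, r in zip(labels, ratings) if y == 1 and r >= cut)
    tn = sum(1 for y, r in zip(labels, ratings) if y == 0 and r < cut)
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    return tp / pos, tn / neg


if __name__ == "__main__":
    # Toy data, invented for illustration: label, then rating (1 to 5).
    labels = [1, 1, 1, 0, 1, 0, 0, 1]
    ratings = [5, 4, 4, 2, 3, 3, 1, 5]
    print(auc(roc_points(labels, ratings)))
    print(sensitivity_specificity(labels, ratings, cut=3))
```

An area close to 1 indicates that high ratings are concentrated on correct predictions, whereas an area near 0.5 indicates that the ratings carry no discriminating information.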
Target identification
From the functionally annotated proteins, pathogen-specific proteins, i.e., proteins present in the pathogen but absent in the host, were identified using host non-homology analysis. A subsequent druggability analysis examined the druggable property of these proteins.
Host non-homology analysis
All functionally annotated proteins were subjected to a protein BLAST (BLASTp) search [20] against the non-redundant database of the human proteome with an E-value threshold of 0.0001 and a bit score cut-off of 100 [31,32]. Protein sequences that showed no significant hits were selected for further analysis.
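The filtering logic of this step can be sketched as follows, assuming the BLASTp results were saved in the standard 12-column tabular format (query ID in column 1, E-value in column 11, bit score in column 12). How the two cut-offs are combined is an assumption made here: a hit is treated as significant only if its E-value is at most 0.0001 and its bit score is at least 100. The function names and toy rows are invented.

```python
from typing import Iterable, List, Set


def significant_queries(
    blast_tsv: str, max_evalue: float = 1e-4, min_bitscore: float = 100.0
) -> Set[str]:
    """Return query IDs that have at least one hit passing both cut-offs."""
    hits: Set[str] = set()
    for line in blast_tsv.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue <= max_evalue and bitscore >= min_bitscore:
            hits.add(fields[0])
    return hits


def non_homologous(
    all_queries: Iterable[str], blast_tsv: str,
    max_evalue: float = 1e-4, min_bitscore: float = 100.0,
) -> List[str]:
    """Queries with no significant human hit, in their original order."""
    hits = significant_queries(blast_tsv, max_evalue, min_bitscore)
    return [q for q in all_queries if q not in hits]


# Toy rows in 12-column tabular layout; all identifiers and values are invented.
EXAMPLE_ROWS = "\n".join([
    "\t".join(["toy_A", "human_1", "45.0", "200", "110", "0",
               "1", "200", "5", "204", "1e-50", "180"]),
    "\t".join(["toy_B", "human_2", "30.0", "60", "42", "0",
               "1", "60", "10", "69", "2e-5", "48.1"]),
    "\t".join(["toy_C", "human_3", "25.0", "40", "30", "0",
               "1", "40", "3", "42", "0.5", "25.0"]),
])

if __name__ == "__main__":
    print(non_homologous(["toy_A", "toy_B", "toy_C", "toy_D"], EXAMPLE_ROWS))
```

Queries that produce no rows at all (toy_D in the toy input) are retained, since the absence of any alignment also means the absence of a significant hit.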
Druggability analysis
A druggable target should have the potential to bind drug-like molecules with high affinity. DrugBank is a bioinformatics resource containing information on drugs and drug targets [33]. ChEMBL is a database of manually curated molecules with drug-like properties and biological activity against drug targets [34]. The short-listed host non-homologous proteins from the previous analysis were subjected to a homology search against the DrugBank and ChEMBL targets. The presence of a non-homologous protein in the DrugBank/ChEMBL target list with the same function serves as evidence of its druggability. Its absence, on the other hand, indicates the novelty of the protein as a target, which is then referred to as a ‘novel target’ [35].
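The decision rule stated above reduces to a set-membership test once the homology search has been run. The sketch below assumes that the search results have already been condensed into a set of candidate identifiers with a functionally matching DrugBank or ChEMBL target; how that set is derived is outside the sketch. The function name is illustrative and `toy_X` is an invented placeholder identifier.

```python
from typing import Dict, Iterable, Set


def classify_targets(candidates: Iterable[str], db_matches: Set[str]) -> Dict[str, str]:
    """Label each candidate as 'druggable' or 'novel target'.

    `db_matches` is assumed to hold the candidates that have a homologous
    DrugBank/ChEMBL target with the same function.
    """
    return {
        protein: ("druggable" if protein in db_matches else "novel target")
        for protein in candidates
    }


if __name__ == "__main__":
    # SA0940 and SA0021 are the two druggable proteins reported in this study;
    # toy_X is a placeholder for any candidate without a database match.
    labels = classify_targets(["SA0940", "SA0021", "toy_X"], {"SA0940", "SA0021"})
    for protein, label in labels.items():
        print(protein, label)
```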
Results and discussion
Essential genes/proteins are absolutely required for the survival of an organism, and identifying them leads to a better understanding of the principles of life. A total of 302 essential genes of S. aureus N315 were retrieved from DEG, and 40 proteins encoded by these essential genes were identified as hypothetical proteins. Until now there has been no experimental study to characterise these hypothetical proteins; an attempt was therefore made to annotate their function using an in silico approach. Bioinformatics tools such as BLAST, Pfam, InterProScan and CDD were used to assign functions to these proteins, and the performance of these tools was assessed by ROC curve analysis (Table S2). The functions of 35 EHPs were assigned with high confidence (Table 1), and the proteins fall into different functional categories, namely enzymes (hydrolases, oxidoreductases, ligases, lyases, isomerases, transferases, metalloenzymes, permeases), binding proteins, miscellaneous proteins, helicases, transporters and virulence factors (Fig. 2). Detailed knowledge about these functional groups is important for understanding the molecular basis of pathogenesis and host-pathogen interaction. Each group of proteins is described below.
Functional annotation
Hydrolases
A hydrolase is an enzyme that catalyses the hydrolysis of a chemical bond. The genomes of both Gram-negative and Gram-positive bacteria encode a wide variety of hydrolases responsible for the specific cleavage of different peptidoglycan bonds. Hydrolases are also involved in many other functions, such as peptidoglycan maturation, turnover, recycling, autolysis, and cleavage of the septum during cell division [36]. In the present study, the largest group of proteins (14 EHPs) was predicted as hydrolases. This hydrolase family includes a number of subfamilies (viz., metallo-beta-lactamase, isochorismatase, ParB, YchF-GTPase, GIY-YIG nuclease, HD domain, ribonuclease J, inosine triphosphate pyrophosphatase, GTPase HflX, EngA, P-loop nucleoside triphosphate hydrolase and HAD hydrolase). The complete list of functional categories is shown in Table 2. Beta-lactamase confers antibiotic resistance by hydrolysing the beta-lactam ring of the antibiotic. The metallo-beta-lactamase family (SA0021) includes thiolesterases of the glyoxalase II family, which bind two zinc ions per molecule as cofactors. Glyoxalase II catalyses the hydrolysis of S-D-lactoyl-glutathione to form glutathione and D-lactic acid [37]. The systematic name of isochorismatase (SA0181) is isochorismate pyruvate-hydrolase; it acts specifically on ether bonds (ether hydrolases) and catalyses the conversion of isochorismate, in the presence of water, to 2,3-dihydroxy-2,3-dihydrobenzoate and pyruvate [38]. The ParB-like nuclease domain (SA0348) is found in the Escherichia coli plasmid protein ParB and in sulfiredoxin-1. ParB is localized to both poles of the pre-divisional cell following completion of DNA replication [39]. The parABS system is reported to be a broadly conserved molecular mechanism for plasmid partitioning and chromosome segregation in bacteria. It consists mainly of three components, namely the ParA ATPase, the ParB DNA-binding protein and the cis-acting parS sequence [40]. GTPases are often described as molecular switches. The YchF-GTPase protein domain (SA0351) is located at the C-terminus of the GTP-binding protein. It may be required for ribosome function or for signal transduction from the ribosome to downstream targets [41]. The GIY-YIG superfamily (SA0446) groups nucleases of approximately 100 amino acids with two short motifs, "GIY" and "YIG", in the N-terminal part, followed by an arginine residue in the centre and a glutamic acid residue in the C-terminal part. The GIY-YIG domain is implicated in cellular processes such as DNA cleavage, transfer of mobile genetic elements, restriction of foreign DNA, DNA repair and maintenance of genome stability [41,42]. The HD domain (SA0560) belongs to the superfamily of phosphohydrolases and participates mainly in nucleic acid metabolism and signal transduction. Its highly conserved histidine and aspartate residues are essential for activity [43]. Ribonuclease J proteins (SA0940) are about 50 to 77 kDa in size and carry three conserved histidine residues in the central region. The family is mostly related to the N-terminal region of the beta-lactamase family. Ribonuclease J cleaves the 5'-leader sequence of certain mRNAs and may play a role in the maturation and stability of specific mRNAs [44]. Inosine triphosphate pyrophosphatase (ITPA) (SA0998) hydrolyses the non-canonical purine nucleotides inosine triphosphate (ITP), xanthosine 5'-triphosphate (XTP), 2'-deoxy-N-6-hydroxylaminopurine triphosphate (dHAPTP) and deoxyinosine triphosphate (dITP) to their respective monophosphate derivatives. ITPA acts on both the deoxy- and ribose forms of these nucleotides. To avoid chromosomal lesions, it excludes non-canonical purines from the RNA and DNA precursor pools, thereby preventing their incorporation [45]. Mitochondrial GTPase (MTG1) (rbgA), which belongs to the MMR1/HSR1 GTP-binding protein family, is required for mitochondrial translation [46]. MMR_HSR1 functions as a 50S ribosome-binding GTPase. The full-length GTPase is required for complete activity of the protein, which interacts with the 50S ribosome and binds both adenine and guanine nucleotides, with a preference for guanine nucleotides [46,47]. The GTPase HflX (SA1147) family belongs to the conserved GTP-binding proteins having pleiotropic effects. HflX is a membrane-associated protease pair possessing housekeeping function. It is encoded downstream of the RNA chaperone Hfq and upstream of HflKC. The characteristic feature of this family is a conserved domain having a glycine-rich segment at the N-terminal region of the putative GTP-binding domain. The HflX family is a member of the translation factor superfamily (TRAFAC class), which belongs to the GTPase superclass of P-loop nucleoside triphosphatases [48]. The EngA protein belongs to the GTPase Der subfamily and shows GTP-binding and GTP-hydrolysis activities as intrinsic biochemical properties [49]. P-loop nucleoside triphosphate hydrolases catalyse the hydrolysis of the beta-gamma phosphate bond of a bound nucleoside triphosphate (NTP). The energy from NTP hydrolysis induces conformational changes that are important for biological function. HAD hydrolase, subfamily IIB (SA1957) is part of the haloacid dehalogenase (HAD) superfamily of aspartate-nucleophile hydrolases. The Class II subfamilies possess a characteristic domain positioned between the second and third conserved catalytic motifs of the superfamily domain. It has a predicted structure of Helix-Sheet-Sheet-(Helix or Sheet)-Helix-Sheet-(variable)-Helix-Sheet-Sheet [50].
Oxidoreductases
One EHP (SA1509) was found to belong to the oxidoreductase family. Ribonucleotide reductases (RNRs) (SA1509) are essential enzymes that catalyse the reduction of ribonucleotides to their respective deoxyribonucleotides, thus providing the precursors necessary for DNA synthesis. Proteins in this entry are orthologous to the novel transcription regulator NrdR [51].
Ligases
Two EHPs (SA0085 and SA0467) were predicted as ligases in the present study. A ligase catalyses the joining of two large molecules by forming a new chemical bond. tRNA-dihydrouridine synthase (SA0085) catalyses the reduction of the 5,6-double bond of a uridine residue on tRNA. Most dihydrouridines are found in the D loop of tRNAs [52]. tRNA(Ile) lysidine synthetase (TilS) (SA0467) catalyses lysidine formation using lysine and ATP as substrates. It ligates lysine onto the cytidine at position 34 of the AUA codon-specific tRNA(Ile), which carries the anticodon CAU, in an ATP-dependent manner. The TilS substrate C-terminal domain represents the C-terminal domain of lysidine-tRNA(Ile) synthetase, which ligates lysine onto cytidine 34 [53].
Lyases
In our study, one EHP was predicted as a lyase. Pyridoxal phosphate (SA1031) is the active form of vitamin B6 (pyridoxine or pyridoxal). A number of pyridoxal-dependent decarboxylases share regions of sequence similarity, particularly around a conserved lysine residue that provides the attachment site for the pyridoxal phosphate (PLP) group [54]. These enzymes belong to the group II decarboxylases, which include aromatic-L-amino-acid decarboxylase, tyrosine decarboxylase and L-aspartate decarboxylase.
Isomerases
Herein, one EHP was identified as an isomerase. Alanine racemase provides the D-alanine required for cell wall (peptidoglycan) biosynthesis by isomerising L-alanine to D-alanine. The alanine racemase monomer is composed of two domains: an eight-stranded alpha/beta barrel at the N-terminus and a C-terminal domain composed essentially of beta-strands [55]. The alpha-D-phosphohexomutase superfamily (SA2279) is composed of four related enzymes, viz., phosphoglucomutase (PGM), phosphoglucomutase/phosphomannomutase (PGM/PMM), phosphoglucosamine mutase (PNGM) and phosphoacetylglucosamine mutase (PAGM), each of which catalyses a phosphoryl transfer on its sugar substrates [56].
Transferases
Three EHPs were predicted as transferases. Transferases are involved in innumerable reactions of the cell, including translation. rRNA small subunit methyltransferase I (SA0447) catalyses 2'-O-methylation of the ribose of cytidine 1402 (C1402) in 16S rRNA using S-adenosyl-L-methionine (SAM or Ado-Met) as the methyl donor. RsmI proteins use the 30S subunit as a substrate, suggesting that the methylation reaction occurs at a late step during 30S assembly in the cell [57]. PlsY (SA1187) is a glycerol-3-phosphate acyltransferase (GPAT) that catalyses the transfer of an acyl group from acyl-ACP to glycerol-3-phosphate to form lysophosphatidic acid (LPA) [58]. Acyl-CoA N-acyltransferase (SA1252) has a three-layer alpha/beta/alpha structure containing mixed beta-sheets; this fold is found in N-acetyltransferase (NAT) family members [59], autoinducer synthetases [60], leucyl/phenylalanyl-tRNA-protein transferase (LFTR) and ornithine decarboxylase antizyme.
Metalloenzymes
One EHP was predicted as a metalloenzyme. About one quarter to one third of all proteins are proposed to require metals to carry out their functions. Metalloproteins perform different functions in cells, such as storage and transport of proteins, and also act as enzymes and signal transduction proteins. The TatD family (SA0449) is related to the metalloenzyme superfamily, which includes TatD and many putative deoxyribonucleases and metal-dependent hydrolases [61].
Permeases
Two EHPs in this study were predicted as permeases. Permeases are membrane transport proteins that facilitate the diffusion of specific molecules into and out of cells. The lipoprotein NlpA family protein (SA0422, SA0771) is a component of a D-methionine permease, a binding-protein-dependent, ATP-driven transport system [62].
Nucleic acid binding proteins
Four EHPs were found to be binding proteins. Nucleoid-associated protein YbaB/EbfC (SA0437) is a family of DNA-binding proteins. Members of this family form homodimers, which bind DNA via a tweezer-like structure, leading to conformational changes in the DNA [63]. Sporulation regulator WhiA-like (SA0722) describes a family of DNA-binding proteins widely conserved in Gram-positive bacteria [64]. The family includes the sporulation regulator WhiA, which is required for expression of the ParB partitioning protein during sporogenesis [65]. HP1423-type RNA-binding proteins (SA0464) contain an S4 RNA-binding domain. The S4 domain is a small domain of 60-65 amino acid residues that mediates RNA binding [66]. The structure of HP1423 possesses the αL-RNA binding motif, which is characteristic of several RNA-binding protein families [67]. THUMP (SA1277) is an ancient domain with predicted RNA-binding capacity that probably functions by delivering a variety of RNA modification enzymes to their targets. The THUMP domain has 100-110 amino acid residues and adopts an alpha/beta fold similar to that found in the C-terminal domain of translation initiation factor 3 and in ribosomal protein S8 [68].
Miscellaneous proteins
Three EHPs were identified as miscellaneous proteins. The impact protein (SA0703) is a translational regulator that ensures constant high levels of translation under amino acid starvation. It acts by interacting with Gcn1/Gcn1L1, thereby preventing activation of Gcn2 protein kinases (EIF2AK1 to 4) and the subsequent down-regulation of protein synthesis. It is evolutionarily conserved from eukaryotes to archaea [69]. CsbD (SA0772) is a bacterial general stress response protein whose expression is mediated by sigma-B, an alternative sigma factor [70].
The natural resistance-associated macrophage protein (NRAMP) family comprises Nramp1, Nramp2 and two yeast proteins (Smf1 and Smf2). The members of the NRAMP protein family (SA0956) have a conserved hydrophobic core with ten transmembrane domains [71]. Nramp1, an integral membrane protein, is reported to be expressed solely in cells associated with the immune system, and upon phagocytosis it is recruited to the membrane of the phagosome. Nramp2 is a transporter of divalent cations (viz., Fe2+, Mn2+ and Zn2+) that is known to be expressed at high levels in the mammalian intestine and is a chief transferrin-independent iron uptake system in mammals [72]. The yeast proteins Smf1 and Smf2 have also been reported to transport divalent cations [73].
Helicases DEAD-box ATP-dependent RNA helicase CshA (SA1885) is an enzyme that unwinds dsRNA in both 5'- and 3'-directions. It also has RNA-dependent ATPase activity and plays a role in ribosomal 50S subunit assembly [74]. DEAD box helicases are involved in the process of RNA metabolism, including nuclear transcription, pre-mRNA splicing, nucleocytoplasmic transport, ribosome biogenesis, translation, RNA decay and gene expression in organelles [75].
Transporters
Herein, two EHPs were found to be transporters. CbiQ includes various cobalt transport proteins, most of which are found in cobalamin (vitamin B12) biosynthesis operons. Energy-coupling factor (ECF) transporters are a subgroup of ATP-binding cassette (ABC) transporters involved in the uptake of vitamins and micronutrients in prokaryotes [76]. ECF transporters are protein complexes consisting of a conserved module (two peripheral ATPases and the integral membrane protein EcfT) and a non-conserved integral membrane protein responsible for substrate specificity (the S-component) [77].
Virulence proteins
Virulence factors are molecules produced by pathogenic bacteria, viruses, fungi and protozoa that make them effective and enable them to damage the host. VirulentPred predicted 13 EHPs as virulence factors. VICMpred indicated that, out of 40 EHPs, 17 are involved in cellular processes, 2 in information and storage and 18 in metabolism, and 3 are virulence factors.
Virulence factors are good drug targets, facilitating the design of a new type of therapeutic drug, i.e., antivirulence drugs. An antivirulence drug, targeting a virulence factor, renders the pathogen avirulent. It has been theorized that antivirulence drugs exert much weaker selection for resistance in the pathogen than traditional antibiotics [78].
Subcellular localization
Subcellular localization analysis helps to classify proteins as drug or vaccine targets. Proteins residing in the cytoplasm are regarded as possible drug targets, whereas proteins residing in the membrane are regarded as possible vaccine targets. Using the subcellular localization prediction tools, 26 of the 40 HPs were found to be soluble cytoplasmic proteins and 6 were found to be membrane proteins. Details of each prediction are shown in Table S1.
Potential drug target candidates
An ideal drug target must be essential and pathogen-specific. It should not have any close homolog in the human proteome, to minimize the risk of undesirable cross-reactivity of a potential drug with host proteins. Therefore, a host non-homology analysis was carried out to identify proteins that are non-homologous to the human proteome. The 35 hypothetical proteins functionally annotated with high confidence, as assessed through ROC curve analysis, were subjected to host non-homology analysis using a BLASTp search against the human proteome with an E-value threshold of 0.0001. Out of the 35 proteins, 19 did not show any significant hit and were thus regarded as non-homologous, i.e., solely present in the pathogen. Hence, these proteins can be considered potential drug targets. To assess the druggability of the 19 shortlisted candidate proteins, a druggability analysis was carried out. Out of the 19, two proteins (SA0940 & SA0021) were found to be druggable through the ChEMBL target search. Ribonuclease J (SA0940) has previously been reported as a drug target [79]. It possesses both endo- and exo-ribonuclease activities and plays a key role in pre-rRNA maturation and mRNA decay [80]. Beta-lactamase (SA0021) is a well-known target associated with broad-spectrum antibiotics, such as penicillin derivatives (penams), cephalosporins (cephems), monobactams and carbapenems, used in treating bacterial infections. The rest of the putative drug target candidates can be considered novel targets, which should be further validated experimentally.
Conclusions
Understanding the function of the essential genes of a pathogenic microorganism is of great importance in basic biology and medical science. In the present study, an in silico approach, combining different bioinformatics tools and databases, was used for functional characterization of essential hypothetical proteins from S. aureus N315. ROC curve analysis showed that the four tools considered herein had nearly identical accuracy levels, with only minute discrepancies, suggesting that these tools are reliable for characterizing hypothetical proteins with a high confidence level. The adopted methodology predicted the function of more than 87% of the hypothetical proteins, which belong to important functional categories. However, functional assignment for the rest of the EHPs was not possible owing to the lack of sufficient evidence. Subcellular localization analysis predicted the cellular location of these proteins, and 13 of them were found to be virulent in nature. Further, host non-homology analysis revealed that 19 proteins are pathogen-specific and can be probable drug target candidates. Ribonuclease J (SA0940) and beta-lactamase (SA0021) are two known targets among the 19 pathogen-specific proteins. The remaining proteins were considered ‘novel targets’, which need to be further validated experimentally. The structural analyses of these annotated proteins are underway in our laboratory.
Conflict of Interest Authors declare there is no conflict of interest.
Acknowledgement JP and PG are thankful to Pondicherry University, Pondicherry for the pre-doctoral fellowship. Authors are indebted to Centre for Bioinformatics, Pondicherry University, Pondicherry for providing computational facility. Authors are thankful to Dr. R. Vishnu Vardhan, Department of Statistics, Pondicherry University, Pondicherry, for assisting in ROC curve analysis. DBT, DST and DIT, UGC-SAP, Govt. of India support research work carried out in the Centre for Bioinformatics.
References [1] A.P. Fraise, Bailliere’s Clinical Infectious Diseases: International Practice and Research, Antibiotic Resistance, Vol. 5, no. 2: R. G. Finch and R. J. Williams, Eds. Bailliere Tindall, London, 1999. ISSN 1071-6564, £31.00, J. Antimicrob. Chemother. 46 (2000) 865–a–866. [2] K. Hiramatsu, N. Aritaka, H. Hanaki, S. Kawasaki, Y. Hosoda, S. Hori, Y. Fukuchi, I. Kobayashi, Dissemination in Japanese hospitals of strains of Staphylococcus aureus heterogeneously resistant to vancomycin, Lancet. 350 (1997) 1670–1673. [3] H. Luo, Y. Lin, F. Gao, C.-T. Zhang, R. Zhang, DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements, Nucleic Acids Res. 42 (2013) D574–D580. [4] Y. Ji, B. Zhang, S.F. Van Horn, P. Warren, G. Woodnutt, M.K. Burnham, M. Rosenberg, Identification of critical staphylococcal genes using conditional phenotypes generated by antisense RNA, Science. 293 (2001) 2266–2269. [5] R.A. Forsyth, R.J. Haselbeck, K.L. Ohlsen, R.T. Yamamoto, H. Xu, J.D. Trawick, D. Wall, L. Wang, V. Brown-Driver, J.M. Froelich, K.G. C, P. King, M. McCarthy, C. Malone, B. Misiner, D. Robbins, Z. Tan, Z.-Y. Zhu, G. Carr, D.A. Mosca, C. Zamudio, J.G. Foulkes, J.W. Zyskind, A genome-wide strategy for the identification of essential genes in Staphylococcus aureus, Mol. Microbiol. 43 (2002) 1387–1400. [6] M. Shahbaaz, K. Bisetty, F. Ahmad, M.I. Hassan, Current Advances in the Identification and Characterization of Putative Drug and Vaccine Targets in the Bacterial Genomes, Curr. Top. Med. Chem. 16 (2016) 1040–1069. [7] M.S. Islam, S.M. Shahik, M. Sohel, N.I.A. Patwary, M.A. Hasan, In Silico Structural and Functional Annotation of Hypothetical Proteins of Vibrio cholerae O139, Genomics Inform. 13 (2015) 53–59. [8] K. Kumar, A. Prakash, M. Tasleem, A. Islam, F. Ahmad, M.I. Hassan, Functional annotation of putative hypothetical proteins from Candida dubliniensis, Gene. 543 (2014) 93–100. [9] A.A. Turab Naqvi, S. Rahman, Rubi, F. Zeya, K. Kumar, H. Choudhary, M.S. Jamal, J. Kim, M.I. Hassan, Genome analysis of Chlamydia trachomatis for functional characterization of hypothetical proteins to discover novel drug targets, Int. J. Biol. Macromol. 96 (2017) 234–240.
[10] A.P. Bidkar, In-silico Structural and Functional Analysis of Hypothetical Proteins of Leptospira interrogans, Biochemistry & Pharmacology: Open Access. 03 (2014). doi:10.4172/2167-0501.1000136. [11] J. Hoskeri H, Functional Annotation of Conserved Hypothetical Proteins in Rickettsia massiliae MTU5, J. Comput. Sci. Syst. Biol. 03 (2010). doi:10.4172/jcsb.1000055. [12] G.K. Mazandu, N.J. Mulder, Function prediction and analysis of Mycobacterium tuberculosis hypothetical proteins, Int. J. Mol. Sci. 13 (2012) 7283–7302. [13] M. Shahbaaz, M. Imtaiyaz Hassan, F. Ahmad, Functional Annotation of Conserved Hypothetical Proteins from Haemophilus influenzae Rd KW20, PLoS One. 8 (2013) e84263. [14] M. Shahbaaz, K. Bisetty, F. Ahmad, M. Hassan, Functional Insight into Putative Conserved Proteins of Rickettsia rickettsii and their Virulence Characterization, Curr. Proteomics. 12 (2015) 101–116. [15] S. Kumar, Computational functional and structural annotation of hypothetical proteins of Neisseria meningitidis MC58. Conference: International conference on Biochemistry, At Kuala Lumpur Malaysia, Volume: Biochem.Anal.Biochem 2016, 5:3(suppl) DOI: 10.4172/2161-1009.S1.005. [16] A.A.T. Naqvi, F. Ahmad, M.I. Hassan, Identification of functional candidates amongst hypothetical proteins of Mycobacterium leprae Br4923, a causative agent of leprosy, Genome. 58 (2015) 25–42. [17] A.A.T. Naqvi, M. Shahbaaz, F. Ahmad, M.I. Hassan, Identification of functional candidates amongst hypothetical proteins of Treponema pallidum ssp. pallidum, PLoS One. 10 (2015) e0124177. [18] S. Khan, M.S. Jamal, F. Anjum, M. Rasool, A. Ansari, A. Islam, F. Ahmad, M.I. Hassan, Functional annotation of putative conserved proteins from Borrelia burgdorferi to find potential drug targets, Int. J. Comput. Biol. Drug Des. 9 (2016) 295. [19] A.P. Bradley, The use of the area under the ROC curve in the evaluation of machine learning algorithms, Pattern Recognit. 30 (1997) 1145–1159. [20] S.F. Altschul, W. Gish, W. Miller, E.W. Myers, D.J. Lipman, Basic local alignment search tool, J. Mol. Biol. 215 (1990) 403–410. [21] R.D. Finn, P. Coggill, R.Y. Eberhardt, S.R. Eddy, J. Mistry, A.L. Mitchell, S.C. Potter, M. Punta, M. Qureshi, A. Sangrador-Vegas, G.A. Salazar, J. Tate, A. 
Bateman, The Pfam protein families database: towards a more sustainable future, Nucleic Acids Res.
44 (2016) D279–85. [22] P. Jones, D. Binns, H.-Y. Chang, M. Fraser, W. Li, C. McAnulla, H. McWilliam, J. Maslen, A. Mitchell, G. Nuka, S. Pesseat, A.F. Quinn, A. Sangrador-Vegas, M. Scheremetjew, S.-Y. Yong, R. Lopez, S. Hunter, InterProScan 5: genome-scale protein function classification, Bioinformatics. 30 (2014) 1236–1240. [23] A. Marchler-Bauer, Y. Bo, L. Han, J. He, C.J. Lanczycki, S. Lu, F. Chitsaz, M.K. Derbyshire, R.C. Geer, N.R. Gonzales, M. Gwadz, D.I. Hurwitz, F. Lu, G.H. Marchler, J.S. Song, N. Thanki, Z. Wang, R.A. Yamashita, D. Zhang, C. Zheng, L.Y. Geer, S.H. Bryant, CDD/SPARCLE: functional classification of proteins via subfamily domain architectures, Nucleic Acids Res. 45 (2017) D200–D203. [24] E. Gasteiger, C. Hoogland, A. Gattiker, S. Duvaud, M.R. Wilkins, R.D. Appel, A. Bairoch, Protein Identification and Analysis Tools on the ExPASy Server, in: The Proteomics Protocols Handbook, 2005: pp. 571–607. [25] N.Y. Yu, J.R. Wagner, M.R. Laird, G. Melli, S. Rey, R. Lo, P. Dao, S. Cenk Sahinalp, M. Ester, L.J. Foster, F.S.L. Brinkman, PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes, Bioinformatics. 26 (2010) 1608–1615. [26] C.-S. Yu, C.-J. Lin, J.-K. Hwang, Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions, Protein Sci. 13 (2004) 1402–1406. [27] T. Hirokawa, S. Boon-Chieng, S. Mitaku, SOSUI: classification and secondary structure prediction system for membrane proteins, Bioinformatics. 14 (1998) 378–379. [28] S. Saha, G.P.S. Raghava, VICMpred: an SVM-based method for the prediction of functional proteins of Gram-negative bacteria using amino acid patterns and composition, Genomics Proteomics Bioinformatics. 4 (2006) 42–47. [29] A. Garg, D. Gupta, VirulentPred: a SVM based prediction method for virulent proteins in bacterial pathogens, BMC Bioinformatics. 
9 (2008) 62. [30] J. Eng, ROC Analysis: Web-based Calculator for ROC Curves, (n.d.). from http://www.jrocfit.org. [31] A. Jadhav, B. Shanmugham, A. Rajendiran, A. Pan, Unraveling novel broad-spectrum antibacterial targets in food and waterborne pathogens using comparative genomics and protein interaction network analysis, Infect. Genet. Evol. 27 (2014) 300–308. [32] A. Jadhav, V. Ezhilarasan, O. Prakash Sharma, A. Pan, Clostridium-DT(DB): a comprehensive database for potential drug targets of Clostridium difficile, Comput. Biol.
Med. 43 (2013) 362–367. [33] V. Law, C. Knox, Y. Djoumbou, T. Jewison, A.C. Guo, Y. Liu, A. Maciejewski, D. Arndt, M. Wilson, V. Neveu, A. Tang, G. Gabriel, C. Ly, S. Adamjee, Z.T. Dame, B. Han, Y. Zhou, D.S. Wishart, DrugBank 4.0: shedding new light on drug metabolism, Nucleic Acids Res. 42 (2013) D1091–D1097. [34] A.P. Bento, A. Gaulton, A. Hersey, L.J. Bellis, J. Chambers, M. Davies, F.A. Krüger, Y. Light, L. Mak, S. McGlinchey, M. Nowotka, G. Papadatos, R. Santos, J.P. Overington, The ChEMBL bioactivity database: an update, Nucleic Acids Res. 42 (2014) D1083–90. [35] B. Shanmugham, A. Pan, Identification and Characterization of Potential Therapeutic Candidates in Emerging Human Pathogen Mycobacterium abscessus: A Novel Hierarchical In Silico Approach, PLoS One. 8 (2013) e59126. [36] J.V. Höltje, From growth to autolysis: the murein hydrolases in Escherichia coli, Arch. Microbiol. 164 (1995) 243–254. [37] A. Carfi, S. Pares, E. Duée, M. Galleni, C. Duez, J.M. Frère, O. Dideberg, The 3-D structure of a zinc metallo-beta-lactamase from Bacillus cereus reveals a new type of protein fold, EMBO J. 14 (1995) 4914–4921. [38] I.G. Young, F. Gibson, Regulation of the enzymes involved in the biosynthesis of 2,3-dihydroxybenzoic acid in Aerobacter aerogenes and Escherichia coli, Biochimica et Biophysica Acta (BBA) - General Subjects. 177 (1969) 401–411. [39] R.M. Figge, J. Easter, J.W. Gober, Productive interaction between the chromosome partitioning proteins, ParA and ParB, is required for the progression of the cell cycle in Caulobacter crescentus, Mol. Microbiol. 47 (2003) 1225–1237. [40] J.A. Surtees, B.E. Funnell, Plasmid and Chromosome Traffic Control: How ParA and ParB Drive Partition, in: Current Topics in Developmental Biology, 2003: pp. 145–180. [41] C.E. Caldon, P. Yoong, P.E. March, Evolution of a molecular switch: universal bacterial GTPases regulate ribosome function, Mol. Microbiol. 41 (2001) 289–297. [42] S. Dunin-Horkawicz, M. Feder, J.M. 
Bujnicki, Phylogenomic analysis of the GIY-YIG nuclease superfamily, BMC Genomics. 7 (2006) 98. [43] L. Aravind, E.V. Koonin, The HD domain defines a new superfamily of metal-dependent phosphohydrolases, Trends Biochem. Sci. 23 (1998) 469–472. [44] R. Madhugiri, E. Evguenieva-Hackenberg, RNase J is involved in the 5’-end maturation of 16S rRNA and 23S rRNA in Sinorhizobium meliloti, FEBS Lett. 583 (2009) 2339–2342. [45] N.E. Burgis, R.P. Cunningham, Substrate specificity of RdgB protein, a deoxyribonucleoside triphosphate pyrophosphohydrolase, J. Biol. Chem. 282 (2007) 3531–3538. [46] A. Barrientos, D. Korr, K.J. Barwell, C. Sjulsen, C.D. Gajewski, G. Manfredi, S. Ackerman, A. Tzagoloff, MTG1 codes for a conserved protein required for mitochondrial translation, Mol. Biol. Cell. 14 (2003) 2292–2302. [47] C. Vernet, M.T. Ribouchon, G. Chimini, P. Pontarotti, Structure and evolution of a member of a new subfamily of GTP-binding proteins mapping to the human MHC class I region, Mamm. Genome. 5 (1994) 100–105. [48] D. Dutta, K. Bandyopadhyay, A.B. Datta, A.A. Sardesai, P. Parrack, Properties of HflX, an enigmatic protein from Escherichia coli, J. Bacteriol. 191 (2009) 2307–2314. [49] R. Gopalaswamy, Cloning, overexpression, and characterization of a serine/threonine protein kinase pknI from Mycobacterium tuberculosis H37Rv, Protein Expr. Purif. (2004). doi:10.1016/s1046-5928(04)00110-x. [50] E.V. Koonin, R.L. Tatusov, Computer analysis of bacterial haloacid dehalogenases defines a large superfamily of hydrolases with diverse specificity. Application of an iterative approach to database search, J. Mol. Biol. 244 (1994) 125–132. [51] P. Reichard, From RNA to DNA, why so many ribonucleotide reductases?, Science. 260 (1993) 1773–1777. [52] F. Xing, M.R. Martzen, E.M. Phizicky, A conserved family of Saccharomyces cerevisiae synthases effects dihydrouridine modification of tRNA, RNA. 8 (2002) 370–381. [53] C. Fabret, E. Dervyn, B. Dalmais, A. Guillot, C. Marck, H. Grosjean, P. Noirot, Life without the essential bacterial tRNA Ile2-lysidine synthetase TilS: a case of tRNA gene recruitment in Bacillus subtilis, Mol. Microbiol. 80 (2011) 1062–1074. [54] E. Sandmeier, T.I. Hale, P. Christen, Multiple evolutionary origin of pyridoxal-5’-phosphate-dependent amino acid decarboxylases, Eur. J. Biochem. 221 (1994) 997–1002. [55] J.P. Shaw, G.A. Petsko, D. 
Ringe, Determination of the structure of alanine racemase from Bacillus stearothermophilus at 1.9-A resolution, Biochemistry. 36 (1997) 1329–1342. [56] S. Levin, S.C. Almo, B.H. Satir, Functional diversity of the phosphoglucomutase superfamily: structural implications, Protein Eng. 12 (1999) 737–746. [57] S. Kimura, T. Suzuki, Fine-tuning of the ribosomal decoding center by conserved methyl-modifications in the Escherichia coli 16S rRNA, Nucleic Acids Res. 38 (2010) 1341–1352.
[58] Y.-J. Lu, Y.-M. Zhang, K.D. Grimes, J. Qi, R.E. Lee, C.O. Rock, Acyl-phosphates initiate membrane phospholipid synthesis in Gram-positive pathogens, Mol. Cell. 23 (2006) 765–772. [59] D.L. Burk, X-ray structure of the AAC(6’)-Ii antibiotic resistance enzyme at 1.8 A resolution; examination of oligomeric arrangements in GNAT superfamily members, Protein Sci. 12 (2003) 426–437. [60] T.A. Gould, H.P. Schweizer, M.E.A. Churchill, Structure of the Pseudomonas aeruginosa acyl-homoserinelactone synthase LasI, Mol. Microbiol. 53 (2004) 1135–1146. [61] L. Holm, C. Sander, An evolutionary treasure: unification of a broad set of amidohydrolases related to urease, Proteins. 28 (1997) 72–82. [62] J. Gál, A. Szvetnik, R. Schnell, M. Kálmán, The metD D-methionine transporter locus of Escherichia coli is an ABC transporter gene cluster, J. Bacteriol. 184 (2002) 4930–4932. [63] A.E. Cooley, S.P. Riley, K. Kral, M.C. Miller, E. DeMoll, M.G. Fried, B. Stevenson, DNA-binding by Haemophilus influenzae and Escherichia coli YbaB, members of a widely-distributed bacterial protein family, BMC Microbiol. 9 (2009) 137. [64] K. Surdova, P. Gamba, D. Claessen, T. Siersma, M.J. Jonker, J. Errington, L.W. Hamoen, The conserved DNA-binding protein WhiA is involved in cell division in Bacillus subtilis, J. Bacteriol. 195 (2013) 5450–5460. [65] J.A. Aínsa, N.J. Ryding, N. Hartley, K.C. Findlay, C.J. Bruton, K.F. Chater, WhiA, a protein of unknown function conserved among gram-positive bacteria, is essential for sporulation in Streptomyces coelicolor A3(2), J. Bacteriol. 182 (2000) 5470–5478. [66] L. Aravind, E.V. Koonin, Novel predicted RNA-binding domains associated with the translation machinery, J. Mol. Evol. 48 (1999) 291–302. [67] J.-H. Kim, S.J. Park, K.-Y. Lee, W.-S. Son, N.-Y. Sohn, A.-R. Kwon, B.-J. Lee, Solution structure of hypothetical protein HP1423 (Y1423_HELPY) reveals the presence of alphaL motif related to RNA binding, Proteins. 75 (2009) 252–257. [68] L. Aravind, E.V. 
Koonin, THUMP--a predicted RNA-binding domain shared by 4-thiouridine, pseudouridine synthases and RNA methylases, Trends Biochem. Sci. 26 (2001) 215–217. [69] K. Okamura, Y. Hagiwara-Takeuchi, T. Li, T.H. Vu, M. Hirai, M. Hattori, Y. Sakaki, A.R. Hoffman, T. Ito, Comparative genome analysis of the mouse imprinted gene impact and its nonimprinted human homolog IMPACT: toward the structural basis for species-specific imprinting, Genome Res. 10 (2000) 1878–1889.
[70] Z. Prágai, C.R. Harwood, Regulatory interactions between the Pho and sigma(B)-dependent general stress regulons of Bacillus subtilis, Microbiology. 148 (2002) 1593–1602. [71] M. Cellier, G. Privé, A. Belouchi, T. Kwan, V. Rodrigues, W. Chia, P. Gros, Nramp defines a family of membrane proteins, Proc. Natl. Acad. Sci. U. S. A. 92 (1995) 10089–10093. [72] G. Govoni, P. Gros, Macrophage NRAMP1 and its role in resistance to microbial infections, Inflamm. Res. 47 (1998) 277–284. [73] E. Pinner, S. Gruenheid, M. Raymond, P. Gros, Functional complementation of the yeast divalent cation transporter family SMF by NRAMP2, a member of the mammalian natural resistance-associated macrophage protein family, J. Biol. Chem. 272 (1997) 28933–28938. [74] M. Lehnik-Habrink, L. Rempeters, Á.T. Kovács, C. Wrede, C. Baierlein, H. Krebber, O.P. Kuipers, J. Stülke, DEAD-Box RNA helicases in Bacillus subtilis have multiple functions and act independently from each other, J. Bacteriol. 195 (2013) 534–544. [75] J. de la Cruz, D. Kressler, P. Linder, Unwinding RNA in Saccharomyces cerevisiae: DEAD-box proteins and related families, Trends Biochem. Sci. 24 (1999) 192–198. [76] L. Aravind, Y.I. Wolf, E.V. Koonin, The ATP-cone: an evolutionarily mobile, ATP-binding regulatory domain, J. Mol. Microbiol. Biotechnol. 2 (2000) 191–194. [77] G.B. Erkens, M. Majsnerowska, J. ter Beek, D.J. Slotboom, Energy coupling factor-type ABC transporters for vitamin uptake in prokaryotes, Biochemistry. 51 (2012) 4390–4396. [78] R.C. Allen, R. Popat, S.P. Diggle, S.P. Brown, Targeting virulence: can we make evolution-proof drugs?, Nat. Rev. Microbiol. 12 (2014) 300–308. [79] Y. Redko, E. Galtier, H. Arnion, F. Darfeuille, O. Sismeiro, J.-Y. Coppée, C. Médigue, M. Weiman, S. Cruveiller, H. De Reuse, RNase J depletion leads to massive changes in mRNA abundance in Helicobacter pylori, RNA Biol. 13 (2016) 243–253. [80] T.M. Eidem, C.M. Roux, P.M. 
Dunman, RNA decay: a novel therapeutic target in bacteria, Wiley Interdiscip. Rev. RNA. 3 (2012) 443–454.
Figure Legends
Fig. 1: Complete framework used for functional annotation of essential hypothetical proteins.
Fig. 2: The distribution of functional classes of essential hypothetical proteins.
Table Legends
Table 1: The complete list of conserved domain data for essential hypothetical proteins in S. aureus N315
Table 2: List of functionally annotated essential hypothetical proteins in S. aureus N315
Table 1: The complete list of conserved domain data for essential hypothetical proteins in S. aureus N315

| DEG AC. No. | Gene name | BLAST | InterProScan | Pfam | CDD | Pathogen specific |
|---|---|---|---|---|---|---|
| DEG10020008 | SA0021 | Metallo-beta-lactamase domain protein | Metallo-beta lactamase | Metallo-beta-lactamase superfamily | MBL-fold-metallo hydrolase domain | ✔ |
| DEG10020009 | SA0085 | tRNA-dihydrouridine synthase | tRNA-dihydrouridine synthase | Dihydrouridine synthase | Dihydrouridine synthase-like (DUS-like) FMN-binding domain | ✘ |
| DEG10020012 | SA0181 | Isochorismatase | Isochorismatase-like domain | Isochorismatase family | Cysteine hydrolases (also contains isochorismatase) | ✔ |
| DEG10020015 | SA0230 | Hypothetical | Unknown function | Protein of unknown function | Uncharacterized conserved protein YeaO, DUF488 family | ✔ |
| DEG10020017 | SA0348 | Chromosome partitioning protein ParB | ParB/RepB/Spo0J partition protein family | ParB-like nuclease domain | ParB-like nuclease domain | ✔ |
| DEG10020018 | SA0351 | GTP-binding protein YchF | Ribosome-binding ATPase YchF/Obg-like ATPase 1 | MMR_HSR1, YchF-GTPase-C | YchF GTPase | ✘ |
| DEG10020025 | SA0422 | NLPA lipoprotein | Lipoprotein NLPA family | NLPA lipoprotein | PBP2_lipoprotein_GmpC | ✔ |
| DEG10020027 | SA0437 | Nucleoid associated protein, YbaB/EbfC family | Nucleoid-associated protein YbaB/EbfC family | YbaB DNA binding family | Hypothetical protein | ✔ |
| DEG10020029 | SA0446 | Hypothetical | Domain- GIY-YIG nuclease superfamily | GIY-YIG catalytic domain | Predicted endonuclease, GIY-YIG superfamily | ✔ |
| DEG10020030 | SA0447 | 16S rRNA (2'-O) methyltransferase | rRNA small subunit methyltransferase I | Tetrapyrrole methylase | 16S rRNA C1402 (ribose-2'-O) methylase RsmI | ✔ |
| DEG10020032 | SA0449 | Hydrolase TatD | TatD family | TatD DNase | TatD (DNase activity) | ✘ |
| DEG10020036 | SA0464 | Hypothetical | RNA-binding protein, HP1423 type | TatD_Dnase | HslR, heat shock protein, contains S4 domain | ✔ |
| DEG10020037 | SA0467 | tRNA(Ile)-lysidine synthase | tRNA(Ile)-lysidine synthase | ATP bind-3, TilS-C substrate domain | tRNA(Ile)-lysidine synthase TilS/MesJ | ✔ |
| DEG10020054 | SA0560 | HD domain containing protein | Domain- HD/PDEase domain | HD-domain | Metal dependent phosphohydrolases with conserved 'HD' motif | ✘ |
| DEG10020063 | SA0703 | YigZ family protein | Impact family | Uncharacterized protein | Uncharacterized protein family | ✘ |
| DEG10020069 | SA0722 | Sporulation regulation WhiA | Sporulation regulator WhiA-like | whiA_N-terminal | WhiA C-terminal HTH domain | ✔ |
| DEG10020074 | SA0732 | Hypothetical | Unknown | Unknown | Unknown | ✘ |
| DEG10020076 | SA0771 | Methionine ABC transporter substrate binding protein | Lipoprotein NlpA family | Lipoprotein | ABC-type metal ion transport system | ✔ |
| DEG10020077 | SA0772 | CsbD family protein | CsbD-like | CsbD-like | Uncharacterized conserved protein YjbJ | ✔ |
Table 1: Cont.

| DEG AC. No. | Gene name | BLAST | InterProScan | Pfam | CDD | Pathogen specific |
|---|---|---|---|---|---|---|
| DEG10020093 | SA0940 | Ribonuclease J | Ribonuclease J | Lactamase-B, Zn-dependent metallo-hydrolase RNA species | RNAaseJ, MBL-fold metallo-hydrolase domain | ✔ |
| DEG10020098 | SA0956 | Manganese transporter | NRAMP family | NRAMP | Manganese transport protein MntH / Natural resistance-associated macrophage protein (NRAMP) | ✘ |
| DEG10020106 | SA0998 | Non-canonical purine NTP pyrophosphatase | Inosine triphosphate pyrophosphatase-like | Ham1p-like | Nucleoside-triphosphatase | ✔ |
| DEG10020111 | SA1031 | YqqS family pyridoxal phosphate enzyme | Uncharacterized protein family | Ala_racemase_N | Uncharacterized pyridoxal phosphate-containing protein | ✘ |
| DEG10020128 | rbgA | Ribosome biogenesis GTPase YlqF | GTPase, MTG1 | MMR_HSR (ribosome binding GTPase) | Circularly permuted YlqF GTPase | ✘ |
| DEG10020145 | SA1147 | GTPase HflX | GTPase HflX | GTP binding GTPase | HflX GTPase family | ✘ |
| DEG10020146 | SA1176 | DUF896 family protein | Unknown function | DUF896 | Bacterial protein of unknown function (DUF896) | ✔ |
| DEG10020148 | SA1187 | Glycerol-3-phosphate acyltransferase | Glycerol-3-phosphate acyltransferase, PlsY | G3P-acyltransferase | Putative glycerol-3-phosphate acyltransferase PlsY | ✔ |
| DEG10020155 | SA1252 | Acyltransferase | Domain- Acyl-CoA N-acyltransferase | Acyltransferase | Acetyltransferase (GNAT) family | ✔ |
| DEG10020158 | SA1277 | RNA methyltransferase | THUMP domain | THUMP | THUMP domain associated with S-adenosylmethionine-dependent methyltransferases | ✘ |
| DEG10020161 | engA | Ribosome biogenesis GTPase Der | GTP-binding protein EngA | MMR-HSR1 (ribosome binding GTPase) | EngA2 GTPase | ✘ |
| DEG10020177 | SA1445 | Putative cytosolic protein | Unknown function | DUF965 | Hypothetical protein | ✔ |
| DEG10020193 | SA1509 | NrdR family transcription regulator | Ribonucleotide reductase regulator NrdR-like | ATP-cone domain | Transcriptional regulator NrdR, contains Zn-ribbon and ATP-cone domains | ✘ |
| DEG10020225 | vga | ABC-transporter ATP-binding protein | Domain- P-loop containing nucleoside triphosphate hydrolase | ABC-transporter | ABC transporter C-terminal domain | ✘ |
| DEG10020228 | SA1885 | DEAD/DEAH box family ATP-dependent RNA helicase | DEAD-box ATP-dependent RNA helicase CshA | DEAD/DEAH box helicase | DEAD-box helicases, Helicase superfamily C-terminal domain | ✘ |
Table 1: Cont.

| DEG AC. No. | Gene name | BLAST | InterProScan | Pfam | CDD | Pathogen specific |
|---|---|---|---|---|---|---|
| DEG10020241 | SA1957 | Haloacid dehalogenase, Cof-like hydrolase | HAD-superfamily hydrolase, subfamily IIB | Hydrolase | Haloacid dehalogenase-like hydrolases | ✔ |
| DEG10020245 | SA1966 | Hypothetical | YbbR-like | YbbR-like protein (Hypothetical) | YbbR-like protein | ✘ |
| DEG10020249 | SA2019 | Energy-coupling factor transporter protein EcfT | ABC/ECF transporter, transmembrane component | CbiQ | Energy-coupling factor transporter transmembrane protein EcfT | ✔ |
| DEG10020250 | SA2020 | Energy-coupling factor transporter ATPase | Energy-coupling factor transporter ATP-binding protein EcfA2 | ABC-transporter | ATP-binding cassette component of cobalt transport system | ✘ |
| DEG10020288 | SA2279 | Phosphoglucomutase | Alpha-D-phosphohexomutase superfamily | PGM-PMM-I | CD includes PGM2 (phosphoglucomutase 2) and PGM2L1 (phosphoglucomutase 2-like 1) | ✘ |
| DEG10020298 | vraD | ABC-transporter ATP-binding protein | Domain- P-loop containing nucleoside triphosphate hydrolase | ABC-transporter | ABC-type lipoprotein export system, ATPase component | ✘ |
Table 2: List of functionally annotated essential hypothetical proteins in S. aureus N315

| Gene name | Function | Enzyme/functional class |
|---|---|---|
| SA0021 | Metallo-beta lactamase | Hydrolase |
| SA0085 | tRNA-dihydrouridine synthase | Ligase |
| SA0181 | Isochorismatase | Hydrolase |
| SA0230 | Hypothetical | Unknown |
| SA0348 | ParB-like nuclease domain | Hydrolase |
| SA0351 | YchF GTPase | Hydrolase |
| SA0422 | NLPA lipoprotein | Permease |
| SA0437 | Nucleoid associated protein, YbaB/EbfC family | Binding protein |
| SA0446 | GIY-YIG nuclease superfamily | Hydrolase |
| SA0447 | rRNA small subunit methyltransferase I | Transferase |
| SA0449 | TatD family | Metallo-enzymes |
| SA0464 | RNA-binding protein, HP1423 type | Binding protein |
| SA0467 | tRNA(Ile)-lysidine synthase | Ligase |
| SA0560 | HD-domain | Hydrolase |
| SA0703 | Impact family | Miscellaneous |
| SA0722 | Sporulation regulator WhiA-like | Binding protein |
| SA0732 | Hypothetical | Unknown |
| SA0771 | NLPA lipoprotein | Permease |
| SA0772 | CsbD family protein | Miscellaneous |
| SA0940 | Ribonuclease J | Hydrolase |
| SA0956 | NRAMP family | Miscellaneous |
| SA0998 | Inosine triphosphate pyrophosphatase-like | Hydrolase |
| SA1031 | Pyridoxal phosphate-containing protein | Lyase |
| rbgA | GTPase, MTG1 | Hydrolase |
| SA1147 | GTPase HflX | Hydrolase |
| SA1176 | Unknown function | Unknown |
| SA1187 | Glycerol-3-phosphate acyltransferase, PlsY | Transferase |
| SA1252 | Acyl-CoA N-acyltransferase | Transferase |
| SA1277 | THUMP domain | Binding protein |
| engA | GTP-binding protein EngA | Hydrolase |
| SA1445 | Unknown function | Unknown |
| SA1509 | Ribonucleotide reductase regulator NrdR-like | Oxidoreductase |
| vga | P-loop containing nucleoside triphosphate hydrolase | Hydrolase |
| SA1885 | DEAD-box ATP-dependent RNA helicase CshA | Helicase |
| SA1957 | HAD-superfamily hydrolase, subfamily IIB | Hydrolase |
| SA1966 | Hypothetical | Unknown |
| SA2019 | CbiQ | Transport protein |
| SA2020 | Energy-coupling factor transporter ATP-binding protein EcfA2 | Transport protein |
| SA2279 | Alpha-D-phosphohexomutase superfamily | Isomerase |
| vraD | P-loop containing nucleoside triphosphate hydrolase | Hydrolase |
Fig. 1: Complete framework used for functional annotation of essential hypothetical proteins. DEG: Database of Essential Genes; EPs: Essential proteins; EHPs: Essential hypothetical proteins
[Pie chart. Legend: Hydrolases, Oxidoreductases, Ligases, Lyases, Isomerases, Transferases, Metalloenzymes, Permeases, Binding proteins, Miscellaneous proteins, Helicases, Transporters, Unknown. Slice labels: 35%, 13%, 10%, 8%, 7%, 5%, 5%, 5%, 3%, 3%, 2%, 2%, 2%.]
Fig. 2: The distribution of functional classes of essential hypothetical proteins.
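The class shares plotted in Fig. 2 can be cross-checked by tallying the functional-class column of Table 2. The following is a minimal sketch, not the authors' procedure: the list `CLASSES` is transcribed by hand from Table 2 in gene order (SA0021 to vraD), and the names `CLASSES` and `class_distribution` are illustrative assumptions.

```python
from collections import Counter

# Functional class of each of the 40 EHPs, transcribed from Table 2
# in gene order (SA0021 ... vraD). Hand-transcribed, so treat as an assumption.
CLASSES = [
    "Hydrolase", "Ligase", "Hydrolase", "Unknown", "Hydrolase",
    "Hydrolase", "Permease", "Binding protein", "Hydrolase", "Transferase",
    "Metallo-enzymes", "Binding protein", "Ligase", "Hydrolase", "Miscellaneous",
    "Binding protein", "Unknown", "Permease", "Miscellaneous", "Hydrolase",
    "Miscellaneous", "Hydrolase", "Lyase", "Hydrolase", "Hydrolase",
    "Unknown", "Transferase", "Transferase", "Binding protein", "Hydrolase",
    "Unknown", "Oxidoreductase", "Hydrolase", "Helicase", "Hydrolase",
    "Unknown", "Transport protein", "Transport protein", "Isomerase", "Hydrolase",
]


def class_distribution(classes):
    """Return {class: (count, percentage of all proteins)}, most frequent first."""
    counts = Counter(classes)
    total = len(classes)
    return {name: (n, 100.0 * n / total) for name, n in counts.most_common()}


distribution = class_distribution(CLASSES)
for name, (n, pct) in distribution.items():
    print(f"{name:<18} {n:>2}  {pct:5.1f}%")
```

Under this transcription, hydrolases account for 14 of the 40 proteins, which agrees with the roughly 35% reported in the Abstract; classes with a single member come out at 2.5% each, so the rounded slice labels in the figure need not match a recomputation digit for digit.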