Comparative genomics of Sphingopyxis spp. unravelled functional attributes

Journal Pre-proof

Helianthous Verma, Gauri Garg Dhingra, Monika Sharma, Vipin Gupta, Ram Krishan Negi, Yogendra Singh, Rup Lal

PII: S0888-7543(19)30371-4

DOI: https://doi.org/10.1016/j.ygeno.2019.11.008

Reference: YGENO 9401

To appear in: Genomics

Received date: 20 June 2019

Revised date: 12 November 2019

Accepted date: 14 November 2019

Please cite this article as: H. Verma, G.G. Dhingra, M. Sharma, et al., Comparative genomics of Sphingopyxis spp. unravelled functional attributes, Genomics (2019), https://doi.org/10.1016/j.ygeno.2019.11.008

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier.

Helianthous Verma1, Gauri Garg Dhingra2, Monika Sharma3, Vipin Gupta4,6, Ram Krishan Negi3, Yogendra Singh4 and Rup Lal5,6*

1 Department of Zoology, Ramjas College, University of Delhi, Delhi-110007

2 Department of Zoology, Kirori Mal College, University of Delhi, Delhi-110007

3 Room no. 18, Fish Molecular Biology Laboratory, Department of Zoology, University of Delhi, Delhi 110007

4 Room no. 112, Bacterial Pathogenesis Laboratory, Department of Zoology, University of Delhi, Delhi 110007

5 The Energy and Resources Institute, Darbari Seth Block, IHC Complex, Lodhi Road, New Delhi-110003, India

6 PhiXgen Pvt. Ltd.


Running Title: Comparative Genomics of Sphingopyxis spp.


*Corresponding author: [email protected]

Keywords: Sphingopyxis, Comparative genomics, stress resistance, aromatic compound degradation, polyhydroxybutyrate

Abstract

Members of the genus Sphingopyxis are known to thrive in diverse environments. Genomes of 21 Sphingopyxis strains were selected for comparison. Phylogenetic analyses performed using GGDC, AAI and core-SNP methods showed agreement at the sub-species level. Based on our results, we propose that S. baekryungensis DSM16222 and Sphingopyxis sp. LPB0140 should not be included in the genus Sphingopyxis. Core-genome analysis revealed that 1422 genes were shared, including essential pathways and genes conferring adaptation to stressful environments. Polyhydroxybutyrate degradation, anaerobic respiration and type IV secretion were notably abundant pathways, whereas exopolysaccharide production, hyaluronic acid production and toxin-antitoxin systems were differentially present families. Interestingly, the genomes of S. witflariensis DSM14551, Sphingopyxis sp. MG and Sphingopyxis sp. FD7 hinted at probable pathogenic abilities. The protein-protein interactome showed that membrane proteins and stress response are closely integrated with the core proteins, while aromatic compound degradation and virulence ability formed separate networks. Thus, these should be considered strain-specific attributes.

Introduction

Sphingopyxis spp. belong to the class Alphaproteobacteria and the family Sphingomonadaceae. Members of this family are prevalent in soil microbial communities and are characterized mainly by the presence of sphingoglycolipids in cellular lipids, octadecenoic acid (C18:1) as the major fatty acid and ubiquinone Q10 as the respiratory quinone (Glaeser and Kampfer, 2014). Sphingopyxis and four other bacterial genera, Sphingobium, Novosphingobium, Sphingomonas and Sphingosinicella, are collectively called sphingomonads. They are versatile organisms, widely known for their roles in environmental nutrient cycling, biotechnological practices, and the uptake and metabolism of pesticides and other toxic compounds (Balkwill et al., 2006).

Among sphingomonads, Sphingobium and Novosphingobium are the most studied genera in terms of functional repertoire, aromatic compound degradation pathways (Lal et al., 2010), genomic analysis (Verma et al., 2017) and comparative genomics (Gan et al., 2013; Verma et al., 2014; Kumar et al., 2017). Sphingopyxis strains are comparatively less studied and are mostly known as inhabitants of diverse environments such as anaerobic sludge blanket (Kim et al., 2005), sea water (Yoon et al., 2005), hydrocarbon-contaminated soil (Zhang et al., 2010), hexachlorocyclohexane-contaminated soil (Sharma et al., 2010; Jindal et al., 2013; Verma et al., 2015) and wastewater treatment plants (Kampfer et al., 2002). Consistent with their ability to thrive in such harsh environments, these strains tolerate oxidative and osmotic stress, large temperature variations and high concentrations of aromatic compounds (Kampfer et al., 2002; Jindal et al., 2013). To date, studies based on taxonomic characterization have revealed their abundance in diverse environments, but deeper insights into the genus are yet to be attained. Previously, core and pan genome analyses of Sphingopyxis strains were performed by different research groups (Garcia-Romero et al., 2016; Parthasarathy et al., 2017; Kaminski et al., 2019).

In this study, we selected 21 strains of Sphingopyxis spp. for genome comparison, covering 11 known species. The criterion was to select taxonomically well-characterized strains of the genus Sphingopyxis. In addition, we included seven strains, namely Sphingopyxis sp. QXT-31, Sphingopyxis sp. LPB0140, Sphingopyxis sp. FD7, Sphingopyxis sp. MG, Sphingopyxis sp. EG6, Sphingopyxis sp. PAMC25046 and Sphingopyxis sp. 113P3, to determine their phylogenetic positions, because among the uncharacterized strains these had complete genomes available at NCBI. Other uncharacterized strains with incomplete or draft genome sequences were not included in this study. Phylogenetic analysis was performed using three different genome-based methods, Average Amino acid Identity (AAI), Genome to Genome Distance Calculator (GGDC) and Core-Single Nucleotide Polymorphism (core-SNP), to obtain a reliable picture of the phylogeny. Functional analysis was performed using FIGfam annotation, and pathways were reconstructed on the MinPath server. Further, protein-protein interactomes were prepared using the core proteins of Sphingopyxis spp. and four different attributes (extracted from pan genes): aromatic compound degradation, stress response (against osmotic, oxidative and temperature stress), pathogenic abilities and membrane transporters. Among these, proteins for membrane transport and stress response showed a substantial degree of interaction with the core proteins, while the possibility of pathogenic behaviour was identified in only three strains: S. witflariensis DSM14551, Sphingopyxis sp. FD7 and Sphingopyxis sp. MG.

Results and Discussion

Characteristics of analysed genomes

Twenty-one strains of the genus Sphingopyxis (Table 1), which were either taxonomically characterized or had complete genome sequences publicly available, were selected for the study. All strains showed a narrow range of GC content, from 62.4% in Sphingopyxis baekryungensis DSM16222 to 66.5% in Sphingopyxis sp. QXT-31, with the exception of 46.1% in Sphingopyxis sp. LPB0140 (Table 1). Notably, this is the lowest GC content reported in any sphingomonad species to date (Schut et al., 1993; Aylward et al., 2013; Verma et al., 2014). A similarly large difference is observed among members of other genera; for instance, in Corynebacterium the GC content ranges from 46.46% (Ruckert et al., 2015) to 69.5% (Shin et al., 2011). Genome size varied by more than 3 Mb, from 2.53 Mb in Sphingopyxis sp. LPB0140 to 5.95 Mb in Sphingopyxis macrogoltabida 203N (Table 1). However, strain LPB0140 is uncharacterized, and deeper insight into its genome-based phylogenetic characterization is required; this is discussed in detail in the next section. The location and site of isolation, RNAs, coding sequences, specific genes and genome coverage are listed in Table 1.
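For readers who want to reproduce the GC percentages compared above, a minimal sketch follows. It assumes the genome is available as a plain nucleotide string and that ambiguous symbols such as N are ignored; the function name and the toy contig are hypothetical and are not taken from the study.

```python
def gc_content(sequence: str) -> float:
    """Return the GC percentage of a nucleotide sequence.

    Ambiguous symbols such as N are ignored, which is one common
    convention and an assumption here.
    """
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = sum(1 for b in bases if b in "GC")
    return 100.0 * gc / len(bases)


# Hypothetical toy contig, not taken from any Sphingopyxis genome.
toy_contig = "GCGCGGCCATGCNNGCTA"
print(f"GC content: {gc_content(toy_contig):.1f}%")
```

For a multi-contig assembly, the same count would be accumulated over all contigs before dividing, rather than averaging per-contig percentages.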

Phylogenetic clustering: Tracing the Divergence

Three methods were used to determine the phylogenetic descent of Sphingopyxis species. First, Average Nucleotide Identity (ANI) (BLASTn) values were calculated (Konstantinidis & Tiedje, 2005) using the ANI calculator (Supplementary Figure 1). However, pairwise ANI values of Sphingopyxis sp. LPB0140 were beyond the observable limit and therefore could not contribute to a reliable phylogeny prediction. Thus, pairwise Average Amino acid Identities were calculated using the AAI calculator (Supplementary table 1). The other two methods were the Genome to Genome Distance Calculator (GGDC) (Auch et al., 2010) (Supplementary table 2) and Core-Single Nucleotide Polymorphism (core-SNP) analysis, using the maximum likelihood method on core-SNPs identified with kSNP3 (Gardner et al., 2015); these take into account the complete genome and the core genetic repertoire, respectively. The phylogeny was plotted using Novosphingobium aromaticivorans DSM12444 as a representative of the family Sphingomonadaceae and as an outgroup species from outside the genus (Figure 1). The phylogeny of Sphingopyxis spp. identified S. baekryungensis DSM16222 and Sphingopyxis sp. LPB0140 as the most divergent strains and as outliers of the genus Sphingopyxis. This is evident from their separate clustering, similar to that of N. aromaticivorans DSM12444 (Figure 1A-C). Notably, S. baekryungensis DSM16222 and Sphingopyxis sp. LPB0140 shared the lowest pairwise AAI values with other Sphingopyxis strains (Supplementary Table 1). For instance, strain LPB0140 has a minimum AAI of 56.0% with S. fribergensis Kp5.2 and a maximum of 59.0% with S. baekryungensis DSM16222. S. baekryungensis DSM16222 has a minimum AAI of 58.0% with many strains and a maximum of 60% with S. indica DS15 (Supplementary table 4). According to the AAI cut-offs described for classification, strains sharing AAI values in the range of >55 to 60% are considered to belong to the same genus (Rodriguez and Konstantinidis, 2014). Interestingly, the AAI values shared by S. baekryungensis DSM16222 and Sphingopyxis sp. LPB0140 with other Sphingopyxis species are similar to the AAI values of N. aromaticivorans DSM12444 with Sphingopyxis strains (Supplementary Table 1). This suggests the need for their reclassification outside the genus. This is also supported by a recent study by García-Romero et al. (2016), which suggested divergence of S. baekryungensis DSM16222 from other strains of the genus Sphingopyxis. Hence, we propose that S. baekryungensis DSM16222 and Sphingopyxis sp. LPB0140 should not be included in the genus Sphingopyxis.
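The identity cut-offs cited in this section (ANI 94% and AAI 85-90% for species, AAI >55 to 60% for genus; Rodriguez and Konstantinidis, 2014) can be read as a simple decision rule. The sketch below is illustrative only: the function name, the labels and the way the thresholds are combined are assumptions, not the authors' procedure, and where the text reports only a lower bound (for example ">90%") a placeholder value is used.

```python
from typing import Optional

ANI_SPECIES = 94.0  # species boundary for ANI quoted in the text
AAI_SPECIES = 90.0  # upper end of the 85-90% species-level AAI range
AAI_GENUS = 60.0    # upper end of the >55-60% genus-level AAI range


def relatedness(aai: float, ani: Optional[float] = None) -> str:
    """Label a genome pair from its AAI and, when available, ANI (both in %)."""
    if ani is not None and ani >= ANI_SPECIES and aai > AAI_SPECIES:
        return "same species"
    if aai > AAI_GENUS:
        return "same genus, distinct species"
    return "within or below the genus-boundary AAI range"


# Values quoted in the text; lower bounds stand in where only ">" is reported.
print(relatedness(aai=96.0, ani=97.0))  # Sphingopyxis sp. MG vs S. granuli strains
print(relatedness(aai=90.1, ani=91.0))  # 113P3 vs S. flava R11H; 90.1 is a placeholder for ">90%"
print(relatedness(aai=56.0))            # LPB0140 vs S. fribergensis Kp5.2 (no usable ANI)
```

A pair that lands in the last category is not automatically excluded from the genus; in the text that conclusion also rests on the clustering with the outgroup in all three trees.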

Focusing on the sub-species level, strains of S. macrogoltabida (203N and EY-1), S. granuli (TFA and NBRC 100800), and S. terrae NBRC 15098 and S. ummariensis DSM 24316 (Feng et al., 2017) showed close monophyletic clustering. An uncharacterized strain, Sphingopyxis sp. MG, grouped closely with the strains of S. granuli. Notably, strain MG shared >97% ANI, 96.0% AAI and >70% DDH values with both S. granuli strains. On this basis, we propose that Sphingopyxis sp. MG be classified as a strain of S. granuli and be called S. granuli MG. Strains R11H and 113P3 also form a monophyletic clade, which was supported by all three methods. Interestingly, these strains showed a high AAI (>90%), which is above the species-level boundary (85-90%) (Rodriguez and Konstantinidis, 2014). However, the ANI value between S. flava R11H and strain 113P3 is 91.0%, which is below the 94% species boundary for ANI (Rodriguez and Konstantinidis, 2014). Hence, Sphingopyxis sp. 113P3 should be classified as a species closely related to S. flava R11H but distinct within Sphingopyxis. The other strains showed variation in clustering and phylogenetic relatedness across the three methods and hence do not possess a consensus phylogeny (Figure 1).

Core genome and Pan genome of Sphingopyxis spp.

As per the phylogenetic placements, two strains, S. baekryungensis DSM16222 and Sphingopyxis sp. LPB0140, were classified as outliers of the genus, whereas the other uncharacterized strains showed relatedness to other Sphingopyxis spp. Thus, 19 strains instead of 21 were selected for further analysis as part of Sphingopyxis spp. The core genome was composed of 1422 coding sequences (CDS, sequences annotated as genes) totalling 481,150 bp (Supplementary table 3). The predicted core genome reached saturation (a plateau) at 19 genomes, indicating that the addition of new Sphingopyxis genomes may not further reduce the genetic content of the “core genome” (Figure 2A). Along with essential genetic content such as genes for DNA replication, translation, LSU and SSU, DNA repair, fatty acid biosynthesis and dehydrogenase complexes, genes for heat shock resistance, including groEL-groES and dnaK, and for osmotic stress resistance, such as choline and betaine uptake and betaine biosynthesis, were identified. Genes for membrane transport, such as multidrug resistance tripartite systems, Ton/Tol transport systems and ABC transporters, were also annotated in the core content. In the core genome, genes such as groEL, groES, hrcA, grpE, glutathione synthetase (GSS), glutathione reductase (GSR) and glutathione transferase (GST) confer resistance against extreme heat and oxidative stress, which suggests that Sphingopyxis strains are capable of tolerating high temperature and free oxygen radicals and that these genes have been retained during evolution. In addition, the choline and betaine uptake and betaine biosynthesis genes identified in the core genome involve compounds earlier reported as osmoprotectants (Smith et al., 2002). Some reports have shown that Sphingopyxis strains degrade PAH (Yuan et al., 2015) and microcystin (Ho et al., 2006). Comparative studies on sphingomonads (Aylward et al., 2013; Gan et al., 2013; Verma et al., 2014; Kumar et al., 2017) have focused on aromatic compound degradation by members of the genera Sphingobium and Novosphingobium as the main candidates. Ton/Tol and ABC transporters, which can export the metabolites obtained from degradation of aromatic compounds and toxic wastes, were identified as part of the core genome (Jeong et al., 2017). In addition, the multidrug resistance tripartite system, which plays a crucial role in exporting antimicrobial drugs (Daury et al., 2016), was also present in the core genome, thus increasing the chances of survival against certain antimicrobial components in the environment. Annotated genes for copper homeostasis, ammonia metabolism, and folate and zinc transport suggest further adaptations within Sphingopyxis spp.

The pan genome of Sphingopyxis spp. consisted of a large number of genes (n=13012) categorised under 623 pathways (Supplementary table 4 and 5), which reflects that the strains acquired a variety of genes to survive in niches such as marine environments, contaminated sites, activated sludge and wastewater treatment plants. For instance, genes for degradation and tolerance of aromatic compounds such as pyrene, chlorophenol and naphthalene, and of toxic compounds such as bacitracin (a cyclic peptide that interferes with peptidoglycan synthesis) and colicin E2 (a toxin that enters the cytoplasm of sensitive strains and degrades DNA) (Duche, 2007), were identified among members of the genus Sphingopyxis. The pan genome showed a gradual increase with the addition of genomes up to the 19th genome (Figure 2B). Thus, the predicted pan genome is “open”, i.e., unsaturated, and further addition of genomes may increase the number of genes in the ‘pan genome’.
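The core and pan genome definitions used above reduce to set intersection and set union over per-strain gene-family sets, and the saturation behaviour is read from how those sizes change as genomes are added. The sketch below illustrates this under simplifying assumptions: gene families are already clustered into shared labels, genomes are added in a single fixed order (published curves usually average over many orderings), and all strain names and gene sets are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def core_and_pan(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Core genome = families in every genome; pan genome = families in any genome."""
    gene_sets = list(genomes.values())
    core = set.intersection(*gene_sets)
    pan = set.union(*gene_sets)
    return core, pan


def accumulation_curves(genomes: Dict[str, Set[str]]) -> Tuple[List[int], List[int]]:
    """Core and pan genome sizes after adding genomes one at a time, in dict order."""
    core_sizes: List[int] = []
    pan_sizes: List[int] = []
    core: Set[str] = set()
    pan: Set[str] = set()
    for index, genes in enumerate(genomes.values()):
        core = set(genes) if index == 0 else core & genes
        pan |= genes
        core_sizes.append(len(core))
        pan_sizes.append(len(pan))
    return core_sizes, pan_sizes


# Hypothetical strains and gene-family labels, for illustration only.
toy = {
    "strain_A": {"dnaK", "groEL", "phaZ"},
    "strain_B": {"dnaK", "groEL", "hflC"},
    "strain_C": {"dnaK", "groEL", "phaZ", "virB4"},
}
core, pan = core_and_pan(toy)
core_curve, pan_curve = accumulation_curves(toy)
print(sorted(core), len(pan), core_curve, pan_curve)
```

A core curve that flattens while the pan curve keeps rising corresponds to the saturated core genome and open pan genome described in the text.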

Functional pathways: enrichment and variation

The functional repertoire of the 19 Sphingopyxis strains was obtained from FIGfam IDs using the MinPath server (Yuzhen and Doak, 2009). A total of 622 complete pathways were mapped on the pan genome of the strains. Profiling was done in two ways: on the basis of the number of genes annotated under a family, and on the presence or absence of a particular pathway as per the MinPath server. MinPath uses a parsimony approach to predict pathway presence on the basis of annotated genes. The annotated pathways were then grouped into three classes: most abundant (present in the maximum number of strains), most varied (differentially present among strains) and strain-specific pathways. To assign pathways to these classes, a matrix was prepared and categorised.

Central Functional Profile: Most Abundant Protein Families

Although Sphingopyxis strains are predominantly known as aerobic respirers, their functional profile suggested that all strains except S. flava R11H possess a pathway for anaerobic respiration (Figure 3A). Previous reports have shown anaerobic nitrate respiration by S. granuli TFA (Garcia-Romero et al., 2016) as an environmental adaptation, and hence anaerobic respiration was reported as a species-specific adaptation. However, our analysis indicates that the majority of Sphingopyxis strains contain genes for anaerobic respiration (Supplementary figure 2). In addition, the anaerobic benzoate degradation pathway, which is generally a feature of certain bacterial (other than sphingomonads) and archaeal strains (Gibson and Gibson 1992; Holmes et al., 2012), was identified in 16 of the 19 strains, the exceptions being Sphingopyxis sp. QXT-31, S. flava R11H and S. witflariensis DSM14551.
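The three-way grouping of pathways introduced at the start of this section (most abundant, most varied, strain-specific) can be expressed as a rule over a pathway-by-strain presence/absence matrix. The sketch below is one possible reading: the 75% cut-off for "most abundant" is an assumption, since the text does not state a numeric threshold, and the strain labels and matrix values are hypothetical.

```python
from typing import Dict


def classify_pathways(matrix: Dict[str, Dict[str, int]],
                      abundant_fraction: float = 0.75) -> Dict[str, str]:
    """Label each pathway from its presence (1) or absence (0) across strains.

    abundant_fraction is an assumed cut-off, not a value from the study.
    """
    labels: Dict[str, str] = {}
    for pathway, row in matrix.items():
        present = sum(1 for value in row.values() if value)
        if present == 0:
            labels[pathway] = "absent"
        elif present == 1:
            labels[pathway] = "strain-specific"
        elif present / len(row) >= abundant_fraction:
            labels[pathway] = "most abundant"
        else:
            labels[pathway] = "most varied"
    return labels


strains = ["s1", "s2", "s3", "s4", "s5"]  # hypothetical strain labels
toy_matrix = {
    "anaerobic respiration": dict(zip(strains, [1, 1, 1, 1, 0])),
    "toxin-antitoxin system": dict(zip(strains, [1, 0, 1, 0, 0])),
    "tocopherol biosynthesis": dict(zip(strains, [0, 0, 0, 0, 1])),
}
labels = classify_pathways(toy_matrix)
for pathway, label in labels.items():
    print(f"{pathway}: {label}")
```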

Genes for naphthalene and anthracene degradation were noted in Sphingopyxis spp. Salicylate and gentisate degradation genes were also annotated in 14 of the 19 strains, the exceptions being S. witflariensis DSM14551, Sphingopyxis sp. QXT-31, Sphingopyxis sp. MG, Sphingopyxis sp. FD7 and S. bauzanensis DSM22271 (Figure 3A). Notably, salicylate is an intermediate of the naphthalene degradation pathway and is oxidized to catechol, followed by ring cleavage, or to gentisate (Bosch et al., 1999).

Focusing on stress resistance, genes encoding resistance against arsenic, cobalt-zinc-cadmium and fluoroquinolones were identified in all strains except S. witflariensis DSM14551. Furfurals, produced as a side-product of the sugar industry, inhibit microbial growth by limiting sulphur assimilation (Miller et al., 2009) or by inducing reactive oxygen species that cause cellular damage (Allen et al., 2010). The majority of Sphingopyxis spp. possess tolerance against furfural-induced stress (Figure 3A). Another adaptation strategy identified is the presence of the Hfl operon in all Sphingopyxis strains, although S. flava R11H and S. witflariensis DSM14551 carry only hflB and hflC, and Sphingopyxis sp. MG possesses hflC, hflK and hflX. The Hfl operon is reported to induce lysogenic attack by phages as a mode of adaptation in bacteria (Belfort and Wulff, 1973). Hence, the presence of the Hfl operon is thought to enhance lysogenic encounters with phages, which in turn helps in the acquisition of genes that may contribute positively towards developing an adaptive phenotype.

In Sphingopyxis strains, the presence of a Type IV secretion system was noted (Figure 3A). The Type IV secretion and conjugative transfer system is known to exchange DNA and protein substrates through direct cell-to-cell contact (Christie, 2001). This system is primarily observed in pathogenic bacteria, where it is tailored for the release of virulence factors and modulation of physiological processes (Backert and Selbach, 2008; Alvarez-Martinez and Christie, 2009). An association of the Type IV system with the transfer of plasmids and antibiotic resistance genes has also been reported (Green and Mecsas, 2016). Earlier studies have suggested its occurrence in sphingomonad species (Lawley et al., 2003; Miyazaki et al., 2006; Verma et al., 2017). Sphingopyxis, and sphingomonads more broadly, are generally non-pathogenic. Hence, in these strains, the main role of the Type IV secretion system might relate to conjugation of plasmids and defence against antibiotics.

Previous studies have reported the involvement of a few Sphingopyxis strains in polyhydroxybutyrate (PHB)-degrading consortia (Fernández-García et al., 2016; Silva et al., 2007; Lam et al., 2017). The ability to metabolize PHBs was also reported in a few sphingomonads such as Sphingomonas pituitosa EDIV and Sphingopyxis macrogoltabida TFA (Denner et al., 2001; Martín-Cabello et al., 2011). Polyhydroxybutyrate is predominantly known as a potential macromolecule for the production of biodegradable plastics (Verlinden et al., 2007; Getachew et al., 2016). PHBs are metabolized into acetyl acetate with the help of PHB depolymerase (PhaZ) and re-polymerized by 3-hydroxybutyrate dehydrogenase (BdhA/Bdh1), acetyl-CoA synthetase (AcsA2), acetoacetyl-CoA reductase (PhbB) and PHB synthase (PhbC) (Altaee et al., 2016). Here, phaZ was annotated in all strains except S. witflariensis DSM14551 and Sphingopyxis sp. QXT-31 (Supplementary figure 3 and supplementary table 6). bdhA/bdh1 was annotated in all strains, whereas two further enzymes involved in PHB synthesis, encoded by phbA and phaB, were annotated in S. witflariensis DSM14551, Sphingopyxis sp. MG, Sphingopyxis sp. PAMC25046, Sphingopyxis sp. EG6, Sphingopyxis sp. FD7, S. lindanitolerans WS5A3p and Sphingopyxis sp. QXT-31. PHB synthase, encoded by phbC, which catalyses the conversion of 3-hydroxybutyrate-CoA to PHB, was annotated in all Sphingopyxis strains. Hence, on the basis of their genetic repertoire, it can be predicted that these strains may be good candidates for PHB treatment plants, but only after consortia tests.

Functional assortment within strains: Most Varied Protein Families

Sphingopyxis strains showed variation in the occurrence of metabolic pathways (Figure 3B). For instance, genes for capsular and exopolysaccharide production were annotated. Capsular polysaccharides provide protection and contribute to bacterial pathogenesis in Gram-negative bacteria (Whitfield et al., 2006). Exopolysaccharides are secreted into the extracellular environment and confer pathogenicity and biofilm formation in some strains (Schmid et al., 2015). Among sphingomonads, few have been characterized for exopolysaccharide production (Denner et al., 2001). Among Sphingopyxis strains, S. alaskensis RB2256, S. indica DS15, S. fribergensis Kp5.2, S. macrogoltabida EY-1, S. lindanitolerans WS5A3p and Sphingopyxis sp. FD7 carry genes encoding both capsular and exopolysaccharide biosynthesis. Sphingopyxis sp. QXT-31, S. macrogoltabida EY-1, S. alaskensis RB2256, S. indica DS15, Sphingopyxis sp. EG6 and S. fribergensis Kp5.2 possess genes for capsular polysaccharide only, while Sphingopyxis sp. 113P3, S. witflariensis DSM14551, S. terrae subsp. ummariensis, S. granuli TFA, S. granuli NBRC 100800 and Sphingopyxis sp. MG carry genes for exopolysaccharide. Interestingly, all strains except S. witflariensis DSM14551, S. flava R11H, Sphingopyxis sp. QXT-31, Sphingopyxis sp. MG and S. terrae subsp. ummariensis UI2 carry genes encoding the streptococcal hyaluronic acid capsule, a characteristic of pathogenic streptococcal strains that protects them from phagocytosis (Wessels et al., 1991) during invasion into keratinocytes. However, pathogenic properties have not been reported for the genus Sphingopyxis so far.

Gene families conferring stress resistance were also identified. One of these was the toxin/antitoxin system, which is involved in functions that combat stress, such as regulation and maintenance of plasmids, virulence expression, biofilm formation and resistance against antibiotics (Fernández-García et al., 2016). 11 strains out of 19, viz. S. alaskensis RB2256, S. bauzanensis DSM22271, Sphingopyxis sp. 113P3, S. fribergensis Kp5.2, S. granuli NBRC 100800 and TFA, S. macrogoltabida 203N and EY-1, S. indica DS15, Sphingopyxis sp. QXT-31, S. lindanitolerans WS5A3p, Sphingopyxis sp. PAMC25046, Sphingopyxis sp. EG6, Sphingopyxis sp. FD7 and S. flava R11H, carry a toxin/antitoxin system, which has not yet been reported in sphingomonads. Another notable gene family is hyperosmotic ectoine synthesis, which protects Sphingopyxis spp. against high salinity (Reshetnikov et al., 2011). Genes encoding the synthesis of ectoine, an osmoprotectant, were observed in S. alaskensis RB2256, S. terrae NBRC15098, Sphingopyxis sp. 113P3, S. granuli strains, S. macrogoltabida strains, Sphingopyxis sp. PAMC25046, S. lindanitolerans WS5A3p and S. terrae subsp. ummariensis. Interestingly, strains of the genus Sphingopyxis are known to adapt to severe environmental conditions such as high salinity and high osmotic stress, and to tolerate and degrade aromatic compounds, but the presence of capsular polysaccharide, exopolysaccharide and hyaluronic acid production and of toxin/antitoxin systems points to other, as yet unexplored functions of this genus that need to be investigated.

Strain specific pathways: Least Abundant Protein Families

Sphingopyxis strains showed the presence of a wide range of metabolic pathways, of which some families are limited to certain species or strains. Among all strains, four, S. witflariensis DSM14551 (23), S. flava R11H (18), Sphingopyxis sp. FD7 (13) and Sphingopyxis sp. MG (11), showed the highest numbers of specific gene families. Interestingly, unlike other Sphingopyxis spp., S. witflariensis DSM14551 carries Staphylococcal phi-Mu50B-like prophages and Streptococcus pyogenes transcription and virulence regulators (Graham, 2002) (Figure 3C), which suggest possible pathogenic abilities of the strain. Notably, pathogenic abilities have not yet been reported in any sphingomonad strain. On closer inspection, Staphylococcal pathogenicity islands (SaPI) and a Type VI secretion system were found in S. witflariensis DSM14551 and Sphingopyxis sp. MG. SaPIs are mobile, phage-related islands that carry two or more superantigens and are capable of inducing disease in humans (Novick and Subedi, 2007). The presence of Streptococcus pyogenes transcription and virulence regulators (Graham, 2002), a Type VI secretion system and SaPI suggests that S. witflariensis DSM14551 might belong to the pathogenic group of bacteria. Sphingopyxis sp. MG carries an ABC transporter for ferrichrome. Transporters of this type are mainly present in Staphylococcus species (Hanks et al., 2005), which links strain MG with virulence abilities. Interestingly, a chloroaromatic degradation pathway was also annotated in this strain, suggesting a possible dual role. In Sphingopyxis sp. FD7, the pathway for extracellular polysaccharide biosynthesis of Streptococci was annotated. Exopolysaccharide synthesis, in general, facilitates the adhesion of cells to the host membrane or surface (Jenkinson et al., 1997). A gene encoding an autolysin from the genus Listeria was also annotated in strain FD7; it likewise plays an important role in facilitating adhesion of bacterial cells to host cells (Milohanic et al., 2001). These annotated gene families suggest strain FD7 as a third strain with probable pathogenic/commensal properties. In this strain, the teichoic and lipoteichoic acid biosynthesis pathway, which produces a Gram-positive cell wall component, was also identified (Mathew et al., 2014). Generally, these acids protect bacteria from antimicrobial peptides, which suggests their possible role in the strain.

Another interesting pathway found in S. flava R11H was tocopherol (vitamin E) biosynthesis for protection against reactive oxygen species, a property restricted to photosynthetic organisms (Satler et al., 2003) and a few non-photosynthetic organisms (Sussmann et al., 2011). A mycolic acid synthesis pathway was also observed in S. flava R11H; mycolic acid is a major lipid component of the cell wall of Mycobacterium strains (Marrakchi et al., 2014). In Mycobacterium, mycolic acids play a crucial role in conferring virulence (Beken et al., 2011) and low cell wall permeability (Liu et al., 1995). In addition, the importance of mycolic acid has been reported in the degradation of gutta-percha (a tough plastic-like substance) and rubber (Luo et al., 2014). Still, it is difficult to predict the role of mycolic acid synthesis in strain R11H, as there are no reports of mycolic acid production in Gram-negative bacteria.

In S. lindanitolerans WS5A3p, an accessory colonization factor was annotated, which indicates an ability to colonize the intestine, a trait unique among the environment-dwelling Sphingopyxis strains. Another important pathway is the synthesis of an apigenin derivative; apigenin is a dietary flavonoid with anti-tumor activity (Yeung et al., 2006). This suggests a possible role of strain WS5A3p in apigenin production.

Moreover, other Sphingopyxis strains have specific pathways (Figure 3C) which account for their diverse environmental adaptations. In particular, organisms that are similar at the species level have accumulated a few differences: S. macrogoltabida EY-1 possesses pathways for degradation of phenylpropionate and aromatic amino acids, whereas S. macrogoltabida 203N carries genes for the cholera toxin, cytolysin and lipase operon of Vibrio, and S. granuli NBRC100800 has a lactose utilization gene family. Similarly, S. terrae subsp. ummariensis carries a Na(+)/H(+) antiporter, Na+-translocating decarboxylases and related biotin-dependent enzymes, while S. terrae NBRC15098 carries the bacitracin stress response. Bacitracin is an antibiotic generally secreted by Bacillus spp. against Gram-positive bacteria (Haavik et al., 1974), and the stress response suggests the adaptation of S. terrae NBRC15098 to bacitracin. Sphingopyxis sp. EG6 carries the DedA family of inner membrane proteins. S. granuli TFA and Sphingopyxis sp. PAMC25046 do not have any specific pathway.

The complete metabolic matrix of Sphingopyxis strains, based on the presence or absence of metabolic pathways, was clustered using Principal Component Analysis (PCA) to depict the relatedness of the strains (Figure 3D). The analysis indicates that S. flava R11H, S. witflariensis DSM14551, Sphingopyxis sp. FD7 and Sphingopyxis sp. MG showed the maximum divergence from the other Sphingopyxis spp., which might be because of the higher number of specific pathways acquired by these strains. The other strains grouped closely, which suggests the extent of their proximity in terms of metabolic potential.
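To make the PCA step concrete, the sketch below projects strains onto the first principal component of a presence/absence matrix. The text does not say which software or settings were used for Figure 3D, so this is only an assumed, dependency-free illustration: it centres the columns, builds the sample covariance matrix and finds the leading eigenvector by power iteration. The four toy strains and their pathway profiles are hypothetical.

```python
import math
from typing import List


def first_pc_scores(rows: List[List[float]], iterations: int = 100) -> List[float]:
    """Project each row (strain) onto the first principal component."""
    n, m = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in rows]
    # Sample covariance matrix of the pathway columns (m x m).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    # Non-uniform start vector, so it is unlikely to be orthogonal
    # to the leading eigenvector.
    vec = [float(j + 1) for j in range(m)]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # no variance at all: every score is zero
            break
        vec = [x / norm for x in nxt]
    return [sum(c[j] * vec[j] for j in range(m)) for c in centered]


# Hypothetical matrix: rows are strains, columns are pathways (1 = present).
toy_rows = [
    [1, 1, 0, 0],  # strain_A
    [1, 1, 0, 0],  # strain_B
    [1, 1, 0, 0],  # strain_C
    [0, 0, 1, 1],  # strain_D, carrying a different set of pathways
]
scores = first_pc_scores(toy_rows)
print([round(s, 3) for s in scores])
```

In this toy case the strain with the distinct pathway set lies far from the other three along the first component, which mirrors how strains with many specific pathways separate from the main group. The sign of a principal component is arbitrary, so only relative positions are meaningful.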

Protein-Protein Interaction: determining the intrinsic properties of genus Sphingopyxis

So far, we have discussed the noteworthy diversity in the sources of isolation of Sphingopyxis spp., which corresponds to their ability to manage different types of stress, such as oxidative stress, osmotic stress, and excessive heat or cold. Another notable feature is aromatic compound degradation, for which genes were annotated in several strains. Hence, to determine whether the stress resistance and aromatic compound degradation mechanisms of Sphingopyxis strains are related to the core genome, protein-protein interactomes (PPI) were prepared. For this, genes encoding stress resistance and aromatic compound degradation functions present in the pan genome, annotated using the MG-RAST server (Keegan et al., 2016), were used to prepare separate interactomes with core proteins (Table 2 and Figure 4). We also prepared two more interactomes with the core genome, one for virulence proteins and another for membrane transport proteins. No report is available on the pathogenic ability of Sphingopyxis strains. However, the above analysis revealed signs of pathogenicity in the genetic repertoire of three strains, namely S. witflariensis DSM14551, Sphingopyxis sp. MG and Sphingopyxis sp. FD7, which drew our interest to this functional property. Membrane transport is an important link between the above three functional families and was therefore also analysed. Hence, four PPI were prepared (Figure 4).

A PPI network of the core proteins was also prepared, in which 1397 of 1422 proteins were mapped. GuaA was identified as the hub forming the maximum number of network connections (346), followed by GuaB (342) and PolA (326) (Supplementary figure 4). GuaA is guanosine monophosphate (GMP) synthase, which catalyzes the synthesis of GMP from glutamine and xanthosine 5'-monophosphate (XMP). GuaB is inosine-5'-monophosphate dehydrogenase, which catalyzes the conversion of inosine 5'-phosphate (IMP) to XMP. Both GuaA and GuaB are part of the de novo synthesis of guanine nucleotides. PolA is a prokaryotic DNA polymerase (Supplementary figure 4). Other proteins listed in Table 2 are essential for the DNA replication, transcription and translation machinery. The distribution of edge numbers (#) for all the genes in the core and functional interactomes was considered, which reflects the different hubs, or most highly networked proteins, in the different interactomes. These are highlighted with a yellow diamond-shaped mark in the respective interactomes.
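The notion of a hub as the protein with the most network connections can be sketched as follows, assuming the interactome is available as a plain list of protein pairs (for example, exported from a PPI database). The function names and the toy edge list are hypothetical and only illustrate the counting; they are not the study's pipeline, which used Cytoscape and Perl.

```python
from collections import Counter, defaultdict

def degree_table(edges):
    """Return {protein: number of distinct interaction partners}."""
    partners = defaultdict(set)
    for a, b in edges:
        if a == b:  # ignore self-interactions
            continue
        partners[a].add(b)
        partners[b].add(a)
    return {protein: len(found) for protein, found in partners.items()}

def top_hubs(degrees, n=3):
    """Proteins with the most partners; ties are broken alphabetically."""
    return sorted(degrees.items(), key=lambda item: (-item[1], item[0]))[:n]

def degree_distribution(degrees):
    """Fraction of proteins P(k) that have exactly k partners."""
    counts = Counter(degrees.values())
    return {k: counts[k] / len(degrees) for k in sorted(counts)}

# Hypothetical toy network; the labels are placeholders, not proteins from the study.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "A")]
toy_degrees = degree_table(toy_edges)
print(top_hubs(toy_degrees, n=2))
print(degree_distribution(toy_degrees))
```

Storing partners in sets means that a pair reported twice, or in both orientations, is counted once, so the edge number of a protein is its number of distinct neighbours.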


Aromatic compounds degradation

Sphingopyxis strains have been studied for aromatic compound tolerance and degradation because they were isolated from habitats contaminated with aromatic compounds (Schut et al., 1993; Kim et al., 2005; Verma et al., 2015). Here, an interactome was prepared using degradation proteins (obtained from the pangenome) together with core proteins to decipher whether proteins for aromatic compound degradation interact with the core content. In total, 81 proteins were annotated and plotted in the interactome, most of which were putative proteins with unknown functions. Three known proteins, AroQ, MaiA and AroK, were also mapped (Figure 4A). AroQ and AroK are 3-dehydroquinate dehydratase and shikimate kinase 1, respectively, both of which play an important role in chorismate synthesis and aromatic amino acid biosynthesis. MaiA is a mitochondrial ATPase inhibitor. Despite the mapping of 81 genes, the interactome of core and degradation proteins showed little interaction between the two sets of genes. This indicates that aromatic degradation proteins are placed distantly from core proteins and hence that aromatic compound degradation is limited to some strains and is not a core property of the genus Sphingopyxis.
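One way to put a number on "little interaction between the two sets" is to count the edges that join a core protein to a functional-category protein, and the share of category proteins that have at least one such edge. The sketch below assumes the two protein sets are disjoint and that the interactome is a list of protein pairs; the function name, the labels and the toy data are hypothetical, and this measure is our illustration rather than a statistic reported in the study.

```python
def cross_set_summary(edges, core, functional):
    """Count core-to-functional edges and the fraction of functional proteins they reach."""
    core, functional = set(core), set(functional)
    cross_edges = set()
    for a, b in edges:
        if (a in core and b in functional) or (a in functional and b in core):
            cross_edges.add(frozenset((a, b)))  # unordered, so duplicates collapse
    linked = {p for edge in cross_edges for p in edge if p in functional}
    fraction = len(linked) / len(functional) if functional else 0.0
    return len(cross_edges), fraction

# Hypothetical toy data: three core proteins and four category proteins.
toy_core = {"C1", "C2", "C3"}
toy_functional = {"F1", "F2", "F3", "F4"}
toy_edges = [("C1", "C2"), ("C1", "F1"), ("F1", "C1"), ("F2", "F3"), ("C3", "F1")]
n_cross, fraction_linked = cross_set_summary(toy_edges, toy_core, toy_functional)
print(n_cross, fraction_linked)
```

A category whose proteins mostly interact among themselves gives a low fraction, whereas a category that is woven into the core network gives a fraction close to one.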

Stress response

Due to the diverse niches occupied by Sphingopyxis strains, they encounter different environmental stresses and harbour corresponding adaptations. Proteins annotated under stress resistance were mapped with the core genome. The PPI network showed marked integration of stress resistance proteins with core proteins. In total, 101 proteins related to stress resistance were mapped, of which 81 had putative functions and 20 were known proteins. GshB, a glutathione synthetase, was identified as the hub of the network with the maximum number of interactions (Figure 4B). Glutathione (GSH) is an antioxidant which helps in combating oxidative and osmotic stress (Masip et al., 2006). Glutathione synthetase catalyses the production of GSH from γ-glutamylcysteine and glycine (Lu, 2009), whereas glutathione reductase regenerates GSH from glutathione disulphide (Yan et al., 2013). Another protein identified was glutathione transferase, which catalyses a nucleophilic attack by GSH on the electrophilic centres of toxic compounds and thereby helps to remove them from the cell (Hayes et al., 2005). Another important protein in the hub was the heat-inducible transcription repressor HrcA, which prevents induction of heat-shock operons and hence acts as a regulator (Schulz and Schumann, 1996). Ectoine synthase (EctC) helps in maintaining osmolarity by synthesizing ectoine, an osmoprotectant. Another such protein mapped is GrpE, a heat shock and hyperosmotic stress protein (Harrison, 2003). Catalase-peroxidase (KatG) oxidizes different compounds such as NAD(P)H and protects against toxic reactive oxygen species (ROS). The chaperones DnaK and DnaJ, which assist in the DNA replication process, and RpoH and RpoD, which are sigma factors required by RNA polymerase, were also found in the network. The integration of the stress network with the core proteins was profound, which suggests a link between core and stress resistance proteins.

Virulence factors

Virulence factors were identified in only three strains, S. witflariensis DSM14551, Sphingopyxis sp. MG and Sphingopyxis sp. FD7, and raised interest because no Sphingopyxis sp. has been identified as a pathogen to date. Hence, to determine whether the virulence proteins form a close network with core proteins, an interactome of virulence proteins and core proteins was prepared. Virulence proteins showed little interaction with core proteins and were instead concentrated in a separate network (Figure 4C). BlaR1 is a potential penicillin-binding protein required for induction of β-lactamase. Along with BlaR1, another protein, triosephosphate isomerase (TpiA), was mapped in the network; it catalyzes the stereospecific conversion of dihydroxyacetone phosphate (DHAP) to D-glyceraldehyde-3-phosphate (G3P). Like TpiA, the other proteins in the network do not indicate any connection between core and virulence proteins. This suggests that the possible virulence ability is limited to a few strains only.

Membrane transport

Membrane transport proteins contribute to the transport of molecules across the cell membrane and have a combined role in aromatic compound degradation, stress resistance and virulence behaviour of bacteria. Hence, a major interest was the interactome of core and membrane transport proteins. A total of 152 proteins belonging to the membrane transport system were mapped on the PPI network, of which only 14 have known functions. Among these, eight proteins belong to the intracellular protein transmembrane transport family (Figure 4D and Table 2). These were secY (protein translocase subunit SecY), secB (protein-export protein), secE (protein translocase subunit SecE), secF_1 (protein translocase subunit SecD), secF_2 (protein-export membrane protein SecF), and TatA, TatB and TatC (Sec-independent protein translocases). Other mapped proteins include CpaD, a pilus assembly protein of the type IV secretion system. Two proteins, Ffh and FtsY, are the signal recognition particle protein and its receptor, respectively (Figure 4D). The PPI network showed closer interaction of membrane transport proteins with core proteins than was seen for aromatic degradation and virulence proteins. However, the presented data are based on PPI predictions only, owing to the lack of comparative data on Sphingopyxis strains.

Conclusion

Comparative genomic analysis of Sphingopyxis strains was performed to unravel the functional attributes of members of the genus Sphingopyxis. We identified that S. baekryungensis DSM16222 and Sphingopyxis sp. LPB0140 should not be placed in the genus Sphingopyxis owing to their comparatively low sequence identities, while the other uncharacterized strains should be included in Sphingopyxis. Functional attributes were examined with a focus on the abundant as well as the rare gene families. Anaerobic respiration, type I secretion, the Hfl operon, exopolysaccharide formation, toxin-antitoxin systems and hyaluronic acid production are some of the interesting features of Sphingopyxis strains. Genes that suggest possible virulence behaviour were annotated in S. witflariensis DSM14551, Sphingopyxis sp. FD7 and Sphingopyxis sp. MG. The PPI analysis suggests a close network between stress and core proteins, while virulence and aromatic compound degradation proteins showed little or no such interaction with core proteins. Thus, virulence and aromatic compound degradation can be proposed to be limited to some members of the genus, whereas stress resistance can be proposed as a core property of the genus Sphingopyxis. Notably, certain genetic features point to the usefulness of strains for biotechnological applications. For instance, polyhydroxybutyrate degradation as a core property of Sphingopyxis strains suggests their usability in the production and clean-up of biodegradable plastics.

Methods

Selection of strains and data retrieval

Genomes of Sphingopyxis strains were retrieved from the NCBI genome database. Strains which are either taxonomically characterized or have complete published genomes were included in the study. These include Sphingopyxis alaskensis RB2256 (Williams et al., 2009), Sphingopyxis baekryungensis DSM16222, Sphingopyxis bauzanensis DSM22271 (Kaminski et al., 2017b), Sphingopyxis fribergensis Kp5.2 (Oelschlägel et al., 2014), Sphingopyxis granuli NBRC100800, Sphingopyxis granuli TFA, Sphingopyxis macrogoltabida 203N (Ohtsubo et al., 2016), Sphingopyxis macrogoltabida EY-1 (Ohtsubo et al., 2015b), Sphingopyxis witflariensis DSM14551 (Kaminski et al., 2017a), Sphingopyxis terrae NBRC15098, Sphingopyxis sp. QXT-31, Sphingopyxis sp. EG6, Sphingopyxis sp. FD7, S. lindanitolerans WS5A3p (Kaminski et al., 2018), Sphingopyxis sp. MG, Sphingopyxis sp. PAMC25046, Sphingopyxis sp. LPB0140, Sphingopyxis sp. 113P3 (Ohtsubo et al., 2015a), Sphingopyxis flava R11H, Sphingopyxis indica DS15 and Sphingopyxis terrae subsp. ummariense UI2 (Feng et al., 2017).

Core and pan genome analysis

The core genetic content among Sphingopyxis strains was identified using GET_HOMOLOGUES (Contreras-Moreira and Vinuesa, 2013) at 75% query coverage and 75% identity using the COG and OMCL algorithms. The scripts available in the GET_HOMOLOGUES package were used to determine the pan genome and to plot the core and pan genome using the Tettelin fit.

Phylogenetic division

The 21 strains included in the study were analysed for relatedness using the genome-to-genome distance calculator (GGDC) (Auch et al., 2010) at http://ggdc.dsmz.de/. Average Nucleotide Identity (ANI) and Average Amino acid Identity (AAI) among strains were also calculated at http://enve-omics.ce.gatech.edu/ (Rodriguez-R and Konstantinidis, 2014). Core content was extracted from each genome using GET_HOMOLOGUES as described above and SNPs were calculated using kSNP3.0 (Gardner et al., 2015). Phylogenies were constructed from the matrices of values obtained from GGDC, ANI and core-SNP analysis using Interactive Tree Of Life (iTOL) v4 (Letunic and Bork, 2016). The consensus phylogeny was plotted on the dendrogram obtained from all three methods and clades were highlighted with different colors.

Annotation of genomes and functional pathways

Functions were assigned using the Glimmer method at Rapid Annotation using Subsystem Technology (RAST) (Meyer et al., 2008) and the SEED project (Overbeek et al., 2013) for all the strains under study. Fig IDs were retrieved and used to identify enriched pathways in strains using the MinPath server (Ye and Doak, 2009). The pathways mapped with MinPath, which is based on a parsimony approach, were further sorted into three divisions: the most abundant pathways, the differentially present pathways and the strain-specific pathways, using a matrix prepared on the basis of presence or absence of each pathway. Nearly or more than 150 of the most abundant and of the most differentially present pathways were plotted in the first two categories, respectively, and all strain-specific pathways were plotted in the third. Matrices were prepared on the presence and absence of pathways in each category, in which 1 denotes presence and 0 denotes absence.
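The construction of the presence/absence matrix and its division into three categories can be sketched as below. The cut-offs are assumptions made for illustration: a pathway found in every strain is called "abundant", a pathway found in exactly one strain is "strain_specific", and everything in between is "differential"; the study does not state its exact thresholds. Strain and pathway names are hypothetical placeholders.

```python
def build_matrix(presence):
    """Turn {strain: set of pathways} into {pathway: [0/1 per strain]}."""
    strains = sorted(presence)
    pathways = sorted(set().union(*presence.values()))
    rows = {p: [1 if p in presence[s] else 0 for s in strains] for p in pathways}
    return strains, rows

def classify(rows):
    """Sort pathways into three categories by how many strains carry them."""
    groups = {"abundant": [], "differential": [], "strain_specific": []}
    for pathway, row in rows.items():
        count = sum(row)
        if count == len(row):      # assumed cut-off: present in every strain
            groups["abundant"].append(pathway)
        elif count == 1:           # present in exactly one strain
            groups["strain_specific"].append(pathway)
        else:
            groups["differential"].append(pathway)
    return groups

# Hypothetical toy input; names are placeholders, not MinPath output.
toy_presence = {
    "strain_A": {"p1", "p2", "p3"},
    "strain_B": {"p1", "p2"},
    "strain_C": {"p1", "p4"},
}
toy_strains, toy_rows = build_matrix(toy_presence)
toy_groups = classify(toy_rows)
for pathway, row in toy_rows.items():
    print(pathway, row)
print(toy_groups)
```

The same 0/1 rows can then be passed to clustering or ordination tools, since each row is simply the profile of one pathway across the strains in a fixed order.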

Then, heatmaps of the most abundant, varied and strain-specific pathways were prepared using the Pearson correlation method in MeV software (Saeed et al., 2003) and hierarchical clustering was performed. Principal component analysis was performed using a matrix built from the complete list of pathways and their presence or absence in the different strains as per the MinPath parsimony approach. PC1 and PC2 were plotted using MultiExperiment Viewer (MeV) to determine the close clustering among strains based on their functional repertoire (Howe et al., 2011). RNAmmer (Lagesen et al., 2007) and ARAGORN (Laslett and Canback, 2004) were used to identify rRNA and tRNA, respectively, in the strains.

Construction of Protein-Protein Interaction Network of Sphingopyxis strains

The functional subsystems of the pan genome were identified against FIGfams and classified into SEED subsystems for annotation of genes. Genes classified under aromatic compound degradation, stress response, membrane transport and virulence were retrieved from the pan genome of Sphingopyxis strains using the annotation feature of MG-RAST (Keegan et al., 2016). To construct protein-protein interactions for Sphingopyxis strains, STRING database v10 (Szklarczyk et al., 2015), a comprehensive resource for PPI, was used. The STRING database comprises already recognized, published and predicted PPIs, which include direct (physical) and indirect (functional) associations. Data from the genus Sphingopyxis were selected for PPI construction, owing to their phylogenetic relatedness, as per the criteria (Bergogne-Berezin and Towner, 1996).

Identification of Hubs in PPI network

PPI networks were used to identify the hubs that are crucial in biological networks. Hubs tend to have a high degree of interactions (Sengupta et al., 2009). In this study, hub proteins and their interactions were identified using NetworkAnalyzer, a plug-in of Cytoscape v3.0.1, and Perl version 5.18.2.2.

Statistical Analysis of the Network

The statistical and functional significance of the network was measured using different statistical parameters such as the probability of degree distribution, the average clustering coefficient and the average neighbourhood connectivity (Albert and Barabasi, 2002). The network was analysed to determine whether or not it obeyed a power law (Barabasi and Albert, 1999). Statistical analysis was performed as in a previous study (Gupta et al., 2016).

References

Albert, R., and Barabasi, A. L. (2002). Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97.

Allen, S. A., Clark, W., McCaffery, J. M., Cai, Z., Lanctot, A., Slininger, P. J. et al. (2010). Furfural induces reactive oxygen species accumulation and cellular damage in Saccharomyces cerevisiae. Biotechnol. Biofuels. 3, 2.

Altaee, N., El-Hiti, G. A., Fahdil, A., Sudesh, K., and Yousif, E. (2016). Biodegradation of different formulations of polyhydroxybutyrate films in soil. SpringerPlus. 5, 762.

Alvarez-Martinez, C. E., and Christie, P. J. (2009). Biological diversity of prokaryotic type IV secretion systems. Microbiol. Mol. Biol. Rev. 73, 775–808.

Auch, A. F., Klenk, H. P., and Goker, M. (2010). Standard operating procedure for calculating genome-to-genome distances based on high-scoring segment pairs. Stand. Genomic Sci. 2, 142–148.

Aylward, F. O., McDonald, B. R., Adams, S. M., Valenzuela, A., Schmidt, R. A., Goodwin, L. A. (2013). Comparison of 26 sphingomonad genomes reveals diverse environmental adaptations and biodegradative capabilities. Appl. Environ. Microbiol. 79, 3724-3733.

Backert, S., and Selbach, M. (2008). Role of type IV secretion in Helicobacter pylori pathogenesis. Cell. Microbiol. 10, 1573–1581.

Balkwill, D. L., Fredrickson, J. K., and Romine, M. F. (2006). Sphingomonas and related genera. The Prokaryotes: Volume 7: Proteobacteria: Delta, Epsilon Subclass, 605-629.

Barabasi, A. L., and Albert, R. (1999). Emergence of Scaling in Random Networks. Science. 286, 509-512.

Belfort, M., and Wulff, D. L. (1973). Genetic and biochemical investigation of the Escherichia coli mutant hfl-1 which is lysogenized at high frequency by bacteriophage lambda. J. Bacteriol. 115, 299–306.

Bergogne-Berezin, E., and Towner, K. J. (1996). Acinetobacter spp. as nosocomial pathogens: microbiological, clinical, and epidemiological features. Clin. Microbiol. Rev. 9, 148-165.

Bosch, R., Garcia-Valdes, E., and Moore, E. R. B. (1999). Genetic characterization and evolutionary implications of a chromosomally encoded naphthalene degradation upper pathway from Pseudomonas stutzeri AN10. Gene. 236, 149–157.

Christie, P. J. (2001). Type IV secretion: Intercellular transfer of macromolecules by systems ancestrally related to conjugation machines. Mol. Microbiol. 40, 294–305.

Contreras-Moreira, B., and Vinuesa, P. (2013). GET_HOMOLOGUES, a versatile software package for scalable and robust microbial pangenome analysis. Appl. Environ. Microbiol. 79, 7696-7701.

Daury, L., Orange, F., Taveau, J. C., Verchère, A., Monlezun, L., Gounou, C. (2016). Tripartite assembly of RND multidrug efflux pumps. Nat. Commun. 7, 10731.

Denner, E. B., Paukner, S., Kämpfer, P., Moore, E. R., Abraham, W. R., Busse, H. J. et al. (2001). Sphingomonas pituitosa sp. nov., an exopolysaccharide-producing bacterium that secretes an unusual type of sphingan. Int. J. Syst. Evol. Microbiol. 51, 827–841.

Duche, D. (2007). Colicin E2 is still in contact with its receptor and import machinery when its nuclease domain enters the cytoplasm. J. Bacteriol. 189, 4217–4222.

Feng, G. D., Wang, D. D., Yang, S. Z., Li, H. P., and Zhu, H. H. (2017). Genome-based reclassification of Sphingopyxis ummariensis as a later heterotypic synonym of Sphingopyxis terrae, with the descriptions of Sphingopyxis terrae subsp. terrae subsp. nov. and Sphingopyxis terrae subsp. ummariensis subsp. nov. Int. J. Syst. Evol. Microbiol. 67, 5279-5283.

Fernández-García, L., Blasco, L., Lopez, M., Bou, G., García-Contreras, R., Wood, T., and Tomas, M. (2016). Toxin-antitoxin systems in clinical pathogens. Toxins (Basel). 8, 227.

Gan, H. M., Hudson, A. O., Rahman, A. Y., Chan, K. G., and Savka, M. A. (2013). Comparative genomic analysis of six bacteria belonging to the genus Novosphingobium: insights into marine adaptation, cell-cell signalling and bioremediation. BMC Genom. 14, 431.

García-Romero, I., Pérez-Pulido, A. J., González-Flores, Y. E., Reyes-Ramírez, F., Santero, E., and Floriano, B. (2016). Genomic analysis of the nitrate-respiring Sphingopyxis granuli (formerly Sphingomonas macrogoltabida) strain TFA. BMC Genom. 17, 93.

Gardner, S. N., Slezak, T., and Hall, B. G. (2015). kSNP3.0: SNP detection and phylogenetic analysis of genomes without genome alignment or reference genome. Bioinformatics. 31, 2877-2878.

Getachew, A., and Woldesenbet, F. (2016). Production of biodegradable plastic by polyhydroxybutyrate (PHB) accumulating bacteria using low cost agricultural waste material. BMC Res. Notes. 9, 509.

Gibson, K. J., and Gibson, J. (1992). Potential early intermediates in anaerobic benzoate degradation by Rhodopseudomonas palustris. Appl. Environ. Microbiol. 58, 696-698.

Glaeser, S. P., and Kampfer, P. (2014). The Family Sphingomonadaceae. The Prokaryotes, 641-707.

Graham, M. R., Smoot, L. M., Migliaccio, C. A. L., Virtaneva, K., Sturdevant, D. E., Porcella, S. F. et al. (2002). Virulence control in group A Streptococcus by a two-component gene regulatory system: global expression profiling and in vivo infection modelling. Proc. Natl. Acad. Sci. USA. 99, 13855–13860.

Green, E. R., and Mecsas, J. (2016). Bacterial Secretion Systems: An Overview. Microbiol Spectr. 4.

Grissa, I., Vergnaud, G., and Pourcel, C. (2007). The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats. BMC Bioinf. 8, 172.

Gupta, V., Haider, S., Sood, U., Gilbert, J. A., Ramjee, M., Forbes, K. et al. (2016). Comparative genomic analysis of novel Acinetobacter symbionts: A combined systems biology and genomics approach. Sci. Rep. 6, 29043.

Haavik, H. I. (1974). Studies on the formation of bacitracin by Bacillus licheniformis: effect of glucose. Microbiology. 81, 383-390.

Hanks, T. S., Liu, M., McClure, M. J., and Lei, B. (2005). ABC transporter FtsABCD of Streptococcus pyogenes mediates uptake of ferric ferrichrome. BMC Microbiol. 5, 62.

Harrison, C. (2003). GrpE, a nucleotide exchange factor for DnaK. Cell Stress Chaperones. 8, 218–224.

Hayes, J. D., Flanagan, J. U., and Jowsey, I. R. (2005). Glutathione transferases. Annu. Rev. Pharmacol. Toxicol. 45, 51-88.

Ho, L., Meyn, T., Keegan, A., Hoefel, D., Brookes, J., Saint, C. P., and Newcombe, G. (2006). Bacterial degradation of microcystin toxins within a biologically active sand filter. Water Res. 40, 768-774.

Holmes, D. E., Risso, C., Smith, J. A., and Lovley, D. R. (2012). Genome-scale analysis of anaerobic benzoate and phenol metabolism in the hyperthermophilic archaeon Ferroglobus placidus. Int. J. Syst. Evol. Microbiol. 6, 146–157.

Howe, E. A., Sinha, R., Schlauch, D., and Quackenbush, J. (2011). RNA-Seq analysis in MeV. Bioinformatics. 27, 3209-3210.

Jenkinson, H. F., and Lamont, R. (1997). Streptococcal adhesion and colonization. Crit. Rev. Oral Biol. Med. 8, 175–200.

Jeong, C. B., Kim, D. H., Kang, H. M., Lee, Y. H., Kim, H. S., Kim, I. C., and Lee, J. S. (2017). Genome-wide identification of ATP-binding cassette (ABC) transporters and their roles in response to polycyclic aromatic hydrocarbons (PAHs) in the copepod Paracyclopina nana. Aquat. Toxicol. 183, 144-155.

Jindal, S., Dua, A., and Lal, R. (2013). Sphingopyxis indica sp. nov., isolated from a high dose point hexachlorocyclohexane (HCH) contaminated dumpsite. Int. J. Syst. Evol. Microbiol. 63, 2186-2191.

Kaminski, M. A., Sobczak, A., M., Dziembowski, A., Lipinski, L. (2019). Genomic Analysis of γ-Hexachlorocyclohexane-Degrading Sphingopyxis lindanitolerans WS5A3p Strain in the Context of the Pangenome of Sphingopyxis. Genes. 10(9), 688.

Kaminski, M. A., Furmanczyk, E. M., Dziembowski, A., Sobczak, A., Lipinski, L. (2017a). Draft Genome Sequence of the Type Strain Sphingopyxis witflariensis DSM 14551. Genome Announc. 5(36).

Kaminski, M. A., Furmanczyk, E. M., Dziembowski, A., Sobczak, A., Lipinski, L. (2017b). Draft Genome Sequence of the Type Strain Sphingopyxis bauzanensis DSM 22271. Genome Announc. 5(37).

Kaminski, M. A., Sobczak, A., Spolnik, G., Danikiewicz, W., Dziembowski, A., Lipinski, L. (2018). Sphingopyxis lindanitolerans sp. nov. strain WS5A3pT enriched from a pesticide disposal site. Int. J. Syst. Evol. Microbiol. 68, 3935-3941.

Kampfer, P., Witzenberger, R., Denner, E. B. M., Busse, H. J., and Neef, A. (2002). Sphingopyxis witflariensis sp. nov., isolated from activated sludge. Int. J. Syst. Evol.

Keegan, K. P., Glass, E. M., and Meyer, F. (2016). MG-RAST, a Metagenomics Service for Analysis of Microbial Community Structure and Function. Methods Mol. Biol. 1399, 207-233.

Kim, M. K., Im, W. T., Ohta, H., Lee, M., and Lee, S. T. (2005). Sphingopyxis granuli sp. nov., a β-glucosidase producing bacterium in the family Sphingomonadaceae in α-4 subclass of the Proteobacteria. J. Microbiol. 43, 152–157.

Konstantinidis, K. T., and Tiedje, J. M. (2005). Towards a genome-based taxonomy for prokaryotes. J. Bacteriol. 187, 6258–6264.

Kumar, R., Verma, H., Haider, S., Bajaj, A., Sood, U., Ponnusamy, K. et al. (2017). Comparative genomic analysis reveals habitat-specific genes and regulatory hubs within the genus Novosphingobium. mSystems. 2, e00020-17.

Lagesen, K., Hallin, P., Rødland, E. A., Stærfeldt, H. H., Rognes, T., and Ussery, D. W. (2007). RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100-3108.

Lal, R., Pandey, G., Sharma, P., Kumari, K., Malhotra, S., Pandey, R. et al. (2010). Biochemistry of microbial degradation of hexachlorocyclohexane and prospects for bioremediation. Microbiol. Mol. Biol. Rev. 74, 58-80.

Lam, W., Wang, Y., Chan, P. L., Chan, S. W., Tsang, Y. F., Chua, H. et al. (2017). Production of polyhydroxyalkanoates (PHA) using sludge from different wastewater treatment processes and the potential for medical and pharmaceutical applications. Environ. Technol. 38, 1779-1791.

Laslett, D., and Canback, B. (2004). ARAGORN, a program to detect tRNA genes and tmRNA genes in nucleotide sequences. Nucleic Acids Res. 32, 11–16.

Lawley, T. D., Klimke, W. A., Gubbins, M. J., and Frost, L. S. (2003). F factor conjugation is a true type IV secretion system. FEMS Microbiol. Lett. 224, 1–15.

Letunic, I., and Bork, P. (2016). Interactive Tree Of Life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res. 44, 242-245.

Liu, J., Rosenberg, E. Y., and Nikaido, N. (1995). Fluidity of the lipid domain of cell wall from Mycobacterium chelonae. Proc. Natl. Acad. Sci. USA. 92, 11254-11258.

Lu, S. C. (2009). Regulation of glutathione synthesis. Mol. Aspects Med. 30, 42–59.

Luo, Q., Hiessl, S., Poehlein, A., Daniel, R., and Steinbuchel, A. (2014). Insights into the microbial degradation of rubber and gutta-percha by analysis of the complete genome of Nocardia nova SH22a. Appl. Environ. Microbiol. 80, 3895–3907.

Marrakchi, H., Laneelle, M. A., and Daffe, M. (2014). Mycolic acids: structures, biosynthesis, and beyond. Chem. Biol. 21, 67–85.

Martín-Cabello, G., Moreno-Ruiz, E., Morales, V., Floriano, B., and Santero, E. (2011). Involvement of poly(3-hydroxybutyrate) synthesis in catabolite repression of tetralin biodegradation genes in Sphingomonas macrogolitabida strain TFA. Environ. Microbiol. Reports. 3, 627-631.

Masip, L., Veeravalli, K., and Georgiou, G. (2006). The many faces of glutathione in bacteria. Antioxid. Redox Signal. 8, 753–762.

Meyer, F., Paarmann, D., D'Souza, M., Olson, R., Glass, E. M., Kubal, M. et al. (2008). The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinf. 9, 386.

Miller, A. J., Shen, Q., and Xu, G. (2009). Freeways in the plant, transporters for N, P and S and their regulation. Curr. Opin. Plant Biol. 12, 284–290.

Milohanic, E., Jonquières, R., Cossart, P., Berche, P., and Gaillard, J. L. (2001). The autolysin Ami contributes to the adhesion of Listeria monocytogenes to eukaryotic cells via its cell wall anchor. Mol. Microbiol. 39, 1212-1224.

Miyazaki, R., Sato, Y., Ito, M., Ohtsubo, Y., Nagata, Y., and Tsuda, M. (2006). Complete nucleotide sequence of an exogenously isolated plasmid, pLB1, involved in γ-hexachlorocyclohexane degradation. Appl. Environ. Microbiol. 72, 6923–6933.

Novick, R. P., and Subedi, A. (2007). The SaPIs: mobile pathogenicity islands of Staphylococcus. Chem. Immunol. Allergy. 93, 42–57.

Oelschlägel, M., Zimmerling, J., Schlömann, M., Tischler, D. (2014). Styrene oxide isomerase of Sphingopyxis sp. Kp5.2. Microbiology. 160, 2481-2491.

Ohtsubo, Y., Nagata, Y., Numata, M., Tsuchikane, K., Hosoyama, A., Yamazoe, A., Tsuda, M., Fujita, N., Kawai, F. (2015a). Complete Genome Sequence of Polyvinyl Alcohol-Degrading Strain Sphingopyxis sp. 113P3 (NBRC 111507). Genome Announc. 3(5).

Ohtsubo, Y., Nagata, Y., Numata, M., Tsuchikane, K., Hosoyama, A., Yamazoe, A., Tsuda, M., Fujita, N., Kawai, F. (2015b). Complete Genome Sequence of Polypropylene Glycol- and Polyethylene Glycol Degrading Sphingopyxis macrogoltabida Strain EY-1. Genome Announc. 3(6).

Ohtsubo, Y., Nonoyama, S., Nagata, Y., Numata, M., Tsuchikane, K., Hosoyama, A., Yamazoe, A., Tsuda, M., Fujita, N., Kawai, F. (2016). Complete Genome Sequence of Sphingopyxis macrogoltabida Strain 203N (NBRC 111659), a Polyethylene Glycol Degrader. Genome Announc. 4(3).

Overbeek, R., Olson, R., Pusch, G. D., Olsen, G. J., Davis, J. J., Disz, T. et al. (2013). The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 42, 206-214.

Parthasarathy, S., Azam, S., Lakshman Sagar, A., Narasimha Rao, V., Gudla, R., Parapatla, H., Yakkala, H., Vemuri, S. G., and Siddavattam, D. (2017). Genome-guided insights reveal organophosphate-degrading Brevundimonas diminuta as Sphingopyxis wildii and define its versatile metabolic capabilities and environmental adaptations. Genome Biol. Evol. 9, 77–81.

Percy, M. G., and Gründling, A. (2014). Lipoteichoic Acid Synthesis and Function in Gram-Positive Bacteria. Annu. Rev. Microbiol. 68, 81-100.

Reshetnikov, A. S., Khmelenina, V. N., Mustakhimov, I. I., Kalyuzhnaya, M., Lidstrom, M., and Trotsenko, Y. A. (2011). Diversity and phylogeny of the ectoine biosynthesis genes in aerobic, moderately halophilic methylotrophic bacteria. Extremophiles. 15, 653–663.

Rodriguez-R, L. M., and Konstantinidis, K. T. (2014). Bypassing cultivation to identify bacterial species. Microbe. 9, 111–118.

Rückert, C., Albersmeier, A., Winkler, A., and Tauch, A. (2015). Complete genome sequence of Corynebacterium kutscheri DSM 20755, a corynebacterial type strain with remarkably low G+C content of chromosomal DNA. Genome Announc. 3(3), e00571-15.

Saeed, A. I., Sharov, V., White, J., Li, J., Liang, W., Bhagabati, N. et al. (2003). TM4: a free, open-source system for microarray data management and analysis. Biotechniques. 34, 374-378.

Sattler, S. E., Cahoon, E. B., Coughlan, S. J., and DellaPenna, D. (2003). Characterization of tocopherol cyclases from higher plants and cyanobacteria. Evolutionary implications for tocopherol synthesis and function. Plant Physiol. 132, 2184–2195.

Schmid, J., Sieber, V., and Rehm, B. (2015). Bacterial exopolysaccharides: biosynthesis pathways and engineering strategies. Front. Microbiol. 6, 496.

Schulz, A., and Schumann, W. (1996). hrcA, the first gene of the Bacillus subtilis dnaK operon encodes a negative regulator of class I heat shock genes. J. Bacteriol. 178, 1088–1093.

Schut, F., de Vries, E. J., Gottschal, J. C., Robertson, B. R., Harder, W., and Prins, R. A. (1993). Isolation of typical marine bacteria by dilution culture: growth, maintenance, and characteristics of isolates under laboratory conditions. Appl. Environ. Microbiol. 59, 2150-2160.

Sengupta, U., Ukil, S., Dimitrova, N., and Agrawal, S. (2009). Expression-based network biology identifies alteration in key regulatory pathways of type 2 diabetes and associated risk/complications. PLoS One. 4.

Sharma, P., Verma, M., Bala, K., Nigam, A., and Lal, R. (2010). Sphingopyxis ummariensis sp. nov., isolated from hexachlorocyclohexane (HCH)-dumpsite in north India. Int. J. Syst. Evol. Microbiol. 60, 780-784.

Shin, N. R., Whon, T. W., Roh, S. W., Kim, M. S., Jung, M. J., Lee, J., and Bae, J. W. (2011). Genome sequence of Corynebacterium nuruki S6-4T, isolated from alcohol fermentation starter.

Silva, J. A., Tobella, L. M., Becerra, J., Godoy, F., and Martínez, M. A. (2007). Biosynthesis of poly-β-hydroxyalkanoate by Brevundimonas vesicularis LMG P-23615 and Sphingopyxis macrogoltabida LMG 17324 using acid-hydrolyzed sawdust as carbon source. J. Biosci. Bioeng. 103, 542–546.

Smith, A. W., Roche, H., Trombe, M. C., Briles, D. E., and Hakansson, A. (2002). Characterization of the dihydrolipoamide dehydrogenase from Streptococcus pneumoniae and its role in pneumococcal infection. Mol. Microbiol. 44, 431–448.

Smith, L. T., Pocard, J., Bernard, T., and Le-Rudulier, D. (2002). Osmotic control of glycine-betaine biosynthesis and degradation in Rhizobium meliloti. J. Bacteriol. 170, 3142–3149.

Sussmann, R. A. C., Angeli, C. B., Peres, V. J., Kimura, E. A., and Katzin, A. M. (2011). Intraerythrocytic stages of Plasmodium falciparum biosynthesize vitamin E. FEBS Lett. 585, 3985–3991.

Szklarczyk, D., Franceschini, A., Wyder, S., Forslund, K., Heller, D., Huerta-Cepas, J. et al. (2015). STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43, D447–D452.

Vander Beken, S., Al Dulayymi, J. A. R., Naessens, T., Koza, G., Maza-Iglesias, M., Rowles, R. et al. (2011). Molecular structure of the Mycobacterium tuberculosis virulence factor, mycolic acid, determines the elicited inflammatory pattern. Eur. J. Immunol. 41, 450–460.

Verlinden, R. A. J., Hill, D. J., Kenward, M. A., Williams, C. D., and Radecka, I. (2007). Bacterial synthesis of biodegradable polyhydroxyalkanoates. J. Appl. Microbiol. 102, 1437–1449.

Verma, H., Bajaj, A., Kumar, R., Kaur, J., Anand, S., Nayyar, N., et al. (2017). Genome organization of Sphingobium indicum B90A: an archetypal hexachlorocyclohexane (HCH) degrading genotype. Genome Biol. Evol. 9, 2191–2197.

Verma, H., Kumar, R., Oldach, P., Sangwan, N., Khurana, J. P., Gilbert, J. A. et al. (2014). Comparative genomic analysis of nine Sphingobium strains: insights into their evolution and hexachlorocyclohexane (HCH) degradation pathways. BMC Genom. 1014.

Verma, H., Rani, P., Singh, A. K., Kumar, R., Dwivedi, V., Negi, V. et al. (2015). Sphingopyxis flava sp. nov. isolated from a hexachlorocyclohexane (HCH)-contaminated soil. Int. J. Syst. Evol. Microbiol. 65, 3720–3726.

Wessels, M. R., Moses, A., Goldberg, J. B., and DiCesare, T. J. (1991). Hyaluronic acid capsule is a virulence factor for mucoid group A streptococci. Proc. Natl. Acad. Sci. USA. 88, 8317–8321.

Whitfield, C. (2006). Biosynthesis and assembly of capsular polysaccharides in Escherichia coli. Annu. Rev. Biochem. 75, 39–68.

Williams, T. J., Ertan, H., Ting, L., and Cavicchioli, R. (2009). Carbon and nitrogen substrate utilization in the marine bacterium Sphingopyxis alaskensis strain RB2256. ISME J. 3, 1036–1052.

Yan, J., Ralston, M. M., Meng, X., Bongiovanni, K. D., Jones, A. L., Benndorf, R. et al. (2013). Glutathione reductase is essential for host defense against bacterial infection. Free Radic. Biol. Med. 61, 320–332.

Yeung, S. C. J. (2006). Chapter 7 Preclinical studies of chemotherapy for undifferentiated thyroid carcinoma. Advances in Molecular and Cellular Endocrinology 4, 117–144.

Yoon, J. H., Lee, C. H., Yeo, S. H., and Oh, T. K. (2005). Sphingopyxis baekryungensis sp. nov., an orange-pigmented bacterium isolated from seawater of the Yellow Sea in Korea. Int. J. Syst. Evol. Microbiol. 55, 1223–1227.

Yuan, G. L., Wu, L. J., Sun, Y., Li, J., Li, J. C., and Wang, G. H. (2015). Polycyclic aromatic hydrocarbons in soils of the central Tibetan Plateau, China: distribution, sources, transport and contribution in global cycling. Environ. Pollut. 203, 137–144.

Yuzhen, Y., and Doak, T. G. (2009). A parsimony approach to biological pathway reconstruction/inference for genomes and metagenomes. PLOS Comput. Biol. 8, e1000465.

Zhang, D. C., Liu, H. C., Xin, Y. H., Zhou, Y. G., Schinner, F., and Margesin, R. (2010). Sphingopyxis bauzanensis sp. nov., a psychrophilic bacterium isolated from soil. Int. J. Syst. Evol. Microbiol. 60, 2618–2622.

Zhou, Y., Liang, Y., Lynch, K. H., Dennis, J. J., and Wishart, D. S. (2011). PHAST: a fast phage search tool. Nucleic Acids Res. 39, 347–352.


Competing Interest

Rup Lal and Vipin Gupta were employed by the company PhiXgen Pvt. Ltd. All other authors declare no competing interests.

Author Contribution

HV, GGD, MS and VG designed the experiment. HV and GGD analysed the data and wrote the manuscript drafts. RL, YS and RKN critically reviewed the manuscript. RL gave final approval for publication of the manuscript. All authors read and approved the final manuscript.

Funding

The work was supported by grants from the Department of Biotechnology (DBT), Government of India, under project BT/PR22797/BCE/8/1413/2016.

Acknowledgement

We would like to thank Ramjas College and Kirori Mal College, University of Delhi, Delhi, for providing support. MS gratefully acknowledges the Council of Scientific and Industrial Research (CSIR) for providing research fellowships. RL thanks The National Academy of Sciences, India, for support under the NASI-Senior Scientist Platinum Jubilee Fellowship Scheme.

Figure Legends:

Figure 1: Phylogenetic clustering of Sphingopyxis strains using three different methods: A) Genome-to-Genome Distance Calculator (GGDC), B) Average Amino acid Identity (AAI) and C) Core-Single Nucleotide Polymorphism (Core-SNP). Clades are colored separately to show monophyletic divisions.

Figure 2: Trend of the A) core and B) pan genome with the addition of Sphingopyxis genomes from 1 to 19. The saturation curves were prepared using the Tettelin fit. The number of genomes is plotted on the X-axis and the size of the core and pan genome on the Y-axis, in separate graphs.
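As an illustration of how the core and pan genome trends in Figure 2 can be tabulated, the following Python sketch tracks the intersection (core) and union (pan) of gene families as genomes are added one at a time. It is a minimal sketch under stated assumptions and not the pipeline used in this study: the function name `core_pan_trend` and the toy gene-family sets are hypothetical, a single addition order is used, and the Tettelin curve-fitting step is not reproduced.

```python
def core_pan_trend(genomes):
    """Return (core_sizes, pan_sizes) as genomes are added in the given order.

    `genomes` is a list of iterables, each holding the gene-family
    identifiers found in one genome (hypothetical input format).
    """
    core = None          # families present in every genome seen so far
    pan = set()          # families present in at least one genome seen so far
    core_sizes, pan_sizes = [], []
    for gene_families in genomes:
        families = set(gene_families)
        core = families if core is None else core & families
        pan |= families
        core_sizes.append(len(core))
        pan_sizes.append(len(pan))
    return core_sizes, pan_sizes


# Toy example with three made-up genomes (labels are illustrative only).
toy_genomes = [
    {"guaA", "polA", "rpoB", "famX"},
    {"guaA", "polA", "rpoB"},
    {"guaA", "polA", "famY"},
]
core_sizes, pan_sizes = core_pan_trend(toy_genomes)
print(core_sizes)  # core shrinks or stays constant as genomes are added
print(pan_sizes)   # pan grows or stays constant as genomes are added
```

In practice the addition order is usually permuted many times and the sizes averaged before a saturation curve is fitted; that step is omitted here for brevity.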

Figure 3: Functional profiling of 19 Sphingopyxis strains. The functional families were grouped under three categories: most abundant families, varied families (among strains) and least abundant/strain-specific families. Sphingopyxis spp. are denoted by their species and strain names. A) Most abundant families: absence is denoted with light blue and presence with dark blue. B) Varied families: absence is denoted with light yellow and presence with red. C) Least abundant families: absence is denoted with light yellow and presence with green. D) PCA based on the presence or absence of complete functional families of strains. PC1 (26.4%) was plotted against PC2 (17.3%). RB2256: Sphingopyxis alaskensis RB2256, DSM 22271: Sphingopyxis bauzanensis DSM 22271, Kp5.2: Sphingopyxis fribergensis Kp5.2, NBRC100800: Sphingopyxis granuli NBRC100800, TFA: Sphingopyxis granuli TFA, 203N: Sphingopyxis macrogoltabida 203N, EY-1: Sphingopyxis macrogoltabida EY-1, QXT-31: Sphingopyxis sp. QXT-31, 113P3: Sphingopyxis sp. 113P3, DS15: S. indica DS15, UI2: S. terrae subsp. ummariensis, EG6: Sphingopyxis sp. EG6, PAMC25046: Sphingopyxis sp. PAMC 25046, WS5A3p: Sphingopyxis lindanitolerans WS5A3p. Sphingopyxis witflariensis DSM14551, Sphingopyxis flava R11H, Sphingopyxis sp. MG and Sphingopyxis sp. FD7 are denoted with their complete names.

Figure 4: Protein-protein interaction networks of Sphingopyxis species, established to trace the connection of functional proteins with core proteins. Core proteins are marked in red in all interactomes. A) C+D: core and aromatic compound degradation; degradation proteins are marked in blue. B) C+S: core and stress resistance; stress proteins are marked in pink. C) C+P: core and virulence; virulence proteins are marked in green. D) C+M: core and membrane transport; membrane proteins are marked in blue. Proteins showing a high degree in the core + functional networks are listed in the respective interactomes in yellow diamond-shaped boxes.
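The notion of a "high degree" protein in the Figure 4 legend can be illustrated with a short Python sketch that counts, for each protein, how many distinct interaction partners it has in an undirected edge list. This is only a hedged illustration: the helper names `degrees` and `top_hubs` are hypothetical, the toy edges are invented for the example and are not interactions reported in this study, and no particular network tool or score cut-off is implied.

```python
from collections import defaultdict


def degrees(edges):
    """Map each node to its number of distinct neighbours (undirected graph)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:                    # ignore self-loops
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {node: len(ns) for node, ns in neighbours.items()}


def top_hubs(edges, n=3):
    """Return the n highest-degree nodes as (node, degree), ties broken by name."""
    deg = degrees(edges)
    return sorted(deg.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Toy edge list (invented for illustration only).
toy_edges = [
    ("DnaK", "GrpE"),
    ("DnaK", "DnaJ"),
    ("DnaK", "RpoH"),
    ("GrpE", "DnaJ"),
    ("RpoH", "RpoD"),
]
hubs = top_hubs(toy_edges, n=2)
print(hubs)
```

Sorting by descending degree and then by name keeps the ranking deterministic when several proteins share the same degree.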

Table 1: General genome characteristics of the strains under study, including their accession numbers, source, location, GC content, genome size, specific genes, CDS and RNAs

| S. No. | Organism | Accession No/s. | Source | Location | GC content (in %) | Genome Size (in Mb) | Specific genes | CDS | rRNA | tRNA | Genome Coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | S. alaskensis RB2256 | NC_008048.1, NC_008036.1 | Marine heterotroph | Resurrection Bay, Alaskan waters, North Sea, North Pacific | 65.5 | 3.37 | 323 | 3372 | 3 | 45 | - |
| 2 | S. baekryungensis DSM16222 | NZ_ATUR00000000.1 | Sea water | Yellow Sea, Korea | 62.4 | 3.07 | 702 | 2993 | 6 | 49 | - |
| 3 | S. bauzanensis DSM22271 | NZ_NISK00000000.1 | Hydrocarbon contaminated soil | Bozen, South Tyrol, Italy | 63.3 | 4.26 | 794 | 4305 | 3 | 46 | 55X |
| 4 | S. flava R11H | NZ_FUYP00000000.1 | Soil sample, HCH dumpsite | Ummari Village, Lucknow, U.P. India | 63.8 | 4.16 | 924 | 4352 | 3 | 46 | 265X |
| 5 | S. fribergensis Kp5.2 | NZ_CP009122.1, NZ_CP009123.1 | Soil sample | Freiberg, Saxony, Germany | 63.9 | 5.2 | 896 | 5073 | 3 | 45 | 210X |
| 6 | S. granuli NBRC 100800 | NZ_BCUA00000000.1 | Up flow anaerobic sludge blanket bioreactor | Korea | 66.4 | 4.25 | 392 | 4040 | 3 | 46 | 131X |
| 7 | S. granuli TFA | NZ_CP012199.1 | Mud sample | Rhine river | 66.2 | 4.67 | 484 | 4558 | 3 | 46 | 26X |
| 8 | S. indica DS15 | NZ_FZPA00000000.1 | Soil sample, HCH dumpsite | Ummari Village, Lucknow, U.P. India | 65.7 | 4.15 | 632 | 4064 | 3 | 45 | 221X |
| 9 | Sphingopyxis sp. LPB0140 | NZ_CP018154.1 | Marine isolate | South Korea | 46.1 | 2.53 | 637 | 2595 | 6 | 41 | 283X |
| 10 | S. macrogoltabida 203N | NZ_CP013349.1, NZ_CP013345.1, NZ_CP013346.1 | Soil | Japan | 64.6 | 5.95 | 856 | 5935 | 3 | 58 | 285X |
| 11 | S. macrogoltabida EY-1 | NZ_CP012700.1 to NZ_CP012705.1 | Microbial consortium growing on random polymer of ethylene oxide and propylene oxide | Japan | 64.6 | 5.09 | 544 | 4995 | 3 | 50 | 130X |
| 12 | S. terrae NBRC 15098 | NZ_CP013342.1 to NZ_CP013343.1 | Activated sludge | Ummari Village, Lucknow, U.P. India | 64.9 | 4.08 | 330 | 3945 | 3 | 45 | 92X |
| 13 | S. terrae subsp. ummariensis DSM 24316 | NZ_FXWL00000000.1 | Soil sample, HCH dumpsite | Wetzlar, Germany | 65.2 | 3.57 | 329 | 3564 | 3 | 44 | 256X |

Table 1 (continued)

| S. No. | Organism | Accession No/s. | Source | Location | GC content (in %) | Genome Size (in Mb) | Specific genes | CDS | rRNA | tRNA | Genome Coverage |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 14 | S. witflariensis DSM14551 | NZ_NISJ00000000.1 | Wastewater treatment plant | Germany | 63.3 | 4.31 | 778 | 4358 | 6 | 52 | 35X |
| 15 | Sphingopyxis sp. QXT-31 | NZ_CP019449.1 | Soil | Hunan province, China | 66.5 | 4.28 | 623 | 4168 | 3 | 46 | 200X |
| 16 | Sphingopyxis sp. 113P3 | NZ_CP009452.1 | Activated sludge | Japan | 64 | 5.41 | 746 | 4710 | 3 | 47 | - |
| 17 | S. lindanitolerans WS5A3p | PHFW00000000 | Pesticide disposal soil | North-west Poland | 65.3 | 4.15 | 693 | 4079 | 3 | 49 | 114X |
| 18 | Sphingopyxis sp. MG | NZ_CP026381.1, NZ_CP026382.1 | Sewage and soil samples | USA | 66.4 | 4.22 | 412 | 4090 | 3 | 48 | 1361X |
| 19 | Sphingopyxis sp. EG6 | NZ_AP017603.1, NZ_AP017604.1 | Water | Japan | 64.6 | 3.87 | 457 | 3895 | 3 | 50 | 359X |
| 20 | Sphingopyxis sp. FD7 | NZ_AP017898.1, NZ_AP017899.1 | Water | Japan | 65.18 | 3.94 | 535 | 3953 | 3 | 52 | 354X |
| 21 | Sphingopyxis sp. PAMC 25046 | NZ_CP039250.1 | Soil | Antarctica | 65.0 | 4.50 | 468 | 4363 | 3 | 47 | 100X |

Table 2: Genes sharing interactions with core and functional families: aromatic degradation, stress resistance, virulence and membrane transport

| S. No. | Identifiers (IDs) | Function |
|---|---|---|
| Core | | |
| 1. | GuaA | Guanosine monophosphate synthase |
| 2. | PolA | Prokaryotic DNA polymerase |
| 3. | GuaB | Inosine-5'-monophosphate dehydrogenase |
| 4. | RpoB | RNA polymerase |
| 5. | PheT | Phenylalanine--tRNA ligase beta subunit |
| 6. | GlyA | Serine hydroxymethyltransferase |
| Core_Degradation | | |
| 1. | AroG | Phospho-2-dehydro-3-deoxyheptonate aldolase |
| 2. | MaiA | Maleylacetoacetate isomerase |
| 3. | AroK | Shikimate kinase 1 |
| Core_Stress | | |
| 1. | DnaK | Chaperon protein, DNA replication |
| 2. | GrpE | Heat shock protein, hyperosmotic |
| 3. | RpoH | RNA polymerase sigma factor |
| 4. | RpoD | RNA polymerase sigma factor |
| 5. | DnaJ | Chaperon protein, DNA replication |
| 6. | HrcA | Heat-inducible transcription repressor |
| 7. | GshB | Glutathione synthetase |
| 8. | BphK | Glutathione S-transferase |
| 9. | EctC | Ectoine synthase |
| 10. | KatG | Catalase peroxidase |
| Core_Virulence | | |
| 1. | tpiA | Triosephosphate isomerase |
| 2. | blaR1 | Penicillin-binding protein |
| 3. | parE | DNA topoisomerase 4 subunit B |
| 4. | gmk | Guanylate kinase |
| 5. | rsmA | Ribosomal RNA small subunit methyltransferase A |
| Core_Membrane | | |
| 1. | secY | Protein translocase subunit |
| 2. | secF_1 | Protein-export membrane protein |
| 3. | ffh | Signal recognition particle |
| 4. | ftsY | Signal recognition particle |
| 5. | tolB | Tol-Pal system protein |
| 6. | tatC | Sec-independent protein translocases |
| 7. | secB | Protein-export protein |
| 8. | secF_2 | Protein-export membrane protein |
| 9. | secE | Protein translocase subunit |
| 10. | tatA | Sec-independent protein translocases |
| 11. | tatB | Sec-independent protein translocases |
| 12. | cpaD | Pilus assembly protein |

Highlights

• We propose that both S. baekryungensis DSM16222 and Sphingopyxis sp. LPB0140 strains should not be included under genus Sphingopyxis.

• Core-analysis revealed that 1422 genes were shared among the strains, which include essential pathways and genes conferring adaptation to stress environments.

• Polyhydroxybutyrate (bio-degradable plastics) degradation, anaerobic respiration and type IV secretion were among the most abundant families, while exopolysaccharide formation, toxin-antitoxin system and hyaluronic acid production were differentially present families.

• Genomes of S. witflariensis DSM14551, Sphingopyxis sp. MG and Sphingopyxis sp. FD7 provide a hint of probable pathogenic abilities.

• Protein-protein interactome depicted that membrane proteins and stress response have close integration with core proteins, while aromatic compound degradation and virulence ability are strain-specific attributes.

Figure 1

Figure 2

Figure 3A

Figure 3B

Figure 4