Evolutionary insights into adaptation of Staphylococcus haemolyticus to human and non-human niches

Journal Pre-proof

Vasvi Chaudhry, Prabhu B. Patil

PII: S0888-7543(19)30804-3

DOI: https://doi.org/10.1016/j.ygeno.2019.11.018

Reference: YGENO 9412

To appear in: Genomics

Received date: 20 October 2019

Revised date: 16 November 2019

Accepted date: 26 November 2019

Please cite this article as: V. Chaudhry and P.B. Patil, Evolutionary insights into adaptation of Staphylococcus haemolyticus to human and non-human niches, Genomics (2019), https://doi.org/10.1016/j.ygeno.2019.11.018

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier.

Vasvi Chaudhry†1 and Prabhu B. Patil*1

Author affiliations: 1 Bacterial Genomics and Evolution Laboratory, CSIR-Institute of Microbial Technology, Sector – 39A, Chandigarh 160036, India

† Present Address: Department of Microbial Interactions, Center for Plant Molecular Biology, Interfaculty Institute of Microbiology and Infection Medicine Tübingen, University of Tübingen, Auf der Morgenstelle 32, 72076, Tübingen, Germany.

*Correspondence: Prabhu B. Patil, [email protected]

Key words: Staphylococcus haemolyticus; Phylogenomic; Pan-genome; Evolution; Genome; Endophyte

Abbreviations: ANI, Average Nucleotide Identity; dDDH, digital DNA-DNA hybridization; RESH, Rice endophytic Staphylococcus haemolyticus; HGT, Horizontal gene transfers; LDRs, Large dynamic regions.

Data statement: All supporting data, code and protocols have been provided within the article through supplementary data files. Thirteen supplementary tables and one supplementary figure are available with this article.

Abstract

Staphylococcus haemolyticus is a well-known member of the human skin microbiome and an emerging opportunistic human pathogen. Presently, evolutionary studies are limited to human isolates, even though the species has been reported from plants, where it shows beneficial properties, and from environmental settings. In the present study, we report the isolation of novel S. haemolyticus strains from surface-sterilized rice seeds and compare their genomes to those of other isolates from diverse niches available in the public domain. The study showed the expanding nature of the pan-genome and revealed a set of genes with putative functions related to adaptability. This is seen in the presence of a type II lanthipeptide cluster in rice isolates, metal homeostasis genes in an isolate from a copper coin, and a gene encoding methicillin resistance in human isolates. The present study on differential genome dynamics and the role of horizontal gene transfer provides novel insights into the capability for ecological diversification of a bacterium of significance to human health.

Key words: Staphylococcus haemolyticus, Phylogenomic, Microbial ecology, Evolution, Genome, Endophyte

INTRODUCTION

Species of Staphylococcus are a major cause of nosocomial infections in humans, especially in infants, immunocompromised individuals and patients with implanted devices [1-3]. Members of the genus Staphylococcus are also found inhabiting other niches such as metals, plant tissues (including rhizosphere and endosphere), processed food, etc. [4-7]. The 16S rRNA gene plays a crucial role as the backbone for identification and taxonomy of bacteria, but it has limitations. Because the 16S rRNA gene is highly conserved, its resolution is often too low to distinguish strains or species, and sometimes genera [8-10]. In addition, many bacterial genomes contain multiple copies of the 16S rRNA gene that differ from one another [11]. These limitations have led to misclassification of strains at the species and sometimes at the genus level.

Genome-enabled microbial taxonomy, supported by the availability of bacterial genomes (including reference and type strains) [12-13], and phylogeny based on the core genome provide advantages over classical taxonomic methods [14-16]. Bacteria evolve rapidly in response to the environment they experience and import genes or genomic regions for adaptation to a new host or environment. Genome-wide comparison of bacterial isolates from diverse niches is a valuable approach to enhance our understanding of their adaptation and evolution [17-18].

There are undoubtedly reports on whole-genome-based phylogenetic and comparative analysis of pathogenic Staphylococcus species associated with humans and animals, with the aim of characterizing pathogen/pathogen populations for outbreak investigations and understanding bacterial infections and antibiotic resistance [19-21]. There are previous reports on the isolation of Staphylococcus sp. from plant tissues [6, 22], such as poplar (Populus trichocarpa) [23] and sugarcane (Saccharum officinarum) [24], and from seeds of cactus [25] and bean [4]. To date, however, there are only a limited number of studies on whole-genome-based diversity and comparative genomics of Staphylococcus strains from diverse ecological niches. In a recent study, we published the first detailed genome-scale characterization of diverse S. epidermidis strains, including human and plant niche isolates, and uncovered genes and genomic islands associated with the endophytic lifestyle [26]. That study provided valuable insights into the genomic plasticity of S. epidermidis, which underlies its remarkable adaptability to diverse habitats and makes this species ecologically flexible.

Another species of Staphylococcus, S. haemolyticus, is also associated with diverse ecological niches including plants, humans and the environment. Comparative genomic analysis of purely clinical strains of S. haemolyticus colonizing humans, and of their evolution, has been carried out [3]. Genome-based characterization of clinical S. haemolyticus isolates of diverse geographical and temporal origin, aimed at studying their phylogenetic relationships and understanding the basis of the emergence of S. haemolyticus as a nosocomial pathogen, has also been documented [27]. Apart from isolates of human origin [28-31], there are several reports of culture-dependent studies on plant-associated S. haemolyticus strains and their beneficial interactions [4, 24, 32-36]. Association of S. haemolyticus with plants has also been found in culture-independent studies [34, 37]. These studies are summarized in Table S1. So far, genome sequences of a willow (Salix viminalis x S. miyabeana Fabius cultivar.) endophyte and of a leaf-vegetable-associated S. haemolyticus have been reported [38, 39]. There is no information on in-depth genome analysis of the shrub willow endophyte, which could deliver insights into its genetic content and adaptation to a lifestyle-associated niche. Phylogenetic analysis and genome-based diversity of S. haemolyticus from plant, human and other environments are not yet explored. Given the importance of the species, there is a need for insights into the emergence, diversification, and evolution of S. haemolyticus strains in different habitats.

In the present study, we sequenced the genomes of three seed-borne rice endophytic S. haemolyticus (RESH) strains and the species type strain. We performed comparative genome analysis of the in-house sequenced genomes along with sixteen other publicly available S. haemolyticus genomes of diverse ecological origin. This study provides not only genomic insights into multiple lifestyles but also a better understanding of the adaptation of S. haemolyticus strains to plant and other human and non-human niches.

MATERIALS AND METHODS


Bacterial strain isolation, genome sequencing, and data collection

Endophytic bacteria described in the present study were isolated from surface-sterilized rice seed samples. The seed hulls were removed using sterilized forceps, and the seeds were washed with sterilized water for 1 min and then with 1% sodium hypochlorite solution for 5 min. The seeds were then washed with 75% ethanol for 1 min. After five further washes with sterilized water, the surface-sterilized rice seeds were crushed in a sterile mortar and pestle and suspended in sterile saline solution (0.85% NaCl). The seed suspension was incubated for 2 h at 28 °C with shaking. Then 100 μl each of the undiluted (direct) suspension and of the 10⁻¹, 10⁻², 10⁻³ and 10⁻⁴ dilutions in sterile saline was plated in duplicate onto Nutrient agar (NA), King’s medium B (KMB), Glucose yeast chalk agar (GYCA), Tryptic soy agar (TSA) and Peptone sucrose agar (PSA), supplemented with 0.01% cycloheximide. Surface sterilization was confirmed by spreading the last water wash, as well as placing the washed seeds, onto different media plates. The "type strain" of S. haemolyticus, MTCC3383(T) = ATCC29970(T), was obtained from the Microbial Type Culture Collection and Gene Bank (MTCC), Chandigarh, India. The strain was confirmed by 16S rRNA gene sequence analysis using the web-based tool EzTaxon-e (http://www.ezbiocloud.net/eztaxon) [40] prior to whole genome sequencing.

The "type strain" and rice endophytic S. haemolyticus (RESH) strains were cultured in Nutrient Broth (NB) medium with shaking at 150 rpm and 28 °C for 18 hours and then processed for genomic DNA isolation, quality check and genomic DNA library construction as described [26]. Cluster generation and sequencing of libraries were performed on the Illumina MiSeq platform (Illumina, San Diego, CA) with a 2 × 250 paired-end run.

Genome sequence analysis, assembly and annotation

The draft genome sequences of the "type strain" of S. haemolyticus MTCC3383(T) and the RESH isolates SE2.14, SE3.8 and SE3.9 were obtained using the reads from Illumina MiSeq paired-end libraries and de novo assembled using the CLC genomics workbench. Genomes were annotated using the Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP) at NCBI (http://www.ncbi.nlm.nih.gov/genomes/static/Pipeline.html) and the Rapid Annotation System Technology (RAST) pipeline [41], and also annotated by NCBI non-redundant (NR) protein databases (http://www.ncbi.nlm.nih.gov/RefSeq/). Function annotation was performed by the RAST pipeline [41]. RNAmmer (http://www.cbs.dtu.dk/services/RNAmmer/) and tRNAscan (http://lowelab.ucsc.edu/tRNAscan-SE/) tools were used for total rRNA and tRNA prediction.

Phylogenomic and Genome based taxonomy

To establish the relatedness among the S. haemolyticus genome sequences, Average Nucleotide Identity (ANI) and digital DNA-DNA hybridization (dDDH) values were calculated for the RESH and database genomes with respect to the “type strain” using JSpecies v1.2.1 [13] and the GGDC 2.0 server (http://ggdc.dsmz.de/distcalc.php) [12], respectively. A heatmap was constructed using the Morpheus software (https://software.broadinstitute.org/morpheus/).

To determine the relatedness of S. haemolyticus from diverse habitats (plant [willow, vegetable], copper coin and human) and outlier species of Staphylococcus (S. aureus, S. epidermidis, S. saprophyticus and S. gallium), a whole-genome maximum likelihood tree was constructed based on thirty-one universal housekeeping protein-encoding phylogenetic marker genes (dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, and tsf). The majority of them are single-copy genes involved in information processing (replication, transcription, and translation) or central metabolism, and are less prone to lateral gene transfer [42]. The tree was constructed using the General Time Reversible (GTR) model with Gamma-distributed rates and Invariant sites (G + I) and 500 bootstrap replications in MEGA v6 [43].
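The text does not state how the 31 marker genes were combined before tree building. One common approach, assumed here purely for illustration, is to concatenate the per-gene alignments into a single supermatrix per strain. The sketch below is in Python; the function name `concatenate_alignments`, the input layout and the gap-padding of missing genes are hypothetical choices, not the authors' procedure.

```python
def concatenate_alignments(alignments, gap="-"):
    """Join per-gene alignments into one sequence per strain.

    alignments: dict mapping gene name -> dict mapping strain -> aligned sequence.
    A strain missing from a gene alignment is padded with gap characters
    (an assumption made for this sketch).
    """
    strains = sorted({s for aln in alignments.values() for s in aln})
    parts = {s: [] for s in strains}
    for gene in sorted(alignments):          # fixed gene order for reproducibility
        aln = alignments[gene]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not aligned to equal length")
        width = lengths.pop()
        for s in strains:
            parts[s].append(aln.get(s, gap * width))
    return {s: "".join(p) for s, p in parts.items()}


# Toy input with made-up sequences; only the strain and gene names come from the text.
toy = {
    "rpoB": {"SE2.14": "ATG-", "JCSC1435": "ATGA"},
    "tsf": {"SE2.14": "CC"},
}
supermatrix = concatenate_alignments(toy)
```

The resulting dictionary could then be written out in FASTA or another format accepted by the tree-building software.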

Comparative genomics

To assess genome dynamics, the sizes of the pan-genome (gene repertoire) and of its core (conserved), accessory (dispensable) and unique (strain-specific) components were simulated using the Pan-Genome Analysis Pipeline [44]. Its MultiParanoid-based algorithm searches for homologs/orthologs in multiple genomes, requiring the local matched region to be not less than 25% of the longer gene protein sequence and the global matched region to be not less than 50% of the longer gene protein sequence. A minimum score value of 50 and an E-value of less than 1 × 10⁻⁸ were used as cutoffs.
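The cutoffs above can be expressed as a simple filter on pairwise protein hits. The following Python sketch is illustrative only: the `Hit` record and its field names are hypothetical and do not reflect the internal data structures of the pipeline; only the four threshold values are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One pairwise protein comparison (hypothetical record layout)."""
    query_len: int         # query protein length (aa)
    subject_len: int       # subject protein length (aa)
    local_match_len: int   # length of the local matched region (aa)
    global_match_len: int  # length of the global matched region (aa)
    score: float           # alignment score
    evalue: float          # alignment E-value


def passes_cutoffs(hit, min_local=0.25, min_global=0.50,
                   min_score=50.0, max_evalue=1e-8):
    """Apply the coverage, score and E-value thresholds stated in the text."""
    longer = max(hit.query_len, hit.subject_len)
    return (hit.local_match_len >= min_local * longer
            and hit.global_match_len >= min_global * longer
            and hit.score >= min_score
            and hit.evalue < max_evalue)


# Made-up example: the longer protein is 400 aa, so the local region must be
# at least 100 aa and the global region at least 200 aa.
example = Hit(400, 300, 120, 220, 80.0, 1e-20)
keep = passes_cutoffs(example)
```

Coverage is measured against the longer of the two proteins, as stated in the text, so a short protein matching a fragment of a much longer one is rejected.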

To visualize genome-level differences between RESH genomes and other groups, a circular genome comparison was performed using BRIG-0.95 [45]. Regions of interest identified by BRIG were re-annotated using the RAST pipeline and re-inspected for homology by BlastP. The prominence of horizontal gene transfer (HGT) in shaping gene sets was examined by calculating GC content (%), treating values of <29% and >35% as atypical with respect to the GC content of the S. haemolyticus genome (32%).
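The GC criterion can be illustrated with a short Python sketch. The thresholds (<29% and >35%) are those given in the text; the function names and the handling of ambiguous bases (ignored in the denominator) are assumptions made for this example.

```python
def gc_percent(seq):
    """Return GC content (%) of a nucleotide sequence, ignoring non-ACGT symbols."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def is_atypical_gc(seq, low=29.0, high=35.0):
    """Flag a gene as atypical if its GC content is below `low` or above `high`."""
    gc = gc_percent(seq)
    return gc < low or gc > high


# Made-up sequences: 8 G/C out of 25 bases is 32% (typical),
# 2 G/C out of 10 bases is 20% (atypical).
typical = "G" * 8 + "A" * 17
atypical = "AAATTTGCAA"
flags = [is_atypical_gc(typical), is_atypical_gc(atypical)]
```

Applied per gene, such a flag gives the proportion of unique genes with atypical GC content that is reported for each strain.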


Genome Mining of Biosynthetic gene clusters (BGC)

Genome mining was performed using the antiSMASH tool version 2.0 [46], ARTS [47] and the BAGEL version 3 [48] web tool to examine gene cluster classes for secondary metabolites, bacteriocins, or lantibiotics. Further, each gene in the clusters was mapped manually using BlastN.

RESULTS

General Genomic features of S. haemolyticus isolates from diverse niches

Comparison of the 16S rRNA gene sequences of the RESH strains showed 99.7% similarity to the corresponding gene of Staphylococcus haemolyticus. Phylogenetic analysis of the 16S rRNA gene of the RESH strains confirmed their placement within S. haemolyticus. Complete 16S rRNA gene sequences of all three S. haemolyticus strains, SE2.14, SE3.8 and SE3.9, have been submitted to NCBI under GenBank accession nos. KM877514, KM877515 and KM877516, respectively.

We isolated three S. haemolyticus strains representing “RESH” and carried out in-house whole genome sequencing and analysis. We also procured and sequenced in-house the S. haemolyticus “type strain”. The raw reads were de novo assembled with a minimum contig size of 500 bp and coverage ranging from 48X to 198X. Draft assemblies ranged in size from 2.3 to 2.4 Mb, which is the typical genome size of S. haemolyticus, with 2,301 to 2,425 coding sequences, indicating that no reductive evolution occurred in these strains. The statistics and general features of the assembled genomes are summarized in Table 1 and Table S2. Whole-genome sequence determination of RESH revealed genome size, GC content and number of genes comparable to those of the type strain and other strains from diverse niches representing willow plant, copper coin and human body [3, 38].

Establishing genome-based taxonomy of ecologically diverse S. haemolyticus strains


Out of a total of 35 S. haemolyticus genomes from NCBI, 15 genomes grouped with the S. haemolyticus type strain in a tree based on the complete rpoB gene (Figure S1). Pairwise comparison of strains with the “type strain” also revealed 15 genomes with ANI and dDDH values above the species demarcation cutoffs, i.e. 95-96% and 70%, respectively (Table S2; Table S3). The details of all the strains used for analysis in the present study (3 RESH, the “type strain” and 15 publicly available S. haemolyticus genomes), along with their isolation source, genome size, GC content (%) and accession numbers, are summarized in Table 1.
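The shortlisting step can be sketched as a filter on pairwise values against the type strain. In the Python example below, the lower bound of the 95-96% ANI range is used as the cutoff, which is an assumption; the function names and all numeric values in the toy input are made up for illustration and are not taken from Table S2 or Table S3.

```python
def is_same_species(ani, dddh, ani_cutoff=95.0, dddh_cutoff=70.0):
    """True if both ANI (%) and dDDH (%) against the type strain meet the cutoffs."""
    return ani >= ani_cutoff and dddh >= dddh_cutoff


def shortlist(pairwise):
    """Return names of genomes assigned to the species.

    pairwise: dict mapping genome name -> (ANI %, dDDH %) versus the type strain.
    """
    return sorted(name for name, (ani, dddh) in pairwise.items()
                  if is_same_species(ani, dddh))


# Hypothetical genomes and values, for illustration only.
toy_values = {
    "genome_A": (99.1, 92.0),
    "genome_B": (96.0, 71.5),
    "genome_C": (88.4, 35.2),
}
kept = shortlist(toy_values)  # genome_C falls below both cutoffs
```

Requiring both criteria is the stricter reading of the text; a genome passing only one of the two would need manual inspection.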

Establishing genome-based phylogeny of diverse S. haemolyticus strains

The phylogenetic tree of the 19 shortlisted S. haemolyticus isolates, including the three RESH strains and the type strain, with other species of Staphylococcus (S. aureus, S. epidermidis, S. saprophyticus and S. gallium) as outgroups, is shown in Figure 1a. The 19 S. haemolyticus strains formed a monophyletic clade distinct from the type strains of other species based on branching pattern. The monophyletic group included strains of both human-associated and non-human origin. Among the four outgroup species, S. aureus subsp. aureus NCTC 8325(T) forms the closest outgroup to the S. haemolyticus strains. Analysis of the unrooted tree showed that the 19 strains formed two groups, which we defined as ‘A’ and ‘B’. Eleven human-associated strains, one copper coin strain and one willow endophyte S. haemolyticus strain, including the “type strain”, mapped to group ‘A’, whereas three human isolates along with the RESH strains formed group ‘B’. All the RESH strains formed a distinct sub-lineage with one human strain as outgroup (Figure 2a). A similar clustering pattern was obtained from the heat map of pairwise ANI values: the RESH strains, showing ~96% ANI with the type strain, clustered separately and formed a distinct sub-lineage close to one of the human isolates (with >99% ANI to RESH) as outgroup, whereas the human, copper coin and willow endophyte strains showed close to or more than 99% ANI with the type strain, which is well above the species cut-off of 95% (Figure 1b).

Pangenomic insights into ecological diversification of S. haemolyticus

Functional classification of RESH and its phylogenetic relatives revealed that the Phages, Prophages, Transposable elements subsystem was strikingly different among the genomes and is therefore predicted to play an important role in the divergence of strains into different ecological niches (Figure 2b). To examine inter-strain differences, a pan-genome profile was generated, and pan and core genome sizes were plotted against the number of genomes under study (n=19). The pan-genome increases with the addition of each new strain and is far from saturation (Figure 3a). Our analysis revealed a core gene pool of nearly 1888 genes conserved in all S. haemolyticus strains in the present study (Figure 3b). The core gene set was classified into 22 COG subcategories. Out of them, 37% belong to 8 subcategories of metabolism, 18% to 4 subcategories of information storage and processing and signaling, 13% to 8 subcategories of cellular processes, and 21% to 2 subcategories of poorly characterized functions, whereas 9% remain unclassified in COG (Table S4). Between 300 and 500 genes per isolate form the dispensable or variable gene pool, which constitutes nearly 15-25% of the total gene pool and was functionally classified into 20 COG subcategories (Table S4). In the variable gene pool, 72% of genes were unclassified in COG, a much higher proportion than in the core genome. We therefore focused on variable/dispensable genes that are unique to each strain from different niches. The genomes of four strains, SE2.14 (Rice seed endophyte, RESH), RIT283 (Willow endophyte, WESH), JCSC1435 (Human, HMSH) and R1P1 (Copper coin, COSH), which represent diverse sub-lineages and ecology, have 12, 38, 146, and 14 unique genes, respectively (Figure 3b). More than 50% of these genes have atypical GC content (%) in each strain. The human pathogenic strain JCSC1435 has the highest number of unique genes, of which 73% have atypical GC content, the maximum among the four strains. This could be due to the gain of genes providing additional functions for survival, pathogenicity and adaptation to human skin, while the other strains lost genes during adaptation to their respective niches (Table S5-S8).
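The pan, core and unique gene counts described above can be illustrated with set operations on gene-family presence. The Python sketch below is a simplification under stated assumptions: each genome is represented as a set of gene-family identifiers (a hypothetical input format), and genomes are added in one fixed order, whereas pan-genome tools typically average over many orderings. The toy data are made up.

```python
def pan_core_curve(genomes):
    """Return [(pan size, core size), ...] as genomes are added in dict order.

    genomes: dict mapping strain name -> set of gene-family identifiers.
    """
    pan, core, curve = set(), None, []
    for genes in genomes.values():
        pan |= genes                                        # union grows the pan-genome
        core = set(genes) if core is None else core & genes  # intersection shrinks the core
        curve.append((len(pan), len(core)))
    return curve


def unique_genes(genomes):
    """Return, per strain, the gene families found in no other strain."""
    result = {}
    for name, genes in genomes.items():
        others = set().union(*(g for n, g in genomes.items() if n != name))
        result[name] = genes - others
    return result


# Toy gene-family sets for three hypothetical strains.
toy_genomes = {
    "strain_1": {"f1", "f2", "f3"},
    "strain_2": {"f1", "f2", "f4"},
    "strain_3": {"f1", "f5"},
}
curve = pan_core_curve(toy_genomes)
singletons = unique_genes(toy_genomes)
```

A pan-genome curve that keeps rising as strains are added, while the core curve levels off, is the pattern described in the text as far from saturation.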

Our analysis also identified an important set of 60 genes unique to the three RESH strains (SE2.14, SE3.8 and SE3.9). The majority of them showed atypical GC content, suggesting the gain of specific genes during the evolution and adaptation of RESH to rice seeds (Table S5). A gene encoding glycerophosphoryl diester phosphodiesterase (acdB) with atypical GC content (23.07%) was present in RESH genomes only. It is a regulatory gene that, along with acdR, co-regulates the expression of acdS, encoding 1-aminocyclopropane-1-carboxylate deaminase (ACCD). A putative proline (proP) and a glycine betaine (opuD) transporter unique to RESH suggest a role for these genes in promoting higher accumulation of osmolytes such as proline and glycine betaine for adaptation and stress tolerance in rice seeds. Genes encoding a lantibiotic and its transporters, as well as two putative alcohol dehydrogenases (ADH), are also present in RESH.

On the other hand, in the willow endophyte RIT283, the unique genes mostly encode heavy metal transport proteins, mobile element (ME) proteins and hypothetical proteins (Table S6). The presence of a zinc-containing alcohol dehydrogenase in RIT283 is predicted to be important for endophytic adaptation, as multiple copies of alcohol dehydrogenase have also been reported in the genome of the N2-fixing grass endophyte Azoarcus sp. strain BH72 [49,50]. In the genome of the human-associated strain (JCSC1435), the majority of the unique genes encode hypothetical proteins. Other genes encode an endolysin, N-acetylmuramoyl-L-alanine amidase (MurNAc–LAA), holin, transposases (IS431mec and IS1272) and mecRI (Table S7). The association of bacteriophage-produced endolysin and holin proteins with the human host is known. The predicted endolysin MurNAc–LAA, encoded in the genomes of Clostridium perfringens, hydrolyzes the amide bond between N-acetylmuramoyl and L-amino acids in certain cell wall glycopeptides, and the gene product (putative amidase) was reported as a potential antimicrobial to control the pathogenic bacterium [51]. Holins are responsible for disrupting the cytoplasmic membrane to aid endolysins in cell lysis and kill host-associated pathogens [52]. Our study also shows that the mecA gene, which encodes penicillin-binding protein 2a and confers methicillin resistance, is present in the majority of human-associated S. haemolyticus strains under study.

In the copper coin associated R1P1 genome, more than 50% of the unique genes encode hypothetical proteins and the rest encode membrane proteins (Table S8). These gene differences among strains reflect the capability of S. haemolyticus strains to adapt to diverse habitats, from metal and human to plant niches.

Genome dynamics in niche specific S. haemolyticus strains

Since ecological diversification is ongoing in both lineages, to capture the variation we looked at large regions (>10 kb) of the genome that are under flux in strains originating from contrasting niches. We can expect continuous acquisition, loss and hypervariation at such large dynamic regions (LDRs), which reinforce the genomic flexibility necessary for rapid functional diversification at the inter-strain level in this important species. The comparison revealed genomic regions specific to the S. haemolyticus reference strains HMSH (JCSC1435), COSH (R1P1), WESH (RIT283) and RESH (SE2.14) from human, copper, willow and rice associated niches, respectively. These LDRs harbor unique and variable regions and reflect gene content variation. They show marked variation in GC content (%) compared to their genomes, indicating acquisition of the genes in those regions through horizontal gene transfer (HGT) events.
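The LDRs in this study were identified from the BRIG comparison. As a rough illustration of the >10 kb size criterion only, the Python sketch below merges reference-genome intervals that lack a match in the compared genomes and keeps merged regions longer than 10 kb. The input format, the function name `large_dynamic_regions` and the `max_gap` tolerance are hypothetical and are not part of the described method.

```python
def large_dynamic_regions(intervals, min_len=10_000, max_gap=0):
    """Merge variable intervals and keep regions longer than min_len (bp).

    intervals: iterable of (start, end) coordinates on the reference genome
               marking stretches with no match in the compared genomes.
    max_gap:   intervals separated by at most this many bp are merged
               (an illustrative assumption, default 0).
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + max_gap:
            merged[-1][1] = max(merged[-1][1], end)  # extend the current region
        else:
            merged.append([start, end])              # start a new region
    return [(s, e) for s, e in merged if e - s > min_len]


# Made-up coordinates: the first two intervals overlap and merge into a
# 12 kb region; the third is only 5 kb and is dropped.
toy_intervals = [(0, 6000), (5000, 12000), (20000, 25000)]
regions = large_dynamic_regions(toy_intervals)
```

Each retained region could then be checked for atypical GC content, in line with the HGT interpretation given in the text.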

When RESH (SE2.14) was taken as reference, three LDRs were detected and designated here as RE1, RE2 and RE3 (Table 2). In the 21 kb RE1 region, a total of 18 CDS encoding replication-associated protein and lantibiotic synthesis proteins, along with ME, were detected (Table 2; Table S9). Genome-wide analysis of secondary metabolite gene cluster distribution also showed the presence of a lanthipeptide class cluster unique to RESH (Table S13). Further in silico analysis revealed the presence of a complete LanM-type lantibiotic system, in which LanM carries both dehydratase and cyclase activities for dehydration and cyclization, respectively. This prompted analysis of the regions flanking the LanM determinants for other open reading frames (ORFs) involved in the biosynthesis of lantibiotics. Two novel structural peptides, LanA1 (44 a.a) and LanA2 (56 a.a), were present along with LanP, encoding the proteolytic processing protease, and the regulatory element LanR. The cluster also contains two dedicated ABC transporter proteins for transport (LanT) and immunity (LanI). Of the LanFEG family transporters, which play a role in self-immunity of lantibiotic-producing bacteria [53], it contains one separate permease (LanE) and one ATPase (LanF); the other permease component, LanG, is not present. The gene cluster also contained an insertion sequence, IS1181. We were therefore able to identify and annotate a complete Type II lantibiotic gene cluster in RESH (16.7 kb). The amino acid sequences of all the ORFs present in the lantibiotic gene cluster, along with their identity to lantibiotic gene clusters reported in the literature, are given in Table 3. In the second region, RE2 (10 kb), a total of 18 ORFs were annotated, the majority of which encode hypothetical, ME-associated and pathogenicity island proteins (Table 2; Table S9). Overall, the new genes for plant (seed) adaptation in RESH were largely acquired through horizontal gene transfer, as indicated by their atypical GC content (Table S9).

With the willow endophyte (WESH) RIT283 as reference, eight LDRs ranging from 12.5 to 54 kb in size were revealed. The majority of genes encode ME proteins, restriction-modification systems, resolvases, replication-associated proteins, phage proteins, stress-related proteins and hypothetical proteins. Genes encoding UDP-glucose 4-epimerase (galE) and a putative glycosyltransferase, and capsular polysaccharide synthesis gene clusters, are also present (Table 2; Table S10). Genes coding for galE, glycosyltransferase and some capsular polysaccharide synthesis proteins are known in the endophyte Azoarcus sp. strain BH72, where they are predicted to play a role in interactions with plants [50].

In the human-associated strain (HMSH) JCSC1435, a total of eight LDRs, HM1 to HM8, were found to be differentially present. The largest one, HM1 (60 kb), harbors mostly genes for ME proteins and hypothetical proteins, along with genes related to recombination and repair. Other major genomic regions, HM3 (48 kb) and HM6 (43 kb), comprise phages, phage-associated proteins, pathogenicity-related genes/islands and hypothetical proteins (Table 2; Table S11). Another LDR of HMSH, HM2 (20 kb), possesses important genes that play a role in normal cellular growth and multiplication, as well as capsular biosynthesis genes. Some of the genes in the LDRs of JCSC1435 show low identity with genes present in human isolates and the willow endophyte RIT283 (Table 2; Table S11).

When the genome of the copper coin associated strain (COSH) R1P1 was taken as reference, four LDRs were found and designated CO1 to CO4. The largest among them, CO3 (50 kb), harbors important genes for metal transport and resistance, ME proteins and hypothetical proteins (Table 2; Table S12). In CO1 (29 kb), the majority of the genes code for osmotic stress related proteins, whereas CO4 (25 kb) comprises important genes encoding phosphate regulation functions. Taken together, these results suggest that the LDRs of S. haemolyticus genomes from different habitats differ according to their niche-specific requirements.

DISCUSSION

Bacteria adapted to a specific lifestyle often exhibit niche-driven genome composition, following different strategies for adaptation to diverse habitats such as acquisition of beneficial genes or genomic islands through horizontal gene transfer, loss of genes through reductive evolution, genetic recombination, positive selection etc. [54]. Whole genome analysis can best resolve the genomic differences among habitat-specific members of a genus, not only at the species level but also down to the strain level, along with their lifestyle-associated gene signatures [55]. Members of the genus Staphylococcus are often considered human-associated bacteria with commensal as well as pathogenic potential. The literature suggests that they live in diverse habitats including soil, water, plants, domesticated animals and humans.

Within plant-associated habitats, members of Staphylococcus species were found as seed endophytes of maize [56,57]. They were also found associated with plants such as cotton [58], carrot [59], soybean [60] and Chlorophytum borivilianum [61]. S. epidermidis and S. pasteuri have been reported as endophytes of ginseng [62]. A recent study reported four different Staphylococcus species, S. epidermidis, S. pasteuri, S. haemolyticus and the S. aureus group (including S. aureus, S. argenteus and S. schweitzeri), as dominant members of the endophytic community associated with seeds of the Anadenanthera colubrina tree [7]. Hence, our results strongly support the previous studies on seeds as carriers of different species of Staphylococcus as endophytes and on their predicted role in seed and plant growth and development.

To date, not many attempts have been made to study habitat-specific variation in gene content in Staphylococcus members, other than the first detailed genome-scale characterization of rice seed endophytic S. epidermidis [26]. In the present study, we report another species, S. haemolyticus, as a rice seed endophyte. When we started the analysis on S. haemolyticus, around 35 diverse genomes were present in NCBI GenBank, and 20 of them were misidentified at the species level. Hence, it was important to classify them based on phylogenomics and comparative genomics using modern taxonomic tools and the “type strain” genome [63]. Therefore, we also sequenced the genome of the "type strain" of S. haemolyticus, MTCC3383(T), which was originally isolated from human skin [64], to establish the identity of the RESH isolates and the copper and willow isolates as S. haemolyticus using phylogenomic approaches.

Genome analysis of RESH shows genome size, GC content and number of genes comparable to those of the type strain genome MTCC3383(T) and of human, plant (willow) and metal (copper) associated genomes. This suggests that there has been no drastic change in genome size and no reductive evolution in the genomes of RESH strains and others from diverse habitats. In this regard, the monophyletic nature of ecological variant strains of S. haemolyticus suggests intraspecies genome and functional diversification through acquisition of genes by horizontal gene transfer events. In support of this observation, the pan-genome size of the current dataset increases with the addition of each new strain genome, clearly indicating an open pan-genome. Even the monophyletic grouping of diverse strains suggests diversification and variation through gene acquisition or loss events. Hence, the other major objective of the current study was to investigate habitat-specific gene classes or genes that are exclusively present in or absent from the genomes of RESH, copper coin and human associated strains, which will give clues to their roles in adaptation to their respective habitats.

In this regard, the open pan genome and genome comparison revealed genes/genomic regions unique to the strain genomes associated with each habitat. The RESH genome showed the presence of acdB, which co-regulates the expression of acdS, encoding ACCD. This points to its importance for rice under stress conditions, as plant-associated bacteria with ACCD activity help the plant by lowering the level of the stress hormone “ethylene” through breakdown of ACC (the precursor of ethylene) to ammonia and α-ketobutyrate, which they utilize for their own growth [65, 66]. A number of bacteria obtain their energy by oxidizing ethanol. The presence of ADH (alcohol dehydrogenase) in RESH is consistent with an earlier report showing that bacterial ADH plays an important role in the endophytic establishment of Azoarcus sp. in rice roots [50].
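The search for genes exclusively present in or absent from the strains of one habitat can be sketched as set operations on a gene presence table. The sketch below is illustrative only and is not the pipeline used in this study: the function name, strain labels, habitat labels and gene-cluster names are hypothetical placeholders, and each genome is assumed to be already reduced to a set of gene-cluster identifiers.

```python
from typing import Dict, Set


def habitat_specific_genes(
    gene_sets: Dict[str, Set[str]],
    habitat_of: Dict[str, str],
    habitat: str,
) -> Dict[str, Set[str]]:
    """Find clusters exclusive to, or exclusively missing from, one habitat."""
    inside = [g for s, g in gene_sets.items() if habitat_of[s] == habitat]
    outside = [g for s, g in gene_sets.items() if habitat_of[s] != habitat]
    if not inside or not outside:
        raise ValueError("need at least one strain inside and outside the habitat")
    return {
        # present in every strain of the habitat and in no other strain
        "exclusively_present": set.intersection(*inside) - set.union(*outside),
        # present in every other strain and in no strain of the habitat
        "exclusively_absent": set.intersection(*outside) - set.union(*inside),
    }


# Hypothetical toy data (placeholder strain and cluster names).
toy_sets = {
    "rice_1": {"core1", "g1", "g2"},
    "rice_2": {"core1", "g1", "g2", "g3"},
    "human_1": {"core1", "g4", "g5"},
    "copper_1": {"core1", "g5", "g6"},
}
toy_habitats = {"rice_1": "rice", "rice_2": "rice",
                "human_1": "human", "copper_1": "copper"}
result = habitat_specific_genes(toy_sets, toy_habitats, "rice")
print(sorted(result["exclusively_present"]))  # ['g1', 'g2']
print(sorted(result["exclusively_absent"]))   # ['g5']
```

Requiring presence in every strain of the habitat is a strict criterion; a relaxed threshold (for example, presence in a majority of strains) would be a design choice that depends on the dataset.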

It is well established that plants do not live alone as single entities and have limited ability to adapt to stressful environmental conditions (heat, drought, toxins, or limited nutrients). The sessile nature of plants has led them to evolve mechanisms, such as colonization by microorganisms, to adjust to these biotic and abiotic stresses. These microorganisms reside in the plant vicinity, and those that reside internally are called endophytes [67]. It is also documented that endophytes are closely associated with plants, complete a major part or all of their life cycle with them, cope with stresses and help the host plant through growth promotion, protection and phytoremediation [68, 69, 70]. Reports in the literature also show species of Staphylococcus in general, and S. haemolyticus in particular, to be associated with diverse plants as endophytes, based on cultivable and uncultivable approaches [33, 37, 38].

Bacterial endophytes also produce antimicrobial compounds [71, 72, 73]. In a recent study, S. haemolyticus, as a tomato root endophyte, was reported to have antagonistic potential against Ralstonia solanacearum [33]. In our analysis, the localization of a type II lanthipeptide gene cluster acquired by HGT points to its importance in helping RESH strains adapt to the plant niche and contribute to their growth by inhibiting soil-borne fungal and bacterial plant pathogens associated with rice [74]. Overall, RESH strains are equipped with a repertoire of genes encoding endophytic bacterial traits for their colonization, adaptation, survival and transmission in the highly stressed seed environment. In addition, the presence of mobile genetic elements in RESH indicates their contribution to overall genome plasticity through their role in gene loss and gene gain [75, 76].

Willow, an early successional shrub species, is able to colonize nutrient-poor environments, and its association with endophytes suggests that they may provide benefits that help it withstand stressful environments. In addition, Salix spp. are reported to have a considerable role in the phytoremediation of organic contaminants [77]. Owing to this nutrient-poor habitat, the unique genes/genomic regions of the WESH genome differ from those of the rice endophytes and are well equipped with genes such as metal-specific transporters, different stress tolerance proteins, ME proteins and phage proteins. Our analysis found that the RIT283 genome encodes the phnB gene, a component of the phosphonate gene cluster responsible for bacterial degradation of phosphonates, which are organophosphorus compounds. The presence of a gene encoding a haloacid dehalogenase (HAD)-like family protein in the willow isolate may help in protection against halogen-containing compounds. HAD enzymes are known to protect Ectocarpus from halogen-containing defense metabolites produced by kelp thalli during its association as an epiphyte or endophyte [78, 79, 80]. These genes suggest a role for the willow endophyte in its own adaptation and in protection of the plant under nutrient-limiting and stressful environments [26].

On the other hand, inspection of the unique genes and genomic regions of HMSH revealed several genes such as mecR1, IS elements, transposases, phages and genes that participate in general physiologic processes. The genome revealed factors by which the strain successfully colonizes human hosts, who are exposed to diverse chemicals and antibiotics [3, 81]. Another important finding of our analysis is the detection of the methicillin resistance gene (mecA), which is found in methicillin-resistant coagulase-negative staphylococci. S. haemolyticus has been reported as a reservoir of the mec gene complex (mecA, mecR1, mecI), which occurs together with the ccr gene complex encoding the recombinases that help integrate the cassette into chromosomal DNA [82]. Our study also showed that the gene for methicillin resistance (mecA) was found only in human isolates, in the majority of them, and was absent from non-human strains, including the copper coin isolate and the endophytes of willow and rice.

It is believed that bacterial capsular polysaccharides, which are cell wall components, are involved in early host recognition, provide resistance against the host immune system and help bacteria survive inside the host [83, 84, 85]. It has been well documented that bacterial surface components such as polysaccharides are required for successful colonization of plant hosts by bacteria, and poly-β-1,6-N-acetyl-D-glucosamine (PGA) was reported to be an essential component for E. coli binding to alfalfa sprouts [86]. Capsular polysaccharides also have a role in the pathogenesis of staphylococcal infections [7]. Our results reflect differences in the capsular biosynthesis genes of willow-associated and human-associated S. haemolyticus. The human pathogenic strain showed a complex gene organization with both cap5 and cap8 loci along with O-antigen ligase and flippase genes, which are believed to be responsible for attachment to and colonization of the human host. The willow endophyte genome harbors cap5-specific region genes along with a few cap8 common genes, whereas the majority of these genes were lacking in the seed endophyte. These differences in capsular polysaccharides suggest their possible role in successful attachment to, interaction with and colonization of their respective plant and human hosts.

In contrast to host-associated settings, a copper surface is not a habitat for bacterial growth, and copper coin associated bacteria must develop mechanisms that enable them to withstand the toxic properties of copper. Our study revealed genes/regions for adaptation to this non-environmental setting, including genes encoding a glyoxalase family protein, which protects against the toxic compound methylglyoxal encountered in the environment by detoxifying it into nontoxic forms. Glyoxalases are maximally activated by Co2+ and Ni2+ ions [87]. Copper-translocating P-type ATPases and stress-related proteins present in the LDRs serve as important clues to adaptation to copper metal [88]. Our analysis revealed that the genome of the copper-adapted bacterium also harbors genes for resistance against closely related divalent metals, including cobalt, cadmium and zinc, and also arsenic [89], which may be due to co-selection of resistance genes against other toxic metals. The use of copper in human civilizations has been reported between the 5th and 6th millennia B.C. The antimicrobial property of copper attracted interest, and humans then began using it in large-scale applications, including animal feed, vessels and hospitals [90]. These large-scale applications promote the development of copper resistance in bacteria. The existence of S. haemolyticus on a copper alloy coin might be the result of resistance and transfer arising from the applications of copper in human and non-human habitats. There is a strong correlation between antibiotic resistance and copper resistance in microorganisms, and this co-resistance might be a contributing factor in their adaptation from human to environment and vice versa through horizontal acquisition of resistance genes [91, 92].

CONCLUSION

Phylogenomic approaches allowed us to unequivocally establish the identity of ecologically diverse isolates as members of the species S. haemolyticus. Studying ecological variants allowed us to obtain insights into the pattern of genomic flux in the diversification of members of this species. Overall, the present investigation of genome diversity provided evidence for new evolutionary aspects of the adaptation of S. haemolyticus to non-human niches, alongside its well-known lifestyle as a human-associated bacterium and opportunistic human pathogen.

DATA AVAILABILITY

The genome sequence data of the Staphylococcus haemolyticus strains sequenced in this article are deposited at the NCBI with the accession numbers LILF00000000, JRVR00000000, JRVS00000000, and JRVT00000000. All the other sequences from Staphylococcus haemolyticus used in the present study are available in NCBI genome database. A summary of the accession number and metadata of each strain is included in Table 1 of this article.


Acknowledgements

Vasvi Chaudhry gratefully acknowledges the Council of Scientific and Industrial Research (CSIR), New Delhi, and SERB N-PDF (grant PDF/2015/000673) for providing research support. PBP acknowledges the CSIR network project Man as a Super-organism: Understanding the Human Microbiome (HUM-BSC0119) for financial support. We acknowledge support from the project “Expansion and modernization of Microbial Type Culture Collection and Gene Bank (MTCC)”, jointly supported by CSIR and DBT, New Delhi.

Author's contributions

V.C. and P.B.P. conceived and designed the study. V.C. collected samples, isolated and characterized endophytes, and performed genome sequencing, comparative genomics and bioinformatics analysis. V.C. and P.B.P. wrote the paper.

Conflict of Interest

The authors declare no Conflict of Interest.

Ethics approval and consent to participate

Not applicable.

Figure Legends

Figure 1. (a) The rooted maximum-likelihood tree including three RESH strains and the "type strain" MTCC3383(T), along with other species of Staphylococcus (S. aureus, S. epidermidis, S. saprophyticus and S. gallium) as outgroups, based on phylogenomic reference genes. (b) Heatmap of average nucleotide identity (ANI) values among 19 S. haemolyticus strains. The rows and columns stand for strains. The "type strain" S. haemolyticus MTCC3383(T) is highlighted with a grey box and RESH strains are highlighted with a green box. Strain names with isolation sources are listed in Table 1.

Figure 2. (a) Phylogenetic relationship of the S. haemolyticus strains. The unrooted maximum-likelihood tree is constructed based on the concatenated 31 phylogenomic reference genes. The "type strain" MTCC3383(T) is highlighted in grey. (b) Functional classification of the genes encoded by the genomes of Staphylococcus haemolyticus strains. These functions were assigned according to the SEED subsystems obtained using the RAST server. Each column indicates the number of CDSs of each Staphylococcus haemolyticus strain in the different subsystems, shown in different colors.

Figure 3. Pan-genome analysis of 19 S. haemolyticus strains. (a) Core-genome and pan-genome sizes according to the number of genomes in the dataset. The plots indicate the novel genes obtained on addition of each genome. (b) Numbers of shared and specific genes in 19 S. haemolyticus strains based on clusters. The external circles show the unique genes in each genome and the inner one shows the core genome size, given by the number of clusters.

Figure 4. The architecture of the type II lantibiotic gene cluster in the RESH strain S. haemolyticus SE2.14. Annotation of each ORF with sequence homology is mentioned in Table 3.


References

1. Ing, M.B., Baddour, L.M. & Bayer, A.S. Bacteremia and infective endocarditis: pathogenesis, diagnosis, and complications. In K. B. Crossley and G. L. Archer (ed.), The staphylococci in human disease. Churchill Livingstone, New York, N.Y., 331–354 (1997).

2. Von Eiff, C., Proctor, R.A. & Peters, G. Coagulase-negative staphylococci: pathogens have major role in nosocomial infections. Postgraduate Medical Journal 110, 63–76 (2001).

3. Takeuchi, F. et al. Whole-genome sequencing of Staphylococcus haemolyticus uncovers the extreme plasticity of its genome and the evolution of human-colonizing staphylococcal species. Journal of Bacteriology 187, 7292–308 (2005).

4. Lopez-Lopez, A., Rogel, M.A., Ormeno-Orrillo, E., Martinez-Romero, J. & Martinez-Romero, E. Phaseolus vulgaris seed-borne endophytic community with novel bacterial species such as Rhizobium endophyticum sp. nov. Systematic and Applied Microbiology 33, 322–327 (2010).

5. Johnston-Monje, D. & Raizada, M.N. Conservation and diversity of seed associated endophytes in Zea across boundaries of evolution, ethnography and ecology. PLoS One 6, e20396 (2011).

6. Truyens, S., Weyens, N., Cuypers, A. & Vangronsveld, J. Bacterial seed endophytes genera, vertical transmission and interaction with plants: Bacterial seed endophytes. Environmental Microbiology Reports 7, 40–50 (2015).

7. Alibrandi, P. et al. The seed endosphere of Anadenanthera colubrina is inhabited by a complex microbiota, including Methylobacterium spp. and Staphylococcus spp. with potential plant-growth promoting activities. Plant Soil 422(1-2), 81–99 (2018).

8. Kampfer, P. Systematics of prokaryotes: the state of the art. Antonie Van Leeuwenhoek 101, 3–11 (2012).

9. Tindall, B.J., Rossello-Mora, R., Busse, H.J., Ludwig, W. & Kampfer, P. Notes on the characterization of prokaryote strains for taxonomic purposes. International Journal of Systematic and Evolutionary Microbiology 60, 249–266 (2010).

10. Larsen, M.V. et al. Benchmarking of methods for genomic taxonomy. Journal of Clinical Microbiology 52, 1529–1539 (2014).

11. Walcher, M., Skvoretz, R., Montgomery-Fullerton, M., Jonas, V. & Brentano, S. Description of an unusual Neisseria meningitidis isolate containing and expressing Neisseria gonorrhoeae-specific 16S rRNA gene sequences. Journal of Clinical Microbiology 51, 3199–3206 (2013).

12. Richter, M. & Rosselló-Móra, R. Shifting the genomic gold standard for the prokaryotic species definition. Proceedings of the National Academy of Sciences of the United States of America 106, 19126–19131 (2009).

13. Auch, A.F., von Jan, M., Klenk, H.P. & Göker, M. Digital DNA-DNA hybridization for microbial species delineation by means of genome-to-genome sequence comparison. Standards in Genomic Sciences 2, 117–134 (2010).

14. Konstantinidis, K.T. & Tiedje, J.M. Genomic insights that advance the species definition for prokaryotes. Proceedings of the National Academy of Sciences of the United States of America 102, 2567–2572 (2005).

15. Chan, J.Z., Halachev, M.R., Loman, N.J., Constantinidou, C. & Pallen, M.J. Defining bacterial species in the genomic era: insights from the genus Acinetobacter. BMC Microbiology 12, 302 (2012).

16. Colston, S.M. et al. Bioinformatic genome comparisons for taxonomic and phylogenetic assignments using Aeromonas as a test case. MBio 5, e02136-14 (2014).

17. Falush, D. Toward the Use of Genomics to Study Microevolutionary Change in Bacteria. PLOS Genetics 5, e1000627 (2009).

18. Chaudhry, V. et al. Methylobacterium indicum sp. nov., a facultative methylotrophic bacterium isolated from rice seed. Systematic and Applied Microbiology 39, 25–32 (2016).

19. Peleg, A.Y. et al. Whole genome characterization of the mechanisms of daptomycin resistance in clinical and laboratory derived isolates of Staphylococcus aureus. PLoS One 7(1), e28316 (2012).

20. Harris, S.R. et al. Whole-genome sequencing for analysis of an outbreak of meticillin-resistant Staphylococcus aureus: a descriptive study. The Lancet Infectious Diseases 13(2), 130–6 (2013).

21. Price, J.R., Didelot, X., Crook, D.W., Llewelyn, M.J. & Paul, J. Whole genome sequencing in the prevention and control of Staphylococcus aureus infection. Journal of Hospital Infection 83(1), 14–21 (2013).

22. Truyens, S., Weyens, N., Cuypers, A. & Vangronsveld, J. Changes in the population of seed bacteria of transgenerationally Cd-exposed Arabidopsis thaliana. Plant Biology (Stuttgart) 15(6), 971–81 (2013).

23. Porteous-Moore, F. et al. Endophytic bacterial diversity in poplar trees growing on a BTEX-contaminated site: The characterisation of isolates with potential to enhance phytoremediation. Systematic and Applied Microbiology 29, 539–556 (2006).

24. Velazquez, E. et al. Genetic diversity of endophytic bacteria which could be found in the apoplastic sap of the medullary parenchyma of the stem of healthy sugarcane plants. J Basic Microbiol 48, 118–124 (2008).

25. Puente, M.E., Li, C.Y. & Bashan, Y. Endophytic bacteria in cacti seeds can improve the development of cactus seedlings. Environmental and Experimental Botany 66, 402–408 (2009).

26. Chaudhry, V. & Patil, P.B. Genomic Investigation Reveals Evolution and Lifestyle Adaptation of Endophytic Staphylococcus epidermidis. Scientific Reports 6, 19263 (2016).

27. Cavanagh, J.P. et al. Whole-genome sequencing reveals clonal expansion of multiresistant Staphylococcus haemolyticus in European hospitals. Journal of Antimicrobial Chemotherapy 69, 2920–2927 (2014).

28. Chan, K.G. et al. Antibiotic Resistant and Virulence Determinants of Staphylococcus haemolyticus C10A as Revealed by Whole Genome Sequencing. Journal of Genomics 3, 72–74 (2015).

29. Roach, D.J. et al. A Year of Infection in the Intensive Care Unit: Prospective Whole Genome Sequencing of Bacterial Clinical Isolates Reveals Cryptic Transmissions and Novel Microbiota. PLoS Genetics 11, e1005413 (2015).

30. de Almeida, L.M. et al. Complete Genome Sequence of Linezolid-Susceptible Staphylococcus haemolyticus Sh29/312/L2, a Clonal Derivative of a Linezolid-Resistant Clinical Strain. Genome Announcements 3, e00494-15 (2015).

31. Nair, R.G. et al. Genome Mining and Comparative Genomic Analysis of Five Coagulase-Negative Staphylococci (CNS) Isolated from Human Colon and Gall Bladder. Journal of Data Mining in Genomics and Proteomics 7, 192 (2016).

32. Qurashi, A.W. & Sabri, A.N. Osmolyte accumulation in moderately halophilic bacteria improves salt tolerance of chickpea. Pakistan Journal of Botany 45(3), 1011–1016 (2013).

33. Upreti, R. & Pious, T. Root-Associated Bacterial Endophytes from Ralstonia Solanacearum Resistant and Susceptible Tomato Cultivars and Their Pathogen Antagonistic Effects. Frontiers in Microbiology 6, 255 (2015).

34. Najnin, R.A., Shafrin, F., Polash, A.H., Zaman, A. & Hossain, A. A diverse community of jute (Corchorus spp.) endophytes reveals mutualistic host–microbe interactions. Annals of Microbiology 65, 1615 (2015).

35. Suhandono, S., Kusumawardhani, M.K. & Aditiawati, P. Isolation and molecular identification of endophytic bacteria from Rambutan fruits (Nephelium lappaceum L.) Cultivar Binjai. HAYATI Journal of Biosciences 23(1), 39–44 (2016).

36. Yousaf, M., Rehman, Y. & Hasnain, S. High-yielding Wheat Varieties Harbour Superior Plant Growth Promoting-Bacterial Endophytes. Applied Food Biotechnology 4(3), 143–154 (2017).

37. Yousaf, S. et al. Pyrosequencing detects human and animal pathogenic taxa in the grapevine endosphere. Frontiers in Microbiology 8(5), 327 (2014).

38. Gan, H.Y. et al. Whole-genome sequences of 13 endophytic bacteria isolated from shrub willow (Salix) grown in Geneva, New York. Genome Announcements 2, e00288–e00314 (2014).

39. Hong, J. et al. Complete Genome Sequence of Biofilm-Forming Strain Staphylococcus haemolyticus S167. Genome Announcements 16, 4(3) (2016).

40. Kim, O.S. et al. Introducing EzTaxon-e: a prokaryotic 16S rRNA gene sequence database with phylotypes that represent uncultured species. International Journal of Systematic and Evolutionary Microbiology 62, 716–721 (2012).

41. Aziz, R.K. et al. The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics 9, 75 (2008).

42. Wu, M. & Eisen, J.A. A simple, fast, and accurate method of phylogenomic inference. Genome Biol 9, R151 (2008).

43. Tamura, K., Stecher, G., Peterson, D., Filipski, A. & Kumar, S. MEGA6: Molecular Evolutionary Genetics Analysis Version 6.0. Molecular Biology and Evolution 30, 2725–2729 (2013).

44. Zhao, Y. et al. PGAP: pan-genomes analysis pipeline. Bioinformatics 28(3), 416–8 (2012).

45. Alikhan, N.F., Petty, N.K., Ben Zakour, N.L. & Beatson, S.A. BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons. BMC Genomics 12, 402 (2011).

46. Medema, M.H. et al. antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Research 39, W339–W346 (2011).

47. Alanjary, M. et al. The Antibiotic Resistant Target Seeker (ARTS), an exploration engine for antibiotic cluster prioritization and novel drug target discovery. Nucleic Acids Research 45(W1), W42–W48 (2017).

48. van Heel, A.J., de Jong, A., Montalban-Lopez, M., Kok, J. & Kuipers, O.P. BAGEL3: automated identification of genes encoding bacteriocins and (non-)bactericidal posttranslationally modified peptides. Nucleic Acids Research 41, W448–W453 (2013).

49. Krause, A. et al. Complete genome of the mutualistic, N2-fixing grass endophyte Azoarcus sp. strain BH72. Nature Biotechnology 25, 478 (2007).

50. Krause, A., Bischoff, B., Miché, L., Battistoni, F. & Reinhold-Hurek, B. Exploring the function of alcohol dehydrogenases during the endophytic life of Azoarcus sp. strain BH72. Molecular Plant-Microbe Interactions 24, 1325–1332 (2011).

51. Tillman, G.E., Simmons, M., Garrish, J.K. & Seal, B.S. Expression of a Clostridium perfringens genome-encoded putative N-acetylmuramoyl-L-alanine amidase as a potential antimicrobial to control the bacterium. Archives of Microbiology 195, 675–681 (2013).

52. Roach, D.R. & Donovan, D.M. Antimicrobial bacteriophage-derived proteins and therapeutic applications. Bacteriophage 5, e1062590 (2015).

53. Biswas, S. & Biswas, I. SmbFT, a Putative ABC Transporter Complex, Confers Protection against the Lantibiotic Smb in Streptococci. Journal of Bacteriology 195, 5592–5601 (2013).

54. Dutta, C. & Paul, S. Microbial Lifestyle and Genome Signatures. Current Genomics 13, 153–162 (2012).

55. Loman, N.J. & Pallen, M.J. Twenty years of bacterial genome sequencing. Nature Reviews Microbiology 13, 787–94 (2015).

56. Liu, Y., Zuo, S., Xu, L., Zou, Y. & Song, W. Study on diversity of endophytic bacterial communities in seeds of hybrid maize and their parental lines. Archives of Microbiology 194, 1001–1012 (2012).

57. Liu, Y., Zuo, S., Zou, Y., Wang, J. & Song, W. Investigation on diversity and population succession dynamics of endophytic bacteria from seeds of maize (Zea mays L, Nongda108) at different growth stages. Annals of Microbiology 63, 71–79 (2013).

58. Mcinroy, J.A. & Kloepper, J.W. Survey of indigenous bacterial endophytes from cotton and sweet corn. Plant and Soil 173, 337–342 (1995).

59. Surette, M.A., Sturz, A.V., Lada, R.R. & Nowak, J. Bacterial endophytes in processing carrots (Daucus carota L. var. sativus): their localization, population density, biodiversity and their effects on plant growth. Plant and Soil 253, 381–390 (2003).

60. Hung, P.Q. & Annapurna, K. Isolation and characterization of endophytic bacteria in soybean (Glycine sp.). Omonrice 12, 92–101 (2004).

61. Panchal, H. & Ingle, S. Isolation and characterization of endophytes from the root of medicinal plant Chlorophytum borivilianum (Safed musli). Journal of Advanced Research 2, 205–209 (2011).

62. Vendan, R.T., Yu, Y.J., Lee, S.H. & Rhee, Y.H. Diversity of endophytic bacteria in ginseng and their potential for plant growth promotion. Journal of Microbiology 48, 559–565 (2010).

63. Kyrpides, N.C. et al. Genomic Encyclopedia of Type Strains, Phase I: The one thousand microbial genomes (KMG-I) project. Standards in Genomic Sciences 9, 1278–1284 (2014).

64. Schleifer, K.H. & Kloos, W.E. Isolation and characterization of staphylococci from human skin. I. Amended descriptions of Staphylococcus epidermidis and Staphylococcus saprophyticus and descriptions of three new species: Staphylococcus cohnii, Staphylococcus haemolyticus, and Staphylococcus xylosus. International Journal of Systematic and Evolutionary Microbiology 25, 50–61 (1975).

65. Singh, R.P., Shelke, G.M., Kumar, A. & Jha, P.N. Biochemistry and genetics of ACC deaminase: a weapon to “stress ethylene” produced in plants. Frontiers in Microbiology 6, 937 (2015).

66. Asif, H. et al. Comparative genomics of an endophytic Pseudomonas putida isolated from mango orchard. Genetics and Molecular Biology 39, 465–473 (2016).

67. Hardoim, P.R. et al. The hidden world within plants: ecological and evolutionary considerations for defining functioning of microbial endophytes. Microbiology and Molecular Biology Reviews 79, 293–320 (2015).

68. Moore, F.P. et al. Endophytic bacterial diversity in poplar trees growing on a BTEX-contaminated site: the characterization of isolates with potential to enhance phytoremediation. Systematic and Applied Microbiology 29, 539–556 (2006).

69. Thomas, P. & Soly, T.A. Endophytic bacteria associated with growing shoot tips of banana (Musa sp.) cv. Grand Naine and the affinity of endophytes to the host. Microbial Ecology 58, 952–964 (2009).

70. Achari, G.A. & Ramesh, R. Diversity, biocontrol, and plant growth promoting abilities of xylem residing bacteria from solanaceous crops. International Journal of Microbiology 2014:14 (2014).

71. Taghavi, S. et al. Genome sequence of the plant growth promoting endophytic bacterium Enterobacter sp. 638. PLOS Genetics 6, e1000943 (2010).

72. Ding, L., Maier, A., Fiebig, H.H., Lin, W.H. & Hertweck, C. A family of multicyclic indolosesquiterpenes from a bacterial endophyte. Organic & Biomolecular Chemistry 9, 4029–4031 (2011).

73. Inahashi, Y. et al. Spoxazomicins A-C, novel antitrypanosomal alkaloids produced by an endophytic actinomycete, Streptosporangium oxazolinicum K07-0450T. The Journal of Antibiotics (Tokyo) 64, 303–307 (2011).

74. Montesinos, E. Antimicrobial peptides and plant disease control. FEMS Microbiology Letters 270, 1–11 (2007).

75. Darmon, E. & Leach, D.R.F. Bacterial genome instability. Microbiology and Molecular Biology Reviews 78, 1–39 (2014).

76. Siguier, P., Gourbeyre, E. & Chandler, M. Bacterial insertion sequences: their genomic impact and diversity. FEMS Microbiology Reviews 38, 865–891 (2014).

77. Vervaeke, P. et al. Phytoremediation prospects of willow stands on contaminated sediment: a field trial. Environmental Pollution 126, 275–282 (2003).

78. Kuepper, F.C. et al. Iodide accumulation provides kelp with an inorganic antioxidant impacting atmospheric chemistry. Proceedings of the National Academy of Sciences of the United States of America 105, 6954–6958 (2008).

79. Russell, G. Formation of an ectocarpoid epiflora on blades of Laminaria digitata. Marine Ecology Progress Series 11, 181–187 (1983a).

80. Russell, G. Parallel growth-patterns in algal epiphytes and Laminaria blades. Marine Ecology Progress Series 13, 303–304 (1983b).

81. Lee, J.H. Methicillin (oxacillin)-resistant Staphylococcus aureus strains isolated from major food animals and their potential transmission to humans. Applied and Environmental Microbiology 69, 6489–6494 (2003).

82. Czekaj, T., Ciszewski, M. & Szewczyk, E.M. Staphylococcus haemolyticus - an emerging threat in the twilight of the antibiotics age. Microbiology 161, 2061–2068 (2015).

83. O'Riordan, K. & Lee, J.C. Staphylococcus aureus capsular polysaccharides. Clinical Microbiology Reviews 17, 218–234 (2004).

84. Rodriguez-Navarro, D.N., Dardanelli, M.S. & Ruiz-Sainz, J.E. Attachment of bacteria to the roots of higher plants. FEMS Microbiology Letters 272, 127–136 (2007).

85. Wu, H.J., Wang, A.H.J. & Jennings, M.P. Discovery of virulence factors of pathogenic bacteria. Current Opinion in Chemical Biology 12, 1–9 (2008).

86. Matthysse, A.G., Deora, R., Mishra, M. & Torres, A.G. Polysaccharides cellulose, poly-beta-1,6-N-acetyl-D-glucosamine, and colanic acid are required for optimal binding of Escherichia coli O157:H7 strains to alfalfa sprouts and K-12 strains to plastic but not for binding to epithelial cells. Applied and Environmental Microbiology 74, 2384–2390 (2008).

87. Suttisansanee, U. & Honek, J.F. Bacterial glyoxalase enzymes. Seminars in Cell and Developmental Biology 22, 285–92 (2011).

88. Orell, A., Navarro, C.A., Arancibia, R., Mobarec, J.C. & Jerez, C.A. Life in blue: copper resistance mechanisms of bacteria and archaea used in industrial biomining of minerals. Biotechnology Advances 28, 839–48 (2010).

89. Huertas, M.J., López-Maury, L., Giner-Lamia, J., Sánchez-Riego, A.M. & Florencio, F.J. Metals in cyanobacteria: analysis of the copper, nickel, cobalt and arsenic homeostasis mechanisms. Life 4, 865–886 (2014).

90. Grass, G., Rensing, C. & Solioz, M. Metallic Copper as an Antimicrobial Surface. Applied and Environmental Microbiology 77, 1541–1547 (2011).

91. Berg, J., Tom-Petersen, A. & Nybroe, O. Copper amendment of agricultural soil selects for bacterial antibiotic resistance in the field. Letters in Applied Microbiology 40(2), 146–51 (2005).

92. Knapp, C.W. et al. Relationship between antibiotic resistance genes and metals in residential soil samples from Western Australia. Environmental Science and Pollution Research 24(3), 2484–2494 (2017).

Table 1. General genomic features and isolation sources of the Staphylococcus haemolyticus “type strain” MTCC3383(T), RESH and other database strains analyzed in this study.

| S. No | S. haemolyticus strain | Genome size (Mb) | GC (%) | No. of genes | No. of rRNA | No. of tRNA | Isolation source | Putative plasmids | GenBank accession no. | Reference |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | MTCC3383(T)* | 2.472 | 32.7 | 2,425 | 6 | 54 | Human | - | LILF00000000 | This study |
| 2 | JCSC1435 | 2.697 | 32.7 | 2,635 | 16 | 60 | Human | 3 | NC_007168 | Takeuchi et al. 2005 |
| 3 | R1P1 | 2.395 | 32.7 | 2,390 | 7 | 58 | Copper coin | 1 | AJVA00000000 | NCBI database |
| 4 | RIT283 | 2.527 | 32.7 | 2,502 | 9 | 49 | Salix (willow) | - | JFOJ00000000 | Gan et al. 2014 |
| 5 | 134634 | 2.393 | 32.6 | 2,389 | 5 | 54 | Human | 1 | CUGS00000000 | NCBI database |
| 6 | C10D | 2.440 | 32.6 | 2,355 | 8 | 58 | Human | - | JQHB00000000 | NCBI database |
| 7 | C10F | 2.458 | 32.6 | 2,372 | 8 | 58 | Human | - | JQHA00000000 | NCBI database |
| 8 | C10A | 2.430 | 32.6 | 2,425 | 5 | 57 | Human | 2 | JPRW00000000 | Chan et al. 2015 |
| 9 | 1HT3 | 2.403 | 32.8 | 2,318 | 6 | 64 | Human | - | LAKG00000000 | Nair et al. 2016 |
| 10 | 96671 | 2.473 | 32.7 | 2,471 | 7 | 55 | Human | 2 | CVRV00000000 | NCBI database |
| 11 | 1328_SHAE | 2.587 | 32.7 | 2,615 | 5 | 29 | Human | 1 | JVTM00000000 | Roach et al. 2015 |
| 12 | 1292_SHAE | 2.596 | 32.8 | 2,587 | 13 | 15 | Human | 1 | JVVE00000000 | Roach et al. 2015 |
| 13 | 164_SHAE | 2.423 | 32.6 | 2,385 | 4 | 17 | Human | 5 | JVRQ00000000 | Roach et al. 2015 |
| 14 | 235_SHAE | 2.484 | 32.7 | 2,456 | 4 | 20 | Human | 1 | JVPA00000000 | Roach et al. 2015 |
| 15 | 285_SHAE | 2.525 | 32.7 | 2,518 | 6 | 30 | Human | 1 | JVMX00000000 | Roach et al. 2015 |
| 16 | Sh29/312/L2 | 2.561 | 32.7 | 2,532 | 16 | 60 | Human | - | CP011116 | De Almeida et al. 2015 |
| 17 | SE2.14* | 2.336 | 32.4 | 2,301 | 5 | 57 | Rice seed | - | JRVR00000000 | This study |
| 18 | SE3.8* | 2.343 | 32.4 | 2,310 | 7 | 54 | Rice seed | - | JRVS00000000 | This study |
| 19 | SE3.9* | 2.342 | 32.5 | 2,325 | 7 | 59 | Rice seed | - | JRVT00000000 | This study |

(*) Strains sequenced in the present study

Table 2. Annotation of large dynamic regions (LDRs) in RESH, WESH, HMSH and COSH. GC content (in percentage) and size of LDRs are mentioned in brackets.

RESH
RE1, Total length (21001 bp), GC content (26.66 %): Replication initiator protein A; Replication-associated protein; Lipid A export ATP-binding/permease protein MsbA; Putative SAM-dependent methyltransferase; Epidermin leader peptide processing serine protease EPIP precursor; Lanthionine biosynthesis protein LanM; ATP-binding protein p271; ABC transporter ATP-binding protein; Membrane spanning protein; ATPase in DNA repair; Transcriptional regulator HxlR; ME proteins (MEP); hypothetical protein (6)
RE2, Total length (12001 bp), GC content (30.18 %): Bacteriophage terminase; spore coat protein; mobile element-associated protein; virulence-associated protein E; putative primase; antibiotic resistance island carrying fusB; excisionase; Phage protein; pathogenicity island protein (5); hypothetical protein (5)
RE3, Total length (10001 bp), GC content (28.8 %): PTS system; Ribose operon repressor; Ribokinase; Nitric oxide-dependent regulator DnrN or NorA; Replication protein Rep; Mobile element protein (2); Superfamily I DNA/RNA helicase protein (2); Type III restriction enzyme; hypothetical protein

WESH
WE1, Total length (40001 bp), GC content (34.13 %): Phage associated proteins (35); hypothetical protein (21)
WE2, Total length (15001 bp), GC content (30.17 %): UDP-glucose 4-epimerase (galE); Putative glycosyltransferase; Capsular polysaccharide synthesis enzymes Cap5K, Cap5J, Cap5I, Cap5H, Cap5G, Cap5F, Cap8E, Cap8M; UDP-N-acetylglucosamine 2-epimerase
WE3, Total length (12501 bp), GC content (32.78 %): Hypothetical homolog in superantigen-encoding pathogenicity islands SaPI (4); Putative primase; Phage proteins (5); hypothetical protein (5)
WE4, Total length (15001 bp), GC content (35.05 %): D-mannonate oxidoreductase; Mannonate dehydratase; Uronate isomerase; Glucuronide transporter (UidB); 2-dehydro-3-deoxygluconate kinase; 4-hydroxy-2-oxoglutarate aldolase; 2-dehydro-3-deoxyphosphogluconate aldolase; Hydrolase, haloacid dehalogenase-like family; Beta-hexosaminidase; Glycine betaine ABC transport system, ATP-binding protein; Glycine betaine, permease protein OpuAB, glycine betaine-binding protein OpuAC; hypothetical protein (2)
WE5, Total length (14001 bp), GC content (29.23 %); WE6, Total length (14001 bp), GC content (29.64 %); WE7, Total length (54001 bp), GC content (30.18 %); WE8, Total length (16001 bp), GC content (33.54 %):
Ferrichrome-binding periplasmic protein precursor; Pathogenicity island SaPIn1; superantigen-encoding pathogenicity islands (SaPI) (10); Transcriptional regulator; Phage protein (2)
Ferrichrome binding periplasmic protein precursor; Pathogenicity island SaPIn1; Putative terminase, SaPIs; homolog in SaPI (11); Putative DNA helicase, SaPI; Phage protein; hypothetical protein (5)
Acetyltransferase; transcriptional regulator (pksA); Resolvase/integrase; Aldehyde dehydrogenase B; Putative NADH-dependent flavin oxidoreductase; Transcriptional regulator (MarR); Transcriptional regulator (TetR); ABC transporter ecsA-like protein; hypothetical protein (9)
ATP-binding protein p271; Antiseptic resistance protein QacA; HTH-type transcriptional regulator QacR; macrolide 2'-phosphotransferase; Repressor CsoR of the copZA operon; Copper-translocating P-type ATPase; Copper(I) chaperone CopZ; Aminoglycoside N6'-acetyltransferase; replication initiator protein A; Replication-associated protein; PhnB protein; putative DNA binding 3-demethylubiquinone-9 3-methyltransferase domain protein; Transcriptional regulator (DeoR); putative primase; ccrB; Type I RM system, restriction subunit R; specificity subunit S; DNA methyltransferase subunit M; decarboxylase; amino acid permease family protein; Predicted tyrosine transporter, NhaC family; Tyrosyl-tRNA synthetase; His repressor; Universal stress protein family; Sulfate permease; resolvase; Resolvase/integrase Bin; ATP-binding protein p271; Cadmium efflux system accessory protein; ME (3); Hypothetical protein (17)
TPR domain in aerotolerance operon; bacteriocin ABC transporter, ATP-binding/permease protein; putative sensor histidine kinase; Osmosensitive K+ channel histidine kinase KdpD; Potassium-transporting ATPase A, B, C chain; ME protein (3); hypothetical protein (4)

HMSH
HM1, Total length (60000 bp), GC content (31.08 %): Osmosensitive K+ channel histidine kinase KdpD; Potassium-transporting ATPase A, B, C chain; Antiadhesin Pls; Poly(glycerol-phosphate) alpha-glucosyltransferase GftA; GftB: Glycosyl transferase; Enoyl-[acyl-carrier-protein] reductase; LysR-family regulatory protein; Transporter, MFS superfamily; PhnB protein; putative DNA binding 3-demethylubiquinone-9 3-methyltransferase domain protein; Transcriptional regulator, DeoR family; putative primase; Cassette chromosome recombinase B; Type I RM system (restriction subunit R; subunit S; DNA-methyltransferase subunit M); D-3-phosphoglycerate dehydrogenase; Major myo-inositol transporter IolT; Oligo-1,6-glucosidase; Predicted transcriptional regulators; MSM (multiple sugar metabolism) operon regulatory protein; Phage protein; Transcriptional regulator, TetR family; ABC transporter, ATP-binding protein; ABC-type multidrug transport system permease component; Glutamate synthase [NADPH] large chain; ME protein (6); hypothetical protein (21)
HM2, Total length (20001 bp), GC content (28.06 %): Glutamate synthase (2); ME protein; ATP-grasp enzyme-like protein; Spermidine N1-acetyltransferase; Undecaprenyl-phosphate galactosephosphotransferase; FMN-dependent NADH-azoreductase; Capsular polysaccharide synthesis Cap5A, Cap5F, Cap5G; Capsular polysaccharide synthesis Cap8C; Cap8D; Cap8E; Cap8L; Tyrosine-protein kinase EpsD; UDP-N-acetylglucosamine 2-epimerase; Mn-dependent protein-tyrosine phosphatase; Oligosaccharide repeat unit polymerase Wzy; O-antigen ligase and flippase Wzx; hypothetical protein (4)
HM3, Total length (48001 bp), GC content (32.88 %): Phage associated protein (37); Beta-lactamase repressor BlaI; Beta-lactamase regulatory sensor-transducer BlaR1; Beta-lactamase; hypothetical protein (18)
HM4, Total length (16001 bp), GC content (28.93 %): Transcriptional regulator, AraC family; Putative terminase, SaPI; homolog in SaPI (11); putative primase; transcriptional regulator; Integrase, SaPI; hypothetical protein (9)
HM5, Total length (14001 bp), GC content (28.90 %): Membrane spanning protein; ABC transporter ATP-binding protein; macrolide 2'-phosphotransferase; transcriptional regulator (pksA); DNA-invertase; ATP-binding protein p271; Transcriptional regulator (PBSX); Peptidase, M23/M37 family; hypothetical protein (12)
HM6, Total length (43001 bp), GC content (35.17 %): autolysin Atl; DNA replication protein DnaC; Pathogenesis-related transcriptional factor and ERF; Phage associated proteins (38); hypothetical protein (22)
HM7, Total length (12001 bp), GC content (29.33 %): ME protein; acetyltransferase, GNAT family; ATP-binding protein p271; Tn552 transposase; DNA-invertase; HTH-type transcriptional regulator pksA; Transcriptional regulator, TetR family; hypothetical protein (9)
HM8, Total length (15001 bp), GC content (34.76 %): D-mannonate oxidoreductase; Mannonate dehydratase; Uronate isomerase; Glucuronide transporter (UidB); 2-dehydro-3-deoxygluconate kinase; 4-hydroxy-2-oxoglutarate aldolase; Hydrolase (HAD family); Beta-hexosaminidase; Glycine betaine ABC transport system, ATP-binding protein OpuAA, permease protein OpuAB, glycine betaine-binding protein OpuAC; hypothetical protein (2)

COSH
CO1, Total length (29001 bp), GC content (33.27 %): Ribulokinase; Transcriptional regulator (TetR); 3-hydroxyacyl-CoA dehydrogenase; 3-hydroxybutyryl-CoA dehydratase; osmotically activated L-carnitine/choline ABC transporter, ATP-binding protein OpuCA, permease protein OpuCB, substrate-binding protein OpuCC, permease protein OpuCD; Tributyrin esterase; 2-dehydropantoate 2-reductase; Glycine betaine transporter OpuD; Ribosomal-protein-L7p-serine acetyltransferase; Glycine betaine ABC transport system, ATP-binding protein OpuAA; Glycine betaine ABC transport system, permease protein OpuAB (2); Beta-hexosaminidase; Hydrolase (HAD-like family); 4-hydroxy-2-oxoglutarate aldolase; 2-dehydro-3-deoxygluconate kinase; Glucuronide transporter UidB; Uronate isomerase; Mannonate dehydratase; D-mannonate oxidoreductase; hypothetical protein (4)
CO2, Total length (13001 bp), GC content (30.92 %): Ferrichrome-binding periplasmic protein precursor; Pathogenicity island SaPIn1; Putative terminase, SaPI; homolog in SaPI (11); Putative DNA helicase, SaPIs; Phage protein; hypothetical protein (5)
CO3, Total length (50001 bp), GC content (31.94 %): Replication initiator protein A; OriT nickase Nes; Aminoglycoside N6'-acetyltransferase; Putative secreted protein; Copper(I) chaperone CopZ; Copper-translocating P-type ATPase; Repressor CsoR of the copZA operon; Cobalt-zinc-cadmium resistance protein CzcD; ATP-binding protein p271; transcriptional regulator QacR; Antiseptic resistance protein QacA; sensor histidine kinase; Osmosensitive K+ channel histidine kinase KdpD; Potassium-transporting ATPase A, B and C; Antiadhesin Pls, binding to squamous nasal epithelial cells; transcriptional regulator; thiJ/pfpI family protein; oxidoreductase ylbE; Putative oxidoreductase YncB; Cadmium resistance protein; Cadmium efflux system accessory protein; Arsenate reductase; Arsenic efflux pump protein; Arsenical resistance operon repressor; Copper-translocating P-type ATPase; Protein export cytoplasm protein SecA ATPase RNA helicase; Type I RM system, restriction subunit R; Proline/sodium symporter PutP; probable monooxygenase; Ferrous iron transport protein B; ME protein (5); hypothetical protein (20)
CO4, Total length (25001 bp), GC content (29.92 %): Phosphate regulon PhoB (SphR); Phosphate regulon sensor protein PhoR (SphS); Cobalt-zinc-cadmium resistance protein; ThiJ/PfpI family protein; D-arabino-3-hexulose 6-phosphate formaldehyde-lyase; Transcriptional regulator (HxlR); 3-ketoacyl-CoA thiolase; Long-chain-fatty-acid-CoA ligase; 6-phospho-3-hexuloisomerase; D-arabino-3-hexulose 6-phosphate formaldehyde-lyase; LmbE family protein; GTP cyclohydrolase I; homolog within ESAT-6 gene cluster; immunodominant antigen B; Malate:quinone oxidoreductase; ATP-binding protein p271; poly(glycerol-phosphate) alpha-glucosyltransferase; Poly(glycerol-phosphate) alpha-glucosyltransferase; ME protein (3); hypothetical protein (8)

Table 3. Deduced peptides and proteins derived from the novel type II lantibiotic gene cluster in RESH strains, ORFs, their size, putative function, GC content (%) and sequence homology.

| S. No. | ORF | GC content of ORF (%) | Size of putative protein (aa) | Putative function | Sequence homolog [GenBank accession] | a.a. Identities (%) |
|---|---|---|---|---|---|---|
| 1. | ME | 28.65 | 377 | Insertion Sequence | IS1181, Insertion Sequence [AKC75357.1] | 83% |
| 2. | LanT | 27.05 | 572 | Transportation/Secretion | ABC transporter (ATP-binding protein) [Staphylococcus simulans], [WP_061855333.1] | 57% |
| 3. | MT | 27.67 | 270 | Methylation | Putative SAM-dependent methyltransferase, [CDG24502.1] | 38% |
| 4. | LanP | 25.17 | 420 | Leader cleavage | Peptidase/lantibiotic leader peptide processing serine protease, [WP_042597676.1] | 35% |
| 5. | LanM | 23.25 | 968 | Dehydration & cyclization | Bacteriocin formation protein, [WP_000875568.1] | 34% |
| 6. | LanA1 | 30.76 | 64 | Bacteriocin | Bacteriocin [Bacillus cereus], [WP_033669113.1] | 46% |
| 7. | LanA2 | 36.84 | 75 | Bacteriocin | Bacteriocin [Bacillus sp.], [WP_029953015.1] | 35% |
| 8. | X | 22.56 | 324 | Unknown | hypothetical protein [Staphylococcus massiliensis], [WP_017176848.1] | 30% |
| 9. | ME | 31.03 | 202 | Transposition | transposase [Staphylococcus epidermidis], [WP_002494747.1] | 97% |
| 10. | LanI | 29.45 | 231 | Immunity | Lantibiotic ABC transporter, ATP-binding protein [Staphylococcus aureus], [WP_048665542.1] | 98% |
| 11. | LanE | 25.16 | 256 | Immunity | Membrane spanning protein/permease component [Staphylococcus hominis], [CAA83064.1] | 89% |
| 12. | LanF | 27.26 | 873 | Immunity | ATPase [Staphylococcus cohnii subsp. cohnii], [KKI62970.1] | 85% |
| 13. | LanR | 31.05 | 116 | Regulation | HxlR family transcriptional regulator [Staphylococcus haemolyticus], [WP_037548185.1] | 97% |

Highlights

• First genome-wide analysis of Staphylococcus haemolyticus isolated from rice.

• Investigation of ecological variants allowed us to obtain insights into the pattern of genomic flux in the diversification of members belonging to Staphylococcus haemolyticus.

• This study provides novel insights into the evolution of Staphylococcus haemolyticus lineages and into genes that may be important for its success in non-human niches and for its well-known lifestyle as a human-associated bacterium and opportunistic human pathogen.

• Knowledge of genome dynamics and potential is valuable for understanding and managing this species.

Authors' contributions

V.C. and P.B.P. conceived and designed the study. V.C. collected samples, isolated and characterized endophytes, and performed genome sequencing, comparative genomics and bioinformatics analysis. V.C. and P.B.P. wrote the paper.