Whole genome sequence-based serogrouping of Listeria monocytogenes isolates

Accepted Manuscript

Title: Whole genome sequence-based serogrouping of Listeria monocytogenes isolates

Author: Patrick Hyden, Ariane Pietzka, Anna Lennkh, Andrea Murer, Burkhard Springer, Marion Blaschitz, Alexander Indra, Steliana Huhulescu, Franz Allerberger, Werner Ruppitsch, Christoph W. Sensen

PII: S0168-1656(16)31348-7
DOI: http://dx.doi.org/doi:10.1016/j.jbiotec.2016.06.005
Reference: BIOTEC 7583

To appear in: Journal of Biotechnology

Received date: 16-2-2016
Revised date: 3-6-2016
Accepted date: 7-6-2016

Please cite this article as: Hyden, Patrick, Pietzka, Ariane, Lennkh, Anna, Murer, Andrea, Springer, Burkhard, Blaschitz, Marion, Indra, Alexander, Huhulescu, Steliana, Allerberger, Franz, Ruppitsch, Werner, Sensen, Christoph W., Whole genome sequence-based serogrouping of Listeria monocytogenes isolates. Journal of Biotechnology http://dx.doi.org/10.1016/j.jbiotec.2016.06.005

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Patrick Hyden1, Ariane Pietzka2, Anna Lennkh2, Andrea Murer2, Burkhard Springer2, Marion Blaschitz2, Alexander Indra2, Steliana Huhulescu2, Franz Allerberger2, Werner Ruppitsch2,3, Christoph W. Sensen1

1 Institute of Molecular Biotechnology, Graz University of Technology, Graz, Austria

2 Institute of Medical Microbiology and Hygiene, Austrian Agency for Health and Food Safety, Graz, Austria

3 Department of Biotechnology, University of Natural Resources and Life Sciences, Vienna, Austria

Corresponding author: Dr. Ariane Pietzka Austrian Agency for Health and Food Safety Institute of Medical Microbiology and Hygiene National Reference Laboratory for Listeria Beethovenstr. 6, A-8010 Graz Phone: +43 (0) 50555-61269 e-Mail: [email protected]


Highlights:
• Serogroup prediction of Listeria monocytogenes by whole genome sequencing is described for all twelve serotypes
• Determination of serotypes is carried out by extraction of specific sequence targets from the genome
• The application of whole genome sequencing as a single method shows the potential to replace most of the currently used molecular methods

Abstract

Whole genome sequencing (WGS) is becoming the method of choice for the characterization of Listeria monocytogenes isolates in national reference laboratories (NRLs). WGS is superior with regard to accuracy, resolution and analysis speed to several other methods, including serotyping, PCR, pulsed-field gel electrophoresis (PFGE), multilocus sequence typing (MLST), multilocus variable number tandem repeat analysis (MLVA), and multi-virulence-locus sequence typing (MVLST). These methods have been used thus far for the characterization of bacterial isolates, and are still important tools in reference laboratories today, to control and prevent listeriosis, one of the major foodborne diseases affecting humans. Backward compatibility of WGS with former methods can be maintained by extracting the respective information from WGS data. Serotyping was the first subtyping method for L. monocytogenes and is capable of differentiating 12 serovars; national reference laboratories still perform serotyping and PCR-based serogrouping as a first-level classification method for L. monocytogenes surveillance. Whole genome sequence-based core genome MLST analysis of an L. monocytogenes collection comprising 172 isolates spanning all 12 serotypes was performed for serogroup determination. These isolates clustered according to their serotypes and could be grouped into the IIa, IIc, IVb or IIb clusters, which were generated by minimum spanning tree (MST) and neighbor-joining (NJ) tree analysis, demonstrating the power of the new approach.


Keywords: Listeria monocytogenes; serotyping; serogrouping; serogroup prediction; whole genome sequencing; core genome MLST

1. Introduction

Listeria monocytogenes is the causative agent of listeriosis, one of the major food-borne diseases affecting humans. It is a facultative intracellular pathogen of humans and animals and is widespread in the environment (Pietzka et al., 2011; Allerberger et al., 2015). Listeriosis is characterized by symptoms such as gastroenteritis, encephalitis, meningitis, and septicemia. Typically, pregnant women, the elderly, and immunocompromised people are affected. The high case-fatality rate of 20% to 30% makes L. monocytogenes a leading cause of food-borne human mortality (Nyarko & Donnelly, 2015). L. monocytogenes is a ubiquitous microorganism with the ability to survive in a variety of food sources (for example cheese), to grow at low temperatures (for example on cooled shelves in stores), to survive even freezing and high-salt conditions, and to withstand nitrite preservation methods. The ability to form biofilms on food contact surfaces (Allerberger & Wagner, 2010; Jordan et al., 2008) can facilitate persistence, dissemination and food contamination at several stages of food production (Allerberger et al., 2015). Nearly all sporadic and epidemic human listeriosis cases are linked to the consumption or use of contaminated food or feed (Schlech et al., 1983; Allerberger et al., 2015). During the investigation of a listeriosis outbreak, rapid and accurate subtyping methods are essential for the identification and subsequent elimination of the source of the contaminated food (Pichler et al., 2011).

Serotyping of L. monocytogenes is based on somatic (O) and flagellar (H) antigens and was the first L. monocytogenes subtyping scheme. This approach was developed at the end of the 1970s (Seeliger & Höhne, 1979) and allows the differentiation of 12 serotypes. Phylogenetic investigations revealed that the species L. monocytogenes consists of four genetic lineages, Lineages I-IV, each comprising specific serotypes (lineage I: serotypes 1/2b, 3b, 4b, 4d, 4e, and 7; lineage II: serotypes 1/2a, 1/2c, 3a, and 3c; lineage III: serotypes 4b, 1/2a, 4a and 4c; lineage IV: serotypes 4a and 4c) (Haase et al., 2014). About 96% of all reported human listeriosis cases are caused by isolates of Lineages I and II (serotypes 4b, 1/2a and 1/2b) (Kasper et al., 2009; Doumith et al., 2004; Seeliger & Höhne, 1979).


For practical and traditional purposes, serotyping is still the first-level response subtyping method in public health laboratories, despite its limited value for tracking isolates (Nyarko & Donnelly, 2015; Doumith et al., 2004), which is due to the low discriminatory power of the method, its insufficient reproducibility and antigen sharing between serotypes (Schönberg et al., 1996). Testing with antisera sets is time-consuming and demanding. As part of internal quality assurance, laboratories that use diagnostic sera/antisera have to check them on a regular basis (Rili-BÄK, part B3; valid in Germany since 1.5.2015). Above all, the production of the sera requires the use of vertebrate animals, and to the best of our knowledge there is currently only one manufacturer of Listeria sera/antisera worldwide (Denka Seiken, Japan). To circumvent these serotyping limitations, a five-plex PCR assay targeting the genes lmo0737, lmo1118, ORF2819, ORF2110, and prs was developed for the molecular separation of the four major serotypes (1/2a, 1/2b, 1/2c, and 4b) (Doumith et al., 2004). With the recent evolution of whole-genome sequencing technologies, high-resolution typing schemes have been developed for the characterization of L. monocytogenes strains (Ruppitsch et al., 2015a; Pightling et al., 2015; Kwong et al., 2016; Maury et al., 2016). Typing of L. monocytogenes based on the developed core genome (cg) MLST scheme represents an expansion of the classical seven-gene MLST scheme (Salcedo et al., 2003) and is superior in all aspects for tracking and source identification compared to the current gold standard methods PFGE and fAFLP (Schmid et al., 2014; Ruppitsch et al., 2015b).
Access to the genomic sequence not only allows strain characterization at a very high resolution, it also facilitates the rapid extraction of specific sequence data, thus making sequence-based serotyping quite promising as the new gold standard for the rapid and accurate characterization of Listeria strains (Kwong et al., 2016). In our study, we applied next-generation sequencing (NGS) based core genome MLST (cgMLST) minimum spanning tree (MST) analysis to a collection of 172 L. monocytogenes isolates from the Austrian National Reference Laboratory for Listeria (NRLL), including isolates from the Seeliger collection (Haase et al., 2011) as well as type strains for all known serotypes (Ruppitsch et al., 2015), and in addition to 45 isolates from the NRLL with previously uncharacterized serotypes (i.e. a total of 217 isolates), to determine the serogroup of all isolates via cgMLST profiles. Isolates with known serotypes and lineages were selected to cover the entire genomic diversity of the species as described previously (Ruppitsch et al., 2015). In addition to the known multiplex-PCR targets (Doumith et al., 2004; Kwong et al., 2016), further serogroup-specific determinants were identified in this work to improve the usability and robustness of the combined speciation and serotyping from WGS data in a single workflow.

2. Material and Methods

2.1 Bacterial isolates and DNA purification

Complete genomes of four isolates were downloaded from NCBI (Table 1). The 168 further isolates used for the assignment of L. monocytogenes isolates to serogroups by cgMLST, using whole genome sequencing (WGS), are listed in Supplement Table 1. They originated from Austria (n=92), Germany (n=48), USA (n=6), Canada (n=5), France (n=4), United Kingdom (n=3), Denmark (n=3), New Zealand (n=1) and unknown sources (n=6). They include 15 reference strains and cover the twelve serotypes (1/2a n=30, 1/2b n=21, 1/2c n=4, 3a n=1, 3b n=2, 3c n=1, 4a n=1, 4b n=42, 4c n=2, 4d n=1, 4e n=1, 7 n=1; serotype unknown n=61). In addition, a set of 45 isolates (Supplement Table 2) from the NRLL, each without previous serogroup characterization, was used to evaluate the strategies to assign serogroups by cgMLST and WGS (test set). All strains were cultured overnight at 37°C on RAPID'L.Mono agar (Bio-Rad, Vienna, Austria) for species confirmation and subcultured on Columbia blood agar plates (BioMérieux, Marcy l'Étoile, France) prior to DNA extraction using the MagAttract HMW DNA Kit, according to the instructions of the manufacturer (Qiagen, Hilden, Germany).

Isolates of Seeliger's historical "Special Listeria Culture Collection" (Haase et al., 2011), deposited at the German-Austrian binational consiliary laboratory for Listeria (AGES Graz, Austria), were reconstituted from the original agar slants by adding Trypticase Soy Broth (BioMérieux, Marcy l'Étoile, France). These included the oldest available Listeria strain (SLCC208), isolated from a soldier during the First World War (Hyden et al., 2015; A. Leclercq, personal communication). Subsequently, the strains were subcultured on Columbia blood agar plates (BioMérieux, Marcy l'Étoile, France), plated on RAPID'L.Mono plates (Bio-Rad, Vienna, Austria) for species verification, and finally grown overnight on Columbia blood agar plates (BioMérieux, Marcy l'Étoile, France) for the isolation of genomic DNA, using the MagAttract HMW DNA Kit (Qiagen, Hilden, Germany) according to the instructions of the manufacturer. Molecular serotyping was performed for the confirmation of serogroups, as described previously (Doumith et al., 2004).

2.2 Whole genome sequencing, assembly and data analysis

Sequencing libraries were prepared using Nextera XT chemistry (Illumina Inc., San Diego, CA, USA) for a 2 x 300 bp sequencing run on an Illumina MiSeq sequencer. Samples were sequenced with the aim of a minimum coverage of 70-fold by preparing a library of 72 L. monocytogenes genomes for each run. The resulting FASTQ files were first quality-trimmed: sequence reads were trimmed at their 5'- and 3'-ends until an average PHRED value of 30 was reached in a window of 20 bases. The trimmed reads were then de novo assembled using the Velvet assembler version 1.1.08 (Zerbino & Birney, 2008), with the k-mer values and coverage cutoffs automatically optimized for each genome, based on the average length of contigs with > 1000 bp. Contigs with an overall length of less than 200 bp or an average coverage below five were discarded. The assemblies were subsequently integrated into the Ridom SeqSphere+ software (Ruppitsch et al., 2015a) version 3.1 (Ridom GmbH, Münster, Germany). All raw reads generated were submitted to the European Nucleotide Archive (http://www.ebi.ac.uk/ena/) under the study accession number PRJEB12548. Assembled genomes were compared by a recently developed core genome MLST scheme using SeqSphere+ as described previously (Ruppitsch et al., 2015a). The minimum spanning tree was visualized in SeqSphere+ and colored in Inkscape v 0.91. The neighbor-joining tree was exported from SeqSphere+ in Newick format, visualized on itol.embl.de (Letunic and Bork, 2016) and adjusted in Inkscape. For serogroup determination, gene targets as described by Doumith et al. (2004) were used to create a serogroup task template in SeqSphere+. Sequences of the loci lmo0737 and lmo1118 from reference strain EGD-e (Accession No.: NC_003210) and "ORF2110" (now labeled as Lm4b_01887), "ORF2819" (Lm4b_02048) and prs (Lm4b_00196) from 4b strain Clip80459 (Accession No.: FM242711) served as references.
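
The trimming and contig-filtering thresholds stated above can be illustrated with a minimal Python sketch. It is not the SeqSphere+ or Velvet implementation: the function names `trim_read` and `keep_contig` and the exact sliding-window rule are assumptions made for illustration, and only the parameters come from the text (window of 20 bases, average PHRED of 30, minimum contig length of 200 bp, minimum average coverage of five).

```python
def trim_read(quals, window=20, min_avg=30):
    """Return (start, end) slice bounds of a read after end trimming.

    Bases are dropped from the 5' end, then from the 3' end, until the
    outermost window of `window` bases has an average PHRED >= min_avg.
    Assumption: reads without any passing window are discarded, i.e. (0, 0).
    """
    n = len(quals)
    start = 0
    # Slide inwards from the 5' end.
    while start + window <= n and sum(quals[start:start + window]) / window < min_avg:
        start += 1
    if start + window > n:
        return (0, 0)
    end = n
    # Slide inwards from the 3' end; stops at the latest on the 5' window.
    while sum(quals[end - window:end]) / window < min_avg:
        end -= 1
    return (start, end)


def keep_contig(length_bp, avg_coverage, min_length=200, min_coverage=5):
    """Contigs shorter than 200 bp or with average coverage below five are discarded."""
    return length_bp >= min_length and avg_coverage >= min_coverage


# Hypothetical read: low-quality ends around a high-quality core.
quals = [10] * 5 + [35] * 30 + [10] * 5
print(trim_read(quals))
```

Because the criterion is a window average, a few low-quality bases can remain at each end as long as the surrounding window still averages 30.
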
A Python script was written to extract the serogroup information from the returned allelic profiles. The script uses a simple scoring mechanism, comparing the gene profile (gene found or gene not found) to all known profiles. Each matching gene earns a reward of 1; a serogroup that reaches the full score of five is returned, otherwise the serogroup is reported as undetermined. Classical multilocus sequence types (MLST) were extracted from the draft genomes using SeqSphere+. Serogroup-specific genes were identified from the cgMLST and accessory MLST allelic profiles of 172 genomes. Genes that revealed specific variants for at least one serogroup were selected. The allelic profiles of the test samples were tested against these groups of genes, using a second Python script, which was created to predict the respective serogroup. This script is based on the serogroup-PCR script, with the additional possibility to use wildcards (* and +) and penalties for mismatches. The serogroup with the highest score was returned if the overall score was greater than 0. Both Python scripts are available via GitHub (https://github.com/phyden/serogroup_cgmlst).
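
The five-gene scoring described above can be sketched as follows. This is an illustrative reimplementation, not the authors' script from the GitHub repository: the names `GENES`, `PROFILES`, `profile_score` and `predict_pcr_serogroup` are hypothetical, and the presence/absence profiles are an assumption based on the multiplex-PCR scheme of Doumith et al. (2004).

```python
# The five multiplex-PCR targets named in the text.
GENES = ("prs", "lmo0737", "lmo1118", "ORF2819", "ORF2110")

# Assumed presence profiles per serogroup (after Doumith et al., 2004);
# genes not listed for a serogroup are expected to be absent.
PROFILES = {
    "IIa": {"prs", "lmo0737"},
    "IIb": {"prs", "ORF2819"},
    "IIc": {"prs", "lmo0737", "lmo1118"},
    "IVb": {"prs", "ORF2819", "ORF2110"},
}


def profile_score(found, expected):
    """Reward of 1 for every gene whose found/not-found state matches the profile."""
    return sum(1 for gene in GENES if (gene in found) == (gene in expected))


def predict_pcr_serogroup(found):
    """Return the serogroup whose profile reaches the full score of five."""
    found = set(found)
    for serogroup, expected in PROFILES.items():
        if profile_score(found, expected) == len(GENES):
            return serogroup
    return "undetermined"


print(predict_pcr_serogroup({"prs", "lmo0737", "lmo1118"}))
print(predict_pcr_serogroup({"prs"}))
```

Requiring the full score means that a single target missing from a draft assembly already yields "undetermined", which is the weakness the extended script with wildcards and penalties is meant to address.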

3. Results

The isolates used to identify serogroup-specific genes had a median sequence coverage of 70-fold, with a minimum of 23-fold and a maximum of 165-fold. The 45 samples used for evaluation were sequenced with a median coverage of 49-fold, a minimum of 36-fold and a maximum of 83-fold.

A minimum spanning tree comprising 217 L. monocytogenes isolates, which included representatives of all 12 serotypes, was calculated based on cgMLST (Figure). Serogroup-specific MLST types, as well as cluster types (CT) based on cgMLST (Table 2), were identified. Serogroup IIa isolates were assignable to 19 different clonal complexes or STs and to 37 different CTs, serogroup IIb isolates to 12 STs and 28 CTs, serogroup IIc isolates to 3 STs and 10 CTs, and serogroup IVb isolates to 18 STs and 49 CTs (Table 2). Serogroup-specific targets were selected by observing the sequence types of all loci in the cgMLST and the accessory genome MLST. In total, 35 targets were identified, each revealing a certain specificity for different serogroups (Table 3). Serotype 1/2a strain EGD-e clustered together with serogroup IIc isolates. EGD-e-specific targets identical to serogroup IIa isolates, but different from IVb, IIb, and IIc isolates, were lmo0164, lmo0266, lmo0572, lmo1395, lmo1720 and lmo2059.

For evaluation purposes, the set of 45 isolates from the NRLL with previously uncharacterized serotypes, whose serogroups were determined experimentally after whole genome sequencing (Supplement Table 2), was used to predict the respective serogroups from the obtained cgMLST allelic profiles. Forty-one isolates were correctly assigned to the respective serogroups, while four isolates were wrongly classified: three of them (MRL-15/01411, MRL-15/01413, MRL-15/01448) were assigned to group IIc instead of IIa and one (MRL-15/01419) was incorrectly assigned to group IVb instead of IIb. These results were further evaluated by sequence typing of the genes used for the serogroup PCR in all 217 retrieved draft assemblies. All but 13 of the isolates, i.e. 204 isolates, could be accurately assigned to the serogroups. Nine of these 13 isolates were part of the test set of strains with a previously uncharacterized serogroup.
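
The proportions behind these counts can be checked with a few lines of Python. The helper `percent` is hypothetical; the counts are the ones reported above, and the total of 217 is taken as 172 + 45 isolates.

```python
def percent(part, total):
    """Share of `part` in `total`, in percent, rounded to one decimal."""
    return round(100 * part / total, 1)


total_isolates = 172 + 45  # training collection plus test set

print(percent(41, 45))               # test-set isolates correctly assigned via cgMLST
print(percent(4, 45))                # test-set isolates wrongly assigned via cgMLST
print(percent(204, total_isolates))  # isolates assigned via the serogroup-PCR genes
print(percent(13, total_isolates))   # isolates not assigned via the serogroup-PCR genes
```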

4. Discussion

Prior to the use of high-throughput genome-sequencing methods, diagnostic procedures for L. monocytogenes strain characterization involved classical serotyping, phage typing, PCR-based methods and PFGE. WGS outperforms all former methods in all respects. As a consequence, outbreak investigation can be improved considerably and source identification is significantly enhanced (Schmid et al., 2014; Ruppitsch et al., 2015b; Kwong et al., 2016), which is beneficial to the food industry and may help to improve consumer safety. With the evolution of sequencing technologies and the development of data analysis pipelines usable by public health laboratories, a single method has become available that has the potential to replace many of the methods used thus far (Ruppitsch et al., 2015a). A standardized cgMLST-based typing scheme is superior in many respects to SNP-based or whole genome (wg) MLST-based typing (Ruppitsch et al., 2015). The defined cgMLST scheme provides similar results, allows the global exchange of data and is easier to standardize than mapping followed by SNP calling (Hyden et al., 2016) or wgMLST. Nevertheless, to obtain higher resolution, outbreak- or lineage-specific SNP or allele schemes can be used in addition to standardized schemes.

Serogroup determination based on cgMLST allelic profiles worked remarkably well for 204 isolates tested in our study, allowing a determination of the serogroup for more than 90% of the strains, while determination based on the genes used in the PCR failed for 13 of the 217 isolates (6.0%). In isolate SLCC4771, which is the only Lineage IV and serogroup 4c isolate, a serogroup IIb-specific gene was detected. In comparison to the original PCR approach, the analysis scheme was based on 35 genes instead of the originally used five, which made it more robust against genes that are absent from the dataset because incomplete draft assemblies were used for each of the Listeria strains studied. All genes that showed specific types for a respective serogroup were also found in the reference strain EGD-e (Bécavin et al., 2014), whose genome sequence formed the basis for the compilation of the core genome as well as the accessory genome MLST targets (Ruppitsch et al., 2015a). This implies that a large number of serotype- or serogroup-specific genes are not part of the cgMLST scheme today. Still, the current cgMLST allelic profiles could be used to determine the serogroup of samples with previously uncharacterized serogroup membership in 41 out of 45 cases (over 90%).

The four isolates that were wrongly classified show the current limitations of the scheme developed in this study. In particular, samples with a large genomic distance to the closest relative of their serogroup (e.g. MRL-15/01419) were difficult to classify by this approach. The imperfect delineation between isolates of serogroups IIc and IIa on a genomic basis led to false classifications in three cases; however, it led to an even larger number of false classifications (seven out of 45, i.e. 15.6%) when using the serogroup PCR method, as reported in this study. Outlier isolates, i.e. those that do not belong to a major serogroup, such as serogroups 4a and 4c, are often known to lack some of the genes that are used as markers for the more prominent serogroups. While this might also occur randomly for samples with low-quality assemblies, the depth of sequence coverage could be adjusted in determination experiments to counter this effect. We plan to work further on the collection of marker genes to improve the resolving power of cgMLST, especially for the outliers.

We believe that the shift in the characterization of isolates during Listeria outbreaks from more classical molecular biological techniques to high-throughput DNA sequencing shows enough potential to assume that other kinds of microbial threats might soon be monitored and characterized in a similar fashion on a routine basis.

5. Conclusions

Whole genome sequencing is the ultimate method for the characterization of bacterial isolates, as it provides the highest possible resolution in strain typing (i.e. the DNA sequence level) and represents a new paradigm for outbreak investigation and contamination-source tracking. An additional benefit of NGS is the opportunity to extract specific information, such as classical MLST profiles for backward data comparability, the virulence and antibiotic resistance status, and the assignment to serogroups as first-level information (as shown here). Some additional work is still necessary and ongoing to elucidate serotype-specific targets for direct WGS-based serotyping and for the routine characterization of Listeria outbreaks using this technique in reference laboratories worldwide in the near future.


6. References

Allerberger, F., Wagner, M., 2010. Listeriosis: a resurgent infection. Clin. Microbiol. Infect. 16, 16-23.

Allerberger, F., Bago, Z., Huhulescu, S., Pietzka, A., 2015. Listeriosis: The dark side of refrigeration and ensiling, in: Sing, A. (Ed.), Zoonoses: Infections affecting humans and animals – Focus on public health aspects. Springer Verlag, Heidelberg, pp. 249-286. ISBN: 978-94-017-9456-5 (Print) 978-94-017-9457-2 (Online).

Bécavin, C., Bouchier, C., Lechat, P., Archambaud, C., Creno, S., Gouin, E., Wu, Z., Kühbacher, A., Brisse, S., Pucciarelli, G., García-del Portillo, F., Hain, T., Portnoy, D., Chakraborty, T., Lecuit, M., Pizarro-Cerdá, J., Moszer, I., Bierne, H., 2014. Comparison of widely used Listeria monocytogenes strains EGD, 10403S, and EGD-e highlights genomic differences underlying variations in pathogenicity. mBio 5(2):e00969-14. doi:10.1128/mBio.00969-14.

Doumith, M., Buchrieser, C., Glaser, P., Jacquet, C., Martin, P., 2004. Differentiation of the major Listeria monocytogenes serovars by multiplex PCR. J. Clin. Microbiol. 42, 3819-3822.

Fretz, R., Pichler, J., Sagel, U., Much, P., Ruppitsch, W., Pietzka, A.T., Stöger, A., Huhulescu, S., Heuberger, S., Appl, G., Werber, D., Stark, K., Prager, R., Flieger, A., Karpísková, R., Pfaff, G., Allerberger, F., 2010. Update: Multinational listeriosis outbreak due to "Quargel", a sour milk curd cheese, caused by two different L. monocytogenes serotype 1/2a strains, 2009-2010. Euro Surveill 15:pii=19543.

Haase, J.K., Murphy, R.A., Choudhury, K.R., Achtman, M., 2011. Revival of Seeliger's historical 'Special Listeria Culture Collection'. Environ Microbiol. 13(12):3163-71. doi:10.1111/j.1462-2920.2011.02610.x.

Haase, J.K., Didelot, X., Lecuit, M., Korkeala, H., L. monocytogenes MLST Study Group, Achtman, M., 2014. The ubiquitous nature of Listeria monocytogenes clones: a large-scale multilocus sequence typing study. Environ Microbiol. 16:405–416. doi:10.1111/1462-2920.12342.

Hyden, P., Pietzka, A., Allerberger, F., Springer, B., Sensen, C., Ruppitsch, W., 2015. Draft genome sequence of a 94-year old Listeria monocytogenes isolate, SLCC208. Genome Announc 4(1):e01572-15. doi:10.1128/genomeA.01572-15.

Jordan, S.J., Perni, S., Glenn, S., Fernandes, I., Barbosa, M., Sol, M., Tenreiro, R.P., Chambel, L., Barata, B., Zilhao, I., Aldsworth, T.G., Adriao, A., Faleiro, M.L., Shama, G., Andrew, P.W., 2008. Listeria monocytogenes biofilm-associated protein (BapL) may contribute to surface attachment of L. monocytogenes but is absent from many field isolates. Appl. Environ. Microbiol. 74, 5451-5456.

Kasper, S., Huhulescu, S., Auer, B., Heller, I., Karner, F., Würzner, R., Wagner, M., Allerberger, F., 2009. Epidemiology of listeriosis in Austria. Wien Klin Wochenschr. 121, 113-9. doi:10.1007/s00508-008-1130-2.

Kwong, J.C., Mercoulia, K., Tomita, T., Easton, M., Li, H.Y., Bulach, D., Stinear, T.P., Seemann, T., Howden, B.P., 2016. Prospective whole-genome sequencing enhances national surveillance of Listeria monocytogenes. J Clin Microbiol 54, 333-342. doi:10.1128/JCM.02344-15.

Letunic, I., Bork, P., 2016. Interactive Tree of Life (iTOL) v3: An Online Tool for the Display and Annotation of Phylogenetic and Other Trees. Nucleic Acids Res. pii:gkw290. doi:10.1093/nar/gkw290.

Maury, M.M., Tsai, Y.H., Charlier, C., Touchon, M., Chenal-Francisque, V., Leclercq, A., Criscuolo, A., Gaultier, C., Roussel, S., Brisabois, A., Disson, O., Rocha, E.P., Brisse, S., Lecuit, M., 2016. Uncovering Listeria monocytogenes hypervirulence by harnessing its biodiversity. Nat Genet 48:308-13. doi:10.1038/ng.3501.

Nyarko, E.B., Donnelly, C.W., 2015. Listeria monocytogenes: Strain Heterogeneity, Methods, and Challenges of Subtyping. J. Food Sci. 80, M2668-2678.

Pichler, J., Appl, G., Pietzka, A., Allerberger, F., 2011. Lessons to be learned from an outbreak of foodborne listeriosis, Austria 2009-2010. Food Prot Trends 31, 268-273.

Pietzka, A.T., Stöger, A., Huhulescu, S., Allerberger, F., Ruppitsch, W., 2011. Gene Scanning of an Internalin B Gene Fragment Using High-Resolution Melting Curve Analysis as a Tool for Rapid Typing of Listeria monocytogenes. J Mol Diagn 13:57-63.

Pightling, A.W., Petronella, N., Pagotto, F., 2015. The Listeria monocytogenes Core-Genome Sequence Typer (LmCGST): a bioinformatic pipeline for molecular characterization with next-generation sequence data. BMC Microbiology 15:224. doi:10.1186/s12866-015-0526-1.

Ruppitsch, W., Pietzka, A., Prior, K., Bletz, S., Fernandez, H.L., Allerberger, F., Harmsen, D., Mellmann, A., 2015a. Defining and Evaluating a Core Genome Multilocus Sequence Typing Scheme for Whole-Genome Sequence-Based Typing of Listeria monocytogenes. J Clin Microbiol 53:2869-76.

Ruppitsch, W., Prager, R., Halbedel, S., Hyden, P., Pietzka, A., Huhulescu, S., Lohr, D., Schönberger, K., Aichinger, E., Hauri, A., Stark, K., Vygen, S., Tietze, E., Allerberger, F., Wilking, H., 2015b. Ongoing outbreak of invasive listeriosis, Germany, 2012 to 2015. Euro Surveill 20:doi:10.2807/1560-7917.

Salcedo, C., Arreaza, L., Alcalá, B., de la Fuente, L., Vázquez, J.A., 2003. Development of a multilocus sequence typing method for analysis of Listeria monocytogenes clones. J Clin Microbiol 41:757-762.

Schmid, D., Allerberger, F., Huhulescu, S., et al., Pietzka, A., Amar, C., Kleta, S., Prager, R., Preusel, K., Aichinger, E., Mellmann, A., 2014. Whole genome sequencing as a tool to investigate a cluster of seven cases of listeriosis in Austria and Germany, 2011 - 2013. CMI. doi:10.1111/1469-0691.12638.

Schlech, W.F. 3rd, Lavigne, P.M., Bortolussi, R.A., Allen, A.C., Haldane, E.V., Wort, A.J., Hightower, A.W., Johnson, S.E., King, S.H., Nicholls, E.S., Broome, C.V., 1983. Epidemic listeriosis-evidence for transmission by food. N. Engl. J. Med. 308, 203-206.

Seeliger, H.P.R., Höhne, K., 1979. Serotyping of Listeria monocytogenes and related species. Methods Microbiol. 13, 31-49.

Schönberg, A., Bannerman, E., Courtieu, A.L., Kiss, R., McLauchlin, J., Shah, S., Wilhelms, D., 1996. Serotyping of 80 strains from the WHO multicenter international typing study of Listeria monocytogenes. Int J Food Microbiol. 32, 279-287. doi:10.1016/S0168-1605(96)01142-7.

Zerbino, D.R., Birney, E., 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 18:821-829.


7. Figure Legend

Figure: Graphical presentation of cgMLST allelic profiles of L. monocytogenes isolates used for cgMLST-based serotype prediction. (A) Minimum spanning tree; (B) neighbor-joining tree. Isolates in which the 35 target genes were identified are colored by serogroup: red for IIa, magenta for IIc, green for IIb, blue for IVb and grey for the group L. spp. (4a, 4c). Isolates that were used to blindly test the classification methods are colored white. The serogroups of the test samples match the colored areas, which represent the respective serogroups.

[Figure panels A and B]
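
How a minimum spanning tree follows from cgMLST allelic profiles can be sketched in a few lines of Python. This is a generic illustration, not the SeqSphere+ algorithm: the isolate names and allele numbers are invented, and skipping loci that are missing in either profile is an assumption about how incomplete draft assemblies might be handled.

```python
def allelic_distance(p, q):
    """Number of loci with differing alleles; loci missing (None) in either profile are skipped."""
    return sum(1 for a, b in zip(p, q) if a is not None and b is not None and a != b)


def minimum_spanning_tree(profiles):
    """Prim's algorithm on pairwise allelic distances; returns (isolate, isolate, distance) edges."""
    names = sorted(profiles)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge that connects the current tree to an isolate outside of it.
        dist, a, b = min(
            (allelic_distance(profiles[a], profiles[b]), a, b)
            for a in sorted(in_tree)
            for b in names
            if b not in in_tree
        )
        edges.append((a, b, dist))
        in_tree.add(b)
    return edges


# Hypothetical allelic profiles over four loci (None = locus not found).
profiles = {
    "iso1": [1, 1, 1, 1],
    "iso2": [1, 1, 1, 2],
    "iso3": [1, 2, 2, 2],
    "iso4": [1, 1, None, 2],
}
for edge in minimum_spanning_tree(profiles):
    print(edge)
```

Isolates of the same serogroup that share most alleles end up connected by short edges, which is why serogroups appear as contiguous areas in panel A.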

Table 1: Complete genomes downloaded from NCBI GenBank and used to identify serogroup-specific genes. MLST clonal complexes (CC) and lineages in accordance with http://bigsdb.web.pasteur.fr/listeria/listeria.html and cgMLST cluster types (CT) in accordance with the core genome defined by Ruppitsch et al. (2015a).

Sample ID   Serotype   MLST CC   Lineage      cgMLST CT   GenBank Accession No.
SLCC2540    3b         617       Lineage I    31          NC_018586
EGD-e       1/2a       35        Lineage II   1           NC_003210
EGD         1/2a       12        Lineage II   3           HG421741
10403S      1/2a       85        Lineage II   27          NC_017544

Table 2: All cgMLST cluster types (CT) and MLST-defined clonal complexes (CC) ordered by lineage and serogroup. All types and clonal complexes which were found in this study are summarized in this table. Classical MLST profiles in accordance with http://bigsdb.web.pasteur.fr/listeria/listeria.html. Cluster types defined by cgMLST in accordance with the core genome defined by Ruppitsch et al. (2015a).

Lineage I, Serogroup IIb
  MLST: 3, 5, 39, 59, 66, 87, 117, 287, 489, 517, 576, 617
  cgMLST cluster type: 11, 31, 48, 53, 57, 1017, 1027, 1035, 1084, 1087, 1115, 1138, 1139, 1181, 1194, 1211, 1215, 1217, 1222, 1250, 1261, 1287, 1297, 2453, 2455, 2757, 2781, 2784

Lineage I, Serogroup IVb
  MLST: 1, 2, 4, 6, 55, 63, 64, 67, 73, 145, 257, 290, 291, 347, 397, 454, 458, 495
  cgMLST cluster type: 42, 66, 90, 1004, 1011, 1015, 1032, 1038, 1039, 1043, 1048, 1056, 1057, 1061, 1063, 1076, 1082, 1083, 1094, 1095, 1096, 1098, 1103, 1117, 1118, 1127, 1130, 1131, 1134, 1136, 1140, 1144, 1159, 1160, 1169, 1180, 1193, 1201, 1206, 1213, 1214, 1224, 1226, 1230, 1283, 1285, 1291, 1292, 1294

Lineage II, Serogroup IIa
  MLST: 7, 8, 12, 21, 26, 31, 98, 101, 103, 109, 121, 155, 177, 398, 403, 451, 466, 519, 521
  cgMLST cluster type: 35, 39, 45, 65, 69, 73, 295, 1002, 1008, 1016, 1030, 1066, 1073, 1086, 1120, 1121, 1126, 1128, 1129, 1152, 1164, 1174, 1179, 1198, 1209, 1218, 1219, 1252, 1257, 1258, 1265, 1295, 1350, 1358, 1364, 2451, 2742

Lineage II, Serogroup IIc
  MLST: 9, 122, 356
  cgMLST cluster type: 1, 21, 26, 1092, 1107, 1148, 1182, 1196, 1232, 2449

Lineage III, Serogroup spp.
  MLST: 71, 202, 467, 488
  cgMLST cluster type: 14, 33, 58, 1189

Table 3: Targets identified to be typical for different serogroups³. The major sequence types (ST) occurring in each serogroup are listed. For each gene found, a sequence type was assigned (number) by SeqSphere+ using the Ridom SeqSphere+ nomenclature server. A: accessory genome; C: core genome; PCR: targets for serogroup PCR by Doumith et al. 2004, both accessory genome; n.f.: not found (gene not found or missing in the genome); f.: failed (found but bearing frameshifts, a differing consensus sequence or having a too low coverage); +: gene was found, no major sequence type; *: not found or found, no major sequence type. Values in parentheses: frequent ST.

Target¹    MLST gene set²   IIa³    IIb³       IIc³   IVb³       L. spp.³
lmo0072    A                n.f.    f., n.f.   1      f., n.f.   f., n.f.
lmo0093    A                +       +          1      +          n.f.
lmo0224    A                +       n.f.       1      n.f.       n.f.
lmo0413    A                1,3     2          1      2          n.f.
lmo0416    A                1,3     2          1      2          n.f.
lmo0536    C                +       (8)        1      (2)        +
lmo0648    C                (1)     3          1      (2)        +
lmo0701    A                +       n.f.       1      n.f.       n.f.
lmo0726    C                1       f.         1      f.         (+)
lmo0736    A                +       n.f.       1      n.f.       n.f.
lmo0737    PCR              +       n.f.       1      n.f.       n.f.
lmo0832    A                +       3,4        1      3,4        n.f.
lmo0843    C                +       5          1      (2)        +
lmo0877    C                +       8          1      +          +
lmo0944    C                (1)     3          1      (3)        +
lmo0953    C                +       f.         1      f.         f.
lmo1069    A                +       2,3        1      n.f.       n.f.
lmo1074    A                +       3          1      n.f.       n.f.
lmo1089    A                +       3          1      n.f.       n.f.
lmo1118    PCR              n.f.    n.f.       1      n.f.       n.f.
lmo1196    A                +       n.f.       1      n.f.       (+)
lmo1303    A                +       n.f.       1      n.f.       n.f.
lmo1358    C                +       3          1      +          +
lmo1432    A                +       +          1      +          n.f.
lmo1451    A                +       +          1      +          n.f.
lmo1465    C                3       +          1      +          +
lmo1668    C                3       2,4        1      2,4        +
lmo1842    A                +       +          1      +          n.f.
lmo1905    A                +       n.f.       1      n.f.       n.f.
lmo2082    A                +       (3)        1      (3)        n.f.
lmo2121    A                +       +          +      n.f.       (+)
lmo2158    C                3,4     2          1      2          2
lmo2219    C                +       (4)        1      (2)        +
lmo2257    C                (1)     3          1      (2,3)      f.
lmo2407    A                f.      f.         1      f.         n.f.

¹ Targets in accordance with Ruppitsch et al. (2015a). ² Sequence typing set in which the target is found, in accordance with Ruppitsch et al. (2015a). ³ Serogroups in accordance with Doumith et al. (2004).
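
The way such a target table can drive a wildcard-aware prediction is sketched below in Python. This is not the authors' script: the names `PATTERNS`, `matches` and `predict_serogroup` are hypothetical, only five rows of Table 3 are used, and the scoring rule (reward of 1 per match, penalty of 1 per mismatch, "*" always matching) is an assumption; only the return condition, highest score greater than 0, is taken from the Methods.

```python
# Expected value per target and serogroup, excerpted from Table 3.
# "+" = gene found (any allele), "n.f." = gene not found, a set = accepted allele numbers.
PATTERNS = {
    "IIa": {"lmo0737": "+", "lmo1118": "n.f.", "lmo0413": {1, 3}, "lmo1069": "+", "lmo2121": "+"},
    "IIb": {"lmo0737": "n.f.", "lmo1118": "n.f.", "lmo0413": {2}, "lmo1069": {2, 3}, "lmo2121": "+"},
    "IIc": {"lmo0737": {1}, "lmo1118": {1}, "lmo0413": {1}, "lmo1069": {1}, "lmo2121": "+"},
    "IVb": {"lmo0737": "n.f.", "lmo1118": "n.f.", "lmo0413": {2}, "lmo1069": "n.f.", "lmo2121": "n.f."},
}


def matches(pattern, allele):
    """Compare one observed allele (int, or None if the gene was not found) with a pattern."""
    if pattern == "*":      # wildcard: found or not found
        return True
    if pattern == "+":      # gene must be present, allele irrelevant
        return allele is not None
    if pattern == "n.f.":   # gene must be absent
        return allele is None
    return allele in pattern


def predict_serogroup(observed, patterns=PATTERNS, reward=1, penalty=1):
    """Return the best-scoring serogroup, or 'undetermined' if its score is not above 0."""
    scores = {
        serogroup: sum(
            reward if matches(pattern, observed.get(gene)) else -penalty
            for gene, pattern in expected.items()
        )
        for serogroup, expected in patterns.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "undetermined"


# Hypothetical allelic profile: only lmo0413 was found, with allele 2.
print(predict_serogroup({"lmo0413": 2}))
```

Because mismatches are penalized rather than disqualifying, one target missing from a draft assembly lowers the score of the correct serogroup but does not by itself prevent the assignment.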
