Next Generation Sequencing in Clinical and Public Health Microbiology

Clinical Microbiology Newsletter, Vol. 38, No. 21, November 1, 2016, www.cmnewsletter.com

0196-4399/©2016 Elsevier Inc. All rights reserved

Duncan MacCannell, Ph.D., Centers for Disease Control and Prevention, Atlanta, Georgia

Abstract

The introduction of next generation sequencing (NGS) and other high-throughput laboratory technologies is beginning to have a broad impact on clinical medicine and public health practice. These new technologies hold tremendous promise in terms of improving the speed, accuracy, and resolution of infectious disease diagnostics; public health surveillance; and outbreak detection, investigation, and response. Sustainable implementation of these technologies will require ongoing research and commitment to laboratory and informatics capacity, workforce development, and development of standardized methods and protocols.

Introduction

Corresponding author: Duncan MacCannell, Ph.D., Centers for Disease Control and Prevention, 1600 Clifton Rd. NE, MS-G38, Atlanta, GA 30333. Tel.: 404-639-1949. Fax: 404-235-0008. E-mail: [email protected]


The introduction of next generation sequencing (NGS) represents one of the most significant and fundamental technological advances in the biological sciences since the development of the polymerase chain reaction (PCR) in the mid-1980s. Over the course of the past decade, the raw costs associated with sequencing have decreased nearly six orders of magnitude as a result of new technologies, with a corresponding increase in the volume and complexity of sequence data that are produced. As a consequence, NGS instrumentation and methods have become increasingly commonplace in many clinical laboratories, driven primarily by human genomics and the promise of personalized medicine. Most recently, applications in infectious disease diagnostics and therapeutics, including microbial identification and characterization, molecular epidemiology, and metagenomics (the analysis of genomic sequences directly from clinical or environmental samples) are also driving the need for NGS. As NGS becomes increasingly cost-effective and routine, it will require significant changes to the workflow and staffing models for both clinical and public health laboratories and will completely redefine the standard of practice for many aspects of microbiology in both regulated and unregulated testing environments.

For the identification, characterization, and study of most bacteria, fungi, and parasites, the advent of cost-effective genomic sequencing has been nothing short of revolutionary. For these pathogens, NGS has enabled high-resolution strain typing, phylogenetics, and transmission mapping at a level that was not possible before. It has provided powerful new tools for the identification and study of non-culturable or poorly characterized organisms and emerging pathogens, and it has enabled rapid and open-ended profiling of genotypic and diagnostic markers for virulence and antimicrobial resistance. Although routine sequencing for most viral genomes has been possible for decades using Sanger methods, NGS enables clinical and public health laboratories to perform deep sequencing to identify minor population variants and quasispecies that may play an important role in disease transmission and vaccine or diagnostic efficacy. This review discusses the emerging role of NGS in clinical and public health microbiology laboratories, including a brief overview of the current generation of sequencing instruments and platforms, a discussion of the technical and regulatory challenges of working with NGS data, and an overview of the Centers for Disease Control and Prevention’s (CDC’s) Advanced Molecular Detection and Response to Infectious Disease Outbreaks (AMD) initiative and its role in catalyzing the implementation of NGS and bioinformatics in routine public health practice.

Sequencing Instrumentation and Platforms

The earliest commercially available NGS instruments were massively parallel, short-read sequencers that generated read lengths of several dozen to several hundred base pairs. In general, these systems had significantly higher throughput and lower operating costs than conventional dye terminator Sanger sequencing and could be applied to a range of microbiologic assays, genomic studies, and clinical or public health applications. The introduction of the first commercially viable NGS platform in 2005 marked the beginning of a precipitous decrease in the cost of sequencing and a corresponding increase in the volume and complexity of sequence data produced. In the decade that has followed, the technology space around NGS has evolved extremely rapidly, with new approaches and platforms, as well as iterative improvements to existing sequencing instrumentation, chemistry, sample-processing techniques, corollary instrumentation, reagents, and consumables [1]. Today, a number of different sequencing platforms are on the market, with different form factors, performance characteristics, error models, and operational requirements. Although this is not meant to be a comprehensive and historical overview of sequencing systems, most of the current, commercially available systems are summarized in Table 1.
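The output figures in Table 1 follow from simple arithmetic: total yield is roughly the number of reads multiplied by the read length, doubled for paired-end runs. The Python sketch below works through that calculation using the MiSeq figures from the table, and then estimates how many bacterial isolates might share one run. The 5 Mb genome size and 50x target depth are illustrative assumptions rather than values from this review, and the function names are hypothetical.

```python
def run_yield_bases(read_pairs: float, read_length: int, paired: bool = True) -> float:
    """Approximate total bases from a run: reads x length (x2 if paired-end)."""
    reads_per_fragment = 2 if paired else 1
    return read_pairs * read_length * reads_per_fragment


def isolates_per_run(yield_bases: float, genome_size: float, target_depth: float) -> int:
    """How many genomes of a given size fit on one run at the target depth."""
    return int(yield_bases // (genome_size * target_depth))


# MiSeq figures from Table 1: 25 M reads, 2 x 300 bp.
miseq_yield = run_yield_bases(read_pairs=25e6, read_length=300)
print(f"MiSeq yield: {miseq_yield / 1e9:.1f} Gb")

# Assumed values for illustration only: 5 Mb genome, 50x depth.
n = isolates_per_run(miseq_yield, genome_size=5e6, target_depth=50)
print(f"Isolates per run: {n}")
```

This is a theoretical upper bound. Usable yield in practice is likely to be lower once quality filtering, uneven library pooling, and control samples are taken into account.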

Short-read sequencing

As their name implies, short-read sequencers output millions to billions of short sequence reads that may range from 75 to 800 bp in length, depending on the sequencing platform and configuration selected, with different output parameters, sequencing modes, and error models. These short-read instruments are currently by far the most common and established sequencing platforms in clinical and public health laboratories; they also command the lion’s share of academic and commercial sequencing activity. Due to their flexibility, high output, and relatively low cost, short-read sequencers have become an important workhorse technology. Despite these advantages, the nature of short-read sequencing can present important challenges, particularly in the sequencing of genomes with complex structure and extensive repeat regions or in sequencing applications where phasing may be a concern.

Currently, two major vendors occupy the short-read-sequencing space: Illumina and ThermoFisher. Both vendors offer a range of different form factors and sequencing capacities with varying infrastructure and support requirements and important differences in performance. Compact benchtop sequencers, such as the Illumina MiniSeq and MiSeq, as well as the ThermoFisher IonTorrent S5, typically cost in the range of $50,000 to $150,000 and are increasingly becoming standard equipment in most academic, clinical, and public health laboratories. These smaller instruments are equally at home in research and low-volume production settings, as well as in high-volume commercial or core sequencing laboratories, where they are often used for backfill, rush, or specialized sequencing needs or to assess the quality of libraries before they are run on larger sequencing instruments.

The current platform of choice for most large-scale core laboratories and contract sequencing facilities is the Illumina HiSeq family of sequencers, which comes in a range of different models and multi-instrument configurations. These sequencers are large benchtop units that require adequate space, vibration isolation, power, and cooling. For many laboratories, the capital and operating costs of the instruments, combined with the necessary workforce, infrastructure, and sample volumes required for cost-efficient operation, effectively rule out onsite use. For these laboratories, the Illumina NextSeq and ThermoFisher IonTorrent S5/S5XL are an important middle ground between benchtop and capital instruments that balances speed, output, and cost. Because of differences in the detector technology, the Illumina NextSeq was initially better suited to resequencing tasks, but recent improvements in reagents and consumables appear to have improved the accuracy of the platform across a wide range of different sequencing applications, including de novo sequencing. For its part, ThermoFisher has largely shifted product development to focus on highly multiplexed amplicon sequencing (Ion AmpliSeq) and clinical markets, whereas Illumina has continued to extend its market share for general purpose, short-read sequencing across a wide range of life science applications.

Another important consideration is the choice of methods for library construction and quality assessment. While large sequencing centers typically use high-throughput, focused ultrasonication to shear genomic DNA into appropriate fragment lengths for sequencing, the cost of the equipment is often a barrier for lower-volume laboratories or resource-limited settings. Enzymatic library construction methods, such as Illumina Nextera, which relies on random transposase-mediated fragmentation and adapter insertion, have been shown to produce analogous results for most sequencing applications and are generally more feasible for widespread use. For this reason, public health activities that include large-scale distributed sequencing have generally standardized on enzymatic library construction and have carefully weighed the inclusion of complex protocol steps and requirements for expensive or complicated ancillary equipment.

Table 1. Next-generation-sequencing platforms and characteristics

| Platform | Instrument | Form factor | Sequencing technology (a) | Instrument cost | Read length (bp) | Run time (modes) | Single read output (modes) (c) | Sequence output (modes) | Approx cost per isolate (bacterial) |
|---|---|---|---|---|---|---|---|---|---|
| Illumina | MiniSeq | Benchtop | SBS | $50,000 | 2 x 150 | 24 h/17 h | 25 M/8 M | 7.5 Gb/2.4 Gb | TBD |
| Illumina | MiSeq(Dx) | Benchtop | SBS | $99,000 | 2 x 300 | 56 h | 25 M | 15 Gb | $60-70 |
| Illumina | NextSeq(Dx) | Benchtop | SBS | $250,000 | 2 x 150 | 29 h/26 h | 400 M/130 M | 120 Gb/39 Gb | $50-60 |
| Illumina | HiSeq (various models) | Capital | SBS | $750,000 | 2 x 125 | 6 days/40 h | | 1 Tb/180 Gb | $50-60 |
| ThermoFisher IonTorrent | PGM | Benchtop | Semiconductor | $50,000 | 400 | 7 h | 5.5 M | 2 Gb | $60-70 |
| ThermoFisher IonTorrent | Proton | Benchtop | Semiconductor | $150,000 | 200 | 4 h | 83 M | 10 Gb | $60-70 |
| ThermoFisher IonTorrent | S5 | Benchtop | Semiconductor | $65,000 | 400 | 2.5 h/4 h | 5 M/80 M | 15 Gb | $50-60 |
| ThermoFisher IonTorrent | S5XL | Benchtop | Semiconductor | $150,000 | 400 | 2.5 h/4 h | 5 M/80 M | 15 Gb | $50-60 |
| Pacific BioSciences | RSII | Capital | SMRT | $700,000 | 10,000-15,000 | 4 h | 50,000 | 1 Gb | $500-600 |
| Pacific BioSciences | Sequel | Capital | SMRT | $350,000 | 10,000-20,000 | 6 h | 350,000 | 7 Gb | TBD |
| Oxford Nanopore | MinION MkI | Portable | Nanopore | $1,000 | >10,000 | 1 min to 48 h | 2.2 M/4.4 M | Up to 42 Gb | TBD |
| Oxford Nanopore | PromethION | Benchtop | Nanopore | TBD (b) | >10,000 | 1 min to 48 h | 625 M/1.25 B | 6 Tb/12 Tb | TBD |

(a) SBS, sequencing by synthesis; SMRT, single-molecule real-time sequencing.
(b) TBD, to be determined.
(c) M, million; B, billion.

Long-read sequencing

Long-read sequencing first rose to common use with the introduction of the PacBio RS sequencer by Pacific BioSciences in 2010. While early versions of the platform were plagued by relatively high error rates, the current generation of instruments, sequencing chemistry, and signal-processing algorithms have improved the utility and reliability of PacBio long-read sequencing to the point where bacterial genomes are routinely closed as high-quality draft sequences, either with or without accompanying short-read sequence data for hybrid assembly [2]. PacBio instruments perform single-molecule real-time (SMRT) sequencing and generate hundreds of thousands of reads with average read lengths of 3 to 20 kb [3]. Because the instrument sequences individual DNA molecules, PacBio sequencing has become increasingly useful for deep-sequencing and metagenomic applications. Further, because the system can also detect methylated bases by differences in reaction kinetics, epigenetic data are collected simultaneously during the run [4]. Despite these advantages, PacBio sequencing has remained largely impractical for many laboratories due to the size, cost, and infrastructure requirements of the instruments. Earlier this year, Pacific BioSciences introduced the Sequel, a jointly developed instrument with a significantly smaller footprint, higher output, and lower cost, which seems well positioned for both research and routine production sequencing.

Another emerging long-read technology, from Oxford Nanopore, relies on engineered protein nanopores to linearize and sequence DNA molecules. The Oxford Nanopore MinION was first introduced in 2012, with a preview program that launched for early adopters in 2014. Much like the PacBio RS, the first iterations of the MinION platform were beset by high error rates. Recent improvements in the hardware, chemistry, nanopore configuration, and base-calling algorithms have all greatly improved the accuracy and throughput of the MinION, with the release of the Mk 1B in 2016. Nanopore sequencing has shown promise for bacterial-genome assembly [5] and real-time metagenomics [6], and the MinION is a particularly compelling platform for a number of reasons. The first is its diminutive size, which makes real-time NGS in remote field locations a practical option for infectious disease surveillance, diagnostics, and public health research [7]. The second is cost: at a fraction of the cost of most short- and long-read sequencing platforms, nanopore sequencing is an increasingly feasible option for smaller laboratories and those that cannot justify or support sustained investment in large capital instrumentation. Engineered protein nanopores offer remarkable flexibility in terms of format and capability and have been demonstrated to support direct sequencing of other types of complex biomolecules, including RNA and peptides. As nanopore-based sequencing technologies continue to mature and other systems enter the market, these technologies will almost certainly play an increasing role in microbiological testing, particularly in small laboratories and clinics and in the field.

Impact of NGS on the Microbiology Laboratory

In order to fully understand the impact of NGS on clinical and public health microbiology, one must first understand how NGS workflows differ from conventional molecular testing in terms of the overall technical requirements, as well as the required inputs and potential outputs of sequencing (Fig. 1). Unlike conventional molecular testing, which is often highly context and application specific, NGS is a relatively broad and universal technique, with a wide range of potential applications. On the input side, current NGS sample preparation protocols can accommodate either DNA or RNA from whole genomes or amplified targets, and many support a wide range of potential sources, including pure cultured isolates, host or vector tissue, or even direct input from clinical and environmental samples. This universality greatly expands the range of NGS while simultaneously diminishing the need for highly specialized pathogen- or application-specific equipment, reagents, and expertise. As such, many laboratories increasingly find it practical to develop common protocols and standard operating procedures for nucleic acid extraction, library preparation, and sequencing. In addition, these laboratories can share instrumentation, resources, and quality metrics across reference, surveillance, and response activities that have traditionally occurred in a “silo” due to specialized testing requirements, protocols, or expertise.

Figure 1. Input, output, and workflow considerations of NGS in clinical and public health laboratory settings. The figure traces the workflow from input (DNA/RNA from genomic, amplicon, or whole-sample sources) through library preparation, sequencing, and bioinformatics to output (information from sequence data). It notes that laboratory workflows are increasingly universal, while bioinformatic analysis remains complex and nonstandardized.

Differences in the volume and complexity of output data are another important consideration. Setting bioinformatics technical requirements aside for a moment, NGS may yield a number of important and actionable results from a single sequencing run and yield valuable insights by comparing data across multiple samples in a set. This is an important contrast with conventional molecular testing, where assays are often designed to interrogate specific genetic markers (e.g., real-time PCR for vanA in Enterococcus spp.) and must be run serially or with dependencies on other results and findings. Whole-genome sequencing of a pathogen, by comparison, can yield many useful and actionable results from a single sequencing run, such as definitive genus/species identification, high-resolution strain-type information, functional annotation, and a comprehensive profile of clinically relevant genotypic features and characteristics (e.g., acquired antimicrobial resistance genes and other genetic virulence markers).

At a cost of roughly $50 to $60 per isolate or more, NGS is currently more expensive and, in some cases, more time-consuming than most conventional molecular assays and has much more complex analytical and support requirements. As more conventional molecular assays are transitioned to or replaced by genomic sequencing, NGS-based methods will provide an increasingly detailed profile of the pathogen, delivering actionable information more efficiently and at a significantly lower overall cost by replacing multiple specialized tests that would normally be performed independently or in serial fashion. For example, a number of different groups are currently developing validated methods for serotype prediction and reference characterization of pathogenic Escherichia coli, either directly from NGS data [8,9] or in conjunction with other technology platforms, such as matrix-assisted laser desorption ionization–time of flight mass spectrometry (MALDI-TOF MS) [10]. While development and validation of these methods are still ongoing, virtually all reports reflect significant cost and time savings over conventional methods.
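The idea of obtaining several results from one sequenced genome, rather than running one targeted assay per marker, can be illustrated with a toy example. The Python sketch below checks a single sequence for several marker sequences at once, on either strand, using exact string matching. Everything in it is an assumption made for illustration: the function names are hypothetical, the sequences are arbitrary short strings rather than curated reference markers, and real pipelines would more likely rely on alignment tools and validated databases that tolerate mismatches.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def screen_markers(genome: str, markers: dict[str, str]) -> dict[str, bool]:
    """Report, for each marker, whether it occurs on either strand of the genome."""
    genome = genome.upper()
    results = {}
    for name, seq in markers.items():
        seq = seq.upper()
        results[name] = seq in genome or reverse_complement(seq) in genome
    return results


# Arbitrary toy sequences chosen for illustration; not curated reference markers.
toy_genome = "ACAATTTGTGCATAACATGTGGACAGTTTTAATCACATGTGGG"
toy_markers = {
    "marker_A": "CATAACATGTGG",   # present on the forward strand
    "marker_B": "CCCACATGTGATT",  # present only as a reverse complement
    "marker_C": "GGGGGGGGGGGG",   # absent from both strands
}

for name, found in screen_markers(toy_genome, toy_markers).items():
    print(name, "detected" if found else "not detected")
```

A single pass over one dataset answers all three questions here, which is the contrast with serial, marker-specific assays drawn above.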

From Research to Clinical Practice

Despite the revolutionary advances in NGS platforms and technologies over the past decade, progress in adapting NGS from basic research applications to infectious disease diagnostics and public health surveillance has been relatively inconsistent and slow. A number of recent review articles have discussed the application and practicality of routine NGS in clinical and public health microbiology and the broad impact of these technologies in terms of laboratory workflow, operating costs, and practical considerations [11-13]. In most clinical and public health laboratories, NGS-based methods have been implemented in a piecemeal fashion, with pathogen- or application-specific assays developed and run in parallel with conventional testing. A few reports have discussed the feasibility, relative cost, and complexity of incorporating routine pathogen genomic sequencing in a more holistic manner by committing to sequence all isolates within a specified time frame [13]. While NGS is becoming increasingly accessible to most clinical and public health microbiology laboratories, until recently, only limited consensus or support has been available to help guide the transition from conventional molecular to NGS-based methods. In the spring of 2015, the American Academy of Microbiology convened an expert colloquium to identify key challenges in adapting NGS to routine clinical and public health practice and to propose strategies for action [12,14]. International consortia, such as the Global Microbial Identifier (http://www.globalmicrobialidentifier.org) initiative, also play an important role in developing global consensus and support for pathogen genomics, data sharing, and NGS-based infectious disease surveillance and outbreak response on an international scale.

Earlier this year, the U.S. Food and Drug Administration (FDA) published initial draft guidance on the proposed regulatory framework for commercial NGS-based infectious disease diagnostic submissions, which includes notional recommendations for pathogen identification assays, as well as those that use genotypic markers to predict important microbial functional characteristics, such as virulence and antimicrobial susceptibility [16]. The guidance recommends a “one-system” approach to diagnostic review and proposes a comprehensive assessment of all the steps of a given NGS-based diagnostic submission, including specimen collection, DNA/RNA extraction, amplification, library construction, sequencing, data storage, bioinformatic analysis and reporting, and validation of results. The guidance covers targeted/amplicon-based NGS, as well as more open-ended and agnostic methods (e.g., unbiased metagenomics sequencing). It also underscores the importance of sequence databases and sample metadata, establishes a gold standard repository of reference-quality microbial genomes for consistent benchmarking and validation (the FDA-ARGOS reference database, available at http://www.ncbi.nlm.nih.gov/bioproject/231221), and provides recommendations for minimum metadata and structure. Although this draft guidance is primarily intended for FDA and industry audiences, it is nonetheless an excellent starting point for any laboratory working on the development and validation of standardized NGS-based diagnostic methods.

Unsurprisingly, the implementation of NGS-based testing in regulated or accredited laboratory settings presents additional challenges. For laboratories operating under Clinical Laboratory Improvement Amendments (CLIA) or ISO 15189 standards, the wet-laboratory components of NGS and the application of these protocols to laboratory-developed tests are relatively straightforward to proceduralize, document, and validate for laboratories with established quality programs already in place. Standardized protocols and quality measures for sample handling, nucleic acid extraction, conversion, amplification, library construction, sequencer operations, and instrument/workflow management often already exist or can be patterned on existing laboratory documents and workflows. The process for validating bioinformatics procedures and pipelines is much less clear, and in the case of pathogen genomics and other data-intensive laboratory methods, these bioinformatics methods are increasingly critical to data interpretation and result reporting [15]. Protocol definitions for bioinformatics must take into account the algorithms, system environments, databases, and individual runtime parameters and criteria used to generate and validate each result. Quality management systems also must consider the integrity and reproducibility of the entire analytical process, from raw NGS sequence data to the final reported results, and provide a verifiable audit trail for all bioinformatics steps that includes file hashes and versioning, transactional process logging, cryptographic signatures, and time stamps to establish an immutable record of each sample run. Finally, laboratories also must consider how to implement meaningful quality control and quality assurance measures, set appropriate validation criteria, and establish realistic competencies and proficiency testing for individual bioinformatic analyses across the entire NGS sample-to-answer workflow.
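One way to picture the audit trail described above is as a chain of log records, each carrying the hashes of its input files, the tool version and parameters, a time stamp, and the hash of the previous record. The Python sketch below is a minimal illustration under stated assumptions: the record fields, function names, and chaining scheme are hypothetical rather than drawn from any cited standard, and cryptographic signing is omitted.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large sequence files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_record(step, tool, version, params, input_hashes, previous_hash):
    """Build one log entry and chain it to the previous entry by hash."""
    record = {
        "step": step,
        "tool": tool,
        "version": version,
        "parameters": params,
        "input_sha256": input_hashes,
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "previous_record_sha256": previous_hash,
    }
    # Hash a canonical serialization so any later edit to the record is detectable.
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    record["record_sha256"] = hashlib.sha256(canonical).hexdigest()
    return record


def verify_chain(records):
    """Recompute every record hash and check each link to its predecessor."""
    previous = None
    for record in records:
        body = {k: v for k, v in record.items() if k != "record_sha256"}
        canonical = json.dumps(body, sort_keys=True).encode("utf-8")
        if hashlib.sha256(canonical).hexdigest() != record["record_sha256"]:
            return False
        if record["previous_record_sha256"] != previous:
            return False
        previous = record["record_sha256"]
    return True
```

In use, a pipeline would call `sha256_of_file` on each raw or intermediate file, append one `audit_record` per analysis step (passing `None` as the previous hash for the first step), and run `verify_chain` during review. Changing any parameter, version, or input hash after the fact causes verification to fail.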

Key Issues and Challenges

As the financial and technical barriers to NGS implementation continue to decrease, NGS instrumentation and applications are rapidly becoming a mainstay in clinical medicine and public health. Even so, a number of key issues and challenges limit sustainable implementation of NGS in routine public health and clinical medicine, such as laboratory and bioinformatics capacity; standard operating protocols; workforce realignment and training; data management, analysis, and sharing; clinical reporting and billing; and gaps in existing reference databases. While partners in academia, industry, and public health actively work to develop solutions to many of these issues, their impact on the transition of NGS from microbiological research to applied clinical and public health laboratory settings warrants further discussion.

Rate of technological innovation and change

Since the commercial introduction of NGS just over a decade ago, the pace of innovation and change in the NGS platform space has been nothing short of incredible. Significant new platform introductions occur every few years, and among existing vendors, the underlying technology and capabilities of the instrumentation, reagents, and consumables remain in a constant state of flux. As a result of this churning, a good rule of thumb is that the most current generation NGS hardware will become obsolete or outdated within 18 to 24 months, which presents important challenges for both laboratory budgeting and capital equipment planning. Laboratories adopting NGS must also make investments in ancillary equipment and instrumentation, such as thermocyclers, ultrasonicators, and bioanalyzers; plan for modifications to laboratory facilities, infrastructure, and physical plant; and anticipate changes in laboratory workflow and staffing requirements.


Necessary investments in information technology (IT) are often overlooked in planning, even though they may significantly outweigh the costs of laboratory instrumentation. NGS systems require appropriate network bandwidth and connectivity, consume large amounts of storage space, and typically require access to flexible high-performance computing or cloud resources. Laboratories also must make upgrades or changes to their existing laboratory and clinical information management systems (LIMS) to accommodate NGS sample workflows, data management, and result reporting. In most clinical and public health laboratories, IT departments are not adequately staffed or appropriately resourced to support these new scientific computing requirements, and both outsourcing and access to cloud computing resources may be limited by cost or security considerations.

Despite the rapid pace of change and innovation in NGS instrumentation, high-throughput sequencer platforms have generally converged on common data standards (e.g., FASTQ) for sequence data and their corresponding quality scores. Although different sequencing instruments have different performance characteristics that must be taken into account, this consistency suggests that future generations of sequencing hardware may be accommodated without significant reengineering of IT systems or bioinformatics pipelines.

Data volume and complexity

One of the greatest challenges in implementing NGS for routine clinical and public health microbiology is the massive increase in data volume and complexity. Compared to conventional molecular methods, such as PCR or gel-based fingerprinting, raw data from NGS may represent an increase in primary data generation of four to six orders of magnitude. A small outbreak investigation might represent 20 to 100 gigabytes of raw sequence data, while a large-scale molecular surveillance platform, such as PulseNet, may generate 100 terabytes or more each year. At this scale, data transmission, management, storage, and archiving all become important considerations, as does the bioinformatics approach to sequence analysis.

Equally important are the policy considerations that govern the use and retention of this primary sequence data. What must be retained? In what format? For how long? For whom? Long-term data storage is ultimately expensive, and in cases where the isolate or sample is still available, it is usually far more cost-effective to simply resequence the primary sample. In cases of public health import, or those involving regulatory action or microbial forensics, the original data files and results will likely be required.

The aggregation and integration of NGS data with clinical and epidemiologic data is another important consideration. Modern LIMS can usually accommodate most NGS microbiology workflows and integrate with other health care data systems to support the use of these data for clinical case management. For public health, the integration of laboratory and epidemiologic data has been a longstanding challenge, particularly given the incomplete and often unstructured nature of epidemiologic data. The true power of NGS-based methods for public health and clinical microbiology lies in the ability to associate important microbial findings with their respective epidemiologic data and context [17]. As the speed and accuracy of NGS microbial pipelines continue to improve, the rapid integration of epidemiologic data will be increasingly vital to effective and timely public health interventions.

Lack of definitive standards and reference data

As public sequence repositories continue to grow, they provide an important basis for comprehensive and curated high-quality reference genome databases. These databases are a critical new tool for the detection of emerging pathogens and represent a strategy to help mitigate public health challenges as diagnostic laboratories increasingly adopt broad-based, culture-independent testing panels. Expert curation is vital, and many poorly culturable species, commensal organisms, and rare or unconventional pathogens remain critically underrepresented in most sequence and pattern libraries. As a result, there are still critical errors, misidentifications, and omissions in many primary sequence repositories, as well as uneven representation of many clinically important genera, species, and subspecies among specialized and curated reference databases. For example, in the initial release of the FDA-ARGOS reference database, a full 54 (34%) of the 159 bacterial genome sequences are different strains of Staphylococcus aureus. A number of different sequencing efforts are currently under way to help improve the quality and comprehensiveness of available microbial reference sequences and to identify and flag sequences with known issues, including the 100,000 Pathogen Genomes project and updates to the National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database [18]. Expert-curated database projects, such as the CDC MicrobeNet (http://microbenet.cdc.gov), are also intended to help fill in many of these gaps in the sequence data, providing expert-curated sequence, phenotypic, and mass spectrometric (e.g., MALDI-TOF MS) data for identification and characterization.

Barriers to data sharing and collaboration

Data sharing marks another important change, particularly in public health, where the open and immediate release of bacterial genome sequences from laboratory-based surveillance programs, such as the Listeria monocytogenes WGS initiative, is a significant departure from traditional data-sharing models. Near-real-time sharing of microbial sequences underscores a commitment to open data and is intended to accelerate diagnostic development, to support basic research and collaboration, and to enable ongoing assessment of critical genotypic markers. Even so, any release of genomic sequence data from patient isolates must balance patient privacy considerations against the utility and value of public data release. In the case of the Listeria initiative, the metadata describing each isolate are anonymized, aggregated, and released in two sequential uploads to NCBI’s Sequence Read Archive spaced roughly 6 months apart. A unique sample ID, genus/species, organism source (clinical/environmental), and country of origin are included with the initial sequence submission, which generally occurs within 7 days of isolate receipt at the laboratory. A subsequent upload 6 months later updates the metadata for each record to include the isolate source, serotype, month and year of collection, geographic region, and age range of the patient. All data fields are reviewed to ensure that data quality, consistency, and patient privacy are not compromised by the submission [19].

Need for specialized systems and expertise

Bioinformatics and genomics are both relatively new disciplines in the public health and clinical laboratory environments and, as such, pose important challenges in recruiting and retaining appropriately skilled personnel. This is particularly true of bioinformatics, where many organizations lack the necessary job series and competency measures and are competing for skilled bioinformaticians with both academia and private industry. To help address the administrative gap, the CDC and the Association of Public Health Laboratories recently published an initial set of guidelines for bioinformatics competency assessments in public health laboratories [20]. The greater issue, particularly with growing demands for bioinformatics and data science expertise, is recruiting skilled bioinformaticians to clinical and public health careers, since it is not always possible to compete on salary, even with a compelling mission and interesting technical challenges. It is often much more practical to provide the necessary training and resources to bench microbiologists, epidemiologists, and clinical laboratory staff, both to instill a fundamental understanding of bioinformatics principles and methods and to enable them to become competent users of bioinformatics tools and pipelines. Training and fellowship programs, such as the Bioinformatics in Public Health Fellowship program (http://www.aphl.org/bioinformatics), are another key component of workforce development, introducing early-career bioinformaticians to clinical microbiology and public health careers.

Most clinical and public health laboratories in the United States do not have local access to the scientific and high-performance computing resources needed for complex bioinformatic analyses. In addition, information security and patient privacy considerations can make cloud computing an impractical option.
Naturally, both the technical and computational complexity of tasks can vary widely, as can the infrastructure and workforce capabilities of different laboratory organizations. On the one hand, public health laboratories need turnkey, deployable bioinformatics solutions to ensure standardized analytics across a system with differing levels of expertise, capacity, and resources. On the other hand, as routine testing becomes increasingly complex and bioinformatics dependent, most laboratories (both clinical and public health) will eventually need some level of flexible, cost-effective, and secure high-performance computing access and dedicated bioinformatics expertise. Finding effective and sustainable solutions that meet both of these needs will continue to be an important challenge.

Limits of genotypic characterization

Although NGS-based microbial identification and characterization methods can provide a relatively complete and comprehensive profile of a bacterial isolate, important limitations requiring additional or confirmatory testing may still exist. Prediction of the antimicrobial resistance phenotype from genomic data is one area in which the microbial genotype and phenotype may not correlate completely due to factors not reflected in the genome of the organism (e.g., point mutations, altered gene expression, epigenetic factors, or differences in genomic structure), particularly for virulence and resistance mechanisms that are incompletely understood. For a number of different bacterial species, the presence of acquired resistance genes has been shown to be highly predictive of resistance phenotypes. In a recent study of 640 varied strains of non-typhoidal Salmonella, for example, the correlation was 99.0% [21]. Other organisms, particularly those with highly mobile accessory genomes or those with complex, unusual, or highly plastic genomic structures, may require sentinel or routine functional susceptibility testing. While NGS is already being applied to detect emerging antibiotic resistance, to strengthen infection control programs, and to guide antibiotic stewardship activities in many facilities and jurisdictions, comprehensive reference databases, analytical tools, and extensive validation will be required before it can be applied to direct individual patient care. To this end, NCBI has recently made changes to the data structure of BioSample records to support the standardized collection and interrogation of functional susceptibility data alongside pathogen genomic sequences (for an example, see http://www.ncbi.nlm.nih.gov/biosample/5170347).
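As a rough illustration of how genotype-phenotype agreement of the kind described above might be tallied, the following Python sketch compares resistance predicted from the presence of acquired resistance genes with phenotypic susceptibility results. The isolate records, the simplified gene-to-drug map, and the function names are all assumptions made for this example; they are not data or methods from the cited study, and a real analysis would rely on a curated resistance gene database and validated interpretive criteria.

```python
from dataclasses import dataclass

# Simplified, illustrative gene-to-drug map (an assumption for this sketch);
# a real analysis would use a curated resistance gene database.
GENE_TO_DRUG = {
    "blaCMY-2": "ceftriaxone",
    "tet(A)": "tetracycline",
    "aac(3)-IIa": "gentamicin",
}


@dataclass
class Isolate:
    isolate_id: str
    genes: set        # acquired resistance genes detected in the genome
    phenotype: dict   # drug -> True if resistant on susceptibility testing


def predicted_resistant(isolate, drug):
    """Predict resistance from the presence of any gene mapped to the drug."""
    return any(GENE_TO_DRUG.get(gene) == drug for gene in isolate.genes)


def concordance(isolates):
    """Count isolate/drug pairs where the genotype prediction matches the phenotype."""
    agree = 0
    total = 0
    discordant = []
    for iso in isolates:
        for drug, resistant in sorted(iso.phenotype.items()):
            total += 1
            if predicted_resistant(iso, drug) == resistant:
                agree += 1
            else:
                discordant.append((iso.isolate_id, drug))
    return agree, total, discordant


# Hypothetical isolates, invented for illustration only.
ISOLATES = [
    Isolate("S1", {"blaCMY-2", "tet(A)"},
            {"ceftriaxone": True, "tetracycline": True, "gentamicin": False}),
    Isolate("S2", set(),
            {"ceftriaxone": False, "tetracycline": False, "gentamicin": False}),
    # S3 is phenotypically gentamicin resistant with no mapped gene detected,
    # e.g., a mechanism that simple gene detection would miss.
    Isolate("S3", {"tet(A)"},
            {"ceftriaxone": False, "tetracycline": True, "gentamicin": True}),
]

agree, total, discordant = concordance(ISOLATES)
print(f"Concordance: {agree}/{total} ({100 * agree / total:.1f}%)")
print("Discordant pairs:", discordant)
```

In this toy data set, the discordant isolate/drug pairs are the ones that would be candidates for confirmatory functional susceptibility testing, which is the role the text assigns to sentinel or routine phenotypic testing.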

The AMD Initiative

The CDC’s AMD initiative was introduced in the 2014 federal budget and is intended to help accelerate, coordinate, and support the rollout of advanced laboratory technologies, such as NGS, in state and federal public health agencies and to develop the necessary IT, bioinformatics, and workforce capacity to sustain them into the future. The AMD initiative has five principal goals: (i) to improve pathogen detection and characterization by leveraging new technologies and methods, (ii) to develop and validate new diagnostic tools to meet both emerging and established public health needs, (iii) to support sustainable genomics and bioinformatics capacity across the U.S. public health system, (iv) to implement enhanced and sustainable information systems to integrate large-scale bioinformatics data into the public health information flow, and (v) to develop and improve laboratory and bioinformatics tools for prediction, modeling, and early recognition of emerging infectious disease threats. To achieve these goals, the AMD initiative supports a range of intramural and extramural projects and activities and coordinates the use of pathogen metagenomics and other emerging laboratory and informatics technologies across the CDC’s infectious disease laboratory programs. Through shared IT infrastructure and expanded core laboratory capacity, standardized laboratory workflows and data management practices, and consistent performance standards and quality management systems, AMD is helping to apply NGS and other innovative technologies to public health. Workforce development is another important area of focus for AMD, with the development of training and proficiency measures for bioinformatics, data science, and genomics and the establishment of fellowships and new career paths for public health professionals. In the 3 years since its inception, the AMD initiative has had a significant and transformational impact on many of the CDC’s infectious disease programs, in particular its surveillance and outbreak response activities. More information on the CDC’s AMD initiative and the programs and activities that it supports can be found at http://www.cdc.gov/amd.

As the impact of NGS and other high-throughput laboratory technologies becomes fully realized, the fundamental change in the workflow and capabilities of clinical and public health microbiology laboratories becomes clearer. NGS, in particular, holds the promise of increased speed, accuracy, and resolution over conventional infectious disease diagnostic methods and is already making significant contributions to surveillance, outbreak detection, and response. The development of consensus standards, high-quality reference databases, laboratory workforce capacity, and standardized analytical approaches will all be critical to sustained implementation of NGS and other advanced laboratory technologies. Collaboration and partnerships are crucial to navigating this important transition in laboratory practice. Through its AMD initiative, the CDC has coordinated and accelerated the incorporation of NGS into routine public health use and has leveraged existing laboratory capacity to meet tomorrow’s public health needs. While important challenges remain, initiatives such as AMD are critical to supporting ongoing innovation in clinical and public health laboratory science.

Summary

Over the past decade, the cost and feasibility of NGS have improved significantly. Today, the majority of U.S. state and local public health laboratories, as well as a growing number of clinical laboratories, are beginning to put these technologies into practice. Applications include microbial identification, strain typing, and characterization. This review article describes the current state of NGS technologies and discusses key challenges, opportunities, and barriers to sustainable and widespread implementation of sequencing in clinical and public health settings.

Disclaimer

Specific brand names and models are included for informational purposes only, and their use does not imply endorsement by the author, the Department of Health and Human Services, or the CDC.

References

[1] Loman NJ, Pallen MJ. Twenty years of bacterial genome sequencing. Nat Rev Microbiol 2015;13:787-94.
[2] Lin HH, Liao YC. Evaluation and validation of assembling corrected PacBio long reads for microbial genome completion via hybrid approaches. PLoS One 2015;10:e0144305.
[3] Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, et al. Real-time DNA sequencing from single polymerase molecules. Science 2009;323:133-8.
[4] Flusberg BA, Webster DR, Lee JH, Travers KJ, Olivares EC, Clark TA, et al. Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat Methods 2010;7:461-5.
[5] Loman NJ, Quick J, Simpson JT. A complete bacterial genome assembled de novo using only nanopore sequencing data. Nat Methods 2015;12:733-5.
[6] Greninger AL, Naccache SN, Federman S, Yu G, Mbala P, Bres V, et al. Rapid metagenomic identification of viral pathogens in clinical samples by real-time nanopore sequencing analysis. Genome Med 2015;7:99.
[7] Quick J, Loman NJ, Duraffour S, Simpson JT, Severi E, Cowley L, et al. Real-time, portable genome sequencing for Ebola surveillance. Nature 2016;530:228-32.
[8] Joensen KG, Tetzschner AM, Iguchi A, Aarestrup FM, Scheutz F. Rapid and easy in silico serotyping of Escherichia coli isolates by use of whole-genome sequencing data. J Clin Microbiol 2015;53:2410-26.
[9] Lindsey RL, Pouseele H, Chen JC, Strockbine NA, Carleton HA. Implementation of whole genome sequencing (WGS) for identification and characterization of Shiga toxin-producing Escherichia coli (STEC) in the United States. Front Microbiol 2016;7:766.
[10] Cheng K, Chui H, Domish L, Sloan A, Hernandez D, McCorrister S, et al. Phenotypic H-antigen typing by mass spectrometry combined with genetic typing of H antigens, O antigens, and toxins by whole-genome sequencing enhances identification of Escherichia coli isolates. J Clin Microbiol 2016;54:2162-8.
[11] Fournier PE, Dubourg G, Raoult D. Clinical detection and characterization of bacterial pathogens in the genomics era. Genome Med 2014;6:114.
[12] Goldberg B, Sichtig H, Geyer C, Ledeboer N, Weinstock GM. Making the leap from research laboratory to clinic: challenges and opportunities for next-generation sequencing in infectious disease diagnostics. MBio 2015;6:e01888-15.
[13] Long SW, Williams D, Valson C, Cantu CC, Cernoch P, Musser JM, et al. A genomic day in the life of a clinical microbiology laboratory. J Clin Microbiol 2013;51:1272-7.
[14] Weinstock GM, Ledeboer N, Rubin E, Sichtig H, Geyer C. Applications of clinical microbial next-generation sequencing. http:// academy/images/Colloquia-report/NGS_Report.pdf. ASM, 2016.
[15] Olson ND, Zook JM, Samarov DV, Jackson SA, Salit ML. PEPR: pipelines for evaluating prokaryotic references. Anal Bioanal Chem 2016;408:2975-83.
[16] FDA. Infectious disease next generation sequencing based diagnostic devices: microbial identification and detection of antimicrobial resistance and virulence markers. 2016 [cited July 13, 2016]. http://www.fda.gov/downloads/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/UCM500441.pdf.
[17] Grad YH, Lipsitch M. Epidemiologic data and pathogen genome sequences: a powerful synergy for public health. Genome Biol 2014;15:538.
[18] O’Leary NA, Wright MW, Brister JR, Ciufo S, Haddad D, McVeigh R, et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res 2016;44(D1):D733-45.
[19] Jackson BR, Tarr C, Strain E, Jackson KA, Conrad A, Carleton H, et al. Implementation of nationwide real-time whole-genome sequencing to enhance listeriosis outbreak detection and investigation. Clin Infect Dis 2016;63:380-6.
[20] Ned-Sykes R, Johnson C, Ridderhof JC, Perlman E, Pollock A, DeBoy JM, et al. Competency guidelines for public health laboratory professionals: the CDC and the Association of Public Health Laboratories. MMWR Suppl 2015;64:1-81.
[21] McDermott PF, Tyson GH, Kabera C, Chen Y, Li C, Folster JP, et al. Whole-genome sequencing for detecting antimicrobial resistance in nontyphoidal Salmonella. Antimicrob Agents Chemother 2016;60:5515-20.