Potential applications of DNA microarrays in biodefense-related diagnostics David A Stenger*†, Joanne D Andreadis‡, Gary J Vora* and Joseph J Pancrazio* Recent years have witnessed a logarithmic growth in the number of applications involving DNA microarrays. Extrapolation of their use for infectious diagnostics and biodefense-related diagnostics seems obvious. Nevertheless, the application of DNA microarrays to biodefense-related diagnostics will depend on solving a set of substantial, yet approachable, technical and logistical problems that encompass diverse topics from amplification efficiency to bioinformatics.
Addresses *Center for Bio/Molecular Science and Engineering, Code 6910, Naval Research Laboratory, Washington, DC 20375, USA † e-mail: [email protected] ‡ Centers for Disease Control & Prevention, 1600 Clifton Road, Mailstop G29, Atlanta, GA 30333, USA
The threat presented by potential agents of biological warfare has fueled interest in the development of sensitive and rapid detection technologies for the identification of pathogenic microorganisms and toxins in the environment. In this article, we review recent advances that have been made in microarray technologies for such detection purposes.
Current Opinion in Biotechnology 2002, 13:208–212 0958-1669/02/$ - see front matter © Published by Elsevier Science Ltd. Abbreviations BA bead array EAA electrically addressable array HDM high-density microarray PBL peripheral blood lymphocyte PCR polymerase chain reaction SDA strand displacement amplification SM spotted microarray
Introduction Recent technological advances in microarray technology offer high-throughput screening capabilities for pathogen detection that constitute an attractive alternative to conventional clinical microbiology protocols. Standard culture and susceptibility tests permit pathogen identification, but are laborious, time-consuming, and expensive [1]. More importantly, these tests do not directly characterize virulence factors and, thus, do not provide any information as to the potential pathogenicity of the organism identified. Genotypic identification methods that utilize molecular biology-based techniques, such as the polymerase chain reaction (PCR), offer several potential advantages over conventional microbiological approaches. Nucleic acid amplification strategies base pathogen identification on the detection of genetic information contained within the organism, such that culturing the organism is not required. Although PCR-based assays are exquisitely sensitive, accurate, and rapid, these methods also introduce a new set of concerns. As successful identification depends almost entirely on appropriately chosen primer sets, all PCR-based testing requires a priori knowledge pertaining to the identity of the contaminating organism(s). Consequently, there is a critical need for advanced diagnostic systems that can detect both known and unanticipated pathogens. DNA microarrays, which enable the simultaneous interrogation of thousands of genetic elements, address this crucial need. Here, the term ‘microarray’ refers to any type of planar substrate or solid beads presenting a high multiplicity (10²–10⁶) of individual sites, each presenting nucleic acid probes designed to selectively capture complementary strands of target (i.e. pathogen or host) nucleic acid.
Types of microarray Figure 1 illustrates four types of microarray. These different types of microarray vary in the density of oligomers and the time ranges required for assay following sample acquisition (Figure 2). High-density microarrays (HDMs) are fabricated by light-directed combinatorial synthesis of DNA oligomers [2]. The DNA oligomers synthesized on these sites typically have lengths of 10–30 bases. Subsequent improvements to the method using high-resolution semiconductor photoresists [3] have enabled fabrication of HDMs having probe oligonucleotide spots with resolutions approaching 1 µm², enabling densities of >10⁶/cm². To date, HDM designs relevant to pathogen identification have been based on a tiling strategy [4]. Four probes of equal length are synthesized for each base in a given reference sequence. One probe exactly complements the reference sequence, whereas three others have a single base mismatch at the position of the interrogated base. Thus, a tiled HDM can effectively allow the target nucleic acid to be ‘re-sequenced’. Spotted microarrays (SMs) are fabricated using robots to dispense picoliter to nanoliter quantities of DNA solutions onto glass slides using multiple quills [5] or inkjets. Long, double-stranded DNA fragments (~500 bp) can be inexpensively produced by PCR in multiwell plates and then printed onto slides. Either denatured DNA or oligomers with lengths of 20–100 bases are printed and chemically cross-linked to the surface using a variety of methods. Although less expensive to produce, quill and inkjet-based fabrication strategies produce much larger features than found on HDMs, typically circular spots with diameters of 50–200 µm that allow densities of ~10⁴/cm². Bead arrays (BAs) do not involve planar substrates, but are comprised of an addressable population of microscopic polymer beads (diameter ~1–5 µm) that contain precise
Figure 1 Illustrations of four types of microarray. (a) Electrically addressable array (courtesy of Motorola Life Sciences). (b) Spotted microarray (Operon Stress and Aging glass microarray photographed at Naval Research Laboratory). (c) Bead microarray (courtesy of Illumina, Inc.). (d) Photolithographic high-density microarray (courtesy of Affymetrix, Inc.).
amounts of up to four different fluorophores. Each type of bead has a characteristic DNA target surface coating, and the identity, or ‘bar code’, of each type of bead is determined optically by measuring the relative fluorescence from each fluorophore. Molecular recognition events on the surface of the beads are measured using additional fluorophores to label the target molecules. Optical interrogation of BAs can be accomplished using flow cytometry [6] or by imaging following self-arrangement in etched fiber optic bundle arrays [7]. In principle, >10³ probe-specific bead types can be combined in a single assay, although this is yet to be demonstrated. An apparent advantage of BAs would be that existing assay reagents could be ‘updated’ through the addition of more probe-specific beads.
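The tiling strategy described above for HDMs can be illustrated with a short sketch. The Python fragment below is not the design software used in the cited studies; it simply enumerates, for each base of a reference sequence, the four probes that differ only at the central position, and then makes a naive base call from four hybridization intensities. The function names, the default 25-base probe length (chosen from within the 10–30 base range quoted above), and the intensity-ratio threshold are assumptions made purely for illustration.

```python
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def tile_probes(reference, probe_len=25):
    """Build four probes for every position a full-length probe can centre on.

    Returns {position: {target_base: probe}}, where `probe` would pair
    perfectly with a target strand carrying `target_base` at `position`.
    probe_len=25 is an illustrative assumption and must be odd.
    """
    if probe_len % 2 == 0:
        raise ValueError("probe_len must be odd so that one base is central")
    half = probe_len // 2
    tiles = {}
    for pos in range(half, len(reference) - half):
        window = reference[pos - half:pos + half + 1]
        # Substitute each possible base at the central (interrogated) position.
        tiles[pos] = {
            base: reverse_complement(window[:half] + base + window[half + 1:])
            for base in BASES
        }
    return tiles


def call_base(intensities, min_ratio=1.5):
    """Call the base whose probe is brightest, or 'N' if the call is ambiguous.

    min_ratio is a hypothetical threshold, not a value from the cited work.
    """
    ranked = sorted(intensities.items(), key=lambda item: item[1], reverse=True)
    (best_base, best), (_, runner_up) = ranked[0], ranked[1]
    if runner_up > 0 and best / runner_up < min_ratio:
        return "N"
    return best_base


if __name__ == "__main__":
    tiles = tile_probes("ACGTACGTACG", probe_len=5)
    print(tiles[2])  # the four probes interrogating reference position 2
    print(call_base({"A": 100, "C": 90, "G": 1000, "T": 80}))
```

Applying `call_base` to the four intensities at every tiled position yields a ‘re-sequenced’ read of the target; real arrays would also need background correction and handling of cross-hybridization, which this sketch omits.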
Electrically addressable arrays (EAAs) are relatively low density (<10²/cm²), device-specific two-dimensional plastic or silica chips that are specifically designed for electronically addressing the nucleic acids immobilized on substrate microelectrodes. Hybridization of the target nucleic acid with the probe DNA results in a measurable voltammetric change at the electrode when the hybrid complex is decorated with various types of enhancer molecules [8]. Alternatively, the microelectrodes can be used for electrostatic placement of the probe and target nucleic acid molecules near the electrode surface for fluorescence-based detection [9•].
Pathogen identification HDMs have been used to identify pathogen species and drug resistance in a series of in vitro experiments using cultured microorganisms, including human immunodeficiency virus (HIV) [10]. Troesch et al. [11] designed HDMs to discriminate between 54 different Mycobacterium species and rifampin-resistant Mycobacterium tuberculosis [12]. A tiled array of 65 000 oligomer probes was used to accurately re-sequence 70 clinical isolates of 27 mycobacterial species and 15 rifampin-resistant M. tuberculosis strains. More recently, sequence-specific identification of Francisella tularensis and Yersinia pestis was demonstrated in environmental samples using tiled HDMs [13••]. SMs have also been used for bacterial species and strain identification, where oligomers were spotted on the surfaces of hydrogel-coated glass slides using non-contact printing techniques and covalently coupled to the hydrogel backbone [14]. PCR primers can also be co-localized in the hydrogels to achieve a local increase in the target DNA signal before hybridization [15•]. This approach has been used to distinguish strains of M. tuberculosis and to identify their toxin and drug resistance. Recently, Chizhikov et al. [16]
Figure 2 Estimated comparative times required for assay completion for various microarray systems applied either to direct pathogen detection or indirectly via monitoring host gene expression profiles. (Figure not reproduced. Axis: microarray assay processing time, marked at <1 h, 2–4 h, and >12 h. Systems shown: EAAs, hundreds of genes; microarray-based pathogen detection with PCR amplification; microarray-based host expression profiling, 1000–33,000 genes.)
Environmental biotechnology
developed oligonucleotide SMs to discriminate among strains of Escherichia coli and other enteric bacteria (Shigella and Salmonella) containing virulence factors. EAAs have also been used recently to determine strains and antimicrobial resistance in cultured bacteria [9•]. A unique innovation in this approach is the use of isothermal strand displacement amplification (SDA) to locally amplify the target-DNA-derived signal at individual electrodes [17]. Although BAs have not yet been used for genomic pathogen identification, their use for this purpose is imminent. One technical challenge for pathogen detection with microarrays arises from the difficulty in obtaining samples with a sufficient quantity of pathogen DNA. Thus, for a majority of sample types, some sort of amplification will probably be required to provide sufficient copies of pathogen gene markers for detection by microarray hybridization. Unfortunately, methods for this amplification do not scale as well as the number of probes that can be placed on a microarray chip. Multiplex PCR is currently limited to tens of different primer pairs, not thousands, as primer pairs give rise to varying numbers of spurious amplicons as observed by gel electrophoresis. However, a cause for optimism is that discrimination occurs when the labeled amplicons are required to hybridize to specific probes on the array surface; thus, spurious amplicons may not be detected by the microarray assay [16]. Several research groups are now determining the degree to which this discrimination can be used to mitigate non-specific amplification preceding microarray analysis. In addition, SDA and rolling-circle amplification may address scaling limitations of multiplex PCR [18].
Host expression profiling Diagnosis of a pathogen infection through expression profiling requires a measurable change in the relative abundance of transcribed mRNA in host cells in response to the infection. The measurement is performed indirectly by reverse transcription (RT) of the labile mRNA into more stable cDNA that is labeled with a fluorophore and allowed to hybridize with the microarrays. Typically, differently colored fluorophores are used to label the ‘control’ and ‘experimental’ pools of cDNA, allowing the relative transcript abundances to be deduced from the ratio of fluorescence intensities. Variations of this concept have been exploited for non-fluorescent measurements such as radioactively labeled target or chemiluminescence. However, defining sets of genes that are modulated is not trivial. A key aspect for the successful utilization of this technology is the understanding that there is an inherent level of noise owing to biologic variability, microarray production batch, handling factors, and variability emerging during sample processing [19,20]. DNA microarrays have been used to elucidate changes in gene expression of fibroblasts infected with cytomegalovirus [21], epithelial cells exposed to Bordetella pertussis [22], and a
CD4+ transformed cell line infected with HIV-1 [23,24]. For diagnostic purposes using microarrays, peripheral blood lymphocytes (PBLs) are an attractive cell source for two reasons. Firstly, any pathogen gaining access to the body is likely to encounter the immune system, which is likely to include PBLs. Secondly, PBLs are easily obtained in sufficient quantities to produce enough mRNA for microarray-based assays. Boldrick et al. [25•] recently elucidated gene expression profiles in PBLs resulting from exposure to bacteria and bacterial toxins, showing qualitative and quantitative differences among B. pertussis, E. coli, and Staphylococcus aureus. Recent work suggests that a select set of genes expressed in PBLs can yield potentially distinct profiles useful in distinguishing exposure to a biological threat. Thus, there exists the potential to reveal the severity of exposure and the individual susceptibility to the agent and to provide indicators of course of impending illness, even for unknown toxic agents [26]. To date, most of the published work has been conducted using in vitro exposures to PBLs, and thus requires extension to relevant animal models for validation. Furthermore, there is a need to unambiguously define ‘baseline’ expression profiles, against which the ‘perturbed’ state profiles are compared, as they may vary in time and between individuals.
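The ratio-based measurement described at the start of this section, and the comparison of a ‘perturbed’ profile against a ‘baseline’ profile, can be sketched in a few lines. The fragment below is a minimal illustration, not a published analysis pipeline: the gene names and intensities are hypothetical, the intensity floor and the two-fold cut-off are assumed values, and no attempt is made to model the sources of noise noted above.

```python
import math


def log2_ratios(experimental, control, floor=1.0):
    """Return {gene: log2(experimental / control)} for genes present in both pools.

    `floor` (an assumed value) keeps very low intensities from producing
    extreme or undefined ratios.
    """
    ratios = {}
    for gene in sorted(experimental.keys() & control.keys()):
        exp_signal = max(experimental[gene], floor)
        ctl_signal = max(control[gene], floor)
        ratios[gene] = math.log2(exp_signal / ctl_signal)
    return ratios


def modulated_genes(ratios, min_abs_log2=1.0):
    """Keep genes whose absolute log2 ratio meets a hypothetical cut-off.

    min_abs_log2=1.0 corresponds to a two-fold change in either direction.
    """
    return {gene: r for gene, r in ratios.items() if abs(r) >= min_abs_log2}


if __name__ == "__main__":
    # Hypothetical fluorescence intensities for three genes.
    experimental = {"geneA": 800.0, "geneB": 100.0, "geneC": 50.0}
    control = {"geneA": 200.0, "geneB": 100.0, "geneC": 200.0}
    ratios = log2_ratios(experimental, control)
    print(ratios)
    print(modulated_genes(ratios))
```

Working on a log scale treats induction and repression symmetrically; in practice the cut-off would need to be set against replicate measurements rather than fixed in advance.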
Bioinformatics issues and pathogen detection Depending on the endpoints used for microarray-based detection of pathogens, the emphasis of bioinformatics issues can be very different. Bioinformatic tools are indispensable for the efficient design and selection of specific complementary nucleic acid probe sequences for microarray development. For example, target pathogen genomic nucleic acid sequences are often amplified before microarray analysis and bioinformatics clearly has a role in the design of primers (assessing melting and annealing temperatures, secondary structure, self-complementarity, and specificity issues) for assaying genes considered specific to an organism and strain [27]. These same assessments must also be made for microarray probe design. During the initial stages of experimental design, it is assumed that primers and probes to genetic signatures associated with a target pathogen are specific to that pathogen or family of pathogens. It would follow that the generation of an amplicon or positive hybridization reaction using specifically designed primers or probe, respectively, would indicate the detection of the designated molecular trait from the target pathogen; however, this is not necessarily true. Bacterial and viral ‘genetic promiscuity’, the propensity of microorganisms to exchange genetic material, creates difficulties in developing single-species or strain-specific probes [28]. Thus, current primer and probe design methodologies require the use of bioinformatic tools to perform three main tasks: to generate PCR primers with the appropriate biochemical properties; to compare these sequences with those deposited in sequence databases to determine the present uniqueness of particular sequences and the potential for cross-reactivity; and to infer the probability of target specificity based on the level of genetic conservation and evolutionary relatedness with other
pathogenic and non-pathogenic species whose primary genetic sequence has not yet been elucidated. When considering host gene expression profiling, the capacity to conduct thousands of assays simultaneously poses challenges regarding data analysis, storage, and management. Although data storage and management issues are largely technical concerns for information technology specialists, no clear consensus on analysis techniques has emerged for making use of host gene expression profiles. The major role for bioinformatics is the identification of patterns associated with responses to pathogens, which may not only provide a means of detection but which may also help to elucidate genetic networks underlying the initiation and progression of disease. The most commonly exploited tool for analysis of gene expression profiles is hierarchical clustering [29,30], where the fundamental assumption is that similar trends, computed through a measure of distance, in the relative magnitudes of gene modulation imply similarity of function. A critical need for the interpretation of large data files is the visualization of information, which can be readily accomplished by dendrograms that can be derived from cluster analysis. Interpretation of expression profiling data has been used to gain profound insights into gene function. Clustering of genes expressed in yeast coupled with statistical algorithms yielded a model of regulatory transcriptional subnetworks [31]. A significant demonstration of the utility of clustering has been offered by Hughes and colleagues [32], where a compendium of expression profiles of 300 diverse yeast mutations was used to identify novel open reading frames that encoded proteins of several cell functions. 
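The hierarchical clustering described above can be made concrete with a small, dependency-free sketch. It is an illustration only and is not the software of the cited studies: the choice of a Pearson-correlation distance with average linkage is an assumption, as are the function names and the toy profiles. The nested tuples it returns carry the same information that a dendrogram displays.

```python
import math


def pearson_distance(x, y):
    """Distance = 1 - Pearson correlation; similar trends give values near 0."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 1.0  # a flat profile is treated as uncorrelated
    return 1.0 - cov / (norm_x * norm_y)


def hierarchical_cluster(profiles):
    """Average-linkage agglomerative clustering of {name: [values]}.

    Returns a nested tuple (left_subtree, right_subtree, merge_distance);
    leaves are the profile names.
    """
    clusters = [(name, [name]) for name in sorted(profiles)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dists = [pearson_distance(profiles[a], profiles[b])
                         for a in clusters[i][1] for b in clusters[j][1]]
                average = sum(dists) / len(dists)
                if best is None or average < best[0]:
                    best = (average, i, j)
        average, i, j = best
        merged = ((clusters[i][0], clusters[j][0], average),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


def leaves(tree):
    """List the profile names beneath a (sub)tree."""
    if isinstance(tree, str):
        return [tree]
    left, right, _ = tree
    return leaves(left) + leaves(right)


if __name__ == "__main__":
    # Hypothetical log-ratio profiles across four conditions.
    profiles = {
        "geneA": [1.0, 2.0, 3.0, 4.0],
        "geneB": [2.0, 4.0, 6.0, 8.0],
        "geneC": [4.0, 3.0, 2.0, 1.0],
        "geneD": [8.0, 6.0, 4.0, 2.0],
    }
    print(hierarchical_cluster(profiles))
```

Passing array profiles instead of gene profiles (one list of gene values per array) clusters samples rather than genes. For data sets of realistic size, an established library routine would normally be preferred to this quadratic-per-merge loop.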
With regard to pathogen detection, different pathological conditions reflected by particular expression profiles could also be clustered (clustering by arrays rather than by genes), but variation among a broad set of genes or dimensions may reduce the ability to discern pathogen exposure states. Efforts in functional genomics related to cancer research have yielded major successes in the pursuit of gene expression signatures. Expression-based criteria or class predictors have been defined based on neighborhood analysis [33], Bayesian regression models [34], and artificial neural networks [35]. These predictors were successfully used to classify novel samples in a manner consistent with clinical assessments. In fact, classifications based on gene expression alone or class discovery have also been demonstrated, suggesting that gene expression profiling has the capacity to identify subtypes that have not been previously defined [33]. Although promising, one should note that cancer line gene expression analyses are one-dimensional; by contrast, a host expression profile evoked by pathogen exposure would be expected to be temporal and ‘dose-dependent’. Comprehensive sets of gene expression profiles that explore temporal and dose ranges for pathogen exposure must be produced to map the continuum of gene expression changes.
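The notion of a class predictor can be illustrated with a deliberately simple stand-in. The sketch below uses a nearest-centroid rule, which is not any of the cited methods (neighborhood analysis, Bayesian regression, or artificial neural networks); it is shown only to make the idea of assigning a new expression profile to a previously defined class concrete. The class labels, profile values, and function names are hypothetical, and the sketch ignores the temporal and dose dependence discussed above.

```python
import math


def class_centroids(training):
    """Average the profiles of each class: {label: [profiles]} -> {label: centroid}."""
    result = {}
    for label, profiles in training.items():
        count = len(profiles)
        result[label] = [sum(column) / count for column in zip(*profiles)]
    return result


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(profile, centroids):
    """Return (label, distance) of the centroid closest to `profile`."""
    scored = ((label, euclidean(profile, centroid))
              for label, centroid in centroids.items())
    return min(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical log-ratio profiles over three marker genes.
    training = {
        "unexposed": [[0.1, -0.2, 0.0], [-0.1, 0.1, 0.2]],
        "exposed": [[2.0, -1.5, 0.1], [1.8, -1.7, -0.1]],
    }
    centroids = class_centroids(training)
    print(classify([1.7, -1.4, 0.2], centroids))
    print(classify([0.2, 0.0, -0.1], centroids))
```

Extending such a predictor to pathogen exposure would require training profiles that span the relevant time points and doses, so that each class is represented by more than a single snapshot.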
Conclusions The threat associated with potential biowarfare agents necessitates the development of sensitive detection technologies that are able not only to detect microorganisms, but also to identify them and to characterize their virulence and pathogenicity. For direct pathogen detection, the throughput capacity of microarray technology allows for the expansion of probe sets to include nucleic acid sequences specific for virulence determinants, exotoxins, receptors determining tissue tropism, and antibiotic resistance markers. Furthermore, host gene expression profiling with microarrays offers a means of monitoring health status, and may yield distinct signatures associated with pathogens. Moreover, host gene expression profiling may yield insight into the cellular mechanisms that are crucial for pathogenesis, such that novel therapeutic interventions may result. Clearly, microarrays are an information-rich approach that will require substantial research and development investment to achieve a bioinformatics-based organizational infrastructure for pathogen detection. Decision-making algorithms that glean characteristic profiles from data sets spanning pathogen dose and time, and that enable profile matching with quantitative, statistical measures, will have a profound impact on microarray-based pathogen detection.
Acknowledgements The Office of Naval Research and the Defense Advanced Research Projects Agency supported this work. GJV was supported by an associateship from the National Research Council. The views expressed here are those of the authors and do not represent those of the US Navy, the US Department of Defense or the US government.
References and recommended reading Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest •• of outstanding interest
1. Jungkind D: Tech.Sight: Molecular testing for infectious disease. Science 2001, 294:1553-1555.
2. Fodor SPA, Read JL, Pirrung MC, Stryer LT, Lu A, Solas D: Light-directed, spatially addressable parallel chemical synthesis. Science 1991, 251:767-773.
3. McGall G, Labadie J, Brock P, Walraff G, Nguyen T, Hinsberg W: Light-directed synthesis of high-density oligonucleotide arrays using semiconductor photoresists. Proc Natl Acad Sci USA 1996, 93:13555-13560.
4. Cronin MT, Fucini RV, Kim SM, Masino RS, Wespi RM, Miyada CG: Cystic fibrosis mutation detection by hybridization to light-generated DNA probe arrays. Hum Mutat 1996, 7:244-255.
5. Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995, 270:467-470.
6. Ye F, Li MS, Taylor JD, Nguyen Q, Colton HM, Casey WM, Wagner M, Weiner MP, Chen J: Fluorescent microsphere-based readout technology for multiplexed human single nucleotide polymorphism analysis and bacterial identification. Hum Mutat 2001, 17:305-316.
7. Steemers FJ, Ferguson JA, Walt DR: Screening unlabeled DNA targets with randomly ordered fiber-optic gene arrays. Nat Biotechnol 2000, 18:91-94.
8. Yu CJ, Wan Y, Yowanto H, Li J, Tao C, James MD, Tan CL, Blackburn GF, Meade TJ: Electronic detection of single-base mismatches in DNA with ferrocene-modified probes. J Am Chem Soc 2001, 123:11155-11161.
9. • Westin L, Miller C, Vollmer D, Canter D, Radtkey R, Nerenberg M, O’Connell JP: Antimicrobial resistance and bacterial identification using a microelectronic chip array. J Clin Microbiol 2001, 39:1097-1104.
Establishes anchored strand displacement amplification on a microelectronic chip array for species-specific identification of six clinically relevant bacterial pathogens.
10. Kozal M, Shah N, Shen N, Yang R, Fucini R, Merigan TC, Richman DD, Morris D, Hubbell E, Chee M, Gingeras TR: Extensive polymorphisms observed in HIV-1 clade B protease gene using high density oligonucleotide arrays. Nat Med 1996, 2:753-759.
11. Troesch A, Nguyen H, Miyada CG, Desvarenne S, Gingeras TR, Kaplan PM, Cros P, Mabilat C: Mycobacterium species identification and rifampin resistance testing with high-density DNA probe arrays. J Clin Microbiol 1999, 37:49-55.
12. Gingeras TR, Ghandour G, Wang E, Berno A, Small PM, Drobniewski F, Alland D, Desmond E, Holodniy M, Drenkow J: Simultaneous genotyping and species identification using hybridization pattern recognition analysis of generic mycobacterium DNA arrays. Genome Res 1998, 8:435-448.
13. •• Wilson WJ, Strout CL, DeSantis TZ, Stilwell JL, Carrano AV, Andersen GL: Sequence-specific identification of 18 pathogenic microorganisms using microarray technology. 2002, in press.
Introduces the development of a multipathogen identification HDM for the detection and identification of viral, bacterial and eukaryotic microorganisms with biowarfare potential.
14. Vasiliskov AV, Timofeev EN, Surzhikov SA, Drobyshev AL, Shick VV, Mirzabekov AD: Fabrication of microarray of gel-immobilized compounds on a chip by copolymerization. Biotechniques 1999, 27:592-606.
15. • Strizhkov BN, Drobyshev AL, Mikhailovich VM, Mirzabekov AD: PCR amplification on a microarray of gel-immobilized oligonucleotides: detection of bacterial toxin- and drug-resistant genes and their mutations. Biotechniques 2000, 29:844-857.
Introduces direct PCR-based amplification within a three-dimensional substrate (polyacrylamide porous gel pad). The on-chip amplification was able to detect genes encoding anthrax toxin, plasmid-borne β-lactamase, and Shiga toxin in addition to discerning the mutations responsible for rifampicin resistance of Mycobacterium tuberculosis strains isolated from sputum of tuberculosis patients.
16. Chizhikov V, Rasooly A, Chumakov K, Levy DD: Microarray analysis of microbial virulence factors. Appl Environ Microbiol 2001, 67:3258-3263.
17. Westin L, Xiao X, Miller C, Wang L, Edman CF, Nerenberg M: Anchored multiplex amplification on a microelectronic chip array. Nat Biotechnol 2000, 18:199-204.
18. Nallur G, Luo CH, Fang LH, Cooley S, Dave V, Lambert J, Kukanskis K, Kingsmore S, Lasken R, Schweitzer B: Signal amplification by rolling circle amplification on DNA microarrays. Nucleic Acids Res 2001, 29:U10-U18.
19. Andreadis JD, Mann TT, Russell AC, Stenger DA, Pancrazio JJ: Identification of differential gene expression profiles in rat cortical cells exposed to the neuroactive agents trimethylolpropane phosphate and bicuculline. Biosens Bioelectron 2001, 16:593-601.
20. Dodson JM, Charles PT, Stenger DA, Pancrazio JJ: Quantitative assessment of filter-based cDNA microarrays: gene expression profiles of human T-lymphoma cell lines. Bioinformatics 2002, in press.
21. Zhu H, Cong J, Mamtora G, Gingeras T, Shenk T: Cellular gene expression altered by human cytomegalovirus: global monitoring with oligonucleotide arrays. Proc Natl Acad Sci USA 1998, 95:14470-14475.
22. Belcher CE, Drenkow J, Kehoe B, Gingeras TR, McNamara N, Lemjabbar H, Basbaum C, Relman DA: The transcriptional responses of respiratory epithelial cells to Bordetella pertussis reveal host defensive and pathogen counter-defensive strategies. Proc Natl Acad Sci USA 2000, 97:13847-13852.
23. Geiss GK, Bumgarner RE, An MC, Agy MB, van’t Wout AB, Hammersmark E, Carter VS, Upchurch D, Mullins JI, Katze MG: Large-scale monitoring of host cell gene expression during HIV-1 infection using cDNA microarrays. Virol 2000, 266:8-16.
24. Corbeil J, Sheeter D, Genini D, Rought S, Leoni L, Du P, Fergurson M, Masys DR, Welsh JB, Fink JL et al.: Temporal gene regulation during HIV-1 infection of human CD4+ T cells. Genome Res 2001, 11:1198-1204.
25. • Boldrick JC, Alizadeh AA, Diehn M, Dudoit S, Liu CL, Belcher CE, Botstein D, Staudt LM, Brown PO, Relman DA: Stereotypes and specific gene expression programs in human innate immune responses to bacteria. Proc Natl Acad Sci USA 2002, 99:972-977.
Utilized SM to assess the transcriptional modulation of human peripheral blood mononuclear cells when responding to S. aureus, B. pertussis and lipopolysaccharide.
26. Das; Rina (Rockville, MD); Jett; Marti (Washington, DC); Mendis; Chanaka (Falls Church, VA): Method of diagnosing of exposure to toxic agents by measuring distinct pattern in the levels of expression of specific genes. United States Patent 6,316,197.
27. Kampke T, Kieninger M, Mecklenburg M: Efficient primer design algorithms. Bioinformatics 2001, 17:214-225.
28. Ochman H, Lawrence JG, Groisman EA: Lateral gene transfer and the nature of bacterial innovation. Nature 2000, 405:299-304.
29. Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci USA 1998, 95:14863-14868.
30. Quackenbush J: Computational analysis of microarray data. Nature Rev Genet 2001, 2:418-427.
31. Tavazoie S, Hughes JD, Campbell MJ, Cho RJ, Church GM: Systematic determination of genetic network architecture. Nat Genet 1999, 22:281-285.
32. Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai HY, He YDD et al.: Functional discovery via a compendium of expression profiles. Cell 2000, 102:109-126.
33. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA et al.: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999, 286:531-537.
34. West M, Blanchette C, Dressman H, Huang E, Ishida S, Spang R, Zuzan H, Olson JA, Marks JR, Nevins JR: Predicting the clinical status of human breast cancer by using gene expression profiles. Proc Natl Acad Sci USA 2001, 98:11462-11467.
35. Khan J, Wei JS, Ringner M, Saal LH, Ladanyi M, Westermann F, Berthold F, Schwab M, Antonescu CR, Peterson C, Meltzer PS: Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nat Med 2001, 7:673-679.