DNA ARRAYS AND FUNCTIONAL GENOMICS IN NEUROBIOLOGY
Christelle Thibault,1 Long Wang, Li Zhang, and Michael F. Miles2
The Ernest Gallo Clinic and Research Center, Wheeler Center for the Neurobiology of Addiction and Department of Neurology, University of California San Francisco, Emeryville, California 94608
I. Introduction
II. DNA Array Formats and Technique
  A. DNA Array Formats
  B. Array-Based Expression Profiling
  C. Analysis of Array Results
III. Applications in Neurobiology
  A. Gene Profiling in Neuronal Cells
  B. Gene Profiling in the Brain
  C. Applications in Neurogenetics
IV. Caveats and Future Needs
  A. Sensitivity and Reproducibility
  B. Interpretation and Use of the Data
  C. Sharing Array Data
  D. Cost and Availability of Arrays
V. Conclusion
References
I. Introduction
Initiatives such as the Human Genome Project have led to the characterization of an enormous amount of DNA sequence information. The genomes of 20 organisms, including those of S. cerevisiae, C. elegans, and D. melanogaster, have already been fully sequenced. The identification of every gene within these genomes constitutes a tremendous task. However, characterizing gene function, both in isolation and within the context of an entire genome, is clearly the challenge for the next generation of biological discovery. Such "functional genomics" efforts require a new type of science that combines high-throughput analytical methods with extensive efforts in computational biology and bioinformatics. The development of DNA microarray technology is one example of the emerging power of functional genomics. DNA microarrays theoretically allow the interrogation of expression levels for nearly all genes from even the largest genomes. This technology has developed rapidly over the last 5 years, resulting in an exponentially increasing number of publications (Fig. 1). It has quickly become apparent that expression profiling with DNA arrays is not simply a new technique for monitoring gene expression. Rather, this approach represents almost a new form of science that allows observation, hypothesis generation, and hypothesis testing in a nonbiased manner and at a genomic scale. In this sense, expression profiling is closely related to genetics: both evaluate the entire genome in a nonbiased manner. Furthermore, there are now abundant examples where expression patterns identified by DNA arrays have provided insight into the "phenotype" of a given experimental target. This "you are what you express" approach promises to contribute significantly to the functional classification and treatment of cancer, the study of drug action and drug discovery, and the molecular understanding of disease. This review provides background on the approaches and techniques underlying DNA arrays and on their applications. DNA arrays can be used for purposes other than expression profiling, for example, DNA sequencing (Pease et al., 1994) or single nucleotide polymorphism detection (Hacia et al., 1999). Here, however, we focus on their use in expression analysis. In particular, we highlight early work suggesting that DNA arrays may offer significant advantages for studying the development and function of the central nervous system (CNS).

FIG. 1. Growth of DNA array publications. [Bar chart: number of publications (y-axis, 0 to 100) by year (x-axis, 1995 to 2000).] The number of DNA array publications appearing in PubMed for the last 5 years is indicated.

1Present address: Institut de Génétique et de Biologie Moléculaire et Cellulaire, B.P. 163, 67404 Illkirch Cedex, France.
2Author to whom correspondence should be addressed.

INTERNATIONAL REVIEW OF NEUROBIOLOGY, VOL. 48
Copyright © 2001 by Academic Press. All rights of reproduction in any form reserved. 0074-7742/01 $35.00
II. DNA Array Formats and Technique
A. DNA ARRAY FORMATS

In essence, all DNA arrays are solid supports bearing series of DNA probes at discrete addresses, available for hybridization to target DNA or RNA. Two main formats, cDNA and oligonucleotide arrays, can be distinguished by the size of the arrayed DNA fragments: the former usually carry cDNA inserts more than 100 bases long, the latter oligonucleotides of 7 to 25 bases. Variants of both formats differ in the arraying method or the support used for the DNA. Complementary DNA arrays can be made by robotically spotting or printing cDNA probes onto either glass microscope slides or nylon membranes. Although oligonucleotide arrays can also be made by robotic spotting, they are usually prepared by in situ synthesis of oligonucleotide probes directly on a glass support or by chemical attachment of presynthesized oligonucleotides. Whereas oligonucleotide arrays are suited to both gene expression monitoring and DNA sequence analysis, cDNA arrays are mostly intended to measure mRNA levels.

1. Complementary DNA Arrays
The glass cDNA microarray technology was largely developed in the laboratory of Pat Brown at Stanford University (Schena et al., 1995, 1996; DeRisi et al., 1996; Shalon et al., 1996). This technology, used mainly to quantify mRNA levels, involves spotting thousands of PCR-amplified cDNA inserts onto a glass surface and hybridizing these probes with cDNA or cRNA targets prepared from cell or tissue RNA samples (Fig. 2; Eisen and Brown, 1999). It has since been widely adopted by other academic investigators (Behr et al., 1999; Khan et al., 1999; Loftus et al., 1999; Luo et al., 1999a; Wang et al., 1999a; Whitney et al., 1999) and by several biotechnology companies, such as NEN Life Science (http://www.nenlifesci.com/) and TeleChem International, Inc. (http://arrayit.com/). A series of reviews discusses many aspects of manufacturing and using cDNA arrays (Bowtell, 1999). In addition, the Brown laboratory web site (http://cmgm.stanford.edu/pbrown/) contains complete manuals for constructing a cDNA arrayer, as well as protocols for its use. A related web site also offers software for management and analysis of array data (http://cmgm.stanford.edu/pbrown/mguide/software.html).
FIG. 2. Diagram of cDNA array protocol. [Schematic. Left: cDNA clones → PCR products → robotic array → cDNA array. Right: RNA1 and RNA2 → label → Cy3-cDNA and Cy5-cDNA targets → hybridize → confocal scanning → Cy3/Cy5 ratio.] A typical scheme for preparation of spotted cDNA arrays is shown. The left-hand portion of the figure depicts preparation of the spotted cDNA arrays. The right-hand portion shows the generation of Cy3- or Cy5-labeled cDNA targets from two different starting RNA preparations. The Cy3/Cy5 fluorescence ratio is generated from a single array and indicates the relative expression level for a given gene in the two starting RNA populations.
The source of spotted DNA may vary greatly. Clone sets carrying known genes and representative ESTs are commercially available from several sources, including the American Type Culture Collection (http://www.atcc.org/), Genome Systems (http://www.genomesystems.com/), and Research Genetics (http://www.resgen.com/). The inserts in these plasmids may be amplified in 96- or 384-well plates using standard sets of PCR primers. Alternatively, cDNA libraries or the clone collections of individual investigators may be used as a source of DNA probes. Thus, it is possible to construct an array biased toward a particular tissue, a given cell type, or a defined set of functionally related genes. For example, Loftus et al. (1999) described microarrays enriched in neural crest-melanocyte cDNAs. In this study, a database analysis approach was developed to identify a subset of ESTs preferentially expressed in neural crest-melanocyte tissues. Such a strategy could be useful for selecting appropriate cDNAs to examine transcriptional profiles of developmental processes and diseases. This flexibility has made cDNA array technology a particularly attractive approach for many investigators.
DNA spots are generated by deposition of a few nanoliters of purified PCR product, typically at 100 to 500 µg/ml. Up to 5000 DNA spots per cm² can be printed on a glass surface. DNA spots range from 100 to 200 µm in diameter and, in most cases, are spaced 200 to 500 µm apart. The support is an ordinary microscope slide. Different reagents, including poly-L-lysine, amino silanes, or amino-reactive silanes, are used to coat the slide surface to improve DNA coupling and to limit the spread of spotted DNA droplets. DNA spotting on treated slides is performed robotically. The most widely used spotting mechanism is a quill-based system (http://cmgm.stanford.edu/pbrown/). In this format, DNA is drawn into the quill by capillary action, and a small amount is then spotted by tapping the tip onto the glass slide (Schena et al., 1995). Before hybridization, spotted DNAs are cross-linked to the matrix by ultraviolet irradiation. The slides are further treated to reduce positive charges caused by residual amines, and the spotted DNA is finally denatured by heat or alkali treatment. Commercial manufacturers have since developed arrayers with improved or different spotting mechanisms. With an increasing choice of arrayers and a concomitant decrease in price, it is now often preferable to buy a commercial arrayer. Additional information about commercial arrayers and many other microarray issues can be found on a variety of web sites (see http://industry.ebi.ac.uk/~alan/MicroArray/). Although glass arrays set the stage for cDNA array technology, other matrices may be used. Nylon membranes are the most common alternative support (Nguyen et al., 1995; Chen et al., 1998; Sehgal et al., 1998; Bertucci et al., 1999b; Bubendorf et al., 1999; Khodarev et al., 1999; Moch et al., 1999).
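The spot spacings and densities quoted above are related by simple geometry. As a rough check, the sketch below assumes a square grid in which "spacing" means the center-to-center pitch between neighboring spots; real arrayers may use other layouts and leave unprinted margins, so the results are upper bounds rather than specifications. The function names are illustrative.

```python
# Rough geometry check: spots per cm^2 on a square grid.
# Assumption: "spacing" is the center-to-center pitch between adjacent spots;
# real layouts and margins will lower the achievable density.

UM_PER_CM = 10_000  # micrometers per centimeter


def spots_per_cm2(pitch_um: float) -> float:
    """Upper bound on spot density for a square grid with the given pitch (um)."""
    if pitch_um <= 0:
        raise ValueError("pitch must be positive")
    spots_per_edge = UM_PER_CM / pitch_um  # spots along one 1-cm edge
    return spots_per_edge ** 2


def pitch_for_density(density_per_cm2: float) -> float:
    """Pitch (um) needed on a square grid to reach the given density."""
    if density_per_cm2 <= 0:
        raise ValueError("density must be positive")
    return UM_PER_CM / density_per_cm2 ** 0.5


# spots_per_cm2(200) -> 2500.0, spots_per_cm2(500) -> 400.0,
# and 5000 spots per cm^2 implies a pitch of about 141 um.
```

Under this assumption, the 200- to 500-µm spacings correspond to roughly 400 to 2500 spots per cm², so the upper figure of 5000 spots per cm² implies spots packed at a pitch of about 140 µm.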
Based on DNA spot diameter, spacing, and density, nylon membrane arrays are categorized into macroarrays and microarrays. Macroarrays are membranes with DNA spots typically 0.5 to 1 mm in diameter, spaced 1 to 2 mm apart, at a density of 10 to 100 spots per cm². Microarrays typically contain DNA spots 100 to 200 µm in diameter spaced less than 300 µm apart, at densities of 500 to 5000 spots per cm². In general, the same spotting systems described for glass arrays can be used for nylon membranes. However, certain pin types, such as the Pin-and-Ring system from Genetic Microsystems, may perform better with membranes. Although some studies report the construction of custom membrane arrays (Bertucci et al., 1999a; Piétu et al., 1999; Song et al., 1999), a variety of commercial filters have been widely used (Chen et al., 1998; Ollila and Vihinen, 1998; Sehgal et al., 1998; Bubendorf et al., 1999; Khodarev et al., 1999; Moch et al., 1999; Rajeevan et al., 1999). These membranes contain anywhere from a few hundred to more than 20,000 cDNAs. For example, Clontech (http://www.clontech.com/atlas/) offers several arrays containing more than 500 human, mouse, or rat cDNAs, with newer versions having 1176 cDNAs. Application-targeted arrays, spotted on membranes or glass slides, are also now being offered by a number of companies.

2. Oligonucleotide Arrays
Unlike cDNA arrays, oligonucleotide chips are essentially commercial products. Affymetrix (Santa Clara, CA) has developed multiple high-density oligonucleotide arrays (GeneChip) for various applications, including large-scale gene expression analysis, DNA resequencing, detection of single nucleotide polymorphisms (SNPs), and mutation screening (Lipshutz et al., 1995; Chee et al., 1996; Hacia et al., 1996; Lockhart et al., 1996). Hyseq, Inc., has developed universal sequencing chips containing an arrayed library of all possible oligonucleotides of a given length, usually 8- to 9-mers (Drmanac et al., 1998). Several companies also offer spotted oligonucleotide arrays, manufactured in a fashion similar to spotted cDNA arrays. These have generally been of low density because of the cost of synthesizing large numbers of oligonucleotides by traditional chemistry. The manufacturing of all GeneChip arrays combines solid-phase oligonucleotide synthesis chemistry with photolithography (Fodor et al., 1991, 1993; Pease et al., 1994). This approach allows rapid production of very high-density arrays and scales well for manufacturing. The essential steps of the process are similar to photolithography techniques used in the semiconductor industry. The reactive groups on the glass surface and on the nucleotides are capped with a photolabile protecting group. Oligonucleotides are synthesized in a massively parallel fashion by flooding the glass surface with one nucleotide at a time and using photolithographic masks to dictate which sites are deprotected by light, and therefore couple that nucleotide, in each step. Sequentially changing the nucleotide solutions and masks allows the synthesis of specific oligonucleotides (20 nucleotides in length) at known locations with very high density. The complete set of 4^N polydeoxynucleotides of length N can be synthesized in 4 × N cycles.
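The claim that all 4^N sequences of length N need only 4 × N coupling cycles follows from the fact that a single cycle extends every illuminated site at once, however many sites that is. The sketch below is an idealized model under stated assumptions (perfect masks and 100% coupling efficiency); the function name is illustrative. It assigns one target sequence per site, runs one masked cycle per base per position, and counts the cycles.

```python
from __future__ import annotations

from itertools import product

BASES = "ACGT"


def synthesize_all(n: int) -> tuple[dict[int, str], int]:
    """Idealized light-directed synthesis of every DNA sequence of length n.

    Assumptions: perfect masks and 100% coupling. Site s is assigned the s-th of
    the 4**n target sequences. In each cycle one base floods the whole surface,
    and the mask lets light deprotect only the sites whose next position calls
    for that base, so only those sites are extended.
    """
    targets = ["".join(p) for p in product(BASES, repeat=n)]  # all 4**n sequences
    grown = {site: "" for site in range(len(targets))}
    cycles = 0
    for position in range(n):      # one layer (nucleotide position) at a time
        for base in BASES:         # one masked coupling cycle per base per layer
            for site, target in enumerate(targets):
                if target[position] == base:  # site illuminated in this cycle
                    grown[site] += base
            cycles += 1
    assert all(grown[s] == t for s, t in enumerate(targets))
    return grown, cycles


grown, cycles = synthesize_all(3)
print(len(grown), cycles)  # 64 12
```

The cycle count depends only on N, not on the number of sites: for 20-mers the same bookkeeping gives 80 cycles for 4^20 (about 1.1 × 10^12) possible sequences, although a real chip carries only the subset chosen for its application.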
Large-scale commercial manufacturing methods allow approximately 300,000 polydeoxynucleotides to be synthesized on small 1.28 × 1.28-cm arrays. Oligonucleotide arrays carrying more than 1 million probes are also reportedly being developed (Lipshutz et al., 1999). The selection and final arrangement of oligonucleotide probes on the arrays are application specific. For expression analysis, oligonucleotides are chosen based on sequence information from known genes and ESTs and serve as probes for hybridization to RNA samples. Each gene is represented by a set of 20 different 20-mer oligonucleotide probe pairs synthesized side by side on the chips (Lockhart et al., 1996; Wodicka et al., 1997; Fig. 3). More recent arrays disperse the 20 probe pairs randomly across the chip to avoid systematic errors from local background fluctuations. Each pair consists of a perfect match (PM) oligonucleotide, perfectly complementary to the gene sequence of interest, and a mismatch (MM) oligonucleotide, identical to its PM counterpart except at its central position, where a mismatched base is substituted. Gene expression levels are calculated from the average difference between PM and MM intensities across the entire set of probes. This approach reportedly reduces the contribution of background and cross-hybridization while increasing the quantitative accuracy and reproducibility of the measurements. Commercial oligonucleotide arrays are currently available for human (~40,000 genes and ESTs), mouse (~30,000 genes and ESTs), and rat (~24,000 genes and ESTs), as well as a set containing all yeast open reading frames. One drawback to this system is cost. The oligonucleotide arrays themselves are expensive and require a dedicated scanner, fluidics station, hybridization oven, computer workstation, and analysis software. The high price of this system has prevented its use by many academic centers. However, analysis on Affymetrix arrays can be done by a commercial source (Research Genetics), eliminating some of the startup equipment expenses. Furthermore, the cost of arrays has decreased significantly during the last 2 years.

FIG. 3. Oligonucleotide array protocol. [Schematic: total RNA → oligo(dT)-T7 primed cDNA synthesis → dsDNA → T7 pol with CTP-biotin → biotin-cRNA → hybridization → streptavidin-phycoerythrin staining → scanning; PM and MM probe cells shown.] The experimental protocol for preparation of target and hybridization of GeneChip oligonucleotide arrays is depicted. The arrays are prepared by photolithographic synthesis of oligonucleotides in situ, as described in the text. Biotin-labeled complementary RNA (cRNA) is synthesized from the starting RNA sample by reverse transcriptase (RT) and DNA polymerase (Pol) in the presence of an oligo(dT) primer containing a T7 RNA polymerase recognition site [Oligo(dT)-T7]. This then allows the synthesis of cRNA with T7 polymerase (T7 pol). Following hybridization, target cRNA is quantitated by staining with streptavidin/phycoerythrin and scanning confocal microscopy. Comparing the hybridization intensity for a given gene on two different arrays generates relative expression changes.
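The PM/MM scheme described above can be illustrated with a simplified "average difference" calculation for a single gene. The intensities below are invented for illustration, only four probe pairs are used instead of 20, and the sketch omits refinements such as rejection of aberrant probe pairs that vendor software may apply; the function name is illustrative.

```python
from __future__ import annotations

from statistics import mean


def average_difference(pm: list[float], mm: list[float]) -> float:
    """Mean of (PM - MM) intensities over one gene's probe pairs.

    Each mismatch (MM) probe acts as a control for background and
    cross-hybridization at its perfect-match (PM) partner. Simplified:
    no outlier rejection or saturation handling.
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("PM and MM lists must be non-empty and of equal length")
    return mean(p - m for p, m in zip(pm, mm))


# Hypothetical intensities for four probe pairs of one gene.
pm_cells = [1200.0, 950.0, 1100.0, 1300.0]
mm_cells = [300.0, 250.0, 400.0, 350.0]
print(average_difference(pm_cells, mm_cells))  # 812.5
```

Because every pair contributes its own PM - MM difference, a probe with unusually high nonspecific binding raises both members of its pair and largely cancels out.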
3. Summary

Since the first description of two-color hybridization to microarrayed DNA on a solid support (Schena et al., 1995), multiple DNA array technologies have been developed. Currently, spotted cDNA or oligonucleotide arrays are the most widely used, mainly because of their relative affordability, their flexibility, and the substantial web-based support available for such systems (http://cmgm.stanford.edu/pbrown/). In the short time since the introduction of high-density DNA arrays, a large number of commercial vendors have developed products for this area. This offers the hope of rapid progress in the quality and sensitivity of the equipment, and of wider use as prices improve. Furthermore, the equipment for making and analyzing spotted arrays is well suited to core facilities because of the high throughput of the arrayers. Many universities have established or are currently establishing such array core facilities.
B. ARRAY-BASED EXPRESSION PROFILING

DNA array applications fall into two main categories: DNA sequencing and gene expression monitoring. Both are hybridization based and exploit the inherent ability of nucleic acids to recognize and base-pair with complementary sequences. Expression chips are designed to evaluate the absolute representation of thousands of RNA species in cells or tissues simultaneously, or to assay their relative abundance between two or more samples. They are commonly used to measure changes in gene expression as a function of cell or tissue type, physiological state, or pharmacological treatment.
1. Gene Expression Monitoring

Array-based mRNA quantification methods are a direct extension of Northern and dot blot analyses. In these traditional hybridization techniques, a complex mixture of RNA is spotted onto a solid support and interrogated for the abundance of a given target by incubation with a labeled complementary probe. With DNA arrays, the arrangement is reversed: a large collection of probes is bound at discrete locations on a solid substrate, and a complex mixture of labeled target RNAs is monitored simultaneously. The number of probes present on the chip determines the number of target mRNAs screened in a single hybridization. The abundance of individual nucleic acids in the target sample is reflected by the intensity of hybridization to their corresponding probes on the chip. All experiments are conducted with a large excess of probe relative to labeled target to minimize interprobe competition. The nature and preparation of target samples, as well as detection methods, vary according to the type of array used.

Nylon membrane arrays commonly use 32P- or 33P-labeled cDNA as targets and autoradiographic detection (Bubendorf et al., 1999; Khodarev et al., 1999). However, gene expression analysis using chemiluminescent and colorimetric methods has also been successfully conducted on high-density filter arrays (Chen et al., 1998; Rajeevan et al., 1999). Target samples are typically prepared by oligo(dT)-primed reverse transcription of an entire population of total or poly(A)+ RNA in the presence of labeled dCTP. Each sample is hybridized to a different membrane, or sequentially to the same membrane after the previous target sample has been stripped off. Following hybridization and adequate washing, membranes are analyzed by high-resolution phosphor imaging. Commercial readers and arrayers usually provide adapted software for data analysis; alternatively, custom software has also been developed (Bubendorf et al., 1999).

The printed cDNA microarray technology uses fluorescently labeled cDNA as targets and a two-color confocal detection method (see Fig. 2; Schena et al., 1996). The two RNA samples to be compared are reverse-transcribed into cDNA in the presence of two spectrally distinct fluorescent dyes. They are then mixed and hybridized to the same chip. The relative representation of a given mRNA in the two samples is determined by measuring the ratio of the fluorescent intensities of the two dyes at the cognate probe.
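The ratio measurement described above is usually computed per spot after subtracting local background, and is often expressed as a log2 ratio so that twofold increases and decreases are symmetric (+1 and -1). The sketch below also median-centers the log ratios to correct for an overall difference in dye labeling or scanning. Both the background model and the assumption that most genes are unchanged are illustrative choices rather than steps described in the text, and the intensities and function name are hypothetical.

```python
from __future__ import annotations

import math
from statistics import median


def normalized_log_ratios(cy5: list[float], cy3: list[float],
                          cy5_bg: list[float], cy3_bg: list[float],
                          floor: float = 1.0) -> list[float]:
    """Background-subtracted log2(Cy5/Cy3) per spot, centered on the median.

    Median-centering assumes most genes are unchanged between the two samples,
    so a global offset (e.g., unequal dye incorporation) is removed.
    """
    raw = []
    for red, green, red_bg, green_bg in zip(cy5, cy3, cy5_bg, cy3_bg):
        r = max(red - red_bg, floor)    # floor avoids log of zero or negative values
        g = max(green - green_bg, floor)
        raw.append(math.log2(r / g))
    center = median(raw)
    return [x - center for x in raw]


background = [100.0] * 5
ratios = normalized_log_ratios([4100.0, 2100.0, 1100.0, 1500.0, 1900.0],
                               [1100.0, 2100.0, 600.0, 800.0, 1000.0],
                               background, background)
print(ratios)  # [1.0, -1.0, 0.0, 0.0, 0.0]
```

In this made-up example the Cy5 channel is twice as bright overall; after centering, spot 1 reads as a twofold increase, spot 2 as a twofold decrease, and the remaining spots as unchanged.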
The most common combination of labeled nucleotides is Cy3- and Cy5-dUTP. Images produced with these two dyes can be acquired with a laser confocal microscope emitting light at 540 and 650 nm, respectively. Software for quantitation of Cy3/Cy5 hybridization intensities, identification of spots, and data management is available from a number of commercial sources or as downloadable software (see http://industry.ebi.ac.uk/~alan/MicroArray/ or http://cmgm.stanford.edu/pbrown/mguide/software.html).

Quantification of mRNA levels with Affymetrix GeneChips uses fluorescently labeled cRNA as target and confocal scanning microscopy for detection (Fig. 3; Lockhart et al., 1996). The method of cRNA preparation is derived from the antisense RNA amplification technique developed by Eberwine (Kacharmina et al., 1999). The entire population of total or poly(A)+ RNA from cells or tissues is first reverse-transcribed into double-stranded cDNA using an oligo(dT) primer containing the promoter sequence for bacteriophage T7 RNA polymerase. This enzyme is then used to transcribe the cDNAs in vitro into cRNAs in the presence of biotin-labeled CTP and UTP. This latter step produces a linear amplification of the target. Before hybridization, biotinylated cRNAs are fragmented into small pieces to minimize secondary structure formation. Each target cRNA is then hybridized to a different chip. Arrays are subsequently stained with phycoerythrin-conjugated streptavidin, washed, and scanned at a single wavelength. Following scanning, absolute and comparative analyses are completed using Affymetrix GeneChip software (Lockhart et al., 1996).

Each of the methodologies described has specific advantages and limitations that may influence the choice of array for a given experiment. For example, unlike oligonucleotide and cDNA glass arrays, which are single-use only, membrane arrays can be stripped and rehybridized three to five times. The use of radioactive targets with these filters offers high intrinsic sensitivity and a large dynamic range of detection. Expression levels can be measured using 1 to 10 µg of starting total RNA, and relative mRNA abundance can be determined down to 1 in 10,000 or less (Granjeaud et al., 1999). Glass arrays, however, uniquely allow a two-color hybridization approach, which is valuable for directly comparing two samples on the same array. The use of fluorescence and confocal microscopy offers high-resolution detection and quantification of relative abundance down to about 1 in 100,000 (Granjeaud et al., 1999). However, using fluorescently labeled cDNA instead of 33P-labeled cDNA increases the amount of starting total RNA required to 100 to 200 µg (Bowtell, 1999). Amplification of target samples using T7 RNA polymerase, as described in the Affymetrix protocols, reduces this amount to 5 to 10 µg (Lockhart et al., 1996), and this approach can easily be adapted for hybridization to slide arrays (Luo et al., 1999a).
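Because each GeneChip target is hybridized to its own array, intensities from two chips must be brought to a common scale before a ratio between them is meaningful. One simple approach, assumed here and not necessarily the procedure implemented in the Affymetrix software, scales each chip so that its trimmed mean signal equals an arbitrary target value (the 1500 below is a hypothetical choice) before computing fold changes. The function names and signal values are illustrative.

```python
from __future__ import annotations


def trimmed_mean(values: list[float], trim: float = 0.02) -> float:
    """Mean after discarding the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)


def scale_to_target(values: list[float], target: float = 1500.0,
                    trim: float = 0.02) -> list[float]:
    """Multiply every signal on one chip so that its trimmed mean equals `target`."""
    factor = target / trimmed_mean(values, trim)
    return [v * factor for v in values]


def fold_changes(baseline: list[float], treated: list[float],
                 floor: float = 1.0) -> list[float]:
    """Per-gene treated/baseline ratio after scaling both chips to one target."""
    a = scale_to_target(baseline)
    b = scale_to_target(treated)
    # Floor guards against division by very small or negative signals.
    return [max(y, floor) / max(x, floor) for x, y in zip(a, b)]


# Hypothetical chips: the second is uniformly twice as bright (e.g., staining).
print(fold_changes([100.0, 200.0, 400.0, 800.0], [200.0, 400.0, 800.0, 1600.0]))
# [1.0, 1.0, 1.0, 1.0]
```

After scaling, a chip-wide difference in brightness no longer masquerades as a change in expression; only genes that deviate from the chip-wide pattern yield ratios away from 1.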
Although the quality of printed cDNA arrays can vary greatly with the source of the spotted cDNA, direct in situ synthesis of oligonucleotide probes onto glass generates highly reproducible chips. In addition, no other method currently gives such a high density of probes. The redundancy of oligonucleotides representing each gene on these arrays improves the quantification, specificity, and reliability of measurement (Lockhart et al., 1996). Oligonucleotides also offer the theoretical advantage that probes can be chosen to avoid gene regions that are repetitive or homologous to other genes. However, because oligonucleotide probes are designed from sequence information alone, unknown genes cannot be screened with these arrays. Furthermore, design and synthesis of oligonucleotide arrays by photolithography severely limit the ability to easily update or change their content.

2. Summary
DNA array technology offers tools for functional genomics through gene expression analysis and DNA sequencing. All array-based techniques involve hybridization of a complex mixture of labeled nucleic acid targets to a defined set of DNA probes on a chip. In expression studies, targets typically represent the entire mRNA population of a cell or tissue sample. They are prepared by reverse transcription, with or without subsequent in vitro transcription, and are labeled radioactively or fluorescently. Multiple cDNAs or oligonucleotides complementary to specific mRNA species are used as probes on the chip; oligonucleotide probes are typically complementary to a given reference sequence and/or to its subsequences. Array studies mainly involve comparative analyses between test and reference samples and use differences in the pattern and intensity of hybridization signals to extract information on the sequence or abundance of the nucleic acid targets.
C. ANALYSIS OF ARRAY RESULTS
1. Data Management and Elemental Analysis
In many regards, DNA array technology would not be possible without considerable advances in bioinformatics. The complexity of the hardware and the immense size of the datasets involved in microarray technology make computer assistance necessary at every step: manufacturing the arrays, image scanning, data storage and retrieval, data mining and analysis, and, finally, web-based publication. The bioinformatics issues involved in these steps have been addressed in several reviews (Ermolaeva et al., 1998; Bassett et al., 1999; Claverie, 1999; Vingron and Hoheisel, 1999; Zhang, 1999). Here, we focus on data analysis. Simple sorting and filtering of array data is often used to generate lists of genes showing the largest or "most significant" changes due to a given biological perturbation. In many cases, however, biologically interesting changes in gene expression do not exceed twofold, which is currently the common detection limit of the available methods (Audic and Claverie, 1997). Traditional measures of statistical validity are problematic with array data, given the large number of simultaneous observations. Correcting for the number of observations would require a prohibitively large number of experimental repetitions or very stringent statistical thresholds (Claverie, 1999). Hilsenbeck et al. (1999) have proposed using principal component analysis to identify significantly altered expression, but some controversy remains (Wittes and Friedman, 1999). In general, however, there appears to be agreement that multivariate approaches such as clustering (see below) allow detection of significant correlations in expression profiles (Claverie, 1999). These correlated groups of genes or samples allow interrogation of expression changes that, for a single gene or sample, might not reach statistical significance.
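The multiple-comparisons problem described above can be made concrete with a little arithmetic. The numbers below (10,000 genes, a per-gene significance level of 0.05) are illustrative assumptions, the Bonferroni correction is shown as one standard example of a stringent threshold, and the gene names in the fold filter are hypothetical.

```python
from __future__ import annotations


def expected_false_positives(n_genes: int, alpha: float) -> float:
    """Genes expected to pass a per-gene test by chance if no gene truly changes."""
    return n_genes * alpha


def bonferroni_cutoff(n_genes: int, family_alpha: float = 0.05) -> float:
    """Per-gene p-value cutoff keeping the chance of any false positive <= family_alpha."""
    return family_alpha / n_genes


def fold_filter(ratios: dict[str, float], cutoff: float = 2.0) -> list[str]:
    """Genes whose test/reference ratio changes at least `cutoff`-fold either way."""
    return sorted(name for name, r in ratios.items() if r >= cutoff or r <= 1 / cutoff)


print(expected_false_positives(10_000, 0.05))              # 500.0
print(f"{bonferroni_cutoff(10_000):.0e}")                   # 5e-06
print(fold_filter({"geneA": 2.5, "geneB": 0.4, "geneC": 1.3}))  # ['geneA', 'geneB']
```

With 10,000 independent tests at p < 0.05, about 500 genes would be flagged by chance alone, while a Bonferroni-corrected threshold of 5 × 10^-6 per gene is difficult to reach with only a few replicate arrays.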
Additional tools to aid DNA array data analysis include multiple visualization techniques for viewing the same data. Projecting the expression data onto known pathways and genetic circuits provides valuable clues to the functional roles of genes in context (DeRisi et al., 1997). Commercially available programs make this task easier (e.g., GeneSpring by Silicon Genetics, Inc., http://www.sigenetics.com/GeneSpring/Overview.htm).

2. Multivariate Analysis
DNA array technology generates large sets of multivariate data. It is thus natural to use multivariate statistical methods, such as clustering and multidimensional scaling, to organize and visualize these data. A clustering algorithm groups data so that similar items fall into the same cluster. For DNA array data, the most commonly used clustering methods are hierarchical clustering (Eisen et al., 1998; Alon et al., 1999) and self-organizing maps (Tamayo et al., 1999; Törönen et al., 1999). Genes that display similar patterns of expression across a set of experimental conditions may have related biological functions. Ideally, functionally related genes are grouped into the same clusters. By examining gene expression clusters, one may infer functions for uncharacterized genes from partial knowledge of the functions of the other genes in the same cluster. In addition, genes in a given cluster may share mechanisms of regulation, such as common promoter motifs (DeRisi et al., 1997). Efforts have been made to refine the performance of clustering algorithms for array data (Heyer et al., 1999). Clustering can also be applied to discern relationships between the "samples" rather than just the "genes." Using two-dimensional hierarchical clustering approaches, several papers have produced a molecular classification of cancer (Golub et al., 1999; Ross et al., 2000). The expression profile of a given cancer biopsy or cell line contains predictive information about the biological behavior of that sample. Thus, such efforts may produce an improved understanding of cancer biology and of the performance of various treatment modalities. Hierarchical clustering assumes that the association of genes in a cluster reflects an underlying relationship in terms of either function or mechanism of regulation.
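A minimal sketch of agglomerative hierarchical clustering of genes follows. It uses 1 minus the Pearson correlation between expression profiles as the distance and average linkage between clusters, which is in the spirit of, but not necessarily identical to, the method of Eisen et al. (1998). The profiles and gene names are hypothetical, and stopping at a fixed number of clusters is a simplification of building and then cutting the full tree.

```python
from __future__ import annotations

from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two expression profiles (0.0 if one is flat)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def average_linkage(profiles: dict[str, list[float]], n_clusters: int) -> list[set[str]]:
    """Merge the closest pair of clusters until `n_clusters` remain.

    Distance between genes is 1 - Pearson r; distance between clusters is the
    average of all gene-to-gene distances across them (average linkage).
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b]) for a in names for b in names}
    clusters = [frozenset([name]) for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [dist[a, b] for a in clusters[i] for b in clusters[j]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [set(c) for c in clusters]


# Hypothetical expression values across four conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # same shape as geneA, larger amplitude
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.0, 6.0, 4.0, 2.0],   # same shape as geneC
}
print(average_linkage(profiles, 2))
```

A correlation-based distance groups geneA with geneB (and geneC with geneD) because their patterns match even though their magnitudes differ, which is the property that makes this kind of clustering useful for finding coregulated genes.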
For clusters to reflect such relationships, it is important to minimize the spurious inclusion of unrelated genes in a cluster and to eliminate clusters derived from trivial correlations due to an outlier (Heyer et al., 1999). In practice, however, there are multiple sources of potential error in correctly identifying meaningful clusters of genes. First, although the observed expression patterns result from a connected network of molecular interactions, many factors obscure the causal relationships. These factors include (1) regulatory mechanisms operating at the translational level, which are inaccessible from expression data; (2) secondary structure of mRNAs affecting hybridization (Southern et al., 1999); (3) cross-hybridization of homologous sequences; (4) alternative RNA splicing; and (5) sample variation incurred during RNA extraction, labeling, hybridization, and detection. It is therefore remarkable that expression data can reveal many coregulated gene groups using a simple hierarchical clustering algorithm (Eisen et al., 1998). In addition, there is the challenge of sparse data. Here, data are considered sparse in the sense that there are often not enough replicate experiments to assess errors in the data and to demonstrate reproducibility. A statistical analysis of cDNA array hybridizations strongly suggested that at least three replicates be used in designing such experiments (Lee et al., 2000). In contrast, many reported studies contain only duplicate hybridizations, or time course studies with single determinations per time point (DeRisi et al., 1997). Perhaps even more important, the designed experiments often do not monitor enough biological "states." This second kind of sparseness may be less well appreciated. Measuring gene expression under the same biological condition over and over provides no new information beyond a better assessment of data precision. Moreover, if the expression profile (defined as the vector of all measured gene expression levels) monitored at a new "state" is a linear combination of previously measured expression profiles, the new profile does not provide any new information either. This point has been illustrated by Raychaudhuri et al. (2000) using principal components analysis.
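The point about redundant "states" can be stated in linear-algebra terms: the number of independent states in a set of expression profiles is the rank of the matrix whose rows are those profiles. The sketch below computes that rank by Gaussian elimination; the profiles are invented, the function name is illustrative, and the numerical tolerance is an arbitrary choice. Principal components analysis asks a softer version of the same question for noisy data, by measuring how much variance each independent direction explains.

```python
from __future__ import annotations


def matrix_rank(rows: list[list[float]], tol: float = 1e-9) -> int:
    """Rank of a matrix (rows = expression profiles) by Gaussian elimination."""
    m = [[float(v) for v in row] for row in rows]
    if not m:
        return 0
    rank = 0
    for col in range(len(m[0])):
        # Partial pivoting: pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no new independent direction in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


# Hypothetical profiles over five genes.
p1 = [1.0, 0.0, 2.0, 1.0, 3.0]
p2 = [0.0, 1.0, 1.0, 2.0, 1.0]
p3 = [0.5 * a + 2.0 * b for a, b in zip(p1, p2)]  # linear combination: no new state
p4 = [1.0, 1.0, 0.0, 0.0, 0.0]                    # genuinely new state
print(matrix_rank([p1, p2]), matrix_rank([p1, p2, p3]), matrix_rank([p1, p2, p3, p4]))  # 2 2 3
```

Adding the third profile leaves the rank at 2, which is the formal sense in which it "does not provide any new information"; only the fourth profile adds an independent state.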
This showed that expression array data containing a 7-point time series could be roughly decomposed into two principal components (i.e., only two independent "states" were being monitored). With only two states monitored, clustering of the gene expression data has limited resolving power; the boundaries between clusters of functional groups are nearly invisible in the space of expression profiles. Many statistical alternatives to hierarchical clustering and SOMs are also being studied (Spanakis and Brouty-Boyé, 1997; Golub et al., 1999; Vingron and Hoheisel, 1999). A simple linear model was developed to describe the temporal expression data from CNS development and CNS injury (D'Haeseleer et al., 1999). In this model, the change in each gene's expression is a linear combination of the expression levels of all genes on the chip. The method was successful in that it could reproduce the original data and reveal major functional gene interactions. Another interesting observation made in this study is that a gene tends to act as either a positive or a negative regulator but rarely as both, which seems consistent with what is commonly known about genetic networks.
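Raychaudhuri et al. (2000) used principal components analysis. A simpler way to illustrate the underlying point, that redundant "states" add no information, is to count how many linearly independent expression profiles a dataset contains. The minimal sketch below assumes noise-free data. It builds a hypothetical 7-point time series from only two underlying patterns and computes its numerical rank by Gaussian elimination. The gene values, the weights and the helper `numerical_rank` are all invented for illustration. Real array data are noisy. For that reason PCA, which ranks components by the variance they explain instead of requiring exact linear dependence, is the practical tool.

```python
# Hypothetical illustration: 7 time points whose expression profiles (over 6 genes)
# are all built from just two underlying patterns ("states").

def numerical_rank(rows, tol=1e-9):
    """Rank of a matrix given as a list of equal-length rows, computed by
    Gaussian elimination with partial pivoting. Pivots smaller than `tol`
    are treated as zero, so exact linear combinations add nothing."""
    m = [[float(v) for v in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        # choose the remaining row with the largest entry in this column
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # this column adds no independent direction
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank

base_a = [2.0, 1.0, 0.5, 0.0, -1.0, -2.0]   # hypothetical "early" pattern
base_b = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0]    # hypothetical "late" pattern
weights = [(1.0, 0.0), (0.8, 0.3), (0.6, 0.6), (0.4, 0.9),
           (0.2, 1.0), (0.1, 0.7), (0.0, 0.5)]
# each time point is a linear combination of the two base patterns
profiles = [[a * x + b * y for x, y in zip(base_a, base_b)] for a, b in weights]

novel = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0]       # a genuinely new pattern

print(numerical_rank(profiles))                   # 2: seven points, two states
print(numerical_rank(profiles + [profiles[0]]))   # 2: a replicate adds no state
print(numerical_rank(profiles + [novel]))         # 3: a new state adds information
```

The replicate row leaves the rank unchanged, just as the text argues for repeated measurements under one condition. Only a profile outside the span of the earlier ones raises it.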
An analysis evaluated the use of support vector machines (SVMs) to characterize expression array data (Brown et al., 2000). SVMs are a supervised machine learning method. Unlike hierarchical clustering, an SVM begins with a training set that specifies in advance which data should cluster together. The SVM then learns to discriminate between the members and the nonmembers of a given functional group. The method allows a researcher to start with a set of interesting genes and ask two questions: What other genes are coexpressed with my set? Does my set contain genes that do not belong? Brown et al. (2000) showed that SVMs worked well for some functional groups but not for others. Evidently, this is because not all functional groups display a "theme" in their expression profiles that can be captured by SVMs. Regardless of the difficulties entailed in efficiently grouping genes by methods such as hierarchical clustering, a landmark study provided proof of principle for using array analysis to derive new biological information regarding the function of a gene or a given drug treatment (Hughes et al., 2000). Using a large matrix of yeast array data derived from analysis of 300 diverse mutations and chemical treatments, this study showed that uncharacterized genes could be correctly classified according to their position in the cluster diagram. Functionally relevant clustering of genes could be derived even for subtle changes in expression (<1.5-fold), such as those involving mitochondrial respiration genes. Furthermore, a potential novel mechanism of action was derived for dyclonine, a commonly used local anesthetic. Overall, this study strongly validates the power of collecting large DNA array datasets to generate novel functional information about uncharacterized genes or pharmacological agents.
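To make the supervised idea concrete, the following is a minimal sketch of a linear SVM trained with the Pegasos stochastic subgradient method on the regularized hinge loss. It is not the kernel SVM setup evaluated by Brown et al. (2000). The gene profiles, the gene names and the parameter values (`lam`, `epochs`) are hypothetical. Members of a functional group are labeled +1 and nonmembers -1. The learned weight vector is then used to score genes outside the training set, which corresponds to asking which other genes are coexpressed with the set.

```python
def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.
    No bias term (assumes log-ratio profiles roughly centered at zero) and no
    optional projection step; samples are visited in a fixed order, so the
    result is deterministic."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)                       # decreasing step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]    # regularizer shrink
            if margin < 1.0:                            # hinge loss is active
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def decision(w, x):
    """Positive score = predicted member of the functional group."""
    return sum(wj * xj for wj, xj in zip(w, x))

# Hypothetical log-ratio profiles across three conditions.
members = [[2.0, 1.5, -1.0], [1.8, 1.2, -0.8], [2.2, 1.7, -1.2]]
nonmembers = [[-1.5, -1.0, 1.2], [-2.0, -1.4, 0.9], [-1.7, -0.9, 1.4]]
X = members + nonmembers
y = [1] * len(members) + [-1] * len(nonmembers)

w = train_linear_svm(X, y)

# Score genes that were not in the training set (names are hypothetical).
candidates = {"geneX": [1.9, 1.3, -0.9], "geneY": [-1.8, -1.2, 1.0]}
for name, profile in candidates.items():
    print(name, "member" if decision(w, profile) > 0 else "nonmember")
# geneX member
# geneY nonmember
```

Omitting the bias keeps the sketch short. A fuller treatment would add a bias term and, as in the published work, would typically try nonlinear kernels.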
DNA arrays are still a young technology. Many problems faced by researchers are actually rooted in the high cost, limited availability, and limited reproducibility of the arrays. As the technology matures, the manufacturing costs of the arrays and equipment are expected to decrease and the quality of the arrays to improve. Moreover, as more and more data are accumulated and shared by research laboratories in academia as well as in industry, "the array of hope" (Lander, 1999) is expected to flourish in the postgenomic era.
III. Applications in Neurobiology
A. GENE PROFILING IN NEURONAL CELLS
The phenotype of any given cell type is directly related to the set of genes it expresses. The level and timing of expression of these genes usually dictate development, differentiation, function, and physiology. Thus, determining the pattern of genes expressed in a cell should provide information on its state and function. Using traditional cellular and molecular techniques, an enormous amount of work is usually needed to identify which genes are expressed in a cell of interest and what their biological functions are within this cell. DNA array-based expression analysis promises to greatly facilitate these studies. It offers, for the first time, the possibility not only to assess the identity and expression level of thousands of genes simultaneously, but also to investigate how these genes interact with each other. Over the past few years, genome-wide transcriptional profiling has rapidly become an essential tool for studying cell function and regulation. Below, we describe the use of this technology to fingerprint neuronal cells and to characterize their molecular responses to drugs.
1. Analysis of Differential Gene Expression between Neuronal Subtypes
The nervous system is characterized by a tremendous variety of cellular phenotypes. Each cell subtype, whether neuronal or glial, has specific structural and functional properties. Neuron subpopulations are typically classified according to their location in the nervous system, their morphology, their electrophysiological properties, the neurotransmitter they synthesize, or the receptors they express. By studying differential expression between cell types or neuronal subtypes, DNA array technology should help identify genes that are selectively expressed in only a few cell types. Identification of these genes should provide information on the contribution of an individual cell type to a particular biological function or physiological state. In vitro, it should help characterize neuronal cell lines commonly used as cell models, whereas in vivo, it may provide a basis for defining cellular diversity in the nervous system. In addition, the identification of coordinate expression of a group of genes in a restricted set of cells may indicate functional coupling between their encoded proteins and therefore help define their function. Profiling homogeneous cultured cells such as neuroblastoma cell lines is relatively simple in a technical sense. However, to assess cell-type-specific gene expression in an organ such as the brain, it is preferable to analyze gene expression at the single-cell level. This now appears possible with the advent of antisense RNA (aRNA) amplification methods (Kacharmina et al., 1999). The sequential application of this technique allows amplification of the entire mRNA population from a single cell to levels sufficient for DNA array hybridization. It can be applied not only to single cells in culture, but also to individual cells in situ in fixed tissue (Kacharmina et al., 1999).
As described in the previous section, the mRNA population of a single cell is first reverse transcribed into double-stranded cDNA in the presence of an oligo(dT) primer containing the promoter of T7 RNA polymerase. The cDNAs are then transcribed in vitro into aRNA. A second and third round of amplification can then be carried out, using the aRNA produced as template for new cDNA synthesis and in vitro transcription. Luo et al. (1999a) used this approach to analyze gene expression profiles not from a single cell but from 1000 neurons simultaneously. The authors integrated laser capture microdissection, aRNA amplification, and cDNA microarray technologies to examine differential gene expression between neighboring large (>50 μm) and small (<25 μm) neurons in the dorsal root ganglia (DRG). These two populations of neurons are functionally distinct: large neurons transmit mechanosensory information, whereas small neurons convey nociceptive stimuli. Neurons of each population were microdissected from Nissl-stained rat DRG sections by laser capture microdissection (Luo et al., 1999a). Transcriptional profiles were subsequently compared using a printed cDNA array containing 477 cDNA clones. The authors identified 40 mRNA species preferentially expressed in either large or small neurons. Although the DRG contains a much greater diversity of neurons than just small and large ones, this study constitutes a first step in correlating gene expression profiles with specific neuronal function. DNA array technology is fairly new, and we are still far from being able to extract cell-type-specific gene expression patterns from the data generated to date. This will require profiling of hundreds of different cell types. However, it is not unreasonable to predict that such analyses will be possible in the future. In particular, current efforts seek to establish public repositories of expression data where consistent normalization schemes and relational database structures will allow the definition of expression profiles correlated with given cell types or physiological states (http://www.ncbi.nlm.nih.gov/geo/).
2. Characterization of Cellular Effects of Drugs or Ligands
Cell function and survival are under constant regulation by autocrine, paracrine, or endocrine signals. Perhaps nowhere is the regulation of cellular phenotype by extracellular signals as complex as in the nervous system. Activation of membrane and nuclear receptors by extracellular stimuli is known to ultimately lead to changes in gene expression. Because of complex cross-talk between signaling pathways, it is often difficult to identify target genes activated or repressed by these stimuli. Although useful, traditional candidate gene approaches, subtractive hybridization, and differential display methods are greatly limited by the number of genes that can be monitored at a time. DNA array technology now provides tools to examine changes in gene expression on a genome-wide scale and offers, for the first time, the opportunity to study transcriptional changes in the context of the complex networks operating in a cell.
Although not conducted on neuronal cell models, several studies illustrate the power of DNA array technology to study signaling events. Der et al. (1998) generated gene expression profiles from interferon (IFN)-α, -β, and -γ treatment of the human fibrosarcoma cell line HT1080. These different interferons are believed to bind their respective receptors and subsequently activate the JAK/STAT signaling cascade. This study successfully identified known IFN-stimulated genes but also revealed unsuspected IFN-specific inductions, as well as novel IFN-regulated genes, such as those coding for apoptosis regulators. Iyer et al. (1999) investigated the response of human fibroblasts to serum, which contains most of the growth factors necessary for proliferation in culture. In this case, printed cDNA arrays were used to measure the temporal changes in expression of 8613 human genes after addition of serum-containing medium to quiescent primary fibroblasts. As expected, addition of serum induced changes in numerous genes known or likely to be involved in controlling and mediating the proliferative response. Strikingly, however, serum also triggered changes in multiple genes with known roles in wound healing, suggesting a previously underappreciated role of fibroblasts in wound repair. In a third example of using DNA arrays to study ligand-induced signaling, Fambrough and colleagues (1999) studied growth factor receptor induction of immediate early genes. Surprisingly, these investigators showed that multiple signaling cascades regulated by the platelet-derived growth factor receptor produced overlapping sets of immediate early genes. Again, such results display the ability of DNA arrays to provide a nonbiased, and often surprising, analysis of expression patterns rather than single-gene data.
Applied to neuronal cell biology, DNA array technology could help gain a better understanding of the cellular effects of neurotransmitters, neuronal growth factors, or CNS-acting drugs. Thus, in our laboratory, we have used oligonucleotide arrays to examine the cellular effects of ethanol in a cultured human neuronal cell line (Thibault et al., 2000). Identification of ethanol-responsive genes has been difficult, perhaps due to the pleiotropic nature of this drug. Indeed, ethanol has been shown to modulate the function of multiple neurotransmitter receptors and several signaling cascades (Diamond and Gordon, 1997). By screening 6000 genes simultaneously, we identified more than 40 genes reproducibly up- or down-regulated in SH-SY5Y neuroblastoma cells after 3 days of ethanol treatment. Among the genes up-regulated by ethanol were multiple genes involved in norepinephrine biosynthesis, such as dopamine beta-hydroxylase. We verified that ethanol indeed increased releasable norepinephrine in these cultures. Numerous other studies have suggested a possible role for norepinephrine in ethanol-related behaviors. Our array studies also provided mechanistic information by showing that >30% of the ethanol-responsive genes were also regulated in a similar fashion by the cyclic AMP analog dibutyryl cAMP. This result strongly suggested that cAMP signaling could be involved in a significant portion of the responses to ethanol. Thus, these studies demonstrated that arrays can contribute, in a nonbiased fashion, to understanding a "phenotype" by identifying multiple members of a biological pathway, and can generate mechanistic information by comparison with known modulators of specific signaling cascades. DNA array technology also provides an efficient way to test the specificity of therapeutic drugs based on the transcriptional responses they trigger. It may be used to clarify their mechanism of action, predict their potential side effects, or define their efficacy. In an elegant study, Gray et al. (1998) integrated combinatorial chemistry and DNA array technologies to investigate the specificity and affinity of various cyclin-dependent kinase (cdk) inhibitors in yeast. This study showed that two drugs intended to inhibit the same cellular process, cellular proliferation, produced distinct transcriptional profiles despite their similar in vitro activity (i.e., inhibition of Cdc28). Such an approach would undoubtedly be useful to better characterize commonly used receptor agonists or antagonists.
3. Identification of Transcription Factor Targets
Transcription factors are key players in orchestrating cellular changes in gene expression. Following their activation through receptor stimulation, they recognize and bind specific DNA sequences in the promoters of target genes and directly activate or repress their transcription. One possible application of DNA array technology is the characterization of all genes whose expression changes when a transcription factor is mutated. The first study of this type was reported by DeRisi et al. (1997), who identified all yeast genes whose expression is affected by deletion of the Tup1 transcription factor gene. In another study, in neural cells, downstream gene targets of the Gsh-1 homeobox gene were screened for in murine cell lines derived from embryonic hypothalamus and hindbrain (Li et al., 1999). Gsh-1 is expressed in several discrete regions of the developing brain and is believed to play a role in dorsal-ventral patterning of the CNS. The authors used the Gsh-1 promoter to drive expression of the SV40 T antigen gene in Gsh-1 null mice. Introduction of this gene allowed the immortalization of cells that normally express Gsh-1. By stably transfecting a tet-inducible DNA construct coding for Gsh-1 into clonal Gsh-1−/− cell lines, they were able to compare gene expression profiles between clonal cells expressing or not expressing Gsh-1. These transcriptional analyses suggested a role for Gsh-1 in the regulation of genes involved in cell growth, differentiation, and patterning. Many transcription factors are of particular interest because of their recognized or potential association with human diseases. For example, mutation in the homeobox gene HESX1 has been linked to septo-optic dysplasia, among other conditions (Dattani et al., 1998). Similarly, several transcription factors, including p53, have been associated with the development of human neoplasia (Levine, 1997). DNA microarray technology now provides the opportunity to study the broad effects of these transcription factors on gene expression and potentially elucidate their role in pathogenesis. The main challenge of such studies remains the interpretation of the data. Indeed, not all genes that change after mutation, deletion, or overexpression of a transcription factor are necessarily direct targets. Some of these changes may simply reflect adaptation of the cells to the absence or overexpression of the gene of interest. Alternatively, they may result from a secondary or tertiary effect of the mutated gene. The use of promoter sequence information from the GenBank database could facilitate the cataloging of actual transcription factor targets through identification of upstream regulatory sequences. Cho et al. (1998) characterized the expression profiles of all yeast genes during the cell cycle. These genes were subsequently clustered according to their expression patterns over time. Alignment of the promoter sequences of coregulated genes further led to the identification of previously undetected upstream regulatory elements. As more sequence and transcription data become available, such approaches promise to become increasingly efficient.
4. Summary
Quantification of mRNA levels has long been used as a tool to study gene function and regulation. Knowing where and when a gene is expressed, and under what conditions its expression is modulated, generally provides strong clues about its function and its contribution to cellular processes. Most studies have focused on the characterization of one or a few genes at a time. Recent work in neural and nonneural cells has shown that DNA arrays can provide information regarding changes in cellular phenotype, as well as mechanistic insight into operative signaling cascades or transcription factors.
B. GENE PROFILING IN THE BRAIN
One of the greatest tasks of neuroscience is to determine how the brain mediates sensory and motor functions, as well as more elaborate processes such as emotions or learning. Traditionally, neuroanatomy and (electro)physiology studies have been the methods of choice to locate brain activity in relation to behavior. Imaging techniques such as positron emission tomography and functional magnetic resonance imaging have permitted the generation of new overall maps of brain activity in a noninvasive manner (Volkow et al., 1997). These technologies have yielded valuable insights into the biological interrelation of sensory, motor, and cognitive functions, as well as brain disease. However, they give little information on the molecular mechanisms involved in the generation of these maps. Advances in molecular biology gave rise to a more reductionist approach aimed at correlating single genes with a given behavior. However, few if any behaviors are regulated by a single gene, and it is essential to understand the contribution of all genes at the system level. DNA array technology now offers a new and complementary paradigm for studying brain organization and function. By profiling the expression of thousands of genes simultaneously in brain subregions during development or in response to experience or treatments, array studies should help integrate molecular and functional neurobiology. A major caveat to the use of DNA arrays for studying brain gene expression concerns the sensitivity of the method. Existing literature on cDNA or oligonucleotide arrays generally quotes a sensitivity of 1:100,000 for detecting a twofold change in expression (Bertucci et al., 1999a). This corresponds to a level of ~3 molecules/cell, assuming that a typical cell has about 3 × 10⁵ mRNA molecules. Considering the heterogeneity of cell types (neuronal and nonneuronal) in the brain, this implies that a rare transcript expressed in a small subset of neurons will likely go undetected by existing array technology. Methods to address this issue are discussed below (Section IV.A).
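The sensitivity arithmetic above can be made explicit. The sketch below is a back-of-the-envelope calculation under the stated assumptions: 3 × 10⁵ mRNAs per cell, a 1:100,000 detection limit, and, as an added simplification, the same total mRNA content in every cell. It converts the detection limit into copies per cell. It then shows how dilution by nonexpressing cells pushes a transcript below that limit. The example transcript, with 30 copies per expressing cell in 5% of cells, is hypothetical. In a tissue extract that transcript is present at about 1 in 200,000.

```python
# Back-of-the-envelope array sensitivity, using the figures quoted in the text.
MRNA_PER_CELL = 3e5            # assumed total mRNA molecules in a typical cell
DETECTION_LIMIT = 1 / 100_000  # quoted sensitivity: 1 transcript in 100,000

def copies_per_cell_at_limit(mrna_per_cell=MRNA_PER_CELL, limit=DETECTION_LIMIT):
    """Copies per cell that correspond to the detection limit in a homogeneous sample."""
    return mrna_per_cell * limit

def tissue_abundance(copies_per_expressing_cell, fraction_expressing,
                     mrna_per_cell=MRNA_PER_CELL):
    """Relative abundance of a transcript in a tissue extract when only a fraction
    of the cells express it (assumes every cell has the same total mRNA)."""
    return copies_per_expressing_cell * fraction_expressing / mrna_per_cell

def is_detectable(abundance, limit=DETECTION_LIMIT):
    return abundance >= limit

print(round(copies_per_cell_at_limit(), 6))   # 3.0 copies per cell
rare = tissue_abundance(30, 0.05)    # hypothetical: 30 copies/cell in 5% of cells
print(f"{rare:.1e}", is_detectable(rare))     # 5.0e-06 False
broad = tissue_abundance(30, 1.0)    # same transcript expressed in every cell
print(f"{broad:.1e}", is_detectable(broad))   # 1.0e-04 True
```

The same transcript is easily detected when every cell expresses it. It falls below the limit once it is confined to a small subpopulation, which is the situation the text anticipates for brain tissue.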
1. Regional and Temporal Gene Expression Mapping
The nervous system is anatomically highly organized. Furthermore, its structural organization closely reflects its functional organization. There is compelling evidence that different regions of the brain are specialized for different functions. One classic example of this functional regionalization concerns the neocortex, which can be divided into primary sensory and motor cortices, higher order sensory and motor cortices, and cortical association areas, each specializing in different functions. However, it also appears that even the simplest behavior involves multiple parallel neural systems and pathways (sensory, motor, and motivational) in the brain. A clear understanding of the anatomical organization of the brain is essential to the understanding of behavior in both normal and disease states. Neurons of different brain regions with unique functions are predicted to have different patterns of gene expression. Comparison of gene expression profiles between brain regions using DNA array technology may thus be used as a tool to define the molecular neuroanatomy of the brain. Such an approach was reported using oligonucleotide arrays to study expression of 11,000 genes/ESTs in murine brain (Sandberg et al., 2000). These investigators compared expression patterns across six different brain regions in two different inbred mouse lines, 129SvEv and C57BL/6. They found that approximately 1% of expressed genes were differentially expressed between the two mouse lines in at least one brain region. Most importantly, these studies identified candidate genes that could contribute to phenotypic differences between the two mouse lines, namely, their different susceptibility to seizures. This work not only showed the feasibility of doing array studies on microdissected brain tissue, but also displayed the ability of arrays to characterize expression changes dictated by different genetic backgrounds. This could have important implications for the study of gene-targeted mouse lines. The different anatomical and functional areas of the brain are patterned according to a precise developmental plan involving neural induction, neuronal cell migration, and segregation. The study of neuronal development has proven a valuable approach to understanding brain organization. DNA arrays allow a detailed analysis of changes in gene expression associated with neuronal development. Wen et al. (1998) generated a temporal map of gene expression during the development of rat cervical spinal cord from embryonic day 11 to postnatal day 12. This study was conducted using reverse transcription and PCR to study expression levels for 112 genes. Wen et al. (1998) showed that functionally related genes display remarkably similar patterns of expression during this period of spinal cord development. Although most of the genes in this study had known functions, one can imagine that similar studies using arrays with many thousands of genes will produce novel information regarding the function of "unknown" genes by correlation with the expression profiles of known genes.
2. Molecular Characterization of Neuronal Plasticity
One of the key properties of the brain is its ability to reorganize structurally and functionally in response to experience. This phenomenon, known as neuronal plasticity, may involve changes in synaptic activity, either through alterations in intrinsic properties of existing synapses/neurons or by structural modification of synapses. This synaptic reorganization ultimately produces behavioral changes. Among the most studied factors that can trigger neuronal plasticity are brain injury and aging, learning and memory, and drug addiction. It is now clear that long-term neuronal plasticity involves complex transcriptional reprogramming (Nguyen et al., 1994; Nestler and Aghajanian, 1997). However, attempts to correlate behavior-related changes in synaptic function with changes in the expression of single or even limited numbers of genes have generally been inconclusive. DNA arrays, through detection of changes in the pattern of expression for large numbers of genes, may provide new insight into molecular events underlying neuronal plasticity. In our laboratory, we are currently using oligonucleotide arrays to characterize transcriptional changes accompanying the development of sensitization to drugs of abuse, such as ethanol and cocaine, in mice. Sensitization is defined as an increase in behavioral responses to repeated intermittent administration of a given dose of a drug (Robinson and Berridge, 1993). Increases in locomotor activity are commonly used as a measure of sensitization. Importantly, sensitization appears to increase the rewarding effect of abused drugs (Horger et al., 1990, 1992). Sensitization persists long after cessation of drug intake and is believed to involve long-term changes in gene expression in different regions of the brain, including the dopaminergic mesolimbic system (Nestler and Aghajanian, 1997). To characterize these changes, we have compared transcriptional profiles in the nucleus accumbens and ventral tegmental area of sensitized and control mice. Initial studies show a striking reorganization of the molecular responses to cocaine in sensitized versus naive animals (Wang et al., 1999b). It is hoped that the identification of gene pathways participating in the initiation or expression of sensitization will provide novel targets for intervention in some aspects of drug addiction. Learning and memory constitute another form of plasticity that has been shown to require gene expression reprogramming in the hippocampus and amygdala. Two reports used DNA arrays to characterize the molecular basis of memory formation and decline, respectively (Dubnau et al., 1999; Luo et al., 1999b). Luo et al. (1999b) compared transcriptional profiles of the hippocampus of young and old rats subjected to a T-maze learning protocol to characterize the decline of memory function with age. Dubnau et al. (1999) used oligonucleotide arrays to study memory formation in Drosophila melanogaster by using a combination of behavioral training protocols and various mutant flies with disrupted memory function. These few studies illustrate how DNA array technology can be used to gain a better understanding of the molecular basis of neuronal plasticity.
Future studies should involve careful cellular mapping and testing of the contribution of particular candidate genes to the development or maintenance of neuronal plasticity. This can be achieved through their manipulation in knockout or transgenic animals. Alternatively, specific pharmacological agents may be used to block the expression of a candidate gene or the function of the encoded protein. Identification of a specific pattern of gene expression causally linked to a form of neuronal plasticity could provide much more functional information regarding the mechanism of plasticity than identifying a single gene.
3. Gene Profiling in Animal Models of Neuronal Abnormalities
A great deal of what we know about brain function has been obtained through studies on animal models. Because of the large number of inbred strains available (more than 450), the mouse has long been a model of choice. Many of these strains were originally bred for specific phenotypes, and many have been used to study the genetics of simple or complex traits. Crosses between phenotypically different inbred strains are used for mapping quantitative and qualitative trait loci (Crabbe et al., 1994). However, the identification of causal polymorphisms remains a major challenge in the genetics of complex traits. Expression profiling with DNA arrays may offer a major advance for such studies by allowing the identification of "patterns" of gene expression changes associated with a given mouse phenotypic model. Such a pattern may impart functional information not easily discerned by studying a single gene. In some cases, this might largely obviate the need to isolate the polymorphism(s) in a single gene contributing to a given animal model.
4. Gene Profiling of Human Neurological Diseases
DNA array technology is well adapted for studying the complex molecular changes occurring during the development and progression of chronic neurological diseases. It provides simple tools to identify novel markers for disease detection and targets for therapeutic approaches. The suitability of this technology for profiling diseases is now well documented in the literature. The most extensive studies have focused on the transcriptional changes associated with the development of various cancers, including renal, colon, prostate, and ovarian cancers (Alon et al., 1999; Bubendorf et al., 1999; Moch et al., 1999; Wang et al., 1999a). One study also examined differential gene expression between normal brain tissue and glioblastoma multiforme tumor tissue in humans using Clontech macroarrays (Sehgal et al., 1998). Of the 143 genes detected in these tissues, more than 100 were differentially expressed. The majority of genes overexpressed in normal brain were oncogenes and tumor-suppressor genes, such as the retinoblastoma gene, as well as genes coding for DNA-binding proteins. Conversely, genes coding for cell surface proteins or proteins involved in signal transduction were preferentially overexpressed in tumor compared with normal tissue. Because of the phenotypic and genetic heterogeneity of tumors, it often appears necessary to survey changes in gene expression in a large number of tumor specimens. A few studies report the combined use of cDNA and tissue microarray technologies to facilitate the identification of clinically relevant genes (Bubendorf et al., 1999; Moch et al., 1999). Candidate genes were first isolated through DNA array screening of one tumor specimen. Changes in their expression were then confirmed by immunohistochemistry on tissue microarrays containing a large number of different fixed tumor samples.
By determining which changes occur at higher frequency among tumors, one can distinguish genes with the most promising clinical applications. Others have used clustering of transcriptional profiles from multiple tumor specimens to catalog genes that are potentially coregulated or have similar cellular functions. Clustering and display methods have also proved useful for the molecular classification of cancers based on their gene expression patterns (Ross et al., 2000). Such an approach has obvious applications for diagnostic testing. DNA array technology has been applied to the study of several neurological diseases. Whitney et al. (1999) compared transcriptional profiles of normal white matter and acute multiple sclerosis lesions from the brain of a single patient. In a more extensive study of frontal cortex tissue from control and alcoholic subjects, Lewohl et al. (1999) documented the coordinate down-regulation of multiple myelin-related genes in the tissue from alcoholics. This study employed two independent groups of controls and alcoholics, each consisting of multiple subjects. Decreased expression of myelin-related genes in alcoholics may have important pathophysiological implications for the cognitive dysfunction and demyelinating disorders suffered by alcoholics (Miles and Diamond, 1998). Although array-based transcriptional analyses potentially provide long lists of candidate genes, the interpretation of the changes observed remains difficult. Pathological conditions are often the cumulative result of genetic susceptibility factors and multiple damage and compensatory responses in many different cell types. Tremendous work will be needed to evaluate the contribution of each candidate gene to the initiation or progression of the disease of interest. Comparison with results from appropriate animal models of disease may greatly aid interpretation of array results from human disease tissue.
5. Summary
DNA array studies on brain tissue have only begun to be reported. Initial results are promising in terms of arrays providing novel insight into brain function and dysfunction in disease. However, considerable caution in interpreting results seems warranted, given the enormous complexity of the brain and important issues regarding array sensitivity (see Section IV.A).
C. APPLICATIONS IN NEUROGENETICS
Within the last few months, the Human Genome Project has released the first complete reference sequence of the human chromosomes. Already, many groups are concentrating on identifying sequence variations among individuals and between species. Genotypic variations are believed to underlie most of the phenotypic differences in normal and disease states. Numerous diseases directly affecting the nervous system have a genetic basis. Some of these disorders have been shown to involve mutation in a single gene;
yet many arise from mutations at multiple loci. For example, Huntington's disease results from a characteristic expansion of a CAG repeat in the huntingtin gene, whereas mutations in four genes, including presenilin 1 and 2, have been implicated in the development of Alzheimer's disease. Although some disease genes carry a single recurrent mutation, many can be affected by a large spectrum of distinct mutations. DNA array technology has been proposed as a new high-throughput tool for exhaustive screening of mutations in these complex disease genes. Hacia et al. (1998) used high-density oligonucleotide arrays to carry out a mutational analysis of the ATM gene linked to the autosomal recessive disorder ataxia-telangiectasia (AT). AT is a pleiotropic disease characterized by immunodeficiency, radiation sensitivity, genetic instability, and gradual loss of Purkinje cells in the cerebellum, leading to progressive neuromotor deterioration. The responsible gene encodes the ATM protein, a 350-kDa kinase with sequence homology to phosphatidylinositol 3-kinases (PI3K). This protein is believed to play a crucial role in signaling pathways that respond to DNA strand breaks, such as those arising in meiosis, genetic recombination, or apoptosis. The ATM gene has a complex genomic structure, spanning 146 kb of genomic DNA and containing 62 exons. More than 100 distinct ATM mutations, widely distributed throughout the gene, have been documented in AT patients. Because of the large size of this gene and the diversity and broad distribution of mutations that can affect it, diagnostic mutation screening has been difficult. Hacia et al. (1998) therefore designed a pair of DNA arrays containing more than 95,000 oligonucleotides, aimed at detecting all possible heterozygous germ-line sequence variations in the ATM coding sequence.
Each position in the gene was interrogated with 10 separate 25-mer oligonucleotides, 5 for each of the sense and antisense strands (2 wild type and 3 containing a base substitution at their central position). In a blinded study, 17 of 18 known distinct heterozygous and 8 of 8 different homozygous sequence variants were detected. In addition, 5 new mutations were identified and previous genotyping assignments were corrected. Similar strategies were used to screen for sequence variations in BRCA1, a gene important in breast cancer (Hacia et al., 1996). Apart from sequence variations, abnormalities in DNA copy number also contribute to numerous genetic disorders. For example, developmental abnormalities such as Down, Prader-Willi, and Angelman syndromes result from the gain or loss of one copy of a chromosome or chromosomal region. Mapping the genes involved in these abnormalities and determining their copy number is essential for understanding disease phenotypes. Comparative genomic hybridization (CGH) to microarrays constitutes a new method for investigating these DNA copy number aberrations. Geschwind et al. (1998) used this approach to test gene dosage in patients with Klinefelter
syndrome (KS), a sex chromosome abnormality. Affected males carry an additional X chromosome (XXY), which results in hypogonadism, androgen deficiency, and impaired spermatogenesis. A number of neurological abnormalities have also been recognized in these patients. The data presented in Geschwind's study, together with previous reports, suggest that KS patients have specific verbal learning disabilities, an increase in left-handedness when measured by skill, and abnormal functional laterality for phonologic processing (Geschwind et al., 1998). Such phenotypes suggest an atypical pattern of cerebral laterality or anomalous dominance. The authors hypothesized that alterations in gene dosage in the pseudoautosomal region (PAR) of the sex chromosomes were responsible for anomalous cerebral laterality in these patients. They developed an ordered DNA microarray of inter-Alu fragments covering the X chromosome PAR at high resolution. The ratio of gene dosage in the X chromosome PAR was compared in four separate XY:XY and XXY:XX hybridizations. As expected, the ratio was 1:1 in XY:XY hybridizations, whereas it was close to 1.4:1 in XXY:XX hybridizations. The authors proposed that such an approach might eventually lead to the identification of autosomal loci contributing to structural or functional cerebral dominance.
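The 1.4:1 ratio can be compared with a simple expectation. Because the pseudoautosomal region lies on both the X and the Y chromosome, an XY or XX genome carries two copies of it and an XXY genome carries three. If hybridization signal scaled linearly with copy number, the XXY:XX ratio would therefore be 3:2, or 1.5:1. The minimal sketch below makes that arithmetic explicit. It assumes the linear relationship, its function names are hypothetical, and it is not the analysis performed by Geschwind et al. (1998).

```python
# Hypothetical helpers; assumes hybridization signal scales linearly with the
# number of copies of the pseudoautosomal region (PAR) in each genome.
def par_copies(karyotype: str) -> int:
    """Count PAR copies: the PAR lies on both X and Y, so every sex chromosome counts."""
    return sum(1 for chrom in karyotype if chrom in "XY")


def expected_ratio(test: str, reference: str) -> float:
    """Expected test:reference signal ratio for PAR probes under the linear model."""
    return par_copies(test) / par_copies(reference)


for test, reference in [("XY", "XY"), ("XXY", "XX")]:
    print(f"{test}:{reference} -> expected {expected_ratio(test, reference):.2f} : 1")
```

Under this idealized model the observed value of close to 1.4:1 falls just below the theoretical 1.5:1.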
IV. Caveats and Future Needs
A. SENSITIVITY AND REPRODUCIBILITY
Because of its novelty, the limitations of DNA array technology are still unclear, and a number of basic questions remain to be answered. Although numerous studies have now demonstrated the power of this technology in model organisms such as yeast, its performance for gene expression monitoring in complex tissues such as the brain is still uncertain. Perhaps the major question regarding the use of arrays for expression studies on the brain concerns sensitivity. The limit of detection for cDNA or oligonucleotide arrays is generally stated as 1-3 mRNA copies per mammalian cell, assuming each cell contains an average of 300,000 individual mRNA molecules. However, an estimated 85% of mammalian genes are expressed at very low abundance. Furthermore, many mRNAs of interest in the brain, such as those coding for neurotrophins or neurotransmitter-related proteins, are often detected only in restricted populations of cells. Because the brain is a highly heterogeneous tissue, the ability to reliably quantitate a rare mRNA expressed in a small subset of neurons among the many other types of neurons and glia remains unproven for array studies. Methods to improve the sensitivity and dynamic range of arrays are clearly needed.
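The dilution problem can be made concrete with a little arithmetic. The sketch below takes the figures quoted above (about 300,000 mRNA molecules per cell and a detection limit of 1-3 copies per cell) and adds one hypothetical transcript. That transcript is present at 20 copies per cell, but only in a small fraction of the cells in a dissected region. Its numbers are illustrative assumptions, not measurements.

```python
MRNA_PER_CELL = 300_000  # average mRNA molecules per mammalian cell (figure from the text)


def relative_abundance(copies_per_cell: float) -> float:
    """Fraction of a cell's total mRNA pool contributed by one transcript."""
    return copies_per_cell / MRNA_PER_CELL


def tissue_average_copies(copies_in_expressing_cells: float, fraction_expressing: float) -> float:
    """Average copies per cell once expressing and non-expressing cells are pooled."""
    return copies_in_expressing_cells * fraction_expressing


# Quoted detection limit: 1-3 copies per cell.
for limit in (1, 3):
    print(f"{limit} copies/cell = 1 in {1 / relative_abundance(limit):,.0f} mRNA molecules")

# Hypothetical transcript: 20 copies per expressing cell, in 5% or 2% of the cells.
for fraction in (0.05, 0.02):
    avg = tissue_average_copies(20, fraction)
    print(f"{fraction:.0%} of cells expressing -> {avg:.1f} copies per cell on average")
```

With 5% of cells expressing it, the hypothetical transcript averages 1 copy per cell across the sample, at the very edge of detection. At 2% it averages 0.4 copies per cell and would be missed, even though each expressing neuron contains it at a relatively high level.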
Microdissection techniques, such as laser capture microdissection, appear to be feasible approaches for improving the analysis of genes with restricted expression patterns. Such techniques also serve to decrease the complexity of the RNA populations under analysis. However, the quantitative robustness of such methods has not been established. Another strategy would be to isolate only mRNA molecules that are localized in certain cellular compartments (e.g., polyribosome-bound or synaptosomal mRNA) or in cell types that can be easily sorted (e.g., by fluorescence-activated cell sorting). A recent report documents the use of such approaches to identify secreted and membrane-associated gene products (Diehn et al., 2000). Most array technologies are suited to detecting changes in expression on the order of twofold or more. It is likely, however, that smaller changes in expression are biologically relevant. Such changes are technically much harder to distinguish, particularly for genes with low expression levels. In traditional Northern blot or RT-PCR studies, where one or a few genes are monitored simultaneously, repeat experiments and statistical analyses are standard for estimating confidence in the observed changes. Variability between replicates is often quite large, and multiple data points are generally needed to distinguish true differences in expression from changes due to experimental variability. Mainly because of cost, many array experiments have been performed at most in duplicate. The accuracy of measurement in these experiments may therefore be questionable, and confirmation of the most salient changes by more traditional methods may be required. New statistical approaches to analysis are greatly needed. Furthermore, establishing suitable methods for comparing array data between different laboratories and array platforms is crucial.
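One common way to use replicate hybridizations is to test whether the mean log ratio for a gene differs from zero. The sketch below computes per-replicate log2 ratios and a one-sample t statistic for a single gene. The intensity values are hypothetical, and this generic test is offered as an illustration, not as the specific method of any study cited here (see Lee et al., 2000, for a discussion of replication in array studies).

```python
import math
import statistics


def log2_ratios(treated, control):
    """Per-replicate log2(treated/control) expression ratios."""
    return [math.log2(t / c) for t, c in zip(treated, control)]


def one_sample_t(values):
    """t statistic for H0: mean log2 ratio == 0 (no change). Needs at least 2 values."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return mean / (sd / math.sqrt(n))


# Hypothetical signal intensities for one gene in four replicate hybridizations.
treated = [1500.0, 1320.0, 1710.0, 1420.0]
control = [1000.0, 1050.0, 980.0, 1020.0]

r = log2_ratios(treated, control)
print("log2 ratios:", [round(x, 2) for x in r])
print("geometric-mean fold change: %.2f" % 2 ** statistics.mean(r))
print("t = %.2f with %d degrees of freedom" % (one_sample_t(r), len(r) - 1))
```

The t statistic has n - 1 degrees of freedom, and this is where the cost of duplicates shows up. The two-sided 5% critical value is 12.71 with one degree of freedom (duplicates) but 3.18 with three (four replicates). The same mean ratio must therefore be far more consistent across duplicates to reach significance. Because thousands of genes are tested at once, some correction for multiple comparisons would also be needed in practice.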
B. INTERPRETATION AND USE OF THE DATA
As discussed above (Section II.C), one of the greatest challenges in large-scale expression analyses is to organize the deluge of data generated and to extract functional information. Most studies conducted on mammalian systems involve pairwise comparisons between two conditions, such as normal versus disease or control versus drug treated. A typical approach to selecting candidate genes in such settings is based on the ratio of expression between the two samples. For example, all genes whose expression deviates from that of the control by more than twofold, or by any other arbitrary percentage, can be selected. The end product of such selection is often a limited but still complex list of genes. A natural first step in using this information is to focus on the extremes. Although useful, this approach does not exploit the full potential of DNA array studies.
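The ratio-based selection described above reduces to a few lines of code. In the sketch below the gene names and signal values are hypothetical. Ratios are compared on a log2 scale so that a twofold increase and a twofold decrease are treated symmetrically, and the resulting list is sorted so that the most extreme changes come first.

```python
import math


def select_by_fold_change(control, treated, threshold=2.0):
    """Return (gene, ratio) pairs whose treated/control ratio is at least
    `threshold` or at most 1/`threshold`, most extreme changes first.

    control, treated: dicts mapping gene name -> positive expression signal.
    """
    cutoff = math.log2(threshold)
    hits = []
    for gene, c in control.items():
        t = treated.get(gene)
        if t is None:
            continue  # skip genes not measured in both samples
        if abs(math.log2(t / c)) >= cutoff:
            hits.append((gene, t / c))
    # Sort by the size of the change on a log scale, regardless of direction.
    hits.sort(key=lambda hit: abs(math.log2(hit[1])), reverse=True)
    return hits


# Hypothetical signals for a control and a drug-treated sample.
control = {"geneA": 100.0, "geneB": 200.0, "geneC": 300.0, "geneD": 50.0}
treated = {"geneA": 450.0, "geneB": 90.0, "geneC": 420.0, "geneD": 20.0}
for gene, ratio in select_by_fold_change(control, treated):
    print(f"{gene}: {ratio:.2f}-fold")
```

Working in log space is a deliberate choice. On a linear scale a 2.0 ratio and a 0.5 ratio look asymmetric, whereas their log2 values (+1 and -1) are equidistant from zero.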
The ultimate goal in large-scale analysis of gene expression is to group genes according to similar expression patterns and to extract functional and causal relationships between genes. Indeed, it is generally assumed that genes that follow similar patterns of expression across a range of conditions or time points are likely to share common regulatory mechanisms or to participate in similar or complementary functions. It is expected that computational methods, such as cluster analysis combined with graphical representation, will help establish such relationships. However, methods for optimizing the performance of such multivariate analyses of array data are still being developed (Heyer et al., 1999). Nevertheless, a number of interpretable patterns of gene expression have been obtained in yeast. This organism represents the ultimate model for gene expression analysis because all of its genes have been identified and sequenced and can be interrogated in a single hybridization. Furthermore, functions have already been attributed to a large number of these genes. The interpretation of DNA array studies in mammalian systems represents a much greater challenge. The majority of the probes used on DNA arrays correspond to uncharacterized ESTs. The incomplete coverage of the genome and the lack of gene function information complicate interpretation of the data. In addition, although establishing causality between genes in homogeneous samples such as yeast or tumor cell lines is more straightforward, studies on heterogeneous tissues may produce inextricable patterns. Another level of complexity inherent to the nervous system arises from the intricate neuronal networks essential for brain function. It is unclear how such complexity can be addressed by DNA array technology.
For example, primary responses in one neuron will be intermixed with compensatory responses from other neurons "downstream" in a given network. Deciphering such complexity will require continued genetic and pharmacological approaches combined with array studies.
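As an illustration of the kind of cluster analysis discussed above, the sketch below groups genes by the similarity of their expression profiles across conditions. It follows the general idea of the correlation-based, average-linkage hierarchical clustering used by Eisen et al. (1998). However, it is a naive, hypothetical implementation suited only to small inputs, not their software, and the gene names, profiles, and 0.8 similarity cutoff are all assumptions.

```python
import statistics
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant expression profiles."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_linkage(profiles, min_similarity=0.8):
    """Agglomerative clustering with average-linkage Pearson similarity.

    profiles: dict mapping gene -> expression values across conditions or time points.
    Repeatedly merges the two most similar clusters until no pair has an average
    pairwise correlation of at least `min_similarity`.
    """
    clusters = [[gene] for gene in profiles]

    def similarity(a, b):
        return statistics.mean(pearson(profiles[g], profiles[h]) for g in a for h in b)

    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), similarity(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < min_similarity:
            break
        clusters[i].extend(clusters[j])  # j > i, so index i is unaffected by the deletion
        del clusters[j]
    return clusters


# Hypothetical expression profiles over four time points.
profiles = {
    "geneA": [1, 2, 4, 8],
    "geneB": [2, 3, 5, 9],  # same shape as geneA, shifted upward
    "geneC": [8, 4, 2, 1],  # mirror image of geneA
    "geneD": [9, 5, 2, 1],
    "geneE": [1, 5, 1, 5],  # oscillating, unlike the others
}
for cluster in average_linkage(profiles):
    print(cluster)
```

Pearson correlation groups genes by the shape of their response rather than by absolute expression level: geneA and geneB differ by a constant offset yet correlate perfectly. Co-expression of this kind suggests, but does not establish, shared regulation.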
C. SHARING ARRAY DATA
It is already abundantly clear that DNA array studies will require a new era of cooperation among scientists of various disciplines. The data are simply too massive and complex for any one laboratory to extract even a portion of the significant patterns within a reasonable time. The Brown laboratory has again set the pace for how the field should proceed. Its posting of articles, figures, software, statistical analyses, and raw data has become the standard for how array data are shared (http://cmgm.stanford.edu/pbrown/). The National Center for Biotechnology Information (NCBI) has established a web site for depositing array data (http://www.ncbi.nlm.nih.gov/geo/). This "Gene Expression Omnibus" (GEO) seeks to compile data from a variety
of array platforms. The mechanics of this are obviously complicated, insofar as expression patterns of a given gene must be made comparable across platforms. Furthermore, the issue of data integrity is likely to become very complex, as the quality of array studies varies greatly. To date, only 10 samples are contained in the GEO database. The posting of array data has numerous benefits. Constructing a "compendium" of array results, as has been done for yeast (Hughes et al., 2000), should allow much more rigorous multivariate analyses, with resultant identification of important functional or regulatory patterns. It is impractical for individual laboratories to construct such a large resource. Thus, submission of array data to a common site, or widespread exchange of data, may allow efforts to generate such a compendium of array data for brain samples. In addition, the widespread availability of array datasets will speed the development of new analysis tools. Several studies of new approaches to array analysis take advantage of data posted on public web sites (Heyer et al., 1999).
D. COST AND AVAILABILITY OF ARRAYS
Regardless of the type of platform, access to DNA arrays has been limited for several reasons. In the case of commercial oligonucleotide arrays, the major factor has been cost. The first generation of such chips for human or mouse studies, consisting of ~6500 genes/ESTs, had a list price of about $3000 per array. This was compounded by large start-up costs for equipment. These factors have since improved greatly. Facilities for the hybridization and analysis of oligonucleotide arrays are now available at numerous academic centers, and commercial resources are also available. Furthermore, as of May 2000 the price had decreased to approximately $1500 per array for arrays containing ~11,000 human, mouse, or rat genes/ESTs. This price is even lower for universities having academic licensing agreements or other high-use contracts. Spotted cDNA arrays are dramatically cheaper on a per-array basis. Some centers have quoted labor/material costs of about $25 per array for arrays containing 6000-10,000 genes/ESTs. However, there is substantial overhead in labor and equipment for amplifying cDNA inserts, arraying PCR fragments, scanning arrays, and performing sophisticated analysis. Furthermore, maintaining reproducible, high-quality array production has not proven to be a trivial task. Thus, substantial infrastructure is required to produce quality spotted cDNA arrays. Again, some of this has improved since the late 1990s, with a great increase in the number of options available for arrayers, scanners, and other equipment. Furthermore, several large collections of cDNA clones have become available through a variety of sources, including commercial vendors (e.g., http://www.incyte.com/).
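Normalizing the quoted prices by array content makes the platforms easier to compare. The sketch below divides each per-array price given above by the number of genes/ESTs on the array, taking the low end (6000) of the spotted-array range. It also estimates the array cost of a small hypothetical experiment with two conditions and four replicates each. As noted above, the spotted-array figure covers only labor and materials and omits substantial infrastructure costs.

```python
# Per-array prices (US$) and approximate genes/ESTs per array, as quoted in the text.
# The spotted cDNA entry uses the low end of the quoted 6000-10,000 range and
# covers labor/materials only (no equipment or infrastructure overhead).
PLATFORMS = {
    "oligonucleotide, first generation": (3000, 6500),
    "oligonucleotide, May 2000": (1500, 11000),
    "spotted cDNA, labor/materials only": (25, 6000),
}


def cost_per_gene(price: float, genes: int) -> float:
    """Array price divided by the number of genes/ESTs it interrogates."""
    return price / genes


def experiment_cost(price: float, conditions: int, replicates: int) -> float:
    """Array cost assuming one single-sample array per condition per replicate.

    Two-color spotted arrays can hybridize both samples on one array,
    which would halve the array count for a two-condition comparison.
    """
    return price * conditions * replicates


for name, (price, genes) in PLATFORMS.items():
    print(f"{name}: ${cost_per_gene(price, genes):.3f} per gene/EST, "
          f"${experiment_cost(price, conditions=2, replicates=4):,.0f} for 8 arrays")
```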
V. Conclusion
The explosion of studies being performed with DNA arrays is no accident. Despite considerable challenges in the setup, performance, and analysis of such studies, a large number of commercial and academic investigators have begun array studies. This seems largely driven by the power of the approach to achieve nonbiased, novel insights into phenotypic or regulatory relationships among all genes in a given genome. Hence, the term "functional genomics" is not merely a catchphrase for a new technique but rather describes an entirely new scientific discipline, one that promises to greatly accelerate our understanding of disease, physiology, and drug action. Although application of array studies to the nervous system poses several new challenges with regard to sensitivity and complexity of the analysis, early results again suggest great power for expression profiling. The rapid development of new methodologies for RNA isolation, array preparation, and data analysis/retrieval is highly likely, given the exponential growth in array studies. Given some of the special issues regarding expression profiling in neurobiology, it seems likely that a subdiscipline of "functional neurogenomics" may eventually develop. Finally, perhaps the most exciting aspect of DNA array studies is the tremendous integrative and collaborative nature of such work. This approach is highly suited to neuroscience, a discipline long versed in the value of integrative science.
Acknowledgments
This work was supported by a grant provided by the State of California for medical research on alcohol and substance abuse through the University of California, San Francisco. The authors want to thank other members of Miles' laboratory for their assistance preparing this manuscript.
References
Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D., and Levine, A. J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. USA 96, 6745-6750.
Audic, S., and Claverie, J. M. (1997). The significance of digital gene expression profiles. Genome Res. 7, 986-995.
Bassett, D. E., Jr., Eisen, M. B., and Boguski, M. S. (1999). Gene expression informatics--it's all in your mine. Nat. Genet. 21, 51-55.
Behr, M. A., Wilson, M. A., Gill, W. P., Salamon, H., Schoolnik, G. K., Rane, S., and Small, P. M. (1999). Comparative genomics of BCG vaccines by whole-genome DNA microarray [see comments]. Science 284, 1520-1523.
Bertucci, F., Bernard, K., Loriod, B., Chang, Y. C., Granjeaud, S., Birnbaum, D., Nguyen, C., Peck, K., and Jordan, B. R. (1999a). Sensitivity issues in DNA array-based expression measurements and performance of nylon microarrays for small samples. Hum. Mol. Genet. 8, 1715-1722.
Bertucci, F., Van Hulst, S., Bernard, K., Loriod, B., Granjeaud, S., Tagett, R., Starkey, M., Nguyen, C., Jordan, B., and Birnbaum, D. (1999b). Expression scanning of an array of growth control genes in human tumor cell lines. Oncogene 18, 3905-3912.
Bowtell, D. D. (1999). Options available--from start to finish--for obtaining expression data by microarray. Nat. Genet. 21, 25-32.
Brown, M. P., Grundy, W. N., Lin, D., Cristianini, N., Sugnet, C. W., Furey, T. S., Ares, M., Jr., and Haussler, D. (2000). Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc. Natl. Acad. Sci. USA 97, 262-267.
Bubendorf, L., Kolmer, M., Kononen, J., Koivisto, P., Mousses, S., Chen, Y., Mahlamäki, E., Schraml, P., Moch, H., Willi, N., Elkahloun, A. G., Pretlow, T. G., Gasser, T. C., Mihatsch, M. J., Sauter, G., and Kallioniemi, O. P. (1999). Hormone therapy failure in human prostate cancer: Analysis by complementary DNA and tissue microarrays. J. Natl. Cancer Inst. 91, 1758-1764.
Chee, M., Yang, R., Hubbell, E., Berno, A., Huang, X. C., Stern, D., Winkler, J., Lockhart, D. J., Morris, M. S., and Fodor, S. P. (1996). Accessing genetic information with high-density DNA arrays. Science 274, 610-614.
Chen, J. J., Wu, R., Yang, P. C., Huang, J. Y., Sher, Y. P., Han, M. H., Kao, W. C., Lee, E J., Chin, T. E, Chang, E, Chu, Y. W., Wu, C. W., and Peck, K. (1998). Profiling expression patterns and isolating differentially expressed genes by cDNA microarray system with colorimetry detection. Genomics 51, 313-324.
Cho, R. J., Campbell, M. J., Winzeler, E. A., Steinmetz, L., Conway, A., Wodicka, L., Wolfsberg, T. G., Gabrielian, A. E., Landsman, D., Lockhart, D. J., and Davis, R. W. (1998). A genome-wide transcriptional analysis of the mitotic cell cycle. Mol. Cell 2, 65-73.
Claverie, J. M. (1999). Computational methods for the identification of differential and coordinated gene expression. Hum. Mol. Genet. 8, 1821-1832.
Crabbe, J. C., Belknap, J. K., and Buck, K. J. (1994). Genetic animal models of alcohol and drug abuse. Science 264, 1715-1723.
D'Haeseleer, P., Wen, X., Fuhrman, S., and Somogyi, R. (1999). Linear modeling of mRNA expression levels during CNS development and injury. Pac. Symp. Biocomput. 90, 41-52.
Dattani, M. T., Martinez-Barbera, J. P., Thomas, P. Q., Brickman, J. M., Gupta, R., Mårtensson, I. L., Toresson, H., Fox, M., Wales, J. K., Hindmarsh, P. C., Krauss, S., Beddington, R. S., and Robinson, I. C. (1998). Mutations in the homeobox gene HESX1/Hesx1 associated with septo-optic dysplasia in human and mouse. Nat. Genet. 19, 125-133.
Der, S. D., Zhou, A., Williams, B. R., and Silverman, R. H. (1998). Identification of genes differentially regulated by interferon alpha, beta, or gamma using oligonucleotide arrays. Proc. Natl. Acad. Sci. USA 95, 15623-15628.
DeRisi, J., Penland, L., Brown, P. O., Bittner, M. L., Meltzer, P. S., Ray, M., Chen, Y., Su, Y. A., and Trent, J. M. (1996). Use of a cDNA microarray to analyse gene expression patterns in human cancer [see comments]. Nat. Genet. 14, 457-460.
DeRisi, J. L., Iyer, V. R., and Brown, P. O. (1997). Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278, 680-686.
Diamond, I., and Gordon, A. S. (1997). Cellular and molecular neuroscience of alcoholism. Physiol. Rev. 77, 1-20.
Diehn, M., Eisen, M. B., Botstein, D., and Brown, P. O. (2000). Large-scale identification of secreted and membrane-associated gene products using DNA microarrays. Nat. Genet. 25, 58-62.
Drmanac, S., Kita, D., Labat, I., Hauser, B., Schmidt, C., Burczak, J. D., and Drmanac, R. (1998). Accurate sequencing by hybridization for DNA diagnostics and individual genomics. Nat. Biotechnol. 16, 54-58.
Dubnau, J., Certa, U., Gossweiler, S., Broger, C., Neeb, M., Yin, J., Mous, J., and Tully, T. (1999). Functional genomics of long-term memory. Soc. Neurosci. Abstr. 25, 1313.
Eisen, M. B., and Brown, P. O. (1999). DNA arrays for analysis of gene expression. Methods Enzymol. 303, 179-205.
Eisen, M. B., Spellman, P. T., Brown, P. O., and Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14863-14868.
Ermolaeva, O., Rastogi, M., Pruitt, K. D., Schuler, G. D., Bittner, M. L., Chen, Y., Simon, R., Meltzer, P., Trent, J. M., and Boguski, M. S. (1998). Data management and analysis for gene expression arrays. Nat. Genet. 20, 19-23.
Fambrough, D., McClure, K., Kazlauskas, A., and Lander, E. S. (1999). Diverse signaling pathways activated by growth factor receptors induce broadly overlapping, rather than independent, sets of genes [see comments]. Cell 97, 727-741.
Fodor, S. P., Rava, R. P., Huang, X. C., Pease, A. C., Holmes, C. P., and Adams, C. L. (1993). Multiplexed biochemical assays with biological chips. Nature 364, 555-556.
Fodor, S. P., Read, J. L., Pirrung, M. C., Stryer, L., Lu, A. T., and Solas, D. (1991). Light-directed, spatially addressable parallel chemical synthesis. Science 251, 767-773.
Geschwind, D. H., Gregg, J., Boone, K., Karrim, J., Pawlikowska-Haddal, A., Rao, E., Ellison, J., Ciccodicola, A., Durso, M., Woods, R., Rappold, G. A., Swerdloff, R., and Nelson, S. F. (1998). Klinefelter's syndrome as a model of anomalous cerebral laterality: Testing gene dosage in the X chromosome pseudoautosomal region using a DNA microarray. Dev. Genet. 23, 215-229.
Golub, T. R., Slonim, D. K., Tamayo, P., Huard, C., Gaasenbeek, M., Mesirov, J. P., Coller, H., Loh, M. L., Downing, J. R., Caligiuri, M. A., Bloomfield, C. D., and Lander, E. S. (1999). Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286, 531-537.
Granjeaud, S., Bertucci, F., and Jordan, B. R. (1999). Expression profiling: DNA arrays in many guises. BioEssays 21, 781-790.
Gray, N. S., Wodicka, L., Thunnissen, A. M., Norman, T. C., Kwon, S., Espinoza, F. H., Morgan, D. O., Barnes, G., LeClerc, S., Meijer, L., Kim, S. H., Lockhart, D. J., and Schultz, P. G. (1998). Exploiting chemical libraries, structure, and genomics in the search for kinase inhibitors. Science 281, 533-538.
Hacia, J. G., Brody, L. C., Chee, M. S., Fodor, S. P., and Collins, F. S. (1996). Detection of heterozygous mutations in BRCA1 using high density oligonucleotide arrays and two-colour fluorescence analysis [see comments]. Nat. Genet. 14, 441-447.
Hacia, J. G., Fan, J. B., Ryder, O., Jin, L., Edgemon, K., Ghandour, G., Mayer, R. A., Sun, B., Hsie, L., Robbins, C. M., Brody, L. C., Wang, D., Lander, E. S., Lipshutz, R., Fodor, S. P., and Collins, F. S. (1999). Determination of ancestral alleles for human single-nucleotide polymorphisms using high-density oligonucleotide arrays [see comments]. Nat. Genet. 22, 164-167.
Hacia, J. G., Sun, B., Hunt, N., Edgemon, K., Mosbrook, D., Robbins, C., Fodor, S. P., Tagle, D. A., and Collins, F. S. (1998). Strategies for mutational analysis of the large multiexon ATM gene using high-density oligonucleotide arrays. Genome Res. 8, 1245-1258.
Heyer, L. J., Kruglyak, S., and Yooseph, S. (1999). Exploring expression data: Identification and analysis of coexpressed genes. Genome Res. 9, 1106-1115.
Hilsenbeck, S. G., Friedrichs, W. E., Schiff, R., O'Connell, P., Hansen, R. K., Osborne, C. K., and Fuqua, S. A. (1999). Statistical analysis of array expression data as applied to the problem of tamoxifen resistance [see comments]. J. Natl. Cancer Inst. 91, 453-459.
Horger, B. A., Giles, M. K., and Schenk, S. (1992). Preexposure to amphetamine and nicotine predisposes rats to self-administer a low dose of cocaine. Psychopharmacology 107, 271-276.
Horger, B. A., Shelton, K., and Schenk, S. (1990). Preexposure sensitizes rats to the rewarding effects of cocaine. Pharmacol. Biochem. Behav. 37, 707-711.
Hughes, T. R., Marton, M. J., Jones, A. R., Roberts, C. J., Stoughton, R., Armour, C. D., Bennett, H. A., Coffey, E., Dai, H., He, Y. D., Kidd, M. J., King, A. M., Meyer, M. R., Slade, D., Lum, P. Y., Stepaniants, S. B., Shoemaker, D. D., Gachotte, D., Chakraburtty, K., Simon, J., Bard, M., and Friend, S. H. (2000). Functional discovery via a compendium of expression profiles. Cell 102, 109-126.
Iyer, V. R., Eisen, M. B., Ross, D. T., Schuler, G., Moore, T., Lee, J. C. F., Trent, J. M., Staudt, L. M., Hudson, J., Jr., Boguski, M. S., Lashkari, D., Shalon, D., Botstein, D., and Brown, P. O. (1999). The transcriptional program in the response of human fibroblasts to serum [see comments]. Science 283, 83-87.
Kacharmina, J. E., Crino, P. B., and Eberwine, J. (1999). Preparation of cDNA from single cells and subcellular regions. Methods Enzymol. 303, 3-18.
Khan, J., Bittner, M. L., Saal, L. H., Teichmann, U., Azorsa, D. O., Gooden, G. C., Pavan, W. J., Trent, J. M., and Meltzer, P. S. (1999). cDNA microarrays detect activation of a myogenic transcription program by the PAX3-FKHR fusion oncogene. Proc. Natl. Acad. Sci. USA 96, 13264-13269.
Khodarev, N. N., Advani, S. J., Gupta, N., Roizman, B., and Weichselbaum, R. R. (1999). Accumulation of specific RNAs encoding transcriptional factors and stress response proteins against a background of severe depletion of cellular RNAs in cells infected with herpes simplex virus 1. Proc. Natl. Acad. Sci. USA 96, 12062-12067.
Lander, E. S. (1999). Array of hope. Nat. Genet. 21, 3-4.
Lee, M. L., Kuo, F. C., Whitmore, G. A., and Sklar, J. (2000). Importance of replication in microarray gene expression studies: Statistical methods and evidence from repetitive cDNA hybridizations. Proc. Natl. Acad. Sci. USA 97, 9834-9839.
Levine, A. J. (1997). p53, the cellular gatekeeper for growth and division. Cell 88, 323-331.
Lewohl, J. M., Miles, M. F., Wang, L., Wilke, N., Fan, L., Wilce, P. A., Dodd, P. R., and Harris, R. A. (1999). Differential gene expression in the frontal cortex of human alcoholics. Soc. Neurosci. Abstr. 25, 1325.
Li, H., Schrick, J. J., Fewell, G. D., MacFarland, K. L., Witte, D. P., Bodenmiller, D. M., Hsieh-Li, H. M., Su, C. Y., and Potter, S. S. (1999). Novel strategy yields candidate Gsh-1 homeobox gene targets using hypothalamus progenitor cell lines. Dev. Biol. 211, 64-76.
Lipshutz, R. J., Fodor, S. P., Gingeras, T. R., and Lockhart, D. J. (1999). High density synthetic oligonucleotide arrays. Nat. Genet. 21, 20-24.
Lipshutz, R. J., Morris, D., Chee, M., Hubbell, E., Kozal, M. J., Shah, N., Shen, N., Yang, R., and Fodor, S. P. (1995). Using oligonucleotide probe arrays to access genetic diversity. BioTechniques 19, 442-447.
Lockhart, D. J., Dong, H., Byrne, M. C., Follettie, M. T., Gallo, M. V., Chee, M. S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H., and Brown, E. L. (1996). Expression monitoring by hybridization to high-density oligonucleotide arrays [see comments]. Nat. Biotechnol. 14, 1675-1680.
Loftus, S. K., Chen, Y., Gooden, G., Ryan, J. F., Birznieks, G., Hilliard, M., Baxevanis, A. D., Bittner, M., Meltzer, P., Trent, J., and Pavan, W. (1999). Informatic selection of a neural crest-melanocyte cDNA set for microarray analysis. Proc. Natl. Acad. Sci. USA 96, 9277-9280.
Luo, L., Salunga, R. C., Guo, H., Bittner, A., Joy, K. C., Galindo, J. E., Xiao, H., Rogers, K. E., Wan, J. S., Jackson, M. R., and Erlander, M. G. (1999a). Gene expression profiles of laser-captured adjacent neuronal subtypes. Nat. Med. 5, 117-122.
Luo, Y., Spangler, E. L., Boyer, S., Ingram, D. K., and Weng, N.-P. (1999b). Hippocampal gene expression analysis of young and aged rats in complex maze learning by cDNA microarray. Soc. Neurosci. Abstr. 25, 2164.
Miles, M. F., and Diamond, I. (1998). Neurologic complications of alcoholism and alcohol abuse. In "Systemic Diseases, Part II, Handbook of Clinical Neurology" (P. J. Vinken and G. W. Bruyn, eds.), pp. 339-365. Elsevier, Amsterdam.
Moch, H., Schraml, P., Bubendorf, L., Mirlacher, M., Kononen, J., Gasser, T., Mihatsch, M. J., Kallioniemi, O. P., and Sauter, G. (1999). High-throughput tissue microarray analysis to evaluate genes uncovered by cDNA microarray screening in renal cell carcinoma [see comments]. Am. J. Pathol. 154, 981-986.
Nestler, E. J., and Aghajanian, G. K. (1997). Molecular and cellular basis of addiction. Science 278, 58-63.
Nguyen, C., Rocha, D., Granjeaud, S., Baldit, M., Bernard, K., Naquet, P., and Jordan, B. R. (1995). Differential gene expression in the murine thymus assayed by quantitative hybridization of arrayed cDNA clones. Genomics 29, 207-216.
Nguyen, P. V., Abel, T., and Kandel, E. R. (1994). Requirement of a critical period of transcription for induction of a late phase of LTP. Science 265, 1104-1107.
Ollila, J., and Vihinen, M. (1998). Stimulation of B and T cells activates expression of transcription and differentiation factors. Biochem. Biophys. Res. Commun. 249, 475-480.
Pease, A. C., Solas, D., Sullivan, E. J., Cronin, M. T., Holmes, C. P., and Fodor, S. P. (1994). Light-generated oligonucleotide arrays for rapid DNA sequence analysis. Proc. Natl. Acad. Sci. USA 91, 5022-5026.
Piétu, G., Mariage-Samson, R., Fayein, N. A., Matingou, C., Eveno, E., Houlgatte, R., Decraene, C., Vandenbrouck, Y., Tahi, F., Devignes, M. D., Wirkner, U., Ansorge, W., Cox, D., Nagase, T., Nomura, N., and Auffray, C. (1999). The Genexpress IMAGE knowledge base of the human brain transcriptome: A prototype integrated resource for functional and computational genomics. Genome Res. 9, 195-209.
Rajeevan, M. S., Dimulescu, I. M., Unger, E. R., and Vernon, S. D. (1999). Chemiluminescent analysis of gene expression on high-density filter arrays. J. Histochem. Cytochem. 47, 337-342.
Raychaudhuri, S., Stuart, J. M., and Altman, R. B. (2000). Principal components analysis to summarize microarray experiments: Application to sporulation time series. Pac. Symp. Biocomput. 455-466.
Robinson, T. E., and Berridge, K. C. (1993). The neural basis of drug craving: An incentive-sensitization theory of addiction. Brain Res. Rev. 18, 247-291.
Ross, D. T., Scherf, U., Eisen, M. B., Perou, C. M., Rees, C., Spellman, P., Iyer, V., Jeffrey, S. S., Van de Rijn, M., Waltham, M., Pergamenschikov, A., Lee, J. C. F., Lashkari, D., Shalon, D., Myers, T. G., Weinstein, J. N., Botstein, D., and Brown, P. O. (2000). Systematic variation in gene expression patterns in human cancer cell lines. Nat. Genet. 24, 227-235.
Sandberg, R., Yasuda, R., Pankratz, D. G., Carter, T. A., Del Rio, J. A., Wodicka, L., Mayford, M., Lockhart, D. J., and Barlow, C. (2000). From the cover: Regional and strain-specific gene expression mapping in the adult mouse brain [in process citation]. Proc. Natl. Acad. Sci. USA 97, 11038-11043.
Schena, M., Shalon, D., Davis, R. W., and Brown, P. O. (1995). Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467-470.
Schena, M., Shalon, D., Heller, R., Chai, A., Brown, P. O., and Davis, R. W. (1996). Parallel human genome analysis: Microarray-based expression monitoring of 1000 genes. Proc. Natl. Acad. Sci. USA 93, 10614-10619.
Sehgal, A., Boynton, A. L., Young, R. F., Vermeulen, S. S., Yonemura, K. S., Kohler, E. E, Aldape, H. C., Simrell, C. R., and Murphy, G. P. (1998). Application of the differential hybridization
of Atlas human expression arrays technique in the identification of differentially expressed genes in human glioblastoma multiforme tumor tissue. J. Surg. Oncol. 67, 234-241.
Shalon, D., Smith, S. J., and Brown, P. O. (1996). A DNA microarray system for analyzing complex DNA samples using two-color fluorescent probe hybridization. Genome Res. 6, 639-645.
Song, S., MacLachlan, T. K., Meng, R. D., and El-Deiry, W. S. (1999). Comparative gene expression profiling in response to p53 in a human lung cancer cell line. Biochem. Biophys. Res. Commun. 264, 891-895.
Southern, E., Mir, K., and Shchepinov, M. (1999). Molecular interactions on microarrays. Nat. Genet. 21, 5-9.
Spanakis, E., and Brouty-Boyé, D. (1997). Discrimination of fibroblast subtypes by multivariate analysis of gene expression. Int. J. Cancer 71, 402-409.
Tamayo, P., Slonim, D., Mesirov, J., Zhu, Q., Kitareewan, S., Dmitrovsky, E., Lander, E. S., and Golub, T. R. (1999). Interpreting patterns of gene expression with self-organizing maps: Methods and application to hematopoietic differentiation. Proc. Natl. Acad. Sci. USA 96, 2907-2912.
Thibault, C., Lai, C., Wilke, N., Duong, B., Olive, M. F., Rahman, S., Dong, H. D. E L., and Miles, M. F. (2000). Expression profiling of neural cells reveals specific patterns of ethanol-responsive gene expression. Mol. Pharmacol. 58, 1593-1600.
Törönen, P., Kolehmainen, M., Wong, G., and Castrén, E. (1999). Analysis of gene expression data using self-organizing maps. FEBS Lett. 451, 142-146.
Vingron, M., and Hoheisel, J. (1999). Computational aspects of expression data. J. Mol. Med. 77, 3-7.
Volkow, N. D., Wang, G.-J., Fischman, M. W., Foltin, R. W., Fowler, J. S., Abumrad, N. N., Vitkun, S., Logan, J., Gatley, S. J., Pappas, N., Hitzemann, R., and Shea, C. E. (1997). Relationship between subjective effects of cocaine and dopamine transporter occupancy. Nature 386, 827-830.
Wang, K., Gan, L., Jeffery, E., Gayle, M., Gown, A. M., Skelly, M., Nelson, P. S., Ng, W. V., Schummer, M., Hood, L., and Mulligan, J. (1999a). Monitoring gene expression profile changes in ovarian carcinomas using cDNA microarray. Gene 229, 101-108.
Wang, L., Ravindranathan, A., Lai, C., Thibault, C., Wilke, N., Olive, M. F., Lockhart, D. J., Hodge, C. W., and Miles, M. F. (1999b). Molecular analysis of gene expression in behavioral sensitization to cocaine using high-density oligonucleotide arrays. Soc. Neurosci. Abstr. 25, 812.
Wen, X., Fuhrman, S., Michaels, G. S., Carr, D. B., Smith, S., Barker, J. L., and Somogyi, R. (1998). Large-scale temporal gene expression mapping of central nervous system development. Proc. Natl. Acad. Sci. USA 95, 334-339.
Whitney, L. W., Becker, K. G., Tresser, N. J., Caballero-Ramos, C. I., Munson, P. J., Prabhu, V. V., Trent, J. M., McFarland, H. F., and Biddison, W. E. (1999). Analysis of gene expression in multiple sclerosis lesions using cDNA microarrays. Ann. Neurol. 46, 425-428.
Wittes, J., and Friedman, H. P. (1999). Searching for evidence of altered gene expression: A comment on statistical analysis of microarray data [editorial; comment]. J. Natl. Cancer Inst. 91, 400-401.
Wodicka, L., Dong, H., Mittmann, M., Ho, M. H., and Lockhart, D. J. (1997). Genome-wide expression monitoring in Saccharomyces cerevisiae. Nat. Biotechnol. 15, 1359-1367.
Zhang, M. Q. (1999). Large-scale gene expression data analysis: A new challenge to computational biologists. Genome Res. 9, 681-688.