Molecular Genetics and Metabolism 70, 10–18 (2000) doi:10.1006/mgme.2000.2989, available online at http://www.idealibrary.com
Positional Cloning Utilizing Genomic DNA Microarrays: The Niemann–Pick Type C Gene as a Model System Dietrich A. Stephan,* ,1 Yidong Chen,* Yuan Jiang,* Lindsay Malechek,* Jessie Z. Gu,† Christiane M. Robbins,* Michael L. Bittner,* Jill A. Morris,‡ Eugene Carstea,‡ Paul S. Meltzer,* Karl Adler,§ Russell Garlick,§ Jeffrey M. Trent,* and Melissa A. Ashlock† *Cancer Genetics Branch and †Genetics and Molecular Biology Branch, National Human Genome Research Institute, and ‡Developmental and Metabolic Neurology Branch, National Institute of Neurologic Disorders and Stroke, National Institutes of Health, Bethesda, Maryland 20892; and §NEN Life Science Products, Inc., P.O. Box 199151, Boston, Massachusetts 02119 Received January 31, 2000, and in revised form March 16, 2000
A major obstacle in positional cloning is identifying the specific mutated gene from within a large physical contig. Here we describe the application of DNA microarray technology to a defined genomic region (physical map) to identify (i) exons without a priori sequence data and (ii) the disease gene based on differential gene expression in a recessive disorder. The feasibility was tested using resources from the positional cloning of the Niemann–Pick Type C (NP-C) disease gene, NPC1. To identify NPC1 exons and optimize the technology, an array was generated from genomic fragments of the 110-kb bacterial artificial chromosome, 108N2, which contains NPC1. First, as a test case for blindly identifying exons, fluorescently labeled NPC1 cDNA identified 108N2 fragments that contained NPC1 exons, many of which also contained intronic sequences and could be used to determine part of the NPC1 genomic structure. Second, to demonstrate that the NPC1 disease gene could be identified based upon differential gene expression, subarrays of 108N2 fragments were hybridized with fluorescently labeled cDNA probes generated from total RNA from hamster cell lines differentially expressing NPC1. A probe derived from the NP-C cell line CT60 did not detect NPC1 exons or other genomic fragments from 108N2. In contrast, several NPC1 exons were detected by a probe generated from the non-NP-C cell line 911D5A13, which was derived from CT60 and expressed NPC1 as a consequence of stable transduction with a YAC that contains NPC1 and encompasses 108N2. Thus, the array technology identified NPC1 as a candidate gene based on a physical contig and differential NPC1 expression between NP-C and non-NP-C cells. This technique should facilitate gene identification when a physical contig exists for a region of interest and mutations result in changes in the mRNA level of the disease gene or portions thereof. © 2000 Academic Press
High-density polymorphic marker maps and the nearly complete yeast artificial chromosome (YAC) contig map of the human genome have facilitated assembly of physical maps of candidate regions. However, the subsequent step of gene identification has often been tedious and expensive, usually involving exon trapping (1–3), EST database searches (4), cDNA selection (5–7), and sequencing. As a consequence, only a small subset of linked loci have yielded etiological genes. In addition, the expectation that the entire human genome will be sequenced within a few years has incited technologic advances to more efficiently exploit these sequence data and the associated physical resources to identify additional human disease genes. Here we describe a technique to facilitate positional cloning (even in the presence of sequence data) that takes advantage of a physical map and the decrease in mutant transcript amounts in recessive disorders. The majority of recessive disorders result from a protein truncating mutation or loss of the entire protein, secondary to deletions, splice errors, or premature termination codons (PTCs). Until recently,
1 To whom reprint requests should be addressed at current address: Research Center for Genetic Medicine, Children’s National Medical Center, 111 Michigan Avenue, NW, Washington, DC 20010. Fax: (202) 884-6014. E-mail: [email protected].
1096-7192/00 $35.00 Copyright © 2000 by Academic Press. All rights of reproduction in any form reserved.
NIEMANN-PICK TYPE C GENOMIC MICROARRAY
only a subset of these mutations (deletions or splice errors) had been known to result in loss of transcripts or portions thereof. It is now known that PTCs can also result in transcript loss, secondary to rapid degradation of the mutant transcripts upon their association with the translation machinery (8–10). This mechanism has been proposed to explain the lack of dominant negative interactions of mutant proteins with functional proteins, thereby allowing carriers of some recessive conditions to be phenotypically normal (11). The observation that even PTC mutations in recessive disease cause loss of the mRNA molecule allows our approach (based on differences between affected and unaffected transcript amounts) to be widely applicable to positional cloning projects for recessively inherited disorders. Here we take advantage of, and adapt, the established DNA microarray technology to scan large genomic intervals for coding sequence, while concurrently looking for decreased ratios of these coding sequences in affected cells vs normal cells, indicating a disease-causing mutation. The utility of a microarray-based strategy for gene identification is shown utilizing reagents generated in the positional cloning of NPC1 (12,13). Rather than arraying cDNA targets, as has commonly been the practice in microarray analysis (14–18), we have arrayed human genomic DNA fragments from a bacterial artificial chromosome (BAC) that spans NPC1. Utilizing these BAC genomic arrays, we have shown that part of the NPC1 genomic structure can be derived and also that NPC1 exons can be detected from cell lines differentially expressing NPC1. This approach suggests that the high capacity and fidelity of the microarray technology will provide an important adjunct to traditional positional cloning techniques.

MATERIALS AND METHODS

Preparation of a Human Genomic DNA Library from the BAC 108N2

Insert preparation.
BAC 108N2 DNA was extracted using an AutoGen robot and digested with NotI to release the 110-kb insert, which was resolved by pulsed-field electrophoresis, excised, and purified with a QIAEX II kit (Qiagen, Valencia, CA). The BAC DNA (20 ng) was combined with 50 nmol of the degenerate primer UN1 (CCGACTCGAGNNNNNNATGTGG) (19), a 0.2 mM concentration of each dNTP, 1× buffer [40 mM Tris–HCl (pH 7.5), 20 nM MgCl2, 50 mM NaCl], and 0.1 μl topoisomerase (Promega, Madison, WI), incubated at 37°C for 30 min, and amplified by PCR (96°C for 10 min, followed by 6 cycles at 94°C for 1 min, 30°C for 2 min, and 37°C for 2 min) in an MJ thermal cycler (Watertown, MA). Sequenase (0.2 U) (Amersham Pharmacia Biotech, Piscataway, NJ) was added at the beginning of each 30°C incubation. Further amplification was performed on the entire template with the UN1 primer (10 pM), 1× PCR buffer [10 mM Tris–HCl (pH 8.3), 50 mM KCl, 2.25 mM MgCl2] (Perkin–Elmer Cetus, Norwalk, CT), 0.2 mM dNTPs, and 25 U Taq DNA polymerase LD (Perkin–Elmer Cetus) for 35 cycles (94°C for 1 min; 65°C for 1 min; 72°C for 1 min). The PCR products were visualized by electrophoresis through 2% agarose (500-bp average length), phenol/chloroform extracted, ethanol precipitated, and resuspended in 100 μl TE (pH 8.0). PCR amplification products were adapted for uracil DNA glycosylase (UDG) cloning by amplification with the primer DAS1 (CUACUACUACUACCGACTCGAG) (10 pM), 0.2 mM each dNTP, 1× PCR buffer, and Taq DNA polymerase (Perkin–Elmer Cetus) for 6 cycles (94°C for 0.5 min; 48°C for 0.5 min; 72°C for 2 min) followed by an additional 14 cycles (94°C for 0.5 min; 55°C for 0.5 min; 72°C for 2 min). Reactions were then phenol/chloroform extracted, ethanol precipitated, resuspended in 50 μl TE (pH 8.0), purified over a ChromaSpin100 column (Clontech, Palo Alto, CA), and resuspended in 50 μl TE (pH 8.0). The UDG-adapted amplification product (0.1 μg) was added to primer DAS1 (10 pM), a 0.2 mM concentration of each dNTP, 1× PCR buffer, and Taq DNA polymerase and cycled 4 times (94°C for 0.5 min; 55°C for 0.5 min; 72°C for 2 min). Reactions were purified as above and resuspended in 50 μl TE (pH 8.0), which contained 1.6 μg of adapted, fragmented BAC template ready for insertion into the vector described below (19).

Vector preparation. The vector used to maintain the fragmented BAC was the plasmid pGEM3Zf+ (Promega), which was adapted for UDG cloning in the following manner.
After digestion with XbaI (Promega), 200 ng pGEM3Zf+ was subjected to amplification in the presence of primer DAS2 (UAGUAGUAGUAGGGATCCCCGGGT) (10 pM), primer DAS3 (UAGUAGUAGUAGGTCGACCTGCAG) (10 pM), 0.2 mM dNTPs, 1× KlenTaq PCR buffer (Clontech), and 4 μl KlenTaq DNA polymerase (Clontech) for 24 cycles (94°C for 0.25 min; 55°C for 0.5 min; 72°C for 10 min) in 400 μl. Amplification products were purified by phenol/chloroform extraction, ethanol precipitation, and chromatography
STEPHAN ET AL.
through a ChromaSpin400 column (Clontech). After a second ethanol precipitation, the vector was resuspended in 100 μl of TE (pH 8.0).

UDG cloning and transformation. The UAG-adapted pGEM3Zf+ was combined with the UDG-adapted fragmented BAC template (50 ng), 1× PCR buffer, and 1 U UDG (Life Technologies Inc., Rockville, MD) in a 20-μl reaction volume, incubated at 37°C for 30 min, and then transformed into DH10B cells (Life Technologies). Individual plasmid clones (1200 total) were isolated from LB/ampicillin plates, transferred into 96-well liquid culture boxes (Qiagen), and grown overnight in LB/amp medium, and stocks were prepared in 15% glycerol for storage at −80°C.

Preparation of Genomic Array

DNA from BAC 108N2/pGEM3Zf+ clones was isolated using an alkaline lysis 96-well format miniprep kit (Edge Biosystems, Gaithersburg, MD) and resuspended in 200 μl TE (pH 8.0), and 2 μl was amplified with primer AEKM13F (GTTGTAAAACGACGGCCAGTG) (10 pM), primer AEKM13R (CACACAGGAAACAGCTATG) (10 pM), 1× PCR buffer, 0.2 mM dNTPs, and 1 U Taq-Gold DNA polymerase (Perkin–Elmer Cetus) for 25 cycles (94°C for 0.5 min; 55°C for 0.5 min; 72°C for 2.5 min). Amplified inserts were ethanol precipitated, washed in 70% ethanol with a Bio-Rad immunowasher (Hercules, CA), dried, resuspended in 15 μl of 3× SSC, and robotically spotted onto polylysine-treated microscope slides (Gold Seal Products, Portsmouth, NH) at 5× coverage (1200 clones/slide). Arrayed slides were UV-crosslinked (Stratagene, La Jolla, CA) at 450,000 μJ, blocked with succinic anhydride in 1-methyl-2-pyrrolidinone (Sigma, St. Louis, MO) plus 1 M sodium borate (Sigma), immersed in boiling water and then in ethanol, and spun dry as previously described (14).

Fluorescence Labeling of Probes, Hybridization, and Scanning

NPC1 cDNA probe. NPC1 cDNA (GenBank Accession No.
AF002020) was amplified from human brain cDNA (Clontech) using primers flanking the translation start site (bp 124) and 5′ of the poly(A) tail (bp 4663), diluted, and used as a template for subsequent PCR amplification and labeling with Spectrum Orange dUTP (Vysis, Downer’s Grove, IL) via PCR incorporation (1 μl primary amplification product was reamplified using the identical PCR mix except spiked with 10 mM Spectrum Orange dUTP). Labeled amplification products were isolated, purified with BioSpin P6 (Bio-Rad), ethanol precipitated, and rehydrated in 50 μl TE (pH 8.0), from which 2 μl was denatured and used for hybridization.

Total cellular RNA labeling. Total cellular RNA from both CT60 and 911D5A13 was extracted from approximately 10^8 cells. Cells were scraped from culture plates, pelleted, washed once with PBS, repelleted, resuspended in 6 ml of lysis buffer (RNeasy kit, Qiagen), and then sonicated (VirSonic 60, 2-mm probe; Virtis, Gardiner, NY) at a dissipated power of 5 W. The lysate was then chromatographed per the manufacturer’s instructions. Eluted total RNA (900 μl in DEPC water) was extracted with 3 ml of Trizol (Life Technologies). After the phases were mixed by vortexing, 600 μl of chloroform (Sigma) was added and the mixture was centrifuged (5 min at 10,000g). The aqueous phase was precipitated by the addition of 0.5 ml isopropanol (Sigma) per milliliter of aqueous extract, vortexing, and centrifuging (10 min at 10,000g). The pellet was washed with 70% ethanol, dried, resuspended in 600 μl DEPC water, quantitated by spectrometry, and stored at −80°C. Biotinylated cDNA probes were made from 100 μg total RNA for each of the two cell lines (CT60 and 911D5A13) by oligo(dT)-primed polymerization using SuperScript II (SS II) reverse transcriptase (Life Technologies). The reactions were carried out at a final volume of 40 μl. Biotin-dUTP (Boehringer Mannheim, Indianapolis, IN) was used at 0.1 mM. The nucleotide concentrations were 0.5 mM for dGTP, dATP, and dCTP and 0.2 mM for dTTP. SS II (2 μl) was added at the beginning of the labeling reaction and incubated at 42°C. After 30 min, an additional 2 μl of SS II was added. After a further 30 min, the reaction was stopped using 5 μl of 500 mM EDTA. The unlabeled RNA was hydrolyzed by adding 10 μl of 1 M NaOH and heating to 65°C for 1 h.
Then 25 μl of 1 M Tris–HCl, pH 7.5, was added to partially neutralize the base. Unincorporated nucleotides and salts were removed by chromatography on a Biospin 6 column. The volume of the biotinylated probe was reduced using a Microcon 30 (Amicon, Beverly, MA). Half of the probe produced was used per hybridization and each probe was applied to a separate slide. The hybridization buffer contained 8 μg poly(dA) (Amersham Pharmacia Biotech), 4 μg Escherichia coli tRNA (Sigma), 10 μg Cot1 DNA (Life Technologies), 0.3 μl of 10% SDS, 3 μl of 20× SSC (final concentration 3× SSC), and water to a final volume of 20 μl. Probe was heated to 98°C for 2 min and 4°C for 10 s, placed onto the glass slide with a coverslip, sealed in a custom hybridization chamber to prevent evaporation, and hybridized at 65°C for 16 h. The slides were washed for 2 min each in 0.5× SSC and 0.01% SDS at 24°C, centrifuged to remove residual liquid from the surface, and scanned with a custom inverted scanning confocal laser microscope (14). Spectrum Orange-specific fluorescence was detected using excitation at 544 nm and collecting emitted light from 576 nm. For biotinylated probes, signal was developed via TSA amplification (20,21) using components from a Renaissance TSA Indirect Kit (NEN Lifescience Products, Boston, MA). Slides were treated as per the manufacturer’s instructions, substituting the more reactive Biotin TSA Plus (NEN Lifescience Products) for the standard biotin TSA. After biotin TSA deposition, slides were incubated with 0.5 μg/ml streptavidin Cy3 (Amersham Pharmacia Biotech), washed three times (5 min each) with TNT buffer (Renaissance TSA Indirect Kit), and then three times (5 min each) with 0.06× SSC. Cy3-specific fluorescence was detected using excitation at 530 nm and collecting emitted light from 570 to 590 nm.

Microarray Analysis

Two different methods were used to analyze the two types of microarray images acquired. The first is single-image analysis, with the goal of detecting strong match targets; the second is two-color image analysis, resulting in detection of expression ratio changes.

Strong match detection. Since most target intensities in the image are weak due to nonspecific hybridization, a median filter of 5 × 5 was first applied. DNA target segmentation was achieved by overlaying a grid on the fluorescent image. A detection method was then employed to determine the actual target region based on the information from the red pixel values (22).
It can be assumed that the nonspecific hybridization intensity is normally distributed (23), such that its mean intensity $\mu_n$, its standard deviation $\sigma_n$, and the mean target size of those nonspecifically hybridized targets can then be estimated. The outliers (strongly matched targets) were determined by the following: (i) the size of the detected target must be larger than half the mean target size and (ii) the discordance of the mean intensity of the detected target, $\mu_I$, and its standard deviation, $\sigma_I$, from the nonspecific hybridization intensity,

$$T = \frac{(\mu_I - \sigma_I) - \mu_n}{\sigma_n}.$$

We chose the critical value $T = +3.03$ for a significance level of 5% and for an average target size of $n = 60$ pixels (24). The term $\sigma_I$ was introduced into the standard formula to guarantee that the majority of pixel gray levels within the detected target area were above $\mu_n + 3.03\,\sigma_n$. Array elements of interest based on their strong match detection in the 1200-element genomic BAC array were traced to their respective coordinates in the original glycerol stocks and sequenced using dye-terminators and the ABI 377 automated DNA sequencer (Applied Biosystems Inc., Foster City, CA) (25). BAC sequences that contained parts of NPC1 were aligned to the NPC1 cDNA with Sequencher (Gene Codes, Ann Arbor, MI).

Ratio change detection. Two fluorescent images were obtained by scanning the two slides probed with biotinylated probes derived from either CT60 or 911D5A13. DNA target segmentation was achieved by overlaying a grid on both fluorescent images (green for the corrected cell line 911D5A13 and red for the NP-C cell line CT60). A detection method was then employed to determine the actual target region based on the information from both green and red pixels (22). The fluorescence intensity of each of the probes was then calculated by averaging the intensities of every pixel inside the detected target region and subtracting the local background intensity. Then ratios of the red intensity to the green intensity, R/G, for all targets were determined. Ratio normalization was performed based on 20 preselected internal control ESTs (human “housekeeping” genes) that are usually expressed at comparable levels in most human cells under most experimental conditions.
The behavior of this panel of control genes provides a robust means of normalizing the data and estimating the level of variance observed between genes expected to have nearly identical expression levels in both the test and the reference samples. The normalization constant from the control gene panel was used to calibrate every ratio within the image. Furthermore, based on the variance observed in ratios in the control gene panel, a 99% confidence interval could be computed and used to assess the significance of differential levels of expression observed at the genomic targets.
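The two analyses described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' software: the function names are hypothetical, the comparison `T > 3.03` is one reading of the critical value, and treating the log of the control ratios as approximately normal (so that the normalization constant is their geometric mean and the 99% interval uses z = 2.576) is an assumption made for the example. The published analysis follows refs 22–24 and may model the ratio distribution differently.

```python
import math
import statistics


def strong_match_T(pixels, mu_n, sigma_n):
    """T = ((mu_I - sigma_I) - mu_n) / sigma_n for one detected target.

    pixels: gray levels inside the detected target region.
    mu_n, sigma_n: mean and SD of the nonspecific hybridization intensity.
    """
    mu_I = statistics.fmean(pixels)
    sigma_I = statistics.pstdev(pixels)
    return ((mu_I - sigma_I) - mu_n) / sigma_n


def is_strong_match(pixels, mu_n, sigma_n, mean_target_size, t_crit=3.03):
    """Apply both criteria: target size and discordance from background."""
    big_enough = len(pixels) > 0.5 * mean_target_size
    return big_enough and strong_match_T(pixels, mu_n, sigma_n) > t_crit


def normalize_ratios(ratios, control_ids, z=2.576):
    """Calibrate R/G ratios against a housekeeping control panel.

    ratios: dict of element id -> background-subtracted R/G (must be > 0).
    control_ids: ids of the control elements expected to be near 1:1.
    ASSUMPTION: log-ratios of the controls are roughly normal, so the
    normalization constant is their geometric mean and the interval is
    exp(+/- z * SD of the control log-ratios).
    """
    control_logs = [math.log(ratios[i]) for i in control_ids]
    constant = math.exp(statistics.fmean(control_logs))
    spread = statistics.stdev(control_logs)
    normalized = {k: v / constant for k, v in ratios.items()}
    low, high = math.exp(-z * spread), math.exp(z * spread)
    outliers = {k: r for k, r in normalized.items() if r < low or r > high}
    return normalized, (low, high), outliers
```

With such a sketch, an element whose normalized R/G falls below the lower bound would be one that is under-represented in the red (CT60) channel relative to the green (911D5A13) channel, which is the pattern expected for exons of the disease gene.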
FIG. 1. Schematic representation of the genomic array experiment. The design of the array was based on a contig from a physical map, which was known by linkage mapping to contain the disease gene of interest. A library made from BAC 108N2 (containing NPC1) by random priming was arrayed at high density on a glass slide. To test the BAC fragments for the presence of NPC1, the array was hybridized with fluorescently labeled NPC1 cDNA. The hybridizing fragments were then sequenced and shown to contain coding sequence from NPC1 (see Fig. 2A). A subarray was made from the high-density array and then hybridized to labeled cDNAs from two cell lines known to differentially express NPC1 (see Fig. 3).
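As a rough check on the library design, the redundancy of a random-fragment library can be estimated from the clone count, the average insert size, and the size of the BAC, using the figures given in Materials and Methods (1200 clones, about 500 bp per insert, 110-kb insert). The sketch below is illustrative only: the function names are hypothetical, and the Poisson estimate of the unrepresented fraction assumes uniformly random fragment placement, which random-primed PCR only approximates.

```python
import math


def library_coverage(n_clones, mean_insert_bp, target_bp):
    """Average fold-coverage of the target by the arrayed clones."""
    return n_clones * mean_insert_bp / target_bp


def fraction_unrepresented(coverage):
    """Expected fraction of bases hit by no clone.

    ASSUMPTION: fragment start points are uniformly random, so the
    number of clones covering a base is approximately Poisson.
    """
    return math.exp(-coverage)


coverage = library_coverage(n_clones=1200, mean_insert_bp=500, target_bp=110_000)
print(f"coverage: {coverage:.2f}x")
print(f"expected unrepresented fraction: {fraction_unrepresented(coverage):.4f}")
```

Under these assumptions the estimate comes to roughly 5.5-fold coverage, so only a small fraction of the BAC would be expected to be missing from the array by chance.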
RESULTS

Delineation of NPC1 Exons and Genomic Structure Using a BAC Genomic Array

The strategy shown in Fig. 1 demonstrates schematically: (i) the NPC1 genetic interval; (ii) the NP-C physical map including the YAC contig and the 110-kb BAC 108N2 that resides within YAC 911D5 and contains NPC1; and (iii) the generation of the genomic array used to evaluate NPC1. For the last part, a small insert library was generated from 108N2, and 1200 of the resulting clones were arrayed, resulting in greater than 5× coverage of 108N2 (average element size ~500 bp). To verify the presence of NPC1 on the BAC array and to determine optimal hybridization conditions, the 1200-element array was hybridized with Spectrum Orange-labeled NPC1 cDNA. Twenty-four array elements were identified (greater than 3 standard deviations above background fluorescence), from which 12 were randomly selected and determined by sequencing to contain NPC1 coding regions (elements 1 through 12; Fig. 2). The majority of these elements also contained intron sequences as shown in Fig. 2A. No non-NPC1 sequences hybridized to the cDNA probe. These findings indicated that the hybridization was specific; however, not all NPC1 exons were represented in this random sampling.

FIG. 2. NPC1 cDNA probe identifies exons within the BAC genomic array. The NPC1 cDNA labeled with Spectrum Orange identified 24 genomic fragments, which have a fluorescence intensity of 3 standard deviations above background as determined by our algorithm. Twelve of these targets were randomly chosen, sequenced, and shown to contain exons of NPC1. The 12 elements aligned with the cDNA sequence, clearly showing the intron/exon boundaries of portions of the gene. The BAC genomic array showing the 24 genomic fragments (enclosed with red boxes) and the 12 numbered elements, which were used to construct a smaller subarray (see Fig. 3).

FIG. 3. Identification of the NPC1 disease gene using labeled CT60 (NP-C, red) and 911D5A13 (corrected, green) cDNA probes. The genomic array of 96 targets includes the 12 elements from exons of NPC1 (labeled 1 through 12), the full-length NPC1 cDNA as a positive control (P1 and P2), 62 unknown genomic fragments from BAC 108N2 (b1 through b62), and 20 housekeeping genes (HK). Each element was arrayed onto the slide in duplicate. Targets P1 and P2 (NPC1 cDNA controls) were identified by their differential expression between the two cell types, as expected. NPC1 exons/exon fragments labeled 4, 5, 6, 7, 8, and 12 were unambiguously identified only in the green channel.

Detection of NPC1 Exons on a Genomic Array Using cDNAs from Cell Lines Differentially Expressing NPC1

To test the hypothesis that NPC1 can be identified based on its differential expression in two different mammalian cell lines, a smaller array was constructed, including housekeeping elements to control for technical variables (Figs. 1 and 3). This subarray included duplicates of the following: the 12 elements known to contain NPC1 exons (elements 1 through 12); 62 random genomic controls of unknown sequence from BAC 108N2 (b1–b62, each present on the initial array, but which did not hybridize to the NPC1 cDNA); the NPC1 cDNA as a positive control (elements P1 and P2); and 20 housekeeping human cDNA clones (HK1 through HK20) used for normalization. In two-color ratio imaging, which is the method used to analyze the data from this part of the study, the intensities of the two colors must be normalized using several elements in the array. This normalization step facilitates accurate calculation of the intensity of the elements that are outliers. These particular human elements were chosen for normalization because they were known, from numerous two-color microarray hybridizations comparing different human cell types, to be expressed at ratios approximating 1:1. Two hamster cell lines were chosen for analysis based on their known differential expression of NPC1: (i) CT60, a chemically mutated CHO line known to have a defect in the gene syntenic to human NPC1, and (ii) the CT60 derivative 911D5A13, which has a normal phenotype with respect to NP-C as a consequence of stable transduction with the YAC 911D5 (13) that encompasses 108N2 and contains NPC1 (12). By Northern blot analysis 911D5A13 expresses NPC1 at high levels, while CT60 has no detectable expression (13). The next step was to generate cDNA probes from 911D5A13 and CT60. These probes were hybridized individually to two identical, consecutively printed slides of the 96-element subarray, detected by tyramide signal amplification (TSA; see Discussion below), and scanned to produce digital images. A pseudocolored representation of this result is presented in Fig. 3.
The intensity information for the corrected 911D5A13 cell line is in the green channel and that for the NP-C CT60 cell line is in the red channel. Several genomic fragments that were known to contain NPC1 exons (elements 4, 5, 6, 7, 8, and 12) had statistically significant deviations from the 1:1 ratio predicted for genes expressed at similar levels in the two different cell lines. These elements were positive only in the green channel, which reflects greater hybridization with the 911D5A13 cDNA probe than with the CT60 cDNA probe. Similarly, and as expected, both positive control elements P1 and P2 were clearly positive in the green channel only. Several elements, such as BAC fragments b1 and b2
and the housekeeping gene HK13, were also identified as statistical outliers in the green channel (Fig. 3). Element b1 was not positive in duplicate in this hybridization and not present on successive hybridizations, suggesting a nonspecific signal. The HK13 element has a diffuse appearance and was not an outlier on successive hybridizations. Finally, the b2 element is borderline significant in duplicate in this hybridization, but not present in successive hybridizations. The majority of the remaining HK genes were expressed at a ratio of ~1:1. The strong intensity of the signals for many of these elements reduced any concern that might arise regarding lack of sufficient cross-species hybridization. However, some of these so-called housekeeping genes (e.g., HK12, HK16, HK18) were positive in the red channel in single experiments (Fig. 3). Upon successive hybridizations (n = 3), the majority of these genes conformed to the 1:1 ratio.

DISCUSSION

We have constructed a human genomic array from a BAC containing NPC1 to demonstrate the application of microarray technology for gene identification. By hybridizing this array with a full-length NPC1 cDNA, portions of the exon/intron structure of NPC1 were determined. In addition, a genomic subarray was constructed and used to identify exons from NPC1 expressed in a hamster cell line as a consequence of stable YAC transduction. In the latter study, a novel TSA amplification procedure was introduced to increase the likelihood of detecting hybridization of cellular transcripts (cDNAs) to the genomic DNA targets. We found that direct incorporation of Cy3/Cy5 fluors did not allow us to detect exons of the NP-C gene differentially. This is most likely due to the very small fragment size (average of ~500 bp) that we printed onto the arrays. The ability of each molecule of dsDNA both to covalently bind to the surface of the glass slide and to be denatured and hybridize to the template was probably severely reduced.
The inclusion of the TSA amplification step places an analyte-dependent reporter enzyme amplification in the detection procedure with the result that previously undetectable levels of hybrid capture can now be detected (20,21). In this amplification, biotinylated cDNA captured at the site of hybridization on the array binds streptavidin that has been coupled with horseradish peroxidase (HRP). When this immobilized enzyme is exposed to hydrogen peroxide and a biotinylated derivative of tyramine, the enzymatic action of the peroxidase on
the peroxide efficiently generates oxygen radicals that, in turn, efficiently activate the phenolic portion of the derivatized tyramide. The activated, biotinylated tyramides then react with local tyrosine and tryptophan residues in the protein coating the local surface of the slide and are themselves immobilized. Since HRP has a very high turnover rate, a single biotin that recruits a streptavidin HRP conjugate can produce tens to hundreds of new locally immobilized biotin molecules, which can then be used to capture fluor-labeled streptavidin, giving a large increase in signal per cDNA hybridized to an array element. In our hands, due to the amplification steps, the TSA measures of transcript abundance are not as linear as direct incorporation. In expression profiling on cDNA arrays this becomes a significant issue since reproducibility and linearity of signal across several orders of magnitude are critical. In this study, we are looking for twofold signal increases/decreases that are not easily distorted by the amplification step. This is borne out in Fig. 3, where an approximation of linearity is seen in that the expression of numerous NP-C exons in the mutant line is decreased relative to that of the housekeeping set. While NPC1 was rapidly and faithfully identified using genomic arrays, there are a few issues that should be noted. Using the subarray, only exons from the 3′-most ~1 kb of NPC1 (elements 4, 5, 6, 7, 8, and 12) were detected using probes derived from cell lines differentially expressing NPC1. The specificity for the 3′ region was likely due to the fact that oligo(dT)-primed reverse transcriptase (RT) was used to generate the labeled cDNA probes. The processivity of the RT polymerase appears to be reduced by inclusion of modified nucleotides in the reaction, producing truncated transcripts. Since the RT adds label in the 3′ to 5′ direction, more fluorescence will be generated for the 3′ end of genes.
Therefore, particularly for larger genes, the 5′-most exons are likely to be poorly represented in the cDNA probes and consequently alternative labeling strategies such as random priming could be used. In addition, some of the elements on the subarray containing 3′ exons are not detected. Further optimization of the TSA-amplification procedure may be necessary to identify 100% of these elements. Another issue concerns the partial NPC1 genomic structure depicted in Fig. 2. Only a random subset of the BAC array elements identified using the NPC1 cDNA probe was sequenced. Thus, the genomic structure was not expected to be complete. Many of the identified exons were from the 3′ end of NPC1, most likely because their large size provides a better target for hybridization. Finally, the need to use consecutive slides adds an element of uncertainty and error to the image overlay. From numerous cDNA array hybridizations, we have seen that consecutive slides do have a higher variability than when a single slide is hybridized differentially. Again, the number of NP-C exons that were identified (and which illustrate the reproducibility of this redundant assay) reinforces the notion that despite increased variability, we are able to identify the defective gene. Although these issues constitute limitations, the strategy could be useful for candidate gene identification and also provide genomic structure for mutation scanning. In the present study, it was known a priori that the two cell lines differentially expressed NPC1. However, if the gene of interest is unknown, a series of arrays could be generated from a physical contig. It may be possible to scan such larger genomic intervals with greater ease if the fragment size of the probes on the slide were greater than 0.5 kb in length. However, further studies are necessary to determine the optimum fragment size that does not sacrifice hybridization specificity. These arrays could then be hybridized with cDNAs from cells (cell lines or primary cells) from affected and normal individuals. Since the vast majority of recessive “loss-of-function” disorders result in diminished transcript levels, via a deletion at the DNA level, splice errors, or nonsense-mediated RNA degradation, those disease genes should potentially be detected using this technology. Another application of this technique would be to search for overexpressed oncogenes within chromosomal amplifications. In addition, if it is possible to improve the current technology by increasing the sensitivity of detection, it may be conceivable to identify haploinsufficiency and carriers for recessive mutations.
One class of mutations that could pose a technical challenge would be 5′ in-frame internal deletions. Since these mutations would truncate mRNA transcripts, the 3′ labeling bias could prevent differential detection. In those instances, it will be important to modify the technology so that the entire transcript is consistently labeled. Finally, at present when comparing two probes using the TSA detection system, each hybridization must be performed on a different slide. Thus, the development of a two-color TSA detection system will be an important asset for higher throughput applications of this technology. As an adjunct to traditional techniques, these applications of cDNA probes to specific genomic arrays
should help eliminate the bottleneck in certain positional cloning projects.
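Because each probe must currently be hybridized to a separate slide, any comparison has to put the two slides on a common scale before element intensities are compared. The following is a minimal sketch of one way this could be done; it is not the image-analysis procedure used in this study. It assumes that each slide carries a set of control elements expected to give equal signal with both probes, and the names `normalize` and `differential_elements`, the control labels, the twofold-in-log2 threshold, and the intensity floor are all hypothetical choices made for illustration.

```python
import math
from statistics import median


def normalize(slide, controls):
    """Divide every intensity on a slide by the median of its control elements."""
    scale = median(slide[name] for name in controls)
    if scale <= 0:
        raise ValueError("control intensities must be positive")
    return {name: value / scale for name, value in slide.items()}


def differential_elements(slide_a, slide_b, controls, min_abs_log2=2.0, floor=0.01):
    """Return {element: log2(b/a)} for non-control elements whose normalized
    signal differs between the two slides by at least min_abs_log2."""
    a = normalize(slide_a, controls)
    b = normalize(slide_b, controls)
    flagged = {}
    for name in sorted(set(a) & set(b)):
        if name in controls:
            continue
        # Clamp to a small floor so undetected elements do not produce log(0).
        log_ratio = math.log2(max(b[name], floor) / max(a[name], floor))
        if abs(log_ratio) >= min_abs_log2:
            flagged[name] = log_ratio
    return flagged


# Made-up background-subtracted intensities for two consecutive slides.
controls = ["ctrl_1", "ctrl_2", "ctrl_3"]
slide_mutant = {"ctrl_1": 1000, "ctrl_2": 1200, "ctrl_3": 800,
                "frag_a": 50, "frag_b": 60, "frag_c": 55}
slide_normal = {"ctrl_1": 2000, "ctrl_2": 2400, "ctrl_3": 1600,
                "frag_a": 1600, "frag_b": 130, "frag_c": 900}

for name, log_ratio in differential_elements(slide_mutant, slide_normal, controls).items():
    print(f"{name}: log2 ratio {log_ratio:.2f}")
```

In this toy data set the second slide is uniformly twice as bright, so scaling by the control median removes the slide-to-slide difference and only the fragments with a genuine change in signal are reported. A two-color system on a single slide would make this normalization step largely unnecessary.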
ACKNOWLEDGMENTS
We thank D. Leja for providing graphics assistance and P. Pentchev, National Institute of Neurologic Disorders and Stroke, for his contributions to this study. J. Gu was supported by a grant from the Ara Parseghian Medical Research Foundation.
REFERENCES
1. Buckler AJ, Chang DD, Graw SL, Brook JD, Haber DA, Sharp PA, Housman DE. Exon amplification: A strategy to isolate mammalian genes based on RNA splicing. Proc Natl Acad Sci USA 88:4005–4009, 1991.
2. Duyk GM, Kim SW, Myers RM, Cox DR. Exon trapping: A genetic screen to identify candidate transcribed sequences in cloned mammalian genomic DNA. Proc Natl Acad Sci USA 87:8995–8999, 1990.
3. Krizman DB, Berget SM. Efficient selection of 3′-terminal exons from vertebrate DNA. Nucleic Acids Res 21:5198–5202, 1993.
4. Schuler GD, Boguski MS, Stewart EA, Stein LD, Gyapay G, Rice K, White RE, Rodriquez-Tome P, Aggarwal A, Bajorek E, Bentolila S, Birren BB, Butler A, Castle AB, Chiannilkulchai N, Chu A, Clee C, Cowels S, Day PJ, Dibling T, et al. A gene map of the human genome. Science 274:540–546, 1996.
5. Lovett M. Fishing for complements: Finding genes by direct selection. Trends Genet 10:352–357, 1994.
6. Del Mastro RG, Lovett M. Isolation of coding sequences from genomic regions using direct selection. Methods Mol Biol 68:183–199, 1997.
7. Osborne-Lawrence S, Welcsh PL, Spillman M, Chandrasekharappa SC, Gallardo TD, Lovett M, Bowcock AM. Direct selection of expressed sequences within a 1-Mb region flanking BRCA1 on human chromosome 17q21. Genomics 25:248–255, 1995.
8. Czaplinski K, Ruiz-Echevarria MJ, Paushkin SV, Han X, Weng Y, Perlick HA, Dietz HC, Ter-Avanesyan MD, Peltz SW. The surveillance complex interacts with the translation release factors to enhance termination and degrade aberrant mRNAs. Genes Dev 12(11):1665–1677, 1998.
9. Ruiz-Echevarria MJ, Yasenchak JM, Han X, Dinman JD, Peltz SW. The upf3 protein is a component of the surveillance complex that monitors both translation and mRNA turnover and affects viral propagation. Proc Natl Acad Sci USA 95(15):8721–8729, 1998.
10. Perlick HA, Medghalchi SM, Spencer FA, Kendzior RJ Jr, Dietz HC. Mammalian orthologues of a yeast regulator of nonsense transcript stability. Proc Natl Acad Sci USA 93(20):10928–10932, 1996.
11. Sun X, Perlick HA, Dietz HC, Maquat LE. A mutated homologue to yeast Upf1 protein has a dominant-negative effect on the decay of nonsense-containing mRNAs in mammalian cells. Proc Natl Acad Sci USA 95:10009–10014, 1998.
12. Carstea ED, Morris JA, Coleman KG, Loftus SK, Zhang D, Cummings C, Gu J, Rosenfeld MA, Pavan WJ, Krizman DB, Nagle J, Polymeropoulos MH, Sturley SL, Ioannou YA, Higgins ME, Comly M, Cooney A, Brown A, Kaneski CR, et al. Niemann–Pick C1 disease gene: Homology to mediators of cholesterol homeostasis. Science 277:228–231, 1997.
13. Gu JZ, Carstea E, Cummings C, Morris J, Loftus S, Coleman K, Zhang D, Cooney A, Comly M, Fandino L, Roff C, Tagle D, Pavan B, Pentchev P, Rosenfeld MA. Substantial narrowing of the Niemann–Pick C candidate interval by YAC complementation. Proc Natl Acad Sci USA 94:7378–7383, 1997.
14. DeRisi J, Penland L, Brown PO, Bittner ML, Meltzer PS, Ray M, Chen Y, Su YA, Trent JM. Use of cDNA microarray to analyse gene expression patterns in human cancer. Nat Genet 14:457–460, 1996.
15. DeRisi JL, Vishwanath RI, Brown PO. Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278:680–686, 1997.
16. Schena M, Shalon D, Davis RW, Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270:467–470, 1995.
17. Lashkari DA, DeRisi JL, McCusker JH, Namath AF, Gentile C, Hwang SY, Brown PO, Davis RW. Yeast microarrays for genome wide parallel genetic and gene expression analysis. Proc Natl Acad Sci USA 94:13057–13062, 1997.
18. Schena M, Shalon D, Heller R, Chai A, Brown PO, Davis RW. Parallel human genome analysis: Microarray-based expression monitoring of 1000 genes. Proc Natl Acad Sci USA 93:10614–10619, 1996.
19. Meltzer PS, Guan XY, Burgess A, Trent JM. Rapid generation of region specific probes by chromosome microdissection and their application. Nat Genet 1:24–28, 1992.
20. Adler K, Erickson T, Bobrow MN. High sensitivity detection of HPV-16 in SiHa and CaSki cells utilizing FISH enhanced by TSA. Histochem Cell Biol 108:321–324, 1997.
21. Bobrow MN, Harris TD, Shaughnessy KJ, Litt GJ. Catalyzed reporter deposition, a novel method of signal amplification: Application to immunoassays. J Immunol Methods 125:279–285, 1989.
22. Chen Y, Dougherty ER, Bittner ML. Ratio-based decisions and the quantitative analysis of cDNA images. J Biomed Optics 2:364–374, 1997.
23. Pietu G, Alibert O, Guichard V, Lamy B, Bois F, Leroy E, Mariage-Samson R, Houlgatte R, Soularue P, Auffray C. Novel gene transcripts preferentially expressed in human muscles revealed by quantitative hybridization of a high density cDNA array. Genome Res 6:492–503, 1996.
24. Barnett V, Lewis T. Outliers in Statistical Data, 3rd ed. Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics Section, New York: Wiley, 1994.
25. Robbins CM, Hsu E, Gillevet PM. Sequencing homopolymer tracts and repetitive elements. BioTechniques 20:862–864, 1996.