Shotgun sequencing and microarray analysis of RDA transcripts


Gene 310 (2003) 39–47 www.elsevier.com/locate/gene

Tove Andersson a, Stina Boräng a, Per Unneberg a, Valtteri Wirta a, Anders Thelin b, Joakim Lundeberg a, Jacob Odeberg a,*

a Department of Biotechnology, Division of Molecular Biotechnology, KTH, Royal Institute of Technology, AlbaNova University Center, Stockholm Centre for Physics, Astronomy, and Biotechnology, SE-106 91 Stockholm, Sweden
b Department of Molecular Biology, AstraZeneca R&D Mölndal, SE-431 83 Mölndal, Sweden

Received 13 December 2002; received in revised form 20 February 2003; accepted 27 February 2003

Received by T. Sekiya

Abstract

Monitoring of differential gene expression is an important step towards understanding gene function. We describe a comparison of the representational difference analysis (RDA) subtraction process with corresponding microarray analysis. The subtraction steps are followed in a quantitative manner using a shotgun cloning and sequencing procedure that includes over 1900 gene sequences. In parallel, the enriched transcripts are spotted onto microarrays, facilitating large-scale hybridization analysis of the representations and the difference products. We show by the shotgun procedure that there is a high diversity of gene fragments represented in the iterative RDA products (92–67% singletons) with a low number of shared sequences (<9%) between subsequent subtraction cycles. A non-redundant set of 1141 RDA clones was immobilized on glass slides and the majority of these clones (97%) gave repeated good fluorescent signals in a subsequent hybridization of the labelled and amplified original cDNA. We observed only a low number of false positives (<2%) and a more than twofold differential expression for 32% (363) of the immobilized RDA clones. In conclusion, we show that by random sequencing of the difference products we obtained an accurate transcript profile of the individual steps and that large-scale confirmation of the obtained transcripts can be achieved by microarray analysis. © 2003 Elsevier Science B.V. All rights reserved.

Keywords: Gene expression; Subtractive hybridization; Microarray; Transcript profiling

Abbreviations: bp, base pairs; RDA, representational difference analysis; PCR, polymerase chain reaction; LDL, low-density lipoprotein; oxLDL, oxidized LDL; PMA, phorbol 12-myristate-13 acetate.

* Corresponding author. Tel.: +46-8-5537-8332; fax: +46-8-5537-8481. E-mail address: [email protected] (J. Odeberg).

1. Introduction

Transcript profiling represents an important first step in understanding the cellular roles of genes, and several methods to achieve this have been developed. Microarray technology (Schena et al., 1995; Lockhart et al., 1996) is a powerful tool enabling expression profiles to be determined for thousands of genes simultaneously. However, it cannot be used for gene discovery, as it relies on prefabricated probes to allow for analysis. The amount of mRNA required for microarray analysis is also considered a hurdle, although novel approaches may solve this (Wang et al., 2000; Hertzberg et al., 2001; Xiang et al., 2002). It is also problematic to detect changes in transcript profiles for low-abundance transcripts because of detector sensitivity limits. In addition, the dynamic range of the detector will not accommodate proportional quantification of signals from both low- and high-abundance transcripts in the same reading, since signals of the latter will be saturated.

Representational difference analysis (RDA) is a PCR-based subtractive enrichment procedure originally developed for cloning of sequences differing between genomic DNA samples (Lisitsyn et al., 1993) and later modified for cloning of differentially expressed genes (Hubank and Schatz, 1994), which is advantageous in this respect. Briefly, the procedure relies on the generation of representations of cDNA fragments from two different mRNA populations by digestion with a four-cutting restriction endonuclease followed by linker ligation and PCR amplification. The generated representations are then subjected to iterative steps of subtractive cross hybridization and selective PCR amplification, to enrich for cDNA fragments that are more abundant in one mRNA population. The

0378-1119/03/$ - see front matter © 2003 Elsevier Science B.V. All rights reserved. doi:10.1016/S0378-1119(03)00498-0


T. Andersson et al. / Gene 310 (2003) 39–47

method allows for gene discovery, as gene fragments are isolated and identified based on differential expression and not filtered by prior anticipation of which genes are relevant to study. In addition, it can also be used for analysis of very small amounts of tissue (Odeberg et al., 2000). We have previously evaluated different cloning strategies to analyse differential products from an RDA study. That work indicated that in order to obtain a representative view of the most abundant gene fragments, a shotgun procedure is preferred to the commonly used size selection of distinct bands on agarose gel before cloning (Borang et al., 2001). In this work, we have used a shotgun cloning strategy to analyse the enrichment procedure in detail through sequencing analysis of over 1900 gene fragments originating from successive rounds of subtractive hybridization and amplification steps. Custom microarrays were manufactured for RDA evaluation and further analysis of the obtained gene fragments.

2. Material and methods

2.1. Cell line culture and cDNA synthesis

THP-1 cells (ATCC TIB-202) cultured in RPMI 1640 supplemented with 50 µM 2-mercaptoethanol, 100 µg/ml streptomycin, 100 U/ml penicillin and 5% foetal calf serum (FCS Hyclone) were seeded in T 75 flasks at a density of 14 × 10^6 in 7.5 ml. To establish a macrophage phenotype, THP-1 cells were treated with 0.2 µM phorbol 12-myristate-13 acetate (PMA) (Sigma-Aldrich Corp., St Louis, MO, USA) for 24 h. Fifty micrograms/ml oxLDL (LDL oxidized by exposure to copper for 24 h) was added to the PMA-treated THP-1 cells for 24 h. Total RNA was isolated using Trizol (Life Technologies, UK). mRNA was isolated from total RNA using oligo(dT) paramagnetic beads (Dynal AS, Oslo, Norway). cDNA synthesis was performed essentially according to Gubler (1988) using a biotinylated oligo(dT) primer (5′-biotin-GAG GTG CCA ACC GCG GCC GCT TTT TTT TTT TTT TT-3′). The double-stranded cDNA was digested with Dpn II (New England Biolabs, Beverly, MA, USA) and ligated to complementary R-Bgl-24 (5′-biotin-AGC ACT CTC CAG CCT CTC ACC GCA-3′) and R-Bgl-12 (5′-GAT CTG CGG TGA-3′) oligonucleotides as previously described (Hubank and Schatz, 1994).

2.2. Titration of optimal number of cycles and generation of representations

A PCR master mix was prepared containing 18 µl 5× PCR buffer (335 mM Tris–HCl (pH 8.8), 20 mM MgCl2, 80 mM (NH4)2SO4, 166 µg/ml BSA), 17 µl of dNTP mix (2 mM of each dNTP), 75 pmol biotinylated R-Bgl-24 primer, and 1 µl ligated cDNA (PMA-treated sample) per 100 µl total reaction, and sterile water to a volume of 90 µl per reaction. The reactions were heated to 72 °C for 3 min and 10 µl of hot start mix (2 µl 5× PCR buffer, 7 µl water, 1 µl (5 U/µl) AmpliTaq (Perkin Elmer, Norwalk, CT, USA)) was added to each tube, followed by a 15-min extension at 72 °C before a two-step PCR was initiated (95 °C 1 min, 72 °C 3 min). Aliquots of 6 µl were removed after every cycle from 21 to 35 from alternating tubes run in parallel. The optimal cycle number was determined visually on agarose gel by comparing parallel aliquots from the different cycles run, using criteria described previously (Odeberg et al., 2000).

Titration of the second sample (oxLDL-treated sample) PCR was performed by varying the amount of template (0.5, 0.8 and 1 µl) in parallel test tubes, using the first sample as a reference and the optimal number of titrated cycles in the PCR. The template concentration giving the most similar gel image to the reference sample with respect to intensity and size distribution was chosen for generation of the representation. Multiple PCR reactions of tester and driver were run to obtain a total of approximately 200 µg DNA of each representation, using the optimized cycle number and template volume.

2.3. Solid phase purification and linker ligation

Twenty micrograms of Dpn II digested tester representation was purified as previously described (Odeberg et al., 2000). Two micrograms of purified tester was annealed and ligated with biotinylated J-Bgl-24 (5′-biotin-ACC GAC GTC GAC TAT CCA TGA ACA-3′) and J-Bgl-12 (5′-GAT CTG TTC ATG-3′). Hybridization of tester and driver, and PCR amplification, was essentially performed as described (Hubank and Schatz, 1994), including treatment with mung bean nuclease (MBN) after ten cycles. Remaining PCR products were Dpn II digested, solid-phase purified and ligated to a new set of linkers (N-Bgl-24 (5′-biotin-AGG CAA CTG TGC TAT CCG AGG GAA-3′) and N-Bgl-12 (5′-GAT CTT CCC TCG-3′)), and the procedure was repeated as described (Hubank and Schatz, 1994) to generate difference products 2 and 3 (DP2 and DP3). During the last round, the MBN treatment was omitted, in which case 36 cycles of PCR were performed directly on the diluted hybridization mixture.

2.4. Sequence assembly and analysis of difference products

The final difference products were digested with Dpn II and shotgun cloned into a Bam HI-digested and alkaline phosphatase-treated pRIT28 (Hultman et al., 1991) vector (DP1 and DP2) or pCRII vector (DP3). Colonies were PCR screened with vector-specific primers (Hultman et al., 1991) and approximately 300 clones per differential product were sequenced using cycle sequencing with biotinylated vector-specific primers and a DNA sequencing kit (BigDye Terminator Cycle Sequencing Ready Reaction, Perkin Elmer, Foster City, CA, USA). The Staden software version 4.4 was used for sequence preprocessing and assembly (Staden, 1996). First, Pregap4 was run interactively to clean sequences from vector and low-quality sequence and then


the sequences were assembled using the gap4 normal shotgun assembly of sequences with a similarity greater than 95% and an initial perfect match of more than 20 base pairs. Contigs overlapping with a sequence similarity greater than 90% were inspected manually and joined when sequence identity was supported by hidden data (low quality sequence) or could be justified by manual editing of sequencing artefacts. The consensus sequences were screened for Dpn II sites to detect chimeric clones resulting from ligation of more than one gene fragment into the cloning site. The final contigs were compared by BLASTN (Altschul et al., 1990) to the representative nucleotide sequences included in the UniGene database (release 148) (http://www.ncbi.nlm.nih.gov/UniGene) and The Expressed Gene Anatomy Database (EGAD) (http://www.tigr.org/tdb/egad/egad.html). Sequences giving a lower than 90% identity to any of the sequences in these databases were further mapped against the human genome data downloaded from the EMBL web site (http://ebi.ac.uk/pub/databases/embl/release), and sets of sequences with no hits in the databases were investigated by manual BLASTN comparisons to the ENSEMBL (http://www.ensembl.org/) and NCBI (http://www.ncbi.nlm.nih.gov/BLAST/) databases to elucidate their identity.

2.5. Real-time PCR verification

Gene-specific real-time PCR primers were designed for gene fragments cloned from the RDA DP2 using the PrimerExpress software (Perkin Elmer Life Sciences, Boston, MA, USA). New cDNA was prepared from the RNA starting material using the Superscript Preamplification System (LifeTechnologies). Four reactions with 2.2 µg of total RNA template and 200 ng random hexamer primers in each reaction were used. The analysis was performed with both cDNA (diluted 1:500) and representations (diluted 1:10,000) as template. The internal standard was the cDNA for ribosomal RNA S18 and the negative control was total RNA diluted to correspond to the cDNA dilution.

A PCR master mix was prepared using the SYBR Green DNA PCR core reagent kit (Applied Biosystems, Foster City, CA, USA) and aliquoted into microtitre plate wells. One microliter of template and 10 pmol of each primer were added to each well. The ABI Prism 7700 Sequence Detection System (Applied Biosystems) was used for PCR and detection of the fluorescent signal. For data evaluation, the comparative CT method was used according to the manufacturer's instructions and the level of significance was set to a twofold relative difference between samples.

2.6. Microarray analysis

PCR amplification of a non-redundant set of clones for each dataset (DP1–DP3) was performed in 50-µl PCR reactions using vector-specific primers and Platinum Taq polymerase (Invitrogen, Life Technologies, USA). The


1141 PCR products were purified by solid-phase technology using a Magnatrix 1200 instrument (Magnetic Biosolutions, Stockholm, Sweden) (Wirta et al., unpublished), dissolved in 24 µl 50% DMSO and printed onto amino-silane coated glass slides (CMT-GAPS II Coated Slides, Corning Inc., USA) using an automated arrayer (Genetix Qarray, Genetix Ltd., Hampshire, UK). We also printed a set of positive and negative control genes from human and Arabidopsis thaliana supplied with the SpotReport microarray kit (SpotReport-10 Array validation system, Stratagene, TX, USA). All clones were printed in triplicate, creating three identical fields on each slide with a spot size of 200 µm in diameter. The slides were UV cross-linked with 250 mJ and the spot quality was checked by soaking the slides in SYTO61 dye (SYTO Red fluorescent nucleic acid stains, Molecular Probes Europe BV, Leiden, The Netherlands) for fluorophore detection of double-stranded DNA.

Labelled targets for hybridization were generated by PCR in the presence of Cy3- or Cy5-labelled dCTPs (7:1 ratio of label to normal dCTPs) (Cyanine 3-dCTP and Cyanine 5-dCTP, NEN Life Science Products, Brussels, Belgium). The extension time at 72 °C was changed to 10 min/cycle (compared to the original protocol where 3 min/cycle was used) to compensate for the slower incorporation of the labelled nucleotides. Spin column purification of labelled PCR products was performed (QIAquick PCR purification Kit, Qiagen Inc., Chatsworth, CA, USA) according to the supplied manual. The samples were dried and resuspended in 26 µl hybridization buffer (50% formamide, 5× saline sodium citrate (SSC), 0.1% sodium dodecyl sulphate (SDS), 20 µg human COT1-DNA and 20 µg poly(A)-DNA), and the mix was denatured by heating to 95 °C for 3 min and snap-cooled on ice for 30 s. The hybridizations were performed as previously described (Hegde et al., 2000).
A dye-swap experimental design, comparing the same samples but with the dyes reversed between the two hybridizations, was used to compensate for dye-specific labelling effects. Scanning was performed using a confocal laser scanner (GMS418 Array Scanner, Genetic MicroSystems Inc., USA) and images were analysed with the GenePix Pro 3.0 software (Axon Instruments, CA, USA). The hybridization quality and specificity were ensured by checking the control spots and using the Axon Array Quality Control Report feature implemented in GenePix Pro 3.0. Spots flagged as bad or absent were removed, as were spots with aberrant spot morphology and spots with a median signal intensity lower than the local background plus two standard deviations in both channels. The channels were normalized using the median normalization factor calculated in GenePix Pro, as it was reasonable to assume that the total signal intensity would be equal in the two channels since approximately the same number of clones were spotted for the up-regulated and the down-regulated data sets. Each clone giving a signal in at least two (out of three) replicates on each slide was included in the analysis, and the ratio average (up-regulated by oxidized


LDL/down-regulated by oxidized LDL) of each spot was calculated per slide and then averaged between slides.
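The spot filtering, normalization and averaging described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the GenePix Pro procedure itself: the `Spot` record, its field names and the function names are hypothetical, ratios are taken on raw median signals without background subtraction, the median normalization is approximated by dividing by the median ratio of the retained spots, and both slides of the dye swap are assumed to have been re-oriented so that every ratio reads oxLDL-treated over untreated.

```python
from dataclasses import dataclass
from math import log2
from statistics import mean, median


@dataclass
class Spot:
    """One printed spot; all field names are hypothetical."""
    clone: str
    treated: float        # median signal, oxLDL-treated channel
    untreated: float      # median signal, untreated channel
    background: float     # local background (assumed equal in both channels)
    background_sd: float  # standard deviation of the local background
    flagged: bool = False  # bad, absent or aberrant morphology


def passes(spot):
    """Reject flagged spots and spots below background + 2 SD in BOTH channels."""
    if spot.flagged:
        return False
    threshold = spot.background + 2 * spot.background_sd
    return not (spot.treated < threshold and spot.untreated < threshold)


def slide_ratios(spots, min_replicates=2):
    """Median-normalized mean ratio per clone for one slide."""
    kept = [s for s in spots if passes(s)]
    if not kept:
        return {}
    # Assumed stand-in for the median normalization factor.
    factor = median(s.treated / s.untreated for s in kept)
    per_clone = {}
    for s in kept:
        per_clone.setdefault(s.clone, []).append(s.treated / s.untreated / factor)
    # Keep clones with a signal in at least two of the three replicates.
    return {c: mean(r) for c, r in per_clone.items() if len(r) >= min_replicates}


def combined_log2_ratios(slides):
    """Average the per-slide ratios of clones present on every slide, as log2."""
    per_slide = [slide_ratios(spots) for spots in slides]
    shared = set(per_slide[0]).intersection(*per_slide[1:])
    return {c: log2(mean(r[c] for r in per_slide)) for c in shared}
```

With this convention a value of +1 or above corresponds to at least twofold higher signal in the oxLDL-treated sample, and -1 or below to at least twofold lower signal.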

3. Results

We performed representational difference analysis (RDA) to isolate genes that are differentially expressed in two RNA populations. We used RNA extracted from macrophage-like cells (PMA-stimulated THP-1 cells) and foam-cell-like cells (PMA-stimulated and oxidized LDL exposed THP-1 cells), mimicking the process of foam cell formation which is observed in atherosclerotic plaques. Shotgun cloning and sequencing of the difference products (both up- and down-regulated) after each of three successive rounds of subtraction was performed, generating six sets of clones. A unique set of the obtained transcripts was then spotted onto custom microarrays, allowing for quick evaluation of the clone expression ratios in the different RDA products. The biological interpretation of the data has been reported elsewhere (Andersson et al., 2002).

3.1. RDA procedure

The RDA was performed as previously described (Odeberg et al., 2000). Briefly, the obtained cDNA was digested with Dpn II followed by linker ligation and amplification. The generated representations were then subjected to iterative steps of subtractive cross hybridization and selective PCR amplification to enrich for cDNA fragments more abundant in one mRNA population. We used streptavidin-coated magnetic beads for depletion of remaining or uncleaved linker fragments between successive hybridization rounds. After Dpn II digestion, linker ligation and PCR amplification to generate representative amplicons (representations) of each cDNA, three rounds of subtraction and amplification were performed with an increasing excess of driver added in each round. In the first subtraction (DP1) the ratio of tester to driver was 1:100, followed by ratios of 1:800 and 1:4000 in DP2 and DP3, respectively. Comparison of representations and differential products on a 2% agarose gel verified the expected appearance of distinct bands in later rounds of subtraction compared to the smear in the initial representations (Fig. 1).

3.2. Sequence sampling and assembly

Approximately 300 sequences from randomly selected clones per differential product were obtained (both up- and down-regulated), giving a total of 1984 sequences (Table 1). After vector trimming and manual editing of all sequences, the average sequence length was between 132 and 214 bp and all data sets contained sequences ranging from a few to several hundred bp, supporting the unbiased amplification of gene fragments of different lengths in the PCR steps (Table 1). The number of chimeric clones resulting from

Fig. 1. Representations and difference products (DP) of macrophages and foam cells. The representations and successive difference products of down-regulated and up-regulated samples were separated on an ethidium-stained 2% agarose gel. PCR generation of representations from macrophages (lane 1) and foam cells (lane 2) has been carefully optimized to correspond to the wide size range and evenly distributed smear in the original cut cDNA. In the subsequent differential products containing down-regulated (lanes 3 (DP1), 5 (DP2) and 7 (DP3)) and up-regulated (lanes 4 (DP1), 6 (DP2) and 8 (DP3)) gene fragments, a pattern of more distinct bands and less smear is observed.

ligation of more than one gene fragment into a vector was estimated by in silico screening of sequences for Dpn II sites and the frequency detected was very low (≤2% for all data sets) (Table 1). The sequences from each differential product were assembled and the resulting contigs were found to include both a large number of singletons (one unique sequence) and clusters (two or more identical sequences). The high number of unique sequences in each difference product (Table 1) indicates that the few bands visible on the agarose gel in, e.g., DP3 give an underestimation of the complexity of gene fragments present. The selective enrichment of certain fragments is demonstrated by the increased number of identical sequences present in each cluster of the later difference products (Table 1).

A pair-wise basic local alignment search tool (BLAST) comparison (Altschul et al., 1990) of sequences from the RDA datasets was used to estimate the number of sequences shared between iterative subtraction rounds and between the up- and down-regulated data sets (Fig. 2). The retention and disappearance of specific gene fragments during the RDA process followed this way are shown in Fig. 2A and B. The results for up-regulated and down-regulated gene fragments are similar, with a large number of gene fragments disappearing from DP1 to DP2, and from DP2 to DP3, but also a substantial number of fragments that are novel in DP3. These new fragments are of interest since they have been retained during a process of stringent selection for gene fragments with significantly altered expression levels. A low number of false positives was detected, i.e. sequences present in both up-regulated and down-regulated datasets, and the number


Table 1. Successive enrichment of differentially expressed gene fragments through three cycles of RDA was followed by sequencing and sequence assembly of random clones for each difference product. Up: up-regulated by oxidized LDL; Down: down-regulated by oxidized LDL.

| | Up DP1 | Up DP2 | Up DP3 | Down DP1 | Down DP2 | Down DP3 |
|---|---|---|---|---|---|---|
| No. of sequenced clones | 296 | 338 | 346 | 315 | 331 | 358 |
| No. of contigs | 241 | 167 | 151 | 261 | 189 | 132 |
| Singletons (% of contigs) | 221 (92%) | 129 (77%) | 118 (78%) | 239 (92%) | 144 (76%) | 89 (67%) |
| Clusters (% of contigs) | 20 (8%) | 38 (23%) | 33 (22%) | 22 (8%) | 45 (24%) | 43 (33%) |
| Mean no. of sequences per cluster | 3.75 | 5.5 | 6.9 | 3.45 | 4.15 | 6.25 |
| Minimum sequence length (bp) | 12 | 26 | 20 | 23 | 21 | 14 |
| Maximum sequence length (bp) | 507 | 441 | 337 | 339 | 529 | 337 |
| Mean sequence length (bp) | 181 | 214 | 132 | 150 | 213 | 156 |
| Dpn II sites (chimeric clones) | 6 (2.0%) | 4 (1.2%) | 6 (1.7%) | 4 (1.3%) | 6 (1.8%) | 4 (1.1%) |
| BLASTN matches to UniGene data (release 148) a | 202 (68%) | 230 (68%) | 202 (58%) | 182 (57%) | 180 (54%) | 151 (42%) |
| BLASTN matches to Human Genome data (EMBL) a | 227 (77%) | 281 (83%) | 251 (73%) | 223 (71%) | 228 (69%) | 274 (76%) |

a Sequence similarity > 90%.
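The Dpn II screen for chimeric clones reported in Table 1 can be illustrated with a short Python sketch. Dpn II recognizes the palindromic site GATC, so only one strand needs to be scanned. Everything else here is an assumption made for illustration: the function names are hypothetical, and the sketch treats a GATC at either end of the trimmed insert as the expected cloning junction and flags only internal sites. The actual screening implementation is not described in the text.

```python
DPN_II_SITE = "GATC"  # palindromic, so scanning one strand is sufficient


def internal_dpn2_sites(seq, site=DPN_II_SITE):
    """Return 0-based start positions of sites that are not at either insert end."""
    seq = seq.upper()
    last_start = len(seq) - len(site)
    hits = []
    pos = seq.find(site)
    while pos != -1:
        if 0 < pos < last_start:  # skip junction remnants at the two ends
            hits.append(pos)
        pos = seq.find(site, pos + 1)
    return hits


def chimeric_clones(inserts):
    """Return the names of flagged clones and their fraction of all clones."""
    flagged = [name for name, seq in inserts.items() if internal_dpn2_sites(seq)]
    return flagged, len(flagged) / len(inserts)


# Toy inserts (invented sequences, not data from the study).
example = {
    "clone1": "GATCAAACCCTTTGATC",   # sites only at the ends
    "clone2": "GATCAAAGATCTTTGATC",  # internal site at position 7
    "clone3": "acgtacgt",            # no site at all
}
print(chimeric_clones(example))  # (['clone2'], 0.3333333333333333)
```

A flagged clone is only a candidate chimera. A single fragment cannot contain an internal site after complete digestion, but incomplete digestion would produce the same signal.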

Fig. 2. Survival and depletion of gene fragments in the RDA process. Gene fragments unique and/or shared between successive subtraction rounds (DP1–DP3), detected by sequence comparisons between the datasets representing up- (A) and down-regulated (B) gene fragments. The number of sequences that overlap between opposed datasets indicates a very low number of false positives, especially in later rounds of RDA (C).


was slightly decreasing with increased number of subtractions (Fig. 2C).
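The bookkeeping behind Fig. 2 can be sketched with plain set operations. The study established shared sequences by pair-wise BLAST comparison. The sketch below instead assumes that each contig has already been assigned a gene identifier, so that data sets can be compared as sets. The identifiers, the function names and the choice of the later data set as the denominator are all assumptions for illustration.

```python
def round_transitions(earlier, later):
    """Count fragments retained, lost and newly appearing between two rounds."""
    earlier, later = set(earlier), set(later)
    return {
        "retained": len(earlier & later),
        "lost": len(earlier - later),
        "new": len(later - earlier),
    }


def shared_percentage(earlier, later):
    """Shared identifiers as a percentage of the later data set (assumed denominator)."""
    later = set(later)
    if not later:
        return 0.0
    return 100.0 * len(set(earlier) & later) / len(later)


def false_positive_candidates(up, down):
    """Identifiers present in both the up- and the down-regulated data set."""
    return set(up) & set(down)


# Toy identifiers, invented for illustration only.
dp2_up = {"g1", "g2", "g3", "g4"}
dp3_up = {"g4", "g5"}
print(round_transitions(dp2_up, dp3_up))  # {'retained': 1, 'lost': 3, 'new': 1}
```

Applied to each pair of successive rounds, the first function gives the retained, lost and new counts of panels A and B, and the last function gives the overlap between opposed data sets shown in panel C.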

3.3. Sequence mapping

To verify the presence of true genes and estimate the relative amount of unknown genes in the RDA datasets, the sequences were mapped against the UniGene (release 148) database, and the sequences giving a less than 90% identical match were also mapped against the human genome data at http://www.ebi.ac.uk/embl/index.html (Table 1). A collection of 25 sequences giving matches to the genomic sequence (>90% similarity) was investigated with the Ensembl search tool (http://www.ensemble.org) to search for additional information indicating whether the sequences are derived from expressed genes or from genomic DNA. Out of the 25 sequences with sequence similarity to human genome data (but no similarity to annotated UniGene sequences), 24 were indicated to be in a region with shown or predicted gene sequence and only one sequence matched a completely unknown genomic region (data not shown).

Furthermore, a random set of 53 sequences with no match to any of the searched databases was subjected to manual BLAST searches against genomic data using the search tools at the ENSEMBL (human genome data) and NCBI (the non-redundant database) web pages, adapting the searches for short sequences. All but five of the 53 sequences gave matches to human sequences, ruling out the possibility of contamination among the sequences, and in the majority of cases the short sequence lengths can explain the difficulty of identifying these sequences by automated BLAST searches.

3.4. Microarray and confirmatory real-time PCR analysis

A non-redundant set of 1141 gene fragments from the six RDA difference products (DP1–DP3, up- and down-regulated) was amplified, purified and spotted onto microarrays (Table 2). Labelled RDA representations from foam cells and macrophages, respectively, were generated by direct incorporation of Cy3- or Cy5-labelled dCTPs in the PCR, and hybridized to the microarray in two experiments. Each dye was used for labelling of both samples to reveal dye-specific effects. The performance of the microarray assay demonstrated both a high specificity (no cross-hybridization to negative control genes) and a very high sensitivity, as 97% of the microarray elements repeatedly gave signals above the intensity threshold used (local background plus two standard deviations) and were included in the analysis (Table 2). This confirms that all of the clones sequenced from DP1–DP3 are derived from sequences also present in the starting material and not from some contamination during the RDA process. The majority of microarray signals are biased toward the red or green and not, as in a conventional microarray experiment, containing mostly equally expressed clones (Fig. 3).

Altered expression levels, indicated by sequence analysis and assembly of RDA clones, were further verified by real-time PCR on the starting cDNA for a few selected genes. Real-time PCR results show six out of seven selected genes to be differentially expressed as expected (Andersson et al., 2002). In the single non-confirmative case the real-time PCR signals were not above the signal threshold level. The clones selected for real-time PCR are indicated in Fig. 3 as spots hybridized

Fig. 3. Microarrays harbouring a unique set of RDA clones were spotted and the RDA representations were used for large scale determination of expression ratios in the RDA starting material. All spotted RDA clones are shown, with circles indicating seven clones that have been confirmed by quantitative PCR to be up- (red) or down-regulated (green) by oxidized LDL.


Fig. 4. The log2 ratios are plotted in falling order along the x-axis for all spots derived from each RDA difference product. Ratios > 0 indicate higher expression in the oxidized LDL treated samples. Spotted clones from the difference products enriched in gene fragments up-regulated by oxidized LDL (A: DP1; B: DP2; C: DP3) show a majority of clones with expression ratios higher than 0 and analogous results can be seen for the clone sets enriched in down-regulated gene fragments (D: DP1; E: DP2; F: DP3). All spotted clones are represented in F, showing the differential expression for both up-regulated and down-regulated clones. The cutoff at twofold ratios is indicated by the red lines.


Table 2. Microarray analysis of RDA gene fragments. Clones representative of the consensus sequences obtained from the consecutive rounds of RDA (DP1–DP3) were immobilized on a cDNA microarray and used for hybridizations with labelled representations. Up: up-regulated by oxidized LDL; Down: down-regulated by oxidized LDL.

| | Up DP1 | Up DP2 | Up DP3 | Down DP1 | Down DP2 | Down DP3 |
|---|---|---|---|---|---|---|
| No. of spotted clones | 241 | 167 | 151 | 261 | 189 | 132 |
| Clones yielding a significant signal a | 226 (94%) | 160 (96%) | 151 (100%) | 255 (98%) | 183 (97%) | 131 (99%) |
| Confirmed clones b | 65 (27%) | 109 (65%) | 48 (32%) | 54 (21%) | 33 (17%) | 54 (41%) |
| False positives b | 2 (1%) | 3 (2%) | 0 (0%) | 3 (1%) | 6 (3%) | 0 (0%) |

a Signals higher than the local background + 2 standard deviations in at least two replicates on both slides.
b A cutoff at twofold ratios was used.

with labelled representations and yielding the expected colours on visual inspection, although the results were of course interpreted from measured signal intensities. The relative ratios for all RDA clones on the microarray are shown in Fig. 4. As expected, microarray clones derived from the down-regulated datasets are more abundant in the macrophage representations than in the foam cell material, and vice versa for the up-regulated genes. The number of clones demonstrating at least twofold ratios for all replicates is summarized in Table 2. Of all spotted clones, a total of 32% gave the expected results and <2% gave conflicting results relative to their appearance in the up- or down-regulated RDA data sets. It can be noted that the DP3 data sets contain no false positives (Table 2).
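The overall percentages quoted here can be re-derived from the per-data-set counts in Table 2. The following Python sketch simply sums the published counts. The dictionary layout and function name are illustrative choices, while the numbers are those of Table 2.

```python
# (spotted, significant signal, confirmed, false positives) per data set, from Table 2.
TABLE2 = {
    ("up", "DP1"): (241, 226, 65, 2),
    ("up", "DP2"): (167, 160, 109, 3),
    ("up", "DP3"): (151, 151, 48, 0),
    ("down", "DP1"): (261, 255, 54, 3),
    ("down", "DP2"): (189, 183, 33, 6),
    ("down", "DP3"): (132, 131, 54, 0),
}


def column_totals(table):
    """Sum each of the four count columns over all data sets."""
    return tuple(sum(column) for column in zip(*table.values()))


spotted, signal, confirmed, false_pos = column_totals(TABLE2)
print(spotted, signal, confirmed, false_pos)   # 1141 1106 363 14
print(round(100 * signal / spotted))           # 97  (clones with significant signal)
print(round(100 * confirmed / spotted))        # 32  (confirmed at the twofold cutoff)
print(round(100 * false_pos / spotted, 1))     # 1.2 (false positives, i.e. < 2%)

up_spotted = sum(v[0] for (direction, _), v in TABLE2.items() if direction == "up")
print(round(100 * up_spotted / spotted))       # 49  (share of clones from up-regulated sets)
```

The 32% and <2% figures are therefore fractions of all 1141 spotted clones, and the 49% share of up-regulated clones is the basis for the balanced-channel assumption used in the normalization.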

4. Discussion

In this study we have monitored and compared the RDA subtraction process by shotgun sequencing and microarray analysis of the difference products. The large diversity of gene sequences found in the difference products, together with the enrichment of rare transcripts in later rounds of RDA, demonstrates its usefulness for large-scale profiling of differential gene expression. The pool of randomly sequenced clones analysed here is clearly only a fraction of the potentially interesting content of the RDA products, as demonstrated by the high percentage of singletons (67–92%) even after sequencing more than 300 subtracted clones. Thus continued sequencing would likely reveal additional gene fragments. This has not been done in previous RDA studies, probably due to the observation of a few very strong bands on agarose gel already after 2–3 subtraction rounds (Fig. 1). There is a surprisingly low overlap between sequences from DP2 to DP3 (<9% in successive subtraction rounds) (Fig. 2), considering the similar pattern observed after gel electrophoresis (Fig. 1). The low overlap could be due to the small sample size (a few hundred clones) obtained from the large sequence diversity (thousands of fragments) present, but this also shows the usefulness of extended sequencing of

all difference products for retrieval of new information. It is anticipated that the information obtained from sequencing the third difference products should represent truly differentially expressed genes, since the retained gene fragments survived a stringent selection process. Still, all of the sequences in DP3 must have been present in DP2 and DP1 but possibly at very low levels. The presented data may also suggest that a fourth round of RDA can be carried out to identify even more transcripts.

The process of mapping the sequences by BLASTN searches showed that the majority of the sequences obtained give matches to annotated or predicted gene coding regions in the human genome. Difficulties in getting good alignments for many sequences using automated batch BLAST were overcome by manual BLAST of individual sequences using BLAST settings suited for short sequences (a word length of 7 instead of 11 and an expect threshold increased from 10 to 1000). This revealed that most sequences were indeed very similar to human sequences but too short to be used for reliable mapping by BLAST. The fact that sequences of less than 200 bp are difficult to map using any sequence comparison algorithm has already been reported (Andersson and Brass, 1998), but our careful manual investigation of the sequences and alignments allowed the majority to give hits to (1) human sequences (rather than other species in the non-redundant database) and (2) gene coding or predicted gene coding regions. This leads us to believe that the sequence data are informative and non-contaminated. The shotgun procedure offers quantitative and statistical estimations of differential gene expression, albeit of the subtraction products. The disadvantage is the relatively cumbersome sequencing required for analysis of hundreds of fragments, as indicated in this study, from each cycle.
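For readers who want to reproduce the short-sequence settings locally, the sketch below builds a command line for the NCBI BLAST+ `blastn` program with the word size and expect threshold mentioned above. This is an assumption about present-day tooling, not the procedure used in the study, which relied on the ENSEMBL and NCBI web pages. The function name, file name and database name are hypothetical.

```python
def short_query_blastn_args(query_fasta, database, word_size=7, evalue=1000):
    """Build (but do not run) a BLAST+ blastn command for short query sequences."""
    return [
        "blastn",
        "-task", "blastn-short",        # BLAST+ task intended for short queries
        "-word_size", str(word_size),   # 7 instead of the default 11
        "-evalue", str(evalue),         # expect threshold raised from 10 to 1000
        "-query", query_fasta,          # hypothetical FASTA file of unmapped contigs
        "-db", database,                # hypothetical local nucleotide database
        "-outfmt", "6",                 # tabular output
    ]


args = short_query_blastn_args("unmapped_contigs.fasta", "human_genome")
print(" ".join(args))
```

If BLAST+ and a formatted database are installed, the list can be passed to `subprocess.run(args, check=True)`. With an expect threshold this permissive the hits still need to be inspected individually, as was done in the study.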
Microarray technology has previously been shown to be useful as a high-throughput tool for screening of differentially expressed RDA clones without a pre-sequencing step (Boeuf et al., 2001; Welford et al., 1998; Geschwind et al., 2001). To analyse the relative expression ratios of the cloned RDA sequences in the starting material, and to create a tool for further investigation of the genes they represent in other biological samples, we created microarrays harbouring these clones. A representative of each cluster from each difference product was spotted onto microarrays in triplicate, and duplicate experiments including dye swaps were performed to eliminate artefacts. The two starting cDNA sources (representations) were used in two hybridizations. Data normalization based on the assumption of equal gene representation in the two channels may not be applicable to this kind of array. However, the array design used, in which approximately half of the spotted clones are derived from up-regulated (49%) and half from down-regulated (51%) difference products, should ensure an overall balance between the channels. In this study, 32% of the microarray elements were confirmed using a cutoff of more than twofold expression ratio. The general patterns of expression ratios shown in Fig. 4 give a clear overview of the enrichment of differentially expressed genes in RDA, where the large majority of the obtained gene fragments are overexpressed in the sample that was used as tester. The subtraction of gene fragments that are expressed at higher levels in the driver is not complete, but the remaining fraction decreases with each additional subtraction. Less than 2% of the clones gave conflicting microarray data using the twofold expression ratio cutoff, and in DP3 no clones were detected as false positives (Table 2).

In conclusion, we show that shotgun RDA combined with large-scale DNA sequencing is an attractive approach to monitor differential expression, and that the difference products can, to a certain extent, be analysed with high-throughput microarray technology.
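The twofold cutoff applied to dye-swapped, replicated spots in the microarray analysis above can be illustrated with a short calculation. This is a minimal sketch, not the analysis pipeline used in the study: the function names, the intensity values, and the choice to average log2 ratios across all replicate spots are assumptions made for illustration, and no normalization or background correction is included.

```python
import math
from statistics import mean
from typing import List, Tuple

Spot = Tuple[float, float]  # (Cy5 intensity, Cy3 intensity) for one replicate spot


def combined_log2_ratio(forward: List[Spot], swapped: List[Spot]) -> float:
    """Mean log2(tester/driver) over replicate spots from a dye-swap pair.

    Assumption: tester cDNA is labelled with Cy5 in the forward hybridization
    and with Cy3 in the swapped hybridization.
    """
    forward_logs = [math.log2(cy5 / cy3) for cy5, cy3 in forward]
    swapped_logs = [math.log2(cy3 / cy5) for cy5, cy3 in swapped]
    return mean(forward_logs + swapped_logs)


def classify(log2_ratio: float, fold: float = 2.0) -> str:
    """Label a clone using a 'more than fold-change' cutoff in either direction."""
    threshold = math.log2(fold)
    if log2_ratio > threshold:
        return "higher in tester"
    if log2_ratio < -threshold:
        return "higher in driver"
    return "below cutoff"


if __name__ == "__main__":
    # Hypothetical intensities for one clone spotted in triplicate.
    forward = [(4000.0, 1000.0)] * 3
    swapped = [(1000.0, 4000.0)] * 3
    ratio = combined_log2_ratio(forward, swapped)
    print(ratio, classify(ratio))
```

Averaging over both labelling orientations cancels a dye-specific bias that would otherwise inflate or deflate the ratio in a single hybridization. A clone cloned from a tester-enriched difference product that is classified as "higher in driver" would correspond to the conflicting signals counted as false positives in the text.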

Acknowledgements

The authors would like to thank Bahram Amini and Annelie Johansson of KTH, and Nina Ståhlberg of CMM, the Karolinska Institute, for excellent technical assistance. This work was supported by funds from AstraZeneca, the Knut and Alice Wallenberg foundation (KAW) and the Foundation for Strategic Research (SSF).

References

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215, 403–410.
Andersson, I., Brass, A., 1998. Searching DNA databases for similarities to DNA sequences: when is a match significant? Bioinformatics 14, 349–356.
Andersson, T., Boräng, B., Larsson, M., Wirta, V., Wennborg, A., Lundeberg, J., Odeberg, J., 2002. Novel candidate genes for atherosclerosis are identified by representational difference analysis-based transcript profiling of cholesterol loaded macrophages. Pathobiology 69, 304–314.
Boeuf, S., Klingenspor, M., Van Hal, N.L., Schneider, T., Keijer, J., Klaus, S., 2001. Differential gene expression in white and brown preadipocytes. Physiol. Genomics 7, 15–25.
Borang, S., Andersson, T., Thelin, A., Larsson, M., Odeberg, J., Lundeberg, J., 2001. Monitoring of the subtraction process in solid-phase representational difference analysis: characterization of a candidate drug. Gene 271, 183–192.
Geschwind, D.H., Ou, J., Easterday, M.C., Dougherty, J.D., Jackson, R.L., Chen, Z., Antoine, H., Terskikh, A., Weissman, I.L., Nelson, S.F., Kornblum, H.I., 2001. A genetic analysis of neural progenitor differentiation. Neuron 29, 325–339.
Gubler, U., 1988. A one tube reaction for the synthesis of blunt-ended double-stranded cDNA. Nucleic Acids Res. 16, 2726.
Hegde, P., Qi, R., Abernathy, K., Gay, C., Dharap, S., Gaspard, R., Hughes, J.E., Snesrud, E., Lee, N., Quackenbush, J., 2000. A concise guide to cDNA microarray analysis. Biotechniques 29, 548–550, 552–554, 556 passim.
Hertzberg, M., Sievertzon, M., Aspeborg, H., Nilsson, P., Sandberg, G., Lundeberg, J., 2001. cDNA microarray analysis of small plant tissue samples using a cDNA tag target amplification protocol. Plant J. 25, 585–591.
Hubank, M., Schatz, D.G., 1994. Identifying differences in mRNA expression by representational difference analysis of cDNA. Nucleic Acids Res. 22, 5640–5648.
Hultman, T., Bergh, S., Moks, T., Uhlén, M., 1991. Bidirectional solid-phase sequencing of in vitro-amplified plasmid DNA. Biotechniques 10, 84–93.
Lisitsyn, N., Lisitsyn, N., Wigler, M., 1993. Cloning the differences between two complex genomes. Science 259, 946–951.
Lockhart, D.J., Dong, H., Byrne, M.C., Follettie, M.T., Gallo, M.V., Chee, M.S., Mittmann, M., Wang, C., Kobayashi, M., Horton, H., Brown, E.L., 1996. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol. 14, 1675–1680.
Odeberg, J., Wood, T., Blucher, A., Rafter, J., Norstedt, G., Lundeberg, J., 2000. A cDNA RDA protocol using solid-phase technology suited for analysis in small tissue samples. Biomol. Eng. 17, 1–9.
Schena, M., Shalon, D., Davis, R.W., Brown, P.O., 1995. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467–470.
Staden, R., 1996. The Staden sequence analysis package. Mol. Biotechnol. 5, 233–241.
Wang, E., Miller, L.D., Ohnmacht, G.A., Liu, E.T., Marincola, F.M., 2000. High-fidelity mRNA amplification for gene profiling. Nat. Biotechnol. 18, 457–459.
Welford, S.M., Gregg, J., Chen, E., Garrison, D., Sorensen, P.H., Denny, C.T., Nelson, S.F., 1998. Detection of differentially expressed genes in primary tumor tissues using representational differences analysis coupled to microarray hybridization. Nucleic Acids Res. 26, 3059–3065.
Xiang, C.C., Kozhich, O.A., Chen, M., Inman, J.M., Phan, Q.N., Chen, Y., Brownstein, M.J., 2002. Amine-modified random primers to label probes for DNA microarrays. Nat. Biotechnol. 20, 738–742.