ANALYTICAL BIOCHEMISTRY Analytical Biochemistry 344 (2005) 92–101 www.elsevier.com/locate/yabio
cRNA target preparation for microarrays: Comparison of gene expression profiles generated with different amplification procedures
Heike Schindler 1, Anja Wiese 1, Johannes Auer, Helmut Burtscher *
Roche Diagnostics, Pharma Research, Nonnenwald 2, D-82372 Penzberg, Germany
Received 11 April 2005
Available online 22 June 2005
Abstract
Microarray technology has become a standard tool for generation of gene expression profiles to explore human disease processes. Being able to start from minute amounts of RNA extends the fields of application to core needle biopsies, laser capture microdissected cells, and flow-sorted cells. Several RNA amplification methods have been developed, but no extensive comparability and concordance studies of gene expression profiles are available. Different amplification methods may produce differences in gene expression patterns. Therefore, we compared profiles processed by a standard microarray protocol with three different types of RNA amplification: (i) two rounds of linear target amplification, (ii) random amplification, and (iii) amplification based on a template switching mechanism. The latter two methods accomplish target amplification in a nonlinear way using PCR technology. Starting from as little as 50 ng of total RNA, the yield of labeled cRNA was sufficient for hybridization to the Affymetrix HG-U133A GeneChip array using the respective methods. Replicate experiments were highly reproducible for each method. In comparison with the standard protocol, all three approaches were less sensitive and introduced a minor but clearly detectable bias of the detection call. In conclusion, the three amplification protocols used are applicable for GeneChip analysis of small tissue samples.
© 2005 Elsevier Inc. All rights reserved.
Keywords: Gene expression; Microarray; RNA amplification
* Corresponding author. Fax: +49 8856 60 2659. E-mail address: [email protected] (H. Burtscher).
1 These two investigators contributed equally to this work and should both be considered as first authors.
0003-2697/$ - see front matter © 2005 Elsevier Inc. All rights reserved. doi:10.1016/j.ab.2005.06.006

Microarray technique is a powerful tool for studying complex phenomena in biology and molecular medicine as represented by numerous applications, including developmental biology [1], disease classification [2,3], drug discovery [4], and toxicology [5]. Using whole-genome arrays, thousands of genes can be analyzed simultaneously, allowing the characterization of the "transcriptome" of cells and tissues. The application of microarray technology in cancer research, in particular, promises a better understanding of molecular processes in tumors [6]. Subtyping of human cancers such as leukemia [7], hepatocellular carcinoma [8], and breast cancer [9], prediction of prognosis and patients' treatment outcome [10], and discovery of diagnostic markers [11] and therapeutic targets are of major clinical interest. Studying clinical samples with microarrays remains a challenge due to the often low quality, complexity, and limited amount of human tissue available. During recent years, small size human tissue samples, such as core biopsies and specific cell populations selected by laser capture microdissection, have more often become a source for microarray analysis, yielding only minute amounts of total RNA. The amount of starting material needed depends on the microarray technology using cDNA [12] and oligonucleotide [13] platforms. cDNA arrays require up to 200 µg of total RNA, whereas 5–40 µg of total RNA is recommended for oligonucleotide arrays (e.g., GeneChip array, Affymetrix, Santa Clara, CA, USA). Replicates are mandatory to reduce errors,
cRNA target preparation for microarrays / H. Schindler et al. / Anal. Biochem. 344 (2005) 92–101
and this again increases the demand for total RNA. Much effort has been made to reduce the amount of RNA required for analysis. Potential strategies to overcome the problem are signal amplification [14–16] and RNA amplification. A few commercial kits and protocols are available for RNA amplification for microarray applications. Good performance of amplification based on a template switching mechanism [17–19], random amplification [20], and two rounds of linear target amplification [21–24] has been reported. To our knowledge, no extensive comparability study of several RNA amplification methods is available to this point. In this study, three different RNA amplification protocols were applied to generate sufficient material for final hybridization to Affymetrix HG-U133A GeneChip arrays. We evaluated these target amplification methods in terms of effectiveness, robustness, and reproducibility. Another focus of our study was the analysis of concordance between target amplification and standard methods.
Materials and methods

Total RNA preparation

Sections of two different snap-frozen human lung tumor tissues were transferred into Lysing Matrix D (Bio101 System, Qbiogene, Heidelberg, Germany) and homogenized in 1 ml TriPure isolation reagent (Roche Diagnostics, Mannheim, Germany) using FastPrep FP120 (BIO101 Savant, Qbiogene). Total RNA was extracted according to the manufacturer's instructions (Roche Diagnostics, Roche Applied Science) and finally purified using the RNeasy RNA isolation system (Qiagen, Hilden, Germany). Total RNA was quantified by spectrophotometry (GeneQuant pro, Biochrom, Cambridge, UK), aliquoted in 10-µg and 50-ng portions, respectively, and stored at −70 °C until further processing.

Standard protocol for cDNA synthesis

As reference, 10 µg of high-quality total RNA was used as template in the cDNA first-strand synthesis, as recommended by the manufacturer (Affymetrix). The synthesis of double-stranded cDNA was performed using the Microarray cDNA Synthesis Kit (Roche Diagnostics) followed by the Microarray Target Purification Kit (Roche Diagnostics).
Random amplification of total RNA

Random amplification of 50 ng of total RNA was performed using the Microarray Target Amplification Kit (Roche Diagnostics) following the manufacturer's instructions. In the reverse transcription reaction, a modified oligo(dT) primer containing a target amplification sequence (TAS)2 was introduced, generating the 3′ anchor on the cDNA for subsequent PCR amplification. For initiation of second-strand cDNA synthesis, an oligo(dN)10 primer, which also was coupled to a TAS sequence to include a 5′ anchor sequence, was used. Before PCR amplification, the double-stranded cDNA product was purified using the Microarray Target Purification Kit. A PCR with an optimal number of cycles (n = 24) was performed, and the amplified cDNA was purified.

2 Abbreviations used: TAS, target amplification sequence; RT, reverse transcriptase; GCOS, GeneChip Operating Software; GAPDH, glyceraldehyde-3-phosphate dehydrogenase; EGFR, epidermal growth factor receptor; IVT, in vitro transcription.

Amplification based on a template switching mechanism

The Super SMART PCR cDNA Synthesis Kit (BD Biosciences Clontech, Heidelberg, Germany) provides a second PCR-based method for generating double-stranded cDNA from small quantities of total RNA. Again, 50 ng of total RNA was used as starting material. To allow a subsequent labeling by in vitro transcription with T7 RNA polymerase, a modified 3′ SMART CDS primer II A was used for first-strand synthesis containing the following T7 promoter sequence: 5′-aag cag tgg tat caa cgc aga gta tgt aat acg act cac tat agg gag gcg gtt ttt ttt ttt ttt ttt ttt ttt ttt ttt t(a/c/g)n-3′ (TIB Molbiol, Berlin, Germany). The terminal transferase activity of PowerScript reverse transcriptase (RT) adds a few additional nucleotides, primarily deoxycytidine, to the 3′ end of the cDNA when the enzyme reaches the 5′ end of the mRNA. The SMART oligonucleotide, which has an oligo(G) sequence at its 3′ end, base pairs with the deoxycytidine stretch, creating an extended template. RT then switches templates and continues replicating to the end of the oligonucleotide. The resulting full-length, single-stranded cDNA contains the complete 5′ end of the mRNA as well as sequences that are complementary to the SMART oligonucleotide. The long-distance PCR was optimized for each cDNA in a Perkin-Elmer DNA Thermal Cycler PE 9600. For all eight samples, the PCR was terminated after 23 cycles. The PCR products were finally purified with spin columns provided in the kit according to the manufacturer's instructions.

Two cycles of linear target amplification

Double-stranded cDNA was generated from 50 ng of starting material performing two cycles of standard cDNA synthesis combined with in vitro transcription for target amplification [25]. The protocol described for the GeneChip eukaryotic small sample target labeling assay (version II, Affymetrix) was carried out with minor modifications. With the exception of Superscript II RT (Invitrogen, Karlsruhe, Germany), all enzymes were obtained from Roche Diagnostics. The ethanol precipitation of double-stranded cDNA was replaced by a purification step using the Microarray RNA Target Purification Kit (Roche Diagnostics).

Generation of labeled target for hybridization by in vitro transcription

Double-stranded cDNA was transcribed in vitro using the Microarray RNA Target Synthesis Kit (T7) (Roche Diagnostics). PCR-amplified cDNA (200 ng) and the complete reaction volume of linearly amplified samples were labeled according to the manufacturer's protocol. Labeled antisense RNA was finally purified using RNeasy spin columns (Qiagen).

Array hybridization and data analysis

The fragmentation of antisense RNA, the hybridization of samples to HG-U133A Affymetrix GeneChip arrays, washing, and staining using the protocol EukGE_WS2v4 were carried out following the manufacturer's protocol. The fluorescence signals were detected with an HP GeneArray scanner (Hewlett–Packard, Palo Alto, CA, USA). Raw data were analyzed using Affymetrix GeneChip Operating Software (GCOS). The detection P value of a transcript determines the detection call, which indicates whether the transcript is reliably detected (present) or not detected (absent). Detection P values between 0 and 0.04 result in present calls, and detection P values between 0.06 and 1 result in absent calls. For reliable comparisons of multiple arrays, the data were normalized by "global scaling." The median of all gene expression values of each sample was set to 1, and all expression values were computed relative to it. R² values determined by Pearson product–moment correlation measure the strength and direction of linear relationship between the x and y variables. Fold changes between two groups of samples were calculated from the median values of replicates, and expression levels of genes below 4 were set to 4. P values were determined by Wilcoxon rank test, and the significance level was 0.05.
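The data analysis rules described above can be illustrated with a minimal Python sketch. It is not the GCOS implementation. The function and constant names are hypothetical, the "marginal" label for P values between the two thresholds and the exact handling of the boundary values are assumptions, and applying the floor of 4 to the replicate medians (rather than to each replicate value) is one possible reading of the text.

```python
from statistics import median

PRESENT_BELOW = 0.04  # detection P values from 0 up to this threshold: present call
ABSENT_FROM = 0.06    # detection P values from this threshold up to 1: absent call
SIGNAL_FLOOR = 4.0    # expression levels below 4 are set to 4 for fold changes


def detection_call(p_value):
    """Map a detection P value to 'P' (present), 'M' (marginal), or 'A' (absent)."""
    if p_value < PRESENT_BELOW:
        return "P"
    if p_value >= ABSENT_FROM:
        return "A"
    # Assumption: values between the two thresholds are treated as marginal.
    return "M"


def global_scale(signals):
    """Divide all signals of one array by the array median, so the median becomes 1."""
    array_median = median(signals)
    return [s / array_median for s in signals]


def fold_change(reference_replicates, test_replicates, floor=SIGNAL_FLOOR):
    """Ratio of replicate medians for one probe set.

    Assumption: the floor is applied to the medians, not to single replicates.
    """
    reference = max(median(reference_replicates), floor)
    test = max(median(test_replicates), floor)
    return test / reference
```

For example, `fold_change([1.0, 2.0, 3.0], [16.0, 20.0, 24.0])` raises the reference median of 2 to the floor of 4 and returns 5.0, which keeps very low signals from producing inflated ratios.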
Results

Experimental setup

With the objective of assessing several amplification procedures for gene expression profiling, we compared microarray profiles generated with a standard protocol (for 10 µg total RNA) and three different amplification methods (for 50 ng total RNA). Two PCR-based nonlinear amplification techniques (Microarray Target Amplification Kit and Super SMART PCR cDNA Synthesis Kit) and two rounds of linear amplification according to an Affymetrix technical note (GeneChip eukaryotic small sample target labeling assay, version II) were carried out as protocols for small sample target preparation. Starting from cDNA synthesis, each amplification method was performed in eight parallel replicates with total RNA isolated from two different human lung tissue samples as depicted in Fig. 1. Four replicates were prepared following the Affymetrix standard procedure of target preparation. This standard protocol, based on the method first described by Van Gelder et al. [25], includes a linear amplification step as well. Each labeled cRNA reaction mix was hybridized separately to an Affymetrix GeneChip HG-U133A array, which contains 22,283 probe sets corresponding to approximately 14,500 functionally characterized human UniGene clusters.

Fig. 1. Experimental workflow for each lung tumor sample. Four methods were compared: standard protocol (A), two rounds of amplification (B), random amplification (C), and SMART amplification (D). IVT, in vitro transcription. More detailed information for random amplification is available at www.roche-applied-science.com/sis/microarray/images/wr_pcr_big.gif and for SMART amplification at www.bdbiosciences.com/clontech/techinfo/manuals/pdf/pt3041-1.pdf.

Yield of cRNA

Using as little as 50 ng of total RNA for cDNA synthesis, both linear and nonlinear amplification procedures resulted in sufficient amounts of cRNA for optimal GeneChip hybridization as depicted in Table 1.

Comparison of chip sensitivity and quality

There are several criteria to describe the quality of the cRNA and the hybridization process (www.affymetrix.com/support/downloads/manuals/data_analysis_fundamentals_manual.pdf). After analysis with GCOS, we first compared the number of present calls on the HG-U133A GeneChip array. This parameter describes the number of probe sets with a reliably detectable signal. In the standard protocol replicates of the first and second samples, 52.8 and 52.0%, respectively, of the probe sets on the HG-U133A GeneChip array gave present calls (Table 2). In comparison, 50.1 ± 5.8% present calls is the average of 68 lung samples in our internal database. The percentage of present calls using amplification was reduced compared with the reference, but the percentage was more than 40% for all samples that were amplified. Thus, the maximum number of present calls was observed for standard protocol samples, indicating the highest chip sensitivity. Another measure for the quality of the hybridized nucleic acid is the ratio of the expression values of probe sets located at the 3′ and 5′ ends of different housekeeping genes. The optimal value of a 3′/5′ ratio is 1, which indicates that the mRNA transcript was not enzymatically degraded during either isolation or cDNA synthesis and that in vitro transcription was complete. From our experience, profiles of tissue samples with values up to 5 for human glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and up to 10 for human β-actin are acceptable for further analysis. As expected, the nonlinear amplification methods resulted in 3′/5′ ratios closer to 1 with regard to β-actin and GAPDH (Table 2). Both systems use conditions that ensure an efficient and accurate amplification of cDNA templates by long-distance PCR, resulting in an enrichment of full-length sequences. For the linear amplification, the ratios indicated a differential loss of the 5′ end of the transcripts; in particular for human β-actin, the values clearly exceed our internal limit (Table 2). All values of the linear amplification were higher than the values of PCR-based amplification methods, with an average for human GAPDH between 2.2 and 11.1 and for human β-actin as high as 30.3. However, this loss of 5′ ends of the transcripts was not reflected by a decrease in the number of present calls (Table 2).

Intra- and intermethod reproducibility

The Pearson product–moment correlation coefficient (R²) of all probe sets was calculated to describe the variance within replicates of the same or different target preparation methods. For all four methods of target generation, the average R² value was 0.98 (Table 3A), which is in agreement with data also reported by Affymetrix (GeneChip eukaryotic small sample target labeling assay, version II) and Klur et al. [20]. Each amplification strategy was reproducible and highly comparable with the standard method. Comparison of samples generated by either the standard assay or amplification protocol, however, resulted in R² values as low as 0.67 (Table 3B). Even lower R² values (0.72–0.50) were obtained when different amplification strategies were compared with each other (Table 3C).
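As an illustration of the reproducibility measure, the following Python sketch computes the squared Pearson correlation between the signals of two arrays and averages it over all pairs of replicates. The function names are hypothetical, and whether the original analysis used raw, scaled, or log-transformed signals is not stated in the text, so the choice of input values is left to the caller.

```python
from itertools import combinations
from math import sqrt


def pearson_r_squared(x, y):
    """Squared Pearson product-moment correlation of two equally long signal lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    r = covariance / sqrt(variance_x * variance_y)
    return r * r


def mean_pairwise_r_squared(arrays):
    """Average R² over every pair of arrays (e.g., all replicates of one method)."""
    pairs = list(combinations(arrays, 2))
    return sum(pearson_r_squared(a, b) for a, b in pairs) / len(pairs)
```

An intramethod value would pass the replicates of a single protocol to `mean_pairwise_r_squared`; an intermethod value would instead pair each replicate of one protocol with each replicate of another and average `pearson_r_squared` over those pairs.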
Table 1
Yields (µg) of labeled cRNA

Protocol                                      First sample   Second sample
Standard protocol (n = 4)                     36.9 ± 7.2     43.0 ± 9.0
Two rounds of linear amplification (n = 8)    45.8 ± 6.3     40.9 ± 6.7
Random amplification (n = 8)                  36.5 ± 7.6     65.2 ± 5.9
SMART amplification (n = 8)                   30.8 ± 2.8     56.9 ± 8.1

Note. As starting material for the standard GeneChip array, 10 µg of total RNA was used. As starting material for the amplification procedures, 50 ng of total RNA aliquots was used.

Table 2
Hybridization and cRNA quality criteria

                                      Percentage present calls       3′/5′ ratio GAPDH              3′/5′ ratio β-actin
Protocol                              First sample   Second sample   First sample   Second sample   First sample   Second sample
Standard protocol                     52.8 ± 1.4     52.0 ± 1.0      2.2 ± 0.9      2.8 ± 0.1       11.0 ± 5.8     9.9 ± 0.7
Two rounds of linear amplification    43.8 ± 2.2     43.7 ± 2.2      5.4 ± 1.0      11.1 ± 1.7      21.9 ± 0.9     30.3 ± 6.2
Random amplification                  42.6 ± 8.5     48.9 ± 2.3      1.9 ± 0.6      1.8 ± 0.1       3.8 ± 2.4      1.3 ± 0.1
SMART amplification                   41.0 ± 1.7     48.9 ± 2.2      2.0 ± 0.3      1.0 ± 0.1       7.8 ± 1.8      3.9 ± 1.2

Table 3
Average R² values of intra- and intermethod comparisons

Comparison                                                  R² (first sample)   R² (second sample)
A  Intramethod
   Standard protocol (n = 4)                                0.981 ± 0.002       0.988 ± 0.002
   Two rounds of linear amplification (n = 8)               0.980 ± 0.004       0.967 ± 0.020
   Random amplification (n = 8)                             0.974 ± 0.014       0.986 ± 0.004
   SMART amplification (n = 8)                              0.983 ± 0.004       0.969 ± 0.064
B  Intermethod
   Standard protocol versus two rounds of amplification     0.817 ± 0.023       0.758 ± 0.053
   Standard protocol versus random amplification            0.719 ± 0.036       0.670 ± 0.013
   Standard protocol versus SMART amplification             0.746 ± 0.044       0.823 ± 0.007
C  Intermethod (amplification)
   Two rounds of amplification versus random amplification  0.646 ± 0.034       0.510 ± 0.033
   Two rounds of amplification versus SMART amplification   0.715 ± 0.012       0.600 ± 0.047
   Random amplification versus SMART amplification          0.500 ± 0.030       0.617 ± 0.015
Scatterplots of two replicates that are representative of all other comparisons are shown in Fig. 2. Data points for intramethod comparisons (Fig. 2A) were much closer to the regression line than were those for intermethod comparisons (Figs. 2B and C). The lowest correlation of signals was found in comparing replicates from different amplification methods (Fig. 2C). Furthermore, the scatterplots revealed that the maximum signals of two rounds of amplification were similar to those of standard GeneChip arrays. The PCR-amplified samples showed values of up to 16,000, indicating an increased range of signal intensities. Scatterplots as well as R² values indicated a high correlation of the expression values of the probe sets within one method of target generation even after cDNA amplification. However, the application of different methods of target generation within one study should be avoided to minimize technical variability and to maximize reproducibility.

Variability in detection call

We also examined the variation between the standard protocol and amplification protocols with respect to detection call. Because both samples behave similarly, only values for the first sample are shown in Fig. 3. In comparing different amplification strategies with the reference, more than 85% of the probe sets on the HG-U133A GeneChip array show an identical detection call (Fig. 3), as demonstrated in the first two sets of columns (A >> A and P >> P, where A = absent calls and P = present calls). Changes in detection calls were observed only for low-level transcripts with average median signal intensities of less than 100 (data not shown). Also, 10, 12, and 13% of probe sets in the samples prepared with two rounds of amplification, random amplification, and SMART amplification, respectively, were lost for the analysis as a result of changing present calls into absent calls after target amplification (P >> A). Approximately 1–2% of probe sets were called absent in standard GeneChip assays but were present in samples with target amplification (A >> P).
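The four call categories used here (A >> A, P >> P, P >> A, and A >> P) can be tallied as in the following Python sketch. It is a simplified illustration with hypothetical names: it assumes one call per probe set for each method, listed in the same probe set order, and it leaves out how the calls of several replicates were averaged.

```python
from collections import Counter

CATEGORIES = ("A>>A", "P>>P", "P>>A", "A>>P")


def call_transitions(reference_calls, amplified_calls):
    """Percentage of probe sets in each call category (reference >> amplified).

    Both inputs hold one 'P' or 'A' call per probe set, in the same order.
    """
    counts = Counter(
        f"{ref}>>{amp}" for ref, amp in zip(reference_calls, amplified_calls)
    )
    total = len(reference_calls)
    return {category: 100.0 * counts[category] / total for category in CATEGORIES}
```

With five probe sets called `["P", "P", "A", "A", "P"]` by the reference and `["P", "A", "A", "P", "P"]` after amplification, the result is 40% P >> P and 20% each for P >> A, A >> A, and A >> P. The sum of the first two categories corresponds to the share of identical detection calls.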
One major concern is that gene expression patterns of amplified samples are biased compared with samples processed by the standard protocol due to selective amplification of only some transcripts. The percentages of discordant calls between replicates of the standard protocol, and between replicates of the standard and amplification protocols, are shown in more detail in Fig. 4 for the first sample only, given that results for both samples are comparable. For all three intermethod comparisons, the percentage of discordant calls (P >> A, blue [left] bar) was more than twofold higher than that of replicates from the standard protocol. This indicates that low-abundance RNA is not always detectable with small sample target preparation methods. Among the amplification strategies, the two rounds of amplification procedure seems to be the most sensitive in this respect. In contrast, the percentages (1.9–3.0%) of discordant calls (A >> P, green [middle] bar) were slightly lower in comparison with that of the standard protocol (3.6%), suggesting a high amplification specificity of all three amplification methods. In a more detailed analysis, a consistent and reproducible loss (P >> A) was detected only for a minority of probe sets (<0.1%). In summary, all three small sample target preparation methods do affect the detection calls, particularly of low-level transcripts, but they are comparable in terms of variability.

Variability in signal intensities

The impact of target amplification on signal intensities was also examined. We analyzed the signal intensity differences of samples generated by either the target amplification protocol or the standard protocol. As seen with previous analyses, similar results were observed for both samples. In Table 4A, the numbers of increases and decreases are given for the first sample.
In all three amplification protocols, the values are comparable; there were nearly equal numbers of significantly (P < 0.05) increased or decreased signal intensities (from 16.7 to 21.5% of the probe sets represented on HG-U133A GeneChip array) compared with the reference. In addition, 5.8–11.1% of the changes were at least twofold. The subsets of genes that show different signal intensities after target amplification overlapped only marginally, presumably due to differences in experimental conditions (e.g., PCR primers, enzymes). This result confirms that no subsets of genes were amplified preferentially.

Fig. 2. Scatterplots and R² values of selected comparisons of the first sample. Normalized gene expression values of replicates of each target preparation method (A), the standard GeneChip assay and small sample target labeling assay (B), and different amplification protocols (C) were plotted.
Fig. 3. Analysis of detected probe sets comparing amplification methods with standard protocol of the first sample. The average calls over all replicates from one method were calculated. Comparisons between the standard method and two rounds of amplification, the standard method and random amplification, and the standard method and SMART amplification are indicated with white (left), blue (middle), and red (right) bars, respectively. The percentages of probe sets called absent (A >> A) and present (P >> P) in both methods are plotted, as are the percentages of discordant calls represented as percentage of probe sets called present in the standard method and absent in the amplification method (P >> A) and vice versa (A >> P).
Fig. 4. Concordance comparisons between standard and amplification methods of the first sample. The percentage of discordant detection calls of two replicates was determined by filtering the probe sets that were present in replicate 1 and absent in replicate 2 (blue bars, left) and the reverse (green bars, middle). The total number of discordant calls is shown with yellow bars (right). All comparisons possible between replicates were performed, and the average values are shown.

Table 4
Significant changes of signal intensities (first sample) of (A) all probe sets represented on HG-U133A GeneChip and (B) a subset of genes related to the EGFR pathway

                                                          Percentage increases   Percentage decreases
Comparison                                                Total     ≥2-fold      Total     ≥2-fold
A  Standard protocol versus two rounds of amplification   17.3      5.8          20.9      8.5
   Standard protocol versus random amplification          21.4      11.0         16.7      11.1
   Standard protocol versus SMART amplification           16.8      7.6          21.5      8.7
B  Standard protocol versus two rounds of amplification   6.0       0            16.8      4.0
   Standard protocol versus random amplification          10.7      2.0          13.4      9.4
   Standard protocol versus SMART amplification           4.0       0.7          18.1      6.0

Finally, similar analyses were performed with a subset of 124 genes related to the epidermal growth factor receptor (EGFR) pathway that is involved in the pathogenesis of cancer. For the first sample, the signals of 67 probe sets were detectable. Among these probe sets, only a minority (5–7%) were lost during target amplification independent of the method chosen. These probe sets represent exclusively low-level expressed genes with signal intensities below 200. Approximately 23% of the probe sets showed signal intensity differences with a confidence level of P < 0.05. As expected, the least bias was introduced by the linear amplification method, with 4% of the probe sets showing a twofold or greater change in expression level. In concordance with the analysis done with the entire HG-U133A GeneChip array, 4–12% of the probe sets differed by twofold (Table 4B). In summary, less than 12% of the changes have a magnitude of twofold or greater over all 22,283 probe sets as well as in a subset of 67 probe sets.

Comparison of two different tumor samples

Gene expression patterns of two different human lung tumor tissue samples were compared to analyze the impact of the different protocols on the set of significantly differentially expressed genes. For the standard and linear amplification methods, nearly 23% of the 22,283 probe sets contained in the HG-U133A Affymetrix GeneChip array showed a significant change (≥twofold, P < 0.05) between both lung samples, compared with approximately 26% for the exponentially amplified samples (Table 5A). When the probe sets that were below the detection limit in all replicates of both samples were eliminated, the percentages of differentially expressed probe sets dropped to 18% or less (Table 5A). For the comparison of standard protocol samples, 4019 probe sets were identified as differentially expressed in both samples. Each amplification method identified only approximately 56% of these probe sets (Table 5B). Decreased percentages of overlap (43.1–51.5%) were observed for comparisons of the amplification protocols (Table 5B). After comparing the results for every target preparation protocol, we found a common list of 1271 probe sets (31.6%) that could be identified as differentially expressed in both samples. The analysis of targets that were not identified as significantly differentially expressed by the amplification methods, but that were identified as significantly differentially expressed by the standard method, also showed no significant differences among the three amplification strategies. A quarter of the targets were not detectable in the amplified samples. Approximately 70% of the probe sets showed a change of expression of less than twofold, and very few probe sets (<1%) did not show a significant change at all.

Finally, differential gene expression of the subset of genes related to the EGFR pathway (see also the preceding paragraph) was analyzed in more detail. The standard method revealed 22 probe sets with a significant change of more than twofold (Table 6). In accordance with the analysis of all probe sets represented on the HG-U133A GeneChip array, 50–59% of the probe sets were confirmed by the amplification procedures as significantly differentially expressed in both tumor samples. Only 31.8% of the probe sets were detected with either linear or nonlinear amplification procedures, underscoring the presence of a system-specific bias.

Table 5
Comparison of two different tumor samples: differentially regulated probe sets of all probe sets represented on the GeneChip (A) and intersection of detectable and significantly differentially expressed probe sets (B)

                                           Standard protocol   Two rounds of linear amplification   Random amplification   SMART amplification
A  Filter criteria
   ≥2-fold                                 7106 (31.8)         6315 (28.3)                          6710 (30.1)            6949 (31.2)
   ≥2-fold (P < 0.05)                      4924 (22.1)         5057 (22.7)                          5645 (25.3)            5891 (26.4)
   ≥2-fold (P < 0.05, detectable signal)   4019 (18.0)         3179 (14.3)                          3806 (17.1)            3710 (16.6)
B  Protocol
   Two rounds of linear amplification      2226 (55.4)         -                                    -                      -
   Random amplification                    2311 (57.5)         1964 (48.9)                          -                      -
   SMART amplification                     2263 (56.3)         1731 (43.1)                          2069 (51.5)            -

Note. Percentages are in parentheses.

Table 6
Comparison of two different tumor samples: differentially regulated probe sets of a subset of genes related to the EGFR pathway represented on the GeneChip (A) and intersection of detectable and significantly differentially expressed probe sets (B)

                                           Standard protocol   Two rounds of linear amplification   Random amplification   SMART amplification
A  Filter criteria
   ≥2-fold                                 49 (39.5)           38 (30.6)                            35 (28.1)              34 (27.4)
   ≥2-fold (P < 0.05)                      32 (25.8)           30 (24.2)                            29 (23.4)              25 (20.2)
   ≥2-fold (P < 0.05, detectable signal)   22 (17.7)           11 (8.9)                             12 (9.7)               13 (10.5)
B  Protocol
   Two rounds of linear amplification      11 (50.0)           -                                    -                      -
   Random amplification                    12 (55.0)           9 (40.9)                             -                      -
   SMART amplification                     13 (59.0)           7 (31.8)                             8 (36.4)               -

Note. Percentages are in parentheses.
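The overlap percentages reported for the intersections in Tables 5B and 6B relate the number of shared probe sets to the number found by the standard protocol. The Python sketch below shows that calculation under this reading. The function names are hypothetical, and in practice the inputs would be the probe set identifiers that passed the filter (≥2-fold, P < 0.05, detectable signal) for each protocol.

```python
def overlap_with_reference(reference_ids, other_ids):
    """Count probe sets shared with the reference list and express the count
    as a percentage of the reference list."""
    reference = set(reference_ids)
    shared = reference & set(other_ids)
    return len(shared), 100.0 * len(shared) / len(reference)


def percent_of(count, total):
    """Percentage rounded to one decimal place, as reported in the tables."""
    return round(100.0 * count / total, 1)
```

As a check against the reported numbers, `percent_of(2226, 4019)` gives 55.4 (two rounds of linear amplification versus the standard protocol), and `percent_of(1271, 4019)` gives 31.6 (the common list shared by all protocols).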
Discussion The application of microarray technology for nanogram quantities of RNA allows more comprehensive gene expression studies, including those of small biopsies, laser capture microdissected cells, and Xow-sorted cells. In this study, several RNA ampliWcation techniques were tested systematically for gene expression analysis of samples with limited amounts of starting material. Several parameters, such as cRNA yield, 3⬘/5⬘ ratios, percentages of present calls, and R2 values, were compared with those of samples processed with the standard GeneChip protocol. All three ampliWcation procedures resulted in suYcient and good quality cRNA as well as in highly reproducible proWles. The performances were comparable for all three methods with one exception: the 3⬘/5⬘ ratios for housekeeping genes were higher for two rounds of ampliWcation than those of PCRampliWed and standard samples, in agreement with the reports of others [20,23,26]. One intrinsic feature of this ampliWcation procedure is the truncation of the 5⬘ ends of total RNA during the random priming step during the second round of cDNA synthesis [27]. This eVect can be observed for the analyzed housekeeping genes GAPDH and -actin because oligonucleotides are derived from the entire coding region. However, because the oligonucleotide probes of the HG-U133A GeneChip array are preferentially deduced from a 600-base region at the 3⬘ end of the mRNAs (AVymetrix technical note: array design for the GeneChip human genome 133 set), the loss of 5⬘ ends should have a minor eVect on the hybridization signals of the majority of transcripts represented on the array. On the other hand, for PCR ampliWcations, the 3⬘/5⬘ ratios are always satisfying independent of the RNA quality. The experimental conditions chosen probably enhance the enrichment of particularly long transcripts. For these reasons, 3⬘/5⬘ ratios are not suitable as quality criteria for ampliWed samples. 
If sufficient material is available, the total RNA integrity can be evaluated using, for example, the Agilent 2100 bioanalyzer.
Despite the high intramethod reproducibility, comparability of microarray data generated with different target preparation methods is not guaranteed [17,20,23,26,28]. Concordance studies have shown that approximately 10% of the signals of the standard GeneChip array were not detectable in small sample preparations, and this was also reflected in a decreased percentage of present calls. The number of false positive calls in small sample preparations was negligible (1–2%). Such signals are close to background, and this was also described by McClintick et al. [26]. The identified probe
sets that varied between the reference and the amplification methods showed almost no overlap. This finding indicates that the bias introduced depends on the protocol. The introduction of this method-related bias was confirmed by the analysis of two different tumor samples. Regardless of whether all probe sets represented on the GeneChip or only a subset of clinically relevant probe sets were examined, the amplification methods detected only approximately 55% of the significantly differentially expressed probe sets that were identified by the standard protocol. When the amplification protocols were compared with one another, a lower percentage of overlap (43.1–51.5%) was observed.
The following technical points should be considered for high-throughput use of target amplification protocols. The two-round amplification method is more labor- and cost-intensive than the PCR-based technologies. It requires more hands-on steps, which also increase the susceptibility of the target to degradation. On the other hand, PCR-based techniques depend heavily on careful optimization of the number of PCR cycles in each experiment, and this can be cumbersome.
Several RNA amplification methods have also been evaluated for high-density cDNA array applications [19,22,29]. The high reproducibility of the different RNA amplification methods and the slight bias introduced by RNA amplification reported in those studies are consistent with our findings. In this study, we focused on the comparison of different small sample protocols starting from 50 ng of total RNA. For the analysis of even lower amounts of total RNA from, for example, single cells, the addition of T4 gp32 may enhance the yield of labeled cRNA [30–32]. Modifications [33–35] or combinations [36,37] of different amplification principles for target preparation have also been published.
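The concordance figures quoted above (the share of reference present calls lost after amplification, the share of additional calls, and the overlap of differentially expressed probe sets) come down to simple set operations. A minimal Python sketch follows. The function names, the probe set identifiers, and the choice of denominators are illustrative assumptions, since the exact definitions used in the study are not restated in this section.

```python
def call_concordance(reference_present: set, amplified_present: set) -> dict:
    """Compare the present calls of an amplified sample with the reference."""
    if not reference_present or not amplified_present:
        raise ValueError("both sets of present calls must be non-empty")
    lost = reference_present - amplified_present      # present only in reference
    gained = amplified_present - reference_present    # present only after amplification
    return {
        "percent_lost": 100.0 * len(lost) / len(reference_present),
        "percent_gained": 100.0 * len(gained) / len(amplified_present),
    }


def percent_overlap(reference_hits: set, other_hits: set) -> float:
    """Percentage of reference hits that the other method also reports."""
    if not reference_hits:
        raise ValueError("reference set must be non-empty")
    return 100.0 * len(reference_hits & other_hits) / len(reference_hits)


# Toy probe set identifiers, not data from this study.
reference = {f"ps{i:02d}" for i in range(1, 11)}       # ps01 .. ps10
amplified = (reference - {"ps10"}) | {"ps11"}          # one lost, one gained

concordance = call_concordance(reference, amplified)

reference_de = {"ps01", "ps02", "ps03", "ps04"}        # differentially expressed, standard
amplified_de = {"ps01", "ps02", "ps07"}                # differentially expressed, amplified
overlap = percent_overlap(reference_de, amplified_de)

print(concordance)
print(f"overlap of differentially expressed probe sets: {overlap:.1f}%")
```

Using the reference protocol as the denominator for the overlap mirrors the way the 55% figure is phrased. A comparison between two amplification protocols has no natural reference, so the denominator would have to be chosen explicitly.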
Although using small quantities of RNA as starting material appears to sacrifice some sensitivity, our analysis confirmed that target amplification strategies can be used to obtain reliable results with microarray gene expression technology. However, combined analysis of microarray data generated with different target amplification strategies should be avoided.
References
[1] L. Smith, A. Greenfield, DNA microarrays and development, Hum. Mol. Genet. 12 (2003) R1–R8.
[2] A.I. Su, J.B. Welsh, L.M. Sapinoso, S.G. Kern, P. Dimitrov, H. Lapp, P.G. Schultz, S.M. Powell, C.A. Moskaluk, H.F. Frierson Jr., G.M. Hampton, Molecular classification of human carcinomas by use of gene expression signatures, Cancer Res. 61 (2001) 7388–7393.
[3] S. Ramaswamy, K.N. Ross, E.S. Lander, T.R. Golub, A molecular signature of metastasis in primary solid tumors, Nat. Genet. 33 (2003) 49–54.
[4] U. Scherf, D.T. Ross, M. Waltham, L.H. Smith, J.K. Lee, L. Tanabe, K.W. Kohn, W.C. Reinhold, T.G. Myers, D.T. Andrews, D.A. Scudiero, M.B. Eisen, E.A. Sausville, Y. Pommier, D. Botstein, P.O. Brown, J.N. Weinstein, A gene expression database for the molecular pharmacology of cancer, Nat. Genet. 24 (2000) 236–244.
[5] M.D. Waters, K. Olden, R.W. Tennant, Toxicogenomic approach for assessing toxicant-related disease, Mutat. Res. 544 (2003) 415–424.
[6] G. Russo, C. Zegar, A. Giordano, Advantages and limitations of microarray technology in human cancer, Oncogene 22 (2003) 6497–6507.
[7] T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, H. Coller, M.L. Loh, J.R. Downing, M.A. Caligiuri, C.D. Bloomfield, E.S. Lander, Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science 286 (1999) 531–537.
[8] N. Iizuka, M. Oka, H. Yamada-Okabe, N. Mori, T. Tamesa, T. Okada, N. Takemoto, A. Tangoku, K. Hamada, H. Nakayama, T. Miyamoto, S. Uchimura, Y. Hamamoto, Comparison of gene expression profiles between hepatitis B virus- and hepatitis C virus-infected hepatocellular carcinoma by oligonucleotide microarray data on the basis of a supervised learning method, Cancer Res. 62 (2002) 3939–3944.
[9] T. Sorlie, C.M. Perou, R. Tibshirani, T. Aas, S. Geisler, H. Johnsen, T. Hastie, M.B. Eisen, M. van de Rijn, S.S. Jeffrey, T. Thorsen, H. Quist, J.C. Matese, P.O. Brown, D. Botstein, P. Eystein Lonning, A.L. Borresen-Dale, Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications, Proc. Natl. Acad. Sci. USA 98 (2001) 10869–10874.
[10] L.J. van’t Veer, H. Dai, M.J. van de Vijver, Y.D. He, A.A. Hart, M. Mao, H.L. Peterse, K. van der Kooy, M.J. Marton, A.T. Witteveen, G.J. Schreiber, R.M. Kerkhoven, C. Roberts, P.S. Linsley, R. Bernards, S.H. Friend, Gene expression profiling predicts clinical outcome of breast cancer, Nature 415 (2002) 530–536.
[11] S. Varambally, S.M. Dhanasekaran, M. Zhou, T.R. Barrette, C. Kumar-Sinha, M.G. Sanda, D. Ghosh, K.J. Pienta, R.G. Sewalt, A.P. Otte, M.A. Rubin, A.M. Chinnaiyan, The polycomb group protein EZH2 is involved in progression of prostate cancer, Nature 419 (2002) 624–629.
[12] M. Schena, D. Shalon, R.W. Davis, P.O. Brown, Quantitative monitoring of gene expression patterns with a complementary DNA microarray, Science 270 (1995) 467–470.
[13] R.J. Lipshutz, S.P. Fodor, T.R. Gingeras, D.J. Lockhart, High density synthetic oligonucleotide arrays, Nat. Genet. 21 (1999) 20–24.
[14] R.L. Stears, R.C. Getts, S.R. Gullans, A novel, sensitive detection system for high-density microarrays using dendrimer technology, Physiol. Genom. 3 (2000) 93–99.
[15] S.L. Karsten, V.M. Van Deerlin, C. Sabatti, L.H. Gill, D.H. Geschwind, An evaluation of tyramide signal amplification and archived fixed and frozen tissue in microarray gene expression analysis, Nucleic Acids Res. 30 (2002) E4.
[16] E. Manduchi, L.M. Scearce, J.E. Brestelli, G.R. Grant, K.H. Kaestner, C.J. Stoeckert Jr., Comparison of different labeling methods for two-channel high-density microarray experiments, Physiol. Genom. 10 (2002) 169–179.
[17] Y. Li, S. Ali, P.A. Philip, F.H. Sarkar, Direct comparison of microarray gene expression profiles between non-amplification and a modified cDNA amplification procedure applicable for needle biopsy tissues, Cancer Detect. Prev. 27 (2003) 405–411.
[18] D. Seth, M.D. Gorrell, P.H. McGuinness, M.A. Leo, C.S. Lieber, G.W. McCaughan, P.S. Haber, SMART amplification maintains representation of relative gene expression: quantitative validation by real time PCR and application to studies of alcoholic liver disease in primates, J. Biochem. Biophys. Methods 55 (2003) 53–66.
[19] L.G. Puskas, A. Zvara, L. Hackler Jr., T. Micsik, P. van Hummelen, Production of bulk amounts of universal RNA for DNA microarrays, BioTechniques 33 (2002) 898–904.
[20] S. Klur, K. Toy, M.P. Williams, U. Certa, Evaluation of procedures for amplification of small-size samples for hybridization on microarrays, Genomics 83 (2004) 508–517.
[21] A.L. Feldman, N.G. Costouros, E. Wang, M. Qian, F.M. Marincola, H.R. Alexander, S.K. Libutti, Advantages of mRNA amplification for microarray analysis, BioTechniques 33 (2002) 906–914.
[22] V. Nygaard, A. Loland, M. Holden, M. Langaas, H. Rue, F. Liu, O. Myklebost, O. Fodstad, E. Hovig, B. Smith-Sorensen, Effects of mRNA amplification on gene expression ratios in cDNA experiments estimated by analysis of variance, BMC Genom. 4 (2003) 11.
[23] V. Luzzi, M. Mahadevappa, R. Raja, J.A. Warrington, M.A. Watson, Accurate and reproducible gene expression profiles from laser capture microdissection, transcript amplification, and high density oligonucleotide microarray analysis, J. Mol. Diagn. 5 (2003) 9–14.
[24] C.I. Dumur, C.T. Garrett, K.J. Archer, S. Nasim, D.S. Wilkinson, A. Ferreira-Gonzalez, Evaluation of a linear amplification method for small samples used on high-density oligonucleotide microarray analysis, Anal. Biochem. 331 (2004) 314–321.
[25] R.N. Van Gelder, M.E. von Zastrow, A. Yool, W.C. Dement, J.D. Barchas, J.H. Eberwine, Amplified RNA synthesized from limited quantities of heterogeneous cDNA, Proc. Natl. Acad. Sci. USA 87 (1990) 1663–1667.
[26] J.N. McClintick, R.E. Jerome, C.R. Nicholson, D.W. Crabb, H.J. Edenberg, Reproducibility of oligonucleotide arrays using small samples, BMC Genom. 4 (2003) 4.
[27] J.G. Glanzer, J.H. Eberwine, Expression profiling of small cellular samples in cancer: less is more, Br. J. Cancer 90 (2004) 1111–1114.
[28] C.L. Wilson, S.D. Pepper, Y. Hey, C.J. Miller, Amplification protocols introduce systematic but reproducible errors into gene expression studies, BioTechniques 36 (2004) 498–506.
[29] J. Schneider, A. Buness, W. Huber, J. Volz, P. Kioschis, M. Hafner, A. Poustka, H. Sultmann, Systematic analysis of T7 RNA polymerase based in vitro linear RNA amplification for use in microarray experiments, BMC Genom. 5 (2004) 29.
[30] L.R. Baugh, A.A. Hill, E.L. Brown, C.P. Hunter, Quantitative analysis of mRNA amplification by in vitro transcription, Nucleic Acids Res. 29 (2001) E29.
[31] C. Villalva, C. Touriol, P. Seurat, P. Trempat, G. Delsol, P. Brousset, Increased yield of PCR products by addition of T4 gene 32 protein to the SMART PCR cDNA synthesis system, BioTechniques 31 (2001) 81–86.
[32] M. Kenzelmann, R. Klaren, M. Hergenhahn, M. Bonrouhi, H.J. Grone, W. Schmid, G. Schutz, High-accuracy amplification of nanogram total RNA amounts for gene profiling, Genomics 83 (2004) 550–558.
[33] C.C. Xiang, M. Chen, L. Ma, Q.N. Phan, J.M. Inman, O.A. Kozhich, M.J. Brownstein, A new strategy to amplify degraded RNA from small tissue samples for microarray studies, Nucleic Acids Res. 31 (2003) e53.
[34] R. Singh, R.J. Maganti, S.V. Jabba, M. Wang, G. Deng, J.D. Heath, N. Kurn, P. Wangemann, Microarray based comparison of three amplification methods for nanogram amounts of total RNA, Am. J. Physiol. Cell Physiol. 21 (2004) 1179–1189.
[35] A. Dafforn, P. Chen, G. Deng, M. Herrler, D. Iglehart, S. Koritala, S. Lato, S. Pillarisetty, R. Purohit, M. Wang, S. Wang, N. Kurn, Linear mRNA amplification from as little as 5 ng total RNA for global gene expression analysis, BioTechniques 37 (2004) 854–857.
[36] E. Wang, L.D. Miller, G.A. Ohnmacht, E.T. Liu, F.M. Marincola, High-fidelity mRNA amplification for gene profiling, Nat. Biotechnol. 18 (2000) 457–459.
[37] L. Hu, J. Wang, K. Baggerly, H. Wang, G.N. Fuller, S.R. Hamilton, K.R. Coombes, W. Zhang, Obtaining reliable information from minute amounts of RNA using cDNA microarrays, BMC Genom. 3 (2002) 16.