YABIO 11718
No. of Pages 3, Model 5G
5 May 2014 · Analytical Biochemistry xxx (2014) xxx–xxx
Contents lists available at ScienceDirect
Analytical Biochemistry journal homepage: www.elsevier.com/locate/yabio
Notes & Tips
A minimum variance method for genome-wide data-driven normalization of quantitative real-time polymerase chain reaction expression data
Benjamin Garcia a,b, Nicholas D. Walter c, Gregory Dolganov d, Marc Coram e, J. Lucian Davis f, Gary K. Schoolnik d, Michael Strong a,b,⇑
a Integrated Center for Genes, Environment, and Health, National Jewish Health, Denver, CO 80206, USA
b Computational Bioscience Program, University of Colorado Denver, Anschutz Medical Campus, Aurora, CO 80045, USA
c Division of Pulmonary Sciences and Critical Care Medicine, University of Colorado Denver, Anschutz Medical Campus, Aurora, CO 80045, USA
d Department of Microbiology and Immunology, Stanford University, Stanford, CA 94304, USA
e Department of Health Research and Policy, Stanford University, Stanford, CA 94305, USA
f Division of Pulmonary & Critical Care Medicine, San Francisco General Hospital, University of California, San Francisco, San Francisco, CA, USA
article info
Article history: Received 4 April 2014; Accepted 18 April 2014; Available online xxxx
Keywords: Multiplex qRT-PCR; Data-driven normalization
abstract

Advances in multiplex qRT-PCR have enabled increasingly accurate and robust quantification of RNA, even at low concentrations, facilitating RNA expression profiling in clinical and environmental samples. Here we describe a data-driven qRT-PCR normalization method, the minimum variance method, and evaluate it on clinically derived Mycobacterium tuberculosis samples with variable transcript detection percentages. For moderate to significant amounts of nondetection (50%), our minimum variance method consistently produces the lowest false discovery rates compared with commonly used data-driven normalization methods.

© 2014 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
Accurate quantification of very low concentrations of RNA via multiplex qRT-PCR¹ has enabled large-scale RNA expression profiling beyond traditional laboratory settings, including the longitudinal analysis of clinical, pathogen, and environmentally derived samples. Data normalization in these types of samples poses three major challenges not typically encountered in carefully controlled laboratory experiments: (1) the absence of validated endogenous reference genes (housekeeping genes), (2) marked variation in the abundance of target material, and (3) differences between the reference sample and the test sample (between-sample variability) in the number of detectable transcripts, particularly in samples with low abundance of target material. As such, these studies are often normalized using data-driven methods, but between-sample variability in the number of nondetectable transcripts (i.e., missing transcripts) may bias these data-driven normalizations, potentially leading to false discovery. Nondetected transcripts, defined as transcripts with a zero or very low abundance as evaluated by a qRT-PCR threshold, may result from two experimentally indistinguishable possibilities: a biological state in which a gene is not transcribed (a true negative result) or, conversely, an experimental error in which a gene is transcribed but not detected (a false negative result) due to the detection limits of the assay. False negative results increase as the abundance of target material decreases and low-concentration transcripts can no longer be amplified efficiently. In our own work, we have encountered these challenges in the global analysis of Mycobacterium tuberculosis (Mtb) gene expression profiles from pathogen RNA derived from the sputum of patients treated for tuberculosis (TB). We observed that as antibiotic treatment kills the bacteria, the burden of live Mtb decreases, and therefore the number of transcripts detected declines progressively during longitudinal treatment. In this paper, we systematically evaluate data-driven normalization methods and introduce a novel method, minimum variance normalization, designed for data with variable transcript nondetection. We first discuss circumstances in which standard reference-based normalizations are inadequate and data-driven normalization may be preferable. Using an experimental dilution series of mRNA derived from TB patient samples to simulate decreasing abundance of input target material and increasing proportions of nondetection, we evaluate three existing data-driven

⇑ Corresponding author at: Integrated Center for Genes, Environment, and Health, National Jewish Health, Denver, CO 80206, USA. E-mail address: [email protected] (M. Strong).
¹ Abbreviations used: CT, cycle threshold; TB, tuberculosis; Mtb, Mycobacterium tuberculosis; qRT-PCR, quantitative real-time polymerase chain reaction.
http://dx.doi.org/10.1016/j.ab.2014.04.021
0003-2697/© 2014 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
Please cite this article in press as: B. Garcia et al., A minimum variance method for genome-wide data-driven normalization of quantitative real-time polymerase chain reaction expression data, Anal. Biochem. (2014), http://dx.doi.org/10.1016/j.ab.2014.04.021
normalization methods (quantile, median, and mean) and our novel minimum variance method. To identify whether bias resulting from between-sample variability in transcript nondetection leads to increased false discovery rates, we compare the proportion of genes erroneously classified as differentially expressed with all four methods. The most commonly used method for normalizing qRT-PCR analysis in laboratory-based and controlled environments is the identification of a set of housekeeping genes whose expression is demonstrated or hypothesized to be invariant, relative to all other genes, under the conditions being studied [1]. Expression is reported as a ratio relative to the expression of the set of reference genes in that sample. Experimental validation of stable expression of housekeeping genes is imperative, since choosing the wrong housekeeping genes for normalization may lead to erroneous conclusions [2–4]. Unfortunately, experimental validation of reference genes for normalization is frequently not possible for complex samples such as human clinical specimens, particularly those collected in observational field settings. In our study of Mtb expression, for example, bacterial phenotypes and sputum conditions change substantially with increasing duration of antibiotic treatment, transitioning from unencumbered bacterial growth prior to antibiotic initiation, to rapid killing in the early days of treatment, to an antibiotic-tolerant "persister" bacterial state as therapy progresses [5]. We considered whether other sample characteristics (including sputum volume, quantitative culture, or nucleic acid abundance) might provide an alternative reference for normalization; an ideal measure would enumerate the viable bacteria present. Unfortunately, no optimal sample-based referent was identified. Sputum sample volume is difficult to measure and does not directly correlate with bacterial burden.
Culture fails to enumerate viable but nonculturable bacilli, which may be transcriptionally active. The quantity of Mtb DNA does not measure viable bacteria, since DNA from killed Mtb has a long half-life in sputum [6]. Ribosomal RNA subunits appear to be regulated by antibiotic exposure [7] and are therefore not a stable reference. One alternative to using sample-specific endogenous controls is to normalize to an exogenous control (spike-in) transcript of known quantity that is not present in the sample of interest. Exogenous controls enable adjustment for variability in PCR quantification [3]; however, because they are added after initial sample processing, they cannot control for variation in input biological material, which is frequently a central challenge in complex samples. Data-driven normalization is fundamentally different from normalization to endogenous or exogenous controls. Data-driven methods assume that mathematical or statistical properties of the data itself (such as the mean, the median, quantiles, or the variance) are invariant across samples and conditions. These properties are used as a reference to make the overall signal distributions of samples as similar to one another as possible. Long standard in the analysis of cDNA microarrays [8,9], data-driven normalization methods have been adapted for PCR because multiplex platforms can simplify quantification of hundreds or thousands of transcripts [10]. Although data-driven normalization methods have been widely adopted [3,10], insufficient attention has been paid to determining whether between-sample variability in the proportion of nondetectable transcripts can bias data-driven normalization methods, leading to false discovery and false inference of differential expression. We compared data-driven normalization methods in dilution experiments designed to simulate the progressive increase in nondetection of low-abundance transcripts that often occurs as the abundance of target RNA decreases.
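As a concrete instance of the data-driven idea just described, the sketch below shifts a sample's CT values so that its median matches a reference median. This is an illustration, not the authors' exact implementation; encoding nondetected transcripts as NaN and restricting the shift to transcripts detected in both samples are our assumptions.

```python
import numpy as np

def median_normalize(sample_ct, reference_ct):
    """Shift a sample's CT values so its median matches the reference median.

    Nondetected transcripts are encoded as NaN (an assumed convention);
    only transcripts detected in both samples ("shared" transcripts)
    contribute to the shift.
    """
    shared = ~np.isnan(sample_ct) & ~np.isnan(reference_ct)
    shift = np.median(sample_ct[shared]) - np.median(reference_ct[shared])
    return sample_ct - shift

# Toy example: a diluted sample whose CT values run roughly 3 cycles higher.
ref = np.array([20.0, 22.0, 25.0, 30.0, np.nan])    # NaN = nondetected
test = np.array([23.1, 24.9, 28.0, np.nan, np.nan])
normalized = median_normalize(test, ref)
```

Note that nondetected entries stay NaN after the shift, so downstream comparisons naturally fall back to the shared transcripts.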
Since there are no true biological differences between our dilutions, any transcripts significantly
differentially expressed in the comparison between dilutions represent false discoveries, which we can use to evaluate methods. To evaluate data-driven normalization methods, we performed dilution experiments and then implemented three commonly used data-driven methods and a novel alternative, the minimum variance method, designed to minimize false discovery in such datasets. We compared false discovery rates among these normalization methods at different levels of transcript nondetection. Using the PCR platform described in the Supplemental methods, we quantified expression of 768 Mtb genes from sputum samples collected from two TB patients prior to antibiotic treatment. Three technical replicates were performed at each of the 10- and 100-fold dilutions, and six replicates were performed at each of the 1000- and 10,000-fold dilutions. As expected, increasing dilution was associated with an increasing proportion of nondetected transcripts. Nondetection affected 1% of transcripts at 10-fold dilution, 10% at 100-fold dilution, 49% at 1000-fold dilution, and 62% at 10,000-fold dilution (Fig. 1). Nondetection was nonrandom, as probes with higher cycle threshold (CT) values prior to dilution were preferentially not detected (Supplemental Fig. S1). We normalized predilution and diluted samples with quantile, mean, median, and our novel minimum variance methods. Quantile normalization assumes that the distribution of expression values itself is invariant; it ranks each sample's expression values and then partitions the ordered expression values into intervals (quantiles) [10]. Expression values are then shifted so that the quantiles of one sample match the quantiles of the reference sample. Nonrandom nondetection skews a sample's quantiles away from the reference quantiles, increasingly so as nondetection rises.
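The binned, shift-based quantile normalization used here can be sketched as follows. This is our own simplified illustration, not the authors' code: it operates on the sorted detected CT values only (dropping NaN-encoded nondetections and the original transcript order), and the default of 100 bins mirrors the 1% bin size described in the text.

```python
import numpy as np

def quantile_shift_normalize(sample_ct, reference_ct, n_bins=100):
    """Shift-based quantile normalization of detected CT values (a sketch).

    Detected values in each sample are sorted and split into n_bins quantile
    bins; each sample value is shifted by the difference between its bin's
    median and the corresponding reference bin's median. Returns the
    normalized sorted detected values (a simplification of the real method).
    """
    sample = np.sort(sample_ct[~np.isnan(sample_ct)])
    reference = np.sort(reference_ct[~np.isnan(reference_ct)])
    out = sample.copy()
    s_bins = np.array_split(np.arange(sample.size), n_bins)
    r_bins = np.array_split(np.arange(reference.size), n_bins)
    for s_idx, r_idx in zip(s_bins, r_bins):
        if s_idx.size and r_idx.size:
            # Shift this quantile bin by the difference of bin medians.
            out[s_idx] -= np.median(sample[s_idx]) - np.median(reference[r_idx])
    return out
```

When nondetection is nonrandom, the upper bins of a diluted sample no longer correspond to the same transcripts as the reference bins, which is exactly the skew the dilution series exposes.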
Because quantile normalization is standard practice for microarrays, we evaluated the effect of this nonrandom nondetection skew on our dataset (Fig. 1 and Supplemental Fig. S1). The quantile method was tested with a bin size of 1% using all detectable transcripts. Quantiles were then shifted based on the difference between a given bin's median and the reference median. Distributions become more divergent with greater nondetection, making quantile-based approaches less reliable. Median normalization assumes that the median expression value should not vary across samples; it shifts each sample's distribution of expression values so that all samples end with the same median expression value. This is equivalent to a simplified quantile approach that normalizes on only the 50th percentile.

Fig. 1. Simulating effects of low abundance on nonrandom nondetection of transcripts using serial dilution experiments. Each point represents a transcript detected in the dilution, with the cycle threshold (CT) value taken from the undiluted sample. Distributions become more divergent for greater CT values, making quantile-based approaches unreliable.

Two possible approaches to implementing median normalization in data with variable nondetection are based on: (1) the median of all detected transcripts and (2) the median of only the transcripts detectable in both the reference and the sample of interest (referred to as shared transcripts). Both methods were chosen to demonstrate the effect that nondetection has on false discovery rates, and to demonstrate a simple way to mitigate these effects for moderate amounts of nondetection. Mean normalization shifts expression based on the geometric mean for expression values and fold changes, or the arithmetic mean for CT values. Mean-based normalization often uses correction factors that account for differences in distributions between a reference set and a test set, one of which we used for this analysis [11]. The mean-based approach was calculated using the arithmetic mean of CT values and was then corrected for differences in standard deviation [11] between the undiluted and the diluted sample. Only transcripts detected in both the diluted and undiluted samples were used in this calculation. In the above methods, between-sample variability in the proportion of nondetected transcripts could alter the statistical property used for normalization, potentially inflating false discovery rates. To minimize this bias in normalization due to variability in transcript detection, we designed a minimum variance method that minimizes the between-sample variance over only the shared transcripts. A reference sample is selected and treated as the true distribution of transcripts. The variance between the reference and a sample is then calculated using the equation
global variance = Σᵢⁿ ((T_i,sample − shift) − T_i,reference)²,  (1)
where T_i is the CT value for the ith shared transcript and n is the total number of shared transcripts. The optimal shift to be used for normalization is the value that minimizes the global variance and thereby makes the overall distribution of a sample's expression values as similar to the reference as possible. We tested for significant differential expression between the undiluted and diluted samples with unpaired t tests for methods that used all detectable transcripts and paired t tests for methods that used shared detectable transcripts. Each dilution replicate was treated as an independent sample with an undiluted reference, in order to simulate the effect of having independent samples. A Benjamini–Hochberg multiple testing correction with a 0.05 false discovery rate was used to set a threshold for differential expression. The criterion for a valid method is a false discovery rate <0.05 (Fig. 2). For samples with low average nondetection (<10%), all methods performed well. For samples with high average nondetection (62%), no method achieved a false discovery rate below 0.05 after the multiple testing correction. All paired methods performed adequately up to an average nondetection of 49%, with minimum variance consistently performing best among the paired methods. By contrast, when average nondetection reached 49%, false discovery rates were 23% with quantile normalization and 30% with unpaired median normalization, indicating that these methods are inappropriate for samples with moderate to high nondetection. The quantile method performed slightly better than the unpaired median approach across all dilutions, as expected given that the median approach is a simplified quantile approach.
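The optimal shift in Eq. (1) need not be found by search: treating Eq. (1) as a least-squares problem in the shift, setting its derivative to zero gives the mean paired CT difference over the shared transcripts. The sketch below uses that closed form; the NaN encoding of nondetection is our assumption.

```python
import numpy as np

def minimum_variance_shift(sample_ct, reference_ct):
    """Optimal shift for the minimum variance method (Eq. 1).

    Minimizing sum_i ((T_i,sample - shift) - T_i,reference)^2 over the
    shared (detected-in-both) transcripts is least squares in the shift;
    the closed-form solution is the mean paired CT difference.
    NaN encodes nondetection (an assumed convention).
    """
    shared = ~np.isnan(sample_ct) & ~np.isnan(reference_ct)
    return np.mean(sample_ct[shared] - reference_ct[shared])

def minimum_variance_normalize(sample_ct, reference_ct):
    """Apply the minimizing shift to the whole sample."""
    return sample_ct - minimum_variance_shift(sample_ct, reference_ct)

# Toy example: the fourth transcript is nondetected in the reference,
# so only the first three (shared) transcripts determine the shift.
ref = np.array([20.0, 25.0, 30.0, np.nan])
samp = np.array([22.0, 27.5, 31.0, 35.0])
shift = minimum_variance_shift(samp, ref)
```

Because the shift is computed only over shared transcripts, variable nondetection cannot drag the normalization toward transcripts present in just one sample, which is the bias the method is designed to avoid.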
Our dilution series shows that even at a moderate level of nondetection (50%), normalization techniques that use only the shared detectable genes do not produce high false discovery rates, with the minimum variance approach consistently performing best. Differential expression testing therefore remains valid for samples with high nondetection and variable target material due to the death of bacilli, allowing the elucidation of novel biology in clinical samples with low template material.
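The false discovery threshold used throughout relies on the Benjamini–Hochberg step-up procedure. A minimal NumPy sketch of that procedure (our own illustration, with arbitrary toy p-values) is:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a boolean mask of hypotheses rejected at false discovery
    rate alpha: the k smallest p-values are rejected, where k is the
    largest rank i with p_(i) <= alpha * i / m.
    """
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    ranked = pvals[order]
    thresh = alpha * np.arange(1, m + 1) / m
    passing = np.nonzero(ranked <= thresh)[0]
    k = 0 if passing.size == 0 else passing.max() + 1
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

# Toy p-values (already sorted for readability); at alpha = 0.05 the
# step-up rule rejects the four smallest.
pvals = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298, 0.0344,
         0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.0000]
reject = benjamini_hochberg(pvals, alpha=0.05)
```

Since the dilution replicates share the same underlying biology, every transcript this procedure rejects in the dilution comparisons counts toward the false discovery rate plotted in Fig. 2.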
Fig. 2. False discovery rates for different data-driven normalization methods. The false discovery rate is the percentage of transcripts that were significantly differentially expressed after the Benjamini–Hochberg correction for multiple hypothesis testing. Average nondetection represents the average percentage of transcripts that were nondetected across the dilution replicates.
Acknowledgments
B.G. acknowledges support from an NLM Institutional Training Grant, NIH 5T15LM009451, and from NICTA, funded by the Australian Government through the Department of Communications and the Australian Research Council through the ICT Centre of Excellence Program. M.S. acknowledges support from the Eppley Foundation, the Potts Foundation, and the Boettcher Foundation Webb-Waring Biomedical Research Program.
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.ab.2014.04.021.
References
[1] S.A. Bustin, V. Benes, J.A. Garson, J. Hellemans, J. Huggett, M. Kubista, R. Mueller, T. Nolan, M.W. Pfaffl, G.L. Shipley, et al., The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments, Clin. Chem. 55 (2009) 611–622.
[2] K. Dheda, J.F. Huggett, J.S. Chang, L.U. Kim, S.A. Bustin, M.A. Johnson, G.A. Rook, A. Zumla, The implications of using an inappropriate reference gene for real-time reverse transcription PCR data normalization, Anal. Biochem. 344 (2005) 141–143.
[3] J. Huggett, K. Dheda, S. Bustin, A. Zumla, Real-time RT-PCR normalisation; strategies and considerations, Genes Immun. 6 (2005) 279–284.
[4] T. Nolan, R.E. Hands, S.A. Bustin, Quantification of mRNA using real-time RT-PCR, Nat. Protoc. 1 (2006) 1559–1582.
[5] D. Mitchison, G. Davies, The chemotherapy of tuberculosis: past, present and future, Int. J. Tuberc. Lung Dis. 16 (2012) 724–732.
[6] L.E. Desjardin, Y. Chen, M.D. Perkins, L. Teixeira, M.D. Cave, K.D. Eisenach, Comparison of the ABI 7700 system (TaqMan) and competitive PCR for quantification of IS6110 DNA in sputum during treatment of tuberculosis, J. Clin. Microbiol. 36 (1998) 1964–1968.
[7] C.L. Stallings, N.C. Stephanou, L. Chu, A. Hochschild, B.E. Nickels, M.S. Glickman, CarD is an essential regulator of rRNA transcription required for Mycobacterium tuberculosis persistence, Cell 138 (2009) 146–159.
[8] B.M. Bolstad, R.A. Irizarry, M. Astrand, T.P. Speed, A comparison of normalization methods for high density oligonucleotide array data based on variance and bias, Bioinformatics 19 (2003) 185–193.
[9] T. Park, S.G. Yi, S.H. Kang, S. Lee, Y.S. Lee, R. Simon, Evaluation of normalization methods for microarray data, BMC Bioinform. 4 (2003) 33.
[10] J.C. Mar, Y. Kimura, K. Schroder, K.M. Irvine, Y. Hayashizaki, H. Suzuki, D. Hume, J. Quackenbush, Data-driven normalization strategies for high-throughput quantitative RT-PCR, BMC Bioinform. 10 (2009) 110.
[11] E. Willems, L. Leyns, J. Vandesompele, Standardization of real-time PCR gene expression data from independent biological replicates, Anal. Biochem. 379 (2008) 127–129.