Respiratory Physiology & Neurobiology 135 (2003) 109 /119 www.elsevier.com/locate/resphysiol
Statistical methods for chip calibration and saturation effects in antibody-spiked gene expression data J. Sunil Rao a,*, Jingjin Li b a
Case Western Reserve University, 10900 Euclid Avenue, 160 Peter B. Lewis Building, Cleveland, OH 44106 7235, USA b Indiana University School of Medicine, Indianapolis, IN 46202, USA Accepted 30 December 2002
Abstract Oligonucleotide microarrays are amongst, a set of technologies that allow for high throughput assessment of vast numbers of gene expressions. In order to evaluate gene expressions given detection limits, antibody spiking is often used providing one with an expression curve relating antibody treated expression and non-antibody treated expression. These curves can exhibit different functional shapes across chips and hence need to be standardized. In addition, each curve is subject to saturation effects, which are typically dealt with by extrapolating a linear fit to the subset of the data not visually subject to saturation. In this paper we introduce methods for the non-parametric standardization of expression curves using univariate smoothers. We also explore parametric methods for more efficient analysis of the standardized curves. We demonstrate an alternate method of parametric analysis using a weighted linear mixed effects model that does not arbitrarily delete data beyond an observed saturation point; allows for natural grouping of genes and provides significantly more accurate predictions than naive linear extrapolation. Both methodologies are studied through sets of simulations. # 2003 Elsevier Science B.V. All rights reserved. Keywords: Gene, chips, expression; Methods, gene expression curves
1. Introduction DNA microarray technology (often called a gene chip) allows researchers to measure the expression levels of thousands of genes (expression profiles) simultaneously over different time points, different experimental conditions or different tissue samples, DNA or RNA molecules are hybri* Corresponding author. Tel.: /1-216-368-1966; fax: /1216-368-3970. E-mail address:
[email protected] (J.S. Rao).
dized with a library of complementary strands fixed on a solid surface. Oligonucleotide chips contain gene-specific sequences about 20 bases long. These are then hybridized with labeled probe derived from a given tissue or cell line (i.e. sample). The resulting fluorescence intensity gives information about the abundance of the corresponding sample mRNA. Alternatively, cDNA can be spotted on nylon filters or glass slides. Complex mRNA probes are reverse transcribed to cDNA and labeled with red or green fluorescent dyes. This technique is often called the spotted array or
1569-9048/03/$ - see front matter # 2003 Elsevier Science B.V. All rights reserved. doi:10.1016/S1569-9048(03)00040-5
110
J.S. Rao, J. Li / Respiratory Physiology & Neurobiology 135 (2003) 109 /119
Fig. 1. Typical gene expression curve produced by two scans of data */the antibody treated readings and the non-antibody treated readings. Notice the clear saturation effects present at higher gene expression values.
cDNA array. Good references for the biotechnology of microarrays can be found in Schena et al. (1998), Schena and Davis (1999). Review papers on many other aspects of microarrays can be found in The Chipping Forecast (1999). The expression profiles generated from samples on the gene chips are subject to threshold detection limits with lower expressing genes being rounded to some lower bound of detection (Lin et al., 2000). To circumvent this problem, samples are simultaneously spiked with a complimentary antibody (AB) and expression levels measured (Affy manual protocol, page 6). This unfortunately also induces expression saturation effects artificially lowering AB treated expression values. A characteristic plot is presented in Fig. 1 where data from a gene chip used to assay expression profiles in a tissue sample of metastatic colon cancer is plotted. The experiment from which this chip’s
data was taken will be described later in the manuscript. This type of relationship will be coined a gene expression curve. The gene chip architecture is based on a proprietary algorithm developed at EOS Biotechnology and consists of 35 371 probes sets (genes) each consisting of eight individual probes and their complementary mismatched probe. The actual gene expression level measured is a trimmed mean of the eight intensity readings coming from the probe pairs. As indicated, the AB versus non-AB plot shows an initial linear trend followed by a sigmoidal type saturation after which point the curve flattens dramatically. Biologically relevant saturation effects are quite common in other applications. For instance, in the study of pharmacokinetics, the saturation effects can be expected as part of the biological system under study. Statistical methods have been widely
J.S. Rao, J. Li / Respiratory Physiology & Neurobiology 135 (2003) 109 /119
111
Fig. 2. Shape standardization for four parameter logistic simulated curves. The first panel shows these four curves. The remaining panels illustrate various views of the standardization process. Reading across the rows, the second panel shows plots of Y vs. the Y¯ : The first panel of row 2 shows the same thing after standardization. The second panel of row 2 shows the actual standardized curves and the last panel (row 3) gives the median curve as a reference.
used to model such data successfully (Guardabasso et al., 1987). One classic example is the four parameter logistic model: Y [(ad)=(1X =cb )]d where the parameters (a , b, c , d ) trace out a sigmoidal shape curve, controlling the length of
flatness at the bottom, the slope of the linear portion, the location of the middle point of the linear portion and the length of the upper flat portions of the sigmoid, respectively. The random variables Y and X would represent the antibodytreated and non-antibody treated gene expression measurements, respectively. Guardabasso et al.
112
J.S. Rao, J. Li / Respiratory Physiology & Neurobiology 135 (2003) 109 /119
(1987) provided fitting algorithms for these curves and they are used widely in the study of pharmacokinetic systems. With gene expressions, however, the saturation is something that is understood to represent a form of measurement error. Ultimately scientists choose to work with either the AB-treated or untreated readings only. In order to do so for all genes on a chip, a statistical model must be estimated that relates the two expression values and corrects for the presence of measurement error due to saturation and detection effects. A typical, very naive statistical analysis that is commonly used in practice involves visually determining the approximate level of non-AB treated expression where saturation or thresholding begins and then fitting a standard linear model to the data prior to this point. AB-treated gene expression values beyond the linear part of the data are estimated/predicted using linear extrapolation. There are a few obvious drawbacks of this approach. A fair amount of subjectivity is involved in eyeballing the thresholding value; not all of the data is ultimately used, and the underlying assumption that expression values beyond the observed saturation point will follow the same linear fit as those before can be suspect. A simple methodology that assigns observations weights for all data points (derived from a simple measurement error formulation) will be developed in this paper. This will allow modeling of all genes without the need to eyeball an expression saturation point. Another underlying drawback of the naive approach will be addressed as well. More subtely and perhaps more importantly, there is a gene clustering effect that is ignored with the naive approach. Genes may in fact cluster with respect to the effect of antibody treatment on gene expression. In fitting models that ignore this clustering, we assume implicitly that each gene in fact has the same (average) linear relationship between AB and non-AB readings. More accurate models can be estimated that allow for clusterspecific effects */in this case cluster-specific slopes and intercepts, using linear mixed effects models (Laird and Ware, 1982). Combining this approach with our derived observation weights leads to a
weighted mixed effects model that allows for measurement error corrected, cluster specific estimates of AB- or non-AB-treated gene expressions. This improved model provides increased accuracy along with less subjectivity. This can be potentially important when trying to use gene chips for the understanding of the genetic mechanisms underlying biological processes such as the control of breathing or tumor metastasis. Prior to this, however, we will discuss the issue of chip to chip standardization which is a well recognized problem (Efron et al., 2000), but takes on a slightly different twist when simultaneously considering pairs of expression readings. Systematic differences between chip hybridizations are best removed prior to any substational modeling of the data. Early approaches (for single expression readings) have either used linear transformations (i.e. chip debrightening) to give all chips mean 0 and variance 1 expression distributions (Efron et al., 2000), or have shifted chips to a so-called ‘median’ chip (Amaratunga and Cabrera, 2001). With AB, non-AB pairs of readings, there is a need for standardizing each gene expression curve in such a way as to take into account each curve’s individual structural features. We will develop an algorithm that does exactly this in a non-parametric way, where each curve is shifted to an estimated median curve. This is in spirit, an extension of the one-dimensional standardization algorithm of Amaratunga and Cabrera (2001). We will assume for notational purposes that the same N genes are assayed in each of M hybridizations. The paired (AB, non-AB) gene expression values will be represented by the random variables {(Xi , Yi ); i/1, . . ., N } which constitute chip m ’s data denoted Tm ; m /1,. . ., M . It is also typical to use log-transformed readings to stabilize variances at higher expression readings (Speed, 2000).
2. Standardizing chips using shape projections Standardization is needed to remove systematic chip to chip effects like differences in background or levels of mRNA in samples. Current techniques either linearly shift expression values to give a chip with mean 0 and variance 1 (Efron et al., 2000)
J.S. Rao, J. Li / Respiratory Physiology & Neurobiology 135 (2003) 109 /119
known as debrightening, or use image processing techniques that involve non-linear transformations to some median chip (Amaratunga and Cabrera, 2001). More naive methods simply brighten chips to a common value (Affy, date). In our case, systematic differences between chips produces more serious consequences as entirely different shaped AB /non-AB curves can result. Chips with higher background levels will produce earlier saturation effects with steeper slopes, while chips with lower background levels will stretch the curve out. This is illustrated in the first panel of Fig. 2. Simple linear transforms will only affect scale changes marginally (i.e. for either AB or non-AB expression values), and will not affect changes in the shape profiles in a useful way. Non-linear transforms to a median chip seem more reasonable. However, while this can be done marginally, a more effective way is to use shape projections to a median shape profile, thus more explicitly using the bivariate relationship between AB and non-AB expression levels. 2.1. Shape standardization algorithm 1.
2.
3.
Calculate individual chip univariate standardizations: For each m ; m /1, . . ., M , let (through a slight abuse of notation) Xim / (Xim /Xm )/sXm and Yim /(Yim /Ym )/sYm where sXm and sYm are standard deviations for X and Y for chip m . Estimate the median gene expression curve from the model Y i fmed (X¯ i )ei ; i/1, . . ., N where ei are errors with mean 0 and variance s2, and X¯ ; Y¯ represent median expression values for gene i across the chips. The fitted curve, fˆmed can be calculated using something like loess smoother or some other non-parametric scatterplot smoother (Hastie and Tibshirani, 1990). Do similar loess fits for each of the M individual chips generating expression curves fˆm ; m /1,. . ., M . Note: It is necessary to calibrate the amount of smoothing to be the same for each curve and this can be done by having the same effective degrees of freedom (df) for each smooth.
4.
113
Shift each expression curve to the median expression curve: for each m /1, . . ., M , fˆmed g(fˆm )em ; m /1, . . ., M where the gene subscript i is supressed for clarity of presentation. Similar assumptions on the error terms em as in Step 2 are assumed.
What is found is the standardizing shape projection, namely g( ˆ fˆm ) for chip m to the median curve. Simulations provide a useful way to empirically illustrate the standardization algorithm. Using the four parameter logistic model as an approximation to each of fm , m /l, . . ., M , we generated a set of curves under the following sets of assumed parameter values: first curve: (a /1, b /1, c/1, d/1, x / N (100, 0, 1)); second curve: (n /1, b/1, c/2, d /0, x / N (100, 0, 1)); third curve: (a/1, b /0, c /3, d /0, x / N (100, 0, 1)); fourth curve: (a /l, b//1, c /0.3, d /0, x / N (100, 0, 1)). As noted earlier, varying these parameters changes the shape of the sigmoidal curve. The first panel of Fig. 2 shows these four curves under our settings. The remaining panels illustrate various views of the standardization process. Reading across the rows of Fig. 2, the second panel shows plots of Y versus Y¯ : The first panel of row 2 shows the same thing after standardization. It is clear that the standardization process in each transformed curve has Y values that now closely follow linear patterns when plotted against Y¯ : The second panel of row 2 shows the actual standardized curves and the last panel (row 3) gives the median curve as a reference.
3. Measurement error and clustering in estimating gene expression curves We now discuss in detail the linear modeling of gene expression curves that accounts for measurement error and gene clustering. Recall the simplest naive form of analysis would be to fit a linear
114
J.S. Rao, J. Li / Respiratory Physiology & Neurobiology 135 (2003) 109 /119
regression model of Y on X. Genes exhibiting AB saturation effects would unduly effect the fit of the regression line pulling the slope downwards and thus forcing predictions of unsaturated AB-spiked expressions to be biased downwards. The alternative practice of eyeballing the X value at which saturation of Y begins to counter the effects of measurement error, and fitting a linear regression line to data points preceding this observed threshold is highly subjective. The linear extrapolation that follows this model fitting can also be highly biased (upwards or downwards). As indicated in the introduction of this paper, we set up a measurement error model to avoid these problems. So define weights, wi
1 ; ¯ ½Xi X ½ ½Yi Y¯ ½
i 1; . . . ; N;
where X¯ and Y¯ are the sample means of X and Y , respectively. The intuition for this type of weight is as follows: the data point (X , Y ) with the lowest amount of measurement error is naturally (/X¯ ; Y¯ ) (Fuller, 1987). Thus distance from this coordinate gives an indication of amount of error in a particular measurement. Both X and Y are needed in defining the weights since it is somewhat arbitrary what expression value plays the role of Y and which one the role of X in the modeling. In classical measurement error theory, error in the true regression response Y generally is not of much concern */only that in the predictor X attenuates the estimated slope (Fuller, 1987). Genes with the most estimated measurement error get the lowest weights; i.e. those points with largest distance from (/X¯ ; Y¯ ) get the lowest weight in the weighted linear regression fit. It is also customary to then standardize the weights as wi? / wi /a wi ; i/1, . . ., N in order to guarantee the weights sum to 1.
3.1. Modeling clustered gene expression curves As noted in the previous section, there seems to be some underlying grouping of the linear profiles amongst the genes. A more sophisticated model can allow for this information to be incorporated
resulting in more accurate inference and predictions. We consider the following mixed linear model: yX bZao i
(1)
where y /(yi )15i 5N is a vector of observations; b/(bj )15j 5p is a vector of unknown regression coefficients (the fixed effects); a /(aj )15j 5m is a vector of unobservable random variables (the random effects); o /(oi )15i 5N is a vector of errors; and X , Z are known matrices. We assume that E (a ) / 0, Var(a )/G ; E (o )/0, Var(oi ) /R where G and R may involve some unknown parameters such as variance components; and a and o are uncorrelated. It is typical to assume that the errors and random effects are normally distributed although this is not always necessary (Jiang and Rao, 2002). The fixed effects b are estimated by the method of ordinary least squares (OLS) and random effects typically estimated by restricted maximum likelihood (REML) or maximum likelihood (ML) (Brown and Prescott, 1999). This model is typically used for clustered data where extra variation exists due to the correlation of responses within a cluster */thus violating the assumption of homoscedascity of errors in the standard linear model. This extra variation is modeled using the random effect structure a. With dataset T , clustering is likely due to the fact that different groups of genes will have different linear relationships between Y and X. Thus if these clusters can be found, linear mixed models of type 1 can be fit where a would correspond to random slopes and intercepts for each gene grouping. In order to determine this clustering behavior, define Zi /Yi /Xi ; i/1, . . ., N . This random variable represents a data-based estimate of each gene’s slope. Now standard clustering algorithms such as hierarchical clustering or k -means clustering can be applied to determine profile groupings across the genes. It is customary also to first logtransform the Z ratios prior to clustering in order to normalize the distribution of the Z ’s (Speed, 2000).
J.S. Rao, J. Li / Respiratory Physiology & Neurobiology 135 (2003) 109 /119
3.2. The Gap statistic for determining number of clusters One open issue is the data-based determination of the number of clusters in order to fit the linear mixed model. One technique that can be used is the gap statistic measure of Tibshirani et al. (2001). This statistic measures the difference in percentage of variance explained by between cluster variation on T and between cluster variation averaged over repeated permutations of T. This statistic is evaluated by simulation using different candidate numbers of clusters (k ) and can be used with any standard clustering algorithm. Our choice in this work involves a k -means clustering approach. The number of clusters corresponding to the maximum gap statistic is the data-based estimate of the number of clusters in the population. Further details can be found in Tibshirani et al. (2001).
4. A simulation study We simulated five different scenarios and compared the five different methods within each scenario. Each scenario consisted of five groups of simulated genes with a total of N /1000 genes in the training set and 200 genes in a test set left out for validation purposes. The five scenarios correspond to five models m1, m2, m3, m4 and m5. The details of each model are 1) 2) 3) 4) 5)
m1: m2: m3: m4: m5:
b/(1, b/(1, b/(1, b/(1, b/(1,
1.5, 2, 2.5, 3)?, s/1. 1.25, 1.5, 1.75, 2)?, s /1. 1.1, 1.2, 1.3, 1.4)?, s/1. 1.5, 2, 2.5, 3)?, s/0.25. 1.25, 1.5, 1.75, 2)?, s /0.25.
The first three models provide illustration of differing group effects with model m1 representing the largest disparity in slope between the groups. Models m4 and m5 reduce the amount of noise from s /1 to 0.25. The five-variate X distribution was simulated as i.i.d. N (0, 1) random variables and the Y ’s were then generated according to
Yij Xij bj eij ;
i 1; . . . ; 240;
115
j 1; . . . ; 5
with independent errors generated as N (0, R ) with R /s2I, for varying values of s as indicated above. A total of 1200 observations, 240 from each of five different groups were simulated. A test set Ttest of 200 observations were then randomly selected and kept aside for validation purposes (the same set for all methods under m 1, m 2, m 3, m 4 or m 5). Thus Y values were generated under a correlation mechanism within gene group (same slopes within group) and independence across groups. A simulated optimal threshold value of X /5.9946 was used, and Y values prior to this threshold were simulated according to the above model. Y values after this threshold were simulated by plugging in bj /0 in the respective models. In order to compare prediction performance of the various methods, average test set prediction error was assessed using P (Y Yˆ l )2 ; l 1; . . . ; 5; PE(ml ) Ttest 200 where Yˆ l represents fitted values from model ml . Linear models were fit using weighted or unweighted least squares (LM no weights; LM weights) using all the data. In addition, the data was truncated at the simulated optimal thresholding values as well as observed eyeballed thresholding values, and unweighted linear models fit (LTM and LTM(O), respectively). The weighted linear mixed effects model was fit using the simplest variance covariance structure of the random effects */namely G /diag (s21, s22, . . ., s2k ) where k is the number of groups (either five by design or that number estimated by the gap statistic procedure). For the gap statistic procedure, clusters were first estimated by k -means clustering for various values of k. Table 1 provides a summary of the simulations. The entries in the table indicate relative PE performance compared with the linear mixed effects model. In other words, (PElm (ml ) PElme (ml )) 100%; PElme (ml )
l 1; . . . ; 5:
J.S. Rao, J. Li / Respiratory Physiology & Neurobiology 135 (2003) 109 /119
116
Positive numbers indicate higher test set errors than the weighted linear mixed model and negative numbers lower test set errors. Test set errors were estimated in two different fashions. For models m 1 through m 5, we assumed knowledge of which group the test observation belonged to. Simply plugging in into the estimated linear mixed model (estimated only using the training set) allowed us to predict test responses. For model m 3, we also used the more honest approach of assuming the number of gene groupings were unknown and were estimated from the data using the gap statistic. In this case, we first clustered both test and training sets together, determined the number of clusters using the gap statistic, and then fit the linear mixed effects model to only the training data. Thus we had data-based evaluation of which gene group each test observation belonged to for the purposes of predicting test set observations. The results are extremely clear. The most obvious fact is that the weighted linear mixed effects model uniformly has the best performance by a large margin over the other approaches for all scenarios considered. In fact, improvements in test set prediction accuracy hover around 80% which is huge. When using the more honest approach of assuming that the number of clusters is in fact unknown and hence estimated by the gap statistic approach, one can see that for m 3 say, the large gains over the other methods in prediction accuracy are still maintained. Some specific conclusions related to the weighting mechanism and thresholding points are apparent as well. The suggested weighting mechanism
provides improvement in prediction accuracy for all but model m1 over the linear model with no weights. Next, if one knew the ideal threshold values that were used to simulate threshold (saturation) effects, then this is just slightly better than eyeballing the thresholding value (last two columns). More importantly, the weighting mechanism provides performance improvements on par with knowing the optimal thresholding value and uses all the data at the same time.
5. Real example: saturation analysis of metastatic colon cancer and Duke’s B survivor chips We now illustrate the mixed model methodology on two separate gene chips. Both of these chips come from a large study at Case Western Reserve University trying to identify a distinct molecular signature of metastatic colon cancer. In the study, three groups of tissues are collected: normal colon, samples clinically labeled as Duke’s B survivors and metastatic (to the liver) colon cancer samples. Details of the design including actual numbers of samples in each group can be found in Ishwaran and Rao (2002). Gene expression arrays were generated for samples using the EOS II chip platform described earlier where the chip consisted of 35 371 probe sets. Figs. 3 and 4 show the observed expression values for the AB and non-AB pairs for the Duke’s B survivor and metastatic colon cancer samples. Clear saturation effects are visible with both chips.
Table 1 The results of fitting mixed models and linear models to the simulated gene expression curves Model
LM (no weights)
LM (weights)
LTM (no weights)
LTM(O) (no weights)
m1 m2 m3 m4 m5 m 3 (gap)
83.77 82.09 85.58 81.39 76.75 81.35
84.59 80.70 84.77 78.40 69.55 80.31
81.91 80.23 84.73 79.18 69.58 80.25
81.90 80.21 84.57 78.96 69.39 80.05
Linear models were fit using weighted or unweighted least squares (LM no weights; LM weights) using all the data. In addition, the data was truncated at the simulated optimal thresholding values as well as observed eyeballed thresholding values, and unweighted linear models fit (LTM and LTM(O), respectively).
J.S. Rao, J. Li / Respiratory Physiology & Neurobiology 135 (2003) 109 /119
Fig. 3. Raw data for Duke’s B survivor chip allowing saturation effects.
Fig. 4. Raw data for liver mets chip showing saturation effects.
117
118
J.S. Rao, J. Li / Respiratory Physiology & Neurobiology 135 (2003) 109 /119
Fig. 5. Fitted lines for Duke’s B chip from naively thresholded model and the linear mixed model which finds two fits from two slope clusters.
Fig. 6. Fitted lines for liver mets chip from naively thresholded model and the linear mixed model, which finds two fits from two slope clusters.
J.S. Rao, J. Li / Respiratory Physiology & Neurobiology 135 (2003) 109 /119
Figs. 5 and 6 show the fitted mixed model as well as the fitted naive model where genes beyond an eyeballed threshold are deleted. The mixed model was after first using the gap statistic to estimate the number of slope clusters present. For both chips, the gap statistic estimated two distinct clusters found by k-means clustering. The fitted mixed models show the two estimated linear trends for each chip after adjusting for measurement error via the weighting scheme derived earlier. Again, it is evident that by using the simple naive model, overly optimistic AB expression readings would be predicted in the linearly extrapolated range of data. The mixed model alleviates this problem by adjusting the slope of each group downwards, yet not as greatly as would have been found by simply fitting a linear model to the entire data with no slope clustering or weighting.
6. Summary Being able to deal with both scans of gene expressions from a typical microarray experiment is an important problem. Simply ignoring one scan can be misleading, as can linear extrapolations generated from linear regression fits naively deal with saturation effects. In this paper we have presented some new ideas for the parametric and non-parametric analysis of gene expression curves. The use of linear mixed effects modeling allows one to cluster genes into groups based on their individual gene expression curves, where the number of clusters can in fact be determined from the data using a permutation based statistic known as the gap statistic. In addition, the use of observation weights derived from a measurement error process eliminates the need for arbitrary eyeballing of the data. Non-parametric standardization of gene expression curves was also introduced in order to deal with chip to chip variability effects. A novel algorithm that does not impose strict parametric assumptions on the shapes of the gene expression curves was derived for standardizing possibly different shaped curves to an overall median curve. In general, simulation results indicate uniformly significant improvements in prediction accuracy of
119
the models using these refined methodologies. Application of the methodology to colon cancer and Duke’s B survivor arrays also show similar improvements.
Acknowledgements J. Sunil Rao’s work was supported in part by NIH grant number K25Ca89867.
References Amaratunga, D., Cabrera, J., 2001. Analysis of data from viral DNA microchips. J. Am. Stat. Assoc. 96, 1161 /1170. Brown, H., Prescott, R., 1999. Applied Mixed Models in Medicine. Wiley, Chichester. Efron, B., Tibshirani, R., Goss, V., Chu, G., 2000. Microarrays and their use in a comparative experiment. Technical report, Department, of Statistics, Stanford University. Fuller, W.A., 1987. Measurement Error Models. Wiley, New York. Guardabasso, V., Rodbard, D., Munson, P.J., 1987. A modelfree approach to estimation of relative potency in dose / response curve analysis. Am. J. Physiol. 252, E357 /E364. Hastie, T., Tibshirani, R., 1990. Generalized Additive Models. Chapman and Hall, London. Ishwaran, H., Rao, J.S., 2002. Detecting differentially expressed genes using Bayesian model selection. J. Am. Stat. Assoc., in press. Jiang, J., Rao, J.S., 2002. A consistent method for linear mixed model selection. Sanhkya A, in press. Laird, N., Ware, J., 1982. Random effects models for longitudinal data. Biometrics 38, 963 /974. Lin, Y., Nadler, S.T., Attic, A.D., Yandell, B.S., 2000. Mining for low-abundance transcripts in microarray data. Technical report, Department of Statistics, University of Wisconsin, Madison. Schena, M., Heller, R.A., Theriault, T.P., Konrad, K., Lachenmeier, E., Davis, R.W., 1998. Microarrays: biotechnology’s discovery platform for functional genomics. Trends Biotechnol. 16, 301 /306. Schena, M., Davis, R.W., 1999. Genes, genomes and chips. In: Schena, M. (Ed.), DNA Microarrays: A Practical Approach. Oxford University Press, Oxford, UK. Speed, T., 2000. The Terry Speed Microarray Page */Hints and Prejudices. Url: http://www.stat.berkeley.edu/users/terry/ zarray/H. The Chipping Forecast, 1999. Nat. Genet. (Suppl.) 2, 1 /60. Tibshirani, R., Walther, G., Hastie, T., 2001. Estimating the number of clusters in a dataset via the Gap statistic. J. R. Stat. Soc., B 63, 411 /423.