A computation saving Jackknife approach to receptor model uncertainty statements for serially correlated data



Chemometrics and Intelligent Laboratory Systems 88 (2007) 170 – 182 www.elsevier.com/locate/chemolab

Clifford H. Spiegelman a,⁎, Eun Sug Park b

a Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843-3143, USA
b Texas Transportation Institute, Texas A&M University System, 3135 TAMU, College Station, TX 77843-3135, USA

Received 19 July 2006; received in revised form 18 April 2007; accepted 22 April 2007
Available online 1 May 2007

Abstract

Receptor modeling is now a widely accepted approach to modeling air pollution data. The resulting estimates of pollution source profiles are subject to error, and their uncertainties are frequently obtained under an assumption of independence. In addition, traditional Bootstrap approaches are very computationally intensive. We present an intuitive Jackknife alternative that is much less computationally intensive. In simulation examples and on actual data, it appears to provide wider confidence intervals and larger standard errors for receptor model profile estimates than does the Bootstrap done under the assumption of independence.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Jackknife; Bootstrap; Bilinear; Air-pollution

⁎ Corresponding author. E-mail address: [email protected] (C.H. Spiegelman).
0169-7439/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.chemolab.2007.04.004

1. Introduction

Receptor modeling is an important technique that is used to identify and locate major pollution sources; see Henry et al. [1], Park et al. [2], and Paatero et al. [3]. Generally, receptor models have a complicated structure. From a statistician's perspective, they can be thought of as constrained bilinear models or constrained confirmatory factor analysis models. In addition, the data used in receptor models have both a time series correlation and a more typical multivariate correlation among variables. Because the models are nonlinear and have constraints and multiple types of dependence, the uncertainties of parameter estimates are difficult to estimate reliably.

The basic model is

Y = AP + ε, subject to (A, P) ∈ Ω,

where Ω is a set of feasible parameter values. Here Y is an n by p matrix of pollution observations. The row index of Y frequently denotes time, and the column index of Y typically denotes pollution species. The matrix A is an n by q matrix that denotes the contributions of major pollution sources; it is a nonnegative matrix that is subject to additional constraints; see Park, Spiegelman, and Henry [4], or Park et al. [2] for examples. For each of n time periods, the source contribution matrix A contains the amount of pollution received at the receptor from each of q major pollution sources. The matrix P contains the pollution profiles or chemical fingerprints for each of q major pollution sources and is called a source composition matrix. The set Ω describes the feasible region for contribution and composition parameters. For example, Ω is a subset of $R^{q(n+p)}$ that contains only nonnegative parameter values that also satisfy the other constraints. The error matrix ε is n by p and represents random measurement errors.

Christiansen and Sain [5] used a block Bootstrap to estimate model complexity (how many parameters to include in the model) in an effort to handle directly the time series dependence that typically occurs in receptor modeling. Their method is an alternative to the standard case resampling Bootstrap used by many in the area, for example Gajewski and Spiegelman [6] and Henry et al. [1]. Still, while the block Bootstrap mitigates some time series effects, it does not eliminate the effects due to the non-stationary behavior of the dependencies. One commonly used receptor modeling constraint is that some elements of the source contribution matrix, A, are assumed to be zero; if several of those elements are deleted during Bootstrap case resampling, identifiability of the parameters may be lost. Thus,


case resampling even with blocking can sometimes, perhaps even frequently, present serious challenges.

One important feature of a bilinear model is that the transpose of the model also has a bilinear form. This is important because it suggests the possibility of sampling columns in addition to, or in place of, rows of the data matrix. The dependence structure of the columns of the observation matrix is different from the typical nonstationary time series structure found in the row dependence. We will show by example that performing a Jackknife evaluation of uncertainties using columns produces standard errors that are much less susceptible to underestimation due to serial correlation. When the errors are temporally correlated, the Jackknife tends to produce larger standard errors and wider confidence intervals than those obtained using the Bootstrap and typical normal distribution multiples of Bootstrap standard errors under the false assumption of no time series dependence.

The typical Jackknife estimator deletes only one row or column of the data at a time. The Jackknife method is not intended to produce standard error estimates for pollution sources that are not identifiable when one particular column is deleted. Thus, identifiability of the parameters is typically maintained, whereas a Bootstrap case resampling approach to this type of problem would lose identifiability of the parameters as several rows or columns are deleted at a time. A Bootstrap residual resampling approach can still maintain the identifiability of the parameters, and that is what we use in our examples. As mentioned in Efron and Tibshirani [12], Bootstrapping is not a uniquely defined concept, and the choice between case resampling and residual resampling needs to be made depending on how far we trust the model. In addition, block-Bootstrapping with resampled residuals in the presence of constraints remains to be developed.
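The remark that the transposed model is again bilinear can be written out in one line. The display below is only a restatement of the basic model in transposed form; the dimension annotations are added here for orientation.

```latex
Y = AP + \varepsilon
\quad\Longrightarrow\quad
\underbrace{Y^{\top}}_{p \times n}
  = \underbrace{P^{\top}}_{p \times q}\,\underbrace{A^{\top}}_{q \times n}
  + \underbrace{\varepsilon^{\top}}_{p \times n}
```

In the transposed form the species index the rows, so deleting column j of Y corresponds to deleting one "case" of the transposed model, together with column j of P, while the dimensions of A are unchanged.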
Another important reason for deleting columns rather than rows is that, in addition to dealing with a different type of correlation, receptor models are computationally intensive. Usually there are many fewer columns than rows, and thus evaluating uncertainties by deleting columns rather than rows may save considerable computational effort. This is another large benefit of using our recommended Jackknife procedure.

2. Notation and method

The approach we take should generally work in assessing the random uncertainty component for any estimator of A and P, whether the estimators are constrained least squares as in Park et al. [4], L2E as in Gajewski and Spiegelman [6], Unmix as in Henry [7], Positive Matrix Factorization as in Paatero et al. [3], or any other approach to bilinear parameter estimation. Regardless of the estimation method, let $\hat{A}$ and $\hat{P}$ denote the estimates of A and P obtained from the full dataset. The Jackknife method is implemented by deleting the jth columns of Y and P as j goes from 1 to p. Let $\hat{A}_{(j)}$ and $\hat{P}_{(j)}$ denote the estimates of A and P obtained from the dataset with the jth column of Y deleted. The variances (and covariances) of the elements may be computed using pseudo-residuals as in Tukey and Mosteller [10], with renormalization in the case of variance estimates associated with $\hat{P}$ in order to adjust for column deletion.
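The delete-one-column procedure can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `toy_fit` is a hypothetical one-source (q = 1) stand-in for a real receptor-model estimator such as CNLS, and the standard error uses the usual Efron and Tibshirani Jackknife formula with factor (m - 1)/m over m replicates. For an element of A there are m = p replicates; for an element of P the replicate that deleted its own column is missing, so m = p - 1. The rescaling step assumes the profile is normalized to sum to 1.

```python
import math


def jackknife_se(reps):
    """Jackknife standard error from m delete-one replicates of a scalar.

    With m = p replicates this applies to an element of A; with the
    m = p - 1 non-missing replicates it applies to an element of P.
    """
    m = len(reps)
    mean = sum(reps) / m
    return math.sqrt((m - 1) / m * sum((r - mean) ** 2 for r in reps))


def toy_fit(Y):
    """HYPOTHETICAL one-source stand-in for a receptor-model fit (not CNLS).

    Profile: average row-normalized composition (sums to 1).
    Contribution: row totals.
    """
    n, p = len(Y), len(Y[0])
    contrib = [sum(row) for row in Y]
    profile = [sum(Y[t][l] / contrib[t] for t in range(n)) / n for l in range(p)]
    return contrib, profile


def column_jackknife(Y, fit=toy_fit):
    """Delete each column of Y in turn, refit, rescale, and return the SEs."""
    n, p = len(Y), len(Y[0])
    _, p_full = fit(Y)
    a_reps = [[] for _ in range(n)]  # a_reps[t] collects p replicates
    p_reps = [[] for _ in range(p)]  # p_reps[l] collects p - 1 replicates
    for j in range(p):
        Y_j = [row[:j] + row[j + 1:] for row in Y]
        a_j, p_j = fit(Y_j)
        # Sum of the full-data profile over the retained columns l != j.
        scale = sum(p_full) - p_full[j]
        for t in range(n):
            a_reps[t].append(a_j[t] / scale)
        kept = [l for l in range(p) if l != j]
        for l, value in zip(kept, p_j):
            p_reps[l].append(value * scale)
    se_a = [jackknife_se(r) for r in a_reps]
    se_p = [jackknife_se(r) for r in p_reps]
    return se_a, se_p


# Small made-up data matrix: 4 time periods by 3 species.
Y_demo = [[2.0, 1.0, 1.0], [4.1, 1.9, 2.0], [6.0, 3.2, 2.8], [1.9, 1.1, 1.0]]
se_a_demo, se_p_demo = column_jackknife(Y_demo)
```

Only p refits are needed, one per deleted column, which is the source of the computational saving when p is much smaller than n.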


Alternatively, the formula given in Efron and Tibshirani [12] can be used to compute the standard errors for $\hat{A}$ and $\hat{P}$ as follows:

$$\hat{\sigma}(\hat{a}_{tk}) = \left[ \frac{p-1}{p} \sum_{j=1}^{p} \left( \hat{a}_{tk(j)} - \hat{a}_{tk(\cdot)} \right)^{2} \right]^{1/2}, \tag{1}$$

where $\hat{a}_{tk(\cdot)} = \sum_{j=1}^{p} \hat{a}_{tk(j)}/p$, and

$$\hat{\sigma}(\hat{p}_{kl}) = \left[ \frac{p-2}{p-1} \sum_{j=1}^{p} \left( \hat{p}_{kl(j)} - \hat{p}_{kl(\cdot)} \right)^{2} \right]^{1/2}, \tag{2}$$

where $\hat{p}_{kl(\cdot)} = \sum_{j=1}^{p} \hat{p}_{kl(j)}/(p-1)$. Note that $\hat{p}_{kj(j)}$ is missing, and $\hat{p}_{kj(\cdot)}$ is computed based on the non-missing values only. Because the standard deviations are sensitive to an outlier, especially when p is small, a robust estimator of a standard deviation based on an interquartile range (IQR) may be used instead of the standard deviations above. In that case, the robust Jackknife standard errors for $\hat{A}$ and $\hat{P}$ can be given as

$$\hat{\sigma}_{R}(\hat{a}_{tk}) = \frac{p-1}{\sqrt{p}}\, \mathrm{IQR}(\hat{a}_{tk})/1.35 \tag{3}$$

and

$$\hat{\sigma}_{R}(\hat{p}_{kl}) = \frac{p-2}{\sqrt{p-1}}\, \mathrm{IQR}(\hat{p}_{kl})/1.35 \tag{4}$$

where $\mathrm{IQR}(\hat{a}_{tk})$ and $\mathrm{IQR}(\hat{p}_{kl})$ represent the interquartile range of the Jackknife replicates for each element of the matrices $\hat{A}$ and $\hat{P}$, respectively.

In many papers the rows of P and $\hat{P}$ are normalized to sum to 1. When such normalization is used, an adjustment needs to be made to $\hat{P}_{(j)}$ before computing the standard errors. In this case each element of the ith row of $\hat{P}_{(j)}$ should be multiplied by $\sum_{l \neq j} \hat{P}_{il}$ in order to account for the deletion of the jth column. A similar adjustment also needs to be made to each element of the ith column of $\hat{A}_{(j)}$, by multiplying each column of $\hat{A}_{(j)}$ by $1/\sum_{l \neq j} \hat{P}_{il}$. The Jackknife estimates of standard error are then computed in the usual way using the adjusted $\hat{P}_{(j)}$ and $\hat{A}_{(j)}$.

Once Jackknifed estimates of the variances of $\hat{A}$ and $\hat{P}$ are computed, there is another important value to compute: the degrees of freedom for the Jackknife estimates. While we are modeling the contributions A and profiles P as fixed parameters (as opposed to random variables), in practice there is variation in these parameters. The variation in P occurs because there is variation at the sources in their production of pollutants, there is variation in meteorology around many of the plants that produce pollution, and there are many other reasons as well. Variation in A occurs due to usage variations as well as weather changes such as wind speed and direction changes. Thus measurement error may not be the major source of uncertainty in receptor models; rather, it may be ambiguity in the AP part of the model due to variation in the model mean. If this is so, then the (random) dependence structure in the AP part of the model should determine the degrees of freedom of the variance estimates for the parameters. That is because both the algebraic rank and the rank of the covariance matrix for AP with random P determine the degrees of freedom for the variance estimate if random measurement errors are negligible. (Note that


Table 1
True source composition profiles (P0), estimated source composition profiles ($\hat{P}$), Jackknife standard errors, Bootstrap standard errors, Jackknife confidence intervals, and Bootstrap confidence intervals for P when the observations are independent

Source 1
Species  True   Estimate  JKSE   BSE    JKCIL  JKCIU  BCIL   BCIU   BCIstL  BCIstU
1        0.132  0.135     0.005  0.005  0.119  0.152  0.126  0.144  0.126   0.144
2        0      0.000     0.000  0.000  0.000  0.000  0.000  0.000  0.000   0.000
3        0      0.000     0.000  0.000  0.000  0.000  0.000  0.000  0.000   0.000
4        0.263  0.269     0.003  0.004  0.260  0.278  0.262  0.277  0.261   0.277
5        0      0.000     0.000  0.000  0.000  0.000  0.000  0.000  0.000   0.000
6        0.158  0.155     0.003  0.004  0.144  0.166  0.146  0.164  0.146   0.164
7        0.079  0.080     0.004  0.003  0.068  0.092  0.074  0.086  0.074   0.086
8        0.105  0.104     0.004  0.004  0.091  0.118  0.096  0.111  0.097   0.111
9        0.263  0.256     0.002  0.004  0.250  0.263  0.248  0.263  0.249   0.264

Source 2
Species  True   Estimate  JKSE   BSE    JKCIL  JKCIU  BCIL   BCIU   BCIstL  BCIstU
1        0      0.000     0.000  0.000  0.000  0.000  0.000  0.000  0.000   0.000
2        0.214  0.217     0.003  0.004  0.208  0.225  0.208  0.223  0.209   0.224
3        0.119  0.121     0.018  0.007  0.063  0.180  0.105  0.134  0.107   0.136
4        0      0.000     0.000  0.000  0.000  0.000  0.000  0.000  0.000   0.000
5        0.167  0.159     0.013  0.005  0.119  0.199  0.150  0.169  0.149   0.169
6        0      0.000     0.000  0.000  0.000  0.000  0.000  0.000  0.000   0.000
7        0.143  0.143     0.009  0.004  0.115  0.171  0.135  0.149  0.135   0.150
8        0.119  0.114     0.011  0.005  0.079  0.149  0.105  0.124  0.105   0.123
9        0.238  0.247     0.004  0.005  0.235  0.259  0.238  0.257  0.238   0.256

Source 3
Species  True   Estimate  JKSE   BSE    JKCIL  JKCIU  BCIL   BCIU   BCIstL  BCIstU
1        0.146  0.152     0.003  0.006  0.143  0.161  0.139  0.164  0.140   0.164
2        0.146  0.139     0.004  0.007  0.125  0.152  0.125  0.151  0.125   0.152
3        0.229  0.233     0.004  0.008  0.221  0.245  0.218  0.250  0.217   0.249
4        0.083  0.081     0.007  0.010  0.060  0.102  0.062  0.099  0.062   0.100
5        0      0.000     0.000  0.000  0.000  0.000  0.000  0.000  0.000   0.000
6        0      0.000     0.000  0.000  0.000  0.000  0.000  0.000  0.000   0.000
7        0.188  0.191     0.002  0.005  0.185  0.197  0.182  0.200  0.182   0.200
8        0      0.000     0.000  0.000  0.000  0.000  0.000  0.000  0.000   0.000
9        0.208  0.204     0.002  0.006  0.197  0.211  0.191  0.213  0.192   0.216

Notes:
1. Jackknife standard errors are obtained based on a column deletion (from species 1 to 9).
2. Bootstrap standard errors are obtained based on 200 Bootstrap samples for which resampling is done over the rows.
3. JKSE and BSE represent the Jackknife standard errors and the Bootstrap standard errors, respectively.
4. JKCIL and JKCIU represent the lower limit and the upper limit of the 95% Jackknife confidence intervals, respectively.
5. BCIL and BCIU represent the lower limit and the upper limit of the 95% Bootstrap confidence intervals obtained by the percentile method, respectively.
6. BCIstL and BCIstU represent the lower limit and the upper limit of the 95% Bootstrap confidence intervals obtained by the standard normal method, respectively.

this is a bit counter-intuitive because as the number of model parameters increases with q so do the degrees of freedom. We assume that the sample sizes are much larger than the number of parameters being estimated.) Typically the matrix A has full column rank, and if the measurement error is not the major source of variation then rank(Y) ≈ rank(AP) ≈ rank(P) (typically); since the modeler knows the rank of P, the degrees of freedom should be easy to set. Sometimes, however, the modeler may over-parameterize (or under-parameterize) the model. To protect against the over-parameterized case we estimate the rank of our model using the NUMFACT algorithm presented in Henry et al. [8] and Park et al. [9]. The NUMFACT method is a way to estimate the underlying rank of a matrix that is subject to error. We denote the estimated rank of the data determined by the NUMFACT algorithm by NUMrank. Then we protect against the effect of over-parameterization by using MIN(rank(P), NUMrank) as our degrees of freedom (df) for the Jackknifed estimates of variance. In other words, df = min(q, NUMrank), and the Jackknife confidence intervals are

$$\text{Parameter estimate} \pm t(df;\, 1-\alpha/2)\,(\text{Jackknife standard error estimate}). \tag{5}$$
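As a small worked illustration of Eq. (5), the sketch below computes a 95% Jackknife interval with df = min(q, NUMrank). The function name and arguments are hypothetical, the t quantiles are standard table values hard-coded for small df so that no third-party library is needed, and truncating the lower limit at zero is an assumption made here because the profile elements are constrained to be nonnegative.

```python
# Upper 97.5% points of Student's t for small df (standard table values).
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}


def jackknife_ci(estimate, se, q, num_rank, truncate_at_zero=True):
    """95% interval: estimate +/- t(df; 0.975) * se, with df = min(q, NUMrank)."""
    df = min(q, num_rank)
    half_width = T_975[df] * se
    lower = estimate - half_width
    if truncate_at_zero:  # assumption: the parameter is nonnegative
        lower = max(0.0, lower)
    return lower, estimate + half_width


# Example with an estimate of 0.135 and a Jackknife standard error of 0.005,
# taking q = 3 and an assumed NUMrank of 3.
lower_demo, upper_demo = jackknife_ci(0.135, 0.005, q=3, num_rank=3)
```

With df = 3 the multiplier is about 3.18 rather than the normal value 1.96, which is one reason the Jackknife intervals are wider than standard normal Bootstrap intervals even when the two standard errors are similar.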

We are concerned that using too small a value for q makes model bias important. We also use the Jackknife to assess the issue of bias.

The motivation for this choice of degrees of freedom can be seen from the following simple example. Suppose that $e_1$, $e_2$, and $e_3$ are independent normal random variables with mean zero and variance $\sigma^2$. Let $e_4, e_5, \ldots, e_{101}$ be linear combinations of $e_1$, $e_2$, and $e_3$ having the same mean and variance. Let $d_i = \mu + e_i$, $i = 1, \ldots, 101$. The Jackknifed estimate of variance for the sample mean from these data would properly have only two degrees of freedom, coming from the three independent observations. (This follows by multiplying the vector $e = (e_1, e_2, e_3, \ldots, e_{101})^{\top}$ by an orthogonal matrix such that the resulting vector has only three nonzero entries. Since orthogonal matrices preserve Euclidean distance, the appropriate degrees of freedom are two.) It is clear that if the profiles P were fixed (say at 0) and the rows of Y are independent, then the degrees of freedom that we recommend grossly underestimate the degrees of freedom available for a variance estimator. As stated in Tukey and Mosteller [10] (for the case of independent and identically distributed errors) the degrees of freedom should be set as the number of different pseudo-values less the number of parameters estimated. Receptor modeling is typically far from

Table 2
True source composition profiles (P0), estimated source composition profiles ($\hat{P}$), Jackknife standard errors, Bootstrap standard errors, Jackknife confidence intervals, and Bootstrap confidence intervals for P when the observations are temporally correlated

Source 1
Species  True   Estimate  JKSE   JKSER  BSE    BSER   JKCIL  JKCIU  JKCIRL  JKCIRU  BCIL   BCIU   BCIstL  BCIstU  BCIRL  BCIRU
1        0.132  0.139     0.021  0.005  0.005  0.004  0.073  0.205  0.123   0.155   0.129  0.149  0.130   0.149   0.131  0.147
2        0      0.000     0.000  0.000  0.000  0.000  0.000  0.000  0.000   0.000   0.000  0.000  0.000   0.000   0.000  0.000
3        0      0.000     0.000  0.000  0.000  0.000  0.000  0.000  0.000   0.000   0.000  0.000  0.000   0.000   0.000  0.000
4        0.263  0.254     0.019  0.017  0.008  0.008  0.194  0.314  0.201   0.306   0.238  0.267  0.238   0.270   0.238  0.269
5        0      0.000     0.000  0.000  0.000  0.000  0.000  0.000  0.000   0.000   0.000  0.000  0.000   0.000   0.000  0.000
6        0.158  0.144     0.014  0.006  0.005  0.004  0.100  0.188  0.124   0.164   0.135  0.153  0.135   0.153   0.136  0.152
7        0.079  0.071     0.023  0.012  0.006  0.005  0.000  0.143  0.034   0.108   0.059  0.082  0.060   0.082   0.061  0.080
8        0.105  0.114     0.021  0.003  0.004  0.004  0.046  0.182  0.104   0.124   0.105  0.122  0.105   0.122   0.106  0.122
9        0.263  0.279     0.014  0.006  0.005  0.005  0.233  0.324  0.261   0.297   0.268  0.289  0.268   0.289   0.268  0.289

Source 2
Species  True   Estimate  JKSE   JKSER  BSE    BSER   JKCIL  JKCIU  JKCIRL  JKCIRU  BCIL   BCIU   BCIstL  BCIstU  BCIRL  BCIRU
1        0      0.000     0.000  0.000  0.000  0.000  0.000  0.000  0.000   0.000   0.000  0.000  0.000   0.000   0.000  0.000
2        0.214  0.208     0.007  0.008  0.010  0.010  0.185  0.230  0.181   0.234   0.187  0.226  0.188   0.228   0.187  0.228
3        0.119  0.117     0.060  0.012  0.009  0.008  0.000  0.309  0.078   0.155   0.094  0.132  0.099   0.135   0.101  0.132
4        0      0.000     0.000  0.000  0.000  0.000  0.000  0.000  0.000   0.000   0.000  0.000  0.000   0.000   0.000  0.000
5        0.167  0.177     0.047  0.004  0.007  0.008  0.029  0.326  0.166   0.188   0.165  0.191  0.163   0.192   0.162  0.192
6        0      0.000     0.000  0.000  0.000  0.000  0.000  0.000  0.000   0.000   0.000  0.000  0.000   0.000   0.000  0.000
7        0.143  0.141     0.033  0.004  0.008  0.008  0.035  0.247  0.130   0.152   0.124  0.154  0.126   0.156   0.126  0.157
8        0.119  0.127     0.042  0.012  0.011  0.010  0.000  0.260  0.090   0.165   0.109  0.148  0.106   0.149   0.107  0.148
9        0.238  0.230     0.019  0.010  0.008  0.007  0.171  0.289  0.197   0.263   0.216  0.246  0.215   0.245   0.216  0.244

Source 3
Species  True   Estimate  JKSE   JKSER  BSE    BSER   JKCIL  JKCIU  JKCIRL  JKCIRU  BCIL   BCIU   BCIstL  BCIstU  BCIRL  BCIRU
1        0.146  0.129     0.009  0.004  0.006  0.005  0.099  0.158  0.117   0.141   0.117  0.140  0.118   0.140   0.119  0.139
2        0.146  0.153     0.017  0.012  0.006  0.007  0.099  0.207  0.116   0.191   0.140  0.165  0.141   0.166   0.140  0.166
3        0.229  0.232     0.010  0.012  0.005  0.005  0.199  0.265  0.194   0.270   0.221  0.243  0.222   0.243   0.223  0.241
4        0.083  0.080     0.012  0.013  0.009  0.007  0.040  0.119  0.040   0.120   0.061  0.097  0.063   0.097   0.065  0.094
5        0      0.000     0.000  0.000  0.000  0.000  0.000  0.000  0.000   0.000   0.000  0.000  0.000   0.000   0.000  0.000
6        0      0.000     0.000  0.000  0.000  0.000  0.000  0.000  0.000   0.000   0.000  0.000  0.000   0.000   0.000  0.000
7        0.188  0.199     0.010  0.007  0.006  0.005  0.165  0.232  0.176   0.221   0.188  0.209  0.188   0.209   0.188  0.209
8        0      0.000     0.000  0.000  0.000  0.000  0.000  0.000  0.000   0.000   0.000  0.000  0.000   0.000   0.000  0.000
9        0.208  0.208     0.010  0.015  0.006  0.005  0.175  0.241  0.162   0.254   0.195  0.219  0.196   0.219   0.197  0.218

Notes:
1. Jackknife standard errors are obtained based on a column deletion (from species 1 to 9);
2. Bootstrap standard errors are obtained based on 200 Bootstrap samples for which resampling is done over the rows;
3. JKSE and BSE represent the Jackknife standard errors and the Bootstrap standard errors, respectively;
4. JKSER and BSER represent the robust versions of the Jackknife standard errors and the Bootstrap standard errors utilizing the IQR, respectively;
5. JKCIL and JKCIU represent the lower limit and the upper limit of the 95% Jackknife confidence intervals, respectively;
6. JKCIRL and JKCIRU represent the lower limit and the upper limit of the 95% Jackknife confidence intervals utilizing the robust standard errors JKSER, respectively;
7. BCIL and BCIU represent the lower limit and the upper limit of the 95% Bootstrap confidence intervals obtained by the percentile method, respectively;
8. BCIstL and BCIstU represent the lower limit and the upper limit of the 95% Bootstrap confidence intervals obtained by the standard normal method, respectively;
9. BCIRL and BCIRU represent the lower limit and the upper limit of the 95% Bootstrap confidence intervals obtained by the standard normal method and utilizing the robust standard errors BSER, respectively;
10. Intervals not capturing the true parameter value are shown in bold.


having independent and identically distributed data and the dependence structure is difficult to model. Ad-hoc techniques such as scree plots and the rule of one frequently aid other areas of multivariate analysis; see Henry et al. [8]. The method that we present for degrees of freedom is intuitive but also ad-hoc.

3. Examples

We will show by examples that (1) the standard error estimates from our Jackknife method and Bootstrap standard error estimates are comparable when the observations are independent, but (2) when the observations are temporally correlated, standard error estimates and confidence intervals obtained by our Jackknife method are conservative when compared to Bootstrap standard error estimates as well as Bootstrap confidence intervals (computed under independence assumptions).

3.1. Simulated example

We first consider a simulated example to illustrate (1). The data are generated based on the basic model with the true source composition matrix P0 given in Table 1, where n = 50, p = 9, and q = 3. The source contribution matrix A is generated from a truncated multivariate normal distribution N3(3, I3), where the mean vector 3 = (3,3,3) and I3 represents a 3 × 3 identity matrix. The errors associated with the n observations are independently generated from the normal distribution with mean 0 and a diagonal covariance matrix so that the proportions of the error standard deviations to the model standard deviations are about 9–28%. The resulting data matrix Y consists of nonnegative numbers. We employ the constrained nonlinear least squares (CNLS) method (see Park et al. [4]), utilizing the identifiability conditions on the source composition matrix, to obtain the estimates of P. Both the standard error estimates and the interval estimates for P are obtained by two methods: the new Jackknife method, deleting each column in turn, and a Bootstrap method that resamples the rows of the residuals. The number of Bootstrap samples used was 200.
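The data-generating step for the independent case can be sketched as follows. This is an illustration under stated assumptions, not the authors' simulation code: the truncation of the normal distribution is assumed to be at zero (implemented by rejection), and the error standard deviation for each species is set to a single hypothetical ratio `err_ratio` of an approximate model standard deviation, whereas the text only says that the ratios were about 9–28%. The matrix `P0` holds the true profiles listed in Table 1.

```python
import math
import random

# True source composition matrix P0 from Table 1 (3 sources by 9 species).
P0 = [
    [0.132, 0.0, 0.0, 0.263, 0.0, 0.158, 0.079, 0.105, 0.263],
    [0.0, 0.214, 0.119, 0.0, 0.167, 0.0, 0.143, 0.119, 0.238],
    [0.146, 0.146, 0.229, 0.083, 0.0, 0.0, 0.188, 0.0, 0.208],
]


def truncated_normal(mu, sd, rng):
    """Draw from N(mu, sd^2) truncated to [0, inf) by rejection (assumed region)."""
    while True:
        x = rng.gauss(mu, sd)
        if x >= 0.0:
            return x


def simulate_independent(n=50, err_ratio=0.15, seed=0):
    """Generate A (n by q) and Y = A P0 + errors (n by p) with independent rows."""
    rng = random.Random(seed)
    q, p = len(P0), len(P0[0])
    A = [[truncated_normal(3.0, 1.0, rng) for _ in range(q)] for _ in range(n)]
    # Approximate model SD of species l when the contributions have unit variance.
    model_sd = [math.sqrt(sum(P0[k][l] ** 2 for k in range(q))) for l in range(p)]
    Y = [
        [
            sum(A[t][k] * P0[k][l] for k in range(q))
            + rng.gauss(0.0, err_ratio * model_sd[l])  # hypothetical error SD
            for l in range(p)
        ]
        for t in range(n)
    ]
    return A, Y


A_sim, Y_sim = simulate_independent()
```

A fitting routine such as CNLS would then be applied to `Y_sim` once for the full data and once for each of the nine column-deleted versions.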
In applying the CNLS, it is important to ensure that a good local minimum (hopefully a global minimum) is found. While graphs of the solutions ($\hat{P}$) plotted on the principal component axes along with the data are useful tools, checking convergence by graphs cannot be automated in each of 200 Bootstrap replications. Instead, the R² values between the rows of $\hat{P}$ and $\hat{P}^{*}$ (a Bootstrap replication of $\hat{P}$) are checked in each Bootstrap replication, and if any of the R² values is less than a reasonable cutoff value (e.g., 0.7) then the CNLS is rerun with a new starting value to ensure convergence. To make a fair comparison, the R² check is also applied to $\hat{P}_{(j)}$ (a Jackknife replication of $\hat{P}$). The Jackknife confidence intervals (JKCI) are computed by Eq. (5) with df = 3 and α = 0.05, and the Bootstrap confidence intervals (BCI) are obtained by the percentile interval method (see, e.g., Efron and Tibshirani [12]). Following a referee's suggestion, the Bootstrap confidence intervals computed by the standard normal method (Parameter estimate ± 1.96 Bootstrap standard error estimate) are also provided in the table. When the Bootstrap

distribution of the estimator is roughly normal, the standard normal and percentile intervals will nearly agree. When the Bootstrap distribution of the estimator is non-normal, however, there will be non-ignorable discrepancies between the two types of intervals, in which case the percentile interval is known to be more accurate than the standard interval (see Efron and Tibshirani [12]).

The results are presented in Table 1. It can be observed from Table 1 that there is no natural ordering between the standard error estimates from our new Jackknife method (JKSE) and Bootstrap standard error estimates (BSE) when the errors are independent. All of the 95% Jackknife confidence intervals and the 95% Bootstrap confidence intervals contain the true values of P in this case. The standard error estimates for A obtained by using the Jackknife approach and the Bootstrap approach are also given in Appendix A (see Table A1). It can be seen from the table that the standard errors of A from both approaches are of the same order of magnitude when the observations are independent.

Next, we consider a simulated example where the observations are temporally correlated. The data are generated with the same true source composition matrix P0 given in Table 1, where n = 50, p = 9, and q = 3. To simulate temporally correlated source contributions and errors, the source contribution matrix $A = [\alpha_t]$ and the errors $\varepsilon = [\varepsilon_t]$ are generated according to the AR(1) process as follows:

$$\log \alpha_t = \xi_0 + (\log \alpha_{t-1} - \xi_0)\,\Phi + u_t, \qquad u_t \sim N_3(0,\; 0.1\, I_3)$$
$$\varepsilon_t = \eta_t + \delta_t, \qquad \delta_t \sim N_9(0,\; \Sigma_0)$$
$$\eta_t = \eta_{t-1}\,\Theta + \upsilon_t, \qquad \upsilon_t \sim N_9(0,\; 0.05\, I_9)$$

where $\xi_0 = (3,3,3)$, $u_t = (u_{t1}, u_{t2}, u_{t3})$, $\Phi = \mathrm{diag}(0.7, 0.7, 0.7)$, $\Sigma_0 = \mathrm{diag}(0.0015, 0.001, 0.0015, 0.001, 0.0005, 0.002, 0.001, 0.0015, 0.0015)$, $\upsilon_t = (\upsilon_{t1}, \upsilon_{t2}, \ldots, \upsilon_{t9})$, and $\Theta = \mathrm{diag}(0.9, 0.9, \ldots, 0.9)$.

Table 3
Number of non-zero elements that fall outside the 95% confidence intervals

Dataset                   JKCI  JKCIR  BCI  BCIst  BCIR
1                         0     1      5    5      7
2                         0     0      5    4      5
3                         0     2      2    3      7
4                         0     0      1    1      9
5                         0     2      8    8      12
6                         0     1      3    3      5
7                         0     1      5    4      4
8                         0     2      4    2      7
Total                     0     9      33   30     56
Coverage probability (%)  100   94     77   79     61

Notes: 1. The number of non-zero elements of P is 18; 2. Coverage probability is estimated by 100 × (1 − Total/(18 × 8)).
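The coverage figures in Table 3 can be checked with a few lines of arithmetic. The sketch below takes the per-dataset miss counts from the table and computes the coverage as the percentage of the 18 × 8 = 144 intervals that did capture the true value; the dictionary and function names are illustrative only.

```python
# Number of nonzero elements of P0 missed by each interval type, datasets 1-8.
MISSES = {
    "JKCI": [0, 0, 0, 0, 0, 0, 0, 0],
    "JKCIR": [1, 0, 2, 0, 2, 1, 1, 2],
    "BCI": [5, 5, 2, 1, 8, 3, 5, 4],
    "BCIst": [5, 4, 3, 1, 8, 3, 4, 2],
    "BCIR": [7, 5, 7, 9, 12, 5, 4, 7],
}

N_NONZERO = 18  # nonzero elements of P0
N_DATASETS = 8


def coverage_percent(miss_counts):
    """Percentage of intervals that contain the true value."""
    n_intervals = N_NONZERO * N_DATASETS
    return 100.0 * (n_intervals - sum(miss_counts)) / n_intervals


COVERAGE = {name: round(coverage_percent(m)) for name, m in MISSES.items()}
```

For example, JKCIR misses 9 of 144 intervals, giving 100 × 135/144 = 93.75, which rounds to the 94% reported.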

Table 4
Sample mean and standard deviation of VOC data for the species used in the analysis

Variable number  VOC species             Sample mean  Sample standard deviation
1                Ethane                  27.3973      16.5901
2                n-Propane               16.4710      10.9709
3                Trans-2-pentene         2.4295       2.5890
4                2,3-Dimethylbutane      5.7470       3.4512
5                2-Methylpentane         28.5415      19.2938
6                2,3-Dimethylpentane     3.0710       1.5413
7                3-Methylhexane          9.3104       4.6903
8                Ethylbenzene            9.3557       7.9614
9                m-Xylene + p-xylene     17.2945      16.1344
10               1,2,3-Trimethylbenzene  3.0852       1.5427
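The AR(1) generating process used for the temporally correlated simulation can be sketched as below. The parameter values are those stated in the text (ξ0 = (3,3,3), Φ = diag(0.7), Θ = diag(0.9), innovation variances 0.1 and 0.05, and the diagonal of Σ0). The starting values are not given in the text, so starting the recursions at log α = ξ0 and η = 0 is an assumption, as are the function and variable names. Because Φ and Θ are diagonal, the matrix products reduce to elementwise multiplication.

```python
import math
import random

XI0 = [3.0, 3.0, 3.0]
PHI = [0.7, 0.7, 0.7]        # diagonal of Phi
THETA = [0.9] * 9            # diagonal of Theta
SIGMA0 = [0.0015, 0.001, 0.0015, 0.001, 0.0005, 0.002, 0.001, 0.0015, 0.0015]


def simulate_ar1(n=50, seed=0):
    """Return contributions A (n by 3) and errors E (n by 9) from the AR(1) model."""
    rng = random.Random(seed)
    log_alpha = XI0[:]       # assumed starting value
    eta = [0.0] * 9          # assumed starting value
    A, E = [], []
    for _ in range(n):
        log_alpha = [
            XI0[k] + (log_alpha[k] - XI0[k]) * PHI[k] + rng.gauss(0.0, math.sqrt(0.1))
            for k in range(3)
        ]
        eta = [eta[l] * THETA[l] + rng.gauss(0.0, math.sqrt(0.05)) for l in range(9)]
        delta = [rng.gauss(0.0, math.sqrt(SIGMA0[l])) for l in range(9)]
        A.append([math.exp(v) for v in log_alpha])   # lognormal contributions
        E.append([eta[l] + delta[l] for l in range(9)])
    return A, E


A_ar, E_ar = simulate_ar1()
```

The data matrix is then formed as Y = A P0 + E with P0 taken from Table 1; both the contributions and the errors carry serial dependence across the rows, while the dependence across the nine columns is of a different kind.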

Note that the $\alpha_t$'s are generated from the multivariate lognormal distribution. The proportions of the error standard deviations to the model standard deviations range from 8% to 34%, and the resulting data matrix Y consists of nonnegative numbers. The CNLS method utilizing the identifiability conditions on P is used again to obtain the point estimates of P. Both the Jackknife method deleting each column of the data and the Bootstrap method resampling the rows of the residuals are applied to obtain the standard error estimates and the interval estimates for the elements of P. In addition to the Jackknife standard error estimates in Eq. (2) (JKSE), the robust Jackknife standard error estimates (JKSER) based on the IQR as in Eq. (4) are also provided in Table 2. For comparison purposes, a robust version of the Bootstrap standard error estimates (BSER), replacing the estimated standard deviation by IQR/1.35, is also provided in the table. It can be observed from Table 2 that JKSER are in general noticeably smaller than JKSE while BSER are almost the same as BSE. As mentioned previously, the


standard deviations are sensitive to an outlier, especially when the number of underlying samples is small (9 for the Jackknife samples as opposed to 200 for the Bootstrap samples), which explains why there are much bigger differences between JKSER and JKSE than between BSER and BSE.

Table 2 also shows that the standard error estimates from the new Jackknife method (JKSE or JKSER) are in general larger than the corresponding Bootstrap standard error estimates (BSE or BSER) when the observations are temporally correlated. The Jackknife confidence intervals (JKCI or JKCIR) are also observed to be wider than the corresponding Bootstrap confidence intervals (BCI, BCIst, or BCIR). For the Jackknife confidence intervals, all of the 95% JKCI contain the true values of P0 and only 1 out of 18 nonzero elements of P0 falls outside the 95% JKCIR. On the other hand, for the Bootstrap confidence intervals, 5 out of 18 nonzero elements of P0 fall outside the 95% BCI and the 95% BCIst, and 7 elements fall outside the 95% BCIR. The standard error estimates for A obtained by using the Jackknife approach and the Bootstrap approach are also given in Appendix A (see Table A2). It can be seen from the table that the standard errors of A obtained by the Jackknife approach are noticeably larger than those obtained by the Bootstrap approach when the observations are temporally correlated.

Because the ordinary Bootstrap method ignores temporal correlation in the errors, the Bootstrap standard error estimates (BSE or BSER) and the Bootstrap confidence intervals (BCI, BCIst, BCIR) will be much smaller/narrower than they should be. On the other hand, the new Jackknife method based on a column deletion does not seem to be as biased downwards by temporal correlation in the errors, and leads to reasonable standard error estimates and interval estimates for P.
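The sensitivity of the ordinary Jackknife standard error to a single outlying replicate, and the effect of the IQR-based version, can be illustrated with a short sketch. The replicate values below are made up for illustration. The robust formula is written in terms of the number of replicates m, so that m = p corresponds to Eq. (3) and m = p - 1 to Eq. (4); the quartile convention (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption, since the text does not say how the IQR is computed.

```python
import math
import statistics


def jackknife_se(reps):
    """Ordinary Jackknife standard error from m delete-one replicates."""
    m = len(reps)
    mean = sum(reps) / m
    return math.sqrt((m - 1) / m * sum((r - mean) ** 2 for r in reps))


def robust_jackknife_se(reps):
    """IQR-based Jackknife standard error: (m - 1)/sqrt(m) * IQR/1.35."""
    m = len(reps)
    q1, _, q3 = statistics.quantiles(reps, n=4, method="inclusive")
    return (m - 1) / math.sqrt(m) * (q3 - q1) / 1.35


# Hypothetical replicates of one profile element; the last value is an outlier.
reps_demo = [0.20, 0.21, 0.19, 0.20, 0.22, 0.21, 0.20, 0.45]
se_ordinary = jackknife_se(reps_demo)
se_robust = robust_jackknife_se(reps_demo)
```

With so few replicates, one aberrant refit can dominate the sum of squares, while the interquartile range is largely unaffected by it; this is the pattern seen in the differences between JKSE and JKSER.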
To affirm this presumption, we generated eight simulated data sets for which the observations are temporally correlated as in Table 2, and estimated the 95% confidence intervals by

Fig. 1. Principal component plots of the VOC data (◯) and the fitted sources by CNLS (▵).


Jackknifing and Bootstrapping for each data set. Table 3 contains the number of non-zero elements that fall outside the confidence intervals for each dataset as well as the estimated coverage probability. As can be seen from the above table, the coverage probabilities of the Jackknife intervals, JKCI and JKCIR, are 100% and 94%, respectively, while the coverage probabilities of

the Bootstrap intervals, BCI, BCIst, and BCIR are only 77%, 79%, and 61%, respectively. It appears that when the observations are temporally correlated, the coverage probability of JKCIR is close to the nominal level while JKCI may be too conservative. The Bootstrap confidence intervals are observed to be all too narrow, and their coverage probabilities are much lower than the nominal level. We deem that the robust Jackknife

Table 5 Estimated source composition profiles (Pˆ ), Jackknife standard errors, Bootstrap standard errors, Jackknife confidence intervals, and Bootstrap confidence intervals for P Species Source 1

Source 2

Source 3

Estimate JKSE JKSER BSE BSER JKCIL JKCIU JKCIRL JKCIRU BCIL BCIU BCIstL BCIstU BCIRL BCIRU Estimate JKSE JKSER BSE BSER JKCIL JKCIU JKCIRL JKCIRU BCIL BCIU BCIstL BCIstU BCIRL BCIRU Estimate JKSE JKSER BSE BSER JKCIL JKCIU JKCIRL JKCIRU BCIL BCIU BCIstL BCIstU BCIRL BCIRU

1

2

3

4

5

6

7

8

9

10

0.029 0.030 0.032 0.020 0.022 0.000 0.123 0.000 0.130 0.000 0.072 0.000 0.069 0.000 0.073 0.028 0.025 0.029 0.017 0.018 0.000 0.108 0.000 0.119 0.000 0.063 0.000 0.063 0.000 0.063 0.514 0.031 0.013 0.012 0.012 0.416 0.612 0.471 0.556 0.488 0.535 0.490 0.537 0.491 0.536

0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.335 0.030 0.014 0.011 0.011 0.241 0.429 0.290 0.380 0.320 0.363 0.313 0.356 0.314 0.356

0.035 0.025 0.029 0.007 0.007 0.000 0.113 0.000 0.126 0.021 0.049 0.022 0.048 0.021 0.048 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.015 0.011 0.014 0.007 0.007 0.000 0.051 0.000 0.058 0.001 0.026 0.002 0.028 0.002 0.028

0.098 0.032 0.018 0.004 0.004 0.000 0.201 0.042 0.155 0.088 0.107 0.090 0.107 0.091 0.106 0.009 0.036 0.012 0.003 0.003 0.000 0.125 0.000 0.049 0.000 0.015 0.003 0.016 0.004 0.015 0.018 0.028 0.005 0.003 0.002 0.000 0.106 0.003 0.033 0.012 0.025 0.012 0.024 0.013 0.023

0.647 0.067 0.057 0.021 0.021 0.434 0.859 0.465 0.829 0.601 0.682 0.606 0.688 0.605 0.688 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.034 0.012 0.009 0.004 0.003 0.000 0.072 0.005 0.064 0.028 0.042 0.027 0.042 0.028 0.041 0.038 0.008 0.008 0.003 0.003 0.013 0.064 0.013 0.064 0.034 0.044 0.033 0.043 0.033 0.043

0.107 0.019 0.016 0.008 0.008 0.047 0.168 0.057 0.157 0.091 0.121 0.092 0.123 0.091 0.124 0.043 0.026 0.007 0.009 0.007 0.000 0.125 0.019 0.067 0.021 0.060 0.025 0.061 0.029 0.057 0.055 0.027 0.007 0.007 0.006 0.000 0.139 0.031 0.079 0.043 0.068 0.042 0.068 0.042 0.068

0.062 0.063 0.010 0.011 0.010 0.000 0.261 0.030 0.095 0.037 0.082 0.041 0.083 0.042 0.082 0.226 0.024 0.026 0.015 0.013 0.149 0.303 0.144 0.308 0.200 0.255 0.197 0.255 0.201 0.251 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.638 0.066 0.059 0.022 0.020 0.427 0.848 0.451 0.824 0.587 0.682 0.594 0.682 0.598 0.678 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000

0.022 0.004 0.003 0.005 0.004 0.000 0.055 0.011 0.033 0.013 0.032 0.012 0.031 0.013 0.030 0.021 0.012 0.002 0.004 0.004 0.000 0.060 0.016 0.027 0.011 0.028 0.013 0.030 0.014 0.029 0.025 0.010 0.004 0.004 0.004 0.000 0.057 0.011 0.039 0.016 0.031 0.017 0.033 0.018 0.032

Notes: 1. Jackknife standard errors are obtained based on a column deletion (from species 1 to 10); 2. Bootstrap standard errors are obtained based on 200 Bootstrap samples for which resampling is done over the rows; 3. JKSE and BSE represent the Jackknife standard errors and the Bootstrap standard errors, respectively; 4. JKSER and BSER represent the robust versions of the Jackknife standard errors and the Bootstrap standard errors utilizing the IQR, respectively; 5. JKCIL and JKCIU represent the lower limit and the upper limit of the 95% Jackknife confidence intervals, respectively; 6. JKCIRL and JKCIRU represent the lower limit and the upper limit of the 95% Jackknife confidence intervals utilizing the robust standard errors JKSER, respectively; 7. BCIL and BCIU represent the lower limit and the upper limit of the 95% Bootstrap confidence intervals obtained by the percentile method, respectively; 8. BCIstL and BCIstU represent the lower limit and the upper limit of the 95% Bootstrap confidence intervals obtained by the standard normal method, respectively; 9. BCIRL and BCIRU represent the lower limit and the upper limit of the 95% Bootstrap confidence intervals obtained by the standard normal method and utilizing the robust standard errors BSER, respectively.

C.H. Spiegelman, E.S. Park / Chemometrics and Intelligent Laboratory Systems 88 (2007) 170–182

estimators (JKSER and JKCIR) are most preferable when the errors have conspicuous temporal correlation. Comment: A small simulation study based upon the first example (the independent case) indicated that the robust estimator had only 85% coverage compared to a nominal 95% coverage level, and the robust estimator frequently gave standard errors that are too small as well. Therefore, our recommendation is to use robust estimators to estimate the ˆ only when the measurement errors have standard errors of P strong correlation. Based on the same study, in the case of independent measurement errors, the JKCI cover the true values about 97% of the time.
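The interval types named in the notes to Table 5 and the coverage counts reported for the simulated data can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the IQR is converted to a standard deviation by dividing by 1.349 (the normal-theory constant), lower limits are truncated at zero as the 0.000 lower limits in Table 5 suggest, and the interval multiplier is left as a parameter because this section does not restate which quantile the Jackknife intervals use. The function names and the replicate data are hypothetical.

```python
import numpy as np
from statistics import NormalDist

Z975 = NormalDist().inv_cdf(0.975)  # standard normal 97.5% quantile

def robust_sd(replicates):
    """IQR-based spread; IQR / 1.349 equals the SD for normal data (assumed scaling)."""
    q1, q3 = np.percentile(replicates, [25, 75], axis=0)
    return (q3 - q1) / 1.349

def se_interval(estimate, se, multiplier=Z975, lower_bound=0.0):
    """Estimate +/- multiplier * SE, truncated below at lower_bound."""
    lo = np.maximum(estimate - multiplier * se, lower_bound)
    hi = estimate + multiplier * se
    return lo, hi

def percentile_interval(replicates, level=0.95):
    """Percentile-method interval taken directly from the replicate estimates."""
    tail = 100.0 * (1.0 - level) / 2.0
    lo, hi = np.percentile(replicates, [tail, 100.0 - tail], axis=0)
    return lo, hi

def count_misses(true_values, lo, hi):
    """Count nonzero true elements outside [lo, hi]; also return empirical coverage."""
    nonzero = true_values != 0
    outside = nonzero & ((true_values < lo) | (true_values > hi))
    n_out = int(outside.sum())
    return n_out, 1.0 - n_out / int(nonzero.sum())

# Hypothetical replicate estimates standing in for 200 Bootstrap refits.
rng = np.random.default_rng(1)
true_p = np.array([0.50, 0.30, 0.00, 0.20])
reps = true_p + 0.02 * rng.standard_normal((200, true_p.size))
est = reps.mean(axis=0)

intervals = {
    "standard normal": se_interval(est, reps.std(axis=0, ddof=1)),
    "robust (IQR)": se_interval(est, robust_sd(reps)),
    "percentile": percentile_interval(reps),
}
for name, (lo, hi) in intervals.items():
    print(name, count_misses(true_p, lo, hi))
```

Only nonzero true elements enter the count, matching the way the text reports misses out of the 18 nonzero elements of P0. Summing `n_out` over several simulated data sets and dividing by the total number of nonzero elements gives a pooled coverage estimate of the kind quoted for JKCI, JKCIR, BCI, BCIst, and BCIR.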


3.2. Real example

The data were obtained from the Texas Natural Resources Conservation Commission, which was responsible for running the monitoring site. The data were collected between June 18, 1993, and November 30, 1993, from a collection site on Clinton Drive. An automated gas chromatograph reported hourly average concentrations in ppb C of 54 volatile organic compounds (VOC) and TNMOC. See Henry et al. [1] and Gajewski and Spiegelman [6] for more details about this data set. Using wind data that were also available, the observations with wind directions between 180° and 190° were chosen for analysis. Previous studies determined that a three-source receptor model is the best to use for these data; see Gajewski and Spiegelman [6] and Gajewski [11]. The measurements from 10 species are used in this paper. Table 4 shows the species included in the analysis along with their summary statistics. The CNLS method utilizing the identifiability conditions on the source composition matrix is used again to obtain the point estimates of P. Fig. 1 shows the principal component plot of the data and the fitted sources. From the plot it can be seen that the estimated source profiles give a reasonable fit to the data. Both the standard error estimates and the interval estimates for P are obtained by the new Jackknife method and by the ordinary Bootstrap method. The results are presented in Table 5. It can be observed from Table 5 that the standard error estimates from our new Jackknife method (JKSE or JKSER) are in general larger than the standard error estimates from the Bootstrap method (BSE or BSER). The Jackknife confidence intervals (JKCI or JKCIR) are also observed to be wider than the Bootstrap confidence intervals (BCI, BCIst, or BCIR). Recall that the ordinary Bootstrap method ignores temporal correlation in the errors and, as a result, leads to smaller uncertainty estimates than it should. On the other hand, our new Jackknife method based on column deletion does not seem to be significantly affected by temporal correlation in the errors and leads to reasonable uncertainty estimates for P even when there are temporal correlations in the data. The standard error estimates for A obtained by using the Jackknife approach and the Bootstrap approach are also given in Appendix A (see Table A3). It can be seen from the table that the standard errors of A obtained by the Jackknife approach are noticeably larger than those obtained by the Bootstrap approach.

4. Summary and conclusion

Using both simulated examples and a real example, we demonstrated a Jackknife approach to receptor modeling uncertainty evaluation. It appears in the examples of temporally correlated data that the Jackknife approach tends to produce larger standard error estimates and wider confidence intervals than the Bootstrap method done under the assumption of independence. It is known that wrongly assuming independence when applying the Bootstrap method biases standard error estimates downward, because typical serial correlation reduces the effective degrees of freedom of the resulting estimates. We believe that, until a more theoretically justified method is found, it is often better to state wider uncertainty intervals in these situations. Bayesian approaches to uncertainty estimation may be needed when the Jackknife intervals are too wide to be practically useful.

Acknowledgement

The authors gratefully acknowledge many helpful comments and suggestions from Professor Byron Gajewski. This work was supported by the United States Environmental Protection Agency (U.S. EPA) through the Science to Achieve Results (STAR) program under Grant R8310780. Although the research described in this article has been funded by the U.S. EPA, the views expressed herein are solely those of the authors and do not represent the official policies or positions of the U.S. EPA.

Appendix A. Standard errors for the source contribution matrix A

Table A1. True source contributions (A0), estimated source contributions (Â), Jackknife standard errors, and Bootstrap standard errors when the observations are independent

Each row lists one quantity for observations 1 to 50, in order.

A01: 2.88 2.40 2.92 1.65 3.04 3.55 3.13 1.42 1.98 2.57 2.54 5.11 4.04 3.32 4.96 2.66 4.19 2.40 3.09 3.46 2.37 4.06 3.94 2.30 4.52 2.30 3.59 3.67 5.31 3.91 3.49 4.28 3.10 0.64 3.26 2.87 2.30 1.67 2.99 3.40 1.97 2.65 1.98 2.88 1.48 3.32 2.45 2.42 1.59 1.88

Â1: 2.88 2.28 2.78 1.52 2.95 3.53 3.15 1.42 1.91 2.57 2.55 5.02 3.96 3.35 4.87 2.56 4.28 2.48 2.98 3.44 2.21 4.03 3.87 2.37 4.51 2.33 3.44 3.80 5.34 3.83 3.37 4.17 3.15 0.65 3.15 2.72 2.10 1.70 2.91 3.46 2.03 2.72 2.07 2.95 1.42 3.50 2.46 2.42 1.46 1.95

JKSER (Source 1): 0.06 0.17 0.09 0.09 0.18 0.06 0.08 0.07 0.09 0.15 0.06 0.07 0.15 0.19 0.04 0.12 0.14 0.06 0.13 0.05 0.04 0.07 0.07 0.08 0.05 0.10 0.08 0.03 0.10 0.16 0.09 0.26 0.07 0.07 0.12 0.07 0.08 0.12 0.06 0.08 0.19 0.05 0.06 0.11 0.15 0.10 0.10 0.07 0.14 0.09

BSER (Source 1): 0.10 0.08 0.08 0.06 0.11 0.04 0.12 0.07 0.10 0.08 0.11 0.08 0.05 0.12 0.16 0.08 0.10 0.06 0.09 0.13 0.06 0.10 0.07 0.08 0.13 0.07 0.11 0.11 0.08 0.07 0.08 0.09 0.11 0.10 0.09 0.04 0.07 0.07 0.09 0.05 0.06 0.05 0.08 0.08 0.09 0.11 0.09 0.08 0.10 0.13

A02: 2.93 2.85 4.54 3.47 2.37 2.80 4.59 2.92 1.77 3.06 3.37 1.64 2.61 4.55 3.50 1.86 1.88 3.55 1.00 2.68 0.67 2.89 0.88 1.98 2.96 3.01 2.75 2.92 3.52 3.06 2.99 4.86 2.19 3.99 4.21 1.73 3.28 4.07 3.00 2.74 3.24 2.06 2.60 4.06 3.01 3.50 3.26 5.14 4.77 3.62

Â2: 3.10 3.04 4.60 3.49 2.42 2.94 4.72 3.36 1.87 3.24 3.58 1.63 2.87 4.53 3.52 2.10 2.47 3.76 1.06 2.77 0.80 3.31 1.23 2.22 3.21 3.04 2.96 3.10 3.82 3.03 3.21 5.12 2.32 4.14 4.37 1.78 3.70 4.26 2.97 3.00 3.49 2.25 2.85 4.15 3.22 3.98 3.47 5.29 4.98 3.60

JKSER (Source 2): 0.09 0.17 0.14 0.09 0.20 0.23 0.09 0.16 0.12 0.13 0.19 0.13 0.20 0.23 0.09 0.23 0.18 0.07 0.17 0.08 0.07 0.12 0.17 0.14 0.10 0.13 0.09 0.09 0.17 0.14 0.17 0.27 0.16 0.17 0.09 0.09 0.15 0.18 0.16 0.09 0.27 0.07 0.15 0.14 0.13 0.10 0.12 0.10 0.14 0.14

BSER (Source 2): 0.11 0.09 0.12 0.09 0.09 0.09 0.14 0.08 0.09 0.09 0.12 0.08 0.09 0.13 0.14 0.09 0.10 0.10 0.06 0.12 0.05 0.11 0.06 0.08 0.13 0.09 0.10 0.11 0.13 0.09 0.09 0.13 0.09 0.11 0.12 0.06 0.11 0.11 0.10 0.09 0.09 0.07 0.10 0.11 0.09 0.12 0.10 0.14 0.14 0.12

A03: 3.49 2.57 2.39 2.10 3.54 0.95 4.02 2.32 3.29 2.63 3.73 1.98 1.62 3.71 4.86 2.79 3.64 1.90 2.51 4.24 1.77 3.38 2.36 2.82 4.23 2.22 3.48 3.89 2.99 1.89 2.72 2.48 3.68 3.22 2.73 1.34 2.46 2.29 2.75 1.34 1.74 1.83 3.17 2.75 3.07 4.28 2.99 2.74 3.33 4.27

Â3: 3.31 2.52 2.46 1.93 3.49 1.01 3.84 2.04 3.19 2.43 3.52 2.10 1.46 3.71 4.97 2.60 2.95 1.73 2.57 4.07 1.98 3.18 2.01 2.44 4.14 2.14 3.39 3.62 2.47 1.90 2.49 2.31 3.46 3.10 2.59 1.12 2.25 2.17 2.84 0.88 1.45 1.57 2.91 2.52 2.87 3.60 2.73 2.54 3.10 4.17

JKSER (Source 3): 0.18 0.23 0.17 0.13 0.16 0.12 0.18 0.10 0.07 0.23 0.24 0.17 0.19 0.34 0.15 0.09 0.21 0.08 0.12 0.13 0.07 0.14 0.16 0.26 0.12 0.11 0.19 0.12 0.24 0.21 0.21 0.35 0.18 0.23 0.15 0.11 0.27 0.22 0.19 0.14 0.24 0.15 0.09 0.21 0.23 0.22 0.28 0.09 0.22 0.19

BSER (Source 3): 0.13 0.11 0.17 0.12 0.11 0.12 0.17 0.12 0.08 0.12 0.14 0.10 0.11 0.17 0.16 0.09 0.12 0.14 0.08 0.12 0.05 0.14 0.08 0.09 0.14 0.11 0.12 0.13 0.15 0.12 0.12 0.19 0.10 0.14 0.16 0.07 0.14 0.14 0.12 0.12 0.13 0.09 0.11 0.15 0.12 0.16 0.12 0.19 0.16 0.13

Notes: 1. A01, A02, and A03 represent the true source contributions for Source 1, Source 2, and Source 3, respectively; 2. Â1, Â2, and Â3 represent the estimated source contributions for Source 1, Source 2, and Source 3, respectively; 3. JKSER and BSER represent the robust versions of the Jackknife standard errors and the Bootstrap standard errors for the estimated source contributions utilizing the IQR, respectively.

Table A2. True source contributions (A0), estimated source contributions (Â), Jackknife standard errors, and Bootstrap standard errors when the observations are temporally correlated

Each row lists one quantity for observations 1 to 50, in order.

A01: 22.75 36.71 44.04 20.40 19.70 19.21 24.56 22.58 14.52 15.44 20.92 23.29 18.95 11.73 12.90 16.05 14.53 10.63 10.02 15.23 20.80 19.86 30.51 25.94 22.79 27.88 26.49 31.74 36.43 23.71 21.06 12.24 19.02 23.40 17.79 15.47 18.48 12.33 16.77 16.23 28.99 20.09 11.28 6.16 8.83 16.25 22.12 28.19 15.12 14.78

Â1: 24.19 39.63 47.27 23.16 21.47 20.89 26.35 24.24 16.84 16.74 22.11 23.48 20.68 12.64 14.42 17.16 16.00 12.35 11.85 17.51 23.40 21.23 33.22 27.75 24.35 29.38 27.15 32.00 36.87 23.48 21.11 12.70 18.78 22.84 18.17 15.92 19.02 13.94 18.96 18.61 31.44 23.20 14.65 8.37 11.04 18.78 24.85 31.52 17.74 17.45

JKSER (Source 1): 2.19 2.51 2.69 2.22 1.68 2.04 2.03 1.95 3.04 2.54 2.93 2.90 2.95 3.38 3.52 3.19 3.14 2.95 3.50 2.78 4.11 3.71 2.51 1.80 2.06 2.05 3.32 2.17 2.38 2.27 2.99 3.26 4.69 4.04 2.66 2.99 3.93 4.19 4.99 5.14 3.28 2.04 2.00 2.80 1.37 1.79 2.04 1.67 2.39 2.98

BSER (Source 1): 0.57 0.61 0.86 0.79 0.65 0.75 0.66 0.65 0.92 1.15 0.99 0.68 0.66 0.57 0.53 0.52 0.50 0.60 0.66 0.72 1.06 1.25 0.78 0.57 0.51 0.52 0.44 0.45 0.53 0.50 0.66 0.61 0.68 0.50 0.64 1.13 0.59 0.61 0.62 0.61 0.60 0.45 0.47 0.40 0.50 0.44 0.47 0.48 0.33 0.45

A02: 17.68 16.33 21.28 24.70 19.43 23.28 14.84 18.77 21.70 41.43 32.07 21.19 14.99 16.97 9.54 9.36 13.73 17.42 21.23 21.38 34.43 46.09 17.81 11.50 14.11 15.31 12.01 8.34 13.04 14.76 22.76 22.35 23.57 14.96 21.33 41.54 19.06 21.07 16.97 15.56 11.96 13.32 10.26 9.72 8.29 8.39 11.03 10.85 7.30 10.97

Â2: 14.88 12.95 17.42 22.22 18.19 21.10 13.36 16.34 17.98 38.04 28.38 18.89 11.67 14.49 6.83 7.04 11.85 15.74 20.15 21.60 33.59 44.24 14.75 6.89 11.19 11.86 8.69 5.69 10.14 12.81 19.49 19.58 20.68 12.86 19.32 38.92 16.13 16.67 11.79 8.85 5.14 7.47 5.82 6.06 5.41 5.23 8.31 8.45 5.07 9.95

JKSER (Source 2): 2.39 3.89 5.00 2.52 2.77 2.21 2.26 2.21 0.73 0.54 2.48 2.31 1.92 2.51 1.79 1.51 1.55 2.91 2.57 2.46 5.10 5.50 3.68 1.90 1.97 3.41 2.98 1.62 2.30 3.10 4.66 3.95 4.96 4.36 3.96 2.85 2.67 2.04 2.52 2.42 2.12 2.09 1.33 1.12 0.45 1.06 1.68 2.00 1.20 2.10

BSER (Source 2): 0.75 0.74 0.92 1.06 0.80 0.95 0.73 0.77 1.09 1.52 1.27 0.86 0.74 0.65 0.59 0.63 0.69 0.78 0.90 0.97 1.34 1.69 0.95 0.52 0.55 0.55 0.44 0.46 0.59 0.53 0.69 0.64 0.74 0.50 0.68 1.27 0.61 0.65 0.67 0.61 0.57 0.47 0.44 0.39 0.48 0.46 0.51 0.56 0.28 0.51

A03: 19.20 19.55 31.88 36.69 20.77 26.33 29.85 24.76 48.69 24.55 30.68 23.45 31.60 24.40 26.07 25.91 22.86 26.36 23.86 25.81 30.05 24.85 38.55 25.40 18.88 15.54 9.82 12.76 10.42 13.06 8.57 5.88 10.44 12.63 9.28 10.25 13.37 17.95 24.35 28.02 26.69 18.67 23.10 19.74 25.07 21.78 18.43 19.11 15.13 22.35

Â3: 20.27 19.36 32.02 36.98 21.09 28.11 30.55 26.57 50.78 27.79 34.48 26.84 35.51 27.92 30.47 29.47 25.91 29.40 26.07 26.12 30.55 27.04 40.12 29.06 21.19 17.92 13.03 15.60 12.72 15.36 12.44 8.64 14.43 16.39 12.00 13.19 16.48 21.09 28.39 32.70 30.59 22.67 25.78 22.29 27.25 24.21 20.39 19.18 15.28 21.50

JKSER (Source 3): 3.84 6.22 6.41 3.83 3.76 3.39 3.25 4.27 3.48 2.69 3.60 3.66 3.66 3.70 3.38 2.95 2.88 1.14 2.51 2.04 3.55 5.68 3.96 2.51 3.00 4.28 2.87 2.19 3.73 3.96 4.23 4.16 5.60 5.65 4.03 3.35 3.79 4.57 4.01 3.84 3.09 2.63 1.96 1.75 1.88 2.05 2.00 2.06 1.62 3.45

BSER (Source 3): 1.01 0.96 1.25 1.47 1.22 1.42 0.96 1.11 1.35 2.41 1.86 1.21 0.95 0.97 0.64 0.67 0.92 1.16 1.32 1.42 2.12 2.74 1.13 0.65 0.77 0.73 0.50 0.42 0.63 0.74 1.09 1.06 1.16 0.75 1.10 2.16 1.00 1.04 0.79 0.69 0.54 0.55 0.48 0.42 0.49 0.47 0.59 0.63 0.32 0.61

Notes: 1. A01, A02, and A03 represent the true source contributions for Source 1, Source 2, and Source 3, respectively; 2. Â1, Â2, and Â3 represent the estimated source contributions for Source 1, Source 2, and Source 3, respectively; 3. JKSER and BSER represent the robust versions of the Jackknife standard errors and the Bootstrap standard errors for the estimated source contributions utilizing the IQR, respectively.

Table A3. Estimated source contributions (Â) and the corresponding Jackknife standard errors and Bootstrap standard errors for the VOC data in Fig. 1

Each row lists one quantity for observations 1 to 183, in order.

Â1: 102.94 29.32 84.05 68.11 78.69 119.44 46.33 19.92 39.74 35.50 45.40 53.72 35.50 38.34 32.67 9.90 41.25 67.22 26.31 15.62 13.77 33.75 35.48 52.31 68.40 55.94 58.36 73.03 69.59 107.04 79.76 50.83 39.07 41.33 83.98 75.02 108.46 109.99 96.08 67.04 34.09 39.87 71.18 93.54 35.28 41.12 47.44 91.97 111.52 106.91 95.31 20.23 37.05 31.67 115.52 75.00 135.68 16.96 21.90 47.92 39.90 35.61 34.65 60.35 31.57 40.72 40.33 52.00 88.56 84.17 52.46 57.93 42.49 35.24 105.63 54.11 76.73 93.98 92.92 47.27 32.08 28.27 14.45 40.01 26.75 16.31 52.47 71.39 77.23 133.22 14.51 47.97 13.63 35.61 61.36 29.51 37.28 40.33 34.06 75.43 55.88 84.68 27.89 49.83 28.35 14.29 5.97 31.25 25.45 45.39 31.04 31.61 17.76 6.91 27.97 31.83 19.92 46.58 38.10 70.52 20.45 28.80 35.16 43.86 43.47 42.27 33.86 24.17 28.06 43.19 6.27 11.42 5.50 119.45 15.67 30.52 37.21 104.09 35.08 16.47 30.94 22.00 13.26 31.67 9.65 20.67 19.93 30.60 94.09 27.23 44.94 35.75 31.55 21.60 26.79 26.35 34.49 73.91 49.13 15.78 48.19 10.86 12.76 44.83 9.57 12.00 10.47 20.71 12.62 19.24 15.21 11.37 17.31 32.83 18.34 27.25 25.57 6.44 8.20 44.88 7.76 4.18 7.54

JKSER (Source 1): 9.38 2.73 7.01 8.04 8.12 10.81 4.54 1.88 3.33 4.46 4.61 4.49 4.70 3.92 2.88 1.50 4.59 6.28 2.96 1.91 1.61 3.71 3.06 5.32 6.54 4.68 5.48 6.97 6.83 10.87 7.77 4.50 3.60 3.79 7.45 7.11 12.85 10.56 9.02 6.14 3.41 3.79 6.29 9.69 4.61 4.54 5.16 8.72 10.32 9.18 8.35 2.31 3.71 2.93 11.10 7.14 15.35 2.02 2.20 4.73 4.00 3.53 3.33 5.37 3.32 4.16 3.16 4.51 8.30 8.90 5.48 5.62 4.72 3.82 10.32 5.56 7.87 9.87 9.47 4.40 3.33 3.30 1.31 3.59 2.48 1.65 4.88 7.07 6.79 13.19 1.86 4.61 1.49 3.76 7.02 4.34 4.41 4.69 4.32 7.88 5.45 8.92 3.86 6.12 3.22 1.82 0.59 3.09 2.59 4.91 3.06 3.03 1.66 0.79 3.31 4.21 2.27 5.04 3.78 7.45 2.40 3.61 4.00 5.14 5.09 4.14 2.82 2.65 2.99 5.62 0.81 1.26 0.65 11.23 1.69 3.20 3.81 10.00 3.83 1.19 3.05 2.21 2.86 3.03 1.19 2.60 1.62 3.10 8.67 2.46 5.84 3.93 3.23 2.23 2.87 2.66 3.94 9.39 5.59 1.64 5.33 0.73 0.80 5.39 0.90 1.07 1.25 2.31 1.69 1.30 2.19 1.14 1.57 4.12 3.24 2.95 2.65 0.64 0.30 4.65 0.87 0.49 1.04

BSER (Source 1): 3.42 0.96 2.34 2.45 2.85 4.03 1.58 0.67 1.33 1.37 1.52 1.91 1.25 1.39 1.17 0.40 1.43 2.14 0.96 0.58 0.49 1.19 1.08 1.80 2.19 1.29 1.84 2.28 2.23 3.54 2.61 1.61 1.25 1.27 2.56 2.34 3.32 3.45 3.14 2.12 1.13 1.33 2.32 3.22 1.45 1.30 1.62 2.93 3.42 3.11 3.25 0.72 1.16 0.99 3.78 2.40 4.86 0.61 0.76 1.59 1.38 1.21 1.24 2.30 1.16 1.52 1.18 1.74 2.74 2.80 2.02 1.99 1.56 1.25 3.29 1.88 2.51 3.23 3.25 1.61 1.11 0.98 0.50 1.31 0.93 0.57 1.65 2.34 2.30 4.26 0.55 1.53 0.46 1.23 2.29 1.37 1.35 1.47 1.38 2.51 1.81 2.78 1.36 2.11 0.99 0.53 0.20 1.05 0.89 1.63 1.27 1.10 0.61 0.30 0.99 1.15 0.81 1.67 1.28 2.45 0.76 1.04 1.32 1.49 1.57 1.39 1.07 0.90 0.91 1.45 0.24 0.43 0.24 3.74 0.57 1.00 1.28 3.34 1.24 0.52 1.03 0.81 0.60 1.07 0.35 0.72 0.59 1.08 3.09 0.86 1.49 1.41 1.22 0.64 0.93 0.90 1.25 2.35 1.91 0.55 1.63 0.40 0.37 1.56 0.38 0.46 0.41 0.79 0.46 0.51 0.61 0.41 0.68 1.22 0.76 0.95 0.90 0.20 0.24 1.50 0.36 0.18 0.33

Â2: 30.66 24.38 47.46 22.91 17.44 55.91 32.61 15.06 45.00 16.15 30.46 17.79 41.27 53.62 39.47 8.30 14.52 28.86 25.05 24.85 13.35 21.93 12.83 15.65 34.71 22.65 31.81 18.16 18.52 23.73 40.59 42.37 21.86 17.04 116.01 52.35 52.73 33.87 124.67 37.70 15.94 15.71 16.41 17.52 33.54 20.25 9.60 15.22 15.90 132.22 87.65 16.74 25.76 15.30 27.13 11.81 52.41 14.23 6.69 11.61 10.74 8.14 12.35 15.06 13.30 17.18 19.10 38.04 72.81 34.84 39.28 52.30 58.95 20.67 30.60 20.71 15.85 24.28 47.94 36.80 23.79 25.07 12.78 23.96 25.66 18.06 16.34 15.71 18.62 19.96 9.20 7.51 6.57 24.87 19.08 25.11 18.77 15.12 17.41 16.96 16.46 25.27 79.56 75.12 12.26 7.55 9.84 13.13 19.11 21.17 203.18 79.46 15.22 8.92 17.70 16.10 17.02 21.07 20.50 48.43 22.12 15.93 26.47 16.67 17.95 24.39 22.53 14.48 12.80 26.10 11.00 17.05 29.79 27.87 20.73 17.30 28.60 131.96 23.48 9.48 12.88 23.70 50.77 49.50 8.24 13.98 15.42 18.78 32.70 28.66 29.26 22.09 19.31 17.30 19.87 18.17 17.70 63.49 33.86 36.17 35.36 9.60 20.42 45.97 17.00 9.21 7.46 29.21 8.37 9.89 9.13 10.42 21.26 24.17 8.31 7.66 18.44 11.27 18.72 20.58 28.28 6.29 13.06

JKSER (Source 2): 3.36 2.36 3.88 2.81 2.15 4.23 2.50 1.22 3.60 1.85 3.25 1.75 2.98 4.48 3.54 0.81 1.51 2.08 2.53 2.41 1.30 2.16 0.94 1.64 2.41 1.36 2.21 3.41 2.73 4.66 3.19 3.02 1.58 1.20 10.60 4.99 4.41 3.18 12.19 2.75 1.35 1.93 3.48 2.23 3.03 1.98 0.98 1.63 1.46 11.81 7.83 1.31 2.24 1.32 2.08 1.18 4.63 1.40 0.76 1.29 1.17 0.83 1.26 1.76 1.52 1.86 1.32 2.83 6.76 3.59 3.98 5.17 5.38 2.14 4.93 1.15 2.34 2.24 4.40 3.70 2.28 2.34 1.23 2.36 2.51 1.82 0.94 2.83 0.94 5.18 0.82 0.69 0.64 2.50 1.97 2.40 1.80 1.78 1.99 1.70 1.57 2.95 8.16 7.72 1.34 0.71 0.95 1.19 1.42 2.31 19.11 7.61 1.56 1.01 1.79 2.11 1.19 2.48 1.40 5.44 1.92 1.45 3.12 2.60 2.40 2.63 2.12 1.05 1.09 2.27 1.20 1.71 2.89 2.55 2.00 1.98 2.84 12.79 1.83 0.65 0.95 2.47 5.22 4.99 0.80 1.10 1.20 1.87 2.80 2.82 2.45 1.50 2.01 1.63 1.92 1.87 1.96 6.02 3.55 3.40 3.38 0.72 1.75 4.62 1.75 0.83 0.83 3.00 0.84 0.60 1.02 0.95 2.16 1.94 1.33 1.02 2.03 0.85 1.12 2.19 2.62 0.65 1.01

BSER (Source 2): 1.02 0.68 1.13 1.00 0.74 1.76 1.04 0.47 1.36 0.72 0.98 0.53 1.26 1.68 1.21 0.31 0.55 0.91 0.85 0.83 0.45 0.75 0.37 0.63 1.04 0.51 0.95 0.61 0.64 0.94 1.33 1.21 0.70 0.50 3.19 1.51 1.70 1.12 3.64 1.16 0.54 0.53 0.59 0.92 1.52 0.65 0.55 0.71 0.73 3.68 2.62 0.76 0.79 0.46 1.35 0.55 2.54 0.51 0.26 0.46 0.43 0.36 0.44 0.60 0.50 0.65 0.46 1.17 2.03 1.13 1.40 1.66 2.06 0.69 1.02 0.70 0.65 0.97 1.60 1.23 0.81 0.84 0.40 0.77 0.82 0.54 0.49 0.57 0.55 0.93 0.35 0.33 0.23 0.79 0.90 1.39 0.85 0.64 0.80 0.73 0.60 1.04 2.35 2.44 0.46 0.30 0.29 0.40 0.63 0.81 5.60 2.27 0.45 0.30 0.60 0.62 0.60 0.82 0.68 1.64 0.71 0.61 0.96 0.65 0.75 0.81 0.58 0.52 0.43 0.82 0.35 0.58 0.87 1.05 0.68 0.55 0.93 3.97 0.96 0.23 0.42 0.75 1.53 1.60 0.26 0.39 0.39 0.62 1.10 0.88 0.89 0.80 0.71 0.48 0.66 0.62 0.65 1.95 1.23 1.08 1.12 0.29 0.52 1.48 0.51 0.28 0.27 1.01 0.27 0.23 0.35 0.32 0.64 0.81 0.39 0.38 0.67 0.27 0.39 0.66 0.79 0.20 0.42

Â3: 96.77 35.48 128.07 65.61 160.18 94.73 44.24 12.37 34.47 46.95 66.14 107.94 44.52 43.84 34.67 10.48 35.74 68.25 12.59 13.87 7.07 25.45 27.37 44.64 65.28 77.88 35.12 45.37 43.52 66.07 48.11 62.26 19.70 32.72 50.91 47.35 25.46 27.55 31.95 23.33 21.10 31.37 41.42 54.84 25.84 20.58 34.76 40.43 50.79 50.34 121.65 27.53 14.24 19.82 57.23 39.98 81.88 12.27 20.97 29.68 41.00 40.45 54.17 159.26 51.00 68.16 45.70 70.18 74.74 88.27 99.39 49.21 23.62 30.63 123.79 117.10 68.58 83.47 107.72 46.07 31.37 23.81 25.45 30.17 22.49 18.00 54.20 61.06 100.00 67.93 9.46 30.17 13.89 16.78 67.22 22.91 31.66 35.34 41.68 42.34 54.56 44.17 98.11 91.72 15.06 14.82 12.33 63.85 29.58 49.56 43.46 19.85 25.72 10.07 56.19 41.72 60.52 49.95 60.01 73.37 42.32 38.60 64.82 35.73 46.80 50.90 56.12 51.80 16.20 57.24 5.62 7.98 3.07 43.85 19.82 20.52 47.02 91.54 89.02 79.58 57.41 52.56 24.99 20.06 17.32 29.86 35.50 51.76 149.63 30.21 42.51 128.64 63.73 18.45 26.07 44.77 42.87 76.16 117.51 48.20 88.16 23.70 57.63 72.08 29.17 67.29 27.06 36.21 17.50 86.86 41.21 61.53 75.89 57.76 58.05 34.52 19.23 45.25 87.55 104.73 92.48 24.38 25.63

JKSER (Source 3): 7.39 1.85 7.53 4.98 9.24 7.66 2.84 0.98 3.24 3.10 3.61 6.44 1.72 1.75 1.47 1.41 2.94 3.96 2.74 2.39 1.33 2.18 2.54 3.69 4.13 4.77 3.30 4.20 4.51 7.20 4.32 2.76 1.61 2.44 2.81 3.13 3.57 5.05 1.63 4.64 2.07 2.93 5.55 6.17 4.19 2.22 2.09 4.48 5.32 3.33 4.97 4.23 2.19 1.49 6.59 3.62 10.58 1.67 1.76 3.37 3.19 3.11 3.35 8.59 3.29 3.99 3.63 3.12 4.24 6.09 5.96 3.24 4.67 2.83 6.86 6.07 5.84 7.34 6.70 2.98 1.89 3.08 1.18 2.78 1.55 0.94 4.16 5.37 6.97 8.59 0.97 2.41 1.10 1.90 4.69 9.40 4.35 2.53 3.43 4.21 3.51 3.13 3.12 3.97 1.80 1.11 0.62 1.50 1.80 3.24 8.70 2.24 1.34 0.52 2.66 2.30 2.59 3.34 3.20 3.89 1.64 2.16 2.85 2.82 3.05 2.84 3.24 2.46 1.45 2.87 0.43 1.10 0.94 8.34 1.06 1.74 4.02 3.38 4.88 3.21 2.15 2.13 2.79 1.68 0.73 2.15 2.29 2.96 5.04 1.94 2.21 5.49 3.86 1.74 1.81 2.79 2.85 4.29 6.49 3.12 4.88 0.85 2.34 5.58 0.98 1.47 1.23 2.70 0.79 3.69 1.81 2.48 3.02 2.79 2.59 2.42 2.33 1.59 2.83 5.04 2.27 0.85 0.98

BSER (Source 3): 3.42 1.09 2.88 2.45 3.13 4.16 1.71 0.72 1.59 1.47 1.78 2.35 1.47 1.80 1.35 0.39 1.38 2.18 1.00 0.74 0.50 1.19 1.11 1.83 2.27 1.53 2.00 2.23 2.15 3.50 2.77 1.78 1.34 1.28 3.54 2.64 3.72 3.57 4.20 2.29 1.18 1.37 2.23 3.03 1.26 1.38 1.58 2.75 3.18 4.10 3.48 0.66 1.29 1.04 3.53 2.19 4.70 0.63 0.75 1.49 1.38 1.18 1.33 2.89 1.21 1.60 1.27 1.99 3.05 2.72 2.27 2.20 1.82 1.28 3.61 2.31 2.56 3.24 3.26 1.71 1.19 1.03 0.58 1.36 0.98 0.65 1.70 2.30 2.39 4.04 0.60 1.41 0.46 1.25 2.34 1.15 1.35 1.45 1.34 2.35 1.82 2.72 2.52 2.70 1.04 0.56 0.29 1.25 1.01 1.67 5.00 2.16 0.71 0.38 1.22 1.22 1.12 1.70 1.48 2.75 1.00 1.13 1.64 1.52 1.59 1.55 1.32 1.13 0.94 1.64 0.35 0.53 0.80 3.67 0.68 1.06 1.42 3.93 1.56 1.04 1.22 1.05 1.21 1.46 0.44 0.83 0.83 1.24 3.56 0.94 1.57 2.12 1.40 0.74 1.02 1.11 1.28 2.47 2.31 1.04 2.02 0.55 0.85 1.90 0.62 0.92 0.54 0.99 0.53 1.19 0.78 0.86 1.19 1.46 1.06 1.06 0.92 0.63 1.11 2.01 1.30 0.38 0.52

Notes: 1. Â1, Â2, and Â3 represent the estimated source contributions for Source 1, Source 2, and Source 3, respectively; 2. JKSER and BSER represent the robust versions of the Jackknife standard errors and the Bootstrap standard errors for the estimated source contributions utilizing the IQR, respectively.

References

[1] R.C. Henry, C.H. Spiegelman, J.F. Collins, E.S. Park, Reported emissions of organic gases are not consistent with observations, Proceedings of the National Academy of Sciences of the USA 94 (1997) 6596–6599.
[2] E.S. Park, P. Guttorp, R.C. Henry, Journal of the American Statistical Association 96 (2001) 1171–1183.
[3] P. Paatero, P.K. Hopke, X.H. Song, Z. Ramadan, Chemometrics and Intelligent Laboratory Systems 60 (2002) 253–264.
[4] E.S. Park, C.H. Spiegelman, R.C. Henry, Environmetrics 13 (2002) 775–798.
[5] W.F. Christensen, S.R. Sain, Technometrics 44 (2002) 328–337.
[6] B.J. Gajewski, C.H. Spiegelman, Environmetrics 15 (2004) 613–634.
[7] R.C. Henry, Chemometrics and Intelligent Laboratory Systems 77 (2005) 59–63.
[8] R.C. Henry, E.S. Park, C.H. Spiegelman, Chemometrics and Intelligent Laboratory Systems 48 (1999) 91–97.
[9] E.S. Park, R.C. Henry, C.H. Spiegelman, Communications in Statistics - Theory and Methods 29 (2000) 723–746.
[10] J.W. Tukey, F. Mosteller, Data Analysis and Regression: A Second Course in Statistics, Addison-Wesley, Reading, Mass, 1977.
[11] B.J. Gajewski, PhD dissertation, 2000.
[12] B. Efron, R.J. Tibshirani, An Introduction to the Bootstrap, Chapman & Hall, New York, 1993.