Computational Statistics & Data Analysis 38 (2001) 95–111
www.elsevier.com/locate/csda

Dispersion effects in signal-response data from fractional factorial experiments

P.C. Wang*, D.F. Lin
Institute of Statistics, National Central University, Chung Li, Taiwan 32054
Received 1 November 2000; received in revised form 1 March 2001
Abstract

The identification of dispersion effects is as important as that of location effects in industrial experiments for improving a manufacturing process. Many good procedures have been proposed for this identification when the data are produced by a simple response system, but the identification of dispersion effects in data from a signal-response system has attracted little attention. Generalizing methods developed for simple response data to signal-response data is a natural approach. In this article, we review two existing methods for the analysis of signal-response data and extend several methods for simple response data to the signal-response setting. Comparisons among these methods are made using simulations and an example.

Keywords: Slope effects; Dispersion effects; SN ratio; Regression; Score statistics
* Corresponding author. Tel.: +886-3-4267224. E-mail address: [email protected] (P.C. Wang).
© 2001 Elsevier Science B.V. All rights reserved.

1. Introduction

A manufacturing process is usually expected to yield high-quality products stably. To improve the quality of a process, parameter designs have been utilized to set up experiments for gathering information. Such designs are usually called robust parameter designs. In Taguchi's (1986) terminology, two kinds of factors are involved in these designs: control factors and noise factors. The levels of control factors can be easily adjusted in the laboratory and practical environment, while the levels of
noise factors cannot be controlled in the practical environment but might be controlled in the laboratory. The goal of robust parameter designs is to find active control factors and then determine optimal levels of these factors so that the performance of the process is stable and insensitive to noise factors.

To analyze the data resulting from a robust parameter design, Taguchi (1986) suggested using different signal-to-noise (SN) ratios for different types of data. Since the SN ratio takes both location and dispersion into consideration, the analysis of SN ratios seems better than the traditional analysis of variance. However, it lacks theoretical support. Although it is popular among engineers, many statisticians have criticized its usefulness and correctness and suggested more appropriate methods. For the analysis of data from replicated experiments, assuming a relationship between location and dispersion, Leon et al. (1987) first pointed out the limitations of the SN ratio and suggested using the PerMIA measure, a transformation of the data. Box (1988) proposed lambda plots to find appropriate transformations for the data. Winterbottom (1992) computed the sample mean and variance of the observations in each experimental condition and ran a regression to find the relation between mean and variance, thereby obtaining an appropriate transformation for the data. All of these studies sought replacements for the SN ratio. On the other hand, Vining and Myers (1990) proposed a dual response approach for achieving the goal of such experiments. For the analysis of data from unreplicated experiments, many statisticians suggested methods for finding factors and/or interactions with dispersion effects directly. Box and Meyer (1986) used an F-test for detecting these effects, while Wang (1989) used asymptotic chi-square statistics.

Recently, Bergman and Hynen (1997) pointed out that incorrect identification of location effects might lead to inappropriate identification of dispersion effects and thus suggested another F-statistic for detecting dispersion effects. Pan (1999) further emphasized this problem with a numerical example, a simulation study and mathematical arguments. To deal with it, he suggested using a replicated design to eliminate the impact of unidentified location effects on the identification of dispersion effects. All the above articles dealt with static data in Taguchi's (1986) terminology.

Criticism of Taguchi's SN ratios for the analysis of dynamic data has appeared recently. Lunani et al. (1997) pointed out the limitations of the SN ratio for analyzing dynamic data and the misleading nature of its use. In addition, they suggested graphical methods for obtaining optimal manufacturing conditions; such graphical methods, lacking theoretical support, can be subjective. Miller and Wu (1996) gave another alternative with two approaches based on evaluating chosen performance measures. One approach, similar to Taguchi's analysis of the SN ratio, requires two stages to identify factors and/or interactions that affect the slope and/or dispersion. The other is an extension of the response modeling approach proposed by Welch et al. (1990) and Shoemaker et al. (1991). One feature of the latter approach is that it gains more information on control-by-noise interactions: both noise and control factors are treated as usual factors to find optimal combinations for yielding products with the desired dynamic characteristics. The word "dynamic" might cause some confusion, as suggested by a referee; here, we use "signal-response" instead, because a data point in such data results from a set of signals. In this article, we establish a model for
analyzing signal-response data without assuming any functional relation between slope and dispersion. Under the model, we propose two analytic procedures for signal-response data that can be treated as extensions of Wang's (1989) method for static data. These two procedures identify effects on the slope and effects on dispersion separately. To compare our procedures with existing methods for signal-response data, we review those methods and extend Winterbottom's (1992) idea to obtain a further method in the next section. In Section 3, detailed procedures for signal-response data are given. A computer simulation comparing these methods is presented in Section 4, and an example illustrating the proposed procedures is shown in Section 5.

2. The existing analysis methods

Usually, the data from each experimental run are either single observations or replicates. Taguchi (1986) called such data static. However, in industrial applications there exists another type of data, called signal-response data. A data point of signal-response data is a set of responses corresponding to the levels of a predetermined factor called a signal factor. Assume that each treatment combination of control factors has r replicates, where r is a positive integer, and that these replicates might come from different combinations of noise factors. Denote by (M1, M2, ..., Mm) the levels of the signal factor. For the jth replicate in the ith experimental run, we obtain a set of m values (yij1, yij2, ..., yijm). The purpose of analyzing signal-response data is to obtain the optimal treatment combination of control factors that gives the desired slope and less variation or dispersion.
Those factors, control or noise, and those interactions that impact the magnitude of the slope are called slope factors and slope interactions, respectively, while those affecting dispersion are called dispersion factors or dispersion interactions. For analyzing signal-response data, we review three possible methods as follows.

2.1. Taguchi's method

Assume the model $y_{ijk} = \alpha_0 + \beta_i M_k + \epsilon_{ijk}$ with $E(\epsilon_{ijk}) = 0$ and $\mathrm{Var}(\epsilon_{ijk}) = \sigma_i^2$, $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, r$; $k = 1, 2, \ldots, m$. Taguchi (1987) computed, for each $i$, an unbiased estimator $\hat\sigma_i^2 = MS_E$ of $\sigma_i^2$ and an unbiased estimator of $\beta_i^2$,

$$\hat\beta_i^2 = (MS_R - MS_E)\Big/\Big(r \sum_k (M_k - \bar M)^2\Big),$$

where $\bar M$ is the average of $\{M_k,\ k = 1, 2, \ldots, m\}$, and $MS_R$ and $MS_E$ are the mean squares due to regression and error, respectively. The signal-to-noise ratio (SN ratio) for dynamic (or signal-response) data was then defined in Taguchi (1987) as

$$\text{SN ratio} = 10 \log(\hat\beta_i^2 / \hat\sigma_i^2). \tag{1}$$

He suggested using SN ratios to identify dispersion effects and least squares estimates of the $\beta_i$'s to identify slope effects; the analysis of variance or half-normal plots of the SN ratios and estimates are carried out for these purposes. For convenience, we call this method the SN ratio method from now on.

2.2. Miller and Wu's method

Miller and Wu (1996) provided two approaches for modeling and analyzing such data, namely performance measure modeling (PMM) and response function modeling (RFM). We do not consider the PMM approach further here because it is close to the SN ratio method. In the second approach, they computed the least squares estimate of the slope and the mean squared error in the jth replicate of the ith run:

$$\hat\beta_{ij} = \sum_k (y_{ijk} - \bar y_{ij+})(M_k - \bar M) \Big/ \sum_k (M_k - \bar M)^2, \tag{2}$$

$$\hat\sigma_{ij}^2 = \sum_k \big(y_{ijk} - \bar y_{ij+} - \hat\beta_{ij}(M_k - \bar M)\big)^2 \Big/ (m - 2). \tag{3}$$
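For concreteness, the run-level SN ratio (1) and the per-replicate estimates (2) and (3) can be sketched in NumPy as follows. This is our own illustration, not the authors' code; the function names and toy dimensions are assumptions.

```python
import numpy as np

def miller_wu_estimates(y, M):
    # Per-replicate least-squares slope (2) and mean squared error (3).
    y, M = np.asarray(y, float), np.asarray(M, float)
    Mc = M - M.mean()
    beta = np.sum((y - y.mean()) * Mc) / np.sum(Mc ** 2)
    resid = y - y.mean() - beta * Mc
    return beta, np.sum(resid ** 2) / (len(M) - 2)

def sn_ratio(Y, M):
    # Taguchi's dynamic SN ratio (1) for one run: Y is an r x m array of
    # responses, M the m signal levels.  MS_R and MS_E come from pooling
    # all r*m points into one simple regression on M.
    Y, M = np.asarray(Y, float), np.asarray(M, float)
    r, m = Y.shape
    Mc = M - M.mean()
    Smm = np.sum(Mc ** 2)
    beta = np.sum((Y - Y.mean()) * Mc) / (r * Smm)
    mse = np.sum((Y - Y.mean() - beta * Mc) ** 2) / (m * r - 2)
    msr = r * beta ** 2 * Smm              # regression mean square (1 df)
    beta2_unb = (msr - mse) / (r * Smm)    # unbiased estimator of beta_i^2
    return 10 * np.log10(beta2_unb / mse)
```

For an exactly linear replicate the slope is recovered and the mean squared error is zero; with noisy replicates `sn_ratio` returns a finite decibel value.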
where $\bar M$ is the average of the $M_k$'s and $\bar y_{ij+}$ is the average of $\{y_{ijk},\ k = 1, 2, \ldots, m\}$. Miller and Wu (1996) then drew half-normal plots of the $\hat\beta_{ij}$'s and the $\hat\sigma_{ij}$'s (or the $\log(\hat\sigma_{ij}^2)$'s) to identify slope effects and dispersion effects, respectively.

2.3. Winterbottom's extension

Winterbottom (1992) proposed an alternative to SN ratio analysis for static data. He suggested three steps: (i) for each $i$, compute the sample mean $\bar y_i$ and sample standard deviation $s_i$ of the data in the ith run; (ii) regress $\log(s_i)$ on $\log(\bar y_i)$ to obtain the estimated regression slope $\gamma$; (iii) transform $\bar y_i$ to $\bar y_i^{1-\gamma}$ and compute $10\log(\bar y_i^{2\gamma}/s_i^2)$ to obtain two sets of data, and identify location or dispersion effects based on these two sets, respectively. Notice that the second transformed measurement in the last step is exactly a PerMIA measurement as proposed by Leon et al. (1987). Also, when $\gamma = 1$, it is exactly the static SN ratio for nominal data given in Taguchi (1986).

We extend Winterbottom's (1992) idea to establish a method for analyzing signal-response data, considering the estimated slope and estimated variance instead of the sample mean and sample variance. Here, we use four steps. (i) For the data in the ith experimental run, regress $y_{ijk}$ on $M_k$ to obtain the least squares estimate of the slope,

$$\hat\beta_i = \sum_j \sum_k (y_{ijk} - \bar y_{i++})(M_k - \bar M) \Big/ \Big(r \sum_k (M_k - \bar M)^2\Big), \tag{4}$$

where $\bar M$ is the average of the $M_k$'s and $\bar y_{i++}$ is the average of $\{y_{ijk},\ j = 1, 2, \ldots, r;\ k = 1, 2, \ldots, m\}$. The estimated variance for each $i$ is the mean square for error in the regression,

$$\hat\sigma_i^2 = \sum_j \sum_k \big(y_{ijk} - \bar y_{i++} - \hat\beta_i (M_k - \bar M)\big)^2 \Big/ (mr - 2). \tag{5}$$
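The pooled estimates of step (i) can be sketched as follows; `pooled_slope_and_mse` is a name of our choosing for this illustration.

```python
import numpy as np

def pooled_slope_and_mse(Y, M):
    # Step (i): pool the r replicates of one run and fit a single line,
    # giving beta_hat_i of (4) and the regression MSE sigma_hat_i^2 of (5).
    Y, M = np.asarray(Y, float), np.asarray(M, float)
    r, m = Y.shape                      # Y: r replicates x m signal levels
    Mc = M - M.mean()
    beta = np.sum((Y - Y.mean()) * Mc) / (r * np.sum(Mc ** 2))
    resid = Y - Y.mean() - beta * Mc
    return beta, np.sum(resid ** 2) / (m * r - 2)
```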
(ii) Take $\log(\hat\sigma_i)$ as the response and $\log(\hat\beta_i)$ as an explanatory variable and run a simple linear regression to obtain the estimated regression slope $\gamma$. (iii) Transform the data $\{y_{ijk}\}$ to $\hat\beta_i^{1-\gamma}$ for identifying slope effects and to $10\log(\hat\beta_i^{2\gamma}/\hat\sigma_i^2)$ for identifying dispersion effects. (iv) Draw half-normal plots of the two sets of transformed data in (iii) to identify slope and dispersion effects. We call this extension the generalized signal-to-noise ratio (GSN ratio) method.

3. A score method

In this section, we extend Wang's (1989) method to develop analytic procedures for signal-response data. Wang (1989) proposed a model for the identification of factors and/or interactions with location and/or dispersion effects in static data. Extending his idea, we assume the following model for signal-response data:

$$y_{ijk} = \alpha_0 + \beta_{ij} M_k + \epsilon_{ijk}, \quad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, r;\ k = 1, 2, \ldots, m, \tag{6}$$

where the $\epsilon_{ijk}$'s are independent $N(0, \sigma_{ij}^2)$ and $\sigma_{ij}^2$ is the variance in the jth replicate of the ith experimental run. Let $x_{ij}$ and $z_{ij}$ denote the experimental conditions for detecting factors affecting slope and dispersion (or variance), respectively, in the jth replicate of the ith experimental run. A linear relationship $\beta_{ij} = \beta_0 + x_{ij}^T \delta$ between $x_{ij}$ and the slope is adopted, and a log-linear relationship $\sigma_{ij}^2 = \exp(\gamma + z_{ij}^T \tau)$ between $z_{ij}$ and the variance is assumed to guarantee positive variances, where $\tau = (\tau_1, \tau_2, \ldots, \tau_q)^T$, $\delta = (\delta_1, \delta_2, \ldots, \delta_p)^T$ and $\beta_0$ are parameters. Denoting $\omega_{ij} = \exp(z_{ij}^T \tau)$ and $\theta = \exp(\gamma)$, we have $\sigma_{ij}^2 = \theta \omega_{ij}$.

For convenience, the following notation is used: response vector $y_{ij} = (y_{ij1}, y_{ij2}, \ldots, y_{ijm})^T$, signal vector $M = (M_1, M_2, \ldots, M_m)^T$, constant vector $J_{m\times 1} = (1, 1, \ldots, 1)^T$, slope matrix $X = (X_1, X_2, \ldots, X_p) = (x_{11}, \ldots, x_{n1}, \ldots, x_{1r}, \ldots, x_{nr})^T$, and dispersion matrix $Z = (Z_1, Z_2, \ldots, Z_q) = (z_{11}, \ldots, z_{n1}, \ldots, z_{1r}, \ldots, z_{nr})^T$.

To identify active effects on slope and dispersion, we use two score statistics. Here we give only the results; derivations appear in the appendix. All the estimates in the score statistics are maximum likelihood estimates under the null hypothesis. Without loss of generality, we separate $\delta$ and $\tau$ into two subsets each, $\delta^T = (\delta_1^T, \delta_2^T)$ and $\tau^T = (\tau_1^T, \tau_2^T)$, where $\delta_2$ and $\tau_2$ are the parameters corresponding to the effects under testing. The letter $I$ denotes the information matrix for the parameters $(\alpha_0, \beta_0, \delta, \theta, \tau)$. Under the null hypothesis $H_0: \delta_2 = 0$, we have the score statistic

$$R_X^T X_2 I_{\delta_2}^{-1} X_2^T R_X, \tag{7}$$

where $I_{\delta_2}^{-1}$ is the square submatrix of the inverse of $I$ corresponding to the parameters in $\delta$ under testing, the submatrix $X_2$ of $X$ contains the $X_i$'s corresponding to the parameters under testing, $R_X = (\sum_k \hat e_{ijk} M_k / \hat\sigma_{ij}^2)$ is an $nr \times 1$ vector, $\hat\sigma_{ij}^2 = \hat\theta \hat\omega_{ij} = \hat\theta \exp(z_{ij}^T \hat\tau)$,
and $\hat e_{ijk} = y_{ijk} - \hat\alpha_0 - (\hat\beta_0 + x_{ij}^T \hat\delta) M_k$ with $\hat\delta^T = (\hat\delta_1^T, 0)$ is a residual in the jth replicate of the ith run. Similarly, under the null hypothesis $H_0: \tau_2 = 0$, we have the score statistic

$$R_Z^T Z_2 (Z_2^T Z_2)^{-1} Z_2^T R_Z / (2m), \tag{8}$$

where the submatrix $Z_2$ of $Z$ contains the $Z_i$'s corresponding to the parameters under testing, $R_Z = (\sum_k \hat e_{ijk}^2 / \hat\sigma_{ij}^2)$ is an $nr \times 1$ vector, $\hat\sigma_{ij}^2 = \hat\theta \hat\omega_{ij} = \hat\theta \exp(z_{ij}^T \hat\tau)$ with $\hat\tau^T = (\hat\tau_1^T, 0)$, and $\hat e_{ijk} = y_{ijk} - \hat\alpha_0 - (\hat\beta_0 + x_{ij}^T \hat\delta) M_k$ with $\hat\delta^T = (\hat\delta_1^T, \hat\delta_2^T)$ is a residual in the jth replicate of the ith run. Both statistics (7) and (8) follow limiting chi-squared distributions with degrees of freedom equal to the dimensions of $\delta_2$ and $\tau_2$, respectively. For convenience, we call statistics (7) and (8) the slope and dispersion statistics, respectively. Based on the above results, we propose two methods.

Score method 1: detect slope effects first and then dispersion effects. To identify slope effects, we assume constant variance. With this assumption, statistic (7) reduces to

$$(M^T M)^{-1} R_X^T X_2 (X_2^T X_2)^{-1} X_2^T R_X, \tag{9}$$

where $R_X$ is now equal to $(\sum_k \hat e_{ijk} M_k / \hat\theta^{1/2})$. Furthermore, due to the orthogonality of the columns in $X_2$ and $Z_2$, formulas (9) and (8) can be simplified further to

$$(M^T M)^{-1} \sum_i (R_X^T X_i)^2 / X_i^T X_i \tag{10}$$

and

$$\sum_i \big[(R_Z^T Z_i)^2 / Z_i^T Z_i\big] / (2m), \tag{11}$$
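With ±1 contrast columns, each component of (10) and (11) is a one-line quadratic form. The following sketch is our own illustration under those assumptions (`E` holds the residuals $\hat e_{ijk}$, one row per $(i, j)$ pair), not the authors' code:

```python
import numpy as np

def component_slope_stats(E, M, X2, theta_hat):
    # Component statistics of (10): for each contrast column X_i,
    # (M'M)^{-1} (R_X' X_i)^2 / (X_i' X_i), chi-square(1) under the null.
    R = (E @ M) / np.sqrt(theta_hat)       # entries sum_k e_ijk M_k / theta^(1/2)
    return (R @ X2) ** 2 / (np.sum(X2 ** 2, axis=0) * (M @ M))

def component_dispersion_stats(E, Z2, sigma2_hat):
    # Component statistics of (11): [(R_Z' Z_i)^2 / Z_i' Z_i] / (2m), where
    # sigma2_hat holds the fitted variances theta*omega_ij, one per row of E.
    m = E.shape[1]
    R = np.sum(E ** 2, axis=1) / sigma2_hat
    return (R @ Z2) ** 2 / (np.sum(Z2 ** 2, axis=0) * 2 * m)
```

In a forward-selection pass one would compute these components for every candidate column, include the largest one if it exceeds the calibrated cutoff, refit, and repeat.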
where the $X_i$'s are contained in $X_2$ and the $Z_i$'s are contained in $Z_2$. Notice that each component of formula (10) or (11) is the score statistic for the null hypothesis $\delta_i = 0$ or $\tau_i = 0$, with a limiting chi-squared distribution with one degree of freedom. The component statistics in (10) and (11) can thus be used to identify individual slope and dispersion effects, respectively. Forward selection is the procedure we use to identify all active effects: first, we use statistic (10) to identify slope effects one by one; after all active slope effects are found, we use statistic (11) to identify dispersion effects one by one, stopping when all active dispersion effects are obtained.

Score method 2: detect dispersion effects first and then slope effects. The identification of dispersion effects can be impaired by incorrect identification of slope effects. To eliminate the impact of borderline slope effects on the identification of dispersion effects, we consider residuals. Denote $d_{ijk} = y_{ijk} - E(y_{ijk})$, $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, r$; $k = 1, 2, \ldots, m$. To estimate $E(y_{ijk})$, we first obtain an unbiased estimate of $\alpha_0$ in the jth replicate of the ith experimental run,

$$\hat\alpha_{0ij} = \bar y_{ij+} - \Big[\sum_k (y_{ijk} - \bar y_{ij+})(M_k - \bar M) \Big/ \sum_k (M_k - \bar M)^2\Big] \bar M,$$

where $\bar y_{ij+}$ is the average of the $y_{ijk}$'s. We take the average of these $\hat\alpha_{0ij}$'s to estimate $\alpha_0$, that is, $\hat\alpha_0 = \sum_i \sum_j \hat\alpha_{0ij}/nr$, and then estimate $\beta_{ij}$ by $\hat\beta_{ij} = \sum_k (y_{ijk} - \hat\alpha_0) M_k \big/ \sum_k M_k^2$.
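These estimates can be sketched as follows (our own illustration; the rows of `Y` index the $(i, j)$ replicate pairs):

```python
import numpy as np

def method2_residuals(Y, M):
    # Run-wise estimates used by score method 2: alpha0_hat_ij from each
    # replicate, averaged into alpha0_hat, then the per-replicate slope
    # beta_hat_ij = sum_k (y_ijk - alpha0_hat) M_k / sum_k M_k^2 and the
    # residuals d_ijk = y_ijk - alpha0_hat - beta_hat_ij M_k.
    Y, M = np.asarray(Y, float), np.asarray(M, float)
    Mc = M - M.mean()
    b = (Y - Y.mean(axis=1, keepdims=True)) @ Mc / np.sum(Mc ** 2)
    alpha0 = np.mean(Y.mean(axis=1) - b * M.mean())   # average of alpha0_hat_ij
    beta = (Y - alpha0) @ M / np.sum(M ** 2)          # beta_hat_ij
    D = Y - alpha0 - np.outer(beta, M)                # d_hat_ijk
    return alpha0, beta, D
```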
Table 1
IER and EER for each method

                    IER                  EER
Method           Slope  Dispersion   Slope  Dispersion
SN ratio         0.015    0.015      0.08     0.08
GSN ratio        0.015    0.015      0.08     0.08
RFM              0.015    0.015      0.15     0.145
Score method 1   0.015    0.015      0.195    0.20
Score method 2   0.015    0.015      0.19     0.205
Finally, we obtain $\hat d_{ijk} = y_{ijk} - \hat\alpha_0 - \hat\beta_{ij} M_k$, $i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, r$; $k = 1, 2, \ldots, m$. Replacing $\hat e_{ijk}$ in formula (11) by $\hat d_{ijk}$ gives the score statistic

$$\sum_i \big[(R_Z^T Z_i)^2 / Z_i^T Z_i\big] / (2m), \tag{12}$$

with $R_Z$ equal to the $nr \times 1$ vector $(\sum_k \hat d_{ijk}^2 / \hat\sigma_{ij}^2)$. As in score method 1, a forward selection procedure is used to identify all active effects: first, based on $\{\hat d_{ijk}\}$, we use statistic (12) to identify dispersion effects one by one; after all active dispersion effects are found, we use statistic (7) to identify slope effects one by one, stopping when all active slope effects are obtained.

4. Simulations

To compare our proposed methods for analyzing signal-response data with the existing procedures, simulation studies on the orthogonal array L8(2^7) are carried out. One 4-level signal factor and one 2-level noise factor are considered. The constant parameters are set to $(\alpha_0, \beta_0, \theta) = (1, 1, 0.5)$ and the signal factor levels to (50, 100, 150, 200). With a specified significance level, one usually finds a critical region for a test. In our comparisons, many tests are carried out, so the usual procedures might not be fair because the number of misidentified factors is uncertain. For this reason, we adopt the individual error rate (IER) of Hamada and Balakrishnan (1998) for establishing critical regions; the IER is the proportion of false identifications of an individual effect. Assuming no active factors, we simulate 10,000 sets of data and determine critical regions that give an average IER of 0.015 for all the methods. Our methods use the score statistics, while the others use half-normal plots. For reference, we also report the proportion of false identifications over a whole testing procedure, called the experimentwise error rate (EER), in Table 1. Our testing methods have a higher EER.
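The mechanics of calibrating such a cutoff can be sketched as follows. This is a deliberately simplified, hypothetical version: one contrast column, idealized standard-normal errors in place of fitted residuals, $\hat\theta = 1$, and fewer simulated sets than the paper's 10,000.

```python
import numpy as np

rng = np.random.default_rng(1)
M = np.array([50., 100., 150., 200.])
n_runs, n_sets = 8, 2000                 # L8 array; 2000 null data sets
X_i = np.array([1., 1., 1., 1., -1., -1., -1., -1.])   # one +-1 contrast

def slope_component_stat(E):
    # One component of (10), taking theta_hat = 1 under the null.
    R = E @ M
    return (R @ X_i) ** 2 / (np.sum(X_i ** 2) * (M @ M))

null_stats = np.array([
    slope_component_stat(rng.normal(size=(n_runs, len(M))))
    for _ in range(n_sets)
])
# Critical value giving an individual error rate (IER) of 0.015 under the null:
crit = np.quantile(null_stats, 1 - 0.015)
```

Since the component statistic is approximately chi-squared with one degree of freedom under the null, `crit` should land near the 98.5th percentile of that distribution, roughly 5.9.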
Now, to examine the differences in power among these five methods, six models are considered: (i) a model with one slope factor; (ii) a model with one dispersion factor; (iii) a model with one slope factor and one dispersion factor; (iv) a model with one factor impacting slope and dispersion simultaneously; (v) a model with one slope factor, one slope interaction and one dispersion factor; and (vi) a model with one slope factor, one dispersion factor
Fig. 1. Powers of the five methods under model (i), shown as a function of the slope parameter.
and one dispersion interaction. Notice that in the last two models we consider the interactions of a noise factor with a slope factor and with a dispersion factor, respectively. Under each model, several values of the parameters are investigated. In model (i), $\delta_1$ varies from 0 to 0.01 in increments of 0.001, while in model (ii) $\tau_1$ varies from 0 to 1.0 in increments of 0.1. In model (iii), we fix $\delta_1 = 0.0025$ or 0.005 and vary $\tau_1$ as in model (ii). In model (iv), we fix $\tau_1 = 0.25$ or 0.5 and vary $\delta_1$ as in model (i). Finally, in models (v) and (vi) we use 0.005 and 0.5 for the slope and dispersion interaction parameters, respectively, and the remaining parameters are the same as in models (iii) and (iv). Using five plotting symbols, namely circle, box, diamond, triangle and inverted triangle for the SN ratio method, GSN ratio method, RFM method, and score methods 1 and 2, respectively, we show the results in Figs. 1–6, one figure per model. As the figures show, our proposed methods are superior throughout, which might also explain why they have the higher EER. This does not trouble us, because leaving an important dispersion effect undetected is more serious in practice than detecting a nonessential one. The relative standing of the other three methods changes from model to model.

5. An example

We use the data from a sensor experiment given by the China Productivity Center (1991) as our example. With the room temperature as the signal factor (M), the corresponding temperatures detected by the thermal sensor of an air conditioner were collected. Six control factors and a 2-level compound noise factor (N) are considered to find an optimal sensor design. The control factors are the type of circuit linearization (A),
Fig. 2. Powers of the five methods under model (ii), shown as a function of the dispersion parameter.

Fig. 3. Powers of the five methods under model (iii): the two subplots are functions of the dispersion parameter when $\delta_1 = 0.0025$ and 0.005.
detecting software (B), sensor position (C), return intake design (D), fixing method of the sensor (E), and type of capacitor (F). All factors except factor A have 3 levels. The experiment was set up using the orthogonal array L18(2 × 3^7) as shown in Table 2. The input room temperatures are 20°C, 23°C, 26°C and 27°C, and the sensed temperatures, reproduced from the China Productivity Center (1991), are given in Table 2.
Fig. 4. Powers of the five methods under model (iv): the two subplots are functions of the slope parameter when $\tau_1 = 0.25$ and 0.5.

Fig. 5. Powers of the five methods under model (v): the two subplots are functions of the dispersion parameter when $\delta_1 = 0.0025$ and 0.005.
We analyze this set of signal-response data by the methods given in Sections 2 and 3. The half-normal plots in Figs. 7–12 are drawn for identifying slope and dispersion effects based on the measurements of the SN ratio, GSN ratio and RFM methods. Two conventions are used when our proposed score methods are conducted. First, a linear effect is included automatically when the corresponding quadratic effect
Fig. 6. Powers of the five methods under model (vi): the two subplots are functions of the slope parameter when $\tau_1 = 0.25$ and 0.5.
Table 2
Temperature data

Control factors        Sensed temperature at each room temperature (°C)
                  M1 = 20      M2 = 23      M3 = 26      M4 = 27
A B C D E F      N1    N2     N1    N2     N1    N2     N1    N2
1 1 1 1 1 1     20.4  20.4   23.6  23.5   25.9  25.9   26.9  26.9
1 1 2 2 2 2     21.3  21.2   24.6  24.5   27.1  27.0   27.8  27.9
1 1 3 3 3 3     20.3  20.3   23.4  23.4   25.9  25.8   26.9  26.8
1 2 1 1 2 2     20.4  20.3   23.5  23.4   25.8  25.8   26.8  26.8
1 2 2 2 3 3     20.9  20.9   24.3  24.2   26.4  26.4   27.5  27.4
1 2 3 3 1 1     21.0  20.9   24.3  24.4   26.5  26.5   27.5  27.5
1 3 1 2 1 3     21.0  21.0   24.3  24.4   26.9  27.0   27.8  27.8
1 3 2 3 2 1     20.3  20.3   23.5  23.4   25.8  25.8   26.9  26.9
1 3 3 1 3 2     21.2  21.3   24.5  24.4   27.1  27.0   26.8  26.9
2 1 1 3 3 2     20.4  20.4   23.6  23.6   25.9  25.8   28.2  28.1
2 1 2 1 1 3     20.6  20.6   23.8  23.8   26.1  26.0   26.8  26.9
2 1 3 2 2 1     21.5  21.4   24.9  24.9   27.2  27.2   28.1  28.0
2 2 1 2 3 1     20.8  20.9   24.0  23.9   26.2  26.2   27.3  27.2
2 2 2 3 1 2     20.4  20.4   23.5  23.5   26.0  25.9   26.9  27.0
2 2 3 1 2 3     21.0  21.0   24.1  24.0   26.0  26.5   26.5  26.4
2 3 1 3 2 3     20.2  20.3   23.4  23.4   25.8  25.8   26.8  26.9
2 3 2 1 3 1     20.2  20.2   23.4  23.3   25.8  25.7   26.8  26.7
2 3 3 2 1 2     21.0  21.1   24.6  24.6   26.6  26.6   27.6  27.6
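As a quick illustration with these data, the per-replicate slope (2) for run 1 can be computed as follows (our own sketch):

```python
import numpy as np

# Signal levels and the two replicates (N1, N2) of run 1 from Table 2.
M = np.array([20., 23., 26., 27.])
y_n1 = np.array([20.4, 23.6, 25.9, 26.9])
y_n2 = np.array([20.4, 23.5, 25.9, 26.9])

def replicate_slope(y, M):
    # Least-squares slope (2) for a single replicate.
    Mc = M - M.mean()
    return np.sum((y - y.mean()) * Mc) / np.sum(Mc ** 2)

b1 = replicate_slope(y_n1, M)   # about 0.91: the sensed range is slightly
b2 = replicate_slope(y_n2, M)   # compressed relative to the room temperature
```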
Fig. 7. The half-normal plot for identifying slope effects using the SN ratio.

Fig. 8. The half-normal plot for identifying dispersion effects using the SN ratio.

Fig. 9. The half-normal plot for identifying slope effects using the GSN ratio.

Fig. 10. The half-normal plot for identifying dispersion effects using the GSN ratio.
Fig. 11. The half-normal plot for identifying slope effects using the RFM approach.

Fig. 12. The half-normal plot for identifying dispersion effects using the RFM approach.
is significant. Second, when the linear effect of a factor is significant, we check the importance of its quadratic effect in the very next step. The results of the selection procedures are reported step by step in Tables 3 and 4, and the important effects found by the five methods are summarized in Table 5. Notice that all the methods detect the linear dispersion effect of factor C, and our methods have additional findings: the quadratic dispersion effects of factors C and F and/or the linear dispersion effect of factor E are detected only by our methods. Similar findings appear in the identification of slope effects. While the first two methods detect the linear slope effects of factors C and D, we detect effects of factor E or F in addition to the slope effects of factors C and D. Our two methods have different findings because of the order of the slope and dispersion identification steps. If we had not added the linear dispersion effect of E into the model in the second method, we would have detected the slope factor F as in the first method; the second method seems better. The slope effect of factor B is found to be significant only by the RFM method. On carefully checking the steps of our second method, we find that the score values for the linear effect of factor B remain fairly high until the linear effect of factor E is identified. Perhaps the different choices of dispersion effects cause the different results.
Table 3
Slope and dispersion score statistics at each step of score method 1 (an asterisk marks the most significant effect in each step; "—" means the effect is already included in the model)

          Slope statistic                           Dispersion statistic
Factor   Step 1  Step 2  Step 3  Step 4  Step 5    Step 6  Step 7  Step 8
A         0.70    1.13   1.298   1.356   1.419     1.005   0.856   0.525
B1        1.06    1.70   1.962   2.051   2.145     0.071   2.309   1.772
B2        0.89    1.44   1.656   1.730   1.810     0.593   0.158   0.097
C1       11.78   18.95*  —       —       —         4.372   —       —
C2        3.35    5.39   6.210*  —       —         9.865*  —       —
D1        0.02    —      —       —       —         0.014   0.278   0.277
D2       54.46*   —      —       —       —         0.106   0.000   0.024
E1        1.24    2.00   2.297   2.401   2.511     3.727   3.245   2.339
E2        0.09    0.15   0.167   0.175   0.183     2.601   2.398   1.926
F1        0.31    0.50   0.577   0.603   —         0.437   0.768   —
F2        2.96    4.76   5.480   5.728*  —         7.182   5.482*  —
N         0.06    0.10   0.119   0.124   0.130     0.016   0.068   0.048
N×A       0.00    0.00   0.000   0.000   0.000     0.004   0.003   0.002
N×B1      0.04    0.06   0.070   0.073   0.076     0.000   0.038   0.023
N×B2      0.00    0.01   0.008   0.008   0.009     0.095   0.122   0.096
N×C1      0.00    0.01   0.007   0.008   0.008     0.000   0.001   0.026
N×C2      0.01    0.01   0.017   0.017   0.018     0.003   0.055   0.084
N×D1      0.01    0.01   0.017   0.018   0.019     0.020   0.000   0.000
N×D2      0.00    0.00   0.002   0.002   0.002     0.001   0.007   0.000
N×E1      0.07    0.11   0.128   0.134   0.140     0.071   0.041   0.022
N×E2      0.00    0.00   0.005   0.005   0.005     0.000   0.039   0.053
N×F1      0.01    0.02   0.025   0.026   0.028     0.007   0.013   0.043
N×F2      0.00    0.00   0.000   0.000   0.000     0.293   0.170   0.073
Acknowledgements We would like to thank the editor and two referees for their careful reading and comments that led to improvement in the presentation of this paper. Support under National Science Council Grant NSC 89-2118-M-008-014 is also acknowledged.
Appendix A.

Given the set of response vectors $y_{ij} = (y_{ij1}, y_{ij2}, \ldots, y_{ijm})^T$, the loglikelihood $\ell$ for the parameters $(\alpha_0, \beta_0, \delta, \theta, \tau)$ is

$$\ell(\alpha_0, \beta_0, \delta, \theta, \tau) \propto \sum_i \sum_j \big[ -m \ln(\theta\omega_{ij}) - (y_{ij} - \alpha_0 J - (\beta_0 + x_{ij}^T \delta) M)^T (y_{ij} - \alpha_0 J - (\beta_0 + x_{ij}^T \delta) M) / (\theta\omega_{ij}) \big] / 2.$$
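Numerically, this loglikelihood (up to its additive constant) can be written as follows; the vectorized conventions, with rows of `Y`, `X` and `Z` indexed by the $(i, j)$ pairs, are our own.

```python
import numpy as np

def loglik(alpha0, beta0, delta, theta, tau, Y, M, X, Z):
    # Loglikelihood of the signal-response model, up to an additive constant:
    # y_ij ~ N(alpha0*J + (beta0 + x_ij'delta)*M, theta*omega_ij*I),
    # omega_ij = exp(z_ij'tau).  Y: (n*r) x m, X: (n*r) x p, Z: (n*r) x q.
    omega = np.exp(Z @ tau)
    var = theta * omega                         # sigma_ij^2 = theta * omega_ij
    mean = alpha0 + np.outer(beta0 + X @ delta, M)
    resid = Y - mean
    return -0.5 * np.sum(len(M) * np.log(var) + np.sum(resid ** 2, axis=1) / var)
```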
Table 4
Dispersion and slope score statistics at each step of score method 2 (an asterisk marks the most significant effect in each step; "—" means the effect is already included in the model)

          Dispersion statistic              Slope statistic
Factor   Step 1  Step 2  Step 3  Step 4    Step 5  Step 6  Step 7  Step 8  Step 9
A         3.21   2.820   2.208   1.602      4.28   0.246   2.926   2.58    0.394
B1        0.04   0.198   0.519   0.896      1.44   7.127   5.420   8.49    1.257
B2        0.40   0.508   0.014   0.182      0.00   1.980   2.078   1.98    1.077
C1        3.76   6.227   —       —          5.77   9.807*  —       —       —
C2        9.46   6.336*  —       —          2.25   0.640   7.756*  —       —
D1        0.68   0.861   0.046   0.041      0.00   —       —       —       —
D2        2.69   1.125   0.771   0.940     62.35*  —       —       —       —
E1        8.21   2.302   4.380*  —          0.83   5.743   8.105  11.50*   —
E2        0.73   0.110   0.297   0.398      0.25   0.017   0.035   0.03    0.258
F1        1.33   —       —       —          0.58   0.218   0.232   1.48    0.118
F2       11.98*  —       —       —          2.34   1.910   0.523   1.85    1.719
N         0.13   0.060   0.233   0.142      0.06   0.038   0.052   0.10    0.180
N×A       0.03   0.003   0.005   0.003      0.00   0.004   0.002   0.00    0.002
N×B1      0.06   0.060   0.031   0.001      0.03   0.089   0.078   0.04    0.013
N×B2      0.11   0.103   0.048   0.119      0.01   0.022   0.013   0.00    0.001
N×C1      0.00   0.144   0.115   0.135      0.01   0.015   0.010   0.00    0.012
N×C2      0.01   0.078   0.190   0.164      0.03   0.069   0.062   0.08    0.150
N×D1      0.11   0.138   0.077   0.114      0.02   0.035   0.040   0.05    0.083
N×D2      0.01   0.005   0.001   0.007      0.00   0.009   0.012   0.01    0.014
N×E1      0.34   0.375   0.359   0.233      0.09   0.105   0.098   0.11    0.186
N×E2      0.04   0.038   0.012   0.000      0.00   0.009   0.010   0.01    0.000
N×F1      0.00   0.000   0.025   0.023      0.03   0.096   0.111   0.15    0.230
N×F2      0.22   0.082   0.003   0.007      0.00   0.015   0.005   0.00    0.038
Table 5
Slope and dispersion effects identified by the five methods

Method            Slope effects                Dispersion effects
SN ratio          D1, C1                       C1
GSN ratio         D1, C1                       C1
RFM               D1, C1, B1, B2, F1           C1
Score method 1    D2, D1, C1, C2, F2, F1       C2, C1, F2, F1
Score method 2    D2, D1, C1, C2, E1           F2, F1, C2, C1, E1
Without loss of generality, we separate $\delta$ and $\tau$ into two subsets each, $\delta^T = (\delta_1^T, \delta_2^T)$ and $\tau^T = (\tau_1^T, \tau_2^T)$, where $\delta_2$ and $\tau_2$ are the parameters corresponding to the effects under testing. In order to derive the score statistics (7) and (8), we need the corresponding first derivatives and the information matrix. The derivatives $U_\delta$ and
$U_\tau$ of the loglikelihood $\ell$ with respect to $\delta$ and $\tau$ are expressed as

$$U_\delta = \frac{\partial \ell}{\partial \delta} = \sum_i \sum_j x_{ij} M^T e_{ij} / (\theta\omega_{ij}) \tag{A.1}$$

and

$$U_\tau = \frac{\partial \ell}{\partial \tau} = \sum_i \sum_j z_{ij}\, e_{ij}^T e_{ij} / (2\theta\omega_{ij}), \tag{A.2}$$

where $e_{ij} = y_{ij} - \alpha_0 J - (\beta_0 + x_{ij}^T \delta) M$. The information matrix $I$ for the parameters $(\alpha_0, \beta_0, \delta, \theta, \tau)$ is

$$I = \begin{pmatrix}
\sum_{i,j} \dfrac{m}{\theta\omega_{ij}} & \sum_{i,j}\sum_k \dfrac{M_k}{\theta\omega_{ij}} & \sum_{i,j}\sum_k \dfrac{M_k x_{ij}^T}{\theta\omega_{ij}} & 0 & 0 \\
\sum_{i,j}\sum_k \dfrac{M_k}{\theta\omega_{ij}} & \sum_{i,j}\sum_k \dfrac{M_k^2}{\theta\omega_{ij}} & \sum_{i,j}\sum_k \dfrac{M_k^2 x_{ij}^T}{\theta\omega_{ij}} & 0 & 0 \\
\sum_{i,j}\sum_k \dfrac{M_k x_{ij}}{\theta\omega_{ij}} & \sum_{i,j}\sum_k \dfrac{M_k^2 x_{ij}}{\theta\omega_{ij}} & \sum_{i,j}\sum_k \dfrac{M_k^2 x_{ij} x_{ij}^T}{\theta\omega_{ij}} & 0 & 0 \\
0 & 0 & 0 & \dfrac{nrm}{2\theta^2} & 0 \\
0 & 0 & 0 & 0 & \dfrac{m}{2} Z^T Z
\end{pmatrix}. \tag{A.3}$$

Under the null hypothesis $H_0: \delta_2 = 0$, $U_\delta(\hat\alpha_0, \hat\beta_0, \hat\delta_1, \hat\theta, \hat\tau)$ and $I(\hat\alpha_0, \hat\beta_0, \hat\delta_1, \hat\theta, \hat\tau)$ are obtained from formulas (A.1) and (A.3) evaluated at the maximum likelihood estimates of the remaining parameters. Similarly, under the null hypothesis $H_0: \tau_2 = 0$, we obtain $U_\tau(\hat\alpha_0, \hat\beta_0, \hat\delta, \hat\theta, \hat\tau_1)$ and $I(\hat\alpha_0, \hat\beta_0, \hat\delta, \hat\theta, \hat\tau_1)$ by evaluating formulas (A.2) and (A.3) at the maximum likelihood estimates of the remaining parameters. The score statistics $U_\delta^T I_{\delta_2}^{-1} U_\delta$ for $H_0: \delta_2 = 0$ and $U_\tau^T I_{\tau_2}^{-1} U_\tau$ for $H_0: \tau_2 = 0$ are then obtained (see Cox and Hinkley, 1974). After some algebraic manipulation, the score statistic for $H_0: \delta_2 = 0$ becomes

$$R_X^T X_2 I_{\delta_2}^{-1} X_2^T R_X,$$

where the submatrix $X_2$ of $X$ contains the $X_i$'s corresponding to the effects under testing, $R_X = (\sum_k \hat e_{ijk} M_k / \hat\sigma_{ij}^2)$ is an $nr \times 1$ vector, $\hat\sigma_{ij}^2 = \hat\theta \hat\omega_{ij} = \hat\theta \exp(z_{ij}^T \hat\tau)$, and $\hat e_{ijk} = y_{ijk} - \hat\alpha_0 - (\hat\beta_0 + x_{ij}^T \hat\delta) M_k$ with $\hat\delta^T = (\hat\delta_1^T, 0)$ is a residual in the jth replicate of the ith run. Similarly, the score statistic for $H_0: \tau_2 = 0$ becomes

$$R_Z^T Z_2 (Z_2^T Z_2)^{-1} Z_2^T R_Z / (2m),$$

where the submatrix $Z_2$ of $Z$ contains the $Z_i$'s corresponding to the effects under testing, $R_Z = (\sum_k \hat e_{ijk}^2 / \hat\sigma_{ij}^2)$ is an $nr \times 1$ vector, $\hat\sigma_{ij}^2 = \hat\theta \hat\omega_{ij} = \hat\theta \exp(z_{ij}^T \hat\tau)$ with $\hat\tau^T = (\hat\tau_1^T, 0)$, and $\hat e_{ijk} = y_{ijk} - \hat\alpha_0 - (\hat\beta_0 + x_{ij}^T \hat\delta) M_k$ with $\hat\delta^T = (\hat\delta_1^T, \hat\delta_2^T)$ is a residual in the jth replicate of the ith run.
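The derivative (A.1) can be checked numerically. The following self-contained sketch, with a tiny hypothetical design and names of our own choosing, compares $U_\delta$ against a central finite difference of $\ell$:

```python
import numpy as np

rng = np.random.default_rng(0)
M = np.array([1., 2., 3.])
X = np.array([[1.], [-1.]])          # one contrast column, two (i, j) rows
Z = np.array([[1.], [-1.]])
alpha0, beta0, theta = 0.5, 1.0, 2.0
tau = np.array([0.3])
Y = rng.normal(size=(2, 3))

def loglik(delta):
    omega = np.exp(Z @ tau)
    var = theta * omega
    resid = Y - alpha0 - np.outer(beta0 + X @ delta, M)
    return -0.5 * np.sum(len(M) * np.log(var) + np.sum(resid ** 2, axis=1) / var)

def score(delta):
    # U_delta = sum_ij x_ij M' e_ij / (theta * omega_ij), as in (A.1)
    omega = np.exp(Z @ tau)
    resid = Y - alpha0 - np.outer(beta0 + X @ delta, M)
    return X.T @ ((resid @ M) / (theta * omega))

delta0, h = np.array([0.2]), 1e-6
numeric = (loglik(delta0 + h) - loglik(delta0 - h)) / (2 * h)
```

Because $\ell$ is quadratic in $\delta$, the central difference agrees with the analytic score essentially to rounding error.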
References

Bergman, B., Hynen, A., 1997. Dispersion effects from unreplicated designs in the 2^(k−p) series. Technometrics 39, 191–198.
Box, G., 1988. Signal-to-noise ratios, performance criteria, and transformations. Technometrics 30, 1–40.
Box, G.E.P., Meyer, R.D., 1986. Dispersion effects from fractional designs. Technometrics 28, 19–27.
China Productivity Center, 1991. Proceeding of Quality Case Award. Taipei, Taiwan (in Chinese).
Cox, D.R., Hinkley, D.V., 1974. Theoretical Statistics. Chapman & Hall, London.
Hamada, M., Balakrishnan, N., 1998. Analyzing unreplicated factorial experiments: a review with some new proposals. Statist. Sinica 8, 1–41.
Leon, R.V., Shoemaker, A.C., Kacker, R.N., 1987. Performance measures independent of adjustment: an explanation and extension of Taguchi's signal-to-noise ratios. Technometrics 29, 253–285.
Lunani, M., Nair, V.N., Wasserman, G.S., 1997. Graphical methods for robust design with dynamic characteristics. J. Qual. Technol. 29, 327–338.
Miller, A., Wu, C.F.J., 1996. Parameter design for signal-response systems: a different look at Taguchi's dynamic parameter design. Statist. Sci. 11, 122–136.
Pan, G., 1999. The impact of unidentified location effects on dispersion-effects identification from unreplicated factorial designs. Technometrics 41, 313–326.
Shoemaker, A.C., Tsui, K.L., Wu, C.F.J., 1991. Economical experimentation methods for robust design. Technometrics 33, 415–427.
Taguchi, G., 1986. Introduction to Quality Engineering: Design Quality into Products and Processes. Asian Productivity Organization, Tokyo, Japan.
Taguchi, G., 1987. System of Experimental Design, Vols. 1 and 2. Unipub/Kraus International Publications, White Plains, NY.
Vining, G.G., Myers, R.H., 1990. Combining Taguchi and response surface philosophies: a dual response approach. J. Qual. Technol. 22, 38–45.
Wang, P.C., 1989. Tests for dispersion effects from orthogonal arrays. Comput. Statist. Data Anal. 8, 109–117.
Welch, W.J., Yu, T.K., Kang, S.M., Sacks, J., 1990. Computer experiments for quality control by parameter design. J. Qual. Technol. 22, 15–22.
Winterbottom, A., 1992. The use of a generalized signal-to-noise ratio to identify adjustment and dispersion factors in Taguchi experiments. Qual. Reliab. Eng. Int. 8, 45–56.