Best iterative initial values for PLS in a CSI model

Mathematical and Computer Modelling 46 (2007) 439–444
www.elsevier.com/locate/mcm

Chuanmei Wang a,b, Hengqing Tong b,*

a School of Management, Huazhong University of Science and Technology, Wuhan 430074, PR China
b Department of Mathematics, Wuhan University of Technology, Wuhan, Hubei 430070, PR China

Received 1 November 2005; received in revised form 20 September 2006; accepted 4 October 2006

Abstract

Based on the characteristics of customer satisfaction index (CSI) models, we transform these models into common regression models and find suitable iterative initial values under the constraint that the latent variables are unit vectors. The convergence of the new algorithm is also illustrated in this paper. Consequently, the partial least squares (PLS) algorithm for CSI models is improved greatly by the best iterative initial values. The results of this paper have been implemented in the software DASC.
© 2006 Elsevier Ltd. All rights reserved.

Keywords: Customer satisfaction index (CSI) model; Partial least squares (PLS); Iteration initial values; Convergence; Constraint solution

1. Introduction

The structural equation model (SEM), first put forward by Jöreskog [7], has become a booming branch of applied statistics in recent years, mainly including causal models [3]. Partial least squares (PLS) was invented by Wold (a mentor to Jöreskog, a founder of SEM) [13] and has been widely applied in psychology and sociology as well as other fields, especially in the customer satisfaction index (CSI) model [2,8]. Rather than seek overall optimization of the parameter estimates through a full-information estimation technique (such as maximum likelihood), Wold opted for limited-information methods that provide statistically inferior estimates but make minimal demands on the data. Thus, PLS may represent a pragmatic alternative to SEM. Chin [4], on the basis of the study of Fornell and Johnson [2], introduced the PLS algorithm for SEM in detail.

Belonging to the class of structural equation models (SEM), the customer satisfaction index (CSI) models are combinations of three equations: X = Λ_X ξ + δ, Y = Λ_Y η + ε and η = Bη + Γξ + ζ, where Λ_X, B, Γ and Λ_Y are all unknown coefficient matrices; ξ is an unknown exogenous structural vector and η is an unknown endogenous structural vector; X and Y are observations corresponding to ξ and η respectively; δ, ε and ζ are random errors (and likewise in what follows). These models differ from factor analysis models: in factor analysis the coefficient matrices are unrestricted, whereas most elements of the matrices above are fixed at zero. They also differ from the ordinary linear regression models Y = Xβ + ε, where Y and X are both observed and β is an unknown coefficient vector. They differ as well from the evaluation models (estimated with the projection iterative algorithm between

* Corresponding author. Tel.: +86 027 87651213; fax: +86 027 87651213.

E-mail address: [email protected] (C. Wang).

0895-7177/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.mcm.2006.10.009


Fig. 1. The path analysis graph of the CCSI model.

convex sets) put forward by one of this paper's authors, Tong [10]: Y = Xβ + ε, where only X is known and the dependent variable Y and the coefficient β are unknown, but Y = y_n l_m', y_n = (y_1, y_2, …, y_n)', l_m = (1, 1, …, 1)', β > 0 and Σ_i β_i = 1. We can see that, besides the ordinary regression models and factor analysis models, with the appearance of structural equation models and evaluation models, the multi-variable linear regression models corresponding to known or unknown variables are relatively complete.

The calculation of SEM plays an important role in the application of CSI analysis, which is required by a series of ISO 9000 criteria in China [11]. There are many application software systems for SEM, such as LISREL, AMOS, EQS, SEPATH, MPLUS and the CALIS module in SAS, which have been developed by well-known international software companies. The main techniques used to estimate SEM in applications are LISREL and PLS [1,5,12], and PLS was the method put forward by Fornell and Johnson [2] to estimate a CSI model. In this paper, we discuss the PLS algorithm for CSI models. We know that there are some serious problems in the PLS algorithm for SEM: the convergence of the iteration cannot be ensured, or the convergence rate is terribly slow. It was said that for 250 samples the calculation would take four to five minutes on a professional computer in 1996 [2,6].

2. The customer satisfaction index model (path analysis)

A SEM includes two systems of equations. One is the system of relations among the structural variables, called the structural system of equations (see Eq. (1)). The other is the system of relations between the structural variables and the observed variables (see Eqs. (2) and (3)), called the observed system of equations. The Chinese Customer Satisfaction Index (CCSI) model shown in Fig. 1 is a typical SEM, including 6 latent variables (ξ1 is also called an exogenous variable, and η1–η5 are also called endogenous variables), 11 path coefficients (the functional relationships from the exogenous variable to the endogenous variables are marked with γ1–γ4 and denoted by broken lines; the functional relationships between endogenous variables are marked with β_ij and denoted by solid lines) and 24 observations (also called manifest variables: questions answered by customers, which act as indicators). The structural relationship among the latent variables (the structural model) can be written as follows:

(η1)   ( 0    0    0    0    0) (η1)   (γ1)      (ζ1)
(η2)   (β21   0    0    0    0) (η2)   (γ2)      (ζ2)
(η3) = (β31  β32   0    0    0) (η3) + (γ3) ξ1 + (ζ3)        (1)
(η4)   (β41  β42  β43   0    0) (η4)   (γ4)      (ζ4)
(η5)   ( 0    0    0   β54   0) (η5)   ( 0)      (ζ5)
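To make the recursive structure of Eq. (1) concrete, the following sketch computes η from ξ1 by solving η = Bη + Γξ1 + ζ. The path coefficients here are made-up illustrative values, not estimates from any CCSI data; since B is strictly lower triangular, (I − B) is invertible and the system can be solved directly.

```python
import numpy as np

# Sketch of the structural model (1): eta = B*eta + Gamma*xi_1 + zeta.
# The beta/gamma values are made up for illustration, not CCSI estimates.
B = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0, 0.0],   # beta_21
    [0.3, 0.4, 0.0, 0.0, 0.0],   # beta_31, beta_32
    [0.2, 0.3, 0.4, 0.0, 0.0],   # beta_41, beta_42, beta_43
    [0.0, 0.0, 0.0, 0.6, 0.0],   # beta_54
])
Gamma = np.array([0.7, 0.2, 0.1, 0.1, 0.0])  # gamma_1..gamma_4; eta_5 has no xi path
xi1 = 1.0
zeta = np.zeros(5)               # error terms dropped for the systematic part

# B is strictly lower triangular, so the model is recursive and (I - B)
# is invertible: eta = (I - B)^{-1} (Gamma*xi1 + zeta).
eta = np.linalg.solve(np.eye(5) - B, Gamma * xi1 + zeta)
print(eta)
```

Because the model is recursive, each η_k can also be computed forward from η_1 onward, which is why Eq. (1) decomposes into five one-equation OLS problems in the algorithm below.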

The structural variables are latent and cannot be observed directly. Each structural variable corresponds to several observed variables, as shown in Fig. 1. Suppose that there are m observed variables and each one has n observed values. For instance, if n customers are surveyed in the CSI model, then we get an n × m matrix. Here we suppose that the observations corresponding to the exogenous variable are denoted by x_i, i = 1, …, 5, and the observations corresponding to the endogenous variables are denoted by y_ji. From the figure we can

describe the relationship by which the latent variables give rise to the observations as

(x1)   (λ1)      (δ1)
( ⋮) = ( ⋮) ξ1 + ( ⋮)        (2)
(x5)   (λ5)      (δ5)

(y_1i)   (λ_1i)      (ε_1i)
(  ⋮ ) = (  ⋮ ) ηi + (  ⋮ ),   i = 1, …, 5        (3)
(y_ji)   (λ_ji)      (ε_ji)

where λ_i and λ_ji are loadings; δ_i (i = 1, …, 5) are zero-mean random terms that are not correlated with the latent variable ξ1, and ε_ji are zero-mean random terms not correlated with the latent variables η_i (i = 1, …, 5). On the other hand, the relationships by which the observations give rise to the latent variables can be written as follows:

ξ1 = Σ_{j=1}^{5} ω_j x_j + ε_ξ        (4)

η_i = Σ_{j=1}^{L(i)} ω_ij y_ji + ε_ηi,   i = 1, …, 5,        (5)
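Eqs. (4) and (5) say that each latent score is a weighted sum of its manifest indicators. A minimal sketch with simulated data and assumed equal outer weights (placeholders, not survey values):

```python
import numpy as np

# Minimal sketch of Eq. (4): the latent score xi_1 as a weighted sum of
# its manifest indicators x_1..x_5.  Data and the outer weights omega_j
# are placeholders, not values from the CCSI survey.
rng = np.random.default_rng(0)
n = 250                                  # number of respondents
X = rng.normal(size=(n, 5))              # observed indicators of xi_1
omega = np.full(5, 0.2)                  # assumed equal outer weights

xi1_hat = X @ omega                      # Eq. (4) without the error term
xi1_hat /= np.linalg.norm(xi1_hat)       # unit-norm constraint used later
print(xi1_hat.shape)
```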

where L(i) denotes the number of observations corresponding to the ith latent variable, and ε_ξ, ε_ηi are random errors with zero mean, not correlated with the other variables. The combination of (1)–(3) (or (1), (4) and (5)) is called a SEM, and sometimes we also call it a path analysis model.

3. The estimation of the iterative initial value with PLS

At present, the internationally popular algorithm for SEM is the PLS algorithm. Let us take the CCSI model in Fig. 1 as an example. The solution put forward by Professor Fornell et al. is as follows. First, a group of initial values of ω_j and ω_ij is given, and the initial values of ξ1 and η_i are deduced from Eqs. (4) and (5). Once we have the initial values of ξ1 and η_i, we can substitute them into the structural model (1) and estimate the coefficients β_ij and γ_i by OLS. With the estimated ξ1, β_ij and γ_i, we can forecast η_i again with Eq. (1). With the updated ξ1 and η_i, corresponding to the observations, we can estimate ω_j and ω_ij again from Eqs. (4) and (5). The partial least squares (PLS) method iterates in this way until the solution is steady, and the final ω_ij take part in the calculation of the customer satisfaction index (CSI). The iterative process mentioned above can be described as follows:

(ω_j, ω_ij)^(0) —(4)(5)→ (ξ̂1, η̂_i)^(0) —(1)→ (γ_i, β_ij)^(0) —(1)→ (ξ̂1, η̂_i)^(1) —(4)(5)→ (ω_j, ω_ij)^(1) → ⋯

where endogenous means that the parameter estimation is obtained from the structural equations, and exogenous means that it is obtained from the observed equations. Professor Fornell, the founder of the CSI model, admitted that his PLS method does not always ensure the convergence of the iterative process. In his paper [2] and much other literature, arbitrary initial values were given, usually (1, 0, 0, …, 0)'. Of course, the convergence has not been well proven. In fact, the initial value cannot be given casually; it can be calculated by ordinary least squares (OLS) under a unit-norm treatment of ω_j and ω_ij. We will deduce the best iterative initial value of the customer satisfaction index model to ensure the convergence of the iterative algorithm in the sense of ordinary least squares (OLS).

It is reported that the partial least squares (PLS) method may take four or five minutes for a CSI model with 250 samples and 18 indexes [6], which is obviously too slow. We find that an arbitrary initial value is not necessary, and the best iterative initial value can be calculated for PLS. First of all, let us specify some essential properties of PLS.

(1) The solution of the model consisting of Eqs. (1)–(3) is not unique: solutions may differ by a constant multiple. That is, if (η1, …, η5, ξ1) is a solution, then (cη1, …, cη5, cξ1) is also a solution, where c is an arbitrary constant. Therefore, we can normalize the latent variables η1, …, η5, ξ1 to unit vectors.
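The alternating procedure described above can be sketched for a toy model with one exogenous block (X → ξ) and one endogenous block (Y → η), η = γξ + ζ. All data are simulated; the loop structure, not the full five-equation CCSI model, is the point of the sketch:

```python
import numpy as np

# Sketch of the classical PLS loop: inner estimation of latent scores,
# structural OLS, then outer re-estimation of the weights.  Toy model
# with simulated data, not the CCSI model itself.
rng = np.random.default_rng(1)
n = 250
xi_true = rng.normal(size=n)
X = np.outer(xi_true, [0.9, 0.8, 0.7]) + 0.1 * rng.normal(size=(n, 3))
eta_true = 0.8 * xi_true + 0.1 * rng.normal(size=n)
Y = np.outer(eta_true, [0.9, 0.8]) + 0.1 * rng.normal(size=(n, 2))

w_x = np.ones(3)   # arbitrary starting weights: exactly what the paper
w_y = np.ones(2)   # argues against, since they may converge slowly

for _ in range(100):
    # inner step: latent scores from current weights, Eqs. (4)-(5),
    # normalized to unit length
    xi = X @ w_x
    xi /= np.linalg.norm(xi)
    eta = Y @ w_y
    eta /= np.linalg.norm(eta)
    # structural step: OLS of eta on xi, the toy version of Eq. (1)
    gamma = (xi @ eta) / (xi @ xi)
    eta_pred = gamma * xi
    # outer step: re-estimate the weights by OLS on the indicators
    w_x_new = np.linalg.lstsq(X, xi, rcond=None)[0]
    w_y_new = np.linalg.lstsq(Y, eta_pred, rcond=None)[0]
    if np.linalg.norm(w_x_new - w_x) + np.linalg.norm(w_y_new - w_y) < 1e-10:
        w_x, w_y = w_x_new, w_y_new
        break
    w_x, w_y = w_x_new, w_y_new

print(gamma)
```

The loop alternates between the observed equations and the structural equation, which is the pattern the diagram above abbreviates with the labels (4)(5) and (1).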


(2) Eq. (3) is equivalent to Eq. (5) if Σ_{j=1}^{L(i)} ω_ij λ_ji = 1, so the solution of Eq. (3) is equal to that of Eq. (5) under PLS. In the past, the PLS algorithm has proceeded not on the basis of Eq. (3) but on the basis of Eq. (5). Here, we obtain the best iterative initial value on the basis of Eq. (3).

Consider Eq. (3). If we define Y_i = (y_1i, …, y_ji), with the assumption that each observation variable has n observations, then Y_i is an n × j matrix. Writing Λ_Yi = (λ_1i, …, λ_ji) and ε_i = (ε_1i, …, ε_ji), Eq. (3) can be written as

Y_i = η_i Λ_Yi + ε_i.        (6)

Multiplying both sides of this equation by Y_i', in the sense of ordinary least squares (OLS) equality we have Y_i'Y_i ≈ Λ_Yi' η_i'η_i Λ_Yi + ε_i'ε_i = η_i'η_i Λ_Yi'Λ_Yi + ε_i'ε_i. If we choose latent variables satisfying η_i'η_i = 1, then

Y_i'Y_i ≈ Λ_Yi'Λ_Yi + ε_i'ε_i.        (7)

In fact, E(Y_i'Y_i) = Λ_Yi'Λ_Yi + E(ε_i'ε_i), so Eq. (7) is reasonable. Under our assumptions, E(ε_i'ε_i) is a diagonal matrix, which can be denoted by E(ε_i'ε_i) = Ψ = diag(φ1², …, φ_j²). Defining E(Y_i'Y_i) = Σ, we get

Σ ≈ Λ_Yi'Λ_Yi + Ψ.        (8)

That is,

(y_1i'y_1i   y_1i'y_2i  ⋯  y_1i'y_ji)   (λ_1i² + φ1²   λ_1i λ_2i    ⋯  λ_1i λ_ji  )
(y_2i'y_1i   y_2i'y_2i  ⋯  y_2i'y_ji) ≈ (λ_2i λ_1i     λ_2i² + φ2²  ⋯  λ_2i λ_ji  )        (9)
(    ⋯           ⋯      ⋯      ⋯   )   (    ⋯             ⋯       ⋯      ⋯     )
(y_ji'y_1i   y_ji'y_2i  ⋯  y_ji'y_ji)   (λ_ji λ_1i     λ_ji λ_2i    ⋯  λ_ji² + φ_j²)

Note that the elements of the left matrix are products of two vectors, while the elements of the right matrix are products of two numbers. Equating the diagonal elements we get

λ_ki² + φ_k² = y_ki'y_ki,   k = 1, …, j,        (10)

where λ_ki reflects the influence of the latent variable η_i on the observation variable y_ki.

Now our task is to estimate Λ_Yi and Ψ from Eq. (8). First we estimate Ψ. As Ψ is a diagonal matrix, we can estimate it as follows: define Ψ̂ = diag(φ̂1², …, φ̂_j²) with φ̂_k² = 1/σ^kk, where Σ̂⁻¹ = (σ^ij); that is, the diagonal elements of Ψ̂ are the reciprocals of the diagonal elements of Σ̂⁻¹. With Ψ̂ and Σ̂, from Eq. (8) we can estimate Λ_Yi. Because Λ_Yi'Λ_Yi is a nonnegative definite matrix and Λ_Yi is a j-dimensional row vector, the rank of Λ_Yi'Λ_Yi is 1, so we can make an orthogonal transformation with an orthogonal matrix Γ as follows:

Γ'(Σ − Ψ)Γ = diag(l1, 0, …, 0) = Φ_{j×j}.        (11)

Let Γ1 be the first column of Γ, so

Λ_Yi'Λ_Yi = ΓΦΓ' = (Γ1' l1^{1/2})'(Γ1' l1^{1/2}),        (12)

and we get the estimator

Λ̂_Yi = Γ1' l1^{1/2}.        (13)

Similarly, we can estimate Λ̂_X in the same way. So we can get the best iterative initial values Λ̂_Yi = (λ̂_1i, …, λ̂_ji) (i = 1, 2, …, 5) and Λ̂_X = (λ̂_1, …, λ̂_5)'.
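The construction of Eqs. (8)–(13) can be sketched numerically: estimate Ψ̂ from the reciprocals of the diagonal of Σ̂⁻¹, then recover the loading vector from the leading eigenpair of Σ̂ − Ψ̂. The data below are simulated from a one-factor model with assumed true loadings, so recovery is only approximate:

```python
import numpy as np

# Numerical sketch of Eqs. (8)-(13).  Simulated one-factor data with
# assumed loadings lam_true; eta is scaled so that eta'eta = 1 (Eq. (7)).
rng = np.random.default_rng(2)
n, j = 500, 4
eta = rng.normal(size=n)
eta /= np.linalg.norm(eta)                 # eta'eta = 1
lam_true = np.array([2.0, 1.5, 1.0, 0.5])
Y = np.outer(eta, lam_true) + 0.01 * rng.normal(size=(n, j))

Sigma = Y.T @ Y                            # sample version of Sigma in Eq. (8)
# diagonal of Psi_hat = reciprocals of the diagonal of Sigma^{-1}
Psi = np.diag(1.0 / np.diag(np.linalg.inv(Sigma)))

vals, vecs = np.linalg.eigh(Sigma - Psi)   # orthogonal diagonalization, Eq. (11)
l1 = vals[-1]                              # the single dominant eigenvalue
lam_hat = np.sqrt(l1) * vecs[:, -1]        # Eq. (13): Lambda_hat = Gamma_1' l1^{1/2}
lam_hat *= np.sign(lam_hat[0])             # eigenvector sign is arbitrary
print(lam_hat)
```

Because Λ_Yi'Λ_Yi has rank 1, only the largest eigenvalue of Σ̂ − Ψ̂ carries the loadings; the remaining eigenvalues are (approximately) zero.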


Next we estimate the elements of the latent variable η_i. Write η_i = (η_i1, η_i2, …, η_in)'; we estimate its elements one by one. From Eq. (6), that is,

Y_i = (y_1i, …, y_ji) = (η_i1, …, η_in)' (λ_1i, …, λ_ji) + ε_i,

each y_ki (k = 1, …, j) has n observations, denoted y_ki = (y_ki1, …, y_kin)'. Then for the (s, k)th element of Y_i we have the approximate relation

y_kis ≈ λ_ki η_is,   k = 1, …, j; s = 1, …, n.        (14)

Denote by y_s = (y_1is, …, y_jis) the sth row of the matrix Y_i. Then the relation above can be written as y_s ≈ η_is Λ_Yi. Multiplying both sides of y_s ≈ η_is Λ_Yi by Λ_Yi' from the right, in the sense of OLS equality we get

y_s Λ_Yi' ≈ η_is Λ_Yi Λ_Yi'.        (15)

In fact, E(y_s Λ_Yi') = η_is Λ_Yi Λ_Yi'. Then we can estimate η_is as

η̂_is = y_s Λ_Yi' / (Λ_Yi Λ_Yi'),   s = 1, …, n,        (16)

where Λ_Yi is estimated as above. In this way we obtain the estimates of all the latent variables. They satisfy

‖η_i − Σ_{j=1}^{L(i)} ω_ij y_ji‖ → min,        (17)
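Eqs. (14)–(16) amount to a one-dimensional least-squares projection of each row of Y_i onto the loading vector. A minimal numerical sketch with simulated values (the loadings below stand in for Λ̂_Yi and are assumed, not estimated):

```python
import numpy as np

# Sketch of Eq. (16): each latent score eta_is is a one-dimensional
# least-squares projection of the sth row of Y_i onto the loading vector.
rng = np.random.default_rng(3)
n, j = 100, 3
lam = np.array([1.2, 0.9, 0.6])   # stands in for Lambda_hat_Yi (assumed)
eta_true = rng.normal(size=n)
Y = np.outer(eta_true, lam) + 0.01 * rng.normal(size=(n, j))

# eta_hat_s = y_s Lambda' / (Lambda Lambda'), vectorized over s = 1..n
eta_hat = Y @ lam / (lam @ lam)
print(np.max(np.abs(eta_hat - eta_true)))
```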

whose geometric significance is finding the distance between a unit spherical surface and a hyperplane. Its solution is unique if there is no linear dependence among the y_ji. It is necessary that the unit spherical surface and the hyperplane have no point of intersection; otherwise the left side of Eq. (17) may be zero, namely Eq. (5) has an exact solution satisfying η_i = Σ_{j=1}^{L(i)} ω_ij y_ji. Then we can conclude from Eq. (3) that there is linear dependence among the y_ji.

Eq. (1) includes five recursive equations. With the constraints ‖ξ‖ = 1 and ‖η‖ = 1, we can estimate (ξ̂1, η̂_i)^(0) from Eqs. (2) and (3). Once we have the initial values of ξ1 and η_i, we can substitute them into the structural model (1) and estimate the coefficients β_ij and γ_i by OLS. With the estimated ξ1, β_ij and γ_i, we can forecast η_i again with Eq. (1). With the updated ξ1 and η_i, corresponding to the observations, we can estimate ω_j and ω_ij again from Eqs. (4) and (5), and then forecast a new (ξ̂1, η̂_i) with Eqs. (4) and (5) again. The partial least squares (PLS) method iterates in this way until the solution is steady, and the final ω_ij take part in the calculation of the customer satisfaction index (CSI). The above improved algorithm can be expressed as

(x_i, y_ij) —(2)(3), ‖ξ‖=1, ‖η‖=1→ (ξ̂1, η̂_i)^(0) —(1)→ (γ_i, β_ij)^(0) —(1)→ (ξ̂1, η̂_i)^(1) —(4)(5)→ (ω_j, ω_ij)^(0) —(4)(5)→ (ξ̂1, η̂_i)^(2) → ⋯

after which the PLS iteration proceeds as usual. It is shown that the convergence rate is much higher: the process has been programmed, and the procedure often needs only one or two iterations to converge. For 250 samples it takes only 1 or 2 s to finish the computation, so the speed has been improved by a factor of hundreds. The process has been incorporated into the software DASC [9].

4. Conclusion

In this article, in the sense of ordinary least squares (OLS), we propose the best iterative initial value for estimating a customer satisfaction index model (path analysis). This is done by first formulating the SEM and applying the concepts used for CSI. Our approach enables us to apply the customer satisfaction index model (path analysis) quickly and accurately. The programme has been incorporated into the software DASC [9], where it has been applied successfully.


Acknowledgements

This work was supported by the National Natural Science Foundation of China (30570611), the Innovation Fund (02C26214200218s) of the Ministry of Science and Technology of the PR China and the National Social Science Foundation of China (No. 06BJY101).

References

[1] A. Giuffrida, R.F. Iunes, W.D. Savedoff, Health and poverty in Brazil: Estimation by structural equation model with latent variables, Technical Note on Health, 2005.
[2] C. Fornell, M.D. Johnson, E.W. Anderson, The American customer satisfaction index: Nature, purpose, and findings, J. Marketing 60 (1996) 7–18.
[3] P.M. Bentler, Comparative fit indexes in structural models, Psychological Bulletin 107 (2) (1990) 238–246.
[4] W.W. Chin, The partial least squares approach for structural equation modeling, in: G.A. Marcoulides (Ed.), Modern Methods for Business Research, Lawrence Erlbaum, Mahwah, NJ, 1998, pp. 295–336.
[5] W.W. Chin, M.K.O. Lee, A proposed model and measurement instrument for the formation of IS satisfaction: The case of end-user computing satisfaction, in: Proceedings of the Twenty-First International Conference on Information Systems, Brisbane, Queensland, Australia, 2000, pp. 553–563.
[6] Department of Quality Control, General Administration of Quality Supervision and Inspection of the People's Republic of China, Tsinghua University China Centre for Enterprise Research, Directory for CCSI, Standards Press of China, 2003.
[7] K.G. Jöreskog, A general method for estimating a linear structural equation system, in: A.S. Goldberger, O.D. Duncan (Eds.), Structural Equation Models in the Social Sciences, Seminar Press, New York, 1973.
[8] S.Y. Sohn, T.H. Moon, Structural equation model for predicting technology commercialization success index (TCSI), Technological Forecasting & Social Change 70 (2003) 885–899.
[9] H. Tong, Data Analysis & Statistical Computation (DASC) Software, Science Press, Beijing, 2005.
[10] H. Tong, Evaluation model and its iterative algorithm by alternating projection, Mathematical and Computer Modelling 18 (1993) 55–60.
[11] H. Tong, Theoretical Econometrics, Science Press, Beijing, 2005.
[12] V.E. Vinzi, The PLS Approach to Generalised Linear Models and Causal Path Modeling: Algorithm and Applications, IASC Sessions, Interface Meeting, Montreal, Canada, 2002.
[13] H. Wold, Soft modeling: The basic design and some extensions, in: K.G. Jöreskog, H. Wold (Eds.), Systems Under Indirect Observation: Causality, Structure, Prediction, vol. 1, North-Holland, Amsterdam, 1982, pp. 263–270.