Multiobjective regression modifications for collinearity


Computers & Operations Research 28 (2001) 1333-1345

Stan Lipovetsky*, W. Michael Conklin

Custom Research Inc., 8401 Golden Valley Road, Minneapolis, MN 55427, USA

Received 1 July 1999; received in revised form 1 April 2000

Abstract

In this work we develop a new multivariate technique to produce regressions with interpretable coefficients that are close to, and of the same signs as, the pairwise regression coefficients. Using a multiobjective approach to incorporate multiple and pairwise regressions into one objective, we reduce this technique to an eigenproblem that represents a hybrid between regression and principal component analyses. We show that our approach corresponds to a specific scheme of ridge regression with a total matrix added to the matrix of correlations.

Scope and purpose

One of the main goals of multiple regression modeling is to assess the importance of predictor variables in determining the prediction. However, in practical applications inference about the coefficients of regression can be difficult because real data is correlated and multicollinearity causes instability in the coefficients. In this paper we present a new technique to create a regression model that maintains the interpretability of the coefficients. We show with real data that it is possible to generate a model with coefficients that are similar to easily interpretable pairwise relations of predictors with the dependent variable, and this model is similar to the regular multiple regression model in predictive ability. © 2001 Elsevier Science Ltd. All rights reserved.

Keywords: Multicollinearity; Multiobjective optimization; Ridge regression; Principal component analysis; Net effects

1. Introduction

In this paper we consider a special multivariate technique for producing regression models with interpretable coefficients of regression. It is well known that regression models are very efficient for prediction but often give poor results in the analysis of the importance of individual predictors.

* Corresponding author. Tel.: 1-612-542-0800; fax: 1-612-542-0864. E-mail addresses: [email protected] (S. Lipovetsky), [email protected] (W.M. Conklin).


For actual data, the variables used in a regression model are always correlated, and often correlated to an extent that produces a problem of multicollinearity, i.e., of distorted regression coefficients [1]. The more variables that are used in a model, the higher the effect of multicollinearity, because the predictors provide partially redundant information. Multicollinearity is not important for prediction, but it can have several detrimental effects in the analysis of the individual variable influence on the criterion variable. Because of multicollinearity, parameter estimates can fluctuate wildly with a negligible change in the sample, parameters can have signs opposite to the signs expected from looking at pairwise correlations, and theoretically important variables can have insignificant coefficients. Multicollinearity causes a reduction in statistical power, i.e., in the ability of statistical tests to detect true differences in the population. This leads to wider confidence intervals around the coefficients, which means that they could be incorrectly identified as being insignificant. The ability to determine if one parameter is higher than another is also degraded [2].

A common problem often faced by researchers and practitioners applying regression modeling is estimation of the independent variables' relative contribution to the explanation of the predicted variable's variance. One method of evaluating the regressors' individual contribution is the estimation of the net effects, widely used in applied regression modeling. The net effect is a combination of the direct effect of a variable (as measured by its regression coefficient squared) and the indirect effects (measured by the combination of its correlations with other variables). The net effects have the nice property of summing to the total coefficient of multiple determination R² of the model. They explicitly take into account the correlations that predictor variables have with each other. For example, if two variables are highly correlated then only one of them is likely to have a large regression coefficient. The net effect procedure will increase the importance of the variable with the low coefficient and decrease the importance of the variable with the high coefficient, providing a more reasonable comparison between them. However, the net effect values themselves are also influenced by the collinear redundancy in the data. When a multiple regression model has signs of coefficients that are opposite to the signs of pairwise correlations, these regressors often have negative net effects. Such negative effects are difficult to interpret.

Comparison of the independent variables' importance cannot be performed by looking at their pairwise correlations with the dependent variable. This is because regressors produce a combined effect in their explanation of the variability of the dependent variable. This effect is indicated by the multiple determination, which is not equal to the total of the isolated coefficients of determination of every regressor. Such a relation is only preserved for mutually uncorrelated regressors. Only in this case do the net effects equal the direct effects of the independent variables. For correlated regressors the problem of their importance needs to be considered in a multi-relational context. Even in the presence of multicollinearity, it can be desirable to keep all possible variables in the model and to estimate their comparative importance in their relation to the dependent variable.
This analysis strategy is justified because all the available variables do not represent each other exactly. Rather, each of the explanatory variables plays its own specific role in fitting and describing the behavior of the dependent variable.

To overcome the deficiencies of multicollinearity, a ridge regression technique was suggested [3] and then developed in numerous works (see, for example, Vinod and Ullah [4] and Brown [5]). In this approach some small quantities are added to the diagonal of the matrix of the normal system of equations, resulting in coefficients that are more stable and closer to the true values. Various techniques were elaborated for choosing these diagonal 'ridge' quantities [6-9].


In practice, usually just a scalar matrix is used in the ridge regression approach, although the original theoretical works considered total matrices of elements added to the sample covariance matrix. The main practical reason for using ridge regression with one parameter is the difficulty in choosing multiple ridge parameters. Other techniques used for dealing with multicollinearity are related to principal component analysis (PCA) applied to regression problems [10-12]. Some authors also considered ridge regression combined with PCA estimation [13-16].

Another approach for obtaining regressions with the specific property of non-negative coefficients can be found within the restricted minimizing methods. These problems can be reduced to linear or quadratic programming (see Lawson and Hanson [17], Gay [18] and Dongarra and Grosse [19]) and solved by most modern statistical software. However, these methods usually produce zero coefficients of regression for negative or small correlations of x with y, so they are similar to variable elimination techniques, although a researcher may wish to keep all the regressors for the purpose of their comparison.

In this paper we suggest a convenient approach for the derivation of regression models and the estimation of the importance of every predictor, even when they all are highly correlated. This approach is based on a multiobjective optimization process. Multiobjective methods are mostly used in the multiple criteria decision making field, where they are usually applied to multi-goal linear and non-linear programming [20-22]. In Lipovetsky [23] a multiobjective technique was used for the least-squares evaluation of a synthesized preference model. In the present paper we consider a combined objective for the multiple regression coefficients and the paired regression coefficients (correlations). Optimizing this multiobjective function pushes the coefficients of multiple regression toward the correlation coefficients. The result is a regression model where all the coefficients are interpretable, net effects are positive (negligible if negative), and the multiple determination is only a little lower than the R² of the regular regression. This approach has attractive analytical properties and performs reasonably well. We show that it belongs to a family of ridge and PCA regression methods. However, this approach is different from them in several ways. It is derived from very clear and explicit assumptions, its parameters make evident sense as weights in the objective function, it does not require any artificially introduced parameters, it can be easily extended to a more flexible scheme, and it can be generalized to a multivariate technique with several dependent variables.

The paper is organized as follows. In Section 2 we describe regression models and the evaluation of regressors' importance by their net effects. In Section 3 we consider the multiobjective optimizing approach to regression modeling. Numerical estimations are discussed in Section 4, and the summary is given in Section 5.

2. Least-squares regression

Let us consider some least-squares (LS) properties that will be used in further analysis. The LS approach in linear regression modeling corresponds to minimization of the objective function of the sum of squared deviations

S = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \to \min,   (1)


where y_i is the ith observation of the dependent variable, and the theoretical values of this variable are estimated by the linear aggregator of n predictors x_1, ..., x_n:

\hat{y}_i = b_1 x_{1i} + b_2 x_{2i} + \cdots + b_n x_{ni}.   (2)

Let us assume, without loss of generality, that all the variables are centered and normalized by their standard deviations. In matrix notation, the objective (1) is

S = (y - \hat{y})'(y - \hat{y}) = (y - Xb)'(y - Xb) = y'y - 2b'X'y + b'X'Xb = 1 - 2b'r + b'\hat{R}b \to \min,   (3)

where y is a column vector of the standardized dependent variable with the variance

y'y = 1.   (4)

Predictors x_1, ..., x_n are arranged in the columns of the matrix X of order N by n (N is the number of observations). Vector b is the LS-estimator of the regression coefficients (2). The matrix of correlations among the x_j, and the vector of correlations between each x_j and the dependent variable y in (3), are defined as follows:

\hat{R} = X'X, \qquad r = X'y,   (5)

where the prime denotes transposition, and the hat denotes a sample correlation matrix of the x. Minimizing (3) corresponds to the condition

\partial S / \partial b = 0,   (6)

which yields the normal system of equations

\hat{R} b = r.   (7)

The solution of (7) produces the 'beta-coefficients' of the standardized multiple regression (2):

b = \hat{R}^{-1} r.   (8)

Substituting (8) into (3), we find the minimum of the LS-objective

S = 1 - b'r,   (9)

where the vector b is defined in (8). The scalar product b'r defines the covariance between the observed and theoretical (2) values of y, and is called the coefficient of multiple determination:

R^2 = b'r = b'X'y = (Xb)'y = \hat{y}'y.   (10a)

In practice the importance of every regressor is estimated by the items b_j r_{yj} in the scalar product b'r defining R², and these items are called the net effects:

R^2_j = b_j r_{yj}.   (10b)

Minimizing (3) is equivalent to maximizing the value (10a):

R^2 = 1 - S = 2b'r - b'\hat{R}b \to \max.   (11a)
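As an illustration of (5), (8), (10a) and (10b), the following minimal sketch computes the standardized LS solution and its net effects with numpy. This is not the authors' code; the data and the names (standardize, X_raw, y_raw) are hypothetical stand-ins for any data set with one dependent and n predictor columns.

```python
# Hypothetical sketch of eqs. (5), (8), (10a), (10b); numpy only.
import numpy as np

def standardize(Z):
    """Center each column and scale it so that Z'Z yields correlations."""
    Zc = Z - Z.mean(axis=0)
    return Zc / (Zc.std(axis=0) * np.sqrt(Z.shape[0]))

rng = np.random.default_rng(0)
C = 0.6 * np.ones((4, 4)) + 0.4 * np.eye(4)                # a correlation pattern, chosen arbitrarily
X_raw = rng.normal(size=(200, 4)) @ np.linalg.cholesky(C).T  # correlated hypothetical predictors
y_raw = X_raw @ np.array([1.0, 0.8, -0.2, 0.5]) + rng.normal(size=200)

X = standardize(X_raw)                   # N x n standardized predictors
y = standardize(y_raw[:, None]).ravel()  # standardized dependent variable, so y'y = 1 as in (4)

R_hat = X.T @ X                          # correlation matrix of the predictors, eq. (5)
r = X.T @ y                              # correlations of the predictors with y, eq. (5)

b = np.linalg.solve(R_hat, r)            # beta-coefficients, eq. (8)
R2 = b @ r                               # coefficient of multiple determination, eq. (10a)
net_effects = b * r                      # items b_j * r_yj of eq. (10b); they sum to R2
```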


For any estimator different from the LS-solution (8), the net effect is defined by the following items of (11a):

R^2_j = b_j \left( 2 r_{yj} - \sum_{k=1}^{n} r_{jk} b_k \right).   (11b)

Maximizing (11a) yields the same LS-solution (8), and for this solution the net effect (11b) reduces to the LS-expression (10b).

In place of the covariance (10a) we can maximize the correlation between the theoretical and observed values of y:

\mathrm{cor}(y, \hat{y}) = \frac{\mathrm{cov}(y, \hat{y})}{\sqrt{\mathrm{var}(y)\,\mathrm{var}(\hat{y})}} = \frac{\alpha' r}{\sqrt{\alpha' \hat{R} \alpha}} \to \max,   (12)

where the variance var(y) equals one (4), the vector α denotes an estimator for the coefficients of regression (2) obtained by the objective (12), and the variance-covariance notations correspond to their sample estimates. The variance of the theoretical values of y corresponds to the last item in (3), and can be normalized as follows:

\alpha' \hat{R} \alpha = 1,   (13)

which is a regular condition used in problems of canonical correlation analysis. The objective (12) with the normalization (13) can be reduced to a conditional problem:

\Phi = \alpha' r - \frac{\lambda}{2}\,(\alpha' \hat{R} \alpha - 1) \to \max,   (14)

where λ is a Lagrange term. From the first-order condition similar to (6), we get the equation

\lambda \hat{R} \alpha = r,   (15)

so the solution is

\lambda \alpha = \hat{R}^{-1} r,   (16)

that is, a regular LS solution (8). Multiplying both sides of (15) by the vector α' from the left, we get the term

\lambda = \frac{\alpha' r}{\alpha' \hat{R} \alpha}.   (17)

Using the vector b (8), we express the coefficients of regression b via the coefficients α obtained by the objective (14) with the normalized variance (13):

b = \hat{R}^{-1} r = \lambda \alpha = \frac{\alpha' r}{\alpha' \hat{R} \alpha}\,\alpha.   (18)
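Continuing the sketch above, the identity (18) can be checked numerically: normalizing the LS vector as in (13) and rescaling it by λ from (17) must return b exactly. Again a sketch on hypothetical data, not the paper's code.

```python
# Numerical check of (13), (17), (18) for the hypothetical data above.
alpha_ls = b / np.sqrt(b @ R_hat @ b)                  # rescaled so that alpha' R_hat alpha = 1, eq. (13)
lam = (alpha_ls @ r) / (alpha_ls @ R_hat @ alpha_ls)   # eq. (17)
assert np.allclose(lam * alpha_ls, b)                  # eq. (18): b = lambda * alpha
```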

The derivation (12)-(18) corresponds to canonical correlation analysis with just one variable in one of the groups (see Tishler and Lipovetsky [24,25]).


In the case of just one regressor, the matrix of regressor correlations degenerates to unity, and the vector r contains just one correlation r_{jy}, so the solution (8) (and (18)) coincides with this correlation:

\tilde{b}_j = r_{jy}.   (19)

This is the coefficient of a pairwise regression with just one predictor x_j. If all the predictors are uncorrelated with each other, then the matrix of correlations reduces to an identity matrix, and the elements b_j of the vector b (8) for multiple regression coincide with the coefficients b̃_j (19) of all the pairwise regressions.

In the general case of correlated, and even highly correlated, regressors, the coefficients b_j and b̃_j are different. They can be different by magnitude as well as by sign. Suppose all of the pairwise correlations of the regressors with y are positive (so the dependencies (19) are positive) but negative values appear among the coefficients of the multiple regression (8). This multiple regression can be successfully used for prediction, but it is difficult to interpret the relative impact of the individual predictors on the dependent variable. We can note that the change in signs is due to collinearity among the regressors, but this explanation does not help much if we need to compare the importance of individual regressors. If the pairwise relations are positive, the negative coefficients of multiple regression are hardly interpretable. They produce negative inputs to R² (10a) and diminish this characteristic of the quality of the regression. Let us consider how to reduce the multicollinearity impact on the coefficients of regression.

3. Multiobjective modifications for collinearity

Consider a linear combination (2) that has the maximum possible correlation with y and with each of the predictors x_j as well. This means that the vector of scores (2) is oriented as closely as possible in the directions of every variable x_1, ..., x_n, and y.

The correlation between a regressor x_j and the aggregator (2) is

\mathrm{cor}(x_j, \hat{y}) = \frac{\mathrm{cov}(x_j, X\alpha)}{\sqrt{\mathrm{var}(x_j)\,\mathrm{var}(X\alpha)}} = \frac{x_j' X \alpha}{\sqrt{\alpha' X' X \alpha}} = \frac{r_j' \alpha}{\sqrt{\alpha' \hat{R} \alpha}},   (20)

where x_j is a column vector (and x_j' a row vector) of observations of the standardized jth variable, and r_j denotes the vector of correlations of x_j with all the regressors (including x_j itself). The matrix of correlations among the regressors is the same as in (5). The vector α denotes an estimator of the coefficients of the aggregator (2) in the new approach.

For the total objective we take the sum of the squared correlations of the aggregator (2) with every variable x_j and with y. Squared correlations are used to prevent canceling of items of possibly opposite signs in this sum. Thus, the multiobjective function is

F \equiv \mathrm{cor}^2(y, \hat{y}) + \sum_{j=1}^{n} \mathrm{cor}^2(x_j, \hat{y}) = \frac{(r' \alpha)^2}{\alpha' \hat{R} \alpha} + \sum_{j=1}^{n} \frac{(r_j' \alpha)^2}{\alpha' \hat{R} \alpha} \to \max,   (21)

where the first item is the squared value of the correlation (12) expressed via the new estimator α and the vector r (5) of correlations of the regressors with y, and under the sum we have the squared values of the correlations (20).


Using the normalizing condition (13) for (21), we represent this objective as follows:

F = \alpha' r r' \alpha + \sum_{j=1}^{n} \alpha' r_j r_j' \alpha - \lambda\,(\alpha' \hat{R} \alpha - 1) \to \max.   (22)

Considering the sum of quadratic forms in (21), (22),

\sum_{j=1}^{n} (r_j' \alpha)^2 = \sum_{j=1}^{n} \alpha' r_j r_j' \alpha = \alpha' \Big( \sum_{j=1}^{n} r_j r_j' \Big) \alpha,   (23)

we notice that any pk-th element of the matrix in parentheses on the right-hand side of (23) equals

\Big( \sum_{j=1}^{n} r_j r_j' \Big)_{pk} = \sum_{j=1}^{n} r_{pj} r_{jk} = (\hat{R}^2)_{pk}.   (24)

That is nothing else but the pk-th element of the squared correlation matrix (5). Thus, we represent the objective (22) in matrix form:

F = \alpha' (r r' + \hat{R}^2) \alpha - \lambda\,(\alpha' \hat{R} \alpha - 1) \to \max,   (25)

where rr' is the outer product of the vector (5). With a first-order condition like (6), a generalized eigenproblem follows from (25):

(\hat{R}^2 + r r')\,\alpha = \lambda \hat{R} \alpha.   (26)
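A sketch of how the eigenproblem (26) could be solved in practice, continuing the hypothetical example of the earlier sketches. scipy.linalg.eigh handles the generalized symmetric problem (for a positive definite R̂) and returns eigenvectors already scaled as in (13); the helper name multiobjective_F is illustrative, not from the paper.

```python
# Sketch: solving the generalized eigenproblem (26) and checking it against a
# direct evaluation of the multiobjective function (21).
from scipy.linalg import eigh

def multiobjective_F(alpha, R_hat, r):
    """Objective (21): squared correlations of the aggregator with y and with every x_j."""
    denom = alpha @ R_hat @ alpha                  # variance of the aggregator scores
    return ((r @ alpha) ** 2 + ((R_hat @ alpha) ** 2).sum()) / denom

A = R_hat @ R_hat + np.outer(r, r)                 # R_hat^2 + r r', the matrix in (26)
lam_all, vecs = eigh(A, R_hat)                     # generalized problem A alpha = lambda R_hat alpha
alpha = vecs[:, -1]                                # eigenvector of the largest eigenvalue
lam_max = lam_all[-1]

# At the optimum the objective (21) equals the maximal eigenvalue, and it is not
# smaller than the value reached by the LS direction b.
assert np.isclose(multiobjective_F(alpha, R_hat, r), lam_max)
assert multiobjective_F(alpha, R_hat, r) >= multiobjective_F(b, R_hat, r) - 1e-12
```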

Solving (26) for the maximum λ, which is the value of the objective (21) at its maximum, we get a vector α of parameters for the aggregator (2) that has the maximum possible squared correlations with every x and with y. We can rewrite (26) as follows:

(\hat{R} + \hat{R}^{-1} r r')\,\alpha = \lambda \alpha,   (27)

and using the solution (8) for the regular regression we have

(\hat{R} + b r')\,\alpha = \lambda \alpha.   (28)

If the correlations between the regressors are small, then R̂ is close to the identity matrix and (28) reduces to

b\,(r' \alpha) = (\lambda - 1)\,\alpha,   (29)

where the scalar product r'α is a constant. Then the solution α of (29) is proportional to the regular regression solution b (8), which in this case corresponds to the paired correlations (19). Alternatively, if the correlations between y and the regressors are negligible, i.e. the vector r is close to zero, then (28) reduces to

\hat{R} \alpha = \lambda \alpha,   (30)

which is the principal component problem. Thus, the solution α of the eigenproblem (28) for the maximum eigenvalue corresponds to a vector located between the regular multiple regression solution and the main principal component.

This vector α was obtained from the objective (22) with the normalizing condition (13). To represent this solution without the normalizing term, we use a regression y = γŷ of the empirical values on the theoretical ones, with the coefficient of regression

\gamma = \frac{\mathrm{cov}(y, \hat{y})}{\mathrm{var}(\hat{y})} = \frac{\alpha' r}{\alpha' \hat{R} \alpha}.   (31)

Then the vector a of regression coefficients without the additional normalization is

a = \gamma \alpha = \frac{\alpha' r}{\alpha' \hat{R} \alpha}\,\alpha.   (32)
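A sketch of the adjustment (31)-(32) and of the quality measures (11a)-(11b) for the resulting estimator, continuing the example; adjust is a hypothetical helper name. Note that γ also removes the arbitrary sign of the eigenvector.

```python
# Sketch: from the normalized eigenvector alpha of (26) to regression coefficients a, eq. (32),
# with the net effects (11b) that sum to the multiple determination (11a).
def adjust(alpha, R_hat, r):
    gamma = (alpha @ r) / (alpha @ R_hat @ alpha)   # eq. (31); also fixes the eigenvector's sign
    return gamma * alpha                            # eq. (32)

a = adjust(alpha, R_hat, r)
net = a * (2 * r - R_hat @ a)                       # net effects (11b) of the non-LS estimator
R2_multi = 2 * (a @ r) - a @ R_hat @ a              # multiple determination (11a)
assert np.isclose(net.sum(), R2_multi)
shares = 100 * net / R2_multi                       # percent shares, of the kind reported in Table 2
```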

This result (32) is similar to (18) obtained for regular regressions. The eigenproblem (26) or (28), adjusted by (32), produces the result that we call the multiobjective regression solution for collinearity. More specifically, it is multiobjective with equal weights for all parts of the objective function (21). This objective can easily be generalized to different weights for the multiple and pairwise fittings when in place of (21) we have

F = k \sum_{j=1}^{n} \mathrm{cor}^2(x_j, \hat{y}) + (1 - k)\,\mathrm{cor}^2(y, \hat{y}) \to \max,   (33)

when the parameter k belongs to the 0-1 range. Reproducing the derivation (22)-(26), we obtain the eigenproblem

[k \hat{R}^2 + (1 - k)\,r r']\,\alpha = \lambda \hat{R} \alpha.   (34)
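The weighted problem (34) only changes the left-hand matrix; a brief sketch follows (multiobjective_direction is a hypothetical name, and k = 0.3 is an arbitrary illustrative choice). As discussed next in the text, k near 0 recovers the regular regression direction and k near 1 the first principal component.

```python
# Sketch: the weighted eigenproblem (34); k = 0.5 reproduces (26) up to a factor in lambda.
def multiobjective_direction(R_hat, r, k=0.5):
    A_k = k * (R_hat @ R_hat) + (1 - k) * np.outer(r, r)
    lam_all, vecs = eigh(A_k, R_hat)
    return vecs[:, -1], lam_all[-1]      # eigenvector and eigenvalue of the maximal root

alpha_k, lam_k = multiobjective_direction(R_hat, r, k=0.3)
a_k = adjust(alpha_k, R_hat, r)          # rescaled by (32), as before
```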

If k = 0 the generalized eigenproblem (34) reduces to the regular regression solution (similarly to (29)). Increasing k to 0.5, we get the problem (26) that diminishes the effects of multicollinearity on the coefficients of regression. Numerical experiments show that with k between 0.1 and 0.3 we usually reach 'good' signs for all coefficients of regression. Changing k to be closer to 1, we get results closer to the PCA vector (30).

As a further generalization of this multiobjective technique we can consider different weights k_j for every item in the sum (33). Optimizing such a multiobjective yields an eigenproblem similar to (34) but with a diagonal matrix of the weights k_j in place of the coefficient k at the first item on the left-hand side of (34). In this case, using the Perron-Frobenius theory of positive matrices and their positive main vectors [26,27], we can choose the weights for the multiobjective problem (see Lipovetsky and Conklin [28]).

Another generalization of the technique (21)-(26) can be seen in the area of canonical correlation analysis (CCA), when there are m dependent variables y_1, ..., y_m and n independent variables x_1, ..., x_n. Using blocks of correlations defined via the X and Y matrices of standardized variables, R_{xy} = X'Y, R_{yx} = Y'X, R_{xx} = X'X, we can show that the generalization of (26) corresponds now to the eigenproblem with block matrices [28]:

(R_{xx}^2 + R_{xy} R_{yx})\,\alpha = \lambda R_{xx} \alpha.   (35)

The problem (35) is obtained from the objective (21) taken for multiple y-variables. The solution of (35) yields the coefficients of the aggregator of the x-variables (2) that has maximum correlation with every y and every x. For just one y, the matrix R_{xy} reduces to the vector r of correlations of y with the regressors, and the problem (35) coincides with the eigenproblem (26). If we find the aggregator (2) by its maximum correlations with the y-variables only, that is, without its correlations with the x-variables, then in place of (35) we have (R_{xy} R_{yx})α = λR_{xx}α, which is the problem of redundancy analysis [29,30]. If the correlations among the x-variables are small, and R_{xx} is close to an identity matrix, then (35) reduces to the eigenproblem (R_{xy} R_{yx})α = (λ − 1)α. This problem is known in partial least squares [31] and in robust canonical correlation analysis [25]. All these properties show that the problems (26) and (35) belong to the CCA family of methods and their modifications.

Now let us consider how our approach is related to ridge regression. Ridge regression in its most general form corresponds to the problem

(\hat{R} + \tau K)\,b = r,   (36)

where τ is a parameter and K is a matrix of parameters. In typical applications of ridge regression modeling, K is usually assumed to be an identity matrix, and τ takes some small value. If τ equals zero, the problem (36) coincides with the normal system (7) for regular regression.

Consider the multiobjective eigenproblem (34) with the parameter k. To transform it into a linear system of equations, we represent (34) as

k\,(\hat{R}^2 - r r')\,\alpha + (r' \alpha)\,r = \lambda \hat{R} \alpha,   (37)

where the scalar product of r and α is some constant. Then we can rewrite (37) as a linear system

[\hat{R} + q\,(r r' - \hat{R}^2)]\,\alpha = c\,r,   (38)

where we use the notations

q = k / \lambda, \qquad c = (r' \alpha) / \lambda   (39)

for a new parameter q and a constant c. Comparing with the ridge regression problem (36), we see that the ridge parameter τ corresponds to our parameter q, and the matrix K can be identified as the matrix rr' − R̂² from (38). This comparison allows one to construct the matrix K in the ridge regression analysis, and to interpret the parameter τ in terms of the ratio (39) of k and λ, where k is the weight (33) for the adjustment of the aggregator (2) to every regressor x, and λ is the maximum value of the objective (33). The constant c is not important for the solution (38) because any normalizing term is canceled in the adjustment (32) to the non-normalized regression coefficients.
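The ridge-form reading (36)-(39) can be checked numerically on the same hypothetical example: ordinary ridge regression takes K = I, while K = rr' − R̂² with q = k/λ reproduces the multiobjective direction from the linear system (38). Only a sketch; the value τ = 0.5 for the plain-ridge line is arbitrary.

```python
# Sketch: ridge regression (36) with K = I, and the choice K = r r' - R_hat^2, q = k / lambda
# of eqs. (38)-(39), which returns the eigenvector of (34) up to the constant c.
n = len(r)
b_ridge = np.linalg.solve(R_hat + 0.5 * np.eye(n), r)     # plain ridge: K = I, tau = 0.5

q = 0.3 / lam_k                                           # q = k / lambda, eq. (39), for the k = 0.3 run above
K_mat = np.outer(r, r) - R_hat @ R_hat                    # the matrix identified with K in (38)
v = np.linalg.solve(R_hat + q * K_mat, r)                 # eq. (38) without the constant c

cos = (v @ alpha_k) / (np.linalg.norm(v) * np.linalg.norm(alpha_k))
assert np.isclose(abs(cos), 1.0)                          # same direction as the eigenvector of (34)
```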

4. Numerical example

For a typical numerical example we use data from a real project with 1200 observations on a dependent variable and 10 independent variables. In Table 1 we present the correlations of y with the regressors, the 'beta-coefficients' (8) of the regular regression, and its net effects (10b) as percent shares of R² (10a). Variables number 5, 6, 8, and 9 have opposite signs of their regression and correlation coefficients. And even the positive coefficients of regression are related among themselves rather differently from the structure of the pairwise relations. For example, the first three correlations are related in the proportion 1.07 : 1.03 : 1 (correlations equal to 0.746, 0.718, and 0.697; see Table 1). But the first three coefficients of regression have a very different structure of relation, 4.95 : 2.56 : 1 (coefficients equal 0.391, 0.202, and 0.079). The net effects of four variables are negative. The structure of shares of positive net effects is also hardly interpretable.


Table 1. Correlations and several models of regression

Variable   cor(y, x_j)   Regular regression   Net effect (%)   Restricted regression   Ridge regression
x_1         0.746         0.391                47.54            0.389                   0.218
x_2         0.718         0.202                23.63            0.195                   0.155
x_3         0.697         0.079                 8.98            0.070                   0.113
x_4         0.650         0.092                 9.70            0.083                   0.091
x_5         0.616        -0.028                -2.80            0                       0.050
x_6         0.468        -0.020                -1.50            0                       0.018
x_7         0.544         0.046                 4.11            0.018                   0.042
x_8        -0.002         0.002                -0.00            0                       0.004
x_9         0.437        -0.026                -1.87            0                       0.010
x_10        0.561         0.134                12.21            0.125                   0.109
R^2         --            0.613               (100%)            0.590                   0.553
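The 'Restricted regression' column of Table 1 came from a non-negative least-squares routine in S-PLUS; as a rough stand-in (not the authors' code), scipy.optimize.nnls can produce the analogous fit on the hypothetical standardized data of the earlier sketches.

```python
# Sketch: a non-negative least-squares fit, analogous to the restricted regression column.
from scipy.optimize import nnls

b_restricted, _ = nnls(X, y)      # minimizes ||X b - y|| subject to b >= 0
```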

Of course, we can blame all of these effects on multicollinearity, but that is not of much help if we need, for example, to present the shares of regressor importance in a pie chart.

In Table 1 we also see a restricted regression obtained by a procedure similar to quadratic programming (non-negative least squares software in S-PLUS 2000 [32]; a sketch of such a fit is given after Table 1). Positive coefficients of the restricted regression are close to the positive coefficients of the regular regression, and the other coefficients equal zero. The quality of this regression is not much worse than that of the regular regression (the multiple determinations equal 0.590 and 0.613, respectively). Four variables got zero coefficients, so they are eliminated, and we cannot consider all the regressors' comparative influences with this model.

In the last column of Table 1 we present the ridge regression model. It is obtained by varying the one parameter τ in (36) (with the identity matrix K). The matrix of correlations among the x's in our example is not an ill-conditioned matrix (its eigenvalues range from 2.48 to 0.34), so to get a positive solution in the ridge regression approach we had to increase the parameter τ in (36) to about 0.5. The coefficients of the ridge regression are not very similar to those of the regular regression. The structure of these coefficients resembles that of the pairwise correlations, and the multiple determination is close enough to that of the regular regression.

Results obtained by multiobjective modeling are given in Table 2. We see from Table 2 that the signs of the coefficients (26) coincide with the signs of the pairwise correlations from Table 1. The magnitude of the regression coefficients is close to that of the correlations. For example, the first three coefficients (26) are related as 1.18 : 1.07 : 1, which is similar to the pairwise relations. All net effects are positive and have a similar structure of shares. The multiple determination of this multiobjective model is almost the same as that of the ridge regression model (0.555 and 0.553, respectively), which is a little less than in a regular regression. Predictors can be easily analyzed with the model (26), and their shares of influence can be charted without any problems.


Table 2. Multiobjective regressions

Variable   Regression (26) coefficients   Regression (26) net effect (%)   Regression (34) coefficients   Regression (34) net effect (%)
x_1         0.128                          19.62                            0.247                          32.64
x_2         0.116                          16.13                            0.157                          19.35
x_3         0.108                          13.90                            0.099                          11.53
x_4         0.106                          12.25                            0.102                          10.84
x_5         0.097                           9.89                            0.045                           4.33
x_6         0.080                           5.63                            0.037                           2.56
x_7         0.095                           8.18                            0.074                           6.16
x_8         0.002                           0.00                            0.000                           0.00
x_9         0.075                           4.80                            0.031                           1.99
x_10        0.091                           9.60                            0.112                          10.60
R^2         0.555                         (100%)                            0.595                         (100%)

The two right columns in Table 2 show the results of the method (34), where coefficients with the appropriate signs are achieved already at the level k = 0.1. The R² of this model is very close to the multiple determination of the regular regression (0.595 and 0.613). However, the structure of the coefficients and the shares of net effects are less satisfactory than for the regression (26). Thus, we need to increase the parameter in (34) to the level k = 0.5, when this problem transforms to the problem (26) with the most easily understood solution.

Models obtained in the multiobjective approach were checked for their predictive ability as well. One-half of the observations were taken in a random sampling for modeling, and the other half was used for forecasting. Summary statistics of the residuals are very close for the regular and modified regressions, both for the modeling and forecasting sets of observations. So the forecast power of the modified regression is of the same level as that of a regular unrestricted regression. At the same time, for purposes of analysis, the modified regression, due to its robustness, is much more convenient and useful than a regular regression prone to multicollinearity.
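A sketch of the split-half check described above, run on the hypothetical data of the earlier sketches rather than on the paper's survey data; the helper names are the illustrative ones introduced before.

```python
# Sketch: half the observations for fitting, the other half for forecasting,
# comparing residuals of the regular and the multiobjective regressions.
idx = rng.permutation(len(y_raw))
half = len(y_raw) // 2
train, test = idx[:half], idx[half:]

Xtr, ytr = standardize(X_raw[train]), standardize(y_raw[train][:, None]).ravel()
Xte, yte = standardize(X_raw[test]),  standardize(y_raw[test][:, None]).ravel()

R_tr, r_tr = Xtr.T @ Xtr, Xtr.T @ ytr
b_ls = np.linalg.solve(R_tr, r_tr)                         # regular regression (8)
alpha_tr, _ = multiobjective_direction(R_tr, r_tr, k=0.5)  # problem (26), via (34) with k = 0.5
a_mo = adjust(alpha_tr, R_tr, r_tr)                        # adjusted by (32)

rmse = lambda coef: np.sqrt(np.mean((yte - Xte @ coef) ** 2))
print(rmse(b_ls), rmse(a_mo))     # compare forecasting residuals (standardized units)
```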

5. Summary

We considered how to obtain regressions with the desirable property of correct signs of the coefficients and positive net effects. We suggest using a special multiobjective of maximum correlations of the regressors' aggregate with every dependent and independent variable. Such a regression can be performed by solving a generalized eigenproblem (26). This solution yields a lesser value of the multiple determination R² but a better set of coefficients in comparison with regular regression. The trade-off between a higher R² with worse coefficients, or a lower R² with better coefficients, opens a possibility to choose a feasible solution for multiobjective multiple regression models. We find a close relation of our approach with ridge regression that helps to interpret properties of ridge regression as well. Further extensions of our technique can be seen in the direction of many-parametric multiobjective optimization combined with the Perron-Frobenius theory of positive matrices, and also in the area of multivariate methods of canonical correlations, partial least squares and redundancy analysis. The suggested approach promises to be very fruitful for numerous aims of applied regression analysis.

Acknowledgements

The authors wish to thank two referees for their comments that improved the paper, especially in the part of comparison with ridge regression.

References

[1] Grapentine A. Managing multicollinearity. Marketing Research 1997;9:11-21.
[2] Mason CH, Perreault WD. Collinearity, power, and interpretation of multiple regression analysis. Journal of Marketing Research 1991;28:268-80.
[3] Hoerl AE. Application of ridge analysis to regression problems. Chemical Engineering Progress 1962;58:54-9.
[4] Vinod HD, Ullah A. Recent advances in regression methods. New York: Marcel Dekker, 1981.
[5] Brown PJ. Measurement, regression and calibration. Oxford: Oxford University Press, 1994.
[6] Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics 1970;12:55-67 (republished in Technometrics 2000;42:80-6).
[7] Hoerl AE, Kennard RW. Ridge regression: biased estimation for nonorthogonal problems. Technometrics 2000;42:80-6.
[8] Hoerl AE, Kennard RW, Baldwin KF. Ridge regression: some simulation. Communications in Statistics A 1975;4:105-24.
[9] McDonald GC, Galarneau OI. A Monte-Carlo evaluation of some ridge-type estimators. Journal of the American Statistical Association 1975;70:350.
[10] Kendall MG. A course in multivariate analysis. London: Griffin, 1957.
[11] Massy WF. Principal components regression in exploratory statistical research. Journal of the American Statistical Association 1965;60:2.
[12] Greenberg E. Minimum variance properties of principal component regression. Journal of the American Statistical Association 1975;70:349.
[13] Marquardt DW. Generalized inverses, ridge regression, biased linear estimation and nonlinear estimation. Technometrics 1970;12:3.
[14] Hawkins DM. Relations between ridge regression and eigenanalysis of the augmented correlation matrix. Technometrics 1975;17:4.
[15] Hocking RR, Speed FM, Lynn MJ. A class of biased estimators in linear regression. Technometrics 1976;18:425-38.
[16] Gunst RF, Webster JT, Mason RL. A comparison of least squares and latent root regression estimators. Technometrics 1976;18:1.
[17] Lawson CL, Hanson RJ. Solving least-squares problems. New York: Prentice-Hall, 1974.
[18] Gay DM. A trust region approach to linearly constrained optimization. In: Lootsma FA, editor. Numerical analysis. Proceedings, Dundee 1983. Berlin: Springer, 1984. p. 171-89.
[19] Dongarra JJ, Grosse E. Distribution of mathematical software via electronic mail. Communications of the ACM 1987;30:403-7.
[20] Zeleny M. Multiple-criteria decision making. New York: McGraw-Hill, 1982.
[21] Steuer RE. Multiple criteria optimization: theory, computation, and application. New York: Wiley, 1986.
[22] Tzeng GH, Wang HF, Wen UP, Yu PL, editors. Multiple criteria decision making. New York: Springer, 1994.
[23] Lipovetsky S. The synthetic hierarchy method: an optimizing approach to obtaining priorities in the AHP. European Journal of Operational Research 1996;93:550-64.


[24] Tishler A, Lipovetsky S. Canonical correlation analysis for three data sets: a unified framework with application to management. Computers and Operations Research 1996;23:667-79.
[25] Tishler A, Lipovetsky S. Modeling and forecasting with robust canonical analysis: method and application. Computers and Operations Research 2000;27:217-32.
[26] Berman A, Neumann M, Stern RJ. Non-negative matrices in systems theory. New York: Wiley, 1989.
[27] Minc H. Nonnegative matrices. New York: Wiley, 1988.
[28] Lipovetsky S, Conklin M. CRI: collinearity resistant implement for analysis of regression problems. Proceedings of the 31st Symposium on the Interface: Computing Science and Statistics, June 9-12, 1999, Schaumburg, Illinois, 1999. p. 282-6.
[29] Van den Wollenberg AL. Redundancy analysis: an alternative to canonical correlation analysis. Psychometrika 1977;42:207-19.
[30] Fornell C, editor. A second generation of multivariate analysis, vols. 1, 2. New York: Praeger, 1982.
[31] Bookstein FL, Sampson PD, Streissguth AP, Barr HM. Measuring dose and response with multivariate data using partial least squares techniques. Communications in Statistics: Theory and Methods 1990;19:765-804.
[32] S-PLUS 2000. Modern statistics and advanced graphics. Seattle: MathSoft Inc., 1999.

Michael Conklin is Senior Vice-President, Analytic Services, for Custom Research Inc. He graduated from the Wharton School at the University of Pennsylvania. He is currently President of the Twin Cities Chapter of the American Statistical Association. His research interests include Bayesian methods and the analysis of categorical and ordinal data.

Stan Lipovetsky received an M.Sc. in Theoretical Physics and a Ph.D. in Mathematical Economics from Moscow University, and worked at the Faculty of Management at Tel Aviv University. He is a research manager at Custom Research. His primary areas of research are multivariate statistics, multiple criteria decision making, econometrics, microeconomics, and marketing research.