Multicollinearity in marketing models: Diagnostics and remedial measures


Chezy OFIR and André KHURI

The problem of multicollinearity in linear models is reviewed. Diagnostic measures for detection, analysis of the effects, and localization of multicollinearity are presented. It is recommended that OLS estimates should not be used without a proper diagnostic. The traditional remedial measures, i.e., omission of variables from the model and principal component regression, are critically discussed along with more recent methods. Ridge regression is presented, and several methods for selecting the biasing parameter in ridge regression are introduced. The procedures are illustrated with data taken from the marketing literature and evaluated for their potential usefulness in handling multicollinearity.

1. Introduction

Linear models are frequently used in marketing studies. Numerous investigations in the fields of consumer attitudes, judgment and choice, the modeling of the effect of advertising on sales, and others have utilized linear models (e.g., Bechtel and O'Connor (1979), Brodie and Kluyver (1984), Farris and Buzzell (1979), Holbrook (1978), Lehmann et al. (1974), Oliver (1980), Wilkie and Pessemier (1973)).

* The authors are grateful to Donald Lehmann for providing data used in this paper, to Ira Horowitz and David Mazursky for comments regarding the exposition of an early draft of this paper, to the editor and two anonymous referees for helpful suggestions, and to Pan-Yu Lai for his assistance in programming. The authors also acknowledge the support of the Recanati Foundation and the Faculty of Social Sciences, both at The Hebrew University. ** Author's address: Chezy Ofir, School of Business Administration, The Hebrew University, Mount Scopus, Jerusalem 91905, Israel. *** Author's address: André Khuri, Department of Statistics, Nuclear Sciences Center, University of Florida, Gainesville, Florida 32611, USA.

International Journal of Research in Marketing 3 (1986) 181-205. 0167-8116/86/$3.50 © 1986, Elsevier Science Publishers B.V. (North-Holland)

Ordinary least squares (OLS) is a common procedure used to estimate parameters in linear models. A potential problem associated with OLS is multicollinearity in the predictor variables, namely, when linear (or near-linear) dependencies exist among these variables, which decreases the precision of the parameter estimates. In an experimental setting, researchers are able to design the predictor variables to be uncorrelated, thereby avoiding the multicollinearity problem. In survey and field studies, however, the researcher has much less control over the predictors and is thus vulnerable to this problem. These latter studies are very prevalent in marketing. The increased usage of various latent variable models has not eliminated the collinearity problem; it could potentially affect estimates in linear models which are used in conjunction with latent variable models (e.g., Bechtel (1981), Jagpal (1982)). The multicollinearity problem is generally recognized in the marketing literature (e.g., Green (1978), Naert and Leeflang (1978)). In practice, however, identification and treatment of this problem are frequently absent or applied in a partial and unsatisfactory manner. Employing linear models without complete and proper attention to multicollinearity can lead to imprecise parameter estimates and predictions. In view of the widespread use of linear models and the frequent mistreatment of multicollinearity, the objective of this paper is to review the problem of multicollinearity, examine its sources and effects, and present remedial measures.



2. The problem

As an introduction to the problem of collinear predictors, consider the following linear model:

Y = Xβ + ε,   (1)

where Y is an (n × 1) vector of observations on a response variable y; X is an (n × p) matrix consisting of n observations on p predictor variables; β is a (p × 1) vector of p unknown coefficients; and ε is an (n × 1) vector of normal random errors with ε ~ N(0, σ²I). The ordinary least squares (OLS) estimates are given by β̂ = (X′X)⁻¹X′Y. All variables are corrected for their means and scaled to unit length so that X′X and X′Y are in correlation form. In order to get a unique estimate of β, X′X, or equivalently X, must be of rank p and n > p. If the columns of X exhibit a perfect linear dependency, a unique solution does not exist. This case will be referred to as perfect collinearity. In many cases, however, only near collinearity, which does not violate the rank condition, occurs, a situation referred to as multicollinearity. In the presence of multicollinearity an OLS procedure still provides all the expected properties (i.e., BLUE - best linear unbiased estimates). However, some negative consequences are also associated with it. Before proceeding to the discussion regarding the sources and effects of multicollinearity, let us point out that multicollinearity is not a problem in itself, but a symptom of a problem, namely, lack of information. Consider, for example, the following model: y = β₁x₁ + β₂x₂ + ε, and suppose that the predictors are highly collinear, yielding via a regression model the result x₂ = αx₁ + δ. Substituting in the main model we obtain

y = γ₁z₁ + γ₂z₂ + ε,   (2)

where γ₁ = β₁ + β₂α, z₁ = x₁, z₂ = x₂ − αx₁, and γ₂ = β₂. Notice that z₁ and z₂ are nearly orthogonal by design. Variable z₂, however, exhibits low variability, with most observations close to zero, and thus an inaccurate estimate of β₂ is obtained (see Leamer (1983)). Utilizing ill-conditioned data with an OLS procedure therefore provides relatively good information regarding linear combinations of parameters (e.g., β₁ + β₂α) but not regarding individual coefficients. Further details regarding remedies for this formulation of the problem are given later in the paper.
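To make the correlation-form setup of model (1) and the reparameterization in (2) concrete, the following sketch scales two collinear predictors to unit length, computes the OLS estimates, and shows that z₂ = x₂ − αx₁ carries very little variability. The data and all numbers are simulated assumptions for illustration only, not taken from any study cited here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Two highly collinear predictors: x2 is almost a multiple of x1
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.05 * rng.normal(size=n)
y = 2.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)

def unit_length(v):
    """Centre a variable and scale it to unit length (correlation form)."""
    v = v - v.mean()
    return v / np.sqrt(np.sum(v ** 2))

X = np.column_stack([unit_length(x1), unit_length(x2)])
Y = unit_length(y)

# OLS in correlation form: beta_hat = (X'X)^{-1} X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print("X'X (correlation form):\n", X.T @ X)
print("OLS estimates:", beta_hat)

# Reparameterization of eq. (2): z1 = x1, z2 = x2 - alpha*x1,
# with alpha obtained by regressing x2 on x1.
alpha = np.linalg.lstsq(X[:, [0]], X[:, 1], rcond=None)[0].item()
z2 = X[:, 1] - alpha * X[:, 0]
print("alpha:", alpha)
print("variability of z2 (sum of squares):", np.sum(z2 ** 2))  # near zero => beta_2 poorly determined
```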


2.1. Sources of multicollinearity

Employment of a subspace of the predictors. When only a region of the space of predictor variables is used, covariation may result and the problem of multicollinearity may arise. This might be due to unsuccessful sampling or to the sampling method itself. The problem could be identified through application of theoretical or substantive knowledge which would suggest that the interrelationships among the predictor variables do not exist in the population. Multicollinearity of this kind is a characteristic of a specific data set; an additional data collection might solve the problem. Employment of only a subspace of the predictor variables may, however, be necessary due to an inherent characteristic of the population. For example, specifications of a product category are usually bounded (e.g., price range, technical specifications, etc.). Constraints of this kind in the population will lead to multicollinearity regardless of the sampling method used.

The choice of model. Some models are likely to induce the multicollinearity problem. In polynomial models the terms are very likely to be correlated; the polynomial model used with time series data by Lindberg (1982) to predict demand in several countries was accompanied by severe multicollinearity (see table 1). Shifting slope models (e.g., Wildt and Winer (1979)) are also likely


to induce multicollinearity. In these models the slopes are functions of time or of other variables (e.g., Parsons and Abeele (1981)). In the form suitable for estimation these models possess interaction terms which are potentially collinear (e.g., Erickson and Montgomery (1980)). Two additional, somewhat related, problems may arise in dealing with time series data, namely autocorrelation within each series (e.g., Cochrane and Orcutt (1949)) and multicollinearity among series (Ofir and Rave (1986)). Finally, models which include lag or carry-over effects are prone to the multicollinearity problem (e.g., Palda (1964); see also Davidson et al. (1978) for this and other issues pertaining to time series). Ill-conditioning of data could also result from the usage of certain models with populations in which the predictor variables are constrained. For example, in an attempt to build a psychophysical model for price, the following may be very attractive: y = β₀ P^β₁ e^(β₂P) ε, where y is some response to price and P is price. This model, introduced by Hoerl ((1954); see also Daniel and Wood (1980)), is attractive due to its ability to represent different shapes using different coefficients. The problem, however, is that price is restricted to a certain region for a given product class; in the form used for estimation of the model, the terms P and ln P are most likely to be collinear given the constraints on price.

An overdefined model. This case typically refers to models in which the number of predictor variables (p) is larger than or equal to the number of observations (n). Such occurrences are frequent in medical research (e.g., Mason, Gunst and Webster (1975)) and personnel selection (e.g., Darlington (1978)) but are less common in marketing. A related problem arises when the ratio n/p is small, approaching one from above, thus affecting the mean square error of prediction (Green and Srinivasan (1978: 109)). This problem could arise in individual level conjoint modeling (e.g., Cattin, Gelfand and Danes (1983)) and could be even more serious for individual level models in which experimentally designed data are inapplicable.
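The price example above can be illustrated numerically. The short sketch below assumes a hypothetical narrow price range (the specific numbers are invented) and shows how strongly P and ln P are correlated when price is restricted to a small region.

```python
import numpy as np

# Hypothetical price range for a single product class (values are illustrative only).
P = np.linspace(4.50, 5.50, 40)

# In the estimation form of Hoerl's model, ln y = ln b0 + b1*ln P + b2*P + ln e,
# the regressors are ln P and P.  Over a narrow price range they are nearly collinear:
r = np.corrcoef(P, np.log(P))[0, 1]
print("corr(P, ln P) over the restricted range:", round(r, 6))

# Over a much wider (and usually unrealistic) range the problem is far less severe.
P_wide = np.linspace(0.5, 50.0, 40)
print("corr(P, ln P) over a wide range:      ",
      round(np.corrcoef(P_wide, np.log(P_wide))[0, 1], 6))
```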


2.2. Effects of multicollinearity

Estimates. In order to assess the effects of multicollinearity on OLS estimates, define L as the distance from β̂ to β. The squared distance is then L² = (β̂ − β)′(β̂ − β), and the average squared distance is given by E(L²) = σ² trace(X′X)⁻¹, or by

E(L²) = σ² Σᵢ₌₁ᵖ (1/λᵢ),   (3)

where λᵢ (i = 1, 2, ..., p) are the eigenvalues of X′X, ordered from largest to smallest. In the presence of multicollinearity in the data some eigenvalues are very small and consequently the expected distance is very large. OLS estimates, while being BLUEs, may not be 'close' to the population parameters. Further, the estimated vector β̂ is longer than β: E(β̂′β̂) = β′β + σ² Σᵢ₌₁ᵖ (1/λᵢ), suggesting that some OLS estimates are inflated in absolute value. Given ill-conditioned data, the diagonal elements of (X′X)⁻¹ are large, resulting in estimates with large variances;¹ this implies that with different samples the OLS estimates would probably change. The stability of OLS estimates is also affected by minor changes in the data. Beaton, Rubin and Barone (1976) perturbed multicollinear data beyond the last digit by adding a uniform random variable; these minor changes in the data caused drastic changes in the OLS estimates. Moreover, it was demonstrated that different computer programs produce different OLS estimates and signs (see also Wampler (1970)). Omission of variables from, or addition of them to, the model could also change the estimates of the remaining predictors. Finally, deletion of observations from, or addition of them to, multicollinear data could change the OLS estimates as well.

¹ These diagonal elements are termed variance inflation factors (VIF) (Marquardt (1970)).
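The following sketch, on simulated ill-conditioned data, illustrates eq. (3) and a perturbation experiment in the spirit of Beaton, Rubin and Barone (1976): small eigenvalues of X′X inflate the expected squared distance, and a tiny jitter in the data moves the OLS estimates noticeably. The data and the size of the perturbation are assumptions made only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30

# Simulated ill-conditioned predictors (x3 is almost a linear combination of x1, x2).
X = rng.normal(size=(n, 2))
X = np.column_stack([X, X @ [0.6, 0.4] + 0.01 * rng.normal(size=n)])
X = (X - X.mean(axis=0)) / np.sqrt(((X - X.mean(axis=0)) ** 2).sum(axis=0))
beta = np.array([1.0, -1.0, 0.5])
sigma = 0.1
y = X @ beta + rng.normal(scale=sigma, size=n)

lam = np.linalg.eigvalsh(X.T @ X)
print("eigenvalues of X'X:", np.round(lam, 5))
# Eq. (3): expected squared distance between the OLS estimate and beta,
# compared with p*sigma^2 for an orthogonal design.
print("E(L^2), ill-conditioned:", sigma ** 2 * np.sum(1.0 / lam))
print("E(L^2), orthogonal     :", sigma ** 2 * len(lam))

# Perturbation in the spirit of Beaton, Rubin and Barone (1976):
# tiny uniform noise added to X can move the OLS estimates substantially.
b0 = np.linalg.solve(X.T @ X, X.T @ y)
Xp = X + rng.uniform(-5e-4, 5e-4, size=X.shape)
b1 = np.linalg.solve(Xp.T @ Xp, Xp.T @ y)
print("OLS estimates             :", np.round(b0, 3))
print("OLS after tiny data jitter:", np.round(b1, 3))
```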

Prediction. It is evident that multicollinearity in the data negatively affects OLS estimates. It is also of interest to assess its effects on the quality of model predictions. In general, if the data points at which prediction of the response is made are within the region where the model was fitted and where the same pattern of multicollinearity holds, then prediction is fairly precise. If, however, the researcher attempts to extrapolate outside the region, prediction will most likely be adversely affected. This was shown analytically by Mason, Gunst and Webster (1975), and was corroborated recently by Snee and Marquardt (1984) and Wood (1984). For illustration, consider the 'picket fence' example of Hocking and Pendleton (1983: 500), concerning a model with two collinear predictors: 'The responses then resemble the pickets along a not-so-straight fence row. Fitting a regression surface to this data is analogous to balancing a plane on these pickets. The plane is clearly unstable in the direction perpendicular to the fence row and its slope in this direction can be greatly influenced by a single picket'. Predictions based on points outside or far from the fence line are therefore highly variable. In the event such predictions are desired, additional data should be appropriately collected in an attempt to reduce the effect of multicollinearity in the combined sample. If this is not feasible, because of economic or technical limitations, for example, then one should seek additional information about the model's parameters from other sources. Remedial measures to alleviate the effects of multicollinearity will be addressed later in greater detail.

Inferences and variable selection. Inferences which are based on multicollinear data may also be affected. In particular, the testing of the hypothesis of a zero coefficient, i.e., H₀: βⱼ = 0, is sensitive to interrelation-


ships among the predictors. A t-test given by t = β̂ⱼ/(σ̂² VIFⱼ)^(1/2), where VIFⱼ denotes the jth variance inflation factor (namely the jth diagonal element of the inverse of the X′X matrix), is employed to test this hypothesis. In cases of extreme multicollinearity some of the VIFs tend to be very large, resulting in a small t-value. In such cases a researcher cannot reject H₀, a conclusion which may, however, be wrong. Common variable selection procedures applied to ill-conditioned data are also suspect, as they may produce conflicting results (Hocking (1983)). In particular, methods based on the residual mean square, R², the Cₚ statistic (Mallows (1973)) and hierarchical tests (Cohen and Cohen (1975)) are all sensitive to multicollinearity in the data. If the objective is prediction this problem may not be severe, since compensating variables remain in the specification. If the model, however, is designed for 'understanding' or has theoretical significance, dropping the 'wrong' variables from the specification is not desirable.
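A small sketch of how the VIF enters the t-test just described; the data are simulated and the variable names are placeholders. The collinear pair has inflated VIFs and deflated t-values even though both predictors truly contribute to the response.

```python
import numpy as np

def vif(X):
    """Variance inflation factors: diagonal of (X'X)^{-1}, with X in correlation form."""
    Xc = X - X.mean(axis=0)
    Xc = Xc / np.sqrt((Xc ** 2).sum(axis=0))
    return np.diag(np.linalg.inv(Xc.T @ Xc)), Xc

rng = np.random.default_rng(2)
n = 40
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)          # nearly collinear with x1
x3 = rng.normal(size=n)                       # unrelated predictor
y = 1.0 * x1 + 1.0 * x2 + 0.5 * x3 + rng.normal(scale=0.3, size=n)

X = np.column_stack([x1, x2, x3])
vifs, Xc = vif(X)
yc = y - y.mean()
yc = yc / np.sqrt((yc ** 2).sum())

beta = np.linalg.solve(Xc.T @ Xc, Xc.T @ yc)
resid = yc - Xc @ beta
s2 = (resid @ resid) / (n - X.shape[1] - 1)   # residual mean square

# t-statistic for H0: beta_j = 0 in correlation form: t_j = beta_j / sqrt(s2 * VIF_j)
t = beta / np.sqrt(s2 * vifs)
for j in range(X.shape[1]):
    print(f"x{j+1}: VIF = {vifs[j]:7.2f}   t = {t[j]:6.2f}")
```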

3. Multicollinearity diagnostics

In view of the potential negative effects of multicollinearity, it is clear that proper identification and diagnostics are useful. A procedure that detects and diagnoses multicollinearity could provide a basis on which to evaluate the quality of OLS estimates, and could be important for the decision of whether or not to employ remedial measures. Surprisingly, despite the recognition of the effects of multicollinearity (e.g., Green (1978)), systematic procedures to examine the condition of data are rarely employed in the marketing literature. At the other extreme, the multicollinearity problem is raised (sometimes without proper empirical evidence) to justify results or a lack of face validity in results. In cases where multicollinearity diagnostics are


not employed, the parameters estimated by OLS may suffer from the negative symptoms of multicollinearity, and biased estimation or other remedies are probably not being seriously considered. In some instances multicollinearity is hypothesized to exist in data when individual model parameters are not significantly different from zero while the overall F-test is significant. It can be shown, however, that this may occur even when all predictors are mutually uncorrelated (Geary and Leser (1968)). This and other disturbing results, e.g., 'incorrect' signs, may be associated with multicollinearity, but they are neither sufficient nor necessary conditions for its existence. Researchers are advised to treat estimates cautiously when confronted with such results, yet not to take them as conclusive evidence of multicollinearity. The following is a discussion about ways to diagnose multicollinearity. First, various indicators useful in assessing multicollinearity are described, followed by procedures for multicollinearity localization. Empirical illustrations accompany the discussions.

3.1. Detecting and analyzing multicollinearity effects

Pairwise correlation. Contrary to widely held opinion, pairwise correlations between the independent variables in regression analysis are not reliable indicators of multicollinearity. They are neither necessary nor sufficient criteria for multicollinearity (Dhrymes (1978); also Montgomery and Peck (1982: 297-299)). Pairwise correlations cannot identify dependencies among more than two predictors. If, however, the correlations are either very high (close to one) or low (close to zero), a researcher can reach a more certain conclusion regarding the existence of multicollinearity. In any event, pairwise correlations should be used only as a very preliminary step in the diagnostic procedure.

The determinant of X′X. Another starting


point for assessing multicollinearity would be an examination of the determinant of X′X. In order to remove the effects of units of measurement, X′X is usually used in correlation form. If the predictors exhibit perfect linear dependencies the determinant is zero. If multicollinearity exists in the data, the determinant is small. A more meaningful interpretation of this indicator was suggested by Willan and Watts (1978). They showed that the ratio of the volume of the confidence region for β (based on the observed data) to that of an orthogonal reference design is equal to |X′X|^(-1/2); this indicates the loss of estimation power due to linear dependencies in the data.² A problem with the determinant as an indicator of multicollinearity (when X′X is not given in correlation form) is that it could be small when some columns of X are close to zero, thus limiting its usefulness.

² Farrar and Glauber (1967) offered a statistical test associated with the determinant of the correlation matrix. Assuming that the variables were independent, they transformed the determinant to obtain a χ² statistic, which could provide a measure of departures from orthogonality. This indicator, however, has been criticized in the literature on the following grounds: (1) the determinant could be significantly different from one even in the absence of any problem; alternatively, Haitovsky (1969) suggested testing deviation from singularity via the following modified statistic: χ² = −(n − 1 − (1/6)(2p + 5)) ln(1 − |X′X|); (2) Farrar and Glauber's (1967) chi-square test is based on Bartlett's (1954) statistic, which requires that X be stochastic (joint normal distribution of the response and predictor variables) and that the rows of the X matrix be independently distributed. The normality assumption, while quite robust, does not agree with the current assumption that X is fixed. Further, as Kumar (1975) pointed out, in non-experimental model building the latter assumption is untenable. The impact of departure from the independence assumption alone is unknown; here the problem is further complicated by the fact that there are departures from both assumptions; (3) Farrar and Glauber's statistic as well as Haitovsky's are based on Bartlett's chi-square test, both of which are sensitive to sample size.

VIF. The diagonal elements of (X′X)⁻¹ are an important indicator of multicollinearity. These elements were termed variance inflation factors (VIF) due to their impact on the variance of β̂ (Marquardt (1970)). VIFⱼ is


also equal to (1 − Rⱼ²)⁻¹, where Rⱼ² is obtained by regressing the jth predictor in X on the remaining p − 1 predictors. Another meaningful interpretation of the VIFs is given by the ratio of the length of the confidence interval associated with βⱼ, based on the observed data, to the length of the interval associated with an orthogonal reference design. This ratio equals the square root of the VIF (Willan and Watts (1978)). VIFs are therefore very useful in detecting multicollinearity as well as in indicating the precision of the estimates.

Examination of the eigenvalues. Another indication of the condition of the data can be obtained by means of the eigenvalues of X′X. Observation of the spread of these values can give a preliminary indication of the problem, particularly if some roots are close to zero. Recall that the average squared distance between β and β̂ is given by σ² Σᵢ₌₁ᵖ (1/λᵢ). Thus, if one or more eigenvalues are much smaller than one, this average distance will be quite large, indicating imprecision. If, however, the original levels of the predictor variables were orthogonal, the expected (average) squared distance would be E(L²) = pσ². By comparing Σᵢ₌₁ᵖ (1/λᵢ) to p the researcher can further assess the condition of the data and the precision of the estimates. Another indicator associated with the eigenvalues is λmax/λmin, where λmax and λmin are, respectively, the largest and smallest eigenvalues of X′X. This ratio is termed the 'condition number' of X′X. The square root of this ratio is the condition number of X, which is also equal to the ratio of the largest to the smallest singular value of X (see, for example, Belsley et al. (1980: 98-104)). This ratio should be compared to unity, its value in an orthogonal system. Belsley (1984) argued that the condition number measures the sensitivity of the estimates to changes in the data. Berk (1977) further showed that the maximum VIF is a lower bound for this ratio (see also Snee and Marquardt (1984)). A re-


lated indicator is the condition index, given by ηᵢ = (λmax/λᵢ)^(1/2). This latter indicator has been applied only very infrequently (Hocking (1983)).
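The indicators of this section can be collected in one routine. The sketch below computes the determinant, the Willan-Watts percentage, the condition number and indices, Σ 1/λᵢ and the VIFs for a simulated data matrix; it is an illustration of the definitions above, not the code used for table 1, and all data are invented.

```python
import numpy as np

def collinearity_measures(X):
    """Several of the indicators discussed above, for X put in correlation form."""
    Xc = X - X.mean(axis=0)
    Xc = Xc / np.sqrt((Xc ** 2).sum(axis=0))
    R = Xc.T @ Xc                              # correlation matrix of the predictors
    lam = np.sort(np.linalg.eigvalsh(R))[::-1]
    vifs = np.diag(np.linalg.inv(R))
    return {
        "determinant |X'X|": np.linalg.det(R),
        "100(|X'X|^(-1/2) - 1), % increase of confidence region":
            100.0 * (np.linalg.det(R) ** -0.5 - 1.0),
        "condition number lambda_max/lambda_min": lam[0] / lam[-1],
        "condition indices (lambda_max/lambda_i)^(1/2)": np.sqrt(lam[0] / lam),
        "sum of 1/lambda_i": np.sum(1.0 / lam),
        "VIFs": vifs,
        "sqrt(VIF): CI length ratio vs orthogonal design": np.sqrt(vifs),
    }

# Hypothetical example data (simulated, not from the studies in table 1).
rng = np.random.default_rng(3)
Z = rng.normal(size=(60, 2))
X = np.column_stack([Z, Z @ [0.7, 0.3] + 0.02 * rng.normal(size=60), rng.normal(size=60)])
for name, value in collinearity_measures(X).items():
    print(f"{name}: {np.round(value, 3)}")
```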

3.2. Illustration of multicollinearity measures

It is recommended that researchers rely on several indicators when assessing the condition of data. One indicator may not provide sufficient diagnostics as to the data's condition. This view is shared and convincingly argued by Gunst (1984: 80-81): 'there is [no] one correct technique within which discussions of collinearity must be straightjacketed (...) rarely will any single collinearity measure completely characterize the nature and effects of collinear predictor variables'. In choosing measures, a researcher should consider his/her specific objectives, the capacity of a single indicator to provide information with regard to several objectives, and the ease with which the indicator is implemented in practice. For the purpose of illustration, the above collinearity measures were applied to data taken from the marketing literature (see table 1). The examples were chosen to represent a variety of research areas within the marketing domain. These illustrations also demonstrate the need to assess the condition of the data before using regression analysis.

Empirical illustration. The data taken from Bloom and Franzack's (1981) study are used here in a demonstration of the various measures (see table 1). In all the models the determinants are very small; in model 1.8 the data matrix is close to being singular, i.e., there is an almost perfect covariation among some columns. The VIFs of this latter model are also very high (maximum of 18.94). As noted above, reliance on a single indicator associated with an exact threshold may be misleading and is not recommended. For example, Marquardt (1970) suggested that a value larger than ten for one of the VIFs


would constitute a problem. The analysis of the data associated with models 1.5 and 1.6 in table 1 shows that the VIFs are all less than 10, so according to the latter rule a problem may not exist. The other indicators associated with these data (table 1), however, suggest that multicollinearity is present. The effects of multicollinearity on the estimates can be observed by means of the high condition numbers (i.e., λmax/λmin), suggesting that the estimates are very sensitive to changes in the data. The same information can be obtained by means of the maximum VIF, which is a lower bound for the condition number. Further indication regarding the estimates may be obtained from Σᵢ₌₁ᵖ (1/λᵢ), which suggests that the estimates are inflated. Precision (or lack thereof) is examined by means of M, which is the ratio of the average squared distance between β̂ and β to that of an orthogonal system. The M's for all models

Table 1. Detecting and analyzing multicollinearity.

Author(s), study area / Model          Det. X'X    100(|X'X|^(-1/2)-1)   λmax/λmin   Σ 1/λᵢ     M       VIF range

Bloom and Franzack (1981), Public policy:
  Model 1.5                            0.002            2 136              48.71      41.32     2.43    1.17-7.50
  Model 1.6                            0.0007           3 680              52.67      45.55     2.53    1.20-7.69
  Model 1.8                            0.0000084       34 103             174.48      88.54     4.43    1.28-18.94

Bonfield (1974), Consumer attitudes and intentions:
  Low brand loyalty                    0.07               278              35.52      16.95     5.65    3.86-8.88
  Income over 10,000                   0.178              137              10.19       7.89     2.63    2.42-2.83
  Low education                        0.229              109               9.52       7.01     2.34    2.03-2.74

Churchill, Ford and Walker (1976), Sales force:
  Job satisfaction                     0.189              130              14.53      15.02     1.88    1.03-3.80

Lindberg (1982), Demand forecasting (equation 5):
  US - black & white                   0.0024           1 941            7205.89     699.49   233.16    255.97-2688.17
  US - color                           0.000026        19 512            5698.18    2836.87   945.62     59.61-1518.40
  Denmark                              0.00062          3 916            9890.96    1156.10   385.36     43.96-607.08
  Finland                              0.00059          4 017            7786.46    1088.52   362.84     44.37-583.72
  Norway                               0.000081        11 011           10007.70    2465.55   821.85     56.47-1291.16
  West Germany                         0.00013          8 671            6116.05    2136.38   712.13     48.71-1113.60

Lucas, Weinberg and Clowes (1975), Sales management:
  East region                          0.183              134              12.25       8.99     2.25    1.15-3.14
  West region                          0.252               99              10.90       8.14     2.04    1.66-3.09

Palda (1964), Cumulative advertising effect (current effect):
  Period 1                             0.042              388              52.22      28.07     7.02    3.81-11.85
  Period 2                             0.00038          5 030             139.64      70.76    17.69   12.47-21.47
  Period 3                             0.011              853              90.27      34.31     8.58    2.54-17.71
  Period 4                             0.0072           1 078             104.21      40.60    10.15    3.88-21.48

Samli and Sirgy (1980), Store loyalty:
  Loyalty model                        0.236            1 695              10.78      10.41     1.73    1.05-2.91

Note: Other models and methods of analysis were also conducted with the Lydia Pinkham data (e.g., Clarke and McCann (1977), Houston and Weiss (1975), Weiss, Houston and Windel (1978), Winer (1979)).


are high. Armed with this fact, and with the values of the percentage increase in the confidence region of β due to multicollinearity, we can conclude that the estimates are imprecise. Overall, application of various measures to the data shows that the data are ill-conditioned and that the precision and stability of the estimated parameters are adversely affected by multicollinearity. The empirical examples also show that models consisting of interactions and higher-order terms are likely to suffer from multicollinearity. Lindberg (1982) modeled relative demand via a linear model which included polynomial terms. In all the data sets taken from Lindberg's study the determinants are very small while all other indicators are very high, which is indicative of the severe effects of multicollinearity on the estimated coefficients. In such ill-conditioned data the quality of prediction is highly questionable, in particular outside the region of points where these models were fitted. In summary, the empirical examples presented in table 1 illustrate the application of various collinearity measures, and demonstrate vividly that market researchers are very vulnerable to the multicollinearity problem. Unfortunately, a systematic procedure to detect multicollinearity has rarely been used in the marketing literature. The unsatisfactory and potentially misleading heuristic practice of observing correlations among the predictors is, however, used more often. It is therefore strongly recommended that a diagnostic procedure be applied before using parameters estimated with ordinary least squares.

3.3. Localization of multicollinearity

The above indicators are very useful in detecting and analysing the effects of multicollinearity. If used independently, however, they fail to provide deeper insights into the nature of the problem. Two methods are recommended for the localization of multicollinearity and are available in current computational packages.³ The first, introduced by Belsley et al. (1980), decomposes the variance of the estimates. The entries in the resulting matrix are proportions of the variance of the ith estimate associated with the relevant eigenvalue. If one small eigenvalue contributes to the variances of several parameters, multicollinearity and its patterns are identified. While this procedure is generally available, the usage and interpretation of the results may require some researchers to acquire additional knowledge and training (see Belsley et al. (1980: ch. 3)). The second procedure was suggested by Gunst and Mason (1977a), who pointed out that large elements in the eigenvectors which correspond to small eigenvalues of X′X indicate which independent variables are most involved in multicollinearity. The basis for this method is the relation (Xaᵢ)′(Xaᵢ) = λᵢ. Given a very small eigenvalue λᵢ, the elements of aᵢ define the multicollinearity among the predictors (aᵢ is the ith eigenvector of X′X). This localization method is very useful in practice for the following reasons: (1) it is directly connected to the definition of multicollinearity (e.g., Gunst (1984), Sharma and James (1981), Silvey (1969)), and (2) it relates directly to the familiar principal components.

³ Farrar and Glauber (1967) attempted to address this issue, offering the following F-test: Wᵢ = (VIFᵢ − 1)(n − p)/(p − 1). The hypothesis is that the ith predictor variable is independent of all other p − 1 predictors. The latter test may be significant in cases of either mild or high multicollinearity and thus is not discriminative. Another method Farrar and Glauber (1967) suggested is based on the partial correlation between two predictors. Belsley et al. (1980: 185-186), however, showed that this procedure also lacks discrimination and may indicate a problem for variables which are not involved in any collinear relation.

Empirical illustration. To illustrate the localization procedure, data used by Mahajan et al. (1977) are employed here. The data represent shopping patterns of customers in a study that attempts to predict the probability of

shopping at a specific retail outlet.⁴ The ten predictors are various objective attributes of the store. Table 2 presents the results of the diagnostic procedure. It is evident that the data are ill-conditioned; the determinant is close to zero, the ratios associated with the eigenvalues are very high, and the VIFs are much higher than unity (as in an orthogonal system) (2.95-15.83). These indicators detect the problem but do not enhance our understanding of the pattern of multicollinearity. The eigenvectors associated with the smallest eigenvalues are also presented in table 2; large entries in each of these vectors indicate which predictor variables are most involved in multicollinearity. The VIFs are also listed here to aid the analysis. In eigenvector V₁₀, which is associated with the smallest eigenvalue (λ₁₀), variables 4 (VIF₄ = 14.64) and 10 (VIF₁₀ = 15.83) are clearly involved; variable 3, with a VIF of 8.83, could also be considered in this set. The multicollinear variables according to the two other eigenvectors (V₉ and V₈), which correspond to the next two smallest eigenvalues λ₉ and λ₈, respectively, are: 2 (VIF₂ = 7.53), 4 (VIF₄ = 14.64) and 10 (VIF₁₀ = 15.83); and 5 (VIF₅ = 7.49) and 6 (VIF₆ = 6.64), respectively. We therefore conclude that three major multicollinear sets exist in the data. These results clearly indicate the pattern of severe multicollinearity. In addition to its simplicity, the advantage of this procedure is that it can trace a multicollinear set that includes variables which are not necessarily associated with very high correlations. The reverse holds as well, i.e., not all variables associated with high correlations are included in these sets.

⁴ This probability was transformed in the original study before the analysis (Mahajan et al. (1977: 588)).

Table 2. Regression diagnostics: detection, analysis and localization.

Eigenvalues of X'X (λᵢ): 6.897, 1.059, 0.739, 0.462, 0.349, 0.188, 0.129, 0.081, 0.064, 0.032
|X'X| = 0.0000035
100(|X'X|^(-1/2) − 1) = 53 352
λmax/λmin = 215.53
Σ 1/λᵢ = 79.90
M = 7.99

Eigenvectors associated with the three smallest eigenvalues:

Variable    VIFⱼ      Rⱼ²      V₈ (λ₈ = 0.081)   V₉ (λ₉ = 0.064)   V₁₀ (λ₁₀ = 0.032)
x1           3.330    0.699        -0.110            -0.126             0.213
x2           7.526    0.867         0.247            -0.525             0.070
x3           8.832    0.887         0.186            -0.087            -0.356
x4          14.641    0.932        -0.151             0.478            -0.554
x5           7.488    0.866        -0.683            -0.185             0.037
x6           6.643    0.849         0.589             0.216             0.001
x7           2.947    0.661         0.178            -0.249             0.039
x8           6.783    0.853        -0.035            -0.073             0.337
x9           5.875    0.830        -0.146            -0.151            -0.280
x10         15.831    0.937        -0.041             0.526             0.566
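A sketch of the Gunst-Mason localization procedure illustrated in table 2: inspect the eigenvectors belonging to the smallest eigenvalues of X′X and flag the variables with large elements. The 0.3 cut-off and the simulated data are assumptions made for illustration only; they are not part of the original procedure or data.

```python
import numpy as np

def localize_multicollinearity(X, n_smallest=3, threshold=0.3):
    """Gunst and Mason (1977a): large elements of eigenvectors belonging to small
    eigenvalues of X'X point at the variables involved in each near-dependency.
    X is put in correlation form first.  The threshold is an arbitrary cut-off."""
    Xc = X - X.mean(axis=0)
    Xc = Xc / np.sqrt((Xc ** 2).sum(axis=0))
    lam, V = np.linalg.eigh(Xc.T @ Xc)          # ascending eigenvalues, columns are eigenvectors
    sets = []
    for i in range(n_smallest):
        members = np.where(np.abs(V[:, i]) > threshold)[0] + 1   # 1-based variable indices
        sets.append((lam[i], V[:, i], members))
    return sets

# Simulated data with two built-in near-dependencies (purely illustrative).
rng = np.random.default_rng(4)
n = 80
x1, x2, x5 = rng.normal(size=(3, n))
x3 = 0.8 * x1 + 0.2 * x2 + 0.02 * rng.normal(size=n)   # dependency 1: x1, x2, x3
x4 = x5 - 0.5 * x1 + 0.02 * rng.normal(size=n)          # dependency 2: x1, x4, x5
X = np.column_stack([x1, x2, x3, x4, x5])

for lam_i, vec, members in localize_multicollinearity(X, n_smallest=2):
    print(f"eigenvalue {lam_i:.4f}  eigenvector {np.round(vec, 2)}  -> variables {list(members)}")
```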

4. Remedial measures

Since some sources of multicollinearity are under the control of the researcher, the problem can be avoided or reduced. First, careful selection of the model to be used in a specific population can prevent problems of inherently collinear terms (for a useful approach see Hendry (1979, 1983)). Unfortunately, this is not always possible due to insufficient knowledge regarding the underlying process. However, careful consideration of the predictor variables and their corresponding ranges may help. Secondly, whenever possible, polynomial models should be designed so that orthogonal polynomial procedures can be applied (Bright and Dawkins (1965)). In cases where this is not possible, the predictor variables could be centered before creating the polynomial terms; it has been argued that this improves the condition of the data (Bradley and Srivastava (1979), Marquardt (1980), Snee and Marquardt (1984)). Finally, proper sampling may prevent sample-based multicollinearity. In the event that the problem arises there are several methods to deal with it. These methods will be briefly discussed in an effort both to highlight their limitations and to propose potential improvements in their application.

4.1. Assessment criteria

The questions facing the analyst confronting ill-conditioned data include which method to use, and whether the method chosen outperforms OLS. Superiority of one estimator over another is usually assessed by comparison of the mean square error (MSE) matrices (e.g., Lilien and Kotler (1983: 106)). Such matrices are of the form E[(β̂ − β)(β̂ − β)′].


Applying the trace operator to an MSE matrix we get the following scalar: trace MSE(β̂) = E[(β̂ − β)′(β̂ − β)]. Following Toutenburg (1982), this latter criterion is referred to as the scalar mean square error (SMSE).⁵ OLS estimates are unbiased. When they are based on ill-conditioned data, however, they do have certain negative properties, one of which is large variances. In contrast, all the proposed remedies are biased. These procedures could potentially reduce the SMSE (via variance reduction); the tradeoff, however, is some bias. Researchers may also be concerned with the adequacy of the estimators with regard to model predictions (e.g., Allen (1974), Cattin (1981)). In such cases a researcher may consider using the mean square error of prediction (MSEP) as a criterion. The actual computation of this latter indicator, however, is not needed. Theobald (1974) provided a general theorem which can be applied here: specifically, if an estimator is superior to another with regard to SMSE, it will also outperform it with regard to MSEP.⁶ We shall now discuss some remedial measures that are available in the literature to combat the multicollinearity effects.

⁵ An unbiased estimator is defined by E(θ̂) − θ = 0, and bias by E(θ̂) − θ; MSE(θ̂) = Var(θ̂) + (E(θ̂) − θ)(E(θ̂) − θ)′. The trace of the MSE matrix, namely the sum of its diagonal elements, is given by SMSE = E[(θ̂ − θ)′(θ̂ − θ)].
⁶ The MSEP is given by MSEP = E[(β̂ − β)′X₀′X₀(β̂ − β)], where X₀ is a matrix of new observations on the independent variables.
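The SMSE and MSEP criteria can be checked by simulation when the true β is known. The sketch below compares OLS with a ridge estimator (with k fixed arbitrarily) in that spirit; the design, β, σ and k are invented for the illustration and do not correspond to any data set discussed in the paper.

```python
import numpy as np

def smse(estimates, beta):
    """Scalar mean square error E[(b - beta)'(b - beta)], estimated over replications."""
    d = estimates - beta
    return np.mean(np.sum(d ** 2, axis=1))

def msep(estimates, beta, X0):
    """Mean square error of prediction at new design points X0."""
    d = estimates - beta
    return np.mean(np.sum((d @ X0.T) ** 2, axis=1))

# Monte Carlo comparison of OLS and a ridge estimator on an ill-conditioned design.
rng = np.random.default_rng(5)
n, p, k = 30, 3, 0.1
Z = rng.normal(size=(n, 2))
X = np.column_stack([Z, Z @ [0.5, 0.5] + 0.01 * rng.normal(size=n)])
X /= np.sqrt((X ** 2).sum(axis=0))
beta = np.array([1.0, 0.5, -0.5])
X0 = rng.normal(size=(10, p))

ols, ridge = [], []
for _ in range(2000):
    y = X @ beta + rng.normal(scale=0.2, size=n)
    ols.append(np.linalg.solve(X.T @ X, X.T @ y))
    ridge.append(np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y))
ols, ridge = np.array(ols), np.array(ridge)

print("SMSE  OLS vs ridge:", smse(ols, beta), smse(ridge, beta))
print("MSEP  OLS vs ridge:", msep(ols, beta, X0), msep(ridge, beta, X0))
# In line with Theobald (1974), the estimator with the smaller SMSE
# also tends to show the smaller MSEP.
```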

4.2. The addition of new data

We have already mentioned this method in conjunction with the effect of multicollinearity on prediction. It is noteworthy here to point out that the addition of new data may not necessarily solve the multicollinearity problem. This is particularly true when multicollinearity is inherent in the system generating the data due to certain structural constraints on the predictor variables.


Furthermore, the new data may not be consistent with the original data, due to conditions that did not prevail when the original data were collected.

4.3. Omission of variables from the model

A common method used to deal with multicollinearity is to drop the collinear variable(s). This method is very simple to implement and is designed to obtain more precise estimates of the set of parameters judged to be relevant. Its effectiveness is contingent upon the multicollinearity being inherent in the system generating the data. If, however, multicollinearity is a characteristic of only the sample data and not of the population, then the omission of variables can have adverse effects on the prediction of future response values. Researchers should also be aware of the following limitations of this method: (1) procedures used to select the preferred set (e.g., stepwise regression) are themselves affected by multicollinearity (e.g., Hocking (1983)); alternatively, this selection is frequently based on subjective judgment, and in either case the selection is arbitrary; (2) dropping variables from models in which every parameter has theoretical importance does not enhance the researcher's ability to reject the model, nor does it improve his/her chances of increasing understanding of the underlying process; (3) unless the omitted variables are orthogonal to those retained, OLS estimates of parameters in the reduced model are biased (see appendix A), and the estimates of the omitted set of variables, which are set to zero, are also biased, except for the case where the 'true' parameters are indeed zero; and (4) it is impossible to determine whether the mean square error (MSE) of the estimated parameters of the reduced model is superior to the MSE of the same set of parameters estimated with OLS applied to the full model.
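Limitation (3) above, the bias induced by dropping a correlated predictor, can be seen in a few lines of simulation. The coefficients and the degree of collinearity below are assumptions chosen only to make the bias visible.

```python
import numpy as np

# Omitted-variable bias: dropping a collinear x2 biases the estimate of beta1.
# True model: y = 1.0*x1 + 1.0*x2 + error, with x1 and x2 highly correlated.
rng = np.random.default_rng(6)
n, reps = 100, 5000
b1_full, b1_reduced = [], []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = 0.9 * x1 + 0.2 * rng.normal(size=n)
    y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)
    X_full = np.column_stack([x1, x2])
    b1_full.append(np.linalg.lstsq(X_full, y, rcond=None)[0][0])
    b1_reduced.append(np.linalg.lstsq(x1[:, None], y, rcond=None)[0][0])

print("mean beta1, full model :", np.mean(b1_full))      # about 1.0 (unbiased)
print("mean beta1, x2 omitted :", np.mean(b1_reduced))    # about 1.0 + 0.9*1.0 (biased)
```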

in marketing

4.4. Principal components

The principal components procedure was originally introduced by Hotelling (1933) and was only later adapted to regression analysis (e.g., Massy (1965); for applications see Bechtel (1973)). Principal components estimates were shown to be superior to OLS estimates in several simulations (e.g., Dempster et al. (1977), Gunst and Mason (1977b), Lawless and Wang (1976)). These simulations, however, cannot guide the analyst on when to prefer principal components over OLS in a real setting. A preferred method is to derive the difference between the SMSE of OLS and that of principal components; if it is positive, use principal components (for details and recommendations see appendix A). Utilization of the principal components procedure should be accompanied by careful consideration of its limitations. First, the number of principal components utilized in the model is reduced by the number of zero, or near zero, eigenvalues of the X'X matrix (see model (1)); the decision as to how many 'small' eigenvalues should be made zero is not clear-cut (see appendix A for some recommendations). Secondly, it is applicable to multiplicative models only if all the observations have positive values, so that a logarithmic transformation can be fitted (Mahajan et al. (1977)). In addition, researchers using principal components should always examine whether they gain any improvement over OLS in terms of SMSE.
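A minimal sketch of principal components regression as described above: delete the components belonging to the smallest eigenvalues, estimate in the component space, and transform back. The number of deleted components and the simulated data are assumptions; the recommendations of appendix A are not reproduced here.

```python
import numpy as np

def pcr(X, y, n_drop=1):
    """Principal components regression: drop the n_drop components with the smallest
    eigenvalues of X'X, estimate in the component space, and map back to the original
    variables.  X and y are assumed already in correlation form."""
    lam, V = np.linalg.eigh(X.T @ X)            # ascending eigenvalue order
    keep = np.arange(n_drop, X.shape[1])        # discard the smallest eigenvalues
    Z = X @ V[:, keep]                          # retained principal components
    gamma = (Z.T @ y) / lam[keep]               # component-wise OLS (Z'Z is diagonal)
    return V[:, keep] @ gamma                   # back-transform to the original variables

# Simulated ill-conditioned example (not one of the cited data sets).
rng = np.random.default_rng(7)
n = 60
Z = rng.normal(size=(n, 2))
X = np.column_stack([Z, Z @ [0.6, 0.4] + 0.01 * rng.normal(size=n)])
X = (X - X.mean(axis=0)) / np.sqrt(((X - X.mean(axis=0)) ** 2).sum(axis=0))
beta = np.array([0.8, -0.4, 0.3])
y = X @ beta + rng.normal(scale=0.1, size=n)
y = y - y.mean()

print("OLS :", np.round(np.linalg.solve(X.T @ X, X.T @ y), 3))
print("PCR :", np.round(pcr(X, y, n_drop=1), 3))
```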

4.5. The Bayesian approach

Multicollinearity results in some loss of information concerning the model's parameter vector. In the Bayesian approach β is a random variable with a known prior distribution. The specification of the prior is tantamount to providing additional information concerning β. The use of the Bayesian approach in regression was discussed in

Leamer (1973, 1978) and Zellner (1971). One of the main criticisms of this approach concerns its subjectiveness, as it requires an exact specification of the prior distribution. In this respect a study of the sensitivity of the posterior mean to changes in the prior distribution, particularly those in the prior variance-covariance matrix, would be very useful. Chamberlain and Leamer (1976) considered a given location of the prior and developed a correspondence between classes of prior variance-covariance matrices and bounded regions in the space of the posterior mean vector. The bound for this mean vector was generalized by Leamer (1982) by assuming that the prior variance-covariance matrix is constrained to lie between a minimum variance-covariance matrix and a maximum variance-covariance matrix. A Bayes-like technique, called mixed estimation, was proposed by Theil and Goldberger (1961) and Theil (1963), in which certain prior restrictions are placed on β. These restrictions are formulated in terms of an additional model involving β and a random vector independent of the error vector ε. The prior and sample information (provided by the original model) are combined via the method of weighted least squares to produce an unbiased mixed-estimation estimator. The advantage of the mixed estimation technique is that it allows the user to incorporate prior knowledge about β in the model without having to specify a particular prior distribution for β. If, however, normality is assumed and the prior knowledge about β can be used to determine the mean and variance, then a normal prior would be fully specified.

5. Ridge regression

Ridge estimates, developed by Hoerl and Kennard (1970a), are based on adding a positive constant k to the diagonal elements of the X'X matrix, where X is the matrix in


model (1). Ridge estimates are biased but have lower variances, and thus have the potential to reach a lower mean square error (see appendix B). Many studies have demonstrated that with little bias and only small, reasonable increases in the residual sum of squares (RSS), there is a reduction in the variance and an improvement in the SMSE (e.g., Mahajan et al. (1977), Marquardt and Snee (1975)). It is impossible, however, to choose a k that would minimize the SMSE without knowledge of β (the 'true' parameters). Therefore, the practical question of which k to choose is relevant. In the following sections this problem is addressed.

5.1. The admissible range

A major objective of ridge regression is to obtain a smaller scalar mean square error (SMSE) than OLS. As was pointed out above, it is not possible to find an optimal k that will provide the minimum SMSE. It is of interest, therefore, to obtain an interval (0, k₀) where any k < k₀ would provide a better SMSE than OLS. Such an interval constitutes an admissible range of k. In this range the difference between the OLS mean square error and the ridge SMSE is positive. The procedure is presented in appendix B, along with some discussion implying that, given very ill-conditioned data, ridge estimates would outperform OLS estimates with regard to SMSE (for every k, or for a very wide range of k). This too was demonstrated in simulations conducted by Goto and Matsubara (1979: 7), and confirms empirical findings that ridge estimates outperform OLS as multicollinearity increases (Hoerl, Kennard and Baldwin (1975), Lawless and Wang (1976), Lin and Kmenta (1982)). Chow (1983) provided some explanation for these findings, pointing out that the ridge estimator can be viewed as a weighted average of the least squares estimator and a zero vector. In cases of high multicollinearity, and given some justification for the assumption that the true parameters are close to zero, ridge regression provides reasonable estimates superior to OLS's.
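The ridge estimator itself is a one-line modification of OLS. The sketch below computes ridge estimates over a grid of k on simulated collinear data, which is essentially what a ridge trace plots; the grid and the data are illustrative assumptions.

```python
import numpy as np

def ridge(X, y, k):
    """Ridge estimate (X'X + kI)^{-1} X'y, with X and y in correlation form."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

# Trace of ridge estimates over a grid of k for simulated collinear data.
rng = np.random.default_rng(8)
n = 50
Z = rng.normal(size=(n, 2))
X = np.column_stack([Z, Z @ [0.5, 0.5] + 0.01 * rng.normal(size=n)])
X = (X - X.mean(axis=0)) / np.sqrt(((X - X.mean(axis=0)) ** 2).sum(axis=0))
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(scale=0.1, size=n)
y = y - y.mean()

for k in [0.0, 0.001, 0.01, 0.05, 0.1, 0.3]:
    b = ridge(X, y, k)
    rss = np.sum((y - X @ b) ** 2)
    print(f"k = {k:5.3f}  estimates = {np.round(b, 3)}  RSS = {rss:.4f}")
# As k grows the estimates shrink and stabilize while RSS rises slowly;
# plotting these estimates against k gives the ridge trace.
```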

5.2. Algorithms for the biasing parameter

Finding the admissible range is very useful both for diagnostic purposes and for procedures that combine it with selection rules for a specific k (see the discussion and illustration below). There are many studies concerning the application of computational rules in choosing the specific biasing parameter k. In the marketing literature, however, only very few selection rules have been applied. Methods found to be useful in previous research (primarily simulations) are therefore presented and discussed in appendix B and illustrated below.

5.3. Choice of the biasing parameter

As indicated above, there are many methods for determining k, and many simulation studies were carried out in an attempt to find superior methods (e.g., Dempster, Schatzoff and Wermuth (1977), Gunst, Webster and Mason (1976), Hemmerle and Brantle (1978), Lawless (1978), Lawless and Wang (1976), Wichern and Churchill (1978)). The existence of so many techniques does not necessarily make the data analyst's choice easier: (1) different methods perform well under different conditions; (2) it is hard to infer from the literature which method is 'best' for different conditions; (3) the fact that most studies are simulations makes recommending any particular method for given data more difficult, for in practice the available data do not always conform to the conditions under which the simulation was conducted; and (4) Draper and Van Nostrand (1979) criticized simulations for being designed in a way that favors ridge regression estimates.


Given the problems in selecting k on the basis of cumulative simulation results, and since the theoretical 'optimal' k has yet to be found, market researchers are advised to examine their specific data sets with several methods. Similar recommendations were proposed by Dempster et al. (1977: 78), who suggested: '(...) holding the design matrix X fixed at the values for a given data set and varying the factors and procedures of greatest concern to a particular data analyst (...) as may be indicated either by prior understanding of circumstances or by the properties of the given data set'. This kind of experimentation on data could be extended to several random samples which would be examined under different conditions. In many situations, however, time and cost considerations, as well as the lack of large data sets, inhibit extensive experimentation. In such cases it is suggested to compare the performance of several selection procedures with the specific data set. The ridge regression estimator has a Bayesian interpretation. If it can be assumed that the prior distribution of β is normal with mean zero and a variance-covariance matrix τ²I, then the Bayes estimator of β will be identical to a ridge estimator with a biasing parameter k = σ²/τ², where σ² is the error variance for model (1) (see Judge and Bock (1983: 643)). This fact has been a cause for criticizing ridge regression on the grounds that such implicit assumptions on the prior distribution of β are never examined. If the true prior distribution of β is known, then the corresponding Bayes estimator of β should be used instead. Smith and Campbell (1980) mention this and point out certain difficulties that can arise in the selection of the biasing parameter k. In particular, if the ridge prior is correct, then the ridge estimator is optimum for any quadratic loss function (see Dempster et al. (1977)). Empirical Bayes ideas have also been used in ridge regression (see Efron and Morris (1977)).
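By way of illustration, two widely cited selection rules are sketched below: the Hoerl-Kennard-Baldwin rule k = p s²/(β̂′β̂) and an iterated version of it. These are offered only as examples of algorithmic choices of k; they may or may not coincide with the specific methods labelled in table 3, whose definitions are given in appendix B. The data are simulated.

```python
import numpy as np

def hkb_k(X, y):
    """Hoerl-Kennard-Baldwin (1975) rule: k = p * s^2 / (b'b), with b the OLS estimate."""
    n, p = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    s2 = np.sum((y - X @ b) ** 2) / (n - p)
    return p * s2 / (b @ b)

def iterative_k(X, y, tol=1e-8, max_iter=100):
    """Re-evaluate k with the current ridge estimate until it stabilizes
    (in the spirit of Hoerl and Kennard's iterative procedure)."""
    n, p = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    s2 = np.sum((y - X @ b) ** 2) / (n - p)
    k = p * s2 / (b @ b)
    for _ in range(max_iter):
        b_r = np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)
        k_new = p * s2 / (b_r @ b_r)
        if abs(k_new - k) < tol:
            break
        k = k_new
    return k

# Simulated collinear data, purely for illustration.
rng = np.random.default_rng(9)
n = 40
Z = rng.normal(size=(n, 2))
X = np.column_stack([Z, Z @ [0.6, 0.4] + 0.02 * rng.normal(size=n)])
X = (X - X.mean(axis=0)) / np.sqrt(((X - X.mean(axis=0)) ** 2).sum(axis=0))
y = X @ np.array([0.8, 0.4, -0.3]) + rng.normal(scale=0.1, size=n)
y = y - y.mean()

print("HKB k      :", hkb_k(X, y))
print("iterated k :", iterative_k(X, y))
```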

Table 3. Comparison of ridge procedures.

                 OLS      RT (1)   FM (2)   HM (3)   HKIM (4)  T I (5)  T II (6)  MG (7)   CM (8)
Estimated parameters
x1              -0.486   -0.154   -0.246   -0.440   -0.431    -0.162   -0.131    -0.476   -0.148
x2              -1.220   -0.352   -0.674   -1.139   -1.123    -0.383   -0.269    -1.203   -0.330
x3               0.526    0.142    0.245    0.468    0.458     0.152    0.118     0.513    0.136
x4               1.310    0.226    0.531    1.165    1.139     0.252    0.163     1.278    0.209
x5               0.090    0.033    0.073    0.092    0.093     0.037    0.021     0.091    0.030
x6               0.112    0.010    0.047    0.102    0.100     0.014    0.002     0.110    0.008
x7              -0.522   -0.102   -0.249   -0.472    0.472    -0.115   -0.066    -0.513   -0.092
x8              -0.092    0.162    0.144   -0.037   -0.027     0.164    0.152    -0.080    0.160
x9               0.182    0.078    0.076    0.151    0.146     0.077    0.079     0.175    0.079
x10             -0.110    0.053    0.055   -0.059   -0.051     0.054    0.049    -0.098    0.052
SMSE             0.1647   0.1300   0.2738   0.1707   0.1770    0.1487   0.0883    0.1606   0.1201
MSEP             0.0206   0.0171   0.0255   0.0205   0.0207    0.0182   0.0142    0.0204   0.0163
k                0        0.2 ᵃ    0.0617   0.0051   0.0062    0.1760   0.2932    0.0010   0.22

ᵃ The biasing parameter (k) was set to 0.2 following Mahajan, Jain and Bergier (1977: 589).

5.4. Empirical illustrations

In order to illustrate the performance of the various procedures, data used by Mahajan et al. (1977) are employed here. Multicollinearity diagnostics of these data are presented in table 2. The performance of the various selection methods will also be compared to those in five data sets taken from Lehmann's (1971) study. Table 3 presents comparisons of the performance of the methods for determining k. For each method the estimated coefficients, the scalar mean square error (SMSE) and the mean square error of prediction (MSEP) are presented. These results indicate that methods 1 (RT), 5 (T I), 6 (T II), 7 (MG) and 8 (CM) produce SMSE and MSEP lower than OLS's. Furthermore, method 6 (T II) is superior with regard to the quality of the estimates and prediction. Methods 8 (CM), 1 (RT), 5 (T I) and 7 (MG) follow in descending order of performance. It is interesting to note that methods 5 (T I) and 6 (T II) are based on the idea that

the V1F’s should be close to unity as they are in an orthogonal system. The ridge trace method (number 1) performed well, ranking third out of eight methods. Method 2 (k = l/F) performed worst of all in our example. This method (i.e., k = l/F) performed very well, however, in the five data sets taken from Lehmann’s (1971) study. With Lehmann’s data all the k’s according to this method produced SMSE’s lower than OLS’s. Method 6(T II), which performed very well in this example (table 4), performed worse than .OLS in the examples taken from Lehmann. Overall, theseempirical examples demonstrate that it is very likely that different methods should be used with different data sets and thus highlight the importance of examining several methods before selecting any k for further analysis. The examples also demonstrate the superiority of. ridge regression over OLS. In particular, method 6 produced a better SMSE than OLS (53.6% of OLS’s; the ridge trace method used by Mahajan et al. produced


an SMSE which is only 78.9% of OLS's), a better mean square error of prediction (68.9% of OLS's), and only a 32.4% increase in RSS (versus 28% achieved by the ridge trace).

Individual level modeling. Individual level modeling was chosen as the second illustration for the following reasons: (1) in these types of analysis several data sets (corresponding to individuals) are used with the same model; the method for the selection of k, however, should not necessarily be identical for all data sets; (2) individual level modeling typically suffers from a small ratio of observations (n) to predictors (p), which affects prediction; and (3) these models frequently have theoretical significance which rules out the option of dropping variables or artificial orthogonalization. Ridge regression is therefore very suitable, but it requires a very conservative approach in the selection of the biasing parameter. The four individual level data sets used here, obtained from Ofir and Lynch (1984) who investigated the effects of context on judgment under uncertainty, are ill-conditioned and correspond to a linear model with four predictors and 24 observations. It is suggested that the admissible range procedure be used as the basis for the selection of the biasing parameter. This procedure is very conservative and guarantees that k's within the range will provide estimates superior to OLS. In the event that several selection algorithms produce k's within the range, the approximation of SMSE (see appendix B) should be applied to select the 'best' k within the range. The procedure is illustrated with four individual level data sets which are 'near collinear'. The plots of the four functions D(k) (see appendix B) are presented in fig. 1. This function is the difference between the scalar mean square error (SMSE) of OLS and the ridge regression SMSE as a function of k. All k's corresponding to positive D values reflect


the superiority of ridge regression over OLS and constitute the admissible range. In certain cases, e.g., very ill-conditioned data or data with a high σ², the admissible range is very wide (e.g., all positive numbers). In such cases SMSE could be used as a guideline in selecting k. This is demonstrated with data sets 1 and 2. For data 1 (see table 4) the correlation method (CM) produced the smallest SMSE: 0.028, which was 2.05% of OLS's. This was achieved with only a slight increase in RSS (9%). The iterative procedure of Hoerl and Kennard (HKIM) also produced estimates with a very small SMSE: 0.047, which was 3.4% of OLS's. The correlation method (CM) produced superior estimates with data 2 as well: SMSE = 0.035 (2.2% of OLS's), with only a 6.9% increase in RSS. The range obtained for data 3 and 4 is bounded and relatively small. Since every k within this range would produce estimates superior to OLS, the algorithm that produced the smallest SMSE within the admissible range is selected. The range for data 3 is 0-0.026; only two algorithms produced k's within the range: HM with k = 0.014, and PRESS (PR) with k = 0.020. Their SMSE are, respectively, 0.247 (16% of OLS's) and 0.188 (12% of OLS's). Both provide substantial improvement over OLS estimates with only a negligible increase in RSS (3% and 4.5%, respectively). The admissible range for data 4 is 0-0.076; several methods produced k's within the range. Specifically, FM, HM, PR and HKIM all provided substantial improvements over OLS. The algorithm based on PRESS (PR) provided the smallest SMSE (5.7% of OLS's) within the range and was therefore chosen (see table 4).
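The admissible-range idea can be approximated numerically. The sketch below evaluates D(k) using the standard variance-bias decomposition of the ridge SMSE, with the unknown σ² and β replaced by their OLS estimates; this is an approximation offered only to convey the shape of the curves in fig. 1, and the exact procedure of appendix B may differ in detail. The data are simulated, not the Ofir-Lynch data.

```python
import numpy as np

def d_of_k(X, y, ks):
    """Approximate D(k) = SMSE(OLS) - SMSE(ridge, k) from the Hoerl-Kennard
    decomposition, plugging in OLS estimates for sigma^2 and beta."""
    n, p = X.shape
    lam, V = np.linalg.eigh(X.T @ X)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    s2 = np.sum((y - X @ b) ** 2) / (n - p)
    alpha = V.T @ b                              # canonical coefficients
    smse_ols = s2 * np.sum(1.0 / lam)
    out = []
    for k in ks:
        var_part = s2 * np.sum(lam / (lam + k) ** 2)
        bias_part = k ** 2 * np.sum(alpha ** 2 / (lam + k) ** 2)
        out.append(smse_ols - (var_part + bias_part))
    return np.array(out)

# Illustration on simulated ill-conditioned data with four predictors and 24 observations.
rng = np.random.default_rng(10)
n = 24
Z = rng.normal(size=(n, 3))
X = np.column_stack([Z, Z @ [0.4, 0.3, 0.3] + 0.02 * rng.normal(size=n)])
X = (X - X.mean(axis=0)) / np.sqrt(((X - X.mean(axis=0)) ** 2).sum(axis=0))
y = X @ np.array([0.5, 0.3, -0.2, 0.4]) + rng.normal(scale=0.15, size=n)
y = y - y.mean()

ks = np.linspace(0.0, 1.0, 21)
for k, d in zip(ks, d_of_k(X, y, ks)):
    print(f"k = {k:4.2f}  D(k) = {d: .4f}")   # positive D(k): ridge beats OLS in SMSE
```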

Fig. 1. The admissible range, individuals 1-4 (D(k) plotted against k, 0 ≤ k ≤ 1.0).

Table 4. Illustration of combined procedure for k selection.

                               Individual 1       Individual 2       Individual 3       Individual 4
                               OLS      CM        OLS      CM        OLS      PR        OLS      PR
k                              0        0.21      0        0.20      0        0.02      0        0.07
SMSE                           1.366    0.028     1.613    0.035     1.541    0.188     0.606    0.035
MSEP                           0.047    0.025     0.0521   0.028     0.028    0.019     0.020    0.011
RSS_k                          0.222    0.242     0.247    0.264     0.132    0.138     0.097    0.100
k_s (admissible range limit)   -                  -                  0.026              0.076
Parameter estimates
x1                             0.024    0.178     0.310    0.199     0.001    0.441     0.763    0.447
x2                             0.912    0.377     0.723    0.427    -0.682   -0.418     0.414    0.251
x3                             0.458    0.264     0.113    0.174     0.965    0.493     0.044    0.326
x4                            -0.183    0.282     0.005    0.274     0.808    0.509     0.086    0.240

5.5. The James-Stein estimator

If the X matrix in model (1) is written in correlation form, then the James-Stein

estimator of β is given by

β̂_JS = [1 − (p − 2)s²/(β̂′X′Xβ̂)] β̂,

where p is the number of elements in β, β̂ is the ordinary least squares estimator of β, s² is an unbiased estimator of the error variance σ², usually taken as the residual mean square, and v is the corresponding number of degrees of freedom (see Sclove (1968)). The James-Stein estimator has a smaller SMSE than β̂. Geometrically, this estimation technique amounts to shrinking β̂ toward the origin. It is to be noted that this shrinkage is applied equally to all of the elements of β̂. By contrast, the ridge approach applies more drastic shrinking where it has greater effect in reducing mean square error. A review of James-Stein estimation in regression is given in Draper and Van Nostrand (1979) (see also Judge and Bock (1983)).
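A sketch of James-Stein shrinkage in this setting, using one common textbook form of the estimator; the exact expression used in the text may differ in detail, and the positive-part truncation and the simulated data are assumptions added for the illustration.

```python
import numpy as np

def james_stein(X, y):
    """One common form of the James-Stein shrinkage estimator for regression,
    b_JS = [1 - (p - 2) s^2 / (b' X'X b)] b (see, e.g., the review by Draper and
    Van Nostrand (1979)).  A positive-part version is used to avoid sign reversal."""
    n, p = X.shape
    b = np.linalg.solve(X.T @ X, X.T @ y)
    s2 = np.sum((y - X @ b) ** 2) / (n - p)
    shrink = 1.0 - (p - 2) * s2 / (b @ X.T @ X @ b)
    return max(shrink, 0.0) * b                 # every coefficient shrunk by the same factor

rng = np.random.default_rng(11)
n = 30
X = rng.normal(size=(n, 4))
X = (X - X.mean(axis=0)) / np.sqrt(((X - X.mean(axis=0)) ** 2).sum(axis=0))
y = X @ np.array([0.3, 0.2, -0.1, 0.25]) + rng.normal(scale=0.5, size=n)
y = y - y.mean()

b_ols = np.linalg.solve(X.T @ X, X.T @ y)
print("OLS        :", np.round(b_ols, 3))
print("James-Stein:", np.round(james_stein(X, y), 3))
```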

6. Discussion and summary

In recent years, statisticians have devoted considerable attention to the multicollinearity problem. The fruit of their efforts, however, has not been reflected in the work of market researchers. The objective of this paper was to bridge the gap between 'developers and users' (Hocking (1983: 226)) in the context of marketing models. Diagnosis and treatment of multicollinearity is important for those market researchers who frequently use nonorthogonal data with linear models. This problem is also closely related to variable selection techniques, since if the data were orthogonal the problem of selection would not exist or would be trivial (Hocking (1976); see also Davidson et al. (1978)). Understanding the nature of multicollinearity is therefore essential. A diagnostic procedure to assess the condition of the data in regression analysis is recommended. The need for such a procedure


was demonstrated with examples taken from a variety of areas in marketing, which illustrated that data used by market researchers are very likely to be ill-conditioned. In the absence of a systematic assessment of these data, misleading conclusions could be drawn from regression estimates. Regression diagnostics should be applied routinely to assess model adequacy (e.g., Snee (1983)), and a summary (e.g., eigenvalues and VP's) should be reported, enabling the reader to judge the quality of the data and of the estimated parameters.

The traditional methods employed to deal with multicollinearity are dropping variables from the model and artificial orthogonalization. The former method should not be applied mechanically, nor should it be used to remedy sample-based multicollinearity. Model respecification of this kind should be undertaken only after careful examination of the data by means of influential-data diagnostics (e.g., Atkinson (1982a, b), Dempster and Gasko-Green (1981)) and collinearity diagnostics. Similarly, principal components should not be used without considering the method's limitations, and users should check whether it performs better than OLS (in terms of SMSE). These two methods were recently criticized (Mahajan et al. (1977), Sharma and James (1981)), and ridge regression was suggested as a preferable solution.

In the presence of an a priori estimate of β, however, the market researcher is advised to consider a Bayesian or mixed estimation procedure (e.g., Cattin et al. (1983), Toutenburg (1982)). In the absence of such explicit prior information, as is the case in many practical situations, ridge regression is recommended. The major objective of ridge regression is to obtain slightly biased estimates with low variances and thus to improve the scalar mean square error (SMSE). The practical problem is how much bias to introduce or, equivalently, which biasing parameter k to use.
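As an aside on implementation, the routine summary recommended above can be produced in a few lines of code. The following is a rough numpy sketch, not the diagnostic programs used in the paper; it assumes X holds the predictor columns and reports the eigenvalues of the correlation matrix, the condition indices, and the VIF's. All names are illustrative.

```python
import numpy as np

def collinearity_summary(X):
    """Eigenvalues, condition indices and VIFs for the correlation matrix of X."""
    R = np.corrcoef(X, rowvar=False)             # predictors in correlation form
    eig = np.sort(np.linalg.eigvalsh(R))[::-1]
    condition_indices = np.sqrt(eig[0] / eig)
    vif = np.diag(np.linalg.inv(R))              # variance inflation factors
    return eig, condition_indices, vif
```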


In spite of the fact that many studies have attempted to identify a superior ridge procedure via simulations, no single method has exhibited uniform superiority. Although on the whole these studies suggest the superiority of ridge estimates over OLS's, it is impossible to recommend any particular method for a given set of data. Our suggestion regarding the selection of k is that market researchers should examine their data with several ridge procedures. Several ridge regression procedures were introduced and illustrated with data taken from the marketing literature. It was further suggested that SMSE be used as the criterion, in that it guarantees superiority with regard to the precision of estimates as well as of predictions (i.e., MSEP). Researchers are also advised to examine the ridge trace in order to observe the stability of the estimates with respect to k.

A procedure to find a range within which any k would produce better estimates (in SMSE) than OLS was introduced. The range obtained by this procedure is very conservative and guarantees that any k within the range will produce estimates superior to those of OLS. It does not preclude, however, the existence of other well-performing k's. The strict application of this range might be desirable when the estimates are used for theoretical purposes or when the researcher wishes to be very conservative. In these latter cases, only the best method within the range should be used. In the presence of severe multicollinearity, however, the range is wide, a fact of which the data analyst ought to be aware.

In cases where the researcher is interested mainly in a subset of parameters, we suggest consideration of the following weighted criterion: SMSE_B = E[(β − β̂_R(k))'B(β − β̂_R(k))], where B is a diagonal matrix with entries of 1 or 0 according to the subset of parameters of interest. This criterion is of interest because it reflects only the set under consideration. Further, application of a theorem proved by Theobald


(1974) suggests that if the full set of a given ridge-estimated parameters outperforms OLS or other ridge estimates, any subset will outperform the corresponding OLS (or other ridge procedure) estimated subset as well. Practically, the data analyst has only to examine several procedures and choose the one that is superior in terms of scalar mean square error (SMSE); this guarantees that any subset of estimates is also superior. Biased estimation is promising and worthwhile, at the very least for gaining a better understanding of the variable space. This is consistent with other indications in the social sciences as to the potential benefits of biased estimation (e.g., Keren and Newman (1978)). The paper suggests several practical procedures to diagnose the condition of the data and select the biasing parameter. These procedures are easy to implement and are available with current statistical packages.
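For readers who wish to apply the weighted criterion above, the sketch below (a plug-in approximation, not part of the original analysis) estimates SMSE_B for a chosen 0/1 subset; X, y and the subset indicator are assumed to be supplied by the user, with X in correlation form and y centered.

```python
import numpy as np

def est_weighted_smse(X, y, k, subset):
    """Plug-in estimate of E[(beta - b_R(k))' B (beta - b_R(k))] for a 0/1 subset.

    `subset` is an indicator list, e.g. [1, 0, 1, 0]; OLS quantities replace
    the unknown beta and sigma^2."""
    n, p = X.shape
    XtX = X.T @ X
    b_ols = np.linalg.solve(XtX, X.T @ y)
    s2 = np.sum((y - X @ b_ols) ** 2) / (n - p - 1)
    B = np.diag(np.asarray(subset, dtype=float))
    W = np.linalg.solve(XtX + k * np.eye(p), XtX)     # b_R(k) = W b_ols
    cov = s2 * W @ np.linalg.inv(XtX) @ W.T           # Var[b_R(k)]
    bias = (W - np.eye(p)) @ b_ols                    # plug-in bias of b_R(k)
    return np.trace(B @ cov) + bias @ B @ bias
```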

Appendix A

Remedial measures

Omission of variables from the model. Let us partition the data matrix into two sets of variables, X = (X_r, X_{p-r}), where X_{p-r} contains the variables that are highly collinear with the remaining predictors X_r. Assume that X_{p-r} is omitted from the model. The 'new' estimates are given by:

\hat{\beta}_r = (X_r' X_r)^{-1} X_r' Y.   (6)

Principal components

The model. For the discussion below consider the linear model given in eq. (1). Define new variables by the following transformation: Z = XA, where the columns of A are


the orthonormalized eigenvectors of X'X associated with the corresponding eigenvalues. If some eigenvalues are very small, the researcher may reduce the dimensionality of the data in the hope of obtaining 'better' estimates. This is done by using only those eigenvectors associated with large eigenvalues: Z_t = XA_t, t < p, where A_t is obtained from A by deleting the eigenvectors associated with small eigenvalues. The model in the new variables is Y = Z_t α_t + ε, with α̂_t = (Z_t'Z_t)^{-1}Z_t'Y, and the principal components estimates for the original model (i.e., eq. (1)) are β̂_PC = A_t α̂_t.

SMSE comparison. A criterion that can guide the decision whether to use principal components is the difference SMSE(β̂_OLS) − SMSE(β̂_PC). It can be shown (e.g., Dhrymes (1978)) that this difference is given by:

\sigma^2 \sum_{i=t+1}^{p} \frac{1}{\lambda_i} - \beta'\beta + \alpha_t'\alpha_t.   (7)

Since β, α_t and σ² are unavailable, they can be replaced by β̂_OLS, α̂_t, and σ̂²_OLS, respectively. In cases of high multicollinearity the term σ² Σ_{i=t+1}^{p} (1/λ_i) has a very high value, since the eigenvalues included in the sum (i = t+1, ..., p) are very small. This suggests that under high multicollinearity the difference in SMSE (i.e., eq. (7)) is probably positive and β̂_PC is preferred to β̂_OLS. Using the term in (7) is also very conservative, since β̂'_OLS β̂_OLS, which is used in place of β'β, is typically inflated. Similarly, for the decision whether to use the mth component, compute σ̂²/λ_m − α̂_m²; if it is positive, do not use the mth component, but rather a model with m − 1 components.

Testing of parameter subsets in less than full rank models. Recently Khuri (1980) introduced an algorithm for the selection of subsets of predictor variables that provide almost the same information as the full set of predictors in less than full rank models (as in the case of exact multicollinearity, or principal component regression where some eigenvalues are set to zero). In this algorithm a preliminary screening procedure is used to determine which subsets of the parameters in the full model are 'testable'. Tests of significance of testable subsets are subsequently performed on the basis of Scheffé's Maximum Principle. A testable subset of the parameters is 'adequate' if the corresponding model provides a regression sum of squares not significantly less than that of the full model. The selection of 'important' predictor variables is then made from those predictor variables which correspond to 'adequate' subsets of the parameters.
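To make the principal-components material above concrete, the following is a small numpy sketch (an illustration under the assumption that X is in correlation form and y is centered, not the authors' code): pc_regression computes β̂_PC for a chosen number of components t, and smse_gain evaluates the plug-in version of the difference in eq. (7), a positive value favouring the principal-components estimates.

```python
import numpy as np

def pc_regression(X, y, t):
    """Principal component estimates keeping the t largest-eigenvalue components."""
    XtX = X.T @ X
    lam, A = np.linalg.eigh(XtX)
    order = np.argsort(lam)[::-1]
    lam, A = lam[order], A[:, order]
    A_t = A[:, :t]                                     # retained eigenvectors
    Z_t = X @ A_t                                      # component scores
    alpha_t = np.linalg.solve(Z_t.T @ Z_t, Z_t.T @ y)
    return A_t @ alpha_t                               # estimates for eq. (1)

def smse_gain(X, y, t):
    """Plug-in version of the difference in eq. (7); positive favours PC estimates."""
    n, p = X.shape
    XtX = X.T @ X
    lam = np.sort(np.linalg.eigvalsh(XtX))[::-1]
    b_ols = np.linalg.solve(XtX, X.T @ y)
    s2 = np.sum((y - X @ b_ols) ** 2) / (n - p - 1)
    b_pc = pc_regression(X, y, t)
    # alpha_t'alpha_t equals b_pc'b_pc because the retained eigenvectors are orthonormal
    return s2 * np.sum(1.0 / lam[t:]) - b_ols @ b_ols + b_pc @ b_pc
```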

Appendix B

Ridge regression

The model. The ridge estimator of β in model (1) is:

\hat{\beta}_R(k) = (X'X + kI)^{-1} X'Y,   (8)

where k is a positive constant. The scalar mean square error (SMSE) is given by:

SMSE = \sigma^2 \sum_{i=1}^{p} \frac{\lambda_i}{(\lambda_i + k)^2} + k^2 \beta'(X'X + kI)^{-2}\beta,   (9)

where the λ_i are the eigenvalues of X'X. Following Sharma and James (1981), SMSE was estimated with β̂_R(k) replacing β. We note that the OLS estimator of β is invariant under reparameterization of the model in (1), but the ridge regression estimator is not. To illustrate this, write model (1) as Y = Zγ + ε, where Z = XA and γ = A^{-1}β, with A a nonsingular matrix of order p × p. The OLS estimator of γ is γ̂ = (Z'Z)^{-1}Z'Y = A^{-1}β̂,

whereas the ridge regression estimator of γ is

\hat{\gamma}_R(k) = (Z'Z + kI)^{-1} Z'Y,

which differs from A^{-1}β̂_R(k). The mean square error of prediction (MSEP) is given below, together with its derivation. Let

\hat{\beta}_R(k) = W\hat{\beta},  where  W = [I + k(X'X)^{-1}]^{-1},

so that E[β̂_R(k)] = Wβ, and

Var[\hat{\beta}_R(k)] = W(X'X)^{-1}W'\sigma^2 = [I + k(X'X)^{-1}]^{-1}(X'X)^{-1}[I + k(X'X)^{-1}]^{-1}\sigma^2 = [X'X + 2kI + k^2(X'X)^{-1}]^{-1}\sigma^2 = G\sigma^2.

Then

MSEP[\hat{\beta}_R(k)] = E[(\hat{\beta}_R(k) - \beta)'X'X(\hat{\beta}_R(k) - \beta)] = \sigma^2\, tr[X'XG] + \beta'(I - W)'X'X(I - W)\beta.

The estimate of MSEP is given by:

\widehat{MSEP}[\hat{\beta}_R(k)] = s^2\, tr[X'XG] + \hat{\beta}_R(k)'(I - W)'X'X(I - W)\hat{\beta}_R(k).   (10)

The admissible range. The interval (0, k₀) can be obtained by equating the following function to zero (Goto and Matsubara (1979)):

D(k) = \sum_{i=1}^{p} \frac{1}{\lambda_i} - \sum_{i=1}^{p} \frac{\lambda_i}{(\lambda_i + k)^2} - \frac{k^2}{\sigma^2}\,\frac{\beta'\beta}{(\lambda_p + k)^2},   (11)

where λ_p is the smallest eigenvalue of X'X. The nontrivial solution of the equality (i.e., D(k) = 0) provides k₀.⁸ In order to solve this equation, the OLS estimate β̂'_OLS β̂_OLS is used to replace β'β, and the OLS error mean square (s² = RSS_OLS/(n − p − 1)) is used to replace σ². The term β̂'_OLS β̂_OLS/s² is expected to be larger than the term for which it substitutes, and thus a very conservative estimate of k₀ results.⁹ It is also interesting to note that for very ill-conditioned data the smaller eigenvalues are close to zero, so that Σ_{i=1}^{p} 1/λ_i is very large. In such cases D(k) is either always positive or positive over a very wide range of k.

⁸ This analysis is performed with the Newton-Raphson algorithm, which is available in current computational packages.
⁹ β̂'_OLS β̂_OLS is larger than β'β. It remains to be shown that s² underestimates σ² in order for β̂'_OLS β̂_OLS/s² to be larger than β'β/σ². Since (n − p)s²/σ² is distributed as a central chi-square with n − p degrees of freedom, the probability that s² underestimates σ² is higher than 0.5.
¹⁰ Latent root regression is another method which has been found to be useful; it is discussed in detail and illustrated with marketing data by Sharma and James (1981) (see also Webster et al. (1974), Gunst et al. (1976), Gunst and Mason (1977b)). It is interesting to note that in their example, based on the Mahajan et al. (1977) data, the ridge trace method produced a smaller SMSE.

Algorithms for the biasing parameter ¹⁰

The Ridge Trace Method (RT)

The objective is to choose a k such that β̂_R(k) will be stable with respect to k (Hoerl and Kennard (1970a, b)). This is done by plotting β̂_R(k) versus k and choosing k subjectively. The method is very easy to implement and was superior to OLS in simulations. A discussion of this method, illustrated with marketing data, may be found in Mahajan et al. (1977). The ridge trace method is criticized mainly for the subjective selection of k, which makes

/ Mutticollineariiy

k stochastic (e.g., Sharma and James (1981)). Considerable support, however, exists for this method due to its simplicity and convenience (e.g., Montgomery and Peck (1982: 340), Marquardt and Snee(1975)). This is also the most frequent method used. Another trace-oriented method was introduced by Vinod (1976). This method suggestsinstead of k a parameter termed ‘multicoll$rearty allowance’ given by m =p I

- ,Fl h.+k’

This method enables a nonsto-

chastic bperationalization of the concept of stable region, and provides a clearer plot for the selection of the biasing parameter. Vinod also demonstrated that m could not give the misleading impression of greater stability for large m based on orthogonal data. An illustration of this method with marketing data may be found in Jagpal (1982). Hoerl, Kennard

and Baldwin

Method

(HM)

The origin of this method is connected with Generalized Ridge regression(Hoer1 and Kennard (1970a), Goldstein and Smith (1974)).It may be easily understood by considering the principal component model; the traceof the mean square error in terms of this model is given by:

(12) where yj is the ith element in the vector y = Pp, with P being an orthogonal matrix whose columns are eigenvectors of X’X. Minimizing SMSE with respect to ki results a2 in the optimum k,‘s: kj = 1, i = I,.. . , p. Yi

Hoer1 et al. (1975) suggested using the harmonic mean of the k,‘s which results in a single biasing parameter (see also Farebrother (1975)). k = E 2 (I& and p, respectively). l

and fro,, replace a2

in marketing

models

201

This estimator performed well in simulations conducted by Hoer1 et al. (1975) and Gibbons (1981). In the simulation study conducted by Goto and Matsubara (1979) it performed best of nine methods, although somewhat less promising results were obtained by Wahba (1977) and Wichern and Churchill (1978). Draper and Smith (1981) pointed out yet another advantage of this method in showing that it is a reasonable choice from a Bayesian perspective. Hoer1 and Kennard

Iterative

Method

(HKIM)

A modification of HM was introduced by Hoer1 and Kennard (1976); it is given by: kj=po2/~~~,-,~~~~,-,,, where p in the original formulation (see HM above) is replaced by the estimates of the previous stage, starting with OLS, and continuing until (kj kjel)/k,-, < 20[trace (X’X)-‘/p]-‘.3. In general the iterative method (i.e., HKIM) was found superior to HM in simulations. Trace Method

(T I and T II)

The general purpose of ridge regression is to stabilize the system in such a way that it has characteristics of an orthogonal system (Hoer1 and Kennard (1970a)). A major indicator of stability is the UF’s (Marquardt (1970)). It has also been demonstrated that stability of the ridge estimates with respect to k implies I’1F’s similar to those in an orthogonal system (Goto and Matsubara (1979)). This objective could be operationalized by selecting k so that all VIF’s would be close to one (e.g., Marquardt (1970), Price (1977)). This method, however, is subjective and unsatisfactory. The trace methods presented below quantify this objective analytically. The first trace method (T I) is given by trace (A) =p, where, A = (X’X+ kI)-‘X’X (X’X + kI)-‘. The second trace method (T II) is given by Trace A = p/( I + k) 2 (Goto

C. Ofir,

202

A. Khuri

/ Multicollinearity

and Matsubara (1979)). In both methods the equations are solved for k. The resulting ‘optimal’ ridge k provides I/IF’s which are close to those found in an orthogonal system. The empirical attractiveness of these methods is illustrated in the text (see table 4). McDonald

and Galarneau

Method

(MG)

The length of the OLS coefficient vector is very high and biased for ill-conditioned data. McDonald and Galarneau (1975) suggested that k be selected that would provide a smaller coefficient vector, which is in fact an unbiased estimate of j3’p. The method is given by:

i=l

where a^2 is an unbiased estimate of u2. When the right-hand side of eq. (13) is positive, choose a k which satisfies the equation, otherwise k = 0. This method performed well in several simulations (e.g., Gibbons (1981) McDonald and Galarneau (1975)). The Correlation

Method

(CM)

The correlations among the fiR( k) can be expressed as a function of k, and a criterion can be formulated based on the sum of squares of the correlations between the coefficients as suggested by Obenchain (1975). It is given by: * = Zpi,jRfi, where Rij is the (i,j)th element of the matrix R = (Diag A)-l12 A(Diag A)-‘12, and A = (X’X + kI)-lX’X ( X’X + k1) - ‘. The selected k is the one minimizing q. The attractiveness of this method is illustrated in the paper (see tables 4 and 5). The F-Method

(FM)

This method is given by k = 1/F, where F is the statistic associated with the overall test of the OLS model (Cattin (1981)). The method is very easy to use, requiring only a single OLS estimation. Cattin (1981) applied this method to 'real' data sets and concluded that it performed well.

PRESS (PR)

PRESS, a relatively new data-splitting technique, was introduced by Allen (1974). The method predicts the ith observation (y_i) with the model estimated from the remaining n − 1 observations. This is repeated for all n observations. PRESS is defined as the sum of squares of the omitted residuals. This technique simulates actual prediction and is recommended for model validation. For selection of the biasing parameter, PRESS is expressed as a function of k; the value of k that minimizes PRESS is selected as the biasing parameter (e.g., Goto and Matsubara (1979)). Simplifying forms for computation are provided by Goto and Matsubara (1979) and Montgomery and Peck (1982). The technique's empirical attractiveness was illustrated in the paper.

References Allen, D.M., 1974. The relationship between variable selection and data augmentation and a method for prediction. Technometrics 16, 125-127. Atkinson, A.C., 1982a. Regression diagnostics, transformations and constructed variables. Journal of the Royal Statistical Society 44, l-22. Atkinson, A.C., 1982b. Robust and diagnostic regression analysis. Communications in Statistics All, 2559-2572. Bartlett, M.S., 1954. A note on multiplying factors for various &i-squared approximation. Journal of the Royal Statistical Society, Series B 16, 296-298. Beaton, Albert E., Donald Rubin and John L. Barone, 1976. The acceptability of regression solutions: Another look at computational accuracy. Journal of American Statistical Association 71 (353), 158-168. Bechtel Gordon G., 1973. Nonlinear submodels of orthogonal linear models. Psychometrika 38(3), 379-392. Bechtel, Gordon G. and P.J. O’Connor, 1979. Testing micropreference structures. Journal of Marketing Research 16, 247-257. Bechtel, Gordon G., 1981. Metric information for group representation. In: Ingwer Borg (ed.), Multidimensional data representations: When and why. 441-419. Ann Arbor, MI: Mathesis Press.


Belsley, David A., 1984. Demeaning conditioning diagnostics through centering. The American Statistician 38(2), 73-77. Belsley David A., Edwin Kuh and Roe E. Welsch, 1980. Regression diagnostics. New York: John Wiley and Sons. Berk, K.N., 1977. Tolerance and condition in regression computations. Journal of the American Statistical Association 72, 863-866. Bloom, Paul N. and Frank J. Franzak, 1981. Can consumers be protected from themselves? The case of distilled spirits. In: Andrew A. Mitchell (ed.), Advances in consumer research Vol. IX, 315-320. Ann Arbor, MI: Association of Consumer Research. Bonfield, E.H., 1974. Attitude, social influence, personal norm, and intention interactions as related to brand purchase behavior. Journal of Marketing Research 11, 379-389. Bradley, Ralph A. and Sushi1 S. Srivastava, 1979. Correlation in polynomial regression. The American Statistician 33(l), 11-14. Bright, J.W. and G.S. Dawkins, 1965. Some aspects of curve fitting using orthogonal polynomials. Industrial and Engineering Chemistry Fundamentals 4, 93-97. Brodie, Roderick and Comelis A. de Kluyver, 1984. Attraction versus linear and multiplicative market share model: An empirical evaluation. Journal of Marketing Research 21, 194-201. Cattin, Philippe, 1981. The predictive power of ridge regression: Some quasi-simulation results. Journal of Applied Psychology 66(3), 282-290. Cattin, Philippe, Alan E. Gelfand and Jeffrey Danes, 1983. A simple Bayesian procedure for estimation in a conjoint model. Journal of Marketing Research 20, 29-35. Chamberlain, G. and E.E. Learner, 1976. Matrix weighted averages and posterior bounds. Journal of the Royal Statistical Society 38, 73-84. Chow, Gregory C., 1983. Econometrics. London: McGraw-Hill. Churchill, Gilbert A., Neil M. Ford and Orville C. Walker Jr., 1976. Organizational climate and job satisfaction in the saleforce. Journal of Marketing Research 13, 323-332. Clarke, Darral C. and John McCann, 1977. Cumulative advertising effects: The role of serial correlations; a reply. Decision Sciences 8, 336-343. Cochrane, D. and G.H. Orcutt, 1949. Application of least squares regression to relationships containing autocorrelated error terms. Journal of the American Statistical Association 44, 32-61. Cohen, Jacob and Patricia Cohen, 1975. Applied multiple regression/correlation analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum. Daniel, Cuthbert and Fred S. Wood, 1980. Fitting equations to data. (2nd edition.) New York: John Wiley and Sons. Darlington, Richard B., 1978. Reduced-variance regression. Psychological Bulletin 85, 1238-1255. Davidson J.E.H., D.F. Hendry, F. Srba and S. Yeo 1978. Econometric modelhng of the aggregate time-series relationship between consumers’ expenditure and income in the United Kingdom. The Economic Journal 88,661-692. Dempster, A.P., Martin Schatzoff and Nanny Wermuth, 1977. A simulation study of alternatives to ordinary least squares. Journal of the American Statistical Association 72(357), 77-91.


Dempster, A.P. and M. Gasko-Green, 1981. New tools for residual analysis. Annals of Statistics 9, 945-959. Dhrymes, Phoebus J., 1978. Introductory econometrics. New York: Springer Verlag. Draper, N.R. and R.C. Van Nostrand, 1979. Ridge regression and James-Stein estimation: Review and comments. Technometrics 21, 451-466. Draper, N.R. and H. Smith, 1981. Applied regression analysis. New York: John Wiley and Sons. Efron, B. and C. Morris, 1977. Comment to ‘A simulation study of alternatives to ordinary least squares’ by A.P. Dempster, M. Schatzoff and N. Wermuth. Journal of the American Statistical Association 72, 91-93. Erickson, Gary M., and David B. Montgomery, 1980. Measuring the time-varying responses to market communication instruments. Proceedings of the First ORSA/TIMS Conference on Market Measurement and Analysis, June, 55-68. Farebrother, R.W., 1975. The minimum mean square error linear estimator and ridge regression. Technometrics 17(l), 127-128. Farrary, Donald E. and Robert F. Glauber, 1967. Multicollinearity in regression analysis: The problem revisited. The Review of Economics and Statistics 49, 92-107. Fart-is, Paul W. and Robert D. Buzzele, 1979. Why advertising and promotional costs pay: Some cross sectional analysis. Journal of marketing 43, 112-122. Geary, R.C. and C.E.V. Leser, 1968. Significance tests in multiple regression. The American Statistician 22, 20-21. Gibbons, D.G., 1981. A simulation study of some ridge estimators. Journal of the American Statistical Association 76, 131-139. Goto, Masashi and Yoshihiro Matsubara, 1979. Evaluation of ordinary ridge regression. Bulletin of Mathematical Statistics 20, 1-35. Green, Paul E., 1978. Analysing multivariate data. Hinsdale, IL: The Dryden Press. Green, Paul E. and V. Srinivasan, 1978. Conjoint analysis in consumer research: Issues and outlook. Journal of Consumer Research 5, 103-123. Goldstein, M. and A.F. Smith, 1974. Ridge type estimators for regression analysis. Journal of the Royal Statistical Society B 36, 284-291. Gunst, R.F., 1984. Toward a balanced assessment of collinearity Diagnostics. The American Statistician 38, 79-82. Gunst, R.F., J.T. Webster and R.L. Mason, 1976. A comparison of least squares and latent root regression estimators. Technometrics 18, 75-83. Gunst, R.F. and Robert L. Mason, 1977a. Advantages of examining multicollinearities in regression analysis. Biometrics 33, 249-260. Gunst, R.F. and R.L. Mason, 1977b. Biased estimation in regression: An evaluation using mean square error. Journal of the American Statistical Association 72, 616-628. Haitovsky, Yoel, 1969. Multicollinearity in regression analysis: Comment. The Review of Economics and Statistics 50, 486-489. Hemmerle, W.J. and T.F. Brantle, 1978. Explicit and constrained generalized ridge regression. Technometrics 20, 109-120.


Hendry, D.F., 1979. Predictive failure and econometric modelling in macroeconomics: The transactions demand for money. In: Paul Ormerod (ed.), Economic modelling, 217-242. London: Heinemarm. Hendry, D.F., 1983. Econometric modelling: The ‘consumption function’ in Retrospect. Scottish Journal of Political Economy 30, 193-220. Hocking R.R., 1976. The analysis and selection of variables in linear regression, Biometrics 32, l-49. Hocking, R.R., 1983. Developments in linear regression methodology: 1959-1982. Technometrics 25, 219-229. Hocking, R.R. and O.J. Pendleton, 1983. The regression dilemma. Communication in Statistics-Theory and Methodology 12,497-527. Hoerl, Arthur E., 1954. Fitting curves to data. In: J.H. Perry (ed.), Chemical business handbook, Section 20, 55-77. New York: McGraw Hill. Hoerl, Arthur E. and Robert W. Kennard, 1970a. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12, 55-67. Hoer], Arthur E. and Robert W. Kennard, 1970b. Ridge regression: Application to nonorthogonal problems. Technometrics 12, 69-82. Hoerl, Arthur E. and Robert W. Kennard, 1976. Ridge regression: Iterative estimation of the biasing parameter. Communications in Statistics A5, 77-88. Hoerl, Arthur E., Robert W. Kennard and Kent F. Baldwin, 1975. Ridge regression: Some simulations. Communications in Statistics A4, 105-123. Holbrook, Morris B., 1978. Beyond attitude structure: Toward the informational determinants of attitude. Journal of Marketing Research 15, 545-546. Hotelling H., 1933. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology 24.417-441.489-520. Houston, Franklin S. and Doyle L. Weiss, 1975. Cumulative advertising effects: The role of serial correlation. Decision Sciences 6, 471-481. Jagpal, S. Harsharanjeet, 1982. Multicollinearity in structural equations models with unobservable variables. Journal of Marketing Research 19, 431-439. Judge, G.G. and M.E. Bock, 1983. Biased estimation. In: Z. Griliches and M.D. Intriligator (eds.), Handbook of Econometrics, 599-649. Amsterdam: Elsevier. Keren, Gideon and Robert J. Newman, 1978. Additional consideration with regard to multiple regression and equal weighting. Organizational Behavior and Human Performance 22, 143-164. Khuri, Andre I., 1980. Simultaneous testing of parameter subsets in less than full rank models, Communications in Statistics A9, 617-627. Kumar, K’rishna T., 1975. Multicolinearity in regression analysis. The Review of Economics and Statistics 57, 365-366. Lawless, J.F., 1978. Ridge and related estimation procedures: Theory and practice. Communications in Statistics A7, 139-164. Lawless, J.F. and P. Wang, 1976. A simulation study of ridge and other regression estimators. Communications in statistics A5, 307-323.


Learner, E.E., 1973. Multicollinearity: A Bayesian interpretation. Review of Economics and Statistics 55, 371-380. Learner, E.E., 1978. Specifications searches. New York: John Wiley. Learner, E.E., 1982. Sets of posterior means with bounded variance priors. Econometrica 50, 725-736. Learner, E.E., 1983. Model choice and specification analysis. In: Z. Griliches and M.D. Intirligator (eds.), Handbook of Econometrics, Vol. I., 285-330. Amsterdam: Elsevier. Lehmann, Donald R., 1971. Television show preference: Application of a choice model. Journal of Marketing Research 8, 47-55. Lehmann, Donald R., Terrence V. O’Brien, John U. Farley and John A. Howard, 1974. Some empirical contribution to buyer behavior theory. Journal of Consumer Research 1, 43-55. Lilien, Gary L. and Philip Kotler, 1983. Marketing decision making: A model building approach. New York: Harper and Row. Lin, K. and J. Kmenta, 1982. Ridge regression under altemative loss criteria. Review of Economics and Statistics 64, 488-494. Lindberg, Bertil C., 1982. International comparison of growth in demand for a new durable consumer product. Journal of Marketing Research 19, 364-371. Lucas, Henry C., Charles B. Weinberg and Kenneth W. Clowes, 1975. Sales response as a function of territorial potential and sales representative workload. Journal of Marketing Research 12, 298-305. Mahajan, Vijay, Arun K. Jain and Michael Bergier, 1977. Parameter estimation in marketing models in the presence of multicollinearity: An application of ridge regression. Journal of Marketing Research 14, 586-591. Mallows, C.L., 1973. Some comments of C,. Technometrics 15, 661-675. Marquardt, Donald W., 1970. Generalized inverses, ridge regression, biased linear estimation and nonlinear estimation. Technometrics 12, 591-612. Marquardt, Donald W., 1980. You should standardize the predictor variables in your regression models. Journal of the American Statistical Association 75 (1369), 87-91. Marquardt, Donald W. and Ronald D. Snee, 1975. Ridge regression in practice. The American Statistican 29, 3-20. Mason, Robert L., R.F. Gunst and J.T. Webster, 1975. Regression analysis and problems of multicollinearity. Communications in Statistics A4, 277-292. Massy W.F., 1965. Principal component regression in exploratory statistical research. Journal of the American Statistical Association 60, 234-256. McDonald, Gary C. Galarneau and Diane I. Galameau, 1975. A Monte Carlo evaluation of some ridge type estimators. Journal of American Statistical Association 70(350), 407-416. Montgomery, Douglas C. and Elizabeth A. Peck, 1982. Introduction to linear regression analysis. New York: John Wiley and Sons. Naert, Philippe and Peter Leeflang, 1978. Building implementable marketing models. Leiden-Boston: Martinus Nijhoff Social Sciences Division.


Obenchain, R.L., 1975. Ridge analysis following a preliminary test of the Shrunken Hypotheses. Technometrics 17, 431-441. Ofir, Chezy and John Lynch, 1984. Context effects on judgment under uncertainty. Journal of Consumer Research 11, 669-679. Ofir, Chezy and Adi Rave, 1986. Forcasting demand in international markets: The case of correlated time series. Working Paper, The Hebrew University of Jerusalem. Oliver, Richard, 1980. A cognitive model of the antecedents and consequences of satisfaction decisions. Journal of Marketing Research 17, 460-469. Palda, Kristian S., 1964. The measurement of cumulative advertising effects. Englewood Cliffs, NJ: Prentice Hall. Parsons, Leonard Jon, and Piet Vanden Abeele, 1981. Analysis of sales call effectiveness. Journal of Marketing Research 18,107-113. Price, Bertram, 1977. Ridge regression: Application to nonexperimental data. Psychological Bulletin 84, 759-766. Samli, A. Coskun and Joseph M. Sirgy, 1980. A multidimensional approach to analysing store loyalty: A predictive model. In: Kenneth Bernhardt et al. (eds.), 1981 Educators’ Conference Proceedings, 113-116. Sclove, S.L., 1968. Improved estimators for coefficients in linear regression. Journal of the American Statistical Association 63, 596-606. Sharma, Subhas and William L. James, 1981. Latent root regression: An alternate procedure for estimating parameters in the presence of multicollinearity. Journal of Marketing Research 18, 154-161. Silvey, SD., 1969. Multicollinearity and imprecise estimation, Journal of the Royal Statistical Society B, 31, 539-552. Smith, G. and F. Campbell, 1980. A critique of some ridge regression methods. Journal of the American Statistical Association 75, 74-81. Snee, Ronald D., 1983. Discussion, Technometrics 25, 230-236. Snee, Ronald D. and Donald W. Marquardt, 1984. Collinearity diagnostics depend on the domain of prediction, the model, and the data. The American Statistician 38(2), 83-87. Theil, H., (1963). On the use of incomplete prior information in regression analysis. Journal of the American Statistical Association 58. 401-414. Theil, H. and A.S. Goldberger, 1961. On pure and mixed statistical estimation in economics. International Economic Review 2. 65-78.


Theobald, CM., 1974. Generalizations of mean square error applied to ridge regression. Journal of the Royal Statistical Society B, 36, 103-106. Toutenburg, Helga, 1982. Prior information in linear models. New York: John Wiley and Sons. Vinod, Hrishikesh D., 1976. Application of new ridge regression methods to a study of bell system scale economics. Journal of the American Statistical Association 71(356), 835-841. Wahba, Grace, 1977. A survey of some smoothing problems and the method of generalized cross-validation for solving them. In: Paruchuri R. Knishnaiah (ed.), Applications in Statistics, 507-523. New York: Elsevier. Wampler Roy H., 1970. A report on the accuracy of some widely used least squares computer programs. Journal of the American Statistical Association 65(330), 549-565. Webster, J.T., R.F. Gunst and R.L. Mason, 1974. Latent root regression analysis. Technometrics 16(4), 513-522. Weiss, Doyle L., Franklin S. Houston and Piere Windal, 1978. The periodic pain of Lydia E. Pinkham. Journal of Business 51, 91-101. Wichern. D.W. and G.A. Churchill, 1978. A comparison of ridge estimators. Technometrics 20, 301-311. Wildt, Albert R. and Russell S. Winer, 1979. Specification and estimation of dynamic market systems. Working paper #79-057. College of Business Administration, The University of Georgia. Wilkie, William L. and Edgar Pessemier, 1973. Issues in marketing’s use of multi-attribute attitudes models, Journal of Marketing Research 10, 428-441. Willan, Andrew R. and Donald G. Watts, 1978. Meaningful multicollinearity measures. Technometrics 20, 407-412. Winer, Russel, S., 1979. An analysis of the time varying effects of advertising: The case of Lydia Pinkham. Journal of Business 52(4), 563-576. Wood, Fred S., 1984. Effect of centering on collinearity and interpretation of the constraint. The American Statistican 38, 88-90. Zellner, A., 1971. An introduction to Bayesian inference in econometrics. New York: John Wiley and Sons.