A Bayesian multidimensional scaling procedure for the spatial analysis of revealed choice data


Journal of Econometrics 89 (1999) 79—108

Wayne S. DeSarbo *, Youngchan Kim, Duncan Fong

Marketing Department, Smeal College of Business, Pennsylvania State University, University Park, PA 16802, USA
Department of Marketing Research, Faculty of Economics, University of Groningen, The Netherlands
Management Science and Information Systems Department, Smeal College of Business, Pennsylvania State University, PA 16802, USA

Abstract

We present a new Bayesian formulation of a vector multidimensional scaling procedure for the spatial analysis of binary choice data. The Gibbs sampler is gainfully employed to estimate the posterior distribution of the specified scalar products, bilinear model parameters. The computational procedure allows for the explicit estimation of a covariance matrix which can accommodate violations of IIA due to context effects. In addition, posterior standard errors can be estimated which reflect differential degrees of consumer choice uncertainty and/or brand position instability. A marketing application concerning the analysis of consumers’ consideration sets for luxury automobiles is provided to illustrate the use of the proposed methodology. © 1999 Elsevier Science S.A. All rights reserved.

Keywords: Bayesian analysis; Multidimensional scaling; Choice models; Market structure analysis; Context effects; Consideration sets

* Corresponding author. E-mail: [email protected]
0304-4076/99/$, see front matter © 1999 Elsevier Science S.A. All rights reserved. PII: S0304-4076(98)00056-6

1. Introduction

The analysis of binary choice data has been the focus of a number of alternative statistical procedures. The econometric literature primarily deals with modeling ‘pick-1/J’ data (e.g., a consumer buys one car from a large
assortment of cars) in non-spatial, regression-type manners which attempt to relate the choices being made to various explanatory variables that describe the attributes of the choice alternatives and/or economic agents making such choices. Here, typical regression-like approaches such as conditional logit (McFadden, 1976), conditional probit (Hausman and Wise, 1978), etc., are utilized. The marketing and psychometric literature provides a somewhat different perspective to the analysis of such binary choice data where interest typically lies in deriving a spatial representation of the elements of the rows (e.g., subjects or consumers) and columns (e.g., stimuli or brands) of empirical binary choice data (cf. Takane, 1983; DeSarbo and Hoffman, 1986, 1987; DeSarbo and Cho, 1989; DeSarbo et al., 1990; Jedidi and DeSarbo, 1991; DeSarbo et al., 1994). A number of multidimensional scaling (MDS) methodologies (including optimal scaling and correspondence analysis approaches, cf. Nishisato (1980) and Greenacre (1984)) have been introduced into the psychometric literature which have been devised for the analysis of ‘pick any/J’ (e.g., a consumer buys or considers buying any number of brands in a designated product class) data where the model can be utilized to accommodate choice situations when complementary or multiple-purchased brands are bought with correspondingly high probabilities. For example, in the case of soft drink usage, a consumer may purchase one or more different brands of soft drinks at the grocery store for different usage occasions or for variety seeking reasons. Or, in the measurement of consumers’ consideration sets, interest lies in the subset of brands considered just before final choice. Many of these spatial MDS models have traditionally been based on random utility theory with the traditional assumptions on the distribution of error in the utility function akin to conditional logit and probit models utilizing maximum likelihood estimation methods. 
These joint space MDS procedures have been a dominant influence in competitive market structure analyses given their reliance on actual or intended choices (i.e., buying behavior) from consumers, as opposed to stated preferences or proximity judgments. These MDS models, however, have their own limitations concerning assumptions and estimation methods. First, most explicitly assume independence among brands and individuals, primarily because of computational convenience in calculating the resulting likelihood function. This independence assumption across consumers is typically reasonable, but independence across brands is very questionable for most marketing applications (DeSarbo and Cho, 1989). For example, in consumer research, there can be order effects where the probability of a consumer choosing the jth brand presented may be a function of the previous J − 1 brands presented and the respective choice decisions made. Similarly, with actual brand choices within the same product class (e.g., breakfast cereals), substitution and complementarity (context) effects influence final brand purchases. However, due to the computational difficulties in computing such maximum likelihood estimates (MLE), researchers in this spatial MDS
area have rarely relaxed this (brand) independence (or IIA) assumption. If choice behavior is modeled without considering this possible interaction, the model may lead to conclusions which are not supported by the actual choice situation. This choice dependency has been discussed in the following contexts: the decision to choose one brand affecting the choice of the other brands (Elrod and Keane, 1995), boredom and fatigue in answering questionnaires (Bijmolt and Wedel, 1995), substitution and complementarity (Kim et al., 1997), and context effects (Tversky and Simonson, 1993) as deviations from the IIA assumption. Secondly, a vast literature in psychometrics and econometrics concerned with the analysis of binary data estimate such choice models using maximum likelihood with inferences about the model based on the associated asymptotic theory. Due to the discrete nature of the dependent variables, a much larger sample size may be required for accurate asymptotic approximations than the size of samples needed to apply asymptotics to standard continuous dependent variable models. Also, MLE approaches require a method for evaluation of choice probabilities which requires J multidimensional integrals of a multivariate normal distribution over a cone, where J is the number of choice alternatives, and extremely accurate estimates of these choice probabilities are required. Thirdly, using the maximum likelihood procedures in estimating incidental parameters (e.g., consumer points or vectors) affects the validity of the statistical properties of estimators and associated tests. To deal with the estimation problems related with incidental parameters, the consumer points or vectors can be reparameterized as linear functions of the subject characteristics (e.g., demographics, psychographics, past consumption patterns, cf. 
DeSarbo and Rao (1986)) or an approach that assumes the consumers can be grouped into a small number of homogeneous clusters or classes, Latent Class MDS (cf. Lazarsfeld and Henry, 1968; Bockenholt and Bockenholt, 1990a,b; DeSarbo et al., 1991, 1994), can be used. However, individual heterogeneity is particularly relevant in many marketing applications (e.g., in the analysis of transaction databases, or in direct marketing) and, in the case of small data sets, it may not be appropriate to group consumers via a number of homogeneous clusters or classes. This often necessitates the estimation of parameters characterizing the behavior of each individual customer, thus enabling marketing activity to be targeted at each individual separately (micro-marketing). The purpose of this paper is, therefore, motivated by the belief that there often exist profound relationships between brands and that these relationships often affect consumers’ choice behavior. Sometimes, choice behavior cannot be properly explained without understanding the market association between brands. In this paper, we present a new Bayesian vector/scalar products choice MDS model for the analysis of pick any/J data to provide a joint space of consumers and brands. The important perspective to this joint space MDS model construction concerns the estimation of a covariance matrix of error terms (the independence property in error terms states that the odds of choosing brand A over brand B are not influenced by the presence of the other alternatives). To calculate this error covariance structure (i.e., its posterior distribution), the proposed external MDS vector model utilizes the Gibbs sampling augmentation method with several modifications to ease computational difficulties (cf. Tanner and Wong, 1987; Gelfand and Smith, 1990; McCulloch and Rossi, 1994; Albert and Chib, 1993; Schmeiser and Chen, 1991; Fong and Bolton, 1997). In addition, the posterior distributions for consumers’ vectors and brand coordinates (especially posterior means and standard deviations) can be inspected for ascertaining the uncertainty of choice/consideration decisions and the stability of brand positions in the marketplace.

2. The spatial representation of pick any/J data

Fig. 1. Illustration of the proposed vector spatial MDS model.

The proposed model is a new Bayesian alternative to a likelihood-based choice MDS model for the estimation of a joint space of consumers and brands from ‘pick any/J’ binary choice data (we later discuss its application to consideration set formation). Individuals (consumers) will be represented as vectors and stimuli (brands) as points. Fig. 1 illustrates the basic structure of our proposed spatial vector MDS model. Assume a small two-dimensional example with only two consumers (represented as vectors labeled as I and II) and four
brands (labeled as points A, B, C, and D). Brands that project beyond a certain threshold value on a consumer’s vector are predicted to be chosen by that consumer. The vectors point in the direction of increasing utility for each consumer. Thus, according to Fig. 1, consumer I is predicted to buy/choose brands A and B, whereas consumer II only finds brand C acceptable. The objective is thus to estimate the appropriate dimensionality, the consumer vector orientations, brand point locations, and threshold regions (by consumer), given the observed choice data. DeSarbo and Cho (1989) proposed a vector MDS model for such binary pick any/J choice data utilizing a nonlinear, probit-type, bilinear model in a maximum likelihood framework where independence was assumed across consumers and brands. We relax the independence assumption across the brand choices and estimate the proposed model using a data augmentation method and the Gibbs sampling algorithm. Previous work on the Bayesian analysis of choice models (see Zellner and Rossi, 1984; Zeger and Karim, 1991) using the data augmentation method and Gibbs sampling procedure primarily focused on econometric models that treat choice data in a non-spatial manner. Albert and Chib (1993) developed a Bayesian method for modeling categorical response data using the idea of data augmentation. This data augmentation approach provides a useful framework for implementing the binary regression model (see also Chib, 1995). McCulloch and Rossi (1994) extend this data augmentation approach to the multinomial probit model, which accommodates the relaxation of the IIA property of multinomial logit models. Similarly, Chib and Greenberg (1995) develop an approach, adapted from Chib (1995), to simulate the identified parameters directly in a multivariate probit setting via a Gibbs sampling and Metropolis-Hastings algorithm.
As to be shown, we also use such a data augmentation scheme for handling the underlying individual latent utility values in this paper. However, whereas previous econometric models focus on analyzing probit regression models, our proposed methodology is a spatial bilinear vector MDS model. And, we demonstrate how one can employ a Bayesian framework to gain insight into potential context effects in consumer choice (Kim et al., 1997). As consumers evaluate alternatives in the process of making a choice, their judgments relating to a particular alternative are influenced by the characteristics of the other alternatives under concurrent consideration (Farley et al., 1978; Lynch et al., 1991). Thus, a consumer’s evaluation and eventual choice of one or more brands are made in the context of other brands being considered, e.g., the set of brands that are on display on the supermarket shelf (Ratneshwar et al., 1987). In a brand choice situation, context effects refer to the changes in the choice process and/or its outcome at the individual level as a function of the particular brands that are included in the consideration set (Chakravarti and Lynch, 1983; Payne, 1982; Ratneshwar et al., 1987).


2.1. The proposed vector MDS model

Let

$t = 1, \ldots, T$ dimensions (extracted in an MDS context),
$i = 1, \ldots, N$ individuals (or consumers),
$j = 1, \ldots, J$ brands (or stimuli),
$y_{ij} = 1$ if consumer i chooses brand j, and $y_{ij} = 0$ otherwise,
$p_{ij}$ = the probability consumer i chooses brand j,
$a_{it}$ = the tth vector coordinate for consumer i,
$b_{jt}$ = the tth coordinate for brand j,
$c_i$ = a threshold parameter for consumer i.

We observe N independent observations via the vector $y_i$, with J choice alternatives. Unlike the multinomial conditional logit and conditional probit, there need not be the constraint that $\sum_j p_{ij} = 1$ since the sum of the probabilities across brands is the expected number of picks for consumer i, which in most cases exceeds 1. By not requiring this constraint, the model can be utilized to accommodate choice situations where complementary or multiple-purchased brands are bought with correspondingly high probabilities. For each choice occasion, a latent, unobservable utility vector $u_i$ ($J \times 1$) is defined as:

$$u_{ij} = \sum_{t=1}^{T} a_{it} b_{jt} + e_{ij}, \quad \text{where } e_i \sim N(0, \Sigma). \tag{1}$$

The right-hand side of this utility function contains the scalar product of the ith consumer’s vector coordinates with the jth brand’s coordinates plus error ($e_i$). It is similar to a scalar products or vector model (Tucker, 1960; Slater, 1960) of utility where individual consumers are typically represented by vectors and brands by points in a joint space representation (recall Fig. 1). However, as discussed, the proposed model relaxes the independence assumption across the brands and accommodates a general $J \times J$ variance-covariance matrix. The projection of a brand onto an individual vector indicates the degree or magnitude of utility: the larger the scalar products (i.e., the higher the projection of a brand onto an individual’s vector), the higher is the predicted utility of that brand for that individual (Slater, 1960). Here, $u_{ij}$ is specified such that choice j is observed (i.e., $y_{ij} = 1$) if $u_{ij}$ is larger than $c_i$, a threshold parameter which varies by individual or can be constant across consumers. Without loss of generality, as in ordinary probit models, we set $c_i = 0$ for all i. This accomplishes a number of beneficial aspects: (1) it reduces the total number of parameters to estimate and improves the degrees of freedom; (2) it makes the origin of the derived space directly interpretable, since perpendicular lines to the consumer vector through the origin delineate choice and non-choice regions; and (3) setting $c_i = 0$, $\forall i$, resolves an indeterminacy where both the scalar products and threshold parameters can always be made simultaneously larger by some multiplicative scalar and the predicted choices will always remain the same. Therefore





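$$\mathrm{Prob}(y_{ij} = 0) = \mathrm{Prob}(u_{ij} \le 0) = \mathrm{Prob}\Big(\sum_{t=1}^{T} a_{it} b_{jt} + e_{ij} \le 0\Big) = \mathrm{Prob}\Big(e_{ij} \le -\sum_{t=1}^{T} a_{it} b_{jt}\Big). \tag{2}$$

Similarly,

$$\mathrm{Prob}(y_{ij} = 1) = \mathrm{Prob}(u_{ij} > 0) = \mathrm{Prob}\Big(e_{ij} > -\sum_{t=1}^{T} a_{it} b_{jt}\Big). \tag{3}$$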
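As a rough illustration of Eqs. (1)-(3), the following sketch computes the scalar products and the implied picks for a small two-dimensional example in the spirit of Fig. 1. All coordinate values, the identity error covariance, and the function names are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import random

def scalar_products(a, b):
    """Return the N x J matrix of scalar products sum_t a[i][t] * b[j][t]."""
    return [[sum(ait * bjt for ait, bjt in zip(ai, bj)) for bj in b] for ai in a]

def predicted_picks(a, b):
    """Deterministic part of the choice rule: pick brand j when the projection exceeds 0."""
    return [[1 if s > 0 else 0 for s in row] for row in scalar_products(a, b)]

def simulate_picks(a, b, chol, rng):
    """Draw y_ij = 1 if u_ij > 0, with u_i = b a_i + e_i and e_i = chol z, z ~ N(0, I)."""
    J = len(b)
    picks = []
    for row in scalar_products(a, b):
        z = [rng.gauss(0.0, 1.0) for _ in range(J)]
        # chol is lower triangular, so only k <= j contributes to e_j
        e = [sum(chol[j][k] * z[k] for k in range(j + 1)) for j in range(J)]
        picks.append([1 if s + ej > 0 else 0 for s, ej in zip(row, e)])
    return picks

# Hypothetical coordinates: 2 consumers (I, II) and 4 brands (A, B, C, D).
a = [[1.0, 0.2], [-0.3, 1.0]]
b = [[0.8, 0.1], [0.5, -0.4], [-0.6, 0.9], [-0.7, -0.5]]
# Assumed Cholesky factor of the error covariance (identity, i.e. independent errors).
chol = [[1.0 if j == k else 0.0 for k in range(4)] for j in range(4)]

noiseless = predicted_picks(a, b)
noisy = simulate_picks(a, b, chol, random.Random(0))
```

With these made-up coordinates the noiseless rule reproduces the pattern described for Fig. 1 (consumer I picks A and B, consumer II picks only C); replacing `chol` with the factor of a non-diagonal covariance would induce dependence across a consumer's brand choices.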

Thus, one can assume that a latent utility variable exists which, after reaching an individual specific threshold value (here zero), ‘produces’ the observed choice $y_{ij} = 1$. This general specification is quite common in the econometrics limited dependent variable literature where discrete choice models are tied into latent, indirect utility scores and threshold values, and follows the DeSarbo and Cho (1989) framework. The theoretical justification for this mathematical framework can be found in the psychology literature concerning aspiration levels and decision making. Cyert and March (1963) and Simon (1978) claim that economic agents engage in satisficing behavior rather than maximizing behavior, and thus economic agents form thresholds or aspiration levels which ‘defines a natural zero point in the scale of utility’. When the economic agent has alternatives available that are at or above its aspiration level, this theory predicts that the agent will choose amongst these alternatives as opposed to those alternatives below this level. This process also appears to be congruent with multistage decision making/choice processes which combine compensatory and conjunctive rules (Coombs, 1964; Einhorn, 1970). The likelihood function for individual i can be expressed as:

 

$$L_i = \int_{A_{i1}} \cdots \int_{A_{iJ}} \phi_J(x)\, dx, \tag{4}$$

where

$$A_{ij} = \begin{cases} \big(-\infty,\; -\sum_{t=1}^{T} a_{it} b_{jt}\big] & \text{if } y_{ij} = 0, \\ \big(-\sum_{t=1}^{T} a_{it} b_{jt},\; \infty\big) & \text{if } y_{ij} = 1, \end{cases}$$

and $\phi_J(\cdot)$ is defined as a J-variate normal probability density with mean 0 and variance-covariance matrix $\Sigma$. Given the independence assumption over consumers, the overall likelihood function can be written as

$$L = \prod_{i=1}^{N} \int_{A_{i1}} \cdots \int_{A_{iJ}} \phi_J(x)\, dx. \tag{5}$$

Note, the proposed methodology accommodates both internal (one estimates both subject vectors and brand points) and external analyses (the brand points are given or fixed from some previous analysis and one estimates the subject vectors). For internal analyses, there are several identification issues. There are $T^2$ indeterminacies reflecting the fact that one can multiply a by a non-singular affine transformation matrix R, and multiply b by $R^{-1}$, and still leave the scalar products unaffected. If one were to restrict R to be a rigid rotation via an orthogonal matrix, there would be $T(T-1)/2$ such indeterminacies. However, like most MDS bilinear models, no constraints are imposed directly to allow for rotation for interpretation purposes. In addition, since the Gibbs sampler relies on explicit full conditional distributions, these identification issues have no relevance for sampling a | others... or b | others..., since these identification issues relate to the joint space of coordinates a and b together while the Gibbs sampler conditions on one set of points as given within a particular iterate.

2.2. Bayesian analysis of the proposed choice model

A Bayesian analysis of the choice likelihood function requires the specification of prior distributions over the parameters $(a, b, \Sigma)$ and computation of the posterior density:

$$p(a, b, \Sigma \mid y_1, \ldots, y_N) \propto \mathrm{pr}(a, b, \Sigma)\, L(a, b, \Sigma \mid y_1, \ldots, y_N), \tag{6}$$

where $a = ((a_{it}))$ is an $(N \times T)$ matrix and $b = ((b_{jt}))$ is a $(J \times T)$ matrix. As for the specification of the prior distributions, following Berger (1985), we employ a normal prior on $a_i$, $a_i \sim N(a_0, \Lambda_a)$, a normal prior on $b_j$, $b_j \sim N(b_0, \Lambda_b)$, and an independent Wishart prior on $G = \Sigma^{-1}$, $W_J(V^{-1}, v)$, of dimension J and with v degrees of freedom. Based on the work of Albert and Chib (1993) and McCulloch and Rossi (1994), the data augmentation (Tierney, 1994) procedure for the proposed MDS vector model reduces the difficult problem of drawing from a truncated multivariate normal distribution to one of a series of truncated univariate normal draws. In order to develop the Gibbs sampler for our context, let $y = (y_1, \ldots, y_N)$ and $u = (u_1, \ldots, u_N)$. Then,

$$p(a, b, G \mid y) = \int p(u, a, b, G \mid y)\, du, \quad \text{where } G = \Sigma^{-1}. \tag{7}$$

Thus, our strategy will be to construct draws from the $p(u, a, b, G \mid y)$ distribution via the Gibbs sampler and focus attention on the marginal distributions of $p(a, b, G \mid y)$. After adding the u vector to the set of variables in the data augmentation step, one must now analytically derive the sets of conditional distributions used to implement the Gibbs sampling strategy. Conditional on u, our Gibbs sampler involves drawing from $p(a_i \mid u, b, G)$, $\forall i$, $p(b_j \mid u, a, b_k\ (k \ne j), G)$, $\forall j$, and $p(G \mid u, a, b)$ (see Appendix for details and derivations). Draws are from the normal posterior distribution for each $p(a_i \mid u, b, G)$ and $p(b_j \mid u, a, b_k\ (k \ne j), G)$, and the Wishart posterior of $p(G \mid u, a, b)$. The only non-standard distribution is the conditional posterior of $p(u \mid a, b, G, y)$, which is a product of $p(u_i \mid a_i, b, G, y_i)$, a J-variate normal distribution truncated over the appropriate space in $\mathbb{R}^J$. If $y_{ij} = 1$, then $u_{ij} > 0$; if $y_{ij} = 0$, then $u_{ij} \le 0$. Drawing from such a truncated distribution is known to be a difficult task, especially when the parameter dimensionality is high; however, as in the Gibbs sampler described in the data augmentation step in McCulloch and Rossi (1994), we avoid this problem of directly drawing from the truncated multivariate normal by using the series of univariate conditional distributions of $p(u_{ij} \mid u_{i,-j}, a_i, b, G, y_i)$, where $u_{i,-j}$ is a $(J-1)$-dimensional vector of all of the components of $u_i$, excluding $u_{ij}$. If we draw, in turn, from all of the conditional distributions of each component of the $u_i$ vector given all of the others, we have constructed a Gibbs sampler which can generate sequences which converge to a draw from the truncated multivariate normal distribution.

In summary, the proposed Gibbs sampler constructs series of draws by iterative, recursive sampling, consecutively from the following groups of conditional distributions:

(i) NJ conditional distributions of $p(u_{ij} \mid u_{i,-j}, a_i, b, G, y_i)$, $\forall i, j$,
(ii) $p(a_i \mid u, b, G)$, $\forall i$,
(iii) $p(b_j \mid u, a, b_k\ (k \ne j), G)$, $\forall j$,
(iv) $p(G \mid u, a, b)$. (8)

We now discuss these steps in greater detail. The derivations of the respective full conditional distributions appear in the Appendix.

2.2.1. Generate latent utility vectors

The first step of the sampler for the binary choice MDS vector model with correlated errors requires a draw from $p(u_{ij} \mid u_{i,-j}, a_i, b, G, y_i)$; this is the critical step of the algorithm outlined above. If $y_{ij} = 1$, then we draw samples from a normal distribution truncated at the left by 0, and if $y_{ij} = 0$, we draw the samples from the normal distribution truncated at the right by 0. Thus

$$p(u_{ij} \mid u_{i,-j}, a_i, b, G, y_i) \sim N(\mu_{ij}, \sigma^2_{ij}) \ \text{truncated}.$$

The conditional mean ($\mu_{ij}$) and variance ($\sigma^2_{ij}$) can be computed by partitioning the G matrix. Note that we do not have to invert $\Sigma$ to obtain G since our implementation of Gibbs draws G directly from the Wishart distribution. For convenience, permute the rows and columns of $\Sigma$:

$$\Sigma = \begin{pmatrix} \sigma_{jj} & \sigma_{j,-j} \\ \sigma_{-j,j} & \Sigma_{-j,-j} \end{pmatrix} = G^{-1} = \begin{pmatrix} g_{jj} & g_{j,-j} \\ g_{-j,j} & G_{-j,-j} \end{pmatrix}^{-1}. \tag{9}$$
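A minimal sketch of this latent-utility step is given below, assuming the standard partitioned-normal identities for a precision matrix (conditional variance $1/g_{jj}$, conditional mean shifted by $-(1/g_{jj})\, g_{j,-j}$ times the deviations of the other utilities). The function names, the inverse-CDF sampling scheme, and the small tolerance `eps` are illustrative choices rather than the authors' implementation.

```python
import random
from statistics import NormalDist

def conditional_moments(j, u_i, m_i, G):
    """Mean and variance of u_ij given the other utilities, from the precision matrix G.

    m_i holds the scalar products (b a_i); the variance is 1/g_jj and the mean is
    m_ij - (1/g_jj) * sum_{k != j} g_jk * (u_ik - m_ik).
    """
    var = 1.0 / G[j][j]
    shift = sum(G[j][k] * (u_i[k] - m_i[k]) for k in range(len(u_i)) if k != j)
    return m_i[j] - var * shift, var

def draw_truncated(mean, var, y, rng, eps=1e-12):
    """Inverse-CDF draw from N(mean, var) restricted to (0, inf) if y == 1, else (-inf, 0]."""
    dist = NormalDist(mean, var ** 0.5)
    p0 = dist.cdf(0.0)
    lo, hi = (p0, 1.0) if y == 1 else (0.0, p0)
    p = lo + rng.random() * (hi - lo)
    p = min(max(p, eps), 1.0 - eps)  # keep inv_cdf inside its open domain (0, 1)
    x = dist.inv_cdf(p)
    # Guard against round-off in extreme tails (an implementation shortcut, not from the paper).
    return max(x, eps) if y == 1 else min(x, 0.0)

def sweep_utilities(u_i, m_i, y_i, G, rng):
    """One pass over j = 1..J, updating each u_ij given the current values of the others."""
    for j in range(len(u_i)):
        mean, var = conditional_moments(j, u_i, m_i, G)
        u_i[j] = draw_truncated(mean, var, y_i[j], rng)
    return u_i
```

Working from G directly avoids inverting $\Sigma_{-j,-j}$ for every brand; the inverse-CDF draw is one simple option and may lose accuracy far in the tails, where a rejection sampler would be a common alternative.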

With this partition,

$$\mu_{ij} = b_j' a_i + \sigma_{j,-j} \Sigma_{-j,-j}^{-1} \big(u_{i,-j} - (b a_i)_{-j}\big) = b_j' a_i - \frac{1}{g_{jj}}\, g_{j,-j} \big(u_{i,-j} - (b a_i)_{-j}\big)$$

and

$$\sigma^2_{ij} = \sigma_{jj} - \sigma_{j,-j} \Sigma_{-j,-j}^{-1} \sigma_{-j,j} = \frac{1}{g_{jj}},$$

where $(b a_i)_{-j}$ is the vector created by deleting the jth row from $b a_i$.

2.2.2. Generate subject vectors

The next step of the Gibbs sampler requires a draw from the conditional distribution of $a_i$ given u, b, G. Conditional on G, it is a simple matter to transform the random utility framework to the case of iid errors and apply standard conjugate prior distribution theory (see Appendix and Berger (1985)). First, we assume a normal prior on $a_i$, $a_i \sim N(a_0, \Lambda_a)$, where $\Lambda_a$ is the covariance matrix. Then the posterior distribution of $a_i$ given other parameters is (assuming independence over subjects):

$$p(a_i \mid u, b, G) \propto \mathrm{pr}(a_i)\, p(u_i \mid a_i, b, G). \tag{10}$$

From the latent utility formulation in Eq. (1), and use of the Cholesky root C, we can reduce Eq. (1) to a system with N(0, I) errors:

$$C u_i = C b a_i + C e_i, \quad \text{where } (C e_i) \sim N(0, I_J). \tag{11}$$

Thus, the conditional distribution of $a_i$, $p(a_i \mid u_i, b, G)$, is $N(\hat{a}_i, \Sigma_a)$, where

$$\Sigma_a = (b' G b + \Lambda_a^{-1})^{-1} \tag{12}$$

and

$$\hat{a}_i = \Sigma_a (b' G u_i + \Lambda_a^{-1} a_0). \tag{13}$$
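To make the conjugate update concrete, the sketch below evaluates the posterior mean and variance of the subject coordinate in the one-dimensional special case T = 1, where b is a J-vector and the prior variance is a scalar. This restriction, the function names, and the numbers in the usage example are simplifying assumptions made here for illustration; the general case needs matrix inversion in place of the scalar division.

```python
import random

def subject_posterior_1d(b, G, u_i, a0, lam_a):
    """Posterior mean and variance of a scalar a_i (T = 1) under the prior N(a0, lam_a)."""
    J = len(b)
    bGb = sum(b[j] * G[j][k] * b[k] for j in range(J) for k in range(J))
    bGu = sum(b[j] * G[j][k] * u_i[k] for j in range(J) for k in range(J))
    var = 1.0 / (bGb + 1.0 / lam_a)      # scalar analogue of (b'Gb + prior precision)^-1
    mean = var * (bGu + a0 / lam_a)      # scalar analogue of the posterior mean
    return mean, var

def draw_subject_1d(b, G, u_i, a0, lam_a, rng):
    """Draw a_i from its normal full conditional (T = 1)."""
    mean, var = subject_posterior_1d(b, G, u_i, a0, lam_a)
    return rng.gauss(mean, var ** 0.5)

# Hypothetical inputs: two brands, independent unit-variance errors, N(0, 1) prior.
mean, var = subject_posterior_1d([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], [2.0, 4.0], 0.0, 1.0)
draw = draw_subject_1d([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], [2.0, 4.0], 0.0, 1.0, random.Random(0))
```

In this toy case the data precision is 2 and the prior precision is 1, so the posterior variance is 1/3 and the posterior mean shrinks the least-squares value of 3 toward the prior mean of 0.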

2.2.3. Generate brand coordinates

An option in our proposed spatial model is to estimate both sets of coordinates (internal analysis), a and b. First, we assume a normal prior for each row of b, $b_j \sim N(b_0, \Lambda_b)$, where $\Lambda_b$ is the variance-covariance matrix of $b_j$. Then, the posterior of $b_j$ given other parameters is

$$p(b_j \mid u, a, b_k\ (k \ne j), G) \propto \prod_{i=1}^{N} p(u_i \mid b, a_i, G)\, p(b_j). \tag{14}$$

Since $u_{ij} \mid u_{i,-j}, a_i, b, G \sim N(\mu_{ij}, \sigma^2_{ij})$, $i = 1, \ldots, N$, we have

$$u_j^* = a b_j + e_j, \quad j = 1, \ldots, J, \tag{15}$$

where

$$u_{ij}^* = u_{ij} - \sigma_{j,-j} \Sigma_{-j,-j}^{-1} \big(u_{i,-j} - (b a_i)_{-j}\big) = u_{ij} + \frac{1}{g_{jj}}\, g_{j,-j} \big(u_{i,-j} - (b a_i)_{-j}\big) \tag{16}$$

and $e_j \sim N(0, (1/g_{jj}) I_N)$. Thus, the conditional distribution of $b_j$, $p(b_j \mid u, a, b_k\ (k \ne j), G)$, is $N(\hat{b}_j, \Sigma_b)$, where

$$\Sigma_b = (g_{jj} (a' a) + \Lambda_b^{-1})^{-1} \tag{17}$$

and

$$\hat{b}_j = \Sigma_b (g_{jj}\, a' u_j^* + \Lambda_b^{-1} b_0). \tag{18}$$

Then using the conditional mean and variance of $b_j$, we can generate random deviates from $p(b_j \mid u, a, b_k\ (k \ne j), G)$. The Gibbs procedure for internal analysis constructs sequences of draws by sampling from the groups of conditional distributions described earlier. While identification problems exist in such internal analyses given the rotation indeterminacy (or general affine transformation), such problems do not affect these conditional draws where one set of coordinates is always held fixed prior to the sampling of the other parameter set.

2.2.4. Generate G

The conditional distribution of G given other parameters can be obtained from standard Bayesian analysis of a covariance matrix and Wishart distribution theory. Given u, a, and b, one can form $e_i = u_i - b a_i$. We combine the conjugate Wishart prior with the multivariate normal likelihood to obtain a Wishart posterior

$$p(G \mid u, a, b) \sim W\Big(\Big(V + \sum_{i=1}^{N} e_i e_i'\Big)^{-1},\; v + N\Big). \tag{19}$$
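The bookkeeping behind Eq. (19) can be sketched as follows: form the residuals $e_i = u_i - b a_i$, accumulate their cross-products onto the prior matrix V, and add N to the prior degrees of freedom. The function name and the toy inputs are hypothetical; the sketch returns the updated matrix and degrees of freedom only and does not perform the Wishart draw itself.

```python
def wishart_posterior_params(u, a, b, V, v):
    """Return (H, df) with H = V + sum_i e_i e_i' and df = v + N, where e_i = u_i - b a_i.

    Under Eq. (19) the conditional posterior of G is Wishart with scale H^{-1}
    and df degrees of freedom.
    """
    N, J = len(u), len(b)
    H = [row[:] for row in V]  # copy so the prior matrix is not modified
    for i in range(N):
        # residual for consumer i: latent utility minus scalar product, brand by brand
        e = [u[i][j] - sum(bt * at for bt, at in zip(b[j], a[i])) for j in range(J)]
        for j in range(J):
            for k in range(J):
                H[j][k] += e[j] * e[k]
    return H, v + N

# Hypothetical toy input: one consumer, two brands, one dimension.
H, df = wishart_posterior_params(
    u=[[1.0, 2.0]], a=[[1.0]], b=[[1.0], [1.0]], V=[[1.0, 0.0], [0.0, 1.0]], v=3
)
```

Here the residual vector is (0, 1), so only the second diagonal element of the prior matrix is increased.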

Draws from a $W(H^{-1}, v + N)$ distribution can be obtained by drawing from the standard Wishart distribution (Zellner, 1971). Let $H^{-1} = L L'$. Then $G = L \tilde{V} L'$, where $\tilde{V} \sim W(I, v + N)$ and $\tilde{V} = T T'$, where T is a lower triangular matrix made by independently drawing the square root of chi-square variates for the diagonal elements, and N(0, 1) draws for off-diagonal elements ($T_{ii} = \sqrt{\chi^2_{v+N-i+1}}$). Each draw of G is made by constructing a T matrix and premultiplying by the lower triangular matrix, L, where $G = C C'$, $C = L T$ (Odell and Feiveson, 1966). Given the well known indeterminacies in the standard multivariate probit model specification, we employ the McCulloch and Rossi (1994) method of estimating the posterior density of G, but reporting the respective correlation matrix (posterior means and standard deviations). Using the strategy outlined above, we draw in turn from each of the conditional distributions given by (i)-(iv) above, given a user specified number of dimensions. One cycle of the simulation algorithm is completed by simulating each of these distributions and this process is repeated to obtain additional draws from the posterior. Ignoring a transient (burn-in) stage, the sample thus generated can be used to estimate the marginal likelihood or posterior means of the simulated values.

2.3. Dimensionality

To determine the dimensionality of an MLE based model, a number of heuristics are traditionally utilized. Theoretically, in most MLE problems, the difference between deviance measures is asymptotically distributed as $\chi^2$ with degrees of freedom equal to the difference between the corresponding degrees of freedom of the two dimensionalities of the nested models. However, as mentioned in Jedidi and DeSarbo (1991), the $\chi^2$ test is not appropriate with the presence of incidental parameters in the likelihood function when the number of parameters to be estimated varies with the size of the data. Nor is a simple
correction factor appropriate to correct the test statistic. Therefore, they recommend the simultaneous examination of a number of alternative goodness-of-fit measures for model selection (e.g., matching coefficients, point biserial correlations, phi coefficients, AIC statistics, etc.). Note, in our proposed model, it is impractical to utilize the value of the likelihood function to decide upon the appropriate number of dimensions since we are not explicitly maximizing a likelihood function. Because our Bayesian model estimation procedure employs a Gibbs sampling technique, it is similarly inappropriate to utilize AIC based statistics to determine the dimensionality for an application. In addition, such measures (including traditional Bayes factors) require the evaluation of the log-likelihood function which is quite tedious given the typically high dimensionality of the associated integration in Eq. (5). To make comparisons among various statistical models, Bayes factors are typically utilized in such a Bayesian analysis. Although the Gibbs sampling procedure here facilitates obtaining a random sample from the posterior distribution, the Bayes factor is non-trivial to compute in this context given the nature of the likelihood function in Eq. (5) (Chen and Shao, 1997). Instead, we follow the advice of Gelfand (1996) to use the cross-validation predictive densities and the pseudo-Bayes factor to perform model selection. Let $y_{(ij)}$ denote the vector of all elements of $y_i$ except $y_{ij}$. The cross-validation predictive density $f(y_{ij} \mid y_1, \ldots, y_{(ij)}, \ldots, y_N)$ can be approximated by:

$$\hat{f}(y_{ij} \mid y_1, \ldots, y_{(ij)}, \ldots, y_N) = \left[ \frac{1}{M} \sum_{k=1}^{M} \frac{1}{f(y_{ij} \mid y_{(ij)}, u^k_{i,-j}, a^k_i, b^k, G^k)} \right]^{-1}, \tag{20}$$

where $u^k_{i,-j}, a^k_i, b^k, G^k$ are the kth draws from the Gibbs sampler,

$$f(y_{ij} = 1 \mid y_{(ij)}, u^k_{i,-j}, a^k_i, b^k, G^k) = \Pr(u_{ij} > 0 \mid u^k_{i,-j}, a^k_i, b^k, G^k) \tag{21}$$

and

$$f(y_{ij} = 0 \mid y_{(ij)}, u^k_{i,-j}, a^k_i, b^k, G^k) = \Pr(u_{ij} \le 0 \mid u^k_{i,-j}, a^k_i, b^k, G^k), \tag{22}$$

where we note that $u_{ij} \mid u^k_{i,-j}, a^k_i, b^k, G^k \sim N(\mu_{ij}, \sigma^2_{ij})$ as before. Thus, we require only the univariate normal cumulative distribution function to evaluate $\hat{f}(\cdot)$. Then, the pseudo-Bayes factor, PB, can be defined for any pair of competing models $(M_1, M_2)$ as:

$$PB_{M_1 M_2} = \prod_{i=1}^{N} \prod_{j=1}^{J} \frac{f(y_{ij} \mid y_1, \ldots, y_{(ij)}, \ldots, y_N, M_1)}{f(y_{ij} \mid y_1, \ldots, y_{(ij)}, \ldots, y_N, M_2)}, \tag{23}$$
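The pieces of Eqs. (20)-(23) can be sketched in a few lines, under the assumption that Eq. (20) is the usual harmonic-mean estimate over the M retained Gibbs draws. The function names and the input layout (one list of per-draw probabilities per (i, j) cell) are choices made for this illustration only.

```python
import math
from statistics import NormalDist

def draw_probability(y, mean, var):
    """f(y_ij | k-th draw): Pr(u_ij > 0) if y == 1, else Pr(u_ij <= 0), under N(mean, var)."""
    p_nonpos = NormalDist(mean, math.sqrt(var)).cdf(0.0)
    return 1.0 - p_nonpos if y == 1 else p_nonpos

def cv_predictive_density(per_draw_probs):
    """Harmonic mean of the per-draw probabilities for one (i, j) cell."""
    M = len(per_draw_probs)
    return M / sum(1.0 / p for p in per_draw_probs)

def log_pseudo_bayes_factor(cells_model1, cells_model2):
    """Sum over cells of log f_hat under model 1 minus log f_hat under model 2."""
    return sum(
        math.log(cv_predictive_density(p1)) - math.log(cv_predictive_density(p2))
        for p1, p2 in zip(cells_model1, cells_model2)
    )
```

Summing logs rather than multiplying the NJ ratios avoids numerical underflow; a positive value favors the first model and a negative value the second.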

92

W.S. DeSarbo et al. / Journal of Econometrics 89 (1999) 79–108

calculated over the actual observations for each $(i, j)$ cell in the data array. One can also inspect $\log PB_{M_1 M_2}$ for positive vs. negative values, and utilize simple scatter plots to examine model comparisons for each $(i, j)$ observation by plotting the numerator and denominator in Eq. (23).

In addition to the pseudo Bayes factor, we calculate other goodness-of-fit measures for model selection. One measure is the simple matching coefficient (mc) between the actual $y_{ij}$ and the predicted $y_{ij}$. Another goodness-of-fit measure examined for dimensionality selection is the phi coefficient (phi) between the actual and the predicted $y_{ij}$. Finally, we also calculate a correlation (point biserial correlation) between the actual binary choice data and the model predicted latent utility values.

2.4. Convergence

While we examine iteration plots by parameter to check for convergence and stationary Markov chains, this is typically insufficient to confirm the convergence of the Gibbs sampler, since the estimates can become stuck at a local mode and ignore other areas of posterior probability (Allenby and Lenk, 1994). Therefore, to monitor the convergence of the proposed Gibbs sampler, we compute the between- and within-sequence variances of each parameter (Gelman et al., 1995). The statistics are as follows:

$$B = \frac{n}{r-1} \sum_{j=1}^{r} (\bar{\delta}_{\cdot j} - \bar{\delta}_{\cdot\cdot})^{2} \quad \text{and} \quad W = \frac{1}{r} \sum_{j=1}^{r} \frac{1}{n-1} \sum_{i=1}^{n} (\delta_{ij} - \bar{\delta}_{\cdot j})^{2}. \tag{24}$$

Here $B$ and $W$ indicate the between- and within-sequence variances, respectively, $\delta_{ij}$ is the simulated value of the parameter in draw $i$ ($i = 1, \ldots, n$) from sequence $j$ ($j = 1, \ldots, r$), and

$$\bar{\delta}_{\cdot j} = \frac{1}{n} \sum_{i=1}^{n} \delta_{ij}, \qquad \bar{\delta}_{\cdot\cdot} = \frac{1}{r} \sum_{j=1}^{r} \bar{\delta}_{\cdot j}.$$

Now, we calculate the estimated marginal posterior variance of $\delta$:

$$\widehat{\mathrm{var}}(\delta \mid y) = \frac{n-1}{n} W + \frac{1}{n} B. \tag{25}$$

Footnote (matching coefficient): mc = $(a + d)/(a + b + c + d)$, where $a$: number of cases when both actual and predicted values are ‘choice’, $b$: number of cases when actual is ‘choice’, but the predicted value is ‘no choice’, $c$: number of cases when actual is ‘no choice’, but the predicted value is ‘choice’, $d$: number of cases when both actual and predicted values are ‘no choice’.

Footnote (phi coefficient): phi = $(ad - bc)/\sqrt{(a+b)(c+d)(a+c)(b+d)}$, where the definitions of $a$, $b$, $c$, and $d$ are the same as above.
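As a minimal sketch (not the authors' code), the footnoted matching and phi coefficients, together with the point biserial correlation, could be computed as follows. The function names and the toy data are illustrative assumptions, and the sketch assumes the predicted choices have already been coded as 0/1.

```python
import math


def confusion_counts(actual, predicted):
    """Return the counts (a, b, c, d) of the footnote for two 0/1 sequences."""
    pairs = list(zip(actual, predicted))
    a = sum(1 for y, p in pairs if y == 1 and p == 1)  # choice, predicted choice
    b = sum(1 for y, p in pairs if y == 1 and p == 0)  # choice, predicted no choice
    c = sum(1 for y, p in pairs if y == 0 and p == 1)  # no choice, predicted choice
    d = sum(1 for y, p in pairs if y == 0 and p == 0)  # no choice, predicted no choice
    return a, b, c, d


def matching_coefficient(actual, predicted):
    a, b, c, d = confusion_counts(actual, predicted)
    return (a + d) / (a + b + c + d)


def phi_coefficient(actual, predicted):
    a, b, c, d = confusion_counts(actual, predicted)
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom  # raises ZeroDivisionError if a margin is empty


def point_biserial(actual, utilities):
    """Pearson correlation between 0/1 choices and real-valued latent utilities."""
    n = len(actual)
    mean_y = sum(actual) / n
    mean_u = sum(utilities) / n
    s_yu = sum((y - mean_y) * (u - mean_u) for y, u in zip(actual, utilities))
    s_yy = sum((y - mean_y) ** 2 for y in actual)
    s_uu = sum((u - mean_u) ** 2 for u in utilities)
    return s_yu / math.sqrt(s_yy * s_uu)


# Toy data (hypothetical, not taken from the paper).
actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 0, 1, 1]
print(matching_coefficient(actual, predicted), phi_coefficient(actual, predicted))
```

All three measures are computed over the pooled $(i, j)$ cells of the data array, so the inputs here would be the flattened $N \times J$ arrays.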


Then

$$\sqrt{\hat{R}} = \sqrt{\frac{\widehat{\mathrm{var}}(\delta \mid y)}{W}} \tag{26}$$

should be close to 1.00 if the sampler has converged. In a variety of analyses performed with synthetic data sets constructed with known parameter values, $\sqrt{\hat{R}}$ values were observed within the interval [1.081, 1.416] using 4000 iterations (2000 burn-in, 2000 estimation).
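The statistics in Eqs. (24)-(26) can be sketched for a single scalar parameter as follows. This is an illustrative implementation under the assumption that the retained draws from $r$ parallel sequences are available as equal-length lists; the function name `sqrt_r_hat` is hypothetical.

```python
from statistics import mean, variance


def sqrt_r_hat(chains):
    """Return sqrt(R-hat) for one scalar parameter.

    chains: list of r >= 2 sequences, each holding n >= 2 retained draws.
    """
    r = len(chains)
    n = len(chains[0])
    if r < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need two or more equal-length sequences of two or more draws")
    seq_means = [mean(c) for c in chains]
    grand_mean = mean(seq_means)
    # Between-sequence variance B of Eq. (24).
    between = n / (r - 1) * sum((m - grand_mean) ** 2 for m in seq_means)
    # Within-sequence variance W of Eq. (24); variance() divides by n - 1.
    within = mean([variance(c) for c in chains])
    if within == 0:
        raise ValueError("within-sequence variance is zero")
    var_hat = (n - 1) / n * within + between / n  # Eq. (25)
    return (var_hat / within) ** 0.5              # Eq. (26)


# Two short hypothetical sequences for one parameter.
print(sqrt_r_hat([[1, 2, 3, 4], [2, 3, 4, 5]]))
```

In practice the function would be applied to every subject coordinate, brand coordinate, and inverse covariance element in turn, and the range of the resulting values reported.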

3. Application: Consideration sets for luxury cars

3.1. Consideration sets – definition

The brand choice process can be viewed as a dynamic consumer phenomenon of narrowing choice alternatives from many to few. Shocker et al. (1991) characterized brand choice as a sequential decision-making process based upon hierarchical or nested sets of brand alternatives. Here, there are a number of brands or services in any specified product/service class available for purchase in any given time period or market. This ‘universal set’ contains all the brands/services in the class. The ‘awareness’ set consists of the subset of brands/services in the universal set of which a consumer is ‘aware’. Typically, this awareness set contains many fewer brands/services than the universal set. The ‘consideration set’, which is the focus of this application, evolves from the awareness set. According to Shocker et al. (1991), a consideration set consists of goal-satisfying brands/services that are salient or accessible at a particular time. It comprises those brands or services that a consumer would seriously consider purchasing on a particular occasion. Given consumers’ limited time, energy, and cognitive capacity, it is virtually impossible to consider every possible choice alternative, and thus only a subset of all possible alternatives is evaluated in this consideration set.

To be successful, a brand must be included in the consideration sets of at least some consumers. Marketers therefore develop strategies to increase the likelihood that a brand will be activated from consumers’ memories and included in their evoked sets of choice alternatives. The activation potential of a brand, sometimes called top-of-mind awareness, is influenced by many factors. One major factor, according to Peter and Olson (1996), is the amount of past purchase and use experience consumers have had with the brands. Consumers are much more likely to think of (activate) brands that they have used before.
For this reason, popular brands with higher market shares have a distinct advantage. Because they are used by more consumers, these brands are more


likely to be activated in evoked sets and included in more consumers’ consideration sets. This increases the brands’ probability of purchase, which, in turn, increases their activation potential, and so on. In contrast, unfamiliar and low-market-share brands are at a disadvantage because they are much less likely to be included in consumers’ evoked sets and thereby be considered as choice alternatives.

3.2. Consideration sets for luxury cars

A major U.S. automobile manufacturer conducted personal interviews in 1989 with $N = 240$ consumers who stated that they were intending to purchase a luxury automobile within the next six months. The study was conducted in a number of automobile clinics occurring at different geographical locations in the U.S. This study was executed at a time when European and Japanese imports were stealing market share away from domestic brands across several major automobile markets. One section of the questionnaire asked the respondent to check off, from a list of ten luxury cars specified by this manufacturer and thought to compete in the same market segment at that time (based on prior research), which brands s/he would consider purchasing as a replacement vehicle after recalling their perceptions of the expected benefits and costs of each brand. The ten brands tested were: Lincoln Continental, Cadillac Seville, Buick Riviera, Oldsmobile Ninety-Eight, Lincoln Town Car, Mercedes 300E, BMW 325I, Volvo 740, Jaguar XJ6, and Acura Legend. Here, the vast majority of respondents elicited consideration sets in the range of 2 to 6 automobiles from the list of ten. Only the Lincoln Town Car appeared in the consideration sets of over one-half of the respondents, followed in popularity by the Acura Legend and Lincoln Continental. The BMW 325I was the least considered luxury car from the list (considered by only 44 of the 240 respondents).

3.3. Bayesian spatial vector MDS model analysis

Our proposed MDS vector choice model with correlated errors was estimated in $T = 1, 2, 3, 4,$ and $5$ dimensions. An internal analysis was performed for estimating $a$, $b$, and $\Sigma$ with $c = 0$ as discussed above. Table 1 shows the point biserial correlation (pbc), phi coefficient (phi), simple matching coefficient (mc), and the log pseudo Bayes factor (log PB) of the proposed model by dimension ($T = 1, \ldots, 5$). The log pseudo Bayes factor achieves its maximum positive value at $T = 4$ dimensions. Also, the accompanying goodness-of-fit measures such as the matching coefficient, point biserial correlation, and phi coefficient show evidence of the appropriateness of the $T = 4$ dimensional solution as they level off in moving from $T = 4$ to $T = 5$ dimensions. Thus, the $T = 4$ dimensional solution appears most appropriate as jointly delineated by all model selection heuristics. Also, in the application of $\sqrt{\hat{R}}$ statistics for monitoring convergence,


Table 1
Goodness of fit for luxury automobile consideration set data

T    mc      phi     pbc     log PBF
1    0.670   0.351   0.447   -
2    0.729   0.467   0.540   1371.36
3    0.772   0.552   0.613   2133.02
4    0.800   0.615   0.671   3176.74
5    0.830   0.668   0.714   1989.22

we have [1.116, 1.362] for subject vectors, [1.113, 1.216] for brand coordinates, and [1.114, 1.398] for inverse covariance values, respectively, indicating that the variances between sequences are close to each other and that there were no serious convergence violations in the Markov chain.

As a nested case of our proposed model, we analyze the same consideration set data using an identity covariance matrix (IIA assumption). This is a similar model to that of DeSarbo and Cho (1989) with $c = 0$. For $T = 4$, the log pseudo Bayes factor for the proposed full model compared to the identity error model was 19402.05, which is a very striking difference. Also, for the identity error model the matching coefficient was 0.508, the phi coefficient was 0.165, and the point biserial correlation was 0.337. All these measures support the conclusion that the full covariance model describes the luxury automobile consideration set data much better than the identity error model (i.e., the restricted DeSarbo and Cho (1989) model).

3.4. Interpretation of model parameters

The estimated model parameters are the subject vectors $a$, the brand locations $b$, and the covariance matrix $\Sigma$. Table 2 presents the posterior means and standard deviations for the estimated brand locations for these four dimensions. Dimension I easily separates the foreign imports from the domestic luxury cars. Dimension II appears to be a styling dimension which distinguishes between the traditional, more formally styled luxury automobiles such as the Jaguar XJ6, Continental, and Mercedes 300E vs. the more sporty, redesigned automobiles such as the Buick Riviera. Dimension III clearly discriminates between U.S. manufacturers, where Ford brands such as the Town Car and Continental load positively while the GM brands including the Seville, Riviera, and Olds 98 load negatively; the imports project near the origin on this dimension. Finally, Dimension IV, with the exception of the Continental, separates the Japanese manufactured Acura (from Honda Motors) from the remaining luxury automobiles. This makes eminent sense since, at the time of the study, Japanese manufactured automobiles had made major inroads in market share in this and other U.S. automobile markets given their advances in quality control. Note


Table 2
Brand coordinate posterior means and SDs

Posterior means by dimension
Brand          I        II       III      IV
Continental   -0.303    0.346    0.281   -0.218
Seville       -0.309    0.247   -0.524    0.308
Riviera       -0.163   -0.368   -0.068    0.184
Olds '98      -0.206   -0.323   -0.276   -0.001
Town Car      -0.089   -0.059    0.625    0.585
Mercedes       0.320    0.219   -0.194    0.259
BMW            0.429   -0.069   -0.041    0.225
Volvo          0.406   -0.146    0.041    0.250
Jaguar         0.030    0.481    0.031    0.081
Acura          0.392   -0.104    0.038   -0.508

Posterior SDs by dimension
Brand          I        II       III      IV
Continental    0.004    0.012    0.009    0.005
Seville        0.008    0.003    0.005    0.005
Riviera        0.004    0.007    0.002    0.005
Olds '98       0.006    0.009    0.003    0.004
Town Car       0.005    0.004    0.012    0.011
Mercedes       0.005    0.007    0.005    0.014
BMW            0.003    0.005    0.008    0.006
Volvo          0.005    0.008    0.005    0.007
Jaguar         0.004    0.008    0.003    0.005
Acura          0.005    0.006    0.006    0.007

that the posterior standard deviations can be utilized to examine the relative stability of the positioning of the brands by dimension in the space.

Table 3 presents the posterior means and standard deviations for a sample ($n = 25$, as an illustration) of the $N = 240$ estimated subject vectors $a$ of the proposed model. The relative magnitudes of the vector coordinates give an indication of the comparative importance of each dimension to each consumer. An important advantage of using the Gibbs sampler is that we can investigate the uncertainty of consideration set choice/inclusion for each subject through inspection of the posterior standard deviation. We can obtain interval estimates of each subject’s vector coordinates, instead of mere point estimates, showing the possible range of each subject’s vector location. For example, we find that those subjects that included multiple brands (especially brands with truly different features) have higher uncertainty in coordinate values than those who selected similarly featured brands. Thus, the posterior standard deviation term reflects this choice uncertainty via relatively higher values by dimension. Figs. 2 and 3 depict the four-dimensional solution in terms of two two-dimensional joint space plots. The subject terminus vector coordinate


Table 3
Consumer vector posterior means and SDs

Posterior means by dimension
Consumer    I        II       III      IV
 1          1.596    0.945   -0.385   -2.423
 2          1.992    0.790   -2.175   -0.250
 3          4.388    2.022   -1.539    1.133
 4          1.052    0.080   -1.011   -1.334
 5          0.558   -0.120    1.676    0.079
 6         -1.132    0.716    0.164    1.422
 7          0.237   -0.587    0.059   -1.383
 8         -0.202   -1.241    1.046   -0.744
 9          0.644    3.328    2.609   -2.096
10          1.743   -0.311    1.071    1.064
11         -0.660    2.025    1.235   -1.377
12          1.126   -0.280    1.610    0.133
13          1.163    0.890   -1.592    3.024
14          2.281    2.825    0.328   -1.442
15          3.434    0.154    0.656    1.465
16         -0.675    1.705    1.773    1.039
17          3.158    3.175   -1.327    1.450
18          2.071    1.752   -2.066   -1.321
19          0.248    1.028   -0.063   -1.227
20          1.214    1.050    0.236   -0.550
21          0.892    0.549   -0.622   -1.966
22          1.118    3.171   -2.093   -0.666
23          0.856    0.961    0.271    0.710
24          1.980   -0.887    2.092    0.308
25          2.520    0.665    1.467    0.378

Posterior SDs by dimension
Consumer    I        II       III      IV
 1          0.083    0.123    0.166    0.172
 2          0.133    0.096    0.150    0.082
 3          0.153    0.158    0.091    0.084
 4          0.192    0.134    0.154    0.101
 5          0.203    0.062    0.085    0.159
 6          0.139    0.193    0.124    0.127
 7          0.151    0.085    0.274    0.117
 8          0.108    0.163    0.085    0.205
 9          0.152    0.157    0.215    0.275
10          0.102    0.143    0.164    0.094
11          0.144    0.216    0.118    0.127
12          0.144    0.121    0.113    0.185
13          0.194    0.143    0.122    0.155
14          0.094    0.150    0.151    0.096
15          0.125    0.133    0.104    0.101
16          0.097    0.115    0.216    0.098
17          0.298    0.078    0.140    0.174
18          0.212    0.129    0.384    0.105
19          0.125    0.108    0.120    0.276
20          0.146    0.051    0.165    0.128
21          0.062    0.160    0.101    0.108
22          0.101    0.129    0.109    0.090
23          0.165    0.266    0.161    0.101
24          0.134    0.097    0.109    0.178
25          0.112    0.269    0.090    0.098


Fig. 2. Joint space plot of dimension 1 vs. 2 for the luxury automobile consideration set.

values are the posterior mean values. As in traditional MDS vector models, we normalize subject vectors to equal length for ease of display. Subjects whose brand inclusion patterns are similar are located close together; however, even if the inclusion pattern is identical, the vectors are not necessarily located together at one mass point due to the sampling procedure. The positioning of the brands relative to the vectors clearly reflects the consideration set choices/inclusions made by a subject. Those subjects that included multiple brands have vectors lying between the selected brands, indicating high positive projections of the selected brands on their vectors, and pointing away from those brands they did not include (which have negative projections). Figs. 2 and 3 clearly show heterogeneous preference structures of these individuals for the various brands of luxury


Fig. 3. Joint space plot of dimension 3 vs. 4 for the luxury automobile consideration set.

automobiles tested. Consumers’ vectors appear to be distributed throughout the four quadrants of both figures, indicating large differences in the basis of formation of these consideration sets across this sample of consumers.

Finally, to aid in the interpretation of the error interdependencies, we present in Table 4 the estimated posterior correlation matrix (posterior means and SDs). To help interpret these correlations, it is useful to recall that if the $J$-dimensional latent variable system has iid errors (scalar covariance structure), then the correlation matrix will have near-zero values in its off-diagonal elements. This would indicate an IIA scenario. While the posterior mean correlations in Table 4 are clearly not near 1.00, many are significantly different from zero. Of particular note is the sizable negative correlation between the BMW and Volvo.

Table 4
Correlation posterior means and SDs

Posterior means
              Cont.    Sev.     Riv.     Olds     Town     Merc.    BMW      Volvo    Jag.     Acura
Continental    1.000
Seville       -0.094    1.000
Riviera        0.065   -0.102    1.000
Olds '98      -0.033   -0.067   -0.291    1.000
Town Car      -0.122    0.092   -0.211    0.089    1.000
Mercedes       0.121   -0.013    0.211    0.126    0.039    1.000
BMW            0.170    0.152    0.107    0.038    0.047   -0.194    1.000
Volvo          0.047    0.055    0.044   -0.020   -0.040   -0.045   -0.342    1.000
Jaguar        -0.083   -0.081    0.196    0.196    0.007   -0.102    0.121    0.016    1.000
Acura         -0.011    0.166   -0.046    0.108    0.212    0.052   -0.178    0.064   -0.019    1.000

Posterior SDs
              Cont.    Sev.     Riv.     Olds     Town     Merc.    BMW      Volvo    Jag.     Acura
Continental    0.000
Seville        0.013    0.000
Riviera        0.009    0.013    0.000
Olds '98       0.006    0.010    0.035    0.000
Town Car       0.015    0.011    0.024    0.013    0.000
Mercedes       0.015    0.005    0.026    0.029    0.016    0.000
BMW            0.020    0.021    0.015    0.016    0.012    0.036    0.000
Volvo          0.007    0.009    0.008    0.006    0.006    0.009    0.046    0.000
Jaguar         0.011    0.012    0.023    0.034    0.011    0.031    0.015    0.008    0.000
Acura          0.005    0.020    0.006    0.015    0.023    0.008    0.031    0.022    0.009    0.000


This is perhaps due to Volvo’s position in the marketplace on the basis of safety, whereas BMW has a more sporty, contemporary image. The same pattern is seen in the sizable negative correlation between the Riviera and Olds '98, reflecting GM’s attempt to target different market segments on the basis of style and image (the Riviera had recently been restyled to a very sporty look). These and other correlations render a concise portrait of potential context effects or deviations from IIA, where the inclusion of one particular luxury car in a consideration set may have a positive or negative impact on the inclusion of others.
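A table of posterior correlation means and SDs of this kind can be assembled from the sampler output roughly as follows. This is only a sketch: it assumes each retained precision draw $G$ has already been inverted to a covariance matrix $\Sigma$ stored as a nested list, and the names `cov_to_corr` and `summarize_correlations` are hypothetical.

```python
import math
from statistics import mean, stdev


def cov_to_corr(sigma):
    """Convert one covariance matrix (nested lists) to a correlation matrix."""
    size = len(sigma)
    sd = [math.sqrt(sigma[j][j]) for j in range(size)]
    return [
        [1.0 if j == k else sigma[j][k] / (sd[j] * sd[k]) for k in range(size)]
        for j in range(size)
    ]


def summarize_correlations(sigma_draws):
    """Return (means, sds): elementwise posterior mean and SD of the correlations."""
    corr_draws = [cov_to_corr(s) for s in sigma_draws]
    size = len(corr_draws[0])
    means = [[mean([c[j][k] for c in corr_draws]) for k in range(size)] for j in range(size)]
    sds = [[stdev([c[j][k] for c in corr_draws]) for k in range(size)] for j in range(size)]
    return means, sds


# Two hypothetical 2 x 2 covariance draws (not from the paper).
draws = [[[4.0, 2.0], [2.0, 9.0]], [[1.0, 0.5], [0.5, 1.0]]]
means, sds = summarize_correlations(draws)
print(means[1][0], sds[1][0])
```

Because every draw has unit diagonal correlations, the diagonal SDs are exactly zero, which matches the 0.000 entries on the diagonal of the SD panel.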

4. Discussion

4.1. Summary

This paper presents a Bayesian approach to likelihood-based choice MDS vector threshold modeling designed to analyze “pick any/J” binary choice/consideration data. Our proposed model can be gainfully utilized to aid marketing strategy. For many marketing managers, competition has intensified as markets have matured. To successfully maintain or increase current market share, marketing managers need to understand the present market situation via more sophisticated methods for investigating consumer choice or consideration set behavior. In this paper, we develop a choice MDS model which can reveal brand choice/consideration inter-relationships. To achieve this goal, however, we have to overcome the difficulty of calculating an exact likelihood function. This is the primary reason we utilized the Gibbs sampler to estimate the parameter values. The Gibbs sampling algorithm makes it possible to reduce the problem of calculating a truncated multivariate normal distribution to a series of conditional univariate truncated normal calculations. Besides computing the posterior distributions of the parameters, the Gibbs sampler provides a straightforward method of estimating subjects’ uncertainty over various dimensions. The dependency between brands can be analyzed through the estimated error covariance matrix, and the subjects’ uncertainty is observed through the posterior standard deviation of each subject’s vector estimates.

4.2. Future research

In other marketing applications, the important information is not any single subject’s choice or consideration set formation behavior, but these patterns in certain market segments, that is, homogeneous groups of consumers who share some designated set of characteristics (e.g., demographics, psychographics, consumption patterns, etc.). Marketing research suppliers often collect samples from


thousands of consumers, and the ability of MDS procedures to fully portray the structure in such volumes of data is indeed limited. The resulting joint spaces become saturated with points/vectors, often rendering interpretation impossible. In this vein, an approach that assumes the subjects can be grouped into a small number of homogeneous clusters (Wedel and Steenkamp, 1989) or classes is needed. Latent Class Analysis (Lazarsfeld and Henry, 1968) is one possible approach to deal with this problem. Latent Class MDS models can portray the structure in the same types of data as traditional MDS procedures, with the difference being that market segments are represented in the resulting maps in place of the individual consumers. Such procedures would simultaneously estimate market segments as well as the choice or consideration set structures of consumers in each segment from data obtained from $N$ consumers rendering judgments on $J$ brands.

Finally, for brands which are very similar in quality and price, as in our data set, it might be appropriate to apply a patterned covariance structure using variance components. Although it is difficult to impose exact restrictions on $\Sigma$, the use of variance components (Box and Tiao, 1973; Zeger and Karim, 1991; McCulloch and Rossi, 1994) might be one option to handle this problem.

Appendix A.

We want to draw random deviates $(u, a, b, G)$ from $p(u, a, b, G \mid y)$. With Gibbs sampling, this can be achieved by iterative, recursive drawing from:

$$p(u_{ij} \mid \text{all others}), \ \forall i, j, \qquad p(a_{it} \mid \text{all others}), \ \forall i, t,$$
$$p(b_{jt} \mid \text{all others}), \ \forall j, t, \qquad p(\sigma_{jk} \mid \text{all others}), \ \forall j, k.$$

However, it is more efficient to generate a vector or matrix each time, as in these cases:

$$p(a_i \mid u, a_k, k \ne i, b, G, y), \qquad p(b_j \mid u, a, b_k, k \ne j, G, y), \qquad p(G \mid u, a, b, y).$$


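In the paper, we always assume column vectors: $a_i$ ($T \times 1$) and $b_j$ ($T \times 1$).

Our model:

$$y_{ij} = 0 \iff u_{ij} \le 0, \qquad y_{ij} = 1 \iff u_{ij} > 0,$$
$$u_i = b a_i + e_i, \qquad e_i \sim N(0, G^{-1}), \qquad G = \Sigma^{-1},$$

priors:

$$b_j \sim N(b_0, \Lambda_b), \qquad a_i \sim N(a_0, \Lambda_a), \qquad G \sim W(V^{-1}, v).$$

(A) Joint posterior distribution

$$\begin{aligned}
p(u, a, b, G \mid y_1, \ldots, y_N) &\propto p(u, a, b, G)\, L(u, a, b, G \mid y_1, \ldots, y_N) \\
&\propto p(u, a, b, G) \prod_{i=1}^{N} \prod_{j=1}^{J} \{ 1(u_{ij} \le 0) 1(y_{ij} = 0) + 1(u_{ij} > 0) 1(y_{ij} = 1) \} \\
&\propto \left[ \prod_{i=1}^{N} p(u_i \mid a_i, b, G) \right] \left[ \prod_{i=1}^{N} p(a_i) \right] \left[ \prod_{j=1}^{J} p(b_j) \right] [p(G)] \\
&\qquad \times \prod_{i=1}^{N} \prod_{j=1}^{J} \{ 1(u_{ij} \le 0) 1(y_{ij} = 0) + 1(u_{ij} > 0) 1(y_{ij} = 1) \},
\end{aligned}$$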

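where $1(X \in A) = 1$ if $X \in A$ is true, 0 otherwise.

(B) To generate $u_{ij}$

$$\begin{aligned}
p(u_i \mid u_k, k \ne i, a, b, G, y) &= p(u_i \mid a, b, G, y) = p(u_i \mid a_i, b, G, y_i) \\
&\propto p(u_i \mid a_i, b, G) \prod_{j=1}^{J} \{ 1(u_{ij} \le 0) 1(y_{ij} = 0) + 1(u_{ij} > 0) 1(y_{ij} = 1) \} \quad \text{(from (A))}.
\end{aligned}$$

Note that $p(u_{ij} \mid \text{all } u\text{s except } u_{ij}, a, b, G, y) = p(u_{ij} \mid u_{i,-j}, a_i, b, G, y_i)$ is a truncated version of the distribution above. Since $u_i \mid a_i, b, G \sim N(b a_i, G^{-1})$, we have $u_{ij} \mid u_{i,-j}, a_i, b, G \sim N(\mu_{ij}, \sigma_{ij}^{2})$, where

$$\mu_{ij} = b_j' a_i + r_{j,-j}' \Sigma_{-j,-j}^{-1} (u_{i,-j} - (b a_i)_{-j}), \qquad \sigma_{ij}^{2} = \sigma_{jj} - r_{j,-j}' \Sigma_{-j,-j}^{-1} r_{j,-j} = \sigma_j^{2}$$

(which does not depend on $i$), and $(b a_i)_{-j}$ is the vector $b a_i$ with the $j$th element deleted (Anderson, 1974, p. 28). Therefore

$$p(u_{ij} \mid \text{all } u\text{s except } u_{ij}, a, b, G, y) =
\begin{cases}
N(\mu_{ij}, \sigma_{ij}^{2}) \text{ truncated at the right by } 0 & \text{if } y_{ij} = 0, \\
N(\mu_{ij}, \sigma_{ij}^{2}) \text{ truncated at the left by } 0 & \text{if } y_{ij} = 1.
\end{cases}$$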


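If $y_{ij} = 1$, we can generate $u \sim U(0, 1)$ and set

$$u_{ij} = \mu_{ij} + \sigma_{ij} \Phi^{-1}\{ \Phi(-\mu_{ij}/\sigma_{ij}) + u [1 - \Phi(-\mu_{ij}/\sigma_{ij})] \}.$$

If $y_{ij} = 0$, we can generate $u \sim U(0, 1)$ and set

$$u_{ij} = \mu_{ij} + \sigma_{ij} \Phi^{-1}[ u \Phi(-\mu_{ij}/\sigma_{ij}) ].$$

(C) To generate $a_i$: From (A),

$$p(a_i \mid u, a_k, k \ne i, b, G, y) \propto p(u_i \mid a_i, b, G)\, p(a_i).$$

Now $p(u_i \mid a_i, b, G) = N(b a_i, G^{-1})$. If $G = C'C$ (or $C \Sigma C' = I$), then $C u_i \sim N(C b a_i, I)$. Since $p(a_i) = N(a_0, \Lambda_a)$, it can be shown that $p(a_i \mid u, a_k, k \ne i, b, G, y) = N(\hat{a}_i, \Sigma_a)$, where

$$\Sigma_a = ((Cb)'(Cb) + \Lambda_a^{-1})^{-1} \quad \text{and} \quad \hat{a}_i = \Sigma_a ((Cb)' C u_i + \Lambda_a^{-1} a_0).$$

Proof: From $C u_i \sim N(C b a_i, I)$, the least-squares estimate, $\tilde{a}_i = ((Cb)'(Cb))^{-1} (Cb)' C u_i$, follows $N(a_i, ((Cb)'(Cb))^{-1})$ with density proportional to $\exp\{ -\tfrac{1}{2} (\tilde{a}_i - a_i)' (Cb)'(Cb) (\tilde{a}_i - a_i) \}$. Since $p(a_i) \propto \exp\{ -\tfrac{1}{2} (a_i - a_0)' \Lambda_a^{-1} (a_i - a_0) \}$,

$$\begin{aligned}
p(a_i \mid u, a_k, k \ne i, b, G, y) &\propto \exp\{ -\tfrac{1}{2} [ a_i' ((Cb)'(Cb) + \Lambda_a^{-1}) a_i - 2 a_i' ((Cb)'(Cb) \tilde{a}_i + \Lambda_a^{-1} a_0) ] \} \\
&= N(\hat{a}_i, \Sigma_a) \quad \text{from direct inspection.}
\end{aligned}$$

(D) To generate $b_j$: From (A),

$$\begin{aligned}
p(b_j \mid u, a, b_k, k \ne j, G, y) &\propto \left[ \prod_{i=1}^{N} p(u_i \mid a_i, b, G) \right] p(b_j) \\
&\propto \left[ \prod_{i=1}^{N} p(u_{ij} \mid u_{i,-j}, a_i, b, G)\, p(u_{i,-j} \mid a_i, b, G) \right] p(b_j) \\
&\propto \left[ \prod_{i=1}^{N} p(u_{ij} \mid u_{i,-j}, a_i, b, G) \right] p(b_j)
\end{aligned}$$

because $p(u_{i,-j} \mid a_i, b, G)$ does not depend on $b_j$. Since $u_{ij} \mid u_{i,-j}, a_i, b, G \sim N(\mu_{ij}, \sigma_j^{2})$ (see (B)) independently for $i = 1, \ldots, N$, we have

$$u_j^{*} \sim N(a b_j, \sigma_j^{2} I_N),$$

where

$$u_j^{*} = (u_{1j}^{*}, \ldots, u_{Nj}^{*})' \quad \text{and} \quad u_{ij}^{*} = u_{ij} - r_{j,-j}' \Sigma_{-j,-j}^{-1} (u_{i,-j} - (b a_i)_{-j}).$$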


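The least-squares estimate, $\tilde{b}_j = (a'a)^{-1} a' u_j^{*}$, follows $N(b_j, \sigma_j^{2} (a'a)^{-1})$. Using a similar argument as in (C) gives $p(b_j \mid u, a, b_k, k \ne j, G, y) = N(\hat{b}_j, \Sigma_b)$, where

$$\Sigma_b = \left( \frac{1}{\sigma_j^{2}} (a'a) + \Lambda_b^{-1} \right)^{-1} \quad \text{and} \quad \hat{b}_j = \Sigma_b \left( \frac{1}{\sigma_j^{2}} a' u_j^{*} + \Lambda_b^{-1} b_0 \right).$$

(E) To generate $G$: From (A),

$$p(G \mid u, a, b, y) \propto \left[ \prod_{i=1}^{N} p(u_i \mid a_i, b, G) \right] p(G).$$

Since $e_i = u_i - b a_i \sim N(0, \Sigma)$ and $p(G) = W(V^{-1}, v)$, it can be shown that

$$p(G \mid u, a, b, y) = W\left( \left( V + \sum_{i=1}^{N} e_i e_i' \right)^{-1}, \ v + N \right).$$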

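Proof. From Anderson (1974, p. 157), $A = \sum_{i=1}^{N} e_i e_i' \sim W(\Sigma, N)$ with density proportional to $e^{-\frac{1}{2} \mathrm{tr}(A \Sigma^{-1})} / |\Sigma|^{N/2}$. Since $p(G) \propto |G|^{(v-J-1)/2} e^{-\frac{1}{2} \mathrm{tr}(GV)}$,

$$\begin{aligned}
p(G \mid u, a, b, y) &\propto \frac{e^{-\frac{1}{2} \mathrm{tr}(AG)}}{|G^{-1}|^{N/2}} \left( |G|^{(v-J-1)/2} e^{-\frac{1}{2} \mathrm{tr}(GV)} \right) \\
&\propto |G|^{(N+v-J-1)/2} e^{-\frac{1}{2} \mathrm{tr}(G(A+V))}.
\end{aligned}$$

Therefore

$$G \sim W\left( \left( \sum_{i=1}^{N} e_i e_i' + V \right)^{-1}, \ N + v \right).$$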
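The inverse-CDF draws of step (B) can be sketched with the Python standard library as follows. This is a hedged illustration rather than the authors' implementation: the function name `draw_latent_utility` is hypothetical, $\mu_{ij}$ and $\sigma_{ij}$ are assumed to have been computed already, and the $y_{ij} = 1$ branch uses the identity $\Phi^{-1}(1 - q) = -\Phi^{-1}(q)$, which yields the same draw as the expression in (B) with $1 - u$ in place of $u$ while avoiding probabilities that round to 1.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # cdf() is Phi, inv_cdf() is Phi^{-1}


def draw_latent_utility(mu, sigma, y, rng=random):
    """Draw u_ij from N(mu, sigma^2) truncated at 0 according to y (0 or 1)."""
    m = mu / sigma
    u = rng.random()  # uniform on [0, 1)
    if y == 1:
        # Truncated at the left by 0: mu - sigma * Phi^{-1}(u * Phi(mu / sigma)).
        q = u * _STD_NORMAL.cdf(m)
        sign = -1.0
    else:
        # Truncated at the right by 0: mu + sigma * Phi^{-1}(u * Phi(-mu / sigma)).
        q = u * _STD_NORMAL.cdf(-m)
        sign = 1.0
    if q <= 0.0:
        # Only happens if u is exactly 0 or the tail probability underflows.
        raise ValueError("tail probability too small for the inverse-CDF method")
    return mu + sign * sigma * _STD_NORMAL.inv_cdf(q)


# Hypothetical values of mu_ij and sigma_ij for one (i, j) cell.
rng = random.Random(0)
print(draw_latent_utility(0.3, 1.2, 1, rng), draw_latent_utility(0.3, 1.2, 0, rng))
```

Within a Gibbs sweep, this draw would be repeated for every $(i, j)$ cell, with $\mu_{ij}$ recomputed from the current $u_{i,-j}$, $a_i$, $b$, and $\Sigma$ before each draw.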

References

Albert, J., Chib, S., 1993. Bayesian analysis of binary data and polychotomous response data. Journal of the American Statistical Association 88, 669–679.
Allenby, G.M., Lenk, P.J., 1994. Modeling household purchase behavior with logistic normal regression. Journal of the American Statistical Association 89, 1218–1229.


Anderson, T.W., 1974. An Introduction to Multivariate Analysis. Wiley, New York.
Berger, J.O., 1985. Statistical Decision Theory and Bayesian Analysis. Springer, New York.
Bijmolt, T.H.A., Wedel, M., 1995. The effects of alternative methods of collecting similarity data for multidimensional scaling. International Journal of Research in Marketing 12 (4), 363–371.
Bockenholt, U., Bockenholt, I., 1990a. Modeling individual preferences in unfolding preference data. Applied Psychological Measurement 14 (3), 257–269.
Bockenholt, U., Bockenholt, I., 1990b. Constrained latent class analysis: simultaneous classification and scaling of discrete data. Unpublished manuscript, University of Illinois, Champaign.
Chakravarti, D., Lynch, J.G. Jr., 1983. A framework for exploring context effects on consumer judgment and choice. In: Bagozzi, R., Tybout, A. (Eds.), Advances in Consumer Research, vol. 10. Association for Consumer Research, Ann Arbor, MI, pp. 289–297.
Chen, M., Shao, Q., 1997. On Monte Carlo methods for estimating ratios of normalizing constants. Annals of Statistics 25, 1563–1594.
Chib, S., 1995. Marginal likelihood from the Gibbs output. Journal of the American Statistical Association 90, 1313–1321.
Chib, S., Greenberg, E., 1995. Bayesian analysis of multivariate binary data. Unpublished manuscript, Washington University, St. Louis.
Coombs, C.H., 1964. A Theory of Data. Wiley, New York.
Cyert, R.M., March, J.G., 1963. A Behavioral Theory of the Firm. Prentice-Hall, Englewood Cliffs, NJ.
DeSarbo, W.S., Cho, J., 1989. A stochastic multidimensional scaling vector threshold model for the spatial representation of “Pick Any/N” data. Psychometrika 54, 105–129.
DeSarbo, W.S., Hoffman, D.L., 1986. A new unfolding threshold model for the spatial representation of binary choice data. Applied Psychological Measurement 10, 247–264.
DeSarbo, W.S., Hoffman, D.L., 1987. Constructing MDS joint spaces from binary choice data: A new multidimensional unfolding threshold model for marketing research. Journal of Marketing Research 24, 40–54.
DeSarbo, W.S., Howard, D.J., Jedidi, K., 1991. MULTICLUS: A new method for simultaneously performing multidimensional scaling and cluster analysis. Psychometrika 56, 121–136.
DeSarbo, W.S., Manrai, A.K., Manrai, L.A., 1994. Latent class multidimensional scaling: A review of recent developments in the marketing and psychometric literature. In: Bagozzi, R.P. (Ed.), Advanced Methods for Marketing Research. Blackwell, Cambridge, UK, pp. 190–222.
DeSarbo, W.S., Rao, V.R., 1986. A new constrained unfolding model for product positioning. Marketing Science 5, 1–19.
Einhorn, H.J., 1970. The use of nonlinear, noncompensatory models in decision making. Psychological Bulletin 73, 221–230.
Elrod, T., Keane, M.P., 1995. A factor-analytic probit model for representing the market structure in panel data. Journal of Marketing Research 32, 1–16.
Farley, J.U., Katz, J., Lehmann, D.R., 1978. Impact of different comparison sets on evaluation of a new subcompact car brand. Journal of Consumer Research 5, 138–142.
Fong, D.K.H., Bolton, G.E., 1997. Analyzing ultimatum bargaining: A Bayesian approach to the comparison of two potency curves under shape constraints. Journal of Business and Economic Statistics 15, 335–344.
Gelfand, A.E., 1996. Model determination using sampling based methods. In: Gilks, W.R., Richardson, S., Spiegelhalter, D.J. (Eds.), Markov Chain Monte Carlo in Practice. Chapman and Hall, London, pp. 145–162.
Gelfand, A.E., Smith, A.F.M., 1990. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association 85, 398–409.


Gelman, A., Carlin, J.B., Stern, H., Rubin, D., 1995. Bayesian Data Analysis. Chapman and Hall, London.
Greenacre, M.J., 1984. Theory and Application of Correspondence Analysis. Academic Press, London.
Hausman, J.A., Wise, D.A., 1978. A conditional probit model for a qualitative choice: Discrete decisions recognizing interdependencies and heterogeneous preferences. Econometrica 46, 403–426.
Jedidi, K., DeSarbo, W.S., 1991. A stochastic multidimensional scaling procedure for the spatial representation of three-mode, three-way pick any/J data. Psychometrika 56, 471–494.
Kim, J., Chatterjee, R., DeSarbo, W.S., 1997. Incorporating context effects in the multidimensional scaling of pick any/N choice data. International Journal of Research in Marketing, forthcoming.
Lazarsfeld, P.F., Henry, N.W., 1968. Latent Structure Analysis. Houghton-Mifflin, New York.
Lynch, J.G. Jr., Chakravarti, D., Mitra, A., 1991. Contrast effects in consumer judgments: Changes in mental representations or in the anchoring of rating scales? Journal of Consumer Research 18, 284–297.
McCulloch, R., Rossi, P.E., 1994. An exact likelihood analysis of the multinomial probit model. Journal of Econometrics 64, 207–240.
McFadden, D., 1976. Quantal choice analysis: A survey. Annals of Economic and Social Measurement 5, 363–390.
Nishisato, S., 1980. Analysis of Categorical Data: Dual Scaling and its Applications. University of Toronto Press, Toronto.
Odell, P.L., Feiveson, A.H., 1966. A numerical procedure to generate a sample covariance matrix. Journal of the American Statistical Association 61, 199–203.
Payne, J.W., 1982. Contingent decision behavior. Psychological Bulletin 92 (2), 382–402.
Peter, J.P., Olson, J.C., 1996. Consumer Behavior and Marketing Strategy, 4th ed. Irwin Press, Boston, MA.
Ratneshwar, S., Shocker, A.D., Stewart, D.W., 1987. Toward understanding the attraction effect: The implications of product stimulus meaningfulness and familiarity. Journal of Consumer Research 13, 520–533.
Schmeiser, B., Chen, M., 1991. On hit-and-run Monte Carlo sampling for evaluating multidimensional integrals. Technical Report 91-39, Purdue University, Department of Statistics, West Lafayette, Indiana.
Shocker, A.D., Ben-Akiva, M., Boccara, B., Nedungadi, P., 1991. Consideration set influences on customer decision-making and choice. Marketing Letters 2, 181–198.
Simon, H.A., 1978. Rationality as process and as product of thought. American Economic Review 68, 1–16.
Slater, P., 1960. The analysis of personal preferences. British Journal of Statistical Psychology 13, 119–135.
Takane, Y., 1983. Choice model analysis of the “Pick Any/N” type of binary data. Handout for Presentation at the European Meetings of the Psychometric and Classification Societies, Jouy-en-Josas, France.
Tanner, M., Wong, W., 1987. The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association 82, 528–550.
Tierney, L., 1994. Markov chains for exploring posterior distributions (with discussion). Annals of Statistics 22, 1701–1762.
Tucker, L.R., 1960. Intra-individual and inter-individual multidimensionality. In: Gulliksen, H., Messick, S. (Eds.), Psychological Scaling: Theory and Applications. Wiley, New York, pp. 157–167.
Tversky, A., Simonson, I., 1993. Context dependent preferences. Management Science 39 (10), 1179–1189.


Wedel, M., Steenkamp, E.M., 1989. Fuzzy clusterwise regression approach to benefit segmentation. International Journal of Research in Marketing 6 (4), 241–258.
Zeger, S.L., Karim, M.R., 1991. Generalized linear models with random effects: A Gibbs sampling approach. Journal of the American Statistical Association 86, 79–86.
Zellner, A., 1971. Introduction to Bayesian Inference in Econometrics. Wiley, New York.
Zellner, A., Rossi, P.E., 1984. Bayesian analysis of dichotomous quantal response models. Journal of Econometrics 25, 365–393.