Panel data analysis of household brand choices


Journal of Econometrics 103 (2001) 111–153 www.elsevier.com/locate/econbase

Pradeep Chintagunta (a), Ekaterini Kyriazidou (b, ∗), Josef Perktold (c)

(a) Graduate School of Business, University of Chicago, Chicago, IL 60637, USA
(b) Department of Economics, UCLA, 8283 Bunche Hall, Box 951477, Los Angeles, CA 90095-1477, USA
(c) Department of Economics, University of Chicago, Chicago, IL 60637, USA

Abstract

The paper examines theoretical and empirical issues arising in panel data studies of household brand choices. We develop a dynamic utility maximization model with habit formation that yields a discrete choice model that is linear in a vector of observable individual and brand characteristics, the lagged choice, an unobservable permanent individual/brand-specific effect, and an unobservable time-varying error component. We estimate the model using panel data on household yogurt purchases. We compare traditional estimation procedures with the method recently proposed by Honoré and Kyriazidou (Panel data discrete choice models with lagged dependent variables, Econometrica 68 (2000) 839–874). The methods’ robustness with respect to underlying assumptions is investigated in Monte Carlo simulations. © 2001 Elsevier Science S.A. All rights reserved.

JEL classification: C13; C23; M31

Keywords: Panel data; Dynamic discrete choice; Brand choice

∗ Corresponding author. Tel.: +1-310-206-2794; fax: +1-310-825-9528. E-mail address: [email protected] (E. Kyriazidou).

0304-4076/01/$ - see front matter © 2001 Elsevier Science S.A. All rights reserved. PII: S0304-4076(01)00041-0

1. Introduction

Brand choice is a predominant area of marketing research. In an oligopolistic market with differentiated products and heterogeneous consumer preferences, understanding the determinants of agents’ purchase behavior is important since the willingness of consumers to switch brands affects demand elasticities and hence the degree of competition in the industry. Taste differences across consumers for the different brands may be attributed to observable (to the researcher) variables, such as demographics and marketing efforts by firms, but also to permanent unobserved heterogeneity in consumers’ preferences and to unobservable transitory ‘taste shocks’. Habit formation is often thought to be another important determinant of brand choice. In this case, having consumed a brand in the past affects current brand choice. Both permanent unobserved heterogeneity and the presence of habit formation may create the often observed serial persistence in consumers’ brand choices. Distinguishing ‘true state dependence’, due to habit formation, from ‘spurious state dependence’, due to the presence of permanent unobserved heterogeneity, has long been recognized as an important issue in the economics literature (see, for example, Heckman, 1981). The issue is also of particular interest to marketing researchers: the presence of habit formation affects firms’ behavior, since in this case a firm has an incentive to attract consumers from other brands using, for example, temporary price promotions in order to enjoy higher revenues through future loyal consumption of its own brand. Thus, the presence of state dependence affects the nature of competition in an industry, by introducing dynamic aspects in firms’ marketing policies.

The typical approach adopted in the marketing literature to study brand choice uses panel data (for a specific product category) from several households. The systematic component of a brand’s utility is usually assumed to be a linear function of marketing variables, such as price and promotions, and of household characteristics. In order to capture the effects of previous purchases on brand choices, a variable that measures brand loyalty is often introduced.
This variable is operationalized either as the most recent purchase (Jones and Landwehr, 1988) or as an exponentially weighted sum of all previous choices made by the household (Guadagni and Little, 1983). Observations within and across households are then pooled, and standard maximum likelihood methods (probit, logit) are used to estimate the effects of marketing variables and of brand loyalty on choice behavior. In order to account for the presence of unobserved heterogeneity across households, both fixed effects methods (see, for example, Jones and Landwehr, 1988) and predominantly random effects methods (see, for example, Jain et al., 1994; Keane, 1997) have been used. More recently, Bayesian methods have been proposed for estimation of panel data models with time-invariant individual-specific effects (see, for example, McCulloch and Rossi, 1994; and Rossi et al., 1996). Simulation methods (see, for example, Geweke et al., 1994; Börsch-Supan and Hajivassiliou, 1993) are also increasingly being used to estimate panel data random coefficients models. Keane (1997) provides a recent and comprehensive review of the marketing literature on brand choice.
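The two operationalizations of brand loyalty described above can be sketched in a few lines. This is only an illustration under stated assumptions, not code from any of the cited studies: the function names, the smoothing weight `lam`, the starting value `init`, and the convention that loyalty is 0 on the first purchase occasion are all hypothetical choices made here.

```python
def last_purchase_loyalty(choices, brand):
    """Loyalty as the most recent purchase: 1.0 if `brand` was bought on the
    previous occasion, else 0.0 (0.0 on the first occasion, by assumption)."""
    bought = [1.0 if c == brand else 0.0 for c in choices]
    return ([0.0] + bought[:-1])[:len(bought)]


def smoothed_loyalty(choices, brand, lam=0.8, init=0.0):
    """Loyalty as an exponentially weighted sum of all previous choices:
    loy[t] = lam * loy[t-1] + (1 - lam) * 1{brand bought at t-1}."""
    loyalty, current = [], init
    for c in choices:
        loyalty.append(current)  # value entering this purchase occasion
        current = lam * current + (1.0 - lam) * (1.0 if c == brand else 0.0)
    return loyalty


history = ["A", "A", "B", "A"]  # hypothetical purchase sequence
print(last_purchase_loyalty(history, "A"))
print(smoothed_loyalty(history, "A", lam=0.5))
```

Both variables use only purchases made before the current occasion, which is what allows them to enter the choice model as lagged regressors.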


There are two issues about the typical approach of analyzing consumer behavior that are worth noting: one theoretical, and the other empirical. Regarding the theoretical issue, the commonly used specification assumes that the consumer is maximizing utility on each purchase occasion. However, if the systematic component of a brand’s utility at a given point in time is affected by the previous choice(s) made by the consumer (i.e., if brand loyalty is an important driver), then such behavior, which ignores the impact of current choices on future choices, is in general inappropriate. Rather, it may be reasonable to assume that consumers recognize the inter-temporal linkage in their utility. Consequently, observed choices will be the solution to a dynamic optimization problem. A question of research interest that arises is: Under what conditions will such dynamic utility maximizing behavior yield the standard model of brand choice which includes the lagged decision as an explanatory variable? An answer to this question will provide a link between economic theory and the typical econometric specification.

To address the theoretical question above, in this paper we set up a dynamic model of brand choice. In the spirit of Deaton and Muellbauer (1980) and Hanemann (1984), the consumer’s utility in each period depends on the consumption of the product category under consideration, which comes in different alternative forms (brands) that are perfect substitutes. Each alternative enters the utility function weighted by a ‘quality index’ that depends not only on different exogenous variables (such as brand promotions and advertising, demographics, etc.) and unobserved permanent and time-varying effects (individual/brand heterogeneity and taste shocks), but also on the choice made by the consumer in the previous period.
Consumers maximize expected discounted utility over an infinite time horizon by choosing the optimal sequence of purchase decisions and the optimal quantities for each brand. Based on this model, we derive conditions under which the decision rule for choosing an alternative on a given purchase occasion reduces to a discrete choice threshold crossing model that is linearly additive in the exogenous variables, the lagged endogenous dependent variable, and unobservable time-invariant individual/brand-specific effects.

A second issue that the previously mentioned studies in the marketing literature raise is empirical in nature. The typical random effects approach in estimating discrete choice models with lags of the dependent variable and unobserved individual/brand-specific effects conditions on the initial observations, treating them as exogenous variables.¹ This assumption, however, seems untenable in the presence of permanent unobserved heterogeneity, and its violation leads in general to inconsistent estimation of all parameters of the model. Furthermore, it is widespread practice to assume that the time-invariant unobserved effects are also independent of the observed covariates, which is another source of potential inconsistency in estimating the model. On the other hand, well-known fixed effects methods for estimating panel data discrete choice models, such as the conditional likelihood (see, for example, Chamberlain, 1984) and the conditional maximum score approach (Manski, 1987), do not require any assumptions about the statistical relationship between the observed covariates and the individual/brand-specific effects. However, their validity rests crucially on the strict exogeneity of the observed covariates.²

¹ Endogeneity of initial conditions is considered in Erdem and Keane (1996), and Roy et al. (1996). Erdem and Keane also allow for possible serial correlation in the idiosyncratic disturbances.

In a recent paper, Honoré and Kyriazidou (2000) show that it is possible to identify and consistently estimate panel data discrete choice models in the presence of exogenous variables, the lagged endogenous variable, and unobserved heterogeneity. Their method allows for the individual/brand-specific unobservable effects to be correlated with the exogenous variables included in the model in an unspecified manner. In addition, it does not require modeling of the initial conditions or of their statistical relationship with the unobserved heterogeneity.

Given the wide range of maintained assumptions in the discrete choice literature, the question of empirical interest that arises is: How robust are the estimates of the parameters of interest across econometric methods, and how sensitive are these methods to model misspecification? To answer this question, in this paper, we apply a variety of econometric approaches to estimate the structural parameters of a model of brand choice using household panel data on yogurt purchases. In particular, we compare the results obtained from the Honoré and Kyriazidou (2000) method with those obtained from standard methods, such as the conditional logit and the pooled logit approach, with and without random effects.
In addition, we carry out a Monte Carlo simulation study using the design matrix of the data. Our goal is to identify situations under which the different estimation methodologies are most reliable. We investigate the sensitivity of the estimated parameters with respect to different levels of heterogeneity and to the presence of correlation between the household-specific effects and the exogenous variables, as well as their sensitivity with respect to different assumptions concerning the distribution of initial observations.

² As we discuss in Section 3, a conditional likelihood approach may be used for discrete choice models with lags of the dependent variable and unobserved individual effects provided that the model does not contain any other explanatory variables.

We should point out that the analysis in this paper is limited in several respects. On one hand, the assumptions underlying the theoretical model are strong. In order to derive a tractable solution to the agent’s dynamic programming problem of the form typically assumed in empirical applications, we assume a specific functional form for the utility function. Such assumptions typically underlie, explicitly or implicitly, the theoretical framework of many economic applications in labor, industrial organization, and other fields. Perhaps more importantly, we assume serial independence for the problem’s state variables (prices, firms’ promotional efforts, transitory error shocks). This implies that the consumer cannot forecast these variables in optimizing his behavior and hence expectations for their future realizations do not enter his decision rule. While strong and perhaps untenable, this assumption underlies many analyses of consumer behavior in the marketing literature.³ Despite the restrictiveness of the assumptions, the theoretical model that we propose may serve as a useful benchmark that exemplifies conditions under which a commonly used econometric model is consistent with dynamic utility maximizing behavior. On the other hand, the empirical analysis in the paper is restricted for the most part to only two brands. This is done primarily for computational reasons. It raises, however, potentially important sample selection issues which we briefly examine by extending the analysis to three brands. Another important issue that both our theoretical and empirical analyses ignore is the possible endogeneity of the timing and frequency of purchases.⁴ We should therefore point out the limited economic significance of our point estimates. However, we hope that the results provide insight into the more general problem of multiple brand choice.

The main alternative to our approach of reducing the dynamic optimization problem to the ‘static’ discrete choice model is to use numerical methods for solving the optimization problem within the estimation procedure. Gönül (1999) and Erdem and Keane (1996) are recent examples of this approach. The main advantage of their approach is the ability to simulate changes in the marketing policies that are not subject to the Lucas critique.
However, allowing for unobserved permanent heterogeneity substantially increases the complexity and computational cost. Gönül (1999) does not allow for any permanent heterogeneity, while Erdem and Keane (1996) restrict it to differences in the learning experience. As we show in this paper, the effects of state dependence are strongly influenced by the specification of heterogeneity. If this specification is not correct, then the estimated effect of past brand choices can be considerably biased. The decision as to which approach to use should therefore depend on the relative importance of heterogeneity versus dynamic flexibility in the specific application.

³ A recent exception is Erdem and Keane (1996).

⁴ Some of these issues have been addressed in the marketing literature (see, for example, Jain and Vilcassim, 1991; Vilcassim and Jain, 1996).

The remainder of this paper is organized as follows. Section 2 presents a theoretical model of dynamic discrete choice. Section 3 describes the different estimation methods for binary choice models. Section 4 describes the data and Section 5 presents the estimation results. Section 6 presents the results of our Monte Carlo study and Section 7 concludes.

2. A model of dynamic discrete choice

Each consumer is assumed to maximize expected lifetime utility which is defined over two goods. The first good is available in $J$ alternative forms (brands). The second good is a composite good. In each period the agent’s utility function is given by

$$u\Bigl(\sum_{j=1}^{J} \psi_{jit}\, y_{jit}\Bigr) + z_{it},$$

where $u(\cdot)$ is a strictly concave function with $u' > 0$ and $u'(0) = \infty$, and $z_{it}$ is the quantity of the composite good. In the specification above, which follows Hanemann (1984), we assume that the quantities, $y_{1it}, \ldots, y_{Jit}$, of the $J$ alternative brands of the first good enter the utility function with multiplicative time-varying quality indices, $\psi_{1it}, \ldots, \psi_{Jit}$, that represent agent $i$’s subjective evaluation of each brand in each time period. These quality indices are assumed to be determined as⁵

$$\psi_{jit} = \exp\Bigl(\sum_{k=1}^{J} X_{kit}\,\beta_{kj} + \sum_{k=1}^{J} \gamma_{kj}\, d_{kit-1} + \alpha_{ji} + \varepsilon_{jit}\Bigr), \qquad j = 1, \ldots, J; \quad t = 1, 2, \ldots,$$

where $d_{kit-1} \equiv 1\{y_{kit-1} > 0\}$ is the indicator function that takes the value 1 if the $k$th brand was consumed in the previous period, and 0 otherwise. The variables in $X_{jit}$ include individual/brand characteristics that are known to the consumer at the beginning of period $t$ and are also observed by the econometrician. $\varepsilon_{jit}$ is a scalar variable which may, for example, represent an individual- and time-specific taste shock for brand $j$. $\alpha_{ji}$ is an individual/brand-specific permanent taste component that may depend on $X_{ji0}$ (and possibly on the entire sample path of deterministic variables). Both $\varepsilon_{jit}$ and $\alpha_{ji}$ are assumed to be observable by the agent at the beginning of each time period but are not observed by the econometrician. Apart from the habit effects incorporated in the term $\sum_{k=1}^{J} \gamma_{kj}\, d_{kit-1}$, the above specification is quite standard in the literature of static random utility maximization models. Note that this specification of the quality indices distinguishes between own effects $(\beta_{jj}, \gamma_{jj})$ and cross effects $(\beta_{kj}, \gamma_{kj})$. Furthermore, it allows the cross effects $(\beta_{kj}, \gamma_{kj})$ to vary with $k$ for the same $j$, and also to be different across $j$.

⁵ Note that the quality indices for the initial time period, $\psi_{ji0}$, are not specified. Thus, they may depend on $x_{ji0}$, $\alpha_{ji}$, and $\varepsilon_{ji0}$ in an arbitrary way.
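To make the own/cross indexing of the quality index concrete, the following minimal sketch evaluates the exponent brand by brand. The function name, the nested-list layout (`beta[k][j]` is the coefficient vector on brand k's characteristics in brand j's index, `gamma[k][j]` the coefficient on brand k's lagged choice), and all numerical values are assumptions made here for illustration; they are not taken from the paper.

```python
import math


def quality_indices(X, beta, gamma, d_prev, alpha, eps):
    """Return psi[j] = exp(sum_k X[k].beta[k][j] + sum_k gamma[k][j]*d_prev[k]
    + alpha[j] + eps[j]) for every brand j (hypothetical data layout)."""
    J = len(X)
    psi = []
    for j in range(J):
        index = alpha[j] + eps[j]
        for k in range(J):
            # Brand k's characteristics entering brand j's index
            index += sum(x * b for x, b in zip(X[k], beta[k][j]))
            # Habit term: 1 only for the brand consumed last period
            index += gamma[k][j] * d_prev[k]
        psi.append(math.exp(index))
    return psi


# Two brands, one characteristic each (e.g. a promotion dummy), made-up values.
X = [[1.0], [0.0]]
beta = [[[0.5], [-0.1]], [[-0.1], [0.5]]]   # own effect 0.5, cross effect -0.1
gamma = [[1.0, -0.2], [-0.2, 1.0]]          # own habit 1.0, cross habit -0.2
print(quality_indices(X, beta, gamma, d_prev=[1, 0], alpha=[0.0, 0.0], eps=[0.0, 0.0]))
```

With brand 1 promoted and consumed last period, its index is raised by both its own effects, while brand 2's index is lowered by the two cross effects.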


The specification of the period utility function implies that the quality-adjusted alternative forms of the first good are perceived to be perfect substitutes for each other. Furthermore, we assume away income effects by postulating that each period’s utility function is quasi-linear in the composite good, which may be thought of as being money. The reason for this is that the product categories that we are often interested in (e.g. yogurt, detergent, soft drinks) represent only a small fraction of consumer income. We will further assume that markets are complete so that the agent’s budget constraint may be expressed as a single equation:

$$\sum_{t=0}^{\infty} \sum_{j=1}^{J} \tilde p_{jit}\, y_{jit} + \sum_{t=0}^{\infty} \tilde q_t\, z_{it} = PVI_i, \tag{1}$$

where $\{\tilde p_{jit}\}_{j=1}^{J}$ and $\tilde q_t$ are the period $t$ prices of the $J$ brands and the composite good, respectively, expressed in present value terms, and $PVI_i$ is the present value of the agent’s lifetime income.

We next turn to the stochastic specification of the model. We assume that at the beginning of his lifetime the agent faces uncertainty with respect to the future realizations of $(X_{jit}, \varepsilon_{jit}, \tilde p_{jit})_{j=1}^{J}$ and his income. The uncertainty with respect to their current realizations is resolved at the beginning of each time period before the agent makes his choice. On the other hand, the price of the composite good will be assumed to evolve deterministically. Defining $\delta$ to be the agent’s discount factor, his problem is therefore:

$$\max_{\{(y_{jit})_{j=1}^{J},\, z_{it}\}_{t=0}^{\infty}} \; E_0 \sum_{t=0}^{\infty} \delta^t \Bigl[ u\Bigl(\sum_{j=1}^{J} \psi_{jit}\, y_{jit}\Bigr) + z_{it} \Bigr]$$

subject to the budget constraint (1). Here, $E_0$ denotes expectations with respect to the initial period’s information set. Note that due to the quasi-linearity of the utility function with respect to $z_{it}$, if $PVI_i$ is large enough, the agent will in each period choose optimally the consumption of one of the brands and will spend the rest of his income on the composite good in the period where the price of $z_{it}$, corrected for time preference, is lowest. We will assume for simplicity that $\tilde q_t = \delta^t$ (i.e. the current marginal utility of money in every time period is constant and equal to 1) so that there is perfect substitutability in the consumption of $z_{it}$ over time, i.e. the agent is indifferent as to the period in which he will consume the composite good. With this assumption the agent’s problem is to maximize

$$E_0 \sum_{t=0}^{\infty} \delta^t \Bigl[ u\Bigl(\sum_{j=1}^{J} \psi_{jit}\, y_{jit}\Bigr) - \sum_{j=1}^{J} p_{jit}\, y_{jit} \Bigr] + E_0\, PVI_i$$

over $\{(y_{jit})_{j=1}^{J}\}_{t=0}^{\infty}$, where $p_{jit} \equiv \tilde p_{jit}/\delta^t$ is the discounted price of the $j$th brand in period $t$. The expected present value of income enters the optimization problem as an additive constant which does not affect the optimal consumption of the non-composite good. For simplicity, we ignore this term from here on. Note that, after the removal of income effects, the only dynamic link is through the habit formation due to the presence of the previous brand choice in the current quality indices. Thus, if we condition on past and current brand choice, future utility will not depend on the quantity decision in the current period. Therefore, if the agent chooses to consume only brand $j$ in period $t$, the optimal quantity, $y^*_{jit}$, solves the static problem:

$$\max_{y_{jit}} \; u(\psi_{jit}\, y_{jit}) - p_{jit}\, y_{jit}.$$

The first order condition is $u'(\psi_{jit}\, y^*_{jit})\,\psi_{jit} = p_{jit}$ and it implies that $\psi_{jit}\, y^*_{jit} = (u')^{-1}(p_{jit}/\psi_{jit})$. The conditional indirect utility function for brand $j$ given the agent’s past choice is therefore

$$u(\psi_{jit}\, y^*_{jit}) - p_{jit}\, y^*_{jit} = u\Bigl((u')^{-1}\Bigl(\frac{p_{jit}}{\psi_{jit}}\Bigr)\Bigr) - \frac{p_{jit}}{\psi_{jit}}\,(u')^{-1}\Bigl(\frac{p_{jit}}{\psi_{jit}}\Bigr) \equiv v\Bigl(\frac{p_{jit}}{\psi_{jit}}\Bigr),$$

where $v' < 0$, i.e. it is only a function of the quality-adjusted price of brand $j$. This is similar to the static case when there are no habit effects, i.e. $\gamma_{jk} = 0$ for all $j, k$ (see Hanemann, 1984). In the latter case, the agent will choose brand $j$ at time $t$ if

$$\frac{p_{jit}}{\psi_{jit}} \le \frac{p_{lit}}{\psi_{lit}} \quad \text{for all } l \ne j. \tag{2}$$
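As a worked illustration of the first order condition and of rule (2), the sketch below assumes a logarithmic subutility, $u(c) = \ln c$, which satisfies $u' > 0$, strict concavity and $u'(0) = \infty$. In that case $y^*_{jit} = 1/p_{jit}$ and $v(p/\psi) = -\ln(p/\psi) - 1$. The function names and the numbers are hypothetical.

```python
import math


def optimal_quantity_log(p, psi):
    """With u(c) = ln(c), the FOC u'(psi*y)*psi = p reduces to 1/y = p."""
    return 1.0 / p


def indirect_utility_log(p, psi):
    """v(p/psi) = u(psi*y*) - p*y*, which equals -ln(p/psi) - 1 for log utility."""
    y_star = optimal_quantity_log(p, psi)
    return math.log(psi * y_star) - p * y_star


def static_choice(prices, psis):
    """Rule (2): index of the brand with the lowest quality-adjusted price."""
    ratios = [p / s for p, s in zip(prices, psis)]
    return min(range(len(ratios)), key=ratios.__getitem__)


prices, psis = [2.0, 3.0], [1.0, 2.0]        # made-up prices and quality indices
values = [indirect_utility_log(p, s) for p, s in zip(prices, psis)]
print(values)                                # indirect utility of each brand
print(static_choice(prices, psis))           # brand index chosen under rule (2)
```

Because $v$ is decreasing in the quality-adjusted price, picking the brand with the highest indirect utility and picking the brand with the lowest $p/\psi$ give the same answer.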

In the presence of habit formation, however, the agent has in general to take into account that his current consumption affects his future evaluation of the $J$ brands through the quality indices.⁶ In other words, in determining his current choice of brand the agent has to account for his expectations about the future realizations of the state variables⁷ $z_{jit} \equiv (X_{jit}, \varepsilon_{jit}, p_{jit})$ and $d_{jit-1}$ for all $j$. This may be seen from the Bellman equation, which in the two good case ($j = 1, 2$) is⁸

$$V(z_{1it}, z_{2it}, d_{1it-1}, d_{2it-1}) = \max\left\{ \begin{array}{l} v\Bigl(\dfrac{p_{1it}}{\psi_{1it}}\Bigr) + \delta \displaystyle\int V(z_{1it+1}, z_{2it+1}, 1, 0)\, dF(z_{1it+1}, z_{2it+1} \mid z_{1it}, z_{2it}), \\[2ex] v\Bigl(\dfrac{p_{2it}}{\psi_{2it}}\Bigr) + \delta \displaystyle\int V(z_{1it+1}, z_{2it+1}, 0, 1)\, dF(z_{1it+1}, z_{2it+1} \mid z_{1it}, z_{2it}) \end{array} \right\}.$$

⁶ We assume that the consumer cannot buy a small amount of each brand to form habits for all brands. This would be implied if habits are formed only for the brand with the highest consumed quantity, i.e. if the indicators for the previous choice in the current quality index were of the form $d_{jit-1} \equiv 1\{y_{jit-1} > 0,\; y_{jit-1} > y_{lit-1} \text{ for all } l \ne j\}$.

⁷ Note that $\alpha_{ji}$ is not a state variable since its value is revealed at the beginning of the agent’s life and is constant thereafter.

An analytic solution to the agent’s problem is in general infeasible, unless he is myopic ($\delta = 0$), in which case his decision rule collapses to the same one as in the static case (2). To the extent that the agent can forecast future $z_{jit}$’s based on their current realizations, his period $t$ value function depends not only on the ratios $p_{jit}/\psi_{jit}$ but also on the levels of each one of the state variables. Below we describe two cases which imply a decision rule of the form of (2). In both cases the state variables $(X_{jit}, \varepsilon_{jit}, p_{jit})$ are assumed to be independent over time so that they are not forecastable. This implies that the two future expected values in the two sums above, which condition on different period $t$ brand choices, will be in general different, albeit constant over time, i.e.

$$W_{1i} \equiv \delta \int V(z_{1it+1}, z_{2it+1}, 1, 0)\, dF(z_{1it+1}, z_{2it+1}) \;\ne\; \delta \int V(z_{1it+1}, z_{2it+1}, 0, 1)\, dF(z_{1it+1}, z_{2it+1}) \equiv W_{2i}.$$

The first case, where the decision rule collapses to the static/myopic one of Eq. (2), is when the agent is ex ante indifferent between the two brands. This latter situation will occur if, in addition to the serial independence assumption on $z_{jit} \equiv (X_{jit}, \varepsilon_{jit}, p_{jit})$, we assume that (i) $\alpha_{ji} = \alpha_{li}$ for all $j \ne l$, i.e. the agent does not have an intrinsic taste for any particular brand; (ii) the $z_{jit}$’s are identically distributed across brands; and (iii) the following restrictions on the agent’s structural preference parameters hold: first, the brand $j$ characteristics enter the own quality index $\psi_{jit}$ with the same coefficients for all brands, i.e. $\beta_{jj} = \beta_o$ and $\gamma_{jj} = \gamma_o$ for all $j$ (here $o$ stands for own effects). Second, all other brands’ characteristics enter brand $j$’s quality index with the same coefficients, which are also the same for all quality indices, i.e. $\beta_{kj} = \beta_{jk} = \beta_c$ and $\gamma_{kj} = \gamma_{jk} = \gamma_c$ for all $j \ne k$ (here $c$ stands for cross effects). Conditions (i)–(iii) imply that the value function is symmetric in all its arguments. Under these symmetry restrictions and the independence over time assumption on all state variables, it is clear that $W_{ji} = W_{li}$ for all $j \ne l$. The agent will therefore choose brand $j$ at time $t$ if $p_{jit}/\psi_{jit} \le p_{lit}/\psi_{lit}$ for all $l \ne j$, similarly to the static and myopic case.

⁸ Note that the condition $u'(0) = \infty$ on the subutility function implies that the consumer will buy a positive quantity of one of the brands in each period.

It is also possible to derive a similar decision rule without any of the symmetry restrictions (i)–(iii) if we are willing to assume that the subutility function $u(\cdot)$, and hence $v(\cdot)$, is logarithmic. Under independence of $z_{jit}$ over time, the agent will choose brand 1 over brand 2 if

$$v\Bigl(\frac{p_{1it}}{\psi_{1it}}\Bigr) + W_{1i} \ge v\Bigl(\frac{p_{2it}}{\psi_{2it}}\Bigr) + W_{2i},$$

which with logarithmic utility is equivalent to

$$\begin{aligned} \ln \psi_{1it} - \ln p_{1it} + W_{1i} &\ge \ln \psi_{2it} - \ln p_{2it} + W_{2i} \\ \Leftrightarrow\quad \ln\bigl(\psi_{1it} \exp(W_{1i})\bigr) - \ln p_{1it} &\ge \ln\bigl(\psi_{2it} \exp(W_{2i})\bigr) - \ln p_{2it} \\ \Leftrightarrow\quad \frac{p_{1it}}{\psi_{1it} \exp(W_{1i})} &\le \frac{p_{2it}}{\psi_{2it} \exp(W_{2i})}, \end{aligned}$$

i.e. the agent will choose the brand with the lowest price adjusted for both quality and expected future utility (compare with the expression in (2), which applies to the static and myopic cases). The above analysis naturally generalizes to the case of more than two brands. In the two-brand case the decision rule is therefore

$$d_{1it} = 1\bigl\{ X_{1it}\beta_1 - X_{2it}\beta_2 + \gamma_1 d_{1it-1} - \gamma_2 d_{2it-1} - \ln p_{1it} + \ln p_{2it} + \alpha_i + (\varepsilon_{1it} - \varepsilon_{2it}) \ge 0 \bigr\}, \qquad t = 1, \ldots, T,$$

where $\beta_1 \equiv (\beta_{11} - \beta_{12})$, $\beta_2 \equiv (\beta_{22} - \beta_{21})$, $\gamma_1 \equiv (\gamma_{11} - \gamma_{12})$, $\gamma_2 \equiv (\gamma_{22} - \gamma_{21})$, and $\alpha_i \equiv (\alpha_{1i} + W_{1i}) - (\alpha_{2i} + W_{2i}) \equiv \tilde\alpha_{1i} - \tilde\alpha_{2i}$. Given data $\{(d_{jit}, X_{jit}, p_{jit})_{t=0}^{T}\}_{j=1,2}$, it is clear that the parameters of the discrete choice model above can only be identified up to scale. Furthermore, for brand-specific variables in $X_{jit}$, only the difference $\beta_j \equiv \beta_{jj} - \beta_{jk}$ between the own effect $\beta_{jj}$ and the cross effect $\beta_{jk}$ is identified. For purely individual-specific variables, such as demographics, only their differential effect across brands, $\beta_j - \beta_k$, can be identified. Similarly, for the feedback parameters $\gamma_{jk}$, we can only identify the difference $\gamma_j \equiv \gamma_{jj} - \gamma_{jk}$. Note, however, that since $d_{1it} + d_{2it} = 1$ for all $t$, $\gamma_1$ and $\gamma_2$ cannot be separately identified if we allow for an intercept in the model to accommodate different (non-zero) means for $(\tilde\alpha_{1i} + \varepsilon_{1it})$ and $(\tilde\alpha_{2i} + \varepsilon_{2it})$. Finally, the theoretical model implies that brand prices enter in logarithms and with opposite coefficients that have the same absolute magnitude.

The familiar binary logit model is obtained if we assume that the $\varepsilon_{jit}$’s are independent of $X_{jit}$, $p_{jit}$, and $\alpha_{ji}$ for all $j$, and are independent of each other and identically distributed according to the extreme value distribution with scale parameter, say, $\omega > 0$. If, in addition, the aforementioned symmetry restrictions on the structural preference parameters hold, namely $\beta_{11} = \beta_{22} = \beta_o$, $\beta_{12} = \beta_{21} = \beta_c$, $\gamma_{11} = \gamma_{22} = \gamma_o$ and $\gamma_{12} = \gamma_{21} = \gamma_c$ (which in the notation above imply that $\beta_1 = \beta_2$ and $\gamma_1 = \gamma_2$), we obtain the binary logit model with individual effects and state dependence of the form often encountered in applied research:

$$\Pr(d_{it} = 1 \mid x_i, \alpha_i, d_{i0}, \ldots, d_{it-1}) = \frac{\exp(\alpha_i + x_{it}\beta + \gamma d_{it-1})}{1 + \exp(\alpha_i + x_{it}\beta + \gamma d_{it-1})}, \qquad t = 1, \ldots, T,$$

where $d_{it} \equiv 1\{y_{1it} > 0\}$, $x_i \equiv \{x_{it}\}_{t=0}^{T}$, $x_{it} \equiv (X_{1it} - X_{2it},\; \ln p_{1it} - \ln p_{2it})$, $\alpha_i \equiv (\tilde\alpha_{1i} - \tilde\alpha_{2i})/\omega$, $\beta \equiv (\beta_o - \beta_c, -1)/\omega$, and $\gamma \equiv (\gamma_o - \gamma_c)/\omega$.

In the next section we describe the identification and estimation approach proposed by Honoré and Kyriazidou (2000), along with other estimators that have been used to analyze household brand choices. We will focus on the case where the length of the panel $T$ is small, which is the case most frequently encountered in applied research.

3. Estimators for dynamic discrete choice models

Honoré and Kyriazidou (2000) consider the panel data logit model of Section 2, which contains unobservable individual-specific effects, exogenous explanatory variables, as well as the dependent variable lagged once:

$$\begin{aligned} \Pr(d_{i0} = 1 \mid x_i, \alpha_i) &= p_0(x_i, \alpha_i), \qquad i = 1, \ldots, n, \\ \Pr(d_{it} = 1 \mid x_i, \alpha_i, d_{i0}, \ldots, d_{i,t-1}) &= \frac{\exp(x_{it}\beta + \gamma d_{it-1} + \alpha_i)}{1 + \exp(x_{it}\beta + \gamma d_{it-1} + \alpha_i)}, \qquad t = 1, \ldots, T; \quad T \ge 3. \end{aligned} \tag{3}$$

Here, $\beta$ and $\gamma$ are the parameters of interest, and $\alpha_i$ is an individual-specific effect which may depend on the exogenous explanatory variables $x_i \equiv (x_{i1}, \ldots, x_{iT})$. The model is left unspecified in the initial period 0 of the sample, since the value of the dependent variable is not assumed to be known in periods prior to the sample. It is assumed, however, that $d_{i0}$ is observed, so that there are at least four observations per individual. It is not necessary, however, to assume that the explanatory variables are observed in the initial sample period. It is important to note the implicit assumption that the transitory error terms in a threshold-crossing model leading to (3) are independent and identically distributed over time with logistic distributions, and independent of $(x_i, \alpha_i, d_{i0})$ in all time periods.

For model (3), Chamberlain (1993) has shown that, if individuals are observed in three time periods, i.e. if $T = 2$, then the parameters of the model are not identified. Honoré and Kyriazidou (2000) show that $\beta$ and $\gamma$ are both identified (subject to regularity conditions) in the case where the econometrician has access to four or more observations per individual, i.e. $T \ge 3$.

We next describe Honoré and Kyriazidou’s identification strategy for $T = 3$. Consider the events $A = \{d_{i0}, d_{i1} = 0, d_{i2} = 1, d_{i3}\}$ and $B = \{d_{i0}, d_{i1} = 1, d_{i2} = 0, d_{i3}\}$, where $d_{i0}$ and $d_{i3}$ are either 0 or 1. Then,

$$\Pr(A \mid x_i, \alpha_i) = p_0(x_i, \alpha_i)^{d_{i0}} \bigl(1 - p_0(x_i, \alpha_i)\bigr)^{1 - d_{i0}} \times \frac{1}{1 + \exp(x_{i1}\beta + \gamma d_{i0} + \alpha_i)} \times \frac{\exp(x_{i2}\beta + \alpha_i)}{1 + \exp(x_{i2}\beta + \alpha_i)} \times \frac{\exp(d_{i3} x_{i3}\beta + d_{i3}\gamma + d_{i3}\alpha_i)}{1 + \exp(x_{i3}\beta + \gamma + \alpha_i)}$$

and

$$\Pr(B \mid x_i, \alpha_i) = p_0(x_i, \alpha_i)^{d_{i0}} \bigl(1 - p_0(x_i, \alpha_i)\bigr)^{1 - d_{i0}} \times \frac{\exp(x_{i1}\beta + \gamma d_{i0} + \alpha_i)}{1 + \exp(x_{i1}\beta + \gamma d_{i0} + \alpha_i)} \times \frac{1}{1 + \exp(x_{i2}\beta + \gamma + \alpha_i)} \times \frac{\exp(d_{i3} x_{i3}\beta + d_{i3}\alpha_i)}{1 + \exp(x_{i3}\beta + \alpha_i)}.$$
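The two event probabilities can be checked numerically. The sketch below computes the probability of a full choice history directly from model (3) for a scalar covariate and then forms $\Pr(A \mid A \cup B)$; when $x_{i2} = x_{i3}$ the result should not vary with $\alpha_i$ and should equal $1/(1 + \exp((x_{i1} - x_{i2})\beta + \gamma(d_{i0} - d_{i3})))$, the expression given in (4). The function names and all parameter values are assumptions chosen for illustration.

```python
import math


def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))


def history_prob(d, x, beta, gamma, alpha, p0):
    """Probability of the history d = (d0, d1, d2, d3) under model (3), with
    scalar covariates x = (x1, x2, x3) and initial probability Pr(d0 = 1) = p0."""
    prob = p0 if d[0] == 1 else 1.0 - p0
    for t in range(1, len(d)):
        p_one = logistic(x[t - 1] * beta + gamma * d[t - 1] + alpha)
        prob *= p_one if d[t] == 1 else 1.0 - p_one
    return prob


def prob_A_given_A_or_B(d0, d3, x, beta, gamma, alpha, p0):
    """Pr(A | A or B), where A = (d0, 0, 1, d3) and B = (d0, 1, 0, d3)."""
    p_a = history_prob((d0, 0, 1, d3), x, beta, gamma, alpha, p0)
    p_b = history_prob((d0, 1, 0, d3), x, beta, gamma, alpha, p0)
    return p_a / (p_a + p_b)


beta, gamma = 0.7, 0.5            # made-up parameter values
x_equal = (0.3, -1.2, -1.2)       # x2 == x3: alpha should drop out
closed_form = 1.0 / (1.0 + math.exp((0.3 - (-1.2)) * beta + gamma * (1 - 0)))
for alpha in (-2.0, 0.0, 3.0):
    print(alpha, prob_A_given_A_or_B(1, 0, x_equal, beta, gamma, alpha, p0=0.4), closed_form)
```

Setting the third covariate to a value different from the second (for example `x = (0.3, -1.2, 0.5)`) makes the conditional probability move with `alpha` again, which is why the estimator gives more weight to observations with $x_{i2}$ close to $x_{i3}$.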

In general, $\Pr(A \mid x_i, \alpha_i, A \cup B)$ will depend on $\alpha_i$. However, if $x_{i2} = x_{i3}$, then

$$\Pr(A \mid x_i, \alpha_i, A \cup B, x_{i2} = x_{i3}) = \frac{1}{1 + \exp\bigl((x_{i1} - x_{i2})\beta + \gamma(d_{i0} - d_{i3})\bigr)}, \tag{4}$$

which does not depend on $\alpha_i$. In the special case where all the explanatory variables are discrete and the $x_{it}$ process satisfies $\Pr(x_{i2} = x_{i3}) > 0$, one can use (4) to make inference about $\beta$ and $\gamma$. The resulting estimator will have all the usual properties (consistency and root-$n$ asymptotic normality).

While inference based only on observations for which $x_{i2} = x_{i3}$ may be reasonable in some cases (in particular, experimental cases where the distribution of $x_i$ is in the control of the researcher), it is not useful in many economic applications. However, if the continuous variables in $x_{i2} - x_{i3}$ have positive density at 0, we may think of constructing estimators that use observations for which $x_{i2}$ is close to $x_{i3}$. In particular, assuming for ease of exposition that all of the $k$ variables in $x_{it}$ are continuously distributed, and that sampling across individuals is random, Honoré and Kyriazidou propose estimating $\beta$ and $\gamma$ by maximizing

$$\sum_{i=1}^{n} 1\{d_{i1} + d_{i2} = 1\}\, K\Bigl(\frac{x_{i2} - x_{i3}}{h_n}\Bigr) \ln\left( \frac{\exp\bigl((x_{i1} - x_{i2})b + g(d_{i0} - d_{i3})\bigr)^{d_{i1}}}{1 + \exp\bigl((x_{i1} - x_{i2})b + g(d_{i0} - d_{i3})\bigr)} \right)$$

over some compact set. Here $K(\cdot)$ is a kernel density function which gives the appropriate weight to observation $i$, while $h_n$ is a bandwidth which shrinks to zero as $n$ increases. The asymptotic theory will require that $K(\cdot)$ be chosen so that a number of regularity conditions, such as $K(\nu) \to 0$ as $|\nu| \to \infty$, are satisfied. The effect of the term $K((x_{i2} - x_{i3})/h_n)$ is to give more weight to observations for which $x_{i2}$ is close to $x_{i3}$. The estimator $\hat\theta_n \equiv (\hat\beta_n, \hat\gamma_n)$ of $\theta_0 \equiv (\beta, \gamma)$ is shown to be consistent and to converge to a normal distribution at rate $\sqrt{n h_n^k}$, which although slower than the standard $\sqrt{n}$ rate, can be made close to $\sqrt{n}$ under appropriate smoothness assumptions.

The identification idea described above extends in a natural manner to the case of more than four observations per individual and also to the case of multinomial logit models. It is based on sequences where an individual switches between alternatives in any two of the middle $T - 1$ periods. For the binary choice model, the objective function in the case of general $T$ takes the form:

$$\sum_{i=1}^{n} \sum_{1 \le t < s \le T-1} 1\{d_{it} + d_{is} = 1\}\, K\Bigl(\frac{x_{it+1} - x_{is+1}}{h_n}\Bigr) \ln\left( \frac{\exp\bigl((x_{it} - x_{is})b + g(d_{it-1} - d_{is+1}) + g(d_{it+1} - d_{is-1})1\{s - t > 1\}\bigr)^{d_{it}}}{1 + \exp\bigl((x_{it} - x_{is})b + g(d_{it-1} - d_{is+1}) + g(d_{it+1} - d_{is-1})1\{s - t > 1\}\bigr)} \right). \tag{5}$$

Furthermore, Honoré and Kyriazidou (2000) also show that the model is identified even in the case where the logit assumption is relaxed and the distribution of the unobservable time-varying errors is left unspecified.

In either the logistic or the semiparametric case, their approach suffers from several limitations: (i) the assumption that the errors in the underlying threshold-crossing model are independent over time. This assumption, however, typically underlies most estimation approaches that rely on the maximum likelihood principle, due to the otherwise prohibitive computational cost implied by the required integration over multiple dimensions.
Furthermore, note that the independence over time assumption is also required by the theoretical model developed in Section 2. (ii) The assumption that $x_{it} - x_{is}$ has support in a neighborhood of 0 for any $t \neq s$, which rules out time dummies as explanatory variables. (iii) The fact that individual unobservable effects cannot be estimated, and hence it is not possible to carry out predictions or compute elasticities for individual agents or at specified values (e.g. means) of the explanatory variables. This latter restriction is also a drawback in all 'fixed effects' approaches that eliminate the individual-specific effects. It is, however, possible to calculate average elasticities for the observed (sample) population, as we discuss in the working paper version. But in contrast to other likelihood-based approaches, the Honoré and Kyriazidou approach does not require modeling of the initial observations of the sample. Further, it does not make any assumptions about the statistical relationship of the individual effects with the observed covariates or with the initial conditions.

The idea underlying the identification approach in Honoré and Kyriazidou (2000) is closely related to the conditional likelihood approach (see, for example, Chamberlain, 1984) for panel data logit models with individual-specific effects of the form of (3) when there is no dynamic feedback from the lagged choice, i.e. $\gamma = 0$. Inference concerning $\beta$ is based on the fact that, given the total number of times that the individual has chosen 1, $\sum_t d_{it}$, and given that there has been at least one switch between the two alternatives, the conditional probability of a particular history of choices between 0 and 1 is independent of $\alpha_i$. $\beta$ may then be estimated by maximizing the conditional log-likelihood:

$$\sum_{i=1}^{n} \ln\left(\frac{\exp\left(\sum_{t=1}^{T} x_{it} d_{it}\, b\right)}{\sum_{c \in C_i} \exp\left(\sum_{t=1}^{T} x_{it} c_t\, b\right)}\right), \tag{6}$$

where $C_i = \{c = (c_1, \ldots, c_T) \mid c_t = 0 \text{ or } 1 \text{ and } \sum_{t=1}^{T} c_t = \sum_{t=1}^{T} d_{it}\}$. It is also possible to estimate $\beta$ by maximizing

$$\sum_{i=1}^{n} \sum_{1 \le t < s \le T} 1\{d_{it} + d_{is} = 1\} \ln\left(\frac{\exp\big((x_{it} - x_{is})b\big)^{d_{it}}}{1 + \exp\big((x_{it} - x_{is})b\big)}\right), \tag{7}$$

i.e. by forming all possible pairs of choices $d_{it}$ and $d_{is}$ where there has been a switch. The estimators defined by (6) and (7) coincide only for $T = 2$. Although the pairwise estimator defined by (7) is not a maximum likelihood estimator, it may be used when $T$ is large, in which case the conditional likelihood approach (6) may become computationally infeasible. The argument that leads to (6) and (7) breaks down, however, when the lagged choice enters the model. In other words, the estimators defined by either (6) or (7) are inconsistent when $d_{it-1}$ is included as an additional variable in $x_{it}$. It is also well known (see for example Chamberlain, 1985; Magnac, 1997) that the conditional likelihood approach may be used to estimate panel data logit models of the form of (3) where the lagged dependent variable is the only explanatory variable, i.e. $\beta = 0$, provided that there are at least four observations per individual. The resulting estimators, however, are again inconsistent when other explanatory variables besides the lagged choice are included in the model.

We proceed to describe some of the other methods that are typically used in estimating binary choice models. To facilitate comparisons with the conditional likelihood approach and Honoré and Kyriazidou's estimator, we will focus on the logistic specification and assume that the time-varying error terms are serially independent. In the absence of the lagged dependent variable ($\gamma = 0$) and of individual heterogeneity ($\alpha_i = 0$), the independence over time assumption on the time-varying errors implies that model (3) may be estimated by pooling observations over time. The log-likelihood takes the form:

$$\sum_{i=1}^{n} \sum_{t=1}^{T} \big[d_{it} \ln \Lambda(x_{it} b) + (1 - d_{it}) \ln\big(1 - \Lambda(x_{it} b)\big)\big], \tag{8}$$

where $\Lambda$ is the logistic function: $\Lambda(v) = \exp(v)/(1 + \exp(v))$. In the absence of dynamics ($\gamma = 0$), and in the case where individual effects are present, the standard random effects approach postulates a functional form for the conditional density of the individual effects given the entire path of the $x_{it}$ process. Typically, however, the individual effects are assumed independent of the exogenous variables, i.e. $f(\alpha_i \mid x_{i1}, \ldots, x_{iT}) = f(\alpha_i)$, where $f(\cdot)$ denotes a density function specified up to a finite number of parameters, e.g. it may be taken to be the density of the normal distribution. Under these assumptions, the log-likelihood takes the form:

$$\sum_{i=1}^{n} \ln \int \prod_{t=1}^{T} \big[\Lambda(x_{it} b + \alpha)^{d_{it}} \big(1 - \Lambda(x_{it} b + \alpha)\big)^{(1 - d_{it})}\big] f(\alpha)\, d\alpha. \tag{9}$$
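As a rough numerical illustration of the log-likelihoods (8) and (9), the sketch below evaluates both on simulated data. It is not the authors' code: the array layout, the function names, the simulated data, and the use of Gauss-Hermite quadrature for the integral over a normally distributed effect are all assumptions made for the example.

```python
import numpy as np

def logistic(v):
    """Lambda(v) = exp(v) / (1 + exp(v))."""
    return 1.0 / (1.0 + np.exp(-v))

def pooled_loglik(b, x, d):
    """Pooled logit log-likelihood, as in (8).
    x has shape (n, T, k); d has shape (n, T) with entries 0 or 1."""
    p = logistic(x @ b)
    return np.sum(d * np.log(p) + (1 - d) * np.log(1 - p))

def random_effects_loglik(b, mu, sigma, x, d, nodes=32):
    """Random effects log-likelihood, as in (9), assuming alpha ~ N(mu, sigma^2).
    The integral over alpha is approximated by Gauss-Hermite quadrature."""
    z, w = np.polynomial.hermite.hermgauss(nodes)
    alpha = mu + np.sqrt(2.0) * sigma * z           # quadrature points for alpha
    p = logistic((x @ b)[:, :, None] + alpha)       # shape (n, T, nodes)
    dd = d[:, :, None]
    # Product over t of the choice probabilities, for each alpha node
    lik_given_alpha = np.prod(np.where(dd == 1, p, 1 - p), axis=1)
    lik = lik_given_alpha @ w / np.sqrt(np.pi)      # integrate alpha out
    return np.sum(np.log(lik))

# Simulated toy panel (hypothetical values, for illustration only)
rng = np.random.default_rng(0)
n, T, k = 50, 6, 3
x = rng.normal(size=(n, T, k))
b_true = np.array([-1.0, 0.5, 1.0])
d = (rng.uniform(size=(n, T)) < logistic(x @ b_true)).astype(int)

ll_pooled = pooled_loglik(b_true, x, d)
ll_re_degenerate = random_effects_loglik(b_true, 0.0, 0.0, x, d)
```

With a degenerate effect distribution (mean and standard deviation both zero) the random effects log-likelihood collapses to the pooled one, which gives a simple sanity check on the quadrature.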

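In the case where $\gamma \neq 0$ and there is no individual heterogeneity ($\alpha_i = 0$), the log-likelihood is

$$\sum_{i=1}^{n} \sum_{t=1}^{T} \big[d_{it} \ln \Lambda(x_{it} b + g d_{it-1}) + (1 - d_{it}) \ln\big(1 - \Lambda(x_{it} b + g d_{it-1})\big)\big] + \sum_{i=1}^{n} \big[d_{i0} \ln p_{i0} + (1 - d_{i0}) \ln(1 - p_{i0})\big]. \tag{10}$$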

To the extent that $T$ is small relative to $n$, assumptions have to be made about the initial observations $d_{i0}$.[9] The typical approach assumes that these are exogenous and treats them as nonstochastic constants. The model is then estimated by maximizing with respect to $b$ and $g$ the first term of (10), which is the log-likelihood conditional on the initial choices $d_{i0}$ and which coincides with (8) with the lagged choice $d_{it-1}$ entering as another variable in $x_{it}$. The assumption that the initial choices are exogenous may be reasonable if the initial observations in the sample coincide with the initialization of the process. However, even in this case, this exogeneity assumption will most likely fail if permanent unobserved individual heterogeneity is present. In the case where $\gamma \neq 0$ and $\alpha_i \neq 0$, the log-likelihood is

$$\sum_{i=1}^{n} \ln \int \left\{\prod_{t=1}^{T} \big[\Lambda(x_{it} b + \alpha + g d_{it-1})^{d_{it}} \big(1 - \Lambda(x_{it} b + \alpha + g d_{it-1})\big)^{(1 - d_{it})}\big]\right\} p_0(x_i, \alpha)^{d_{i0}} \big(1 - p_0(x_i, \alpha)\big)^{(1 - d_{i0})} f(\alpha \mid x_i)\, d\alpha. \tag{11}$$

[9] See also the discussion of the initial conditions problem in Hsiao (1986, Chapter 7).

Thus, one needs to specify a functional form for the distribution of the initial observations conditional on the exogenous variables and the individual effects, $p_0(x_i, \alpha)$, as well as a form for the distribution of the individual effects conditional on the entire path of the exogenous variables, $f(\alpha \mid x_i)$. In most applied marketing research, however, estimation of the model is based on (9) with the lagged choice entering as another variable in $x_{it}$. That is, initial observations are treated as nonstochastic variables and individual effects are assumed independent of the observed covariates and the initial conditions. In the analysis that follows we will compare the results obtained by the different methods described above using a panel data set of household yogurt purchases.

4. Data description

We use the A.C. Nielsen data on yogurt purchases in the city of Sioux Falls, South Dakota. The choice of Sioux Falls as a site for the panel was driven by (a) the proximity of its demographic profile to that of the United States, and (b) the ability to monitor purchasing in all major grocery outlets. The complete data are for 2 years, from 1986 to 1988. During that time period, a sample of households in the market were issued magnetized cards. Each time the household shopped at a grocery store, it presented the card at the check-out counter. All purchases made by the household were then scanned and provided to the data-gathering agency. The agency also collected weekly data on marketing variables that influence consumer choice, such as the shelf prices for the different brands, which brands, if any, were on display in the store that week, and/or were featured in local newspaper advertisements or in the stores' flyers. It is therefore possible to re-create the store environment for each purchase occasion made by a household member. In addition to the marketing variables, detailed demographic information on the households in the sample is also available.
We choose the two dominant yogurt brands in this market, Yoplait and Nordica, for the analysis. These brands account for 18.4% and 19.5%, respectively, of yogurt purchases in terms of weight, and 21.2% and 23.9% in terms of number of units bought. We focus our attention on the most popular size of these brands, the 6 oz packages. This size accounts for the majority of purchases made of the selected brands (92% for Yoplait and 98% for Nordica, in terms of units bought). Our initial sample consists of 1318 households who bought yogurt at least once between 17 September 1986 and 1 August 1988. On each purchase occasion a household member buying yogurt may purchase multiple units of the same variety and brand, or different varieties of the same brand (e.g. non-fat, low-fat, etc.), or even different brands. In the data used in the estimation, we ignore the quantity decision, and record the brand choice on each purchase occasion irrespective of which varieties of the same brand were bought (calculating a weighted price accordingly). In the case where multiple brands were bought, we randomly select one of the purchased brands.[10] This gives us 17,679 purchase occasions, out of which 3813 are for Yoplait and 4739 are for Nordica. The remaining purchases are of other brands. Since we focus attention on only the top two brands in this market, we remove from the data all purchase occasions where other brands were bought. Furthermore, because some of the estimation methods use the lagged choice as an explanatory variable, we keep in the data only those households that have at least two consecutive purchases of any one of the two brands under consideration. This leaves us with 737 households and 5618 purchase occasions, out of which 2718 are for Yoplait and the remaining 2900 for Nordica. The panel is unbalanced. The minimum number of purchase occasions per household is 2, for the reason explained above, while the maximum is 305. The mean number of purchases is 9.5 and the median is 5 (as compared to 13.4 and 8, respectively, in the original sample).

The marketing variables that are available for these data are the shelf price, and the presence or absence of a store display and of a feature advertisement for each brand (the latter coded as 0–1 dummy variables). Although the data provide us with information on the value of coupons redeemed by the household, we do not use this information to calculate net prices, i.e. the shelf price for each brand net of any coupons redeemed.
One reason is that redemption is observed only for the brand that is purchased, which may introduce selection bias in estimating the effect of price on brand choice (the issue of selection bias when including coupon information has been addressed by Chiang, 1995). Furthermore, in 311 of the 710 purchase occasions where coupons were redeemed, net prices are negative, indicating that there may be problems with the manner in which the value of redeemed coupons is recorded in the data. However, in some of the specifications that we estimate we use the information whether a household ever uses coupons or not. Table 1 provides descriptive statistics for the sample of 5618 purchase occasions. On average, Nordica is cheaper and exhibits a higher frequency of store displays and feature advertisements.

Demographic variables are often important in determining brand choices. In some of the specifications that we estimate below we use several such variables: the mean income of the income class that the household belongs to (INCOME); the household size (HHSIZE); a dummy for full-time employed single households (HH-S); a dummy for households with two full-time employed heads (HH-WW); and a dummy that equals one if a household uses a coupon at least once during the entire sampling period (HH-C). Table 2 gives summary statistics for these variables for the households that were used in the estimation. It is perhaps noteworthy that approximately 50% of the households in our sample use discount coupons at least once.

[10] There are very few purchase occasions where multiple brands were bought, approximately 3% in the complete data set.

Table 1
Summary statistics of marketing variables: two-brand sample (a)

Brand     Share (%)   Average price (cent/oz)   Proportion of displays (%)   Proportion of features (%)
Nordica   51.62       6.66 (1.01)               5.14                         26.63
Yoplait   48.38       9.90 (1.05)               1.89                         1.89

(a) Standard deviations in parentheses.

Table 2
Summary statistics of demographic variables

Variable                           Mean   Median   S.D.   Proportion (%)
INCOME (in thousands of dollars)   29.5   27.5     16.8   -
HHSIZE                             3.0    3.0      1.4    -
HH-S                               -      -        -      12.5
HH-WW                              -      -        -      32.8
HH-C                               -      -        -      48.0

5. Estimation results

In this section we describe our empirical results obtained by the different estimation methods outlined in Section 3. In the most general specification, the model is given by (3), where $d_{it} = 1$ if household $i$ chooses Yoplait in period $t$ and $d_{it} = 0$ if it chooses Nordica. The exogenous variables in $x_{it}$ are $P_{it}$, the difference in the prices between the two brands (in natural logarithms; see Section 2), and the differences in the dummy variables for the two brands that describe whether the brand was displayed in the store and featured in an advertisement that week, $D_{it}$ and $F_{it}$. These last two explanatory variables, $D_{it}$ and $F_{it}$, are discrete, taking on three possible values: 1, 0, and −1. Thus the model has one continuous and two discrete variables as exogenous regressors.

We estimate the model using the following approaches.
(a) The conditional likelihood (logit) approach without the lagged choice (CL), and with the lagged choice treated as exogenous (CLL).
(b) The conditional logit pairwise approach without the lagged choice (CLP), and with the lagged choice treated as exogenous (CLLP).
(c) The pooled logit approach without the lagged choice (PL), and with the lagged choice treated as exogenous (PLL).
(d) The pooled logit approach with normally distributed random effects without the lagged choice (PLHET), and with the lagged choice treated as exogenous (PLLHET).
(e) The Honoré and Kyriazidou approach (HK).
The objective functions that correspond to the estimation methods above are (6), (7), (8), (9), and (5), respectively. For methods (a)–(d), the lagged choice is treated as an exogenous variable whenever it is included in the estimation. As noted in Section 3, this produces consistent estimators for approaches (c) and (d) to the extent that initial observations are exogenous. For (a) and (b), however, treating $d_{it-1}$ as an additional variable in $x_{it}$ produces inconsistent results.

Before we present our results, some comments on the way the data are used in the estimation are in order. In what follows we will use the term 'string' to denote a consecutive sequence of Yoplait and Nordica purchases. Thus, a household that buys a third brand on a purchase occasion before the very last one produces more than one string. For the reason explained in Section 4, we only consider strings of length at least 2. Thus, we do not concatenate a household's strings to produce a single household purchase history of only the two brands under consideration, which would introduce bias in the estimate of the coefficient of the lagged choice. Instead, we treat each string as an additional purchase sequence by the same household.
This gives us a panel of 1400 strings instead of 737, which would correspond to one for each household in the sample if we merely deleted all purchase occasions where a third brand was bought. In view of the way we construct the data used in the estimation, the $i$ subscript in the objective functions now denotes a string instead of a household. We do, however, keep track of the household identity that each string belongs to when we estimate the model using the random effects approach (d). The constructed panel of strings is also unbalanced. Thus, subscripts $t$ in the objective functions run from 1 to $T_i$, where $T_i$ now denotes the length of a string. All of the methods above that use the lagged choice as an explanatory variable (CLL, CLLP, PLL, PLLHET, HK) ignore the information in the initial observation of each string, except for using the initial brand choice as an explanatory variable. Of the methods that do not include the lag, CL and CLP use all information of each string, including the initial observation.
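The string construction described above can be sketched as follows. This is only an illustration of the rule as stated in the text (split a household's purchase history wherever a third brand is bought, and keep runs of at least two purchases); the function name, the list-of-labels input format and the label "Other" for third brands are assumptions made for the example.

```python
def build_strings(purchases, brands=("Yoplait", "Nordica"), min_len=2):
    """Split one household's purchase history into 'strings': maximal runs of
    consecutive purchases of the two focal brands. Runs shorter than min_len
    are dropped, and runs are never concatenated across a third-brand purchase."""
    strings, current = [], []
    for brand in purchases:
        if brand in brands:
            current.append(brand)
        else:
            # A third-brand purchase ends the current run
            if len(current) >= min_len:
                strings.append(current)
            current = []
    if len(current) >= min_len:
        strings.append(current)
    return strings

# Hypothetical purchase history for a single household
history = ["Yoplait", "Nordica", "Other", "Yoplait", "Other",
           "Nordica", "Nordica", "Nordica"]
household_strings = build_strings(history)
```

In this toy history the household yields two strings; the isolated Yoplait purchase between two third-brand purchases is dropped because it has no lagged choice within its own run.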

Table 3
Estimates using various approaches (a)

Method     $\beta_p$        $\beta_d$       $\beta_f$       $\gamma$         $\mu$            $\sigma$
CL         −3.662 (0.334)   0.470 (0.214)   0.986 (0.117)   -                -                -
CLL        −3.347 (0.399)   0.828 (0.278)   0.924 (0.141)   −0.068 (0.140)   -                -
CLP        −2.186 (0.136)   0.512 (0.106)   0.989 (0.051)   -                -                -
CLLP       −1.943 (0.152)   0.716 (0.128)   1.036 (0.057)   0.539 (0.064)    -                -
CLP-W      −3.159 (0.495)   0.642 (0.287)   0.692 (0.171)   -                -                -
CLLP-W     −3.389 (0.552)   0.784 (0.340)   0.632 (0.200)   0.046 (0.218)    -                -
PL         −2.118 (0.178)   0.600 (0.140)   1.485 (0.074)   -                1.083 (0.073)    -
PLF        −2.520 (0.151)   0.661 (0.111)   1.526 (0.061)   -                1.119 (0.063)    -
PLL        −3.049 (0.249)   0.853 (0.174)   1.392 (0.091)   3.458 (0.084)    −0.333 (0.102)   -
PLHET      −3.400 (0.293)   0.921 (0.207)   1.366 (0.108)   -                0.898 (0.142)    2.118 (0.077)
PLHETF     −3.724 (0.226)   0.867 (0.151)   1.454 (0.084)   -                1.022 (0.115)    1.944 (0.062)
PLLHET     −3.821 (0.313)   1.031 (0.217)   1.456 (0.113)   2.126 (0.114)    0.198 (0.150)    1.677 (0.086)
HK05       −3.477 (0.679)   0.261 (0.470)   0.782 (0.267)   1.223 (0.352)    -                -
HK10       −3.128 (0.658)   0.248 (0.365)   0.759 (0.228)   1.198 (0.317)    -                -
HK30       −2.644 (0.782)   0.289 (0.315)   0.724 (0.195)   1.192 (0.291)    -                -
HK05-W     −2.432 (0.654)   0.770 (0.439)   0.659 (0.211)   0.558 (0.254)    -                -
HK10-W     −2.626 (0.605)   0.788 (0.387)   0.635 (0.194)   0.590 (0.232)    -                -
HK30-W     −2.778 (0.575)   0.779 (0.368)   0.619 (0.187)   0.627 (0.220)    -                -
PLLHET-S   −3.419 (0.326)   1.095 (0.239)   1.291 (0.119)   1.550 (0.117)    0.681 (0.156)    1.161 (0.081)

(a) Standard errors in parentheses.

PL and PLHET ignore the initial observation of each string completely. We do, however, estimate the model with PL and PLHET using all observations, including the strings with only one purchase of any one of the two brands. These estimates are denoted by PLF and PLHETF in Table 3. Including these additional observations increases the number of purchase occasions to 8552, out of which 4739 are for Nordica and the remaining for Yoplait.

Note that the CL and CLL procedures are based on strings of length of at least two and three purchases, respectively, where at least one switch between the two brands occurs. Their computational cost explodes for long strings that have a lot of such switches. In the results presented below, the maximum string length was set equal to 20. For longer strings, the remaining observations of the string were treated as new strings.

The HK method requires choosing the bandwidth, $h_n$, and a functional form for the kernel function $K(\cdot)$. We specify $h_n = h \times n^{-1/5}$, where $n$ now denotes the total number of strings and $h$ is a positive constant, set equal to 0.5, 1.0 and 3.0. The kernel function is taken to be the standard normal density function. Note that in this case the objective function is globally concave so that we do not have to worry about local maxima.

Finally, we discuss how we deal with the unbalanced nature of the constructed panel of strings. Our working assumption is that $T_i$, the length of each of the strings constructed as discussed above, is a random variable that is independent of brand choice. Furthermore, the assumptions corresponding to the various estimation methods all hold conditional on $T_i$. Therefore, the objective functions that correspond to the maximum-likelihood estimators (CL, PL, PLL, PLHET, PLLHET) are correctly specified. For the pairwise difference methods which are not based on the likelihood principle (CLP, CLLP, HK), the objective functions (Eqs. (7) and (5)) give proportionally more weight to longer strings, because the number of pairs in a string grows with the square of its length. We may think of weighting each string so that it receives a weight proportional to its length, $T_i$. In particular, in the objective function of CLP (Eq. (7)), we multiply the contribution of each pair that belongs to a string of length $T_i$ by $1/T_i$.
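The weighting just described can be illustrated with a short sketch of the pairwise objective in Eq. (7), with an optional $1/T_i$ weight on each pair. This is a sketch under stated assumptions rather than the authors' implementation: the function name, the representation of each string as a pair of arrays `(x, d)`, and the toy inputs are hypothetical.

```python
import numpy as np

def clp_objective(b, strings, weighted=False):
    """Pairwise conditional logit objective, as in Eq. (7).
    strings: list of (x, d) pairs, x of shape (T_i, k), d of shape (T_i,).
    If weighted is True, each pair in a string of length T_i gets weight 1/T_i."""
    total = 0.0
    for x, d in strings:
        T = len(d)
        w = 1.0 / T if weighted else 1.0
        for t in range(T):
            for s in range(t + 1, T):
                if d[t] + d[s] == 1:             # only pairs with a switch
                    v = (x[t] - x[s]) @ b
                    # ln( exp(v)^{d_t} / (1 + exp(v)) ) = d_t * v - ln(1 + exp(v))
                    total += w * (d[t] * v - np.log1p(np.exp(v)))
    return total

# Toy string of length 2 with one switch (hypothetical values)
x_toy = np.array([[1.0], [0.0]])
d_toy = np.array([1, 0])
b0 = np.array([0.0])
unweighted_value = clp_objective(b0, [(x_toy, d_toy)])
weighted_value = clp_objective(b0, [(x_toy, d_toy)], weighted=True)
```

At `b0 = 0` the single switching pair contributes $\ln(1/2)$, and the weighted version halves it because the string has length 2. Strings without any switch contribute nothing to the objective.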
Note that this is the same weighting that would be required to make the estimates from a pairwise difference approach equal to those obtained by taking deviations from individual means (the standard 'within' approach) in a linear fixed effects model with an unbalanced data set. In the linear model, this weighting is optimal in the sense that it produces the MLE estimates under a normality assumption. In the CLLP procedure, the effective string length is $(T_i - 1)$, given that the method conditions on the initial observation. In this case, we therefore use weights equal to $1/(T_i - 1)$. Similarly, for HK, the effective string length is $(T_i - 2)$, since the method conditions on the initial and the last observation of each string. We therefore use weights equal to $1/(T_i - 2)$. In Table 3, the estimates produced by this weighting scheme are denoted by the suffix W. We should point out, however, that this choice of weights is arbitrary. The estimators defined by Eqs. (7) and (5) may be motivated as GMM estimators that satisfy moment conditions that result from the first order conditions of a limiting maximization problem. We might therefore use instead the optimal weighting scheme described in Hansen (1982) to produce efficient (within a certain class) estimates that would also account for the possibly unbalanced nature of the data. For the other methods, which are all based on the maximum likelihood principle (CL, PL, PLL, PLHET, PLLHET), such weighting is not required since we are assuming that $T_i$ is exogenous.

Table 3 provides the estimation results from the different procedures for the coefficients $\beta_p$, $\beta_d$, and $\beta_f$ of $P_{it}$, $D_{it}$, and $F_{it}$, respectively, and of $\gamma$, the coefficient on the lagged endogenous variable $d_{it-1}$. The table reveals that almost all procedures yield statistically significant coefficients with the expected signs. Specifically, an increase in the price of a brand reduces the probability of choosing that brand, and the presence of a store display or of a feature advertisement for a brand makes purchase of that brand more likely. We also note that most methods produce positive estimates for $\gamma$, i.e. a previous purchase of a brand increases the probability of purchasing the same brand in the next period.

With respect to the conditional methods we note the following. The likelihood approaches, CL and CLL, produce estimates that are in general close to those of the weighted pairwise approaches, CLP-W and CLLP-W. The estimates for $\beta_p$ range between −3.2 and −3.7. For $\beta_d$ the estimates from the likelihood and the weighted pairwise approach range between 0.5 and 0.8. The differences in the $\beta_f$ estimates are somewhat larger; the estimates range between 0.6 and 1.0. Both CLL and CLLP-W produce statistically insignificant $\gamma$'s. The CLL estimate, however, is negative. As we discuss in the Monte Carlo section below, both the likelihood and the pairwise approaches give $\gamma$'s that are on average negatively biased toward zero. The unweighted pairwise approaches produce $\beta_p$'s that are considerably smaller in absolute value than their weighted counterparts, around −2.0. This may be explained by the fact that, without weighting, longer strings contribute more to the objective function.
Thus, longer strings that display loyal brand choice and are therefore less price elastic receive more weight. The coefficients on the other exogenous variables, $\beta_d$ and $\beta_f$, are estimated close to the values reported above. In contrast to CLL and CLLP-W, however, CLLP produces a statistically significant estimate for $\gamma$, which is estimated positive around 0.5. Again, a possible explanation for the large difference between the value of $\gamma$ as estimated by CLLP and the values produced by CLL and CLLP-W is that, without weighting, longer strings that display loyal brand choice contribute more to the objective function. Thus, it appears that weighting has important implications for producing point estimates.

With respect to the pooled methods we note the following. $\beta_p$ is estimated in the range of −3.0 to −3.8, with the exception of PL and PLF which estimate it at −2.1 and −2.5, respectively. Observe that the estimates increase (in absolute value) monotonically when the lagged choice is included as an explanatory variable and also when unobserved household heterogeneity is allowed for. That is, controlling for possible state dependence and household effects results in higher price elasticity estimates. The estimates for $\beta_d$ range from 0.6 to 1.0 and are close to those obtained by the conditional methods. We note the same pattern as for the $\beta_p$ estimates, namely that including the lag and allowing for household heterogeneity raises $\beta_d$ monotonically. The $\beta_f$ estimates are all very close, in the neighborhood of 1.5. The lagged choice is found to have a large positive effect on brand choice: PLL estimates $\gamma$ at 3.5. However, introducing heterogeneity lowers it substantially to 2.1 (PLLHET). This is due to the fact that ignoring possible heterogeneity introduces spurious state dependence, a point originally raised by Heckman (1981). Thus, the PLL estimate of $\gamma$ may also capture unobserved heterogeneity. Indeed, we find that there is substantial unobserved heterogeneity in the sample. All methods that estimate random effects give high values for $\sigma$, the standard deviation of the household effects, ranging from 1.7 (PLLHET) to 2.1 (PLHET). Note that introducing the lag lowers $\sigma$, or in other words, ignoring possible state dependence exaggerates the amount of unobserved heterogeneity. Furthermore, we note that the presence of the lagged choice lowers the estimated mean $\mu_\alpha$ of $\alpha_i$, from approximately 0.9 (PLHET) and 1.0 (PLHETF) to 0.2 (PLLHET) when heterogeneity is allowed for. The same observation applies to the case where $\alpha_i$ is assumed to have a degenerate distribution ($\sigma = 0$), in which case the estimated mean $\mu$ drops from approximately 1.1 (PL and PLF) to −0.3 (PLL) when the lagged choice is included in the explanatory variable set. Finally, the estimates produced by the methods that use the entire sample (PLF and PLHETF) are higher in absolute value for all coefficients compared to those produced by the same procedures which use only the restricted sample that ignores the initial observation and only considers strings of length at least equal to 2.
The differences, however, are not large.

With respect to the HK estimates we observe the following. The $\beta_p$ estimates are in general close to those obtained by the other methods, although the weighting scheme that we employ lowers the estimates considerably for two out of the three bandwidths considered.[11] For $\beta_d$, the weighted HK estimates range between 0.5 and 0.8 for the different bandwidths and are close to the values estimated by the other methods, especially those of the conditional pairwise logit approach. However, without the weighting $\beta_d$ drops by almost a half for all bandwidths and becomes insignificant. The $\beta_f$ estimates are all close, estimated between 0.6 and 0.8, close to those obtained from CLP and CLLP, but lower than the values estimated by the other methods. The estimates of $\gamma$ are again quite different with and without weighting, ranging between 0.6 (with the weighting) and 1.3 (without weighting), and they are significantly lower than those estimated by the pooled methods. We note some sensitivity in the point estimates of all coefficients with respect to the bandwidth choice, although the estimates are well within one standard error of each other for the range of bandwidths considered. As expected, the estimated standard errors on all coefficients are considerably larger than those obtained for the conditional and the pooled estimates, especially when no weighting is used. This is due to the fact that HK uses only a subset of the sample used by the other methods, namely only strings of length at least four, and to the nonparametric component in the HK method, which yields estimators that converge at a rate slower than the standard square root of the sample size. Intuitively, the kernel weighting in the HK objective function implies that only a subset of the data is essentially given non-zero weight, which means that the effective sample size is smaller than that used by either the conditional or the pooled methods.

The large difference between the estimate of PLLHET for the habit effect, $\gamma$, and the estimates obtained by HK may be explained by the fact that the HK method uses strings that exhibit at least one switch between the two brands (excluding the initial and last observation in each string) and that it uses strings of length at least equal to four. Excluding households that are completely loyal to one brand (PLLHET-S) produces, as expected, a lower estimate for $\gamma$, approximately equal to 1.6. From an economic point of view it seems that the habit effect as estimated by PLLHET is too large. For example, using the PLLHET estimate, we find that having consumed a brand in the previous period increases the probability of buying the same brand again from 0.50 (which is the probability of consuming any one of the two brands in the sample) to approximately 0.89.[12] It is likely that the latter estimate still reflects some spurious state dependence due to household heterogeneity that is not sufficiently captured. We proceed to investigate this possibility by estimating several specifications that include household demographic information.

[11] Using weights equal to $1/T_i$ instead of $1/(T_i - 2)$ hardly changed the point estimates.
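The order of magnitude of this figure can be checked with a one-line logit calculation. The sketch below is only one simple reading, not the calculation in the working paper: it assumes a baseline index of zero (so that the baseline choice probability is 0.50) and adds the PLLHET estimate of the lag coefficient from Table 3; the variable names are hypothetical.

```python
import math

def logistic(v):
    """Lambda(v) = exp(v) / (1 + exp(v))."""
    return 1.0 / (1.0 + math.exp(-v))

gamma_pllhet = 2.126      # PLLHET estimate of the lag coefficient (Table 3)
baseline_index = 0.0      # assumed: logistic(0) = 0.50, the baseline probability

p_baseline = logistic(baseline_index)
p_repeat = logistic(baseline_index + gamma_pllhet)
```

Under these assumptions `p_repeat` comes out close to the 0.89 reported in the text.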
Table 4 presents results using PLL and PLLHET when household characteristics and their interactions with prices are included as additional explanatory variables. The ,rst column for each method shows the estimation results when all household characteristics and price interactions are included. 13 In this case, none of the estimates of the coeGcients on the additional variables is statistically signi,cant. The second column shows the results when the least signi,cant variables are dropped and the remaining variables become significant. The estimates of d ; f , and  are little inKuenced by the inclusion of demographic variables and their interactions with price. The coeGcients 12 See the working version of the paper, available at the corresponding author’s web sites, for the calculations leading to this estimate. 13

Note that there may be an endogeneity issue pertaining to the use of the dummy HH-C, which denotes whether a household ever uses a coupon or not, since this decision may be determined contemporaneously with the brand choice decision.

P. Chintagunta et al. / Journal of Econometrics 103 (2001) 111–153

135

Table 4 Estimates with demographic variablesa Variable

PLL

PRICE

−2:227

DISPLAY FEATURE LAG CONSTANT ln INCOME HHSIZE HH-S HH-WW P ln INCOME P HHSIZE P HH-S P HH-WW P HH-C 1 Total price e-ectb

−Log-likelihood a Standard

(1.284) 0.870 (0.176) 1.373 (0.092) 3.419 (0.084) −1:666 (0.523) 0.327 (0.178) 0.065 (0.078) 0.535 (0.325) 0.169 (0.236) −0:336 (0.428) 0.080 (0.184) −0:844 (0.747) −0:702 (0.551) 0.200 (0.204)

−2:600 2093.462

PLLHET −3:009

(0.252) 0.878 (0.176) 1.372 (0.092) 3.417 (0.084) −1:364 (0.243) 0.212 (0.073) 0.103 (0.037) 0.494 (0.129)

−0:387

(0.216)

−3:060 2094.853

−4:516

(1.683) 1.024 (0.219) 1.441 (0.113) 2.108 (0.114) −0:741 (0.781) 0.333 (0.261) −0:067 (0.122) 0.318 (0.476) 0.217 (0.343) 0.058 (0.552) 0.087 (0.251) 1.187 (0.920) −0:574 (0.697) 0.280 (0.375) 1.655 (0.086) −4:265 1821.359

−4:090

(0.323) 1.035 (0.218) 1.443 (0.113) 2.116 (0.114) −0:821 (0.466) 0.320 (0.138)

1.708 (0.493)

1.660 (0.086) −4:002 1822.5326

errors in parentheses. at the means of demographic variables.

b Evaluated

on the price are, as expected, less precisely estimated. However, the total price effect, which includes only significant explanatory variables, is close to the previous estimates. The results show that the estimates produced by the pooled procedures are very robust with respect to different specifications. If there is any additional spurious state dependence and heterogeneity left, it is not captured by household demographic characteristics.

We next investigate the sensitivity of the estimates produced by the pooled logit methods with respect to the specification of the error distribution, the


Table 5. Probit estimatesᵃ

| | $\beta_p$ | $\beta_d$ | $\beta_f$ | $\gamma$ | $\mu_\alpha$ | $\sigma_\alpha$ |
|---|---|---|---|---|---|---|
| PP | −2.300 (0.206) | 0.636 (0.146) | 1.581 (0.076) | | 1.174 (0.085) | |
| PPL | −3.002 (0.243) | 0.850 (0.175) | 1.396 (0.092) | 3.626 (0.080) | −0.429 (0.102) | |
| PPHET | −3.625 (0.293) | 1.021 (0.207) | 1.397 (0.108) | | 0.737 (0.142) | 1.514 (0.072) |
| PPLHET | −3.769 (0.309) | 1.014 (0.216) | 1.439 (0.113) | 2.104 (0.119) | 0.131 (0.154) | 1.018 (0.064) |

ᵃ Standard errors in parentheses.

exogeneity assumption on the initial observation of each string, and the distribution of the household effects.

Table 5 presents the results obtained assuming that the transitory error term $\varepsilon_{it}$ is normally distributed and independent across households and over purchase occasions. We estimate the model by the pooled probit approach without the lagged choice in the explanatory variable set (PP) and with the lag, treated as exogenous (PPL); and by the pooled probit with normally distributed random effects without the lag (PPHET) and with the lag, treated as exogenous (PPLHET). We note that the results change very little as we replace the logit assumption with the normality assumption (compare with the estimates of Table 3). The most important difference from the pooled logit estimates is in the magnitude of the standard deviation of the random effects, which drops to 1.514 when the lag is excluded (PPHET), and to 1.018 when it is included (PPLHET). Given the relative insensitivity of the estimates with respect to the distribution of the transitory errors, we focus in the rest of this section on the logit specification.

All previous estimation by the pooled methods, when the lagged choice is included as an explanatory variable, treats the initial observations in each string as exogenous. However, as pointed out by Heckman (1981), this will in general lead to inconsistent estimates, especially of the habit parameter $\gamma$, if unobserved heterogeneity is present. It therefore becomes important to model the probability of the initial brand choice in each purchase string, i.e. $p_0(x_i, \alpha_i) \equiv \Pr(d_{i0} = 1 \mid x_i, \alpha_i)$. Note that if the process is stationary and has been operating for a long time, it may be reasonable to assume that it is in equilibrium, so that we could take $p_0(x_i, \alpha_i)$ to be the steady state probability of choosing 1 given $(x_i, \alpha_i)$. In the absence of exogenous covariates, this probability may be easily shown to be

$$\frac{\Lambda(\alpha_i)}{1 - \Lambda(\gamma + \alpha_i) + \Lambda(\alpha_i)}.$$
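The steady state probability for the case without covariates is the stationary distribution of a two-state Markov chain in which the probability of choosing brand 1 is $\Lambda(\alpha_i)$ after a purchase of brand 0 and $\Lambda(\gamma + \alpha_i)$ after a purchase of brand 1. The following is a minimal sketch, assuming that $\Lambda$ is the standard logistic cdf; the function names are hypothetical, and the parameter values are the PLLHET point estimates quoted in the text, used purely for illustration. It compares the closed form with the limit obtained by iterating the chain.

```python
import math


def logistic_cdf(z: float) -> float:
    """Standard logistic cdf (assumed to be the Lambda of the text)."""
    return 1.0 / (1.0 + math.exp(-z))


def steady_state_prob(gamma: float, alpha: float) -> float:
    """Closed-form stationary probability of choosing brand 1."""
    p01 = logistic_cdf(alpha)          # Pr(d_t = 1 | d_{t-1} = 0)
    p11 = logistic_cdf(gamma + alpha)  # Pr(d_t = 1 | d_{t-1} = 1)
    return p01 / (1.0 - p11 + p01)


def iterate_chain(gamma: float, alpha: float,
                  p_start: float = 0.5, n_steps: int = 200) -> float:
    """Push an arbitrary starting probability through the chain repeatedly."""
    p01 = logistic_cdf(alpha)
    p11 = logistic_cdf(gamma + alpha)
    p = p_start
    for _ in range(n_steps):
        p = p * p11 + (1.0 - p) * p01
    return p


# Illustrative values only: gamma and the mean of alpha_i from PLLHET.
gamma, alpha = 2.126, 0.198
closed_form = steady_state_prob(gamma, alpha)
iterated = iterate_chain(gamma, alpha)
print(closed_form, iterated)
```

Because the chain contracts at rate $\Lambda(\gamma + \alpha_i) - \Lambda(\alpha_i) < 1$, the iterated value settles on the closed form regardless of the starting probability.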


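Table 6. PLLHET: endogenous initial conditionsᵃ

| $\beta_p$ | $\beta_d$ | $\beta_f$ | $\gamma$ | $\mu_\alpha$ | $\sigma_\alpha$ |
|---|---|---|---|---|---|
| −4.053 (0.274) | 0.803 (0.178) | 1.401 (0.115) | 1.598 (0.115) | 0.046 (0.133) | 1.770 (0.102) |

ᵃ Standard errors in parentheses.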

In the presence of exogenous variables, calculation of the steady state probability would require specification of the distribution of $x_{it}$ as well. Instead, we approximate it by

$$\frac{\Lambda(\bar{x}\beta + \alpha_i)}{1 - \Lambda(\bar{x}\beta + \gamma + \alpha_i) + \Lambda(\bar{x}\beta + \alpha_i)}, \tag{12}$$

where $\bar{x}$ is the overall sample mean of $x_{it}$, averaged over both households and purchase occasions. Table 6 reports the estimates obtained by maximizing the log-likelihood function (11) where $p_0(x_i, \alpha_i)$ is specified as in (12) and $f(\alpha \mid x_i)$ is taken to be the normal density with mean $\mu_\alpha$ and variance $\sigma_\alpha^2$. The results reveal that the point estimates of the coefficients on the exogenous variables do not change much. However, the estimate of $\gamma$ drops significantly to 1.598, compared to 2.126 produced by PLLHET when the initial observations are treated as exogenous (see Table 3). In other words, assuming exogenous initial conditions leads to overestimating the amount of state dependence. This finding is consistent with results reported in the literature for other dynamic discrete choice models,¹⁴ and underlines the importance of appropriately modeling initial conditions when random effects methods are used in estimating state dependence.

We proceed to investigate the sensitivity of the point estimates with respect to the normality assumption underlying the random effects approach. We estimate a random effects model using a discrete distribution with 2–5 support points (we return to our original assumption that initial conditions are exogenous). The results are reported in Table 7. We note that the estimates hardly change as we increase the number of support points, and that they are very close to the ones obtained by PLLHET with normal random effects (see Table 3). We do, however, observe a small but significant decrease in the estimated values for $\gamma$ as the number of support points increases. The Schwarz Information Criterion for model selection (last column of Table 7) suggests that the model with a four-point distribution for the random effects

¹⁴ See Chay and Hyslop (1998) for an investigation of the issue of modeling initial conditions in dynamic discrete choice models of labor force and welfare participation.


Table 7. PLLHET: discrete distribution of random effects

Columns $\beta_p$ to $\gamma$ are parameter estimatesᵃ; columns 1 to 5 are mass pointsᵇ.

| Support points | $\beta_p$ | $\beta_d$ | $\beta_f$ | $\gamma$ | 1 | 2 | 3 | 4 | 5 | SIC |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | −3.652 (0.293) | 0.969 (0.198) | 1.398 (0.108) | 2.383 (0.107) | −0.674 [0.711] | 2.185 [0.289] | | | | 3811 |
| 3 | −3.824 (0.306) | 1.058 (0.214) | 1.394 (0.112) | 2.157 (0.114) | −1.672 [0.327] | 0.290 [0.450] | 2.734 [0.224] | | | 3760 |
| 4 | −3.833 (0.312) | 1.080 (0.220) | 1.459 (0.115) | 2.067 (0.116) | −1.728 [0.307] | 0.199 [0.424] | 2.230 [0.232] | 5.351 [0.037] | | 3741 |
| 5 | −3.844 (0.313) | 1.098 (0.221) | 1.465 (0.116) | 2.052 (0.117) | −2.809 [0.090] | −1.082 [0.291] | 0.328 [0.354] | 2.264 [0.228] | 5.383 [0.037] | 3756 |

ᵃ Standard errors in parentheses. Correct only to the extent that the true distribution of the random effects has indeed 2–5 support points.
ᵇ Probabilities in square brackets.

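Table 8. PLL: fixed effects estimation

Parameter estimatesᵃ

| $\beta_p$ | $\beta_d$ | $\beta_f$ | $\gamma$ |
|---|---|---|---|
| −3.933 (1.80) | 1.109 (0.235) | 1.289 (0.116) | 1.138 (0.106) |

Fixed effects distributionᵇ (number of households)

| [−70, −7] | (−7, −2] | (−2, −1] | (−1, 1) | [1, 2) | [2, 7) | [7, 70] |
|---|---|---|---|---|---|---|
| 322 | 2 | 18 | 122 | 63 | 74 | 136 |

ᵃ Standard errors in parentheses.
ᵇ Intervals in square brackets.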

is the most appropriate in terms of precision of the estimates and parsimony of the parametrization.

We next attempt to capture any spurious state dependence by directly estimating the household specific effects. Using the objective function of the PLL procedure, we carry out the maximization in two steps, maximizing, for any value of $\beta$ and $\gamma$, the log-likelihood with respect to the $\alpha_i$'s over a 118-point grid on the interval [−70, 70]. Out of the 737 households, 458 (approximately 60%) never switch between brands. Of those, 322 always buy Nordica and the remaining 136 always buy Yoplait. As expected, the estimated effects for these households are very large (in absolute value). Note that the average number of purchase occasions (excluding the initial observations in each string) is rather small, approximately 8, compared to the number of household-specific effects that we want to estimate (737). We therefore do not expect the estimates to be consistent. However, as Table 8 reveals, the estimated effect of the lagged choice drops significantly, by almost 50%, from 2.126 to 1.138. All other coefficient estimates are very close to those estimated by PLLHET in Table 3.
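The inner step of this two-step maximization can be sketched as follows: for fixed $\beta$ and $\gamma$, each household's effect is chosen on a grid so as to maximize that household's logit log-likelihood. This is only an illustrative sketch under stated assumptions. The grid is taken to be evenly spaced (the text gives only the number of points and the interval), the function names are hypothetical, and the index values $x_{it}\beta + \gamma d_{it-1}$ are made-up numbers rather than data from the sample.

```python
import numpy as np


def household_loglik(alpha, index, d):
    """Logit log-likelihood of one household's 0/1 choices d, given
    index_t = x_it*beta + gamma*d_{i,t-1} and a candidate effect alpha."""
    z = index + alpha
    # log Lambda(z) = -log(1 + exp(-z));  log(1 - Lambda(z)) = -log(1 + exp(z))
    return np.sum(-d * np.logaddexp(0.0, -z) - (1 - d) * np.logaddexp(0.0, z))


def best_alpha_on_grid(index, d, grid):
    """Grid point that maximizes the household's log-likelihood."""
    values = np.array([household_loglik(a, index, d) for a in grid])
    return grid[np.argmax(values)]


# Assumption: an evenly spaced 118-point grid on [-70, 70].
grid = np.linspace(-70.0, 70.0, 118)

# Hypothetical index values for eight purchase occasions of one household.
index = np.array([0.3, -0.2, 0.1, 0.4, -0.1, 0.2, 0.0, -0.3])

never_switches = np.ones(8)                   # always buys brand 1
switcher = np.array([1, 0, 1, 1, 0, 0, 1, 0])  # switches between brands

alpha_never = best_alpha_on_grid(index, never_switches, grid)
alpha_switch = best_alpha_on_grid(index, switcher, grid)
print(alpha_never, alpha_switch)
```

For a household that never switches, the log-likelihood is monotone in the effect, so the maximizer sits at the edge of the grid. This is the mechanism behind the very large estimated effects for such households and the heavy mass in the extreme intervals of Table 8.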


Table 9. Summary statistics of marketing variables, three-brand sampleᵃ

| Brand | Share (%) | Average price (cents/oz) | Proportion of displays (%) | Proportion of features (%) |
|---|---|---|---|---|
| Nordica | 42.23 | 6.73 (0.99) | 4.52 | 23.06 |
| Yoplait | 40.07 | 9.99 (1.03) | 1.66 | 4.74 |
| Dannon | 17.70 | 8.25 (0.50) | 0.00 | 2.66 |

ᵃ Standard deviations in parentheses.

Finally, we examine the sensitivity of results with respect to the number of brands used in the analysis. We include one more brand (Dannon), which has the next largest market share (15.6% in terms of weight and 10.9% in terms of units bought). This increases the number of households to 839 and the number of purchase occasions to 7434 (after excluding all occasions where other brands were bought and including only households who have at least two consecutive purchases of any one of the three brands). Summary statistics for this sample may be found in Table 9. We consider the trinomial logit model,

$$\Pr\left(d_{jit} = 1 \,\middle|\, \{\{X_{jis}\}_{j=0}^{2}\}_{s=0}^{T}, \{\alpha_{ji}\}_{j=0}^{2}, \{\{d_{jis}\}_{j=0}^{2}\}_{s=0}^{t-1}\right) = \frac{\exp(X_{jit}\beta_j + \gamma_j d_{jit-1} + \alpha_{ji})}{\sum_{k=0}^{2} \exp(X_{kit}\beta_k + \gamma_k d_{kit-1} + \alpha_{ki})},$$

where $X_{jit}$ contains brand $j$'s price (in natural logarithm), and its display and feature dummies. Note that the model above imposes that there are no cross effects across brands, i.e. $\beta_{kj} = \gamma_{kj} = 0$ for all $k \neq j$ in the notation of Section 2. This restriction is common in the literature although it is not necessary. To keep the notation consistent with the one used in the two-brand case, we use $j = 0$ to denote Nordica, $j = 1$ to denote Yoplait, and $j = 2$ to denote Dannon.

Table 10 reports our estimates using PLL (assuming exogenous initial conditions and no household/brand heterogeneity, i.e. $\alpha_{ji} = 0$ for all $i$ and $j$); PLLHET (assuming exogenous initial conditions, and that $\alpha_{1i} - \alpha_{0i}$ and $\alpha_{2i} - \alpha_{0i}$ are normally distributed random effects with means $\mu_{\alpha_1}$ and $\mu_{\alpha_2}$, variances $\sigma_{\alpha_1}^2$ and $\sigma_{\alpha_2}^2$, respectively, and correlation coefficient equal to $\rho$); and HK (with the bandwidth constant set equal to 1). In panel A of the table we restrict all coefficients to be equal across the three brands ($\beta_j = \beta_l = \beta$ and $\gamma_j = \gamma_l = \gamma$ for all $j, l$), which is again common practice in the literature. In panel B we allow the coefficient of the lagged choice to vary across brands. The suffix S denotes the case where the model was estimated using only households who switch at least once among the three brands.

Comparing the results from the three-brand case (Table 10) to those from the two-brand case (Table 3), we find the following. For a given estimation

Table 10. Estimates using three brandsᵃ

(A) Identical $\beta_j$'s and $\gamma_j$'s

| | $\beta_p$ | $\beta_d$ | $\beta_f$ | $\gamma$ | $\mu_{\alpha_1}$ | $\mu_{\alpha_2}$ | $\sigma_{\alpha_1}$ | $\sigma_{\alpha_2}$ | $\rho$ |
|---|---|---|---|---|---|---|---|---|---|
| PLL | −3.245 (0.183) | 0.868 (0.146) | 1.399 (0.071) | 3.509 (0.061) | −0.253 (0.079) | −1.403 (0.056) | | | |
| PLLHET | −4.481 (0.321) | 1.469 (0.252) | 1.881 (0.119) | 2.403 (0.122) | 0.245 (0.158) | −1.354 (0.128) | 1.650 (0.086) | 1.576 (0.101) | 0.375 (0.086) |
| PLLHET-S | −4.159 (0.338) | 1.530 (0.281) | 1.796 (0.128) | 1.649 (0.128) | 0.798 (0.172) | −0.705 (0.143) | 1.346 (0.096) | 1.512 (0.119) | 0.385 (0.095) |
| HK10 | −3.009 (0.319) | 0.749 (0.298) | 0.871 (0.196) | 0.807 (0.222) | | | | | |

(B) Identical $\beta_j$'s; different $\gamma_j$'s

| | $\beta_p$ | $\beta_d$ | $\beta_f$ | $\gamma_0$ | $\gamma_1$ | $\gamma_2$ | $\mu_{\alpha_1}$ | $\mu_{\alpha_2}$ | $\sigma_{\alpha_1}$ | $\sigma_{\alpha_2}$ | $\rho$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PLL | −3.251 (0.184) | 0.864 (0.146) | 1.398 (0.071) | 3.768 (0.164) | 3.180 (0.159) | 3.645 (0.169) | 1.643 (0.104) | 0.384 (0.078) | | | |
| PLLHET | −4.479 (0.321) | 1.454 (0.252) | 1.881 (0.119) | 2.644 (0.317) | 2.085 (0.329) | 2.454 (0.368) | 1.561 (0.199) | −0.102 (0.157) | 1.655 (0.087) | 1.556 (0.107) | 0.343 (0.097) |
| PLLHET-S | −4.175 (0.339) | 1.534 (0.281) | 1.796 (0.128) | 1.741 (0.325) | 1.545 (0.336) | 1.641 (0.387) | 1.670 (0.206) | 0.144 (0.166) | 1.348 (0.098) | 1.510 (0.125) | 0.374 (0.104) |
| HK10 | −3.076 (0.491) | 0.787 (0.348) | 0.860 (0.199) | 1.733 (0.467) | 0.502 (0.417) | 0.834 (0.388) | | | | | |

ᵃ Standard errors in parentheses.

method, the magnitudes of the parameter estimates are similar across the two cases (one exception is the $\beta_d$ estimate using HK, which in the three-brand case is considerably higher than in the two-brand case and also becomes statistically significant). In general, all parameter estimates tend to increase (in absolute value) when a third brand is included, with the exception of $\beta_p$ and $\gamma$ when HK is used (although the drop in the $\beta_p$ estimate is quite small compared to its standard error). The increase in the (absolute) magnitude of the estimated price and promotional effects is not too surprising, since one might expect that with the addition of a third brand in the analysis, consumers' demand could exhibit higher elasticity with respect to price and firms' promotional efforts. The increase of the habit effect as estimated by the pooled methods when a third brand is added might indicate that some of the increased heterogeneity in households and brand decisions in the enlarged data set is captured by a larger estimate of the state dependence. In contrast, the HK estimate of $\gamma$, which is robust to the amount of heterogeneity across households and brands, decreases when a third brand is included in the analysis.

An interesting result that emerges from our analysis when the habit effects are allowed to vary across brands (Table 10) is that there appear to be significant differences in the estimates for the different brands.¹⁵ In particular, the habit effect is strongest for Nordica, followed by Dannon and then Yoplait. This pattern is consistent across all estimation methods. Future research may endeavor to uncover the reasons for the differences in brand loyalty, for example the extent of advertising, price promotions, etc. This finding may further suggest that common ad hoc restrictions on behavioral parameters (e.g. identical own effects, i.e. that $\beta_{jj}$ and $\gamma_{jj}$ are identical for all $j$), in addition to those necessary to identify the parameters of the econometric model, may bias results.

Concluding our discussion of the three-brand case, we find that there do not appear to be serious selection problems associated with our restricting the analysis to two brands. The pattern of the variation in the estimates across methods remains the same as in the two-brand case. The estimate of the habit effect is more sensitive to the choice of estimation method than to the inclusion of a third brand. One factor that could potentially explain the differences in results when a third brand is added in the analysis is that our sample in the three-brand case includes additional households. We would therefore expect the sample in the three-brand case to reflect more heterogeneity in purchase behavior. It is precisely in such a situation that we expect the HK method to do a better job capturing the underlying properties of the data.

¹⁵ We also estimated versions of the model where some of the coefficients on the exogenous variables were allowed to vary across brands. However, we did not find much variation in the estimates.


Summarizing our empirical analysis, we find that all procedures yield significant price elasticities (above 1) in brand choice, and sensitivity of households to firms' marketing efforts. Furthermore, the estimates suggest that there exist large habit effects and substantial heterogeneity across households. The size of the estimated parameters, however, varies considerably across estimation methods. To investigate this issue further and identify situations where the different methods are most reliable in producing point estimates, we next present the results of a Monte Carlo study that uses the design of the data for the two-brand case.

6. Monte Carlo results

In this section we report results of Monte Carlo experiments that investigate the sensitivity of various estimators considered in the previous section with respect to: (a) the assumption of strict exogeneity of initial observations; (b) the magnitude of the variance of the individual effects; (c) the correlation of individual effects and exogenous covariates. The estimators under consideration are the conditional logit (CL); the conditional logit with lag (CLL); the conditional logit with pairwise comparisons (CLP); the conditional logit with pairwise comparisons with lag (CLLP); the pooled logit (PL); the pooled logit with lag (PLL); the pooled logit with normally distributed random effects (PLHET); the pooled logit with lag and normally distributed random effects (PLLHET); the pooled logit with lag, normally distributed random effects and endogenous initial conditions (PLLHETI); and the Honoré–Kyriazidou (HK) estimator for three different values of the bandwidth constant: 0.5, 1.0, and 3.0.

In all experiments we generate a panel of $T = 5$ observations for each one of $n = 771$ individuals. Note that 5 is the median string length of the sample used in estimating the model in the previous section, and 771 is the number of strings with exactly five non-overlapping consecutive purchases of Yoplait and Nordica that we obtain from the original sample of 17,679 purchase occasions. The response variables $d_{it}$ for each individual and for the last four periods of the five-period panel, i.e. for $t = 1, 2, 3, 4$, are generated according to the logit model:

$$d_{it} = 1\{P_{it}\beta_p + D_{it}\beta_d + F_{it}\beta_f + \gamma d_{it-1} + \alpha_i + \varepsilon_{it} > 0\} = 1\{x_{it}\beta + \gamma d_{it-1} + \alpha_i + \varepsilon_{it} > 0\},$$


where $P_{it}$ is the difference in the log-prices between the two brands, and $D_{it}$ and $F_{it}$ are the differences in the display and feature variables between the two brands. The coefficients on the explanatory variables $\beta = (\beta_p, \beta_d, \beta_f)$ and $\gamma$ are set equal to the point estimates obtained from PLLHET (i.e. $\beta = (-3.821, 1.031, 1.456)$ and $\gamma = 2.126$). The error terms $\varepsilon_{it}$ are drawn independently over time and over individuals from a logistic distribution. The exogenous regressor strings $x_i \equiv (x_{i1}, x_{i2}, x_{i3}, x_{i4})$ are generated by random drawing with replacement from the sample of 771 strings of length equal to 5. The designs described below differ in the way the initial observation $d_{i0}$ and the individual effect $\alpha_i$ for each individual are generated. In each design the experiment is replicated 100 times.

In the first experiment (UN0 design), we take the initial observations as exogenous and the individual effects identically equal to the mean of $\alpha_i$ as estimated by PLLHET ($\mu_\alpha = 0.198$) for all individuals. The initial observations on the response variables, $d_{i0}$, are generated similarly to the exogenous regressors $x_i$, i.e. by random drawing with replacement from the sample of initial observations of the 771 strings. Note that in this design $\sigma_\alpha = 0$. Thus, PLL estimates the model consistently.

In the second set of experiments (UNLO and UNME designs), we take the initial observations as exogenous and the individual effects to be independent of the regressors, and we examine the sensitivity of the estimators with respect to the magnitude of the variance of the individual effects relative to the variance of the time-varying error component. Specifically, the individual effects $\alpha_i$ are generated as $N(\mu_\alpha, \sigma_\alpha^2)$, independently over individuals, with $\mu_\alpha$ equal to the mean of $\alpha_i$ estimated by PLLHET (i.e. $\mu_\alpha = 0.198$) and $\sigma_\alpha$ equal to $\pi/(2\sqrt{3})$ for the UNLO design and equal to $\pi/\sqrt{3}$ for the UNME design.
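The data-generating process for the designs with exogenous initial conditions (UN0, UNLO, UNME) can be sketched as below. This is a hedged illustration rather than the authors' code. The regressor strings and initial choices are synthetic random draws (the paper resamples them from the yogurt purchase data, which are not available here), and the function and variable names are hypothetical. The coefficient values and the three settings of $\sigma_\alpha$ are those stated in the text.

```python
import math

import numpy as np


def simulate_panel(beta, gamma, mu_alpha, sigma_alpha, x, d0, rng):
    """Simulate d_it = 1{x_it*beta + gamma*d_{i,t-1} + alpha_i + eps_it > 0}
    for t = 1..T, with logistic eps_it and exogenous initial choices d0."""
    n, T, _ = x.shape
    # Individual effects, independent of the regressors.
    alpha = mu_alpha + sigma_alpha * rng.standard_normal(n)
    d = np.empty((n, T + 1), dtype=int)
    d[:, 0] = d0
    for t in range(1, T + 1):
        eps = rng.logistic(size=n)  # standard logistic transitory errors
        latent = x[:, t - 1, :] @ beta + gamma * d[:, t - 1] + alpha + eps
        d[:, t] = (latent > 0).astype(int)
    return d, alpha


rng = np.random.default_rng(0)
n, T = 771, 4
beta = np.array([-3.821, 1.031, 1.456])  # (beta_p, beta_d, beta_f) from PLLHET
gamma, mu_alpha = 2.126, 0.198

# sigma_alpha by design: zero, half of, and equal to the logistic std. dev.
sigma_by_design = {
    "UN0": 0.0,
    "UNLO": math.pi / (2.0 * math.sqrt(3.0)),
    "UNME": math.pi / math.sqrt(3.0),
}

# Assumption: synthetic stand-ins for the resampled regressor strings
# (P_it, D_it, F_it) and for the resampled initial choices d_i0.
x = rng.normal(scale=0.2, size=(n, T, 3))
d0 = rng.integers(0, 2, size=n)

panels = {
    name: simulate_panel(beta, gamma, mu_alpha, sigma, x, d0, rng)[0]
    for name, sigma in sigma_by_design.items()
}
```

Each entry of `panels` holds one replication of a five-period panel, with column 0 the initial choice. In a full experiment this step would be repeated 100 times per design and each estimator applied to every replication.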
Note that $\pi/\sqrt{3} \approx 1.814$ is the standard deviation of the logistic distribution and it is approximately equal to the standard deviation of $\alpha$ estimated by PLLHET (1.677). Thus, in the UNME design the variance of the individual effects $\alpha$ is equal to the variance of the transitory errors $\varepsilon_{it}$, while in the UNLO design $\sigma_\alpha^2$ is equal to 1/4 of the variance of $\varepsilon_{it}$.

In the next sets of experiments (COLO, COME), we relax the assumption of independence between individual effects and regressors. In particular, the individual effects are generated as a linear combination of one of the exogenous covariates, $P_{it}$, over the four time periods:

$$\alpha_i = \mu_\alpha + c \sum_{t=1}^{4}\left(P_{it} - \frac{1}{n}\sum_{i=1}^{n} P_{it}\right),$$

where $\mu_\alpha = 0.198$, and $c$ is such that the implied $\sigma_\alpha$ is equal to the two values considered in the independence case: $\pi/(2\sqrt{3})$ in the COLO design, and $\pi/\sqrt{3}$ for the COME design. We specify $c$ to be positive which implies


that the induced correlation between the individual effects and the logarithm of the brand price ratio is positive.

Finally, for all designs with $\sigma_\alpha = \pi/\sqrt{3}$, we relax the assumption of strict exogeneity of the initial choice. The individual effects are either independent of the exogenous covariates (UNMEI design), or positively correlated with $P_{it}$ as above (COMEI design). The initial $d_{i0}$'s are drawn as

$$d_{i0} = 1\{\alpha_i + \varepsilon_{i0} > 0\},$$

where $\varepsilon_{i0}$ is drawn again from a logistic distribution independently of everything else. This design implies that $d_{i0}$ is now correlated with the individual effect.

The results from the Monte Carlo experiments are reported in Tables 11, 12, 13 and 14. Each table reports the mean bias, the standard deviation, and the root mean squared error (across the 100 replications) of the estimates of each one of the four parameters, $\beta_p$, $\beta_d$, $\beta_f$, and $\gamma$. We observe the following:

Across all seven designs, the conditional logit methods (CL, CLL, CLP, and CLLP) give overall smaller average biases than the other methods for the coefficients of the exogenous variables, $\beta_p$, $\beta_d$, and $\beta_f$. In particular, the mean bias for $\beta_p$ is at most 6% of the true parameter value. The biases for $\beta_d$ are somewhat larger, up to 14% of the true value, while the bias for $\beta_f$ is at most 6% of the true value. There is no indication that using the pairwise estimators, CLP and CLLP, biases the estimates more than using the conditional likelihood estimators, CL and CLL. Introducing the lag in general lowers the biases for all $\beta$ coefficients. The feedback parameter $\gamma$ on the lagged choice is significantly underestimated (biased towards zero) by both CLL and CLLP for all designs. On average, the bias ranges from 72% to 100% of the true value. As expected, increasing the variance of the individual effects does not affect the average biases for any of the coefficients.
With the exception of the UN0 design, the pooled methods, PL and PLL, often produce very biased estimates for the price coefficient $\beta_p$. Specifically, in all designs where the individual effects are independent of the $x$'s (UNLO, UNME, UNMEI), $\beta_p$ is underestimated in absolute terms (positive bias) by 12% up to 47% of the true value, although the estimates are still of the correct sign (negative). The biases increase substantially when correlation between $\alpha_i$ and $x_i$ is introduced (COLO, COME, COMEI designs), ranging from 82% to almost 170% of the true parameter value.¹⁶ In particular, in the COME and COMEI designs, PL and PLL produce on average price effects that are of the wrong sign (positive). Increasing $\sigma_\alpha$ has a big effect; average biases for $\beta_p$ increase by 2–3 times. The coefficients of the display and the feature variables, $\beta_d$ and $\beta_f$, are almost invariably underestimated, although

¹⁶ In experiments, reported in the working paper version, where negative correlation between $\alpha_i$ and $P_i$ was introduced, we found that the estimated $\beta_p$'s were negatively biased for all pooled methods. On average $\beta_p$ was overestimated by approximately 50–100%.


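Table 11

(A) Mean bias for $\beta_p$

| | UN0 | UNLO | UNME | COLO | COME | UNMEI | COMEI |
|---|---|---|---|---|---|---|---|
| CL | 0.14 | 0.10 | 0.07 | 0.08 | 0.14 | −0.24 | −0.22 |
| CLL | 0.03 | −0.02 | −0.06 | −0.05 | −0.01 | −0.24 | −0.24 |
| CLP | 0.22 | 0.16 | 0.10 | 0.11 | 0.17 | −0.23 | −0.18 |
| CLLP | 0.08 | 0.02 | −0.05 | −0.03 | −0.00 | −0.23 | −0.19 |
| PL | 0.40 | 0.88 | 1.63 | 3.75 | 6.12 | 1.80 | 6.47 |
| PLL | 0.01 | 0.47 | 1.18 | 3.10 | 5.14 | 1.06 | 4.43 |
| PLHET | −0.35 | −0.21 | 0.41 | 3.57 | 3.64 | 0.82 | 3.64 |
| PLLHET | −0.28 | −0.03 | 0.13 | 3.72 | 4.14 | 0.79 | 4.14 |
| PLLHETI | 0.17 | 0.29 | 1.06 | 2.73 | 4.72 | 0.13 | 2.89 |
| HK05 | −0.07 | −0.26 | −0.23 | −0.23 | −0.21 | −0.03 | −0.10 |
| HK10 | −0.20 | −0.36 | −0.34 | −0.31 | −0.32 | −0.14 | −0.21 |
| HK30 | −0.34 | −0.46 | −0.45 | −0.42 | −0.46 | −0.27 | −0.33 |

(B) Standard deviation for $\beta_p$

| | UN0 | UNLO | UNME | COLO | COME | UNMEI | COMEI |
|---|---|---|---|---|---|---|---|
| CL | 0.45 | 0.48 | 0.52 | 0.39 | 0.41 | 0.59 | 0.50 |
| CLL | 0.45 | 0.48 | 0.52 | 0.40 | 0.42 | 0.58 | 0.51 |
| CLP | 0.47 | 0.51 | 0.55 | 0.41 | 0.41 | 0.64 | 0.52 |
| CLLP | 0.48 | 0.51 | 0.56 | 0.42 | 0.44 | 0.64 | 0.53 |
| PL | 0.28 | 0.24 | 0.27 | 0.26 | 0.27 | 0.27 | 0.26 |
| PLL | 0.28 | 0.25 | 0.28 | 0.25 | 0.25 | 0.30 | 0.25 |
| PLHET | 0.35 | 0.33 | 0.34 | 0.32 | 0.30 | 0.30 | 0.30 |
| PLLHET | 0.31 | 0.31 | 0.36 | 0.38 | 0.34 | 0.32 | 0.34 |
| PLLHETI | 0.28 | 0.26 | 0.29 | 0.26 | 0.26 | 0.37 | 0.32 |
| HK05 | 1.21 | 1.36 | 1.39 | 1.19 | 1.30 | 1.43 | 1.44 |
| HK10 | 1.09 | 1.20 | 1.26 | 1.06 | 1.11 | 1.28 | 1.31 |
| HK30 | 1.03 | 1.13 | 1.18 | 0.98 | 1.02 | 1.21 | 1.26 |

(C) RMSE for $\beta_p$

| | UN0 | UNLO | UNME | COLO | COME | UNMEI | COMEI |
|---|---|---|---|---|---|---|---|
| CL | 0.47 | 0.49 | 0.52 | 0.40 | 0.43 | 0.63 | 0.55 |
| CLL | 0.45 | 0.48 | 0.52 | 0.41 | 0.42 | 0.63 | 0.56 |
| CLP | 0.51 | 0.53 | 0.56 | 0.42 | 0.45 | 0.67 | 0.55 |
| CLLP | 0.48 | 0.50 | 0.56 | 0.42 | 0.44 | 0.67 | 0.56 |
| PL | 0.48 | 0.91 | 1.66 | 3.76 | 6.12 | 1.82 | 6.47 |
| PLL | 0.28 | 0.53 | 1.21 | 3.11 | 5.15 | 1.10 | 4.44 |
| PLHET | 0.49 | 0.38 | 0.53 | 3.58 | 3.65 | 0.88 | 3.65 |
| PLLHET | 0.41 | 0.31 | 0.38 | 3.74 | 4.15 | 0.85 | 4.15 |
| PLLHETI | 0.33 | 0.39 | 1.10 | 2.75 | 4.73 | 0.39 | 2.90 |
| HK05 | 1.21 | 1.38 | 1.41 | 1.21 | 1.31 | 1.42 | 1.43 |
| HK10 | 1.10 | 1.24 | 1.29 | 1.10 | 1.15 | 1.28 | 1.32 |
| HK30 | 1.08 | 1.21 | 1.26 | 1.06 | 1.11 | 1.24 | 1.30 |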

the biases are not as large compared to $\beta_p$, the highest being approximately 50% of the true coefficient values. Similar to the $\beta_p$ estimates, the biases for both $\beta_d$ and $\beta_f$ increase with $\sigma_\alpha$. $\gamma$ is in general overestimated by PLL


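Table 12

(A) Mean bias for $\beta_d$

| | UN0 | UNLO | UNME | COLO | COME | UNMEI | COMEI |
|---|---|---|---|---|---|---|---|
| CL | −0.14 | −0.11 | −0.10 | −0.09 | 0.04 | 0.04 | 0.08 |
| CLL | −0.07 | −0.04 | −0.02 | −0.02 | 0.10 | 0.04 | 0.08 |
| CLP | −0.12 | −0.08 | −0.07 | −0.09 | 0.04 | 0.04 | 0.09 |
| CLLP | −0.05 | −0.01 | 0.00 | −0.02 | 0.10 | 0.04 | 0.09 |
| PL | −0.18 | −0.28 | −0.44 | −0.06 | −0.14 | −0.47 | −0.24 |
| PLL | 0.00 | −0.10 | −0.28 | 0.02 | −0.11 | −0.19 | −0.13 |
| PLHET | −0.04 | −0.06 | −0.18 | 0.07 | 0.01 | −0.23 | 0.01 |
| PLLHET | 0.06 | 0.01 | 0.00 | 0.11 | −0.09 | −0.12 | −0.09 |
| PLLHETI | −0.25 | −0.23 | −0.34 | −0.08 | −0.13 | −0.02 | 0.08 |
| HK05 | 1.75 | 1.42 | 1.99 | 1.15 | 1.24 | 2.02 | 0.82 |
| HK10 | 1.31 | 1.12 | 1.60 | 0.84 | 1.12 | 1.79 | 0.68 |
| HK30 | 1.21 | 1.10 | 1.52 | 0.66 | 1.12 | 1.69 | 0.71 |

(B) Standard deviation for $\beta_d$

| | UN0 | UNLO | UNME | COLO | COME | UNMEI | COMEI |
|---|---|---|---|---|---|---|---|
| CL | 0.37 | 0.39 | 0.37 | 0.32 | 0.36 | 0.36 | 0.40 |
| CLL | 0.38 | 0.39 | 0.38 | 0.32 | 0.36 | 0.36 | 0.40 |
| CLP | 0.39 | 0.41 | 0.40 | 0.32 | 0.38 | 0.38 | 0.41 |
| CLLP | 0.39 | 0.41 | 0.40 | 0.32 | 0.39 | 0.38 | 0.41 |
| PL | 0.21 | 0.21 | 0.20 | 0.21 | 0.19 | 0.20 | 0.20 |
| PLL | 0.26 | 0.25 | 0.23 | 0.25 | 0.22 | 0.23 | 0.21 |
| PLHET | 0.27 | 0.27 | 0.26 | 0.22 | 0.24 | 0.23 | 0.24 |
| PLLHET | 0.28 | 0.27 | 0.28 | 0.23 | 0.22 | 0.24 | 0.22 |
| PLLHETI | 0.22 | 0.23 | 0.22 | 0.23 | 0.21 | 0.25 | 0.24 |
| HK05 | 3.84 | 3.72 | 4.13 | 3.32 | 3.57 | 4.33 | 3.07 |
| HK10 | 3.56 | 3.50 | 3.98 | 3.18 | 3.52 | 4.34 | 2.68 |
| HK30 | 3.71 | 3.72 | 4.10 | 2.78 | 3.57 | 4.25 | 2.68 |

(C) RMSE for $\beta_d$

| | UN0 | UNLO | UNME | COLO | COME | UNMEI | COMEI |
|---|---|---|---|---|---|---|---|
| CL | 0.39 | 0.40 | 0.38 | 0.33 | 0.36 | 0.36 | 0.41 |
| CLL | 0.38 | 0.39 | 0.38 | 0.32 | 0.37 | 0.36 | 0.41 |
| CLP | 0.40 | 0.41 | 0.40 | 0.33 | 0.38 | 0.38 | 0.42 |
| CLLP | 0.39 | 0.41 | 0.39 | 0.32 | 0.40 | 0.38 | 0.42 |
| PL | 0.28 | 0.35 | 0.48 | 0.22 | 0.24 | 0.51 | 0.31 |
| PLL | 0.26 | 0.27 | 0.36 | 0.25 | 0.24 | 0.29 | 0.24 |
| PLHET | 0.27 | 0.28 | 0.31 | 0.23 | 0.24 | 0.32 | 0.24 |
| PLLHET | 0.28 | 0.27 | 0.28 | 0.25 | 0.24 | 0.27 | 0.24 |
| PLLHETI | 0.33 | 0.32 | 0.41 | 0.24 | 0.25 | 0.25 | 0.25 |
| HK05 | 4.20 | 3.96 | 4.56 | 3.49 | 3.76 | 4.76 | 3.16 |
| HK10 | 3.78 | 3.66 | 4.27 | 3.28 | 3.68 | 4.68 | 2.76 |
| HK30 | 3.88 | 3.86 | 4.36 | 2.84 | 3.72 | 4.55 | 2.76 |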

except in the COLO and COME designs. The bias is at most 50% of the true value (UNMEI design). We note that the biases for the designs where initial conditions are endogenous (UNMEI, COMEI) are in absolute value


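Table 13

(A) Mean bias for $\beta_f$

| | UN0 | UNLO | UNME | COLO | COME | UNMEI | COMEI |
|---|---|---|---|---|---|---|---|
| CL | −0.05 | −0.07 | −0.08 | −0.08 | −0.08 | 0.05 | 0.05 |
| CLL | −0.00 | −0.02 | −0.02 | −0.02 | −0.01 | 0.06 | 0.06 |
| CLP | −0.05 | −0.06 | −0.07 | −0.07 | −0.08 | 0.06 | 0.05 |
| CLLP | −0.00 | −0.01 | −0.02 | −0.01 | −0.01 | 0.06 | 0.06 |
| PL | −0.03 | −0.26 | −0.59 | −0.21 | −0.49 | −0.74 | −0.69 |
| PLL | −0.01 | −0.23 | −0.56 | −0.17 | −0.48 | −0.37 | −0.42 |
| PLHET | 0.20 | 0.10 | −0.18 | −0.17 | −0.26 | −0.38 | −0.26 |
| PLLHET | 0.11 | −0.02 | −0.11 | −0.15 | −0.33 | −0.27 | −0.33 |
| PLLHETI | −0.19 | −0.28 | −0.58 | −0.22 | −0.49 | −0.07 | −0.11 |
| HK05 | 0.03 | 0.06 | 0.09 | 0.03 | 0.11 | 0.13 | 0.06 |
| HK10 | −0.02 | 0.00 | 0.03 | 0.00 | 0.06 | 0.05 | −0.01 |
| HK30 | −0.07 | −0.05 | −0.03 | −0.03 | 0.03 | −0.03 | −0.06 |

(B) Standard deviation for $\beta_f$

| | UN0 | UNLO | UNME | COLO | COME | UNMEI | COMEI |
|---|---|---|---|---|---|---|---|
| CL | 0.18 | 0.18 | 0.19 | 0.16 | 0.17 | 0.20 | 0.18 |
| CLL | 0.18 | 0.19 | 0.19 | 0.17 | 0.17 | 0.20 | 0.18 |
| CLP | 0.19 | 0.19 | 0.19 | 0.16 | 0.18 | 0.21 | 0.19 |
| CLLP | 0.19 | 0.19 | 0.19 | 0.17 | 0.18 | 0.21 | 0.19 |
| PL | 0.12 | 0.11 | 0.10 | 0.09 | 0.09 | 0.10 | 0.09 |
| PLL | 0.12 | 0.12 | 0.12 | 0.10 | 0.10 | 0.11 | 0.09 |
| PLHET | 0.14 | 0.14 | 0.13 | 0.11 | 0.11 | 0.12 | 0.11 |
| PLLHET | 0.13 | 0.14 | 0.14 | 0.12 | 0.11 | 0.11 | 0.11 |
| PLLHETI | 0.12 | 0.12 | 0.11 | 0.10 | 0.10 | 0.13 | 0.11 |
| HK05 | 0.36 | 0.42 | 0.49 | 0.36 | 0.37 | 0.47 | 0.39 |
| HK10 | 0.31 | 0.36 | 0.40 | 0.31 | 0.35 | 0.37 | 0.34 |
| HK30 | 0.28 | 0.34 | 0.38 | 0.30 | 0.36 | 0.35 | 0.32 |

(C) RMSE for $\beta_f$

| | UN0 | UNLO | UNME | COLO | COME | UNMEI | COMEI |
|---|---|---|---|---|---|---|---|
| CL | 0.19 | 0.20 | 0.20 | 0.17 | 0.19 | 0.20 | 0.19 |
| CLL | 0.18 | 0.19 | 0.19 | 0.17 | 0.17 | 0.21 | 0.19 |
| CLP | 0.19 | 0.19 | 0.20 | 0.18 | 0.19 | 0.21 | 0.20 |
| CLLP | 0.19 | 0.19 | 0.19 | 0.17 | 0.18 | 0.22 | 0.20 |
| PL | 0.12 | 0.28 | 0.60 | 0.22 | 0.50 | 0.75 | 0.70 |
| PLL | 0.12 | 0.26 | 0.57 | 0.20 | 0.49 | 0.38 | 0.43 |
| PLHET | 0.25 | 0.17 | 0.22 | 0.20 | 0.28 | 0.40 | 0.28 |
| PLLHET | 0.17 | 0.14 | 0.18 | 0.20 | 0.34 | 0.29 | 0.34 |
| PLLHETI | 0.22 | 0.31 | 0.59 | 0.24 | 0.50 | 0.14 | 0.15 |
| HK05 | 0.36 | 0.42 | 0.50 | 0.36 | 0.39 | 0.49 | 0.39 |
| HK10 | 0.31 | 0.35 | 0.40 | 0.31 | 0.35 | 0.37 | 0.34 |
| HK30 | 0.29 | 0.34 | 0.37 | 0.30 | 0.35 | 0.35 | 0.32 |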

considerably larger than in the other designs where the initial conditions are exogenous. For the designs with correlated effects and endogenous initial conditions, increasing $\sigma_\alpha$ tends to lower the average bias for $\gamma$. Finally, in


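Table 14

(A) Mean bias for $\gamma$

| | UN0 | UNLO | UNME | COLO | COME | UNMEI | COMEI |
|---|---|---|---|---|---|---|---|
| CL | – | – | – | – | – | – | – |
| CLL | −1.70 | −1.68 | −1.64 | −1.65 | −1.58 | −2.09 | −2.05 |
| CLP | – | – | – | – | – | – | – |
| CLLP | −1.67 | −1.65 | −1.60 | −1.62 | −1.54 | −2.14 | −2.09 |
| PL | – | – | – | – | – | – | – |
| PLL | 0.00 | 0.14 | 0.40 | −0.21 | −0.18 | 1.09 | 0.47 |
| PLHET | – | – | – | – | – | – | – |
| PLLHET | −0.07 | −0.01 | 0.04 | −0.44 | 0.30 | 0.97 | 0.30 |
| PLLHETI | −0.35 | −0.32 | 0.06 | −0.65 | −0.60 | −0.07 | −0.51 |
| HK05 | 0.03 | 0.03 | 0.07 | 0.01 | 0.05 | 0.67 | 0.64 |
| HK10 | −0.00 | 0.00 | 0.05 | −0.02 | 0.03 | 0.61 | 0.58 |
| HK30 | −0.03 | −0.03 | 0.02 | −0.05 | 0.01 | 0.56 | 0.54 |

(B) Standard deviation for $\gamma$

| | UN0 | UNLO | UNME | COLO | COME | UNMEI | COMEI |
|---|---|---|---|---|---|---|---|
| CL | – | – | – | – | – | – | – |
| CLL | 0.12 | 0.11 | 0.13 | 0.10 | 0.09 | 0.12 | 0.09 |
| CLP | – | – | – | – | – | – | – |
| CLLP | 0.13 | 0.12 | 0.14 | 0.12 | 0.11 | 0.14 | 0.11 |
| PL | – | – | – | – | – | – | – |
| PLL | 0.10 | 0.09 | 0.09 | 0.09 | 0.08 | 0.13 | 0.10 |
| PLHET | – | – | – | – | – | – | – |
| PLLHET | 0.11 | 0.11 | 0.13 | 0.10 | 0.12 | 0.15 | 0.12 |
| PLLHETI | 0.09 | 0.09 | 0.09 | 0.09 | 0.07 | 0.13 | 0.09 |
| HK05 | 0.28 | 0.35 | 0.36 | 0.27 | 0.25 | 0.65 | 0.63 |
| HK10 | 0.26 | 0.32 | 0.34 | 0.25 | 0.25 | 0.60 | 0.53 |
| HK30 | 0.25 | 0.30 | 0.33 | 0.25 | 0.25 | 0.56 | 0.49 |

(C) RMSE for $\gamma$

| | UN0 | UNLO | UNME | COLO | COME | UNMEI | COMEI |
|---|---|---|---|---|---|---|---|
| CL | – | – | – | – | – | – | – |
| CLL | 1.71 | 1.68 | 1.64 | 1.65 | 1.58 | 2.10 | 2.05 |
| CLP | – | – | – | – | – | – | – |
| CLLP | 1.68 | 1.65 | 1.61 | 1.62 | 1.54 | 2.15 | 2.09 |
| PL | – | – | – | – | – | – | – |
| PLL | 0.10 | 0.16 | 0.41 | 0.23 | 0.20 | 1.10 | 0.49 |
| PLHET | – | – | – | – | – | – | – |
| PLLHET | 0.13 | 0.11 | 0.13 | 0.45 | 0.32 | 0.99 | 0.32 |
| PLLHETI | 0.36 | 0.33 | 0.11 | 0.66 | 0.61 | 0.15 | 0.52 |
| HK05 | 0.28 | 0.35 | 0.37 | 0.27 | 0.25 | 0.93 | 0.90 |
| HK10 | 0.26 | 0.32 | 0.34 | 0.25 | 0.25 | 0.86 | 0.78 |
| HK30 | 0.25 | 0.30 | 0.33 | 0.25 | 0.25 | 0.79 | 0.72 |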

almost all designs, the biases for PLL are smaller than those of PL, i.e. introducing the lagged choice in the regressor set when state dependence is present improves the estimates.


Introducing normal random e-ects in the pooled procedures (PLHET and PLLHET), lowers the average biases in p relative to PL and PLL in almost all designs. In the UN0, UNLO, UNME, and UNMEI designs, introducing the lag as an explanatory variable decreases the bias in p , similar to the case without heterogeneity. The opposite occurs in the correlated e-ects designs, COLO, COME, COMEI. Similar to PL and PLL, the average biases for p are very large, up to 100% of the true value, sometimes yielding positive price e-ects (COME, COMEI designs). Increasing 1 produces higher average biases for p : PLHET and PLLHET estimate d and f quite well. The mean biases are at most around 25% of the true values of the coeGcients. The coeGcient of the lagged choice is in general better estimated by PLLHET than PLL, i.e. accounting for unobserved heterogeneity lowers the bias in estimating the e-ect of state dependence. However, the biases in the designs where initial conditions are endogenous is signi,cant, up to 50% of the true value of ; similar to the PLL estimates. Allowing for endogenous initial conditions (PLLHETI) decreases, as might be expected, the magnitude of the average biases of all the estimated coeGcients compared to PLLHET in the design where initial conditions are endogenously generated and the individual e-ects are uncorrelated with the regressors (UNMEI design). However, this ,nding does not carry over to the case where the random e-ects are correlated with prices (COMEI design): only the biases in p and f decrease substantially over those of PLLHET, while the bias in  tends to increase (in absolute terms). For the other designs the biases tend to be higher than those of PLLHET. A notable exception is the COLO design where the bias in p drops signi,cantly (by 27%). In the same design, however, the bias of  increases by approximately 50%. 
It is interesting to note that, in most designs, the bias of the state dependence parameter γ reverses sign compared to that of PLLHET; in particular, in most cases γ is underestimated (negative bias).

On average, the HK method overestimates the price effect: the mean biases are negative for all designs and bandwidths. The biases are at most 15% of the true β_p, and they increase (in absolute magnitude) monotonically with the value of the bandwidth. The mean biases for β_d are very large: the coefficient is overestimated by 100–200%. However, the median biases, reported in the working paper, are much smaller, up to 40% of the true value. In contrast, the average biases for β_f are of much smaller order, at most 10% of the true coefficient. HK estimates γ quite well. The average biases are smaller than those produced by the other methods. With the exception of the designs with endogenous initial conditions, the biases for γ are at most 5% of the true value. In the UNMEI and COMEI designs, however, γ is overestimated by around 30%. Similar to the conditional methods, increasing the variance of the individual effects does not affect the average biases of the coefficients, with the exception of β_d. The average biases for β_d, β_f, and γ decrease monotonically (but not necessarily in absolute value) with the bandwidth constant h.

Concerning the dispersion of the estimates around their averages across the 100 replications, as measured by their standard deviation (see Panels B of Tables 11–14), we see that the pooled estimates have the lowest dispersion, which is to be expected since these methods use all the data. All procedures display relatively constant dispersion across all designs and across the different specifications of the variance of the individual effects. The standard deviation of the HK estimates in general decreases as the bandwidth increases, as expected. The relatively large dispersion of the HK estimates may be explained by the fact that this method uses the smallest number of observations among all approaches. Table 15 provides the average (over 100 replications) number of observations used by the HK procedure. Thus, for example, for the UNME design the sample used by HK is roughly a third of the sample size (n = 771) used by the pooled methods.

Table 15
Average number of HK strings

Design    UN0   UNLO  UNME  COLO  COME  UNMEI  COMEI
Strings   343   323   268   378   340   250    312

Finally, Panels C of Tables 11–14 report root mean squared errors (RMSEs) of the parameter estimates across the 100 replications. We find that both the conditional methods and HK have constant RMSEs for the β_p coefficient across designs, although HK yields RMSEs that are almost twice as large as those of the conditional methods. In contrast, the RMSEs of the pooled methods increase dramatically as we increase the relative magnitude of the household effects and as we introduce correlation between the latter and the exogenous covariates. For β_d we find that HK produces very large RMSEs, which is to be expected given the large mean biases for that coefficient. All other methods tend to produce similar RMSEs, which do not vary much across designs. For β_f we note that all methods produce comparable RMSEs, with CL and CLL exhibiting the lowest ones. For γ, however, both CLL and CLLP produce very large RMSEs, as expected given the size of the average biases.
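The link between the three summary measures discussed here (mean bias, standard deviation, and RMSE across replications) can be made explicit with a short sketch. It assumes the standard deviation is computed with divisor equal to the number of replications, in which case RMSE² = bias² + SD² holds exactly; the paper does not state which divisor was used. The function `monte_carlo_summary` and the simulated draws are hypothetical and are not taken from Tables 11–14.

```python
import numpy as np

def monte_carlo_summary(estimates, true_value):
    """Mean bias, standard deviation, and RMSE of estimates across replications."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    sd = estimates.std()  # ddof=0, so rmse**2 equals bias**2 + sd**2
    rmse = np.sqrt(np.mean((estimates - true_value) ** 2))
    return bias, sd, rmse

# Hypothetical draws standing in for 100 replications of one estimator of gamma.
rng = np.random.default_rng(1)
true_gamma = 1.0
draws = true_gamma + 0.3 + 0.2 * rng.normal(size=100)

bias, sd, rmse = monte_carlo_summary(draws, true_gamma)
print(f"bias={bias:.3f}, sd={sd:.3f}, rmse={rmse:.3f}")
```

Because the RMSE combines squared bias and variance, an estimator with a large mean bias has a large RMSE even when its dispersion is small, which is the pattern described above for the estimators with large average biases.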
With the exception of the endogenous initial conditions designs (UNMEI, COMEI), HK produces constant RMSEs, while the pooled methods produce RMSEs that tend to increase as the variance of the household effects increases and as we introduce correlation between the household effects and the observed covariates.

We conclude that the conditional logit procedures appear to be the most robust in estimating the coefficients on the exogenous variables among all procedures. However, they produce very poor estimates of the habit effect γ. The pooled procedures produce significantly biased estimates of key parameters, such as the price coefficient β_p and the feedback parameter γ, in particular when initial conditions are endogenous and/or the individual effects are correlated with the exogenous variables. The HK method estimates β_p and γ quite well, although its estimates are less precise than those of both the conditional and the pooled methods.

7. Conclusions

Empirical researchers have extensively used the panel data discrete choice model with a lagged choice variable to capture the effects of state dependence ('loyalty') on purchase behavior. More recently, various estimation approaches have been implemented to also allow for the presence of unobserved heterogeneity across consumers for different brands. The present paper provides a theoretical foundation for the standard model of brand choice with a lagged dependent variable and unobserved individual effects. We introduce habit effects in the utility function, which in general requires the consumer to solve a dynamic optimization problem. We derive sufficient conditions under which the dynamic problem maps into a static one-period optimization problem that underlies the standard econometric specification. In addition, the paper provides an empirical application of the estimator recently proposed by Honoré and Kyriazidou (2000) for dynamic discrete choice panel data models and compares it to estimators typically used in the literature.

Our empirical results for the yogurt data reveal that all procedures yield a significant price elasticity of brand choice and significant sensitivity of consumers to marketing variables, such as advertising. Furthermore, the estimates suggest that there exist large habit effects and substantial heterogeneity across consumers. The size of the estimated parameters, however, varies considerably across estimation methods.
Our Monte Carlo results indicate that the conditional likelihood procedures are the most robust in estimating the coefficients on the exogenous variables among all procedures. However, the feedback parameter on the lagged dependent variable is significantly underestimated. The pooled procedures are quite sensitive to model misspecification, often yielding large biases for key economic parameters, such as the effect of state dependence and especially the effect of prices on brand choice. The estimator proposed by Honoré and Kyriazidou performs quite satisfactorily in capturing both the price and the habit effects.

Future research would involve application of the estimators to other kinds of products, to a larger number of brands, and to other consumer decisions (e.g. discrete/continuous choices of brands and quantities). Another important topic for further investigation is the timing and frequency of purchases.


Acknowledgements

We would like to thank Andrew Chesher, Jim Heckman, Bo Honoré, Cheng Hsiao, Jean-Marc Robin, Peter Rossi, three anonymous referees, and seminar participants at various institutions for useful comments. The paper was presented at the 1998 Econometrics Camp, Catalina Island, California, at the 1998 Econometrics Group conference, Bristol, and at the 1999 Econometric Society European Meetings, Santiago de Compostela. We are grateful to the participants for helpful suggestions. André Bonfrer provided excellent research assistance. The second author gratefully acknowledges financial support from NSF.

References

Börsch-Supan, A., Hajivassiliou, V.A., 1993. Smooth unbiased multivariate probability simulators for maximum likelihood estimation of limited dependent variable models. Journal of Econometrics 58 (3), 347–368.
Chamberlain, G., 1984. Panel data. In: Griliches, Z., Intriligator, M. (Eds.), Handbook of Econometrics, Vol. II. North-Holland, Amsterdam.
Chamberlain, G., 1985. Heterogeneity, omitted variable bias, and duration dependence. In: Heckman, J.J., Singer, B. (Eds.), Longitudinal Analysis of Labor Market Data. Cambridge University Press, Cambridge.
Chamberlain, G., 1993. Feedback in panel data models. Unpublished manuscript, Department of Economics, Harvard University.
Chay, K., Hyslop, D., 1998. Identification and estimation of dynamic binary response models: empirical evidence on alternative approaches to examining welfare dependence. Working paper, Econometrics Camp, Catalina Island, CA.
Chiang, J., 1995. Competing coupon promotions and category sales. Marketing Science 14 (1), 105–122.
Deaton, A., Muellbauer, J., 1980. Economics and Consumer Behavior. Cambridge University Press, Cambridge.
Erdem, T., Keane, M.P., 1996. Decision-making under uncertainty: capturing dynamic brand choice processes in turbulent consumer goods markets. Marketing Science 15 (1), 1–20.
Geweke, J., Keane, M., Runkle, D., 1994. Statistical inference in the multinomial multiperiod probit model. Working paper, Federal Reserve Bank of Minneapolis Staff Report 177.
Gönül, F.F., 1999. Estimating price expectation in the OTC medicine market: an application of dynamic discrete choice models to scanner panel data. Journal of Econometrics 89, 41–56.
Guadagni, P.M., Little, J.D.C., 1983. A logit model of brand choice calibrated on scanner data. Marketing Science 2 (3), 203–238.
Hanemann, W.M., 1984. Discrete/continuous models of consumer demand. Econometrica 52 (3), 541–561.
Hansen, L.P., 1982. Large sample properties of generalized method of moments estimators. Econometrica 50, 1029–1054.
Heckman, J.J., 1981. Heterogeneity and state dependence. In: Rosen, S. (Ed.), Studies of Labor Markets. The University of Chicago Press, Chicago.
Honoré, B.E., Kyriazidou, E., 2000. Panel data discrete choice models with lagged dependent variables. Econometrica 68, 839–874.
Hsiao, C., 1986. Analysis of Panel Data. Cambridge University Press, Cambridge.
Jain, D.C., Vilcassim, N.J., 1991. Investigating household purchase timing decision: a conditional hazard function approach. Marketing Science 10, 1–23.
Jain, D.C., Vilcassim, N.J., Chintagunta, P.K., 1994. A random-coefficients logit brand-choice model applied to panel data. Journal of Business and Economic Statistics 12 (3), 317–328.
Jones, J.M., Landwehr, T.J., 1988. Removing heterogeneity bias from logit model estimation. Marketing Science 7, 41–59.
Keane, M.P., 1997. Modeling heterogeneity and state dependence in consumer choice behavior. Journal of Business and Economic Statistics 15 (3), 310–327.
Magnac, T., 1997. State dependence and heterogeneity in youth employment histories. Working paper, INRA and CREST, Paris.
Manski, C., 1987. Semiparametric analysis of random effects linear models from binary panel data. Econometrica 55, 357–362.
McCulloch, R.E., Rossi, P.E., 1994. An exact likelihood analysis of the multinomial probit model. Journal of Econometrics 64 (1–2), 207–240.
Rossi, P.E., McCulloch, R.E., Allenby, G.M., 1996. The value of purchase history data in target marketing. Marketing Science 15 (4), 321–340.
Roy, R., Chintagunta, P., Haldar, S., 1996. A framework for analyzing habits, hand-of-past, and heterogeneity in dynamic brand choice. Marketing Science 15 (3), 280–299.
Vilcassim, N.J., Jain, D.C., 1996. Modelling purchase timing and brand switching behavior incorporating explanatory variables and unobserved heterogeneity. Journal of Marketing Research 28, 29–41.