Additional information for: “Comments on “Alternative models of demand for automobiles” by Charlotte Wojcik”


Economics Letters 74 (2001) 43–51 www.elsevier.com/locate/econbase

Steve Berry¹, Ariel Pakes*
Department of Economics, Littauer Center, Harvard University, Cambridge, MA 02138-3001, USA
Received 19 March 2001; accepted 6 June 2001

Abstract

In a recent paper in this journal [Econ. Lett. 68 (2000) 113], Wojcik argues that the nested logit “is likely to be superior” to demand specifications we and others have used in recent empirical work. We review the relevant models and their uses, consider her application, and find that her conclusions are incorrect. © 2001 Elsevier Science B.V. All rights reserved.

Keywords: Logit; Nested logit; BLP

JEL classification: C5 Econometric modeling; L0 Industrial organization (general)

1. Introduction

In a recent paper in this journal, Wojcik (2000) argues that the nested logit model “is likely to be superior” to alternative random coefficients logit specifications, like those we and many others have used in recent work (e.g. Berry et al. (1995), henceforth BLP). Her conclusion is based on a within-sample ‘prediction’ exercise. We would like to raise several issues about her conclusion.

1. Most important to Wojcik’s specific conclusions, she uses different independent variables in her ‘BLP’-based predictions than in her nested logit predictions. In particular, for the nested logit predictions she appears to include on the right-hand side an additional variable that is a function of the left-hand side market shares being predicted. This endogenous variable (which would be unknown in a true out-of-sample prediction) could easily account for the apparent superiority of the nested logit, as no similar endogenous variable is included in the BLP-style specification.
2. To say that nested logit performs better than random coefficients logit is odd to begin with, as the nested logit is a special case of a random coefficients logit (Cardell, 1997).
3. Finally, and pedagogically most important, the issues that arise in the relationship between assumptions and predictions from demand systems depend on the topic of interest. For example, BLP focused on obtaining reasonable own and cross price and characteristics elasticities that could be used in various policy analyses. For this purpose the random coefficients in BLP’s specification seem to be necessary (as is clearly illustrated by Wojcik’s own results). Wojcik focuses on a different question, out-of-sample prediction of market shares. To address this question one must decide what one wants to condition on for the prediction exercise. Our reading of the paper is that she has not chosen an internally consistent conditioning set, and that makes it hard to provide an interpretation for her results.

We now turn to a more detailed explanation of each of these points, beginning with some background on nested logits and random coefficients logits, continuing with a discussion of general uses of these models and finishing with a discussion of Wojcik’s particular prediction exercise.

* Corresponding author. Tel.: +1-617-495-5320; fax: +1-617-496-7352. E-mail addresses: [email protected] (S. Berry), [email protected] (A. Pakes).
¹ Tel.: +1-203-432-3556; fax: +1-203-432-6323.
0165-1765/01/$ – see front matter © 2001 Elsevier Science B.V. All rights reserved. PII: S0165-1765(01)00533-X

2. Nested logits versus random coefficients logit or BLP

We begin with the random coefficients utility function that we have used as a framework for all of our work and which in slightly different form dates back at least to McFadden (1974), McFadden et al. (1977) and Hausman and Wise (1978). It specifies that the utility consumer $i$ receives from product $j$ is

$$u_{ij} = x_j \beta_i - p_j \beta_i^p + \xi_j + \epsilon_{ij} \qquad (1)$$

where $x_j$ is a vector of product characteristics observed by the econometrician, $p_j$ is the price of good $j$, $\xi_j$ is a product characteristic which is not observed by the econometrician, $\epsilon_{ij}$ is an i.i.d. random ‘disturbance’, and $\beta_i$ is a vector of ‘tastes’ of the consumer for the various product characteristics (including price). It is convenient to keep a special superscript for the price coefficient because price is typically treated differently than the other product characteristics in estimation. Further recent discussion of this class of models is found in Berry et al. (1995), Nevo (2001), Petrin (2000), Berry and Pakes (2000) and Berry et al. (1998).

The models that we discuss in this note can all be defined by making particular assumptions on the $\{\epsilon_{ij}\}$ and on the distribution of tastes for the product characteristics (on the distribution of the $\{\beta_i\}$). The ‘logit’ assumption is that $\epsilon$ has the double-exponential (or extreme value) distribution function that leads to the familiar logit functional form for market shares. If, in addition, the tastes for product characteristics are the same across consumers ($\beta_i = \beta$), then we have the traditional pure logit model. As discussed extensively by McFadden and others, the logit model implies an independence of irrelevant alternatives assumption (IIA). Even absent the double-exponential distribution, as long as the $\epsilon$-values are i.i.d., a discrete choice model will ensure very restrictive substitution patterns across products. In particular, the own and cross price derivatives of the market demand function will depend only on market shares and not on the characteristics of the good. Thus, there is: (i) no systematic reason for goods with similar characteristics to be close substitutes and (ii) no reason why markups should systematically vary with prices. (We emphasize these points in BLP.)

A more general alternative is to allow the tastes for product characteristics to depend on the attributes of the consumer, some of which may be observed, say $z_i$, and some of which may be unobserved, say $\nu_i$. Thus the specification which underlies our work lets the taste of consumer $i$ for characteristic $r$ be determined by

$$\beta_{ir} = \bar{\beta}_r + z_i \gamma_r + \sigma_r \nu_i \qquad (2)$$

Here, $\bar{\beta}$, $\gamma$ and $\sigma$ are the vectors of parameters to be estimated. Once we allow for the random coefficients in Eq. (2), the choice model in Eq. (1) can generate more reasonable own and cross price and characteristic elasticities. For example, now if the price of good $j$ goes up the consumers who leave good $j$ have particular tastes; they are consumers who preferred the characteristics of good $j$ (that is why they bought it), and hence will substitute to other goods with similar characteristics. Further, provided the price coefficient has the right sign, the model will imply that expensive goods tend to be bought by people who are not particularly sensitive to price. As a result, a smaller number of consumers who purchase expensive goods substitute out when the price of those goods increases by a fixed amount. This frequently implies that expensive goods have a lower price elasticity and a higher markup in a Nash pricing equilibrium. That is, once we allow for random coefficients, we break the implications of i.i.d. $\epsilon$-values as discussed above.

To add the random coefficients, however, we incur a cost. The market shares generated by the choice model in Eq. (1) are obtained by finding the product preferred by every possible $(\beta_i, \epsilon_i)$ combination, and then integrating out over the distribution of $(\beta_i, \epsilon_i)$ in the relevant consumer population. If we use Eq. (2) to model the $\beta_i$ in Eq. (1), then the market shares generated by this procedure do not have an analytic form, and must be either simulated or obtained through some other numerical integration procedure. Moreover, if we are to estimate the parameters of the model from product level data, i.e. from the observed market shares and characteristics including price, then we have to simulate these shares every time we evaluate a different value of the parameter vector during the estimation algorithm (Pakes, 1986; Pakes and Pollard, 1989; McFadden, 1989).
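The two points above (shares must be simulated, and random coefficients change substitution patterns) can be illustrated with a minimal sketch. It is not the estimation code used in BLP: it assumes a single characteristic with a normally distributed taste, a common price coefficient `alpha`, $\xi_j = 0$, and hypothetical parameter values. It integrates out the logit error analytically and averages over draws of $\nu_i$, then compares how the buyers lost by good 0 after a price increase split between a similar good (good 1) and a dissimilar good (good 2).

```python
import math
import random

def simulated_shares(x, p, beta_bar, sigma, alpha, draws):
    """Approximate market shares for u_ij = beta_i*x_j - alpha*p_j + eps_ij.

    beta_i = beta_bar + sigma*nu_i, eps_ij is extreme value, and the outside
    good has mean utility zero. The integral over nu_i is replaced by an
    average over the supplied draws.
    """
    shares = [0.0] * len(x)
    for nu in draws:
        beta_i = beta_bar + sigma * nu
        expv = [math.exp(beta_i * xj - alpha * pj) for xj, pj in zip(x, p)]
        denom = 1.0 + sum(expv)
        for j, e in enumerate(expv):
            shares[j] += e / denom / len(draws)
    return shares

def diversion_ratio(x, p, beta_bar, sigma, alpha, draws, dp=0.1):
    """Raise the price of good 0 by dp; return (gain of good 1) / (gain of good 2)."""
    before = simulated_shares(x, p, beta_bar, sigma, alpha, draws)
    p_new = [p[0] + dp] + list(p[1:])
    after = simulated_shares(x, p_new, beta_bar, sigma, alpha, draws)
    return (after[1] - before[1]) / (after[2] - before[2])

rng = random.Random(0)
draws = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # hypothetical nu_i draws

x = [1.0, 1.0, 0.0]  # goods 0 and 1 share a characteristic; good 2 lacks it
p = [1.0, 1.0, 1.0]

# sigma = 0 is the pure logit; sigma = 2 adds a random coefficient on x.
logit_shares = simulated_shares(x, p, 0.5, 0.0, 1.0, draws)
logit_ratio = diversion_ratio(x, p, 0.5, 0.0, 1.0, draws)
rc_shares = simulated_shares(x, p, 0.5, 2.0, 1.0, draws)
rc_ratio = diversion_ratio(x, p, 0.5, 2.0, 1.0, draws)

print("logit: s1/s2 =", logit_shares[1] / logit_shares[2], "diversion =", logit_ratio)
print("random coefficients: s1/s2 =", rc_shares[1] / rc_shares[2], "diversion =", rc_ratio)
```

In the pure logit case the departing buyers split between goods 1 and 2 exactly in proportion to those goods' shares, whatever their characteristics. With the random coefficient the split tilts toward good 1, the good that resembles good 0.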
If we have household level data that matches individuals to the products they chose, then we typically only obtain an analytic form for the individual’s choice probabilities if $\sigma_r \equiv 0$, that is if there are no unobserved consumer attributes that affect their tastes. Whether or not this latter assumption is reasonable depends on the richness of the $z_i$ in the data relative to the richness of the factors that actually affect individual choices². If it is not reasonable and the researcher wrongly assumes $\sigma_r = 0$, the implications of the estimates using micro data will be similar to the implications discussed above for the logit estimates on aggregate data (see Berry et al. (1998) for more detail). Even if we do have micro data and the assumption $\sigma_r = 0$ is reasonable, the researcher might still have to use simulation to estimate the $\bar{\beta}_r$, as these parameters, which will be determinants of own and cross price and characteristic elasticities, cannot be identified without the aggregate implications of the model, and the aggregate implications cannot be computed without employing simulation (again see Berry et al. (1998); this problem is especially important if the number of parameters is large).

² Goldberg (1995) uses individual $z_i$-values in a household-level nested logit model of automobile demand. Because of the observed $z_i$-values, her model is not comparable to the straight aggregate nested logit of Wojcik.

We now come back to the nested logit model. Cardell (1997)³ shows that the nested logit can be written as a special case of the model in Eq. (2). When there is a single level of nests, the restrictions the nested logit imposes on Eq. (2) are

• the only $x$ variables which receive random coefficients are a set of dummy variables that provide an a priori division of the products into mutually exclusive and exhaustive groups, and
• the distribution of the random coefficients on the dummy variables has to be of a particular form (see below).

If there is more than one level of nests, then each subsequent level partitions the products in the level above it in a similar way. More formally, for a single level of nests we require the coefficients of each characteristic (or $r$) to satisfy either

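$$\beta_{i,r} = \bar{\beta}_r \quad \forall i$$

or

$$x_{j,r} = d_{j,r} \in \{0,1\}, \quad \sum_r d_{j,r} = 1 \ (\forall j), \quad \text{and} \quad \beta_{i,r} = \bar{\beta}_r + \nu_{i,r}(\lambda_r),$$

where the composite error

$$\nu(\lambda_r) + \lambda_r \epsilon \qquad (3)$$

is distributed double exponential when $0 \le \lambda_r \le 1$. Substituting these restrictions into Eq. (1) we obtain the ‘nested logit’ utility function as

$$u_{ij} = \delta_j + \sum_r d_{jr} \left[ \nu_{ir}(\lambda_r) + \lambda_r \epsilon_{ij} \right] \qquad (4)$$

where

$$\delta_j \equiv \sum_r \bar{\beta}_r x_{j,r} + \xi_j$$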

The parameter $\lambda_r$ is constant within a group but can vary across groups (just as the $\sigma_r$ in Eq. (2) can vary across groups). The distribution in Eq. (3) is derived in Cardell (1997). Note that it has the property that when $\lambda = 0$, $\nu(\lambda)$ is double-exponential and when $\lambda = 1$, $\nu(\lambda) \equiv 0$. This assumption on $\nu$, together with the fact that the maximum of logit utilities itself has a logit distribution, generates the familiar nested logit functional form (Cardell, 1997). Note that if $\lambda > 0$, we can divide Eq. (4) by it without changing the implications of the model (which proves that this is a restricted version of our model). The special case of the nested logit with $\lambda = 0$ is a restricted version of the limiting case of our model studied in Berry and Pakes (2000). The nested logit does have the constraint that $0 \le \lambda \le 1$; but this constraint typically does not have any impact on estimates of the parameters of the model if the point estimate of $\lambda$ is within these bounds, so it is not generally imposed at the outset.

The implications of the restrictions in the nested logit are analogous to the restrictions of the pure logit model for comparisons across products within a given group, while the restrictions that the nested logit imposes across groups are that the differences in own and cross price and characteristic derivatives can only be a function of their shares and the group-specific $\lambda_r$ coefficients. It is important to keep in mind that the nested logit cannot allow for random coefficients on continuous valued product characteristics; i.e. it cannot, for example, allow consumers’ sensitivity to price to differ (say according to their income). In our empirical work we have found that without allowing randomness in the coefficient of price it is difficult to produce reasonable own and cross price elasticities. Also the nested logit cannot allow for ‘overlapping’ groups of products (there has to be a hierarchical structure to the nests)⁴.

³ Cardell’s results were known to him and widely distributed while he was a graduate student, so they have been part of the verbal tradition in econometrics at least since the early 1980s.
Of course, we can allow some characteristics to have random coefficients that take on the nested logit form while others are assumed to have forms which better mimic the distribution of product and consumer attributes. Though this will not produce analytic forms for the market shares, and hence will still require all the computational tools developed in BLP, if done carefully it may lower computation time significantly at little cost, and hence be quite a reasonable way of proceeding.

Estimation in both models can follow BLP. This involves solving for the values of $\delta$ that predict market shares exactly. There is a very easy algorithm for doing this in the random coefficients case, whereas there is an analytic formula for it in the nested logit. As shown in Berry (1994), the analytic form for $\delta$ in the nested logit model can be used to generate a linear estimating equation for that model. Since we will need that form below we repeat it here as

$$\ln(s_j) - \ln(s_0) = x_j \bar{\beta} - \bar{\beta}^p p_j + (1 - \lambda_r) \ln\left(\frac{s_j}{\bar{s}_r}\right) + \xi_j \qquad (5)$$

where $\bar{s}_r \equiv \sum_k d_{k,r} s_k$ is the total share of the $r$th group (the group containing product $j$). Note that the disturbance term is the value of the unobserved characteristic for product $j$. These $\xi_j$ will be correlated with $\ln(s_j/\bar{s}_r)$ by construction, and as stressed in Berry (1994) and in BLP, regardless of the form of the demand function we expect the $\xi_j$ to be correlated with price in almost any reasonable model of price formation.
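The analytic inversion behind Eq. (5) can be checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' code: it uses the standard one-level nested logit share formula with the outside good's mean utility normalised to zero, nest labels and parameter values are hypothetical, and `lam[g]` plays the role of the group-specific $\lambda_r$. It generates shares from known mean utilities $\delta_j$ and then recovers them from the shares alone.

```python
import math

def nested_logit_shares(delta, group, lam):
    """Nested logit shares for mean utilities delta (outside good normalised to 0).

    group[j] is the nest of product j and lam[g] is the nesting parameter of
    nest g, with 0 < lam[g] <= 1 (lam[g] = 1 gives the plain logit).
    """
    inclusive = {}
    for d, g in zip(delta, group):
        inclusive[g] = inclusive.get(g, 0.0) + math.exp(d / lam[g])
    denom = 1.0 + sum(D ** lam[g] for g, D in inclusive.items())
    shares = []
    for d, g in zip(delta, group):
        within = math.exp(d / lam[g]) / inclusive[g]   # s_j / s_bar_g
        nest = inclusive[g] ** lam[g] / denom          # s_bar_g
        shares.append(within * nest)
    return shares

def invert_nested_logit(shares, group, lam):
    """Recover delta_j = ln s_j - ln s_0 - (1 - lam_g) * ln(s_j / s_bar_g)."""
    s0 = 1.0 - sum(shares)
    nest_share = {}
    for s, g in zip(shares, group):
        nest_share[g] = nest_share.get(g, 0.0) + s
    return [math.log(s) - math.log(s0) - (1.0 - lam[g]) * math.log(s / nest_share[g])
            for s, g in zip(shares, group)]

# Hypothetical example: five products in two nests.
delta_true = [0.3, -0.2, 0.1, -0.5, 0.4]
group = ["small", "small", "large", "large", "large"]
lam = {"small": 0.6, "large": 0.8}

shares = nested_logit_shares(delta_true, group, lam)
delta_hat = invert_nested_logit(shares, group, lam)
print(delta_hat)
```

The inversion uses the within-group share $s_j/\bar{s}_r$, which is itself built from the shares on the left-hand side. That is harmless when recovering $\delta_j$ from observed data, but it is the reason the term is endogenous in a regression or a prediction.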

⁴ The extension to Generalized Extreme Value models, as in Bresnahan et al. (1997), can allow for something very close to overlapping nests.


3. Prediction exercises

In addition to relying on a set of primitive assumptions and estimates, any prediction exercise must be clear about what the researcher is conditioning on. In this class of models there are two distinctions among possible predictors that are of importance: the distinction between observed and unobserved characteristics, and the distinction between endogenous and exogenous variables. The connection is that endogenous variables are by definition functions of the unobservables, so one cannot condition on endogenous variables without also conditioning on the set of unobservable values that lead to those outcomes.

With this in mind, there are at least two kinds of sensible prediction exercises. The first conditions on both observed and unobserved product characteristics (these are estimated) and then either: (i) asks what happens when the policy environment changes, or (ii) compares predictions of the model to alternative sources of information. The second does not condition on unobserved product characteristics and can therefore make predictions (including out-of-sample predictions) based only on observed product characteristics.

As one example of the first kind, some authors have compared the predicted markups from their models to estimates of those markups that have appeared elsewhere. Nevo (2001), using a model similar to ours, compares predicted markups to those reported in a government hearing, and BLP compare predicted markups to the general range of markups as reported to us by industry sources. Perhaps more interesting are predictions for what would happen after a change in the economic environment. Nevo (2000) uses his BLP-style model to predict the pricing outcomes of mergers (both hypothetical and real) and reports predictions which are consistent with actual outcomes. As another example, in Pakes et al. (1993) we analyzed the effect of higher gasoline prices after 1973, while holding the consumer’s choice set otherwise constant.
The predictions of the average MPG of new car sales in the years immediately after the gas price hike were amazingly accurate. Both the data and model produced the initially surprising result of a short-run decrease in fuel efficiency, as poorer consumers simply left the market. Our predictions did progressively worse in subsequent years, as the real-world choice set began to change dramatically from our baseline in 1973 (especially as new fuel efficient cars entered the market).

Since prices are a function of unobserved characteristics, predictions conditional only on observed characteristics face the problem of how to treat price. We stress here that prior empirical results have shown that the unobserved characteristics explain as much or more of demand variation than all of the observed characteristics, so we expect the endogeneity problem to be empirically important. So a consistent procedure for making predictions conditional only on the observed characteristics requires one to construct the distribution of prices conditional on observed characteristics induced by the distribution of the unobservable characteristics, and this in turn requires both an equilibrium assumption and information on the production costs of goods. That is, to make predictions conditional only on observed characteristics we would, in addition to knowing the demand side parameters, need to: (1) estimate cost-side parameters and a distribution of (demand and cost) unobservables, (2) draw from the distribution of unobservables and numerically calculate equilibrium prices and shares (somehow dealing with possible multiple equilibria)⁵, and (3) repeat this process to generate a distribution of predictions, possibly reporting the mean and S.D. of the predictions. This is approximately what would have to be done for any estimated, non-linear equilibrium market model.

Having reviewed the nature of prediction exercises we now turn to Wojcik’s claim that her prediction exercise shows that nested logit models do better than random coefficients logits.

⁵ Outside of the pure logit model, it is difficult to prove uniqueness of pricing equilibria in this class of models (Caplin and Nalebuff, 1991).
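Steps (2) and (3) of the procedure described above can be sketched in a deliberately simple setting. Everything specific here is an assumption made for illustration: demand is pure logit (where multiplicity of equilibria is not a concern), each firm sells one product, marginal cost is linear in the characteristic, the unobservables are independent normals, and all parameter values are hypothetical stand-ins for the estimates that step (1) would deliver.

```python
import math
import random
import statistics

def logit_shares(delta):
    """Pure logit shares with the outside good's mean utility set to zero."""
    expd = [math.exp(d) for d in delta]
    denom = 1.0 + sum(expd)
    return [e / denom for e in expd]

def mean_utils(x, p, xi, beta, alpha):
    return [beta * xj - alpha * pj + xij for xj, pj, xij in zip(x, p, xi)]

def equilibrium(x, xi, mc, beta, alpha, tol=1e-12, max_iter=1000):
    """Iterate on the single-product Nash first-order condition
    p_j = mc_j + 1 / (alpha * (1 - s_j)) until prices stop changing."""
    p = list(mc)
    for _ in range(max_iter):
        s = logit_shares(mean_utils(x, p, xi, beta, alpha))
        p_new = [c + 1.0 / (alpha * (1.0 - sj)) for c, sj in zip(mc, s)]
        gap = max(abs(a - b) for a, b in zip(p_new, p))
        p = p_new
        if gap < tol:
            break
    return p, logit_shares(mean_utils(x, p, xi, beta, alpha))

def predict(x, beta, alpha, gamma0, gamma1, sd_xi, sd_omega, n_draws, seed=0):
    """Draw demand (xi) and cost (omega) unobservables, solve for equilibrium
    prices and shares, and summarise the simulated distribution of shares."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_draws):
        xi = [rng.gauss(0.0, sd_xi) for _ in x]
        mc = [gamma0 + gamma1 * xj + rng.gauss(0.0, sd_omega) for xj in x]
        _, s = equilibrium(x, xi, mc, beta, alpha)
        sims.append(s)
    by_product = list(zip(*sims))
    means = [statistics.mean(col) for col in by_product]
    sds = [statistics.stdev(col) for col in by_product]
    return means, sds

x = [1.0, 1.5, 2.0]  # observed characteristic of three products
mean_shares, sd_shares = predict(x, beta=1.0, alpha=1.0, gamma0=1.0, gamma1=0.5,
                                 sd_xi=0.3, sd_omega=0.1, n_draws=200)
print(mean_shares)
print(sd_shares)
```

Prices are never taken from the data here. They are recomputed for every draw of the unobservables, which is what keeps the conditioning set internally consistent.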

3.1. Wojcik’s prediction exercise

It is natural for Economics Letters articles to provide a short summary of their findings. Perhaps understandably then, there is not enough detail in Wojcik’s paper for us to pin down exactly how her estimates were obtained or how she used those estimates in her prediction exercises. We now discuss Wojcik’s results, noting where we are not clear about what she did.

Wojcik’s estimates of our model do not look qualitatively different from the estimates we obtained when we estimated the demand system alone. Those estimates, however, never appeared in BLP (though they were referred to in a footnote). This is because, just as Wojcik notes, they were quite imprecise. For this reason much of our work on demand systems consists of detailing ways in which we can use other sources of information to help obtain more precise estimates of the demand system parameters. In BLP itself, we combined the information from the restrictions derived from the demand system with the information from the restrictions implied by a Nash pricing equilibrium. This doubled the number of restrictions that are brought to estimation and increased precision accordingly, at a cost of assuming a static Nash in prices equilibrium.

Turning to the nested logit estimates, when estimating the nested logit model we expect both the price, and the ratio of the shares that appear in Eq. (5), to be correlated with $\xi$. It is not clear whether Wojcik instruments the share term (though she clearly states that she does instrument for price). If she does not instrument the shares we would expect the coefficient of those shares to be upwardly biased, as the relative share of the good should be positively correlated with its unobserved product characteristic. Next we come to the implications of the estimates. She spends only one paragraph on own and cross price and characteristic elasticities.
Her comment on price elasticities is, “The nested logit model yields an estimated average demand elasticity of −0.3, which is unbelievably small compared to previous estimates; the BLP estimate of −20.6 which is unbelievably large” (pp. 115–116). We note that the −0.3 figure implies negative markups in a Nash pricing equilibrium. Again we are not quite sure how the −20.6 figure is calculated but, as she notes in a footnote, it is well within the range of the estimates BLP obtained on individual cars. We do not know what prompts Wojcik to consider an average elasticity of about −20 to be unbelievably large. When one takes into account the fact that the car companies are all large multiproduct companies, so that the prices they charge on one car take into account the cross price elasticities of that car with all the other cars that the company owns, the markups produced by our demand estimates were quite reasonable. Indeed it was the estimates of these markups that most convinced the staff at General Motors that our results were realistic enough to pursue further, a conclusion which led to our obtaining the proprietary CAMIP micro data used in Berry et al. (1998). The contrast between the markups in the BLP-style estimates versus the nested logit estimates is quite striking.

Wojcik uses the models to predict the levels of 1990 shares, by model, from parameters that were estimated on 1971–1989 data. To engage in any prediction exercise, we have to decide which variables we want to condition on in forming our predictions. As noted, we could condition only on observed non-price car characteristics, or we could condition also on price and unobserved product characteristics⁶. Wojcik wants to predict demand for the cars actually sold in 1990 conditional on the ‘characteristics’ of the cars marketed in 1990. As noted, she cannot condition on all of the car’s characteristics since the unobserved characteristics of the cars, the $\xi$-values above, are defined to be the values of $\xi$ that generate the equilibrium quantities at the given prices. Our understanding is that her paper assumes that all of the $\xi$-values of the new cars are zero, and then makes predictions based on that assumption and on the observed price. This is an odd choice, given our discussion above, because price ought to be correlated with $\xi$ and therefore the observed 1990 prices cannot be associated with arbitrary values of $\xi$. In the prior section, we suggested solving for the prices associated with a given estimated distribution of $\xi$-values, but this would be much more complicated and would require cost-side estimates as well.

Further, in the nested logit predictions Wojcik appears to use the observed 1990 within-group market shares, $s_j/\bar{s}_r$, to help predict 1990 shares, $s_j$. Clearly the group shares are a function of the observed shares she is trying to predict. Even if one thought the endogeneity problem in price was not serious, surely the endogeneity problem that arises from using the observed group shares to predict differences in shares across groups is serious. Moreover, if she did not instrument the group share variable in the original estimation, that coefficient is too high, so she will be giving an unusually large role to the endogenous group share in the prediction exercise. In rather stark contrast, she uses exactly the same x-values but neither the endogenous group shares nor the group dummies in her prediction for BLP’s model.
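How much an endogenous within-group share can flatter a prediction is easy to see in a stylised simulation. This is not a reconstruction of Wojcik’s calculation, whose details are unclear to us: it assumes data generated by a one-level nested logit with a single known $\lambda$, treats the observed part of mean utility (including price) as given, and uses hypothetical parameter values. Both predictors set $\xi_j = 0$; they differ only in whether the realised within-group shares are plugged into the right-hand side of Eq. (5).

```python
import math
import random

def nested_logit_shares(delta, group, lam):
    """Nested logit shares with one nesting parameter lam (outside good utility 0)."""
    inclusive = {}
    for d, g in zip(delta, group):
        inclusive[g] = inclusive.get(g, 0.0) + math.exp(d / lam)
    denom = 1.0 + sum(D ** lam for D in inclusive.values())
    return [math.exp(d / lam) / inclusive[g] * inclusive[g] ** lam / denom
            for d, g in zip(delta, group)]

def log_odds(shares):
    """ln(s_j) - ln(s_0), the left-hand side of Eq. (5)."""
    s0 = 1.0 - sum(shares)
    return [math.log(s / s0) for s in shares]

def rmse(pred, actual):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(pred, actual)) / len(actual))

rng = random.Random(1)
lam = 0.5
group = [g for g in range(4) for _ in range(10)]     # 4 nests of 10 products
observed = [rng.uniform(-3.0, -1.0) for _ in group]  # x_j*beta - beta_p*p_j, taken as given
xi = [rng.gauss(0.0, 0.5) for _ in group]            # unobserved characteristics

actual = nested_logit_shares([o + e for o, e in zip(observed, xi)], group, lam)
y = log_odds(actual)

# (a) Set xi = 0 and let the model generate every share.
pred_without = log_odds(nested_logit_shares(observed, group, lam))

# (b) Set xi = 0 but plug the realised within-group shares into Eq. (5).
nest_total = {}
for s, g in zip(actual, group):
    nest_total[g] = nest_total.get(g, 0.0) + s
pred_with = [o + (1.0 - lam) * math.log(s / nest_total[g])
             for o, s, g in zip(observed, actual, group)]

rmse_without = rmse(pred_without, y)
rmse_with = rmse(pred_with, y)
print("RMSE without within-group shares:", rmse_without)
print("RMSE with realised within-group shares:", rmse_with)
```

Predictor (b) errs only by $\xi_j$ itself, because the realised within-group share already carries part of the outcome being predicted. Predictor (a) must also predict how buyers sort within each group, so its error is larger even though both use the same true parameters.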

4. Conclusion

Interesting prediction exercises using structural market models involve predicting the outcomes of ‘policy’ experiments and/or showing how to move from a list of exogenous variables to predictions for endogenous variables. In her paper, Wojcik does neither, attempting an out-of-sample prediction exercise using a ‘linearized nested logit’ that has endogenous prices and group shares on the right-hand side and comparing this to a BLP-style framework that uses only the endogenous prices (and not the group shares) as predictors. Aside from the inappropriate prediction exercise, her nested logit model has nonsensical predictions for markups, whereas our model both predicts reasonable markups and makes other policy predictions that are in the realm of plausibility. We therefore do not agree with her conclusion that the nested logit model performs in a superior fashion to other random coefficients frameworks.

⁶ We shall assume that the distribution of consumer characteristics in the years covered by the data is the same as that distribution in the prediction year, though this can often be questionable also.


References

Berry, S., 1994. Estimating discrete choice models of product differentiation. RAND Journal of Economics 25 (2), 242–262.
Berry, S., Levinsohn, J., Pakes, A., 1995. Automobile prices in market equilibrium. Econometrica 63 (4), 841–890.
Berry, S., Levinsohn, J., Pakes, A., 1998. Differentiated products demand systems from a combination of micro and macro data: the new car market. NBER Working Paper 6481.
Berry, S.T., Pakes, A., 2000. Estimating the Pure Characteristics Discrete Choice Model. Yale University, Discussion paper.
Bresnahan, T., Stern, S., Trajtenberg, M., 1997. Market segmentation and the sources of rents from innovation: personal computers in the late 1980s. RAND Journal of Economics, Special Issue 28 (0), S17–S44.
Caplin, A., Nalebuff, B., 1991. Aggregation and imperfect competition: on the existence of equilibrium. Econometrica 59 (1), 25–59.
Cardell, N.S., 1997. Variance components structures for the extreme-value and logistic distributions with applications to models of heterogeneity. Econometric Theory 13 (2), 185–213.
Goldberg, P.K., 1995. Product differentiation and oligopoly in international markets: the case of the U.S. automobile industry. Econometrica 63 (4), 891–951.
Hausman, J., Wise, D., 1978. A conditional probit model for qualitative choice: discrete decisions recognizing interdependence and heterogeneous preferences. Econometrica 46, 403–426.
McFadden, D., 1974. Conditional logit analysis of qualitative choice behavior. In: Zarembka, P. (Ed.), Frontiers of Econometrics. Academic Press, New York.
McFadden, D., 1989. Method of simulated moments for estimation of discrete response models without numerical integration. Econometrica 57 (5), 995–1026.
McFadden, D., Talvitie, A. et al., 1977. Demand Model Estimation and Validation. Institute of Transportation Studies, Berkeley, CA.
Nevo, A., 2000. Mergers with differentiated products: the case of the ready-to-eat cereal industry. RAND Journal of Economics 31 (3), 395–421.
Nevo, A., 2001. Measuring market power in the ready-to-eat cereal industry. Econometrica 69 (2), 307–342.
Pakes, A., 1986. Patents as options: some estimates of the value of holding European patent stocks. Econometrica 54, 755–784.
Pakes, A., Berry, S., Levinsohn, J., 1993. Some applications and limitations of recent advances in empirical industrial organization: price indexes and the analysis of environmental change. American Economic Review, Papers and Proceedings 83, 240–246.
Pakes, A., Pollard, D., 1989. Simulation and the asymptotics of optimization estimators. Econometrica 57 (5), 1027–1057.
Petrin, A., 2000. Quantifying the Benefits of New Products: The Case of the Minivan. University of Chicago GSB, Discussion paper.
Wojcik, C., 2000. Alternative models of demand for automobiles. Economics Letters 68, 113–118.