Editorial to Special Issue on “Analysis of Panel Data”
Models of purchase timing and models of brand choice: Outlook and issues

Albert C. Bemmaor
Ecole Supérieure des Sciences Economiques et Commerciales (ESSEC), 95021 Cergy Pontoise Cedex, France

David C. Schmittlein
The Wharton School, University of Pennsylvania, Philadelphia, PA 19104, USA

This position paper shows that models of purchase timing and models of brand choice have evolved along similar lines, and outlines areas for further study.
Intern. J. of Research in Marketing 8 (1991) 163-168, North-Holland
0167-8116/91/$03.50 © 1991 - Elsevier Science Publishers B.V. All rights reserved

Introduction

Since the early 1950s a continuous flow of research has documented the purchasing process as a “potential” outcome of a stochastic process. Whereas the early studies focused on the analysis of purchase sequences irrespective of time of purchase, later work has concentrated on predicting the number of purchases over a fixed time period. This position paper shows the parallel between both approaches in recent research, and suggests avenues for further study.

The work of Ehrenberg (1959) has perhaps provided the main impetus for the study of purchase timing (or “purchase incidence”). This approach attempts to “explain” the variation of purchases across households by the variation of their long-term mean purchase rate:

Total Variance (Purchases) = Within-Household Variance + Between-Household Variance,

where the Within-Household Variance (or “Residual” Variance) is the variance of a Poisson process and the Between-Household Variance (or “Explained” Variance) is the variance of the Poisson parameter.¹ Ehrenberg (1959) assumes that this parameter is distributed Gamma across households.

¹ This decomposition can also be expressed through the R² measure, which is equal to Between Variance/Total Variance. If the parameters of the Gamma density are r and α (r = shape parameter, α = scale parameter) with mean r/α, the R² is equal to 1/(α + 1). Such an expression is sometimes considered as an “upper bound” on R² for econometric models of purchase timing (e.g., Wheat and Morrison, 1990). However, over short time periods, the aggregate number of purchases typically varies from one period to the next. Such a formula ignores the Between-Period, in addition to the Within-Household, variation of purchases, which may explain the “large” discrepancies found between the fit of econometric models and the “upper bounds”.

This one-way analysis-of-variance framework has shown its capability to describe consumer purchasing patterns. Still, some researchers have attempted to modify the structure of each of the components. For example, Chatfield and Goodhardt (1973) considered that the purchasing process is more regular than Poisson and modified the Within Variance component with a Condensed Poisson distribution. Sichel (1982) changed the distribution underlying the Between Variance to an Inverse Gaussian distribution. These changes apparently have not led to systematic improvements over the Gamma/Poisson scheme.

Hence, one may wonder: Is a one-way analysis of variance good enough to “explain” purchase? Why “theorize” when simple models do the job? Those questions are not new, but after years of experience with models of purchase, they are becoming even more relevant. Obviously, the answer is: “it depends.” Clearly, if the sole objective of the research is to describe, and even to forecast, purchase under stationary conditions, then simple models work. But since marketing is typically oriented toward decision making, simple probabilistic models usually fall short of providing insights into the “true” purchasing process.² For example, researchers have not explained why purchasing patterns are consistent with the Poisson assumptions, i.e., independence of increments and a stationary purchasing rate.

² Note here that when one wishes to understand the purchasing process, the “correctness” of the model vis-à-vis the true phenomenon becomes crucial. Hence, specification error tests are useful tools to discriminate between competing processes.

To circumvent this shortage of insightful and “actionable” results, more recent research has attempted to explain the variation of purchases across households and over time. The analysis of variance becomes two-way:

Total Variance (Purchases) = Within-Household Variance + Between-Household Variance + Residual Variance.
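The two-way decomposition can be illustrated on a balanced households-by-periods table of purchase counts. The sketch below is only an illustration under stated assumptions: the data are simulated (Gamma-distributed household rates, Poisson counts, and hypothetical period multipliers standing in for deal periods), the function name two_way_decomposition is ours, and the between-period term is used as a stand-in for the part of the within-household variation that time-varying variables could explain. It applies the textbook two-way sum-of-squares identity without replication; it is not a procedure taken from the studies cited here.

```python
import numpy as np


def two_way_decomposition(counts):
    """Split the total sum of squares of a households x periods table.

    Rows are households, columns are periods (balanced panel assumed).
    Returns sums of squares, not variances; divide by counts.size if a
    per-observation variance is wanted.
    """
    counts = np.asarray(counts, dtype=float)
    n_households, n_periods = counts.shape
    grand_mean = counts.mean()
    household_means = counts.mean(axis=1)  # long-run level of each household
    period_means = counts.mean(axis=0)     # market-wide level of each period

    total = ((counts - grand_mean) ** 2).sum()
    between_households = n_periods * ((household_means - grand_mean) ** 2).sum()
    between_periods = n_households * ((period_means - grand_mean) ** 2).sum()
    # What is left once household and period effects are removed.
    leftover = (counts - household_means[:, None]
                - period_means[None, :] + grand_mean)
    residual = (leftover ** 2).sum()
    return {
        "total": total,
        "between_households": between_households,
        "between_periods": between_periods,
        "residual": residual,
    }


# Simulated panel (all numbers hypothetical): 500 households, 8 periods.
rng = np.random.default_rng(0)
household_rates = rng.gamma(shape=0.8, scale=1.5, size=500)
period_lift = np.array([1.0, 1.0, 1.6, 1.0, 0.9, 1.0, 1.4, 1.0])  # "deal" periods
panel = rng.poisson(household_rates[:, None] * period_lift[None, :])

parts = two_way_decomposition(panel)
for name, value in parts.items():
    print(f"{name:>20s}: {value:10.1f}  ({value / parts['total']:.1%} of total)")
```

With real panel data, the relative sizes of the three terms would give a rough indication of how much room there is for household-level versus time-varying explanatory variables.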
The Within Variance is explained by time-varying variates such as promotional activity (deals, price cuts, coupons, end-of-aisle displays) or advertising, whereas the Between Variance is due to (fixed or slowly moving) socio-economic variables such as household size, income or age of the head of the household. This stream of research is exemplified by the component of the model in Gupta (1988) which pertains to interpurchase times within and across consumers.³ Such a stream is highly promising but it does not provide insights into the choice process. Why did consumers choose this particular brand on a given purchase occasion?

³ Gupta (1988) includes “household average interpurchase time” instead of socio-economic variables as an explanatory variable to grasp the “Between Households” variation. Hence, the model does not “explain” average interpurchase time. Gupta obtains (surprisingly) low pseudo-R² values for the heterogeneous Erlang-2 model (ρ² = 0.04) given the fact that he is using a homogeneous Erlang-2 density as a null model. One reason for this lack of fit may be the censoring of duration data, and the implied difficulty in estimating averages, in particular for light buyers.

Traditionally, the choice process has been analyzed separately from the product class purchase timing process. Researchers have usually grounded their work on the independence assumption (see, e.g., Shoemaker et al., 1977; Jeuland et al., 1980, for some supporting evidence). Typically, they have assumed that individual brand choice is consistent with a zero-order Bernoulli process (see, e.g., Bass et al., 1984). Consumers’ choice pattern is summarized by a vector of probabilities fixed over time. In the same spirit as the timing process, the variance of brand choice across consumers is explained by the Between Variation of individual purchase probabilities. A popular model for heterogeneity of brand choice probabilities is the standard Dirichlet model.⁴ Again, this approach has led to “acceptable” descriptions of brand penetration and the distribution of brand purchases. The model can be parameterized with market shares and a market coefficient of heterogeneity of brand choice.

⁴ We do not refer to the work on the linear learning model since, to our knowledge, most published studies have only dealt with the two-brand case (instead of the multibrand case).

However, the cost of parsimony may be high: the model assumes that brands draw sales from other brands in proportion to their shares (the aggregate version of the so-called “independence of irrelevant alternatives”, or IIA, assumption). The fact that the model describes some aspects of the data accurately does not necessarily imply that its assumptions are true. In particular, what will happen if a brand is pulled out of the market? Which brands will benefit most from it? Which brands will be unaffected? The effectiveness of the Dirichlet model here is not yet known: it is a steady-state model and, though markets may appear to be at equilibrium at a given point in time, managers want models to provide “good” answers if market conditions change.

In parallel with purchase timing models, recent work on brand choice has modeled the “Within Variation” by the inclusion of such variables as dealing activity (e.g., Guadagni and Little, 1983; Kalwani et al., 1990), in addition to the Between Variation, which is captured by an individual-specific brand loyalty factor. However, consistent with earlier work, this research has been mostly based upon the IIA assumption. The main reason for using the multinomial logit model (instead of the multinomial probit, which does allow for interaction between brands) has been essentially computational. Very recent work in econometrics has circumvented the issue of numerical integration with the use of simulation (McFadden, 1989; Borsch-Supan et al., 1990).

In total, models of purchase timing and models of brand choice have evolved in parallel. The NBD model (the negative binomial distribution implied by the Gamma/Poisson scheme), like the Dirichlet model, is an appropriate model of cross-section (survey) data for the number of purchases and brand choice. Paradoxically, both have been mostly used on cross-section/time series (panel) data although they ignore the time-series variation. Recent work has corrected (in part) for that by the inclusion of time-varying explanatory variables such as price cuts or deals. Although richer, these models are still overly simple. In particular, purchase timing models typically ignore brand choice, and brand choice models still need to grasp interaction between brands more accurately.
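The proportional-draw property implied by the IIA assumption can be made concrete with a small multinomial logit example. This is a minimal sketch with hypothetical inputs: the brand labels, the utility values and the helper logit_shares are assumptions introduced for illustration only. It shows that when one brand is withdrawn, every remaining brand gains in proportion to its previous share, which is the restriction that a probit specification with correlated errors would relax.

```python
import numpy as np


def logit_shares(utilities):
    """Multinomial logit choice probabilities for a dict of brand utilities."""
    brands = list(utilities)
    values = np.array([utilities[b] for b in brands], dtype=float)
    weights = np.exp(values - values.max())  # subtract max for numerical stability
    return dict(zip(brands, weights / weights.sum()))


# Hypothetical deterministic utilities for four brands.
utilities = {"A": 1.2, "B": 0.7, "C": 0.3, "D": -0.4}
before = logit_shares(utilities)

# Withdraw brand C and recompute the shares of the remaining brands.
remaining = {b: u for b, u in utilities.items() if b != "C"}
after = logit_shares(remaining)

for brand in remaining:
    gain = after[brand] - before[brand]
    # Under IIA the relative gain is the same for every surviving brand.
    print(f"{brand}: {before[brand]:.3f} -> {after[brand]:.3f} "
          f"(gain {gain:.3f}, gain/share {gain / before[brand]:.3f})")
```

The identical gain-to-share ratio across brands is what makes the model unable to say which brands would benefit most from a withdrawal: by construction, none benefits more than its share warrants.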
Some additional issues

Clearly, the number of issues yet to be dealt with is very large. Hence, any selection of them is highly subjective. Based upon recent data bank developments and “real” practitioners’ concerns, we have selected seven issues or areas of potential future development:

(1) In the area of new products’ sales forecasting

Multibrand choice models developed so far represent individual choice under equilibrium conditions, i.e., for mature brands in existing markets. Although these brands provide the firm with day-to-day means of survival, the future is based upon the new products’ sales performance. Consequently, explanatory and predictive models need to be adapted to take account of the dynamics of brand preference, brand awareness and distribution build. In addition, some more specific issues need to be addressed. For example, a key decision to be made when launching a new packaged good is the pack size and, perhaps, the number of pack sizes to offer. Managers may be inclined to launch “large” pack sizes to cater for the heavy user’s needs. From the management scientist’s viewpoint, the question then becomes: How often will the buyers of the new product buy it in a month? Or: What is the interaction effect between interpurchase time and package size? Although the assumption of independence between brand choice and purchase timing may in some cases be a fair approximation, it may not be appropriate for pack size choice. To our knowledge, this relationship has not been empirically examined. One possible approach would be to model usage times (interpurchase time divided by the quantity bought on the previous purchase occasion) instead of interpurchase times (e.g., Banerjee and Bhattacharyya, 1976). However, the drawback is that this type of model does not provide predictions of measures such as the proportion of repeaters, i.e., consumers who bought at least twice over a fixed time period (but rather the proportion of buyers of two unit quantities).

(2) Modeling purchasing over short time periods

With the massive development of (home and/or store) scanning data, market reports are now available on such periods as weeks for a large number of purchases. Managers wish to track consumers’ purchases over the time of their deals, which often last a few days (depending on the attractiveness of the deal!). Over such time intervals, the NBD model does not work particularly well (Ehrenberg, 1988). This is due, in part, to the lack of variance (compared to the mean) of the number of purchases.⁵ Banerjee and Bhattacharyya (1976) suggested using a compound Inverse Gaussian distribution as a model of interpurchase times: although it offers considerable flexibility, this model has not generated much follow-up study, perhaps because of its (less than convenient) computational tractability relative to the NBD model.⁶

⁵ As a reminder, the NBD assumes that the variance is larger than the mean.
⁶ Jeuland et al. (1980) have empirically shown that, as the period of analysis becomes longer, the predictions of a heterogeneous condensed Poisson model approach those of the NBD.
(3) Assessing the effect of deals ex ante

Models that forecast promotion effects need to be developed. Assessing a brand promotion impact ex post is clearly a valuable, but insufficient, source of information (Goodhardt and Ehrenberg, 1967). Ex ante forecasts need to be made over short time periods and, preferably, be adjusted to the length of the planned period of promotion.

(4) Deriving “optimum” promotion policy from the econometric models of choice

Questions such as the following need to be addressed: Which brands should be promoted? What kind of promotion (coupon versus price cuts, for example)? How long should it last? How often should the brand be promoted? These questions are complex and interrelated, but approaching them will increase the managerial relevance of the existing econometric models of choice.

(5) Deriving temporally/cross-sectionally aggregated models of choice

The question is: What is the temporally/cross-sectionally aggregated model when individual consumers are consistent with a multinomial logit (or probit) model of brand choice? For some product classes (e.g., cigarettes in France), syndicated individual-level scanner panel data are not available; for some others, the household scanner panels’ coverage rate may be low relative to store audit sales, because of non-scanned purchases (due to the bulkiness of the product, for example). In addition, studies of individual choice behavior as measured in scanner panels are typically based upon small samples (e.g., 100 households in Guadagni and Little, 1983, and Gupta, 1988; 300 in Lattin and Bucklin, 1989; 216 in Kalwani et al., 1990) because, in part, of computational costs and the exclusion of “light” buyers. Although valuable, such results may not apply to the “average consumer”. Aggregated consumer sales (such as store sales) may, at times, be the only reliable source of sales information. In addition, store-level scanning data are gradually replacing store audits as market status reports. The issue consists of deriving the aggregated time series model from which the micro-parameters can be extracted.⁷ Such an approach would provide greater insight into the individual choice process than the direct estimation of an aggregate form without explicit behavioral foundation.

⁷ One might suspect the aggregate model to be analytically distinct in nature from its micro-components.

(6) Assessing market competition and competitive strategy

Modelers have had substantial success using panel data to measure short-run influences of marketing variables. While there has also been some investigation of longer-run effects and competitive market strategies, these latter issues have not been explored in as much depth to date.

(7) Issues in designing panel datasets

Relatively little attention has been given in the marketing literature to questions involving the size, composition and coverage (temporal and geographic) of consumer panels, and to the specific array of product category, brand, store-environment and informational-environment data, together with customer characteristics, that make up such a panel dataset. Certain in-store conditions, sales in non-scanner-equipped stores, and especially information on customer attitudes, perceptions and preferences represent additional useful information that is just beginning to become available.
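Issue (2) and footnote 5 turn on how the variance of purchase counts compares with their mean as the observation period shrinks. Under the Gamma/Poisson (NBD) assumptions, with purchase rates distributed Gamma across households with shape r and mean r/α, the count over a period of length T has mean rT/α and variance rT/α + rT²/α². The variance-to-mean ratio is therefore 1 + T/α and the between-household share of variance is T/(α + T), which for T = 1 reduces to 1/(α + 1) (or to α/(α + 1) if α is instead read as a multiplicative scale, so that the mean is rα). The sketch below simply evaluates these closed forms; the function name nbd_moments and the parameter values are hypothetical.

```python
def nbd_moments(r, alpha, period=1.0):
    """Closed-form NBD moments for purchase counts over `period`.

    Assumes Poisson purchasing with a household rate that is Gamma
    distributed with shape r and mean r / alpha (alpha acts as a rate).
    """
    mean = r * period / alpha
    within = mean                            # E[Var(count | rate)], Poisson part
    between = r * period ** 2 / alpha ** 2   # Var(E[count | rate]), Gamma part
    variance = within + between
    return {
        "mean": mean,
        "variance": variance,
        "variance_to_mean": variance / mean,   # equals 1 + period / alpha
        "between_share": between / variance,   # equals period / (alpha + period)
    }


# Hypothetical parameters; periods of 13, 4, 1 and 0.25 time units.
for period in (13.0, 4.0, 1.0, 0.25):
    m = nbd_moments(r=0.5, alpha=4.0, period=period)
    print(f"T={period:5.2f}  mean={m['mean']:.3f}  "
          f"var/mean={m['variance_to_mean']:.3f}  "
          f"between share={m['between_share']:.3f}")
```

The ratio approaches 1 (the Poisson value) as T becomes small, so over very short periods the NBD leaves little room for overdispersion, and it cannot represent counts whose variance falls below the mean.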
Content of this special issue

The articles published in this special issue can be seen as contributing to many of the seven issues raised above. Ortmeyer, Lattin and Montgomery look at promotion response patterns for individuals, with results that have implications for issues (1), (3), and (4). Kannan, Wright and Worobetz’s work has relevance for marketing strategy (6), in testing for competitive submarkets. Schneider and Currim consider the question of customer deal proneness in a study with applicability to issues (2), (3), and (4). Wagner and Taudes’s article is focussed specifically on the response of consumers to new products, issue (1). This explanatory model of purchase behavior, together with the one proposed in Zufryden’s article, has appeal for those who study promotions, issues (2) and (3), and also for understanding aggregate market response patterns, issue (4). The work by Parry and Gengler examines the estimation of brand switching matrices, a key input to competitive market analyses, issue (6). Finally, Golany, Phillips and Rousseau provide some interesting observations on panel design, issue (7).
Conclusion

The research trend is toward building more detailed, insightful models in order to improve understanding, and (hopefully) decision making. Most models now being developed are based on a sound empirical footing. The “theorizing” typically builds upon good “working” (fitting) models. In sum, the area of stochastic modeling has advanced with small (but firm) steps over the last 40 years. Data now abound, but the methods are still short of extracting “all” relevant information. Based upon performance to date, there is reason for optimism that new (and challenging) areas of investigation will be addressed.
References

Banerjee, A.K. and G.K. Bhattacharyya, 1976. A purchase incidence model with inverse Gaussian interpurchase times. Journal of the American Statistical Association 71, 823-829.
Bass, F.M., M.M. Givon, M.U. Kalwani, D. Reibstein and G.P. Wright, 1984. An investigation into the order of the brand choice process. Marketing Science 3, 267-287.
Borsch-Supan, A., V. Hajivassiliou, L.J. Kotlikoff and J.N. Morris, 1990. Health, children, and elderly living arrangements: A multiperiod-multinomial probit model with unobserved heterogeneity and autocorrelated errors. Working Paper No. 3343, National Bureau of Economic Research, Cambridge, MA.
Chatfield, C. and G.J. Goodhardt, 1973. A consumer purchasing model with Erlang interpurchase times. Journal of the American Statistical Association 68, 828-835.
Ehrenberg, A.S.C., 1959. The pattern of consumer purchases. Applied Statistics 8, 26-41.
Ehrenberg, A.S.C., 1988. Repeat-buying: Facts, theory and applications. 2nd ed. London: Charles Griffin.
Goodhardt, G.J. and A.S.C. Ehrenberg, 1967. Conditional trend analysis: A breakdown by initial purchasing level. Journal of Marketing Research 4, 155-161.
Guadagni, P.M. and J.D.C. Little, 1983. A logit model of brand choice calibrated on scanner data. Marketing Science 2, 203-238.
Gupta, S., 1988. Impact of sales promotions on when, what, and how much to buy. Journal of Marketing Research 25, 342-355.
Jeuland, A.P., F.M. Bass and G.P. Wright, 1980. A multibrand stochastic model compounding heterogeneous Erlang timing and multinomial choice processes. Operations Research 28, 255-277.
Kalwani, M.U., C.K. Yim, H.J. Rinne and Y. Sugita, 1990. A price expectations model of customer brand choice. Journal of Marketing Research 27, 251-262.
Lattin, J.M. and R.E. Bucklin, 1989. Reference effects of price and promotion on brand choice behavior. Journal of Marketing Research 26, 299-310.
McFadden, D., 1989. A method of simulated moments for estimation of discrete response models without calculating multiple integrals. Econometrica 57, 995-1026.
Shoemaker, R.W., R. Staelin, J.B. Kadane and F.R. Shoaf, 1977. Relation of brand choice to purchase frequency. Journal of Marketing Research 14, 458-468.
Sichel, H.S., 1982. Repeat-buying and the generalized inverse Gaussian-Poisson distribution. Applied Statistics 31, 193-204.
Wheat, R.D. and D.G. Morrison, 1990. Assessing purchase timing models: Whether or not is preferable to when. Marketing Science 9, 162-170.