A methodology for incorporating prior information into choice models


Journal of Retailing and Consumer Services 12 (2005) 113–123

Peter T.L. Popkowski Leszczyc (a,*), Ashish Sinha (b)

(a) Department of Marketing, Business Economics and Law, University of Alberta, 4-30F Faculty of Business Building, Edmonton, Alberta, Canada, T6G R6
(b) Victoria University of Wellington, Wellington, New Zealand

Abstract

In this paper, we propose a method that makes it easier for a modeler or manager to include subjective information (such as judgment or intuition) in a choice model. The major contribution and focus of this research is the way this prior information is incorporated in a logit model. An important advantage of our approach is that, unlike the standard Bayesian approach, the prior information is incorporated using exogenous variables. We contend that it is easier for a manager or modeler to think in terms of market share and exogenous variables than in terms of unobservable parameter distributions. Two empirical illustrations of our model are provided: (i) the impact of a change in marketing strategy, obtained by including an informative prior based on subjective judgments, and (ii) parameter estimation and sales forecasting when limited information is available. The results indicate that incorporating subjective prior information may lead to a significant improvement in parameter estimates and sales forecasts.

© 2004 Elsevier Ltd. All rights reserved.

Keywords: Bayesian; Prior information; Forecasting

1. Introduction

Several researchers have concluded that combining information or parameter estimates from different sources or estimation procedures can lead to better sales forecasts (see, for example, Mahajan and Wind, 1988; Blattberg and Hoch, 1990; Morwitz and Schmittlein, 1998). Managers often use a combination of subjective information (judgment or intuition) and objective information (results of a mathematical model) to form sales forecasts or expected outcomes of changes in marketing strategies. Indeed, it is common practice for modelers and managers to evaluate model results based on intuition or face validity. However, few researchers use this subjective information when estimating parameters. Prior information such as intuition or judgment is often ignored, although incorporating it can improve the parameter estimates and the sales-forecasting ability of a model.

*Corresponding author. Tel.: +1-780-492-1866. E-mail addresses: [email protected] (P.T.L. Popkowski Leszczyc), [email protected] (A. Sinha).
0969-6989/$ - see front matter © 2004 Elsevier Ltd. All rights reserved. doi:10.1016/j.jretconser.2004.05.001

It has been well established in the forecasting literature that combining different forecasts tends to lead to improved results (for a review, see Clemen and Armstrong, 1989). Leeflang and Wittink (2000) advocate that different data sources need to be integrated, while Wedel et al. (2000) discuss the importance of incorporating subjective judgmental data. Morwitz and Schmittlein (1998) provided an application to direct marketing offerings, where a combination of managerial judgment and statistical models leads to increased profitability. Blattberg and Hoch (1990) concluded that forecasts combining the results of a sales-forecasting model with forecasts from managers (50% model and 50% managerial intuition) were better than those of either method alone. Finally, there is a substantial literature on decision support systems and management information systems, which integrate managers' expertise, data and mathematical models (see, for example, Wierenga and Van Bruggen, 2000; Wierenga et al., 1999). However, it is not our objective in this paper to develop a complete decision support system; we merely propose an easier way to incorporate managerial or modeler's judgments into choice models.

There are different ways to incorporate prior information, such as modeler's or managerial judgment or intuition, into models. The approach by Blattberg and Hoch (1990) obtains separate forecasts from managers and models and combines the results of these forecasts. An alternative way to combine managerial intuition and model estimates is to integrate this information or managerial intuition as priors into a model before estimation. Rather than combining the final (sales) estimates from different sources, managerial intuition is combined with the model data to obtain a joint estimate. This idea was introduced in marketing by Little (1970), who used decision calculus to incorporate managerial judgment in the calibration of a model.¹ Morwitz and Schmittlein (1998) integrate model and managerial judgment by relying on judgment in cases where managers are more likely to be accurate and on a formal model when judgments tend to be less accurate. This requires an interactive approach between the manager and the modeler.

In this paper we develop a method that allows us to incorporate prior information into a logit model. While a number of papers have included prior information, such as modeler's or managerial judgment or intuition, these studies have mostly used regression-based models. With the availability of scanner data, there has been a strong trend towards the usage of choice models in marketing, but limited progress has been made on better ways to incorporate prior information into choice models. Recently, Bayesian models have been used to include prior information in choice models (Allenby and Rossi, 1999). Prior information is incorporated through the model's prior, which is combined with the model data to obtain (Bayesian posterior) estimates.
Since the prior information is specified through the parameter estimates, including prior information consists of specifying the distribution function of the prior parameters. Prior information, such as modeler's or managerial intuition, can be included in these models; to do so, however, the complete prior distributions of the parameters to be estimated need to be specified. For example, in a brand choice model that includes price and advertising as exogenous variables, a manager would need to specify the prior mean and variance of the price and advertising coefficients before estimation. Therefore, this approach is not well suited to incorporating subjective information. In particular, it is not intuitive for managers to think in terms of unobservable parameter estimates in situations where the impact of a change in marketing strategy is of interest to the decision makers.² The primary usage of these models has been to incorporate unobserved heterogeneity or uncertainty in parameter estimates, and in most studies no external or out-of-sample information is included in the model.

¹ The Adbug model is a special case where the model is completely determined by the manager. We thank an anonymous reviewer for pointing this out.

² The major difficulty lies in the fact that the influence of changes in the exogenous variables on the probability of choice for a brand is a non-linear function of the parameters. Therefore, the impact of changes in exogenous variables on parameter estimates is difficult to interpret. Not knowing the magnitude of changes in parameter estimates, it is difficult to incorporate this information into the priors. It is possible to obtain priors from previous studies. However, an added complexity for the logit model is that parameter estimates are only identified up to an arbitrary constant, and therefore, parameter estimates from previous studies are not compatible with the parameters to be estimated (Swait and Louviere, 1993).

1.1. Contributions

The major objective of the current paper is to introduce a novel methodology for incorporating prior information into the logit model. The major contribution lies in the way this prior information is incorporated: it is included using observed exogenous variables rather than unobserved parameters. This facilitates the inclusion of subjective information in a logit model, as it is considerably easier for modelers or managers to think in terms of exogenous variables than in terms of unobservable parameters. The prior information can come from different sources, such as managerial judgment or intuition, previous studies, or different data sources. Another important advantage is that our model allows us to vary the relative weight placed on the statistical data and on the subjective information.

Our proposed model has important applications to specific marketing problems. In this paper, we provide two empirical illustrations of our model. The first application shows the impact of a proposed change in the marketing strategy, while incorporating subjective prior information. The proposed model has an important advantage over other reduced-form models like the traditional logit model, which implicitly assume constant parameters even during structural or policy changes. For example, the predicted impact of changes in the pricing strategy is based on fixed parameter estimates and price elasticities. However, if the price of Heinz is reduced by half, it is also expected that the price and loyalty parameters will change. Since our model can include these changes in the pricing strategies in the prior structure, parameter estimates are adjusted accordingly. A change in the pricing strategy of one brand may impact its own price parameter, as well as the price parameters of competing brands.

The second application shows the model's ability to forecast sales when no or only limited data are available. In certain industries, data on sales and marketing mix variables may not be available for all markets, or only limited information is available. In such instances, in order to estimate parameters and forecast sales for the markets with missing data, we can estimate a model using data from an existing market while incorporating prior information reflecting the differences in the market conditions and marketing strategies used in the two markets. Modelers or managers can update their priors to reflect changes in the market conditions, include this information (e.g. changes in marketing mix strategies or consumer preferences) in the model's priors, and obtain parameter estimates. Basically, this entails forecasting with "old" data but with new or updated priors. In this paper, we show an application using only aggregate market statistics as priors and data from a different market region to obtain sales forecasts.

Our model is based on the work by Koop and Poirier (1993), which we extend in several important ways:

(i) We determine a method to incorporate changes in exogenous variables as priors. The original paper by Koop and Poirier (1993) provided a 'theoretical' example of how to incorporate prior information into the logit model. However, in their model the prior information included has no interpretable meaning. A limitation of the Koop and Poirier method is that both the prior and the likelihood depend on the same conditioning variables (or the same values of the exogenous variables); therefore, they cannot incorporate the effect of changes in exogenous variables using their priors.
(ii) The model proposed in this paper is a time-series extension of the cross-sectional Koop and Poirier (1993) model.

The remainder of the paper is organized as follows. First, we discuss the development of our model. This is followed by a description of the data and a discussion of the results. Finally, we draw some conclusions and discuss potential future research.

2. Model development

In this section, we postulate a Bayesian logit model, which combines prior information with a statistical model. The prior information, such as modeler's or managerial intuition, is incorporated through the model's prior. The model is based on the multinomial logit model and is a time-series extension of the model proposed by Koop and Poirier (1993). Let us start by defining the likelihood function of the standard multinomial logit model (McFadden, 1974):

$$P(d_{ijt} = 1) = \frac{\exp(x_{ijt}'\theta^i_j)}{\sum_j \exp(x_{ijt}'\theta^i_j)}, \qquad (1)$$

where $P_{ijt} = P(d_{ijt} = 1)$ is the probability that consumer $i$ selects brand $j$ at time $t$ ($i = 1, 2, \ldots, N$; $j = 1, 2, \ldots, J$; $t = 1, 2, \ldots, T_i$),

$$d_{ijt} = \begin{cases} 1 & \text{if household } i \text{ chooses brand } j \text{ at time } t, \\ 0 & \text{otherwise;} \end{cases}$$

$x_{ijt}$ = $K \times 1$ vector of the values of the exogenous variables for consumer $i$ for brand $j$ at time $t$, for attributes $k = 1, \ldots, K$;
$\theta^i_j$ = $K \times 1$ vector of parameter estimates for the $i$th individual and $j$th brand.

Then, the likelihood function for the $i$th household over $T_i$ time periods is specified as follows:

$$L(\theta^i \mid x_{ijt}, d_{ijt}) = \prod_{t=1}^{T_i} \prod_j \left[ \exp(x_{ijt}'\theta^i_j) \left\{ \sum_j \exp(x_{ijt}'\theta^i_j) \right\}^{-1} \right]^{d_{ijt}}. \qquad (2)$$
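As a minimal sketch of Eqs. (1) and (2), the following Python fragment evaluates the logit probabilities for one purchase occasion and the log of the household likelihood. The function names, the nested-list data layout and the numerical values are illustrative assumptions, not part of the original paper.

```python
import math

def choice_probabilities(x_t, theta):
    """Eq. (1) for one purchase occasion.

    x_t   -- list of J attribute vectors (each of length K), one per brand
    theta -- list of J coefficient vectors (each of length K)
    """
    utilities = [sum(xk * tk for xk, tk in zip(x_j, th_j))
                 for x_j, th_j in zip(x_t, theta)]
    shift = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - shift) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def log_likelihood(x, d, theta):
    """Log of Eq. (2): sum over occasions t and brands j of d_tj * log P_tj."""
    ll = 0.0
    for x_t, d_t in zip(x, d):
        probs = choice_probabilities(x_t, theta)
        ll += sum(d_j * math.log(p_j) for d_j, p_j in zip(d_t, probs))
    return ll

# Hypothetical household: 2 brands, K = 2 attributes (intercept, price), 2 occasions.
x = [[[1.0, 0.5], [1.0, 0.4]],
     [[1.0, 0.6], [1.0, 0.4]]]
d = [[1, 0], [0, 1]]                   # brand chosen on each occasion
theta = [[0.5, -2.0], [0.0, -2.0]]     # illustrative coefficients only

print(choice_probabilities(x[0], theta))
print(log_likelihood(x, d, theta))
```

Working with the logarithm of Eq. (2) turns the double product into a sum, which is the form usually passed to a numerical optimizer.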

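The likelihood function in Eq. (2) is combined with a prior distribution to form the following posterior Bayesian likelihood function (the derivations of the posterior distribution are provided in Appendix A):

$$L(\theta) = \prod_i \left[ \exp\left( (\overline{C}^i)'\theta^i \right) \right] \prod_{t=1}^{T_i} \prod_j \left( \left\{ \sum_j \exp(x_{ijt}'\theta^i_j) \right\}^{-d_{ijt}} \right) \left( \left\{ \sum_j \exp(\underline{x}_{ijt}'\theta^i_j) \right\}^{-\underline{d}_{ijt}} \right), \qquad (3)$$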

where $\overline{C}^i = C^i + r_i \underline{C}^i$ is the posterior function, and $r_i \ge 0$ is the weight attached to the prior. $C^i$ is a $K \times J$ matrix of observable and unobservable variables affecting the choice of the $i$th household, with columns $C^i_{\cdot j} = \sum_t d_{ijt} x_{ijt}$; $\underline{C}^i$ is the prior, with columns $\underline{C}^i_{\cdot j} = \sum_t \underline{d}_{ijt} \underline{x}_{ijt}$; $\underline{x}_{ijt}$ is the prior for the exogenous variables; and $\underline{d}_{ijt} = r_i d_{ijt}$.

To facilitate the incorporation of prior information, we make two simplifications in Eq. (3): it is assumed that $\theta^i = \theta$ and $r_i = r$. Though our model is estimable at the individual level, this would require the specification of household-specific priors. It is difficult, if not impossible, to provide separate priors for individual households. However, a modeler or manager will be able to provide market-level judgments. Hence, the model is limited by the ability of the modeler or manager to provide priors at the aggregate level. An alternative approach is to use a segmented model, where subjective priors are provided for different consumer segments. However, we do not pursue that extension in this paper.
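The pooled posterior (with $\theta^i = \theta$ and $r_i = r$) can be sketched in logarithmic form as follows. This is an illustration under stated assumptions rather than the authors' GAUSS implementation: the prior matrix `C_prior` is taken as given by the analyst, the weight `r` multiplies both `C_prior` and the prior denominator term, purchase occasions of all households are stacked in one list, and all names and numbers are hypothetical.

```python
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def utilities(x_t, theta):
    """x_t[j] and theta[j] are length-K lists; returns the J values x_j' theta_j."""
    return [sum(a * b for a, b in zip(x_j, th_j)) for x_j, th_j in zip(x_t, theta)]

def sufficient_statistic(x, d):
    """C as a K x J table: column j sums d_tj * x_tj over purchase occasions."""
    J, K = len(x[0]), len(x[0][0])
    C = [[0.0] * J for _ in range(K)]
    for x_t, d_t in zip(x, d):
        for j in range(J):
            for k in range(K):
                C[k][j] += d_t[j] * x_t[j][k]
    return C

def log_posterior(theta, x, d, C_prior, x_prior, r):
    """Log of the pooled posterior kernel (assumed reading of Eq. (3))."""
    C = sufficient_statistic(x, d)
    J, K = len(theta), len(theta[0])
    # Numerator term: (C + r * C_prior)' theta, summed over attributes and brands.
    value = sum((C[k][j] + r * C_prior[k][j]) * theta[j][k]
                for j in range(J) for k in range(K))
    for x_t, xp_t, d_t in zip(x, x_prior, d):
        n_t = sum(d_t)  # equals 1 when exactly one brand is chosen
        value -= n_t * log_sum_exp(utilities(x_t, theta))        # data denominator
        value -= r * n_t * log_sum_exp(utilities(xp_t, theta))   # prior denominator
    return value

# Hypothetical data: 2 brands, K = 2 (intercept, price), 2 stacked occasions.
x = [[[1.0, 0.5], [1.0, 0.4]],
     [[1.0, 0.6], [1.0, 0.4]]]
d = [[1, 0], [0, 1]]
x_prior = [[[1.0, 0.55], [1.0, 0.4]],   # brand 1 price raised by 10% in the prior
           [[1.0, 0.66], [1.0, 0.4]]]
C_prior = [[0.9, 1.1], [0.5, 0.44]]     # illustrative prior matrix
theta = [[0.5, -2.0], [0.0, -2.0]]

print(log_posterior(theta, x, d, C_prior, x_prior, r=1.0))
```

Setting `r = 0` removes both prior terms, so the function then returns the ordinary logit log-likelihood of the stacked data.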


Assuming $\theta^i = \theta$, the likelihood function reduces to the following form:

$$L(\theta) = \exp\left( (\overline{C})'\theta \right) \prod_i \prod_t \prod_j \left[ \left\{ \sum_j \exp(x_{ijt}'\theta_j) \right\}^{-d_{ijt}} \left\{ \sum_j \exp(\underline{x}_{ijt}'\theta_j) \right\}^{-\underline{d}_{ijt}} \right], \qquad (4)$$

where $\overline{C} = \sum_i \overline{C}^i$.

Several features of the model warrant further explanation, in particular the way we incorporate prior information into Eqs. (3) and (4). This is discussed in the next section.

2.1. How to incorporate subjective information?

We will first explain the details of the model and next show a specific example of how this information is included in our model. We start by focusing on the importance of the conjugate prior, and next show how we extend the Koop and Poirier (1993) model.

Conceptually, let us take Bayes' theorem: $p(\theta \mid D) \propto L(D \mid \theta)\, p(\theta)$, where $p(\theta \mid D)$ is the posterior distribution of the vector of parameters $\theta$ given the data, $L(D \mid \theta)$ is the likelihood function and $p(\theta)$ is the prior distribution. To estimate this model, one needs to specify the distribution $p(\theta)$, its mean and variance. In our approach, rather than specifying the mean and variance of the prior distribution, we denote $p(\theta \mid \underline{D})$ as a conjugate prior. This allows us to specify $\underline{D}$ rather than specifying $\underline{\theta}$, where $\underline{D} = f(\underline{C}^i, \underline{x}_{ijt}, r)$. It is important to note that the reason why we can specify the prior as $p(\theta \mid \underline{D})$ is the usage of a conjugate prior: for the conjugate prior, the data (likelihood function), the priors and the posteriors come from the same family of distributions.

In Eq. (4), prior information is incorporated in the logit model through $\underline{C}^i$, $\underline{x}_{ijt}$ and $r$. $\underline{C}^i$ is the numerator of the prior function, which includes the effects of changes in exogenous variables on market shares, while changes in the exogenous variables are integrated in the denominator using $\underline{x}_{ijt}$; $r \ge 0$ is the amount of weight (or the degree of shrinkage) put on the prior.

Furthermore, the prior information is incorporated in our model using a conjugate prior. A major advantage of a conjugate logit prior is that it facilitates the incorporation of intuition in the model. More specifically, since we use a conjugate prior, the posterior distribution is an additive function of the prior information and the sample information, such that $\overline{C}^i = C^i + r\underline{C}^i$, where $\overline{C}^i$ is the posterior function, $C^i$ is a $K \times J$ matrix of exogenous variables affecting the choice of the $i$th household, $r$ is the amount of weight put on the prior, and $\underline{C}^i_{\cdot j} = \sum_t \underline{d}_{ijt} \underline{x}_{ijt}$ incorporates prior information about (changes in) the exogenous variables.

The matrix $C$ can be further decomposed into:

$$C = M \odot O \cdot T, \qquad (5)$$

where $\odot$ denotes the element-by-element product, $M$ is a $K \times J$ matrix with market shares, $M_{kj} = T_j / \sum_j T_j = ms_j$ is the market share for the $j$th brand, $T_j = \sum_i \sum_t d_{ijt}$ is the number of purchase occasions on which the $j$th brand was chosen, $T = \sum_j T_j$ is the total number of purchase occasions, and $O$ is a $K \times J$ matrix of mean values of the exogenous variables, given as follows:

$$O_{\cdot j} = (1/T_j)\, C_{\cdot j} = \frac{\sum_i \sum_t d_{ijt} x_{ijt}}{\sum_i \sum_t d_{ijt}}. \qquad (6)$$

Similarly, the prior $\underline{C}$ can be written as

$$\underline{C} = \underline{M} \odot \underline{O} \cdot T, \qquad (7)$$

where $\underline{M}$ is a $K \times J$ matrix with predicted market shares conditional on changes in the exogenous variables, $\underline{O} = G \odot O$, and $G$ is a $K \times J$ matrix that incorporates changes in the exogenous variables. The matrix $G$ scales the $O$ matrix of average values of the exogenous variables, such that values greater (less) than one increase (decrease) the prior means of the exogenous variables. For example, for $G_{kj} = 1$, $k = 1, \ldots, K$ and $j = 1, \ldots, J$, the mean values of the exogenous variables for the likelihood function are the same as those of the prior function. Therefore, we are able to determine the impact of a change in marketing strategy by including changes in exogenous variables in the prior. Similarly, for the denominator we include prior information as follows:

$$\underline{x}_{ijt} = G_{\cdot j} \odot x_{ijt}. \qquad (8)$$

Therefore, a major contribution of our model over the method developed by Koop and Poirier (1993) is the way prior information is incorporated. In Koop and Poirier's model, the prior comes from a fictitious sample with the same conditioning variables, meaning that the values of the exogenous variables are the same for the prior and the likelihood function. Therefore, it is not possible to incorporate the effect of changes in exogenous variables in their model.

To incorporate the prior information into the logit model, three different sources of information are required: (i) the changes in the exogenous variables (values for the matrix $G$), (ii) an estimate of market shares given changes in the exogenous variables (the matrix $\underline{M}$), and (iii) the weight attached to the prior ($r$).
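The decomposition of the data matrix into market shares, attribute means and the number of purchase occasions can be checked numerically. The sketch below assumes the matrix products are taken element by element, which is what the definition of the columns of O in Eq. (6) implies; the function names and the counts are hypothetical.

```python
def decompose(C, purchases):
    """Split the K x J table C into shares M, attribute means O and total T.

    purchases[j] is T_j, the number of occasions on which brand j was chosen.
    """
    T = sum(purchases)
    K, J = len(C), len(C[0])
    M = [[purchases[j] / T for j in range(J)] for _ in range(K)]       # same row repeated
    O = [[C[k][j] / purchases[j] for j in range(J)] for k in range(K)]  # Eq. (6)
    return M, O, T

def recompose(M, O, T):
    """Element-by-element product M * O, scaled by T (Eq. (5))."""
    return [[m * o * T for m, o in zip(m_row, o_row)] for m_row, o_row in zip(M, O)]

# Hypothetical two-brand market; rows of C: intercept, price.
purchases = [60, 40]
C = [[60.0, 40.0],
     [30.0, 16.0]]

M, O, T = decompose(C, purchases)
print(M[0], O[1], T)
print(recompose(M, O, T))
```

Because every row of M repeats the same shares, only the brand-level purchase counts and the attribute means are needed to rebuild C, which is what makes aggregate market information usable as a prior.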
We will now provide an illustration of how a proposed change in pricing policy can be incorporated in our model. Let us suppose a market with three brands (A, B and C), and with the following information about the exogenous variables:

$$O = \begin{pmatrix} 1 & 1 & 1 \\ p_A & p_B & p_C \\ ad_A & ad_B & ad_C \end{pmatrix}, \qquad (9)$$

where the first row denotes the intercept, $p_A$, $p_B$ and $p_C$ are the average prices, and $ad_A$, $ad_B$ and $ad_C$ are the average values of advertising for the three brands. Assume that brand A has a market share of 60%, brand B of 20%, and brand C also 20%, and that in total there are 1000 observations in the data set. Then the matrix $M$ with market shares is given as follows:

$$M = \begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.6 & 0.2 & 0.2 \\ 0.6 & 0.2 & 0.2 \end{pmatrix}. \qquad (10)$$

Given this information, one can easily reconstruct the matrix $C$ in Eq. (6), multiplying the matrices element by element, as follows:

$$C = \begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.6 & 0.2 & 0.2 \\ 0.6 & 0.2 & 0.2 \end{pmatrix} \odot \begin{pmatrix} 1 & 1 & 1 \\ p_A & p_B & p_C \\ ad_A & ad_B & ad_C \end{pmatrix} \times 1000. \qquad (11)$$

Let us suppose that a proposed 10% increase in the price of brand A is expected to lead to a 10% decrease in market share for brand A, while brands B and C will each gain half of this loss in market share. Hence, as a result of this change the market shares for brands A, B and C are expected to become 0.54, 0.23 and 0.23, respectively. In order to incorporate this information as a prior in our model, we need to specify the $G$ and $\underline{M}$ matrices, which have the following form:

$$G = \begin{pmatrix} 1 & 1 & 1 \\ 1.1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}, \qquad \underline{M} = \begin{pmatrix} 0.54 & 0.23 & 0.23 \\ 0.54 & 0.23 & 0.23 \\ 0.54 & 0.23 & 0.23 \end{pmatrix}, \qquad (12)$$

where $G_{21} = 1.1$ indicates that the price for brand A has been increased by 10%. Then, the prior matrix $\underline{C}$ is given as follows:

$$\underline{C} = \underline{M} \odot G \odot O \cdot T = \begin{pmatrix} 0.54 & 0.23 & 0.23 \\ 0.54 & 0.23 & 0.23 \\ 0.54 & 0.23 & 0.23 \end{pmatrix} \odot \begin{pmatrix} 1 & 1 & 1 \\ 1.1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} \odot \begin{pmatrix} 1 & 1 & 1 \\ p_A & p_B & p_C \\ ad_A & ad_B & ad_C \end{pmatrix} \times 1000. \qquad (13)$$
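The three-brand illustration can be traced with numbers. The sketch below turns the managerial statement (brand A loses 10% of its share, split equally between B and C) into the prior shares and then forms the prior matrix element by element. The average prices and advertising levels in `O` are made-up values, since the text leaves them symbolic, and the helper names are hypothetical.

```python
def shift_shares(shares, brand, relative_loss):
    """Reduce one brand's share by a relative amount and split the loss
    equally among the remaining brands."""
    lost = shares[brand] * relative_loss
    gain = lost / (len(shares) - 1)
    return [s - lost if j == brand else s + gain for j, s in enumerate(shares)]

def prior_matrix(prior_shares, G, O, T):
    """Element-by-element product of prior shares, G and O, scaled by T.
    The share row is repeated for every attribute row."""
    return [[prior_shares[j] * G[k][j] * O[k][j] * T
             for j in range(len(prior_shares))]
            for k in range(len(O))]

shares = [0.6, 0.2, 0.2]                      # brands A, B, C
prior_shares = shift_shares(shares, brand=0, relative_loss=0.10)

# Hypothetical averages (rows: intercept, price, advertising); not from the paper.
O = [[1.0, 1.0, 1.0],
     [2.0, 1.8, 1.5],
     [0.3, 0.2, 0.1]]
G = [[1.0, 1.0, 1.0],
     [1.1, 1.0, 1.0],                         # 10% price increase for brand A
     [1.0, 1.0, 1.0]]

C_prior = prior_matrix(prior_shares, G, O, T=1000)
print([round(s, 2) for s in prior_shares])
for row in C_prior:
    print([round(v, 1) for v in row])
```

The only inputs a manager supplies are the scaling factors in `G` and the expected shares; everything else comes from observed averages.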

Finally, we need to determine the amount of weight to attach to the prior (the parameter r). Depending on the value of this parameter, more weight is put either on the prior information or on the household's purchase history. We can conceptualize this as the degree of confidence that modelers or managers have in their subjective priors. Alternatively, if prior data are obtained from different studies, or when two different sources of data are combined, the relative sample size can be used as a weight.³ The weighting parameter can take any value between zero and infinity. For example, when r = 0, we have a diffuse prior and no weight is put on the prior; when r = 1, the prior and the likelihood function receive equal weight; and when r = 2, the prior has a weight of 2/3 and the likelihood function a weight of 1/3.

3. Data and empirical analysis

3.1. The data

The data used in this paper are scanner panel data of Ketchup purchases provided by A.C. Nielsen Inc. Sales data for the three major brands Heinz, Hunts and Delmonte, and an aggregate of generic brands, are included in the analysis. These four brands account for close to 99% of the total market sales. Households that purchased any of the brands not included in this study were deleted from the sample. For our analysis we selected the Springfield, MD market and included households who made more than five purchases. The data consist of a sample of 703 households and 7034 purchase occasions.

The variables included in our model are price, advertising and brand loyalty. Price is the actual pre-coupon price per ounce paid by a consumer at a specific store. For competing prices we used the retail tracking data to obtain store-specific prices. The advertising variable is a dummy variable indicating whether a particular brand was locally advertised or not. Brand loyalty measures the propensity of a household to buy a particular brand. We use a simple static measure: the household-level brand market shares, initialized using 52 weeks of purchase history. Brand loyalty is included to incorporate observed heterogeneity in consumer preferences.

We next provide two empirical illustrations of our model. The first illustration shows the impact of a proposed change in the price of Heinz Ketchup.
The second application shows the model's ability to estimate parameters and forecast sales in a market region when only limited information is available.

³ Swait and Louviere (1993) have recommended the use of scale parameters as weights when combining different sources of data. However, this is not possible for our model, as subjective information, such as modeler's or managerial intuition, generates fictitious data for which the scale parameter is unknown.

3.2. Empirical illustration of the impact of a change in pricing strategy

We study the impact of a 'proposed' decrease of 20% in the price of Heinz Ketchup. In order to estimate Eq. (4), we need to obtain prior information, which consists of three parts: (i) an objective component, the 20% decrease in price included in the price prior for Heinz; (ii) a subjective part, which incorporates the changes in brands' market shares due to the price decrease; and (iii) the amount of weight to put on the priors. The prior information for parts (ii) and (iii) is subjective information such as judgment or intuition, which can be obtained from modelers or managers. In this paper, we use a surrogate for modeler's intuition by using information obtained from previous studies to specify our priors. The amount of weight to put on the priors can be based on the degree of confidence one has in the prior.

Prior information from previous studies:

* On average, a 20% decrease in the price of Heinz Ketchup leads to a 10.2% increase in market share (Allenby and Lenk, 1994).
* The effects on sales due to a change in price differ by brand (e.g. Blattberg and Wisniewski, 1989; Kamakura and Russell, 1989). Asymmetric effects due to price changes for Ketchup have been reported by Popkowski Leszczyc and Bass (1998).
* The largest proportion of promotional purchases are switches from Delmonte to Heinz (38%), followed by switches from Generics to Heinz (32%), and Hunts to Heinz (30%) (Popkowski Leszczyc, 1992).

Based on the information from previous studies (Allenby and Lenk, 1994), we expect that a 20% decrease in price for Heinz will lead to a 10.2% increase in its market share.⁴ Heinz is expected to draw 24% of its increase in market share from Delmonte, 45% from Hunts and 31% from Generics. This is the subjective prior information, which we will incorporate in the model by specifying the $\underline{O}$, $G$ and $\underline{M}$ matrices. However, first we need to determine $C = O \odot M \cdot T$ (an element-by-element product) from the data set, where $M$ contains the market shares of the different brands, $O$ the averages of the exogenous variables, and $T$ is the sample size ($T = 7034$). These matrices have the following form:

$$O = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0.007 & 0.054 & 0.042 & 0.0004 \\ 0.033 & 0.262 & 0.20 & 0.08 \\ 0.810 & 0.349 & 0.264 & 0.282 \end{pmatrix}, \qquad (14)$$

where $O_{kj} = \sum_i \sum_t d_{ijt} x_{kijt} / \sum_i \sum_t d_{ijt}$. The columns of the $O$ matrix represent different brands, namely Heinz, Hunts, Delmonte and Generics, while the rows of this matrix represent the intercept terms and the exogenous variables, namely advertising, price and loyalty. Similarly, the $M$ matrix has the following form:

$$M = \begin{pmatrix} 0.71 & 0.08 & 0.14 & 0.07 \\ 0.71 & 0.08 & 0.14 & 0.07 \\ 0.71 & 0.08 & 0.14 & 0.07 \\ 0.71 & 0.08 & 0.14 & 0.07 \end{pmatrix}, \qquad (15)$$

where the four columns of the $M$ matrix represent the market shares for, respectively, Heinz, Delmonte, Hunts and Generics.

⁴ A price elasticity for Heinz of 0.5 is relatively low for a frequently purchased consumable good (Tellis, 1988). However, Heinz's market share is over 60% and has less room to increase. In addition, Heinz is a premium priced national brand and consumers tend to be more loyal and less price sensitive (Popkowski Leszczyc, 1992).

Similarly, we need to specify $\underline{C} = O \odot G \odot \underline{M} \cdot T$. The $G$ matrix represents the changes in the exogenous variables, here the effect of a 20% decrease in the price of Heinz, which is represented by $G_{21} = 0.8$. All other values of the $G$ matrix are equal to one, signifying no other changes in the pricing or advertising policies:

$$G = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0.8 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix}. \qquad (16)$$

Hence,

$$\underline{O} = G \odot O = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0.8 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix} \odot \begin{pmatrix} 1 & 1 & 1 & 1 \\ 0.007 & 0.054 & 0.042 & 0.0004 \\ 0.033 & 0.262 & 0.20 & 0.08 \\ 0.810 & 0.349 & 0.264 & 0.282 \end{pmatrix}.$$

Based on the information from prior studies, we predict that the market shares for the four brands will change to 0.78 for Heinz, 0.06 for Delmonte, 0.11 for Hunts and 0.05 for Generics. This information is incorporated into the prior using the $\underline{M}$ matrix, which is represented as follows:

$$\underline{M} = \begin{pmatrix} 0.78 & 0.06 & 0.11 & 0.05 \\ 0.78 & 0.06 & 0.11 & 0.05 \\ 0.78 & 0.06 & 0.11 & 0.05 \\ 0.78 & 0.06 & 0.11 & 0.05 \end{pmatrix}. \qquad (17)$$

Using Eqs. (16) and (17), and for $T = 7034$, we can calculate $\underline{C}$:

$$\underline{C} = \begin{pmatrix} 5478 & 430 & 775 & 351 \\ 28.9 & 24 & 32 & 0.15 \\ 182 & 113 & 155 & 28.08 \\ 4185 & 150 & 205 & 99 \end{pmatrix}. \qquad (18)$$
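The prior shares in Eq. (17) can be traced back to the stated assumptions with a few lines of arithmetic. The sketch below starts from the rounded observed shares of Eq. (15), reads the 10.2% increase as a relative increase in the Heinz share (an interpretation, chosen because it agrees with Eq. (17)), and takes the gain from the other brands in the 24/45/31 proportions. The function name is hypothetical.

```python
def prior_shares_after_gain(shares, brand, relative_gain, draw):
    """Raise one brand's share by a relative amount and take the gain from
    the other brands in the proportions given by `draw` (which sum to 1)."""
    gain = shares[brand] * relative_gain
    updated = {name: s - draw.get(name, 0.0) * gain for name, s in shares.items()}
    updated[brand] = shares[brand] + gain
    return updated

# Observed shares as rounded in Eq. (15).
shares = {"Heinz": 0.71, "Delmonte": 0.08, "Hunts": 0.14, "Generics": 0.07}
# Proportions of the Heinz gain drawn from each competitor.
draw = {"Delmonte": 0.24, "Hunts": 0.45, "Generics": 0.31}

prior = prior_shares_after_gain(shares, "Heinz", 0.102, draw)
rounded = {name: round(s, 2) for name, s in prior.items()}
print(rounded)
```

Rounded to two decimals, the result is 0.78, 0.06, 0.11 and 0.05, the row used in Eq. (17).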

Eq. (4) is estimated using a maximum likelihood estimation routine. The programs were implemented and estimated in GAUSS using the MAXLIK routine. Table 1 provides the results of the logit model for different values of the weight on the prior (r). The first column of results shows the estimates of a simple logit model without prior information (r = 0). The other results are for models that include prior information but with different weights on the prior (different r values). The value r = 1.1 places almost equal weight on the prior and the likelihood data. We selected this value because the sample from which the priors were obtained was slightly larger than the sample in our study. In addition, we selected a smaller and a larger weight on the prior for the sake of comparison.

Table 1
Results of model with and without prior information

Variables            r = 0          r = 0.5        r = 1.1        r = 10
Constant Heinz       1.10 (0.06)    1.26 (0.06)    1.36 (0.06)    1.58 (0.06)
Constant Delmonte    0.20 (0.07)    0.18 (0.09)    0.17 (0.11)    0.14 (0.07)
Constant Hunts       0.49 (0.06)    0.52 (0.06)    0.53 (0.06)    0.57 (0.07)
Price                5.60 (0.49)    5.62 (0.54)    5.60 (0.58)    5.44 (0.62)
Advertising          1.30 (0.05)    1.32 (0.05)    1.35 (0.06)    1.41 (0.06)
Loyalty              2.66 (0.06)    2.73 (0.06)    2.79 (0.07)    2.94 (0.07)

Coefficients are statistically significant at the 0.05 level.

The parameter estimates are all significant and have the correct signs. The brand constants indicate that customers' intrinsic brand preferences are highest for Heinz, followed by Hunts, Generics and Delmonte. We note that most of the effect of the price change is absorbed by a change in the intercept for Heinz and in the brand loyalty parameter. Hence, the intrinsic brand preference for Heinz increases because of the increase in market share for Heinz due to the price decrease. For the same reason, the brand loyalty parameter increases. These effects become stronger when more weight is put on the prior. The price parameter remains about the same, indicating that we do not observe a change in the price sensitivity of consumers due to a 20% decrease in the price of Heinz.⁵ It is important to realize that the prior information incorporates the effect of a decrease in the base price rather than a temporary price special. This change has an impact on brand loyalty and the intercept term, but does not influence the short-run price elasticities.

The log-likelihood values are not included in Table 1, since values for these models are not comparable over different samples (the sample size changes with the weight put on the prior).

⁵ The price elasticities for the logit model are calculated as follows: $\beta x (1 - p)$, where $p$ is the purchase probability. Therefore, when r = 0, we obtain the following price elasticities: Heinz = 0.467, Delmonte = 2.043, Hunts = 1.857, and Generics = 1.623.

Table 2
Predicted sales for the different models (T = 7034) (impact on sales due to a 20% decrease in price of Heinz); market shares in parentheses

            Actual sales    r = 0           r = 0.5         r = 1.1         r = 10          Prior shares
Heinz       4972 (0.707)    4972 (0.707)    5130 (0.729)    5185 (0.737)    5428 (0.772)    0.78
Delmonte     551 (0.078)     551 (0.078)     513 (0.073)     553 (0.079)     441 (0.063)    0.06
Hunts       1003 (0.143)    1002 (0.142)     931 (0.132)     875 (0.124)     795 (0.113)    0.11
Generics     508 (0.072)     507 (0.072)     459 (0.065)     421 (0.060)     367 (0.052)    0.05

Another question of interest is the effect of the change in pricing strategy on the sales of Heinz and its competitors. Table 2 provides the predicted sales for each of the models estimated in Table 1. These are sales estimates conditional on a change in price under current market conditions. The results are clearly influenced by the subjective prior for the market shares: when r = 0 the estimated market shares are consistent with the data, and when r = 10 they are consistent with the priors. These results show that we can effectively include prior information that influences the model's outcome. Note that the priors can influence the sales forecasts as well as the parameter estimates. We can therefore combine the ability of the mathematical model with the strength of the modeler's or manager's intuition. A better way to test the model's ability to determine the impact of a change in pricing strategy on sales would be to use data containing a price break and to determine the forecasting accuracy for sales in the post-price-break period. Since we do not have data on price breaks, we next look at a different illustration, which tests the model's ability to forecast out-of-sample sales.

3.2.1. Parameter estimation and forecasting with limited data

The following empirical illustration looks at the model's ability to estimate parameters and forecast sales in a market when only limited information is available. For example, scanner panel data are available only for a limited number of cities or market places. For other market places, companies will have only limited information, such as overall sales and market shares, and


Table 3
Results of model with and without prior information

Variables           r = 0          r = 1          r = 2.035      Estimates using    Model estimates
                                                                 holdout data       (both data sets)
Constant Heinz      1.10 (0.06)    0.82 (0.05)    0.69 (0.05)    0.36 (0.05)        1.04 (0.05)
Constant Delmonte   0.20 (0.07)    0.43 (0.07)    0.54 (0.07)    0.84 (0.08)        0.21 (0.02)
Constant Hunts      0.49 (0.06)    0.34 (0.06)    0.28 (0.06)    0.14 (0.05)        0.37 (0.01)
Price               5.60 (0.49)    9.43 (0.45)    10.34 (0.44)   11.67 (0.42)       8.67 (0.12)
Advertising         1.30 (0.05)    0.89 (0.06)    0.72 (0.06)    0.22 (0.08)        1.02 (0.01)
Loyalty             2.66 (0.06)    2.42 (0.06)    2.28 (0.06)    1.89 (0.07)        2.39 (0.01)

Coefficients are statistically significant at the 0.05 level.

the marketing strategies used. The following example shows our model's ability to estimate parameters and forecast sales for the market place with limited information. For model estimation we use data from a market place with full information and include prior information from the market place with limited information. More specifically, data from the Springfield market are used for model estimation, while priors obtained from the Sioux Falls market are included, in order to estimate parameters and forecast sales for the Sioux Falls market. The question is how accurate these parameters and forecasts will be, given that we include only aggregate-level or point priors from the Sioux Falls market. Normally we would estimate such a model when we have only limited information for the Sioux Falls market and full information from the Springfield market. In our instance, we have full information from both markets, which allows us to conduct out-of-sample forecasts and test the accuracy of our approach. The prior information is obtained from the Sioux Falls market (a random sample of 500 households consisting of 3455 purchase occasions). As in example one, the differences in the market shares and the exogenous variables are included in the form of prior information (see Eqs. (16)–(18)):

\[
\underline{C} = M\,G\,O\,T =
\begin{pmatrix}
0.66 & 0.09 & 0.16 & 0.09 \\
0.66 & 0.09 & 0.16 & 0.09 \\
0.66 & 0.09 & 0.16 & 0.09 \\
0.66 & 0.09 & 0.16 & 0.09
\end{pmatrix}
\begin{pmatrix}
1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 1 & 1 & 1 \\
0.0256 & 0.0609 & 0.0620 & 0.0390 \\
0.0205 & 0.1167 & 0.1569 & 0.0520 \\
0.2114 & 0.0699 & 0.0683 & 0.2000
\end{pmatrix}
\times 3455,
\]

\[
\underline{C} =
\begin{pmatrix}
2280.3 & 311.0 & 552.8 & 311.0 \\
58.9 & 18.9 & 34.3 & 12.1 \\
46.6 & 36.3 & 86.7 & 16.2 \\
482.0 & 21.7 & 37.7 & 62.2
\end{pmatrix}.
\tag{19}
\]
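The arithmetic in Eq. (19) can be checked numerically. The sketch below rests on two assumptions that the text does not state explicitly: that the three matrices are combined element by element and then scaled by the 3455 purchase occasions, and that the entries are laid out with one row for the constant and one for each of the three exogenous variables, and one column per brand. The variable names are illustrative only.

```python
import numpy as np

# Prior market shares, one column per brand (repeated on every row).
shares = np.array([0.66, 0.09, 0.16, 0.09])
M = np.tile(shares, (4, 1))

# Matrix of ones as it appears in Eq. (19) under this reading.
G = np.ones((4, 4))

# Constant row followed by the three rows of exogenous-variable values.
O = np.array([
    [1.0,    1.0,    1.0,    1.0],
    [0.0256, 0.0609, 0.0620, 0.0390],
    [0.0205, 0.1167, 0.1569, 0.0520],
    [0.2114, 0.0699, 0.0683, 0.2000],
])

T = 3455  # purchase occasions in the Sioux Falls sample

# Assumed element-wise combination, scaled by the number of occasions.
C_prior = M * G * O * T

# Result matrix as printed in Eq. (19).
C_printed = np.array([
    [2280.3, 311.0, 552.8, 311.0],
    [58.9,   18.9,  34.3,  12.1],
    [46.6,   36.3,  86.7,  16.2],
    [482.0,  21.7,  37.7,  62.2],
])

max_gap = np.abs(C_prior - C_printed).max()
print(np.round(C_prior, 1))
print("largest absolute gap to the printed matrix:", round(float(max_gap), 2))
```

Under these assumptions every computed entry lies within one unit of the printed value; the small remaining gaps would be consistent with rounding of the shares and covariates shown in the equation.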

The parameter estimates are provided in Table 3. The results of these estimates are used to forecast sales in the Sioux Falls market (see Table 4). The first column with

Table 4
Model fit on sample data for the different models based on residual mean squared error (N = 3455)

                  Actual sales   r = 0   r = 1   r = 2.035   Model on holdout data
Heinz             1621           2069    1907    1833        1620
Delmonte          288            239     270     279         287
Hunts             761            598     651     678         761
Generics          785            548     626     665         785
Predictive fit                   1964    1822    1787        1757

results in Table 3 is the basic logit model for the Springfield market without prior information (hence, these are the same as the results in the first column of Table 1). The second and third columns are estimated using the Springfield market data, but with prior information obtained from the Sioux Falls market. The fourth column provides the actual parameter estimates for the basic logit model estimated on the Sioux Falls market data. The last column shows the results of a logit model estimated on both data sets. The value r = 1 provides equal weighting, and r = 2.035 was selected based on the sample size (the prior information came from a sample 2.035 times the size of the estimation sample). The results show that the parameter estimates for the models that include prior information are much closer to the "true" parameters; the difference is particularly noticeable for the price parameter. For comparison, we also estimated a model on the complete data set. When additional data are available, would it be better to combine data from different markets rather than just use prior information from the Sioux Falls market? The results in Table 3 clearly indicate that the parameter estimates for r = 2.035 are closer to the true estimates (using the holdout data) than those obtained from both data sets. Table 4 shows the sales forecasts and model fit for the different models in Table 3. Model fit is based on the residual mean squared error, which is calculated as follows:

\[
\mathrm{PRESS} = \sum_{j}\sum_{i}\sum_{t}\left(\frac{\hat{U}_{itj}}{\sum_{j}\hat{U}_{itj}} - d_{itj}\right)^{2},
\tag{20}
\]


where $\hat{U}_{itj}$ is the predicted utility for the jth brand and ith individual at the tth time period. These results show a substantial improvement in predictive fit after including prior information. This shows that including simple information as priors, such as differences in market share and marketing mix variables across markets, can greatly enhance both the predictive ability of a model and its parameter estimates.
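The fit measure in Eq. (20) can be sketched in a few lines. The function below assumes that the predicted utilities are positive (for example, exponentiated deterministic utilities), so that each row can be normalized into predicted choice shares, and it stacks all household-by-time purchase occasions into the rows of one array. The function name `press` and the toy inputs are hypothetical.

```python
import numpy as np


def press(u_hat, d):
    """Sum of squared gaps between predicted shares and observed choices.

    u_hat : (n_occasions, n_brands) array of positive predicted utilities
    d     : (n_occasions, n_brands) array of 0/1 choice indicators
    """
    u_hat = np.asarray(u_hat, dtype=float)
    d = np.asarray(d, dtype=float)
    # Normalize each purchase occasion so the predicted values sum to one.
    predicted_share = u_hat / u_hat.sum(axis=1, keepdims=True)
    return float(((predicted_share - d) ** 2).sum())


# Toy example with two purchase occasions and two brands (made-up numbers).
u_hat = [[3.0, 1.0],
         [1.0, 1.0]]
d = [[1, 0],
     [0, 1]]
print(press(u_hat, d))
```

A lower value indicates predicted shares that lie closer to the observed choices, which is how the predictive fit row of Table 4 is read.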

4. Conclusions and areas of future research

The objective of this paper was to illustrate how subjective prior information, such as a modeler's or manager's intuition, can be incorporated into consumer choice models. We have derived a novel Bayesian choice model that allows modelers or managers to include intuition in an easy way. In particular, prior information about exogenous variables (rather than parameter estimates) is included in our model. Bayesian models of consumer choice have been used in the marketing literature, though predominantly to incorporate consumer heterogeneity in choice models. The need to specify the mean and variance of all prior parameters reduces their applicability for managerial decision-making. Using our approach, managers need not think in terms of prior parameters of the model, but rather in terms of changes in market share conditional on changes in the exogenous variables. It is significantly easier for managers to think in terms of observable covariates than in terms of unobservable parameter estimates.

We have provided two applications showing how our model allows modelers or managers to incorporate managerial intuition or prior knowledge into the model's priors: (i) for forecasting purposes when limited information is available in a market place, and (ii) to determine the impact on sales of changes in marketing strategies. Our results have shown that we can effectively include subjective prior information in our model and obtain superior estimates and forecasts. In the case of limited or missing information, subjective priors can be used effectively to forecast sales in regions with missing data. In addition, we show that, when estimating results for Sioux Falls, simply combining data from both regions may lead to worse estimates than our model, which uses data from Springfield combined with subjective priors from Sioux Falls. This implies that when there are limited data for a market, it may be better to use subjective priors (with a model estimated on data from different regions) than to simply combine all the data to obtain parameter estimates.

In this way, it is possible to utilize the ability of the mathematical model as well as the strength of modelers' or managerial judgments. This is especially useful when trying to determine the impact of a change in marketing strategy. A modeler or manager will be able to consider events not considered by the model. Another benefit is that these results will be more acceptable to management, as the model results are in part based on their own priors or intuition. In conclusion, the empirical analysis in this paper shows that our approach has substantial potential as a forecasting tool and as a means to determine the impact of changes in marketing mix strategies.

There are, however, several areas of future research from which our method could benefit. An important application of our model is to forecast sales by incorporating managerial knowledge or intuition as priors. Hence, future research should collect managerial judgments and compare the results against other approaches that combine statistical and managerial information (e.g., Blattberg and Hoch, 1990). Another important area of future research is to determine the best way to elicit priors from management. Revealed preference data, such as scanner panel data, cannot be used for developing normative models of managerial decision-making, as the parameters of consumer models change with the policy regime (Lucas, 1976). An alternative method (to structural equation models) for estimating policy-invariant parameters is to obtain data from many different policy regimes and then fit a statistical model to these data (Keane, 1997). Since this type of data is generally not available, experiments or managerial intuition can provide it. Managers can be asked to predict market shares for alternative policy regimes, and these predictions can be included as priors. Similarly, the results of experiments can be combined with revealed preference data. This provides a convenient way to combine stated and revealed preference data, by using stated preferences as priors in a model.
Furthermore, our model can be used to solve two problems often encountered when estimating models using scanner data: (1) the confounding of parameter estimates for variables such as advertising and price promotions, which are often observed simultaneously, and (2) the lack of variation in observed variables. For example, when households are brand loyal, or when there is a lack of coupon usage for generic products, parameters become difficult to estimate. Priors from managerial intuition or experiments can be included to solve these problems.

Finally, it is also possible to incorporate multiple priors in a model. Currently, priors incorporate market-level information; an alternative method is to obtain priors for different segments. For example, managers could determine a meaningful way to segment the market a priori (e.g., loyal consumers and switchers) and specify separate priors for these segments. Discrete mixing distributions provide an alternative way to incorporate segment-level information (Wedel and Kamakura, 1999). Furthermore, we may obtain managerial intuition from a variety of sources (different managers), and different weights may be attached to these priors. Also, managers could be asked to provide different priors for optimistic and pessimistic scenarios. This may be useful because managers may overstate (or understate) the actual impact on sales due to changes in market shares.

Acknowledgements

The authors gratefully acknowledge A.C. Nielsen for providing the data. The authors acknowledge the valuable suggestions received from Adam Finn, Ujwal Kayandé, Gary Koop, Dale Poirier, Harry Timmermans, and the participants of the University of Alberta seminar series. Funding for this research has been received from the Pearson Fellowship, the Central Research Fund at the University of Alberta and the Social Sciences and Humanities Research Council of Canada.

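Appendix A

We start with the standard multinomial logit model (McFadden, 1974), with the following likelihood function for the ith household:

\[
L(\theta^{i} \mid x_{ijt}, d_{ijt}) = \prod_{t=1}^{T_i}\prod_{j}\left[\exp(x_{ijt}'\theta_{j}^{i})\left\{\sum_{j}\exp(x_{ijt}'\theta_{j}^{i})\right\}^{-1}\right]^{d_{ijt}},
\tag{A.1}
\]

where $x_{ijt}$ is a $K \times 1$ vector of observable and unobservable characteristics of the jth brand in the tth time period, $\theta^{i}$ is a $K \times J$ matrix of parameters to be estimated for the ith individual, and

\[
d_{ijt} =
\begin{cases}
1 & \text{if household } i \text{ chooses brand } j \text{ at time } t, \\
0 & \text{otherwise.}
\end{cases}
\]

As per Koop and Poirier (1993), Eq. (A.1) can be rewritten as follows:

\[
L(\theta^{i} \mid x_{ijt}, d_{ijt}, C_{i}) = \exp(C_{i}'\theta^{i}) \times \prod_{t=1}^{T_i}\prod_{j}\left[\left\{\sum_{j}\exp(x_{ijt}'\theta_{j}^{i})\right\}^{-1}\right]^{d_{ijt}},
\tag{A.2}
\]

where N is the total number of households in the data set, $C_{i}$ is a $K \times J$ matrix of observable and unobservable exogenous variables for the ith individual, and $C_{i \cdot j} = \sum_{t} d_{ijt} x_{ijt}$. To compute the Bayesian posterior distribution, the prior distribution $g(\theta^{i})$ needs to be specified. As per Koop and Poirier (1993), we specify a conjugate prior as follows:

\[
g(\theta^{i} \mid \underline{x}_{ijt}, \underline{C}_{i}, \underline{d}_{ijt}, r_{i}) = \exp\big((r_{i}\underline{C}_{i})'\theta^{i}\big) \times \prod_{t}\prod_{j}\left\{\sum_{j}\exp(\underline{x}_{ijt}'\theta_{j}^{i})\right\}^{-\underline{d}_{ijt}},
\tag{A.3}
\]

where $\underline{x}_{ijt}$ and $\underline{C}_{i}$ are the priors, $\underline{d}_{ijt} = r_{i} d_{ijt}$, and $r_{i} \ge 0$ can be interpreted as the weight attached to the prior. The posterior distribution is obtained by combining Eqs. (A.2) and (A.3), and is given as follows:

\[
g(\theta^{i} \mid x_{ijt}, \underline{x}_{ijt}, \bar{C}_{i}, d_{ijt}, \underline{d}_{ijt}) = c\,\exp\big((\bar{C}_{i})'\theta^{i}\big) \times \prod_{t}\prod_{j}\left(\left\{\sum_{j}\exp(x_{ijt}'\theta_{j}^{i})\right\}^{-d_{ijt}}\left\{\sum_{j}\exp(\underline{x}_{ijt}'\theta_{j}^{i})\right\}^{-\underline{d}_{ijt}}\right),
\tag{A.4}
\]

where c is a constant and $\bar{C}_{i} = C_{i} + r_{i}\underline{C}_{i}$. To obtain maximum likelihood estimates for the parameters in Eq. (A.4), we first need to specify the likelihood function of the posterior for the ith individual, which is given as follows:

\[
L(\theta^{i}) = g(\theta^{i} \mid x_{ijt}, \underline{x}_{ijt}, \bar{C}_{i}, d_{ijt}, \underline{d}_{ijt}),
\tag{A.5}
\]

and the likelihood function taken over households, brands and time periods is the product of Eq. (A.5) over households (dropping the constant c):

\[
L(\theta) = \prod_{i}\left[\exp\big((\bar{C}_{i})'\theta^{i}\big)\prod_{t}\prod_{j}\left[\left\{\sum_{j}\exp(x_{ijt}'\theta_{j}^{i})\right\}^{-d_{ijt}}\left\{\sum_{j}\exp(\underline{x}_{ijt}'\theta_{j}^{i})\right\}^{-\underline{d}_{ijt}}\right]\right].
\tag{A.6}
\]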
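The structure of Eqs. (A.2)–(A.4) can be illustrated numerically: in logs, the conjugate prior behaves like an r-weighted logit log-likelihood evaluated on prior pseudo-observations, so the log-posterior is the data log-likelihood plus r times the prior term. The sketch below is a simplification under stated assumptions, not the estimation code used in the paper: it uses a single parameter vector shared across households and brands (rather than the household- and brand-specific parameters above), and the function names, array layout and simulated inputs are all hypothetical.

```python
import numpy as np


def mnl_loglik(theta, X, D):
    """Logit log-likelihood.

    theta : (K,) parameter vector (assumed common to all households and brands)
    X     : (T, J, K) attributes of J brands on T purchase occasions
    D     : (T, J) choice indicators, or fractional weights for pseudo-data
    """
    U = X @ theta                                    # (T, J) utilities
    U_max = U.max(axis=1, keepdims=True)             # stabilize the exponentials
    log_denom = U_max[:, 0] + np.log(np.exp(U - U_max).sum(axis=1))
    return float((D * (U - log_denom[:, None])).sum())


def log_posterior(theta, X, D, X_prior, D_prior, r):
    """Data log-likelihood plus r times the log-likelihood of the prior pseudo-data."""
    return mnl_loglik(theta, X, D) + r * mnl_loglik(theta, X_prior, D_prior)


# Simulated, purely illustrative inputs.
rng = np.random.default_rng(0)
T, J, K = 50, 4, 3
X = rng.normal(size=(T, J, K))
D = np.eye(J)[rng.integers(0, J, size=T)]            # one chosen brand per occasion

# Prior expressed as one pseudo-occasion with share-weighted "purchases".
X_prior = rng.normal(size=(1, J, K))
D_prior = np.array([[0.66, 0.09, 0.16, 0.09]]) * 3455

theta = np.array([0.3, -0.2, 0.5])
for r in (0.0, 1.0, 2.035):
    print(r, log_posterior(theta, X, D, X_prior, D_prior, r))
```

Setting r = 0 recovers the ordinary logit log-likelihood, and larger r pulls the maximizer toward parameter values that reproduce the prior shares, which mirrors the role of the weight described above.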

References

Allenby, G.M., Lenk, P.J., 1994. Modeling household purchase behavior with logistic normal regression. Journal of the American Statistical Association 89, 1218–1231.
Allenby, G.M., Rossi, P.E., 1999. Marketing models of consumer heterogeneity. Journal of Econometrics 89 (1–2), 57–78.
Blattberg, R.C., Hoch, S.J., 1990. Database models and managerial intuition: 50% model+50% manager. Management Science 36 (8), 887–897.
Blattberg, R.C., Wisniewski, K.J., 1989. Price-induced patterns of competition. Marketing Science 8, 291–309.
Clemen, R.T., Armstrong, J.S., 1989. Combining forecasts: a review and annotated bibliography; the end of the beginning or the beginning of the end. International Journal of Forecasting 5 (4), 559–588.
Kamakura, W., Russell, G.J., 1989. A probabilistic choice model for market segmentation and elasticity structure. Journal of Marketing Research 26, 379–390.
Keane, M.P., 1997. Current issues in discrete choice modeling. Marketing Letters 8, 307–322.
Koop, G., Poirier, D.J., 1993. Bayesian analysis of logit models. Journal of Econometrics, 323–340.
Leeflang, P.S.H., Wittink, D.R., 2000. Building models for marketing decisions: past, present and future. International Journal of Research in Marketing 17 (2–3), 105–126.
Little, J.D.C., 1970. Models and managers: the concept of decision calculus. Management Science 16, B466–B485.
Lucas, R.E. Jr., 1976. Econometric policy evaluation: a critique. In: Brunner, K., Meltzer, A.H. (Eds.), The Phillips Curve and Labor Markets. Journal of Monetary Economics (suppl), 7–33.
Mahajan, V., Wind, Y., 1988. New product forecasting models: directions for research and implementation. International Journal of Forecasting 4, 341–358.
McFadden, D., 1974. Conditional logit analysis of qualitative choice data. In: Zarembka, P. (Ed.), Frontiers in Econometrics. Academic Press, New York.
Morwitz, V.G., Schmittlein, D.C., 1998. Testing new direct marketing offerings: the interplay of management judgment and statistical model. Management Science 44 (5), 610–628.
Popkowski Leszczyc, P.T.L., 1992. Investigating the effects of unobserved heterogeneity in stochastic models of consumer choice: a hazard model approach. Unpublished Doctoral Dissertation, University of Texas at Dallas.
Popkowski Leszczyc, P.T.L., Bass, F.M., 1998. Determining the effects of observed and unobserved heterogeneity on consumer brand choice. Applied Stochastic Models and Data Analysis 14, 95–115.
Swait, J., Louviere, J., 1993. The role of the scale-parameter in the estimation and comparison of multinomial logit models. Journal of Marketing Research 30 (3), 305–314.
Tellis, G.J., 1988. The price elasticity of selective demand: a meta-analysis of econometric models of sales. Journal of Marketing Research 25 (November), 331–341.
Wedel, M., Kamakura, W., 1999. Market Segmentation: Conceptual and Methodological Foundations. Kluwer Academic Publishing, Dordrecht, Netherlands.
Wedel, M., Kamakura, W., Bockenholt, U., 2000. Marketing data, models and decisions. International Journal of Research in Marketing 17 (2–3), 203–208.
Wierenga, B., Van Bruggen, G.H., Staelin, R., 1999. The success of marketing management support systems. Marketing Science 18 (3), 385–396.
Wierenga, B., Van Bruggen, G.H., 2000. Marketing Management Support Systems: Principles, Tools, and Implementation. Kluwer Academic Publishing, Boston, MA.