Measuring Consumer, Nonlinear Brand Choice Response to Price
MAKOTO ABE
University of Tokyo
Nonlinear price responses by shoppers in brand choice decisions have important managerial and theoretical implications for designing pricing and price promotions. However, the study of these phenomena by traditional parametric techniques is not only tedious, but may lead to an incorrect confirmation of theory. To avoid these problems, I demonstrate a simplified means of detecting the possible presence of nonlinearities in consumer response with a nonparametric method. This method relaxes the usual linear-in-parameters character of the multinomial logit (MNL) model by replacing it with an additive sum of one-dimensional nonparametric functions of explanatory variables. In an application of this model to aseptic and red drink databases, I document the presence of these nonlinearities in the form of price threshold effects and a saturation effect with respect to price reductions. I further apply the model to orange juice and ground coffee scanner data by employing the framework of a previous parametric study by Kalyanaram and Little (1994). The model easily found the expected existence of weak price sensitivity around the reference price and a greater negative consumer reaction to price increases than positive reaction to price decreases. Researchers may find this nonparametric technique a useful addition to their kit of statistical tools.
Makoto Abe, University of Tokyo, Faculty of Economics, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033 JAPAN, TEL 81-3-3812-2111, x5646, FAX 81-3-3818-7082.
Journal of Retailing, Volume 74(4), pp. 541-568, ISSN: 0022-4359. Copyright © 1998 by New York University. All rights of reproduction in any form reserved.

INTRODUCTION
Understanding household brand choice decisions, and how they are affected by price and promotion, provides the packaged goods marketer with a distinct advantage. The potential exploitation of nonlinear responses, in particular, offers promise since knowledge of departures from linearity may lead to significantly more efficient marketing decisions. Previous studies have indicated that consumers' subjective price perception can result in a complex, nonlinear utility function of price. Some cognitive studies have, for example, shown the existence of a region of prices, termed the latitude of price acceptance, to which consumers are insensitive. In an experiment on purchase intention, Gupta and Cooper (1992) found a minimum depth of price discount before consumers would start responding and a saturation point beyond which further discount had little impact on purchase intentions. A threshold effect, a point at which a price change is recognized by the consumer and where utility changes abruptly, was discovered by Monroe (1973, 1990). The ability to map these twists and turns would provide enormous insight in planning efficient promotional programs, especially for packaged goods.

To identify these nonlinear relationships, researchers have used covariates in multinomial logit (MNL) models by introducing higher order terms and/or transformations. For example, Allenby and Rossi (1991) incorporated a logarithmic transformation of price in the measurement of brand utility. Tellis (1988) introduced a quadratic term for an advertising index. Unfortunately, the selection of such terms and/or their transformation must be carried out by trial and error using researcher judgment or repeated statistical testing of the model fit. Worse, when the underlying function has a more complicated shape, specifying an appropriate parametric function a priori may itself be difficult. At best, finding the correct utility specification will be a time-consuming and subjective task.
Nonparametric techniques, such as kernel and spline regression and the alternating least squares optimal scaling method, offer the potential to overcome such difficulties (Abe, 1991; Donthu and Rust, 1989; Perreault and Young, 1980; Rust, 1988). These techniques estimate a function, f, nonparametrically, given a relationship $y = f(x)$ between a dependent variable y and a vector of independent variables x. The apparently logical extension of these techniques to the brand choice context, by adapting them to the utility function in a multinomial logit model, $v = f(x)$, where v is utility, is unfortunately not feasible. Utility is a latent concept that cannot be observed directly, but instead must be inferred from actual brand choice. Though some researchers have recently suggested alternative nonparametric methods that take a nominal discrete dependent variable, large data and computational requirements tend to limit their practical value (Abe, 1995; Matzkin, 1993).

In light of these problems, the objective of this paper is twofold. I first introduce a method that can estimate a utility function of a tried-and-tested multinomial logit model nonparametrically. This is done by re-interpreting an extended formulation of the generalized additive model for matched case-control data, an approach used in medical studies (Hastie and Tibshirani, 1990). The resulting algorithm is simple and can be implemented easily on a PC. Because the method belongs to a family of generalized additive models studied extensively in statistics, its mathematical properties are well documented. The sophistication of this method, and its ease of use, suggest that it has the potential to become a valuable exploratory tool for discrete choice analysis. However, despite a close analogy between matched case-control data and discrete choice data, the author is not aware of any application to discrete choice in marketing, statistics, economics, or transportation studies.
To demonstrate the model's value, I apply this technique to several different sets of marketing data to show how one can take advantage of this approach to pricing and promotion. In one product category, I estimate a utility function that reveals the highest price a retailer can command without losing significant sales volume. In another, I show the effective depth of a deal revealed by consumer responses to temporary price reductions. My second purpose is to demonstrate the ease whereby this technique may be applied relative to earlier approaches. I do this by replicating the parametric study by Kalyanaram and Little (1994) of nonlinear aspects of consumer price responses. In the earlier study, the authors searched for latitudes of price acceptance by employing a three-piecewise-linear utility function of price in the MNL model. For different product categories, I tackle the same estimation problem by using the nonparametric approach. The viability of the nonparametric approach is shown by its ready discovery of the previously identified empirical generalizations. The article is organized as follows. First, the model used in these studies, the nonparametric MNL, is introduced and its ability to estimate nonlinear response in brand choice is established through simulation studies. I then use the model to search for nonlinearities in consumer brand choice response from scanner-panel data in two product categories, aseptic and red drinks. Managerial implications are then discussed. Next, the latitude of price acceptance and asymmetric price responses are studied in the framework of Kalyanaram and Little by use of coffee and orange juice data. The article concludes with a discussion of the advantages and limitations of the nonparametric method.
METHOD
Model Specification
The nonparametric utility specification is built on a multinomial logit (MNL) model. Among many classes of stochastic utility maximization models of discrete choice, MNL has been used extensively in studying brand choice using scanner panel data. Its analytic and computational tractability has brought it great success not only in marketing but also in other fields such as econometrics (Manski and McFadden, 1981) and transportation (Ben-Akiva and Lerman, 1985). In some commercial firms the use of MNL models to analyze scanner panel data is part of everyday operation. The choice probability of alternative j as expressed in the typical linear-in-parameters MNL model is
$$P_j = \frac{e^{v_j}}{\sum_k e^{v_k}}, \quad \text{where} \quad v_j = \sum_p \beta_p x_{jp} \tag{1}$$
and $x_{jp}$ denotes the p-th explanatory variable for alternative j. Our objective is to obtain an MNL model with a flexible utility structure such that
$$P_j = \frac{e^{v_j}}{\sum_k e^{v_k}}, \quad \text{where} \quad v_j = \sum_p f_p(x_{jp}) \tag{2}$$
and $f_p(\cdot)$ is a one-dimensional nonparametric function of the p-th covariate. Additive separability in explanatory variables, as shown in equation 2, is of particular interest to researchers for four reasons. First, it is a natural generalization of the linear specification, resulting in a tractable and interpretable model. Second, the separability allows researchers to focus only on those variables of special interest by relaxing a subset of the covariates while parametrically specifying the remaining variables to conserve degrees of freedom. Third, an interaction term can be incorporated by creating a new variable that is a product of interacting covariates and extending the p-index as in the linear model. Fourth, if several one-dimensional functions $f_p(\cdot)$ are combined in a more general, multidimensional nonparametric function $f_{pq\ldots}(\cdot, \cdot, \ldots)$, one is likely to encounter the "curse of dimensionality" (Abe, 1995; Silverman, 1986). In such conditions, an exponential increase in sample size may be required to maintain estimation accuracy as the number of dimensions increases.
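As a minimal sketch of how equations (1) and (2) relate, the following Python function computes MNL choice probabilities from any list of one-dimensional component functions. The function name, the example covariate values, and the particular component functions are illustrative assumptions, not part of the original study; passing linear functions reproduces equation (1).

```python
import math
from typing import Callable, List, Sequence


def mnl_probabilities(
    x: Sequence[Sequence[float]],
    component_functions: Sequence[Callable[[float], float]],
) -> List[float]:
    """Choice probabilities under the additive utility of equation (2).

    x[j][p] is the p-th covariate of alternative j and
    component_functions[p] plays the role of f_p.
    """
    utilities = [
        sum(f(x_jp) for f, x_jp in zip(component_functions, row)) for row in x
    ]
    # Subtract the largest utility before exponentiating; the shift
    # cancels in the ratio and avoids overflow.
    v_max = max(utilities)
    exp_v = [math.exp(v - v_max) for v in utilities]
    total = sum(exp_v)
    return [e / total for e in exp_v]


# Hypothetical data: three brands, two covariates (advertising, price).
x = [[0.2, 0.60], [0.5, 0.75], [0.8, 0.90]]

# Linear components give equation (1); the coefficients are made up.
linear = [lambda a: 1.0 * a, lambda p: -5.0 * p]
# Arbitrary nonlinear components give an instance of equation (2).
nonlinear = [lambda a: math.sqrt(a), lambda p: -5.0 * min(p, 0.65)]

p_linear = mnl_probabilities(x, linear)
p_nonlinear = mnl_probabilities(x, nonlinear)
```

The only change between the two calls is the list of component functions, which is the sense in which equation (2) generalizes equation (1).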
Generalized Additive Models
Because the proposed nonparametric model is derived from a generic formulation of the so-called generalized additive models (GAMs), I introduce the basic concept first. Generalized linear models (GLMs) can accommodate a wide range of relationships between a response variable y and a predictor index $\eta$ that is linear in parameters of explanatory variables $x_p$ ($p = 1, 2, \ldots, P$) such that $\eta(x) = \sum_p \beta_p x_p$. GLMs generalize the standard regression methodology to allow diverse types of response variable. Appropriate specification of the random component and the link function in GLMs leads to various regression models such as OLS, logistic regression, binary probit, and log-linear models. Generalized additive models (Hastie and Tibshirani, 1986, 1987) extend GLMs by replacing the linear-in-parameters predictor index $\eta$ with a sum of one-dimensional nonparametric functions of each explanatory variable so that $\eta(x) = \sum_p f_p(x_p)$. For example, the GAM for logistic regression of a binary response variable y is expressed as
$$\mu(x) \equiv E(y \mid x) = \frac{1}{1 + e^{-\eta(x)}} \tag{3}$$
where $\eta(x) = \sum_p f_p(x_p)$ and $f_p$ is a nonparametric function of the p-th explanatory variable.
Nonparametric Utility Function in an MNL Model
Examination of model (2) might suggest that the nonparametric logistic regression of (3) can be adapted to MNL by substituting the utility $v_j$ for the predictor index $\eta$. However, a straightforward extension of equation (3) to a multinomial setting is not possible. This can be seen by dividing the numerator and denominator of the nonparametric specification of MNL in (2) by $e^{v_j}$:
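$$P_j = \frac{1}{1 + e^{-\eta(x)}}, \quad \text{where} \quad \eta(x) = \sum_p f_p(x_{jp}) - \log\left\{ \sum_{k \neq j} \exp\left[ \sum_p f_p(x_{kp}) \right] \right\} \tag{4}$$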
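The rearrangement leading to equation (4) can be checked numerically. The short Python sketch below compares the MNL probability of equation (2) with the binary-logit form of equation (4) for a set of made-up utilities; the function names and the utility values are illustrative assumptions.

```python
import math


def mnl_probability(utilities, j):
    """P_j as in equation (2): exp(v_j) / sum_k exp(v_k)."""
    return math.exp(utilities[j]) / sum(math.exp(v) for v in utilities)


def binary_form_probability(utilities, j):
    """P_j as 1 / (1 + exp(-eta)), with eta defined as in equation (4)."""
    others = sum(math.exp(v) for k, v in enumerate(utilities) if k != j)
    eta = utilities[j] - math.log(others)
    return 1.0 / (1.0 + math.exp(-eta))


utilities = [0.3, -1.2, 0.8]  # hypothetical v_k for three brands
differences = [
    abs(mnl_probability(utilities, j) - binary_form_probability(utilities, j))
    for j in range(len(utilities))
]
```

The two forms agree to floating-point precision, while the log-sum term over the competing alternatives is what prevents the index from being additive in the individual component functions.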
Equation 4 no longer conforms to the binary logit framework of the GAM because the predictor index $\eta(x)$ is not additive in functions of each covariate, $f_p$. Nevertheless, the proposed nonparametric MNL model of (2) can be derived through the generic formulation of GAMs using a penalized likelihood function (Hastie and Tibshirani, 1990, ch. 6). 1 Because the details are rather technical, involving materials from several chapters, I refer interested readers to the 1990 monograph for such information.

Using the penalized likelihood approach, Hastie and Tibshirani (1990, ch. 8) extend the generalized additive models to several special settings. 2 One of them is for matched case-control data often used in medical studies. The data contain information about a subject who carries the disease (case) as well as those who do not (control), allowing epidemiologists to infer causes of the disease (Breslow and Day, 1980). By interpreting the case patient as the brand chosen, and the control subjects as the brands available but not chosen by the consumer, this medical framework can be adapted to discrete choice data. Modification of Hastie and Tibshirani's formulation results in our nonparametric MNL model of (2). The details, including the algorithm, are shown in the appendix.

Mathematical properties of GAMs have been studied extensively and are well documented. All of them apply to this nonparametric MNL model as well, because it belongs to the extended family of GAMs. When the dependent variable is binary, the algorithm reduces to that of the logistic regression GAM of Hastie and Tibshirani, although the derivations for the binary and multinomial cases are quite different. This fact also supports the validity of the modification to their formulation. It is possible to conduct an approximate statistical test on GAMs (Hastie and Tibshirani, 1990, sec. 6.8) and thus on the proposed nonparametric MNL model.
Because the proposed model is emphasized here as an exploratory tool, the details of the test are left to Abe (1998).
Illustration of the Nonparametric MNL by Simulation
Simulation studies were conducted with various nonlinear utility specifications to establish the validity of the nonparametric MNL. Four sets of simulated brand choice data consisting of 988 choice incidents were generated according to the multinomial logit process
FIGURE 1 [Plot placeholder: two panels, with horizontal axes ADV and PRICE.]
Mean of the 100 estimates and its 95 percent confidence band along with the true underlying function for specification 1. The underlying specification is mostly contained within the confidence band, thereby supporting the nonlinear structure. The 95 percent confidence band is computed as two times the standard errors, which are obtained from 500 bootstrap samples of the first dataset.
[Plot placeholder for Figure 2: two panels, with horizontal axes ADV and PRICE.]
FIGURE 2 Mean of the 100 estimates and its 95 percent confidence band along with the true underlying function for specification 2. The underlying specification is mostly contained within the confidence band, thereby supporting the nonlinear structure. The 95 percent confidence band is computed as two times the standard errors, which are obtained from 500 bootstrap samples of the first dataset.
[Plot placeholder for Figure 3: two panels, with horizontal axes ADV and PRICE.]
FIGURE 3
Mean of the 100 estimates and its 95% confidence band along with the true underlying function for specification 3. The underlying specification is mostly contained within the confidence band, thereby supporting the nonlinear structure. The 95% confidence band is computed as two times the standard errors, which are obtained from 500 bootstrap samples of the first dataset.
[Plot placeholder for Figure 4: two panels, with horizontal axes ADV and PRICE.]
FIGURE 4
Mean of the 100 estimates and its 95% confidence band along with the true underlying function for specification 4. The underlying specification is mostly contained within the confidence band, thereby supporting the nonlinear structure. The 95% confidence band is computed as two times the standard errors, which are obtained from 500 bootstrap samples of the first dataset.
with a nonlinear utility function of equation (2). The utility function included two hypothetical variables (referred to as advertising and price) that took prespecified functional forms. The following four specifications were selected on the basis of proposed hypotheses on nonlinear price response phenomena in the literature. They are marked by the dashed line in Figures 1 through 4. The task is to assess the ability of the nonparametric MNL to recover linearity, concavity, convexity, piecewise linearity, and maxima and minima. To make the recovery of the advertising and price functions more challenging, two binary indicator variables (referred to as feature and display) were added to the utility function.
Specification 1:
• Higher advertising increases utility at a diminishing rate.
• Higher prices reduce utility at a diminishing rate.
Specification 2:
• Advertising exhibits a maximum at 0.5 and is symmetric about 0.5.
• Price exhibits a minimum at 0.75 and is symmetric about 0.75.
Specification 3:
• Advertising increases linearly.
• Price exhibits a three-piecewise-linear response (slope = -5 for price < 0.65, 0 for 0.65 < price < 0.85, -7 for price > 0.85).
Specification 4:
• Advertising exhibits a saturation effect at 0.5.
• Price exhibits a three-piecewise-linear response (slope = -5 for price < 0.65, 0 for 0.65 < price < 0.85, -7 for price > 0.85).
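To make the data-generating process concrete, here is a hedged Python sketch of how choice incidents for specification 3 could be simulated. The price slopes and kink points come from the specification; the number of brands, the advertising slope, the covariate ranges, the random seed, and the omission of the feature and display indicators are simplifying assumptions made only for illustration.

```python
import math
import random


def price_utility(price):
    """Three-piecewise-linear price response of specifications 3 and 4.

    Slopes: -5 below 0.65, 0 between 0.65 and 0.85, -7 above 0.85.
    The curve is anchored at zero on the flat segment (an assumption;
    a constant shift does not change choice probabilities).
    """
    if price < 0.65:
        return -5.0 * (price - 0.65)
    if price <= 0.85:
        return 0.0
    return -7.0 * (price - 0.85)


def simulate_choice(rng, n_brands=3, adv_slope=1.0):
    """Draw covariates for one choice incident and sample the chosen brand."""
    adv = [rng.uniform(0.0, 1.0) for _ in range(n_brands)]
    price = [rng.uniform(0.5, 0.95) for _ in range(n_brands)]
    utilities = [adv_slope * a + price_utility(p) for a, p in zip(adv, price)]
    # MNL choice: sample a brand with probability proportional to exp(v).
    weights = [math.exp(v) for v in utilities]
    chosen = rng.choices(range(n_brands), weights=weights, k=1)[0]
    return adv, price, chosen


rng = random.Random(0)  # fixed seed so the sketch is reproducible
data = [simulate_choice(rng) for _ in range(988)]
```

Repeating the last two lines with different seeds would give the kind of replicated datasets over which estimates can be averaged.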
For each specification, the simulation was repeated 100 times to reduce sampling variation. Each estimation took about 30 seconds on a 166 MHz Pentium machine running MATLAB software. The solid line in Figures 1 through 4 shows the average of the 100 estimates along with the 95 percent confidence band obtained by the bootstrap method (Efron, 1981). For each specification, the underlying shape was recovered well, and the true function was contained mostly within the confidence band. However, all estimates exhibited some difficulty fitting the true function at the boundaries due to asymmetry and sparsity in the distribution of data points, as evidenced by the widening confidence intervals, a common problem in nonparametric regression (Abe, 1991; Silverman, 1986). For specification 1, the estimates correctly reconstructed the diminishing rates characterized by the concavity and convexity. In Figure 2, both maximum and minimum were estimated correctly. The kinks in specifications 3 and 4 were estimated as smooth bends. For specifications 3 and 4, asymmetric negative slopes for price below $0.65 (-5) and above $0.85 (-7) can be observed from the estimates. The saturation effect of advertising in specification 4 was also captured. To examine overfitting, often associated with nonparametric models, I evaluated the fit of the estimates with both calibration and holdout samples. In addition, each model's prediction of brand choice in terms of log likelihood was compared with the usual linear as well as the true utility specifications. These specifications represent the lower and upper bounds of fit, respectively. Table 1 reports the mean log likelihood values of 100 replications along with the standard errors. For calibration, the fit of the nonparametric MNL was very close to that of the true specification and in some cases even beat the true specification
TABLE 1
Mean Log Likelihood Values of 100 Replications (standard errors in parentheses)

Model            Sample       True specification  Nonparametric       Linear
Specification 1  Calibration  -896.7 (1.75)       -898.1 (1.68)   >   -913.9 (1.74)
                 Holdout      -903.0 (2.18)       -903.2 (1.64)   >   -914.0 (1.53)
Specification 2  Calibration  -877.1 (1.66)       -878.0 (1.68)   >   -1064.9 (1.57)
                 Holdout      -877.4 (1.84)       -896.8 (1.53)   >   -1066.1 (1.60)
Specification 3  Calibration  -901.8 (1.78)       -900.2 (1.74)   >   -907.5 (1.74)
                 Holdout      -903.3 (1.58)       -905.5 (1.62)   >   -909.3 (1.64)
Specification 4  Calibration  -901.4 (1.70)       -900.1 (1.70)   >   -935.8 (1.58)
                 Holdout      -902.6 (1.55)       -908.4 (1.85)   >   -938.6 (1.44)
by a slight margin. This is due to the highly flexible nature of nonparametric methods, which can conform to random noise in the data and thereby inflate the apparent fit. The proposed model also performed well in the holdout sample, indicating that overfitting was not a problem. Note, in particular, the poor performance of the linear model for specifications 2 and 4, where nonlinearity of the underlying utility function is severe. Under such circumstances, the improvement in fit by the nonparametric MNL model is especially large.
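The log likelihood values compared above are sums of log choice probabilities over choice incidents. The following Python sketch shows one way to compute that quantity from fitted utilities; the function name and the tiny example inputs are hypothetical. A holdout evaluation would apply the same function to incidents that were not used for estimation.

```python
import math


def mnl_log_likelihood(utilities_per_incident, chosen_per_incident):
    """Sum over incidents of log P(chosen brand) under the MNL model."""
    total = 0.0
    for utilities, chosen in zip(utilities_per_incident, chosen_per_incident):
        # Log of the denominator, computed with a max shift for stability.
        v_max = max(utilities)
        log_denominator = v_max + math.log(
            sum(math.exp(v - v_max) for v in utilities)
        )
        total += utilities[chosen] - log_denominator
    return total


# Hypothetical fitted utilities for two incidents with three brands each.
example_ll = mnl_log_likelihood([[0.0, 0.0, 0.0], [1.0, 0.0, -1.0]], [2, 0])
```

Values closer to zero indicate better fit, which is why the linear model's more negative values in Table 1 signal a poorer fit.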
APPLICATION TO MARKET DATA
This section reports an application to scanner panel data on actual consumer brand choices in two product categories, aseptic and red drinks, and considers marketing implications. Because the purpose of the study is exploratory, the entire sample was used for estimation to obtain the maximum degrees of freedom.3
The Aseptic Drink Data
The scanner panel data provided by Information Resources Inc., a Chicago-based marketing research firm, contained three brands of aseptic drinks that came in a package of three single-serve paper cartons. The data represented 988 purchases made by 143 households. Explanatory variables for each brand were price and two promotional variables, feature and display at time of purchase. The latter two variables were binary indicators of whether the brand was promoted as a featured item in store circulars (feature) and by special in-store display (display), respectively. In addition, I incorporated a household's brand preference, manifested through previous purchase history, as a household-specific covariate. This variable, often characterized as brand loyalty and first introduced by Guadagni and Little (1983), accounts for household heterogeneity as well as any dynamic change in household brand preference. The carry-over constant was estimated to be approximately 0.8 by MLE (Fader, Lattin, and Little, 1992). Use of households' past purchase behavior to account for household heterogeneity, as in the loyalty variable, is useful in packaged goods because previous studies show that past behavior is a strong predictor, much stronger than, say, demographic information.

In the linear MNL model the price term was not statistically significant, whereas the nonparametric function of price was significant (t=3.62) in the nonparametric MNL model. Likewise, the log likelihood value increased substantially from -468.85 to -444.37 when the linear specification was relaxed. The two promotion coefficients and their t-values remained similar in both models. Figure 5 shows the estimated nonparametric utility functions of loyalty and price. With feature and/or display promotion, these functions would be shifted by the magnitude of the coefficients (feature = 0.58, display = 0.72).

FIGURE 5 [Plot placeholder: two panels, "Utility Component explained by LOYALTY" (smoothing constant = 0.2) and "Utility Component explained by PRICE" (smoothing constant = 0.7).]
Estimated nonparametric functions of loyalty and price for aseptic drinks data. The loyalty function seems to suggest a three-piecewise-linear function. The price function has a flat region (<0.8 cent/oz) and a linear downward-sloping region (>0.8 cent/oz) where consumers are insensitive and linearly sensitive to price, respectively.

The loyalty function suggests a piecewise linear shape: steep slopes at high and low loyalty values (above 0.85 and below 0.20), and a relatively flat slope in between.4 Interpretation of the loyalty variable is tricky because it captures much of the confounding residual effects of the model (Allenby and Rossi, 1991; Fader and Lattin, 1993). 5

In contrast, the price variable provides straightforward managerial insights. The price function may be approximated by two linear utility segments: an almost flat region for prices below 80 cents and a downward-sloping region with a slope of -7.0 for prices above 80 cents. The result was confirmed by the standard linear-in-parameters MNL with a piecewise linear price function for the two segments separated at 80 cents. The fit in terms of the log likelihood value of -448.15 was substantially better than that of the linear MNL (-468.85). Although the slope was significant (-6.9, t=-3.7) for the segment above 80 cents, for the segment below 80 cents it was not (1.65, t=1.0). Note, however, that it is not straightforward to construct a piecewise specification without knowing the number and the location of the kinks, information that is readily obtained from the nonparametric MNL (i.e., a single kink at 80 cents). The finding indicates that panelist utility is fairly constant over the price range up to 80 cents, but begins to decrease linearly beyond that point. Such nonlinearity is consistent with many cognitive studies that confirm the latitude of price acceptance and a threshold effect (Gupta and Cooper, 1992; Kalyanaram and Little, 1994; Monroe, 1973, 1990). The manager in this case may not want to price much above 80 cents unless there is some compelling reason for the brand to take a premium price.
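Once the kink location is known, the confirmatory piecewise-linear price term can be built from two derived variables that enter a standard linear-in-parameters MNL. The Python sketch below shows one common hinge construction; the exact coding used in the study is not stated, so the function names and this particular parameterization are assumptions. The default slopes are the estimates reported above (1.65 below the kink, -6.9 above it), with price expressed in dollars.

```python
def price_segments(price, kink=0.80):
    """Split price into the part up to the kink and the excess above it."""
    below = min(price, kink)
    above = max(price - kink, 0.0)
    return below, above


def piecewise_price_utility(price, slope_below=1.65, slope_above=-6.9, kink=0.80):
    """Continuous two-segment price utility built from the hinge variables."""
    below, above = price_segments(price, kink)
    return slope_below * below + slope_above * above


# Utility change from raising price by 10 cents on either side of the kink.
change_below = piecewise_price_utility(0.75) - piecewise_price_utility(0.65)
change_above = piecewise_price_utility(0.90) - piecewise_price_utility(0.80)
```

Because each hinge variable receives its own coefficient, the two slopes can be tested separately, as in the t-values quoted above.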
The Red Drink Data
The database contained 594 panel purchase records for the red drink category, which included cranberry drinks and blended cranberry drinks such as cranberry apple and cranberry grape. There were six brand-size combinations in the database: Brand O cranberry cocktail, 32oz (18.0 percent share), 48oz (20.0 percent) and 64oz (21.0 percent), Brand O cranberry apple 48oz (12.6 percent), Brand O cranberry low-calorie 48 oz (13.6 percent), and private-label cranberry cocktail 48oz (14.7 percent). Different sizes of the same brand were considered to be distinct alternatives, because their unit prices (cents/ounce) were different and the promotions were size specific. Explanatory variables included were brand-size loyalty, regular price (cents/ounce), price-cut discount (cents/ounce), and two binary promotion indicators, feature and display.

The estimates show that the log likelihood value for the nonparametric model increased to -540.26 in comparison to -545.02 for the linear model. Figure 6 shows the estimated nonparametric utility functions of loyalty, regular price, and discount.

FIGURE 6 [Plot placeholder: three panels, "Utility Component explained by LOYALTY", "Utility Component explained by REGPRICE" (smoothing constant = 0.6), and "Utility Component explained by DISCOUNT" (smoothing constant = 0.6).]
Estimated nonparametric functions of loyalty, regular price and discount for red drinks data. Loyalty and regular price (regprice) exhibit linear functions, whereas discount seems to have three linear segments: two flat regions for less than 0.4 cent/oz and more than 1.1 cents/oz and a linear function between those two regions.

Both loyalty and regular price appear linear, whereas discount has an interesting shape indicating the threshold and saturation effects from price promotions.6 As seen from the flat regions, a discount of less than 0.4 cent/ounce, and more than 1.1 cents/ounce, had almost no effect on consumers. A price cut, on the other hand, was effective between 0.4 and 1.1 cents/ounce. In terms of consumer behavior, if a discount is not deep enough, consumers may not notice. However, price cuts beyond a certain level may not attract additional buyers and would result in lower profits to retailers from a reduced margin. The result implies that consumer response to price deals is S-shaped for these data, exhibiting the threshold and saturation effects that are supported by some cognitive studies (Blattberg, Briesch, and Fox, 1995; Gupta and Cooper, 1992).

Managerially, a discount amount of 1.1 cents/ounce (which corresponds to about 40 cents, 50 cents, and 70 cents for 36 oz, 48 oz, and 64 oz sizes, respectively) would be most effective as a temporary price reduction. The proposed model therefore provides retailers with valuable guidance for their pricing decisions by suggesting the efficient amount of price cut for their promotions. This nonlinear result was supported by the fact that replacing both loyalty and regular price with linear terms and re-estimating the model with a single nonparametric function of discount did not change its shape. Further, the three-segment piecewise-linear specification resulted in a significant slope for discount between 0.4 and 1.1 cents/ounce (1.72, t=2.2) and, as anticipated, nonsignificant slopes below (-0.53, t=0.5) and above (0.45, t=1.2) the range.
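The threshold and saturation pattern can be written as a simple clamped function. The Python sketch below is an idealized illustration: it uses the reported break points (0.4 and 1.1 cents/ounce) and the reported middle slope (1.72), and it treats the two nonsignificant outer slopes as exactly zero, which is an assumption. The helper that converts a per-ounce discount into cents per package is likewise only a convenience for illustration.

```python
def discount_utility(discount, threshold=0.4, saturation=1.1, slope=1.72):
    """Flat below the threshold, linear in between, flat above saturation.

    discount is in cents per ounce; outer slopes are set to zero (assumed).
    """
    effective = min(max(discount, threshold), saturation)
    return slope * (effective - threshold)


def deal_in_cents(discount_per_ounce, size_ounces):
    """Convert a per-ounce discount into a total price cut for one package."""
    return discount_per_ounce * size_ounces


# Utility gain at the saturation point and the matching deal on a 48 oz size.
max_gain = discount_utility(1.1)
deal_48oz = deal_in_cents(1.1, 48)
```

Under this idealization, any discount deeper than the saturation point yields the same utility gain as a discount exactly at that point, so the extra margin given up buys no additional response.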
INVESTIGATION OF THE LATITUDE OF PRICE ACCEPTANCE
Understanding the price response of consumers' brand choice is not only valuable to managers in pricing and price promotion, but also of considerable interest to researchers in consumer behavior. In particular, reference price, latitude of price acceptance, and asymmetric price response effects have been examined in both laboratory and field studies.

Based on Helson's (1964) adaptation-level theory, a reference price is a consumer's representation of the prices at which a brand had formerly been sold. The consumer employs this perception as a basis for evaluation in current shopping. Though the precise definitions vary by researcher, the existence of reference price has been supported by empirical studies (Hardie, Johnson, and Fader, 1993; Kalwani et al., 1990; Kalwani and Yim, 1992; Krishnamurthi, Mazumdar, and Raj, 1992; Lattin and Bucklin, 1989; Putler, 1992; Winer, 1986).

Latitude of price acceptance refers to the presence of a region of price insensitivity around the reference price. Price variation within this region is not "noticed" by a consumer. Only prices either above or below the region affect her behavior (Emery, 1969; Mazumdar and Jun, 1992; Monroe, 1971). A cognitive concept supporting the phenomenon is the assimilation-contrast theory (Sherif, Taub, and Hovland, 1958; Sherif, 1963).

Asymmetric price response means that a consumer's negative response to price increases is stronger than her positive response to price decreases. Such asymmetric response is supported by empirical studies (Hardie et al., 1993; Kalwani et al., 1990; Kalwani and Yim, 1992; Mayhew and Winer, 1992; Putler, 1992). The underlying cognitive mechanism is based on prospect theory (Kahneman and Tversky, 1979).

FIGURE 7 [Plot placeholder: response curve against price, with the reference price, the latitude of acceptance, and the loss region marked.]
Nonlinear price response effects. The response curve is characterized by three linear segments corresponding to gain, latitude of acceptance, and loss from low to high price.

The essence of these price response effects is depicted graphically in Figure 7, where the vertical axis plots a consumer's utility against price. The flat response curve in the middle around the reference price implies the latitude of price acceptance. Asymmetric price response predicts that the slope is steeper above the region (loss to consumers) than below the region (gain to consumers). Kalyanaram and Little (1994) estimated such a piecewise-linear price response curve from scanner panel data in two drink categories using a multinomial logit (MNL) model. Estimation of the slope for each of the three segments is straightforward once the boundary of each region is known. However, simultaneous estimation of the boundaries and slopes cannot be achieved with a standard MNL model. Kalyanaram and Little overcame this difficulty by a grid-like search procedure. Here, slopes were estimated for various widths of latitude of acceptance that were symmetrically located about the reference price. The appropriate width was inferred at the point where the slope of the middle segment started to change from zero to negative.
The largest width that caused the slope of the middle segment to stay at zero was chosen as the region of latitude of acceptance.

My approach estimates the price response function of an MNL model directly with the nonparametric MNL. We know, from the simulation study of specifications 3 and 4, that the proposed model can recover a shape like Figure 7 if this is the underlying response curve. Use of the nonparametric method eliminates the tedious grid search as well as the
symmetric location assumption. It may be likened to turning on a flashlight in a dark tunnel instead of relying on compass readings at a finite number of locations. One can literally see the shape of the entire price response function rather than inferring it from coefficient magnitudes. This ensures that no other unknown phenomena or outliers are driving the estimation results.
Orange Juice Data

The database comprised 1,868 purchase records of 77 panelists on refrigerated orange juice in a 64-oz container during 78 weeks starting in mid-1983. Six brands were included: regional brand (15.4 percent, $1.67), Citrus Hills (32.9 percent, $1.83), Minute Maid (22.5 percent, $1.92), private-label (8.8 percent, $1.41), Tropicana Regular (13.7 percent, $1.71), and Tropicana Premium (3.9 percent, $2.30). Market shares and average prices are shown in parentheses. Because the study investigates price reference formation, we focused on buyers who had purchased at least ten times during the period. The first 26 weeks constituted the initialization period for several explanatory variables. Since this is an exploratory study, no hold-out sample was set aside. Data from the remaining 52 weeks were used for estimation to attain the maximum sample size.

Explanatory variables in the model closely followed those used by Kalyanaram and Little to facilitate a comparison of the nonparametric MNL with their parametric approach. They were brand loyalty, feature, reference price, price deviation, and brand dummies. Though there are different approaches in the literature to operationalize reference price and latitude of price acceptance, we adopted the exponential smoothing definition used by Kalyanaram and Little to ensure comparability. This definition captures the essential feature of the underlying consumer process of adaptive expectation while accounting for consumer heterogeneity. It also has been shown to work well by other researchers. The reference price of brand k for consumer i at the t-th purchase, \(RP_{ik}(t)\), is

\[ RP_{ik}(t) = \alpha\, RP_{ik}(t-1) + (1-\alpha)\, P_{ik}(t-1) \]

where \(P_{ik}(t)\) is the actual price observed by consumer (household) i for brand k at the t-th purchase and \(\alpha\) is a carry-over constant. \(\alpha\) was determined to be 0.81 from the data by the MLE method.
Following Kalyanaram and Little, the deviation between the actual and reference prices of brand k for consumer i at the t-th purchase, \(d_{ik}(t)\), is defined as

\[ d_{ik}(t) = \frac{P_{ik}(t) - RP_{ik}(t)}{S_{ik}(t)} \]

where \(S_{ik}(t)\) is the variability of price of brand k for consumer i at the t-th purchase. \(S_{ik}(t)\) is an exponentially smoothed, standard-deviation-like measure of the difference between the actual and reference prices, defined as

\[ [S_{ik}(t)]^2 = \gamma\, [S_{ik}(t-1)]^2 + (1-\gamma)\, [P_{ik}(t-1) - RP_{ik}(t-1)]^2 \]

FIGURE 8. Estimated nonparametric function of the standardized price deviation for orange juice (vertical axis: utility component explained by DIFPRICE; horizontal axis: DIFPRICE). The three linear segments can be seen in the regions of gain (d < -1.8), latitude of acceptance (-1.8 < d < 1.8), and loss (d > 1.8). The order of the absolute magnitude of the slopes, loss > gain > midrange, is consistent with the behavioral theory.

where \(\gamma\) is a carry-over constant, which is estimated to be 0.80 by MLE as before. Thus, \(d_{ik}(t)\) is a standardized measure of the deviation of the actual price from the reference price after accounting for the difference in price variability among brands, consumers, and purchase occasions. According to adaptation-level theory, consumers judge the current price deviation against their accustomed norm for the brand's price variability. A positive \(d_{ik}(t)\) indicates that the brand's price is higher than the consumer expected at the purchase occasion, implying a loss in the consumer's mind. Alternatively, a negative \(d_{ik}(t)\) corresponds to a gain to the consumer because the brand's price is lower than expected.

For comparison with the study of Kalyanaram and Little, we report the result of the nonparametric model in which price deviation is the only explanatory variable relaxed nonparametrically. When the other continuous variables (brand loyalty and reference price) were relaxed nonparametrically, they exhibited nearly straight lines and the result for the price deviation did not change qualitatively. Consequently, the remaining explanatory variables were specified linearly.
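
The reference price and the standardized deviation defined above amount to two exponential-smoothing recursions. The following Python sketch is illustrative only and is not the code used in the study: the function name `price_deviation_series`, the starting values `rp0` and `s0`, and the example prices are assumptions introduced here, whereas the carry-over constants 0.81 and 0.80 are the estimates reported in the text.

```python
import math


def price_deviation_series(prices, alpha=0.81, gamma=0.80, rp0=None, s0=0.10):
    """Return (RP, S, d) for one consumer-brand price history.

    prices : observed prices P(0), P(1), ... at successive purchase occasions
    alpha  : carry-over constant of the reference price (0.81 in the text)
    gamma  : carry-over constant of the variability measure (0.80 in the text)
    rp0    : starting reference price (assumption: first observed price)
    s0     : starting variability, must be positive (assumed value)
    """
    if not prices:
        raise ValueError("prices must contain at least one observation")
    rp = [prices[0] if rp0 is None else rp0]
    s2 = [s0 ** 2]
    for t in range(1, len(prices)):
        gap_prev = prices[t - 1] - rp[t - 1]          # P(t-1) - RP(t-1)
        # RP(t) = alpha * RP(t-1) + (1 - alpha) * P(t-1)
        rp.append(alpha * rp[t - 1] + (1 - alpha) * prices[t - 1])
        # S(t)^2 = gamma * S(t-1)^2 + (1 - gamma) * [P(t-1) - RP(t-1)]^2
        s2.append(gamma * s2[t - 1] + (1 - gamma) * gap_prev ** 2)
    s = [math.sqrt(v) for v in s2]
    # d(t) = [P(t) - RP(t)] / S(t)
    d = [(p - r) / sd for p, r, sd in zip(prices, rp, s)]
    return rp, s, d


if __name__ == "__main__":
    # Hypothetical history: a stable price, one deep discount, then a return.
    rp, s, d = price_deviation_series([1.80, 1.80, 1.50, 1.80])
    for t, (r, sd, dev) in enumerate(zip(rp, s, d)):
        print(t, round(r, 3), round(sd, 3), round(dev, 3))
```

In the study the first 26 weeks serve as an initialization period for these variables; in the sketch that role is played by the hypothetical starting values. A discount produces a negative deviation (gain), and the return to the regular price then registers as a positive deviation (loss) because the reference price has adapted downward.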
Figure 8 shows the estimated nonparametric function for price deviation. Note its similarity to Figures 3 and 4 from the simulation study. As in the simulation, we should also overlook the boundary effect of flattening curvature at the extreme end-points for price deviation near +3 due to the data asymmetry and sparsity. The three regions can be seen for deviations of less than about -1.8, between -1.8 and 1.8, and greater than 1.8. Although the slope in the region of latitude of price acceptance is not quite flat, it is less steep than in the other two regions. This finding is consistent with the simulation result. Visual comparison of the slopes between the loss and gain regions indicates that the former is slightly steeper. The log likelihood value has increased from -917.1 for the linear price specification to -902.3 for the nonparametric specification, whereas the coefficients and their t-values for loyalty, feature, and reference price remained similar (Note 7).

As graphically shown in Figure 8, the nonparametric MNL estimates the latitude width to be 3.6 (1.8 on each side). Once this region is found visually, one can return to the parametric approach by defining a piecewise-linear function with the following variables:

\[ m = \begin{cases} d & \text{if } -1.8 < d < 1.8, \\ 0 & \text{otherwise;} \end{cases} \]

\[ l = \begin{cases} d & \text{if } d > 1.8, \\ 0 & \text{otherwise;} \end{cases} \]

\[ g = \begin{cases} d & \text{if } d < -1.8, \\ 0 & \text{otherwise;} \end{cases} \]
where \(d = (P - RP)/S\) is the standardized price deviation as before. Table 2 reports the result of the linear MNL model with these covariates. The coefficient for covariate m is statistically significant, which is consistent with the nonzero midrange slope seen in Figure 8. Thus, we do not observe the flat response function for m suggested by latitude of price acceptance. Nevertheless, the order of the absolute magnitude of the slopes, loss > gain > midrange, is consistent with the behavioral theory.
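
The construction of the three covariates from the standardized deviation can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimation code of the study: the function names `segment_covariates` and `price_utility` are hypothetical, the default boundaries are the orange juice values read from Figure 8, and the coefficients in the usage lines are those reported for m, l, and g in Table 2.

```python
def segment_covariates(d, lower=-1.8, upper=1.8):
    """Return (m, l, g) for a standardized price deviation d.

    lower and upper are the boundaries of the latitude of acceptance
    (-1.8 and 1.8 for orange juice in the text).
    """
    m = d if lower < d < upper else 0.0   # midrange: latitude of acceptance
    l = d if d > upper else 0.0           # loss: price well above reference
    g = d if d < lower else 0.0           # gain: price well below reference
    return m, l, g


def price_utility(d, coef_m, coef_l, coef_g, lower=-1.8, upper=1.8):
    """Utility contribution of the price deviation in the three-segment MNL."""
    m, l, g = segment_covariates(d, lower, upper)
    return coef_m * m + coef_l * l + coef_g * g


if __name__ == "__main__":
    # Slopes for m, l, and g as reported in Table 2 (orange juice).
    for d in (-2.5, 0.5, 2.0):
        print(d, segment_covariates(d),
              round(price_utility(d, -0.365, -0.587, -0.456), 3))
```

Passing `lower` and `upper` explicitly allows an asymmetric latitude, so the same helper applies when the two boundaries differ in magnitude. Exactly one of the three covariates is nonzero for any deviation strictly inside a region, which is what lets a standard linear MNL estimate a separate slope per region.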
Ground Coffee Data

The database contained 3,776 purchase records of 167 panelists for regular, ground coffee during 65 weeks in late 1979. The data included six brand sizes, one and three pound

TABLE 2
Parameter Estimate of Three-Linear-Segment MNL for Orange Juice

variable              coeff.    t-stat
l                     -0.587     -5.57
m                     -0.365     -5.13
g                     -0.456     -3.30
REFERENCE PRICE       -1.649     -3.71
LOYALTY                3.812     26.18
FEATURE                0.718      5.51
Citrus Hills           0.675      4.41
Minute Maid            0.577      3.16
Private-label          0.298      1.10
Tropicana Regular      0.263      1.74
Tropicana Premium      0.404      1.36

Log likelihood = -908.7


FIGURE 9. Estimated nonparametric function of the standardized price deviation for ground coffee (vertical axis: utility component explained by DIFPRICE; horizontal axis: DIFPRICE). The three linear segments can be seen in the regions of gain (d < -1.2), latitude of acceptance (-1.2 < d < 0.8), and loss (d > 0.8). The order of the absolute magnitude of the slopes, loss > gain > midrange, is consistent with the behavioral theory.

sizes, for three brands: Brand A, 1-lb. (20.3 percent, $2.98), Brand A, 3-lb. (6.2 percent, $2.95), Brand B, 1-lb. (36.9 percent, $3.11), Brand B, 3-lb. (10.1 percent, $3.15), Brand C, 1-lb. (18.9 percent, $3.01), and Brand C, 3-lb. (7.7 percent, $3.03). Unit shares and average prices per pound are again shown in parentheses. As before, all panelists had purchased at least ten times during the period. The first 26 weeks constituted the initialization period and the remaining 39 weeks were used for estimation. Explanatory variables were defined in the same manner as for the orange juice data: brand-size loyalty, feature, reference price, price deviation, and brand dummies.

Figure 9 shows the estimated nonparametric function of price deviation. The shape closely resembles that of orange juice and the simulation estimates for specifications 3 and 4. The three regions for price deviation are less than -1.2, between -1.2 and 0.8, and greater than 0.8. Though the slope in the region of latitude of price acceptance is again not quite flat, it is less steep than in the other two regions. Visual comparison of the slopes between the loss and gain regions indicates that the former is slightly steeper. The log likelihood value has increased from -2113.5 for the linear specification to -2075.3 for the nonparametric

TABLE 3
Parameter Estimate of Three-Linear-Segment MNL for Ground Coffee

variable              coeff.    t-stat
l                     -0.641     -6.78
m                     -0.250     -2.45
g                     -0.554     -4.38
REFERENCE PRICE       -0.250     -0.60
LOYALTY                4.007     36.17
FEATURE                2.059     25.73
Brand A 3-lb.         -0.983     -7.59
Brand B 1-lb.          0.266      2.65
Brand B 3-lb.         -0.233     -1.97
Brand C 1-lb.          0.102      1.22
Brand C 3-lb.         -0.682     -5.93

Log likelihood = -2107.1

specification, while the coefficients and their t-values for loyalty, feature, and reference price remained stable as before (Note 8).

For the coffee data, the width of the latitude of price acceptance is 2.0, and it is shifted slightly such that the threshold at which consumers notice a price change is smaller for loss (0.8) than for gain (1.2), as seen from Figure 9. The finding implies that consumers recognize a price increase more readily than a price decrease. Based on that location and width, slopes for the three linear segments were estimated as before with the following variables:

\[ m = \begin{cases} d & \text{if } -1.2 < d < 0.8, \\ 0 & \text{otherwise;} \end{cases} \]

\[ l = \begin{cases} d & \text{if } d > 0.8, \\ 0 & \text{otherwise;} \end{cases} \]

\[ g = \begin{cases} d & \text{if } d < -1.2, \\ 0 & \text{otherwise.} \end{cases} \]
Table 3 reports the result of the linear MNL model using these variables. The coefficient for covariate m is statistically significant, as expected from Figure 9. Therefore, once again, a latitude of price acceptance characterized by a flat response region was not observed. However, the order of the absolute magnitude of the slopes, loss > gain > midrange, supports the behavioral theory and is consistent with the finding from the orange juice data.
CONCLUSIONS

Nonlinear price response of consumers' brand choice was studied using a multinomial logit model of discrete choice with a nonparametric utility function. The model generalizes the logistic regression of GAM proposed by Hastie and Tibshirani for a binary response to a qualitative response that can assume more than two values. The simulation studies showed the recovery of various nonlinear relationships. The nonparametric MNL improved fit not
only in the calibration sample, but also in a holdout sample. This demonstrated that overfitting was not a problem.

The applications involved consumer brand choice data collected by UPC scanners in two product categories. In aseptic drinks, consumers' utility was found to be fairly constant up to a threshold level (80 cents). Above that price it declined linearly. In red drinks, consumer response was fairly linear with respect to the regular price. On the other hand, consumers did not respond to a temporary price discount until the depth reached the level of 0.4 cents per ounce. Beyond that point, the magnitude of the response increased almost linearly with the discount depth until it hit a saturation level at 1.1 cents per ounce. All the nonlinear results were verified by piecewise parametric specifications and were consistent with the cognitive studies of price perception and purchase intention using experimental data.

Following the framework of Kalyanaram and Little, I next applied the nonparametric model to examine the latitude of price acceptance and asymmetric price response. Kalyanaram and Little estimated a piecewise-linear function of a standardized price deviation (the difference between the actual price and the consumer's reference price, accounting for the brand's price variability) for various locations of discontinuity. For this purpose, they used a grid-like search procedure to identify the width of the latitude. The nonparametric model permitted a direct estimation of the response function without the tedious grid search. The results from two product categories, orange juice and ground coffee, clearly showed the expected difference in price sensitivity among three regions: loss (price is much higher than expected), latitude of acceptance (price is more or less what is expected), and gain (price is much lower than expected).
The estimated response function indicated that consumers were most price sensitive in the loss region, followed by the gain region, and least sensitive in the region of latitude of acceptance. The width of the latitude seemed category specific: for orange juice it was about 3.6 and symmetric about the reference price, whereas it was 2.0 and asymmetric for ground coffee. The results were also verified by the traditional parametric approach.

The advantages of the nonparametric MNL are threefold. First, the utility specification does not have to be confined to a particular parametric functional form that may not correctly represent the underlying behavioral process. Second, graphic outputs facilitate communication with users who have little statistical background. As shown in the applications, the resulting utility functions are intuitive and can be used directly by managers in pricing decisions. While plotting data is a standard practice in data analysis, it poses great difficulty when a latent variable such as utility is involved. Hence, powerful statistical tools such as GAMs can make a major contribution. Third, the method can be a valuable tool in exploratory analysis for researchers. Nonparametric utility functions obtained by the nonparametric MNL suggest appropriate parametric transformations of explanatory variables for confirmatory analysis. Furthermore, less effort by the researcher reduces subjective input to model specification and enables the process to be more data-driven.

The assumption of additive separability of covariates in the proposed model deserves some remarks. Although additivity might appear to be restrictive at first, it can be overcome by [1] constructing a new variable that is a product of interacting covariates as in the linear methodology or [2] estimating a utility function of higher dimensions. Although estimation of a two-dimensional function is often done (Hastie and Tibshirani, 1990), examples beyond two dimensions are almost nonexistent. This is because the additivity assumption is postulated mainly to avoid the curse of dimensionality to which nonparametric methods in general are vulnerable. Nevertheless, if interaction among variables is suspected, one must take appropriate caution as described above. Other disadvantages, due partly to the immaturity of nonparametric methods, are [1] that the nonparametric method has limited appeal for theory testing (i.e., use of statistical inference) and predictive (forecasting) models and [2] the subjective element in the choice of a smoothing constant.

Thus, parametric (theory-driven) and nonparametric (data-driven) approaches can complement each other to advance our knowledge. Without the nonparametric MNL, we could not have found the threshold and saturation effects of price and discount response in the aseptic and red drinks data. I verified the results with the traditional piecewise-linear parametric model because the nonparametric method is relatively new and there is less familiarity with it. Some might ask why we need a nonparametric approach if we must still resort to a parametric approach. The answer is that the parametric approach is easier said than done. Estimating a three-piecewise linear function of the discount variable in red drinks is simple once we know the shape as shown in Figure 6. Without such a plot, finding the number of linear pieces, and where discontinuities are located, with a trial-and-repeat process of the parametric approach is a formidable task.

Kalyanaram and Little employed a three-piecewise-linear function because existing theory suggested that this was reasonable. While the estimated coefficients for the slopes confirmed the theory, other nonlinear phenomena may be driving the three-segment estimate and providing a false signal. Other nonlinear shapes could produce the identical parameter estimates for the three slopes. So, why not examine four or five segments?
Why not a quadratic piecewise function? With the theory-driven approach, the model is only as good as the theory, and phenomena outside the scope of the theory may not be adequately captured. Worse yet, the result of the theory being tested could be biased if the model was misspecified. A data-driven approach could alert researchers to such pitfalls of ignoring potentially important phenomena. These can, in turn, be tested by building appropriate parametric models. Indeed, Kalyanaram and Little suggested (1) estimating the response function with more segments and (2) investigating the nature of the asymmetric location of the latitude, as two possible directions for future research. There is no need to emphasize that the proposed nonparametric MNL achieved both elaborations with ease. The estimated response function suggested that the possibility of additional nonlinear phenomena could be safely ruled out, and that three segments seemed to be sufficient. For coffee but not orange juice, the latitude of price acceptance was located asymmetrically such that consumers noticed smaller price changes when the change was a price increase rather than a decrease.

In sum, some researchers advocate that model building must start from theory. While the importance of theory-driven model building is irrefutable, and theory testing must be conducted on parametric models, there is substantial benefit to be gained from data-driven approaches. As illustrated by the exploratory analyses permitted by nonparametric methods, these can lead to the discovery of unexpected phenomena and potentially lead to new theories. These must, of course, be verified by confirmatory analysis. But practical and easy-to-implement nonparametric methods are a valuable tool for helping the researcher advance knowledge.
Acknowledgment: The author greatly appreciates the valuable comments of William Bearden, the editor, and three anonymous reviewers.
APPENDIX ALGORITHM FOR THE NONPARAMETRIC MNL OF DISCRETE CHOICE
A Generic Algorithm for the Standard GAMs

For estimation of GAMs, Hastie and Tibshirani (1986) propose the local scoring algorithm, a nonparametric variant of the Fisher scoring procedure. Operationally, the so-called adjusted dependent variable (a new estimate of the predictor index, \(\eta(x) = \sum_p f_p(x_p)\), based on the current estimate of the \(f_p(x_p)\)'s) is updated iteratively by nonparametric regression on the explanatory variables with certain weights. The nonparametric functions \(f_p\) (p = 1, ..., P) are estimated sequentially, one variable at a time, by using the preceding estimates of the functions for other covariates. This technique is referred to as the backfitting procedure (Friedman and Stuetzle, 1981). Technical discussion for the convergence of the algorithm as well as the existence, consistency and non-degeneracy (uniqueness) of the solution can be found in Buja, Hastie, and Tibshirani (1989). For the case of a binary logit GAM, the adjusted dependent variable, z, and the weight, w, become

\[ z = \eta(x) + \frac{y - \mu}{\mu(1 - \mu)} \tag{5} \]

\[ w = \mu(1 - \mu) \tag{6} \]
where \(\eta(x)\) is based on the current estimate of the \(f_p\)'s, and \(\mu \equiv E(y)\) is the mean of the response variable y predicted by the logistic regression of (3) with the current estimate of \(\eta\). The adjusted dependent variable z (an updated value for the predictor index \(\eta\)) can be interpreted as the first-order Taylor series approximation of \(\eta\) about the current estimate of \(\mu\). The weight w is the reciprocal of the variance of z, providing less (more) weight on observations with high (low) variance in the regression. It is introduced in the nonparametric regression to compensate for the data reliability, as is the case for the method of weighted least squares (WLS). The additive functions \(f_p\) (p = 1, ..., P) are obtained by nonparametric regression of z on x with weight w by the backfitting procedure.

Each iteration of the local scoring procedure consists of [1] updating the choice probability \(\mu\) from the previous estimate of the additive predictor index, \(\eta\), [2] computing the adjusted dependent variable z and weight w, and [3] applying nonparametric regression of z on x with weight w. The following summarizes the local scoring algorithm for a binary logit GAM.

Initial estimate by linear model: \(\eta(x) = \beta' x\)
Repeat
    Compute the current estimate of \(\mu\) from \(\eta\) [by (3)]
    Compute the adjusted dependent variable z and weight w [by (5) and (6)]
    Obtain the \(f_p(x_p)\)'s by nonparametric regression of z on x with weight w [by backfitting procedure]
Until log likelihood converges.
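
The three steps of the local scoring loop for the binary logit case can be sketched in Python as follows. This is a simplified illustration, not the implementation used in the paper, and it departs from the summary above in ways that are assumptions of the sketch: it starts from an intercept-only fit rather than a linear model, it uses a weighted Gaussian-kernel smoother (the hypothetical helper `kernel_smooth`) as the nonparametric regression, and the bandwidth, iteration limits, and tolerance are arbitrary choices.

```python
import numpy as np


def kernel_smooth(x, r, w, bandwidth):
    """Weighted Gaussian-kernel smoother of r on x, evaluated at the data points."""
    u = (x[:, None] - x[None, :]) / bandwidth
    k = np.exp(-0.5 * u ** 2) * w[None, :]
    return (k @ r) / k.sum(axis=1)


def log_likelihood(y, mu):
    """Bernoulli log likelihood for 0/1 responses y and probabilities mu."""
    return float(np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu)))


def local_scoring_binary(X, y, bandwidth=0.5, max_outer=25, n_backfit=5, tol=1e-4):
    """Fit eta(x) = alpha + sum_p f_p(x_p) for a binary logit by local scoring."""
    n, p = X.shape
    ybar = y.mean()
    alpha = np.log(ybar / (1 - ybar))      # assumed start: intercept-only logit
    f = np.zeros((n, p))
    eta = np.full(n, alpha)
    prev_ll = -np.inf
    ll = prev_ll
    for _ in range(max_outer):
        # [1] choice probability from the current predictor index
        mu = np.clip(1 / (1 + np.exp(-eta)), 1e-5, 1 - 1e-5)
        # [2] adjusted dependent variable (5) and weight (6)
        z = eta + (y - mu) / (mu * (1 - mu))
        w = mu * (1 - mu)
        # [3] weighted backfitting, one covariate at a time
        for _ in range(n_backfit):
            alpha = np.average(z - f.sum(axis=1), weights=w)
            for j in range(p):
                partial = z - alpha - f.sum(axis=1) + f[:, j]
                fj = kernel_smooth(X[:, j], partial, w, bandwidth)
                f[:, j] = fj - np.average(fj, weights=w)  # center each function
        eta = alpha + f.sum(axis=1)
        mu = np.clip(1 / (1 + np.exp(-eta)), 1e-5, 1 - 1e-5)
        ll = log_likelihood(y, mu)
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return alpha, f, ll


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(-3, 3, size=(200, 2))            # simulated covariates
    p_true = 1 / (1 + np.exp(-2.0 * np.sin(X[:, 0])))
    y = (rng.uniform(size=200) < p_true).astype(float)
    alpha, f, ll = local_scoring_binary(X, y)
    print("log likelihood:", round(ll, 2))
```

Plotting each column of `f` against the corresponding covariate gives the kind of estimated utility component shown in the figures. The bandwidth plays the role of the smoothing constant, so its choice carries the same subjective element noted in the conclusions.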
Algorithm for the GAM of Discrete Choice

Modifying the algorithm for the GAM for matched case-control data (Hastie and Tibshirani, 1990, Algorithm 8.1) results in the following algorithm for the GAM of discrete choice. As in the binary logit case, the same three basic steps are involved. For operational purposes, the subscript n (an index for data points) is made explicit.

Initial estimate by the linear model: \(\eta(x_{nj}) = \beta' x_{nj}\) for all n and j
Repeat
    Compute the current estimate of \(\mu_{nj}\) from \(\eta\) as
        \[ \mu_{nj} = \frac{e^{\eta(x_{nj})}}{\sum_k e^{\eta(x_{nk})}} \]
    Compute the adjusted dependent variable \(z_{nj}\) and the weight \(w_{nj}\), where
        \[ z_{nj} = \eta(x_{nj}) + \frac{y_{nj} - \mu_{nj}}{\mu_{nj}(1 - \mu_{nj})} \]
        \[ w_{nj} = \mu_{nj}(1 - \mu_{nj}) \]
    Obtain \(f_p(x_p)\) (p = 1, ..., P) by nonparametric regression of \(z_{nj}\) on \(x_{nj}\) with weight \(w_{nj}\) [by the backfitting procedure]
Until log likelihood converges.
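
The first two steps inside the loop, which are the only ones that differ from the binary case, can be sketched as a single function. This is an illustration under stated assumptions: the name `mnl_working_response`, the (N, J) array layout with one row per choice occasion and one column per alternative, and the clipping of probabilities away from 0 and 1 are all choices made here and are not specified in the text.

```python
import numpy as np


def mnl_working_response(eta, y, eps=1e-8):
    """One pass of steps [1] and [2] for the discrete choice GAM.

    eta : (N, J) array of predictor indices eta(x_nj)
    y   : (N, J) array of 0/1 choice indicators, one chosen alternative per row
    Returns (mu, z, w, log_likelihood).
    """
    # Subtract the row maximum before exponentiating for numerical stability;
    # the MNL probabilities are unchanged by this shift.
    shifted = eta - eta.max(axis=1, keepdims=True)
    expu = np.exp(shifted)
    mu = expu / expu.sum(axis=1, keepdims=True)
    mu = np.clip(mu, eps, 1 - eps)          # guard the division below
    z = eta + (y - mu) / (mu * (1 - mu))    # adjusted dependent variable z_nj
    w = mu * (1 - mu)                       # weight w_nj
    ll = float(np.sum(y * np.log(mu)))      # MNL log likelihood
    return mu, z, w, ll


if __name__ == "__main__":
    eta = np.zeros((2, 3))                  # two occasions, three alternatives
    y = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
    mu, z, w, ll = mnl_working_response(eta, y)
    print(mu[0], z[0], w[0], ll)
```

The arrays `z` and `w` can then be flattened over n and j and passed, together with the stacked covariates, to a weighted backfitting routine for step [3].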
NOTES

1. In the nonparametric case, the usual likelihood is maximized when the estimated function interpolates the observed values of a response variable. However, such a nonparametric function would be too "wiggly" and rough. For instance, in the logistic regression, the estimated function will be such that \(\eta(x_i) = +\infty\) for \(y_i = 1\) and \(\eta(x_i) = -\infty\) for \(y_i = 0\). A smoother function can be obtained by maximizing a conditional likelihood that penalizes the curvature of the estimated function characterized by the quadratic form of the second derivative.

2. Derivation for the estimation of the standard GAMs, including logistic regression (binary logit), relies on the fact that the random component has an exponential family density. This is no longer the case in the nonparametric MNL model, requiring a separate derivation. Thus, any GAMs whose random component does not belong to the exponential family are referred to as "extensions to other settings" by Hastie and Tibshirani (1990, ch. 8).

3. Even if the entire sample was used, there are ways to conduct cross-validation such as the jackknife method (Efron and Gong, 1983). Perhaps a more practical route to evaluate the extent of overestimated goodness-of-fit is to conduct cross-validation using a subset of data, and if satisfactory, the entire data can be used in analysis for maximum parameter stability.

4. The loyalty variable is defined for each brand at each purchase occasion of each panelist. Thus, each data point in the loyalty plot represents this unit rather than an individual consumer. This means segmentation of the plot applies to the grouping of brand-occasion-panelist cells rather than that of consumers, and should not be interpreted as market segmentation.

5. The loyalty variable defined by Guadagni and Little is known to be an extremely parsimonious, yet powerful predictor of brand choice. It captures the effects of cross-sectional heterogeneity and time-series dynamic variation in brand preference as well as the impact of past marketing mix on a household, all in a single variable.

6. The approximate likelihood ratio index test devised by Hastie and Tibshirani (1990) also supported the nonlinearity of discount and the linearity of the loyalty and regular price.

7. The approximate likelihood ratio test implied that the nonlinearity was statistically significant.

8. The approximate likelihood ratio test implied that the nonlinearity was statistically significant.
REFERENCES

Abe, Makoto. (1991). "A Moving Ellipsoid Method for Nonparametric Regression and its Application to Logit Diagnostics Using Scanner Data," Journal of Marketing Research, 28: 339-346.
Abe, Makoto. (1995). "A Nonparametric Density Estimation Method for Brand Choice Using Scanner Data," Marketing Science, 14: 300-325.
Abe, Makoto. (1998). "A Generalized Additive Model for Discrete Choice Data," Journal of Business and Economic Statistics, forthcoming.
Allenby, Greg M. and Peter E. Rossi. (1991). "Quality Perceptions and Asymmetric Switching Between Brands," Marketing Science, 10: 185-204.
Blattberg, R. C., R. Briesch, and E. Fox. (1995). "How Promotions Work," Marketing Science, 14: G122-G133.
Breslow, N. S. and N. E. Day. (1980). Statistical Methods in Cancer Research. 1: The Analysis of Case-Control Studies. Lyon: I.A.R.C.
Ben-Akiva, Moshe and Steve Lerman. (1985). Discrete Choice Analysis: Theory and Application to Travel Demand. Cambridge, MA: MIT Press.
Buja, A., T. J. Hastie, and R. J. Tibshirani. (1989). "Linear Smoothers and Additive Models," The Annals of Statistics, 17: 453-555.
Donthu, Naveen and Ronald T. Rust. (1989). "Estimating Geographic Customer Densities Using Kernel Density Estimation," Marketing Science, 19: 323-332.
Emery, F. E. (1969). "Some Psychological Aspects of Price." Pp. 98-111 in B. Taylor and G. Willsin (eds.), Pricing Strategy. London: Staples.
Efron, B. (1981). "Nonparametric Estimates of Standard Error: The Jackknife, the Bootstrap and Other Methods," Biometrika, 68: 589-599.
Efron, B. and Gail Gong. (1983). "A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation," The American Statistician, 37 (1) (February): 36-48.
Fader, Pete S. and James M. Lattin. (1993). "Accounting for Heterogeneity and Nonstationarity in a Cross-sectional Model of Consumer Purchase Behavior," Marketing Science, 12: 304-317.
Fader, Pete S., James M. Lattin, and John D. C. Little. (1992). "Estimating Nonlinear Parameters in the Multinomial Logit Model," Marketing Science, 11: 372-385.
Friedman, J. H. and W. Stuetzle. (1981). "Projection Pursuit Regression," Journal of American Statistical Association, 76: 817-823.
Guadagni, Peter M. and John D. C. Little. (1983). "A Logit Model of Brand Choice Calibrated on Scanner Data," Marketing Science, 2: 203-238.
Gupta, Sunil and Lee G. Cooper. (1992). "The Discounting of Discounts And Promotion Thresholds (by consumers)," Journal of Consumer Research, 19: 401-411.
Hardie, Bruce G. S., Eric J. Johnson, and Pete Fader. (1993). "Modelling Loss Aversion and Reference Dependence Effects on Brand Choice," Marketing Science, 12: 378-394.
Hastie, T. J. and R. J. Tibshirani. (1986). "Generalized Additive Models," Statistical Science, 1: 297-318.
Hastie, T. J. and R. J. Tibshirani. (1987). "Generalized Additive Models: Some Applications," Journal of the American Statistical Association, 82: 371-386.
Hastie, T. J. and R. J. Tibshirani. (1990). Generalized Additive Models. New York: Chapman and Hall.
Helson, H. (1964). Adaptation-Level Theory. New York: Harper and Row.
Kahneman, D. and Amos Tversky. (1979). "A Prospect Theory: An Analysis of Decision under Risk," Econometrica, 47: 263-291.
Kalwani, Manohar U., Chi K. Yim, Heikki J. Rinne, and Yoshi Sugita. (1990). "A Price Expectation Model of Customer Brand Choice," Journal of Marketing Research, 27: 251-262.
Kalwani, Manohar U. and Chi K. Yim. (1992). "Consumer Price and Promotion Expectations: An Experimental Study," Journal of Marketing Research, 29: 90-100.
Kalyanaram, G. and John D. C. Little. (1994). "An Empirical Analysis of Latitude of Price Acceptance in Consumer Packaged Goods," Journal of Consumer Research, 21: 408-418.
Krishnamurthi, Lakshman, T. Mazumdar, and S. P. Raj. (1992). "Asymmetric Response to Price in Consumer Choice and Purchase Quantity Decisions," Journal of Consumer Research, 19: 387-400.
Lattin, James M. and Randolph E. Bucklin. (1989). "Reference Effects of Price and Promotion on Brand Choice Behavior," Journal of Marketing Research, 26: 299-310.
Manski, Charles F. and Daniel McFadden, editors. (1981). Structural Analysis of Discrete Data with Econometric Applications. Cambridge, MA: MIT Press.
Matzkin, Rosa L. (1993). "Nonparametric Identification and Estimation of Polychotomous Choice Models," Journal of Econometrics, 58 (July): 137-168.
Mayhew, G. E. and Russell S. Winer. (1992). "An Empirical Analysis of Internal and External Reference Prices Using Scanner Data," Journal of Consumer Research, 19: 62-70.
Mazumdar, T. and S. Y. Jun. (1992). "Effects of Price Uncertainty on Consumer Purchase Budget and Price Thresholds," Marketing Letters, 3: 323-330.
Monroe, Kent B. (1971). "Measuring Price Thresholds by Psychophysics and Latitudes of Acceptance," Journal of Marketing Research, 8: 460-464.
Monroe, Kent B. (1973). "Buyers' Subjective Perceptions of Price," Journal of Marketing Research, 10: 70-80.
Monroe, Kent B. (1990). Pricing: Making Profitable Decisions. New York: McGraw Hill.
Nelder, J. A. and R. W. M. Wedderburn. (1972). "Generalized Linear Models," Journal of the Royal Statistical Society A, 135: 370-384.
Perreault, W. D. and F. W. Young. (1980). "Alternating Least Squares Optimal Scaling: Analysis of Nonmetric Data in Marketing Research," Journal of Marketing Research, 17: 1-13.
Putler, Daniel S. (1992). "Incorporating Reference Price Effects into a Theory of Consumer Choice," Marketing Science, 11: 287-309.
Rust, Roland T. (1988). "Flexible Regression," Journal of Marketing Research, 25: 10-24.
Sherif, C. W. (1963). "Social Categorization as a Function of Latitude of Acceptance and Series Range," Journal of Abnormal and Social Psychology, 67: 148-156.
Sherif, M., D. Taub, and C. I. Hovland. (1958). "Assimilation and Contrast Effects of Anchoring Stimuli on Judgments," Journal of Experimental Psychology, 55: 150-155.
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis, Monographs on Statistics and Applied Probability. New York: Chapman and Hall.
Tellis, Gerard J. (1988). "Advertising Exposure, Loyalty, and Brand Purchase: A Two-Stage Model of Choice," Journal of Marketing Research, 25: 134-144.
Winer, Russell S. (1986). "A Reference Price Model of Brand Choice for Frequently Purchased Products," Journal of Consumer Research, 18: 45-51.