European Journal of Operational Research 76 (1994) 283-289 North-Holland
Regularity, recency and rates
Rita D. Wheat
School of Business 306K, University of Southern California, Los Angeles, CA 90089, USA
Donald G. Morrison
John E. Anderson Graduate School of Management, UCLA, Los Angeles, CA 90024, USA
Abstract: Survey researchers often use the recency question format, "When was the last time you ...", to estimate purchase or usage rates. This format is particularly useful when the behavior in question is of a sensitive nature. The calculation of interpurchase times and, in turn, rates from recency format responses requires an a priori assessment of the distribution of interpurchase times. Although the usual assumption is that interpurchase times are exponential, evidence indicates that times between purchases are more regular. We relax the exponential assumption and measure the effects of more regular purchase behavior, discrete purchase behavior and the dead period between purchases on estimated rates. Even though our focus is on recency, our results also give some insights into the commonly reported frequency based result that shows heavy consumers are 'more regular' in their purchasing behavior.
Keywords: Erlang 2; Exponential; Geometric; Interpurchase time; Recency
Introduction
Survey researchers are often interested in estimating the rate at which consumers engage in particular activities, specifically the purchase or use of products. They have two question formats at their disposal with which to measure these unobservable, latent rates. The frequency question format asks the respondent to record how many times he/she has engaged in the activity of interest in a certain interval of time, e.g., "How many times did you purchase cereal last month?". The recency question format asks the respondent to record the last time he/she engaged in the activity, e.g., "When was the last time you purchased cereal?". While both of these questions
Correspondence to: Prof. R.D. Wheat, School of Business 306K, University of Southern California, Los Angeles, CA 90089, USA.
yield data with which to estimate latent rates, the recency format is inefficient in that it ignores all information but that of the last purchase. Despite the fact that the recency question throws out valuable information in estimating rates, it is often the more desirable format. Greene (1982) recommends using the recency format "... if we expect subjects to report more accurately how long ago they last bought ... than how often they bought in the past". He points out that the format can actually reduce response error, particularly when the behavior in question is of a sensitive nature, such as alcohol consumption, or one with strong social norms, such as going to church. In these instances, recency questions, which are less threatening because they only weakly reveal rates, will most certainly yield more accurate responses than the intrusive frequency format. Buchanan and Morrison (1987) compare the reliability and statistical efficiency of
0377-2217/94/$07.00 © 1994 - Elsevier Science B.V. All rights reserved SSDI 0377-2217(93)E0301-D
R.D. Wheat, D.G. Morrison / Regularity, recency and rates
the two formats and find that the recency format is preferable not only for behavior of a sensitive nature, but also when the latent rate is very low.

Models of the recency format have been previously analyzed and applied. Several studies have examined its use in estimating job durations (Sorensen, 1977), residential mobility (Allison, 1985), and deodorant purchases and alcohol consumption (Greene, 1982).

When researchers use the recency question format, they require an a priori assessment of the distribution of interpurchase times, that is, their regularity. In fact, Morrison (1973) shows that the true, latent rate estimated from responses to recency questions will vary depending on the assumption one makes about the true distribution of interpurchase times. If true purchase behavior is deterministic, the response to the recency question on average will equal half of the true mean interpurchase time. However, if purchases occur in a Poisson fashion, and interpurchase times are exponential, the response to the recency question on average will exactly equal the true mean interpurchase time. Ignoring the underlying purchase process will lead to biased results.

The purpose of this paper is to examine how sensitive the rates derived from self-reported recency data are to other, non-random underlying purchase processes. We begin with a review of the relationship between recency responses and purchase rates. We then relax the usual assumption of exponential interpurchase times to measure the effects of issues such as more regular, condensed Poisson purchase behavior, the dead period between purchases, and regularly scheduled shopping trips on these estimated purchase rates.
How are purchase rates determined from recency data?

For a particular individual, let $X$ equal his true interpurchase time. Let $f(x)$ denote the probability density function of $X$. At an arbitrary point in time, $t$, he is asked, "How long ago did you last purchase the product?". His response yields the recency time $Z$. The researcher desires to estimate the true mean interpurchase time, i.e., the expected value of $X$, from the response $Z$. (Note that the purchase rate is then the reciprocal of the mean interpurchase time.) For example, if a consumer responds that he last bought the product 4 days ago, the researcher will use this response to derive an estimate of how often the consumer buys the product.

Although it would appear that the recency response, $Z$, would be on average half of the true interpurchase time (in the example above, the interpurchase time would be 8 days, because on average we would expect $t$ to fall halfway through the consumer's interpurchase interval), this reasoning is faulty. Actually, on average, response $Z$ is half of the 'sampled interval', $Y$, because one needs to take into account the issue of length biased sampling. That is, the distribution of the sampled interval $Y$, which contains $t$, is not exactly the same as that of the true interpurchase time interval $X$. The fact that the interval $Y$ was sampled makes its derived distribution $g(y)$ biased. The relationship between the true interpurchase time distribution $f(x)$ and the sampled interval distribution $g(y)$ is

$$g(y) = \mu^{-1}\, y\, f(y), \qquad y > 0, \tag{1}$$

where $\mu$ is the mean of the random variable $X$, and

$$E(Y) = E(X) + \frac{\mathrm{Var}(X)}{E(X)}. \tag{2}$$

(See Feller (1971) and Morrison (1973) for details of this result.) Equation (1) shows that the density of the sampled interval is the true interpurchase time density weighted by the interval length $y$, which is proportional to the chance that an interval of that length contains $t$.

How does the researcher get from $Z$ to $X$? As discussed above, the relationship between $Z$ and $Y$ for any arbitrary $f(x)$ is

$$E(Z) = \tfrac{1}{2}E(Y),$$
and (2) defines the relationship between $E(Y)$ and $E(X)$. Therefore, the relationship between recency response $Z$ and true interpurchase time $X$ depends on the assumption made about how regularly the consumer purchases the product.

When interpurchase times are random

In this case purchases are Poisson, and the probability density function (p.d.f.) of interpurchase time $X$ is

$$f(x; \lambda) = \lambda e^{-\lambda x},$$

with a mean $E(X) = 1/\lambda$ and a variance $\mathrm{Var}(X) = 1/\lambda^2$. From (1), we find that the p.d.f. of the sampled interval $Y$ is

$$g(y) = \lambda^2 y\, e^{-\lambda y},$$

or Erlang 2 (the distribution of the sum of two independent exponential random variables) with a mean

$$E(Y) = 2/\lambda.$$

Since $E(Z) = \tfrac{1}{2}E(Y)$ for any $f(x)$, in this case $E(Z)$ is $1/\lambda$. Therefore, when interpurchase times are exponentially distributed,

$$E(Z) = E(X). \tag{a}$$

In words, when purchase behavior is random, the average self-reported time since the last purchase will be exactly equal to the true mean interpurchase time. In his elaboration on Feller's (1971) 'waiting time for the bus paradox', Morrison (1973) shows that this result stems from the lack of memory property of the exponential distribution; knowing how much time has passed since the last occurrence of an event tells you nothing about when the next event will occur. This property holds true not only for forward times but for backward times as well. Result (a) indicates that no adjustment to the recency response is necessary in order to estimate interpurchase times. For an individual whose purchases occur in a random, Poisson fashion, the average amount of time that has passed since he last made a purchase also represents his mean interpurchase time. Thus, his purchase rate is simply the reciprocal of $E(Z)$. These relationships are summarized in Table 1.

When interpurchase times are deterministic

In the case of regular interpurchase times (i.e., purchases are clockwork-like, deterministic), the interpurchase time $X$ is a constant, with no variance. Thus, from (2) we find that

$$E(Y) = E(X)$$

and the sampled interval and the true interval are identically distributed. Since $E(Z) = \tfrac{1}{2}E(Y)$ for any $f(x)$,

$$E(Z) = \tfrac{1}{2}E(X). \tag{b}$$
In words, the average self-reported time elapsed since the last purchase will equal half the true mean interpurchase time. The intuition for this result is that on average, we expect t, the time at which the recency question is asked, to lie halfway between two purchases, and thus the time since the last purchase will equal half the true interpurchase time. For an individual whose purchase behavior is deterministic, his recency response must be adjusted by a 'fudge factor' of 2 to arrive at his mean interpurchase time. His purchase rate is then the reciprocal of the adjusted mean interpurchase time. These relationships are summarized in Table 1.

Table 1
The relationship between recency responses and interpurchase times

Interpurchase time     Mean interpurchase   Mean recency                       Recency response
distribution           time E(X)            time E(Z)                          adjustment factor
Exponential            1/λ                  1/λ                                1
Erlang 2               2/λ                  3/(2λ)                             4/3
Delayed exponential    1/λ + d              [(1 + λd)^2 + 1] / [2λ(1 + λd)]    2(1 + λd)^2 / [(1 + λd)^2 + 1] ^a
Geometric              1/p                  (2 - p)/(2p)                       2/(2 - p) ^b
Deterministic          k                    k/2                                2

^a When d = 0.42(1/λ), the adjustment factor equals 4/3.
^b When p = 0.5, the adjustment factor equals 4/3.
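Results (a) and (b) can be checked numerically. The following is a minimal simulation sketch, not part of the original analysis: the function name `mean_recency`, the 8-day mean interpurchase time, the length of the simulated history, and the number of survey times are all illustrative assumptions. It builds one long purchase history, asks the recency question at random times, and averages the answers.

```python
import bisect
import random


def mean_recency(draw_interval, horizon=400_000.0, n_queries=50_000, seed=1):
    """Monte Carlo estimate of E(Z), the mean time since the last purchase.

    draw_interval(rng) must return one interpurchase time X > 0.
    All defaults are arbitrary illustrative choices.
    """
    rng = random.Random(seed)

    # Build one long purchase history as a sorted list of purchase times.
    purchases = [0.0]
    while purchases[-1] < horizon:
        purchases.append(purchases[-1] + draw_interval(rng))

    # Ask the recency question at uniformly random times t in [0, horizon].
    total = 0.0
    for _ in range(n_queries):
        t = rng.uniform(0.0, horizon)
        last_purchase = purchases[bisect.bisect_right(purchases, t) - 1]
        total += t - last_purchase
    return total / n_queries


MEAN_INTERVAL = 8.0  # assumed true mean interpurchase time, in days

# Random (Poisson) purchasing: theory says E(Z) = E(X), result (a).
z_exponential = mean_recency(lambda rng: rng.expovariate(1.0 / MEAN_INTERVAL))

# Clockwork purchasing: theory says E(Z) = E(X) / 2, result (b).
z_deterministic = mean_recency(lambda rng: MEAN_INTERVAL)

print(f"exponential:   E(Z) ~ {z_exponential:.2f} (theory {MEAN_INTERVAL:.2f})")
print(f"deterministic: E(Z) ~ {z_deterministic:.2f} (theory {MEAN_INTERVAL / 2:.2f})")
```

Because long intervals are more likely to contain the survey time, the exponential case should land near the full mean of 8 days rather than near 4, which is the length biased sampling effect described by equation (1).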
Adjusting recency responses for more regular interpurchase times
Results (a) and (b) specify the relationships between recency data and true interpurchase times for the extremes of random and deterministic interpurchase times. The exponential assumption has been attacked as unrealistic because the distribution has its mode at 0, implying that the most likely time for the next purchase to occur is immediately following the last one. This conflicts with the dead period usually observed between purchases. In fact, empirical evidence suggests that for many product purchases, although behavior is not deterministic, it is certainly more regular than random. (See Chatfield and Goodhardt (1973), Gupta (1988), Herniter (1971) and Lawrence (1980) for several examples.) In these instances, two competing purchase behavior models have been suggested as alternatives to the Poisson with exponential interpurchase times. The first and most commonly used is the condensed Poisson model, which assumes that interpurchase times are distributed Erlang 2. The Erlang 2 is an appealing alternative because of its time dependence property. As more time passes since the last purchase, the probability of making another purchase increases, up to a point. In addition, its mode is greater than zero. However, this distribution still does allow for a purchase to be made immediately following the last purchase occasion, so it does not entirely capture the dead period phenomenon between purchases, during which the probability of making a purchase is null. The second less commonly used model is one in which interpurchase times are distributed delayed exponential (Wheat and Morrison, 1990). The behavioral explanation for delayed exponential interpurchase times is that for some time following a purchase there is a dead period during which the consumer will not make a repeat purchase. The dead period could be a function of the size of the product bought, or the consumer's usage pattern. 
After this dead period, another purchase will occur in a random, exponential pattern. Thus, the delayed exponential distribution is simply an exponential pushed forward in time. The issue of interest is what happens to the relationship between responses to the recency question format and true mean interpurchase times when the distribution of interpurchase times is Erlang 2 or delayed exponential. From results (a) and (b) above, we know that while the average recency response in both cases will not exactly equal the true mean interpurchase time, it will be greater than half of the true mean, i.e.,

$$\tfrac{1}{2}E(X) < E(Z) < E(X).$$
When interpurchase times are distributed Erlang 2

In this case, the p.d.f. of $X$ is

$$f(x; \lambda) = \lambda^2 x\, e^{-\lambda x},$$

with a mean $E(X) = 2/\lambda$ and a variance $\mathrm{Var}(X) = 2/\lambda^2$. Using (1) we find that the length biased sampling distribution of $Y$ is Erlang 3 with a mean

$$E(Y) = 3/\lambda.$$

Based on the general relationship between $E(Z)$ and $E(Y)$,

$$E(Z) = \tfrac{3}{4}E(X). \tag{c}$$

In other words, when interpurchase times are actually Erlang 2, the average recency response, or self-reported time since the last purchase, is equal to 3/4 of the true mean interpurchase time. The recency responses of individuals whose interpurchase times are distributed Erlang 2 need to be adjusted by a factor of 4/3 to arrive at mean interpurchase times and in turn purchase rates. These relationships are summarized in Table 1.
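The steps behind result (c) can be written out as a short worked check. This is a sketch added for illustration; it uses only equations (1) and (2) and the Erlang 2 mean and variance stated above.

```latex
% Length biased density from (1), with \mu = E(X) = 2/\lambda:
g(y) = \frac{\lambda}{2}\, y \cdot \lambda^{2} y\, e^{-\lambda y}
     = \frac{\lambda^{3} y^{2}}{2}\, e^{-\lambda y}, \qquad y > 0,
% which is the Erlang 3 density. The same mean follows from (2):
E(Y) = \frac{2}{\lambda} + \frac{2/\lambda^{2}}{2/\lambda} = \frac{3}{\lambda},
\qquad
E(Z) = \tfrac{1}{2}E(Y) = \frac{3}{2\lambda} = \tfrac{3}{4}\,E(X).
```

Dividing $E(X)$ by $E(Z)$ gives the adjustment factor of 4/3 listed for Erlang 2 in Table 1.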
When interpurchase times are distributed delayed exponential

In this case, the p.d.f. of $X$ is

$$f(x; \lambda, d) = \lambda e^{-\lambda (x - d)}, \qquad x > d,$$

where $d$ is the delay, and $\lambda$ is the purchase rate after the delay. The mean is

$$E(X) = 1/\lambda + d$$

and the variance is $\mathrm{Var}(X) = 1/\lambda^2$. The length biased sampled interval has a mean

$$E(Y) = \frac{(1 + \lambda d)^2 + 1}{\lambda (1 + \lambda d)}$$

and the relationship between $E(Z)$ and $E(X)$ is

$$E(Z) = \frac{(\lambda d)^2 + 2 + 2\lambda d}{(2 + 2\lambda d)(1 + \lambda d)}\, E(X) \tag{d}$$

and is a function of $\lambda d$. Note that when the delay $d$ is equal to 0, the true distribution of interpurchase times is exponential and $E(Z) = E(X)$ as in result (a). On the other hand, when the delay becomes infinitely large, interpurchase times are deterministic, and $E(Z) = \tfrac{1}{2}E(X)$, as in result (b). The relationship between $E(Z)$ and $E(X)$ depends only on the ratio of the delay to the mean of the exponential part of the delayed exponential distribution, $1/\lambda$.

When does this relationship equal that of the competing Erlang 2 interpurchase time model? We derive the answer analytically, and find that when $d = 0.42(1/\lambda)$ the relationship between $E(Z)$ and $E(X)$ is identical for Erlang 2 and delayed exponential distributed interpurchase times. For survey researchers who use the recency format, the implication is that if interpurchase times are more regular than exponential, the self-reported time since the last purchase needs to be adjusted to arrive at mean time between purchases and in turn purchase rates. The adjustment to recency responses for individuals with Erlang 2 or delayed exponential (with a delay equal to approximately 40% of the exponential mean) interpurchase times is 4/3.
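The delay at which the delayed exponential adjustment matches the Erlang 2 value can be reproduced from result (d). The sketch below is illustrative only; the function names `delayed_exp_adjustment` and `matching_delay` are hypothetical and not taken from the paper. Setting the adjustment factor E(X)/E(Z) equal to 4/3 reduces to (1 + λd)^2 = 2, so the exact solution is λd = sqrt(2) - 1, about 0.414, which is close to the value of 0.42 quoted in the text.

```python
import math


def delayed_exp_adjustment(lam, d):
    """Adjustment factor E(X)/E(Z) implied by result (d).

    lam is the exponential rate after the dead period, d is the delay.
    """
    a = (1.0 + lam * d) ** 2
    return 2.0 * a / (a + 1.0)


def matching_delay(lam):
    """Delay d at which the factor equals the Erlang 2 value of 4/3.

    2a / (a + 1) = 4/3  =>  a = (1 + lam * d)^2 = 2  =>  lam * d = sqrt(2) - 1
    """
    return (math.sqrt(2.0) - 1.0) / lam


lam = 0.5  # assumed rate, purely illustrative
d_star = matching_delay(lam)
print(f"lam * d = {lam * d_star:.4f}")
print(f"adjustment at that delay = {delayed_exp_adjustment(lam, d_star):.4f}")
print(f"adjustment with no delay = {delayed_exp_adjustment(lam, 0.0):.4f}")
```

The two limiting cases fall out of the same function: a delay of zero gives a factor of 1, as in result (a), and a very long delay pushes the factor toward 2, as in result (b).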
Adjusting recency responses for discrete interpurchase times

The implicit assumption underlying the relationships discussed above is that interpurchase times are continuous. That is, the exponential, Erlang 2, and delayed exponential distributions allow purchases to occur at any point in time. However, many consumers have regularly scheduled shopping trips and will only purchase a product on those trips. Dunn, Reader, and Wrigley (1983) present histograms of interpurchase times that exhibit distinct spikes at 7, 14, 21, etc. days. Kahn and Morrison (1989) and Wheat and Morrison (1990) quantify these observations and find that in many instances, regularly scheduled shopping trips can make interpurchase times appear more regular than exponential. They model interpurchase time with a geometric distribution. This distribution is the discrete analogue of the exponential distribution and possesses its lack of memory property, i.e., the probability of making a purchase on any shopping trip given that one has not yet been made is constant for all shopping trips.

Here again, the question of interest is what happens to the relationship between responses to the recency question format and true mean interpurchase times when the distribution of interpurchase times is discrete, rather than continuous. From results (a) and (b) above, we know that the average recency response will either equal half of the true mean interpurchase time or somewhat more, i.e.,

$$\tfrac{1}{2}E(X) \le E(Z) < E(X).$$

When interpurchase times are distributed geometric

The distribution of interpurchase times is

$$f(x; p) = p(1 - p)^{x - 1}, \qquad x = 1, 2, 3, \ldots, n,$$

where $p$ is the probability of making a purchase on any shopping trip. The variable $X$ is measured in units of time, say 1 week. The mean is

$$E(X) = 1/p$$

and the variance is

$$\mathrm{Var}(X) = (1 - p)/p^2.$$

The mean of the length biased sampled interval is

$$E(Y) = (2 - p)/p$$

and the relationship between $E(Z)$ and $E(X)$ is

$$E(Z) = \tfrac{1}{2}(2 - p)\,E(X) \tag{e}$$
and is a function of $p$. Note that as $p$ approaches 1, $E(Z)$ approaches $\tfrac{1}{2}E(X)$. In other words, when $p = 1$, a purchase occurs on every shopping trip and the time between purchases is deterministic. However, as $p$ approaches 0, $E(Z)$ approaches $E(X)$. That is, when the probability of making a purchase on any shopping trip is very slight, the time between purchases is random.

For what value of $p$ is the relationship between $E(Z)$ and $E(X)$ similar to that of more regular, Erlang 2 distributed interpurchase times? We derive the answer analytically, and find that when $p = 0.5$, the relationship between $E(Z)$ and $E(X)$ is identical for geometric and Erlang 2 interpurchase times. Thus, when a purchase occurs on average every 2 shopping trips, the adjustment which needs to be made to recency response times is the same as that for Erlang 2 distributed interpurchase times, and is equal to 4/3. The relationships are summarized in Table 1.

For survey researchers who use the recency format, the implication is that if consumers have regularly scheduled shopping trips, the response to recency questions needs to be adjusted as indicated above. Determining whether or not consumers have regularly scheduled shopping trips is as simple as asking an additional survey question about this matter.
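The geometric adjustment can also be obtained directly from the mean and variance through equation (2), which gives one way to check the p = 0.5 claim. The sketch below is illustrative; the function names and the sample recency value of 1.5 weeks are hypothetical, not data from the paper.

```python
def adjustment_from_moments(mean, variance):
    """Adjustment factor E(X)/E(Z), using E(Z) = (E(X) + Var(X)/E(X)) / 2."""
    return 2.0 * mean * mean / (mean * mean + variance)


def geometric_adjustment(p):
    """Adjustment factor when interpurchase times are geometric with parameter p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    mean = 1.0 / p
    variance = (1.0 - p) / p ** 2
    return adjustment_from_moments(mean, variance)


# The closed form 2 / (2 - p) from result (e) agrees with the moment-based value.
for p in (0.1, 0.5, 0.9, 1.0):
    print(f"p = {p:.1f}: factor = {geometric_adjustment(p):.4f}, "
          f"2 / (2 - p) = {2.0 / (2.0 - p):.4f}")

# Hypothetical example: average reported recency of 1.5 weeks with p = 0.5.
mean_recency_weeks = 1.5
mean_interpurchase_weeks = geometric_adjustment(0.5) * mean_recency_weeks
purchase_rate_per_week = 1.0 / mean_interpurchase_weeks
print(f"estimated purchase rate = {purchase_rate_per_week:.2f} per week")
```

Working from moments rather than hard-coding 2 / (2 - p) keeps the helper reusable for any interpurchase time distribution whose mean and variance are known.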
Discussion

The recency question format is used by survey researchers to estimate the rate at which consumers purchase or use products. It is particularly useful when consumers would otherwise be reluctant to truthfully respond to the frequency question format. When the researcher asks a respondent how long ago he last purchased a product, the transformation of the response into an average interpurchase time, or purchase rate, will clearly depend upon the assumptions made about the distribution of the consumer's interpurchase times. We have derived the relationship between the responses to recency questions and mean interpurchase times for various scenarios. While the recency responses are derived at the individual level, purchase rates are not. Rather, an appropriate 'average' fudge factor (as summarized in Table 1) is applied to estimate the population average purchase rate. Since the adjustment made to self-reported recency times to eliminate the length biased sampling bias varies from no adjustment to a doubling of recency responses, it is imperative that the researcher start out with some assumption about the regularity with which consumers' purchases occur. One way to assess purchase regularity is to examine existing data on consumers' purchase histories for similar brands or products.

Finally, this recency based paper has some implications for a result that has often been reported in frequency based literature, namely that heavy consumers are more regular than light consumers in their interpurchase times (cf. Chatfield and Goodhardt, 1973). Here we have seen that both the geometric distribution and the delayed exponential distribution, while retaining the spirit of random purchasing, may be more appropriate for modeling interpurchase times. Thus the result on heavy consumers may be driven by the fact that when regularity is measured by the coefficient of variation, the dead period of the delayed exponential and the higher value of p for heavy consumers make these consumers 'look' more regular than the corresponding light consumers even though the behavior of the heavy consumers may be just as 'random'.

While these deviations from exponential purchase behavior are not critical in estimating rates from frequency data, we have shown that they must be addressed very carefully when analyzing recency data. In the case of the frequency data, Morrison and Schmittlein (1988) show that the negative binomial distribution works well because typical deviations from the model's assumptions, including that of exponential interpurchase times, tend to balance out. In the case of recency data, though, we cannot take the same cavalier approach.
Deviations from the exponential interpurchase time assumption must be examined and appropriate 'correction factors' applied to the recency data collected.

References

Allison, D. (1985), "Survival analysis of backward recurrence times", Journal of the American Statistical Association 80, 315-322.
Buchanan, B., and Morrison, D.G. (1987), "Sampling properties of rate questions with implications for survey research", Marketing Science 6, 286-298.
Chatfield, C., and Goodhardt, G.J. (1973), "A consumer purchasing model with Erlang interpurchase times", Journal of the American Statistical Association 68, 828-835.
Dunn, R., Reader, S., and Wrigley, N. (1983), "An investigation of the assumptions of the NBD model as applied to purchasing at individual stores", Applied Statistics 32/3, 249-259.
Feller, W. (1971), An Introduction to Probability Theory and its Applications, Vol. II, 2nd ed., Wiley, New York.
Greene, J. (1982), Consumer Behavior Models for Non-Statisticians, Praeger, New York.
Gupta, S. (1988), "Impact of sales promotions on when, what, and how much to buy", Journal of Marketing Research 25, 342-355.
Herniter, J. (1971), "A probabilistic market model of purchase timing and brand selection", Management Science 18, 102-113.
Kahn, B.E., and Morrison, D.G. (1989), "A note on random purchasing: Additional insights from Dunn, Reader & Wrigley", Applied Statistics 38/1, 111-114.
Lawrence, R.J. (1980), "The Lognormal distribution of buying frequency rates", Journal of Marketing Research 17, 212-220.
Morrison, D.G. (1973), "Some results for waiting times with an application to survey data", The American Statistician 27/5, 226-227.
Morrison, D.G., and Schmittlein, D.C. (1988), "Generalizing the NBD model for consumer purchases: What are the implications and is it worth the effort?", Journal of Business Economics and Statistics 6, 145-159.
Sorensen, A.B. (1977), "Estimating rates from retrospective questions", in: D. Heise (ed.), Sociological Methodology 1977, Jossey-Bass, San Francisco, CA.
Wheat, R.D., and Morrison, D.G. (1990), "Estimating purchase regularity with two interpurchase times", Journal of Marketing Research 27, 87-93.