Determination of panel sizes in designed experiments


BEHRAM J. HANSOTIA


ABSTRACT This article develops formulas for panel size for designed experiments. Using a simple three-attribute problem, the author discusses situations for determining panel sizes in the context of the multiple regression and logistic regression formulations. Modeling member net present value and response rate as functions of the attributes of an acquisition program are the two situations considered. The article also has a pedagogical flavor and discusses how the maximum likelihood approach is used in estimating the variance of the partial logistic regression coefficients. Illustrative examples are provided to show the application of the formulas. For marketers, the bottom line result is useful: testing with designed experiments requires far fewer observations than classical tests of hypotheses, where pairwise comparisons are carried out. This is because all the information is used simultaneously to estimate model coefficients. For a non-technical overview, see the end of the article.

DETERMINATION OF PANEL SIZES IN DESIGNED EXPERIMENTS

In a previous article (2) we discussed the use of factorial and fractional factorial designs to collect information for modeling response rates as a function of offer attributes. We recommended the use of the probit and logit models, rather than the OLS regression in that situation, because of the problems associated with OLS in modeling a binary response variable. This article addresses the problem of setting the panel size for factorial and fractional factorial designs. Using a simple three-attribute problem, we first discuss the situation where OLS is appropriate and then extend the problem to the logistic regression case.

Panel Size for Regression Models Using Factorial Designs

The single most important factor affecting the lifetime value (LTV) of a continuity club member is the time period over which the member is an active, dues-paying member. This in turn is affected by the characteristics of the acquisition program, namely, offer and acquisition medium. Suppose we wish to understand the impact of the acquisition medium (mail/telephone) and the use of a premium (present/absent) and a discount (present/absent) on membership lifetime value. To do this we can select eight panels of club members, acquired, say, more than five years ago, with each panel consisting of members characterized by the above variables. Exhibit 1 shows this $2^3$ factorial design. The lifetime value of each panel member can then be calculated as the discounted cash flow over each member's life. All active members (those with member life greater than five years) can be assumed to have the lifetime value of a five-year member. This may be a reasonable conservative assumption if the percentage of active members after five years is small. (If that is not the case, a more specialized technique, tobit, needs to be used.) An OLS regression model can then be built to estimate the effect of the three attributes on customer lifetime value. The question we wish to address here is how many members should be selected for each panel.
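The lifetime value calculation described above can be sketched as a truncated discounted cash flow. This is only an illustration under assumptions the article does not state: cash flows are taken to arrive once a year at year end, and the function name `lifetime_value`, the discount rate, and the sample cash flows are hypothetical.

```python
def lifetime_value(annual_cash_flows, discount_rate, cap_years=5):
    """Discounted cash flow over a member's life, truncated at cap_years.

    Members active beyond cap_years are valued as cap_years members,
    which is the conservative assumption described in the text.
    Year-end annual cash flows are an assumption of this sketch.
    """
    capped = annual_cash_flows[:cap_years]
    return sum(cf / (1.0 + discount_rate) ** (t + 1) for t, cf in enumerate(capped))


# Hypothetical member who paid 40 in dues for seven years, discounted at 10%.
long_member = lifetime_value([40.0] * 7, 0.10)
# Hypothetical member who left after two years.
short_member = lifetime_value([40.0, 40.0], 0.10)
print(round(long_member, 2), round(short_member, 2))
```

The seven-year member is valued on the first five years only, so the two extra years do not raise the result.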

EXHIBIT 1

$2^3$ Factorial Design for Modeling Lifetime Value

Let:
$x_1$ = Acquisition medium (telephone = -1, mail = 1)
$x_2$ = Premium (present = -1, absent = 1)
$x_3$ = Discount (present = -1, absent = 1)

Panel    x1    x2    x3
1        -1    -1    -1
2         1    -1    -1
3        -1     1    -1
4        -1    -1     1
5         1     1    -1
6         1    -1     1
7        -1     1     1
8         1     1     1

All panels of equal size = r members
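The design in Exhibit 1 can be generated rather than typed by hand. The following is a minimal sketch, not part of the original article. The function name `full_factorial_2k` is hypothetical, and the sort key is chosen only to reproduce the panel order used in Exhibit 1.

```python
from itertools import product


def full_factorial_2k(k):
    """Return all 2^k runs of a two-level design as tuples of -1/+1."""
    runs = list(product((-1, 1), repeat=k))
    # Order as in Exhibit 1: fewest +1 levels first, ties broken in
    # descending lexicographic order of the levels.
    runs.sort(key=lambda run: (sum(run), tuple(-v for v in run)))
    return runs


design = full_factorial_2k(3)
for panel, (x1, x2, x3) in enumerate(design, start=1):
    print(f"Panel {panel}: x1={x1:+d} x2={x2:+d} x3={x3:+d}")
```

Each attribute column sums to zero, which is the balance property the later variance formulas rely on.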

JOURNAL OF DIRECT MARKETING, VOLUME 6 NUMBER 4 AUTUMN 1992

EXAMPLE. THE LIFETIME VALUE REGRESSION MODEL

Defining the variables as follows:

$y_i$ = lifetime value of member in ith panel
$x_{1i}$ = acquisition medium of ith panel (telephone = -1, mail = 1)
$x_{2i}$ = premium (present = -1, absent = 1)
$x_{3i}$ = discount (present = -1, absent = 1)

We wish to estimate the partial regression coefficients $B_0, B_1, B_2, B_3$ of the model:

$$y_i = B_0 + B_1 x_{1i} + B_2 x_{2i} + B_3 x_{3i} + e_i \qquad (1)$$

Here $e_i$ is the error term and is assumed to be identically, independently distributed with a mean of zero and a standard deviation given by $\sigma_{y.x}$. In the above model, if $\hat{B}_i$ represents the estimated value of $B_i$, the expected LTV of members in Panel 1 (in Exhibit 1) is given by:

$$\hat{y}_1 = \hat{B}_0 - \hat{B}_1 - \hat{B}_2 - \hat{B}_3 \qquad (2)$$

and that of members in Panel 2 is given by

$$\hat{y}_2 = \hat{B}_0 + \hat{B}_1 - \hat{B}_2 - \hat{B}_3 \qquad (3)$$

Subtracting (2) from (3) gives us

$$2\hat{B}_1 = (\hat{y}_2 - \hat{y}_1) \qquad (4)$$

$2\hat{B}_1$ thus measures the incremental effect of the mail program over the telemarketing program. Similarly, $2\hat{B}_2$ and $2\hat{B}_3$ measure the incremental effects of the premium and the discount respectively. The $B_i$ are often referred to as contrasts in the statistics literature.

The panel size r, for each of the panels in Exhibit 1, can then be calculated by adapting the sample size formulas for estimating means to the case of estimating the means of the random variables $\hat{B}_i$. This is shown in Figure 1. If T is the tolerance or the maximum error about the true value of $B_i$ that the decision maker is willing to tolerate, and $\sigma(\hat{B}_i)$ is the standard deviation of $\hat{B}_i$, T can be written in terms of $\sigma(\hat{B}_i)$. The lower and upper limits, L and U, of the $(1 - \alpha)$ percent confidence interval for $B_i$ are given by:

$$L = \hat{B}_i - Z_{\alpha/2}\,\sigma(\hat{B}_i)$$

$$U = \hat{B}_i + Z_{\alpha/2}\,\sigma(\hat{B}_i)$$

where $\hat{B}_i$ and $\sigma(\hat{B}_i)$ are the estimated mean and standard deviation of $B_i$ and $Z_{\alpha/2}$ is the standard normal variate with a right-tail area of $\alpha/2$. Defining the tolerance T, a value set by the decision maker, equal to $(U - L)/2$, we have:

$$T = Z_{\alpha/2}\,\sigma(\hat{B}_i) \qquad (5)$$

Since $\sigma(\hat{B}_i)$ is a function of the panel size r, equation (5) can be used to calculate its value. However, before we present the formula for $\sigma(\hat{B}_i)$, it is necessary to introduce some matrix notation.

Formula for $\sigma(\hat{B}_i)$

Ignoring the panels for the time being, and assuming that i in equation (1) takes on values from 1 through 8, equation (1) can be written conveniently in the form:

$$y = xB + e \qquad (6)$$

here,

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_8 \end{bmatrix}, \quad x = \begin{bmatrix} 1 & x_{11} & x_{21} & x_{31} \\ 1 & x_{12} & x_{22} & x_{32} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_{18} & x_{28} & x_{38} \end{bmatrix}, \quad B = \begin{bmatrix} B_0 \\ B_1 \\ B_2 \\ B_3 \end{bmatrix}, \quad e = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_8 \end{bmatrix}$$

It can be shown that (see for example (4)):

$$\hat{B} = (x'x)^{-1}x'y \qquad (7)$$

and

$$\text{Cov}(\hat{B}) = \hat{\sigma}_{y.x}^2\,(x'x)^{-1}$$

Here, $x'$ is the transpose of x (formed by interchanging the rows of x with its columns) and $(x'x)^{-1}$ is the inverse of the 4 × 4 (4 rows, 4 columns) sums of squares and cross product matrix, $x'x$, assumed to be invertible. $\hat{\sigma}_{y.x}^2$ is the estimated variance of the $e_i$, and $\text{Cov}(\hat{B})$ is the 4 × 4 variance-covariance matrix of $\hat{B}$. The x matrix in our example can be easily generated by adding a column of "1s" to the design shown in Exhibit 1. $x$, $x'x$ and $(x'x)^{-1}$ are shown in Exhibit 2. Since the off-diagonal terms of the $(x'x)^{-1}$ matrix are zero, we observe that there is no correlation among the regression coefficients. This is a property of all "orthogonal designs," a larger class of designs to which all $2^k$ factorial designs belong. The variances of the estimated regression coefficients are all equal and given by $\hat{\sigma}_{y.x}^2/8$. Hence, we note that in this instance the variance of $\hat{B}_i$ is estimated by dividing $\hat{\sigma}_{y.x}^2$, the estimated variance of the error term, by the number of rows in x. If each panel has r members, x has 8r rows.

FIGURE 1

Density Function of the Random Variable $\hat{B}_i$ (horizontal axis marks: L, $\hat{B}_i$, U)


EXHIBIT 2

Design, Sums of Squares and Cross-Product, and Inverse of Sums of Squares and Cross-Product Matrices

$$x = \begin{bmatrix} 1 & -1 & -1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & 1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & 1 & 1 & -1 \\ 1 & 1 & -1 & 1 \\ 1 & -1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}, \quad x'x = \begin{bmatrix} 8 & 0 & 0 & 0 \\ 0 & 8 & 0 & 0 \\ 0 & 0 & 8 & 0 \\ 0 & 0 & 0 & 8 \end{bmatrix}, \quad (x'x)^{-1} = \begin{bmatrix} 1/8 & 0 & 0 & 0 \\ 0 & 1/8 & 0 & 0 \\ 0 & 0 & 1/8 & 0 \\ 0 & 0 & 0 & 1/8 \end{bmatrix}$$

With r members in each panel:

$$\hat{\sigma}^2(\hat{B}_i) = \hat{\sigma}_{y.x}^2 / 8r \qquad (8)$$

Substituting (8) into (5) and simplifying we have:

$$T = Z_{\alpha/2}\,\hat{\sigma}_{y.x} / \sqrt{8r}$$

giving us:

$$r = Z_{\alpha/2}^2\,\hat{\sigma}_{y.x}^2 / 8T^2 \qquad (9)$$

Hence, if $\hat{\sigma}_{y.x}$ is known we can estimate r. However, $\hat{\sigma}_{y.x}$ is rarely known and thus must first be estimated.

Estimating $\hat{\sigma}_{y.x}$

The variance of the error term, $\hat{\sigma}_{y.x}^2$, is estimated by MSE, the mean square error. However, we can obtain a value for MSE only after building the regression model. We get around this problem by first estimating the variance of the criterion variable by $S_y^2$ and assuming an adjusted coefficient of determination, $R_{Adj}^2$, for the yet-to-be built regression model. This allows us to calculate an estimated value for $\hat{\sigma}_{y.x}$. The estimated variance of y, $S_y^2$, may be available from prior research or can be calculated by first selecting a random pilot sample of customers acquired, say, five years ago, then calculating the lifetime value of each customer and finally computing the variance of the lifetime values. This problem is no different than the one where the decision maker wishes to estimate the mean of a population and must know the value of the standard deviation to calculate the appropriate sample size. Since $S_y^2$ is based on a pilot sample, to be conservative we recommend using the upper bound of a $(1 - \alpha)$ confidence interval for the population variance of y. Exhibit 3 derives an expression for $\hat{\sigma}_{y.x}^2$ and substitutes it in equation (9), providing a formula for the panel size for our example problem. Exhibit 4 shows the application of this formula.

EXHIBIT 3

Formula for Panel Size

Let:
n be the size of the sample used to estimate the standard deviation of y
$S_y^2$ be the calculated (sample) variance of y

Since $nS_y^2/\sigma_y^2$ has a chi-square distribution, the $(1 - \alpha)$ confidence interval for the population variance of y, $\sigma_y^2$, is given by

$$\frac{nS_y^2}{\chi^2_{1-\alpha/2,(n-1)}} \le \sigma_y^2 \le \frac{nS_y^2}{\chi^2_{\alpha/2,(n-1)}} \qquad (10)$$

Here, $\chi^2_{\alpha/2,(n-1)}$ and $\chi^2_{1-\alpha/2,(n-1)}$ are the $\alpha/2$ and $(1 - \alpha/2)$ fractiles of the chi-square distribution with $(n - 1)$ degrees of freedom.

$$R_{Adj}^2 = 1 - \text{MSE}/\text{MST} = 1 - \hat{\sigma}_{y.x}^2/\hat{\sigma}_y^2 \qquad (11)$$

Here, MST estimates $\sigma_y^2$. However, instead of using $S_y^2$ for MST, we recommend substituting the upper bound of the $(1 - \alpha)$ confidence interval of $\sigma_y^2$ from (10) into (11) and solving for $\hat{\sigma}_{y.x}^2$:

$$\hat{\sigma}_{y.x}^2 = n(1 - R_{Adj}^2)\,S_y^2 / \chi^2_{\alpha/2,(n-1)} \qquad (12)$$

Finally, substituting (12) into (9) gives us:

$$r = \frac{1}{8}\,\frac{Z_{\alpha/2}^2}{T^2}\cdot\frac{n(1 - R_{Adj}^2)\,S_y^2}{\chi^2_{\alpha/2,(n-1)}} \qquad (13)$$

EXHIBIT 4

Calculating the Panel Size

Assume that the size of the random pilot sample is 51 and the calculated value of the variance of LTV for these 51 customers is 81 (dollars)². If previous work leads management to believe that at least 40 percent of the variability in LTV can be explained by acquisition medium and the presence or absence of a premium and an initial discount, what should be the size of the panels? Management would also like to estimate the effect of the attributes within one dollar of their true value and be 95 percent confident in the results.

Here, n = 51, $S_y^2$ = 81, $R_{Adj}^2$ = 0.40, T = 0.5 (tolerance for $2B_i$ = 1), $\alpha$ = 0.05, $Z_{0.025}$ = 1.96.

Assuming an $\alpha$ of 0.05 also for the pilot sample, $\chi^2_{0.025,50}$ = 32.36.

Hence,

$$r = \frac{(1.96)^2}{8\,(0.5)^2}\cdot\frac{51\,(1 - 0.40)\,(81)}{32.36} \approx 147.1$$

Hence, panels of size 150 each appear to be adequate for this problem.

For the Bonferroni joint confidence interval, s = 4, $Z_{\alpha/2s} = Z_{\alpha/8}$ = 2.50 for large n, and the panel size, r, is 239.35 or about 240.

Panel Size for Family of Confidence Intervals

After the regression model is built the partial regression coefficients would normally be presented as confidence intervals as shown in Figure 1. Equation (5) and thus equation (13) are based on presenting the confidence intervals separately for each of the partial regression coefficients. If we were interested in a joint $(1 - \alpha)$ confidence level for all partial regression coefficients, the only change in formulas (5) and (13) would be the replacement of $Z_{\alpha/2}$ by $t(\alpha/2s;\ n - s)$. This gives us the Bonferroni joint confidence intervals. Here, s is the number of confidence intervals (which in this case would be 4) and n, the total number of observations for the $2^3$ full factorial design, is 8r. Since typically 8r is a number at least in the hundreds, $Z_{\alpha/2}$ can be replaced by $Z_{\alpha/2s}$ rather than the t statistic. The panel sizes based on the Bonferroni joint confidence intervals would then be given by:

$$r = \frac{1}{8}\,\frac{Z_{\alpha/2s}^2}{T^2}\cdot\frac{n(1 - R_{Adj}^2)\,S_y^2}{\chi^2_{\alpha/2,(n-1)}}$$

where n in this formula is, as in (13), the size of the pilot sample.

Panel Size for a One-Half Factorial Design

If a one-half fractional factorial design consisting of 4 panels is used, the formula for panel size changes in equation (13) only slightly. Instead of the 1/8 we would have a 1/4. Hence, we would need twice the number of members per panel, resulting in no savings. Note, however, since four parameters are being estimated the mean square error estimate would now represent only the variation within treatment (panel) combinations and is thus the pure experimental error variation. The one-half factorial thus does not allow us to test for lack-of-fit, since the interaction terms will be biased with main effects (see (4) for further details).
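Equations (9), (12) and (13) reduce to a few lines of arithmetic. The sketch below is an illustration, not the author's code. It reuses the inputs stated in Exhibit 4, takes the chi-square fractile 32.36 as given there because the standard library has no chi-square quantile function, and uses the hypothetical names `panel_size_ols` and `z_upper`.

```python
from statistics import NormalDist


def panel_size_ols(z, tol, n_pilot, s2_pilot, r2_adj, chi2_lower, n_panels=8):
    """Members per panel for a two-level orthogonal design, equation (13)."""
    # Equation (12): conservative estimate of the error variance.
    sigma2_yx = n_pilot * (1.0 - r2_adj) * s2_pilot / chi2_lower
    # Equation (9), with the number of panels in place of the 8.
    return z ** 2 * sigma2_yx / (n_panels * tol ** 2)


def z_upper(tail_area):
    """Standard normal variate with the given right-tail area."""
    return NormalDist().inv_cdf(1.0 - tail_area)


exhibit4 = dict(tol=0.5, n_pilot=51, s2_pilot=81.0, r2_adj=0.40, chi2_lower=32.36)

r_separate = panel_size_ols(z=1.96, **exhibit4)            # separate intervals
r_joint = panel_size_ols(z=2.50, **exhibit4)               # Bonferroni, s = 4
r_half = panel_size_ols(z=1.96, n_panels=4, **exhibit4)    # one-half fraction

print(round(r_separate, 1), round(r_joint, 1), round(r_half, 1))
print(round(z_upper(0.05 / 2), 2), round(z_upper(0.05 / 8), 2))
```

Passing `n_panels=4` doubles the members needed per panel, which matches the observation that the one-half fraction yields no savings in total sample.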

Panel Size for the Logit or Logistic Regression Model

If we are interested in understanding the effects of offer attributes on response rates, the logistic regression model can be used to estimate the response rate as a function of offer attributes. We discuss here first the maximum likelihood approach to estimating this model since we will need this background to estimate the variance of the partial logistic regression coefficients. Our presentation is based on Goldberger's (1) probit analysis model. As before, assume $x_{1i}$, $x_{2i}$ and $x_{3i}$ are the attributes of the ith offer and $Z_i$, the underlying unobservable response variable for the ith individual, is defined by:

$$Z_i = B_0 + B_1 x_{1i} + B_2 x_{2i} + B_3 x_{3i} + V_i \qquad (14)$$

Depending on the assumptions made about $V_i$, different models can be obtained from this formulation via the maximum likelihood approach. For example, if $V_i$ is assumed to be normally distributed, with mean zero, we have the probit model. If $V_i$ is assumed to have a logistic distribution, we have the logit or logistic regression model. We assume here that $V_i$ has a logistic distribution. If $F(t)$ is the cumulative distribution function of the identically, independently distributed $V_i$ we have:

$$F(t) = \text{Prob}(V_i \le t) = \frac{e^t}{1 + e^t} \qquad (15)$$

Here, e is approximately equal to 2.72.

In practice what we observe is not $Z_i$ but the response $y_i$, which may be coded 1 or 0 depending on whether $Z_i$ is positive or not. That is,

$$y_i = \begin{cases} 1 & \text{if } Z_i > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (16)$$

Using (14), (15) and (16) we get:

$$\text{Prob}(y_i = 1) = \text{Prob}(Z_i > 0) = \text{Prob}(V_i > -B_0 - B_1x_{1i} - B_2x_{2i} - B_3x_{3i})$$

$$= 1 - F(-B_0 - B_1x_{1i} - B_2x_{2i} - B_3x_{3i}) = \frac{e^{(B_0 + B_1x_{1i} + B_2x_{2i} + B_3x_{3i})}}{1 + e^{(B_0 + B_1x_{1i} + B_2x_{2i} + B_3x_{3i})}} \qquad (17)$$

and

$$\text{Prob}(y_i = 0) = \text{Prob}(Z_i \le 0) = \frac{1}{1 + e^{(B_0 + B_1x_{1i} + B_2x_{2i} + B_3x_{3i})}} \qquad (18)$$

In the maximum likelihood approach to estimating $B_i$ we first write down the joint probability, or likelihood, of observing the sample (as a function of $B_i$) and then search over the values of $B_i$ that maximize this function. Typically, instead of maximizing the likelihood function we maximize the log-likelihood function. This gives us the same estimates for $B_i$ but makes the algebra considerably simpler. If we assume that the total number of observations is N (N = 8r for the $2^3$ factorial design) and without loss of generality if we assume that $y_i = 1$ for the first c observations and $y_i = 0$ for the remaining N - c observations, then the likelihood function L is given by:

$$L = [\text{Prob}(y_1 = 1)\cdot\text{Prob}(y_2 = 1)\cdots\text{Prob}(y_c = 1)]\cdot[\text{Prob}(y_{c+1} = 0)\cdots\text{Prob}(y_N = 0)] \qquad (19)$$

Since the likelihood function is the probability of observing the sample, it is easy to interpret. The observed values of $y_i$ are just realizations of a binomial process, like flipping a coin, with probabilities of response and non-response given by (17) and (18). The terms in the first set of brackets in expression (19) represent the probability that the first individual responds and the second individual responds ... and the cth individual responds. Since these are independent events, the probabilities are multiplied. Similarly the terms within the second set of brackets represent the probability of the remaining N - c individuals not responding. Using the $\Pi$ notation to represent products, expression (19) may be written as:

$$L = \prod_{i=1}^{c}\text{Prob}(y_i = 1)\prod_{i=c+1}^{N}\text{Prob}(y_i = 0) \qquad (20)$$

Substituting (17) and (18) into (20) and simplifying results in:

$$L = \prod_{i=1}^{N}\left(\frac{e^{(B_0 + B_1x_{1i} + B_2x_{2i} + B_3x_{3i})}}{1 + e^{(B_0 + B_1x_{1i} + B_2x_{2i} + B_3x_{3i})}}\right)^{y_i}\left(\frac{1}{1 + e^{(B_0 + B_1x_{1i} + B_2x_{2i} + B_3x_{3i})}}\right)^{1-y_i} \qquad (21)$$

The variance-covariance matrix of $\hat{B}_i$ can be derived from equation (21) by taking the second order partial derivatives of log L with respect to $B_i$. (See for example (3).) The expected value of the negative of the matrix of second order partial derivatives is the information matrix, $I(B)$, and the variance-covariance matrix of $\hat{B}_i$ is the inverse of $I(B)$. The estimated values for the variance of the partial logistic regression coefficients, $\text{Var}(\hat{B}_i)$, are thus the diagonal terms of the matrix $[I(\hat{B})]^{-1}$. Denoting the ith row of the x matrix in (6) by $X_i'$ and again ignoring the panels for the time being we have:

$$X_i' = (1 \quad x_{1i} \quad x_{2i} \quad x_{3i})$$
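The likelihood in (21) and its logarithm can be written out directly. The following is a minimal sketch on hypothetical toy data, intended only to show that the product form and the log form describe the same quantity. The coefficient values and the four observations are made up for illustration, and no fitting routine is implied.

```python
import math


def response_prob(b, x):
    """Equation (17): Prob(y = 1) for coefficients b and row x = (1, x1, x2, x3)."""
    eta = sum(bj * xj for bj, xj in zip(b, x))
    return math.exp(eta) / (1.0 + math.exp(eta))


def likelihood(b, rows, ys):
    """Equation (21): product of response and non-response probabilities."""
    total = 1.0
    for x, y in zip(rows, ys):
        p = response_prob(b, x)
        total *= p ** y * (1.0 - p) ** (1 - y)
    return total


def log_likelihood(b, rows, ys):
    """Logarithm of equation (21), the function maximized in practice."""
    value = 0.0
    for x, y in zip(rows, ys):
        p = response_prob(b, x)
        value += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return value


# Hypothetical observations: responders happen to be the mail (x1 = 1) rows.
rows = [(1, -1, -1, -1), (1, 1, -1, -1), (1, -1, 1, -1), (1, 1, 1, 1)]
ys = [0, 1, 0, 1]

b_null = (0.0, 0.0, 0.0, 0.0)   # every response probability is 0.5
b_mail = (0.0, 1.0, 0.0, 0.0)   # positive effect for x1 only

print(likelihood(b_null, rows, ys))
print(log_likelihood(b_mail, rows, ys) > log_likelihood(b_null, rows, ys))
```

Coefficients that agree with the observed pattern give a higher log-likelihood, which is the criterion the maximum likelihood search climbs.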

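It can then be shown that the information matrix, $I(B)$, is given by (3):

$$I(B) = \sum_{i=1}^{n}\frac{e^{B'X_i}}{(1 + e^{B'X_i})^2}\,X_iX_i' \qquad (22)$$

Here $B' = (B_0 \quad B_1 \quad B_2 \quad B_3)$.

Note, the coefficient of the $X_iX_i'$ matrix in expression (22) is basically the product of expressions (17) and (18). Hence,

$$\frac{e^{B'X_i}}{(1 + e^{B'X_i})^2} = \text{Prob}(y_i = 1)\cdot\text{Prob}(y_i = 0) = \text{Var}(y_i)$$

$$I(B) = \sum_{i=1}^{n}\text{Var}(y_i)\,X_iX_i' \qquad (24)$$

and

$$X_iX_i' = \begin{bmatrix} 1 & x_{1i} & x_{2i} & x_{3i} \\ x_{1i} & (x_{1i})^2 & x_{1i}x_{2i} & x_{1i}x_{3i} \\ x_{2i} & x_{2i}x_{1i} & (x_{2i})^2 & x_{2i}x_{3i} \\ x_{3i} & x_{3i}x_{1i} & x_{3i}x_{2i} & (x_{3i})^2 \end{bmatrix}$$

In the appendix we list the eight $X_iX_i'$ matrices when $X_i'$ is the ith row of the x matrix in Exhibit 2 and show that $I(B)$ can be written as:

$$I(B) = \begin{bmatrix} \sum \text{Var}(y_i) & 0 & 0 & 0 \\ 0 & \sum \text{Var}(y_i) & 0 & 0 \\ 0 & 0 & \sum \text{Var}(y_i) & 0 \\ 0 & 0 & 0 & \sum \text{Var}(y_i) \end{bmatrix}$$

Since $I(B)$ is a diagonal matrix, its inverse is also diagonal, with each diagonal term equal to $1/\sum \text{Var}(y_i)$.

Note, as expected, since the $2^3$ factorial design is orthogonal, the covariance terms are all zero and,

$$\text{Var}(\hat{B}_j) = 1\Big/\sum_{i=1}^{8}\text{Var}(y_i) \quad \text{for } j = 0, 1, 2, 3 \qquad (27)$$

The denominator in equation (27) is the sum of the variances of the eight rows. If each row is a panel consisting of r observations we have:

$$\text{Var}(\hat{B}_j) = 1\Big/\,r\sum_{i=1}^{8}\text{Var}(y_i) \qquad (28)$$

Writing (5) in terms of $\text{Var}(\hat{B}_j)$ we have:

$$T = Z_{\alpha/2}\sqrt{\text{Var}(\hat{B}_j)} = Z_{\alpha/2}\Big/\sqrt{r\sum_{i=1}^{8}\text{Var}(y_i)}$$

or

$$r = Z_{\alpha/2}^2\Big/\,T^2\sum_{i=1}^{8}\text{Var}(y_i) \qquad (30)$$

Hence, calculating the panel size, r, requires the estimation of the variance of response for each panel. This would require guessing the value of the response rate for each panel since:

$$\text{Var}(y_i) = \text{Prob}(y_i = 1)\cdot[1 - \text{Prob}(y_i = 1)]$$

Note, T is the tolerance for $\hat{B}_j$. However, $\hat{B}_i$ is not the estimated effect of the ith variable on response rate, rather it is the estimated effect of the ith variable on log odds. This is easy to see, since equation (17) can be rewritten in terms of the estimated parameters $\hat{B}_i$ as:

$$\ln\left(\hat{p}/(1 - \hat{p})\right) = \hat{B}_0 + \hat{B}_1x_1 + \hat{B}_2x_2 + \hat{B}_3x_3 \qquad (32)$$

here, $\hat{p} = \text{Prob}(y = 1)$.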
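Equations (24), (28) and (30) can be evaluated once response rates are assumed for the eight panels. The sketch below is an illustration, not the author's code. It uses the response rates, the Z value of 1.65 and the tolerance T = 0.0996 exactly as Exhibit 5 states them, and the function names are hypothetical. One point worth seeing in the output: the off-diagonal sums of (24) vanish exactly when every panel has the same variance, and with the unequal rates assumed here they are small relative to the diagonal.

```python
# Rows of the x matrix (column of 1s plus the Exhibit 1 design).
DESIGN_ROWS = [
    (1, -1, -1, -1), (1, 1, -1, -1), (1, -1, 1, -1), (1, -1, -1, 1),
    (1, 1, 1, -1), (1, 1, -1, 1), (1, -1, 1, 1), (1, 1, 1, 1),
]
# Assumed response rates for panels 1 through 8, as listed in Exhibit 5.
RATES = [0.0100, 0.0115, 0.0120, 0.0130, 0.0135, 0.0145, 0.0150, 0.0165]


def information_matrix(rows, rates, r=1):
    """Equation (24), with r observations in each panel."""
    size = len(rows[0])
    info = [[0.0] * size for _ in range(size)]
    for x, p in zip(rows, rates):
        var_y = p * (1.0 - p)          # variance of a 0/1 response
        for j in range(size):
            for k in range(size):
                info[j][k] += r * var_y * x[j] * x[k]
    return info


def var_coefficient(rates, r):
    """Equation (28): variance of each coefficient under the diagonal form."""
    return 1.0 / (r * sum(p * (1.0 - p) for p in rates))


def panel_size_logit(z, tol, rates):
    """Equation (30): members per panel."""
    return z ** 2 / (tol ** 2 * sum(p * (1.0 - p) for p in rates))


equal = information_matrix(DESIGN_ROWS, [0.012] * 8)
unequal = information_matrix(DESIGN_ROWS, RATES)
r_separate = panel_size_logit(z=1.65, tol=0.0996, rates=RATES)
r_joint = panel_size_logit(z=2.50, tol=0.0996, rates=RATES)

print(round(unequal[0][0], 4), round(unequal[0][1], 4))
print(round(r_separate), round(r_joint))
```

The panel sizes agree with the 2,624 and 6,024 reported in Exhibit 5 to within rounding, since the exhibit works from variances rounded to four decimals.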

Effect of Other Attributes

We next show that $e^{2\hat{B}_i}$ is the multiplicative estimated effect of the ith offer attribute on the odds ratio. Defining O as the odds ratio, its estimated value, $\hat{O}$, may be written as:

$$\hat{O} = \hat{p}/(1 - \hat{p}) = e^{(\hat{B}_0 + \hat{B}_1x_1 + \hat{B}_2x_2 + \hat{B}_3x_3)} \qquad (33)$$

For panel 1, $x_1 = -1$, $x_2 = -1$ and $x_3 = -1$. Hence,

$$\hat{O}_1 = e^{(\hat{B}_0 - \hat{B}_1 - \hat{B}_2 - \hat{B}_3)}$$

For panel 2, $x_1 = 1$, $x_2 = -1$ and $x_3 = -1$, and

$$\hat{O}_2 = e^{(\hat{B}_0 + \hat{B}_1 - \hat{B}_2 - \hat{B}_3)}$$

Therefore,

$$\hat{O}_2 = (e^{2\hat{B}_1})\,\hat{O}_1 \qquad (34)$$

That is, $e^{2\hat{B}_i}$ is the estimated multiplicative effect of the ith variable on the odds ratio. Similarly, it can be shown that:

$$\hat{O}_3 = (e^{2\hat{B}_2})\,\hat{O}_1 \qquad (35)$$

and

$$\hat{O}_4 = (e^{2\hat{B}_3})\,\hat{O}_1 \qquad (36)$$

Expressions (34)-(36) are important since they provide a convenient mechanism for setting the tolerance. For example, if panel 1 is estimated to have a response rate of 1 percent and panel 2 a response rate of, say, 1.25% then:

$$\hat{O}_1 = 0.01/0.99 = 1.01\%$$

and

$$\hat{O}_2 = 0.0125/0.9875 = 1.27\%$$

resulting in:

$$1.27 = (e^{2\hat{B}_1})\,1.01, \qquad e^{2\hat{B}_1} = 1.2574$$

Hence, if we assigned a tolerance of 15 percent of 1.2574 to $e^{2\hat{B}_1}$, the $(1 - \alpha)$ confidence interval would be:

$$1.0688 \le e^{2\hat{B}_1} \le 1.4460$$

or,

$$0.0332 \le \hat{B}_1 \le 0.1844$$

i.e., 2T = 0.1844 - 0.0332 = 0.1512, or, T = 0.0756.

Hence, the tolerance for $\hat{B}_i$ can be obtained by setting the tolerance on the multiplicative effect of the ith variable on the odds ratio. Exhibit 5 shows the application of these equations.

EXHIBIT 5

Calculating the Panel Size for a $2^3$ Factorial Design

Suppose management wishes to estimate the effects of a premium, a discount on membership dues for the first three months, and a fast 50 response initiative, where the first 50 responders are rewarded with a gift. If the plan is to use a $2^3$ factorial design, how large should each panel be? Several assumptions need to be made before equation (30) can be applied. We first assume a confidence level, $(1 - \alpha)$, of 90 percent. This gives a value for $Z_{\alpha/2}$ of 1.65. We next assume response rates for the eight panels shown in Exhibit 1 as follows:

Panel    Assumed Response Rate (%)    Assumed Variance
1        1.00                         0.0099
2        1.15                         0.0114
3        1.20                         0.0119
4        1.30                         0.0128
5        1.35                         0.0133
6        1.45                         0.0143
7        1.50                         0.0148
8        1.65                         0.0162
Total Variance                        0.1046

Since we believe the third variable, the presence of the fast 50 response initiative (Panel 4), has a larger impact on response rates than either the premium or the discount, we will set our tolerance as a percent of the effect of this variable on the odds ratio.

$$\hat{O}_1 = 0.01/0.99 = 1.01\% \quad \text{and} \quad \hat{O}_4 = 0.0130/0.9870 = 1.317\%$$

Hence, $1.317 = (e^{2\hat{B}_3})\,1.01$, or, $e^{2\hat{B}_3} = 1.3041$.

Assuming the tolerance for $e^{2\hat{B}_3}$ is 15% of 1.3041 or 0.1956 we have

$$1.1085 \le e^{2\hat{B}_3} \le 1.4997$$

or, $0.1030 \le \hat{B}_3 \le 0.2026$. Hence, T = 0.0996.

Since $\sum_{i=1}^{8}\text{Var}(y_i) = 0.1046$, equation (30) gives us:

$$r = Z_{\alpha/2}^2\Big/\,T^2\sum_{i=1}^{8}\text{Var}(y_i) = (1.65)^2/\left((0.0996)^2 \times 0.1046\right) = 2{,}624$$

To calculate the panel size based on the Bonferroni joint confidence interval, substitute $Z_{\alpha/2s}$ = 2.50 for $Z_{\alpha/2}$ in equation (30), giving a panel size of 6,024.

SUMMARY

Formulas were developed here for calculating the panel size for factorial designs for data used to build OLS regression and logistic regression models. Though the formulas were developed for a specific $2^3$ factorial design, they can be easily adapted to any $2^k$ design or fraction thereof. For the examples provided, the recommended approach is clearly more efficient than classical tests of hypotheses based on pairwise comparisons. The required panel sizes here are significantly smaller since all the data are used simultaneously. The modeling approach also lends itself to estimating interaction effects if they are present. However, as discussed, this requires a full factorial design rather than a fractional factorial for the three variable case. Once models are developed they can be used to estimate the response rates of different creative-offer combinations not tested previously. For example, if the response rate of a new creative is known for a given offer, the response rate for the same creative but for a different offer (in terms of combinations of previously tested offer attributes) can be estimated, since the effects of offer attributes are known. Because the experimental design and modeling approach is both cheaper than the classical testing approach and provides a much richer interpretation of the results, it is hoped that it becomes more widely accepted in the direct marketing community.

APPENDIX

Derivation of the Information Matrix for the $2^3$ Factorial Design

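$X_1' = (1 \quad -1 \quad -1 \quad -1)$
$X_2' = (1 \quad 1 \quad -1 \quad -1)$
$X_3' = (1 \quad -1 \quad 1 \quad -1)$
$X_4' = (1 \quad -1 \quad -1 \quad 1)$
$X_5' = (1 \quad 1 \quad 1 \quad -1)$
$X_6' = (1 \quad 1 \quad -1 \quad 1)$
$X_7' = (1 \quad -1 \quad 1 \quad 1)$
$X_8' = (1 \quad 1 \quad 1 \quad 1)$

$$X_1X_1' = \begin{bmatrix} 1 & -1 & -1 & -1 \\ -1 & 1 & 1 & 1 \\ -1 & 1 & 1 & 1 \\ -1 & 1 & 1 & 1 \end{bmatrix} \qquad X_2X_2' = \begin{bmatrix} 1 & 1 & -1 & -1 \\ 1 & 1 & -1 & -1 \\ -1 & -1 & 1 & 1 \\ -1 & -1 & 1 & 1 \end{bmatrix}$$

$$X_3X_3' = \begin{bmatrix} 1 & -1 & 1 & -1 \\ -1 & 1 & -1 & 1 \\ 1 & -1 & 1 & -1 \\ -1 & 1 & -1 & 1 \end{bmatrix} \qquad X_4X_4' = \begin{bmatrix} 1 & -1 & -1 & 1 \\ -1 & 1 & 1 & -1 \\ -1 & 1 & 1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}$$

$$X_5X_5' = \begin{bmatrix} 1 & 1 & 1 & -1 \\ 1 & 1 & 1 & -1 \\ 1 & 1 & 1 & -1 \\ -1 & -1 & -1 & 1 \end{bmatrix} \qquad X_6X_6' = \begin{bmatrix} 1 & 1 & -1 & 1 \\ 1 & 1 & -1 & 1 \\ -1 & -1 & 1 & -1 \\ 1 & 1 & -1 & 1 \end{bmatrix}$$

$$X_7X_7' = \begin{bmatrix} 1 & -1 & 1 & 1 \\ -1 & 1 & -1 & -1 \\ 1 & -1 & 1 & 1 \\ 1 & -1 & 1 & 1 \end{bmatrix} \qquad X_8X_8' = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}$$

$\text{Var}(y_i)\,X_iX_i'$ in equation (24) is obtained by multiplying each cell of the $X_iX_i'$ matrix by $\text{Var}(y_i)$. Hence,

$$I(B) = \sum_{i=1}^{8}\text{Var}(y_i)\,X_iX_i' = \begin{bmatrix} \sum \text{Var}(y_i) & 0 & 0 & 0 \\ 0 & \sum \text{Var}(y_i) & 0 & 0 \\ 0 & 0 & \sum \text{Var}(y_i) & 0 \\ 0 & 0 & 0 & \sum \text{Var}(y_i) \end{bmatrix}$$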

REFERENCES

1. Goldberger, A. S. (1964), Econometric Theory, NY: John Wiley & Sons.
2. Hansotia, Behram J. (1990), "Sample Size and Design of Experiment Issues in Testing Offers," Journal of Direct Marketing, 4 (Autumn), 15-25.
3. Maddala, G. S. (1985), Limited-Dependent and Qualitative Variables in Econometrics, Cambridge: Cambridge University Press.
4. Myers, Raymond H. (1976), Response Surface Methodology, independently published.

NON-TECHNICAL OVERVIEW

The real world has long been a laboratory for direct marketers and the gospel of testing has been extensively preached since the early days of our field. Testing, of course, is both art and science, and in setting up a budget for testing, a company has to address several strategic issues. It needs to:

- Assess its knowledge base: what it knows and does not know. This is often driven by its strategic plan, objectives, and management practices.
- Prioritize the information that needs to be developed.
- Develop a plan for the sequence and timing for obtaining this information.
- Determine costs and resources needed for each test, which in turn are driven by each test's experimental design.

This article addresses the last issue: how tests or experiments should be designed to gather the most information at the least cost. Historically, marketers have tested one attribute or concept at a time. Typically, the response of a treated group is compared to that of the control group. Each such test is then separately evaluated. Generally, the better direct marketers are aware of the problems of sampling error and avoid its pitfalls by using large samples based on appropriate statistical sample size formulas. Typically, this approach results in fairly large samples, generally in the vicinity of 20,000 to 40,000, for response rates of around one percent and the typical assumptions of tolerance and confidence levels.

The methods discussed in this article, on the other hand, generally require far smaller samples. The key difference is that instead of making several different pairwise comparisons, or comparisons of several test results to desired values, the experimental design approach takes a wholistic approach to the problem. The approach is wholistic yet not gestaltic, in that it assumes that the "whole" is the sum of its parts. For ease of exposition we consider two specific problems. In the first instance we are interested in determining the effect of offer attributes on a customer's lifetime value and in the second, the effect of offer attributes on response rates. The problem is positioned in the context of a continuity club, such as an auto or travel club, and we assume that different offers can be designed by manipulating three offer attributes which can be "on" or "off." The methods discussed, of course, do not have any specific limitations regarding the number of attributes but for the sake of simplicity we have assumed a small number. The formulas derived in the article can, of course, be extended to any number of attributes. Since there are three attributes we can have a total of eight different offers, each consisting of different combinations of offer attributes. The experimental design approach consists of testing all eight offers (a full factorial design) or a fraction thereof (fractional factorial design) and fitting a function, often referred to as the response function, to the test results. The offer attributes in this instance are represented by binary valued (on/off) variables. The key question addressed in this study is how large each test panel should be.

For the first problem, since we are modeling a continuous, or metric variable, namely customer value, the appropriate modeling technique is multivariate regression. To be technically accurate, we call it OLS, or Ordinary Least Squares regression, since there also exists a somewhat less restrictive regression model called GLS, or Generalized Least Squares. For the second problem, we use the logistic regression model, since we are interested in predicting response rates. This model is similar to the regression model; however, instead of assuming response rate to be the target, or criterion variable, it assumes a non-linear function of response rate, namely, the natural logarithm of the odds ratio, to be the criterion variable. This ensures that the predicted response rate always lies between zero and one. The model is also estimated via a significantly more computer-intensive technique, called the method of maximum likelihood. We also discuss the relevant mathematics of this method as it applies to our sample size problem.

The panel size formulas are derived in a fashion similar to that in classical tests of hypotheses. We assume a certain tolerance and confidence level in estimating the partial regression coefficients of the models. This requires estimation of the standard error of the partial regression and logistic regression coefficients. A series of numerical examples is also provided to show how the formulas can be applied.