Journal of Econometrics 7 (1978) 351-372. © North-Holland Publishing Company

OPTIMAL EXPERIMENTAL DESIGN IN ECONOMETRICS
The Time Series Problem

Panagiotis A. PAPAKYRIAZIS*
California Polytechnic State University, San Luis Obispo, CA 93407, USA

Received October 1975, final version received December 1976

Experiments in economics are expensive in terms of dollars and time. This suggests that efficient design is crucial, and that there is a need for econometricians to extend design theory to handle the peculiarities of economic experimentation. This paper concerns optimal experimental design for various time series models. Examples are presented illustrating the improvement in estimation accuracy that can be obtained.
1. Introduction
Until recently, most interest in econometrics was directed toward the data analysis problem. However, there is now a growing recognition of the importance of controlled experimentation in economics, despite the traditional view that economics is a 'non-experimental' discipline. Indeed, many economists have presented convincing evidence that controlled experimentation is possible, where the experimental unit may be a consumer, a firm, an industry, or the economy as a whole.¹

* The author is greatly indebted to Professors John Conlisk, John Hooper, and Ramachandra Ramanathan of the University of California, San Diego, for their helpful comments and suggestions. Thanks are due to an editor and a referee of the Journal for useful comments on an earlier version of the paper.
¹ See, for example, Castro and Weingarten (1970) and Naylor (1971).

A brief review of the experimental economics literature may be helpful. The most important contributions in the field of experimental economics may be roughly categorized into: (1) Real-World Experiments, (2) Game Experiments, and (3) Computer Simulation Experiments.

(1) Real-World Experiments are experiments in which subjects respond to controlled policy parameters in real-world situations. Perhaps the best known real-world experiment is the New Jersey Graduated Work Incentive (negative income tax) Experiment, which was an attempt by the federal government to use the experimental method to answer, once and for all, some of the policy questions that surrounded welfare reform in the mid-1960s.² Other real-world experiments are the North Carolina and Iowa rural negative tax experiment and the Seattle and Gary negative tax experiments.³ We are likely to see more of this type of experimentation in the near future; experiments on manpower, housing subsidies, and health insurance are currently being proposed. Experiments on the effectiveness of pollution taxes could now be laying the foundation for new antipollution laws, and several developing countries are now engaged in experiments on financial incentives to control population growth. In business economics, real-world experimentation is on the increase.⁴

(2) Game Experiments, in which some human behavior of a system is represented by live participants while other human behavior and the non-human factors of the system are represented by a model, have been used increasingly in microeconomic contexts since 1948.⁵ Examples are indifference curve experiments,⁶ market experiments,⁷ and price expectations experiments.⁸
(3) Computer Simulation Experiments are experiments in which large computers are used to simulate the behavior of models too complicated for analytical solution. Naylor (1971) describes a number of simulation experiments.⁹

In applying the 'new' methodological skills of experimentation, econometricians face some peculiar problems. The relevant statistical design techniques are often inapplicable, because they were developed for simpler situations. Thus, the problems of controlled experimentation in economics provide an opportunity for econometricians to contribute to the literature of experimental design.

The dynamic nature of economic experimentation is the motivation underlying this paper. In almost all types of experiments the experimenter is strongly interested in the dynamic behavior of the experimental subjects. Game experiments and computer simulation experiments are typically dynamic, and real-world experiments are likely to be multiperiod. The New Jersey Graduated Work Incentive Experiment, for example, was to last four years, which raised issues of the dynamics of response. However, since the relevant statistical design literature was not dynamic, the designers treated the sample of families as a cross-section of data, despite the time series complications [Conlisk and Watts (1969)]. But, as this paper suggests, good designs for a simple cross-section experiment may be poor designs for a time series or cross-section-of-time-series experiment. This suggests the need for analyzing time series 'optimum' designs, which is the purpose of this paper.

The first systematic attempt at optimal design for a time series model seems to have been Box and Jenkins (1970, pp. 416-420). Although they illustrate optimal design with a simple and elegant example, their main purpose is to point out the problems. The present paper uses the Box-Jenkins approach to obtain optimal designs for certain time series models not considered by Box and Jenkins.

The organization of this paper is as follows. In section 2 we present a mathematical formulation of the experimental design problem and discuss a number of design criteria. Section 3 extends experimental design to a number of time series models. Examples of efficiency improvement through optimal design are presented in section 4. Section 5 concludes.

² See Orcutt and Orcutt (1969), Orr (1969), Conlisk and Watts (1969) and Watts (1969, 1971).
³ See Bawden (1969), Kurz and Spiegelman (1969) and Kelly and Singer (1971).
⁴ See, for example, Buzzel, Cox and Brown (1969) and Gabor, Granger and Sowter (1969).
⁵ See Chamberlin (1948).
⁶ See Yaari (1965), and McCrimmon and Toda (1969).
⁷ See Shubik (1969), Smith (1962, 1964, 1965), Friedman (1963, 1967, 1969), Carson (1967), Dolbear and Others (1968) and Frahm and Schrander (1970).
⁸ See, for example, Fisher (1966).
⁹ See also Naylor, Balintfy, Burdick and Chu (1966), Naylor and Sasser (1967) and Naylor (1969) and references there.

2. Problem statement and design criteria

2.1. Problem statement
Consider the classical linear regression model
$$Y = X\beta + \varepsilon. \qquad (2.1a)$$
Here $Y$ is a $T \times 1$ dependent variable vector, and $X$ is a $T \times K$ regressor matrix with rank $K$; in a design problem, $X$ is subject to (constrained) choice. $\beta$ is a $K \times 1$ vector of parameters to be estimated, and $\varepsilon$ is a $T \times 1$ vector of error terms for which $E(\varepsilon) = 0$ and $E(\varepsilon\varepsilon') = \sigma^2 I$. The best linear unbiased estimates (BLUE) of $\beta$ and their variance-covariance matrix are
$$\hat\beta = (X'X)^{-1}X'Y, \qquad V(\hat\beta) = \sigma^2 (X'X)^{-1}. \qquad (2.1b)$$
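As a small illustration of (2.1b), the sketch below (Python with NumPy, our choice of tooling, not anything used in the paper) computes $\hat\beta$ and $V(\hat\beta)$ for two hypothetical four-run designs whose regressor values are restricted to $\pm 1$. It shows that $V(\hat\beta)$ depends on the design only through $X'X$, which is what the experimenter controls. The design matrices, the coefficient vector and $\sigma^2 = 1$ are illustrative assumptions.

```python
import numpy as np

def blue(X, Y, sigma2):
    """BLUE of beta and its covariance matrix sigma^2 (X'X)^{-1}, as in (2.1b)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y
    return beta_hat, sigma2 * XtX_inv

# Two hypothetical T = 4, K = 2 designs with every regressor value in {-1, +1}.
X_orth = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)  # X'X = 4I
X_corr = np.array([[1, 1], [1, 1], [-1, -1], [1, -1]], dtype=float)   # correlated columns

beta = np.array([0.5, -2.0])
sigma2 = 1.0
for name, X in [("orthogonal", X_orth), ("correlated", X_corr)]:
    Y = X @ beta                      # noiseless response, so beta_hat recovers beta
    beta_hat, V = blue(X, Y, sigma2)
    print(name, beta_hat, np.linalg.det(V), np.trace(V))
```

The orthogonal design gives $|V(\hat\beta)| = 1/16$ and $\mathrm{tr}\,V(\hat\beta) = 1/2$, against $1/12$ and $2/3$ for the correlated design, so with these constraints it is better under both the determinant and the trace.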
Once model (2.1a) has been selected by an experimenter, the design problem is to choose an 'optimal' $X$ subject to appropriate constraints, such as a limit on the number of observations, a budget constraint, bounds on the values the regressors may take, and functional dependencies among elements of $X$ (as when one regressor variable is the square of another). In some cases the problem of choosing $X$ can be simplified by using optimality theorems from the literature, such as the orthogonality theorem [Tacker (1952), for example]. When no such simplifications are available, the Conlisk-Watts (1969) approach can be used to solve the experimental design problem.

Steps toward relaxing the assumptions of the standard linear regression model have been taken by many researchers. We will be concerned with various time series generalizations. In particular, we will be concerned with various special cases of the general distributed lag model [as given, for example, by Box and Jenkins (1970)],

$$Y_t = \sum_{i=1}^{K} \beta_i(L)\, x_{it} + \varepsilon_t, \qquad (2.2)$$

where
Here $\delta(L)$, $\theta(L)$, $\phi(L)$, $\beta_i(L)$ and $\omega_i(L)$ are polynomials of specified degrees in the lag operator $L$ (defined by $L^k x_{it} = x_{i,t-k}$). The $a_t$'s and $b_{it}$'s form white noise processes, which are assumed to be independent of each other. Model (2.2) is quite general in the sense that almost all time series models discussed in the econometric literature are special cases of it. The design problem of this paper is to find a (constrained variance) control process $\{x_{it}\}$ that optimizes a suitable criterion function of the variance-covariance matrix of the model estimators.

2.2. Design criteria

In general, parameter estimation may be performed for one or more of the following objectives: first, accurate determination of parameter values that may have some economic significance in the context of an econometric model; second, forecasting; and third, control. When one is essentially interested in understanding the model itself (i.e., not directly using it for forecasting or control), the methodology of Box and Jenkins suggests three possible aspects of design criteria: a control process can be optimal for identification (model specification), estimation (of all or part of the parameters), and/or diagnostic checking (model validation). In the combined estimation and control problem, on the other hand, the control process has a dual purpose: affecting the information-gathering process on the model parameters and affecting the behavior of the state variable. These objectives can often conflict [Zellner (1971, ch. 10), for example]. Thus, the notion of optimality (as used below) depends on the goal one desires to achieve, and for experimental design purposes a single criterion, reflecting the final use to which the model will be put, must be chosen. This paper assumes that the experimental goal is accurate parameter estimation.
The goodness of parameter estimates is most conveniently expressed in terms of the bias and the variance-covariance matrix of the estimates. We assume that an unbiased estimator is used, so a design can be evaluated in terms of the covariance properties of the estimates. Many notions of optimality have been proposed, most of them trying to express 'the smallness' of the dispersion matrix by a scalar function. The most important measures of performance, expressed in terms of $V(\hat\beta)$ from (2.1b), are the following:¹⁰

(1) Orthogonality: This criterion requires that the parameters can be estimated independently (in the statistical sense) of each other.

(2) Rotatability: This criterion requires that $V(\hat\beta)$ be invariant to orthogonal transformations of the variables (i.e., the level contours of predicted variance should be rotosymmetrical, or spherical, in the space of the control variables).

(3) D-Optimality: Minimize $|V(\hat\beta)|$. This is equivalent to minimizing the volume of the parameter uncertainty ellipsoids [Cramér (1957, p. 120)]. Fig. 1 illustrates this and the following criteria in two dimensions; the determinant is proportional to the dotted area.¹¹

(4) HPD-Optimality: According to this Bayesian criterion, a design is optimal when the volume of the HPD (Highest Posterior Density) region is minimized [Box and Tiao (1965)]. This volume is proportional to the square root of $|V(\hat\beta)|$.

(5) A-Optimality: Minimize $\mathrm{tr}[V(\hat\beta)]$; that is, the average variance of the parameter estimates must be minimized. Geometrically, $\mathrm{tr}[V(\hat\beta)]$ represents the square of the distance between the center of the ellipse and the vertices of the $K$-dimensional orthotope containing the concentration ellipsoid. Alternatively, among all the concentration ellipsoids corresponding to different control processes, the experimenter using the trace criterion selects the one corresponding to the smallest hypersphere (1) of radius $r_1$, given by $r_1^2 = \mathrm{tr}[V(\hat\beta)]$. [See fig. 1 for the case $K = 2$.] If $\mu_1, \mu_2, \dots, \mu_K$ are the eigenvalues of the matrix $V(\hat\beta)$, the determinant and trace criteria are related in the sense that $\det[V(\hat\beta)] = \mu_1 \cdots \mu_K$ and $\mathrm{tr}[V(\hat\beta)] = \mu_1 + \dots + \mu_K$.

(6) E-Optimality: Minimize the maximum eigenvalue of $V(\hat\beta)$. Geometrically, the largest axis of the uncertainty ellipsoid (i.e., $\max\{\mu_1, \mu_2\}$ in fig. 1) should be minimized.
The experimenter using the E-optimality (maximum eigenvalue) criterion selects, among all concentration ellipsoids corresponding to different control processes, the ellipsoid whose longest principal semi-axis is shortest; equivalently, he selects the smallest hypersphere (2) of radius $r_2$, given by $r_2^2$ = maximum eigenvalue of $V(\hat\beta)$. (See fig. 1 for the case $K = 2$.) This might be desirable if the experimenter fears a large one-sigma interval (imprecise estimation). Suppose that a given design suffers from imprecise estimation. Then it is advisable to add some observations in order to increase the precision of the estimation, and the maximum eigenvalue criterion seems appropriate [see Johnston (1972, pp. 165-168), for example]. From another viewpoint, this criterion calls for minimizing the maximum variance of an estimated contrast $c'\hat\beta$ of $\beta$, where $c$ is a unit-length $K \times 1$ vector [the largest eigenvalue of $V(\hat\beta)$ equals the maximum over $c$ of $V(c'\hat\beta)$].

Fig. 1. Design criteria for $K = 2$.
1. Area of (E) $= \pi \cdot OM \cdot ON = \pi(\mu_1\mu_2)^{1/2} = \pi(\det[V(\hat\beta)])^{1/2}$.
2. Area of circle (1) $= \pi \cdot OK^2 = \pi(OL^2 + OF^2) = \pi(\mu_1 + \mu_2) = \pi\,\mathrm{tr}[V(\hat\beta)] = \pi \cdot OD^2 = \pi(OG^2 + OH^2) = \pi(V[\hat\beta_1] + V[\hat\beta_2])$ [or $OK = (OL^2 + OF^2)^{1/2} = (\mathrm{tr}[V(\hat\beta)])^{1/2}$, $OD = (OG^2 + OH^2)^{1/2} = (\mathrm{tr}[V(\hat\beta)])^{1/2}$].
3. Area of circle (2) $= \pi \cdot OM^2 = \pi\mu_{\max} = \pi\mu_1$ [or $OM = \mu_1^{1/2} = \mu_{\max}^{1/2}$].
4. Area of circle (3) $= \pi \cdot ON^2 = \pi\mu_{\min} = \pi\mu_2$ [or $ON = \mu_2^{1/2} = \mu_{\min}^{1/2}$].

¹⁰ In each case, the instruction to 'minimize' the criterion means 'minimize subject to relevant constraints' (on sample size, budget, and so on).
¹¹ An important property of D-optimality is that it is invariant under scale changes in the parameters and linear transformations of the design variables, whereas other conventional criteria (A-optimality and E-optimality) are affected by these transformations. Therefore the determinant criterion has the convenient property that the experimenter need not keep track of units. This is viewed differently by various researchers. Box (1972), for example, seems to believe that a design criterion should not depend on the scale of measurement; on the other hand, other researchers, including Conlisk and Watts (1969), insist that the sensitivity of a design criterion to a linear transformation is an advantage. In our opinion, if one has reasons to be interested, for example, in some weighted sum of the coefficient variances, then the weighted trace criterion may be appropriate. Intuitively, the trace accommodates linear weights because it is an additive function; the determinant does not, because it is multiplicative.
(7) G-Optimality: This minimax criterion (defined in the dependent variable space) requires that the maximum variance of the estimated regression function over the relevant region be minimized. Geometrically, this is equivalent to minimizing the maximum dimension of the smallest orthotope containing the concentration ellipsoid, where the sides of the orthotope are parallel to the $\beta_1, \dots, \beta_K$ axes (rectangle ABCD in fig. 1).

All together, loosely stated, the experimenter is either minimizing the volume of (E), or minimizing the volume of the spheroid (1) containing the box (4) containing (E), or minimizing the volume of the spheroid (2) containing (E), or minimizing the length of the orthotope (ABCD) containing (E). This paper will focus on the determinant and trace criteria.

3. Experimental design for time series regression models

3.1. The lagged independent variables model

Consider the finite lag model of the form

$$Y_t = \sum_{i=1}^{K} \sum_{j=1}^{\tau+1} \beta_{ij}\, x_{i,t-j+1} + \varepsilon_t, \qquad \varepsilon_t = a_t \ \text{(white noise)}, \qquad (3.1)$$
which is a special case of the general regression model (2.2). Here we assume that $\tau$ is known and small, and no special assumptions are made about the coefficients. The variables in (3.1) are measured as deviations from their respective means. Furthermore, the following assumptions are made.

Assumption 1. $\{x_{it}\}$ and $\{a_t\}$ are independent.

Assumption 2. $\{a_t\}$ is a white noise process; in particular, the $a$'s are independently distributed with mean zero and constant variance $\sigma_a^2$.

Assumption 3. $\{x_{it}\}$ is stationary throughout the period of the experiment and into the infinite past.

Assumption 4. $\mathrm{var}(x_{it}) = 1$ for all $i$ and $t$.

Assumption 5. The sample variances and covariances of the parameter estimates converge to their true values as the sample size $T$ approaches infinity.
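With regard to Assumption 4, the context of the experiment will place bounds on the range of control variable observations, and thus on $\mathrm{var}(x_{it})$. The experimenter will never wish an independent variable to have less than the maximum allowable variance. Hence we may presume that $\mathrm{var}(x_{it})$ is always set at its maximum. For a given $i$, $\mathrm{var}(x_{it})$ is the same for all $t$ by the stationarity assumption, and the units of $x_{it}$ may be chosen (without loss of generality) so that $\mathrm{var}(x_{it}) = 1$. In matrix notation (3.1) can be written as

$$Y = X\beta + a, \qquad (3.2)$$

where $Y = (Y_1, Y_2, \dots, Y_T)'$, $\beta = (\beta_{11}, \beta_{12}, \dots, \beta_{1,\tau+1}, \dots, \beta_{K,\tau+1})'$, $a = (a_1, a_2, \dots, a_T)'$, and $X = (X_1, X_2, \dots, X_K)$, with

$$X_i = \begin{pmatrix} x_{i1} & x_{i0} & \cdots & x_{i,1-\tau} \\ \vdots & \vdots & & \vdots \\ x_{iT} & x_{i,T-1} & \cdots & x_{i,T-\tau} \end{pmatrix}.$$

The minimum variance unbiased estimates of the coefficients in (3.2) are given by

$$b = (X'X)^{-1}X'Y. \qquad (3.3)$$

Under Assumption 5 we have, asymptotically,

$$V(b) = \sigma_a^2 E(X'X)^{-1} = T^{-1}S^{-1}\sigma_a^2, \qquad (3.4)$$

where $S = \operatorname{plim} T^{-1}X'X$ is partitioned into $K^2$ blocks $S = [S_{ij}]$, $i, j = 1, \dots, K$, each block being the $(\tau+1) \times (\tau+1)$ matrix

$$[S_{ij}]_{mn} = E(x_{i,t-m+1}\, x_{j,t-n+1}) = \rho_{ij}(n-m), \qquad m, n = 1, \dots, \tau+1. \qquad (3.5)$$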
Here $\rho_{ij}(\ell)$ is the correlation between $x_{it}$ and $x_{jt}$ lagged $\ell$ periods, so $S$ is a $(\tau+1)K \times (\tau+1)K$ correlation matrix (non-negative definite, with ones on the diagonal). The experimental design problem is to minimize either the trace or the determinant of (3.4) with respect to the correlation structure $S$.

Theorem 1. The determinant and trace of $V(b)$ are minimized with respect to the correlation structure $S$ if and only if $S = I$.

Proof for determinant. From (3.4) it is apparent that minimizing $|V(b)|$ is equivalent to maximizing $|S|$. But, by Hadamard's inequality [Marcus and Minc (1965, p. 114)], the determinant of a positive definite matrix is less than or equal to the product of its diagonal elements, with equality holding if and only if the matrix is diagonal. Hence $|S| \le 1$, with equality holding if and only if $S = I$. It follows immediately that $|S|$ is maximized if and only if $S = I$.

Proof for trace. From (3.4) we wish to minimize $\mathrm{tr}(S^{-1}) = \sum_{i=1}^{(\tau+1)K} s^{ii}$, where $s^{ii}$ is the $i$th diagonal element of $S^{-1}$. However, it is known [Rao (1965, p. 58), for example] that $s^{ii}$ is greater than or equal to the inverse of the corresponding diagonal element $s_{ii}$ of $S$, with equality holding for all $i$ if and only if $S$ is diagonal. Since $s_{ii} = 1$, we have $s^{ii} \ge 1$, and $\mathrm{tr}(S^{-1})$ is minimized if and only if $S = I$.

While more than one process satisfies the condition $S = I$, a natural choice is a white noise process $\{x_{it}\}$, independent across $i$.
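Theorem 1 can be checked numerically. The sketch below assumes NumPy and uses random correlation matrices of our own construction (not anything from the paper). It verifies that $|S^{-1}| \ge 1$ (Hadamard: $|S| \le 1$) and $\mathrm{tr}(S^{-1}) \ge (\tau+1)K$, with both bounds attained at $S = I$; since $V(b) = T^{-1}\sigma_a^2 S^{-1}$, these are the D- and A-criteria up to a positive scale factor. The dimension $n = 6$ (for example $\tau = 2$, $K = 2$) is arbitrary.

```python
import numpy as np

def random_correlation(n, rng):
    """Random positive definite correlation matrix (unit diagonal)."""
    A = rng.standard_normal((n, 2 * n))
    C = A @ A.T                      # positive definite with probability one
    d = np.sqrt(np.diag(C))
    return C / np.outer(d, d)        # rescale so every diagonal element is 1

def criteria(S):
    """D- and A-criteria of V(b) = sigma_a^2 T^{-1} S^{-1}, dropping the scale factor."""
    S_inv = np.linalg.inv(S)
    return float(np.linalg.det(S_inv)), float(np.trace(S_inv))

n = 6                                 # (tau + 1) * K, e.g. tau = 2, K = 2
rng = np.random.default_rng(0)
for _ in range(1000):
    det_V, tr_V = criteria(random_correlation(n, rng))
    assert det_V >= 1.0 - 1e-9        # |S| <= product of diagonal elements = 1
    assert tr_V >= n - 1e-9           # each s^{ii} >= 1 / s_ii = 1

print(criteria(np.eye(n)))           # white-noise design S = I: (1.0, 6.0)
```

Every correlated design tried is worse than the identity on both criteria, which is what the theorem asserts.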
3.2. The regression model with autocorrelated errors
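A general regression model with autocorrelated errors may take the following form:

$$Y_t = \sum_{i=1}^{K} \sum_{j=1}^{\tau+1} \beta_{ij}\, x_{i,t-j+1} + \varepsilon_t, \qquad (3.6)$$

where $\varepsilon_t = \phi(L)^{-1}\theta(L)\, a_t$.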
This is a special case of the more general model (2.2). In addition to Assumptions 1-5, the following assumption is made.

Assumption 6. The roots of the auxiliary polynomial equations $\phi(Z) = 0$ and $\theta(Z) = 0$ (where the operator $L$ is replaced by the variable $Z$) lie outside the unit circle, and the two polynomials have no root in common.

We continue to assume that the variables are measured as deviations from their respective means.
3.2.1. The general regression model with first-order autoregressive errors

Theorem 2. Consider model (3.6) where $\tau = 0$, $\theta(L) = 1$, and $\phi(L) = 1 - \phi L$. The independent variables process that minimizes the determinant and trace of the least squares variance-covariance matrix $V(\hat\beta^{(\phi)})$ has the following autocorrelation structure:

$$\rho_{ii}(1) = -\operatorname{sign}(\phi) \quad \text{for all } i, \qquad \rho_{ij}(1) = \rho_{ji}(0) = 0 \quad \text{for all } i, j \ (i \ne j).$$
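Proof. Transforming the model under consideration so that its disturbance satisfies the classical assumptions, we get

$$(1 - \phi L)\, Y_t = \sum_{i=1}^{K} \beta_i (1 - \phi L)\, x_{it} + a_t. \qquad (3.7)$$

The asymptotic variance-covariance matrix of the least squares estimates of the unknown parameters $\beta = (\beta_1, \beta_2, \dots, \beta_K)'$ is [Pierce (1971), for example]

$$V(\hat\beta^{(\phi)}) = \sigma_a^2 E(X^{(\phi)\prime} X^{(\phi)})^{-1} = (\sigma_a^2 T^{-1})(S^{(\phi)})^{-1}, \qquad (3.8)$$

where

$$X^{(\phi)} = [x_{it} - \phi x_{i,t-1}], \qquad S^{(\phi)} = [s^{(\phi)}_{ij}], \qquad s^{(\phi)}_{ij} = (1 + \phi^2)\rho_{ij}(0) - \phi[\rho_{ij}(1) + \rho_{ji}(1)],$$

so that each diagonal element is $s^{(\phi)}_{ii} = (1 + \phi^2) - 2\phi\rho_{ii}(1)$.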
Following steps analogous to those in the proof of Theorem 1, we conclude that the determinant and trace of $V(\hat\beta^{(\phi)})$ are minimized with respect to the positive definite matrix $S^{(\phi)}$ when $S^{(\phi)}$ is diagonal with maximal diagonal elements. Each diagonal element $(1 + \phi^2) - 2\phi\rho_{ii}(1)$ is maximized by $\rho_{ii}(1) = -\operatorname{sign}(\phi)$, and the off-diagonal elements vanish under the cross-correlation conditions of the theorem. The theorem follows.

Although more than one process satisfies the conditions of Theorem 2 (since a given autocorrelation structure may correspond to many different processes), a natural choice is a process in which each $x_{it}$ follows an independent AR(1) (autoregressive coefficient +1) when $\phi$ is negative and an independent AR(−1) (coefficient −1) when $\phi$ is positive.¹² There is a minor problem here. The proof involves the stationarity assumption, whereas AR(±1) is not a stationary process. However, AR(±1) is clearly the limit of a sequence of increasingly more efficient designs.

Note that we have to know the sign of $\phi$ in order to apply Theorem 2. Presumably this information is often available. If it is not, the experimenter faces a dilemma, in that the appropriate designs for $\phi > 0$ and $\phi < 0$ are radically different. (The theorem is still of value in that it at least clarifies what the dilemma is.)
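A Monte Carlo sketch of Theorem 2 for $K = 1$. It assumes NumPy and varies $\rho(1)$ through a stationary, unit-variance AR(1) control series with coefficient $r$, $|r| < 1$ (our choice of design family, used here only as a convenient way to set $\rho(1) = r$). The sample mean of $(x_t - \phi x_{t-1})^2$ estimates the single element of $S^{(\phi)}$, which should be close to $1 + \phi^2 - 2\phi r$. Moving $r$ towards $-\operatorname{sign}(\phi)$ raises it towards $(1 + |\phi|)^2$, the AR(±1) limit. The values $\phi = 0.6$ and $T$ are illustrative; the helper names are hypothetical.

```python
import numpy as np

def ar1_design(r, T, rng):
    """Stationary AR(1) control series x_t = r x_{t-1} + e_t with var(x_t) = 1."""
    e = rng.standard_normal(T) * np.sqrt(1.0 - r**2)
    x = np.empty(T)
    x[0] = rng.standard_normal()            # start in the stationary N(0, 1) law
    for t in range(1, T):
        x[t] = r * x[t - 1] + e[t]
    return x

def s_phi(x, phi):
    """Sample analogue of S^(phi) for K = 1: mean of (x_t - phi x_{t-1})^2."""
    return float(np.mean((x[1:] - phi * x[:-1]) ** 2))

phi, T = 0.6, 100_000
rng = np.random.default_rng(1)
results = {}
for r in (0.6, 0.0, -0.9, -0.99):
    results[r] = s_phi(ar1_design(r, T, rng), phi)
    theory = 1 + phi**2 - 2 * phi * r
    print(f"rho(1) = {r:+.2f}: sample {results[r]:.3f}, theory {theory:.3f}")
print("AR(+-1) limit:", (1 + abs(phi)) ** 2)
```

Because $V(\hat\beta^{(\phi)})$ is inversely proportional to this element, the design with $\rho(1) = -0.99$ has roughly a quarter of the variance of the design with $\rho(1) = \phi = 0.6$ (theoretical elements 2.548 and 0.64).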
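3.2.2. The general regression model with mixed first-order autoregressive, first-order moving average errors

Theorem 3. Consider model (3.6) where $\tau = 0$, $\theta(L) = 1 - \theta L$, and $\phi(L) = 1 - \phi L$. The optimal process has the following correlation structure:

$$\rho_{ii}(h) = \operatorname{sign}(\Lambda)\,[\operatorname{sign}(\theta)]^h \quad \text{for all } i, \ h = 1, 2, \dots,$$
$$\rho_{ik}(0) = \rho_{ik}(h) = 0 \quad \text{for all } i, k \ (i \ne k), \qquad (3.9)$$

where $\Lambda = 1 + \phi^2 - (\phi/\theta)(\theta^2 + 1)$.

Proof. The standard regression model with mixed ARMA(1,1) errors can be written as follows:

$$[(1 - \phi L)(1 - \theta L)^{-1}]\, Y_t = \sum_{i=1}^{K} \beta_i\, [(1 - \phi L)(1 - \theta L)^{-1}]\, x_{it} + a_t.$$

For large samples, the asymptotic variance-covariance matrix of the least squares estimates of $\beta = (\beta_1, \beta_2, \dots, \beta_K)'$ is [see, for example, Pierce (1971)]

$$V(\hat\beta^{(\phi;\theta)}) = \sigma_a^2 E(X^{(\phi;\theta)\prime} X^{(\phi;\theta)})^{-1} = (\sigma_a^2 T^{-1})(S^{(\phi;\theta)})^{-1}, \qquad (3.10)$$

where

$$X^{(\phi;\theta)} = [(1 - \phi L)(1 - \theta L)^{-1} x_{it}], \qquad S^{(\phi;\theta)} = \Big[E\{[(1 - \phi L)(1 - \theta L)^{-1} x_{it}]\,[(1 - \phi L)(1 - \theta L)^{-1} x_{kt}]\}\Big].$$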
¹² In general, an approximate method for realizing a random process having a given autocorrelation is to use the Arcsine Law [Eykhoff (1974, p. 300)] and to represent the process as the output of a linear system driven by a stationary normal process with zero mean,
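By the reasoning in the proof of Theorem 1, the determinant and trace of (3.10) are minimized if and only if $S^{(\phi;\theta)}$ is diagonal with each diagonal element maximized. Now, the typical diagonal element is

$$E[(1 - \phi L)(1 - \theta L)^{-1} x_{it}]^2 = (1 - \theta^2)^{-1}\Big[1 + \phi^2 - 2\phi\theta + 2\Lambda \sum_{h=1}^{\infty} \theta^h \rho_{ii}(h)\Big]. \qquad (3.11)$$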
(3.11) is maximized when $x_{it}$ has the autocorrelation structure given by the first line of (3.9). If $\phi$ and $\theta$ have opposite signs, then $\Lambda = 1 + \phi^2 - (\phi/\theta)(\theta^2 + 1)$ is positive and no information about $\phi$ and $\theta$ beyond their signs is required. However, if $\phi$ and $\theta$ have the same sign, $\Lambda$ can be either positive or negative depending on whether $(1 + \phi^2) > (\phi/\theta)(1 + \theta^2)$ or $(1 + \phi^2) < (\phi/\theta)(1 + \theta^2)$, so greater knowledge of $\phi$ and $\theta$ is required. When $\Lambda > 0$, a stochastic process in which each $x_{it}$ follows an independent AR(1) when $\theta > 0$ and an independent AR(−1) when $\theta < 0$ is a natural choice. When $\Lambda < 0$, the processes graphed in fig. 2 provide the optimal correlation structure.

The requirement that the experimenter know something ahead of time about the parameters he is estimating is a vexing problem, but a common one in design contexts [see, for example, Box and Lucas (1959), and Box and Jenkins (1970, pp. 416-420)]. The problem arises in the above theorem, in Theorem 4 below, and even more severely in Theorem 5. In situations where the knowledge is just not there, the theorems do not tell the experimenter what to do. Nonetheless, they are still of value in analyzing the problem. If a sequential strategy is possible, the experimenter can improve the design as his knowledge of the parameters improves.¹³ If not, the experimenter will have to pursue (formally or informally) some sort of Bayesian procedure based on prior distributions of the parameters. Given that the optimal designs alter radically when certain signs change, the cost of designing for a wrongly guessed sign can be very high. Thus, an appropriate design in the face of ignorance about parameters may well be very different from the optimal design for any particular parameter values. These issues can surely be formally modelled; the theorems here should be a helpful start.

Fig. 2. Optimal design processes when $\Lambda < 0$ in Theorem 3: (a) deterministic process with $\rho(1) = \rho(2) = \dots = -1$; (b) process with $\rho(j) = (-1)^{j+1}$. (Both panels plot the process against time.)

¹³ In particular, when parameter values (estimates) are not available and sequential experimentation is possible, a 'preliminary' experiment could be conducted by the experimenter to get things going. The number of such preliminary observations is chosen so that it is sufficient to obtain fairly accurate estimates of the unknown parameters. The initial experiments may be selected by any non-optimal design, say white noise. The initial estimates are then used to choose the optimal design. Since at the post-experiment estimation phase we are free to disown the initial estimates on which we based the design, the use of initial estimates is in a way 'safer' in design than in estimation.

Corollary 1. Consider model (3.6) where $\tau = 0$, $\theta(L) = 1 - \theta L$, and $\phi(L) = 1$. The optimal process (for the determinant and trace criteria) has the following autocorrelation structure:

$$\rho_{ii}(h) = [\operatorname{sign}(\theta)]^h, \qquad \rho_{ik}(0) = \rho_{ik}(h) = 0 \ \text{for all } i, k \ (i \ne k), \qquad (3.12)$$

where $\rho_{ik}(h) = E(x_{it}\, x_{k,t-h})$ for all $i, k$ and $h = 1, 2, \dots$

Proof. See Theorem 3.
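The closed form (3.11) and the role of $\Lambda$ can be checked numerically. The sketch below (NumPy assumed; the helper names, parameter values and input processes are illustrative) compares (3.11) with a brute-force evaluation built from the weights of the filter $(1 - \phi L)(1 - \theta L)^{-1} = 1 + \sum_{k \ge 1} (\theta - \phi)\theta^{k-1} L^k$, for inputs with $\rho(h) = r^h$. With $\phi = 0$ (Corollary 1) we have $\Lambda = 1$; with $\phi = 0.6$, $\theta = 0.3$ we have $\Lambda < 0$, and positive autocorrelation in the input then lowers the diagonal element instead of raising it.

```python
import numpy as np

def lam(phi, theta):
    """Lambda = 1 + phi^2 - (phi / theta)(theta^2 + 1) from Theorem 3."""
    return 1 + phi**2 - (phi / theta) * (theta**2 + 1)

def ar_rho(r):
    """Autocorrelation function rho(h) = r^h of a stationary AR(1) input."""
    return lambda h: r ** np.asarray(h)

def diag_closed_form(phi, theta, rho, H=2000):
    """Equation (3.11), with the infinite sum truncated at H terms."""
    h = np.arange(1, H + 1)
    s = np.sum(theta**h * rho(h))
    return (1 + phi**2 - 2 * phi * theta + 2 * lam(phi, theta) * s) / (1 - theta**2)

def diag_brute_force(phi, theta, rho, K=2000):
    """E[z_t^2] for z_t = sum_k c_k x_{t-k}, c_0 = 1, c_k = (theta - phi) theta^(k-1)."""
    c = np.empty(K)
    c[0] = 1.0
    c[1:] = (theta - phi) * theta ** np.arange(K - 1)
    total = np.sum(c**2)                       # lag-0 terms, rho(0) = 1
    for h in range(1, K):
        total += 2 * rho(h) * np.dot(c[:-h], c[h:])
    return float(total)

for phi, theta in [(0.5, -0.4), (0.6, 0.3), (0.0, 0.7)]:
    a = diag_closed_form(phi, theta, ar_rho(0.8))
    b = diag_brute_force(phi, theta, ar_rho(0.8))
    print(phi, theta, round(lam(phi, theta), 3), round(a, 6), round(b, 6))
```

The two evaluations agree to rounding error, which supports reading the sign of $\Lambda$ as the switch between the two optimal structures in (3.9).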
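Corollary 2. Consider model (2.2) where $\theta(L) = \phi(L) = 1$, $\delta(L) = 1 - \delta L$, $\omega_i(L) = 1$, and $\beta_i(L) = \beta_i$ for all $i$. The optimal process (determinant and trace criteria) has the correlation structure (3.12), with $\delta$ in place of $\theta$.

Proof. When $\theta(L) = \phi(L) = 1$, $\delta(L) = 1 - \delta L$, $\omega_i(L) = 1$, and $\beta_i(L) = \beta_i$ for all $i$, model (2.2) becomes

$$Y_t = \sum_{i=1}^{K} \frac{\beta_i}{1 - \delta L}\, x_{it} + a_t. \qquad (3.13)$$

The large sample variance-covariance matrix of the least squares estimates $\hat\beta^{(\delta)}$ is

$$V(\hat\beta^{(\delta)}) = (T^{-1}\sigma_a^2)(S^{(\delta)})^{-1}, \qquad (3.14)$$

where

$$S^{(\delta)} = \Big[\operatorname{plim}\, T^{-1} \sum_t [(1 - \delta L)^{-1} x_{it}]\,[(1 - \delta L)^{-1} x_{kt}]\Big]. \qquad (3.15)$$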
Consider first the determinant criterion. Minimizing $|V(\hat\beta^{(\delta)})|$ is equivalent to maximizing $|S^{(\delta)}|$. By Hadamard's inequality, $|S^{(\delta)}| \le \prod_{i=1}^{K} [S^{(\delta)}]_{ii}$, where $[S^{(\delta)}]_{ii} = (1 - \delta^2)^{-1}(1 + 2q_i)$ and $q_i = \sum_{j=1}^{\infty} \delta^j \rho_{ii}(j)$. Thus $|S^{(\delta)}|$ is maximized when $S^{(\delta)}$ is diagonal, which implies the second line of (3.12). Finally, the $i$th diagonal element of $S^{(\delta)}$ is maximized when the first line of (3.12) holds. Similarly, it can be shown that the trace of $V(\hat\beta^{(\delta)})$ is minimized for the correlation structure (3.12).

Corollary 3. Consider model (2.2) where $\theta(L) = 1$, $\phi(L) = 1 - \phi L$, $\delta(L) = 1 - \delta L$, $\omega_i(L) = 1$, and $\beta_i(L) = \beta_i$ for all $i$. Then the optimal process (determinant and trace criteria) has the correlation structure (3.9), with $\delta$ in place of $\theta$.

Proof. Similar to the proof of Theorem 3.
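The sequential strategy of footnote 13 can be sketched for the AR(1)-error case of Theorem 2, where only the sign of $\phi$ matters. In this hypothetical two-stage sketch (NumPy assumed; the helper names `ar1_design` and `respond`, the sample sizes, the true parameter values and the shrink factor 0.95 that keeps the design stationary are all our own illustrative choices), a preliminary white-noise run yields $\hat\phi$ from the lag-one autocorrelation of the OLS residuals. The second stage then uses a stationary AR(1) design with $\rho(1) = -0.95\,\operatorname{sign}(\hat\phi)$, and $\beta$ is estimated by least squares on the quasi-differenced data, as in (3.7).

```python
import numpy as np

def ar1_design(r, T, rng):
    """Stationary AR(1) control series x_t = r x_{t-1} + e_t with var(x_t) = 1."""
    e = rng.standard_normal(T) * np.sqrt(1.0 - r**2)
    x = np.empty(T)
    x[0] = rng.standard_normal()
    for t in range(1, T):
        x[t] = r * x[t - 1] + e[t]
    return x

def respond(x, beta, phi, rng):
    """Simulated y_t = beta x_t + e_t with e_t = phi e_{t-1} + a_t, a_t ~ N(0, 1)."""
    a = rng.standard_normal(len(x))
    e = np.empty(len(x))
    e[0] = a[0] / np.sqrt(1.0 - phi**2)      # stationary start
    for t in range(1, len(x)):
        e[t] = phi * e[t - 1] + a[t]
    return beta * x + e

rng = np.random.default_rng(2)
beta_true, phi_true = 2.0, -0.7              # unknown to the experimenter

# Stage 1: preliminary white-noise design.
x1 = rng.standard_normal(300)
y1 = respond(x1, beta_true, phi_true, rng)
b1 = (x1 @ y1) / (x1 @ x1)                   # OLS slope
u = y1 - b1 * x1
phi_hat = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])

# Stage 2: design chosen by Theorem 2 from the sign of phi_hat.
r = -0.95 * np.sign(phi_hat)
x2 = ar1_design(r, 2000, rng)
y2 = respond(x2, beta_true, phi_true, rng)

# Least squares on the quasi-differenced model (3.7), using phi_hat.
xs = x2[1:] - phi_hat * x2[:-1]
ys = y2[1:] - phi_hat * y2[:-1]
b2 = (xs @ ys) / (xs @ xs)
print(f"phi_hat = {phi_hat:.3f}, design rho(1) = {r:+.2f}, beta_hat = {b2:.3f}")
```

Had the preliminary run suggested the wrong sign, the second stage would have used nearly the worst member of this design family, which is the dilemma noted after Theorem 2.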
3.2.3. The one period lag, one independent variable model with AR(1) disturbances

Now the assumption that the independent variables $x_{it}$ appear without lags is relaxed, and consideration is given to the combined problem of lags and autocorrelation of the disturbances.

Theorem 4. Consider the following model:¹⁴

$$y_t = \beta_1 x_t + \beta_2 x_{t-1} + (1 - \phi L)^{-1} a_t, \qquad (3.16)$$

which is a special case of (2.2). The optimal control process (for the determinant and trace criteria) has the following correlation structure:¹⁵

$$\rho(1) = -\operatorname{sign}(\phi), \qquad \rho(2) = -\operatorname{sign}(\phi)\,[(1/\phi) + \phi] - 1. \qquad (3.17)$$

¹⁴ We have considered some other cases with more than one period lag and/or other structures of disturbances, but we have not been able to derive analytically the optimal correlation structure of the control process. In those cases numerical methods for finding the optimal or nearly optimal structures might be used.
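Proof for determinant. The asymptotic variance-covariance matrix of the least squares estimates of (3.16) is [Box and Jenkins (1970), Pierce (1972)]

$$V(\hat\beta^{(\phi)}) = \sigma_a^2 E(X^{(\phi)\prime} X^{(\phi)})^{-1} = (\sigma_a^2 T^{-1})(S^{(\phi)})^{-1}, \qquad (3.18)$$

where

$$S^{(\phi)} = \begin{pmatrix} (1 + \phi^2) - 2\phi\rho(1) & \rho(1)(1 + \phi^2) - \phi[1 + \rho(2)] \\ \rho(1)(1 + \phi^2) - \phi[1 + \rho(2)] & (1 + \phi^2) - 2\phi\rho(1) \end{pmatrix}. \qquad (3.19)$$

Minimizing $|V(\hat\beta^{(\phi)})|$ is equivalent to maximizing

$$|S^{(\phi)}| = [(1 + \phi^2) - 2\phi\rho(1)]^2 - \{\rho(1)(1 + \phi^2) - \phi[1 + \rho(2)]\}^2. \qquad (3.20)$$

For any given value of $\rho(1)$, $|S^{(\phi)}|$ is a quadratic function of $\rho(2)$, which is maximized by setting the off-diagonal element to zero:

$$\rho(2) = \rho(1)[\phi + \phi^{-1}] - 1. \qquad (3.21)$$

Substituting this value of $\rho(2)$ into (3.20), we obtain $4\phi^2\rho^2(1) - 4\phi[1 + \phi^2]\rho(1) + (1 + \phi^4 + 2\phi^2)$, that is, $[(1 + \phi^2) - 2\phi\rho(1)]^2$, which over $|\rho(1)| \le 1$ is maximized when $\rho(1) = -\operatorname{sign}(\phi)$. Substituting in (3.21) yields the second expression in (3.17).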
Proof for trace. Minimizing $\mathrm{tr}[V(\hat\beta^{(\phi)})] = (\sigma_a^2 T^{-1})\, 2[1 - 2\phi\rho(1) + \phi^2]/|S^{(\phi)}|$ with respect to $\rho(1)$ and $\rho(2)$, we find that the same correlation structure (3.17) is optimal.

One process that has the correlation structure (3.17) is the second-order autoregressive process. However, there is a minor problem with the realizability of this process: the stationarity conditions $|\rho(1)| < 1$, $|\rho(2)| < 1$, and $\rho^2(1) < \tfrac{1}{2}[1 + \rho(2)]$ [Box and Jenkins (1970)] are violated when the process has the correlation structure (3.17). A suboptimal but realizable process would be a second-order autoregressive process with $\rho(1)$ equal to a number of the sign of $-\phi$ (negative when $\phi > 0$, positive when $\phi < 0$) as close to zero as possible, but not equal to zero, and $\rho(2)$ as close to −1 as possible, but not equal to −1.¹⁶

¹⁵ In this and subsequent discussion, we shall use the notation $\rho(\cdot)$ instead of $\rho_{11}(\cdot)$.
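The determinant argument of Theorem 4 can be reproduced numerically. This sketch (NumPy assumed; $\phi = 0.5$, $\rho(1) = 0.3$ and the grids are illustrative choices) builds $S^{(\phi)}$ from (3.19), checks that (3.21) maximizes $|S^{(\phi)}|$ over $\rho(2)$ for a fixed $\rho(1)$, and then searches over $\rho(1)$. It also evaluates the realizable approximation $\rho(1) \approx 0$, $\rho(2) \approx -1$, whose determinant is $(1 + \phi^2)^2$. The optimal $\rho(2) = -3.5$ at $\phi = 0.5$ shows concretely why (3.17) cannot be attained by a stationary process.

```python
import numpy as np

def S_phi(phi, r1, r2):
    """Matrix (3.19) for y_t = b1 x_t + b2 x_{t-1} + (1 - phi L)^{-1} a_t."""
    d = (1 + phi**2) - 2 * phi * r1
    o = r1 * (1 + phi**2) - phi * (1 + r2)
    return np.array([[d, o], [o, d]])

def best_r2(phi, r1):
    """Equation (3.21): the rho(2) that maximizes |S^(phi)| for given rho(1)."""
    return r1 * (phi + 1 / phi) - 1

phi = 0.5

def det(r1, r2):
    return float(np.linalg.det(S_phi(phi, r1, r2)))

# (3.21) beats every rho(2) on a grid, for a fixed rho(1).
r1 = 0.3
assert all(det(r1, best_r2(phi, r1)) >= det(r1, r2) - 1e-12
           for r2 in np.linspace(-5, 5, 1001))

# Search over rho(1) after substituting (3.21).
grid = np.linspace(-1, 1, 201)
r1_star = grid[int(np.argmax([det(g, best_r2(phi, g)) for g in grid]))]
print(r1_star, best_r2(phi, r1_star))            # rho(1) = -sign(phi); rho(2) far below -1
print(det(r1_star, best_r2(phi, r1_star)), det(0.0, -1.0))   # optimum vs realizable
```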
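3.3. The mixed autoregressive-regressive model with autocorrelated errors

Theorem 5. Consider the model

$$Y_t = \delta Y_{t-1} + \beta x_t + \varepsilon_t, \qquad (3.22)$$

where $\varepsilon_t = [1 - \phi L]^{-1} a_t$, and $\phi$ is assumed to be known. Then minimizing $|V(\hat\gamma^{(\phi)})|$, where $\hat\gamma^{(\phi)} = [\hat\delta^{(\phi)}, \hat\beta^{(\phi)}]'$, is equivalent to maximizing the following expression with respect to $\rho(1)$: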
where s =
f$sh-*p(h)
h=2 =
[~6-~p(l)+~6*P(l)-~63+6*+~*63--2~6*p(l)]
LS’(l -
8”) (1- @ -4/s
+ 4*)1- (PW0
(3.23b)
Proof. See Box and Jenkins (1970, app. A11.2) and Theorem 4. In general, the maximization of (3.23a) should be done numerically, using initial estimates of $\phi$, $\delta$, $\beta$, and $\sigma_a^2$.
¹⁶ Another process that has $\rho(1) = 0$ and $\rho(2) = -1$ is the discrete triangular wave process.
4. Examples

Three examples will be presented to illustrate the loss in estimation accuracy that is possible with processes having a non-optimal autocorrelation structure.

Example 1. Consider the simple regression model $y_t = \beta x_t + \varepsilon_t$, where $\varepsilon_t = \phi\varepsilon_{t-1} + a_t$ and $|\phi| < 1$. The asymptotic variance of the least squares estimate of the unknown parameter $\beta$ is

$$V(\hat\beta) = (T^{-1}\sigma_a^2)\,[(1 + \phi^2) - 2\phi\rho(1)]^{-1}. \qquad (3.24)$$

Table 1 shows the efficiency¹⁷ of three processes for $x_t$ with non-optimal autocorrelation structure. These processes are the white noise process; the autoregressive process with autocorrelation parameter equal to the correlation coefficient of the errors, which is sometimes suggested by economists [Johnston (1972, p. 247), for example]; and the alternating deterministic process used by control engineers.¹⁸ The table shows that much efficiency can be lost by using a non-optimal process for $\{x_t\}$. For example, when the error terms $\varepsilon_t$ have autocorrelation coefficient 0.9 (or −0.9), the efficiency of the AR(1) process with parameter $\phi$ is only 0.053 relative to the process with optimal correlation structure. We have also considered other simple processes, such as the impulse and step processes, but found their autocorrelation structures inefficient.

Example 2. Consider the simple model $y_t = \beta x_t + \varepsilon_t$, where $\varepsilon_t$ obeys a first-order moving average process, $\varepsilon_t = a_t - \theta a_{t-1}$. The asymptotic variance of the least squares estimate of $\beta$ is given by

$$V(\hat\beta) = [T^{-1}\sigma_a^2(1 - \theta^2)]\,\Big[1 + 2\sum_{h=1}^{\infty} \theta^h \rho(h)\Big]^{-1}, \qquad (3.25)$$
and (3.25) is minimized when the control process $\{x_t\}$ has $\rho(h) = 1$ if $\theta > 0$ and $\rho(h) = (-1)^h$ if $\theta < 0$, since each term $\theta^h\rho(h)$ then attains its upper bound $|\theta|^h$. Table 2 presents efficiencies for the same three processes discussed above in relation to table 1.

¹⁷The relative efficiencies of various processes are measured in the usual way, by taking the ratio of the associated variances. Thus efficiency is defined as the ratio of $[(1+\phi^2) - 2\phi\rho(1)]$ when the process under consideration is used to the value of $[(1+\phi^2) - 2\phi\rho^*(1)]$ when the process with optimal correlation structure $\rho^*(1)$ is used.

¹⁸An alternating process is a process which alternates between two values $a$ and $b$, so that $x_t$ takes the values $a, b, a, b$, and so on, in successive periods. When $a = \alpha$ and $b = -\alpha$, this process has $\rho(h) = (-1)^h$.
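The autocorrelation claimed for the alternating process can be checked numerically. The Python sketch below builds the series $\alpha, -\alpha, \alpha, \ldots$ with a hypothetical amplitude $\alpha = 2$ and computes the usual mean-corrected sample autocorrelation (divisor $n$ at every lag). For this series the estimate equals $(1 - h/n)(-1)^h$, which tends to $(-1)^h$ as $n$ grows. The function name is illustrative.

```python
def sample_autocorrelation(x, h):
    """Lag-h sample autocorrelation, mean-corrected, with divisor n at every lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    ch = sum(dev[t] * dev[t + h] for t in range(n - h)) / n
    return ch / c0


alpha = 2.0  # hypothetical amplitude; any nonzero alpha gives the same correlations
x = [alpha if t % 2 == 0 else -alpha for t in range(1000)]

for h in range(1, 5):
    # For this series the estimate is (1 - h/n) * (-1)**h.
    print(h, round(sample_autocorrelation(x, h), 4))
```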
Table 2
Efficiency of a white noise process, an AR(1) process with parameter θ, and an alternating process for different values of θ in the simple regression model with first-order moving average errors.

θ        White noise    AR(1) with parameter θ    Alternating process
 0.99    0.005          0.500                     0.00003
 0.90    0.053          0.501                     0.003
 0.70    0.177          0.516                     0.031
 0.50    0.333          0.555                     0.111
 0.30    0.538          0.644                     0.290
 0.10    0.818          0.835                     0.670
 0.00    1.00           1.00                      1.00
-0.10    0.818          0.835                     1.00
-0.30    0.538          0.644                     1.00
-0.50    0.333          0.555                     1.00
-0.70    0.177          0.516                     1.00
-0.90    0.053          0.501                     1.00
-0.99    0.005          0.500                     1.00

Table 3
Efficiency of a white noise process and an AR(1) process with parameter φ for different values of φ in the one-lag, one-independent-variable model with first-order autoregressive errors.

φ        White noise    AR(1) with parameter φ
 0.99    0.750          0.0001
 0.90    0.753          0.011
 0.70    0.770          0.117
 0.50    0.840          0.360
 0.30    0.924          0.697
 0.10    0.990          0.961
 0.00    1.00           1.00
-0.10    0.990          0.961
-0.30    0.924          0.697
-0.50    0.840          0.360
-0.70    0.770          0.117
-0.90    0.753          0.011
-0.99    0.750          0.0001
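The Table 2 entries can be reproduced from (3.25): the factor $T^{-1}\sigma_a^2(1-\theta^2)$ is common to every design and cancels, so each efficiency is a ratio of the bracketed series $1 + 2\sum_{h\ge 1}\theta^h\rho(h)$. The Python sketch below assumes (3.25) in the form $V(\hat\beta) = [T^{-1}\sigma_a^2(1-\theta^2)][1 + 2\sum_{h=1}^{\infty}\theta^h\rho(h)]^{-1}$ and truncates the series at a finite horizon, which is harmless for $|\theta| \le 0.99$. The best design is taken as the one making every term equal $|\theta|^h$. All names are illustrative.

```python
def precision_factor(theta, rho, horizon=5000):
    """1 + 2 * sum_{h=1}^{horizon} theta**h * rho(h): the bracketed series in (3.25),
    truncated after `horizon` lags (assumes |theta| < 1)."""
    return 1.0 + 2.0 * sum(theta ** h * rho(h) for h in range(1, horizon + 1))


# Autocorrelation functions of the three designs compared in Table 2.
DESIGNS = {
    "white noise": lambda theta: (lambda h: 0.0),
    "AR(1), parameter theta": lambda theta: (lambda h: theta ** h),
    "alternating": lambda theta: (lambda h: (-1.0) ** h),
}


def table2_row(theta):
    """Efficiencies V(best) / V(design) under (3.25); (1 - theta^2) cancels."""
    # Each term theta**h * rho(h) is at most |theta|**h, attained by
    # rho(h) = 1 when theta >= 0 and rho(h) = (-1)**h when theta < 0.
    if theta >= 0:
        best = precision_factor(theta, lambda h: 1.0)
    else:
        best = precision_factor(theta, lambda h: (-1.0) ** h)
    return {name: precision_factor(theta, make(theta)) / best
            for name, make in DESIGNS.items()}


for theta in (0.9, 0.5, -0.5):
    print(theta, {k: round(v, 3) for k, v in table2_row(theta).items()})
```

At $\theta = 0.5$ the sketch gives $1/3$, $5/9 \approx 0.556$ and $1/9$ for the three designs, agreeing with Table 2 up to rounding in the third decimal.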
Example 3. Finally, table 3 shows the efficiencies of a white noise process and an AR(1) process with parameter $\phi$ in the one-period-lag, one-independent-variable model with first-order autoregressive disturbances (Theorem 4).
5. Conclusions

Experimentation in economics is expensive and complex relative to experimentation in the fields (such as agriculture) for which most design theory was developed. This underlines the need for econometricians to extend design theory in directions of particular interest to economists.

This paper has presented optimal processes for a number of time series models. The optimal processes are derived in terms of the correlation structure of the control variable [instead of allocations to design points, as is customary; see Conlisk and Watts (1969) and Fedorov (1972), for example]. Examples have been presented illustrating the significant improvement in estimation accuracy that can typically be obtained. A point which stands out is that optimal designs vary greatly across models.

The design procedure of this paper should prove useful in real-world, game, and computer simulation experiments alike, wherever the experimenter is interested in the dynamic behavior of the experimental subjects. Although the approach requires stochastic processes whose parameters depend on the unknown model parameters, the procedure should still be of value in analyzing the problem. Where a sequential strategy is possible, the experimenter can improve the design as his knowledge of the parameters improves. Otherwise, because the problem of guessing the parameters in advance is at the heart of the issue, a Bayesian approach may be a sensible next step.¹⁹

¹⁹We intend to undertake such an approach in a subsequent paper.
References

Bawden, D., 1969, A negative tax experiment for rural areas, American Statistical Association Proceedings, Social Statistics Section, 157-162.
Box, G.E.P. and H. Lucas, 1959, Design of experiments in non-linear situations, Biometrika 46, 77-90.
Box, G.E.P. and G. Tiao, 1965, Multiparameter problems from a Bayesian point of view, Annals of Mathematical Statistics 36, 1468-1482.
Box, G.E.P. and G. Jenkins, 1970, Time series analysis, forecasting and control (Holden-Day, San Francisco, CA).
Box, M., 1972, Discussion of Dr. Wynn's and Dr. Laycock's papers, Journal of the Royal Statistical Society (B) 34, 170-172.
Buzzel, R.D., D.F. Cox and R.V. Brown, 1969, Marketing research and information systems (McGraw-Hill, New York).
Carlson, J.A., 1967, The stability of an experimental market with a supply response lag, Southern Economic Journal 33, 305-321.
Castro, B. and K. Weingarten, 1970, Toward experimental economics, Journal of Political Economy 78, 598-607.
Chamberlin, E.H., 1948, An experimental imperfect market, Quarterly Journal of Economics 56, 95-108.
Conlisk, J. and H. Watts, 1969, A model for optimizing designs for estimating response surfaces, American Statistical Association Proceedings, Social Statistics Section, 150-156.
Cramer, H., 1957, Mathematical methods of statistics (Princeton University Press, Princeton, NJ).
Dolbear, F.T., Jr., L.B. Lave, G. Bowman, A. Liberman, E. Prescott, F. Rueter and R. Sherman, 1968, Collusion in oligopoly: An experiment on the effect of numbers and information, Quarterly Journal of Economics 82, 240-259.
Eykhoff, P., 1974, System identification: Parameter and state estimation (Wiley, New York).
Fedorov, V.V., 1972, Theory of optimal experiments, Translated and edited by W.J. Studden and E.M. Klimko (Academic Press, New York).
Fisher, F.M., 1966, A priori information and time series analysis (North-Holland, Amsterdam) ch. 3.
Frahm, D. and L.F. Schrader, 1970, An experimental comparison of pricing in two auction systems, American Journal of Agricultural Economics 52, 528-534.
Friedman, J., 1963, Individual behavior in oligopolistic markets: An experimental study, Yale Economic Essays 3, 359-417.
Friedman, J., 1967, An experimental study of cooperative duopoly, Econometrica 35, 379-397.
Friedman, J., 1969, On experimental research in oligopoly, Review of Economic Studies (Symposium on Experimental Economics) 36, 399-415.
Gabor, G.W., J. Granger and A.P. Sowter, 1969, Real and hypothetical situations in market research, A study on method, Unpublished.
Johnston, J., 1972, Econometric methods, 2nd ed. (McGraw-Hill, New York).
Kelley, T. and L. Singer, 1971, The Gary, Indiana income maintenance experiment, Plans and progress, American Economic Review 61, 30-34.
Kurz, M. and R. Spiegelman, 1969, The Seattle experiment, The combined effect of income maintenance and manpower investments, American Economic Review 61, 22-29.
MacCrimmon, K.R. and M. Toda, 1969, The experimental determination of indifference curves, Review of Economic Studies (Symposium on Experimental Economics) 36, 433-451.
Marcus, M. and H. Minc, 1964, A survey of matrix theory and matrix inequalities (Allyn and Bacon, Boston, MA).
Naylor, T.H., ed., 1969, The design of computer simulation experiments (Duke University Press, Durham, NC).
Naylor, T.H., 1971, Computer simulation experiments with models of economic systems (Wiley, New York).
Naylor, T., 1971, Experimental economics revisited, Journal of Political Economy 79, 347-352.
Naylor, T.H., J.L. Balintfy, D.S. Burdick and K. Chu, 1966, Computer simulation techniques (Wiley, New York).
Naylor, T., D. Burdick and W. Sasser, 1967, Computer simulation experiments with economic systems: The problem of experimental design, Journal of the American Statistical Association 62, 1315-1357.
Orcutt, G. and A. Orcutt, 1969, Incentive and disincentive experimentation for income maintenance policy purposes, American Economic Review 59, 463-472.
Orr, L., 1969, Strategy for a broad program of experimentation in income maintenance, American Statistical Association Proceedings, Social Statistics Section, 163-173.
Pierce, D.A., 1971, Least squares estimation in the regression model with autoregressive moving average errors, Biometrika 58, 299-312.
Pierce, D.A., 1972, Least squares estimation in dynamic-disturbance time series models, Biometrika 59, 73-78.
Rao, C., 1965, Linear statistical inference and its applications (Wiley, New York).
Shubik, M., 1959, Strategy and market structure (Wiley, New York).
Smith, V.L., 1962, Experimental studies of competitive market behavior, Journal of Political Economy 70, 111-137.
Smith, V.L., 1964, Effects of market organization on competitive equilibrium, Quarterly Journal of Economics 77, 181-201.
Smith, V.L., 1965, Effect of market organization on competitive equilibrium, Quarterly Journal of Economics 78, 387-393.
Tocher, K.D., 1952, A note on the design problem, Biometrika 39, 189.
Watts, H., 1969, Graduated work incentives: An experiment in negative taxation, American Economic Review Papers and Proceedings 59, 463-472.
Watts, H., 1971, The graduated work incentive experiments: Current progress, American Economic Review 59, 15-21.
Yaari, M.E., 1965, Convexity in the theory of choice under risk, Quarterly Journal of Economics 79, 278-290.
Zellner, A., 1971, An introduction to Bayesian inference in econometrics (Wiley, New York).