317
International Journal of Forecasting 6 (1990) 317-326 North-Holland
Mode predictors in nonlinear systems with identities * Giorgio Calzolari and Lorenzo Panattoni IBM Scientific Center 56100 Pisa, Italy
Abstract: For a nonlinear system of simultaneous equations, the mode of the joint distribution of the endogenous variables in the forecast period is proposed as an alternative to the more usual deterministic or mean predictors. A first method follows from maximizing the joint density of a subset of the endogenous variables, corresponding to stochastic equations only (analogously to FIML estimation, where identities are first substituted into stochastic equations). Then a more general approach is developed, which maintains the identities. The model with identities is viewed as a mapping between the space of the random errors and a hypersurface in the space of the endogenous variables; the probability density is defined, and maximization is performed on such a hypersurface. Experimental results on these two mode predictors (and comparisons with deterministic and mean predictors) are provided for a macro model of the Italian economy. Keywords: Nonlinear econometric models, Simultaneous equations, Deterministic tor, Joint density function.
1. Introduction Predictors from nonlinear systems of stochastic equations are usually produced with deterministic
* Preliminary versions of this paper were presented at The Eighth International Symposrum on Forecasfing (Amsterdam, 1988), and at the European Meeting of the Econometric Society (Bologna, 1988). A simplified approach to the problem had earlier been proposed at the European Meeting of the Econometric Society (Budapest, 1986) in a joint paper with Jean-Louis Brillet, where experimental results were given for a large scale macroeconomic model of the French economy developed at INSEE. After suggestions and comments received from F.J. Henk Don, Stephen G. Hall, Fortunato Pesarin, Frederic P. Sterbenz, and Kenneth F. Wallis, in this paper we tackle the problem from a more general point of view. Mirella Damiani supplied the data and the model experimented with at the end of the paper. We are greatly indebted to all of them. Particular thanks are due to Giorgio Letta, for suggesting to us the mathematical framework that underlines the new method. However, responsibility for the contents of this paper remains only with the authors. 0169-2070/90/%03.50
predictor, Mean predic-
simulation. The shortcomings of this method have been widely investigated in the literature. For example, as the deterministic solution is usually different from the conditional mean of the endogenous variables, the model’s validation should not be based on the comparison between deterministic solution and historical values of the variables (Howrey and Kelejian, 1971). Moreover, Wallis (1982) gives examples where the use of deterministic simulation predictors from a nonlinear econometric model may be less efficient than an extrapolative time-series forecast, in contrast with a well-known result on the relative efficiency in the case of linear systems. Replacing deterministic solution predictors with an estimate of the conditional means of endogenous variables in the prediction period allows us to overcome this problem, to some extent. Observing that deterministic solutions of a nonlinear model with estimated parameters produce asymptotically biased and inefficient estimates of the conditional means, Mariano and Brown (1983)
0 1990 - Elsevier Science Publishers B.V. (North-Holland)
318
G. Calzolari, L. Panattoni / Mode predtctors in nonlinear systems
recommend the use of the sample mean of replicated parametric stochastic simulations, or (Brown and Mariano, 1984, 1985) the use of the sample mean of nonparametric residual-based simulations. Both procedures attain, under certain conditions, asymptotic unbiasedness (and efficiency to some extent). Empirical studies have been performed on a variety of real-world macroeconomic models used in practice by model builders for forecasting purposes (e.g. Bianchi et al., 1976, 1980; Calzolari, 1979; Fair, 1980; Hall, 1986a,b; Fisher and Salmon, 1986). They aimed at evidencing whether considering the conditional mean of the endogenous variables in place of the deterministic solution gives significant improvements to the forecaster or not. The conclusion is clearly model-dependent and not unequivocal; cases with significant differences have been evidenced. Predictors obtained as conditional means of the endogenous variables are not, however, optimal from all points of view. They are obviously optimal if the forecaster has in mind some kind of quadratic loss function for his forecasts, but they have two highly undesirable properties. The first is that the means do not necessarily satisfy the equations of the model: given the values of the predetermined variables and of the model’s parameters, it is generally impossible to find values of the random error terms which correspond to the mean values of all the endogenous variables.
This is a problem in particular for those variables which appear in nonlinear identities; if the means are computed with replicated stochastic simulations (parametric or nonparametric), although all identities hold replication by replication, they do not hold everywhere (at least the nonlinear ones) in terms of the means. In other words, the conditional mean does not provide a coherent forecast, and this can lead to some serious misinterpretations of the results. Suppose for example that an Italian model is designed to predict the exchange rate in lire per dollar (ERl). Then, in order to compute the exchange rate in dollars per lire, the model simply takes the reciprocal: the identity ER2 = l/ERl will therefore be included among the equations. If we use the conditional means as predictors, the mean of ER2 is not the reciprocal of the mean of ERl. The naive analyst, who looks
at the table of results supplied by the model, will probably think that these results are wrong (unless he is so ingenous to believe that the model also considers the existence of arbitrage opportunities in the exchange market). Considering the problem of incoherency of predictors obtained from conditional means, Hall (1986b) observes also that the modes of the marginal distributions of the endogenous variables would not escape the problem: the mode of the product of two random variables is generally not the product of the two modes. A second undesirable property of the mean predictors is also usually expected from the analysis of simple univariate distributions: the mean is usually expected to be less probable than the deterministic predictor; with respect to the deterministic solution value, the mean of an endogenous variable is shifted towards the side of more unlikely values, rather than towards the side where the probability density grows.
In fact in skewed univariate distributions which are unimodal and of moderate asymmetry, an empirical relationship holds between mean, median and mode (see Kendall and Stuart, 1969, section 2.11) and it is that the three quantities occur on the distribution in the same order (or in the reverse order) as in the dictionary. This should imply that, if we associate the median of each endogenous variable to the deterministic solution of the model, the most likely values of the variable are systematically expected to lie on the opposite side of the mean. What we propose in this paper is to reconsider the mode of the joint distribution of the endogenous variables as an optimal predictor which preserves identities. Although it does not coincide with the mode of the univariate distributions of the endogenous variables considered separately (i.e. their marginal distributions), in some sense it might be regarded as an estimate of the most likely joint value of all the endogenous variables simultaneously. At the same time, since it is obtained as a solution of the system corresponding to a particular set of values of the random error terms, it implies coherency. In a previous study, Brillet, Calzolari and Panattoni (1986) proposed to estimate the mode of the joint distribution, maximizing the joint density of a subset of the endogenous variables, corre-
G. Calzolari, L. Panattoni
/ Mode predictors in nonlinear systems
sponding to stochastic equations only. This was analogous to the approach that is usually followed for estimating models with full information maximum likelihood; before the estimation process (at least conceptually) identities are substituted into stochastic equations, and only after this can the likelihood be defined (and maximized) for the model that is, now, without identities. In this paper, a more general approach is developed; it is explicitly designed to keep the identities. The model with identities is viewed as a mapping between the space of the random errors and a hypersurface in the higher dimensional space of the endogenous variables, and maximization is performed on such a hypersurface. This provides the mode of the joint distribution of all the endogenous variables. Some technical difficulties, arising when searching for the maximum of the joint density function of all the endogenous variables in a large scale macroeconometric model, had to be solved. The computational problems, however, are only slightly more complex than for the method in Brillet et al. (1986). Experimental results on the two different mode predictors are provided for a macro model of the Italian economy. For the sake of completeness and to allow comparisons, we also provide results related to the mean predictors, without details on the computational methods that are widely described in the literature.
2. Notations and first assumptions Let the simultaneous equation model be represented as
f(
Yt, -53
a) =
1
Ut [ 0 ’
t=1,2
,...,
T,
where y, is the M x 1 vector of endogenous variables at time t, x, is the vector of predetermined variables at time t, a is the vector of all unknown structural coefficients in the model, and f is the M x 1 vector of structural form operators. The model is supposed to contain m I M stochastic
319
equations and M - m identities. The m of random error terms at time t,
X
1 vector
(2)
is assumed to be independently and identically distributed as N(0, 2) with the m X m covariance matrix JZ completely unknown, apart from being symmetric and positive definite. The vector u, is followed by a vector of M - m zeroes in the structural form equations (1). We also decompose each of the vectors f and y, in two subvectors of length m and M - m, respectively:
f =
[
;;::
I=
fi
YlJ
f,
Y2.l
(1)
r,
f m+l
r,
7
JI=
YI
Y’m,f
i
L Y/*)
Ym+l,,
L
YM,r
where, according to the partitioning of the random error terms adopted in eq. (1) f(l) is the subvector of the structural stochastic operators and f (‘) contains the identities. There is a certain freedom in ordering the endogenous variables inside the subvectors y,(i) and y>*). A choice that is rather obvious, but not necessarily unique, is to put into y,,, the variable which is explained by the structural equation whose error term is ~i,~, to put into y,,, the variable which is explained by the structural equation whose error term is u*,~, and so on, till y_ and u,,~. We have filled in this way the subvector y,(l). The subvector yl(*) is filled in some way from the remaining M - m endogenous variables. It is usually assumed that a simultaneous equations system like (1) uniquely defines the values of the elements of yt once values for the coefficients, the predetermined variables, and the disturbance terms are given. This means that the structural
G. Calzolari, L. Panattoni
320
form equations (1) implicitly reduced form equations: ’ Yl =d-%,
/ Mode predictors m nonlinear systems
define a system of
a, %I.
(4)
Since all predictors are conditional on the model’s parameters and predetermined variables, in what follows we shall indicate the mapping between the set of the endogenous variables and the set of the random error terms, defined by the stochastic structural equations of the model and its reduced form, simply as
subspace: for example, some of its elements may not assume negative values) and that the inverse function is also continuous and differentiable. The joint density of the elements of y,“) can thus be obtained, as usual, from the density of U, and the Jacobian determinant. Apart from an additive constant, the log-density of the m-dimensional random vector y,“) is
L, = - : log/E/ + log
~ Cap*> I; (ay,““) 11
- :P:z-‘P,,
3. The mode predictors 3.1. A simplified
approach:
m-mode
The way the problem is tackled in this section is reminiscent of the full information maximum likelihood estimation of simultaneous equations systems (e.g. Amemiya, 1983). Since the model (1) includes M - m identities, we first substitute these identities into the m stochastic equations. The resulting model may be written as p(y/“)=u,,
t=1,2
,...,
T,
(6)
where with the m X 1 vector of functions p we indicate the structural form equations after substitutions. 2 We assume that, given the coefficients and the predetermined variables, the functions vector p is a continuous and differentiable one to one mapping from a subset of Iw” onto the whole Iwm (u, is multivariate normal, and therefore it spans the whole lRm space, while y:” may be restricted to a
Given the assumptions, reduced form would be
a more precise notation
for the
where the vertical bar indicates an absolute value, while the slash indicates a determinant. To obtain the mode predictor at time h, we must first compute the subvector y;‘) that maximizes (7) given the predetermined variables x,,, the coefficients u and the covariance matrix 2 (for a and E:, of course, we shall use the available estimates a^ and 2). Given yi’), we must then solve the subsystem of identities, obtaining yi’), and thus the complete M X 1 predictor for all the endogenous variables. It is well known from the theory of maximum likelihood estimation that the boring operation of substituting the identities can be bypassed (see Oberhofer, 1971, who extends to the nonlinear case the treatment of identities in maximum likelihood estimation given by Rothenberg and Leenders, 1964, pp. 71-72; see also Anderson, 1984, Theorem A.3.2). In fact, the same values (and therefore the same maximum) of L, can be obtained if we partition the M X M Jacobian matrix corresponding to all equations into four blocks:
a fp ayp a j-,“)
We use, however, the notation of equation (4), as it is simpler and more frequently adopted in the literature. Solving the subsystem f@‘( y,) = 0 gives y,“’ = f”‘-l( y,(l)), then substitution gives
For simplicity, however, we use the notation of equation (6).
a@ (8) (the 1, 1 block corresponds to stochastic equations) and instead of /ap,,/i3y~‘)‘/ we use in (7) the ratio between the determinant of the M X M Jacobian matrix of the complete system, and
G. Calzolari, L. Panattoni
/ Mode predictors in nonlinear systenw
the determinant of the (M - m) X (M - m) submatrix corresponding to identities: L, = - + log/Z/
+ log -,;i, 1
- $p;z-‘p,.
(9)
I
To simplify notation, in what follows the time subscripts t and h will sometimes be suppressed, when this does not create confusion. For an easier comparison with the approach of next section, we modify the expression of the density given in equation (7). First, rather than viewing L as a function of y(l), we view it as a function of U. To do this, we simply substitute u into p in the last term, and consider the Jacobian matrix of first derivatives of the elements of y(l) with respect to the elements of II (which is simply the inverse of the matrix k3p/ay(‘)’ used above): - : log/Z/ - log I;
(W)) (aU’>
- ;U’Z-‘U. /I
(IO)
Then we simply substitute into the absolute value of the Jacobian determinant the square root of the determinant of the product between the transpose of the Jacobian matrix and the matrix itself: - + log/z‘/ - log /~!!P/1/2_
A.
(11) Maximizing (11) in the forecast period, h, max - : log/E/ - log ” [
/a;;)!
a#,, y=
-&l’X-Gl
1 )
1
(12)
provides a value for the m X 1 vector of random errors u,,. Inserting this vector into the model (1) and solving the model at time h (that is, at least conceptually, the same as inserting it into the reduced form equations (4) or (5)), we obviously get the same M x 1 predictor yh discussed above. The values of the endogenous variables computed in this way provide only a partial answer to our problem. In fact, the value computed for the first subvector ( y,“)) is the mode of the joint distribution of the elements of this subvector, but the second subvector ( yi”)) is simply calculated to preserve the internal coherency of the model (the whole vector yh is, in fact, a solution of the system). Suppose now that we interpret the mode
321
predictor as the most likely joint endogenous variables in the forecast only y,!(l) can be interpreted in this whole vector y,, is not the most likely all the endogenous variables. 3.2. A more general approach:
value of the period. Then way, but the joint value of
M-mode
In order to maximize the joint density function of all the endogenous variables of the model, we must first consider the type of mapping implied by the model, then some concepts on the measure of a regular (hyper)surface, and finally introduce a suitable definition of the probability density, with respect to this measure. To clarify the problem, let us consider the simplest cases. Suppose that our model consists of two equations, the former is stochastic while the latter is an identity. Given coefficients and predetermined variables, solving the model we get the value of the two endogenous variables for one value of the error term. If the solution is unique and the error term is normally distributed, this gives a mapping from R’ into IIX’. The reverse is not true; we can, in fact, enter y,,, and y,,, into the first structural equation and compute ui,* as a residual (and this will certainly be unique, as the error term is additive in the structural form equations), but we cannot take two arbitrary values for the endogenous variables, since they are constrained by the second equation (a point that is well known to the model builders: coherent historical data must satisfy all the identities). The values of y,,, and Y2.f must lie on a curve in the two-dimensional plane. Therefore our model can be viewed as a mapping from R’ onto a one-dimensional subset of R2, that is a curve in the plane. Suppose that our model consists of three equations, two of which are stochastic. If for any value of the random errors ul, and t+ (in R*) the model provides a unique solution for yl,,, y2,t and y, ,, the values of the y’s will neither span the full Iwj space, nor any three-dimensional subset of lR3, but will lie on a surface. The model, therefore, can be regarded as a mapping from R2 onto a two-dimensional surface in R3. In the general case of M equations, m of which are stochastic, the model (in the forecast period, given predetermined variables and parameters) can be viewed as a mapping from Iw” onto an m-dimensional hypersurface in the lRM space (being
322
G. Calzolari, L. Punattonr / Mode predictors in nonhear
m 5 M). We call Y this hypersurface. Under the two assumptions (1) for any y in Y we get a unique vector zf, (2) for any II in 88m the solution y is unique, we are then dealing with a one to one mapping between R” and the m-dimensional hypersurface Y in lR”. As in equation (5), we indicate this mapping as f(i): Y + R” (the stochastic structural equations) and its inverse as g: R” -+ Y (the reduced form). The second assumption is somewhat restrictive, but widely adopted in the treatment of nonlinear systems. The first, on the contrary, is not restrictive at all, given the way in which model builders write the structural form equations of a model, with additive error terms, like (1). In fact, if a vector of endogenous variables belongs to Y, it certainly satisfies the constraint of the M - m identities. In such a case, if we plug this vector into the m structural stochastic equations, we get a unique vector of residuals. Having defined the nature of this one to one mapping, we now assume, as in the previous section, continuity and differentiability (in both directions), and finally assume that the M X m Jacobian matrix G(u) = Cly,/au’ = ag( u)/Zlu’ has full rank (= m). In an M x m rectangular matrix, a substitute for the notion of determinant of a square matrix is the concept of modulus. It is defined as the square root of the sum of all the (,“) squared determinants of order m obtained by cancelling M - m rows. A simple way of computing the modulus, even when M and m are large, is obtained applying a well-known theorem of linear algebra on the determinant of the product of two rectangular matrices (e.g. Hadley, 1961, p. 102); it is the square root of the determinant of the product between the m X M transpose and the matrix itself. For our Jacobian, G(u), it is
(13) Having assumed full rank (= m) for the Jacobian, the modulus is # 0. The m-dimensional hypersurface Y defined by the model is, under all these assumptions, a regular hypersurface. We need now to define a measure on the regular m-dimensional hypersurface Y. As is well
systems
known, the M-dimensional Lebesgue measure of any m-dimensional hypersurface in R M (m -C M) is zero (e.g. Sikorski, 1969, p. 299). This is consistent with intuition: the two-dimensional measure (area) of a curve in the plane is zero; the three-dimensional measure (volume) of a curve or of a surface in lR3 is also zero, etc. But we can define a special one-dimensional measure on curves, consistent with the intuitive notion of length; or a special two-dimensional measure on surfaces, consistent with the intuitive notion of area. Existence and uniqueness of such a measure, for a regular m-dimensional hypersurface Y in WM (m _( M) is stated, for example, in Sikorksi (1969, p. 327, Theorem l.l), and is given by d/.+= ,G(u)
,du=l$$ldu=i%$$idu.
(14)
This theorem can be applied to derive a probability density with respect to the hypersurface measure pLr. If C+(U)is the probability density for u in R”“, the measure of the probability that y belongs to an element of the hypersurface Y, resulting (through y = g(u)) from the element du of R”, is given by dn, = rpfu) du and can easily be developed as follows:
where 3/u(~)=#ff(“(~)]/jG~f”‘(~)jj
(17)
is taken as the definition of the probability density of the vector y on the regular m-dimensional hypersurface Y. It is rather obvious that qr( y) respects the usual conditions for density functions, being always 2 0 and with integral being equal to 1 over the whole Y, with respect to the m-dimensional hypersurface measure py. We now indicate with L, the log-density &=
1ogM
Y>l
(18)
that will obviously be L,, for the values of y at time t, or L,,, in the forecast period, h. Being tp
G. Calzolari, L. Panattoni
/ Mode predictors in nonlinear systems
the density of the m-variate normal N(0, E), we have, apart from a constant term, L,=
-+
log/&-log]G[f”‘(y)]]
-b[f”‘(Y)l’H-L[f(l)(Y)l.
(19)
L, in equation (19) is conceptually defined as a function of the A4 x 1 vector y, whose domain is the m-dimensional hypersurface Y in IwM. The M x 1 vector y, that maximizes L, in Y (that is, y is constrained on Y), is the mode predictor max (-~log/~/-log/G[f’*‘(y)l( YGY
-~[f”‘(Y)l’~~‘~f”‘(Y)l}.
(20)
If we interpret it as the most likely joint value of the endogenous variables in the forecast period, such an interpretation applies to all the endogenous variables, unlike the result derived in the previous section. Numerical values of L y can easily be calculated by replacing f(‘)(y) with u: - + log/X/-
log 1G(u)
I- +u’Z’u.
(21)
Equation (21) provides a function of the m X 1 vector u, that must be maximized for II without constraints in R”: m;[-$
log/I=/-logIG(u)~-tu’~-‘u] (22)
in the forecast period, h. Inserting the resulting vector uh into the model (1) then solving the model in the forecast period, we obtain the M X 1 mode predictor y,, . From a technical point of view, the process is quite similar to what we did in the previous section (equation (12)). The only difference is that in equation (12) we have the modulus of the m x m square matrix ay”)/au’, while here we have the modulus of the M x m rectangular matrix ay/au’.
4. Maximization
algorithm
The two functions to be maximized for the methods of sections 3.1 and 3.2 are given in equations (11) and (21) respectively. Maximization is to be done with respect to the vector u,,, while predetermined variables are given, as well as coefficients and covariance matrix of the random error process (set at their estimated values a^ and 2).
323
The first term in both equations is a constant. We must therefore compute the vector uh which maximizes the sum of the second and third term of equations (11) and (21). The third term in both equations is particularly simple, being a positive definite quadratic form, whose Hessian is the matrix 8-l. We have used the well-known updating formula due to Broyden, Fletcher, Goldfarb and Shanno (BFGS, see for example Dennis and More, 1977). The algorithm is based on an iterative updating of an initial (m x m) positive definite matrix. Since the computational efficiency is greatly improved if the initial matrix approximates the Hessian, it was rather obvious in both cases to use the available estimate of 2-l. The algorithm also requires the evaluation at each step of the gradient of the function. This proved to be a rather serious problem and at this stage it has been solved by the numerical computation of first derivatives. This approach, however, has two main drawbacks. First of all it requires a long computation time, but nevertheless it came out to be computationally more efficient than rival maximizing algorithms which require only the computation of the function value. To quantify this aspect we can mention that in the case of the Italian model here considered (see below for description) the maximization of the quantities defined by equations (11) and (21) with ten significant digits on the function’s value, in both cases required five iterations. The computation globally took about 14 minutes of CPU time on an IBM 4341 computer in both cases. However, for the method of section 3.1 we can maximize the expression of the density given in equation (9), rather than (11); in this case the computation is much faster (only 50 seconds). This is mainly due to the fact that the elements of the Jacobian matrices in equations (11) and (21) are obtained from differences between two solutions of the simultaneous equation system, while the corresponding elements in equation (9) are simply computed from the differences between the residuals of the single equations: obviously the latter computation is much faster than the former. Unfortunately, we are unable to provide a similar computational simplification for the method of section 3.2. A second main drawback lies in a possible lack of accuracy in the computation of the derivatives.
324
G. Calzolari, L. Panuttoni
/ Mode predictors in nonlinear systems
In order to assess the robustness of the results versus the way in which derivatives are computed, several formulas (i.e. two, three and five points formulas with different sizes of the increment) were experimented with. The results proved to be very robust versus both the choice of the formula and the choice of the numerical increment, provided that a centered formula is used.
5. Results for a macro model of the Italian economy In a recent paper, Sylos Labini (1987) has analyzed the main characteristics of the labor market in Italy. Some structural equations are also presented in the paper, for unemployment, productivity, wages and salaries, sometimes disaggregated by sectors. These equations have then been plugged into a macroeconometric model of the real sector (Damiani, 1987). The present version of the model consists of six blocks: demand, supply, prices, labor market, foreign trade, income distribution. The total number of equations in the model is M = 62, while m = 25 is the number of stochastic equations.
Table 1 List of the main endogenous PREMIN PREXP P2 s IN VIDK PR7 PREING VAINDK DIST OCCD OCSRD EXFOBK IMFOBK PILK COFAMK
variables.
= Deflator for domestic private consumption (growth rate). = Export price index (growth rate). = Production per hour in the industrial sector (growth rate). = Wage per hour in the industrial sector (growth rate). = Gross fixed investment in the industrial sector (growth rate). = Profit rate. = Wholesale price index (growth rate). = Value added in the industrial sector at constant prices (growth rate). = Unemployment (growth rate). = Total dependent employment (growth rate). = Employment in the sector of services (growth rate). = Exports at constant prices (growth rate). = Imports at constant prices (growth rate). = Gross domestic product at constant prices (growth rate). = Domestic private consumptions.
Table 2 One-period terministic
PREMIN PREXP P2 s IN VIDK PR7 PREING VAINDK DIST OCCD OCSRD EXFOBK IMFOBK PILK COFAMK
forecast at 1986 (percentage forecasts).
deviations
from
mean
M-mode
m-mode
- 0.126 - 0.394 0.030 - 0.338 0.075 0.021 - 0.595 0.084 0.049 0.040 0.006 0.148 - 0.390 0.050 - 0.467
- 0.40 0.959 - 3.36 0.127 0.884 - 0.081 0.293 -1.08 5.44 2.66 0.179 0.131 -1.57 0.581 1.15
- 0.004 0.063 - 0.401 - 0.002 -0.140 - 0.016 0.009 - 0.286 0.217 - 0.033 - 0.001 0.034 - 0.281 -0.172 - 0.244
de-
Estimates of the structural coefficients of the model have been obtained by means of a straightforward extension of Brundy and Jorgenson’s (1971) instrumental variables method (limited information) to the case of nonlinear models. The method was applied iteratively, as proposed in Dutta and Lyttkens (1974), so that the final estimates of coefficients are not affected by the choice of the initial values of coefficients. In each iteration, the instrumental variables are computed with a deterministic solution of the system (which is the simplest choice, although not the best in the class of nonlinear limited information estimators, as is explained in Amemiya, 1983). The last year of the sample period is 1985. Forecasts are related to the first year outside the sample estimation period, 1986. For some of the main endogenous variables of the model (listed in Table l), Table 2 presents in the last two columns the differences between the deterministic predictor and the two mode predictors defined in sections 3.2 and 3.1. The two columns are labelled M-mode and m-mode, respectively, and the differences are displayed in percentage form (23)
For the sake of completeness, we display in the first column (indicated as mean) the differences between the deterministic predictor and the mean
G. Calzolari, L. Panattoni
/ Mode predictors in nonlinear systems
predictor, still in percentage form. 3 The mean predictor, under the assumption of normally distributed error terms, is computed with parametric stochastic simulation (e.g. Bianchi et al., 1976, 1980; Fisher and Salmon, 1986; Hall, 1986a,b), with antithetic variates (e.g. Calzolari, 1979). The number of replications performed ensures an accuracy at least till the second decimal digit for all the variables. It has been shown (see Kendall and Stuart, 1969, section 2.11) that univariate distributions (unimodal and moderately asymmetric) have the mean and the mode on opposite sides of the median, and that the difference between mean and mode is about three times the difference between mean and median. Under some conditions, the deterministic solution can be considered equivalent to the median (see Hall, 1986b). We should therefore expect a value of the difference median - mode about minus two times the difference median - mean. This is obviously not true in our case, whether we consider the mode computed with the simplified approach of section 3.1, or the global method of section 3.2 (third and second columns of Table 2, respectively). It is also not true that mean and mode lie on opposite sides of the deterministic solution. This means that, unlike the univariate case, when we abandon the deterministic forecast in favor of the (more efficient) mean predictor, we are not necessarily moving towards a region of less likely values. Viewing the same thing in the other way, this means that if our optimality criterion suggests abandoning the deterministic forecast in favor of a mode predictor, we are not necessarily moving towards a less efficient (in quadratic terms) forecast. Between the two mode predictors, the global approach of section 3.2 produces values that are considerably more distant from the deterministic solution. The distance is also considerably larger than for the mean predictors. The mean predictor, in fact, is always very close to the deterministic
3 Alternative
mean predictors could be obtained with the residual-based procedures proposed by Brown and Mariano (1984, 1985). These procedures are non-parametric; in particular they do not rely on the assumption of normality of the error terms. Since this paper is focussed on the mode predictors, results on the mean predictors are given only for comparison purposes and are therefore confined to the parametric method only.
325
solution. The largest discrepancies are observable for the wholesale price index PREING, for which it is about -0.59%, and for the private consumption COFAMK, for which it is about -0.47%; for the growth rate of the gross domestic product PILK it is completely irrelevant, being only 0.05% (of course we should not forget that we are dealing only with forecasts one year ahead; these discrepancies may become larger in a long run of dynamic simulation). When we look at the mode predictors, it is only for the m-mode that we still observe very small differences with respect to the deterministic solution. For the M-mode, on the contrary, the differences seem to be rather relevant.
References Amemiya, T., 1983, “Non-linear regression models”, in: 2. Griliches and M.D. Intriligator, eds., Handbook of Econometrics. Vol. 1 (North-Holland, Amsterdam) 333-389. Anderson, T.W., 1984, An Introduction to Multiuariate Statistical Analysis (Wiley, New York). Bianchi, C. and G. Calzolari, 1980, “The one-period forecast errors in nonlinear econometric models”, International Economic Reuiew, 21, 201-208. Bianchi, C., G. Calzolari and P. Corsi, 1976, “Divergences in the results of stochastic and deterministic simulation of an Italian non-linear econometric model”, in: L. Dekker, ed., Simulation of Systems (North-Holland, Amsterdam) 653661. Brillet, J.L., G. Calzolari and L. Panattoni, 1986, “Coherent optimal prediction with large nonlinear systems: An example based on a French model”, Discussion paper presented at the European Meeting of the Econometric Society, Budapest, September l-5 (INSEE, Paris). Brown, B.W. and R.S. Mariano, 1984, “Residual-based procedures for prediction and estimation in a nonlinear simultaneous system”. Econometrica, 52, 321-343. Brown, B.W. and R.S. Mariano, 1985, “Reduced variance prediction in nonlinear simultaneous systems”, Discussion paper presented at the World Congress of the Econometric Society, M.I.T., Cambridge, MA, August 17-24 (Rice University, Houston, TX). Brundy, J.M. and D.W. Jorgenson, 1971, “Efficient estimation of simultaneous equations by instrumental variables”, The Review of Economics and Statistics, 53, 201-224. Calzolari, G., 1979, “Antithetic variates to estimate the simulation bias in non-linear models”, Economics Letters, 4, 323328. Damiani, M., 1987, “Uno schema macroeconomico di simulazione per l’economia italiana”, Mimeo. (Facoltb di Scienze Politiche, Universita di Perugia, Perugia). Dennis, J.E. and J.J. More, 1977, “Quasi-Newton methods, motivation and theory”, SIAM Review, 19, 46-89. Dutta, M. and E. Lyttkens, 1974, “Iterative instrumental varia-
326
G. Calzolari, L. Panattoni
/ Mode predictors in nonlinear systems
bles method and estimation of a large simultaneous system”, Journal of the American Statistical Association, 69, 917-986. Fair, R.C., 1980, “Estimating the expected predictive accuracy of econometric models”, International Economic Review, 21, 355-378. Fisher, P. and M. Salmon, 1986, “On evaluating the importance of nonlinearity in large macroeconometric models”, International Economic Review, 27, 625-646. Hadley, G., 1961, Linear Algebra (Addison-Wesley, Reading, MA). Hall, S.G, 1986a, “The application of stochastic simulation techniques to the National Institute’s model 7”, The Manchester School, June, 180-201. Hall, S.G., 1986b. “The importance of non-linearities in large forecasting models with stochastic error processes”, Journal of Forecasting, 5, 205-215. Howrey, E.P. and H.H. Keleijan, 1971. “Simulation versus analytical solutions: The case of econometric models”, in: T.H. Naylor, ed., Computer Simulation Experiments with Models of Economic Systems (Wiley, New York) 2999319. Kendall, M.G. and A. Stuart, 1969, The Advanced Theory of Statistics, Vol. I (Charles Griffin, London). Mariano, R.S. and B.W. Brown, 1983, “Asymptotic behavior of predictors in a nonlinear simultaneous system”, International Economic Review, 24, 523-536. Oberhofer, W., 1971, “On the treatment of definition equa-
tions in likelihood functions”, Discussion paper no. 14, (Institut fir Gesellschaftsund Wirtschaftswissenschaften, Universitat Bonn, FRG). Rothenberg, T.J. and C.T. Leenders, 1964, “Efficient estimation of simultaneous equation systems”, Econometrica, 32, 57-16. Sikorski, R., 1969, Advanced Calculus: Functions of Several Variables (PWN-Polish Scientific Publishers, Warszawa). Sylos Labini, P., 1987, “The theory of unemployment, too, is historically conditioned”. Banca Nazronale del Lavoro Quarterly Review, 163, 379-435. Wallis, K.F., 1982, “Time series versus econometric forecasts: A non-linear regression counter-example”, Economtcs Letters. 10, 309-315.
Biographies:
Giorgio CALZOLARI and Lorenzo PANATTONI are research staff members at the IBM Scientific Center of Pisa (Italy). Their research activity is mainly concerned with simulation and estimation methods for econometric models, decision models, forecasting methodology, applications of Monte Carlo methods to econometric models. They have published papers and notes in Econometrrca, Economics Letters, International Economic Reuiew, International Journal of Forecasting, Journal of Economtc Dynamics and Control, Journal of Econometrics.