Economics Letters 75 (2002) 11–16 www.elsevier.com/locate/econbase
The partially linear regression model: Monte Carlo evidence from the projection pursuit regression approach

Dingding Li, Thanasis Stengos*
Department of Economics, University of Guelph, Guelph, Ontario, Canada N1G 2W1

Received 28 April 2001; accepted 23 July 2001
Abstract

In a partially linear regression model with a high-dimensional unknown component we find an estimator of the parameter of the linear part based on projection pursuit methods to be considerably more efficient than the standard density weighted kernel estimator. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Partially linear model; Projection pursuit regression
JEL classification: C14; C15
1. Introduction

The most extensively studied nonparametric regression techniques are based on p-dimensional local averaging: the estimate of the regression surface at a point x_0 is the average of the responses of those observations whose predictors lie in a neighborhood of x_0. These techniques can be shown to have desirable asymptotic properties. In high-dimensional settings, however, they do not perform well at reasonable sample sizes; the reason is the inherent sparsity of high-dimensional samples. This is the well-known 'curse of dimensionality' that plagues nonparametric methods. Among the methods developed in recent years to deal with this problem is Projection Pursuit Regression (PPR); see Friedman and Stuetzle (1981) and Huber (1985) for an excellent review. PPR is designed to detect structure in high-dimensional data sets by viewing lower-dimensional representations of the data by means of linear projections. The power of the technique stems from its ability to overcome the 'curse of dimensionality', since it relies on estimation in at most tri-variate settings. The term Projection Pursuit was coined by Friedman and Tukey (1974) in the first successful implementation of the technique. Hall (1989) proved that the common form of kernel-based PPR estimates projections with convergence rates identical to those encountered in one-dimensional problems, although a greater degree of smoothness (in fact, an extra derivative) must be assumed to achieve this. In this paper we employ PPR in a partially linear semiparametric regression (PLR) model and, by means of Monte Carlo simulations, evaluate the efficiency gains obtained using PPR as compared with the more widely applied local constant kernel methods. In the next section we give a brief presentation of the idea behind PPR. We then present the PLR model; see Robinson (1988). Finally, we present the results of the simulations.

* Corresponding author. Present address: Department of Economics, University of Cyprus, P.O. Box 20537, Nicosia, Cyprus. Fax: +357-2-750-310. E-mail address: [email protected] (T. Stengos).
2. The projection pursuit regression approach

The PPR approach is based on projections of the data onto planes spanned by the endogenous variable y and a linear combination a^T x of the explanatory variables. The idea of PPR is to approximate the mean regression function by a sum of unknown ridge functions. Diaconis and Shahshahani (1984) provide an approximation theory justification as the number of ridge functions goes to infinity.

Many data sets are high dimensional, and it has been common practice to use lower-dimensional linear projections of the data for visual inspection. The lower dimension is usually 1 or 2 (or perhaps 3). More precisely, if x_1, ..., x_n ∈ R^p are p-dimensional data, then a k (< p)-dimensional linear projection is w_1, ..., w_n ∈ R^k, where w_i = a^T x_i for some p × k matrix a such that a^T a = I_k, the k-dimensional identity matrix. When k = 1, the structure of the projected data can be viewed through a histogram; when k = 2, it can be inspected through a scatter plot; and when k = 3, it can be comprehended by spinning a three-dimensional scatter plot. Since there are infinitely many projections from a higher dimension to a lower one, it is important to have a technique for pursuing a finite sequence of projections that reveal the most interesting structures of the data. Below we give a simple description of the mechanics involved in regression estimation for the case k = 1.

In a typical regression model, (X, y) is an observable pair of random variables, where X ∈ R^p is a p-dimensional variable and y ∈ R is the dependent variable. The goal is to estimate the unknown regression function f(x) = E(y | X = x) using a random sample (x_1, y_1), ..., (x_n, y_n). PPR approximates the unknown regression function f(x) by a finite sum of ridge functions, g_m(x) = Σ_{j=1}^m g_j(x^T a_j).
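Before turning to the estimation mechanics, the k = 1 projection described above can be made concrete with a minimal numpy sketch (ours, not from the paper): synthetic data in R^4 are projected onto a unit direction a, and the projected values are binned as one would for the histogram inspection mentioned above.

```python
import numpy as np

# A k = 1 linear projection of p-dimensional data: w_i = a^T x_i with
# a^T a = 1 (the I_k condition for k = 1); the histogram of w_1,...,w_n
# is what one would inspect visually for interesting structure.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 4))   # n = 500 points in R^4
a = np.array([1.0, 1.0, 0.0, 0.0])
a /= np.linalg.norm(a)              # enforce a^T a = 1
w = X @ a                           # the projected data w_1,...,w_n
counts, edges = np.histogram(w, bins=20)
```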
The regression function f(x_i) (x_i is 1 × p) is then written as the sum of m < p unknown nonlinear functions of the scalars w_{ij} = x_i^T a_j:

    y_i = Σ_{j=1}^m g_j(x_i^T a_j) + ε_i                                    (1)
Suppose that we want to estimate a_1 in w_{i1} = x_i^T a_1. We start by conditioning (1) on w_{i1}:

    E(y_i | w_{i1}) = Σ_{j=1}^m E[g_j(w_{ij}) | w_{i1}]                     (2)
We can write:

    y_i = E(y_i | w_{i1}) + u_i                                             (3)

where:

    u_i = Σ_{j=2}^m { g_j(x_i^T a_j) − E[g_j(w_{ij}) | w_{i1}] } + ε_i

Hence, by construction, E(u_i | w_{i1}) = 0. The above can also be written as:

    y_i = f(w_{i1}) + u_i                                                   (4)
If f(·) were known in (4), one could simply apply nonlinear least squares (NLS) to obtain an estimate of a_1. However, since f(·) is unknown, we proceed as follows.
Step 1: We start with an initial guess for a_1, say a_1^0, and form w_{i1}^0 = x_i^T a_1^0. We then estimate E(y_i | w_{i1}^0) by nonparametric kernel methods. We choose the value of a_1 that minimizes the sum of squared residuals (SSR) function Σ_{i=1}^n (y_i − E(y_i | w_{i1}))². That yields an estimate of g_1(w_{i1}), say ĝ_1(w_{i1}).
Step 2: We form y_i* = y_i − ĝ_1(w_{i1}) and repeat the procedure of Step 1 to obtain ĝ_2(w_{i2}).
Step 3: We form y_i** = y_i* − ĝ_2(w_{i2}) and continue as before. We stop whenever the improvement in the SSR function between successive iterations is less than some predetermined small tolerance value.
Note that in the implementation of the PPR method we only have to deal with univariate local constant kernel regressions. They have to be computed for all n observations at each stage of each iteration, however, so for large n there is a computational burden involved.
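The steps above can be sketched in a few lines of numpy. This is a toy version under stated assumptions: a Gaussian kernel for the univariate smooths, a fixed bandwidth, and a crude random search over unit directions in place of the derivative-based SSR minimization used in practice (cf. Friedman and Stuetzle, 1981); the function names nw_smooth, fit_ridge_term and ppr are ours.

```python
import numpy as np

def nw_smooth(w, y, h):
    """Nadaraya-Watson estimate of E[y|w] at each sample point, using a
    univariate Gaussian kernel with bandwidth h (a local constant fit)."""
    u = (w[:, None] - w[None, :]) / h
    K = np.exp(-0.5 * u ** 2)
    return K @ y / K.sum(axis=1)

def fit_ridge_term(X, y, h=0.3, n_tries=300, rng=None):
    """One PPR stage (Step 1): search for the unit direction a whose
    univariate smooth of y on w = X a minimizes the SSR. A random search
    over directions stands in for the derivative-based optimizer."""
    rng = np.random.default_rng(rng)
    best_a, best_ssr, best_fit = None, np.inf, None
    for _ in range(n_tries):
        a = rng.standard_normal(X.shape[1])
        a /= np.linalg.norm(a)
        ghat = nw_smooth(X @ a, y, h)
        ssr = np.sum((y - ghat) ** 2)
        if ssr < best_ssr:
            best_a, best_ssr, best_fit = a, ssr, ghat
    return best_a, best_fit

def ppr(X, y, n_terms=2, **kw):
    """Fit ridge terms to successive residuals (Steps 2-3 in the text)."""
    resid, fit = y.copy(), np.zeros_like(y)
    for _ in range(n_terms):
        a, ghat = fit_ridge_term(X, resid, **kw)
        fit += ghat
        resid = resid - ghat
    return fit
```

A full implementation would also stop on a tolerance for the SSR improvement, as described in Step 3, rather than fitting a fixed number of terms.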
2.1. Semiparametric estimation of partially linear models

Our analysis follows closely that of Robinson (1988). We consider a semiparametric partially linear model (e.g. Engle et al., 1986; Robinson, 1988; Stock, 1989):

    y_i = x_i′β + θ(z_i) + u_i                                              (5)
where x_i is of dimension p × 1, z_i is of dimension q × 1, and β is an unknown parameter vector of dimension p × 1. E(u_i | x_i, z_i) = 0, the conditional variance function σ²(x_i, z_i) = E(u_i² | x_i, z_i) is not specified, and θ(·) is an unknown smooth function. We are interested (mainly) in estimating β. Following Robinson (1988), we use a two-step procedure and first estimate β. Taking the conditional expectation of (5) given z_i and then subtracting it from (5) yields:

    y_i − E(y_i | z_i) = {x_i − E(x_i | z_i)}′β + u_i                       (6)
Eq. (6) no longer involves the unknown function θ(·). Therefore, we can estimate β based on Eq. (6). However, E(y_i | z_i) and E(x_i | z_i) are unknown in practice. These conditional expectations can be consistently estimated by nonparametric methods; in this paper we use the kernel method. In order to avoid the random denominator problem associated with nonparametric kernel estimation, we choose to estimate a density weighted version of Eq. (6) (e.g. Powell et al., 1989). Let f(·) be the density function of z_i. Multiplying Eq. (6) by f(z_i) gives:
    f(z_i){y_i − E(y_i | z_i)} = f(z_i){x_i − E(x_i | z_i)}′β + f(z_i)u_i   (7)
We estimate E(y_i | z_i), E(x_i | z_i) and f(z_i) by the kernel estimators:

    ŷ_i ≡ Ê(y_i | z_i) = (1/(N h^q)) Σ_{s≠i} y_s K_{i,s} / f̂_i             (8)

    x̂_i ≡ Ê(x_i | z_i) = (1/(N h^q)) Σ_{s≠i} x_s K_{i,s} / f̂_i            (9)

and

    f̂_i ≡ f̂(z_i) = (1/(N h^q)) Σ_{s≠i} K_{i,s}                            (10)
where K_{i,s} = K((z_i − z_s)/h) is the kernel function and h is the smoothing parameter. We use the product kernel K(z_i) = ∏_{l=1}^q k(z_{i,l}), where k is a univariate kernel and z_{i,l} is the lth component of z_i. Replacing the unknown functions in (7) by their respective kernel estimates, we get:
    f̂_i(y_i − ŷ_i) = f̂_i(x_i − x̂_i)′β + f̂_i u_i                         (11)
Therefore we estimate β by least squares, regressing (y_i − ŷ_i) f̂_i on (x_i − x̂_i) f̂_i:

    β̂ = S⁻¹_{(x−x̂)f̂} S_{(x−x̂)f̂,(y−ŷ)f̂}                               (12)

where, for scalar or column-vector sequences with ith elements A_i f̂_i and B_i f̂_i, the notations are S_{Af̂,Bf̂} = (1/N) Σ_i A_i f̂_i B_i′ f̂_i and S_{Af̂} = S_{Af̂,Af̂}. The estimator β̂ can then be shown to be √N-consistent. Below we investigate its small sample properties when we compute it (i) using the standard approach based on local constant kernel methods as in Eq. (11), and (ii) applying PPR to estimate E(y_i | z_i) and E(x_i | z_i) and then using least squares estimation directly on Eq. (6).
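Eqs. (8)–(12) translate almost line-for-line into numpy. The sketch below is ours, assuming a Gaussian product kernel for k; note how multiplying each conditional expectation through by f̂_i, as in Eq. (11), means the random denominator of Eqs. (8)–(9) never actually has to be formed before the least squares step of Eq. (12).

```python
import numpy as np

def dw_estimator(y, X, Z, h):
    """Density weighted estimator of beta in y = x'beta + theta(z) + u,
    following Eqs. (8)-(12): leave-one-out product Gaussian kernels,
    with every term multiplied through by f_hat(z_i) so that no random
    denominator appears."""
    n, q = Z.shape
    U = (Z[:, None, :] - Z[None, :, :]) / h        # (z_i - z_s)/h, componentwise
    K = np.exp(-0.5 * U ** 2).prod(axis=2)         # product kernel K_{i,s}
    np.fill_diagonal(K, 0.0)                       # leave-one-out sums: s != i
    fhat = K.sum(axis=1) / (n * h ** q)            # Eq. (10)
    fy = (K @ y) / (n * h ** q)                    # f_hat_i * yhat_i from Eq. (8)
    fX = (K @ X) / (n * h ** q)                    # f_hat_i * xhat_i from Eq. (9)
    Yt = fhat * y - fy                             # f_hat_i (y_i - yhat_i)
    Xt = fhat[:, None] * X - fX                    # f_hat_i (x_i - xhat_i)
    return np.linalg.solve(Xt.T @ Xt, Xt.T @ Yt)   # least squares, Eq. (12)
```

In a real application h would be chosen by cross-validation, as the Monte Carlo section below does, rather than passed in by hand.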
3. Monte Carlo simulations

We implement the PPR method on the partially linear regression model by obtaining estimates of E(y_i | z_i) and E(x_i | z_i) and then using least squares estimation directly on Eq. (6) to obtain an estimate of β. We carry out simulations to investigate the behavior of β̂ in small to moderate sample sizes under two different cases, a low-dimensional and a high-dimensional z-vector. The standard approach is implemented using the density weighted method outlined in the previous section. The Monte Carlo design is similar to that of Robinson (1988). Table 1 contains the results for the model:

    y_i = x_i β + z_i² γ + u_i                                              (13)

The variables X and Z are taken to be scalars from a bivariate normal population with zero means, variances 4 and 3, respectively, and covariance 2. The error term u is also chosen to be standard normal. The parameters are taken to be β = γ = 1 and the sample size n is chosen as 50, 100, 200
Table 1
One-dimensional unknown component

n      MSE-ppr   MSE-semi   Rel. EFF
50     0.0134    0.0140     0.9720
100    0.0069    0.0062     1.0173
200    0.0032    0.0028     1.0240
300    0.0021    0.0021     1.0241
and 300. Bandwidth selection was done by cross-validation, both for the standard density weighted semiparametric estimator and for the PPR estimator. For each sample size we report the mean square error (MSE) of β̂_semi, the standard density weighted semiparametric estimator of β, and of β̂_ppr, the PPR estimator of β. We also report a measure of relative efficiency between the two estimators: the ratio of the standard deviation of β̂_ppr to that of β̂_semi. A value greater than unity indicates that β̂_ppr is less efficient than β̂_semi, and a value less than unity the opposite. The results suggest that for a one-dimensional z, the standard density weighted approach produces overall more efficient results, since all the values of the relative efficiency measure are greater than unity, except for the case n = 50. The MSE values are also smaller for the estimates from the standard approach. Table 2 presents the results for the case of a higher-dimensional Z. The equation that generates the data is given by:

    y_i = α + x_i β + Σ_{j=1}^q z_{ji}² γ_j + u_i                           (14)
where q = 5 and β, γ_j (j = 1, ..., 5) are all 1, u is as before standard normal, and X and the Z_j's are equicorrelated identically distributed N(1, 3) variables with correlation 2/3. The results are presented in Table 2. It becomes quite apparent that the PPR approach produces considerable improvements in relative efficiency. As the sample size increases, the measure of relative efficiency increases as well, from about 0.5 for n = 50 to about 0.75 for n = 300. As expected, the performance of the standard semiparametric estimator improves as n increases, but it is apparent that for small to moderate sample sizes the PPR approach offers substantial efficiency gains. Hence, it can become a useful tool for improving the small sample efficiency of local averaging nonparametric estimators in high-dimensional settings.

Table 2
Multi-dimensional unknown component

n      MSE-ppr   MSE-semi   Rel. EFF
50     0.0483    0.2000     0.5176
100    0.0245    0.0738     0.6323
200    0.0120    0.0324     0.7145
300    0.0056    0.0211     0.7498
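For readers wishing to replicate the high-dimensional design of Eq. (14), one draw from it can be sketched as follows. This is a hypothetical reconstruction: the intercept α is not given in the text, so we set α = 0, and the function name simulate is ours.

```python
import numpy as np

def simulate(n, q=5, rng=None):
    """Draw one sample from the design of Eq. (14): X and the Z_j are
    equicorrelated, identically distributed N(1, 3) with correlation 2/3
    (so covariance 2), beta = gamma_j = 1 and u ~ N(0, 1). The intercept
    alpha is set to 0 here, since its value is not stated in the text."""
    rng = np.random.default_rng(rng)
    dim = q + 1                             # X together with Z_1,...,Z_q
    cov = np.full((dim, dim), 2.0)          # covariance 2 = (2/3) * 3
    np.fill_diagonal(cov, 3.0)              # variance 3
    W = rng.multivariate_normal(np.ones(dim), cov, size=n)
    x, Z = W[:, 0], W[:, 1:]
    y = x + (Z ** 2).sum(axis=1) + rng.standard_normal(n)
    return y, x, Z
```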
Acknowledgements

The second author wishes to acknowledge support from the SSHRC of Canada.
References

Diaconis, P., Shahshahani, M., 1984. On nonlinear functions of linear combinations. SIAM Journal on Scientific and Statistical Computing 5, 175–191.
Engle, R.F., Granger, C.W.J., Rice, J., Weiss, A., 1986. Semiparametric estimates of the relation between weather and electricity sales. Journal of the American Statistical Association 81, 310–320.
Friedman, J.H., Tukey, J.W., 1974. A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers C-23, 881–889.
Friedman, J.H., Stuetzle, W., 1981. Projection pursuit regression. Journal of the American Statistical Association 76, 817–823.
Hall, P., 1989. On projection pursuit regression. Annals of Statistics 17, 573–588.
Huber, P., 1985. Projection pursuit. Annals of Statistics 13, 435–525.
Powell, J.L., Stock, J.H., Stoker, T.M., 1989. Semiparametric estimation of index coefficients. Econometrica 57, 1403–1430.
Robinson, P.M., 1988. Root-N-consistent semiparametric regression. Econometrica 56, 931–954.
Stock, J.H., 1989. Nonparametric policy analysis. Journal of the American Statistical Association 84, 567–575.