Spatial Autocorrelation: A Primer



Journal of Housing Economics 7, 304–327 (1998). Article No. HE980236.

Robin A. Dubin
Case Western Reserve University

Received September 17, 1998

Regression error terms are likely to be spatially autocorrelated in any situation in which "location matters." While both the precision of the estimates and the reliability of hypothesis testing can be improved by making a correction for spatial autocorrelation, the techniques for making such a correction are not widely understood. The purpose of this paper is to explore some of the issues involved in estimating models with spatially autocorrelated error terms. One of the two most common methods of handling spatial autocorrelation is the weight matrix approach, in which the process generating the errors is modeled. The resulting correlation structure is then derived from the hypothesized process. The second method models the correlation structure itself, rather than the underlying process. The bulk of this paper is concerned with comparing these two methods and their resulting correlation structures. Other issues are discussed at the end of the paper. © 1998 Academic Press

1. INTRODUCTION

While autocorrelation in a time series context is well understood, and researchers routinely test and correct for it, the same cannot be said of autocorrelation in a cross-sectional context. The standard rule of thumb is that autocorrelation is a problem in time series data and heteroscedasticity is a problem in cross-sectional data. However, there are many instances in which an entity's location affects its behavior. Housing prices are a prime example: clearly the location of a house will affect its selling price. If the location of the house influences its price, then the possibility arises that nearby houses will be affected by the same location factors. Any error in measuring these factors will then cause the error terms of nearby houses to be correlated. Spatial autocorrelation is likely to be present in any situation in which location matters.

Although spatial autocorrelation can occur in many contexts, in this paper I focus on housing prices. In the case of housing prices, the location factors are called neighborhood effects. There are at least two reasons to suspect that neighborhood effects are measured with error. First, neighborhood is unobservable. This means that researchers wishing to account for neighborhood must use proxies; crime rates and socioeconomic characteristics of residents are commonly used. Second, to make the use of proxies operational, a set of geographic boundaries must be assumed. Typically, the researcher uses the same set of boundaries as the data collector: census tracts are generally the boundaries when socioeconomic data are used, and crime reporting areas are commonly used when crime rates are needed. Of course, the geographic boundaries that should be used are the (unknown) neighborhood boundaries. To the extent that neighborhood boundaries differ from the data-gathering boundaries, the proxies themselves will contain error. These two problems, unobservability and boundaries, make it virtually certain that neighborhood variables will be measured with error, with the result that the regression error terms will be spatially autocorrelated.

The consequences of spatial autocorrelation are the same as those of time series autocorrelation: the OLS estimators are unbiased but inefficient, and the estimates of the variance of the estimators are biased. Thus the precision of the estimates as well as the reliability of hypothesis testing can be improved by making a correction for autocorrelation. Once the structure of the autocorrelation has been estimated, this information can also be incorporated into predictions, thereby improving their accuracy.¹ Just as with time series autocorrelation, maximum likelihood (ML) techniques are commonly used to estimate the autocorrelation parameters and the regression coefficients.²

Despite the similarities, spatial autocorrelation is conceptually more difficult to model than time series autocorrelation because of the ordering issue. In a time series context, the researcher typically assumes that earlier observations can influence later ones, but not the reverse.
In the spatial context, an ordering assumption such as this is not possible: if A affects B, it is likely that the reverse is also true. Also, the direction of influence is not limited to one dimension as in time series, but can occur in any direction (although we generally restrict the problem, at least in the case of housing, to two dimensions).

The purpose of this paper is to explore some of the issues involved in estimating models with spatially autocorrelated error terms. I use hedonic regression as the example problem, although the techniques discussed here are applicable to a wide variety of problems. I discuss the basic issues involved in modeling the autocorrelation structure and compare and contrast the most commonly used techniques. My purpose in doing so is to promote a better understanding of these techniques, which I hope will encourage their use.

¹ This technique is known as kriging in the geostatistics literature and as best linear unbiased prediction (BLUP) in the econometrics literature. Dubin (1992) and Basu and Thibodeau (1998) use this technique to predict housing prices. Also, Dubin (1998) and Dubin et al. (1998) discuss the issues involved in kriging.

² Although other techniques are used in the literature for estimating models with spatially autocorrelated error terms, ML is the only technique discussed here.

2. MODELS

There are two commonly used methods of modeling the autocorrelation structure. The first is to model the process itself. This approach is based on the work of geographers (Cliff and Ord, 1981) and requires the use of a weight matrix. This approach is probably the more common of the two in the real estate literature (see Can (1992) and Pace and Gilley (1998) for examples). The second approach is to model the covariance matrix of the error terms directly. This approach is based on the work of geologists (Matheron, 1963) and has also been used in the real estate literature (see Dubin (1988) and Basu and Thibodeau (1998) for examples).

2.a. First Approach: Weight Matrix

In this approach, the process generating the error terms is modeled explicitly. The model is

$$Y = X\beta + u \tag{1.a}$$

$$u = \lambda W u + \varepsilon. \tag{1.b}$$
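A small simulation may make the process in Eqs. (1.a) and (1.b) more concrete, with Y the selling prices, X the house characteristics, and ε independent normal errors. The Python sketch below is illustrative only: the ten random locations, the one-nearest-neighbor W, λ = 0.67, and the coefficients in `beta` are hypothetical choices, not data from this paper. Because u appears on both sides of (1.b), the sketch draws ε and then solves the linear system (I − λW)u = ε for u.

```python
import numpy as np

def simulate_sar_errors(W, lam, sigma=1.0, rng=None):
    """Draw u satisfying u = lam * W u + eps (Eq. 1.b), with eps iid N(0, sigma^2)."""
    rng = np.random.default_rng() if rng is None else rng
    n = W.shape[0]
    eps = rng.normal(0.0, sigma, size=n)
    # Solve (I - lam W) u = eps instead of forming the inverse explicitly.
    u = np.linalg.solve(np.eye(n) - lam * W, eps)
    return u, eps

# Hypothetical setup: 10 random points in a 10 x 10 square (not the paper's locations).
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 10.0, size=(10, 2))
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
np.fill_diagonal(D, np.inf)                  # a point cannot be its own neighbor

# One nearest neighbor: each row holds a single 1, so rows already sum to one.
W = np.zeros((10, 10))
W[np.arange(10), D.argmin(axis=1)] = 1.0

u, eps = simulate_sar_errors(W, lam=0.67, rng=rng)
X = np.column_stack([np.ones(10), rng.normal(size=10)])  # hypothetical characteristics
beta = np.array([100.0, 5.0])                            # hypothetical coefficients
Y = X @ beta + u                                          # Eq. (1.a)
```

Solving the system rather than inverting I − λW is simply a numerical convenience; the inverse itself reappears in Eq. (2).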

In a hedonic regression, Y is an N × 1 vector containing the selling prices of the houses, X is an N × K matrix of the characteristics of the houses, u is an N × 1 vector of the correlated error terms, and β is a K × 1 vector of unknown regression coefficients. The process generating the correlations is shown in Eq. (1.b). Here, ε is an N × 1 vector of independent, normally distributed error terms (with mean zero and variance σ²), and λ is an unknown autocorrelation parameter (note that λ is a scalar). W is the weight matrix, which represents the spatial structure of the data. By far the most common practice is to treat W as nonstochastic; that is, the researcher takes W as known a priori, and therefore all results are conditional upon the specification of W (see Pace et al. (1998) for an exception). Note the similarity of this model to the time series AR(1) model. Also, just as in time series, the model can be expanded by using various spatial lags (see Anselin (1988, pp. 22–24) for details).

In view of its centrality in this approach, a digression on W is in order. W is an N × N matrix with zeros on its main diagonal. The off-diagonal elements, W_ij, represent the spatial relationship between observations i and j. A common method of forming W is to use nearest neighbors. Under this scheme, W_ij = 1 if j is the observation closest to i (that is, no other observation is closer to i than j is), and zero otherwise. This scheme can easily be extended to n nearest neighbors. Another popular approach is to set W_ij = 1 if i and j are separated by a distance less than some prespecified limit. Rather than making the elements of W binary, another approach is to set W_ij = 1/D_ij^P, where D is an N × N matrix showing the distances separating the observations and P is a constant. All of these approaches have been used in the real estate literature; there does not appear to be any consensus regarding which scheme best represents the correlation structure of the housing market. This is problematic because all of the results are conditional on the researcher's a priori specification of the spatial structure.

Solving (1.b) for u gives

$$u = (I - \lambda W)^{-1}\varepsilon \tag{2}$$

and thus

$$V = E[uu'] = \sigma^2 (I - \lambda W)^{-1} (I - \lambda W')^{-1} \tag{3}$$

where V is the variance/covariance matrix of u. Note that V typically will not have a constant on the main diagonal. Thus, in this type of model, u is heteroskedastic, even though ε is not.

The fact that V involves the product of two inverted matrices makes it difficult to visualize. In what follows, I show the correlations implied by the various choices of W, given a set of locations. Because housing data are not typically located on a regular grid, I use 10 observations, randomly located in a 10 × 10 square. These locations are shown in Fig. 1. Once the locations are known, the distance matrix, D, can be calculated; all of the weight matrices discussed here are based on D (see Table I). Once W is calculated, the population variance/covariance matrix is given by (3). In the illustration, I generate the correlations³ implied by the choice of W for each of the commonly used methods of specifying it: nearest neighbors, W_ij = 1 if D_ij ≤ L, and W_ij = 1/D_ij^P.

In addition to choosing the spatial weighting scheme, the researcher must also choose a parameter pertaining to it. For example, if the researcher chooses nearest neighbors, he must also decide the number of neighbors to use. For W_ij = 1 if D_ij ≤ L, the researcher must decide the distance limit (L). And for the inverse distance weight matrix, the researcher must decide the power to which the denominator is raised (P). These choices (the form of the weight matrix and the value of the parameter) are made a priori by the researcher; the resulting weight matrix is taken as given. As the illustration below shows, these choices change the nature of the implied correlations considerably.

³ The correlations are derived from (3) as follows: Corr_ij = V_ij / √(V_ii V_jj).

FIG. 1. Locations.

TABLE I
Distance Matrix

        1     2     3     4     5     6     7     8     9    10
  1  0.00  2.80  5.64  3.45  3.43  3.76  1.53  3.14  5.67  4.42
  2  2.80  0.00  7.70  6.11  2.06  6.01  4.32  0.55  8.45  2.32
  3  5.64  7.70  0.00  5.82  9.00  7.26  4.61  7.71  4.14  7.82
  4  3.45  6.11  5.82  0.00  5.93  1.52  2.33  6.52  3.33  7.86
  5  3.43  2.06  9.00  5.93  0.00  5.31  4.84  2.55  8.84  4.26
  6  3.76  6.01  7.26  1.52  5.31  0.00  3.20  6.50  4.76  8.05
  7  1.53  4.32  4.61  2.33  4.84  3.20  0.00  4.62  4.14  5.72
  8  3.14  0.55  7.71  6.52  2.55  6.50  4.62  0.00  8.72  1.77
  9  5.67  8.45  4.14  3.33  8.84  4.76  4.14  8.72  0.00  9.61
 10  4.42  2.32  7.82  7.86  4.26  8.05  5.72  1.77  9.61  0.00

A useful tool for representing spatial dependencies is the correlogram, which shows the correlations between points graphed as a function of the distance separating them. Although not necessary, a nice property for correlograms to exhibit is that the correlations decline as separation distance increases. This is in accordance with Tobler's (1970) first law of geography: "everything is related to everything else, but near things are more related than distant things."

In the illustration, I show the correlograms for each of the spatial weighting schemes for different values of the parameters (which are normally chosen by the researcher) and for different values of λ (the autocorrelation parameter, which is normally estimated). I use three values of the parameters and two values of λ, which gives six correlograms for each weighting scheme. These are shown in Figs. 2 through 4. Note that these correlograms are not based on simulated data, but are the population correlograms, given the locations and the choice of W.⁴ I also present one weight matrix and one correlation matrix for each scheme; these are shown in Tables II through IV. Finally, the distance matrix for the data is presented in Table I.

2.a.1. Nearest neighbors. Fig. 2A shows the correlograms for three choices of the number of nearest neighbors (1, 2, or 3) when the spatial dependencies are strong (λ = 0.67). Note that because the weight matrices are row standardized,⁵ λ ranges from −1 to 1. Two observations can be drawn from an examination of this figure. First, while the correlations implied by this choice of W tend to fall with separation distance, they do not fall monotonically. For example, in panel A1 of Fig. 2A, there are zeros interspersed with positive correlations. This means that points separated by the same distance can have very different correlations. This occurs for two reasons: (a) the definition of W itself and (b) the formulation of the variance/covariance matrix as the product of two inverted matrices. The definition comes into play because W_ij = 1 only for nearest neighbors. Consider a case where points A and B are 0.5 units apart and points A and C are 0.6 units apart.
For one nearest neighbor, A's nearest neighbor is B, not C, so W_AB = 1 but W_AC = 0, even though C is almost as close to A as B is. The presence of the inverted matrices is important because it means that the locations of all points are taken into consideration when calculating the correlations. For example, consider row 2 of Table II(B), the correlation matrix for one nearest neighbor when λ = 0.67. Corr_2,8 is the highest in this row because 2 and 8 are each other's nearest neighbors. The other correlations are not all zero, however. Corr_2,5 is 0.826, because 2 is 5's nearest neighbor.⁶ Also, Corr_2,10 = 0.764. This illustrates a three-way interaction: 8 is the nearest neighbor of both 10 and 2; therefore 10 is correlated with 2 because of 8.

Although the correlations are not monotonic in separation distance, they are monotonic in the strength of the relationships. Corr_2,8 is the highest because 2 and 8 are each other's nearest neighbors. Corr_2,5 is smaller because 2 is 5's nearest neighbor, but not the reverse. Corr_2,10 is smaller yet because 10 and 2 are related only indirectly, through 8.

The second observation to be drawn from Fig. 2 is that the choice of the number of nearest neighbors changes the implied correlation structure considerably. For one nearest neighbor, the correlations decline with distance. For two nearest neighbors, there are no zero correlations, because all of the points are related, either directly or indirectly. For three nearest neighbors, all of the correlations are about the same, because all of the points are related to each other (recall that there are only 10 observations). This is a potential weakness of this approach, because the researcher generally chooses (rather than estimates) the number of neighbors.

⁴ These correlograms were generated by graphing the values in the population correlation matrix (obtained from Eq. (3)) against the values in the distance matrix.

⁵ "Row standardized" means that W is transformed so that each row sums to one.

⁶ Note that the reverse is not true: 8 (and not 5) is 2's nearest neighbor. Thus W is not symmetric for the nearest neighbor model.

FIG. 2A. Nearest neighbor correlations: λ = 0.67. (A1) One nearest neighbor; (A2) two nearest neighbors; (A3) three nearest neighbors.

FIG. 2B. Nearest neighbor correlations: λ = 0.33. (B1) One nearest neighbor; (B2) two nearest neighbors; (B3) three nearest neighbors.

FIG. 3A. Correlograms for W_ij = 1 if D_ij < L: λ = 0.67. (A1) L = 2; (A2) L = 3; (A3) L = 4.

FIG. 3B. Correlograms for W_ij = 1 if D_ij < L: λ = 0.33. (B1) L = 2; (B2) L = 3; (B3) L = 4.

FIG. 4A. Correlograms for W_ij = 1/D_ij^P: λ = 0.67. (A1) P = 1; (A2) P = 2; (A3) P = 3.

FIG. 4B. Correlograms for W_ij = 1/D_ij^P: λ = 0.33. (B1) P = 1; (B2) P = 2; (B3) P = 3.

TABLE II
One Nearest Neighbor

A. Weight Matrix

      1  2  3  4  5  6  7  8  9 10
  1   0  0  0  0  0  0  1  0  0  0
  2   0  0  0  0  0  0  0  1  0  0
  3   0  0  0  0  0  0  0  0  1  0
  4   0  0  0  0  0  1  0  0  0  0
  5   0  1  0  0  0  0  0  0  0  0
  6   0  0  0  1  0  0  0  0  0  0
  7   1  0  0  0  0  0  0  0  0  0
  8   0  1  0  0  0  0  0  0  0  0
  9   0  0  0  1  0  0  0  0  0  0
 10   0  0  0  0  0  0  0  1  0  0

B. Correlation Matrix: λ = 0.67

        1     2     3     4     5     6     7     8     9    10
  1  1.00  0.00  0.00  0.00  0.00  0.00  0.92  0.00  0.00  0.00
  2  0.00  1.00  0.00  0.00  0.83  0.00  0.00  0.92  0.00  0.76
  3  0.00  0.00  1.00  0.63  0.00  0.58  0.00  0.00  0.76  0.00
  4  0.00  0.00  0.63  1.00  0.00  0.92  0.00  0.00  0.83  0.00
  5  0.00  0.83  0.00  0.00  1.00  0.00  0.00  0.76  0.00  0.63
  6  0.00  0.00  0.58  0.92  0.00  1.00  0.00  0.00  0.76  0.00
  7  0.92  0.00  0.00  0.00  0.00  0.00  1.00  0.00  0.00  0.00
  8  0.00  0.92  0.00  0.00  0.76  0.00  0.00  1.00  0.00  0.83
  9  0.00  0.00  0.76  0.83  0.00  0.76  0.00  0.00  1.00  0.00
 10  0.00  0.76  0.00  0.00  0.63  0.00  0.00  0.83  0.00  1.00

2.a.2. W_ij = 1 if D_ij ≤ L. This spatial weighting scheme is similar in concept to nearest neighbors. The weight matrix is still binary; however, rather than specifying the number of ones in each row, this number is determined by setting a maximum distance within which points can influence each other. Unlike the nearest neighbor scheme, this W is always symmetric. Despite the similarities, the two schemes produce different correlation patterns (see Table III). When one nearest neighbor is used, each row of W contains exactly one 1. When L = 2, each row of W contains zero, one, or two 1's, with the average being 0.8. Thus, L = 2 is somewhat comparable to one nearest neighbor. However, L = 2 gives very different correlations: only five pairs of points have nonzero correlations, and these are very large. When L = 3, the rows of W contain between zero and four 1's each, with the average being 1.8. But again the correlations are very different from those for two nearest neighbors: they fall off more markedly with distance, and some pairs exhibit zero correlation. When L = 4, the average number of ones is 3, but again the pattern of correlations is very different from that for three nearest neighbors.

TABLE III
W_ij = 1 if D_ij < 2

A. Weight Matrix

      1  2  3  4  5  6  7  8  9 10
  1   0  0  0  0  0  0  1  0  0  0
  2   0  0  0  0  0  0  0  1  0  0
  3   0  0  0  0  0  0  0  0  0  0
  4   0  0  0  0  0  1  0  0  0  0
  5   0  0  0  0  0  0  0  0  0  0
  6   0  0  0  1  0  0  0  0  0  0
  7   1  0  0  0  0  0  0  0  0  0
  8   0  1  0  0  0  0  0  0  0  1
  9   0  0  0  0  0  0  0  0  0  0
 10   0  0  0  0  0  0  0  1  0  0

B. Correlation Matrix: λ = 0.67

        1     2     3     4     5     6     7     8     9    10
  1  1.00  0.00  0.00  0.00  0.00  0.00  0.92  0.00  0.00  0.00
  2  0.00  1.00  0.00  0.00  0.00  0.00  0.00  0.87  0.00  0.72
  3  0.00  0.00  1.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
  4  0.00  0.00  0.00  1.00  0.00  0.92  0.00  0.00  0.00  0.00
  5  0.00  0.00  0.00  0.00  1.00  0.00  0.00  0.00  0.00  0.00
  6  0.00  0.00  0.00  0.92  0.00  1.00  0.00  0.00  0.00  0.00
  7  0.92  0.00  0.00  0.00  0.00  0.00  1.00  0.00  0.00  0.00
  8  0.00  0.87  0.00  0.00  0.00  0.00  0.00  1.00  0.00  0.87
  9  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  1.00  0.00
 10  0.00  0.72  0.00  0.00  0.00  0.00  0.00  0.87  0.00  1.00
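The implied correlations in Tables II and III, and the row counts quoted above, can be reproduced from the distances in Table I with a few lines of linear algebra. The Python sketch below is a hypothetical illustration, not the code used to prepare the tables. It builds the nearest-neighbor, distance-band, and inverse-distance weight matrices, row standardizes them (the text states this for the nearest-neighbor matrices; for the other schemes it is an assumption, although it does reproduce the distance-band correlations in Table III(B)), evaluates Eq. (3) with σ² = 1, and converts V to correlations as in footnote 3. It also lists correlogram points (footnote 4), sorted by distance. All function names are illustrative.

```python
import numpy as np

# Distance matrix D from Table I (points 1-10 are indices 0-9).
D = np.array([
    [0.00, 2.80, 5.64, 3.45, 3.43, 3.76, 1.53, 3.14, 5.67, 4.42],
    [2.80, 0.00, 7.70, 6.11, 2.06, 6.01, 4.32, 0.55, 8.45, 2.32],
    [5.64, 7.70, 0.00, 5.82, 9.00, 7.26, 4.61, 7.71, 4.14, 7.82],
    [3.45, 6.11, 5.82, 0.00, 5.93, 1.52, 2.33, 6.52, 3.33, 7.86],
    [3.43, 2.06, 9.00, 5.93, 0.00, 5.31, 4.84, 2.55, 8.84, 4.26],
    [3.76, 6.01, 7.26, 1.52, 5.31, 0.00, 3.20, 6.50, 4.76, 8.05],
    [1.53, 4.32, 4.61, 2.33, 4.84, 3.20, 0.00, 4.62, 4.14, 5.72],
    [3.14, 0.55, 7.71, 6.52, 2.55, 6.50, 4.62, 0.00, 8.72, 1.77],
    [5.67, 8.45, 4.14, 3.33, 8.84, 4.76, 4.14, 8.72, 0.00, 9.61],
    [4.42, 2.32, 7.82, 7.86, 4.26, 8.05, 5.72, 1.77, 9.61, 0.00],
])

def knn_weights(D, k):
    """Binary W: W_ij = 1 if j is one of the k observations closest to i."""
    Dm = D.copy()
    np.fill_diagonal(Dm, np.inf)             # i is not its own neighbor
    W = np.zeros_like(D)
    for i, row in enumerate(Dm):
        W[i, np.argsort(row)[:k]] = 1.0
    return W

def band_weights(D, L):
    """Binary, symmetric W: W_ij = 1 if D_ij <= L and i != j."""
    W = (D <= L).astype(float)
    np.fill_diagonal(W, 0.0)
    return W

def inverse_distance_weights(D, P):
    """W_ij = 1 / D_ij**P off the diagonal, zero on it."""
    W = np.zeros_like(D)
    off = ~np.eye(len(D), dtype=bool)
    W[off] = 1.0 / D[off] ** P
    return W

def row_standardize(W):
    """Scale each row to sum to one; rows with no neighbors stay zero."""
    s = W.sum(axis=1, keepdims=True)
    return np.divide(W, s, out=np.zeros_like(W), where=s > 0)

def implied_covariance(W, lam, sigma2=1.0):
    """Eq. (3): V = sigma^2 (I - lam W)^{-1} (I - lam W')^{-1}; also returns Corr."""
    A = np.linalg.inv(np.eye(len(W)) - lam * W)
    V = sigma2 * A @ A.T                     # A.T equals (I - lam W')^{-1}
    s = np.sqrt(np.diag(V))
    return V, V / np.outer(s, s)             # footnote 3: V_ij / sqrt(V_ii V_jj)

def correlogram(D, C):
    """(distance, correlation) for every pair i < j, sorted by distance."""
    i, j = np.triu_indices_from(D, k=1)
    order = np.argsort(D[i, j])
    return D[i, j][order], C[i, j][order]

W_nn = knn_weights(D, 1)
V_nn, C_nn = implied_covariance(row_standardize(W_nn), lam=0.67)            # Table II(B)
_, C_band = implied_covariance(row_standardize(band_weights(D, 2)), lam=0.67)  # Table III(B)
avg_ones = {L: band_weights(D, L).sum(axis=1).mean() for L in (2, 3, 4)}

print(np.round(C_nn[1], 2))          # row for point 2, compare with Table II(B)
print(np.round(np.diag(V_nn), 2))    # unequal variances: u is heteroskedastic
print(avg_ones)
dist, corr = correlogram(D, C_nn)
for d, c in zip(dist[:8], corr[:8]):
    print(f"{d:5.2f}  {c:5.2f}")
```

Sorted by distance, the one-nearest-neighbor correlogram should show the pair (4, 7) at distance 2.33 with zero correlation, immediately followed by the pair (5, 8) at distance 2.55 with a correlation of about 0.76: the non-monotonic pattern described for Fig. 2A. The nearest-neighbor W is not symmetric, while the distance-band W is.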

For L = 4, the correlations fall off with separation distance, rather than being approximately constant as they are for three nearest neighbors. However, the distance-band scheme is similar to nearest neighbors in that the choice of the parameter greatly affects the correlation pattern.

2.a.3. W_ij = 1/D_ij^P. In this formulation, the elements of W are fractions. This is a departure from the earlier cases, both of which resulted in binary weight matrices. When this case is compared to the previous cases, it is important to remember that the larger P is, the smaller the "band of influence." Thus, P = 3 is closest to one nearest neighbor and to L = 2. The correlations for this case tend to fall with separation distance, particularly for the smaller "bandwidths" (see Table IV). The variation in the correlations is largest when the bandwidth is small, because this allows the indirect relations to show up. As the bandwidth increases, more of the points share neighbors, and so the correlations become more uniform. Finally, note that the pattern of correlations produced by this scheme is markedly different from those produced by the other spatial weighting schemes.

TABLE IV
W_ij = 1/D_ij^3

A. Weight Matrix

        1     2     3     4     5     6     7     8     9    10
  1  0.00  0.36  0.18  0.29  0.29  0.27  0.65  0.32  0.18  0.23
  2  0.36  0.00  0.13  0.16  0.48  0.17  0.23  1.81  0.12  0.43
  3  0.18  0.13  0.00  0.17  0.11  0.14  0.22  0.13  0.24  0.13
  4  0.29  0.16  0.17  0.00  0.17  0.66  0.43  0.15  0.30  0.13
  5  0.29  0.48  0.11  0.17  0.00  0.19  0.21  0.39  0.11  0.23
  6  0.27  0.17  0.14  0.66  0.19  0.00  0.31  0.15  0.21  0.12
  7  0.65  0.23  0.22  0.43  0.21  0.31  0.00  0.22  0.24  0.17
  8  0.32  1.81  0.13  0.15  0.39  0.15  0.22  0.00  0.11  0.56
  9  0.18  0.12  0.24  0.30  0.11  0.21  0.24  0.11  0.00  0.10
 10  0.23  0.43  0.13  0.13  0.23  0.12  0.17  0.56  0.10  0.00

B. Correlation Matrix: λ = 0.67

        1     2     3     4     5     6     7     8     9    10
  1  1.00  0.32  0.45  0.40  0.36  0.37  0.77  0.31  0.41  0.30
  2  0.32  1.00  0.28  0.13  0.73  0.12  0.23  0.92  0.20  0.75
  3  0.45  0.28  1.00  0.42  0.29  0.39  0.48  0.28  0.54  0.27
  4  0.40  0.13  0.42  1.00  0.18  0.82  0.48  0.12  0.57  0.13
  5  0.36  0.73  0.29  0.18  1.00  0.18  0.28  0.72  0.23  0.61
  6  0.37  0.12  0.39  0.82  0.18  1.00  0.44  0.12  0.51  0.13
  7  0.77  0.23  0.48  0.48  0.28  0.44  1.00  0.22  0.47  0.23
  8  0.31  0.92  0.28  0.12  0.72  0.12  0.22  1.00  0.19  0.78
  9  0.41  0.20  0.54  0.57  0.23  0.51  0.47  0.19  1.00  0.19
 10  0.30  0.75  0.27  0.13  0.61  0.13  0.23  0.78  0.19  1.00

2.a.4. Discussion. This illustration has demonstrated that different spatial weighting schemes produce markedly different implied correlation patterns. Furthermore, the choice of the parameter, which must be set once the family of weighting schemes is specified, also affects the implied correlations. This is problematic for a number of reasons. First, the weighting schemes discussed here are all plausible, and yet they imply different things for the data. Second, most tests of the presence of spatial autocorrelation are conditional on the choice of W. For example, Moran's I statistic, which is one of the most commonly used tests of spatial autocorrelation, is given by the formula

$$I = \frac{N\,(e'We)}{S\,(e'e)} \tag{4}$$

where N is the number of observations, e is a vector of regression residuals, S is a standardization factor, and W is the weight matrix. Clearly the results of this test will be conditional on the researcher's choice of W.⁷ This problem is illustrated by a recent article by Can (1992). In this paper, Can uses three weighting schemes: W1_ij = 1 if D_ij ≤ 5 miles, W2_ij = 1/D_ij, and W3_ij = 1/D_ij². She also uses two functional forms of the hedonic regression: linear and semilog. This gives six combinations. Three of these combinations show significant spatial autocorrelation and three do not. Can has no way of knowing which is the correct specification and therefore whether the errors are spatially correlated or not.⁸ It would seem that users of this approach to modeling spatial autocorrelation should move in the direction of estimating the parameters of the weight matrix.

⁷ Kelejian and Robinson (1982) provide a test of spatial autocorrelation that does not use a weight matrix.

⁸ It is possible that the likelihood values could give some guidance as to which model best fits the data. However, this requires that the models be nested.

2.b. Second Approach: Direct Specification of the Covariance Structure

In this approach, rather than starting with the process and deriving the covariance matrix, a functional form for the covariance structure is assumed. The parameters of this function are then estimated, along with the regression coefficients, using maximum likelihood methods. Functions are chosen which cause the correlations to fall as separation distance increases. The following are all permissible functions:

Negative Exponential

$$K_{ij} = b_1 \exp\left(-\frac{D_{ij}}{b_2}\right) \tag{5}$$

Gaussian

$$K_{ij} = b_1 \exp\left(-\frac{D_{ij}^2}{b_2}\right) \tag{6}$$

Spherical

$$K_{ij} = \begin{cases} b_1\left(1 - \dfrac{3D_{ij}}{2b_2} + \dfrac{D_{ij}^3}{2b_2^3}\right) & \text{if } 0 \le D_{ij} < b_2 \\[4pt] 0 & \text{if } D_{ij} \ge b_2 \end{cases} \tag{7}$$

where K is the correlation matrix for the error terms (and σ²K = V). The correlograms for these models for the illustrative locations are shown in Figs. 5 through 7. These figures differ from those for the weight matrix method. For example, in Fig. 2A, the three panels represent different choices made by the researcher: the number of nearest neighbors to consider. In Fig. 5, the three panels represent different values of b1, where b1 is estimated. Once the researcher picks the functional form, the data determine which of the nine functions shown is best (of course, values of b1 and b2 other than those shown in the figure are possible).

The three functions result in similar graphs. The Gaussian correlogram falls off faster with separation distance than does the Negative Exponential. The Gaussian also has somewhat more weight at very small separation distances. This is difficult to see from these figures, however, because of the lack of observations with small separation distances (there is only one pair separated by a distance smaller than 1.5).⁹ As depicted in Fig. 7, the Spherical correlogram looks very much like the Negative Exponential. In reality, the functions differ in their behavior near the origin, where the Spherical model produces higher correlations.

These functions are much smoother than the implied correlograms for the various weight matrices. This is because the correlations are modeled directly, and thus all points separated by a given distance will have the same correlation, regardless of the location of other points. This is not the case for the weight matrix correlograms. For example, in the case of one nearest neighbor and λ = 0.67, points 5 and 8 have a correlation of 0.76 and are separated by a distance of 2.55 (Tables I and II). Points 4 and 7 are closer (separation distance 2.33), but have a correlation of zero. This seeming anomaly occurs because point 2 provides the link between points 5 and 8 (as described earlier), while points 4 and 7 each have other points that are closer to them and therefore are not nearest neighbors.

⁹ This turns out to be a problem in empirical work as well. Typically, there are many pairs with large separation distances but a much smaller number with small separation distances. This can make it difficult to fit the beginning of the curve.

FIG. 5. Correlograms for K_ij = b1 exp(−D_ij/b2). (A) b1 = 0.95; (B) b1 = 0.67; (C) b1 = 0.33.

FIG. 6. Correlograms for K_ij = b1 exp(−D_ij²/b2). (A) b1 = 0.95; (B) b1 = 0.67; (C) b1 = 0.33.

FIG. 7. Correlograms for spherical case. (A) b1 = 0.95; (B) b1 = 0.67; (C) b1 = 0.33.

2.c. Discussion

As pointed out above, there are two main approaches to modeling spatial autocorrelation: the weight matrix approach and the direct approach. Within each approach, there are alternatives available to the researcher (i.e., the method of forming the weight matrix or the functional form for the direct approach). As shown by the figures, each alternative implies a different assumption about the spatial relationships in the data. The literature currently provides little guidance about which models work best in which situations. However, two points seem clear. First, to the extent possible, it is probably better to estimate the parameters of the model than to choose them a priori. Second, any spatial modeling of the error terms, in a situation where autocorrelation is likely to be present, will dominate a model that ignores the problem completely.
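To make concrete the point from Section 2.a.4 that Moran's I is conditional on W, the Python sketch below evaluates Eq. (4) for one residual vector under two different weight matrices. It assumes S is the sum of all elements of W, a common choice of standardization factor that the text does not specify, and it uses a hypothetical toy example: six equally spaced points on a line, with residuals that are positive at one end and negative at the other.

```python
import numpy as np

def morans_i(e, W):
    """Eq. (4): I = N (e'We) / (S e'e), taking S = sum of all W_ij (an assumption)."""
    e = np.asarray(e, dtype=float)
    return (e.size / W.sum()) * (e @ W @ e) / (e @ e)

# Hypothetical toy data: six equally spaced points on a line.
pos = np.arange(6)
lag = np.abs(pos[:, None] - pos[None, :])
W_adjacent = (lag == 1).astype(float)                   # immediate neighbors only
W_two_steps = ((lag >= 1) & (lag <= 2)).astype(float)   # neighbors within two steps

e = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])         # hypothetical residuals (mean zero)
print(morans_i(e, W_adjacent), morans_i(e, W_two_steps))
```

For these residuals, Eq. (4) gives I = 0.6 with the adjacent-neighbor W but I = 1/3 with the two-step W: the same data produce different evidence of autocorrelation depending on how W is specified.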

3. ESTIMATION

Once a model (weight matrix or direct approach) has been chosen to represent the covariance structure of the error terms, it can be estimated via maximum likelihood.¹⁰ In maximum likelihood estimation, the following log likelihood function (up to an additive constant) is maximized with respect to the unknown parameters:

$$\ln(L) = -\frac{1}{2}\ln|\tilde V| - \frac{1}{2}(Y - X\tilde\beta)'\,\tilde V^{-1}(Y - X\tilde\beta). \tag{8}$$

The unknowns are the regression coefficients (β̃), the error variance (σ̃²), and the parameters of Ṽ (the estimates of b1 and b2, or of λ, depending on which approach is used). One nice byproduct of the ML approach is that a likelihood ratio test can be used to determine the presence of spatial autocorrelation: twice the difference between the maximized log likelihoods of the unrestricted and restricted models is (asymptotically) distributed as a χ² random variable. Here the restricted model is OLS, i.e., V is restricted to be σ² times the identity matrix. The degrees of freedom are 1 for the weight matrix approach and 2 for the direct approach.

¹⁰ Other techniques are available. For example, in the direct approach, one technique is to fit (usually by eye) the parameters to an empirical correlogram (the average correlation among all points in a given separation distance range, plotted against separation distance). Once the parameters of the correlation function have been estimated, EGLS (estimated generalized least squares) can be used to obtain the regression coefficients. These techniques will not be discussed further here.
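The direct approach, Eq. (8), and the likelihood ratio test can be sketched together. The Python code below is a minimal illustration resting on several assumptions that go beyond the text: K has a unit diagonal, with Eqs. (5) to (7) applied only to pairs i ≠ j; the likelihood in Eq. (8) is "concentrated" by substituting the GLS estimate of β and the ML estimate of σ² for a given K (a standard shortcut, not necessarily the procedure used in the studies cited); b1 and b2 are chosen by a coarse grid rather than a numerical optimizer; and the data are simulated at hypothetical random locations. Because the grid includes b1 = 0, which makes K the identity (the OLS model), the LR statistic cannot be negative. The χ² p-values use closed forms: erfc(√(x/2)) for 1 degree of freedom and exp(−x/2) for 2.

```python
import math
import numpy as np

def neg_exponential(D, b1, b2):   # Eq. (5)
    return b1 * np.exp(-D / b2)

def gaussian(D, b1, b2):          # Eq. (6)
    return b1 * np.exp(-D**2 / b2)

def spherical(D, b1, b2):         # Eq. (7): exactly zero at and beyond the range b2
    r = np.minimum(D / b2, 1.0)
    return np.where(D < b2, b1 * (1 - 1.5 * r + 0.5 * r**3), 0.0)

def correlation_matrix(D, func, b1, b2):
    K = func(D, b1, b2)
    np.fill_diagonal(K, 1.0)      # assumption: unit diagonal, Eqs. (5)-(7) for i != j
    return K

def concentrated_loglik(y, X, K):
    """Eq. (8) with V = sigma^2 K, beta and sigma^2 replaced by their ML values given K."""
    n = y.size
    Kinv_X = np.linalg.solve(K, X)
    beta = np.linalg.solve(X.T @ Kinv_X, Kinv_X.T @ y)    # GLS estimate given K
    r = y - X @ beta
    sigma2 = r @ np.linalg.solve(K, r) / n
    _, logdet_K = np.linalg.slogdet(K)
    ll = -0.5 * n * (math.log(2 * math.pi) + math.log(sigma2) + 1.0) - 0.5 * logdet_K
    return ll, beta, sigma2

def lr_test(ll_restricted, ll_unrestricted, df):
    """LR = 2 (lnL_u - lnL_r) with a chi-square p-value for df = 1 or 2."""
    stat = 2.0 * (ll_unrestricted - ll_restricted)
    x = max(stat, 0.0)
    if df == 1:
        p = math.erfc(math.sqrt(x / 2.0))
    elif df == 2:
        p = math.exp(-x / 2.0)
    else:
        raise ValueError("closed form implemented only for df = 1 or 2")
    return stat, p

# Hypothetical simulated data: 60 houses with negative exponential error correlation.
rng = np.random.default_rng(42)
n = 60
pts = rng.uniform(0.0, 10.0, size=(n, 2))
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
K_true = correlation_matrix(D, neg_exponential, 0.8, 2.0)
y = X @ np.array([100.0, 5.0]) + np.linalg.cholesky(K_true) @ rng.normal(size=n)

ll_ols, _, _ = concentrated_loglik(y, X, np.eye(n))     # restricted model: V = sigma^2 I
grid = [(b1, b2) for b1 in (0.0, 0.25, 0.5, 0.75, 0.95) for b2 in (0.5, 1.0, 2.0, 4.0)]
ll_best, b_best = max(
    (concentrated_loglik(y, X, correlation_matrix(D, neg_exponential, b1, b2))[0], (b1, b2))
    for b1, b2 in grid
)
stat, p = lr_test(ll_ols, ll_best, df=2)                # direct approach: 2 extra parameters
print(b_best, round(stat, 2), p)
```

With real data a numerical optimizer would normally replace the grid, and `gaussian` or `spherical` can be substituted for `neg_exponential` without other changes.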

4. OTHER ISSUES

4.a. Sample Size

V is an N × N matrix, where N is the sample size. The log likelihood function contains both the determinant and the inverse of this matrix, so the computational burden increases with sample size. However, since the accuracy of the estimates also increases with the sample size, it is important to use a large sample in these problems. Pace (1997) has suggested the use of sparse matrix techniques to facilitate the use of large samples. If V is specified so that the number of nonzero elements is relatively small, these methods can reduce the computational burden considerably.

4.b. Measurement of Separation Distance

Urban areas vary in the density of development. Therefore, it is possible that neighborhood size varies with the location of the neighborhood within the city: dense areas may have neighborhoods which are more compact, while suburban areas may have geographically larger neighborhoods. Researchers may wish to account for this by using separation measures other than geographic distance. For example, Dubin (1992) measures separation distance in terms of houses.

4.c. Functional Form of the Regression

Hedonic regressions are reduced forms, and economic theory has little to say about the proper functional form of such an equation. Most of this paper addresses the issue of the assumed form of the covariance structure. Clearly the functional form of the regression itself is of even greater importance: the regression residuals will not reflect the true error structure if the wrong functional form is used.
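Returning to the sample-size issue in Section 4.a: one way to keep the number of nonzero elements small is a correlation function that is exactly zero beyond some distance, such as the spherical function in Eq. (7). The Python sketch below, using hypothetical random locations and hypothetical parameter values, only counts the share of nonzero entries in such a K to show why sparsity arises; it does not implement the sparse-matrix methods of Pace (1997). In practice such a matrix would be built and stored directly in a sparse format (for example, with SciPy's `scipy.sparse` module) rather than densely as here.

```python
import numpy as np

def spherical_K(D, b1, b2):
    """Spherical correlations (Eq. 7) with a unit diagonal; entries with D >= b2 are exactly 0."""
    r = np.minimum(D / b2, 1.0)
    K = np.where(D < b2, b1 * (1 - 1.5 * r + 0.5 * r**3), 0.0)
    np.fill_diagonal(K, 1.0)
    return K

rng = np.random.default_rng(7)
n = 1000                                          # hypothetical sample size
pts = rng.uniform(0.0, 100.0, size=(n, 2))        # hypothetical locations
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

K = spherical_K(D, b1=0.8, b2=5.0)                # hypothetical range of 5 distance units
density = np.count_nonzero(K) / K.size
print(f"share of nonzero entries: {density:.4f} of {n * n}")
```

Shrinking the range b2 relative to the study area lowers the nonzero share roughly in proportion to the area of a circle of radius b2, which is what makes sparse storage and factorization attractive for large N.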


5. FURTHER READING

Below, I provide an annotated list of sources which the interested reader may wish to consult. Some of these are cited elsewhere in this paper.

Texts

1. Anselin (1988). This book provides an extremely complete presentation of the weight matrix approach.
2. Ripley (1981). Chapter 4 of this book provides an excellent discussion of kriging (prediction incorporating the spatially autocorrelated errors).
3. Upton and Fingleton (1985). In Chapter 5, the authors discuss regression with autocorrelated errors, using the weight matrix approach. This book is particularly nice because data and solutions are provided for most of the techniques discussed.
4. Anselin and Florax (1995). This book is an edited collection of many interesting papers on spatial econometrics.

Papers

1. Dubin (1988). This is probably the first application of these techniques to estimating a hedonic regression. This paper uses the direct approach.
2. Can (1992). An example of a hedonic estimation using the weight matrix technique.
3. Pace et al. (1998). Provides an example of estimating the number of nearest neighbors in the weight matrix.
4. Basu and Thibodeau (1998). Kriges housing prices in Dallas.
5. Dubin (1998). Discusses the issues involved in kriging housing prices.
6. Pace (1997). Uses sparse matrix techniques to facilitate the estimation.

REFERENCES

Anselin, L. (1988). Spatial Econometrics: Methods and Models. Dordrecht: Kluwer.

Anselin, L., and Florax, R. (1995). New Directions in Spatial Econometrics. Berlin: Springer-Verlag.

Basu, S., and Thibodeau, T. (1998). "Analysis of Spatial Autocorrelation in House Prices," J. Real Estate Finance Econ. 17, 61–86.

Can, A. (1992). "Specification and Estimation of Hedonic Housing Price Models," Reg. Sci. Urban Econ. 22, 453–474.

Cliff, A. D., and Ord, J. K. (1981). Spatial Processes: Models and Applications. London: Pion.

Dubin, R. A. (1988). "Estimation of Regression Coefficients in the Presence of Spatially Autocorrelated Error Terms," Rev. Econ. Statist., 168–173.

Dubin, R. A. (1992). "Spatial Autocorrelation and Neighborhood Quality," Reg. Sci. Urban Econ. 22, 433–452.

Dubin, R. A. (1998). "Predicting House Prices Using Multiple Listings Data," J. Real Estate Finance Econ. 17, 35–60.

Dubin, R. A., Pace, K., and Thibodeau, T. (forthcoming). "Spatial Autoregression Techniques for Real Estate Data."

Kelejian, H. H., and Robinson, D. P. (1982). "Spatial Autocorrelation: A New Computationally Simple Test with an Application to Per Capita County Policy Expenditures," Reg. Sci. Urban Econ. 22, 317–332.

Matheron, G. (1963). "Principles of Geostatistics," Econ. Geol. 58, 1246–1266.

Pace, K. (1997). "Performing Large Spatial Regressions and Autoregressions," Econ. Lett., 283–291.

Pace, K., and Gilley, O. (1998). "Generalizing the OLS and Grid Estimators," Real Estate Econ., 331–347.

Pace, K., Barry, R., Clapp, J. M., and Rodriguez, M. (1998). "Spatiotemporal Autoregressive Models of Neighborhood Effects," J. Real Estate Finance Econ., 15–34.

Ripley, B. D. (1981). Spatial Statistics. New York: Wiley.

Tobler, W. (1970). "A Computer Movie Simulating Urban Growth in the Detroit Region," Econ. Geog. Supplement 46, 234–240.

Upton, G., and Fingleton, B. (1985). Spatial Data Analysis by Example. New York: Wiley.