TSINGHUA SCIENCE AND TECHNOLOGY ISSN 1007-0214 11/21 pp344-353 Volume 10, Number 3, June 2005
Influence of Spatial Features on Land and Housing Prices*
GAO Xiaolu**, ASAMI Yasushi
Japan Society for the Promotion of Science; Department of Urban Research, National Institute for Land and Infrastructure Management, Japan; Center for Spatial Information Science, the University of Tokyo, Tokyo 113-8654, Japan

Abstract: The analysis of hidden spatial features is crucial for the improvement of hedonic regression models for analyzing the structure of land and housing prices. If critical variables representing the influence of spatial features are omitted from the models, the residuals and the estimated coefficients usually exhibit some kind of spatial pattern. Hence, exploration of the relationship between the spatial patterns and the spatial features essentially leads to the discovery of omitted variables. The analyses in this paper were based on two exploratory approaches: one on the residuals of a global regression model and the other on the geographically weighted regression (GWR) technique. In the GWR model, the regression coefficients are allowed to differ by location, so more spatial patterns can be revealed. Comparison of the two approaches shows that they play supplementary roles in the detection of lot-associated variables and area-associated variables.

Key words: spatial features; spatial variation; regression model; residual; geographically weighted regression (GWR)
Introduction

When the hedonic regression approach is applied to analyze the structure of land and housing prices, we often seek hidden variables to improve the models. If critical variables representing the influence of spatial features are omitted from the models, the residuals and the estimated coefficients usually exhibit some kind of spatial pattern. Analysis of the relationships between the spatial patterns and the spatial features will lead to the discovery of omitted variables. For this reason, analysis of the spatial patterns is very important.

In recent years, a variety of approaches have been proposed for exploring spatial patterns. This work analyzes two methods. The first method studies the patterns of the residuals of a standard regression model to investigate the influence of spatial features on the patterns. Since the regression coefficients are constant over space by definition of the model, this method is a global regression method. The second method uses the geographically weighted regression (GWR) approach, which belongs to the family of local regression methods. A GWR model allows the regression coefficients to differ with location, and thus more spatial patterns are revealed, which can be used to explore the impacts of spatial features. The two approaches are compared to show how they play supplementary roles in improving hedonic regression models.

Received: 2003-07-10; revised: 2003-12-08
* Supported by the Special Coordination Funds for Promoting Science and Technology, and the Research Grant-In-Aid provided by the Ministry of Education, Culture, Sports, Science, and Technology, Japan
** To whom correspondence should be addressed. E-mail: [email protected]; Tel: 81-29-864-3839

1 Analysis of the Residuals of the Global Regression Model
The Gao and Asami[1] sample, which includes the transaction prices and the detailed attributes of 190 detached housing lots in Setagaya Ward, western Tokyo, was used for the analyses. We start with the hedonic regression model given by Gao and Asami[1], which models the unit price of the 190 properties with a stepwise regression method using 16 independent variables. The estimated coefficients were satisfactory. Model A in Table 1 lists the estimated coefficients for this model.

Table 1 Unit price regression model results
Regression coefficients are in million Yen/m2 per unit; t-statistics are given in brackets.

| Variable | Definition of variable | Model A | Model B | Model C |
|---|---|---|---|---|
| Constant | | 0.9115** (9.17) | 0.8600** (8.81) | 0.9127** (9.20) |
| Actual_FAR | Building floor area / lot area | 0.1276** (3.22) | 0.1360** (3.54) | 0.1122** (2.81) |
| Train | Time to the nearest train station (min) | −0.0157** (−9.61) | −0.0154** (−9.72) | −0.0166** (−9.98) |
| Road_width | Width of road fronting on lot (m) | 0.0209** (2.86) | 0.0212** (3.00) | 0.0162* (2.28) |
| Bldg_duration/S | Remaining age of house (a) / lot area | 0.5686** (6.42) | 0.5300** (6.11) | 0.3899** (4.02) |
| Landscaping | Within landscape zones(1), 1; otherwise, 0 | −0.1726** (−8.46) | −0.1660** (−8.33) | −0.1860** (−8.77) |
| Shinjuku | Time to Shinjuku CBD by train (min) | −0.0168** (−6.60) | −0.0164** (−6.61) | −0.0146** (−5.92) |
| Frontage | Lot frontage (m) | 0.0058* (2.38) | 0.0054* (2.26) | 0.0049* (2.15) |
| Good_pavement | Front road pavement is good, 1; otherwise, 0 | 0.0420** (2.80) | 0.0417** (2.87) | 0.0339* (2.33) |
| Parking_lot | Number of parking lots | 0.0382** (3.54) | 0.0421** (4.00) | 0.0482** (4.60) |
| Bldg_quality | Building quality in the district is good, 1; otherwise, 0 | 0.0575** (3.51) | 0.0627** (3.93) | 0.0610** (3.89) |
| Sunshine/S | Sunshine duration of house (h) / lot area | 0.9476** (2.67) | 0.9750** (2.83) | 0.8488* (2.51) |
| Adj_park | Adjacent to park, 1; otherwise, 0 | −0.1956** (−2.97) | −0.2410** (−3.70) | −0.2468** (−3.90) |
| Adj_park/S | Adj_park / lot area | 21.4547** (3.14) | 25.2110** (3.75) | 26.2212** (4.01) |
| Mixed_use | Non-residential uses are more than 1/3 in the district, 1; otherwise, 0 | 0.2384** (2.64) | 0.2290** (2.61) | 0.2973** (3.38) |
| Mixed_use/S | Mixed_use / lot area | −17.4766* (−2.44) | −17.1450* (−2.47) | −23.1857** (−3.31) |
| Tree | Greenery is good in the district, 1; otherwise, 0 | 0.0335* (1.99) | 0.0353* (2.16) | 0.0368* (2.23) |
| 500 m_large_parks | Within 500 m to parks over 5000 m2, 1; otherwise, 0 | | 0.0502** (3.46) | 0.0571** (4.00) |
| Pop_den | Density of population in block (100 person/hm2) | | | −5.9931* (−2.42) |
| % of road area/S | Road coverage ratio in block | | | 47.9868** (2.82) |
| R-square | | 0.756 | 0.772 | 0.789 |
| Adjusted R-square | | 0.734 | 0.749 | 0.766 |
| AIC | | −876.11 | −887.12 | −897.02 |

*: Significant at 5% level. **: Significant at 1% level.
(1): Planning controls in landscape zones are stricter than in other districts. This helps to explain the negative coefficient of this variable.
Because this model assumes that the regression coefficients are constant over space, it is called a global regression model. One way to improve a hedonic regression model is to identify critical spatial features that have been neglected in the model. Our strategy is to investigate the spatial pattern of the residuals of the global regression model to find spatial features that have some relationship with the spatial pattern, and then to build an improved model including these features. The 190 sample properties were plotted on a contour map, Fig. 1, with the regression residuals associated with each sample point. This contour map was generated with a spline method. The following features were then examined.

Fig. 1 Contour map of regression residuals

1.1 Public facilities

According to survey results, poor accessibility to public facilities, such as parks, libraries, and community centers, is among the items most often cited by residents as causes of dissatisfaction with their living environments (1993 National Housing Survey of Japan). The distribution of public facilities, including schools, hospitals, community centers, parks, sports facilities, and water treatment plants, can be roughly related to the residual map in Fig. 1. Therefore, the distances from the housing lots to these facilities were included in the regression analysis to indicate the effects of these facilities. The new distance variables that correlated relatively strongly with the residuals were added to the regression function specified in Eq. (1), which is the same as that of Model A, and their significance levels were examined.

$$P/S = \text{constant} + \sum_{i=1}^{k} c_i (X_i/S) + \sum_{i=1}^{k} d_i X_i + \varepsilon \tag{1}$$

where $P$ is the price of the land lot and house; $S$ is the lot area; constant is the intercept; $X_i$ is the independent variable representing attribute $i$; $c_i$ and $d_i$ are parameters for $i \in \{1, \ldots, k\}$, indicating hedonic prices; and $\varepsilon$ is the error term. This specification distinguishes the variables that affect the price from those that affect the unit price. If $d_i$ is significantly different from 0, $X_i$ has a significant effect on the unit price ($P/S$). If $c_i$ is significantly different from 0, $X_i$ has a significant effect on the price ($P$). If $c_i$ and $d_i$ for the same variable are both significant, the effect of $X_i$ varies as the lot size changes.

The results show that, among the new variables, the proximity to a large park has a significant positive effect on the unit prices of the land and house. Several threshold sizes for a large park (1000, 2000, 3000, 4000, 5000, 6000, 8000, 10 000, and 20 000 m2) and several distances from the large park (400, 450, 500, 600, 800, and 1000 m) were tested; the dummy variable for within 500 m to parks over 5000 m2 gave the best fit. With this variable, the R-square of the regression model increased from 0.756 to 0.772 and the Akaike's information criterion (AIC) decreased from −876 to −887. The second model in Table 1, Model B, lists the new results.

In practice, parks over 5000 m2 are designated as regional parks. The results suggest that the unit price of lands and houses within 500 m (about 7-8 min walking distance) of large parks is approximately 0.05 million Yen per square meter higher than in other areas. This estimate is quite reasonable and agrees well with other studies[2]. The results also imply that providing large parks is an effective way to improve the environment of areas that lack them.
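The specification in Eq. (1) and the comparison of nested models by R-square and AIC can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the helper `fit_hedonic`, the variable values, and the AIC form $n \ln(\mathrm{RSS}/n) + 2k$ are assumptions, since the paper does not state which AIC formula was used.

```python
import numpy as np

def fit_hedonic(unit_price, X, lot_area, per_area_cols, level_cols):
    """OLS fit of P/S = constant + sum c_i (X_i / S) + sum d_i X_i + error.

    per_area_cols: columns of X entering as X_i / S (the c_i terms).
    level_cols:    columns of X entering as X_i     (the d_i terms).
    """
    n = len(unit_price)
    columns = [np.ones(n)]
    columns += [X[:, j] / lot_area for j in per_area_cols]
    columns += [X[:, j] for j in level_cols]
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, unit_price, rcond=None)
    resid = unit_price - design @ coef
    rss = float(resid @ resid)
    tss = float(((unit_price - unit_price.mean()) ** 2).sum())
    k = design.shape[1]
    r2 = 1.0 - rss / tss
    aic = n * np.log(rss / n) + 2 * k  # assumed AIC form, up to a constant
    return coef, resid, r2, aic

# Synthetic stand-in for the 190-lot sample (all values are made up).
rng = np.random.default_rng(0)
n = 190
lot_area = rng.uniform(80, 300, n)                 # S, in m2
train = rng.uniform(1, 20, n)                      # minutes to station
sunshine = rng.uniform(2, 8, n)                    # hours of sunshine
near_park = (rng.random(n) < 0.3).astype(float)    # hypothetical park dummy
unit_price = (0.9 - 0.015 * train + 0.9 * sunshine / lot_area
              + 0.05 * near_park + rng.normal(0, 0.03, n))

X = np.column_stack([train, sunshine, near_park])
# "Model A"-like fit without the park dummy, then a "Model B"-like fit with it.
_, _, r2_a, aic_a = fit_hedonic(unit_price, X, lot_area, [1], [0])
coef_b, _, r2_b, aic_b = fit_hedonic(unit_price, X, lot_area, [1], [0, 2])
print(f"R2: {r2_a:.3f} -> {r2_b:.3f}; AIC: {aic_a:.1f} -> {aic_b:.1f}")
```

A candidate variable is worth keeping when, as here, adding it raises R-square and lowers AIC; the same loop could be repeated over each park size and distance threshold to pick the best-fitting dummy.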
1.2 District environmental indices

The residual map was also compared to a variety of environmental indicators in the blocks. The correlation between these indicators and the residuals was not as strong as expected, but the regression analysis showed that the population density (pop_den) and the road coverage ratio (% of road area) were significant. The unit prices of lands and houses were higher in the areas with greater road coverage ratios or lower population densities. Model C in Table 1 shows the results with the road coverage ratio and population density included. Model C gives the best fit among the three models according to the values of R-square and AIC. The estimated coefficients for this model were similar to those of the other two models.

The residual map and the correlations between the residuals and the spatial features were used to analyze the effects of various variables. The correlations of the spatial features with the residuals do not immediately reflect their significance in Model C. For example, the correlation coefficients for population density and the ratio of road area / lot size with the residuals were fairly weak, but these two variables were significant in Model C. This is because the new variables indicating spatial features may be correlated with the variables originally included in Model A. To identify the significant variables, regressions were run with the new spatial features (to be called Z) as dependent variables and the original variables of Model A as independent variables. The residuals of these regression analyses were called residual_Z. The residual_Z were then correlated with the residuals of Model A. Some examples are given in Table 2, where the values in the second column are the correlations of Z with the residual of Model A, and the values in the third column are the correlations of residual_Z with the residual of Model A. The variables having high correlation coefficients in the third column correspond well to those that are significant in Model C.

Table 2 Correlation of spatial features with residuals of Model A

| Spatial feature variables (Z) | Correlation coefficient: Z and residual | Correlation coefficient: residual_Z and residual |
|---|---|---|
| 500 m_large_parks | 0.243** | 0.255** |
| Pop_den | −0.068 | −0.114* |
| % of road area/S | 0.099 | 0.181** |
| % of public open space/S(1) | −0.013 | −0.016 |
| % of vacant land/S(2) | 0.063 | 0.098 |

*: Significant at 5% level (one-tail); **: Significant at 1% level (one-tail).
(1): Coverage ratio of parks and playgrounds in block;
(2): Coverage ratio of vacant parcels in block.
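The residual_Z screening described above can be sketched as follows: regress the candidate feature Z on the variables already in the model, then correlate what is left of Z with the model residuals. The data and the helper names `ols_residuals` and `screen_feature` are hypothetical, chosen only to illustrate why the second correlation can be stronger than the first when Z overlaps with an included variable.

```python
import numpy as np

def ols_residuals(y, X):
    """Residuals of an OLS regression of y on X (intercept added here)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef

def screen_feature(y, X, z):
    """Return (corr(Z, residual), corr(residual_Z, residual))."""
    resid_model = ols_residuals(y, X)   # residual of the existing model
    resid_z = ols_residuals(z, X)       # residual_Z: part of Z not explained by X
    raw = float(np.corrcoef(z, resid_model)[0, 1])
    partial = float(np.corrcoef(resid_z, resid_model)[0, 1])
    return raw, partial

# Synthetic example: z is strongly correlated with the included variable x1,
# yet still carries its own effect on y.
rng = np.random.default_rng(1)
n = 190
x1 = rng.normal(size=n)
z = 0.9 * x1 + np.sqrt(0.19) * rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * z + rng.normal(0, 0.3, n)

raw, partial = screen_feature(y, x1.reshape(-1, 1), z)
print(f"corr(Z, residual) = {raw:.3f}, corr(residual_Z, residual) = {partial:.3f}")
```

Because the model residuals are orthogonal to the included variables, removing the explained part of Z leaves the covariance unchanged while shrinking the variance of Z, so the residual_Z correlation is never weaker in magnitude than the raw one.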
However, the residual analysis method has a serious shortcoming: while a residual map can reveal the effects of the most significant features, it cannot as easily reveal the presence of weak effects or the presence of many co-existing spatial effects. Sometimes, the spatial patterns of the residuals may be just too complicated to be described with one or several global features. In such cases, a more formal framework is needed to explore the influences of the spatial features. Techniques that focus on the localized estimates of coefficients are very useful for this purpose.
2 Local Concerns and Geographically Weighted Regression

In the field of spatial analysis, there has been increasing interest in local forms and local modeling in recent years. A variety of new techniques have been developed which focus on identifying spatial variations in relationships rather than on the establishment of global statements of spatial behavior, e.g., local point pattern analyses, local graphical approaches, local measures of spatial dependency, the spatial expansion method, adaptive filtering, multilevel modeling, GWR, random coefficient models, autoregressive models, and local forms of spatial interaction models[3-5]. Among these techniques, GWR is thought to be a particularly good exploratory method to assist modeling.

The GWR theory is based on the model given by Fotheringham et al.[6] Consider the global regression model given by

$$y_i = a_0 + \sum_k a_k x_{ik} + \varepsilon_i \tag{2}$$

The GWR technique extends the traditional regression framework of Eq. (2) by allowing local rather than global parameters to be estimated, with the model rewritten as

$$y_i = a_{0i} + \sum_k a_{ki} x_{ik} + \varepsilon_i \tag{3}$$

where $a_{ki}$ represents the value of $a_k$ at point $i$. With Eq. (3), regressions run at various locations give different estimates. To estimate the parameters in Eq. (3), an observation is weighted in accordance with its proximity to point $i$, so the weighting of the observation is no longer constant but varies with $i$. Data from observations close to $i$ are weighted more than data from observations far away. In vector form, a GWR model can be written as

$$y = X a_i + W_i^{-1/2} \varepsilon \tag{4}$$

Then, Eq. (5) gives the estimate of $a_i$ as

$$\hat{a}_i = (X^T W_i X)^{-1} X^T W_i y \tag{5}$$

where $X$ is the independent variable matrix, $y$ is the dependent variable vector, and $W_i$ is an $n \times n$ matrix whose diagonal elements ($w_{ij}$) denote the geographical weighting of the observed data for point $i$ and whose off-diagonal elements are zero. $n$ is the number of samples. There may be various definitions for $w_{ij}$, for example, the reciprocal of the Euclidean distance between $i$ and $j$. A global regression can be seen as a specific case of Eq. (4) where the weightings are unity. To be more adaptable, Fotheringham et al.[6] defined $w_{ij}$ as

$$w_{ij} = \exp\left(-\frac{d_{ij}^2}{\beta^2}\right) \tag{6}$$

where $d_{ij}$ represents the Euclidean distance between points $i$ and $j$, and $\beta$ is a bandwidth. With Eq. (6), if $j$ coincides with $i$, the weighting of the data at that point will be unity, and the weighting of other data will decrease as $d_{ij}$ increases. For data far from $i$, the weighting will be essentially zero, effectively excluding these observations from the estimates of parameters for location $i$.

The choice of bandwidth $\beta$ is related to the trade-off between bias and variance. The greater the local sample size, the lower the standard errors of the coefficient estimates are. But this must be offset against the fact that enlarging the subset increases the chance that coefficient drift introduces bias. Therefore, the selection of an appropriate value for $\beta$ is critical. One way to select $\beta$ is by minimizing

$$\sum_i [y_i - \hat{y}_i(\beta)]^2 \tag{7}$$

where $\hat{y}_i(\beta)$ is the fitted value of $y_i$ with a bandwidth of $\beta$. However, if $\beta$ becomes so small that the weightings of all points except for $i$ itself become negligible, the value of Formula (7) becomes zero, but $\beta = 0$ is meaningless. A cross-validation approach was proposed by Fotheringham et al.[6] to solve this problem, where the estimates at point $i$ were calibrated with samples near to $i$ but excluding $i$. Accordingly, a cross-validation (CV) score defined by

$$\text{CV score} = \sum_i [y_i - \hat{y}_{\neq i}(\beta)]^2 \tag{8}$$

was computed, where $\hat{y}_{\neq i}(\beta)$ is the fitted value of $y_i$ with point $i$ excluded from the calibration, and the optimal value of $\beta$ was derived by minimizing the CV score.

The localized parameter estimates obtained from GWR exhibit a high degree of variability over space. These spatial patterns reveal the spatial nature of relationships and the spatial consequences of modeling such relationships, which can be used for model improvement.
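Equations (5), (6), and (8) can be combined into a short working sketch. The code below is an illustrative NumPy implementation on synthetic data, not the software used in the paper; the function names, the coordinates, and the bandwidth grid are assumptions made for the example.

```python
import numpy as np

def gaussian_weights(coords, i, beta):
    """Eq. (6): w_ij = exp(-d_ij^2 / beta^2) for every observation j."""
    d2 = ((coords - coords[i]) ** 2).sum(axis=1)
    return np.exp(-d2 / beta ** 2)

def local_fit(X, y, w):
    """Eq. (5): a_i = (X^T W_i X)^{-1} X^T W_i y with W_i = diag(w)."""
    XtW = X.T * w                      # scales column j of X^T by w_j
    return np.linalg.solve(XtW @ X, XtW @ y)

def gwr_coefficients(coords, X, y, beta):
    """One coefficient vector per sample point, shape (n, k)."""
    return np.array([local_fit(X, y, gaussian_weights(coords, i, beta))
                     for i in range(len(y))])

def cv_score(coords, X, y, beta):
    """Eq. (8): squared prediction errors with point i left out of its own fit."""
    total = 0.0
    for i in range(len(y)):
        w = gaussian_weights(coords, i, beta)
        w[i] = 0.0                     # exclude observation i
        total += float((y[i] - X[i] @ local_fit(X, y, w)) ** 2)
    return total

# Synthetic data: coefficients drift smoothly over a 5 km x 5 km area.
rng = np.random.default_rng(2)
n = 120
coords = rng.uniform(0, 5, size=(n, 2))            # km
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1])
y = ((1.0 + 0.3 * coords[:, 0])
     + (2.0 - 0.2 * coords[:, 1]) * x1
     + rng.normal(0, 0.1, n))

grid = [0.75, 1.0, 1.5, 2.5, 5.0]                  # candidate bandwidths (km)
scores = {beta: cv_score(coords, X, y, beta) for beta in grid}
best_beta = min(scores, key=scores.get)
local_coefs = gwr_coefficients(coords, X, y, best_beta)
print("best bandwidth:", best_beta, "km")
```

A very large bandwidth makes all weights effectively equal to one, so the local estimates collapse to the global OLS estimates, which matches the remark that a global regression is a special case of Eq. (4).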
3 Analysis of Housing and Land Prices with GWR

The GWR method was used to analyze the 190-sample data set. The same independent variables as in Model A were used, but the parameters to be estimated varied with location.

$$(P/S)_i = \text{constant}_i + \sum_k c_{ki} (X_k/S) + \sum_k d_{ki} X_k + \varepsilon_i \tag{9}$$

The localized parameter matrix was estimated with Eq. (5), where the weighting matrix $W_i$ at point $i$ was defined as

$$W_i = \begin{bmatrix} w_{i1} & 0 & \cdots & 0 \\ 0 & w_{i2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & w_{in} \end{bmatrix}, \quad \text{with } w_{ij} = \exp\left(-\frac{d_{ij}^2}{\beta^2}\right).$$

A cross-validation method was used to estimate $\beta$. As Fig. 2 shows, the optimal value was 1.243 km. This value of $\beta$ was used in the regression analysis using Eq. (9) to estimate the regression coefficients at each sample point.
Fig. 2 CV score variation with bandwidth

Then, the coefficients for each of the 190 sample points were plotted over the study area. Three examples will be given whose spatial patterns are relatively evident: the constant term, actual_FAR (the marginal effect of increasing the ratio of the actual building floor area to the lot size), and tree (the marginal effect of having many trees in the neighborhood). The graphs of the three variables are shown in Figs. 3-5. In all three examples, the estimates are positive. The extents of the spatial variations are represented by the shadings of the Voronoi polygons generated from the sample points. The regression coefficients in the darker areas are higher.

Fig. 3 Regression coefficients for the constant term

Fig. 4 Regression coefficients for the ratio of the building floor area to the lot size (actual_FAR)

Fig. 5 Regression coefficients for the abundance of trees (tree)
The large fluctuations of the coefficients estimated by GWR indicate that it is unreasonable to assume that they are constant, and suggest the necessity of new variables. Conversely, if the spatial pattern of a coefficient is fairly uniform, the variable and its functional form are appropriate. Consider the spatial distribution of the constant term in Fig. 3 as an example. The variation of the unit price of land and houses is evident, even though 16 independent variables have already been included. The dark areas are concentrated in the Seijo area, a residential area known for expensive housing. This might either be caused by a so-called "brand" effect, or be related to the fact that most lots in the Seijo area are reasonably large and the population density of this area is lower than that of other areas. The constant terms are lower in the north central and southeastern districts.

The distribution maps of actual_FAR (the ratio of the actual building floor area to the lot size) in Fig. 4, of tree (abundance of trees in the neighborhood) in Fig. 5, and of the other variables help visually identify spatial features that may affect the spatial patterns. The GWR model provides much more information than a global model. For example, Model A gives 17 parameters, while GWR yields 17×190 parameters that suggest more linkages to omitted spatial features that may affect the unit price of land and houses. As an exploratory method, the GWR method is especially useful in that the mappings of results can be directly used to find underlying relationships.

The weighting method used in the GWR method gives higher weights to geographically adjacent observations, so the effects of geographically related features, such as the effects in the Seijo area, are easily found, since the model estimates around such areas are likely to differ from those of other areas. However, for features that lack geographical correlations, the GWR method will not reveal their effects. For example, if observations belonging to a certain income class or a special interest are loosely distributed, their impact on the land and house prices will not be captured by the GWR method, because the number of observations in each localized regression is limited. In such cases, the global regression model is a more powerful method for revealing their effects on the land and house prices.

4 Comparison of Two Approaches

The performance of the GWR analysis was compared to that of the residual analysis by examining the Pearson's correlation coefficients between the residual and the spatial features excluded from the global regression model, and the correlation coefficients between the geographically weighted estimates and the same group of spatial features. The new variables are arranged into two groups. Lot-associated variables are variables such as the unevenness degree (uneven) and the distance to parks, which are directly associated with each lot (Table 3). Area-associated variables are variables related to an area containing the lot, such as the block population density (Table 4).

Table 3 Correlation coefficients for lot-associated variables

| | Uneven(2) | Distance to park: 50 m(3) | Distance to park: 500 m(3) | Distance to park: 600 m(3) | Distance to park: 800 m(3) | 500 m_large_parks(4) |
|---|---|---|---|---|---|---|
| Residuals of global model(1) | 0.025 | 0.029 | 0.164* | 0.122* | −0.004 | 0.255** |
| GWR estimates | | | | | | |
| Constant | −0.006 | −0.202** | 0.089 | 0.065 | 0.000 | 0.056 |
| Actual_FAR | 0.064 | 0.075 | −0.152* | −0.088 | −0.067 | −0.014 |
| Train | 0.071 | 0.054 | −0.113 | −0.110 | 0.058 | −0.159* |
| Road_width | 0.047 | 0.142* | −0.013 | 0.012 | 0.073 | −0.012 |
| Bldg_duration/S | −0.064 | −0.072 | 0.088 | 0.026 | 0.048 | −0.077 |
| Landscaping | −0.032 | 0.052 | 0.032 | 0.064 | 0.038 | −0.048 |
| Shinjuku | −0.027 | 0.210** | −0.052 | −0.044 | −0.004 | −0.037 |
| Frontage | 0.075 | 0.124* | −0.196** | −0.126* | −0.073 | 0.026 |
| Good_pavement | 0.028 | −0.158* | 0.039 | 0.007 | −0.008 | −0.032 |
| Parking_lot | 0.003 | 0.051 | 0.027 | 0.065 | 0.073 | −0.002 |
| Bldg_quality | −0.083 | −0.053 | −0.009 | −0.070 | −0.177** | 0.090 |
| Sunshine/S | −0.115 | 0.169** | 0.037 | 0.021 | 0.009 | −0.050 |
| Adj_park/S | 0.023 | −0.051 | −0.019 | −0.012 | 0.031 | −0.051 |
| Adj_park | −0.012 | 0.055 | 0.018 | 0.008 | −0.028 | 0.031 |
| Mixed_use/S | −0.005 | −0.057 | 0.072 | 0.034 | 0.108 | −0.077 |
| Mixed_use | −0.005 | 0.049 | −0.063 | −0.027 | −0.112 | 0.087 |
| Tree | −0.031 | −0.137* | 0.074 | 0.043 | 0.022 | −0.051 |

*: Significant at 5% level (one-tail); **: Significant at 1% level (one-tail).
(1): This row shows the correlations of residual_Z with the residual of Model A.
(2): Variable indicating the unevenness degree of land around lot.
(3): Within 50 m to parks, 1; otherwise, 0. The definitions for 500 m (600 m, 800 m) to parks are similar.
(4): Within 500 m to large parks. Large parks indicate the parks larger than 5000 m2.
Table 4 Correlation coefficients for area-associated variables

| | Seijo | % of road area | % of road area/S | % of public open space | % of public open space/S | % of vacant land |
|---|---|---|---|---|---|---|
| Residuals of global model(1) | 0.162* | 0.180** | 0.181* | −0.01 | −0.016 | 0.07 |
| GWR estimates | | | | | | |
| Constant | 0.447** | −0.178** | −0.369** | 0.09 | 0.009 | −0.025 |
| Actual_FAR | −0.506** | −0.146* | 0.209** | 0.022 | 0.161* | 0.126* |
| Train | −0.368** | −0.047 | 0.076 | 0.201** | 0.256** | 0.133* |
| Road_width | −0.472** | 0.210** | 0.354** | −0.052 | 0.002 | 0.222** |
| Bldg_duration/S | 0.389** | 0.147* | −0.172** | −0.018 | −0.136* | −0.112 |
| Landscaping | −0.095 | 0.136 | 0.040 | −0.059 | −0.089 | 0.089 |
| Shinjuku | −0.290** | 0.234** | 0.342** | −0.138* | −0.085 | −0.054 |
| Frontage | −0.411** | −0.057 | 0.224** | −0.039 | 0.050 | −0.099 |
| Good_pavement | 0.146* | −0.297** | −0.269** | 0.113 | 0.109 | 0.203** |
| Parking_lot | −0.451** | 0.098 | 0.238** | −0.117 | −0.008 | 0.125* |
| Bldg_quality | 0.341** | −0.114 | −0.192** | 0.122* | 0.102 | −0.180** |
| Sunshine/S | −0.294** | 0.274** | 0.320** | −0.052 | −0.005 | 0.022 |
| Adj_park/S | 0.230** | −0.099 | −0.271** | 0.146* | 0.086 | −0.017 |
| Adj_park | −0.221** | 0.110 | 0.275** | −0.138* | −0.086 | 0.021 |
| Mixed_use/S | 0.526** | −0.045 | −0.245** | 0.177** | 0.069 | −0.132* |
| Mixed_use | −0.493** | 0.033 | 0.222** | −0.180** | −0.077 | 0.105 |
| Tree | 0.391** | −0.097 | −0.298** | 0.072 | 0.014 | 0.008 |

| | % of vacant land/S | Pop_den | C/R(2) | I/R(2) | (C+I)/R(2) |
|---|---|---|---|---|---|
| Residuals of global model(1) | 0.098 | −0.114 | 0.044 | 0.068 | 0.068 |
| GWR estimates | | | | | |
| Constant | −0.224** | −0.458** | 0.194** | 0.057 | 0.136* |
| Actual_FAR | 0.296** | 0.481** | −0.336** | −0.289** | −0.367** |
| Train | 0.133* | 0.015 | −0.296** | 0.078 | −0.093 |
| Road_width | 0.323** | 0.438** | −0.044 | 0.065 | 0.022 |
| Bldg_duration/S | −0.275** | −0.549** | 0.278** | 0.393** | 0.410** |
| Landscaping | 0.060 | 0.282** | 0.157 | −0.136 | −0.022 |
| Shinjuku | 0.138* | 0.365** | −0.124* | −0.025 | −0.079 |
| Frontage | 0.132* | 0.736** | −0.187** | −0.349** | −0.334** |
| Good_pavement | 0.030 | −0.493** | 0.048 | 0.204** | 0.165* |
| Parking_lot | 0.215** | 0.482** | −0.199** | −0.171** | −0.217** |
| Bldg_quality | −0.202** | −0.373** | 0.109 | 0.087 | 0.114 |
| Sunshine/S | 0.157* | 0.161* | −0.103 | 0.088 | 0.009 |
| Adj_park/S | −0.214** | −0.382** | 0.042 | 0.177** | 0.143* |
| Adj_park | 0.215** | 0.346** | 0.027 | −0.156* | −0.122* |
| Mixed_use/S | 0.248** | −0.511** | 0.201** | 0.023 | 0.116 |
| Mixed_use | 0.221** | 0.510** | −0.200** | −0.048 | −0.133* |
| Tree | −0.204** | −0.620** | 0.093 | 0.174** | 0.167* |

*: Significant at 5% level (one-tail); **: Significant at 1% level (one-tail).
(1): This row shows the correlations of residual_Z with the residual of Model A;
(2): C, I, and R indicate the area of commercial, industrial, and residential land used in block. For instance, I/R is the ratio of industrial to residential used land in block, and (C+I)/R is the ratio of total commercial and industrial used land to residential used land in block.
4.1 The GWR method effectively characterized the area-associated variables
The results in Tables 3 and 4 imply that the residual analysis and GWR methods differ greatly in their ability to detect the significance of lot-associated and area-associated variables. The GWR estimates exhibit significant correlations with such area-associated variables as the Seijo area indicator, population density, road coverage ratio, and land use mixture, but much weaker correlations with the lot-associated variables.

For instance, the population density (pop_den) had strong positive correlations with the GWR estimates of frontage and mixed_use, but strong negative correlations with the spatial variations of bldg_duration/S and tree. This implies that people living in densely populated blocks tend to value lot frontage and mixed land uses more highly, and the remaining building age and abundant trees less highly. Thus, pop_den is a proper variable in the regression model. Likewise, % of road area/S is also significantly correlated with the GWR estimates.

In addition, the correlation between the GWR estimates and another area-associated variable, Seijo (within Seijo area, 1; otherwise, 0), is highly significant, which implies that the hedonic prices for actual_FAR, road_width, landscaping, frontage, and parking_lot are significantly lower in the Seijo area, while the prices for bldg_duration, bldg_quality, tree, etc. are higher. Therefore, the use of the Seijo indicator in the regression model is likely to smooth the spatial variations, since some omitted physical features or socio-economic characteristics of the people living in this area might have caused the spatial variations. However, when Seijo was added to the regression model in Eq. (1), the resulting coefficient was not significant, perhaps due to multicollinearity between Seijo and other variables, especially road_width, Shinjuku, tree, and pop_den. In fact, when Seijo was included, the estimated coefficients for these other variables fluctuated wildly.

Similarly, the mixed land use indicators, such as ((C+I)/R)/S (the ratio of commercial and industrial used land to residential used land in a block, divided by the lot area), were significantly correlated with the GWR estimates, but were not significant in the global regression model.

4.2 Residual analysis characterized lot-associated variables
Tables 3 and 4 also show that the residual analysis effectively characterizes the lot-associated variables in Model A, but does not as effectively reveal the influences of area-associated variables.
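The comparison behind Tables 3 and 4 amounts to correlating one candidate feature with the model residuals and with each column of local GWR coefficients, then flagging one-tailed significance. The sketch below illustrates this on synthetic inputs; the function names are hypothetical, and the default critical values (1.653 and 2.346) are approximate one-tailed t thresholds for about 188 degrees of freedom, supplied here as an assumption rather than taken from the paper.

```python
import math
import numpy as np

def one_tail_stars(r, n, t_crit_5=1.653, t_crit_1=2.346):
    """Significance flag for a Pearson r using t = |r| sqrt((n-2)/(1-r^2))."""
    t = abs(r) * math.sqrt((n - 2) / (1.0 - r * r))
    if t >= t_crit_1:
        return "**"
    if t >= t_crit_5:
        return "*"
    return ""

def correlate_feature(feature, residuals, local_coefs, names):
    """Correlate one candidate feature with residuals and with each GWR surface.

    residuals:   residual series to compare against (the tables use residual_Z
                 versus the Model A residual; any pair of series works here).
    local_coefs: array of shape (n, k), one column per GWR coefficient.
    """
    n = len(feature)
    r = float(np.corrcoef(feature, residuals)[0, 1])
    rows = {"residual": (r, one_tail_stars(r, n))}
    for j, name in enumerate(names):
        r = float(np.corrcoef(feature, local_coefs[:, j])[0, 1])
        rows[name] = (r, one_tail_stars(r, n))
    return rows

# Synthetic illustration: an area-type feature that tracks one coefficient surface.
rng = np.random.default_rng(3)
n = 190
feature = rng.normal(size=n)
residuals = rng.normal(size=n)
local_coefs = np.column_stack([
    0.5 * feature + 0.5 * rng.normal(size=n),   # surface related to the feature
    rng.normal(size=n),                         # unrelated surface
])
rows = correlate_feature(feature, residuals, local_coefs, ["constant", "tree"])
for name, (r, stars) in rows.items():
    print(f"{name:10s} {r:6.3f}{stars}")
```

With 190 observations the 5% threshold corresponds to a correlation of roughly 0.12, which is why fairly small coefficients in the tables still carry a significance mark.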
5 Conclusions

Residual analysis and GWR methods were used to analyze the influence of spatial features on housing and land prices. Comparison of the two methods shows that, for the data set in this paper, both methods are useful for improving the hedonic regression models, with the GWR method more effectively revealing the influence of area-associated spatial features, while the residual analysis more effectively identifies the influence of lot-associated spatial features.

These results are probably related to the aggregation effects of area-associated variables. Because they indicate the features of the entire block, the multiple observations in the same block often had the same value, so their values were geographically more uniform than the lot-associated variables, and the spatial variations caused by the omitted area-associated variables were easily captured by GWR. As a result, the correlations of the area-associated variables with the GWR estimates were stronger. This conclusion ought to be examined further, for example, by changing the aggregation of area-associated variables or by applying the methods described in this paper to other data sets.

The localized analysis techniques not only facilitated the discovery of omitted spatial features but also helped identify the effects of variables that tend to conflict with each other. Global regression models sometimes include variables having strong spatial variations, such as the distance to the central city areas, which tend to mask the effects of other more important features. This problem was avoided by restricting the sample area. With local modeling techniques like GWR, even if the sample area is very large, the models can give good estimates because the results vary with location.

In addition, the omitted spatial features are not the only reasons for the variations of the regression parameters. Market segmentation may also result in different bidding prices for the same attribute. In such situations, localized regression techniques will help distinguish geographically separated markets.

A crucial issue underlying the analyses of local regression models is the validation of the models. If, for example, a global regression model outperforms a GWR model, the rationale for including the spatial variations revealed by GWR may be lost. For the data set used in this paper, cross-validation tests were used to validate the model: the price of each sample was predicted with a model calibrated on all the remaining samples, and the deviations of the predicted prices from the actually observed prices were analyzed. The analyses demonstrate that the global model is reasonable for this data set, but the GWR model is slightly better[7].

The development of spatial analysis techniques is providing more investigative tools for improving hedonic regression models. Further experience will clarify which techniques should be used in which situations and how different methods should be assessed. The comparison of the residual analysis and GWR methods provides useful information for evaluating and using these methods.

Acknowledgements

The authors gratefully acknowledge the valuable comments from Prof. Atsuyuki Okabe, Prof. Yukio Sadahiro, Dr. Takaya Kojima, Prof. Hongyu Liu, Dr. Chang-Jo Chung, and the members of the Housing Economics Research Group, Japan.
References

[1] Gao Xiaolu, Asami Y. The external effects of local attributes on living environment in detached residential blocks in Tokyo. Urban Studies, 2001, 38(3): 487-505.
[2] Yazawa N, Kanemoto Y. The choice of variables in hedonic approaches. Proceedings of Environmental Science, 1992, 5(1): 45-56.
[3] Can A. Specification and estimation of hedonic housing price models. Regional Science and Urban Economics, 1992, 22: 453-474.
[4] Fotheringham A S, Brunsdon C. Local forms of spatial analysis. Geographical Analysis, 1999, 31(4): 341-358.
[5] Orford S. Modeling spatial structures in local housing market dynamics: A multilevel perspective. Urban Studies, 2000, 37(9): 1643-1671.
[6] Fotheringham A S, Charlton M E, Brunsdon C. Geographically weighted regression: A natural evolution of the expansion method for spatial data analysis. Environment and Planning A, 1998, 30: 1905-1927.
[7] Gao Xiaolu, Asami Y, Chung C F. An empirical evaluation of hedonic regression models. In: Proceedings of Joint International Symposium on Geospatial Theory, Processing and Applications. Ottawa, Canada, 2002.