Regional Science and Urban Economics 60 (2016) 260–275
Regional Science and Urban Economics journal homepage: www.elsevier.com/locate/regec
Urban house price surfaces near a World Heritage Site: Modeling conditional price and spatial heterogeneity

Markus Fritsch a, Harry Haupt a, Pin T. Ng b,c,*

a University of Passau, Department of Statistics, Passau 94030, Germany
b Northern Arizona University, W. A. Franke College of Business, USA
c Anhui University, School of Economics, Hefei, China
ARTICLE INFO

Article history:
Received 17 March 2015
Received in revised form 14 July 2016
Accepted 28 July 2016
Available online 2 August 2016

JEL classification: R12, R23, R3, C21, C51

Keywords: Hedonic pricing; Quantile regression; Spatial association; Spline smoothing
ABSTRACT

In housing price regression, a large bundle of non-separable structural and location characteristics, potentially affecting prices nonlinearly, constitute the relevant set of predictors. Spatial subcenters and complex spatial association structures may, therefore, exist or, stated differently, horizontal market segmentation might be prevalent. Moreover, it is not unlikely for the housing price generating market mechanisms to vary across different parts of the conditional price distribution. This can ultimately cause disparate price segments to exhibit varying functional relationships through different subsets of characteristics and lead to vertical market segmentation. In order to take nonlinearity, horizontal and vertical market segmentation into account within the scope of housing price regressions, we propose incorporating a semiparametric approach into the quantile regression framework. In our empirical application, we investigate rental data from the German city of Regensburg, which contains an Old Town on the World Heritage List. Focusing on location effects exerted by the World Heritage Site, we illustrate how statements about horizontal and vertical market segmentation can be derived from a semiparametric quantile regression model based on empirical evidence and economic reasoning.

© 2016 Elsevier B.V. All rights reserved.
1. Introduction

Housing price analysis constitutes a complex multivariate problem, as housing is a heterogeneous (or differentiated) good consisting of a variety of attributes, which in turn differ with respect to numerous characteristics. Considerable research interest lies in models capable of explaining the observed market outcomes as much as possible, generating accurate forecasts of prices (of unobserved houses), and foremost providing insights into the complex underlying price generating mechanisms. The latter, captured by the hedonic housing price surface (Rosen, 1974), is usually analyzed via the marginal and aggregated effects that physical and location characteristics exert on prices, and can be addressed by using regression methods. However, regression analysis of marginal and aggregated hedonic effects is a particularly complex problem for several reasons.1 First, one of the attributes of a house is its geographic location; hence, housing markets are intrinsically spatial, leading to cluster or segmentation effects (see e.g., Goodman, 1981; Dale-Johnson, 1982, for early accounts on this problem) and spatial association (see e.g., Dubin, 1988, 1992; Cheshire and Sheppard, 1995, 1998; Basu and Thibodeau, 1998; Clapp et al., 2002; McMillen and Redfearn, 2010). The statistical consequence is that housing price data may exhibit spatial breaks, spatial correlation and spatial heterogeneity. Generally, the focus of approaches based on spatial econometric methods is to control for spatial association (e.g., Basu and Thibodeau, 1998; Pace et al., 1998b; Cohen and Coughlin, 2008; Holly et al., 2011; McMillen, 2015, and the literature cited therein), while approaches

* Corresponding author at: Northern Arizona University, W. A. Franke College of Business, Flagstaff, AZ 86011-5066, USA.
E-mail addresses: [email protected] (M. Fritsch), [email protected] (H. Haupt), [email protected] (P. Ng).
1 Green and Malpezzi (2003, p. 32f) survey methods for the analysis of house prices. The issue of regression based estimation of price indices is beyond the scope of this paper (see Diewert, 2003; Silver and Heravi, 2007, or McMillen, 2008, for recent contributions).
http://dx.doi.org/10.1016/j.regsciurbeco.2016.07.011 0166-0462/© 2016 Elsevier B.V. All rights reserved.
M. Fritsch, et al. / Regional Science and Urban Economics 60 (2016) 260–275
relying on methods from spatial statistics explicitly try to estimate the spatial structure (e.g., Clapp et al., 2002; Gelfand et al., 2003; Banerjee et al., 2004; Majumdar et al., 2006). Cressie (1993) discusses models of both worlds while Pace et al. (1998a) and more recently Kauermann et al. (2012) elaborate on inherent differences. Second, housing prices are generically nonlinear with respect to the characteristics of the house. Hence, simple (local) linearization strategies are either incompatible with economic theory (e.g., Epple, 1987; Sheppard, 1999; Ekeland et al., 2004) or do not produce satisfying empirical results (e.g., Goodman, 1978; Halvorsen and Pollakowski, 1981), or both. Simulations of Cropper et al. (1988) reveal that parametric hedonic pricing models do not perform well. McMillen and Redfearn (2010) extend these arguments and provide Monte Carlo and empirical evidence in favor of nonparametric methods. These findings are in line with the recent applied econometric literature on hedonic pricing (e.g., Anglin and Gencay, 1996; Parmeter et al., 2007; Haupt et al., 2010). We attempt to carefully address each of these issues and their delicate interplay. To begin with, we try to account for nonlinearities in both the structural and the spatial component of the hedonic regression. For this, we employ a flexible semiparametric specification incorporating additive structures by using univariate splines and triograms for modeling nonlinearities and spatial association. Such a strategy imposes only weak assumptions on the structure of the inherent data generating processes while avoiding the curse of dimensionality of fully nonparametric approaches. Appropriately specified, the resulting price surface permits the existence of irregularly distributed spatial subcenters – horizontal segmentation of a housing market. 
We then incorporate the semiparametric approach for modeling nonlinearities and spatial association into the quantile regression framework (Koenker and Bassett, 1978; Koenker 2005). Quantile regression is a type of regression analysis that estimates the conditional median or quantiles of the dependent variable. While the popular least squares regression provides an estimate of the conditional mean given certain values of the covariates, quantile regression directly estimates the various quantiles of the dependent variables given the same set of values of the covariates. In addition to its inherent robustness against outliers in the dependent variable measurements, the main attraction of quantile regression lies in its ability to provide estimates for different aspects of measures of central tendency and dispersion. This provides a more complete picture of the conditional distribution of the dependent variable. The virtue of quantile regression in this context is that it may reveal different market mechanisms being prevalent across different portions of the conditional distribution of the housing price. Stated more explicitly, quantile regression considers possible varying functional relationships defining the influence of different subsets of characteristics on housing price across different segments in the distribution of the housing price. These potential disparities in functional relationships can be directly estimated via different conditional quantiles, thus, allowing vertical segmentation of a housing market. In a nonlinear and spatially structured regression, however, it is not straightforward to assess the complexity of the spatial structure, the degree of nonlinearity, and the relevant subset of characteristics that affect housing price. This problem is further complicated by the fact that these properties of the regression may differ horizontally (for housing submarkets) or vertically (across the conditional distribution of the housing price). 
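The contrast between the conditional mean and conditional quantiles can be made concrete with a minimal sketch. It uses invented rents per square meter for two hypothetical groups of apartments (central and peripheral). With a single binary covariate, a quantile regression coefficient reduces, up to the interpolation convention used for sample quantiles, to the difference between the two group quantiles, so no regression routine is needed here. The data, the group labels and the function names are assumptions made for illustration only; none of them come from the Regensburg sample.

```python
from statistics import mean, quantiles

# Synthetic rents per square meter (illustrative values only, not the Regensburg data).
central = [9.0, 10.0, 11.0, 12.0, 13.0, 15.0, 18.0, 22.0, 25.0]
peripheral = [8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0, 11.5, 12.0]


def group_quantile(values, level):
    """Sample quantile at a decile level (0.1, 0.2, ..., 0.9)."""
    deciles = quantiles(values, n=10, method="inclusive")
    return deciles[round(level * 10) - 1]


def location_effects(levels=(0.1, 0.5, 0.9)):
    """Central minus peripheral rent at the mean and at each quantile level."""
    effects = {"mean": mean(central) - mean(peripheral)}
    for level in levels:
        effects[level] = group_quantile(central, level) - group_quantile(peripheral, level)
    return effects


if __name__ == "__main__":
    for key, value in location_effects().items():
        print(key, round(value, 2))
```

In this toy setting the premium for a central location is small in the lower tail and large in the upper tail, while the mean difference summarizes both in a single number. This is the kind of pattern the text refers to as vertical segmentation.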
Thus, identification of the functional impact of relevant physical and location specific characteristics on hedonic housing price regressions constitutes a particularly challenging problem (see e.g., Parsons, 1990; Arguea and Hsiao, 1993, and Cheshire and Sheppard, 1995). Calculating the effective dimension of a hedonic price surface allows comparison to ad hoc parametric approaches but may be difficult in nonlinear models (see the discussion in Sections 4 and 5 of McMillen and Redfearn, 2010).
This highlights another benefit of embedding the semiparametric approach into the quantile regression framework: The so-called exact-fit-property of quantile regressions allows us to estimate the effective dimension of a hedonic price surface – even in highly nonlinear contexts (see e.g., Koenker et al., 1994; Koenker and Mizera, 2004) – and constitutes an essential piece of information that enables us to determine the functional form and interpret the corresponding hedonic shadow prices and conditional predictions. To sum up, incorporating a flexible semiparametric specification in the quantile regression framework facilitates estimation of submarket specific hedonic price surfaces (and hedonic shadow prices) which subsequently enables us to simultaneously account for horizontal and vertical market segmentation. Using a representative sample of the rental housing market in Regensburg, Germany in 2001, we empirically illustrate the proposed method for modeling of horizontal and vertical segmentation of the hedonic housing price surface. We employ flexible nonparametric univariate and bivariate smoothers. The resulting spatial structure under an anisotropic assumption enables us to capture the potential heterogeneity in the spatial distribution of housing prices in Regensburg. The nonparametric approach is purely data-driven while avoiding the necessity of complex ad hoc parametrizations of traditional approaches. We detail specification, estimation and model selection for different methods allowing for horizontal and vertical variation in the hedonic housing price surfaces across different quantiles. In our empirical analysis, we focus on one of the key amenities of Regensburg, the distance to the boundary of the Old Town (historic city center). 
This area is on the UNESCO World Heritage List.2 After controlling for relevant housing characteristics in a flexible nonlinear fashion, maps of the estimated hedonic price surfaces reveal substantial spatial inequalities between neighborhoods to the east and to the west of the historic city center, in addition to a strong historic city center effect. Though this pattern holds generally, variations emerge across different quantiles. In addition, the effects of housing-quality inducing variables differ across quantiles. Economically, these inequalities stem from urban development outside the historic city center taking quite different routes in the eastern neighborhoods compared to the western neighborhoods, leading to the city districts in the east and west diverging in terms of their structural amenities and disamenities.

The remainder of the paper is organized as follows: Section 2 introduces spatially additive hedonic price quantile regression models and embeds our approach in the relevant literature. In Section 3, we apply the proposed method to investigate and economically interpret the complexity and variability in hedonic shadow prices and price predictions using spatially structured urban housing market data. Finally, Section 4 summarizes our findings.

2. Smooth modeling of spatial heterogeneity

To address the issues listed in Section 1, we define a hedonic price equation of the form p = h(l, g, ε), with a scalar dependent price variable p, structural regression component l, spatial component g and latent (error) component ε. It has been known since the 1950s that any continuous multivariate function can be represented by a finite composition (addition) of continuous functions of
2 “Located on the Danube River, the Old Town of Regensburg with Stadtamhof is an exceptional example of a central-European medieval trading center, which illustrates an interchange of cultural and architectural influences. The property encompasses the city center on the south side of the river, two long islands in the Danube [. . . ], and the area of the former charity hospital [. . . ]. A navigable canal, part of the European waterway of the Rhine-Main-Danube canal, forms the northern boundary of Stadtamhof.” (Quote from UNESCO World Heritage Centre (2016) at http://whc.unesco.org/en/list/1155).
Fig. 1. Estimated conditional mean price surface for a purely spatial model p = g + ε. Panel (a), left display: isotropic setting. Panel (b), right display: anisotropic setting. The plot includes the contours of the rivers passing through Regensburg (thin blue lines) and the boundaries of the historic city center, of which the ancient city walls are marked in green (bold lines) as reference points. The axes (x and y) are scaled in kilometers. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
the respective variables via the Kolmogorov-Arnold Representation Theorem (Kolmogorov, 1956 and Arnold, 1957). We, therefore, consider a spatial additive setup for regression analysis in this article, where a hedonic price regression function has the form

$$p = l + g + \varepsilon. \qquad (1)$$

Various approaches can be employed to estimate Eq. (1), for instance the conditional mean or conditional t-quantiles for t ∈ (0, 1). This paper focuses on the latter in order to estimate the potentially different market mechanisms that might prevail across different parts of the conditional price distribution while reporting the conditional mean as a traditional reference point.

In a monocentric baseline model, the spatial component g is a scalar function of distance to the central business district (CBD). Craig and Ng (2001) use univariate smoothing splines to identify employment subcenters for multicentric urban areas. As such an isotropic assumption, effectively implying horizontal segmentation of the price surface by concentric circles, rarely holds in empirical practice (e.g., Zhu et al., 2011), we model g as a flexible and smooth bivariate function (a triogram, which is a flexible surface connected by a series of piecewise linear triangular planes) of longitude and latitude. This more general setting allows for the existence of subcenters and anisotropic spatial association.

Fig. 1 contrasts estimated conditional mean price surfaces from the German city of Regensburg by using the purely spatial model p = g + ε. In the isotropic case, g(·) is assumed to be a scalar function of the euclidean distance from the city center, while in the anisotropic case, g(·) is assumed to be a bivariate triogram function of longitude and latitude. The surfaces are estimated by smoothing over a fine grid of location covariates (and corresponding euclidean distances to the historic city center in the isotropic setting), with the grid being limited by the range of the location characteristics in the Regensburg rental data. As illustrated by the plots, the model employing bivariate triograms flexibly adapts to complex price surfaces while the monocentric baseline model enforces a price surface consisting of concentric circles.

The additivity between l and g (and ε) in Eq. (1) also occurs in the conditionally parametric (CPAR) approach used by McMillen and Redfearn (2010). The CPAR model assumes additivity between a spatial component g and a structural component l. It allows hedonic shadow prices (of the structural characteristics in l) to vary over space by using a weighting scheme based on euclidean distance between observations.

As a framework for our estimations, we consider the additive semiparametric model

$$p = l_1(z_1) + z_2 b_2 + g(s) + \varepsilon, \qquad (2)$$

where $z = (z_1, z_2)$ are characteristics of a house such as its physical attributes, and micro- and macro-location (dis)amenities while $l_1(\cdot)$ and g(·) constitute unknown smooth functions. In the anisotropic case, the spatial location s = s(x, y) of a house is captured by its longitude and latitude measurements x and y. This enables us to better capture the potential spatial breaks, spatial correlation and spatial heterogeneity of the price surface. For isotropic specifications, the locational information is represented by the distance to a certain reference point. If a household chooses a particular house in an urban housing market, this choice simultaneously determines a certain set of characteristics z and a location s.

Restricting all non-spatial characteristics of a house to enter Eq. (2) in linear parametric fashion results in the partially linear model p = zb + g(s) + ε. In the partially linear model, only the spatial component enters the model in nonparametric fashion while the CPAR model p = zb(s) + g(s) + ε relaxes the assumptions of the former by allowing the coefficients b to depend on s. Our model allows $z_1$ to influence p nonparametrically while letting $z_2$ affect p parametrically. Compared to the partially linear model, this specification provides us with greater flexibility to better capture the generically nonlinear relationship of p with respect to the characteristics of a house while still allowing a subset of characteristics to influence the housing price linearly if called for. Finally, restricting the spatial effect g(·) in the partially linear model to be represented by a parametric polynomial of longitude and latitude (e.g., Clapp et al., 2002) yields a classical multiple linear regression model (MLR).

Quantile regression analysis of the additive semiparametric model in Eq. (2) is studied by He et al. (1998), He and Ng (1999), Koenker and Mizera (2004), and Koenker (2011).
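The nesting of the specifications discussed above can be collected in a single display. This is a notational summary only: the symbol $\pi(x, y)$ for the parametric polynomial in longitude and latitude is introduced here purely for illustration and does not appear in the cited papers.

```latex
\begin{align*}
\text{MLR:} \quad & p = z b + \pi(x, y) + \varepsilon, \\
\text{partially linear:} \quad & p = z b + g(s) + \varepsilon, \\
\text{CPAR:} \quad & p = z b(s) + g(s) + \varepsilon, \\
\text{additive semiparametric, Eq. (2):} \quad & p = l_1(z_1) + z_2 b_2 + g(s) + \varepsilon.
\end{align*}
```

Reading from top to bottom, the first two steps relax one restriction each: the parametric polynomial in location becomes a smooth function g(s), and the constant coefficients b become location dependent. The last line instead keeps constant coefficients for $z_2$ and lets $z_1$ enter through the smooth function $l_1(\cdot)$.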
To smooth the univariate quantile function $l_1(z_1)$ in Eq. (2), Koenker et al. (1994) propose the use of quantile smoothing splines. He et al. (1998) introduce bivariate tensor product splines to capture nonparametric bivariate quantile functions, whereas Koenker and Mizera (2004) suggest triograms, which have the advantage of being rotationally invariant as compared to tensor product splines. Rotational invariance means that the resulting estimates do not depend on the orientation of the coordinates, which is a desirable property of an estimation technique. This is particularly appealing in our context as any rotation
of the spatial coordinates (x, y) will not affect the estimated quantile regression surfaces. We, therefore, adopt triograms to specify the bivariate function g(s) in Eq. (2).

Previous studies on hedonic housing price estimation typically use either linear least absolute deviations based regressions (e.g., McMillen, 2008 and the literature cited therein), or semiparametric least squares based approaches (e.g., Clapp et al., 2002 and the literature cited therein). McMillen (2013, 2015) discusses flexible quantile versions of spatial econometric models. Using kernel based least squares estimation, McMillen and Redfearn (2010) provide a thorough discussion of the empirical modeling steps starting from a general hedonic price equation. As mentioned above, their approach allows for horizontal, that is, spatial variation of hedonic shadow prices due to their flexible specification of $l_1(\cdot)$ and g(·). In their model, the spatial component and the spatially varying structural characteristics also follow the additive structure of Eq. (1). Vertical market segmentation based on the conditional distribution of the housing price surface, however, can only be estimated using our quantile regression framework.

Landmark papers motivating and advocating the use of additivity in flexible regression models are Friedman (1991) and Stone (1994). A general approach embedding the virtues of all the additive models discussed above can be seen in a fully nonparametric (quantile) regression model. It is well known that such an approach does not require any of the structural assumptions made by all works cited above. However, the data requirement is more extensive; see Haupt et al. (2011) for an elaborate comparison of parametric, semi- and fully nonparametric quantile regressions.
Their detailed discussion of Monte Carlo results and an application to the well-known Boston housing data provides insights on exactly how the proposed semi- and nonparametric models work (see also McMillen and Redfearn, 2010).

All of our semiparametric quantile regression estimations are based on minimizing the following penalized objective function

$$\sum_{i=1}^{n} q_t\bigl(p_i - l_{1t}(z_{1i}) - z_{2i} b_{2t} - g_t(s_i)\bigr) + k(t)\, g(l_{1t}, g_t), \qquad (3)$$
over a suitable function space $\mathcal{F}$, with l, g ∈ $\mathcal{F}$, where $q_t(u) = \{t - I(u < 0)\}u$ is the well-known “check function” in quantile regression, which assigns a positive weight of t to positive residuals and a negative weight of t − 1 to negative residuals. For t = 0.5, the check function treats positive and negative residuals equally and, as a result, the estimated price surface is the conditional median estimate, which divides all the points in the data cloud into two equal halves, one above and one below the quantile regression hyperplane. For any other value of t, the check function weighs the positive and negative residuals differently and results in an estimated price surface with a proportion t of the observations falling below and a proportion 1 − t falling above the quantile regression hyperplane. The quantile regression estimated price surface is, hence, the t-th estimated conditional quantile of the housing price given the set of covariates. Note that all of the model parameters or respective predictors may vary with t and, thus, are estimated for a family of quantiles, t ∈ (0, 1). For every t ∈ (0, 1), there is a nonparametric estimate for $l_1(\cdot)$ and g(·) and a parametric estimate for $b_2$. Therefore, we use the t subscript to specify $l_1(\cdot)$, g(·) and $b_2$ in Eq. (3).

As mentioned in Section 1, assessing the complexity of the model to best estimate the spatial structure, the degree of nonlinearity, and the relevant subset of characteristics of a model is a major challenge, especially when both vertical and horizontal segmentations are taken into consideration simultaneously. Fortunately, the effective dimension or degrees of freedom of a fit in Eq. (3) can be used to mitigate this challenge. The effective dimension of the estimated hedonic price surface (denoted as $d_k(t)$ below) quantifies the
complexity of the fit. The higher the effective dimension, the more complex is the model. In the quantile smoothing spline $l_{1t}$, for example, the lowest possible effective dimension is 2, which translates to the simplest linear fit that passes through only two data points and yields two zero residuals. The highest possible effective dimension is equal to the number of observations n, which boils down to the most complex fit of an interpolating spline that passes through every single data point, resulting in n zero residuals. In the triogram $g_t$, the lowest effective dimension is 3, which corresponds to the best global linear plane that interpolates only 3 data points to provide the simplest model with 3 zero residuals, while the highest effective dimension occurs when all the data points are interpolated by the roughest (most complex and most flexible) piecewise triangular fit that yields n zero residuals.

In this article, the appropriate dimension of the fitted model is determined through the choice of the optimal smoothing parameter (vector) k(t) that minimizes the following quantile-based version of the Schwarz information criterion $SIC_k(t)$, which is most suited for quantile regression as suggested in Machado (1993) and Koenker et al. (1994):

$$SIC_k(t) = \log\Bigl[ n^{-1} \sum_{i=1}^{n} q_t\bigl(p_i - l_{1t}(z_{1i}) - z_{2i} b_{2t} - g_t(s_i)\bigr) \Bigr] + \frac{1}{2}\, n^{-1} d_k(t) \log n.$$
The $SIC_k(t)$ combines both the measure of infidelity, captured by $q_t$, and the measure of complexity (roughness) of the model, captured by the effective dimension $d_k(t)$ of the fitted model. The objective is to choose a k(t) to minimize $SIC_k(t)$ across the spectrum of possible values for k(t) in an attempt to balance between fidelity to the data (low $q_t$ value) and parsimony of the model (low $d_k(t)$ value). The smaller the $q_t$ value is, the closer the fitted values are to the observed values of the dependent variable, which means high fidelity to the data. The lower the $d_k(t)$ value is, the smoother is the fit, which means a simpler and more parsimonious model. When k(t) approaches 0, no penalty is exerted on the roughness g of the fitted $l_{1t}$ and $g_t$ in the objective function in Eq. (3). Hence, the fitted $l_{1t}$ and $g_t$ will result in the roughest (largest g) interpolating spline and triogram and will pass through all the points in the data cloud to reduce the amount of infidelity to the data $q_t$ to as low as possible. This yields the most complex model possible (high $d_k(t)$ value). When k(t) approaches ∞, on the other hand, the objective function is forced to trade off fidelity $q_t$ for the smoothest fit to reduce the roughness penalty g. This translates to a global straight line and flat linear plane for $l_{1t}$ and $g_t$, respectively, which corresponds to the simplest model (smallest $d_k(t)$ and lowest g). The smoothing parameter k(t), therefore, serves as a knob that balances the fidelity to the data and parsimony of the model through minimizing the $SIC_k(t)$.
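The trade-off governed by the smoothing parameter can be sketched in a few lines. The snippet below implements the check function and the Schwarz-type criterion described above and picks the smoothing parameter with the smallest criterion value. It is a minimal illustration under strong assumptions: the residuals and effective dimensions of the three candidate fits are invented numbers, not output of any spline or triogram fit, and the names check_loss, sic and select_smoothing are hypothetical.

```python
import math


def check_loss(u, t):
    """Check function: {t - I(u < 0)} * u for a residual u and quantile level t."""
    return (t - (1.0 if u < 0 else 0.0)) * u


def sic(residuals, eff_dim, t):
    """log of the mean check loss plus 0.5 * eff_dim * log(n) / n."""
    n = len(residuals)
    fidelity = sum(check_loss(u, t) for u in residuals) / n
    return math.log(fidelity) + 0.5 * eff_dim * math.log(n) / n


def select_smoothing(candidates, t):
    """Return the smoothing parameter whose fit has the smallest criterion value."""
    return min(candidates, key=lambda k: sic(candidates[k][0], candidates[k][1], t))


# Hypothetical candidate fits: smoothing parameter -> (residuals, effective dimension).
# A small parameter gives a rough fit (small residuals, high dimension); a large one a smooth fit.
candidates = {
    0.1: ([0.3, -0.2, 0.1, 0.3, -0.3, 0.1, 0.2, -0.1], 6),
    1.0: ([0.3, -0.2, 0.1, 0.4, -0.3, 0.1, 0.2, -0.2], 3),
    10.0: ([1.0, -0.8, 0.5, 1.2, -1.1, 0.4, 0.9, -0.7], 2),
}

if __name__ == "__main__":
    for k, (res, dim) in candidates.items():
        print(k, round(sic(res, dim, 0.5), 3))
    print("selected:", select_smoothing(candidates, 0.5))
```

With these invented numbers the intermediate parameter is selected. The roughest fit gains too little fidelity to justify its higher effective dimension, and the smoothest fit loses too much fidelity. Note that an interpolating fit with all residuals equal to zero would make the logarithm undefined, so such a candidate is deliberately left out of the sketch.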
3. Empirical analysis of urban rental data In this section, we investigate the functional form and complexity of the structural and spatial components in a hedonic housing price surface using data from an urban rental market. After an outline of the data, we start by discussing empirical evidence of spatial nonlinearity and complexity along the lines of the theoretical considerations of Section 1. In order to gain a deeper insight into how the latter are generated by the interplay of physical and location characteristics, we provide a detailed comparison of the alternative flexible modeling strategies introduced in Section 2, based on model fit (fidelity), model complexity and respective economic implications. In order to make our empirical analysis as accessible as possible, we
provide figures, tables, and interpretations for classical least squares methods along with our quantile regression estimation results.3

3 All estimations are performed using R version 3.2.5 (R Development Core Team, 2016). For quantile regression, we employ the quantreg package (Koenker, 2016) and model nonparametric univariate and bivariate terms with univariate quantile smoothing splines and triograms. Smoothing parameters are selected by minimizing the Schwarz information criterion (SIC(t)) using a Nelder-Mead simplex search algorithm when necessary. For investigating the conditional mean with linear parametric and semiparametric models we resort to generalized additive models (Wood, 2006) implemented in the mgcv package (Wood, 2016). We use thin plate regression splines (Wood, 2003) to represent smooth nonparametric effects. Smoothing parameters for the generalized additive models are estimated by Maximum Likelihood based on a Pearson estimate of scale. The smoothing parameters govern the fit-complexity trade-off of the spline terms. In order to create the euclidean distance variables, we employ the packages maptools (Bivand, 2016a) and rgeos (Bivand, 2016b). The heatmaps are generated using the plot3D package (Soetaert, 2016).

The data are a representative sample from the urban rental market in Regensburg, Germany, and form the basis for the Regensburg rental guide 2002. By German law, a rental guide should reflect the market rent customary in a city, given a specific quality, location, and equipment. The law precisely defines the population from which the data need to be sampled: subsidized, cut-rate and owner-occupied apartments, occasionally used apartments such as vacation homes, apartments in student or care facilities and ready-furnished apartments are explicitly excluded from the sampling scheme. Equally barred from the sample are apartments for which the net rent has not been adjusted within the last four years, since they are considered non-market-compliant.

Fig. 2. City map of Regensburg with stylized facts. Observations within the historic city center are included as blue dots, observations outside the historic city center are represented in orange. The axes are scaled in kilometers. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Table 1
Descriptive statistics.

Variable              Mean   Median   Std. Dev.   Min    Max
rpsqm                   11       11        3.25     3     25
size                    66       63       25.70    13    170
age                     44       42       30.73     1    121
eucl.dist.c           2025     1808     1242.84   102   6364
eucl.dist.hist.in       71        0      159.86     0    697
eucl.dist.hist.out    1168      959     1106.91     0   5187

Fig. 2 presents stylized facts concerning the city of Regensburg and illustrates the location of the n = 783 observations available for our empirical analysis. Besides the contours of the rivers passing through Regensburg (thin blue lines), the plot contains the 18 districts (thin black dashed lines), the highways (bold grey lines) and the railway system (bold black lines). Moreover, the figure pictures a peculiarity of Regensburg: its medieval city center, the Old Town, has been on the UNESCO World Heritage List since 2008 and comprises a pedestrian zone in its middle. In the north, branches of the rivers Danube and Regen (bold blue line) effectively bound the medieval
city center while the southern bound is made up of a green belt which is located where parts of the ancient city walls once stood (bold green line). The green belt is surrounded by arterial roads connecting the medieval city center with the highways and state roads. Single and semidetached houses are the dominating real estate west of the historic city center boundary. Farther out, the western part of Regensburg also features parks and recreational areas. The eastern part of Regensburg is more densely built up than the western part, with multistory houses constituting the predominant type of real estate. Additionally, farther out, the eastern part of the city hosts the harbor and industrial areas.

Besides the outcome variable, monthly net rent per square meter in German Marks (rpsqm), the data set contains covariates indicating physical quality and location of apartments. Among the apartment quality-inducing variables are size of living area in square meters (size), age of the building (or age after complete refurbishment) (age), and dummy variables recording if an apartment possesses central heating (ch), warm water supply (ww), above-average sanitary equipment (bath) or a nearby park (green). Information on the location of the individual observations is given by longitude (x), latitude (y), and a categorical variable for the 18 districts of Regensburg (district). Using longitude and latitude, we calculate the euclidean distance (measured in meters) to the cathedral (eucl.dist.c) for our isotropic models. Two additional variables are calculated to allow the distance to the historic city center boundary to have a different effect within (eucl.dist.hist.in) and outside (eucl.dist.hist.out) the historic city center. Table 1 presents descriptive statistics for all continuous variables used for estimations.
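The construction of the three distance variables can be sketched as follows. This is a simplified illustration, not the procedure used for the estimations: it assumes planar coordinates in meters, a hypothetical square polygon standing in for the historic city center boundary, and the reading that eucl.dist.hist.in and eucl.dist.hist.out equal the distance to the boundary on the respective side and zero otherwise (which is consistent with the zero minima in the descriptive statistics). All function names are invented for this example.

```python
import math


def dist_point_segment(p, a, b):
    """Euclidean distance from point p to the line segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection of p onto the segment, clamped to its end points.
    s = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + s * dx), py - (ay + s * dy))


def inside_polygon(p, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    px, py = p
    inside = False
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def distance_variables(p, center, polygon):
    """Distance to a reference point and to the polygon boundary, split by side."""
    n = len(polygon)
    d_boundary = min(
        dist_point_segment(p, polygon[i], polygon[(i + 1) % n]) for i in range(n)
    )
    is_in = inside_polygon(p, polygon)
    return {
        "eucl.dist.c": math.hypot(p[0] - center[0], p[1] - center[1]),
        "eucl.dist.hist.in": d_boundary if is_in else 0.0,
        "eucl.dist.hist.out": 0.0 if is_in else d_boundary,
    }


# Hypothetical boundary (a 1 km square) and reference point, in meters.
boundary = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0), (0.0, 1000.0)]
cathedral = (500.0, 500.0)

if __name__ == "__main__":
    print(distance_variables((500.0, 300.0), cathedral, boundary))
    print(distance_variables((1500.0, 500.0), cathedral, boundary))
```

Splitting the boundary distance into two variables lets a regression assign separate effects to moving away from the boundary toward the center and to moving away from it toward the periphery, which a single distance to the cathedral cannot do.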
A natural first step to gain insights into the complex housing price generating mechanisms consists of modeling the observed market outcomes in a spatial dimension only. This exploratory analysis is built on the purely spatial model $p = g(s) + \varepsilon$. Placing an isotropic assumption on $g(s)$ forces the spatial pattern to be composed of concentric circles. The comparison in Fig. 1 raises doubts that the Regensburg rental data support an isotropic assumption. A highly nonlinear conditional mean price surface is revealed when we allow for more flexible anisotropic spatial association. Modeling $g(s)$ with a bivariate triogram of longitude and latitude can be compared to the surfaces estimated in Koenker and Mizera (2004) or Ng and Yan (2008), each regressing on the spatial component $g(s)$ only. The estimated price surface in the right display of Fig. 1 hints at the existence of horizontal market segmentation resulting from spatial association structures in the Regensburg rental data. Besides the effect of apartment location, the spatial component is susceptible to picking up effects of omitted physical characteristics clustered together (e.g., regions in which net rent per square meter is high due to high quality neighborhoods being developed contemporaneously). We continue by investigating whether the spatial association structures differ across quantiles and thereby hint at the presence of vertical market segmentation. Fig. 3 illustrates the estimated conditional quantile price surfaces for $\tau = \{0.1, 0.5, 0.9\}$ based on the isotropic and the anisotropic version of the purely spatial model. The surfaces are fitted by smoothing over the same fine grid of location covariates
Fig. 3. Estimated conditional $\tau = \{0.1, 0.5, 0.9\}$ quantile price surfaces for the purely spatial model $p = g(s) + \varepsilon$. Left displays: isotropic setting. Right displays: anisotropic setting. The plots include the contours of the rivers passing through Regensburg (thin blue lines) and the boundaries of the historic city center, of which the ancient city walls are marked in green (bold lines) as reference points. The axes are scaled in kilometers. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Table 2 Estimated least squares coefficients of baseline multiple linear regression (mlr1) and multiple linear regression with district fixed effects (mlr2).

                          mlr1                              mlr2
                          Est.      Std. Dev.  p-Value      Est.      Std. Dev.  p-Value
Intercept                 464.099   434.610    0.286        999.840   1545.285   0.518
size                      −0.051    0.004      0.000        −0.051    0.004      0.000
age                       −0.014    0.004      0.001        −0.014    0.004      0.002
ch                        2.090     0.296      0.000        1.956     0.298      0.000
bath                      0.887     0.610      0.146        0.866     0.634      0.173
ww                        1.013     0.253      0.000        0.974     0.253      0.000
green                     0.795     0.199      0.000        0.764     0.200      0.000
x/1000                    −0.158    0.053      0.003        −0.018    0.159      0.908
y/1000                    0.048     0.069      0.484        −0.166    0.252      0.509
eucl.dist.hist.in/1000    3.952     0.719      0.000        2.228     1.075      0.039
eucl.dist.hist.out/1000   −0.215    0.114      0.059        −0.210    0.195      0.284

Deviance explained        0.410                             0.434
SIC                       3732.590                          3813.522
Degrees of freedom        11                                28
Notes: Summary statistics for the district fixed effects are omitted for the sake of brevity.
also used in creating Fig. 1. Again, the estimated conditional quantile price surfaces under an isotropic assumption are composed of concentric circles enforced over the city region and seem to miss substantial features of the conditional quantile price surfaces. As it does not impose concentric circles artificially, an anisotropic specification appears to be more natural for investigating the conditional quantile price surface. Hence, we focus our attention on the anisotropic setting, where spatial association structures with varying subcenters become apparent across quantiles. The estimated conditional quantile price surfaces differ considerably across $\tau$, indicating that spatial association structures vary across the conditional price distribution. Evident are a historic city center effect and an east-west pattern. The level of the estimated conditional quantile price surfaces in the historic city center and the western parts of the city is generally above the level in the eastern parts of the city. A location in the historic city center has a more pronounced effect on rent for higher quantiles ($\tau = \{0.5, 0.9\}$). Overall, under an anisotropic assumption, the results from the quantile regressions with a purely spatial predictor provide evidence for varying spatial association structures across quantiles and, therefore, suggest that horizontal and vertical market segmentation effects are present. It is natural to wonder whether the previous evidence for market segmentation still emerges when controlling for structural characteristics. As stressed in Section 1 and the literature cited therein, this is a particularly challenging task due to nonlinear effects of housing characteristics on housing price and their potential variation across the conditional distribution of the housing price. Again, we start our exposition by discussing the results obtained by using traditional least squares based estimation of the conditional expectation.
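The restriction imposed by the isotropic setting can be made concrete with a toy sketch. An isotropic surface depends on location only through the distance to a reference point, so it returns the same value at every point on a circle around that reference and cannot represent an east-west pattern. The functional forms and numbers below are hypothetical and serve only to illustrate this point; they are not estimates from the Regensburg data.

```python
import math


def isotropic_surface(h, center):
    """Build g(x, y) = h(distance to center); level sets are concentric circles."""
    def g(x, y):
        return h(math.hypot(x - center[0], y - center[1]))
    return g


def anisotropic_surface(h, center, east_gradient):
    """Same distance decay plus a linear east-west component in x."""
    def g(x, y):
        return h(math.hypot(x - center[0], y - center[1])) + east_gradient * x
    return g


def decay(d):
    """Hypothetical distance decay: rent falls by 1 unit per km from the center."""
    return 12.0 - 0.001 * d


g_iso = isotropic_surface(decay, center=(0.0, 0.0))
g_aniso = anisotropic_surface(decay, center=(0.0, 0.0), east_gradient=-0.0005)

# Points 1 km west, east and north of the center.
west, east, north = (-1000.0, 0.0), (1000.0, 0.0), (0.0, 1000.0)
iso_values = [g_iso(*p) for p in (west, east, north)]      # identical by construction
aniso_values = [g_aniso(*p) for p in (west, east, north)]  # west above east
```

In the anisotropic toy surface the western point receives a higher value than the eastern point at the same distance, which is the kind of pattern an isotropic specification rules out by construction.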
We consider a model containing the structural characteristics size, age and the apartment quality indicating dummy variables, as well as the locational information longitude, latitude and the conditional euclidean distance to historic city center boundary covariates as explanatory variables. The resulting baseline MLR is denoted by (mlr1). Alternatively we also estimate a model (mlr2), which additionally includes fixed effects for the districts displayed in Fig. 2. Table 2 presents estimated regression coefficients, the corresponding standard errors and p-values, as well as selected measures on model fit and model complexity for both MLRs. With the exception of bath, y/1000, and eucl.dist.hist.out, all estimated coefficients are significant at any reasonable significance level for the baseline MLR and have the expected signs. The estimated coefficient for x/1000 is negative, suggesting that net rent per square meter decreases (on average) with an apartment being located further east (when everything else is held constant). This seems reasonable due to single and semidetached
houses dominating the real estate landscape in the western part of Regensburg while the eastern parts are less loosely covered, predominantly with multistory houses. The distances to the historic city center boundary have a positive effect on net rent per square meter if an observation is located within the historic city center and exert a negative effect otherwise. This implies that for apartments located inside the historic city center, net rent per square meter increases when moving from the historic city boundary where arterial roads are located to the pedestrian zone in the middle of the historic city center which includes the main tourist attractions and shopping facilities. For apartments located outside the historic city center, net rent per square meter decreases with increasing distance to the historic city center boundary. When including district fixed effects into the MLR, some of the relevant location characteristics are captured by the district fixed effects and the fit increases as expected. However, the loss in degrees of freedom is substantial when incorporating district fixed effects and the Bayesian information criterion (SIC), balancing the trade-off between model fit and model complexity, increases. We abstain from further attempts to improve the fixed effects model by introducing interactions (see the discussion in McMillen and Redfearn, 2010) and proceed by restricting our attention to the baseline MLR. In Fig. 4, we examine the baseline MLR with generalized partial residual plots for evidence of nonlinearities. Generalized partial residual plots provide visualization of residuals with respect to certain covariates and are obtained as follows: First, the centered outcome variable is regressed on all centered continuous predictors and the categorical predictors. Second, the estimated effect of the predictor of interest is added to the residuals of the regression from step one. 
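The two steps just described can be sketched in a few lines. This is a minimal illustration with made-up data, not the authors' code: all predictors are centered (including any dummies, which is an assumption of this sketch), the regression is solved through the normal equations, and the helper names are hypothetical.

```python
def center(values):
    """Subtract the sample mean from every element."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def partial_residuals(y, columns, j):
    """Step 1: OLS of centered y on centered predictors.
    Step 2: add the estimated effect of predictor j back to the residuals."""
    yc = center(y)
    xc = [center(col) for col in columns]
    k, n = len(xc), len(y)
    xtx = [[sum(u * v for u, v in zip(xc[r], xc[c])) for c in range(k)] for r in range(k)]
    xty = [sum(u * v for u, v in zip(xc[r], yc)) for r in range(k)]
    beta = solve(xtx, xty)
    resid = [yc[i] - sum(beta[c] * xc[c][i] for c in range(k)) for i in range(n)]
    return [resid[i] + beta[j] * xc[j][i] for i in range(n)], beta


# Toy data with an exactly linear relation y = 2 * x1 - 3 * x2 + 5.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [2.0 * a - 3.0 * b + 5.0 for a, b in zip(x1, x2)]
pres, beta_hat = partial_residuals(y, [x1, x2], j=0)
```

With exactly linear toy data, the partial residuals for the first predictor fall on a straight line; curvature in such a plot for real data is what signals a nonlinear effect.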
These so-called partial residuals are then plotted versus the respective predictors and a scatterplot smoother is fitted (see e.g. Tsai et al., 1998). A scatterplot smoother based on local polynomial regression (solid black line) and a straight line (red dashed line) are fit to the partial residuals. The plots indicate the presence of pronounced nonlinearities in size and weaker nonlinear effects in all other continuous predictors. The figure also reveals heteroscedasticity in that the variation in the dependent variable decreases as apartment size increases. Fig. 5 shows further regression diagnostics obtained by performing linear quantile regressions of net rent per square meter on the predictors contained in the baseline MLR over a grid of quantiles $\tau$. The plots present the conditional mean effects as solid red lines (with dashed one standard error bands) and the conditional quantile effects as dashed black lines interrupted by points, which mark the grid points of $\tau$. The surrounding grey shades represent the 0.95 confidence bands for the conditional quantile effects; for details on
Fig. 4. Partial residual plots based on least squares estimation of the baseline multiple linear regression mlr1. Axes displaying distance information are scaled in kilometers. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.)
the calculation of the confidence bands, see Koenker (2011). Fig. 5 unveils quantile effects in all continuous covariates, as the estimated effects of the covariates change markedly over $\tau$. The detected nonlinearities suggest employing a generalized additive model for the conditional mean estimation, in which all continuous housing characteristics are allowed to enter in a flexible nonparametric fashion. We utilize univariate smoothing splines to model all continuous housing quality-inducing variables and the Euclidean distance conditional on an apartment being located within or outside the historic city center. Categorical covariates are included in the estimations via the usual dummy variable approach. Again, we consider an isotropic and an anisotropic version of Eq. (2): in the isotropic case, a univariate smoothing spline of Euclidean distance from the city center is used to capture the spatial effect, while the anisotropic version of the model employs a bivariate smoothing spline of longitude and latitude. For both models, Table 3 shows the coefficient estimates, standard errors and p-values for all parametric effects. For the effects represented by univariate and bivariate smoothing splines, it reports the estimated degrees of freedom (edf), the number of columns of the model matrix corresponding to the respective term (Ref.df), and the p-values obtained via approximate significance tests (details are provided in Wood, 2013); see footnote 4. The results indicate that, similar to the MLRs, the estimated coefficient for bath is not significant for any specification at any reasonable significance level, while all other estimated coefficients of indicator variables for housing quality are. For the smooth effects of the isotropic and
4 Note that an estimated degrees of freedom equal to 1 for univariate nonparametric components corresponds to a linear fit in the respective covariate dimension, while large values are equivalent to a rougher fit. For bivariate nonparametric components, a value of 2 corresponds to a linear location manifold and larger values signify a rougher fitted spatial manifold.
the anisotropic model, the p-values of tests for approximate significance suggest insignificance at any reasonable level only for the univariate smoothing spline of eucl.dist.hist.out in the anisotropic specification. The anisotropic specification consumes slightly fewer degrees of freedom compared to the isotropic version of the model. Making use of almost 19 degrees of freedom, the anisotropic version of Eq. (2) poses more severe requirements in terms of necessary observations on the data than the MLR without district fixed effects (11 degrees of freedom as stated in Table 2). However, the degrees of freedom spent are markedly lower than the ones consumed by the MLR with district fixed effects (28 degrees of freedom as shown in Table 2). Furthermore, the SIC of the anisotropic model is well below that of the MLRs considered in Table 2. Fig. 6 displays the univariate smoothing spline fits under isotropic and anisotropic assumptions, where the labels s(·) on the vertical axes provide details on the predictors to which the nonparametric effects are fitted and the estimated degrees of freedom of the respective term. The estimated degrees of freedom approximate the degrees of freedom spent on the respective effects. Fig. 6 suggests that modeling only size and age with smoothing splines may be sufficient: the regression function is penalized such that the splines representing the conditional distance to historic city center boundary variables (eucl.dist.hist.in and eucl.dist.hist.out) are straight lines. Fig. 7 illustrates conditional mean price surfaces based on an isotropic and an anisotropic setting. The displays differ from the purely spatial model in Fig. 1 because the structural characteristics are included besides the bivariate smoothing spline in longitude and latitude. The surfaces shown in Fig. 7 are fitted by smoothing over the same fine grid of location covariates already used to create the estimated surfaces for the purely spatial model displayed in Fig. 1.
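The degrees of freedom quoted above can be cross-checked against the entries of Table 3. The short sketch below rests on our reading, which the text does not state explicitly, that the total reported for each model equals the number of parametric coefficients (intercept plus four dummies) plus the sum of the estimated degrees of freedom of the smooth terms.

```python
# Parametric terms in both models: Intercept, ch, bath, ww, green.
PARAMETRIC_TERMS = 5

# Estimated degrees of freedom (edf) of the smooth terms, copied from Table 3.
edf = {
    "gam.iso": [6.684, 3.761, 0.981, 0.854, 2.353],
    "gam.anis": [6.916, 3.815, 0.923, 0.418, 1.794],
}

# Totals reported in the "Degrees of freedom" row of Table 3.
reported = {"gam.iso": 19.633, "gam.anis": 18.867}

total = {model: PARAMETRIC_TERMS + sum(values) for model, values in edf.items()}
gap = {model: abs(total[model] - reported[model]) for model in edf}

for model in edf:
    print(model, round(total[model], 3), reported[model])
```

Under this reading the recomputed totals agree with the reported ones up to rounding in the third decimal, which supports comparing them directly with the 11 and 28 degrees of freedom of the two MLRs.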
Additionally, corresponding conditional euclidean distances to the historic city center boundary are calculated for all grid points. In computing the fitted price surfaces, age and
Fig. 5. Diagnostic plot to check for quantile effects (ordinate) based on linear quantile regression of net rent per square meter on all predictors contained in the baseline multiple linear regression mlr1 over a grid of t (abscissa). Axes displaying distance information are scaled in kilometers. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.)
size are set to their respective median values, while all dummy covariates are set to zero. For the isotropic specification, circular patterns still appear in the estimated surface, though the effects are mitigated to a certain extent compared to the purely spatial model. The reason the effect is diminished stems from the variables
creating the aggregated spatial pattern originating from different locations and descending in different shapes: eucl.dist.hist.in and eucl.dist.hist.out emanate from the historic city center boundary and create roughly elliptical patterns while eucl.dist.c descends from the cathedral via concentric circles. Furthermore, the
Table 3 Estimated coefficients of parametric effects, selected statistics to illustrate smooth terms and descriptives on fit-complexity trade-off for analyzing the conditional expectation of isotropic (gam.iso) and anisotropic (gam.anis) versions of Eq. (2).

                        gam.iso                           gam.anis
                        Est.      Std. Err.  p-Value      Est.      Std. Err.  p-Value
Intercept               8.586     0.214      0.000        8.587     0.213      0.000
ch                      1.782     0.280      0.000        1.810     0.278      0.000
bath                    0.364     0.576      0.527        0.291     0.569      0.610
ww                      0.876     0.238      0.000        0.855     0.236      0.000
green                   0.748     0.184      0.000        0.723     0.184      0.000

                        edf       Ref.df     p-Value      edf       Ref.df     p-Value
s(size)                 6.684     9.000      0.000        6.916     9.000      0.000
s(age)                  3.761     9.000      0.000        3.815     9.000      0.000
s(eucl.dist.hist.in)    0.981     9.000      0.001        0.923     9.000      0.000
s(eucl.dist.hist.out)   0.854     9.000      0.003        0.418     9.000      0.181
s(eucl.dist.c)          2.353     9.000      0.012        –         –          –
s(x, y)                 –         –          –            1.794     29.000     0.000

Deviance explained      0.499                             0.504
SIC                     3682.942                          3650.077
Degrees of freedom      19.633                            18.867
Notes: The s(x, y) signifies the bivariate smoothed fit while the s( • ) represent the univariate smoothed components.
Fig. 6. Estimated marginal effects of univariate smoothing spline terms in conditional mean analysis of isotropic and anisotropic versions of Eq. (2). The y-axes indicate fitted values, the x-axes represent the respective covariates. Left display: isotropic setting. Right display: anisotropic setting. Axes displaying distance information are scaled in kilometers.
cathedral is not located exactly in the middle of the historic city center. In the anisotropic case, the spatial pattern is comprised of the flexible bivariate smoothing spline of longitude and latitude plus the distance to historic city center boundary effect, which expands elliptically over the city region. Both plots reveal a pronounced impact of a location within the historic city center. Beyond this effect, the anisotropic specification again indicates that the Regensburg rental data contain an east-west pattern, with net rent per square meter being higher in the western parts of the city compared to the eastern
parts. The isotropic specification is not capable of revealing this pattern due to its restricted flexibility when compared to the bivariate anisotropic counterpart. Beyond the east-west pattern and the historic city center effect, the spatial association structures that show up in Fig. 1 disappear when we control for structural characteristics and extend the set of location covariates. As noted before, the anisotropic specification seems to be more appropriate due to its higher flexibility and, therefore, better able to capture complex spatial association structures.
Fig. 7. Estimated conditional mean price surface for spatial pattern of isotropic and anisotropic versions of Eq. (2). Left display: isotropic setting. Right display: anisotropic setting. The plot includes the contours of the rivers passing through Regensburg (thin blue lines) and the boundaries of the historic city center, of which the ancient city walls are marked in green (bold lines) as reference points. The axes are scaled in kilometers. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 8. Estimated conditional $\tau = \{0.1, 0.5, 0.9\}$ quantile price surfaces for spatial pattern of isotropic and anisotropic version of Eq. (2). Left displays: isotropic setting. Right displays: anisotropic setting. The plots include the contours of the rivers passing through Regensburg (thin blue lines) and the boundaries of the historic city center, of which the ancient city walls are marked in green (bold lines) as reference points. The axes are scaled in kilometers. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
In Fig. 8, we sustain this extended flexible regression setup and return to investigating both horizontal and vertical market segmentation by performing quantile regressions. We embed the isotropic and anisotropic versions of Eq. (2) into the quantile regression framework and solve the penalized quantile regression objective function denoted in Eq. (3) for the resulting expressions. In the isotropic setting, all nonparametric terms are included as univariate quantile smoothing splines $l_{1\tau}$ and $g_{\tau}$; under an anisotropic assumption, the bivariate effect of longitude and latitude is represented by a triogram function $g_{\tau}$. Fig. 8 shows estimated conditional quantile price surfaces for both cases. The surfaces are created by fitting the estimated quantile regression models on the exact same fine grid of location characteristics that is used to generate the estimated surfaces in Fig. 7. As before, age and size are set to their respective medians and all dummy variables are set to zero. In the isotropic setting, the resulting estimated conditional quantile price surfaces are again comprised of an elliptical texture descending from the historic city center boundary and a concentric circle-shaped pattern originating from the historic city center. In the anisotropic case, the elliptical texture and bivariate flexibility in longitude and latitude make up the spatial effect. Fig. 8 indicates that estimation based on an isotropic assumption is able to pick up the historic city center effect, but enforces artificial concentric patterns on the estimated surfaces. Across all quantiles, this leads to the isotropic models completely missing the general east-west pattern in Regensburg. Analogous to the purely spatial model, an anisotropic setting seems to be more appropriate.
The east-west pattern is economically plausible for the reasons discussed above: the western parts of the city contain parks and recreational areas, with single and semidetached houses being the dominating type of real estate, whereas the eastern parts of Regensburg include the harbor and industry and are more densely built up than the western parts of the city, with multistory houses being the predominant building structures. The varying texture of the estimated conditional quantile price surfaces over $\tau$ under an anisotropic assumption suggests that both horizontal and vertical market segmentation effects are present, even after controlling for structural characteristics. We will investigate the east-west pattern observed in Fig. 8 in greater detail. Under an anisotropic assumption, the spatial pattern $g(s)$ consists of a bivariate effect of longitude and latitude $s(x, y)$ and a univariate distance to historic city center boundary effect $s_{in} = s(\text{dist.hist.in})$ or $s_{out} = s(\text{dist.hist.out})$, depending on whether an apartment is located within or outside the historic city center. Let us consider Eq. (2) for two different spatial points $s_1$ and $s_2$ with identical structural characteristics, the first located within, the second outside the historic city center walls. Taking conditional expectations and calculating the difference $E(p \mid s = s_1) - E(p \mid s = s_2) = g(s_1) - g(s_2) = s(x_1, y_1) + s_{in,1} - \left(s(x_2, y_2) + s_{out,2}\right)$ gives us the expected price difference between an apartment at location $s_1$ and an apartment at location $s_2$. The same can be done for conditional quantiles of interest. More generally, we can generate maps on a spatial grid of interest based on the estimated spatial effects $\hat{s} = \hat{s}(x, y) + \hat{s}_{in}$ for apartments within the historic city center or $\hat{s} = \hat{s}(x, y) + \hat{s}_{out}$ for apartments outside the historic city center.
For our purpose of investigating the spatial pattern in Regensburg under an anisotropic assumption more thoroughly, we select, by way of example, one point in the middle of the historic city center and two points approximately 0.3 km west and east of its boundary, respectively. Fig. 9 shows the three selected points. We calculate the estimated spatial effect $\hat{s}$ for each of the three points using the $\tau = \{0.1, 0.5, 0.9\}$ quantile estimates of Eq. (2) displayed and discussed in Fig. 8.
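The decomposition of the spatial effect and the resulting price differences between locations can be reproduced from the components reported in Table 4. The sketch below only adds up published numbers; the dictionary layout and function names are hypothetical, and sums may deviate from the reported totals in the third decimal because the components are rounded.

```python
TAUS = (0.1, 0.5, 0.9)

# Per point and quantile: (bivariate effect s(x, y), boundary distance effect),
# where the second entry is s_out for the western and eastern points and
# s_in for the middle point. Values copied from Table 4.
components = {
    "west": {0.1: (0.261, 0.391), 0.5: (0.149, -0.029), 0.9: (0.160, -0.235)},
    "middle": {0.1: (0.007, 2.912), 0.5: (-0.134, 1.683), 0.9: (-0.025, 1.207)},
    "east": {0.1: (-0.240, 0.401), 0.5: (-1.098, -0.030), 0.9: (-0.206, -0.241)},
}


def spatial_effect(point, tau):
    """Estimated spatial effect: bivariate effect plus boundary distance effect."""
    s_xy, s_dist = components[point][tau]
    return s_xy + s_dist


def premium_over(reference, other, tau):
    """Difference in predicted net rent per square meter between two locations
    with identical structural characteristics (in German Marks)."""
    return spatial_effect(reference, tau) - spatial_effect(other, tau)


for tau in TAUS:
    print(
        tau,
        round(premium_over("middle", "west", tau), 3),
        round(premium_over("middle", "east", tau), 3),
    )
```

Because the structural characteristics are held fixed, the difference between two locations depends only on the spatial terms, so it can be evaluated separately for each quantile.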
Fig. 9. Three locations selected for illustration of the east-west and the historic city center boundary effect under an anisotropic assumption in Regensburg. The plot includes the contours of the rivers passing through Regensburg (thin blue lines), the highways (grey lines), the railway system (black lines) and the boundaries of the historic city center, of which the ancient city walls are marked in green (bold lines) as reference points. The axes are scaled in kilometers. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
The results are summarized in Table 4. As in the previous analysis, we observe an east-west pattern, with $\hat{s}$ at the western point exceeding $\hat{s}$ at the eastern point, and an even more pronounced historic city center effect. At the middle point, the $\hat{s}_{in}$ effect dominates the spatial pattern $\hat{s}$. For the western and eastern points, the $\hat{s}(x, y)$ effect is the dominating part for the $\tau = 0.5$ quantile, while for the $\tau = \{0.1, 0.9\}$ quantiles, $\hat{s}_{out}$ slightly exceeds $\hat{s}(x, y)$ in absolute terms. At the three selected points, the spatial effect $\hat{s}$ induces an absolute variation in predicted net rent per square meter from middle to east and from middle to west of between 2.757 (middle to east for the $\tau = 0.1$ quantile) and 1.256 (middle to west for the $\tau = 0.9$ quantile) German Marks. Figs. 10 and 11 provide a visualization of the estimated marginal effects of the univariate quantile smoothing spline terms across $\tau = \{0.1, 0.5, 0.9\}$ in the isotropic and the anisotropic setting. The plots illustrate that estimated conditional quantile effects vary over $\tau$ for all covariates. For the quite disparate quantiles $\tau = 0.5$ and $\tau = 0.9$, we observe quantile crossing of the estimated conditional effects for the Euclidean distance to the city center (see Fig. 10). As theory prohibits crossing quantile curves, He (1997) suggests that crossing quantile curves obtained from performing unrestricted quantile regression may serve as indicators for misspecification of the quantile regression model. Besides the isotropic version of Eq. (2) not being able to capture the spatial association structures uncovered under an anisotropic assumption, the quantile crossing provides further evidence in favor of the anisotropic setting. Against this backdrop, we restrict our attention to the estimated marginal quantile effects based on the anisotropic setting and, hence, to a detailed interpretation of Fig. 11. The estimated marginal quantile effects of each of the covariates in Fig. 11 are measured by the slopes of the estimated quantile fits for the corresponding covariate values. The estimated marginal quantile effects of apartment size on net rent per square meter are negative (with the exception of the $\tau = 0.9$ quantile for apartments between
Table 4 Estimated spatial effects calculated at the points displayed in Fig. 9 from conditional $\tau = \{0.1, 0.5, 0.9\}$ quantile price surfaces for anisotropic version of Eq. (2).

                 Western point               Middle point                Eastern point
                 s(x, y)  s_out    s         s(x, y)  s_in     s         s(x, y)  s_out    s
$\tau$ = 0.1     0.261    0.391    0.652     0.007    2.912    2.919     −0.240   0.401    0.162
$\tau$ = 0.5     0.149    −0.029   0.120     −0.134   1.683    1.549     −1.098   −0.030   −1.128
$\tau$ = 0.9     0.160    −0.235   −0.075    −0.025   1.207    1.181     −0.206   −0.241   −0.447
35 and 50 m²) and differ markedly across quantiles. For $\tau = 0.9$ and $\tau = 0.5$, the decrease in net rent per square meter with increasing apartment size is rather steep until a size of around 35 m², while near the bottom of the conditional distribution of housing price ($\tau = 0.1$) the decrease is less severe and further diminishes at an apartment size of roughly 60 m². The overall shape of the effect is plausible. It can be commonly observed for university towns in Germany as a stylized fact of a separation into a student and a non-student real estate market. Depending on quality and location, the student real estate market is mainly focused on apartments below roughly 30–60 m². For the age of an apartment, the estimated marginal quantile effect on net rent is negative until apartments are about 27 ($\tau = 0.5$) or 35 years old ($\tau = 0.9$) and remains almost constant ($\tau = 0.5$) or slightly increases ($\tau = 0.9$) afterwards. This increase seems plausible as there is still a large stock of upper-mid- to high-quality rental units built for federal officers during the Weimar Republic. For the $\tau = 0.1$ quantile, the marginal effect of age is negative until around
Fig. 10. Estimated marginal quantile effects of nonparametric terms represented by univariate smoothing spline terms for $\tau = \{0.1, 0.5, 0.9\}$ quantiles based on the isotropic version of Eq. (2). Axes displaying distance information are scaled in kilometers.
Fig. 11. Estimated marginal quantile effects of nonparametric terms represented by univariate smoothing spline terms for $\tau = \{0.1, 0.5, 0.9\}$ quantiles based on the anisotropic version of Eq. (2). Axes displaying distance information are scaled in kilometers.
65 years and then becomes positive until an age of about 90 years, with the positive effect leveling off thereafter. Again, this seems plausible as there is still a large stock of rather low-priced rental units built for workers in the years preceding World War II. Concerning the general tendency of the variables measuring the Euclidean distance to the boundary of the historic city center given an observation is located outside the ancient city center, negative estimated marginal quantile effects are found for the displayed quantiles, while the magnitude of the effect decreases from higher to lower quantiles and is almost constant for $\tau = 0.1$. In line with economic reasoning, this indicates that for upscale apartments, a location within or close to the historic city center is an important amenity while for low price apartments, location is generally not that important. Moreover, a slightly positive estimated marginal effect of distance to the historic city center boundary is obtained for apartments within the UNESCO World Heritage Site, with the largest impact demonstrated for the $\tau = 0.1$ quantile at distances beyond 400 m and the $\tau = 0.9$ quantile at distances beyond around 300 m. Note, however, that if a low-price apartment ($\tau = 0.1$) is located within the historic city center boundary, proximity to the city center has a slightly larger positive estimated marginal effect on its net rent than for upscale apartments ($\tau = 0.9$). Again, this effect is economically plausible due to the fact that a location in the pedestrian zone in the middle of the historic city center is equivalent to proximity to the main shopping facilities and tourist attractions where traffic noise is low. In total, analyzing the estimated marginal quantile effects reveals further evidence for vertical market segmentation: different market mechanisms seem to be prevalent across different parts of the conditional distribution.
Taken together with the results derived from the estimated conditional quantile price surfaces, the proposed framework is able to unveil horizontal and vertical market segmentation effects in the Regensburg rental data and yields highly plausible results from an economic perspective. The estimated conditional interdecile ranges shown in Fig. 12 represent the difference between the estimated conditional $\tau = 0.9$ quantile price surface and the estimated conditional $\tau = 0.1$ quantile price surface. Areas with high values are areas in which differences between the respective surfaces are high. These differences are driven by high dispersion between the net rent per square meter of apartments possessing high net rent per square meter and those with low net rent per square meter, holding everything else constant, and may point to neighborhoods with mixed socioeconomic composition and/or areas in which buildings are refurbished in opposite cycles; opposite reconstruction cycles may lead to some apartments, which were not refurbished for quite some time, corresponding to the low price segment, while the recently refurbished apartments pertain to the high price segment. Alternatively, high dispersion between estimated quantile price surfaces may indicate areas in which structural amenities and disamenities are close-by. For example, a building may be located in between an arterial road and a park, with some apartments being seriously affected by traffic noise and others being able to enjoy the view of the park. The estimated conditional interdecile price surface ranges once again highlight the differences in the isotropic and the anisotropic setting and illustrate the limits of an isotropic specification: since the estimated conditional $\tau = 0.9$ and $\tau = 0.1$ quantile price surfaces under an isotropic assumption consist of concentric circles (besides a less pronounced elliptical formation which mitigates the circular pattern to a certain extent), the interdecile range is also made up of circular patterns. By contrast, an anisotropic setting is less restrictive. The anisotropic specification reveals that estimated differences between the $\tau = 0.9$ and $\tau = 0.1$ conditional quantile price surfaces are high in the area immediately around the historic city center boundary and in an area in between two ellipses around 1.5 km from the boundary which extends from the southwest to the east.
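The interdecile range surface discussed above is simply the cell-by-cell difference of two fitted quantile surfaces evaluated on the same grid. The sketch below shows this computation on toy grids; the values are invented, and the additional check for negative ranges (cells where the fitted 0.9 surface falls below the fitted 0.1 surface, that is, quantile crossing) is an illustrative extra rather than a step described in the text.

```python
def interdecile_range(q_low, q_high):
    """Cell-wise difference between the 0.9 and 0.1 quantile surfaces."""
    return [
        [hi - lo for lo, hi in zip(row_lo, row_hi)]
        for row_lo, row_hi in zip(q_low, q_high)
    ]


def crossing_cells(idr):
    """Grid cells with a negative interdecile range (fitted quantiles cross)."""
    return [
        (i, j)
        for i, row in enumerate(idr)
        for j, value in enumerate(row)
        if value < 0
    ]


# Toy 2 x 2 grids of predicted net rent per square meter (illustrative values).
q10 = [[8.0, 9.0], [7.5, 10.0]]
q90 = [[12.0, 11.0], [9.0, 9.5]]

idr = interdecile_range(q10, q90)
crossed = crossing_cells(idr)
```

Large positive cells correspond to the areas of high dispersion described in the text, while any cell flagged by `crossing_cells` would call for a closer look at the specification.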
Around the historic city center boundary, arterial roads connect the city center with the highways and state roads and may constitute a disamenity for some apartments, while other apartments benefit from facing the parks located at the historic city center boundary. About 1.5 km south of the historic city center boundary, a main highway connecting western Germany with the southeastern region passes through Regensburg and may negatively affect the apartments facing the highway while not affecting others. Low interdecile ranges are found in the pedestrian zone within the historic city center, the outskirts of Regensburg and in an area between two ellipses around 1.5 km from the boundary which extends from the west to the northeast. The low interdecile ranges
Fig. 12. Estimated conditional interdecile price surface range for isotropic and anisotropic version of Eq. (2). Left display: isotropic setting. Right display: anisotropic setting. The plot includes the contours of the rivers passing through Regensburg (thin blue lines) and the boundaries of the historic city center, of which the ancient city walls are marked in green (bold lines) as reference points. The axes are scaled in kilometers. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
found again suggest that a location in the pedestrian zone of the historic city center and the western parts of the city seems to be a clear amenity causing net rent per square meter of apartments to vary only slightly when controlling for structural characteristics. We abstain from deriving statements about the other areas mentioned, since they are mostly based on quite sparse data.
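As a minimal illustration of the interdecile range used above, the following Python sketch computes the difference between the 0.9 and the 0.1 quantile of rents within each cell of a spatial grid. It is a crude binned, unconditional stand-in and not the semiparametric quantile regression estimated in this paper: it does not control for structural characteristics, and the function names, the 1 km cell size and the synthetic data are assumptions made purely for illustration.

```python
import random
from collections import defaultdict


def quantile(values, tau):
    """Empirical tau-quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = tau * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


def interdecile_range_by_cell(observations, cell_km=1.0):
    """Map each grid cell to q(0.9) - q(0.1) of the rents observed in it.

    observations: iterable of (x_km, y_km, rent_per_sqm) tuples.
    """
    cells = defaultdict(list)
    for x, y, rent in observations:
        cells[(int(x // cell_km), int(y // cell_km))].append(rent)
    return {
        cell: quantile(rents, 0.9) - quantile(rents, 0.1)
        for cell, rents in cells.items()
    }


# Synthetic data (not the Regensburg sample): rents in the western cell are
# tightly clustered, rents in the eastern cell are widely dispersed.
rng = random.Random(0)
homogeneous = [(0.5, 0.5, 8.0 + rng.uniform(-0.2, 0.2)) for _ in range(200)]
mixed = [(1.5, 0.5, 8.0 + rng.uniform(-2.0, 2.0)) for _ in range(200)]
ranges = interdecile_range_by_cell(homogeneous + mixed)
```

In this toy setting, the cell with widely dispersed rents receives the larger interdecile range, which mirrors the reading of high-value areas in Fig. 12 as areas where the upper and lower price segments lie far apart.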
4. Summary
By incorporating a semiparametric specification into the quantile regression framework, we simultaneously address horizontal and vertical market segmentation, which are crucial drivers of the complex mechanisms generating housing prices. The benefit of employing semiparametric models within the quantile regression framework lies in their ability to account for nonlinear covariate effects flexibly – yet circumventing the curse of dimensionality of fully nonparametric methods by imposing relatively mild structural assumptions. McMillen and Redfearn (2010), for example, impose an additive model structure, in which the estimated coefficients of some selected (or alternatively all) covariates depend on location. Their CPAR approach allows regression coefficients (i.e., hedonic shadow prices) to vary gradually over space and consequently, to interact with location based on a priori assumptions about the weighting matrix and the data points to be included in the estimation. We also assume additivity in our semiparametric quantile regression model, but bypass the need to impose a weighting scheme by presuming that the housing price generating mechanisms can be decomposed into global structural characteristics (i.e., not varying over space) and a spatial component (varying over space). Since in practice, it is virtually impossible to encompass all relevant characteristics of an apartment in housing price analysis, the spatial component not only captures the pure effect of location, but also picks up the effect of omitted physical characteristics clustered together (e.g., neighborhoods with extensive well-tended public green spaces, low crime rates or other structural amenities or disamenities). Depending on the chosen path, the spatially varying estimated coefficients (for the
CPAR approach) or the estimated spatial effect absorb the impact of omitted structural characteristics. Contrasting the influence of omitted variables and possible misspecification in the weighting scheme on the estimated hedonic shadow prices and the performance of the CPAR approach compared to our additive semiparametric model might be an interesting question for future research, but is beyond the scope of this paper.
In our empirical application, we illustrate the proposed technique to account for horizontal and vertical market segmentation simultaneously on a representative housing data set from the German city of Regensburg, whose historic city center (Old Town) is on the UNESCO World Heritage List. We provide diagnostics that hint at nonlinearities being present in almost all continuous predictors and at the presence of quantile effects. Subsequently, we investigate whether an isotropic or an anisotropic setting seems to be more appropriate based on a model incorporating a spatial effect only and a model which contains further explanatory variables. In the purely spatial model, the spatial effect is represented by Euclidean distance to the city center under an isotropic assumption – effectively restricting the spatial formation to be composed of concentric circles originating from the historic city center. Under an anisotropic assumption, a flexible bivariate function of longitude and latitude is used to encompass the spatial pattern. Comparing conditional quantile price surfaces for both versions of the purely spatial model, we find evidence for varying spatial subcenters over quantiles (i.e., horizontal and vertical market segmentation effects) under the more flexible anisotropic assumption. 
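The restriction implied by the isotropic assumption can be made concrete with a small Python sketch. The two functional forms below are invented for illustration only (they are not the spline estimates of this paper), and the center coordinates, the decay rate and the east-west gradient are hypothetical. The point is merely that a function of Euclidean distance alone assigns the same value to all locations on a circle around the center, whereas a bivariate function of the coordinates need not.

```python
import math

CENTER = (0.0, 0.0)  # hypothetical city-center coordinates in km


def isotropic_effect(x, y):
    """Spatial effect that depends on location only through distance to the center."""
    distance = math.hypot(x - CENTER[0], y - CENTER[1])
    return 2.0 * math.exp(-distance)


def anisotropic_effect(x, y):
    """Spatial effect that is a bivariate function of both coordinates.

    Adds a hypothetical east-west gradient (higher in the west, x < 0).
    """
    return isotropic_effect(x, y) - 0.5 * x


# Three locations, each exactly 1 km from the center.
west, east, north = (-1.0, 0.0), (1.0, 0.0), (0.0, 1.0)
iso_values = [isotropic_effect(*point) for point in (west, east, north)]
aniso_values = [anisotropic_effect(*point) for point in (west, east, north)]
```

Here all three isotropic values coincide, while the anisotropic values differ between west and east. An east-west pattern of this kind therefore cannot be represented under the isotropic assumption, whatever the shape of the distance function.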
The common formation across all conditional quantile price surfaces considered in the anisotropic setting is an east-west pattern and a historic city center effect: net rent per square meter is higher in the historic city center and the western parts of the city compared to the eastern parts. While the isotropic setup picks up the historic city center effect, it is not able to identify the complex spatial association structures. These results are confirmed when incorporating structural predictors into the model and extending the set of locational predictors by the conditional distance to the historic city center boundary: the more flexible anisotropic setup yields more compelling results, as it again identifies an east-west pattern and a
historic city center effect. Furthermore, quantile crossing is detected in the estimated marginal quantile effects of the isotropic setting. He (1997) proposes that quantile crossing obtained from an unconstrained quantile regression may be suggestive of misspecification of the regression model. In the anisotropic setting, no quantile crossing occurs. Furthermore, the variation of the estimated marginal quantile effects of the predictors over different quantiles suggests that vertical market segmentation effects are present in the Regensburg rental data – even when accounting for spatial association structures. Taken together, the empirical evidence emphasizes two main points. First, an anisotropic setting seems to be more appropriate to capture the spatial association structures contained in the Regensburg rental data, as compared to an isotropic setting. Second, employing the anisotropic version of the semiparametric model within a quantile regression framework hints at both horizontal and vertical market segmentation effects being prevalent in the Regensburg rental data. Therefore, the findings obtained by applying the proposed framework suggest that complex spatial association structures and varying market mechanisms are prevalent across different parts of the conditional price distribution.
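A simple way to screen for quantile crossing is sketched below in Python. It checks, at each evaluation point, whether the fitted value for a lower quantile exceeds the fitted value for the next higher quantile. The function name, the tolerance and the numbers are hypothetical, and the check is applied to generic fitted values rather than to the estimated marginal quantile effects examined in this paper.

```python
def find_quantile_crossings(fitted, tolerance=1e-12):
    """Return (index, lower_tau, upper_tau) triples where quantile fits cross.

    fitted: dict mapping each quantile level tau to a list of fitted values,
    one per evaluation point (all lists must have the same length).
    """
    taus = sorted(fitted)
    crossings = []
    for lower_tau, upper_tau in zip(taus, taus[1:]):
        pairs = zip(fitted[lower_tau], fitted[upper_tau])
        for index, (low, high) in enumerate(pairs):
            # A crossing occurs when the lower quantile lies above the higher one.
            if low > high + tolerance:
                crossings.append((index, lower_tau, upper_tau))
    return crossings


# Hypothetical fitted values at three evaluation points.
no_crossing = {0.1: [6.0, 6.5, 7.0], 0.5: [7.0, 7.4, 8.1], 0.9: [8.5, 8.6, 9.9]}
with_crossing = {0.1: [6.0, 7.6, 7.0], 0.5: [7.0, 7.4, 8.1], 0.9: [8.5, 8.6, 9.9]}

clean = find_quantile_crossings(no_crossing)
flagged = find_quantile_crossings(with_crossing)
```

An empty result means the fitted quantiles are ordered at every evaluation point. A non-empty result lists the points and quantile pairs that would merit a closer look at the model specification.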
References
Anglin, P.M., Gencay, R., 1996. Semiparametric estimation of a hedonic price function. J. Appl. Econ. 11, 633–648.
Arguea, N.M., Hsiao, C., 1993. Econometric issues of estimating hedonic price functions. J. Econ. 56, 243–267.
Arnold, V.I., 1957. On functions of three variables. Proceedings of the USSR Academy of Sciences 114. pp. 679–681. English translation: Am. Math. Soc. Transl. 28 (1963) 51–54.
Banerjee, S., Gelfand, A.E., Knight, J.R., Sirmans, C.F., 2004. Spatial modeling of house prices using normalized distance-weighted sums of stationary processes. J. Bus. Econ. Stat. 22, 206–213.
Basu, S., Thibodeau, T.G., 1998. Analysis of spatial autocorrelation in housing prices. J. Real Estate Financ. Econ. 17, 61–85.
Bivand, R., 2016a. maptools: Tools for Reading and Handling Spatial Objects. R package version 0.8-39. http://cran.r-project.org/web/packages/maptools/index.html.
Bivand, R., 2016b. rgeos: Interface to Geometry Engine - Open Source (GEOS). R package version 0.3-19. http://cran.r-project.org/web/packages/rgeos/index.html.
Cheshire, P., Sheppard, S., 1995. On the price of land and the value of amenities. Economica 62, 247–267.
Cheshire, P., Sheppard, S., 1998. Estimating the demand for housing, land, and neighbourhood characteristics. Oxf. Bull. Econ. Stat. 60, 357–382.
Clapp, J.M., Kim, H.J., Gelfand, A.E., 2002. Predicting spatial patterns of house prices using LPR and Bayesian smoothing. Real Estate Research 30, 505–532.
Cohen, J.P., Coughlin, C.C., 2008. Spatial hedonic models of airport noise, proximity, and housing prices. J. Reg. Sci. 48, 859–878.
Craig, S., Ng, P., 2001. Using quantile smoothing splines to identify employment subcenters in a multicentric urban area. J. Urban Econ. 49, 100–120.
Cressie, N., 1993. Statistics for Spatial Data. Wiley.
Cropper, M.L., Deck, L.B., McConnell, K.E., 1988. On the choice of functional form for hedonic price functions. Rev. Econ. Stat. 70, 668–675.
Dale-Johnson, D., 1982. An alternative approach to housing market segmentation using hedonic price data. J. Urban Econ. 11, 311–332.
Diewert, W.E., 2003. Hedonic regressions: a consumer theory approach. In: Feenstra, R.C. (Ed.), Scanner Data and Price Indexes, Studies in Income and Wealth. 64. pp. 317–348.
Dubin, R.A., 1988. Estimation of regression coefficients in the presence of spatially autocorrelated error terms. Rev. Econ. Stat. 70, 466–474.
Dubin, R.A., 1992. Spatial autocorrelation and neighborhood quality. Reg. Sci. Urban Econ. 22, 433–452.
Ekeland, I., Heckman, J.J., Nesheim, L., 2004. Identification and estimation of hedonic models: using all of the economics of the model to identify it. J. Polit. Econ. 112, 60–109.
Epple, D., 1987. Hedonic prices and implicit markets: estimating demand and supply functions for differentiated products. J. Polit. Econ. 95, 59–80.
Friedman, J.H., 1991. Multivariate adaptive regression splines (with discussion). Ann. Stat. 19, 1–141.
Gelfand, A.E., Kim, H.J., Sirmans, C.F., Banerjee, S., 2003. Spatial modeling of house prices using normalized distance-weighted sums of stationary processes. J. Am. Stat. Assoc. 98, 387–396.
Goodman, A.C., 1978. Hedonic prices, price indices and housing markets. J. Urban Econ. 5, 471–484.
Goodman, A.C., 1981. Housing submarkets within urban areas: definitions and evidence. J. Reg. Sci. 21, 175–185.
Green, R.K., Malpezzi, S., 2003. A primer on U.S. housing markets and housing policy. AREUEA Monograph Series 3.
Halvorsen, R., Pollakowski, H.O., 1981. Choice of functional form for hedonic price equations. J. Urban Econ. 10, 37–49.
Haupt, H., Kagerer, K., Schnurbus, J., 2011. Cross-validating fit and predictive accuracy of nonlinear quantile regressions. J. Appl. Stat. 38, 2939–2954.
Haupt, H., Schnurbus, J., Tschernig, R., 2010. On nonparametric estimation of a hedonic price function. J. Appl. Econ. 25, 894–901.
He, X., 1997. Quantile curves without crossing. Am. Stat. 51, 186–192.
He, X., Ng, P., 1999. Quantile splines with several covariates. J. Stat. Plann. Infer. 75, 343–352.
He, X., Ng, P., Portnoy, S., 1998. Bivariate quantile smoothing splines. J. R. Stat. Soc. B 60, 537–550.
Holly, S., Pesaran, M.H., Yamagata, T., 2011. The spatial and temporal diffusion of house prices in the UK. J. Urban Econ. 69, 2–23.
Kauermann, G., Haupt, H., Kaufmann, N., 2012. A Hitchhiker’s view on spatial statistics and spatial econometrics for lattice data. Stat. Model. 12, 419–440.
Koenker, R., 2005. Quantile Regression, Econometric Society Monographs 38. Cambridge University Press.
Koenker, R., 2011. Additive models for quantile regression: model selection and confidence bandaids. Braz. J. Probab. Stat. 25, 239–262.
Koenker, R., 2016. Quantile Regression. R package version 5.21.
Koenker, R., Bassett, G., 1978. Regression quantiles. Econometrica 46, 33–50.
Koenker, R., Mizera, I., 2004. Penalized triograms: total variation regularization for bivariate smoothing. J. R. Stat. Soc. B 66, 145–163.
Koenker, R., Ng, P., Portnoy, S., 1994. Quantile smoothing splines. Biometrika 81, 673–680.
Kolmogorov, A.N., 1956. On the representation of continuous functions of several variables by superpositions of continuous functions of a smaller number of variables. Proceedings of the USSR Academy of Sciences 108. pp. 179–182. English translation: Am. Math. Soc. Transl. 17 (1961) 369–373.
Machado, J., 1993. Robust model selection and M-Estimation. Economic Theory 9, 478–493.
Majumdar, A., Munneke, H.J., Gelfand, A.E., Banerjee, S., Sirmans, C.F., 2006. Gradients in spatial response surfaces with application to urban land values. J. Bus. Econ. Stat. 22, 206–213.
McMillen, D.P., 2008. Changes in the distribution of house prices over time: structural characteristics, neighborhood, or coefficients? J. Urban Econ. 64, 573–589.
McMillen, D.P., 2013. Quantile Regression for Spatial Data. Springer Briefs in Regional Science.
McMillen, D.P., 2015. Conditionally parametric quantile regression for spatial data: an analysis of land values in early nineteenth century Chicago. Reg. Sci. Urban Econ. 55, 28–38.
McMillen, D.P., Redfearn, C.L., 2010. Estimation and hypothesis testing for nonparametric hedonic house price functions. J. Reg. Sci. 50, 712–733.
Ng, P., Yan, Y.Y., 2008. Evaluation of human bioclimates using quantile regression: a China case study. Phys. Geogr. 29, 387–403.
Pace, R.K., Barry, K., Sirmans, C.F., 1998a. Spatial statistics and real estate. J. Real Estate Financ. Econ. 17, 5–13.
Pace, R.K., Barry, K., Clapp, J.M., Rodriquez, M., 1998b. Spatiotemporal autoregressive models of neighborhood effects. J. Real Estate Financ. Econ. 17, 15–33.
Parmeter, C.F., Henderson, D.J., Kumbhakar, S.C., 2007. Nonparametric estimation of a hedonic price function. J. Appl. Econ. 22, 695–699.
Parsons, G.R., 1990. Hedonic prices and public goods: an argument for weighting locational attributes in hedonic regressions by lot size. J. Urban Econ. 27, 308–321.
R Development Core Team, 2016. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Rosen, S., 1974. Hedonic prices and implicit markets: product differentiation in pure competition. J. Polit. Econ. 82, 34–55.
Sheppard, S., 1999. Hedonic analysis of housing markets. Applied Urban Economics. In: Cheshire, P. (Ed.), Handbook of Regional and Urban Economics. vol. 3. Elsevier, pp. 1595–1635.
Silver, M., Heravi, S., 2007. The difference between hedonic imputation indexes and time dummy hedonic indexes. J. Bus. Econ. Stat. 2, 239–246.
Soetaert, K., 2016. Plot3d: Plotting Multi-dimensional Data. R package version 1.1.
Stone, C.H., 1994. The use of polynomial splines and their tensor products in multivariate function estimation. Ann. Stat. 22, 118–171.
Tsai, C.L., Cai, Z., Wu, X., 1998. The examination of residual plots. Stat. Sin. 8, 445–465.
UNESCO World Heritage Centre, 2016. Old Town of Regensburg with Stadtamhof. (accessed 14.07.2016)
Wood, S.N., 2003. Thin plate regression splines. J. R. Stat. Soc. B 65, 95–114.
Wood, S.N., 2006. Generalized Additive Models: An Introduction With R. Chapman and Hall/CRC Press.
Wood, S.N., 2013. On p-values for smooth components of an extended generalized additive model. Biometrika 100, 221–228.
Wood, S.N., 2016. Mgcv: Mixed GAM Computation Vehicle With GCV/AIC/REML Smoothness Estimation. R package version 1.8-10.
Zhu, B., Fuess, R., Rottke, N.B., 2011. The predictive power of anisotropic spatial correlation modeling in housing prices. J. Real Estate Financ. Econ. 42, 542–565.