Accepted Manuscript
The Influence of House, Seller, and Locational Factors on the Probability of Sale
R. Kelley Pace, Shuang Zhu
PII: S1051-1377(18)30064-0
DOI: https://doi.org/10.1016/j.jhe.2018.09.009
Reference: YJHEC 1609
To appear in: Journal of Housing Economics
Received date: 21 March 2018
Accepted date: 20 September 2018
Please cite this article as: R. Kelley Pace, Shuang Zhu, The Influence of House, Seller, and Locational Factors on the Probability of Sale, Journal of Housing Economics (2018), doi: https://doi.org/10.1016/j.jhe.2018.09.009
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
The Influence of House, Seller, and Locational Factors on the Probability of Sale
R. Kelley Pace
LREC Endowed Chair of Real Estate
Department of Finance
E.J. Ourso College of Business Administration
Louisiana State University
Baton Rouge, LA 70803-6308
OFF: (225)-578-6256
[email protected]

and

Shuang Zhu
Assistant Professor
Department of Finance
Kansas State University
Manhattan, KS 66506
[email protected]

September 24, 2018

1 The authors would like to thank Geoffrey Turnbull and Zsuzsa Huszari for their insightful comments. We appreciate the helpful comments from other participants in the 2017 FSU-UF Critical Issues in Real Estate Symposium. All errors are our own.
Abstract

The ability to model the probability of individual house sales would be of benefit in a number of real estate contexts. The existing housing literature models the probability of a house sale using mainly property characteristics or macroeconomic variables. However, the use of only property characteristics typically yields a poor model fit. This study investigates the relative roles of property characteristics, seller mortgage origination variables, and locational factors in explaining the probability of individual house sales. Homeowner information, such as the time horizon for staying in the house and the stage in the life cycle, should have an important impact on house sale decisions. Although obtaining such information is typically difficult, mortgage choices made at loan origination may help reveal it. In addition, the measurable characteristics of neighbors and the geographic clustering of house sales, ceteris paribus, may aid in modeling house sales. Incorporating the property characteristics, mortgage origination variables, and locational factors increases the pseudo-R2 of a probit model from under 1 percent when using only property characteristics to over 23 percent when using property, mortgage, and spatial information. The results are consistent through a particularly large housing cycle in both Clark County (Las Vegas) in Nevada and Maricopa County in Arizona from 2006 to 2010, and for a variety of subsamples.
1 Introduction
Explaining whether an individual house sold or not over some time interval is of interest in at least three areas of real estate research. First, if houses sold are not a random sample from the population of all houses, sample selection bias could be an issue. Inferences drawn from the selected sample of sold houses could lead to biased parameter estimates and erroneous inferences about either the contribution of individual characteristics or the trajectory of real estate prices over time, such as measured by a transaction-price-based house price index (e.g., Gatzlaff and Haurin, 1997, 1998; Fisher et al., 2003).
Second, a house sale automatically results in the prepayment of the mortgage because of the due-on-sale clauses in mortgages. This form of mortgage termination competes with other mutually exclusive mortgage decisions such as default, refinancing, or payment (e.g., Kau and Keenan, 1995; Deng et al., 2000; Clapp et al., 2001; Ong, 2000). Therefore, factors that affect the sale of the house also affect these other decisions and thus the modeling of these decisions.
Third, housing transactions provide another measure of the state of the market. For example, the Pending Sales Index from the National Association of Realtors (NAR) is widely viewed as an indicator of housing market conditions. Like the MSA Median Price index from the NAR, this index is also not adjusted for variables that differ across markets. This suggests that the aggregate house sale probability for submarkets, after controlling for important factors, could also be used as an improved measure of real estate market liquidity.

Despite the importance of housing sales, papers that explicitly model the probability of house sales, especially at the individual property level, are rare (Johnson et al., 2007). At the aggregate level, the literature indicates that market conditions may influence housing market transactions (e.g., Qian, 2012; Fu and Qian, 2014; Chan, 2001). While macroeconomic conditions are important factors affecting house sales, this study focuses on the contribution of individual property and borrower characteristics to individual-level house sales.1
Property-level house sale models normally appear as the first-stage regression of the Heckman two-stage model in the house price index sample selection literature (e.g., Gatzlaff and Haurin, 1997, 1998; Fisher et al., 2003). In the residential real estate literature, house sale models rely mainly on housing variables. Typically, it has been difficult to obtain a good fit of the probability of a transaction as a function of housing variables. As the fit worsens, a probit or logit model will essentially default to showing that each property has a 1/n probability of sale, and this indicates either that sample selection is not a problem (e.g., Munneke and Slade, 2000; Jud and Seaks, 1994) or that the empirical approach taken is inadequate to address the issue.2

1 Our data are on the seller side. The current seller was previously a mortgage borrower when they purchased or refinanced the property some time in the past, so seller and borrower are used interchangeably in this paper.
To understand the low effect of property characteristics on the probability of a house sale, we introduce a simple theoretical model. This model shows that if potential buyers and sellers place similar values on characteristics such as living area or house age, property characteristics by themselves may not materially affect the probability of a sale.
Mortgage borrowers/homeowners make house sale decisions. Their characteristics, such as their time horizon for staying in the house or their life cycle stage, should play a role in the house sale decision (e.g., Ortalo-Magné and Rady, 2006).3 Although such information is typically difficult to obtain, it is reasonable to assume that mortgage choices made at loan origination should reflect borrower housing preferences (Chambers et al., 2009). Public mortgage records provide a convenient source of such origination variables. For example, public records contain information on whether individuals refinance. Given everything else is the same, individuals who expect to continue to stay in the same house are more likely to take advantage of market refinance opportunities than individuals who expect to move in the near future. Thus, a recently refinanced borrower is more likely to stay in the house for some time. Similarly, hybrid adjustable rate mortgages normally have a lower contract rate for initial payments, but also a higher interest rate risk in the long run. Individuals are more likely to select such products if they expect to stay in the house for a shorter time period, holding everything else constant. Therefore, financial choices made by homeowners reveal some information about their anticipated future tenure in the house.

2 A feature of the probit model is that for identification the variance of the disturbances is normalized to 1. This means that the error variance is always the same in probit, but that the fit shows up in the variance of the estimated index. A poor fit will result in parameter estimates close to zero and the variance of the estimated index will be low (Yatchew and Griliches, 1985). Therefore, omitted variables in probit, even if they are unrelated to the included variables, enter into the disturbance term prior to normalization, and after normalization result in a shrinkage of the parameter estimates on the included variables. Of course, if the omitted variables are related to the included variables, this leads to bias. This suggests that it will be helpful to augment property characteristic variables with other variables that affect the probability of sale in the first-stage probit regression.

3 In the intertemporal setting, macroeconomic variables are found to explain house sales over time (e.g., Fisher et al., 2003). Chan (2001) and Brueckner and Follain (1988) show that ARM-FRM choice is a function of residential mobility. Fortowsky et al. (2011) also document that mortgage types such as FRM or ARM help explain house tenure. Stanton and Wallace (1998) argue that mortgage choice with or without points helps reveal buyer type.

In addition, we investigate whether location matters in the house sales decision by explicitly modeling both the measurable characteristics of neighbors and the locationally related omitted variables (as captured by the geographic clustering of sales). Some of these factors have been discussed in the housing tenure literature. However, whether and to what extent these pre-determined variables could help explain individual house sales in a given time period is not yet clear.4 For example, although location is an important factor in housing decisions, because of the computational challenge, it was until recently thought almost impossible to implement a formal spatial probit model empirically (Pace and LeSage, 2016).
Using public transaction data for single family properties from Clark County in Nevada and Maricopa County in Arizona, this manuscript examines the performance of models of the probability of sale as a function of property characteristics, loan characteristics,5 and location information. In a cross sectional setting, we investigate whether and how these variables affect house sales. We find that mortgage variables greatly enhance the explanatory power of the probit model. For example, in year 2010, property characteristics alone do not perform well, with a pseudo-R2 of 0.42 percent, while loan characteristics alone give a pseudo-R2 of 17.79 percent. Using both property and loan variables yields a pseudo-R2 of 21.00 percent. Many of the inferences concerning the property characteristics change after adding the mortgage information. The results hold well for different time periods, different areas, and a variety of subsamples. The results also indicate that the coefficients of the house sale model vary from year to year.

4 Krupka (2008) finds that neighborhoods with mixed incomes tend to be less stable.

5 Loan characteristics are pre-determined variables from the loan origination of the previous borrower, who is the current seller.
Locational factors, as captured by both neighborhood information and a spatially dependent disturbance term, are also important in determining house sales. For example, in year 2006, for the loan variables only model, the pseudo-R2 increases from 14.01 percent for the iid model to 16.82 percent for the spatial error dependent model, and to 20.30 percent when including neighborhood information in the spatial model.
In summary, this study investigates at the individual property level the relative roles played by property characteristics, seller mortgage origination variables, and location information in determining cross sectional house sales. Our theoretical model helps explain why housing characteristics have little explanatory power. Overall, our results show that mortgage and location information together increase the fit of the house sale model, as measured by pseudo-R2, from 0.76 percent to 23.02 percent, which is an over 30-fold improvement over the housing-characteristics-only iid model.

This paper is organized as follows. In Section 2, we introduce a simple theoretical model of the probability of house transactions that incorporates property and seller/buyer characteristics, as well as a spatial econometric model that explicitly incorporates location related variables. In Section 3 we describe our data, gleaned from multiple public record sources, and we present the empirical results in Section 4. We summarize the key findings in Section 5.

2 Model
Section 2.1 introduces a simple model to theoretically illustrate the relative importance of housing and mortgage information in explaining house sales. Section 2.2 introduces the spatial econometric model, which formally models both the measurable neighbor information and the location related omitted variables.

2.1 Mortgage Choices and House Sales
In this section, we introduce a simple model that incorporates property, seller, and buyer characteristics.6 The model illustrates why housing characteristics may not aid much in the performance of models of housing transactions. Similarly, we explain why borrower characteristics can enhance the performance of models of the probability of a housing transaction.

6 The model assumes a cross sectional setting in a single area such as the same MSA, so that all observations have the same macroeconomic variables in a single year. This allows us to focus on the impact of the property-level variables.

In selecting the explanatory variables, we wish to ascertain the strength of various factors that theoretically could affect the probability that a property transacts ($y = 1$ if it sells, $y = 0$ if it does not) in a certain period. For such a transaction to occur, the seller must have a latent reservation price $y_s^*$ that is no higher than the buyer's latent reservation price $y_b^*$ as in (1). Alternatively, if the seller's latent reservation price exceeds the buyer's latent reservation price, the transaction does not occur ($y = 0$) as in (2).

$$y_b^* - y_s^* \geq 0 \rightarrow y = 1 \tag{1}$$

$$y_b^* - y_s^* < 0 \rightarrow y = 0 \tag{2}$$
The latent buyer and seller reservation prices in (3) and (4) depend upon the buyer-side characteristics $b$ and seller-side characteristics $s$, where $b$ and $s$ are row vectors and $\gamma$ and $\phi$ are conformable column vectors. There is a scalar buyer disturbance $\xi_b$ and a scalar seller disturbance $\xi_s$.

$$y_b^* = b\gamma + \xi_b \tag{3}$$

$$y_s^* = s\phi + \xi_s \tag{4}$$
Putting together (3) and (4) shows that the difference in reservation prices depends on the various characteristics ($b$ and $s$), their respective coefficients ($\gamma$ and $\phi$), and a random component ($\xi_b$ and $\xi_s$) as shown in (5). We can further simplify the model since the difference between two independent normal random variables is also a normal random variable (although with a different variance) as shown in (6). Alternatively, if the disturbances have the same variances and follow a Gumbel distribution, the difference between the two disturbances follows a logistic distribution. Substituting (6) in (5) leads to the simpler (7).

$$y_b^* - y_s^* = [b\gamma - s\phi] + [\xi_b - \xi_s] \tag{5}$$

$$\xi = \xi_b - \xi_s \tag{6}$$

$$y_b^* - y_s^* = [b\gamma - s\phi] + \xi \tag{7}$$
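The Gumbel-to-logistic remark above can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes independent standard Gumbel draws for the buyer and seller disturbances, and the function names, seed, and sample size are illustrative choices.

```python
import math
import random


def gumbel_draw(rng):
    """Draw from a standard Gumbel distribution by inverting its CDF."""
    u = rng.random()
    while u == 0.0:  # guard against log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def simulate_difference(n_draws, seed=0):
    """Simulate xi = xi_b - xi_s for independent Gumbel disturbances."""
    rng = random.Random(seed)
    return [gumbel_draw(rng) - gumbel_draw(rng) for _ in range(n_draws)]


def logistic_cdf(x):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + math.exp(-x))


diffs = simulate_difference(200_000)
mean = sum(diffs) / len(diffs)
variance = sum((d - mean) ** 2 for d in diffs) / len(diffs)
share_below_one = sum(d <= 1.0 for d in diffs) / len(diffs)

# The standard logistic distribution has variance pi^2 / 3.
print(round(variance, 3), round(math.pi ** 2 / 3, 3))
print(round(share_below_one, 3), round(logistic_cdf(1.0), 3))
```

The simulated variance and empirical CDF should be close to their logistic counterparts, which is why a logit specification is the natural alternative to probit in this setting.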
The common feature of the buyer side and the seller side is the house itself, with characteristics contained in the row vector $h$. However, the buyer may place a value $\gamma_h$ on the characteristics $h$ while the seller may place a value $\phi_h$ on the characteristics $h$. Both the buyer and seller have their own characteristics $c_b$ and $c_s$ with values given by $\gamma_c$ and $\phi_c$ as shown in (8)–(11).

$$b = \begin{bmatrix} h & c_b \end{bmatrix} \tag{8}$$

$$b\gamma = h\gamma_h + c_b\gamma_c \tag{9}$$

$$s = \begin{bmatrix} h & c_s \end{bmatrix} \tag{10}$$

$$s\phi = h\phi_h + c_s\phi_c \tag{11}$$
Substituting (9) and (11) into (7) yields (12). Note that if the values placed on the house characteristics $h$ by both the buyer ($\gamma_h$) and the seller ($\phi_h$) are identical, the house characteristics will not affect whether the house sells or not. In that case only buyer and seller characteristics and their values, along with random disturbances, will determine whether the transaction takes place. Moreover, the difference between the two disturbances captured by $\xi$ from (6) has a larger variance than the variance of the disturbance from either the buyer or the seller (assuming arm's length dealings and independence between the two disturbances). Specifically, $\sigma_\xi^2 = \sigma_{\xi_b}^2 + \sigma_{\xi_s}^2$, and so the process of differencing in (12) usually reduces the magnitude of the signal while increasing the magnitude of the noise ($\xi$).

$$y_b^* - y_s^* = h(\gamma_h - \phi_h) + [c_b\gamma_c - c_s\phi_c] + \xi \tag{12}$$
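The cancellation of the house term in (12) can be illustrated with a small simulation. This is a hedged sketch under simplifying assumptions that are not in the original: a single standard normal house characteristic, no buyer or seller characteristics, unit-variance independent normal disturbances, and hypothetical valuations of 1.0 and 1.5.

```python
import random


def simulate_sales(gamma_h, phi_h, n=100_000, seed=1):
    """Simulate (12) with one house characteristic h and no c terms (assumed)."""
    rng = random.Random(seed)
    h_vals, sold = [], []
    for _ in range(n):
        h = rng.gauss(0.0, 1.0)
        xi_b = rng.gauss(0.0, 1.0)   # buyer disturbance
        xi_s = rng.gauss(0.0, 1.0)   # seller disturbance
        latent = h * (gamma_h - phi_h) + (xi_b - xi_s)
        h_vals.append(h)
        sold.append(1.0 if latent >= 0 else 0.0)
    return h_vals, sold


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


# Identical valuations: h drops out of the latent difference.
corr_equal = correlation(*simulate_sales(gamma_h=1.0, phi_h=1.0))
# Buyer values h more than the seller: h now helps predict the sale.
corr_diff = correlation(*simulate_sales(gamma_h=1.5, phi_h=1.0))
print(round(corr_equal, 3), round(corr_diff, 3))
```

When the two valuations coincide, the correlation between the house characteristic and the sale indicator is near zero even though both parties value the characteristic highly; only the difference in valuations carries signal.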
For implementation via probit model estimation, let $y_i^*$ represent the latent difference in reservation prices between the buyer and the seller for the $i$th property as shown in (13). Therefore, $y^*$ is the overall $n$ by 1 vector of latent differences in reservation prices across all $n$ observations. The $n$ by $p_1$ matrix $H$ specifies all the property characteristics across all $n$ observations while the $n$ by $p_2$ matrix $C$ in (14) contains the buyer and seller characteristics. The $n$ by $p_3$ matrix $A$ contains a vector of ones and other dichotomous variables, such as those for time or regions, and is given the values contained in the $p_3$ by 1 parameter vector $\alpha$. Therefore, the difference in reservation prices across all $n$ properties depends on the intercept and dichotomous variable terms, the characteristics of each house, the buyer and seller characteristics, and the random disturbances in the $n$ by 1 vector $\varepsilon$ as shown in (15). The disturbance variance in probit is not identified, and so is always set to 1 as in (16). Given these assumptions, the probability of observing a transaction for the $i$th property equates to the probability that the latent index is at least 0 as in (17).

$$y_i^* = y_b^* - y_s^* \tag{13}$$

$$C = \begin{bmatrix} C_b & C_s \end{bmatrix} \tag{14}$$

$$y^* = A\alpha + H\beta + C\delta + \varepsilon \tag{15}$$

$$\varepsilon \sim N(0, I_n) \tag{16}$$

$$\Pr(y_i = 1) = \Pr(y_i^* \geq 0) \tag{17}$$
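Under (15)–(17), the sale probability for a property is the standard normal CDF evaluated at its index. The sketch below illustrates this, along with one common way of computing a pseudo-R2. It assumes McFadden's definition, since the text does not state which pseudo-R2 is reported, and the coefficients and data are hypothetical values chosen only for illustration.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1, matching (16)


def sale_probability(index):
    """Pr(y_i = 1) = Pr(y_i* >= 0) = Phi(index) for a unit-variance normal error."""
    return STD_NORMAL.cdf(index)


def log_likelihood(y, probs):
    """Bernoulli log-likelihood of outcomes y given fitted probabilities."""
    return sum(math.log(p) if yi == 1 else math.log(1.0 - p)
               for yi, p in zip(y, probs))


def mcfadden_pseudo_r2(y, probs):
    """1 - lnL(model) / lnL(intercept only); McFadden's form is an assumption."""
    base_rate = sum(y) / len(y)
    ll_null = log_likelihood(y, [base_rate] * len(y))
    return 1.0 - log_likelihood(y, probs) / ll_null


# Hypothetical scalar coefficients and regressors (h_i, c_i), not estimates.
alpha, beta, delta = -1.8, 0.05, 1.2
houses = [(0.2, 1.0), (1.0, 0.0), (-0.5, 0.0), (0.3, 0.0)]
sold = [1, 0, 0, 0]

indices = [alpha + h * beta + c * delta for h, c in houses]
probs = [sale_probability(v) for v in indices]
fit = mcfadden_pseudo_r2(sold, probs)
print([round(p, 3) for p in probs], round(fit, 3))
```

In this toy example nearly all of the fit comes from the seller characteristic rather than the house characteristic, mirroring the pattern the model predicts.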
Housing characteristics $H$ may not matter much in probit estimation of $\beta$ for a couple of reasons. First, $\beta$ may be close to 0. As mentioned earlier, this could occur if buyers and sellers give equal values to the housing characteristics in determining their reservation prices. Although one can easily point to individual cases where a buyer and seller diverge on the valuation of a particular characteristic, an empirical model averages over all of the observations and it becomes harder to imagine that buyers and sellers on average value characteristics differently. Second, characteristics in probit suffer from attenuation bias if omitted variables (even uncorrelated ones) have a large variance. Since the overall variance of the disturbances is set to 1, omission of variables attenuates the probit parameter estimates. Therefore, it is important to introduce factors to measure or proxy for buyer and seller characteristics in $C$ and other characteristics in $A$ to avoid attenuation bias in $\beta$. Similarly, in modeling mortgage termination, not having property, buyer, or other characteristics could result in attenuation bias of the mortgage characteristics (which pertain to the potential seller). In using the probability of sale as a measure of market liquidity, it would be ideal to control for as many factors as possible. Otherwise, estimates associated with the temporal dichotomous variable would be affected by the mix of property characteristics, buyer/seller characteristics, and macroeconomic variables over time.

2.2 Locational Aspects of House Sales
In this section we briefly set forth the motivation for including neighborhood variables in section 2.2.1 and spatially dependent disturbances in section 2.2.2. Taken together, spatial considerations result in four models of interest in section 2.2.3. Essentially, this approach follows that of Zhu and Pace (2014), who examined the spatial aspects of the probability of default.

2.2.1 Neighborhood Characteristics and House Sales
As mentioned earlier, much of the previous literature has found that property characteristics do not help much in predicting house sales. However, property characteristics may only have meaning in the context of individual markets or neighborhoods. For example, a 3,000 square foot dwelling on Manhattan Island in New York could have quite different liquidity than a 3,000 square foot dwelling in Manhattan, Kansas. Therefore, adding the neighborhood average of the housing characteristics $X$ via the operation $WX$, where $W$ is an $n$ by $n$ spatial weight matrix, could augment the explanatory power of the models. From a mortgage perspective, Hanson et al. (2012) find evidence that credit quality is spatially correlated. This suggests that LTV ratios and refinance activities could be spatially correlated as well. Thus, neighbors' financing activity could possibly have an impact on the house sale.
The spatial weight matrix $W$ contains positive elements whenever observations $i$ and $j$ are neighbors and zeros otherwise. By convention, $W_{ii} = 0$ for $i = 1, \ldots, n$ and therefore observations are not allowed to be neighbors to themselves. Neighbors could be specified by cardinal (e.g., within 0.25 miles) or ordinal distances (e.g., six nearest neighbors).7

Augmenting the model to contain both characteristics and their neighborhood averages produces (18).

$$\eta = X\beta + WX\theta + \varepsilon, \quad \varepsilon \sim N(0, I_n) \tag{18}$$

$$\eta_i < 0 \rightarrow y_i = 0 \tag{19}$$

$$\eta_i \geq 0 \rightarrow y_i = 1 \tag{20}$$

2.2.2 Spatial Omitted Variables and House Sales
In statistical models the disturbance term $\varepsilon$ captures the effect of the many omitted and possibly unobservable variables that affect almost any outcome. If these omitted influences are independent, and no one influence dominates the others, this leads to specification of the disturbances as following an independent multivariate normal distribution so that $\varepsilon \sim N(0, I_n)$.

In the context of real estate, however, many of the omitted or unobservable variables have a spatial nature. Variables such as accessibility, noise, architecture, landscaping, functionality of neighborhood associations, and maintenance are difficult to observe and are either spatial or produce spatial externalities. This leads to spatial dependence in the disturbances, and a common way of modeling this is through the conditional autoregressive (CAR) process so that $\varepsilon \sim N(0, (I_n - \rho W)^{-1})$. The spatial probit model was previously extremely difficult to estimate due to the substantial computational challenges it presents (Pace and LeSage, 2016). Fortunately, the newly developed method of Pace and LeSage (2016) makes estimation possible. This paper is the first house sales paper to use spatial probit.

7 See LeSage and Pace (2009) for more details on the spatial weight matrix and for motivations underlying the use of $WX$ as well as spatially dependent disturbances.

2.2.3 Overall Models
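The spatial and independence models from sections 2.2.1 and 2.2.2 result in four different probit models as in (21) to (26).

$$\eta = X\beta + \varepsilon, \quad \varepsilon \sim N(0, I_n) \tag{21}$$

$$\eta = X\beta + \varepsilon, \quad \varepsilon \sim N(0, (I_n - \rho W)^{-1}) \tag{22}$$

$$\eta = X\beta + WX\theta + \varepsilon, \quad \varepsilon \sim N(0, I_n) \tag{23}$$

$$\eta = X\beta + WX\theta + \varepsilon, \quad \varepsilon \sim N(0, (I_n - \rho W)^{-1}) \tag{24}$$

$$\eta_i < 0 \rightarrow y_i = 0 \tag{25}$$

$$\eta_i \geq 0 \rightarrow y_i = 1 \tag{26}$$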
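To see how the CAR specification in (22) and (24) differs from the iid case in (21) and (23), it can help to compute the implied disturbance covariance for a tiny example. The sketch below is illustrative only and is not the estimation method of Pace and LeSage (2016). It assumes three houses on a line, a hypothetical symmetric weight matrix, and a hypothetical dependence parameter of 0.6, and uses a small hand-written matrix inverse to stay dependency-free.

```python
def identity(n):
    """Return the n-by-n identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting for a small dense matrix."""
    n = len(matrix)
    aug = [list(row) + eye_row for row, eye_row in zip(matrix, identity(n))]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def car_covariance(W, rho):
    """Covariance (I_n - rho W)^{-1} of the CAR disturbances."""
    n = len(W)
    eye = identity(n)
    return invert([[eye[i][j] - rho * W[i][j] for j in range(n)]
                   for i in range(n)])


# Hypothetical symmetric weights: house 1 neighbors houses 0 and 2.
W = [[0.0, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.0]]

cov_spatial = car_covariance(W, rho=0.6)
cov_iid = car_covariance(W, rho=0.0)
for row in cov_spatial:
    print([round(v, 4) for v in row])
```

With a dependence parameter of zero the covariance collapses to the identity of the iid models, while a positive value produces positive covariances between neighbors and, more weakly, between neighbors of neighbors.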
3 Data
This section introduces data and variables. Section 3.1 introduces our data sources and presents how the sample data is constructed. Section 3.2 discusses the variables and summary statistics.

3.1 Data Source and Data Construction
This section describes the data sources, the sample selection, and the data construction. Our analysis focuses on single family residential properties.
We use public record data from both the Clark County Assessor's Office and the Clark County Recorder's Office in Nevada. We also obtain data from Maricopa County in Arizona for robustness checks. The assessor data contains information on housing characteristics and street addresses. The recorder data contains details on real estate related transactions, including ownership transactions such as arm's length house sales, foreclosure sales, and quit claim transfers, as well as non-ownership transactions such as refinance and home equity loan activities. Our transaction data starts from year 2000. The recorder data has detailed information about each transaction such as the dollar value transferred, mortgage information, deed type, and transaction type. Street addresses are geocoded to longitude and latitude using ArcGIS 10.0 Desktop for spatial econometric analysis. We also use the Case-Shiller house price index (HPI) to estimate a loan-to-value ratio at origination for refinance loans.

Our study requires house sale information, the seller's mortgage information, and property characteristics.8 To obtain mortgage information associated with arm's length resale transactions, we filter the recorder data by excluding quit claim transfers, construction, and time share transactions.9 For the refinanced/equity loans, we eliminate the line of credit refinance loans and the equity refinance loans.10 An equity refinance is identified by a loan amount of less than forty percent of the last resale amount.11 For each year investigated, the mortgage information comes from the latest prior transaction, which could be either a resale or a refinance. In other words, we derive the sellers' loan information, for example in year 2010, from the latest previous transaction in which the current seller either bought the house and took out the mortgage, or refinanced. These loan origination dates could go back to year 2000. Thus the sellers' mortgage variables are pre-determined in our analysis.
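The two refinance calculations described above, updating the last resale price with the HPI to obtain a loan-to-value ratio and flagging equity refinances with the forty percent criterion, can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the dollar amounts and index levels are made up, and the equity rule follows the main-text wording (loan amount relative to the last resale amount).

```python
def updated_house_value(last_resale_price, hpi_at_resale, hpi_at_refinance):
    """Roll the last resale price forward using the ratio of HPI levels."""
    return last_resale_price * hpi_at_refinance / hpi_at_resale


def refinance_ltv(loan_amount, last_resale_price, hpi_at_resale, hpi_at_refinance):
    """Loan-to-value at refinance origination, with an HPI-updated house value."""
    value = updated_house_value(last_resale_price, hpi_at_resale, hpi_at_refinance)
    return loan_amount / value


def is_equity_refinance(loan_amount, last_resale_price, threshold=0.40):
    """Flag a refinance whose loan is below 40% of the last resale amount."""
    return loan_amount / last_resale_price < threshold


# Hypothetical example: bought for 200,000 when the index stood at 100,
# refinanced with a 180,000 loan when the index stood at 150.
ltv = refinance_ltv(180_000, 200_000, hpi_at_resale=100.0, hpi_at_refinance=150.0)
small_loan_flag = is_equity_refinance(60_000, 200_000)
large_loan_flag = is_equity_refinance(180_000, 200_000)
print(round(ltv, 2), small_loan_flag, large_loan_flag)
```

In the example the house value is updated to 300,000, so the 180,000 refinance has an estimated LTV of 0.6 and is retained, while a 60,000 loan on the same house would be classified as an equity refinance and dropped.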
8 This study does not include buyer information, so the analysis is on the seller's side.

9 Transactions without mortgage information are excluded from this study. This reduces the property-year sample size by 176,996, about 12.5% of the raw sample of 1,414,085 property-year observations. We compared property characteristics for houses with and without mortgages to check for any systematic difference. Houses with mortgages appear to be slightly older and of slightly lower quality than those without mortgage information. The number of bathrooms, total living area, and land are similar between the two groups.

10 While cash-out refinances and rate-reduction refinances might have different effects on house sales, our data do not contain such information.

11 In other words, a refinance with a loan-to-value ratio lower than forty percent is identified as an equity refinance.

The sales data is identified by resale transactions.12 We then merge the sales data with the mortgage data and the assessor data for housing characteristics. From here, we exclude the distressed property related transactions, which include foreclosure auction sales, deed-in-lieu foreclosures, and post-foreclosure sales (REO).13 Our analysis focuses on normal market sales. We also require a total living area of between 500 and 7,000 square feet, valid mortgage information, and property characteristics. Our sample time period is from 2006 to 2010. Data from 2000 to 2005 are also used for mortgage information since we need to go back to the previous transactions for the loan information. This results in a property-year sample size of 1,237,089.

12 We focus on successful sales, and only successful sales are observed in our data. Investigating how seller mortgage choices at loan origination could affect listing decisions would be interesting for future research.

13 Distressed sales are taken out of the sample for two reasons: (1) we want to rule out the alternative explanation that the mortgage variables are risk factors of mortgage default, since properties with risky loans are more likely to go into a distressed sale; (2) distressed property sales and normal market sales could have quite different explanatory variables or regressions. By focusing on normal market sales, we have a cleaner setting.

3.2 Variables and Summary Statistics

This section discusses variables and some summary statistics. The dependent variable of the probit model is the house sale, which equals 1 if a house sold in a certain year and 0 otherwise. Explanatory variables are housing characteristics, loan information, and spatial averages of the own observation explanatory variables. Housing characteristics include logged house age (House Age), number of bathrooms (Bath), logged total living area in square feet (Sqft), logged lot size (Land), and an assessor quality rating (Quality) indicating the construction quality of the property, ranging from the lowest quality of one to the highest quality of ten. Mortgage information includes the combined loan-to-value ratio (LTV) at origination (the house value of refinance loans is calculated by using the HPI to update the last resale price), a fixed rate mortgage dummy (FRM), a refinanced loan dummy (Refi), and loan origination year dummies. The mortgage choices at loan origination are used to capture the borrower's characteristics or housing preferences. We have loan origination year dummies to capture the market condition at loan origination. In the cross sectional regressions, the sale year dummy is implicitly controlled. In the panel regression, we also include sale year dummies to capture the market condition at the time of house sale, and zip code dummies to better control for potential geographical differences. To have an even finer control of geographical differences, we add the spatial components to the model. The weight matrix used is the nearest-neighbor weight matrix with the six closest neighbors, where the second nearest neighbor has an influence of 0.35 of the first nearest neighbor, the third nearest neighbor has an influence of 0.35 of that of the second nearest neighbor, and so forth.14 The summary statistics appear in Table 1. The loan information is at origination.

14 We did not investigate alternative specifications of the spatial weight matrix, but just used the same one as in Zhu and Pace (2014).
Table 1: Summary Statistics

Variable       Mean       Std Dev   Min     Max
Sold           0.037      0.190     0.000   1.000
House Age      2.476      0.806     0.000   4.691
Bath           2.332      0.701     1.000   9.500
Sqft           7.511      0.346     6.217   8.854
Land           8.756      0.452     6.770   10.588
Quality        4.666      1.123     1.000   10.000
LTV            0.757      0.253     0.001   1.500
FRM            0.655      0.475     0.000   1.000
Refi           0.453      0.498     0.000   1.000
N (in 1000)    1237.089

Note: House Age, Sqft and Land are in logged format.
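The six-nearest-neighbor weight matrix with geometric decay of 0.35 described in this section, and the spatial average of a variable that it implies, can be sketched as follows. This is a hedged illustration rather than the authors' implementation: row normalization and Euclidean distance on planar coordinates are assumptions not stated in the text, and the coordinates are hypothetical.

```python
import math


def knn_decay_weights(coords, k=6, decay=0.35, row_normalize=True):
    """Build W where the m-th nearest neighbor gets weight decay**(m - 1).

    Row normalization (each row summing to 1) is an assumption of this sketch.
    """
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.hypot(coords[j][0] - xi, coords[j][1] - yi),
        )
        for rank, j in enumerate(others[:k]):
            W[i][j] = decay ** rank
        if row_normalize:
            total = sum(W[i])
            W[i] = [w / total for w in W[i]]
    return W


def spatial_lag(W, x):
    """Compute the neighborhood average Wx for one explanatory variable."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


# Hypothetical planar coordinates for eight houses along a street.
coords = [(0, 0), (1, 0), (2.5, 0), (4.5, 0), (7, 0), (10, 0), (13.5, 0), (17.5, 0)]
W = knn_decay_weights(coords)
lagged_ones = spatial_lag(W, [1.0] * len(coords))
print([round(w, 4) for w in W[0]])
```

For the first house, the nearest neighbor receives the largest weight, each subsequent neighbor receives 0.35 times the weight of the previous one, and the seventh-closest house receives zero weight. The diagonal stays at zero, so no house is its own neighbor.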
M
To obtain an idea of whether borrower mortgage choices are correlated with the house sale probability, we report the sample sizes and the proportions of sold properties across different mortgage types for years 2006 to 2010 in Table 2.15 Several clear patterns arise. First, properties with a FRM have a lower proportion sold than those with an ARM in all years. For example, in 2010, about 7.2 percent of properties with an ARM sold versus only 2.3 percent of properties with a FRM during the same period. Second, properties with refinanced loans are much less likely to be sold than those with non-refinanced loans. For example, in 2006, less than one percent of refinanced properties sold, while more than 11 percent of properties with non-refinanced loans sold. Third, loans with a higher LTV ratio also have a much higher percentage of sold properties than loans with a lower LTV ratio. These results provide preliminary evidence that mortgage choices vary with house sale decisions.

15 The proportion of borrowers using a FRM increases after 2008. This might result from the tighter lending standards after the crisis and the low interest rate environment.
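A tabulation of this kind (sample size and proportion sold within each loan category) can be sketched as follows. The records below are made up purely for illustration, and the field names `refi` and `sold` are hypothetical; this is not the authors' data pipeline.

```python
from collections import defaultdict


def proportion_sold(records, key):
    """Return {group: (sample size, share sold)} for the grouping field `key`."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n, number sold]
    for rec in records:
        group = rec[key]
        counts[group][0] += 1
        counts[group][1] += rec["sold"]
    return {group: (n, sold / n) for group, (n, sold) in counts.items()}


# Toy property-year records (illustrative values only).
records = [
    {"refi": 1, "sold": 0},
    {"refi": 1, "sold": 0},
    {"refi": 0, "sold": 1},
    {"refi": 0, "sold": 0},
]
by_refi = proportion_sold(records, "refi")
```

Running the same function once per year and per loan characteristic (LTV threshold, FRM, Refi) would give the cells of a table with this layout.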
Table 2: Sample Size and Proportion Sold by Year and Loan Characteristics (N in Thousands)

              Y2006               Y2007               Y2008               Y2009               Y2010
Sample        N         Sold      N         Sold      N         Sold      N         Sold      N         Sold
Whole Sample  234.103   0.0688    243.132   0.0362    246.881   0.0223    254.270   0.0259    258.703   0.0359
LTV≥0.8       130.348   0.1049    131.184   0.0552    131.352   0.0357    136.438   0.0414    140.453   0.0573
LTV<0.8       103.755   0.0236    111.948   0.0140    115.529   0.0071    117.832   0.0079    118.250   0.0105
Refi=1        98.587    0.0024    108.560   0.0011    114.831   0.0003    118.370   0.0005    120.630   0.0000
Refi=0        135.516   0.1172    134.572   0.0646    132.050   0.0414    135.900   0.0480    138.073   0.0672
FRM=1         139.017   0.0605    144.320   0.0306    157.125   0.0142    177.989   0.0166    191.951   0.0234
FRM=0         95.086    0.0810    98.812    0.0444    89.756    0.0365    76.281    0.0474    66.752    0.0717
4 Empirical Results

This section presents the empirical results of various house sales models. Section 4.1 focuses on using mortgage choice information to enhance house sales models. Section 4.2 studies whether neighborhood information has an impact on house turnover and summarizes the overall results. For robustness, we used both probit and logit regressions. The results in Section 4.1 are logit regression results; the results in Section 4.2 are probit regression results.

4.1 Mortgage Choices and House Sales

We estimate a cross-sectional logit model with the following general structure.
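\begin{align}
y^{*} &= H\beta + C\delta + A\alpha + \varepsilon \tag{27} \\
\varepsilon &\sim \text{Logistic}(0, I_n) \tag{28} \\
\Pr(y_i = 1) &= \Pr(y_i^{*} \geq 0) \tag{29}
\end{align}

To make the general specification in (27) implementable, we used the empirical specification in (30) to (32).

\begin{align}
H &= \begin{bmatrix} \ln(\text{House Age}) & \text{Bath} & \ln(\text{Sqft}) & \ln(\text{Land}) & \text{Quality} \end{bmatrix} \tag{30} \\
C &= \begin{bmatrix} \text{LTV} & \text{FRM} & \text{Refi} & \text{Origin Year Dummies} \end{bmatrix} \tag{31} \\
A &= \text{Intercept} \tag{32}
\end{align}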
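Because the logistic disturbance is symmetric around zero, the latent-variable structure in (27) to (29) implies Pr(y_i = 1) = Λ(x_i'θ), where Λ is the logistic CDF and x_i stacks the house, loan, and intercept terms. The sketch below evaluates this probability, the log-likelihood, and a pseudo-R2 for given coefficients. It assumes the McFadden definition of pseudo-R2 (one minus the ratio of the fitted to the intercept-only log-likelihood); the manuscript does not state which definition it uses, and all function names are hypothetical.

```python
import math


def logistic_cdf(z):
    """Numerically stable logistic CDF."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def sale_probability(x, theta):
    """Pr(y = 1) = Pr(y* >= 0) = logistic_cdf(x'theta)."""
    return logistic_cdf(sum(t * v for t, v in zip(theta, x)))


def log_likelihood(X, y, theta):
    """Logit log-likelihood of binary outcomes y given regressors X."""
    ll = 0.0
    for x, yi in zip(X, y):
        p = sale_probability(x, theta)
        ll += math.log(p) if yi == 1 else math.log(1.0 - p)
    return ll


def mcfadden_pseudo_r2(X, y, theta):
    """1 - LL(model) / LL(intercept only); assumed definition."""
    n, n_sold = len(y), sum(y)
    share = n_sold / n
    ll_null = n_sold * math.log(share) + (n - n_sold) * math.log(1.0 - share)
    return 1.0 - log_likelihood(X, y, theta) / ll_null
```

Estimating the coefficients themselves would require a maximum likelihood routine, which is omitted here.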
Ideally, we would like to know detailed information on the seller and buyer.16 However, this information would be difficult to obtain. Therefore, we infer some seller information from the mortgage choices made in the past by the existing borrower, who is the seller. Holding everything else constant, homeowners with a longer tenure horizon are more likely to seek refinancing than homeowners who plan to move within a short period. Therefore, we infer that those who refinanced possess less motivation to sell their house relative to those who did not refinance.

Similarly, individuals who chose a fixed rate loan also revealed their time horizon. Usually, individuals with short time horizons will find ARMs less expensive in the short run. However, ARM borrowers need to bear more interest rate risk in the long run. Therefore, individuals who chose fixed rate mortgages typically have longer expected holding periods. Thus, we anticipate that the FRM variable would have a negative coefficient.

Many house buying decisions are a function of the individual's life-cycle (Ortalo-Magné and Rady, 2006; Artle and Varaiya, 1978; Flavin and Yamashita, 1998). Typically, younger buyers need to borrow a higher proportion of the price as they have fewer financial resources. At the other end of the life-cycle, potential retirees often wish to enter retirement without mortgage debt. Therefore, LTV is likely inversely related to age. In turn, mobility is inversely related to age. We infer that LTV is positively associated with mobility and thus should be positively associated with the probability of sale. Therefore, we anticipate a positive coefficient for the LTV variable.17

16 We do not have information on buyer characteristics, although some of these could be inferred from the mortgage choices buyers made after buying the house. We have not yet matched these records, but this is feasible in future research.
Insofar as credit standards, interest rates, and other market conditions change over time, we control for some of these effects through origin year dummy variables. For example, if credit standards are lax in a particular year, loan-to-value ratios could be shifted upwards for most borrowers. Similarly, the relative rates between fixed rate loans and the initial periods of ARM loans differ over time. Using the origin year dummies helps control for some of this variation.
In terms of housing characteristics, we would usually anticipate that older houses have more trouble transacting, that houses with too few bathrooms have trouble transacting, that both very small and very large houses in terms of floor area and land area would be less likely to trade, and that low quality houses would have more trouble selling.

17 Note that falling house prices lead to a higher current LTV, which may hinder mobility (e.g., Chan, 2001). Thus, instead of current LTV, we use the original LTV to better capture the individual's life-cycle, which also helps avoid the simultaneity issue. The origination year dummies help control for the change in house prices.

Table 3 presents estimates from the logit estimation using observations from 2006. Thus, it reflects the market situation before or shortly into the housing crisis. Several features emerge that repeat in subsequent periods. First, the fit, as measured by pseudo-R2, using only housing characteristics is very low at 0.0079. Although individual housing characteristics such as age, number of baths, size, and land are significant at the one percent level (the critical value is 6.635 for a χ2 with one degree of freedom), this should be assessed in light of the large sample size of 234,103 observations.
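The one percent critical value quoted above can be checked directly: a χ2 variable with one degree of freedom is the square of a standard normal variable, so its upper critical value is the square of the two-sided normal critical value. The helper names below are hypothetical; only the Python standard library is used.

```python
from math import sqrt
from statistics import NormalDist


def chi2_1df_critical(alpha):
    """Upper-alpha critical value of a chi-square with one degree of freedom."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z


def chi2_1df_pvalue(stat):
    """P-value of a one-degree-of-freedom chi-square statistic."""
    return 2.0 * (1.0 - NormalDist().cdf(sqrt(stat)))


critical_1pct = chi2_1df_critical(0.01)   # approximately 6.635
p_quality = chi2_1df_pvalue(2.941)        # Quality in the House Only model of Table 3
```

A statistic of 2.941, as reported for Quality in the House Only column, falls below the critical value and so is not significant at the one percent level.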
The fit from using only loan characteristics is much better, with a pseudo-R2 of 0.1404. This continues when both property and loan characteristics are combined (House/Loan 1), which raises the pseudo-R2 to 0.1438 (a slight increase over using loan characteristics only). Finally, using property characteristics and loan characteristics that include year of origin dichotomous variables increases the pseudo-R2 to 0.1692.

Many of the housing characteristics are no longer significant after taking into account the borrower characteristics. In fact, only house age, whose coefficient declines from 0.23 to 0.11, is still individually significant. The borrower characteristics also undergo change when adding loan origin year variables. For example, the fixed rate mortgage variable goes from a significant and positive value to a significant and negative value.18

In the last regression in Table 3, having a larger loan-to-value ratio significantly increases the likelihood of a transaction, while using a fixed rate loan or having a refinanced loan leads to a lower likelihood of a transaction. The origin year variables show a pattern of higher transaction probabilities for those with older loans. Since buying a house also reveals some expectation of longer tenure, those who recently bought seem less likely to sell.

18 This indicates that the House and Loan 2 model is the proper model. The other models are presented to show the contribution of different groups of variables.
Table 3: Logit Regressions on House Sales Year 2006

               House Only            Loan Only             House/Loan 1          House/Loan 2
Variable       Estimate   χ2         Estimate   χ2         Estimate   χ2         Estimate   χ2
Intercept      −0.091     0.135      −2.835     2669.045   −2.651     101.703    −1.335     24.142
House Age      0.230      408.427                          0.192      283.967    0.109      83.526
Bath           −0.050     9.777                            0.028      3.321      0.019      1.557
Sqft           −0.195     24.856                           −0.024     0.359      −0.095     5.772
Land           −0.180     52.834                           −0.039     2.424      −0.015     0.378
Quality        0.017      2.941                            0.001      0.006      −0.002     0.057
LTV                                  0.776      178.782    0.584      95.459     0.663      117.933
FRM                                  0.250      215.116    0.193      123.775    −0.190     87.044
Refi                                 −3.888     3292.281   −3.912     3320.126   −3.887     3284.761
OriginY2001                                                                      −0.107     5.399
OriginY2002                                                                      −0.147     11.461
OriginY2003                                                                      −0.278     46.141
OriginY2004                                                                      −0.422     107.612
OriginY2005                                                                      −1.258     857.373
OriginY2006                                                                      −1.613     946.710
Pseudo R2 (%)  0.789                 14.041                14.376                16.923
Next, we conduct a variety of robustness checks. We first investigate whether the results hold for other time periods. Table 4 through Table 7 present the regression results for years 2007 to 2010, respectively. Like Table 3, each table has four models: house characteristics only (House Only), loan characteristics only (Loan Only), house and loan characteristics (House and Loan 1), and house, loan, and loan origination year dummies (House and Loan 2). The dependent variable is the house sale in that particular year.

The results show some similar patterns with gradual parameter changes over time. For example, the regressions for data from 2010 in Table 7 show that again the property characteristics alone have little explanatory power (pseudo-R2 of 0.0042) and that explanatory power rises with loan characteristics and origin year variables (pseudo-R2 of 0.2100). Relative to 2006, in 2010 fixed rate mortgages have a greater marginal deterrent effect on sales and loan-to-value has a greater marginal positive effect on sales. Refinancing has a large negative effect on sales but appears to have no significance. The reason lies in lack of variation: in 2010 almost no properties that refinanced sold, as shown in the 2010 columns of Table 2. The origin year variables tell a similar story of a low likelihood of transactions from sellers who recently obtained a new loan or refinanced an existing loan. In 2010, it seems that individuals who purchased or refinanced a house in 2008, 2009, or 2010 were much less likely to sell in 2010.
Table 4: Logit Regressions on House Sales Year 2007

               House Only            Loan Only             House/Loan 1          House/Loan 2
Variable       Estimate   χ2         Estimate   χ2         Estimate   χ2         Estimate   χ2
Intercept      −2.390     53.989     −2.826     1755.491   −3.975     137.682    −2.635     57.184
House Age      0.221      190.376                          0.204      161.372    0.108      41.407
Bath           −0.065     10.113                           0.014      0.518      0.008      0.186
Sqft           −0.062     1.439                            0.060      1.384      −0.015     0.086
Land           −0.149     21.213                           −0.018     0.303      0.011      0.120
Quality        0.103      68.653                           0.076      36.857     0.079      38.679
LTV                                  0.087      1.473      0.067      0.819      0.248      10.800
FRM                                  0.159      49.837     0.125      29.832     −0.199     56.965
Refi                                 −4.196     1903.586   −4.199     1902.577   −4.116     1835.779
OriginY2001                                                                      −0.093     2.167
OriginY2002                                                                      −0.250     17.126
OriginY2003                                                                      −0.410     52.780
OriginY2004                                                                      −0.572     104.940
OriginY2005                                                                      −1.126     386.405
OriginY2006                                                                      −1.450     574.350
OriginY2007                                                                      −1.502     386.558
Pseudo R2 (%)  0.394                 12.601                12.903                14.990
Table 5: Logit Regressions on House Sales Year 2008

               House Only            Loan Only             House/Loan 1          House/Loan 2
Variable       Estimate   χ2         Estimate   χ2         Estimate   χ2         Estimate   χ2
Intercept      −2.981     53.745     −3.506     1370.006   −5.886     192.618    −5.038     133.361
House Age      −0.036     2.853                            −0.025     1.386      −0.119     29.325
Bath           0.045      2.971                            0.094      15.630     0.089      14.298
Sqft           0.030      0.202                            0.175      7.231      0.163      6.302
Land           −0.169     17.790                           0.024      0.370      0.026      0.436
Quality        0.093      36.812                           0.081      27.683     0.086      30.778
LTV                                  0.646      41.132     0.992      89.282     1.153      118.038
FRM                                  −0.461     261.449    −0.442     239.264    −0.519     240.802
Refi                                 −4.528     793.516    −4.480     778.226    −4.447     767.317
OriginY2001                                                                      0.057      0.341
OriginY2002                                                                      −0.109     1.386
OriginY2003                                                                      −0.486     29.900
OriginY2004                                                                      −0.703     65.848
OriginY2005                                                                      −0.701     68.227
OriginY2006                                                                      −0.756     77.277
OriginY2007                                                                      −1.137     148.111
OriginY2008                                                                      −1.246     134.961
Pseudo R2 (%)  0.199                 13.069                13.471                14.337
Table 6: Logit Regressions on House Sales Year 2009

               House Only            Loan Only             House/Loan 1          House/Loan 2
Variable       Estimate   χ2         Estimate   χ2         Estimate   χ2         Estimate   χ2
Intercept      0.051      0.018      −3.612     1587.707   −2.907     52.975     −2.887     48.811
House Age      0.102      23.510                           0.122      33.125     0.023      1.116
Bath           0.177      46.004                           0.228      92.424     0.233      98.468
Sqft           −0.382     37.718                           −0.241     15.334     −0.186     9.096
Land           −0.196     27.985                           0.001      0.002      −0.019     0.261
Quality        0.049      11.480                           0.042      8.062      0.043      8.527
LTV                                  1.149      140.966    1.209      147.169    1.435      201.758
FRM                                  −0.741     838.480    −0.734     819.651    −0.479     244.545
Refi                                 −4.271     1008.872   −4.280     1012.837   −4.331     1034.944
OriginY2001                                                                      0.050      0.213
OriginY2002                                                                      −0.076     0.562
OriginY2003                                                                      −0.346     12.833
OriginY2004                                                                      −0.312     11.293
OriginY2005                                                                      −0.266     8.424
OriginY2006                                                                      −0.029     0.104
OriginY2007                                                                      −0.409     18.367
OriginY2008                                                                      −1.203     137.650
OriginY2009                                                                      −1.125     108.193
Pseudo R2 (%)  0.228                 14.476                14.704                15.674
Table 7: Logit Regressions on House Sales Year 2010

               House Only            Loan Only             House/Loan 1          House/Loan 2
Variable       Estimate   χ2         Estimate   χ2         Estimate   χ2         Estimate   χ2
Intercept      1.626      25.497     −3.070     1600.096   −0.760     4.980      −0.632     3.186
House Age      0.140      53.504                           0.179      84.819     −0.024     1.364
Bath           0.134      36.567                           0.216      114.855    0.243      149.964
Sqft           −0.506     91.973                           −0.419     63.114     −0.343     42.281
Land           −0.241     58.190                           −0.040     1.599      −0.081     6.336
Quality        0.064      27.625                           0.053      17.924     0.058      21.386
LTV                                  1.154      194.835    1.103      170.227    1.676      375.395
FRM                                  −0.972     2002.735   −0.958     1927.064   −0.420     263.325
Refi                                 −17.259    0.061      −17.275    0.061      −17.359    0.062
OriginY2001                                                                      −0.217     6.034
OriginY2002                                                                      −0.390     21.686
OriginY2003                                                                      −0.554     51.354
OriginY2004                                                                      −0.383     26.993
OriginY2005                                                                      −0.358     24.250
OriginY2006                                                                      −0.385     27.324
OriginY2007                                                                      −0.456     36.658
OriginY2008                                                                      −1.613     360.537
OriginY2009                                                                      −2.319     664.945
OriginY2010                                                                      −2.279     450.597
Pseudo R2 (%)  0.417                 17.785                18.088                21.004
Table 8 reports the marginal effects for model House and Loan 2 for years 2006 to 2010. The base case for the marginal effect is a FRM, non-refinanced loan that was originated in 2005. All other variables are taken at the sample mean. For a continuous variable, the marginal effect is the change in the probability of a house sale associated with a one unit change in the independent variable. The marginal effect for a dummy variable is calculated as the difference in probabilities when the dummy variable changes from 0 to 1. Table 8 shows that the main results concerning loan-to-value ratio, FRM, and refinanced loans hold up over a particularly large cycle in the Clark county market.
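The two kinds of marginal effect described above can be illustrated for a logit model. This is a minimal sketch using the textbook logit formulas, not the authors' computation: the base-case index value used in the example (−3.0) is hypothetical, and the function names are assumptions.

```python
import math


def logistic_cdf(z):
    """Numerically stable logistic CDF."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def marginal_effect_continuous(beta_k, index):
    """dPr(sale)/dx_k = beta_k * p * (1 - p), evaluated at the base-case index."""
    p = logistic_cdf(index)
    return beta_k * p * (1.0 - p)


def marginal_effect_dummy(beta_k, index_at_zero):
    """Pr(sale | dummy = 1) - Pr(sale | dummy = 0), other variables held fixed."""
    return logistic_cdf(index_at_zero + beta_k) - logistic_cdf(index_at_zero)


# Illustration: the Refi coefficient from the House/Loan 2 column of Table 3,
# evaluated at a hypothetical base-case index of -3.0.
refi_effect = marginal_effect_dummy(-3.887, -3.0)
```

The discrete difference for a dummy variable is bounded by the base-case probability, whereas the derivative formula scaled by a unit change is not, which is one reason the two are computed differently.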
35
CR IP T
ACCEPTED MANUSCRIPT
Table 8: Marginal Effect of Probit Model House and Loan 2 (in %) 0.5687 0.0995 −0.4984 −0.0809 −0.0126 0.0347 −0.1803 −19.8625 −0.0515 1.1187 1.7910 2.2194 −2.3209
0.4274 0.0326 −0.0597 0.0445 0.3134 0.0099 −0.1559 −16.0488 −0.0303 1.1354 1.4318 1.4489 −1.4966 −1.7576
Y2008
Y2009
M
AN US
Y2007
AC
CE
PT
House Age Bath Sqft Land Quality LTV FRM Refi OriginY2001 OriginY2002 OriginY2003 OriginY2004 OriginY2006 OriginY2007 OriginY2008 OriginY2009 OriginY2010
Y2006
ED
Variable
36
−0.2999 0.2243 0.4117 0.0659 0.2169 0.0291 −0.8145 −11.0794 −0.0079 0.2024 0.2762 −0.0020 −0.0958 −0.9775 −1.2794
0.0791 0.7947 −0.6346 −0.0634 0.1457 0.0489 −0.8908 −14.5631 −0.0082 0.0501 −0.0855 −0.0455 0.0244 −0.1743 −2.4250 −2.1411
Y2010
−0.1495 1.5082 −2.1372 −0.5000 0.3606 0.1041 −1.0973 −81.5436 −0.2324 −0.0656 −0.5440 −0.0508 −0.0546 −0.2317 −6.8668 −12.1209 −11.8116
ACCEPTED MANUSCRIPT
Next, we check whether the results hold for a variety of sub samples. Table 9 reports the model fit as measured by the pseudo R2 for sub samples of year 2006. Panels A to C report the sub sample results according to different house values (assessments), different house ages, and different house quality categories (assessor-based). Since the motivation to sell might differ for investors versus owner-occupiers, Panel D reports the model fits for owner-occupied and non owner-occupied houses separately. Housing appreciation may vary across zip codes within the same MSA. As an additional robustness check, Panel E separates the sample according to owner-occupied status and also includes zip code dummies to better capture local housing market variations.

Loan origination variables help improve the model fits in all the sub samples. However, it seems that mortgage information has relatively less predictive power for older, lower quality, and lower priced house sales. In addition, borrower characteristics improve the model fits for owner-occupied house sales more dramatically than for non owner-occupied house sales. For example, the loan only model has a pseudo R2 of 0.1637 for owner-occupied houses versus 0.0754 for non owner-occupied houses.

A number of factors could create a difference between decisions made by owner occupants and investors, and this would have implications for the goodness-of-fit. Investors may have different leverage preferences than owner occupants and/or face different financing constraints. Occupants and investors could differ by time horizon. Certainly, leverage may be negatively associated with age for owner occupants, but there may be little connection between leverage and age for investors.
Table 9: Model Fit Statistics (Pseudo R2 (%)) Sub Sample Results Year 2006

Sub Sample            House Only    Loan only     House/Loan 1  House/Loan 2

Panel A: House Value
150K > V              0.693         12.876        13.256        16.006
150K ≤ V < 300K       0.302         15.801        16.043        18.479
300K ≤ V < 450K       0.531         16.433        17.033        18.768
450K ≤ V              0.955         11.926        12.804        14.293

Panel B: House Age
10y > Age             0.483         13.715        14.235        17.366
10y ≤ Age < 20y       0.251         15.165        15.203        18.456
20y ≤ Age < 30y       0.325         13.954        13.986        16.215
30y ≤ Age             0.391         11.976        12.154        13.117

Panel C: House Quality
Low                   1.216         9.056         10.441        11.408
Medium                0.846         14.099        14.438        17.042
High                  0.752         13.585        14.413        16.790

Panel D: Owner Occupied
N                     1.379         7.539         8.241         10.805
Y                     0.530         16.368        16.585        19.195

Panel E: Owner Occupied (Zip)
N                     1.823         8.228         8.546         11.144
Y                     0.871         16.710        16.864        19.444
Another concern is whether the results can be generalized to other areas. To answer this question, we repeated the experiments with Maricopa county data. Table 10 reports the Maricopa results. Panel A gives the regression results for year 2006. For simplicity, Panel B reports only the pseudo R2 for the different specifications for years 2006 to 2010. Maricopa county assessor data does not report the number of bathrooms, so the variable Bath is excluded from the housing characteristics. House age is not well populated in the data, so we add a missing house age dummy as a control variable. Similar to the results for Clark county, borrower characteristics help improve the model fit of house sales for every year from 2006 to 2010. For example, in 2010, including loan variables increases the pseudo R2 from 0.00074 for housing characteristics alone to 0.1750 for the housing and loan characteristics model. Both the sign and the magnitude of the mortgage variable estimates are similar to the Clark county results. For example, the refinance dummy reduces the probability of sale in every year. The Maricopa results indicate that the findings are not specific to Clark county.

Based on these results, which hold well for different time periods, a variety of sub samples, and different areas, individuals modeling loan termination may be able to disregard property characteristics in their models. However, individuals wishing to model the effects of property characteristics on the probability of a sale cannot disregard the large effect of the seller's financial characteristics, which may reveal their expected time horizon.
Table 10: Logit Regressions on House Sales - Maricopa

Panel A
                   House Only            Loan Only             House/Loan 1          House/Loan 2
Variable           Estimate   χ2         Estimate   χ2         Estimate   χ2         Estimate   χ2
Intercept          1.956      19.878     −2.997     6380.011   −1.862     18.445     −1.218     7.735
House Age          0.454      16.313                           0.377      11.956     0.429      15.206
House Age Missing  −0.112     1.218                            0.042      0.169      0.064      0.390
Sqft               −0.612     756.666                          −0.282     149.425    −0.226     93.566
Land               −0.121     49.366                           −0.036     4.320      −0.074     17.827
Quality            −0.026     10.569                           0.028      10.715     0.011      1.620
LTV                                      1.114      778.471    0.978      557.529    0.906      480.316
FRM                                      0.128      117.740    0.112      88.076     −0.174     167.576
Refi                                     −3.696     7137.538   −3.697     7128.909   −3.702     7120.786
OriginY2001                                                                          −0.039     1.803
OriginY2002                                                                          −0.167     35.595
OriginY2003                                                                          −0.407     233.443
OriginY2004                                                                          −0.538     440.819
OriginY2005                                                                          −0.856     1127.407
OriginY2006                                                                          −1.268     1608.246

Panel B: Pseudo R2 (%) Year 2006 - 2010
                   House Only            Loan Only             House/Loan 1          House/Loan 2
Year 2006          0.797                 14.253                14.352                15.685
Year 2007          0.175                 13.276                13.337                15.460
Year 2008          0.122                 12.934                13.329                14.593
Year 2009          0.125                 14.595                14.655                15.816
Year 2010          0.074                 15.326                15.423                17.498
Next, we briefly investigate the relative importance of macroeconomic conditions, house characteristics, and homeowner information in their impact on house sales. We form a panel data set from Clark county and estimate logit regressions. Table 11 reports the regression results. To capture macroeconomic conditions, we include both origination year dummies and sale year dummies in the House/Loan/Macro regression. This last regression also includes zip code dummies to control for possible geographical variation. The results show that mortgage information by itself yields a pseudo-R2 of about 10.82%. Adding housing characteristics, macroeconomic controls, and zip code fixed effects increases the pseudo-R2 to 13.43%. The results indicate that mortgage/borrower information is likely to have stronger explanatory power for house sales than the other variables. When adding a variety of controls, the sign and significance of the loan variables remain stable. The next section includes even tighter neighborhood controls.
Table 11: Panel Regressions on House Sales

               House Only            Loan Only             House/Loan            House/Loan/Macro
Variable       Estimate   χ2         Estimate   χ2         Estimate   χ2         Estimate   χ2
Intercept      −1.763     122.692    −5.750     2546.039   −5.308     682.675    −7.782     4.930
House Age      0.030      13.176                           0.063      59.180     0.044      14.364
Bath           0.019      3.153                            0.096      101.852    0.113      141.494
Sqft           −0.359     197.527                          −0.223     79.815     −0.163     38.362
Land           −0.078     25.465                           0.089      34.781     −0.009     0.274
Quality        0.038      38.521                           0.022      13.561     0.035      29.509
LTV                                  0.388      236.239    0.382      219.629    0.554      409.500
FRM                                  −0.310     805.643    −0.317     842.870    −0.337     704.690
Refi                                 −4.388     6258.378   −4.398     6281.101   −4.390     6267.398
Orig Year      No                    No                    No                    Yes
Sale Year      No                    No                    No                    Yes
Zip Code       No                    No                    No                    Yes
Pseudo R2 (%)  0.122                 10.819                10.885                13.426
4.2 Does Neighborhood/Location Impact House Sales?

This section investigates whether neighborhood/location matters in house sales. We adopt a spatial econometric method to study the effects of: (1) allowing a spatially correlated disturbance term; and (2) spillovers from nearby houses. Spatially correlated disturbances pick up omitted location related information. The coefficients of the average neighbor values (WX) represent the spillovers.

We estimated four different probit models as in (21) to (26) and use year 2006 data for this analysis. Table 12 reports the probit results for X only models with both iid disturbances and spatially dependent disturbances. Table 13 includes the average neighbor's value (WX), and reports the probit results for X and WX models with both iid disturbances and spatially dependent disturbances. The variables in Tables 12 and 13 include mortgage information, housing characteristics, and origination year dummies.
Note that the estimates of the spatial dependence parameter in the two tables have relatively high values of 0.6329 and 0.6444, respectively. More importantly, the t statistics associated with the spatial dependence parameter are the highest among the explanatory variables in the corresponding regressions. This indicates that the spatial dependence in the error term may have a larger impact on model fit than the other included independent variables. The pseudo-R2 increases from 0.1658 to 0.1938 for the X only model and from 0.2027 to 0.2302 for the X and WX model. Modeling spatially dependent disturbances improves model performance because the spatial model picks up location related omitted variable information.

The housing and loan information of the neighbors also helps increase model explanatory power. The pseudo-R2 increases from 0.1658 to 0.2027 for the iid model and from 0.1938 to 0.2302 for the spatial model. Neighbors' mortgage choices have the same direction of impact on house sales as the homeowner's own mortgage choices.
Table 12: Spatial and Independent Probit Parameter Estimates - X Only

               iid Model                     Spatial Model
Variable       Estimates      t              Estimates      t
Intercept      −0.8502        −37.4570       −0.9563        −36.1206
House Age      0.0534         9.6566         0.0643         9.5807
Bath           0.0078         1.4176         0.0099         1.5348
Sqft           −0.0173        −2.4176        −0.0202        −2.3924
Land           −0.0004        −0.0651        −0.0054        −0.7642
Quality        −0.0032        −0.5513        −0.0018        −0.2655
LTV            0.0694         9.5163         0.0730         9.1612
FRM            −0.0912        −8.6349        −0.0951        −7.9746
Refi           −1.5554        −70.5548       −1.6455        −61.2014
OrigY2001      −0.0644        −2.4831        −0.0677        −2.2664
OrigY2002      −0.0921        −3.7691        −0.1030        −3.6548
OrigY2003      −0.1648        −7.1958        −0.1740        −6.5853
OrigY2004      −0.2279        −10.0341       −0.2394        −9.1174
OrigY2005      −0.6696        −29.2842       −0.7374        −27.8220
ρ                                            0.6329         69.1508
Log-Lik        −48,766.3011                  −47,126.4812
Pseudo R2 (%)  16.5770                       19.3822
Table 13: Spatial and Independent Probit Parameter Estimates - X and WX

               iid Model                     Spatial Model
Variable       Estimates      t              Estimates      t
Intercept      −0.5095        −20.1145       −0.5432        −18.4456
House Age      0.2821         12.8701        0.3378         12.6896
Bath           0.0060         1.0002         0.0071         0.8668
Sqft           0.0062         0.6574         0.0094         0.7202
Land           −0.0075        −0.5786        −0.0100        −0.5354
Quality        0.0498         3.0954         0.0575         2.6042
LTV            0.0506         6.6944         0.0572         6.6060
FRM            −0.0815        −7.4565        −0.0867        −6.7131
Refi           −1.5452        −69.3564       −1.7149        −63.3110
W·House Age    −0.2362        −10.6841       −0.2865        −10.6042
W·Bath         −0.0002        −0.0262        0.0031         0.2332
W·Sqft         0.0102         0.8647         0.0042         0.2462
W·Land         0.0355         2.5356         0.0410         2.0279
W·Quality      −0.0635        −3.7791        −0.0715        −3.0889
W·LTV          0.0456         4.5389         0.0511         3.9851
W·FRM          −0.0225        −1.4508        −0.0264        −1.3218
W·Refi         −0.9798        −48.3668       −1.1156        −39.6832
OrigY2001      −0.0650        −2.4463        −0.0671        −2.1796
OrigY2002      −0.0830        −3.3214        −0.0935        −3.2227
OrigY2003      −0.1604        −6.8490        −0.1689        −6.2082
OrigY2004      −0.2395        −10.3071       −0.2505        −9.2689
OrigY2005      −0.6860        −29.2867       −0.7538        −27.6379
ρ                                            0.6444         69.4198
Log-Lik        −46,610.1264                  −45,001.8578
Pseudo R2 (%)  20.2655                       23.0167
To summarize the fits of the various models investigated in this study (iid vs. spatial probit, own characteristics vs. neighbor characteristics, and housing variables vs. loan characteristics), Table 14 reports the model fit statistics (log-likelihood and pseudo-R2) of the various models for Clark county in 2006. The results clearly show that mortgage information, neighbor characteristics, and allowing spatially dependent errors all materially increase the predictability or the explanatory power of the house sales models. For example, the pseudo-R2 increases from 0.0076 to 0.0514 when allowing a spatially correlated error term in the house variables only model (Model 1, from iid to spatial model). The pseudo-R2 increases from 0.0076 to 0.1434 when adding mortgage information to the iid model (from Model 1 to Model 3, iid, X only model). Adding neighbor characteristics increases the pseudo-R2 from 0.1434 to 0.1806 (Model 3, iid, from X only to X and WX model). Overall, this research shows that by enhancing the house sales model with loan characteristics, a spatially dependent error term, and neighbor information, the model fit as measured by pseudo-R2 increases from 0.0076 to 0.2302, a more than 30-fold improvement in model fit performance.
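The fit statistics in Table 14 can be cross-checked with a little arithmetic. If the pseudo-R2 is the McFadden measure, 1 − LL/LL0 (an assumption; the manuscript does not state the definition), then every row should imply the same intercept-only log-likelihood LL0 = LL / (1 − R2). The sketch below applies this to four of the reported (log-likelihood, pseudo-R2) pairs; the helper name is hypothetical.

```python
def implied_null_loglik(loglik, pseudo_r2_percent):
    """Back out LL0 from LL and a McFadden pseudo-R2 given in percent."""
    return loglik / (1.0 - pseudo_r2_percent / 100.0)


# (log-likelihood, pseudo-R2 in %) pairs as reported in Table 14.
table14 = {
    "Model 1, iid, X only": (-58011.2100, 0.7620),
    "Model 2, iid, X only": (-50264.5860, 14.0139),
    "Model 4, iid, X only": (-48766.3011, 16.5770),
    "Model 4, spatial, X and WX": (-45001.8578, 23.0167),
}

implied = {name: implied_null_loglik(ll, r2) for name, (ll, r2) in table14.items()}

# Ratio of the best to the weakest fit quoted in the text.
fold_improvement = 23.0167 / 0.7620
```

The implied values of LL0 agree to within rounding across these rows, which is consistent with a McFadden-type measure computed against a common intercept-only model.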
Table 14: Summary Results: Model Fit Statistics of Spatial vs. iid Probit Models, House Characteristics vs. Loan Characteristics, and own characteristics vs. neighbor characteristics Models

               iid Model                         Spatial Model
               Log-Lik         Pseudo R2 (%)     Log-Lik         Pseudo R2 (%)

Model 1: House Only (X=House Variables)
X only         −58,011.2100    0.7620            −55,453.4300    5.1375
X and WX       −57,961.7100    0.8467            −55,423.6000    5.1885

Model 2: Loan Only (X=Loan Variables)
X only         −50,264.5860    14.0139           −48,621.9553    16.8239
X and WX       −48,186.7911    17.5683           −46,591.8214    20.2968

Model 3: House/Loan 1 (X=House and Loan Variables)
X only         −50,075.0876    14.3381           −48,491.9071    17.0464
X and WX       −47,900.1440    18.0587           −46,334.6184    20.7368

Model 4: House/Loan 2 (X=House and Loan Variables and Orig Year Dummies)
X only         −48,766.3011    16.5770           −47,126.4812    19.3822
X and WX       −46,610.1264    20.2655           −45,001.8578    23.0167
5 Conclusion

The factors influencing the probability of an individual property selling are of interest in a number of areas of real estate research. Historically, at the individual house level, more research has gone into the influence of property characteristics or macroeconomic variables on the probability of sale than into other factors. Despite these efforts, results from this research have been inconclusive, as property characteristics do not seem to explain much of the empirical pattern of transactions. The mortgage literature, however, has focused on purely financial variables to model mortgage terminations. This manuscript combines property and mortgage variables to model the probability of a house transaction. We also investigate whether house sales are spatially interdependent, or in other words, whether location matters in house sales.

We find that both mortgage and location information materially increase the explanatory power of the house sale model. Mortgage information relates to seller characteristics, especially the individual's expected tenure in the house. This is revealed by the choices that borrowers make when they choose products whose benefits lie in the future. Explicitly modeling location helps pick up omitted explanatory variables and measurable neighbors' characteristics. Taken together, the property, mortgage, and location information yields empirical pseudo-R2 of up to 23%, a great improvement over modeling using property characteristics alone, which can result in pseudo-R2 levels of under 1%. The results held up across a particularly large cycle in Las Vegas and Maricopa in each year from 2006 to 2010, and for a variety of sub samples.

The addition of the mortgage information changes the significance levels of the property variable estimates. The results show that property characteristics may not aid in modeling mortgage termination, but mortgage variables definitely need to be included when modeling the probability of housing transactions in other contexts. Since the data come from public records, it is possible to improve the house sale model on a nationwide scale.
References Artle, R. and P. Varaiya (1978). Life cycle consumption and
CR IP T
homeownership. Journal of Economic Theory 18 (1), 38–58.
Brueckner, J. K. and J. R. Follain (1988). The rise and fall of the arm: An econometric analysis of mortgage choice. The Review of Economics and Statistics, 93–102.
Spatial lock-in: Do falling house prices
AN US
Chan, S. (2001).
constrain residential mobility? nomics 49 (3), 567–586.
Journal of Urban Eco-
Chambers, M. S., C. Garriga, and D. Schlagenhauf (2009). The loan structure and housing tenure decisions in an equilibrium model of mortgage choice. Review of Economic Dynamics 12 (3), 444–468.
Clapp, J. M., G. M. Goldberg, J. P. Harding, and M. LaCour-Little (2001). Movers and shuckers: Interdependent prepayment decisions. Real Estate Economics 29 (3), 411–450.
Deng, Y., J. M. Quigley, and R. Van Order (2000). Mortgage terminations, heterogeneity and the exercise of mortgage options. Econometrica 68 (2), 275–307.
Fisher, J., D. Gatzlaff, D. Geltner, and D. Haurin (2003). Controlling for the impact of variable liquidity in commercial real estate price indices. Real Estate Economics 31 (2), 269–303.
Flavin, M. and T. Yamashita (1998). Owner-occupied housing and the composition of the household portfolio over the life cycle. National Bureau of Economic Research.
Fortowsky, E., M. LaCour-Little, E. Rosenblatt, and V. Yao (2011). Housing tenure and mortgage choice. The Journal of Real Estate Finance and Economics 42 (2), 162–180.
Fu, Y. and W. Qian (2014). Speculators and price overreaction in the housing market. Real Estate Economics 42 (4), 977–1007.
Gatzlaff, D. H. and D. R. Haurin (1997). Sample selection bias and repeat-sales index estimates. The Journal of Real Estate Finance and Economics 14, 33–50.
Gatzlaff, D. H. and D. R. Haurin (1998). Sample selection and biases in local house value indices. Journal of Urban Economics 43 (2), 199–222.
Hanson, A., K. Schnier, and G. K. Turnbull (2012). Drive ’til you qualify: Credit quality and household location. Regional Science and Urban Economics 42 (1), 63–77.
Heckman, J. J. (1979). Sample selection bias as a specification error (with an application to the estimation of labor supply functions). Econometrica 47 (1), 153–161.
Johnson, K. H., J. D. Benefield, and J. A. Wiley (2007). The probability of sale for residential real estate. Journal of Housing Research 16 (2), 131–142.
Jud, D. G. and T. G. Seaks (1994). Sample selection bias in estimating housing sales. Journal of Real Estate Research 9 (3), 289–298.
Kau, J. B. and D. C. Keenan (1995). An overview of the option-theoretic pricing of mortgages. Journal of Housing Research 6 (2), 217–244.
Krupka, D. J. (2008). The stability of mixed income neighborhoods in America.
LeSage, J. P. and R. K. Pace (2009). Introduction to spatial econometrics. Chapman & Hall/CRC.
Munneke, H. J. and B. A. Slade (2000). An empirical study of sample-selection bias in indices of commercial real estate. The Journal of Real Estate Finance and Economics 21, 45–64.
Ong, S. E. (2000). Prepayment risk and holding period for residential mortgages in Singapore-evidence from condominium transactions data. Journal of Property Investment and Finance 18 (6), 586–602.
Ortalo-Magné, F. and S. Rady (2006). Housing market dynamics: On the contribution of income shocks and credit constraints. Review of Economic Studies 73 (2), 459–485.
Pace, R. K. and J. P. LeSage (2016). Fast simulated maximum likelihood estimation of the spatial probit model capable of handling large samples. Advances in Econometrics, Volume 37, Emerald.
Qian, W. (2012). Why do sellers hold out in the housing market? An option-based explanation. Real Estate Economics.
Stanton, R. and N. Wallace (1998). Mortgage choice: What’s the point? Real Estate Economics 26 (2), 173–205.
Yatchew, A. and Z. Griliches (1985). Specification error in probit models. Review of Economics and Statistics 67 (1), 134–139.
Zhu, S. and R. K. Pace (2014). Modeling spatially interdependent mortgage decisions. The Journal of Real Estate Finance and Economics 49 (4), 598–620.