The influence of house, seller, and locational factors on the probability of sale


Accepted Manuscript

The Influence of House, Seller, and Locational Factors on the Probability of Sale

R. Kelley Pace, Shuang Zhu

PII: S1051-1377(18)30064-0
DOI: https://doi.org/10.1016/j.jhe.2018.09.009
Reference: YJHEC 1609

To appear in: Journal of Housing Economics

Received date: 21 March 2018
Accepted date: 20 September 2018

Please cite this article as: R. Kelley Pace, Shuang Zhu, The Influence of House, Seller, and Locational Factors on the Probability of Sale, Journal of Housing Economics (2018), doi: https://doi.org/10.1016/j.jhe.2018.09.009

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


R. Kelley Pace
LREC Endowed Chair of Real Estate
Department of Finance
E.J. Ourso College of Business Administration
Louisiana State University
Baton Rouge, LA 70803-6308
OFF: (225)-578-6256
[email protected]

and

Shuang Zhu
Assistant Professor
Department of Finance
Kansas State University
Manhattan, KS 66506
[email protected] 1

September 24, 2018

1 The authors would like to thank Geoffrey Turnbull and Zsuzsa Huszari for their insightful comments. We appreciate the helpful comments from other participants in the 2017 FSU-UF Critical Issues in Real Estate Symposium. All errors are our own.


Abstract

The ability to model the probability of individual house sales would be of benefit in a number of real estate contexts. The existing housing literature models the probability of a house sale using mainly property characteristics or macroeconomic variables. However, the use of only property characteristics typically yields a poor model fit. This study investigates the relative roles of property characteristics, seller mortgage origination variables, and locational factors in explaining the probability of individual house sales. Homeowner information such as the expected time horizon of staying in the house and the stage in the life cycle should have an important impact on house sale decisions. Although obtaining such information is typically difficult, mortgage choices made at loan origination may help reveal it. In addition, the measurable characteristics of neighbors and the geographic clustering of house sales, ceteris paribus, may aid in modeling house sales. Incorporating property characteristics, mortgage origination variables, and locational factors increases the pseudo-R2 of a probit model from under 1 percent when using only property characteristics to over 23 percent when using property, mortgage, and spatial information. The results are consistent through a particularly large housing cycle in both Clark County in Nevada and Maricopa County in Arizona from 2006 to 2010, and for a variety of subsamples.

1 Introduction

Explaining whether an individual house sold or not over some time interval is of interest in at least three areas of real estate research. First, if houses sold are not a random sample from the population of all houses, sample selection bias could be an issue. Inferences drawn from the selected sample of sold houses could lead to biased parameter estimates and erroneous inferences about either the contribution of individual characteristics or the trajectory of real estate prices over time, such as measured by a transaction-price-based house price index (e.g., Gatzlaff and Haurin, 1997, 1998; Fisher et al., 2003).

Second, a house sale automatically results in the prepayment of the mortgage because of the due-on-sale clauses in mortgages. This form of mortgage termination competes with other mutually exclusive mortgage decisions such as default, refinancing, or payment (e.g., Kau and Keenan, 1995; Deng et al., 2000; Clapp et al., 2001; Ong, 2000). Therefore, factors that affect the sale of the house also affect these other decisions and thus the modeling of these decisions.

Third, housing transactions provide another measure of the state of the market. For example, the Pending Sales Index from the National Association of Realtors (NAR) is widely viewed as an indicator of housing market conditions. Like the MSA Median Price index from the NAR, this index is not adjusted for variables that differ across markets. This suggests that the aggregate house sale probability for submarkets, after controlling for important factors, could also be used as an improved measure of real estate market liquidity.

Despite the importance of housing sales, papers that explicitly model the probability of house sales, especially at the individual property level, are rare (Johnson et al., 2007). At the aggregate level, the literature indicates that market conditions may influence housing market transactions (e.g., Qian, 2012; Fu and Qian, 2014; Chan, 2001). While macroeconomic conditions are important factors affecting house sales, this study focuses on the contribution of individual property and borrower characteristics to individual-level house sales.1

Property-level house sale models normally appear as the first stage regression of the Heckman two-stage model in the house price index sample selection literature (e.g., Gatzlaff and Haurin, 1997, 1998; Fisher et al., 2003). In the residential real estate literature, house sale models rely mainly on housing variables. Typically, it has been difficult to obtain a good fit of the probability of a transaction as a function of housing variables. As the fit becomes worse and worse, a probit or logit model will basically default to showing that each property has a 1/n probability of sale, and this indicates either that sample selection is not a problem (e.g., Munneke and Slade, 2000; Jud and Seaks, 1994) or that the empirical approach taken is inadequate to address the issue.2

1 Our data are on the seller side. The current seller was previously a mortgage borrower, having purchased or refinanced the property at some time in the past. We therefore use seller and borrower interchangeably in this paper.

To understand the low effect of property characteristics on the probability of a house sale, we introduce a simple theoretical model. This model shows that if potential buyers and sellers place similar values on characteristics such as living area or house age, property characteristics by themselves may not materially affect the probability of a sale.

Mortgage borrowers/homeowners make house sale decisions. Their characteristics, such as their time horizon of staying in the house or life cycle stage, should play a role in the house sale decision (e.g., Ortalo-Magné and Rady, 2006).3 Although such information is typically difficult to obtain, it is reasonable to assume that mortgage choices made at loan origination should reflect borrower housing preferences (Chambers et al., 2009). Public mortgage records provide a convenient source of such origination variables. For example, public records contain information on whether individuals refinance. Everything else being the same, individuals who expect to continue to stay in the same house are more likely to take advantage of market refinance opportunities than individuals who expect to move in the near future. Thus, a recently refinanced borrower is more likely to stay in the house for some time. Similarly, hybrid adjustable rate mortgages normally have a lower contract rate for initial payments, but also a higher interest rate risk in the long run. Individuals are more likely to select such products if they expect to stay in the house for a shorter time period, holding everything else constant. Therefore, financial choices made by homeowners reveal some information about their anticipated future tenure in the house.

2 A feature of the probit model is that for identification the variance of the disturbances is normalized to 1. This means that the error variance is always the same in probit, but that the fit shows up in the variance of the estimated index. A poor fit will result in parameter estimates close to zero, and the variance of the estimated index will be low (Yatchew and Griliches, 1985). Therefore, omitted variables in probit, even if they are unrelated to the included variables, enter into the disturbance term prior to normalization, and after normalization result in a shrinkage of the parameter estimates on the included variables. Of course, if the omitted variables are related to the included variables, this leads to bias. This suggests that it will be helpful to augment property characteristic variables with other variables that affect the probability of sale in the first-stage probit regression.

3 In the intertemporal setting, macroeconomic variables are found to explain house sales over time (e.g., Fisher et al., 2003). Chan (2001) and Brueckner and Follain (1988) show that the ARM-FRM choice is a function of residential mobility. Fortowsky et al. (2011) also document that mortgage types such as FRM or ARM help explain house tenure. Stanton and Wallace (1998) argue that mortgage choice with or without points helps reveal buyer type.

In addition, we investigate whether location matters in the house sales decision by explicitly modeling both the measurable characteristics of neighbors and the locationally related omitted variables (as captured by the geographic clustering of sales). Some of these factors have been discussed in the housing tenure literature. However, whether and to what extent these pre-determined variables could help explain individual house sales in a given time period is not yet clear.4 For example, although location is an important factor in housing decisions, it was until recently thought almost impossible to implement a formal spatial probit model empirically because of the computational challenge (Pace and LeSage, 2016).

Using public transaction data for single family properties from Clark County in Nevada and Maricopa County in Arizona, this manuscript examines the performance of models of the probability of sale as a function of property characteristics, loan characteristics,5 and location information. In a cross sectional setting, we investigate whether and how these variables affect house sales. We find that mortgage variables greatly enhance the explanatory power of the probit model. For example, in year 2010, property characteristics alone do not perform well, with a pseudo-R2 of 0.42 percent, while loan characteristics alone give a pseudo-R2 of 17.79 percent. Using both property and loan variables yields a pseudo-R2 of 21.00 percent. Many of the inferences concerning the property characteristics change after adding the mortgage information. The results hold well for different time periods, different areas, and a variety of subsamples. The results also indicate that the coefficients of the house sale model vary from year to year.

4 Krupka (2008) finds that neighborhoods with mixed income tend to be less stable.

5 Loan characteristics are pre-determined variables from the past loan origination of the previous borrower/current seller.
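For readers who want to compute fit statistics of this kind, the sketch below derives a pseudo-R2 from fitted sale probabilities. It assumes McFadden's definition, 1 - lnL(model)/lnL(null); the text does not state which pseudo-R2 variant is used, and the function names and toy numbers are hypothetical.

```python
from math import log

def bernoulli_loglik(y, p):
    """Log-likelihood of 0/1 outcomes y under fitted probabilities p."""
    return sum(log(pi) if yi == 1 else log(1.0 - pi) for yi, pi in zip(y, p))

def mcfadden_pseudo_r2(y, p_model):
    """McFadden pseudo-R2 (an assumed definition): 1 - lnL(model) / lnL(null)."""
    share_sold = sum(y) / len(y)            # intercept-only model predicts the sample share
    ll_null = bernoulli_loglik(y, [share_sold] * len(y))
    ll_model = bernoulli_loglik(y, p_model)
    return 1.0 - ll_model / ll_null

# Toy data (hypothetical): four properties, one of which sold.
sold = [0, 0, 0, 1]
uninformative = [0.25, 0.25, 0.25, 0.25]    # every property gets the sample share
informative = [0.05, 0.05, 0.10, 0.80]      # fitted probabilities that separate the sale

print(mcfadden_pseudo_r2(sold, uninformative))
print(mcfadden_pseudo_r2(sold, informative))
```

A model that assigns every property the sample share of sales scores zero under this definition, which is the sense in which a poorly fitting probit "defaults" to an uninformative prediction.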

Locational factors, as captured by both neighborhood information and a spatially dependent disturbance term, are also important in determining house sales. For example, in year 2006, for the loan variables only model, the pseudo-R2 increases from 14.01 percent for the iid model to 16.82 percent for the spatial error dependent model, and to 20.30 percent when including neighborhood information in the spatial model.

In summary, this study investigates at the individual property level the relative roles played by property characteristics, seller mortgage origination variables, and location information in determining cross sectional house sales. Our theoretical model helps explain why housing characteristics have little explanatory power. Overall, our results show that mortgage and location information together increase the house sale model fit, as measured by pseudo-R2, from 0.76 percent to 23.02 percent, which is an over 30-fold improvement from the housing characteristics only iid model.

This paper is organized as follows. In Section 2, we introduce a simple theoretical model of the probability of house transactions that incorporates property and seller/buyer characteristics, as well as a spatial econometric model that explicitly incorporates location related variables. In Section 3 we describe our data, gleaned from multiple public record sources, and we present the empirical results in Section 4. We summarize the key findings in Section 5.

2 Model

Section 2.1 introduces a simple model to theoretically illustrate the relative importance of housing and mortgage information in explaining house sales. Section 2.2 introduces the spatial econometric model, which formally models both the measurable neighbor information and the location related omitted variables.

2.1 Mortgage Choices and House Sales

In this section, we introduce a simple model that incorporates property, seller, and buyer characteristics.6 The model illustrates why housing characteristics may not aid much in the performance of models of housing transactions. Similarly, we explain why borrower characteristics can enhance the performance of models of the probability of a housing transaction.

In selecting the explanatory variables, we wish to ascertain the strength of various factors that theoretically could affect the probability that a property transacts ($y = 1$ if it sells, $y = 0$ if it does not) in a certain period. For such a transaction to occur, the seller must have a latent reservation price $y_s^*$ that is below the buyer's latent reservation price $y_b^*$ as in (1). Alternatively, if the seller's latent reservation price exceeds the buyer's latent reservation price, the transaction does not occur ($y = 0$) as in (2).

$$y_b^* - y_s^* \geq 0 \rightarrow y = 1 \tag{1}$$

$$y_b^* - y_s^* < 0 \rightarrow y = 0 \tag{2}$$

6 The model assumes a cross sectional setting in a single area such as the same MSA, so that all observations face the same macroeconomic variables in a single year. This allows us to focus on the impact of the property-level variables.

The latent buyer and seller reservation prices in (3) and (4) depend upon the buyer-side characteristics $b$ and seller-side characteristics $s$, where $b$ and $s$ are row-vectors and $\gamma$ and $\phi$ are conformable column vectors. There is a scalar buyer disturbance $\xi_b$ and a scalar seller disturbance $\xi_s$.

$$y_b^* = b\gamma + \xi_b \tag{3}$$

$$y_s^* = s\phi + \xi_s \tag{4}$$

Putting together (3) and (4) shows that the difference in reservation prices depends on the various characteristics ($b$ and $s$), their respective coefficients ($\gamma$ and $\phi$), and a random component ($\xi_b$ and $\xi_s$) as shown in (5). We can further simplify the model since the difference between two random normal variables is also a random normal variable (although with a different variance) as shown in (6). Alternatively, if the disturbances have the same variances and follow a Gumbel distribution, the difference between the two disturbances follows a logistic distribution. Substituting (6) in (5) leads to the simpler (7).

$$y_b^* - y_s^* = [b\gamma - s\phi] + [\xi_b - \xi_s] \tag{5}$$

$$\xi = \xi_b - \xi_s \tag{6}$$

$$y_b^* - y_s^* = [b\gamma - s\phi] + \xi \tag{7}$$

The common feature of the buyer side and the seller side is the house itself, with characteristics contained in the row-vector $h$. However, the buyer may place a value $\gamma_h$ on the characteristics $h$ while the seller may place a value $\phi_h$ on the characteristics $h$. Both the buyer and seller have their own characteristics $c_b$ and $c_s$ with values given by $\gamma_c$ and $\phi_c$ as shown in (8) to (11).

$$b = [\, h \;\; c_b \,] \tag{8}$$

$$b\gamma = h\gamma_h + c_b\gamma_c \tag{9}$$

$$s = [\, h \;\; c_s \,] \tag{10}$$

$$s\phi = h\phi_h + c_s\phi_c \tag{11}$$

Substituting (9) and (11) into (7) yields (12). Note that if the values placed on the house characteristics $h$ by the buyer ($\gamma_h$) and the seller ($\phi_h$) are identical, the house characteristics will not affect whether the house sells. In that case only buyer and seller characteristics and their values, along with random disturbances, determine whether the transaction takes place. Moreover, the difference between the two disturbances captured by $\xi$ from (6) has a larger variance than the disturbance of either the buyer or the seller (assuming arm's-length dealings and independence between the two disturbances). Specifically, $\sigma_{\xi}^2 = \sigma_{\xi_b}^2 + \sigma_{\xi_s}^2$, and so the process of differencing in (12) usually reduces the magnitude of the signal while increasing the magnitude of the noise ($\xi$).

$$y_b^* - y_s^* = h(\gamma_h - \phi_h) + [c_b\gamma_c - c_s\phi_c] + \xi \tag{12}$$
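A small numerical sketch may help fix ideas about (12). It treats $h$ as a single scalar characteristic and assumes independent normal disturbances, so that the sale probability is the normal CDF of the signal with standard deviation $\sigma_\xi$. The function name, argument names, and parameter values are hypothetical and are not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist

def sale_probability(h, gamma_h, phi_h, buyer_part, seller_part,
                     sd_buyer=1.0, sd_seller=1.0):
    """Pr(y = 1) implied by (12) for a scalar house characteristic h.

    buyer_part and seller_part stand for c_b * gamma_c and c_s * phi_c.
    """
    signal = h * (gamma_h - phi_h) + (buyer_part - seller_part)
    # Independent disturbances: the variance of xi_b - xi_s is the sum of variances.
    sd_xi = sqrt(sd_buyer ** 2 + sd_seller ** 2)
    return NormalDist(mu=0.0, sigma=sd_xi).cdf(signal)

# Equal valuations: the house characteristic drops out of the sale probability.
p_small = sale_probability(h=1.0, gamma_h=0.5, phi_h=0.5, buyer_part=0.2, seller_part=0.4)
p_large = sale_probability(h=5.0, gamma_h=0.5, phi_h=0.5, buyer_part=0.2, seller_part=0.4)
print(p_small, p_large)

# Buyers value the characteristic slightly more: larger h now raises the probability.
p_diverge = sale_probability(h=5.0, gamma_h=0.6, phi_h=0.5, buyer_part=0.2, seller_part=0.4)
print(p_diverge)
```

With identical valuations the two printed probabilities coincide regardless of $h$; only when $\gamma_h$ and $\phi_h$ differ does the house characteristic move the probability.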

To implement the model via probit estimation, let $y_i^*$ represent the latent difference in reservation prices between the buyer and the seller for the $i$th property as shown in (13). Therefore, $y^*$ is the overall $n$ by 1 vector of latent differences in reservation prices across all $n$ observations. The $n$ by $p_1$ matrix $H$ specifies all the property characteristics across all $n$ observations, while the $n$ by $p_2$ matrix $C$ in (14) contains the buyer and seller characteristics. The $n$ by $p_3$ matrix $A$ contains a vector of ones and other dichotomous variables such as those for time or regions, with coefficients contained in the $p_3$ by 1 parameter vector $\alpha$. Therefore, the difference in reservation prices across all $n$ properties depends on the intercept and dichotomous variable terms, the characteristics of each house, the buyer and seller characteristics, and the random disturbances in the $n$ by 1 vector $\varepsilon$ as shown in (15). The disturbance variance in probit is not identified, and so is always set to 1 as in (16). Given these assumptions, the probability of observing a transaction for the $i$th property equals the probability that the latent index is at least 0, as in (17).

$$y_i^* = y_b^* - y_s^* \tag{13}$$

$$C = [\, C_b \;\; C_s \,] \tag{14}$$

$$y^* = A\alpha + H\beta + C\delta + \varepsilon \tag{15}$$

$$\varepsilon \sim N(0, I_n) \tag{16}$$

$$\Pr(y_i = 1) = \Pr(y_i^* \geq 0) \tag{17}$$
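The normalization in (16) has a practical consequence that a short calculation can illustrate. Suppose the true latent index also contains an omitted term that is normally distributed and independent of the included regressors. Under that assumption the composite disturbance has variance $1 + \sigma^2$, and rescaling it back to 1 divides every included coefficient by $\sqrt{1 + \sigma^2}$ (the rescaling result discussed in Yatchew and Griliches, 1985). The function name and the numbers below are hypothetical.

```python
from math import sqrt

def attenuated_coefficient(beta, omitted_variance):
    """Coefficient a probit recovers when an independent, normally distributed
    omitted term with the given variance is absorbed into the disturbance."""
    if omitted_variance < 0:
        raise ValueError("variance must be non-negative")
    return beta / sqrt(1.0 + omitted_variance)

true_beta = 0.30
for omitted_variance in (0.0, 1.0, 3.0, 8.0):
    print(omitted_variance, attenuated_coefficient(true_beta, omitted_variance))
```

The larger the variance of what is left out, the closer the recovered coefficient is pulled toward zero, even though the omitted term is unrelated to the included variable.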

Housing characteristics $H$ may not matter much in probit estimation of $\beta$ for a couple of reasons. First, $\beta$ may be close to 0. As mentioned earlier, this could occur if buyers and sellers give equal values to the housing characteristics in determining their reservation prices. Although one can easily point to individual cases where a buyer and seller diverge on the valuation of a particular characteristic, an empirical model averages over all of the observations, and it becomes harder to imagine that buyers and sellers on average value characteristics differently. Second, coefficients in probit suffer from attenuation bias if omitted variables (even uncorrelated ones) have a large variance. Since the overall variance of the disturbances is set to 1, omission of variables attenuates the probit parameter estimates. Therefore, it is important to introduce factors to measure or proxy for buyer and seller characteristics in $C$ and other characteristics in $A$ to avoid attenuation bias in $\beta$. Similarly, in modeling mortgage termination, not having property, buyer, or other characteristics could result in attenuation bias of the coefficients on the mortgage characteristics (which pertain to the potential seller). In using the probability of sale as a measure of market liquidity, it would be ideal to control for as many factors as possible. Otherwise, estimates associated with the temporal dichotomous variables would be affected by the changing mix of property characteristics, buyer/seller characteristics, and macroeconomic variables over time.

2.2 Locational Aspects of House Sales

In this section we briefly set forth the motivation for including neighborhood variables in Section 2.2.1 and spatially dependent disturbances in Section 2.2.2. Taken together, spatial considerations result in four models of interest in Section 2.2.3. Essentially, this approach follows that of Zhu and Pace (2014), who examined the spatial aspects of the probability of default.

2.2.1 Neighborhood Characteristics and House Sales

As mentioned earlier, much of the previous literature has found that property characteristics do not help much in predicting house sales. However, property characteristics may only have meaning in the context of individual markets or neighborhoods. For example, a 3,000 square foot dwelling on Manhattan Island in New York could have quite different liquidity than a 3,000 square foot dwelling in Manhattan, Kansas. Therefore, adding the neighborhood average of the housing characteristics $X$ via the operation $WX$, where $W$ is an $n$ by $n$ spatial weight matrix, could augment the explanatory power of the models. From a mortgage perspective, Hanson et al. (2012) find evidence that credit quality is spatially correlated. This suggests that the LTV ratio and refinance activities could be spatially correlated as well. Thus, neighbors' financing activity could possibly have an impact on the house sale.

The spatial weight matrix $W$ contains positive elements whenever observations $i$ and $j$ are neighbors and zeros otherwise. By convention, $W_{ii} = 0$ for $i = 1, \ldots, n$, and therefore observations are not allowed to be neighbors to themselves. Neighbors could be specified by cardinal distances (e.g., within 0.25 miles) or ordinal distances (e.g., six nearest neighbors).7

Augmenting the model to contain both characteristics and their neighborhood averages produces (18).

$$\eta = X\beta + WX\theta + \varepsilon, \quad \varepsilon \sim N(0, I_n) \tag{18}$$

$$\eta_i \geq 0 \rightarrow y_i = 1 \tag{19}$$

$$\eta_i < 0 \rightarrow y_i = 0 \tag{20}$$

2.2.2 Spatial Omitted Variables and House Sales

In statistical models the disturbance term $\varepsilon$ captures the effect of the many omitted and possibly unobservable variables that affect almost any outcome. If these omitted influences are independent, and no one influence dominates the others, this leads to specification of the disturbances as following an independent multivariate normal distribution so that $\varepsilon \sim N(0, I_n)$.

In the context of real estate, however, many of the omitted or unobservable variables have a spatial nature. Variables such as accessibility, noise, architecture, landscaping, functionality of neighborhood associations, and maintenance are difficult to observe and are either spatial or produce spatial externalities. This leads to spatial dependence in the disturbances, and a common way of modeling this is through the conditional autoregressive (CAR) process so that $\varepsilon \sim N(0, (I_n - \rho W)^{-1})$. The spatial probit model was previously extremely difficult to estimate due to the substantial computational challenges it presents (Pace and LeSage, 2016). Fortunately, the newly developed method of Pace and LeSage (2016) makes estimation possible. This paper is the first house sales paper to use spatial probit.

7 See LeSage and Pace (2009) for more details on the spatial weight matrix and for motivations underlying the use of $WX$ as well as spatially dependent disturbances.

2.2.3 Overall Models

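The spatial and independence models from Sections 2.2.1 and 2.2.2 result in four different probit models as in (21) to (26).

$$\eta = X\beta + \varepsilon, \quad \varepsilon \sim N(0, I_n) \tag{21}$$

$$\eta = X\beta + \varepsilon, \quad \varepsilon \sim N(0, (I_n - \rho W)^{-1}) \tag{22}$$

$$\eta = X\beta + WX\theta + \varepsilon, \quad \varepsilon \sim N(0, I_n) \tag{23}$$

$$\eta = X\beta + WX\theta + \varepsilon, \quad \varepsilon \sim N(0, (I_n - \rho W)^{-1}) \tag{24}$$

$$\eta_i \geq 0 \rightarrow y_i = 1 \tag{25}$$

$$\eta_i < 0 \rightarrow y_i = 0 \tag{26}$$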
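To make the disturbance covariance in (22) and (24) concrete, the sketch below computes $(I_n - \rho W)^{-1}$ for a toy weight matrix using the series $I + \rho W + \rho^2 W^2 + \ldots$, which converges when the rows of $W$ sum to one and $|\rho| < 1$. This is purely illustrative and is not the estimation method of Pace and LeSage (2016); the helper names and the two-observation $W$ are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def car_covariance(w, rho, terms=200):
    """Approximate (I - rho * W)^(-1) by the truncated series sum_k rho^k W^k."""
    n = len(w)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in identity]
    power = [row[:] for row in identity]       # holds rho^k W^k
    scaled_w = [[rho * w[i][j] for j in range(n)] for i in range(n)]
    for _ in range(terms):
        power = matmul(power, scaled_w)
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return total

# Two properties that are each other's only neighbor (zero diagonal, rows sum to 1).
w = [[0.0, 1.0],
     [1.0, 0.0]]
cov = car_covariance(w, rho=0.5)
for row in cov:
    print([round(v, 4) for v in row])
```

Setting `rho=0` returns the identity matrix, which corresponds to the independent disturbances of (21) and (23); a positive `rho` produces nonzero off-diagonal terms, so the disturbances of neighboring properties move together.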

3 Data

This section introduces data and variables. Section 3.1 introduces our data sources and presents how the sample data is constructed. Section 3.2 discusses the variables and summary statistics.

3.1 Data Source and Data Construction

This section describes the data sources, the sample selection, and the data construction. Our analysis focuses on single family residential properties.

We use the public record data from both the Clark County Assessor's Office and the Clark County Recorder's Office in Nevada. We also obtain data from Maricopa County in Arizona for robustness checks. The assessor data contain information on housing characteristics and street addresses. The recorder data contain details on real estate related transactions, including ownership transactions such as arm's-length house sales, foreclosure sales, and quit claim transfers as well as non-ownership transactions such as refinance and home equity loan activities. Our transaction data start from year 2000. The recorder data have detailed information about each transaction such as the dollar value transferred, mortgage information, deed type, and transaction type. Street addresses are geocoded to longitude and latitude using ArcGIS 10.0 Desktop for the spatial econometric analysis. We also use the Case-Shiller house price index (HPI) to estimate a loan-to-value ratio at origination for refinance loans.

Our study requires house sale information, the seller's mortgage information, and property characteristics.8 To obtain mortgage information associated with arm's-length resale transactions, we filter the recorder data by excluding quit claim transfers, construction, and time share transactions.9 For the refinanced/equity loans, we eliminate the line of credit refinance loans and the equity refinance loans.10 An equity refinance is identified as a refinance whose loan amount is less than forty percent of the last resale amount.11 For each year investigated, the mortgage information comes from the latest prior transaction, which could be either a resale or a refinance. In other words, we derive the sellers' loan information, for example in year 2010, from the latest previous transaction, when the current seller either bought the house and took out the mortgage or refinanced. These loan origination dates could go back to year 2000. Thus the sellers' mortgage variables are pre-determined in our analysis.

The sales data are identified by resale transactions.12 We then merge the sales data with the mortgage data and the assessor data for housing characteristics. From here, we exclude the distressed property related transactions, which include foreclosure auction sales, deed-in-lieu foreclosures, and post foreclosure sales (REO).13 Our analysis focuses on normal market sales. We also require a total living area of between 500 square feet and 7,000 square feet, valid mortgage information, and property characteristics. Our sample time period is from 2006 to 2010. Data from 2000 to 2005 are also used for the mortgage data since we need to go back to previous transactions for the loan information. This results in a property-year sample size of 1,237,089.

8 This study does not include buyer information. So the analysis is on the seller's side.

9 Transactions without mortgage information are excluded from this study. This reduces the property-year sample size by 176,996, about 12.5% of the raw sample with 1,414,085 property-year observations. We compared property characteristics for houses with and without mortgages to check whether there is any systematic difference. It seems that houses with mortgages are slightly older and of slightly lower quality than those without mortgage information. Number of bathrooms, total living area, and land are similar between the two groups.

10 While cash out refinance and rate reduction refinance might have different effects on house sales, our data do not contain such information.

11 In other words, a refinance with a loan-to-value ratio lower than forty percent is identified as an equity refinance.

12 We focus on the successful sales, and only successful sales are observed in our data. Investigating how seller mortgage choices at loan origination could affect the listing decisions would be interesting for future research.

13 Distressed sales are taken out of the sample for two reasons: (1) we want to rule out the alternative explanation that the mortgage variables are risk factors of mortgage default, since properties with risky loans are more likely to go into a distress sale; (2) distressed property sales and normal market sales could have quite different explanatory variables or regressions. By focusing on normal market sales, we have a cleaner setting.

3.2 Variables and Summary Statistics

This section discusses variables and some summary statistics. The dependent variable of the probit model is the house sale, which equals 1 if a house sold in a certain year and 0 otherwise. Explanatory variables are housing characteristics, loan information, and spatial averages of the own-observation explanatory variables. Housing characteristics include logged house age (House Age), number of bathrooms (Bath), logged total living area in square feet (Sqft), logged lot size (Land), and an assessor quality rating (Quality) indicating the construction quality of the property, ranging from the lowest quality of one to the highest quality of ten. Mortgage information includes the combined loan-to-value ratio (LTV) at origination (the house value for refinance loans is calculated by using the HPI to update the last resale price), a fixed rate mortgage dummy (FRM), a refinanced loan dummy (Refi), and loan origination year dummies. The mortgage choices at loan origination are used to capture the borrower's characteristics or housing preferences. We include loan origination year dummies to capture the market condition at loan origination. In the cross sectional regressions, the sale year is implicitly controlled for. In the panel regression, we also control for sale year dummies to capture the market condition at the time of the house sale, and for zip code dummies to better control for potential geographical differences. To obtain an even finer control of geographical differences, we add the spatial components to the model. The weight matrix used is a nearest neighbor weight matrix with the six closest neighbors, where the second nearest neighbor has an influence of 0.35 of that of the first nearest neighbor, the third nearest neighbor has an influence of 0.35 of that of the second nearest neighbor, and so forth.14 The summary statistics appear in Table 1. The loan information is at origination.

14 We did not investigate alternative specifications of the spatial weight matrix, but just used the same one as in Zhu and Pace (2014).
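The LTV construction for refinance loans can be sketched as follows. The house value at refinance is taken to be the last resale price scaled by the ratio of the HPI at the refinance date to the HPI at the resale date, which is one reading of "using HPI to update the last resale price". The second helper mirrors the forty percent screen for equity refinances described in Section 3.1. The function names, index levels, and dollar amounts are hypothetical.

```python
def updated_house_value(last_resale_price, hpi_at_resale, hpi_at_refinance):
    """Scale the last resale price by HPI growth between resale and refinance."""
    return last_resale_price * hpi_at_refinance / hpi_at_resale

def refinance_ltv(loan_amount, last_resale_price, hpi_at_resale, hpi_at_refinance):
    """Loan-to-value at refinance origination using the HPI-updated house value."""
    value = updated_house_value(last_resale_price, hpi_at_resale, hpi_at_refinance)
    return loan_amount / value

def is_equity_refinance(loan_amount, last_resale_price, threshold=0.40):
    """Flag a refinance whose loan amount is below 40 percent of the last resale amount."""
    return loan_amount / last_resale_price < threshold

# Hypothetical loan: bought for 200,000 when the index stood at 100,
# refinanced for 180,000 when the index stood at 125.
ltv = refinance_ltv(180_000, 200_000, hpi_at_resale=100.0, hpi_at_refinance=125.0)
print(ltv)
print(is_equity_refinance(60_000, 200_000))
```

In this toy case the updated house value is 250,000, so a 180,000 refinance carries a lower LTV than it would against the original 200,000 purchase price.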

Table 1: Summary Statistics

Variable       Mean       Std Dev    Min        Max
Sold           0.037      0.190      0.000      1.000
House Age      2.476      0.806      0.000      4.691
Bath           2.332      0.701      1.000      9.500
Sqft           7.511      0.346      6.217      8.854
Land           8.756      0.452      6.770      10.588
Quality        4.666      1.123      1.000      10.000
LTV            0.757      0.253      0.001      1.500
FRM            0.655      0.475      0.000      1.000
Refi           0.453      0.498      0.000      1.000
N (in 1000)    1237.089

Note: House Age, Sqft and Land are in logged format.

To obtain an idea of whether borrower mortgage choices are correlated with the house sale probability, we report the sample sizes and the proportions of sold properties across different mortgage types for years 2006 to 2010 in Table 2.15 Several clear patterns arise. First, properties with a FRM have a lower proportion sold than those with an ARM in all years. For example, in 2010, about 7.2 percent of properties with an ARM sold on the market versus only 2.3 percent of properties with a FRM during the same time period. Second, properties with refinanced loans are much less likely to be sold on the market than those with non-refinanced loans. For example, in 2006, less than one percent of refinanced properties sold, while more than 11 percent of properties with non-refinanced loans sold. Third, properties with higher LTV ratio loans also have a much higher percentage sold than properties with lower LTV ratio loans. These results provide preliminary evidence that mortgage choices vary with house sale decisions.

15 The proportion of borrowers using a FRM increases after 2008. This might result from the higher lending standards after the crisis and the low interest rate environment.

Table 2: Sample Size and Proportion Sold by Year and Loan Characteristics (N in Thousands)

              Y2006               Y2007               Y2008               Y2009               Y2010
Sample        N         Sold      N         Sold      N         Sold      N         Sold      N         Sold
Whole Sample  234.103   0.0688    243.132   0.0362    246.881   0.0223    254.270   0.0259    258.703   0.0359
FRM=1         139.017   0.0605    144.320   0.0306    157.125   0.0142    177.989   0.0166    191.951   0.0234
FRM=0         95.086    0.0810    98.812    0.0444    89.756    0.0365    76.281    0.0474    66.752    0.0717
Refi=1        98.587    0.0024    108.560   0.0011    114.831   0.0003    118.370   0.0005    120.630   0.0000
Refi=0        135.516   0.1172    134.572   0.0646    132.050   0.0414    135.900   0.0480    138.073   0.0672
LTV≥0.8       130.348   0.1049    131.184   0.0552    131.352   0.0357    136.438   0.0414    140.453   0.0573
LTV<0.8       103.755   0.0236    111.948   0.0140    115.529   0.0071    117.832   0.0079    118.250   0.0105
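As a consistency check on Table 2, the whole sample proportion sold in a year should equal the sample-size-weighted average of the subgroup proportions under each split. The sketch below works this out for 2006 using the rounded figures from Table 2; the variable and function names are illustrative only.

```python
# Table 2, year 2006: (proportion sold, N in thousands) for each subgroup.
groups_2006 = {
    "FRM":  [(0.0605, 139.017), (0.0810, 95.086)],
    "Refi": [(0.0024, 98.587), (0.1172, 135.516)],
    "LTV":  [(0.1049, 130.348), (0.0236, 103.755)],
}

def pooled_share(parts):
    """Sample-size-weighted average of subgroup proportions."""
    total_n = sum(n for _, n in parts)
    return sum(p * n for p, n in parts) / total_n

pooled = {name: pooled_share(parts) for name, parts in groups_2006.items()}

for name, share in pooled.items():
    # Each split should land near the whole sample value of 0.0688,
    # up to rounding in the reported proportions.
    print(f"{name}: {share:.4f}")
```

All three splits recover a pooled proportion close to the 0.0688 reported for the whole 2006 sample, with small differences attributable to rounding.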

4 Empirical Results

This section presents the empirical results of various house sales models. Section 4.1 focuses on using mortgage choice information to enhance house sales models. Section 4.2 studies whether neighborhood information has an impact on house turnover and summarizes the overall results. For robustness, we used both probit and logit regressions. The results in Section 4.1 are logit regression results, and the results in Section 4.2 are probit regression results.

4.1 Mortgage Choices and House Sales

We estimate a cross sectional logit model with the following general structure.

$$y^* = H\beta + C\delta + A\alpha + \varepsilon \tag{27}$$

$$\varepsilon \sim \text{Logistic}(0, I_n) \tag{28}$$

$$\Pr(y_i = 1) = \Pr(y_i^* \ge 0) \tag{29}$$
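For readers less familiar with the latent variable form, (27) to (29) imply the usual closed-form logit probability. The derivation below is a sketch: it takes the disturbances to be independent standard logistic draws, writes $x_i$ for the $i$-th row of $[H \; C \; A]$ and $\theta$ for the stacked coefficients $(\beta', \delta', \alpha')'$ (shorthand introduced here, not the authors' notation), and uses the symmetry of the logistic distribution with CDF $\Lambda$.

```latex
\begin{aligned}
\Pr(y_i = 1) &= \Pr(y_i^* \ge 0) = \Pr(\varepsilon_i \ge -x_i\theta) \\
             &= 1 - \Lambda(-x_i\theta) = \Lambda(x_i\theta)
              = \frac{1}{1 + \exp(-x_i\theta)}
\end{aligned}
```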

To make the general specification in (27) implementable, we used the empirical specification in (30) to (32).

$$H = \begin{bmatrix} \ln(\text{House Age}) & \text{Bath} & \ln(\text{Sqft}) & \ln(\text{Land}) & \text{Quality} \end{bmatrix} \tag{30}$$

$$C = \begin{bmatrix} \text{LTV} & \text{FRM} & \text{Refi} & \text{Origin Year Dummies} \end{bmatrix} \tag{31}$$

$$A = \text{Intercept} \tag{32}$$

Ideally, we would like to know detailed information on the seller and buyer.16 However, this information would be difficult to obtain. Therefore, we infer some seller information from the mortgage choices made in the past by the existing borrower, who is the seller. Holding everything else constant, homeowners with a longer tenure time horizon are more likely to seek refinancing than homeowners who plan to move within a short time period. Therefore, we infer that those who refinanced possess less motivation to sell their house relative to those who did not refinance.

16 We do not have information on the buyer characteristics, although some of these could be inferred from the mortgage choices they made after buying the house. Currently, we have not matched these records, but this is feasible in future research.

Similarly, individuals who chose a fixed rate loan also revealed their time horizon. Usually, individuals with short time horizons will find ARMs less expensive in the short run. However, ARM borrowers need to bear more interest rate risk in the long run. Therefore, individuals who chose fixed rate mortgages typically have longer expected holding periods. Thus, we anticipate that the FRM variable would have a negative coefficient.

Many house buying decisions are a function of the individual's life-cycle (Ortalo-Magné and Rady, 2006; Artle and Varaiya, 1978; Flavin and Yamashita, 1998). Typically, younger buyers need to borrow a higher proportion of the price as they have fewer financial resources. At the other end of the life-cycle, potential retirees often wish to enter retirement without mortgage debt. Therefore, LTV is likely inversely related to age. In turn, mobility is inversely related to age. We infer that LTV is positively associated with mobility and thus should be positively associated with the probability of sale. Therefore, we anticipate a positive coefficient for the LTV variable.17

Insofar as credit standards, interest rates, and other market conditions change over time, we control for some of these effects through origin year dummy variables. For example, if credit standards are lax in a particular year, loan-to-value ratios could be shifted upwards for most borrowers. Similarly, the relative rates between fixed rate loans and the initial periods of ARM loans differ over time. Using the origin year dummies helps control for some of this variation.

17 Note that falling house prices lead to a higher current LTV, which may hinder mobility (e.g., Chan, 2001). Thus, instead of the current LTV, we use the original LTV to better capture the individual's life-cycle, which also helps avoid the simultaneity issue. The origination year dummies help control for changes in house prices.

In terms of housing characteristics, we would usually anticipate that older houses have more trouble transacting, that houses with too few bathrooms have trouble transacting, that both very small and very large houses in terms of floor area and land area would be less likely to trade, and that low quality houses would have more trouble selling.

Table 3 presents estimates from the logit estimation using observations from 2006. Thus, it reflects the market situation before or shortly into the housing crisis. Several features emerge that repeat in subsequent periods. First, the fit, as measured by pseudo-R2, using only housing characteristics is very low at 0.0079. Although individual housing characteristics such as age, number of baths, size, and land are significant at the one percent level (the critical value is 6.635 for a χ2 with one degree of freedom), this should be assessed in light of the large sample size of 234,103 observations.
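The 6.635 cutoff can be checked, and any single χ2 statistic in the tables converted to a p-value, using only the standard normal relationship for one degree of freedom. The sketch below is illustrative; the function name is hypothetical and the two statistics are taken from the House Only column of Table 3.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variate with one degree of freedom.

    If Z is standard normal, Z**2 is chi-square(1), so
    P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

p_at_cutoff = chi2_sf_1df(6.635)  # one percent cutoff quoted in the text
p_bath = chi2_sf_1df(9.777)       # Bath, House Only model, Table 3
p_quality = chi2_sf_1df(2.941)    # Quality, House Only model, Table 3

print(round(p_at_cutoff, 4), p_bath < 0.01, p_quality < 0.01)
```

The cutoff maps to a tail probability of about 0.01, Bath clears the one percent threshold, and Quality does not, in line with the discussion of Table 3.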

The fit from using only loan characteristics is much better, with a pseudo-R2 of 0.1404. This continues when both property and loan characteristics are combined (House/Loan 1), which raises the pseudo-R2 to 0.1438 (a slight increase over using loan characteristics only). Finally, using property characteristics and loan characteristics that include year of origin dichotomous variables (House/Loan 2) increases the pseudo-R2 to 0.1692.

Many of the housing characteristics fail to remain significant after taking into account the borrower characteristics. In fact, only house age, whose coefficient declines from 0.23 to 0.11, is still individually significant. The borrower characteristics also undergo change when adding loan origin year variables. For example, the fixed rate mortgage variable goes from a significant and positive value to a significant and negative value.18

In the last regression in Table 3, having a larger loan-to-value ratio significantly increases the likelihood of a transaction, while using a fixed rate loan or having a refinanced loan leads to a lower likelihood of a transaction. The origin year variables show a pattern of higher transaction probabilities for those with older loans. Since buying a house also reveals some expectation of longer tenure, those who recently bought seem less likely to sell.

18 This indicates that the House and Loan 2 model is the proper model. The other models are presented to show the contribution of different groups of variables.

Table 3: Logit Regressions on House Sales Year 2006

               House Only          Loan only           House/Loan 1        House/Loan 2
Variable       Estimate  χ2        Estimate  χ2        Estimate  χ2        Estimate  χ2
Intercept      −0.091    0.135     −2.835    2669.045  −2.651    101.703   −1.335    24.142
House Age      0.230     408.427                       0.192     283.967   0.109     83.526
Bath           −0.050    9.777                         0.028     3.321     0.019     1.557
Sqft           −0.195    24.856                        −0.024    0.359     −0.095    5.772
Land           −0.180    52.834                        −0.039    2.424     −0.015    0.378
Quality        0.017     2.941                         0.001     0.006     −0.002    0.057
LTV                                0.776     178.782   0.584     95.459    0.663     117.933
FRM                                0.250     215.116   0.193     123.775   −0.190    87.044
Refi                               −3.888    3292.281  −3.912    3320.126  −3.887    3284.761
OriginY2001                                                                −0.107    5.399
OriginY2002                                                                −0.147    11.461
OriginY2003                                                                −0.278    46.141
OriginY2004                                                                −0.422    107.612
OriginY2005                                                                −1.258    857.373
OriginY2006                                                                −1.613    946.710
Pseudo R2 (%)  0.789               14.041              14.376              16.923

Next we conduct a variety of robustness checks. We first investigate whether the results hold for other time periods. Table 4 through Table 7 present the regression results for years 2007 to 2010, respectively. Like Table 3, each table has four models: house characteristics only (House Only), loan characteristics only (Loan Only), house and loan characteristics (House and Loan 1), and house, loan, and loan origination year dummies (House and Loan 2). The dependent variable is the house sale in that particular year.

The results show similar patterns, with gradual parameter changes over time. For example, the regressions for 2010 in Table 7 show that the property characteristics alone again have little explanatory power (pseudo-R2 of 0.0042) and that explanatory power rises with loan characteristics and origin year variables (pseudo-R2 of 0.2100). Relative to 2006, in 2010 fixed rate mortgages have a greater marginal deterrent effect on sales and loan-to-value has a greater marginal positive effect on sales. Refinancing has a large negative effect on sales, but it appears to have no statistical significance. The reason lies in a lack of variation: in 2010 almost no refinanced properties sold, as shown in the 2010 columns of Table 2. The origin year variables tell a similar story of a low likelihood of transactions from sellers who recently obtained a new loan or refinanced an existing loan. In 2010, it seems that individuals who purchased or refinanced a house in 2008, 2009, or 2010 were much less likely to sell in 2010.

Table 4: Logit Regressions on House Sales Year 2007

               House Only          Loan only           House/Loan 1        House/Loan 2
Variable       Estimate  χ2        Estimate  χ2        Estimate  χ2        Estimate  χ2
Intercept      −2.390    53.989    −2.826    1755.491  −3.975    137.682   −2.635    57.184
House Age      0.221     190.376                       0.204     161.372   0.108     41.407
Bath           −0.065    10.113                        0.014     0.518     0.008     0.186
Sqft           −0.062    1.439                         0.060     1.384     −0.015    0.086
Land           −0.149    21.213                        −0.018    0.303     0.011     0.120
Quality        0.103     68.653                        0.076     36.857    0.079     38.679
LTV                                0.087     1.473     0.067     0.819     0.248     10.800
FRM                                0.159     49.837    0.125     29.832    −0.199    56.965
Refi                               −4.196    1903.586  −4.199    1902.577  −4.116    1835.779
OriginY2001                                                                −0.093    2.167
OriginY2002                                                                −0.250    17.126
OriginY2003                                                                −0.410    52.780
OriginY2004                                                                −0.572    104.940
OriginY2005                                                                −1.126    386.405
OriginY2006                                                                −1.450    574.350
OriginY2007                                                                −1.502    386.558
Pseudo R2 (%)  0.394               12.601              12.903              14.990

Table 5: Logit Regressions on House Sales Year 2008

               House Only          Loan only           House/Loan 1        House/Loan 2
Variable       Estimate  χ2        Estimate  χ2        Estimate  χ2        Estimate  χ2
Intercept      −2.981    53.745    −3.506    1370.006  −5.886    192.618   −5.038    133.361
House Age      −0.036    2.853                         −0.025    1.386     −0.119    29.325
Bath           0.045     2.971                         0.094     15.630    0.089     14.298
Sqft           0.030     0.202                         0.175     7.231     0.163     6.302
Land           −0.169    17.790                        0.024     0.370     0.026     0.436
Quality        0.093     36.812                        0.081     27.683    0.086     30.778
LTV                                0.646     41.132    0.992     89.282    1.153     118.038
FRM                                −0.461    261.449   −0.442    239.264   −0.519    240.802
Refi                               −4.528    793.516   −4.480    778.226   −4.447    767.317
OriginY2001                                                                0.057     0.341
OriginY2002                                                                −0.109    1.386
OriginY2003                                                                −0.486    29.900
OriginY2004                                                                −0.703    65.848
OriginY2005                                                                −0.701    68.227
OriginY2006                                                                −0.756    77.277
OriginY2007                                                                −1.137    148.111
OriginY2008                                                                −1.246    134.961
Pseudo R2 (%)  0.199               13.069              13.471              14.337

Table 6: Logit Regressions on House Sales Year 2009

               House Only          Loan only           House/Loan 1        House/Loan 2
Variable       Estimate  χ2        Estimate  χ2        Estimate  χ2        Estimate  χ2
Intercept      0.051     0.018     −3.612    1587.707  −2.907    52.975    −2.887    48.811
House Age      0.102     23.510                        0.122     33.125    0.023     1.116
Bath           0.177     46.004                        0.228     92.424    0.233     98.468
Sqft           −0.382    37.718                        −0.241    15.334    −0.186    9.096
Land           −0.196    27.985                        0.001     0.002     −0.019    0.261
Quality        0.049     11.480                        0.042     8.062     0.043     8.527
LTV                                1.149     140.966   1.209     147.169   1.435     201.758
FRM                                −0.741    838.480   −0.734    819.651   −0.479    244.545
Refi                               −4.271    1008.872  −4.280    1012.837  −4.331    1034.944
OriginY2001                                                                0.050     0.213
OriginY2002                                                                −0.076    0.562
OriginY2003                                                                −0.346    12.833
OriginY2004                                                                −0.312    11.293
OriginY2005                                                                −0.266    8.424
OriginY2006                                                                −0.029    0.104
OriginY2007                                                                −0.409    18.367
OriginY2008                                                                −1.203    137.650
OriginY2009                                                                −1.125    108.193
Pseudo R2 (%)  0.228               14.476              14.704              15.674

Table 7: Logit Regressions on House Sales Year 2010

               House Only          Loan only           House/Loan 1        House/Loan 2
Variable       Estimate  χ2        Estimate  χ2        Estimate  χ2        Estimate  χ2
Intercept      1.626     25.497    −3.070    1600.096  −0.760    4.980     −0.632    3.186
House Age      0.140     53.504                        0.179     84.819    −0.024    1.364
Bath           0.134     36.567                        0.216     114.855   0.243     149.964
Sqft           −0.506    91.973                        −0.419    63.114    −0.343    42.281
Land           −0.241    58.190                        −0.040    1.599     −0.081    6.336
Quality        0.064     27.625                        0.053     17.924    0.058     21.386
LTV                                1.154     194.835   1.103     170.227   1.676     375.395
FRM                                −0.972    2002.735  −0.958    1927.064  −0.420    263.325
Refi                               −17.259   0.061     −17.275   0.061     −17.359   0.062
OriginY2001                                                                −0.217    6.034
OriginY2002                                                                −0.390    21.686
OriginY2003                                                                −0.554    51.354
OriginY2004                                                                −0.383    26.993
OriginY2005                                                                −0.358    24.250
OriginY2006                                                                −0.385    27.324
OriginY2007                                                                −0.456    36.658
OriginY2008                                                                −1.613    360.537
OriginY2009                                                                −2.319    664.945
OriginY2010                                                                −2.279    450.597
Pseudo R2 (%)  0.417               17.785              18.088              21.004

Table 8 reports the marginal effects for model House and Loan 2 for years 2006 to 2010. The base case for the marginal effects is a FRM, non-refinanced loan originated in 2005. All other variables are taken at their sample means. For a continuous variable, the marginal effect is the change in the probability of a house sale associated with a one unit change in the independent variable. The marginal effect for a dummy variable is calculated as the difference in probabilities when the dummy variable changes from 0 to 1. Table 8 shows that the main results concerning the loan-to-value ratio, FRM, and refinanced loans hold up over a particularly large cycle in the Clark County market.
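The two definitions of a marginal effect described above can be sketched for a logit model as follows. This is an illustration rather than a replication of Table 8: it plugs in the rounded House/Loan 2 estimates for 2006 from Table 3 and, as an assumption, the pooled sample means from Table 1 (the year-specific means are not reported), so its output need not match the published values. The function names are hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def continuous_effect(beta_k, index):
    """Derivative of Pr(sale) with respect to a continuous regressor."""
    p = logistic(index)
    return beta_k * p * (1.0 - p)

def dummy_effect(beta_d, index_dummy_off):
    """Pr(sale | dummy = 1) minus Pr(sale | dummy = 0)."""
    return logistic(index_dummy_off + beta_d) - logistic(index_dummy_off)

# Rounded House/Loan 2 estimates for 2006 (Table 3).
beta = {"Intercept": -1.335, "House Age": 0.109, "Bath": 0.019,
        "Sqft": -0.095, "Land": -0.015, "Quality": -0.002, "LTV": 0.663,
        "FRM": -0.190, "Refi": -3.887, "OriginY2005": -1.258}
# Pooled sample means (Table 1), used here as a stand-in for 2006 means.
means = {"House Age": 2.476, "Bath": 2.332, "Sqft": 7.511,
         "Land": 8.756, "Quality": 4.666, "LTV": 0.757}

# Base case: FRM = 1, Refi = 0, originated in 2005, other variables at means.
base_index = beta["Intercept"] + beta["FRM"] + beta["OriginY2005"]
base_index += sum(beta[name] * value for name, value in means.items())

bath_effect_pct = 100.0 * continuous_effect(beta["Bath"], base_index)
refi_effect_pct = 100.0 * dummy_effect(beta["Refi"], base_index)
```

With these inputs the Bath effect comes out at roughly 0.10 percentage points, in the neighborhood of the 0.0995 reported for 2006 in Table 8. The Refi figure from this sketch should not be expected to match the table, since the exact evaluation point and convention used for the dummy variables there are not fully specified.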

Table 8: Marginal Effect of Probit Model House and Loan 2 (in %)

Variable       Y2006     Y2007     Y2008     Y2009     Y2010
House Age      0.5687    0.4274    −0.2999   0.0791    −0.1495
Bath           0.0995    0.0326    0.2243    0.7947    1.5082
Sqft           −0.4984   −0.0597   0.4117    −0.6346   −2.1372
Land           −0.0809   0.0445    0.0659    −0.0634   −0.5000
Quality        −0.0126   0.3134    0.2169    0.1457    0.3606
LTV            0.0347    0.0099    0.0291    0.0489    0.1041
FRM            −0.1803   −0.1559   −0.8145   −0.8908   −1.0973
Refi           −19.8625  −16.0488  −11.0794  −14.5631  −81.5436
OriginY2001    −0.0515   −0.0303   −0.0079   −0.0082   −0.2324
OriginY2002    1.1187    1.1354    0.2024    0.0501    −0.0656
OriginY2003    1.7910    1.4318    0.2762    −0.0855   −0.5440
OriginY2004    2.2194    1.4489    −0.0020   −0.0455   −0.0508
OriginY2006    −2.3209   −1.4966   −0.0958   0.0244    −0.0546
OriginY2007              −1.7576   −0.9775   −0.1743   −0.2317
OriginY2008                        −1.2794   −2.4250   −6.8668
OriginY2009                                  −2.1411   −12.1209
OriginY2010                                            −11.8116

Next, we check whether the results hold for a variety of sub samples. Table 9 reports the model fit as measured by the pseudo R2 for sub samples of year 2006. Panels A to C report the sub sample results according to different house values (assessments), different house ages, and different house quality categories (assessor-based). Since the motivation to sell might be different for investors versus owner-occupiers, Panel D reports the model fits for owner-occupied and non owner-occupied houses separately. Housing appreciation may vary across different zip codes within the same MSA. As an additional robustness check, Panel E separates the sample according to owner occupied status and also includes zip code dummies to better capture local housing market variations.

Loan origination variables help improve the model fits in all the sub samples. However, it seems that mortgage information has relatively less predictive power for older, lower quality, and lower priced house sales. In addition, borrower characteristics improve the model fits for owner occupied house sales more dramatically than for non owner occupied house sales. For example, the loan only model has a pseudo R2 of 0.1637 for owner occupied houses versus 0.0754 for non owner occupied houses.

A number of factors could create a difference between decisions made by owner occupants and investors, and this would have implications for the goodness-of-fit. Investors may have different leverage preferences than owner occupants and/or have different financing constraints. Occupants and investors could differ by time horizon. Certainly, leverage may be negatively associated with age for owner occupants, but there may be little connection between leverage and age for investors.

Table 9: Model Fit Statistics (Pseudo R2 (%)) Sub Sample Results Year 2006

Sub sample                     House Only  Loan only   House/Loan 1  House/Loan 2
Panel A: House Value
150K > V                       0.693       12.876      13.256        16.006
150K ≤ V < 300K                0.302       15.801      16.043        18.479
300K ≤ V < 450K                0.531       16.433      17.033        18.768
450K ≤ V                       0.955       11.926      12.804        14.293
Panel B: House Age
10y > Age                      0.483       13.715      14.235        17.366
10y ≤ Age < 20y                0.251       15.165      15.203        18.456
20y ≤ Age < 30y                0.325       13.954      13.986        16.215
30y ≤ Age                      0.391       11.976      12.154        13.117
Panel C: House Quality
Low                            1.216       9.056       10.441        11.408
Medium                         0.846       14.099      14.438        17.042
High                           0.752       13.585      14.413        16.790
Panel D: Owner Occupied
N                              1.379       7.539       8.241         10.805
Y                              0.530       16.368      16.585        19.195
Panel E: Owner Occupied (Zip)
N                              1.823       8.228       8.546         11.144
Y                              0.871       16.710      16.864        19.444

Another concern is whether the results can be generalized to other areas. To answer this question, we repeated the experiments with the Maricopa County data. Table 10 reports the Maricopa results. Panel A gives the regression results for year 2006. For simplicity, Panel B reports only the pseudo R2 for the different specifications for years 2006 to 2010. The Maricopa County assessor data do not report the number of bathrooms, so the variable Bath is excluded from the housing characteristics. House age is not well populated in the data, so we add a missing house age dummy as a control variable. Similar to the results for Clark County, borrower characteristics help improve the model fit of house sales for every year from 2006 to 2010. For example, in 2010, including loan variables increases the model fit from a pseudo R2 of 0.0007 for housing characteristics alone to 0.1750 for the housing and loan characteristics model. Both the sign and the magnitude of the mortgage variable estimates are similar to the Clark County results. For example, the refinance dummy reduces the probability of sale in every year. The Maricopa results indicate that the findings are not specific to Clark County.

Based on these results, which hold well for different time periods, a variety of sub samples, and different areas, individuals modeling loan termination may be able to disregard property characteristics in their models. However, individuals wishing to model the effects of property characteristics on the probability of a sale cannot disregard the large effect of the seller's financial characteristics, which may reveal their expected time horizon.

Table 10: Logit Regressions on House Sales - Maricopa

Panel A: Year 2006

                   House Only          Loan only           House/Loan 1        House/Loan 2
Variable           Estimate  χ2        Estimate  χ2        Estimate  χ2        Estimate  χ2
Intercept          1.956     19.878    −2.997    6380.011  −1.862    18.445    −1.218    7.735
House Age          0.454     16.313                        0.377     11.956    0.429     15.206
House Age Missing  −0.112    1.218                         0.042     0.169     0.064     0.390
Sqft               −0.612    756.666                       −0.282    149.425   −0.226    93.566
Land               −0.121    49.366                        −0.036    4.320     −0.074    17.827
Quality            −0.026    10.569                        0.028     10.715    0.011     1.620
LTV                                    1.114     778.471   0.978     557.529   0.906     480.316
FRM                                    0.128     117.740   0.112     88.076    −0.174    167.576
Refi                                   −3.696    7137.538  −3.697    7128.909  −3.702    7120.786
OriginY2001                                                                    −0.039    1.803
OriginY2002                                                                    −0.167    35.595
OriginY2003                                                                    −0.407    233.443
OriginY2004                                                                    −0.538    440.819
OriginY2005                                                                    −0.856    1127.407
OriginY2006                                                                    −1.268    1608.246

Panel B: Pseudo R2 (%) Year 2006 - 2010

Year               House Only  Loan only   House/Loan 1  House/Loan 2
Year 2006          0.797       14.253      14.352        15.685
Year 2007          0.175       13.276      13.337        15.460
Year 2008          0.122       12.934      13.329        14.593
Year 2009          0.125       14.595      14.655        15.816
Year 2010          0.074       15.326      15.423        17.498

Next, we briefly investigate the relative importance of macroeconomic conditions, house characteristics, and homeowner information in their impact on house sales. We form a panel data set from Clark County and run logit regressions. Table 11 reports the regression results. To capture the macroeconomic conditions, we include both origination year dummies and sale year dummies in the House/Loan/Macro regression. This last regression also includes zip code dummies to control for possible geographical variations. The results show that mortgage information by itself yields a pseudo-R2 of about 10.82%. Adding housing characteristics, macroeconomic controls, and zip code fixed effects increases the pseudo-R2 to 13.43%. The results indicate that mortgage/borrower information is likely to have stronger explanatory power for house sales than the other variables. When adding a variety of controls, the sign and significance of the loan variables remain stable. The next section includes even tighter neighborhood controls.

Table 11: Panel Regressions on House Sales

               House Only          Loan only           House/Loan          House/Loan/Macro
Variable       Estimate  χ2        Estimate  χ2        Estimate  χ2        Estimate  χ2
Intercept      −1.763    122.692   −5.750    2546.039  −5.308    682.675   −7.782    4.930
House Age      0.030     13.176                        0.063     59.180    0.044     14.364
Bath           0.019     3.153                         0.096     101.852   0.113     141.494
Sqft           −0.359    197.527                       −0.223    79.815    −0.163    38.362
Land           −0.078    25.465                        0.089     34.781    −0.009    0.274
Quality        0.038     38.521                        0.022     13.561    0.035     29.509
LTV                                0.388     236.239   0.382     219.629   0.554     409.500
FRM                                −0.310    805.643   −0.317    842.870   −0.337    704.690
Refi                               −4.388    6258.378  −4.398    6281.101  −4.390    6267.398
Orig Year      No                  No                  No                  Yes
Sale Year      No                  No                  No                  Yes
Zip Code       No                  No                  No                  Yes
Pseudo R2 (%)  0.122               10.819              10.885              13.426

4.2 Does Neighborhood/Location Impact House Sales?

This section investigates whether neighborhood/location matters in house sales. We adopt a spatial econometric method to study the effects of: (1) allowing a spatially correlated disturbance term; and (2) spillovers from nearby houses. Spatially correlated disturbances pick up omitted location related information. The coefficients of the average neighbor values (WX) represent the spillovers.

We estimated four different probit models as in (21) to (26) using year 2006 data. Table 12 reports the probit results for the X only models with both iid disturbances and spatially dependent disturbances. Table 13 includes the average neighbor values (WX) and reports the probit results for the X and WX models with both iid disturbances and spatially dependent disturbances. The variables in Tables 12 and 13 include mortgage information, housing characteristics, and origination year dummies.
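The average neighbor values WX are weighted averages of each regressor over a house's neighbors. A minimal sketch of that computation follows, assuming the weight matrix is stored sparsely as one dictionary of neighbor weights per house; the three-house weights and Refi values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def spatial_lag(weight_rows, x):
    """Compute W x when W is stored as one {neighbor: weight} dict per row."""
    return [sum(w * x[j] for j, w in row.items()) for row in weight_rows]

# Hypothetical row-normalized weights for three houses.
weight_rows = [
    {1: 0.75, 2: 0.25},  # house 0: nearest neighbor is house 1
    {0: 0.5, 2: 0.5},    # house 1: equally weighted neighbors
    {1: 0.75, 0: 0.25},  # house 2: nearest neighbor is house 1
]
refi = [1.0, 0.0, 1.0]   # hypothetical Refi dummies

w_refi = spatial_lag(weight_rows, refi)
print(w_refi)  # [0.25, 1.0, 0.25]
```

Each entry of `w_refi` is the weighted share of refinanced loans among a house's neighbors, which is the kind of regressor that enters the X and WX models as W·Refi.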

Note that the estimates of the spatial dependence parameter in the two tables have relatively high values of 0.6329 and 0.6444, respectively. More importantly, the t statistics associated with the spatial dependence parameter are the largest in magnitude among the explanatory variables in the corresponding regressions. This indicates that the spatial dependence in the error term may have a larger impact on model fit than the other included independent variables. The pseudo-R2 increases from 0.1658 to 0.1938 for the X only model, and from 0.2027 to 0.2302 for the X and WX model. Modeling spatially dependent disturbances improves model performance because the spatial model picks up location related omitted variable information.

The housing and loan information of the neighbors also helps increase model explanatory power. The pseudo-R2 increases from 0.1658 to 0.2027 for the iid model, and from 0.1938 to 0.2302 for the spatial model. Neighbors' mortgage choices have the same direction of impact on house sales as the homeowner's own mortgage choices.

Table 12: Spatial and Independent Probit Parameter Estimates - X Only

                iid Model                   Spatial Model
Variable        Estimates     t             Estimates     t
Intercept       −0.8502       −37.4570      −0.9563       −36.1206
House Age       0.0534        9.6566        0.0643        9.5807
Bath            0.0078        1.4176        0.0099        1.5348
Sqft            −0.0173       −2.4176       −0.0202       −2.3924
Land            −0.0004       −0.0651       −0.0054       −0.7642
Quality         −0.0032       −0.5513       −0.0018       −0.2655
LTV             0.0694        9.5163        0.0730        9.1612
FRM             −0.0912       −8.6349       −0.0951       −7.9746
Refi            −1.5554       −70.5548      −1.6455       −61.2014
OrigY2001       −0.0644       −2.4831       −0.0677       −2.2664
OrigY2002       −0.0921       −3.7691       −0.1030       −3.6548
OrigY2003       −0.1648       −7.1958       −0.1740       −6.5853
OrigY2004       −0.2279       −10.0341      −0.2394       −9.1174
OrigY2005       −0.6696       −29.2842      −0.7374       −27.8220
ρ                                           0.6329        69.1508
Log-Lik         −48,766.3011                −47,126.4812
Pseudo R2 (%)   16.5770                     19.3822

Table 13: Spatial and Independent Probit Parameter Estimates - X and WX

                iid Model                   Spatial Model
Variable        Estimates     t             Estimates     t
Intercept       −0.5095       −20.1145      −0.5432       −18.4456
House Age       0.2821        12.8701       0.3378        12.6896
Bath            0.0060        1.0002        0.0071        0.8668
Sqft            0.0062        0.6574        0.0094        0.7202
Land            −0.0075       −0.5786       −0.0100       −0.5354
Quality         0.0498        3.0954        0.0575        2.6042
LTV             0.0506        6.6944        0.0572        6.6060
FRM             −0.0815       −7.4565       −0.0867       −6.7131
Refi            −1.5452       −69.3564      −1.7149       −63.3110
W·House Age     −0.2362       −10.6841      −0.2865       −10.6042
W·Bath          −0.0002       −0.0262       0.0031        0.2332
W·Sqft          0.0102        0.8647        0.0042        0.2462
W·Land          0.0355        2.5356        0.0410        2.0279
W·Quality       −0.0635       −3.7791       −0.0715       −3.0889
W·LTV           0.0456        4.5389        0.0511        3.9851
W·FRM           −0.0225       −1.4508       −0.0264       −1.3218
W·Refi          −0.9798       −48.3668      −1.1156       −39.6832
OrigY2001       −0.0650       −2.4463       −0.0671       −2.1796
OrigY2002       −0.0830       −3.3214       −0.0935       −3.2227
OrigY2003       −0.1604       −6.8490       −0.1689       −6.2082
OrigY2004       −0.2395       −10.3071      −0.2505       −9.2689
OrigY2005       −0.6860       −29.2867      −0.7538       −27.6379
ρ                                           0.6444        69.4198
Log-Lik         −46,610.1264                −45,001.8578
Pseudo R2 (%)   20.2655                     23.0167

To summarize the model fits of the various models investigated in this study (iid vs. spatial probit, own characteristics vs. neighbor characteristics, and housing variables vs. loan characteristics), Table 14 reports the model fit statistics (log-likelihood and pseudo-R2) of the various models for Clark County in year 2006. The results clearly show that mortgage information, neighbor characteristics, and allowing spatially dependent errors all materially increase the explanatory power of the house sales models. For example, the pseudo-R2 increases from 0.0076 to 0.0514 when allowing a spatially correlated error term in the house variables only model (Model 1, X only, from the iid to the spatial model). The pseudo-R2 increases from 0.0076 to 0.1434 when adding mortgage information to the iid model (from Model 1 to Model 3, iid, X only). Adding neighbor characteristics increases the pseudo-R2 from 0.1434 to 0.1806 (Model 3, iid, from the X only to the X and WX model). Overall, this research shows that by enhancing the house sales model with loan characteristics, a spatially dependent error term, and neighbor information, the pseudo-R2 increases from 0.0076 to 0.2302, a more than 30-fold improvement in model fit.

Table 14: Summary Results: Model Fit Statistics of Spatial vs. iid Probit Models, House Characteristics vs. Loan Characteristics, and Own Characteristics vs. Neighbor Characteristics Models

              iid Model                       Spatial Model
              Log-Lik         Pseudo R2 (%)   Log-Lik         Pseudo R2 (%)
Model 1: House Only (X=House Variable)
X only        −58,011.2100    0.7620          −55,453.4300    5.1375
X and WX      −57,961.7100    0.8467          −55,423.6000    5.1885
Model 2: Loan Only (X=Loan Variables)
X only        −50,264.5860    14.0139         −48,621.9553    16.8239
X and WX      −48,186.7911    17.5683         −46,591.8214    20.2968
Model 3: House/Loan 1 (X=House and Loan Variables)
X only        −50,075.0876    14.3381         −48,491.9071    17.0464
X and WX      −47,900.1440    18.0587         −46,334.6184    20.7368
Model 4: House/Loan 2 (X=House and Loan Variables and Orig Year Dummies)
X only        −48,766.3011    16.5770         −47,126.4812    19.3822
X and WX      −46,610.1264    20.2655         −45,001.8578    23.0167
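The link between the log-likelihood and pseudo-R2 columns of Table 14 can be illustrated with a short sketch. The text does not say which pseudo-R2 is reported; the sketch assumes McFadden's definition, one minus the ratio of the model log-likelihood to the intercept-only log-likelihood. The intercept-only value is not reported either, so it is backed out here from the Model 1 (iid, X only) row. Function and variable names are illustrative.

```python
def mcfadden_r2(loglik, loglik_null):
    """McFadden pseudo R-squared: 1 - LL(model) / LL(intercept only)."""
    return 1.0 - loglik / loglik_null

# Back out the null log-likelihood from Model 1 (iid, X only) in Table 14.
ll_model1, r2_model1 = -58011.2100, 0.007620
ll_null = ll_model1 / (1.0 - r2_model1)

# (log-likelihood, reported pseudo R2 in %) for a few other rows of Table 14.
reported = {
    "Model 2, iid, X only": (-50264.5860, 14.0139),
    "Model 4, iid, X only": (-48766.3011, 16.5770),
    "Model 4, spatial, X and WX": (-45001.8578, 23.0167),
}

implied = {name: 100.0 * mcfadden_r2(ll, ll_null)
           for name, (ll, _) in reported.items()}

for name, (_, r2_reported) in reported.items():
    print(f"{name}: implied {implied[name]:.4f}, reported {r2_reported:.4f}")
```

Under this assumption the implied values agree with the reported ones to within rounding, which suggests that all models in Table 14 are benchmarked against the same intercept-only likelihood.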

5

Conclusion

The factors influencing the probability of an individual prop-

CR IP T

erty selling are of interest in a number of areas of real estate research. Historically, at the individual house level, more re-

search has gone into the influence of the property characteristics or macroeconomic variables on the probability of sale than into other factors. Despite these efforts, results from this research

AN US

have been inconclusive as property characteristics do not seem to explain much of the empirical pattern of transactions. However, the mortgage literature has focused on pure financial variables to model mortgage terminations. This manuscript combines both

M

property and mortgage variables together to model the probability of a house transaction. We also investigate whether house

ED

sales are spatially interdependent, or in another word, whether location matters in house sales.

PT

We find that both mortgage and location information materially increase the explanatory power of the house sale model. Mortgage information relates to seller characteristics, especially the individual’s expected tenure in the house. This is revealed by the choices borrowers make when they choose products whose benefits lie in the future. Explicitly modeling location helps pick up omitted explanatory variables and measurable neighbors’ characteristics. Taken together, the property, mortgage, and location information yields an empirical pseudo-R2 of up to 23%, a great improvement over modeling with property characteristics alone, which can result in pseudo-R2 levels of under 1%. The results held up across a particularly large cycle in Las Vegas and Maricopa, in each year from 2006 to 2010, and for a variety of subsamples.
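To make the role of neighbor characteristics concrete, the following minimal sketch builds a spatial lag WX of house-level regressors from a row-normalized k-nearest-neighbor weight matrix. The k-nearest-neighbor choice, the function names, and the toy coordinates and values are all illustrative assumptions, not the specification or data used in this paper.

```python
import math


def knn_weights(coords, k):
    """Row-stochastic W: each house puts weight 1/k on its k nearest neighbors."""
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Rank all other houses by Euclidean distance (a house is not its own neighbor).
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(coords[i], coords[j]),
        )
        for j in others[:k]:
            W[i][j] = 1.0 / k
    return W


def spatial_lag(W, X):
    """Return WX: for each house, the weighted average of its neighbors' rows of X."""
    n, p = len(X), len(X[0])
    return [[sum(W[i][j] * X[j][c] for j in range(n)) for c in range(p)]
            for i in range(n)]


# Hypothetical toy data: five houses on a line, one characteristic each.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
X = [[1.0], [2.0], [3.0], [4.0], [100.0]]

W = knn_weights(coords, k=2)
WX = spatial_lag(W, X)
# "X and WX" design: own characteristics next to neighbors' average characteristics.
design = [x_row + wx_row for x_row, wx_row in zip(X, WX)]
```

This covers only the "X and WX" regressors. The spatial probit models compared with the iid probit models in Table 14 also allow for dependence across observations, which the sketch does not attempt.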

The addition of the mortgage information changes the significance levels of the property variable estimates. The results show that property characteristics may not aid in modeling mortgage termination, but mortgage variables definitely need to be included when modeling the probability of housing transactions in other contexts. Since the data are from public records, it is possible to improve the house sale model on a nationwide scale.


References

Artle, R. and P. Varaiya (1978). Life cycle consumption and homeownership. Journal of Economic Theory 18 (1), 38–58.

Brueckner, J. K. and J. R. Follain (1988). The rise and fall of the ARM: An econometric analysis of mortgage choice. The Review of Economics and Statistics, 93–102.

Chan, S. (2001). Spatial lock-in: Do falling house prices constrain residential mobility? Journal of Urban Economics 49 (3), 567–586.

Chambers, M. S., C. Garriga, and D. Schlagenhauf (2009). The loan structure and housing tenure decisions in an equilibrium model of mortgage choice. Review of Economic Dynamics 12 (3), 444–468.

Clapp, J. M., G. M. Goldberg, J. P. Harding, and M. LaCour-Little (2001). Movers and shuckers: Interdependent prepayment decisions. Real Estate Economics 29 (3), 411–450.

Deng, Y., J. M. Quigley, and R. Van Order (2000). Mortgage terminations, heterogeneity and the exercise of mortgage options. Econometrica 68 (2), 275–307.

Fisher, J., D. Gatzlaff, D. Geltner, and D. Haurin (2003). Controlling for the impact of variable liquidity in commercial real estate price indices. Real Estate Economics 31 (2), 269–303.

Flavin, M. and T. Yamashita (1998). Owner-occupied housing and the composition of the household portfolio over the life cycle. National Bureau of Economic Research.

Fortowsky, E., M. LaCour-Little, E. Rosenblatt, and V. Yao (2011). Housing tenure and mortgage choice. The Journal of Real Estate Finance and Economics 42 (2), 162–180.

Fu, Y. and W. Qian (2014). Speculators and price overreaction in the housing market. Real Estate Economics 42 (4), 977–1007.

Gatzlaff, D. H. and D. R. Haurin (1997). Sample selection bias and repeat-sales index estimates. The Journal of Real Estate Finance and Economics 14, 33–50.

Gatzlaff, D. H. and D. R. Haurin (1998). Sample selection and biases in local house value indices. Journal of Urban Economics 43 (2), 199–222.

Hanson, A., K. Schnier, and G. K. Turnbull (2012). Drive ’til you qualify: Credit quality and household location. Regional Science and Urban Economics 42 (1), 63–77.

Heckman, J. J. (1979). Sample selection bias as a specification error (with an application to the estimation of labor supply functions). Econometrica 47 (1), 153–161.


Johnson, K. H., J. D. Benefield, and J. A. Wiley (2007). The probability of sale for residential real estate. Journal of Housing Research 16 (2), 131–142.

Jud, D. G. and T. G. Seaks (1994). Sample selection bias in estimating housing sales. Journal of Real Estate Research 9 (3), 289–298.

Kau, J. B. and D. C. Keenan (1995). An overview of the option-theoretic pricing of mortgages. Journal of Housing Research 6 (2), 217–244.

Krupka, D. J. (2008). The stability of mixed income neighborhoods in America.

LeSage, J. P. and R. K. Pace (2009). Introduction to spatial econometrics. Chapman & Hall/CRC.

Munneke, H. J. and B. A. Slade (2000). An empirical study of sample-selection bias in indices of commercial real estate. The Journal of Real Estate Finance and Economics 21, 45–64.

Ong, S. E. (2000). Prepayment risk and holding period for residential mortgages in Singapore: Evidence from condominium transactions data. Journal of Property Investment and Finance 18 (6), 586–602.

Ortalo-Magné, F. and S. Rady (2006). Housing market dynamics: On the contribution of income shocks and credit constraints. Review of Economic Studies 73 (2), 459–485.

Pace, R. K. and J. P. LeSage (2016). Fast simulated maximum likelihood estimation of the spatial probit model capable of handling large samples. Advances in Econometrics, Volume 37, Emerald.

Qian, W. (2012). Why do sellers hold out in the housing market? An option-based explanation. Real Estate Economics.

Stanton, R. and N. Wallace (1998). Mortgage choice: What’s the point? Real Estate Economics 26 (2), 173–205.

Yatchew, A. and Z. Griliches (1985). Specification error in probit models. Review of Economics and Statistics 67 (1), 134–139.

Zhu, S. and R. K. Pace (2014). Modeling spatially interdependent mortgage decisions. The Journal of Real Estate Finance and Economics 49 (4), 598–620.