The Use of Proxy Variables in Housing Price Analysis

JOURNAL OF URBAN ECONOMICS 7, 75-83 (1980)

JOHN F. MCDONALD¹

Department of Economics, University of Illinois at Chicago Circle, Box [email protected], Chicago, Illinois 60680

Received July 5, 1977; revised February 9, 1978

This paper presents a reexamination of data used by Berry to study housing prices in Chicago. The detailed data on 275 single-family houses are used to test the proposition that the tax assessment on improvements is a good proxy for the attributes of the houses. It is shown that the test used by Berry is irrelevant for the question of omitted variables bias, and the correct test is presented. It is concluded that the proxy corrects for a bias in the coefficient of percent black population, but increases the negative bias in the coefficient of percent Latino population.

1. INTRODUCTION

Many housing economists and other observers of urban housing markets were surprised by the results of the study by Brian Berry [2]. A principal conclusion of that study is that, in Chicago during 1968 to 1972, single-family housing prices in stable (not in recent racial transition) black and Latino neighborhoods were significantly less than prices in stable white neighborhoods. This result conflicts with earlier studies in Chicago and other cities. (See Berry [2] for a listing of many such studies.) In light of the extensive tests detailed by Bednarz [1], Berry [2], and Berry and Bednarz [3, 4] for Chicago during this period, this paper does not directly challenge this conclusion. Rather, this paper is a critique of the methodology in the Berry [2] study that may have some bearing on the strength of the conclusion.

In the analysis of the “full” data set (about 30,000 house sales during 1968-1972), Berry [2] used the tax assessment of the property as a proxy variable for various attributes of the property. The tax assessment is broken down into the assessments on improvements and on land.

¹ The author thanks Professors Robert Bednarz and Lalitha Sanathanan for their assistance.

0094-1190/80/010075-09$02.00/0 Copyright © 1980 by Academic Press, Inc. All rights of reproduction in any form reserved.

Berry used one of these variables, or their total, depending upon the model to be estimated. The purpose of this paper is to examine in detail the assumption that the tax assessment on improvements is a “good” proxy for the attributes of the housing structure for Berry’s sample data. If so, there are obvious implications for reducing the data requirements in the analysis of housing prices. A second purpose of the paper is to test the proposition that the assessment on land is a “good” proxy for the neighborhood characteristics of the property.

The paper begins with an examination of Berry’s procedure for testing the proposition that the assessment on improvements is a good proxy for the characteristics of the housing structure. His procedure is found to be lacking an econometric rationale. Section 3 presents the econometric theory of proxy variables. The empirical tests are presented in Section 4. The data base is the sample of 275 single-family houses sold in Chicago during 1970 to 1972 used by Berry [2] to test the efficacy of the proxy variable. These results indicate that the proxy variable (assessment on improvements) helps to correct for the omitted variables bias in the coefficient of percent black population, but the use of the proxy increases the negative bias in the coefficient of percent Latino population. Also, the tax assessment on land is found to be a poor proxy for neighborhood characteristics. Some implications of these findings are discussed in the final section of the paper.

2. THE BERRY PROCEDURE

The procedure followed by Berry [2] to investigate racial differences in housing prices first posits the hedonic price index model, written simply in linear form as

$$P = \sum_i a_i H_i + \sum_j b_j L_j. \tag{1}$$

Here $P$ is the market price, the $H_i$ are characteristics of the housing structure (size, age, number of baths, etc.), and the $L_j$ are the characteristics of the land (accessibility, area, neighborhood characteristics, and environmental amenities). Because data on the $H_i$ are unavailable (except at high cost) for most of the housing units sold, Berry substituted a proxy variable, the assessed value of housing improvements ($R^*$). According to the Cook County Assessor, the assessed value of improvements is supposed to be a fixed percentage $a^*$ of the replacement cost of the house. Berry thus estimated

$$P = a R^* + \sum_j b'_j L_j. \tag{2}$$

Berry [2] first presented a test of the assumption that the assessed value of improvements is a good proxy for the housing characteristics. To conduct this test, he obtained from the Society of Real Estate Appraisers a sample of 275 single-family houses sold in Chicago during 1970 to 1972 for which both housing characteristics and assessed values are available.² These data contain information on many attributes of the house such as age, floor area, number of baths, lot area, and presence of air conditioning, improved attic, improved basement, and garage. The data on the houses were merged with 1970 Census data, which provide median family income in the Census tract, percentage of homes in the Census tract which are multi-family, percentage of tract population that lived elsewhere in 1965, and percent black, Latino, and Irish in the tract. Distance to the central business district in kilometers was measured, and a measure of air pollution (particulate concentration) was added. The selling price of the property was considered to be a function of these variables, and the regression equation was calculated [2]. Berry noted that the selling price of the property depends significantly upon the attributes of the house and upon several characteristics of the neighborhood. Next, he examined the assessment on improvements as a function of the same set of housing and neighborhood variables [2]. He concluded that

“. . . it was confirmed that tax assessments on improvements did in fact reflect structure characteristics and not characteristics of location or environment, as they should if the latter are capitalized into the value of land. This finding permitted assessments to be substituted for the mass of Society of Real Estate Appraisers’ Data in the study of the full data set.” [2, pp. 404-406]

An examination of the regression results upon which this statement is based [2] leads one perhaps to question the conclusion, since median family income, distance from the CBD, and particulate pollution are significant determinants of the assessment on improvements. However, Berry’s procedure raises a more fundamental issue. The objective of the Berry study is to obtain unbiased estimates of the racial price differences in Chicago during the period 1968 to 1972, holding housing quality constant. Thus, the question is one of bias in the estimated coefficients of certain included variables (percentage black or Latino, in particular). The test conducted by Berry does not examine the biases in these coefficients introduced by substituting a proxy variable (assessment on improvements) for several housing characteristics.

² These data are described in detail and extensively analyzed by Bednarz [1] and Berry and Bednarz [3, 4].
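The distinction between Berry’s check and the question of bias can be written out explicitly. The lines below are only an illustrative sketch in the notation of Eqs. (1) and (2); the symbols $c_i$, $d_j$, $v$, and $\hat{b}'_j$ are assumptions introduced here for exposition and do not appear in Berry [2].

```latex
% Berry's check: regress the proxy on structure and location variables
% and ask whether the location coefficients d_j are zero.
R^* = \sum_i c_i H_i + \sum_j d_j L_j + v

% Question of interest: how far the location coefficient estimated with
% the proxy, as in Eq. (2), lies from the coefficient b_j of Eq. (1).
\operatorname{plim} \hat{b}'_j - b_j
```

Even if every $d_j$ were zero, the second quantity need not vanish, for example when the weights $c_i$ are not proportional to the $a_i$ of Eq. (1) or when the error $v$ is not negligible.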


However, the data used by Berry for his test contain the variables necessary to conduct the appropriate test. Before this test is conducted, the econometric theory of proxy variables must be examined to cover the issue raised by Berry.

3. ECONOMETRIC THEORY OF PROXY VARIABLES

The fundamental hypothesis in the procedure used by Berry [2] is that the omitted variables bias can be reduced by the use of a proxy variable which is (assumed to be) a linear combination of those omitted variables. This section provides a brief discussion of the problems in using a proxy variable. The discussion summarizes the presentation in Maddala [5]. To simplify, suppose that the true model is the two-regressor case

$$y = \beta x + \gamma z + u, \tag{3}$$

where $x$ is observed and $z$ is unobserved, and $u$ is the normal error term. Suppose we have the proxy variable $p = \alpha x + z + e$, where we allow correlation between $p$ and $u$ by allowing correlation between $e$ and $u$. Correlation between $p$ and $x$ is also allowed. McCallum [6] and Wickens [7] proved that the bias in the estimate of $\beta$ is smaller when the proxy is used in the OLS estimation of Eq. (3) than when it is left out, assuming $\alpha = 0$ and the correlation between $e$ and $u$ is zero. However, both of these assumptions may be violated in the case of Berry’s data. As noted above, the assessed value of improvements is significantly influenced by three neighborhood variables (median family income, distance from the CBD, and particulate pollution). Thus it seems that $\alpha$ is not zero. Also, the results presented by Berry and Bednarz [3] indicate that the ratio of assessment to selling price is correlated with percent black population. This may also be interpreted to mean that $p$ is correlated with $x$. Furthermore, it would not be prudent to rule out a correlation between $e$ and $u$. Maddala [5] presents the expression for the bias in the estimation of $\beta$ given this two-regressor model: the bias can be larger or smaller depending upon the magnitude of $\alpha$ and the correlations of $p$ with $x$ and $e$ with $u$. The interested reader should consult Maddala [5]. The point is that the issue is an empirical question, to which we now turn.
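The two-regressor setup of this section can be explored with a small simulation. The sketch below is only illustrative: the function name `proxy_bias`, the coefficient values, the unit variances, and the 0.5 loading of $z$ on $x$ are all assumptions chosen for the example, not quantities taken from Maddala [5] or from the Chicago data. It estimates the coefficient of $x$ once with $z$ omitted and once with the proxy $p$ included, and reports the bias in each case.

```python
import numpy as np


def proxy_bias(alpha, rho_eu=0.0, n=100_000, beta=1.0, gamma=1.0, seed=0):
    """Simulate Eq. (3) and return the bias in the OLS coefficient of x
    (a) when z is omitted and (b) when the proxy p is included instead of z.

    All numerical settings are illustrative assumptions, not estimates
    from the Chicago data.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    z = 0.5 * x + rng.normal(size=n)   # z is correlated with x, so omitting z biases beta
    u = rng.normal(size=n)
    # e has unit variance and correlation rho_eu with u
    e = rho_eu * u + np.sqrt(1.0 - rho_eu**2) * rng.normal(size=n)

    y = beta * x + gamma * z + u       # true model, Eq. (3)
    p = alpha * x + z + e              # proxy variable

    def coef_on_x(*regressors):
        # OLS with a constant; index 1 is the coefficient of the first regressor (x)
        X = np.column_stack((np.ones(n),) + regressors)
        return np.linalg.lstsq(X, y, rcond=None)[0][1]

    return coef_on_x(x) - beta, coef_on_x(x, p) - beta


if __name__ == "__main__":
    for alpha in (0.0, 2.0):
        omit, proxy = proxy_bias(alpha)
        print(f"alpha={alpha}: bias omitting z = {omit:.3f}, bias with proxy = {proxy:.3f}")
```

In this particular design the bias from omitting $z$ is about 0.5. With $\alpha = 0$ and uncorrelated errors the proxy cuts it to about 0.25, in line with the McCallum and Wickens result, whereas with $\alpha = 2$ the bias with the proxy is about -0.75, larger in absolute value than the bias from omission.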

4. EVIDENCE FROM THE CHICAGO DATA

The data used by Bednarz [1], Berry and Bednarz [3, 4], and Berry [2] have been obtained for the purpose of comparing the biases present when a proxy variable for housing characteristics (assessed value of housing improvements) is included or excluded. In addition, a proxy variable for neighborhood characteristics (assessed value of land) is tested. The data consist of 275 observations of single-family homes sold in Chicago during the period 1970 to 1972. These data are described briefly above and more fully by Berry and Bednarz [3]. This section presents a direct examination of the extent to which inclusion of the above-mentioned proxy variables “corrects” for omitted variables bias.

The full regression results are presented in Table 1. The dependent variable, selling price, is in natural log form. In column 1 of Table 1 the independent variables included are the same as those used by Berry [2], but the functional form of most of these variables has been changed to facilitate interpretation of the results. A few independent variables are in natural log form as in Berry [2], but the dummy variables and the variables bounded by 0 and 1 have not been converted to natural logs. The results in column 1 are comparable to Berry’s results and are satisfactory from the point of view of normality of the residuals. Using a Chi-square test, the probability of rejecting the normality assumption is 0.83. It is further assumed that the specification in column 1 of Table 1 is correct in that no variables have been omitted that cause omitted variables bias in the results.

The only coefficients that require some explanation are the very large and negative coefficients of percent Irish (-1.726) and percent Latino (-0.749). These coefficients are also negative in Berry’s study. However, the variables in Berry’s study are in natural log form so that the magnitudes of the estimated coefficients do not appear to be large. The results in column 1 of Table 1 indicate that if percent Irish rises from 0 to 100, the price of the house falls by 173%. However, the actual range of the variable percent Irish is only 0 to 14. The actual range for percent Latino is 0 to 33.

TABLE 1
Regression Analysis of Selling Price
(Dependent variable is in natural log form; sample size = 275)

Independent variables                     (1)             (2)             (3)             (4)             (5)
Constant                                  1.098 (1.21)ᵃ   -1.260 (1.09)   0.160 (0.17)    6.767 (15.56)   7.120 (17.70)
Age (years)                               -0.007 (7.60)                                   -0.011 (10.67)  -0.011 (11.85)
Square feet (natural log)                 0.296 (5.57)                                    0.484 (7.28)    0.228 (3.22)
Number of baths                           0.095 (2.97)                                    0.084 (1.84)    0.111 (2.62)
Improved attic (dummy)                    0.110 (3.08)                                    0.158 (3.05)    0.146 (3.08)
Improved basement (dummy)                 0.078 (3.04)                                    0.080 (2.16)    0.084 (2.45)
Air conditioned (dummy)                   0.047 (1.36)                                    0.124 (2.53)    0.109 (2.42)
Garage (dummy)                            0.056 (2.04)                                    0.100 (2.69)    0.104 (3.04)
Area of lot (natural log)                 0.155 (4.13)    0.216 (4.99)    0.210 (5.83)
Particulate pollution                     0.001 (0.25)    0.003 (1.15)    -0.002 (0.68)
Median income (natural log)               0.555 (5.64)    0.924 (7.59)    0.654 (6.26)
Multi-family units (%)                    0.154 (1.81)    0.027 (0.26)    0.211 (2.41)
Migrants (%)                              0.071 (0.47)    0.331 (1.68)    0.148 (0.90)
Black (%)                                 -0.276 (5.16)   -0.225 (3.23)   -0.292 (5.02)
Latino (%)                                -0.749 (2.24)   -0.857 (1.96)   -1.061 (2.91)
Irish (%)                                 -1.726 (3.44)   -1.412 (2.16)   -2.015 (3.69)
Distance to CBD                           0.025 (3.87)    0.036 (4.29)    0.018 (2.51)
Assessment on improvements (natural log)                                  0.249 (10.85)
Assessment on land (natural log)                                                                          0.208 (7.13)
R²                                        0.802           0.644           0.754           0.561           0.631

ᵃ t-values are in parentheses.
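The magnitudes discussed for column 1 of Table 1 can be checked with a few lines of arithmetic. The sketch below assumes, as the text indicates, that the percentage variables are measured on a 0 to 1 scale; the helper name `pct_effect` is hypothetical, and the exact log-to-percent conversion is a standard transformation added here for illustration rather than a calculation reported in the paper.

```python
import math


def pct_effect(coef, change):
    """Percentage change in price implied by a coefficient in a log-price
    regression when the regressor changes by `change` (exact, not linearized)."""
    return 100.0 * (math.exp(coef * change) - 1.0)


# Coefficients from column 1 of Table 1; shares are assumed to be on a 0-1 scale.
IRISH, LATINO = -1.726, -0.749

irish_linear = 100.0 * IRISH * 1.0          # linear reading over 0 to 100 percent
irish_exact = pct_effect(IRISH, 1.0)        # exact reading over the same change
irish_observed = pct_effect(IRISH, 0.14)    # observed range: 0 to 14 percent
latino_observed = pct_effect(LATINO, 0.33)  # observed range: 0 to 33 percent

if __name__ == "__main__":
    print(round(irish_linear, 1), round(irish_exact, 1))
    print(round(irish_observed, 1), round(latino_observed, 1))
```

Under these assumptions the 173% figure corresponds to the linear reading of the coefficient. The exact transformation gives a fall of about 82% over the hypothetical change from 0 to 100 percent Irish, and falls of roughly 21 to 22% over the observed ranges of percent Irish and percent Latino.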
The basic procedure to test for omitted variables bias and the correction for this bias is to (1) run the regression omitting selected variables and (2) run another regression including the proxy for these omitted variables. Columns 2 and 3 present the relevant test for the omission of housing characteristics: the age of the house, square feet of floor space, number of baths, improved attic (dummy variable), improved basement (dummy variable), air conditioned (dummy variable), and garage (dummy variable). The area of the lot is not omitted because it is not a characteristic of the house. As shown in column 1 of Table 2, there are substantial biases introduced to some coefficients, especially median income, percent multi-family units, and percent migrants. The coefficient of percent black, a major focus of the Berry [2] study, changes by +0.05, or 5%, for a change in percent black from 0 to 100. The positive direction of this omitted variables bias is perhaps unexpected, indicating that blacks in the sample occupy houses of higher “quality” than other ethnic groups.

In column 3 of Table 1 the natural log of the assessment on improvements is added to the variables used in column 2 of Table 1. The assessment variable is highly statistically significant (t = 10.85), and the coefficient is 0.25. The biases still present (comparing columns 1 and 3 of Table 1) are shown in column 2 of Table 2. Comparing columns 1 and 2 of Table 2, the absolute value of the bias is reduced for six of nine coefficients. Note that the sign of the bias changes in five of nine cases. The biases in the coefficients of median income, percent multi-family units, and percent migrants are reduced substantially. Also, the bias of the coefficient of percent black is reduced to only -0.016. However, the bias in the coefficient of percent Latino increases substantially from -0.108 to -0.312.

TABLE 2

                              Bias introduced by        Bias after including
                              omitting housing          assessment on
                              improvements              improvements
Area of lot                   0.061                     0.055
Particulate pollution         0.002                     -0.003
Median income                 0.369                     0.099
Multi-family units            -0.127                    0.057
Migrants                      0.260                     0.077
Black                         0.051                     -0.016
Latino                        -0.108                    -0.312
Irish                         0.289                     -0.289
Distance to CBD               0.011                     -0.007

Column 4 of Table 1 contains the results of omitting lot area and the neighborhood characteristics: the measure of particulate pollution, median income, percent multi-family units, percent migrants, percent black, percent Latino, percent Irish, and distance to the CBD. The omitted variables biases appear in the coefficients of square feet and air conditioned. Berry [2] contends that the assessed value of land is possibly a good proxy for neighborhood characteristics (and land area, of course), so this variable is added in column 5 of Table 1. The natural log of assessment on land is highly significant (t = 7.13) and has a coefficient of 0.208. However, as indicated in columns 1 and 2 of Table 3, the omitted variables bias is reduced in only three out of seven cases. The bias in the coefficient of square feet is substantially reduced (from 0.188 to -0.068), but the large bias in the coefficient of air conditioned remains.

TABLE 3

                              Bias introduced by omitting     Bias after including
                              neighborhood characteristics    assessment on land
                              and lot area
Age (years)                   -0.004                          -0.004
Square feet (natural log)     0.188                           -0.068
Number of baths               -0.011                          0.016
Improved attic                0.048                           0.036
Improved basement             0.002                           0.006
Air conditioned               0.077                           0.062
Garage                        0.044                           0.048

5. CONCLUSION

This paper has demonstrated the hazards in using a proxy variable to replace several independent variables. The author suggests that tests
of the kind reported here be used on a small sample before a large sample is analyzed using a proxy variable. It is possible that the omitted variables bias can be made worse when the proxy is included compared to results obtained when the proxy is excluded. Such a perverse result seems to occur in the Chicago data for the coefficient of percent Latino population. The omission of the variables that measure the attributes of the house causes a negative bias in this coefficient, but the inclusion of the proxy (assessment on improvements) makes the bias more negative.

On the other hand, the tests in this paper lend further support to Berry’s conclusion that prices in all-black neighborhoods were significantly below the prices in peripheral white neighborhoods. However, this conclusion rests upon the assumption that there are no relevant variables omitted from the set used in column 1 of Table 1. Many would be unwilling to make this assumption. In particular, the expectations that home buyers have for the future quality of neighborhoods probably play an important role in the determination of housing prices.³ No variables have been included to capture these effects. In any event, the methodological points of this paper stand.

As mentioned in the introduction, the results in this paper bear only peripherally on Berry’s basic results, which are based on the full sample of over 30,000 properties, converted to a tract format. Berry does not rely directly on the results obtained from the subsample of 275 individual properties to reach conclusions concerning racial price differences.

³ I am indebted to Professor Marcus Alexis for this point.

REFERENCES

1. R. S. Bednarz, “The Effect of Air Pollution on Property Value in Chicago.” The University of Chicago, Department of Geography, Research Paper No. 166 (1975).
2. B. J. L. Berry, Ghetto expansion and single-family housing prices: Chicago, 1968-1972. J. Urban Econ. 3, 397-423 (1976).
3. B. J. L. Berry and R. S. Bednarz, A hedonic model of prices and assessments for single-family homes: Does the assessor follow the market or the market follow the assessor? Land Econ. 51, 21-40 (1975).
4. B. J. L. Berry and R. S. Bednarz, “The Disbenefits of Neighborhood and Environment to Urban Property.” Michigan Geographical Publications (forthcoming).
5. G. S. Maddala, Econometrics. New York: McGraw-Hill Book Co. (1977).
6. B. T. McCallum, Relative asymptotic bias from errors of omission and measurement. Econometrica 40, 757-758 (1971).
7. M. R. Wickens, A note on the use of proxy variables. Econometrica 40, 759-761 (1971).