Using ridge regression with genetic algorithm to enhance real estate appraisal forecasting


Expert Systems with Applications 39 (2012) 8369–8379

Contents lists available at SciVerse ScienceDirect

Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa

Jae Joon Ahn a, Hyun Woo Byun a, Kyong Joo Oh a,⇑, Tae Yoon Kim b,1

a Department of Information and Industrial Engineering, Yonsei University, 134, Shinchon-Dong, Seodaemun-Gu, Seoul 120-749, South Korea
b Department of Statistics, Keimyung University, Daegu 704-701, South Korea

Keywords: Ridge regression; Genetic algorithm; Real estate market

Abstract

This study considers the real estate appraisal forecasting problem. While there is a great deal of literature on the use of artificial intelligence and multiple linear regression for this problem, there has always been controversy about which one performs better. Noting that this controversy is due to the difficulty of finding proper predictor variables in real estate appraisal, we propose a modified version of ridge regression, i.e., ridge regression coupled with a genetic algorithm (GA-Ridge). To examine the performance of the proposed method, an experimental study is carried out for the Korean real estate market, which verifies that GA-Ridge is effective in forecasting real estate appraisal. This study addresses two critical issues regarding the use of ridge regression, i.e., when to use it and how to improve it. © 2012 Elsevier Ltd. All rights reserved.

⇑ Corresponding author. Tel.: +82 2 2123 5720; fax: +82 2 364 7807.
1 Tel.: +82 53 580 5533.
0957-4174/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2012.01.183

1. Introduction

In recent years, interest in the performance of real estate markets and real estate investment trusts (REITs) has grown rapidly, as such information is usually required for asset valuation, property tax, insurance estimation, sales transactions, and estate planning. Conventionally, the sales comparison approach has been widely accepted for forecasting residential real estate. The sales comparison grid method, however, is often questioned for relying too much on subjective judgment in obtaining reliable and verifiable data (Wiltshaw, 1995). As a consequence, multiple linear regression (MLR) based on related predictors has been considered a rigorous alternative for enhancing the predictability of real estate and property values. MLR in turn immediately faces criticism on grounds such as nonlinearity within the data, multicollinearity among the predictor variables, and the inclusion of outliers in the sample. As is often the case with other financial forecasting problems, this criticism has prompted researchers to resort to artificial neural networks (ANN) as another logical alternative (Ahn, Lee, Oh, & Kim, 2009; Chen & Du, 2009; Dong & Zhou, 2008; Lee, Booth, & Alam, 2005; Lu, 2010; Oh & Han, 2000; Versace, Bhatt, Hinds, & Shiffer, 2004). Follow-up studies observe, however, that neither ANN nor MLR dominates the other, i.e., ANN outperforms MLR in some cases while MLR outperforms ANN in others (Dehghan, Sattari, Chehreh, & Aliabadi, 2010; Hua, 1996; Nguyen & Cripps, 2001; Worzala, Lenk, & Silva, 1995). In this study, it will be shown that this confusing situation arises from the difficulty of finding proper predictor variables and can be resolved quite successfully by a modified version of ridge regression, i.e., ridge regression coupled with a genetic algorithm (GA-Ridge).

Theoretically as well as practically, there has been widespread and strong objection to the arbitrary use of ridge regression. The main criticisms are twofold. Firstly, though it is well known that ridge regression is effective when the unknown parameters (the linear coefficients) are known a priori to have small modulus, it is hard to obtain or implement such prior information. Secondly, blind use of ridge regression is likely to turn a nonsignificant predictor variable into a significant one easily. Our study addresses these two critical issues and proposes GA-Ridge as a measure that handles both.

The rest of the study is organized as follows. Section 2 discusses the background of this article, including the difficulty of finding proper predictor variables in real estate forecasting. Section 3 gives a detailed description of the proposed GA-Ridge and discusses its effectiveness in handling the two critical issues of ridge regression. In Section 4, GA-Ridge is applied to the Korean real estate market to demonstrate its effectiveness. Lastly, concluding remarks are given in Section 5.

2. Background

2.1. Predictor variable for real estate forecasting

Forecasting of asset pricing is a major issue in real estate practice (Bourassa, Cantoni, & Hoesli, 2010; Chica-Olmo, 2007; McCluskey & Anand, 1999; O'Roarty et al., 1997; Peterson & Flanagan, 2009; Tay & Ho, 1994; Wilson, Paris, Ware, & Jenkins, 2002). Property development relies on prediction of expected costs and returns (Allen, Madura, & Springer, 2000; Evans, James, & Collins, 1993; Juan, Shin, & Perng, 2006; McCluskey & Anand, 1999; Pivo & Fisher, 2010). Property and facilities managers need forecasts of supply and demand as well as of cost and return. Fund and investment managers rely on forecasts of the present and future values of real estate in terms of economic activities.

In these real estate forecasting problems, there has been heated controversy over the superiority of ANN over MLR as the proper tool ever since the use of ANN for residential valuation was first suggested by Jensen (1990). Rossini (1997) assesses the application of ANN and MLR to residential valuation and supports the use of MLR, while Do and Grudnitski (1992) suggest ANN as a superior technique. Worzala et al. (1995) notice that while ANN slightly outperforms MLR in some cases, the difference between the two is insignificant. Hua (1996) and Brooks and Tsolacos (2003) support ANN over MLR with some cautionary notes on predictor variables, and McGreal, Berry, McParland, and Turner (2004) express skepticism about ANN. Since ANN is designed mainly to model an arbitrary functional relationship (that is, to correct modeling bias), ANN is expected to outperform MLR when the linear model suffers from significant modeling bias, while MLR is expected to outperform ANN otherwise. Thus the absence of clear-cut superiority between ANN and MLR implicitly suggests that some source of trouble other than incorrect modeling might exist in real estate appraisal forecasting. One possible source of trouble is the difficulty of finding significant and reliable predictors, as discussed by several authors.

Rossini (1997) noticed that quantitative predictor variables such as past sale price, land area, rooms and year of construction tend to suffer from a lack of qualitative measures, while qualitative predictor variables such as building style and environment are frequently rather simplistic and fail to capture sufficient information. Similar observations are made by Brooks and Tsolacos (2003) and McGreal et al. (2004). In particular, Brooks and Tsolacos (2003) noticed that the significant predictors depend on the methodology used. Taken together, these discussions clearly suggest that finding a proper set of predictor variables is hard in real estate appraisal and that it would be highly desirable to handle this predictor selection problem technically.

2.2. Ridge regression

Ridge regression is known as a very useful tool for alleviating the multicollinearity problem (Walker & Birch, 1988). It is formulated as a least squares problem subject to a specific type of restriction on the parameters. The standard approach to solving an overdetermined system of linear equations:

$$Y = Xb$$

is known as linear least squares and seeks to minimize the residual

$$\|Y - Xb\|^2,$$

where $Y$ is an $n \times 1$ vector, $X$ is an $n \times p$ matrix ($n \ge p$), $b$ is a $p \times 1$ vector and $\|\cdot\|$ is the Euclidean norm. However, the matrix $X$ may be ill conditioned or singular, yielding a non-unique solution. In order to give preference to a particular solution with desirable properties, a regularization term is included in this minimization:

$$\|Y - Xb\|^2 + k\|b\|^2.$$

This regularization improves the conditioning of the problem, thus enabling a numerical solution. An explicit solution, denoted by $\hat{b}$, is given by

$$\hat{b} = \hat{b}(k) = (X'X + kI)^{-1} X'Y, \qquad (1)$$

where $k$ is a positive number. In applications, the interesting values of $k$ usually lie in the range (0, 1). This procedure is called ridge regression. It is well known that ridge regression can be regarded as an estimation of $b$ from the data subject to the prior knowledge that smaller values in modulus of the $b$'s are more likely than larger values, and that larger and larger values of the $b$'s are more and more unlikely. Thus ridge regression is quite useful when the $b$'s are expected to be small rather than large in modulus. In this context, one major drawback of ridge regression is its "unchecked arbitrariness" when it is implemented in practice. Indeed, the characteristic effect of the ridge regression procedure is to change a non-significant estimated $b$ into a significant estimated $b$, and hence it is questionable whether much real improvement is achieved by such a procedure; see Draper and Smith (1981).

Table 1. Training and testing period for moving window scheme.

| Window number | Training period | Testing period |
|---|---|---|
| 1 | 1996.07–2004.12 | 2005.01–2005.06 |
| 2 | 1997.01–2005.06 | 2005.07–2005.12 |
| 3 | 1997.07–2005.12 | 2006.01–2006.06 |
| 4 | 1998.01–2006.06 | 2006.07–2006.12 |
| 5 | 1998.07–2006.12 | 2007.01–2007.06 |
| 6 | 1999.01–2007.06 | 2007.07–2007.12 |
| 7 | 1999.07–2007.12 | 2008.01–2008.06 |
| 8 | 2000.01–2008.06 | 2008.07–2008.12 |
| 9 | 2000.07–2008.12 | 2009.01–2009.06 |
| 10 | 2000.01–2009.06 | 2009.07–2009.12 |

Fig. 1. Moving window scheme.
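As a minimal sketch of the explicit solution (1), the following plain-Python code forms X'X + kI and X'Y and solves the resulting linear system. The function names `solve` and `ridge` and the toy data are illustrative assumptions, not part of the original study.

```python
# Sketch of Eq. (1): b_hat(k) = (X'X + kI)^{-1} X'Y, using only the standard library.
# The names solve/ridge and the toy data are illustrative assumptions.

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge(X, Y, k):
    """Return b_hat(k) for an n x p design matrix X (list of rows) and response Y."""
    p = len(X[0])
    # X'X + kI
    A = [[sum(row[i] * row[j] for row in X) + (k if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    # X'Y
    rhs = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(p)]
    return solve(A, rhs)


# Two perfectly collinear predictors: X'X is singular, so k = 0 gives no unique
# solution, while any k > 0 makes the system solvable.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
Y = [2.0, 4.0, 6.0]
b = ridge(X, Y, k=0.1)
```

With the collinear toy data the two coefficients come out equal and slightly shrunk toward zero, which illustrates how the term kI restores a unique solution.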

2.3. Genetic algorithm

GA is a stochastic search technique that can explore large and complicated spaces using ideas from natural genetics and evolutionary principles (Goldberg, 1989; Holland, 1975; Oh, Kim, & Min, 2005). It has been demonstrated to be effective and robust in searching very large spaces in a wide range of applications (Koza, 1993). GA is particularly suitable for multi-parameter optimization problems with an objective function subject to numerous hard and soft constraints. GA performs the search process in four stages: (i) initialization, (ii) selection, (iii) crossover, and (iv) mutation (Wong & Tan, 1994). In the initialization stage, a population of genetic structures (known as chromosomes) that are randomly distributed in the solution space is selected as the starting point of the search. After the initialization stage, each chromosome is evaluated using a user-defined fitness function. The goal of the fitness function is to encode numerically the performance of the chromosome. For real-world applications of optimization methods such as GA, the choice of the fitness function is the most critical step. In this paper, GA is employed to find the optimal k and a proper set of predictors simultaneously.

Table 2. List of predictor variables.

| Selected predictor | Explanation | Computation |
|---|---|---|
| X1 | Note default rate (NDR) | Using raw data |
| X2 | Size of the run of increasing X1 during the latest 12 months | $\sum_{i=t-11}^{t} (Z1)_i$; $Z1 = 1$ if $(X1)_t > (X1)_{t-1}$; 0 otherwise |
| X3 | Change rate of foreign exchange holdings (FEH) | Ratio of the current month to the same month of the last year |
| X4 | Change rate of money stock | Ratio of the current month to the same month of the last year |
| X5 | Change rate of producer price index | Ratio of the current month to the same month of the last year |
| X6 | Change rate of consumer price index | Ratio of the current month to the same month of the last year |
| X7 | Change rate of balance of trade | Ratio of the current month to the same month of the last year |
| X8 | Change rate of index of industrial production | Ratio of the current month to the same month of the last year |
| X9 | Size of the run of decreasing X8 during the latest 12 months | $\sum_{i=t-11}^{t} (Z8)_i$; $Z8 = 1$ if $(X8)_t < (X8)_{t-1}$; 0 otherwise |
| X10 | Change rate of index of producer shipment | Ratio of the current month to the same month of the last year |
| X11 | Change rate of index of equipment investment | Ratio of the current month to the same month of the last year |
| X12 | FEH per gross domestic product | FEH/GDP |
| X13 | Size of the run of decreasing X12 during the latest 12 months | $\sum_{i=t-11}^{t} (Z8)_i$; $Z12 = 1$ if …; 0 otherwise |
| X14 | Size of the run of decreasing monthly change of X12 during the latest 12 months | $\sum_{i=t-11}^{t} (Z12)_i$; $Z12 = 1$ if $(X12)_t < (X12)_{t-1}$; 0 otherwise |
| X15 | Change rate of FEH per GDP | Ratio of the current month to the same month of the last year |
| X16 | Balance of trade per GDP | BOT/GDP |
| X17 | Size of the run of increasing X16 during the latest 16 months | $\sum_{i=t-11}^{t} (Z16)_i$; $Z16 = 1$ if $(X16)_t < (X16)_{t-1}$; 0 otherwise |
| X18 | Size of the run of negative X16 during the latest 16 months | $\sum_{i=t-11}^{t} (Z16)_i$; $Z16 = 1$ if $(X16)_t < 0$; 0 otherwise |
| X19 | Balance of payments (direct investment) | Use raw data |
| X20 | Balance of payments (securities investment) | Use raw data |
| X21 | Other balance of payments | Use raw data |
| X22 | Amount of foreigners' investment in the stock market | Use raw data |
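The run-type predictors in Table 2 (for example X2 and X9) are sums of monthly indicators over a trailing window. The sketch below implements that sum as the formula is printed, assuming a 12-month window and a series indexed by month; the function name `run_count` and the sample note default rates are hypothetical.

```python
# Sketch of the run-type predictors of Table 2, e.g. X2 = sum_{i=t-11}^{t} Z_i with
# Z_i = 1 if the series rose from month i-1 to month i. The data are hypothetical.

def run_count(series, t, window=12, increasing=True):
    """Count the months i in (t-window, t] in which the series rose (or fell)."""
    total = 0
    for i in range(t - window + 1, t + 1):
        if i < 1:
            continue  # no previous month to compare with
        diff = series[i] - series[i - 1]
        if (increasing and diff > 0) or (not increasing and diff < 0):
            total += 1
    return total


# Thirteen hypothetical monthly note default rates (months 0..12).
ndr = [0.10, 0.12, 0.11, 0.13, 0.15, 0.14, 0.14,
       0.16, 0.18, 0.17, 0.19, 0.20, 0.21]
x2_like = run_count(ndr, t=12)                    # months with an increase
x9_like = run_count(ndr, t=12, increasing=False)  # months with a decrease
```

Unchanged months count toward neither predictor, which follows from the strict inequalities in the table.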

3. GA-Ridge algorithm

This section describes the procedure of the GA-Ridge algorithm. A general multiple linear regression model is represented as follows:

$$Y_i = D_1 b_1 X_{i1} + \cdots + D_p b_p X_{ip} + e_i, \qquad i = 1, 2, \ldots, n, \qquad (2)$$

where $D_j$ ($j = 1, \ldots, p$) equals either 0 or 1 according to whether the predictor variable $X_j$ ($j = 1, \ldots, p$) is included in model (2). Then, over the numerous combinations of $(D_1, D_2, \ldots, D_p)$ and the various values of $0 < k < 1$, the optimal $(D_1^*, D_2^*, \ldots, D_p^*, k^*)$ is searched by GA, i.e.,

$$(D_1^*, D_2^*, \ldots, D_p^*, k^*) = \arg\min_{D_1, D_2, \ldots, D_p, k} \sum_{t=1}^{n} \left[ Y_t - \hat{Y}_t(D_1, D_2, \ldots, D_p, k) \right]^2 = \arg\min_{D_1, D_2, \ldots, D_p, k} \mathrm{SSE}(D_1, D_2, \ldots, D_p, k), \qquad (3)$$

where $\hat{Y}_t(D_1, D_2, \ldots, D_p, k)$ is the $t$-th element of $X(D_1, D_2, \ldots, D_p)\,\hat{b}(D_1, D_2, \ldots, D_p, k)$, $X(D_1, D_2, \ldots, D_p)$ is an $n \times q$ matrix with $q \le p$, and

$$\hat{b}(D_1, D_2, \ldots, D_p, k) = \left[ X(D_1, D_2, \ldots, D_p)' X(D_1, D_2, \ldots, D_p) + kI \right]^{-1} X(D_1, D_2, \ldots, D_p)' Y.$$

Thus the final GA-Ridge regression estimator is

$$\hat{Y}_{GA} = X(D_1^*, D_2^*, \ldots, D_p^*)\, \hat{b}(D_1^*, D_2^*, \ldots, D_p^*, k^*). \qquad (4)$$

Fig. 2. Performance comparison of ANN and MLR for HSI forecasting during evaluation period.

Remark 1. Ridge regression is preferred when one expects the b's to be small in modulus. Note that insignificant b's may be among those that are small in modulus. The problem is that it is usually hard to identify such a situation effectively, because deciding that the b's are small in modulus requires subjective judgment. To resolve this problem, we propose to use ridge regression "when neither ANN nor MLR excels the other that significantly". This litmus rule for using ridge regression is based on the fact that the quoted situation arises when b's of small modulus are very likely or reliable predictor variables are hard to find. The litmus rule is desirable because it depends little on subjective judgment. Refer to Section 4 for how to implement the litmus rule in practice.

Remark 2. One strong criticism against ridge regression is that $\hat{b}(k)$ in (1) arbitrarily changes a non-significant estimated b into a significant estimated b. The estimator $\hat{b}(D_1^*, D_2^*, \ldots, D_p^*, k^*)$ of (4) prevents this unchecked arbitrariness effectively, since the optimal k and the optimal predictor variables are searched "simultaneously". Note that GA plays a key role in the GA-Ridge algorithm because it is particularly suitable for multi-parameter optimization problems with an objective function subject to numerous hard and soft constraints.
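The joint search over the inclusion flags and the ridge value in (3) can be sketched with a small genetic algorithm. This is an illustration under stated assumptions rather than the authors' implementation: the population size, truncation selection, one-point crossover, mutation rate and all function names are choices made here. Eq. (3) does not state which observations the SSE is taken over, so the sketch scores each chromosome on a held-out part of the data, because an in-sample SSE would tend to favor every predictor and the smallest k.

```python
import random


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ridge(X, Y, k):
    """b_hat(k) = (X'X + kI)^{-1} X'Y for a list-of-rows matrix X."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (k if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    return solve(A, [sum(r[i] * y for r, y in zip(X, Y)) for i in range(p)])


def fitness(mask, k, fit, val):
    """Hold-out SSE of the ridge fit restricted to predictors with mask[j] == 1."""
    cols = [j for j, d in enumerate(mask) if d]
    (Xf, Yf), (Xv, Yv) = fit, val
    if not cols:  # an empty model forecasts 0
        return sum(y * y for y in Yv)
    b = ridge([[r[j] for j in cols] for r in Xf], Yf, k)
    return sum((y - sum(bj * r[j] for bj, j in zip(b, cols))) ** 2
               for r, y in zip(Xv, Yv))


def ga_ridge(fit, val, pop_size=30, generations=40, seed=0):
    """Search (D_1, ..., D_p, k) with a basic GA; returns (mask, k)."""
    rng = random.Random(seed)
    p = len(fit[0][0])

    def score(chrom):
        return fitness(chrom[0], chrom[1], fit, val)

    # (i) initialization: random inclusion flags D_j and ridge value 0 < k < 1
    pop = [([rng.randint(0, 1) for _ in range(p)], rng.uniform(0.001, 0.999))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score)
        parents = pop[: pop_size // 2]  # (ii) selection: keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            (m1, k1), (m2, k2) = rng.sample(parents, 2)
            cut = rng.randint(1, p - 1) if p > 1 else 0
            mask, k = m1[:cut] + m2[cut:], (k1 + k2) / 2  # (iii) crossover
            if rng.random() < 0.2:  # (iv) mutation
                j = rng.randrange(p)
                mask[j] = 1 - mask[j]
                k = min(0.999, max(0.001, k + rng.gauss(0.0, 0.05)))
            children.append((mask, k))
        pop = parents + children
    return min(pop, key=score)


# Hypothetical data: the response depends on the first predictor only.
rows = [[float(t), (-1.0) ** t, float(t % 3 - 1)] for t in range(1, 21)]
ys = [2.0 * r[0] for r in rows]
best_mask, best_k = ga_ridge((rows[:15], ys[:15]), (rows[15:], ys[15:]))
```

Keeping the better half of the population unchanged in every generation means the best chromosome found so far is never lost, which is a design choice of this sketch and not something the text specifies.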

4. Empirical studies

4.1. Experimental setting

In this experimental study, forecasting of the home sales index (HSI) and the home rental index (HRI) in the Korean real estate market is considered. These monthly indexes are produced and maintained by KB bank, one of the major banks in Korea, for the purpose of monitoring real estate market movements. In this study, the forecasting analysis of HSI and HRI covers a 14-year period from July 1996 to December 2009.

In order to evaluate the forecasting accuracy of the GA-Ridge algorithm under different experimental situations, a "moving window scheme" is employed. A moving window is a block of time series data of size l comprising a first sub-block of size l1 and a second sub-block of size l2 (i.e., l = l1 + l2). The window moves by l2 each time, and thus each moving window of size l overlaps the next window by l − l2 = l1 observations. The latter sub-block of size l2 is held out for evaluation, while the GA-Ridge algorithm is implemented on the former sub-block of size l1. The moving window scheme with 10 windows is illustrated in Table 1 and Fig. 1. Refer also to Jang, Lai, Jiang, Pan, and Chien (1993) and Hwarng (2001).

For experimenting with the GA-Ridge algorithm with monthly HRI or monthly HSI as the predicted variable, the predictor variables used for monitoring economic conditions in Ahn, Oh, Kim, and Kim (2011) are employed as predictors here. Specifically, the three major economic variables (foreign exchange rates, interest rates, and stock market index) and key macroeconomic predictors such as GDP and trade balance, together with their derivations, are included to obtain 22 predictors; refer to Table 2. Note that all the predictors are monthly data and were developed by Ahn et al. (2011) for the purpose of monitoring Korean economic conditions. The rationale for this selection is that the economic condition itself is obviously quite influential on the real estate market but hard to quantify as a single predictor. Thus it is decomposed into the predictors in Table 2 instead.

In order to evaluate the forecasting accuracy, the following three distance metrics (5)–(7) are employed: root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( Y_t - \hat{Y}_t \right)^2}, \qquad (5)$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} \left| Y_t - \hat{Y}_t \right|, \qquad (6)$$

$$\mathrm{MAPE} = \frac{1}{n} \sum_{t=1}^{n} \left| \frac{Y_t - \hat{Y}_t}{Y_t} \right| \times 100, \qquad (7)$$

where $Y_t$ and $\hat{Y}_t$ are respectively the actual and forecasted values of HSI or HRI at time t. While both MAE and RMSE are simply measures of discrepancies between the predicted values and the actual observations, MAPE measures scaled discrepancies at each t.

4.2. Experimental results

Forecasting analysis is done for both HSI and HRI. Since the results for HRI are quite similar to those for HSI, a detailed forecasting analysis of HSI is given first, followed by a brief summary of the HRI analysis.

As a prior validity check for using GA-Ridge for HSI forecasting, we examined two things. Firstly, we compare the performance of MLR and ANN, which is our litmus rule for using GA-Ridge (refer to Remark 1). Fig. 2 shows that neither ANN nor MLR outperforms its counterpart uniformly throughout the 10 windows. Recall that our experiments use the moving window scheme with 10 windows described in Fig. 1. Secondly, the significance of each of the 22 predictors is tested individually when they are employed for MLR at each window (see Table 3). Table 3 shows that most of the predictors are not significant and that the significant predictors vary from window to window. In addition, it shows that the estimated coefficients are close to zero in modulus. For editorial purposes, test results for only windows 1 and 10 are given in Table 3 (the others point to similar conclusions). This is not really surprising, because each predictor, as one component of the current economic condition, is expected to have only indirect influence on HSI, even though the current economic condition itself evidently has great influence. As a result, it seems technically as well as intuitively desirable to employ GA-Ridge as an appropriate method for this particular forecasting problem.

Table 3. Significance test for 22 predictor variables when MLR is used for HSI forecasting.

| Predictor | Coefficient (Window No. 1) | p-Value (Window No. 1) | Coefficient (Window No. 10) | p-Value (Window No. 10) |
|---|---|---|---|---|
| X1 | 0.0014402 | 0.5826 | 0.0047141 | 0.6146 |
| X2 | 0.0010064 | 0.3458 | 0.0004101 | 0.6759 |
| X3 | 0.058566 | 0.0553 | 0.076859 | 0.2488 |
| X4 | 0.054909 | 0.2217 | 0.094025 | 0.048 |
| X5 | 0.082096 | 0.0757 | 0.11661 | 0.0647 |
| X6 | 0.053088 | 0.655 | 0.14433 | 0.3806 |
| X7 | 2.686E-05 | 0.411 | 4.36E-05 | 0.46 |
| X8 | 0.017776 | 0.7411 | 0.15811 | 0.0313 |
| X9 | 0.0008066 | 0.326 | 0.0012202 | 0.2552 |
| X10 | 0.039748 | 0.4673 | 0.15055 | 0.0569 |
| X11 | 0.012778 | 0.1496 | 0.012823 | 0.3028 |
| X12 | 7.649E-05 | 0.1309 | 5.252E-05 | 0.2384 |
| X13 | 0.0002201 | 0.7276 | 0 | |
| X14 | 0.003623 | 0.0005 | 0.0009064 | 0.4315 |
| X15 | 0.071041 | 0.0166 | 0.09998 | 0.1568 |
| X16 | 1.9552 | 0.0004 | 0.23245 | 0.6975 |
| X17 | 0.0007475 | 0.396 | 0.0016523 | 0.0887 |
| X18 | 0.0022285 | 0.008 | 0.0006417 | 0.7232 |
| X19 | 1.94E-06 | 0.1587 | 3.88E-07 | 0.6875 |
| X20 | 4.56E-07 | 0.358 | 1.83E-07 | 0.642 |
| X21 | 3.37E-07 | 0.4 | 1.13E-07 | 0.654 |
| X22 | 1.54E-09 | 0.0455 | 2.76E-10 | 0.5506 |

For forecasting performance comparison with GA-Ridge, three other forecasting methods are considered: MLR, pure ridge regression and ANN. Pure ridge regression is considered here in order to assess how effectively GA-Ridge resolves the unchecked arbitrariness of ridge regression mentioned in Remark 2. Fig. 3 depicts the forecasting results when each method is employed during the testing periods of the moving window scheme. Note that the testing periods are connected continuously without any time break, starting from January 2005 (refer to Fig. 1). Table 4 summarizes Fig. 3 numerically by calculating the RMSE, MAE and MAPE values of the four methods. It is easy to see from Table 4 that GA-Ridge is superior to the other methods across the three distance metrics.

Fig. 3. Predicted vs actual HSI during testing period.

Table 4. Numerical comparison of four forecasting methods for HSI.

| Distance metric | GA-Ridge | Multiple regression | Ridge regression | ANN |
|---|---|---|---|---|
| RMSE | 0.0074 | 0.0104 | 0.0088 | 0.0110 |
| MAE | 0.0055 | 0.0086 | 0.0069 | 0.0088 |
| MAPE | 239.66 | 304.91 | 244.72 | 291.05 |

To understand things better, mean difference tests (paired t-tests) are done for the 6 pairs formed from the 4 methods. From the calculation of MSE, a set of data $W_j = \{ w_{tj} = (Y_{tj} - \hat{Y}_{tj})^2 : t = 1, \ldots, n \}$ is obtained for j = 1 (GA-Ridge), j = 2 (MLR), j = 3 (pure ridge) and j = 4 (ANN), and paired tests are done for the six pairs (W1, W2), (W1, W3), (W1, W4), (W2, W3), (W2, W4) and (W3, W4). A similar procedure is followed for MAE and MAPE. The results of these paired tests, given in Table 5, verify that the performances of the four methods are significantly different from each other except for the pair (ANN, MLR). This together with Fig. 2 and Table 4 confirms the superior performance of GA-Ridge and the insignificant difference between the performances of MLR and ANN.

Table 5. p-Values of 6 paired t-tests for four forecasting methods on HSI.

| Pair | (a) MSE | (b) MAE | (c) MAPE |
|---|---|---|---|
| GA-Ridge vs multiple regression | 0.000* | 0.000* | 0.000* |
| GA-Ridge vs ridge regression | 0.000* | 0.001* | 0.000* |
| GA-Ridge vs ANN | 0.000* | 0.000* | 0.001* |
| Multiple regression vs ridge regression | 0.003* | 0.000* | 0.011* |
| Multiple regression vs ANN | 0.266 | 0.421 | 0.425 |
| Ridge regression vs ANN | 0.009* | 0.009* | 0.043* |

* Significant at 5%.

Finally, the predictors selected by GA-Ridge are examined for their significance at each window in Table 6, which shows that almost all the selected predictors become significant under the GA-Ridge method. Again for editorial purposes, only results for windows 1, 2 and 10 are given. The above comparison studies altogether indicate the following: (i) ANN and MLR are evenly matched. (ii) Pure ridge is improved significantly by GA-Ridge. (iii) GA-Ridge easily outperforms the others. Note that (i) recommends the use of GA-Ridge (see Remark 1), while (ii) implies that the arbitrariness of pure ridge is checked by GA-Ridge (see Remark 2).

Table 6. Significance test for the selected variables when GA-Ridge is used for HSI forecasting.

(a) Window No. 1 (ridge value 0.0019)

| Selected variable | Coefficient | p-Value |
|---|---|---|
| X1 | 0.0037 | 0.0831 |
| X2 | 0.0013 | 0.0747 |
| X3 | 0.0381 | 0.0045 |
| X5 | 0.0984 | 0.0001 |
| X14 | 0.0025 | 0.0042 |
| X15 | 0.0473 | 0.0009 |
| X16 | 1.7609 | 0.0001 |
| X18 | 0.0019 | 0.0001 |
| X22 | 1.93E-09 | 0.0007 |

(b) Window No. 2 (ridge value 0.0007)

| Selected variable | Coefficient | p-Value |
|---|---|---|
| X3 | 0.0416 | 0.0234 |
| X4 | 0.0539 | 0.1198 |
| X5 | 0.0767 | 0.0013 |
| X9 | 0.0011 | 0.0501 |
| X12 | 0.0001 | 0.0192 |
| X14 | 0.0021 | 0.0004 |
| X15 | 0.0522 | 0.0055 |
| X16 | 2.1558 | 0.0001 |
| X18 | 0.0017 | 0.0001 |
| X19 | 0.000002 | 0.187 |
| X20 | 0.0000005 | 0.2578 |
| X22 | 9.83E-10 | 0.1143 |

(c) Window No. 10 (ridge value 0.0027)

| Selected variable | Coefficient | p-Value |
|---|---|---|
| X4 | 0.0545 | 0.0314 |
| X5 | 0.1291 | 0.0001 |
| X6 | 0.1397 | 0.1571 |
| X8 | 0.1324 | 0.0054 |
| X9 | 0.0011 | 0.1278 |
| X10 | 0.1152 | 0.0169 |
| X16 | 0.5481 | 0.2067 |
| X17 | 0.0019 | 0.0372 |

Fig. 4. Performance comparison of ANN and MLR for HRI forecasting during testing period.

Table 7. Significance test for 22 predictor variables when MLR is used for HRI forecasting.

| Predictor | Coefficient (Window No. 1) | p-Value (Window No. 1) | Coefficient (Window No. 10) | p-Value (Window No. 10) |
|---|---|---|---|---|
| X1 | 0.00005 | 0.9909 | 0.01136 | 0.2566 |
| X2 | 0.00204 | 0.2479 | 7.792E-06 | 0.994 |
| X3 | 0.06572 | 0.1896 | 0.07624 | 0.2827 |
| X4 | 0.06273 | 0.3963 | 0.12016 | 0.0183 |
| X5 | 0.09897 | 0.1922 | 0.07448 | 0.2648 |
| X6 | 0.05789 | 0.7676 | 0.06548 | 0.7083 |
| X7 | 4.231E-05 | 0.4324 | 6.922E-05 | 0.2719 |
| X8 | 0.07073 | 0.4263 | 0.11270 | 0.1465 |
| X9 | 0.00073 | 0.5915 | 0.00070 | 0.4835 |
| X10 | 0.02647 | 0.7689 | 0.08147 | 0.3294 |
| X11 | 0.01088 | 0.4547 | 0.01323 | 0.3183 |
| X12 | 0.00011 | 0.1922 | 2.074E-05 | 0.6609 |
| X13 | 0.00041 | 0.6975 | 0 | 0.0000 |
| X14 | 0.00396 | 0.0185 | 0.00097 | 0.4296 |
| X15 | 0.05529 | 0.2516 | 0.08063 | 0.2826 |
| X16 | 2.65423 | 0.0031 | 0.04988 | 0.9376 |
| X17 | 0.00029 | 0.8365 | 0.00204 | 0.0491 |
| X18 | 0.00357 | 0.0099 | 0.00170 | 0.3789 |
| X19 | 3.072E-06 | 0.1759 | 3.71E-07 | 0.7183 |
| X20 | 2.065E-06 | 0.0131 | 3.33E-07 | 0.4286 |
| X21 | 4.27E-07 | 0.5178 | 1.21E-07 | 0.6512 |
| X22 | 7.82E-10 | 0.5339 | 8.52E-11 | 0.8627 |

For the forecasting analysis of HRI, almost identical steps are followed. Fig. 4 shows that neither ANN nor MLR outperforms its counterpart uniformly throughout the 10 windows. The significance of the 22 predictor variables is tested for HRI in Table 7, which suggests that most of the predictor variables have weak influence on HRI, though some of them show strong significance depending on the window. For forecasting performance comparison, the four forecasting methods are considered again. Fig. 5 depicts the forecasting results when each method is employed during the testing periods of the moving window scheme. Table 8 then summarizes Fig. 5 numerically and confirms that GA-Ridge is superior to the other methods across the three distance metrics. Again, mean difference tests (paired t-tests) are done for the 6 pairs formed from the four methods, based on the calculation of MSE, MAE and MAPE. The results of the paired tests in Table 9 verify that the performances of the four methods are significantly different from each other except for the pair (ANN, MLR). Finally, the predictors and the ridge value k selected by GA-Ridge are examined for their significance at each window in Table 10, which shows that almost all the selected predictors become significant under the GA-Ridge method.
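The evaluation protocol of this section can be sketched as follows: generate the moving windows, compute the distance metrics (5)–(7), and form the paired t statistic on per-period losses. The function names and default sizes (162 months, 102 for training, 6 for testing) are assumptions read off Table 1. Under this fixed-length rule window 10 starts training in 2001.01, whereas Table 1 prints 2000.01. Only the t statistic and its degrees of freedom are computed here; a p-value needs the Student t distribution, for example via SciPy's `scipy.stats.ttest_rel`.

```python
import math
from statistics import mean, stdev


def month_label(idx, start_year=1996, start_month=7):
    """Label (YYYY.MM) of the idx-th month counted from the start of the sample."""
    m = start_month - 1 + idx
    return f"{start_year + m // 12}.{m % 12 + 1:02d}"


def moving_windows(n_months=162, train=102, test=6):
    """Return (train_start, train_end, test_start, test_end) for each window.
    The window slides forward by `test` months each time."""
    out = []
    start = 0
    while start + train + test <= n_months:
        out.append(tuple(month_label(i) for i in
                         (start, start + train - 1,
                          start + train, start + train + test - 1)))
        start += test
    return out


def rmse(actual, forecast):
    """Eq. (5): root mean squared error."""
    return math.sqrt(mean((y - f) ** 2 for y, f in zip(actual, forecast)))


def mae(actual, forecast):
    """Eq. (6): mean absolute error."""
    return mean(abs(y - f) for y, f in zip(actual, forecast))


def mape(actual, forecast):
    """Eq. (7): mean absolute percentage error (actual values must be nonzero)."""
    return 100.0 * mean(abs((y - f) / y) for y, f in zip(actual, forecast))


def paired_t(w_a, w_b):
    """t statistic and degrees of freedom of the paired mean-difference test
    applied to two loss series, e.g. W_1 and W_2 of squared errors."""
    d = [a - b for a, b in zip(w_a, w_b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d))), len(d) - 1


windows = moving_windows()
# Hypothetical actual and forecast values, used only to exercise the functions.
actual, forecast = [1.0, 2.0, 4.0], [1.0, 1.0, 2.0]
scores = (rmse(actual, forecast), mae(actual, forecast), mape(actual, forecast))
t_stat, dof = paired_t([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 1.0, 1.0])
```

Each tuple in `windows` pairs a 102-month training block with the 6-month testing block that follows it, so the testing blocks join up without gaps from 2005.01 to 2009.12.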

Fig. 5. Predicted vs actual HRI during testing period.

Table 8. Comparison of four forecasting methods for HRI.

| Distance metric | GA-Ridge | Multiple regression | Ridge regression | ANN |
|---|---|---|---|---|
| RMSE | 0.0057 | 0.0111 | 0.0070 | 0.0112 |
| MAE | 0.0045 | 0.0079 | 0.0051 | 0.0085 |
| MAPE | 131.14 | 208.97 | 140.73 | 242.38 |

Table 9. p-Values of 6 paired t-tests for four forecasting methods on HRI.

| Pair | (a) MSE | (b) MAE | (c) MAPE |
|---|---|---|---|
| GA-Ridge vs multiple regression | 0.003* | 0.000* | 0.000* |
| GA-Ridge vs ridge regression | 0.020* | 0.022* | 0.018* |
| GA-Ridge vs ANN | 0.000* | 0.000* | 0.002* |
| Multiple regression vs ridge regression | 0.002* | 0.000* | 0.000* |
| Multiple regression vs ANN | 0.470 | 0.298 | 0.151 |
| Ridge regression vs ANN | 0.000* | 0.001* | 0.049* |

* Significant at 5%.

Table 10. Significance test for the selected variables when GA-Ridge is used for HRI forecasting.

(a) Window No. 1 (ridge value 0.01396)

| Selected variable | Coefficient | p-Value |
|---|---|---|
| X2 | 0.0007 | 0.4126 |
| X3 | 0.0088 | 0.0968 |
| X4 | 0.1116 | 0.0107 |
| X5 | 0.1686 | 0.0001 |
| X7 | 0.0000 | 0.2943 |
| X12 | 0.0001 | 0.0036 |
| X14 | 0.0019 | 0.0428 |
| X16 | 2.3114 | 0.0001 |
| X18 | 0.0019 | 0.0009 |
| X19 | 0.0000 | 0.3463 |
| X20 | 0.0000 | 0.0005 |

(b) Window No. 2 (ridge value 0.01531)

| Selected variable | Coefficient | p-Value |
|---|---|---|
| X1 | 0.0041 | 0.0964 |
| X3 | 0.01289 | 0.0001 |
| X5 | 0.1242 | 0.0003 |
| X7 | 0.0001 | 0.0453 |
| X10 | 0.0228 | 0.0068 |
| X14 | 0.0032 | 0.0006 |
| X16 | 2.4530 | 0.0001 |
| X18 | 0.0017 | 0.0002 |
| X19 | 0.0000 | 0.1365 |
| X20 | 0.0000 | 0.0001 |

(c) Window No. 9 (ridge value 0.0005)

| Selected variable | Coefficient | p-Value |
|---|---|---|
| X4 | 0.0673 | 0.0183 |
| X5 | 0.1591 | 0.0001 |
| X6 | 0.2497 | 0.0339 |
| X7 | 0.0001 | 0.1600 |
| X8 | 0.0356 | 0.0173 |
| X9 | 0.0015 | 0.0465 |
| X14 | 0.0012 | 0.0238 |
| X17 | 0.0012 | 0.1975 |
| X22 | 5.75E-10 | 0.0974 |

5. Concluding remarks

We studied ridge regression as an alternative tool in real estate forecasting, where one usually faces difficulty finding proper predictors. GA-Ridge is proposed here and its performance is examined against other forecasting methods. It is shown that GA is not only successful for real estate forecasting but also nicely settles critical issues in ridge regression. Experimental results are given to justify GA-Ridge. It is noteworthy from the experimental results that GA-Ridge is particularly well suited to cases where a desirable predictor is hard to quantify but can be decomposed into various other predictors, each having less influence on the response.

Acknowledgment

T. Y. Kim's work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (MEST) (KRF-2011-0015936).

References

Ahn, J. J., Lee, S. J., Oh, K. J., & Kim, T. Y. (2009). Intelligent forecasting for financial time series subject to structural changes. Intelligent Data Analysis, 13, 151–163.
Ahn, J. J., Oh, K. J., Kim, T. Y., & Kim, D. H. (2011). Usefulness of support vector machine to develop an early warning system for financial crisis. Expert Systems with Applications, 38, 2966–2973.
Allen, M. T., Madura, J., & Springer, T. M. (2000). REIT characteristics and the sensitivity of REIT returns. Journal of Real Estate Finance and Economics, 21, 141–152.
Bourassa, S. C., Cantoni, E., & Hoesli, M. (2010). Predicting house prices with spatial dependence: A comparison of alternative methods. Journal of Real Estate Research, 32, 139–159.
Brooks, C., & Tsolacos, S. (2003). International evidence on the predictability of returns to securitized real estate assets: Econometric models versus neural networks. Journal of Property Research, 20, 133–155.
Chen, W. S., & Du, Y. K. (2009). Using neural networks and data mining techniques for the financial distress prediction model. Expert Systems with Applications, 36, 4075–4086.
Chica-Olmo, J. (2007). Prediction of housing location price by a multivariate spatial method: Cokriging. Journal of Real Estate Research, 29, 92–114.
Dehghan, S., Sattari, G., Chehreh, C. S., & Aliabadi, M. A. (2010). Prediction of uniaxial compressive strength and modulus of elasticity for Travertine samples using regression and artificial neural networks. Mining Science and Technology, 20, 41–46.
Do, A. Q., & Grudnitski, G. (1992). A neural network approach to residential property appraisal. The Real Estate Appraiser, 58, 38–45.
Dong, M., & Zhou, X. S. (2008). Knowledge discovery in corporate events by neural network rule extraction. Applied Intelligence, 29, 129–137.
Draper, N., & Smith, H. (1981). Applied regression analysis. New York: Wiley.
Evans, A., James, H., & Collins, A. (1993). Artificial neural networks: An application to residential valuation in the UK. Journal of Property Valuation & Investment, 11, 195–204.
Goldberg, D. E. (1989). Genetic algorithms in search, optimization and machine learning. New York: Addison-Wesley.
Holland, J. H. (1975). Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control and artificial intelligence. Cambridge: The MIT Press.

Hua, C. (1996). Residential construction demand forecasting using economic indicators: A comparative study of artificial neural networks and multiple regression. Construction Management and Economics, 14, 125–134.
Hwarng, H. B. (2001). Insights into neural-network forecasting of time series corresponding to ARMA(p, q) structures. Omega, 29, 273–289.
Jang, G. S., Lai, F., Jiang, B. W., Pan, C. C., & Chien, L. H. (1993). Intelligent stock trading system with price trend prediction and reversal recognition using dual-module neural networks. Applied Intelligence, 3, 225–248.
Jensen, D. (1990). Artificial intelligence in computer-assisted mass appraisal. Property Tax Journal, 9, 5–26.
Juan, Y. K., Shin, S. G., & Perng, Y. H. (2006). Decision support for housing customization: A hybrid approach using case-based reasoning and genetic algorithm. Expert Systems with Applications, 31, 83–93.
Koza, J. (1993). Genetic programming. Cambridge: The MIT Press.
Lee, K., Booth, D., & Alam, P. (2005). A comparison of supervised and unsupervised neural networks in predicting bankruptcy of Korean firms. Expert Systems with Applications, 29, 1–16.
Lu, C. J. (2010). Integrating independent component analysis-based denoising scheme with neural network for stock price prediction. Expert Systems with Applications, 37, 7056–7064.
McCluskey, W., & Anand, S. (1999). The application of intelligent hybrid techniques of residential properties. Journal of Property Investment & Finance, 17, 218–238.
McGreal, S., Berry, J., McParland, C., & Turner, B. (2004). Urban regeneration, property performance and office markets in Dublin. Journal of Property Investment & Finance, 22, 162–172.
Nguyen, N., & Cripps, A. (2001). Predicting housing value: A comparison of multiple regression analysis and artificial neural networks. Journal of Real Estate Research, 22, 313–336.
Oh, K. J., & Han, I. (2000). Using change-point detection to support artificial neural networks for interest rates forecasting. Expert Systems with Applications, 19, 105–115.

Oh, K. J., Kim, T. Y., & Min, S. (2005). Using genetic algorithm to support portfolio optimization for index fund management. Expert Systems with Applications, 28, 371–379.
O’Roarty, B., Patterson, D., McGreal, W. S., & Adair, A. S. (1997). A case based reasoning approach to the selection of comparable evidence for retail rent determination. Expert Systems with Applications, 12, 417–428.
Peterson, S., & Flanagan, A. B. (2009). Neural network hedonic pricing models in mass real estate appraisal. Journal of Real Estate Research, 31, 148–164.
Pivo, G., & Fisher, J. D. (2010). Income, value and returns in socially responsible office properties. Journal of Real Estate Research, 32, 243–270.
Rossini, P. (1997). Artificial neural networks versus multiple regression in the valuation of residential property. Australian Land Economics Review, 3, 1–12.
Tay, D., & Ho, D. (1994). Intelligent mass appraisal. Journal of Property Tax Assessment and Administration, 1, 5–25.
Versace, M., Bhatt, R., Hinds, O., & Shiffer, M. (2004). Predicting the exchange traded fund DIA with a combination of genetic algorithms and neural networks. Expert Systems with Applications, 27, 417–425.
Walker, E., & Birch, J. B. (1988). Influence measures in ridge regression. Technometrics, 30, 221–227.
Wilson, I. D., Paris, S. D., Ware, J. A., & Jenkins, D. H. (2002). Residential property price time series forecasting with neural networks. Knowledge-Based Systems, 15, 335–341.
Wiltshaw, D. G. (1995). A comment on methodology and valuation. Journal of Property Research, 12, 157–161.
Wong, F., & Tan, C. (1994). Hybrid neural, genetic, and fuzzy systems. In G. J. Deboeck (Ed.), Trading on the edge (pp. 243–261). New York: Wiley.
Worzala, E., Lenk, M., & Silva, A. (1995). An exploration of neural networks and its application to real estate valuation. Journal of Real Estate Research, 32, 185–202.