Expert Systems with Applications 39 (2012) 9054–9063
Contents lists available at SciVerse ScienceDirect
Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa
Using a fuzzy association rule mining approach to identify the financial data association G.T.S. Ho a, W.H. Ip a, C.H. Wu a,⇑, Y.K. Tse a,b a b
Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hunghom, Hong Kong The York Management School, University of York, Heslington, York YO10 5GD, United Kingdom
a r t i c l e
i n f o
Keywords: Hang Seng Index Financial data association Fuzzy association rule Fuzzy set theory
a b s t r a c t In the rapidly changing financial market, investors always have difficulty in deciding the right time to trade. In order to enhance investment profitability, investors desire a decision support system. The proposed artificial intelligence methodology provides investors with the ability to learn the association among different parameters. After the associations are extracted, investors can apply the rules in their decision support systems. In this work, the model is built with the ultimate goal of predicting the level of the Hang Seng Index in Hong Kong. The movement of Hang Seng Index, which is associated with other economics indices including the gross domestic product (GDP) index, the consumer price index (CPI), the interest rate, and the export value of goods from Hong Kong, is learnt by the proposed method. The case study shows that the proposed method is a feasible way to provide decision support for investors who may not be able to identify the hidden rules between the Hang Seng Index and other economics indices. 2012 Elsevier Ltd. All rights reserved.
1. Introduction The stock market has become an important venture for investment in many parts of the world in which Hong Kong is a salient example. The Hong Kong Stock Exchange was the second largest stock market in Asia up to 2010. It is believed that most Hong Kong citizens possess experience in trading stocks. However, the majority of investors only rely on their own judgment based on past experience. It is common for the individuals to lose money due to a lack of professional insight. Therefore, a reliable approach for the prediction of stock prices, which has a comprehensive reasoning considering environmental factors influencing fluctuation, is essential for investors in their decision making process. Due to the high degree of uncertainty in stock markets, there are numerous studies that attempt to predict stock prices. In the past, traditional time series models such as autoregressive integrated moving average (ARIMA) method were widely used. However, it is demonstrated that these traditional time series analyses are difficult to be applied because of the noisy environment in the stock market. Apart from the traditional statistical methods, artificial intelligence has been suggested for forecasting in recent years. For instance, Walczak (1999) attempted to use neural network to look ⇑ Corresponding author. Address: Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University, Hunghom Kowloon, Hong Kong. Tel.: +852 2766 6584. E-mail addresses:
[email protected],
[email protected], jack.
[email protected] (C.H. Wu). 0957-4174/$ - see front matter 2012 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2012.02.047
for rules in the capital market. Moreover, interesting research like text mining on news article (Huang, Liao, Yang, Chang, & Luo, 2010) was carried out to discover the relationship between the social psyche and future stock prices. Artificial intelligence can provide satisfactory results in discovering patterns in data. Several studies have made good use of particular artificial intelligence techniques like neural network (Walczak, 1999), association rule (Huang et al., 2010), and genetic algorithm (Iba & Sasaki, 1999). However, the research only focused on the stock price patterns, and whilst these models can accurately predict stock prices, they could not provide any direct implications for decision making support. Investors usually want direct parameter guidelines besides associations. Fuzzy logic is widely used in decision support systems and it is salient in providing direct suggestions to users. Recent research (Tavakkoli, Jamali, & Ebrahimi, 2010) showed how fuzzy logic can help in evaluating the financial performance of stocks. Nevertheless, the learning process of fuzzy logic in predicting financial markets is the most important part. More studies are being carried out to demonstrate how artificial intelligence methods can be applied to gather the essential historical rules. In this work, a stock market prediction system using fuzzy association rule is proposed. The fuzzy association rule is a combination of the association rule and the fuzzy set theory. A fuzzy set can compensate for the weakness of the association rule in the prediction of complex data. Different social and economic indices in Hong Kong, including gross domestic product (GDP), consumer price index (CPI), total annual export value, and interest rate, are used to predict the Hang Seng Index. A direct guideline should be
G.T.S. Ho et al. / Expert Systems with Applications 39 (2012) 9054–9063
able to be put into the decision support system. Nonetheless, using a computable program, other parameters can be added anytime.
2. Previous work in stock price prediction using artificial intelligence In this dynamic, complex and noisy environment, prediction in the financial market is not an easy task. In the past, statistical methods were used to model the stock price. However, statistical models are not very useful in providing accurate results under such a turbulent environment. The evolution of artificial intelligence techniques has in the past few decades helped scholars build up new models in providing more accurate and efficient results for market trend prediction. Artificial neural networks (ANNs) are widely used in the stock price prediction systems. Recently, Mostafa (2010) demonstrated using neural networks in forecasting stock moments in Kuwait in an empirical study. Although artificial neural networks can provide a robust performance in handling vast amounts of data, it is hard to provide a consistent and predictable performance for noisy data (Kim, 2006). Neural network may not be reliable enough to deal with the tremendous noise and complex dimensionality. Several researchers have suggested improving neural networks techniques to revolutionize the reliability of neural networks. Recently, Yudong and Lenan (2009) demonstrated the use of a back propagation neural network to predict stock prices. Chang, Liu, Lin, Fan, and Ng (2009) also used a back propagation neural network together with a case reasoning system for stock trading prediction. However, these back propagation neural networks lack convergence in the learning process. On the other hand, neural networks are usually blamed for the lack of logical reasoning as well. Tsakonas, Dounias, Doumpos, and Zopounidis (2006) used logic neural nets which could be better understood by the public. The meaning of the neural net can be preserved by modifying the architecture of the network. The other common method is the genetic algorithm. Hui (2003) summarized the way to apply a simple genetic programming model to the prediction of stock prices. Furthermore, Oh, Jim, Min, and Lee (2006) also developed a portfolio management system based on genetic algorithms. Nevertheless, the weakness of genetic programming is the difficulty in discovering the optimal constants for terminal nodes. The numbers are difficult to set by users who are not experts in artificial intelligence. Both genetic algorithms and ANNs can hardly learn the process from a professional and meaningful perspective. Recently, rule-based theory has been widely applied in the prediction system as the introduction of membership values can be set by professionals in the field. One of the most well-known rulebased methods is fuzzy logic. Liu (2009) presented an integrated fuzzy time series model for forecasting. He summarized different fuzzy time series models and built his own. The model can be used for predicting any time series data including stock prices. There are also other types of rule-based systems. Chang and Liu (2008) suggested a Takagi–Sugeno–Kang type Fuzzy Rule Based System for stock price prediction. Moreover, Dymova, Sevastianov, and Bartosiewicz (2010) demonstrated a new approach to set up a system using the Dempster–Shafer Set Theory. More importantly, there are several interesting research studies on hybrid models, combining two or more artificial intelligence techniques to build an integrated model. Kim and Han (2000) stated a genetic approach in using ANNs in stock price index prediction. Besides neural networks and genetic programming, the neuro-fuzzy model is also a hot topic at the moment recently. Ang and Quek (2006) investigated a method to predict stock price differences by using neuro-fuzzy systems. Kim and Shin (2007)
9055
suggested an integrated approach based on neural networks and genetic algorithms to detect short-term patterns in stock markets. In addition, Atsalakis and Valavanis (2009) pointed out a neurofuzzy methodology to forecast stock market short-term trends. The method being used in this work is a good example of a hybrid model – the fuzzy association rule. An association rule is a case-based reasoning algorithm, which is commonly used in data mining. Oh and Kim (2007) applied a case-based reasoning system to predict the financial market collapse in Korea. Chun and Park (2006) applied case-based reasoning techniques to predict the Korea Stock Exchange Index price based on the volume and price of the Index. Nevertheless, the association rule is rarely used in stock price prediction due to its limitation in quantitative data analysis. Early research of association rules basically focused on Boolean association rules. Lu, Han, and Feng (1998) applied binary association rules in predicting simple stock price movements and inter-transactions. Recently, scholars tried to apply the association rule from another angle. Huang et al. (2010) demonstrated the use of association rules and text mining techniques in predicting short-term movement trends of the market. Anyhow, it is evident that the binary association rule is only able to predict the direction, rather than accurate ranges. Obviously, it is not sufficient only to have Boolean attributes in an artificial intelligence system. Combining the association rule with fuzzy theory will provide a new and meaningful approach in time-series forecasting. To address the weakness of the association rule, fuzzy sets can help map the ranges of the parameters. Therefore, the association rule can be applied after the data are simplified. 3. The proposed decision support system based on a fuzzy association rule mining approach The proposed decision support system is designed to extract hidden patterns from the economical attributes and the individual stock price or index. Fig. 1 illustrates the flow of the proposed algorithms embedded in the decision support system, and Table 1 states the notation used in the algorithm. The system is proposed to capture the historical data of the economical attributes and the individual stock market prices from different sources, such as research engines and government press releases, followed by converting of the data into knowledge by applying fuzzy association rules to determine if any of the economical attributes have a positive or negative impact on the stock price. In fact, the system can also allow users to access the knowledge repository where all the mined association rules are stored. The association rules can then provide direct decision support to the user in portfolio investment decisions. 3.1. Data collection and pre-processing Historical data of economical indices can be collected from the internet. For instance, the Census and Statistics Department of Hong Kong provides important economy indices, including gross domestic product and consumer price index for each quarter. These data in excel table format are available and accessible from the Hong Kong government website. The historical price data of the chosen stock should also be collected. Yahoo! Finance provides accurate and detailed historical data on listed stocks. Users can gather the historical data from the Yahoo! Financial historical record in excel format as well. It is suggested that more patterns could be learnt over a relatively long period, like 10 years. Since any input errors could affect the result, errors should be removed before the data enters the mining process. Possible errors such as
9056
G.T.S. Ho et al. / Expert Systems with Applications 39 (2012) 9054–9063
Fig. 1. Flow chart of the proposed method.
the mismatching of data between the data from the government website and Yahoo! Finance should be avoided. After double checking the data and eliminating the errors, the data should be processed to a readable format of the mining process for further calculation. 3.2. Data mining engine The processed data will be transferred to the mining engine. The objective of the data mining engine is to identify the hidden patterns between the economical attributes and the stock market prices. The algorithm of this data mining engine is similar to the mining algorithm that combines data mining techniques, fuzzy sets theory and association rules theory, which was proposed and used by Hong, Lin, and Wang (2003) and Lau, Ho, Zhao, and Chung (2009). The algorithm has been widely and successfully
applied in data mining in the manufacturing field and this paper integrates this potent technique to the proposed system in the financial context. In addition, the fuzzy sets and threshold values should be set before the data mining process. It is suggested that professionals in the area should be invited to set the relevant sets and thresholds. As the values of the sets and thresholds directly affect the accuracy of the system, it is essential to define proper values in a flexible way. In the algorithm, the system engineer collects h records, mentioned in the previous procedure. Each record contains the value of the economical attributes and stock market prices at a particular time frame. The fuzzy set membership function M, minimum threshold support values S-threshold, and a universe minimum threshold confidence value Cthreshold should be predefined, and are necessary in the mining process. The further calculating process of the algorithm is discussed in the Case Scenario and Case
9057
G.T.S. Ho et al. / Expert Systems with Applications 39 (2012) 9054–9063 Table 1 List of notations used in the proposed stock price prediction fuzzy-association algorithm. h H = {1, 2, . . . , h} Hi a A = {1, 2, 3, . . . , a} Hij Mij Fj Fij Sj-threshold Stpq MAX-Stp MAX-FCtp MIN-Mij Yl Clx Cthreshold RT
Total number of the historical records of HS index and other economy attributes in the database The set of index of historical records The ith record in the historical database, "i 2 H The number of attributes in the database The set of index of attributes The jth attributes in the ith records, "i 2 H, j 2 A The membership value of the jth attributes in the ith records, "i 2 H, j 2 A The set of types of fuzzy category in the jth attributes,"j 2 A The fuzzy category of membership value of Mij, "k 2 Fj The threshold support for the jth parameter, "j 2 A The support count of the tth p-itemset combination for the qth fuzzy category type The maximum count value among supportcounttp values The fuzzy class of tp with MAX-Stp The minimum membership value of the combination in the specific record The number of combinations in the l-itemset table The confidence value of the xth association rule in the litemset table The universe threshold confidence value The sets of association rule
Table 2 The symbols of the economic attributes. Parameters
Symbol
Input Interest rate RMB-HK exchange rate Consumer price index GDP Export value
A B C D E
Output Hang Seng index (HSI) Variation of HSI
F G
Demonstration. By applying the data mining process, a set of mined fuzzy association rules, which reflects the relationship between the economical attributes and the stock market prices, can be discovered. 3.3. Decision supporting system Afterwards, the ‘‘if-then’’ based rule can be extracted by the mechanism. The mined fuzzy association rule will be transferred to a knowledge repository. The user in the firm can visit the knowledge repository and retrieve the association rules. The association rules provide direct decision support to the user. For instance, it
can show the relationship between the economical attributes and the level or the variation of the individual stock market price. Therefore the user can make a decision based on the evidence. The knowledge repository will be updated repeatedly by updating the data and running the mining engine periodically in order to maintain accuracy and up-to-date information. 4. Case scenario A case scenario is given to illustrate the feasibility of the proposed algorithm mining rules from quantitative process data in the historical economic data. The algorithm was applied and evaluated in the case, which used actual data from Hong Kong for the past 5 years (2006–2010) to identify the relationships between the economic attributes in the Hong Kong context. This case scenario helps demonstrate the mechanism of the algorithm, and how users can learn through discovering the hidden relationships in the data. Different symbols are used to represent various attributes, as shown in Table 2. The data set, including the historical Hong Kong market data was collected, with each record including different attributes as stated in Table 3. The fuzzy membership sets used to transfer the quantitative data into sets are shown in Fig. 2. The minimum support threshold for each attribute is predefined in Table 4. The threshold confidence value is set to 75% in the calculation, and ten steps are used in order to extract the important rules. Step 1: Transform the quantitative values of all historical records into respective fuzzy sets using the fuzzy membership functions stated in the Fig. 3. Using 7.64 in the first record of attribute A as an example, referring to the fuzzy membership function, 7.64 lies in both ‘‘Low’’ and ‘‘High’’ classes. As shown in Fig. 3, 7.64 is converted into the fuzzy set which is calculated as (0.64/High + 0.36/Low). The step is then repeated for all attributes in all records. Step 2: After converting all the values into fuzzy sets, the values of same fuzzy class have to be summed. The count of each fuzzy class is calculated, and put in set Y, where Y is called an ‘‘itemset’’, which contains the qualified combination of items for further calculation. Taking attribute A as an example again, as attribute A has four fuzzy classes including ‘‘very low’’, ‘‘low’’, ‘‘high’’ and ‘‘very high’’, therefore counts for four classes have to be calculated. The counts are calculated and shown in Table 5. The ‘‘Very Low’’ class is chosen to demonstrate the calculation. From the 10 records, the count of the ‘‘very low’’ class of attribute A is taken by summing up the fuzzy count of {A.Very Low} in each record. The calculation is (0 + 0 + 0 + 0 + 0 + 0 + 1 + 1 + 1 + 1) = 4. The other attribute support counts are calculated by following the same step.
Table 3 The historical record from Hong Kong (2006–2010 Semi-annually). Period
Interest rate (A)
RMB-HK exchange rate (B)
Consumer price index (C)
GDP per capita (D)
Export value (E)
Hang Seng index (HSI) (F)
HSI Variation (G)
Jan-06 Jul-06 Jan-07 Jul-07 Jan-08 Jul-08 Jan-09 Jul-09 Jan-10 Jul-10
7.64 8 8 7.75 7.75 6.56 5.25 5.08 5 5
0.9594 0.9681 0.9859 1.0128 1.0367 1.0836 1.1369 1.1305 1.1325 1.1369
101.2 102.1 102.9 103.9 104.4 106.7 110.1 110 109.6 108.2
205628 207816 233494 235156 256828 247553 247196 239305 215398 255040
373314 350588 400310 373686 415498 410438 402082 433905 390465 414516
14876.43 15857.89 18324.35 19800.93 23984.14 23455.74 22102.01 13888.24 15520.99 20955.25
461.08 1701.48 1037.44 1221.03 4702.28 6143.97 3149.61 3503.02 2565.34 2503.93
9058
G.T.S. Ho et al. / Expert Systems with Applications 39 (2012) 9054–9063
Fig. 2. Fuzzy membership sets for the attributes.
9059
G.T.S. Ho et al. / Expert Systems with Applications 39 (2012) 9054–9063 Table 4 The threshold support count for each attributes.
Support count
Table 8 The support count for each attribute in 2-itemset.
A
B
C
D
E
F
G
Attributes
Count
4.2
4.1
3.8
4
3.6
2.8
2.8
AB AC AD AE AF AG BC BD BE BF BG CD CE CF CG DE DF DG EF EG FG
2.35 4.14 2.67 2.70 0.75 0.88 3.95 7.09 6.94 3.48 5.51 4.15 4.05 2.02 2.23 6.83 3.48 5.17 3.48 5.39 3.48
Fig. 3. The fuzzification of attribute A.
Table 9 Reviewed 2-itemset.
Table 5 The fuzzy counts of itemset 1. Attributes
Count
A.Very Low A.Low A.High B.Medium B.High C.Medium C.High D.High D.Very High E.Medium E.High F.Very Low F.Low F.High G.Low G.High G.Very High
4.00 1.86 4.14 2.15 7.84 6.10 3.90 2.60 7.41 2.24 7.76 3.31 3.21 3.48 1 3.49 5.51
Table 6 The maximum count for each 1-itemset.
Attributes
Count
AC BD BE BG CE DE DG EG
4.14 7.09 6.94 5.51 4.05 6.83 5.17 5.39
Table 10 3-Itemset. Attribute
Count
BDE BDG BEG DEG
6.50 5.17 5.39 5.05
Table 11 4-Itemset.
Attributes
Count
A.High B.High C.Medium D.Very High E.High F.High G.Very High
4.14 7.84 6.1 7.41 7.76 3.48 5.51
Table 7 Reviewed 1-itemset. Attributes
Count
A.High B.High C.Medium D.Very High E.High F.High G.Very High
4.14 7.84 6.1 7.41 7.76 3.48 5.51
Attribute
Count
BDEG
5.05
Step 3: Calculate the maximum count (MAX-Stp) for each attribute and the corresponding fuzzy class (MAX-FCtp), which is the class with the highest support count in the attribute. In attribute A, the counts of the four classes ‘‘Very Low’’, ‘‘Low’, ‘‘High’’ and ‘‘Very High’’ are 4, 1.86, 4.14, and 0 respectively. Therefore, the class ‘‘High’’ will be selected for attribute A. Attribute A has a MAX-Stp set as 4.14, and MAX-FCtp set as ‘‘High’’. The maximum counts and the corresponding classes are calculated and shown in Table 6. Step 4: Check if the value maximum count (MAX-Stp) of the corresponding fuzzy class (MAX-FCtp) is larger than the threshold support count of the attribute. The threshold support count is predefined. If the value is smaller than
9060
G.T.S. Ho et al. / Expert Systems with Applications 39 (2012) 9054–9063
the threshold support count, the attribute will be rejected from itemset Y1. The reviewed 1-itemset is listed in the Table 7. As all the attributes passed the threshold checking, none of them are removed. Step 5: The attributes from 1-itemsset Y1 is used to form every possible combination. The combination will be put in 2-itemset Y2. The support count of each item is calculated by selecting the minimum number of counts for each item in the historical record. For instance, considering {A.High, B.High} in the first record, the count of A.High is 0.64 and the count of B.High is 1. Therefore, the support count of {A.High, B.High} is 0.64 in the first record. The calculation in records 2–9 are similar. The support count is found for each attribute and is listed in Table 8. Step 6: After obtaining the support count for each item, the support counts have to be checked with the threshold. The threshold of the item is the maximum value of the predefined minimum support count within the combination. For instance, the support count of A.High is 4.2 and B.High is 4.1. Therefore, the threshold support count of {A.High, B.High} is 4.2. If the support count is smaller than threshold support count, the items will be rejected and dropped from the itemset table. The reviewed 2itemset table is listed in Table 9. Step 7: Steps 5–6 are repeated to formulate a higher level of itemset until there are no available combinations to be found. In this case, 3-itemset and 4-itemset can be developed as listed in Tables 10 and 11.
Table 12 Possible association rules and confidence value. Rules
Confidence
Confidence (%)
If If If If If If If If If If If If If If If If If If If If If If If If If If If If If If If If If If If
4.14/4.14 7.09/7.84 6.94/7.84 5.51/7.84 4.14/6.1 4.05/6.1 7.09/7.41 6.83/7.41 5.17/7.41 5.39/7.76 5.51/5.51 5.17/5.51 5.39/5.51 6.5/7.09 6.5/6.94 6.5/6.83 6.5/7.84 6.5/7.41 6.5/7.76 5.17/7.09 5.17/5.51 5.17/5.17 5.17/7.84 5.17/7.41 5.17/5.51 5.39/6.94 5.39/5.51 5.39/5.39 5.05/6.83 5.05/5.17 5.05/5.39 5.05/7.41 5.05/7.76 5.05/5.51 5.05/6.5
100 90 89 70 68 66 96 92 70 69 100 94 98 92 94 95 83 88 84 73 94 100 66 70 94 78 98 100 74 98 94 68 65 92 78
{A.High} then {C.Medium} {B.High} then {D.Very High} {B.High} then {E.High} {B.High} then {G.Very High} {C.Medium} then {A.High} {C.Medium} then {E.High} {D.Very High} then {B.High} {D.Very High} then {E.High} {D.Very High} then {G.Very High} {E.High} then {G.Very High} {G.Very High} then {B.High} {G.Very High} then {D.Very High} {G.Very High} then {E.High} {B.High, D.Very High} then {E.High} {B.High, E.High} then {D.Very High} {D.Very High, E.High} then {B.High} {B.High} then {D.Very High, E.High} {D.Very High} then {B.High, E.High} {E.High} then {B.High, D.Very High} {B.High, D.Very High} then {G.Very High} {B.High, G.Very High} then {D.Very High} {D.Very High, G.Very High} then {B.High} {B.High} then {D.Very High, G.Very High} {D.Very High} then {B.High, G.Very High} {G.Very High} then {B.High, D.Very High} {B.High, E.High} then {G.Very High} {B.High, G.Very High} then {E.High} {E.High, G.Very High} then {B.High} {D.Very High, E.High} then {G.Very High} {D.Very High, G.Very High} then {E.High} {E.High, G.Very High} then {D.Very High} {D.Very High} then {E.High, G.Very High} {E.High} then {D.Very High, G.Very High} {G.Very High} then {D.Very High, E.High} {B.High, D.Very High, E.High} then G.Very High}
Table 13 A collection of association rule with confidence value over 75%. Rules
Confidence
Confidence (%)
If If If If If If If If If If If If If If If If If If If If If If If If
4.14/4.14 7.09/7.84 6.94/7.84 7.09/7.41 6.83/7.41 5.51/5.51 5.17/5.51 5.39/5.51 6.5/7.09 6.5/6.94 6.5/6.83 6.5/7.84 6.5/7.41 6.5/7.76 5.17/5.51 5.17/5.17 5.17/5.51 5.39/6.94 5.39/5.51 5.39/5.39 5.05/5.17 5.05/5.39 5.05/5.51 5.05/6.5
100 90 89 96 92 100 94 98 92 94 95 83 88 84 94 100 94 78 98 100 98 94 92 78
{A.High} then {C.Medium} {B.High} then {D.Very High} {B.High} then {E.High} {D.Very High} then {B.High} {D.Very High} then {E.High} {G.Very High} then {B.High} {G.Very High} then {D.Very High} {G.Very High} then {E.High} {B.High, D.Very High} then {E.High} {B.High, E.High} then {D.Very High} {D.Very High, E.High} then {B.High} {B.High} then {D.Very High, E.High} {D.Very High} then {B.High, E.High} {E.High} then {B.High, D.Very High} {B.High, G.Very High} then {D.Very High} {D.Very High, G.Very High} then {B.High} {G.Very High} then {B.High, D.Very High} {B.High, E.High} then {G.Very High} {B.High, G.Very High} then {E.High} {E.High, G.Very High} then {B.High} {D.Very High, G.Very High} then {E.High} {E.High, G.Very High} then {D.Very High} {G.Very High} then {D.Very High, E.High} {B.High, D.Very High, E.High} then G.Very High}
Table 14 Final interesting rules extracted. Rules
Confidence
Confidence (%)
If {B.High, E.High} then {G.Very High} If {B.High, D.Very High, E.High} then {G.Very High}
5.39/6.94 5.05/6.5
78 78
Step 8: All possible association rules for each Yi itemsets (i > = 2), are extracted, and the confidence value for each rule tested. Take If {B.High, D.Very High} then {E.High} as an example, the confidence value is calculated as
ðB:High \ D:Very High \ E:HighÞ 6:5 ¼ ¼ 92% ðB:High \ D:Very HighÞ 7:09 All the possible rules and confidence values are listed in Table 12. Step 9: Compare the confidence value of each possible association rule with the predefined threshold confidence value, which is 75% in this case. Only rules with confidence values greater than 75% are kept in the table, as shown in Table 13. Step 10: Keep the rules with ‘‘Output Attribute’’, which are H and G in this case, in the consequence part only. Output rules as interesting rules as shown in Table 14. 5. Feasibility case study Nowadays, most investment agencies apply computer programs in assisting in market stock price prediction or any other decision support activities, so the suggested system can readily be brought into a computer programme. Applying computer programmes provide vast amounts of benefits to firms such as avoiding manual mistakes in calculation and in providing good reliability. In this part, a prototype developed in MATLAB environment based on the proposed data mining technique is shown to demonstrate the application of the proposed system (Fig. 4). Similarly as in the previous part, several attributes like GDP and Hong Kong interest rate are selected, together with the historical records of the Hang Seng Index over 10 years. The firm collects the data on different re-
G.T.S. Ho et al. / Expert Systems with Applications 39 (2012) 9054–9063
9061
Fig. 4. The user interface of the prototype.
Fig. 5. A CSV file contains the seven parameters.
lated attributes and the historical stock price that is stored in a CSV file. For instance, an CSV file is created and contains the interest rate data, the HK-RMB exchange rate, the consumer price index (CPI), GDP, the export figures, the Hang Seng Index, and the monthly variation of HSI from 2000–2010, as in Fig. 5. In addition, the membership function and threshold minimum support count of each attributes must be determined before the
operation. The most common method of deciding fuzzy sets is relying on the knowhow of an expert in the particular area. In this case, the firm could seek opinions on the sets from financial experts. The membership function and the threshold values should be typed into CSV files. After importing the CSV files into the program, the user can select the category of the attribute including ‘‘Input’’, ‘‘Output’’ and
9062
G.T.S. Ho et al. / Expert Systems with Applications 39 (2012) 9054–9063
both, as shown in Fig. 6. It affects whether the attribute should be shown in the cause or result statement in the rules. The category notation affects the names of the fuzzy set like ‘‘Long–short’’ and ‘‘High–low’’. After execution, the mined rules are shown in the text box, as shown in Fig. 6. The results can be exported into a TXT file, as in Fig. 7. The TXT file can be further developed and be read
by the knowledge repository. The mined association rules can be applied in the decision support system later. On the other hand, a pie chart of the frequency of appearance in the results of different attributes is shown in the result box. The user can know which attribute play a more important role in the rules set, and can provide a direct and simple analysis instantly.
Fig. 6. After importing the CSV file.
Fig. 7. The exported results after rule-mining.
G.T.S. Ho et al. / Expert Systems with Applications 39 (2012) 9054–9063
9063
6. Conclusive remarks and future work
References
Predicting the stock market price is invaluable to any investment company or individual user. The traditional way of stock price prediction requires vast amounts of data collection, and information and graphical analysis on every individual stock. These activities require very strong financial knowledge and lots of time. This study has shown an infrastructure for an expert system on market prediction. The system can lead to a high investment efficiency from different parties without applying any financial analysis knowhow. There are significant advantages compared to the other data mining techniques and artificial intelligence methodologies. The mined fuzzy association rules can be used to provide investors with interesting patterns between different economical attributes and the individual stock market price. In addition, the mined rules can be used in a decision support system which can provide direct decision advice to the users. The decision support function can greatly reduce the risk in investment activities. However, there is a downside in the proposed system. To obtain a high accuracy, financial knowledge has to be applied when designing the fuzzy set and threshold value before applying the system to the real market. Researchers have pointed out that both directly affecting the accuracy of the rules mining (Coenen & Leng, 2006). The expertise involves a vast amount of resources and time to design the optimal sets and thresholds. Once the values are designed, the values have to be reviewed periodically. It can be seen that users can apply the method and improve the effectiveness of investment if the parameters can be set up in a good manner. This research provides a systematic way in mining interesting patterns for stock market value prediction and demonstrates the operation of the proposed method. Users have to prepare historical data and need to seek help from experts to set up fuzzy sets and threshold values for the system. The proposed method provides an efficient way in data mining and furthermore, in providing resources in decision support. Hidden patterns and rules in the historical records can be found by the fuzzy association rule approach. The rules are easy to understand and can be gathered in the knowledge repository. Thus, a decision support system can be built based on the mined rules. The proposed method can improve the success of users in making investment decisions in the stock market. Further work is suggested to refine the methodology of designing suitable fuzzy sets and threshold values to enhance the reliability. It is recommended that more research is undertaken on determining suitable sets and threshold values to improve the reliability and feasibility of the decision support system.
Ang, K. K., & Quek, C. (2006). Stock trading using RSPOP: A novel rough set-based neuro-fuzzy approach. IEEE Transactions on Neural Networks, 17(5), 1301–1315. Atsalakis, G. S., & Valavanis, K. P. (2009). Forecasting stock market short-term trends using a neuro-fuzzy based methodology. Expert Systems with Applications, 36(7), 10696–10707. Chang, P., & Liu, C. (2008). A TSK type fuzzy rule based system for stock price prediction. Expert Systems with Applications, 34(1), 135–144. Chang, P. C., Liu, C. H., Lin, J. L., Fan, C. Y., & Ng, C. S. P. (2009). A neural network with a case based dynamic window for stock trading prediction. Expert Systems with Applications, 36(3), 6889–6898. Chun, S. H., & Park, Y. J. (2006). A new hybrid data mining technique using a regression case based reasoning: Application to financial forecasting. Expert Systems with Application, 31(2), 329–336. Coenen, F., & Leng, P. (2006). The effect of threshold values on association rule based classification accuracy. Data & Knowledge Engineering, 60(2), 345–360. Feb. 2007. Dymova, L., Sevastianov, P., & Bartosiewicz, P. (2010). A new approach to the rulebase evidential reasoning: Stock trading expert system application. Expert Systems with Applications, 37(8), 5564–5576. Hong, T. P., Lin, K. Y., & Wang, S. L. (2003). Fuzzy data mining for interesting generalized association rules. Journal of Fuzzy Sets and Systems, 138, 255–269. Huang, C. J., Liao, J. J., Yang, D. X., Chang, T. Y., & Luo, Y. C. (2010). Realization of a news dissemination agent based on weighted association rules and text mining techniques. Expert Systems with Applications, 37, 6409–6413. Hui, A. (2003). Using genetic programming to perform time-series forecasting of stock prices. Available at:
. [Accessed 29 Aug 2011]. Iba, H. & Sasaki, T. (1999). Using genetic programming to predict financial data. In Proceedings of the congress on evolutionary computation, Washington, US (pp. 244–251). Kim, K. (2006). Artificial neural networks with evolutionary instance selection for financial forecasting. Expert Systems with Applications, 30(3), 519–526. Kim, K. J., & Han, I. (2000). Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index. Expert Systems with Applications, 19, 125–132. Kim, H., & Shin, K. (2007). hybrid approach based on neural networks and genetic algorithms for detecting temporal patterns in stock markets. Applied Soft Computing, 7(2), 569–576. Lau, H. C. W., Ho, G. T. S., Zhao, Y., & Chung, N. S. H. (2009). Development of a process mining system for supporting knowledge discovery in a supply chain network. International Journal of Production Economics, 122, 176–187. Liu, H. T. (2009). An integrated fuzzy time series forecasting system. Expert Systems with Applications, 36(6), 10045–10053. Lu, H., Han, J., & Feng, L. (1998).Stock movement and n-dimensional intertransaction association rules. In Proceedings of the SIGMOD workshop research issues on data mining and knowledge discovery, Seattle, US (pp. 12:1–12:7). Mostafa, M. M. (2010). Forecasting stock exchange movements using neural networks: Empirical evidence from Kuwait. Expert Systems with Applications, 37(9), 6302–6309. Oh, K. J., Jim, T. Y., Min, S. H., & Lee, H. Y. (2006). Portfolio algorithm based on portfolio beta using genetic algorithm. Export Systems with Applications, 30(3), 527–534. Oh, K. J., & Kim, T. Y. (2007). Financial market monitoring by case-based reasoning. Expert Systems with Applications, 32(3), 789–800. Tavakkoli, M., Jamali, A., & Ebrahimi, A. (2010). New Method to evaluate financial performance of companies by fuzzy logic: Case study, drug industry of Iran. Asia Pacific Journal of Finance and Banking Research, 4(4), 15–24. Tsakonas, A., Dounias, G., Doumpos, M., & Zopounidis, C. (2006). Bankruptcy prediction with neural logic networks by means of grammar-guided genetic programming. Expert Systems with Applications, 30(3), 449–461. Walczak, S. (1999). Gaining competitive advantage for trading in emerging capital markets with neural networks. Journal of Management and Information Systems, 16, 177–192. Yudong, Z., & Lenan, W. (2009). Stock market prediction of S&P 500 via combination of improved BCO approach and BP neural network. Expert Systems with Applications, 36(5), 8849–8854.
Acknowledgment The authors wish to thank the Research Office and the Department of ISE of The Hong Kong Polytechnic University for supporting the project (GYJ12).