Discovering utility-based episode rules in complex event sequences

ESWA 9872

No. of Pages 12, Model 5G

2 March 2015

Expert Systems with Applications xxx (2015) xxx–xxx

Contents lists available at ScienceDirect

Expert Systems with Applications

journal homepage: www.elsevier.com/locate/eswa

Discovering utility-based episode rules in complex event sequences

Yu-Feng Lin a, Cheng-Wei Wu a, Chien-Feng Huang b, Vincent S. Tseng a,⇑

a Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan 701, Taiwan

b Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung 811, Taiwan

Article info

Article history: Available online xxxx

Keywords: Utility mining; Episode rule mining; High utility episode rules; Complex event sequences

Abstract

Mining high utility episode rules in complex event sequences has emerged as an important topic in data mining because utility-based episode rules may provide important insights that facilitate decision making for expert and intelligent systems. Although previous methods in this research area can be combined to construct utility-based episode rules indirectly, they typically lack the efficiency and effectiveness needed for real-world applications. In this paper, we develop a novel methodology that generates high utility episode rules directly during the mining process; this is the first work to address utility-based episode rule mining. Our goal is to resolve, at the same time, the difficulties of previously reported methods for frequent episode mining and utility-based episode mining. An algorithm called UBER-Mine (Utility-Based Episode Rules) and a structure named UR-Tree (Utility Rule Tree) are proposed to efficiently mine the complete set of high utility episode rules in complex event sequences. UBER-Mine is based on an extended downward closure property, which allows it to discover utility-based episode rules efficiently. UR-Tree maintains important event information without producing candidate episodes, which further accelerates the mining process. Results on both real and synthetic datasets show that UBER-Mine with UR-Tree scales well on large datasets and runs over 100 times faster than both the basic UBER-Mine and the current best high utility episode mining algorithm. Furthermore, by proposing a high-utility episode-rule model called IV-UBER (InVestment by Utility-Based Episode Rules), we demonstrate the effectiveness of our method on a real-world stock investment application. The experimental results show that IV-UBER outperforms several state-of-the-art algorithms in terms of both precision and annualized return.

© 2015 Published by Elsevier Ltd.

⇑ Corresponding author. Tel.: +886 6 2757575x62536; fax: +886 6 2747076. E-mail addresses: [email protected] (Y.-F. Lin), silvemoonfox@idb.csie.ncku.edu.tw (C.-W. Wu), [email protected] (C.-F. Huang), tsengsm@mail.ncku.edu.tw (V.S. Tseng). URL: http://idb.csie.ncku.edu.tw/tsengsm (V.S. Tseng).

http://dx.doi.org/10.1016/j.eswa.2015.02.022
0957-4174/© 2015 Published by Elsevier Ltd.

Please cite this article in press as: Lin, Y.-F., et al. Discovering utility-based episode rules in complex event sequences. Expert Systems with Applications (2015), http://dx.doi.org/10.1016/j.eswa.2015.02.022

1. Introduction

In the research of frequent episode mining (Gwadera, Atallah, & Szpankowski, 2005; Huang & Chang, 2008; Lin, Qiao, & Wang, 2014; Ma, Pang, & Tan, 2004; Mannila, Toivonen, & Verkamo, 1997; Zimmermann, 2014), a typical goal is to discover episodes of high frequency in order to shed light on the interesting characteristics embedded in complex event sequences. In some applications, the high-frequency episodes discovered may be further used to generate episode rules for decision making and prediction. Generally, an episode rule takes the form $X \Rightarrow Y$, where $X$ and $Y$ are frequent episodes. The left hand side (LHS) of the rule is called the antecedent, whereas the right hand side (RHS) is called the consequent. A classical way of episode rule mining consists of two steps: (1) discovering frequent episodes according to their number of occurrences (i.e., frequencies); (2) generating frequent episode rules from the set of frequent episodes discovered in the first step. However, the frequent episodes discovered and their corresponding events may be associated with low utilities, and in other cases high utility episodes of low frequency may be missed during the mining process. As a consequence, frequency-based episode mining methods may fail to reveal promising and meaningful episodes in real-world applications where the utility of the episodes is crucial.

In the area of data mining, several studies have taken utility into account (Ahmed, Tanbeer, & Jeong, 2010; Chan, Yang, & Shen, 2003; Liu, Wang, & Fung, 2012; Tseng, Wu, Fournier-Viger, & Yu, 2014). More recently, high utility episode mining has emerged as a novel and important topic of research (Guo et al., 2014; Wu, Lin, Yu, & Tseng, 2013). In the framework of high utility episode mining (Guo et al., 2014; Wu et al., 2013), events are typically associated with distinct weights or utilities. Accordingly, a high utility episode is defined as an episode whose utility is larger than a user-specified threshold. Wu et al. (2013) first addressed the problem and proposed an algorithm called UP-Span for efficiently discovering high utility episodes in complex event sequences.

Presumably, in order to generate high utility episode rules, one may first use the method of Wu et al. (2013) to discover episodes with high utility and then construct episode rules from these episodes with the aforementioned methods for episode-rule generation (Gwadera et al., 2005; Lin, Qiao, et al., 2014; Lin, Huang, & Tseng, 2014; Ma et al., 2004; Mannila et al., 1997). However, simply deriving rules from the set of high utility episodes by integrating these methods may not produce rules that are useful or meaningful to users; for instance, the combined approach may produce rules of low utility or miss rules of high utility. To be more specific, let HE and LE denote high utility and low utility episodes, respectively. Applying the rule-generation methods (Gwadera et al., 2005; Lin et al., 2014; Ma et al., 2004; Mannila et al., 1997) to the high utility episodes obtained by Wu et al. (2013) may produce episode rules of the form $HE \Rightarrow LE$, whose consequent has low utility and is not useful if one's goal is to produce high utility consequents for decision making. (For example, one may be interested in an investment rule that brings about high profit when the antecedent of the rule is valid.) In addition, it may be computationally costly to generate high utility episodes in an initial stage and then use these episodes to generate the corresponding episode rules in a later stage.

Therefore, rather than following the two-stage method, it is worthwhile to develop a straightforward method that directly generates high utility episode rules during the mining process. In light of these two major deficiencies of the two-stage method (Gwadera et al., 2005; Ma et al., 2004; Mannila et al., 1997; Wu et al., 2013), in this study we propose a novel methodology for efficiently and effectively constructing high utility episode rules for complex event sequences. In addition to this motivation, it is worth noting that discovering episode rules that contribute high utilities to users is not an easy task; it poses the following challenges:

• Mining high utility episode rules from complex event sequences is not a trivial task. In complex event sequences, different events can occur simultaneously at any time point, which makes the problem substantially different from, and much more challenging than, mining episode rules in simple event sequences.

• The downward closure property (Agrawal & Srikant, 1994; Mannila et al., 1997) may not hold in high utility episode rule mining. As indicated in Wu et al. (2013), the utility of an episode may be higher than, equal to, or lower than that of its super-episodes and/or sub-episodes. Therefore, many search-space pruning techniques that rely on the downward closure property cannot be used for mining high utility episode rules. Hence, a challenge is to design algorithms that can efficiently discover all high utility episode rules in complex event sequences.

• Prediction models constructed from high utility episode rules might be ineffective in terms of profitability and accuracy. Thus, using high utility episode rules to construct an effective prediction model that obtains good results in real applications is a challenging task.

To sum up, since simply deriving rules from high utility episodes may produce insignificant episode rules and incur high computational cost, the major motivations of this work are two-fold: (1) designing efficient algorithms to construct meaningful episode rules of high utility for complex event sequences, and (2) demonstrating the effectiveness of these utility-based episode rules in facilitating decision making for expert and intelligent systems. Therefore, in this study, we propose an efficient and effective methodology for discovering high utility episode rules in complex event sequences. The major contributions of this work are summarized as follows.

• To the best of our knowledge, the issue of utility-based episode rule mining has not been explored systematically. This research is the first work that develops a straightforward way to directly generate high utility episode rules during the mining process for complex event sequences.

• An efficient algorithm, called UBER-Mine (Utility-Based Episode Rules), is proposed for discovering the complete set of high utility episode rules in complex event sequences. To further enhance the performance of UBER-Mine, a compact tree structure named UR-Tree (Utility Rule Tree) is proposed, which maintains important event information from the event sequences so that our proposed method can generate high utility episode rules more efficiently. Experimental results on both synthetic and real-world datasets show that our proposed UBER-Mine method, along with the UR-Tree structure, significantly outperforms the UP-Span method (Wu et al., 2013) and the baseline approach.

• In order to further demonstrate the effectiveness of our proposed method, we propose an extended version of it in the context of investment, named InVestment by Utility-Based Episode Rules (IV-UBER). We apply this method to a challenging stock investment application using a real-world dataset (the price dataset of the Taiwan Capitalization Weighted Stock Index (TAIEX) (Taiwan Stock Exchange Corporation, 1962)). The experimental results show that high utility episode rules can be successfully applied to the challenging problem of predicting stock movement, and that our proposed IV-UBER algorithm outperforms several state-of-the-art methods, including Lin, Huang, et al. (2014).

The remainder of this paper is organized as follows. In Section 2, we introduce the background for episode mining and utility mining. Section 3 provides the formal definition of high utility episode rules and presents our proposed algorithms. Experimental results are shown in Section 4. Conclusions and future work are given in Section 5.

2. Background and relevant definitions

In this section, we introduce the relevant background and definitions concerning our proposed methods for mining high utility episode rules.

2.1. Background

Mannila et al. (1997) first introduced frequent episode mining for simple event sequences, aiming to discover the relationships among events that frequently occur sequentially. For different applications, Mannila et al. (1997) proposed two measures, window-based occurrence and minimal occurrence, to count the occurrences of an episode in simple event sequences, and proposed an Apriori-based algorithm to find the corresponding episodes. The algorithm, however, relies on a candidate-generation mechanism, which generates a large number of candidates during mining and incurs significant computational overhead. To improve the performance of frequent episode mining, several methods have been proposed (Huang & Chang, 2008; Lin et al., 2014; Ma et al., 2004; Tatti & Cule, 2012). After frequent episodes are found, frequent episode rules may be derived from them straightforwardly, according to the downward closure property (Agrawal & Srikant, 1994; Mannila et al., 1997).

Many studies (Ao, Luo, Li, Zhuang, & He, 2015; Gwadera et al., 2005; Huang & Chang, 2008; Lin et al., 2014; Ma et al., 2004; Mannila et al., 1997; Tatti & Cule, 2012; Zimmermann, 2014) have addressed frequent episode mining in different settings (e.g., streaming event sequences or compressed event sequences), and frequent episodes may be used to generate rules. As discussed in the introduction, however, these frequency-based episode mining methods may discover a large number of rules with low profit and miss profitable rules of low frequency. In light of this deficiency, the concept of utility-based episode mining was first proposed and addressed in Wu et al. (2013). In the framework of high utility episode mining, both the appearance count of an event at a time point (i.e., internal utility) and the weight of the event (i.e., external utility) are considered. Based on the concept of minimal occurrence, Wu et al. (2013) proposed an algorithm called UP-Span for efficiently discovering high utility episodes in complex event sequences. Thereafter, Guo et al. (2014) presented a prefix-tree structure and tighter upper bounds to speed up the UP-Span method, but their approach may find an incomplete set of high utility episodes due to the episode-weighted downward closure property (Wu et al., 2013). Moreover, simply deriving episode rules from the set of high utility episodes discovered by Wu et al. (2013) and Guo et al. (2014) may produce rules with low utility because of the absence of an anti-monotone property.

In utility mining (Ahmed et al., 2010; Chan et al., 2003; Liu et al., 2012; Tseng et al., 2014), patterns or rules are typically treated as interesting and useful if they can bring about high utilities (i.e., high profits) to users. Although rules of the form $HE \Rightarrow LE$ or $LE \Rightarrow LE$ might be useful in certain domains, in this paper the task we are interested in is to discover all rules of the form $(HE \,\|\, LE) \Rightarrow HE$, where $(HE \,\|\, LE)$ represents an episode of either high or low utility; that is, our goal is to find episodes that lead to a consequent with high utility.

2.2. Relevant definitions and illustrations

In this subsection, we adopt the notations used in Wu et al. (2013) for definitions and properties relevant to episode mining. For more details about episode mining, readers can refer to (Gwadera et al., 2005; Huang & Chang, 2008; Ma et al., 2004; Mannila et al., 1997; Tatti & Cule, 2012; Wu et al., 2013).

In the framework of episode mining, a complex event sequence $C$ is denoted as $\langle (SE_1, T_1), (SE_2, T_2), \ldots, (SE_n, T_n) \rangle$, where $T_i < T_j$ for all $1 \le i < j \le n$. $SE_i = (E_1, E_2, \ldots, E_m)$ represents a simultaneous event set such that $E_i \in \varepsilon$ for all $1 \le i \le m$, where $\varepsilon$ is a finite set of events and each event $E$ is associated with a time occurrence $T$. The length of a simultaneous event set $SE$ is denoted as $|SE|$ and defined as the number of events in $SE$. Given two simultaneous event sets $SE_1$ and $SE_2$, $SE_2$ is a subset of $SE_1$ and $SE_1$ is a superset of $SE_2$ if $SE_2 \subseteq SE_1$. The length of $C$ is denoted as $|C| = (T_n - T_1 + 1)$.

The $h$th window on $C$ is denoted as $W_h = \langle (SE_h, T_h), (SE_{h+1}, T_{h+1}), \ldots, (SE_{h+WS-1}, T_{h+WS-1}) \rangle$, where $1 \le h \le (|C| - WS + 1)$ and $WS$ is a user-specified window size. The number of windows is denoted as $NW = (|C| - WS + 1)$.

An episode $\alpha = \langle SE_1, SE_2, \ldots, SE_m \rangle$ is an ordered collection of simultaneous event sets, where $SE_i$ occurs before $SE_j$ for all $1 \le i < j \le m$. The length of $\alpha$ is defined as $|\alpha| = \sum_{i=1}^{m} |SE_i|$, and an episode of length $k$ is called a $k$-episode. Let $\alpha = \langle SE_1, SE_2, \ldots, SE_k \rangle$ and $\beta = \langle SE'_1, SE'_2, \ldots, SE'_l \rangle$ be two episodes, where $l \le k$. The episode $\beta$ is a sub-episode of $\alpha$ if there exist $r$ integers $1 \le I_1 < I_2 < \cdots < I_r \le k$ such that $SE'_{I_t} \subseteq SE_t$ for $1 \le t \le l \le k$. In addition, episode $\alpha$ is a super-episode of $\beta$.

Definition 1 (Simultaneous and serial concatenations). Given two episodes $\alpha = \langle SE_1, SE_2, \ldots, SE_n \rangle$ and $\beta = \langle SE'_1, SE'_2, \ldots, SE'_m \rangle$, the simultaneous concatenation of $\alpha$ and $\beta$ is defined as $\alpha \mathbin{\square} \beta = \langle SE_1, SE_2, \ldots, SE_n \cup SE'_1, SE'_2, \ldots, SE'_m \rangle$. The serial concatenation of $\alpha$ and $\beta$ is defined as the episode $\langle (SE_1), (SE_2), \ldots, (SE_n), (SE'_1), (SE'_2), \ldots, (SE'_m) \rangle$.
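The two concatenations of Definition 1 can be sketched in a few lines of Python. This is only an illustration: representing an episode as a tuple of `frozenset` objects and the helper `ep` are assumptions of the sketch, not structures prescribed by the paper.

```python
from typing import FrozenSet, Tuple

# Assumption of this sketch: an episode is a tuple of simultaneous event sets.
Episode = Tuple[FrozenSet[str], ...]


def ep(*sets: str) -> Episode:
    """Hypothetical helper: ep("b", "ac") builds the episode <(b), (a, c)>."""
    return tuple(frozenset(s) for s in sets)


def simultaneous_concat(alpha: Episode, beta: Episode) -> Episode:
    """Merge the last set of alpha with the first set of beta."""
    if not alpha or not beta:
        return alpha + beta
    return alpha[:-1] + (alpha[-1] | beta[0],) + beta[1:]


def serial_concat(alpha: Episode, beta: Episode) -> Episode:
    """Place the sets of beta after the sets of alpha."""
    return alpha + beta


# <(b)> joined with <(a, c)>: one merged set versus two ordered sets.
merged = simultaneous_concat(ep("b"), ep("ac"))
chained = serial_concat(ep("b"), ep("ac"))
```

Here `merged` is the 3-episode with a single set (a, b, c), while `chained` is the 3-episode (b) followed by (a, c).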

Definition 2 (Occurrence, set of all occurrences and set of ending time points of an episode). Let $\alpha = \langle SE_1, SE_2, \ldots, SE_k \rangle$ be an episode and $TI = [T_s, T_e]$ be a time interval between $T_s$ and $T_e$, where $(T_e - T_s) < WS$. $TI$ is called an occurrence of $\alpha$ if (1) $\alpha$ occurs in $[T_s, T_e]$, and (2) the simultaneous event sets $SE_1$ and $SE_k$ occur at $T_s$ and $T_e$, respectively. The set of all occurrences of $\alpha$ is denoted as $\mathit{occSet}(\alpha) = \{[T_s^1, T_e^1], [T_s^2, T_e^2], \ldots, [T_s^n, T_e^n]\}$, where $T_s^k$ and $T_e^k$ are the starting and ending time points of the $k$th occurrence of $\alpha$ for all $1 \le k \le n$. The set of ending time points of $\alpha$ is defined as $\mathit{ETSet}(\alpha) = \{T_e^1, T_e^2, \ldots, T_e^n\}$.

Definition 3 (Window-based occurrence). Given an occurrence $[T_s, T_e]$ of episode $\alpha = \langle SE_1, SE_2, \ldots, SE_m \rangle$ and a window $W = \langle (SE'_1, T'_1), (SE'_2, T'_2), \ldots, (SE'_n, T'_n) \rangle$, where $m \le n$, the episode $\alpha$ is contained in $W$ if there exist $r$ integers $1 \le I_1 < I_2 < \cdots < I_r \le n$ such that $SE_{I_t} \subseteq SE'_t$ for $1 \le t \le m \le n$. Let $\mathit{woSet}(\alpha) = \{y_1, y_2, \ldots, y_k\}$, $1 \le i \le k \le NW$, be the set of window IDs that contain $\alpha$; the number of window-based occurrences of $\alpha$ is denoted as $|\mathit{woSet}(\alpha)|$ and defined as the number of windows containing $\alpha$.
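As an illustration of Definition 3, the sketch below checks window containment by brute force and collects the window IDs for the episode ⟨(b), (a, c)⟩ on the sequence ESeq used in Example 1. It reads containment in the usual order-preserving sense (each set of the episode is covered by the event set at a strictly later time point inside the window). The dictionary layout and the function names are assumptions of the sketch, and the scan is not the paper's mining procedure.

```python
from typing import Dict, FrozenSet, List, Sequence

# ESeq from Example 1, keyed by time point; time point 3 has no events.
ESEQ: Dict[int, FrozenSet[str]] = {
    1: frozenset("ab"), 2: frozenset("ac"), 4: frozenset("bd"),
    5: frozenset("abc"), 6: frozenset("d"), 7: frozenset("ab"),
}


def window_contains(seq: Dict[int, FrozenSet[str]], start: int, ws: int,
                    episode: Sequence[FrozenSet[str]]) -> bool:
    """True if the episode's sets match, in order, at increasing time points
    inside the window [start, start + ws - 1]."""
    t = start
    for wanted in episode:
        # Greedily take the earliest time point whose event set covers `wanted`.
        while t < start + ws and not wanted <= seq.get(t, frozenset()):
            t += 1
        if t >= start + ws:
            return False
        t += 1  # the next set must occur strictly later
    return True


def wo_set(seq: Dict[int, FrozenSet[str]],
           episode: Sequence[FrozenSet[str]], ws: int) -> List[int]:
    """Window IDs (1-based) of all windows of size ws that contain the episode."""
    first, last = min(seq), max(seq)
    nw = (last - first + 1) - ws + 1  # NW = |C| - WS + 1
    return [h for h in range(1, nw + 1)
            if window_contains(seq, first + h - 1, ws, episode)]


alpha = (frozenset("b"), frozenset("ac"))
print(wo_set(ESEQ, alpha, ws=3))  # [1, 3, 4]
```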

Example 1. The following is a complex event sequence: ESeq = $\langle ((a, b), 1), ((a, c), 2), ((b, d), 4), ((a, b, c), 5), ((d), 6), ((a, b), 7) \rangle$. The length of ESeq is equal to 7, and $(a, b)$ is a simultaneous event set at time points 1 and 7. If the user-specified window size $WS$ is set to 3, the second window $W_2$ on ESeq is $\langle ((a, c), 2), ((b, d), 4) \rangle$. Let $\alpha$ be the 3-episode $\langle (b), (a, c) \rangle$; then $\langle (b, d), (a, c), (d) \rangle$ and $\langle (a) \rangle$ are a super-episode and a sub-episode of $\alpha$, respectively. If $WS = 3$, $[1, 2]$ is an occurrence of $\alpha$ and the set of ending time points of $\alpha$ is $\{2, 5\}$. In addition, the set of window IDs containing $\alpha$ is $\{1, 3, 4\}$, so the number of windows containing $\alpha$ is 3.

In the framework of high utility episode mining (Wu et al., 2013), a utility event, or u-event, is an ordered pair $(E, u)$, where $E \in \varepsilon$ represents an event with a utility $u$. A utility-based simultaneous event set, or u-SE, consists of at least one u-event and is defined as $((E_{j_1}, u_1), (E_{j_2}, u_2), \ldots, (E_{j_r}, u_r))$, where $(E_{j_k}, u_k)$ is a u-event for $1 \le k \le r$, and $j_{k_1} \ne j_{k_2}$ for all $k_1, k_2$ with $1 \le k_1, k_2 \le r$ and $k_1 \ne k_2$. The utility of a u-SE at the time point $T$ is defined as $TU = u(\text{u-SE}, T) = \sum_{i=1}^{r} u(E_i, T)$. A utility-based complex event sequence, or u-CES, is denoted as $\langle (\text{u-SE}_1, T_1), (\text{u-SE}_2, T_2), \ldots, (\text{u-SE}_n, T_n) \rangle$, where $T_i < T_j$ for all $1 \le i < j \le n$. The utility of a u-CES is defined as $CU_{WS} = \sum_{i=WS+1}^{|\text{u-CES}|} u(\text{u-SE}_i, T_i)$.

Example 2. Table 1 shows a utility-based complex event sequence. The utility of the event $(a)$ at the time point $T_1$ is $u((a, 3), T_1) = 3$, and the utility of the simultaneous event set $\langle (a, 3)\,(b, 2) \rangle$ at the time point $T_1$ is $u(\langle (a, 3)\,(b, 2) \rangle, T_1) = u((a, 3), T_1) + u((b, 2), T_1) = (3 + 2) = 5$. In addition, the utility of the u-CES in Table 1 is 33 if $WS$ is set to 3.
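The figures in Example 2 can be reproduced with a short calculation. The sketch below encodes Table 1 as a dictionary (an assumed layout, with hypothetical function names `tu` and `cu`) and sums the per-time-point utilities, skipping the first WS time points as the definition of the sequence utility prescribes.

```python
from typing import Dict

# Table 1 as {time point: {event: utility}}; time point 3 carries no events.
U_CES: Dict[int, Dict[str, int]] = {
    1: {"a": 3, "b": 2},
    2: {"a": 6, "c": 4},
    4: {"b": 2, "d": 4},
    5: {"a": 3, "b": 2, "c": 2},
    6: {"d": 12},
    7: {"a": 6, "b": 2},
}


def tu(seq: Dict[int, Dict[str, int]], t: int) -> int:
    """TU: total utility of the simultaneous event set at time point t."""
    return sum(seq.get(t, {}).values())


def cu(seq: Dict[int, Dict[str, int]], ws: int) -> int:
    """CU_WS: sum of TU over the time points that follow the first ws ones."""
    first, last = min(seq), max(seq)
    return sum(tu(seq, t) for t in range(first + ws, last + 1))


print([tu(U_CES, t) for t in range(1, 8)])  # [5, 10, 0, 6, 7, 12, 8]
print(cu(U_CES, ws=3))                      # 33
```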

3. High utility episode rule mining

In this section, we explain how we incorporate the concept of utility into episode rule mining, and then formulate the problem of high utility episode rule mining. Next, we describe our proposed algorithm UBER-Mine (Utility-Based Episode Rules) for efficiently discovering high utility episode rules in complex event sequences. Finally, a compact tree structure, named UR-Tree (Utility Rule Tree), is proposed to enhance the performance of UBER-Mine.

Table 1
A utility-based complex event sequence.

T_i   u-events                TU
1     (a, 3) (b, 2)           5
2     (a, 6) (c, 4)           10
3     -                       0
4     (b, 2) (d, 4)           6
5     (a, 3) (b, 2) (c, 2)    7
6     (d, 12)                 12
7     (a, 6) (b, 2)           8

CU_3 (utility of the whole sequence for WS = 3): 33

3.1. Preliminary

Let $N$ be a set of time points and let u-CES $= \langle (\text{u-SE}_1, T_1), (\text{u-SE}_2, T_2), \ldots, (\text{u-SE}_n, T_n) \rangle$ be a utility-based complex event sequence of length $n$, where each utility-based simultaneous event set $\text{u-SE}_i$ is associated with a time point $T_i \in N$ and $T_i < T_j$ for all $1 \le i < j \le n$.

Definition 4 (Utility of an episode with respect to the $h$th window). If episode $X$ is contained in $W_h$, the utility of the episode $X$ with respect to $W_h$ is denoted as $u(X, W_h)$ and defined as $u(\text{u-SE}_{h+WS}, T_{h+WS})$, that is, the utility of the utility-based simultaneous event set $\text{u-SE}_{h+WS}$ at the time point $T_{h+WS}$.

Definition 5 (Utility of an episode rule). An episode rule $r$ is an implication of the form $X \Rightarrow Y$, where $X$ and $Y$ are episodes. Let $\mathit{woSet}(X) = \{y_1, y_2, \ldots, y_k\}$, $1 \le i \le k \le NW$, be the set of window IDs that contain $X$; the set of ending time points of windows that contain $X$ is defined as $\mathit{ETW}(X) = \{p_1, p_2, \ldots, p_k\}$, where $p_i$ is the ending time point of $W_{y_i}$. The utility of the episode rule $r$ in a complex event sequence is defined as $u(r) = u(X \Rightarrow Y) = \sum_{i=1}^{k} u(Y, p_i + 1)$, that is, the summation of the utility of $Y$ at the time point immediately following each $p_i$.

Definition 6 (High utility episode rule). Given a user-specified minimum utility threshold min_utility, an episode rule $X \Rightarrow Y$ is called a high utility episode rule if $u(X \Rightarrow Y) \ge (\text{min\_utility} \times CU_{WS})$. Otherwise, the episode rule is a low utility rule.

Example 3. Let $\alpha$ be the 3-episode $\langle (b), (a, c) \rangle$; the utility of $\alpha$ with respect to $W_3$ in ESeq is $u(\alpha, W_3) = 12$. If $WS = 3$, $X = \langle (a), (d) \rangle$ and $Y = \langle (a, b) \rangle$, the utility of the episode rule is $u(X \Rightarrow Y) = u(Y, 5) + u(Y, 7) = 5 + 8 = 13$. Given a user-specified minimum utility threshold min_utility = 0.3, the episode rule $X \Rightarrow Y$ is a high utility episode rule because $u(X \Rightarrow Y) = 13$ is no less than $(\text{min\_utility} \times CU_{WS}) = 0.3 \times 33 = 9.9$.

Definition 7 (Maximum utility of a rule if its antecedent is $X$). Let $X$ be the antecedent of an episode rule $r$, and let $\mathit{ETW}(X) = \{p_1, p_2, \ldots, p_k\}$, $1 \le i \le k \le NW$, be the set of ending time points of the windows that contain $X$. The maximum utility of $r$ is defined as $\mathit{MUR}(X) = \sum_{i=1}^{k} u(\text{u-SE}_{p_i+1}, p_i + 1)$. If $\mathit{MUR}(X) < (\text{min\_utility} \times CU_{WS})$, $r$ is a low utility rule.

Theorem 1 (Episode Rule Antecedent-Weighted Downward Closure property). Let $X$ and $Y$ be episodes, and let $X \Rightarrow Y$ be an episode rule $r$. The Episode Rule Antecedent-Weighted Downward Closure property (abbreviated as ERADC) states that if $\mathit{MUR}(X) < (\text{min\_utility} \times CU_{WS})$, $r$ is a low utility rule.

Proof. Let $\mathit{ETW}(X) = \{p_1, p_2, \ldots, p_k\}$, where $1 \le i \le k \le NW$. Because $Y$ is formed by simultaneous concatenation of events from $\text{u-SE}_{p_i+1}$, we have $u(\text{u-SE}_{p_i+1}, p_i + 1) \ge u(Y, p_i + 1)$. According to Definitions 5–7, $u(r) = \sum_{i=1}^{k} u(Y, p_i + 1) \le \sum_{i=1}^{k} u(\text{u-SE}_{p_i+1}, p_i + 1) = \mathit{MUR}(X)$, so $u(r) \le \mathit{MUR}(X) < (\text{min\_utility} \times CU_{WS})$, which yields that $r$ is a low utility rule. $\square$

This theorem can be used to accelerate the mining process: every rule whose antecedent $X$ has $\mathit{MUR}(X)$ below the pre-specified utility threshold can be pruned without evaluating its consequent.

Problem statement (High utility episode rule mining). Given a utility-based complex event sequence and a user-specified minimum utility threshold min_utility, the goal of mining high utility episode rules is to discover all the episode rules that have utility no less than $(\text{min\_utility} \times CU_{WS})$.
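To make Definitions 5–7 concrete, the following sketch recomputes the rule utility from Example 3 on the sequence of Table 1 and evaluates the upper bound MUR(X) that the ERADC property relies on. The data layout and every function name are assumptions of this sketch; the windows are scanned by brute force, which is not how UBER-Mine obtains them.

```python
from typing import Dict, FrozenSet, List, Sequence

# Table 1 as {time point: {event: utility}}; time point 3 carries no events.
U_CES: Dict[int, Dict[str, int]] = {
    1: {"a": 3, "b": 2}, 2: {"a": 6, "c": 4}, 4: {"b": 2, "d": 4},
    5: {"a": 3, "b": 2, "c": 2}, 6: {"d": 12}, 7: {"a": 6, "b": 2},
}
WS = 3
FIRST, LAST = min(U_CES), max(U_CES)


def tu(t: int) -> int:
    """Total utility of all events at time point t (0 if nothing occurs)."""
    return sum(U_CES.get(t, {}).values())


def contains(start: int, episode: Sequence[FrozenSet[str]]) -> bool:
    """Order-preserving containment of the episode in [start, start + WS - 1]."""
    t = start
    for wanted in episode:
        while t < start + WS and not wanted.issubset(U_CES.get(t, {})):
            t += 1
        if t >= start + WS:
            return False
        t += 1
    return True


def wo_set(episode: Sequence[FrozenSet[str]]) -> List[int]:
    nw = (LAST - FIRST + 1) - WS + 1
    return [h for h in range(1, nw + 1) if contains(FIRST + h - 1, episode)]


def u(y: FrozenSet[str], t: int) -> int:
    """Utility of consequent y at time t; 0 when y does not occur there."""
    events = U_CES.get(t, {})
    return sum(events[e] for e in y) if y.issubset(events) else 0


X = (frozenset("a"), frozenset("d"))            # antecedent <(a), (d)>
Y = frozenset("ab")                             # consequent <(a, b)>
etw = [FIRST + h + WS - 2 for h in wo_set(X)]   # ending time point of each window
rule_utility = sum(u(Y, p + 1) for p in etw)    # Definition 5
mur = sum(tu(p + 1) for p in etw)               # Definition 7
threshold = 0.3 * sum(tu(t) for t in range(FIRST + WS, LAST + 1))

print(wo_set(X), etw)      # [2, 4, 5] [4, 6, 7]
print(rule_utility, mur)   # 13 15
print(rule_utility >= threshold, mur >= threshold)  # True True
```

Because MUR(X) = 15 is not below the threshold of 9.9, the antecedent cannot be pruned, and the rule itself reaches utility 13.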

3.2. Mining utility-based episode rules (UBER-Mine)

The main procedure of the UBER-Mine algorithm is shown in Fig. 1. The input of UBER-Mine is: (1) a utility-based complex event sequence u-CES, (2) a minimum utility threshold min_utility, and (3) a window size $WS$. The algorithm scans the complex event sequence once to find all one-episodes and their occurrences (Definition 2) (Line 1). For each one-episode $\alpha$, the algorithm explores the search space of episodes by considering $\alpha$ as a prefix of the LHS. The LHS $\alpha$ is extended by executing the MineUBER procedure (Lines 2–3). MineUBER consists of two sub-procedures, MineSimuUBER and MineSerialUBER, which find the simultaneous and serial events related to $\alpha$, respectively (Lines 4–6).

Fig. 1. Pseudocode for algorithm UBER-Mine.

The procedure MineSimuUBER is shown in Fig. 2 and works as follows. For each occurrence $\mathit{occ}(\alpha) = [T_s, T_e]$ in $\mathit{occSet}(\alpha)$, the algorithm checks whether $T_e$ is in $\mathit{ETSet}(\gamma)$ (Definition 2) for each one-episode $\gamma$. If $T_e$ is in $\mathit{ETSet}(\gamma)$, the algorithm puts $\gamma$ into the projected database of $\alpha$ (abbreviated as $\alpha$-PDB) and generates an episode $X = \alpha \mathbin{\square} \gamma$ by the simultaneous concatenation of $\alpha$ and $\gamma$. Then, the algorithm puts the occurrence $[T_s, T_e]$ into $\mathit{occSet}(X)$ (Lines 1–6). After that, the events that occur simultaneously with $\alpha$ and their occurrences are stored in $\alpha$-PDB. Next, for each simultaneous event $\gamma$ in $\alpha$-PDB, the episode $X$ is formed by the simultaneous concatenation of $\alpha$ and $\gamma$ (Lines 7–8). The sub-procedure CalculateUtility is performed to find $\mathit{woSet}(X)$ (Line 9). Given $\mathit{woSet}(X)$, $\mathit{MUR}(X)$ can be calculated by Definition 7. If $\mathit{MUR}(X)$ is no less than $(\text{min\_utility} \times CU_{WS})$, the procedure GenUBER is performed to generate all the high utility episode rules whose antecedents are $X$ (Definition 7). Then, the procedure MineUBER is performed to find high utility episode rules whose LHS are extended by $X$ (Lines 10–12).

Fig. 2. Pseudocode for procedure MineSimuUBER.

The procedure MineSerialUBER is shown in Fig. 3 and works as follows. For each occurrence $\mathit{occ}(\alpha) = [T_s, T_e]$ in $\mathit{occSet}(\alpha)$, the algorithm performs the following: for each time point $t$ in $[T_e + 1, T_s + WS - 1]$, the algorithm checks whether $t$ is in $\mathit{ETSet}(\gamma)$ (Definition 2) for each one-episode $\gamma$. If $t$ is in $\mathit{ETSet}(\gamma)$, the algorithm puts $\gamma$ into $\alpha$-PDB and generates an episode $X$ by the serial concatenation of $\alpha$ and $\gamma$. Then, the procedure puts $\mathit{occ}(X) = [T_s, t]$ into $\mathit{occSet}(X)$ (Lines 1–7). After that, the events that occur serially with $\alpha$ and their occurrences are stored in $\alpha$-PDB (Line 8). For each serial event $\gamma$ in $\alpha$-PDB, the episode $X$ is formed by the serial concatenation of $\alpha$ and $\gamma$ (Lines 8–9). The sub-procedure CalculateUtility is performed to find $\mathit{woSet}(X)$ (Line 10). Given $\mathit{woSet}(X)$, $\mathit{MUR}(X)$ is calculated (Definition 7). If $\mathit{MUR}(X)$ is no less than $(\text{min\_utility} \times CU_{WS})$, the procedure GenUBER is performed to generate all high utility episode rules whose antecedents are $X$ (Definition 7). Then, the procedure MineUBER is performed to find all high utility episode rules whose LHS are extended by $X$ (Lines 11–13).

Fig. 3. Pseudocode for procedure MineSerialUBER.

The procedure CalculateUtility is shown in Fig. 4 and works as follows. For each occurrence $\mathit{occ}(\alpha) = [T_s, T_e]$ in $\mathit{occSet}(\alpha)$, the variable $t$ is set to one. If $(T_e - WS + 1)$ is greater than zero, $t$ is set to $(T_e - WS + 1)$ (Lines 1–4). After that, each time point $h$ in the time interval $[t, T_s]$ is collected into $\mathit{woSet}(X)$ (Lines 5–6). Next, for each window ID $y$ in $\mathit{woSet}(X)$, the variable $u_X$ is increased by $u(X, W_y)$ (Lines 7–8). Finally, $\mathit{MUR}(X)$ is set to $u_X$ and the procedure returns $\mathit{MUR}(X)$ and $\mathit{woSet}(X)$ (Lines 9–10).

The procedure GenUBER is shown in Fig. 5 and works as follows. For each window ID $y$ in $\mathit{woSet}(X)$, a variable SE is used to collect the simultaneous event set that occurs at the time point $(y + WS)$. The algorithm collects all such simultaneous event sets into the set DB (Lines 1–3). Then, the algorithm runs the UP-Growth algorithm (Tseng, Wu, Shie, & Yu, 2010) to find all high utility itemsets in DB. The discovered high utility itemsets are collected into the set ISet (Line 4). Finally, for each pattern $Y$ in ISet, the algorithm generates a rule $r$: $X \Rightarrow Y$ and puts $r$ into the set UBERSet (Lines 5–7).
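The descriptions of CalculateUtility and GenUBER above can be sketched as follows, using the sequence of Table 1. This is an illustrative sketch under stated assumptions, not the pseudocode of Figs. 4 and 5: the dictionary layout and function signatures are hypothetical, time points are assumed to start at 1, and the UP-Growth step is replaced by a plain enumeration of all itemsets, which is only practical for tiny inputs.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Table 1 as {time point: {event: utility}}; time point 3 carries no events.
U_CES: Dict[int, Dict[str, int]] = {
    1: {"a": 3, "b": 2}, 2: {"a": 6, "c": 4}, 4: {"b": 2, "d": 4},
    5: {"a": 3, "b": 2, "c": 2}, 6: {"d": 12}, 7: {"a": 6, "b": 2},
}


def calculate_utility(occ_set: Sequence[Tuple[int, int]],
                      ws: int) -> Tuple[int, List[int]]:
    """Derive woSet(X) from the occurrences of X and accumulate MUR(X)."""
    wo = set()
    for ts, te in occ_set:
        t = max(1, te - ws + 1)       # earliest window that still covers [ts, te]
        wo.update(range(t, ts + 1))   # every window ID h in [t, ts]
    # u(X, W_y) is the total utility at the time point right after window y.
    mur = sum(sum(U_CES.get(y + ws, {}).values()) for y in wo)
    return mur, sorted(wo)


def gen_uber(wo: Sequence[int], ws: int,
             threshold: float) -> Dict[Tuple[str, ...], int]:
    """Enumerate consequents Y whose rule utility reaches the threshold.

    Brute-force stand-in for the UP-Growth call described in the text.
    """
    db = [U_CES.get(y + ws, {}) for y in wo]   # event sets at y + WS
    items = sorted({e for tx in db for e in tx})
    rules: Dict[Tuple[str, ...], int] = {}
    for size in range(1, len(items) + 1):
        for y in combinations(items, size):
            util = sum(sum(tx[e] for e in y)
                       for tx in db if all(e in tx for e in y))
            if util >= threshold:
                rules[y] = util
    return rules


# Occurrences of X = <(a), (d)> in Table 1 for WS = 3.
mur, wo = calculate_utility([(2, 4), (5, 6)], ws=3)
print(mur, wo)                                 # 15 [2, 4, 5]
print(gen_uber(wo, ws=3, threshold=0.3 * 33))  # {('a', 'b'): 13}
```

With min_utility = 0.3 the only consequent that survives is (a, b), which matches the rule evaluated in Example 3.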

3.3. Utility Rule Tree (UR-Tree)

In order to speed up the mining process, we propose to use a compact tree structure, named UR-Tree (Utility Rule Tree), to maintain important information of the event sequences related to high utility episode rules. The UR-Tree is introduced below.


Fig. 4. Pseudocode for function CalculateUtility.

Fig. 5. Pseudocode for procedure GenUBER.


Definition 8 (Simultaneous event prefix). Let SE = (E1, E2, …, Em) be a simultaneous event set such that Ei ∈ ε for all 1 ≤ i ≤ m. The simultaneous event prefix of SE is the arrangement of SE in which the events Ei occur in alphabetical order.

Definition 9 (Utility rule tree). Given a simultaneous event set SE at time point T, the UR-Tree stores the simultaneous event prefixes of SE, the time point T, and the utility of SE at T. The nodes of the UR-Tree comprise a root labeled "null", internal nodes, and leaf nodes. Each internal node registers an event label, and the path from the root to an internal node forms a simultaneous event prefix. Each leaf node registers the set of time points at which the corresponding simultaneous event sets occur. We also employ a header table to facilitate traversal of the UR-Tree: each entry in the header table consists of a time point, a utility value, and a link, and the link points to the leaf node in the UR-Tree that has the same time point as the entry. Fig. 6 shows an example of a UR-Tree.

Fig. 6. A sample UR-Tree.

The UR-Tree can be constructed during the first scan of the complex event sequence. At the same time, the utility of the simultaneous event set at each time point is computed, and the links between the header table and the UR-Tree are created. Events in the complex event sequence are rearranged in a fixed order, such as lexicographic order. For example, consider the utility complex event sequence in Table 1. When the events at time point 1 are retrieved, two internal nodes corresponding to the events "a" and "b" are created. Because the events occur at time point 1, a leaf node labeled "{1}" is created. The sum of the utilities of "a" and "b" at time point 1 is stored in the corresponding entry of the header table. After all simultaneous event sets in the sequence have been retrieved, the UR-Tree is complete.

To use the UR-Tree to efficiently generate high utility episode rules, we replace lines 2–6 of the MineSimuUBER pseudocode and lines 3–7 of the MineSerialUBER pseudocode with the pseudocode in Fig. 7. The pseudocode in Fig. 7(a) works as follows. Let NL be the leaf node pointed to by the header table entry that corresponds to time point Te, and let P be the path of events from NL to the root. For each single event e in P, the algorithm puts e into the projected database of α (abbreviated as α-PDB) and generates an episode X by the simultaneous concatenation of α and e. The pseudocode in Fig. 7(b) works as follows. Let NL be the leaf node pointed to by the header table entry that corresponds to time point t, and let P be the path of events from NL to the root. For each single event e in P, the algorithm puts e into the projected database of α (abbreviated as α-PDB) and generates an episode X by the serial concatenation of α and e.
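The construction and traversal just described can be sketched in Python as follows. This is a simplified illustration under stated assumptions rather than the authors' data structure: the class and method names (`URNode`, `URTree`, `insert`, `path_events`) are hypothetical, the leaf node is folded into the last internal node as a `time_points` set instead of being a separate node, and the utilities in the example are made-up numbers because Table 1 is not reproduced here.

```python
class URNode:
    """Tree node; the root has label None (the "null" root)."""

    def __init__(self, label=None, parent=None):
        self.label = label
        self.parent = parent
        self.children = {}        # event label -> URNode
        self.time_points = set()  # leaf role: time points whose event set ends here


class URTree:
    def __init__(self):
        self.root = URNode()
        self.header = {}          # time point -> (utility, link to the node holding it)

    def insert(self, time_point, event_utilities):
        """Insert one simultaneous event set given as {event label: utility}."""
        node = self.root
        for label in sorted(event_utilities):   # fixed lexicographic order
            child = node.children.get(label)
            if child is None:
                child = URNode(label, node)
                node.children[label] = child
            node = child
        node.time_points.add(time_point)
        self.header[time_point] = (sum(event_utilities.values()), node)

    def path_events(self, time_point):
        """Events on the path from the leaf for time_point up to the root."""
        _, node = self.header[time_point]
        events = []
        while node.label is not None:
            events.append(node.label)
            node = node.parent
        return events


tree = URTree()
tree.insert(1, {"a": 2, "b": 3})   # events "a" and "b" at time point 1
tree.insert(2, {"a": 1, "c": 4})
tree.insert(3, {"b": 1, "a": 1})   # shares the prefix (a, b) with time point 1
```

Because event sets with the same prefix share a branch, time points 1 and 3 end at the same node, and the header table gives direct access to that node when the events at a given time point are needed.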


3.4. Investment application by utility-based episode rules


Stock investment is a challenging and important task. Typically, artificial intelligence studies in this line of research provide black-box models for investment (Brock, Lakonishok, & LeBaron, 1992; Huang, 2012; Huang, Chang, Cheng, & Chang, 2012; Mochón, Quintana, & Sáez, 2008; Sapankevych & Sankar, 2009). In contrast, episode mining is a modeling approach whose results are easy to comprehend. Two major types of studies in the episode mining literature have so far addressed stock prediction or investment (Lin et al., 2014; Ng & Fu, 2003). Ng and Fu (2003) utilized episode mining techniques to discover the relationships among stocks. However, they did not generate frequent


Fig. 7. Pseudocode for adopting the UR-Tree in UBER-Mine: (a) pseudocode for MineSimuUBER; (b) pseudocode for MineSerialUBER.


episode rules for further predictions on the price movement of stocks, and so this method is of limited use in practical applications. On the other hand, Lin et al. (2014) proposed constructing frequent episode rules for such predictions, but their method does not consider utility and thus often recommends episodes that appear frequently in the database but may not generate high profits for investment.

In this subsection we propose an investment model based on our utility-based methodology, IV-UBER (InVestment by Utility-Based Episode Rules), for the investment application reported in Lin et al. (2014). Our objective is to showcase how the proposed episode mining method may be used to solve a challenging real-world problem; the relevant experimental results are presented in Section 4. The IV-UBER algorithm is shown in Fig. 8 and consists of four main steps: (1) feature extraction, (2) rule generation, (3) rule match, and (4) rule selection and prediction. These components are explained below.

Feature extraction. The stock price data is a time series dataset consisting of the daily opening, highest, lowest, and closing prices of a stock. Events are extracted as features from these prices by applying technical indicators (Brock et al., 1992; Huang, 2012; Lin et al., 2014), and the events are then converted into complex event sequences. In this work, the utility of an event is defined as the daily return (i.e., the price change of a stock expressed as a percentage).

Rule generation. Given a user-specified minimum utility threshold min_util, a window size WS, and a utility-based complex event sequence u-CES, the model generates all high utility episode rules in u-CES by executing the UBER-Mine algorithm. The discovered rules are collected to form the set ER-Set.

Rule match. For each simultaneous event set u-SE_i in the testing dataset TestDS = {u-SE_i, u-SE_{i+1}, …, u-SE_k}, we create a sequence d_i = ⟨u-SE_{i−WS}, …, u-SE_{i−2}, u-SE_{i−1}⟩ to represent the WS closest utility-based simultaneous event sets occurring before u-SE_i; these can be considered the features used to predict the class label T_i. All such sequences are collected into the set TS-Set for matching rules. We then use the exact match method to match rules. Given a testing instance d_i, a rule r: X ⇒ Y matches d_i if (1) X occurs in d_i, and (2) the first and the last simultaneous event sets of X are subsets of u-SE_{i−WS} and u-SE_{i−1}, respectively. All rules that match d_i are collected into the set RM-Set(d_i).

Rule selection and rule prediction. If there is no rule in RM-Set(d_i), the model does not predict the class label T_i (or predicts the class label of T_i as "Fall", which stands for a negative daily return; analogously, the label "Rise" stands for a positive daily return). If RM-Set(d_i) is not empty, the system ranks the rules by their significance (in terms of profitability and accuracy) and uses the rule with the highest significance to make a prediction. The importance of a rule r is measured by its confidence (Mannila et al., 1997) and profitability, and is defined as $I_p(r) = [(1 - \lambda) \times u(r) + \lambda \times \mathrm{conf}(r)]^{1/2}$, where λ (0 ≤ λ ≤ 1) is a user-specified weight that controls the trade-off between profitability and accuracy. Other measures, such as entropy, the Gini index, and other discriminative measures (Kecman, 2001), can also be considered.
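The rule selection step can be sketched in a few lines of Python. The sketch only illustrates the ranking formula above; the `Rule` record, the function names, and the example values are hypothetical, and it assumes that u(r) has been scaled to the range [0, 1] so that it is comparable with conf(r) and the square root is well defined (the scaling of u(r) is not specified in this passage).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: str
    consequent: str     # predicted label, e.g. "Rise" or "Fall"
    utility: float      # u(r), assumed here to be scaled to [0, 1]
    confidence: float   # conf(r) in [0, 1]


def significance(rule, lam):
    """I_p(r) = [(1 - lam) * u(r) + lam * conf(r)] ** (1/2)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return math.sqrt((1.0 - lam) * rule.utility + lam * rule.confidence)


def select_rule(matched_rules, lam):
    """Return the matched rule with the highest significance, or None."""
    if not matched_rules:
        return None   # the model makes no prediction for this instance
    return max(matched_rules, key=lambda r: significance(r, lam))


# Made-up rules: r1 is profitable but inaccurate, r2 the opposite.
r1 = Rule("X1", "Rise", utility=0.9, confidence=0.2)
r2 = Rule("X2", "Fall", utility=0.4, confidence=0.8)
```

With λ = 0 only the utility counts and `select_rule([r1, r2], 0.0)` picks `r1`; with λ = 0.5 the confidence of `r2` outweighs its lower utility and `r2` is picked instead.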


4. Experimental evaluation


In this section, we compare the performance of our proposed algorithms with that of two relevant algorithms previously reported in Wu et al. (2013), using both synthetic and real-world datasets. All experiments were conducted on a computer with an Intel Core 2 processor (3.40 GHz) and 8 GB of RAM, running Windows 7. Following previous works (Tseng et al., 2010; Wu, Philippe, Yu, & Tseng, 2011; Wu, Shie, Tseng, & Yu, 2012; Wu et al., 2013), we use the IBM data generator (Agrawal & Srikant, 1994; Lan, Hong, Tseng, & Wang, 2014; Shie, Yu, & Tseng, 2012; Tseng et al., 2010) to generate synthetic datasets. The parameters of the generator are described in Table 2. The internal and external event utilities are generated with the settings used in Wu et al. (2013). Real-world datasets with different characteristics were also used in the experiments: Foodmart (a small sparse dataset from Microsoft Developer Network (2000)), ChainStore (a large dataset from Center for Ultra-scale Computing (2006)), and TAIEX (the price data of the Taiwan Capitalization Weighted Stock Index (Taiwan Stock Exchange Corporation, 1962)) from January 6, 1967 to January 15, 2013 (see footnote 1). We compare the method studied in Wu et al. (2013) with our proposed algorithms: the algorithm of Wu et al. (2013) with two strategies (Discarding Global unpromising Events and Discarding Local unpromising Events) is denoted as UP-Span; our baseline algorithm is denoted as UBER (Baseline); and our advanced algorithm with the compact UR-Tree structure is denoted as UBER (UR-Tree). Table 3 provides the characteristics of all the datasets used in the experiments.


1 As indicated in Wu et al. (2013), the datasets of Foodmart and ChainStore can be regarded as single complex sequences by considering items as events and each transaction as a simultaneous event set at a time point. Notice that these two datasets and the synthetic datasets do not have built-in utilities; so Wu et al. (2013) proposed a way to generate simulated utility data which is then used for our study here, as well.


Fig. 8. Pseudocode for IV-UBER algorithm.

The synthetic dataset T12-I8-N1K-Q5-D40K is evaluated first to identify reasonable settings for the different parameters. Based on the observed parameter settings, we then conduct experiments on two real datasets (i.e., Foodmart and ChainStore) to evaluate the proposed algorithms in terms of computational overhead and memory consumption. We next examine the scalability of our two proposed algorithms under various parameter settings. Last, the TAIEX dataset is employed to gain insights into utility-based episode rules for a promising application in stock investment.

4.1. Evaluation on synthetic dataset

We first evaluate the performance of the algorithms on the synthetic dataset T12-I8-N1K-Q5-D40K. Fig. 9(a) shows the execution time of the algorithms on this dataset under various minimum utility thresholds when the window size is set to six. As can be seen, the performance gap widens as the threshold decreases. For instance, UBER (UR-Tree) runs approximately ten times faster than both UBER (Baseline) and UP-Span when the minimum utility threshold is 30%, and more than ten times faster when the threshold is 10%. When the threshold is lower than 0.5%, UBER (UR-Tree) runs more than two orders of magnitude faster than UBER (Baseline). The reason is that UBER (Baseline) uses an Apriori-like approach that generates many unnecessary candidate episodes for the LHS of rules, whereas UBER (UR-Tree) finds high utility episodes for the LHS of rules without producing candidates. Furthermore, UBER (Baseline) spends considerable time calculating the occurrences of candidate episodes, whereas UBER (UR-Tree) finds high utility episode rules by traversing related branches of the UR-Tree, which is much more efficient. In addition, when the threshold is lower than 20%, UP-Span runs even slower than UBER (Baseline). The reason is that UP-Span generates many episode rules with low-utility consequents, which need to be removed in a time-consuming post-processing step.

4.2. Evaluation on real dataset Foodmart

We then evaluate the performance of the algorithms on the Foodmart dataset, which is a small sparse dataset. Fig. 9(b) shows the execution time of the algorithms on this dataset under various minimum utility thresholds when the window size is set to six. As shown in Fig. 9(b), UBER (UR-Tree) is quite efficient even for very low minimum utility thresholds; it runs over ten times faster than both UBER (Baseline) and UP-Span across all the thresholds tested here.

4.3. Evaluation on large dataset ChainStore

We continue to evaluate the performance of the algorithms using the ChainStore dataset, which is a large dataset with many distinct events (46,086; see Table 3). Fig. 9(c) shows the corresponding execution time under various minimum utility thresholds when the window size is set to four. Notice that the execution times of UBER (Baseline) and UP-Span cannot be displayed in Fig. 9(c) because both run extremely slowly on this dataset, even for higher minimum utility thresholds. For example, when the threshold is set to 1%, UBER (Baseline) and UP-Span cannot terminate within 150,000 s, whereas UBER (UR-Tree) terminates within 200 s.

4.4. Evaluation on memory consumption

Next, the memory consumption of the algorithms on the Foodmart, ChainStore, and T12-I8-N1K-Q5-D40K datasets is investigated and shown in Fig. 10. UBER (Baseline) uses slightly less memory than UBER (UR-Tree) on the T12-I8-N1K-Q5-D40K and Foodmart datasets because UBER (UR-Tree) needs to maintain the UR-Tree in memory, whereas UBER (Baseline) does not. However, since the UR-Tree compresses simultaneous event sets well, the difference in memory demand relative to UBER (Baseline) is insignificant. In addition, UP-Span generates many episode rules with low-utility consequents, and so its memory consumption may increase quickly when the minimum utility threshold decreases considerably. For the ChainStore dataset, since UBER (Baseline) and UP-Span cannot terminate within the pre-specified duration, as discussed in the previous subsection, Fig. 10(c) displays only the memory consumption of UBER (UR-Tree) for reference.


Table 2 Parameter descriptions.

| Parameter | Description |
|---|---|
| D | The total number of time points |
| T | The average size of a simultaneous event set |
| N | The number of distinct events |
| I | The average size of maximal potential episodes |

Table 3 Characteristics of the experimental datasets.

| Dataset | # Time points (# Transactions) | # Events (# Items) | Avg. length |
|---|---|---|---|
| T12-I8-N1K-Q5-D40K | 40,000 | 1000 | 12 |
| Foodmart | 4141 | 1559 | 4.4 |
| ChainStore | 1,112,949 | 46,086 | 7.3 |
| TAIEX Weighted Stock Index | 12,663 | 440 | 22 |


Fig. 9. Execution time on different datasets: (a) T12-I8-N1K-Q5-D40K; (b) Foodmart; (c) ChainStore.

Fig. 10. Memory consumption on the three datasets: (a) T12-I8-N1K-Q5-D40K; (b) Foodmart; (c) ChainStore.


4.5. Evaluation on scalability testing


In this subsection we examine the scalability of our two proposed algorithms for various window sizes and numbers of time points. Fig. 11(a) shows the effect of the window size on the execution time of the algorithms using the T12-I8-N1K-Q5-D40K dataset when the minimum utility threshold is set to 5%. As can be seen, both algorithms scale well, but UBER (UR-Tree) runs faster than UBER (Baseline). We then examine the scalability of the algorithms using the T12-I8-N1K-Q5-DxK dataset, where x is varied from 20 to 100. In this experiment, the minimum utility threshold and the window size are set to 5% and six, respectively. Fig. 11(b) shows the execution time of the algorithms: UBER (UR-Tree) runs about ten times faster than UBER (Baseline), and both approaches also scale well on a large database.


Fig. 11. Scalability test on the T12-I8-N1K-Q5 dataset for (a) various window sizes and (b) various numbers of time points.

4.6. Evaluation of the IV-UBER algorithm on real-world stock market data

To evaluate the performance of our proposed utility-based IV-UBER model for investment, we employ the cross validation (CV) method (Huang, 2012; Lin et al., 2014) to split the TAIEX data into two independent datasets for training and testing. As shown in Fig. 12, the first pc percent of the dataset is used to train the prediction models, and the remaining data is used to test their effectiveness. We compare seven methods: B&H (the traditional Buy-and-Hold scheme (Sensoy, 2009; Shukla & Singh, 1997)), SVM (Support Vector Machines) (Kecman, 2001), NN (Neural Network) (Kecman, 2001), SISTEM (Stock Investment Strategy using Technical indicator and Episode Mining) (Lin et al., 2014), UP-Span (Wu et al., 2013), IV-UBER (λ = 0.5), and IV-UBER (λ = 0). In each CV, the methods are executed using the best-tuned parameters to generate the best 50 models in the training phase, and the average performance of the 50 models is then reported for the testing phase (see footnote 2). We use annualized return (Huang, 2012; Lin et al., 2014), precision (Kecman, 2001; Lin et al., 2014), and computational cost to compare the performance of the methods.

Fig. 12. Model validation: training (black) and testing (white).

2 In this study we use each of the seven methods to generate a prediction (on day n) of whether the return of a stock on the next day will be positive (negative); our investment strategy then buys (sells) the stock at day n's closing price and sells (buys) it the next day.

The annualized return (Huang, 2012; Lin et al., 2014) may be derived as follows. Let CP(T_i) be the closing price of a stock at time point T_i. At T_{i−1}, if the prediction model predicts the class label at T_i as "Rise", T_i is collected into the set SetBuyTime. The return of the model at T_i is denoted as R(T_i) and defined as:

$$R(T_i) = 1 + \frac{CP(T_i) - CP(T_{i-1})}{CP(T_{i-1})}.$$

Let SetBuyTime = {T'_1, T'_2, …, T'_m}. Then the cumulative total return for a period of m days, denoted as CTR, is defined as:

$$CTR = \prod_{j=k}^{m+k-1} R(T'_j).$$

The annualized return of an investment model over n years is defined as:

$$AR = \sqrt[n]{CTR} - 1.$$

Table 4 Comparison of annualized returns for different pc % values.

| pc % | B&H | SVM | NN | SISTEM | UP-Span | IV-UBER (λ = 0.5) | IV-UBER (λ = 0) |
|---|---|---|---|---|---|---|---|
| 10% | 10.7% | 3.5% | 8.9% | 8.6% | 7.6% | 14.7% | 12.8% |
| 20% | 8.6% | 2.3% | 12.2% | 19.5% | 11.6% | 21.6% | 21.6% |
| 30% | 8.6% | 15.9% | 9.5% | 20.3% | 12.6% | 18.1% | 18.5% |
| 40% | 8.0% | 20.3% | 11.3% | 20.9% | 4.0% | 22.2% | 21.1% |
| 50% | 0.3% | 8.1% | 4.9% | 11.9% | 9.3% | 12.8% | 5.2% |
| 60% | 3.1% | 0.1% | 0.3% | 8.2% | 2.3% | 12.8% | 8.1% |
| 70% | 1.7% | 0.2% | 0.1% | 1.2% | 1.8% | 11.5% | 10.4% |
| 80% | 6.0% | 0.9% | 3.1% | 3.7% | 0.1% | 11.5% | 9.1% |
| 90% | 1.3% | 10.1% | 13.0% | 5.2% | 0.7% | 9.7% | 1.8% |
| Avg. | 4.6% | 4.4% | 7.0% | 10.8% | 5.4% | 15.0% | 12.0% |

Table 5 Comparison of precision for different pc % values.

| pc % | SVM | NN | SISTEM | UP-Span | IV-UBER (λ = 0.5) | IV-UBER (λ = 0) |
|---|---|---|---|---|---|---|
| 10% | 44.6% | 34.5% | 60.7% | 39.7% | 100.0% | 91.1% |
| 20% | 53.7% | 45.5% | 64.6% | 49.6% | 100.0% | 91.7% |
| 30% | 33.7% | 35.4% | 100.0% | 43.7% | 93.6% | 83.2% |
| 40% | 43.8% | 33.6% | 58.9% | 45.8% | 84.4% | 97.6% |
| 50% | 38.4% | 53.7% | 54.0% | 40.2% | 97.7% | 97.1% |
| 60% | 51.1% | 46.2% | 91.2% | 39.1% | 88.1% | 84.2% |
| 70% | 35.7% | 35.4% | 68.5% | 40.3% | 93.9% | 89.0% |
| 80% | 49.4% | 43.1% | 93.8% | 52.3% | 85.9% | 95.4% |
| 90% | 46.0% | 43.2% | 60.2% | 41.7% | 100.0% | 84.8% |
| Avg. | 51.9% | 48.4% | 64.9% | 50.5% | 96.4% | 90.7% |
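The definitions of R(T_i), CTR, and AR can be checked with a short Python sketch. The function names and the toy price series are hypothetical; the buy times are given as indices into the list of closing prices, and each index is assumed to have a preceding trading day.

```python
import math


def gross_returns(closing_prices, buy_indices):
    """R(T_i) = 1 + (CP(T_i) - CP(T_{i-1})) / CP(T_{i-1}) for each buy time."""
    returns = []
    for i in buy_indices:
        if i < 1:
            raise ValueError("a buy time needs a preceding trading day")
        previous = closing_prices[i - 1]
        returns.append(1.0 + (closing_prices[i] - previous) / previous)
    return returns


def cumulative_total_return(returns):
    """CTR is the product of the returns over all buy times."""
    return math.prod(returns)


def annualized_return(ctr, years):
    """AR is the n-th root of CTR, minus one."""
    return ctr ** (1.0 / years) - 1.0


# Toy series (made-up prices); the model predicted "Rise" for indices 1 and 3.
prices = [100.0, 110.0, 99.0, 108.9]
ctr = cumulative_total_return(gross_returns(prices, [1, 3]))
ar = annualized_return(ctr, years=2)
```

In this toy example both selected days gain 10%, so CTR is about 1.21, and spreading that over two years gives an annualized return of about 10%.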

Table 4 compares the annualized returns of the seven methods for different pc values. In terms of annualized return, IV-UBER (λ = 0.5) outperforms the other methods in seven of the nine CVs, and it also achieves the highest average annualized return. Next, we compare the precision of the methods, which is defined as:

$$\mathrm{Precision} = \frac{\#TP}{\#TP + \#FP},$$

where #TP and #FP are the numbers of true positives and false positives, respectively. Here, a true positive (TP) occurs if the annualized return (AR) of a model is larger than that of the B&H model in both the training and testing phases. A false positive (FP) occurs if the AR of the model is larger than that of the B&H model in training but less than that of the B&H model in testing.

Table 5 lists the precision of the six methods. The precision of IV-UBER (λ = 0.5) is higher than 90% in six of the nine CVs. This method also offers the highest precision among all the methods in six of the nine CVs and has the best average precision. Therefore, IV-UBER outperforms the state-of-the-art methods compared here in terms of annualized return and precision. In particular, IV-UBER (λ = 0.5), which considers both high utility episode rules and frequent episode rules, outperforms SISTEM and IV-UBER (λ = 0), which consider only frequent episode rules and only high utility episode rules, respectively. Rules selected by their significance (in terms of both profitability and accuracy) thus outperform those selected by either profitability or accuracy alone.

In Table 6 we further compare the computational cost (in seconds) of the different models in the training and testing phases. Since the episode-based algorithms (i.e., SISTEM, UP-Span, and IV-UBER) have to extract features from the dataset and train the prediction models using these features, one may expect them to incur more computational overhead than the SVM and NN methods, as can be seen in Table 6.
However, we have shown that our episode mining methods typically outperform the SVM and NN methods in investment performance; here we thus aim to compare the computational cost of our IV-UBER with that of the other two state-of-the-art methods, i.e., UP-Span from Wu et al. (2013) and SISTEM from Lin et al. (2014) (see footnote 3). For all the different values of pc %, Table 6 clearly shows that our proposed IV-UBER runs faster than UP-Span and SISTEM in both the training and testing phases.

3 In addition, although SVM and NN can be used to solve the investment problem studied in this work, they typically generate less interpretable results. In other words, the rationales behind the predictions of these two methods are difficult for users to comprehend, and thus it is less interesting for us to continue exploring them here.

Table 6 Computational cost in seconds for different models in the training (testing) phase.

| pc % | SVMs | NNs | SISTEM | UP-Span | IV-UBER |
|---|---|---|---|---|---|
| 10% | 4.5 | 21.7 | 448.1 (1.6) | 101.9 (0.9) | 98.6 (0.5) |
| 20% | 9.3 | 40.1 | 655.8 (1.3) | 142.6 (0.9) | 118.3 (0.7) |
| 30% | 29.1 | 73.2 | 1104.5 (2.2) | 178.7 (2.7) | 146.7 (1.8) |
| 40% | 61.9 | 104.5 | 2286.7 (6.1) | 242.4 (5.9) | 209.7 (3.1) |
| 50% | 117.5 | 142.6 | 3356.0 (7.7) | 258.7 (8.1) | 227.8 (3.2) |
| 60% | 200.9 | 205.3 | 5146.0 (17.1) | 326.4 (16.4) | 272.2 (4.5) |
| 70% | 313.9 | 166.1 | 7365.2 (20.1) | 363.1 (22.1) | 344.1 (6.1) |
| 80% | 426.7 | 191.2 | 9396.5 (26.3) | 419.3 (26.1) | 387.1 (7.9) |
| 90% | 601.5 | 368.8 | 16843.1 (37.9) | 857.6 (40.3) | 740.7 (11.0) |

Note: The computational costs of the SVM and NN models are less than 0.01 s in the testing phase.

Finally, we provide an intuitive idea of how the investment models may work in the real world. Supposing that the present value is 10 K USD, Table 7 lists the investment results after n years using the different models. Given a present value, the future value can be calculated by the following equation (Bank of America, 1784):

$$\text{Future Value} = \text{Present Value} \times (1 + \text{annualized return})^{n}.$$

Table 7 Investment results after n years using different models (assuming present value = 10 K USD) (unit: K USD).

| n years | B&H | SVM | NN | SISTEM | UP-Span | IV-UBER (λ = 0.5) | IV-UBER (λ = 0) | PMMS |
|---|---|---|---|---|---|---|---|---|
| 3 | 11.4 | 11.4 | 12.3 | 13.6 | 11.7 | 15.2 | 14.1 | 10.4 |
| 5 | 12.5 | 12.4 | 14.1 | 16.7 | 13.0 | 20.1 | 17.7 | 10.7 |
| 10 | 15.7 | 15.3 | 19.8 | 27.8 | 16.9 | 40.4 | 31.2 | 11.5 |
| 30 | 38.7 | 36.1 | 77.1 | 215.7 | 48.4 | 659.9 | 303.3 | 15.4 |
| 50 | 95.4 | 85.0 | 300.9 | 1671.3 | 138.3 | 10775.0 | 2949.3 | 20.5 |


Assume the present value is 10 K USD. If IV-UBER (λ = 0.5) is used for investment, the future value after 50 years would be 10,775 K USD, whereas the future value after 50 years for SISTEM is 1,681.3 K USD. As indicated in Table 7, the future value of our proposed method is significantly larger than those of SISTEM, UP-Span, and PMMS (Personal Money Market Savings in Bank of America).
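The compounding behind Table 7 can be reproduced with a one-line function. The sketch below is only illustrative: the function name is hypothetical, and the rate of 15.0% is the rounded average annualized return reported for IV-UBER (λ = 0.5) in Table 4, so results for long horizons are sensitive to that rounding and need not match the table to the last digit.

```python
def future_value(present_value, annualized_return, years):
    """Future Value = Present Value * (1 + annualized return) ** n."""
    return present_value * (1.0 + annualized_return) ** years


# 10 K USD compounded at the rounded 15.0% average rate for three years.
fv_3_years = future_value(10.0, 0.15, 3)
```

For three years this gives roughly 15.2 K USD, in line with the first row of Table 7; a zero rate leaves the present value unchanged.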


5. Conclusions and future research directions


In this paper, we present a novel methodology for mining high utility episode rules in complex event sequences to facilitate decision making for expert and intelligent systems. Although previous methods may be employed to indirectly construct utility-based episode rules, they are not promising because they typically lack efficiency and effectiveness for real-world applications. In particular, we observed that the ineffectiveness of the previous methods arises because they may produce high utility episode rules whose consequents are of low utility and thus not useful if one's goal is to produce high utility consequents for decision making. In addition, their inefficiency arises from the two-stage process of first generating high utility episodes and then using these episodes to generate episode rules.

Therefore, in this study we proposed a novel methodology named UBER-Mine, which is based on our derived ERADC property that provides a theoretical basis for efficiently discovering utility-based episode rules. We also proposed a special tree-based structure called UR-Tree to further facilitate the mining task. Theoretically, the UR-Tree maintains important event information in the complex event sequences so that UBER-Mine with UR-Tree can efficiently find utility-based episode rules by traversing related branches of the UR-Tree without producing candidate episodes. However, the UR-Tree is maintained in memory, and so the demand on memory is increased because the UR-Tree does not compress simultaneous event sets well. The experimental results are consistent with this analysis: the results on both real and synthetic datasets show that UBER-Mine with UR-Tree scales well on large datasets and runs over 100 times faster than the basic UBER-Mine and the current best algorithm, whereas its memory consumption increases by about five to fifteen percent compared with UBER-Mine.

To further demonstrate the practical advantage of our method, we extended it to a real-world stock investment problem by proposing a prediction model called IV-UBER.
IV-UBER provides a weight to control the trade-off between profitability and accuracy; it ranks the rules by their significance (in terms of profitability and accuracy) and uses the top-ranked rules for prediction. Theoretically, if a rule is highly profitable but less accurate, it may not be useful to users. The experimental results are consistent with this assertion: rules selected by their significance outperform those selected by either profitability or accuracy alone. In addition, the experimental results verified that the high utility episode rules discovered by our proposed IV-UBER model are effective and that our method significantly outperforms several state-of-the-art episode mining and machine learning algorithms in terms of both precision and annualized return for investment. Overall, this work contributes to research on and applications of mining high utility episode rules and to the construction of intelligent systems.

However, our proposed methodology has some limitations. First, this study focuses only on static complex event sequences. If the events are dynamic and arrive continuously, the utility-based episode rules have to be re-mined from the new sequence in order to rebuild the prediction model. In addition, the consequent of a rule is composed of only one simultaneous event set; we have yet to consider the more complex case in which the consequent of a rule is composed of more than one simultaneous event set.

In the future, we will conduct further research to deal with these issues and take into account the following extensions. First, in our current method, episode rules and the corresponding prediction models are constructed using prespecified parameters. In future work we intend to conduct a thorough optimization study of our proposed method.
For example, a genetic algorithm or particle swarm optimization may be employed to simultaneously optimize the parameters and select subsets of models, in order to further improve the performance of our episode mining models. Another interesting research direction is to extend the idea of high utility episode rule mining to high utility itemset mining and high utility sequential pattern mining. These research and application areas are closely related to the mining of utility-based episode rules and are well worth exploring in the future.
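To make the ranking step summarized in this section concrete, the following is a minimal sketch of how rules might be scored by a weighted function and the top-ranked ones selected. It is an illustration only: the convex combination in `significance`, the `Rule` fields, the `weight` parameter, and the sample values are all assumptions introduced here, not the exact IV-UBER definition, and both measures are assumed to be normalized to [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical container for an episode rule and its two measures."""
    name: str
    profitability: float  # assumed normalized to [0, 1]
    accuracy: float       # assumed normalized to [0, 1]


def significance(rule: Rule, weight: float) -> float:
    """Assumed convex combination: `weight` on profitability, the rest on accuracy."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * rule.profitability + (1.0 - weight) * rule.accuracy


def top_rules(rules: list[Rule], weight: float, k: int) -> list[Rule]:
    """Return the k rules with the highest significance under the given weight."""
    ranked = sorted(rules, key=lambda r: significance(r, weight), reverse=True)
    return ranked[:k]


# Made-up example values, chosen only to show how the weight changes the ranking.
rules = [
    Rule("r1", profitability=0.9, accuracy=0.2),  # profitable but inaccurate
    Rule("r2", profitability=0.6, accuracy=0.7),  # balanced
    Rule("r3", profitability=0.3, accuracy=0.9),  # accurate but less profitable
]

balanced_top = [r.name for r in top_rules(rules, weight=0.5, k=2)]
profit_only_top = [r.name for r in top_rules(rules, weight=1.0, k=1)]
```

Under these assumed values, an equal weighting prefers the balanced rule `r2` over the highly profitable but inaccurate rule `r1`, which mirrors the argument that profitability alone may not yield useful rules.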


References


Ahmed, C. F., Tanbeer, S. K., & Jeong, B. (2010). A novel approach for mining high utility sequential patterns in sequence databases. ETRI Journal, 32(5), 676–686.
Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. The 20th int’l conf. on very large data bases (pp. 487–499). Santiago, Chile: Morgan Kaufmann Publishers Inc.
Ao, X., Luo, P., Li, C., Zhuang, F., & He, Q. (2015). Online Frequent Episode Mining. In: The int’l conf. on data engineering. Seoul, Korea.
Bank of America (1784). Bank account interest rates. Retrieved October 15, 2013, from .
Brock, W., Lakonishok, J., & LeBaron, B. (1992). Simple technical trading rules and the stochastic properties of stock returns. Journal of Finance, 47(5), 1731–1764.
Center for Ultra-scale Computing and Information Security (2006). NU-MineBench version 2.0 dataset. Retrieved July 06, 2014, from .
Chan, R., Yang, Q., & Shen, Y. (2003). Mining high-utility itemsets. Proc. of third IEEE int’l conf. on data mining (pp. 19–26).
Guo, G., Zhang, L., Liu, Q., Chen, E., Zhu, F., & Guan, C. (2014). High utility episode mining made practical and fast. 10th international conference of advanced data mining and applications (pp. 71–84).
Gwadera, R., Atallah, M. J., & Szpankowski, W. (2005). Reliable detection of episodes in event sequences. Knowledge and Information System, 7(4), 415–437.
Huang, C.-F. (2012). A hybrid stock selection model using genetic algorithms and support vector regression. Applied Soft Computing, 12(2), 807–818.
Huang, C.-F., Chang, B.-R., Cheng, D.-W., & Chang, C.-H. (2012). Feature selection and parameter optimization of a fuzzy-based stock selection model using genetic algorithms. International Journal of Fuzzy Systems, 14(1), 65–75.
Huang, K.-Y., & Chang, C.-H. (2008). Efficient mining of frequent episodes from complex sequences. Information Systems, 33(1), 96–114.
Kecman, V. (2001). Learning and soft computing: support vector machines, neural networks, and fuzzy logic models. London, England: The MIT Press.
Lan, G.-C., Hong, T.-P., Tseng, V. S., & Wang, S.-L. (2014). Applying the maximum utility measure in high utility sequential pattern mining. Expert Systems with Applications, 41(11), 5071–5081.
Lin, S., Qiao, J., & Wang, Y. (2014a). Frequent episode mining within the latest time windows over event streams. Applied Intelligence, 40(1), 13–28.
Lin, Y.-F., Huang, C.-F., & Tseng, V. S. (2014b). A novel episode mining methodology for stock investment. Journal of Information Science and Engineering, 30, 571–585.
Liu, J., Wang, K., & Fung, B. C. M. (2012). Direct discovery of high utility itemsets without candidate generation. Proc. of 12th IEEE int’l conf. on data mining (pp. 984–989).
Ma, X., Pang, H., & Tan, K. (2004). Finding constrained frequent episodes using minimal occurrences. The 8th IEEE int’l conf. on data mining (pp. 471–474). Shenzhen, China: IEEE.
Mannila, H., Toivonen, H., & Verkamo, A. I. (1997). Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1(3), 259–289.
Microsoft Developer Network (2000). FoodMart2000. Retrieved July 06, 2014, from .
Mochón, A., Quintana, D., & Sáez, Y. (2008). Soft computing techniques applied to finance. Applied Intelligence, 9(2), 111–115.
Ng, A., & Fu, A. W. (2003). Mining frequent episodes for relating financial events and stock trends. The 7th Pacific-Asia conf. on advances in knowledge discovery and data mining (pp. 27–39). Seoul, Korea: Springer-Verlag.
Sapankevych, N. I., & Sankar, R. (2009). Time series prediction using support vector machines: A survey. IEEE Computational Intelligence Magazine, 4(2), 24–38.
Sensoy, B. A. (2009). Performance evaluation and self-designated benchmark indexes in the mutual fund industry. Journal of Financial Economics, 92, 25–39.
Shie, B.-E., Yu, P. S., & Tseng, V. S. (2012). Efficient algorithms for mining maximal high utility itemsets from data streams with different models. Expert Systems with Applications, 39(17), 12947–12960.
Shukla, R., & Singh, S. (1997). A performance evaluation of global equity mutual funds: Evidence from 1988–95. Global Finance Journal, 8, 279–293.
Tatti, N., & Cule, B. (2012). Mining closed episodes with simultaneous events. The ACM SIGKDD int’l conf. on knowledge discovery and data mining (pp. 1172–1180).
Tseng, V. S., Wu, C.-W., Shie, B.-E., & Yu, P. S. (2010). Up-growth: An efficient algorithm for high utility itemset mining. The ACM SIGKDD int’l conf. on knowledge discovery and data mining (pp. 253–262). Washington, USA: ACM.


Please cite this article in press as: Lin, Y.-F., et al. Discovering utility-based episode rules in complex event sequences. Expert Systems with Applications (2015), http://dx.doi.org/10.1016/j.eswa.2015.02.022


Tseng, V. S., Wu, C.-W., Fournier-Viger, P., & Yu, P. (2014). Efficient algorithms for mining the concise and lossless representation of closed+ high utility itemsets. IEEE Transactions on Knowledge and Data Engineering.
Taiwan Stock Exchange Corporation (1962). Historical Trading Info/Data. Retrieved July 06, 2013, from .
Wu, C.-W., Lin, Y.-F., Yu, P. S., & Tseng, V. S. (2013). Mining high utility episodes in complex event sequences. The ACM SIGKDD int’l conf. on knowledge discovery and data mining (pp. 536–544). Chicago, USA: ACM.
Wu, C.-W., Shie, B.-E., Tseng, V. S., & Yu, P. S. (2012). Mining top-k high utility itemsets. The ACM SIGKDD int’l conf. on knowledge discovery and data mining (pp. 78–86). Beijing, China: ACM.
Wu, C.-W., Philippe, P., Yu, P. S., & Tseng, V. S. (2011). Efficient mining of a concise and lossless representation of high utility itemsets. The IEEE int’l conf. on data mining (pp. 824–833). Vancouver, Canada: IEEE.
Zimmermann, A. (2014). Understanding episode mining techniques: Benchmarking on diverse, realistic, artificial data. Intelligent Data Analysis, 18(5), 761–791.

