Journal Pre-proof

Intelligent optimization algorithms for the problem of mining numerical association rules

Elif Varol Altay, Bilal Alatas

PII: S0378-4371(19)31770-4
DOI: https://doi.org/10.1016/j.physa.2019.123142
Reference: PHYSA 123142
To appear in: Physica A
Received date: 24 May 2019
Revised date: 17 July 2019

Please cite this article as: E.V. Altay and B. Alatas, Intelligent optimization algorithms for the problem of mining numerical association rules, Physica A (2019), doi: https://doi.org/10.1016/j.physa.2019.123142.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier B.V.
*Highlights (for review)

HIGHLIGHTS
• It has been demonstrated that intelligent optimization algorithms are the most suitable methods for data mining problems in numerical data
• The efficiency comparison of seven evolutionary and fuzzy evolutionary intelligent algorithm based numerical association rule mining (NARM) approaches has been performed for the first time
• The obtained results have been compared with the classical Apriori algorithm to show the efficiencies of the intelligent algorithms on the NARM problem
• A comparative analysis of eight algorithms in terms of support, confidence, number of mined rules, number of covered records, and time metrics within eleven real-world datasets has been performed
Intelligent Optimization Algorithms for the Problem of Mining Numerical Association Rules

Elif Varol Altay*
Department of Software Engineering, Firat University, Elazig, Turkey
[email protected]
* Corresponding author

Bilal Alatas
Department of Software Engineering, Firat University, Elazig, Turkey
[email protected]
Abstract
There are many effective approaches that have been proposed for association rule mining (ARM) on binary or discrete-valued data. However, in many real-world applications the data usually consist of numerical values, and the standard algorithms cannot work or do not give promising results on these datasets. In numerical ARM (NARM), it is difficult to determine which attributes will be included in the rules to be discovered, and which of them will appear on the left side of a rule and which on the right. It is also difficult to automatically adjust the most relevant ranges for the numerical attributes. Directly discovering the rules, without generating the frequent itemsets used in the literature as the first step of ARM, accelerates the whole process and removes the need to determine the metrics required for this step. In classical ARM algorithms, generally one or two metrics are considered. However, in many real-world applications the mined rules need to be comprehensible, surprising, interesting, accurate, of high confidence, and so on. Adjusting all of these processes without the need for the metrics to be pre-determined for each dataset is another problem. For these purposes, evolutionary intelligent optimization algorithms seem to be a potential solution method for this complex problem. In this paper, the performance analysis of seven evolutionary and fuzzy evolutionary algorithms, namely Alatasetal, Alcalaetal, EARMGA, GAR, GENAR, Genetic Fuzzy Apriori, and Genetic Fuzzy AprioriDC, for the NARM problem has been performed within eleven real datasets for the first time. The obtained results have also been compared with the classical Apriori algorithm to show the efficiencies of the intelligent algorithms on the NARM problem. The performances of the eight algorithms in terms of support, confidence, number of mined rules, number of covered records, and time metrics have been comparatively analyzed within eleven real-world datasets. One of the best mined rules obtained by each algorithm has been given and analyzed with respect to the confidence, support, and lift metrics.
Keywords: Numerical Association Rules Mining, Evolutionary Algorithms, Fuzzy Evolutionary Algorithms

1. Introduction
Generally, the success of ARM methods within datasets containing different types of data, and especially numerical or quantitative data, may be low. This is because, for ARM in numerical data, the attributes are mostly discretized so that the standard ARM algorithms can process them. However, such an approach is not in line with the automatic rule discovery property of data mining. In fact, the dataset is changed by determining the boundaries of the discrete data beforehand; thus, the rules are found for the modified data. Efficient discretization according to pre-defined intervals is a difficult problem, and it is unreasonable to alter data that is likely to accommodate accurate and interesting rules outside the respective ranges. Because of attribute interaction, important rules (perhaps very accurate and interesting ones) cannot be found outside these boundaries of the related attributes. Discretization may also lead to information loss. This means that these intervals should be found automatically during the data mining process, without a preprocessing step such as discretization. Integrating the discretization of continuous attributes, the reduction of attributes, and the mining of numerical rules into a single step is very meaningful in terms of accuracy and speed.

Most of the studies on ARM in the literature have been proposed to ensure that the rules are specific. Most of the time, the rules are expected to be accurate according to different metrics measured on the records in the dataset. However, accuracy or confidence alone may not be a sufficient criterion for high-quality rules to emerge. Most of the time, the discovered rules are expected to have many properties such as understandability, accuracy, reliability, interestingness, and surprisingness. In addition, the requirement of identifying some metrics in advance for each dataset hosting the rules to be discovered by classical ARM algorithms can be seen as a deficiency that prevents the automation of data mining applications. Generally, the algorithms used for ARM in the literature work in two stages: in the first stage, frequent itemsets are found, and in the second stage, rules are drawn from these frequent itemsets. It would be very meaningful and efficient to transform these two stages into a single stage and directly mine only a few high-quality comprehensible rules.

In the NARM problem, it is difficult to determine which attributes will be included in the rules to be discovered, and which of them will appear on the left side of a rule and which on the right. It is also difficult to automatically adjust the most relevant ranges for the numerical attributes. Directly discovering the rules, without generating the frequent itemsets used in the literature as the first step of ARM, accelerates the whole process and removes the need to determine the metrics required for this step. In classical ARM algorithms, generally one or two metrics are considered. However, in many real-world applications the mined rules need to be comprehensible, surprising, interesting, accurate, of high confidence, and so on. Adjusting all of these processes without the need for the metrics to be pre-determined for each dataset is another problem.

Intelligent optimization algorithms have been used efficiently due to their many advantages. These algorithms do not require information about the search space; they are population-based and search for the optimal solutions in parallel. They do not depend on the type and number of decision variables, the type of search space, or the type and number of constraints, and they do not need well-defined mathematical models. Due to their simplicity of execution and appropriate performance, intelligent optimization algorithms have been extensively applied to different complex real-world problems. Briefly, due to their derivative-free structure, flexibility, ease of use, and local optimum avoidance, these methods have become outstandingly widespread in recent years. The problem of mining association rules within datasets that contain numerical or quantitative attributes is one of the application areas of these intelligent search and optimization algorithms. The datasets can be considered as the search space, and these intelligent optimization algorithms may be adjusted to act as global search methods in order to find rules with many characteristics satisfying the needed objectives.
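To make the rule-quality metrics used throughout this paper concrete, the following minimal sketch (ours, for illustration only; the attribute names and records are hypothetical) shows how the support, confidence, and lift of a single numerical interval rule can be evaluated directly on raw records, without any prior discretization:

```python
# Minimal sketch: evaluating a numerical interval rule X => Y on raw records.
# Attribute names and data are illustrative assumptions.

def matches(record, conditions):
    """True if every conditioned attribute falls inside its interval."""
    return all(lo <= record[attr] <= hi for attr, (lo, hi) in conditions.items())

def rule_metrics(records, antecedent, consequent):
    """Support, confidence, and lift of the rule 'antecedent => consequent'."""
    n = len(records)
    n_ant = sum(matches(r, antecedent) for r in records)
    n_con = sum(matches(r, consequent) for r in records)
    n_both = sum(matches(r, antecedent) and matches(r, consequent) for r in records)
    support = n_both / n
    confidence = n_both / n_ant if n_ant else 0.0
    lift = confidence / (n_con / n) if n_con else 0.0
    return support, confidence, lift

# Illustrative usage on a toy numerical dataset
records = [{"height": 190.0, "assists": 0.21}, {"height": 178.0, "assists": 0.35},
           {"height": 201.0, "assists": 0.18}, {"height": 185.0, "assists": 0.22}]
sup, conf, lift = rule_metrics(records,
                               antecedent={"assists": (0.15, 0.25)},
                               consequent={"height": (180.0, 205.0)})
print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 indicates that the antecedent and consequent co-occur more often than would be expected by chance, which is why lift is reported alongside support and confidence for the sample rules later in this paper.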
The main contributions of this paper are:
• To demonstrate that intelligent optimization algorithms are the most suitable methods for data mining problems in numerical data
• To perform the efficiency comparison of seven evolutionary and fuzzy evolutionary intelligent algorithm based NARM approaches (Alatasetal, Alcalaetal, EARMGA, GAR, GENAR, Genetic Fuzzy Apriori, and Genetic Fuzzy AprioriDC) for the first time
• To compare the obtained results with the classical Apriori algorithm to show the efficiencies of the intelligent algorithms on the NARM problem
• To perform a comparative analysis of eight algorithms in terms of support, confidence, number of mined rules, number of covered records, and time metrics within eleven real-world datasets.
This paper is organized as follows. The second section describes and analyzes the related works on the NARM problem. In Section 3, the experimental setup and results are presented. The last section concludes the paper along with directions for further research.

2. Related Works
There are three main approaches to discovering numerical association rules, as shown in Fig. 1. In the first family of approaches, the attribute domain is first partitioned into smaller intervals, and adjacent intervals are then combined into larger ones so that the combined intervals have enough support. In this way, the NARM problem is transformed into a Boolean rule mining one. Partitioning of variables often leads to information loss. Furthermore, in these approaches an attribute is discretized without taking the other attributes into account, and this causes the attribute interaction problem [1]. Lian et al. have proposed the DRMiner algorithm, which uses density to calculate the eigenvalues of quantitative attributes and adopts an effective process to locate dense regions. However, the need for many thresholds, different values of which lead to greatly different results, may be seen as a disadvantage of this approach [2]. Various researchers have afterwards used clustering techniques. Lent et al. have proposed a geometric-based BitOP algorithm in order to cluster the numerical attributes [3]. They have shown that clustering may be a potential solution for determining meaningful regions and for the mining of association rules. DBSMiner [4] aims to scale up well for high dimensional numerical association rule mining using the notion of density-connectedness. MQAR (Mining Quantitative Association Rules) is another clustering based approach using a dense grid; it uses a DGFP-tree to cluster dense subspaces [5].
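As an illustration of the partition-and-combine idea, the following sketch (a generic illustration, not the exact procedure of any cited algorithm) splits one attribute's domain into equal-width base intervals and greedily merges adjacent intervals until each merged interval reaches a minimum support:

```python
# Sketch of partition-and-combine for a single numerical attribute:
# split the domain into equal-width base intervals, then merge adjacent
# intervals until each merged interval reaches a minimum support.
# Parameters and data are illustrative assumptions.

def partition_and_combine(values, n_parts=8, min_support=0.2):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_parts or 1.0
    # Count how many values fall into each base interval.
    counts = [0] * n_parts
    for v in values:
        idx = min(int((v - lo) / width), n_parts - 1)
        counts[idx] += 1
    # Greedily merge adjacent intervals until the support is sufficient.
    merged, start, acc = [], 0, 0
    for i, c in enumerate(counts):
        acc += c
        if acc / len(values) >= min_support or i == n_parts - 1:
            merged.append(((lo + start * width, lo + (i + 1) * width),
                           acc / len(values)))
            start, acc = i + 1, 0
    return merged  # list of ((interval_lo, interval_hi), support)

ages = [22, 23, 24, 25, 25, 26, 28, 29, 31, 33, 36, 37]
for (a, b), sup in partition_and_combine(ages):
    print(f"[{a:.2f}, {b:.2f}] support={sup:.2f}")
```

Because the interval boundaries are fixed before mining and are chosen per attribute in isolation, such schemes exhibit exactly the information loss and attribute interaction problems discussed above.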
Fuzzy sets have also been used for NARM [6, 7]. Prakash and Parvathi have converted the numerical attributes into fuzzy binary attributes utilizing a thread-based mechanism [7]. In [8], hard clustering and fuzzy clustering have been used for the quantization. However, the clustering results are not desirable when the scales of the clusters differ a lot. Determining the most suitable types of fuzzy sets for each attribute, determining the suitable characteristics of the membership functions, and so on are open problems in fuzzy set based approaches. These approaches also rely on preprocessing and mine the rules within the changed data.
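The following sketch illustrates how fuzzy set based approaches typically score a rule: each numerical value receives a membership degree in a linguistic term, and the fuzzy support averages the rule strength over the records (with the minimum used as the t-norm). The triangular membership functions, term boundaries, and data below are illustrative assumptions, not those of any cited method:

```python
# Sketch of fuzzy rule scoring, e.g. for "assists is low => height is middle".
# Terms, boundaries, and data are illustrative assumptions.

def triangular(a, b, c):
    """Triangular membership function with peak at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

# Linguistic terms for two illustrative attributes.
assists_low = triangular(0.0, 0.1, 0.3)
height_middle = triangular(175.0, 188.0, 200.0)

def fuzzy_support(records, mu_antecedent, mu_consequent):
    # Minimum acts as the t-norm combining antecedent and consequent degrees.
    degrees = [min(mu_antecedent(x), mu_consequent(y)) for x, y in records]
    return sum(degrees) / len(degrees)

records = [(0.12, 186.0), (0.28, 179.0), (0.05, 192.0), (0.33, 205.0)]
print(f"fuzzy support = {fuzzy_support(records, assists_low, height_middle):.2f}")
```

This is also why the fuzzy algorithms compared later report no lift values: their rules are defined over linguistic terms rather than crisp intervals.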
Fig. 1 Methods for the problem of NARM. The figure depicts a taxonomy of NARM methods: Discretization (Partitioning and Combining, Clustering, Fuzzifying); Distribution (Mean based, Median based, Variance based); and Optimization (Swarm Intelligence: Particle Swarm Optimization, Wolf Search Algorithm; Biology based: Genetic Algorithm, Differential Evolution Algorithm, Artificial Immune System; Physics based: Gravitational Search Algorithm).
Throughout the process proposed in many of these approaches, the initial discretization and the follow-up rule mining steps are performed independently. The initial discretization may cause loss of the original information and introduce mistakes into the rule mining process. The work proposed by Aumann and Lindell has introduced a new definition of numerical association rules based on statistical inference theory; they have used distribution measures such as mean, median, and variance [9]. Swarm, biology, and physics based intelligent optimization algorithms have also been proposed in order to efficiently mine numerical association rules. The study proposed in [10] uses chaos numbers integrated particle swarm optimization to automatically mine association rules within datasets that have numerical attributes. The authors of [11] have proposed a rough particle swarm optimization algorithm that simultaneously searches for the intervals of the numerical attributes and mines the rules to which these intervals conform. Parallel particle-oriented and data-oriented PSO algorithms have also been proposed for NARM on huge datasets [12]; in that work, the computational cost has been reduced by parallelizing the PSO in order to obtain more scalable and efficient results. Wolf search, which is based on the hunting behavior of wolves, is another swarm intelligence based optimization algorithm that has been used for NARM [13]. However, there is no experiment or simulation in that paper, and the methodology has only been explained theoretically. The genetic algorithm (GA) [14, 15] has been used by different researchers for numerical rule mining. One of the first works in this research area has been proposed in [16]. The GAR method proposed in that study finds only frequent itemsets by maximizing the support metric with a GA, and the rules are then extracted from these sets; the confidence values of the rules are not optimized. The same authors have proposed GENAR, which uses an evolutionary algorithm for mining the numerical rules without generating frequent itemsets [17]. Álvarez and Vázquez have worked on large and small datasets containing discrete and numerical data to discover numerical association rules [18]. QUANTMINER is another GA based NARM system [19, 20]; it dynamically discovers good and small intervals in association rules and maximizes both the confidence and the support values. GA has also been efficiently proposed for discovering association rules from numerical data for smart cities [21]; however, no simulation or experimental results about smart city data are given. A real-coded GA (RCGA) has been proposed to mine numerical association rules from atmospheric climatology data [22]; association rules between climatological attributes and pollutant agents have been mined for forecasting purposes. Yan et al. have proposed EARMGA, which uses relative confidence as the fitness function for directly discovering numerical association rules [23].
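The following deliberately simplified sketch illustrates the common chromosome design behind such GA based NARM methods: one individual encodes a complete rule as an interval plus a role (antecedent, consequent, or unused) per attribute, and the fitness rewards support and confidence while penalizing overly wide intervals. The data, weights, and operators below are illustrative assumptions and do not reproduce the exact settings of GAR, GENAR, or EARMGA:

```python
# Illustrative sketch of an evolutionary NARM loop; all settings are assumptions.
import random

ATTRS = ["a1", "a2", "a3"]                      # hypothetical attribute names
DATA = [{a: random.uniform(0, 10) for a in ATTRS} for _ in range(200)]
BOUNDS = {a: (0.0, 10.0) for a in ATTRS}

def random_rule():
    rule = {}
    for a in ATTRS:
        lo = random.uniform(*BOUNDS[a]); hi = random.uniform(lo, BOUNDS[a][1])
        rule[a] = (random.choice(["ant", "con", "off"]), lo, hi)
    return rule

def covers(rec, rule, role):
    return all(lo <= rec[a] <= hi for a, (r, lo, hi) in rule.items() if r == role)

def fitness(rule):
    roles = [r for r, _, _ in rule.values()]
    if "ant" not in roles or "con" not in roles:
        return -1.0                              # not a well-formed rule
    n_ant = sum(covers(rec, rule, "ant") for rec in DATA)
    n_both = sum(covers(rec, rule, "ant") and covers(rec, rule, "con") for rec in DATA)
    sup = n_both / len(DATA)
    conf = n_both / n_ant if n_ant else 0.0
    width = sum((hi - lo) / 10.0 for _, lo, hi in rule.values()) / len(ATTRS)
    return 5 * sup + 20 * conf - 0.5 * width     # illustrative weights

def mutate(rule):
    child = dict(rule)
    a = random.choice(ATTRS)
    r, lo, hi = child[a]
    lo = max(BOUNDS[a][0], lo + random.uniform(-1, 1))
    hi = min(BOUNDS[a][1], max(lo, hi + random.uniform(-1, 1)))
    child[a] = (random.choice([r, "ant", "con", "off"]), lo, hi)
    return child

# Simple elitist loop: keep the best half, mutate to refill the population.
pop = [random_rule() for _ in range(50)]
for _ in range(200):
    pop.sort(key=fitness, reverse=True)
    pop = pop[:25] + [mutate(random.choice(pop[:25])) for _ in range(25)]
print("best fitness:", round(fitness(pop[0]), 3))
```

The key point shared by the cited methods is that interval boundaries, attribute selection, and antecedent/consequent roles all evolve together in a single search, instead of being fixed by a separate discretization step.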
The efficiency of GAR, GENAR, and EARMGA over the Apriori and Eclat algorithms has been presented on two public datasets in [24]. Differential evolution [25, 26] has been used for mining association rules in datasets consisting of categorical and numerical attributes [27]. Not only positive but also negative numerical association rules have been mined with the approach proposed by Alatas and Akin [28]. An immune mechanism has been introduced into the classical GA, and a new model of immune GA has been proposed by Yang for the discovery of association rules within datasets that consist of both discrete and continuous attributes [29]. However, using binary encoding and considering only the support metric in the fitness function are the disadvantages of this approach. A physics-based intelligent optimization algorithm, namely the gravitational search algorithm, has also been efficiently modeled for the automatic mining of numerical association rules by eliminating the requirements for pre-processing of the data and the attribute interaction problem [30].

3. Experimental Results
The eleven real-world datasets obtained from [31] have been selected for the performance analysis of the intelligent optimization algorithms. Information about the used datasets, all of whose attributes are numeric, is listed in Table 1. All of the experiments have been performed using an Intel® Core™ i7-3630QM 2.40 GHz CPU with 8 GB of memory, running Windows 10.

Table 1 Datasets
Dataset | Records | Attributes
Basketball | 96 | 5
Bodyfat | 252 | 18
Bolts | 40 | 8
Longley | 16 | 7
Pollution | 60 | 16
Pwlinear | 200 | 11
Quake | 2178 | 4
Sleep | 57 | 8
Stock price | 950 | 10
Triazines | 186 | 61
Vineyard | 52 | 4
The parameters of the seven intelligent optimization algorithms and the Apriori algorithm are given in Table 2. The parameter values listed in Table 2 are the default values given in the corresponding articles. However, in order to evaluate the algorithms under equal conditions, the number of evaluations has been selected as 10000 and the population size has been chosen as 50 in all algorithms. All of the algorithms excluding Apriori have been executed 10 times, and the obtained values for the comparison metrics have been recorded. The average support values obtained from all of the intelligent optimization algorithms and the Apriori algorithm within the eleven datasets are shown in Table 3. According to this table, the algorithm proposed in [28] has performed best in five out of eleven datasets. However, this algorithm has not produced any rule within three datasets, namely Quake, Sleep, and Triazines, according to the determined parameters. GAR seems to be the second best algorithm in terms of obtained average support values. In two out of eleven datasets, GENAR has found the highest value for the average support metric, and it is the third best algorithm. Fig. 2 demonstrates the average support values obtained from the algorithms within the eleven real-world numerical datasets. The Apriori based algorithms show worse performance than the other algorithms in terms of support. This is due to the partitioning of variables, which leads to information loss. Furthermore, in Apriori based algorithms an attribute is discretized without taking the other attributes into account, and this causes the attribute interaction problem. That is why the intelligent optimization algorithms seem to outperform the Apriori based approaches. When evaluated in terms of the number of attributes, the algorithm proposed in [28] has the best average support performance in the databases with few attributes (Basketball, Vineyard, Bolts), while the GAR algorithm has the best average support performance in the databases with a high number of attributes (Pollution, Triazines). In the databases with a low number of records (Basketball, Bolts, Vineyard), the algorithm proposed in [28] has the best performance in terms of average support.

Table 2 Algorithm parameters
Algorithm | Parameters
[28] | Number of Evaluations: 10000, Initial Random Chromosomes: 6, r-dividing Points: 3, Tournament Size: 10, Probability of Crossover: 0.7, Min Probability of Mutation: 0.05, Max Probability of Mutation: 0.9, Importance of Rules Support: 5, Importance of Rules Confidence: 20, Importance of Number of Involved Attributes: 0.05, Importance of Intervals Amplitude: 0.02, Importance of Number of Records Already Covered: 0.01, Amplitude Factor: 2.0
[32] | Number of Evaluations: 10000, Number of Population: 50, Number of Bits per Gene: 30, Factor for Parent Centric BLX Crossover: 1.0, Number of Fuzzy Regions for Numeric Attributes: 3, Minimum Support: 0.1, Minimum Confidence: 0.8
Apriori [1] | Number of Partitions for Numeric Attributes: 4, Minimum Support: 0.1, Minimum Confidence: 0.8
EARMGA [23] | Number of Evaluations: 10000, Number of Population: 50, Fixed Length of Association Rules: 2, Probability of Selection: 0.75, Probability of Crossover: 0.7, Probability of Mutation: 0.1, Number of Partitions for Numeric Attributes: 4
GAR [16] | Number of Evaluations: 10000, Number of Population: 50, Number of Itemsets: 100, Probability of Selection: 0.25, Probability of Crossover: 0.7, Probability of Mutation: 0.1, Importance of Number of Records Already Covered: 0.4, Importance of Intervals Amplitude: 0.7, Importance of Number of Involved Attributes: 0.5, Amplitude Factor: 2.0, Minimum Support: 0.1, Minimum Confidence: 0.8
GENAR [17] | Number of Evaluations: 10000, Number of Population: 50, Number of Association Rules: 30, Probability of Selection: 0.25, Probability of Mutation: 0.1, Penalization Factor: 0.7, Amplitude Factor: 2.0
Genetic Fuzzy Apriori [33] | Number of Evaluations: 10000, Number of Population: 50, Probability of Mutation: 0.01, Probability of Crossover: 0.8, Parameter d for MMA Crossover: 0.35, Number of Fuzzy Regions for Numeric Attributes: 3, Minimum Support: 0.1, Minimum Confidence: 0.8
Genetic Fuzzy AprioriDC [34] | Number of Evaluations: 10000, Number of Population: 50, Probability of Mutation: 0.01, Probability of Crossover: 0.8, Parameter d for MMA Crossover: 0.35, Number of Fuzzy Regions for Numeric Attributes: 3, Minimum Support: 0.1, Minimum Confidence: 0.8
Table 3 Average support
Dataset | [28] | [32] | Apriori [1] | EARMGA [23] | GAR [16] | GENAR [17] | Genetic Fuzzy Apriori [33] | Genetic Fuzzy AprioriDC [34]
Basketball | 0.98 | 0.23 | 0.15 | 0.27 | 0.75 | 0.27 | 0.16 | 0.14
Bodyfat | 0.93 | - | 0.15 | 0.05 | 0.70 | 0.44 | 0.13 | -
Bolts | 0.75 | 0.14 | 0.15 | 0.41 | 0.21 | 0.13 | 0.15 | 0.14
Longley | 0.13 | 0.16 | 0.17 | 0.27 | 0.14 | 0.21 | 0.13 | 0.15
Pollution | 0.02 | - | 0.13 | 0.25 | 0.65 | 0.20 | 0.12 | -
Pwlinear | 0.96 | 0.13 | 0.13 | 0.47 | - | 0.01 | 0.13 | 0.13
Quake | - | 0.26 | 0.25 | 0.19 | - | 0.50 | 0.22 | 0.22
Sleep | - | 0.17 | 0.21 | - | - | - | 0.14 | 0.17
Stock price | 0.01 | 0.15 | 0.13 | 0.26 | 0.39 | 0.27 | 0.13 | 0.12
Triazines | - | - | - | 0.05 | 0.29 | 0.03 | - | -
Vineyard | 0.95 | 0.27 | 0.15 | 0.33 | 0.74 | 0.45 | 0.16 | 0.16
Fig. 2 Average support values of the rules obtained from the algorithms within the datasets

Table 4 shows the average confidence values obtained from the intelligent optimization algorithms and the Apriori algorithm within the eleven datasets. EARMGA seems to be the best algorithm in terms of obtained confidence values according to the determined minimum confidence value.
Table 4 Average confidence
Dataset | [28] | [32] | Apriori [1] | EARMGA [23] | GAR [16] | GENAR [17] | Genetic Fuzzy Apriori [33] | Genetic Fuzzy AprioriDC [34]
Basketball | 1.00 | 0.90 | 0.87 | 1.00 | 0.88 | 0.97 | 0.98 | 0.85
Bodyfat | 1.00 | - | 0.93 | 1.00 | 0.93 | 0.92 | 1.00 | -
Bolts | 1.00 | 0.95 | 0.99 | 1.00 | 0.97 | 0.96 | 1.00 | 0.97
Longley | 1.00 | 0.96 | 1.00 | 1.00 | 1.00 | 0.94 | 0.99 | 0.95
Pollution | 1.00 | - | 0.95 | 1.00 | 0.91 | 0.90 | 1.00 | -
Pwlinear | 1.00 | 0.83 | 0.88 | 1.00 | - | 0.81 | 0.89 | 0.86
Quake | - | 0.89 | 0.91 | 1.00 | - | 0.92 | 0.90 | 0.89
Sleep | - | 0.95 | 0.97 | - | - | - | 0.95 | 0.89
Stock price | 1.00 | 0.93 | 0.91 | 1.00 | 0.93 | 0.92 | 0.87 | 0.91
Triazines | - | - | - | 1.00 | 0.93 | 0.91 | - | -
Vineyard | 1.00 | 0.91 | 0.91 | 1.00 | 0.99 | 0.86 | 0.87 | 0.88
EARMGA has the best performance in ten out of eleven numerical datasets. The work proposed by Alatas and Akin [28] is the second best algorithm; it is more successful in eight out of eleven datasets. GENAR seems to be the third best algorithm, outperforming in five out of eleven datasets among the eight algorithms. Fig. 3 demonstrates the average confidence values of the rules obtained from the intelligent optimization algorithms and the Apriori algorithm within the eleven datasets. Similar to the results obtained in terms of support values, the Apriori based algorithms do not outperform the other algorithms in terms of confidence. Apriori based algorithms change the data that are likely to accommodate accurate and interesting rules outside the respective ranges. Because of attribute interaction, important accurate rules have not been found outside these boundaries of the related attributes. It seems that discretization has led to information loss, because these intervals have not been found automatically in the data mining process without a preprocessing step such as discretization. Integrating the automated discovery of ranges of continuous attributes, the reduction of attributes, and the mining of numerical rules into a single step is very meaningful in terms of accuracy.
The number of association rules generated by the algorithms is listed in Table 5. As seen in this table, GAR and the algorithm proposed in [28] have mined fewer rules than the other intelligent optimization and search algorithms with respect to the eleven datasets. A smaller number of discovered association rules may be more readable and comprehensible; a few high-quality rules are more meaningful than huge numbers of complex, unreadable, and low-quality rules. From this perspective, the Apriori and Genetic Fuzzy AprioriDC algorithms have found larger and less readable rule sets according to the selected parameters.
Fig. 3 Average confidence values of the rules obtained from the algorithms within the datasets

Table 5 Number of generated association rules
Dataset | [28] | [32] | Apriori [1] | EARMGA [23] | GAR [16] | GENAR [17] | Genetic Fuzzy Apriori [33] | Genetic Fuzzy AprioriDC [34]
Basketball | 7 | 706 | 4 | 50 | 2 | 30 | 5 | 63
Bodyfat | 47 | - | 24743 | 50 | 76 | 30 | 120418 | -
Bolts | 21 | 3241 | 1246 | 50 | 28 | 30 | 80 | 475
Longley | 7 | 25046 | 1406 | 50 | 21 | 16 | 232 | 1513
Pollution | 16 | - | 41510 | 50 | 31 | 30 | 19708 | -
Pwlinear | 34 | 39 | 21 | 50 | - | 30 | 1 | 9
Quake | - | 121 | 18 | 50 | - | 30 | 14 | 24
Sleep | - | 24415 | 1096 | - | - | - | 44 | 2242
Stock price | 1 | 983275 | 855 | 50 | 1 | 30 | 9 | 7850
Triazines | - | - | - | 50 | 18 | 30 | - | -
Vineyard | 5 | 283 | 11 | 50 | 2 | 30 | 21 | 55
Table 6 Number of covered records (%)
Dataset | [28] | [32] | Apriori [1] | EARMGA [23] | GAR [16] | GENAR [17] | Genetic Fuzzy Apriori [33] | Genetic Fuzzy AprioriDC [34]
Basketball | 100.00 | 100.00 | 33.34 | 100.00 | 96.88 | 89.59 | 69.79 | 100.00
Bodyfat | 99.21 | - | 100.00 | 73.02 | 100.00 | 81.75 | 100.00 | -
Bolts | 80.00 | 100.00 | 97.50 | 100.00 | 97.50 | 35.00 | 100.00 | 100.00
Longley | 12.50 | 100.00 | 100.00 | 100.00 | 68.75 | 100.00 | 100.00 | 100.00
Pollution | 1.67 | - | 100.00 | 100.00 | 100.00 | 46.67 | 100.00 | -
Pwlinear | 100.00 | 100.00 | 84.50 | 100.00 | - | 15.00 | 61.50 | 95.50
Quake | - | 100.00 | 90.55 | 100.00 | - | 81.92 | 94.26 | 97.25
Sleep | - | 100.00 | 96.78 | - | - | - | 96.77 | 100.00
Stock price | 0.11 | 100.00 | 99.48 | 100.00 | 44.00 | 85.69 | 46.95 | 100.00
Triazines | - | - | - | 84.41 | 98.08 | 100.00 | - | -
Vineyard | 50.00 | 100.00 | 96.78 | 100.00 | 88.47 | 27.96 | 94.24 | 100.00
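For clarity, the covered-records metric reported in Table 6 can be computed as in the following sketch (our illustration; the rule antecedents and data are hypothetical): a record counts as covered if the antecedent of at least one mined rule matches it.

```python
# Sketch of the covered-records metric: the percentage of records matched by
# the antecedent of at least one mined rule. Data and rules are illustrative.

def covered_records(records, antecedents):
    def hit(rec, ant):
        return all(lo <= rec[a] <= hi for a, (lo, hi) in ant.items())
    covered = sum(any(hit(rec, ant) for ant in antecedents) for rec in records)
    return 100.0 * covered / len(records)

records = [{"x": 1.0, "y": 5.0}, {"x": 3.0, "y": 2.0}, {"x": 9.0, "y": 7.0}]
rules = [{"x": (0.0, 4.0)}, {"y": (6.0, 6.5)}]
print(f"{covered_records(records, rules):.2f}% of records covered")
```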
The number of covered records according to the rules mined by the algorithms within the datasets is shown in Table 6. According to these results, EARMGA and the algorithm proposed by Alcalá-Fdez et al. [32] have shown better performance in terms of covered records; both have performed best in eight out of eleven datasets. By performing more sensitivity analysis on the algorithm parameters, more successful results may be obtained. Due to the stochastic characteristics of intelligent optimization algorithms, there is not a unique best algorithm for all the datasets considering all metrics. This is coherent with the no-free-lunch theorem, which states that there is no general-purpose universal optimization algorithm that outperforms all other methods. If one intelligent optimization algorithm is specialized to the structure of the specific problem under consideration, it can outperform the other algorithms. The obtained numbers of frequent itemsets within the datasets are listed in Table 7. EARMGA, GENAR, and the approach proposed by Alatas and Akin [28] directly mine the rules without generating frequent itemsets; hence they have been excluded from this table.

Table 7 Number of frequent itemsets
Apriori [1]
GAR [16]
Genetic Fuzzy Apriori [33]
Genetic Fuzzy AprioriDC [34]
66 11103 707 568 14730 417 39 508 615 -
100 100 100 100 100 100 100 100 100
64 32564 97 155 8076 53 31 86 41 -
177 347 639 259 49 1037 3215 -
Jo
Algorithm Dataset Basketball Bodyfat Bolts Longley Pollution Pwlinear Quake Sleep Stock price Triazines
412 1720 7196 530 132 7070 198206 -
Journal Pre-proof Vineyard
203
47
100
44
88
Table 8 shows the average time in seconds for the algorithms to mine numerical association rules within each dataset. GENAR seems to be the best algorithm within six out of eleven datasets. Apriori is the second best algorithm among the other algorithms. The intelligent optimization algorithms have outperformed the Apriori based approaches in the other metrics; however, the relatively better performance of Apriori with respect to the time metric is due to its deterministic characteristics.

Table 8 Time (s)
Dataset | [28] | [32] | Apriori [1] | EARMGA [23] | GAR [16] | GENAR [17] | Genetic Fuzzy Apriori [33] | Genetic Fuzzy AprioriDC [34]
Basketball | 1.447 | 0.829 | 1.440 | 0.161 | 0.736 | 0.040 | 0.418 | 0.724
Bodyfat | 0.835 | - | 0.056 | 0.380 | 1.115 | 0.280 | 8.254 | -
Bolts | 0.398 | 2.904 | 1.416 | 0.252 | 2.654 | 0.121 | 0.447 | 14.741
Longley | 0.224 | 1.116 | 0.199 | 0.167 | 8.866 | 0.081 | 0.300 | 0.560
Pollution | 0.408 | - | 0.151 | 0.263 | 1.577 | 0.175 | 2.308 | -
Pwlinear | 0.778 | 1.200 | 1.425 | 0.389 | - | 0.167 | 0.278 | 0.341
Quake | - | 1.325 | 0.083 | 1.554 | - | 0.684 | 0.549 | 2.213
Sleep | - | 1.805 | 0.122 | - | - | - | 3.510 | 2.247
Stock price | 0.966 | 12.372 | 0.199 | 0.017 | 5.069 | 0.016 | 0.383 | 8.841
Triazines | - | - | - | 894.274 | 54.352 | 0.306 | - | -
Vineyard | 0.017 | 0.016 | 0.404 | 0.493 | 1.120 | 0.255 | 0.409 | 0.679
Table 9-Table 19 demonstrate sample rules mined by the algorithms within the eleven datasets. The first column, Alg, gives the name of the algorithm, and Rule gives the mined association rule. Conf, Sup, and Lift show the confidence, support, and lift values of the rules, respectively. For the fuzzy rules, lift values have not been included in these tables. GENAR seems to have the worst comprehensibility according to the rules within the used datasets; the number of attributes in the numerical association rules obtained from GENAR is the highest among the algorithms.

Table 9 Sample rules mined from the algorithms within the Basketball dataset
Alg | Rule | Conf | Sup | Lift
[28] | points ∉ [0.6319, 0.829] ⇒ assists ∉ [0.2772, 0.3436] ᴧ age ∈ [22, 37] | 1.00 | 1.00 | 1.00
[32] | assists ∈ [low] ᴧ age ∈ [high] ⇒ height ∈ [middle] | 1.00 | 0.15 | -
[1] | assists ∈ [0.19655, 0.270125] ᴧ age ∈ [25.75, 29.5] ⇒ height ∈ [181.5, 192.25] | 0.93 | 0.14 | 4.06
[23] | points ∈ [0.1593, 0.4942] ⇒ time ∈ [10.08, 40.71] | 1.00 | 0.79 | 1.00
[16] | points ∈ [0.2938, 0.5777] ⇒ height ∈ [183, 198] | 0.89 | 0.75 | 1.02
[17] | assists ∈ [0.09132, 0.2384] ᴧ height ∈ [183, 203] ᴧ time_played ∈ [20.2425, 35.5575] ᴧ age ∈ [22, 28] ⇒ points ∈ [0.26705, 0.60195] | 1.00 | 0.28 | 1.11
[33] | assists ∈ [low] ⇒ height ∈ [high] | 0.81 | 0.27 | -
[34] | assists ∈ [middle] ᴧ height ∈ [middle] ᴧ time_played ∈ [high] ⇒ points ∈ [middle] | 0.97 | 0.11 | -

Table 10 Sample rules mined from the algorithms within the Bodyfat dataset
Alg | Rule | Conf | Sup | Lift
[28] | neck ∉ [43.9001, 51.2] ᴧ chest ∈ [83.4, 121.6] ᴧ knee ∉ [35.5001, 35.5999] ᴧ forearm ∈ [23.1, 33.8] ⇒ biceps ∉ [39.10, 45.0] ᴧ wrist ∈ [16.1, 20.9] ᴧ class ∉ [40.10, 47.4999] | 1.00 | 0.98 | 1.02
[1] | density ∈ [1.05195, 1.080425] ⇒ height ∈ [65.6875, 77.75] | 1.00 | 0.47 | 2.16
[23] | forearm ∈ [21.0, 27.95] ⇒ class ∈ [0.0, 47.5] | 1.00 | 0.37 | 1.00
[16] | thigh ∈ [52.1984, 66.5827] ⇒ weight ∈ [138.9796, 224.9768] | 0.99 | 0.83 | 1.11
[17] | density ∈ [1.033825, 1.090775] ᴧ age ∈ [32.25, 61.75] ᴧ weight ∈ [118.5, 212.6625] ᴧ height ∈ [54.6875, 77.75] ᴧ neck ∈ [31.475, 41.525] ᴧ chest ∈ [84.175, 112.625] ᴧ abdomen ∈ [69.4, 105.775] ᴧ hip ∈ [85.0, 110.875] ᴧ thigh ∈ [49.175, 69.225] ᴧ knee ∈ [33.675, 41.725] ᴧ ankle ∈ [19.1, 25.2] ᴧ biceps ∈ [27.3499, 37.4499] ᴧ forearm ∈ [24.9249, 31.875] ᴧ wrist ∈ [16.4, 19.2] ⇒ class ∈ [5.02499, 28.775] | 1.00 | 0.47 | 1.19
[33] | weight ∈ [low] ⇒ hip ∈ [low] | 0.91 | 0.56 | -

Table 11 Sample rules mined from the algorithms within the Bolts dataset
Alg | Rule | Conf | Sup | Lift
[28] | run ∉ [26.0, 27.0] ᴧ speed2 ∉ [1.5001, 2.4999] ᴧ sens ∉ [7.0, 9.0] ⇒ number2 ∈ [0.0, 2.0] | 1.00 | 0.75 | 1.00
[32] | t20bolt ∈ [middle] ⇒ time ∈ [low] | 0.99 | 0.50 | -
[1] | t20bolt ∈ [7.32, 27.825] ⇒ time ∈ [3.94, 36.4575] | 1.00 | 0.63 | 1.30
[23] | time ∈ [3.94, 68.975] ᴠ [101.49249, 134.01] ⇒ speed2 ∈ [1.5, 2.0] ᴠ [2.25, 2.5] | 1.00 | 0.98 | 1.00
[16] | time ∈ [5.1752, 40.3566] ⇒ sens ∈ [6.0, 10.0] | 1.00 | 0.78 | 1.12
[17] | run ∈ [1.0, 19.0] ᴧ speed1 ∈ [3.0, 5.0] ᴧ total ∈ [15.0, 25.0] ᴧ speed2 ∈ [1.75, 2.25] ᴧ number2 ∈ [1.0, 1.0] ᴧ sens ∈ [6.0, 10.0] ᴧ time ∈ [3.94, 48.0975] ⇒ t20bolt ∈ [7.32, 36.085] | 1.00 | 0.15 | 1.43
[33] | total ∈ [low] ⇒ time ∈ [low] | 1.00 | 0.26 | -
[34] | t20bolt ∈ [low] ⇒ time ∈ [low] | 0.98 | 0.48 | -

Table 12 Sample rules mined from the algorithms within the Longley dataset
Alg | Rule | Conf | Sup | Lift
[28] | year ∉ [1949.0, 1962.0] ⇒ unemployed ∉ [2357.0, 4806.0] | 1.00 | 0.13 | 3.21
[32] | gnp ∈ [low] ⇒ population ∈ [low] | 0.99 | 0.43 | -
[1] | gnp ∈ [234289.0, 314440.25] ⇒ deflator ∈ [83.0, 91.475] | 1.00 | 0.25 | 4.00
[23] | employed ∈ [1.5, 2.0] ᴠ [2.25, 2.5] ⇒ unemployed ∈ [1870.0, 4806.0] | 1.00 | 0.57 | 1.00
[16] | year ∈ [1952.0, 1954.0] ⇒ gnp ∈ [346996.6894, 365755.9439] | 1.00 | 0.19 | 5.34
[17] | deflator ∈ [92.725, 109.675] ᴧ gnp ∈ [317317.75, 477620] ᴧ unemployed ∈ [2170, 3638] ᴧ armed_forces ∈ [2514, 3582] ᴧ population ∈ [111770, 123006] ᴧ year ∈ [1952, 1958] ⇒ employed ∈ [63424, 68614] | 1.00 | 0.25 | 2.29
[33] | population ∈ [high] ⇒ deflator ∈ [high] | 0.99 | 0.32 | -
[34] | deflator ∈ [high] ⇒ armed_forces ∈ [middle] | 0.97 | 0.31 | -

Table 13 Sample rules mined from the algorithms within the Pollution dataset
Alg | Rule | Conf | Sup | Lift
[28] | wwdrk ∈ [51.9, 51.9] ᴧ so ∉ [3.0001, 278.0] ⇒ educ ∉ [9.0, 12.1999] | 1.00 | 0.02 | 1.20
[1] | jant ∈ [25.75, 39.5] ᴧ nox ∈ [1.0, 80.5] ⇒ hc ∈ [1.0, 162.75] | 1.00 | 0.57 | 1.77
[23] | humid ∈ [46.75, 73.0] ⇒ nox ∈ [1.0, 80.5] ᴠ [160.0, 319.0] | 1.00 | 0.99 | 1.00
[16] | jant ∈ [22.7769, 40.1764] ⇒ hc ∈ [3.7535, 67.7596] | 0.98 | 0.74 | 1.13
[17] | prec ∈ [23.5, 48.5] ᴧ jant ∈ [14.25, 41.75] ᴧ jult ∈ [65.5, 76.5] ᴧ ovr65 ∈ [7.25, 10.35] ᴧ popn ∈ [3.1375, 3.4425] ᴧ educ ∈ [10.575, 12.225] ᴧ hous ∈ [75.525, 87.475] ᴧ dens ∈ [1643.3739, 5570.3961] ᴧ nonw ∈ [0.8, 18.225] ᴧ wwdrk ∈ [38.825, 51.775] ᴧ poor ∈ [9.4, 16.45] ᴧ hc ∈ [1.0, 165.75] ᴧ nox ∈ [1.0, 94.5] ᴧ so ∈ [1.0, 128.25] ᴧ humid ∈ [49.25, 66.75] ⇒ mort ∈ [842.6283, 1003.8398] | 1.00 | 0.24 | 1.23
[33] | nox ∈ [low] ⇒ hc ∈ [low] | 0.97 | 0.38 | -

Table 14 Sample rules mined from the algorithms within the Pwlinear dataset
Alg | Rule | Conf | Sup | Lift
[28] | a5 ∉ [1.0E-4, 0.9999] ᴧ a6 ∉ [-0.9999, -1.0E-4] ⇒ a1 ∉ [-0.9999, 0.9999] ᴧ a9 ∉ [1.0E-4, 0.9999] ᴧ a10 ∉ [-0.9999, -1.0E-4] | 1.00 | 1.00 | 1.00
[32] | a6 ∈ [middle] ⇒ class ∈ [middle] | 0.85 | 0.29 | -
[1] | class ∈ [-10.81, -5.085] ⇒ a1 ∈ [0.5, 1.0] | 1.00 | 0.15 | 1.87
[23] | a7 ∈ [-1.0, 1.0] ⇒ a2 ∈ [-1.0, 0.0] ᴠ [0.5, 1.0] | 1.00 | 1.00 | 1.00
[17] | a1 ∈ [-1.0, -0.5] ᴧ a2 ∈ [-1.0, -0.5] ᴧ a3 ∈ [-1.0, -0.5] ᴧ a4 ∈ [0.5, 1.0] ᴧ a5 ∈ [0.5, 1.0] ᴧ a6 ∈ [-0.5, 0.5] ᴧ a7 ∈ [-1.0, -0.5] ᴧ a8 ∈ [-0.5, 0.5] ᴧ a9 ∈ [-0.5, 0.5] ᴧ a10 ∈ [-0.5, 0.5] ⇒ class ∈ [-8.075, 3.375] | 1.00 | 0.01 | 1.31
[33] | a3 ∈ [low] ⇒ class ∈ [middle] | 0.81 | 0.13 | -
[34] | class ∈ [low] ⇒ a1 ∈ [high] | 0.90 | 0.22 | -

Table 15 Sample rules mined from the algorithms within the Quake dataset
Alg | Rule | Conf | Sup | Lift
[32] | latitude ∈ [middle] ⇒ focal_depth ∈ [low] | 0.91 | 0.59 | -
[1] | richter ∈ [5.8, 6.075] ⇒ focal_depth ∈ [0.0, 164.0] | 0.91 | 0.66 | 1.01
[23] | richter ∈ [5.8, 6.075] ⇒ latitude ∈ [-66.49, 78.15] | 1.00 | 0.73 | 1.00
[17] | focal_depth ∈ [0.0, 256.0] ᴧ latitude ∈ [-17.1099, 55.2099] ᴧ longtitude ∈ [-7.71, 172.27] ⇒ richter ∈ [5.8, 6.3204] | 0.96 | 0.56 | 1.01
[33] | richter ∈ [low] ⇒ focal_depth ∈ [low] | 0.86 | 0.50 | -
[34] | richter ∈ [low] ⇒ focal_depth ∈ [low] | 0.88 | 0.54 | -

Table 16 Sample rules mined from the algorithms within the Sleep dataset
Alg | Rule | Conf | Sup | Lift
[32] | brain_weight ∈ [low] ⇒ body_weight ∈ [low] | 1.00 | 0.93 | -
[1] | body_weight ∈ [0.005, 1663.5037] ⇒ brain_weight ∈ [0.14, 1428.105] | 1.00 | 0.97 | 1.04
[33] | brain_weight ∈ [low] ᴧ max_life_span ∈ [low] ⇒ body_weight ∈ [low] | 0.98 | 0.25 | -
[34] | body_weight ∈ [low] ⇒ brain_weight ∈ [low] | 0.98 | 0.87 | -

Table 17 Sample rules mined from the algorithms within the Stock price dataset
Alg | Rule | Conf | Sup | Lift
[28] | company5 ∈ [36.5, 36.5] ᴧ company10 ∈ [57.25, 57.25] ⇒ company4 ∈ [42.875, 42.875] | 1.00 | 0.01 | 105.56
[32] | company7 ∈ [middle] ⇒ company4 ∈ [middle] | 0.91 | 0.58 | -
[1] | company2 ∈ [50.0, 60.25] ᴧ company5 ∈ [60.9375, 77.5312] ᴧ company10 ∈ [34.0, 41.0] ⇒ company1 ∈ [17.219, 28.2892] | 0.98 | 0.19 | 3.48
[23] | company3 ∈ [12.75, 15.84375] ᴠ [18.9375, 25.125] ⇒ company8 ∈ [16.375, 29.25] | 1.00 | 0.78 | 1.00
[16] | company3 ∈ [19.5939, 22.6336] ⇒ company2 ∈ [49.1171, 59.1645] | 0.89 | 0.39 | 1.80
[17] | company1 ∈ [17.219, 33.35125] ᴧ company2 ∈ [44.75, 60.25] ᴧ company3 ∈ [17.53125, 23.71875] ᴧ company4 ∈ [36.5625, 49.4375] ᴧ company5 ∈ [48.15625, 81.34375] ᴧ company6 ∈ [23.96875, 34.53125] ᴧ company7 ∈ [58.9375, 73.5625] ᴧ company8 ∈ [16.375, 22.09375] ᴧ company9 ∈ [38.875, 49.625] ⇒ company10 ∈ [34.0, 46.625] | 1.00 | 0.25 | 1.99
[33] | company1 ∈ [middle] ᴧ company9 ∈ [middle] ᴧ company10 ∈ [low] ⇒ company8 ∈ [middle] | 0.93 | 0.12 | -
[34] | company3 ∈ [middle] ᴧ company6 ∈ [middle] ⇒ company9 ∈ [middle] | 0.93 | 0.26 | -

Table 18 Sample rules mined from the algorithms within the Triazines dataset
Alg | Rule | Conf | Sup | Lift
[23] | p1_polar ∈ [0.1, 0.3] ⇒ p5_flex ∈ [0.0, 0.0] | 1.00 | 0.48 | 1.00
[16] | p1_polar ∈ [0.4749, 0.6555] ⇒ p1_size ∈ [0.2073, 0.4068] | 1.00 | 0.33 | 2.59
[17] | p1_polar ∈ [0.1, 0.3] ᴧ p1_size ∈ [0.1, 0.3] ᴧ p1_flex ∈ [0.1, 0.3] ᴧ p1_h_doner ∈ [0.1, 0.3] ᴧ p1_h_acceptor ∈ [0.1, 0.3] ᴧ p1_pi_doner ∈ [0.1, 0.3] ᴧ p1_pi_acceptor ∈ [0.1, 0.3] ᴧ p1_polarisable ∈ [0.1, 0.3] ᴧ p1_sigma ∈ [0.1, 0.3] ᴧ p1_branch ∈ [0.1, 0.3] ᴧ p2_polar ∈ [0.1, 0.3] ᴧ p2_size ∈ [0.1, 0.3] ᴧ p2_flex ∈ [0.1, 0.3] ᴧ p2_h_doner ∈ [0.1, 0.3] ᴧ p2_h_acceptor ∈ [0.1, 0.3] ᴧ p2_pi_doner ∈ [0.1, 0.3] ᴧ p2_pi_acceptor ∈ [0.1, 0.3] ᴧ p2_polarisable ∈ [0.1, 0.3] ᴧ p2_sigma ∈ [0.1, 0.3] ᴧ p2_branch ∈ [0.1, 0.3] ᴧ p3_polar ∈ [0.1, 0.3] ᴧ p3_size ∈ [0.1, 0.3] ᴧ p3_flex ∈ [0.1, 0.3] ᴧ p3_h_doner ∈ [0.1, 0.3] ᴧ p3_h_acceptor ∈ [0.1, 0.3] ᴧ p3_pi_doner ∈ [0.1, 0.3] ᴧ p3_pi_acceptor ∈ [0.1, 0.3] ᴧ p3_polarisable ∈ [0.1, 0.3] ᴧ p3_sigma ∈ [0.1, 0.3] ᴧ p3_branch ∈ [0.1, 0.3] ᴧ p4_polar ∈ [0.3, 0.7] ᴧ p4_size ∈ [0.40, 0.8] ᴧ p4_flex ∈ [0.243, 0.643] ᴧ p4_h_doner ∈ [0.1, 0.3] ᴧ p4_h_acceptor ∈ [0.167, 0.567] ᴧ p4_pi_doner ∈ [0.1, 0.3] ᴧ p4_pi_acceptor ∈ [0.1, 0.3] ᴧ p4_polarisable ∈ [0.3, 0.7] ᴧ p4_sigma ∈ [0.1, 0.3] ᴧ p4_branch ∈ [0.3, 0.7] ᴧ p5_polar ∈ [0.1, 0.3] ᴧ p5_size ∈ [0.1, 0.3] ᴧ p5_flex ∈ [0.0, 0.0] ᴧ p5_h_doner ∈ [0.0, 0.0] ᴧ p5_h_acceptor ∈ [0.1, 0.3] ᴧ p5_pi_doner ∈ [0.1, 0.3] ᴧ p5_pi_acceptor ∈ [0.1, 0.3] ᴧ p5_polarisable ∈ [0.1, 0.3] ᴧ p5_sigma ∈ [0.1, 0.3] ᴧ p5_branch ∈ [0.1, 0.3] ᴧ p6_polar ∈ [0.1, 0.3] ᴧ p6_size ∈ [0.1, 0.3] ᴧ p6_flex ∈ [0.1, 0.3] ᴧ p6_h_doner ∈ [0.1, 0.3] ᴧ p6_h_acceptor ∈ [0.1, 0.3] ᴧ p6_pi_doner ∈ [0.1, 0.3] ᴧ p6_pi_acceptor ∈ [0.1, 0.3] ᴧ p6_polarisable ∈ [0.1, 0.3] ᴧ p6_sigma ∈ [0.1, 0.3] ᴧ p1_branch ∈ [0.1, 0.3] ⇒ activity ∈ [0.453, 0.853] | 1.00 | 0.05 | 1.18

Table 19 Sample rules mined from the algorithms within the Vineyard dataset
Alg | Rule | Conf | Sup | Lift
[28] | row_number ∉ [48.0, 49.0] ⇒ lugs_1990 ∉ [12.0001, 12.4999] | 1.00 | 0.97 | 1.00
[32] | row_number ∈ [middle] ᴧ lugs_1990 ∈ [high] ⇒ lugs_1991 ∈ [high] | 0.98 | 0.41 | -
[1] | row_number ∈ [39.25, 52.0] ⇒ lugs_1989 ∈ [0.0, 2.0] | 1.00 | 0.25 | 4.00
[23] | lugs_1989 ∈ [0.0, 2.0] ᴠ [4.0, 6.0] ⇒ lugs_1990 ∈ [2.5, 14.0] | 1.00 | 0.77 | 1.00
[16] | lugs_1990 ∈ [7.8916, 13.1467] ⇒ lugs_1991 ∈ [14.7244, 23.0200] | 0.93 | 0.74 | 1.13
[17] | row_number ∈ [2.0, 27.0] ᴧ lugs_1989 ∈ [2.5, 6.5] ᴧ lugs_1990 ∈ [7.625, 13.375] ⇒ lugs_1991 ∈ [16.125, 26.0] | 1.00 | 0.49 | 1.45
[33] | row_number ∈ [high] ⇒ lugs_1989 ∈ [low] | 0.95 | 0.20 | -
[34] | row_number ∈ [high] ⇒ lugs_1989 ∈ [low] | 0.92 | 0.27 | -
4. Conclusions

In this paper, a study on the performances of evolutionary and fuzzy evolutionary intelligent algorithms for the NARM problem has been presented. The performances of seven intelligent algorithms and the Apriori algorithm in terms of support, confidence, number of mined rules, number of covered records, and time metrics have been comparatively analyzed within eleven real-world datasets that have numeric-valued attributes. One of the best mined rules obtained by each algorithm has also been given and analyzed with respect to the confidence, support, and lift metrics. By evaluating the results over the eleven real-world numerical datasets, the following conclusions about the effectiveness of these methods can be deduced:

Evolutionary intelligent algorithms have better performances in terms of the support, confidence, and time metrics. Although the algorithm proposed by Alatas and Akin [28] has not produced any rule within three datasets according to the selected algorithm parameters, this method has performed best in five out of eleven datasets with respect to average support. EARMGA seems to be the best algorithm in terms of obtained average confidence values, in ten out of eleven numerical datasets, according to the determined standard minimum confidence value. GAR and the algorithm proposed by Alatas and Akin [28] have mined fewer rules than the other intelligent algorithms and Apriori with respect to the eleven datasets, and these algorithms seem to find more comprehensible rules. GENAR seems to be the best algorithm within eight out of eleven datasets with respect to the time metric. However, this algorithm seems to have the worst comprehensibility according to the rules within the used datasets; the number of attributes in the numerical association rules obtained from GENAR is the highest among all the algorithms.

According to the obtained results, intelligent search and optimization algorithms seem to be the best alternatives for the complex numerical association rule mining problem. The search space is very large, and there are many metrics that should be optimized for this type of problem. The rule discovery process is also efficiently automated by the intelligent algorithms, eliminating the prior need for many metrics and for preprocessing of the data; the attribute interaction problem is also eliminated. Due to the promising results and high performance of the intelligent algorithms on the NARM problem, many new and efficient intelligent search and optimization algorithms can be adapted for better results. Their multiobjective versions can also be proposed for simultaneously satisfying the different objectives of numerical association rules. Recent intelligent optimization algorithms in the literature can also be hybridized to provide better results for this problem.

References
[1] Srikant R, Agrawal R (1996) Mining quantitative association rules in large relational tables. In: Proceedings of ACM SIGMOD, pp. 1–12
[2] Lian W, Cheung DW, Yiu SM (2005) An efficient algorithm for finding dense regions for mining quantitative association rules. Computers & Mathematics with Applications 50(3-4):471-490
[3] Lent B, Swami A, Widom J (1997) Clustering association rules. In: Proceedings of the IEEE International Conference on Data Engineering, pp. 220–231
[4] Guo Y, Yang J, Huang Y (2008) An effective algorithm for mining quantitative association rules based on high dimension cluster. In: Fourth IEEE International Conference on Wireless Communications, Networking and Mobile Computing, pp. 1–4
[5] Junrui Y, Feng Z (2010) An effective algorithm for mining quantitative associations based on subspace clustering. In: 2nd International Conference on Networking and Digital Society (ICNDS), IEEE, 1:175–178
[6] Abood AM, Hussein MS, Namdar JH (2018) Data mining using association rules with fuzzy logic. Scholars Press
[7] Prakash S, Parvathi R (2011) Qualitative approach for quantitative association rule mining using fuzzy rule set. Journal of Computational Information Systems 7(6):1879–1885
[8] Watanabe T, Takahashi H (2011) A study on quantitative association rules mining algorithm based on clustering algorithm. International Journal of Biomedical Soft Computing and Human Sciences 16(2):59-67
[9] Aumann Y, Lindell Y (1999) A statistical theory for quantitative association rules. In: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 261-270
[10] Alatas B, Akin E (2009) Chaotically encoded particle swarm optimization algorithm and its applications. Chaos, Solitons & Fractals 41(2):939-950
[11] Alatas B, Akin E (2008) Rough particle swarm optimization and its applications in data mining. Soft Computing 12(12):1205-1218
[12] Yan D, Zhao X, Lin R, Bai D (2018) PPQAR: Parallel PSO for quantitative association rule mining. In: IEEE International Conference on Big Data and Smart Computing (BigComp), pp. 163-169
[13] Agbehadji IE, Fong S, Millham R (2016) Wolf search algorithm for numeric association rule mining. In: IEEE International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), pp. 146-151
[14] Holland JH (1975) Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI
[15] Zhang K, Du H, Feldman MW (2017) Maximizing influence in a social network: Improved results using a genetic algorithm. Physica A: Statistical Mechanics and its Applications 478:20-30
[16] Mata J, Alvarez JL, Riquelme JC (2002) Discovering numeric association rules via evolutionary algorithm. Lecture Notes in Artificial Intelligence 2336:40–51
[17] Mata J, Alvarez JL, Riquelme JC (2001) Mining numeric association rules with genetic algorithms. In: Artificial Neural Nets and Genetic Algorithms, Springer, Vienna, pp. 264-267
[18] Álvarez VP, Vázquez JM (2012) An evolutionary algorithm to discover quantitative association rules from huge databases without the need for an a priori discretization. Expert Systems with Applications 39(1):585-593
[19] Salleb-Aouissi A, Vrain C, Nortet C, Kong X, Rathod V, Cassard D (2013) QuantMiner for mining quantitative association rules. The Journal of Machine Learning Research 14(1):3153-3157
[20] Seki H, Nagao M (2017) An efficient Java implementation of a GA-based miner for relational association rules with numerical attributes. In: IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 2028-2033
[21] Kumar P, Singh AK (2019) Efficient generation of association rules from numeric data using genetic algorithm for smart cities. In: Security in Smart Cities: Models, Applications, and Challenges, Springer, Cham, pp. 323-343
[22] Martínez-Ballesteros M, Troncoso A, Martínez-Álvarez F, Riquelme JC (2010) Mining quantitative association rules based on evolutionary computation and its application to atmospheric pollution. Integrated Computer-Aided Engineering 17(3):227-242
[23] Yan X, Zhang C, Zhang S (2009) Genetic algorithm-based strategy for identifying association rules without specifying actual minimum support. Expert Systems with Applications 36(2):3066-3076
[24] Alcala-Fdez J, Flugy-Pape N, Bonarini A, Herrera F (2010) Analysis of the effectiveness of the genetic algorithms based on extraction of association rules. Fundamenta Informaticae 98(1):1-14
[25] Storn R, Price K (1995) Differential Evolution: A simple and efficient adaptive scheme for global optimization over continuous spaces. Technical Report TR-95-012, International Computer Science Institute, Berkeley
[26] Zhao Y, Li M, Lu X, Tian L, Yu Z, Huang K, Li T (2017) Optimal layout design of obstacles for panic evacuation using differential evolution. Physica A: Statistical Mechanics and its Applications 465:175-194
[27] Fister I, Iglesias A, Galvez A, Del Ser J, Osaba E (2018) Differential evolution for association rule mining using categorical and numerical attributes. In: International Conference on Intelligent Data Engineering and Automated Learning, Springer, Cham, pp. 79-88
[28] Alatas B, Akin E (2006) An efficient genetic algorithm for automated mining of both positive and negative quantitative association rules. Soft Computing 10(3):230-237
[29] Yang G (2010) Mining association rules from data with hybrid attributes based on immune genetic algorithm. In: IEEE Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), 3:1446-1449
[30] Can U, Alatas B (2017) Automatic mining of quantitative association rules with gravitational search algorithm. International Journal of Software Engineering and Knowledge Engineering 27(3):343-372
[31] Guvenir HA, Uysal I (2000) Bilkent University function approximation repository. http://funapp.cs.bilkent.edu.tr/DataSets
[32] Alcalá-Fdez J, Alcalá R, Gacto MJ, Herrera F (2009) Learning the membership function contexts for mining fuzzy association rules by using genetic algorithms. Fuzzy Sets and Systems 160(7):905-921
[33] Hong TP, Chen CH, Wu YL, Lee YC (2006) A GA-based fuzzy mining approach to achieve a trade-off between number of rules and suitability of membership functions. Soft Computing 10(11):1091-1101
[34] Hong TP, Chen CH, Lee YC, Wu YL (2008) Genetic-fuzzy data mining with divide-and-conquer strategy. IEEE Transactions on Evolutionary Computation 12(2):252-265