Journal Pre-proof

Intelligent optimization algorithms for the problem of mining numerical association rules

Elif Varol Altay, Bilal Alatas

PII: S0378-4371(19)31770-4
DOI: https://doi.org/10.1016/j.physa.2019.123142
Reference: PHYSA 123142
To appear in: Physica A
Received date: 24 May 2019
Revised date: 17 July 2019

Please cite this article as: E.V. Altay and B. Alatas, Intelligent optimization algorithms for the problem of mining numerical association rules, Physica A (2019), doi: https://doi.org/10.1016/j.physa.2019.123142.

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier B.V.
*Highlights (for review)

HIGHLIGHTS
• It has been demonstrated that intelligent optimization algorithms are the most suitable methods for data mining problems in numerical data
• The efficiency comparison of seven evolutionary and fuzzy evolutionary intelligent algorithm based numerical association rule mining (NARM) approaches has been performed for the first time
• The obtained results have been compared with the classical Apriori algorithm to show the efficiencies of the intelligent algorithms on the NARM problem
• A comparative analysis of eight algorithms in terms of support, confidence, number of mined rules, number of covered records, and time metrics within eleven real-world datasets has been performed
Intelligent Optimization Algorithms for the Problem of Mining Numerical Association Rules

Elif Varol Altay*
Department of Software Engineering, Firat University, Elazig, Turkey
[email protected]
* Corresponding author

Bilal Alatas
Department of Software Engineering, Firat University, Elazig, Turkey
[email protected]
Abstract
There are many effective approaches that have been proposed for association rule mining (ARM) on binary or discrete-valued data. However, in many real-world applications the data usually consist of numerical values, and the standard algorithms cannot work or do not give promising results on these datasets. In numerical ARM (NARM), it is difficult to determine which attributes will be included in the rules to be discovered, and which of them will appear on the left side of a rule and which on the right. It is also difficult to automatically adjust the most relevant ranges for the numerical attributes. Directly discovering the rules, without generating the frequent itemsets used in the literature as the first step of ARM, accelerates the whole process and removes the need to determine the metrics required for this step. In classical ARM algorithms, generally one or two metrics are considered. However, in many real-world applications the mined rules need to be comprehensible, surprising, interesting, accurate, of high confidence, and so on. Adjusting all of these processes without the need for the metrics to be pre-determined for each dataset is another problem. For these purposes, evolutionary intelligent optimization algorithms seem to be a potential solution method for this complex problem. In this paper, the performance analysis of seven evolutionary and fuzzy evolutionary algorithms, namely Alatasetal, Alcalaetal, EARMGA, GAR, GENAR, Genetic Fuzzy Apriori, and Genetic Fuzzy AprioriDC, for the NARM problem has been performed within eleven real datasets for the first time. The obtained results have also been compared with the classical Apriori algorithm to show the efficiencies of the intelligent algorithms on the NARM problem. The performances of the eight algorithms in terms of support, confidence, number of mined rules, number of covered records, and time metrics have been comparatively analyzed within eleven real-world datasets. One of the best mined rules obtained by each algorithm has been given and analyzed with respect to the confidence, support, and lift metrics.
Keywords: Numerical Association Rules Mining, Evolutionary Algorithms, Fuzzy Evolutionary Algorithms

1. Introduction
Generally, the success of ARM methods within datasets containing different types of data, and especially numerical or quantitative data, may be low. This is because, for ARM in numerical data, the attributes are mostly discretized so that the standard ARM algorithms can process them. However, such an approach is not in line with the automatic rule discovery property of data mining. In fact, the dataset is changed by determining the boundaries of the discrete data beforehand; thus, the rules are found for the modified data. Efficient discretization according to pre-defined intervals is a difficult problem, and it is unreasonable to alter data that is likely to accommodate accurate and interesting rules outside the respective ranges. Because of attribute interaction, important rules (perhaps very accurate and interesting ones) cannot be found outside these boundaries of the related attributes. Discretization may also lead to information loss. This means that these intervals should be found automatically during the data mining process, without a preprocessing step such as discretization. Integrating the discretization of continuous attributes, the reduction of attributes, and the mining of numerical rules into a single step is very meaningful in terms of accuracy and speed.

Most of the studies on ARM in the literature have been proposed to ensure that the rules are specific. Most of the time, the rules are expected to be accurate according to different metrics measured on the records in the dataset. However, accuracy or confidence alone may not be a sufficient criterion for high-quality rules to emerge. Most of the time, the discovered rules are expected to have many properties such as understandability, accuracy, reliability, interestingness, and surprisingness. In addition, the requirement of identifying some metrics in advance for each dataset hosting the rules to be discovered by classical ARM algorithms can be seen as a deficiency that prevents the automation of data mining applications. Generally, the algorithms used for ARM in the literature work in two stages: in the first stage, frequent itemsets are found, and in the second stage, rules are drawn from these frequent itemsets. It would be very meaningful and efficient to transform these two stages into a single stage and directly mine only a few high-quality comprehensible rules.

In the NARM problem, it is difficult to determine which attributes will be included in the rules to be discovered, and which of them will appear on the left side of a rule and which on the right. It is also difficult to automatically adjust the most relevant ranges for the numerical attributes. Directly discovering the rules, without generating the frequent itemsets used in the literature as the first step of ARM, accelerates the whole process and removes the need to determine the metrics required for this step. In classical ARM algorithms, generally one or two metrics are considered. However, in many real-world applications the mined rules need to be comprehensible, surprising, interesting, accurate, of high confidence, and so on. Adjusting all of these processes without the need for the metrics to be pre-determined for each dataset is another problem.

Intelligent optimization algorithms have been used efficiently due to their many advantages. These algorithms do not require information about the search space; they are population-based and search for the optimal solutions in parallel. They do not depend on the type and number of decision variables, the type of search space, or the type and number of constraints, and they do not need well-defined mathematical models. Due to their simplicity of execution and appropriate performance, intelligent optimization algorithms have been extensively applied to different complex real-world problems. Briefly, due to their derivative-free structure, flexibility, ease of use, and local optimum avoidance, these methods have become outstandingly widespread in recent years. The problem of mining association rules within datasets that contain numerical or quantitative attributes is one of the application areas of these intelligent search and optimization algorithms. The datasets can be considered as the search space, and these intelligent optimization algorithms may be adjusted to act as global search methods in order to find rules with many characteristics satisfying the needed objectives.
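To make the rule-quality metrics used throughout this paper concrete, the following minimal sketch (ours, for illustration only; the attribute names and records are hypothetical) shows how the support, confidence, and lift of a single numerical interval rule can be evaluated directly on raw records, without any prior discretization:

```python
# Minimal sketch: evaluating a numerical interval rule X => Y on raw records.
# Attribute names and data are illustrative assumptions.

def matches(record, conditions):
    """True if every conditioned attribute falls inside its interval."""
    return all(lo <= record[attr] <= hi for attr, (lo, hi) in conditions.items())

def rule_metrics(records, antecedent, consequent):
    """Support, confidence, and lift of the rule 'antecedent => consequent'."""
    n = len(records)
    n_ant = sum(matches(r, antecedent) for r in records)
    n_con = sum(matches(r, consequent) for r in records)
    n_both = sum(matches(r, antecedent) and matches(r, consequent) for r in records)
    support = n_both / n
    confidence = n_both / n_ant if n_ant else 0.0
    lift = confidence / (n_con / n) if n_con else 0.0
    return support, confidence, lift

# Illustrative usage on a toy numerical dataset
records = [{"height": 190.0, "assists": 0.21}, {"height": 178.0, "assists": 0.35},
           {"height": 201.0, "assists": 0.18}, {"height": 185.0, "assists": 0.22}]
sup, conf, lift = rule_metrics(records,
                               antecedent={"assists": (0.15, 0.25)},
                               consequent={"height": (180.0, 205.0)})
print(f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

A lift above 1 indicates that the antecedent and consequent co-occur more often than would be expected by chance, which is why lift is reported alongside support and confidence for the sample rules later in this paper.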
The main contributions of this paper are:
• To demonstrate that intelligent optimization algorithms are the most suitable methods for data mining problems in numerical data
• To perform the efficiency comparison of seven evolutionary and fuzzy evolutionary intelligent algorithm based NARM approaches (Alatasetal, Alcalaetal, EARMGA, GAR, GENAR, Genetic Fuzzy Apriori, and Genetic Fuzzy AprioriDC) for the first time
• To compare the obtained results with the classical Apriori algorithm to show the efficiencies of the intelligent algorithms on the NARM problem
• To perform a comparative analysis of eight algorithms in terms of support, confidence, number of mined rules, number of covered records, and time metrics within eleven real-world datasets.
This paper is organized as follows. The second section describes and analyzes the related works on the NARM problem. In Section 3, the experimental setup and results are presented. The last section concludes the paper along with directions for further research.

2. Related Works
There are three main approaches to discovering numerical association rules, as shown in Fig. 1. In the first family of approaches, the attribute domain is first partitioned into smaller intervals, and adjacent intervals are then combined into larger ones so that the combined intervals have enough support. In this way, the NARM problem is transformed into a Boolean rule mining one. Partitioning of variables often leads to information loss. Furthermore, in these approaches an attribute is discretized without taking the other attributes into account, and this causes the attribute interaction problem [1]. Lian et al. have proposed the DRMiner algorithm, which uses density to calculate the eigenvalues of quantitative attributes and adopts an effective process to locate dense regions. However, the need for many thresholds, different values of which lead to greatly different results, may be seen as a disadvantage of this approach [2]. Various researchers have afterwards used clustering techniques. Lent et al. have proposed a geometric-based BitOP algorithm in order to cluster the numerical attributes [3]. They have shown that clustering may be a potential solution for determining meaningful regions and for the mining of association rules. DBSMiner [4] aims to scale up well for high dimensional numerical association rule mining using the notion of density-connectedness. MQAR (Mining Quantitative Association Rules) is another clustering based approach using a dense grid; it uses a DGFP-tree to cluster dense subspaces [5].
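As an illustration of the partition-and-combine idea, the following sketch (a generic illustration, not the exact procedure of any cited algorithm) splits one attribute's domain into equal-width base intervals and greedily merges adjacent intervals until each merged interval reaches a minimum support:

```python
# Sketch of partition-and-combine for a single numerical attribute:
# split the domain into equal-width base intervals, then merge adjacent
# intervals until each merged interval reaches a minimum support.
# Parameters and data are illustrative assumptions.

def partition_and_combine(values, n_parts=8, min_support=0.2):
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_parts or 1.0
    # Count how many values fall into each base interval.
    counts = [0] * n_parts
    for v in values:
        idx = min(int((v - lo) / width), n_parts - 1)
        counts[idx] += 1
    # Greedily merge adjacent intervals until the support is sufficient.
    merged, start, acc = [], 0, 0
    for i, c in enumerate(counts):
        acc += c
        if acc / len(values) >= min_support or i == n_parts - 1:
            merged.append(((lo + start * width, lo + (i + 1) * width),
                           acc / len(values)))
            start, acc = i + 1, 0
    return merged  # list of ((interval_lo, interval_hi), support)

ages = [22, 23, 24, 25, 25, 26, 28, 29, 31, 33, 36, 37]
for (a, b), sup in partition_and_combine(ages):
    print(f"[{a:.2f}, {b:.2f}] support={sup:.2f}")
```

Because the interval boundaries are fixed before mining and are chosen per attribute in isolation, such schemes exhibit exactly the information loss and attribute interaction problems discussed above.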
Fuzzy sets have also been used for NARM [6, 7]. Prakash and Parvathi have converted the numerical attributes into fuzzy binary attributes utilizing a thread-based mechanism [7]. In [8], hard clustering and fuzzy clustering have been used for the quantization. However, the clustering results are not desirable when the scales of the clusters differ a lot. Determining the most suitable types of fuzzy sets for each attribute, determining the suitable characteristics of the membership functions, and so on are open problems in fuzzy set based approaches. These approaches also rely on preprocessing and mine the rules within the changed data.
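The following sketch illustrates how fuzzy set based approaches typically score a rule: each numerical value receives a membership degree in a linguistic term, and the fuzzy support averages the rule strength over the records (with the minimum used as the t-norm). The triangular membership functions, term boundaries, and data below are illustrative assumptions, not those of any cited method:

```python
# Sketch of fuzzy rule scoring, e.g. for "assists is low => height is middle".
# Terms, boundaries, and data are illustrative assumptions.

def triangular(a, b, c):
    """Triangular membership function with peak at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)
    return mu

# Linguistic terms for two illustrative attributes.
assists_low = triangular(0.0, 0.1, 0.3)
height_middle = triangular(175.0, 188.0, 200.0)

def fuzzy_support(records, mu_antecedent, mu_consequent):
    # Minimum acts as the t-norm combining antecedent and consequent degrees.
    degrees = [min(mu_antecedent(x), mu_consequent(y)) for x, y in records]
    return sum(degrees) / len(degrees)

records = [(0.12, 186.0), (0.28, 179.0), (0.05, 192.0), (0.33, 205.0)]
print(f"fuzzy support = {fuzzy_support(records, assists_low, height_middle):.2f}")
```

This is also why the fuzzy algorithms compared later report no lift values: their rules are defined over linguistic terms rather than crisp intervals.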
Fig. 1 Methods for the problem of NARM. The figure depicts a taxonomy of NARM methods: Discretization (Partitioning and Combining, Clustering, Fuzzifying); Distribution (Mean based, Median based, Variance based); and Optimization (Swarm Intelligence: Particle Swarm Optimization, Wolf Search Algorithm; Biology based: Genetic Algorithm, Differential Evolution Algorithm, Artificial Immune System; Physics based: Gravitational Search Algorithm).
Throughout the process proposed in many of these approaches, the initial discretization and the follow-up rule mining steps are performed independently. The initial discretization may cause loss of the original information and introduce mistakes into the rule mining process. The work proposed by Aumann and Lindell has introduced a new definition of numerical association rules based on statistical inference theory; they have used distribution measures such as mean, median, and variance [9]. Swarm, biology, and physics based intelligent optimization algorithms have also been proposed in order to efficiently mine numerical association rules. The study proposed in [10] uses chaos numbers integrated particle swarm optimization to automatically mine association rules within datasets that have numerical attributes. The authors of [11] have proposed a rough particle swarm optimization algorithm that simultaneously searches for the intervals of the numerical attributes and mines the rules to which these intervals conform. Parallel particle-oriented and data-oriented PSO algorithms have also been proposed for NARM on huge datasets [12]; in that work, the computational cost has been reduced by parallelizing the PSO in order to obtain more scalable and efficient results. Wolf search, which is based on the hunting behavior of wolves, is another swarm intelligence based optimization algorithm that has been used for NARM [13]. However, there is no experiment or simulation in that paper, and the methodology has only been explained theoretically. The genetic algorithm (GA) [14, 15] has been used by different researchers for numerical rule mining. One of the first works in this research area has been proposed in [16]. The GAR method proposed in that study finds only frequent itemsets by maximizing the support metric with a GA, and the rules are then extracted from these sets; the confidence values of the rules are not optimized. The same authors have proposed GENAR, which uses an evolutionary algorithm for mining the numerical rules without generating frequent itemsets [17]. Álvarez and Vázquez have worked on large and small datasets containing discrete and numerical data to discover numerical association rules [18]. QUANTMINER is another GA based NARM system [19, 20]; it dynamically discovers good and small intervals in association rules and maximizes both the confidence and the support values. GA has also been efficiently proposed for discovering association rules from numerical data for smart cities [21]; however, no simulation or experimental results about smart city data are given. A real-coded GA (RCGA) has been proposed to mine numerical association rules from atmospheric climatology data [22]; association rules between climatological attributes and pollutant agents have been mined for forecasting purposes. Yan et al. have proposed EARMGA, which uses relative confidence as the fitness function for directly discovering numerical association rules [23].
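The following deliberately simplified sketch illustrates the common chromosome design behind such GA based NARM methods: one individual encodes a complete rule as an interval plus a role (antecedent, consequent, or unused) per attribute, and the fitness rewards support and confidence while penalizing overly wide intervals. The data, weights, and operators below are illustrative assumptions and do not reproduce the exact settings of GAR, GENAR, or EARMGA:

```python
# Illustrative sketch of an evolutionary NARM loop; all settings are assumptions.
import random

ATTRS = ["a1", "a2", "a3"]                      # hypothetical attribute names
DATA = [{a: random.uniform(0, 10) for a in ATTRS} for _ in range(200)]
BOUNDS = {a: (0.0, 10.0) for a in ATTRS}

def random_rule():
    rule = {}
    for a in ATTRS:
        lo = random.uniform(*BOUNDS[a]); hi = random.uniform(lo, BOUNDS[a][1])
        rule[a] = (random.choice(["ant", "con", "off"]), lo, hi)
    return rule

def covers(rec, rule, role):
    return all(lo <= rec[a] <= hi for a, (r, lo, hi) in rule.items() if r == role)

def fitness(rule):
    roles = [r for r, _, _ in rule.values()]
    if "ant" not in roles or "con" not in roles:
        return -1.0                              # not a well-formed rule
    n_ant = sum(covers(rec, rule, "ant") for rec in DATA)
    n_both = sum(covers(rec, rule, "ant") and covers(rec, rule, "con") for rec in DATA)
    sup = n_both / len(DATA)
    conf = n_both / n_ant if n_ant else 0.0
    width = sum((hi - lo) / 10.0 for _, lo, hi in rule.values()) / len(ATTRS)
    return 5 * sup + 20 * conf - 0.5 * width     # illustrative weights

def mutate(rule):
    child = dict(rule)
    a = random.choice(ATTRS)
    r, lo, hi = child[a]
    lo = max(BOUNDS[a][0], lo + random.uniform(-1, 1))
    hi = min(BOUNDS[a][1], max(lo, hi + random.uniform(-1, 1)))
    child[a] = (random.choice([r, "ant", "con", "off"]), lo, hi)
    return child

# Simple elitist loop: keep the best half, mutate to refill the population.
pop = [random_rule() for _ in range(50)]
for _ in range(200):
    pop.sort(key=fitness, reverse=True)
    pop = pop[:25] + [mutate(random.choice(pop[:25])) for _ in range(25)]
print("best fitness:", round(fitness(pop[0]), 3))
```

The key point shared by the cited methods is that interval boundaries, attribute selection, and antecedent/consequent roles all evolve together in a single search, instead of being fixed by a separate discretization step.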
The efficiency of GAR, GENAR, and EARMGA over the Apriori and Eclat algorithms has been presented on two public datasets in [24]. Differential evolution [25, 26] has been used for mining association rules in datasets consisting of categorical and numerical attributes [27]. Not only positive but also negative numerical association rules have been mined with the approach proposed by Alatas and Akin [28]. An immune mechanism has been introduced into the classical GA, and a new model of immune GA has been proposed by Yang for the discovery of association rules within datasets that consist of both discrete and continuous attributes [29]. However, using binary encoding and considering only the support metric in the fitness function are the disadvantages of this approach. A physics-based intelligent optimization algorithm, namely the gravitational search algorithm, has also been efficiently modeled for the automatic mining of numerical association rules by eliminating the requirements for pre-processing of the data and the attribute interaction problem [30].

3. Experimental Results
The eleven real-world datasets obtained from [31] have been selected for the performance analysis of the intelligent optimization algorithms. Information about the used datasets, all of whose attributes are numeric, is listed in Table 1. All of the experiments have been performed using an Intel® Core™ i7-3630QM 2.40 GHz CPU with 8 GB of memory, running Windows 10.

Table 1 Datasets
Dataset | Records | Attributes
Basketball | 96 | 5
Bodyfat | 252 | 18
Bolts | 40 | 8
Longley | 16 | 7
Pollution | 60 | 16
Pwlinear | 200 | 11
Quake | 2178 | 4
Sleep | 57 | 8
Stock price | 950 | 10
Triazines | 186 | 61
Vineyard | 52 | 4
The parameters of the seven intelligent optimization algorithms and the Apriori algorithm are given in Table 2. The parameter values listed in Table 2 are the default values given in the corresponding articles. However, in order to evaluate the algorithms under equal conditions, the number of evaluations has been selected as 10000 and the population size has been chosen as 50 in all algorithms. All of the algorithms excluding Apriori have been executed 10 times, and the obtained values for the comparison metrics have been recorded. The average support values obtained from all of the intelligent optimization algorithms and the Apriori algorithm within the eleven datasets are shown in Table 3. According to this table, the algorithm proposed in [28] has performed best in five out of eleven datasets. However, this algorithm has not produced any rule within three datasets, namely Quake, Sleep, and Triazines, according to the determined parameters. GAR seems to be the second best algorithm in terms of obtained average support values. In two out of eleven datasets, GENAR has found the highest value for the average support metric, and it is the third best algorithm. Fig. 2 demonstrates the average support values obtained from the algorithms within the eleven real-world numerical datasets. The Apriori based algorithms show worse performance than the other algorithms in terms of support. This is due to the partitioning of variables, which leads to information loss. Furthermore, in Apriori based algorithms an attribute is discretized without taking the other attributes into account, and this causes the attribute interaction problem. That is why the intelligent optimization algorithms seem to outperform the Apriori based approaches. When evaluated in terms of the number of attributes, the algorithm proposed in [28] has the best average support performance in the databases with few attributes (Basketball, Vineyard, Bolts), while the GAR algorithm has the best average support performance in the databases with a high number of attributes (Pollution, Triazines). In the databases with a low number of records (Basketball, Bolts, Vineyard), the algorithm proposed in [28] has the best performance in terms of average support.

Table 2 Algorithm parameters
Algorithm | Parameters
[28] | Number of Evaluations: 10000, Initial Random Chromosomes: 6, r-dividing Points: 3, Tournament Size: 10, Probability of Crossover: 0.7, Min Probability of Mutation: 0.05, Max Probability of Mutation: 0.9, Importance of Rules Support: 5, Importance of Rules Confidence: 20, Importance of Number of Involved Attributes: 0.05, Importance of Intervals Amplitude: 0.02, Importance of Number of Records Already Covered: 0.01, Amplitude Factor: 2.0
[32] | Number of Evaluations: 10000, Number of Population: 50, Number of Bits per Gene: 30, Factor for Parent Centric BLX Crossover: 1.0, Number of Fuzzy Regions for Numeric Attributes: 3, Minimum Support: 0.1, Minimum Confidence: 0.8
Apriori [1] | Number of Partitions for Numeric Attributes: 4, Minimum Support: 0.1, Minimum Confidence: 0.8
EARMGA [23] | Number of Evaluations: 10000, Number of Population: 50, Fixed Length of Association Rules: 2, Probability of Selection: 0.75, Probability of Crossover: 0.7, Probability of Mutation: 0.1, Number of Partitions for Numeric Attributes: 4
GAR [16] | Number of Evaluations: 10000, Number of Population: 50, Number of Itemsets: 100, Probability of Selection: 0.25, Probability of Crossover: 0.7, Probability of Mutation: 0.1, Importance of Number of Records Already Covered: 0.4, Importance of Intervals Amplitude: 0.7, Importance of Number of Involved Attributes: 0.5, Amplitude Factor: 2.0, Minimum Support: 0.1, Minimum Confidence: 0.8
GENAR [17] | Number of Evaluations: 10000, Number of Population: 50, Number of Association Rules: 30, Probability of Selection: 0.25, Probability of Mutation: 0.1, Penalization Factor: 0.7, Amplitude Factor: 2.0
Genetic Fuzzy Apriori [33] | Number of Evaluations: 10000, Number of Population: 50, Probability of Mutation: 0.01, Probability of Crossover: 0.8, Parameter d for MMA Crossover: 0.35, Number of Fuzzy Regions for Numeric Attributes: 3, Minimum Support: 0.1, Minimum Confidence: 0.8
Genetic Fuzzy AprioriDC [34] | Number of Evaluations: 10000, Number of Population: 50, Probability of Mutation: 0.01, Probability of Crossover: 0.8, Parameter d for MMA Crossover: 0.35, Number of Fuzzy Regions for Numeric Attributes: 3, Minimum Support: 0.1, Minimum Confidence: 0.8
Table 3 Average support
Dataset | [28] | [32] | Apriori [1] | EARMGA [23] | GAR [16] | GENAR [17] | Genetic Fuzzy Apriori [33] | Genetic Fuzzy AprioriDC [34]
Basketball | 0.98 | 0.23 | 0.15 | 0.27 | 0.75 | 0.27 | 0.16 | 0.14
Bodyfat | 0.93 | - | 0.15 | 0.05 | 0.70 | 0.44 | 0.13 | -
Bolts | 0.75 | 0.14 | 0.15 | 0.41 | 0.21 | 0.13 | 0.15 | 0.14
Longley | 0.13 | 0.16 | 0.17 | 0.27 | 0.14 | 0.21 | 0.13 | 0.15
Pollution | 0.02 | - | 0.13 | 0.25 | 0.65 | 0.20 | 0.12 | -
Pwlinear | 0.96 | 0.13 | 0.13 | 0.47 | - | 0.01 | 0.13 | 0.13
Quake | - | 0.26 | 0.25 | 0.19 | - | 0.50 | 0.22 | 0.22
Sleep | - | 0.17 | 0.21 | - | - | - | 0.14 | 0.17
Stock price | 0.01 | 0.15 | 0.13 | 0.26 | 0.39 | 0.27 | 0.13 | 0.12
Triazines | - | - | - | 0.05 | 0.29 | 0.03 | - | -
Vineyard | 0.95 | 0.27 | 0.15 | 0.33 | 0.74 | 0.45 | 0.16 | 0.16
Fig. 2 Average support values of the rules obtained from the algorithms within the datasets

Table 4 shows the average confidence values obtained from the intelligent optimization algorithms and the Apriori algorithm within the eleven datasets. EARMGA seems to be the best algorithm in terms of obtained confidence values according to the determined minimum confidence value.
Table 4 Average confidence
Dataset | [28] | [32] | Apriori [1] | EARMGA [23] | GAR [16] | GENAR [17] | Genetic Fuzzy Apriori [33] | Genetic Fuzzy AprioriDC [34]
Basketball | 1.00 | 0.90 | 0.87 | 1.00 | 0.88 | 0.97 | 0.98 | 0.85
Bodyfat | 1.00 | - | 0.93 | 1.00 | 0.93 | 0.92 | 1.00 | -
Bolts | 1.00 | 0.95 | 0.99 | 1.00 | 0.97 | 0.96 | 1.00 | 0.97
Longley | 1.00 | 0.96 | 1.00 | 1.00 | 1.00 | 0.94 | 0.99 | 0.95
Pollution | 1.00 | - | 0.95 | 1.00 | 0.91 | 0.90 | 1.00 | -
Pwlinear | 1.00 | 0.83 | 0.88 | 1.00 | - | 0.81 | 0.89 | 0.86
Quake | - | 0.89 | 0.91 | 1.00 | - | 0.92 | 0.90 | 0.89
Sleep | - | 0.95 | 0.97 | - | - | - | 0.95 | 0.89
Stock price | 1.00 | 0.93 | 0.91 | 1.00 | 0.93 | 0.92 | 0.87 | 0.91
Triazines | - | - | - | 1.00 | 0.93 | 0.91 | - | -
Vineyard | 1.00 | 0.91 | 0.91 | 1.00 | 0.99 | 0.86 | 0.87 | 0.88
EARMGA has the best performance in ten out of eleven numerical datasets. The work proposed by Alatas and Akin [28] is the second best algorithm; it is more successful in eight out of eleven datasets. GENAR seems to be the third best algorithm, outperforming in five out of eleven datasets among the eight algorithms. Fig. 3 demonstrates the average confidence values of the rules obtained from the intelligent optimization algorithms and the Apriori algorithm within the eleven datasets. Similar to the results obtained in terms of support values, the Apriori based algorithms do not outperform the other algorithms in terms of confidence. Apriori based algorithms change the data that are likely to accommodate accurate and interesting rules outside the respective ranges. Because of attribute interaction, important accurate rules have not been found outside these boundaries of the related attributes. It seems that discretization has led to information loss, because these intervals have not been found automatically in the data mining process without a preprocessing step such as discretization. Integrating the automated discovery of ranges of continuous attributes, the reduction of attributes, and the mining of numerical rules into a single step is very meaningful in terms of accuracy.
The number of association rules generated by the algorithms is listed in Table 5. As seen in this table, GAR and the algorithm proposed in [28] have mined fewer rules than the other intelligent optimization and search algorithms with respect to the eleven datasets. A smaller number of discovered association rules may be more readable and comprehensible; a few high-quality rules are more meaningful than huge numbers of complex, unreadable, and low-quality rules. From this perspective, the Apriori and Genetic Fuzzy AprioriDC algorithms have found larger and less readable rule sets according to the selected parameters.
Fig. 3 Average confidence values of the rules obtained from the algorithms within the datasets

Table 5 Number of generated association rules
Dataset | [28] | [32] | Apriori [1] | EARMGA [23] | GAR [16] | GENAR [17] | Genetic Fuzzy Apriori [33] | Genetic Fuzzy AprioriDC [34]
Basketball | 7 | 706 | 4 | 50 | 2 | 30 | 5 | 63
Bodyfat | 47 | - | 24743 | 50 | 76 | 30 | 120418 | -
Bolts | 21 | 3241 | 1246 | 50 | 28 | 30 | 80 | 475
Longley | 7 | 25046 | 1406 | 50 | 21 | 16 | 232 | 1513
Pollution | 16 | - | 41510 | 50 | 31 | 30 | 19708 | -
Pwlinear | 34 | 39 | 21 | 50 | - | 30 | 1 | 9
Quake | - | 121 | 18 | 50 | - | 30 | 14 | 24
Sleep | - | 24415 | 1096 | - | - | - | 44 | 2242
Stock price | 1 | 983275 | 855 | 50 | 1 | 30 | 9 | 7850
Triazines | - | - | - | 50 | 18 | 30 | - | -
Vineyard | 5 | 283 | 11 | 50 | 2 | 30 | 21 | 55
Table 6 Number of covered records (%)
Dataset | [28] | [32] | Apriori [1] | EARMGA [23] | GAR [16] | GENAR [17] | Genetic Fuzzy Apriori [33] | Genetic Fuzzy AprioriDC [34]
Basketball | 100.00 | 100.00 | 33.34 | 100.00 | 96.88 | 89.59 | 69.79 | 100.00
Bodyfat | 99.21 | - | 100.00 | 73.02 | 100.00 | 81.75 | 100.00 | -
Bolts | 80.00 | 100.00 | 97.50 | 100.00 | 97.50 | 35.00 | 100.00 | 100.00
Longley | 12.50 | 100.00 | 100.00 | 100.00 | 68.75 | 100.00 | 100.00 | 100.00
Pollution | 1.67 | - | 100.00 | 100.00 | 100.00 | 46.67 | 100.00 | -
Pwlinear | 100.00 | 100.00 | 84.50 | 100.00 | - | 15.00 | 61.50 | 95.50
Quake | - | 100.00 | 90.55 | 100.00 | - | 81.92 | 94.26 | 97.25
Sleep | - | 100.00 | 96.78 | - | - | - | 96.77 | 100.00
Stock price | 0.11 | 100.00 | 99.48 | 100.00 | 44.00 | 85.69 | 46.95 | 100.00
Triazines | - | - | - | 84.41 | 98.08 | 100.00 | - | -
Vineyard | 50.00 | 100.00 | 96.78 | 100.00 | 88.47 | 27.96 | 94.24 | 100.00
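For clarity, the covered-records metric reported in Table 6 can be computed as in the following sketch (our illustration; the rule antecedents and data are hypothetical): a record counts as covered if the antecedent of at least one mined rule matches it.

```python
# Sketch of the covered-records metric: the percentage of records matched by
# the antecedent of at least one mined rule. Data and rules are illustrative.

def covered_records(records, antecedents):
    def hit(rec, ant):
        return all(lo <= rec[a] <= hi for a, (lo, hi) in ant.items())
    covered = sum(any(hit(rec, ant) for ant in antecedents) for rec in records)
    return 100.0 * covered / len(records)

records = [{"x": 1.0, "y": 5.0}, {"x": 3.0, "y": 2.0}, {"x": 9.0, "y": 7.0}]
rules = [{"x": (0.0, 4.0)}, {"y": (6.0, 6.5)}]
print(f"{covered_records(records, rules):.2f}% of records covered")
```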
The number of covered records according to the rules mined by the algorithms within the datasets is shown in Table 6. According to these results, EARMGA and the algorithm proposed by Alcalá-Fdez et al. [32] have shown better performance in terms of covered records; both have performed best in eight out of eleven datasets. By performing more sensitivity analysis on the algorithm parameters, more successful results may be obtained. Due to the stochastic characteristics of intelligent optimization algorithms, there is not a unique best algorithm for all the datasets considering all metrics. This is coherent with the no-free-lunch theorem, which states that there is no general-purpose universal optimization algorithm that outperforms all other methods. If one intelligent optimization algorithm is specialized to the structure of the specific problem under consideration, it can outperform the other algorithms. The obtained numbers of frequent itemsets within the datasets are listed in Table 7. EARMGA, GENAR, and the approach proposed by Alatas and Akin [28] directly mine the rules without generating frequent itemsets; hence they have been excluded from this table.

Table 7 Number of frequent itemsets
Apriori [1]
GAR [16]
Genetic Fuzzy Apriori [33]
Genetic Fuzzy AprioriDC [34]
66 11103 707 568 14730 417 39 508 615 -
100 100 100 100 100 100 100 100 100
64 32564 97 155 8076 53 31 86 41 -
177 347 639 259 49 1037 3215 -
Jo
Algorithm Dataset Basketball Bodyfat Bolts Longley Pollution Pwlinear Quake Sleep Stock price Triazines
412 1720 7196 530 132 7070 198206 -
Journal Pre-proof Vineyard
203
47
100
44
88
Table 8 shows the average time in seconds for the algorithms to mine numerical association rules within each dataset. GENAR seems to be the best algorithm within six out of eleven datasets. Apriori is the second best algorithm among the other algorithms. The intelligent optimization algorithms have outperformed the Apriori based approaches in the other metrics; however, the relatively better performance of Apriori with respect to the time metric is due to its deterministic characteristics.

Table 8 Time (s)
Dataset | [28] | [32] | Apriori [1] | EARMGA [23] | GAR [16] | GENAR [17] | Genetic Fuzzy Apriori [33] | Genetic Fuzzy AprioriDC [34]
Basketball | 1.447 | 0.829 | 1.440 | 0.161 | 0.736 | 0.040 | 0.418 | 0.724
Bodyfat | 0.835 | - | 0.056 | 0.380 | 1.115 | 0.280 | 8.254 | -
Bolts | 0.398 | 2.904 | 1.416 | 0.252 | 2.654 | 0.121 | 0.447 | 14.741
Longley | 0.224 | 1.116 | 0.199 | 0.167 | 8.866 | 0.081 | 0.300 | 0.560
Pollution | 0.408 | - | 0.151 | 0.263 | 1.577 | 0.175 | 2.308 | -
Pwlinear | 0.778 | 1.200 | 1.425 | 0.389 | - | 0.167 | 0.278 | 0.341
Quake | - | 1.325 | 0.083 | 1.554 | - | 0.684 | 0.549 | 2.213
Sleep | - | 1.805 | 0.122 | - | - | - | 3.510 | 2.247
Stock price | 0.966 | 12.372 | 0.199 | 0.017 | 5.069 | 0.016 | 0.383 | 8.841
Triazines | - | - | - | 894.274 | 54.352 | 0.306 | - | -
Vineyard | 0.017 | 0.016 | 0.404 | 0.493 | 1.120 | 0.255 | 0.409 | 0.679
Table 9-Table 19 demonstrate sample rules mined by the algorithms within the eleven datasets. The first column, Alg, gives the name of the algorithm, and Rule gives the mined association rule. Conf, Sup, and Lift show the confidence, support, and lift values of the rules, respectively. For the fuzzy rules, lift values have not been included in these tables. GENAR seems to have the worst comprehensibility according to the rules within the used datasets; the number of attributes in the numerical association rules obtained from GENAR is the highest among the algorithms.

Table 9 Sample rules mined from the algorithms within the Basketball dataset
Alg | Rule | Conf | Sup | Lift
[28] | points ∉ [0.6319, 0.829] ⇒ assists ∉ [0.2772, 0.3436] ᴧ age ∈ [22, 37] | 1.00 | 1.00 | 1.00
[32] | assists ∈ [low] ᴧ age ∈ [high] ⇒ height ∈ [middle] | 1.00 | 0.15 | -
[1] | assists ∈ [0.19655, 0.270125] ᴧ age ∈ [25.75, 29.5] ⇒ height ∈ [181.5, 192.25] | 0.93 | 0.14 | 4.06
[23] | points ∈ [0.1593, 0.4942] ⇒ time ∈ [10.08, 40.71] | 1.00 | 0.79 | 1.00
[16] | points ∈ [0.2938, 0.5777] ⇒ height ∈ [183, 198] | 0.89 | 0.75 | 1.02
[17] | assists ∈ [0.09132, 0.2384] ᴧ height ∈ [183, 203] ᴧ time_played ∈ [20.2425, 35.5575] ᴧ age ∈ [22, 28] ⇒ points ∈ [0.26705, 0.60195] | 1.00 | 0.28 | 1.11
[33] | assists ∈ [low] ⇒ height ∈ [high] | 0.81 | 0.27 | -
[34] | assists ∈ [middle] ᴧ height ∈ [middle] ᴧ time_played ∈ [high] ⇒ points ∈ [middle] | 0.97 | 0.11 | -

Table 10 Sample rules mined from the algorithms within the Bodyfat dataset
Alg | Rule | Conf | Sup | Lift
[28] | neck ∉ [43.9001, 51.2] ᴧ chest ∈ [83.4, 121.6] ᴧ knee ∉ [35.5001, 35.5999] ᴧ forearm ∈ [23.1, 33.8] ⇒ biceps ∉ [39.10, 45.0] ᴧ wrist ∈ [16.1, 20.9] ᴧ class ∉ [40.10, 47.4999] | 1.00 | 0.98 | 1.02
[1] | density ∈ [1.05195, 1.080425] ⇒ height ∈ [65.6875, 77.75] | 1.00 | 0.47 | 2.16
[23] | forearm ∈ [21.0, 27.95] ⇒ class ∈ [0.0, 47.5] | 1.00 | 0.37 | 1.00
[16] | thigh ∈ [52.1984, 66.5827] ⇒ weight ∈ [138.9796, 224.9768] | 0.99 | 0.83 | 1.11
[17] | density ∈ [1.033825, 1.090775] ᴧ age ∈ [32.25, 61.75] ᴧ weight ∈ [118.5, 212.6625] ᴧ height ∈ [54.6875, 77.75] ᴧ neck ∈ [31.475, 41.525] ᴧ chest ∈ [84.175, 112.625] ᴧ abdomen ∈ [69.4, 105.775] ᴧ hip ∈ [85.0, 110.875] ᴧ thigh ∈ [49.175, 69.225] ᴧ knee ∈ [33.675, 41.725] ᴧ ankle ∈ [19.1, 25.2] ᴧ biceps ∈ [27.3499, 37.4499] ᴧ forearm ∈ [24.9249, 31.875] ᴧ wrist ∈ [16.4, 19.2] ⇒ class ∈ [5.02499, 28.775] | 1.00 | 0.47 | 1.19
[33] | weight ∈ [low] ⇒ hip ∈ [low] | 0.91 | 0.56 | -

Table 11 Sample rules mined from the algorithms within the Bolts dataset
Alg | Rule | Conf | Sup | Lift
[28] | run ∉ [26.0, 27.0] ᴧ speed2 ∉ [1.5001, 2.4999] ᴧ sens ∉ [7.0, 9.0] ⇒ number2 ∈ [0.0, 2.0] | 1.00 | 0.75 | 1.00
[32] | t20bolt ∈ [middle] ⇒ time ∈ [low] | 0.99 | 0.50 | -
[1] | t20bolt ∈ [7.32, 27.825] ⇒ time ∈ [3.94, 36.4575] | 1.00 | 0.63 | 1.30
[23] | time ∈ [3.94, 68.975] ᴠ [101.49249, 134.01] ⇒ speed2 ∈ [1.5, 2.0] ᴠ [2.25, 2.5] | 1.00 | 0.98 | 1.00
[16] | time ∈ [5.1752, 40.3566] ⇒ sens ∈ [6.0, 10.0] | 1.00 | 0.78 | 1.12
[17] | run ∈ [1.0, 19.0] ᴧ speed1 ∈ [3.0, 5.0] ᴧ total ∈ [15.0, 25.0] ᴧ speed2 ∈ [1.75, 2.25] ᴧ number2 ∈ [1.0, 1.0] ᴧ sens ∈ [6.0, 10.0] ᴧ time ∈ [3.94, 48.0975] ⇒ t20bolt ∈ [7.32, 36.085] | 1.00 | 0.15 | 1.43
[33] | total ∈ [low] ⇒ time ∈ [low] | 1.00 | 0.26 | -
[34] | t20bolt ∈ [low] ⇒ time ∈ [low] | 0.98 | 0.48 | -

Table 12 Sample rules mined from the algorithms within the Longley dataset
Alg | Rule | Conf | Sup | Lift
[28] | year ∉ [1949.0, 1962.0] ⇒ unemployed ∉ [2357.0, 4806.0] | 1.00 | 0.13 | 3.21
[32] | gnp ∈ [low] ⇒ population ∈ [low] | 0.99 | 0.43 | -
[1] | gnp ∈ [234289.0, 314440.25] ⇒ deflator ∈ [83.0, 91.475] | 1.00 | 0.25 | 4.00
[23] | employed ∈ [1.5, 2.0] ᴠ [2.25, 2.5] ⇒ unemployed ∈ [1870.0, 4806.0] | 1.00 | 0.57 | 1.00
[16] | year ∈ [1952.0, 1954.0] ⇒ gnp ∈ [346996.6894, 365755.9439] | 1.00 | 0.19 | 5.34
[17] | deflator ∈ [92.725, 109.675] ᴧ gnp ∈ [317317.75, 477620] ᴧ unemployed ∈ [2170, 3638] ᴧ armed_forces ∈ [2514, 3582] ᴧ population ∈ [111770, 123006] ᴧ year ∈ [1952, 1958] ⇒ employed ∈ [63424, 68614] | 1.00 | 0.25 | 2.29
[33] | population ∈ [high] ⇒ deflator ∈ [high] | 0.99 | 0.32 | -
[34] | deflator ∈ [high] ⇒ armed_forces ∈ [middle] | 0.97 | 0.31 | -

Table 13 Sample rules mined from the algorithms within the Pollution dataset
Alg | Rule | Conf | Sup | Lift
[28] | wwdrk ∈ [51.9, 51.9] ᴧ so ∉ [3.0001, 278.0] ⇒ educ ∉ [9.0, 12.1999] | 1.00 | 0.02 | 1.20
[1] | jant ∈ [25.75, 39.5] ᴧ nox ∈ [1.0, 80.5] ⇒ hc ∈ [1.0, 162.75] | 1.00 | 0.57 | 1.77
[23] | humid ∈ [46.75, 73.0] ⇒ nox ∈ [1.0, 80.5] ᴠ [160.0, 319.0] | 1.00 | 0.99 | 1.00
[16] | jant ∈ [22.7769, 40.1764] ⇒ hc ∈ [3.7535, 67.7596] | 0.98 | 0.74 | 1.13
[17] | prec ∈ [23.5, 48.5] ᴧ jant ∈ [14.25, 41.75] ᴧ jult ∈ [65.5, 76.5] ᴧ ovr65 ∈ [7.25, 10.35] ᴧ popn ∈ [3.1375, 3.4425] ᴧ educ ∈ [10.575, 12.225] ᴧ hous ∈ [75.525, 87.475] ᴧ dens ∈ [1643.3739, 5570.3961] ᴧ nonw ∈ [0.8, 18.225] ᴧ wwdrk ∈ [38.825, 51.775] ᴧ poor ∈ [9.4, 16.45] ᴧ hc ∈ [1.0, 165.75] ᴧ nox ∈ [1.0, 94.5] ᴧ so ∈ [1.0, 128.25] ᴧ humid ∈ [49.25, 66.75] ⇒ mort ∈ [842.6283, 1003.8398] | 1.00 | 0.24 | 1.23
[33] | nox ∈ [low] ⇒ hc ∈ [low] | 0.97 | 0.38 | -

Table 14 Sample rules mined from the algorithms within the Pwlinear dataset
Alg | Rule | Conf | Sup | Lift
[28] | a5 ∉ [1.0E-4, 0.9999] ᴧ a6 ∉ [-0.9999, -1.0E-4] ⇒ a1 ∉ [-0.9999, 0.9999] ᴧ a9 ∉ [1.0E-4, 0.9999] ᴧ a10 ∉ [-0.9999, -1.0E-4] | 1.00 | 1.00 | 1.00
[32] | a6 ∈ [middle] ⇒ class ∈ [middle] | 0.85 | 0.29 | -
[1] | class ∈ [-10.81, -5.085] ⇒ a1 ∈ [0.5, 1.0] | 1.00 | 0.15 | 1.87
[23] | a7 ∈ [-1.0, 1.0] ⇒ a2 ∈ [-1.0, 0.0] ᴠ [0.5, 1.0] | 1.00 | 1.00 | 1.00
[17] | a1 ∈ [-1.0, -0.5] ᴧ a2 ∈ [-1.0, -0.5] ᴧ a3 ∈ [-1.0, -0.5] ᴧ a4 ∈ [0.5, 1.0] ᴧ a5 ∈ [0.5, 1.0] ᴧ a6 ∈ [-0.5, 0.5] ᴧ a7 ∈ [-1.0, -0.5] ᴧ a8 ∈ [-0.5, 0.5] ᴧ a9 ∈ [-0.5, 0.5] ᴧ a10 ∈ [-0.5, 0.5] ⇒ class ∈ [-8.075, 3.375] | 1.00 | 0.01 | 1.31
[33] | a3 ∈ [low] ⇒ class ∈ [middle] | 0.81 | 0.13 | -
[34] | class ∈ [low] ⇒ a1 ∈ [high] | 0.90 | 0.22 | -

Table 15 Sample rules mined from the algorithms within the Quake dataset
Alg | Rule | Conf | Sup | Lift
[32] | latitude ∈ [middle] ⇒ focal_depth ∈ [low] | 0.91 | 0.59 | -
[1] | richter ∈ [5.8, 6.075] ⇒ focal_depth ∈ [0.0, 164.0] | 0.91 | 0.66 | 1.01
[23] | richter ∈ [5.8, 6.075] ⇒ latitude ∈ [-66.49, 78.15] | 1.00 | 0.73 | 1.00
[17] | focal_depth ∈ [0.0, 256.0] ᴧ latitude ∈ [-17.1099, 55.2099] ᴧ longtitude ∈ [-7.71, 172.27] ⇒ richter ∈ [5.8, 6.3204] | 0.96 | 0.56 | 1.01
[33] | richter ∈ [low] ⇒ focal_depth ∈ [low] | 0.86 | 0.50 | -
[34] | richter ∈ [low] ⇒ focal_depth ∈ [low] | 0.88 | 0.54 | -

Table 16 Sample rules mined from the algorithms within the Sleep dataset
Alg | Rule | Conf | Sup | Lift
[32] | brain_weight ∈ [low] ⇒ body_weight ∈ [low] | 1.00 | 0.93 | -
[1] | body_weight ∈ [0.005, 1663.5037] ⇒ brain_weight ∈ [0.14, 1428.105] | 1.00 | 0.97 | 1.04
[33] | brain_weight ∈ [low] ᴧ max_life_span ∈ [low] ⇒ body_weight ∈ [low] | 0.98 | 0.25 | -
[34] | body_weight ∈ [low] ⇒ brain_weight ∈ [low] | 0.98 | 0.87 | -

Table 17 Sample rules mined from the algorithms within the Stock price dataset
Alg | Rule | Conf | Sup | Lift
[28] | company5 ∈ [36.5, 36.5] ᴧ company10 ∈ [57.25, 57.25] ⇒ company4 ∈ [42.875, 42.875] | 1.00 | 0.01 | 105.56
[32] | company7 ∈ [middle] ⇒ company4 ∈ [middle] | 0.91 | 0.58 | -
[1] | company2 ∈ [50.0, 60.25] ᴧ company5 ∈ [60.9375, 77.5312] ᴧ company10 ∈ [34.0, 41.0] ⇒ company1 ∈ [17.219, 28.2892] | 0.98 | 0.19 | 3.48
[23] | company3 ∈ [12.75, 15.84375] ᴠ [18.9375, 25.125] ⇒ company8 ∈ [16.375, 29.25] | 1.00 | 0.78 | 1.00
[16] | company3 ∈ [19.5939, 22.6336] ⇒ company2 ∈ [49.1171, 59.1645] | 0.89 | 0.39 | 1.80
[17] | company1 ∈ [17.219, 33.35125] ᴧ company2 ∈ [44.75, 60.25] ᴧ company3 ∈ [17.53125, 23.71875] ᴧ company4 ∈ [36.5625, 49.4375] ᴧ company5 ∈ [48.15625, 81.34375] ᴧ company6 ∈ [23.96875, 34.53125] ᴧ company7 ∈ [58.9375, 73.5625] ᴧ company8 ∈ [16.375, 22.09375] ᴧ company9 ∈ [38.875, 49.625] ⇒ company10 ∈ [34.0, 46.625] | 1.00 | 0.25 | 1.99
[33] | company1 ∈ [middle] ᴧ company9 ∈ [middle] ᴧ company10 ∈ [low] ⇒ company8 ∈ [middle] | 0.93 | 0.12 | -
[34] | company3 ∈ [middle] ᴧ company6 ∈ [middle] ⇒ company9 ∈ [middle] | 0.93 | 0.26 | -

Table 18 Sample rules mined from the algorithms within the Triazines dataset
Alg | Rule | Conf | Sup | Lift
[23] | p1_polar ∈ [0.1, 0.3] ⇒ p5_flex ∈ [0.0, 0.0] | 1.00 | 0.48 | 1.00
[16] | p1_polar ∈ [0.4749, 0.6555] ⇒ p1_size ∈ [0.2073, 0.4068] | 1.00 | 0.33 | 2.59
[17] | p1_polar ∈ [0.1, 0.3] ᴧ p1_size ∈ [0.1, 0.3] ᴧ p1_flex ∈ [0.1, 0.3] ᴧ p1_h_doner ∈ [0.1, 0.3] ᴧ p1_h_acceptor ∈ [0.1, 0.3] ᴧ p1_pi_doner ∈ [0.1, 0.3] ᴧ p1_pi_acceptor ∈ [0.1, 0.3] ᴧ p1_polarisable ∈ [0.1, 0.3] ᴧ p1_sigma ∈ [0.1, 0.3] ᴧ p1_branch ∈ [0.1, 0.3] ᴧ p2_polar ∈ [0.1, 0.3] ᴧ p2_size ∈ [0.1, 0.3] ᴧ p2_flex ∈ [0.1, 0.3] ᴧ p2_h_doner ∈ [0.1, 0.3] ᴧ p2_h_acceptor ∈ [0.1, 0.3] ᴧ p2_pi_doner ∈ [0.1, 0.3] ᴧ p2_pi_acceptor ∈ [0.1, 0.3] ᴧ p2_polarisable ∈ [0.1, 0.3] ᴧ p2_sigma ∈ [0.1, 0.3] ᴧ p2_branch ∈ [0.1, 0.3] ᴧ p3_polar ∈ [0.1, 0.3] ᴧ p3_size ∈ [0.1, 0.3] ᴧ p3_flex ∈ [0.1, 0.3] ᴧ p3_h_doner ∈ [0.1, 0.3] ᴧ p3_h_acceptor ∈ [0.1, 0.3] ᴧ p3_pi_doner ∈ [0.1, 0.3] ᴧ p3_pi_acceptor ∈ [0.1, 0.3] ᴧ p3_polarisable ∈ [0.1, 0.3] ᴧ p3_sigma ∈ [0.1, 0.3] ᴧ p3_branch ∈ [0.1, 0.3] ᴧ p4_polar ∈ [0.3, 0.7] ᴧ p4_size ∈ [0.40, 0.8] ᴧ p4_flex ∈ [0.243, 0.643] ᴧ p4_h_doner ∈ [0.1, 0.3] ᴧ p4_h_acceptor ∈ [0.167, 0.567] ᴧ p4_pi_doner ∈ [0.1, 0.3] ᴧ p4_pi_acceptor ∈ [0.1, 0.3] ᴧ p4_polarisable ∈ [0.3, 0.7] ᴧ p4_sigma ∈ [0.1, 0.3] ᴧ p4_branch ∈ [0.3, 0.7] ᴧ p5_polar ∈ [0.1, 0.3] ᴧ p5_size ∈ [0.1, 0.3] ᴧ p5_flex ∈ [0.0, 0.0] ᴧ p5_h_doner ∈ [0.0, 0.0] ᴧ p5_h_acceptor ∈ [0.1, 0.3] ᴧ p5_pi_doner ∈ [0.1, 0.3] ᴧ p5_pi_acceptor ∈ [0.1, 0.3] ᴧ p5_polarisable ∈ [0.1, 0.3] ᴧ p5_sigma ∈ [0.1, 0.3] ᴧ p5_branch ∈ [0.1, 0.3] ᴧ p6_polar ∈ [0.1, 0.3] ᴧ p6_size ∈ [0.1, 0.3] ᴧ p6_flex ∈ [0.1, 0.3] ᴧ p6_h_doner ∈ [0.1, 0.3] ᴧ p6_h_acceptor ∈ [0.1, 0.3] ᴧ p6_pi_doner ∈ [0.1, 0.3] ᴧ p6_pi_acceptor ∈ [0.1, 0.3] ᴧ p6_polarisable ∈ [0.1, 0.3] ᴧ p6_sigma ∈ [0.1, 0.3] ᴧ p1_branch ∈ [0.1, 0.3] ⇒ activity ∈ [0.453, 0.853] | 1.00 | 0.05 | 1.18

Table 19 Sample rules mined from the algorithms within the Vineyard dataset
Alg | Rule | Conf | Sup | Lift
[28] | row_number ∉ [48.0, 49.0] ⇒ lugs_1990 ∉ [12.0001, 12.4999] | 1.00 | 0.97 | 1.00
[32] | row_number ∈ [middle] ᴧ lugs_1990 ∈ [high] ⇒ lugs_1991 ∈ [high] | 0.98 | 0.41 | -
[1] | row_number ∈ [39.25, 52.0] ⇒ lugs_1989 ∈ [0.0, 2.0] | 1.00 | 0.25 | 4.00
[23] | lugs_1989 ∈ [0.0, 2.0] ᴠ [4.0, 6.0] ⇒ lugs_1990 ∈ [2.5, 14.0] | 1.00 | 0.77 | 1.00
[16] | lugs_1990 ∈ [7.8916, 13.1467] ⇒ lugs_1991 ∈ [14.7244, 23.0200] | 0.93 | 0.74 | 1.13
[17] | row_number ∈ [2.0, 27.0] ᴧ lugs_1989 ∈ [2.5, 6.5] ᴧ lugs_1990 ∈ [7.625, 13.375] ⇒ lugs_1991 ∈ [16.125, 26.0] | 1.00 | 0.49 | 1.45
[33] | row_number ∈ [high] ⇒ lugs_1989 ∈ [low] | 0.95 | 0.20 | -
[34] | row_number ∈ [high] ⇒ lugs_1989 ∈ [low] | 0.92 | 0.27 | -
4. Conclusions

In this paper, a study on the performances of evolutionary and fuzzy evolutionary intelligent algorithms for the NARM problem has been presented. The performances of seven intelligent algorithms and the Apriori algorithm in terms of support, confidence, number of mined rules, number of covered records, and time metrics have been comparatively analyzed within eleven real-world datasets that have numeric-valued attributes. One of the best mined rules obtained by each algorithm has also been given and analyzed with respect to the confidence, support, and lift metrics. By evaluating the results over the eleven real-world numerical datasets, the following conclusions about the effectiveness of these methods can be deduced:

Evolutionary intelligent algorithms have better performances in terms of the support, confidence, and time metrics. Although the algorithm proposed by Alatas and Akin [28] has not produced any rule within three datasets according to the selected algorithm parameters, this method has performed best in five out of eleven datasets with respect to average support. EARMGA seems to be the best algorithm in terms of obtained average confidence values, in ten out of eleven numerical datasets, according to the determined standard minimum confidence value. GAR and the algorithm proposed by Alatas and Akin [28] have mined fewer rules than the other intelligent algorithms and Apriori with respect to the eleven datasets, and these algorithms seem to find more comprehensible rules. GENAR seems to be the best algorithm within eight out of eleven datasets with respect to the time metric. However, this algorithm seems to have the worst comprehensibility according to the rules within the used datasets; the number of attributes in the numerical association rules obtained from GENAR is the highest among all the algorithms.

According to the obtained results, intelligent search and optimization algorithms seem to be the best alternatives for the complex numerical association rule mining problem. The search space is very large, and there are many metrics that should be optimized for this type of problem. The rule discovery process is also efficiently automated by the intelligent algorithms, eliminating the prior need for many metrics and for preprocessing of the data; the attribute interaction problem is also eliminated. Due to the promising results and high performance of the intelligent algorithms on the NARM problem, many new and efficient intelligent search and optimization algorithms can be adapted for better results. Their multiobjective versions can also be proposed for simultaneously satisfying the different objectives of numerical association rules. Recent intelligent optimization algorithms in the literature can also be hybridized to provide better results for this problem.

References
[1] Srikant R, Agrawal R (1996) Mining quantitative association rules in large relational tables. In: Proceedings of ACM SIGMOD, pp. 1–12
[2] Lian W, Cheung DW, Yiu SM (2005) An efficient algorithm for finding dense regions for mining quantitative association rules. Computers & Mathematics with Applications 50(3-4):471-490
[3] Lent B, Swami A, Widom J (1997) Clustering association rules. In: Proceedings of the IEEE International Conference on Data Engineering, pp. 220–231
[4] Guo Y, Yang J, Huang Y (2008) An effective algorithm for mining quantitative association rules based on high dimension cluster. In: Fourth IEEE International Conference on Wireless Communications, Networking and Mobile Computing, pp. 1–4
[5] Junrui Y, Feng Z (2010) An effective algorithm for mining quantitative associations based on subspace clustering. In: 2nd International Conference on Networking and Digital Society (ICNDS), IEEE, 1:175–178
[6] Abood AM, Hussein MS, Namdar JH (2018) Data mining using association rules with fuzzy logic. Scholars Press
[7] Prakash S, Parvathi R (2011) Qualitative approach for quantitative association rule mining using fuzzy rule set. Journal of Computational Information Systems 7(6):1879–1885
[8] Watanabe T, Takahashi H (2011) A study on quantitative association rules mining algorithm based on clustering algorithm. International Journal of Biomedical Soft Computing and Human Sciences 16(2):59-67
[9] Aumann Y, Lindell Y (1999) A statistical theory for quantitative association rules. In: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 261-270
[10] Alatas B, Akin E (2009) Chaotically encoded particle swarm optimization algorithm and its applications. Chaos, Solitons & Fractals 41(2):939-950
[11] Alatas B, Akin E (2008) Rough particle swarm optimization and its applications in data mining. Soft Computing 12(12):1205-1218
[12] Yan D, Zhao X, Lin R, Bai D (2018) PPQAR: Parallel PSO for quantitative association rule mining. In: IEEE International Conference on Big Data and Smart Computing (BigComp), pp. 163-169
[13] Agbehadji IE, Fong S, Millham R (2016) Wolf search algorithm for numeric association rule mining. In: IEEE International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), pp. 146-151
[14] Holland JH (1975) Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI
[15] Zhang K, Du H, Feldman MW (2017) Maximizing influence in a social network: Improved results using a genetic algorithm. Physica A: Statistical Mechanics and its Applications 478:20-30
[16] Mata J, Alvarez JL, Riquelme JC (2002) Discovering numeric association rules via evolutionary algorithm. Lecture Notes in Artificial Intelligence 2336:40–51
[17] Mata J, Alvarez JL, Riquelme JC (2001) Mining numeric association rules with genetic algorithms. In: Artificial Neural Nets and Genetic Algorithms, Springer, Vienna, pp. 264-267
[18] Álvarez VP, Vázquez JM (2012) An evolutionary algorithm to discover quantitative association rules from huge databases without the need for an a priori discretization. Expert Systems with Applications 39(1):585-593
[19] Salleb-Aouissi A, Vrain C, Nortet C, Kong X, Rathod V, Cassard D (2013) QuantMiner for mining quantitative association rules. The Journal of Machine Learning Research 14(1):3153-3157
[20] Seki H, Nagao M (2017) An efficient Java implementation of a GA-based miner for relational association rules with numerical attributes. In: IEEE International Conference on Systems, Man, and Cybernetics (SMC), pp. 2028-2033
[21] Kumar P, Singh AK (2019) Efficient generation of association rules from numeric data using genetic algorithm for smart cities. In: Security in Smart Cities: Models, Applications, and Challenges, Springer, Cham, pp. 323-343
[22] Martínez-Ballesteros M, Troncoso A, Martínez-Álvarez F, Riquelme JC (2010) Mining quantitative association rules based on evolutionary computation and its application to atmospheric pollution. Integrated Computer-Aided Engineering 17(3):227-242
[23] Yan X, Zhang C, Zhang S (2009) Genetic algorithm-based strategy for identifying association rules without specifying actual minimum support. Expert Systems with Applications 36(2):3066-3076
[24] Alcala-Fdez J, Flugy-Pape N, Bonarini A, Herrera F (2010) Analysis of the effectiveness of the genetic algorithms based on extraction of association rules. Fundamenta Informaticae 98(1):1-14
[25] Storn R, Price K (1995) Differential Evolution: A simple and efficient adaptive scheme for global optimization over continuous spaces. Technical Report TR-95-012, International Computer Science Institute, Berkeley
[26] Zhao Y, Li M, Lu X, Tian L, Yu Z, Huang K, Li T (2017) Optimal layout design of obstacles for panic evacuation using differential evolution. Physica A: Statistical Mechanics and its Applications 465:175-194
[27] Fister I, Iglesias A, Galvez A, Del Ser J, Osaba E (2018) Differential evolution for association rule mining using categorical and numerical attributes. In: International Conference on Intelligent Data Engineering and Automated Learning, Springer, Cham, pp. 79-88
[28] Alatas B, Akin E (2006) An efficient genetic algorithm for automated mining of both positive and negative quantitative association rules. Soft Computing 10(3):230-237
[29] Yang G (2010) Mining association rules from data with hybrid attributes based on immune genetic algorithm. In: IEEE Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), 3:1446-1449
[30] Can U, Alatas B (2017) Automatic mining of quantitative association rules with gravitational search algorithm. International Journal of Software Engineering and Knowledge Engineering 27(3):343-372
[31] Guvenir HA, Uysal I (2000) Bilkent University function approximation repository. http://funapp.cs.bilkent.edu.tr/DataSets
[32] Alcalá-Fdez J, Alcalá R, Gacto MJ, Herrera F (2009) Learning the membership function contexts for mining fuzzy association rules by using genetic algorithms. Fuzzy Sets and Systems 160(7):905-921
[33] Hong TP, Chen CH, Wu YL, Lee YC (2006) A GA-based fuzzy mining approach to achieve a trade-off between number of rules and suitability of membership functions. Soft Computing 10(11):1091-1101
[34] Hong TP, Chen CH, Lee YC, Wu YL (2008) Genetic-fuzzy data mining with divide-and-conquer strategy. IEEE Transactions on Evolutionary Computation 12(2):252-265