Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 132 (2018) 1021–1040
www.elsevier.com/locate/procedia
International Conference on Computational Intelligence and Data Science (ICCIDS 2018)
Using Ensemble StackingC Method and Base Classifiers to Ameliorate Prediction Accuracy of Pedagogical Data
Mudasir Ashraf a,*, Majid Zaman b, Muheet Ahmed c
a Department of Computer Science, University of Kashmir, Srinagar, 190006, India
b Directorate of IT&SS, University of Kashmir, Srinagar, 190006, India
c Department of Computer Science, University of Kashmir, Srinagar, 190006, India
Abstract
Ensemble methods and conventional base classifiers have been applied effectively in educational data mining to improve the accuracy and consistency of prediction. In the present study, the researchers first ran experiments on a real pedagogical dataset acquired from the University of Kashmir, using the base classifiers J48, random forest and random tree, to predict the performance of students. In a later phase, the dataset was subjected to a more proficient version of stacking, namely stackingC, with the principal objective of improving the performance of students. Furthermore, filtering procedures, namely the synthetic minority oversampling technique (SMOTE) and the spread sub-sampling method, were applied to the dataset to check for any improvement in results. In the case of ensemble stackingC, the predicted outputs of the three base classifiers (J48, random forest and random tree) were combined, and the classifier achieved an accuracy of 95.65% in predicting the actual class of students. The findings further show that the stackingC classifier attained a prediction accuracy of 95.96% with undersampling (spread sub-sampling) and 96.11% with oversampling (SMOTE). As a corollary, researchers are encouraged to broaden the literature by employing analogous methods to uncover the diverse patterns hidden in academic datasets.
© 2018 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/3.0/)
Peer-review under responsibility of the scientific committee of the International Conference on Computational Intelligence and Data Science (ICCIDS 2018).
* Corresponding author. Tel.: +919797900540
E-mail address: [email protected]
ISSN 1877-0509
DOI: 10.1016/j.procs.2018.05.018
Keywords: Ensemble; RandomForest; RandomTree; SMOTE; StackingC;
1. Introduction
Over the last decade, researchers in educational data mining (EDM) have exploited several data mining techniques to discover hidden patterns in educational settings. With the proliferation of independent academic data sources, there is a growing demand for effective data mining techniques so that the performance of students can be improved at large. Performance prediction in particular is a foundation stone of fully personalised schooling environments and an important element in delivering quality education. Stakeholders are seeking to integrate predictive components into their pedagogical environments to support students. Furthermore, EDM and its associated tools have shown a significant positive effect on the advancement of students' careers. Consequently, the ability to predict the performance of students has become imperative in academic settings.
Ensembles have proved effective in the growth of data mining and machine learning. Classification is one of the approaches most frequently employed by researchers to distinguish between different categories of students, and data mining and machine learning methods are commonly used to predict student performance. However, past studies show that only meagre attempts have been made to apply ensembles in academia. It is therefore important for researchers to use ensembles to unearth potentially hidden information from academic sources, which can help stakeholders and administrators improve the performance of students. In EDM, ensemble methods, which integrate different base-level classifiers into one more precise classifier, are a new trend in predicting the performance of students.
In this paper, we focus on predicting the performance of students by analysing different attributes of an academic dataset using stackingC, an approach that is novel in the educational field. StackingC is an improved version of the stacking method; here, the base classifiers J48, random tree and random forest were combined and their outputs fed to a meta classifier (linear regression in this case) to predict the outcome of students. The dataset under investigation was acquired from the University of Kashmir. Before the ensemble approach was applied, each base classifier was run individually on the dataset, and its performance was then contrasted with that of the stackingC approach. The dataset was also subjected to oversampling and undersampling to verify whether either produced a substantial improvement in predicting the performance of students. Accordingly, the present study makes a deliberate attempt to improve the prediction of student outcomes through ensemble stackingC, individual base classifiers, oversampling and undersampling. The classifiers were evaluated on several performance metrics, including true positive rate (TP rate), false positive rate (FP rate), precision, recall and error rate.
2. Literature Review
Several studies have centred on predicting the performance of students from characteristics such as income, family background, demography, marks and course. Research has also been conducted to identify the variables needed to improve the teaching and learning behaviour of students.
Halees (2009) conducted a study to identify, discover and estimate variables that explain the learning behaviour of students [1]. Panday and Pal (2011) applied Bayes classification to forecast the performance of students from attributes such as background qualifications, category and language [2]. Hijazi and Naqvi employed linear
regression on an academic dataset with a sample size of 300 students. They analysed diverse components of the dataset, including family income, mother's literacy, mother's age, attendance and hours spent studying, and concluded that factors such as mother's literacy and family income correlate strongly with, and therefore have a significant impact on, student achievement. Nasiri, Minaei and Vafaei (2012) applied decision trees and naive Bayes to examine attributes linked with student performance and thereby boost the rate of student retention [3]. Acharaya and Sinha (2014) employed four classification techniques, namely decision tree, artificial neural networks, support vector machines and Bayesian networks, on computer science records for early prediction of student performance; of these, decision tree and support vector machines gave the most significant results [4]. Abu Saa applied several mining techniques, such as naive Bayes and various kinds of decision tree, to construct quantitative predictive models that produced proficient results in determining student performance; the study also reported significant results for CART. Jindal and Borah (2015) applied several kinds of decision tree, including C5.0, C4.5-A1 and C4.5-A2, to predict student achievement in higher education [5]. They also applied neural networks to the same educational dataset and compared the results of the tree variants with those of the neural networks. C5.0 attained a high accuracy of 99.95% in classifying the resultant class.
Data mining methods have been used extensively by researchers to predict the performance of students. Amrieh, Hamrini and Aljarah (2016) conducted a study in which a combination of base and meta classifiers was applied to a dataset to predict the final class of students and thereby reduce the failure ratio [6]. The base classifiers comprised naive Bayes, artificial neural networks and decision tree, whereas the meta classifiers comprised boosting, bagging and random forest. Both groups were applied with the principal objective of finding the best prediction accuracy for the output class of a student, so that academic stakeholders and administrators can take appropriate action at the right time. Tripti, Sangeeta and Darminder (2014) employed classification methods including random tree and J48 on an academic dataset of MCA (Master of Computer Applications) students. They found that random tree performed considerably better than J48 in terms of both time (the time taken by the classifier to produce its output) and accuracy in forecasting student performance [7]. Phung et al. (2014) used random forests to categorise class instances and found them highly productive [8]. In addition, a general survey on predicting student failure defined an evolutionary algorithm known as interpretable classification rule mining [9]. That study used grammar-based genetic programming, whereas [10] investigated ten contemporary algorithms for decision trees and rule induction. A study based on genetic programming is attractive, but it can be very costly on account of intricate parameter adjustments.
Abeer and Elaraby (2014) applied the ID3 algorithm to identify students at risk and thereby predict their final outcome [11]. Kumar and Vijayalakshmi (2011) used a decision tree to classify students with weak academic backgrounds and students with outstanding achievement in order to predict their performance [12]. Pandey and Sharma (2013) employed decision trees such as J48, NBTree, REPTree and CART to predict student performance, and evaluated the classifiers with cross-validation and percentage-split methods to achieve better results [13].
Ensembles are hybridisations of multiple base classifiers in which the final classification depends on the combined output of the individual base learners [14, 15]. These methods are a powerful extension of data mining and machine learning techniques that synthesise multiple classifiers into a single, more accurate model [16]. An ensemble model is built with two major goals when blending predictions from multiple models. The first is to raise prediction accuracy above that of any single base classifier. The second is to reduce overfitting in the base classifiers and thereby improve stability and prediction accuracy. The underlying principle is that an ensemble method can choose a set of hypotheses from a wide hypothesis space and blend their predictions into a single prediction [17]. Rahman and Tasnim (2014) describe base classifiers as the individual classifiers used to construct meta classifiers [18]. Ensemble classifiers yield better results than individual classifiers provided that the base learners are diverse and accurate [19]. Furthermore, ensemble techniques
perform exceptionally well when the individual classifiers disagree in producing the final classes. Various techniques have been put forward to induce randomisation in ensemble methods, including bagging [20] and boosting [21]. Both impart randomisation by manipulating the training data given to each base classifier.
The review above shows that a substantial amount of work has been carried out in academia using various data mining approaches. However, very few studies have applied ensemble methods to predict the performance of students in an educational setting. A recent study employing ensembles was conducted at the University of Jordan; its findings add weight to research on how demographic attributes, academic background, parental participation and behavioural factors predict academic achievement. Because the literature shows a clear deficiency in the application of ensembles to pedagogical datasets, ensembles should be adopted in current work on mining data from educational repositories. Ensembles are also considered more accurate predictors than conventional data mining methods. Furthermore, no significant effort has been made to study students' cognitive awareness, intrinsic motivation towards learning and emotional cognition with regard to their impact on academic achievement.
3. Proposed Method
Classification is a method for finding a set of models that characterise and differentiate data classes with the intention of predicting unknown class labels [22].
In this paper, both the ensemble stackingC (3) classifier and conventional classification techniques are applied to the educational dataset from the University of Kashmir. The ensemble stackingC (3) in the present study comprises three base-level classifiers, J48, random tree and random forest, whose outputs are combined to obtain improved prediction results. A filtered dataset was obtained by pre-processing the original dataset. The pre-processed data was then classified with a number of classification algorithms, of which the three best-performing classifiers were chosen to produce better prediction accuracy on test data.
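To make the meta-level combination concrete, the following is a minimal sketch of how a stackingC-style meta learner could be fitted, assuming the commonly described formulation in which one linear regression is trained per class and each regression sees only the probabilities that the base classifiers assign to that class. The function names, the small ridge term and the toy probabilities are illustrative assumptions, not the authors' implementation or data; in practice the base-level probabilities would come from cross-validated J48, random tree and random forest models.

```python
from typing import Dict, List

CLASSES = ["Division I", "Division II", "Division III"]

def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_linear_regression(rows: List[List[float]], targets: List[float],
                          ridge: float = 1e-6) -> List[float]:
    """Least squares with intercept; the tiny ridge term is an assumption
    added only to keep the normal equations solvable on toy data."""
    feats = [[1.0] + list(r) for r in rows]
    d = len(feats[0])
    xtx = [[sum(f[i] * f[j] for f in feats) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(f[i] * t for f, t in zip(feats, targets)) for i in range(d)]
    return solve_linear_system(xtx, xty)

def fit_stackingc_meta(base_probs, labels) -> Dict[str, List[float]]:
    """base_probs[i][k][c] is the probability of class index c given by
    base classifier k for instance i. One regression is fitted per class,
    using only that class's probabilities as features."""
    models = {}
    for c_idx, c in enumerate(CLASSES):
        rows = [[probs_k[c_idx] for probs_k in inst] for inst in base_probs]
        targets = [1.0 if y == c else 0.0 for y in labels]
        models[c] = fit_linear_regression(rows, targets)
    return models

def predict_stackingc(models, inst) -> str:
    """Predict the class whose regression output is largest."""
    scores = {}
    for c_idx, c in enumerate(CLASSES):
        w = models[c]
        scores[c] = w[0] + sum(wk * p[c_idx] for wk, p in zip(w[1:], inst))
    return max(scores, key=scores.get)

# Invented toy probabilities from three hypothetical base classifiers.
base_probs = [
    [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.9, 0.05, 0.05]],
    [[0.6, 0.3, 0.1], [0.8, 0.1, 0.1], [0.7, 0.2, 0.1]],
    [[0.2, 0.7, 0.1], [0.1, 0.8, 0.1], [0.3, 0.6, 0.1]],
    [[0.1, 0.6, 0.3], [0.2, 0.7, 0.1], [0.2, 0.7, 0.1]],
    [[0.1, 0.2, 0.7], [0.1, 0.1, 0.8], [0.2, 0.2, 0.6]],
    [[0.05, 0.15, 0.8], [0.1, 0.3, 0.6], [0.1, 0.2, 0.7]],
]
labels = ["Division I", "Division I", "Division II",
          "Division II", "Division III", "Division III"]
models = fit_stackingc_meta(base_probs, labels)
predictions = [predict_stackingc(models, inst) for inst in base_probs]
```

Restricting each regression to the probabilities of its own class keeps the meta-level feature space small, which is the usual argument for stackingC over plain stacking when there are several classes.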
Fig. 1 - presents the proposed model.
Figure 1 shows the proposed model, in which the dataset comprises various characteristics of students and the final class labels. The dataset first undergoes pre-processing to remove any inconsistent and
noisy data. To obtain better forecasting results, the instances are then balanced to achieve a uniform distribution of class labels. The selected base classifiers are trained and tested on the data, and their outputs are combined using the ensemble method. Finally, predictions are made at the meta level on the basis of the compiled predictions from the different classifiers, with the intent of reducing the generalisation error. The pre-processed data is also examined with the classifiers without balancing, so that results on balanced and imbalanced data can be compared.
4. Objectives Of Our Study
Investigating large volumes of academic data can be highly valuable and can yield insights into teaching and learning behaviours, on the basis of which academic collaborators can restructure the curriculum and frame suitable strategies. Various data mining techniques have been used to uncover useful patterns in academic datasets. Nevertheless, the relationships among the attributes of such datasets have not been extensively investigated using the ensemble method stackingC together with base classification techniques. Both were applied to the academic dataset and compared with earlier research, with the primary goal of predicting student outcomes with greater accuracy and thereby improving student performance. Furthermore, SMOTE (synthetic minority oversampling technique) and spread sub-sampling were applied to remove the imbalance in the output classes and make the class distribution uniform and unbiased.
5. Data Acquisition
The dataset used in this research was collected from the University of Kashmir and comprises all the colleges of the Kashmir Division, including colleges in North, South and Central Kashmir.
A total of 24 colleges were investigated, and important parameters such as demographics and college code were derived from the registration number during the pre-processing phase. Initially, the dataset comprised 9 attributes and 28991 records of Bachelor of Science (BSC) students. Among all the BSC streams, we investigated only the stream comprising the subjects English, Biology, Chemistry and Zoology. After extraction, the number of records for this stream was 1793. The dataset contained the following details of students: registration number, name, parentage, course code, capacity, subjects, total marks, total obtained and overall grade. Figure 2 provides a snapshot of the academic dataset before pre-processing.
Fig. 2 shows the raw dataset before pre-processing
5.1 Data Pre-Processing
Pre-processing is a significant stage in the data mining process. Real-world data is generally inconsistent, noisy and incomplete [23], so it has to be selected and transformed into a more consistent and reliable state. As part of data pre-processing, tasks such as data cleaning, data transformation and data extraction were performed. The following attributes were removed, and various fields were extracted from the dataset. Name, parentage and registration number were removed. Null values and fields of low significance were dropped. The fields English, Biology, Chemistry and Zoology were extracted from the attribute subjects using SQL 2008. Demography was derived from the attribute registration number, and the demographic field was then discretised as required for data processing. After pre-processing, a total of 23 attributes were generated from the original data source, against 9 attributes before pre-processing, as shown in figure 3.
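The cleaning part of this step can be sketched as follows. This is only an illustration of dropping identifying fields and records with null values; the field names and the two toy records are hypothetical, since the raw column names and the registration-number format are not given in the text, and the extraction of subject marks and demography is deliberately left out for the same reason.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]

# Hypothetical raw field names; the real column names are not stated.
IDENTIFYING_FIELDS = ("registration_number", "name", "parentage")

def preprocess(records: List[Record]) -> List[Record]:
    """Drop identifying fields, then drop records that contain a null value."""
    cleaned = []
    for rec in records:
        kept = {k: v for k, v in rec.items() if k not in IDENTIFYING_FIELDS}
        if all(v is not None and v != "" for v in kept.values()):
            cleaned.append(kept)
    return cleaned

# Invented toy records, for illustration only.
raw = [
    {"registration_number": "R-001", "name": "A", "parentage": "P",
     "total_obtained": "412", "overall_grade": "Division I"},
    {"registration_number": "R-002", "name": "B", "parentage": "Q",
     "total_obtained": None, "overall_grade": "Division II"},
]
clean = preprocess(raw)
```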
Fig.3 shows the dataset after pre-processing
5.2 Description of Variables
Table 1 describes each field of the academic dataset to which the techniques in the following sections are applied.
Table 1 displays the possible variables of dataset with description

S.No.  Fields                                  Description
1      Demography                              Rural, Urban
2      GE_Paper A (General English Paper A)    0-75 (Possible Values)
3      GE_Paper B (General English Paper B)    0-75 (Possible Values)
4      GE_TOT (General English Total)          0-150 (Possible Values)
5      BO_Paper 1 (Biology Paper 1)            0-50 (Possible Values)
6      BO_Paper 2 (Biology Paper 2)            0-50 (Possible Values)
7      BO_INTERN (Biology Internals)           0-25 (Possible Values)
8      BO_PRACT (Biology Practicals)           0-25 (Possible Values)
9      BO_TOT (Biology Total)                  0-150 (Possible Values)
10     CH_Paper 1 (Chemistry Paper 1)          0-34 (Possible Values)
11     CH_Paper 2 (Chemistry Paper 2)          0-33 (Possible Values)
12     CH_Paper 3 (Chemistry Paper 3)          0-33 (Possible Values)
13     CH_Extern (Chemistry External)          0-25 (Possible Values)
14     CH_INTERN (Chemistry Internal)          0-25 (Possible Values)
15     CH_TOT (Chemistry Total)                0-150 (Possible Values)
16     ZO_Paper 1 (Zoology Paper 1)            0-50 (Possible Values)
17     ZO_Paper 2 (Zoology Paper 2)            0-50 (Possible Values)
18     ZO_INTERN (Zoology Internals)           0-25 (Possible Values)
19     ZO_PRACT (Zoology Practicals)           0-25 (Possible Values)
20     ZO_TOT (Zoology Total)                  0-150 (Possible Values)
21     TOT_Marks (Total Marks)                 600
22     TOT_OBT (Total Obtained)                0-600 (Possible Values)
23     Overall Grade                           Division I, Division II, Division III
Figure 4 shows histograms of all attributes of the dataset. The colours in each histogram indicate the distribution of the output classes, Division I, Division II and Division III, across the values of that variable, giving a practical view of how the classes are spread.
Fig.4 shows complete class distribution of all variables
6. Results and evaluation
This section describes and evaluates the classification models applied to the real academic dataset compiled from the University of Kashmir for our prediction system. The dataset is subjected to three classification techniques, J48, random tree and random forest, with and without oversampling and undersampling, to verify whether there is any substantial improvement in prediction. The ensemble approach stackingC is then applied to check whether it further improves the prediction of student performance.
6.1 Balancing of dataset with oversampling and undersampling techniques
The final class of the academic dataset is highly non-uniform, as is evident from figure 5. This imbalance among the final classes Division I, Division II and Division III can bias the prediction of a student's output class: a classifier tends to favour the densely populated classes (Division I and Division II) over the sparsely populated class (Division III), which can lead to misclassification of students.
Fig.5 shows the imbalance in class distribution
Fig.6 shows class distribution after SMOTE
In view of the imbalanced output class, Overall_Grade, an oversampling technique (SMOTE) was applied to the academic dataset without replacement. The supervised SMOTE filter was run with an oversampling percentage of 100 and 5 nearest neighbours. After SMOTE was applied to the class variable, the size of the minority class (Division III) increased from 314 to 628. Because the final class (Overall_Grade) was imbalanced and predictions would have been biased towards the majority class, the minority class was oversampled to reduce this bias and thereby improve the prediction accuracy of the classifier. Figure 6 shows the class distribution after application of SMOTE. The real academic dataset was also subjected to an undersampling technique (spread sub-sampling) to investigate whether it produces any significant improvement in classification accuracy. The resulting distribution of the classes DIVISION I, DIVISION II and DIVISION III is shown in figure 7. The results obtained after spread sub-sampling are reported in the subsections below.
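The oversampling step can be illustrated with a minimal SMOTE-style sketch: each minority instance is paired with one of its k nearest minority neighbours, and a synthetic instance is placed at a random point on the line segment between them. With a percentage of 100, one synthetic instance is created per original, which is how a minority class of 314 grows to 628. The function below is an assumption-laden simplification (numeric attributes only, Euclidean distance, percentages in multiples of 100), and the eight toy points are invented.

```python
import random
from typing import List, Sequence

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def smote(minority: List[Sequence[float]], percentage: int = 100,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Return synthetic minority samples built by interpolation.
    Only multiples of 100 are handled in this sketch."""
    rng = random.Random(seed)
    per_sample = percentage // 100
    synthetic = []
    for i, x in enumerate(minority):
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: euclidean(x, m))[:k]
        for _ in range(per_sample):
            nb = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from x to nb
            synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic

# Invented two-attribute minority instances.
minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0),
            (3.0, 2.0), (2.0, 3.0), (3.0, 3.0), (4.0, 4.0)]
synthetic = smote(minority, percentage=100, k=5)
balanced_minority = list(minority) + synthetic
```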
(Bar chart: DIVISION I, DIVISION II and DIVISION III, each with 314 instances.)
Fig.7 shows class distribution after spread sub-sampling
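The undersampling result in figure 7, where every class ends up with the size of the smallest class, can be sketched as below. The `max_spread` parameter is an assumed stand-in for the ratio allowed between the largest and smallest class; a value of 1.0 gives the uniform distribution described in the text. The toy rows are invented and do not reflect the real class counts.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Hashable, List, TypeVar

T = TypeVar("T")

def spread_subsample(rows: List[T], label_of: Callable[[T], Hashable],
                     max_spread: float = 1.0, seed: int = 0) -> List[T]:
    """Randomly drop rows so that no class exceeds max_spread times
    the size of the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    smallest = min(len(members) for members in by_class.values())
    cap = int(smallest * max_spread)
    sampled = []
    for members in by_class.values():
        if len(members) > cap:
            members = rng.sample(members, cap)
        sampled.extend(members)
    return sampled

# Invented toy rows of the form (student_id, grade).
rows = ([(i, "DIVISION I") for i in range(5)]
        + [(i, "DIVISION II") for i in range(5, 9)]
        + [(i, "DIVISION III") for i in range(9, 11)])
undersampled = spread_subsample(rows, label_of=lambda r: r[1])
class_counts = Counter(label for _, label in undersampled)
```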
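6.2 Various Evaluation measurements of current study
In this study, the researchers used several measures for assessing a classifier that are generic to any learner: accuracy, recall, precision and F-measure [24, 25].
\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad \text{(I)} \]
\[ \mathrm{Recall} = \frac{TP}{TP + FN} \qquad \text{(II)} \]
\[ \mathrm{Precision} = \frac{TP}{TP + FP} \qquad \text{(III)} \]
\[ \text{F-measure} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad \text{(IV)} \]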
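As a worked illustration of equations (I) to (IV), the sketch below computes the four measures from the cells of a two-class confusion matrix. The counts are invented purely for the example and are not taken from the study.

```python
from typing import Dict

def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Accuracy, recall, precision and F-measure from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f_measure": f_measure}

# Invented counts: 90 true positives, 10 false negatives,
# 5 false positives, 95 true negatives.
metrics = classification_metrics(tp=90, fn=10, fp=5, tn=95)
```

For these counts the accuracy is 185/200 = 0.925, the recall is 90/100 = 0.9, the precision is 90/95 (about 0.947) and the F-measure is 180/195 (about 0.923).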
In equations (I) to (IV), TP, FN, FP and TN denote the true positive, false negative, false positive and true negative counts of the confusion matrix, respectively.
6.3 Formulating miscellaneous classifiers with the development of SMOTE and Spread Sub-sampling techniques
The aim of this subsection is to report the accuracy of the classifiers J48, random tree and random forest in forecasting the performance of students. The ensemble method stackingC was also applied to check whether the meta classifier shows any significant improvement in predicting the output class of students. Before the base and meta classifiers were applied, the dataset was subjected to SMOTE and spread sub-sampling to seek further improvement in the prediction system. The results of the base classifiers and the meta classifier were then compared, with the prime objective of finding the best classifier for making predictions. The study found that the “ensemble stackingC” method performed best (96%), with J48, random tree and random forest as base classifiers and linear regression as meta classifier.
J48
This classifier belongs to the tree family, and its performance metrics were analysed while classifying the instances of the real academic dataset from the University of Kashmir. The performance parameters shown in table 2 were found to be significant: correctly classified (92.20%), incorrectly classified (7.79%) and relative absolute error (13.51%). Other parameters associated with the classifier were also investigated, including TP rate (0.922), FP rate (0.053), precision (0.898), recall (0.917), F-measure (0.922) and receiver operating characteristic (ROC) area (0.947), and are documented in table 3.
The overall accuracy of the classifier was 92.20% under 10-fold cross-validation.
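The 10-fold cross-validation protocol can be sketched as follows: the instances are partitioned into ten folds, and each fold in turn serves as the test set while the remaining nine form the training set. Stratifying the folds by class label is an assumption made here for illustration; the text states only that 10-fold cross-validation was used. The toy labels are invented.

```python
import random
from collections import defaultdict
from typing import Hashable, Iterator, List, Sequence, Tuple

def stratified_folds(labels: Sequence[Hashable], n_folds: int = 10,
                     seed: int = 0) -> List[List[int]]:
    """Deal instance indices into folds class by class, so that every
    fold receives a similar share of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    position = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for idx in indices:
            folds[position % n_folds].append(idx)
            position += 1
    return folds

def train_test_splits(folds: List[List[int]]) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices), one pair per fold."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Invented toy labels: 20 instances of class A and 10 of class B.
toy_labels = ["A"] * 20 + ["B"] * 10
folds = stratified_folds(toy_labels, n_folds=10)
splits = list(train_test_splits(folds))
```

The reported accuracy would then be the proportion of correctly classified test instances accumulated over the ten splits.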
Random tree

This classifier also belongs to the tree family and was applied to the same academic data. Under 10-fold cross-validation it correctly classified 90.30% of the instances of the output class (Overall_Grade), slightly lower than J48. Its correctly classified (90.30%), incorrectly classified (9.69%) and relative absolute error (15.46%) values are given in table 2. The supplementary performance metrics, reported in table 3, were also satisfactory: TP rate (0.903), FP rate (0.066), precision (0.903), recall (0.904), F-measure (0.903) and ROC (0.922).

Random forest

The final tree-based classifier employed in this study achieved the highest accuracy, 95.76%, exceeding both J48 and random tree. Its correctly classified (95.76%), incorrectly classified (4.23%) and relative absolute error (19.63%) values are shown in table 2. Furthermore, its TP rate (0.958), FP rate (0.029), precision (0.951), recall (0.953), F-measure (0.952) and ROC (0.996) were better than those of the other classifiers, as shown in table 3. From table 2 and table 3 it is apparent that random forest performed best in classifying the instances of the output class.

Table 2 exhibits results of classifiers
Classifier Name    Correctly Classified    Incorrectly Classified    Rel. Abs. Err.
J48                92.20%                  7.79%                     13.51%
Random tree        90.30%                  9.69%                     15.46%
Random forest      95.76%                  4.23%                     19.63%
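The text does not define relative absolute error, so the following is only a sketch under an assumed definition: the absolute error of the predicted class probabilities is divided by the absolute error of a baseline that always predicts the class priors. The function name relative_absolute_error and the choice to estimate priors from the evaluated labels are assumptions made for illustration.

```python
from collections import Counter


def relative_absolute_error(prob_rows, actual, classes):
    """RAE in percent of predicted class distributions against a prior-only baseline."""
    counts = Counter(actual)
    # Assumed baseline: class priors estimated from the labels being evaluated.
    prior = [counts[c] / len(actual) for c in classes]
    model_err = 0.0
    prior_err = 0.0
    for probs, label in zip(prob_rows, actual):
        for p, q, c in zip(probs, prior, classes):
            target = 1.0 if c == label else 0.0
            model_err += abs(p - target)
            prior_err += abs(q - target)
    return 100.0 * model_err / prior_err
```

Under this definition a classifier that picks the right class with soft probabilities (for example 0.8 instead of 1.0) still accumulates error, which is one possible reason why a model can have both the highest accuracy and the highest relative absolute error.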
Figure 8 confirms that random forest performs better than the other classifiers under investigation. It is also evident from figure 8 that random forest has the lowest misclassification rate and the highest relative absolute error.

Fig. 8 shows classification measurements prior to SMOTE and spread sub-sampling (series: J48, RandomTree, RandomForest; vertical axis in percent).

Fig. 9 demonstrates the performance of three classifiers (series: J48, RandomTree, RandomForest; vertical axis in percent).

Figure 9 displays three curves in different colours representing the performance of the three classifiers, viz. J48, random tree and random forest. It further confirms that random forest performed best in comparison with the other classifiers.
Table 3 displays various performance metrics of classifiers
Classifier Name    TP Rate    FP Rate    Precision    Recall    F-Measure    ROC Area
J48                0.922      0.053      0.923        0.917     0.922        0.947
Random tree        0.903      0.066      0.903        0.904     0.903        0.922
Random forest      0.958      0.029      0.951        0.953     0.952        0.996
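The measures in table 3 can be derived from a multi-class confusion matrix by treating each class in turn as the positive class and averaging the per-class values by class size. The sketch below assumes this support-weighted averaging; the function names and the layout matrix[actual][predicted] are illustrative choices, not taken from the study.

```python
def per_class_metrics(matrix, k):
    """One-vs-rest metrics for class k, where matrix[actual][predicted] holds counts."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                       # class k predicted as something else
    fp = sum(matrix[i][k] for i in range(n)) - tp  # other classes predicted as k
    tn = total - tp - fn - fp
    tp_rate = tp / (tp + fn) if tp + fn else 0.0   # identical to recall
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + tp_rate
    f_measure = 2 * precision * tp_rate / denom if denom else 0.0
    return {"tp_rate": tp_rate, "fp_rate": fp_rate, "precision": precision,
            "recall": tp_rate, "f_measure": f_measure}


def weighted_average(matrix):
    """Average every per-class metric, weighting each class by its number of instances."""
    total = sum(sum(row) for row in matrix)
    summary = {}
    for k, row in enumerate(matrix):
        weight = sum(row) / total
        for name, value in per_class_metrics(matrix, k).items():
            summary[name] = summary.get(name, 0.0) + weight * value
    return summary
```

With this weighting, the averaged TP rate equals the overall accuracy, which is consistent with the TP rate column of table 3 mirroring the correctly classified column of table 2 (for example 0.922 and 92.20% for J48).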
Figure 10 presents the various measures of the classifiers. TP rate, precision, recall, F-measure and ROC are all highest for random forest when contrasted with the other classifiers. The figure also shows that random tree scores lowest among the classifiers, although its results remain acceptable.

Fig. 10 visualizes various estimates of classifiers (series: J48, RandomTree, RandomForest).
6.4 Results accomplished in Base learners after oversampling

The primary objective was to explore whether applying SMOTE to the academic dataset improves the prediction of student performance. The classification accuracy of every classifier increased: from 92.20% to 92.98% (J48), from 90.30% to 90.84% (random tree) and from 95.76% to 96.06% (random forest). The other two measures, incorrectly classified instances and relative absolute error, also improved, as can be seen by comparing table 4 (after SMOTE) with table 2 (before SMOTE).

Table 4 shows measurements after SMOTE
Classifier Name    Correctly Classified    Incorrectly Classified    Rel. Abs. Err.
J48                92.98%                  7.01%                     11.43%
Random tree        90.84%                  9.15%                     13.92%
Random forest      96.06%                  3.93%                     15.85%
The histograms in figure 11 reveal that random forest has the lowest misclassification error, followed by J48 and random tree respectively. After SMOTE was applied to the pedagogical dataset, every classifier showed an improvement in classification accuracy, and random forest remained the most accurate classifier in predicting the performance of students.
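The oversampling step can be illustrated with a simplified SMOTE sketch. It assumes purely numeric attributes and at least two minority instances, and it draws the base instance at random rather than iterating over every minority instance as the original algorithm does; the function name smote and its parameters are hypothetical.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority points by interpolating towards near minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic
```

Because each synthetic instance lies on a line segment between two existing minority instances, the minority class grows without exact duplication of records.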
Fig. 11 displays performance of three classifiers (series: J48, RandomTree, RandomForest; vertical axis in percent).

Fig. 12 exhibits performance of classifiers with curved lines (series: J48, RandomTree, RandomForest; vertical axis in percent).
The curves in figure 12 confirm that, among the classifiers, the curve for random forest dominates those of random tree and J48. In figure 12 the incorrectly classified point for random forest lies closest to the baseline, which further indicates that random forest performed best in predicting the performance of students. Table 5 presents the other performance parameters of J48, random tree and random forest. The TP rate, FP rate, precision, recall, F-measure and ROC all improved after application of the oversampling technique, viz. SMOTE. Comparing table 3 with table 5, every estimate changed, and the FP rate in particular declined for each classifier: from 0.053 to 0.038 (J48), from 0.066 to 0.049 (random tree) and from 0.029 to 0.021 (random forest).

Table 5 reveals other performance estimates associated with classifiers
Classifier Name    TP Rate    FP Rate    Precision    Recall    F-Measure    ROC Area
J48                0.930      0.038      0.927        0.925     0.930        0.959
Random tree        0.908      0.049      0.909        0.908     0.908        0.932
Random forest      0.961      0.021      0.960        0.959     0.961        0.997
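The ROC area reported in tables 3 and 5 can be read as the probability that a randomly chosen instance of a class receives a higher score for that class than a randomly chosen instance of another class. The sketch below computes this rank-based quantity for one class against the rest; the function name roc_auc is illustrative, and how the per-class values were averaged in the study is not stated in the text.

```python
def roc_auc(scores, is_positive):
    """Area under the ROC curve as the share of positive/negative pairs ranked correctly."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values such as 0.997 indicate that almost every pair is ordered correctly.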
From figure 13 it is visible that random tree and J48 performed similarly in classifying the instances. Furthermore, figure 13 shows that precision, recall, F-measure and ROC are distinctly higher for the random forest classifier.

Fig. 13 presents different parameters related with the classifiers (series: J48, RandomTree, RandomForest).
In this subsection it can be concluded that the most effective algorithm for predicting fresh instances is random forest. Besides, each algorithm's performance estimates, such as correctly classified, incorrectly classified, TP rate, FP rate and precision, improved after oversampling the instances.

6.5 Results accomplished in Base learners after undersampling

The aim of using both oversampling and undersampling was to understand which method best suits the data and is more useful when the classifier is trained and tested. When undersampling, namely spread sub-sampling, was applied to the academic dataset, the results in table 6 show accuracies of 92.67% (J48), 88.95% (random tree) and 96.06% (random forest). Relative to the unsampled results in table 2, J48 and random forest improved, whereas random tree declined from 90.30% to 88.95%. Overall, the oversampling discussed in the previous subsection gave results at least as good as undersampling for every classifier. The estimates of J48, random tree and random forest are given in tables 4 and 5 (SMOTE) and tables 6 and 7 (spread sub-sampling).

Table 6 shows results after undersampling
Classifier Name    Correctly Classified    Incorrectly Classified    Rel. Abs. Err.
J48                92.67%                  7.32%                     11.68%
Random tree        88.95%                  11.04%                    16.65%
Random forest      96.06%                  3.92%                     17.54%
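Spread sub-sampling can be sketched as random undersampling that caps every class at a fixed multiple of the rarest class. The parameter max_spread (1.0 yields a uniform class distribution) and the function name spread_subsample are assumptions for illustration; the setting actually used in the study is not given in the text.

```python
import random
from collections import defaultdict


def spread_subsample(instances, labels, max_spread=1.0, seed=0):
    """Randomly drop instances so that no class exceeds max_spread times the rarest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(instances, labels):
        by_class[y].append(x)
    smallest = min(len(members) for members in by_class.values())
    cap = int(smallest * max_spread)
    kept_x, kept_y = [], []
    for label in sorted(by_class):
        members = by_class[label]
        # Classes already within the cap are kept whole; larger ones are sampled down.
        chosen = members if len(members) <= cap else rng.sample(members, cap)
        kept_x.extend(chosen)
        kept_y.extend([label] * len(chosen))
    return kept_x, kept_y
```

Unlike SMOTE, this discards real records from the larger classes, which may explain why a classifier can lose accuracy after undersampling.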
The histograms in figure 14 show that random tree has the highest share of incorrectly classified instances, whereas random forest has the lowest. Furthermore, the highest relative absolute error was recorded for random forest, followed by random tree and J48 respectively.

Fig. 14 furnishes results of classifiers after undersampling (series: J48, RandomTree, RandomForest; vertical axis in percent).

Fig. 15 displays same classifiers with different curves (series: J48, RandomTree, RandomForest; vertical axis in percent).
Among the three classifiers presented in figure 15, viz. J48, random tree and random forest, random forest shows an improvement in correctly classified instances, incorrectly classified instances and relative absolute error after spread sub-sampling was applied to the primary dataset. According to table 7, a few estimates improved slightly after spread sub-sampling relative to oversampling, including recall for J48 (0.925 to 0.926) and FP rate for J48 (0.038 to 0.037) and random forest (0.021 to 0.020), whereas TP rate and ROC for random forest did not change (0.961 and 0.997). The remaining measures after undersampling show no improvement over the results obtained by oversampling (table 5).
Table 7 exhibits results of various parameters connected with classifiers
Classifier Name    TP Rate    FP Rate    Precision    Recall    F-Measure    ROC Area
J48                0.927      0.037      0.925        0.926     0.924        0.955
Random tree        0.890      0.055      0.888        0.889     0.896        0.918
Random forest      0.961      0.020      0.958        0.956     0.957        0.997
Figure 16 shows that TP rate, FP rate, precision, recall, F-measure and ROC are best for random forest, followed by J48 and then random tree.

Fig. 16 shows graphical representation of various estimates (series: J48, RandomTree, RandomForest).
In this section, the researchers compared table 4 and table 6, in which oversampling and undersampling were applied respectively. Random forest performed identically under both techniques (96.06%). On the other hand, the accuracy of J48 and random tree was lower with undersampling than with oversampling: 92.67% against 92.98% (J48) and 88.95% against 90.84% (random tree).

6.6 Empirical results of ensemble stackingC method

The principal objective of this subsection was to apply both oversampling and undersampling to the ensemble method, viz. stackingC, and to assess their effect on the classification accuracy of the ensemble classifier in predicting the performance of students. The ensemble combined three base algorithms, viz. J48, random tree and random forest, and the learner used at the meta level was linear regression. The academic dataset was first evaluated without oversampling or undersampling, and the results are shown in table 8.

Table 8 illustrates results of stackingC method
Classifier Name    Correctly Classified    Incorrectly Classified    Rel. Abs. Err.
StackingC          95.65%                  4.34%                     16.36%
From table 8 it can be seen that stackingC correctly classified 95.65% of the instances, which is higher than J48 (92.20%) and random tree (90.30%) and comparable to random forest (95.76%). Its share of incorrectly classified instances (4.34%) is correspondingly low, and its relative absolute error (16.36%) is lower than that of random forest (19.63%).
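The meta-level step of stackingC can be sketched as follows, assuming its usual description: for every class, a linear regression is fitted on the probabilities that the base classifiers assign to that class only, and the class with the largest regression output is predicted. All function names, the nested-list layout base_probs[instance][model][class] and the small ridge term are illustrative assumptions; in practice the base probabilities would come from held-out folds.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_least_squares(rows, targets, ridge=1e-8):
    """Least squares with intercept via the normal equations; ridge keeps them solvable."""
    design = [[1.0] + list(r) for r in rows]
    d = len(design[0])
    xtx = [[sum(x[i] * x[j] for x in design) + (ridge if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(x[i] * t for x, t in zip(design, targets)) for i in range(d)]
    return solve_linear_system(xtx, xty)


def fit_stacking_c(base_probs, labels, classes):
    """Fit one linear model per class on that class's base-level probabilities."""
    weights = {}
    for c, cls in enumerate(classes):
        rows = [[model[c] for model in inst] for inst in base_probs]
        targets = [1.0 if y == cls else 0.0 for y in labels]
        weights[cls] = fit_least_squares(rows, targets)
    return weights


def predict_stacking_c(weights, inst_probs, classes):
    """Predict the class whose per-class regression output is largest."""
    scores = {}
    for c, cls in enumerate(classes):
        w = weights[cls]
        scores[cls] = w[0] + sum(wi * model[c] for wi, model in zip(w[1:], inst_probs))
    return max(classes, key=lambda cls: scores[cls])
```

Restricting each regression to the probabilities of a single class keeps the number of meta-level attributes equal to the number of base classifiers, three in this study.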
Fig. 17 demonstrates performance of stackingC method (bars: Correctly Classified, Incorrectly Classified, Rel. Abs. Err.; vertical axis in percent).
The histogram in figure 17 makes the estimates of stackingC more apparent: the classifier reaches an accuracy of 95.65%, with a low misclassification error of 4.34% and a relative absolute error of 16.36%.

Table 9 explains different performance estimates using stackingC technique
Classifier Name    TP Rate    FP Rate    Precision    Recall    F-Measure    ROC Area
StackingC          0.957      0.029      0.925        0.955     0.954        0.995
Table 9 lists the TP rate, FP rate, precision, recall, F-measure and ROC values for the meta classifier stackingC. These values are better than those of J48 and random tree and comparable to those of random forest (table 3). The results compared here were obtained without oversampling or undersampling.

Fig. 18 represents graphically different performance measurements of stackingC (categories: TP Rate, FP Rate, Precision, Recall, F-Measure, ROC Area).
Figure 18 shows the measurements achieved by the classifier. The curve makes it clear that the estimates, viz. TP rate (0.957), FP rate (0.029), precision (0.925), recall (0.955), F-measure (0.954) and ROC (0.995), are strong.

6.6.1 StackingC measurements after oversampling technique

After SMOTE was applied, stackingC improved across several parameters: correctly classified instances rose from 95.65% to 96.11%, incorrectly classified instances fell from 4.34% to 3.88%, and relative absolute error fell from 16.36% to 13.40%. These changes are shown in table 10.
Table 10 displays results after employing oversampling method
Classifier Name    Correctly Classified    Incorrectly Classified    Rel. Abs. Err.
StackingC          96.11%                  3.88%                     13.40%
Figure 19 presents the estimates of the classifier graphically to give a comprehensive view of the improvements in stackingC after oversampling, namely in correctly classified instances, incorrectly classified instances and relative absolute error.

Fig. 19 exhibits performance of classifier after oversampling (bars: Correctly Classified, Incorrectly Classified, Rel. Abs. Err.; vertical axis in percent).
As table 11 shows, the measurements of stackingC improved after SMOTE was applied to the academic dataset. The TP rate increased from 0.957 to 0.961, the FP rate fell from 0.029 to 0.021, and precision (0.925 to 0.959), recall (0.955 to 0.960), F-measure (0.954 to 0.960) and ROC area (0.995 to 0.997) all increased.

Table 11 reveals results of various parameters associated with the classifier after SMOTE
Classifier Name    TP Rate    FP Rate    Precision    Recall    F-Measure    ROC Area
StackingC          0.961      0.021      0.959        0.960     0.960        0.997
In figure 20, precision, recall and F-measure lie at almost the same level. Moreover, TP rate and FP rate are also better than the results achieved by stackingC prior to SMOTE.

Fig. 20 demonstrates performance estimates after SMOTE (series: StackingC).
Figures 21, 22 and 23 show precision-recall (PR) curves for the three values of the output class, viz. Division I, Division II and Division III respectively. Precision is represented on the x-axis and recall on the y-axis. Each curve tends towards the upper corner (1,1), which indicates the accuracy of the classifier.
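A PR curve for one class can be traced by ranking the instances by their predicted score for that class and lowering the decision threshold one instance at a time. The sketch below is illustrative only: the function names are hypothetical, tied scores are not grouped, and the points are returned as (recall, precision) pairs, which can be swapped to match the axis order used in the figures.

```python
def pr_curve(scores, is_positive):
    """(recall, precision) after admitting each instance in order of decreasing score."""
    ranked = sorted(zip(scores, is_positive), key=lambda pair: pair[0], reverse=True)
    total_pos = sum(1 for _, y in ranked if y)
    points = []
    tp = fp = 0
    for _, y in ranked:
        if y:
            tp += 1
        else:
            fp += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


def area_under_pr(points):
    """Trapezoidal area under precision as a function of recall, starting at recall 0."""
    area = 0.0
    prev_recall, prev_precision = 0.0, points[0][1]
    for recall, precision in points:
        area += (recall - prev_recall) * (precision + prev_precision) / 2.0
        prev_recall, prev_precision = recall, precision
    return area
```

A curve that stays close to precision 1 for all recall values has an area close to 1, which is the sense in which a larger area indicates a more accurately classified division.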
Fig. 21 shows PR curve for Division I
Fig.22 displays PR curve for Division II
Fig. 23 shows PR curve for Division III
Furthermore, among the three PR curves, figure 23 has a larger area under the curve than the other two, which indicates that the classifier classified instances of Division III most accurately.

6.6.2 StackingC measurements after undersampling technique

In this subsection, spread sub-sampling was applied to the same pedagogical dataset to investigate which sampling technique (undersampling or oversampling) produces better results with stackingC. Table 12 shows that classification accuracy rose from 95.65% to 95.96% after the classifier was subjected to spread sub-sampling. Incorrectly classified instances and relative absolute error also improved, from 4.34% to 4.03% and from 16.36% to 14.66% respectively.

Table 12 shows results after Spread subsampling
Classifier Name    Correctly Classified    Incorrectly Classified    Rel. Abs. Err.
StackingC          95.96%                  4.03%                     14.66%
Furthermore, the histogram in figure 24 presents the performance metrics of the classifier, which shows an improvement in predicting the performance of students, with a prediction accuracy of 95.96% in classifying the output class.
Fig. 24 displays precision and various errors related to classifier after undersampling (bars: Correctly Classified, Incorrectly Classified, Rel. Abs. Err.; vertical axis in percent).
Table 13 shows the other performance metrics of the meta classifier after spread sub-sampling. The TP rate increased from 0.957 to 0.960, the FP rate fell from 0.029 to 0.020, precision (0.925 to 0.959), recall (0.955 to 0.959) and F-measure (0.954 to 0.959) increased, and ROC area changed from 0.995 to 0.994. Therefore, the only estimate that decreased slightly under spread sub-sampling is ROC area; the remaining parameters improved for the stackingC classifier in predicting the correct class of the students.

Table 13 illustrates results of miscellaneous parameters related to stackingC
Classifier Name    TP Rate    FP Rate    Precision    Recall    F-Measure    ROC Area
StackingC          0.960      0.020      0.959        0.959     0.959        0.994
Moreover, the performance measurements achieved while classifying the three classes, viz. Division I, Division II and Division III, are visualised in figure 25.

Fig. 25 represents graphically performance estimates of stackingC after undersampling (categories: TP Rate, FP Rate, Precision, Recall, F-Measure, ROC Area).

StackingC was applied to the pedagogical dataset with and without undersampling and oversampling, to examine which method is more suitable for predicting the performance of students. The investigators found that stackingC with oversampling achieved the highest prediction accuracy, 96.11%. Precision was the same (0.959) under undersampling and oversampling. Furthermore, the ROC area decreased slightly when the classifier was subjected to undersampling.
6.6.3 Experimental results produced during prediction

Figure 26 shows a snapshot of the predictions made by the classifier (stackingC) for the output class. By comparing the original class with the predicted class, the figure distinguishes the instances that were correctly classified from those that were incorrectly classified.
Fig.26 exhibits snapshot of original and predicted values of some instances.
7. Conclusion

In this study, the fundamental aim was to improve the prediction of student performance through meta and base classifiers. A novel attempt was made to investigate a pedagogical dataset through a more effective ensemble classifier, viz. stackingC. The researchers first compared meta and base classifiers to examine which are best suited to making predictions in an educational setting. The meta classifier stackingC achieved an accuracy of 95.65% and, among the three base classifiers, random forest achieved the highest prediction accuracy, 95.76%. The academic dataset was then analysed further to observe any change in results after it was subjected to undersampling (spread sub-sampling) and oversampling (SMOTE). Among the base learners, J48 attained an accuracy of 92.98% after oversampling, and random forest attained 96.06% under both oversampling and undersampling in predicting the actual class of students. Furthermore, the meta classifier stackingC attained prediction accuracies of 95.96% and 96.11% after undersampling and oversampling respectively. This work found that once the filtering approaches are applied to the primary pedagogical dataset before training base and meta classifiers, prediction accuracy improves in most cases.

In future, researchers could apply techniques such as bagging, boosting, stacked generalization, mixture of experts, subspace methods and other ensemble techniques to pedagogical datasets to discover further hidden patterns in educational settings. In addition, for datasets of massive size and with diverse attributes, incremental learning algorithms could be exploited to tackle the issue of scalability, since an incremental learning algorithm is able to learn from fresh incoming data even after the model has been constructed from the existing data. It generates a succession of hypotheses for the training instances, wherein the current hypothesis characterises all the data compiled so far and depends only on the preceding hypothesis and the current training data [26, 27].

References

[1] El-Halees, A. (2009). Mining students data to analyze e-Learning behavior: A Case Study.
[2] Pandey, U. K., & Pal, S. (2011). Data Mining: A prediction of performer or underperformer using classification. arXiv preprint arXiv:1104.4163.
[3] Nasiri, M., & Minaei, B. (2012, February). Predicting GPA and academic dismissal in LMS using educational data mining: A case mining. In E-Learning and E-Teaching (ICELET), 2012 Third International Conference on (pp. 53-58). IEEE.
[4] Acharya, A., & Sinha, D. (2014). Early prediction of students performance using machine learning techniques. International Journal of Computer Applications, 107(1).
[5] Rajni, J., & Malaya, D. B. (2015). Predictive analytics in a higher education context. IT Professional, 17(4), 24-33.
[6] Amrieh, E. A., Hamtini, T., & Aljarah, I. (2016). Mining Educational Data to Predict Student's academic Performance using Ensemble Methods. International Journal of Database Theory and Application, 9(8), 119-136.
[7] Mishra, T., Kumar, D., & Gupta, S. (2014, February). Mining students' data for prediction performance. In Advanced Computing & Communication Technologies (ACCT), 2014 Fourth International Conference on (pp. 255-262). IEEE.
[8] Anh, N. T. M., Chau, V. T. N., & Phung, N. H. (2014, September). Towards a robust incomplete data handling approach to effective educational data classification in an academic credit system. In Data Mining and Intelligent Computing (ICDMIC), 2014 International Conference on (pp. 1-7). IEEE.
[9] Márquez-Vera, C., Cano, A., Romero, C., & Ventura, S. (2013). Predicting student failure at school using genetic programming and different data mining approaches with high dimensional and imbalanced data. Applied Intelligence, 38(3), 315-330.
[10] Vera, C. M., Morales, C. R., & Soto, S. V. (2013). Predicting school failure and dropout by using data mining techniques. IEEE Journal of Latin-American Learning Technologies, 8(1), 7-14.
[11] Ahmed, A. B. E. D., & Elaraby, I. S. (2014). Data Mining: A prediction for Student's Performance Using Classification Method. World Journal of Computer Application and Technology, 2(2), 43-47.
[12] Kumar S. Anupama and Dr. Vijayalakshmi M.N. (2011). Efficiency of Decision Trees in Predicting Students Academic Performance. Computer Science & Information Technology 02, pp. 335–343.
[13] Pandey, M., & Sharma, V. K. (2013). A decision tree algorithm pertaining to the student performance analysis and prediction. International Journal of Computer Applications, 61(13).
[14] Dietterich, T. G. (2000, June). Ensemble methods in machine learning. In International workshop on multiple classifier systems (pp. 1-15). Springer, Berlin, Heidelberg.
[15] Tumer, K., & Ghosh, J. (1996). Error correlation and error reduction in ensemble classifiers. Connection Science, 8(3-4), 385-404.
[16] Wang, X. (2013). Modeling entrance into STEM fields of study among students beginning at community colleges and four-year institutions. Research in Higher Education, 54(6), 664-692.
[17] Fürnkranz, J. (1999). Separate-and-conquer rule learning. Artificial Intelligence Review, 13(1), 3-54.
[18] Rahman, A., & Tasnim, S. (2014). Ensemble classifiers and their applications: a review. arXiv preprint arXiv:1404.4088.
[19] Hansen, L. K., & Salamon, P. (1990). Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10), 993-1001.
[20] Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123-140.
[21] Freund, Y., & Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1), 119-139.
[22] Kumar, V., & Rathee, N. (2011). Knowledge discovery from database using an integration of clustering and classification. International Journal of Advanced Computer Science and Applications, 2(3), 29-33.
[23] Han, J., Pei, J., & Kamber, M. (2011). Data mining: concepts and techniques. Elsevier.
[24] Powers, D. M. (2011). Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation.
[25] Chen, T. Y., Kuo, F. C., & Merkel, R. (2004, September). On the statistical properties of the f-measure. In Quality Software, 2004. QSIC 2004. Proceedings. Fourth International Conference on (pp. 146-153). IEEE.
[26] Giraud-Carrier, C. (2000). A note on the utility of incremental learning. AI Communications, 13(4), 215-223.
[27] Elwell, R., & Polikar, R. (2009, June). Incremental learning in nonstationary environments with controlled forgetting. In Neural Networks, 2009. IJCNN 2009. International Joint Conference on (pp. 771-778). IEEE.