Performance Prediction in Online Discussion Forum: state-of-the-art and comparative analysis


Procedia Computer Science 135 (2018) 302–314

3rd International Conference on Computer Science and Computational Intelligence 2018

Febrianti Widyahastuti a, Viany Utami Tjhin b,*

a School of Information Technology, Deakin University, 221 Burwood Hwy, Burwood, Victoria 3125, Australia
b Information Systems Department, School of Information Systems, Bina Nusantara University, Jakarta, Indonesia 11480
[email protected], [email protected]*

Abstract

This survey paper presents the state of the art and a comparative study of performance prediction in online discussion forums using data mining techniques. It is intended as a guideline or roadmap for using prediction tools to support effective interactions that make full use of online discussion forums. Different features, methods and techniques of data mining have been applied to performance prediction in online discussion forums in studies from 2011-2016. The trends and insights from these data help researchers to analyze the potential of online discussion forums for predicting students’ performance. Hence, the survey serves as a benchmark for finding new and meaningful research innovations, not only in education but also in other fields. This paper also provides recommendations for students and educators, giving sufficient information to maintain and improve the learning process by monitoring the progress of students’ performance via prediction tools using data mining. Students’ performance is mainly observed to gauge the level of students’ progress and to determine whether a student will remain involved or quit the study. This critical issue is faced by many educational institutions. Hence, performance prediction is important to apply, not only in the educational field but also in others, as it helps to prevent student attrition in educational institutions. In addition, it improves students’ standards and skills through active engagement, and helps high-risk students to recognize weaknesses in the study program.

© 2018 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Selection and peer-review under responsibility of the 3rd International Conference on Computer Science and Computational Intelligence 2018.

Keywords: performance prediction; student performance; online discussion forum; educational data mining

* Corresponding author. E-mail address: [email protected]
1877-0509 © 2018 The Authors. Published by Elsevier Ltd.
10.1016/j.procs.2018.08.178

Febrianti Widyahastuti et al. / Procedia Computer Science 135 (2018) 302–314

1. Introduction

Research interest in data mining applied to the educational sector is increasing. This application of data mining is called educational data mining [1]. Educational Data Mining (EDM) can be defined as a field that applies statistical, machine-learning, and data mining principles to various kinds of educational data [1]. EDM is very new yet very useful in the academic field. Its main aim is to analyze different types of data using data mining methods in order to solve educational problems [1]. It uses many techniques, including support vector machines, decision trees, k-nearest neighbors, neural networks, and Naïve Bayes. Educational datasets are obtained either from the learning management systems used by institutions, such as phpBB, Blackboard, WebCT and Moodle, which hold a huge amount of information, or from local databases with qualitative and quantitative data.

One of the approaches implemented in EDM is the predictive model, a well-known approach that attempts to understand and predict students’ educational outcomes [2]. Predictive models are commonly used to detect students’ behavior in assessment functionalities and modelling, and use those features as the basis of prediction. Moreover, those features are also used in other kinds of data mining approaches with the aim of predicting student performance (current or future outcomes) [3, 4]. That research predicted and classified students’ performance into three categories (good, average, and poor) through a comparative analysis of three classification techniques: Naïve Bayes (NB), Decision Tree (DT), and Rule Based (RB). However, only a small number of studies have applied educational data mining to online discussion forums. Combining educational data mining with online communication platforms, especially in higher education, is still considered new for researchers.

In recent years, few researchers have paid attention to the benefit of existing communication tools when combined with educational data mining, even though millions of students at all levels of education communicate through online communication platforms. Besides email, the discussion forum is an important online platform for interactions between students and teachers and among students. Furthermore, the online discussion forum is one of the most commonly used communication tools in learning management systems (LMS). It is popular because it accommodates collaboration, interaction, and discussion among participants, and it provides useful direction and feedback to all participants engaged in the forum.

All features of online education can be analyzed with data mining techniques. Modern online education mainly concentrates on course management systems (CMS) or learning management systems (LMS). LMS/CMS automatically register the measurable behavior of individual users as server logs. By applying data mining to server log data, teachers can provide individualized instruction, identify at-risk students, improve course design, and adjust teaching strategies. In most situations, students’ course achievement is reported to depend on how frequently they interact: the more actively students interact in the online forum, the higher their performance. Moreover, the analysis of students’ behavior in online discussion forums can provide insight into individual learning status, which gives researchers more precise predictions.

2. Educational Data Mining

As described above, educational data mining is very useful because it offers distinct ways of applying data mining methods to solve education-related problems [5]. There are six elements in Educational Data Mining (EDM): user, dataset, features, data mining tools, data mining techniques, and the result (see Fig. 1).

Fig. 1. Educational Data Mining Model. Elements shown: User, Dataset, Features, DM tools & techniques, Result.

Users are those who benefit from EDM. In higher education, users are all participants involved both in the system and in teaching and learning activities (lecturers and students).

The dataset is the data collection, which is classified into online and offline data. Online data require internet access to generate and update the system, for example chat rooms, file sharing, online discussion forums, and learning management systems. Offline data do not require internet access to generate and update the system, for example offline email, local files, and questionnaires.

Features in educational data mining are the attributes or parameters that capture students’ activities in course management systems such as Moodle. Such systems allow the creation of effective, flexible, and engaging learning experiences and online courses [6]. Course management systems can identify readable trends and patterns of student online behavior because they record information about users’ activity such as reading, quizzes, discussion posts, and testing [7]. They also provide other detailed information such as login activities, course information (course ID, course name), student profiles (student name, date of birth, address, etc.), and details of the discussion forum (post, reply, read). Despite these benefits, the patterns hidden in an e-learning dataset cannot be identified directly; special tools are needed. One such tool is data mining, and its use in an educational setting is called Educational Data Mining. In addition, it helps educational institutions to design, evaluate, and improve their programs.

Data Mining Tools

Many data mining tools (general machine learning software) are now available to EDM users. They fall into two common categories: commercial and public domain. Examples of commercial machine learning software are SPSS Clementine, DB2 Intelligent Miner, and DBMiner. Examples of public domain mining tools are Keel, Rapid Miner, and Weka. Among these, WEKA, an open machine learning software, provides various classification models that are easy to interpret, such as Naïve Bayes, Bayesian, neural network, and decision trees. The educational data mining tools can be seen in Fig. 2. Several tools have emerged that provide more comprehensive insights into the data, namely:

- SPSS. SPSS (Statistical Package for the Social Sciences) is one of the most popular statistical software packages for predictive analysis and can be used to fit correlation and regression models.
- WEKA. WEKA (Waikato Environment for Knowledge Analysis) is machine learning software developed at the University of Waikato (New Zealand) and written in Java. Its advantages include free access (General Public License), portability (fully implemented in Java), and compatibility with the majority of computing platforms. It also provides comprehensive solutions for visualisation, association, regression, classification, clustering, predictive analysis and modelling, and is practical to use thanks to its graphical user interfaces [8].
- Rapid Miner. Rapid Miner is a data science platform that offers free access and ready-to-use software for complex analysis. It is written in Java and consolidates multifaceted data mining functions such as predictive analysis, data pre-processing, and visualisation. It can also be integrated with R and WEKA [9].
- Python-based Orange and NLTK. Python is known for its powerful features and high usability. Orange is a free tool written in Python that features interactive data visualisation and a visual-programming front end for exploratory data analysis. NLTK (Natural Language Toolkit) is an effective language-processing tool that combines data scraping features, machine learning, and data mining, and can easily be adjusted for customised needs [1].

Fig. 2. Educational Data Mining Tools

Data Mining Methods

A large number of data mining methods are used to predict students’ academic performance. We analytically compare and evaluate students’ academic performance by applying several data mining tools. Data mining tools and techniques aim to provide a scientific experiment that can contribute meaningful knowledge. The following methods are used to predict students’ performance, as seen in Fig. 3:

- Correlation analysis. Correlation analysis is used to determine the correlation between variables in order to predict students’ academic performance. If the Sig. (2-tailed) value is less than or equal to 0.05, the correlation is significant (there is a correlation). In contrast, if the Sig. (2-tailed) value is greater than 0.05, the correlation is considered insignificant (there is no correlation). The dependent variable is the predicted or result value and the independent variable is the predictor or input value. The regression and classification models used for students’ academic performance prediction rely on the predictors identified by correlation.
- Classification. Classification is the most prevalent data mining technique for predicting student performance. Several classification algorithms have been applied to students’ academic performance prediction, such as Neural Network, Decision Tree, Multilayer Perceptron, K-Nearest Neighbours (K-NN), Logistic Regression, and Support Vector Machine (SVM) [3, 5, 6].
- Linear regression. Regression is the most suitable prediction model for assessing the relationship between one dependent variable (final grade) and one or more independent variables (features in e-learning). A first impression of a possible relationship between two continuous variables should always be formed from a scatter plot [7]. Moreover, linear regression is simple and processes large datasets quickly. The target values of linear regression are numeric instead of categorical.

Fig. 3. Educational Data Mining Techniques
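The 0.05 decision rule described for correlation analysis can be illustrated with a minimal sketch. It assumes toy values invented purely for illustration (not data from any surveyed study), and it swaps in a two-sided permutation test for the parametric Sig. (2-tailed) value that a package such as SPSS would report, so that only the Python standard library is needed. The function names are hypothetical.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: share of shuffles whose |r| is at least the observed |r|."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


# Toy data (assumption): forum posts per student and their final grades.
posts = [2, 5, 1, 8, 4, 10, 3, 7]
grades = [55, 65, 50, 80, 62, 90, 58, 75]

r = pearson_r(posts, grades)
p = permutation_p_value(posts, grades)
verdict = "significant" if p <= 0.05 else "not significant"
print(f"r = {r:.3f}, p = {p:.4f} ({verdict})")
```

With real forum data the sign and size of r, and whether p falls below 0.05, would of course depend on the course and cohort.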

2.1. The Result Taxonomy of the Performance Prediction

Fig. 4. Performance Prediction Taxonomy

Figure 4 shows the performance prediction taxonomy, which is divided into two parts: intellectual performance and emotional performance. The results of intellectual performance are represented in two forms: categorical and continuous.


Categorical and continuous results are presented quantitatively (intellectual performance), while behavior is presented qualitatively (emotional/behavioral performance). Intellectual performance is what we learn in school or university in order to get high scores, whereas emotional performance concerns people and how they interact with others. Intellectual performance is mostly predicted using data mining techniques, whereas for emotional performance many studies now use text mining to capture emotion from people’s behavior as expressed through text or comments.

2.2. Categorical

Under categorical (discrete) values, students are classified into scales or intervals that define the predicted performance, depending on the educational setup. Each institution uses different labels to group students’ performance, with broadly similar score ranges. Previous studies [10, 11, 12, 3] classified students into five categories: bad, average, good, very good, and excellent. Similar research [13] divided results into First (≥ 60%), Second (≥ 45% and < 60%), Third (≥ 36% and < 45%), and Fail (< 36%) [14, 15, 16]. In other research, Suljić consolidated the final grade into three levels for each grade, A (A+, A, and A-), B (B+, B, and B-), and C (C+, C, and C-), plus F (Failed) [16]. With his own categorization, Romero [17] applied four intervals and performance labels to the mark attribute (FAIL: value < 5; PASS: value ≥ 5 and < 7; GOOD: value ≥ 7 and < 9; EXCELLENT: value ≥ 9). AlJeraisy et al. [18] classified the predictor into three classes (satisfactory, satisfactory, above satisfactory) using a Bayesian approach [19]. Lastly, Vandamme et al. (2007) classified students into three categories: ‘high-risk’, ‘medium-risk’, and ‘low-risk’ students [20]. Generally, academic grading in Australia categorizes grades into High Distinction (HD), Distinction, Credit, Pass and Fail.

2.3. Continuous

Continuous values are numerical marks or grades, either per assessment or accumulated (final grade). Mostly, continuous variables are transformed into discrete variables. For example, in a grading scale: High Distinction for marks of 80-100, Distinction for 70-79, Credit for 60-69, and Pass for 50-59.

2.4. Personality

Goldberg (1999) listed five personality scores: ‘Extraversion’, ‘Openness to Experience’, ‘Conscientiousness’, ‘Neuroticism’ and ‘Agreeableness’. Extraversion covers characteristics such as assertiveness, being energetic, and talkativeness. Openness to Experience covers characteristics such as being imaginative and insightful, and having wide interests. Conscientiousness covers characteristics such as planning ability, organization, and thoroughness. Agreeableness covers characteristics such as kindness, sympathy and affection. Neuroticism covers characteristics such as mood, anxiety, and tension.

2.5. Information Literacy

Fuji and Nakayama developed a categorization of information literacy that consists of interest, behavior, motivation, attitude, knowledge and understanding [20].

3. Performance Prediction Overview

Among research utilizing educational data mining, performance prediction is one of the best-known topics and has become a trend among researchers. Prediction is closely tied to students’ performance, which can be observed precisely through educational data mining.
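Because predicted marks are usually reported as grade bands, the transformation of continuous marks into discrete grades given in Section 2.3 can be sketched as follows. This is a minimal illustration: the band floors come from the grading scale quoted there, while treating every mark below 50 as Fail, and the name `to_grade`, are assumptions made for the example.

```python
from bisect import bisect_right

# Lower bounds of the Pass, Credit, Distinction and High Distinction bands.
GRADE_FLOORS = [50, 60, 70, 80]
# Assumption: any mark below 50 is labelled "Fail".
GRADE_LABELS = ["Fail", "Pass", "Credit", "Distinction", "High Distinction"]


def to_grade(mark):
    """Map a continuous mark in [0, 100] to its discrete grade label."""
    if not 0 <= mark <= 100:
        raise ValueError(f"mark out of range: {mark}")
    # bisect_right counts how many band floors the mark has reached.
    return GRADE_LABELS[bisect_right(GRADE_FLOORS, mark)]


example_grades = [to_grade(m) for m in (45, 50, 65, 79, 80)]
print(example_grades)
```

A lookup over sorted band floors keeps the scale in one place, so a different institutional scale only requires changing the two lists.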


Regarding students’ performance, predictive modelling analyzes recent or past facts to make predictions about forthcoming events. For example, predictive modelling techniques can be applied to online discussion forums to identify the main predictors of students’ academic performance, which are then used to develop interventions that improve that performance. A sound approach to performance prediction can be based on the work of Romero et al. [21], which identified the best predictor attributes linking online discussion forum activity and students’ performance, with the aim of identifying the causal relation between the two in order to obtain precise classification models using data mining techniques.

Prediction tools are beneficial for instructors, students, institutions (higher education), and parents. For instructors, they make it possible to identify students’ academic status in advance, so that instructors can focus on less engaged students in order to improve their academic achievement and minimize failure by providing support, direction and supervision. In this way students can be helped to prepare, design, evaluate and develop their understanding of the course and obtain higher scores in class. Furthermore, students can be warned in advance before they fail. For students and instructors, it is best practice to make full use of the online discussion forum, as they would the classroom instructional process. For higher education institutions, prediction tools can improve traditional processes and make educational processes more accurate, efficient and effective. For parents, they can help prevent their children from failing and show students’ progress at university at an early stage. Students also benefit from one another: knowing how they are performing encourages them to do their best, because education today involves competition among students to achieve high marks.
Understanding the overall usefulness of discussion forums, by applying data mining tools to develop prediction models, provides an opportunity to encourage students to increase their access and to build strong student-student and teacher-student relationships in online discussion forums.

3.1. Prediction performance using students’ activity in online discussion forum

Table 1 summarizes the occurrence of students’ behavior attributes in online discussion forums across the twenty-two papers surveyed from the period 2011-2016. Seventeen papers used the attribute ‘post message’ from online discussion forums to predict students’ performance. Seven papers used the attribute ‘read message’, and six papers each used attributes related to the user and to time spent in the online discussion forum.

Table 1. Occurrence of the behavior of students.

Attributes          References
Post message        22, 23, 24, 25, 26, 27, 21, 28, 29, 30, 14, 31, 32, 33, 34, 14
Login discussion    35
Read message        21, 36, 14, 23, 24, 25, 37
Reply message       14, 27, 38
User                31, 39, 28, 18, 40, 33
Thread              21, 31, 33, 34
Time                41, 27, 21, 37, 29, 23
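Attributes such as those in Table 1 are typically counts derived from forum logs. A minimal sketch of that aggregation step is given below, assuming a hypothetical log format of (student_id, action) pairs; real LMS exports such as Moodle logs have their own schemas, and the records and names here are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical forum log records: (student_id, action).
forum_log = [
    ("s1", "post"), ("s1", "read"), ("s2", "read"), ("s1", "reply"),
    ("s2", "read"), ("s3", "post"), ("s2", "post"),
]


def count_actions(log):
    """Return a per-student Counter of forum actions (post, read, reply, ...)."""
    counts = defaultdict(Counter)
    for student, action in log:
        counts[student][action] += 1
    return counts


features = count_actions(forum_log)
for student in sorted(features):
    print(student, dict(features[student]))
```

Each student's counter can then serve as one row of predictor variables; an action a student never performed simply counts as zero.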

3.2. Prediction academic performance using students’ score

In the comparative analysis of performance prediction using students’ scores, three papers were found to use the final grade as the basis of predictions on academic performance (Table 2).

Table 2. The basis of predictions on academic performance.

Attributes     References
Final grade    22, 38, 14
Exam           22
Quiz           36
Assignment     36

3.3. Prediction academic performance using behavior

There is a lack of research on behavior performance in online discussion forums. Only one paper predicts students’ behavior in an online discussion forum, as seen in Table 3.

Table 3. The basis of predictions on academic performance in online discussion forum.

Attributes    References
Behavior      38

The aim of prediction is to create a model that can infer the relationship between one aspect of the data (the predicted/dependent variable) and some combination of other aspects of the data (predictor/independent variables) [9]. Prediction models are also used in several data mining approaches to predict students’ future performance in online discussion forums. Examples include comparing two data mining techniques, and combining classification techniques with clustering to classify and predict students’ academic performance [14]. Černezel et al. suggested using classification, regression and correlation [36]. You’s study predicted students’ academic performance using exam and final scores; however, the results show that the number of messages created is not significant for predicting exam and final scores [22].

Much research has been carried out to identify measurable factors that contribute to students’ academic success in online discussion forums. Demographic information is the parameter most commonly selected by researchers to predict students’ performance, covering age, gender, nationality, marital status, socioeconomic status [42], and psychological profile [14, 44, 45, 46]. Student information [47, 48, 42] such as course information, year of enrolment, admission information [4, 48, 45], tuition fees, academic history, and entry level [35], as well as behavior [34] such as psychometric factors [12], is considered useful in helping researchers to predict students’ performance.

4. Overview Techniques and Tools

This paper discusses the techniques of classification, correlation, and regression for predicting student learning performance.

4.1. The Use of Correlation Techniques for Performance Predictions

Correlation mining aims to find positive or negative linear correlations between variables.
It is a common goal in statistics, where a literature has developed on how to use post-hoc analysis and/or dimensionality reduction techniques to avoid finding false relationships [50]. Several examples of the use of correlation techniques for performance prediction are given in Table 4.

Table 4. The examples of correlation techniques for performance prediction.

Method: Correlation

Attributes                                      Results                                                                                                                                  Authors
Message, exam score, final score                The number of messages created was not a significant predictor of exam and final scores                                                  22
Student’s post                                  Negative correlation between students’ success and number of posts                                                                       24
Number of participation vs. no participation    Students who actively participated in the online discussion forum obtained significant results compared to those who did not participate    39
Forum page                                      No correlation was found with forum features                                                                                             35
Number of posts, threads, participants          Students who completed the course were more active in the forum than other categories of students                                        33

To begin with, current studies show that, compared with students who obtained a passing grade, those who interacted less frequently failed one or more modules [35]. However, there are also cases in which students who participated frequently did not always obtain significantly higher grades [20]. Similar results were obtained by [23, 25]: the results did not show a positive correlation between the number of posts and students’ success [25]. In contrast to these findings, some researchers suggested that the number of posts and the number of visits to the discussion forum were very strongly related to academic performance [52, 53, 54, 55, 56, 57, 27, 40, 58, 59, 18, 60]. For active users, students who were active in the discussion forum were more likely to complete the course than other categories of students [34].

4.2. The Use of Classification Techniques for Performance Prediction

As explained above, educational data mining provides different types of approach for exploring educational data. According to Romero and Baker, the best predictive models for predicting students’ behaviours and final marks use classification and regression techniques. They consist of one or more predictor variables [51, 61], as can be seen in Table 5.

Table 5. The references of classification on academic performance predictions.

Method: Classification

Attributes                           Results    Authors
Read, post, reply and final grade    85%        14
Post, read, total time               93%        28
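To make the idea of classifying students from forum attributes concrete, the sketch below fits a one-rule classifier (a "decision stump", the simplest form of decision tree) that predicts pass or fail from a single attribute. It is an illustration only: the post counts and labels are toy values invented for the example, the function name is hypothetical, and the surveyed studies used full tree learners such as J48 in WEKA rather than this reduced form.

```python
def fit_stump(values, labels):
    """Find the threshold t maximising accuracy of the rule
    'IF value >= t THEN pass ELSE fail'. Returns (threshold, accuracy)."""
    best_threshold, best_correct = None, -1
    for threshold in sorted(set(values)):
        correct = sum(
            (v >= threshold) == (label == "pass")
            for v, label in zip(values, labels)
        )
        # Strict '>' keeps the smallest threshold when accuracies tie.
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold, best_correct / len(values)


# Toy data (assumption): number of forum posts and course outcome per student.
posts = [0, 1, 2, 3, 5, 6, 8, 9]
outcome = ["fail", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

threshold, accuracy = fit_stump(posts, outcome)
print(f"IF posts >= {threshold} THEN pass ELSE fail  (accuracy {accuracy:.3f})")
```

The learned rule reads directly as an IF-THEN statement, which is the interpretability property usually cited in favour of tree-based classifiers.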

Classification is the most popular data mining technique for predicting student performance. Different methods are implemented to predict students’ academic performance, such as Neural Network, Decision Tree, Logistic Regression, K-Nearest Neighbours (K-NN), and Support Vector Machine (SVM) [4-6, 7]. One classification technique that can be used is the decision tree, a well-known prediction technique. Many researchers have used it for its simplicity and comprehensibility in uncovering small or large data structures and predicting values [62]. Moreover, decision tree models can be converted directly into IF-THEN rules and are easily understood because of their reasoning process [63].

4.3. The Use of Regression Techniques for Performance Prediction

There are two regression techniques: simple linear regression and multiple linear regression. Simple linear regression is the basic form, commonly used in predictive analysis; it models the relationship between one independent variable and one dependent variable. In multiple regression, there is more than one predictor (independent) variable. The list of references is shown in Table 6.

Table 6. The references of regression on academic performance predictions using more than one predictor variable.

Method: Regression

Attributes                                             Results    Authors
Forum view, quiz view, assignment view, final grade    92%        37
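The simple (single-predictor) case can be written out as an ordinary least-squares fit. The sketch below is a minimal illustration with toy values invented for the example, and the function names are hypothetical; the multiple-regression studies in Table 6 use several predictors, which would require a matrix formulation or a statistics package.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x_new):
    """Predicted numeric target for a new predictor value."""
    return intercept + slope * x_new


# Toy data (assumption): forum views and final grade per student.
forum_views = [0, 10, 20, 30, 40]
final_grade = [50, 55, 60, 65, 70]

slope, intercept = fit_line(forum_views, final_grade)
print(f"final_grade ~ {intercept:.1f} + {slope:.2f} * forum_views")
print("predicted grade for 25 views:", predict(slope, intercept, 25))
```

The target here is numeric, which is what distinguishes this setting from the categorical targets used in classification.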


4.4. The Use of Clustering Techniques for Performance Prediction

Clustering is an unsupervised classification of instances into clusters based on their connectivity within the dimensional space 7. The list of references is shown in Table 7.

Table 7. The references of clustering on academic performance prediction.

Methods      Attributes                                             Results   Authors
Clustering   Time forums, number of words and sentences in forums   83%       64
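As a rough illustration of connectivity-based clustering, the sketch below groups students with single-linkage agglomerative clustering. This algorithm is chosen here only as one example of a connectivity-based method; the study listed in Table 7 may use a different one. The student identifiers and feature values are hypothetical, and both features (time spent in the forum, number of words posted) are assumed to be already scaled to the range 0 to 1.

```python
from itertools import combinations
from math import dist

# Hypothetical students: (scaled time in forum, scaled number of words posted).
POINTS = {
    "s1": (0.10, 0.15),
    "s2": (0.15, 0.10),
    "s3": (0.20, 0.20),
    "s4": (0.80, 0.85),
    "s5": (0.90, 0.80),
    "s6": (0.85, 0.95),
}


def single_linkage(points, k):
    """Merge the two closest clusters until only k clusters remain.

    The distance between two clusters is the distance between their
    closest pair of members (single linkage).
    """
    clusters = [{name} for name in points]
    while len(clusters) > k:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: min(
                dist(points[p], points[q])
                for p in clusters[pair[0]]
                for q in clusters[pair[1]]
            ),
        )
        # a < b always holds, so deleting index b keeps index a valid.
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters


for group in single_linkage(POINTS, 2):
    print(sorted(group))
```

With these values the six students fall into a low-activity group and a high-activity group. No grades are used, which is what makes the method unsupervised; relating the groups to performance would be a separate step.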

5. Performance Evaluation and Discussion

It is natural for educational data mining to keep growing, since it explores previously unexplored aspects of the education sector, especially performance prediction that utilizes communication tools. Although prediction models utilizing communication tools are new, they can be a breakthrough. Communication tools are useful and powerful sources of data for data mining; one of them is the discussion forum, a powerful communication tool in institutions. Romero et al. claim that the classification model is suitable for educational practice, as it has proved to be comprehensible and precise for instructors and educational institutions in decision making 2. The results using classification methods show that logistic regression, decision trees (J48), and Naïve Bayes are the best techniques for predicting students’ performance 51. However, not all models are user-friendly and easy to interpret. Some researchers faced a trade-off: some proposed models score well on performance measures but are difficult for users to understand, or vice versa (black-box and white-box models) 22. Black-box models give good results 51 but are not easy for users to understand. White-box models, on the other hand, did not show significant results yet are highly readable and understandable 22.

6. Challenges and future works

One challenge for research on performance prediction is to establish universal methods that can be used in many different sectors. For future work, the first major issue to take into account is that many studies did not consider the broad application of educational data mining to communication platforms such as chats, email, and online discussion forums. Most researchers have not focused on communication tools. Utilizing communication tools more fully is therefore future work for many researchers.

References

1.
Romero C, Ventura S. Educational data mining: a review of the state of the art. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on. 2010; 40(6): p. 601-18.
2. Romero C, Ventura S, Espejo PG. Data mining algorithms to classify students. In Hervás C, editor. Educational Data Mining; 2008.
3. Pradeep A, Thomas J. Predicting College Students Dropout using EDM Techniques. International Journal of Computer Applications. 2015; 123(5).
4. Ahmad F, Ismail N, Aziz A. The Prediction of Students’ Academic Performance Using Classification Data Mining Techniques. Applied Mathematical Sciences. 2015; 9(129): p. 6415-26.
5. Huebner R. A survey of Educational Data-Mining Research. Research in higher education journal. 2013 April 19.
6. Bates A. Technology, e-learning and distance education. Routledge. 2005.
7. Romero C, Ventura S, García E. Data mining in course management systems: Moodle case study and tutorial. Computers & Education. 2008; 51(1): p. 368-84.
8. Romero C, Ventura S. Educational data mining: A survey from 1995 to 2005. Expert systems with applications. 2007; 33(1): p. 135-46.
9. Baker R, Yacef K. The state of educational data mining in 2009: A review and future visions. JEDM-Journal of Educational Data Mining. 2009; 5(1): p. 3-17.
10. Kabakchieva D. Predicting student performance by using data mining methods for classification. Cybernetics and Information Technologies. 2013; 13(1): p. 61-72.
11. El-Halees A. Mining students data to analyze e-Learning behavior: A Case Study. In Department of Computer Science, Islamic University of Gaza; 2009. p. 108.
12. Sembiring S, Zarlis M, Hartama D, Ramliana S, Wani E. Prediction of student academic performance by an application of data mining techniques. In International Conference on Management and Artificial Intelligence; 2011.
13. Sundar P. A Comparative Study For Predicting Student’s Academic Performance Using Bayesian Network Classifiers. Journal of Engineering (IOSRJEN).
14. Hung J, Wang M, Wang S, Abdelrasoul M, He W. Identifying At-Risk Students for Early Interventions? A Time-Series Clustering Approach. IEEE Transactions on Emerging Topics in Computing. 2015; PP(99): p. 1.
15. Abdous M, He W, Yen CJ. Using data mining for predicting relationships between online question theme and final grade. Educational Technology & Society. 2012; 15(3): p. 77-88.
16. Osmanbegović E, Suljić M. Data mining approach for predicting student performance. Economic Review. 2012; 10(1).
17. Romero C, Espejo P, Zafra A, Romero J, Ventura S. Web usage mining for predicting final marks of students that use Moodle courses. Computer Applications in Engineering Education. 2013; 21(1): p. 135-46.
18. AlJeraisy M, Mohammad H, Fayyoumi A, Alrashideh W. Web 2.0 in Education: the Impact of Discussion Board on Student Performance and Satisfaction. TOJET. 2015; 14(2).
19. Bekele R, Menzel W. A Bayesian approach to predict performance of a student (BAPPS): A Case with Ethiopian Students. algorithms. 2005; 22(23): p. 24.
20. Vandamme J, Meskens N, Superby J. Predicting academic performance by data mining methods. Education Economics. 2007; 15(4): p. 405-19.
21. Nakayama M, Yamamoto H, Santiago R. Impact of information literacy and learner characteristics on learning behavior of japanese students in on line courses. International Journal of Case Method Research & Application. 2008; 20(4): p. 403-15.
22. Romero C, López MI, Luna JM, Ventura S. Predicting students' final performance from participation in on-line discussion forums. Computers & Education. 2013; 68: p. 458-72.
23. You J. Identifying significant indicators using LMS data to predict course achievement in online learning. The Internet and Higher Education. 2016; 29: p. 23-30.
24. Wong JS, Pursel B, Divinsky A, Jansen B. An Analysis of MOOC Discussion Forum Interactions from the Most Active Users. Social Computing, Behavioral-Cultural Modeling, and Prediction. Social Computing. 2015: p. 452-7.
25. Song L, McNary S. Understanding students' online interaction: Analysis of discussion board postings. Journal of Interactive Online Learning. 2011; 10(1): p. 1-14.
26. Nandi D, Hamilton M, Harland J, Warburton G. How active are students in online discussion forums? In Proceedings of the Thirteenth Australasian Computing Education Conference; 2011: Australian Computer Society, Inc.
27. Cheng C, Paré D, Collimore LM, Joordens S. Assessing the effectiveness of a voluntary online discussion forum on improving students’ course performance. Computers & Education. 2011; 56(1): p. 253-61.
28. Jovanovic M, Vukicevic M, Milovanovic M, Minovic M. Using data mining on student behavior and cognitive style data for improving e-learning systems: a case study. International Journal of Computational Intelligence Systems. 2012; 5(3): p. 597-610.
29. Wen M, Yang D, Rosé C. Sentiment analysis in MOOC discussion forums: What does it tell us? Educational data mining. 2014 Jul 4.
30. Mihail R, Rubin B, Goldsmith J. Online discussions: improving education in CS? In Proceedings of the 45th ACM technical symposium on Computer science education; 2014: ACM.
31. Bagarinao R. Students' Navigational Pattern and Performance in An E-Learning Environment: A Case from UP Open University, Philippines. Turkish Online Journal of Distance Education. 2015; 16(1).
32. Vasanthakumar G, Shenoy P, Venugopal K. PFU: Profiling Forum users in online social networks, a knowledge driven data mining approach. In IEEE International WIE Conference on Electrical and Computer Engineering (WIECON-ECE); 2015.
33. Ferschke O, Howley I, Tomar G, Yang D, Rosé C. Fostering Discussion across Communication Media in Massive Open Online Courses. In Proceedings of the 11th International Conference on Computer Supported Collaborative Learning Gothenburgh; Sweden.
34. Mustafaraj E, Bu J. The Visible and Invisible in a MOOC Discussion Forum. In Proceedings of the Second ACM Conference on Learning @ Scale; 2015; Vancouver. p. 351-4.
35. Hew K. Promoting engagement in online courses: What strategies can we learn from three highly rated MOOCS. British Journal of Educational Technology. 2016; 47(2): p. 320-41.
36. Gašević D, Dawson S, Rogers T, Gasevic D. Learning analytics should not promote one size fits all: The effects of instructional conditions in predicting academic success. The Internet and Higher Education. 2016; 28: p. 68-84.
37. Černezel A, Karakatič S, Brumen B, Podgorelec V. Predicting Grades Based on Students’ Online Course Activities. In Uden L FODTILD. Knowledge Management in Organizations. Cham: Springer International Publishing; 2014. p. 108-17.
38. Bogarín A, Romero C, Cerezo R, Sánchez-Santillán M. Clustering for improving educational process mining. In Proceedings of the Fourth International Conference on Learning Analytics And Knowledge; 2014: ACM.
39. Cobo G, García-Solórzano D, Morán J, Santamaría E, Monzo C, Melenchón J. Using agglomerative hierarchical clustering to model learner participation profiles in online discussion forums. In Proceedings of the 2nd International Conference on Learning Analytics and Knowledge; 2012: ACM.
40. Carceller C, Dawson S, Lockyer L. Improving academic outcomes: does participating in online discussion forums payoff? International Journal of Technology Enhanced Learning. 2013; 5(2): p. 117-32.
41. Ioannou A, Brown S, Artino A. Wikis and forums for collaborative problem-based activity: A systematic comparison of learners' interactions. The Internet and Higher Education. 2015; 24: p. 35-45.
42. Cerezo R, Sánchez-Santillán M, Paule-Ruiz M, Núñez J. Students' LMS interaction patterns and their relationship with achievement: A case study in higher education. Computers & Education. 2016; 96: p. 42-54.
43. David L, Karanik M, Giovannini M, Pinto N. Academic Performance Profiles: A Descriptive Model based on Data Mining. European Scientific Journal. 2015; 11(9).
44. Kabakchieva D, Stefanova K, V K. Analyzing university data for determining student profiles and predicting performance. In Educational Data Mining; 2011.
45. Yadav S, Pal S. Data mining: A prediction for performance improvement of engineering students using classification. arXiv preprint arXiv:12033832. 2012 Mar 17.
46. Strecht P, Cruz L, Soares C, Mendes-Moreira J, Abreu R. A Comparative Study of Classification and Regression Algorithms for Modelling Students’ Academic Performance. International Educational Data Mining Society. 2015.
47. Lin SH. Data mining for student retention management. Journal of Computing Sciences in Colleges. 2012; 27(4): p. 92-9.
48. Thai-Nghe N, Busche A, Schmidt-Thieme L. Improving academic performance prediction by dealing with class imbalance. In International Conference Intelligent Systems Design and Applications (ISDA); 2009.
49. Jai R, David K. Predicting the Performance of Students in Higher Education Using Data Mining Classification Algorithms - A Case Study. IJRASET International Journal for Research in Applied Science & Engineering Technology. 2014; 2(XI).
50. Kabakchieva D. Student performance prediction by using data mining classification algorithms. International Journal of Computer Science and Management Research. 2012; 1(4): p. 686-90.
51. Baker R. Data mining for education. International encyclopedia of education. 2010; 7: p. 112-8.
52. Kay R. Developing a comprehensive metric for assessing discussion board effectiveness. British Journal of Educational Technology. 2006; 37(5): p. 761-83.
53. Ramos C, Yudko E. "Hits" (not "Discussion Posts") predict student success in online courses: A double crossvalidation study. Computers & Education. 2008; 50(4): p. 1174-82.
54. Palmer S, Holt D, Bray S. Does the discussion help? The impact of a formally assessed online discussion on final student results. British Journal of Educational Technology. 2008; 39(5): p. 847-58.
55. Coldwell J, Craig A, Paterson T, Mustard J. Online students: Relationships between participation, demographics and academic performance. Electronic journal of e-learning. 2008; 6(1): p. 19-30.
56. Bliss C, Lawrence B. From Posts to Patterns: A Metric To Characterize Discussion Board Activity in Online Courses. Journal of Asynchronous Learning Networks. 2009; 13(2): p. 15-32.
57. Normore L, Blaylock B. Effects of communication medium on class participation: Comparing face-to-face and discussion board communication rates. Journal of Education for Library and Information Science. 2011: p. 198-211.
58. Xia C, Fielder J, Siragusa L. Achieving better peer interaction in online discussion forums: A reflective practitioner case study. Issues in Educational Research. 2013; 23(1): p. 97-113.
59. Anderson A, Huttenlocher D, Kleinberg J, Leskovec J. Engaging with massive online courses. In Proceedings of the 23rd international conference on World wide web; 2014; Seoul, Korea: ACM. p. 687-98.
60. Yoo J, Kim J. Can online discussion participation predict group project performance? Investigating the roles of linguistic features and participation patterns. International Journal of Artificial Intelligence in Education. 2014; 24(1): p. 8-32.
61. Romero C, Ventura S. Data mining in education. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2013; 3(1): p. 12-27.
62. Shahiri A, Husain W. A Review on Predicting Student's Performance Using Data Mining Techniques. Procedia Computer Science. 2015; 72: p. 414-22.
63. Romero C, Ventura S, Espejo P, Hervás C. Data mining algorithms to classify students. Educational Data Mining. 2008.
64. Bogarín A, Romero C, Cerezo R, Sánchez-Santillán M. Clustering for improving educational process mining. In Proceedings of the Fourth International Conference on Learning Analytics And Knowledge; 2014: ACM.
65. Wook M, Yahaya Y, Wahab N, Isa M, Awang N, Seong H. Predicting NDUM student's academic performance using Data mining techniques. In ICCEE; 2009: IEEE.