Mining Wiki Usage Data for Predicting Final Grades of Students

Gökhan Akçapınar, Erdal Coşgun, Arif Altun
Hacettepe University
gokhana@hacettepe.edu.tr, erdal.cosgun@hacettepe.edu.tr, altunar@hacettepe.edu.tr

Abstract

This study aims to predict students' final grades (A, B, C, D and F) based on their wiki usage data. With default settings, the wiki database stores usage data only in a limited way. Therefore, an extension was developed to log students' login and navigation data, and a tool was developed to extract information from these data and preprocess it. The dataset comprises the server-side wiki usage logs of 81 students over 3 months. The classification performance of the Random Forest, Support Vector Machines, Naive Bayes and Boosted Classification Tree algorithms is compared. Tenfold cross-validation is used to evaluate the performance of the models. According to our findings, SVM achieves the best classification performance.

Keywords: wiki, classification, educational data mining, predicting final grade.

Main Conference Topic: New Trends and Experiences, Educational Data Mining

Introduction

The use of wikis in online learning environments has increased in recent years, especially with the growing demand for collaborative learning. Wikis can be used in the following ways to support learning: in-class collaboration, group projects outside of class, a collaborative environment for learning from peers, peer and teacher feedback and review, and assessment and management of group performance [1]. Although wikis have great potential for online learning environments, assessing individual contributions with traditional methods is difficult and time-consuming, because many students can contribute to the creation of the same content. On the other hand, like other online learning environments (e.g., forums, LMSs, VLEs), a wiki stores a large amount of student-system interaction data in its database.
By analyzing these data with statistical and data mining (DM) techniques, much useful information can be extracted for tutoring, assessment, or understanding learning and learner behavior [2, 3]. Educational data mining is a research area that has emerged in recent years and is defined as the application of data mining techniques to datasets that come from educational settings in order to address important educational questions [4]. According to Romero and Ventura [2], these questions include Analysis and Visualization of Data, Providing Feedback for Supporting Instructors, Recommendations for Students, Predicting Student's Performance, Student Modeling, Grouping Students, etc. To answer these questions, educational data mining research uses different DM methods such as Prediction, Clustering, Relationship Mining, Discovery with Models and Distillation of Data for Human Judgment [5].

One of the key applications of educational data mining is predicting students' performance. It is one of the oldest and most popular applications of DM in education, and different techniques and models have been applied so far [2]. In a recent study, Lopez et al. [6] demonstrated the potential of the classification-via-clustering approach to predict students' final marks (passed or failed) on the basis of their participation in forums. Their results showed that student participation in the course forum was a good predictor of the final marks for the course. Fausett and Elwasif [7] found that neural networks can be trained to predict students' grades in Calculus I based on their placement test responses. They used each student's test response pattern as input and the grade in Calculus I as the target response. Martinez [8] suggested that student pre-college assessment data can be used to predict academic success (a grade of A, B, or C) in community college courses with discriminant function analysis. Minaei-Bidgoli and Punch [9] presented an approach for classifying students by using genetic algorithms to predict their final grade based on data logged in an online learning environment. Superby et al. [10] used discriminant analysis, neural networks, random forests and decision trees to determine the factors influencing the academic success of first-year university students. Kotsiantis et al. [11] compared six different machine learning algorithms for predicting students' marks (pass or fail) on Hellenic Open University data. They also compared six regression algorithms to predict students' marks on similar data [12]. Delgado et al. [13] trained a neural network on Moodle access logs to predict whether students would pass a course. Their model showed that it is possible to identify students who are likely to have problems passing a course. Two recent studies compared different data mining methods and techniques for classifying students, based on their Moodle interaction data, to predict the final marks obtained in the course [14, 15].

In this study we sought to examine the extent to which we can predict students' course grades (A, B, C, D and F) on the basis of their wiki usage. MediaWiki was used as the wiki engine. MediaWiki is a free, open-source and easy-to-use wiki engine for creating wiki-based web sites.
We developed an extension to log students' login and navigation data, which are not tracked in the default configuration.

Background

We applied four of the most commonly used classification algorithms to predict students' final grades and compared their prediction performance. The following paragraphs describe these methods briefly.

Random Forest: A random forest is a decision tree ensemble classifier, with each tree grown using some type of randomization. Random forests can process huge amounts of data with high training speeds and are based on Classification and Regression Trees (CART) [16]. CART is a simple statistical tool that applies recursive binary partitioning of the feature space. CART is well known for its efficiency in coping with large data sets. However, as the data become noisier and less information is contained in each variable, the predictive ability of CART diminishes. RF overcomes this problem by introducing random elements into the model: subsets of variables are chosen at random, and bootstrap samples are selected with replacement for tree growing [17]. For each classification tree, a bootstrap sample is drawn from the original samples [18]. At each non-leaf node of a classification tree, the best split feature is selected from a small random subset of the original features. When the forest receives an input vector, each classification tree casts one vote, and the final prediction is determined by the majority vote of all the trees in the random forest. Because the bootstrap sample is drawn with replacement, some of the original samples are left out of it; these are called out-of-bag (OOB) data [18, 19].

Boosted Classification Tree: The algorithm for boosting trees evolved from the application of boosting methods to regression trees. The general idea is to compute a sequence of simple CARTs, where each successive tree is built for the prediction residuals of the preceding tree.
This method builds binary trees, i.e., it partitions the data into two samples at each split node. Suppose that the user limits the complexity of the trees to 3 nodes only: a root node and two child nodes, i.e., a single split. Thus, at each step of the boosting trees algorithm, a simple (best) partitioning of the data is determined, and the deviations of the observed values from the respective means (the residuals for each partition) are computed. The next 3-node tree is then fitted to those residuals, to find another partition that further reduces the residual (error) variance for the data, given the preceding sequence of trees. It can be shown that such "additive weighted expansions" of trees can eventually produce an excellent fit of the predicted values to the observed values, even if the relationship between the predictor variables and the dependent variable of interest is very complex (nonlinear in nature). Hence, the method of gradient boosting, which fits a weighted additive expansion of simple trees, represents a very general and powerful machine learning algorithm [20].

Support Vector Machines: SVMs are a relatively new family of computational learning methods based on the statistical learning theory presented by Vapnik [21]. In SVMs, the original input space is mapped into a high-dimensional dot product space called a feature space, and in the feature space the optimal hyperplane is determined to maximize the generalization ability of the classifier. The optimal hyperplane is found by exploiting optimization theory and respecting insights provided by statistical learning theory [22].

Naïve Bayes: Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given sample belongs to a particular class. A Bayesian classifier is based on Bayes' theorem. Naive Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called class conditional independence.
It is made to simplify the computation involved and, in this sense, is considered naïve [23]. Let X = (x1, x2, ..., xn) be a sample whose components represent measurements made on a set of n attributes. In Bayesian terms, X is considered evidence. Let H be some hypothesis, such as that the data X belongs to a specific class C. For classification problems, our goal is to determine P(H | X), the probability that the hypothesis H holds given the evidence (i.e., the observed data sample X). In other words, we are looking for the probability that sample X belongs to class C, given that we know the attribute description of X. According to Bayes' theorem, the probability P(H | X) that we want to compute can be expressed in terms of the probabilities P(H), P(X | H), and P(X) as

P(H | X) = [P(X | H) * P(H)] / P(X) [23].

Description of the Data Used

The dataset used in this study was gathered from a wiki used by university students during a third-year course. Students used the wiki to write reflections on concepts that they learned in the Computer Network and Communication course. The variables selected for this experiment were extracted from two different tables. One was the wiki's revision table, which stores all changes made by students. The other was a table that stores students' login and navigation data, recorded via the extension. The revision table included more than 1900 records, and we used the WikLog tool developed by Akçapınar and Aşkar [24] to extract information automatically from this table. The usage data included the server-side wiki usage logs of 81 students, with a total of 1800 sessions and 40,000 page requests over 3 months. A tool was developed to extract information from these data and preprocess it. The variables extracted from these two tables are shown in Table 1. Table 2 shows summary statistics for the extracted variables.
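The paper does not give the schema of the usage log or the exact definition of each variable, so the following is only a minimal Python sketch of how per-student variables such as n_session, a_time, n_uniquepage and n_revisits might be derived from navigation records. The row layout (student, session id, timestamp, page), the sample rows, the function name `usage_features`, and the interpretations of "average time" (minutes between first and last request in a session) and "revisits" (pages requested more than once) are all assumptions made for illustration; they are not the authors' tool.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical log rows: (student, session_id, timestamp, page).
# The real extension's table layout is not described in the paper.
LOG = [
    ("s1", "a", "2012-03-01 10:00", "Main_Page"),
    ("s1", "a", "2012-03-01 10:05", "OSI_model"),
    ("s1", "a", "2012-03-01 10:20", "Main_Page"),
    ("s1", "b", "2012-03-02 09:00", "OSI_model"),
    ("s1", "b", "2012-03-02 09:10", "TCP"),
    ("s2", "c", "2012-03-01 11:00", "Main_Page"),
]


def usage_features(rows):
    """Aggregate page requests into per-student usage variables."""
    session_times = defaultdict(lambda: defaultdict(list))  # student -> session -> timestamps
    page_counts = defaultdict(Counter)                      # student -> page -> request count
    for student, session_id, stamp, page in rows:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
        session_times[student][session_id].append(when)
        page_counts[student][page] += 1

    features = {}
    for student, sessions in session_times.items():
        # Assumed definition: session length = last request minus first request, in minutes.
        minutes = [(max(t) - min(t)).total_seconds() / 60 for t in sessions.values()]
        features[student] = {
            "n_session": len(sessions),
            "a_time": sum(minutes) / len(minutes),
            "n_uniquepage": len(page_counts[student]),
            # Assumed definition: number of pages requested more than once.
            "n_revisits": sum(1 for c in page_counts[student].values() if c > 1),
        }
    return features


features = usage_features(LOG)
```

In this toy log, student s1 has two sessions lasting 20 and 10 minutes, so `features["s1"]["a_time"]` is their mean; a real pipeline would also need a rule for closing sessions that end without an explicit logout.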

Table 1. Variables of a student in a wiki

Name               Domain         Description
n_session          Usage log      Total session count
a_time             Usage log      Average time in one session
n_mainpagereturn   Usage log      Main page return rate
n_uniquepage       Usage log      Unique page visits
n_revisits         Usage log      Total number of revisited web pages
n_edit             MediaWiki db   Total number of edits
n_word             MediaWiki db   Total word count
f_grade            Class          Final grade of the student

Table 2. Descriptive statistics for variables

Variable           Mean     SD       Median   Min    Max
n_session          22.69    20.77    18.00    1.00   143.00
a_time             17.81    7.57     17.38    1.47   46.49
n_mainpagereturn   25.19    13.99    22.00    6.00   80.00
n_uniquepage       143.25   77.83    146.00   2.00   265.00
n_revisits         56.15    18.91    60.00    0.00   87.00
n_edit             21.90    29.24    9.00     0.00   130.00
n_word             161.98   251.66   60.00    0.00   1240.00

Results

Naive Bayes, Support Vector Machines, Boosted Classification Tree and Random Forest were implemented in R. We used the gbm package for BCT, the randomForest package for RF, and the e1071 package for SVM and Naive Bayes. The models were evaluated with 10-fold cross-validation (CV). We compared the true classification rate (accuracy) of the four data mining techniques for classifying students. Table 3 shows the classification accuracy of these techniques. According to these results, the best method for our data is SVM.

Table 3. Classification accuracy of the classification algorithms

Algorithm                         Classification Accuracy (%)
Random Forest (1)                 63.3
Support Vector Machine (2)        67.1
Naïve Bayes (3)                   59.6
Boosted Classification Tree (4)   61.4

(1) RF: 1000 trees, mtry = 5.
(2) SVM: radial basis function kernel.
(3) Naive Bayes: threshold 0.100, sub-sample rate 0.30.
(4) Boosted Classification Tree: 1000 trees, number of additive terms 200, learning rate 0.1000.
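To illustrate the evaluation procedure, the following is a minimal sketch of 10-fold cross-validation wrapped around a naive Bayes classifier. It is written in Python with the standard library only, whereas the study itself used R packages, so it is not the authors' code. The Gaussian likelihood for numeric attributes, the function names, the two-class setup, and the synthetic data are assumptions made for illustration; the resulting accuracy says nothing about the accuracies reported in Table 3.

```python
import math
import random
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate class priors and a per-class mean/variance for each attribute."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)
    model = {}
    for label, rows in rows_by_class.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(rows) / len(X), stats)
    return model


def predict(model, row):
    """Return the class maximizing log P(C) + sum_i log P(x_i | C)."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        # Class conditional independence: attribute likelihoods simply add in log space.
        for value, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)


def cross_val_accuracy(X, y, k=10, seed=0):
    """Shuffle once, split into k folds, and test each fold on a model trained on the rest."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
    return correct / len(X)


# Synthetic, well-separated toy data with 81 rows (NOT the study's dataset).
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
X += [[rng.gauss(6, 1), rng.gauss(6, 1)] for _ in range(41)]
y = ["F"] * 40 + ["A"] * 41
accuracy = cross_val_accuracy(X, y)
```

Every sample is held out exactly once, so the returned value is the share of all samples classified correctly when unseen during training. With only 81 students and five grade classes, some folds in the real setting would contain very few examples of a given grade, which is one reason to interpret fold-level results with care.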

Conclusions

Although mining educational data to predict students' performance is not new, no published paper has so far used data mining techniques to predict student performance based on wiki usage data. This paper reports a comparison of Random Forest, Support Vector Machines, Naive Bayes, and Boosted Classification Tree for predicting the final grades obtained in an undergraduate course on the basis of students' wiki usage data. In recent years, these methods have become popular and robust choices for prediction problems. We compared different classification algorithms because no single algorithm obtains the best classification accuracy in all cases and on all datasets [15, 25]. According to our findings, SVM outperforms the other methods. A possible reason for this result is that our classification problem is nonlinear. On the other hand, the tree-based methods also performed reasonably well (63.3% for Random Forest and 61.4% for Boosted Classification Tree).

These findings show that data mining methods can help researchers assess students' individual contributions to a wiki if the necessary information is stored in a database or in log files. The study also shows that students' navigation logs and wiki usage data are good predictors of their course performance. In future work, instructors could use the extracted knowledge for decision making and for classifying new students [15]. Feedback is an important variable in changing behavior, and studies suggest that many students will respond appropriately to feedback that they understand [26]. The extracted knowledge can also be used as feedback to help students who are potentially at risk, by intervening in their problems early enough to allow them to change their behavior.

References

1. Ben-Zvi, D., Using Wiki to Promote Collaborative Learning in Statistics Education. Technology Innovations in Statistics Education, 2007. 1(1).
2. Romero, C. and S. Ventura, Educational Data Mining: A Review of the State of the Art. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on, 2010. 40(6): p. 601-618.
3. Rudas, I.J. and P. Tóth. Web Mining Usage in Course Development. in The SEFI Annual Conference 2011. 2011. Lisbon, Portugal.
4. Romero, C. and S. Ventura, Data mining in education. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2013. 3(1): p. 12-27.
5. Baker, R.S.J.d., Data Mining for Education, in International Encyclopedia of Education, 3rd Ed., B. McGaw, P. Peterson, and E. Baker, Editors. 2011, Oxford, UK: Elsevier.
6. Lopez, M.I., et al. Classification via clustering for predicting final marks based on student participation in forums. in 5th International Conference on Educational Data Mining, EDM 2012. 2012. Chania, Greece.
7. Fausett, L.V. and W. Elwasif. Predicting performance from test scores using backpropagation and counterpropagation. in Neural Networks, 1994. IEEE World Congress on Computational Intelligence., 1994 IEEE International Conference on. 1994.
8. Martinez, D. Predicting Student Outcomes Using Discriminant Function Analysis. 2001.
9. Minaei-Bidgoli, B. and W. Punch, Using Genetic Algorithms for Data Mining Optimization in an Educational Web-Based System, in Genetic and Evolutionary Computation GECCO 2003, E. Cantú-Paz, et al., Editors. 2003, Springer Berlin / Heidelberg. p. 206-206.

10. Superby, J.F., J.P. Vandamme, and N. Meskens. Determination of Factors Influencing the Achievement of the First-year University Students using Data Mining Methods. in Workshop on Educational Data Mining. 2006.
11. Kotsiantis, S., C. Pierrakeas, and P. Pintelas, Predicting Students' Performance in Distance Learning Using Machine Learning Techniques. Applied Artificial Intelligence, 2004. 18(5): p. 411-426.
12. Kotsiantis, S.B. and P.E. Pintelas. Predicting students marks in Hellenic Open University. in Advanced Learning Technologies, 2005. ICALT 2005. Fifth IEEE International Conference on. 2005.
13. Delgado, M., et al. Predicting Students Marks from Moodle Logs using Neural Network Models. in Current Developments in Technology-Assisted Education. 2006. Badajoz.
14. Romero, C., et al. Data mining algorithms to classify students. in Proc. Int. Conf. Educ. Data Mining. 2008. Montreal, Canada.
15. Romero, C., et al., Web usage mining for predicting final marks of students that use Moodle courses. Computer Applications in Engineering Education, 2010: p. n/a-n/a.
16. Ko, B., S. Kim, and J.-Y. Nam, X-ray Image Classification Using Random Forests with Local Wavelet-Based CS-Local Binary Patterns. Journal of Digital Imaging, 2011. 24(6): p. 1141-1151.
17. Chen, C.C.M., et al., Methods for Identifying SNP Interactions: A Review on Variations of Logic Regression, Random Forest and Bayesian Logistic Regression. Computational Biology and Bioinformatics, IEEE/ACM Transactions on, 2011. 8(6): p. 1580-1591.
18. Breiman, L., Random Forests. Machine Learning, 2001. 45(1): p. 5-32.
19. Lin, X., et al., A method for handling metabonomics data from liquid chromatography/mass spectrometry: combinational use of support vector machine recursive feature elimination, genetic algorithm and random forest for feature selection. Metabolomics, 2011. 7(4): p. 549-558.
20. StatSoft, I., Electronic Statistics Textbook. 2011, StatSoft: Tulsa.
21. Vapnik, V., Statistical learning theory. 1998: Wiley.
22. Widodo, A., B.-S. Yang, and T. Han, Combination of independent component analysis and support vector machines for intelligent faults diagnosis of induction motors. Expert Systems with Applications, 2007. 32(2): p. 299-312.
23. Han, J., M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, Second Edition (The Morgan Kaufmann Series in Data Management Systems). 2006: Morgan Kaufmann.
24. Akçapınar, G. and P. Aşkar. Measuring Author Contributions to the Mediawiki. in IADIS International Conference WWW/Internet 2009. 2009. Rome, Italy.
25. Osmanbegović, E. and M. Suljić, Data Mining Approach for Predicting Student Performance. Economic Review, 2012. 10(1).
26. Bienkowski, M., M. Feng, and B. Means, Enhancing Teaching and Learning Through Educational Data Mining and Learning Analytics: An Issue Brief. 2012: Washington, D.C.