Educational Data Mining / Learning Analytics: Methods, Tasks and Current Trends

Size: px
Start display at page:

Download "Educational Data Mining / Learning Analytics: Methods, Tasks and Current Trends"

Transcription

1 Sabine Rathmayer, Hans Pongratz (Hrsg.): Proceedings of DeLFI Workshops 2015 co-located with 13th e-learning Conference of the German Computer Society (DeLFI 2015) München, Germany, September 1, Educational Data Mining / Learning Analytics: Methods, Tasks and Current Trends Agathe Merceron 1 Abstract: The 1 st international conference on Educational Data Mining (EDM) took place in Montreal in 2008 while the 1 st international conference on Learning Analytics and Knowledge (LAK) took place in Banff in Since then the fields have grown and established themselves with an annual international conference, a journal and an association each, and gradually increase their overlapping. This paper begins with some considerations on big data in education. Then the principal analysis methods used with educational data are reviewed and are illustrated with some of the tasks they solve. Current emerging trends are presented. Analysis of educational data on a routine basis to understand learning and teaching better and to improve them is not a reality yet. The paper concludes with challenges on the way. Keywords: Educational data mining, learning analytics, prediction, clustering, relationship mining, distillation of data for human judgment, discovery with models, multi modal analysis, multilevel analysis, natural language processing, privacy, data scientist. 1 Introduction Big Data in Education was the name of a MOOC offered on Coursera in 2013 by Ryan Baker. What means big data in education? To answer this question I consider different sources of educational data following the categorization of [RV 10]. Schools and universities use information systems to manage their students. Take the case of a small- medium European university with students and let us focus on the marks. Assuming that each student is enrolled in 6 courses, each semester the administration records new marks (including the null value when students are absent). Many universities and schools use a Learning Management System (LMS) to run their courses. Let us take an example of a small course, not a MOOC, taught for 60 students on 12 weeks with one single forum, and a set of slides and one quiz per week. LMSs record students interactions, in particular when students click on a resource, write or read in the forum. Assume that each student clicks on average twice each week on the set of slides and the quiz, and 3 times on the forum in the semester. This gives 3060 interactions that are stored for one course during one semester. Let us suppose that the small-medium university from above has 40 degree-programs with 15 courses each. This gives interactions stored by the LMS each semester. 1 Beuth Hochschue für Technik, Fachbereich Medieninformatik, Luxemburgerstrasse 19, Berlin, [email protected]

2 102 Agathe Merceron Another main source of data in education is dedicated software like Intelligent Tutoring Systems that students use to train specific skills in one discipline. A well-known repository for such data is Datashop [KBS10] that contains millions of interactions. On top of those main sources of data, there are various other sources like social media, questionnaires or online forums. These simple considerations show that data in education are big and cannot be analyzed by hand. Research published the international educational data mining society or the society of learning analytics and research show that analyzing these data allows us to better understand students, and the settings which they learn in. [BY 09]. These two societies started around different persons with different research backgrounds [BS 14] but they have similar aims. While research with an emphasis in machine learning appears more in EDM and research with an emphasis on humans appear more in LAK, research that could be published in both conferences grows each year as the citations in this paper make clear. The next section reviews the computational methods used to analyze educational data as listed in [BY 09] and illustrates some of the tasks they solve. Current trends are presented in section three. The conclusion presents challenges of the field. 2 Computational Methods Methods come mainly from machine learning, data mining and, emerging in the last years, from natural language processing; methods classical in artificial intelligence such as the hill-climbing algorithm are occasionally used [Ba 05]. 2.1 Prediction A major task tackled by prediction methods is to predict performance of students. Predicting performance has several levels of granularity: it can be predicting that there will be no performance at all when students drop-off [WZN13], predicting pass/fail or the mark in a degree [ZBP11, AMP14], pass / fail or the mark in a course [LRV12] or predicting whether a student masters a given skill in a tutoring system [PHA07]. The numerous studies published in that area show that it is indeed possible to predict drop-off or performance in a degree or in a course (MOOCs excluded) with a reasonable accuracy, mostly over 70%. However these studies show also that there is neither one classifier nor a set of features that work well in all contexts though a number of studies indicate that having only socio-economic features and no marks at all lead to a poorer accuracy [GD 06, ZBP11]. Therefore one has to investigate which methods and which features work the best with the data at hand. I take [AMP14] to illustrate such a work. The aim of this work is to predict the class or interval A, B, C, D or E, in which the mark

3 Interaktive Visualisierung zur Darstellung und Bewertung von Learning-Analytics-Ergebnissen in Foren mit vielen Teilnehmern 103 of the degree lies. The degree is a 4 years Bachelor of Science in Computing and Information Technology in a technical university in Pakistan. Enrollment in this degree is competitive: students are selected based on their marks at High School Certificate (short HSC), average and Math-Physics-Chemistry, and on their performance on a entrance test. Because of this context, drop-off is almost non-existent. Using in particular the conclusions of [GD 06, ZBP11], [AMP14] conjectured that marks only, no socio- economic features, might be enough to predict performance at the end of the degree with an acceptable accuracy. The features that have been used to build several classifiers are HSC marks as well as first year and second year marks in each course. Table 1 shows the classifiers that have achieved an accuracy better than the baseline of 51.92%, the accuracy that is achieved when the majority class C is always predicted. Classifier Accuracy / Kappa Decision Tree with Gini Index 68.27% / 0.49 Decision Tree with Information Gain 69.23% / Decision Tree with Accuracy 60.58% / Rule Induction with Information Gain 55.77% / Nearest Neighbors 74.04% / Naives Bayes 83.65% / Neural Networks 62.50% / Random Forest with Gini Index 71.15% / Random Forest with Information Gain 69.23% / Random Forest with Accuracy 62.50% / Tab. 1: Comparison of Classifiers A unique feature of this work is to take one cohort to train a classifier and the next cohort to test it, as opposed to most of the works reported in the literature which use cross-validation, which means that only one cohort is used to train and test the classifier. The aim of using two successive cohorts is to check how well results generalize over time so as to use the experience of one cohort to put in place some policy to detect weak or strong students for the following cohort. One notices that 1- nearest neighbor and Naives Bayes perform particularly well although they have the drawback of not giving a human interpretable explanation of the results: it is not possible to know whether some courses could act as detectors of particularly poor or particularly good performance. 2.2 Clustering Clustering techniques are used to group objects so that similar objects are in the same cluster and dissimilar objects in different clusters. There are various clustering techniques and there are many tasks that use clustering. [CGS12] for instance clusters students and find typical participation s behaviors in forums. [EGL15] clusters utterances and is concerned with classifying automatically dialog acts

4 104 Agathe Merceron also called speech acts within tutorial dialogs. A dialog act is the action that a person performs while uttering a sentence like asking a question ( What is an anonymous class? ), exposing a problem or issue, giving an answer, giving a hint ( Here an interesting link about Apache Ant ), making a statement ( The explanations of the lectures notes are a bit succinct ), giving a positive acknowledgment ( Thanks, I have understood ), etc.. A common way of classifying sentences or posts into dialog acts is to use prediction or supervised methods as done in [KLK10]. First a labeled corpus is built: several annotators label the sentences of a corpus and identify cues or features to choose the dialog act. Support vector machines are reported to do rather well for this kind of task: [KLK10] reports F-score varying from 0.54 for positive acknowledgment (9.20% of the sentences of the corpus) to 0.95 for questions (55.31% of the sentences of the corpus). A major drawback of this approach is getting the labeled corpus, a major work done by hand. Therefore several works such as [EGL15] investigate approaches to classify sentences without the manual labeling. The corpus of [EGL15] comes from a computer-mediated environment to tutor students in introductory programming; moreover, in this case study, students have been recorded by Kinect cameras. Sentences are described by different kinds of features: lexical features (e.g. unigrams, word ordering, punctuation), dialog-context features (e.g. utterance position, utterance length, author of the previous message (student, tutor)), task features (e.g. writing code), posture features (e.g. distance between camera and head, mid torso, lower torso) and gesture features (e.g. one-hand-to-face, two-hands-to-face). Utterances are clustered using the K-Medoids algorithm and Bayesian Information Criterion (BIC) to infer the optimal number of clusters. For lexical features the distance between two utterances is calculated using their longest common subsequence and for other features using cosine similarity. Several clusterings are performed according to the dialog act of the previous tutor utterance. The majority vote of the utterances in each cluster gives the dialog act or label of that cluster. To classify a new student's utterance, the proper clustering is chosen according to the preceding dialog act of the tutor and the distance between the new utterance and the center of each cluster is calculated. The nearest cluster gives its dialog act to the utterance. Using a manually labeled corpus for evaluation, and a leave-one-student-out cross-validation, an average accuracy of 67% is reported (61.7% without posture and gesture features). Even if these results stay below what is currently achieved with supervised methods, this approach is very promising and continues to improve over earlier similar work such as reported in [VMN12]. 2.3 Relationship Mining [BY 09] divides this category into four sub-categories. Two of them, association rule mining and correlation mining, are illustrated here. [MY 05] uses the apriori algorithm for association rules to find mistakes that students often make together while solving exercises with a logic tutor. Results include associa-

5 Interaktive Visualisierung zur Darstellung und Bewertung von Learning-Analytics-Ergebnissen in Foren mit vielen Teilnehmern 105 tions such as if a student chooses the wrong set of premises to apply a rule, s/he is likely to also make a wrong deduction when applying a rule. Such findings have been used to enhance the tutor with proactive feedback whenever students make a mistake belonging to the found associations. One challenge in using association rules is the big number of rules that algorithms can return and the choice of an appropriate interestingness measure to filter them [MY 10]. [BCK04] conducted observations of students while using a cognitive tutor for middle school mathematics. Observers recorded whether students were on-task or off-task and, when off-task, whether students were in conversation, doing something else, inactive or gaming the system. Gaming the system means that a student uses quickly the hints offered by the tutor and so finishes quickly an exercise as the solution is basically given by the tutor. The calculation of correlations between post-tests and different off-task behaviors revealed that the biggest correlation in absolute value was obtained with gaming the system: This is high enough to indicate that gaming the system has a negative impact on learning. 2.4 Distillation of Data for Human Judgment This category includes statistics and visualizations that help humans make sense of their findings and analyses. Proper diagrams on the proper data help to grasp what happens at a glance and they form the essence of many dashboards and analytics tools such as LeMo [FEM13]. All the works presented so far show that data preparation is crucial. This remains true for visualization. Students and their marks can be visualized by a heat map. Clustering the students according to their marks [AMP15] and using the order given by the clustering to build the heat map helps visualize courses that can act as indicators of good or poor performance. 2.5 Discovery with Models As noted in [BY 09] this category is usually absent from conventional books about data mining or machine learning. This category encompasses approaches in which the model obtained in a previous study is included in the data to discover more patterns. An interesting illustration is given by the work of [BCR06] and [SPB15]. Building on [BCK04] the work in [BCR06] proposes a detector for gaming the system. This detector uses only data stored in the log files recorded by the cognitive tutor, no other source of data from sensors or cameras. Features include the number of times a specific step is wrong across all problems, the probability that the student knows a skill as calculated by the tutor, various times such as the time taken by the last 3 or 5 actions. Latent Response Models have been used to build the detector. This detector has been shown to generalize to new students and to new les-

6 106 Agathe Merceron sons of the cognitive tutor, and thus can be used to infer whether students who used the cognitive tutor gamed the system without having to actually observe them. The work in [SPB15] investigates the relation between different affects and behaviors and the majors chosen in college. Students who game the system enroll less in Science, Math. & Technology. 3 Current Trends Over the three last editions of the EDM conferences and the last edition of the LAK conference I observe an increase in the number of papers using following techniques: Natural Language Processing, Multilevel Analysis and Multimodal Analysis. Textual data produced by learners receive more attention. Methods of natural language processing are mainly used to analyze tutorial dialogs, to model students' reading and writing skills and to understand discussion forums. Multimodal analysis means that data from different sources are aggregated together as illustrated earlier with [EGL15]. Multilevel analysis means that different kinds of data stored by one system are aggregated and analyzed as in the framework Traces [Su 15] that is used to analyze interactions of a large community of users in a kind of learning platform offering chats, forums, file uploads and a calendar. Traces extracts events from the database and constructs contingency graphs which show the likelyhood that events are related. For example two events like uploading a file and writing a message in a chat might be related by a proximal contingency if they occur close enough in time, or two events like two messages having an overlap in their vocabulary might be related by a lexical contingency. These graphs can be abstracted and folded at several levels, the most general level being a sociogram which represents how actors are related through their contributions. Traces" can detect session of activities and in these sessions identify the main actors and those who might be disengaged. On a much smaller scale [Me 14] relates the forum level (dialog acts) to the performance level in an online-course taught with a LMS. 4 Conclusion Big data in education is a reality. There are numerous approaches to analyze educational data, numerous tasks that are tackled and interesting findings that are discovered. What is not a reality yet is the analysis of educational data on a routine basis to understand learning and teaching better and to improve them. I see at least two challenges on the way. One is privacy. Users of educational software have to trust what happens with their data that systems store and analyze. A reasonable answer is opt-in: interactions are stored only when users opt for it. This can limit the available data, hence the findings that can be made. Another challenge is generalizability: is a classifier for predicting

7 Interaktive Visualisierung zur Darstellung und Bewertung von Learning-Analytics-Ergebnissen in Foren mit vielen Teilnehmern 107 performance still valid 2 years later or for another degree? My answer is probably not. Validation needs to be checked regularly, which can slow down the adoption of educational data mining or learning analytics in everyday life. Models that are demanding computationally like classifiers for performance or detectors of behaviors have to be continuously re-established by data scientists. Nonetheless I believe that this field will continue to grow and finds its place in everyday education. Literaturverzeichnis [AMP14] Asif, R.; Merceron, A.; Pathan, M.K.: Predicting Student Academic Performance at Degree Level: A Case Study, In International Journal of Intelligent Systems and Applications (IJISA), Volume 7, Number 1, S , [AMP15] Asif, R., Merceron, A. and Pathan, M.K.: Investigating Performance of Students: a Longitudinal Study. In (Blinkstein, A.; Merceron, A.; Siemens, G.; Baron, J.; Marziaz, N.; Lynch, G. Hrsg.) Proceedings of LAK 15, ACM, S , [Ba 05] [BCK04] [BCR06] Barnes, T.: Q-matrix Method: Mining Student Response Data for Knowledge. In the technical Report (WS-05-02) of the AAAI-05 Workshop on Educational Data Mining. 8 p., Baker, R.; Corbett, A.T.; Koedinger, K.; Wagner, A.Z.: Off-task behavior in the cognitive tutor classroom: when students game the system. In Proc. Of SIGCHI conference of Human Factors in Computing Systems, Vienna, Austria, , Baker, R.; Corbett, A.T.; Roll, I.; Koedinger, K.: Developing a generalizable detector of when students game the system. User Modeling and User-Adapted Interaction, 18(3), , [BS 14] Baker, R.; Siemens G.: Educational data mining and learning analytics. In Sawyer, K. (Ed.) Cambridge Handbook of the Learning Sciences: 2nd Edition, pp , [BY 09] [CGS12] [EGL15] [FEM13] Baker, R.; Yacef, K.: The State of Educational Data Mining in 2009: A Review and Future Visions, In Journal of Educational Data Mining, Vol. 1(1), Cobo, G., Garcia, D., Santamaria, E., Moran, J.A., Melenchon, J., Monzo, C. Using agglomerative hierarchical clustering to model learner participation profiles in online discussion forums. In (Dawson, S., Haythornthwaite, C. Eds.): Proceedings of the 2nd International Conference on Learning Analytics and Knowledge., ACM, S , Ezen-Can, A.; Grafsgaard, J.F.; Lester, J.C., Boyer, K.E.: Classifying Dtudent Dialogue Avts with Multimodal Learning Analytics. In (Blinkstein, A.; Merceron, A.; Siemens, G.; Baron, J.; Marziaz, N.; Lynch, G. Hrsg.) Proceedings of LAK 15, ACM, S , Fortenbacher, A.; Elkina, M.; Merceron, A.: The Learning Analytics Application LeMo Rationals and First Results. In International Journal of Computing, Volume 12, Issue 3, S , 2013.

8 108 Agathe Merceron [GD 06] P. Golding, O. Donaldson: Predicting Academic Performance, Proceedings of 36th ASEE /IEEE Frontiers in Education Conference, [KBS10] Koedinger, K.; Baker, R.., Cunningham, K., Skogsholm, A., Leber, B.; Stamper, J.: A Data Repository for the EDM Community: The PSLC DataShop, Handbook of Educational Data Mining, CRC Press, S , [KLK10] [LRV12] Kim, J.; Li, J.; Kim. T. Towards Identifying Unresolved Discussions in Student Online Forums. In (Tetreault, J., Burstein, J., Leacock, C. Hrsg.): Proceedings of the 5th Wokshop on Innovative Use of NLP for Building Educational Applications. (Los Angeles, CA, USA, June 2010). Association for Computational Linguistics, Lopez, M. I., Romero, R., Ventura, V., Luna, J.M. Classification via clustering for predicting final marks starting from the student participation in Forums. In (Yacef, K., Zaïane, O., Hershkovitz, H., Yudelson, M., and Stamper, J. Hrsg.): Proceedings of the 5th International Conference on Educational Data Mining. S , [MY 05] Merceron, A.; Yacef, K.: Educational Data Mining: a case study. In (C.-K. Looi, G. McCalla, B. Bredeweg and J. Breuker Eds) Proceedings of Artificial Intelligence in Education (AIED2005), S , [Me 14] [MY 10] [PHA07] [RV 10] [SPB15] [Su 15] Merceron, A.: Connecting Analysis of Speech Acts nd Performance Analysis: a Initial Study. In Proceedings of the Workshop 3: Computational Approaches to Connecting Levels of Analysis in Networked Learning Communities, LAK 2014, Vol-1137, Merceron, A.; Yacef, K.: Measuring correlation of Strong Association Rules in Educational Data. In (C. Romero, S. Ventura, M. Pechenizkiy & R.S.J.d. Baker Eds.) the Handbook of Educational Data Mining. CRC Press, S , Pardos, Z.; Hefferman, H.; Anderson, B.; Hefferman, C.: The effect of Model Granularity on Student Performance Prediction Using Bayesian Networks, In Proceedings of the international Conference on User Modelling, Springer, Berlin, ps , Romero, C.; Ventura, S.: Educational Data Mining: A Review of the State of the Art. IEEE transactions on Systems, Man and Cybernetics, vol. 40(6), S , San Pedro, M.O., R. Baker, N. Heffernan, J. Ocumpaugh: What Happens to Students Who Game the System?. In (Blinkstein, A.; Merceron, A.; Siemens, G.; Baron, J.; Marziaz, N.; Lynch, G. Hrsg.) Proceedings of LAK 15, ACM, S , Suthers, D.: From Contingencies to Network-level Phenomena: Multilevel Analysis of Activity and Actors in Heterogeneous Networked Learning Environments. In (Blinkstein, A.; Merceron, A.; Siemens, G.; Baron, J.; Marziaz, N.; Lynch, G. Hrsg.) Proceedings of LAK 15, ACM, S , [VMN12] Rus, V.; Moldovan, C.; Niraula, N.; Graesser, C.C.: Automated discovery of speech act categories in educational games. In Proceedings of the International Conference on Educational Data Mining, S , 2012.

9 Interaktive Visualisierung zur Darstellung und Bewertung von Learning-Analytics-Ergebnissen in Foren mit vielen Teilnehmern 109 [WZN13] Wolff, A.; Zdrahal, Z.; Nikolov, A.; Pantucek, M.: Improving retention: predicting atrisk students by analysing clicking behaviour in a virtual learning environment. In Proceedings of the Third International Conference on Learning Analytics and Knowledge, S , [ZBP11] Zimmermann, J.; Brodersen, K.; Pellet, J.P.; August, E.; Buhmann, J.M.: Predicting graduate-level performance from undergraduate achievements. In Proceedings of the 4th International Conference on Educational Data Mining, S , 2011.

EDM: Methods, Tasks and Emerging Trends

EDM: Methods, Tasks and Emerging Trends EDM: Methods, Tasks and Emerging Trends Agathe Merceron Beuth Hochschule für Technik, Berlin Beuth Hochschule für Technik Berlin 1 Outline! Introduction: Big Data in Education! Methods and Tasks:! Prediction!

More information

How To Solve The Kd Cup 2010 Challenge

How To Solve The Kd Cup 2010 Challenge A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China [email protected] [email protected]

More information

Scholars Journal of Arts, Humanities and Social Sciences

Scholars Journal of Arts, Humanities and Social Sciences Scholars Journal of Arts, Humanities and Social Sciences Sch. J. Arts Humanit. Soc. Sci. 2014; 2(3B):440-444 Scholars Academic and Scientific Publishers (SAS Publishers) (An International Publisher for

More information

E-Learning Using Data Mining. Shimaa Abd Elkader Abd Elaal

E-Learning Using Data Mining. Shimaa Abd Elkader Abd Elaal E-Learning Using Data Mining Shimaa Abd Elkader Abd Elaal -10- E-learning using data mining Shimaa Abd Elkader Abd Elaal Abstract Educational Data Mining (EDM) is the process of converting raw data from

More information

Predicting Student Academic Performance at Degree Level: A Case Study

Predicting Student Academic Performance at Degree Level: A Case Study I.J. Intelligent Systems and Applications, 2015, 01, 49-61 Published Online December 2014 in MECS (http://www.mecs-press.org/) DOI: 10.5815/ijisa.2015.01.05 Predicting Student Academic Performance at Degree

More information

Intinno: A Web Integrated Digital Library and Learning Content Management System

Intinno: A Web Integrated Digital Library and Learning Content Management System Intinno: A Web Integrated Digital Library and Learning Content Management System Synopsis of the Thesis to be submitted in Partial Fulfillment of the Requirements for the Award of the Degree of Master

More information

COURSE RECOMMENDER SYSTEM IN E-LEARNING

COURSE RECOMMENDER SYSTEM IN E-LEARNING International Journal of Computer Science and Communication Vol. 3, No. 1, January-June 2012, pp. 159-164 COURSE RECOMMENDER SYSTEM IN E-LEARNING Sunita B Aher 1, Lobo L.M.R.J. 2 1 M.E. (CSE)-II, Walchand

More information

Data Mining for Education

Data Mining for Education Data Mining for Education Ryan S.J.d. Baker, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA [email protected] Article to appear as Baker, R.S.J.d. (in press) Data Mining for Education. To appear

More information

Unsupervised Modeling for Understanding MOOC Discussion Forums: A Learning Analytics Approach

Unsupervised Modeling for Understanding MOOC Discussion Forums: A Learning Analytics Approach Unsupervised Modeling for Understanding MOOC Discussion Forums: A Learning Analytics Approach ABSTRACT Aysu Ezen-Can Department of Computer Science [email protected] Shaun Kellogg Friday Institute for Educational

More information

Mining Wiki Usage Data for Predicting Final Grades of Students

Mining Wiki Usage Data for Predicting Final Grades of Students Mining Wiki Usage Data for Predicting Final Grades of Students Gökhan Akçapınar, Erdal Coşgun, Arif Altun Hacettepe University [email protected], [email protected], [email protected]

More information

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance.

Keywords Big Data; OODBMS; RDBMS; hadoop; EDM; learning analytics, data abundance. Volume 4, Issue 11, November 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Analytics

More information

Using Artificial Intelligence to Manage Big Data for Litigation

Using Artificial Intelligence to Manage Big Data for Litigation FEBRUARY 3 5, 2015 / THE HILTON NEW YORK Using Artificial Intelligence to Manage Big Data for Litigation Understanding Artificial Intelligence to Make better decisions Improve the process Allay the fear

More information

Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100

Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Identifying At-Risk Students Using Machine Learning Techniques: A Case Study with IS 100 Erkan Er Abstract In this paper, a model for predicting students performance levels is proposed which employs three

More information

Edifice an Educational Framework using Educational Data Mining and Visual Analytics

Edifice an Educational Framework using Educational Data Mining and Visual Analytics I.J. Education and Management Engineering, 2016, 2, 24-30 Published Online March 2016 in MECS (http://www.mecs-press.net) DOI: 10.5815/ijeme.2016.02.03 Available online at http://www.mecs-press.net/ijeme

More information

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL The Fifth International Conference on e-learning (elearning-2014), 22-23 September 2014, Belgrade, Serbia BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL SNJEŽANA MILINKOVIĆ University

More information

Chapter X: Educational Data Mining and Learning Analytics

Chapter X: Educational Data Mining and Learning Analytics Chapter X: Educational Data Mining and Learning Analytics Ryan Shaun Joazeiro de Baker and Paul Salvador Inventado Abstract In recent years, two communities have grown around a joint interest in how big

More information

Predicting Student Performance by Using Data Mining Methods for Classification

Predicting Student Performance by Using Data Mining Methods for Classification BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, No 1 Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0006 Predicting Student Performance

More information

A STUDY ON EDUCATIONAL DATA MINING

A STUDY ON EDUCATIONAL DATA MINING A STUDY ON EDUCATIONAL DATA MINING Dr.S.Umamaheswari Associate Professor K.S.Divyaa Research Scholar, School of IT and Science Dr.G.R.Damodaran College of Science Tamilnadu ABSTRACT Data mining is called

More information

Business Intelligence in E-Learning

Business Intelligence in E-Learning Business Intelligence in E-Learning (Case Study of Iran University of Science and Technology) Mohammad Hassan Falakmasir 1, Jafar Habibi 2, Shahrouz Moaven 1, Hassan Abolhassani 2 Department of Computer

More information

InVis: An EDM Tool For Graphical Rendering And Analysis Of Student Interaction Data

InVis: An EDM Tool For Graphical Rendering And Analysis Of Student Interaction Data InVis: An EDM Tool For Graphical Rendering And Analysis Of Student Interaction Data Vinay Sheshadri [email protected] Collin Lynch [email protected] Dr. Tiffany Barnes [email protected] ABSTRACT InVis is

More information

Learning is a very general term denoting the way in which agents:

Learning is a very general term denoting the way in which agents: What is learning? Learning is a very general term denoting the way in which agents: Acquire and organize knowledge (by building, modifying and organizing internal representations of some external reality);

More information

Learning, Schooling, and Data Analytics Ryan S. J. d. Baker

Learning, Schooling, and Data Analytics Ryan S. J. d. Baker Thank you for downloading Learning, Schooling, and Data Analytics Ryan S. J. d. Baker from the Center on Innovations in Learning website www.centeril.org This report is in the public domain. While permission

More information

E-Learning Standards and Learning Analytics

E-Learning Standards and Learning Analytics E-Learning Standards and Learning Analytics Can Data Collection Be Improved by Using Standard Data Models? Ángel del Blanco, Ángel Serrano, Manuel Freire, Iván Martínez-Ortiz, Baltasar Fernández-Manjón

More information

Modeling learning patterns of students with a tutoring system using Hidden Markov Models

Modeling learning patterns of students with a tutoring system using Hidden Markov Models Book Title Book Editors IOS Press, 2003 1 Modeling learning patterns of students with a tutoring system using Hidden Markov Models Carole Beal a,1, Sinjini Mitra a and Paul R. Cohen a a USC Information

More information

EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE

EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE EFFICIENCY OF DECISION TREES IN PREDICTING STUDENT S ACADEMIC PERFORMANCE S. Anupama Kumar 1 and Dr. Vijayalakshmi M.N 2 1 Research Scholar, PRIST University, 1 Assistant Professor, Dept of M.C.A. 2 Associate

More information

Modeling and Design of Intelligent Agent System

Modeling and Design of Intelligent Agent System International Journal of Control, Automation, and Systems Vol. 1, No. 2, June 2003 257 Modeling and Design of Intelligent Agent System Dae Su Kim, Chang Suk Kim, and Kee Wook Rim Abstract: In this study,

More information

Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results

Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results , pp.33-40 http://dx.doi.org/10.14257/ijgdc.2014.7.4.04 Single Level Drill Down Interactive Visualization Technique for Descriptive Data Mining Results Muzammil Khan, Fida Hussain and Imran Khan Department

More information

An Overview of Knowledge Discovery Database and Data mining Techniques

An Overview of Knowledge Discovery Database and Data mining Techniques An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,

More information

Mapping Question Items to Skills with Non-negative Matrix Factorization

Mapping Question Items to Skills with Non-negative Matrix Factorization Mapping Question s to Skills with Non-negative Matrix Factorization Michel C. Desmarais Polytechnique Montréal [email protected] ABSTRACT Intelligent learning environments need to assess the

More information

Machine Learning using MapReduce

Machine Learning using MapReduce Machine Learning using MapReduce What is Machine Learning Machine learning is a subfield of artificial intelligence concerned with techniques that allow computers to improve their outputs based on previous

More information

How To Use Data Mining For Knowledge Management In Technology Enhanced Learning

How To Use Data Mining For Knowledge Management In Technology Enhanced Learning Proceedings of the 6th WSEAS International Conference on Applications of Electrical Engineering, Istanbul, Turkey, May 27-29, 2007 115 Data Mining for Knowledge Management in Technology Enhanced Learning

More information

MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts

MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS

More information

Comparison of Data Mining Techniques used for Financial Data Analysis

Comparison of Data Mining Techniques used for Financial Data Analysis Comparison of Data Mining Techniques used for Financial Data Analysis Abhijit A. Sawant 1, P. M. Chawan 2 1 Student, 2 Associate Professor, Department of Computer Technology, VJTI, Mumbai, INDIA Abstract

More information

A comparative study of data mining (DM) and massive data mining (MDM)

A comparative study of data mining (DM) and massive data mining (MDM) A comparative study of data mining (DM) and massive data mining (MDM) Prof. Dr. P K Srimani Former Chairman, Dept. of Computer Science and Maths, Bangalore University, Director, R & D, B.U., Bangalore,

More information

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets

Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Data Quality Mining: Employing Classifiers for Assuring consistent Datasets Fabian Grüning Carl von Ossietzky Universität Oldenburg, Germany, [email protected] Abstract: Independent

More information

Facilitating Business Process Discovery using Email Analysis

Facilitating Business Process Discovery using Email Analysis Facilitating Business Process Discovery using Email Analysis Matin Mavaddat [email protected] Stewart Green Stewart.Green Ian Beeson Ian.Beeson Jin Sa Jin.Sa Abstract Extracting business process

More information

An Introduction to Data Mining

An Introduction to Data Mining An Introduction to Intel Beijing [email protected] January 17, 2014 Outline 1 DW Overview What is Notable Application of Conference, Software and Applications Major Process in 2 Major Tasks in Detail

More information

Data Mining Applications in Higher Education

Data Mining Applications in Higher Education Executive report Data Mining Applications in Higher Education Jing Luan, PhD Chief Planning and Research Officer, Cabrillo College Founder, Knowledge Discovery Laboratories Table of contents Introduction..............................................................2

More information

DATA MINING TECHNIQUES AND APPLICATIONS

DATA MINING TECHNIQUES AND APPLICATIONS DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,

More information

2015 Workshops for Professors

2015 Workshops for Professors SAS Education Grow with us Offered by the SAS Global Academic Program Supporting teaching, learning and research in higher education 2015 Workshops for Professors 1 Workshops for Professors As the market

More information

A Framework for the Delivery of Personalized Adaptive Content

A Framework for the Delivery of Personalized Adaptive Content A Framework for the Delivery of Personalized Adaptive Content Colm Howlin CCKF Limited Dublin, Ireland [email protected] Danny Lynch CCKF Limited Dublin, Ireland [email protected] Abstract

More information

Chapter 6. The stacking ensemble approach

Chapter 6. The stacking ensemble approach 82 This chapter proposes the stacking ensemble approach for combining different data mining classifiers to get better performance. Other combination techniques like voting, bagging etc are also described

More information

LeMo: a Learning Analytics Application Focussing on User Path Analysis and Interactive Visualization

LeMo: a Learning Analytics Application Focussing on User Path Analysis and Interactive Visualization The 7 th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications 12-14 September 2013, Berlin, Germany LeMo: a Learning Analytics Application

More information

Information Management course

Information Management course Università degli Studi di Milano Master Degree in Computer Science Information Management course Teacher: Alberto Ceselli Lecture 01 : 06/10/2015 Practical informations: Teacher: Alberto Ceselli ([email protected])

More information

Classification of Learners Using Linear Regression

Classification of Learners Using Linear Regression Proceedings of the Federated Conference on Computer Science and Information Systems pp. 717 721 ISBN 978-83-60810-22-4 Classification of Learners Using Linear Regression Marian Cristian Mihăescu Software

More information

Social Media Mining. Data Mining Essentials

Social Media Mining. Data Mining Essentials Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers

More information

The emergence of analytics

The emergence of analytics Educational Data Mining and Learning Analytics Ryan S.J.d. Baker, Teachers College, Columbia University George Siemens, Athabasca University 1. Introduction During the last decades, the potential of analytics

More information

KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics

KATE GLEASON COLLEGE OF ENGINEERING. John D. Hromi Center for Quality and Applied Statistics ROCHESTER INSTITUTE OF TECHNOLOGY COURSE OUTLINE FORM KATE GLEASON COLLEGE OF ENGINEERING John D. Hromi Center for Quality and Applied Statistics NEW (or REVISED) COURSE (KGCOE- CQAS- 747- Principles of

More information

Introduction to Data Mining

Introduction to Data Mining Introduction to Data Mining Jay Urbain Credits: Nazli Goharian & David Grossman @ IIT Outline Introduction Data Pre-processing Data Mining Algorithms Naïve Bayes Decision Tree Neural Network Association

More information

Pedagogical Use of Tablet PC for Active and Collaborative Learning

Pedagogical Use of Tablet PC for Active and Collaborative Learning Pedagogical Use of Tablet PC for Active and Collaborative Learning Oscar Martinez Bonastre [email protected] Antonio Peñalver Benavent [email protected] Francisco Nortes Belmonte [email protected]

More information

Knowledge Discovery and Data Mining

Knowledge Discovery and Data Mining Knowledge Discovery and Data Mining Unit # 11 Sajjad Haider Fall 2013 1 Supervised Learning Process Data Collection/Preparation Data Cleaning Discretization Supervised/Unuspervised Identification of right

More information

Meta-learning. Synonyms. Definition. Characteristics

Meta-learning. Synonyms. Definition. Characteristics Meta-learning Włodzisław Duch, Department of Informatics, Nicolaus Copernicus University, Poland, School of Computer Engineering, Nanyang Technological University, Singapore [email protected] (or search

More information

A Data Mining view on Class Room Teaching Language

A Data Mining view on Class Room Teaching Language 277 A Data Mining view on Class Room Teaching Language Umesh Kumar Pandey 1, S. Pal 2 1 Research Scholar, Singhania University, Jhunjhunu, Rajasthan, India 2 Dept. of MCA, VBS Purvanchal University Jaunpur

More information

Advanced Ensemble Strategies for Polynomial Models

Advanced Ensemble Strategies for Polynomial Models Advanced Ensemble Strategies for Polynomial Models Pavel Kordík 1, Jan Černý 2 1 Dept. of Computer Science, Faculty of Information Technology, Czech Technical University in Prague, 2 Dept. of Computer

More information

Enhanced Boosted Trees Technique for Customer Churn Prediction Model

Enhanced Boosted Trees Technique for Customer Churn Prediction Model IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 04, Issue 03 (March. 2014), V5 PP 41-45 www.iosrjen.org Enhanced Boosted Trees Technique for Customer Churn Prediction

More information

MOCLog Monitoring Online Courses with log data

MOCLog Monitoring Online Courses with log data MOCLog Monitoring Online Courses with log data Riccardo Mazza1, Marco Bettoni2, Marco Faré1, Luca Mazzola1 1eLab elearning Lab, USI Università della Svizzera italiana, Via Buffi, 13, Lugano - Switzerland

More information

René F. Kizilcec. Curriculum Vitae

René F. Kizilcec. Curriculum Vitae Department of Communication Stanford University, Stanford, CA 94305 [email protected] http://rene.kizilcec.com/ 650-798-4606 Education Ph.D., Communication, Stanford University, Stanford, CA, Expected

More information

A Framework of User-Driven Data Analytics in the Cloud for Course Management

A Framework of User-Driven Data Analytics in the Cloud for Course Management A Framework of User-Driven Data Analytics in the Cloud for Course Management Jie ZHANG 1, William Chandra TJHI 2, Bu Sung LEE 1, Kee Khoon LEE 2, Julita VASSILEVA 3 & Chee Kit LOOI 4 1 School of Computer

More information

Volume 2, Issue 12, December 2014 International Journal of Advance Research in Computer Science and Management Studies

Volume 2, Issue 12, December 2014 International Journal of Advance Research in Computer Science and Management Studies Volume 2, Issue 12, December 2014 International Journal of Advance Research in Computer Science and Management Studies Research Article / Survey Paper / Case Study Available online at: www.ijarcsms.com

More information

How To Find Influence Between Two Concepts In A Network

How To Find Influence Between Two Concepts In A Network 2014 UKSim-AMSS 16th International Conference on Computer Modelling and Simulation Influence Discovery in Semantic Networks: An Initial Approach Marcello Trovati and Ovidiu Bagdasar School of Computing

More information

Detecting E-mail Spam Using Spam Word Associations

Detecting E-mail Spam Using Spam Word Associations Detecting E-mail Spam Using Spam Word Associations N.S. Kumar 1, D.P. Rana 2, R.G.Mehta 3 Sardar Vallabhbhai National Institute of Technology, Surat, India 1 [email protected] 2 [email protected]

More information

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1

Data Mining. 1 Introduction 2 Data Mining methods. Alfred Holl Data Mining 1 Data Mining 1 Introduction 2 Data Mining methods Alfred Holl Data Mining 1 1 Introduction 1.1 Motivation 1.2 Goals and problems 1.3 Definitions 1.4 Roots 1.5 Data Mining process 1.6 Epistemological constraints

More information

Learning outcomes. Knowledge and understanding. Competence and skills

Learning outcomes. Knowledge and understanding. Competence and skills Syllabus Master s Programme in Statistics and Data Mining 120 ECTS Credits Aim The rapid growth of databases provides scientists and business people with vast new resources. This programme meets the challenges

More information

Web Mining as a Tool for Understanding Online Learning

Web Mining as a Tool for Understanding Online Learning Web Mining as a Tool for Understanding Online Learning Jiye Ai University of Missouri Columbia Columbia, MO USA [email protected] James Laffey University of Missouri Columbia Columbia, MO USA [email protected]

More information

Data Mining in Education: Data Classification and Decision Tree Approach

Data Mining in Education: Data Classification and Decision Tree Approach Data Mining in Education: Data Classification and Decision Tree Approach Sonali Agarwal, G. N. Pandey, and M. D. Tiwari Abstract Educational organizations are one of the important parts of our society

More information

Segmentation and Classification of Online Chats

Segmentation and Classification of Online Chats Segmentation and Classification of Online Chats Justin Weisz Computer Science Department Carnegie Mellon University Pittsburgh, PA 15213 [email protected] Abstract One method for analyzing textual chat

More information

Identifying Learning Styles in Learning Management Systems by Using Indications from Students Behaviour

Identifying Learning Styles in Learning Management Systems by Using Indications from Students Behaviour Identifying Learning Styles in Learning Management Systems by Using Indications from Students Behaviour Sabine Graf * Kinshuk Tzu-Chien Liu Athabasca University School of Computing and Information Systems,

More information

Subject Description Form

Subject Description Form Subject Description Form Subject Code Subject Title COMP417 Data Warehousing and Data Mining Techniques in Business and Commerce Credit Value 3 Level 4 Pre-requisite / Co-requisite/ Exclusion Objectives

More information

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree Predicting the Risk of Heart Attacks using Neural Network and Decision Tree S.Florence 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, K.Malathi 4 PG Scholar, Indian Institute of Information Technology, Srirangam,

More information

Customer Classification And Prediction Based On Data Mining Technique

Customer Classification And Prediction Based On Data Mining Technique Customer Classification And Prediction Based On Data Mining Technique Ms. Neethu Baby 1, Mrs. Priyanka L.T 2 1 M.E CSE, Sri Shakthi Institute of Engineering and Technology, Coimbatore 2 Assistant Professor

More information

A Framework for Dynamic Faculty Support System to Analyze Student Course Data

A Framework for Dynamic Faculty Support System to Analyze Student Course Data A Framework for Dynamic Faculty Support System to Analyze Student Course Data J. Shana 1, T. Venkatachalam 2 1 Department of MCA, Coimbatore Institute of Technology, Affiliated to Anna University of Chennai,

More information

The Open University s repository of research publications and other research outputs

The Open University s repository of research publications and other research outputs Open Research Online The Open University s repository of research publications and other research outputs OUSocial2: a platform for gathering students feedback from social media Conference Item How to

More information

Teaching in School of Electronic, Information and Electrical Engineering

Teaching in School of Electronic, Information and Electrical Engineering Introduction to Teaching in School of Electronic, Information and Electrical Engineering Shanghai Jiao Tong University Outline Organization of SEIEE Faculty Enrollments Undergraduate Programs Sample Curricula

More information

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery Index Contents Page No. 1. Introduction 1 1.1 Related Research 2 1.2 Objective of Research Work 3 1.3 Why Data Mining is Important 3 1.4 Research Methodology 4 1.5 Research Hypothesis 4 1.6 Scope 5 2.

More information

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014 RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer

More information

Research-based Learning (RbL) in Computing Courses for Senior Engineering Students

Research-based Learning (RbL) in Computing Courses for Senior Engineering Students Research-based Learning (RbL) in Computing Courses for Senior Engineering Students Khaled Bashir Shaban, and Mahmoud Abdulwahed Computer Science and Engineering Department; and CRU, Dean s Office Best

More information

IT services for analyses of various data samples

IT services for analyses of various data samples IT services for analyses of various data samples Ján Paralič, František Babič, Martin Sarnovský, Peter Butka, Cecília Havrilová, Miroslava Muchová, Michal Puheim, Martin Mikula, Gabriel Tutoky Technical

More information

Towards better accuracy for Spam predictions

Towards better accuracy for Spam predictions Towards better accuracy for Spam predictions Chengyan Zhao Department of Computer Science University of Toronto Toronto, Ontario, Canada M5S 2E4 [email protected] Abstract Spam identification is crucial

More information

Analysis of Social Media Streams

Analysis of Social Media Streams Fakultätsname 24 Fachrichtung 24 Institutsname 24, Professur 24 Analysis of Social Media Streams Florian Weidner Dresden, 21.01.2014 Outline 1.Introduction 2.Social Media Streams Clustering Summarization

More information

Evolution of Interests in the Learning Context Data Model

Evolution of Interests in the Learning Context Data Model Evolution of Interests in the Learning Context Data Model Hendrik Thüs, Mohamed Amine Chatti, Roman Brandt, Ulrik Schroeder Informatik 9 (Learning Technologies), RWTH Aachen University, Aachen, Germany

More information