How Course Management Systems Can Benefit from Exploratory Analysis of Student On-line Activity




Manuel Rubio-Sánchez, Raquel Hijón-Neira, Francisco Domínguez-Mateos, and Ángel Velázquez-Iturbide

Rey Juan Carlos University, Department of Computer Science Languages and Systems, c/ Tulipán s/n, 28-933 Madrid, Spain
{manuel.rubio,raquel.hijon,francisco.dominguez,angel.velazquez}@urjc.es

Abstract. Over the last few years, significant efforts have been made to develop effective Course Management Systems (CMS). Recently, they have begun to incorporate databases that contain valuable information about student on-line activity, behavior, scores, etc. Although CMS provide descriptive statistics related to these databases, they still lack adequate Exploratory Data Analysis (EDA) and Data Mining tools that would support a knowledge discovery process capable of revealing interesting patterns and hidden information. This paper analyzes the utility of EDA tools by examining data associated with a particular course taught through the Atnova Virtual Campus CMS. The study provides valuable information to be considered by any CMS, such as patterns of student activity, relevance of multivariate methods, or guidelines regarding the construction of databases.

1 Introduction

Many sophisticated CMS have been developed, and are in use, around the world. Educators who use these environments and tools, however, have very little support for evaluating students' activities and discriminating between their different on-line behaviors. New ways of obtaining information about learning patterns should be studied. This requires the development of effective methods for determining and evaluating student behavior in electronic environments.
In addition to the descriptive statistical analysis provided by most web access log analysis tools, such as hit frequency, the average or median length and duration of sessions, and other limited low-level statistical measures, several data mining approaches have been adapted specifically for web usage mining [1]. The most popular methods include association rule mining, clustering, classification, sequential pattern analysis and dependency modelling [2], as well as prediction. None of these applications, however, were tailored to distance learning, but the methods are general enough for e-learning systems to benefit from them. All of these techniques were designed for knowledge discovery from very large databases of numerical data [3], were adapted for web mining, and were applied in on-line businesses with relative success.

Along these lines, a prototype of such an application was designed and implemented in [4] as a tool for educators. It applies association rules to discover relationships between the learning activities that learners perform, sequential analysis to reveal interesting patterns in the sequences of on-line activities, and clustering to group similar access behaviors. Another work [5] presents a specialized web sequence analyzer for improving web page layout and structure based on the history of access sequences. Carr-Chellman and Duchastel [6] argue that simply transposing traditional course material onto the web does not use the medium to its best advantage, and that effective on-line instruction has specific and different design requirements.

However, determining learning behavior in electronic media is a complex problem. The difficulty resides in the fact that students mostly use these environments away from the classroom and out of sight of their educators. Without the informal monitoring that occurs in face-to-face teaching, it is difficult for educators to know how their students are using and responding to these environments. Educators have had to seek new ways of obtaining information about the learning patterns of their students. This requires the development of effective methods for determining and evaluating learner behavior in electronic environments. For example, an analysis of student use of a courseware website [7] found that the most popular on-line activities were passive and involved retrieving information rather than contributing. The authors concluded that the students were very goal oriented in their use of the website. Further information can be obtained by discovering how students access resources [8]. This can help educators understand students' preferred learning patterns.
A study carried out in [9] explored interactions of doctoral students with an on-line environment, and concluded that student interactions were goal focused. Similarly, in a study of student use of a first-year geology website [10], log file analysis showed that students accessed the most recent lecture notes first, picking up a couple of key slides, before returning to a previous lecture. This showed that students were accessing resources according to immediate need. Following a similar approach, another study showed that the average connection to the CMS was over thirty minutes long [11].

According to [12], the methods used in educational web-based systems to improve the e-learning experience, where the ultimate objective is the discovery of system usage patterns and, more generally, knowledge discovery in databases, are: (1) methods for classifying students based on their usage patterns in a web-based course; (2) methods for system personalization (see [13]); and (3) methods that allow automatic detection of atypical behaviors.

Despite the existence of some research concerned with the mining of data generated by the use of e-learning systems, there is still a lack of standard methods and techniques to address some open problems in distance education.

This paper analyzes the utility of EDA tools by examining data associated with a particular course taught through the Atnova Virtual Campus CMS. The study provides valuable information to be considered by any CMS, such as patterns of student activity, relevance of multivariate methods, or guidelines regarding the construction of databases. The rest of the paper is organized as follows. Section 2 provides an overview of the course under study. Section 3 describes the different exploratory analysis methods, along with their results. Finally, several conclusions are summarized in Sec. 4.

2 Brief Course Description

The course under analysis, How to visually show data and explanations, belongs to the ADA-Madrid project [14], a pioneering program aimed at promoting the use of communication and information technologies in on-line teaching activities. Every year the project comprises about 20 general elective courses offered to students who attend any of the six public universities of Madrid (Complutense, Autónoma, Rey Juan Carlos, Politécnica, Alcalá, Carlos III). The maximum number of students per course is 60 (10 per university). Teaching is carried out via the Internet through the Atnova Virtual Campus CMS (see Fig. 1), but can also be complemented with special videoconference sessions.

Fig. 1. ADA-Madrid Virtual Campus.

The course is based on information visualization [15], a classical area of computer science, but is presented to students in a simple and pleasing manner. This is a prerequisite imposed by the ADA-Madrid project, where students must be able to benefit from the courses regardless of their specific major. In this sense, the approach is general and informative, and the main concepts stem from the areas of graphical design and visualization. Experiences with similar courses have been reported elsewhere [16].

The teaching methodology consists of posting a new lesson every week (9 lessons in total). Students can either read the lessons directly on screen or print them on paper. They can also download them as they are posted in commonly used electronic formats (pdf, HTML, doc, avi).
Interactions with professors and with other students take place through dedicated forums, a chat tool, or simply by exchanging messages (not emails) within the Atnova Virtual Campus.

Evaluations are based on two assignments where students are required to use graphical computer tools (40% each), and a final test (20%).

3 Exploratory Analysis

In order to gain insight into students' learning patterns and behavior, enhance the teaching methodology for future courses, and discover information unknown a priori, several visual EDA analyses were carried out. For this purpose, a dedicated working database was constructed to analyze the following numeric variables and their relationships: scores on both assignments, the test, and the final grade (4 variables); number of posts in forums, total number of established sessions, and total connection time in minutes (3 variables); number of established sessions per lesson (9 variables); and connection time in minutes per lesson (9 variables). Two additional nominal/categorical variables were also considered: whether or not the student dropped the course, and the student's major (this variable was used to estimate computer user skills).

3.1 Data Acquisition Process

The Atnova Virtual Campus CMS provides several predefined tables or databases, which can be exported to files with a particular format, containing information related to the numeric variables, for example: connection times per day, sessions and connection times, connection times per session, connection times per lesson, test scores, or final grades. The system is quite flexible, since instructors can customize the size or range of the tables. For instance, when analyzing connection times per day it is possible to choose a date interval (the first and last day). However, the system does not incorporate a tool for combining information from various tables. Note that the working database used in the analysis contains data from several tables. Merging them therefore became a very tedious, practically manual, task. One of the main problems encountered was the different number of entries per table.
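The merging step described above can be illustrated with a minimal sketch. It assumes that each exported table has already been loaded as a list of rows sharing a student identifier; the function name `merge_tables`, the key `student_id`, the column names and the sample values are all hypothetical and are not part of the Atnova Virtual Campus. The sketch performs an outer join, so that tables with different numbers of entries still yield one complete row per student, with missing values marked explicitly.

```python
from collections import defaultdict


def merge_tables(tables, key="student_id"):
    """Outer-join several tables (lists of row dicts) on `key`.

    Students missing from a table get None for that table's columns,
    so every merged row has the same set of columns. If two tables
    share a column name, the later table's value wins.
    """
    # Collect every column name other than the key, in first-seen order.
    columns = []
    for rows in tables:
        for row in rows:
            for col in row:
                if col != key and col not in columns:
                    columns.append(col)

    # Accumulate the values known for each student across all tables.
    merged = defaultdict(dict)
    for rows in tables:
        for row in rows:
            merged[row[key]].update({c: v for c, v in row.items() if c != key})

    # Fill the gaps left by tables that have fewer entries.
    return [
        {key: sid, **{c: merged[sid].get(c) for c in columns}}
        for sid in sorted(merged)
    ]


# Hypothetical exports; names and values are made up for illustration.
sessions = [
    {"student_id": "s01", "sessions": 42, "minutes": 910},
    {"student_id": "s02", "sessions": 7, "minutes": 95},
    {"student_id": "s03", "sessions": 18, "minutes": 400},
]
grades = [
    {"student_id": "s01", "final_grade": 8.5},
    {"student_id": "s03", "final_grade": 6.0},
]  # s02 has no grade entry, e.g. a student who dropped the course

working_db = merge_tables([sessions, grades])
for row in working_db:
    print(row)
```

Keeping the missing grade as `None` rather than dropping the student preserves the information that an entry was absent, which matters later when drop-outs are analyzed.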
Therefore, it would be very desirable for CMS to provide mechanisms enabling educators to create customized databases (see Fig. 2). In this setting, instructors would be able to select the specific variables needed for their analyses effortlessly.

Fig. 2. CMS should provide tools for merging different tables.

The data acquisition process can also reveal surprising information. In this case, when compiling information about the students' majors, it was found that every student had specified theirs, except for the 10 students from the University of Alcalá. This suggests a possible problem in the application/registration system at that university.

3.2 Specific Category Analysis

In some analyses it is necessary to consider a particular partition of the original working database. For on-line courses, it seems mandatory to evaluate students' performance according to their computer user skills. Several EDA techniques can be used to examine or estimate probability distributions. Fig. 3 uses box-plots to show the distributions of students' grades (on a scale from 0 to 10) according to their computer user skills. This result indicates, for example, the need to provide less experienced students with tutorials and user manuals to ease their adaptation to the CMS.

Fig. 3. Analysis of final grades according to computer user skills with box-plots.

Other aspects, such as gender, may also be analyzed. Although there have been various studies of web usage, few report gender-based differences in how courseware is used. The study in [7] found differences in the type of resources accessed by male and female students. Males used interactive resources significantly more than females, whereas females used passive resources more than males.

3.3 Correlations between Variables

The relationships between variables can also provide useful information. There exist multiple techniques for this purpose (see [17–19]). In Fig. 4 the correlation coefficient matrix (absolute values) shows a curious behavior pattern: there is a high degree of correlation between the numbers of sessions in consecutive lessons.

Fig. 4. Absolute values of the symmetric correlation coefficient matrix.

Given this result, a sharp decline in a student's interest or performance could help identify potential drop-outs. In on-line courses it is very important to detect and understand, as early as possible, the factors responsible for dropping a course. Note that when a student drops a course, the database contains missing data, which is difficult to process. Furthermore, the success of any on-line course depends on a high degree of student participation, so instructors should pay special attention to this matter. Thus, it would be desirable for CMS to provide adaptive guidance tools to prevent student failure.

A study [20], which analyzed log file interactions with different resources on a courseware website, found a relationship between the frequency of access to learning resources and final exam scores. The authors claim that this provides evidence that the use of relevant web contents improves learning. On the other hand, [21] reports a study of six measures of student behavior in a CMS that did not consistently correlate with grades. Note that this situation appears in the present study as well.

3.4 Multivariate Analysis

Multivariate analysis methods can detect similarities in learning behaviors and group (cluster) students. However, due to the curse of dimensionality (see [18, 19]), it is difficult to draw conclusions when the number of students is low, or when working with a large number of variables. Fig. 5 shows projections of the working database onto a plane with two graphical multivariate analysis methods: (a) Sammon's mapping [22], and (b) a self-organizing map, visualized with the U-matrix method [23, 19]. Since the database only contains information about 47 students (13 dropped the course), and since there are up to 25 variables, it is very unlikely for clusters to appear in the images. The only valuable information concerns outlier data, which corresponds to two extremes: users who barely access the CMS, and users who appear to be connected for weeks.
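To make the idea behind the projection in Sammon's mapping [22] more concrete, the following is a rough sketch and not the implementation used in this study. It minimizes Sammon's stress, the sum over all pairs of points of the squared difference between original and projected distances, divided by the original distance and normalized by the sum of all original distances. For simplicity it uses plain gradient descent with step halving instead of the update rule of the original paper. The function names, parameter defaults and sample rows are hypothetical, and the data are assumed to be already scaled to comparable ranges with no duplicate rows.

```python
import math
import random


def distances(points):
    """Symmetric matrix of Euclidean distances between all pairs of points."""
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))) for q in points]
        for p in points
    ]


def sammon_stress(d_high, d_low):
    """Sammon's stress between original (d_high) and projected (d_low) distances."""
    n = len(d_high)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    total = sum(d_high[i][j] for i, j in pairs)
    err = sum((d_high[i][j] - d_low[i][j]) ** 2 / d_high[i][j] for i, j in pairs)
    return err / total


def sammon_map(data, dims=2, iters=300, lr=10.0, seed=0):
    """Project `data` to `dims` dimensions; returns (layout, stress history)."""
    rng = random.Random(seed)
    n = len(data)
    d_high = distances(data)
    total = sum(d_high[i][j] for i in range(n) for j in range(i + 1, n))
    y = [[rng.uniform(-1.0, 1.0) for _ in range(dims)] for _ in range(n)]
    stress = sammon_stress(d_high, distances(y))
    history = [stress]
    for _ in range(iters):
        d_low = distances(y)
        grad = [[0.0] * dims for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j or d_low[i][j] == 0.0:
                    continue
                # Gradient of the stress with respect to point i.
                coef = (d_high[i][j] - d_low[i][j]) / (d_high[i][j] * d_low[i][j])
                for k in range(dims):
                    grad[i][k] -= 2.0 / total * coef * (y[i][k] - y[j][k])
        trial = [[y[i][k] - lr * grad[i][k] for k in range(dims)] for i in range(n)]
        trial_stress = sammon_stress(d_high, distances(trial))
        if trial_stress < stress:
            y, stress = trial, trial_stress  # accept the improving step
        else:
            lr *= 0.5  # reject the step and try a smaller one next time
        history.append(stress)
    return y, history


# Hypothetical, already standardized rows (one per student, four variables).
students = [
    (0.9, 0.8, 1.0, 0.7),
    (0.7, 0.6, 0.5, 0.4),
    (0.8, 0.9, 0.7, 0.6),
    (-0.6, -0.5, -0.8, -0.4),
    (-0.5, -0.7, -0.6, -0.3),
    (-1.5, 3.0, -1.0, -1.8),  # unusually long connection time: a candidate outlier
]
layout, history = sammon_map(students)
print(f"stress: {history[0]:.4f} -> {history[-1]:.4f}")
for point in layout:
    print([round(c, 2) for c in point])
```

Because a step is only accepted when it lowers the stress, the recorded stress never increases. In the resulting layout, a student whose row is far from all others in the original space would be expected to lie far from the rest in the plane as well, which is how outliers become visible.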

Fig. 5. Graphical multivariate methods: (a) Sammon's mapping; (b) self-organizing map.

4 Conclusions

This paper analyzes the utility and necessity of EDA tools for enhancing CMS. The study shows that these systems should incorporate a tool for customizing database tables in order to enable the selection of any particular set of variables. In this setting, EDA methods can be used effectively to discover information. Most techniques work well if the number of variables to be analyzed is relatively low. For instance, in the analyzed course, students with better computer user skills obtained, on average, higher grades. Detecting clusters in the data proved impractical due to the low number of students and the high dimensionality of the database. Graphical multivariate methods, however, were capable of recognizing outlier students. Finally, a pattern of student activity was discovered by analyzing correlations between variables: the numbers of sessions in consecutive lessons were highly correlated.

5 Acknowledgements

This work is supported by project TIN2004-07568 of the Spanish Ministry of Education and Science.

References

1. Zaïane, O.R.: Web usage mining for a better web-based learning environment. In: Proc. of Conference on Advanced Technology for Education, Banff, Alberta (2001) 60–64
2. Srivastava, J., Cooley, R., Deshpande, M., Tan, P.: Web usage mining: Discovery and applications of usage patterns from web data. SIGKDD Explorations 1(2) (2000) 12–23
3. Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers (2001)
4. Zaïane, O.R.: Towards evaluating learners' behaviour in a web-based distance learning environment. In: Second IEEE International Conference on Advanced Learning Technologies (ICALT 01), Madison, WI (2001) 357–360
5. Spiliopoulou, M., Faulstich, L.C., Winkler, K.: A data miner analyzing the navigational behaviour of web users. In: Proc. of workshop on Machine Learning in User Modeling of the ACAI 99, Crete, Greece (1999) 357–360
6. Carr-Chellman, A., Duchastel, P.: The ideal online course. British Journal of Educational Technology 31(3) (2000) 229–241
7. Peled, A., Rashty, D.: Logging for success: Advancing the use of WWW logs to improve computer mediated distance learning. Journal of Educational Computing Research 21 (1999) 413–431
8. Sheard, J., Albrecht, D.W., Butbul, E.: ViSION: Visualization student interactions online. In Treloar, A., Ellis, A., eds.: Proc. of the Eleventh Australasian World Wide Web Conference, Gold Coast, QLD, Australia, Southern Cross University, Lismore, NSW, Australia (2005) 48–58
9. McIsaac, M.S., Blocher, J.M., Mahes, V., Vrasidas, C.: Student and teacher perceptions of interaction in online computer-mediated communication. Educational Media International 36(2) (1999) 121–131
10. Hellwege, J., Gleadow, A., Naught, C.M.: Paperless lectures on the web: An evaluation of the educational outcomes of teaching geology using the web. In: Proc. of 13th Annual Conference of the Australian Society for Computers in Learning in Tertiary Education (ASCILITE 96), Adelaide, Australia, University of South Australia (1996)
11. Hijón, R., Velázquez, A.: Web, log analysis and surveys for tracking university students. In: Proc. of IADIS International Conference on Applied Computing, San Sebastián, Spain (2006) 561–564
12. Castro, F., Vellido, A., Nebot, A., Minguillón, J.: Detecting atypical student behaviour on an e-learning system. In: I National Symposium of Technologies of the Information and the Communications in the Education, SINTICE 2005, Granada, Spain (2005) 153–160
13. Brusilovsky, P.: Adaptive hypermedia. User Modeling and User-Adapted Interaction, Ten Year Anniversary Issue (1/2) (2001) 87–110
14. Velázquez, A., Rubio, M.: Chap. 12: Design and evaluation of the course: How to visually show data and explanations. In Criado, R., Conde, J.V., eds.: I Pedagogical Conference of the ADA-Madrid Project. (2005) 119–125 In Spanish.
15. Spence, R.: Information Visualization. Addison-Wesley, Harlow, England (2001)
16. Carter, R.: Teaching visual design principles for computer science students. Computer Science Education 3(1) (2003) 67–90
17. Tukey, J.W.: Exploratory Data Analysis. Addison-Wesley, Reading, Mass. (1977)
18. Du Toit, S.H., Steyn, A.G., Stumpf, R.H.: Graphical exploratory data analysis. Springer-Verlag, New York, NY, USA (1986)
19. Rubio, M.: New Methods for Visual Analysis of Self-Organizing Maps. PhD thesis, Polytechnic University of Madrid, Madrid, Spain (2004) In Spanish.
20. Lu, A.X., Zhu, J.J., Stokes, M.: The use and effects of web-based instruction: Evidence from a single-source study. Journal of Interactive Learning Research 11(2) (2000) 197–218
21. Nickles, G.M.: Correlations of student grades and behavior while using a course management system under different contexts. In: Proc. of the American Society for Engineering Education Annual Conference & Exposition, Portland, OR, US (2005)
22. Sammon Jr., J.W.: A nonlinear mapping for data structure analysis. IEEE Transactions on Computers C-18(5) (1969) 401–409
23. Kohonen, T.: Self-Organizing Maps. Third edn. Springer (2001)