Data Analysis in E-Learning System of Gunadarma University by Using Knime



Similar documents
COURSE RECOMMENDER SYSTEM IN E-LEARNING

KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL

DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.

E-Learning Using Data Mining. Shimaa Abd Elkader Abd Elaal

DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7

Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

IT services for analyses of various data samples

Ensembles and PMML in KNIME

Data Mining & Data Stream Mining Open Source Tools

ANALYSIS OF WEBSITE USAGE WITH USER DETAILS USING DATA MINING PATTERN RECOGNITION

SPMF: a Java Open-Source Pattern Mining Library

Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence

Didacticiel Études de cas. Association Rules mining with Tanagra, R (arules package), Orange, RapidMiner, Knime and Weka.

Technical Report. The KNIME Text Processing Feature:

An Introduction to WEKA. As presented by PACE

Building A Smart Academic Advising System Using Association Rule Mining

PARAMETRIC COMPARISON OF DATA MINING TOOLS

Chapter 2: Getting Started

Introduction Predictive Analytics Tools: Weka

A Time Efficient Algorithm for Web Log Analysis

Monitis Project Proposals for AUA. September 2014, Yerevan, Armenia

Data Mining in Web Search Engine Optimization and User Assisted Rank Results

PHP Language Binding Guide For The Connection Cloud Web Services

Experiments in Web Page Classification for Semantic Web

New Relic & JMeter - Perfect Performance Testing

The Prophecy-Prototype of Prediction modeling tool

Arti Tyagi Sunita Choudhary

automates system administration for homogeneous and heterogeneous networks

Contents WEKA Microsoft SQL Database

Semantic Search in Portals using Ontologies

How To Analyze Web Server Log Files, Log Files And Log Files Of A Website With A Web Mining Tool

Predicting the Risk of Heart Attacks using Neural Network and Decision Tree

Processing and data collection of program structures in open source repositories

A Visualization System and Monitoring Tool to Measure Concurrency in MPICH Programs

Component visualization methods for large legacy software in C/C++

JRefleX: Towards Supporting Small Student Software Teams

Pentaho Data Mining Last Modified on January 22, 2007

Evaluating an Integrated Time-Series Data Mining Environment - A Case Study on a Chronic Hepatitis Data Mining -

First Steps towards a Frequent Pattern Mining with Nephrology Data in the Medical Domain. - Extended Abstract -

aloe-project.de White Paper ALOE White Paper - Martin Memmel

PROGRAMMING FOR BIOLOGISTS. BIOL 6297 Monday, Wednesday 10 am -12 pm

RAPIDMINER FREE SOFTWARE FOR DATA MINING, ANALYTICS AND BUSINESS INTELLIGENCE. Luigi Grimaudo Database And Data Mining Research Group

Efficiency Considerations of PERL and Python in Distributed Processing

Understanding Web personalization with Web Usage Mining and its Application: Recommender System

Improved metrics collection and correlation for the CERN cloud storage test framework

Web Document Clustering

Chapter 2 Interactive and Collaborative E-Learning Platform with Integrated Social Software and Learning Management System

An Effective Analysis of Weblog Files to improve Website Performance

Data Mining Solutions for the Business Environment

SAP Business Objects Business Intelligence platform Document Version: 4.1 Support Package Data Federation Administration Tool Guide

A Performance Study of Load Balancing Strategies for Approximate String Matching on an MPI Heterogeneous System Environment

A THREE-TIERED WEB BASED EXPLORATION AND REPORTING TOOL FOR DATA MINING

AN EFFICIENT APPROACH TO PERFORM PRE-PROCESSING

THE COMPARISON OF DATA MINING TOOLS

Welcome to Collage (Draft v0.1)

Association rules for improving website effectiveness: case analysis

A Survey on Web Mining From Web Server Log

TSRR: A Software Resource Repository for Trustworthiness Resource Management and Reuse

EFFICIENCY CONSIDERATIONS BETWEEN COMMON WEB APPLICATIONS USING THE SOAP PROTOCOL

A Comparative Study of Different Log Analyzer Tools to Analyze User Behaviors

Product Guide. Sawmill Analytics, Swindon SN4 9LZ UK tel:

Business Intelligence in E-Learning

ProM 6 Exercises. J.C.A.M. (Joos) Buijs and J.J.C.L. (Jan) Vogelaar {j.c.a.m.buijs,j.j.c.l.vogelaar}@tue.nl. August 2010

Monitoring Replication

LiDDM: A Data Mining System for Linked Data

City Data Pipeline. A System for Making Open Data Useful for Cities. stefan.bischof@tuwien.ac.at

ABSTRACT The World MINING R. Vasudevan. Trichy. Page 9. usage mining. basic. processing. Web usage mining. Web. useful information

A Review on Network Intrusion Detection System Using Open Source Snort

Generalization of Web Log Datas Using WUM Technique

Evolution of Interests in the Learning Context Data Model

Excel for Data Cleaning and Management

KNIME Enterprise server usage and global deployment at NIBR

Problem Solving Hands-on Labware for Teaching Big Data Cybersecurity Analysis

Identifying the Number of Visitors to improve Website Usability from Educational Institution Web Log Data

KNIME opens the Doors to Big Data. A Practical example of Integrating any Big Data Platform into KNIME

LDIF - Linked Data Integration Framework

WEB SITE OPTIMIZATION THROUGH MINING USER NAVIGATIONAL PATTERNS

Integrating Big Data into the Computing Curricula

An application for clickstream analysis

2015 Workshops for Professors

Enhance Preprocessing Technique Distinct User Identification using Web Log Usage data

Handling Big(ger) Logs: Connecting ProM 6 to Apache Hadoop

How To Know If You Can Get Open Source Software To Work For A Corporation

A QTI editor integrated into the netuniversité web portal using IMS LD

DIABLO VALLEY COLLEGE CATALOG

A Framework of User-Driven Data Analytics in the Cloud for Course Management

Analyzing the Different Attributes of Web Log Files To Have An Effective Web Mining

MicroStrategy Course Catalog

A system is a set of integrated components interacting with each other to serve a common purpose.

Final Project Report

Transcription:

Data Analysis in E-Learning System of Gunadarma University by Using Knime Dian Kusuma Ningtyas tyaz tyaz tyaz@student.gunadarma.ac.id Prasetiyo prasetiyo@student.gunadarma.ac.id Farah Virnawati virtha 1408@student.gunadarma.ac.id Tirta Paramitta eatha 020688@student.gunadarma.ac.id I. Wayan Simri W. iwayan@staff.gunadarma.ac.id Gunadarma University Jl. Margonda Raya 100, Depok, Indonesia, 16424 Abstract Nowadays, E-Learning has become a preferred choice of learning method because it can reduce operational cost in establishing educational activities. It also offers time efficiency, flexibility in the way of attending courses, and ease of access on teaching material. The main problem is how to make user more interested in accessing and fulfilling user requirements to provide suitable E-Learning system. One way to meet that challenge is by analyzing user behaviors. The purpose is to give a recommendation and evaluation in developing E-Learning system. In this paper, we will use log data from E-Learning system of Gunadarma University. Using Knime, we have done some analysis to explore the pattern of user s behaviors. Later, the pattern will be used as an input to decide a way to enhance E-Learning system. Keywords : access, data analysis, E-Learning, Knime, pattern, user behavior 1. Introduction E-Learning is one of learning methods that appears as a personalization demand of human resource development supported with Information Technology sector. E-Learning enables user to access course material by using internet, intranet, or other media. Now E-Learning apparently becomes a preferred choice because it offers some benefits, such as reduced operational cost, ease of access, broader participant, and minimal use of equipment and printed material. E-Learning can also be intended to those who have several limitations in attending physical class. With the great number of benefits offered, now many institutions implement E-Learning as a learning solution to their staffs, especially in educational sector, like Gunadarma University. E-Learning has also been applied to their staffs and students. Moreover, E-Learning becomes a local issue released by Ministry of National Education in National Education Meeting on February 4th - 6th 2008. It makes E-Learning considered as a way to enrich the material of learning activities which are still performed classically[1]. Based on that fact, E-Learning has been considered as the most prospective way to establish learning environment especially in Indonesia. In the new established E-Learning system, it is important to continuously evaluate and monitor effects of its implementation. The main problem is how to provide E-Learning system that can attract and really suitable with user demand. One way to explore user demands is by analyzing their behavior when accessing E-Learning system. Primary result expected from behavioral analysis is to give a recommendation in enhancing E-Learning system. This recommendation result is very suitable for an institution, like Gunadarma University, that recently establishes E-Learning system. This analysis is intended to two parties, developer and E-Learning user. The developer of E-Learning system can observe some effects and tendencies that really happen in E-Learning implementation. The effects and tendencies can become input for evaluation to improve or decide the next step in E-Learning system development. Users can also take the benefits of this analysis. With a better E-Learning system, they will become more interested in accessing and also increasing their learning interests. This paper is organized as follows. In section 2, we will explain recent approach used for optimizing E-Learning. In 63

section 3, we will give a brief overview to Knime as tool that used in this paper. Section 4 presents our case study that can be solved by using Knime. Section 5 shows the result and discussion to our case study. Section 6 concludes the paper. 2. Recent Approach For E-Learning Optimization In fact, there are a lot of methods which can be used to optimize the effectiveness of learning process through E-Learning. It is proven with some papers discuss about methodologies used to solve problem in E-Learning. In paper Discovering Student Preferences in E-Learning, Cristina Carmina, Gladys Castillo and Eva Millan propose to use adaptive machine learning algorithms to learn about the student s preferences over time. First, they use all the background knowledge available about a particular student to build an initial decision model based on learning styles. This model can then be fine-tuned with the data generated by the student s interactions with the system in order to reflect more accurately his/her current preferences[4]. Then Felix Castro, Alfredo Vellido, Angela Nebot, and Francisco Mugica explain more detail about several techniques of data mining which deal with Artificial Intelligent. Also, they provide a taxonomy of e-learning problems to which Data Mining techniques have been applied, including, for instance: Students classification based on their learning performance; detection of irregular learning behaviors; e-learning system navigation and interaction optimization; clustering according to similar E-learning system usage; and systems adaptability to students requirements and capacities in paper Applying of Data of Mining Techniques to E-Learning Problems [5]. There is also Wengang Liu trying to introduce an ecological method which reveals latent user models that can be used for various pedagogical purposes in educational applications. This method consists of six layers which are the raw data layer, the factor data layer, the data mining layer, the measurement layer, the metric layer and the application layer. Liu also applies clustering tools to dynamically analyze and group learners to match their behavior and performance, and create new metrics and measurements to represent user models[6]. Besides the studies about method used in Data Mining, there are also many writers have tried to analyze the internal issue of E-Learning, one of them is David Monk, by paper entitled Using of Data of Mining for e-learning Decision Making, he tried to examine the path learners followed when offered the course in a custom virtual learning environment (VLE), which is applied at University of Glamorgan, structured by tasks, course materials and learning resources. A better understanding of how learners accessed the electronic course materials was needed to evaluate the effectiveness of developing and delivering courses in this way. By combining the data content of activity with the user s profiles were possible to examine alternate information perspectives and reveal patterns in large volume data sets. Mining data in this way provides ways to learn about learners in order to make effective decisions regarding teaching methods, delivery models and infrastructure investment[7]. There is also Azizul Azhar bin Ramli who is trying to implement the high level process of Web Usage Mining using basic Association Rules algorithm called Apriori Algorithm, in order to produce the university E Learning (UUM Educare) portal usage patterns and user behaviors, using commercial data Web mining tools (WebLog Expert Lite 3.5 and Sawmill 7) and ARunner 1.0 (prototype of GUI Christian Borgelt Apriori tool by Shamrie Sainin, FTM, UUM), it has identified several Web access patterns. This analysis includes descriptive statistic and Association Rules for the portal including support and confidence to represent the Web usage and user behavior, which is explained in his paper entitled Web Usage Mining Apriori Algorithm : UUM Learning Care Portal Case [3]. What we have done in this paper is relatively a simple approach. By using Knime tool, we have discovered some results of our behavior analysis. Mostly, we use statistical methodology combined with query manner to retrieve some facts. 3. Knime KNIME is a modular data exploration platform that enables the user to visually create data flows (often referred to as pipelines), selectively executes some or all analysis steps, and later investigates the results through interactive views on data and models[knime]. KNIME was developed (and will continue to be expanded) by the Chair for Bioinformatics and Information Mining at the University of Konstanz, Germany. The group headed by Michael Berthold is also using KNIME for teaching and research at the University. Quite a number of new data analysis methods developed at the chair are integrated in KNIME. KNIME is released under the Aladdin Free Public License, and can be run on Windows and Linux. KNIME based version has already incorporated over 100 processing nodes for data I/O, preprocessing and cleansing, model, analysis and data mining as well as various interactive views, such as scatter plots, parallel coordinates and others. It includes all analysis modules of the well-known Weka data mining environment (http://www.cs.waikato.ac.nz/ml/weka/) and additional plugins allow R-scripts (www.r-project.org) to be run, offering access to a vast library of statistical routines. Screenshot of Knime can be seen in figure 1 below. 64

Figure 1. Screenshot of Knime Knime can be obtained as a single package of binary file or bundled in eclipse plugin. In this paper, we use the single package that can compile directly in Java environment. The eclipse bundled package is for development purposes, which offer code-level access. 4. Log Data Analysis using Knime One approach that can be used to optimize E-Learning system is by analyzing user behavior. In this paper we have done log data analysis in the E-Learning system to find some facts related to E-Learning implementation. Those facts are in form of user behavior pattern and statistical result. Log data contain NPM (student ID), access time (in UNIX time format), IP address, course code, course fullname, user action, and URL address that captured in the system, shown in figure 2. the data consistency, data cleaning also done for action that not so significant. The insignificant action is such action with occurrence level under the defined threshold value. After cleaning process, we have 546.449 records for log data along March 16th 2006 until June 3rd 2008. The next step is to find user access pattern. Our primary concern is user time tendency in E-Learning system when accessing and course spreading toward student department, faculty, and generation year. Beside that, we will show several statistic results such as the most accessed course, the most accessed action, average of access duration, the number of IP inside and outside campus, and activity rate per department. 5. Result and Discussion 5.1 Experimental Result After cleaning the log data, we import it to Knime and do some data analysis process. Knime supports several data types, such as any text file, ARFF, database, and it also provides artificial data generator. In this paper, we use text file (.txt) for the data. The first analysis is the most favorite course based on the number of access on each course. In Knime, we do some steps to get the expected result, shown in figure 3. Figure 3. Steps to get the most favorite lecture Figure 2. Entry at Used E-Learning System Log Data analysis is implemented to explore E-Learning user s behavior in E-Learning system of Gunadarma University. Problem of analysis may occur in the level of understanding and cleaning from log data of E-Learning system. At the beginning, we have more than 600.000 record along 2006-2008 period with cleaning level needed almost 11 percent. Cleaning process done to the entry which does not have username, IP address, and access time. Besides In figure 3, there are 4 operators performed in Knime. The first operator is File Reader, which is needed to read the source data log. The second operator is GroupBy, which is needed to group records in the data log based on a field (for this case we group the records based on course names). The next operator is Sorter, used to sort the records based on a field (for this case we sort the data log based on the number of NPM (student ID). The last operator is Interactive Table, to view the result in a table. After executing all these operators, we get that the most favorite lecture is Ilmu Budaya Dasar (KA). The second analysis is the action mostly occurs in the E-Learning system. The step is the same as the first analysis. The differences are the chosen fields in GroupBy operator, which is based on actions, and the chosen field in Sorter operator, which is based on the number of URL. 65

After executing all those operators, we get that the most action is view. Third analysis is about average of access duration. Based on the following equation: AccessingT ime = (1, 212, 477, 975 1, 142, 499, 905)s 546, 449access = 69, 978, 070s 546, 449access = 128.06s/access = 2minutes/access needed to count the number of local ip in block C. The difference between those stages is only in regular expression used in the Row Filter operator. And then fourth stage is needed to count all of ip (local or public) accessed the system. After execute all these operators, the result is the number of local IP accessed is less than public IP. The fifth analysis is most active department. In Knime, we do some steps to get the expected result, shown in figure 5. where : 1,212,477,975 = last accessing time 1,142,499,905 = the first accessing time 546,449 = the number of entry From the equation above, we can see that the average of access duration is about 2 minutes per access. The fourth analysis is the number of IP inside campus environment (local ip). The steps in Knime are shown in figure 4. Figure 4. Steps to get the number of IP inside campus environment In figure 4, after File Reader operator, we can see that there are four stages to get the number of local ip. Each of the stages has sequence of operators which are not different from the previous analysis. First stage is needed to count the number of local ip in block A. Second stage is needed to count the number of local ip in block B. Third stage is Figure 5. Steps to get the most active major In figure 5, we see that there are 5 operators performed previous analysis, which is used to read the log data. The second operator is GroupBy, for this result we group the records based on NPM. There are two third operators after GroupBy operator, which are Sorter, for this result we sort the data log based on the number of actions, and Histogram (interactive). The last operator is Interactive Table, which is needed to view the result in a table. After execute all these operators, we get that the most active department accessed E-Learning system is Sistem Informasi. From the fifth analysis, we can have another analysis. We can analyze which generation year in Sistem Informasi department is the most active. To know that, we must perform some steps in Knime, shown in figure 6 In figure 6, we see that there are 5 operators performed previous analysis, which is used to read the log data. The second operator is Row Filter, which is needed to filter the records based on a field (for this result we filter the data based on NPM in Sistem Informasi department. The next operator is GroupBy, for this result we group the records based on generation year. The last two operators are Interactive Table and Pie Chart. Pie Chart operator is used to make a pie chart based on the result shown in Interactive Table. After execute all these operators, we got the most active generation year in Sistem Informasi department is 2007. We can also analyze profoundly the most accessed 66

6. Conclusion Figure 6. Steps to get the most active generation in Sistem Informasi course of the generation of year 2007. In order to get the result, there are some steps to be performed in Knime, shown in figure 7. Figure 7. Steps to get the most accessed lecture in generation year 2007 In figure 7, we see that there are 5 operators performed previous analysis. The second operator is Row Filter, which is needed to filter the records based on a field (for this result we filter the data based on student ID. The third operator is GroupBy, for this result we group the records based on course name. The next operator is Sorter, for this result we sort the data log based on student ID. The last operator is Interactive Table. After execute all these operators, we get that the most accessed lecture for generation year 2007 is Ilmu Budaya Dasar (KA). 5.2 Discussion As a benchmark, we also compare those results with manual approach. Our manual approach is developed by PHP and mysql and the results are exactly as the same as what we have got using Knime. For our purpose, Knime is satisfying enough and can meet our desired result. Moreover, Knime is easier than using our manual approach since it allows us to process the file directly. GUI environment that Knime possessed also makes file processing simpler without getting stuck in syntax and command. Data analysis using Knime in E-Learning system of Gunadarma University is able to give the expected result. Released under the Aladdin Free Public License, Knime is capable to handle large data set, perform filtering, clustering, sorting, and give the result in table or chart. Also, Knime can be executed in Windows or Linux operating system. By implementing data analysis, we can see user behavior and access pattern in E-Learning system of Gunadarma University. The results are addressed to give a recommendation to improve the E-Learning system, for example: socialization to gain more access to E-Learning system, enhance course material quality, and infrastructure improvement. The result of this analysis can also give an evaluation for implementation of E-Learning system in Gunadarma University in order to make it better. The number of access from outside of the campus environment indicates two things. First, students are not satisfied with the facilities in campus to access V-Class. Because of that, it is important to improve infrastructure of the campus so that the students can feel comfortable to access V- Class in campus environment. Second, access from outside of the campus environment needs a better bandwidth supply and management to be able to handle number of access especially when peak access occurs. In advanced, Knime s ability can be used as a background process to capture data and analyze it automatically. This is possible since Knime developer version can be accessed freely and give us opportunity to embed our self-defined operator. With this ability, Knime can be wellcollaborated with E-Learning system to capture long-term data. References [1] http://rembuknasional.diknas.go.id/web/?q=node/7. [2] http://www.knime.org/. [3] A. A. bin Ramli. Web usage mining using apriori algorithm: Uum learning care portal case. In International Conference On Knowledge Management, pages 1 19, Malaysia, 2005. [4] E. M. Cristina Carmona, Gladys Castillo. Discovering student preferences in e-learning. In International Workshop on Applying Data Mining in e-learning 2007, pages 33 42, 2007. [5]. N. Flix Castro, Alfredo Vellido and F. Mugica. Applying data mining techniques to e-learning problems. In Evolution of Teaching and Learning Paradigms in Intelligent Environment, volume 62, pages 183 221. Springer Berlin Heidelberg, 2007. [6] W. Liu. Applying educational data mining in e-learning environment. www.cs.usask.ca/100/posters/wengang.pdf, May 2008. [7] D. Monk. Using data mining for e-learning decision making. Electronic Journal of e-learning, 3:41 54, 2005. 67