A FRAMEWORK FOR COLLECTING CLIENTSIDE PARADATA IN WEB APPLICATIONS



Similar documents
Remote Usability Evaluation of Mobile Web Applications

10CS73:Web Programming

Google Analytics for Robust Website Analytics. Deepika Verma, Depanwita Seal, Atul Pandey

Web Design and Implementation for Online Registration at University of Diyala

Performance Testing for Ajax Applications

An Enhanced Framework For Performing Pre- Processing On Web Server Logs

Understanding Web personalization with Web Usage Mining and its Application: Recommender System

Oracle Service Bus Examples and Tutorials

Course Scheduling Support System

FileMaker Server 10 Help

Usability Tool for Analysis of Web Designs Using Mouse Tracks

Arti Tyagi Sunita Choudhary

MOOCviz 2.0: A Collaborative MOOC Analytics Visualization Platform

Identifying the Number of Visitors to improve Website Usability from Educational Institution Web Log Data

Intelligent Analysis of User Interactions with Web Applications

How To Test Your Web Site On Wapt On A Pc Or Mac Or Mac (Or Mac) On A Mac Or Ipad Or Ipa (Or Ipa) On Pc Or Ipam (Or Pc Or Pc) On An Ip

Chapter-1 : Introduction 1 CHAPTER - 1. Introduction

Chapter 5. Regression Testing of Web-Components

Bug Report. Date: March 19, 2011 Reporter: Chris Jarabek

ICE Trade Vault. Public User & Technology Guide June 6, 2014

LabVIEW Internet Toolkit User Guide

International Journal of Engineering Technology, Management and Applied Sciences. November 2014, Volume 2 Issue 6, ISSN

Load testing with. WAPT Cloud. Quick Start Guide

Integrating REST with RIA-Bus for Efficient Communication and Modularity in Rich Internet Applications

DIPLOMA IN WEBDEVELOPMENT

Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

World-wide online monitoring interface of the ATLAS experiment

Intelligent Log Analyzer. André Restivo

FileMaker Server 11. FileMaker Server Help

TechTips. Connecting Xcelsius Dashboards to External Data Sources using: Web Services (Dynamic Web Query)

An introduction to creating Web 2.0 applications in Rational Application Developer Version 8.0

Zoomer: An Automated Web Application Change Localization Tool

A Comparative Study of Different Log Analyzer Tools to Analyze User Behaviors

Title: Front-end Web Design, Back-end Development, & Graphic Design Levi Gable Web Design Seattle WA

Bitrix Site Manager ASP.NET. Installation Guide

OIT 307/ OIT 218: Web Programming

LoadRunner Tutorial. Using Correlation to Troubleshoot Errors When Executing LoadRunner Scripts

Evaluation of Nagios for Real-time Cloud Virtual Machine Monitoring

HTML5 Data Visualization and Manipulation Tool Colorado School of Mines Field Session Summer 2013

Distance Examination using Ajax to Reduce Web Server Load and Student s Data Transfer

An Effective Analysis of Weblog Files to improve Website Performance

Web Development using PHP (WD_PHP) Duration 1.5 months

Web Development. Owen Sacco. ICS2205/ICS2230 Web Intelligence

Transformation of honeypot raw data into structured data

Big Data Analytics in LinkedIn. Danielle Aring & William Merritt

Monitoring Replication

Portals and Hosted Files

Web Usage mining framework for Data Cleaning and IP address Identification

EVALUATION. WA1844 WebSphere Process Server 7.0 Programming Using WebSphere Integration COPY. Developer

BreezingForms Guide. 18 Forms: BreezingForms

Wiley. Automated Data Collection with R. Text Mining. A Practical Guide to Web Scraping and

PoS(ISGC 2013)021. SCALA: A Framework for Graphical Operations for irods. Wataru Takase KEK wataru.takase@kek.jp

Abstract. Description

Data Driven Success. Comparing Log Analytics Tools: Flowerfire s Sawmill vs. Google Analytics (GA)

Effective Prediction of Kid s Behaviour Based on Internet Use

LetsVi: A Collaborative Video Editing Tool Based on Cloud Storage

Framework as a master tool in modern web development

How To Use Query Console

AN EFFICIENT APPROACH TO PERFORM PRE-PROCESSING

SUBJECT CODE : 4074 PERIODS/WEEK : 4 PERIODS/ SEMESTER : 72 CREDIT : 4 TIME SCHEDULE UNIT TOPIC PERIODS 1. INTERNET FUNDAMENTALS & HTML Test 1

PERSONAL MOBILE DEVICE FOR SITUATED INTERACTION

Java Metadata Interface and Data Warehousing

Assessing Learners Behavior by Monitoring Online Tests through Data Visualization

National Fire Incident Reporting System (NFIRS 5.0) Configuration Tool User's Guide

Preprocessing Web Logs for Web Intrusion Detection

Techniques and Tools for Rich Internet Applications Testing

Computer Networks 1 (Mạng Máy Tính 1) Lectured by: Dr. Phạm Trần Vũ MEng. Nguyễn CaoĐạt

XML Processing and Web Services. Chapter 17

FileMaker Server 13. FileMaker Server Help

Fig (1) (a) Server-side scripting with PHP. (b) Client-side scripting with JavaScript.

A Framework for Personalized Healthcare Service Recommendation

Monitoring System Status

WebSphere Business Monitor V7.0 Script adapter lab

WebSphere Business Monitor

Teacher Assessment Blueprint. Web Design. Test Code: 5934 / Version: 01. Copyright 2013 NOCTI. All Rights Reserved.

From Desktop to Browser Platform: Office Application Suite with Ajax

Chapter 2: Getting Started

DiskPulse DISK CHANGE MONITOR

Web Analytics Understand your web visitors without web logs or page tags and keep all your data inside your firewall.

WIDAM - WEB INTERACTION DISPLAY AND MONITORING

CHAPTER 10: WEB SERVICES

Example. Represent this as XML

Jet Data Manager 2012 User Guide

Adding Panoramas to Google Maps Using Ajax

ORACLE USER PRODUCTIVITY KIT USAGE TRACKING ADMINISTRATION & REPORTING RELEASE 3.6 PART NO. E

GLEN RIDGE PUBLIC SCHOOLS MATHEMATICS MISSION STATEMENT AND GOALS

Transcription:

A FRAMEWORK FOR COLLECTING CLIENTSIDE PARADATA IN WEB APPLICATIONS Natheer Khasawneh *, Rami Al-Salman *, Ahmad T. Al-Hammouri *, Stefan Conrad ** * College of Computer and Information Technology, Jordan University of Science and Technology, Irbid 22110, Jordan {hammouri, natheer}@just.edu.jo, ramialsalman9@gmail.com ** Institute of Computer Science, Heinrich Heine University, Dusseldorf, Germany conrad@cs.uni-duesseldorf.de ABSTRACT User behavior on web applications holds valuable information that can be used by web engineers to enhance user experience. So far, most frameworks in tracking user behaviors rely on data solely collected on the server side. Server-side data usually holds limited information, such as IP address, requested URL, and time and date of the request. client behavior on a page itself, such as mouse clicks, mouse hovering, and text editing, are not recorded on the server side. In this paper, we present a framework that collects and stores clients behavior on web applications. The framework is implemented in JavaScript, PHP, and MySQL. The data collected using the proposed framework can be then used in enhancing future user experience with the web application, and in web personalization. Keywords: Web Engineering, Client-Side Data, Web Personalization, JavaScript 1. INTRODUCTION In the recent years, the number of websites has grown vastly. Unlike those websites that were spread during the 1990s which used to present the information to users with static web pages current web sites which we call web applications interact with users through rich and dynamic contents. Such web applications are complex and contain many interaction items, such as multimedia and JavaScript content. This type of web applications is commonly referred to by Web 2.0. The scripting language JavaScript became the de facto standard for web applications. In the past, it was mainly used for making the websites more interactive with the clients. Lately, JavaScript is employed to do effective tasks, such as decreasing the manipulation tasks assigned to the sever side. For example, JavaScript can be used to validate form user-input values. Very recently, JavaScript has evolved to be more interactive not only with the client side but also with the server side, Thus, Asynchronous JavaScript and XML (AJAX) is introduced. The main idea of the AJAX is to allow the JavaScript code to communicate back with the server. Using the XMLHttpRequest object, JavaScript can request data from the server. This allows partially loading parts of a webpage instead of loading the whole webpage at once. Understanding and evaluating users behavior with these applications will help in improving these applications to better serve clients. In the past, the main source of users behavior used is server log data. Server logs are restricted to have fixed number of data entries (e.g., IP-Address, Date, etc ). So the process to extract users behavior is constrained by the limitations of these logs. Whereas, in the case of client logs data, the number of data entries are not fixed and not uniform and some entries can be extracted from the client s mouse movements over the visited Webpage. Additionally, the client s logs data can be formulated to fit exactly into the behavior analysis method that makes the analysis process easier and efficient. Building a complete framework, which can track the client s data in an efficient way and can extract a useful knowledge from these collected data, will definitely help in understanding clients behaviors and attitudes in better way. Web developers and designers can therefore utilize this information for improving the usability and accessibility of the web applications. 2. RELATED WORK Several client-side data loggers and events capturing systems have been proposed. One of the earliest automatic systems for capturing client-side events is WebVIP [1]. WebVIP is mainly used for tracking the number of different event types. The client s events are captured and stored in a database as is without any preprocessing steps. Tracking client s movements and actions is presented in [2], where the authors proposed a JavaScript framework that can handle the client s events in websites. In addition, these clients events are combined with server logs to give more information about clients behavior. Finally, based on the collected data, the Webpage was modified by injecting adequate JavaScript code via a proxy server. The authors did not mention how and based on what they modified the Webpage. An approach for detecting client s browsing patterns is presented in [3]. The detection is performed based on the client s data. Clients events are presented as a tree that contains each accessed element and its associated accessed time. Then, the Sequence Alignment Method (SAM) is applied to discover the client s patterns from the navigated elements. The problem is that SAM is applied directly without a

preprocessing step. Furthermore, it is not clear how the events are reordered. A framework that catches the client s events and the associated AJAX events is proposed by [4]. The key point was the ability of the framework to catch the executed AJAX events. In addition, the users willingness to take part in a remote usability test of a website is evaluated. Two different software packages for both client s data and eye tracking are presented in [5]. The first one is called the WebLogger, which records client s data and used to track eye client s data. These two types of data are merged and entered as input to the analysis program, which is called WebEyeMapper. A mouse tracking system using client-side scripting is presented in [6]. The system tracks the mouse movements (position and associated timestamps).the data is then sent to a backend server via the AJAX framework. Additionally, the client s movements are visualized to understand the client s behavior. Unfortunately the collected events are not sufficient to understand the client s behavior. A Simple Mouse Tracking (SMT) system is proposed [7]. The main idea is to track the client s mouse movements and to allow the system administrator to visualize these movements. Clients cookies are used as a source of data in the WET [8] JavaScript tool. WET stores and analyzes the client s cookies that contain some of the client s preferences related to a given website. The problem in this tool is that the client must explicitly select his favorite cookie option. Furthermore, each website must provide options to transfer and gather user s data. 3. A FRAMEWORK FOR COLLECTING CLIENT-SIDE DATA In this section, we present the overall architecture of our proposed system. Fig. 1 shows the four phases of the proposed framework: session identification, events identification and catching, events storing, and merging and exporting events. In the rest of this section, we elaborate on each phase. 3.1 SESSION IDENTIFICATION During this phase, each period of activity for every unique user and the web server is identified. SessionId, which is a unique number among all identified sessions, is created. SessionId is a function of the client s IP address and the current time stamp on the local computer, i.e., the server. This guarantees the uniqueness of the sessionid across sessions. 3.2 EVENTS IDENTIFICATION AND CATCHING During this stage, we identify the web elements and the associated events. The events are classified into two categories: Clickstream based and Time based. Clickstream-based events track events which are related to the clickable web elements. For example, a user could click on either a button or a division (DIV) web element. Time based events tracks the events which are related to the time spent over a web element. For example, a user could hover his mouse over a web element for a given time and then moves away from over that element. So, the time difference between the entrance and the exit of that web element is recorded. 3.3 STORING EVENTS During this phase, the identified events are stored in relational database on the server side. The data are sent in raw format and are then parsed and stored in a structured format in the database. 3.4 MERGING AND EXPORTING EVENTS In this phase, events records are grouped per client session (user id) and are then exported as two types of data: General mode and Time based mode. 4. IMPLEMENTATION DETAILS In this section, we present the implementation details of the proposed framework with excerpts of JavaScript code the database design. Session Identification Events Identification and Catching Events Storing Merging and Exporting Events Figure 1: System architecture. 4.1 SESSION IDENTIFICATION Once a client requests a webpage, the session identification function is fired by the OnLoad event. As shown in Fig. 2, when the Major_In() function is called, a new Date object is instantiated. Then, the session starting time (maj_in variable) value is obtained via the gettime() function. The user_id() function is called which to obtain the sessionid value from the Date object using the gettime() function too. The returned value of gettime() represents the number of milliseconds since midnight Jan 1, 1970. By this way, the assigned sessionid for each client is a unique. The generated sessionid is used to identify all recorded events which belong to the same user. Finally, for finishing the current sessionid value and (or) instantiate new sessionid value, the Major_out() function is called. Major_out() function calculates the session end time (maj_out variable), and then the time difference between the start and the end time

calculated in seconds (i.e., maj_outmaj_in/1000). Figure 3: Code to record and send events to server. Figure 2: Session identification code. 4.2 EVENTS IDENTIFICATION AND CATCHING To track clickstream based events, once a client clicks on the element, the function inside OnClick event is fired, and the target data is transferred associated with userid via XmlHttpRequest AJAX call (i.e., <a hrefname= LinkNews OnClick = Transfere( LinkNewsValue ) >LinkNews </a>). The transferring data is a lightweight operation, the reason why we used AJAX technology. As shown in Figure 3, the content variable which will be transferred consists of seven variables, which are concatenated and delimited by,. The seven variables are name, value, Item_time, sessionid, Date, TotalMouses and Personlized. The name variable represents the name of the web element. The value variable represents the value of web element. The Item_time variable, represents the amount of time spent over a specific web element. In other words, Item_time variable stores the time difference between OnMouseOver and OnMouseOut events. The sessionid variable, is a very important variable, and represents a current session id for a specific client. In addition the final sessionid state value will be identified, when the value of personlized variable is not null. The Date variable represents the event date and time. We register the date on the client side for two reasons: to determine the date of the element event and to determine whether the date on the client machine matches the server machine s date. The TotalMouses variable is a counter which is incremented on every click. Moreover, the final counter value is sent when the Personalized value is not null. The Personalized value indicates if the client session is finished or not. In this framework, the focus is not only on the name and the value of the element but also on the time which is spent on each element. Thus, we inject two types of events: OnMouseOver and OnMouseOut. Once the client sets his mouse over a web element, a function is invoked to instantiate the client s time object. As shown in Figure 2. This operation can be accomplished using JavaScript s Date Object. Likewise, when the client moves the mouse away from over the web element, another function, related to OnMouseOut event, is invoked to calculate the time difference between OnMouseOver and OnMouseOut events and to store the value in the Item_time variable. Next, The OnMouseOut event uses the XmlHttpRequest protocol to send the content variable to the server. All concatenated variables inside the content variable are identical to the Clickstream based approach, except that for the newly added variable, Item_time. 4.3 STORING EVENTS

4.4 MERGING AND EXPORTING EVENTS The operation of exporting data has two modes: General mode and Time based mode. In a general mode, the exporting is done by a simple aggregation of all records within a specific session. The output of the aggregation operation is a vector data structure consisting of six fields: all clickstreams, the date for each click stream, the session ID, the total spent time in the session, the total number of clicks, and the IP address. Figure 6 shows the snapshot of the aggregated records without both the IP address and the date columns in the general mode. Figure 4: Code to explode and insert data in Database. The client data, which is sent via the GET method inside XmlHttpRequest, is received by the backend server (script_page.php). As shown in Figure 4, because the data was concatenated and delimited with, on the client side, it must be parsed to extracting the variables values. The explode function in line 2, is used to convert the data into array of variables. In addition, in lines 6 and 7, other variables are initialized at the server side. These are the IP-Address and the server date. In line 8, the variables will be inserted into items_events table. Finally, in line 11, the user id and personalized variables are inserted into the user_select. Figure 5 shows the Database schema, which consists of two tables, Events table and session Primary key table. The Events table stores vectors of values for the client transaction (events). On other hand, the Session primary key table stores the session id for each client, which is a primary key, and the final personalized item. The relationship between these tables allows us to easily merge data for client sessions. In the time based mode, the scenario is similar to the general mode, except that while the records are aggregated, the time spent over each item is accumulated. In this case, the vector data structure consists of n fields to store n-accumulated times for each web element in addition to the fields in the general mode but without the clickstream field. Figure 7 shows the snapshot of the aggregated records without both the IP address and the date columns in the time based mode. Figure 6: Snapshot of the aggregated records in general mode Figure 7: Snapshot of the aggregated records in time based mode Figure 5: ER Diagram of the used Database. 5. CONCLUSION AND FUTURE WORK In this paper, we presented a new framework to collect client side Paradata. The proposed framework was implemented using JavaScript, PHP, and MySQL. The

events on the client-side were collected and sent back to the server. At the server side, the data were structured in a convenient way for further applications, such as enhancing user experience, web personalization, and better evaluating user behavior. As a future work, we will apply the proposed framework to different web applications and then apply different data mining techniques to cluster or classify the users of the web application. ACKNOWLEDGEMENT This research was supported in part by the German Research Foundation (DFG), the Higher Council for Science and Technology (HCST) and the Jordan University of Science and Technology (JUST). REFERENCES [1] WebVIP, http://zing.ncsl.nist.gov/webtools/webvip/overvi ew.html, 2002. [2] Atterer R., Wnuk M., Schmidt A., Knowing the user's every move: user activity tracking for website usability evaluation and implicit interaction, In Proceedings of the 15th international conference on World Wide Web, Edinburgh, Scotland, pp. 203-212, 2006. [3] De Santana V. F., Baranauskas M., Summarizing observational client-side data to reveal web usage patterns, In Proceedings of the 2010 ACM Symposium on Applied Computing (SAC 10), Sierre, Switzerland, pp. 1219-1223, 2010. [4] Atterer R., Schmidt A., Tracking the interaction of users with AJAX applications for usability testing, In Proceedings of the SIGCHI conference on Human factors in computing systems, San Jose, California, USA, pp. 1347-1350, 2007. [5] Reeder R., Pirolli P., Card S., WebEyeMapper and WebLogger: tools for analyzing eye tracking data collected in web-use studies, In CHI '01 extended abstracts on Human factors in computing systems (CHI EA '01), Seattle, Washington, pp. 19-20, 2001. [6] Mueller F., Lockerd A., Cheese: tracking mouse movement activity on websites, a tool for user modeling. In CHI '01 extended abstracts on Human factors in computing systems (CHI EA '01). Seattle, Washington, pp. 279-280. 2001. [7] Torres L., Hern R., (smt) Real Time Mouse Tracking Registration and Visualization Tool for Usability Evaluation, In Proceedings of the IADIS international conference WWW/Internet, Vila Real, Portugal, 2007. [8] Etgen, M., J. Cantor. What Does Getting WET (Web Event-Logging Tool) Mean for Web Usability? In Proceedings of Fifth Human Factors and the Web Conference, Gaithersburg, Maryland, USA, 1999.