UNIVERSITI TEKNOLOGI MARA THE DEVELOPMENT AND EVALUATION OF CONFIGURABLE WEB USAGE ANALYZER NASRUL AZLI BIN AHMAD




UNIVERSITI TEKNOLOGI MARA FACULTY OF INFORMATION TECHNOLOGY AND QUANTITATIVE SCIENCE THE DEVELOPMENT AND EVALUATION OF CONFIGURABLE WEB USAGE ANALYZER BY NASRUL AZLI BIN AHMAD 2004633591 B. Sc (HONS) DATA COMMUNICATION AND NETWORKING

THE DEVELOPMENT AND EVALUATION OF CONFIGURABLE WEB USAGE ANALYZER
BY NASRUL AZLI BIN AHMAD 2004633591

A final project submitted in partial fulfillment of the requirement for the B. Sc (HONS) DATA COMMUNICATION AND NETWORKING

A project paper submitted to FACULTY OF INFORMATION TECHNOLOGY AND QUANTITATIVE SCIENCE, UNIVERSITI TEKNOLOGI MARA, MAY 2006

Approved by the Examining Committee:
Project Supervisor: EN MOHD FAISAL BIN IBRAHIM
Examiner: PROF. MADYA DR. SAADIAH BINTI YAHYA

DECLARATION

I hereby declare that the work in this project paper is my own except for those quotations and summaries which have been acknowledged.

NASRUL AZLI BIN AHMAD 2004633591

ACKNOWLEDGEMENT

Alhamdulillah, with gratitude and the blessing of Allah S.W.T, I have finally completed my project within the period given and without major problems. Firstly, I am grateful to Allah S.W.T for giving me the strength to complete this project. I would like to take this opportunity to express my gratitude to my supervisor, Encik Mohd Faisal bin Ibrahim, for his guidance, encouragement and support, which helped me greatly in completing this project. I feel very lucky to have been under his supervision. He was always supportive and always gave me ideas to enhance this project. I would also like to express my appreciation to Prof. Madya Dr. Saadiah binti Yahya for her ideas and moral support. Moreover, I am thankful to my project team members, especially Rohaniah binti Yusof, for helping me to accomplish this project successfully. Lastly, thank you to all my friends for their cooperation in helping me to complete this project. Thank you.

APRIL 2006
NASRUL AZLI BIN AHMAD

ABSTRACT

Learning Web users' preferences and adapting the Web information structure to users' behaviors can improve the effectiveness of a website. In addition, automatic knowledge extraction from Web server log files, page tags, network packets and cookies can be useful for identifying reading patterns and inferring user profiles, so that a website can be designed to suit different groups of users. We therefore developed a Web Usage Analyzer System to extract useful, implicit and previously unknown patterns from the usage of a website. In our study, we analyzed an online course using web usage mining techniques, examining users' behavior in terms of site usage, the involvement of the most active users as shown by their navigational activities, and the number of visits throughout the semester of the course. Using this system, a lecturer can analyze students' behavior in terms of site usage, such as last date accessed, login name, time and visited page. The lecturer can also investigate trends in the website's popularity by identifying the most visited page. In conclusion, the Web Usage Analyzer System gives lecturers the features they need to investigate and analyze users' behaviors, activities and performance.

TABLE OF CONTENTS

DECLARATION
ACKNOWLEDGEMENT
ABSTRACT
TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES

1.0 INTRODUCTION
  1.1 PROBLEM BACKGROUND
  1.2 PROBLEM STATEMENT
  1.3 OBJECTIVES OF THE RESEARCH
  1.4 SCOPE OF THE RESEARCH
  1.5 SIGNIFICANCE OF THE RESEARCH
    1.5.1 Significance to Future Researchers
    1.5.2 Significance to the Content Developer
    1.5.3 Significance to the NBPL Users
  1.6 OUTLINE OF THE THESIS

2.0 LITERATURE REVIEW
  2.1 INTRODUCTION
  2.2 DEFINITION OF THE COMMON TERMS
  2.3 WEB USAGE MINING
  2.4 WEB-BASED COLLABORATIVE LEARNING
  2.5 WEB USAGE MINING PHASES
    2.5.1 Preprocessing
      2.5.1.1 Content Preprocessing
      2.5.1.2 Structure Preprocessing
      2.5.1.3 Usage Preprocessing
    2.5.2 Pattern Discovery
      2.5.2.1 Statistical Analysis
      2.5.2.2 Association Rules
      2.5.2.3 Clustering
      2.5.2.4 Classification
      2.5.2.5 Sequential Patterns
      2.5.2.6 Dependency Modeling
    2.5.3 Pattern Analysis
  2.6 NETWORK BASED PROJECT LEARNING
  2.7 RELATED WORKS
    2.7.1 Optimization of an Online Course with Web Usage Mining
    2.7.2 Web Usage Mining for a Better Web-Based Learning Environment
    2.7.3 Towards Evaluating Learners' Behavior in a Web-Based Distance Learning Environment
    2.7.4 Web Mining and Knowledge Discovery of Usage Patterns
    2.7.5 Web Usage Mining: Discovery and Applications of Usage Patterns from Web Data
  2.8 CONCLUSIONS

3.0 METHODOLOGY
  3.1 INTRODUCTION
  3.2 DATA COLLECTION METHODOLOGY
  3.3 PROJECT METHODOLOGY
    3.3.1 Phase 1: Project Planning / Preliminary Investigation
    3.3.2 Phase 2: Analysis
      3.3.2.1 Hardware Requirements
      3.3.2.2 Software Requirements
    3.3.3 Phase 3: Design
    3.3.4 Phase 4: Development
    3.3.5 Phase 5: Testing and Implementation
    3.3.6 Phase 6: Maintenance
  3.4 CONCLUSIONS

4.0 SYSTEM OVERVIEW AND ARCHITECTURE
  4.1 INTRODUCTION
  4.2 USER MODEL
    4.2.1 Context Diagram
  4.3 DEVELOPMENT
    4.3.1 Modification to the Database
    4.3.2 Modification to the Source Code
    4.3.3 Site Statistics Pages
      4.3.3.1 Page Tagging
        4.3.3.1.1 Internal Page Tagging
        4.3.3.1.2 External Page Tagging
      4.3.3.2 Site Statistics Menu
      4.3.3.3 Detailed Statistics
      4.3.3.4 Visitor Tracking
      4.3.3.5 Personal Information
  4.4 CONCLUSIONS

5.0 RESULTS AND FINDINGS
  5.1 INTRODUCTION
  5.2 CONFIGURABLE RANGE OF DATE LINE CHART
  5.3 SITE STATISTICS ANALYSIS
    5.3.1 Users' Behavior
    5.3.2 Users' Involvement
    5.3.3 Trends of NBPL Website
    5.3.4 Data Transfer Efficiency and Duration Time Analysis
      5.3.4.1 Testing
      5.3.4.2 Data Transfer Efficiency
      5.3.4.3 Duration Time
      5.3.4.4 Application- and Total Frames
  5.4 CONCLUSIONS

6.0 CONCLUSIONS AND RECOMMENDATIONS
  6.1 INTRODUCTION
  6.2 CONCLUSIONS
  6.3 RECOMMENDATIONS

REFERENCES
APPENDIX A

LIST OF ABBREVIATIONS

WWW: World Wide Web
NBPL: Network Based Project Learning
HTML: HyperText Markup Language

LIST OF FIGURES

Figure 2.1 : Web mining categories ((Cooley et al., 1997) and (Chakrabarti et al., 1999))
Figure 2.2 : General Architecture for Web Usage Mining (R. Cooley, B. Mobasher, J. Srivastava, 1999)
Figure 2.3 : Web Usage Mining sub steps (L. E. Akman, B. Akkan, N. Baykal, 2003)
Figure 2.4 : Details of Web Usage Mining Preprocessing (R. Cooley, B. Mobasher, J. Srivastava, 1999)
Figure 2.5 : Raw data format (L. E. Akman, B. Akkan, N. Baykal, 2003)
Figure 2.6 : Sample Web Server Log (J. Srivastava, R. Cooley, M. Deshpande, P. Tan, 2000)
Figure 2.7 : Preprocessed Web Log (L. E. Akman, B. Akkan, N. Baykal, 2003)
Figure 3.1 : Project Methodology Diagram
Figure 3.2 : Block Chart of the System
Figure 3.3 : Raw data string based on file name
Figure 3.4 : Raw data string
Figure 3.5 : Data cleaning technique will discard the unwanted keywords
Figure 3.6 : Location and section will be saved into the database based on visited page
Figure 3.7 : Example of Statistical Analysis Output
Figure 3.8 : Example of SQL commands in Pattern Analysis phase
Figure 4.1 : Context Diagram of Web Usage Analyzer System
Figure 4.2 : Relationship Between Database Tables that Involves in Web Usage Analyzer System
Figure 4.3 : File Tagging Page
Figure 4.4 : File Tags Form
Figure 4.5 : Files in the root folder (named as htdocs) located in the web server
Figure 4.6 : Syntax that must be included in each file in order to apply the Internal Page Tagging
Figure 4.7 : PHP coding in `tiki-stats_tag.php` which is included in every files in the root folder
Figure 4.8 : Example of Internal Page Tagging
Figure 4.9 : New field is generated automatically into the `tiki_stats_all_section` table
Figure 4.10 : PHP code to retrieve filename
Figure 4.11 : PHP codings in `ext_links_in.php`
Figure 4.12 : Example to tag a non-php files but located in root folder
Figure 4.13 : Online Notes Menu option
Figure 4.14 : Need to bypass `ext_links_in.php` file before loading the requested page
Figure 4.15 : Redirect to `Copper_Media-Coaxial_Cable.htm` page
Figure 4.16 : PHP codings in `ext_links_out.php`
Figure 4.17 : Example to tag an external link such as URL website
Figure 4.18 : External Links Menu option
Figure 4.19 : Need to bypass `ext_links_out.php` file before loading the requested page
Figure 4.20 : Redirect to `www.uitm.edu.my` page
Figure 4.21 : Site Statistics Menu Options
Figure 4.22 : Options for Detailed Stats Submenu
Figure 4.23 : Detailed Stats Output for File Galleries Section
Figure 4.24 : Popup window for viewing the output in spreadsheet format
Figure 4.25 : Detailed Stats output viewed in Microsoft Excel
Figure 4.26 : Redirect to another page by clicking the hyperlink
Figure 4.27 : Detailed information for individual NBPL user only
Figure 4.28 : Visitor Tracking Page
Figure 4.29 : Personal Information Details for All NBPL Users
Figure 4.30 : Student Registration Form
Figure 5.1 (a) and (b) : Example of Line Chart from 6th March to 24th April 2006
Figure 5.2 : Example of Total Hits as 9th April 2006
Figure 5.3 : Detailed Statistics of NBPL Users For `File Galleries`
Figure 5.4 : Total Hits For Selected User
Figure 5.5 : Total Hits for All NBPL Users
Figure 5.6 : Detailed Statistics For One of the Most Active...
Figure 5.7 : Detailed Statistics For One of the Most Inactive User
Figure 5.8 : Daily Chart
Figure 5.9 : Weekly Charts
Figure 5.10 : Bandwidth and Latency
Figure 5.11 : Data Transfer Efficiency to Load Statistics Pages
Figure 5.12 : Data Transfer Efficiency for Viewing Different Chart Types
Figure 5.13 : Duration Time to Load Site Statistics Page
Figure 5.14 : Duration Time for Exporting Data to Ms Excel
Figure 5.15 : Number of Frames Sent for Each Page in Windows 2003 Server
Figure 5.16 : Number of Frames Sent for Each Page in Linux Server

LIST OF TABLES

Table 4.1 : Table `tiki_stats_all_section`
Table 4.2 : Table `tiki_stats_file_download`
Table 4.3 : Table `tiki_stats_files_tag`
Table 4.4 : Table `tiki_stats_logs`
Table 4.5 : Table `tiki_stats_logs_day`
Table 4.6 : Table `tiki_stats_poll`

CHAPTER I
INTRODUCTION

1.1 PROBLEM BACKGROUND

With the rapid evolution of information technology, the World Wide Web (WWW) has become the most important medium for collecting, sharing and distributing information to anyone, anytime and anywhere. According to Y. Wang (2000), the Web is a huge, diverse, dynamic, explosive and mostly unstructured data repository that supplies an incredible amount of information.

Education is one of the disciplines where web-based technology has been rapidly and successfully adopted. In this field, it can support many of the activities that occur in the classroom. It also provides and facilitates communication and feedback between users. These two elements are essential to an effective online learning environment. Furthermore, it supports various styles of learning such as collaborative learning, discussion-led learning, student-based learning, student-centered learning and resource-based learning. Besides offering flexibility of time and place, it can also accommodate a growing number of students, who can share and reuse the available resources.

Learning Web users' preferences and adapting the Web information structure to users' behaviors can improve the effectiveness of a website. In addition, automatic knowledge extraction from Web server log files, page tags, network packets and cookies can be useful for identifying reading patterns and inferring user profiles, so that a website can be designed to suit different groups of users. Web mining can therefore be useful in addressing these problems.

Web Usage Mining is one of the techniques that fall under Web mining. This research discusses the technique in depth and adapts it for a web-based collaborative learning website, or even for a web-based portal. In short, Y. Wang (2000) describes Web Usage Mining as the technique for predicting user behavior while users interact with the Web. In this category, information sources such as Web server logs must be extracted and processed in order to discover the hidden patterns. Furthermore, Web Usage Mining also interacts with Web Content Mining and Web Structure Mining, with the clustering process of pattern discovery acting as a bridge.

1.2 PROBLEM STATEMENT

Several Web log analysis tools are commercially available today, but it is difficult to find a tool suitable for analyzing raw Web log data in order to retrieve significant and useful information. Moreover, most of the available tools are considered too slow, inflexible, expensive, difficult to maintain or very limited in the results that they can actually produce (D. Batista, M. J. Silva, 2001). Examples of currently available Web usage mining tools include WebSIFT, SpeedTracer, WebLogMiner, Shahabi, SurfAid, Analog and many more.

Furthermore, the web-based collaborative learning systems currently available lack a close student-lecturer relationship. Most of them do not provide suitable tools that allow lecturers to keep track of and assess all the activities performed by their students. Lecturers cannot evaluate the course content provided or the effectiveness of the learning process. From the students' point of view, they cannot communicate their problems and deficiencies in a natural way (M. E. Zorrilla, E. Menasalvas, D. Marin, E. Mora, J. Segovia, 2005).

The main objective of this project is therefore to extract valuable patterns from Web access log data that capture the behavioral characteristics of the users, which can be used to improve the web-based collaborative learning environment or to help in learning evaluation.

1.3 OBJECTIVES OF THE RESEARCH

To guide the successful completion of this research, the following objectives have been defined:

a. to analyze users' behavior in terms of site usage, such as last date accessed, login name, time and visited page.
b. to keep track of and trace the involvement of the frequent (most active) users from their navigational activities throughout the whole website.
c. to investigate which parts of the website are more attractive to the users, so that Web contents or features can be developed to fulfill the users' wants when surfing the website.
d. to analyze the data transfer efficiency and the duration time to load a page on both the Windows 2003 Server and Linux Server platforms.

1.4 SCOPE OF THE RESEARCH

This research focuses on the Network Based Project Learning (NBPL) community as an initial platform. The respondents of this project are those who have registered with NBPL. These respondents act as the input to, and contribute towards, the Web Usage Mining analysis.

1.5 SIGNIFICANCE OF THE RESEARCH

This research is significant to, and contributes to, many parties, especially future researchers, the content developer and the users themselves.

1.5.1 Significance to Future Researchers

Since this research focuses on NBPL, the Web Usage Mining technique can be further enhanced and applied to the larger scope of web-based collaborative learning disciplines or to any related transactional activities carried out via a website. For instance, the Web Usage Mining technique can be applied to a Learning Management System (LMS), Course Management System (CMS), Enterprise Learning Management (ELM) system and also a Learning Content Management System (LCMS).

1.5.2 Significance to the Content Developer

The Web Usage Mining technique is also beneficial to the content developer, because it allows them to carry out analyses that support the management and continuous improvement of the NBPL website. This is part of the research value of this study. The content developer can continuously improve and enrich the content of the website based on users' interests, so that the structure of the course content is effective for web-based collaborative learning. The content developer can analyze page and user statistics through the charts that are developed in this project.

1.5.3 Significance to the NBPL Users

By applying the Web Usage Mining technique, the content of the website becomes more beneficial to the users and can fulfill their needs. Each interaction of the users becomes an input to the content developer. For example, Web Usage Mining can be used to exploit user information such as user id, date and time accessed and visited page. In addition, the users will enjoy surfing the website, since the contents can build their interest.
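As an illustration of the kind of analysis described above and in objectives (a) to (c) of Section 1.3, the following is a minimal sketch in Python. It is not the system built in this project: the records, login names and section names are hypothetical sample data, and the function `summarize` is an illustrative helper only. It derives each user's last access time, the most active user and the most visited page from a list of hits.

```python
from collections import Counter
from datetime import datetime

# Hypothetical hit records: (login name, timestamp, visited page).
# These values are invented for illustration only.
hits = [
    ("ali", "2006-03-06 09:15:00", "file_galleries"),
    ("siti", "2006-03-06 10:02:00", "forums"),
    ("ali", "2006-03-07 14:30:00", "forums"),
    ("ali", "2006-04-09 08:45:00", "file_galleries"),
    ("siti", "2006-04-01 16:20:00", "file_galleries"),
]


def summarize(records):
    """Return (last access per user, hits per user, hits per page)."""
    last_access = {}
    hits_per_user = Counter()
    hits_per_page = Counter()
    for user, timestamp, page in records:
        when = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        # Keep only the most recent access time for each user.
        if user not in last_access or when > last_access[user]:
            last_access[user] = when
        hits_per_user[user] += 1
        hits_per_page[page] += 1
    return last_access, hits_per_user, hits_per_page


last_access, hits_per_user, hits_per_page = summarize(hits)
most_active = hits_per_user.most_common(1)[0]   # (user, number of hits)
most_visited = hits_per_page.most_common(1)[0]  # (page, number of hits)

print("Most active user:", most_active)
print("Most visited page:", most_visited)
for user, when in sorted(last_access.items()):
    print(user, "last accessed on", when)
```

Simple counters are enough here because these objectives only call for descriptive statistics; the later pattern discovery techniques (association rules, clustering and so on) would need richer session data.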

1.6 OUTLINE OF THE THESIS

This thesis consists of six chapters. This chapter highlights the background of the project, including the problem statement, the objectives to be achieved, and the scope and significance of the research. The next chapter, Literature Review, gives clear definitions of some common terms used throughout this project and discusses related works. Chapter Three, Methodology, explains the phases and methods followed in completing this project. The overview and architecture of the system are explained in Chapter Four, System Overview and Architecture. The findings and results obtained through this project are discussed in depth in Chapter Five. Finally, Chapter Six provides conclusions, suggestions and recommendations, either to improve NBPL or as a reference for future researchers.

CHAPTER II
LITERATURE REVIEW

2.1 INTRODUCTION

This chapter defines some common terms that are used throughout this project, for better understanding. In addition, this chapter reviews previous works that are related to this project.

2.2 DEFINITION OF THE COMMON TERMS

The title of this research is The Development and Evaluation of Configurable Web Usage Analyzer. As defined by The American Heritage Dictionary of the English Language (2004), the word development means the act of developing, or the determination of the best techniques for applying a new device or process to the production of goods or services. The word evaluation can be defined as an assessment of the effectiveness of an ongoing program in achieving its objectives, which relies on the standards of project design to distinguish a program's effects and aims at program improvement through a modification of current operations. Meanwhile, the word configurable can be understood as able to be designed, arranged, set up or shaped with a view to specific applications or for a particular purpose, depending on the user's requirements. The word analyzer refers to any of various instruments used for performing an analysis, as interpreted by WordNet.

The American Heritage Dictionary of the English Language (2004) defines the word technique as the systematic procedure by which a complex or scientific task is accomplished. In addition, according to WordNet, the word technique also means a practical method applied to some particular task, or skillfulness in the command of fundamentals deriving from practice and familiarity.

2.3 WEB USAGE MINING

The word Web is a short form of World Wide Web, which can be defined as a system of Internet servers that support specially formatted documents. The documents are formatted in a markup language called HyperText Markup Language (HTML) that supports links to other documents, as well as graphics, audio and video files. This means that one document can be linked to another simply by clicking on hot spots. Not all Internet servers are part of the World Wide Web (Glossary of Library Terms, University at Buffalo Libraries, 2005). Meanwhile, the word usage can be simply defined as use: to cause to act or to serve for a purpose, or as an instrument or material. The word mining means extracting something useful or valuable from a baser substance.

In the context of this research, the terms web mining and data mining can both be used, and both carry meanings that support the title of this study. According to XSB Inc, data mining is the process of autonomously extracting useful information or knowledge ("actionable assets") from large data stores or sets. Data mining can be performed on a variety of data stores, including the World Wide Web, relational databases, transactional databases, internal legacy systems, PDF documents and data warehouses. Web mining, in turn, is the application of data mining techniques to automatically discover and extract useful information from World Wide Web documents and services (O. Etzioni, 1996). L. E. Akman, B. Akkan and N. Baykal (2003) define web mining as discovering, analyzing and processing the information from the World Wide Web.

Furthermore, the word website means a collection of interlinked documents on a Web server (Glossary of Library Terms, University at Buffalo Libraries, 2005) on a particular subject, including a beginning file called a home page. Other pages on the site can be reached, directly or indirectly, from the home page (Starr Sites, 1999).

Etzioni was the first to introduce the term Web mining, in his research paper (O. Etzioni, 1996). He questioned whether it is practical to mine Web data, and he stated that Web mining could be divided into three processes, namely Web Content Mining, Web Structure Mining and Web Usage Mining (also known as Web Log Mining).

Figure 2.1 : Web mining categories ((Cooley et al., 1997) and (Chakrabarti et al., 1999))

Figure 2.2 : General Architecture for Web Usage Mining (R. Cooley, B. Mobasher, J. Srivastava, 1999)

The main idea of Web mining is to meet Web users' needs. Briefly, according to Y. Wang (2000), Web Content Mining concentrates on the discovery or retrieval of useful information from Web contents, data or documents. Meanwhile, Web Structure Mining describes the technique used to model the underlying link structures of the Web. Lastly, he defines Web Usage Mining as the technique that discovers users' usage patterns and attempts to predict users' behaviors while they interact with the Web.

L. E. Akman, B. Akkan and N. Baykal (2003) state that Web Usage Mining processes usage data by extracting the information in the Web logs to discover the hidden patterns. The Web log provides a raw trace of the learners' navigation and activities on the site, as noted by O. R. Zaiane (2002). Web log data are relatively poor and unstructured, and contain erroneous and irrelevant entries. In the context of a learning environment, the discovery of patterns from navigation history by Web Usage Mining can reveal the learners' navigation behavior and the efficiency of the models used in the online learning process, besides evaluating the learners' activities. These patterns cannot be extracted simply with common statistical analysis (O. R. Zaiane, J. Luo, 2001).
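To make concrete the remark that Web log data contain erroneous and irrelevant entries, the following is a minimal sketch of a log cleaning step in Python. It assumes that the server writes entries in the Common Log Format; the sample lines, the list of ignored file suffixes and the helper `clean` are hypothetical, and this is not the cleaning procedure used by any of the cited authors.

```python
import re

# Common Log Format: host ident authuser [date] "request" status bytes
CLF = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)$'
)

# Assumption: requests for images, style sheets and scripts say nothing
# about what the learner chose to read, so they are treated as irrelevant.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")


def clean(lines):
    """Keep (user, time, path) for well-formed, successful page requests."""
    kept = []
    for line in lines:
        match = CLF.match(line.strip())
        if match is None:
            continue  # erroneous entry: does not follow the log format
        status = int(match.group("status"))
        path = match.group("path").split("?", 1)[0].lower()
        if status >= 400 or path.endswith(IGNORED_SUFFIXES):
            continue  # failed request or embedded resource
        kept.append((match.group("user"), match.group("time"), match.group("path")))
    return kept


# Hypothetical sample entries, invented for illustration only.
sample = [
    '192.168.0.5 - ali [06/Mar/2006:09:15:00 +0800] "GET /tiki-index.php HTTP/1.1" 200 5120',
    '192.168.0.5 - ali [06/Mar/2006:09:15:01 +0800] "GET /img/logo.gif HTTP/1.1" 200 834',
    '192.168.0.9 - siti [06/Mar/2006:10:02:00 +0800] "GET /missing.php HTTP/1.1" 404 0',
    'garbled entry',
]

cleaned = clean(sample)
for entry in cleaned:
    print(entry)
```

Only the first sample entry survives: the image request, the failed request and the malformed line are discarded. Which entries count as irrelevant is a design choice that depends on the site being analyzed.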

Many sophisticated web-based learning environments have been developed and are in use around the world, but very little has been done to automatically discover access patterns in order to understand learners' behavior in web-based distance learning (O. R. Zaiane, 2002).

2.4 WEB-BASED COLLABORATIVE LEARNING

The Web-based environment creates new possibilities to support and enhance communication within the lecturer-student community, while retaining the familiar face-to-face classroom interaction as one of the essential aspects of a learning process (K. H. Vat, 2001). Nowadays, web-based collaborative learning environments are benefiting from the rise of communication and information sharing services. However, the mere fact of setting up an environment for students and lecturers does not guarantee mutual collaboration or successful student learning (E. Gaudioso, O. C. Santos, A. Rodríguez, J. G. Boticario, 2004).

Collaborative learning is the idea that small, interdependent groups of students work together as a team to help each other learn. The small learning group therefore plays a very important role in the collaborative learning process, especially in a web-based collaborative learning environment (J. Zhao, D. McConnell, 2001).

B. L. Smith and J. T. MacGregor (1992) add that collaborative learning represents a significant shift away from the typical teacher-centered or lecture-centered milieu in college classrooms. In collaborative classrooms, the lecturing, listening or note-taking process may not disappear entirely, but it lives alongside other processes that are based on students' discussion and active work with the course material.