Generalization of Web Log Datas Using WUM Technique



M. SARAVANAN (1), B. VALARAMATHI (2)
(1) Final Year M.E. Student, (2) Professor & Head, Department of Computer Science and Engineering, SKP Engineering College, Tiruvannamalai, INDIA.
deivanai.saravanan@gmail.com, valar_mathi_2007@yahoo.co.in

ABSTRACT
This paper attempts to understand the behavioral patterns of website visitors, with the aim of building better and more effective websites. These patterns are inferred by analyzing the web log files maintained by each website. The analysis covers how many visitors browse the site, which pages they view and which they ignore, how long they stay on the site, where they come from, and how often they visit. In this project, the web log files are analyzed to obtain the users' access patterns for the various pages of the website. This information is then used to predict the preferences of different users and to produce reports on the number of visitors to the website, the number of unique IP addresses, the amount of bandwidth used, and the number of hits the site received. Hits are further broken down by time increment: daily usage, day of the week, and hour of the day. To learn more about what visitors accessed, the reports show how many web pages were viewed, which files were downloaded, which directories were accessed, and which images were viewed. Referrer information includes the domains and URLs that visitors came from.

General Terms: Human Factors, Measurement.
Keywords: Query log analysis, Web Search Measurement.
1 INTRODUCTION
1.1 BACKGROUND
The number of Web users is growing rapidly, and a great deal of useful information can be obtained from the World Wide Web (WWW). Because the available data is growing explosively, techniques for analyzing it and discovering useful information are important. Information providers and web managers try to build effective web sites. If providers and administrators can determine users' browsing patterns from web access logs, they can use those patterns as one guide for constructing an effective web site [2]. However, web access logs are huge, so extracting users' browsing patterns manually is difficult. Data mining, which extracts patterns from large amounts of data, is therefore adopted to solve this problem.

Web pages are far more complex than the documents in any traditional text collection. The Web is a highly dynamic information source that serves a broad spectrum of user communities [3], and only a small portion of its pages contain truly relevant or useful information. Web mining is the mining of data related to the World Wide Web: either the data actually present in web pages or data related to web activity [4, 5]. Web data can be classified into the following classes:
(a) Content: the content of the actual web pages.
(b) Intra-page structure: the HTML or XML code of a page.
(c) Inter-page structure: the actual linkage structure between web pages.
(d) Usage data: data describing how web pages are accessed by visitors.
(e) User profiles: demographic and registration information obtained about users, which may also include information found in cookies.
ISSN: 1790-5117 157 ISBN: 978-960-474-162-5

Whenever a visitor accesses the web server, the server records the visitor's IP address, authenticated user ID, date/time, request method, status, bytes transferred, referrer, user agent, and so on. Most of these fields are taken from the HTTP request and response; exactly which ones are recorded depends on the log format configured on the server. Web mining tasks can be divided into several classes, and Figure 1.1 shows one taxonomy of web mining activities. General access pattern tracking is a type of usage mining that looks at the history of web pages visited. This usage may be general or targeted to specific usages or users.
Figure 1.1 Taxonomy of Web Mining
Web Usage Mining is the part of Web Mining that deals with extracting knowledge from server log files. The source data consist mainly of the (textual) logs collected when users access web servers, and these may be represented in standard formats.

1.2 MOTIVATION
The aim of this paper is to analyze the log files of a web site, obtained from its web server, using the WUM technique. Once the data warehouse has been created and populated, various statistical and data mining techniques are used to identify any web usage patterns that exist. An existing application that can assist with this pattern discovery phase is 123LogAnalyzer. These patterns are then analyzed, interpreted, and used to determine how well the web site is being used. Graphical representations of the patterns are also created.

1.3 OBJECTIVES
Web usage mining is the type of Web mining activity that involves the automatic discovery of user access patterns from one or more Web servers. Organizations often generate and collect large volumes of data in their daily operations. Most of this information is generated automatically by Web servers and collected in server access logs. Other sources of user information include referrer logs, which contain the referring page for each page reference, and user registration or survey data gathered via tools such as CGI scripts [7].
Analyzing such data can help organizations determine the lifetime value of customers, cross-marketing strategies across products, and the effectiveness of promotional campaigns, among other things. Analysis of server access logs and user registration data can also provide valuable information on how to better structure a Web site in order to create a more effective presence for the organization [8]. Finally, for organizations that sell advertising on the World Wide Web, analyzing user access patterns helps target advertisements to specific groups of users.

1.4 CHALLENGES
The World Wide Web is a huge, diverse and dynamic medium for disseminating information, perhaps too much information: it causes information overload, and much of it is irrelevant and not indexed. Finding relevant information to mine is therefore hard, personalization and mass customization are difficult, and e-commerce businesses need to know what their customers want. Most Web documents are in HTML format and contain many markup tags, used mainly for formatting. Traditional IR systems usually contain structured, well-written documents; this is NOT the case on the Web. Documents in traditional IR systems tend to remain static over time, whereas Web pages are much more dynamic. Web pages are hyperlinked to each other, and it is through hyperlinks that a Web page author cites other Web pages. The size of the Web is larger than traditional data sources or document collections by several orders of magnitude.

2 PROPOSED SYSTEM
2.1 SYSTEM OVERVIEW
Data mining is a technique used to deduce useful and relevant information to guide professional decisions and scientific research. It is a cost-effective way of analyzing large amounts of data, especially datasets too large for a human to analyze. The mass adoption of the Internet has made automatic knowledge extraction from Web log files a necessity. Information providers are interested in techniques that can learn Web users' information needs and preferences [9], because adapting the information structure of their sites to users' behavior can improve the effectiveness of those sites. The recent advent of data mining techniques for discovering usage patterns from Web data (Web Usage Mining) suggests that these techniques can be a viable alternative to traditional decision-making tools. Web Usage Mining is the process of applying data mining techniques to discover usage patterns from Web data, and it is targeted towards applications. It mines the secondary data derived from users' interactions during certain periods of Web sessions. This work explores the use of Web Usage Mining techniques to analyze Web log records collected from Web servers. Using a commercial Web mining tool (123LogAnalyzer), several Web access patterns have been identified by applying well-known data mining techniques to the access log files.

2.2 SYSTEM REQUIREMENTS
123LogAnalyzer is a powerful online tool that turns Web logs into a comprehensive analysis of customers and prospects [10]. It describes how visitors browse the Web site, which pages they view (and ignore), how long they spend on the site, and where they come from.
123LogAnalyzer's Web server activity report displays the number of visitors, the number of unique IP addresses, the amount of bandwidth used, and the number of hits the site received, broken down by time increment, day of the week, and hour of the day. To learn more about what visitors accessed, you can see which Web pages were viewed, which files were downloaded, which directories were accessed, and which images were viewed. Referrer information includes the domains and URLs that visitors came from. The search engine performance report displays the search engines that referred visitors to the site and the words and phrases that visitors searched for. 123LogAnalyzer also provides geographic information about the visitors, as well as the platforms and browsers people use to visit the site. It can even identify missing files, broken links, and other errors that visitors encountered. Sample output of 123LogAnalyzer is given below.
Fig 2.1 Adding the log file
Fig 2.2 Daily Visit Report
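123LogAnalyzer is a commercial tool and its internals are not described in this paper. As a rough, hypothetical sketch of what such an activity report computes, the Python below assumes log lines have already been parsed into `Hit` records (a name introduced only for this example) and derives total hits, unique IP addresses, bandwidth (taken here as the sum of response sizes) and hits broken down by day of the week and hour of the day. Unique IPs are only a proxy for visitors; counting true visits needs session identification.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

# Fixed English day names so the output does not depend on the system locale.
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


@dataclass
class Hit:
    """One parsed log entry (hypothetical minimal record, not 123LogAnalyzer's format)."""
    ip: str
    time: datetime
    size: int  # response size in bytes


def activity_report(hits):
    """Totals and time breakdowns similar to a Web server activity report."""
    by_weekday = Counter(WEEKDAYS[h.time.weekday()] for h in hits)
    by_hour = Counter(h.time.hour for h in hits)
    return {
        "hits": len(hits),
        "unique_ips": len({h.ip for h in hits}),
        "bandwidth_bytes": sum(h.size for h in hits),
        "hits_by_weekday": dict(by_weekday),
        "hits_by_hour": dict(by_hour),
        # most_common(1) -> [(name, count)]; an empty log has no popular day
        "most_popular_weekday": by_weekday.most_common(1)[0][0] if hits else None,
    }


if __name__ == "__main__":
    sample = [
        Hit("217.13.12.209", datetime(2007, 7, 19, 2, 50, 32), 28950),
        Hit("192.168.10.72", datetime(2007, 9, 28, 6, 27, 33), 0),
        Hit("192.168.10.72", datetime(2007, 3, 23, 6, 27, 33), 0),
    ]
    for key, value in activity_report(sample).items():
        print(key, value)
```

Run on three entries taken from the sample log lines in this paper (times as recorded, time zones ignored), it reports 3 hits from 2 unique IPs, 28950 bytes of bandwidth, and Friday as the most popular day of the week.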

Fig 2.3 Most Popular Day of Week Report
Fig 2.4 Hits in Hour of Day Report
Fig 2.5 Hits in Day of Week Report

3 DESIGN OF THE SYSTEM
3.1 DESIGN OF THE SYSTEM
Web usage mining mines web log records to discover the access patterns of web pages. Analyzing and exploring these patterns helps identify potential customers for e-commerce, enhance the quality and delivery of Internet information services to the end user, and improve web server system performance [3].
Fig 3.1 Design of Web log system
The log file contents are read from a text file, and the tokens are separated using a string tokenizer. The contents are then stored in a database. Unwanted tuples are removed, and the remaining tuples are stored in another table. Aggregate functions are used to extract the required tuples. SQL queries are passed to the database using …

LOG FILE
Log files contain a record of website activity. Every time a person visits the website, the web server updates a log file with the visitor's information. These log files can be downloaded and used to generate useful statistics. Each access to a web page or a file generates a "hit" on the web server. For example, if a web page contains 10 pictures, a visit to that page generates 11 hits on the web server: one hit for the page itself and 10 hits for the pictures. If a visitor views 5 web pages on the site, each containing 10 pictures, the web server records:
55 Hits
5 Page Views
1 Visit

3.1.2 WEBLOG FILES
Web server log files are simple text files that are generated automatically every time someone accesses the Web site. Every "hit" on the site, including each view of an HTML document, image or other object, is logged. The raw web log file format is essentially one line of text per hit. Each line records who visited the site, where they came from, and exactly what they did on the site. There can be up to four files: access (or transfer), error, agent (or browser), and referrer files. Increasingly, the transfer, agent, and referrer information is gathered into a single combined file.

3.1.3 SAMPLE LINE OF A WEB LOG FILE IN ITS RAW FORMAT
217.13.12.209 - - [19/JUL/2007:02:50:32-0400] "GET /meta_tags.htm HTTP/1.1" 200 28950 "http://www.google.com/search?q=meta+and+tag" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 2000; DigExt)"

This web server log file line tells us:
Visitor's IP address or hostname: 217.13.12.209
Login: -
Authuser: -
Date and time: 19/JUL/2007:02:50:32-0400
Request method: GET
Request path: /meta_tags.htm
Request protocol: HTTP/1.1
Response status: 200
Response content size: 28950
Referrer path: http://www.google.com/search?q=meta+and+tag
User agent: Mozilla/4.0 (compatible; MSIE 5.0; Windows 2000; DigExt)

3.2 SYSTEM ARCHITECTURE
As part of the system requirements and design activity, the system has to be modeled as a set of components and the relationships between them. Figure 3.2 shows the major sub-systems of the software and the interconnections between them.
Fig 3.2 System Architecture (components: Conversion of log files to Data Base; Generalization of Log files; Table generation; Bar / Line chart generation)

3.3 DETAILED PROCESS OF WUM
Figure 3.3 Activities of WUM
Step 1: Data preprocessing
Data preprocessing has a fundamental role in Web Usage Mining applications. It comprises several tasks [12]:
(a) Data Cleaning: This step consists of removing all the data tracked in web logs that is useless for mining purposes.
(b) Session Identification and Reconstruction: This step consists of (i) identifying the different user sessions from the usually very poor information available in log files and (ii) reconstructing the users' navigation paths within the identified sessions.
(c) Content and Structure Retrieval: Content retrieval refers to the discovery of useful information from web content, including text, images, audio and video. Structure retrieval analyzes the out-links of a web page and has been used for search engine result ranking.
(d) Data Formatting: Once the previous phases have been completed successfully, the data are properly formatted before mining techniques are applied. Here, the data extracted from the web logs are stored in a relational database.
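Data cleaning and session identification can be illustrated with a minimal Python sketch. It is not the authors' implementation (which tokenizes lines and stores them in a database, as described in Section 3.1). It assumes the combined log format shown in Section 3.1.3, treats requests for pictures, style sheets and scripts, plus any non-GET or non-2xx request, as noise to be cleaned, and identifies sessions with a 30-minute inactivity timeout per host. The suffix list, the timeout and all function names are illustrative assumptions.

```python
import re
from datetime import datetime, timedelta

# Combined log format: host ident authuser [time] "request" status size "referrer" "agent".
# The referrer/agent pair is optional so Common Log Format lines, and lines whose
# agent field was cut off, still parse (those two fields then come back as None).
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Assumption: these are embedded objects (pictures, styles, scripts), not page views.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")


def parse_line(line):
    """Split one raw log line into named fields; return None if it does not parse."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Accept both "19/Jul/2007:02:50:32 -0400" and "19/JUL/2007:02:50:32-0400".
    rec["time"] = datetime.strptime(rec["time"].replace(" ", ""), "%d/%b/%Y:%H:%M:%S%z")
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec


def is_page_view(rec):
    """Data cleaning rule (assumed): keep successful GETs of non-embedded resources."""
    path = rec["path"].split("?", 1)[0].lower()
    return (rec["method"] == "GET"
            and 200 <= rec["status"] < 300
            and not path.endswith(IGNORED_SUFFIXES))


def sessionize(records, timeout=timedelta(minutes=30)):
    """Session identification (assumed heuristic): same host, gaps of at most `timeout`."""
    sessions = []
    open_session = {}  # host -> (time of its last request, index into sessions)
    for rec in sorted(records, key=lambda r: r["time"]):
        prev = open_session.get(rec["host"])
        if prev is None or rec["time"] - prev[0] > timeout:
            sessions.append([])
            idx = len(sessions) - 1
        else:
            idx = prev[1]
        sessions[idx].append(rec)
        open_session[rec["host"]] = (rec["time"], idx)
    return sessions


if __name__ == "__main__":
    # Section 3.1 example: one visitor views 5 pages, each with 10 pictures.
    def fake_line(path, minute):
        return f'10.0.0.1 - - [19/Jul/2007:02:{minute:02d}:00 -0400] "GET {path} HTTP/1.1" 200 100'

    lines = []
    for p in range(5):
        lines.append(fake_line(f"/page{p}.htm", 2 * p))
        lines += [fake_line(f"/img/p{p}_{i}.gif", 2 * p) for i in range(10)]
    records = [parse_line(line) for line in lines]
    views = [r for r in records if is_page_view(r)]
    print(len(records), "hits,", len(views), "page views,", len(sessionize(views)), "visit(s)")
```

Fed the hit-counting example from Section 3.1 (synthetic lines for one hypothetical host), it prints `55 hits, 5 page views, 1 visit(s)`. Grouping by IP address alone can merge different users behind a shared proxy, so practical session identification often also considers the user agent or referrer.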

Fig 3.4 Phases of WUM
Step 2: Mining Algorithms
The mining algorithms (pattern discovery) step uses the following techniques:
(a) Statistical Analysis: Statistical techniques are the most common method of extracting knowledge about visitors to a Web site. By analyzing the session file, one can perform different kinds of descriptive statistical analyses (frequency, mean, median, etc.) on variables such as page views, viewing time and length of a navigational path.
(b) Clustering: Clustering is a technique for grouping together a set of items with similar characteristics. In the Web Usage domain, there are two kinds of interesting clusters to be discovered: usage clusters and page clusters. Clustering of users tends to establish groups of users exhibiting similar browsing patterns. Such knowledge is especially useful for inferring user demographics in order to perform market segmentation in e-commerce applications or to provide personalized Web content to users.
(c) Classification: Classification is the task of mapping a data item into one of several predefined classes. In the Web domain, one is interested in developing a profile of users belonging to a particular class or category. This requires extracting and selecting the features that best describe the properties of a given class or category.
(d) Association Rules: Association rule generation can be used to relate pages that are most often referenced together in a single server session. In the context of Web Usage Mining, association rules refer to sets of pages that are accessed together with a support value exceeding some specified threshold. These pages may not be directly connected to one another via hyperlinks [11].
(e) Sequential Patterns: The technique of sequential pattern discovery attempts to find inter-session patterns such that the presence of a set of items is followed by another item in a time-ordered set of sessions or episodes.
By using this approach, Web marketers can predict future visit patterns, which helps in placing advertisements aimed at certain user groups.
(f) Dependency Modeling: Dependency modeling is another useful pattern discovery task in Web Mining. The goal here is to develop a model capable of representing significant dependencies among the various variables in the Web domain.

Step 3: Pattern Analysis
Pattern analysis is the last step in the overall Web Usage Mining process, as described in Figure 3. The motivation behind pattern analysis is to filter out uninteresting rules or patterns from the set found in the pattern discovery phase [13]. The exact analysis methodology is usually governed by the application for which Web mining is done. The most common form of pattern analysis consists of a knowledge query mechanism such as SQL.

4 IMPLEMENTATION OF SYSTEM
4.1 METHODOLOGY OVERVIEW
The Web Usage Mining process serves as the main guideline for the project implementation. Fig 4.1 shows the general flow of the project methodology.
Fig 4.1 Flow of the project methodology
Server Log File
The server log file dated from January 2007 to September 2007 has been selected for further analysis. The server log files are retrieved from the IIS web server. The large amount of data is the most challenging problem to handle during the Data Preprocessing phase. The server log file consists of nine attributes in a single line of record, as shown in Fig 4.
192.168.2.85 - - [21/Jun/2007:05:27:59 +0000] "GET / HTTP/1.0" 200 0 "-" "Microsoft-WebDAV-
192.168.10.82 - - [12/May/2007:05:40:57 +0000] "GET /sysvol HTTP/1.0" 404 0 "-" "Microsoft-WebDAV-
192.168.10.79 - - [23/Jul/2007:05:54:52 +0000] "GET /sysvol HTTP/1.0" 404 0 "-" "Microsoft-WebDAV-
192.168.10.75 - - [02/Aug/2007:06:14:07 +0000] "GET / HTTP/1.0" 200 0 "-" "Microsoft-WebDAV-
192.168.10.74 - - [20/May/2007:06:16:33 +0000] "GET /sysvol HTTP/1.0" 404 0 "-" "Microsoft-WebDAV-
192.168.10.72 - - [28/Sep/2007:06:27:33 +0000] "GET / HTTP/1.0" 200 0 "-" "Microsoft-WebDAV-
192.168.10.72 - - [23/Mar/2007:06:27:33 +0000] "GET /sysvol HTTP/1.0" 404 0 "-" "Microsoft-WebDAV-

4.2 DESCRIPTION OF THE MODULES WITH SCREEN SHOTS
4.2.1 Description of Modules
a. Extracting web log files: The log files are extracted from different web servers in various formats.
b. Converting web log files: The information in the text files (the files created by the log analyzer) is converted and the web data available in them is stored in the database.
c. Generalization of web log data: All data are posted to the appropriate tuples.
d. Bar / Line chart generation: Based on the information available in the database from the log file, the required bar charts are built (e.g., daily hits, daily visits, daily bandwidth, daily page views, most popular day of week, weekly bandwidth, hits in day of week, visitor who viewed the web most, most viewed webpage, most viewed directories). See Figure 4.2.
e. Table generation: Based on the information available in the database from the log file, the required information is built into tables in that database (e.g., daily hits, daily visits, daily bandwidth, daily page views, most popular day of week, weekly bandwidth, hits in day of week, visitor who viewed the web most, most viewed webpage, most viewed directories). See Table 4.1.

4.3 SAMPLE SCREEN SHOTS
Figure 4.2 Bar / Line chart of Daily hits Report
Table 4.1 Generation of Daily hits Report

5 CONCLUSION AND FUTURE ENHANCEMENTS
5.1 CONCLUSION
The Web Usage Mining modules were used to preprocess the log file, and various charts were generated depicting the daily, weekly, and monthly usage patterns. Sample charts generated from the mining process are presented below. Web Usage Mining is an active research field, and Web Usage Mining applications are used on some well-known Web sites. This project presents an implementation of Web Usage Mining in which Web server log files are mined in order to analyze Web usage patterns. The methodology employs Data Preprocessing, Mining Algorithms and Pattern Analysis. The Data Preprocessing phase of Web Usage Mining is a challenging task. By applying mining algorithms to the Web log file, the relationships between the accessed pages can be mined. The results of this project can be used by Web administrators and Webmasters to improve Web services and performance by improving their Web sites, including content, structure, presentation and delivery.

5.2 APPLICATIONS
The results can be used to improve the web site from the users' viewpoint. Further, the results produced by mining web logs can be used for various purposes: to personalize the delivery of web content, to improve user navigation through prefetching and caching, to improve web design, or, in e-commerce, to improve customer satisfaction.
Personalization of Web Content. Web Usage Mining techniques can be used to provide a personalized web user experience. For instance, it is possible to predict user behavior in real time by comparing the current navigation pattern with typical patterns extracted from past web logs.
Prefetching and Caching. The results produced by Web Usage Mining can be exploited to improve the performance of web servers and web-based applications. Typically, Web Usage Mining can be used to develop proper prefetching and caching strategies that reduce server response time.
Support to the Design. Usability is one of the major issues in the design and implementation of web sites.
The results produced by Web Usage Mining techniques can provide guidelines for improving the design of web applications.
E-commerce. Mining business intelligence from web usage data is critically important for e-commerce companies. Customer Relationship Management (CRM) can benefit considerably from Web Usage Mining techniques. In this case, the focus is on business-specific issues such as customer attraction, customer retention, cross sales, and customer departure.

5.3 FUTURE ENHANCEMENT
As a future enhancement of this project, web pages can be prefetched based on usage patterns, which can improve web performance considerably. Further, methods for analyzing sparse data can be applied to the study of Web log access, different similarity measures for association rules can be compared, and conclusions drawn about the most suitable alternatives for knowledge extraction from Web log data. Finally, the project can be extended to access and process external web servers with appropriate access rights.

REFERENCES
[1] Abraham A., "Business Intelligence from Web Usage Mining," Journal of Information and Knowledge Management (JIKM), World Scientific Publishing Co., Singapore, Vol. 2, No. 4, pp. 1-15, 2003.
[2] Azizul Azhar bin Ramli, "Web usage mining using apriori algorithm: UUM learning care portal case," in Proc. of the Int. Conf. on Knowledge Management, pp. 1-19, 2001.
[3] Cooley, R., Mobasher, B., Srivastava, J., "Web mining: information and pattern discovery on the World Wide Web," Ninth IEEE International Conference, 3-8, pp. 558-567, 2003.
[4] Jiawei Han, Kevin Chen-Chuan Chang, "Data Mining for Web Intelligence," Computer, Vol. 35, No. 11, pp. 64-70, Nov. 2002.
[5] Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Second Edition, Morgan Kaufmann Publishers, 2006.
[6] Kato, H., Hiraishi, H., Mizoguchi, F., "Log summarizing agent for Web access data using data mining techniques," Joint 9th IFSA World Congress and 20th NAFIPS International Conference, 25-28, Vol. 5, pp. 2642-2647, 2001.
[7] Marquardt, C.G., Becker, K., Ruiz, D., "A preprocessing tool for Web usage mining in the distance education domain," Database Engineering and Applications Symposium, 7-9, pp. 78-87, July 2004.
[8] Miriam Baglioni, U. Ferrara, Andrea Romei, Salvatore Ruggieri, Franco Turini, "Preprocessing and Mining Web Log Data for Web Personalization," Proc. of 8th Nat'l Conf. of the Italian Association for Artificial Intelligence, 2003.
[9] Mukesh Mohania, A. Min Tjoa, Data Warehousing and Knowledge Discovery: First International Conference, DaWaK'99, Florence, Italy, 1999.
[10] F. van Harmelen, A. Kampman, H. Stuckenschmidt, and T. Vogele, "Knowledge-based meta-data validation: Analyzing a web-based information system," in K. Greve, editor, 14th International Symposium Informatics for Environmental Protection, German Computer Society, 2000.
[11] Vinodkumar P. Kizhakke, "Mir: A Tool For Visual Presentation Of Web Access Behavior," Master's thesis, University of Florida, Gainesville, 2000.
[12] Yang, T. Li and K. Wang, "Web-Log Cleaning for Constructing Sequential Classification," Applied Artificial Intelligence, Vol. 17, 2003.
[13] Abraham A., "Business Intelligence from Web Usage Mining," Journal of Information and Knowledge Management (JIKM), World Scientific Publishing Co., Singapore, Vol. 2, No. 4, pp. 1-15, 2003. http://citeseer.ist.psu.edu/abraham03business.html
[14] http://httpd.apache.org/docs/1.3/logs.html
[15] http://www.apacheweek.com/features/logfiles
[16] http://msdn2.microsoft.com/enus/library/ms525807.aspx
[17] http://www.summary.net/manual/log_formats.html
[18] http://stream.bo.cnr.it/syshelp/config.htm
[19] http://webhosting.devshed.com/c/a/Web-Hosting-Articles/The-Top-Web-Servers-inthe-Market/2/
[20] http://www.lib.utexas.edu/dlp/imls/tools/logdb/atributedetails.html
[21] Kato, H., Hiraishi, H., Mizoguchi, F., "Log summarizing agent for Web access data using data mining techniques," Joint 9th IFSA World Congress and 20th NAFIPS International Conference, Vol. 5, 25-28 July 2001, pp. 2642-2647.
[22] Jiawei Han, Kevin Chen-Chuan Chang, "Data Mining for Web Intelligence," Computer, Vol. 35, No. 11, pp. 64-70, Nov. 2002.
[23] F. van Harmelen, A. Kampman, H. Stuckenschmidt, and T. Vogele, "Knowledge-based meta-data validation: Analyzing a web-based information system," in K. Greve, editor, Fourteenth International Symposium Informatics for Environmental Protection, German Computer Society, 2000.
[24] Miriam Baglioni, U. Ferrara, Andrea Romei, Salvatore Ruggieri, Franco Turini, "Preprocessing and Mining Web Log Data for Web Personalization," Proc. of 8th Nat'l Conf. of the Italian Association for Artificial Intelligence, 2003.
[25] www.123loganalyzer.com/