ANALYSIS OF WEBSITE USAGE WITH USER DETAILS USING DATA MINING PATTERN RECOGNITION




K. Vinodkumar 1, V. Kathiresan 2, K. Divya 3
1 M.Phil. scholar, RVS College of Arts and Science, Coimbatore, India.
2 HOD, Dr. SNS Rajalakshmi College of Arts & Science, Coimbatore, India.
3 M.Phil. scholar, PSG College of Arts & Science, Coimbatore, India.

Abstract

Web usage mining is a major research area within Web mining. It focuses on learning about Web users and their interactions with Web sites in order to determine how heavily a site is used. The aim of mining is to discover user access models automatically and quickly from large volumes of Web log data, such as frequent access paths, frequently accessed page groups, and user clusters. Through web usage mining, the server log, registration information, and other related information left behind by user accesses can be mined for user access patterns, which provide a foundation for organizational decision making. The main contribution of this work is a complete framework, together with findings on Web usage patterns from the Web log files of a real Web site that has all the difficult aspects of real-life Web usage, including the development of user profiles and external data describing the Web content.

Keywords: web log, server log, user profile.

1. Introduction

The Web usage mining framework is used to mine the usage of a web site. It applies the data mining concept of pattern recognition. Through this research we can determine the usage level of a particular website, and from the behaviour of its various users we can predict the usage level of a particular web page. The usage details of each web page are stored and maintained automatically in the web log file. A registration process allows a user to register on the web site, and the user's personal details are stored in the database. The user can then log in to the site.

When a registered user logs in, the web log file records the details of that user's accesses to each page, and it continues to do so automatically until the user logs out. The web log file also maintains the number of hits for each page or link, the users who visited a particular page within a given time period, the log-in and log-out times of each user, and the details of their accesses to the web pages. From the web log file we can determine which page or link is visited most often, and from these details we can produce a report on web site usage. Knowing the usage level lets us provide effective and user-friendly solutions, which helps make the web more profitable for a particular business strategy.
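The log-based measurements described above (hits per page, the most visited page, and the users who visited a page within a time period) can be illustrated with a minimal Python sketch. The paper does not specify a log format, so the record layout, the sample data, and the function names `parse`, `hits_per_page`, and `visitors_in_window` are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log records: (user, page, timestamp). The paper does not
# define a log format, so these fields and values are illustrative only.
LOG = [
    ("alice", "/home", "2015-03-01 09:00:05"),
    ("alice", "/products", "2015-03-01 09:01:10"),
    ("bob", "/home", "2015-03-01 09:02:00"),
    ("bob", "/products", "2015-03-01 10:15:30"),
    ("carol", "/products", "2015-03-01 10:20:00"),
    ("alice", "/contact", "2015-03-01 10:45:00"),
]

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse(records):
    """Convert the timestamp strings into datetime objects."""
    return [(user, page, datetime.strptime(ts, TIME_FORMAT))
            for user, page, ts in records]


def hits_per_page(entries):
    """Count how many times each page was requested."""
    return Counter(page for _, page, _ in entries)


def visitors_in_window(entries, page, start, end):
    """Return the users who requested `page` between start and end (inclusive)."""
    return {user for user, p, ts in entries if p == page and start <= ts <= end}


entries = parse(LOG)
hits = hits_per_page(entries)
# The page with the largest hit count and that count.
most_visited_page, most_visited_hits = hits.most_common(1)[0]
```

With this sample data, `hits` maps each page to its hit count, and `visitors_in_window(entries, "/products", datetime(2015, 3, 1, 10), datetime(2015, 3, 1, 11))` returns the users who requested that page during the chosen hour. A real deployment would read these records from the server log rather than from an in-memory list.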

2. Related Work

In Web mining, data can be collected at the server side, at the client side, or at proxy servers, or it can be obtained from an organization's database (which contains business data or consolidated Web data). Each type of data collection differs not only in the location of the data source but also in the kinds of data available, the segment of the population from which the data was collected, and the method of implementation. Many kinds of data can be used in Web mining. This paper classifies them into the following types:

Content: The real data in the Web pages, i.e. the data the Web page was designed to convey to the users. This usually consists of, but is not limited to, text and graphics.
Structure: Data that describes the organization of the content. Intra-page structure information includes the arrangement of the various HTML or XML tags within a given page. This can be represented as a tree structure in which the <html> tag is the root.
Usage: Data that describes the pattern of usage of Web pages, such as IP addresses, page references, and the date and time of accesses.
User Profile: Data that provides demographic information about users of the Web site. It includes registration data and customer profile information.

3. Proposed System

This framework covers the various aspects of Web usage mining. The proposed system uses the data mining concepts of pattern discovery and pattern analysis, and it was developed to ease the process of mining the usage level of a particular website. The application identifies the usage level of a website and monitors the behaviour of its various users. It addresses the following six points:

1. How are people using this site?
2. Which pages are being accessed most frequently?
3. Which page or hyperlink received the maximum number of hits?
4. User details are easily maintained.
5. Reports are produced from the web log file details.
6. The results of the reports help make the web more profitable for a particular business strategy.

3.1 Pattern Discovery

Pattern discovery techniques involve algorithms that discover interesting patterns in web data. Once user transactions or sessions have been identified, several kinds of access pattern mining can be performed, depending on the needs of the analyst. Some of these discovery techniques are discussed below.

3.1.1 Path Analysis

Graph models are most commonly used for path analysis. In these models, a graph represents some relation defined on Web pages, and each tree of the graph represents a web site. Each node in a tree represents a web page; edges between trees represent links between web sites, and edges between nodes within the same tree represent links between documents at a web site. When path analysis is applied to the site as a whole, it can offer valuable insights into navigational problems. Most graph-based analyses are concerned with determining frequent traversal patterns and the most frequently visited paths in a web site.

3.1.2 Association Rules

Association rule mining predicts the association and correlation among sets of items, where the presence of one set of items in a transaction implies, with a certain degree of confidence, the presence of other items. In this setting it can discover correlations between pages that are most often referenced together in a single server session or user session. It can answer questions such as: Which sets of pages are frequently accessed together by web users? Which page will be fetched next? Which paths are frequently accessed by web users? Applied to online shoppers, association rules can generally reveal their spending habits on related products.

3.1.3 Sequential Patterns

Sequential pattern discovery finds inter-transaction patterns such that the presence of a set of items is followed by another item in the time-stamp-ordered transaction set. Web log files can record a set of transactions in time sequence.

3.1.4 Decision Trees

A decision tree is essentially a flow chart of questions or data points that ultimately leads to a decision.

3.1.5 Clustering

Clustering identifies visitors who share common characteristics. Once customer or visitor profiles are available, you can specify how many clusters to identify within a group of profiles, and then try to find the set of clusters that best represents the most profiles. Besides information from Web log files, customer profiles often need to be obtained from an online survey form when the transaction occurs.

3.1.6 Grouping

Users can usually draw higher-level conclusions by grouping similar information, such as how many visitors came from a Yahoo server.
For example: http://search.yahoo.com/bin/search?p=web+miners

3.1.7 Filtering

Simple reporting needs require only simple analysis systems. However, as a company's Web site becomes more integrated with other functions of the company, such as customer service, human resources, and marketing, analysis needs expand rapidly. Suppose, for example, that the company launches a marketing campaign. Print and television ads are now designed to drive consumers to a Web site rather than to call an 800 number or visit a store. Consequently, tracking online marketing campaign results is no longer a minor issue but a major marketing concern.

It is often difficult to predict which variables are critical until considerable information has been captured and analysed. Consequently, a Web traffic analysis system should allow precise filtering and grouping of information even after the data has been collected. Systems that force a company to predict which variables are important before capturing the data can lead to poor decisions, because the data will be skewed toward the expected outcome.

Filtering information allows a manager to answer specific questions about the site. For example, filters can be used to calculate how many visitors a site received this week from Microsoft. In this example, a filter is set for this week and for visitors that have the word "Microsoft" in their domain name, e.g. proxy12.microsoft.com.
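The grouping and filtering steps described in Sections 3.1.6 and 3.1.7 can be sketched in Python as operations applied after the data has been collected. The record layout (visit date, visitor host name, referring URL), the sample values, and the helper names `filter_visits` and `group_by_referrer_host` are assumptions for illustration; the paper does not prescribe them. Treating the Yahoo URL as a referring URL is likewise an assumption of this sketch.

```python
from collections import Counter
from datetime import date
from urllib.parse import urlparse

# Hypothetical visit records: (visit date, visitor host name, referring URL).
VISITS = [
    (date(2015, 3, 2), "proxy12.microsoft.com",
     "http://search.yahoo.com/bin/search?p=web+miners"),
    (date(2015, 3, 3), "dial7.example.net",
     "http://search.yahoo.com/bin/search?p=web+logs"),
    (date(2015, 3, 4), "gw3.microsoft.com", ""),
    (date(2015, 2, 20), "proxy12.microsoft.com", ""),
    (date(2015, 3, 5), "host9.example.org", "http://www.example.com/links.html"),
]


def filter_visits(visits, start, end, host_keyword):
    """Keep visits dated within [start, end] whose host name contains host_keyword."""
    keyword = host_keyword.lower()
    return [v for v in visits
            if start <= v[0] <= end and keyword in v[1].lower()]


def group_by_referrer_host(visits):
    """Count visits per referring host; visits with no referrer count as '(direct)'."""
    counts = Counter()
    for _, _, referrer in visits:
        host = urlparse(referrer).hostname if referrer else None
        counts[host or "(direct)"] += 1
    return counts


# Filtering: visitors "this week" with "microsoft" in their host name.
week_start, week_end = date(2015, 3, 2), date(2015, 3, 8)
microsoft_this_week = filter_visits(VISITS, week_start, week_end, "microsoft")

# Grouping: how many visits arrived from each referring server.
referrers = group_by_referrer_host(VISITS)
```

Because both functions operate on already-collected records, the time window and the keyword can be chosen after the fact, which is the property the text argues a traffic analysis system should have. The substring match on the host name mirrors the "word Microsoft in their domain name" rule from the text and is deliberately crude.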

3.1.8 Dynamic Site Analysis

Traditional Web sites usually consisted of static HTML pages, often hand-crafted by Webmasters. Today, a number of companies, including Microsoft, make systems that allow an HTML file to be created dynamically around a database. This offers advantages including centralized storage, flexibility, and version control.

3.3 Pattern Analysis

Pattern analysis is the last step in the overall Web usage mining process. Its purpose is to filter out uninteresting rules or patterns from the set found in the pattern discovery phase. The exact analysis methodology is usually governed by the application for which Web mining is done. The most common form of pattern analysis consists of a knowledge query mechanism such as SQL. Another method is to load usage data into a data cube in order to perform OLAP operations. Visualization techniques, such as graphing patterns or assigning colours to different values, can often highlight overall patterns or trends in the data. Content and structure information can be used to filter out patterns containing pages of a certain usage type or content type, or pages that match a certain hyperlink structure.

4. Conclusion

We presented a framework for mining, tracking, and validating evolving multifaceted user profiles on Web sites that have all the challenging aspects of real-life Web usage mining, including evolving user profiles and access patterns, dynamic Web pages, and external data describing an ontology of the Web content. A multifaceted user profile summarizes a group of users with similar access activities and consists of their viewed pages, search engine queries, and inquiring and inquired companies. The period length for analysis depends on the application, or it can be set according to the cross-period validation results.
Even though we did not focus on scalability, it can be addressed by treating Web click streams as an evolving data stream, or by mapping some new sessions to persistent profiles and updating these profiles, thereby eliminating most sessions from further analysis and focusing the mining on truly new sessions.

5. References

[1] Avinash Kaushik, "Web Analytics: An Hour a Day", Wiley India Publication, 2007.
[2] Avinash Kaushik, "Web Analytics 2.0", Wiley India Publication, 2010.
[3] Usama Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, and Ramasamy Uthurusamy, "Advances in Knowledge Discovery and Data Mining", AAAI Press / The MIT Press, 1996.
[4] J. Ross Quinlan, "C4.5: Programs for Machine Learning", Morgan Kaufmann Publishers, 1993.
[5] Michael Berry and Gordon Linoff, "Data Mining Techniques (For Marketing, Sales, and Customer Support)", John Wiley & Sons, 1997.
[6] Sholom M. Weiss and Nitin Indurkhya, "Predictive Data Mining: A Practical Guide", Morgan Kaufmann Publishers, 1998.
[7] Alex Freitas and Simon Lavington, "Mining Very Large Databases with Parallel Processing", Kluwer Academic Publishers, 1998.
[8] A. K. Jain and R. C. Dubes, "Algorithms for Clustering Data", Prentice Hall, 1988.
[9] V. Cherkassky and F. Mulier, "Learning From Data", John Wiley & Sons, 1998.

6. AUTHOR PROFILE:

K. Vinodkumar is an M.Phil. scholar in the Department of Computer Applications (MCA) at RVS College of Arts and Science, Coimbatore. He received his B.Sc. in 2010 from Bharathiar University and his MCA in 2013 from Anna University, Chennai. He is pursuing his M.Phil. in the area of data mining at Bharathiar University, Coimbatore. He earned a university rank in MCA.

Dr. V. Kathiresan is HOD of the Department of Computer Applications (MCA) at Dr. SNS Rajalakshmi College of Arts & Science, Coimbatore. He received his B.Sc. in 2003 and his MCA in 2006 from Bharathiar University, Coimbatore. He obtained his M.Phil. in the area of data mining from Periyar University, Salem, in 2007, and his Ph.D. in the same area. His research interest lies in data mining. He received the Faculty Excellence Award from RVS College of Arts & Science for seven consecutive academic years, 2007-08 through 2013-14.

K. Divya is an M.Phil. scholar in the Department of Computer Science at PSG College of Arts and Science, Coimbatore. K. Divya received a BCA in 2013 from Bharathiar University and an M.Sc. (CS) in 2015 from PSG College of Arts and Science, Coimbatore.
