E-CRM and Web Mining. Objectives, Application Fields and Process of Web Usage Mining for Online Customer Relationship Management.




University of Fribourg, Switzerland Department of Computer Science Information Systems Research Group Seminar Online CRM, 2005 Prof. Dr. Andreas Meier E-CRM and Web Mining. Objectives, Application Fields and Process of Web Usage Mining for Online Customer Relationship Management. 01.05.2005 Paper by: Beat Raess Ruelle des Macons 9 CH-1700 Fribourg beat.raess@gmx.net

Contents

1 Introduction
2 CRM and Web Mining
  2.1 Customer Relationship Management (CRM)
    2.1.1 Perspectives of CRM
    2.1.2 Adding the E to CRM
  2.2 Web Mining
    2.2.1 Web Content Mining
    2.2.2 Web Structure Mining
    2.2.3 Web Usage Mining
3 Web Usage Mining in E-CRM
  3.1 Objectives
  3.2 Application Fields of Web Usage Mining
  3.3 The Web Usage Mining Process
    3.3.1 Data Cleaning
    3.3.2 User and Session Identification
    3.3.3 Data Integration
    3.3.4 Pattern Discovery and Analysis
4 Conclusion

1 Introduction

Customer Relationship Management (CRM) is a business strategy oriented towards the needs of customers (cf. [Wikipedia 2005]). As a form of management and organisation, it is mainly concerned with improving customer orientation and satisfaction in order to ensure the acquisition of new customers and the retention of existing ones (cf. [Schoegel 2002, p. 33]). Online or Electronic Customer Relationship Management (E-CRM) adds the electronic component of modern information and communication technologies (e.g. the internet) to CRM.

E-CRM depends on detailed information about the customer's characteristics, needs, interests and behaviour. In e-commerce, the problem is to extract the relevant information from a huge amount of data. Techniques such as Data Mining or Web Mining are therefore a useful means of achieving the desired results. Data mining (also referred to as Knowledge Discovery in Databases, KDD) deals with the automated search for patterns inherent in large amounts of data. The aim is to derive new information from existing data by applying methods from statistics and pattern recognition (cf. [Wikipedia 2005c]). The term Web Mining then refers to the application of such data mining techniques to extract information from the internet (cf. [Wikipedia 2005b]).

Web Mining can be classified into three categories. Web Content Mining and Web Structure Mining analyse the content and structure of web pages, whereas Web Usage Mining studies the behaviour of online surfers.

Based on research papers from the domains involved, the objective of this paper is to show how Web Usage Mining can provide the information needed for Electronic Customer Relationship Management. Section 2 introduces the relevant terms and notions concerning CRM, E-CRM and Web Mining in general. Section 3 then concentrates on Web Usage Mining and how it is used in E-CRM. This includes an overview of the objectives and application fields, as well as an explanation of the process of Web Usage Mining in E-CRM.

2 CRM and Web Mining

The aim of this section is to provide the background on Customer Relationship Management and Web Mining. In addition to defining these terms, it briefly explains their fundamental concepts.

2.1 Customer Relationship Management (CRM)

The relation of a customer to an enterprise can be divided into the four phases of the Buying Cycle (cf. [Kincaid 2002, p. 45]); a similar concept is the Customer Life Cycle. In the consider phase, the customer becomes aware of a specific need and explores possible solutions. In the purchase phase, he evaluates the alternatives, chooses the best one and places an order. The product obtained is then installed and utilized in the use phase, and in the repurchase phase he eventually decides to retire or upgrade it. Figure 1 shows this Buying Cycle and the associated marketing and E-CRM tasks.

Figure 1: Buying Cycle [Schoegel 2002, p. 69]

Customer Relationship Management is "the strategic use of information, processes, technology and people to manage the customer's relationship with [a] company (marketing, sales, services, and support) across the whole customer life cycle" [Kincaid 2002, p. 41]. The objective of CRM is to ensure a company's ability to build up and maintain a long-lasting, profitable relation to its customers in order to fully benefit from their economic potential. In particular, both the acquisition of new clients and the retention of existing ones need to be ensured (cf. [Schoegel 2002, p. 33]).

2.1.1 Perspectives of CRM

Based on the above definition, CRM can be analysed from three different perspectives according to [Schoegel 2002, p. 35-44]: a strategic, a process-oriented and a technological point of view.

The strategic perspective focuses on the active management of the customer relationship. The aim is to build up and maintain the economic potential of individual customers, based on the observation that a customer only becomes profitable for an enterprise after some years, owing to more frequent and larger purchases and to reduced support and operating costs. Instead of only acquiring new clients, CRM therefore tries to establish a long-lasting relation to valuable customers, for example by offering new products (cross-selling) and services to existing clients.

From the process perspective, the goal is to implement the above-mentioned strategy. Independent of the communication or distribution channel involved, a customer should receive the necessary services from the company during the whole Customer Life Cycle. CRM should convince the customer that he will benefit from establishing a long-lasting relation to the company.

The information technology perspective is concerned with automating all customer management processes. CRM systems should support and integrate operative processes at the front office, offer possibilities for collaboration between all parties involved and provide analytical tools to retrieve the relevant data. The aim is to build up an integrated system in order to acquire and exploit the full economic potential of a customer. Using such a system adds the E to CRM.

2.1.2 Adding the E to CRM

Electronic Customer Relationship Management (E-CRM) incorporates modern information and communication technologies into traditional CRM. The potential of E-CRM is to provide a system that allows the integration of all parties and aspects involved in CRM.
The vision of E-CRM is to offer means to collect and analyse data from various internal and external sources in order to extract information about valuable customers and about how a company should interact with them (cf. [Schoegel 2002, p. 44]). Nevertheless, as the cited author states, IT is only an instrument for achieving this vision and does not provide a CRM solution by itself. To illustrate the activities of E-CRM systematically, he proposes in [Schoegel 2002, p. 44-60] a model consisting of an identification, a selection and an interaction phase (compare Figure 2).

Figure 2: Phase Model [Schoegel 2002, p. 46]

The identification phase first gathers all relevant data concerning customers, markets and basic conditions, and then structures this data to identify specific customer groups. In e-commerce, one approach is to use Data or Web Mining methods (see section 3) to extract the necessary information and to classify customers into groups. Given these customer groups, the selection phase aims at choosing the valuable customer segments. The objective is to adjust the relation to these groups according to their specific needs and their value for the enterprise. The next phase, the interaction, seeks to establish a profitable relation to these customers. This interaction is based on the information obtained in the two previous stages.

2.2 Web Mining

With the still increasing importance of the internet, the amount of information and data available is constantly growing. This growth brings with it, among others, the following problems (cf. [Kosala 2000, p. 1]): finding the relevant information, creating new knowledge out of the information available, personalizing information, and learning about consumers or individual users. Web Mining techniques can be used to address these information overload problems.

The term Web Mining describes "the use of data mining techniques to automatically discover and extract information from Web documents and services" [Kosala 2000, p. 2]. It is an interdisciplinary domain integrating research from databases, information retrieval, artificial intelligence, machine learning and natural language processing. According to [Kosala 2000, p. 2], Web Mining can be decomposed into the subtasks of resource finding (retrieving the intended Web documents and data), information selection and pre-processing (automatically selecting specific information), generalization (discovering patterns) and analysis (validating and interpreting the mined patterns).

Based on which part of the Web is mined, Kosala further suggests decomposing Web Mining into three categories (cf. [Kosala 2000, p. 3-4]). Web Content Mining searches for useful information in Web contents, data or documents; Web Structure Mining seeks to discover a model of the link structures of the web; and Web Usage Mining analyses secondary data generated by the user's interaction and behaviour. These categories (see Figure 3) are briefly introduced in the following paragraphs; a more detailed overview can be found in [Kosala 2000, p. 3-10].

Figure 3: Web Mining Categories [Kosala 2000, p. 5]

2.2.1 Web Content Mining

The web contains several types of data, such as text, images, hyperlinks, audio or video. To find this information, people nowadays often browse or search the internet based on keywords. But using search tools such as Google often leads to a great number of irrelevant search results. Web Content Mining therefore tries to find new means of retrieving relevant data from the internet. From an information retrieval view, the research aims at improving the finding itself or the filtering of the information. This is mainly done by applying text mining techniques, because most web content is unstructured text data. From a database research view, the focus is on finding models to capture web data. The goal here is to enable more sophisticated queries, which lead to better search results.

2.2.2 Web Structure Mining

The internet is basically a set of linked web pages. Web Structure Mining examines the structure of these links in order to improve retrieval performance, to categorize web pages, to find similarities or relationships between pages, or to determine authority or overview sites for specific subjects. To accomplish this, the internet is often modelled as a graph so that appropriate algorithms can be applied. An introduction to this research area can be found in [Fuernkranz 2002].

2.2.3 Web Usage Mining

While the above-mentioned categories rely on primary web data, Web Usage Mining studies secondary data generated by the interaction of the user with a website. This secondary data includes, for example, web server logs, proxy or browser logs, cookies, user queries, bookmark data, sessions, transactions, or mouse clicks and scrolls. The idea here is to "make sense of the data generated by the Web surfer's sessions or behaviors" [Kosala 2000, p. 4].

A first approach is to map web usage data into common relational database tables and then to apply corresponding data mining techniques. Another approach uses the raw data, such as log files and cookies, directly. In both cases the data is pre-processed and cleaned so that patterns can be found in it afterwards. The main applications of Web Usage Mining are to derive user profiles or models and to learn navigation patterns. The information gained can then be used for different tasks such as adaptive (personalized) web interfaces, improving the effectiveness of web sites or systems, site modification, business intelligence and usage characterization.
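The first approach mentioned above, mapping web usage data into relational tables, can be illustrated with a minimal Python sketch. The table name `page_request`, its columns and the sample records are assumptions made purely for illustration; they do not come from the cited papers.

```python
import sqlite3

# Hypothetical, already-parsed usage records: (host, timestamp, url, status).
records = [
    ("192.0.2.1", "2005-05-01 10:00:00", "/home.php", 200),
    ("192.0.2.1", "2005-05-01 10:01:30", "/products.php", 200),
    ("192.0.2.7", "2005-05-01 10:02:10", "/home.php", 200),
]

# An in-memory database is enough for the sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_request ("
    "host TEXT, requested_at TEXT, url TEXT, status INTEGER)"
)
conn.executemany("INSERT INTO page_request VALUES (?, ?, ?, ?)", records)

# Once the data is relational, ordinary queries (and, later, data mining
# tools working on tables) can be applied, e.g. the number of requests per page.
hits = conn.execute(
    "SELECT url, COUNT(*) FROM page_request GROUP BY url ORDER BY url"
).fetchall()
print(hits)
```

Storing each request as one row keeps the raw usage data available for later joins with customer or transaction tables.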

3 Web Usage Mining in E-CRM

Customer Relationship Management (CRM) tries to build up and maintain a long-lasting, profitable relation to a customer across the whole customer life cycle in order to fully benefit from the client's economic potential. Electronic Customer Relationship Management (E-CRM) integrates modern information and communication technologies into traditional CRM as a means of supporting this process. To achieve the objectives of E-CRM, detailed information about the customer's characteristics, needs, interests and behaviour is required. In e-commerce, a possible approach to obtaining this information is to apply techniques of Web Usage Mining.

This section first provides an overview of the objectives and general application fields of Web Usage Mining. Afterwards, the process of Web Usage Mining is outlined, consisting of data selection and cleaning, user and session identification, data integration, and pattern discovery and analysis. Where not stated otherwise, the information in this section is based on the research papers of Hajo Hippner (cf. [Hippner 2002]) and Michael Ceyp (cf. [Ceyp 2002]).

3.1 Objectives

In the phase model of E-CRM (compare Figure 2), the identification stage first gathers all relevant data concerning customers, markets and basic conditions, and then structures this data to identify specific customer groups. The resulting information is used as a basis for the decisions in the selection and interaction phases, where valuable customer segments are selected in order to establish a profitable relation to these groups of customers.

In e-commerce, for example in online shops, a customer is at first only an anonymous visitor of the web site. It is therefore necessary to obtain the relevant information about these customers. During the interaction of the user with the web pages or applications, a considerable amount of data is generated (see Figure 4). This includes, for example, server log files, in which each page request or error is recorded; cookies, which store information about a user's session; and data from online forms, in which the user himself enters information.

Figure 4: Data Sources in E-Commerce [Hippner 2002, p. 98]

The problem now is to extract the relevant information from this huge amount of data. As simple web page and logfile statistics are often inadequate, an alternative approach is to use techniques from Web Usage Mining to identify and analyse patterns in the collected secondary web data.

3.2 Application Fields of Web Usage Mining

One of the main applications of Web Usage Mining is to provide information about online customers concerning their characteristics, needs, interests and behaviour. In the context of E-CRM, the purpose is, for example, to identify specific customer groups according to criteria such as buying behaviour. This can serve as a basis for further activities, such as proposing cross- or up-selling opportunities.

Another possibility is to use the mining results directly to optimize a company's internet portal. Knowing the clients' preferred way of browsing makes it possible to place product information or advertisements on the pages that matter most.

Personalization can also be achieved by Web Usage Mining. This includes, for example, customized marketing campaigns for certain groups of customers. Likewise, the content of a web page can be displayed dynamically for each customer according to the group he belongs to. For example, the layout of the page can be adjusted, or information concerning certain products can be displayed.

Web Mining can also have an impact on the strategic planning of an enterprise. Based on the mining results, marketing activities can, for example, be addressed to a certain segment of customers only. Other possibilities include strategic alliances with other websites or online advertising on certain sites.

Table 1 lists the above-mentioned and further application fields of Web Usage Mining according to [Hippner 2002, p. 101].

Documentation:
  - documentation of online customer behaviour
  - web statistics
  - enhance data warehouse with interaction data
Information:
  - who are my clients?
  - market / customer segmentation
  - classification of customers
  - customer potential analysis
Buying Behaviour:
  - analyse the buying behaviour of customers
  - shopping cart analysis
  - selling patterns, recognize trends
  - determine cross- / up-selling potential
Optimization:
  - configure internet portal
  - success control of the internet presence
  - optimized placement of advertisement
Personalization:
  - target group specific marketing campaigns
  - individual interaction with online customers
  - personalized page content
  - personalized products, services
Strategic Planning:
  - planning of marketing campaigns
  - strategic alliances

Table 1: Application Fields of Web Usage Mining (cf. [Hippner 2002, p. 101])

3.3 The Web Usage Mining Process

The fundamental difference between Data Mining and Web Mining is the format of the raw data used. In Data Mining, this data is stored in the tables of a relational database, whereas the data source for Web Usage Mining consists of server log files, cookies or data from online forms. A first approach is to map this web usage data into common relational database tables, whereas another approach uses the raw data directly. In both cases the raw data is pre-processed and cleaned, corresponding data mining techniques are applied, and patterns are then sought in the result.

The process of Web Usage Mining can be decomposed into five different stages (see Figure 5). The first stage is the data cleaning, which serves to separate the web page requests from the other data contained in the server log files. Based on the clean log, the user and session identification stage then tries to identify individual users and their sessions. Once this session data is available, other data, such as user or customer data, can be integrated. Pattern discovery is then applied to the integrated data, and the analysis tries to find interesting patterns, which may contain useful information.

Figure 5: Web Usage Mining Process [Hippner 2002, p. 91]

3.3.1 Data Cleaning

Prior to the actual data cleaning, it needs to be determined which type of raw data is going to be analysed. The most common data sources here are the access and error logfiles of web servers or application servers. Such a server typically records every request it handles and writes it to a text file. Web servers use two common formats for these log files: the Common Logfile Format (CLF) and the Extended Common Logfile Format (ECLF). Depending on how the web server is configured, they contain, among other things, the IP address of the requesting host, the date and time, and the kind of request. Table 2 lists the entries of a default Apache web server logfile.

  host      IP address of the client (remote host) which made the request to the server
  ident     identity of the client (if available)
  authuser  userid of the person requesting the document (if available)
  date      time that the server finished processing the request
  request   request line from the client
  status    status code that the server sends back to the client
  bytes     size of the object returned to the client

Table 2: Apache Webserver Logfile (cf. [Apache 2005])

The data cleaning then consists primarily in identifying the web page requests. The problem with web server logfiles is that they log each file request from a client. This includes web pages, images or stylesheets, as seen in the example of Figure 6. The number of actual page requests is therefore usually determined by selecting a characteristic element of a certain page, such as the page document itself (e.g. home.php).

Figure 6: Example Apache Logfile

Nevertheless, it is not possible to determine the exact number of web page requests because of caching. Frequently requested pages are kept in the cache of either the proxy server of an internet provider or the client's browser itself. This improves performance, because a cached page can be loaded faster than when it is requested from the web server each time. One way to address this problem is to force clients to reload cached pages regularly by setting the corresponding parameters. The drawback of this strategy is a decreased performance of the web server.

3.3.2 User and Session Identification

User Identification: The next step is the identification of individual users, which is the crucial criterion for the whole Web Usage Mining process. Basically, this can be done by using the IP address contained in the logfile, which in principle identifies a computer on the internet. Nevertheless, there are a number of problems with this approach. First, because internet providers only have a limited number of IP addresses, they usually assign dynamic IP addresses to their clients. As a consequence, a client does not always have the same IP or, stated otherwise, the same IP may belong to different clients. Second, even a unique IP address may be used by different users, for example in the case of shared computers in a company or in an internet cafe. And last, not all clients are human users: there are a number of robots or spiders which automatically search the web for information or index web pages.

To address these problems, a basic approach is to combine the IP address with the browser used. The ECLF logfile contains a field indicating the browser type (e.g. Internet Explorer, Firefox). As this field also reveals access by robots and spiders, these records can easily be eliminated. But it remains difficult to distinguish users with dynamic IPs, because most people still use the same browser (Internet Explorer). Further, this approach does not give any result for clients using a public computer.

An alternative is to use cookies for the identification of users. A web server can send cookies to a client in order to store information such as the session ID on the local hard disk of the client's machine (see Figure 7). A cookie can be set to persist only for the current session or until a specific date. This way, it is possible to clearly identify a certain computer. Nevertheless, clients using a public computer still cannot be identified. Further, a user can delete cookies or set his browser's privacy settings either to disable cookies altogether or to delete them automatically.

Figure 7: Cookie Content

In summary, there is no way to identify a user uniquely by automatic means. This is a major problem of Web Usage Mining, because multiple-transaction patterns can only be analysed with a clear identification. As a consequence, if an unambiguous identification is required, the only possibility to identify a user is by a login with username and password.

This approach also has a drawback, however: customers can invent fake identities.

Session Identification: The term session refers to all page requests a specific user has made in a certain time period. Because web servers log requests independently of clients, the users must already have been identified before such a session can be determined. The usual criterion for delimiting a session is the time between subsequent page requests of a user. A session ends if the user does not request another page within a defined time period (e.g. 30 minutes). Once a session has been identified, valuable information can be obtained about the entry page of a user, the number of requested pages, the time a user spends on a certain page or the most common exit pages.

3.3.3 Data Integration

In addition to the information obtained by the user and session identification, it is often useful to integrate further data for a refined analysis. Common data sources include user data, customer data or transaction data. Care needs to be taken concerning data security and privacy. No data should be accessible to unauthorized persons, nor should it be used without the explicit approval of the customer.

User data can be obtained by means of logins and user profiles. Here the user himself enters the corresponding information into forms. A good practice is to make opting out the default, so that the data is used by the company only if the user explicitly opts in. Similarly to user data, customer data from a production system can be used as well. Finally, transaction data can be gathered automatically, e.g. from online orders. This kind of data is especially useful because it contains valuable information about the behaviour of customers, such as their buying behaviour.
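The preprocessing stages of sections 3.3.1 and 3.3.2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of the cited papers: the regular expression assumes CLF lines optionally followed by referer and user agent fields, the list of page suffixes and robot markers is hypothetical, a user is approximated by the combination of IP address and browser, and the sample log lines are invented. The 30-minute timeout is the example value from the text.

```python
import re
from datetime import datetime, timedelta

# CLF fields (see Table 2), optionally followed by referer and user agent.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) \[(?P<date>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)
PAGE_SUFFIXES = (".php", ".html", "/")         # assumption: what counts as a page
ROBOT_MARKERS = ("bot", "crawler", "spider")   # assumption: crude robot filter
SESSION_TIMEOUT = timedelta(minutes=30)        # example value from the text


def parse_line(line):
    """Turn one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    rec = match.groupdict()
    rec["time"] = datetime.strptime(rec["date"], "%d/%b/%Y:%H:%M:%S %z")
    parts = rec["request"].split()
    rec["url"] = parts[1] if len(parts) >= 2 else ""
    return rec


def is_page_request(rec):
    """Data cleaning: keep successful page requests made by non-robots."""
    path = rec["url"].split("?", 1)[0]
    agent = (rec["agent"] or "").lower()
    return (rec["status"] == "200"
            and path.endswith(PAGE_SUFFIXES)
            and not any(marker in agent for marker in ROBOT_MARKERS))


def sessionize(records):
    """Group requests per (IP, browser) and split them on the timeout."""
    sessions = {}   # user key -> list of sessions, each a list of URLs
    last_seen = {}  # user key -> time of the previous request
    for rec in sorted(records, key=lambda r: r["time"]):
        user = (rec["host"], rec["agent"])
        if user not in last_seen or rec["time"] - last_seen[user] > SESSION_TIMEOUT:
            sessions.setdefault(user, []).append([])
        sessions[user][-1].append(rec["url"])
        last_seen[user] = rec["time"]
    return sessions


# Invented sample lines in the assumed format.
sample_log = [
    '192.0.2.1 - - [01/May/2005:10:00:00 +0200] "GET /home.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '192.0.2.1 - - [01/May/2005:10:00:01 +0200] "GET /logo.gif HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '192.0.2.1 - - [01/May/2005:10:05:00 +0200] "GET /products.php HTTP/1.1" 200 7300 "-" "Mozilla/5.0"',
    '192.0.2.1 - - [01/May/2005:11:00:00 +0200] "GET /home.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '192.0.2.9 - - [01/May/2005:10:02:00 +0200] "GET /home.php HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"',
]
parsed = [rec for rec in map(parse_line, sample_log) if rec is not None]
pages = [rec for rec in parsed if is_page_request(rec)]
sessions = sessionize(pages)
```

In this sample, the image request and the robot request are dropped during cleaning, and the 55-minute gap before the last request starts a second session for the same user key. The limitations discussed above still apply: the (IP, browser) key cannot separate several people behind one address or follow a user across dynamic IPs.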
3.3.4 Pattern Discovery and Analysis

Once the raw web data has been selected and cleaned, users and sessions have been identified and further data has possibly been integrated, the actual data mining can begin. The aim is to find patterns in the integrated data which, in the subsequent analysis, yield information about the customer's characteristics, needs, interests and behaviour. The usual methods used in Data Mining include Path Analysis (Sequence Analysis, Click-Stream Analysis), Classification, Clustering, Predicting and Association Analysis. The following paragraphs introduce these methods based on [Ceyp 2002, p. 115-117]; for a more detailed explanation, refer to one of the available books on data mining, such as [Han 2000].

Path Analysis (Sequence Analysis, Click-Stream Analysis): This method tries to find frequently recurring navigation paths of users. This can be accomplished by analysing the discovered user sessions. The main interest is to optimize the website so that it is convenient for a user to browse. A further goal is to find out the user's entrance URL to a company's website. Based on this information, the locating of the website can be optimized, for example by adding advertisement banners on the corresponding entrance sites.

Classification: This analysis assigns customers to certain predefined classes according to some criteria. If, for example, a user visits a certain page and stays there for some time, this can be interpreted as an interest in that page's content. According to this content, he can then be classified e.g. as buyer or non-buyer.

Clustering: With clustering, users are assigned to a group according to their similarities. Such a group is internally consistent, but can be clearly distinguished from the other groups. The example given in [Ceyp 2002, p. 116] is a travel portal that groups customers according to their interests into e.g. last-minute, education or beach vacationers.

Predicting: As the name implies, this method tries to predict certain values based on recorded facts. This is usually only a probabilistic approach, applying for example regression analysis.
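As a minimal sketch of the regression analysis mentioned above, the following Python code fits a straight line by ordinary least squares and uses it to predict a value. The choice of variables (pages viewed per session as predictor, order value as target) and all numbers are hypothetical and serve only to illustrate the technique.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: pages viewed in a session and the order value.
pages_viewed = [2, 4, 6, 8]
order_value = [20.0, 30.0, 40.0, 50.0]

slope, intercept = fit_line(pages_viewed, order_value)

# Predicted order value for a session with 5 page views.
predicted = slope * 5 + intercept
print(slope, intercept, predicted)
```

Real usage data is noisy, so such a prediction is an estimate with uncertainty rather than an exact value, which is why the text calls the approach probabilistic.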
Predicting can, for example, be used to estimate the average order value of a customer in an online shop.

Association Analysis: Association Analysis aims to discover rules of a certain significance that hold under given conditions. Because the measured values do not need to be absolutely precise, this approach is often used, for example, to offer customers cross-selling opportunities. Amazon.com, for example, suggests further books based on the books a customer has already selected.

4 Conclusion

E-CRM requires collecting information about the characteristics, needs, interests and behaviour of previously anonymous customers. Because the interaction of clients with a web application generates a huge amount of data, the problem lies in extracting the relevant information from this data. Applying Web Usage Mining to identify and analyse patterns in the collected secondary web data has the potential to deliver the desired results. Nevertheless, it is not possible to identify a customer definitively. This is a major drawback, because the clear identification of clients is a key component of Web Usage Mining for E-CRM. Without this information, the subsequent analysis is restricted, because no multiple-transaction patterns can be discovered. One way to overcome this problem is to require an explicit login with a username and password. However, this approach requires that customers feel they gain a benefit (e.g. a sales discount) in return for providing their personal information.

In summary, the objective of Web Usage Mining is to provide information about online customers. The main application fields include the documentation of online customer behaviour, the optimization of a company's internet portal, the analysis of buying behaviour, and the personalization of the content and layout of websites. However, due to the uncertainties concerning user identification mentioned above, it is doubtful whether multiple-transaction patterns can be analysed.

List of Figures

1 Buying Cycle [Schoegel 2002, p. 69]
2 Phase Model [Schoegel 2002, p. 46]
3 Web Mining Categories [Kosala 2000, p. 5]
4 Datasources in E-Commerce [Hippner 2002, p. 98]
5 Web Usage Mining Process [Hippner 2002, p. 91]
6 Example Apache Logfile
7 Cookie Content

List of Tables

1 Application Fields of Web Usage Mining (cf. [Hippner 2002, p. 101])
2 Apache Webserver Logfile (cf. [Apache 2005])

References

[Apache 2005] Apache: Apache Web Server Version 2. Log Files. URL: http://httpd.apache.org/docs-2.0/logs.html (14.04.2005).

[Ceyp 2002] Ceyp, Michael H. 2002: Potenziale des Web Mining für das Dialog Marketing. In: Schoegel, Marcus / Schmidt, Inga (eds.) 2002: ecrm mit Informationstechnologien Kundenpotenziale nutzen. 1st edition, Symposion Publishing GmbH, Duesseldorf, Germany, p. 105-126.

[Fuernkranz 2002] Fuernkranz, Johannes 2002: Web Structure Mining. Exploiting the Graph Structure of the World-Wide Web. Austrian Research Institute for Artificial Intelligence, Vienna, Austria, n.d. URL: ftp://ftp.ai.univie.ac.at/papers/oefai-tr-2002-33.pdf (14.04.2005).

[Han 2000] Han, Jiawei / Kamber, Micheline: Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers Inc, San Francisco, USA.

[Hippner 2002] Hippner, Hajo / Merzenich, Melanie / Wilde, Klaus D. 2002: Web Mining im E-CRM. In: Schoegel, Marcus / Schmidt, Inga (eds.) 2002: ecrm mit Informationstechnologien Kundenpotenziale nutzen. 1st edition, Symposion Publishing GmbH, Duesseldorf, Germany, p. 87-104.

[Kincaid 2002] Kincaid, Judith W. 2002: Customer Relationship Management. Getting it Right! Prentice Hall PTR, New Jersey, USA.

[Kosala 2000] Kosala, Raymond / Blockeel, Hendrik: Web Mining Research: A Survey. Department of Computer Science, Katholieke Universiteit Leuven, Belgium, July 2000. URL: http://wwwfaculty.cs.uiuc.edu/%7ehanj/pdf/computer02.pdf (14.04.2005).

[Schoegel 2002] Schoegel, Marcus / Schmidt, Inga 2002: E-CRM - Management von Kundenbeziehungen im Umfeld neuer Informations- und Kommunikationstechnologien. In: Schoegel, Marcus / Schmidt, Inga (eds.) 2002: ecrm mit Informationstechnologien Kundenpotenziale nutzen. 1st edition, Symposion Publishing GmbH, Duesseldorf, Germany, p. 29-86.

[Wikipedia 2005] Wikipedia 2005: Customer relationship management. 11.04.2005. URL: http://en.wikipedia.org/wiki/customer_relationship_management (14.04.2005).

[Wikipedia 2005b] Wikipedia 2005: Web Mining. 10.01.2005. URL: http://de.wikipedia.org/wiki/web_Mining (14.04.2005).

[Wikipedia 2005c] Wikipedia 2005: Data Mining. 11.04.2005. URL: http://en.wikipedia.org/wiki/data_mining (14.04.2005).