Website Personalization using Data Mining and Active Database Techniques Richard S. Saxe



Abstract

Effective website personalization is at the heart of many e-commerce applications. To ensure that customers visiting these websites receive useful product recommendations and additional personalized service, website personalization is a critical business strategy. It is proposed that database techniques, including data mining and active databases, can be effectively combined to achieve an efficient and scalable personalization framework. This paper reviews existing personalization techniques, analyzes characteristics of database techniques applicable to website personalization, and identifies ways in which combining active databases and data mining techniques can produce a measurable improvement in existing website personalization efforts.

Introduction

Effective website personalization has become an important issue because of the pervasiveness of e-commerce applications [2, 10, 12]. Personalization aids e-commerce websites by targeting customers and providing them with recommendations based on information such as viewing history, previous purchases, and location, among others. Wu et al. examined some of the most widely used websites and classified the sites by the type and degree of personalization utilized [11]. The websites were classified into the major categories of e-commerce, information service, financial service, and search engine sites. The personalization analysis identified the e-commerce and search engine pages with personal tools as having the most personalized content [11]. Financial services pages offered some degree of personalization while information service sites provided little or no personalization.

Previous methods for website personalization exist and have been documented elsewhere [2, 5, 7, 9]. These techniques include both rules-based personalization engines [7, 9] as well as web usage data mining personalization [2].
Literature reviews to date have not found a framework that incorporates both of these practices.

Data mining is a general term used to describe the extraction of relevant data from large databases. Behavior patterns can be recognized and organized using the data mining technique of web usage mining [2]. Therefore, data mining is a potentially useful technique for website personalization and for providing recommendations to customers visiting e-commerce websites.

Active database techniques are often employed to retain rules in a database and to link the rules to specific actions within the database [3]. Active database applications are distinct from traditional databases in that active databases utilize event-condition-action (ECA) rules that specify the action to be executed by the database when a certain event occurs under a given condition [3].

The problem of establishing a website personalization framework could be mitigated through the use of a combination of the above database techniques. Data mining can be used to gather relevant, timely information related to a specific customer or to user activities. Active database rules could use the information gathered by data mining to trigger specific actions, such as display of a product recommendation. Thus, it is proposed that website personalization can be accomplished through the use of database techniques based on the rules entered in an active database and trends in the customer's behavior discovered through data mining techniques.

Website Personalization

The objective of website personalization is to provide users with a personalized experience based on known, relevant information. Examples include the user's location, previous navigation patterns, and items purchased [8, 9, 10, 12].
Website personalization is especially useful in e-commerce applications because of the ability to understand and predict patterns of Internet shopper behavior and to provide relevant recommendations to a user based on a complex history of browsing and purchases. In addition, the web personalization system can collect details such as the user's location, personal preferences, and demographic information to integrate into recommendations generated by the system.

Active Database Systems

Any database system with rule processing capabilities [3] can be termed an active database system. An active database is able to trigger a specific action when a set of conditions is met. In contrast, a traditional passive database system relies on an outside program or user to explicitly execute a query or action. Active database systems, on the other hand, are able to incorporate the execution of a query within the database when a specific set of conditions is met [3]. The action is a response to a set of production rules or event-condition-action (ECA) rules.

A prerequisite for an effective active database system (ADBS) is that it meets the criteria of a traditional, passive database system [4]. One of the most distinctive characteristics of ADBSs is the event-condition-action (ECA) rule. The ECA rules are pre-defined in the active database. Each element of a rule (event, condition, and action) must be defined in the ADBS in order to execute the proper response. In general, an event is any action on the part of the user or application that would require a response. The conditions define what criteria must be met in order for an action, triggered by the event, to occur. Once an event triggers the rule and one or more conditions are met, an action can occur [3, 4].

All ADBSs must implement a rules management engine in order to facilitate the ECA rules. These engines differ in how they execute rules: some ADBSs allow concurrent execution of multiple rules, while in others the triggering of multiple ECA rules requires sequential processing. In the case that multiple rules are triggered, there may be a need for some type of conflict resolution system to prioritize actions [3, 4]. Three of the earliest ADBS prototypes that truly exhibit ECA rule processing are the Ariel (1992), POSTGRES (1991), and Starburst (1991) systems. These prototypes differ somewhat in how the rule is triggered [3].

Data Mining

Data mining is one technique to draw a subset of useful information from a large database. The large amount of data that can be stored in databases not only allows for extraction of useful information on trends and patterns, but also validation of the trends or patterns that are extracted. Thus, data that may appear to be unrelated at first glance can be mined as a means of knowledge discovery [1] to extract data that is more meaningful.

Website Personalization

Although there are many definitions to describe website personalization, the definition of Wu et al. will be used in this review. Wu et al. describe website personalization as the modification of the display of a website for each specific user. The modifications can include website content, the display of information on the page, and the path that a user is directed to navigate through a site.

There are three primary objectives of website personalization: keeping users interested in the website, building a relationship with a user, and getting the user to interact with the website [6]. Although personalization cannot draw a user to a website for the first time, personalization can be used with other tools to keep the user on the website and encourage them to visit the site again.
Building a relationship between a user and an e-commerce application is important so that the website keeps a customer's business and the customer perceives that there is value in frequenting the site. Allowing a user to participate in the website lets the e-commerce application learn about the user and also promotes customer loyalty [6].

Website personalization techniques can be generally divided into implicit and explicit techniques. Explicit techniques (cookies, profiles, personal tools) rely on the user entering some type of information into a form on the website. On the other hand, implicit techniques (opportunistic links, recommender systems) gather information on the user's activities and alter the page without the user being aware of this activity. Each of the explicit and implicit methods can be used to modify the content (information) or interface (page display characteristics) of the website [11]. The remainder of this section discusses website personalization techniques from a user's perspective as discussed in Wu et al.

Website Personalization from a User Perspective

Content personalization. Personalization of content is arguably the most important type of personalization from a user's perspective. Different filtering techniques, described in subsequent sections, can be used to personalize the content to match some characteristics of the user. For example, Amazon.com uses a recommender system based on previous purchases and other similar users' purchases to personalize the content displayed throughout the user's page navigation activities.

Link personalization. Display of links to other sites or web pages based on user activity and profile is another way that a user can interact with a personalized page. Link personalization can be either implicit or explicit. User tools such as My Links that are added by the user are an example of explicit link personalization.
Implicit link personalization would involve the use of opportunistic links, discussed below.

Customized screen design personalization. The user may also be able to personalize a page by changing the display characteristics of the page. Personalized calendars and other tools can be specified by the user and will be updated each time the user visits the site.

Software Techniques for Implementing Website Personalization

Cookies. Cookies are data files that are generated on a user's computer each time personal information is entered into a form or other interactive portion of a website. These are a very simple method of personalization. When a user visits that web page the next time, the software accesses these files to personalize aspects of the page.

Profile-based. Having a user fill out a user profile is another simple way to personalize a website. Address and postal code information can be used to personalize the display to show items of interest in a particular geographic area, whereas demographic information can be used to personalize the display to highlight items of interest to a specific gender or age group.

Personal tools. Several e-commerce websites allow a user to set up a personal profile. This information allows the user to personalize a page to their exact content specifications. For example, portal sites, such as Yahoo, allow users to enter personally chosen links to appear within their portal.

Opportunistic links. Opportunistic links are generated based on user behavior. The generation of these links does not take into account the whole of a user's past activity, but instead generates links that may be of interest to a user based on a recent action without consideration of the intent of the action.

Recommender systems. Recommender systems act much like the rules that dictate which opportunistic links to display. Recommender systems, however, use the aggregate of a user's past activity to provide very personalized recommendations.
The users can be pointed to a specific product or link based on the activities of other, similar users, or the personalized recommendation can be based on a user's profile and past browsing behavior. Recommender systems are the most sophisticated of the website personalization techniques and will be the focus of personalization discussions throughout the remainder of this paper.

Current Methods of Website Personalization through Recommender Systems

Three major methods of website personalization are commonly used in recommender systems: decision rule-based filtering, content-based filtering, and collaborative filtering [8, 10, 12].

Collaborative filtering. Collaborative filtering, the most common technique, uses the preferences or ratings of a large group of users (shoppers, etc.) to make recommendations for other users with a similar profile [10, 12]. In order for this method to provide useful recommendations, the system must have gathered preference information from a sufficiently large user group. If the group is too small, the tastes of that group will be too varied and the recommendations are less likely to be of a high quality. The items rated must be relatively simple so that users do not have the option of liking one aspect of the product but not another.

Content-based filtering. Content-based filtering is similar to collaborative filtering, but instead of analyzing the patterns of a large number of users and making recommendations to other users based on the group selections, it focuses on a specific user's past browsing and purchasing history [8].

Decision rule-based filtering. In contrast to both content-based and collaborative filtering, decision rule-based filtering is a more static system. This means that a fixed survey is presented to a user and, on the basis of responses to the survey, recommendations are provided based on predetermined rules that are programmed into the system [8].

Elements of a Successful Recommender System

To characterize a website personalization recommender system as successful, several measures of overall performance can be used.
Common performance indicators of personalization engines are efficiency, scalability, and quality of recommendation [12]. Efficiency refers to the amount of processing necessary to serve up the appropriate recommendation. The diminished end-user satisfaction that results from slow responses in real-time systems makes efficiency of response a critical parameter. In the same manner, system scalability ensures that the response efficiency does not decrease as more and more data is collected by the system. Recommendation quality is essential to provide the proper user with appropriate recommendations based on the suite of data gathered on the user.

Database Techniques for Website Personalization

ADBSs and Web Personalization

ADBSs have a set of characteristics that make their use in web personalization favorable. The purpose of the three types of web personalization engines is to serve information based on a set of pre-defined rules. Most personalization engines store and execute the pre-defined rules within the application layer of the system. An ADBS, by contrast, stores and executes the rules as ECA rules within the database itself. In terms of web personalization, this could mean that the event might be a user browsing to a database-driven e-commerce site. Once this event occurs, the ADBS may check for certain conditions, such as the user's location, previous purchase and viewing history, or date and time of the visit. Finally, an action is executed, returning to the user specific content, as defined by the ECA rules in the ADBS [9].

Data Mining for User Behavior Patterns

In web personalization, the application will first prepare data for searching by storing a user's requests in a common database and then use data mining pattern discovery algorithms to search within a large dataset for patterns [2]. The groupings of associated data and the extracted patterns could then be returned in a more manageable format to the web personalization engine in order to help the personalization engine determine which data to display.
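
As a rough illustration of how an ECA rule held in the database might consume mined data, the following sketch uses a SQLite trigger through Python's standard sqlite3 module. SQLite is not one of the ADBS prototypes discussed in this paper, and the table names (user_cluster, cluster_offer, page_request, recommendation), the trigger name, and the helper functions are all hypothetical, chosen only for illustration. The event is an insert into page_request, the condition is that the requested page is the home page, and the action records the product associated with the user's mined peer group.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one hypothetical ECA rule."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Output of a (hypothetical) data mining step: one peer group per user.
        CREATE TABLE user_cluster (user_id TEXT PRIMARY KEY, cluster TEXT);
        -- Product to promote to each peer group.
        CREATE TABLE cluster_offer (cluster TEXT PRIMARY KEY, product TEXT);
        -- Log of page requests; inserts into this table are the events.
        CREATE TABLE page_request (user_id TEXT, page TEXT);
        -- Content chosen for display; filled only by the trigger.
        CREATE TABLE recommendation (user_id TEXT, product TEXT);

        -- Event:     a row is inserted into page_request.
        -- Condition: the requested page is the home page.
        -- Action:    record the offer for the user's mined peer group
        --            (no row is written if the user has no peer group).
        CREATE TRIGGER recommend_on_home_visit
        AFTER INSERT ON page_request
        WHEN NEW.page = 'home'
        BEGIN
            INSERT INTO recommendation (user_id, product)
            SELECT NEW.user_id, o.product
            FROM user_cluster AS c
            JOIN cluster_offer AS o ON o.cluster = c.cluster
            WHERE c.user_id = NEW.user_id;
        END;
        """
    )
    return conn


def visit(conn: sqlite3.Connection, user_id: str, page: str) -> None:
    """Log a page request; the database decides whether a rule fires."""
    conn.execute(
        "INSERT INTO page_request (user_id, page) VALUES (?, ?)",
        (user_id, page),
    )


def recommendations_for(conn: sqlite3.Connection, user_id: str) -> List[str]:
    """Return the products the trigger has selected for this user."""
    rows = conn.execute(
        "SELECT product FROM recommendation WHERE user_id = ? ORDER BY rowid",
        (user_id,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_demo_db()
    db.execute("INSERT INTO user_cluster VALUES ('u1', 'hikers')")
    db.execute("INSERT INTO cluster_offer VALUES ('hikers', 'trail boots')")
    visit(db, "u1", "home")
    print(recommendations_for(db, "u1"))
```

In this sketch the application only logs the request and reads back the result; the choice of content is made by the rule stored in the database, which is the division of labor the ADBS approach relies on.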

Common Data Mining Techniques related to Web Personalization

Clustering. The use of clustering allows a large set of data associated with one user to be simplified and plotted as one point on a graph. Clustering analysis will identify groups, or clusters, of data points located close together on this graph. Thus, distribution patterns, not previously obvious in the initial large database, can be discovered [1]. Clustering is most applicable when the collaborative filtering approach for personalization is used. Clustering is a method of forming data sets derived from a history of multiple users' preferences in order to formulate peer groups that exhibit common preferences [12]. Based on this, personalized recommendations to members of the same peer group, or cluster, can be made. The graphs are constructed by using certain specifications of the product as the graph's axes. Data mining will allow each customer to be represented as one point on the plot, based on his or her preference history. Customers with similar preference histories will be clustered together on the plot. However, it is important to note that the most relevant product specifications must be used in constructing the plot, because if too many product variables are involved, no clustering will be apparent [12].

Association rules. Association rules are used to discover patterns in sets of items, as opposed to the behavior patterns of similar users. Association rules can discover buying patterns and link the presence of one item in a transaction to another item in the transaction [1, 12]. For example, if there is one brand of peanut butter in the consumer basket, there may always be a specific brand of jelly in the same transaction. The confidence of the rule can be calculated as the fraction of transactions containing the first product that also contain the second [12]; the fraction of all transactions that contain both products is the rule's support.
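
As a small worked example of rule confidence, the sketch below computes support and confidence for a rule over a handful of made-up baskets, using the standard definitions: support is the fraction of all transactions that contain every item in the rule, and confidence is the fraction of transactions containing the antecedent that also contain the consequent. The function names, the basket data, and the 0.6 threshold in should_recommend are illustrative assumptions and are not taken from [12].

```python
from typing import Iterable, List, Set

Basket = Set[str]


def support(baskets: List[Basket], items: Iterable[str]) -> float:
    """Fraction of all baskets that contain every item in `items`."""
    wanted = set(items)
    if not baskets:
        return 0.0
    return sum(1 for b in baskets if wanted <= b) / len(baskets)


def confidence(
    baskets: List[Basket], antecedent: Iterable[str], consequent: Iterable[str]
) -> float:
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    left = set(antecedent)
    both = left | set(consequent)
    with_left = sum(1 for b in baskets if left <= b)
    if with_left == 0:
        return 0.0  # the rule never applies, so report zero confidence
    return sum(1 for b in baskets if both <= b) / with_left


def should_recommend(
    baskets: List[Basket],
    in_cart: Iterable[str],
    candidate: str,
    min_confidence: float = 0.6,  # hypothetical cutoff, chosen for the example
) -> bool:
    """Recommend `candidate` when the rule in_cart -> candidate is confident enough."""
    return confidence(baskets, in_cart, [candidate]) >= min_confidence


# Made-up transaction history, for illustration only.
BASKETS: List[Basket] = [
    {"peanut butter", "jelly", "bread"},
    {"peanut butter", "jelly"},
    {"peanut butter", "bread"},
    {"milk", "bread"},
]

if __name__ == "__main__":
    print(support(BASKETS, ["peanut butter", "jelly"]))
    print(confidence(BASKETS, ["peanut butter"], ["jelly"]))
    print(should_recommend(BASKETS, ["peanut butter"], "jelly"))
```

With these four baskets, two contain both peanut butter and jelly and three contain peanut butter, so the rule has support 2/4 and confidence 2/3.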
The confidence value can be used in web personalization systems when determining whether or not specific products should be recommended based on other items in the basket.

Classification. Classification is a third data mining technique useful in web personalization systems. Whereas clustering can analyze patterns of users and form them into peer groups and association rules can discover patterns between items purchased, classification is able to incorporate all of this information and more in making a recommendation. Classification relies on a set of training data that is used in a way similar to a regression analysis. Each category (previous purchase of an item, browsing history, demographics of consumer, etc.) of information is given a label (yes/no, quantitative, etc.) and this is used to train the model to give certain recommendations based on a large amount of input data [1, 12]. Once each label is optimized through the training steps, real user data is input and the appropriate recommendations are given.

Proposed Personalization Framework

This paper proposes to combine the elements of active database systems and data mining techniques to provide an efficient, scalable, high quality web personalization system. Because data mining techniques are able to gather and analyze stored historical data for users, this data can be stored within an active database and used to execute rules based on the information gathered. The elements of event-condition-action rules, clustering, association, and classification techniques could be effectively combined to enhance existing web personalization systems.

Examples of Database Techniques in Web Personalization Systems

Web-usage (data mining) personalization engines. One example of a data mining personalization system is described in Cho et al. [2]. This five-step methodology is based on data mining techniques for a web-based recommendation system. The steps of problem definition, target customer selection, customer preference analysis, product association analysis, and recommendation generation are each based on data mining. After the problem was defined as targeting customers and making recommendations, predictive data mining was used to select the target customers. Customer preference analysis was carried out with a clustering technique.
Association rules were used in the product association analysis. Finally, the customer preferences and product associations were used to generate a recommendation.

Rules-based (active database) personalization engines. An example of a rules-based web page personalization system is described in Kiyomitsu et al. [9]. This system is based on the active database event-condition-action sequence. The ECA rule comprises an event (the user's request to read the page and his or her location, date and time of access, and the clickstream used to navigate to the page), a condition (the user's browsing history), and an action (creation of the end page for display to the user).

Improving Performance Measures through Data Mining and Active Databases

The incorporation of data mining results and ECA rules in an active database is proposed to enhance the performance of a website personalization system as measured by three metrics: efficiency, scalability, and quality of recommendation.

Efficiency. An efficient web personalization system uses minimal processing in order to serve up the appropriate recommendation. Efficient processing will enhance end-user satisfaction by preventing slow responses in these real-time systems. The use of the data mining technique of clustering will accelerate the association of a user with a specific peer group with similar preferences. The use of an active database framework allows the database to determine what content to display to the end user without the need for a middle tier of business logic. By embedding the ECA logic into the active database, one can eliminate the costly step of passing the mined data from the database to a program that would then be needed to determine the proper action or content to be displayed.

Scalability. A measure related to efficiency, system scalability ensures that the response time does not increase as additional data is collected by the personalization system.
The use of data mining techniques such as clustering, association rules, and classification simplifies complex data by separating out the most significant variables, allowing the large collections of data to be searched in a more efficient manner. Active databases again eliminate the need for intermediate processing, keeping the system more compact.

Quality of recommendation. To provide the proper user with appropriate recommendations, the personalization system must be able to provide high quality recommendations based on the data gathered on each user. The quality of recommendation will be increased by allowing multiple methods of website personalization, both implicit and explicit. Data mining can be used to implicitly predict the user's preferences based on the choices of other users. Active database techniques allow administrators to incorporate explicit ECA rules into the system. Combining implicit web-based and explicit rules-based personalization techniques allows for more precise recommendations.

Conclusions

Website personalization in e-commerce applications helps to provide users with meaningful content when making purchase decisions. The type of website will dictate the degree of personalization and the type of personalization that is appropriate. Many methods for software- and user-defined personalization allow for more personalized interaction through explicit and implicit techniques. This paper focused on recommender systems, an implicit personalization technique. It is proposed that data mining and active database techniques can enhance the functionality of websites using recommender systems for personalization. The data mining techniques allow the database to process large amounts of data while active databases specify rules that should be executed contingent upon the results of the data mining activity.
Future work could investigate an architecture for implementation of website personalization through database techniques such as data mining and active database systems.

References

[1] Chen, M., Han, J., and Yu, P. (1996) Data Mining: An Overview from a Database Perspective. IEEE Transactions on Knowledge and Data Engineering. 8, 866-883.
[2] Cho, Y., Kim, J., and Kim, S. (2002) A personalized recommender system based on web usage mining and decision tree induction. Expert Systems with Applications. 23, 329-342.
[3] Dayal, U., Hanson, E., and Widom, J. (1994) Active Database Systems. In: Modern Database Systems: The Object Model, Interoperability, and Beyond. Kim, W., ed. Addison-Wesley, Reading, MA.
[4] Dittrich, K., Gatziu, S., and Geppert, A., eds. (1996) The Active Database Management System Manifesto: Rulebase of ADBMS Features, A Joint Report by the ACT-NET Consortium. SIGMOD Record. 25, 40-49.
[5] Fong, J., Hughes, J., and Zhu, J. (2000) Online Web Mining Transactions Association Rules using Frame Metadata Model. IEEE. 121-129.
[6] Fraser, S.R.G. (2002) Real-World ASP.NET: Building a Content Management System. New York: Springer.
[7] Garrigós, I., Gómez, J., and Cachero, C. (2003) Modelling Dynamic Personalization in Web Applications. ICWE 2003. 472-475.
[8] Hu, S. (2002) Helping Online Customers Decide through Web Personalization. IEEE. 17, 34-43.
[9] Kiyomitsu, H., Takeuchi, A., and Tanaka, K. (2001) Web Reconfiguration by Spatio-Temporal Page Personalization Rules Based on Access Histories. In: Proceedings of the Symposium on Applications and the Internet (SAINT 2001). IEEE Press. 75-82.
[10] VanderMeer, D., Dutta, K., and Datta, A. (2000) Enabling Scalable Online Personalization on the Web. In: Proceedings of EC'00, October 17-20, 2000, Minneapolis, Minnesota. 185-196.
[11] Wu, D., Im, I., Tremaine, M., Instone, K., and Turoff, M. (2002) A Framework for Classifying Personalization Scheme Used on e-commerce Websites. IEEE Proceedings of the 36th Hawaii International Conference on System Sciences (HICSS'03).
[12] Yu, P. (1999) Data Mining and Personalization Technologies. In: Proceedings of the International Conference on Database Systems for Advanced Applications (DASFAA99). 1-22.