An Intelligent E-commerce Recommender System Based on Web Mining




International Journal of Business and Management, July 2009

Ziming Zeng
School of Information Management, Wuhan University
Wuhan 430072, China
E-mail: zmzeng1977@yahoo.com

The research is supported by the MOE Project of Humanities and Social Science in Chinese University (O8JC870011).

Abstract

The prosperity of e-commerce has changed the whole outlook of traditional trading behavior. More and more people are willing to shop on the Internet. However, the massive product information provided by Internet merchants causes the problem of information overload, which reduces customers' satisfaction and interest. To overcome this problem, a recommender system based on web mining is proposed in this paper. The system utilizes web mining techniques to trace the customer's shopping behavior and learn his/her up-to-date preferences adaptively. Experiments have been conducted to evaluate its recommendation quality, and the results show that the system can give sensible recommendations and is able to help customers save enormous time in Internet shopping.

Keywords: Recommender system, Web mining, E-commerce

1. Introduction

Nowadays, the advance of Internet and Web technologies has continuously boosted the prosperity of e-commerce. Through the Internet, different merchants and customers can now easily interact with each other and complete their transactions within a specified time. However, the Internet infrastructure is not the only decisive factor in guaranteeing a successful business in the electronic market. With the continuous development of electronic commerce, it is not easy for customers to select merchants and find the most suitable products when they are confronted with the massive product information on the Internet. In the whole shopping process, customers still spend much time visiting a flood of retail shops on Web sites and gathering valuable information by themselves. This process is time-consuming, and sometimes the contents of the Web documents that customers browse have nothing to do with what they actually need. This inevitably affects customers' confidence and interest in shopping on the Internet.
In order to provide decision support for customers, one way to overcome the above problem is to develop intelligent recommendation systems that provide personalized information services. A recommendation system is a valid mechanism for solving the problem of information overload in Internet shopping. On shopping websites, the system can help customers find the most suitable products that they would like to buy by providing a list of recommended products. For products that customers buy frequently, such as groceries, books and clothes, the system can be developed to reason about customers' personal preferences by analyzing their personal information and shopping records, and thus produce sensible recommendations for them. Therefore, it is important to develop a highly efficient learning algorithm to capture what customers need and help them decide what to buy.

To date, collaborative filtering has been known to be the most successful technique for analyzing customers' shopping behavior. Collaborative filtering aims to identify customers whose interests are similar to those of the current customer, and recommends products that similar customers have liked. However, despite its success, the widespread use of collaborative filtering has exposed some problems, among which are the so-called sparsity and cold-start problems. In order to overcome the limitations of collaborative filtering, a recommender system based on web mining is proposed in this paper. It utilizes a variety of data mining techniques, such as web usage mining and association rule mining. Based on these techniques, the system can trace the customer's shopping behavior and learn his/her up-to-date preferences adaptively.

The paper is organized as follows. Section 2 provides the details of the personalized recommender system, with the recommender process relevant to the system. Section 3 gives some experimental results on the recommendation quality of our system, and Section 4 gives an overall summary.

2. The Personalized Recommender System

2.1 Overview of the recommender process

The main task of the recommender system is to acquire customers' up-to-date preferences using web mining techniques, in order to provide decision support for their Internet shopping. Figure 1 gives an overview of the personalized recommender process of the system. Considering the efficiency of system running and maintenance, we select only some member customers as the target customers for recommender services. The recommender process consists of three phases, as shown in Figure 1. After the necessary data cleansing and transformation into a form usable by the system, the target customer's preferences are mined in phase 1. In this phase, tracing the customer's previous shopping behavior effectively is very important, because it is the basis of the preference analysis. In phase 2, different association rule sets are mined from the customer purchase database, integrated, and used for discovering associations between products. In phase 3, we use the matching algorithm to match the customer preferences and product associations discovered in the previous two phases, so that the recommendation list, comprising the products with the highest scores, is returned to a given target customer.

2.2 Customer preference mining

This process applies the results of analyzing the preference inclination of each customer to make recommendations. To achieve this purpose, the customer preference model is constructed based on the following three general shopping steps in online e-commerce sites.

1) click-through: the click on the hyperlink and the view of the web page of the product.
2) basket placement: the placement of the product in the shopping basket.
3) purchase: the purchase of the product, i.e. the completion of a transaction.

A simple but straightforward idea for mining the customer's preference is to count the number of occurrences of URLs mapped to a product in the customer's clickstream. According to the three sequential shopping steps, we can classify all products into four product groups: purchased products, products placed in the basket, products clicked through only, and the other products.
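The counting idea above can be illustrated with a minimal Python sketch. It assumes the raw clickstream has already been reduced to (customer, product class, step) tuples; the step labels, the function names and the tuple layout are hypothetical choices made for illustration and are not taken from the system described in the paper.

```python
from collections import defaultdict

# Hypothetical labels for the three shopping steps.
STEPS = ("click", "basket", "purchase")


def count_events(events):
    """Count occurrences per (customer, product class) and shopping step.

    `events` is an iterable of (customer_id, product_class, step) tuples,
    an assumed pre-processed form of the raw clickstream log.
    """
    counts = defaultdict(lambda: dict.fromkeys(STEPS, 0))
    for customer, product_class, step in events:
        if step not in STEPS:
            raise ValueError(f"unknown shopping step: {step!r}")
        counts[(customer, product_class)][step] += 1
    return dict(counts)


def product_group(step_counts):
    """Return the product group implied by the furthest step reached."""
    if step_counts["purchase"] > 0:
        return "purchased"
    if step_counts["basket"] > 0:
        return "basket only"
    if step_counts["click"] > 0:
        return "clicked only"
    return "other"


if __name__ == "__main__":
    log = [
        ("u1", "books", "click"),
        ("u1", "books", "click"),
        ("u1", "books", "basket"),
        ("u1", "books", "purchase"),
        ("u1", "grocery", "click"),
    ]
    for key, step_counts in count_events(log).items():
        print(key, step_counts, product_group(step_counts))
```

Keeping the three counts separate, rather than summing them at this stage, leaves room for weighting each shopping step differently later on.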
It is evident that a preference order between products can be obtained such that {products never clicked} ≺ {products only clicked through} ≺ {products only placed in the basket} ≺ {purchased products}. Suppose that $f^{c}_{ij}$ is the total number of occurrences of click-throughs of customer $i$ across every product class $j$. Likewise, $f^{b}_{ij}$ and $f^{p}_{ij}$ are defined as the total number of occurrences of basket placements and purchases of customer $i$ for product class $j$, respectively. $f^{c}_{ij}$, $f^{b}_{ij}$ and $f^{p}_{ij}$ are calculated from the raw clickstream data as sums over the given time period, and so reflect an individual customer's behavior in the corresponding shopping process over multiple shopping visits. From the discussion above, the customer preferences can be acquired from the clickstream data and expressed as the preference matrix $C = (c_{ij})$, which is denoted as follows:

$$C = (c_{ij}) = \begin{pmatrix} c_{11} & \cdots & c_{1N} \\ \vdots & \ddots & \vdots \\ c_{M1} & \cdots & c_{MN} \end{pmatrix} \tag{1}$$

In formula (1), $i = 1, \ldots, M$ (total number of target customers) and $j = 1, \ldots, N$ (total number of product classes). In order to acquire each customer's preference for each product class, the matrix element $c_{ij}$ is computed by formula (2), which takes the three shopping steps into account.

$$c_{ij} = \alpha \, \frac{f^{c}_{ij}}{\sum_{j'=1}^{N} f^{c}_{ij'}} + \beta \, \frac{f^{b}_{ij}}{\sum_{j'=1}^{N} f^{b}_{ij'}} + \gamma \, \frac{f^{p}_{ij}}{\sum_{j'=1}^{N} f^{p}_{ij'}} \tag{2}$$

In formula (2), $\alpha$, $\beta$ and $\gamma$ represent the weight adjusting coefficients corresponding to the three shopping steps. It is evident that the weights for the shopping steps are not the same. It is reasonable to assign a higher weight to purchased products than to products only placed in the basket. Similarly, a higher weight should be given to products placed in the basket than to products only clicked through. Therefore, we set $\alpha = 0.25$, $\beta = 0.5$ and $\gamma = 1$. In fact, formula (2) reflects the preference order among products, and hence it is the weighted sum of occurrence frequencies in the different shopping steps.

2.3 Product association mining

In this phase, we discover valuable relationships among different products by mining association rules from the customer purchase transactions. Similar to the preference mining process, association rule mining is performed at the level of product classes. Corresponding to the three general shopping steps, the association rules can be generated from three different transaction sets: the purchase transaction set, the basket placement transaction set and the click-through transaction set. For each transaction set acquired from Web logs, there are three steps to generate association rules:

1) Set the minimum support and minimum confidence;
2) Replace each product in the transaction set with its corresponding product class;
3) Generate association rules for each transaction set using Apriori.

After the association rules are generated, the product association model can also be expressed by a matrix $P = (p_{ij})$, in which each element represents the association degree among the product classes in the different shopping steps. The matrix $P = (p_{ij})$, with $i = 1, \ldots, N$ and $j = 1, \ldots, N$ (total number of product classes), can be defined as in formula (3).

$$p_{ij} = \begin{cases} 1.0 & \text{if } i = j \text{ (within same classes)} \\ 1.0 & \text{if } i \Rightarrow j \text{ (within purchase step)} \\ 0.5 & \text{if } i \Rightarrow j \text{ (within basket placement step)} \\ 0.1 & \text{if } i \Rightarrow j \text{ (within click-through step)} \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

In formula (3), the first condition indicates that a purchase of a product in a product class implies a preference for other products within the same product class. The second condition indicates that the degree of association in the purchase step is more related to the purchasing pattern of customers than that in the basket placement step, so the association degree for purchase can be set to 1.0, which is higher than that for basket placement. In the same manner, the association degree for basket placement can be set to 0.5, while the association degree for click-through is set to only 0.1.

2.4 Matching algorithm for recommendation

In the preceding sections, we have built the models of customer preferences and product association, defined by the preference matrix and the product association matrix, respectively. The final step in the recommendation process is to score each product and produce the recommended product list for a specific customer.
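Before turning to the scoring step, the two matrices can be sketched in plain Python. This is an illustration under stated assumptions, not the system's implementation: formula (2) is read as a weighted sum of per-step counts normalized by each customer's total for that step, the default weights 0.25, 0.5 and 1 are treated as adjustable parameters, and the association rules are assumed to arrive as sets of (i, j) class-index pairs per shopping step (for example, from a separate Apriori run). All function and variable names are hypothetical.

```python
def _normalize(row):
    """Divide each count by the row total; an all-zero row stays all zero."""
    total = sum(row)
    return [x / total if total else 0.0 for x in row]


def preference_matrix(clicks, baskets, purchases,
                      alpha=0.25, beta=0.5, gamma=1.0):
    """Build C: one row per customer, one column per product class.

    Each argument is a list of rows of raw counts with the same shape.
    """
    matrix = []
    for c_row, b_row, p_row in zip(clicks, baskets, purchases):
        c, b, p = _normalize(c_row), _normalize(b_row), _normalize(p_row)
        matrix.append([alpha * ci + beta * bi + gamma * pi
                       for ci, bi, pi in zip(c, b, p)])
    return matrix


# Association degree per shopping step, strongest step checked first.
STEP_DEGREE = (("purchase", 1.0), ("basket", 0.5), ("click", 0.1))


def association_matrix(n_classes, rules):
    """Build P from `rules`, a dict {step: set of (i, j) pairs meaning i => j}."""
    matrix = [[0.0] * n_classes for _ in range(n_classes)]
    for i in range(n_classes):
        for j in range(n_classes):
            if i == j:
                matrix[i][j] = 1.0  # same product class
                continue
            for step, degree in STEP_DEGREE:
                if (i, j) in rules.get(step, set()):
                    matrix[i][j] = degree
                    break  # keep the degree of the strongest matching step
    return matrix


if __name__ == "__main__":
    C = preference_matrix([[4, 4, 0]], [[1, 1, 0]], [[1, 0, 0]])
    P = association_matrix(3, {"purchase": {(0, 1)},
                               "basket": {(0, 1), (1, 2)},
                               "click": {(2, 0)}})
    print(C)
    print(P)
```

When a rule between two classes holds in more than one step, the sketch keeps the highest degree, which matches the top-to-bottom ordering of the conditions in formula (3).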
This score should reflect the degree of similarity between the customer preferences and the product associations. There are several methods for measuring similarity, including Pearson correlation, Euclidean distance, and the cosine coefficient. In the system, we chose the cosine coefficient to measure the similarity. Hence, the matching score $s_{mn}$ between customer $m$ and product class $n$ can be computed as follows:

$$s_{mn} = \frac{C^{(m)} \cdot P^{(n)}}{\lVert C^{(m)} \rVert \, \lVert P^{(n)} \rVert} = \frac{\sum_{k=1}^{N} c_{mk} \, p_{nk}}{\sqrt{\sum_{k=1}^{N} c_{mk}^{2}} \, \sqrt{\sum_{k=1}^{N} p_{nk}^{2}}} \tag{4}$$

In formula (4), $C^{(m)}$ is a row vector of the $M \times N$ customer preference matrix $C$, and $P^{(n)}$ is a row vector of the $N \times N$ product association matrix $P$. Here, $M$ refers to the total number of target customers and $N$ denotes the total number of product classes. The matching score $s_{mn}$ ranges from 0 to 1, where more similarity between $C^{(m)}$ and $P^{(n)}$ results in a bigger value. All products in the same product class have identical matching scores for a given target customer. However, because matching scores are computed at the level of product classes and not at the product level, single products must still be chosen and recommended to the target customer. In the system, the adopted choice strategy is that, among all products in the same class, those purchased in the latest period are assumed to be the most popular and most buyable products. Therefore, we use this choice strategy to provide the recommender services for the target customers. The whole matching algorithm for recommendation can be expressed as follows:

Algorithm Recommender_generation():
Input: customer preference matrix C, product association matrix P
Output: recommended product lists
Begin
1: Set the number of recommended products as n and the number of recommended product classes as k, such that k ≤ n and n/k is an integer;
2: Calculate the matching score s_mn using formula (4);
3: Select the top-k product classes with the highest s_mn as the recommended product classes;
4: for each class do
5:     Select the top-n/k latest purchased products as the recommended products for the target customer;
6: end for
End

3. The experiment

One important issue in evaluating the recommendation quality is the extent to which recommendations with higher recommender scores are accepted preferentially over recommendations with lower scores. We address this issue by comparing the distribution of scores computed from formula (4) for accepted recommendations with the analogous distribution for offered recommendations. The results are shown in Figure 2. The scores for the accepted recommendations are based on 10 products accepted from 50 distinct recommendation lists. The distribution for the offered recommendations is taken from about 300 recommendations made to the customers who accepted at least one recommendation during the preliminary phase of system running. Figure 2 shows that the scores of the accepted recommendations are higher than the scores of a large number of offered recommendations. For example, 76% of the products placed onto the recommendation lists have scores below 0.1, but only a small percentage of the accepted recommendations fall in this lower span. The mean score for the offered recommendations is 0.072, while the mean score for the accepted recommendations is 0.165. The difference between the two means is 0.093, which falls well within the 95% confidence interval (0.089, 0.106) computed using the t-test statistical method for the difference between means. These results illustrate that the score computed using formula (4) is indeed a useful measure of a previously unbought product's appeal to the target customer.
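For readers who want to see how scores of this kind could be produced, the following is a minimal Python sketch of the cosine score of formula (4) and of the Recommender_generation procedure from Section 2.4. It is an illustration only: the function names, the `latest_purchased` mapping (class index to product identifiers ordered from most to least recently purchased) and the example data are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine coefficient of two equal-length vectors (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(pref_row, assoc, latest_purchased, n, k):
    """Return up to n recommended products for one target customer.

    pref_row: the customer's row of the preference matrix C.
    assoc: the product association matrix P as a list of rows.
    latest_purchased: assumed mapping {class index: [product ids]},
        ordered from most to least recently purchased.
    n, k: number of products and of product classes to recommend.
    """
    if k <= 0 or k > n or n % k != 0:
        raise ValueError("need 0 < k <= n with n divisible by k")
    # Step 2: matching score between the customer and every product class.
    scores = [cosine(pref_row, row) for row in assoc]
    # Step 3: the k classes with the highest scores.
    top = sorted(range(len(assoc)), key=lambda j: scores[j], reverse=True)[:k]
    # Steps 4-6: the n/k most recently purchased products of each class.
    picks = []
    for j in top:
        picks.extend(latest_purchased.get(j, [])[: n // k])
    return picks


if __name__ == "__main__":
    pref_row = [1.375, 0.375, 0.0]
    assoc = [[1.0, 1.0, 0.0], [0.0, 1.0, 0.5], [0.1, 0.0, 1.0]]
    latest = {0: ["a1", "a2", "a3"], 1: ["b1", "b2"], 2: ["c1"]}
    print(recommend(pref_row, assoc, latest, n=4, k=2))
```

Because all matrix entries are non-negative, the cosine value stays between 0 and 1. A class with fewer than n/k recently purchased products simply contributes fewer items, so the returned list may be shorter than n.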
4. Conclusions

In this paper, we have developed a product recommendation system to provide personalized information services that support a successful Internet business. The characteristics of the system can be described as follows. First, customer preferences and product associations are automatically mined from the clickstreams of customers. Second, a matching algorithm that combines customer preferences and product associations is used to score each product and produce the recommended product list for a specific customer. Future work will include comparing the methodology of our system with a standard collaborative filtering algorithm in terms of buying precision and other recommender performance measures.

Figure 1. Overview of the recommender process of the system

Figure 2. Distribution of scores for offered and accepted recommendations (x-axis: recommender score; y-axis: fraction; offered recommendations, mean = 0.072; accepted recommendations, mean = 0.165)