A Framework of Business Intelligence-driven Data Mining for e-business




Yang Hang, Simon Fong
Dept. of Computer and Information Science, University of Macau, Macau SAR
ma76562@umac.mo, ccfong@umac.mo

Abstract
This paper proposes a data mining methodology called Business Intelligence-driven Data Mining (BIdDM). It combines knowledge-driven data mining and method-driven data mining. It also fills the gap between business intelligence knowledge and the various existing data mining methods in e-Business. BIdDM contains two processes: a construction process of a four-layer framework and a data mining process. A methodology is established for setting up the four-layer framework, which is an important part of BIdDM. A case study of a B2C e-shop is provided to illustrate the use of BIdDM.

Keywords-Business intelligence; Data mining; BI-driven Data Mining

I. INTRODUCTION
E-business has been evolving rapidly over the last two decades. Intelligent B2C recommenders, smart online e-services and knowledge-driven customer relationship management are some examples of emergent e-services on the Internet. Behind such e-services there are complex networked computing systems and tremendous amounts of data stored in databases. With the growth of demand, situational services composed of two or more disparate e-services are required, and these services are combined to create a new integrated experience. As the way e-services are constructed changes, e-service integration eventually becomes a mash-up system.

Business Intelligence (BI) is a concept of applying a set of technologies to convert data into meaningful information [1]. BI methods include information retrieval, data mining, statistical analysis and data visualization. Large amounts of data originating in different formats and from different sources can be consolidated and converted into key business knowledge. Figure 1 presents a general view of how data are transformed into business intelligence. The process involves both business experts and technical experts. It converts large-scale data into meaningful outcomes so as to provide decision-making support to end users.

Figure 1. Business Intelligence Processing

On the other hand, DM is a core process that transforms data into meaningful patterns. Rules can be extracted heuristically from DM models as output results. Traditionally, there are two types of data mining approaches: verification-driven and discovery-driven [2]. The former purports some hypothetical association or pattern and then examines the data to find proof. The latter relies on sophisticated manipulation of the data to discover associations, patterns, rules or functions. DM contains three components: data capturing, data mining and information representation. So far, however, most research focuses either on algorithm implementation in the technical layer or on knowledge representation in the business layer. In many cases these two approaches have been applied separately.

Business Intelligence-driven Data Mining (BIdDM) is a new data mining methodology combining knowledge-driven data mining and method-driven data mining. It aims to fill the gap between business intelligence knowledge in e-commerce and the various current data mining methods. BIdDM pursues a flexible way to implement data mining in terms of business requirements. It contains two processes: four-layer framework construction and data mining. The four layers are the knowledge layer, e-service layer, method layer and data layer. The components in each layer are reusable and can be flexibly added or dropped.

II. BACKGROUND

A. Knowledge Discovery Process
The knowledge discovery process (KDP) constantly seeks new knowledge in an application domain. It is defined as the nontrivial process of identifying valid, novel, potentially meaningful, and ultimately understandable patterns in data [4]. It consists of many steps. Each step attempts to complete a particular discovery task and is accomplished by applying a discovery method. KDP is concerned with the following questions:
- how data are stored and accessed;
- how to use efficient algorithms to analyze massive datasets;
- how to interpret and visualize results;
- how to model and support interactions between human and machine.

In general, BI can be generated directly from DM methods and the required data. SEMMA and CRISP-DM are two common methodologies for implementing data mining. SEMMA focuses on discovering and mining potential meaning from a data set, while CRISP-DM proposes a project implementation for the data mining process. However, traditional methodologies like these require a substantial amount of human intervention.

B. Distributed E-Service Architecture
Figure 2 shows a kind of new experience created by e-service systems in frequent interaction. Such e-services are distributed, and their information is transferred over the Internet. By analogy to the seven-layer ISO OSI Reference Model [5], a service sits on the highest layer, the application layer. Each e-service has its own backend structure on the lower layers, either distributed or centralized. A local database provides a storage medium in which the system places the relevant information, and this information is the data source of KDP.

Figure 2. e-service distributed architecture

C. Distributed Data Mining
Distributed Data Mining (DDM) is the mining of distributed data sets stored in different local databases, which are hosted by local servers and connected through a computer network. The local DM findings are composed to obtain a global result [11]. DDM design follows a bottom-up approach, which involves data fragmentation, replication, communication and integration tasks [6]. However, DDM faces several challenges:
- applying discrete outputs and intelligent algorithms to obtain meaningful results without too much computation cost;
- ensuring that it works with continuous and categorical variables symmetrically;
- handling asynchronous communication in the network.

Under the distributed architecture of an e-service mash-up, DM over the e-services becomes distributed as well. There has been much research on DDM, for example on distributed classification and distributed clustering [6].
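One way to picture "composing local findings into a global result" is distributed frequent-itemset counting. The sketch below is a minimal illustration under assumptions that are not in the paper:
- each site holds a list of transactions, each a list of item names;
- each site reports raw support counts for item pairs;
- a coordinator sums those counts before applying the minimum-support threshold.

The function names and sample transactions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def local_support_counts(transactions, size=2):
    """Count every itemset of the given size in one site's local transactions."""
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(set(t)), size):
            counts[itemset] += 1
    return counts


def global_frequent_itemsets(sites, min_support, size=2):
    """Merge the sites' local counts, then keep itemsets whose global support
    (fraction of all transactions across all sites) reaches min_support."""
    total = Counter()
    n_transactions = 0
    for transactions in sites:
        total.update(local_support_counts(transactions, size))
        n_transactions += len(transactions)
    return {itemset: count / n_transactions
            for itemset, count in total.items()
            if count / n_transactions >= min_support}


# Hypothetical local databases at two sites.
site_a = [["book", "pen"], ["book", "pen", "ink"], ["ink"]]
site_b = [["book", "pen"], ["pen", "ink"]]

print(global_frequent_itemsets([site_a, site_b], min_support=0.5))
# -> {('book', 'pen'): 0.6}
```

The threshold is applied only after the local counts are merged. Pruning at each site first would discard support that is spread across several sites, so the global result would undercount.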
However, data are distributed in different formats, and noisy data impede DDM systems.

D. Method-driven and Knowledge-driven Data Mining
Method-driven DM is the common approach to data mining nowadays. It focuses on developing and applying algorithms to model and mine data. A large number of studies have been built on the basics of mining algorithms, for both centralized and distributed data mining. For example, k-means clustering algorithms have been developed to recover clusters that are hyper-spherical in shape [7]. Boosted regression trees [8] model data by fitting small regression trees to cases. As a result, many papers published in the research community discuss the performance issues and merits of particular algorithms.

Knowledge-driven DM advocates that attention in data mining should be directed by the type of knowledge deemed useful, rather than concentrating on algorithms and techniques [9]. It suggests that DM needs to make better use of knowledge to achieve a better outcome than it does now. It first determines which issues are to be analyzed and what knowledge is desired. It then associates these with pattern and trend discovery.

Method-driven DM follows a bottom-up design approach: it crunches the data first, uses mining algorithms to discover patterns, and then tries to make sense of them. Knowledge-driven DM relies on knowledge from experts in the relevant fields during modeling and mining. One common method is ripple-down rules (RDR) [10]. Over time, the classifications made by RDR and by other experts converge until there is a high level of agreement between them. The gap between method-driven and knowledge-driven data mining poses a great challenge for BI applications.

III. LITERATURE REVIEW
The gap between the fields of business and computing techniques is an obstacle to business intelligence applications. Insights into this challenge, as faced by users and developers of enterprise-wide business intelligence systems, are described in [14]. That study identifies five potential influences, concerning user empowerment, training, data interpretation, support for usage and negotiating authorship. It summarizes how these influences are interwoven. Each has its own advantages and disadvantages from either the end user's or the IS developer's point of view.
Business metadata is proposed in [15]. It improves data interpretation by explaining the relevance and context of the data. That study targets the data existing in an enterprise data warehouse (DWH), and the metadata is derived using model weaving [16]. However, some business intelligence can only be obtained from the data warehouse through data mining algorithms. A DWH only provides an organized place for data storage, and a DWH alone cannot generate BI.

Paper [3] presents an ontology-based architecture that transforms data into a comprehensible business model by using semantic middleware integration. It operates directly on the data transactions of an individual system and provides a view consumable by business users. However, its four-layer architecture only involves a query-and-mapping process between data and end users, so it cannot be seen as a real KDP. KDP concerns the entire process of abstracting knowledge from data.

IV. BI-DRIVEN DATA MINING
We propose BIdDM as a flexible DM methodology that can combine and reuse components of existing DM methodologies to guide the creation of BI. It reduces the need for human assistance in guiding the data mining process. An e-service layer is incorporated between knowledge and data mining methods. As a result, knowledge-driven and method-driven data mining can be combined into an integrated process.

A. Overall Process
BIdDM contains two processes: framework construction and data mining (Figure 3). Framework construction establishes a four-layer framework and is a top-down approach from BI to data. The data mining process is a bottom-up approach that discovers potential knowledge from existing data sources. Here, however, once the user chooses a BI as the prospective outcome, a set of appropriate data mining techniques and the corresponding process are selected automatically, even for novice data mining users.

Figure 3. Overall Process Flowchart

B. Framework Construction Process
Framework construction is the process of establishing the four-layer framework, as shown in Figure 4. As a top-down process, the top level is to understand three things: the business, which e-services relate to it, and which data sources relate to each e-service. It then assembles a set of suitable data mining methods based on this understanding. It loops until all BIs relating to an e-service have been found.

Figure 4. Infrastructure Construction Flowchart

C. Data Mining Process
When the four-layer framework is established, one or several particular DM algorithms are tied to the selected BI. In this sense, the data mining methods are reusable, as they can be tied to more than one BI. The data mining outcomes are labeled as a BI Combo, which is assembled to generate new business knowledge. The process is shown in Figure 5.

Figure 5. BI-driven Data Mining Process Flowchart

V. FOUR-LAYER FRAMEWORK
BIdDM is a methodology built upon the four-layer framework shown in Figure 6. Our framework extends the four-layer architecture in [3] and applies it to business intelligence derived from data mining methods. The architecture in [3] itself works only as a query-and-mapping data process.

Figure 6. Four-layer BI-driven Data Mining

A. Knowledge Layer
This layer is on top and interfaces with end users. It provides knowledge, that is, business decision-making support. The BI combo reflects the outcome of knowledge discovery, and business managers can use BI findings to make strategic business decisions. The end user does not need to know much about the technical details of the knowledge discovery system. Algorithms and data are selected as guided by the e-service mash-up in BIdDM. This layer harvests business intelligence from patterns derived from e-Business data.
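The automatic, top-down selection described above can be sketched as a walk through the layers: from a chosen BI to its e-services, then to their methods, then to the methods' data sources. This is a minimal illustration, not the authors' implementation. It assumes each layer's links are stored as a plain dictionary, and every component name below is a hypothetical placeholder.

```python
def select_for_bi(bi, bi_to_services, service_to_methods, method_to_data):
    """Walk the framework top-down from a chosen BI and return the e-services,
    data mining methods and data sources it depends on, in first-seen order
    with duplicates removed."""
    services = list(dict.fromkeys(bi_to_services[bi]))
    methods = list(dict.fromkeys(
        m for s in services for m in service_to_methods.get(s, [])))
    data = list(dict.fromkeys(
        d for m in methods for d in method_to_data.get(m, [])))
    return services, methods, data


# Hypothetical framework links (placeholder names, not from the paper).
bi_to_services = {"BI-1": ["ES-1", "ES-2"], "BI-2": ["ES-2", "ES-3"]}
service_to_methods = {"ES-1": ["M-1"], "ES-2": ["M-2"], "ES-3": ["M-2", "M-3"]}
method_to_data = {"M-1": ["D-1"], "M-2": ["D-1", "D-2"], "M-3": ["D-3"]}

print(select_for_bi("BI-2", bi_to_services, service_to_methods, method_to_data))
# -> (['ES-2', 'ES-3'], ['M-2', 'M-3'], ['D-1', 'D-2', 'D-3'])
```

`M-2` and `D-1` are reached from more than one parent but appear only once. This mirrors the reuse of components across BIs that the framework relies on.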

The knowledge layer adheres closely to the e-Business application that dictates the BI goals. It can be divided into several e-business domains, such as e-payment, e-logistic, e-procurement, e-shopping, e-community, or virtually anything that uses one or more e-services. One BI may correspond to more than one e-service. For example, suppose the expected BI concerns consumers' browsing and shopping behaviors. Then records of customers' particulars, online requests and shopping behaviors are required from several e-service components [12].

The e-services in this layer are of two types, classified by how BIs use them. As seen in Figure 7, some e-services, such as e-service 1 and e-service 2, are isolated from the others and belong only to BI1. Another e-service is a cross-BI e-service that belongs to both BI2 and BI3. Similar relationships also appear in the method layer and the data layer.

Figure 7. IES and CES

B. Method Layer
This layer contains data mining algorithms, which transform data into meaningful expressions. Various algorithms can be selected, and one or several of them are assembled to discover potentially meaningful outcomes. A number of algorithms have been proposed for method-driven data mining. For example, mining frequent patterns from web logs can discover the navigational behaviors of users [13]. There are also two types of data mining algorithms in this layer, classified by how e-services use them. As in the e-service layer, some algorithms are isolated and belong to only one e-service, while others are cross-service algorithms that belong to different e-services.

C. Data Layer
This layer is the bottom one and provides the data sources for knowledge discovery. The data sources have been preprocessed: the raw data have been transformed into cleaned data, with noise and inconsistencies filtered out. Some BIs may need cross-service data mining techniques. For this reason, data mining in an e-service mash-up system is usually distributed data mining.
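The isolated/cross distinction used in all three lower layers reduces to counting how many upper-layer components link to each lower-layer component. The sketch below is one minimal way to compute it, assuming links are stored as a dictionary from each upper component to the lower components it uses. The function name and the component names are hypothetical; "e-service 3" is only a placeholder in the spirit of Figure 7.

```python
from collections import defaultdict


def split_isolated_cross(links):
    """Classify lower-layer components by how many upper-layer components use
    them. `links` maps each upper component (e.g. a BI) to the lower components
    it uses (e.g. e-services). Returns (isolated, cross) as disjoint sets."""
    parents = defaultdict(set)
    for upper, lowers in links.items():
        for lower in lowers:
            parents[lower].add(upper)
    isolated = {c for c, ups in parents.items() if len(ups) == 1}
    cross = {c for c, ups in parents.items() if len(ups) > 1}
    return isolated, cross


# Hypothetical BI -> e-service links ("e-service 3" is a placeholder name).
bi_links = {
    "BI1": ["e-service 1", "e-service 2"],
    "BI2": ["e-service 3"],
    "BI3": ["e-service 3"],
}
ies, ces = split_isolated_cross(bi_links)
print(sorted(ies), sorted(ces))
# -> ['e-service 1', 'e-service 2'] ['e-service 3']
```

The same function applies unchanged to the other layers:
- passing e-service-to-method links yields IM and CM;
- passing method-to-data links yields ID and CD.

By construction, the two returned sets never intersect.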
Note that distributed data mining is more difficult to implement than centralized data mining because of synchronous and asynchronous communication problems.

VI. DEFINITIONS AND EXPRESSIONS

A. Process Expression
The four-layer framework is implemented in a top-down approach. A given piece of knowledge in e-commerce is derived from a set of BIs (a BI combo). Each BI is obtained from the data mining methods (algorithms) used in e-services. To run the data mining algorithms, useful data are required as mining sources. This process is expressed in inference logic as (1).

$$
\begin{aligned}
\mathit{KNOW}(BI) &\Rightarrow \mathit{KNOWLEDGE}\\
\mathit{eservice}(\mathit{METHOD}) &\Rightarrow BI\\
\mathit{ALGORITHM}(\mathit{Data}) &\Rightarrow \mathit{METHOD}
\end{aligned}
\tag{1}
$$

B. Component Constraint
The defined framework components have two constraints, given in Formula (2):
- every data mining algorithm in the project shall be applied in at least one e-service to generate business intelligence;
- every data source in the project shall be applied in at least one data mining algorithm to generate business intelligence in an e-service.

$$
\big(\forall\, \mathit{Method}\ \exists\, \mathit{EService}:\ BI(\mathit{Method}, \mathit{EService})\big) \wedge \big(\forall\, \mathit{Data}\ \exists\, \mathit{Method}:\ \mathit{EService}(\mathit{Data}, \mathit{Method})\big)
\tag{2}
$$

C. Abbreviations
The abbreviations used for the four-layer framework components are:
Know(BI): knowledge known from BI;
BI(ES): BI comes from an e-service;
ES(M): e-service relates to a data mining algorithm;
M(D): data mining algorithm relates to a data set;
ES: e-service;
CES: cross-BI e-service;
IES: isolated e-service for a BI;
M: data mining algorithm;
CM: cross-e-service data mining algorithm;
IM: isolated data mining algorithm for an e-service;
D: data source;
CD: cross-algorithm data source;
ID: isolated data source for a data mining algorithm.

D. Construct Knowledge Layer
The knowledge layer is defined as the logical expression in Formula (3). Know(BI) is the intelligence known from BI. $\mathbf{BI}$ is the set of all business intelligence corresponding to a BI combo. Suppose a set of BIs is known to reflect one kind of business knowledge, and no other BI reflects this knowledge. Then the components of this BI set are the business intelligence in the knowledge layer.

$$
\begin{aligned}
&\mathbf{BI} = \{BI_1, BI_2, \ldots, BI_n\}\\
&\forall BI \in \mathbf{BI}:\ \mathrm{Know}(BI) \Rightarrow Knowledge\\
&\forall BI \notin \mathbf{BI}:\ \neg\big(\mathrm{Know}(BI) \Rightarrow Knowledge\big)
\end{aligned}
\tag{3}
$$

Each BI outcome is derived from one or more e-services. An e-service that corresponds only to this BI is an IES. An e-service that also belongs to one or more other BIs is a CES. A BI is derived from all of its IES and CES.
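Once the links between layers are written down, the two constraints of Formula (2) can be checked mechanically. The sketch below rests on assumptions that are not in the paper:
- links are plain dictionaries, and the function and component names are hypothetical;
- the second constraint is read as requiring each data source to feed a method that is itself used by some e-service, which is one reading of "to generate business intelligence in an e-service".

```python
def check_component_constraints(methods, data_sources,
                                service_to_methods, method_to_data):
    """Check the two constraints of Formula (2).
    Returns (unused_methods, unused_data); both are empty when they hold."""
    # Constraint 1: every method is applied in at least one e-service.
    used_methods = {m for ms in service_to_methods.values() for m in ms}
    # Constraint 2: every data source feeds at least one method that is itself
    # applied in an e-service, so the data contributes to BI in an e-service.
    used_data = {d for m in used_methods for d in method_to_data.get(m, [])}
    unused_methods = sorted(set(methods) - used_methods)
    unused_data = sorted(set(data_sources) - used_data)
    return unused_methods, unused_data


# Hypothetical project components (placeholder names).
methods = ["M1", "M2", "M3"]
data_sources = ["D1", "D2", "D3"]
service_to_methods = {"ES1": ["M1"], "ES2": ["M1", "M2"]}
method_to_data = {"M1": ["D1"], "M2": ["D2"], "M3": ["D3"]}

print(check_component_constraints(methods, data_sources,
                                  service_to_methods, method_to_data))
# -> (['M3'], ['D3'])
```

`D3` is reported even though a method does use it, because that method (`M3`) is not attached to any e-service. A weaker reading of the constraint would accept `D3`.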

E. Construct e-service Layer
The e-service layer is defined as the logical expression in Formula (4). ES is the set of e-services. CES is the set of cross-BI e-services belonging to a BI, and IES is the set of isolated e-services belonging to it. IES and CES do not intersect, and together they make up all e-service components of this layer. Inherited from Formula (3), the set $\mathbf{BI}$ contains all business intelligence relating to one type of business knowledge. If a BI belongs to $\mathbf{BI}$, then all of its CES and IES relate to that knowledge representation.

$$
\begin{aligned}
&ES = \{ES_1, ES_2, ES_3, \ldots, ES_m\},\qquad CES \cup IES = ES,\qquad CES \cap IES = \varnothing\\
&\big(BI_i(ES_k) \wedge \exists\, j \neq i:\ BI_j(ES_k)\big) \Rightarrow ES_k \in CES\\
&\big(BI_i(ES_k) \wedge \nexists\, j \neq i:\ BI_j(ES_k)\big) \Rightarrow ES_k \in IES\\
&BI(CES \cup IES) = BI\\
&\forall BI \in \mathbf{BI}:\ \mathrm{Know}(BI) \Rightarrow Knowledge \qquad \text{// imported from Formula (3)}\\
&(CES \cup IES) \Rightarrow Knowledge
\end{aligned}
\tag{4}
$$

F. Construct Method Layer
The method layer is defined as the logical expression in Formula (5). M is the set of different data mining methods, such as clustering and classification. CM is the set of cross-e-service methods and IM the set of isolated methods. IM and CM do not intersect, and together they make up all algorithm components of this layer. IES and CES inherit their constraints from Formula (4), so both keep the characteristics of ES. For an e-service, ES(M) contains all the data mining algorithms relating to it. Each ES belongs to the type of knowledge that it reflects, so all the relevant algorithms of that ES lead to the knowledge representation.

$$
\begin{aligned}
&M = \{M_1, M_2, M_3, \ldots, M_p\},\qquad CM \cup IM = M,\qquad CM \cap IM = \varnothing\\
&\big(ES_i(M_k) \wedge \exists\, j \neq i:\ ES_j(M_k)\big) \Rightarrow M_k \in CM\\
&\big(ES_i(M_k) \wedge \nexists\, j \neq i:\ ES_j(M_k)\big) \Rightarrow M_k \in IM\\
&ES(CM \cup IM) = ES\\
&CES \cup IES = ES,\qquad CES \cap IES = \varnothing \qquad \text{// imported from Formula (4)}\\
&(CES \cup IES) \Rightarrow Knowledge \qquad \text{// imported from Formula (4)}\\
&\big(CES(CM \cup IM)\big) \cup \big(IES(CM \cup IM)\big) \Rightarrow Knowledge
\end{aligned}
\tag{5}
$$

G. Construct Data Layer
The data layer is defined as the logical expression in Formula (6). D is the set of different clean data sets in the data sources. Isolated data (ID) and cross-method data (CD) exist in this layer. ID and CD do not intersect, and together they make up all data components of this layer.
IM and CM inherit their constraints from Formula (5), so both keep the characteristics of M. For a data mining algorithm, M(D) contains all the data sets relating to it. Each M belongs to the type of knowledge that it reflects, so all the relevant data sets of that M lead to the knowledge representation.

$$
\begin{aligned}
&D = \{D_1, D_2, D_3, \ldots, D_q\},\qquad ID \cup CD = D,\qquad ID \cap CD = \varnothing\\
&\big(M_i(D_k) \wedge \exists\, j \neq i:\ M_j(D_k)\big) \Rightarrow D_k \in CD\\
&\big(M_i(D_k) \wedge \nexists\, j \neq i:\ M_j(D_k)\big) \Rightarrow D_k \in ID\\
&M(CD \cup ID) = M\\
&CM \cup IM = M,\qquad CM \cap IM = \varnothing \qquad \text{// imported from Formula (5)}\\
&CM \cup IM = CM(CD \cup ID) \cup IM(CD \cup ID)\\
&\big(CES(CM \cup IM)\big) \cup \big(IES(CM \cup IM)\big) \Rightarrow Knowledge \qquad \text{// imported from Formula (5)}\\
&\Big(CES\big(CM(CD \cup ID) \cup IM(CD \cup ID)\big)\Big) \cup \Big(IES\big(CM(CD \cup ID) \cup IM(CD \cup ID)\big)\Big) \Rightarrow Knowledge
\end{aligned}
\tag{6}
$$

VII. CASE STUDY
In this section, we give an example to demonstrate how to establish the BIdDM framework for a B2C e-shop.

A. e-service and Data Understanding
The example business model used to illustrate BIdDM is a B2C e-shop. This e-shop contains five primary e-services covering the whole business cycle, as shown in Figure 8.

Figure 8. Sample e-services of B2C e-shop

e-portal: a website that provides e-commerce to customers. It generates web log data on the web server, and these data represent customers' navigation activities.
e-catalog: a list of products, including product prices, introductions and other relevant information. By referring to this information, customers are able to decide which product to purchase.
e-payment: an online transaction module that fulfills payment. The process usually involves three entities: customers, the e-shop and a payment gateway. In this B2C example, the e-shop's service only covers the transactions between customers and the e-shop.
e-logistic: the fulfillment process of delivering goods to the customer, usually done by third-party logistics providers. A sale is not completed until the product is delivered to the customer.
e-support: a post-sales e-service that provides relevant customer support, such as sales records, complaints, returned goods and servicing.

B. Methods Understanding
As examples in the case study, we choose popular data mining algorithms of four kinds: clustering, association rule mining, sequence mining and classification.
Partitioning Around Medoids (PAM) for clustering: a representative object (medoid) is chosen for every group. Once the medoids are chosen, each remaining non-medoid object is assigned to a group according to its similarity to the medoids. Similarity is measured as the Euclidean distance between two objects [17]. PAM is a simple and efficient clustering algorithm that puts the most similar data into the same group [18].
FP-Tree for association rule mining: a classic frequent pattern mining method used to find association rules that satisfy a predefined minimum support and confidence in the given data sources. The algorithm reduces the number of passes over the data source. It runs two processes: constructing the FP-Tree, and generating frequent patterns from the FP-Tree [19, 20].
Sequence mining for structured data mining: sequence mining is concerned with finding statistically relevant patterns between data examples whose values are delivered in a sequence. The values are usually presumed to be discrete. Time series mining is therefore closely related but usually considered a different activity. Sequence mining is a special case of structured data mining [21].
Naive Bayes for statistical classification: Bayesian classifiers assign the most likely class to a given example described by its feature vector. They have proven effective in many practical applications such as text classification and medical diagnosis [22].

C. Prospective Business Intelligence
These e-services and data mining algorithms yield several kinds of business intelligence in the e-shop example. They are important in customer relationship management (CRM), giving managers potential insight for making business strategies.
Customer navigational behavior: the web log is the data source stored on the e-shop's web server, and it records all interactions between the e-portal and customers. Using web usage mining methods, FP-Tree association rule mining and sequential mining find the most frequently visited pages of the e-portal from the web logs [13].
Customer online shopping habit: the e-payment service offers customers alternative ways to pay their bills, and customers must also choose a logistics method for delivery. The Naive Bayes algorithm groups online shoppers by their payment information and logistics preferences. Promotion packages can then be prepared for the distinct customer groups according to their online shopping habits.
Consumer's potentially purchased products: the e-catalog contains the relevant introductions of products. According to the characteristics of similar groups, this data is divided into two groups, similar and dissimilar. The method proposed in [18] then predicts the products that the current customer is likely to buy.

D. Four-layer Framework Construction
A four-layer framework of BI-driven Data Mining is constructed for the e-services of the B2C e-shop business model (Figure 9). This is a top-down approach. It starts by choosing BIs in the knowledge layer and defining the relevant e-services. It then identifies suitable data mining algorithms and useful data sources. Each line in Figure 9 represents a dependency between components in two adjacent layers. In the knowledge layer, the three BI components compose one BI combo. This combo brings knowledge of customers' shopping behavior to end users to improve CRM strategies. In the e-service layer, the isolated e-services (IES) are e-portal, e-payment, e-logistic and e-support, while e-catalog is a cross-BI e-service (CES) in this example. e-portal relates only to the BI of customers' navigational patterns, so e-portal is an IES.
e-payment and e-logistic relate only to the BI of e-shopping habits, so they are IES too. e-support relates only to the BI of potentially purchased products, so it is also an IES. e-catalog enables both the online shopping habit BI and the potentially purchased products BI, hence it is a CES.

In the method layer, the isolated data mining algorithms (IM) are FP-Tree and sequence mining, while Naive Bayes and PAM are cross-e-service methods (CM). An IM relates to only one e-service, while a CM relates to more than one e-service in the four-layer framework architecture.

In the data layer, the isolated data sources (ID) are payment options, delivery methods and sales records, while web log files and product information are cross-method data sources (CD). An ID relates to only one data mining algorithm, while a CD relates to more than one.

Figure 9. Four-Layer Framework of Sample e-shop
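The PAM clustering step listed under Methods Understanding can be sketched in a few dozen lines. This is a from-scratch illustration, not the implementation used in [18]. It assumes that points are numeric coordinate tuples and that dissimilarity is Euclidean distance, as stated in the text. It follows the BUILD-then-SWAP scheme associated with PAM [17] in its simplest form, recomputing the full cost for every candidate swap. The sample points are invented, and `math.dist` requires Python 3.8 or later.

```python
import math


def total_cost(points, medoids):
    """Sum of Euclidean distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Partitioning Around Medoids on a list of coordinate tuples.
    Returns (sorted medoid indices, medoid index assigned to each point)."""
    n = len(points)
    # BUILD: greedily add the point that lowers the total cost the most.
    medoids = []
    for _ in range(k):
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(min(candidates,
                           key=lambda i: total_cost(points, medoids + [i])))
    cost = total_cost(points, medoids)
    # SWAP: apply the best medoid/non-medoid exchange until none lowers the cost.
    while True:
        best, best_cost = None, cost
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                trial_cost = total_cost(points, trial)
                if trial_cost < best_cost:
                    best, best_cost = trial, trial_cost
        if best is None:
            break
        medoids, cost = best, best_cost
    # Assign each non-medoid (and medoid) point to its nearest medoid.
    labels = [min(medoids, key=lambda m: math.dist(p, points[m])) for p in points]
    return sorted(medoids), labels


points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]  # made-up sample data
print(pam(points, k=2))
# -> ([0, 3], [0, 0, 0, 3, 3, 3])
```

Each point is labeled with the index of its medoid. This matches the description above: once the medoids are fixed, every non-medoid joins the group of its most similar medoid.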

VIII. CONCLUSION AND FUTURE WORK
This paper proposes BIdDM, which helps e-Business obtain business intelligence through guided data mining methods by identifying the related e-services. All the elements required under BIdDM for constructing a four-layer framework are shown in Figure 10.

Figure 10. Elements under BIdDM

BIdDM represents a new data mining methodology combining knowledge-driven data mining and method-driven data mining. Once the framework is established, business intelligence discovery brings potential meaning to end users in a guided fashion. This is because data mining methods are predefined and pre-assigned to each e-service, which in turn contributes to certain types of BI. A B2C e-shop case study is given to demonstrate how such a framework can be constructed. For future work, we will test BIdDM and construct more detailed examples from many other e-Business scenarios, such as government services and B2B models.

REFERENCES
[1] H. P. Luhn. "A Business Intelligence System". IBM Journal, October 1958.
[2] Simoudis, E. Reality check for data mining. IEEE Expert: Intelligent Systems and Their Applications, Volume 11, Issue 5, pp. 26-33, Oct. 1996.
[3] Spahn, M., Kleb, J., Grimm, S., and Scheidl, S. Supporting business intelligence by providing ontology-based end-user information self-service. In Proceedings of the First International Workshop on Ontology-supported Business Intelligence (OBI '08), vol. 308. ACM, New York, NY, pp. 1-12, 2008.
[4] K. J. Cios, W. Pedrycz, and R. M. Swiniarski. Data Mining: A Knowledge Discovery Approach, Chapter 2, pp. 9-24, Springer, 2007.
[5] Zimmermann, H. OSI Reference Model - The ISO Model of Architecture for Open Systems Interconnection. IEEE Transactions on Communications, Volume 28, Issue 4, pp. 425-432, 1980.
[6] Fu, Y. Distributed Data Mining: An Overview. Newsletter of the IEEE Technical Committee on Distributed Processing, pp. 5-9, Spring 2001.
[7] Su, M.-C. and Chou, C.-H. A Modified Version of the K-Means Algorithm with a Distance Based on Cluster Symmetry. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, pp. 674-680, 2001.
[8] Breiman, L. Random Forests. Machine Learning Journal, 45, pp. 5-32, 2001.
[9] Graco, W., Semenova, T., and Dubossarsky, E. Toward knowledge-driven data mining. In Proceedings of the 2007 International Workshop on Domain Driven Data Mining (DDDM '07). ACM, New York, NY, pp. 49-54, 2007.
[10] S. Chakrabarti. Data mining for hypertext: A tutorial survey. SIGKDD Explorations: Newsletter of the Special Interest Group (SIG) on Knowledge Discovery and Data Mining, ACM, Vol. 1, No. 2, pp. 1-11, 2000.
[11] Tibor S., Iveta Z., Lenka L. "Distributed data mining and data warehouse". XXX. ASR 2005 Seminar, Instruments and Control, pp. 417-420, April 2005.
[12] Tian Chen, Zhi-geng Pan, Jian-ming Zheng. "EasyMall - An Interactive Virtual Shopping System". Fifth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD '08), vol. 4, pp. 669-673, 18-20 Oct. 2008.
[13] Renata Ivancsy, Istvan Vajk. Frequent Pattern Mining in Web Log. Acta Polytechnica Hungarica, Vol. 3, No. 1, pp. 77-90, 2006.
[14] B. Pgto, B. L. B, and E. Ferneley. Too Much of a Good Thing? A Field Study of Challenges in Business Intelligence Enabled Enterprise System Environments. In Proceedings of the 15th European Conference on Information Systems, pp. 1941-1952, 2007.
[15] V. Stefanov and B. List. Explaining Data Warehouse Data to Business Users - A Model-Based Approach to Business Metadata. In Proceedings of the 15th European Conference on Information Systems, pp. 2062-2073, 2007.
[16] Breton, E. and Bézivin, J. Weaving Definition and Execution Aspects of Process Meta-models. In 35th Hawaii International Conference on System Sciences, p. 290, 2002.
[17] L. Kaufman and P. J. Rousseeuw. Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 1990.
[18] Qingzhang Chen, Jianghong Han, Yuqing Chu, Xiaodong Ying. "Mining Consumers' Most Adaptive Products by Efficient Clustering Algorithm". 16th International Conference on Artificial Reality and Telexistence - Workshops (ICAT '06), pp. 195-199, Nov. 29 - Dec. 1, 2006.
[19] S. Kotsiantis, D. Kanellopoulos. Association Rules Mining: A Recent Overview. GESTS International Transactions on Computer Science and Engineering, Vol. 32, Issue 1, pp. 71-82, 2006.
[20] Han, J. and Pei, J. Mining frequent patterns by pattern-growth: methodology and implications. ACM SIGKDD Explorations Newsletter 2, 2, pp. 14-20, 2000.
[21] Pei, J., Han, J., Mortazavi-Asl, B., and Zhu, H. Mining Access Patterns Efficiently from Web Logs. In Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Current Issues and New Applications (April 18-20, 2000). T. Terano, H. Liu, and A. L. Chen, Eds. Lecture Notes in Computer Science, vol. 1805. Springer-Verlag, London, pp. 396-407, 2000.
[22] Domingos, P. and Pazzani, M. On the Optimality of the Simple Bayesian Classifier under Zero-One Loss. Machine Learning 29, 2-3, pp. 103-130, 1997.