Multi-agent System for Custom Relationship Management with SVMs Tool

Yanshan Xiao, Bo Liu, Dan Luo, and Longbing Cao

1 Guangzhou Asian Games Organizing Committee, Guangzhou 510623, P.R. China
2 Faculty of Information Technology, University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia
3 College of Computer Science and Engineering, South China University of Technology, Guangzhou 510640, P.R. China
xiaoyanshan@gmail.com; csbliu@gmail.com; dluo@it.uts.edu.au; lbcao@it.uts.edu.au

Abstract. Distributed data mining in CRM aims to learn available knowledge from customer relationships so as to guide strategic behavior. To address CRM with distributed data mining, this paper proposes an architecture of distributed data mining for CRM and then uses the support vector machine tool to separate the customers into several classes and manage them. Finally, practical experiments on data from one Chinese company are conducted to show the good performance of the proposed approach.

1 Introduction

Distributed Data Mining (DDM) aims at extracting useful information from distributed heterogeneous databases [1]. Many modern applications fall into the category of systems that need DDM to support distributed decision making. Such applications can be of different natures and from different scopes [2]. Customer relationship management (CRM) has gained an important role in marketing decision strategies [3]; it aims at learning available knowledge from customer relationships by machine learning or statistical methods so as to guide strategic behavior.

Among the methods for CRM, machine learning is a powerful tool [4] and has been widely used in the corresponding fields. The Bayesian network [5], a machine learning method based on empirical risk minimization (ERM), is a popular algorithm for CRM. However, the algorithm has some drawbacks: as the size of the training set increases, the training time becomes sharply longer [6]. In addition, because the network is based on ERM, the performance of the approach is poor [7].
In recent years, support vector machines (SVMs) have been introduced for solving pattern recognition problems because of their superior performance [8]. SVMs are developed from the idea of structural risk minimization (SRM), which guarantees good generalization performance. In this approach, one maps the original data into a feature space and constructs an optimal separating hyperplane with maximal margin in that space.

N.T. Nguyen et al. (Eds.): KES-AMSTA 2008, LNAI 4953, pp. 333-340, 2008. © Springer-Verlag Berlin Heidelberg 2008
To address CRM with distributed data mining, this paper proposes an architecture of distributed data mining for CRM and uses the support vector machine tool to separate the customers into several classes. Finally, practical experiments are conducted to verify the performance of the approach.

The rest of the paper is organized as follows. Section 2 gives the architecture of distributed data mining for CRM, the support vector machine tool is reviewed in Section 3, Section 4 presents the practical experiments, and the conclusions are given in Section 5.

2 Architecture of Distributed Data Mining for CRM

We use multi-agent system (MAS) architectures for CRM in DDM and distributed classification (DC) systems, as shown in Fig. 1.

[Figure components: DC agents (Agent 1, Agent 2, Agent 3, ..., Agent n); MAS agents (Agent 1, Agent 2, Agent 3, ..., Agent n); KDD Master; Data Source; Learning rule; Classifier; Predict]
Fig. 1. Architecture of Distributed Data Mining for CRM

The main functions of the components are as follows:

1. KDD Master: collects the samples from the agents, determines which kinds of examples should be stored, and filters out the noisy samples.
2. Data source: (1) participates in the distributed design of the consistent application ontology shared by the DDM and DC MAS components; (2) collaborates with machine learning tools in the training and testing procedures; (3) provides a gateway to databases by transforming queries from the language used in the ontology into SQL.
3. Learning rule: uses the machine learning tools to obtain rules from the examples and to form the classifier for future prediction.
4. Classifier: this is the core of the DC; it is used to predict the examples from the agents and to give the corresponding labels for decision making.

In this paper, we focus on how to determine the classifier using machine learning tools. After analyzing the data in the data source, we can split them into a training dataset and a testing dataset. The training dataset is used to extract the rules and form the classifier; the testing dataset is used to verify the performance of the classifier. Because support vector machines (SVMs), which are developed from the idea of structural risk minimization (SRM), have shown superior performance in solving pattern recognition problems, we adopt this technology to form the classifier.

3 Support Vector Machine Tool

Let $S = \{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\}$ be a training set, where $x_i$ are $m$-dimensional attribute vectors and $y_i \in \{+1, -1\}$. The SVM classifier is defined as follows:

$$D(x) = w^T \Phi(x) + b = 0, \tag{1}$$

where $\Phi(x)$ is a mapping function, $w$ is an $m$-dimensional vector and $b$ is a scalar. To separate data that are linearly separable in the feature space, the decision function satisfies the following condition:

$$y_i \left( w^T \Phi(x_i) + b \right) \ge 1 \quad \text{for } i = 1, \ldots, l. \tag{2}$$

Among all the separating hyperplanes, the optimal separating hyperplane, which has the maximal margin between the two classes, can be found as follows:

$$\min_{w, b} J(w, b) = \frac{1}{2} w^T w \tag{3}$$

subject to (2). If the training data are not linearly separable, slack variables $\xi_i$ are introduced into (2) to relax the hard margin constraints as follows:

$$y_i \left( w^T \Phi(x_i) + b \right) \ge 1 - \xi_i \quad \text{for } i = 1, \ldots, l, \tag{4}$$

$$\xi_i \ge 0 \quad \text{for } i = 1, \ldots, l. \tag{5}$$

This technique allows the possibility of having examples that violate (2). In order to obtain the optimal separating hyperplane, we should minimize

$$\min_{w, b, \xi} J(w, b, \xi) = \frac{1}{2} w^T w + C \sum_{i=1}^{l} \xi_i \tag{6}$$

subject to (4) and (5), where $C$ is a parameter which determines the tradeoff between the maximum margin and the minimum classification error.
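To make the soft-margin objective concrete, the following minimal sketch evaluates it for a fixed candidate hyperplane. It assumes the identity mapping Φ(x) = x and takes each slack variable at its smallest feasible value, max(0, 1 - y(w·x + b)). The function names and the toy data are hypothetical and are not taken from the paper.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def slack(w, b, x, y):
    """Smallest xi >= 0 with y * (w . x + b) >= 1 - xi (constraints (4) and (5))."""
    return max(0.0, 1.0 - y * (dot(w, x) + b))

def soft_margin_objective(w, b, samples, C):
    """0.5 * w.w + C * sum of slacks, each slack at its smallest feasible value."""
    return 0.5 * dot(w, w) + C * sum(slack(w, b, x, y) for x, y in samples)

# Toy data (made up for illustration): two points outside the margin, one inside it.
samples = [([2.0, 0.0], 1), ([-2.0, 0.0], -1), ([0.5, 0.0], 1)]
print(soft_margin_objective([1.0, 0.0], 0.0, samples, C=10.0))  # 5.5
```

Only the third point contributes a slack (0.5), so a larger C penalizes that margin violation more heavily relative to the margin term.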
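The optimization problem (6) is a convex quadratic program which can be solved using the Lagrange multiplier method. By introducing Lagrange multipliers $\alpha_i$ and $\beta_i$ ($i = 1, 2, \ldots, l$), one can construct the Lagrangian function as follows:

$$L(w, b, \xi, \alpha, \beta) = J(w, b, \xi) - \sum_{i=1}^{l} \alpha_i \left\{ y_i \left[ w^T \Phi(x_i) + b \right] - 1 + \xi_i \right\} - \sum_{i=1}^{l} \beta_i \xi_i. \tag{7}$$

According to the Kuhn-Tucker theorem, the solution of the optimization problem is given by the saddle point of the Lagrangian function and can be shown to have the expansion

$$w = \sum_{i=1}^{l} \alpha_i y_i \Phi(x_i), \tag{8}$$

where the training examples $(x_i, y_i)$ with nonzero coefficients $\alpha_i$ are called support vectors. The coefficients $\alpha_i$ can be found by solving the following problem:

$$\max_{\alpha} \; -\frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} y_i y_j \left( \Phi(x_i)^T \Phi(x_j) \right) \alpha_i \alpha_j + \sum_{i=1}^{l} \alpha_i \tag{9}$$

subject to

$$\sum_{i=1}^{l} \alpha_i y_i = 0, \tag{10}$$

$$0 \le \alpha_i \le C, \quad i = 1, 2, \ldots, l. \tag{11}$$

By substituting (8) into (1), the classifier can be obtained. Given a new input $x$, $f(x)$ can be estimated using (12). If $f(x) > 0$, the sample belongs to Class 1; otherwise, it belongs to Class 2.

$$f(x) = \operatorname{sgn}\left\{ \sum_{i=1}^{l} \alpha_i y_i \left( \Phi(x_i)^T \Phi(x) \right) + b \right\}, \tag{12}$$

where

$$\operatorname{sgn}(x) = \begin{cases} 1, & x > 0, \\ 0, & x \le 0. \end{cases} \tag{13}$$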
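As an illustration of the decision rule, the sketch below evaluates the classifier from a set of support vectors and their coefficients. It assumes the identity mapping (so the inner product is a plain dot product) and a sign function that returns 1 for a positive argument and 0 otherwise. The two support vectors, their coefficients and the bias are a hand-built toy solution and not values from the paper.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def sgn(t):
    """Assumed sign convention: 1 for a positive argument, 0 otherwise."""
    return 1 if t > 0 else 0

def decision(x, support_vectors, alphas, labels, b, kernel=dot):
    """Evaluate sgn(sum_i alpha_i * y_i * kernel(x_i, x) + b)."""
    total = sum(a * y * kernel(sv, x)
                for sv, a, y in zip(support_vectors, alphas, labels))
    return sgn(total + b)

# Toy solution: for the two training points below, alpha = 0.25 each gives
# w = (0.5, 0.5), which puts both points exactly on the margin.
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
labels = [1, -1]
alphas = [0.25, 0.25]
b = 0.0

print(decision([2.0, 0.5], support_vectors, alphas, labels, b))    # 1 (Class 1)
print(decision([-1.0, -3.0], support_vectors, alphas, labels, b))  # 0 (Class 2)
```

The `kernel` argument defaults to the dot product, so another kernel function with the same two-argument signature can be passed in without changing the rest of the code.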
In (12), the pairwise inner product in the feature space can be computed from the original data items using a kernel function, denoted by

$$K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j). \tag{14}$$

In this way, $f(x)$ can be rewritten as follows:

$$f(x) = \operatorname{sgn}\left\{ \sum_{i=1}^{l} \alpha_i y_i K(x_i, x) + b \right\}. \tag{15}$$

Multi-class classification problems are usually converted into binary ones; the One-against-All and One-against-One schemes [8, 9] are the preferred methods.

4 Practical Experiments about Chinese Company

For one company in China, we collect data samples from all its branches all over China, which can be considered the agents of the system, and the samples are then stored in the data source. The whole dataset consists of 26 data points with 4 features and 2 classes, which are listed in the appendix. The attributes of the dataset are as follows: the age and education of the employee, the production level, and the distribution area. To obtain a classifier for future prediction, we randomly select 16 samples to form the training set and use the remaining ones as the testing dataset.

The experiments are run on a PC with a .8GHz Pentium IV processor and a maximum of 512MB memory. The program is written in C++, using Microsoft's Visual C++ 6.0 compiler. The support vector machine tool is used to build the classifier, and an RBF kernel function is employed as follows:

$$K(x_i, x_j) = \exp\left( -\|x_i - x_j\|^2 / \sigma^2 \right). \tag{16}$$

For the hyperparameters, we estimate the generalization accuracy using different kernel parameters $\sigma$ and cost parameters $C$, each taken from a grid of candidate values. We have to try 100 combinations to select the better hyperparameters. The results of computation are listed in Table 1, which involves ten figures. Each figure describes the training and testing accuracies of the given problem when the parameter $C$ is fixed and the parameter $\sigma$ varies over its range. In general, the performance of the algorithm with SVMs is sensitive to the hyperparameters, and the testing accuracy represents the predictive ability.
Similar to common practice with SVMs, we report the highest testing accuracy and the corresponding hyperparameters in Table 2.
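The selection procedure described above can be sketched as an exhaustive grid search that keeps every (C, σ) pair reaching the highest testing accuracy. In this sketch the scaling of σ inside the RBF kernel, the power-of-two grids, and the `evaluate` callback are all assumptions made for illustration. The stand-in scorer exists only to exercise the search and is not a trained SVM, so its numbers are not experimental results.

```python
import math
from itertools import product

def rbf_kernel(x, z, sigma):
    """RBF kernel exp(-||x - z||^2 / sigma^2); this scaling of sigma is an assumption."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / sigma ** 2)

def grid_search(evaluate, sigmas, costs):
    """Return (best testing accuracy, sorted (C, sigma) pairs that reach it).

    `evaluate(C, sigma)` is a caller-supplied function that trains a classifier
    and returns (training_accuracy, testing_accuracy).
    """
    results = {(C, s): evaluate(C, s) for C, s in product(costs, sigmas)}
    best = max(test_acc for _, test_acc in results.values())
    winners = sorted(p for p, (_, test_acc) in results.items() if test_acc == best)
    return best, winners

# Hypothetical grids with ten values each, giving 100 combinations.
sigmas = [2.0 ** k for k in range(-3, 7)]
costs = [2.0 ** k for k in range(2, 12)]

def dummy_evaluate(C, sigma):
    """Stand-in scorer (not a trained SVM): favors one arbitrary grid point."""
    test_acc = 1.0 if (C, sigma) == (16.0, 1.0) else 0.5
    return 1.0, test_acc

best, winners = grid_search(dummy_evaluate, sigmas, costs)
print(len(sigmas) * len(costs), best, winners)  # 100 1.0 [(16.0, 1.0)]
```

Returning all tied pairs rather than a single winner matches the way several hyperparameter settings can share the same highest testing accuracy.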
Table 1. Results of experiments
Table 2. The highest testing accuracy and corresponding hyperparameters

Parameters (C, σ)    Training Accuracy (%)    Testing Accuracy (%)
(8, 16)              93.75                    80
(8, 4)               93.75                    80
(32, 8)              93.75                    80
(1024, 32)           93.75                    80
(2048, 32)           93.75                    80

From the results above, we can conclude that the training and testing accuracies of the SVM classifier are high (93.75% and 80%, respectively) and that SVMs can handle the customer relationship management problem. Because SVMs are aimed at machine learning from small samples, they can be applied successfully to many practical problems.

5 Conclusions

Distributed data mining in CRM aims to learn available knowledge from customer relationships so as to guide strategic behavior and obtain the most profit. To address CRM with distributed data mining, this paper presents an architecture of distributed data mining and then uses the SVMs tool to separate the customers into several classes and manage them. Finally, practical experiments on data from one Chinese company are conducted to show the good performance of the proposed approach.

References

[1] Prodromidis, A., Chan, P., Stolfo, S.: Meta-learning in distributed data mining systems: Issues and approaches. Advances in Distributed Data Mining. AAAI Press, Menlo Park (1999)
[2] Provost, F., Hennessy, D.: Scaling up: Distributed machine learning with cooperation. Working Notes of IMLM-96, pp. 107-112 (1996)
[3] Blattberg, R.C., Deighton, J.: Manage marketing by the customer equity test, pp. 136-144. Harvard Business Review (July-August 1996)
[4] He, Z.Y., et al.: Mining class outliers: concepts, algorithms and applications in CRM. Expert Systems with Applications 27, 681-697 (2004)
[5] Baesens, B., et al.: Bayesian network classifiers for identifying the slope of the customer lifecycle of long-life customers. European Journal of Operational Research 156, 508-523 (2004)
[6] Langley, P., Iba, W., Thompson, K.: An analysis of Bayesian classifiers. In: Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI), San Jose, pp. 223-228. AAAI Press, Menlo Park (1992)
[7] Heckerman, D.: A tutorial on learning with Bayesian networks. Technical Report MSR-TR-95-06, Microsoft Research (1995)
[8] Vapnik, V.N.: Statistical learning theory. John Wiley & Sons, Chichester (1998)
[9] Kreßel, U.H.G.: Pairwise classification and support vector machines. In: Schölkopf, B., Burges, C.J., Smola, A.J. (eds.) Advances in Kernel Methods: Support Vector Learning, pp. 255-268. The MIT Press, Cambridge (1999)

Appendix

Samples  Age    Education  Production Level  Distribution Area  Classification
1        <=30   High       Low               A                  Bad
2        <=30   High       High              A                  Good
3        <=30   High       Medium            B                  Bad
4        <=30   High       High              B                  Good
5        <=30   Low        High              A                  Good
6        <=30   Low        Low               A                  Good
7        <=30   Low        Low               B                  Good
8        <=30   Medium     High              A                  Good
9        <=30   Medium     Medium            A                  Good
10       <=30   Medium     Medium            B                  Good
11       <=30   Medium     Low               A                  Good
12       31-50  Medium     Medium            A                  Good
13       31-50  Medium     Medium            B                  Good
14       31-50  Medium     Low               A                  Bad
15       31-50  High       High              A                  Good
16       31-50  High       Medium            A                  Good
17       31-50  High       Low               A                  Good
18       31-50  High       High              B                  Bad
19       31-50  High       Low               B                  Bad
20       31-50  Low        High              A                  Good
21       31-50  Low        Low               A                  Good
22       31-50  Medium     High              B                  Bad
23       31-50  Medium     High              A                  Good
24       >=50   Medium     High              A                  Bad
25       >=50   Medium     High              B                  Bad
26       >=50   Medium     Medium            A                  Good
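To show how the appendix data could be fed to a classifier, the following sketch transcribes the 26 samples, maps the categorical attributes to numbers, and draws a random split into 16 training and 10 testing samples. The ordinal codes, the +1/-1 labels and the fixed random seed are assumptions; the paper does not state how the attributes were encoded or which samples were selected.

```python
import random

# Rows transcribed from the appendix: (age, education, production level, area, class).
ROWS = [
    ("<=30", "High", "Low", "A", "Bad"), ("<=30", "High", "High", "A", "Good"),
    ("<=30", "High", "Medium", "B", "Bad"), ("<=30", "High", "High", "B", "Good"),
    ("<=30", "Low", "High", "A", "Good"), ("<=30", "Low", "Low", "A", "Good"),
    ("<=30", "Low", "Low", "B", "Good"), ("<=30", "Medium", "High", "A", "Good"),
    ("<=30", "Medium", "Medium", "A", "Good"), ("<=30", "Medium", "Medium", "B", "Good"),
    ("<=30", "Medium", "Low", "A", "Good"), ("31-50", "Medium", "Medium", "A", "Good"),
    ("31-50", "Medium", "Medium", "B", "Good"), ("31-50", "Medium", "Low", "A", "Bad"),
    ("31-50", "High", "High", "A", "Good"), ("31-50", "High", "Medium", "A", "Good"),
    ("31-50", "High", "Low", "A", "Good"), ("31-50", "High", "High", "B", "Bad"),
    ("31-50", "High", "Low", "B", "Bad"), ("31-50", "Low", "High", "A", "Good"),
    ("31-50", "Low", "Low", "A", "Good"), ("31-50", "Medium", "High", "B", "Bad"),
    ("31-50", "Medium", "High", "A", "Good"), (">=50", "Medium", "High", "A", "Bad"),
    (">=50", "Medium", "High", "B", "Bad"), (">=50", "Medium", "Medium", "A", "Good"),
]

# Hypothetical ordinal codes; the encoding used in the paper is not stated.
AGE = {"<=30": 0, "31-50": 1, ">=50": 2}
LEVEL = {"Low": 0, "Medium": 1, "High": 2}
AREA = {"A": 0, "B": 1}
LABEL = {"Good": 1, "Bad": -1}

def encode(row):
    """Map one appendix row to a (4-feature vector, +1/-1 label) pair."""
    age, education, production, area, cls = row
    return [AGE[age], LEVEL[education], LEVEL[production], AREA[area]], LABEL[cls]

def split(samples, n_train=16, seed=0):
    """Shuffle a copy of the samples and cut it into training and testing parts."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

samples = [encode(row) for row in ROWS]
train, test = split(samples)
print(len(samples), len(train), len(test))  # 26 16 10
```

With 16 training and 10 testing samples, each training sample accounts for 6.25% of training accuracy and each testing sample for 10% of testing accuracy.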