Data Mining Problems and Solutions for Response Modeling in CRM

Cho, Sungzoon; Shin, Hyunjung; Yu, Enzhe; Ha, Kyungnam; MacLachlan, L. Douglas

Cho, Sungzoon: Professor, Industrial Engineering, Seoul National University / Shin, Hyunjung: Researcher, Friedrich Miescher Laboratory / Yu, Enzhe: Consultant, Samsung Data Systems, Inc. / Ha, Kyungnam: Ph.D. Candidate, Texas A&M University / MacLachlan, L. Douglas: Professor, University of Washington Business School

Abstract

This paper presents three data mining problems that are often encountered in building a response model: robust modeling, variable selection and data selection. Respective algorithmic solutions are given: a bagging based ensemble, a genetic algorithm based wrapper approach, and nearest neighbor-based data selection, in that order. A real-world data set from the Direct Marketing Educational Foundation, or DMEF4, is used to show their effectiveness. The proposed methods were found to solve the problems in a practical way.

Keywords: Response Modeling, CRM, Data Mining, Stability, Ensemble, Variable Selection, Data Selection, Nearest Neighbor, Genetic Algorithm, Diversity, Accuracy

1. Introduction

Response modeling is concerned with computing the likelihood of a customer responding to a marketing campaign. Since marketers lack mental models for such decision making, theoretical models are unattainable. Statistical models are therefore often developed from a historical data set which contains customers' past purchasing behavior and demographic information. The marketers can then use the model to compute the likelihood values, or scores, to reduce the number of customers whom they target in the actual campaign. As long as the number of responding customers remains reasonably similar, the effectiveness of the campaign can be raised. Conventional statistical methods include logistic regression and multinomial regression. Recently, however, machine learning based data mining models such as neural networks and support vector machines have been increasingly employed, with significantly better accuracies.

There are several data mining related issues in response modeling: instability of complex nonlinear models, many potential predictor variables to consider, and a very large number of records in the database. None of these issues is easy to tackle; left unsolved, however, they result in poor performance. In this work, we present practical remedies for these problems: a bagging ensemble, GA wrapper variable selection, and neighborhood property based pattern selection, respectively. Experimental results involve a real-world data set from the Direct Marketing Educational Foundation (DMEF). The remedies are applied and shown to have the desired effect. The next three sections cover each issue, its remedy and empirical results: instability, variable selection and data selection, in that order. A conclusion follows.

2. Model Instability

Traditionally, a linear statistical method such as logistic regression has been used to model response based on a test of a random sample of customers from the complete list. In order to overcome the limitations of logistic regression, other approaches such as ridge regression, stochastic RFM response models and hazard function models have been proposed recently. Neural networks, a class of non-linear models that mimic brain function, have been employed in marketing because no a priori knowledge or assumption about the error distribution is required[19]. It has been shown in one instance that neural network models improved the response rate up to 95% in direct marketing[2]. In another application, bank customers' response was predicted using a neural network[8], yielding superior results. A neural network was also shown to outperform multinomial logistic regression[1].
However, there is a tricky process involved in using neural networks, or any complex nonlinear classifier such as a decision tree. An overly complex neural network is said to have a large variance: the performance of the network varies greatly over different data sets drawn from an identical population distribution. Simple models such as logistic regression do not have such a problem. When models have a large discrepancy between the true target and the expectation of the model output over different data sets, the models are said to have a large bias. Both bias and variance create classification error. A complex model has a large variance and a small bias, while a simple model has a large bias and a small variance. One can typically improve one type of error only at the expense of the other, hence the bias-variance dilemma[5]. It is a difficult task to determine the optimal complexity of a neural network given a finite training data set. What is usually practiced is a model selection process, a tedious, time-consuming trial-and-error search for the optimal complexity. Numerous research papers have been written about ways to find the right complexity, but it appears impossible to find an easy-to-understand and easy-to-follow procedure.

One way to alleviate the instability of a single neural network is to combine multiple networks. One of the most popular methods is bagging, or bootstrap aggregating, proposed by Breiman in the mid-1990s[3]. The bagging process begins with bootstrapping the training data (Step 1 in Figure 1). Given a data set of N patterns (i.e., customer records), a bootstrap sample is constructed by randomly sampling N patterns with replacement. Due to replacement, some patterns are picked more than once while some other patterns are not picked at all; a bootstrap sample is expected to contain about 63% of the distinct patterns of the original data set. Next, each bootstrap data set is used to train a neural network (Step 2 in Figure 1). One then has L neural networks, whose outputs are all potentially different. The final model output is computed by averaging the L outputs for a regression problem, or by majority voting for a classification problem (Step 3 in Figure 1). Obviously, bagging training takes at least L times the duration of a single neural network training, plus the bootstrap time. One positive aspect of bagging in terms of computation time is that the training of the L neural networks can be done in parallel, if sufficient hardware and/or software systems are available.

Step 1. Generate L bootstrap samples $\{B_i\}$, $i = 1, 2, \ldots, L$, each of size N, by random sampling with replacement from the original data set of N patterns: a set of bootstrap replicates of the original data set.
Step 2. Train L different neural networks $f_i$, $i = 1, 2, \ldots, L$, each learner $f_i$ trained on the patterns $(x, y) \in B_i$.
Step 3. Aggregate the L outputs, $f_{Bag}(x) = \sum_{i=1}^{L} \alpha_i f_i(x)$, based on weights $\alpha_i$, or by majority voting.

Figure 1. Conceptual model of bagging with L MLPs, each trained with N patterns

Bagging reduces variance[3], or model variability over different data sets from a given distribution, without increasing bias, which results in a reduced overall generalization error and improved stability. Although the method is quite simple, it has been found effective in many applications. The other advantage of using bagging is related to model selection. Since bagging transforms a group of overfitted networks into a single, better-fitted model, the tedious, time-consuming model selection is no longer necessary. This saving could even offset the computational overhead introduced by bagging in training L neural networks.
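To make Steps 1 through 3 concrete, below is a minimal sketch of bagging in Python, using scikit-learn's MLPClassifier as the member network. This is our own illustration rather than the authors' implementation; the ensemble size L = 30, the hidden layer size, and the assumption of binary 0/1 class labels are placeholder choices.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_bagged_mlps(X, y, L=30, seed=0):
    """Steps 1-2: train L MLPs, each on a bootstrap sample of size N."""
    rng = np.random.default_rng(seed)
    N = len(X)
    models = []
    for _ in range(L):
        idx = rng.integers(0, N, size=N)   # sampling with replacement
        net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500)
        net.fit(X[idx], y[idx])            # each net sees a different bootstrap set
        models.append(net)
    return models

def bagged_predict(models, X):
    """Step 3: aggregate the L outputs by majority voting (binary 0/1 labels)."""
    votes = np.stack([m.predict(X) for m in models])
    return (votes.mean(axis=0) >= 0.5).astype(int)
```

Because each member network is trained independently, the loop parallelizes trivially, which is the point made above about training the L networks in parallel.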
DMEF4 is a public-domain data set provided by the Direct Marketing Educational Foundation (www.the-dma.org/dmef/). The data set is from an upscale gift business that mails general and specialized catalogs to its customers several times a year. There are two time periods: a base time period from
December 1971 through June 1992, and a later time period from September 1992 through December 1992. Every customer in the later time period received at least one catalog in early autumn of 1992. The fact that the data set is old does not matter, since we focus on comparing the relative performance of different approaches. We built a response model for the October 1992 through December 1992 time period. The data set consists of 101,532 customers, or records, and 91 predictor variables, or columns. The response rate is 9.4%. Since the experiments were repeated 30 times, the number of customers had to be reduced. Instead of randomly selecting 20% of the customers, we chose to select the 20% of customers who are most important in terms of their recent purchases. Recency was implemented using a weighted dollar amount spent, defined as

Weighted Dollar = 0.4 (dollars of this year) + 0.3 (dollars of last year) + 0.1 (dollars of 2 years ago) + 0.1 (dollars of 3 years ago) + 0.1 (dollars of 4 years ago).

The particular values of the weights were determined arbitrarily; however, a different choice of values would not alter the outcome of the experiments. The reduced data set has 20,300 important customers with an 8.9% response rate. We randomly chose 90% of these data for training. The remaining 10%, or 2,030 records, were set aside for test. The response rate of the test data set was 7.73%. The same data set is used in the following sections.

Figure 2 displays ROC charts for the worst case out of 20 repetitions. Each curve was formed by connecting adjacent points, each of which corresponds to a (false positive, false negative) pair obtained from a particular threshold value among nine, ranging from 0.1 through 0.9. A total of four models were built: the proposed bagging MLP (BMLP), a single MLP (MLP), logistic regression with a balanced data set (BLR) and logistic regression with an unbalanced data set (UBLR). Three observations can be made. First, the bagging MLP (BMLP) did best. That was exactly what we set out to demonstrate by using BMLP in the first place: building a model that is good as well as stable in fit. Second, the single MLP did worst. Third, BLR and UBLR did not seem to differ much in terms of false positives and false negatives.

Figure 2. Worst case ROC curves for four models, BLR, UBLR, MLP and BMLP (1-sensitivity plotted against 1-specificity)

3. Variable Selection

A second issue involves which variables to choose in response modeling. A customer-related data set usually contains hundreds of features, or variables, many of which are irrelevant or heavily correlated with others. Without prescreening, or feature selection, they tend to deteriorate the performance of the model as well as increase the model training time. Feature subset selection can be formulated as an optimization problem which involves searching the space of possible feature subsets to identify one that is optimal or near-optimal with respect to performance measures such as accuracy. Various ways to perform feature subset selection exist[16]. They can be classified into two categories based on their relation to the base learner: filter and wrapper. The filter approach usually chooses features independently of the base learner, either by human experts or by statistical methods such as principal component analysis (PCA). It is generally more computationally efficient than a wrapper approach, but its major drawback is that an optimal selection of features may not be independent of the inductive and representational biases of the learning algorithm that is used to construct the classifier. The wrapper approach, on the other hand, evaluates candidate feature subsets by training a base learner such as a neural network or support vector machine on a given training data set using each feature subset under consideration, and then testing it against a separate validation data set.
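As a sketch of the wrapper evaluation just described (our own illustration; scikit-learn's SVC stands in for the base learner, and the kernel settings are placeholders), a candidate subset is scored by training on only those features and measuring validation accuracy:

```python
import numpy as np
from sklearn.svm import SVC

def wrapper_fitness(mask, X_train, y_train, X_val, y_val):
    """Score one candidate feature subset, given as a 0/1 mask over columns."""
    cols = np.flatnonzero(mask)              # indices of the features present
    if cols.size == 0:
        return 0.0                           # an empty subset is worthless
    clf = SVC(kernel="rbf", gamma="scale", C=1.0)
    clf.fit(X_train[:, cols], y_train)
    return clf.score(X_val[:, cols], y_val)  # validation accuracy = fitness
```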
Since the number of feature subsets is usually very large, this scheme is feasible only if training is relatively fast[16].
Furthermore, since the wrapper approach usually involves search, choosing a proper way of searching is also an important issue. A simple exhaustive search is computationally infeasible in practice; therefore, a suboptimal yet practical search method is desired. The well-known stepwise regression is a wrapper approach based on a greedy search. It is computationally efficient, yet achieves only a suboptimal solution. Since more computing power is available now, randomized or probabilistic search such as the genetic algorithm (GA) is used more often. It is computationally more involved, yet promises a solution close to the optimum.

Most literature has treated the two issues, feature selection and ensemble creation, separately. A model building procedure usually deals with feature selection first, and then with ensemble creation[17]. However, separation of the issues is not desirable, since a feature subset selected may be optimal for a single classifier but not for an ensemble. In fact, it may be better if each member classifier of an ensemble employs a different feature subset. Since they are interrelated, these two issues need to be addressed at the same time. In this paper, we propose a framework where these two issues are addressed together, with the goal of obtaining a globally optimal solution, not a combination of two locally optimal solutions. In particular, we propose an ensemble of classifiers trained with different subsets of features chosen by a GA based wrapper feature selection approach.

In a GA based wrapper approach[16], each feature is encoded as a gene and a subset of features as a chromosome. Presence or absence of a feature corresponds to a 1 or 0 value, respectively, in the chromosome. To each chromosome corresponds a base learner, which is trained with only those features that are present in the chromosome. Initially, a population of chromosomes is randomly created. The validation accuracy of the trained base learner is taken as the fitness of the corresponding chromosome. The selection operation screens out those chromosomes with low fitness values. The next generation of chromosomes is created from the selected chromosomes through crossover and mutation. These operations are designed so that the next generation is fitter. As this process is repeated, the population of chromosomes evolves, with the expectation that fitter chromosomes emerge and survive. In most practical situations, chromosomes do improve and become fitter. In the end, only the fittest ones survive, and they qualify as good feature subsets. The base learner can be any classifier with reasonably good generalization performance and quick training. We found that the Support Vector Machine (SVM) suits this requirement very well[15]. In one of our previous studies, an SVM achieved performance comparable to that of a four-layer multilayer perceptron neural network, while its learning time was three orders of magnitude shorter than that of the neural network[18]. The GA wrapper approach with an SVM as the base learner is depicted in Figure 3.

Figure 3. GA-SVM wrapper approach for feature subset selection: an initial population of candidate feature subsets is evaluated by the SVM; selection, crossover and mutation produce a new pool of candidates, and the cycle repeats until the best solution emerges.
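A minimal GA wrapper loop, in the spirit of the description above, might look as follows. This sketch reuses wrapper_fitness() from the earlier sketch; the population size, number of generations, mutation rate and truncation-style selection are simplified placeholder choices, not the paper's actual settings.

```python
import numpy as np

def ga_wrapper(X_tr, y_tr, X_val, y_val, pop=20, gens=100, p_mut=0.02, seed=0):
    """Each chromosome is a 0/1 vector over features (1 = feature present).
    Fitness is the validation accuracy of an SVM trained on that subset."""
    rng = np.random.default_rng(seed)
    d = X_tr.shape[1]
    P = rng.integers(0, 2, size=(pop, d))        # random initial population
    for _ in range(gens):
        fit = np.array([wrapper_fitness(c, X_tr, y_tr, X_val, y_val) for c in P])
        parents = P[np.argsort(fit)[pop // 2:]]  # selection: keep the fitter half
        children = []
        while len(children) < pop - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, d)             # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(d) < p_mut         # bit-flip mutation
            children.append(np.where(flip, 1 - child, child))
        P = np.vstack([parents] + children)
    fit = np.array([wrapper_fitness(c, X_tr, y_tr, X_val, y_val) for c in P])
    return P, fit                                # final population and its fitness
```

Returning the whole final population, rather than only the single fittest chromosome, matters here: the ensemble member selection described next draws on that population.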
Previous research on ensembles is based either on differences in the training sets, such as bagging[3] and boosting[11], or on differences among the classifiers[17]. The ensemble creation method proposed here is based on differences among feature subsets, which correspond to different chromosomes. Since the GA wrapper approach involves generating a population of good feature subsets, it is quite natural to base our ensemble member selection process on it. There is one caveat, however: one needs to balance accuracy against diversity. An ensemble is effective only when the member classifiers are both accurate and diverse. If one selects only those classifiers with high accuracy, the ensemble may not have sufficient diversity. If, on the other hand, one focuses on the diversity of the classifiers, or of their corresponding feature subsets, the individual members' accuracy may not be high. Thus, how to balance these two contradictory objectives in a structured way is critical. Our approach is based on the following observation: in an early stage of genetic evolution, the chromosomes tend to be more diverse but less accurate, while in a later stage, the surviving chromosomes tend to be more accurate but less diverse. Thus, we propose to stop the genetic evolution process early, so that diversity is still maintained while accuracy is acceptable. Then, a diverse subset of chromosomes is chosen
from the finalists. The proposed Feature Selection Ensemble (FSE) procedure is described in Figure 4.

Phase 1 (Early-stopped GA)
Step 1. Evolve a population of classifiers, each of which employs a subset of features, for $(1-\varepsilon) \cdot 100\%$ of time T, where T is the time necessary for complete convergence and $0 < \varepsilon < 1$.
Step 2. Select all those premature classifiers.

Phase 2 (Selection of diverse classifiers)
Step 1. Identify k groups of SVM classifiers, each of which employs the same set of features. (Even if the same set of features is used, different classifiers can result in general, depending on the particular SVM parameters, such as γ and cost C, used during training.)
Step 2. Choose from each group the SVM classifier $f_i$ that has the smallest validation error, resulting in a total of k (≤ n) classifiers.
Step 3. For each classifier $f_i$, compute the validation output vector $\vec{f}_i = (f_i(1), f_i(2), \ldots, f_i(N))$, where $f_i(k)$ is the output of classifier $i$ for the k-th validation pattern.
Step 4. Compute the Hamming distance H(i, j) between every pair of validation output vectors $\vec{f}_i$ and $\vec{f}_j$. (There are a total of $\binom{k}{2}$ such distances.)
Step 5. Select the classifiers involved in the largest H, the second largest H, and so on, until the predetermined number of classifiers is obtained. (If H(3,7) were the largest, for instance, classifiers 3 and 7 would be obtained.)

Figure 4. Proposed Feature Selection Ensemble (FSE) procedure

The proposed FSE procedure consists of two phases, a GA phase and a classifier selection phase. First, in the GA phase, a typical GA wrapper is run to find T. Then, the chromosomes present at $(1-\varepsilon) \cdot 100\%$ of T are identified for phase 2. As mentioned earlier, if the GA process is allowed to run to convergence, very accurate but similar classifiers will result, lowering diversity; the early-stopped chromosomes, in contrast, are considered to be reasonably accurate and fairly diverse. Second, in the classifier selection phase, we select a subset of the phase 1 classifiers for an ensemble. Since they are already reasonably accurate, we focus here on diversity. The problem of choosing the most diverse subset of classifiers based on their validation predictions can be transformed into a set packing problem, which is known to be NP-complete. Thus, we employ a greedy heuristic: the pair of classifiers with the largest difference is selected, then the pair with the second largest difference, and so on. The difference between two classifiers is estimated by the Hamming distance between their respective validation output vectors. The first step in phase 2 identifies k groups of SVM classifiers, each of which employs the same set of features; note that even if the same set of features is used, different classifiers can result, depending on the particular SVM parameters, such as γ and cost C, used during training. In the second step, the SVM classifier $f_i$ with the smallest validation error in each group is chosen, resulting in a total of k (≤ n) classifiers. Then, for each classifier $f_i$, the validation output vector $\vec{f}_i = (f_i(1), f_i(2), \ldots, f_i(N))$ is computed, where $f_i(k)$ is the output of classifier $i$ for the k-th validation pattern. Based on these vectors, the Hamming distance H(i, j) between every pair of validation output vectors $\vec{f}_i$ and $\vec{f}_j$ is computed (there are a total of $\binom{k}{2}$ such distances). Finally, at step 5, those classifiers are selected which are involved in the largest H, then those involved in the second largest H, and so on, until the predetermined number of classifiers is obtained. If H(3,7) were the largest, for instance, classifiers 3 and 7 would be obtained. Alternatively, the top χ% of the H's could be considered and all the classifiers involved would be taken; in that case, the exact number of classifiers cannot be determined a priori.
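Phase 2's diversity selection (Steps 3 through 5) reduces to sorting pairwise Hamming distances and greedily collecting classifiers. A minimal sketch follows (our own illustration; m is the desired ensemble size and val_outputs holds the 0/1 validation output vectors of the k candidate classifiers):

```python
import numpy as np
from itertools import combinations

def select_diverse(val_outputs, m):
    """Greedily pick the classifiers involved in the largest pairwise
    Hamming distances until m ensemble members are chosen."""
    V = np.asarray(val_outputs)                  # shape (k, N), 0/1 entries
    dists = [(np.sum(V[i] != V[j]), i, j)        # H(i, j) for every pair
             for i, j in combinations(range(len(V)), 2)]
    dists.sort(reverse=True)                     # largest distance first
    chosen = []
    for _, i, j in dists:
        for c in (i, j):
            if c not in chosen and len(chosen) < m:
                chosen.append(c)
        if len(chosen) == m:
            break
    return chosen                                # indices of selected classifiers
```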
The proposed FSE procedure employs three kinds of parameters which must be set and tuned by the user. The first is the set of parameters related to the GA, such as the number of generations and the crossover and mutation rates. In particular, the GA is usually allowed to run until the change in population fitness becomes very small; here, we set the number of generations to 100. The second is the set of parameters related to the base learner, the SVM. For Gaussian kernels, the width parameter γ and the cost parameter C need to be set. They could have been put into the chromosome and optimal values obtained through the GA search; due to the increased computational time this would require, ranges were set for them instead and they were tuned within those ranges in a randomized fashion. The third kind involves the parameters particular to our proposed approach: the number of ensemble members and the early stopping parameter ε. We set the ensemble size to 10 in this study. Parameter ε controls the level of accuracy and diversity of the population. If ε is too large, the
chromosomes will show a high level of diversity while their quality is low. On the contrary, if ε is too small, the chromosomes will have high quality, but the level of diversity will be low. Setting parameter ε requires empirical justification through a priori studies of problem complexity against GA convergence. The values of ε were set between 10% and 20%; that is, the final chromosomes were 80% to 90% mature.

Table 1. Seven different models to be compared

  MLP-FF          Single MLP with the full feature set
  SVM-FF          Single SVM with the full feature set
  EMLP            Ensemble of MLPs with the full feature set
  FSVM-PCA        SVM with feature subsets selected by PCA
  FSVM-DT         SVM with feature subsets selected by a decision tree (DT)
  SVM-FS          SVM with feature selection (by GA wrapper)
  FSE (proposed)  SVM ensemble based on feature selection

A popular measure of performance is the ROC graph, which plots the true positive rate (TP) against the false positive rate (FP) (see Figure 5). Point (0, 1) in the figure corresponds to perfect classification. An SVM's performance is denoted by a single (FP, TP) point in the figure. Compared to other measurements, an ROC curve has the following characteristics. An ROC curve or point is independent of class distribution or error costs. An ROC graph encapsulates all information contained in the confusion matrix, since FN is the complement of TP and TN is the complement of FP. ROC curves provide a visual tool for examining the tradeoff between the ability of a classifier to correctly identify positive cases and the number of negative cases that are incorrectly classified. Figure 5 presents the (FP, TP) pairs for the seven models (see Table 1). The perfect model would fall on (0, 1). The proposed FSE lies closest to it, while SVM-FS and EMLP look quite close. SVM-FF has a good TP, but its FP is too large.

Figure 5. ROC points of the seven response models (true positive rate plotted against false positive rate)

4. Pattern Selection

A third major problem encountered in response modeling is the sheer volume of data, or patterns. Generally speaking, retailers keep huge amounts of customer data, and new customer records are added to them continuously. Even though data mining algorithms are designed to deal with this, it is always desirable to sample the data and work on a subset of the huge data set. The problem is more severe when the most powerful classification model of our time, the support vector machine, is employed[15]. In a support vector machine, a quadratic programming (QP) formulation is made where the dimension of the kernel matrix (M × M) is equal to the number of training patterns (M). A standard QP solver, such as the MINOS, CPLEX, LOQO and MATLAB QP routines, has time complexity of order O(M³). Solvers using decomposition methods, such as Chunking, SMO, SVM light and SOR, have time complexity of approximately T·O(Mq + q³), where T is the number of iterations and q is the size of the working set[6]. Needless to say, T increases as M increases. One way to circumvent this computational burden is to select, in advance, the training patterns that carry most of the information relevant to learning. One of the merits of SVM theory, distinguishing it from other learning algorithms, is that it is clear which patterns are important to training. These are the support vectors (SVs): patterns distributed near the decision boundary which fully and succinctly define the classification task at hand[4]. Furthermore, on the same training set, SVMs trained with different kernel functions, i.e., RBF, polynomial and tanh, have been found to select almost identical subsets of support vectors[13]. Therefore, it is worth finding such would-be support vectors prior to SVM training. Here we propose a neighborhood property based pattern selection algorithm (NPPS).
The time complexity of NPPS is O(vM), where v is the number of patterns in the overlap region around the decision boundary[14]. We utilize the k nearest neighbors to look around each pattern's periphery. The first neighborhood property is that a pattern located near the decision boundary tends to have more heterogeneous neighbors in terms of their class membership. The second neighborhood property is that an overlapping or noisy pattern tends to belong to a different class than its neighbors. And the third neighborhood property is that the neighbors of
a pattern located near the decision boundary tend to be located near the decision boundary as well. The first property is used for identifying the patterns located near the decision boundary. The second is used for removing the patterns located on the wrong side of the decision boundary. And the third is used for skipping the calculation of unnecessary distances between patterns, thus accelerating the pattern selection procedure. Figure 6 shows the proposed algorithm.

NPPS(D) {
  Initialize the evaluation set D_e with randomly chosen patterns from D; set the selected set S and the expanded set to empty.
  While D_e is not empty, do {
    /* Choose the patterns x in D_e satisfying the expanding criterion: Neighbors_Entropy(x, k) > 0 */
    /* Select, among them, the patterns x satisfying the selecting criterion: Neighbors_Match(x, k) ≥ β, and add them to S */
    /* Update the pattern sets: expanded, non-expanded, selected */
    /* Compute the next evaluation set: the k nearest neighbors of the expanded patterns, excluding patterns already evaluated */
  }
  return S
}
where Neighbors_Entropy(x, k) = Σ_j P_j(x) log(1 / P_j(x)), the entropy of the class labels of x's k nearest neighbors, and Neighbors_Match(x, k) = |{x' : label(x') = label(x), x' ∈ kNN(x)}| / k.

Figure 6. Neighborhood property based pattern selection (NPPS) algorithm

Figure 7. 4×4 checkerboard problem: (a) SVM result with all patterns; (b) SVM result with selected patterns 1)

1) The decision boundary is depicted as a solid line and the margins are defined by the dotted lines on both sides of it. Support vectors are outlined. Panel (a) shows a typical SVM result with all patterns, while (b) shows that with the patterns selected by NPPS.

Figure 7 visualizes one of the experimental results on artificial problems previously reported. The decision boundaries in both panels look quite similar; thus, generalization performance is similar. The results show that NPPS reduced SVM training time by up to almost two orders of magnitude with virtually no loss of accuracy.
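To make the two neighborhood measures concrete, here is a simplified one-pass sketch in Python (our own illustration). For brevity it evaluates every pattern instead of growing the evaluation set outward as in Figure 6, so it omits the distance-skipping speedup provided by the third property; k and β are placeholder choices.

```python
import numpy as np
from collections import Counter
from sklearn.neighbors import NearestNeighbors

def neighbors_entropy_and_match(X, y, k=5):
    """Neighbors_Entropy: label entropy among a pattern's k nearest neighbors
    (positive near the boundary).  Neighbors_Match: fraction of those
    neighbors sharing the pattern's own label (low for noisy patterns)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                    # idx[:, 0] is the pattern itself
    entropy = np.empty(len(X))
    match = np.empty(len(X))
    for i, nbrs in enumerate(idx[:, 1:]):
        counts = Counter(y[nbrs])
        p = np.array(list(counts.values())) / k
        entropy[i] = -(p * np.log(p)).sum()
        match[i] = counts.get(y[i], 0) / k
    return entropy, match

def npps_select(X, y, k=5, beta=0.6):
    """Keep patterns that look near the boundary (expanding criterion,
    entropy > 0) but not on its wrong side (selecting criterion, match >= beta)."""
    entropy, match = neighbors_entropy_and_match(X, y, k)
    return np.flatnonzero((entropy > 0) & (match >= beta))
```

The selected indices can then be fed to any SVM trainer in place of the full pattern set.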
Table 2. Comparison between SVM trained with all data vs. selected data 2)

                          ALL      SUBSET
  Training Patterns       18,226   1,887
  SVs                     3,552    696
  Execution Time (sec)    4,820    129
  Test Error (%)          34.8     35.1

2) The SUBSET column of Execution Time includes the SVM training time as well as the NPPS running time.

Table 2 clearly shows how the proposed pattern selection method works for the DMEF4 data set. Out of 18,226 patterns, slightly more than 10%, or 1,887, were selected. However, about one fifth of the support vectors were selected. The execution time was reduced from 4,820 seconds to 129 seconds, an almost 40-fold decrease, while resulting in an almost identical error rate.

5. Conclusions

In this paper, we addressed three major data mining issues in response modeling: instability of complex nonlinear models, too many potential predictor variables to consider, and too many records in the database. For the instability problem, we suggested using a bagging ensemble. For the variable selection problem, we proposed a genetic algorithm based wrapper approach. For the pattern selection problem, we proposed a neighborhood property based method. All of the proposed methods were empirically shown to be effective on a real-world data set from the Direct Marketing Educational Foundation (DMEF). Future work includes simplification or removal of phase two in the wrapper approach, and pattern selection for regression problems.

References

[1] Bentz, Y. and Merunka, D., "Neural Networks and the Multinomial Logit for Brand Choice Modelling: a Hybrid Approach," Journal of Forecasting, Vol. 19(3), 2000, pp. 177-200.
[2] Bounds, D. and Ross, D., "Forecasting Customer Response with Neural Networks," in: Fiesler, E. and Beale, R. (eds.), Handbook of Neural Computation, Taylor & Francis, pp. 1-7, 1997.
[3] Breiman, L., "Bagging Predictors," Machine Learning, Vol. 24(2), 1996, pp. 123-140.
[4] Cauwenberghs, G. and Poggio, T., "Incremental and Decremental Support Vector Machine Learning," Advances in Neural Information Processing Systems, Vol. 13, MIT Press, pp. 409-415, 2001.
[5] Geman, S., Bienenstock, E. and Doursat, R., "Neural Networks and the Bias/Variance Dilemma," Neural Computation, Vol. 4(1), 1992, pp. 1-58.
[6] Hearst, M. A., Schölkopf, B., Dumais, S., Osuna, E. and Platt, J., "Trends and Controversies: Support Vector Machines," IEEE Intelligent Systems, Vol. 13(4), 1998, pp. 18-28.
[7] Ho, T. K., "The Random Subspace Method for Constructing Decision Forests," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20(8), 1998, pp. 832-844.
[8] Moutinho, L., Curry, B., Davies, F. and Rita, P., Neural Network in Marketing, Routledge, 1994.
[9] Platt, J. C., "Fast Training of Support Vector Machines Using Sequential Minimal Optimization," in: Schölkopf, B., Burges, C. J. C. and Smola, A. J. (eds.), Advances in Kernel Methods: Support Vector Learning, MIT Press, pp. 185-208, 1999.
[10] Pontil, M. and Verri, A., "Properties of Support Vector Machines," Neural Computation, Vol. 10, 1998, pp. 955-974.
[11] Schapire, R., "The Strength of Weak Learnability," Machine Learning, Vol. 5, 1990, pp. 197-227.
[12] Sharkey, A. J., "On Combining Artificial Neural Nets," Connection Science, Vol. 8, 1996, pp. 299-314.
[13] Schölkopf, B., Burges, C. and Vapnik, V., "Extracting Support Data for a Given Task," Proceedings of the 1st International Conference on Knowledge Discovery and Data Mining, AAAI, 1995, pp. 252-257.
[14] Shin, H. J. and Cho, S., "Fast Pattern Selection Algorithm for Support Vector Classifiers: Time Complexity Analysis," Proceedings of the 4th International Conference on Intelligent Data Engineering and Automated Learning, Springer, 2003, pp. 1008-1015.
[15] Vapnik, V., The Nature of Statistical Learning Theory, 2nd edition, Springer, 1999.
[16] Yang, J. and Honavar, V., "Feature Subset Selection Using a Genetic Algorithm," in: Liu, H. and Motoda, H. (eds.), Feature Selection for Knowledge Discovery and Data Mining, Kluwer Academic Publishers, pp. 117-136, 1998.
[17] Yao, X. and Liu, Y., "Making Use of Population Information in Evolutionary Artificial Neural Networks," IEEE Transactions on Systems, Man, and Cybernetics, Vol. 28(3), 1998, pp. 417-425.
[18] Yu, E. and Cho, S., "Keystroke Dynamics Identity Verification: Its Problems and Practical Solutions," Computers & Security, Vol. 23(5), 2004, pp. 428-440.
[19] Zahavi, J. and Levin, N., "Issues and Problems in Applying Neural Computing to Target Marketing," Journal of Direct Marketing, Vol. 11(4), 1997, pp. 63-75.
Authors

Cho, Sungzoon: Received BS and MS degrees in Industrial Engineering from Seoul National University, and MS and PhD degrees in Computer Science from the University of Washington and the University of Maryland, respectively. He served as an assistant professor of Computer Engineering at POSTECH and is currently an associate professor and chair of the Department of Industrial Engineering at Seoul National University. His research interests are data mining algorithms and business applications of data mining. E-mail: zoon@snu.ac.kr, Tel: +82-2-880-6275

Yu, Enzhe: Graduated in Computer Engineering from Harbin Institute of Technology (HIT) in China, received an MS in Industrial Engineering from Kangnung National University, and a PhD in data mining from the Department of Industrial Engineering at Seoul National University. He joined the consulting division of Samsung SDS in August 2004 and currently works on information strategy in the wireless division of Samsung Electronics. His interests include data mining, biometrics, CRM, SCM, BPM, SOA and EA. E-mail: ez.yu@samsung.com

Shin, Hyunjung: Received her PhD from the Department of Industrial Engineering at Seoul National University in 2005. Her specialty is data mining, in which she researches and develops algorithms, with particular theoretical expertise in Support Vector Machines, kernel methods, neural networks and graph methods. Her application areas are Customer Relationship Management and Bioinformatics. Since 2004 she has been a researcher at the Max Planck Institute in Germany. E-mail: shin@tuebingen.mpg.de, Tel: +49-(0)7071-601-824

Ha, Kyungnam: Currently a PhD student in Industrial and Systems Engineering at Texas A&M University. He received a BS in Industrial Engineering from Seoul National University and an MS from the data mining laboratory of the same department. His interests include advertising models using stochastic programming, response models based on data mining techniques, combinatorial problems, and the development of heuristic algorithms. E-mail: knamha@neo.tamu.edu
MacLachlan, L. Douglas: Professor, Department of Marketing and International Business, University of Washington Business School. He has a BA in Physics, an MA in Statistics, and an MBA and PhD in Marketing, all from the University of California, Berkeley. He teaches primarily marketing research, database marketing, multivariate data analysis and data mining. He has extensive experience in consulting, marketing research and management development, and has published many articles in various academic and business journals. He has been a Visiting Scholar at INSEAD in Fontainebleau, France, and a Visiting Professor at the Catholic University of Leuven, Belgium, and at Koç University in Istanbul, Turkey. As Associate Dean of the UW Business School, he helped establish the Global Executive MBA joint program between Yonsei University and UW. E-mail: macl@u.washington.edu