Probabilistic Latent Semantic User Segmentation for Behavioral Targeted Advertising*

Xiaohui Wu 1,2, Jun Yan 2, Ning Liu 2, Shuicheng Yan 3, Ying Chen 1, Zheng Chen 2

1 Department of Computer Science, Beijing Institute of Technology, Beijing, China, 100081
xiaohuiwu85@gmail.com, chenying1@bit.edu.cn

2 Microsoft Research Asia, Sigma Center, 49 Zhichun Road, Beijing, China, 100080
{v-xwu, junyan, ningl, zhengc}@microsoft.com

3 National University of Singapore, Office E4-05-11, 4 Engineering Drive 3, Singapore, 117576
eleyans@nus.edu.sg

ADKDD '09, June 28, 2009, Paris, France. Copyright 2009 ACM 978-1-60558-671-7...$10.00.

ABSTRACT
Behavioral Targeting (BT), which aims to deliver the most appropriate advertisements to the most appropriate users, is attracting much attention in the online advertising market. A key challenge of BT is how to automatically segment users for ad delivery, and good user segmentation may significantly improve the ad click-through rate (CTR). Different from classical user segmentation strategies, which rarely take the semantics of user behaviors into consideration, we propose in this paper a novel user segmentation algorithm named Probabilistic Latent Semantic User Segmentation (PLSUS). PLSUS adopts probabilistic latent semantic analysis to mine the relationship between users and their behaviors so as to segment users in a semantic manner. We perform experiments on the real-world ad click-through log of a commercial search engine. Compared with two classical clustering algorithms, K-Means and CLUTO, PLSUS can further improve the ads CTR by up to 100%. To the best of our knowledge, this work is an early semantic user segmentation study for BT in academia.

Categories and Subject Descriptors
H.3.5 [Information Storage and Retrieval]: Online Information Service - Commercial Service; I.5.1 [Pattern Recognition]: Models - Statistical

General Terms
Algorithms, Performance, Experimentation

Keywords
Behavioral Targeting (BT), User segmentation, probabilistic latent semantic analysis

1. INTRODUCTION
Nowadays, a large number of advertisers would like to publish their advertisements through the Internet, which has brought about a new, developing field known as online advertising science. Sponsored search [9] and contextual ads [4] are two of the most widely studied online advertising business models. Besides, Behavioral Targeting, which aims to analyze users' behaviors in order to deliver appropriate ads to potential consumers, has been validated to make online advertising more effective [19]. A crucial part of BT is the problem of user segmentation, which aims at grouping users with similar behaviors into user segments. Since advertisers generally select the user segments most relevant to their ads, if users with similar purchase intentions are successfully gathered into the same segment, advertisers may gain more profit from ad delivery. Thus, the quality of user segmentation has a dominant impact on the performance of behavioral targeted advertising.

In this paper, we focus on the problem of user segmentation for BT in search engine advertising. User segmentation is the process of arranging each user into one or more segments so that users with similar interests or purchase intentions are within the same segment. We formulate the problem of user segmentation as follows. Suppose a set of online users is given. For each user, we adopt his/her historical online behaviors, such as queries, to depict his/her interests. Some ads have been displayed to these users, and each ad is recorded with the status of whether it was clicked in the impression.
Our objective is to group all users into appropriate segments by analyzing user behaviors, in order to improve the ad click probability within the user segments in contrast to mass-market ads.

Conventional user segmentation approaches such as classification and clustering have two limitations. (1) Many traditional strategies utilize keywords as features, and then implement clustering or classification on these features. In this way, two users who have similar buying intentions but share no common words will not be put into the same segment. (2) Many classical clustering methods do not allow an object to belong to multiple clusters, which means one user can only stay in a unique segment.

We notice that semantic approaches such as Latent Semantic Analysis (LSA) [8], Probabilistic Latent Semantic Analysis (PLSA) [13] and Latent Dirichlet Allocation (LDA) [1] are widely studied and adopted in the field of document classification. Among those approaches, PLSA effectively mines the relationship between document and word with a hidden variable called topic. Besides, PLSA has the ability to assign one document to multiple topics. Motivated by PLSA in the field of text mining, we propose to exploit the analogy between user-query and document-word, and present a semantic approach called Probabilistic Latent Semantic User Segmentation (PLSUS) for handling the limitations of those traditional user segmentation strategies.

In order to execute semantic user segmentation for BT, we first split users' search queries into terms, so that each user can be represented as a collection of terms [19]. Using this Bag of Words (BOW) representation [18], a latent variable, which represents the topics, i.e. semantic user interests, is introduced to represent users. This topical variable is able to bridge users and their observed behaviors. We will show in our experiments that the latent semantic topics can represent users' interests, implying potential purchase intention. For this reason, we directly utilize the latent semantic topics to segment users. The Expectation Maximization (EM) approach is applied to mine the latent semantic topics. Since a user may have multiple interests, to better reflect the user's interests, we set a threshold and push the user into those segments whose probabilities are larger than the predefined threshold.

In the experiments, we compare our proposed PLSUS and a modified version, known as Single-PLSUS, with two commonly used clustering algorithms, CLUTO and k-Means. Single-PLSUS only allows a user in a unique segment, as many traditional clustering algorithms do. The results show that PLSUS can improve the ads CTR by up to 100%. In addition, PLSUS has good performance on the classical F-measure.

The rest of this paper is organized as follows. In Section 2, we introduce the background knowledge about BT and semantic graphical models in text mining. In Section 3, we describe our solution to semantic user segmentation, namely PLSUS. In Section 4, we introduce the experimental configuration and results with analysis. Finally, in Section 5, we conclude this paper along with a discussion of future work.

*The work was done when the first author was an intern at Microsoft Research Asia.

2. BACKGROUND
In this section, we introduce the background knowledge for better understanding this work.
We present the basic knowledge on BT, including the definition and related commercial systems, in Section 2.1. In addition, we review semantic approaches such as LSA, PLSA, and LDA in Section 2.2.

2.1 Behavior Targeting
Behavioral Targeting is an advertising methodology which is burgeoning in online advertising. With this technique, ads can be effectively delivered to the most relevant users. Behavioral Targeting may improve the performance of online advertisement delivery by two major steps, namely, user segmentation and user segment ranking. In the user segmentation step, based on online behavior such as visited websites, clicked pages and input queries, users are placed into user segments created in the system. In the user segment ranking step, given an ad, user segments are ranked by relevance and the top segments are chosen for ad delivery. Thus, BT displays the ads to the most appropriate users.

At present, BT is attracting more and more attention in both industry and academia. In industry, a large number of commercial systems involving Behavioral Targeting have been proposed: Adlink [20], which takes the short user session into consideration for BT; DoubleClick [24], which adopts special features such as the browser types and operating systems of users to improve the user segmentation step; Specificmedia [28], which predicts each user's interest and purchase intention as a score; and Yahoo! Smart Ads [30], which integrates demographic and geographic targeting. Additionally, Almond Net [21], Blue Lithium [22], Burst [23], NebuAd [25], Phorm [26], Revenue Science [27], and TACODA [29] are commercial systems with BT. In academia, Yan et al. [19] first studied the improvement of BT in commercial search engines from three aspects, including effectiveness, improvement, and the best strategy for BT.

User segmentation is the process of arranging each user into one or more segments by a specific criterion. In BT, this criterion is to ensure, as far as possible, that users with similar interests and purchase intentions are in the same segment.
However, we cannot derive that information directly. The most widespread way is to mine user behaviors to represent user interests and purchase intentions. That is, users with similar behaviors are assumed to have similar preferences. Thus, user segmentation for BT can be described as attempting to place each user in one or more segments so that users with similar behaviors are in the same segment.

Since advertisers tend to pay for the most relevant segments, the quality of user segmentation is extraordinarily crucial. On one hand, if the system can gather more users with similar interests into one segment, advertisers will need to buy fewer segments to deliver their ads. On the other hand, CTR is expected to improve if the similarity between each pair of users within the same segment is large. Thus, user segmentation is a key problem in BT applications.

Traditional user segmentation approaches for BT can be classified into three categories, namely manual user segmentation, user classification, and user clustering. Manual rule-based user segmentation, which classifies users into segments manually, suffers from a significant deficiency in time cost. Because large-scale data is used for BT, this method is hardly adopted by commercial systems. User classification and user clustering respectively implement classification and clustering for users. The traditional clustering or classification approaches have two limitations in this application scenario.

(1) Users are segmented only based on the contents of their behaviors, not their semantic interests. With the Bag of Words model, traditional strategies utilize terms as features in order to implement clustering. That means two users with similar purchase intentions but without any terms in common have little chance of being grouped into one segment.

(2) Many clustering methods which are widely used for BT concentrate on placing one object in one cluster. On account of this limitation, if a user has two completely different interests, only one interest can be represented and the other one has to be discarded.
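Limitation (1) can be illustrated with a small sketch. The queries below are invented for illustration and cosine similarity is only one common term-based measure, not necessarily the one used by the baselines in this paper. Two users with arguably the same travel intention but no shared terms obtain a similarity of exactly zero.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical users: A and B plausibly share an intention (travel to France)
# but use disjoint vocabularies; A and C share surface terms.
user_a = Counter("cheap flights paris".split())
user_b = Counter("airline tickets france".split())
user_c = Counter("cheap flights rome".split())

sim_ab = cosine(user_a, user_b)  # no common term, so the dot product is zero
sim_ac = cosine(user_a, user_c)  # two of three terms shared
print(sim_ab, sim_ac)
```

A purely term-based clustering would therefore have no evidence for grouping users A and B, which is the gap a latent topic variable is meant to close.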
Thus, it is desirable to propose new semantic segmentation approaches for BT.

2.2 Semantic Analysis
Semantic analysis, which is a well-established technique in industry, mines hidden semantic relationships among objects. Latent Semantic Analysis (LSA) [8] is a well-known approach for deriving latent semantic relationships and is widely used in automatic indexing and information retrieval. The main idea is to map high-dimensional vectors to low-dimensional ones in the latent semantic space. The Probabilistic Latent Semantic Analysis (PLSA) model [13, 15], which is derived from LSA, is able to capture hidden variables with a solid statistical foundation. Each object is represented by a convex combination of topics, which are latent variables in PLSA. Latent Dirichlet Allocation (LDA) [1] is similar to PLSA. The difference between these two models is that the topic distribution is assumed to have a Dirichlet prior in LDA. In document classification, LDA derives more reasonable mixtures of topics. However, the work in [11] has shown that the PLSA model is equivalent to the LDA model under a uniform Dirichlet prior distribution.

In this work, we focus on PLSA to derive our PLSUS model. PLSA is a significant breakthrough, since it can discover latent variables with more flexibility. Besides, using the EM algorithm, we can easily estimate the parameters of PLSA. In practice, PLSA is widely used in many fields such as document classification [2, 3, 10, 17], information retrieval [14], web usage mining [16], co-citation analysis [5, 6] and collaborative filtering [7, 12]. However, few works apply PLSA to user segmentation for BT. In our study, following the Bag of Words model, we describe each user as a collection of terms extracted from his/her behaviors, which is similar to the commonly used document representation strategy.

3. PROBABILISTIC LATENT SEMANTIC USER SEGMENTATION (PLSUS)
In this section, we introduce our semantic user segmentation algorithm. PLSA, which can discover the latent relationship between two objects, is widely studied in document classification and clustering problems. In text mining, we generally use the Bag of Words model [18] to represent documents. According to the work of Yan et al. [19], users' behaviors can be represented by their historical queries. Since a query consists of terms, we can treat each query as a set of terms. In this way, each user can be represented by a bag of words, which is the same as the representation of a text document.
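The user-as-document representation can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization and lowercasing, neither of which is specified in the paper; the function name `build_user_bags` and the sample records are hypothetical.

```python
from collections import Counter, defaultdict


def build_user_bags(log_records):
    """Merge all queries of each user into one bag of words (term -> count).

    log_records: iterable of (user_id, query) pairs.
    Tokenization by whitespace and lowercasing are assumptions of this sketch.
    """
    bags = defaultdict(Counter)
    for user_id, query in log_records:
        bags[user_id].update(query.lower().split())
    return dict(bags)


# Hypothetical log records.
records = [
    ("u1", "cheap books"),
    ("u1", "Books online"),
    ("u2", "football scores"),
]
bags = build_user_bags(records)
print(bags["u1"])  # "books" is counted twice for user u1
```

Each resulting counter plays the role that a term-frequency vector plays for a text document.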
Let $u_i \in U = \{u_1, u_2, \ldots, u_n\}$ stand for a user, where $U$ represents the set of all users for BT, and suppose $t_j \in T = \{t_1, t_2, \ldots, t_m\}$ is a term, where $T$ represents the vocabulary of all terms used by all users. We define $T_{u_i}$ as the set of all terms used by $u_i$; thus,

$$T = \bigcup_{u_i \in U} T_{u_i}.$$

Then, we define the co-occurrence matrix $N = \{n(u_i, t_j)\}$, where $n(u_i, t_j)$ is the number of times $t_j$ is used by $u_i$. To semantically segment users, we introduce the latent variable $z_k \in Z = \{z_1, z_2, \ldots, z_l\}$, which represents the topics, i.e. the semantic intentions of users. This latent variable has a close relationship with both users and queries, the latter having been transformed into terms. From the user's perspective, a topic implies a hidden interest of the user. From the term's perspective, the terms in one topic tend to belong to some specific field. Here, we assume that, given a topic variable $z_k$, users and terms are independent of each other.

We adopt the classical aspect model [13] in PLSUS. The graphical model of the aspect model is given in Figure 1.

Figure 1. Graph of the aspect model.

In the BT scenario, each user $u_i$ has the probability $P(z_k \mid u_i)$ of generating a topic $z_k$, and then $z_k$ has the probability $P(t_j \mid z_k)$ of generating term $t_j$. Given the basic model,

$$P(u_i, t_j) = P(u_i)\, P(t_j \mid u_i), \qquad P(t_j \mid u_i) = \sum_{z_k \in Z} P(t_j \mid z_k)\, P(z_k \mid u_i).$$

Notice that this model contains the probabilities $P(z_k \mid u_i)$ and $P(u_i)$, which are not convenient to compute. Thus, we transform this model into another, equivalent form,

$$P(u_i, t_j) = \sum_{z_k \in Z} P(z_k)\, P(u_i \mid z_k)\, P(t_j \mid z_k),$$

where $P(z_k)$ is the probability that $z_k$ is observed in $Z$, $P(u_i \mid z_k)$ is the probability that $u_i$ is relevant to the given topic $z_k$, and $P(t_j \mid z_k)$ is the probability that $t_j$ is related to the given topic $z_k$. The graphical model representation is shown in Figure 2.

Figure 2. Graph of the PLSUS.

As with PLSA in the field of text mining, we aim to maximize the likelihood defined as

$$L = \sum_{i=1}^{n} \sum_{j=1}^{m} n(u_i, t_j) \log P(u_i, t_j) = \sum_{i=1}^{n} \sum_{j=1}^{m} n(u_i, t_j) \log \sum_{k=1}^{l} P(z_k)\, P(u_i \mid z_k)\, P(t_j \mid z_k).$$

In order to maximize $L$, we adopt the classical Expectation Maximization (EM) approach.
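To make the likelihood concrete, the following sketch evaluates $L$ for given parameters. The nested-list layout (`counts[i][j]`, `p_z[k]`, `p_u_z[k][i]`, `p_t_z[k][j]`) and the function name are assumptions made for illustration, and the natural logarithm is used.

```python
from math import log


def log_likelihood(counts, p_z, p_u_z, p_t_z):
    """Log-likelihood L of the symmetric aspect model.

    counts[i][j] = n(u_i, t_j); p_z[k] = P(z_k);
    p_u_z[k][i] = P(u_i | z_k); p_t_z[k][j] = P(t_j | z_k).
    Cells with a zero count contribute nothing and are skipped. A cell with a
    positive count but zero model probability would raise a ValueError in log.
    """
    total = 0.0
    for i, row in enumerate(counts):
        for j, n_ij in enumerate(row):
            if n_ij == 0:
                continue
            p_ut = sum(p_z[k] * p_u_z[k][i] * p_t_z[k][j] for k in range(len(p_z)))
            total += n_ij * log(p_ut)
    return total


# Toy check with one topic: every (user, term) pair has probability 0.25.
toy_counts = [[1, 0], [0, 1]]
toy_ll = log_likelihood(toy_counts, [1.0], [[0.5, 0.5]], [[0.5, 0.5]])
print(toy_ll)
```

Tracking this value across iterations is a common way to monitor EM convergence, since EM should never decrease it.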
The EM approach is widely used for computing maximum likelihood estimates in latent variable models. EM is an iterative method which alternates between two steps.

(1) Expectation step (E step). Using the current estimates of the parameters, we compute the posterior probabilities $P(z_k \mid u_i, t_j)$ for the latent variable.

(2) Maximization step (M step). Aiming to maximize the expected complete-data log-likelihood $E[L^c]$, we update $P(z_k)$, $P(u_i \mid z_k)$ and $P(t_j \mid z_k)$.
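The two steps above can be sketched in plain Python. The update formulas used here are the standard PLSA updates for the symmetric aspect model from Hofmann [13]; the paper does not print its update equations, so treating them as the ones PLSUS uses is an assumption, as are the random initialization, the fixed iteration count and the function name `plsa_em`.

```python
import random


def plsa_em(counts, num_topics, iterations=50, seed=0):
    """Fit P(z), P(u|z), P(t|z) to counts[i][j] = n(u_i, t_j) with EM."""
    rng = random.Random(seed)
    n_users, n_terms = len(counts), len(counts[0])

    def random_distribution(size):
        weights = [rng.random() + 1e-3 for _ in range(size)]
        total = sum(weights)
        return [w / total for w in weights]

    p_z = [1.0 / num_topics] * num_topics
    p_u_z = [random_distribution(n_users) for _ in range(num_topics)]
    p_t_z = [random_distribution(n_terms) for _ in range(num_topics)]

    for _ in range(iterations):
        acc_z = [0.0] * num_topics
        acc_u = [[0.0] * n_users for _ in range(num_topics)]
        acc_t = [[0.0] * n_terms for _ in range(num_topics)]
        for i in range(n_users):
            for j in range(n_terms):
                n_ij = counts[i][j]
                if n_ij == 0:
                    continue
                # E step: P(z_k | u_i, t_j) is proportional to
                # P(z_k) P(u_i | z_k) P(t_j | z_k).
                joint = [p_z[k] * p_u_z[k][i] * p_t_z[k][j] for k in range(num_topics)]
                norm = sum(joint)
                if norm == 0.0:
                    continue
                for k in range(num_topics):
                    weight = n_ij * joint[k] / norm
                    acc_z[k] += weight
                    acc_u[k][i] += weight
                    acc_t[k][j] += weight
        # M step: renormalize the expected counts into probabilities.
        total = sum(acc_z)
        if total == 0.0:
            break
        for k in range(num_topics):
            p_z[k] = acc_z[k] / total
            if acc_z[k] > 0.0:
                p_u_z[k] = [v / acc_z[k] for v in acc_u[k]]
                p_t_z[k] = [v / acc_z[k] for v in acc_t[k]]
    return p_z, p_u_z, p_t_z


# Toy co-occurrence matrix: users 0-1 use terms 0-1, users 2-3 use terms 2-3.
toy = [[4, 4, 0, 0], [3, 5, 0, 0], [0, 0, 5, 3], [0, 0, 4, 4]]
p_z, p_u_z, p_t_z = plsa_em(toy, num_topics=2, iterations=30, seed=1)
print(p_z)
```

The E step is folded into the accumulation loop so the full posterior table never has to be stored. A practical implementation would more likely use sparse matrices and a convergence test on the likelihood instead of a fixed number of iterations.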

After finishing the EM computation, PLSUS segments users with the model obtained. Since a topic has a close relationship with both users and terms, a topic can be used as a user segment. In this way, the semantic attributes become the dominant factors in user segmentation. Thus, we aim to solve the question of how to assign users to different topics. To solve this question, we focus on an important probability, $P(z_k \mid u_i)$, which is the probability that the topic (user segment) $z_k$ is observed for a given user $u_i$. It describes how close the relationship between $z_k$ and $u_i$ is. $P(z_k \mid u_i)$ can be computed by

$$P(z_k \mid u_i) = \frac{\sum_{j=1}^{m} n(u_i, t_j)\, P(z_k \mid u_i, t_j)}{\sum_{j=1}^{m} \sum_{k'=1}^{l} n(u_i, t_j)\, P(z_{k'} \mid u_i, t_j)}.$$

Intuitively, the easiest way to segment users into topics is to compute all $P(z_k \mid u_i)$, $z_k \in Z$, for each $u_i$, and then put $u_i$ into the topic with the highest $P(z_k \mid u_i)$. However, this approach cannot handle the following circumstance: if a user is interested in both sports and cooking, and there are two topics which exactly correspond to sports and cooking, this segmentation method will still choose only one topic for the user. In this way, we may lose one of the user's interests. In order to overcome this deficiency, we present a novel approach for segmenting users based on the probability $P(z_k \mid u_i)$. Here, we apply a threshold for user segmentation. Let $S$ be the set of user segments and $s_k \in S$ the segment with topic $z_k$; the user segmentation approach is then

$$u_i \in s_k \ \text{ if } P(z_k \mid u_i) > \text{threshold}, \qquad u_i \notin s_k \ \text{ otherwise}.$$

Compared with traditional clustering methods, this simple method allows one user to belong to multiple segments.

4. EXPERIMENTS
In this section, we systematically evaluate the proposed PLSUS algorithm. Two standard clustering methods are used as baselines in the experiments. Also, to better compare with standard clustering approaches, a modified PLSUS, which we call Single-PLSUS, is introduced. Several evaluation metrics are used in our experiments to measure the performance of each approach.

4.1 Data Sets
In this part, we use one day's ad click-through log collected from a commercial search engine.
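As a side note on the assignment rule of Section 3, the sketch below computes the posterior $P(z_k \mid u_i)$ from the formula given there and applies both the thresholded rule and the single-topic (argmax) alternative. Function names and the toy parameters are hypothetical; the strict "larger than" comparison follows the wording of the introduction.

```python
def user_topic_posterior(counts_row, i, p_z, p_u_z, p_t_z):
    """P(z_k | u_i) as the normalized sum over terms of n(u_i, t_j) P(z_k | u_i, t_j)."""
    num_topics = len(p_z)
    weights = [0.0] * num_topics
    for j, n_ij in enumerate(counts_row):
        if n_ij == 0:
            continue
        joint = [p_z[k] * p_u_z[k][i] * p_t_z[k][j] for k in range(num_topics)]
        norm = sum(joint)
        if norm == 0.0:
            continue
        for k in range(num_topics):
            weights[k] += n_ij * joint[k] / norm
    total = sum(weights)
    return [w / total for w in weights] if total else weights


def assign_segments(posterior, threshold=0.2):
    """Multi-segment rule: every topic whose posterior exceeds the threshold."""
    return [k for k, p in enumerate(posterior) if p > threshold]


def assign_single_segment(posterior):
    """Single-topic rule: only the topic with the highest posterior."""
    return max(range(len(posterior)), key=lambda k: posterior[k])


# Toy model: topic 0 emits only term 0, topic 1 emits only term 1.
posterior = user_topic_posterior(
    [3, 1], 0, p_z=[0.5, 0.5], p_u_z=[[1.0], [1.0]], p_t_z=[[1.0, 0.0], [0.0, 1.0]]
)
print(posterior, assign_segments(posterior, 0.2), assign_single_segment(posterior))
```

With a threshold of 0.2 the toy user lands in both segments, whereas the argmax rule keeps only the dominant one, which is exactly the sports-and-cooking situation described in Section 3.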
This data can effectively represent users' click-through behaviors. Table 1 shows the format of the data used in our experiments. From this table, we can see that there are four properties of the data we focus on. UserId identifies a specific user; different users have different UserIds. Similar to UserId, AdId is used as the unique identifier of each advertisement. Query shows the content of a query used by the user, and we can divide it into terms to fit PLSUS. ClickCnt is an important property which is used in our evaluation metrics such as CTR. From the example in Table 1, we know that a specific user with UserId EEEC97C25FD50C1AB282D39FB13976D9 used a query whose content is "books", and then the system displayed an advertisement with AdId 3238034 to this user. However, this user did not click this ad.

Table 1. Format of log record used in our experiments.

UserId                            Query  AdId     ClickCnt
EEEC97C25FD50C1AB282D39FB13976D9  Books  3238034  0

We use two datasets, containing 120,000 and 150,000 log records respectively, to verify the performance of PLSUS. Both of them contain thousands of users. In our experiments, we group all users in the 120,000-log dataset into 5 and 10 segments, while all users in the 150,000-log dataset are grouped into 10 and 20 segments, using the different approaches.

4.2 Experiment Setup
In this part, we introduce the key steps of our experiments. Let $A = \{a_1, a_2, \ldots, a_n\}$ be the set of ads in our dataset, and let $U_i = \{u_{i1}, u_{i2}, \ldots, u_{im_i}\}$ be the group of users to whom $a_i$ has been displayed. Furthermore, after we segment users with the different approaches, we obtain the user segments. Thus, we define

$$D(U_i) = \{d_1(U_i), d_2(U_i), \ldots, d_K(U_i)\}, \quad i = 1, 2, \ldots, n,$$

to be the distribution of $U_i$ over the obtained user segments, with $d_k(U_i)$ the set of users in $U_i$ who belong to the $k$-th segment. The $k$-th segment can then be described as

$$d_k = \bigcup_{i = 1, 2, \ldots, n} d_k(U_i).$$

The key steps in our experiments are:

(1) We compare PLSUS, Single-PLSUS, k-Means and CLUTO on our datasets, where Single-PLSUS is a modified PLSUS which we will introduce in a later section of this paper.
(2) We vary the threshold used for segmenting users, after the final model has been obtained by the EM algorithm, to test the sensitivity of PLSUS.

4.3 Evaluation Metrics
In [19], Yan et al. introduced several evaluation metrics which can measure the performance of BT effectively. Following these, we use four evaluation metrics to measure the performance of each approach and to compare our solution with the baselines. They are ads Click-Through Rate (CTR), ads Click-Through Rate Improvement, ads click Entropy and F-measure. With the symbols we defined, CTR can be represented by

$$CTR(a_i) = \frac{1}{m_i} \sum_{j=1}^{m_i} \delta(u_{ij}),$$

where $\delta(u_{ij})$ is defined as

$$\delta(u_{ij}) = \begin{cases} 1, & \text{if } u_{ij} \text{ clicked } a_i \\ 0, & \text{otherwise.} \end{cases}$$

The CTR of $a_i$ over user segment $d_k$ is

$$CTR(a_i \mid d_k) = \frac{1}{|d_k(U_i)|} \sum_{u_{ij} \in d_k(U_i)} \delta(u_{ij}),$$

where $|d_k(U_i)|$ is the number of users in $d_k(U_i)$. Note that $CTR(a_i)$ is the raw CTR; in other words, $CTR(a_i)$ is the CTR over all users to whom $a_i$ was displayed. $CTR(a_i \mid d_k)$ is the CTR of each user segment after segmentation.

In order to measure the improvement of CTR by user segmentation, we define a new evaluation metric for PLSUS. This new metric should satisfy two conditions.

(1) Maximum: choose the segment which has the maximum CTR. This is reasonable because the ad publisher would like to recommend the user segment with the highest ad click probability to the advertiser for ad delivery.

(2) Majority: the number of users in this segment cannot be less than average. This condition rules out some special situations. For example, suppose the $k$-th user segment only has 1 user and he/she clicked $a_i$. Then $CTR(a_i \mid d_k) = 1$, but this segment is clearly not appropriate to recommend to the advertiser.

Integrating these two conditions, we define the CTR improvement for $a_i$ as

$$\Delta(a_i) = \frac{CTR(a_i \mid d^*(a_i)) - CTR(a_i)}{CTR(a_i)},$$

where

$$d^*(a_i) = \arg\max \{\, CTR(a_i \mid d_k) : d_k \in \tilde{d} \,\} \quad \text{and} \quad \tilde{d} = \{\, d_k : |d_k(U_i)| \ge m_i / K,\ k = 1, 2, \ldots, K \,\}.$$

Thus, the CTR improvement is

$$\Delta = \sum_{i=1}^{n} \Delta(a_i) / n.$$

Entropy is defined as

$$Enp(a_i) = -\sum_{k=1}^{K} P(d_k \mid a_i) \log P(d_k \mid a_i),$$

where

$$P(d_k \mid a_i) = \frac{1}{m_i} \sum_{u_{ij} \in d_k(U_i)} \delta(u_{ij}).$$

Note that the smaller the Entropy is, the better the results [19]. The classical F-measure, including Precision, Recall and F-measure, is defined as

$$Pre(a_i \mid d_k) = CTR(a_i \mid d_k), \qquad Rec(a_i \mid d_k) = \frac{\sum_{u_{ij} \in d_k(U_i)} \delta(u_{ij})}{\sum_{j=1}^{m_i} \delta(u_{ij})},$$

$$F(a_i \mid d_k) = \frac{2\, Pre(a_i \mid d_k)\, Rec(a_i \mid d_k)}{Pre(a_i \mid d_k) + Rec(a_i \mid d_k)},$$

where the larger the F-measure is, the better the performance.

4.4 Results
In this part, we introduce the details of our experiments and show the results. To show the performance of PLSUS, we compare PLSUS with traditional clustering methods. CLUTO and k-Means are selected as the baselines. However, it is unfair to compare CLUTO and k-Means with PLSUS directly, since PLSUS allows one user to belong to multiple segments, while both CLUTO and k-Means permit one user to belong to only one user segment. In order to solve this problem, Single-PLSUS is implemented to bridge the gap between PLSUS and traditional clustering approaches.
With Single-PLSUS, a given user $u_i$ is placed in the unique segment $z_k$ which has the maximum $P(z_k \mid u_i)$. On one hand, comparing Single-PLSUS with CLUTO and k-Means can show whether the semantic approach improves BT performance. On the other hand, the comparison between PLSUS and Single-PLSUS clearly shows the impact of allowing one user to belong to multiple segments. The results are shown in Tables 2-4. Note that we set threshold = 0.2; further explanation is given later in this section.

CTR is one of the most basic and critical evaluation metrics for online advertising problems. From Table 2, we can observe two general phenomena. First, as the number of segments increases, the improvement of CTR increases as well. In the 150,000-log dataset, as the number of segments doubles, the improvement of CTR increases twofold. In the same dataset, with 20 segments, PLSUS improves CTR by up to 100% compared with traditional CLUTO. Second, all semantic approaches perform well on CTR improvement. In particular, Single-PLSUS exceeds CLUTO and k-Means in all settings. This shows that the semantic approach is appropriate for BT. Since we gather all queries used by each user and divide these queries into terms, we exploit the correspondence between user-query and document-word; the results support the correctness of this idea. The comparison between PLSUS and Single-PLSUS shows the advantage of allowing a user to be placed into multiple segments. Besides, in Yan's work [19], the CTR improvement with CLUTO and k-Means is around 100% when grouping users into 20 segments, which is consistent with our experimental results. Since Yan's experiments showed that the CTR improvement can reach 670% with 160 user segments on a large-scale dataset, we expect that we can improve CTR further if we group users into more segments. In our future work, we will increase the scale of the experiments to verify this expectation.
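The per-ad metrics of Section 4.3 can be sketched as follows. This is an illustrative reading of the definitions, not the authors' evaluation code: the function name `evaluate_ad`, the dictionary-based inputs and the use of the natural logarithm for Entropy are assumptions, and the F-measure is reported per segment because the paper does not state how it is aggregated over segments.

```python
from math import log


def evaluate_ad(clicks, segments, num_segments):
    """Metrics for one ad a_i.

    clicks: dict user -> 0/1 (delta) for every user the ad was displayed to
            (assumed non-empty).
    segments: dict user -> iterable of segment indices in range(num_segments).
    """
    m = len(clicks)
    total_clicks = sum(clicks.values())
    raw_ctr = total_clicks / m

    members = {k: [] for k in range(num_segments)}
    for user, clicked in clicks.items():
        for k in segments.get(user, ()):
            members[k].append(clicked)

    seg_ctr = {k: sum(v) / len(v) for k, v in members.items() if v}

    # Majority condition: only segments with at least m / K displayed users.
    eligible = [k for k in seg_ctr if len(members[k]) >= m / num_segments]
    improvement = None
    if eligible and raw_ctr > 0:
        best = max(eligible, key=lambda k: seg_ctr[k])
        improvement = (seg_ctr[best] - raw_ctr) / raw_ctr

    entropy = 0.0
    for v in members.values():
        p = sum(v) / m  # P(d_k | a_i)
        if p > 0:
            entropy -= p * log(p)

    f_measure = {}
    for k, precision in seg_ctr.items():
        recall = sum(members[k]) / total_clicks if total_clicks else 0.0
        denom = precision + recall
        f_measure[k] = 2 * precision * recall / denom if denom else 0.0

    return {"ctr": raw_ctr, "improvement": improvement,
            "entropy": entropy, "f_measure": f_measure}


# Toy ad shown to four users, one click, two segments of two users each.
toy_clicks = {"u1": 1, "u2": 0, "u3": 0, "u4": 0}
toy_segments = {"u1": [0], "u2": [0], "u3": [1], "u4": [1]}
result = evaluate_ad(toy_clicks, toy_segments, num_segments=2)
print(result)
```

In the toy case the raw CTR is 0.25 and the best eligible segment has a CTR of 0.5, so the CTR improvement is 1.0, i.e. 100%. The dataset-level figures would then be averages of such per-ad values over all ads.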
We compute the average ads click Entropy over all ads in the datasets we used. The result is shown in Table 3. In general, the entropies of all user segmentation approaches are almost the same. In this case, entropy distinguishes the methods less clearly than CTR does. On closer observation, we find that the entropy of PLSUS is larger than that of the others. The reason follows from the design of the methods: PLSUS allows a single user to belong to multiple segments. That means an ad's clicks may be counted in more than one segment. In this way, the entropy is naturally larger than that of user segmentation approaches which associate each user with only one segment.

Table 2. CTR improvement of different user segmentation strategies.

Method        5 segments  10 segments  10 segments  20 segments
              (120,000)   (120,000)    (150,000)    (150,000)
PLSUS         0.7876      1.3583       1.3036       2.6549
Single-PLSUS  0.6753      1.2985       1.2353       2.5287
CLUTO         0.6444      0.7399       0.7447       1.2076
k-Means       0.5440      0.7761       0.8616       1.0324

Table 3. Ads click Entropy of different user segmentation strategies.

Method        5 segments  10 segments  10 segments  20 segments
              (120,000)   (120,000)    (150,000)    (150,000)
PLSUS         0.1636      0.1780       0.1824       0.1735
Single-PLSUS  0.1527      0.1542       0.1590       0.1611
CLUTO         0.1506      0.1586       0.1531       0.1540
k-Means       0.1532      0.1515       0.1575       0.1554

Table 4. F-measure of different user segmentation strategies.

Method        Metric     5 segments  10 segments  10 segments  20 segments
                         (120,000)   (120,000)    (150,000)    (150,000)
PLSUS         Precision  0.9947%     1.0628%      1.0954%      1.2424%
              Recall     1.3116%     1.3080%      1.3221%      1.3407%
              F          1.0503%     1.1071%      1.1414%      1.2567%
Single-PLSUS  Precision  0.9932%     1.0568%      1.0972%      1.2100%
              Recall     1.2672%     1.2927%      1.2896%      1.3410%
              F          1.0443%     1.1005%      1.1378%      1.2332%
CLUTO         Precision  0.9271%     0.9634%      0.9546%      1.0019%
              Recall     1.3386%     1.3718%      1.3824%      1.3979%
              F          0.9958%     1.0283%      1.0229%      1.0656%
k-Means       Precision  0.9196%     0.9197%      0.9663%      0.9833%
              Recall     1.3122%     1.3708%      1.3930%      1.4083%
              F          0.9870%     0.9945%      1.0346%      1.0520%

Precision, Recall and F-measure are shown in Table 4. Note that the results reported in this table are averages over all ads. First of all, we observe two facts: (1) the semantic approaches achieve better Precision. Since we use the CTR as the Precision, this result is consistent with the CTR improvement. (2) Among the semantic approaches, PLSUS generally performs better. From these two facts, we can conclude that our proposed methods are helpful for improving the Precision (CTR). An interesting observation is that the Recall of the traditional clustering approaches is higher than that of the others on our two small datasets. Considering their low Precision, we can infer that the high-CTR segments produced by CLUTO or k-Means include many users. In other words, the way to improve the CTR of a segment in the traditional approaches is to add more users to this segment.
On the contrary, semantic user segmentation can improve the CTR without building user segments with too large a population. This characteristic is very useful for accurate ad delivery. Integrating Precision and Recall, the F-measure can evaluate the overall performance of user segmentation. From the high F-measures of PLSUS and Single-PLSUS, we can draw the conclusion that semantic user segmentation performs better than classical clustering methods.

Finally, we discuss the influence of the threshold parameter in the PLSUS model. We set up a series of experiments which group users into 10 segments on the 120,000-log dataset. Apparently, if threshold > 0.5, the output of PLSUS will be constant. Thus, we vary the threshold from 0.05 to 0.5, and Figures 3-4 display the results. Since a bigger threshold means that users have a smaller chance of being placed into multiple segments, the CTR improvement decreases as the threshold becomes bigger in Figure 3. However, if we take too small a threshold, each user
will have a high chance of being placed in many segments. In this way, each segment will contain too many users, which leads to large entropy. The result in Figure 4 shows this fact. Analyzing Figures 3-4, we consider that a threshold around 0.2 gives good performance on both CTR improvement and entropy. Therefore, we set threshold = 0.2 for PLSUS in the experiment which compares the four user segmentation approaches.

Figure 3. Change of CTR improvement with increasing threshold.

Figure 4. Change of ads click Entropy with increasing threshold.

5. CONCLUSION AND FUTURE WORK
In this paper, we developed a novel semantic approach called PLSUS for BT. We compared the proposed PLSUS algorithm with two traditional clustering-based user segmentation approaches, CLUTO and k-Means. From the experimental results we can draw the conclusion that the semantic approach PLSUS brings better improvements for BT than traditional user clustering, especially in terms of CTR improvement.

In our future work, we will pay more attention to Latent Dirichlet Allocation (LDA). It has been noted that LDA gives better results in document classification than PLSA. Thus, we will study this model and attempt to apply it to user segmentation, in order to verify whether it performs better for BT than PLSUS does. In addition, we will modify the EM algorithm to parallelize PLSUS. We believe this will help to further increase the algorithmic scalability and improve the efficiency.

6. REFERENCES
[1] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3 (2003), 993-1022.
[2] T. Brants, F. Chen, and I. Tsochantaridis. Topic-based document segmentation with probabilistic latent semantic analysis. In Proceedings of CIKM '02 (Las Palmas, June 2002), ACM Press, 211-218.
[3] T. Brants and R. Stolle. Find similar documents in document collections. In Proceedings of LREC '02 (Spain, June 2002).
[4] A. Broder, M. Fontoura, V. Josifovski and L. Riedel. A semantic approach to contextual advertising. In Proceedings of SIGIR '07 (Amsterdam, July 2007), ACM Press, 559-566.
[5] D. Cohn and H. Chang. Learning to probabilistically identify authoritative documents. In Proceedings of ICML '00 (Stanford, June 2000), Morgan Kaufmann, 167-174.
[6] D. Cohn and T. Hofmann. The missing link: A probabilistic model of document content and hypertext connectivity. In Proceedings of NIPS '00 (Denver, November 2000), MIT Press.
[7] A. Das, M. Datar and A. Garg. Google news personalization: scalable online collaborative filtering. In Proceedings of WWW '07 (Banff, May 2007), ACM Press, 271-280.
[8] S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41 (1990), 391-407.
[9] D. C. Fain and J. O. Pedersen. Sponsored search: a brief history. In Bulletin of the American Society for Information Science and Technology, 2005.
[10] E. Gaussier, C. Goutte, K. Popat, and F. Chen. A hierarchical model for clustering and categorizing documents. In Advances in Information Retrieval - Proceedings of the 24th BCS-IRSG European Colloquium on IR Research (Glasgow, March 2002).
[11] M. Girolami and A. Kabán. On an equivalence between PLSI and LDA. In Proceedings of SIGIR '03 (Toronto, July 2003), ACM Press, 433-434.
[12] A. Harpale and Y. Yang. Personalized active learning for collaborative filtering. In Proceedings of SIGIR '08 (Singapore, July 2008), ACM Press, 91-97.
[13] T. Hofmann. Probabilistic latent semantic analysis. In Proceedings of UAI '99 (Stockholm, July 1999), Morgan Kaufmann, 289-296.
[14] T. Hofmann. Probabilistic latent semantic indexing. In Proceedings of SIGIR '99 (Berkeley, August 1999), ACM Press, 50-57.
[15] T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning Journal, 42 (2001), 177-196.
[16] X. Jin, Y. Zhou, and B. Mobasher. Web usage mining based on probabilistic latent semantic analysis. In Proceedings of KDD '04 (Seattle, August 2004), ACM Press, 197-205.
[17] Y. Kim, J. Chang, and B. Zhang. An empirical study on dimensionality optimization in text mining for linguistic knowledge acquisition. In Proceedings of KDD '03 (Seoul, April 2003), ACM Press, 111-116.
[18] G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. Information Processing and Management: an International Journal, 24 (1988), 513-523.
[19] J. Yan, N. Liu, G. Wang, W. Zhang, Y. Jiang and Z. Chen. How much the Behavioral Targeting can help online advertising? In Proceedings of WWW '09 (Madrid, April 2009), ACM Press, 261-270.
[20] Adlink, https://www.google.com/adsense/login/en_us/?gsessionid=Dc28hZShnCI
[21] Almond Net, http://www.almondnet.com/
[22] Blue Lithium, http://www.bluelithium.com/
[23] Burst, http://www.burstmedia.com/
[24] Double Click, http://www.doubleclick.com/products/dfa/index.aspx
[25] NebuAd, http://www.nebuad.com/
[26] Phorm, http://www.phorm.com/
[27] Revenue Science, http://www.revenuescience.com/advertisers/advertiser_solutions.asp
[28] Specificmedia, http://www.specificmedia.co.uk/
[29] TACODA, http://www.tacoda.com/
[30] Yahoo! Smart Ads, http://advertising.yahoo.com/marketing/smartads/