Adversarial Classification


Nilesh Dalvi, Pedro Domingos, Mausam, Sumit Sanghai, Deepak Verma
Department of Computer Science and Engineering
University of Washington, Seattle, WA, U.S.A.

ABSTRACT

Essentially all data mining algorithms assume that the data-generating process is independent of the data miner's activities. However, in many domains, including spam detection, intrusion detection, fraud detection, surveillance and counter-terrorism, this is far from the case: the data is actively manipulated by an adversary seeking to make the classifier produce false negatives. In these domains, the performance of a classifier can degrade rapidly after it is deployed, as the adversary learns to defeat it. Currently the only solution to this is repeated, manual, ad hoc reconstruction of the classifier. In this paper we develop a formal framework and algorithms for this problem. We view classification as a game between the classifier and the adversary, and produce a classifier that is optimal given the adversary's optimal strategy. Experiments in a spam detection domain show that this approach can greatly outperform a classifier learned in the standard way, and (within the parameters of the problem) automatically adapt the classifier to the adversary's evolving manipulations.

Categories and Subject Descriptors

H.2.8 [Database Management]: Database Applications - data mining; I.2.6 [Artificial Intelligence]: Learning - concept learning, induction, parameter learning; I.5.1 [Pattern Recognition]: Models - statistical; I.5.2 [Pattern Recognition]: Design Methodology - classifier design and evaluation, feature evaluation and selection; G.3 [Mathematics of Computing]: Probability and Statistics - multivariate statistics

General Terms

Algorithms

Keywords

Cost-sensitive learning, game theory, naive Bayes, spam detection, integer linear programming

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. KDD'04, August 22-25, 2004, Seattle, Washington, USA. Copyright 2004 ACM 1-58113-888-1/04/0008 ...$5.00.

1. INTRODUCTION

Many major applications of KDD share a characteristic that has so far received little attention from the research community: the presence of an adversary actively manipulating the data to defeat the data miner. In these domains, deployment of a KDD system causes the data to change so as to make the system ineffective. For example, in the domain of email spam detection, standard classifiers like naive Bayes were initially quite successful (e.g., [23]). Unfortunately, spammers soon learned to fool them by inserting non-spam words into emails, breaking up spam ones with spurious punctuation, etc. Once spam filters were modified to detect these tricks, spammers started using new ones [4]. Effectively, spammers and data miners are engaged in a never-ending game where data miners continually come up with new ways to detect spam, and spammers continually come up with new ways to avoid detection. Similar arms races are found in many other domains: computer intrusion detection, where new attacks circumvent the defenses put in place against old ones [7]; fraud detection, where perpetrators learn to avoid the actions that previously gave them away [5, 25]; counter-terrorism, where terrorists disguise their identity and activities in ever-shifting ways []; aerial surveillance, where targets are camouflaged with increasing sophistication [22]; comparison shopping, where merchants continually change their Web sites to avoid wrapping by shopbots [3]; file sharing, where media companies try to detect and frustrate illegal copying, and users find ways to circumvent the obstacles [4]; Web search, where webmasters manipulate pages and links to inflate their rankings, and search engines re-engineer their ranking functions to deflate them back again [9, 6]; etc.
In many of these domains, researchers have noted the presence of adaptive adversaries and the need to take them into account (e.g., [4, 5, ]), but to our knowledge no systematic approach for this has so far been developed. The result is that the performance of deployed KDD systems in adversarial domains can degrade rapidly over time, and much human effort and cost is incurred in repeatedly bringing the systems back up to the desired performance level. This paper proposes a first step towards automating this process. While complete automation will never be possible, we believe our approach and its future extensions have the potential to significantly improve the speed and cost-effectiveness of keeping KDD systems up to date with their adversaries. Notice that adversarial problems cannot simply be solved by learners that account for concept drift (e.g., []): while these learners allow the data-generating process to change

over time, they do not allow this change to be a function of the classifier itself. We first formalize the problem as a game between a cost-sensitive classifier and a cost-sensitive adversary (Section 2). Focusing on the naive Bayes classifier (Section 3), we describe the optimal strategy for the adversary against a standard (adversary-unaware) classifier (Section 4), and the optimal strategy for a classifier playing against this strategy (Section 5). We provide efficient algorithms for computing or approximating these strategies. Experiments in a spam detection domain illustrate the sometimes very large utility gains that an adversary-aware classifier can yield, and its ability to co-evolve with the adversary (Section 6). We conclude with a discussion of future research directions (Section 7).

2. PROBLEM DEFINITION

Consider a vector variable X = (X_1, ..., X_i, ..., X_n), where X_i is the ith feature or attribute, and let the instance space X be the set of possible values of X. An instance x is a vector where feature X_i has the value x_i. Instances can belong to one of two classes: positive (malicious) or negative (innocent). Innocent instances are generated i.i.d. (independent and identically distributed) from a distribution P(X|-), and malicious ones likewise from P(X|+). The global distribution is thus P(X) = P(-)P(X|-) + P(+)P(X|+). Let the training set S and test set T be two sets of (x, y) pairs, where x is generated according to P(X) and y is the true class of x. We define adversarial classification as a game between two players: Classifier, which attempts to learn from S a function y_C = C(x) that will correctly predict the classes of instances in T, and Adversary, which attempts to make Classifier classify positive instances in T as negative by modifying those instances from x to x' = A(x). (Adversary cannot modify negative instances, and thus A(x) = x for all negative x.) Classifier is characterized by a set of cost/utility parameters (see Table 1 for a summary of the notation used in this paper):

1. V_i: Cost of measuring X_i. Depending on their costs, Classifier may choose not to measure some features.

2.
U_C(y_C, y): Utility of classifying as y_C an instance with true class y. Typically, U_C(+,-) < 0 and U_C(-,+) < 0, denoting the cost of misclassifying an instance (costs being negative utilities), and U_C(+,+) > 0, U_C(-,-) > 0.

Adversary has a corresponding set of parameters:

1. W_i(x_i, x_i'): Cost of changing the ith feature from x_i to x_i'. W_i(x_i, x_i) = 0 for all x_i. We will also use W(x, x') to represent the cost of changing an instance x to x' (which is simply the sum of the costs of all the individual feature changes made).

2. U_A(y_C, y): Utility accrued by Adversary when Classifier classifies as y_C an instance of class y. Typically, U_A(-,+) > 0, U_A(+,+) < 0 and U_A(-,-) = U_A(+,-) = 0, and we will assume this henceforth.

The goal of Classifier is to build a classifier C that will maximize its expected utility, taking into account that instances may have been modified by Adversary:

U_C = Σ_{(x,y) ∈ X×Y} P(x, y) [U_C(C(A(x)), y) − Σ_{X_i ∈ X_C(x)} V_i]   (1)

where Y = {+, -} and X_C(x) ⊆ {X_1, ..., X_n} is the set of features measured by C, possibly dependent on x. We call C the optimal strategy of Classifier. The goal of Adversary is to find a feature change strategy A that will maximize its own expected utility:

U_A = Σ_{(x,y) ∈ X×Y} P(x, y) [U_A(C(A(x)), y) − W(x, A(x))]   (2)

We call A the optimal strategy of Adversary. Notice that Adversary will not change instances if the cost of doing so exceeds the utility of fooling Classifier. For example, a spammer will not modify his emails to the point where they no longer help sell his product. In practice, U_C and U_A are estimated by averages over T: U_C = (1/|T|) Σ_{(x,y) ∈ T} [U_C(C(A(x)), y) − Σ_{X_i ∈ X_C(x)} V_i], etc. Given two players, the actions available to each, and the payoffs from each combination of actions, classical game theory is concerned with finding a combination of strategies such that neither player can gain by unilaterally changing its strategy. This combination is known as a Nash equilibrium [7]. In our case, the actions are classifiers C and feature change strategies A, and the payoffs are U_C and U_A.
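The empirical test-set average of the Classifier utility in Equation 1 can be sketched as follows; the toy classifier, utility matrix, and measurement costs are illustrative values, not the paper's experimental settings.

```python
# Hypothetical sketch of the empirical estimate of Equation 1, averaged
# over a test set T. All parameter values below are made up.

def classifier_utility(T, classify, measured_features, U_C, V):
    """Average utility of `classify` over test pairs (x, y).

    T                 -- list of (x, y) pairs, x a tuple of feature values
    classify          -- function x -> predicted class ('+' or '-')
    measured_features -- function x -> set of measured feature indices
    U_C               -- dict mapping (y_C, y) -> utility
    V                 -- list of per-feature measurement costs
    """
    total = 0.0
    for x, y in T:
        total += U_C[(classify(x), y)] - sum(V[i] for i in measured_features(x))
    return total / len(T)

# Toy example: a classifier that flags any instance containing "free".
T = [(("free", "pills"), "+"), (("hello", "world"), "-")]
classify = lambda x: "+" if "free" in x else "-"
U_C = {("+", "+"): 1, ("-", "-"): 1, ("+", "-"): -10, ("-", "+"): -1}
V = [0.0, 0.0]  # negligible measurement costs, as in the spam domain
print(classifier_utility(T, classify, lambda x: {0, 1}, U_C, V))  # both correct -> 1.0
```

Because measurement costs are subtracted per measured feature, a classifier that measures an expensive feature must improve its predictions by more than that cost to be worth it.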
As the following theorem shows, some realizations of the adversarial classification game always have a Nash equilibrium.

Theorem 2.1. Consider a classification game with a binary cost model for Adversary, i.e., given a pair of instances x and x', Adversary can either change x to x' (incurring a unit cost) or it cannot (the cost is infinite). This game always has a Nash equilibrium, which can be found in time polynomial in the number of instances.

We omit the proof due to lack of space. Unfortunately, the calculation of the Nash equilibrium requires complete and perfect knowledge of the probabilities of all the instances, which in practice Adversary and Classifier will not have. Computing Nash equilibria will generally be intractable. The chief difficulty is that even in finite domains the number of available actions is doubly exponential in the number of features n. The best known algorithms for finding Nash equilibria in general (nonzero) sum games have worst-case exponential time in the number of actions, making them triply exponential in our case. Even using the more general notion of correlated equilibria, for which polynomial algorithms exist, the computational cost is still doubly exponential. Recent years have seen substantial work on computationally tractable approaches to game theory, but they focus mainly on scaling up with the number of players, not the number of actions [2]. Further, equilibrium strategies, either mixed or pure, assume optimal play on the part of the opponent, which is highly unrealistic in our case. When this assumption is not met, standard game theory gives no guidance on how to play. (This, and computational intractability, have significantly limited its practical use.) We thus leave the general existence and form of Nash or other equilibria in adversarial classification as an open

question, and propose instead to start from a set of assumptions that more closely resembles the way adversarial classification takes place in practice: Classifier initially operates assuming the data is untainted (i.e., A(x) = x for all x); Adversary then deploys an optimal plan A(x) against this classifier; Classifier in turn deploys an optimal classifier C(A(x)) against this adversary, etc. This approach has some commonality with evolutionary game theory [26], but the latter makes a number of assumptions that are inappropriate in our case (infinite population of players repeatedly matched at random, symmetric payoff matrices, players having offspring proportional to average payoff, etc.).

Table 1: Summary of the notation used in this paper.

X = (X_1, X_2, ..., X_n): Instance.
P(x): Probability distribution of untainted data.
X_i: ith feature (attribute).
X_i, X: Domain of X_i and of X, respectively.
x, x_i: An instance and the ith attribute of that instance.
S, T: Training and test set.
y_C = C(x): The Classifier function.
x_A = A(x): The Adversary transformation.
V_i: Cost of measuring X_i.
U_C(y_C, y): Utility for Classifier of classifying as y_C an instance of class y.
W_i(x_i, x_i'), W(x, x'): Cost of changing the ith feature from x_i to x_i', and instance x to x', respectively.
U_A(y_C, y): Utility accrued by Adversary when Classifier classifies as y_C an instance of class y.
X_C(x): Set of features measured by C.
LO_C(x_i): Log-odds or contribution of the ith attribute to the naive Bayes classifier (ln [P(X_i = x_i|+) / P(X_i = x_i|-)]).
gap(x): LO_C(x) − log [(U_C(-,-) − U_C(+,-)) / (U_C(+,+) − U_C(-,+))]; gap(x) > 0 classifies x as positive.
ΔU_A: Adversary's utility gain from successfully camouflaging a positive instance (U_A(-,+) − U_A(+,+)).
ΔLO_{i,x_i'}: Gain towards making x negative by changing the ith feature to x_i' (LO_C(x_i) − LO_C(x_i')).
MCC(x): Nearest instance (cost-wise) to x which naive Bayes classifies as negative.
x[X_i = x_i']: An instance identical to x except that the ith attribute is changed to x_i' ∈ X_i.
P_A(x): Probability distribution after Adversary has modified the data.
In this paper, we focus mainly on the single-shot version of the adversarial classification game: one move by each of the players. We touch only briefly on the repeated version of the game, where players continue to make moves indefinitely. A number of learning approaches to repeated games have been proposed [6], but these are also intractable in large action spaces. Other learning approaches focus on games with sequential states (e.g., [5]), while classification is stateless. We make the assumption, standard in game theory, that all parameters of both players are known to each other. Although this is unlikely to be the case in practice, it is generally plausible that each player will be able to make a rough guess of the other's (and, indeed, its own) parameters. Classification with imprecisely known costs and other parameters has been well studied in KDD (e.g., [2]), and extending this to the adversarial case is an important item for future work.

3. COST-SENSITIVE LEARNING

In this paper, we will focus on naive Bayes as the classifier to be made adversary-aware [2]. Naive Bayes is attractive because of its simplicity, efficiency, and excellent performance in a wide range of applications, including adversarial ones like spam detection [23]. Naive Bayes estimates the probability that an instance x belongs to class y as

P(y|x) = P(y) P(x|y) / P(x) = (P(y) / P(x)) Π_{i=1}^{n} P(x_i|y)   (3)

and predicts the class with highest P(y|x). The denominator P(x) is independent of the class, and can be ignored. P(x|y) = Π_{i=1}^{n} P(x_i|y) is the naive Bayes assumption. The relevant probabilities are learned simply by counting the corresponding occurrences in the training set S. We begin by extending naive Bayes to incorporate the measurement costs V_i and classification utilities U_C(y_C, y) defined in the previous section, and to maximize the expected utility (Equation 1). For now, we assume that no adversary is present (i.e., A(x) = x for all x). We remove this restriction in the next sections. Cost-sensitive learning has been the object of substantial study in the KDD literature [, 27].
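The naive Bayes estimate of Equation 3 can be sketched with toy word-presence features and hypothetical conditional probabilities (not the paper's implementation):

```python
# Toy sketch of the naive Bayes posterior of Equation 3 for a two-word
# vocabulary. The prior and conditional tables are made-up values.

def nb_posterior(x, prior, cond):
    """x: tuple of feature values; cond[y][i][x_i] = P(X_i = x_i | y)."""
    score = {}
    for y in ("+", "-"):
        p = prior[y]
        for i, x_i in enumerate(x):
            p *= cond[y][i][x_i]      # naive Bayes independence assumption
        score[y] = p                   # P(y) * P(x | y)
    total = score["+"] + score["-"]    # the denominator P(x)
    return score["+"] / total          # P(+ | x)

prior = {"+": 0.5, "-": 0.5}
cond = {
    "+": [{1: 0.8, 0: 0.2}, {1: 0.3, 0: 0.7}],   # P(x_i | spam)
    "-": [{1: 0.1, 0: 0.9}, {1: 0.4, 0: 0.6}],   # P(x_i | non-spam)
}
print(round(nb_posterior((1, 0), prior, cond), 3))  # 0.903
```

In the paper's setting these conditional probabilities are learned by counting occurrences in the training set S.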
Given a classification utility matrix U_C(y_C, y), the Bayes optimal prediction for an instance x is the class y_C that maximizes the conditional utility U(y_C|x):

U(y_C|x) = Σ_{y ∈ Y} P(y|x) U_C(y_C, y)   (4)

This is simply Equation 1 conditioned on a particular x, and ignoring the adversary and measurement costs V_i. In naive Bayes, P(y|x) is computed using Equation 3. Measurement costs are incorporated into the choice of which subset of features to measure, X_C ⊆ {X_1, ..., X_n}. Intuitively, we want to measure feature X_i only if this improves the expected utility by more than V_i. Since a feature's effect on U_C will in general depend on what other features are being measured, finding the optimal X_C requires a potentially exponential search. In practice, X_C can be found using standard feature selection algorithms with U_C as the evaluation function. We use greedy forward selection ([3])

in our experiments. (Feature selection can also be carried out online, but we do not pursue that approach here.)

4. ADVERSARY STRATEGY

In this section, we formalize the notion of an optimal strategy for Adversary. We model it as a constrained optimization problem, which can be formulated as an integer linear program. We then propose a pseudo-linear time solution to the integer LP, based on dynamic programming. We make the following assumptions.

Assumption 1. Complete Information: Both Classifier and Adversary know all the relevant parameters: V_i, U_C, W_i, U_A and the naive Bayes model learned by Classifier on S (including X_C, P(y), and P(x_i|y) for each feature and class).

Assumption 2. Adversary assumes that Classifier is unaware of its presence (i.e., Adversary assumes that C(x) is the naive Bayes model described in the previous section).

To defeat Classifier, Adversary needs only to modify features in X_C, since the others are not measured. From Equation 3:

log [P(+|x) / P(-|x)] = log [P(+) / P(-)] + Σ_{X_i ∈ X_C} log [P(x_i|+) / P(x_i|-)]   (5)

For brevity, we will use the notation LO_C(x) = log [P(+|x) / P(-|x)] and LO_C(x_i) = log [P(x_i|+) / P(x_i|-)], where LO is short for log odds. Naive Bayes classifies an instance x as positive if the expected utility of doing so exceeds that of classifying it as negative, i.e., if U_C(+,+)P(+|x) + U_C(+,-)P(-|x) > U_C(-,+)P(+|x) + U_C(-,-)P(-|x), or

P(+|x) / P(-|x) > [U_C(-,-) − U_C(+,-)] / [U_C(+,+) − U_C(-,+)]   (6)

Let the log of the right hand side be LT(U_C) (log threshold). Then naive Bayes classifies instance x as positive if LO_C(x) > LT(U_C), or equivalently if gap(x) > 0, where gap(x) = LO_C(x) − LT(U_C). If the instance is classified as negative, Adversary does not need to do anything. Let us assume, then, that x is classified as positive, i.e., gap(x) > 0. The objective of Adversary is to make some set of feature changes to x that will cause it to be classified as negative, while incurring the minimum possible cost. This causes Adversary to gain a utility of ΔU_A = U_A(-,+) − U_A(+,+). Thus Adversary will transform x as long as the total cost incurred is less than ΔU_A, and not otherwise.
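The equivalence just derived can be checked numerically: with hypothetical posteriors and a made-up utility matrix, maximizing the conditional utility of Equation 4 makes exactly the same predictions as the gap(x) > 0 test of Equations 5-6.

```python
# Sketch (illustrative values only) checking that the utility-maximizing
# rule (Equation 4) coincides with the log-odds threshold rule gap(x) > 0.
import math

U_C = {("+", "+"): 1, ("-", "-"): 1, ("+", "-"): -10, ("-", "+"): -1}

def predict_by_utility(p_plus):
    posterior = {"+": p_plus, "-": 1 - p_plus}
    U = {y_C: sum(posterior[y] * U_C[(y_C, y)] for y in ("+", "-"))
         for y_C in ("+", "-")}
    return "+" if U["+"] > U["-"] else "-"

def predict_by_gap(p_plus):
    LO = math.log(p_plus / (1 - p_plus))                      # LO_C(x)
    LT = math.log((U_C[("-", "-")] - U_C[("+", "-")]) /
                  (U_C[("+", "+")] - U_C[("-", "+")]))        # LT(U_C) = log(11/2)
    return "+" if LO - LT > 0 else "-"                        # gap(x) > 0

# The two rules agree everywhere; with this asymmetric matrix the
# break-even posterior is P(+|x) = 11/13, about 0.846, rather than 0.5.
for p in (0.1, 0.5, 0.8, 0.9, 0.99):
    assert predict_by_utility(p) == predict_by_gap(p)
print(predict_by_utility(0.8), predict_by_utility(0.9))
```

The heavy false-positive penalty pushes the decision threshold far above 0.5, which is why the adversary only needs to drive the log odds below LT(U_C), not below zero.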
We formulate the problem of finding an optimal strategy for Adversary as an integer linear program. Recall that X_i is the domain of X_i. For x_i' ∈ X_i, let δ_{i,x_i'} be an integer (binary) variable which takes the value one if the feature X_i is changed from x_i to x_i', and zero otherwise. Let the new data item thus obtained be x'. The cost of transforming x to x' is W(x, x') = Σ_i W_i(x_i, x_i'), and the resulting change in log odds is LO_C(x') − LO_C(x) = Σ_i [LO_C(x_i') − LO_C(x_i)]. Define ΔLO_{i,x_i'} = LO_C(x_i) − LO_C(x_i'). This is the gain in Adversary's objective of making the instance negative. Note that ΔLO_{i,x_i} = 0; this represents the case where X_i has not been changed. To transform x so that the new instance is classified as negative, Adversary needs to change the values of some features such that the sum of their gains (decrease in log odds) is more than gap(x). Thus, to find the minimum cost changes required to transform this instance into a negative instance, we need to solve the following integer (binary) linear program:

min Σ_{X_i ∈ X_C} Σ_{x_i' ∈ X_i} W_i(x_i, x_i') δ_{i,x_i'}

s.t. Σ_{X_i ∈ X_C} Σ_{x_i' ∈ X_i} ΔLO_{i,x_i'} δ_{i,x_i'} ≥ gap(x)

Σ_{x_i' ∈ X_i} δ_{i,x_i'} ≤ 1,   δ_{i,x_i'} ∈ {0, 1}

The binary δ_{i,x_i'} values encode which features are changed to which values. The objective minimizes the cost incurred in this transformation. The first constraint makes sure that the new instance will be classified as negative. The second constraint encodes the requirement that a feature can only have a single value in an instance. We will call the transformed instance obtained by solving this integer linear program the minimum cost camouflage (MCC) of x. In other words, MCC(x) is the nearest instance (cost-wise) to x which naive Bayes classifies as negative. After solving this integer LP, Adversary transforms the instance only if the minimum cost obtained is less than ΔU_A. Therefore, letting NB(x) be the naive Bayes class prediction for x,

A(x) = MCC(x) if NB(x) = + and W(x, MCC(x)) < ΔU_A; A(x) = x otherwise.   (7)

The above integer (binary) LP problem is NP-hard, as the 0-1 knapsack problem can be reduced to it [8]. However, a pseudo-linear time algorithm can be obtained by discretizing LO_C, which allows dynamic programming to be used.
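This dynamic program over discretized log odds can be sketched with memoized top-down recursion; the per-feature gain and cost tables below are hypothetical, and the sketch is not the authors' implementation.

```python
# Sketch of the FindMCC dynamic program with memoization, in discretized
# log-odds space. delta_LO[i][v] and cost[i][v] are hypothetical tables:
# the (integer) log-odds gain and the cost of changing feature i+1 to v.
from functools import lru_cache

def find_mcc(n, W, delta_LO, cost):
    """Minimum-cost changes to the first n features whose total log-odds
    gain is at least W. Returns (min_cost, changes), where changes is a
    list of (feature, value) pairs, or (inf, None) if infeasible."""
    INF = float("inf")

    @lru_cache(maxsize=None)
    def rec(i, w):
        if w <= 0:                     # gap already filled
            return (0, ())
        if i == 0:                     # no features left, gap unfilled
            return (INF, None)
        best_cost, best_list = rec(i - 1, w)   # leave feature i unchanged
        for v, gain in delta_LO[i - 1].items():
            if gain > 0:
                sub_cost, sub_list = rec(i - 1, w - gain)
                if sub_list is not None and sub_cost + cost[i - 1][v] < best_cost:
                    best_cost = sub_cost + cost[i - 1][v]
                    best_list = sub_list + ((i, v),)
        return (best_cost, best_list)

    c, lst = rec(n, W)
    return c, (None if lst is None else list(lst))

# Two binary features: changing feature 1 gains 2 at cost 1, changing
# feature 2 gains 3 at cost 5. To fill a gap of 4 we need both changes.
delta_LO = [{"on": 2}, {"on": 3}]
cost = [{"on": 1}, {"on": 5}]
print(find_mcc(2, 4, delta_LO, cost))  # (6, [(1, 'on'), (2, 'on')])
```

Memoization ensures each (i, w) state is solved once, matching the pseudo-linear running time discussed below; the adversary then applies the returned change list only if its cost is below ΔU_A.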
Although the algorithm is approximate, it can compute the solution to arbitrary precision. The procedure is shown in Algorithm 1. Function FindMCC(i, w) computes the minimum cost needed to change the log odds of x by w using only the first i features. It returns the pair (MinCost, MinList), where MinCost is the minimum cost and MinList is a list of feature-value pairs denoting the changes to be made to x. (In each pair, i is the feature index and x_i' is the value it should be changed to.) To obtain the optimal adversary strategy, we need to compute FindMCC(n, W), where the integer W is gap(x) after discretization and n is the number of features in X_C. Note that ΔLO_{i,x_i'} is now a (non-negative) integer in the discretized log odds space. The algorithm can be efficiently implemented using top-down recursion with memoization (so that no recursive call is computed more than once). Note that although the features can be considered in any order, some orderings may find solutions faster than others. If, in the discretized space, instance x requires a gap of W to be filled by the transformation, then the algorithm runs in time at most O(W Σ_i |X_i|) (since FindMCC is evaluated for at most W values of w per feature, and each evaluation takes O(|X_i|) time). Hence it is pseudo-linear in the number of features. Pseudo-linearity may be expensive for large values of W or for cases where features have large domains. We now present two pruning rules, one for use in the first situation, and one for the second.

Algorithm 1 FindMCC(i, w)
  if w ≤ 0 then return (0, {}) end if
  if i = 0 then return (∞, Undefined) end if
  MinCost ← ∞
  MinList ← Undefined
  for x_i' ∈ X_i do
    if ΔLO_{i,x_i'} > 0 then
      (CurCost, CurList) ← FindMCC(i − 1, w − ΔLO_{i,x_i'})
      CurCost ← CurCost + W_i(x_i, x_i')
      CurList ← CurList + (i, x_i')
      if CurCost < MinCost then
        MinCost ← CurCost
        MinList ← CurList
      end if
    end if
  end for
  return (MinCost, MinList)

Algorithm 2 A(x)
  W ← gap(x) (discretized)
  (MinCost, MinList) ← FindMCC(n, W)
  if NB(x) = + and MinCost < ΔU_A then
    newx ← x
    for all (i, x_i') ∈ MinList do
      newx_i ← x_i'
    end for
    return newx
  else
    return x
  end if

Lemma 4.1. If max_{i,x_i'} [ΔLO_{i,x_i'} / W_i(x_i, x_i')] < gap(x) / ΔU_A, then A(x) = x.

This lemma is easy to prove and can be used to detect the instances for which MinCost > ΔU_A. Instances which are positive by very large gap(x) values can thus be pruned early on, and we need to run the algorithm only for more reasonable values of gap(x). Our second pruning strategy can be employed in situations where the cost metric is sufficiently coarsely discretized. We globally sort all the (i, x_i') tuples in increasing order of W_i(x_i, x_i'). For identical values of W_i(x_i, x_i'), we use decreasing order of ΔLO_{i,x_i'} as the secondary key. For a particular (i, W_i(x_i, x_i')) combination, over all x_i', we can remove all but the first entry in the list. This is valid because, if X_i is changed in the optimal solution, then taking the value x_i' with the highest ΔLO_{i,x_i'} will also yield the optimal solution. We can prune even further by only considering the first k tuples in each W such that Σ_{j=1}^{k} ΔLO_{j,x_j'} > gap(x) and Σ_{j=1}^{k−1} ΔLO_{j,x_j'} < gap(x). It is easy to see that this pruning does not affect the optimal solution. Thus, if the feature-changing costs W_i are sufficiently coarsely discretized, we will never need to consider more than a few tuples for each integer value of W. Our algorithm will thus run efficiently even when the domains of features are large.

5. CLASSIFIER STRATEGY

We now describe how Classifier can adapt to the adversary strategy described in the previous section. We derive the optimal C(x) taking into account A(x), and give an efficient algorithm for computing it. We make the following additional assumptions.

Assumption 3. Classifier assumes that Adversary uses its optimal strategy to modify test instances (Algorithm 2).

Assumption 4. The training set S used for learning the initial naive Bayes classifier is not tampered with by Adversary (i.e., S is drawn from the real distribution of adversarial and non-adversarial data).

Assumption 5. For all X_i, W_i(x_i, x_i') is a semi-metric, i.e., it has the following properties:
1. W_i(x_i, x_i') ≥ 0, and the equality holds iff x_i = x_i'.
2. W_i(x_i, x_i'') ≤ W_i(x_i, x_i') + W_i(x_i', x_i'').

The above also implies that W(x, x'') ≤ W(x, x') + W(x', x''). The triangle inequality for cost holds in most real domains. This is because to change a feature from x_i to x_i'' the adversary always has the option of changing it via x_i', i.e., with x_i' as an intermediate value. The goal of Classifier, as in Section 3, is to predict for each instance x' the class that maximizes its conditional utility (Equation 4). The difference is that now we want to take into account the fact that Adversary has tampered with the data. Of all the probabilities used by Classifier (Equation 3), the only one that is changed by Adversary is P(x|+); P(+), P(-) and P(x|-) remain unaltered. Let P_A(x'|+) be the post-adversary version of P(x'|+). Then

P_A(x'|+) = Σ_{x ∈ X} P(x|+) P_A(x'|x, +)   (8)

In other words, the probability of observing an instance x' is the probability that the adversary generates some instance x and then modifies it into x', summed over all x. Since P_A(x'|x, +) = 1 if A(x) = x' and P_A(x'|x, +) = 0 otherwise,

P_A(x'|+) = Σ_{x ∈ X_A(x')} P(x|+)   (9)

where X_A(x') = {x : x' = A(x)}. There are two cases where Adversary will leave an instance x untampered (i.e., A(x) = x): when naive Bayes predicts it is negative, since then no action is necessary, and when there is no transformation of x whose cost is lower than the utility gained by making it appear negative. Thus

P_A(x'|+) = Σ_{x ∈ X'_A(x')} P(x|+) + I(x') P(x'|+)   (10)

where X'_A(x') = X_A(x') \ {x'}, I(x') = 1 if NB(x') = - or W(x', MCC(x')) ≥ ΔU_A, and I(x') = 0 otherwise (see Equation 7 and Algorithm 2). The untampered probabilities P(x'|+) are estimated using the naive Bayes model (Equation 3): P(x'|+) = Π_{X_i ∈ X_C} P(X_i = x_i'|+). The optimal adversary-aware classification algorithm C(x') is shown below, with P̂() used to denote the probability P() estimated from the training data S. P̂_A(x'|+) is given by Equation 10 using the empirical estimates of P(x|+). The second term in Equation 10, I(x')P(x'|+), is easy to compute given calls to NB(x') and Algorithm 1 to determine if x' has a feasible camouflage. The remainder of this section is devoted to efficiently computing the first term, Σ_{x ∈ X'_A(x')} P(x|+).

Algorithm 3 C(x')
  P-(x') ← P̂(-) Π_i P̂(X_i = x_i'|-)
  P+(x') ← P̂(+) P̂_A(x'|+)
  U(+|x') ← P+(x') U_C(+,+) + P-(x') U_C(+,-)
  U(-|x') ← P+(x') U_C(-,+) + P-(x') U_C(-,-)
  if U(+|x') > U(-|x') then return + else return -

One solution is to iterate through all possible positive examples and check if x' is their minimum cost camouflage. This is, of course, not feasible. We now study some theoretical properties of the MCC function which will later be used to prune this search. Recall that if NB(x) = - then gap(x) ≤ 0, and vice versa. We define x[X_i = x_i'] as a data instance which is identical to x except that its ith feature is changed to x_i'.

Lemma 5.1. Let x_A be any positive instance and let x' = MCC(x_A). Then, for all i such that (x_A)_i ≠ x_i', gap(x') + LO_C((x_A)_i) − LO_C(x_i') > 0.

Proof. Let x'' = x'[X_i = (x_A)_i]. This implies that W(x_A, x'') < W(x_A, x'), since x'' differs from x_A on one less feature than x'. Also gap(x'') = gap(x') + LO_C((x_A)_i) − LO_C(x_i'). Since x' is MCC(x_A) and W(x_A, x'') < W(x_A, x'), NB(x'') must be +, and therefore gap(x'') > 0, proving the result.

Given a negative instance x', for each feature i we compute all values v that satisfy Lemma 5.1. To compute X_A(x'), we only need to take combinations of these feature-value pairs and check if x' is their MCC. This can substantially reduce the number of positive instances in our search space.
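The quantity being estimated here, the post-adversary distribution of Equations 9-10, can be computed by brute force when the instance space is tiny. A hypothetical sketch (the instances and strategy A below are made up):

```python
# Equation 9 by brute force on a small enumerated instance space: the
# post-adversary probability of observing x' is the total untainted
# probability of the instances the adversary maps to x'.

def post_adversary_dist(P_plus, A):
    """P_A(x'|+) = sum of P(x|+) over x in X_A(x') = {x : A(x) = x'}."""
    P_A = {}
    for x, p in P_plus.items():
        P_A[A(x)] = P_A.get(A(x), 0.0) + p
    return P_A

# "loud" spam gets camouflaged as "quiet"; "quiet" and "plain" are left
# alone (already classified negative, or too costly to change).
P_plus = {"loud": 0.5, "quiet": 0.3, "plain": 0.2}
A = {"loud": "quiet", "quiet": "quiet", "plain": "plain"}.get
print(post_adversary_dist(P_plus, A))  # {'quiet': 0.8, 'plain': 0.2}
```

The pruning results that follow exist precisely because this enumeration over all positive instances is infeasible in realistic feature spaces.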
The search space can still potentially contain an exponential number of instances. However, after we employ the next theorem, we obtain a fast algorithm for estimating the set X_A(x'). Notice that the optimal feature subset X_C for the adversary-aware classifier may be different from the adversary-unaware one, but can be found using the same methods (see Section 3).

Theorem 5.2. Let x_A be a positive instance such that x' = MCC(x_A). Let D be the set of features that are changed in x_A to obtain x'. Let E be a non-trivial subset of D, and let x'_A be an instance that matches x' for all features in E and x_A for all others, i.e., (x'_A)_i = x_i' if X_i ∈ E, (x'_A)_i = (x_A)_i otherwise. Then x' = MCC(x'_A).

Proof. By contradiction. Suppose x'' = MCC(x'_A) and x'' ≠ x'. Then W(x'_A, x'') < W(x'_A, x'). Also, since E ⊆ D, by definition of W we have W(x_A, x') = W(x_A, x'_A) + W(x'_A, x'). So by the triangle inequality

W(x_A, x'') ≤ W(x_A, x'_A) + W(x'_A, x'') < W(x_A, x'_A) + W(x'_A, x') = W(x_A, x')

Thus W(x_A, x'') < W(x_A, x'), which gives a contradiction, since then x' ≠ MCC(x_A). This completes the proof.

The above theorem implies that if x_A is a positive instance such that x' ≠ MCC(x_A), then x' cannot be the MCC of any other instance x'_A such that the changed features from x'_A to x' form a superset of the changed features from x_A to x'. We now use the following result to obtain bounds on X_A(x').

Corollary 5.3. Let FV be the set of feature-value pairs that satisfy Lemma 5.1. Let GV ⊆ FV be such that (i, x_i) ∈ GV iff x'[X_i = x_i] ∈ X_A(x'). Then for every x_A ∈ X_A(x'), the set of feature-value pairs where x_A and x' differ forms a subset of GV.

From the above corollary, after we compute GV, we only need to consider the combinations of feature-value pairs that are in GV and change those in the observed instance x'. Theorem 5.2 also implies that performing single changes from GV returns instances in X_A(x'). This gives us the following bounds on Σ_{x ∈ X'_A(x')} P(x|+).

Theorem 5.4. Let x' be any instance and let GV be the set defined in Corollary 5.3.
Let G = {i : (i, x_i) ∈ GV} and let X_i^G = {x_i ∈ X_i : (i, x_i) ∈ GV}. Then

Σ_{(i,x_i) ∈ GV} P(x'[X_i = x_i]|+) ≤ Σ_{x ∈ X'_A(x')} P(x|+) ≤ P(x'|+) Π_{i ∈ G} [1 + Σ_{x_i ∈ X_i^G} P(x'[X_i = x_i]|+) / P(x'|+)]

Proof. The proof of the first inequality follows directly from Theorem 5.2, which states that changing any single feature of x' to any value in GV returns an instance from X_A(x'). To prove the second inequality, we observe that the expression on the right side, when expanded, gives the sum of probabilities of all possible changes in x' due to the set GV, and X'_A(x') is a subset of those instances.

Given these bounds, we can classify a test instance as follows. If plugging the lower bound into Algorithm 3 gives U(+|x') > U(-|x'), then the instance can be safely classified as positive. Similarly, if using the upper bound gives U(+|x') < U(-|x'), then the instance is negative. If GV is large, so is the lower bound on P_A(X_A(x')|+). If GV is small, we can do an exhaustive search over the subsets of GV and check if each of the items considered belongs to X_A(x'). In our experiments we find that using the lower bound for making predictions works well in practice.

6. EXPERIMENTS

We implemented an adversarial classifier system for the spam filtering domain. Spam is an attractive testbed for our methods because of its practical importance, its rapidly evolving adversarial nature, the wide availability of data (in contrast to many other adversarial domains), the fact that naive Bayes is the de facto standard classifier in this area, and its richness as a challenge problem for KDD [4]. One disadvantage of spam as a testbed is that feature measurement costs are generally negligible, leaving this part of our framework untested. (In contrast, in a domain like counter-terrorism feature measurements are a major issue, often requiring large numbers of personnel and expensive equipment, raising privacy issues, and imposing costs on millions of individuals and transactions.) We used the following two datasets in our experiments:

Ling-Spam [24]: This corpus contains the legitimate discussions on a linguistics mailing list and the spam emails received by the list. There are 2412 non-spam messages and 481 spam ones. Thus, around 16.6% of the corpus is spam.

Email-Data [9]: This corpus consists of texts from 1431 emails, with 642 non-spam messages (conferences (370) and jobs (272)) and 789 spam ones.

Each of these datasets was divided into ten parts for ten-fold cross-validation. We defined three scenarios, as described below, and applied our implementation of naive Bayes (NB) and the adversary-aware classifier (AC) to each. We used ifile [2] for preprocessing emails.

6.1 Scenarios

The three spam filtering scenarios that we implemented differ in how the email is represented for the classifier, how the adversary can modify the features, and at what cost.

Add Words (AW): This is the simplest scenario. The binomial model of text for classification is used [8]: there is one Boolean feature per word, denoting whether or not the word is present in the email. The only way to modify the email is by adding words which are not already present, and each word added incurs unit cost.
This is akin to saying that the original mail has content that the spammer is not willing to change, and thus the spammer only adds unnecessary words to fool the spam detector. In this model, Adversary's strategy reduces to a greedy search where it adds words in decreasing order of their log-odds gain ΔLO.

Add Length (AL): This model is very similar to AW, except that the cost of adding a word is proportional to the number of characters in it. This corresponds to a hypothetical situation where Adversary needs to pay a certain amount per bit of mail transmitted, and wants to minimize the number of characters sent.

Synonym (SYN): Generally, spammers want to avoid detection while preserving the semantic content of their messages. Thus, in this scenario we consider the case where Adversary changes the mail by replacing the existing words with other semantically similar words. For example, a spammer attempting to sell a product would like to send emails claiming it to be cheap, but without the use of words like free, sale, etc. This is because the naive Bayes classifier uses the presence or absence of specific words with high LO_C to classify emails, independent of their actual meaning. Given the above intent, we define this scenario as follows. We use the multinomial model of text [8]: an email is viewed as a sequence of word positions, with one feature per position, and the domain of each feature is the set of words in the vocabulary. In this case, the number of times a word occurs in an email is important. However, the word order is disregarded (i.e., the probability of word occurrence is assumed to be independent of location). For each word, we obtain a list of synonyms from the WordNet lexical database [28]. A word in an email can then be changed only to one of its synonyms, at unit cost. It is easy to see that the costs used in all scenarios are metrics, so we can apply Lemma 5.1 and Theorem 5.2.

Table 2: Utility matrices for Adversary and Classifier used in the experiments.

        (+,+)   (+,-)          (-,+)   (-,-)
U_A       0       0              20      0
U_C       1    -1/-10/-100      -1       1
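The AW adversary strategy described above (greedy addition of the most innocent-looking absent words) can be sketched as follows; the word list and log-odds values are hypothetical.

```python
# Sketch of the Add Words greedy strategy: add absent words, most
# spam-score-reducing first, until gap(x) is closed or the budget
# (Delta-U_A in unit word costs) is spent. Illustrative values only.

def add_words_attack(email_words, LO_C, gap, budget, unit_cost=1):
    """Return the list of words to add, most gap-reducing first.

    email_words -- set of words already in the email (cannot be re-added)
    LO_C        -- dict word -> log-odds ln[P(w|+)/P(w|-)]; adding a word
                   with negative LO_C lowers the spam log odds by |LO_C|
    """
    added, spent = [], 0
    for word in sorted(LO_C, key=LO_C.get):   # most "innocent" words first
        if gap <= 0 or spent + unit_cost > budget:
            break
        if word not in email_words and LO_C[word] < 0:
            added.append(word)
            gap += LO_C[word]        # each added word closes part of the gap
            spent += unit_cost
    return added if gap <= 0 else []  # give up if the gap cannot be closed

LO_C = {"linguistics": -2.0, "meeting": -1.5, "viagra": 3.0}
print(add_words_attack({"viagra"}, LO_C, gap=3.0, budget=20))
```

Returning the empty list when the gap cannot be closed mirrors Equation 7: the adversary leaves the instance unaltered rather than pay for a camouflage that fails or costs more than ΔU_A.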
The classification utility matrix U_A for Adversary we used is such that whenever a spam email is classified as non-spam the adversary receives a utility of 20, and all other entries are 0. Thus, in the SYN and AW scenarios 20 word replacements/additions are allowed. In the AL scenario, the cost of adding a character is set to 0.1, and as a result 200 character additions are allowed. For Classifier, we ran the experiments with three different utility matrices (U_C). All matrices had a utility of 1 for correctly classifying an email and a penalty (negative utility) of 1 for incorrectly classifying a spam email as non-spam. The penalty for incorrectly classifying a non-spam email as spam was set to 1 in one matrix, 10 in another, and 100 in the third. This reflects the fact that, in spam filtering, the critical and dominant cost is that of false positives: letting a single spam email get through to the user is a relatively insignificant event, but filtering out a non-spam email is highly undesirable (and potentially disastrous). The different (+, -) values correspond to the different values of the λ parameter in Sakkis et al. [24]. Table 2 summarizes the utility parameters used in the experiments.

6.2 Results

The results of running the various algorithms on the Ling-Spam and Email-Data datasets are shown in Figures 1 and 2 respectively.

Figure 1: Utility results on the Ling-Spam dataset for different values of (+, -).

Figure 2: Utility results on the Email-Data set for different values of (+, -).

The figures show the average utilities obtained (with a maximum value of 1.0) by naive Bayes and the adversary-aware classifier under the different scenarios and different U_C matrices. The utility of naive Bayes on the original, untampered data (NB-PLAIN) is represented by the black bar on the left. The remaining black bars represent the performance of naive Bayes on tainted data in the three scenarios, and the white bars the performance of the corresponding adversary-aware classifier. We observe that Adversary significantly degrades the performance of naive Bayes in all three scenarios and with all three Classifier utility matrices. This effect is more pronounced in the Email-Data set because it has a higher percentage of spam emails than Ling-Spam. For naive Bayes on Email-Data, the cost of misclassifying spam emails exceeds the utility of the correct predictions, causing the overall utility to be negative. In contrast, Classifier was able to correctly identify a large percentage of the spam emails in all cases, and its accuracy on non-spam emails was also quite high.

To help in interpreting these results, we report the numbers of false negatives and false positives for the Ling-Spam dataset in Table 3. We observe that as the misclassification penalty for non-spam increases, fewer non-spam emails are classified incorrectly, but naturally more spam emails are misclassified as non-spam. Notice that the adversarial classifier never produces false positives (except for the SYN scenario with (+, -) = 1). As a result, its average utility stays approximately constant even when (+, -) changes by two orders of magnitude. An interesting observation is that Adversary's manipulations can actually help Classifier to reduce the number of false positives. This is because Adversary is unlikely to send a spam email unaltered, and as a result many non-spam emails which were previously classified as positive are now classified as negative.

Table 3: False positives and false negatives for naive Bayes and the adversary-aware classifier on the Ling-Spam dataset. The total number of positives in this dataset is 481, and the total number of negatives is 2412.

We also compared the running times of our algorithms for the three scenarios, for both the adversary and classifier strategies. For both the AW and SYN models, the average running times were less than 5 ms per email.
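The average utilities plotted in Figures 1 and 2 can, in principle, be computed from a confusion matrix and a utility matrix. A minimal sketch, assuming the utility values stated above (1 for each correct decision, -1 for a false negative, and a false-positive penalty of -1, -10, or -100); the `average_utility` function and the `u_10` example matrix are illustrative names, not the paper's code.

```python
def average_utility(tp, tn, fp, fn, u):
    """Average classifier utility over a test set, given confusion
    counts and a utility matrix u[(predicted, actual)], where '+'
    denotes spam and '-' denotes non-spam."""
    n = tp + tn + fp + fn
    total = (tp * u[("+", "+")] + tn * u[("-", "-")]
             + fp * u[("+", "-")] + fn * u[("-", "+")])
    return total / n

# One of the three utility matrices described in the text:
# correct decisions earn 1, a false negative costs 1, and the
# false-positive penalty here is the middle setting, 10.
u_10 = {("+", "+"): 1, ("-", "-"): 1, ("+", "-"): -10, ("-", "+"): -1}
```

This also shows why naive Bayes can end up with negative overall utility on Email-Data: once the weighted cost of its errors exceeds the reward for its correct predictions, the sum above drops below zero.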
For AL, the average running time of the classifier strategy was less than 5 ms per email, while the running time of the adversary strategy was around 50 ms per email. The adversary running time for AW was small because one can use a simple greedy algorithm to implement the adversary strategy. In the SYN model, the search space is small because there are few synonyms per word. Hence the time taken by both algorithms is small. However, in the AL model, when the log odds of emails were high (> 50), the adversary took longer. On the other hand, the adversarial classifier, after using the pruning strategies, had to consider very few instances, and these had small log odds. Hence its running time was quite small. From the experiments we can conclude that in practice we can use the pruning strategies for Adversary and Classifier to reduce the search space and time without compromising accuracy.

To simulate the effects of non-adversarial concept drift (a reality in spam and many other adversarial domains), we also tried classifying the emails in the Email-Data set using NB and AC trained on the Ling-Spam data. As the frequencies of spam and non-spam emails are different in the two datasets, we ran the classifiers without considering the class priors. For both algorithms, the results obtained were only marginally worse than the results obtained by training on the Email-Data set itself, demonstrating the robustness of the algorithms.

6.3 Repeated Game

In Sections 4 and 5 we discussed one round of the game that goes on between Adversary and Classifier. It consists of one ply of Adversary, in which it finds the best strategy to fool Classifier, and then one ply of Classifier to adapt to it. Both parties can continue playing this game. However, Classifier is no longer using a simple naive Bayes algorithm. In these experiments, we make the simplifying assumption that Adversary continues to model the classifier as naive Bayes, and uses the techniques that we have developed for naive Bayes.
At the end of each round, Adversary learns a naive Bayes classifier based on the outputs of the actual classifier that Classifier is using in that round. We denote the classifier used by Classifier in round i by C_i.

Let NB_i be the classifier that Adversary learns from it. Then A_i(x) is defined as the optimal adversary strategy (as in Algorithm 2) to fool NB_i instead of the original NB learned on S. The data coming from Adversary in round i is A_i applied to the original test data to produce T_i, i.e., T_i = A_i(T). Classifier uses Algorithm 3 based on NB_i to classify T_i, i.e., Y_i = C_i(T_i). The key insight is that a new naive Bayes model can be trained on (T_i, Y_i), and that can serve as NB_{i+1} for the next round. We compared the performance of NB_i with that of C_i and found them to be very similar, justifying our assumption, as Adversary is not reacting to a crippled Classifier but to one which performs almost as well as the optimal Classifier. This procedure can then be repeated for an arbitrary number of rounds.

Figure 3 shows the results of this experiment on the Ling-Spam dataset for the AW scenario. The X-axis is the round of the game, and the Y-axis is the average utility obtained by C_i (the i-th adversary-aware classifier). The graphs also show the average utility obtained by NB_i(T_i), to demonstrate the effect of using an adversary-aware classifier at each round. In all rounds of the game, Classifier using the adversary-aware strategy performs significantly better than the plain naive Bayes. As expected, the difference is highest when the penalty for misclassifying non-spam is 1. Furthermore, in this scenario Classifier and Adversary never reach an equilibrium, and utility alternates between two values. This is surprising at first glance, but a closer examination elucidates the reason. In the AW scenario, Adversary can only add words. So the only way of tampering with an instance is to add good words with very low (negative) log odds (based on NB_i in the i-th round). Let the top few good words be GW_i. These would have a high frequency of occurrence in spam emails of T_i. When NB_{i+1} is learned on (T_i, Y_i), these words no longer have a low log odds and hence are not in GW_{i+1}. Thus, A_{i+1} ignores these words, so they regain a low log odds in the next round's model and re-enter the good-word set! This phenomenon causes the log odds of a word to oscillate, giving rise to the periodic average utility in Fig.
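The round structure just described can be written schematically. The four callables below stand in for components the paper defines elsewhere (naive Bayes training, Algorithm 2's optimal adversary, Algorithm 3's adversary-aware classifier); this loop only fixes the data flow T_i = A_i(T), Y_i = C_i(T_i), NB_{i+1} learned from (T_i, Y_i), and is an illustrative sketch rather than the authors' code.

```python
def repeated_game(initial_nb, learn_nb, optimal_attack, aware_classify,
                  test_set, rounds):
    """Schematic repeated game: each round, the adversary attacks its
    current naive Bayes model of the classifier, the adversary-aware
    classifier labels the tampered data, and the adversary re-learns
    naive Bayes from those labels for the next round."""
    nb = initial_nb
    history = []
    for _ in range(rounds):
        tampered = [optimal_attack(nb, x) for x in test_set]  # T_i = A_i(T)
        labels = [aware_classify(nb, x) for x in tampered]    # Y_i = C_i(T_i)
        history.append((tampered, labels))
        nb = learn_nb(tampered, labels)                       # NB_{i+1} from (T_i, Y_i)
    return history
```

Note that the attack is applied to the original test data each round, not to the previous round's tampered data, matching the definition T_i = A_i(T) above.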
3.

Figure 3: Utility of naive Bayes and the adversarial classifier for a repeated game in the AW scenario and Ling-Spam dataset. The number in parentheses is (+,-).

7. FUTURE WORK

This paper is only the first step in a potentially very rich research direction. The next steps include:

Repeated games. In reality, Adversary and Classifier never cease to evolve against each other. Thus, we need to find the optimal strategy A for Adversary taking into account that an adversarial classifier C(A(x)) is being used, then find the optimal strategy for Classifier taking A into account, and so on indefinitely. To what extent this can be done analytically is a key open question.

Theory. We would like to answer questions like: What are the most general conditions under which adversarial classification problems have Nash or correlated equilibria? If so, what form do they take, and are there cases where they can be computed efficiently? Under what conditions do repeated games converge to these equilibria? Etc.

Incomplete information. When Classifier and Adversary do not know each other's parameters, and Adversary does not know the exact form of the classifier, additional learning needs to occur, and the optimal strategies need to be made robust to imprecise knowledge.

Approximately optimal strategies. When finding the optimal strategy is too computationally expensive, approximate solutions and weaker notions of optimality become necessary. Also, real-world adversaries will often act suboptimally, and it would be good to take this into account.

Generalization to other classifiers. We would like to extend the ideas in this paper to classifiers like decision trees, nearest neighbor, support vector machines, etc.

Interaction with humans. Because adversaries are resourceful and unpredictable, adversarial classifiers will always require regular human intervention. The goal is to make this as easy and productive as possible.
For example, extending the framework to allow new features to be added at each round of the game could be a good way to combine human and automatic refinement of the classifier.

Multiple adversaries. Classification games are often played against more than one adversary at a time (e.g., multiple spammers, intruders, fraud perpetrators, terrorist groups, etc.). Handling this case is a natural but nontrivial extension of our framework.

Variants of the problem. Our problem definition does not fit all classification games, but it could be extended appropriately. For example, adversaries may produce innocent as well as malicious instances, they may deliberately seek to make the classifier produce false positives, detection of some malicious instances may deter them from producing more, etc.

Other domains and tasks. We would like to apply adversarial classifiers to computer intrusion detection, fraud detection, face recognition, etc., and to develop adversarial extensions to related data mining tasks (e.g., adversarial ranking for search engines).

8. CONCLUSION

In domains ranging from spam detection to counter-terrorism, classifiers have to contend with adversaries manipulating the data to produce false negatives. This paper formalizes the problem and extends the naive Bayes classifier to optimally detect and reclassify tainted instances, by taking into account the adversary's optimal feature-changing strategy. When applied to spam detection in a variety of scenarios, this approach consistently outperforms the standard naive Bayes, sometimes by a large margin. Research in this direction has the potential to produce KDD systems that are more robust to adversary manipulations and require less human intervention to keep up with them.

ACKNOWLEDGMENTS

We are grateful to Daniel Lowd, Foster Provost and Ted Senator for their insightful comments on a draft of this paper. This research was partly supported by a Sloan Fellowship awarded to the second author.

9. REFERENCES

[1] P. Domingos. MetaCost: A general method for making classifiers cost-sensitive. In Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 155-164, San Diego, CA, 1999. ACM Press.
[2] P. Domingos and M. Pazzani. On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning, 29:103-130, 1997.
[3] R. B. Doorenbos, O. Etzioni, and D. S. Weld. A scalable comparison-shopping agent for the World-Wide Web. In Proceedings of the First International Conference on Autonomous Agents, pages 39-48, Marina del Rey, CA, 1997. ACM Press.
[4] T. Fawcett. "In vivo" spam filtering: A challenge problem for KDD. SIGKDD Explorations, 5(2):140-148, 2003.
[5] T. Fawcett and F. Provost. Adaptive fraud detection. Data Mining and Knowledge Discovery, 1(3):291-316, 1997.
[6] D. Fudenberg and D. Levine. The Theory of Learning in Games. MIT Press, Cambridge, MA, 1998.
[7] D. Fudenberg and J. Tirole. Game Theory. MIT Press, Cambridge, MA, 1991.
[8] M. R. Garey and D. S. Johnson. Computers and Intractability. Freeman, New York, NY, 1979.
[9] L. Guernsey. Retailers rise in Google rankings as rivals cry foul.
New York Times, November 2, 2003.
[10] G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 97-106, San Francisco, CA, 2001. ACM Press.
[11] D. Jensen, M. Rattigan, and H. Blau. Information awareness: A prospective technical assessment. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, 2003. ACM Press.
[12] M. Kearns. Computational game theory. Tutorial, Department of Computer and Information Sciences, University of Pennsylvania, Philadelphia, PA. mkearns/nips02tutorial/.
[13] R. Kohavi and G. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2), 1997.
[14] B. Krebs. Online piracy spurs high-tech arms race. Washington Post, June 26, 2003.
[15] M. L. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning, pages 157-163, New Brunswick, NJ, 1994. Morgan Kaufmann.
[16] B. Lloyd. Been gazumped by Google? Trying to make sense of the Florida update. Search Engine Guide, November 25, 2003.
[17] M. V. Mahoney and P. K. Chan. Learning nonstationary models of normal network traffic for detecting novel attacks. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Canada, 2002. ACM Press.
[18] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, Madison, WI, 1998. AAAI Press.
[19] F. Nielsen. Email data. rem/datasets/.
[20] F. Provost and T. Fawcett. Robust classification for imprecise environments. Machine Learning, 42:203-231, 2001.
[21] J. Rennie. ifile spam classifier.
[22] P. Robertson and J. M. Brady. Adaptive image analysis for aerial surveillance. IEEE Intelligent Systems, 14(3):30-36, 1999.
[23] M. Sahami, S. Dumais, D. Heckerman, and E. Horvitz. A Bayesian approach to filtering junk e-mail.
In Proceedings of the AAAI-98 Workshop on Learning for Text Categorization, Madison, WI, 1998. AAAI Press.
[24] G. Sakkis, I. Androutsopoulos, G. Paliouras, V. Karkaletsis, C. D. Spyropoulos, and P. Stamatopoulos. A memory-based approach to anti-spam filtering for mailing lists. Information Retrieval, volume 6. Kluwer, 2003.
[25] T. Senator. Ongoing management and application of discovered knowledge in a large regulatory organization: A case study of the use and impact of NASD Regulation's Advanced Detection System (ADS). In Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 44-53, Boston, MA, 2000. ACM Press.
[26] J. M. Smith. Evolution and the Theory of Games. Cambridge University Press, Cambridge, UK, 1982.
[27] P. Turney. Cost-sensitive learning bibliography. Online bibliography, NRC Institute for Information Technology, Ottawa, Canada.
[28] WordNet 2.0: A lexical database for the English language. wn/.


More information

Multiple-Period Attribution: Residuals and Compounding

Multiple-Period Attribution: Residuals and Compounding Multple-Perod Attrbuton: Resduals and Compoundng Our revewer gave these authors full marks for dealng wth an ssue that performance measurers and vendors often regard as propretary nformaton. In 1994, Dens

More information

"Research Note" APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES *

Research Note APPLICATION OF CHARGE SIMULATION METHOD TO ELECTRIC FIELD CALCULATION IN THE POWER CABLES * Iranan Journal of Scence & Technology, Transacton B, Engneerng, ol. 30, No. B6, 789-794 rnted n The Islamc Republc of Iran, 006 Shraz Unversty "Research Note" ALICATION OF CHARGE SIMULATION METHOD TO ELECTRIC

More information

Fisher Markets and Convex Programs

Fisher Markets and Convex Programs Fsher Markets and Convex Programs Nkhl R. Devanur 1 Introducton Convex programmng dualty s usually stated n ts most general form, wth convex objectve functons and convex constrants. (The book by Boyd and

More information

Single and multiple stage classifiers implementing logistic discrimination

Single and multiple stage classifiers implementing logistic discrimination Sngle and multple stage classfers mplementng logstc dscrmnaton Hélo Radke Bttencourt 1 Dens Alter de Olvera Moraes 2 Vctor Haertel 2 1 Pontfíca Unversdade Católca do Ro Grande do Sul - PUCRS Av. Ipranga,

More information

IDENTIFICATION AND CORRECTION OF A COMMON ERROR IN GENERAL ANNUITY CALCULATIONS

IDENTIFICATION AND CORRECTION OF A COMMON ERROR IN GENERAL ANNUITY CALCULATIONS IDENTIFICATION AND CORRECTION OF A COMMON ERROR IN GENERAL ANNUITY CALCULATIONS Chrs Deeley* Last revsed: September 22, 200 * Chrs Deeley s a Senor Lecturer n the School of Accountng, Charles Sturt Unversty,

More information

How To Calculate The Accountng Perod Of Nequalty

How To Calculate The Accountng Perod Of Nequalty Inequalty and The Accountng Perod Quentn Wodon and Shlomo Ytzha World Ban and Hebrew Unversty September Abstract Income nequalty typcally declnes wth the length of tme taen nto account for measurement.

More information

Linear Circuits Analysis. Superposition, Thevenin /Norton Equivalent circuits

Linear Circuits Analysis. Superposition, Thevenin /Norton Equivalent circuits Lnear Crcuts Analyss. Superposton, Theenn /Norton Equalent crcuts So far we hae explored tmendependent (resste) elements that are also lnear. A tmendependent elements s one for whch we can plot an / cure.

More information

When Network Effect Meets Congestion Effect: Leveraging Social Services for Wireless Services

When Network Effect Meets Congestion Effect: Leveraging Social Services for Wireless Services When Network Effect Meets Congeston Effect: Leveragng Socal Servces for Wreless Servces aowen Gong School of Electrcal, Computer and Energy Engeerng Arzona State Unversty Tempe, AZ 8587, USA xgong9@asuedu

More information

+ + + - - This circuit than can be reduced to a planar circuit

+ + + - - This circuit than can be reduced to a planar circuit MeshCurrent Method The meshcurrent s analog of the nodeoltage method. We sole for a new set of arables, mesh currents, that automatcally satsfy KCLs. As such, meshcurrent method reduces crcut soluton to

More information

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek

THE DISTRIBUTION OF LOAN PORTFOLIO VALUE * Oldrich Alfons Vasicek HE DISRIBUION OF LOAN PORFOLIO VALUE * Oldrch Alfons Vascek he amount of captal necessary to support a portfolo of debt securtes depends on the probablty dstrbuton of the portfolo loss. Consder a portfolo

More information

Logistic Regression. Steve Kroon

Logistic Regression. Steve Kroon Logstc Regresson Steve Kroon Course notes sectons: 24.3-24.4 Dsclamer: these notes do not explctly ndcate whether values are vectors or scalars, but expects the reader to dscern ths from the context. Scenaro

More information

Improved SVM in Cloud Computing Information Mining

Improved SVM in Cloud Computing Information Mining Internatonal Journal of Grd Dstrbuton Computng Vol.8, No.1 (015), pp.33-40 http://dx.do.org/10.1457/jgdc.015.8.1.04 Improved n Cloud Computng Informaton Mnng Lvshuhong (ZhengDe polytechnc college JangSu

More information

Conversion between the vector and raster data structures using Fuzzy Geographical Entities

Conversion between the vector and raster data structures using Fuzzy Geographical Entities Converson between the vector and raster data structures usng Fuzzy Geographcal Enttes Cdála Fonte Department of Mathematcs Faculty of Scences and Technology Unversty of Combra, Apartado 38, 3 454 Combra,

More information

L10: Linear discriminants analysis

L10: Linear discriminants analysis L0: Lnear dscrmnants analyss Lnear dscrmnant analyss, two classes Lnear dscrmnant analyss, C classes LDA vs. PCA Lmtatons of LDA Varants of LDA Other dmensonalty reducton methods CSCE 666 Pattern Analyss

More information

Planning for Marketing Campaigns

Planning for Marketing Campaigns Plannng for Marketng Campagns Qang Yang and Hong Cheng Department of Computer Scence Hong Kong Unversty of Scence and Technology Clearwater Bay, Kowloon, Hong Kong, Chna (qyang, csch)@cs.ust.hk Abstract

More information

Number of Levels Cumulative Annual operating Income per year construction costs costs ($) ($) ($) 1 600,000 35,000 100,000 2 2,200,000 60,000 350,000

Number of Levels Cumulative Annual operating Income per year construction costs costs ($) ($) ($) 1 600,000 35,000 100,000 2 2,200,000 60,000 350,000 Problem Set 5 Solutons 1 MIT s consderng buldng a new car park near Kendall Square. o unversty funds are avalable (overhead rates are under pressure and the new faclty would have to pay for tself from

More information

Performance Analysis and Coding Strategy of ECOC SVMs

Performance Analysis and Coding Strategy of ECOC SVMs Internatonal Journal of Grd and Dstrbuted Computng Vol.7, No. (04), pp.67-76 http://dx.do.org/0.457/jgdc.04.7..07 Performance Analyss and Codng Strategy of ECOC SVMs Zhgang Yan, and Yuanxuan Yang, School

More information

An Analysis of Central Processor Scheduling in Multiprogrammed Computer Systems

An Analysis of Central Processor Scheduling in Multiprogrammed Computer Systems STAN-CS-73-355 I SU-SE-73-013 An Analyss of Central Processor Schedulng n Multprogrammed Computer Systems (Dgest Edton) by Thomas G. Prce October 1972 Techncal Report No. 57 Reproducton n whole or n part

More information

Analysis of Premium Liabilities for Australian Lines of Business

Analysis of Premium Liabilities for Australian Lines of Business Summary of Analyss of Premum Labltes for Australan Lnes of Busness Emly Tao Honours Research Paper, The Unversty of Melbourne Emly Tao Acknowledgements I am grateful to the Australan Prudental Regulaton

More information

) of the Cell class is created containing information about events associated with the cell. Events are added to the Cell instance

) of the Cell class is created containing information about events associated with the cell. Events are added to the Cell instance Calbraton Method Instances of the Cell class (one nstance for each FMS cell) contan ADC raw data and methods assocated wth each partcular FMS cell. The calbraton method ncludes event selecton (Class Cell

More information

1. Measuring association using correlation and regression

1. Measuring association using correlation and regression How to measure assocaton I: Correlaton. 1. Measurng assocaton usng correlaton and regresson We often would lke to know how one varable, such as a mother's weght, s related to another varable, such as a

More information

A Performance Analysis of View Maintenance Techniques for Data Warehouses

A Performance Analysis of View Maintenance Techniques for Data Warehouses A Performance Analyss of Vew Mantenance Technques for Data Warehouses Xng Wang Dell Computer Corporaton Round Roc, Texas Le Gruenwald The nversty of Olahoma School of Computer Scence orman, OK 739 Guangtao

More information

How To Solve An Onlne Control Polcy On A Vrtualzed Data Center

How To Solve An Onlne Control Polcy On A Vrtualzed Data Center Dynamc Resource Allocaton and Power Management n Vrtualzed Data Centers Rahul Urgaonkar, Ulas C. Kozat, Ken Igarash, Mchael J. Neely urgaonka@usc.edu, {kozat, garash}@docomolabs-usa.com, mjneely@usc.edu

More information

Period and Deadline Selection for Schedulability in Real-Time Systems

Period and Deadline Selection for Schedulability in Real-Time Systems Perod and Deadlne Selecton for Schedulablty n Real-Tme Systems Thdapat Chantem, Xaofeng Wang, M.D. Lemmon, and X. Sharon Hu Department of Computer Scence and Engneerng, Department of Electrcal Engneerng

More information

Sketching Sampled Data Streams

Sketching Sampled Data Streams Sketchng Sampled Data Streams Florn Rusu, Aln Dobra CISE Department Unversty of Florda Ganesvlle, FL, USA frusu@cse.ufl.edu adobra@cse.ufl.edu Abstract Samplng s used as a unversal method to reduce the

More information

Simple Interest Loans (Section 5.1) :

Simple Interest Loans (Section 5.1) : Chapter 5 Fnance The frst part of ths revew wll explan the dfferent nterest and nvestment equatons you learned n secton 5.1 through 5.4 of your textbook and go through several examples. The second part

More information

A Design Method of High-availability and Low-optical-loss Optical Aggregation Network Architecture

A Design Method of High-availability and Low-optical-loss Optical Aggregation Network Architecture A Desgn Method of Hgh-avalablty and Low-optcal-loss Optcal Aggregaton Network Archtecture Takehro Sato, Kuntaka Ashzawa, Kazumasa Tokuhash, Dasuke Ish, Satoru Okamoto and Naoak Yamanaka Dept. of Informaton

More information

Detecting Credit Card Fraud using Periodic Features

Detecting Credit Card Fraud using Periodic Features Detectng Credt Card Fraud usng Perodc Features Alejandro Correa Bahnsen, Djamla Aouada, Aleksandar Stojanovc and Björn Ottersten Interdscplnary Centre for Securty, Relablty and Trust Unversty of Luxembourg,

More information

行 政 院 國 家 科 學 委 員 會 補 助 專 題 研 究 計 畫 成 果 報 告 期 中 進 度 報 告

行 政 院 國 家 科 學 委 員 會 補 助 專 題 研 究 計 畫 成 果 報 告 期 中 進 度 報 告 行 政 院 國 家 科 學 委 員 會 補 助 專 題 研 究 計 畫 成 果 報 告 期 中 進 度 報 告 畫 類 別 : 個 別 型 計 畫 半 導 體 產 業 大 型 廠 房 之 設 施 規 劃 計 畫 編 號 :NSC 96-2628-E-009-026-MY3 執 行 期 間 : 2007 年 8 月 1 日 至 2010 年 7 月 31 日 計 畫 主 持 人 : 巫 木 誠 共 同

More information

On the Interaction between Load Balancing and Speed Scaling

On the Interaction between Load Balancing and Speed Scaling On the Interacton between Load Balancng and Speed Scalng Ljun Chen, Na L and Steven H. Low Engneerng & Appled Scence Dvson, Calforna Insttute of Technology, USA Abstract Speed scalng has been wdely adopted

More information

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm

A hybrid global optimization algorithm based on parallel chaos optimization and outlook algorithm Avalable onlne www.ocpr.com Journal of Chemcal and Pharmaceutcal Research, 2014, 6(7):1884-1889 Research Artcle ISSN : 0975-7384 CODEN(USA) : JCPRC5 A hybrd global optmzaton algorthm based on parallel

More information

Learning from Multiple Outlooks

Learning from Multiple Outlooks Learnng from Multple Outlooks Maayan Harel Department of Electrcal Engneerng, Technon, Hafa, Israel She Mannor Department of Electrcal Engneerng, Technon, Hafa, Israel maayanga@tx.technon.ac.l she@ee.technon.ac.l

More information

Distributed Multi-Target Tracking In A Self-Configuring Camera Network

Distributed Multi-Target Tracking In A Self-Configuring Camera Network Dstrbuted Mult-Target Trackng In A Self-Confgurng Camera Network Crstan Soto, B Song, Amt K. Roy-Chowdhury Department of Electrcal Engneerng Unversty of Calforna, Rversde {cwlder,bsong,amtrc}@ee.ucr.edu

More information

STATISTICAL DATA ANALYSIS IN EXCEL

STATISTICAL DATA ANALYSIS IN EXCEL Mcroarray Center STATISTICAL DATA ANALYSIS IN EXCEL Lecture 6 Some Advanced Topcs Dr. Petr Nazarov 14-01-013 petr.nazarov@crp-sante.lu Statstcal data analyss n Ecel. 6. Some advanced topcs Correcton for

More information

A Hierarchical Anomaly Network Intrusion Detection System using Neural Network Classification

A Hierarchical Anomaly Network Intrusion Detection System using Neural Network Classification IDC IDC A Herarchcal Anomaly Network Intruson Detecton System usng Neural Network Classfcaton ZHENG ZHANG, JUN LI, C. N. MANIKOPOULOS, JAY JORGENSON and JOSE UCLES ECE Department, New Jersey Inst. of Tech.,

More information

Brigid Mullany, Ph.D University of North Carolina, Charlotte

Brigid Mullany, Ph.D University of North Carolina, Charlotte Evaluaton And Comparson Of The Dfferent Standards Used To Defne The Postonal Accuracy And Repeatablty Of Numercally Controlled Machnng Center Axes Brgd Mullany, Ph.D Unversty of North Carolna, Charlotte

More information

SPECIALIZED DAY TRADING - A NEW VIEW ON AN OLD GAME

SPECIALIZED DAY TRADING - A NEW VIEW ON AN OLD GAME August 7 - August 12, 2006 n Baden-Baden, Germany SPECIALIZED DAY TRADING - A NEW VIEW ON AN OLD GAME Vladmr Šmovć 1, and Vladmr Šmovć 2, PhD 1 Faculty of Electrcal Engneerng and Computng, Unska 3, 10000

More information

Activity Scheduling for Cost-Time Investment Optimization in Project Management

Activity Scheduling for Cost-Time Investment Optimization in Project Management PROJECT MANAGEMENT 4 th Internatonal Conference on Industral Engneerng and Industral Management XIV Congreso de Ingenería de Organzacón Donosta- San Sebastán, September 8 th -10 th 010 Actvty Schedulng

More information

Enabling P2P One-view Multi-party Video Conferencing

Enabling P2P One-view Multi-party Video Conferencing Enablng P2P One-vew Mult-party Vdeo Conferencng Yongxang Zhao, Yong Lu, Changja Chen, and JanYn Zhang Abstract Mult-Party Vdeo Conferencng (MPVC) facltates realtme group nteracton between users. Whle P2P

More information

The Application of Fractional Brownian Motion in Option Pricing

The Application of Fractional Brownian Motion in Option Pricing Vol. 0, No. (05), pp. 73-8 http://dx.do.org/0.457/jmue.05.0..6 The Applcaton of Fractonal Brownan Moton n Opton Prcng Qng-xn Zhou School of Basc Scence,arbn Unversty of Commerce,arbn zhouqngxn98@6.com

More information

Software project management with GAs

Software project management with GAs Informaton Scences 177 (27) 238 241 www.elsever.com/locate/ns Software project management wth GAs Enrque Alba *, J. Francsco Chcano Unversty of Málaga, Grupo GISUM, Departamento de Lenguajes y Cencas de

More information

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements

CS 2750 Machine Learning. Lecture 3. Density estimation. CS 2750 Machine Learning. Announcements Lecture 3 Densty estmaton Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Next lecture: Matlab tutoral Announcements Rules for attendng the class: Regstered for credt Regstered for audt (only f there

More information

Lecture 2: Single Layer Perceptrons Kevin Swingler

Lecture 2: Single Layer Perceptrons Kevin Swingler Lecture 2: Sngle Layer Perceptrons Kevn Sngler kms@cs.str.ac.uk Recap: McCulloch-Ptts Neuron Ths vastly smplfed model of real neurons s also knon as a Threshold Logc Unt: W 2 A Y 3 n W n. A set of synapses

More information

Sngle Snk Buy at Bulk Problem and the Access Network

Sngle Snk Buy at Bulk Problem and the Access Network A Constant Factor Approxmaton for the Sngle Snk Edge Installaton Problem Sudpto Guha Adam Meyerson Kamesh Munagala Abstract We present the frst constant approxmaton to the sngle snk buy-at-bulk network

More information

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation

Exhaustive Regression. An Exploration of Regression-Based Data Mining Techniques Using Super Computation Exhaustve Regresson An Exploraton of Regresson-Based Data Mnng Technques Usng Super Computaton Antony Daves, Ph.D. Assocate Professor of Economcs Duquesne Unversty Pttsburgh, PA 58 Research Fellow The

More information

Enterprise Master Patient Index

Enterprise Master Patient Index Enterprse Master Patent Index Healthcare data are captured n many dfferent settngs such as hosptals, clncs, labs, and physcan offces. Accordng to a report by the CDC, patents n the Unted States made an

More information

On the Interaction between Load Balancing and Speed Scaling

On the Interaction between Load Balancing and Speed Scaling On the Interacton between Load Balancng and Speed Scalng Ljun Chen and Na L Abstract Speed scalng has been wdely adopted n computer and communcaton systems, n partcular, to reduce energy consumpton. An

More information

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006

Latent Class Regression. Statistics for Psychosocial Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson Statstcs for Psychosocal Research II: Structural Models December 4 and 6, 2006 Latent Class Regresson (LCR) What s t and when do we use t? Recall the standard latent class model

More information