Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence

Improving Resource Allocation Strategy Against Human Adversaries in Security Games

Rong Yang, Christopher Kiekintveld, Fernando Ordonez, Milind Tambe, Richard John
University of Southern California, Los Angeles, CA
University of Texas at El Paso, El Paso, TX
{yangrong,fordon,tambe,richardj}@usc.edu, ckiekint@gmail.com

Abstract

Recent real-world deployments of Stackelberg security games make it critical that we address human adversaries' bounded rationality in computing optimal strategies. To that end, this paper provides three key contributions: (i) new efficient algorithms for computing optimal strategic solutions using Prospect Theory and Quantal Response Equilibrium; (ii) the most comprehensive experiment to date studying the effectiveness of different models against human subjects for security games; and (iii) new techniques for generating representative payoff structures for behavioral experiments in generic classes of games. Our results with human subjects show that our new techniques outperform the leading contender for modeling human behavior in security games.

1 Introduction

Recent real-world deployments of attacker-defender Stackelberg games, including ARMOR at the LAX airport [Pita et al., 2008] and IRIS at the Federal Air Marshals Service [Tsai et al., 2009], have led to an increasing interest in building decision-support tools for real-world security problems. One of the key sets of assumptions these systems make is about how attackers choose strategies based on their knowledge of the security strategy. Typically, such systems apply the standard game-theoretic assumption that attackers are perfectly rational and strictly maximize their expected utility. This is a reasonable proxy for the worst case of a highly intelligent attacker, but it can lead to a defense strategy that is not robust against attackers using different decision procedures, and it fails to exploit known weaknesses in the decision-making of human attackers. Indeed, it is widely accepted that standard game-theoretic assumptions of perfect rationality are not ideal for predicting the behavior of humans in multi-agent decision problems [Camerer et al., 2004]. Thus, integrating more realistic models of human decision-making has become necessary in solving real-world security problems.

However, there are several open questions in moving beyond perfect rationality assumptions. First, the literature has introduced a multitude of candidate models, but there is an important empirical question of which model best represents the salient features of human behavior in applied security contexts. Second, integrating any of the proposed models into a decision-support system (even for the purpose of empirically evaluating the model) requires developing new computational methods, since the existing algorithms for security games are based on mathematically optimal attackers [Pita et al., 2008; Kiekintveld et al., 2009]. The current leading contender that accounts for human behavior in security games is COBRA [Pita et al., 2010], which assumes that adversaries can deviate to ε-optimal strategies and that they have an anchoring bias when interpreting a probability distribution. It remains an open question whether other models yield better solutions than COBRA against human adversaries. We address these open questions by developing three new algorithms to generate defender strategies in security games, based on using two fundamental theories of human behavior to predict an attacker's decisions: Prospect Theory [Kahneman and Tversky, 1979] and Quantal Response Equilibrium [McKelvey and Palfrey, 1995].
We evaluate our new algorithms using experimental data from human subjects gathered using an online game designed to simulate a security scenario similar to the one analyzed by ARMOR for the LAX airport. Furthermore, we designed classification techniques to select payoff structures for experiments such that the structures are representative of the space of possible games, improving the coverage relative to previous experiments for COBRA. Our results show that our new algorithms outperform both COBRA and a perfect rationality baseline.

2 Background and Related Work

Security games refer to a special class of attacker-defender Stackelberg games, including those used in ARMOR and IRIS [Pita et al., 2008; Tsai et al., 2009]. The defender needs to allocate limited security resources to protect infrastructure from an adversary's attack. In this paper, we will use a more compact representation of the defender's strategy: the probability that each target will be protected by a security force, which will be introduced in Section 3.1. In Stackelberg security games, the defender (leader) first commits to a mixed strategy, assuming the attacker (follower) decides on a pure strategy after observing the defender's strategy. This models the situation where an attacker conducts surveillance to learn the defender's mixed strategy and then launches an attack on a single target.
In these non-zero-sum games, the attacker's utility of attacking a target decreases as the defender allocates more resources to protect it (and vice versa for the defender). In this work, we constrain the adversary to select a pure strategy. Given that the defender has limited resources (e.g., she may need to protect 8 targets with 3 guards), she must design her strategy to optimize against the adversary's response to maximize effectiveness. One leading family of algorithms to compute such mixed strategies are DOBSS and its successors [Pita et al., 2008; Kiekintveld et al., 2009], which are used in the deployed ARMOR and IRIS applications. These algorithms formulate the problem as a mixed-integer linear program (MILP), and compute an optimal mixed strategy for the defender assuming that the attacker responds optimally. However, in many real-world domains, agents face human adversaries whose behavior may not be optimal under perfect rationality. COBRA [Pita et al., 2010] represents the best available benchmark for how to determine defender strategies in security games against human adversaries, and it outperforms DOBSS with statistical significance in experiments using human subjects. This paper introduces alternative methods for computing strategies to play against human adversaries, based on two well-known theories from the behavioral literature, Prospect Theory (PT) and Quantal Response Equilibrium (QRE).

Prospect Theory is a Nobel-Prize-winning theory [Kahneman and Tversky, 1979], which describes human decision making as a process of maximizing prospect. Prospect is defined as Σ_i π(p_i)V(C_i), where p_i is the actual probability of outcome C_i. The weighting function π(p_i) describes how probability p_i is perceived. π(·) is not consistent with the definition of probability, i.e., π(p) + π(1-p) ≠ 1 in general. An empirical form of π(·) is shown in Fig. 1(a). The value function V(C_i) reflects the value of outcome C_i. PT indicates that individuals are risk averse regarding gain but risk seeking regarding loss, and care more about loss than gain, as shown in Fig. 1(b) [Hastie and Dawes, 2001].

Figure 1: PT functions [Hastie and Dawes, 2001]: (a) the weighting function π(p) = p^γ / (p^γ + (1-p)^γ)^{1/γ}; (b) the value function V(C) = C^α for C ≥ 0 and V(C) = -θ(-C)^β for C < 0.

Quantal Response Equilibrium is an important model in behavioral game theory [McKelvey and Palfrey, 1995]. It suggests that instead of strictly maximizing utility, individuals respond stochastically in games: the chance of selecting a non-optimal strategy increases as the cost of such an error decreases. Recent work [Wright and Leyton-Brown, 2010] shows Quantal Level-k [Stahl and Wilson, 1994] to be best suited for predicting human behavior in simultaneous-move games. (We applied QRE instead of Quantal Level-k because in Stackelberg security games the attacker observes the defender's strategy, so level-k reasoning is not applicable.) However, the applicability of QRE and PT to security games and their comparison with COBRA remain open questions.

3 Defender Mixed-Strategy Computation

We now describe efficient computation of the optimal defender mixed strategy assuming a human adversary's response is based on either PT or QRE.

3.1 Methods for Computing PT

Best Response to Prospect Theory (BRPT) is a mixed-integer programming formulation for the optimal leader strategy against players whose response follows a PT model. Only the adversary is modeled using PT in this case, since the defender's actions are recommended by the decision aid.

  max_{x,q,a,d,z}  d
  s.t.
    Σ_{i=1}^{n} Σ_{k=1}^{5} x_{ik} ≤ Υ                                     (1)
    Σ_{k=1}^{5} (x_{ik} + x'_{ik}) = 1,  ∀i                                 (2)
    0 ≤ x_{ik}, x'_{ik} ≤ c_k - c_{k-1},  ∀i, k = 1..5                      (3)
    z_{ik}(c_k - c_{k-1}) ≤ x_{ik},  ∀i, k = 1..4                           (4)
    z'_{ik}(c_k - c_{k-1}) ≤ x'_{ik},  ∀i, k = 1..4                         (5)
    x_{i(k+1)} ≤ z_{ik},  ∀i, k = 1..4                                      (6)
    x'_{i(k+1)} ≤ z'_{ik},  ∀i, k = 1..4                                    (7)
    z_{ik}, z'_{ik} ∈ {0, 1},  ∀i, k = 1..4                                 (8)
    w_i = Σ_{k=1}^{5} b_k x_{ik},   w'_i = Σ_{k=1}^{5} b_k x'_{ik},  ∀i     (9)
    Σ_{i=1}^{n} q_i = 1,  q_i ∈ {0, 1},  ∀i                                 (10)
    0 ≤ a - (w_i V(P^a_i) + w'_i V(R^a_i)) ≤ M(1 - q_i),  ∀i                (11)
    M(1 - q_i) + Σ_{k=1}^{5} (x_{ik} R^d_i + x'_{ik} P^d_i) ≥ d,  ∀i        (12)

BRPT maximizes d, the defender's expected utility. The defender has a limited number of resources, Υ, to protect the set of targets, t_i ∈ T for i = 1..n. The defender selects a strategy x that describes the probability that each target will be protected by a resource; we denote these individual probabilities by x_i, with x_i = Σ_{k=1}^{5} x_{ik} in the formulation above. This marginal distribution over targets is equivalent to a mixed strategy over all possible assignments of the security forces. (It is proved in [Korzhyk et al., 2010] that the marginal probability distribution of covering each target is equivalent to a mixed strategy over all possible resource assignments when there are no assignment restrictions.)
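The equivalence noted in the parenthetical above can be made constructive. The sketch below is not part of the paper (which simply cites Korzhyk et al. for the result); it is a standard systematic-sampling argument, written in Python for illustration, that turns a marginal coverage vector x (with every x_i ≤ 1 and an integral sum) into random pure assignments of guards whose per-target coverage frequencies are exactly x_i.

```python
import numpy as np

def sample_assignment(x, rng=None):
    """Draw one pure assignment of guards whose per-target coverage frequency equals x.

    Systematic ("comb") sampling: lay the targets out as consecutive intervals of
    length x_i on [0, sum(x)), then take the points u, u+1, u+2, ... for a single
    uniform u in [0, 1). Because every interval has length <= 1, the points land in
    distinct intervals, and target i is covered with probability exactly x_i.
    Assumes sum(x) is an integer (all resources are used)."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x, dtype=float)
    m = int(round(x.sum()))                        # number of resources
    edges = np.concatenate(([0.0], np.cumsum(x)))  # interval boundaries
    points = rng.uniform(0.0, 1.0) + np.arange(m)
    covered = np.searchsorted(edges, points, side="right") - 1
    return sorted(int(i) for i in set(covered.tolist()))

# Example: 3 guards over 8 targets; empirical coverage approaches x.
x = np.array([0.6, 0.5, 0.5, 0.4, 0.3, 0.3, 0.2, 0.2])   # sums to 3
counts = np.zeros(len(x))
rng = np.random.default_rng(0)
for _ in range(20000):
    counts[sample_assignment(x, rng)] += 1
print(counts / 20000)   # close to x
```

This is only meant to illustrate why optimizing over the compact marginal representation loses nothing in this setting; the algorithms below work with x directly.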
The attacker chooses a target to attack after observing x. We denote the attacker's choice using the vector of binary variables q_i for i = 1..n, where q_i = 1 if t_i is attacked and 0 otherwise. In security games, the payoffs depend only on whether or not the attack was successful. So given a target t_i, the defender receives reward R^d_i if the adversary attacks a target that is covered by the defender; otherwise, the defender receives penalty P^d_i. Respectively, the attacker receives penalty P^a_i in the former case, and reward R^a_i in the latter case.

The defender optimization problem is given in Equations (1)-(12). PT comes into the algorithm by adjusting the weighting and value functions as described above. The benefit (prospect) perceived by the adversary for attacking target t_i if the defender plays the mixed strategy x is given by π(x_i)V(P^a_i) + π(1-x_i)V(R^a_i). The adversary's values of the penalty, V(P^a_i), and the reward, V(R^a_i), are both given input parameters to the MILP. We use a piecewise linear function π_L(·) to approximate the non-linear weighting function π(·) and empirically set 5 segments for π_L(·). (This piecewise linear representation of π(·) achieves a small approximation error: sup_{z∈[0,1]} |π_L(z) - π(z)| ≤ 0.03.) The function is defined by the endpoints of the linear segments, {c_k | c_0 = 0, c_5 = 1, c_k < c_{k+1}, k = 0, ..., 5}, and the slope of each linear segment, {b_k | k = 1, ..., 5}. According to PT, the probability x_i is perceived by the attacker as w_i = π_L(x_i) = Σ_{k=1}^{5} b_k x_{ik}, as discussed below.

In order to represent the piecewise linear approximation, i.e., π_L(x_i) (and π_L(1-x_i)), we break x_i (and 1-x_i) into five segments, denoted by the variables x_{ik} (and x'_{ik}). Such a breakup of x_i (and 1-x_i) is correct if segment x_{ik} (and x'_{ik}) is positive only if the previous segment is used completely, for which we need the auxiliary integer variables z_{ik} (and z'_{ik}). This is enforced by Equations (3)-(8). Equation (9) defines w_i and w'_i as the values of the piecewise linear approximation of x_i and 1-x_i: w_i = π_L(x_i) and w'_i = π_L(1-x_i). Equations (10) and (11) define the optimal adversary's pure strategy. In particular, Equation (11) enforces that q_i = 1 only for an action that achieves the maximal prospect a for the adversary. Equation (12) enforces that d is the defender's expected utility on the target that is attacked by the adversary (q_i = 1).

Robust-PT (RPT) modifies the BRPT method to account for some uncertainty about the adversary's choice, caused (for example) by imprecise computations [Simon, 1956]. Similar to COBRA, RPT assumes that the adversary may choose any strategy within ε of the best choice, defined here by the prospect of each action. It optimizes the worst-case outcome for the defender among the set of strategies that have prospect for the attacker within ε of the optimal prospect. We modify the BRPT optimization problem as follows: the first 11 equations are equivalent to those in BRPT; in Equation (13), the binary variables h_i indicate all the ε-optimal strategies for the adversary; the epsilon-optimality assumption is embedded in Equation (15), which forces h_i = 1 for any target t_i that leads to a prospect within ε of the optimal prospect a; Equation (16) enforces that d is the minimum expected utility of the defender against the ε-optimal strategies of the adversary.

  max_{x,h,q,a,d,z}  d
  s.t.
    Equations (1)-(11)
    Σ_{i=1}^{n} h_i ≥ 1                                                     (13)
    h_i ∈ {0, 1},  q_i ≤ h_i,  ∀i                                           (14)
    ε(1 - h_i) ≤ a - (w_i V(P^a_i) + w'_i V(R^a_i)) ≤ ε + M(1 - h_i),  ∀i   (15)
    M(1 - h_i) + Σ_{k=1}^{5} (x_{ik} R^d_i + x'_{ik} P^d_i) ≥ d,  ∀i        (16)

Runtime: We choose AMPL (http://www.ampl.com/) to solve the MILP with CPLEX as the solver. Both BRPT and RPT take less than 1 second for up to 10 targets.
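To make the piecewise-linear construction used by BRPT and RPT concrete, the following sketch evaluates the exact PT weighting and value functions of Figure 1 and their piecewise-linear surrogate π_L defined by breakpoints {c_k} and slopes {b_k}, and computes the adversary's prospect for a given coverage probability. It is a minimal illustration, not the authors' implementation: the parameter values γ, α, β, θ, the uniform breakpoints, and the helper names are assumptions (the paper only says it uses typical values from the literature), and the MILP itself represents π_L through the segment variables x_{ik} rather than by direct evaluation.

```python
import numpy as np

# Illustrative PT parameters (typical literature values; an assumption, not the paper's exact settings).
GAMMA, ALPHA, BETA, THETA = 0.64, 0.88, 0.88, 2.25

def weight(p, gamma=GAMMA):
    """PT probability weighting function pi(p) from Fig. 1(a)."""
    return p**gamma / (p**gamma + (1.0 - p)**gamma) ** (1.0 / gamma)

def value(c, alpha=ALPHA, beta=BETA, theta=THETA):
    """PT value function V(C) from Fig. 1(b): risk averse for gains, risk seeking for losses."""
    return c**alpha if c >= 0 else -theta * (-c)**beta

# Piecewise-linear surrogate of pi(.) on 5 segments with endpoints c_0, ..., c_5.
BREAKS = np.linspace(0.0, 1.0, 6)                   # uniform breakpoints (an assumption)
SLOPES = np.diff(weight(BREAKS)) / np.diff(BREAKS)  # slope b_k of each segment

def weight_pl(p):
    """Evaluate pi_L(p) = sum_k b_k * x_k, where x_k is the part of p in segment k (cf. Eq. (9))."""
    segments = np.clip(p - BREAKS[:-1], 0.0, np.diff(BREAKS))   # the x_k values
    return float(np.dot(SLOPES, segments))

def prospect(x_i, pen_a, rew_a):
    """Adversary's perceived benefit of attacking a target covered with probability x_i."""
    return weight_pl(x_i) * value(pen_a) + weight_pl(1.0 - x_i) * value(rew_a)

# Example: coverage 0.4 on a target with attacker reward 8 and penalty -5 (made-up numbers).
print(prospect(0.4, pen_a=-5, rew_a=8))
```

Because π_L interpolates π exactly at the breakpoints, the approximation error is controlled by the segment widths, which is what the bound quoted above refers to.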
3.2 Methods for Computing QRE

In applying the QRE model to our domain, we only add noise to the response function for the adversary, so the defender computes an optimal strategy assuming the attacker responds with a noisy best response. The parameter λ represents the amount of noise in the attacker's response. Given λ and the defender's mixed strategy x, the adversary's quantal response q_i (i.e., the probability of attacking target t_i) can be written as

  q_i = e^{λ U^a_i(x)} / Σ_{j=1}^{n} e^{λ U^a_j(x)}                         (17)

where U^a_i(x) = x_i P^a_i + (1 - x_i) R^a_i is the adversary's expected utility for attacking t_i and x_i is the defender's coverage of t_i. Equivalently,

  q_i = e^{λ R^a_i} e^{-λ(R^a_i - P^a_i) x_i} / Σ_{j=1}^{n} e^{λ R^a_j} e^{-λ(R^a_j - P^a_j) x_j}     (18)

The goal is to maximize the defender's expected utility given q, i.e., Σ_{i=1}^{n} q_i (x_i R^d_i + (1 - x_i) P^d_i). Combined with Equation (18), the problem of finding the optimal mixed strategy for the defender can be formulated as

  max_x  Σ_{i=1}^{n} [ e^{λ R^a_i} e^{-λ(R^a_i - P^a_i) x_i} ((R^d_i - P^d_i) x_i + P^d_i) ] / [ Σ_{j=1}^{n} e^{λ R^a_j} e^{-λ(R^a_j - P^a_j) x_j} ]     (19)
  s.t.   Σ_{i=1}^{n} x_i ≤ Υ,   0 ≤ x_i ≤ 1, ∀i

Given that the objective function in Equation (19) is non-linear and non-convex in its most general form, finding the global optimum is extremely difficult. Therefore, we focus on methods to find local optima.
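Before turning to the heuristic, it may help to read Equations (17)-(19) operationally. The sketch below (an illustration under made-up payoffs, not the authors' code) computes the attacker's quantal response for a coverage vector x and the defender's expected utility that Equation (19) maximizes.

```python
import numpy as np

def quantal_response(x, rew_a, pen_a, lam):
    """Attacker's quantal response (Eq. 17/18): P(attack target i) given coverage x."""
    u_att = x * pen_a + (1.0 - x) * rew_a          # U^a_i(x)
    logits = lam * u_att
    logits -= logits.max()                         # shift for numerical stability
    q = np.exp(logits)
    return q / q.sum()

def defender_eu(x, rew_d, pen_d, rew_a, pen_a, lam):
    """Objective of Eq. (19): the defender's expected utility against a quantal attacker."""
    q = quantal_response(x, rew_a, pen_a, lam)
    u_def = x * rew_d + (1.0 - x) * pen_d          # U^d_i(x)
    return float(np.dot(q, u_def))

# Toy 3-target instance with 1 resource (payoff numbers are made up for illustration).
rew_a = np.array([8.0, 5.0, 2.0]); pen_a = np.array([-4.0, -6.0, -1.0])
rew_d = np.array([6.0, 3.0, 1.0]); pen_d = np.array([-7.0, -2.0, -3.0])
x = np.array([0.6, 0.3, 0.1])                      # feasible: sums to 1 resource
print(defender_eu(x, rew_d, pen_d, rew_a, pen_a, lam=0.76))
```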
To compute an approximately optimal QRE strategy efficiently, we develop the Best Response to Quantal Response (BRQR) heuristic described in Algorithm 1. We first take the negative of Equation (19), converting the maximization problem into a minimization problem. In each iteration, we find a local minimum using a gradient descent technique from the given starting point (we use the fmincon function in Matlab to find the local minimum). If there are multiple local minima, by randomly setting the starting point in each iteration, the algorithm will reach different local minima with a non-zero probability. By increasing the iteration number, IterN, the probability of reaching the global minimum increases.

Algorithm 1 BRQR
1: opt_g ← +∞   {initialize the global optimum}
2: for i = 1, ..., IterN do
3:   x_0 ← randomly generate a feasible starting point
4:   (opt_l, x*) ← FindLocalMinimum(x_0)
5:   if opt_g > opt_l then
6:     opt_g ← opt_l,  x_opt ← x*
7:   end if
8: end for
9: return opt_g, x_opt

Parameter Estimation: The parameter λ in the QRE model represents the amount of noise in the best-response function. One extreme case is λ = 0, when play becomes uniformly random. The other extreme case is λ = ∞, when the quantal response is identical to the best response. λ is sensitive to the game payoff structure, so tuning λ is a crucial step in applying the QRE model. We employed Maximum Likelihood Estimation (MLE) to fit λ using data from [Pita et al., 2010]. Given the defender's mixed strategy x and N samples of the players' choices, the log-likelihood of λ is

  log L(λ | x) = Σ_{j=1}^{N} log q_{τ(j)}(λ)

where τ(j) denotes the target attacked by the player in sample j. Let N_i be the number of subjects attacking target t_i. Then, we have log L(λ | x) = Σ_{i=1}^{n} N_i log q_i(λ). Combining with Equation (17),

  log L(λ | x) = λ Σ_{i=1}^{n} N_i U^a_i(x) - N log( Σ_{i=1}^{n} e^{λ U^a_i(x)} )

log L(λ | x) is a concave function; its second-order derivative is

  d² log L(λ | x) / dλ² = -N [ Σ_{i<j} (U^a_i(x) - U^a_j(x))² e^{λ(U^a_i(x) + U^a_j(x))} ] / ( Σ_i e^{λ U^a_i(x)} )² ≤ 0.

Therefore, log L(λ | x) has only one local maximum. The MLE of λ is 0.76 for the data used from [Pita et al., 2010].

Runtime: We implement BRQR in Matlab. With 10 targets and IterN = 300, the runtime of BRQR is less than 1 minute. In comparison, with only 4 targets, LINGO 12 (http://www.lindo.com/) cannot compute the global optimum of Equation (19) within one hour.
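The random-restart procedure and the λ fit can be reproduced outside Matlab. The following sketch is an assumed re-implementation in Python/SciPy, not the original code: SciPy's SLSQP local solver stands in for fmincon, and the restart count, λ bounds, and any payoff arrays passed in are illustrative choices rather than values from the paper.

```python
import numpy as np
from scipy.optimize import minimize, minimize_scalar

def neg_defender_eu(x, rew_d, pen_d, rew_a, pen_a, lam):
    """Negated objective of Eq. (19): minimizing this maximizes the defender's expected utility."""
    u_att = x * pen_a + (1.0 - x) * rew_a
    q = np.exp(lam * (u_att - u_att.max()))        # quantal response, unnormalized but stabilized
    q /= q.sum()
    return -float(np.dot(q, x * rew_d + (1.0 - x) * pen_d))

def brqr(rew_d, pen_d, rew_a, pen_a, lam, resources, iter_n=300, seed=0):
    """Random-restart local search in the spirit of Algorithm 1 (BRQR)."""
    rng = np.random.default_rng(seed)
    n = len(rew_a)
    budget = [{"type": "ineq", "fun": lambda x: resources - x.sum()}]   # sum(x) <= resources
    opt_g, x_opt = np.inf, None
    for _ in range(iter_n):
        x0 = rng.uniform(0.0, 1.0, n)
        x0 *= min(1.0, resources / x0.sum())        # random feasible starting point
        res = minimize(neg_defender_eu, x0, args=(rew_d, pen_d, rew_a, pen_a, lam),
                       bounds=[(0.0, 1.0)] * n, constraints=budget, method="SLSQP")
        if res.fun < opt_g:
            opt_g, x_opt = res.fun, res.x
    return x_opt, -opt_g

def fit_lambda(x, rew_a, pen_a, counts, max_lam=5.0):
    """MLE of lambda from observed attack counts N_i; the log-likelihood is concave in lambda."""
    u_att = x * pen_a + (1.0 - x) * rew_a
    def neg_log_lik(lam):
        return -(lam * np.dot(counts, u_att)
                 - counts.sum() * np.log(np.sum(np.exp(lam * u_att))))
    return minimize_scalar(neg_log_lik, bounds=(1e-6, max_lam), method="bounded").x
```

With the toy payoffs from the previous sketch and λ = 0.76, brqr(...) returns a locally optimal coverage vector; different seeds or restart counts may reach different local optima, which is exactly why Algorithm 1 uses random restarts.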
4 Payoff Structure Classification

One important property of payoff structures we want to examine is their influence on model performance. We certainly cannot test over all possible payoff structures, so the challenges are: (i) the payoff structures we select should be representative of the payoff structure space; (ii) the strategies generated from different algorithms should be sufficiently separated. As we will discuss later, the payoff structures used in [Pita et al., 2010] do not address these challenges.

We address the first criterion by randomly sampling payoff structures, each with 8 targets. R^a_i and R^d_i are positive integers drawn from [1, 10]; P^a_i and P^d_i are negative integers drawn from [-10, -1]. This scale is similar to the payoff structures used in [Pita et al., 2010]. We then clustered the payoff structures into four clusters using k-means clustering based on eight features, which are defined in Table 1. Intuitively, features 1 and 2 describe how good the game is for the adversary, features 3 and 4 describe how good the game is for the defender, and features 5-8 reflect the level of conflict between the two players, in the sense that they measure the ratio of one player's gain over the other player's loss. In Fig. 2, all payoff structures are projected onto the first two Principal Component Analysis (PCA) dimensions for visualization.

Table 1: A-priori defined features
  Feature 1: mean(R^a_i / |P^a_i|)    Feature 2: std(R^a_i / |P^a_i|)
  Feature 3: mean(R^d_i / |P^d_i|)    Feature 4: std(R^d_i / |P^d_i|)
  Feature 5: mean(R^a_i / |P^d_i|)    Feature 6: std(R^a_i / |P^d_i|)
  Feature 7: mean(R^d_i / |P^a_i|)    Feature 8: std(R^d_i / |P^a_i|)

Figure 2: Payoff structure clusters (color): all sampled payoff structures projected onto the first two PCA components, with clusters 1-4, the selected payoffs 1-4, and payoffs 5, 6, 7 marked.

We select one payoff structure from each cluster, following the criteria below to obtain sufficiently different strategies for the different candidate algorithms (a small sketch of the distance computation follows this list):

- We define the distance between two mixed strategies, x^k and x^l, using the Kullback-Leibler divergence: D(x^k, x^l) = D_KL(x^k || x^l) + D_KL(x^l || x^k), where D_KL(x^k || x^l) = Σ_{i=1}^{n} x^k_i log(x^k_i / x^l_i). For each payoff structure, D(x^k, x^l) is measured for every pair of strategies. With five strategies (discussed later), we have 10 such measurements.

- We remove payoff structures that have a mean or minimum of these quantities below a given threshold. This gives us a subset of about 25 payoff structures in each cluster.

- We then select one payoff structure closest to the cluster center from the subset of each cluster.
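The symmetrized KL distance in the first criterion can be written down directly. The sketch below is illustrative only; the small smoothing constant that guards against zero coverage is an assumption and is not mentioned in the paper.

```python
import numpy as np
from itertools import combinations

def kl(p, q, eps=1e-9):
    """D_KL(p || q) = sum_i p_i log(p_i / q_i) over per-target coverage probabilities.
    eps avoids division by zero; the paper does not specify such smoothing."""
    p = np.clip(np.asarray(p, dtype=float), eps, None)
    q = np.clip(np.asarray(q, dtype=float), eps, None)
    return float(np.sum(p * np.log(p / q)))

def strategy_distance(xk, xl):
    """Symmetrized divergence D(x_k, x_l) = D_KL(x_k||x_l) + D_KL(x_l||x_k)."""
    return kl(xk, xl) + kl(xl, xk)

def separation(strategies):
    """Mean and minimum pairwise distance among candidate defender strategies
    (10 pairs for 5 strategies), as reported in Table 2."""
    dists = [strategy_distance(a, b) for a, b in combinations(strategies, 2)]
    return float(np.mean(dists)), float(np.min(dists))

# Usage with five made-up coverage vectors over 8 targets.
rng = np.random.default_rng(0)
strategies = [rng.uniform(0.05, 0.95, 8) for _ in range(5)]
print(separation(strategies))
```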
The four payoff structures (payoffs 1-4) we selected from each cluster are marked in Fig. 2, as are the three (payoffs 5-7) used in [Pita et al., 2010]. Fig. 2 shows that payoffs 5-7 all belong to cluster 3. Furthermore, Table 2 reports the strategy distances in all seven payoff structures. The strategies are not as well separated in payoffs 5-7 as they are in payoffs 1-4. As we discuss in Section 5.2, the performance of different strategies is quite similar in payoffs 5-7.

Table 2: Strategy Distance
  Payoff Structure    1      2      3      4      5      6      7
  mean D_KL          .83    .9     .64    .88    .32    .5     .2
  min D_KL           .26    .25    .2     .25    .07    .2     .04

5 Experiments

We conducted empirical tests with human subjects playing an online game to evaluate the performance of the leader strategies generated by five candidate algorithms. We based our model on the LAX airport, which has eight terminals that can be targeted in an attack [Pita et al., 2008]. Subjects play the role of followers and are able to observe the leader's mixed strategy (i.e., the randomized allocation of security resources).

5.1 Experimental Setup

Fig. 3 shows the interface of the web-based game we developed to present subjects with choice problems. Players were introduced to the game through a series of explanatory screens describing how the game is played. In each game instance a subject was asked to choose one of the eight gates to open (attack). They knew that guards were protecting three of the eight gates, but not which ones. Subjects were rewarded based on the reward/penalty shown for each gate and the probability that a guard was behind the gate (i.e., the exact randomized strategy of the defender). To motivate the subjects, they would earn or lose money based on whether or not they succeeded in attacking a gate; if the subject opened a gate not protected by the guards, they won; otherwise, they lost. Subjects started with an endowment of $8 and each point won or lost in a game instance was worth $0.10. On average, subjects earned about $14.10 in cash.

Figure 3: Game Interface

We tested the seven different payoff structures from Fig. 2 (four new, three from [Pita et al., 2010]); information on the payoff structures, the defender's mixed strategies, and the subjects' choices is available at http://anon-submission.webs.com/. For each payoff structure we tested the mixed strategies generated by five algorithms: BRPT, RPT, BRQR, COBRA and DOBSS. There were a total of 35 payoff structure/strategy combinations and each subject played all 35 combinations. In order to mitigate the order effect on subject responses, a total of 35 different orderings of the 35 combinations were generated using a Latin Square design. Every ordering contained each of the 35 combinations exactly once, and each combination appeared exactly once in each of the 35 positions across all 35 orderings. The order played by each subject was drawn uniformly at random from the 35 possible orderings. To further mitigate learning, no feedback on success or failure was given to the subjects until the end of the experiment. A total of 40 human subjects played the game.

We could explore only a limited number of parameters for each algorithm, which were selected following the best available information in the literature. The parameter settings for each algorithm are reported in Table 3. DOBSS has no parameters. The values of the PT parameters are typical values reported in the literature [Hastie and Dawes, 2001].

Table 3: Model Parameters
  Payoff Structure    1      2      3      4      5      6      7
  RPT-ε              2.4    3.     2.     2.75   .9     .5     .5
  COBRA-α            .5     .5     .5     .5     .37    .25
  COBRA-ε            2.5    2.9    2.     2.75   2.5    2.5    2.5
We set ε in RPT following two rules: (i) no more than half of the targets are in the ε-optimal set; (ii) ε ≤ 0.3 R^a_max, where R^a_max is the maximum potential reward for the adversary. The size of the ε-optimal set increases as the value of ε increases. When ε is sufficiently large, the defender's strategy becomes maximin, since she believes that the adversary may attack any target. The second rule limits the imprecision in the attacker's choice; we empirically set the limit to 0.3 R^a_max. For BRQR, we set λ using MLE with the data reported in [Pita et al., 2010] (see Section 3.2). For payoffs 1-4, we set the parameters for COBRA following the advice given by [Pita et al., 2010] as closely as possible. In particular, the values we set for α meet the entropy heuristic discussed in that work. For payoffs 5-7, we use the same parameter settings as in their work.

5.2 Experimental Results

We used the defender's expected utility to evaluate the performance of different defender strategies. Given that a subject selects target t_i to attack, the defender's expected utility depends on the strategy x she played:

  U^d_exp(x | t_i) = x_i R^d_i + (1 - x_i) P^d_i

Average Performance: We first evaluate the average defender expected utility, U^d_exp(x), of the different defender strategies based on all 40 subjects' choices:

  U^d_exp(x) = (1/40) Σ_{i=1}^{n} N_i U^d_exp(x | t_i)

where N_i is the number of subjects that chose target t_i.
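Concretely, the evaluation metric is a subject-weighted average of per-target expected utilities. The sketch below shows the computation; the coverage vector, payoffs, and choice counts are made up for illustration and are not the experimental data.

```python
import numpy as np

def defender_eu_per_target(x, rew_d, pen_d):
    """U^d_exp(x | t_i) = x_i * R^d_i + (1 - x_i) * P^d_i for each target."""
    return x * rew_d + (1.0 - x) * pen_d

def average_defender_eu(x, rew_d, pen_d, counts):
    """Average over all subjects: (1/N) * sum_i N_i * U^d_exp(x | t_i)."""
    per_target = defender_eu_per_target(x, rew_d, pen_d)
    return float(np.dot(counts, per_target) / counts.sum())

# Hypothetical example: 8 targets, 40 subjects' choices (all numbers are made up).
x = np.array([0.55, 0.45, 0.40, 0.35, 0.30, 0.30, 0.35, 0.30])   # coverage, 3 resources
rew_d = np.array([5., 4., 3., 6., 2., 1., 4., 3.])
pen_d = np.array([-6., -3., -4., -8., -2., -1., -5., -3.])
counts = np.array([9, 6, 4, 8, 3, 2, 5, 3])                      # N_i, sums to 40
print(average_defender_eu(x, rew_d, pen_d, counts))
```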
Fig. 4 displays U^d_exp(x) for the different strategies in each payoff structure. The performance of the strategies is closer in payoffs 5-7 than in payoffs 1-4. The main reason is that the strategies are not very different in payoffs 5-7 (see Table 2). We evaluate the statistical significance of our results using the bootstrap-t method [Wilcox, 2003]. The comparison is summarized below:

- BRQR outperforms COBRA in all seven payoff structures. The result is statistically significant in three cases (p < .05) and borderline in payoff 3 (p < .06). BRQR also outperforms DOBSS in all cases, with statistical significance in five of them (p < .02).

- RPT outperforms COBRA except in payoff 3. The difference is statistically significant in payoff 4 (p < .05). In payoff 3, COBRA outperforms RPT (p > .07). Meanwhile, RPT outperforms DOBSS in five payoff structures, with statistical significance in four of them (p < .05). In the other two cases, DOBSS has better performance (p > .08).

- BRQR outperforms RPT in three payoff structures with statistical significance (p < .05). They have very similar performance in the other four cases.

- BRPT is outperformed by BRQR in all cases with statistical significance (p < .03). It is also outperformed by RPT in all cases, with statistical significance in five of them (p < .02) and one borderline (p < .06). BRPT's failure to perform better (it is even worse than COBRA) is a surprising outcome.

Figure 4: Average expected utility of the defender for each strategy (BRPT, RPT, BRQR, COBRA, DOBSS): (a) new payoffs 1-4; (b) payoffs 5-7 from Pita et al.

Robustness: The distribution of the defender's expected utility is also analysed to evaluate the robustness of the different defender strategies. Figure 5 displays the empirical Cumulative Distribution Function (CDF) of U^d_exp(x | t_i) for the different defender strategies, based on the choices of all 40 subjects. The x-axis is the defender's expected utility; the y-axis shows the percentage of subjects against whom the defender gained less than a certain amount of expected utility. As a curve moves towards the left, the defender's expected utility decreases against a certain percentage of the subjects, and vice versa. The left-most point of a curve indicates the worst defender expected utility of a strategy against the different subjects. On the other hand, the range of the curve on the x-axis indicates the reliability of the strategy against various subjects. As can be seen from Figure 5, the defender's expected utility has the smallest variance when the BRQR strategy is played; the DOBSS and BRPT strategies lead to large variance in defender expected utility. Furthermore, BRQR achieves the highest worst-case defender expected utility in all payoff structures except payoff 5, where the CDFs of the BRQR and RPT strategies are very close.

Figure 5: Distribution of the defender's expected utility (color): empirical CDFs for each strategy (BRPT, RPT, BRQR, COBRA, DOBSS) in (a) the new payoffs 1-4 and (b) payoffs 5-7 from Pita et al.

BRPT and DOBSS are not robust against an adversary that deviates from the optimal strategy. BRQR, RPT and COBRA all try to be robust against such deviations. BRQR considers some (possibly very small) probability of the adversary attacking any target. In contrast, COBRA and RPT separate the targets into two groups, the ε-optimal set and the non-ε-optimal set, using a hard threshold. They then try to maximize the worst case for the defender assuming the response will be in the ε-optimal set, but assign less resources to the other targets. When the non-ε-optimal targets have high defender penalties, COBRA and RPT become vulnerable, especially in the following two cases:

- Unattractive targets are those with small reward but large penalty for the adversary. COBRA and RPT consider such targets as non-ε-optimal and assign significantly less resources to them than BRQR. However, some subjects would still select such targets and cause severe damage to COBRA and RPT
(e.g., about 30% of subjects selected door 5 in payoff 4 against COBRA).

- High-risk targets are those with large reward and large penalty for the adversary. RPT considers such targets as non-ε-optimal and assigns far less resources to them than the other algorithms. This is caused by the assumptions made by PT that people care more about loss than gain and that they overestimate small probabilities. However, the experiments show that RPT gets hurt significantly on such targets (e.g., more than 50% of subjects selected door 1 in payoff 2).

Overall, BRQR performs best, RPT outperforms COBRA in six of the seven cases, and BRPT and DOBSS perform the worst.

6 Conclusions

The unrealistic assumptions of perfect rationality made by existing algorithms applying game-theoretic techniques to real-world security games need to be addressed, due to their limitations in facing human adversaries. This paper successfully integrates two important human behavior theories, PT and QRE, into building a more realistic decision-support tool. To that end, the main contributions of this paper are: (i) developing efficient new algorithms based on PT and QRE models
of human behavior; (ii) conducting the most comprehensive experiments to date with human subjects for security games (40 subjects, 5 strategies, 7 game structures); and (iii) designing techniques for generating representative payoff structures for behavioral experiments in generic classes of games. By providing new algorithms that outperform the leading competitor, this paper has advanced the state-of-the-art.

Acknowledgments

This research was supported by the Army Research Office under grant # W911NF-10-1-0185. We also thank Mohit Goenka and James Pita for their help on developing the web-based game. F. Ordonez would also like to acknowledge the support of Conicyt, through Grant No. ACT87.

References

[Camerer et al., 2004] C. F. Camerer, T.-H. Ho, and J.-K. Chong. A cognitive hierarchy model of games. QJE, 119(3):861-898, 2004.

[Hastie and Dawes, 2001] R. Hastie and R. M. Dawes. Rational Choice in an Uncertain World: The Psychology of Judgment and Decision Making. Sage Publications, Thousand Oaks, 2001.

[Kahneman and Tversky, 1979] D. Kahneman and A. Tversky. Prospect theory: An analysis of decision under risk. Econometrica, 47(2):263-292, 1979.

[Kiekintveld et al., 2009] C. Kiekintveld, M. Jain, J. Tsai, J. Pita, F. Ordonez, and M. Tambe. Computing optimal randomized resource allocations for massive security games. In AAMAS, 2009.

[Korzhyk et al., 2010] D. Korzhyk, V. Conitzer, and R. Parr. Complexity of computing optimal Stackelberg strategies in security resource allocation games. In AAAI, 2010.

[McKelvey and Palfrey, 1995] R. D. McKelvey and T. R. Palfrey. Quantal response equilibria for normal form games. Games and Economic Behavior, 10:6-38, 1995.

[Pita et al., 2008] J. Pita, M. Jain, F. Ordonez, C. Portway, M. Tambe, C. Western, P. Paruchuri, and S. Kraus. Deployed ARMOR protection: The application of a game theoretic model for security at the Los Angeles International Airport. In AAMAS, 2008.

[Pita et al., 2010] J. Pita, M. Jain, F. Ordonez, M. Tambe, and S. Kraus. Solving Stackelberg games in the real-world: Addressing bounded rationality and limited observations in human preference models. Artificial Intelligence Journal, 174(15):1142-1171, 2010.

[Simon, 1956] H. Simon. Rational choice and the structure of the environment. Psychological Review, 63(2):129-138, 1956.

[Stahl and Wilson, 1994] D. O. Stahl and P. W. Wilson. Experimental evidence on players' models of other players. JEBO, 25(3):309-327, 1994.

[Tsai et al., 2009] J. Tsai, S. Rathi, C. Kiekintveld, F. Ordonez, and M. Tambe. IRIS - a tool for strategic security allocation in transportation networks. In AAMAS, 2009.

[Wilcox, 2003] R. R. Wilcox. Applying Contemporary Statistical Techniques. Academic Press, 2003.

[Wright and Leyton-Brown, 2010] J. R. Wright and K. Leyton-Brown. Beyond equilibrium: Predicting human behavior in normal-form games. In AAAI, 2010.