Improving Resource Allocation Strategy Against Human Adversaries in Security Games




Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence

Rong Yang, Christopher Kiekintveld, Fernando Ordonez, Milind Tambe, Richard John
University of Southern California, Los Angeles, CA
University of Texas at El Paso, El Paso, TX
{yangrong,fordon,tambe,richardj}@usc.edu, ckiekint@gmail.com

Abstract

Recent real-world deployments of Stackelberg security games make it critical that we address human adversaries' bounded rationality in computing optimal strategies. To that end, this paper provides three key contributions: (i) new efficient algorithms for computing optimal strategic solutions using Prospect Theory and Quantal Response Equilibrium; (ii) the most comprehensive experiment to date studying the effectiveness of different models against human subjects for security games; and (iii) new techniques for generating representative payoff structures for behavioral experiments in generic classes of games. Our results with human subjects show that our new techniques outperform the leading contender for modeling human behavior in security games.

1 Introduction

Recent real-world deployments of attacker-defender Stackelberg games, including ARMOR at the LAX airport [Pita et al., 2008] and IRIS at the Federal Air Marshals Service [Tsai et al., 2009], have led to an increasing interest in building decision-support tools for real-world security problems. One of the key sets of assumptions these systems make is about how attackers choose strategies based on their knowledge of the security strategy. Typically, such systems apply the standard game-theoretic assumption that attackers are perfectly rational and strictly maximize their expected utility. This is a reasonable proxy for the worst case of a highly intelligent attacker, but it can lead to a defense strategy that is not robust against attackers using different decision procedures, and it fails to exploit known weaknesses in the decision-making of human attackers. Indeed, it is widely accepted that standard game-theoretic assumptions of perfect rationality are not ideal for predicting the behavior of humans in multi-agent decision problems [Camerer et al., 2004]. Thus, integrating more realistic models of human decision-making has become necessary in solving real-world security problems.

However, there are several open questions in moving beyond perfect rationality assumptions. First, the literature has introduced a multitude of candidate models, but there is an important empirical question of which model best represents the salient features of human behavior in applied security contexts. Second, integrating any of the proposed models into a decision-support system (even for the purpose of empirically evaluating the model) requires developing new computational methods, since the existing algorithms for security games are based on mathematically optimal attackers [Pita et al., 2008; Kiekintveld et al., 2009]. The current leading contender that accounts for human behavior in security games is COBRA [Pita et al., 2010], which assumes that adversaries can deviate to ε-optimal strategies and that they have an anchoring bias when interpreting a probability distribution. It remains an open question whether other models yield better solutions than COBRA against human adversaries. We address these open questions by developing three new algorithms to generate defender strategies in security games, based on using two fundamental theories of human behavior to predict an attacker's decisions: Prospect Theory [Kahneman and Tversky, 1979] and Quantal Response Equilibrium [McKelvey and Palfrey, 1995].
We evaluate our new algorithms using experimental data from human subjects, gathered with an online game designed to simulate a security scenario similar to the one analyzed by ARMOR for the LAX airport. Furthermore, we designed classification techniques to select payoff structures for experiments such that the structures are representative of the space of possible games, improving the coverage relative to previous experiments for COBRA. Our results show that our new algorithms outperform both COBRA and a perfect-rationality baseline.

2 Background and Related Work

Security games refer to a special class of attacker-defender Stackelberg games, including those used in ARMOR and IRIS [Pita et al., 2008; Tsai et al., 2009]. The defender needs to allocate limited security resources to protect infrastructure from an adversary's attack. In this paper, we use a more compact representation of the defender's strategy: the probability that each target will be protected by a security force, which will be introduced in Section 3.1. In Stackelberg security games, the defender (leader) first commits to a mixed strategy, assuming the attacker (follower) decides on a pure strategy after observing the defender's strategy. This models the situation where an attacker conducts surveillance to learn the defender's mixed strategy and then launches an attack on a single target.
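To make this compact representation concrete, here is a minimal Python sketch (illustrative only, not from the paper; the payoff numbers and function names are hypothetical) of both players' expected utilities given a marginal coverage vector x:

import numpy as np

def defender_eu(x, R_d, P_d):
    # Defender's expected utility per target: x_i * R_d[i] + (1 - x_i) * P_d[i].
    return x * R_d + (1.0 - x) * P_d

def attacker_eu(x, R_a, P_a):
    # Attacker's expected utility for attacking target i: x_i * P_a[i] + (1 - x_i) * R_a[i].
    return x * P_a + (1.0 - x) * R_a

# 8 targets, 3 guards: any x with 0 <= x_i <= 1 and sum(x) <= 3 is a feasible
# marginal coverage (equivalent to a mixed strategy over resource assignments
# when there are no assignment restrictions).
x = np.full(8, 3 / 8)
R_a = np.array([10, 8, 7, 5, 4, 3, 2, 1], dtype=float)          # hypothetical attacker rewards
P_a = np.array([-1, -2, -3, -4, -5, -6, -7, -8], dtype=float)   # hypothetical attacker penalties
print(attacker_eu(x, R_a, P_a).argmax())  # a perfectly rational attacker's choice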

In these non-zero-sum games, the attacker's utility of attacking a target decreases as the defender allocates more resources to protect it (and vice versa for the defender). In this work, we constrain the adversary to select a pure strategy. Given that the defender has limited resources (e.g., she may need to protect 8 targets with 3 guards), she must design her strategy to optimize against the adversary's response in order to maximize effectiveness. One leading family of algorithms to compute such mixed strategies is DOBSS and its successors [Pita et al., 2008; Kiekintveld et al., 2009], which are used in the deployed ARMOR and IRIS applications. These algorithms formulate the problem as a mixed-integer linear program (MILP) and compute an optimal mixed strategy for the defender assuming that the attacker responds optimally. However, in many real-world domains, agents face human adversaries whose behavior may not be optimal under perfect rationality. COBRA [Pita et al., 2010] represents the best available benchmark for how to determine defender strategies in security games against human adversaries, and it outperforms DOBSS with statistical significance in experiments using human subjects. This paper introduces alternative methods for computing strategies to play against human adversaries, based on two well-known theories from the behavioral literature: Prospect Theory (PT) and Quantal Response Equilibrium (QRE).

Prospect Theory is a Nobel-prize-winning theory [Kahneman and Tversky, 1979] which describes human decision making as a process of maximizing prospect. Prospect is defined as Σ_i π(p_i) V(C_i), where p_i is the actual probability of outcome C_i. The weighting function π(p_i) describes how probability p_i is perceived. π(·) is not consistent with the definition of probability, i.e., π(p) + π(1 − p) ≠ 1 in general. An empirical form of π(·) is shown in Fig. 1(a). The value function V(C_i) reflects the value of outcome C_i. PT indicates that individuals are risk averse regarding gain but risk seeking regarding loss, and care more about loss than gain, as shown in Fig. 1(b) [Hastie and Dawes, 2010].

[Figure 1: PT functions [Hastie and Dawes, 2010]. (a) The weighting function π(p) = p^γ / (p^γ + (1 − p)^γ)^{1/γ}; (b) the value function V(C) = C^α for C ≥ 0 and V(C) = −θ(−C)^β for C < 0.]

Quantal Response Equilibrium is an important model in behavioral game theory [McKelvey and Palfrey, 1995]. It suggests that instead of strictly maximizing utility, individuals respond stochastically in games: the chance of selecting a non-optimal strategy increases as the cost of such an error decreases. Recent work [Wright and Leyton-Brown, 2010] shows Quantal Level-k [Stahl and Wilson, 1994] to be best suited for predicting human behavior in simultaneous-move games. (We applied QRE instead of Quantal Level-k because in Stackelberg security games the attacker observes the defender's strategy, so level-k reasoning is not applicable.) However, the applicability of QRE and PT to security games, and their comparison with COBRA, remain open questions.

3 Defender Mixed-Strategy Computation

We now describe efficient computation of the optimal defender mixed strategy, assuming a human adversary's response is based on either PT or QRE.

3.1 Methods for Computing PT

Best Response to Prospect Theory (BRPT) is a mixed-integer programming formulation for the optimal leader strategy against players whose response follows a PT model. Only the adversary is modeled using PT in this case, since the defender's actions are recommended by the decision aid.

  max_{x,q,a,d,z} d
  s.t.
    Σ_{i=1}^n Σ_{k=1}^5 x_ik ≤ Υ                                   (1)
    Σ_{k=1}^5 (x_ik + x'_ik) = 1, ∀i                                (2)
    0 ≤ x_ik, x'_ik ≤ c_k − c_{k−1}, ∀i, k = 1..5                   (3)
    z_ik (c_k − c_{k−1}) ≤ x_ik, ∀i, k = 1..4                       (4)
    z'_ik (c_k − c_{k−1}) ≤ x'_ik, ∀i, k = 1..4                     (5)
    x_{i(k+1)} ≤ z_ik, ∀i, k = 1..4                                 (6)
    x'_{i(k+1)} ≤ z'_ik, ∀i, k = 1..4                               (7)
    z_ik, z'_ik ∈ {0, 1}, ∀i, k = 1..4                              (8)
    x̃_i = Σ_{k=1}^5 b_k x_ik,  x̃'_i = Σ_{k=1}^5 b_k x'_ik, ∀i      (9)
    Σ_{i=1}^n q_i = 1, q_i ∈ {0, 1}, ∀i                             (10)
    0 ≤ a − (x̃_i (P_i^a)' + x̃'_i (R_i^a)') ≤ M(1 − q_i), ∀i        (11)
    M(1 − q_i) + Σ_{k=1}^5 (x_ik R_i^d + x'_ik P_i^d) ≥ d, ∀i       (12)

BRPT maximizes d, the defender's expected utility. The defender has a limited number of resources, Υ, to protect the set of targets t_i ∈ T for i = 1..n. The defender selects a strategy x that describes the probability that each target will be protected by a resource; we denote these individual probabilities by x_i. Note that x = ⟨x_i⟩ is the marginal distribution on each target, which is equivalent to a mixed strategy over all possible assignments of the security forces. (It is proved in [Korzhyk et al., 2010] that the marginal probability distribution of covering each target is equivalent to a mixed strategy over all possible resource assignments when there are no assignment restrictions.)

The attacker chooses a target to attack after observing x. We denote the attacker's choice using the vector of binary variables q_i for i = 1..n, where q_i = 1 if t_i is attacked and 0 otherwise. In security games, the payoffs depend only on whether or not the attack was successful. So, given a target t_i, the defender receives reward R_i^d if the adversary attacks a target that is covered by the defender; otherwise, the defender receives penalty P_i^d. Respectively, the attacker receives penalty P_i^a in the former case and reward R_i^a in the latter case.

The defender optimization problem is given in Equations (1)-(12). PT comes into the algorithm by adjusting the weighting and value functions as described above. The benefit (prospect) perceived by the adversary for attacking target t_i, if the defender plays the mixed strategy x, is given by π(x_i)V(P_i^a) + π(1 − x_i)V(R_i^a). Let (P_i^a)' = V(P_i^a) and (R_i^a)' = V(R_i^a) denote the adversary's values of penalty P_i^a and reward R_i^a, which are both given input parameters to the MILP. We use a piecewise-linear function π̃(·) to approximate the non-linear weighting function π(·), and empirically set 5 segments for π̃(·). (This piecewise-linear representation of π(·) achieves a small approximation error: sup_{z∈[0,1]} |π̃(z) − π(z)| ≤ 0.03.) This function is defined by {c_k | c_0 = 0, c_5 = 1, c_k < c_{k+1}, k = 0, ..., 5}, which represent the endpoints of the linear segments, and {b_k | k = 1, ..., 5}, which represent the slope of each linear segment. According to PT, the probability x_i is perceived by the attacker as x̃_i = π̃(x_i) = Σ_{k=1}^5 b_k x_ik, as discussed below.

In order to represent the piecewise-linear approximation, i.e., π̃(x_i) (and π̃(1 − x_i)), we break x_i (and 1 − x_i) into five segments, denoted by the variables x_ik (and x'_ik). Such a breakup of x_i (and 1 − x_i) is correct if segment x_ik (and x'_ik) is positive only when the previous segment is used completely, for which we need the auxiliary integer variables z_ik (and z'_ik). This is enforced by Equations (3)-(8). Equation (9) defines x̃_i and x̃'_i as the values of the piecewise-linear approximations of x_i and 1 − x_i: x̃_i = π̃(x_i) and x̃'_i = π̃(1 − x_i). Equations (10) and (11) define the optimal adversary's pure strategy. In particular, Equation (11) enforces that q_i = 1 for the action that achieves maximal prospect for the adversary. Equation (12) enforces that d is the defender's expected utility on the target that is attacked by the adversary (q_i = 1).

Robust-PT (RPT) modifies the BRPT method to account for some uncertainty about the adversary's choice, caused (for example) by imprecise computations [Simon, 1956]. Similar to COBRA, RPT assumes that the adversary may choose any strategy within ε of the best choice, defined here by the prospect of each action. It optimizes the worst-case outcome for the defender among the set of strategies that have prospect for the attacker within ε of the optimal prospect. We modify the BRPT optimization problem as follows: the first 11 equations are equivalent to those in BRPT; in Equation (13), the binary variable h_i indicates all the ε-optimal strategies for the adversary; the ε-optimality assumption is embedded in Equation (15), which forces h_i = 1 for any target t_i that leads to a prospect within ε of the optimal prospect, i.e., a; Equation (16) enforces that d is the minimum expected utility of the defender against the ε-optimal strategies of the adversary.

  max_{x,h,q,a,d,z} d
  s.t.
    Equations (1)-(11)
    Σ_{i=1}^n h_i ≥ 1                                               (13)
    h_i ∈ {0, 1}, q_i ≤ h_i, ∀i                                     (14)
    ε(1 − h_i) ≤ a − (x̃_i (P_i^a)' + x̃'_i (R_i^a)') ≤ ε + M(1 − h_i), ∀i   (15)
    M(1 − h_i) + Σ_{k=1}^5 (x_ik R_i^d + x'_ik P_i^d) ≥ d, ∀i       (16)

Runtime: We choose AMPL (http://www.ampl.com/) to solve the MILPs, with CPLEX as the solver. Both BRPT and RPT take less than 1 second for up to 10 targets.
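To make the PT quantities concrete, here is a minimal Python sketch (not the authors' code) of the prospect computation with a five-segment piecewise-linear weighting approximation like the one the MILP encodes; the parameter values γ, α, β, θ and the uniform segment endpoints are assumptions of this sketch rather than values from the paper.

import numpy as np

GAMMA, ALPHA, BETA, THETA = 0.64, 0.88, 0.88, 2.25  # assumed typical PT parameters

def weight(p, gamma=GAMMA):
    # Weighting function of Fig. 1(a): pi(p) = p^g / (p^g + (1-p)^g)^(1/g).
    return p**gamma / (p**gamma + (1.0 - p)**gamma) ** (1.0 / gamma)

def value(c, alpha=ALPHA, beta=BETA, theta=THETA):
    # Value function of Fig. 1(b): risk averse for gains, risk seeking
    # (and steeper, via theta) for losses.
    return c**alpha if c >= 0 else -theta * (-c)**beta

def weight_pwl(p, n_seg=5):
    # Five-segment piecewise-linear approximation of pi(.) on [0, 1], in the
    # spirit of the MILP's endpoints c_k and slopes b_k (uniform endpoints
    # are an assumption of this sketch).
    c = np.linspace(0.0, 1.0, n_seg + 1)        # endpoints c_0, ..., c_5
    b = np.diff(weight(c)) / np.diff(c)         # slope b_k of each segment
    seg = np.clip(p - c[:-1], 0.0, np.diff(c))  # portion x_k of each segment used
    return float(np.dot(b, seg))                # pi_tilde(p) = sum_k b_k x_k

def prospect(x_i, R_a, P_a):
    # Adversary's prospect for attacking a target covered with probability x_i:
    # pi(x_i) V(P_a) + pi(1 - x_i) V(R_a).
    return weight_pwl(x_i) * value(P_a) + weight_pwl(1.0 - x_i) * value(R_a)

print(prospect(0.375, R_a=10, P_a=-5))  # hypothetical target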
3.2 Methods for Computing QRE

In applying the QRE model to our domain, we only add noise to the response function for the adversary, so the defender computes an optimal strategy assuming the attacker responds with a noisy best response. The parameter λ represents the amount of noise in the attacker's response. Given λ and the defender's mixed strategy x, the adversary's quantal response q_i (i.e., the probability of attacking target t_i) can be written as

  q_i = e^{λ U_i^a(x)} / Σ_{j=1}^n e^{λ U_j^a(x)}                   (17)

where U_i^a(x) = x_i P_i^a + (1 − x_i) R_i^a is the adversary's expected utility for attacking t_i and x is the defender's strategy. Equivalently,

  q_i = e^{λ R_i^a} e^{−λ (R_i^a − P_i^a) x_i} / Σ_{j=1}^n e^{λ R_j^a} e^{−λ (R_j^a − P_j^a) x_j}   (18)

The goal is to maximize the defender's expected utility given q, i.e., Σ_{i=1}^n q_i (x_i R_i^d + (1 − x_i) P_i^d). Combined with Equation (18), the problem of finding the optimal mixed strategy for the defender can be formulated as

  max_x  Σ_{i=1}^n [e^{λ R_i^a} e^{−λ (R_i^a − P_i^a) x_i} ((R_i^d − P_i^d) x_i + P_i^d)] / [Σ_{j=1}^n e^{λ R_j^a} e^{−λ (R_j^a − P_j^a) x_j}]   (19)
  s.t.  Σ_{i=1}^n x_i ≤ Υ,  0 ≤ x_i ≤ 1, ∀i

Given that the objective function in Equation (19) is non-linear and non-convex in its most general form, finding the global optimum is extremely difficult. Therefore, we focus on methods to find local optima. To compute an approximately optimal QRE strategy efficiently, we developed the Best Response to Quantal Response (BRQR) heuristic described in Algorithm 1. We first take the negative of Equation (19), converting the maximization problem to a minimization problem. In each iteration, we find a local minimum using a gradient-descent technique from the given starting point (we use the fmincon function in Matlab for this step). If there are multiple local minima, then by randomly setting the starting point in each iteration the algorithm will reach different local minima with non-zero probability. By increasing the iteration number, IterN, the probability of reaching the global minimum increases.

Algorithm 1: BRQR
 1: opt_g ← +∞  {initialize the global optimum}
 2: for i = 1, ..., IterN do
 3:   x^0 ← randomly generate a feasible starting point
 4:   (opt_l, x*) ← FindLocalMinimum(x^0)
 5:   if opt_g > opt_l then
 6:     opt_g ← opt_l, x_opt ← x*
 7:   end if
 8: end for
 9: return opt_g, x_opt

Parameter Estimation: The parameter λ in the QRE model represents the amount of noise in the best-response function. One extreme case is λ = 0, when play becomes uniformly random; the other extreme case is λ = ∞, when the quantal response is identical to the best response. λ is sensitive to the game payoff structure, so tuning λ is a crucial step in applying the QRE model. We employed Maximum Likelihood Estimation (MLE) to fit λ using data from [Pita et al., 2010]. Given the defender's mixed strategy x and N samples of the players' choices, the log-likelihood of λ is

  log L(λ | x) = Σ_{j=1}^N log q_{τ(j)}(λ)

where τ(j) denotes the target attacked by the player in sample j. Let N_i be the number of subjects attacking target t_i. Then we have log L(λ | x) = Σ_{i=1}^n N_i log q_i(λ). Combining with Equation (17),

  log L(λ | x) = λ Σ_{i=1}^n N_i U_i^a(x) − N log(Σ_{i=1}^n e^{λ U_i^a(x)})

log L(λ | x) is a concave function, since its second derivative is

  d² log L / dλ² = −[Σ_{i<j} (U_i^a(x) − U_j^a(x))² e^{λ (U_i^a(x) + U_j^a(x))}] / (Σ_i e^{λ U_i^a(x)})² < 0

Therefore, log L(λ | x) has only one local maximum. The MLE of λ is 0.76 for the data used from [Pita et al., 2010].

Runtime: We implement BRQR in Matlab. With 10 targets and IterN = 300, the runtime of BRQR is less than 1 minute. In comparison, with only 4 targets, LINGO (http://www.lindo.com/) cannot compute the global optimum of Equation (19) within one hour.
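The following Python sketch mirrors this pipeline under stated assumptions: the quantal response of Equation (17), an MLE-style fit of λ, and BRQR's multi-start local optimization, with scipy's SLSQP standing in for Matlab's fmincon. The payoffs are randomly generated and all names are illustrative.

import numpy as np
from scipy.optimize import minimize, minimize_scalar

rng = np.random.default_rng(0)
n, guards, lam = 8, 3, 0.76                   # lambda = 0.76 is the MLE reported above
R_a = rng.integers(1, 11, n).astype(float)    # hypothetical attacker rewards in [1, 10]
P_a = -rng.integers(1, 11, n).astype(float)   # hypothetical attacker penalties in [-10, -1]
R_d = rng.integers(1, 11, n).astype(float)
P_d = -rng.integers(1, 11, n).astype(float)

def quantal_response(x):
    # Eq. (17)/(18): q_i proportional to exp(lambda * U_i^a(x)).
    u = x * P_a + (1.0 - x) * R_a
    e = np.exp(lam * (u - u.max()))           # shift for numerical stability
    return e / e.sum()

def neg_defender_eu(x):
    # Negative of the objective in Eq. (19), so we can minimize.
    return -np.dot(quantal_response(x), x * R_d + (1.0 - x) * P_d)

def fit_lambda(x, counts):
    # MLE of lambda from attack counts N_i; the log-likelihood is concave,
    # so a bounded 1-D search finds its unique maximum.
    u = x * P_a + (1.0 - x) * R_a
    nll = lambda l: -(l * np.dot(counts, u) - counts.sum() * np.log(np.sum(np.exp(l * u))))
    return minimize_scalar(nll, bounds=(0.0, 5.0), method="bounded").x

# BRQR (Algorithm 1): multi-start local optimization.
bounds = [(0.0, 1.0)] * n
cons = [{"type": "ineq", "fun": lambda x: guards - x.sum()}]  # sum(x) <= guards
best, x_opt = np.inf, None
for _ in range(300):                          # IterN restarts
    x0 = rng.uniform(0.0, 1.0, n)
    x0 *= min(1.0, guards / x0.sum())         # scale into the feasible region
    res = minimize(neg_defender_eu, x0, bounds=bounds, constraints=cons)
    if res.fun < best:
        best, x_opt = res.fun, res.x
print("defender EU:", -best, "coverage:", np.round(x_opt, 3))
print(fit_lambda(x_opt, np.array([12, 3, 0, 9, 5, 4, 6, 1], dtype=float)))  # hypothetical counts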

4 Payoff Structure Classification

One important property of payoff structures we want to examine is their influence on model performance. We certainly cannot test over all possible payoff structures, so the challenges are: (i) the payoff structures we select should be representative of the payoff-structure space; (ii) the strategies generated by the different algorithms should be sufficiently separated. As we will discuss later, the payoff structures used in [Pita et al., 2010] do not address these challenges.

We address the first criterion by randomly sampling 1000 payoff structures, each with 8 targets. R_i^a and R_i^d are integers drawn from Z+ ∩ [1, 10]; P_i^a and P_i^d are integers drawn from Z− ∩ [−10, −1]. This scale is similar to the payoff structures used in [Pita et al., 2010]. We then clustered the payoff structures into four clusters using k-means clustering based on eight features, which are defined in Table 1. Intuitively, features 1 and 2 describe how good the game is for the adversary, features 3 and 4 describe how good the game is for the defender, and features 5-8 reflect the level of conflict between the two players, in the sense that they measure the ratio of one player's gain over the other player's loss. In Fig. 2, all payoff structures are projected onto the first two Principal Component Analysis (PCA) dimensions for visualization.

Table 1: A-priori defined features
  Feature 1: mean(R_i^a / |P_i^a|)   Feature 2: std(R_i^a / |P_i^a|)
  Feature 3: mean(R_i^d / |P_i^d|)   Feature 4: std(R_i^d / |P_i^d|)
  Feature 5: mean(R_i^a / |P_i^d|)   Feature 6: std(R_i^a / |P_i^d|)
  Feature 7: mean(R_i^d / |P_i^a|)   Feature 8: std(R_i^d / |P_i^a|)

[Figure 2: Payoff Structure Clusters. The sampled payoff structures projected onto the first two PCA components, showing the four k-means clusters, the selected payoffs 1-4, and payoffs 5-7 from Pita et al.]
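The sampling-and-clustering step can be sketched as follows; this is a minimal illustration assuming the Table 1 features as reconstructed above, with scikit-learn's k-means standing in for whatever tooling the authors used.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
n_games, n_targets = 1000, 8

def sample_payoff():
    # R_a, R_d in {1..10}; P_a, P_d in {-10..-1}, one value per target.
    return {
        "R_a": rng.integers(1, 11, n_targets).astype(float),
        "P_a": -rng.integers(1, 11, n_targets).astype(float),
        "R_d": rng.integers(1, 11, n_targets).astype(float),
        "P_d": -rng.integers(1, 11, n_targets).astype(float),
    }

def features(g):
    # Eight a-priori features (Table 1): mean/std of gain-over-loss ratios
    # within and across the two players.
    ratios = [g["R_a"] / -g["P_a"], g["R_d"] / -g["P_d"],
              g["R_a"] / -g["P_d"], g["R_d"] / -g["P_a"]]
    return np.array([f(r) for r in ratios for f in (np.mean, np.std)])

games = [sample_payoff() for _ in range(n_games)]
X = np.array([features(g) for g in games])
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))  # cluster sizes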
We select one payoff structure from each cluster, following the criteria below to obtain sufficiently different strategies for the different candidate algorithms (a sketch of the distance computation follows this list):

- We define the distance between two mixed strategies, x^k and x^l, using the symmetrized Kullback-Leibler divergence: D(x^k, x^l) = D_KL(x^k || x^l) + D_KL(x^l || x^k), where D_KL(x^k || x^l) = Σ_{i=1}^n x_i^k log(x_i^k / x_i^l).
- For each payoff structure, D(x^k, x^l) is measured for every pair of strategies. With five strategies (discussed later), we have 10 such measurements.
- We remove payoff structures that have a mean or minimum of these quantities below a given threshold. This gives us a subset of about 25 payoff structures in each cluster.
- We then select the one payoff structure closest to the cluster center from the subset of each cluster.
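As referenced above, a short sketch of the strategy-distance measure; normalizing each coverage vector to sum to 1 so that D_KL is well defined is an assumption of this sketch.

import numpy as np
from itertools import combinations

def kl(p, q, eps=1e-12):
    # D_KL(p || q) with clipping; inputs are normalized to sum to 1 first.
    p = np.clip(np.asarray(p, dtype=float), eps, None)
    q = np.clip(np.asarray(q, dtype=float), eps, None)
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def strategy_distance(x_k, x_l):
    # D(x^k, x^l) = D_KL(x^k || x^l) + D_KL(x^l || x^k).
    return kl(x_k, x_l) + kl(x_l, x_k)

# With five candidate strategies there are C(5, 2) = 10 pairwise distances;
# keep a payoff structure only if their mean and minimum exceed a threshold.
strategies = [np.random.default_rng(s).dirichlet(np.ones(8)) * 3 for s in range(5)]  # hypothetical
dists = [strategy_distance(a, b) for a, b in combinations(strategies, 2)]
print(np.mean(dists), np.min(dists))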

Table 2: Strategy Distance
  Payoff Structure:  1    2    3    4    5    6    7
  mean D_KL:        .83  .9   .64  .88  .32  .5   .2
  min D_KL:         .26  .25  .2   .25  .7   .2   .4

The four payoff structures (payoffs 1-4) we selected from each cluster are marked in Fig. 2, as are the three (payoffs 5-7) used in [Pita et al., 2010]. Fig. 2 shows that payoffs 5-7 all belong to cluster 3. Furthermore, Table 2 reports the strategy distances in all seven payoff structures. The strategies are not as well separated in payoffs 5-7 as they are in payoffs 1-4. As we discuss in Section 5.2, the performance of the different strategies is quite similar in payoffs 5-7.

5 Experiments

We conducted empirical tests with human subjects playing an online game to evaluate the performance of leader strategies generated by five candidate algorithms. We based our model on the LAX airport, which has eight terminals that can be targeted in an attack [Pita et al., 2008]. Subjects play the role of followers and are able to observe the leader's mixed strategy (i.e., the randomized allocation of security resources).

5.1 Experimental Setup

Fig. 3 shows the interface of the web-based game we developed to present subjects with choice problems. Players were introduced to the game through a series of explanatory screens describing how the game is played. In each game instance a subject was asked to choose one of the eight gates to open (attack). They knew that guards were protecting three of the eight gates, but not which ones. Subjects were rewarded based on the reward/penalty shown for each gate and the probability that a guard was behind the gate (i.e., the exact randomized strategy of the defender). To motivate the subjects, they would earn or lose money based on whether or not they succeeded in attacking a gate: if the subject opened a gate not protected by the guards, they won; otherwise, they lost. Subjects started with an endowment of $8, and each point won or lost in a game instance was worth $0.10. On average, subjects earned about $14 in cash.

[Figure 3: Game Interface]

We tested the seven different payoff structures from Fig. 2 (four new, three from [Pita et al., 2010]); the payoff structures, defender mixed strategies, and subjects' choices are available at http://anon-submission.webs.com/. For each payoff structure we tested the mixed strategies generated by five algorithms: BRPT, RPT, BRQR, COBRA, and DOBSS. There were a total of 35 payoff structure/strategy combinations, and each subject played all 35 combinations. In order to mitigate ordering effects on subject responses, a total of 35 different orderings of the 35 combinations were generated using a Latin Square design. Every ordering contained each of the 35 combinations exactly once, and each combination appeared exactly once in each of the 35 positions across all 35 orderings. The order played by each subject was drawn uniformly at random from the 35 possible orderings (a sketch of one such construction appears after Table 3). To further mitigate learning, no feedback on success or failure was given to the subjects until the end of the experiment. A total of 40 human subjects played the game.

We could explore only a limited number of parameters for each algorithm, which were selected following the best available information in the literature. The parameter settings for each algorithm are reported in Table 3; DOBSS has no parameters. The values of the PT parameters are typical values reported in the literature [Hastie and Dawes, 2010].

Table 3: Model Parameters
  Payoff Structure:  1    2    3    4     5    6    7
  RPT-ε:             2.4  3.   2.   2.75  .9   .5   .5
  COBRA-α:           .5   .5   .5   .5    .37  .25
  COBRA-ε:           2.5  2.9  2.   2.75  2.5  2.5  2.5
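As referenced above, here is a minimal sketch of a counterbalanced Latin-square design for the 35 combinations; the cyclic construction is an assumption of this sketch, since the paper only states the Latin-square properties.

import numpy as np

n_comb = 35
orderings = np.array([[(r + p) % n_comb for p in range(n_comb)]
                      for r in range(n_comb)])
# Each row is one ordering: every combination appears once per row, and each
# combination appears exactly once in each position (column) across the rows.
assert all(len(set(row)) == n_comb for row in orderings)
assert all(len(set(col)) == n_comb for col in orderings.T)
rng = np.random.default_rng()
subject_order = orderings[rng.integers(n_comb)]  # drawn uniformly per subject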
We set ε in RPT following two rules: (i) no more than half of the targets are in the ε-optimal set; (ii) ε ≤ 0.3 R_max^a, where R_max^a is the maximum potential reward for the adversary. The size of the ε-optimal set increases as the value of ε increases; when ε is sufficiently large, the defender's strategy becomes maximin, since she believes that the adversary may attack any target. The second rule limits the imprecision in the attacker's choice; we empirically set the limit to 0.3 R_max^a. For BRQR, we set λ using MLE with data reported in [Pita et al., 2010] (see Section 3.2). For payoffs 1-4, we set the parameters for COBRA following the advice given by [Pita et al., 2010] as closely as possible; in particular, the values we set for α meet the entropy heuristic discussed in that work. For payoffs 5-7, we use the same parameter settings as in their work.

5.2 Experiment Results

We used the defender's expected utility to evaluate the performance of the different defender strategies. Given that a subject selects target t_i to attack, the defender's expected utility depends on the strategy x she played:

  U_exp^d(x, t_i) = x_i R_i^d + (1 − x_i) P_i^d

Average Performance: We first evaluate the average defender expected utility, Ū_exp^d(x), of the different defender strategies based on all 40 subjects' choices:

  Ū_exp^d(x) = (1/40) Σ_{i=1}^n N_i U_exp^d(x, t_i)

where N_i is the number of subjects that chose target t_i.
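A minimal sketch of this evaluation metric; the payoffs, strategy, and choice counts below are illustrative, not the experiment's data.

import numpy as np

def avg_defender_eu(x, R_d, P_d, counts):
    # counts[i] = N_i, the number of subjects who attacked target t_i.
    # Per-target EU: U_exp^d(x, t_i) = x_i R_d[i] + (1 - x_i) P_d[i];
    # the average weights each target by N_i / (total subjects).
    per_target = x * R_d + (1.0 - x) * P_d
    counts = np.asarray(counts, dtype=float)
    return float(np.dot(counts, per_target) / counts.sum())

counts = [12, 3, 0, 9, 5, 4, 6, 1]             # hypothetical choices of 40 subjects
x = np.full(8, 3 / 8)                          # hypothetical defender strategy
R_d = np.arange(1, 9, dtype=float)             # hypothetical defender payoffs
P_d = -np.arange(8, 0, -1, dtype=float)
print(avg_defender_eu(x, R_d, P_d, counts))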

Fig. 4 displays Ū_exp^d(x) for the different strategies in each payoff structure. The performance of the strategies is closer in payoffs 5-7 than in payoffs 1-4; the main reason is that the strategies are not very different in payoffs 5-7 (see Table 2). We evaluate the statistical significance of our results using the bootstrap-t method [Wilcox, 2003]. The comparison is summarized below:

- BRQR outperforms COBRA in all seven payoff structures. The result is statistically significant in three cases (p < 0.05) and borderline in payoff 3 (p < 0.06). BRQR also outperforms DOBSS in all cases, with statistical significance in five of them (p < 0.02).
- RPT outperforms COBRA except in payoff 3. The difference is statistically significant in payoff 4 (p < 0.05); in payoff 3, COBRA outperforms RPT (p > 0.7). Meanwhile, RPT outperforms DOBSS in five payoff structures, with statistical significance in four of them (p < 0.05). In the other two cases, DOBSS has better performance (p > 0.8).
- BRQR outperforms RPT in three payoff structures with statistical significance (p < 0.05). They have very similar performance in the other four cases.
- BRPT is outperformed by BRQR in all cases with statistical significance (p < 0.03). It is also outperformed by RPT in all cases, with statistical significance in five of them (p < 0.02) and one borderline (p < 0.06). BRPT's failure to perform better (it is even worse than COBRA) is a surprising outcome.

[Figure 4: Average Expected Utility of Defender for BRPT, RPT, BRQR, COBRA, and DOBSS. (a) New payoffs (payoffs 1-4); (b) payoffs from Pita et al. (payoffs 5-7).]

Robustness: The distribution of the defender's expected utility is also analysed, to evaluate the robustness of the different defender strategies. Figure 5 displays the empirical Cumulative Distribution Function (CDF) of U_exp^d(x, t_i) for the different defender strategies, based on the choices of all 40 subjects. The x-axis is the defender's expected utility; the y-axis shows the percentage of subjects against whom the defender has gained less than a certain amount of expected utility. As a curve moves towards the left, the defender's expected utility decreases against a certain percentage of the subjects, and vice versa. The leftmost point where a curve becomes positive indicates the worst defender expected utility of a strategy against the different subjects; the range of the curve on the x-axis indicates the reliability of the strategy against various subjects. As can be seen from Figure 5, the defender's expected utility has the smallest variance when the BRQR strategy is played, while the DOBSS and BRPT strategies lead to large variance in defender expected utility. Furthermore, BRQR achieves the highest worst-case defender expected utility in all payoff structures except payoff 5, where the CDFs of the BRQR and RPT strategies are very close.
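A short sketch of the empirical CDF computation behind this robustness analysis; the per-subject utilities below are hypothetical.

import numpy as np

def empirical_cdf(per_subject_eu):
    # Sorted EU values and the fraction of subjects at or below each value.
    v = np.sort(np.asarray(per_subject_eu, dtype=float))
    frac = np.arange(1, len(v) + 1) / len(v)
    return v, frac

eu = np.array([-4.0, -4.0, -1.2, 2.5, 2.5, 2.5, 3.1, 3.1])  # hypothetical per-subject EUs
vals, frac = empirical_cdf(eu)
print(vals[0], np.ptp(vals))  # worst-case EU and the spread (reliability)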
BRPT and DOBSS are not robust against an adversary who deviates from the optimal strategy. BRQR, RPT, and COBRA all try to be robust against such deviations. BRQR assigns some (possibly very small) probability to the adversary attacking any target. In contrast, COBRA and RPT separate the targets into two groups, the ε-optimal set and the non-ε-optimal set, using a hard threshold. They then try to maximize the worst case for the defender assuming the response will be in the ε-optimal set, but assign fewer resources to the other targets. When the non-ε-optimal targets have high defender penalties, COBRA and RPT become vulnerable, especially in the following two cases:

- Unattractive targets are those with a small reward but a large penalty for the adversary. COBRA and RPT consider such targets non-ε-optimal and assign significantly fewer resources to them than BRQR does. However, some subjects still selected such targets and caused severe damage to COBRA and RPT (e.g., about 30% of subjects selected door 5 in payoff 4 against COBRA).
- High-risk targets are those with a large reward and a large penalty for the adversary. RPT considers such targets non-ε-optimal and assigns far fewer resources to them than the other algorithms. This is caused by the PT assumptions that people care more about loss than gain and that they overestimate small probabilities. However, the experiments show that RPT gets hurt significantly on such targets (e.g., more than 50% of subjects selected door 1 in payoff 2).

Overall, BRQR performs best, RPT outperforms COBRA in six of the seven cases, and BRPT and DOBSS perform the worst.

6 Conclusions

The unrealistic assumptions of perfect rationality made by existing algorithms applying game-theoretic techniques to real-world security games need to be addressed, given their limitations in facing human adversaries. This paper successfully integrates two important human behavior theories, PT and QRE, into building a more realistic decision-support tool. To that end, the main contributions of this paper are: (i) developing efficient new algorithms based on PT and QRE models of human behavior;

[Figure 5: Distribution of the Defender's Expected Utility. Empirical CDF curves for BRPT, RPT, BRQR, COBRA, and DOBSS in each payoff structure. (a) New payoffs (payoffs 1-4); (b) payoffs from Pita et al. (payoffs 5-7).]

(ii) conducting the most comprehensive experiments to date with human subjects for security games (40 subjects, 5 strategies, 7 game structures); and (iii) designing techniques for generating representative payoff structures for behavioral experiments in generic classes of games. By providing new algorithms that outperform the leading competitor, this paper has advanced the state of the art.

Acknowledgments

This research was supported by the Army Research Office under grant # W911NF-10-1-0185. We also thank Mohit Goenka and James Pita for their help in developing the web-based game. F. Ordonez would also like to acknowledge the support of Conicyt, through Grant No. ACT87.

References

[Camerer et al., 2004] C. F. Camerer, T. Ho, and J. Chong. A cognitive hierarchy model of games. QJE, 119(3):861-898, 2004.
[Hastie and Dawes, 2010] R. Hastie and R. M. Dawes. Rational Choice in an Uncertain World: The Psychology of Judgement and Decision Making. Sage Publications, Thousand Oaks, 2010.
[Kahneman and Tversky, 1979] D. Kahneman and A. Tversky. Prospect theory: An analysis of decision under risk. Econometrica, 47(2):263-292, 1979.
[Kiekintveld et al., 2009] C. Kiekintveld, M. Jain, J. Tsai, J. Pita, F. Ordonez, and M. Tambe. Computing optimal randomized resource allocations for massive security games. In AAMAS, 2009.
[Korzhyk et al., 2010] D. Korzhyk, V. Conitzer, and R. Parr. Complexity of computing optimal Stackelberg strategies in security resource allocation games. In AAAI, 2010.
[McKelvey and Palfrey, 1995] R. D. McKelvey and T. R. Palfrey. Quantal response equilibria for normal form games. Games and Economic Behavior, 10:6-38, 1995.
[Pita et al., 2008] J. Pita, M. Jain, F. Ordonez, C. Portway, M. Tambe, C. Western, P. Paruchuri, and S. Kraus. Deployed ARMOR protection: The application of a game theoretic model for security at the Los Angeles International Airport. In AAMAS, 2008.
[Pita et al., 2010] J. Pita, M. Jain, F. Ordonez, M. Tambe, and S. Kraus. Solving Stackelberg games in the real-world: Addressing bounded rationality and limited observations in human preference models. Artificial Intelligence Journal, 174(15):1142-1171, 2010.
[Simon, 1956] H. Simon. Rational choice and the structure of the environment. Psychological Review, 63(2):129-138, 1956.
[Stahl and Wilson, 1994] D. O. Stahl and P. W. Wilson. Experimental evidence on players' models of other players. JEBO, 25(3):309-327, 1994.
[Tsai et al., 2009] J. Tsai, S. Rathi, C. Kiekintveld, F. Ordonez, and M. Tambe. IRIS - a tool for strategic security allocation in transportation networks. In AAMAS, 2009.
[Wilcox, 2003] R. R. Wilcox. Applying Contemporary Statistical Techniques. Academic Press, 2003.
[Wright and Leyton-Brown, 2010] J. R. Wright and K. Leyton-Brown. Beyond equilibrium: Predicting human behavior in normal-form games. In AAAI, 2010.