Predicting the behavior of interacting humans by fusing data from multiple sources




Erik J. Schlicht 1, Ritchie Lee 2, David H. Wolpert 3,4, Mykel J. Kochenderfer 1, and Brendan Tracey 5

1 Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, MA 02420
2 Carnegie Mellon Silicon Valley, NASA Ames Research Park, Moffett Field, CA 94035
3 Information Sciences Group, MS B256, Los Alamos National Laboratory, Los Alamos, NM 87545
4 Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501
5 Stanford University, Palo Alto, CA 94304

Abstract

Multi-fidelity methods combine inexpensive low-fidelity simulations with costly but high-fidelity simulations to produce an accurate model of a system of interest at minimal cost. They have proven useful in modeling physical systems and have been applied to engineering problems such as wing-design optimization. During human-in-the-loop experimentation, it has become increasingly common to use online platforms, like Mechanical Turk, to run low-fidelity experiments to gather human performance data in an efficient manner. One concern with these experiments is that the results obtained from the online environment generalize poorly to the actual domain of interest. To address this limitation, we extend traditional multi-fidelity approaches to allow us to combine fewer data points from high-fidelity human-in-the-loop experiments with plentiful but less accurate data from low-fidelity experiments to produce accurate models of how humans interact. We present both model-based and model-free methods, and summarize the predictive performance of each method under different conditions.

1 Introduction

The benefit of using high-fidelity simulations of a system of interest is that they produce results that closely match observations of the real-world system. Ideally, algorithmic design optimization would be performed using such accurate models. However, the cost of high-fidelity simulation (computational or financial) often prohibits running more than a few simulations during the optimization process. The high-fidelity models built from such a small number of high-fidelity data points tend not to generalize well. One approach to overcoming this limitation is to use multi-fidelity optimization, where both low-fidelity and high-fidelity data are used together during the optimization process (Robinson et al., 2006).

When the system of interest involves humans, traditionally human-in-the-loop (HITL) experimentation has been used to build models for design optimization. A common approach in HITL experimentation is to make the simulation environment match the true environment as closely as possible (Aponso et al., 2009). A major disadvantage of this approach is that designing a good high-fidelity HITL simulation requires substantial human and technological resources. Experimentation requires highly trained participants who need to physically visit the test location. As a consequence, collecting large amounts of high-fidelity data under different conditions is often infeasible. In many cases, one could cheaply implement a lower-fidelity simulation environment using test subjects who are not as extensively trained, allowing for the collection of large amounts of less accurate data. Indeed, this is the idea behind computational testbeds like Mechanical Turk (Kittur et al., 2008; Paolacci et al., 2010).

The goal of this paper is to extend multi-fidelity concepts to HITL experimentation, allowing for the combination of plentiful low-fidelity data with more sparse high-fidelity data. This approach results in an inexpensive model of interacting humans that generalizes well to real-world scenarios. In this paper, we begin by outlining the self-separation scenario used as a testbed to study multi-fidelity models of interacting humans. In Section 3 we introduce the multi-fidelity methods used in this paper. In Section 4, we present model-free approaches to multi-fidelity prediction, and in Section 5 we present model-based methods. Results, summarized in Section 6, show the benefit of incorporating low-fidelity data to make high-fidelity predictions and that model-based methods outperform model-free methods. Section 7 concludes with a summary of our findings and future directions for investigation.

2 Modeling Interacting Humans

In order to ground our discussion of multi-fidelity methods in a concrete scenario, we will begin by presenting our testbed for studying the behavior of interacting humans. We model the interaction of two pilots whose aircraft are on a near-collision course. Both pilots want to avoid collision (Fig. 1a), but they also want to maintain their heading. For simplicity, we model the pilots as making a single decision in the encounter. Their decisions are based on their beliefs about the other aircraft's position and velocity, their degree of preference for maintaining course, and their prediction of the other pilot's decision. We use a combination of Bayesian networks and game-theoretic concepts (Lee and Wolpert, 2012) to model this scenario. The structure of the model is shown in Fig. 1b. The state s^i of aircraft i is a four-dimensional vector representing the horizontal position and velocity information. The initial distribution over states is described in Section 3. The action a^i taken by the pilot of aircraft i represents the angular change in heading, away from their current heading. Each pilot observes the state of the other aircraft based on their view out the window o^i_ow and their instrumentation o^i_in. Each pilot has a weight w^i in their utility function reflecting their relative desire to avoid collision versus maintaining heading. We consider the situation where each player knows their own type (i.e., the value of the weight term in their utility function) in addition to the type of the other player (Myerson, 1997). As described in this section, the pilots choose their actions based on their observations and utility function. After both pilots select an action, those actions are executed for 5 s, bringing the aircraft to their final states s^i_f. Without loss of generality we describe the scenario from the perspective of one of the pilots, Player 1, with the other aircraft corresponding to Player 2. The model assumptions, detailed below, are symmetric between players. It should be noted that the approaches described in this paper are not limited to cases in which there are only two humans interacting, although scenarios with more humans would increase the amount of data required to make accurate predictions.

Figure 1: Visual self-separation game. (a) Example geometry with model variables depicted. (b) Model structure. Shaded circles are observable variables, white circles are unobservable variables, squares are decision nodes, and arrows represent conditional relationships between variables.

2.1 Visual Inference

Player 1 infers the state of the intruder based on visual information out the window and instrumentation. Player 1 is not able to exactly infer the state of Player 2, as humans are not able to perfectly estimate position and velocity based on visual information (Graf et al., 2005; Schlicht and Schrater, 2007), and the instruments also have noise associated with their measurements (Fig. 1b). We model the out-the-window visual observation as Gaussian with a mean of the true state of the intruder aircraft and covariance matrix given by diag(900 ft, 900 ft, 318 ft/s, 318 ft/s). The instrument observation is Gaussian with a mean of the true state of the intruder with covariance given by diag(600 ft, 600 ft, 318 ft/s, 318 ft/s). The out-the-window velocity uncertainties are derived from the visual psychophysics literature (Weiss et al., 2002; Graf et al., 2005).

2.2 Decision-making

Theoretical models of multi-agent interaction offer a framework for formalizing decision-making of interacting humans. Past research has focused on game-theoretic approaches (Myerson, 1997) and their graphical representations (Koller and Milch, 2003; Lee and Wolpert, 2012) where agents make a single decision given a particular depth of reasoning between opponents. Such approaches have been successfully used for predicting human decision-making in competitive tasks (Camerer, 2003; Wright and Leyton-Brown, 2010, 2012). In cases when it is important to capture how people reason across time, sequential models of human interaction can be used (Littman, 1994). Such models can reflect either cooperative settings where agents are attempting to maximize mutual utility (Amato et al., 2009) or competitive contexts where agents are attempting to maximize their own reward (Doshi and Gmytrasiewicz, 2005, 2006). Graphical forms of sequential decision processes have also been developed (Zeng and Xiang, 2010).
Since humans often reason over a relatively short time horizon (Sellitto et al., 2010), especially in sudden, stressful scenarios, we model our interacting pilots scenario as a one-shot game. Like Lee and Wolpert (2012), we use level-k relaxed strategies as a model for human decision-making. A level-0 strategy chooses actions uniformly at random. A level-k strategy assumes that the other players adopt a level-(k-1) strategy (Costa-Gomes et al., 2001). In our work, we assume k = 1 because humans tend to use shallow depth-of-reasoning (Wright and Leyton-Brown, 2010), and increasing k > 1 had little effect on the predicted joint-decisions for our scenario. The focus on bounded rationality stems from observing the limitations of human decision making. Humans are unable to evaluate the probability of all outcomes with sufficient precision and often make decisions based on adequacy rather than by finding the true optimum (Simon, 1956, 1982; Caplin et al., 2011). Because decision-makers lack the ability and resources to arrive at the optimal solution, they instead apply their reasoning only after having greatly simplified the choices available. This type of sampling-based, bounded-rational view of perceptual processing has been used in computational models of human visual tracking performance (Vul et al., 2010) and has been shown to predict human decision-making (Vul et al., 2009). To align our model with a bounded-rational view of human performance, we assume that Player 1 makes decisions based on m' sampled intruder locations and m candidate actions sampled from a uniform distribution over ±1 radian.
In other words,

    o^{(1)}, \ldots, o^{(m')} \sim p(o^1_{ow} \mid s^2)\, p(o^1_{in} \mid s^2)    (1)

    a^{(1)}, \ldots, a^{(m)} \sim U(-1, 1)    (2)

Player 1 selects the sampled action that maximizes the expected utility over the m' sampled states of the other aircraft:

    a^\ast = \arg\max_i \sum_j \left[ w^1\, d\!\left(s^1_f \mid a^{(i)},\; s^2_f \mid o^{(j)}\right) - (1 - w^1)\, \lvert a^{(i)} \rvert \right]    (3)

In the equation above, s^1_f | a^(i) is the final state of Player 1 after performing action a^(i), s^2_f | o^(j) is the expected final state of Player 2 given observation o^(j) and assuming a random heading change, d(·, ·) represents Euclidean distance, and w^1 is the relative desire of Player 1 to avoid the intruder versus maintaining their current heading. The first term in the utility function rewards distance between aircraft, and the second term penalizes the magnitude of the heading change. Fig. 2 shows how joint-actions change across different w^1 and w^2 combinations. When weights are relatively low (e.g., w^1 = 0.80, w^2 = 0.80), joint-actions are centered around (0, 0), indicating pilots tend to make very small heading change maneuvers. Conversely, when weights are relatively high (e.g., w^1 = 0.98, w^2 = 0.98), pilots tend to make large heading change maneuvers.
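The sampling-based decision rule of Eqs. (1)-(3) can be sketched as below. The noise scales, the placeholder ownship velocity, and the constant-velocity intruder propagation are illustrative assumptions for this sketch, not the exact simulation used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def level1_action(s2_true, w1, m_obs=10, m_act=10, t=5.0):
    """Sketch of the level-1 sampled decision rule (Eqs. 1-3).

    s2_true: intruder state [x, y, vx, vy]; w1: collision-avoidance weight.
    """
    # Eq. (1): sample noisy observations of the intruder state
    # (out-the-window and instrument noise folded together for brevity)
    obs_sd = np.array([900.0, 900.0, 318.0, 318.0])
    obs = s2_true + rng.normal(0.0, obs_sd, size=(m_obs, 4))

    # Eq. (2): sample candidate heading changes uniformly over +/- 1 rad
    actions = rng.uniform(-1.0, 1.0, size=m_act)

    # Expected final position of the intruder under each observation,
    # assuming it holds its observed velocity for t seconds
    s2_final = obs[:, :2] + t * obs[:, 2:]

    # Eq. (3): keep the sampled action maximizing the expected utility
    best, best_u = None, -np.inf
    for a in actions:
        v = np.array([0.0, 250.0])  # placeholder ownship velocity (ft/s)
        c, s = np.cos(a), np.sin(a)
        v_new = np.array([c * v[0] - s * v[1], s * v[0] + c * v[1]])
        s1_final = t * v_new  # ownship assumed at the origin, for illustration
        dist = np.linalg.norm(s2_final - s1_final, axis=1).sum()
        u = w1 * dist - (1.0 - w1) * abs(a) * m_obs
        if u > best_u:
            best, best_u = a, u
    return best
```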

Figure 2: Influence of utility weights on joint-decision densities. Red regions depict areas with high probability joint-decisions (i.e., heading change, in degrees), while blue regions represent low probability joint-decisions.

2.3 Executing Actions

After choosing an appropriate action, the pilots then execute the maneuver. Although it is known that humans have uncertainty associated with their movements (Harris and Wolpert, 1998), we simulate the case in which pilots perfectly execute the action selected by level-k relaxed strategies, since uncertainty in action execution is not a focus of this study. Recall that the action of the player is the change in heading angle of the aircraft, away from the current heading. From the initial state and the commanded heading change of the aircraft, the final state is obtained by simulating the aircraft for 5 seconds assuming point-mass kinematics, no acceleration, and instantaneous heading change.

3 Multi-fidelity Prediction

There are two ways one could distinguish a low-fidelity and high-fidelity model of this encounter. The first is using low-fidelity humans versus high-fidelity humans, where low-fidelity humans would represent people who have been trained in flight simulators but who are not commercial pilots. The second way is to have both a low-fidelity and high-fidelity simulation environment, say having the low-fidelity environment be a flight simulator that does not perfectly match the conditions of an actual commercial jet. For this paper, our low-fidelity simulators are the same as the high-fidelity simulation except there is no instrument panel. The low-fidelity simulation has the o_in nodes removed in Fig. 1b.
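The action-execution step of Section 2.3 (instantaneous heading change, constant speed, 5 s of point-mass motion) can be sketched as a velocity rotation followed by straight-line propagation; the [x, y, vx, vy] state layout here is an assumption for illustration.

```python
import numpy as np

def execute_action(state, d_heading, t=5.0):
    """Propagate a point-mass state [x, y, vx, vy] after an instantaneous
    heading change d_heading (radians), holding speed constant for t seconds.
    """
    x, y, vx, vy = state
    c, s = np.cos(d_heading), np.sin(d_heading)
    vx2 = c * vx - s * vy  # rotate the velocity vector by d_heading
    vy2 = s * vx + c * vy
    return np.array([x + t * vx2, y + t * vy2, vx2, vy2])
```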
We used separate training and testing encounters, all sampled with Player 2 being in Player 1's field of view, and a heading such that a collision occurs if an avoidance maneuver is not performed (dashed lines in Fig. 1a). Training encounters were initialized with Player 2 randomly approaching at -45°, 0°, and +45°. Test encounters were initialized with Player 2 randomly approaching at -22.5° and +22.5°. Initial headings for both train and test encounters were sampled from a Gaussian distribution with a mean around a heading direction that would result in a collision and a standard deviation of 5°. The goal of multi-fidelity prediction is to combine the data in the high- and low-fidelity training encounters to predict the joint-decisions of pilots in the high-fidelity game. The next two sections describe different approaches to achieving this type of prediction, and the notation is summarized in Table 1.

Table 1: Notation used for multi-fidelity prediction

  Symbol    Description
  A         Observed joint-actions (a^1, a^2)
  Â         Predicted joint-actions (â^1, â^2)
  S         Encounter geometry (s^1, s^2)
  N         Number of encounters
  w         Joint utility-weights (w^1, w^2)
  z         Regression weight
  (·)^tr    Superscript indicating training encounters
  (·)^te    Superscript indicating test encounters
  (·)_l     Subscript indicating low-fidelity game
  (·)_h     Subscript indicating high-fidelity game

4 Model-Free Prediction Approaches

A model-free approach is one where we do not use any knowledge of the underlying game or the decision-making process. This section discusses a traditional model-free approach to predicting joint-actions as well as a model-free multi-fidelity method.

4.1 Locally Weighted, High-fidelity

The simplest approach to model-free prediction is to use the state and joint-action information from the high-fidelity training data to predict the joint-actions given the states in the testing data. Using only the high-fidelity training data, we trained a high-fidelity predictor R_h that predicts joint-actions A^tr_h given the training encounters S^tr_h. Then, the test encounters were used as inputs to the predictor, to predict the joint-actions at the new states Â^te_h = R_h(S^te_h). The regression model we considered was locally weighted (LW) regression, where the predicted high-fidelity joint-decisions for a particular test encounter are the weighted combination of high-fidelity joint-decisions from the training encounters. The weights are determined by the distance between the training encounters and the test encounter:

    \hat{A}^{te}_{h,i} = \sum_{j=1}^{N^{tr}} z_{i,j} A^{tr}_{h,j}    (4)

    z_{i,j} = \frac{e^{-d_{i,j}}}{\sum_j e^{-d_{i,j}}}    (5)

where d_{i,j} is the standardized Euclidean distance between the state in the ith testing encounter and the jth training encounter. This approach does not use any of the low-fidelity data to make predictions, so the quality of the predictions is strictly a function of the amount of high-fidelity training data.
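A minimal sketch of the locally weighted predictor of Eqs. (4) and (5), as an illustrative implementation rather than the authors' code:

```python
import numpy as np

def lw_predict(S_train, A_train, S_test):
    """Locally weighted regression (Eqs. 4-5): each test prediction is a
    normalized e^{-d}-weighted average of training joint-actions, where d
    is the standardized Euclidean distance between encounter states.
    """
    S_train = np.asarray(S_train, float)
    A_train = np.asarray(A_train, float)
    S_test = np.asarray(S_test, float)
    sd = S_train.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant features
    preds = []
    for s in S_test:
        d = np.sqrt((((S_train - s) / sd) ** 2).sum(axis=1))
        z = np.exp(-d)
        z /= z.sum()               # Eq. (5): normalized weights
        preds.append(z @ A_train)  # Eq. (4): weighted combination
    return np.array(preds)
```

Because each prediction is a convex combination of training joint-actions, predictions always stay within the range of the observed actions.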
This method represents how well one might do without taking advantage of multi-fidelity data, and serves as a baseline for comparison against the multi-fidelity approach.

4.2 Locally Weighted, Multi-fidelity

Our model-free multi-fidelity method uses an approach similar to the high-fidelity method, but it also takes advantage of the low-fidelity data. We use LW regression on the low-fidelity training data to obtain a low-fidelity predictor R_l that predicts joint-actions Â^tr_l for the training encounters S^tr_l. We then use the low-fidelity predictor to augment the high-fidelity training data. The augmented input (S^tr_h, R_l(S^tr_h)) is used to train our augmented high-fidelity predictor R_h that predicts joint-actions based on the high-fidelity data. The test encounters S^te_h were used as inputs to the augmented high-fidelity predictor to predict the joint-actions for the test encounters as Â^te_h = R_h(S^te_h, R_l(S^te_h)). In essence, we are using the predictions from the low-fidelity predictor as features for training the high-fidelity predictor.

5 Model-Based Prediction Approaches

The model-free methods described above ignore what is known about how the pilots make their decisions. In many cases, we have a model of the process with some number of free parameters. In our case, we model the players interacting in a level-k environment, and treat the utility weights w as unknown parameters that can be learned from data, allowing the prediction of behavior in new situations. Such an approach is used in the inverse reinforcement learning literature (Baker et al., 2009; Ng and Russell, 2000), where the goal is to learn parameters of the utility function of a single agent from experimental data. This section outlines three model-based approaches where the data comes from experiments of varying fidelities. The first two are based on maximum a posteriori (MAP) parameter estimates, and the third is a Bayesian approach.
5.1 MAP High-fidelity

To demonstrate the benefit of multi-fidelity model-based approaches, we use a baseline method that exclusively relies on high-fidelity data to make predictions. Since we have a model of interacting humans, we can simply use the test encounters S^te_h as inputs into the model and use the resulting joint-actions as predictions. However, to make accurate predictions, we must estimate the utility weights used by the pilots. To find the MAP utility weights w*_h, we first need to estimate p(A | S, w), which is the probability of the joint-actions A given encounters S and weights w.

As an approximation, we estimate p(A | S, w) by simulating the high-fidelity game using a set of 1000 novel encounters (S^n) under different utility weight combinations. We stored the resulting joint-actions for the jth utility weight combination, and then estimated p(A^n | w^j_h) = \sum_{S^n} p(A^n | S^n, w^j_h) using kernel density estimation via diffusion (Botev et al., 2010). Fig. 2 shows how p(A^n | w_h) changes across a subset of w_h settings. We can find the MAP utility weight combination by using the joint-actions A^tr_h we observe for our training encounters S^tr_h. For each joint-action from the training data, we find the nearest-neighbor joint-action in A^n. We sum the log-likelihood of the nearest-neighbor joint-actions for the training encounters, under each utility weight combination. The MAP utility weight combination is defined by

    w^\ast_h = \arg\max_{w_h} \left[ \ln p(w_h) + \sum_n \ln p(A^n \mid w_h) \right]    (6)

where p(w_h) is the prior of weight vector w_h. In our experiments, we assume p(w_h) is uniform over utility weight combinations. Once we have estimated the MAP utility weights, we can simulate our high-fidelity game using the test encounters and the MAP utility weights to obtain the predicted joint-actions. The predicted joint-action is obtained by generating N_s = 10 samples from p(A^te | S^te_i, w*_h) and averaging them together:

    A^{(1)}, \ldots, A^{(N_s)} \sim p(A^{te} \mid S^{te}_i, w^\ast_h)    (7)

    \hat{A}^{te}_{h,i} = \frac{1}{N_s} \sum_{l=1}^{N_s} A^{(l)}    (8)

The amount of high-fidelity data impacts the quality of the utility-weight estimate (w*_h), thereby limiting its effectiveness in situations when there is little data. The method described next overcomes this limitation by fusing both high- and low-fidelity data to find the utility weights to use for prediction.

5.2 MAP Multi-fidelity

In situations when there is little high-fidelity data to estimate the utility weights, it is beneficial to fuse both low- and high-fidelity data to increase the reliability of the estimate. In this approach, the predicted high-fidelity joint-actions for test encounters (Â^te_h) are computed according to Eq.
(8), and our MAP utility weight is still found by maximizing Eq. (6). However, instead of only using the high-fidelity data to estimate the utility weights, we use both the low-fidelity and the high-fidelity data to estimate the weights. Specifically, we find the log-likelihood of the nearest-neighbor joint-action A^n for both the low- and high-fidelity joint-actions A^tr_h and A^tr_l. Once we have estimates for the utility weights in the decision model, we can use the test encounters S^te_h as inputs to the high-fidelity model to obtain predictions Â^te_h, similar to what was done in Eq. (8). Although this is a straightforward way to use both low- and high-fidelity training data, it assumes that the decision makers in the low-fidelity and high-fidelity encounters are identical. While it is plausible that the decision makers have similar utility functions, making this a valid assumption, it could also be the case that highly trained humans have very different utility functions from lesser-trained ones, or that changes in the simulation environment also cause changes in the utility function (for example, due to a decreased sense of immersion). Therefore, we would like a method that relaxes the equal-utility assumption, allowing us to make predictions for games that are different for low- and high-fidelity.

5.3 Bayesian Multi-fidelity

The approaches described above use the test encounters as inputs into the high-fidelity model and sample joint-actions from p(A^te | S^te_h, w*_h) using the MAP utility weight estimate w*_h to compute the predicted joint-actions Â^te_h.
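The MAP weight selection of Eq. (6) can be sketched as a nearest-neighbor log-likelihood lookup over a grid of candidate weight combinations. The function name, the 2-D joint-action setup, and the precomputed density arrays are hypothetical; in the paper the densities come from kernel density estimation over simulated joint-actions.

```python
import numpy as np

def map_weights(A_train, candidates, log_prior=None):
    """MAP utility-weight selection (Eq. 6), sketched with nearest-neighbor
    log-likelihood lookups.

    A_train: observed training joint-actions, shape (N, 2).
    candidates: list of (w, A_sim, logp_sim) where A_sim are joint-actions
        simulated under weight combination w, and logp_sim holds their
        estimated log-densities (e.g., from kernel density estimation).
    """
    if log_prior is None:
        log_prior = lambda w: 0.0  # uniform prior over weight combinations
    best_w, best_score = None, -np.inf
    for w, A_sim, logp_sim in candidates:
        # For each observed joint-action, look up the log-density of its
        # nearest simulated neighbor and sum the log-likelihoods.
        score = log_prior(w)
        for a in A_train:
            j = np.argmin(np.linalg.norm(A_sim - a, axis=1))
            score += logp_sim[j]
        if score > best_score:
            best_w, best_score = w, score
    return best_w
```

The multi-fidelity variant of Section 5.2 would simply sum the same kind of log-likelihood terms over both the low- and high-fidelity training joint-actions.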
In contrast, a Bayesian approach to multi-fidelity makes its predictions by sampling joint-actions from p(A^te | S^te_h, w^j_h) under each of the j high-fidelity utility weight combinations, and then weighting the predicted joint-actions by the probability of the utility weights, given the low- and high-fidelity joint-actions observed from the training encounters:

    \hat{A}^{te}_{h,i} = \sum_j p(A^{te}_{h,k} \mid S^{te}_i, w^j_h)\, p(w^j_h \mid A^{tr}_l, A^{tr}_h)    (9)

where

    p(w^j_h \mid A^{tr}_l, A^{tr}_h) = p(w^j_h \mid A^{tr}_h) \sum_k p(w^k_l \mid A^{tr}_l)\, p(w^k_l, w^j_h)

The probability distribution p(w^j_h | A^tr_h) was estimated using the method outlined in Section 5.1. The low-fidelity version of this distribution, p(w^k_l | A^tr_l), was estimated in a similar manner for each of the k low-fidelity utility weight combinations by simulating the low-fidelity game using the same novel encounters S^n and kernel density estimation via diffusion methods. Finally, p(w_l, w_h) = p(w^1_l, w^2_l, w^1_h, w^2_h) is a prior probability over low- and high-fidelity utility weights, and can be selected based on the degree of similarity between the high-fidelity and low-fidelity simulations. For our case, we assume a Gaussian prior with a mean

corresponding to the ground-truth utility weights (Section 6) and covariance (ordering the weights as w^1_l, w^1_h, w^2_l, w^2_h) given by:

    [ 0.0017  0.0013  0       0      ]
    [ 0.0013  0.0017  0       0      ]
    [ 0       0       0.0017  0.0013 ]
    [ 0       0       0.0013  0.0017 ]

This covariance assumes that there is variation among players at the same fidelity, but also that the weights of the players are similar (but not identical) across fidelities. However, it assumes that the utilities of the two players within any specific encounter are independent. A Bayesian approach to multi-fidelity allows for predictions to be made that account for the uncertainty in how well the utility weights account for the joint-actions observed in training. Such an approach also relaxes the assumption of identical utility weights across games through a prior distribution that encodes the coupling of these parameters.

Figure 3: Predictive performance curves for various methods across different numbers of high-fidelity training samples. Lines depict mean predictive efficiency across the 10 simulations per sample-size condition, and the shaded areas represent standard error. Results are shown for conditions in which the low- and high-fidelity utility weights are identical, with 1000 (left) and 100 (right) low-fidelity samples.

6 Results

In order to assess the performance of both model-free and model-based approaches to multi-fidelity prediction, we evaluated each method with different amounts of high-fidelity training data. We assessed the benefit of incorporating an additional 100 or 1000 low-fidelity training examples, depending on the condition. Since novices may employ different strategies than domain experts, we also simulated the scenario under different ground-truth utility weight relationships between the low- and high-fidelity games.
For the scenario where experts and novices employ similar behavior, the ground-truth utility weights were set to:

    w^{gt}_l = \{ w^1_l = 0.89,\; w^2_l = 0.90 \}    (10)

    w^{gt}_h = \{ w^1_h = 0.89,\; w^2_h = 0.90 \}    (11)

For the scenario where there is a small difference in how novices act, the ground-truth low-fidelity utility weights were changed to:

    w^{gt}_l = \{ w^1_l = 0.88,\; w^2_l = 0.89 \}    (12)

For the scenario where novices perform much differently than experts, the low-fidelity ground-truth utility weights were:

    w^{gt}_l = \{ w^1_l = 0.80,\; w^2_l = 0.81 \}    (13)

Performance is measured in terms of predictive efficiency, which is a number typically between 0 and 1. It is possible to obtain a number above 1 because the normalization factor is an estimated lower bound of the test-set error. The test-set error is given by

    D = \sum_i d(\hat{A}_i, A^{te}_i)    (14)

where d(·, ·) is the Euclidean distance, Â_i is the predicted action for the ith encounter, and A^te_i is the joint-action of the ith test encounter. A lower bound on the test-set error, D_lb, can be approximated by estimating the error when using the ground-truth model in conjunction with Eq. (8). The predictive efficiency is given by D_lb / D.
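The predictive-efficiency computation of Eq. (14) can be sketched as follows; the array names are illustrative.

```python
import numpy as np

def predictive_efficiency(A_pred, A_test, A_lb_pred):
    """Predictive efficiency D_lb / D (Eq. 14): the test-set error of a
    lower-bound (ground-truth) predictor divided by the method's error.
    All arguments are arrays of joint-actions with shape (N, 2).
    """
    # Test-set error: sum of Euclidean distances between predicted and
    # observed joint-actions over all test encounters.
    D = np.linalg.norm(np.asarray(A_pred) - np.asarray(A_test), axis=1).sum()
    D_lb = np.linalg.norm(np.asarray(A_lb_pred) - np.asarray(A_test), axis=1).sum()
    return D_lb / D
```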

[Figure 4 omitted: two panels, "Small Difference in Weights, 1000 LoFi Samples" and "Large Difference in Weights, 1000 LoFi Samples", plotting average predictive efficiency (y-axis, 0.6 to 1) against the number of HiFi samples (x-axis, 0 to 100) for the same five methods as Figure 3.]

Figure 4: Predictive performance curves for various methods across different numbers of high-fidelity training samples. Lines depict mean predictive efficiency across the 10 simulations per sample-size condition, and the shaded areas represent standard error. Results are shown for conditions in which the low- and high-fidelity utility weights have small (left) and large (right) differences between them.

Figure 3 shows how average predictive efficiency changes as a function of the amount of low- and high-fidelity training data for the identical-utility-weight condition. Figure 4 shows how average predictive efficiency is impacted by the level of discrepancy between the low- and high-fidelity utility weights.

Multi-fidelity model-free approaches (blue solid lines) predicted better than methods that used only high-fidelity data (blue dashed lines), except in the case where there was little (100 samples) low-fidelity data available to train the model (Figure 3). Model-based multi-fidelity approaches (red and green solid lines) had better predictive performance than high-fidelity methods (red dashed line) when there was a large amount of low-fidelity data (1000 samples) and little or no difference between the utility weights used by the low-fidelity and high-fidelity humans in the task (Figures 3 and 4).

7 Conclusions and Further Work

We developed a multi-fidelity method for predicting the decisions of interacting humans. We investigated the conditions under which these methods produce better performance than relying exclusively on high-fidelity data.
In general, our results suggest that multi-fidelity methods provide a benefit when there is a sufficient amount of low-fidelity training data available and the difference between expert and novice behavior is not too large. Future work will investigate these distinctions in greater detail.

This approach can also be extended to many domains beyond aviation. For example, work is currently being conducted that uses a power-grid game to produce low-fidelity data to train algorithms that detect attacks on the system. These data could be combined with relatively sparse high-fidelity data (e.g., historical data involving known attacks) to increase the predictive performance of the model.

Acknowledgments

The Lincoln Laboratory portion of this work was sponsored by the Federal Aviation Administration under Air Force Contract #FA8721-05-C-0002. The NASA portion of this work was also sponsored by the Federal Aviation Administration. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government. We would like to thank James Chryssanthacopoulos of MIT Lincoln Laboratory for his comments on early versions of this work.
