Learning to Play 3×3 Games: Neural Networks as Bounded-Rational Players (Technical Appendix)

Daniel Sgroi [1]
Department of Economics, University of Warwick

Daniel J. Zizzo [2]
School of Economics, University of East Anglia

December 2007

[1] Address: Faculty of Economics, Austin Robinson Building, Sidgwick Avenue, Cambridge CB3 9DD, UK. Email: daniel.sgroi@econ.cam.ac.uk. Telephone: +44 1223 335244. Fax: +44 1223 335299.
[2] Address: School of Economics, University of East Anglia, Norwich, NR4 7TJ, UK. Email: d.zizzo@uea.ac.uk.

Abstract

This is a technical appendix designed to support "Learning to Play 3×3 Games: Neural Networks as Bounded-Rational Players" by the same authors. In particular, appendix A provides a rigorous definition and description of backpropagation as used in section 2.3, appendix B provides further detail on convergence from section 3.1, and appendix C examines the algorithms from section 3.2.

Appendix A

Before defining backpropagation, we begin with the error function. Following Rumelhart et al. (1986), we use a weight-update rule of the form

    Δw_ij = -η ∂ε/∂w_ij = η k_i o_j    (1)

where w_ij is the weight of the connection between the sending node j and the receiving node i. As ε is the neural network's error, ∂ε/∂w_ij measures the sensitivity of the neural network's error to changes in the weight between i and j. There is also a learning rate η ∈ (0, 1]: this is a parameter of the learning algorithm and must not be chosen too small, or learning will be particularly vulnerable to local error minima. Too high a value of η may also be problematic, as the network may not be able to settle on any stable configuration of weights.

Define -∂ε/∂w_ij = k_i o_j, where o_j is the degree of activation (the actual output) of the sender node j for a given set of inputs. The higher o_j is, the more the sending node is at fault for the erroneous output, so it is this node we wish to correct more. k_i is the error on unit i for a given set of inputs, multiplied by the derivative of the output node's activation function given
its input. Calling g_i the goal activation level (the target correct output) of node i for a given set of inputs, in the case of the output nodes k_i can be computed as

    k_i = (g_i - o_i) f'(t_i) = (g_i - o_i) o_i (1 - o_i)    (2)

since the first derivative f'(t_i) of the receiving node i in response to the set of inputs is equal to o_i (1 - o_i) for a logistic activation function.

Now assume that a network has N layers, for N ≥ 2. As above, we call layer 1 the input layer, layer 2 the layer which layer 1 activates (the first hidden layer), and so on, until layer N, the output layer, which layer N - 1 activates. Using backpropagation, we first compute the error of the output layer (layer N) using equation (2), and update the weights of the connections between layers N and N - 1 using equation (1). We then compute the error to be assigned to each node of layer N - 1 as a function of the sum of the errors of the nodes of layer N that it activates. Calling i the hidden node and κ an index over the nodes of layer N (activated by i), for the current set of inputs we can use

    k_i = f'(t_i) Σ_κ k_κ w_κi    (3)

to update the weights between layers N - 1 and N - 2, together with equation (1). We follow this procedure backwards iteratively, one layer at a time, until we get to layer 1, the input layer.

A variation on standard backpropagation would involve replacing equation (1) with a momentum function of the form

    Δw_ij^t = -η ∂ε^t/∂w_ij^t + µ Δw_ij^(t-1)    (4)

where µ ∈ [0, 1) and t ∈ N++ denotes the time index (an example game, vector x, is presented in each t during training).

Appendix B

Table 5 below contains details on the convergence of C trained with different random seeds and achieving various levels of convergence γ for different learning and momentum rates. Only with a combination of high η and µ (η = 0.5 and µ = 0.9) was convergence not achieved, even at γ = 0.1. A high momentum rate was particularly detrimental.
A low learning rate (η = 0.1) slowed learning, particularly at low momentum rates (µ ≤ 0.3), but did not prevent C from converging at γ = 0.1 or better. A maximum of 600,000 rounds was used to try to achieve convergence. In practice, though, γ = 0.1 convergence was usually reached in about 100,000-200,000 rounds.
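As an illustration, the update rules in equations (1)-(4) and the training-to-convergence procedure can be sketched in code. This is a minimal sketch, not the authors' implementation: the layer sizes, the random seed, the parameter values, and the use of the maximum absolute output error over the training set as the convergence criterion are all assumptions made for the example.

```python
import numpy as np

# Sketch of backpropagation with momentum (equations (1)-(4)) for a
# 9-input, 5-hidden, 3-output network with logistic activations, and of
# training until a convergence level gamma is reached. All sizes, data
# and the convergence criterion are illustrative assumptions.
rng = np.random.default_rng(7)
eta, mu = 0.3, 0.6                        # learning rate eta, momentum mu

def logistic(t):
    return 1.0 / (1.0 + np.exp(-t))

W1 = rng.normal(scale=0.5, size=(5, 9))   # hidden <- input weights w_ij
W2 = rng.normal(scale=0.5, size=(3, 5))   # output <- hidden weights w_ij
dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)   # previous updates

def train_step(x, g):
    """Present one example x with goal activations g; update all weights."""
    global W1, W2, dW1, dW2
    o1 = logistic(W1 @ x)                  # hidden activations o_j
    o2 = logistic(W2 @ o1)                 # output activations o_i
    k2 = (g - o2) * o2 * (1.0 - o2)        # equation (2): output-layer errors
    k1 = o1 * (1.0 - o1) * (W2.T @ k2)     # equation (3): hidden-layer errors
    dW2 = eta * np.outer(k2, o1) + mu * dW2    # equations (1) and (4)
    dW1 = eta * np.outer(k1, x) + mu * dW1
    W2 = W2 + dW2
    W1 = W1 + dW1
    return o2

def train_to_convergence(examples, gamma=0.1, max_rounds=600_000):
    """Train until the worst absolute output error over the set is <= gamma,
    or until the cap on the number of rounds is reached."""
    for rounds in range(1, max_rounds + 1):
        worst = 0.0
        for x, g in examples:
            out = train_step(x, g)
            worst = max(worst, float(np.max(np.abs(g - out))))
        if worst <= gamma:
            return rounds, True
    return max_rounds, False
```

With a single hypothetical training example (a 9-number input coding and a one-hot target), this sketch typically reaches γ = 0.1 within a few thousand rounds.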

Convergence level γ = 0.1          Momentum rate µ
                                 0     0.3    0.6    0.9
Learning rate η       0.1       30     30     30     30
                      0.3       30     30     30     25

Convergence level γ = 0.05         Momentum rate µ
                                 0     0.3    0.6    0.9
Learning rate η       0.1       27     30     30     30
                      0.3       30     30     30     11

Convergence level γ = 0.02         Momentum rate µ
                                 0     0.3    0.6    0.9
Learning rate η       0.1        0      1     30     30
                      0.3       30     30     30      0

Table 5: Number of C Converged

Appendix C

This part of the appendix contains a number of definitions and explanations of the algorithms or solution concepts used in the main text. Consider the game G and the trained neural network player C. Index the neural network by i and the other player by j. The neural network's minmax value (or reservation utility) is defined as r_i = min_{a_j} [max_{a_i} π_i(a_i, a_j)]. The payoff r_i is literally the lowest payoff player j can hold the network to by any choice of a_j ∈ A_j, provided that the network correctly foresees a_j and plays a best response to it. Minmax therefore requires a particular brand of pessimism to have been developed during the network's training on Nash equilibria.

Rationalizability is widely considered a weaker solution concept than Nash equilibrium, in the sense that every Nash equilibrium is rationalizable, though not every rationalizable profile need be a Nash equilibrium. Rationalizable sets and the set which survives the iterated deletion of strictly dominated strategies are equivalent in two-player games: call this set S_i^n for player i after n stages of deletion. To give a simple intuitive definition, S_i^n is the set of player i's strategies that are not strictly dominated when players j ≠ i are constrained to play strategies in S_j^(n-1). So the network will delete strictly dominated strategies and will assume other players do the same, and this may reduce the available set of strategies to fewer than the total set of actions for i, resulting in a subset S_i^n ⊆ A_i.
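The minmax value and a single stage of deletion of strictly dominated strategies can be sketched directly; a minimal illustration for the row player of a 3×3 game, where the payoff matrix below is hypothetical and payoffs[i][j] is the row player's payoff π_i(a_i, a_j) from row i against column j:

```python
# Sketch of the minmax value r_i = min_{a_j} max_{a_i} pi_i(a_i, a_j) and of
# one stage of strict-dominance deletion, for the row player of a 3x3 game.
# The payoff matrix used below is hypothetical.
def minmax_value(payoffs):
    """Lowest payoff the column player can hold the row player to,
    assuming the row player best-responds to each column."""
    cols = range(len(payoffs[0]))
    return min(max(row[j] for row in payoffs) for j in cols)

def undominated_rows(payoffs, live_cols):
    """Rows not strictly dominated by another row when the column player
    is restricted to the columns in live_cols (one stage of deletion)."""
    def dominates(r1, r2):
        # r1 strictly dominates r2: higher payoff against every live column.
        return all(payoffs[r1][j] > payoffs[r2][j] for j in live_cols)
    rows = range(len(payoffs))
    return [r for r in rows
            if not any(dominates(s, r) for s in rows if s != r)]

payoffs = [[3, 0, 1],
           [2, 2, 1],
           [1, 1, 0]]
# Column maxima are 3, 2 and 1, so r_i = 1; row 2 is strictly dominated
# by row 1, so one stage of deletion leaves rows 0 and 1.
```

Iterating undominated_rows for both players, alternating roles, yields the sets S_i^n described above.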
Since we are dealing with only three possible strategies in our game G, the subset can be adequately described as S_i^2 ⊆ A_i, with player j restricted to S_j^1 ⊆ A_j. The algorithm 0SD checks whether all payoffs for the neural network (the row player) from playing an action are strictly higher than those of the other actions, so no restriction is applied to the action of player j ≠ i, and player i's actions are chosen from S_i^0 ⊆ A_i. 1SD allows a single level of iteration in the deletion of strictly dominated strategies: the row player thinks that the column player follows 0SD, so chooses from S_j^0 ⊆ A_j, and player i's action set is restricted to S_i^1 ⊆ A_i. Both of these algorithms can be viewed as weakened, less computationally demanding versions of iterated deletion.

L1 and MPD are different ways of formalizing the idea that the agent might try to go for the largest payoffs, virtually independently of strategic considerations. If following MPD, C simply learns to spot the highest conceivable payoff for itself and picks the corresponding row, hoping the other player will pick the corresponding column. In the case of L1, an action a_i = a^L1 is chosen by the row player according to a^L1 = arg max_{a_i ∈ A_i} π_i(a_i, a_j | a_j = Δ_j), where Δ_j is defined as a perfect mix over all available strategies in A_j. Put simply, a^L1 is the strategy which picks a row by calculating the payoff from each row, based on the assumption that player j will randomly select each column with probability 1/3, and then chooses the row with the highest payoff calculated in this way. It corresponds to the definition of level 1 thinking in Stahl and Wilson (1995). A player who assumes that his opponent is a level 1 thinker will best-respond to L1 play, and hence engage in what, following Stahl and Wilson (1994) and Costa-Gomes et al. (2001), we can label L2 (for "level 2") play.

Finally, the NNG is an algorithm based on the idea that agents respond to new situations by comparing them to the nearest example encountered in the past and behaving accordingly. The NNG algorithm examines each new game from the set of all possible games and attempts to find the specific game from the training set (which proxies for memory) with the highest level of similarity. We compute similarity by summing the squared differences between each payoff value of the new game and each corresponding payoff value of each game of the training set.
The higher the sum of squares, the lower the similarity between the new game and a given game in the training set. The NNG algorithm looks for the game with the lowest sum of squares, the nearest neighbor, and chooses the unique pure Nash equilibrium corresponding to this game. Since, by construction, all games in the training set have a unique pure-strategy Nash equilibrium, we are virtually guaranteed to find a NNG solution for all games in the testing set. The only potential (but unlikely) exception is if the nearest neighbor is not unique, because two (or more) games have exactly the same sum of squares. The exception never arose with our game sample. Clearly, a unique solution, or indeed any solution, may not exist with algorithms such as rationalizability, 0SD and 1SD. A unique solution may occasionally not exist with other algorithms, such as MPD, because of their reliance on strict relationships between payoff values.
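The payoff-driven rules just described, MPD, L1 and NNG, can be sketched as follows. This is a minimal illustration under stated assumptions: the example games are hypothetical, and the NNG similarity measure here uses row-player payoffs only, whereas the comparison in the text runs over each payoff value of the games.

```python
# Sketches of the MPD, L1 and NNG rules for 3x3 games, with payoffs[i][j]
# the row player's payoff from row i against column j. The example games
# are hypothetical; for simplicity, similarity here compares row-player
# payoffs only.
def mpd_choice(payoffs):
    """MPD: pick the row containing the highest conceivable payoff."""
    return max(range(len(payoffs)), key=lambda i: max(payoffs[i]))

def l1_choice(payoffs):
    """L1: best-respond to a uniform (probability 1/3 each) mix over
    columns; a row's sum is proportional to its expected payoff."""
    return max(range(len(payoffs)), key=lambda i: sum(payoffs[i]))

def nng_choice(new_game, training_set):
    """NNG: play the stored Nash row of the nearest training game, where
    nearness is the sum of squared payoff differences (lower = nearer)."""
    def sum_sq(game):
        return sum((a - b) ** 2
                   for new_row, row in zip(new_game, game)
                   for a, b in zip(new_row, row))
    nearest_game, nash_row = min(training_set,
                                 key=lambda pair: sum_sq(pair[0]))
    return nash_row

game = [[8, 0, 0],
        [4, 4, 4],
        [1, 1, 1]]
# MPD picks row 0 (it contains the single highest payoff, 8), while L1
# picks row 1 (highest expected payoff against the uniform mix).
```

The contrast in the example shows why MPD and L1 are distinct rules even though both ignore the opponent's incentives.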

References

Costa-Gomes M, Crawford VP, Broseta B, 2001. Cognition and behavior in normal-form games: an experimental study. Econometrica 69, 1193-1235.

Rumelhart DE, Hinton GE, Williams RJ, 1986. Learning representations by back-propagating errors. Nature 323, 533-536.

Stahl DO, Wilson PW, 1994. Experimental evidence on players' models of other players. Journal of Economic Behavior and Organization 25, 309-327.

Stahl DO, Wilson PW, 1995. On players' models of other players: theory and experimental evidence. Games and Economic Behavior 10, 218-254.