
Which one should I imitate?

Karl H. Schlag

Projektbereich B, Discussion Paper No. B-365, March 1996

I wish to thank Avner Shaked for helpful comments. Financial support from the Deutsche Forschungsgemeinschaft, Sonderforschungsbereich 303 at the University of Bonn is gratefully acknowledged.

Abt. Wirtschaftstheorie III, Department of Economics, University of Bonn, Adenauerallee 24-26, 53113 Bonn, Germany.

Abstract

We consider the model of social learning by Schlag [5]. Individuals must repeatedly choose an action in a multi-armed bandit. We assume that each individual observes the outcomes of two other individuals' choices before her own next choice must be made; the original model only allows for one observation. Selection of optimal behavior yields a variant of the proportional imitation rule, which is the optimal rule based on one observation. When each individual uses this rule, the adaptation of actions in an infinite population follows an aggregate monotone dynamic.

JEL classification numbers: C7, C79.

Keywords: social learning, multi-armed bandit, imitation, payoff increasing, proportional imitation rule, aggregate monotone dynamic.

1 Introduction

In this paper we consider a variant of a model by Schlag [5]. Schlag considers a model of social learning in which individuals repeatedly face a multi-armed bandit. Between their choices each individual may observe the performance of one other individual, a situation referred to in the following as single sampling. Individuals forget about observations in the past. Two alternative approaches to selecting an optimal individual behavior, a boundedly rational approach and a population-oriented approach, are suggested. Either approach leads to the same unique prescription, the so-called proportional imitation rule, of how to choose future actions: i) follow an imitative behavior, i.e., only change actions through imitating others, ii) never imitate an individual that performed worse than oneself, and iii) imitate an individual that performed better with a probability that is proportional to how much better this individual performed.

In this paper we analyze how the above result changes when an individual is allowed to observe the performance of two other individuals between her choices. This situation will be referred to as double sampling. In contrast to the single sampling setting, it turns out that there is no behavior that is better than all other behavioral rules (according to either of the selection approaches for individual behavior). However, there is a best way of performing better than under single sampling. This can be achieved by modifying the proportional imitation rule; the resulting rule we call the adjusted proportional imitation rule. This variant of the proportional imitation rule specifies, in addition to i) to iii) above, iv) to be more likely to imitate the individual in the sample who realized the higher payoff, and v) to be more likely to imitate one of the two sampled individuals the lower the payoff of the other one is, and in particular not to ignore a sampled individual that realized a lower payoff even though he will never be imitated. Its simple functional form and its performance lead us to select the adjusted proportional imitation rule as the optimal rule under double sampling.

Whereas the aggregate behavior of an infinite population of individuals using the optimal rule under single sampling follows the replicator dynamic (Taylor [6]), under double sampling it follows an aggregate monotone dynamic (as defined by Samuelson and Zhang [4]).

The rest of the paper is organized as follows. In Section two the basic payoff realization and sampling scenario is introduced. The feasible behavioral rules for this setting are presented. In Section three we present examples of behavioral rules and in Section four we select among the behavioral rules. Section five contains the implications optimal behavior has for the population dynamics. In Section six we consider an alternative two population matching scenario. The Appendix contains the proof of the main theorem, which is stated in Section four.

2 The Setting

Consider the following dynamic process of choosing actions, sampling and updating. Let $W$ be a finite population (or set) of $N$ individuals, $N \ge 3$. In a sequence of rounds, each individual must choose an action from a finite set of actions $A = \{1, 2, \ldots, n\}$ where $n \ge 2$. Choosing the action $i$ yields an uncertain payoff drawn from a probability distribution $P_i$ with finite support in $[\alpha, \omega]$, where $\alpha$ and $\omega$, $\alpha < \omega$, are exogenous parameters. Payoffs are realized independently of all other events. Let $\pi_i$ denote the expected payoff generated by choosing action $i$, i.e.,

$$\pi_i = \sum_{x \in [\alpha, \omega]} x P_i(x), \quad i \in A.$$

Then the tuple $\langle A, (P_i)_{i \in A} \rangle$ constitutes a multi-armed bandit or a game against nature. The set of all multi-armed bandits with action set $A$ yielding payoffs in $[\alpha, \omega]$ will be denoted by $G(A, [\alpha, \omega])$.

A state $s \in A^W$ of the population in a given round $t$ is the description of the action that each individual is choosing in round $t$. Let $\Delta(A)$ be the set of probability distributions on $A$. For a given state $s$ let $p = p(s) \in \Delta(A)$ denote the probability distribution that is associated with randomly selecting an individual and observing the action she has chosen for this round, i.e.,

$$p_i(s) = \frac{1}{N} \left| \{ c \in W : s(c) = i \} \right|, \quad i \in A.$$

The set of all such probability distributions will be denoted by $\Delta^N(A)$, i.e., $p \in \Delta^N(A)$ and $i \in A$ implies $N p_i \in \mathbb{N}$. Given this notation, the average expected payoff of the population in state $s$, $\bar\pi(s)$, is given by $\bar\pi(s) = \sum_{i \in A} p_i(s) \pi_i$.

Before each round of payoff realization, each individual meets (or samples) two other individuals from the population and observes the payoff each of them received, together with the associated action, in the previous round. Given three different individuals $c, d, e \in W$, the probability that individual $c$ samples individuals $d$ and $e$ is denoted by $P(c, \{d, e\})$. In the following we will assume that sampling is symmetric, i.e., that $P(c, \{d, e\}) = P(d, \{c, e\}) = P(e, \{c, d\})$ for all $c, d, e \in W$. The situation in which sampling occurs by choosing two individuals randomly from the population will be called random sampling; in this case

$$P(c, \{d, e\}) = \frac{1}{(N-1)(N-2)} \quad \text{for all } c, d, e \in W.$$

The description of how an individual chooses her next action in a multi-armed bandit in $G(A, [\alpha, \omega])$ based on her previous observations is summarized by a behavioral rule. We allow for the individual to use a randomizing device that generates independent events when making this choice. We restrict attention to behavioral rules where observations prior to her last payoff realization do not influence her next choice, i.e., essentially an individual forgets these observations. Hence, a behavioral rule is a function

$$F : A \times [\alpha, \omega] \times \{ A \times [\alpha, \omega] \times A \times [\alpha, \omega] \} \to \Delta(A)$$

where $F(i, x, \{j, y, k, z\})_r$ is the probability of playing action $r$ after obtaining payoff $x$ with action $i$ and sampling individuals using action $j$ and action $k$ that obtained payoff $y$ and payoff $z$, respectively. For $i, j, k, r \in A$, let

$$F^r_{ijk} := \sum_{x, y, z \in [\alpha, \omega]} F(i, x, \{j, y, k, z\})_r \, P_i(x) P_j(y) P_k(z), \quad i, j, k, r \in A,$$

be the so-called switching probabilities.

A class of behavioral rules of special importance in our analysis will be the class of imitating rules. A behavioral rule $F$ is called imitating if $F(i, x, \{j, y, k, z\})_r = 0$ when $r \notin \{i, j, k\}$.

For an imitating behavioral rule $F$ let $F(i, x, \{j, y, k, z\})_{jk}$ denote the probability of switching actions (to either action $j$ or $k$), i.e., $F(i, x, \{j, y, k, z\})_{jk} = 1 - F(i, x, \{j, y, k, z\})_i$. Similarly, let $F^{jk}_{ijk}$ be the associated switching probabilities, i.e., $F^{jk}_{ijk} = 1 - F^i_{ijk}$.
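The bookkeeping of this section can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the two-action bandit, the five-person state and all function names are hypothetical and do not come from the paper, and the sampling probability is read as the probability of an ordered draw of two distinct other individuals.

```python
from collections import Counter
from fractions import Fraction


def expected_payoff(distribution):
    """pi_i = sum_x x * P_i(x) for a finite-support distribution {payoff: probability}."""
    return sum(x * prob for x, prob in distribution.items())


def population_shares(state, actions):
    """p_i(s) = |{c in W : s(c) = i}| / N for every action i."""
    counts = Counter(state)
    return {i: Fraction(counts[i], len(state)) for i in actions}


def average_expected_payoff(shares, payoffs):
    """Average expected payoff: sum_i p_i(s) * pi_i."""
    return sum(shares[i] * payoffs[i] for i in shares)


def random_sampling_probability(n):
    """Probability that c samples d and e (ordered draw) under random sampling."""
    return Fraction(1, (n - 1) * (n - 2))


# Hypothetical bandit on [alpha, omega] = [0, 1] with two actions.
bandit = {
    1: {Fraction(0): Fraction(1, 2), Fraction(1): Fraction(1, 2)},
    2: {Fraction(1, 4): Fraction(1)},
}
state = [1, 1, 2, 2, 2]  # s(c) for each of the N = 5 individuals

payoffs = {i: expected_payoff(dist) for i, dist in bandit.items()}
shares = population_shares(state, bandit.keys())
average = average_expected_payoff(shares, payoffs)
```

Exact fractions are used so that the shares satisfy $N p_i \in \mathbb{N}$ without rounding error.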

3 Examples

In the following we present some examples of behavioral rules.

Behavioral rules under single sampling can be embedded in the class of behavioral rules under double sampling by randomly selecting one of the two sampled individuals and applying the single sampling rule. More specifically, the behavioral rule $f$ under single sampling, i.e., $f : A \times [\alpha, \omega] \times A \times [\alpha, \omega] \to \Delta(A)$, is associated with the behavioral rule $F^f$ under double sampling defined by

$$F^f(i, x, \{j, y, k, z\})_r = \tfrac{1}{2} f(i, x, j, y)_r + \tfrac{1}{2} f(i, x, k, z)_r, \quad i, j, k, r \in A, \ x, y, z \in [\alpha, \omega].$$

Behavioral rules constructed in this way will be called single sampling rules.

An important behavioral rule under single sampling is the imitating rule $f^p$ that satisfies

$$f^p(i, x, j, y)_j = \frac{1}{\omega - \alpha} [y - x]_+$$

where $[x]_+ = x$ when $x > 0$ and $[x]_+ = 0$ when $x \le 0$. Schlag [5] argues that this so-called proportional imitation rule with rate $\frac{1}{\omega - \alpha}$ is the unique optimal rule under single sampling. The associated single sampling rule will be denoted by $F^p$.

The behavioral rule of importance in the present model of double sampling is the rule we refer to as the adjusted proportional imitation rule. Let $\sigma : [\alpha, \omega] \to \mathbb{R}_+$ be the linearly decreasing function such that $\sigma(\alpha) = \frac{1}{\omega - \alpha}$ and $\sigma(\omega) = \frac{1}{2(\omega - \alpha)}$, i.e.,

$$\sigma(x) = \frac{1}{2(\omega - \alpha)} + \frac{\omega - x}{2(\omega - \alpha)^2} \quad \text{for } x \in [\alpha, \omega].$$

Consider the behavioral rule $\hat F$ such that

$$\hat F(i, x, \{j, y, k, z\})_j = \sigma(z) [y - x]_+, \qquad \hat F(i, x, \{j, y, k, z\})_k = \sigma(y) [z - x]_+ \qquad \text{when } |\{i, j, k\}| = 3,$$

and

$$\hat F(i, x, \{j, y, j, z\})_j = \sigma(z) [y - x]_+ + \sigma(y) [z - x]_+.$$

In order for $\hat F$ to be in fact a behavioral rule we must show that $\hat F(i, x, \{j, y, k, z\})_{jk} \le 1$ when $x < y \le z$. This is true since

$$\hat F(i, \alpha, \{j, y, k, z\})_{jk} = (y - \alpha) \left[ \frac{1}{2(\omega - \alpha)} + \frac{\omega - z}{2(\omega - \alpha)^2} \right] + (z - \alpha) \left[ \frac{1}{2(\omega - \alpha)} + \frac{\omega - y}{2(\omega - \alpha)^2} \right] \le 1.$$

(Writing $a = \frac{y - \alpha}{\omega - \alpha}$ and $b = \frac{z - \alpha}{\omega - \alpha}$, the left hand side equals $a + b - ab = 1 - (1 - a)(1 - b)$.) We will call $\hat F$ the adjusted proportional imitation rule (based on $[\alpha, \omega]$).

Notice that an individual following this rule will be more likely to imitate the individual in the sample that realized the higher payoff. He will never imitate an individual that realized a lower payoff, but will nevertheless use the payoff of such an individual to determine how likely he is to switch to the other individual. Two extreme situations for how the payoff of the one individual influences the probability of switching to the other are as follows:

$$\hat F(i, x, \{j, y, k, \alpha\})_j = f^p(i, x, j, y)_j \quad \text{and} \quad \hat F(i, x, \{j, y, k, \omega\})_j = \tfrac{1}{2} f^p(i, x, j, y)_j.$$

A popular rule under single sampling is the imitating rule `imitate if better', where the individual adopts the action of the observed individual if and only if it achieved a higher payoff. In the literature this rule is extended to the framework of multiple sampling in the following two different ways.

`Imitate the best' (Axelrod [1]) is the imitating behavioral rule $F$ that satisfies: $F(i, x, \{j, y, k, z\})_j = 1$ if $y > \max\{x, z\}$ and $F(i, x, \{j, y, k, z\})_i = 1$ if $x \ge \max\{y, z\}$; $i, j, k \in A$, $x, y, z \in [\alpha, \omega]$.

`Imitate the best average' (Bruch [2]; Ellison and Fudenberg [3]) is the imitating behavioral rule $F$ that satisfies $F(i, x, \{j, y, j, z\})_j = 1$ if $\tfrac{1}{2}(y + z) > x$ and $0$ otherwise, $F(i, x, \{i, y, j, z\})_j = 1$ if $z > \tfrac{1}{2}(x + y)$ and $0$ otherwise, $F(i, x, \{j, y, k, z\})_j = 1$ if $y > \max\{x, z\}$, $F(i, x, \{j, y, k, y\})_j = F(i, x, \{j, y, k, y\})_k = \tfrac{1}{2}$ if $y > x$ and $F(i, x, \{j, y, k, z\})_i = 1$ if $x \ge \max\{y, z\}$; $i, j, k \in A$ with $|\{i, j, k\}| = 3$, $x, y, z \in [\alpha, \omega]$.

4 Selection Among the Rules

The so-called expected improvement $EIP_F(s)$ in state $s$ is given by the following expression:

$$EIP_F(s) := \frac{1}{N} \sum_{j \in A} \sum_{c, d, e \in W} P(c, \{d, e\}) \, F^j_{s(c) s(d) s(e)} \left[ \pi_j - \pi_{s(c)} \right].$$

Individuals are assumed to prefer so-called improving behavioral rules; these are rules that always generate non negative expected improvement. Formally, a behavioral rule $F$ is called improving if $EIP_F(s) \ge 0$ for all $s \in \Delta^N(A)$ and all multi-armed bandits in $G(A, [\alpha, \omega])$.

Schlag [5] gives two alternative scenarios that cause an individual to choose an improving rule.

i) Individuals are boundedly rational. They enter the population by replacing a random individual in the population. They adopt the action last chosen by this individual. In each round an individual evaluates the performance of her behavior as if she just entered. Individuals prefer a rule that always increases expected payoffs in any multi-armed bandit in $G(A, [\alpha, \omega])$.

ii) Individuals evaluate the performance of their behavior in a population of replicas. An individual considers a population in which each individual is using her behavioral rule. She prefers a rule that is expected to increase average payoffs in each state and each multi-armed bandit in $G(A, [\alpha, \omega])$.

Schlag [5] characterizes the set of improving rules under single sampling. In particular, it turns out that the proportional imitation rule with rate $\frac{1}{\omega - \alpha}$ is improving and that the rule `imitate if better' is not improving.

Clearly, an improving behavioral rule $f$ in the single sampling setting is associated with a single sampling rule $F^f$ (see Section 3) that is improving in the present double sampling setting. The following theorem characterizes the entire set of improving behavioral rules under double sampling.

Theorem 1 The behavioral rule $F$ is improving if and only if $F$ is imitating and for all subsets $\{i, j, k\} \subseteq A$ with $|\{i, j, k\}| > 1$ there exists a function $\sigma_{\{i,j,k\}} : [\alpha, \omega] \to \mathbb{R}^+_0$ such that

$$F(i, x, \{j, y, k, z\})_{jk} - F(j, y, \{i, x, k, z\})_i - F(k, z, \{i, x, j, y\})_i = \sigma_{\{i,j,k\}}(z)(y - x) + \sigma_{\{i,j,k\}}(y)(z - x) \quad (1)$$

if $i \notin \{j, k\}$.

Proof. (in the Appendix)

Theorem 1 and its proof give little insight as to which functions $\sigma_{\{i,j,k\}}(\cdot)$ are associated with an improving rule. Of course, the right hand side of (1) must be bounded above by 1, which in particular bounds $\sigma_{\{i,j,k\}}(y)$ from above for all $y \in [\alpha, \omega]$. However, Theorem 1 enables us to verify whether a behavioral rule is improving or not. Consider for example the rules `imitate the best average' and `imitate the best'. (1) implies that neither of these rules is improving. In the following we show this statement using a counterexample in order to explicitly illustrate how these two rules fail to be improving.
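As a computational aside, the adjusted proportional imitation rule of Section 3 can be checked against condition (1) of Theorem 1. The following Python sketch is only an illustration: the payoff interval $[0, 1]$, the grid of test payoffs and all function names are assumptions made for the example, not part of the paper. It verifies that the total switching probability never exceeds 1 and that the rule satisfies (1) with the rate function $\sigma$ for three distinct actions.

```python
from fractions import Fraction
from itertools import product

# Assumed payoff bounds, chosen only for illustration.
ALPHA, OMEGA = Fraction(0), Fraction(1)
WIDTH = OMEGA - ALPHA


def plus(v):
    """[v]_+ : v if v > 0, otherwise 0."""
    return v if v > 0 else Fraction(0)


def sigma(v):
    """Linearly decreasing rate with sigma(ALPHA) = 1/WIDTH and sigma(OMEGA) = 1/(2*WIDTH)."""
    return 1 / (2 * WIDTH) + (OMEGA - v) / (2 * WIDTH ** 2)


def proportional_imitation(x, y):
    """Single sampling rule f^p: probability of switching from payoff x to payoff y."""
    return plus(y - x) / WIDTH


def adjusted_rule(x, y, z):
    """Switching probabilities (to j, to k) for own payoff x and sampled payoffs y (action j), z (action k)."""
    return sigma(z) * plus(y - x), sigma(y) * plus(z - x)


grid = [ALPHA + WIDTH * Fraction(n, 8) for n in range(9)]

# Largest total switching probability over the grid; it must not exceed 1.
max_switch = max(sum(adjusted_rule(x, y, z)) for x, y, z in product(grid, repeat=3))

# Condition (1): net flow out of action i equals sigma(z)(y - x) + sigma(y)(z - x).
condition_1_holds = all(
    sum(adjusted_rule(x, y, z)) - adjusted_rule(y, x, z)[0] - adjusted_rule(z, x, y)[0]
    == sigma(z) * (y - x) + sigma(y) * (z - x)
    for x, y, z in product(grid, repeat=3)
)
```

In this sketch, `adjusted_rule(y, x, z)[0]` is the probability that the individual with payoff `y` switches to the action that earned `x`, which is the second term on the left hand side of (1).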

Fix $x \in \left( \alpha, \tfrac{2}{3}\alpha + \tfrac{1}{3}\omega \right)$. Consider a multi-armed bandit in which $P_1(x) = 1$, $P_2(\alpha) = \lambda$ and $P_2(\omega) = 1 - \lambda$ for some $0 < \lambda < 1$. Then $\pi_2 > \pi_1$ if and only if $\lambda < \frac{\omega - x}{\omega - \alpha}$. Notice that the rule `imitate the best' and the rule `imitate the best average' induce the same switching probabilities $F^2_{112} = 1 - \lambda$, $F^2_{122} = 1 - \lambda^2$, $F^1_{211} = \lambda$ and $F^1_{212} = \lambda^2$. In particular, $\lambda > \tfrac{2}{3}$ implies $F^1_{211} - 2 F^2_{112} > 0$ and $F^2_{122} - 2 F^1_{212} < 0$. Following (8), this leads to negative expected improvement if only action 1 and action 2 are played in the population, with positive probability some individual using action 1 observes some individual using action 2, and if $\tfrac{2}{3} < \lambda < \frac{\omega - x}{\omega - \alpha}$. Hence we see that neither `imitate the best' nor `imitate the best average' is improving.

Among the single sampling rules, Schlag [5] shows that the proportional imitation rule $F^p$ never achieves a lower expected improvement than any other improving single sampling rule. Hence, we say that $F^p$ dominates the single sampling rules. More generally, let $\mathcal{F}$ be a set of behavioral rules. We say that a behavioral rule $F$ dominates the set of behavioral rules $\mathcal{F}$ if $EIP_F(s) \ge EIP_{F'}(s)$ for all $F' \in \mathcal{F}$, for any state $s$ and for any multi-armed bandit in $G(A, [\alpha, \omega])$. Consequently, if $\mathcal{F}$ contains an improving rule and $F$ dominates $\mathcal{F}$ then $F$ is improving.

In the following we will show that improving rules under double sampling with constant switching rates $\sigma_{\{i,j,k\}}(\cdot)$ are of no advantage compared to the single sampling scenario. As mentioned above, the highest expected improvement among the single sampling rules is realized by the proportional imitation rule $F^p$. Following (8),

$$EIP_{F^p}(s) = \frac{1}{2N(\omega - \alpha)} \sum_{c, d, e \in W} P(c, \{d, e\}) \left[ \pi_{s(d)} - \pi_{s(c)} \right]^2. \quad (2)$$

Since $\sigma_{\{i,j,k\}}(\omega) \le \frac{1}{2(\omega - \alpha)}$, following (1) and (8), an improving rule under double sampling with constant switching rates $\sigma_{\{i,j,k\}}$ never achieves a higher expected improvement than $F^p$.

The advantage of double sampling lies in the fact that switching rates of improving rules need no longer be constant. The following theorem states that, unlike under single sampling, under double sampling there is no behavioral rule that dominates all other improving rules. However, we show that following the adjusted proportional imitation rule $\hat F$ is the best way of performing better than the proportional imitation rule $F^p$.

Theorem 2 Let $\mathcal{F}_1$ be the set of single sampling rules that are improving. Let $\mathcal{F}_2$ be the set of rules that dominate $\mathcal{F}_1$. Then the adjusted proportional imitation rule $\hat F$ dominates $\mathcal{F}_2$. There is no behavioral rule that dominates the set of improving rules.

In the following, let $\mathcal{F}_3$ be the set of rules that dominate $\mathcal{F}_2$.

Proof. Consider an improving rule $F' \in \mathcal{F}_2$ and let $\sigma'_{\{i,j,k\}}$ be the associated switching rates. We will first show that $\sigma'_{\{i,j,k\}}(\omega) = \frac{1}{2(\omega - \alpha)}$. As mentioned above, $\sigma'_{\{i,j,k\}}(\omega) \le \frac{1}{2(\omega - \alpha)}$. Consider the multi-armed bandit in which $P_i(\alpha) = 1$ and $P_j(\omega) = P_k(\omega) = 1$. Consider a population with one individual using $i$, one using $j$ and the rest using $k$. Using the fact that $a^j_{ijk} = a^k_{ijk} = \sigma'_{\{i,j,k\}}(\omega)$ it follows that

$$EIP_{F'} = \frac{2}{N} \sigma'_{\{i,j,k\}}(\omega) (\omega - \alpha)^2 \le \frac{1}{N} (\omega - \alpha) = EIP_{F^p}.$$

Since $F'$ dominates $F^p$ and $F^p \in \mathcal{F}_1$ we obtain that $\sigma'_{\{i,j,k\}}(\omega) = \frac{1}{2(\omega - \alpha)}$.

Notice that $\sigma'_{\{i,j,k\}}(\omega)(y - \alpha) + \sigma'_{\{i,j,k\}}(y)(\omega - \alpha) \le F'(i, \alpha, \{j, y, k, \omega\})_{jk} \le 1$ implies

$$\sigma'_{\{i,j,k\}}(y) \le \frac{1}{\omega - \alpha} \left[ 1 - (y - \alpha) \, \sigma'_{\{i,j,k\}}(\omega) \right] = \frac{1}{2(\omega - \alpha)} + \frac{\omega - y}{2(\omega - \alpha)^2} = \sigma(y). \quad (3)$$

Hence, (3) and (8) imply $EIP_{\hat F} \ge EIP_{F'}$ for any state $s$ and any multi-armed bandit in $G(A, [\alpha, \omega])$, which means that $\hat F \in \mathcal{F}_3$. In particular, it follows that $\sigma_{\{i,j,k\}}(y) = \sigma(y)$ for any rule $F \in \mathcal{F}_3$.

We will now construct a rule that is not dominated by any rule in $\mathcal{F}_3$. This will show that there is no rule that dominates all other improving rules. Let $\tilde F$ be the behavioral rule that is constructed like $\hat F$ using the function $\tilde\sigma$ where $\tilde\sigma(y) = \frac{1}{\omega - \alpha}$ when $y \le \frac{\alpha + \omega}{2}$ and $\tilde\sigma(y) = 0$ for $y > \frac{\alpha + \omega}{2}$. It follows that $\tilde F(i, \alpha, \{j, y, k, z\})_{jk} \le 1$ and hence that $\tilde F$ is in fact a behavioral rule. Moreover, by construction, $\tilde F$ is improving and $\tilde\sigma(y) > \sigma(y)$ for all $\alpha < y \le \frac{\alpha + \omega}{2}$. Hence, $\tilde F$ is not dominated by any rule in $\mathcal{F}_3$.

One can argue that an individual will choose an improving rule that dominates the improving rules under single sampling, i.e., a rule in $\mathcal{F}_2$. She might as well choose a rule that is best at doing this, i.e., a rule in $\mathcal{F}_3$. We presented such a rule, the adjusted proportional imitation rule, that additionally never imitates lower payoffs and has a simple form. This leads us to select this rule as the optimal rule under double sampling.

5 Population Dynamics

In this section we consider the aggregate behavior of a population in which each individual uses the optimal rule. We will restrict attention to random sampling. Moreover, we will consider adjustment in infinite populations as an approximation of the short run adjustment of a large population. Schlag [5] specifies the exact meaning of this approximation for the single sampling setting.

In an infinite population, random sampling means that the probability that an individual observes action $i$ is equal to the proportion of individuals using this action. In this sense, a description of the proportions $p_i$ using action $i$ for each $i \in A$ is sufficient to determine the population adjustment. Hence we will identify the state of a population with $p = (p_i)_{i \in A} \in \Delta(A)$.

Straightforward calculations show that the adjustment process $(p^t)_{t \in \mathbb{N}}$ of a monomorphic population (each individual is following the same behavior) in which the underlying rule is improving, given an initial state $p^1 \in \Delta(A)$, is given by

$$p_i^{t+1} = p_i^t + p_i^t \sum_{j, k \in A} p_j^t p_k^t \left[ a^j_{ijk} (\pi_i - \pi_k) + a^k_{ijk} (\pi_i - \pi_j) \right] \quad \text{for } i \in A \text{ and } t \in \mathbb{N}.$$

If, in addition, the underlying rule is the adjusted proportional imitation rule $\hat F$, we obtain

$$p_i^{t+1} = p_i^t + \left[ \frac{1}{\omega - \alpha} + \frac{\omega - \bar\pi(p^t)}{(\omega - \alpha)^2} \right] \left[ \pi_i - \bar\pi(p^t) \right] p_i^t, \quad (4)$$

where $\bar\pi(p) = \sum_{i \in A} p_i \pi_i$.
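To see how (4) behaves, the following Python sketch iterates one step of the adjustment process. It is a sketch under stated assumptions, not code from the paper: the function names, the two-action bandit and the payoff interval $[0, 1]$ are hypothetical, and the comparison step assumes that the single sampling benchmark takes the same form with the constant rate $\frac{1}{\omega - \alpha}$, by analogy with the discrete replicator dynamic (6) of Section 6.

```python
def average_payoff(shares, payoffs):
    """Average expected payoff: sum_i p_i * pi_i."""
    return sum(p * pi for p, pi in zip(shares, payoffs))


def adjusted_step(shares, payoffs, alpha, omega):
    """One step of (4): the rate grows as the average payoff falls."""
    width = omega - alpha
    avg = average_payoff(shares, payoffs)
    rate = 1.0 / width + (omega - avg) / width ** 2
    return [p + rate * (pi - avg) * p for p, pi in zip(shares, payoffs)]


def replicator_step(shares, payoffs, alpha, omega):
    """Assumed single sampling benchmark: same form with constant rate 1 / (omega - alpha)."""
    avg = average_payoff(shares, payoffs)
    rate = 1.0 / (omega - alpha)
    return [p + rate * (pi - avg) * p for p, pi in zip(shares, payoffs)]


# Hypothetical example: two actions with expected payoffs 0 and 1 on [0, 1].
shares = [0.5, 0.5]
payoffs = [0.0, 1.0]
double_sampling = adjusted_step(shares, payoffs, 0.0, 1.0)
single_sampling = replicator_step(shares, payoffs, 0.0, 1.0)
```

In this example the share of the better action rises to 0.875 after one step of (4), against 0.75 under the constant-rate benchmark; both updates keep the shares summing to one.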

6 A Two Population Matching Scenario

What about a setting in which the multi-armed bandit is not stationary over time? We will consider a popular example for such a situation: individuals will be randomly matched to play a game.

Consider two finite, disjoint sets (populations) of individuals $W^1$ and $W^2$, each of size $N$, also referred to as population one and two. Let $A^k$ be the finite set of actions available to an individual in population $k$, $k = 1, 2$. Payoffs are realized by matching individuals from different populations. When an individual in population one using action $i \in A^1$ is matched with an individual in population two using action $j \in A^2$, the individual in population $k$ achieves an uncertain payoff drawn from a given, independent probability distribution $P^k_{ij}$, $k = 1, 2$. Associating player $k$ with being an individual in population $k$, the tuple

$$\left\langle A^1, A^2, \left( P^1_{ij} \right)_{i \in A^1, j \in A^2}, \left( P^2_{ij} \right)_{i \in A^1, j \in A^2} \right\rangle$$

defines an asymmetric two player normal form game. We will restrict attention to the class of asymmetric two player normal form games, denoted by $G(A^1, A^2, [\alpha_1, \omega_1], [\alpha_2, \omega_2])$, in which player $k$ has action set $A^k$, $k = 1, 2$, where $P^1_{ij}$ has finite support in $[\alpha_1, \omega_1]$ and $P^2_{ij}$ has finite support in $[\alpha_2, \omega_2]$ for all $i \in A^1$ and $j \in A^2$; $\alpha_1 < \omega_1$ and $\alpha_2 < \omega_2$ are given. For a given asymmetric game, let $\pi_1(\cdot)$ and $\pi_2(\cdot)$ be the bilinear functions on $\Delta(A^1) \times \Delta(A^2)$ where $\pi_k(i, j)$ is the expected payoff to player $k$ when player one is using action $i$ and player two is using action $j$, i.e., $\pi_k(i, j) = \sum_{x \in [\alpha_k, \omega_k]} x P^k_{ij}(x)$, $k = 1, 2$.

Individuals of opposite populations are matched at random in pairs. For an individual in population one this means the following. Let $s^1 \in (A^1)^{W^1}$ be the current state in population one and let $p \in \Delta^N(A^1)$ be the associated population shares. Similarly let $s^2 \in (A^2)^{W^2}$ and $q \in \Delta^N(A^2)$ be defined for population two. Then an individual in population one is matched with an individual in population two using action $j \in A^2$ with probability $q_j$. Since we consider random matching, $\pi_1(i, q)$ specifies the expected payoff of an individual in population one using action $i \in A^1$ and $\pi_1(p, q)$ specifies the average payoff in population one in this state. In particular, each individual in population one is facing a multi-armed bandit $\langle A^1, (P'_i)_{i \in A^1} \rangle \in G(A^1, [\alpha_1, \omega_1])$ that depends on the population shares in population two: $P'_i(x) = \sum_{j \in A^2} q_j P^1_{ij}(x)$ for $x \in [\alpha_1, \omega_1]$ and $i \in A^1$.

Sampling occurs within the same population and is performed as in the multi-armed bandit setting. A behavioral rule $F$ for an individual in population $k$ is a function $F : A^k \times [\alpha_k, \omega_k] \times \{ A^k \times [\alpha_k, \omega_k] \times A^k \times [\alpha_k, \omega_k] \} \to \Delta(A^k)$, $k = 1, 2$.

Schlag [5] gives two scenarios in which an individual prefers to use the same rule in this population matching setting as in the former multi-armed bandit setting:

i) It might be that individuals do not realize that the multi-armed bandit is non stationary or that they simply ignore this fact.

ii) An individual might choose her rule according to its performance in a population of replicas and prefer a rule that is expected to increase average payoffs whenever all individuals in the opposite population do not change their action.

Hence, we consider the adjusted proportional imitation rule based on $[\alpha_k, \omega_k]$ to be the optimal rule for an individual in population $k$ in this population matching setting.

In the following we consider the aggregate behavior of the two populations under random sampling when each individual uses her optimal rule. As in Section 5 we consider the limit of this adjustment as the population size $N$ tends to infinity and apply a law of large numbers type of argument. Analogous to (4), the resulting adjustment process $(p^t, q^t)_{t \in \mathbb{N}}$ is given by

$$p_i^{t+1} = p_i^t + \left[ \frac{1}{\omega_1 - \alpha_1} + \frac{\omega_1 - \pi_1(p^t, q^t)}{(\omega_1 - \alpha_1)^2} \right] \left[ \pi_1(i, q^t) - \pi_1(p^t, q^t) \right] p_i^t,$$
$$q_j^{t+1} = q_j^t + \left[ \frac{1}{\omega_2 - \alpha_2} + \frac{\omega_2 - \pi_2(p^t, q^t)}{(\omega_2 - \alpha_2)^2} \right] \left[ \pi_2(p^t, j) - \pi_2(p^t, q^t) \right] q_j^t, \quad (5)$$

for $i \in A^1$, $j \in A^2$ and $t \in \mathbb{N}$. Following Samuelson and Zhang [4], (5) is called an aggregate monotone dynamic.

Under single sampling the adjustment generated when each individual is using her optimal rule (i.e., the proportional imitation rule with rate $\frac{1}{\omega_k - \alpha_k}$ for population $k$) is approximated by the following discrete version of the replicator dynamic (Taylor [6]):

$$p_i^{t+1} = p_i^t + \frac{1}{\omega_1 - \alpha_1} \left[ \pi_1(i, q^t) - \pi_1(p^t, q^t) \right] p_i^t,$$
$$q_j^{t+1} = q_j^t + \frac{1}{\omega_2 - \alpha_2} \left[ \pi_2(p^t, j) - \pi_2(p^t, q^t) \right] q_j^t. \quad (6)$$

Comparing this to (5) we see that the advantage of double sampling for individuals using their optimal rule $\hat F$ is greatest when average payoffs in their own population are low.

References

[1] R. M. Axelrod, The Evolution of Cooperation, Basic Books, New York, 1984.
[2] E. Bruch, "Evolution von Kooperation in Netzwerken", Diplomarbeit, University of Bonn, 1993.
[3] G. Ellison and D. Fudenberg, Word-Of-Mouth Communication and Social Learning, Quart. J. Econ. 440 (1995), 93-125.
[4] L. Samuelson and J. Zhang, Evolutionary Stability in Asymmetric Games, J. Econ. Theory 57 (1992), 363-391.
[5] K. H. Schlag, "Why Imitate, and if so, How? A Bounded Rational Approach to Multi-Armed Bandits," University of Bonn, Disc. Paper B-361, Bonn, 1996.
[6] P. Taylor, Evolutionarily Stable Strategies With Two Types of Players, J. Applied Prob. 16 (1979), 76-83.

A The Proof of Theorem 1

Proof. For $i, j, k \in A$ and $s \in A^W$ let

$$p_{ijk}(s) = \sum_{\substack{c, d, e \in W \\ \{s(c), s(d), s(e)\} = \{i, j, k\}}} P(c, \{d, e\}).$$

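We will first show the `if' statement. Since $F$ is imitating and sampling is symmetric,

$$EIP_F(s) = \frac{1}{N} \sum_{\substack{c, d, e \in W \\ s(d) = s(e)}} P(c, \{d, e\}) \, F^{s(d)}_{s(c) s(d) s(e)} \left[ \pi_{s(d)} - \pi_{s(c)} \right] + \frac{1}{N} \sum_{\substack{c, d, e \in W \\ s(d) \ne s(e)}} P(c, \{d, e\}) \left[ F^{s(d)}_{s(c) s(d) s(e)} \left( \pi_{s(d)} - \pi_{s(c)} \right) + F^{s(e)}_{s(c) s(d) s(e)} \left( \pi_{s(e)} - \pi_{s(c)} \right) \right]$$

$$= \frac{1}{3N} \sum_{i, j \in A} p_{ijj}(s) \left[ F^j_{ijj} - 2 F^i_{jij} \right] (\pi_j - \pi_i) + \frac{1}{3N} \sum_{\substack{\{i, j, k\} \subseteq A \\ |\{i, j, k\}| = 3}} p_{ijk}(s) \left[ \left( F^j_{ijk} - F^i_{jik} \right)(\pi_j - \pi_i) + \left( F^k_{ijk} - F^i_{kij} \right)(\pi_k - \pi_i) + \left( F^k_{jik} - F^j_{kij} \right)(\pi_k - \pi_j) \right]. \quad (7)$$

Consider actions $i, j, k \in A$ such that $i \notin \{j, k\}$. Then

$$F^{jk}_{ijk} - F^i_{jik} - F^i_{kij} = \sum_{x, y, z} P_i(x) P_j(y) P_k(z) \left[ F(i, x, \{j, y, k, z\})_{jk} - F(j, y, \{i, x, k, z\})_i - F(k, z, \{i, x, j, y\})_i \right]$$
$$= \sum_{x, y, z} P_i(x) P_j(y) P_k(z) \left[ \sigma_{\{i,j,k\}}(z)(y - x) + \sigma_{\{i,j,k\}}(y)(z - x) \right]$$
$$= \left[ \sum_z P_k(z) \sigma_{\{i,j,k\}}(z) \right] (\pi_j - \pi_i) + \left[ \sum_y P_j(y) \sigma_{\{i,j,k\}}(y) \right] (\pi_k - \pi_i)$$

and, given $a^l_{\{i,j,k\}} = \sum_y P_l(y) \sigma_{\{i,j,k\}}(y)$, $l \in \{i, j, k\}$, we obtain

$$\left( F^j_{ijk} - F^i_{jik} \right)(\pi_j - \pi_i) + \left( F^k_{ijk} - F^i_{kij} \right)(\pi_k - \pi_i) + \left( F^k_{jik} - F^j_{kij} \right)(\pi_k - \pi_j)$$
$$= -\pi_i \left( F^{jk}_{ijk} - F^i_{jik} - F^i_{kij} \right) - \pi_j \left( F^{ik}_{jik} - F^j_{ijk} - F^j_{kij} \right) - \pi_k \left( F^{ij}_{kij} - F^k_{ijk} - F^k_{jik} \right)$$
$$= (\pi_j - \pi_i)^2 a^k_{\{i,j,k\}} + (\pi_k - \pi_i)^2 a^j_{\{i,j,k\}} + (\pi_j - \pi_k)^2 a^i_{\{i,j,k\}} \ge 0.$$

Hence, (7) simplifies to

$$EIP_F(s) = \frac{1}{N} \sum_{c, d, e \in W} P(c, \{d, e\}) \left[ \pi_{s(d)} - \pi_{s(c)} \right]^2 a^{s(e)}_{\{s(c), s(d), s(e)\}} \quad (8)$$

and it follows that $EIP_F \ge 0$.

We now come to the proof of the `only if' statement. In order to simplify the presentation of the proof we will assume that $P(c, \{d, e\}) > 0$ for all $c, d, e \in W$ ($|\{c, d, e\}| = 3$). The proof can be easily adjusted to the more general case.

The fact that $F$ is imitating follows just like under single sampling (Schlag [5]). Assume that $F(i, x, \{j, y, k, z\})_r > 0$ for some $r \notin \{i, j, k\}$ and $x, y, z \in [\alpha, \omega]$. Consider a multi-armed bandit in which $P_i(x) = P_i(y) = P_i(\omega) = \tfrac{1}{3}$, $P_j \equiv P_k \equiv P_i$ and $P_l(\alpha) = 1$ for all $l \notin \{i, j, k\}$. In a state $s$ in which only actions in $\{i, j, k\}$ are being played it follows that $EIP_F(s) < 0$.

Next we will show (1) for $i \ne j = k$. Consider a population state in which one individual is playing action $i$ and the rest are playing action $j$. Then

$$EIP_F(s) = \frac{1}{N} \left[ F^j_{ijj} - 2 F^i_{jij} \right] (\pi_j - \pi_i). \quad (9)$$

Let

$$g(x, y, z) = F(i, x, \{j, y, j, z\})_j - F(j, y, \{i, x, j, z\})_i - F(j, z, \{i, x, j, y\})_i, \quad x, y, z \in [\alpha, \omega].$$

Let $y = z$. We now follow the same arguments as in the proof of the analogous theorem in Schlag [5] to show that there exists $\sigma_{ijj} : [\alpha, \omega] \to \mathbb{R}^+_0$ such that

$$g(x, y, y) = \sigma_{ijj}(x)(y - x) \quad \text{for all } x, y \in [\alpha, \omega]. \quad (10)$$

For given $x, y \in [\alpha, \omega]$, consider the multi-armed bandit where $P_i(x) = P_j(y) = 1$. Then $F^j_{ijj} - 2 F^i_{jij} = g(x, y, y)$ and hence, following (9),

$$g(x, y, y) \ge 0 \text{ and } g(y, x, x) \le 0 \text{ whenever } y > x. \quad (11)$$

Moreover, using arguments involving symmetry it follows that $g(x, x, x) = 0$ for all $x \in [\alpha, \omega]$. Next we will show that

$$\frac{g(x, y, y)}{y - x} = \frac{g(x, z, z)}{z - x} \quad \forall \, y < x < z. \quad (12)$$

Given $y < x < z$, consider a multi-armed bandit where $P_i(x) = 1$, $P_j(y) = \lambda$ and $P_j(z) = 1 - \lambda$, $0 \le \lambda \le 1$. Then $\pi_j > \pi_i$ if and only if $\lambda < \frac{z - x}{z - y} =: \lambda^*$. Again following (9), we obtain

$$F^j_{ijj} - 2 F^i_{jij} = \lambda g(x, y, y) + (1 - \lambda) g(x, z, z) \ge 0 \text{ if } \lambda < \lambda^* \quad \text{and} \quad \lambda g(x, y, y) + (1 - \lambda) g(x, z, z) \le 0 \text{ if } \lambda > \lambda^*.$$

Therefore, $\lambda^* g(x, y, y) + (1 - \lambda^*) g(x, z, z) = 0$, which, after simplification, shows that (12) is true.

Following (12) there exists $\sigma_{ijj} : (\alpha, \omega) \to \mathbb{R}^+_0$ such that $g(x, y, y) = \sigma_{ijj}(x)(y - x)$ for all $x, y \in (\alpha, \omega)$. Looking back at the above proof we see that the explicit values of $\alpha$ and $\omega$ did not enter the argument. Hence, (10) holds for all $x, y \in [\alpha, \omega]$.

Consider now a multi-armed bandit with $P_i(x) = 1$, $P_j(y) = \lambda$ and $P_j(z) = 1 - \lambda$ for $y < x < z$ and $0 \le \lambda \le 1$. Following (9) and given

$$I(\lambda) = F^j_{ijj} - 2 F^i_{jij} = 2 \lambda^2 \sigma_{ijj}(y)(y - x) + 2 \lambda (1 - \lambda) g(x, y, z) + 2 (1 - \lambda)^2 \sigma_{ijj}(z)(z - x) \quad (13)$$

we obtain that $I \ge 0$ if and only if $\pi_j \ge \pi_i$. Hence $I = 0$ if and only if $\pi_j = \pi_i$, i.e., if and only if $\lambda = \frac{z - x}{z - y} = \lambda^*$. Since $I(\lambda^*) = 0$, $-(z - x) \sigma_{ijj}(y) + g(x, y, z) + (x - y) \sigma_{ijj}(z) = 0$ and hence

$$g(x, y, z) = \sigma_{ijj}(z)(y - x) + \sigma_{ijj}(y)(z - x). \quad (14)$$

We will now derive $g(x, y, z)$ for $y \le z < x$. Consider a multi-armed bandit with $P_i(x) = 1$, $P_j(y) = \lambda$, $P_j(z) = \mu$ and $P_j(z') = 1 - \lambda - \mu$ for $z' > x$. Then

$$I = 2 \lambda^2 \sigma_{ijj}(y)(y - x) + 2 \mu^2 \sigma_{ijj}(z)(z - x) + 2 \lambda \mu \, g(x, y, z)$$
$$+ 2 \lambda (1 - \lambda - \mu) \left[ \sigma_{ijj}(z')(y - x) + \sigma_{ijj}(y)(z' - x) \right] + 2 \mu (1 - \lambda - \mu) \left[ \sigma_{ijj}(z')(z - x) + \sigma_{ijj}(z)(z' - x) \right]$$
$$+ 2 (1 - \lambda - \mu)^2 \sigma_{ijj}(z')(z' - x). \quad (15)$$

As before, $I = 0$ if and only if $\pi_i = \pi_j$, i.e., if and only if $\lambda (x - y) + \mu (x - z) = (1 - \lambda - \mu)(z' - x)$. Setting $\lambda (x - y) + \mu (x - z) = (1 - \lambda - \mu)(z' - x)$, together with (15), implies that (14) also holds for $y \le z < x$.

Repeating such calculations for the remaining values of $(x, y, z)$ not yet considered finally yields that (14) holds for all $x, y, z \in [\alpha, \omega]$. This completes the proof of the `only if' statement for $j = k$.

We now proceed with the case where $j \ne k$. Consider a population in which one individual is playing $i$, one is playing $k$ and the rest are playing $j$. Consider a multi-armed bandit in which $\pi_j = \pi_k$. Then

$$EIP_F(s) = \frac{1}{3N} p_{ijj}(s) \left[ F^j_{ijj} - 2 F^i_{jij} \right] (\pi_j - \pi_i) + \frac{1}{3N} p_{ijk}(s) \left[ F^{jk}_{ijk} - F^i_{jik} - F^i_{kij} \right] (\pi_j - \pi_i).$$

Since $F^j_{ijj} - 2 F^i_{jij} = 0$ if $\pi_i = \pi_j$, it follows that $F^{jk}_{ijk} - F^i_{jik} - F^i_{kij} = 0$ if and only if $\pi_i = \pi_j$ must hold. Following the same arguments as in the proof where $j = k$ we obtain that there exists $\sigma_{ijk} = \sigma_{ikj} : [\alpha, \omega] \to \mathbb{R}^+$ such that

$$F(i, x, \{j, y, k, z\})_{jk} - F(j, y, \{i, x, k, z\})_i - F(k, z, \{i, x, j, y\})_i = \sigma_{ijk}(z)(y - x) + \sigma_{ijk}(y)(z - x)$$

holds for all $x, y, z \in [\alpha, \omega]$.

The only thing remaining to show is that $\sigma_{ijk}$ is independent of a permutation of $i$, $j$ and $k$. Consider a multi-armed bandit with $P_i(x) = P_j(y) = P_k(z) = 1$ and a population in which one individual is playing $i$, one is playing $k$ and the rest are playing $j$. Following the calculations when proving the `if' statement, we obtain

$$EIP_F(s) = \frac{2}{3N} p_{ijj}(s) \sigma_{ijj}(y)(y - x)^2 + \frac{2}{3N} p_{kjj}(s) \sigma_{kjj}(y)(y - z)^2$$
$$- \frac{1}{3N} p_{ijk}(s) \left[ x \left\{ \sigma_{ijk}(z)(y - x) + \sigma_{ijk}(y)(z - x) \right\} + y \left\{ \sigma_{jik}(z)(x - y) + \sigma_{jik}(x)(z - y) \right\} + z \left\{ \sigma_{kij}(x)(y - z) + \sigma_{kij}(y)(x - z) \right\} \right]. \quad (16)$$

Setting $y = z$ this simplifies to

$$EIP_F(s) = \frac{2}{3N} p_{ijj}(s) \sigma_{ijj}(y)(y - x)^2 - \frac{1}{3N} p_{ijk}(s) (y - x) \left[ 2 x \sigma_{ijk}(y) - y \sigma_{jik}(y) - y \sigma_{kij}(y) \right]$$

which implies, when setting $y = x \ne 0$, that $2 \sigma_{ijk}(x) - \sigma_{jik}(x) - \sigma_{kij}(x) = 0$ for any $x \ne 0$. Similarly, setting $x = y$ in (16) leads to $2 \sigma_{kij}(x) - \sigma_{ijk}(x) - \sigma_{jik}(x) = 0$ for any $x \ne 0$. Together this means that $\sigma_{kij}(x) = \sigma_{ijk}(x) = \sigma_{jik}(x)$ for all $x \ne 0$. The special case of $x = 0$ is easily shown using more general multi-armed bandits and hence the proof is complete.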
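The sum-of-squares form behind (8) can be checked numerically for a rule of the form covered by Theorem 1. The Python sketch below is only an illustration under stated assumptions: it uses the adjusted proportional imitation rule on the hypothetical interval $[0, 1]$, deterministic payoffs $x$, $y$, $z$ for three distinct actions $i$, $j$, $k$ (so that $a^i = \sigma(x)$, $a^j = \sigma(y)$, $a^k = \sigma(z)$), and function names that are not from the paper.

```python
from fractions import Fraction
from itertools import product

# Assumed payoff bounds, chosen only for illustration.
ALPHA, OMEGA = Fraction(0), Fraction(1)
WIDTH = OMEGA - ALPHA


def plus(v):
    """[v]_+ : v if v > 0, otherwise 0."""
    return v if v > 0 else Fraction(0)


def sigma(v):
    """Rate function of the adjusted proportional imitation rule."""
    return 1 / (2 * WIDTH) + (OMEGA - v) / (2 * WIDTH ** 2)


def switch(own, imitated, other):
    """Probability of moving from payoff `own` to the sampled action with payoff
    `imitated` when the second sampled individual obtained payoff `other`."""
    return sigma(other) * plus(imitated - own)


def triple_improvement(x, y, z):
    """Expected payoff change summed over the three possible samplers in a triple."""
    return (
        (switch(x, y, z) - switch(y, x, z)) * (y - x)
        + (switch(x, z, y) - switch(z, x, y)) * (z - x)
        + (switch(y, z, x) - switch(z, y, x)) * (z - y)
    )


def sum_of_squares(x, y, z):
    """Right hand side of the identity used to pass from (7) to (8)."""
    return sigma(z) * (y - x) ** 2 + sigma(y) * (z - x) ** 2 + sigma(x) * (z - y) ** 2


grid = [ALPHA + WIDTH * Fraction(n, 4) for n in range(5)]
identity_holds = all(
    triple_improvement(x, y, z) == sum_of_squares(x, y, z)
    for x, y, z in product(grid, repeat=3)
)
```

Because every term of `sum_of_squares` is non-negative, the check also illustrates why a rule satisfying (1) generates non-negative expected improvement.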