# Chapter 7. BANDIT PROBLEMS.


Bandit problems are problems in the area of sequential selection of experiments, and they are related to stopping rule problems through the theorem of Gittins and Jones (1974). In the first section, we present a description of bandit problems and give some historical background. In the next section, we treat the one-armed bandit problem by the method of the previous chapter. In the final section, we discuss the theorem of Gittins and Jones, which shows that the $k$-armed bandit problem may be solved by solving $k$ one-armed problems. An excellent reference on bandit problems is the book of Berry and Fristedt (1985).

## 7.1 Introduction to the Problem.

Consider a sequential decision problem in which at each stage there are $k$ possible actions, or choices of experiment. Choice of action $j$ results in an observation being taken from the $j$th experiment, and you receive the numerical value of this observation as a reward. The observations you make may give you information useful in future choices of actions. Your goal is to maximize the present value of the infinite stream of rewards you receive, discounted in some way. The name "bandit" comes from modeling these problems as a $k$-armed bandit, a slot machine with $k$ arms, each yielding an unknown, possibly different distribution of payoffs. You do not know which arm gives you the greatest average return, but by playing the various arms of the slot machine you can gain information on which arm is best. However, the observations you use to gain information are also your rewards. You must strike a balance between gaining rewards and gaining information. For example, it is not good always to pull the arm that has performed best in the past, because it may be that you were just unlucky with the best arm. If you have many trials to go and it only takes a few trials to clarify the matter, you stand to improve your average gain greatly with only a small investment.
Typically in these problems, there is a period of gaining information, followed by a period of narrowing down the arms, followed by a period of profit taking, playing the arm you feel to be the best. A more important modeling of these problems comes from clinical trials in which there are $k$ treatments for a given disease. Patients arrive sequentially at the clinic and must be treated immediately by one of the treatments. It is assumed that the response to treatment is immediate, so that the effectiveness of the treatment the present patient receives is known when the next patient must be treated. It is not known precisely which one of the

treatments is best, but you must decide which treatment to give each patient, keeping in mind that your goal is to cure as many patients as possible. This may require you to give a patient a treatment which is not the one that looks best at the present time, in order to gain information that may be of use to future patients.

To describe bandit problems more precisely, let the $k$ reward distributions be denoted by $F_1(x|\theta_1), \dots, F_k(x|\theta_k)$, where $\theta_1, \dots, \theta_k$ are parameters whose exact values are not known, but whose joint prior distribution is known and denoted by $G(\theta_1, \dots, \theta_k)$. Initially, an action $a_1$ is chosen from the set $\{1, \dots, k\}$, and then an observation, $Z_1$, the reward for the first stage, is taken from the distribution $F_{a_1}$. Based on this information, an action $a_2$ is then taken from the same action space, and an observation, $Z_2$, is taken from $F_{a_2}$, and so on. It is assumed that, given $a_n$ and the parameters $\theta_1, \dots, \theta_k$, $Z_n$ is chosen from $F_{a_n}$ independent of the past. A decision rule for this problem is a sequence $A = (a_1, a_2, \dots)$ of functions adapted to the observations; that is, $a_n$ may depend on past actions and observations, $a_n = a_n(a_1, Z_1, a_2, Z_2, \dots, a_{n-1}, Z_{n-1})$. It is hoped that no confusion results from using one symbol, $a_n$, to denote both the function of past observations and the action taken at stage $n$. There is a discount sequence, denoted by $B = (\beta_1, \beta_2, \dots)$, such that the $j$th observation is discounted by $\beta_j$, where $0 \le \beta_j \le 1$ for $j = 1, 2, \dots$. The total discounted return is then $\sum_{j=1}^\infty \beta_j Z_j$. The problem is to choose a decision rule $A$ to maximize the expected reward, $\mathrm{E}\sum_{j=1}^\infty \beta_j Z_j$. This problem is called the $k$-armed bandit problem. The one-armed bandit problem, mentioned in Exercise 1.4, is defined as the 2-armed bandit problem in which one of the arms always returns the same known amount; that is, the distribution $F$ associated with one of the arms is degenerate at a known constant.
To obtain a finite value for the expected reward, we assume (1) each distribution, $F_j$ for $j = 1, \dots, k$, has finite first moment, and (2) $\sum_{j=1}^\infty \beta_j < \infty$. Two important special cases of the discount sequence are (1) the $n$-horizon uniform discount, for which $\beta_1 = \cdots = \beta_n = 1$ and $\beta_{n+1} = \beta_{n+2} = \cdots = 0$, and (2) the geometric discount, in which $B = (1, \beta, \beta^2, \beta^3, \dots)$, that is, $\beta_j = \beta^{j-1}$ for $j = 1, 2, \dots$. In the former, the payoff is simply $\sum_{j=1}^n Z_j$, the sum of the first $n$ observations. The problem becomes one with a finite horizon which can in principle be solved by backward induction. In the latter, there is a time invariance in which the future after $n$ stages looks like it did at the start, except for the change from the prior distribution to the posterior distribution. We will treat mainly problems with geometric discount and independent arms, that is, prior distributions $G$ for which $\theta_1, \dots, \theta_k$ are independent, so that an observation on one arm will not influence your knowledge of the distribution of any other arm.
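The two special discount sequences can be compared on a concrete reward stream. The following sketch is an illustration of my own (the function names are invented for this purpose), evaluating the total discounted return $\sum_j \beta_j Z_j$ under each choice of $B$:

```python
def discounted_return(rewards, discounts):
    # total discounted return: sum_j beta_j * Z_j
    return sum(b * z for b, z in zip(discounts, rewards))

def uniform_discounts(n, length):
    # n-horizon uniform discount: beta_1 = ... = beta_n = 1, the rest 0
    return [1.0] * n + [0.0] * (length - n)

def geometric_discounts(beta, length):
    # geometric discount: beta_j = beta**(j-1)
    return [beta ** j for j in range(length)]

rewards = [1.0, 0.5, 2.0, 0.0]
v_uniform = discounted_return(rewards, uniform_discounts(2, len(rewards)))        # 1.0 + 0.5 = 1.5
v_geometric = discounted_return(rewards, geometric_discounts(0.5, len(rewards)))  # 1 + 0.25 + 0.5 = 1.75
```

Under the uniform discount only the first $n$ rewards count, which is what makes backward induction available; under the geometric discount every reward counts, but the future is a scaled copy of the present.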

## 7.2 The One-Armed Bandit.

In the one-armed bandit problem, one of the arms has a known i.i.d. distribution of returns, and so plays only a minor role. We first show how the one-armed bandit can be related to a stopping rule problem. We assume that arm 1 has an associated sequence of random variables, $X_1, X_2, \dots$, with known joint distribution satisfying $\sup_n \mathrm{E}X_n^+ < \infty$. For the other arm, arm 2, the returns are assumed to be i.i.d. from a known distribution with expectation $\lambda$. We take the discount sequence to be geometric, $B = (1, \beta, \beta^2, \dots)$ where $0 < \beta < 1$, and seek to find a decision rule $A = (a_1, a_2, \dots)$ to maximize

$$(1)\qquad V(A) = \mathrm{E}\Big(\sum_{j=1}^\infty \beta^{j-1} Z_j \,\Big|\, A\Big).$$

First we argue that we may assume without loss of generality that the returns from arm 2 are degenerate at $\lambda$. Note that in the expectation above, any $Z_j$ from arm 2 may be replaced by $\lambda$. However, the rule $A$ may allow the actual values of these $Z_j$ to influence the choice of the future $a_j$. But the statistician may produce his own private sequence of i.i.d. random variables from the distribution of arm 2 and use them in $A$ in place of the actual values he sees. This produces a randomized decision rule which may be denoted by $A'$. Use of $A'$ in the problem where the random variables are given to be degenerate at $\lambda$ produces the same expected payoff as $A$ does in the original problem. The advantage of making this observation is that we may now assume that the decision rule $A$ does not depend on $Z_j$ when $a_j = 2$, since $Z_j$ is known to be $\lambda$. Thus we assume that arm 2 gives a constant return of $\lambda$ each time it is pulled. We now show that if at any stage it is optimal to pull arm 2, then it is optimal to keep pulling arm 2 thereafter. This implies that if there exists an optimal rule for this problem, there exists an optimal rule with the property that every pull of arm 2 is followed by another pull of arm 2. Thus one need only decide on the time to switch from arm 1 to arm 2.
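To see the switch-time structure concretely, the sketch below (my own illustration, not from the text; names are invented) evaluates every rule of the form "pull arm 1 for the first $n$ stages, then arm 2 forever" for a non-random arm-1 sequence and a constant arm-2 return $\lambda$:

```python
def switch_rule_value(x, lam, beta, n):
    # value of: pull arm 1 at stages 1..n, then arm 2 (constant lam) forever,
    # under the geometric discount beta_j = beta**(j-1)
    head = sum(beta ** (j - 1) * x[j - 1] for j in range(1, n + 1))
    tail = lam * beta ** n / (1 - beta)  # sum over j > n of beta**(j-1) * lam
    return head + tail

x = [2.0, 1.0, 0.0, 0.0]   # arm-1 returns (eventually 0)
beta, lam = 0.5, 0.6
values = [switch_rule_value(x, lam, beta, n) for n in range(len(x) + 1)]
best_n = max(range(len(values)), key=values.__getitem__)   # best switch time: n = 2
```

Here arm 1's returns eventually fall below $\lambda$, so the best rule takes arm 1's two good returns and then switches; $n = 0$ corresponds to using arm 2 always, with value $\lambda/(1-\beta)$.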
This relates the problem to a stopping rule problem in which the stopping time is identified with the time of switching from arm 1 to arm 2. Without loss of generality, we may assume that arm 2 is optimal at the initial stage, and state the theorem as follows.

Theorem 1. If it is initially optimal to use arm 2, in the sense that $\sup_A V(A) = V^* = \sup\{V(A) : A \text{ such that } a_1 = 2\}$, then it is optimal to use arm 2 always, and $V^* = \lambda/(1-\beta)$.

Proof. For a given $\epsilon > 0$, find a decision rule $A$ such that $a_1 = 2$ and $V(A) \ge V^* - \epsilon$. Then,

$$V(A) = \lambda + \beta\,\mathrm{E}\Big(\sum_{j=2}^\infty \beta^{j-2} Z_j \,\Big|\, A\Big) = \lambda + \beta\,\mathrm{E}\Big(\sum_{j=1}^\infty \beta^{j-1} Z_{j+1} \,\Big|\, A\Big) = \lambda + \beta\,\mathrm{E}\Big(\sum_{j=1}^\infty \beta^{j-1} Z'_j \,\Big|\, A'\Big) \le \lambda + \beta V^*,$$

where $A' = (a_2, a_3, \dots)$ is the rule $A$ shifted by 1, and $Z'_j = Z_{j+1}$. Thus we have $V^* - \epsilon \le \lambda + \beta V^*$, or equivalently, $V^* \le (\lambda + \epsilon)/(1-\beta)$. Since $\epsilon > 0$ is arbitrary, this implies $V^* \le \lambda/(1-\beta)$, but this value is achievable by using arm 2 at each stage.

This theorem is also valid for the $n$-horizon uniform discount sequence. In fact, Berry and Fristedt (1985) show that if the sequence $\{X_1, X_2, \dots\}$ is exchangeable, then an optimal strategy for the problem has the property that every pull of arm 2 is followed by a pull of arm 2 if, and essentially only if, the discount sequence is what they call regular. The discount sequence $B$ is said to be regular if it has increasing failure rate, that is, if $\beta_n / \sum_{j=n}^\infty \beta_j$ is non-decreasing on its domain of definition. That the above theorem is not true for all such discount sequences is easily seen: if $B = (.1, 1, 0, 0, 0, \dots)$, then $B$ is regular, yet if $X_1$ is degenerate at 10, $X_2$ is degenerate at 0, and $\lambda = 1$, then clearly the only optimal strategy is to follow an initial pull of arm 2 with a pull of arm 1. Exactly what property of $B$ is required for the above theorem seems to be unknown.

As a corollary, we see that there exists an optimal rule for this problem. It is either the rule that uses arm 2 at all stages, or the rule corresponding to the stopping rule $N$ that is optimal for the stopping rule problem with payoff

$$(2)\qquad Y_n = \sum_{j=1}^n \beta^{j-1} X_j + \lambda \sum_{j=n+1}^\infty \beta^{j-1}.$$

In fact, we can say more.

Theorem 2. Let $\Lambda(\beta)$ denote the optimal rate of return for using arm 1 at discount $\beta$,

$$(3)\qquad \Lambda(\beta) = \sup_{N \ge 1} \frac{\mathrm{E}\big(\sum_{j=1}^N \beta^{j-1} X_j\big)}{\mathrm{E}\big(\sum_{j=1}^N \beta^{j-1}\big)}.$$

Then arm 2 is optimal initially if, and only if, $\lambda \ge \Lambda(\beta)$.

Proof. By Theorem 1, we may restrict attention to decision rules $A$ specified by a stopping time $N$ which represents the last time that arm 1 is used. The payoff using $N$ is $\mathrm{E}\big(\sum_{j=1}^N \beta^{j-1} X_j + \lambda \sum_{j=N+1}^\infty \beta^{j-1}\big)$, which for $N = 0$ is $\lambda/(1-\beta)$.
Therefore, arm 2 is optimal initially if, and only if, for all stopping rules $N$,

$$\mathrm{E}\Big(\sum_{j=1}^N \beta^{j-1} X_j + \lambda \sum_{j=N+1}^\infty \beta^{j-1}\Big) \le \lambda/(1-\beta),$$

or, equivalently,

$$\mathrm{E}\Big(\sum_{j=1}^N \beta^{j-1} X_j\Big) \le \lambda\,\mathrm{E}\Big(\sum_{j=1}^N \beta^{j-1}\Big),$$

or, equivalently,

$$\mathrm{E}\Big(\sum_{j=1}^N \beta^{j-1} X_j\Big)\Big/\mathrm{E}\Big(\sum_{j=1}^N \beta^{j-1}\Big) \le \lambda.$$

This is equivalent to $\Lambda(\beta) \le \lambda$.

The value $\Lambda(\beta)$ depends only on $\beta$ and on the distribution of the returns from arm 1, $X_1, X_2, \dots$. It is called the Gittins index for arm 1, and it represents the indifference point: that value of $\lambda$ for arm 2 in the one-armed bandit at which you would be indifferent between starting off on arm 1 and choosing arm 2 all the time.

## 7.3 The Gittins Index Theorem.

We return to the $k$-armed bandit with geometric discount and independent arms, having returns denoted by

$$\begin{aligned} \text{arm } 1&: X(1,1), X(1,2), \dots\\ \text{arm } 2&: X(2,1), X(2,2), \dots\\ &\ \ \vdots\\ \text{arm } k&: X(k,1), X(k,2), \dots \end{aligned}$$

where it is assumed that the variables are independent between rows, and that the first absolute moments exist and are uniformly bounded, $\sup_{k,t} \mathrm{E}|X(k,t)| < \infty$. The discount is $\beta$, where $0 \le \beta < 1$, and we seek a decision rule $A = (a_1, a_2, \dots)$ to maximize the total discounted return,

$$(4)\qquad V(A) = \mathrm{E}\Big(\sum_{t=1}^\infty \beta^t Z_t \,\Big|\, A\Big).$$

For each arm we may compute a Gittins index,

$$(5)\qquad \Lambda_j = \sup_{N \ge 1}\, \mathrm{E}\Big(\sum_{t=1}^N \beta^t X(j,t)\Big)\Big/\mathrm{E}\Big(\sum_{t=1}^N \beta^t\Big) \qquad \text{for } j = 1, \dots, k,$$

where we suppress $\beta$ in the notation for $\Lambda_j$, since $\beta$ is held constant throughout this section. The celebrated theorem of Gittins and Jones (1974) states that for the $k$-armed bandit with geometric discount and independent arms, it is optimal at each stage to select the arm with the highest index. We give a proof of this theorem due to Varaiya, Walrand and Buyukkoc (1983), and adopt their didactic strategy of presenting the proof first in the special case in which there are just two arms ($k = 2$) and where all the random variables are degenerate. We denote the returns from arm 1 by $X(1), X(2), \dots$ and from arm 2 by $Y(1), Y(2), \dots$. Thus, it is assumed that $X(1), X(2), \dots$ and $Y(1), Y(2), \dots$ are bounded sequences of real numbers. Let the Gittins indices be denoted by

$$\Lambda_X = \sup_{j \ge 1}\, \sum_{t=1}^j \beta^t X(t)\Big/\sum_{t=1}^j \beta^t \qquad\text{and}\qquad \Lambda_Y = \sup_{j \ge 1}\, \sum_{t=1}^j \beta^t Y(t)\Big/\sum_{t=1}^j \beta^t.$$

Since $X(t)$ is assumed bounded, the series $\sum_{t=1}^\infty \beta^t X(t)$ converges, so that there exists a value of $j$, possibly $+\infty$, at which the supremum in the definition of $\Lambda_X$ is taken on.
In the following lemma, we suppose that $s$ is this value of $j$, so that $1 \le s \le \infty$.
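For a non-random sequence the index is just a maximum over prefix ratios, and the attaining $j$ is the $s$ of the lemma. Here is a small sketch of my own (names invented), scanning the ratios $\sum_{t \le j} \beta^t X(t) \big/ \sum_{t \le j} \beta^t$ over a finite window in which the supremum is assumed to be attained:

```python
def gittins_index(x, beta):
    # Lambda_X = sup over j of sum_{t=1..j} beta**t * x[t-1] / sum_{t=1..j} beta**t,
    # searching j over the given finite window; returns (index, attaining j)
    best, best_j = float("-inf"), 0
    num = den = 0.0
    for j, xt in enumerate(x, start=1):
        num += beta ** j * xt
        den += beta ** j
        if num / den > best:
            best, best_j = num / den, j
    return best, best_j

# X(1) = 0, X(2) = 1: a poor first return, then a good one
index, s = gittins_index([0.0, 1.0], beta=0.5)
# ratios: j=1 gives 0; j=2 gives 0.25/0.75 = 1/3, so the index is 1/3 with s = 2
```

The index looks past the poor first return: stopping at $j = 1$ rates the arm at 0, but committing to two pulls rates it at $1/3$, so $s = 2$.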

Lemma 1. Suppose the sequence $X(1), X(2), \dots$ is non-random and bounded. If $\Lambda_X = \sum_{t=1}^s \beta^t X(t)\big/\sum_{t=1}^s \beta^t$, then for all $j \le s$,

$$(6)\qquad \sum_{t=j}^s \beta^t X(t) \ge \Lambda_X \sum_{t=j}^s \beta^t,$$

and for $s$ finite and all $j > s$,

$$(7)\qquad \sum_{t=s+1}^j \beta^t X(t) \le \Lambda_X \sum_{t=s+1}^j \beta^t.$$

Proof. Since we have both $\sum_{t=1}^s \beta^t X(t) = \Lambda_X \sum_{t=1}^s \beta^t$ and $\sum_{t=1}^j \beta^t X(t) \le \Lambda_X \sum_{t=1}^j \beta^t$ for all $j$, subtracting the latter (with upper limit $j-1$) from the former gives (6) when $j$ is less than or equal to $s$, and subtracting the former from the latter gives (7) when $j > s$.

Inequality (6) implies that after $j - 1 < s$ stages have elapsed, the new Gittins index (which is at least as great as $\sum_{t=j}^s \beta^t X(t)\big/\sum_{t=j}^s \beta^t$) is at least as great as the original index, $\Lambda_X$. Similarly, (7) shows that after exactly $s$ stages have elapsed, the new Gittins index can be no greater than the original one. In fact, the following proof shows that if $\Lambda_X \ge \Lambda_Y$ and $\Lambda_X = \sum_{t=1}^s \beta^t X(t)\big/\sum_{t=1}^s \beta^t$, then it is optimal to start with at least $s$ pulls of arm 1.

Theorem 3. If the sequences $X(1), X(2), \dots$ and $Y(1), Y(2), \dots$ are non-random and bounded, and if $\Lambda_X \ge \Lambda_Y$, then it is optimal initially to use arm 1.

Proof. Assume $\Lambda_X \ge \Lambda_Y$ and let $A$ be any rule. We will show that there is a rule $A'$ that begins with arm 1 and gives at least as great a value as $A$. Then it is clear that the supremum of $V(A)$ over all $A$ is the same as the supremum of $V(A)$ over all $A$ that begin with arm 1. Find $s$ such that $\Lambda_X = \sum_{t=1}^s \beta^t X(t)\big/\sum_{t=1}^s \beta^t$, where $1 \le s \le \infty$. Let $k(t)$ denote the number of times that rule $A$ calls for the use of arm 2 just before arm 1 is used for the $t$th time, $t = 1, 2, \dots$. It may be that arm 2 is not used between pulls $t-1$ and $t$ of arm 1, so that $k(t) = k(t-1)$, and it is possible that $A$ does not use arm 1 $t$ times, in which case $k(t)$ is defined as $+\infty$. We have $0 \le k(1) \le k(2) \le \cdots$. We define a new decision rule $A'$ that starts out with $s$ $X$'s followed by $k(s)$ $Y$'s and then, if $k(s) < \infty$,

following this with the same sequence of $X$'s and $Y$'s as in $A$. Subject to $s$ and the first $s$ $k(t)$'s being finite, the sequence of observations occurs in the following order for $A$:

$$Y(1), \dots, Y(k(1)), X(1), Y(k(1)+1), \dots, Y(k(2)), X(2), \dots, Y(k(s)), X(s), Z(T+1), \dots$$

where $T$ is the time at which the $s$th $X$ occurs using decision rule $A$, and $Z(T+1), \dots$ represents the rest of this sequence beyond $T$. The sequence of observations occurs in the following order using $A'$:

$$X(1), X(2), \dots, X(s), Y(1), \dots, Y(k(1)), \dots, Y(k(2)), \dots, Y(k(s)), Z(T+1), \dots$$

We are to show that $V(A') - V(A) \ge 0$. Write this difference as $V(A') - V(A) = \Delta(X) - \Delta(Y)$, where $\Delta(X)$ represents the improvement in the value caused by shifting the $X$'s forward, and $\Delta(Y)$ represents the loss due to shifting the $Y$'s backward. Thus,

$$\Delta(X) = \sum_{t=1}^s \beta^t X(t) - \sum_{t=1}^s \beta^{t+k(t)} X(t) = \sum_{t=1}^s \beta^t \big(1 - \beta^{k(t)}\big) X(t) = \sum_{t=1}^s \beta^t X(t) \sum_{j=1}^t \big(\beta^{k(j-1)} - \beta^{k(j)}\big)$$
$$= \sum_{j=1}^s \big(\beta^{k(j-1)} - \beta^{k(j)}\big) \sum_{t=j}^s \beta^t X(t) \ \ge\ \Lambda_X \sum_{j=1}^s \big(\beta^{k(j-1)} - \beta^{k(j)}\big) \sum_{t=j}^s \beta^t = \Lambda_X \sum_{t=1}^s \beta^t \big(1 - \beta^{k(t)}\big),$$

where $k(0)$ represents 0, and the inequality follows from (6). It is important to note that this computation is valid even if some of the $k(t) = \infty$, that is, even if $A$ did not contain $s$ $X$'s. It is also valid if $s = \infty$. Similarly, we have

$$\Delta(Y) = \sum_{j=1}^s \sum_{t=k(j-1)+1}^{k(j)} \big(\beta^{t+j-1} - \beta^{t+s}\big) Y(t) = \sum_{j=1}^s \big(\beta^{j-1} - \beta^s\big) \sum_{t=k(j-1)+1}^{k(j)} \beta^t Y(t)$$

$$= \sum_{j=1}^s \sum_{m=j}^s \big(\beta^{m-1} - \beta^m\big) \sum_{t=k(j-1)+1}^{k(j)} \beta^t Y(t) = \sum_{m=1}^s \big(\beta^{m-1} - \beta^m\big) \sum_{j=1}^m \sum_{t=k(j-1)+1}^{k(j)} \beta^t Y(t)$$
$$= (1-\beta) \sum_{m=1}^s \beta^{m-1} \sum_{t=1}^{k(m)} \beta^t Y(t) \ \le\ \Lambda_Y (1-\beta) \sum_{m=1}^s \beta^{m-1} \sum_{t=1}^{k(m)} \beta^t = \Lambda_Y \sum_{m=1}^s \beta^m \big(1 - \beta^{k(m)}\big).$$

The inequality follows from the definition of $\Lambda_Y$. This computation is also valid if $s = \infty$ or if some of the $k(t) = \infty$. Now, using the assumption that $\Lambda_X \ge \Lambda_Y$, we find that $\Delta(X) - \Delta(Y) \ge 0$, as was to be shown.

The proof of the general theorem is similar, but at the crucial step involving the inequalities we will not be able to interchange summation and expectation, because the limits of summation are random. To circumvent this difficulty, the following two lemmas will be used. We deal with possibly randomized stopping rules by allowing the increasing sequence of $\sigma$-fields,

$$(8)\qquad \mathcal{F}(1) \subset \mathcal{F}(2) \subset \cdots \subset \mathcal{F}(\infty),$$

to be such that $\mathcal{F}(t)$ is the $\sigma$-field generated by $X(1), \dots, X(t)$ and any number of other random variables independent of $X(t+1), X(t+2), \dots$. To say now that a random variable $Z$ is $\mathcal{F}(t)$-measurable means essentially that $Z$ and $\{X(t+1), X(t+2), \dots\}$ are conditionally independent given $X(1), \dots, X(t)$. In particular, we have

$$(9)\qquad \mathrm{E}\,X(t+1)Z = \mathrm{E}\big(\mathrm{E}\{X(t+1)Z \mid X(1), \dots, X(t+1)\}\big) = \mathrm{E}\big(X(t+1)\,\mathrm{E}\{Z \mid X(1), \dots, X(t)\}\big)$$

for any $\mathcal{F}(t)$-measurable $Z$. This observation is useful in the proof of the following lemma.

Lemma 2. Let $X(t)$ be a sequence of random variables such that $\sup_t \mathrm{E}|X(t)| < \infty$, let $0 < \beta < 1$, and let $\Lambda_X$ denote the Gittins index. Then, for every stopping rule $N$, and every sequence of random variables $\alpha(t)$, $t = 1, 2, \dots$, such that $\alpha(t)$ is $\mathcal{F}(t-1)$-measurable and $\alpha(1) \ge \alpha(2) \ge \cdots \ge 0$ a.s., we have

$$(10)\qquad \mathrm{E}\sum_{t=1}^N \alpha(t)\beta^t X(t) \le \Lambda_X\, \mathrm{E}\sum_{t=1}^N \alpha(t)\beta^t.$$

Proof. Let $W(t) = \beta^t (X(t) - \Lambda_X)$. Then the definition of $\Lambda_X$ implies that for every stopping rule $N$, $\mathrm{E}\sum_{t=1}^N W(t) \le 0$. For any stopping rule $N$, $I(N \ge t)$ is $\mathcal{F}(t-1)$-measurable. Hence, from (9),

$$\mathrm{E}\sum_{t=1}^N W(t) = \mathrm{E}\sum_{n=1}^\infty I(N=n) \sum_{t=1}^n W(t) = \mathrm{E}\sum_{t=1}^\infty W(t) \sum_{n=t}^\infty I(N=n) = \mathrm{E}\sum_{t=1}^\infty W(t)\,\gamma(t) \le 0,$$

where $\gamma(t) = \mathrm{P}(N \ge t \mid X(1), \dots, X(t-1))$. Any sequence $\gamma(1) \ge \gamma(2) \ge \cdots \ge 0$ a.s., with $\gamma(t)$ $\mathcal{F}(t-1)$-measurable, determines a stopping rule $N$ such that $\mathrm{P}(N \ge t \mid \mathcal{F}(t-1)) = \gamma(t)$. Hence the hypothesis that $\mathrm{E}\sum_{t=1}^N W(t) \le 0$ for all stopping rules $N$ is thus equivalent to the hypothesis that $\mathrm{E}\sum_{t=1}^\infty W(t)\gamma(t) \le 0$ for all sequences $\gamma(t)$ such that $\gamma(t)$ is $\mathcal{F}(t-1)$-measurable and $\gamma(1) \ge \gamma(2) \ge \cdots \ge 0$ a.s. Now, since

$$\mathrm{E}\sum_{t=1}^N \alpha(t)W(t) = \mathrm{E}\sum_{n=1}^\infty I(N=n)\sum_{t=1}^n \alpha(t)W(t) = \mathrm{E}\sum_{t=1}^\infty \alpha(t)W(t)\sum_{n=t}^\infty I(N=n) = \mathrm{E}\sum_{t=1}^\infty W(t)\,\gamma(t) \le 0,$$

where $\gamma(t) = \mathrm{E}\{\alpha(t)I(N \ge t) \mid X(1), \dots, X(t-1)\}$, the conclusion follows.

The next lemma provides the required generalization of (6) of Lemma 1. Since $\mathrm{E}|X_n|$ is assumed to be bounded, conditions A1 and A2 are satisfied, and there exists an optimal stopping rule for every $\lambda$. In particular, there exists a rule that attains the Gittins index.

Lemma 3. Let $X(t)$ be a sequence of random variables such that $\sup_t \mathrm{E}|X(t)| < \infty$, let $0 < \beta < 1$, and let $\tau$ denote a stopping rule that attains the Gittins index, $\Lambda_X$. Then, for every sequence of random variables $\xi(t)$, $t = 1, 2, \dots$, such that $\xi(t)$ is $\mathcal{F}(t-1)$-measurable and $0 \le \xi(1) \le \xi(2) \le \cdots \le 1$ a.s., we have

$$(11)\qquad \mathrm{E}\sum_{t=1}^{\tau} \xi(t)\beta^t X(t) \ge \Lambda_X\, \mathrm{E}\sum_{t=1}^{\tau} \xi(t)\beta^t.$$

Proof. Since the Gittins index is attained at $\tau$,

$$\mathrm{E}\sum_{t=1}^{\tau} \beta^t X(t) = \Lambda_X\, \mathrm{E}\sum_{t=1}^{\tau} \beta^t.$$

From Lemma 2 with $\alpha(t) = 1 - \xi(t)$, we have

$$\mathrm{E}\sum_{t=1}^{\tau} \big(1 - \xi(t)\big)\beta^t X(t) \le \Lambda_X\, \mathrm{E}\sum_{t=1}^{\tau} \big(1 - \xi(t)\big)\beta^t.$$

Subtracting the latter from the former gives the result.

We now turn to the general problem with $k$ independent arms, and denote the sequence of returns from arm $j$ by $X(j,1), X(j,2), \dots$ for $j = 1, \dots, k$. It is assumed that the sets $\{X(1,t)\}, \dots, \{X(k,t)\}$ are independent, and that $\sup_{j,t} \mathrm{E}|X(j,t)| < \infty$. We shall be dealing with random variables $k(j,t)$ that depend upon the sequence $X(j,1), X(j,2), \dots$ only through the values of $X(j,1), \dots, X(j,t-1)$, though possibly on some of the $X(m,n)$ for $m \ne j$. Such random variables are measurable with respect to the $\sigma$-field generated by $X(j,1), \dots, X(j,t-1)$ and $X(m,n)$ for $m \ne j$ and $n = 1, 2, \dots$. We denote this $\sigma$-field by $\mathcal{F}(j, t-1)$. Any decision rule that at each stage chooses an arm that has the highest Gittins index is called a Gittins index rule.

Theorem 4. For a $k$-armed bandit problem with independent arms and geometric discount, any Gittins index rule is optimal.

Proof. Suppose that $\Lambda_1 = \max_j \Lambda_j$. Let $A$ be an arbitrary decision rule. We prove the theorem by showing that there is a rule $A'$ that begins with arm 1 and gives at least as great a value as $A$. Let $k(j,t)$ denote the (random) time that $A$ uses arm $j$ for the $t$th time, $j = 1, \dots, k$, $t = 1, 2, \dots$, with the understanding that $k(j,t) = \infty$ if arm $j$ is used less than $t$ times. The value of $A$ may then be written

$$V(A) = \mathrm{E}\Big\{\sum_{t=1}^\infty \beta^t Z(t) \,\Big|\, A\Big\} = \mathrm{E}\sum_{j=1}^k \sum_{t=1}^\infty \beta^{k(j,t)} X(j,t).$$

Let $\tau$ denote the stopping rule that achieves the supremum in

$$\Lambda_1 = \sup_{N \ge 1}\, \mathrm{E}\Big(\sum_{t=1}^N \beta^t X(1,t)\Big)\Big/\mathrm{E}\Big(\sum_{t=1}^N \beta^t\Big),$$

and let $T$ denote the (random) time that $A$ uses arm 1 for the $\tau$th time, $T = k(1, \tau)$. Define the decision rule $A'$ as follows:
(a) use arm 1 at times $1, 2, \dots, \tau$, and then, if $\tau < \infty$,
(b) use the arms $j \ne 1$ at times $\tau + 1, \dots, T$ in the same order as given by $A$, and then, if $T < \infty$,
(c) continue according to $A$ from time $T$ on.

Let $k'(j,t)$ denote the time at which $A'$ uses arm $j$ for the $t$th time, so that $k'(1,t) = t$ for $t = 1, \dots, \tau$. Finally, let $m(j)$ denote the number of times that arm $j$ is used by time $T$, so that $m(1) = \tau$. Then,

$$(12)\qquad V(A') - V(A) = \mathrm{E}\sum_{j=1}^k \sum_{t=1}^{m(j)} \big(\beta^{k'(j,t)} - \beta^{k(j,t)}\big) X(j,t)$$
$$= \mathrm{E}\sum_{t=1}^{\tau} \big(\beta^t - \beta^{k(1,t)}\big) X(1,t) - \mathrm{E}\sum_{j=2}^k \sum_{t=1}^{m(j)} \beta^t \big(\beta^{k(j,t)-t} - \beta^{k'(j,t)-t}\big) X(j,t)$$
$$= \mathrm{E}\sum_{t=1}^{\tau} \xi(t)\beta^t X(1,t) - \mathrm{E}\sum_{j=2}^k \sum_{t=1}^{m(j)} \alpha(j,t)\beta^t X(j,t),$$

where $\xi(t) = 1 - \beta^{k(1,t)-t}$ and $\alpha(j,t) = \beta^{k(j,t)-t} - \beta^{k'(j,t)-t}$. Since $k(1,t) - t$ represents the number of times that an arm other than arm 1 has been pulled by the time the $t$th pull of arm 1 occurs, we have that $k(1,t) - t$ is $\mathcal{F}(1, t-1)$-measurable and nondecreasing in $t$ a.s., so that $\xi(t)$ is $\mathcal{F}(1, t-1)$-measurable and $0 \le \xi(1) \le \xi(2) \le \cdots \le 1$ a.s. Thus from Lemma 3, we have

$$(13)\qquad \mathrm{E}\sum_{t=1}^{\tau} \xi(t)\beta^t X(1,t) \ge \Lambda_1\, \mathrm{E}\sum_{t=1}^{\tau} \xi(t)\beta^t.$$

For $j > 1$, $\alpha(j,t) = \beta^{k(j,t)-t}\big(1 - \beta^{k'(j,t)-k(j,t)}\big)$. Since $k(j,t) - t$ is $\mathcal{F}(j, t-1)$-measurable and nondecreasing, and since $k'(j,t) - k(j,t)$ is equal to $\tau$ minus the number of times arm 1 is pulled before the $t$th pull of arm $j$, and hence is $\mathcal{F}(j, t-1)$-measurable and nonincreasing, we find that $\alpha(j,t)$ is $\mathcal{F}(j, t-1)$-measurable and $\alpha(j,1) \ge \alpha(j,2) \ge \cdots \ge 0$. Hence from Lemma 2, we have for $j = 2, \dots, k$,

$$(14)\qquad \mathrm{E}\sum_{t=1}^{m(j)} \alpha(j,t)\beta^t X(j,t) \le \Lambda_j\, \mathrm{E}\sum_{t=1}^{m(j)} \alpha(j,t)\beta^t.$$

Combining (13) and (14) into (12), and recalling that $\Lambda_j \le \Lambda_1$ for all $j > 1$, we find

$$V(A') - V(A) \ge \Lambda_1\, \mathrm{E}\sum_{j=1}^k \sum_{t=1}^{m(j)} \big(\beta^{k'(j,t)} - \beta^{k(j,t)}\big).$$

This last expectation is zero, since it is just $V(A') - V(A)$ with all payoffs put equal to 1.
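For degenerate (non-random) arms the theorem can be checked directly by brute force. The sketch below is my own illustration (names invented): an exhausted arm is taken to return 0 thereafter, the returns are nonnegative, and the horizon is long enough to consume every nonzero return, so the truncated maximum equals the infinite-horizon optimum:

```python
from itertools import product

def total_return(order, arms, beta):
    # discounted return sum_t beta**t * Z(t) when arms are pulled in `order`;
    # an exhausted arm returns 0
    nxt, v = [0] * len(arms), 0.0
    for t, a in enumerate(order, start=1):
        if nxt[a] < len(arms[a]):
            v += beta ** t * arms[a][nxt[a]]
        nxt[a] += 1
    return v

def gittins_index(x, beta):
    # sup over prefixes j of sum_{t<=j} beta**t x(t) / sum_{t<=j} beta**t
    # (0 for an exhausted arm, consistent with its all-zero remaining returns)
    best = num = den = 0.0
    for j, xt in enumerate(x, start=1):
        num += beta ** j * xt
        den += beta ** j
        best = max(best, num / den)
    return best

def index_rule_order(arms, beta, horizon):
    # at each stage pull an arm whose remaining sequence has the highest index
    nxt, order = [0] * len(arms), []
    for _ in range(horizon):
        a = max(range(len(arms)), key=lambda i: gittins_index(arms[i][nxt[i]:], beta))
        order.append(a)
        nxt[a] += 1
    return order

arms = [[1.0, 0.8, 0.2], [0.9]]
beta, H = 0.9, 4
v_index = total_return(index_rule_order(arms, beta, H), arms, beta)
v_brute = max(total_return(list(o), arms, beta) for o in product(range(2), repeat=H))
# v_index == v_brute: the index rule attains the brute-force optimum
```

The index rule here plays arm 1 first (index 1.0), then arm 2 (0.9 beats the remaining index 0.8), then finishes arm 1, matching the interchange argument's conclusion that a currently-best arm is pulled through its attaining stage $s$.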


### Probability Generating Functions

page 39 Chapter 3 Probability Generating Functions 3 Preamble: Generating Functions Generating functions are widely used in mathematics, and play an important role in probability theory Consider a sequence

### The Heat Equation. Lectures INF2320 p. 1/88

The Heat Equation Lectures INF232 p. 1/88 Lectures INF232 p. 2/88 The Heat Equation We study the heat equation: u t = u xx for x (,1), t >, (1) u(,t) = u(1,t) = for t >, (2) u(x,) = f(x) for x (,1), (3)

### Lecture 3: UCB Algorithm, Worst-Case Regret Bound

IEOR 8100-001: Learning and Optimization for Sequential Decision Making 01/7/16 Lecture 3: UCB Algorithm, Worst-Case Regret Bound Instructor: Shipra Agrawal Scribed by: Karl Stratos = Jang Sun Lee 1 UCB

### Section 3 Sequences and Limits

Section 3 Sequences and Limits Definition A sequence of real numbers is an infinite ordered list a, a 2, a 3, a 4,... where, for each n N, a n is a real number. We call a n the n-th term of the sequence.

### 1. (First passage/hitting times/gambler s ruin problem:) Suppose that X has a discrete state space and let i be a fixed state. Let

Copyright c 2009 by Karl Sigman 1 Stopping Times 1.1 Stopping Times: Definition Given a stochastic process X = {X n : n 0}, a random time τ is a discrete random variable on the same probability space as

### A Portfolio Model of Insurance Demand. April 2005. Kalamazoo, MI 49008 East Lansing, MI 48824

A Portfolio Model of Insurance Demand April 2005 Donald J. Meyer Jack Meyer Department of Economics Department of Economics Western Michigan University Michigan State University Kalamazoo, MI 49008 East

### TAKE-AWAY GAMES. ALLEN J. SCHWENK California Institute of Technology, Pasadena, California INTRODUCTION

TAKE-AWAY GAMES ALLEN J. SCHWENK California Institute of Technology, Pasadena, California L INTRODUCTION Several games of Tf take-away?f have become popular. The purpose of this paper is to determine the

### Course 221: Analysis Academic year , First Semester

Course 221: Analysis Academic year 2007-08, First Semester David R. Wilkins Copyright c David R. Wilkins 1989 2007 Contents 1 Basic Theorems of Real Analysis 1 1.1 The Least Upper Bound Principle................

### CONTRIBUTIONS TO ZERO SUM PROBLEMS

CONTRIBUTIONS TO ZERO SUM PROBLEMS S. D. ADHIKARI, Y. G. CHEN, J. B. FRIEDLANDER, S. V. KONYAGIN AND F. PAPPALARDI Abstract. A prototype of zero sum theorems, the well known theorem of Erdős, Ginzburg

### Finite Sets. Theorem 5.1. Two non-empty finite sets have the same cardinality if and only if they are equivalent.

MATH 337 Cardinality Dr. Neal, WKU We now shall prove that the rational numbers are a countable set while R is uncountable. This result shows that there are two different magnitudes of infinity. But we

### Notes: Chapter 2 Section 2.2: Proof by Induction

Notes: Chapter 2 Section 2.2: Proof by Induction Basic Induction. To prove: n, a W, n a, S n. (1) Prove the base case - S a. (2) Let k a and prove that S k S k+1 Example 1. n N, n i = n(n+1) 2. Example

### Nonparametric adaptive age replacement with a one-cycle criterion

Nonparametric adaptive age replacement with a one-cycle criterion P. Coolen-Schrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK e-mail: Pauline.Schrijner@durham.ac.uk

### The Goldberg Rao Algorithm for the Maximum Flow Problem

The Goldberg Rao Algorithm for the Maximum Flow Problem COS 528 class notes October 18, 2006 Scribe: Dávid Papp Main idea: use of the blocking flow paradigm to achieve essentially O(min{m 2/3, n 1/2 }

### Limits and convergence.

Chapter 2 Limits and convergence. 2.1 Limit points of a set of real numbers 2.1.1 Limit points of a set. DEFINITION: A point x R is a limit point of a set E R if for all ε > 0 the set (x ε,x + ε) E is

### Practice Problems Solutions

Practice Problems Solutions The problems below have been carefully selected to illustrate common situations and the techniques and tricks to deal with these. Try to master them all; it is well worth it!

### 3. Renewal Theory. Definition 3 of the Poisson process can be generalized: Let X 1, X 2,..., iidf(x) be non-negative interarrival times.

3. Renewal Theory Definition 3 of the Poisson process can be generalized: Let X 1, X 2,..., iidf(x) be non-negative interarrival times. Set S n = n i=1 X i and N(t) = max {n : S n t}. Then {N(t)} is a

### 1 Short Introduction to Time Series

ECONOMICS 7344, Spring 202 Bent E. Sørensen January 24, 202 Short Introduction to Time Series A time series is a collection of stochastic variables x,.., x t,.., x T indexed by an integer value t. The

### Examples of infinite sample spaces. Math 425 Introduction to Probability Lecture 12. Example of coin tosses. Axiom 3 Strong form

Infinite Discrete Sample Spaces s of infinite sample spaces Math 425 Introduction to Probability Lecture 2 Kenneth Harris kaharri@umich.edu Department of Mathematics University of Michigan February 4,

### Sets and functions. {x R : x > 0}.

Sets and functions 1 Sets The language of sets and functions pervades mathematics, and most of the important operations in mathematics turn out to be functions or to be expressible in terms of functions.

### INTRODUCTION TO THE CONVERGENCE OF SEQUENCES

INTRODUCTION TO THE CONVERGENCE OF SEQUENCES BECKY LYTLE Abstract. In this paper, we discuss the basic ideas involved in sequences and convergence. We start by defining sequences and follow by explaining

### Mathematical Finance

Mathematical Finance Option Pricing under the Risk-Neutral Measure Cory Barnes Department of Mathematics University of Washington June 11, 2013 Outline 1 Probability Background 2 Black Scholes for European

### Lecture 13: Martingales

Lecture 13: Martingales 1. Definition of a Martingale 1.1 Filtrations 1.2 Definition of a martingale and its basic properties 1.3 Sums of independent random variables and related models 1.4 Products of

### Factorization Theorems

Chapter 7 Factorization Theorems This chapter highlights a few of the many factorization theorems for matrices While some factorization results are relatively direct, others are iterative While some factorization

Load balancing of temporary tasks in the l p norm Yossi Azar a,1, Amir Epstein a,2, Leah Epstein b,3 a School of Computer Science, Tel Aviv University, Tel Aviv, Israel. b School of Computer Science, The

### Chap2: The Real Number System (See Royden pp40)

Chap2: The Real Number System (See Royden pp40) 1 Open and Closed Sets of Real Numbers The simplest sets of real numbers are the intervals. We define the open interval (a, b) to be the set (a, b) = {x

Section 5.1 Climbing an Infinite Ladder Suppose we have an infinite ladder and the following capabilities: 1. We can reach the first rung of the ladder. 2. If we can reach a particular rung of the ladder,

### Section 3 Sequences and Limits, Continued.

Section 3 Sequences and Limits, Continued. Lemma 3.6 Let {a n } n N be a convergent sequence for which a n 0 for all n N and it α 0. Then there exists N N such that for all n N. α a n 3 α In particular

### Cylinder Maps and the Schwarzian

Cylinder Maps and the Schwarzian John Milnor Stony Brook University (www.math.sunysb.edu) Bremen June 16, 2008 Cylinder Maps 1 work with Araceli Bonifant Let C denote the cylinder (R/Z) I. (.5, 1) fixed

### General theory of stochastic processes

CHAPTER 1 General theory of stochastic processes 1.1. Definition of stochastic process First let us recall the definition of a random variable. A random variable is a random number appearing as a result

### Degrees that are not degrees of categoricity

Degrees that are not degrees of categoricity Bernard A. Anderson Department of Mathematics and Physical Sciences Gordon State College banderson@gordonstate.edu www.gordonstate.edu/faculty/banderson Barbara

### CARDINALITY, COUNTABLE AND UNCOUNTABLE SETS PART ONE

CARDINALITY, COUNTABLE AND UNCOUNTABLE SETS PART ONE With the notion of bijection at hand, it is easy to formalize the idea that two finite sets have the same number of elements: we just need to verify

### Chapter 5: Application: Fourier Series

321 28 9 Chapter 5: Application: Fourier Series For lack of time, this chapter is only an outline of some applications of Functional Analysis and some proofs are not complete. 5.1 Definition. If f L 1

### BANACH AND HILBERT SPACE REVIEW

BANACH AND HILBET SPACE EVIEW CHISTOPHE HEIL These notes will briefly review some basic concepts related to the theory of Banach and Hilbert spaces. We are not trying to give a complete development, but

### Decentralised bilateral trading, competition for bargaining partners and the law of one price

ECONOMICS WORKING PAPERS Decentralised bilateral trading, competition for bargaining partners and the law of one price Kalyan Chatterjee Kaustav Das Paper Number 105 October 2014 2014 by Kalyan Chatterjee

### 11 Ideals. 11.1 Revisiting Z

11 Ideals The presentation here is somewhat different than the text. In particular, the sections do not match up. We have seen issues with the failure of unique factorization already, e.g., Z[ 5] = O Q(

### LOGNORMAL MODEL FOR STOCK PRICES

LOGNORMAL MODEL FOR STOCK PRICES MICHAEL J. SHARPE MATHEMATICS DEPARTMENT, UCSD 1. INTRODUCTION What follows is a simple but important model that will be the basis for a later study of stock prices as

### THIS paper deals with a situation where a communication

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 44, NO. 3, MAY 1998 973 The Compound Channel Capacity of a Class of Finite-State Channels Amos Lapidoth, Member, IEEE, İ. Emre Telatar, Member, IEEE Abstract

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### Stochastic Inventory Control

Chapter 3 Stochastic Inventory Control 1 In this chapter, we consider in much greater details certain dynamic inventory control problems of the type already encountered in section 1.3. In addition to the

### TD(0) Leads to Better Policies than Approximate Value Iteration

TD(0) Leads to Better Policies than Approximate Value Iteration Benjamin Van Roy Management Science and Engineering and Electrical Engineering Stanford University Stanford, CA 94305 bvr@stanford.edu Abstract

### N E W S A N D L E T T E R S

N E W S A N D L E T T E R S 73rd Annual William Lowell Putnam Mathematical Competition Editor s Note: Additional solutions will be printed in the Monthly later in the year. PROBLEMS A1. Let d 1, d,...,

### SUBGROUPS OF CYCLIC GROUPS. 1. Introduction In a group G, we denote the (cyclic) group of powers of some g G by

SUBGROUPS OF CYCLIC GROUPS KEITH CONRAD 1. Introduction In a group G, we denote the (cyclic) group of powers of some g G by g = {g k : k Z}. If G = g, then G itself is cyclic, with g as a generator. Examples

### Solution Using the geometric series a/(1 r) = x=1. x=1. Problem For each of the following distributions, compute

Math 472 Homework Assignment 1 Problem 1.9.2. Let p(x) 1/2 x, x 1, 2, 3,..., zero elsewhere, be the pmf of the random variable X. Find the mgf, the mean, and the variance of X. Solution 1.9.2. Using the

### When is Reputation Bad? 1

When is Reputation Bad? 1 Jeffrey Ely Drew Fudenberg David K Levine 2 First Version: April 22, 2002 This Version: November 20, 2005 Abstract: In traditional reputation theory, the ability to build a reputation

### t := maxγ ν subject to ν {0,1,2,...} and f(x c +γ ν d) f(x c )+cγ ν f (x c ;d).

1. Line Search Methods Let f : R n R be given and suppose that x c is our current best estimate of a solution to P min x R nf(x). A standard method for improving the estimate x c is to choose a direction

### Course 421: Algebraic Topology Section 1: Topological Spaces

Course 421: Algebraic Topology Section 1: Topological Spaces David R. Wilkins Copyright c David R. Wilkins 1988 2008 Contents 1 Topological Spaces 1 1.1 Continuity and Topological Spaces...............

### Probability and statistics; Rehearsal for pattern recognition

Probability and statistics; Rehearsal for pattern recognition Václav Hlaváč Czech Technical University in Prague Faculty of Electrical Engineering, Department of Cybernetics Center for Machine Perception

### The Relation between Two Present Value Formulae

James Ciecka, Gary Skoog, and Gerald Martin. 009. The Relation between Two Present Value Formulae. Journal of Legal Economics 15(): pp. 61-74. The Relation between Two Present Value Formulae James E. Ciecka,

### What is Statistics? Lecture 1. Introduction and probability review. Idea of parametric inference

0. 1. Introduction and probability review 1.1. What is Statistics? What is Statistics? Lecture 1. Introduction and probability review There are many definitions: I will use A set of principle and procedures

### OPTIMAL CONTROL OF FLEXIBLE SERVERS IN TWO TANDEM QUEUES WITH OPERATING COSTS

Probability in the Engineering and Informational Sciences, 22, 2008, 107 131. Printed in the U.S.A. DOI: 10.1017/S0269964808000077 OPTIMAL CONTROL OF FLEXILE SERVERS IN TWO TANDEM QUEUES WITH OPERATING