Chapter 7. BANDIT PROBLEMS.
Bandit problems are problems in the area of sequential selection of experiments, and they are related to stopping rule problems through the theorem of Gittins and Jones (1974). In the first section, we present a description of bandit problems and give some historical background. In the next section, we treat the one-armed bandit problems by the method of the previous chapter. In the final section, we discuss the theorem of Gittins and Jones, which shows that the $k$-armed bandit problems may be solved by solving $k$ one-armed problems. An excellent reference to bandit problems is the book of Berry and Fristedt (1985).

7.1 Introduction to the Problem.

Consider a sequential decision problem in which at each stage there are $k$ possible actions or choices of experiment. Choice of action $j$ results in an observation being taken from the $j$th experiment, and you receive the numerical value of this observation as a reward. The observations you make may give you information useful in future choices of actions. Your goal is to maximize the present value of the infinite stream of rewards you receive, discounted in some way.

The name bandit comes from modeling these problems as a $k$-armed bandit, a slot machine with $k$ arms, each yielding an unknown, possibly different distribution of payoffs. You do not know which arm gives you the greatest average return, but by playing the various arms of the slot machine you can gain information on which arm is best. However, the observations you use to gain information are also your rewards, so you must strike a balance between gaining rewards and gaining information. For example, it is not good always to pull the arm that has performed best in the past, because it may be that you were just unlucky with the truly best arm. If you have many trials to go and it only takes a few trials to clarify the matter, you stand to improve your average gain greatly with only a small investment. Typically in these problems, there is a period of gaining information, followed by a period of narrowing down the arms, followed by a period of profit taking, playing the arm you feel to be the best.

A more important model for these problems comes from clinical trials in which there are $k$ treatments for a given disease. Patients arrive sequentially at the clinic and must be treated immediately by one of the treatments. It is assumed that the response to treatment is immediate, so that the effectiveness of the treatment the present patient receives is known when the next patient must be treated.
It is not known precisely which one of the treatments is best, but you must decide which treatment to give each patient, keeping in mind that your goal is to cure as many patients as possible. This may require you to give a patient a treatment that is not the one that looks best at the present time, in order to gain information that may be of use to future patients.

To describe bandit problems more precisely, let the $k$ reward distributions be denoted by $F_1(x|\theta_1),\ldots,F_k(x|\theta_k)$, where $\theta_1,\ldots,\theta_k$ are parameters whose exact values are not known but whose joint prior distribution, $G(\theta_1,\ldots,\theta_k)$, is known. Initially, an action $a_1$ is chosen from the set $\{1,\ldots,k\}$, and then an observation, $Z_1$, the reward for the first stage, is taken from the distribution $F_{a_1}$. Based on this information, an action $a_2$ is then taken from the same action space, and an observation, $Z_2$, is taken from $F_{a_2}$, and so on. It is assumed that given $a_n$ and the parameters $\theta_1,\ldots,\theta_k$, $Z_n$ is chosen from $F_{a_n}$ independent of the past.

A decision rule for this problem is a sequence $A = (a_1, a_2, \ldots)$ of functions adapted to the observations; that is, $a_n$ may depend on past actions and observations, $a_n = a_n(a_1, Z_1, a_2, Z_2, \ldots, a_{n-1}, Z_{n-1})$. It is hoped that no confusion results from using one symbol, $a_n$, to denote both the function of past observations and the action taken at stage $n$. There is a discount sequence, denoted by $B = (\beta_1, \beta_2, \ldots)$, such that the $j$th observation is discounted by $\beta_j$, where $0 \le \beta_j \le 1$ for $j = 1, 2, \ldots$. The total discounted return is then $\sum_{j=1}^\infty \beta_j Z_j$. The problem is to choose a decision rule $A$ to maximize the expected reward, $E \sum_{j=1}^\infty \beta_j Z_j$. This problem is called the $k$-armed bandit problem. The one-armed bandit problem, mentioned in Exercise 1.4, is defined as the 2-armed bandit problem in which one of the arms always returns the same known amount, that is, the distribution $F$ associated with one of the arms is degenerate at a known constant.

To obtain a finite value for the expected reward, we assume (1) each distribution $F_j$ for $j = 1,\ldots,k$ has finite first moment, and (2) $\sum_{j=1}^\infty \beta_j < \infty$.

Two important special cases of the discount sequence are (1) the $n$-horizon uniform discount, for which $\beta_1 = \cdots = \beta_n = 1$ and $\beta_{n+1} = \beta_{n+2} = \cdots = 0$, and (2) the geometric discount, in which $B = (1, \beta, \beta^2, \beta^3, \ldots)$, that is, $\beta_j = \beta^{j-1}$ for $j = 1, 2, \ldots$. In the former, the payoff is simply $\sum_{j=1}^n Z_j$, the sum of the first $n$ observations; the problem becomes one with a finite horizon, which can in principle be solved by backward induction. In the latter, there is a time invariance in which the future after $n$ stages looks as it did at the start, except for the change from the prior distribution to the posterior distribution. We will treat mainly problems with geometric discount and independent arms, that is, prior distributions $G$ for which $\theta_1,\ldots,\theta_k$ are independent, so that an observation on one arm will not influence your knowledge of the distribution of any other arm.
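As a concrete illustration of this setup (a minimal sketch, not from the text), consider the Bernoulli case with geometric discount. The myopic rule below, which always pulls the arm with the highest posterior mean under independent uniform priors, is a deliberately naive assumption chosen for illustration: it collects rewards but undervalues information, which is exactly the tension just described.

```python
import random

def play(ps, choose, n, beta):
    """Simulate n pulls of a k-armed Bernoulli bandit with success
    probabilities ps; return the realized discounted reward
    sum of beta^(j-1) * Z_j.  choose(successes, failures) picks the arm."""
    k = len(ps)
    s, f = [0] * k, [0] * k            # per-arm success and failure counts
    total, disc = 0.0, 1.0
    for _ in range(n):
        j = choose(s, f)
        z = 1 if random.random() < ps[j] else 0
        s[j] += z
        f[j] += 1 - z
        total += disc * z
        disc *= beta
    return total

def myopic(s, f):
    """Pull the arm with the highest posterior mean under Beta(1,1) priors."""
    return max(range(len(s)), key=lambda j: (s[j] + 1) / (s[j] + f[j] + 2))

random.seed(0)
ps, beta = [0.4, 0.6], 0.95
avg = sum(play(ps, myopic, 200, beta) for _ in range(2000)) / 2000
print(round(avg, 3))   # compare with the best arm's value 0.6/(1-beta) = 12.0
```

The myopic rule can lock onto the inferior arm after early bad luck, so its average falls short of what a rule that deliberately explores can achieve.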
First, we give a little historical background to add perspective to what follows. Early work on these problems centered mainly on the finite horizon problem with Bernoulli trials. These problems were introduced in the framework of clinical trials by Thompson (1933) for two treatments with outcomes forming Bernoulli trials with success probabilities having independent uniform prior distributions on (0,1). Robbins (1952) reintroduced the problem from a non-Bayesian viewpoint and suggested searching for the minimax decision rule. In the Bernoulli case, he proposed the play-the-winner/switch-from-a-loser strategy, and discussed the asymptotic behavior of rules. The first paper to search for Bayes decision rules for this problem is that of Bradt, Johnson and Karlin (1956). One of their important results, mentioned in Exercise 1.4, is that for the one-armed bandit with finite horizon and Bernoulli trials, if the known arm is optimal at any stage, then it is optimal to use that arm at all subsequent stages. Another important result is that for the 2-armed bandit with finite horizon and Bernoulli trials with success probabilities $p_1$ and $p_2$, if the prior distribution gives all its weight to points $(p_1, p_2)$ on the line $p_1 + p_2 = 1$, then the 1-stage look-ahead rule is optimal; that is, it is optimal to choose the arm with the higher present probability of success. In addition, they conjectured that if the prior distribution gives all its weight to two points $(a, b)$ and $(b, a)$ symmetrically placed about the line $p_1 = p_2$, then again the 1-stage look-ahead rule is optimal. This conjecture was proved to be true by Feldman (1962). These are about the only cases in which the optimal rule is easy to evaluate. In particular, in the important practical case of independent arms, the difficulty of computing the optimal rule hindered real progress. One important result for the 2-armed Bernoulli bandit with independent arms and finite horizon is the stay-on-a-winner principle, proved in Berry (1972). This principle states that if an arm is optimal at some stage and if it proves to be successful at that stage, then it is optimal at the following stage also. This was proved for the one-armed Bernoulli bandit with finite horizon by Bradt, Johnson and Karlin (1956). This was the state of affairs when the theorem of Gittins and Jones showed that for the $k$-armed bandit with independent arms and geometric discount, the problem can be solved by solving $k$ one-armed bandit problems.

We treat a problem of a somewhat more general structure than that indicated above by letting the returns for each arm be an arbitrary sequence of random variables (not necessarily exchangeable). Thus for each arm, say arm $j$, there is assumed to be a sequence of returns, $X_{j1}, X_{j2}, \ldots$, with an arbitrary joint distribution, subject to the condition that $\sup_n E X_{jn}^+ < \infty$ for all $j = 1,\ldots,k$. It is assumed that the arms are independent, that is, that the sets $\{X_{11}, X_{12}, \ldots\},\ldots,\{X_{k1}, X_{k2}, \ldots\}$ are independent sets of random variables. When an arm, say $j$, is pulled, the first random variable of the sequence, $X_{j1}$, is received as the reward. The next time $j$ is pulled, the reward $X_{j2}$ is received, and so on. That the theorem of Gittins and Jones applies to this more general probabilistic structure has been proved by Varaiya, Walrand and Buyukkoc (1985). See also Mandelbaum (1986).

7.2 The one-armed bandit.

As a preliminary to the solution of the $k$-armed bandit with geometric discount and independent arms, it is important to understand the one-armed bandit.
The one-armed bandit is really a bandit problem with two arms, but
one of the arms has a known i.i.d. distribution of returns and so plays only a minor role. We first show how the one-armed bandit can be related to a stopping rule problem.

We assume that arm 1 has an associated sequence of random variables, $X_1, X_2, \ldots$, with known joint distribution satisfying $\sup_n E X_n^+ < \infty$. For the other arm, arm 2, the returns are assumed to be i.i.d. from a known distribution with expectation $\lambda$. We take the discount sequence to be geometric, $B = (1, \beta, \beta^2, \ldots)$ where $0 < \beta < 1$, and seek to find a decision rule $A = (a_1, a_2, \ldots)$ to maximize

(1) $\quad V(A) = E\bigl(\textstyle\sum_{j=1}^\infty \beta^{j-1} Z_j \mid A\bigr).$

First we argue that we may assume without loss of generality that the returns from arm 2 are degenerate at $\lambda$. Note that in the expectation above, any $Z_j$ from arm 2 may be replaced by $\lambda$. However, the rule $A$ may allow the actual values of these $Z_j$ to influence the choice of the future $a_j$. But the statistician may produce his own private sequence of i.i.d. random variables from the distribution of arm 2 and use them in $A$ in place of the actual values he sees. This produces a randomized decision rule which may be denoted by $\tilde A$. Use of $\tilde A$ in the problem where the random variables are given to be degenerate at $\lambda$ produces the same expected payoff as $A$ does in the original problem. The advantage of making this observation is that we may now assume that the decision rule $A$ does not depend on $Z_j$ when $a_j = 2$, since $Z_j$ is known to be $\lambda$. Thus we assume that arm 2 gives a constant return of $\lambda$ each time it is pulled.

We now show that if at any stage it is optimal to pull arm 2, then it is optimal to keep pulling arm 2 thereafter. This implies that if there exists an optimal rule for this problem, there exists an optimal rule with the property that every pull of arm 2 is followed by another pull of arm 2. Thus one need only decide on the time to switch from arm 1 to arm 2. This relates this problem to a stopping rule problem in which the stopping time is identified with the time of switching from arm 1 to arm 2. Without loss of generality, we may assume that arm 2 is optimal at the initial stage, and state the theorem as follows.

Theorem 1. If it is initially optimal to use arm 2, in the sense that $\sup_A V(A) = V^* = \sup\{V(A) : A \text{ such that } a_1 = 2\}$, then it is optimal to use arm 2 always, and $V^* = \lambda/(1-\beta)$.

Proof. For a given $\epsilon > 0$, find a decision rule $A$ such that $a_1 = 2$ and $V(A) \ge V^* - \epsilon$. Then,

$\quad V(A) = \lambda + \beta E\bigl(\textstyle\sum_{j=2}^\infty \beta^{j-2} Z_j \mid A\bigr) = \lambda + \beta E\bigl(\textstyle\sum_{j=1}^\infty \beta^{j-1} Z_{j+1} \mid A\bigr) = \lambda + \beta E\bigl(\textstyle\sum_{j=1}^\infty \beta^{j-1} Z'_j \mid A'\bigr) \le \lambda + \beta V^*,$
where $A' = (a_2, a_3, \ldots)$ is the rule $A$ shifted by 1, and $Z'_j = Z_{j+1}$. Thus we have $V^* - \epsilon \le \lambda + \beta V^*$, or equivalently, $V^* \le (\lambda + \epsilon)/(1-\beta)$. Since $\epsilon > 0$ is arbitrary, this implies $V^* \le \lambda/(1-\beta)$, but this value is achievable by using arm 2 at each stage.

This theorem is also valid for the $n$-horizon uniform discount sequence. In fact, Berry and Fristedt (1985) show that if the sequence $X_1, X_2, \ldots$ is exchangeable, then an optimal strategy for the problem has the property that every pull of arm 2 is followed by a pull of arm 2 if, and essentially only if, the discount sequence is what they call regular. The discount sequence $B$ is said to be regular if it has increasing failure rate, that is, if $\beta_n / \sum_{j=n}^\infty \beta_j$ is non-decreasing on its domain of definition. That the above theorem is not true for all such discount sequences is easily seen. If $B = (.1, 1, 0, 0, 0, \ldots)$, then $B$ is regular; yet if $X_1$ is degenerate at 10, $X_2$ is degenerate at 0, and $\lambda = 1$, then clearly the only optimal strategy is to follow an initial pull of arm 2 with a pull of arm 1. (The four possible pull orders yield $.1\lambda + X_1 = 10.1$, $.1 X_1 + X_2 = 1$, $.1 X_1 + \lambda = 2$, and $.1\lambda + \lambda = 1.1$, so arm 2 is initially optimal but must be followed by arm 1.) Exactly what property of $B$ is required for the above theorem seems to be unknown.

As a corollary, we see that there exists an optimal rule for this problem. It is either the rule that uses arm 2 at all stages, or the rule corresponding to the stopping rule $N$ that is optimal for the stopping rule problem with payoff

(2) $\quad Y_n = \sum_{j=1}^n \beta^{j-1} X_j + \lambda \sum_{j=n+1}^\infty \beta^{j-1}.$

In fact, we can say more.

Theorem 2. Let $\Lambda(\beta)$ denote the optimal rate of return for using arm 1 at discount $\beta$,

(3) $\quad \Lambda(\beta) = \sup_{N \ge 1} E\Bigl(\sum_{j=1}^N \beta^{j-1} X_j\Bigr)\Big/E\Bigl(\sum_{j=1}^N \beta^{j-1}\Bigr).$

Then arm 2 is optimal initially if, and only if, $\lambda \ge \Lambda(\beta)$.

Proof. By Theorem 1, we may restrict attention to decision rules $A$ specified by a stopping time $N$ which represents the last time that arm 1 is used. The payoff using $N$ is $E\bigl(\sum_{j=1}^N \beta^{j-1} X_j + \lambda \sum_{j=N+1}^\infty \beta^{j-1}\bigr)$, which for $N = 0$ is $\lambda/(1-\beta)$. Therefore, arm 2 is optimal initially if, and only if, for all stopping rules $N$,

$\quad E\Bigl(\sum_{j=1}^N \beta^{j-1} X_j + \lambda \sum_{j=N+1}^\infty \beta^{j-1}\Bigr) \le \lambda/(1-\beta),$

or, equivalently,

$\quad E\Bigl(\sum_{j=1}^N \beta^{j-1} X_j\Bigr) \le \lambda\, E\Bigl(\sum_{j=1}^N \beta^{j-1}\Bigr),$

or, equivalently,

$\quad E\Bigl(\sum_{j=1}^N \beta^{j-1} X_j\Bigr)\Big/E\Bigl(\sum_{j=1}^N \beta^{j-1}\Bigr) \le \lambda.$

This is equivalent to $\Lambda(\beta) \le \lambda$.
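Theorem 2 makes the index computable. The sketch below (our illustration, not part of the text) approximates $\Lambda(\beta)$ for a Bernoulli arm with a Beta$(a_0, b_0)$ prior by bisecting on $\lambda$: for each trial value of $\lambda$, it compares the value of the optimal switching rule, computed by backward induction on the posterior and truncated at an assumed depth, against the retirement value $\lambda/(1-\beta)$. The truncation depth and the tail approximation are crude assumptions made to keep the example small.

```python
from functools import lru_cache

def gittins_bernoulli(a0, b0, beta, depth=100, tol=1e-5):
    """Approximate the Gittins index of a Bernoulli arm with Beta(a0,b0) prior.

    By Theorem 2, the index is the lambda at which one is indifferent
    between starting on arm 1 and retiring to the known arm forever,
    worth lambda/(1-beta).  The dynamic program is truncated after
    `depth` pulls (a crude tail assumption)."""

    def value_minus_retire(lam):
        retire = lam / (1.0 - beta)

        @lru_cache(maxsize=None)
        def v(a, b, d):
            p = a / (a + b)                  # posterior mean of the arm
            if d == depth:                   # tail: retire, or earn p forever
                return max(retire, p / (1.0 - beta))
            pull = p * (1.0 + beta * v(a + 1, b, d + 1)) \
                 + (1.0 - p) * beta * v(a, b + 1, d + 1)
            return max(retire, pull)

        return v(a0, b0, 0) - retire

    lo, hi = 0.0, 1.0                        # Bernoulli rewards lie in [0, 1]
    while hi - lo > tol:
        lam = (lo + hi) / 2.0
        if value_minus_retire(lam) > 1e-12:  # pulling still beats retiring,
            lo = lam                         # so the index exceeds lam
        else:
            hi = lam
    return (lo + hi) / 2.0

# A fresh arm under a uniform prior: the index exceeds the posterior
# mean 0.5, reflecting the value of the information a pull buys.
print(round(gittins_bernoulli(1, 1, beta=0.9), 3))
```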
The value $\Lambda(\beta)$ depends only on $\beta$ and on the distribution of the returns from arm 1, $X_1, X_2, \ldots$. It is called the Gittins index for arm 1, and it represents the indifference point: that value of $\lambda$ for arm 2 in the one-armed bandit at which you would be indifferent between starting off on arm 1 and choosing arm 2 all the time.

7.3 The Gittins Index Theorem.

We return to the $k$-armed bandit with geometric discount and independent arms, having returns denoted by

arm 1: $X(1,1), X(1,2), \ldots$
arm 2: $X(2,1), X(2,2), \ldots$
$\quad\vdots$
arm $k$: $X(k,1), X(k,2), \ldots$

where it is assumed that the variables are independent between rows and that the first absolute moments exist and are uniformly bounded, $\sup_{j,t} E|X(j,t)| < \infty$. The discount is $\beta$, where $0 \le \beta < 1$, and we seek a decision rule $A = (a_1, a_2, \ldots)$ to maximize the total discounted return,

(4) $\quad V(A) = E\bigl(\textstyle\sum_{t=1}^\infty \beta^{t-1} Z_t \mid A\bigr).$

For each arm we may compute a Gittins index,

(5) $\quad \Lambda_j = \sup_{N \ge 1} E\Bigl(\sum_{t=1}^N \beta^{t-1} X(j,t)\Bigr)\Big/E\Bigl(\sum_{t=1}^N \beta^{t-1}\Bigr) \quad\text{for } j = 1,\ldots,k,$

where we suppress $\beta$ in the notation for $\Lambda_j$ since $\beta$ is held constant throughout this section. The celebrated theorem of Gittins and Jones (1974) states that for the $k$-armed bandit with geometric discount and independent arms, it is optimal at each stage to select the arm with the highest index.

We give a proof of this theorem due to Varaiya, Walrand and Buyukkoc (1985) and adopt their didactic strategy of presenting the proof first in the special case in which there are just two arms ($k = 2$) and where all the random variables are degenerate. We denote the returns from arm 1 by $X(1), X(2), \ldots$ and from arm 2 by $Y(1), Y(2), \ldots$. Thus, it is assumed that $X(1), X(2), \ldots$ and $Y(1), Y(2), \ldots$ are bounded sequences of real numbers. Let the Gittins indices be denoted by

$\quad \Lambda_X = \sup_{j \ge 1} \sum_{t=1}^j \beta^{t-1} X(t) \Big/ \sum_{t=1}^j \beta^{t-1}, \qquad \Lambda_Y = \sup_{j \ge 1} \sum_{t=1}^j \beta^{t-1} Y(t) \Big/ \sum_{t=1}^j \beta^{t-1}.$

Since $X(t)$ is assumed bounded, the series $\sum_{t=1}^\infty \beta^{t-1} X(t)$ converges, so that there exists a value of $j$, possibly $\infty$, at which the supremum in the definition of $\Lambda_X$ is attained. In the following lemma, we suppose that $s$ is this value of $j$, so that $1 \le s \le \infty$.
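For a known, non-random sequence the supremum defining $\Lambda_X$ is a one-dimensional search, and $s$ can be read off directly. A small sketch of our own (the zero tail is an assumption to keep the example finite; for nonnegative sequences the maximum then occurs within the listed terms):

```python
def det_index(x, beta):
    """Gittins index of a known reward sequence x(1),...,x(n) that is zero
    afterwards: the maximum over j of
        sum_{t<=j} beta^(t-1) x(t) / sum_{t<=j} beta^(t-1).
    Returns the index and the maximizing j (the s of Lemma 1 below)."""
    num = den = 0.0
    disc = 1.0
    best, s = float("-inf"), 0
    for j, xj in enumerate(x, start=1):
        num += disc * xj
        den += disc
        disc *= beta
        if num / den > best:
            best, s = num / den, j
    return best, s

# Early returns understate this arm: the supremum is attained at s = 3,
# so, by the discussion following Lemma 1, it is worth persisting with
# the arm for at least 3 pulls.
print(det_index([0.0, 1.0, 1.0], beta=0.9))   # roughly (0.631, 3)
```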
Lemma 1. Suppose the sequence $X(1), X(2), \ldots$ is non-random and bounded. If $\Lambda_X = \sum_{t=1}^s \beta^{t-1} X(t) \big/ \sum_{t=1}^s \beta^{t-1}$, then for all $j \le s$,

(6) $\quad \sum_{t=j}^s \beta^{t-1} X(t) \ge \Lambda_X \sum_{t=j}^s \beta^{t-1},$

and for $s$ finite and all $j > s$,

(7) $\quad \sum_{t=s+1}^j \beta^{t-1} X(t) \le \Lambda_X \sum_{t=s+1}^j \beta^{t-1}.$

Proof. Since we have both

$\quad \sum_{t=1}^s \beta^{t-1} X(t) = \Lambda_X \sum_{t=1}^s \beta^{t-1} \quad\text{and}\quad \sum_{t=1}^j \beta^{t-1} X(t) \le \Lambda_X \sum_{t=1}^j \beta^{t-1} \text{ for all } j,$

subtracting the latter from the former gives (6) when $j$ is less than or equal to $s$, and gives (7) when $j > s$.

Inequality (6) implies that after $j - 1 < s$ stages have elapsed, the new Gittins index (which is at least as great as $\sum_{t=j}^s \beta^{t-j} X(t) \big/ \sum_{t=j}^s \beta^{t-j}$) is at least as great as the original index, $\Lambda_X$. Similarly, (7) shows that after exactly $s$ stages have elapsed, the new Gittins index can be no greater than the original one. In fact, the following proof shows that if $\Lambda_X \ge \Lambda_Y$ and $\Lambda_X = \sum_{t=1}^s \beta^{t-1} X(t) \big/ \sum_{t=1}^s \beta^{t-1}$, then it is optimal to start with at least $s$ pulls of arm 1.

Theorem 3. If the sequences $X(1), X(2), \ldots$ and $Y(1), Y(2), \ldots$ are non-random and bounded, and if $\Lambda_X \ge \Lambda_Y$, then it is optimal initially to use arm 1.

Proof. Assume $\Lambda_X \ge \Lambda_Y$ and let $A$ be any rule. We will show that there is a rule $A'$ that begins with arm 1 and gives at least as great a value as $A$. It is then clear that the supremum of $V(A)$ over all $A$ is the same as the supremum of $V(A)$ over all $A$ that begin with arm 1.

Find $s$ such that $\Lambda_X = \sum_{t=1}^s \beta^{t-1} X(t) \big/ \sum_{t=1}^s \beta^{t-1}$, where $1 \le s \le \infty$. Let $k(t)$ denote the number of times that rule $A$ calls for the use of arm 2 just before arm 1 is used for the $t$th time, $t = 1, 2, \ldots$. It may be that arm 2 is not used between pulls $t-1$ and $t$ of arm 1, so that $k(t) = k(t-1)$, and it is possible that $A$ does not use arm 1 $t$ times, in which case $k(t)$ is defined as $+\infty$. We have $0 \le k(1) \le k(2) \le \cdots$. We define a new decision rule $A'$ that starts out with $s$ $X$'s followed by $k(s)$ $Y$'s and then, if $k(s) < \infty$,
following this with the same sequence of $X$'s and $Y$'s as in $A$. Subject to $s$ and the first $s$ $k(t)$'s being finite, the sequence of observations occurs in the following order for $A$:

$\quad Y(1),\ldots,Y(k(1)),\, X(1),\, Y(k(1)+1),\ldots,Y(k(2)),\, X(2),\, \ldots,\, Y(k(s)),\, X(s),\, Z(T+1),\ldots$

where $T$ is the time that the $s$th $X$ occurs using decision rule $A$, and $Z(T+1),\ldots$ represents the rest of this sequence beyond $T$. The sequence of observations occurs in the following order using $A'$:

$\quad X(1), X(2),\ldots,X(s),\, Y(1),\ldots,Y(k(1)),\ldots,Y(k(2)),\ldots,Y(k(s)),\, Z(T+1),\ldots$

We are to show that $V(A') - V(A) \ge 0$. Write this difference as $V(A') - V(A) = \Delta(X) - \Delta(Y)$, where $\Delta(X)$ represents the improvement in the value caused by shifting the $X$'s forward, and $\Delta(Y)$ represents the loss due to shifting the $Y$'s backward. Thus,

$\quad \Delta(X) = \sum_{t=1}^s \beta^{t-1} X(t) - \sum_{t=1}^s \beta^{t+k(t)-1} X(t)$
$\qquad = \sum_{t=1}^s \beta^{t-1}\bigl(1 - \beta^{k(t)}\bigr) X(t)$
$\qquad = \sum_{t=1}^s \beta^{t-1} X(t) \sum_{j=1}^t \bigl(\beta^{k(j-1)} - \beta^{k(j)}\bigr)$
$\qquad = \sum_{j=1}^s \bigl(\beta^{k(j-1)} - \beta^{k(j)}\bigr) \sum_{t=j}^s \beta^{t-1} X(t)$
$\qquad \ge \Lambda_X \sum_{j=1}^s \bigl(\beta^{k(j-1)} - \beta^{k(j)}\bigr) \sum_{t=j}^s \beta^{t-1}$
$\qquad = \Lambda_X \sum_{t=1}^s \beta^{t-1}\bigl(1 - \beta^{k(t)}\bigr),$

where $k(0)$ represents 0, and the inequality follows from (6). It is important to note that this computation is valid even if some of the $k(t) = \infty$, that is, even if $A$ did not contain $s$ $X$'s. It is also valid if $s = \infty$. Similarly, we have

$\quad \Delta(Y) = \sum_{j=1}^s \beta^{j-1} \sum_{t=k(j-1)+1}^{k(j)} \beta^{t-1} Y(t) - \beta^s \sum_{t=1}^{k(s)} \beta^{t-1} Y(t)$
$\qquad = \sum_{j=1}^s \bigl(\beta^{j-1} - \beta^s\bigr) \sum_{t=k(j-1)+1}^{k(j)} \beta^{t-1} Y(t)$
$\qquad = \sum_{j=1}^s \sum_{m=j}^s \bigl(\beta^{m-1} - \beta^m\bigr) \sum_{t=k(j-1)+1}^{k(j)} \beta^{t-1} Y(t)$
$\qquad = \sum_{m=1}^s \bigl(\beta^{m-1} - \beta^m\bigr) \sum_{j=1}^m \sum_{t=k(j-1)+1}^{k(j)} \beta^{t-1} Y(t)$
$\qquad = (1-\beta) \sum_{m=1}^s \beta^{m-1} \sum_{t=1}^{k(m)} \beta^{t-1} Y(t)$
$\qquad \le \Lambda_Y (1-\beta) \sum_{m=1}^s \beta^{m-1} \sum_{t=1}^{k(m)} \beta^{t-1}$
$\qquad = \Lambda_Y \sum_{m=1}^s \beta^{m-1} \bigl(1 - \beta^{k(m)}\bigr).$

The inequality follows from the definition of $\Lambda_Y$. This computation is also valid if $s = \infty$ or if some of the $k(t) = \infty$. Now, using the assumption that $\Lambda_X \ge \Lambda_Y$, we find that $\Delta(X) - \Delta(Y) \ge (\Lambda_X - \Lambda_Y) \sum_{t=1}^s \beta^{t-1}(1 - \beta^{k(t)}) \ge 0$, as was to be shown.

The proof of the general theorem is similar, but at the crucial step involving the inequalities, we will not be able to interchange summation and expectation because the limits of summation are random. To circumvent this difficulty, the following two lemmas will be used. We deal with possibly randomized stopping rules by allowing the increasing sequence of $\sigma$-fields,

(8) $\quad \mathcal{F}(1) \subset \mathcal{F}(2) \subset \cdots \subset \mathcal{F}(\infty),$

to be such that $\mathcal{F}(t)$ is the $\sigma$-field generated by $X(1),\ldots,X(t)$ and any number of other random variables independent of $X(t+1), X(t+2), \ldots$. To say now that a random variable $Z$ is $\mathcal{F}(t)$-measurable means essentially that $Z$ and $\{X(t+1), X(t+2), \ldots\}$ are conditionally independent given $X(1),\ldots,X(t)$. In particular, we have

(9) $\quad E\,X(t+1) Z = E\bigl(E\{X(t+1) Z \mid X(1),\ldots,X(t+1)\}\bigr) = E\bigl(X(t+1)\,E\{Z \mid X(1),\ldots,X(t)\}\bigr)$

for any $\mathcal{F}(t)$-measurable $Z$. This observation is useful in the proof of the following lemma.

Lemma 2. Let $X(t)$ be a sequence of random variables such that $\sup_t E|X(t)| < \infty$, let $0 < \beta < 1$, and let $\Lambda_X$ denote the Gittins index. Then, for every stopping rule $N$ and every sequence of random variables $\alpha(t)$, $t = 1, 2, \ldots$, such that $\alpha(t)$ is $\mathcal{F}(t-1)$-measurable and $\alpha(1) \ge \alpha(2) \ge \cdots \ge 0$ a.s., we have

(10) $\quad E \sum_{t=1}^N \alpha(t) \beta^{t-1} X(t) \le \Lambda_X\, E \sum_{t=1}^N \alpha(t) \beta^{t-1}.$
Proof. Let $W(t) = \beta^{t-1}\bigl(X(t) - \Lambda_X\bigr)$. Then the definition of $\Lambda_X$ implies that for every stopping rule $N$,

$\quad E \sum_{t=1}^N W(t) \le 0.$

For any stopping rule $N$, $I(N \ge t)$ is $\mathcal{F}(t-1)$-measurable. Hence, from (9),

$\quad E \sum_{t=1}^N W(t) = E \sum_{n=1}^\infty I(N = n) \sum_{t=1}^n W(t) = E \sum_{t=1}^\infty W(t) \sum_{n=t}^\infty I(N = n) = \sum_{t=1}^\infty E\, W(t)\gamma(t) \le 0,$

where $\gamma(t) = P(N \ge t \mid X(1),\ldots,X(t-1))$. Any sequence $\gamma(1) \ge \gamma(2) \ge \cdots \ge 0$ a.s. with $\gamma(t)$ $\mathcal{F}(t-1)$-measurable determines a stopping rule $N$ such that $P(N \ge t \mid \mathcal{F}(t-1)) = \gamma(t)$. Hence, the hypothesis that $E \sum_{t=1}^N W(t) \le 0$ for all stopping rules $N$ is equivalent to the hypothesis that $E \sum_{t=1}^\infty W(t)\gamma(t) \le 0$ for all sequences $\gamma(t)$ such that $\gamma(t)$ is $\mathcal{F}(t-1)$-measurable and $\gamma(1) \ge \gamma(2) \ge \cdots \ge 0$ a.s. Now, since

$\quad E \sum_{t=1}^N \alpha(t) W(t) = E \sum_{n=1}^\infty I(N = n) \sum_{t=1}^n \alpha(t) W(t) = E \sum_{t=1}^\infty \alpha(t) W(t)\, I(N \ge t) = \sum_{t=1}^\infty E\, W(t)\gamma(t) \le 0,$

where $\gamma(t) = E\{\alpha(t) I(N \ge t) \mid X(1),\ldots,X(t-1)\}$, the conclusion follows.

The next lemma provides the required generalization of (6) of Lemma 1. Since $E|X_n|$ is assumed to be bounded, conditions A1 and A2 are satisfied and there exists an optimal stopping rule for every $\lambda$. In particular, there exists a rule that attains the Gittins index.

Lemma 3. Let $X(t)$ be a sequence of random variables such that $\sup_t E|X(t)| < \infty$, let $0 < \beta < 1$, and let $\bar N$ denote a stopping rule that attains the Gittins index, $\Lambda_X$. Then, for every sequence of random variables $\xi(t)$, $t = 1, 2, \ldots$, such that $\xi(t)$ is $\mathcal{F}(t-1)$-measurable and $0 \le \xi(1) \le \xi(2) \le \cdots \le 1$ a.s., we have

(11) $\quad E \sum_{t=1}^{\bar N} \xi(t) \beta^{t-1} X(t) \ge \Lambda_X\, E \sum_{t=1}^{\bar N} \xi(t) \beta^{t-1}.$
Proof. Since the Gittins index is attained at $\bar N$,

$\quad E \sum_{t=1}^{\bar N} \beta^{t-1} X(t) = \Lambda_X\, E \sum_{t=1}^{\bar N} \beta^{t-1}.$

From Lemma 2 with $\alpha(t) = 1 - \xi(t)$, we have

$\quad E \sum_{t=1}^{\bar N} \bigl(1 - \xi(t)\bigr) \beta^{t-1} X(t) \le \Lambda_X\, E \sum_{t=1}^{\bar N} \bigl(1 - \xi(t)\bigr) \beta^{t-1}.$

Subtracting the latter from the former gives the result.

We now turn to the general problem with $k$ independent arms, and denote the sequence of returns from arm $j$ by $X(j,1), X(j,2), \ldots$ for $j = 1,\ldots,k$. It is assumed that the sets $\{X(1,t)\},\ldots,\{X(k,t)\}$ are independent, and that $\sup_{j,t} E|X(j,t)| < \infty$. We shall be dealing with random variables $k(j,t)$ that depend upon the sequence $X(j,1), X(j,2), \ldots$ only through the values of $X(j,1),\ldots,X(j,t-1)$, though possibly on some of the $X(m,n)$ for $m \ne j$. Such random variables are measurable with respect to the $\sigma$-field generated by $X(j,1),\ldots,X(j,t-1)$ and $X(m,n)$ for $m \ne j$ and $n = 1, 2, \ldots$. We denote this $\sigma$-field by $\mathcal{F}(j, t-1)$. Any decision rule that at each stage chooses an arm that has the highest Gittins index is called a Gittins index rule.

Theorem 4. For a $k$-armed bandit problem with independent arms and geometric discount, any Gittins index rule is optimal.

Proof. Suppose that $\Lambda_1 = \max_j \Lambda_j$. Let $A$ be an arbitrary decision rule. We prove the theorem by showing that there is a rule $A'$ that begins with arm 1 and gives at least as great a value as $A$. Let $k(j,t)$ denote the (random) time that $A$ uses arm $j$ for the $t$th time, $j = 1,\ldots,k$, $t = 1, 2, \ldots$, with the understanding that $k(j,t) = \infty$ if arm $j$ is used less than $t$ times. The value of $A$ may then be written

$\quad V(A) = E\Bigl\{\sum_{t=1}^\infty \beta^{t-1} Z(t) \Bigm| A\Bigr\} = E \sum_{j=1}^k \sum_{t=1}^\infty \beta^{k(j,t)-1} X(j,t).$

Let $\bar N$ denote the stopping rule that achieves the supremum in

$\quad \Lambda_1 = \sup_{N \ge 1} E\Bigl(\sum_{t=1}^N \beta^{t-1} X(1,t)\Bigr)\Big/E\Bigl(\sum_{t=1}^N \beta^{t-1}\Bigr),$
and let $T$ denote the (random) time that $A$ uses arm 1 for the $\bar N$th time, $T = k(1, \bar N)$. Define the decision rule $A'$ as follows: (a) use arm 1 at times $1, 2, \ldots, \bar N$, and then, if $\bar N < \infty$, (b) use the arms $j \ne 1$ at times $\bar N + 1,\ldots,T$ in the same order as given by $A$, and then, if $T < \infty$, (c) continue according to $A$ from time $T$ on.

Let $k'(j,t)$ denote the time when $A'$ uses arm $j$ for the $t$th time, so that $k'(1,t) = t$ for $t = 1,\ldots,\bar N$. Finally, let $m(j)$ denote the number of times that arm $j$ is used by time $T$, so that $m(1) = \bar N$. Then,

(12) $\quad V(A') - V(A) = E \sum_{j=1}^k \sum_{t=1}^{m(j)} \bigl(\beta^{k'(j,t)-1} - \beta^{k(j,t)-1}\bigr) X(j,t)$
$\qquad = E \sum_{t=1}^{\bar N} \bigl(\beta^{t-1} - \beta^{k(1,t)-1}\bigr) X(1,t) - E \sum_{j=2}^k \sum_{t=1}^{m(j)} \beta^{t-1}\bigl(\beta^{k(j,t)-t} - \beta^{k'(j,t)-t}\bigr) X(j,t)$
$\qquad = E \sum_{t=1}^{\bar N} \xi(t) \beta^{t-1} X(1,t) - \sum_{j=2}^k E \sum_{t=1}^{m(j)} \alpha(j,t) \beta^{t-1} X(j,t),$

where $\xi(t) = 1 - \beta^{k(1,t)-t}$ and $\alpha(j,t) = \beta^{k(j,t)-t} - \beta^{k'(j,t)-t}$.

Since $k(1,t) - t$ represents the number of times that an arm other than arm 1 has been pulled by the time the $t$th pull of arm 1 occurs, we have that $k(1,t) - t$ is $\mathcal{F}(1, t-1)$-measurable and nondecreasing in $t$ a.s., so that $\xi(t)$ is $\mathcal{F}(1, t-1)$-measurable and $0 \le \xi(1) \le \xi(2) \le \cdots \le 1$ a.s. Thus from Lemma 3, we have

(13) $\quad E \sum_{t=1}^{\bar N} \xi(t) \beta^{t-1} X(1,t) \ge \Lambda_1\, E \sum_{t=1}^{\bar N} \xi(t) \beta^{t-1}.$

For $j > 1$, $\alpha(j,t) = \beta^{k(j,t)-t}\bigl(1 - \beta^{k'(j,t)-k(j,t)}\bigr)$. Since $k(j,t) - t$ is $\mathcal{F}(j, t-1)$-measurable and nondecreasing, and since $k'(j,t) - k(j,t)$ is equal to $\bar N$ minus the number of times arm 1 is pulled before the $t$th pull of arm $j$, and hence is $\mathcal{F}(j, t-1)$-measurable and nonincreasing, we find that $\alpha(j,t)$ is $\mathcal{F}(j, t-1)$-measurable and $\alpha(j,1) \ge \alpha(j,2) \ge \cdots \ge 0$. Hence from Lemma 2, we have for $j = 2,\ldots,k$,

(14) $\quad E \sum_{t=1}^{m(j)} \alpha(j,t) \beta^{t-1} X(j,t) \le \Lambda_j\, E \sum_{t=1}^{m(j)} \alpha(j,t) \beta^{t-1}.$
Combining (13) and (14) into (12), and recalling that $\Lambda_j \le \Lambda_1$ for all $j > 1$, we find

$\quad V(A') - V(A) \ge \Lambda_1\, E \sum_{j=1}^k \sum_{t=1}^{m(j)} \bigl(\beta^{k'(j,t)-1} - \beta^{k(j,t)-1}\bigr).$

This last expectation is zero, since it is just $V(A') - V(A)$ with all payoffs put equal to 1.
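To see Theorems 3 and 4 in action in the degenerate case, the following sketch of our own (the specific reward sequences are arbitrary assumptions, and the index function is restated so the block runs on its own) plays the index rule against every possible interleaving of two known arms. As the interchange argument promises, pulling the arm whose remaining sequence has the higher index at each stage recovers the brute-force optimum.

```python
from itertools import combinations

def det_index(x, beta):
    """Gittins index of a known reward sequence that is zero after its last term."""
    num = den = 0.0
    disc, best = 1.0, float("-inf")
    for xj in x:
        num += disc * xj
        den += disc
        disc *= beta
        best = max(best, num / den)
    return best

def value(order, arms, beta):
    """Discounted return when pulls are interleaved in the given order of arm
    indices; an exhausted arm yields 0."""
    pos = [0] * len(arms)
    total, disc = 0.0, 1.0
    for j in order:
        if pos[j] < len(arms[j]):
            total += disc * arms[j][pos[j]]
            pos[j] += 1
        disc *= beta
    return total

def index_rule(arms, beta):
    """At each stage, pull an arm whose remaining sequence has the highest index."""
    pos = [0] * len(arms)
    order = []
    for _ in range(sum(len(a) for a in arms)):
        j = max(range(len(arms)),
                key=lambda i: det_index(arms[i][pos[i]:] or [0.0], beta))
        order.append(j)
        pos[j] += 1
    return tuple(order)

beta = 0.9
arms = [[0.0, 1.0, 1.0], [0.5, 0.5]]
g = index_rule(arms, beta)
# Brute force over all interleavings of three pulls of arm 0 with two of arm 1.
orders = [tuple(0 if i in c else 1 for i in range(5)) for c in combinations(range(5), 3)]
best = max(orders, key=lambda o: value(o, arms, beta))
print(g, round(value(g, arms, beta), 5))        # (0, 0, 0, 1, 1) 2.40255
print(best, round(value(best, arms, beta), 5))  # the brute-force optimum agrees
```

Note how the rule persists with arm 0 through its unpromising first pull: the index of the remaining sequence, not the next reward, governs the choice, which is the content of Lemma 1's discussion.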
