Beat the Mean Bandit


Yisong Yue
H. John Heinz III College, Carnegie Mellon University, Pittsburgh, PA, USA

Thorsten Joachims
Department of Computer Science, Cornell University, Ithaca, NY, USA

Appearing in Proceedings of the 28th International Conference on Machine Learning, Bellevue, WA, USA, 2011. Copyright 2011 by the author(s)/owner(s).

Abstract

The Dueling Bandits Problem is an online learning framework in which actions are restricted to noisy comparisons between pairs of strategies (also called bandits). It models settings where absolute rewards are difficult to elicit but pairwise preferences are readily available. In this paper, we extend the Dueling Bandits Problem to a relaxed setting where preference magnitudes can violate transitivity. We present the first algorithm for this more general Dueling Bandits Problem and provide theoretical guarantees in both the online and the PAC settings. We also show that the new algorithm has stronger guarantees than existing results even in the original Dueling Bandits Problem, which we validate empirically.

1. Introduction

Online learning approaches have become increasingly popular for modeling recommendation systems that learn from user feedback. Unfortunately, conventional online learning methods assume that absolute rewards (e.g. "rate A from 1 to 5") are observable and reliable, which is not the case in search engines and other systems that have access only to implicit feedback (e.g. clicks) (Radlinski et al., 2008). However, for search engines there exist reliable methods for inferring preference feedback (e.g. "is A better than B?") from clicks (Radlinski et al., 2008). This motivates the K-armed Dueling Bandits Problem (Yue et al., 2009), which formalizes the problem of online learning with preference feedback instead of absolute rewards.

One major limitation of existing algorithms for the Dueling Bandits Problem is the assumption that user preferences satisfy strong transitivity.
For example, for strategies A, B and C, if users prefer A to B by 55%, and B to C by 60%, then strong transitivity requires that users prefer A to C by at least 60%. Such requirements are often violated in practice (see Section 3.1).

In this paper, we extend the K-armed Dueling Bandits Problem to a relaxed setting where stochastic preferences can violate strong transitivity. We present a new algorithm, called Beat-the-Mean, with theoretical guarantees that are not only stronger than previous results for the original setting, but also degrade gracefully with the degree of transitivity violation. We empirically validate our findings and observe that the new algorithm is indeed more robust, and that it has orders-of-magnitude lower variability. Finally, we show that the new algorithm also has PAC-style guarantees for the Dueling Bandits Problem.

2. Related Work

Conventional multi-armed bandit problems have been well studied in both the online (Lai & Robbins, 1985; Auer et al., 2002) and PAC (Mannor & Tsitsiklis, 2004; Even-Dar et al., 2006; Kalyanakrishnan & Stone, 2010) settings. These settings differ from ours primarily in that feedback is measured on an absolute scale.

Methods that learn using noisy pairwise comparisons include active learning approaches (Radlinski & Joachims, 2007) and algorithms for finding the maximum element (Feige et al., 1994). The latter setting is similar to our PAC setting, but requires a common stochastic model for all comparisons. In contrast, our analysis explicitly accounts for not only that different pairs of items yield different stochastic preferences, but also that these preferences might not be internally consistent (i.e. violate strong transitivity).

Yue & Joachims (2009) considered a continuous version of the Dueling Bandits Problem, where bandits are represented as high dimensional points, and derivatives are estimated by comparing two bandits. More recently, Agarwal et al. (2010) proposed a near-optimal multi-point algorithm for the conventional continuous bandit setting, which may be adaptable to the Dueling Bandits Problem.

Our proposed algorithm is structurally similar to the Successive Elimination algorithm proposed by Even-Dar et al. (2006) for the conventional PAC bandit setting. Our theoretical analysis differs significantly due to having to deal with pairwise comparisons.

Table 1. $\Pr(\text{Row} > \text{Col}) - 1/2$ as estimated from interleaving experiments with six retrieval functions (A to F) on ArXiv.org. (The table entries were not preserved in this transcription.)

3. The Learning Problem

The K-armed Dueling Bandits Problem (Yue et al., 2009) is an iterative learning problem on a set of bandits $B = \{b_1, \ldots, b_K\}$ (also called arms or strategies). Each iteration comprises a noisy comparison (duel) between two bandits (possibly the same bandit with itself). We assume the comparison outcomes to have independent and time-stationary distributions.

We write the comparison probabilities as $P(b > b') = \epsilon(b, b') + 1/2$, where $\epsilon(b, b') \in (-1/2, 1/2)$ represents the distinguishability between $b$ and $b'$. We assume there exists a total ordering $\succ$ such that $b \succ b' \Leftrightarrow \epsilon(b, b') > 0$. We also use the notation $\epsilon_{i,j} \equiv \epsilon(b_i, b_j)$. Note that $\epsilon(b, b') = -\epsilon(b', b)$ and $\epsilon(b, b) = 0$. For ease of analysis, we also assume WLOG that the bandits are indexed in preferential order $b_1 \succ b_2 \succ \ldots \succ b_K$.

Online Setting. In the online setting, algorithms are evaluated "on the fly" during every iteration. Let $(b_1^{(t)}, b_2^{(t)})$ be the bandits chosen at iteration $t$. Let $T$ be the time horizon. We quantify performance using the following notion of regret,

$$R_T = \sum_{t=1}^{T} \frac{\epsilon(b_1, b_1^{(t)}) + \epsilon(b_1, b_2^{(t)})}{2}. \tag{1}$$

In search applications, (1) reflects the fraction of users who would have preferred $b_1$ over $b_1^{(t)}$ and $b_2^{(t)}$.

PAC Setting. In the PAC setting, the goal is to confidently find a near-optimal bandit. More precisely, an $(\varepsilon, \delta)$-PAC algorithm will find a bandit $\hat{b}$ such that $P(\epsilon(b_1, \hat{b}) > \varepsilon) \le \delta$.
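To make the regret definition concrete, the following is a minimal sketch that evaluates the regret of a fixed sequence of duels. It assumes the per-duel regret is the average of the two gaps to the best bandit (the form used in the proof of Theorem 1), indexes the best bandit as 0, and uses a hypothetical constant-gap preference matrix; the names `constant_gap_eps` and `regret` are illustrative and not from the paper.

```python
def constant_gap_eps(K, gap=0.1):
    """Hypothetical preference matrix: eps[i][j] = P(b_i beats b_j) - 1/2.

    Bandit 0 is the best; every better bandit beats every worse one by `gap`.
    """
    return [[gap if i < j else (-gap if i > j else 0.0) for j in range(K)]
            for i in range(K)]


def regret(eps, duels):
    """Sum over duels (i, j) of the average gap between the best bandit and i, j."""
    return sum((eps[0][i] + eps[0][j]) / 2 for i, j in duels)


if __name__ == "__main__":
    eps = constant_gap_eps(3)
    # Dueling the best bandit against itself costs nothing; any other pair does.
    print(regret(eps, [(0, 0), (0, 1), (1, 2)]))
```

Under this definition only the duel of the best bandit against itself is free, which is why an algorithm that commits to the best bandit stops accumulating regret.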
Efficiency is measured via the sample complexity, i.e. the total number of comparisons required. Note that sample complexity penalizes each comparison equally, whereas the regret (1) of a comparison depends on the bandits being compared.

3.1. Modeling Assumptions

Previous work (Yue et al., 2009) relied on two properties, called stochastic triangle inequality and strong stochastic transitivity. In this paper, we assume a relaxed version of strong stochastic transitivity that more accurately characterizes real-world user preferences. Note that we only require the two properties below to be defined relative to the best bandit $b_1$.

Relaxed Stochastic Transitivity. For any triplet of bandits $b_1 \succ b_j \succ b_k$ and some $\gamma \ge 1$, we assume $\gamma \epsilon_{1,k} \ge \max\{\epsilon_{1,j}, \epsilon_{j,k}\}$. This can be viewed as a monotonicity or internal consistency property of user preferences. Strong stochastic transitivity, considered in (Yue et al., 2009), is the special case where $\gamma = 1$.

Stochastic Triangle Inequality. For any triplet of bandits $b_1 \succ b_j \succ b_k$, we assume $\epsilon_{1,k} \le \epsilon_{1,j} + \epsilon_{j,k}$. This can be viewed as a diminishing returns property.

To understand why relaxed stochastic transitivity is important, consider Table 1, which describes preferences elicited from pairwise interleaving experiments (Radlinski et al., 2008) using six retrieval functions in the full-text search engine of ArXiv.org. We see that user preferences obey a total ordering $A \succ B \succ \ldots \succ F$, and satisfy relaxed stochastic transitivity for $\gamma = 1.5$ (due to A, B, D) as well as stochastic triangle inequality. A good algorithm should have guarantees that degrade smoothly as $\gamma$ increases.

4. Algorithm and Analysis

Our algorithm, called Beat-the-Mean, is described in Algorithm 1. The online and PAC settings require different input parameters, and those are specified in Algorithm 2 and Algorithm 3, respectively. Beat-the-Mean proceeds in a sequence of rounds, and maintains a working set $W_l$ of active bandits during each round $l$.
For each active bandit $b_i \in W_l$, an empirical estimate $\hat{P}_i$ (Line 6) is maintained for how often $b_i$ beats the mean bandit $\bar{b}_l$ of $W_l$, where comparing $b_i$ with $\bar{b}_l$ is functionally identical to comparing $b_i$ with a bandit sampled uniformly from $W_l$ (Line 12). In each iteration, a bandit with the fewest recorded comparisons is selected to compare with $\bar{b}_l$ (Line 11). Whenever the empirically worst bandit $b'$ is separated from the empirically best one by a sufficient confidence margin (Line 16), then the round ends, all recorded comparisons involving $b'$ are removed (Line 18), and $b'$ is removed from $W_l$ (Line 19). Afterwards, each remaining $\hat{P}_i$ is again an unbiased estimate of $b_i$ versus the mean bandit $\bar{b}_{l+1}$ of the new $W_{l+1}$. The algorithm terminates when only one active bandit remains, or when another termination condition is met (Line 10).

Footnote 2 (on the stochastic triangle inequality). Our results can be extended to the relaxed case where $\epsilon_{1,k} \le \lambda(\epsilon_{1,j} + \epsilon_{j,k})$ for $\lambda \ge 1$. However, we focus on strong transitivity since it is far more easily violated in practice.

Algorithm 1 Beat-the-Mean
 1: Input: $B = \{b_1, \ldots, b_K\}$, $N$, $T$, $c_{\delta,\gamma}(\cdot)$
 2: $W_1 \leftarrow \{b_1, \ldots, b_K\}$ //working set of active bandits
 3: $l \leftarrow 1$ //num rounds
 4: $\forall b \in W_l$, $n_b \leftarrow 0$ //num comparisons
 5: $\forall b \in W_l$, $w_b \leftarrow 0$ //num wins
 6: $\forall b \in W_l$, $\hat{P}_b \equiv w_b / n_b$, or $1/2$ if $n_b = 0$
 7: $n^* \equiv \min_{b \in W_l} n_b$
 8: $c^* \equiv c_{\delta,\gamma}(n^*)$, or $1$ if $n^* = 0$ //confidence radius
 9: $t \leftarrow 0$ //total number of iterations
10: while $|W_l| > 1$ and $t < T$ and $n^* < N$ do
11:   $b \leftarrow \arg\min_{b \in W_l} n_b$ //break ties randomly
12:   select $b' \in W_l$ at random, compare $b$ vs $b'$
13:   if $b$ wins, $w_b \leftarrow w_b + 1$
14:   $n_b \leftarrow n_b + 1$
15:   $t \leftarrow t + 1$
16:   if $\min_{b' \in W_l} \hat{P}_{b'} + c^* \le \max_{b \in W_l} \hat{P}_b - c^*$ then
17:     $b' \leftarrow \arg\min_{b \in W_l} \hat{P}_b$
18:     $\forall b \in W_l$, delete comparisons with $b'$ from $w_b$, $n_b$
19:     $W_{l+1} \leftarrow W_l \setminus \{b'\}$ //update working set
20:     $l \leftarrow l + 1$ //new round
21:   end if
22: end while
23: return $\arg\max_{b \in W_l} \hat{P}_b$

Algorithm 2 Beat-the-Mean (Online)
 1: Input: $B = \{b_1, \ldots, b_K\}$, $\gamma$, $T$
 2: $\delta \leftarrow 1/(2TK)$
 3: Define $c_{\delta,\gamma}(\cdot)$ using (4)
 4: $\hat{b} \leftarrow$ Beat-the-Mean($B$, $\infty$, $T$, $c_{\delta,\gamma}$)

Algorithm 3 Beat-the-Mean (PAC)
 1: Input: $B = \{b_1, \ldots, b_K\}$, $\gamma$, $\varepsilon$, $\delta$
 2: Define $N$ using (8)
 3: Define $c_{\delta,\gamma}(\cdot)$ using (7)
 4: $\hat{b} \leftarrow$ Beat-the-Mean($B$, $N$, $\infty$, $c_{\delta,\gamma}$)

Notation and terminology. We call a round all the contiguous comparisons until a bandit is removed. We say $b_i$ defeats $b_j$ if $b_i$ and $b_j$ have the highest and lowest empirical means, respectively, and that the difference is sufficiently large (Line 16). Our algorithm makes a mistake whenever it removes the best bandit $b_1$ from any $W_l$. We will use the shorthand

$$\hat{P}_{i,j,n} \equiv \hat{P}_{i,n} - \hat{P}_{j,n}, \tag{2}$$

where $\hat{P}_{i,n}$ refers to the empirical estimate of $b_i$ versus the mean bandit $\bar{b}_l$ after $n$ comparisons (we often suppress $l$ for brevity). We call the empirically best and empirically worst bandits to be the ones with highest and lowest $\hat{P}_{i,n}$, respectively. We call the best and worst bandits in $W_l$ to be $\arg\min_{b_i \in W_l} i$ and $\arg\max_{b_i \in W_l} i$, respectively. We define the expected performance of any active bandit $b_i \in W_l$ to be

$$E[\hat{P}_i] = \frac{1}{|W_l|} \sum_{b' \in W_l} P(b_i > b'). \tag{3}$$

For clarity of presentation, all proofs are contained in the appendix. We begin by stating two observations.

Observation 1. Let $b_k$ be the worst bandit in $W_l$, and let $b_1 \in W_l$. Then $E[\hat{P}_{1,k,n}] \ge \epsilon_{1,k}$.

Observation 2. Let $b_k$ be the worst bandit in $W_l$, and let $b_1 \in W_l$. Then $\forall b_j \in W_l$: $E[\hat{P}_{j,k,n}] \le 2\gamma^2 \epsilon_{1,k}$.

Observation 1 implies a margin between the expected performance of $b_1$ and the worst bandit in $W_l$. This will be used to bound the comparisons required in each round. Due to relaxed stochastic transitivity, $b_1$ may not have the best expected performance (footnote 3). Observation 2 bounds the difference in expected performance between any bandit and the worst bandit in $W_l$. This will be used to derive the appropriate confidence intervals so that $b_1 \in W_{l+1}$ with sufficient probability.

Footnote 3. For example, the second best bandit may lose slightly to $b_1$ but be strongly preferred versus the other bandits.

4.1. Online Setting

We take an "explore then exploit" approach for the online setting, similar to (Yue et al., 2009). For time horizon $T$, relaxed transitivity parameter $\gamma$, and bandits $B = \{b_1, \ldots, b_K\}$, we use Beat-the-Mean in the explore phase (see Algorithm 2). Let $\hat{b}$ denote the bandit returned by Beat-the-Mean. We then enter an exploit phase by repeatedly choosing $(b_1^{(t)}, b_2^{(t)}) = (\hat{b}, \hat{b})$ until reaching $T$ total comparisons. Comparisons in the exploit phase incur no regret assuming $\hat{b} = b_1$.

We use the following confidence interval $c(\cdot)$ (footnote 4),

$$c_{\delta,\gamma}(n) = 3\gamma^2 \sqrt{\frac{1}{n} \log \frac{1}{\delta}}, \tag{4}$$

where $\delta = 1/(2TK)$. We do not use the last remaining input $N$ to Beat-the-Mean (i.e., we set $N = \infty$); it is used only in the PAC setting.

Footnote 4. When $\gamma = 1$, we can use a tighter confidence interval (9).
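The following is a minimal Python sketch of how Algorithm 1 could be implemented, assuming the online confidence radius (4). It is an illustration under stated assumptions and not the authors' code: the `duel(i, j)` oracle, the 0-based bandit indices, the per-pair bookkeeping arrays, and the function names are all hypothetical choices made here. Per-pair counts are kept so that comparisons against a removed bandit can be deleted (Line 18).

```python
import math
import random


def confidence_radius(n, delta, gamma):
    """Radius 3 * gamma^2 * sqrt(log(1/delta) / n), or 1 when n == 0 (Line 8)."""
    if n == 0:
        return 1.0
    return 3.0 * gamma ** 2 * math.sqrt(math.log(1.0 / delta) / n)


def beat_the_mean(K, duel, T, delta, gamma, N=math.inf, rng=random):
    """Sketch of Algorithm 1 on bandits 0..K-1; returns (winner, iterations used).

    duel(i, j) is a caller-supplied oracle (an assumption of this sketch) that
    returns True when bandit i wins one noisy comparison against bandit j.
    """
    active = list(range(K))
    pair_wins = [[0] * K for _ in range(K)]  # wins of b against each opponent
    pair_cnt = [[0] * K for _ in range(K)]   # comparisons of b against each opponent
    wins = [0] * K                           # w_b over currently active opponents
    cnt = [0] * K                            # n_b over currently active opponents
    t = 0

    def p_hat(b):
        # Empirical estimate of b beating the mean bandit; 1/2 if no data yet.
        return wins[b] / cnt[b] if cnt[b] else 0.5

    while len(active) > 1 and t < T and min(cnt[b] for b in active) < N:
        fewest = min(cnt[b] for b in active)
        b = rng.choice([a for a in active if cnt[a] == fewest])  # Line 11
        opponent = rng.choice(active)                            # Line 12
        won = int(duel(b, opponent))
        pair_wins[b][opponent] += won
        pair_cnt[b][opponent] += 1
        wins[b] += won
        cnt[b] += 1
        t += 1

        c = confidence_radius(min(cnt[a] for a in active), delta, gamma)
        worst = min(active, key=p_hat)
        best = max(active, key=p_hat)
        if p_hat(worst) + c <= p_hat(best) - c:                  # Line 16
            active.remove(worst)                                 # Line 19
            for a in active:                                     # Line 18
                wins[a] -= pair_wins[a][worst]
                cnt[a] -= pair_cnt[a][worst]

    return max(active, key=p_hat), t
```

In the online variant one would call this with `N` left at infinity and `delta = 1 / (2 * T * K)`; the PAC variant would instead pass a finite `N` and an infinite `T`, with the radius replaced by the one defined for that setting.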
We will show that Beat-the-Mean correctly returns the best bandit w.p. at least $1 - 1/T$. Correspondingly, a suboptimal bandit is returned with probability at most $1/T$, in which case we assume maximal regret $O(T)$. We can thus bound the expected regret by

$$E[R_T] \le (1 - 1/T)\, E[R_T^{BtM}] + (1/T)\, O(T) = O\left(E[R_T^{BtM}] + 1\right), \tag{5}$$

where $R_T^{BtM}$ denotes the regret incurred from running Beat-the-Mean. Thus the regret bound depends entirely on the regret incurred by Beat-the-Mean.

Theorem 1. For $T \ge K$, Beat-the-Mean makes a mistake with probability at most $1/T$, or otherwise returns the best bandit $b_1 \in B$ and accumulates online regret (1) that is bounded with high probability by

$$O\left(\sum_{l=1}^{K-1} \min\left\{\frac{\gamma^7}{\epsilon_l}, \frac{\gamma^5 \epsilon_l}{\epsilon_*^2}\right\} \log T\right) = O\left(\frac{\gamma^7 K}{\epsilon_*} \log T\right), \tag{6}$$

where $\epsilon_l = \epsilon_{1,k}$ if $b_k$ is the worst remaining bandit in round $l$, and $\epsilon_* = \min\{\epsilon_{1,2}, \ldots, \epsilon_{1,K}\}$.

Corollary 1. For $T \ge K$, mistake-free executions of Beat-the-Mean accumulate online regret that is bounded with high probability by

$$O\left(\sum_{k=2}^{K} \frac{\gamma^8}{\epsilon_{1,k}} \log T\right).$$

Our algorithm improves on the previously proposed Interleaved Filter (IF) algorithm (Yue et al., 2009) in two ways. First, (6) applies when $\gamma > 1$ (footnote 5), whereas the bound for IF does not. Second, while (6) matches the expected regret bound for IF when $\gamma = 1$, ours is a high probability bound. Our experiments show that IF can accumulate large regret even when strong stochastic transitivity is slightly violated (e.g. $\gamma = 1.5$ as in Table 1), and has high variance when $\gamma = 1$. Beat-the-Mean exhibits neither drawback.

Footnote 5. In practice, $\gamma$ is typically close to 1, making the somewhat poor dependence on $\gamma$ less of a concern. This poor dependence seems primarily due to the bound of Observation 2 often being quite loose. However, it is unclear if one can do better in the general worst case.

Figure 1. Comparing regret of Beat-the-Mean (light) and Interleaved Filter (dark) when $\gamma = 1$. Both panels plot regret against the number of bandits. For the top graph, each $\epsilon_{i,j} = 0.1$ where $b_i \succ b_j$. For the bottom graph, each $\epsilon_{i,j} = 1/(1 + \exp(\mu_j - \mu_i)) - 0.5$, where each $\mu_i \sim N(0, 1)$. Error bars indicate one standard deviation.

4.2. PAC Setting

When running Beat-the-Mean in the PAC setting, one of two things can happen. In the first case, the active set $W_l$ is reduced to a single bandit $\hat{b}$, in which case we will prove that $\hat{b} = b_1$ with sufficient probability. In the second case, the algorithm terminates when the number of comparisons recorded for each remaining bandit is at least $N$ defined in (8) below, in which case we prove that every remaining bandit is within $\varepsilon$ of $b_1$ with sufficient probability. It suffices to focus on the second case when analyzing sample complexity.

The input parameters are described in Algorithm 3. We use the following confidence interval $c(\cdot)$,

$$c_{\delta,\gamma}(n) = 3\gamma^2 \sqrt{\frac{1}{n} \log \frac{K^3 N}{\delta}}, \tag{7}$$

where $N$ is the smallest positive integer such that

$$N = \left\lceil \frac{36\gamma^6}{\varepsilon^2} \log \frac{K^3 N}{\delta} \right\rceil. \tag{8}$$

Note that there are at most $K^2 N$ total time steps, since there are at most $K$ rounds with at most $KN$ comparisons removed after each round (footnote 6). We do not use the last remaining input $T$ to Beat-the-Mean (i.e., we set $T = \infty$); it is used only in the online setting.

Footnote 6. $K^2 N$ is a trivial bound on the sample complexity of Alg. 3. Thm. 2 gives a high probability bound of $O(KN)$.

Theorem 2. Beat-the-Mean in Algorithm 3 is an $(\varepsilon, \delta)$-PAC algorithm with sample complexity

$$O(KN) = O\left(\frac{K\gamma^6}{\varepsilon^2} \log \frac{KN}{\delta}\right).$$

5. Evaluating Online Regret

As mentioned earlier, in the online setting, the theoretical guarantees of Beat-the-Mean offer two advantages over the previously proposed Interleaved Filter (IF) algorithm (Yue et al., 2009) by (A) giving a high probability exploration bound versus one in expectation (and thus ensuring low variance), and (B) being provably robust to relaxations of strong transitivity ($\gamma > 1$). We now evaluate these cases empirically.

Figure 2. Comparing regret of Beat-the-Mean (light) and Interleaved Filter (dark) when $\gamma > 1$. For the top graph (regret against the number of bandits), we fix $\gamma = 1.3$ and vary $K$. For the bottom graph (regret against $\gamma$), we fix $K = 500$ and vary $\gamma$. Error bars indicate one standard deviation.

IF maintains a candidate bandit, and plays the candidate against the remaining bandits until one is confidently superior. Thus, the regret of IF heavily depends on the initial candidate, resulting in increased variance. When strong transitivity is violated ($\gamma > 1$), one can create scenarios where $b_i$ is more likely to defeat $b_{i+1}$ than any other bandit. Thus, IF will sift through $O(K)$ candidates and suffer $O(K^2)$ regret in the worst case. Beat-the-Mean avoids this issue by playing every bandit against the mean bandit.

We evaluate using simulations, where we construct stochastic preferences $\epsilon_{i,j}$ that are unknown to the algorithms. Since the guarantees of Beat-the-Mean and IF differ w.r.t. the number of bandits $K$, we fix $T = 10^{10}$ and vary $K$. We first evaluate Case (A), where strong transitivity holds ($\gamma = 1$). In this setting, we will use the following confidence interval,

$$c_{\delta,\gamma}(n) = \sqrt{\frac{1}{n} \log \frac{1}{\delta}}, \tag{9}$$

where $\delta = 1/(2TK)$. This is tighter than (4), leading to a more efficient algorithm that still retains the correctness guarantees when $\gamma = 1$ (footnote 7).

Footnote 7. The key observation is that cases (b) and (c) in Lemma 1 do not apply when $\gamma = 1$. The confidence interval used by IF (Yue et al., 2009) is also equivalent to (9).

We evaluate two settings for Case (A), with the results presented in Figure 1. The first setting (Figure 1 top) defines $\epsilon_{i,j} = 0.1$ for all $b_i \succ b_j$. The second setting (Figure 1 bottom) defines for each $b_i$ a utility $\mu_i \sim N(0, 1)$ drawn i.i.d. from a unit normal distribution.
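The utility-based setting can be simulated with a few lines of Python. The sketch below assumes the logistic form given in the Figure 1 caption, $\epsilon_{i,j} = 1/(1 + \exp(\mu_j - \mu_i)) - 0.5$, and sorts the sampled utilities so that index 0 is the best bandit (an indexing convenience, not something the paper specifies). The function names are illustrative.

```python
import math
import random


def logistic_preferences(K, rng):
    """Draw K unit-normal utilities and return eps[i][j] = P(b_i beats b_j) - 1/2.

    Utilities are sorted in decreasing order so that bandit 0 is the best,
    mirroring the paper's convention of indexing bandits in preferential order.
    """
    mu = sorted((rng.gauss(0.0, 1.0) for _ in range(K)), reverse=True)
    return [[1.0 / (1.0 + math.exp(mu[j] - mu[i])) - 0.5 for j in range(K)]
            for i in range(K)]


def make_duel(eps, rng):
    """Return a noisy comparison oracle duel(i, j) for the given preference matrix."""
    def duel(i, j):
        return rng.random() < 0.5 + eps[i][j]
    return duel


if __name__ == "__main__":
    rng = random.Random(0)
    eps = logistic_preferences(5, rng)
    duel = make_duel(eps, rng)
    # Empirical frequency of the best bandit beating the worst one.
    print(sum(duel(0, 4) for _ in range(1000)) / 1000)
```

A logistic model of this kind is antisymmetric by construction, so the resulting matrix satisfies $\epsilon_{i,j} = -\epsilon_{j,i}$ and $\epsilon_{i,i} = 0$ as the problem definition requires.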
Stochastic preferences are defined using a logistic model, i.e. $\epsilon_{i,j} = 1/(1 + \exp(\mu_j - \mu_i)) - 0.5$. For both settings, we run 100 trials and observe the average regret of both methods to increase linearly with $K$, but the regret of IF shows much higher variance, which matches the theoretical results.

We next evaluate Case (B), where stochastic preferences only satisfy relaxed transitivity. For any $b_i \succ b_j$, we define $\epsilon_{i,j} = 0.1\gamma$ if $1 < i = j - 1$, or $\epsilon_{i,j} = 0.1$ otherwise. Figure 2 top shows the results for $\gamma = 1.3$. We observe similar behavior from Beat-the-Mean as in Case (A) (footnote 8), but IF suffers super-linear regret (and is thus not robust) with significantly higher variance.

Footnote 8. The regret is larger than in the analogous setting in Case (A) due to the use of wider confidence intervals.

For completeness, Figure 2 bottom shows how regret changes as $\gamma$ is varied for fixed $K = 500$. We observe a phase transition near $\gamma = 1.3$ where IF begins to suffer super-linear regret (w.r.t. $K$). As $\gamma$ increases further, each round in IF also shortens due to each $\epsilon_{i,i+1}$ increasing, causing the regret of IF to decrease slightly (for fixed $K$). Beat-the-Mean uses confidence intervals (4) that grow as $\gamma$ increases, causing it to require more comparisons for each bandit elimination. It is possible that Beat-the-Mean can still behave correctly (i.e. be mistake-free) in practice while using tighter confidence intervals (e.g. (9)).

6. Conclusion

We have presented an algorithm for the Dueling Bandits Problem with high probability exploration bounds for the online and PAC settings. The performance guarantees of our algorithm degrade gracefully as one relaxes the strong stochastic transitivity property, which is a property often violated in practice. Empirical evaluations confirm the advantages of our theoretical guarantees over previous results.

Acknowledgements. This work was funded in part by an NSF IIS award.

A. Extended Analysis

Proof of Observation 1. We can write $E[\hat{P}_{1,k,n}]$ (2) as

$$E[\hat{P}_{1,k,n}] = \frac{1}{|W_l|}\left(2\epsilon_{1,k} + \sum_{b_j \in W_l \setminus \{b_1, b_k\}} (\epsilon_{1,j} - \epsilon_{k,j})\right).$$

Each $b_j$ in the summation above satisfies $b_1 \succ b_j \succ b_k$. Thus, $\epsilon_{1,j} - \epsilon_{k,j} = \epsilon_{1,j} + \epsilon_{j,k} \ge \epsilon_{1,k}$, due to stochastic triangle inequality, implying $E[\hat{P}_{1,k,n}] \ge \epsilon_{1,k}$.

Proof of Observation 2. We focus on the nontrivial case where $b_j \ne b_k, b_1$. Combining (3) and (2) yields

$$E[\hat{P}_{j,k,n}] = \frac{1}{|W_l|}\left(2\epsilon_{j,k} + \sum_{b_h \in W_l \setminus \{b_j, b_k\}} (\epsilon_{j,h} - \epsilon_{k,h})\right).$$

We know that $\epsilon_{j,k} \le \gamma\epsilon_{1,k}$ from relaxed transitivity. For each $b_h$ in the above summation, either (a) $b_h \succ b_j \succ b_k$, or (b) $b_j \succ b_h \succ b_k$. In case (a) we have $\epsilon_{j,h} - \epsilon_{k,h} = \epsilon_{j,h} + \epsilon_{h,k} \le \epsilon_{h,k} \le \gamma\epsilon_{1,k}$, since $\epsilon_{j,h} \le 0$. In case (b) we have $\epsilon_{j,h} + \epsilon_{h,k} \le \gamma(\epsilon_{1,h} + \epsilon_{1,k}) \le 2\gamma \max\{\epsilon_{1,h}, \epsilon_{1,k}\} \le 2\gamma^2 \epsilon_{1,k}$. This implies that $E[\hat{P}_{j,k,n}] \le 2\gamma^2 \epsilon_{1,k}$.

A.1. Online Setting

We assume here that Beat-the-Mean is run according to Alg. 2. We define $c_n \equiv c_{\delta,\gamma}(n)$ using (4).

Lemma 1. For $\delta = 1/(2TK)$, and assuming that $b_1 \in W_l$, the probability of $b_i \in W_l \setminus \{b_1\}$ defeating $b_1$ at the end of round $l$ (i.e., a mistake) is at most $1/(TK^2)$.

Proof. Having $b_i$ defeat $b_1$ requires that for some $n$, $\hat{P}_{1,n} + c_n < \hat{P}_{i,n} - c_n$, and also that $b_1$ is the first bandit to be defeated in the round. Since at any time, all remaining bandits have the same confidence interval size, then $\hat{P}_{1,n}$ must have the lowest empirical mean. In particular, this requires $\hat{P}_{1,n} \le \hat{P}_{k,n}$, where $b_k$ is the worst bandit in $W_l$. We will show that, for any $n$, the probability of making a mistake is at most $\delta^2 < 1/(T^2K^2)$. Thus, by the union bound, the probability of $b_i$ mistakenly defeating $b_1$ for any $n \le T$ is at most $T\delta^2 < 1/(TK^2)$.

We consider three sufficient cases:
(a) $E[\hat{P}_{i,n}] \le E[\hat{P}_{1,n}]$
(b) $E[\hat{P}_{i,n}] > E[\hat{P}_{1,n}]$ and $n < \frac{4}{\epsilon_{1,k}^2} \log(1/\delta)$
(c) $E[\hat{P}_{i,n}] > E[\hat{P}_{1,n}]$ and $n \ge \frac{4}{\epsilon_{1,k}^2} \log(1/\delta)$

In case (a) applying Hoeffding's inequality yields $P(\hat{P}_{1,n} + c_n < \hat{P}_{i,n} - c_n) \le \delta^2$. In cases (b) and (c), we have $b_i \ne b_k$ since Observation 1 implies $E[\hat{P}_{1,n}] \ge E[\hat{P}_{k,n}]$.
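In case (b) we have $c_n > (3/2)\gamma^2 \epsilon_{1,k}$, which implies via Hoeffding's inequality,

$$\begin{aligned}
P(\hat{P}_{i,n} - c_n > \hat{P}_{1,n} + c_n) &\le P(\hat{P}_{i,n} - c_n > \hat{P}_{k,n} + c_n) && (10)\\
&= P(\hat{P}_{i,k,n} - E[\hat{P}_{i,k,n}] > 2c_n - E[\hat{P}_{i,k,n}])\\
&\le P(\hat{P}_{i,k,n} - E[\hat{P}_{i,k,n}] > 2c_n - 2\gamma^2 \epsilon_{1,k}) && (11)\\
&\le P(\hat{P}_{i,k,n} - E[\hat{P}_{i,k,n}] > (2/3)c_n)\\
&\le \exp(-2\gamma^4 \log(1/\delta)) \le \delta^2,
\end{aligned}$$

where (10) follows from $\forall j : \epsilon_{1,j} \ge \epsilon_{k,j}$, and (11) follows from Observation 2. In case (c) we know that

$$\begin{aligned}
P(\hat{P}_{k,n} \ge \hat{P}_{1,n}) &= P(\hat{P}_{k,1,n} - E[\hat{P}_{k,1,n}] \ge -E[\hat{P}_{k,1,n}])\\
&\le P(\hat{P}_{k,1,n} - E[\hat{P}_{k,1,n}] \ge \epsilon_{1,k}) && (12)\\
&\le \exp(-n\epsilon_{1,k}^2/2) \le \delta^2.
\end{aligned}$$

Lemma 2. Beat-the-Mean makes a mistake with probability at most $1/T$.

Proof. By Lemma 1, the probability that $b_j \in W_l \setminus \{b_1\}$ defeats $b_1$ is at most $1/(TK^2)$. There are at most $K$ active bandits in any round and at most $K$ rounds. Applying the union bound proves the lemma.

Lemma 3. Let $\delta = 1/(2TK)$, $T \ge K$ and assume $b_1 \in W_l$. If $b_k$ is the worst bandit in $W_l$, then the number of comparisons each $b \in W_l$ needs to accumulate before some bandit is removed (thus ending the round) is with high probability bounded by

$$O\left(\frac{\gamma^4}{\epsilon_{1,k}^2} \log(TK)\right) = O\left(\frac{\gamma^4}{\epsilon_{1,k}^2} \log T\right).$$

Proof. It suffices to bound the comparisons $n$ required to remove $b_k$. We will show that for any $d \ge 1$, there exists an $m$ depending only on $d$ such that

$$P\left(n \ge m \frac{\gamma^4}{\epsilon_{1,k}^2} \log(TK)\right) \le \min\{K^{-d}, T^{-d}\}$$

for all $K$ and $T$ sufficiently large. We will focus on the sufficient condition of $b_1$ defeating $b_k$: if at any $t$ we have $\hat{P}_{1,t} - c_t > \hat{P}_{k,t} + c_t$, then $b_k$ is removed from $W_l$. It follows that for any $t$, if $n > t$, then $\hat{P}_{1,t} - c_t \le \hat{P}_{k,t} + c_t$, and so $P(n > t) \le P(\hat{P}_{1,t} - c_t \le \hat{P}_{k,t} + c_t)$. Note from Observation 1 that $E[\hat{P}_{1,k,t}] \ge \epsilon_{1,k}$. Thus,

$$\begin{aligned}
P(\hat{P}_{1,t} - c_t \le \hat{P}_{k,t} + c_t) &= P(E[\hat{P}_{1,k,t}] - \hat{P}_{1,k,t} \ge E[\hat{P}_{1,k,t}] - 2c_t)\\
&\le P(E[\hat{P}_{1,k,t}] - \hat{P}_{1,k,t} \ge \epsilon_{1,k} - 2c_t). && (13)
\end{aligned}$$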
For any $m \ge 18$ and $t \ge \lceil 8m\gamma^4 \log(TK)/\epsilon_{1,k}^2 \rceil$, we have $c_t \le \epsilon_{1,k}/4$, and so applying Hoeffding's inequality for this $m$ and $t$ shows that (13) is bounded by

$$P(E[\hat{P}_{1,k,t}] - \hat{P}_{1,k,t} \ge \epsilon_{1,k}/2) \le \exp(-t\epsilon_{1,k}^2/8).$$

Since $t \ge 8m\gamma^4 \log(TK)/\epsilon_{1,k}^2$ by assumption, we have $t\epsilon_{1,k}^2/8 \ge m \log(TK)$, and so

$$\exp(-t\epsilon_{1,k}^2/8) \le \exp(-m \log(TK)) = 1/(TK)^m,$$

which is bounded by $K^{-m}$ and $T^{-m}$. We finally note that for $T \ge K$, $O(\log(TK)) = O(\log T)$.

Lemma 4. Assume $b_1 \in W_{l'}$ for $l' \le l$. Then the number of comparisons removed at the end of round $l$ is bounded with high probability by

$$O\left(\min\left\{\frac{\gamma^4}{\epsilon_*^2}, \frac{\gamma^6}{\epsilon_l^2}\right\} \log T\right),$$

where $\epsilon_l = \epsilon_{1,k}$ if $b_k$ is the worst bandit in $W_l$, and $\epsilon_* = \min\{\epsilon_{1,2}, \ldots, \epsilon_{1,K}\}$.

Proof. Let $b_k$ be the worst bandit in $W_l$, and let $b_j \in W_l$ denote the bandit removed at the end of round $l$. By Lemma 3, the total number of comparisons that $b_j$ accumulates is bounded with high probability by

$$O\left(\frac{\gamma^4}{\epsilon_{1,k}^2} \log T\right) = O\left(\frac{\gamma^4}{\epsilon_l^2} \log T\right). \tag{14}$$

Some bandits may have accumulated more comparisons than (14), since the number of remaining comparisons from previous rounds may exceed (14). However, Lemma 3 implies the number of remaining comparisons is at most

$$O\left(\frac{\gamma^4}{\epsilon_{1,k'}^2} \log T\right),$$

where $\epsilon_{1,k'} = \min_{b_k \succ b_q} \epsilon_{1,q} \ge \epsilon_{1,k}/\gamma$. We can bound the number of comparisons accumulated at the end of round $l$ by $O(D_{l,\gamma,T})$, where

$$D_{l,\gamma,T} \equiv \min\left\{\frac{\gamma^4}{\epsilon_*^2}, \frac{\gamma^6}{\epsilon_l^2}\right\} \log T, \tag{15}$$

and $\epsilon_l = \epsilon_{1,k}$. In expectation

$$\frac{1}{|W_l|} + \left(1 - \frac{1}{|W_l|}\right)\frac{1}{|W_l|} < \frac{2}{|W_l|} \tag{16}$$

fraction of the comparisons will be removed at the end of round $l$ (those that involve $b_j$). Applying Hoeffding's inequality yields that the number of removed comparisons is with high probability $O(D_{l,\gamma,T})$.

Proof of Theorem 1. Lemma 2 bounds the mistake probability by $1/T$. We focus here on the case where Beat-the-Mean is mistake-free. We will show that, at the end of each round $l$, the regret incurred from the removed comparisons is bounded with high probability by $O(\gamma\epsilon_l D_{l,\gamma,T})$ for $D_{l,\gamma,T}$ defined in (15).
Since this happens $K - 1$ times (once per round) and accounts for the regret of every comparison, the total accumulated regret of Beat-the-Mean will be bounded with high probability by

$$O\left(\sum_{l=1}^{K-1} \gamma\epsilon_l D_{l,\gamma,T}\right) = O\left(\sum_{l=1}^{K-1} \epsilon_l \min\left\{\frac{\gamma^5}{\epsilon_*^2}, \frac{\gamma^7}{\epsilon_l^2}\right\} \log T\right).$$

By Lemma 4, the number of removed comparisons in round $l$ is at most $O(D_{l,\gamma,T})$. The incurred regret of a comparison between any $b_i$ and the removed bandit $b_j$ is $(\epsilon_{1,i} + \epsilon_{1,j})/2 \le \gamma\epsilon_l$, which follows from relaxed transitivity. Thus, the regret incurred from all removed comparisons of round $l$ is at most $O(\gamma\epsilon_l D_{l,\gamma,T})$.

Proof of Corollary 1. We know from Theorem 1 that the regret assigned to the removed bandit in each round $l$ is $O((\gamma^7/\epsilon_l) \log T)$, where $\epsilon_l = \epsilon_{1,k}$ if $b_k$ is the worst bandit in $W_l$. By the pigeonhole principle $b_{K+1-l} \succeq b_k$, since in round $l$ there are $K + 1 - l$ bandits. By relaxed transitivity, we have $\epsilon_{1,K+1-l} \le \gamma\epsilon_{1,k}$. The desired result naturally follows.

A.2. PAC Setting

We assume here that Beat-the-Mean is run according to Alg. 3. We define $c_n \equiv c_{\delta,\gamma}(n)$ using (7).

Lemma 5. Assuming that $b_1 \in W_l$, the probability of $b_i \in W_l \setminus \{b_1\}$ defeating $b_1$ at the end of round $l$ (i.e., a mistake) is at most $\delta/K^3$.

Proof. (Sketch.) This proof is structurally identical to Lemma 1, except using a different confidence interval. We consider three analogous cases:
(a) $E[\hat{P}_{i,n}] < E[\hat{P}_{1,n}]$
(b) $E[\hat{P}_{i,n}] \ge E[\hat{P}_{1,n}]$ and $n < \frac{4}{\epsilon_{1,k}^2} \log(K^3 N/\delta)$
(c) $E[\hat{P}_{i,n}] \ge E[\hat{P}_{1,n}]$ and $n \ge \frac{4}{\epsilon_{1,k}^2} \log(K^3 N/\delta)$

In case (a) applying Hoeffding's inequality yields $P(\hat{P}_{1,n} + c_n < \hat{P}_{i,n} - c_n) \le (\delta/(K^3 N))^2 < \delta/(K^5 N)$. In case (b), following (10) and (11), we have

$$P(\hat{P}_{i,n} - c_n > \hat{P}_{1,n} + c_n) \le P(\hat{P}_{i,k,n} - E[\hat{P}_{i,k,n}] > (2/3)c_n) \le \exp(-2\gamma^4 \log(K^3 N/\delta)) \le (\delta/(K^3 N))^2 < \delta/(K^5 N).$$
In case (c), following (12) and using $K \ge 2$, we have

$$P(\hat{P}_{1,n} \le \hat{P}_{k,n}) \le \exp(-n\epsilon_{1,k}^2/2) \le \exp(-2\log(K^3 N/\delta)) = (\delta/(K^3 N))^2 < \delta/(K^5 N).$$

So the probability of each failure case is at most $\delta/(K^5 N)$. There are at most $K^2 N$ time steps, so applying the union bound proves the claim.

Lemma 6. Beat-the-Mean makes a mistake with probability at most $\delta/K$.

The proof of Lemma 6 is exactly the same as Lemma 2, except leveraging Lemma 5 instead of Lemma 1.

Lemma 7. If a mistake-free execution of Beat-the-Mean terminates due to $n^* = N$ in round $l$, then $P(\exists b_j \in W_l : \epsilon_{1,j} > \varepsilon) \le \delta/K$.

Proof. Suppose Beat-the-Mean terminates due to $n^* = N$ after round $l$ and $t$ total comparisons. For any $b_i \in W_l$, we know via Hoeffding's inequality that

$$P(|\hat{P}_{i,N} - E[\hat{P}_{i,N}]| \ge c_N) \le 2\exp(-18\gamma^4 \log(K^3 N/\delta)),$$

which is at most $\delta/(K^4 N)$. There are at most $K^2 N$ comparisons, so taking the union bound over all $t$ and $b_i$ yields $\delta/K$. So with probability at least $1 - \delta/K$, each $\hat{P}_i$ is within $c_N$ of its expectation when Beat-the-Mean terminates. Using (8), this implies $\forall b_j \in W_l$, $E[\hat{P}_1] - E[\hat{P}_j] \le 2c_N \le \varepsilon/\gamma$. In particular, for the worst bandit $b_k$ in $W_l$ we have

$$\varepsilon/\gamma \ge E[\hat{P}_1] - E[\hat{P}_k] \ge \epsilon_{1,k}, \tag{17}$$

which follows from Observation 1. Since $\gamma\epsilon_{1,k} \ge \epsilon_{1,j}$ for any $b_j \in W_l$, (17) implies $\varepsilon \ge \epsilon_{1,j}$.

Proof of Theorem 2. We first analyze correctness. We consider two sufficient failure cases:
(a) Beat-the-Mean makes a mistake
(b) Beat-the-Mean is mistake-free and there exists an active bandit $b_j$ upon termination where $\epsilon_{1,j} > \varepsilon$

Lemma 6 implies that the probability of (a) is at most $\delta/K$. If Beat-the-Mean terminates due to $n^* = N$, then Lemma 7 implies that the probability of (b) is also at most $\delta/K$. By the union bound the total probability of (a) or (b) is bounded by $2\delta/K \le \delta$, since $K \ge 2$.

We now analyze the sample complexity. Let $L$ denote the round when the termination condition $n^* = N$ is satisfied ($L = K - 1$ if the condition was never satisfied). For rounds $1, \ldots, L - 1$, the maximum number of unremoved comparisons accumulated by any bandit is $N$.
Using the same argument from (16), the number of comparisons removed after each round is then with high probability $O(N)$. In round $L$, each of the $K - L + 1$ remaining bandits accumulates at most $N$ comparisons, implying that the total number of comparisons made is bounded by

$$O\left((K - L + 1)N + (L - 1)N\right) = O\left(\frac{K\gamma^6}{\varepsilon^2} \log \frac{KN}{\delta}\right).$$

References

Agarwal, Alekh, Dekel, Ofer, and Xiao, Lin. Optimal algorithms for online convex optimization with multi-point bandit feedback. In Conference on Learning Theory (COLT), 2010.

Auer, Peter, Cesa-Bianchi, Nicolò, Freund, Yoav, and Schapire, Robert. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48-77, 2002.

Even-Dar, Eyal, Mannor, Shie, and Mansour, Yishay. Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. Journal of Machine Learning Research (JMLR), 7:1079-1105, 2006.

Feige, Uriel, Raghavan, Prabhakar, Peleg, David, and Upfal, Eli. Computing with noisy information. SIAM Journal on Computing, 23(5), 1994.

Kalyanakrishnan, Shivaram and Stone, Peter. Efficient selection of multiple bandit arms: Theory and practice. In International Conference on Machine Learning (ICML), 2010.

Lai, T. L. and Robbins, Herbert. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4-22, 1985.

Mannor, Shie and Tsitsiklis, John N. The sample complexity of exploration in the multi-armed bandit problem. Journal of Machine Learning Research (JMLR), 5:623-648, 2004.

Radlinski, Filip and Joachims, Thorsten. Active exploration for learning rankings from clickthrough data. In ACM Conference on Knowledge Discovery and Data Mining (KDD), 2007.

Radlinski, Filip, Kurup, Madhu, and Joachims, Thorsten. How does clickthrough data reflect retrieval quality? In ACM Conference on Information and Knowledge Management (CIKM), 2008.

Yue, Yisong and Joachims, Thorsten. Interactively optimizing information retrieval systems as a dueling bandits problem. In International Conference on Machine Learning (ICML), 2009.

Yue, Yisong, Broder, Josef, Kleinberg, Robert, and Joachims, Thorsten. The K-armed dueling bandits problem. In Conference on Learning Theory (COLT), 2009.
More informationOnline Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret
Online Bandit Learning against an Adaptive Adversary: from Regret to Policy Regret Raman Arora Toyota Technological Institute at Chicago, Chicago, IL 60637, USA Ofer Dekel Microsoft Research, 1 Microsoft
More informationRegret Minimization for Reserve Prices in SecondPrice Auctions
Regret Minimization for Reserve Prices in SecondPrice Auctions Nicolò CesaBianchi Università degli Studi di Milano Joint work with: Claudio Gentile (Varese) and Yishay Mansour (TelAviv) N. CesaBianchi
More informationarxiv:1306.0155v1 [cs.lg] 1 Jun 2013
Dynamic ad allocation: bandits with budgets Aleksandrs Slivkins May 2013 arxiv:1306.0155v1 [cs.lg] 1 Jun 2013 Abstract We consider an application of multiarmed bandits to internet advertising (specifically,
More informationIntroduction Main Results Experiments Conclusions. The UCB Algorithm
Paper Finitetime Analysis of the Multiarmed Bandit Problem by Auer, CesaBianchi, Fischer, Machine Learning 27, 2002 Presented by Markus Enzenberger. Go Seminar, University of Alberta. March 14, 2007
More information1 Introduction. 2 Prediction with Expert Advice. Online Learning 9.520 Lecture 09
1 Introduction Most of the course is concerned with the batch learning problem. In this lecture, however, we look at a different model, called online. Let us first compare and contrast the two. In batch
More informationAdaptive Search with Stochastic Acceptance Probabilities for Global Optimization
Adaptive Search with Stochastic Acceptance Probabilities for Global Optimization Archis Ghate a and Robert L. Smith b a Industrial Engineering, University of Washington, Box 352650, Seattle, Washington,
More informationRevenue Optimization against Strategic Buyers
Revenue Optimization against Strategic Buyers Mehryar Mohri Courant Institute of Mathematical Sciences 251 Mercer Street New York, NY, 10012 Andrés Muñoz Medina Google Research 111 8th Avenue New York,
More informationThe Price of Truthfulness for PayPerClick Auctions
Technical Report TTITR20083 The Price of Truthfulness for PayPerClick Auctions Nikhil Devanur Microsoft Research Sham M. Kakade Toyota Technological Institute at Chicago ABSTRACT We analyze the problem
More informationFairness in Routing and Load Balancing
Fairness in Routing and Load Balancing Jon Kleinberg Yuval Rabani Éva Tardos Abstract We consider the issue of network routing subject to explicit fairness conditions. The optimization of fairness criteria
More informationFoundations of Machine Learning OnLine Learning. Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu
Foundations of Machine Learning OnLine Learning Mehryar Mohri Courant Institute and Google Research mohri@cims.nyu.edu Motivation PAC learning: distribution fixed over time (training and test). IID assumption.
More informationChapter 7. Sealedbid Auctions
Chapter 7 Sealedbid Auctions An auction is a procedure used for selling and buying items by offering them up for bid. Auctions are often used to sell objects that have a variable price (for example oil)
More information2.3 Scheduling jobs on identical parallel machines
2.3 Scheduling jobs on identical parallel machines There are jobs to be processed, and there are identical machines (running in parallel) to which each job may be assigned Each job = 1,,, must be processed
More informationLarge induced subgraphs with all degrees odd
Large induced subgraphs with all degrees odd A.D. Scott Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, England Abstract: We prove that every connected graph of order
More information1 Limiting distribution for a Markov chain
Copyright c 2009 by Karl Sigman Limiting distribution for a Markov chain In these Lecture Notes, we shall study the limiting behavior of Markov chains as time n In particular, under suitable easytocheck
More informationThe Conference Call Search Problem in Wireless Networks
The Conference Call Search Problem in Wireless Networks Leah Epstein 1, and Asaf Levin 2 1 Department of Mathematics, University of Haifa, 31905 Haifa, Israel. lea@math.haifa.ac.il 2 Department of Statistics,
More informationWeek 1: Introduction to Online Learning
Week 1: Introduction to Online Learning 1 Introduction This is written based on Prediction, Learning, and Games (ISBN: 2184189 / 2184189 CesaBianchi, Nicolo; Lugosi, Gabor 1.1 A Gentle Start Consider
More informationLINEAR PROGRAMMING WITH ONLINE LEARNING
LINEAR PROGRAMMING WITH ONLINE LEARNING TATSIANA LEVINA, YURI LEVIN, JEFF MCGILL, AND MIKHAIL NEDIAK SCHOOL OF BUSINESS, QUEEN S UNIVERSITY, 143 UNION ST., KINGSTON, ON, K7L 3N6, CANADA EMAIL:{TLEVIN,YLEVIN,JMCGILL,MNEDIAK}@BUSINESS.QUEENSU.CA
More informationApplied Algorithm Design Lecture 5
Applied Algorithm Design Lecture 5 Pietro Michiardi Eurecom Pietro Michiardi (Eurecom) Applied Algorithm Design Lecture 5 1 / 86 Approximation Algorithms Pietro Michiardi (Eurecom) Applied Algorithm Design
More informationOnline Optimization and Personalization of Teaching Sequences
Online Optimization and Personalization of Teaching Sequences Benjamin Clément 1, Didier Roy 1, Manuel Lopes 1, PierreYves Oudeyer 1 1 Flowers Lab Inria Bordeaux SudOuest, Bordeaux 33400, France, didier.roy@inria.fr
More informationNonparametric adaptive age replacement with a onecycle criterion
Nonparametric adaptive age replacement with a onecycle criterion P. CoolenSchrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK email: Pauline.Schrijner@durham.ac.uk
More information3 Does the Simplex Algorithm Work?
Does the Simplex Algorithm Work? In this section we carefully examine the simplex algorithm introduced in the previous chapter. Our goal is to either prove that it works, or to determine those circumstances
More informationScheduling Home Health Care with Separating Benders Cuts in Decision Diagrams
Scheduling Home Health Care with Separating Benders Cuts in Decision Diagrams André Ciré University of Toronto John Hooker Carnegie Mellon University INFORMS 2014 Home Health Care Home health care delivery
More informationThe Trip Scheduling Problem
The Trip Scheduling Problem Claudia Archetti Department of Quantitative Methods, University of Brescia Contrada Santa Chiara 50, 25122 Brescia, Italy Martin Savelsbergh School of Industrial and Systems
More informationOptimizing Search Engines using Clickthrough Data
Optimizing Search Engines using Clickthrough Data Presented by  Kajal Miyan Seminar Series, 891 Michigan state University *Slides adopted from presentations of Thorsten Joachims (author) and ShuiLung
More information! Solve problem to optimality. ! Solve problem in polytime. ! Solve arbitrary instances of the problem. #approximation algorithm.
Approximation Algorithms 11 Approximation Algorithms Q Suppose I need to solve an NPhard problem What should I do? A Theory says you're unlikely to find a polytime algorithm Must sacrifice one of three
More informationChapter 11. 11.1 Load Balancing. Approximation Algorithms. Load Balancing. Load Balancing on 2 Machines. Load Balancing: Greedy Scheduling
Approximation Algorithms Chapter Approximation Algorithms Q. Suppose I need to solve an NPhard problem. What should I do? A. Theory says you're unlikely to find a polytime algorithm. Must sacrifice one
More informationUnraveling versus Unraveling: A Memo on Competitive Equilibriums and Trade in Insurance Markets
Unraveling versus Unraveling: A Memo on Competitive Equilibriums and Trade in Insurance Markets Nathaniel Hendren January, 2014 Abstract Both Akerlof (1970) and Rothschild and Stiglitz (1976) show that
More information! Solve problem to optimality. ! Solve problem in polytime. ! Solve arbitrary instances of the problem. !approximation algorithm.
Approximation Algorithms Chapter Approximation Algorithms Q Suppose I need to solve an NPhard problem What should I do? A Theory says you're unlikely to find a polytime algorithm Must sacrifice one of
More informationMODELING RANDOMNESS IN NETWORK TRAFFIC
MODELING RANDOMNESS IN NETWORK TRAFFIC  LAVANYA JOSE, INDEPENDENT WORK FALL 11 ADVISED BY PROF. MOSES CHARIKAR ABSTRACT. Sketches are randomized data structures that allow one to record properties of
More informationOn the Union of Arithmetic Progressions
On the Union of Arithmetic Progressions Shoni Gilboa Rom Pinchasi August, 04 Abstract We show that for any integer n and real ɛ > 0, the union of n arithmetic progressions with pairwise distinct differences,
More informationNotes from Week 1: Algorithms for sequential prediction
CS 683 Learning, Games, and Electronic Markets Spring 2007 Notes from Week 1: Algorithms for sequential prediction Instructor: Robert Kleinberg 2226 Jan 2007 1 Introduction In this course we will be looking
More informationChapter 7. BANDIT PROBLEMS.
Chapter 7. BANDIT PROBLEMS. Bandit problems are problems in the area of sequential selection of experiments, and they are related to stopping rule problems through the theorem of Gittins and Jones (974).
More informationOnline Algorithms: Learning & Optimization with No Regret.
Online Algorithms: Learning & Optimization with No Regret. Daniel Golovin 1 The Setup Optimization: Model the problem (objective, constraints) Pick best decision from a feasible set. Learning: Model the
More informationFactoring & Primality
Factoring & Primality Lecturer: Dimitris Papadopoulos In this lecture we will discuss the problem of integer factorization and primality testing, two problems that have been the focus of a great amount
More informationStochastic Inventory Control
Chapter 3 Stochastic Inventory Control 1 In this chapter, we consider in much greater details certain dynamic inventory control problems of the type already encountered in section 1.3. In addition to the
More informationAn example of a computable
An example of a computable absolutely normal number Verónica Becher Santiago Figueira Abstract The first example of an absolutely normal number was given by Sierpinski in 96, twenty years before the concept
More informationMapReduce and Distributed Data Analysis. Sergei Vassilvitskii Google Research
MapReduce and Distributed Data Analysis Google Research 1 Dealing With Massive Data 2 2 Dealing With Massive Data Polynomial Memory Sublinear RAM Sketches External Memory Property Testing 3 3 Dealing With
More informationOnline and Offline Selling in Limit Order Markets
Online and Offline Selling in Limit Order Markets Kevin L. Chang 1 and Aaron Johnson 2 1 Yahoo Inc. klchang@yahooinc.com 2 Yale University ajohnson@cs.yale.edu Abstract. Completely automated electronic
More informationThe degree, size and chromatic index of a uniform hypergraph
The degree, size and chromatic index of a uniform hypergraph Noga Alon Jeong Han Kim Abstract Let H be a kuniform hypergraph in which no two edges share more than t common vertices, and let D denote the
More informationLecture 7: Approximation via Randomized Rounding
Lecture 7: Approximation via Randomized Rounding Often LPs return a fractional solution where the solution x, which is supposed to be in {0, } n, is in [0, ] n instead. There is a generic way of obtaining
More informationTable 1: Summary of the settings and parameters employed by the additive PA algorithm for classification, regression, and uniclass.
Online PassiveAggressive Algorithms Koby Crammer Ofer Dekel Shai ShalevShwartz Yoram Singer School of Computer Science & Engineering The Hebrew University, Jerusalem 91904, Israel {kobics,oferd,shais,singer}@cs.huji.ac.il
More information17.3.1 Follow the Perturbed Leader
CS787: Advanced Algorithms Topic: Online Learning Presenters: David He, Chris Hopman 17.3.1 Follow the Perturbed Leader 17.3.1.1 Prediction Problem Recall the prediction problem that we discussed in class.
More informationModern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh
Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem
More informationSimple and efficient online algorithms for real world applications
Simple and efficient online algorithms for real world applications Università degli Studi di Milano Milano, Italy Talk @ Centro de Visión por Computador Something about me PhD in Robotics at LIRALab,
More informationOnline Scheduling for Cloud Computing and Different Service Levels
2012 IEEE 201226th IEEE International 26th International Parallel Parallel and Distributed and Distributed Processing Processing Symposium Symposium Workshops Workshops & PhD Forum Online Scheduling for
More informationDUOL: A Double Updating Approach for Online Learning
: A Double Updating Approach for Online Learning Peilin Zhao School of Comp. Eng. Nanyang Tech. University Singapore 69798 zhao6@ntu.edu.sg Steven C.H. Hoi School of Comp. Eng. Nanyang Tech. University
More informationLoad Balancing with Migration Penalties
Load Balancing with Migration Penalties Vivek F Farias, Ciamac C Moallemi, and Balaji Prabhakar Electrical Engineering, Stanford University, Stanford, CA 9435, USA Emails: {vivekf,ciamac,balaji}@stanfordedu
More informationIntroduction to Online Learning Theory
Introduction to Online Learning Theory Wojciech Kot lowski Institute of Computing Science, Poznań University of Technology IDSS, 04.06.2013 1 / 53 Outline 1 Example: Online (Stochastic) Gradient Descent
More informationarxiv:1112.0829v1 [math.pr] 5 Dec 2011
How Not to Win a Million Dollars: A Counterexample to a Conjecture of L. Breiman Thomas P. Hayes arxiv:1112.0829v1 [math.pr] 5 Dec 2011 Abstract Consider a gambling game in which we are allowed to repeatedly
More informationThe Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy
BMI Paper The Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy Faculty of Sciences VU University Amsterdam De Boelelaan 1081 1081 HV Amsterdam Netherlands Author: R.D.R.
More informationCOS 511: Foundations of Machine Learning. Rob Schapire Lecture #14 Scribe: Qian Xi March 30, 2006
COS 511: Foundations of Machine Learning Rob Schapire Lecture #14 Scribe: Qian Xi March 30, 006 In the previous lecture, we introduced a new learning model, the Online Learning Model. After seeing a concrete
More informationLarge Scale Learning to Rank
Large Scale Learning to Rank D. Sculley Google, Inc. dsculley@google.com Abstract Pairwise learning to rank methods such as RankSVM give good performance, but suffer from the computational burden of optimizing
More informationPlanar Tree Transformation: Results and Counterexample
Planar Tree Transformation: Results and Counterexample Selim G Akl, Kamrul Islam, and Henk Meijer School of Computing, Queen s University Kingston, Ontario, Canada K7L 3N6 Abstract We consider the problem
More information1. R In this and the next section we are going to study the properties of sequences of real numbers.
+a 1. R In this and the next section we are going to study the properties of sequences of real numbers. Definition 1.1. (Sequence) A sequence is a function with domain N. Example 1.2. A sequence of real
More informationLecture 3: Finding integer solutions to systems of linear equations
Lecture 3: Finding integer solutions to systems of linear equations Algorithmic Number Theory (Fall 2014) Rutgers University Swastik Kopparty Scribe: Abhishek Bhrushundi 1 Overview The goal of this lecture
More informationLecture 2: Universality
CS 710: Complexity Theory 1/21/2010 Lecture 2: Universality Instructor: Dieter van Melkebeek Scribe: Tyson Williams In this lecture, we introduce the notion of a universal machine, develop efficient universal
More informationTHE SCHEDULING OF MAINTENANCE SERVICE
THE SCHEDULING OF MAINTENANCE SERVICE Shoshana Anily Celia A. Glass Refael Hassin Abstract We study a discrete problem of scheduling activities of several types under the constraint that at most a single
More informationFixedEffect Versus RandomEffects Models
CHAPTER 13 FixedEffect Versus RandomEffects Models Introduction Definition of a summary effect Estimating the summary effect Extreme effect size in a large study or a small study Confidence interval
More informationThe Goldberg Rao Algorithm for the Maximum Flow Problem
The Goldberg Rao Algorithm for the Maximum Flow Problem COS 528 class notes October 18, 2006 Scribe: Dávid Papp Main idea: use of the blocking flow paradigm to achieve essentially O(min{m 2/3, n 1/2 }
More informationprinceton univ. F 13 cos 521: Advanced Algorithm Design Lecture 6: Provable Approximation via Linear Programming Lecturer: Sanjeev Arora
princeton univ. F 13 cos 521: Advanced Algorithm Design Lecture 6: Provable Approximation via Linear Programming Lecturer: Sanjeev Arora Scribe: One of the running themes in this course is the notion of
More informationOnline Learning with Composite Loss Functions
JMLR: Workshop and Conference Proceedings vol 35:1 18, 2014 Online Learning with Composite Loss Functions Ofer Dekel Microsoft Research Jian Ding University of Chicago Tomer Koren Technion Yuval Peres
More informationLOGNORMAL MODEL FOR STOCK PRICES
LOGNORMAL MODEL FOR STOCK PRICES MICHAEL J. SHARPE MATHEMATICS DEPARTMENT, UCSD 1. INTRODUCTION What follows is a simple but important model that will be the basis for a later study of stock prices as
More informationEstimation Bias in MultiArmed Bandit Algorithms for Search Advertising
Estimation Bias in MultiArmed Bandit Algorithms for Search Advertising Min Xu Machine Learning Department Carnegie Mellon University minx@cs.cmu.edu ao Qin Microsoft Research Asia taoqin@microsoft.com
More informationOnline Convex Optimization
E0 370 Statistical Learning heory Lecture 19 Oct 22, 2013 Online Convex Optimization Lecturer: Shivani Agarwal Scribe: Aadirupa 1 Introduction In this lecture we shall look at a fairly general setting
More informationOn an antiramsey type result
On an antiramsey type result Noga Alon, Hanno Lefmann and Vojtĕch Rödl Abstract We consider antiramsey type results. For a given coloring of the kelement subsets of an nelement set X, where two kelement
More informationContextualBandit Approach to Recommendation Konstantin Knauf
ContextualBandit Approach to Recommendation Konstantin Knauf 22. Januar 2014 Prof. Ulf Brefeld Knowledge Mining & Assesment 1 Agenda Problem Scenario Scenario Multiarmed Bandit Model for Online Recommendation
More informationBig Data Analytics of MultiRelationship Online Social Network Based on MultiSubnet Composited Complex Network
, pp.273284 http://dx.doi.org/10.14257/ijdta.2015.8.5.24 Big Data Analytics of MultiRelationship Online Social Network Based on MultiSubnet Composited Complex Network Gengxin Sun 1, Sheng Bin 2 and
More informationThe Online Set Cover Problem
The Online Set Cover Problem Noga Alon Baruch Awerbuch Yossi Azar Niv Buchbinder Joseph Seffi Naor ABSTRACT Let X = {, 2,..., n} be a ground set of n elements, and let S be a family of subsets of X, S
More information11 Ideals. 11.1 Revisiting Z
11 Ideals The presentation here is somewhat different than the text. In particular, the sections do not match up. We have seen issues with the failure of unique factorization already, e.g., Z[ 5] = O Q(
More informationLecture 3: Linear Programming Relaxations and Rounding
Lecture 3: Linear Programming Relaxations and Rounding 1 Approximation Algorithms and Linear Relaxations For the time being, suppose we have a minimization problem. Many times, the problem at hand can
More information1 Portfolio Selection
COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture # Scribe: Nadia Heninger April 8, 008 Portfolio Selection Last time we discussed our model of the stock market N stocks start on day with
More informationFinitetime Analysis of the Multiarmed Bandit Problem*
Machine Learning, 47, 35 56, 00 c 00 Kluwer Academic Publishers. Manufactured in The Netherlands. Finitetime Analysis of the Multiarmed Bandit Problem* PETER AUER University of Technology Graz, A8010
More informationA DriftingGames Analysis for Online Learning and Applications to Boosting
A DriftingGames Analysis for Online Learning and Applications to Boosting Haipeng Luo Department of Computer Science Princeton University Princeton, NJ 08540 haipengl@cs.princeton.edu Robert E. Schapire
More informationPrime Numbers. Chapter Primes and Composites
Chapter 2 Prime Numbers The term factoring or factorization refers to the process of expressing an integer as the product of two or more integers in a nontrivial way, e.g., 42 = 6 7. Prime numbers are
More informationHackingproofness and Stability in a Model of Information Security Networks
Hackingproofness and Stability in a Model of Information Security Networks Sunghoon Hong Preliminary draft, not for citation. March 1, 2008 Abstract We introduce a model of information security networks.
More informationThe Relation between Two Present Value Formulae
James Ciecka, Gary Skoog, and Gerald Martin. 009. The Relation between Two Present Value Formulae. Journal of Legal Economics 15(): pp. 6174. The Relation between Two Present Value Formulae James E. Ciecka,
More informationCPC/CPA Hybrid Bidding in a Second Price Auction
CPC/CPA Hybrid Bidding in a Second Price Auction Benjamin Edelman Hoan Soo Lee Working Paper 09074 Copyright 2008 by Benjamin Edelman and Hoan Soo Lee Working papers are in draft form. This working paper
More informationMean RamseyTurán numbers
Mean RamseyTurán numbers Raphael Yuster Department of Mathematics University of Haifa at Oranim Tivon 36006, Israel Abstract A ρmean coloring of a graph is a coloring of the edges such that the average
More informationIN THIS PAPER, we study the delay and capacity tradeoffs
IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 15, NO. 5, OCTOBER 2007 981 Delay and Capacity TradeOffs in Mobile Ad Hoc Networks: A Global Perspective Gaurav Sharma, Ravi Mazumdar, Fellow, IEEE, and Ness
More informationOn the Traffic Capacity of Cellular Data Networks. 1 Introduction. T. Bonald 1,2, A. Proutière 1,2
On the Traffic Capacity of Cellular Data Networks T. Bonald 1,2, A. Proutière 1,2 1 France Telecom Division R&D, 3840 rue du Général Leclerc, 92794 IssylesMoulineaux, France {thomas.bonald, alexandre.proutiere}@francetelecom.com
More informationAnalysis of Approximation Algorithms for kset Cover using FactorRevealing Linear Programs
Analysis of Approximation Algorithms for kset Cover using FactorRevealing Linear Programs Stavros Athanassopoulos, Ioannis Caragiannis, and Christos Kaklamanis Research Academic Computer Technology Institute
More informationCONSIDER a merchant selling items through ebay
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 6, NO., JANUARY 5 549 Regret Minimization for Reserve Prices in SecondPrice Auctions Nicolò CesaBianchi, Claudio Gentile, and Yishay Mansour Abstract We
More informationA Note on Maximum Independent Sets in Rectangle Intersection Graphs
A Note on Maximum Independent Sets in Rectangle Intersection Graphs Timothy M. Chan School of Computer Science University of Waterloo Waterloo, Ontario N2L 3G1, Canada tmchan@uwaterloo.ca September 12,
More informationTD(0) Leads to Better Policies than Approximate Value Iteration
TD(0) Leads to Better Policies than Approximate Value Iteration Benjamin Van Roy Management Science and Engineering and Electrical Engineering Stanford University Stanford, CA 94305 bvr@stanford.edu Abstract
More informationHow Asymmetry Helps Load Balancing
How Asymmetry Helps oad Balancing Berthold Vöcking nternational Computer Science nstitute Berkeley, CA 947041198 voecking@icsiberkeleyedu Abstract This paper deals with balls and bins processes related
More informationFaster deterministic integer factorisation
David Harvey (joint work with Edgar Costa, NYU) University of New South Wales 25th October 2011 The obvious mathematical breakthrough would be the development of an easy way to factor large prime numbers
More information