On Adaboost and Optimal Betting Strategies




Pasquale Malacaria and Fabrizio Smeraldi
School of Electronic Engineering and Computer Science, Queen Mary University of London, London, UK

Abstract: We explore the relation between the Adaboost weight update procedure and Kelly's theory of betting. Specifically, we show that Adaboost can be seen as an intuitive optimal betting strategy in the sense of Kelly. Technically, this is achieved as the solution of the dual of the classical formulation of the Adaboost minimisation problem. Using this equivalence with betting strategies we derive a substantial simplification of Adaboost and of a related family of boosting algorithms. Besides allowing a natural derivation of the simplification, our approach establishes a connection between Adaboost and Information Theory that we believe may cast new light on some boosting concepts.

Keywords: Ensemble methods, Statistical methods, Regression/Classification, Adaboost, Information theory, Optimal betting strategies

1. Introduction

Suppose you want to bet on a series of races of six horses, on which no information is initially available (the same horses participate in all races). The only information available after each race is the names of a subset of the losers of that race. This implies that you are not informed about the changes to your capital after each race, so your next bet can only be expressed as a percentage of your (unknown) capital. Let us furthermore assume that you are required to bet all of your capital on each race.

Initially, as nothing is known, it seems reasonable to spread one's bets evenly, so that your initial bet will allocate 1/6 of your capital to each horse. Suppose now that you learn that horses 1, 3, 4, 5 and 6 lost the first race. As your previous bet was optimal with respect to your information at the time, you want the new bet to be as close as possible to the previous bet while taking the outcome of the race into account. (Optimal betting is here understood in the information-theoretic sense. In probability terms, betting all capital on the horse most likely to win gives the highest expected return, but makes you bankrupt with probability 1 in the long term; maximising the return has to be balanced against the risk of bankruptcy, see Section 3.) You could for instance decide to bet a fraction $\rho$ of your capital on the latest possible "winners" (horse 2) and $1-\rho$ on the latest losers (horses 1, 3, 4, 5, 6). This $\rho$ is a measure of how much you believe the result of the last race to be a predictor of the result of the next race. Notice that, since you have to invest all your capital each time, $\rho = 1$ (i.e. betting everything on the set of horses containing the previous winner) would be a bad choice: you would lose all your capital if the following race is won by any other horse.

Table 1 presents an algorithm implementing these ideas: bet $t+1$ on horse $i$ is obtained from the previous bet $t$ by multiplying it by $\rho/\epsilon_t$ if the horse was a possible winner and by $(1-\rho)/(1-\epsilon_t)$ if it was a loser; here $\epsilon_t$ is the total amount bet on the latest possible winners in the previous race $t$. If you need to choose a value of $\rho$ before all races, when no information is available, you may well opt for $\rho = 0.5$. This is a "non-committal" choice: you believe that the subset of losers revealed to you after each race will not be a very good predictor of the result of the next race. In our example, where horses 1, 3, 4, 5 and 6 lost the first race, choosing $\rho = 0.5$ leads to the bets $(0.1, 0.5, 0.1, 0.1, 0.1, 0.1)$ on the second race.
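As an illustration of the update just described, here is a minimal Python sketch (the function and variable names are ours, not from the paper); it reproduces the bets quoted above for the six-horse example.

    def update_bets(bets, possible_winners, rho=0.5):
        """One step of the betting algorithm (Table 1).

        bets: current fractions of capital on each horse (summing to 1).
        possible_winners: indices of the horses not reported as losers.
        rho: fraction of capital to place on the latest possible winners.
        """
        eps = sum(bets[i] for i in possible_winners)  # capital bet on the possible winners
        return [bets[i] * (rho / eps if i in possible_winners else (1 - rho) / (1 - eps))
                for i in range(len(bets))]

    # Six horses, uniform initial bet; horses 1, 3, 4, 5, 6 lost the first race,
    # so horse 2 (index 1) is the only possible winner.
    b0 = [1 / 6] * 6
    b1 = update_bets(b0, possible_winners={1}, rho=0.5)
    print([round(b, 4) for b in b1])  # [0.1, 0.5, 0.1, 0.1, 0.1, 0.1]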
Table 2 shows the bets generated by applying the betting algorithm with $\rho = 0.5$ to a series of 4 races. In Section 4 we show that the Adaboost weight update cycle is equivalent to this betting strategy with $\rho = 0.5$, modulo the trivial identification of the horses with training examples and of the possible winners with misclassified training data; this equivalence holds under an assumption of "minimal luck", which we make precise in Section 5. As will be shown, this derives straightforwardly from a direct solution of the dual of the Adaboost minimisation problem. This result, which is the main contribution of our work, establishes the relation between Kelly's theory of optimal betting and Adaboost sketched in Figure 1. It also leads directly to the simplified formulation of Adaboost shown in Table 4.

The rest of this paper is organised as follows: Section 2 gives a brief introduction to the Adaboost algorithm; in Section 3 we outline the theory of optimal betting underlying the strategy displayed in Table 1. The equivalence between this betting strategy and Adaboost is proved in Section 4. In Section 5 we proceed in the opposite direction, and show how convergence theorems for Adaboost can be used to investigate the asymptotic behaviour of our betting strategy. Finally, in Section 6 we generalise the correspondence between betting strategies and boosting and show that, for an arbitrary choice of $\rho$, we recover a known variant of Adaboost [1].

Table 1: The betting algorithm. $\rho$ is the fraction of the capital allocated to the latest possible winners.
  Choose the initial bets $b_{1,i} = 1/m$, where $m$ is the number of horses.
  For race number $t$:
  1. Let $W$ be the set of possible winners in the previous race. Compute $\epsilon_t = \sum_{i \in W} b_{t,i}$.
  2. Update the bets for race $t+1$: $b_{t+1,i} = (\rho/\epsilon_t)\, b_{t,i}$ if $i \in W$, and $b_{t+1,i} = ((1-\rho)/(1-\epsilon_t))\, b_{t,i}$ otherwise.

Table 2: Bets $b_i$ generated using the betting algorithm (Table 1) with $\rho = 1/2$ from the race information $O_i$ over four races (w denotes a possible winner, l a loser).
        x1      x2      x3      x4      x5      x6
  b0    0.1666  0.1666  0.1666  0.1666  0.1666  0.1666
  O0    l       w       l       l       l       l
  b1    0.1     0.5     0.1     0.1     0.1     0.1
  O1    w       l       l       l       w       l
  b2    0.25    0.3125  0.0625  0.0625  0.25    0.0625
  O2    l       l       w       w       l       l
  b3    0.1428  0.1785  0.25    0.25    0.1428  0.0357
  O3    l       w       l       l       l       l
  b4    0.0869  0.5     0.1521  0.1521  0.0869  0.0217

2. Background on Adaboost

Boosting concerns itself with the problem of combining several prediction rules, each with poor individual performance, into a highly accurate predictor. This is commonly referred to as combining a set of weak learners into a strong classifier. Adaboost, introduced by Yoav Freund and Robert Schapire [2], differs from previous algorithms in that it does not require prior knowledge of the performance of the weak hypotheses, but rather adapts accordingly. Adaboost is structured as an iterative algorithm that works by maintaining a distribution of weights over the training examples. At each iteration a new weak learner (classifier) is trained, and the distribution of weights is updated according to the examples it misclassifies. The idea is to increase the importance of the training examples that are difficult to classify. The weights also determine the contribution of each weak learner to the final strong classifier.

In the following, we indicate the $m$ training examples as $\{(x_1, y_1), \ldots, (x_m, y_m)\}$, where the $x_i$ are vectors in a feature space $X$ and the $y_i \in \{-1, +1\}$ are the respective class labels. Given a family of weak learners $H = \{h_t : X \to \{-1, +1\}\}$, it is convenient to define $u_{t,i} = y_i h_t(x_i)$, so that $u_{t,i}$ is $+1$ if $h_t$ classifies $x_i$ correctly, and $-1$ otherwise. If we now let $d_{t,i}$ be the distribution of weights at time $t$, with $\sum_i d_{t,i} = 1$, the cycle of iterations can be described as in Table 3. Each iteration of the Adaboost algorithm solves the following minimisation problem [3], [4]:

  $\alpha_t = \arg\min_\alpha \log Z_t(\alpha)$,   (1)

where

  $Z_t(\alpha) = \sum_{i=1}^m d_{t,i} \exp(-\alpha u_{t,i})$   (2)

is the partition function that appears in point 3 of Table 3. In other words, the choice of the updated weights $d_{t+1,i}$ is such that $\alpha_t$ minimises $Z_t$.

3. The gambler's problem: optimal betting strategies

Before investigating the formal relation between Adaboost and betting strategies, we review in this section the basics of the theory of optimal betting that motivate the heuristic reasoning of Section 1. Our presentation is a summary of the ideas in Chapter 6.1 of [5], based on Kelly's theory of gambling [6].

Let us consider the problem of determining the optimal betting strategy for a race run by $m$ horses $x_1, \ldots, x_m$. For the $i$-th horse, let $p_i$ and $o_i$ represent respectively the probability of $x_i$ winning and the odds, intended as $o_i$-for-one (e.g., at 3-for-1 the gambler gets 3 times his bet in case of victory). Assuming that the gambler always bets all his wealth, let $b_i$ be the fraction bet on horse $x_i$. Then if horse $X$ wins the race, the gambler's capital grows by the wealth relative $S(X) = b(X) o(X)$. Assuming all races are independent, the gambler's wealth after $n$ races is therefore $S_n = \prod_{j=1}^n S(X_j)$. It is at this point convenient to introduce the doubling rate

  $W(b, p) = E(\log S(X)) = \sum_{i=1}^m p_i \log(b_i o_i)$   (3)

(in this information-theoretic context, the logarithm should be understood to be in base 2). From the weak law of large numbers and the definition of $S_n$, we have that

  $\frac{1}{n} \log S_n = \frac{1}{n} \sum_{j=1}^n \log S(X_j) \to E(\log S(X))$   (4)

in probability, which allows us to express the gambler's wealth in terms of the doubling rate:

  $S_n = 2^{n \frac{1}{n} \log S_n} \doteq 2^{n E(\log S(X))} = 2^{n W(p,b)}$,   (5)

where by $\doteq$ we indicate equality to the first order in the exponent.
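The following short simulation (ours, for illustration only) makes Equations (3)-(5) concrete: it computes the doubling rate $W(b,p)$ for an arbitrary bet and checks that the empirical growth exponent $(1/n)\log_2 S_n$ of the gambler's capital approaches it over many independent races. The probabilities, odds and bet below are made-up example values.

    import math
    import random

    def doubling_rate(p, b, o):
        """W(b, p) = sum_i p_i * log2(b_i * o_i), as in Equation (3)."""
        return sum(pi * math.log2(bi * oi) for pi, bi, oi in zip(p, b, o))

    # Example values (ours): three horses, win probabilities p, odds o, and a bet b.
    p = [0.5, 0.25, 0.25]
    o = [2.5, 3.5, 4.0]
    b = [0.5, 0.3, 0.2]

    random.seed(0)
    n = 200_000
    log_capital = 0.0  # log2 of the capital, starting from 1
    for _ in range(n):
        winner = random.choices(range(3), weights=p)[0]
        log_capital += math.log2(b[winner] * o[winner])  # wealth relative S(X) = b(X) o(X)

    print(doubling_rate(p, b, o))  # W(b, p)
    print(log_capital / n)         # empirical (1/n) log2 S_n, close to W by Equation (4)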

As can be shown, the optimal betting scheme, i.e. the choice of the $b_i$ that maximises $W(p,b)$, is given by $b = p$, that is to say, proportional betting. Note that this is independent of the odds $o_i$.

Incidentally, it is crucial to this concept of optimality that (4) holds with probability one in the limit of large $n$. Consider for example the case of 3 horses that win with probability 1/2, 1/4 and 1/4 respectively and that are given at the same odds (say 3). In this case it would be tempting to bet all our capital on the first horse, as this would lead to the highest possible expected gain of $3 \cdot (1/2) = 3/2$. However, it is easy to see that such a strategy would eventually make us bankrupt with probability one, as the probability that horse 1 wins the first $n$ races is vanishingly small for large $n$; the expected value is therefore not a useful criterion to guide our betting. Equation (4), instead, assures us that the capital will asymptotically grow like $2^{nW}$. For the proportional betting strategy, in this case, $W = (1/2)\log(3 \cdot 1/2) + (1/4)\log(3 \cdot 1/4) + (1/4)\log(3 \cdot 1/4) \approx 0.08$, so that the capital will grow to infinity with probability one [5].

In the special case of fair odds with $o_i = 1/p_i$ (fair odds are defined as odds such that $\sum_i 1/o_i = 1$), the doubling rate becomes

  $W(p, b) = \sum_i p_i \log\left(\frac{b_i}{p_i}\right) = -D(p \| b) / \log_e 2$,   (6)

where $D(p \| b)$ is the Kullback-Leibler divergence (also known as the relative entropy) between the probability of victory $p$ and the betting strategy $b$. Note that, since the odds are fair, the gambler can at best conserve his wealth, that is, $\max_b W(p, b) = 0$. In this case, it follows directly from the properties of the Kullback-Leibler divergence that the maximum occurs for $b_i = p_i$, i.e. that the optimal strategy is proportional betting.

Figure 1: Relation between Kelly's theory of gambling and the standard and dual formulations of Adaboost. [schematic figure]

Table 3: The standard formulation of the Adaboost algorithm.
  Initialise $d_{1,i} = 1/m$.
  1. Select the weak learner $h_t$ that minimises the training error $\epsilon_t$, weighted by the current distribution of weights $d_{t,i}$: $\epsilon_t = \sum_{i : u_{t,i} = -1} d_{t,i}$. Check that $\epsilon_t < 0.5$.
  2. Set $\alpha_t = \frac{1}{2} \log \frac{1-\epsilon_t}{\epsilon_t}$.
  3. Update the weights according to $d_{t+1,i} = \frac{1}{Z_t(\alpha_t)} d_{t,i} \exp(-\alpha_t u_{t,i})$, where $Z_t(\alpha_t) = \sum_{i=1}^m d_{t,i} \exp(-\alpha_t u_{t,i})$ is a normalisation factor which ensures that the $d_{t+1,i}$ form a distribution.
  Output the final classifier: $H(x) = \mathrm{sign}\left(\sum_{t=1}^T \alpha_t h_t(x)\right)$.
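For concreteness, here is a minimal sketch (ours) of the re-weighting step of Table 3, points 2 and 3. It is not a full Adaboost implementation (there is no weak-learner training); the weights and classification outcomes in the toy run are arbitrary values chosen for illustration.

    import math

    def adaboost_update(d, u):
        """One weight update of Table 3.

        d: current weights d_{t,i} (a distribution over training examples).
        u: u_{t,i} = y_i * h_t(x_i), +1 if example i is classified correctly, -1 otherwise.
        """
        eps = sum(di for di, ui in zip(d, u) if ui == -1)  # weighted training error
        alpha = 0.5 * math.log((1 - eps) / eps)            # point 2 of Table 3
        unnorm = [di * math.exp(-alpha * ui) for di, ui in zip(d, u)]
        z = sum(unnorm)                                    # partition function Z_t(alpha_t)
        return [w / z for w in unnorm], alpha

    # Toy run: five examples, one hypothetical weak learner misclassifying examples 0 and 3.
    d = [0.2] * 5
    u = [-1, +1, +1, -1, +1]
    d_next, alpha = adaboost_update(d, u)
    print([round(w, 3) for w in d_next], round(alpha, 3))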

In the following, for convenience, we will treat Equation (6) as if the base of the logarithm were $e$ instead of 2. This modification is inessential, and for the sake of convenience we will continue to refer to the resulting quantity as the doubling rate.

4. Equivalence of Adaboost and the betting strategy

We are now in a position to prove the equivalence of the betting algorithm introduced in Section 1 with the Adaboost weight update strategy, points 2 and 3 in Table 3. Modulo the identification of the horses with training examples and of the possible winners of each race with misclassified examples, this leads to the simplification of Adaboost shown in Table 4. Figure 1 illustrates schematically the relationships between these algorithms.

Our proof is based on the duality relations discussed in a number of excellent papers that cast Adaboost in terms of the minimisation of Kullback-Leibler or Bregman divergences [7], [8], [3]. In these, the Adaboost weight update equations are shown to solve the following problem:

  $d_{t+1} = \arg\min_d D(d \| d_t)$ subject to $d \cdot u_t = 0$.   (7)

In information-theoretic terms, this can be interpreted as the Minimum Discrimination Information principle applied to the estimation of the distribution $d_{t+1}$, given the initial estimate $d_t$ and the linear constraint determined by the statistics $u_t$ [9]. The key observation is now that the Kullback-Leibler divergence of the $d_i$ and the $d_{t,i}$,

  $D(d \| d_t) = \sum_i d_i \log \frac{d_i}{d_{t,i}}$,   (8)

bears a strong formal analogy to Equation (6) above.

For the sake of readability, we discard the index $t$ and simply write $d_i$ for $d_{t,i}$ and $\tilde{d}_i$ for $d_{t+1,i}$. The Adaboost dual problem, Equation (7), then becomes

  $\tilde{d} = \arg\min_{\tilde{d}} D(\tilde{d} \| d)$ subject to $\tilde{d} \cdot u = 0$.   (9)

The constraint $\tilde{d} \cdot u = 0$ can be rewritten as

  $\sum_{i : u_i = -1} \tilde{d}_i = \sum_{i : u_i = +1} \tilde{d}_i = 1/2$.   (10)

With these notations, we are ready to prove the following

Theorem 1: The simplified weight update strategy in Table 4 leads to the same weights as the original algorithm in Table 3.

Proof: we need to show that the weight update equations in Table 4, which are the formal transcription of the betting strategy of Table 1, actually solve the optimisation problem of Equation (9). The constraint of Equation (10) allows us to split the objective function in Equation (9) into the sum of two terms:

  $D(\tilde{d} \| d) = \sum_{i : u_i = -1} \tilde{d}_i \log \frac{\tilde{d}_i}{d_i} + \sum_{i : u_i = +1} \tilde{d}_i \log \frac{\tilde{d}_i}{d_i} = D_-(\tilde{d}, d) + D_+(\tilde{d}, d)$,   (11)

each of which can be minimised separately. Notice that neither of these terms taken separately represents a Kullback-Leibler divergence, as the partial sums of the $\tilde{d}_i$ and $d_i$ are not over distributions. However, this is easily remedied by normalising these sub-sequences separately. To this purpose, we introduce the Adaboost weighted error $\epsilon_t$ (see Table 3, point 1), which omitting the index $t$ is equal to $\epsilon = \sum_{i : u_i = -1} d_i$. The first term of the summation can then be rewritten as

  $D_-(\tilde{d}, d) = \frac{1}{2} \sum_{i : u_i = -1} 2\tilde{d}_i \log \frac{2\tilde{d}_i}{d_i/\epsilon} - \frac{1}{2} \log 2\epsilon = \frac{1}{2} D_-\!\left(2\tilde{d} \,\big\|\, d/\epsilon\right) - \frac{1}{2} \log 2\epsilon$,   (12)

where by $D_-$ we mean the Kullback-Leibler divergence computed over the subset of indexes $\{i \mid u_i = -1\}$. The minimum point is now immediately evident: $2\tilde{d}_i = d_i/\epsilon$, or, in the notation of Table 4,

  $d_{t+1,i} = \frac{1}{2\epsilon_t} d_{t,i}$ if $u_{t,i} = -1$.   (13)

Similarly, the minimisation of $D_+$ leads to $2\tilde{d}_i = d_i/(1-\epsilon)$, or

  $d_{t+1,i} = \frac{1}{2(1-\epsilon_t)} d_{t,i}$ if $u_{t,i} = +1$.   (14)

These are precisely the update equations given in Table 4, which proves the theorem.

We have thus shown that the simplified algorithm in Table 4 is completely equivalent to the original Adaboost algorithm in Table 3. Indeed, the two algorithms generate step by step the same sequence of weights, as well as the same decision function. A similar simplification of Adaboost, although not so widely used, is already known to researchers in the field (see for instance [10]).
However, to the best of our knowledge, it has always been derived in a purely computational way by algebraic manipulation of the equations in Table 3. We believe that our derivation as a direct solution of the dual problem of Equation (7) is both more straightforward and more illuminating, in that it brings out the direct correspondence with betting strategies represented in Figure 1. Finally, we note that a formally analogous weight update strategy has been employed in online variants of the algorithm [11], [12].
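A quick numerical check of Theorem 1 (ours, for illustration only): the simplified rules (13)-(14), collected in Table 4 below, produce the same weights as the exponential update of Table 3. The weights and outcomes used here are arbitrary test values.

    import math

    def standard_update(d, u):
        """Points 2-3 of Table 3: exponential re-weighting followed by normalisation."""
        eps = sum(di for di, ui in zip(d, u) if ui == -1)
        alpha = 0.5 * math.log((1 - eps) / eps)
        unnorm = [di * math.exp(-alpha * ui) for di, ui in zip(d, u)]
        z = sum(unnorm)
        return [w / z for w in unnorm]

    def simplified_update(d, u):
        """Table 4 / Equations (13)-(14): divide by 2*eps or 2*(1-eps)."""
        eps = sum(di for di, ui in zip(d, u) if ui == -1)
        return [di / (2 * eps) if ui == -1 else di / (2 * (1 - eps)) for di, ui in zip(d, u)]

    # Arbitrary (non-uniform) weights and classification outcomes, chosen only for the test.
    d = [0.1, 0.3, 0.2, 0.25, 0.15]
    u = [-1, +1, -1, +1, +1]
    a, b = standard_update(d, u), simplified_update(d, u)
    print(all(abs(x - y) < 1e-12 for x, y in zip(a, b)))  # True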

Table 4: Applying the proportional betting strategy to the solution of the Adaboost dual problem leads to a simpler formulation of the algorithm, equivalent to the original. Adaboost (simplified):
  Initialise $d_{1,i} = 1/m$.
  1. Select the weak learner $h_t$ that minimises the training error $\epsilon_t$, weighted by the current distribution of weights $d_{t,i}$: $\epsilon_t = \sum_{i : u_{t,i} = -1} d_{t,i}$. Check that $\epsilon_t < 0.5$.
  2. Update the weights: $d_{t+1,i} = \frac{1}{2\epsilon_t} d_{t,i}$ if $u_{t,i} = -1$, and $d_{t+1,i} = \frac{1}{2(1-\epsilon_t)} d_{t,i}$ otherwise.
  Output the final classifier: $H(x) = \mathrm{sign}\left(\sum_{t=1}^T \alpha_t h_t(x)\right)$, with $\alpha_t = \frac{1}{2} \log \frac{1-\epsilon_t}{\epsilon_t}$.

5. Adaboost as a betting strategy

The considerations above suggest an interpretation of the betting algorithm in Table 1 as an iterative density estimation procedure, in which an initial estimate, represented by the current bets $b$, is refined using the information on the previous race. The updated estimate is then used to place new bets (in the notation of Table 1, $b_{t+1,i} = p_i$), in accordance with the proportional betting strategy.

This interpretation is supported by convergence results for the Adaboost weight update cycle, in either its standard form (Table 3) or its simplified form (Table 4). The application to our betting scenario with partial information is straightforward, modulo the simple formal identifications $b_i = d_{t,i}$ and $p_i = d_{t+1,i}$. To prove this, we first state the following convergence theorem, which has been proved in [7], [8], [3]:

Theorem 2: If the system of linear constraints has non-trivial solutions, i.e. there is some $d$ such that $d \cdot u_t = 0$ for all $t$, then the Adaboost weight distribution converges to the distribution $d^*$ satisfying all the linear constraints for which $D(d^* \| \mathbf{1}/m)$ is minimal (here by $\mathbf{1}$ we indicate the vector of all ones).

In other words, the weight update algorithm converges to a probability density that is, among all those that satisfy all the constraints, the closest to the original guess of a uniform distribution. This approximation property is best expressed by the following Pythagorean theorem for Kullback-Leibler divergences, derived in [13] in the context of an application to iterative density estimation (a generalisation to Bregman divergences can be found in [8]). For all solutions $d$ of the system of linear constraints, we have

  $D(d \| \mathbf{1}/m) = D(d \| d^*) + D(d^* \| \mathbf{1}/m)$.   (15)

In terms of our gambling problem, Theorem 2 above means that the betting algorithm in Table 1 converges to an estimate $p^* = b$ of the probability of winning of each single horse. Among all the densities that satisfy the constraints imposed by the information on the previous races, $p^*$ is the one that would maximise the doubling rate of the initial uniform bet, $W(p^*, \mathbf{1}/m)$ (strictly speaking, Theorem 2 proves this result only for the case $\rho = 1/2$; however, the generalisation to a generic $\rho$ is straightforward, see [7]).

However, for convergence to occur, the information $u_t$ on the various races must be provided in the order determined by point 1 in either Table 3 or Table 4, corresponding to the Adaboost requirement that the weak learner should minimise the error with respect to the weights. In the betting case, of course, this appears problematic, as the gambler has no influence on the order in which the results occur. The assumption corresponding to the minimal-error requirement would be, in a sense, one of "minimal luck": that is, once we decide on a betting strategy, the next race is the one for which the set of possible winners has, according to our estimate, the lowest total probability.
When the conditions for convergence are met, the Pythagorean equality of Equation (15) quantifies the advantage of the estimated probability $p^*$ over any other density $p$ compatible with the constraints, if the initial uniform distribution (or indeed any other density obtained in an intermediate iteration) is used for betting:

  $W(p^*, \mathbf{1}/m) = W(p, \mathbf{1}/m) + D(p \| p^*)$.   (16)

6. Extension of the simplified approach to Adaboost-$\rho$

In the following, we generalise the simplified approach outlined in Table 4. In terms of the betting strategy described in the introduction, we are now considering a gambler using a value of $\rho$ different from 1/2. For simplicity we will consider $0 < \rho \le 1/2$; it is an easy observation that the case $1 > \rho \ge 1/2$ is obtained by symmetry. This generalised constraint produces an algorithm known as Adaboost-$\rho$, introduced in [1]. This variant of Adaboost has an intuitive geometric and statistical interpretation in terms of handling a degree of correlation between the new weights and the output of the weak learner. It is also an ideal testbed to show how our simplified approach can lead to straightforward implementations of generalised versions of Adaboost.

The question of whether an extension of our approach to the entire class of algorithms described in [7] is feasible is left for future work.

Similarly to what has been done for Adaboost, we directly solve the generalised dual optimisation problem

  $d_{t+1} = \arg\min_d D(d \| d_t)$ subject to $d \cdot u_t = 1 - 2\rho$,   (17)

where $0 < \rho \le 1/2$. The solution to this problem will provide the simplified weight update rules (that this is indeed the dual of the Adaboost-$\rho$ problem will be clarified in Section 6.1). Proceeding as in Section 4, we simplify the notation by omitting the index $t$ in $d_{t,i}$. Equation (17) implies the constraints

  $\sum_{i : u_i = -1} \tilde{d}_i = \rho, \qquad \sum_{i : u_i = +1} \tilde{d}_i = 1 - \rho$.   (18)

The objective function of Equation (17) can again be decomposed as in Equation (11):

  $D(\tilde{d} \| d) = D_-(\tilde{d}, d; \rho) + D_+(\tilde{d}, d; \rho)$,   (19)

where $D_-$ and $D_+$ account for the misclassified and the correctly classified training examples respectively. Writing out the right-hand side explicitly yields

  $D_-(\tilde{d}, d; \rho) = \rho \sum_{i : u_i = -1} \frac{\tilde{d}_i}{\rho} \log \frac{\tilde{d}_i/\rho}{d_i/\epsilon} - \rho \log \frac{\epsilon}{\rho} = \rho\, D_-\!\left(\frac{\tilde{d}}{\rho} \,\Big\|\, \frac{d}{\epsilon}\right) - \rho \log \frac{\epsilon}{\rho}$,

and similarly

  $D_+(\tilde{d}, d; \rho) = (1-\rho)\, D_+\!\left(\frac{\tilde{d}}{1-\rho} \,\Big\|\, \frac{d}{1-\epsilon}\right) - (1-\rho) \log \frac{1-\epsilon}{1-\rho}$,

where again $D_-$ and $D_+$ stand for the Kullback-Leibler divergences calculated over the subsets of indexes $\{i \mid u_i = \mp 1\}$. It follows straightforwardly that the minimum is achieved for

  $\tilde{d}_i = \frac{\rho}{\epsilon} d_i$ if $u_i = -1$, and $\tilde{d}_i = \frac{1-\rho}{1-\epsilon} d_i$ if $u_i = +1$.

These are effectively the weight update rules for the generalised algorithm, as shown in Table 5.

6.1 Standard solution of the Adaboost-$\rho$ problem

For comparison, we derive here the standard version of the Adaboost-$\rho$ algorithm in the notation used in this paper (apart from trivial changes of notation, this is the algorithm originally published in [1]). We first need to obtain the objective function for the minimisation problem associated with Adaboost-$\rho$, of which Equation (17) is the dual. In other words, we need to determine a suitable function $Z_t^\rho(\alpha)$ which generalises the function in Equation (2), that is to say, one for which

  $\min_{d \cdot u_t = 1-2\rho} D(d \| d_t) = \rho \log \frac{\rho}{\epsilon_t} + (1-\rho) \log \frac{1-\rho}{1-\epsilon_t}$   (20)

  $= -\min_\alpha \log Z_t^\rho(\alpha)$.   (21)

Following the construction outlined in [7] and essentially setting $M_{ij} = u_{j,i} + 2\rho - 1$ in the Lagrange transform, we find

  $Z_t^\rho(\alpha) = \sum_{i=1}^m d_{t,i} \exp\left(-\alpha (u_{t,i} + 2\rho - 1)\right)$,   (22)

which is minimised by

  $\alpha_t^\rho = \frac{1}{2} \log\left(\frac{\rho}{1-\rho} \cdot \frac{1-\epsilon_t}{\epsilon_t}\right)$.   (23)

By substituting these values, it is easy to verify directly that

  $d_{t+1,i} = \frac{\exp\left(-\alpha_t^\rho (u_{t,i} + 2\rho - 1)\right)}{Z_t^\rho(\alpha_t^\rho)}\, d_{t,i}$   (24)

yields the same weight update rules as given in Table 5, and is indeed the direct generalisation of the update rules in the standard formulation of Adaboost given in Table 3. The complete standard Adaboost-$\rho$ algorithm is shown in Table 6. Again, we note how the dual solution we propose in Table 5 presents a considerable simplification of the standard formulation of the algorithm.
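As before, the generalised rules translate directly into code. The sketch below (ours, for illustration only) implements the simplified update of Table 5 and shows that $\rho = 1/2$ recovers the simplified Adaboost update of Table 4; the weights and outcomes are arbitrary example values.

    def rho_update(d, u, rho):
        """Simplified Adaboost-rho update (Table 5): rho/eps on errors, (1-rho)/(1-eps) otherwise."""
        eps = sum(di for di, ui in zip(d, u) if ui == -1)
        assert eps < rho, "the weak learner must beat the target error rho"
        return [di * (rho / eps) if ui == -1 else di * ((1 - rho) / (1 - eps))
                for di, ui in zip(d, u)]

    d = [0.1, 0.3, 0.2, 0.25, 0.15]
    u = [-1, +1, -1, +1, +1]           # weighted error eps = 0.3
    print(rho_update(d, u, rho=0.4))    # weights satisfying the generalised constraint (18)
    print(rho_update(d, u, rho=0.5))    # identical to the Table 4 update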
7. Conclusions

We have discussed the relation between Adaboost and Kelly's theory of gambling. Specifically, we have shown how the dual formulation of Adaboost formally corresponds to maximising the doubling rate of a gambler's capital in a partial-information situation. The applicable modification of the optimal proportional betting strategy therefore maps to a direct, intuitive solution of the dual of the Adaboost problem. This leads naturally to a substantial formal simplification of the Adaboost weight update cycle. Previous derivations of similar simplifications are, to the best of our knowledge, purely computational and therefore, besides being more cumbersome, lack any meaningful interpretation.

The implications of the correspondence between Adaboost and gambling theory work in both directions; as we have shown, it is for instance possible to use convergence results obtained for Adaboost to investigate the asymptotic behaviour of the betting strategy introduced in Table 1. We believe that a further investigation of the relation between Adaboost and Kelly's theory, or indeed information theory in general, may lead to deeper insight into some boosting concepts, and possibly to a cross-fertilisation between these two domains.

Table 5: Adaboost-$\rho$ generalises the Adaboost weight update constraint to the case $\tilde{d} \cdot u = 1 - 2\rho$, with $0 < \rho \le 1/2$. Adaboost-$\rho$ (simplified):
  Initialise $d_{1,i} = 1/m$.
  1. Select the weak learner $h_t$ that minimises the training error $\epsilon_t$, weighted by the current distribution of weights $d_{t,i}$: $\epsilon_t = \sum_{i : u_{t,i} = -1} d_{t,i}$. Check that $\epsilon_t < \rho$.
  2. Update the weights: $d_{t+1,i} = \frac{\rho}{\epsilon_t} d_{t,i}$ if $u_{t,i} = -1$, and $d_{t+1,i} = \frac{1-\rho}{1-\epsilon_t} d_{t,i}$ otherwise.
  Output the final classifier: $H_\rho(x) = \mathrm{sign}\left(\sum_{t=1}^T \alpha_t^\rho h_t(x)\right)$, with $\alpha_t^\rho = \frac{1}{2} \log\left(\frac{\rho}{1-\rho} \cdot \frac{1-\epsilon_t}{\epsilon_t}\right)$.

Table 6: The standard formulation of the Adaboost-$\rho$ algorithm:
  Initialise $d_{1,i} = 1/m$.
  1. Select the weak learner $h_t$ that minimises the training error $\epsilon_t$, weighted by the current distribution of weights $d_{t,i}$: $\epsilon_t = \sum_{i : u_{t,i} = -1} d_{t,i}$. Check that $\epsilon_t < \rho$.
  2. Set $\alpha_t^\rho = \frac{1}{2} \log\left(\frac{\rho}{1-\rho} \cdot \frac{1-\epsilon_t}{\epsilon_t}\right)$.
  3. Update the weights according to $d_{t+1,i} = \frac{\exp\left(-\alpha_t^\rho (u_{t,i} + 2\rho - 1)\right)}{Z_t^\rho(\alpha_t^\rho)} d_{t,i}$, where $Z_t^\rho(\alpha_t^\rho) = \sum_{i=1}^m d_{t,i} \exp\left(-\alpha_t^\rho (u_{t,i} + 2\rho - 1)\right)$ is a normalisation factor which ensures that the $d_{t+1,i}$ form a distribution.
  Output the final classifier: $H_\rho(x) = \mathrm{sign}\left(\sum_{t=1}^T \alpha_t^\rho h_t(x)\right)$.

References

[1] G. Rätsch, T. Onoda, and K.-R. Müller, "Soft margins for AdaBoost", Machine Learning, vol. 42, no. 3, pp. 287-320, 2001. [Online]. Available: citeseer.ist.psu.edu/657521.html
[2] Y. Freund and R. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting", Journal of Computer and System Sciences, vol. 55, no. 1, 1997.
[3] J. Kivinen and M. K. Warmuth, "Boosting as entropy projection", in Proceedings of COLT 99. ACM, 1999.
[4] R. Schapire and Y. Singer, "Improved boosting algorithms using confidence-rated predictions", Machine Learning, vol. 37, no. 3, 1999.
[5] T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley Interscience, 1991.
[6] J. L. Kelly, Jr., "A new interpretation of information rate", Bell System Technical Journal, vol. 35, pp. 917-926, July 1956.
[7] M. Collins, R. E. Schapire, and Y. Singer, "Logistic regression, AdaBoost and Bregman distances", in Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, 2000.
[8] S. Della Pietra, V. Della Pietra, and J. Lafferty, "Inducing features of random fields", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 4, April 1997.
[9] S. Kullback, Information Theory and Statistics. Wiley, 1959.
[10] C. Rudin, R. E. Schapire, and I. Daubechies, "On the dynamics of boosting", in Advances in Neural Information Processing Systems, vol. 16, 2004.
[11] N. C. Oza and S. Russell, "Online bagging and boosting", in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, vol. 3, 2005, pp. 2340-2345.
[12] H. Grabner and H. Bischof, "On-line boosting and vision", in Proceedings of CVPR, vol. 1, 2006, pp. 260-267.
[13] S. Kullback, "Probability densities with given marginals", The Annals of Mathematical Statistics, vol. 39, no. 4, 1968.