Totally Corrective Boosting Algorithms that Maximize the Margin


Manfred K. Warmuth, Jun Liao — University of California at Santa Cruz, Santa Cruz, CA 95064, USA
Gunnar Rätsch — Friedrich Miescher Laboratory of the Max Planck Society, Spemannstr. 39, 72076 Tübingen, Germany

Boosting, Margins, Convergence, Relative Entropy, Bregman Divergences, Bregman Projection

Abstract

We consider boosting algorithms that maintain a distribution over a set of examples. At each iteration a weak hypothesis is received and the distribution is updated. We motivate these updates as minimizing the relative entropy subject to linear constraints. For example, AdaBoost constrains the edge of the last hypothesis w.r.t. the updated distribution to be at most γ = 0. In some sense, AdaBoost is corrective w.r.t. the last hypothesis. A cleaner boosting method is to be "totally corrective": the edges of all past hypotheses are constrained to be at most γ, where γ is suitably adapted. Using new techniques, we prove the same iteration bounds for the totally corrective algorithms as for their corrective versions. Moreover, with adaptive γ, the algorithms provably maximize the margin. Experimentally, the totally corrective versions return smaller convex combinations of weak hypotheses than the corrective ones and are competitive with LPBoost, a totally corrective boosting algorithm with no regularization, for which no iteration bound is known.

The first author was partially funded by the NSF grant CCR. The first two authors were partially funded by UC Discovery grant ITl and Telik Inc. grant ITl. Part of this work was done while the third author was visiting UC Santa Cruz. The authors thank Telik Inc. for providing the COX-1 dataset.

Appearing in Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006. Copyright 2006 by the author(s)/owner(s).

1. Introduction

In this paper we characterize boosting algorithms by the underlying optimization problems rather than the approximation algorithms that solve these problems. The goal is to select a small convex combination of weak hypotheses that maximizes the margin. For lack of space we only compare the algorithms in terms of this goal rather than the generalization error, and refer to (Schapire et al., 1998) for generalization bounds that improve with the margin and degrade with the size of the final convex combination.

One of the most common boosting algorithms is AdaBoost (Freund & Schapire, 1997; Schapire & Singer, 1999). It can be viewed as minimizing the relative entropy to the last distribution subject to the constraint that the edge of the last hypothesis is zero (equivalently, its weighted error is half) (Kivinen & Warmuth, 1999; Lafferty, 1999). One of the important properties of AdaBoost is that it has a decent iteration bound and approximately maximizes the margin of the examples (Breiman, 1997; Rätsch et al., 2001; Rudin et al., 2004a). A similar algorithm called AdaBoost*_ν provably maximizes the margin and has an analogous iteration bound (Rätsch & Warmuth, 2005).¹ This algorithm enforces only a single constraint at iteration t: the edge of the hypothesis must be at most γ, where γ is adapted. A natural idea is to constrain the edges of all t past hypotheses to be at most γ and otherwise minimize the relative entropy to the initial distribution. Such algorithms were proposed by Kivinen and Warmuth (1999) and are called "totally corrective". However, in that paper only γ = 0 was considered, which leads to

¹ Other algorithms for maximizing the margin with weaker iteration bounds are given in (Breiman, 1999; Rudin et al., 2004a).

an infeasible optimization problem when the training data is separable. Building on the work of Rätsch and Warmuth (2005), we now adapt the edge bound γ of the totally corrective algorithm so that the margin is approximately maximized. We call our new algorithm TotalBoost_ν. The corrective AdaBoost*_ν can be used as a heuristic for implementing TotalBoost by doing many passes over all past hypotheses before adding a new one. However, we can show that this heuristic is often several orders of magnitude less efficient than a vanilla sequential quadratic optimization approach for solving the optimization problem underlying TotalBoost.

A parallel progression occurred for on-line learning algorithms for disjunctions. The original algorithms (variants of the Winnow algorithm (Littlestone, 1988)) can be seen as processing a single constraint induced by the last example. However, more recently an on-line algorithm has been developed for learning disjunctions (in the noise-free case) that enforces the constraints induced by all past examples (Long & Wu, 2005). The proof techniques in both settings are essentially the same, except that for disjunctions the margin/threshold is fixed whereas in boosting we optimize the margin.

Besides emphasizing the new proof methods for iteration bounds of boosting algorithms, this paper also provides an experimental comparison of the algorithms. We show that while TotalBoost has the same iteration bound as AdaBoost*_ν, it often requires several orders of magnitude fewer iterations. When there are many similar weak hypotheses, the totally corrective algorithms have an additional advantage: assume we have 100 groups of 100 weak hypotheses each, where the hypotheses within each group are very similar. TotalBoost picks a small number of hypotheses from each group, whereas the algorithms that process one constraint at a time often come back to the same group and choose many more members from the same group. Therefore, in our experiments the number of weak hypotheses in the final convex combination (with non-zero coefficients) is consistently much smaller for the totally corrective algorithms, making them better suited for the purpose of feature selection.

Perhaps one of the simplest boosting algorithms is LPBoost: it is totally corrective, but unlike TotalBoost, it uses no entropic regularization. Also, the upper bound γ on the edge is chosen to be as small as possible in each iteration, whereas in TotalBoost it is decreased more moderately. Experimentally, we have identified cases where TotalBoost requires considerably fewer iterations than LPBoost, which suggests that either the entropic regularization or the moderate choice of γ is helpful for more than just proving iteration bounds.

2. Preliminaries

Assume we are given N labeled examples $(x_n, y_n)$, $1 \le n \le N$, where the examples are from some domain and the labels $y_n$ lie in $\{\pm 1\}$. A boosting algorithm combines many weak hypotheses or "rules of thumb" for the examples to form a convex combination of hypotheses with high accuracy. In this paper a boosting algorithm adheres to the following protocol: it maintains a distribution $d^t$ on the examples; in each iteration t a weak learner provides a weak hypothesis $h_t$ and the distribution $d^t$ is updated to $d^{t+1}$. Intuitively, the updated distribution incorporates the information obtained from $h_t$ and gives high weights to the remaining hard examples. After iterating T steps the algorithm stops and outputs a convex combination of the T weak hypotheses it received from the weak learner.
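This protocol fixes the interfaces that all algorithms in the paper share. The following minimal Python sketch is our own illustration of that loop; the names `weak_learner` and `update_distribution` are hypothetical placeholders for the components that the individual algorithms (AdaBoost, TotalBoost, LPBoost) instantiate.

```python
import numpy as np

def boost(X, y, weak_learner, update_distribution, T):
    """Generic boosting protocol: maintain a distribution d over the N
    examples, query the weak learner, and update d in each iteration.
    weak_learner(X, y, d) must return a hypothesis h: x -> [-1, 1];
    update_distribution encodes the specific algorithm."""
    N = len(y)
    d = np.full(N, 1.0 / N)            # uniform initial distribution d^1
    hypotheses = []
    for t in range(T):
        h = weak_learner(X, y, d)      # weak hypothesis h_t
        u = y * h(X)                   # u^t_n = y_n h_t(x_n)
        hypotheses.append(h)
        d = update_distribution(d, u)  # incorporate the new edge constraint
    return hypotheses                  # caller forms the convex combination
```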
We first discuss how we measure the performance of a weak hypothesis h w.r.t. the current distribution d. If h is ±1 valued, then the error $\epsilon$ is the total weight on all the examples that are misclassified. When the range of a hypothesis h is the entire interval [−1, +1], then the edge $\gamma_h(d) = \sum_{n=1}^N d_n\, y_n\, h(x_n)$ is a more convenient quantity for measuring the quality of h. This edge is an affine transformation of the error for the case when h has range ±1: $\epsilon_h(d) = \frac{1}{2} - \frac{1}{2}\gamma_h(d)$. Ideally we want a hypothesis of edge 1 (error 0). On the other hand, it is often easy to produce hypotheses of edge at least 0 (or equivalently, error at most 1/2). We define the edge of a set of hypotheses as the maximum of the edges.

Assumption on the weak learner: Assume that for any distribution d on the examples the weak learner returns a hypothesis h with edge $\gamma_h(d)$ at least g. As we will discuss later, the guarantee parameter g might not be known to the boosting algorithm.

Boosting algorithms produce a convex combination of weak hypotheses: $f_\alpha(x) := \sum_{t=1}^T \alpha_t h_t(x)$, where $h_t$ is the hypothesis added in iteration t and $\alpha_t$ is its coefficient. The margin of a given example $(x_n, y_n)$ is defined as $y_n f_\alpha(x_n)$. The margin of a set of examples is always the minimum over the examples. Our algorithms always produce a convex combination of weak learners of margin at least $g - \nu$, where ν is a precision parameter. Also, the size of the convex combination is at most $O(\frac{\log N}{\nu^2})$. Note that the higher the guarantee g of the weak learner, the larger the produced margin.
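The edge and margin quantities defined above are one-liners in code. The sketch below (plain numpy, with hypotheses represented by their prediction vectors; our own illustration, not from the paper) mirrors the definitions:

```python
import numpy as np

def edge(d, y, h_values):
    """Edge gamma_h(d) = sum_n d_n y_n h(x_n)."""
    return float(np.dot(d, y * h_values))

def error_from_edge(gamma):
    """For a +/-1 valued h, the weighted error is eps = 1/2 - gamma/2."""
    return 0.5 - 0.5 * gamma

def margin(alpha, y, H):
    """Margin of the example set under f_alpha = sum_t alpha_t h_t:
    the minimum of y_n f_alpha(x_n) over the examples. H is a (T, N)
    array whose row t holds h_t's predictions on the N examples."""
    return float((y * (alpha @ H)).min())
```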

Algorithm 1 LPBoost algorithm
1. Input: $S = (x_1, y_1), \dots, (x_N, y_N)$, desired accuracy ν.
2. Initialize: $d^1_n = 1/N$ for all $n = 1 \dots N$.
3. Do for $t = 1, \dots$
   (a) Train classifier on $\{S, d^t\}$ and obtain hypothesis $h_t: x \mapsto [-1, 1]$; let $u^t_n = y_n h_t(x_n)$.
   (b) Calculate the edge $\gamma_t$ of $h_t$: $\gamma_t = d^t \cdot u^t$.
   (c) Set $\hat\gamma_t = \min_{q=1,\dots,t} \gamma_q$.
   (d) Compute $\gamma^*_t$ as in (1) and set $d^{t+1}$ to any distribution d for which $u^q \cdot d \le \gamma^*_t$, for $1 \le q \le t$.
   (e) If $\gamma^*_t \ge \hat\gamma_t - \nu$, then $T = t$ and break.
4. Output: $f_\alpha(x) = \sum_{t=1}^T \alpha_t h_t(x)$, where the coefficients $\alpha_t$ realize margin $\gamma^*_T$.

How are edges and margins related? By duality, the minimum edge of the examples w.r.t. the hypothesis set $H_t = \{h_1, \dots, h_t\}$ equals the maximum margin:

$$\gamma^*_t := \min_d \max_{h \in H_t} \gamma_h(d) = \max_\alpha \min_n y_n f_\alpha(x_n) =: \rho^*_t, \qquad (1)$$

where d and α are N- and t-dimensional probability vectors, respectively. Note that the sequence $\gamma^*_t$ is non-decreasing. It will approach the guarantee g from below. The algorithms will stop as soon as the edges are within ν of g (see next section).

The above duality also restricts the range of the guarantee g that a weak learner can possibly have. Let H be the entire (possibly infinite) hypothesis set from which the weak learner is choosing. If H is compact (see discussion in Rätsch & Warmuth, 2005), then

$$\gamma^* := \min_d \max_{h \in H} \gamma_h(d) = \max_\alpha \min_n y_n f_\alpha(x_n) =: \rho^*,$$

where d and α are probability distributions over the examples and H, respectively, and $f_\alpha(x_n)$ now sums over H. Clearly $g \le \rho^*$, and for any non-optimal d, α:

$$\max_{h \in H} \gamma_h(d) > \gamma^* = \rho^* > \min_n y_n f_\alpha(x_n) =: \rho(\alpha). \qquad (2)$$

So even though there always is a weak hypothesis in H with edge at least $\rho^*$, the weak learner is only guaranteed to produce one of edge at least $g \le \rho^*$.

One of the most bare-bones boosting algorithms is LPBoost (Algorithm 1), proposed by Grove and Schuurmans (1998) and Bennett et al. (2000). It uses linear programming to constrain the edges of the past t weak hypotheses to be at most $\gamma^*_t$, which is as small as possible. No iteration bound is known for this algorithm, and also the performance can very much depend on which LP solver is used (see the experimental section).
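Step (d) of LPBoost is exactly the left-hand side of the duality (1): a linear program over the probability simplex. Below is a minimal sketch of that per-iteration LP using scipy's linprog (the paper itself uses CPLEX; the function name and interface here are our own):

```python
import numpy as np
from scipy.optimize import linprog

def lpboost_step(U):
    """Solve gamma*_t = min_d max_q u^q . d over the simplex, as in (1).
    U is a (t, N) array whose rows are the vectors u^q.
    Returns (gamma_star, d) where d satisfies u^q . d <= gamma_star."""
    t, N = U.shape
    # variables z = (d_1, ..., d_N, gamma); objective: minimize gamma
    c = np.zeros(N + 1)
    c[-1] = 1.0
    # edge constraints: u^q . d - gamma <= 0 for each past hypothesis q
    A_ub = np.hstack([U, -np.ones((t, 1))])
    b_ub = np.zeros(t)
    # d must be a probability vector; gamma is a free variable
    A_eq = np.hstack([np.ones((1, N)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0.0, None)] * N + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:N]
```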
Algorithm 2 TotalBoost_ν with accuracy param. ν
1. Input: $S = (x_1, y_1), \dots, (x_N, y_N)$, desired accuracy ν.
2. Initialize: $d^1_n = 1/N$ for all $n = 1 \dots N$.
3. Do for $t = 1, \dots$
   (a) Train classifier on $\{S, d^t\}$ and obtain hypothesis $h_t: x \mapsto [-1, 1]$; let $u^t_n = y_n h_t(x_n)$.
   (b) Calculate the edge $\gamma_t$ of $h_t$: $\gamma_t = d^t \cdot u^t$.
   (c) Set $\hat\gamma_t = (\min_{q=1,\dots,t} \gamma_q) - \nu$.
   (d) Update weights: $d^{t+1} = \operatorname{argmin}_{\{d \in \mathcal{P}^N :\ d \cdot u^q \le \hat\gamma_t,\ \text{for } 1 \le q \le t\}} \Delta(d, d^1)$.
   (e) If the above is infeasible or $d^{t+1}$ contains a zero, then $T = t$ and break.
4. Output: $f_\alpha(x) = \sum_{t=1}^T \alpha_t h_t(x)$, where the coefficients $\alpha_t$ maximize the margin over the hypothesis set $\{h_1, \dots, h_T\}$.

Algorithm 3 TotalBoost_g with accuracy parameter ν and edge guarantee g
As TotalBoost_ν, but in step 3(c) we use $\hat\gamma_t = g - \nu$.

Our algorithms are motivated by the minimum relative entropy principle of Jaynes: among the solutions satisfying some linear constraints, choose the one that minimizes a relative entropy to the initial distribution $d^1$, where the relative entropy is defined as follows:

$$\Delta(\tilde d, d) = \sum_n \tilde d_n \ln \frac{\tilde d_n}{d_n}.$$

Our default initial distribution is uniform. However, the analysis works for any choice of $d^1$ with non-zero components. There are two totally corrective versions of the algorithm: one that knows the guarantee g of the weak learner and one that does not. The one that does (called TotalBoost_g; Algorithm 3) simply constrains the edges of the previous hypotheses to be at most $g - \nu$, where ν is a given precision parameter. Our main algorithm, TotalBoost_ν (Algorithm 2), does not know g. It maintains the estimates $\hat\gamma_t = (\min_{q=1}^t \gamma_q) - \nu$ and constrains the edges of the past hypotheses to be at most $\hat\gamma_t$. The sequence $\{\hat\gamma_t\}_t$ is clearly non-increasing. By our assumption $\gamma_t \ge g$, and therefore $\hat\gamma_t \ge g - \nu$.

3. Termination Guarantees

When the algorithms break, we need to guarantee that the margin w.r.t. the current hypothesis set is at least $g - \nu$.

Algorithm 4 AdaBoost*_ν with accuracy parameter ν
As TotalBoost_ν, but minimize the divergence to the last distribution w.r.t. a single constraint:
$$d^{t+1} = \operatorname{argmin}_{\{d :\ d \cdot u^t \le \hat\gamma_t\}} \Delta(d, d^t).$$
Let $\alpha_t$ be the dual coefficient of the constraint on the edge of $h_t$ used in iteration t. The algorithm breaks if the margin w.r.t. the current convex combination (i.e. the normalized $\alpha_t$) is at least $\hat\gamma_t$.

Algorithm 5 AdaBoost*_g with accuracy parameter ν and guarantee g
As AdaBoost*_ν, but in step 3(c) we use $\hat\gamma_t = g - \nu$.

TotalBoost_g is given g and constrains the edges of all past hypotheses to be at most $g - \nu$. When these become infeasible, the edge $\gamma^*_t$ w.r.t. the current hypothesis set is larger than $g - \nu$. The algorithm also breaks when the solution $d^{t+1}$ of the minimization problem lies at the boundary of the simplex (i.e. the distribution has a zero component).² In this case $\gamma^*_t = g - \nu$, because if $\gamma^*_t < g - \nu$, then all constraints would have slack and the solution d that minimizes the divergence $\Delta(d, d^1)$ would lie in the interior of the simplex, since $d^1$ does. Thus, whenever the algorithm breaks, we have $\gamma^*_t \ge g - \nu$. TotalBoost_g outputs a convex combination of the hypotheses $\{h_1, \dots, h_T\}$ that maximizes the margin. By duality, the value $\rho^*_t$ of this margin equals the minimum edge $\gamma^*_t$, and therefore TotalBoost_g is guaranteed to output a combined hypothesis of margin at least $g - \nu$.

The second algorithm, TotalBoost_ν, does not know the guarantee g of the weak learner. It breaks if its optimization problem becomes infeasible, which happens when $\gamma^*_t > \hat\gamma_t \ge g - \nu$. The algorithm also breaks when the solution $d^{t+1}$ of the minimization problem lies at the boundary of the simplex. In this case, $\gamma^*_t = \hat\gamma_t$ by an argument similar to the one used above. Thus, whenever the algorithm breaks, we have $\gamma^*_t \ge \hat\gamma_t \ge g - \nu$, and therefore TotalBoost_ν is guaranteed to output a hypothesis of margin $\rho^*_t = \gamma^*_t \ge g - \nu$.

The termination condition for LPBoost³ follows a similar argument: we directly check for $\gamma^*_t \ge \hat\gamma_t - \nu$. The algorithm AdaBoost*_ν computes the margin using the normalized dual coefficients $\alpha_t$ of its constraints and stops as soon as this margin is at least $\hat\gamma_t$. Finally, AdaBoost*_g breaks when the same margin is at least $g - \nu$. For both of these algorithms the current distribution $d^t$ lies in the interior, because the dual coefficients $\alpha_t$ are finite and $d^t_n \propto d^1_n \exp(-\sum_{q=1}^{t-1} \alpha_q u^q_n)$.

² This second condition for breaking is only added to ensure that the dual variables of the optimization problem of TotalBoost_ν remain finite.
³ We use a different termination condition for LPBoost than in (Bennett et al., 2000; Grove & Schuurmans, 1998).
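The single-constraint projection of Algorithm 4 has a convenient dual form: by Lagrangian duality, minimizing the relative entropy to $d^t$ subject to one edge constraint yields an exponential update $d^{t+1}_n \propto d^t_n \exp(-\alpha_t u^t_n)$ with $\alpha_t \ge 0$ chosen to make the constraint tight when it is active. The sketch below is our own illustration; it recovers $\alpha_t$ by bisection on the monotonically decreasing edge:

```python
import numpy as np

def corrective_update(d, u, gamma_hat, tol=1e-12):
    """Entropy projection of d onto {d' : d'.u <= gamma_hat}.
    The solution is d'_n proportional to d_n exp(-alpha u_n), alpha >= 0;
    alpha = 0 if the constraint already holds, else alpha makes it tight."""
    def project(alpha):
        # shift the exponent by u.min() for numerical stability; the
        # constant factor cancels in the normalization
        w = d * np.exp(-alpha * (u - u.min()))
        return w / w.sum()

    if np.dot(d, u) <= gamma_hat:              # constraint already satisfied
        return d
    lo, hi = 0.0, 1.0
    while np.dot(project(hi), u) > gamma_hat:  # grow an upper bracket for alpha
        hi *= 2.0
        if hi > 1e8:                           # target edge below min_n u_n
            raise ValueError("constraint infeasible for any finite alpha")
    while hi - lo > tol:                       # bisect: edge decreases in alpha
        mid = 0.5 * (lo + hi)
        if np.dot(project(mid), u) > gamma_hat:
            lo = mid
        else:
            hi = mid
    return project(hi)
```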
4. Iteration Bound

In the previous section we showed that when the algorithms break, the output hypothesis has margin at least $g - \nu$. We now show that TotalBoost_ν must break after $T \le \frac{2 \ln N}{\nu^2}$ iterations. In each iteration t, the algorithm updates the distribution to the one that is closest to $d^1$ and lies in a certain convex set, and these sets get smaller as t increases. Here closeness is measured with the relative entropy, which is a special Bregman divergence. This closest point is called a projection of $d^1$ onto the convex set ($d^1$ is assumed to lie in the interior of the simplex). The proof is analogous to an on-line mistake bound for learning disjunctions (Long & Wu, 2005). It employs the Generalized Pythagorean Theorem that holds for such projections w.r.t. any Bregman divergence (Bregman, 1967, Lemma 1; Herbster & Warmuth, 2001, Theorem 2).

Theorem 1 TotalBoost_ν breaks after at most $\frac{2 \ln N}{\nu^2}$ iterations.

Proof Let $C_t$ denote the convex set of all points $d \in \mathbb{R}^N$ that satisfy $\sum_n d_n = 1$, $d_n \ge 0$ (for $1 \le n \le N$), and the edge constraints $d \cdot u^q \le \hat\gamma_t$ for $1 \le q \le t$, where $u^q_n = y_n h_q(x_n)$. The distribution $d^t$ at iteration $t \ge 1$ is the projection of $d^1$ onto the closed convex set $C_{t-1}$. Notice that $C_0$ is the entire simplex, and because $\hat\gamma_t$ can only decrease and a new constraint is added in trial t, we have $C_t \subseteq C_{t-1}$. If $t \le T - 1$, then our termination condition assures that at trial $t \ge 1$ the set $C_{t-1}$ has a feasible solution in the interior of the simplex. Also, $d^1$ lies in the interior and $d^{t+1} \in C_t \subseteq C_{t-1}$. These preconditions assure that at trial $t \ge 1$ the projection $d^t$ of $d^1$ onto $C_{t-1}$ exists and the Generalized Pythagorean Theorem for Bregman divergences can be applied:

$$\Delta(d^{t+1}, d^1) - \Delta(d^t, d^1) \ge \Delta(d^{t+1}, d^t). \qquad (3)$$

Since $d^t \cdot u^t = \gamma_t$ and $d^{t+1} \cdot u^t \le \hat\gamma_t \le \gamma_t - \nu$, we have $d^t \cdot u^t - d^{t+1} \cdot u^t \ge \nu$, and because $u^t \in [-1, 1]^N$, $\|d^{t+1} - d^t\|_1 \ge \nu$. We now apply Pinsker's inequality: $\|d^{t+1} - d^t\|_1 \ge \nu$ implies that

$$\Delta(d^{t+1}, d^t) > \frac{\nu^2}{2}. \qquad (4)$$

By summing (3) over the first $T - 1$ trials we obtain

$$\Delta(d^T, d^1) - \underbrace{\Delta(d^1, d^1)}_{0} > (T - 1)\,\frac{\nu^2}{2}.$$

Since the left-hand side is at most $\ln N$, the bound of the theorem follows.
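Both analytic ingredients of this proof are easy to sanity-check numerically. The snippet below (our own check, not from the paper) verifies Pinsker's inequality $\Delta(p, q) \ge \frac{1}{2}\|p - q\|_1^2$ on random pairs of distributions:

```python
import numpy as np

def relative_entropy(p, q):
    """Delta(p, q) = sum_n p_n ln(p_n / q_n), in nats."""
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

rng = np.random.default_rng(0)
for _ in range(1000):
    p = rng.dirichlet(np.ones(10))
    q = rng.dirichlet(np.ones(10))
    l1 = np.abs(p - q).sum()
    assert relative_entropy(p, q) >= 0.5 * l1**2 - 1e-12  # Pinsker holds
```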

The key requirement for this proof is that the closed and convex constraint sets $C_t$ used for the projection at trial t must be non-increasing. It is therefore easy to see that the iteration bound also holds for the TotalBoost_g algorithm, because of our assumption that $\gamma_t \ge g$. In the complete paper we prove the same iteration bound for the corrective versions AdaBoost*_ν and AdaBoost*_g, and for the variants of TotalBoost where $\operatorname{argmin} \Delta(d, d^1)$ is replaced by $\operatorname{argmin} \Delta(d, d^t)$.

5. Experiments

In this section we illustrate the behavior of our new algorithms TotalBoost_ν and TotalBoost_g, and compare them with LPBoost and AdaBoost*_ν on three different datasets:

Dataset 1 is a public dataset from Telik Inc. for a drug discovery problem called COX-1: 125 binary labeled examples with a set of 3888 binary features that are complementation closed.

Dataset 2 is an artificial dataset used in Rudin et al. (2004b) for investigating boosting algorithms that maximize the margin: 50 binary labeled examples with 100 binary features. For each original feature we added 99 similar features by inverting the feature value of one randomly chosen example (with replacement). This results in a 10,000-dimensional feature set of 100 blocks of size 100.

Dataset 3 is a series of artificially generated datasets of 1000 examples with a varying number of features but roughly constant margin. We first generated $N_1$ random ±1-valued features $x_1, \dots, x_{N_1}$ and set the label of the examples as $y = \operatorname{sign}(x_1 + x_2 + x_3 + x_4 + x_5)$. We then duplicated each feature $N_2$ times, perturbed the features by Gaussian noise with σ = 0.1, and clipped the feature values so that they lie in the interval [−1, 1]. We considered $N_1$ = 1, 10, 100 and $N_2$ = 10, 100, 1000.

The features of our datasets represent the values of the available weak hypotheses on the examples. In each iteration of boosting, the base learner simply selects the feature that maximizes the edge w.r.t. the current distribution d on the examples. This means that the guarantee g equals the maximum margin $\rho^*$. Note that our datasets and the base learner were chosen to exemplify certain properties of the algorithms, and more extensive experiments are still needed.

We first discuss how the entropy minimization problems can be solved efficiently. We then compare the algorithms w.r.t. the number of iterations and the number of selected hypotheses. Finally, we show how LPBoost is affected by the underlying optimizer and exhibit cases where LPBoost requires considerably more iterations than TotalBoost.

5.1. Solving the Entropy Problems

We use a vanilla sequential quadratic programming algorithm (Nocedal & Wright, 2000) for solving our main optimization problem:

$$\min_{d:\ \sum_n d_n = 1,\ d \ge 0,\ u^q \cdot d \le \hat\gamma_t\ (1 \le q \le t)}\ \ \sum_{n=1}^N d_n \log \frac{d_n}{d^1_n}.$$

We initially set our approximate solution $\tilde d$ to $d^1$ and iteratively optimize $\tilde d$. Given that the current solution $\tilde d$ satisfies the constraints $\sum_n \tilde d_n = 1$ and $\tilde d \ge 0$, we determine an update δ by solving the following problem:

$$\min_\delta\ \sum_{n=1}^N \left( \left(1 + \log \frac{\tilde d_n}{d^1_n}\right) \delta_n + \frac{1}{2 \tilde d_n}\, \delta_n^2 \right),$$

w.r.t. the constraints $\tilde d_n + \delta_n \ge 0$, $\sum_n \delta_n = 0$, and $u^q \cdot (\tilde d + \delta) \le \hat\gamma_t$ (for $1 \le q \le t$). The estimate $\tilde d$ is updated to $\tilde d \leftarrow \tilde d + \delta$, and we repeat this process until convergence. The algorithm typically converges in very few steps. Note that the above objective is the 2nd order Taylor approximation of the relative entropy $\Delta(\tilde d + \delta, d^1)$ at δ = 0. The resulting optimization problem is quadratic with a diagonal Hessian and can be efficiently solved by off-the-shelf optimizer packages (e.g. ILOG CPLEX).
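For illustration, the whole projection of step 3(d) can also be handed to a generic solver. The sketch below uses scipy's SLSQP instead of the CPLEX-based sequential quadratic scheme described above; it is only meant to make the optimization problem concrete, not to reproduce the paper's solver:

```python
import numpy as np
from scipy.optimize import minimize

def entropy_projection(d1, U, gamma_hat):
    """d^{t+1} = argmin Delta(d, d1) over the simplex subject to
    u^q . d <= gamma_hat for every row u^q of U.
    Returns None if the solver reports failure (e.g. an empty
    constraint set, which is the algorithm's termination signal)."""
    N = len(d1)
    objective = lambda d: float(np.sum(d * np.log(np.maximum(d, 1e-300) / d1)))
    constraints = [
        {"type": "eq", "fun": lambda d: d.sum() - 1.0},        # simplex
        {"type": "ineq", "fun": lambda d: gamma_hat - U @ d},  # edges <= gamma_hat
    ]
    res = minimize(objective, d1, method="SLSQP",
                   bounds=[(0.0, 1.0)] * N, constraints=constraints)
    return res.x if res.success else None
```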
5.2. Number of Iterations

First, we consider the number of iterations needed until each of the algorithms has achieved a margin of at least $\rho^* - \nu$. We use dataset 1 and record the margin of the convex combination of hypotheses produced by TotalBoost_ν, LPBoost and AdaBoost*_ν. Additionally, we compute the maximal margin of the current hypothesis sets in each iteration. See Figure 1 for details. The default optimizer used for solving LPs and QPs is ILOG CPLEX's interior point method.

It should be noted that AdaBoost*_ν needs considerably less computation per iteration than the totally corrective algorithms. In the case where calling the base learner is very cheap, AdaBoost*_ν may in some unusual cases require less computation time than TotalBoost_ν. However, in our experiments, the number of iterations required by AdaBoost*_ν to achieve margin at least $\rho^* - \nu$ was 1/10 times the theoretical upper bound $\log(N)/\nu^2$.

Figure 1: TotalBoost_ν, LPBoost and AdaBoost*_ν on dataset 1 for ν = 0.03, 0.01, 0.003: We show the margin realized by the normalized dual coefficients $\hat\alpha_t$ of TotalBoost_ν and AdaBoost*_ν (green) and the LP-optimized margin $\rho^*_t$ of (1) (blue). Observe that AdaBoost*_ν needs several thousand iterations, while the numbers of iterations of TotalBoost_ν and LPBoost are comparable. The margins of TotalBoost_ν and AdaBoost*_ν start growing slowly, in particular when ν is small. The margin of TotalBoost_g (with guarantee g = ρ*) increases faster than that of LPBoost (not shown).

TotalBoost_ν typically requires much fewer iterations, even though no improved theoretical bound is known for this algorithm. In our experience, the iteration number of TotalBoost_ν depends only slightly on the precision parameter ν, and when $\hat\gamma_t$ is close to $\rho^*$, this algorithm converges very fast to the maximum margin solution (LPBoost shows similar behavior). While the algorithms AdaBoost*_ν and TotalBoost_ν provably maximize the margin, they both have the problem of starting too slowly for small ν. If there is any good upper bound available for the guarantee g (which here is the optimal margin ρ*), then we can initialize $\hat\gamma_t$ with this upper bound and speed up the starting phase. In particular, when ρ* is known exactly, the algorithms AdaBoost*_g and TotalBoost_g require drastically fewer iterations and the latter consistently beats LPBoost (not shown). In practical situations it is often easy to obtain a reasonable upper bound for g.

5.3. Number of Hypotheses

In this subsection, we compare how many hypotheses the algorithms need to achieve a large margin. Note that LPBoost and TotalBoost_ν only select a base hypothesis once: after the first selection, the distribution d is maintained such that the edge for that hypothesis is smaller than $\hat\gamma_t$, and it is not selected again. AdaBoost*_ν may select the same hypothesis many times. However, if there are several similar features (as in datasets 2 & 3), then this corrective algorithm often selects hypotheses that are similar to previously selected ones, and the number of weak hypotheses used in the final convex combination is unnecessarily large. Hence, TotalBoost_ν and LPBoost seem better suited for feature selection, when small ensembles are needed.

In Figure 2 we display the margin vs. the number of used and selected hypotheses. The number of selected hypotheses for LPBoost and TotalBoost_ν is equal to the number of iterations. For these algorithms a previously selected hypothesis can become inactive (corresponding α = 0). In this case it is not counted as a used hypothesis. Note that the number of used hypotheses for LPBoost may depend on the choice of the optimizer (also see the discussion below). In the case of AdaBoost*_ν, all dual coefficients $\alpha_t$ are non-zero in the final convex combination. (See the caption of Figure 2 for more details.) We can conclude that the totally corrective algorithms need considerably fewer hypotheses when there are many redundant hypotheses/features. LPBoost and TotalBoost_ν differ in the initial iterations (depending on ν), but produce combined hypotheses of similar size.

In Figure 3 we compare the effect of different choices of the optimizer for LPBoost. For dataset 2 there is a surprisingly large difference between interior point and simplex based methods. The reason is that the weights computed by the simplex method are often sparse, and the changes in the duplicated features are sparse as well (by design). Hence, it can easily happen that the base learner is blind on some examples when selecting the hypotheses. Interior point methods find a solution in the interior and therefore distribute the weights among the examples.
To illustrate that this is the right explanation, we modify LPBoost such that it first computes $\gamma^*_t$, but then it computes the weights using the relative entropy minimization with $\hat\gamma_t = \gamma^*_t + \epsilon$ (where $\epsilon = 10^{-4}$). We call this the regularized LPBoost algorithm.

Figure 2: TotalBoost_ν, LPBoost and AdaBoost*_ν on dataset 2 for ν = 0.01: [left & middle] The realized (green) and the LP-optimized (blue) margin $\rho^*_t$ (as in Figure 1) vs. the number of used (active) and selected (active or inactive) hypotheses in the convex combination. We observe that the totally corrective algorithms use considerably fewer hypotheses than AdaBoost*_ν. If ν ≤ 0.01, then TotalBoost_ν is again affected by the slow start, which leads to a relatively large number of selected hypotheses in the beginning. [right] The number of selected hypotheses vs. the number of selected blocks of hypotheses. AdaBoost*_ν often chooses additional hypotheses from previously chosen blocks, while LPBoost typically uses only one per block and TotalBoost_ν a few per block. When ν = .1, TotalBoost_ν behaves more like LPBoost (not shown).

We observe in Figure 3 that the regularization considerably improves the convergence speed to $\rho^*$ of the simplex based solver.

5.4. Redundancy in High Dimensions

We found that LPBoost usually performs very well and is very competitive with TotalBoost_ν in terms of the number of iterations. Additionally, it only needs to solve linear and not entropy minimization problems. However, no iteration bound is known for LPBoost that is independent of the size of the hypothesis set. We performed a series of experiments with increasing dimensionality and compared LPBoost's and TotalBoost_ν's convergence speed. We found that in rather high dimensional cases, LPBoost converges quite slowly when features are redundant (see Figure 4 for an example using dataset 3). In future work, we will investigate why LPBoost converges more slowly in this example and construct more extreme datasets that show this.

6. Conclusion

We view boosting as a relative entropy projection method and obtain our iteration bounds without bounding the average training error in terms of the product of exponential potentials, as is customarily done in the boosting literature (see e.g. Schapire and Singer (1999)). In the full paper we will relate our methods to the latter, slightly longer proof style.

The proof technique based on Bregman projections and the Generalized Pythagorean Theorem is very versatile. The iteration bound of $O(\frac{\log N}{\nu^2})$ holds for all boosting algorithms that use constrained minimization of any Bregman divergence $\Delta(\cdot, \cdot)$ over a domain that contains the probability simplex, for which $\inf_{d \in C_t} \Delta(d, d^t) = \Omega(\nu^2)$ and $\Delta(d^T, (\frac{1}{N})\mathbf{1}) = O(\log N)$. For example, the sum of binary entropies has both these properties:

$$\inf_{C_t} \overbrace{\sum_n \left( d_n \ln \frac{d_n}{d^t_n} + (1 - d_n) \ln \frac{1 - d_n}{1 - d^t_n} \right)}^{\Delta_2(d,\, d^t)} \ \ge\ \inf_{C_t} \Delta(d, d^t) + \underbrace{\inf_{d:\ \sum_n d_n = 1} \Delta(\mathbf{1} - d,\ \mathbf{1} - d^t)}_{0} \ \stackrel{(4)}{>}\ \frac{\nu^2}{2},$$

where the first inequality follows from splitting the inf and dropping one of the constraints from the constraint set $C_t$, and $\mathbf{1}$ denotes the all-one vector. Furthermore, $\Delta_2(d^T, (\frac{1}{N})\mathbf{1}) \le (\ln N) + 1$, and this leads to an iteration bound of $\frac{2((\ln N) + 1)}{\nu^2}$ (a numerical sanity check of these two properties is sketched below). The corrective version based on this divergence has been called LogitBoost (Friedman et al., 2000; Duffy & Helmbold, 2000). The above reasoning immediately provides $O(\frac{\log N}{\nu^2})$ iteration bounds for the totally corrective versions of LogitBoost that maximize the margin. Even though the theoretical bounds for the LogitBoost variants are essentially the same as the bounds for the standard relative entropy algorithms discussed in this paper, the LogitBoost variants are marginally inferior in practice (not shown).

Both the corrective and totally corrective algorithms for maximizing the margin start rather slowly, and heuristics are needed for decreasing the edge bound $\hat\gamma_t$ so that this slow start is avoided.
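The snippet below (our own sanity check, not from the paper) computes the sum of binary entropies $\Delta_2$ and confirms numerically that $\Delta_2(d, (\frac{1}{N})\mathbf{1}) \le (\ln N) + 1$ on random distributions:

```python
import numpy as np

def delta2(p, q):
    """Sum of binary relative entropies:
    sum_n [ p_n ln(p_n/q_n) + (1-p_n) ln((1-p_n)/(1-q_n)) ]."""
    def part(a, b):
        out = np.zeros_like(a)
        m = a > 0
        out[m] = a[m] * np.log(a[m] / b[m])
        return out
    return float(np.sum(part(p, q) + part(1.0 - p, 1.0 - q)))

rng = np.random.default_rng(1)
N = 50
uniform = np.full(N, 1.0 / N)          # the distribution (1/N) * 1
for _ in range(1000):
    d = rng.dirichlet(np.ones(N))
    assert delta2(d, uniform) <= np.log(N) + 1.0 + 1e-9
```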

Figure 3: LPBoost with different optimizers: shown is the margin vs. the number of selected hypotheses. Different optimizers lead to the selection of different hypotheses with varying maximum margins. Adding a regularizer (see text) significantly improves the simplex solution in some cases.

Figure 4: LPBoost vs. TotalBoost_ν on two 100,000-dimensional datasets. Shown are the margins vs. the number of iterations: [left] data with 100 duplicated blocks (with clipped Gaussian noise) and [right] data with independent features. For TotalBoost_ν, we depict the realized (green) and the LP-optimized (blue) margin. When there are lots of duplicated features, LPBoost stalls after an initial fast phase, while it performs well in other cases. We did not observe this behavior for TotalBoost_ν or AdaBoost*_ν (not shown). The difference becomes larger when the block size is increased.

For practical noisy applications, boosting algorithms are needed that allow for a bias term and for soft margins. LPBoost has already been used this way in Bennett et al. (2000), but no iteration bounds are known for any version of LPBoost. We show in the full paper that our methodology still leads to iteration bounds for boosting algorithms with entropic regularization when a bias term is added. Iteration bounds for soft margin versions are left as future research.

References

Bennett, K., Demiriz, A., & Shawe-Taylor, J. (2000). A column generation algorithm for boosting. Proc. ICML (pp. 65–72). Morgan Kaufmann.

Bregman, L. (1967). The relaxation method for finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Math. and Math. Physics, 7.

Breiman, L. (1997). Prediction games and arcing algorithms (Technical Report 504). Statistics Department, University of California at Berkeley.

Breiman, L. (1999). Prediction games and arcing algorithms. Neural Computation, 11.

Duffy, N., & Helmbold, D. (2000). Potential boosters? NIPS '00.

Freund, Y., & Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. of Comp. & Sys. Sci., 55.

Friedman, J., Hastie, T., & Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28.

Grove, A., & Schuurmans, D. (1998). Boosting in the limit: Maximizing the margin of learned ensembles. Proc. 15th Nat. Conf. on Art. Int.

Herbster, M., & Warmuth, M. (2001). Tracking the best linear predictor. J. Mach. Learn. Res.

Kivinen, J., & Warmuth, M. (1999). Boosting as entropy projection. COLT '99.

Lafferty, J. (1999). Additive models, boosting, and inference for generalized divergences. COLT '99.

Littlestone, N. (1988). Learning when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2.

Long, P. M., & Wu, X. (2005). Mistake bounds for maximum entropy discrimination. NIPS '04.

Nocedal, J., & Wright, S. (2000). Numerical optimization. Springer Series in Op. Res. Springer.

Rätsch, G., Onoda, T., & Müller, K.-R. (2001). Soft margins for AdaBoost. Machine Learning, 42.

Rätsch, G., & Warmuth, M. K. (2005). Efficient margin maximizing with boosting. J. Mach. Learn. Res.

Rudin, C., Daubechies, I., & Schapire, R. (2004a). Dynamics of AdaBoost: Cyclic behavior and convergence of margins. J. Mach. Learn. Res.

Rudin, C., Schapire, R., & Daubechies, I. (2004b). Analysis of boosting algorithms using the smooth margin function: A study of three algorithms. Unpublished manuscript.

Schapire, R., Freund, Y., Bartlett, P., & Lee, W. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. The Annals of Statistics, 26.

Schapire, R., & Singer, Y. (1999).
Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37.


More information

Normal Distribution.

Normal Distribution. Normal Distributio www.icrf.l Normal distributio I probability theory, the ormal or Gaussia distributio, is a cotiuous probability distributio that is ofte used as a first approimatio to describe realvalued

More information

Hypergeometric Distributions

Hypergeometric Distributions 7.4 Hypergeometric Distributios Whe choosig the startig lie-up for a game, a coach obviously has to choose a differet player for each positio. Similarly, whe a uio elects delegates for a covetio or you

More information

THE ABRACADABRA PROBLEM

THE ABRACADABRA PROBLEM THE ABRACADABRA PROBLEM FRANCESCO CARAVENNA Abstract. We preset a detailed solutio of Exercise E0.6 i [Wil9]: i a radom sequece of letters, draw idepedetly ad uiformly from the Eglish alphabet, the expected

More information

INVESTMENT PERFORMANCE COUNCIL (IPC)

INVESTMENT PERFORMANCE COUNCIL (IPC) INVESTMENT PEFOMANCE COUNCIL (IPC) INVITATION TO COMMENT: Global Ivestmet Performace Stadards (GIPS ) Guidace Statemet o Calculatio Methodology The Associatio for Ivestmet Maagemet ad esearch (AIM) seeks

More information

AMS 2000 subject classification. Primary 62G08, 62G20; secondary 62G99

AMS 2000 subject classification. Primary 62G08, 62G20; secondary 62G99 VARIABLE SELECTION IN NONPARAMETRIC ADDITIVE MODELS Jia Huag 1, Joel L. Horowitz 2 ad Fegrog Wei 3 1 Uiversity of Iowa, 2 Northwester Uiversity ad 3 Uiversity of West Georgia Abstract We cosider a oparametric

More information

A Constant-Factor Approximation Algorithm for the Link Building Problem

A Constant-Factor Approximation Algorithm for the Link Building Problem A Costat-Factor Approximatio Algorithm for the Lik Buildig Problem Marti Olse 1, Aastasios Viglas 2, ad Ilia Zvedeiouk 2 1 Ceter for Iovatio ad Busiess Developmet, Istitute of Busiess ad Techology, Aarhus

More information

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth Questio 1: What is a ordiary auity? Let s look at a ordiary auity that is certai ad simple. By this, we mea a auity over a fixed term whose paymet period matches the iterest coversio period. Additioally,

More information

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method Chapter 6: Variace, the law of large umbers ad the Mote-Carlo method Expected value, variace, ad Chebyshev iequality. If X is a radom variable recall that the expected value of X, E[X] is the average value

More information

HIGH-DIMENSIONAL REGRESSION WITH NOISY AND MISSING DATA: PROVABLE GUARANTEES WITH NONCONVEXITY

HIGH-DIMENSIONAL REGRESSION WITH NOISY AND MISSING DATA: PROVABLE GUARANTEES WITH NONCONVEXITY The Aals of Statistics 2012, Vol. 40, No. 3, 1637 1664 DOI: 10.1214/12-AOS1018 Istitute of Mathematical Statistics, 2012 HIGH-DIMENSIONAL REGRESSION WITH NOISY AND MISSING DATA: PROVABLE GUARANTEES WITH

More information

Overview. Learning Objectives. Point Estimate. Estimation. Estimating the Value of a Parameter Using Confidence Intervals

Overview. Learning Objectives. Point Estimate. Estimation. Estimating the Value of a Parameter Using Confidence Intervals Overview Estimatig the Value of a Parameter Usig Cofidece Itervals We apply the results about the sample mea the problem of estimatio Estimatio is the process of usig sample data estimate the value of

More information

Reliability Analysis in HPC clusters

Reliability Analysis in HPC clusters Reliability Aalysis i HPC clusters Narasimha Raju, Gottumukkala, Yuda Liu, Chokchai Box Leagsuksu 1, Raja Nassar, Stephe Scott 2 College of Egieerig & Sciece, Louisiaa ech Uiversity Oak Ridge Natioal Lab

More information

Stochastic Online Scheduling with Precedence Constraints

Stochastic Online Scheduling with Precedence Constraints Stochastic Olie Schedulig with Precedece Costraits Nicole Megow Tark Vredeveld July 15, 2008 Abstract We cosider the preemptive ad o-preemptive problems of schedulig obs with precedece costraits o parallel

More information

Chapter 7: Confidence Interval and Sample Size

Chapter 7: Confidence Interval and Sample Size Chapter 7: Cofidece Iterval ad Sample Size Learig Objectives Upo successful completio of Chapter 7, you will be able to: Fid the cofidece iterval for the mea, proportio, ad variace. Determie the miimum

More information

1. C. The formula for the confidence interval for a population mean is: x t, which was

1. C. The formula for the confidence interval for a population mean is: x t, which was s 1. C. The formula for the cofidece iterval for a populatio mea is: x t, which was based o the sample Mea. So, x is guarateed to be i the iterval you form.. D. Use the rule : p-value

More information

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown Z-TEST / Z-STATISTIC: used to test hypotheses about µ whe the populatio stadard deviatio is kow ad populatio distributio is ormal or sample size is large T-TEST / T-STATISTIC: used to test hypotheses about

More information

AP Calculus BC 2003 Scoring Guidelines Form B

AP Calculus BC 2003 Scoring Guidelines Form B AP Calculus BC Scorig Guidelies Form B The materials icluded i these files are iteded for use by AP teachers for course ad exam preparatio; permissio for ay other use must be sought from the Advaced Placemet

More information

Trackless online algorithms for the server problem

Trackless online algorithms for the server problem Iformatio Processig Letters 74 (2000) 73 79 Trackless olie algorithms for the server problem Wolfgag W. Bei,LawreceL.Larmore 1 Departmet of Computer Sciece, Uiversity of Nevada, Las Vegas, NV 89154, USA

More information

Coordinating Principal Component Analyzers

Coordinating Principal Component Analyzers Coordiatig Pricipal Compoet Aalyzers J.J. Verbeek ad N. Vlassis ad B. Kröse Iformatics Istitute, Uiversity of Amsterdam Kruislaa 403, 1098 SJ Amsterdam, The Netherlads Abstract. Mixtures of Pricipal Compoet

More information