Consistency of Random Forests and Other Averaging Classifiers

Journal of Machine Learning Research 9 (2008)    Submitted 1/08; Revised 5/08; Published 9/08

Gérard Biau
LSTA & LPMA
Université Pierre et Marie Curie, Paris VI
Boîte 158, 175 rue du Chevaleret
Paris, France

Luc Devroye
School of Computer Science
McGill University
Montreal, Canada H3A 2K6

Gábor Lugosi
ICREA and Department of Economics
Pompeu Fabra University
Ramon Trias Fargas
Barcelona, Spain

Editor: Peter Bartlett

Abstract

In the last years of his life, Leo Breiman promoted random forests for use in classification. He suggested using averaging as a means of obtaining good discrimination rules. The base classifiers used for averaging are simple and randomized, often based on random samples from the data. He left a few questions unanswered regarding the consistency of such rules. In this paper, we give a number of theorems that establish the universal consistency of averaging rules. We also show that some popular classifiers, including one suggested by Breiman, are not universally consistent.

Keywords: random forests, classification trees, consistency, bagging

1. Introduction

This paper is dedicated to the memory of Leo Breiman.

Ensemble methods, popular in machine learning, are learning algorithms that construct a set of many individual classifiers (called base learners) and combine them to classify new data points by taking a weighted or unweighted vote of their predictions. It is now well known that ensembles are often much more accurate than the individual classifiers that make them up. The success of ensemble algorithms on many benchmark data sets has raised considerable interest in understanding why such methods succeed and identifying circumstances in which they can be expected to produce good results.

These methods differ in the way the base learner is fit and combined. For example, bagging (Breiman, 1996) proceeds by generating bootstrap samples from the original data set, constructing a classifier from each bootstrap sample, and voting to combine.
In boosting (Freund and Schapire, 1996) and arcing algorithms (Breiman, 1998) the successive classifiers are constructed by giving increased weight to those points that have been frequently misclassified, and the classifiers are combined using weighted voting. On the other hand, random split selection (Dietterich, 2000) grows trees on the original data set. For a fixed number $S$, at each node, the $S$ best splits (in terms of minimizing deviance) are found, and the actual split is selected uniformly at random from them. For a comprehensive review of ensemble methods, we refer the reader to Dietterich (2000a) and the references therein.

Breiman (2001) provides a general framework for tree ensembles called random forests. Each tree depends on the values of a random vector sampled independently and with the same distribution for all trees. Thus, a random forest is a classifier that consists of many decision trees and outputs the class that is the mode of the classes output by the individual trees. Algorithms for inducing a random forest were first developed by Breiman and Cutler, and "Random Forests" is their trademark. The web page provides a collection of downloadable technical reports, and gives an overview of random forests as well as comments on the features of the method.

Random forests have been shown to give excellent performance on a number of practical problems. They work fast, generally exhibit a substantial performance improvement over single tree classifiers such as CART, and yield generalization error rates that compare favorably to the best statistical and machine learning methods. In fact, random forests are among the most accurate general-purpose classifiers available (see, for example, Breiman, 2001).

Different random forests differ in how randomness is introduced in the tree building process, ranging from extreme random splitting strategies (Breiman, 2000; Cutler and Zhao, 2001) to more involved data-dependent strategies (Amit and Geman, 1997; Breiman, 2001; Dietterich, 2000). As a matter of fact, the statistical mechanism of random forests is not yet fully understood and is still under active investigation. Unlike single trees, where consistency is proved by letting the number of observations in each terminal node become large (Devroye, Györfi, and Lugosi, 1996, Chapter 20), random forests are generally built to have a small number of cases in each terminal node.

© 2008 Gérard Biau, Luc Devroye and Gábor Lugosi.
Although the mechanism of random forest algorithms appears simple, it is difficult to analyze and remains largely unknown. Some attempts to investigate the driving force behind consistency of random forests are by Breiman (2000, 2004) and Lin and Jeon (2006), who establish a connection between random forests and adaptive nearest neighbor methods. Meinshausen (2006) proved consistency of certain random forests in the context of so-called quantile regression.

In this paper we offer consistency theorems for various versions of random forests and other randomized ensemble classifiers. In Section 2 we introduce a general framework for studying classifiers based on averaging randomized base classifiers. We prove a simple but useful proposition showing that averaged classifiers are consistent whenever the base classifiers are. In Section 3 we prove consistency of two simple random forest classifiers: the purely random forest (suggested by Breiman as a starting point for study) and the scale-invariant random forest. In Section 4 we show that averaging may convert inconsistent rules into consistent ones. In Section 5 we briefly investigate consistency of bagging rules. We show that, in general, bagging preserves consistency of the base rule and may even create consistent rules from inconsistent ones. In particular, we show that if the bootstrap samples are sufficiently small, the bagged version of the 1-nearest neighbor classifier is consistent.

Finally, in Section 6 we consider random forest classifiers based on randomized, greedily grown tree classifiers. We argue that some greedy random forest classifiers, including Breiman's random forest classifier, are inconsistent, and we suggest a consistent greedy random forest classifier.

2. Voting and Averaged Classifiers

Let $(X,Y),(X_1,Y_1),\dots,(X_n,Y_n)$ be i.i.d. pairs of random variables such that $X$ (the so-called feature vector) takes its values in $\mathbb{R}^d$ while $Y$ (the label) is a binary $\{0,1\}$-valued random variable. The joint distribution of $(X,Y)$ is determined by the marginal distribution $\mu$ of $X$ (i.e., $\mathbb{P}\{X\in A\}=\mu(A)$ for all Borel sets $A\subset\mathbb{R}^d$) and the a posteriori probability $\eta:\mathbb{R}^d\to[0,1]$ defined by
$$\eta(x)=\mathbb{P}\{Y=1\mid X=x\}.$$
The collection $(X_1,Y_1),\dots,(X_n,Y_n)$ is called the training data and is denoted by $D_n$. A classifier $g_n$ is a binary-valued function of $X$ and $D_n$ whose probability of error is defined by
$$L(g_n)=\mathbb{P}_{(X,Y)}\{g_n(X,D_n)\neq Y\},$$
where $\mathbb{P}_{(X,Y)}$ denotes probability with respect to the pair $(X,Y)$ (i.e., conditional probability, given $D_n$). For brevity, we write $g_n(X)=g_n(X,D_n)$. It is well known (see, for example, Devroye, Györfi, and Lugosi, 1996) that the classifier that minimizes the probability of error, the so-called Bayes classifier, is $g^*(x)=\mathbb{1}_{\{\eta(x)\ge1/2\}}$. The risk of $g^*$ is called the Bayes risk: $L^*=L(g^*)$. A sequence $\{g_n\}$ of classifiers is consistent for a certain distribution of $(X,Y)$ if $L(g_n)\to L^*$ in probability.

In this paper we investigate classifiers that calculate their decisions by taking a majority vote over randomized classifiers. A randomized classifier may use a random variable $Z$ to calculate its decision. More precisely, let $\mathcal{Z}$ be some measurable space and let $Z$ take its values in $\mathcal{Z}$. A randomized classifier is an arbitrary function of the form $g_n(X,Z,D_n)$, which we abbreviate by $g_n(X,Z)$. The probability of error of $g_n$ becomes
$$L(g_n)=\mathbb{P}_{(X,Y),Z}\{g_n(X,Z,D_n)\neq Y\}=\mathbb{P}\{g_n(X,Z,D_n)\neq Y\mid D_n\}.$$
The definition of consistency remains the same, by augmenting the probability space appropriately to include the randomization.

Given any randomized classifier, one may calculate the classifier for various draws of the randomizing variable $Z$. It is then a natural idea to define an averaged classifier by taking a majority vote among the obtained random classifiers. Assume that $Z_1,\dots,Z_m$ are identically distributed draws of the randomizing variable, having the same distribution as $Z$. Throughout the paper, we assume that $Z_1,\dots,Z_m$ are independent, conditionally on $X$, $Y$, and $D_n$. Letting $Z^m=(Z_1,\dots,Z_m)$, one may define the corresponding voting classifier by
$$g_n^{(m)}(x,Z^m,D_n)=\begin{cases}1 & \text{if } \frac1m\sum_{j=1}^m g_n(x,Z_j,D_n)\ge\frac12,\\ 0 & \text{otherwise.}\end{cases}$$
By the strong law of large numbers, for any fixed $x$ and $D_n$ for which $\mathbb{P}_Z\{g_n(x,Z,D_n)=1\}\neq1/2$, we have almost surely $\lim_{m\to\infty}g_n^{(m)}(x,Z^m,D_n)=\bar g_n(x,D_n)$, where
$$\bar g_n(x,D_n)=\bar g_n(x)=\mathbb{1}_{\{\mathbb{E}_Zg_n(x,Z)\ge1/2\}}$$

is a (non-randomized) classifier that we call the averaged classifier. (Here $\mathbb{P}_Z$ and $\mathbb{E}_Z$ denote probability and expectation with respect to the randomizing variable $Z$, that is, conditionally on $X$, $Y$, and $D_n$.)

$\bar g_n$ may be interpreted as an idealized version of the classifier $g_n^{(m)}$ that draws many independent copies of the randomizing variable $Z$ and takes a majority vote over the resulting classifiers.

Our first result states that consistency of a randomized classifier is preserved by averaging.

Proposition 1 Assume that the sequence $\{g_n\}$ of randomized classifiers is consistent for a certain distribution of $(X,Y)$. Then the voting classifier $g_n^{(m)}$ (for any value of $m$) and the averaged classifier $\bar g_n$ are also consistent.

Proof Consistency of $\{g_n\}$ is equivalent to saying that $\mathbb{E}L(g_n)=\mathbb{P}\{g_n(X,Z)\neq Y\}\to L^*$. In fact, since $\mathbb{P}\{g_n(X,Z)\neq Y\mid X=x\}\ge\mathbb{P}\{g^*(X)\neq Y\mid X=x\}$ for all $x\in\mathbb{R}^d$, consistency of $\{g_n\}$ means that for $\mu$-almost all $x$,
$$\mathbb{P}\{g_n(X,Z)\neq Y\mid X=x\}\to\mathbb{P}\{g^*(X)\neq Y\mid X=x\}=\min(\eta(x),1-\eta(x)).$$
Without loss of generality, assume that $\eta(x)>1/2$. (In the case $\eta(x)=1/2$, any classifier has conditional probability of error $1/2$ and there is nothing to prove.) Then
$$\mathbb{P}\{g_n(X,Z)\neq Y\mid X=x\}=(2\eta(x)-1)\,\mathbb{P}\{g_n(x,Z)=0\}+1-\eta(x),$$
and by consistency we have $\mathbb{P}\{g_n(x,Z)=0\}\to0$.

To prove consistency of the voting classifier $g_n^{(m)}$, it suffices to show that $\mathbb{P}\{g_n^{(m)}(x,Z^m)=0\}\to0$ for $\mu$-almost all $x$ for which $\eta(x)>1/2$. However,
$$\begin{aligned}
\mathbb{P}\{g_n^{(m)}(x,Z^m)=0\}&=\mathbb{P}\Big\{\frac1m\sum_{j=1}^m\mathbb{1}_{\{g_n(x,Z_j)=0\}}>\frac12\Big\}\\
&\le2\,\mathbb{E}\Big[\frac1m\sum_{j=1}^m\mathbb{1}_{\{g_n(x,Z_j)=0\}}\Big]\quad\text{(by Markov's inequality)}\\
&=2\,\mathbb{P}\{g_n(x,Z)=0\}\to0.
\end{aligned}$$
Consistency of the averaged classifier is proved by a similar argument.

3. Random Forests

Random forests, introduced by Breiman, are averaged classifiers in the sense defined in Section 2.

Formally, a random forest with $m$ trees is a classifier consisting of a collection of randomized base tree classifiers $g_n(x,Z_1),\dots,g_n(x,Z_m)$, where $Z_1,\dots,Z_m$ are identically distributed random vectors, independent conditionally on $X$, $Y$, and $D_n$.
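A small Python sketch can make three things concrete: the voting classifier $g_n^{(m)}$ of Section 2, its limit $\bar g_n$, and the Markov-inequality step in the proof of Proposition 1. This is not code from the paper. The base classifier `toy_base` outputs 1 with a fixed probability, which is an assumption made purely for illustration. All function names are hypothetical.

```python
import random
from math import comb


def voting_classifier(base, x, m, rng):
    """g_n^(m)(x, Z^m): majority vote over m independent draws Z_1, ..., Z_m.

    Returns 1 if the average vote is at least 1/2 and 0 otherwise (Section 2).
    `base(x, z)` plays the role of the randomized classifier g_n(x, z).
    """
    votes = [base(x, rng.random()) for _ in range(m)]
    return 1 if sum(votes) / m >= 0.5 else 0


def averaged_classifier(prob_one):
    """bar g_n(x) = 1{E_Z g_n(x, Z) >= 1/2}, given prob_one = E_Z g_n(x, Z)."""
    return 1 if prob_one >= 0.5 else 0


def prob_vote_zero(p0, m):
    """Exact P{g_n^(m)(x, Z^m) = 0} when each base vote is 0 with probability p0.

    The vote is 0 exactly when more than m/2 of the m base votes are 0.
    """
    return sum(comb(m, r) * p0**r * (1 - p0) ** (m - r)
               for r in range(m + 1) if r > m / 2)


def toy_base(x, z, prob_one=0.7):
    """Hypothetical randomized base classifier: outputs 1 iff z < prob_one."""
    return 1 if z < prob_one else 0


if __name__ == "__main__":
    rng = random.Random(0)
    print(voting_classifier(toy_base, x=None, m=2001, rng=rng))
    # Markov step of Proposition 1: P{vote = 0} <= 2 P{g_n(x, Z) = 0}.
    for p0 in (0.01, 0.1, 0.3):
        print(p0, prob_vote_zero(p0, 25), 2 * p0)
```

As `m` grows, the vote agrees with `averaged_classifier` whenever $\mathbb{E}_Zg_n(x,Z)\neq1/2$, as the strong-law argument predicts. `prob_vote_zero` can be compared directly with the bound $2\,\mathbb{P}\{g_n(x,Z)=0\}$.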
The randomizing variable is typically used to determine how the successive cuts are performed when building the tree: which node and which coordinate to split, and where to place the split. The random forest classifier takes a majority vote among the random tree classifiers. If $m$ is large, the random forest classifier is well approximated by the averaged classifier

$$\bar g_n(x)=\mathbb{1}_{\{\mathbb{E}_Zg_n(x,Z)\ge1/2\}}.$$
For brevity, we state most results of this paper for the averaged classifier $\bar g_n$ only, though by Proposition 1 various results remain true for the voting classifier $g_n^{(m)}$ as well.

In this section we analyze a simple random forest already considered by Breiman (2000), which we call the purely random forest. The random tree classifier $g_n(x,Z)$ is constructed as follows. Assume, for simplicity, that $\mu$ is supported on $[0,1]^d$. All nodes of the tree are associated with rectangular cells. At each step of the construction of the tree, the collection of cells associated with the leaves of the tree (i.e., external nodes) forms a partition of $[0,1]^d$. The root of the random tree is $[0,1]^d$ itself. At each step of the construction of the tree, a leaf is chosen uniformly at random. The split variable $J$ is then selected uniformly at random from the $d$ candidates $x^{(1)},\dots,x^{(d)}$. Finally, the selected cell is split along the randomly chosen variable at a random location, chosen according to a uniform random variable on the length of the chosen side of the selected cell. The procedure is repeated $k$ times, where $k\ge1$ is a deterministic parameter, fixed beforehand by the user, and possibly depending on $n$.

The randomized classifier $g_n(x,Z)$ takes a majority vote among all $Y_i$ for which the corresponding feature vector $X_i$ falls in the same cell of the random partition as $x$. (For concreteness, break ties in favor of the label 1.)

The purely random forest classifier is a radically simplified version of random forest classifiers used in practice. The main simplification lies in the fact that recursive cell splits do not depend on the labels $Y_1,\dots,Y_n$. The next theorem mainly serves as an illustration of how the consistency problem of random forest classifiers may be attacked. More involved versions of random forest classifiers are discussed in subsequent sections.

Theorem 2 Assume that the distribution of $X$ is supported on $[0,1]^d$. Then the purely random forest classifier $\bar g_n$ is consistent whenever $k\to\infty$ and $k/n\to0$ as $n\to\infty$.
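To make the construction concrete, here is a minimal Python sketch of one purely random tree on $[0,1]^d$ and of the resulting voting forest. The function names are illustrative choices and are not part of the original description. So are the half-open cell convention and the assumption that all points lie in $[0,1)^d$.

```python
import random


def purely_random_partition(d, k, rng):
    """Grow one purely random tree on [0,1]^d with k cuts.

    Each step picks a leaf uniformly at random, a coordinate J uniformly among the
    d candidates, and a cut position uniform on that side of the chosen cell.
    Returns the k + 1 leaf cells, each a list of d half-open intervals [lo, hi).
    """
    leaves = [[(0.0, 1.0)] * d]
    for _ in range(k):
        cell = leaves.pop(rng.randrange(len(leaves)))
        j = rng.randrange(d)
        lo, hi = cell[j]
        cut = rng.uniform(lo, hi)
        left, right = list(cell), list(cell)
        left[j], right[j] = (lo, cut), (cut, hi)
        leaves += [left, right]
    return leaves


def contains(cell, x):
    """Half-open membership test; assumes x lies in [0,1)^d."""
    return all(lo <= xi < hi for (lo, hi), xi in zip(cell, x))


def tree_vote(x, X, Y, leaves):
    """g_n(x, Z): majority vote of the labels in the cell of x; ties go to label 1."""
    cell = next(c for c in leaves if contains(c, x))
    labels = [y for xi, y in zip(X, Y) if contains(cell, xi)]
    return 1 if 2 * sum(labels) >= len(labels) else 0


def purely_random_forest(x, X, Y, d, k, m, rng):
    """Voting classifier over m independent purely random trees."""
    votes = sum(tree_vote(x, X, Y, purely_random_partition(d, k, rng))
                for _ in range(m))
    return 1 if 2 * votes >= m else 0


if __name__ == "__main__":
    rng = random.Random(1)
    X = [(rng.random(), rng.random()) for _ in range(500)]
    Y = [1 if x0 > 0.5 else 0 for x0, _ in X]
    print(purely_random_forest((0.9, 0.5), X, Y, d=2, k=30, m=51, rng=rng))
```

The cuts never look at the labels, which is exactly the simplification noted above. An empty cell produces a tie and therefore votes 1 under the stated tie-breaking rule.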
Proof By Proposition 1 it suffices to prove consistency of the randomized base tree classifier $g_n$. To this end, we recall a general consistency theorem for partitioning classifiers proved in (Devroye, Györfi, and Lugosi, 1996, Theorem 6.1). According to this theorem, $g_n$ is consistent if both $\operatorname{diam}(A_n(X,Z))\to0$ in probability and $N_n(X,Z)\to\infty$ in probability. Here $A_n(x,Z)$ is the rectangular cell of the random partition containing $x$, and
$$N_n(x,Z)=\sum_{i=1}^n\mathbb{1}_{\{X_i\in A_n(x,Z)\}}$$
is the number of data points falling in the same cell as $x$.

First we show that $N_n(X,Z)\to\infty$ in probability. Consider the random tree partition defined by $Z$. Observe that the partition has $k+1$ rectangular cells, say $A_1,\dots,A_{k+1}$. Let $N_1,\dots,N_{k+1}$ denote the number of points of $X,X_1,\dots,X_n$ falling in these $k+1$ cells. Let $S=\{X,X_1,\dots,X_n\}$ denote the set of positions of these $n+1$ points. Since these points are independent and identically distributed, fixing the set $S$ (but not the order of the points) and $Z$, the conditional probability that $X$ falls in the $i$-th cell equals $N_i/(n+1)$. Thus, for every fixed $t>0$,
$$\mathbb{P}\{N_n(X,Z)<t\}=\mathbb{E}\big[\mathbb{P}\{N_n(X,Z)<t\mid S,Z\}\big]=\mathbb{E}\Big[\sum_{i:N_i<t}\frac{N_i}{n+1}\Big]\le\frac{(t-1)(k+1)}{n+1},$$

which converges to zero by our assumption on $k$.

It remains to show that $\operatorname{diam}(A_n(X,Z))\to0$ in probability. To this aim, let $V_n=V_n(x,Z)$ be the size of the first dimension of the rectangle containing $x$. Let $T_n=T_n(x,Z)$ be the number of times that the box containing $x$ is split when we construct the random tree partition.

Let $K_n$ be binomial $(T_n,1/d)$, representing the number of times the box containing $x$ is split along the first coordinate.

Clearly, it suffices to show that $V_n(x,Z)\to0$ in probability for $\mu$-almost all $x$, so it is enough to show that for all $x$, $\mathbb{E}[V_n(x,Z)]\to0$. Observe that if $U_1,U_2,\dots$ are independent and uniform on $[0,1]$, then
$$\begin{aligned}
\mathbb{E}[V_n(x,Z)]&\le\mathbb{E}\Big[\mathbb{E}\Big[\prod_{i=1}^{K_n}\max(U_i,1-U_i)\,\Big|\,K_n\Big]\Big]
=\mathbb{E}\Big[\big(\mathbb{E}[\max(U_1,1-U_1)]\big)^{K_n}\Big]
=\mathbb{E}\big[(3/4)^{K_n}\big]\\
&=\mathbb{E}\Big[\Big(1-\frac1d+\frac{3}{4d}\Big)^{T_n}\Big]
=\mathbb{E}\Big[\Big(1-\frac{1}{4d}\Big)^{T_n}\Big].
\end{aligned}$$
Thus, it suffices to show that $T_n\to\infty$ in probability. To this end, note that the partition tree is statistically related to a random binary search tree with $k+1$ external nodes (and thus $k$ internal nodes). Such a tree is obtained as follows. Initially, the root is the sole external node, and there are no internal nodes. Select an external node uniformly at random, make it an internal node, and give it two children, both external. Repeat until there are precisely $k$ internal nodes and $k+1$ external nodes. The resulting tree is the random binary search tree on $k$ internal nodes (see Devroye 1988 and Mahmoud 1992 for more equivalent constructions of random binary search trees). It is known that all levels up to $l=0.37\log k$ are full with probability tending to one as $k\to\infty$ (Devroye, 1986). The last full level $F$ is called the fill-up level. Clearly, the partition tree has this property. Therefore, we know that all final cells have been cut at least $l$ times, and therefore $T_n\ge l$ with probability converging to 1. This concludes the proof of Theorem 2.

Remark 3 We observe that the largest first dimension among external nodes does not tend to zero in probability except for $d=1$. For $d\ge2$, it tends to a limit random variable that is not atomic at zero (this can be shown using the theory of branching processes).
Thus the proof above could not have used the uniform smallness of all cells. Although the random partition contains some cells of huge, non-shrinking diameter, the rule based on it is consistent.

Next we consider a scale-invariant version of the purely random forest classifier. In this variant the root cell is the entire feature space, and the random tree is grown up to $k$ cuts. The leaf cell to cut and the direction $J$ in which the cell is cut are chosen uniformly at random, exactly as in the purely random forest classifier. The only difference is that the position of the cut is now chosen in a data-based manner. Suppose the cell to be cut contains $N$ of the data points $X,X_1,\dots,X_n$. A random index $I$ is chosen uniformly from the set $\{0,1,\dots,N\}$, and the cell is cut so that, when the points are ordered by their $J$-th components, the $I$ points with the smallest values fall in one of the subcells and the rest in the other. To avoid ties, we assume that the distribution of $X$ has non-atomic marginals. In this case the random tree is well defined with probability one. Just like before, the associated classifier takes a majority vote over the labels of the data points falling in the same cell as $X$. The scale-invariant random forest classifier is defined as the corresponding averaged classifier.

Theorem 4 Assume that the distribution of $X$ has non-atomic marginals in $\mathbb{R}^d$. Then the scale-invariant random forest classifier $\bar g_n$ is consistent whenever $k\to\infty$ and $k/n\to0$ as $n\to\infty$.

Proof Once again, we may use Proposition 1 and (Devroye, Györfi, and Lugosi, 1996, Theorem 6.1) to prove consistency of the randomized base tree classifier $g_n$. The proof of the fact that $N_n(X,Z)\to\infty$ in probability is the same as in Theorem 2. To show that $\operatorname{diam}(A_n(X,Z))\to0$ in probability, we begin by noting that the partition tree is equivalent to a binary search tree, just as for the purely random forest classifier. Therefore, with probability converging to one, all final cells have been cut at least $l=0.37\log k$ times.

Since the classification rule is scale-invariant, we may assume, without loss of generality, that the distribution of $X$ is concentrated on the unit cube $[0,1]^d$. Let $n_i$ denote the cardinality of the $i$-th cell in the partition, $1\le i\le k+1$, where the cardinality of a cell $C$ is $|C\cap\{X,X_1,\dots,X_n\}|$. Thus, $\sum_{i=1}^{k+1}n_i=n+1$. Let $V_i$ be the first dimension of the $i$-th cell. Let $V(X)$ be the first dimension of the cell that contains $X$. Clearly, given the $n_i$'s, $V(X)=V_i$ with probability $n_i/(n+1)$. We need to show that $\mathbb{E}[V(X)]\to0$. But we have
$$\mathbb{E}[V(X)]=\mathbb{E}\Big[\frac{1}{n+1}\sum_{i=1}^{k+1}n_iV_i\Big].$$
So, it suffices to show that $\mathbb{E}\big[\sum_in_iV_i\big]=o(n)$.

It is worth mentioning that the random split of a box can be imagined as follows. Given that we split along the $s$-th coordinate axis, and that a box has $m$ points, we select one of the $m+1$ spacings defined by these $m$ points uniformly at random, still for that $s$-th coordinate. We cut inside that spacing, and we are free to do so anywhere within it.
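The split rule of the scale-invariant forest and the spacing picture just described can be sketched in Python. The first function performs one data-based cut. The second computes exactly the conditional expectation $\mathbb{E}[\sum_{i\in\mathcal{C}}n_iV_i]$ that the proof bounds by $\frac78n$ below, for a cell with $V=1$, given spacings $a_1,\dots,a_{n+1}$ and cut fractions $\delta_j$. The third rearranges the same quantity as a sum over the $a_k$. All names are hypothetical, and the code is only a numerical sanity check under these assumptions.

```python
import random


def scale_invariant_split(points, d, rng):
    """One scale-invariant cut of a cell holding `points` (tuples in R^d).

    J is uniform on {0, ..., d-1} and I is uniform on {0, 1, ..., N}; after sorting
    by the J-th coordinate, the I smallest points go to one subcell, the rest to the other.
    """
    j = rng.randrange(d)
    i = rng.randint(0, len(points))          # randint is inclusive on both ends
    ordered = sorted(points, key=lambda p: p[j])
    return j, ordered[:i], ordered[i:]


def conditional_weighted_width(spacings, deltas):
    """E[sum over children of n_i V_i | first axis is cut], for a cell with V = 1.

    `spacings` are a_1, ..., a_{n+1} (summing to one). Spacing j is chosen with
    probability 1/(n+1) and cut at the fraction deltas[j-1] of its length.
    """
    n = len(spacings) - 1
    total, prefix = 0.0, 0.0
    for j in range(1, n + 2):
        a_j, delta = spacings[j - 1], deltas[j - 1]
        left = prefix + a_j * delta          # first dimension of the child with j-1 points
        right = 1.0 - left                   # first dimension of the child with n+1-j points
        total += (j - 1) * left + (n + 1 - j) * right
        prefix += a_j
    return total / (n + 1)


def coefficient_form(spacings, deltas):
    """Same quantity written as (1/(n+1)) * sum_k a_k * (coefficient of a_k)."""
    n = len(spacings) - 1
    total = 0.0
    for k in range(1, n + 2):
        coef = (sum(j - 1 for j in range(k + 1, n + 2))
                + sum(n + 1 - j for j in range(1, k))
                + deltas[k - 1] * (k - 1) + (1 - deltas[k - 1]) * (n + 1 - k))
        total += spacings[k - 1] * coef
    return total / (n + 1)
```

The two expectation functions should agree up to rounding, and for $n>4$ both should stay below $\frac78n$ whatever the spacings and cut fractions are.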
We can cut in proportions $\lambda$ and $1-\lambda$ with $\lambda\in(0,1)$, and the value of $\lambda$ may vary from cut to cut and may even be data-dependent.

In fact, each internal and external node of our partition tree then has two important quantities associated with it: a cardinality and a first dimension. If we keep using $i$ to index cells, we can use $n_i$ and $V_i$ for the $i$-th cell, even if it is an internal cell. Let $\mathcal{A}$ be the collection of external nodes in the subtree of the $i$-th cell. Then trivially,
$$\sum_{j\in\mathcal{A}}n_jV_j\le n_iV_i.$$
Let $\mathcal{E}$ be the collection of all external nodes of a partition tree. Let $l$ be at most the minimum path distance from any cell in $\mathcal{E}$ to the root, and let $\mathcal{L}$ be the collection of all nodes at distance $l$ from the root. Then, by the last inequality,
$$\sum_{i\in\mathcal{E}}n_iV_i\le\sum_{i\in\mathcal{L}}n_iV_i.$$
Thus, using the notion of the fill-up level $F$ of the binary search tree, setting $l=0.37\log k$, and noting that $\sum_{i\in\mathcal{E}}n_iV_i\le n+1$ always, we have
$$\mathbb{E}\Big[\sum_{i\in\mathcal{E}}n_iV_i\Big]\le(n+1)\,\mathbb{P}\{F<l\}+\mathbb{E}\Big[\sum_{i\in\mathcal{L}}n_iV_i\Big].$$

We have seen that the first term is $o(n)$. We argue that the second term is not more than $(n+1)(1-1/(8d))^l$, which is $o(n)$ since $k\to\infty$. That will conclude the proof.

It suffices now to argue recursively and fix one cell of cardinality $n$ and first dimension $V$. Let $\mathcal{C}$ be the collection of its children. We will show that
$$\mathbb{E}\Big[\sum_{i\in\mathcal{C}}n_iV_i\Big]\le\Big(1-\frac{1}{8d}\Big)nV.$$
Repeating this recursively $l$ times shows that
$$\mathbb{E}\Big[\sum_{i\in\mathcal{L}}n_iV_i\Big]\le\Big(1-\frac{1}{8d}\Big)^l(n+1),$$
because $V=1$ at the root.

Fix that cell of cardinality $n$, and assume without loss of generality that $V=1$. Let the spacings along the first coordinate be $a_1,\dots,a_{n+1}$, their sum being one. With probability $1-1/d$, the first axis is not cut, and thus $\sum_{i\in\mathcal{C}}n_iV_i=n$. With probability $1/d$, the first axis is cut in two parts. We will show that, conditional on the event that the first direction is cut,
$$\mathbb{E}\Big[\sum_{i\in\mathcal{C}}n_iV_i\Big]\le\frac78n.$$
Unconditionally, we then have
$$\mathbb{E}\Big[\sum_{i\in\mathcal{C}}n_iV_i\Big]\le\Big(1-\frac1d\Big)n+\frac1d\cdot\frac78n=\Big(1-\frac{1}{8d}\Big)n,$$
as required. So, let us prove the conditional result. Using $\delta_j$ to denote numbers drawn from $(0,1)$, possibly random, we have
$$\begin{aligned}
\mathbb{E}\Big[\sum_{i\in\mathcal{C}}n_iV_i\Big]
&=\frac{1}{n+1}\sum_{j=1}^{n+1}\mathbb{E}\Big[(j-1)(a_1+\dots+a_{j-1}+a_j\delta_j)+(n+1-j)\big(a_j(1-\delta_j)+a_{j+1}+\dots+a_{n+1}\big)\Big]\\
&=\frac{1}{n+1}\,\mathbb{E}\Big[\sum_{k=1}^{n+1}a_k\Big(\sum_{j>k}(j-1)+\sum_{j<k}(n+1-j)+\delta_k(k-1)+(1-\delta_k)(n+1-k)\Big)\Big]\\
&\le\frac{1}{n+1}\sum_{k=1}^{n+1}a_k\Big(n(n+1)-\frac{k(k-1)}{2}-\frac{(n-k+1)(n-k+2)}{2}+\max(k-1,n+1-k)\Big)\\
&=\frac{1}{n+1}\sum_{k=1}^{n+1}a_k\Big(\frac{n(n+1)}{2}+(k-1)(n+1-k)+\max(k-1,n+1-k)\Big)\\
&\le\frac{1}{n+1}\Big(\frac{n(n+1)}{2}+\frac{n^2}{4}+n\Big)\sum_{k=1}^{n+1}a_k
=\frac{\frac34n^2+\frac32n}{n+1}\le\frac78n\qquad\text{if } n>4.
\end{aligned}$$

Our definition of the scale-invariant random forest classifier permits cells to be cut such that one of the created cells becomes empty. One may easily prevent this by artificially forcing a minimum number of points in each cell. This may be done by restricting the random position of each cut so that both created subcells contain at least, say, $m$ points. A minor modification of the proof above shows that, as long as $m$ is bounded by a constant, the resulting random forest classifier remains consistent under the same conditions as in Theorem 4.

4. Creating Consistent Rules by Randomization

Proposition 1 shows that if a randomized classifier is consistent, then the corresponding averaged classifier remains consistent. The converse is not true. There exist inconsistent randomized classifiers whose averaged version is consistent. Indeed, Breiman's (2001) original random forest classifier builds tree classifiers by successive randomized cuts until the cell of the point $X$ to be classified contains only one data point, and it assigns to $X$ the label of that data point. Breiman's random forest classifier is just the averaged version of such randomized tree classifiers. The randomized base classifier $g_n(x,Z)$ is obviously not consistent for all distributions. This does not imply that the averaged random forest classifier is not consistent. In fact, in this section we will see that averaging may boost inconsistent base classifiers into consistent ones. We point out in Section 6 that there are distributions of $(X,Y)$ for which Breiman's random forest classifier is not consistent. In the counterexample of Proposition 8, the distribution of $X$ does not have a density. It is possible, however, that Breiman's random forest classifier is consistent whenever the distribution of $X$ has a density.
Breiman's rule is difficult to analyze, as each cut of the random tree is determined by a complicated function of the entire data set $D_n$ (i.e., both feature vectors and labels). However, in Section 6 below we provide arguments suggesting that Breiman's random forest is not consistent when a density exists. Instead of Breiman's rule, we next analyze a stylized version. We show that inconsistent randomized rules that take the label of only one neighbor into account can be made consistent by averaging.

For simplicity, we consider the case $d=1$, though the whole argument extends in a straightforward way to the multivariate case. To avoid complications introduced by ties, assume that $X$ has a non-atomic distribution. Define a randomized nearest neighbor rule as follows. For a fixed $x\in\mathbb{R}$, let $X_{(1)}(x),X_{(2)}(x),\dots,X_{(n)}(x)$ be the ordering of the data points $X_1,\dots,X_n$ according to increasing distances from $x$. Let $U_1,\dots,U_n$ be i.i.d. random variables, uniformly distributed over $[0,1]$. The vector of these random variables constitutes the randomization $Z$ of the classifier. We define $g_n(x,Z)$ to be equal to the label $Y_{(i)}(x)$ of the data point $X_{(i)}(x)$ for which
$$\max(i,mU_i)\le\max(j,mU_j)\quad\text{for all } j=1,\dots,n,$$
where $m$ is a parameter of the rule. We call $X_{(i)}(x)$ the perturbed nearest neighbor of $x$. Note that $X_{(1)}(x)$ is the (unperturbed) nearest neighbor of $x$. To obtain the perturbed version, we artificially add a random uniform coordinate and select a data point with the randomized rule defined above. Since ties occur with probability zero, the perturbed nearest neighbor classifier is well defined almost surely. It is clearly not, in general, a consistent classifier.

Call the corresponding averaged classifier $\bar g_n(x)=\mathbb{1}_{\{\mathbb{E}_Zg_n(x,Z)\ge1/2\}}$ the averaged perturbed nearest neighbor classifier.

In the proof of the consistency result below, we use Stone's (1977) general consistency theorem for locally weighted average classifiers; see also (Devroye, Györfi, and Lugosi, 1996, Theorem 6.3). Stone's theorem concerns classifiers that take the form
$$g_n(x)=\mathbb{1}_{\{\sum_{i=1}^nY_iW_{ni}(x)\ge\sum_{i=1}^n(1-Y_i)W_{ni}(x)\}},$$
where the weights $W_{ni}(x)=W_{ni}(x,X_1,\dots,X_n)$ are non-negative and sum to one. According to Stone's theorem, consistency holds if the following three conditions are satisfied:

(i) $\displaystyle\lim_{n\to\infty}\mathbb{E}\Big[\max_{1\le i\le n}W_{ni}(X)\Big]=0.$

(ii) For all $a>0$, $\displaystyle\lim_{n\to\infty}\mathbb{E}\Big[\sum_{i=1}^nW_{ni}(X)\mathbb{1}_{\{\|X_i-X\|>a\}}\Big]=0.$

(iii) There is a constant $c$ such that, for every non-negative measurable function $f$ satisfying $\mathbb{E}f(X)<\infty$,
$$\mathbb{E}\Big[\sum_{i=1}^nW_{ni}(X)f(X_i)\Big]\le c\,\mathbb{E}f(X).$$

Theorem 5 The averaged perturbed nearest neighbor classifier $\bar g_n$ is consistent whenever the parameter $m$ is such that $m\to\infty$ and $m/n\to0$.

Proof If we define
$$W_{ni}(x)=\mathbb{P}_Z\{X_i\text{ is the perturbed nearest neighbor of }x\},$$
then the averaged perturbed nearest neighbor classifier is clearly a locally weighted average classifier, and Stone's theorem may be applied. It is convenient to introduce the notation
$$p_i(x)=\mathbb{P}_Z\{X_{(i)}(x)\text{ is the perturbed nearest neighbor of }x\}$$
and write
$$W_{ni}(x)=\sum_{j=1}^n\mathbb{1}_{\{X_i=X_{(j)}(x)\}}\,p_j(x).$$
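A Python sketch of the perturbed nearest neighbor rule may help fix ideas. Here the weights $p_i$ are estimated by simulation. The sketch assumes $d=1$ as in the text; the function names and the Monte Carlo estimation of $p_i$ are illustrative choices, not the paper's. The selected rank never exceeds $m$, because $\max(1,mU_1)\le m$ whereas $\max(i,mU_i)=i>m$ whenever $i>m$.

```python
import random


def perturbed_nn_rank(n, m, rng):
    """Rank i (1-based) minimizing max(i, m * U_i) over i = 1, ..., n.

    U_1, ..., U_n are i.i.d. uniform on [0, 1); rank i refers to X_(i)(x), the i-th
    nearest data point to x. Ties have probability zero.
    """
    best_rank, best_val = 0, float("inf")
    for i in range(1, n + 1):
        val = max(i, m * rng.random())
        if val < best_val:
            best_rank, best_val = i, val
    return best_rank


def estimate_weights(n, m, trials, rng):
    """Monte Carlo estimate of p_i = P_Z{X_(i)(x) is the perturbed nearest neighbor}."""
    counts = [0] * n
    for _ in range(trials):
        counts[perturbed_nn_rank(n, m, rng) - 1] += 1
    return [c / trials for c in counts]


def averaged_perturbed_nn(x, X, Y, m, trials, rng):
    """Approximate bar g_n(x) for d = 1: weighted vote of labels sorted by |X_i - x|."""
    order = sorted(range(len(X)), key=lambda i: abs(X[i] - x))
    p = estimate_weights(len(X), m, trials, rng)
    ones = sum(p[r] for r, i in enumerate(order) if Y[i] == 1)
    zeros = sum(p[r] for r, i in enumerate(order) if Y[i] == 0)
    return 1 if ones >= zeros else 0
```

Because all weight sits on the first $m$ neighbors, condition (ii) reduces to the $m$-th nearest neighbor distance tending to zero, which is where the assumption $m/n\to0$ enters.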

To check the conditions of Stone's theorem, first note that
$$\begin{aligned}
p_i(x)&=\mathbb{P}\Big\{mU_i\le i\le\min_{j<i}mU_j\Big\}+\mathbb{P}\Big\{i<mU_i\le\min_{j\neq i}\max(j,mU_j)\Big\}\\
&=\mathbb{1}_{\{i\le m\}}\Big(1-\frac im\Big)^{i-1}\frac im+\mathbb{P}\Big\{i<mU_i\le\min_{j\neq i}\max(j,mU_j)\Big\}.
\end{aligned}$$
Now we are prepared to check the conditions of Stone's theorem. To prove that (i) holds, note that by monotonicity of $p_i(x)$ in $i$, it suffices to show that $p_1(x)\to0$. But clearly, for $m\ge2$,
$$\begin{aligned}
p_1(x)&\le\frac1m+\mathbb{P}\Big\{U_1\le\min_{2\le j\le m}\max\Big(\frac jm,U_j\Big)\Big\}\\
&=\frac1m+\mathbb{E}\Big[\prod_{j=2}^m\mathbb{P}\Big\{U_1\le\max\Big(\frac jm,U_j\Big)\,\Big|\,U_1\Big\}\Big]\\
&=\frac1m+\mathbb{E}\Big[\prod_{j=2}^m\big(1-\mathbb{1}_{\{U_1>j/m\}}U_1\big)\Big]\\
&\le\frac1m+\mathbb{E}\Big[(1-U_1)^{\lfloor mU_1\rfloor-2}\,\mathbb{1}_{\{\lfloor mU_1\rfloor\ge3\}}\Big]+\mathbb{P}\{\lfloor mU_1\rfloor<3\},
\end{aligned}$$
which converges to zero by monotone convergence as $m\to\infty$.

Condition (ii) follows from $m/n\to0$, since $\sum_{i=1}^nW_{ni}(X)\mathbb{1}_{\{\|X_i-X\|>a\}}=0$ whenever the distance from $X$ to its $m$-th nearest neighbor is at most $a$. But this happens eventually, almost surely; see (Devroye, Györfi, and Lugosi, 1996, Lemma 5.1).

Finally, to check (iii), we again use the monotonicity of $p_i(x)$ in $i$. We may write
$$p_i(x)=a_i+a_{i+1}+\dots+a_n$$
for some non-negative numbers $a_j$, $1\le j\le n$, depending upon $m$ and $n$ but not on $x$. Observe that $\sum_{j=1}^nja_j=\sum_{i=1}^np_i(x)=1$. But then
$$\begin{aligned}
\mathbb{E}\Big[\sum_{i=1}^nW_{ni}(X)f(X_i)\Big]&=\mathbb{E}\Big[\sum_{i=1}^np_i(X)f(X_{(i)}(X))\Big]
=\mathbb{E}\Big[\sum_{i=1}^n\sum_{j=i}^na_jf(X_{(i)}(X))\Big]\\
&=\mathbb{E}\Big[\sum_{j=1}^na_j\sum_{i=1}^jf(X_{(i)}(X))\Big]
=\sum_{j=1}^na_j\,\mathbb{E}\Big[\sum_{i=1}^jf(X_{(i)}(X))\Big]\\
&\le c\sum_{j=1}^na_j\,j\,\mathbb{E}f(X)\\
&=c\,\mathbb{E}f(X)\sum_{j=1}^na_j\,j=c\,\mathbb{E}f(X),
\end{aligned}$$
as desired. The inequality is Stone's (1977) lemma, where $c$ is a constant; see (Devroye, Györfi, and Lugosi, 1996, Lemma 5.3).

5. Bagging

One of the first and simplest ways of randomizing and averaging classifiers in order to improve their performance is bagging, suggested by Breiman (1996). In bagging, randomization is achieved by generating many bootstrap samples from the original data set. Breiman suggests selecting training pairs $(X_i,Y_i)$ at random, with replacement, from the bag of all training pairs $\{(X_1,Y_1),\dots,(X_n,Y_n)\}$. Denote the random selection process by $Z$. In this way one obtains new training data $D_n(Z)$, possibly with repetitions. Given a classifier $g_n(X,D_n)$, one can then calculate the randomized classifier $g_n(X,Z,D_n)=g_n(X,D_n(Z))$. Breiman suggests repeating this procedure for many independent draws of the bootstrap sample, say $m$ of them, and calculating the voting classifier $g_n^{(m)}(X,Z^m,D_n)$ as defined in Section 2.

In this section we consider a generalized version of bagging predictors in which the size of the bootstrap samples is not necessarily the same as that of the original sample. Also, to avoid complications and ambiguities due to replicated data points, we exclude repetitions in the bootstrapped data. This is assumed for convenience; sampling with replacement can be treated by minor modifications of the arguments below.

To describe the model we consider, introduce a parameter $q_n\in[0,1]$. In the bootstrap sample $D_n(Z)$, each data pair $(X_i,Y_i)$ is present with probability $q_n$, independently of the others. Thus, the size of the bootstrapped data is a binomial random variable $N$ with parameters $n$ and $q_n$. Given a sequence of (non-randomized) classifiers $\{g_n\}$, we may thus define the randomized classifier
$$g_n(X,Z,D_n)=g_N(X,D_n(Z)),$$
that is, the classifier is defined based on the randomly re-sampled data.
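The resampling scheme just described can be sketched in Python: each pair is kept independently with probability $q_n$, with no repetitions. The base rule `one_nn_rule` and its convention for an empty sample are hypothetical choices for $d=1$. For comparison, `bootstrap_inclusion_prob` computes the probability $1-(1-1/n)^n$ that a given pair appears in a standard with-replacement bootstrap sample of size $n$. That probability tends to $1-1/e$.

```python
import math
import random

BREIMAN_LIMIT = 1.0 - math.exp(-1.0)     # limit of 1 - (1 - 1/n)^n as n grows


def q_subsample(data, q, rng):
    """D_n(Z): keep each pair independently with probability q (no repetitions).

    The size of the returned sample is Binomial(n, q).
    """
    return [pair for pair in data if rng.random() < q]


def one_nn_rule(x, sample):
    """Hypothetical base rule for d = 1: label of the nearest point, ties to the
    smallest original index; returns 0 on an empty sample (an arbitrary convention)."""
    if not sample:
        return 0
    return min(sample, key=lambda pair: abs(pair[0] - x))[1]


def bagged_vote(x, data, base_rule, q, m, rng):
    """g_n^(m): majority vote of base_rule applied to m independent q-subsamples."""
    votes = sum(base_rule(x, q_subsample(data, q, rng)) for _ in range(m))
    return 1 if 2 * votes >= m else 0


def bootstrap_inclusion_prob(n):
    """P{a given pair appears in a with-replacement bootstrap sample of size n}."""
    return 1.0 - (1.0 - 1.0 / n) ** n
```

Because `q_subsample` preserves the original order, `min` returns the earliest of several equidistant points, matching the smallest-index tie-breaking rule.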
By drawing $m$ independent bootstrap samples $D_n(Z_1),\dots,D_n(Z_m)$ (with sizes $N_1,\dots,N_m$), we may define the bagging classifier $g_n^{(m)}(X,Z^m,D_n)$ as the voting classifier based on the randomized classifiers $g_{N_1}(X,D_n(Z_1)),\dots,g_{N_m}(X,D_n(Z_m))$, as in Section 2. For the theoretical analysis it is more convenient to consider the averaged classifier
$$\bar g_n(x,D_n)=\mathbb{1}_{\{\mathbb{E}_Zg_N(x,D_n(Z))\ge1/2\}},$$
which is the limiting classifier one obtains as the number $m$ of bootstrap replicates grows to infinity.

The following result establishes consistency of bagging classifiers under the assumption that the original classifier is consistent. It suffices that the expected size of the bootstrap sample goes to infinity. The result is an immediate consequence of Proposition 1. Note that the choice of $m$ does not matter in Theorem 6: it can be one, a constant, or a function of $n$.

Theorem 6 Let $\{g_n\}$ be a sequence of classifiers that is consistent for the distribution of $(X,Y)$. Consider the bagging classifiers $g_n^{(m)}(x,Z^m,D_n)$ and $\bar g_n(x,D_n)$ defined above, using parameter $q_n$. If $nq_n\to\infty$ as $n\to\infty$, then both classifiers are consistent.

If a classifier is insensitive to duplicates in the data, Breiman's original suggestion is roughly equivalent to taking $q_n\approx1-1/e$. However, it may be advantageous to choose much smaller values of $q_n$. In fact, small values of $q_n$ may turn inconsistent classifiers into consistent ones via the bagging procedure. We illustrate this phenomenon on the simple example of the 1-nearest neighbor rule.

Recall that the 1-nearest neighbor rule sets $g_n(x,D_n)=Y_{(1)}(x)$, where $Y_{(1)}(x)$ is the label of the feature vector $X_{(1)}(x)$ whose Euclidean distance to $x$ is minimal among all $X_1,\dots,X_n$. Ties are broken in favor of the smallest indices. It is well known that $g_n$ is consistent only if either $L^*=0$ or $L^*=1/2$; otherwise its asymptotic probability of error is strictly greater than $L^*$. However, by bagging one may turn the 1-nearest neighbor classifier into a consistent one, provided that the size of the bootstrap sample is sufficiently small. The next result characterizes consistency of the bagging version of the 1-nearest neighbor classifier in terms of the parameter $q_n$.

Theorem 7 The bagging averaged 1-nearest neighbor classifier $\bar g_n(x,D_n)$ is consistent for all distributions of $(X,Y)$ if and only if $q_n\to0$ and $nq_n\to\infty$.

Proof It is obvious that both $q_n\to0$ and $nq_n\to\infty$ are necessary for consistency for all distributions. Assume now that $q_n\to0$ and $nq_n\to\infty$. The key observation is that $\bar g_n(x,D_n)$ is a locally weighted average classifier to which Stone's consistency theorem, recalled in Section 4, applies.

Recall that for a fixed $x\in\mathbb{R}^d$, $X_{(1)}(x),X_{(2)}(x),\dots,X_{(n)}(x)$ denotes the ordering of the data points $X_1,\dots,X_n$ according to increasing distances from $x$. (Points with equal distances to $x$ are ordered according to their indices.) Observe that $\bar g_n$ may be written as
$$\bar g_n(x,D_n)=\mathbb{1}_{\{\sum_{i=1}^nY_iW_{ni}(x)\ge\sum_{i=1}^n(1-Y_i)W_{ni}(x)\}},$$
where
$$W_{ni}(x)=\sum_{j=1}^n\mathbb{1}_{\{X_i=X_{(j)}(x)\}}\,p_j(x)$$
and $p_i(x)=(1-q_n)^{i-1}q_n$ is the probability (with respect to the random selection $Z$ of the bootstrap sample) that $X_{(i)}(x)$ is the nearest neighbor of $x$ in the sample $D_n(Z)$.
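The weights $p_i(x)=(1-q_n)^{i-1}q_n$ and the resulting weighted-vote form of the bagged 1-nearest neighbor classifier can be sketched in Python. The sketch assumes $d=1$, so distances are absolute differences. A short simulation checks $p_i$ against repeated $q_n$-subsampling. Function names are hypothetical. The weights sum to $1-(1-q_n)^n$ rather than one; the gap is the probability of an empty subsample.

```python
import random


def bagged_1nn_weights(n, q):
    """p_i = (1 - q)^(i-1) q, i = 1..n: probability that X_(i)(x) is the nearest
    neighbor of x in a q-subsample (the i-1 closer points dropped, X_(i)(x) kept)."""
    return [(1.0 - q) ** (i - 1) * q for i in range(1, n + 1)]


def bagged_1nn_classify(x, X, Y, q):
    """Averaged bagged 1-NN for d = 1: 1 iff sum_i Y_i W_ni(x) >= sum_i (1 - Y_i) W_ni(x).
    Points at equal distance from x are ordered by index."""
    order = sorted(range(len(X)), key=lambda i: (abs(X[i] - x), i))
    p = bagged_1nn_weights(len(X), q)
    ones = sum(p[r] for r, i in enumerate(order) if Y[i] == 1)
    zeros = sum(p[r] for r, i in enumerate(order) if Y[i] == 0)
    return 1 if ones >= zeros else 0


def simulate_nn_rank(n, q, trials, rng):
    """Monte Carlo frequency of each rank 1..n for the nearest point kept in a
    q-subsample; empty subsamples are counted separately and not returned."""
    counts = [0] * (n + 1)
    for _ in range(trials):
        rank = next((i for i in range(1, n + 1) if rng.random() < q), 0)
        counts[rank] += 1                 # index 0 records an empty subsample
    return [c / trials for c in counts[1:]]


if __name__ == "__main__":
    X, Y = [0.0, 1.0, 2.0, 3.0], [1, 0, 0, 0]
    print(bagged_1nn_classify(0.1, X, Y, q=0.6))  # 1: weight concentrated on X_(1)
    print(bagged_1nn_classify(0.1, X, Y, q=0.3))  # 0: weight spread over more neighbors
```

The two calls show the mechanism behind Theorem 7. A smaller $q_n$ spreads the weight over more neighbors, so the bagged rule stops behaving like the plain 1-nearest neighbor rule.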
It suffices to prove that the weights $W_{ni}(X)$ satisfy the three conditions of Stone's theorem. Condition (i) obviously holds because
$$\max_{1\le i\le n}W_{ni}(X)=p_1(X)=q_n\to0.$$
To check condition (ii), define $k_n=\lfloor\sqrt{n/q_n}\rfloor$. Since $nq_n\to\infty$ implies that $k_n/n\to0$, it follows from (Devroye, Györfi, and Lugosi, 1996, Lemma 5.1) that eventually, almost surely, $\|X-X_{(k_n)}(X)\|\le a$, and therefore
$$\sum_{i=1}^nW_{ni}(X)\mathbb{1}_{\{\|X_i-X\|>a\}}\le\sum_{i=k_n+1}^np_i(X)\le\sum_{i=k_n+1}^\infty q_n(1-q_n)^{i-1}=(1-q_n)^{k_n}\le e^{-q_nk_n}\le e^{q_n-\sqrt{nq_n}},$$
where we used $1-q_n\le e^{-q_n}$ and $k_n\ge\sqrt{n/q_n}-1$. Therefore, $\sum_{i=1}^nW_{ni}(X)\mathbb{1}_{\{\|X_i-X\|>a\}}\to0$ almost surely, and Stone's second condition is satisfied by dominated convergence.
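The arithmetic behind condition (ii) can be checked numerically. The Python sketch below uses hypothetical names. It compares the exact tail weight $\sum_{i>k_n}(1-q_n)^{i-1}q_n$ with the bound $e^{q_n-\sqrt{nq_n}}$ for $k_n=\lfloor\sqrt{n/q_n}\rfloor$, and it shows $k_n/n$ shrinking as $n$ grows.

```python
import math


def tail_weight(n, q, k):
    """sum_{i=k+1}^{n} (1 - q)^(i-1) q: bagged 1-NN weight beyond the k-th neighbor."""
    return sum((1.0 - q) ** (i - 1) * q for i in range(k + 1, n + 1))


def k_n(n, q):
    """k_n = floor(sqrt(n / q))."""
    return math.floor(math.sqrt(n / q))


def tail_bound(n, q):
    """e^{q - sqrt(n q)}, from (1 - q)^{k_n} <= e^{-q k_n} and k_n >= sqrt(n / q) - 1."""
    return math.exp(q - math.sqrt(n * q))


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        for q in (0.01, 0.1, 0.5):
            k = k_n(n, q)
            print(n, q, k / n, tail_weight(n, q, k), tail_bound(n, q))
```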

Finally, condition (iii) follows from the fact that $p_{ni}(x)$ is monotone decreasing in $i$, after using an argument as in the proof of Theorem …

Random Forests Based on Greedily Grown Trees

In this section we study random forest classifiers that are based on randomized tree classifiers constructed in a greedy manner, by recursively splitting cells to minimize an empirical error criterion. Such greedy forests were introduced by Breiman (2001, 2004) and have shown excellent performance in many applications. One of his most popular classifiers is an averaging classifier, $\bar g_n$, based on a randomized tree classifier $g_n(x, Z)$ defined as follows.

The algorithm has a positive integer parameter $1 \le v < d$. The feature space $\mathbb{R}^d$ is partitioned recursively to form a tree partition. The root of the random tree is $\mathbb{R}^d$. At each step of the construction of the tree, a leaf is chosen uniformly at random. Then $v$ variables are selected uniformly at random from the $d$ candidates $x^{(1)}, \ldots, x^{(d)}$. A split is selected along one of these $v$ variables to minimize the number of misclassified training points if a majority vote is used in each cell. The procedure is repeated until every cell contains exactly one training point $X_i$. (This is always possible if the distribution of $X$ has non-atomic marginals.) In some versions of Breiman's algorithm, a bootstrap subsample of the training data is selected before the construction of each tree to increase the effect of randomization.

As observed by Lin and Jeon (2006), Breiman's classifier is a weighted layered nearest neighbor classifier, that is, a classifier that takes a (weighted) majority vote among the layered nearest neighbors of the observation $x$. $X_i$ is called a layered nearest neighbor of $x$ if the rectangle defined by $x$ and $X_i$ as its opposite vertices does not contain any other data point $X_j$ ($j \neq i$). This property of Breiman's random forest classifier is a simple consequence of the fact that each tree is grown until every cell contains just one data point.
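The definition of a layered nearest neighbor translates directly into a brute-force check, sketched below with hypothetical function names. The sketch treats the rectangle as a closed box (boundary ties have probability zero under non-atomic marginals) and costs $O(n^2 d)$ per query point.

```python
def in_box(z, a, b):
    """True if z lies in the closed axis-aligned box with opposite corners a and b."""
    return all(min(ai, bi) <= zi <= max(ai, bi) for zi, ai, bi in zip(z, a, b))


def layered_nearest_neighbors(x, X):
    """Indices i such that the box spanned by x and X[i] contains no other X[j]."""
    return [
        i
        for i, xi in enumerate(X)
        if not any(in_box(xj, x, xi) for j, xj in enumerate(X) if j != i)
    ]


# Points in general position: several points can be layered nearest neighbors.
print(layered_nearest_neighbors((0, 0), [(1, 2), (2, 1), (3, 3)]))  # [0, 1]

# Points on the diagonal: only the two points immediately on either side of x qualify.
diag = [(t, t) for t in (0.1, 0.3, 0.7, 0.9)]
print(layered_nearest_neighbors((0.5, 0.5), diag))  # [1, 2]
```

The diagonal case is exactly the situation exploited in the proof of Proposition 8: however large $n$ is, the vote involves only two points.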
Unfortunately, this simple property prevents the random tree classifier from being consistent for all distributions:

Proposition 8 There exists a distribution of $(X, Y)$ such that $X$ has non-atomic marginals for which Breiman's random forest classifier is not consistent.

Proof The proof works for any weighted layered nearest neighbor classifier. Let the distribution of $X$ be uniform on the segment $\{x = (x^{(1)}, \ldots, x^{(d)}) : x^{(1)} = \cdots = x^{(d)},\ x^{(1)} \in [0,1]\}$ and let the distribution of $Y$ be such that $L^* \notin \{0, 1/2\}$. Then with probability one, $X$ has only two layered nearest neighbors and the classification rule is not consistent. (Note that Problem 11.6 in Devroye, Györfi, and Lugosi (1996) erroneously asks the reader to prove consistency of the (unweighted) layered nearest neighbor rule for any distribution with non-atomic marginals. As the example in this proof shows, the statement of the exercise is incorrect. Consistency of the layered nearest neighbor rule does hold, however, if the distribution of $X$ has a density.)

One may also wonder whether Breiman's random forest classifier is consistent if, instead of growing the tree down to cells with a single data point, one uses a different stopping rule, for example if one fixes the total number of cuts at $k$ and lets $k$ grow slowly as in the examples of Section 3. The next two-dimensional example indicates that this is not necessarily the case. Consider the joint distribution of $(X, Y)$ sketched in Figure 1. $X$ has a uniform distribution on $[0,1] \times [0,1] \cup [1,2] \times [1,2] \cup [2,3] \times [2,3]$. $Y$ is a function of $X$, that is, $\eta(x) \in \{0,1\}$ and $L^* = 0$. The lower left square $[0,1] \times [0,1]$ is divided into countably infinitely many vertical stripes in which the stripes with $\eta(x) = 0$ and $\eta(x) = 1$ alternate. The upper right square $[2,3] \times [2,3]$ is divided similarly into horizontal stripes. The middle square $[1,2] \times [1,2]$ is a $2 \times 2$ checkerboard.

Figure 1: An example of a distribution for which greedy random forests are inconsistent. The distribution of $X$ is uniform on the union of the three large squares. White areas represent the set where $\eta(x) = 0$ and on the grey regions $\eta(x) = 1$.

Consider Breiman's random forest classifier with $v = 1$ (the only possible choice when $d = 2$). For simplicity, consider the case when, instead of minimizing the empirical error, each tree is grown by minimizing the true probability of error at each split in each random tree. Then it is easy to see that, no matter what the sequence of randomly selected split directions is and no matter how long each tree is grown, no tree will ever cut the middle square. Since this square carries probability $1/3$ and the majority vote errs on half of it, the probability of error of the corresponding random forest classifier is at least $1/6$.

It is not so clear what happens in this example if the successive cuts are made by minimizing the empirical error. Whether the middle square is ever cut will depend on the precise form of the stopping rule and the exact parameters of the distribution. The example is here to illustrate that consistency of greedily grown random forests is a delicate issue. Note, however, that if Breiman's original algorithm is used in this example (i.e., when all cells with more than one data point in them are split), then one obtains a consistent classification rule. If, on the other hand, horizontal or vertical cuts are selected to minimize the probability of error, and $k \to \infty$ in such a way that $k = O(n^{1/2 - \varepsilon})$ for some $\varepsilon > 0$, then, as errors on the middle square are never more than about $O(1/\sqrt{n})$ (by the limit law for the Kolmogorov-Smirnov statistic), we see that thin strips of probability mass more than $1/\sqrt{n}$ are preferentially cut. By choosing the probability weights of the strips appropriately, one can construct more than $2k$ such strips. Thus, when $k = O(n^{1/2 - \varepsilon})$, no consistency is possible on that example.

We note here that many versions of random forest classifiers build on random tree classifiers based on bootstrap subsampling. This is the case for Breiman's principal random forest classifier.

Breiman suggests taking a random sample of size $n$ drawn with replacement from the original data. While this may result in improved behavior in some practical instances, it is easy to see that such a subsampling procedure does not alter the consistency properties of any of the classifiers studied in this paper. For example, the non-consistency of Breiman's random forest classifier with bootstrap resampling for the distribution considered in the proof of Proposition 8 follows from the fact that the two layered nearest neighbors on both sides are included in the bootstrap sample with a probability bounded away from zero; therefore the weight of these two points is too large, making consistency impossible.

In order to remedy the inconsistency of greedily grown tree classifiers, Devroye, Györfi, and Lugosi (1996, Section 20.14) introduce a greedy tree classifier which, instead of cutting every cell along just one direction, cuts out a whole hyper-rectangle from a cell so as to optimize the empirical error. The disadvantage of this method is that in each step, $d$ parameters need to be optimized jointly, and this may be computationally prohibitive if $d$ is not very small. (The computational complexity of the method is $O(n^d)$.) However, we may use the methodology of random forests to define a computationally feasible, consistent, greedily grown random forest classifier.

Figure 2: A tree based on partitioning the plane into rectangles. The right subtree of each internal node belongs to the inside of a rectangle, and the left subtree belongs to the complement of the same rectangle ($i^c$ denotes the complement of $i$). Rectangles are not allowed to overlap.

In order to define the consistent greedy random forest, we first recall the tree classifier of Devroye, Györfi, and Lugosi (1996, Section 20.14). The space is partitioned into rectangles as shown in Figure 2. A hyper-rectangle defines a split in a natural way. A partition is denoted by $\mathcal{P}$, and a decision on a set $A \in \mathcal{P}$ is made by majority vote.
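We write $g_{\mathcal{P}}$ for such a rule:

$$g_{\mathcal{P}}(x) = \mathbb{1}\Big\{\sum_{i : X_i \in A(x)} Y_i > \sum_{i : X_i \in A(x)} (1 - Y_i)\Big\},$$

where $A(x)$ denotes the cell of the partition containing $x$. Given a partition $\mathcal{P}$, a legal hyper-rectangle $T$ is one for which $T \cap A = \emptyset$ or $T \subseteq A$ for all sets $A \in \mathcal{P}$. If we refine $\mathcal{P}$ by adding a legal rectangle $T$ somewhere, then we obtain the partition $\mathcal{T}$. The decision $g_{\mathcal{T}}$ agrees with $g_{\mathcal{P}}$ except on the set $A \in \mathcal{P}$ that contains $T$.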
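The rule $g_{\mathcal{P}}$ and the notion of a legal rectangle can be made concrete with a small sketch. It assumes that the rectangles of the partition are stored as a flat list of half-open boxes that are pairwise nested or disjoint (as in Figure 2), so that the cell of a point is identified by the innermost box containing it, or by the root $\mathbb{R}^d$ if there is none. The representation, class name, and function names are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Half-open axis-aligned box [lo_1, hi_1) x ... x [lo_d, hi_d)."""

    lo: tuple
    hi: tuple

    def contains_point(self, z):
        return all(a <= zi < b for zi, a, b in zip(z, self.lo, self.hi))

    def inside(self, other):
        return all(oa <= a and b <= ob
                   for a, b, oa, ob in zip(self.lo, self.hi, other.lo, other.hi))

    def disjoint(self, other):
        return any(b <= oa or ob <= a
                   for a, b, oa, ob in zip(self.lo, self.hi, other.lo, other.hi))

    def volume(self):
        v = 1.0
        for a, b in zip(self.lo, self.hi):
            v *= b - a
        return v


def cell_of(z, boxes):
    """Index of the innermost box containing z, or None for the root cell.

    Picking the smallest volume is valid because boxes are nested or disjoint.
    """
    hits = [j for j, b in enumerate(boxes) if b.contains_point(z)]
    return min(hits, key=lambda j: boxes[j].volume()) if hits else None


def is_legal(t, boxes):
    """T is legal if it is disjoint from, or contained in, every existing box."""
    return all(t.disjoint(b) or t.inside(b) for b in boxes)


def g_partition(x, boxes, X, Y):
    """Majority vote in the cell of x; ties and empty cells give class 0."""
    c = cell_of(x, boxes)
    labels = [y for xi, y in zip(X, Y) if cell_of(xi, boxes) == c]
    return 1 if sum(labels) > len(labels) - sum(labels) else 0


boxes = [Box((1.0, 1.0), (3.0, 3.0))]
X = [(0.5, 0.5), (4.0, 4.0), (2.0, 2.0), (2.5, 1.5), (1.5, 2.5)]
Y = [0, 0, 1, 1, 0]
print(g_partition((2.2, 2.2), boxes, X, Y))          # 1: labels 1, 1, 0 in [1,3)^2
print(g_partition((0.0, 5.0), boxes, X, Y))          # 0: labels 0, 0 outside it
print(is_legal(Box((1.5, 1.5), (2.5, 2.5)), boxes))  # True: nested inside
print(is_legal(Box((2.5, 2.5), (3.5, 3.5)), boxes))  # False: straddles an edge
```

The strict inequality in the vote mirrors the ">" in the definition of $g_{\mathcal{P}}$, so ties are resolved in favor of class 0.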

Introduce the convenient notation

$$\nu_j(A) = \mathbb{P}\{X \in A, Y = j\}, \qquad \nu_{j,n}(A) = \frac{1}{n} \sum_{i=1}^n \mathbb{1}\{X_i \in A, Y_i = j\}, \qquad j \in \{0,1\}.$$

The empirical error of $g_{\mathcal{P}}$ is

$$\hat L_n(\mathcal{P}) \stackrel{\text{def}}{=} \sum_{R \in \mathcal{P}} \hat L_n(R), \qquad \text{where} \qquad \hat L_n(R) = \frac{1}{n} \sum_{i=1}^n \mathbb{1}\{X_i \in R,\ g_{\mathcal{P}}(X_i) \neq Y_i\} = \min\big(\nu_{0,n}(R), \nu_{1,n}(R)\big).$$

We may similarly define $\hat L_n(\mathcal{T})$. Given a partition $\mathcal{P}$, the greedy classifier selects the legal rectangle $T$ for which $\hat L_n(\mathcal{T})$ is minimal (with any appropriate policy for breaking ties). Let $R$ be the set of $\mathcal{P}$ containing $T$. Then the greedy classifier picks the $T$ for which

$$\hat L_n(T) + \hat L_n(R \setminus T) - \hat L_n(R)$$

is minimal. Starting with the trivial partition $\mathcal{P}_0 = \{\mathbb{R}^d\}$, we repeat the previous step $k$ times, thus leading to $k + 1$ regions. The sequence of partitions is denoted by $\mathcal{P}_0, \mathcal{P}_1, \ldots, \mathcal{P}_k$.

Devroye, Györfi, and Lugosi (1996, Theorem 20.9) establish consistency of this classifier. More precisely, it is shown that if $X$ has non-atomic marginals, then the greedy classifier with $k \to \infty$ and $k = o\big(\sqrt{n/\log n}\big)$ is consistent.

Based on the greedy tree classifier, we may define a random forest classifier by considering its bagging version. More precisely, let $q_n \in [0,1]$ be a parameter, let $Z = Z(D_n)$ denote a random subsample of size Binomial$(n, q_n)$ of the training data (i.e., each pair $(X_i, Y_i)$ is selected at random, without replacement, from $D_n$, with probability $q_n$), and let $g_n(x, Z)$ be the greedy tree classifier (as defined above) based on the training data $Z(D_n)$. Define the corresponding averaged classifier $\bar g_n$. We call $\bar g_n$ the greedy random forest classifier. Note that $\bar g_n$ is just the bagging version of the greedy tree classifier, and therefore Theorem 6 applies:

Theorem 9 The greedy random forest classifier is consistent whenever $X$ has non-atomic marginals in $\mathbb{R}^d$, $n q_n \to \infty$, $k \to \infty$ and $k = o\big(\sqrt{q_n n / \log(q_n n)}\big)$ as $n \to \infty$.

Proof This follows from Theorem 6 and the fact that the greedy tree classifier is consistent (see Theorem 20.9 of Devroye, Györfi, and Lugosi (1996)).

Observe that the computational complexity of building the randomized tree classifier $g_n(x, Z)$ is $O((q_n n)^d)$. Thus, the complexity of computing the voting classifier $\bar g_n^{(m)}$ is $O(m (q_n n)^d)$.
If $q_n \ll 1$, this may be a significant speed-up compared to the complexity $O(n^d)$ of computing a single tree classifier using the full sample. Repeated subsampling and averaging may make up for the effect of the decreased sample size.
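The greedy criterion and the subsampling step can be illustrated in the plane with a brute-force sketch. For a single cell $R$ (here the whole plane), it enumerates closed rectangles whose sides pass through data coordinates and returns the one minimizing the empirical error on $T$ plus that on $R \setminus T$ minus that on $R$, using integer counts (that is, $n$ times the empirical errors). It also draws a subsample in which each pair is kept with probability $q$, as in the definition of $Z(D_n)$. This exhaustive search is chosen for readability, does not build the full $k$-step tree, and is not the authors' implementation; all names are hypothetical.

```python
import random
from itertools import combinations_with_replacement


def cell_error(labels):
    """Number of points misclassified by a majority vote in one cell."""
    ones = sum(labels)
    return min(ones, len(labels) - ones)


def best_rectangle(points, labels):
    """One brute-force greedy step for a single cell R in the plane.

    Returns (criterion, rectangle), where criterion is the change in the
    number of training errors caused by cutting the closed rectangle T out
    of R; (0, None) means no rectangle improves on leaving R uncut.
    """
    xs = sorted({p[0] for p in points})
    ys = sorted({p[1] for p in points})
    base = cell_error(labels)
    best = (0, None)
    for x_lo, x_hi in combinations_with_replacement(xs, 2):
        for y_lo, y_hi in combinations_with_replacement(ys, 2):
            inside, outside = [], []
            for (px, py), y in zip(points, labels):
                if x_lo <= px <= x_hi and y_lo <= py <= y_hi:
                    inside.append(y)
                else:
                    outside.append(y)
            crit = cell_error(inside) + cell_error(outside) - base
            if crit < best[0]:  # strict: ties keep the first rectangle found
                best = (crit, ((x_lo, x_hi), (y_lo, y_hi)))
    return best


def bernoulli_subsample(points, labels, q, rng):
    """Keep each (X_i, Y_i) independently with probability q."""
    kept = [(p, y) for p, y in zip(points, labels) if rng.random() < q]
    return [p for p, _ in kept], [y for _, y in kept]


# Class-1 points in the middle, class-0 points around them.
ones = [(2, 2), (2, 3), (3, 2), (3, 3)]
zeros = [(0, 0), (0, 5), (5, 0), (5, 5), (0, 2.5), (5, 2.5), (2.5, 0), (2.5, 5)]
pts, lab = ones + zeros, [1] * 4 + [0] * 8
print(best_rectangle(pts, lab))  # (-4, ((2, 3), (2, 3)))

sub_pts, sub_lab = bernoulli_subsample(pts, lab, q=0.5, rng=random.Random(0))
print(len(sub_pts), best_rectangle(sub_pts, sub_lab))
```

Running the search on a subsample of about $qn$ points instead of all $n$ points is what produces the speed-up described above; a single axis-parallel cut could not isolate the central class-1 points in this example, while one rectangle does.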

Acknowledgments

We thank James Malley for stimulating discussions. We also thank three referees for valuable comments and insightful suggestions. The second author's research was sponsored by NSERC Grant A3456 and FQRNT Grant 90- ER. The third author acknowledges support by the Spanish Ministry of Science and Technology grant MTM and by the PASCAL Network of Excellence under EC grant no.

References

Y. Amit and D. Geman. Shape quantization and recognition with randomized trees. Neural Computation, 9.
L. Breiman. Bagging predictors. Machine Learning, 24.
L. Breiman. Arcing classifiers. The Annals of Statistics, 24.
L. Breiman. Some infinite theory for predictor ensembles. Technical Report 577, Statistics Department, UC Berkeley, breiman.
L. Breiman. Random forests. Machine Learning, 45:5-32.
L. Breiman. Consistency for a simple model of random forests. Technical Report 670, Statistics Department, UC Berkeley.
A. Cutler and G. Zhao. PERT: Perfect random tree ensembles. Computing Science and Statistics, 33.
L. Devroye. Applications of the theory of records in the study of random trees. Acta Informatica, 26.
L. Devroye. A note on the height of binary search trees. Journal of the ACM, 33.
L. Devroye, L. Györfi, and G. Lugosi. A Probabilistic Theory of Pattern Recognition. Springer-Verlag, New York.
T.G. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Machine Learning, 40.
T.G. Dietterich. Ensemble methods in machine learning. In J. Kittler and F. Roli (Eds.), First International Workshop on Multiple Classifier Systems, Lecture Notes in Computer Science, pp. 1-15, Springer-Verlag, New York.
Y. Freund and R. Schapire. Experiments with a new boosting algorithm. In L. Saitta (Ed.), Machine Learning: Proceedings of the 13th International Conference, Morgan Kaufmann, San Francisco.
Y. Lin and Y. Jeon. Random forests and adaptive nearest neighbors. Journal of the American Statistical Association, 101.

N. Meinshausen. Quantile regression forests. Journal of Machine Learning Research, 7.
H.M. Mahmoud. Evolution of Random Search Trees. John Wiley, New York.
C. Stone. Consistent nonparametric regression. The Annals of Statistics, 5.


More information

Learning outcomes. Algorithms and Data Structures. Time Complexity Analysis. Time Complexity Analysis How fast is the algorithm? Prof. Dr.

Learning outcomes. Algorithms and Data Structures. Time Complexity Analysis. Time Complexity Analysis How fast is the algorithm? Prof. Dr. Algorithms ad Data Structures Algorithm efficiecy Learig outcomes Able to carry out simple asymptotic aalysisof algorithms Prof. Dr. Qi Xi 2 Time Complexity Aalysis How fast is the algorithm? Code the

More information

A RANDOM PERMUTATION MODEL ARISING IN CHEMISTRY

A RANDOM PERMUTATION MODEL ARISING IN CHEMISTRY J. Appl. Prob. 45, 060 070 2008 Prited i Eglad Applied Probability Trust 2008 A RANDOM PERMUTATION MODEL ARISING IN CHEMISTRY MARK BROWN, The City College of New York EROL A. PEKÖZ, Bosto Uiversity SHELDON

More information

Ekkehart Schlicht: Economic Surplus and Derived Demand

Ekkehart Schlicht: Economic Surplus and Derived Demand Ekkehart Schlicht: Ecoomic Surplus ad Derived Demad Muich Discussio Paper No. 2006-17 Departmet of Ecoomics Uiversity of Muich Volkswirtschaftliche Fakultät Ludwig-Maximilias-Uiversität Müche Olie at http://epub.ub.ui-mueche.de/940/

More information

Universal coding for classes of sources

Universal coding for classes of sources Coexios module: m46228 Uiversal codig for classes of sources Dever Greee This work is produced by The Coexios Project ad licesed uder the Creative Commos Attributio Licese We have discussed several parametric

More information

Lecture 3. denote the orthogonal complement of S k. Then. 1 x S k. n. 2 x T Ax = ( ) λ x. with x = 1, we have. i = λ k x 2 = λ k.

Lecture 3. denote the orthogonal complement of S k. Then. 1 x S k. n. 2 x T Ax = ( ) λ x. with x = 1, we have. i = λ k x 2 = λ k. 18.409 A Algorithmist s Toolkit September 17, 009 Lecture 3 Lecturer: Joatha Keler Scribe: Adre Wibisoo 1 Outlie Today s lecture covers three mai parts: Courat-Fischer formula ad Rayleigh quotiets The

More information

Class Meeting # 16: The Fourier Transform on R n

Class Meeting # 16: The Fourier Transform on R n MATH 18.152 COUSE NOTES - CLASS MEETING # 16 18.152 Itroductio to PDEs, Fall 2011 Professor: Jared Speck Class Meetig # 16: The Fourier Trasform o 1. Itroductio to the Fourier Trasform Earlier i the course,

More information

Engineering 323 Beautiful Homework Set 3 1 of 7 Kuszmar Problem 2.51

Engineering 323 Beautiful Homework Set 3 1 of 7 Kuszmar Problem 2.51 Egieerig 33 eautiful Homewor et 3 of 7 Kuszmar roblem.5.5 large departmet store sells sport shirts i three sizes small, medium, ad large, three patters plaid, prit, ad stripe, ad two sleeve legths log

More information

Trackless online algorithms for the server problem

Trackless online algorithms for the server problem Iformatio Processig Letters 74 (2000) 73 79 Trackless olie algorithms for the server problem Wolfgag W. Bei,LawreceL.Larmore 1 Departmet of Computer Sciece, Uiversity of Nevada, Las Vegas, NV 89154, USA

More information

The second difference is the sequence of differences of the first difference sequence, 2

The second difference is the sequence of differences of the first difference sequence, 2 Differece Equatios I differetial equatios, you look for a fuctio that satisfies ad equatio ivolvig derivatives. I differece equatios, istead of a fuctio of a cotiuous variable (such as time), we look for

More information

Example 2 Find the square root of 0. The only square root of 0 is 0 (since 0 is not positive or negative, so those choices don t exist here).

Example 2 Find the square root of 0. The only square root of 0 is 0 (since 0 is not positive or negative, so those choices don t exist here). BEGINNING ALGEBRA Roots ad Radicals (revised summer, 00 Olso) Packet to Supplemet the Curret Textbook - Part Review of Square Roots & Irratioals (This portio ca be ay time before Part ad should mostly

More information

where: T = number of years of cash flow in investment's life n = the year in which the cash flow X n i = IRR = the internal rate of return

where: T = number of years of cash flow in investment's life n = the year in which the cash flow X n i = IRR = the internal rate of return EVALUATING ALTERNATIVE CAPITAL INVESTMENT PROGRAMS By Ke D. Duft, Extesio Ecoomist I the March 98 issue of this publicatio we reviewed the procedure by which a capital ivestmet project was assessed. The

More information

Solutions to Selected Problems In: Pattern Classification by Duda, Hart, Stork

Solutions to Selected Problems In: Pattern Classification by Duda, Hart, Stork Solutios to Selected Problems I: Patter Classificatio by Duda, Hart, Stork Joh L. Weatherwax February 4, 008 Problem Solutios Chapter Bayesia Decisio Theory Problem radomized rules Part a: Let Rx be the

More information

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed. This documet was writte ad copyrighted by Paul Dawkis. Use of this documet ad its olie versio is govered by the Terms ad Coditios of Use located at http://tutorial.math.lamar.edu/terms.asp. The olie versio

More information

Measures of Spread and Boxplots Discrete Math, Section 9.4

Measures of Spread and Boxplots Discrete Math, Section 9.4 Measures of Spread ad Boxplots Discrete Math, Sectio 9.4 We start with a example: Example 1: Comparig Mea ad Media Compute the mea ad media of each data set: S 1 = {4, 6, 8, 10, 1, 14, 16} S = {4, 7, 9,

More information

Notes on exponential generating functions and structures.

Notes on exponential generating functions and structures. Notes o expoetial geeratig fuctios ad structures. 1. The cocept of a structure. Cosider the followig coutig problems: (1) to fid for each the umber of partitios of a -elemet set, (2) to fid for each the

More information

Chapter 5 O A Cojecture Of Erdíos Proceedigs NCUR VIII è1994è, Vol II, pp 794í798 Jeærey F Gold Departmet of Mathematics, Departmet of Physics Uiversity of Utah Do H Tucker Departmet of Mathematics Uiversity

More information

Section 11.3: The Integral Test

Section 11.3: The Integral Test Sectio.3: The Itegral Test Most of the series we have looked at have either diverged or have coverged ad we have bee able to fid what they coverge to. I geeral however, the problem is much more difficult

More information

Distributions of Order Statistics

Distributions of Order Statistics Chapter 2 Distributios of Order Statistics We give some importat formulae for distributios of order statistics. For example, where F k: (x)=p{x k, x} = I F(x) (k, k + 1), I x (a,b)= 1 x t a 1 (1 t) b 1

More information

Statistical inference: example 1. Inferential Statistics

Statistical inference: example 1. Inferential Statistics Statistical iferece: example 1 Iferetial Statistics POPULATION SAMPLE A clothig store chai regularly buys from a supplier large quatities of a certai piece of clothig. Each item ca be classified either

More information

The analysis of the Cournot oligopoly model considering the subjective motive in the strategy selection

The analysis of the Cournot oligopoly model considering the subjective motive in the strategy selection The aalysis of the Courot oligopoly model cosiderig the subjective motive i the strategy selectio Shigehito Furuyama Teruhisa Nakai Departmet of Systems Maagemet Egieerig Faculty of Egieerig Kasai Uiversity

More information

INFINITE SERIES KEITH CONRAD

INFINITE SERIES KEITH CONRAD INFINITE SERIES KEITH CONRAD. Itroductio The two basic cocepts of calculus, differetiatio ad itegratio, are defied i terms of limits (Newto quotiets ad Riema sums). I additio to these is a third fudametal

More information

1 Hypothesis testing for a single mean

1 Hypothesis testing for a single mean BST 140.65 Hypothesis Testig Review otes 1 Hypothesis testig for a sigle mea 1. The ull, or status quo, hypothesis is labeled H 0, the alterative H a or H 1 or H.... A type I error occurs whe we falsely

More information

1.3 Binomial Coefficients

1.3 Binomial Coefficients 18 CHAPTER 1. COUNTING 1. Biomial Coefficiets I this sectio, we will explore various properties of biomial coefficiets. Pascal s Triagle Table 1 cotais the values of the biomial coefficiets ( ) for 0to

More information

Annuities Under Random Rates of Interest II By Abraham Zaks. Technion I.I.T. Haifa ISRAEL and Haifa University Haifa ISRAEL.

Annuities Under Random Rates of Interest II By Abraham Zaks. Technion I.I.T. Haifa ISRAEL and Haifa University Haifa ISRAEL. Auities Uder Radom Rates of Iterest II By Abraham Zas Techio I.I.T. Haifa ISRAEL ad Haifa Uiversity Haifa ISRAEL Departmet of Mathematics, Techio - Israel Istitute of Techology, 3000, Haifa, Israel I memory

More information

1 Computing the Standard Deviation of Sample Means

1 Computing the Standard Deviation of Sample Means Computig the Stadard Deviatio of Sample Meas Quality cotrol charts are based o sample meas ot o idividual values withi a sample. A sample is a group of items, which are cosidered all together for our aalysis.

More information

Lecture 2: Karger s Min Cut Algorithm

Lecture 2: Karger s Min Cut Algorithm priceto uiv. F 3 cos 5: Advaced Algorithm Desig Lecture : Karger s Mi Cut Algorithm Lecturer: Sajeev Arora Scribe:Sajeev Today s topic is simple but gorgeous: Karger s mi cut algorithm ad its extesio.

More information

Trading the randomness - Designing an optimal trading strategy under a drifted random walk price model

Trading the randomness - Designing an optimal trading strategy under a drifted random walk price model Tradig the radomess - Desigig a optimal tradig strategy uder a drifted radom walk price model Yuao Wu Math 20 Project Paper Professor Zachary Hamaker Abstract: I this paper the author iteds to explore

More information

Basic Elements of Arithmetic Sequences and Series

Basic Elements of Arithmetic Sequences and Series MA40S PRE-CALCULUS UNIT G GEOMETRIC SEQUENCES CLASS NOTES (COMPLETED NO NEED TO COPY NOTES FROM OVERHEAD) Basic Elemets of Arithmetic Sequeces ad Series Objective: To establish basic elemets of arithmetic

More information

ON THE EDGE-BANDWIDTH OF GRAPH PRODUCTS

ON THE EDGE-BANDWIDTH OF GRAPH PRODUCTS ON THE EDGE-BANDWIDTH OF GRAPH PRODUCTS JÓZSEF BALOGH, DHRUV MUBAYI, AND ANDRÁS PLUHÁR Abstract The edge-badwidth of a graph G is the badwidth of the lie graph of G We show asymptotically tight bouds o

More information

Hypergeometric Distributions

Hypergeometric Distributions 7.4 Hypergeometric Distributios Whe choosig the startig lie-up for a game, a coach obviously has to choose a differet player for each positio. Similarly, whe a uio elects delegates for a covetio or you

More information

Lesson 17 Pearson s Correlation Coefficient

Lesson 17 Pearson s Correlation Coefficient Outlie Measures of Relatioships Pearso s Correlatio Coefficiet (r) -types of data -scatter plots -measure of directio -measure of stregth Computatio -covariatio of X ad Y -uique variatio i X ad Y -measurig

More information

University of California, Los Angeles Department of Statistics. Distributions related to the normal distribution

University of California, Los Angeles Department of Statistics. Distributions related to the normal distribution Uiversity of Califoria, Los Ageles Departmet of Statistics Statistics 100B Istructor: Nicolas Christou Three importat distributios: Distributios related to the ormal distributio Chi-square (χ ) distributio.

More information

Totally Corrective Boosting Algorithms that Maximize the Margin

Totally Corrective Boosting Algorithms that Maximize the Margin Mafred K. Warmuth mafred@cse.ucsc.edu Ju Liao liaoju@cse.ucsc.edu Uiversity of Califoria at Sata Cruz, Sata Cruz, CA 95064, USA Guar Rätsch Guar.Raetsch@tuebige.mpg.de Friedrich Miescher Laboratory of

More information

Exploratory Data Analysis

Exploratory Data Analysis 1 Exploratory Data Aalysis Exploratory data aalysis is ofte the rst step i a statistical aalysis, for it helps uderstadig the mai features of the particular sample that a aalyst is usig. Itelliget descriptios

More information

9.8: THE POWER OF A TEST

9.8: THE POWER OF A TEST 9.8: The Power of a Test CD9-1 9.8: THE POWER OF A TEST I the iitial discussio of statistical hypothesis testig, the two types of risks that are take whe decisios are made about populatio parameters based

More information

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design A Combied Cotiuous/Biary Geetic Algorithm for Microstrip Atea Desig Rady L. Haupt The Pesylvaia State Uiversity Applied Research Laboratory P. O. Box 30 State College, PA 16804-0030 haupt@ieee.org Abstract:

More information

Trigonometric Form of a Complex Number. The Complex Plane. axis. ( 2, 1) or 2 i FIGURE 6.44. The absolute value of the complex number z a bi is

Trigonometric Form of a Complex Number. The Complex Plane. axis. ( 2, 1) or 2 i FIGURE 6.44. The absolute value of the complex number z a bi is 0_0605.qxd /5/05 0:45 AM Page 470 470 Chapter 6 Additioal Topics i Trigoometry 6.5 Trigoometric Form of a Complex Number What you should lear Plot complex umbers i the complex plae ad fid absolute values

More information