arXiv:1506.08910v1 [stat.ML] 30 Jun 2015

Learning Single Index Models in High Dimensions

Ravi Ganti (1), Nikhil Rao (2), Rebecca M. Willett (3) and Robert Nowak (3)

(1) Wisconsin Institutes for Discovery, 330 N Orchard St, Madison, WI
(2) Department of Computer Science, University of Texas at Austin
(3) Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, WI

Abstract

Single Index Models (SIMs) are simple yet flexible semi-parametric models for classification and regression. Response variables are modeled as a nonlinear, monotonic function of a linear combination of features. Estimation in this context requires learning both the feature weights and the nonlinear function. While methods have been described to learn SIMs in the low dimensional regime, a method that can efficiently learn SIMs in high dimensions has not been forthcoming. We propose three variants of a computationally and statistically efficient algorithm for SIM inference in high dimensions. We establish excess risk bounds for the proposed algorithms and experimentally validate the advantages that our SIM learning methods provide relative to Generalized Linear Model (GLM) and low dimensional SIM based learning methods.

1 Introduction

High-dimensional learning is often tackled using generalized linear models, where we assume that a response variable $Y \in \mathbb{R}$ is related to a feature vector $X \in \mathbb{R}^d$ via

$$\mathbb{E}[Y \mid X = x] = g_\star(w_\star^\top x) \tag{1}$$

for some weight vector $w_\star \in \mathbb{R}^d$ and some monotonic and smooth function $g_\star$ called the transfer function. Typical examples of $g_\star$ are the logit function and the probit function for classification, and the linear function for regression. While classical work on generalized linear models (GLMs) assumes $g_\star$ is known, this potentially nonlinear function is often unknown and hence a major challenge in statistical inference. The model in (1) with $g_\star$ unknown is called a Single Index Model (SIM) and is a powerful semi-parametric generalization of a GLM. SIMs were first introduced in econometrics and statistics [3, 4, 2].
Recently, computationally and statistically efficient algorithms have been provided for learning SIMs [6, 5] in low-dimensional settings where the number of samples/observations $n$ is larger than the ambient dimension $d$. However, modern data analysis problems in machine learning, signal processing, and computational biology involve high dimensional datasets, where the number of parameters far exceeds the number of samples ($n \ll d$). In this paper we consider the problem of learning SIMs, given labeled data, in the high-dimensional regime. We provide algorithms that are both computationally and statistically efficient for learning SIMs in high dimensions, and validate our methods on several high dimensional datasets. Our contributions in this paper can be summarized as follows:

1. We propose a suite of algorithms to learn SIMs in high dimensions. Our simplest algorithm, called SILO (Single Index Lasso Optimization), is a simple, non-iterative method that estimates the vector $w_\star$ and a monotonic, Lipschitz function $g_\star$. iSILO and ciSILO are iterative variants of SILO that use different loss functions. While iSILO uses a squared loss function, ciSILO uses a calibrated loss function that adapts to the SIM from which our data is generated.
2. We provide excess risk bounds on the hypotheses returned by SILO, iSILO and ciSILO.
3. We experimentally compare our algorithms with other methods used both for SIM learning and high dimensional parameter estimation on various real world high dimensional datasets. Our experimental results show superior performance of iSILO and ciSILO when compared to commonly used methods for high dimensional estimation.

The rest of the paper is organized as follows: In Section 2, we formally set up the problem we wish to solve, and detail the proposed methods SILO, iSILO and ciSILO. In Section 3, we perform a theoretical analysis of SILO, iSILO and ciSILO. We perform a thorough empirical evaluation on several datasets in Section 4, and conclude our paper in Section 5. Full proofs of our theoretical analysis are available in the appendix.

1.1 Related work

High dimensional parameter estimation for GLMs has been widely studied, both from a theoretical and algorithmic point of view ([15, 7, 9] and references therein). Learning SIMs is a harder problem and was first introduced in econometrics [4] and statistics [3]. In [6] the authors proposed and analyzed the Isotron algorithm to learn SIMs in the low dimensional setting. Isotron uses perceptron type updates to learn $w_\star$, along with application of the Pool Adjacent Violator (PAV) algorithm to learn $g_\star$. This was improved in [5], where the authors proposed the Slisotron algorithm that combined perceptron updates to learn $w_\star$ with a Lipschitz PAV (LPAV) procedure to learn $g_\star$. Both the Isotron and the Slisotron algorithm rely on performing perceptron updates.
While the perceptron algorithm works for low-dimensional classification problems, to the best of our knowledge the performance of the perceptron algorithm has not been studied in high dimensions. Hence, it is not clear if the Isotron and the Slisotron algorithms, designed for learning SIMs in low dimensions, would work in the high dimensional setting.

Alquier and Biau [1] consider learning high dimensional single index models. The authors provide estimators of $g_\star, w_\star$ using PAC-Bayesian analysis. However, the estimator relies on reversible jump MCMC, and it is seemingly hard to implement. Also, the MCMC step is slow to converge even for moderately sized problems. To the best of our knowledge, simple, practical algorithms with theoretical guarantees and good empirical performance for learning single index models in high dimensions are not available. Restricted versions of the SIM estimation problem have been considered in [11, 12], where the authors are only interested in accurate parameter estimation and not prediction. Hence, in these works the proposed algorithms do not learn the transfer function.

The LPAV: Before we discuss algorithms for learning high dimensional SIMs, we discuss the LPAV algorithm proposed in [5] as an extension to the PAV method used in [6]. Given data $(p_1, y_1), \ldots, (p_n, y_n)$, where $p_1, \ldots, p_n \in \mathbb{R}$, the LPAV outputs the best univariate monotonic, 1-Lipschitz function $\hat{g}$ that minimizes the squared error $\sum_{i=1}^n (g(p_i) - y_i)^2$. In order to do this, the LPAV first solves the following optimization problem:

$$\hat{z} = \arg\min_{z \in \mathbb{R}^n} \|z - y\|_2^2 \quad \text{s.t.} \quad 0 \le z_j - z_i \le p_j - p_i \ \text{ if } p_i \le p_j \tag{2}$$

where $\hat{g}(p_i) = \hat{z}_i$. This gives us the value of $\hat{g}$ on a discrete set of points $p_1, \ldots, p_n$. To get $\hat{g}$ everywhere else on the real line, we simply perform linear interpolation as follows: Sort $p_i$ for all $i$ and let $p_{\{i\}}$ be the $i$-th entry after sorting.
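Then, for any $\zeta \in \mathbb{R}$, we have

$$\hat{g}(\zeta) = \begin{cases} \hat{z}_{\{1\}}, & \text{if } \zeta \le p_{\{1\}} \\ \hat{z}_{\{n\}}, & \text{if } \zeta \ge p_{\{n\}} \\ \mu \hat{z}_{\{i\}} + (1-\mu)\hat{z}_{\{i+1\}}, & \text{if } \zeta = \mu p_{\{i\}} + (1-\mu) p_{\{i+1\}} \end{cases} \tag{3}$$

In the algorithms that we shall discuss in this paper, we shall invoke the LPAV routine with $p_i$ set to the projection of the data point $x_i$ on some algorithm-dependent weight vector $w$.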
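The interpolation rule (3) can be sketched in a few lines. This is only an illustration under assumptions: the function name `lpav_predict` and the toy values are hypothetical, and the fitted values `z_hat` are taken as given rather than obtained by solving problem (2). It relies on `numpy.interp`, which is piecewise linear between the knots and constant outside them, matching the three cases of (3).

```python
import numpy as np

def lpav_predict(p, z_hat, query):
    """Evaluate the interpolated function of Equation (3) at the points in `query`.

    p      : projections p_1..p_n (any order)
    z_hat  : fitted values z_1..z_n at those projections (assumed given)
    """
    order = np.argsort(p)
    p_sorted = np.asarray(p, dtype=float)[order]
    z_sorted = np.asarray(z_hat, dtype=float)[order]
    # Linear between consecutive sorted knots, clamped to the end values outside.
    return np.interp(query, p_sorted, z_sorted)

# Toy values (hypothetical) that satisfy the monotone, 1-Lipschitz constraints.
p = np.array([0.5, -1.0, 2.0])
z_hat = np.array([0.4, 0.1, 0.9])
values = lpav_predict(p, z_hat, np.array([-3.0, -0.25, 1.25, 5.0]))
```

The first and last queries fall outside the range of the knots and therefore receive the boundary values, while the two middle queries sit halfway between neighbouring knots.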

2 Statistical model and proposed algorithms

Assume we are provided i.i.d. data $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, where the label $Y$ is generated according to the model $\mathbb{E}[Y \mid X = x] = g_\star(w_\star^\top x)$ for an unknown parameter vector $w_\star \in \mathbb{R}^d$ and an unknown 1-Lipschitz, monotonic function $g_\star$. We additionally assume that $y \in [0, 1]$, $\|w_\star\|_2 \le 1$ and $\|w_\star\|_0 \le s$, where $\|\cdot\|_0$ is the $\ell_0$ pseudo-norm. The sparsity assumption on $w_\star$ is motivated by the fact that consistent estimation in high dimensions is an ill-posed problem without making further structural assumptions on the underlying parameters.

Our goal is to make predictions on unseen data. Specifically, we would like to provide estimators $\hat{g}$ and $\hat{w}$ of $g_\star$ and $w_\star$ so that, given a previously unseen sample $x$, we predict $\hat{y} = \hat{g}(\hat{w}^\top x)$. To this end, we propose three algorithms that we explain next.

2.1 SILO: Single Index Lasso Optimization

We first propose SILO, a simple SIM learning algorithm that first learns $\hat{w}$ and then fits a function $\hat{g}$ using $\hat{w}$. Specifically, SILO performs the following two steps in a single pass:

1. In order to learn $\hat{w}$ we solve the problem that was first proposed in [10]. This optimization problem is independent of the transfer function $g_\star$ and minimizes a linear loss subject to model constraints:

$$\hat{w} = \arg\min_{w:\ \|w\|_2 \le 1,\ \|w\|_1 \le \sqrt{s}} \ -\frac{1}{n}\sum_{i=1}^n y_i x_i^\top w \tag{4}$$

where the constraint $\|w\|_1 \le \sqrt{s}$ arises from constraining an $s$-sparse vector to have unit Euclidean norm.

2. After learning $\hat{w}$, SILO simply fits a 1-Lipschitz monotonic function by invoking the LPAV routine with the vector $p = [p_1, \ldots, p_n]$, where $p_i = \hat{w}^\top x_i$. LPAV outputs a function $\hat{g}$.

Our final predictor has the form $\hat{y} = \hat{g}(\hat{w}^\top x)$. Note that there is no need to re-learn $\hat{w}$ after learning $\hat{g}$, since the optimization problem to learn $\hat{w}$ is independent of $\hat{g}$. This property makes SILO a very simple and computationally attractive algorithm.

2.2 iSILO: Iterative SILO with squared loss

SILO is computationally very efficient, since it only involves learning $\hat{w}, \hat{g}$ once. However, completely ignoring $\hat{g}$ to learn $\hat{w}$ could be suboptimal, and we propose two algorithms to overcome this drawback.
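Step 1 of SILO maximizes a linear function over the intersection of an $\ell_2$ ball and an $\ell_1$ ball. A minimal sketch follows, assuming (this is not stated in the text) the well-known characterization that such a maximizer is a normalized soft-thresholding of the correlation vector $v = \frac{1}{n}\sum_i y_i x_i$, with the threshold found by bisection. The function names are hypothetical, rows of `X` are taken to be the samples, and ties among the largest entries of $v$ are not handled.

```python
import numpy as np

def soft_threshold(v, tau):
    """Entrywise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def silo_direction(X, y, s, iters=100):
    """Sketch of problem (4): maximise v^T w with ||w||_2 <= 1 and ||w||_1 <= sqrt(s).

    X has one sample per row (an assumption of this sketch).
    """
    n = X.shape[0]
    v = X.T @ y / n
    c = np.sqrt(s)

    def normalised(tau):
        u = soft_threshold(v, tau)
        norm = np.linalg.norm(u)
        return u / norm if norm > 0 else u

    w = normalised(0.0)
    if np.abs(w).sum() <= c:          # l1 constraint inactive: w = v / ||v||_2
        return w
    lo, hi = 0.0, np.abs(v).max()     # bisection on the threshold
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.abs(normalised(mid)).sum() <= c:
            hi = mid                  # feasible: try a smaller threshold
        else:
            lo = mid
    return normalised(hi)
```

With `s = 1` the feasible set only contains signed coordinate directions on its boundary, so the sketch should return the coordinate with the largest correlation; with a large `s` the $\ell_1$ constraint is inactive and the result is simply $v / \|v\|_2$.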
We first propose iSILO, an iterative method detailed in Algorithm 1. Given the model in (1), iSILO minimizes the squared loss with a sparsity penalty to estimate $\hat{w}, \hat{g}$:

$$\hat{w}, \hat{g} = \arg\min_{w, g} \ \frac{1}{n}\sum_{i=1}^n (y_i - g(w^\top x_i))^2 + \lambda \|w\|_1. \tag{5}$$

We adopt an alternating minimization procedure. In iteration $t$, given $g_{t-1}$, we would ideally perform a proximal point update w.r.t. $w$ to obtain

$$w_t = \mathrm{Prox}_{\lambda\eta, \|\cdot\|_1}\left( w_{t-1} - \frac{\eta}{n}\sum_{i=1}^n (g_{t-1}(w_{t-1}^\top x_i) - y_i)\, g'_{t-1}(w_{t-1}^\top x_i)\, x_i \right)$$

where $\mathrm{Prox}(\cdot)$ is the soft thresholding operator associated with the $\|\cdot\|_1$ norm, $\eta > 0$ is an appropriate step size, and $g'_{t-1}$ is the derivative of $g_{t-1}$. Unfortunately, the above gradient step requires us to estimate the derivative of $g_{t-1}$, which can be difficult. So, instead of performing the above proximal gradient update, we perform a proximal perceptron type update similar in spirit to [6, 5], by replacing $g'_{t-1}$ by the Lipschitz constant of $g_{t-1}$. Since $g_{t-1}$ is obtained using the LPAV algorithm, $g_{t-1}$ is 1-Lipschitz. Note that unlike the perceptron, we have a non-unity step size. This leads to the following update equation:

$$w_t = \mathrm{Prox}_{\lambda\eta, \|\cdot\|_1}\left( w_{t-1} - \frac{\eta}{n}\sum_{i=1}^n (g_{t-1}(w_{t-1}^\top x_i) - y_i)\, x_i \right) \tag{6}$$

Algorithm 1: iSILO
Require: Data: $X = [x_1, \ldots, x_n]$, Labels: $y = [y_1, \ldots, y_n]^\top$, Regularization: $\lambda$, Step size: $\eta$, Initial parameters: $g_0$ a 1-Lipschitz, monotonic function, $w_0 \in \mathbb{R}^d$, Iterations: $T > 0$.
 1: Initialize $\hat{w} = w_0$, $\hat{g} = g_0$.
 2: opterr = MSE($w_0, g_0$)
 3: for $t = 1, \ldots, T$ do
 4:   Perform the update shown in Equation (6) to get $w_t$.
 5:   Calculate err = MSE($w_t, g_{t-1}$).
 6:   if err $\le$ opterr then
 7:     opterr = err.
 8:     $\hat{w} = w_t$, $\hat{g} = g_{t-1}$
 9:   end if
10:   Obtain $g_t$ by solving problem (2) with $p_i = w_t^\top x_i$ and linear interpolation (3)
11:   Calculate err = MSE($w_t, g_t$).
12:   if err $\le$ opterr then
13:     opterr = err.
14:     $\hat{w} = w_t$, $\hat{g} = g_t$
15:   end if
16: end for
17: Output $\hat{w}, \hat{g}$

Given $w_t$ in iteration $t$, iSILO updates $g_t$ to be the solution to the LPAV problem with $p_i = w_t^\top x_i$. The non-convexity of (5) requires us to perform a book-keeping procedure that keeps track of the best estimate of $\hat{g}, \hat{w}$ by calculating the MSE of the current hypothesis on a held-out validation set. This is done in steps 5-9 and 11-15 of Algorithm 1. Similar book-keeping procedures have been used in the Isotron and Slisotron algorithms of [6, 5].

2.3 ciSILO: Iterative SILO with calibrated loss

iSILO, like the Slisotron algorithm [5], uses a squared loss function and an approximate gradient descent method to estimate $w$. These methods do not take into account the derivative of the estimate of the transfer function while taking gradient descent steps. We now propose ciSILO, a version of SILO that uses a calibrated loss function that adapts to the SIM that we are trying to learn. Suppose $g_\star$ was known. Let $\Phi_\star : \mathbb{R} \to \mathbb{R}$ be a function such that $\Phi_\star' = g_\star$. Since $g_\star$ is monotonically increasing, $\Phi_\star$ is convex, and we can learn $\hat{w}$ by solving the following convex program:

$$\hat{w} := \arg\min_{w} \ \frac{1}{n}\sum_{i=1}^n \left[ \Phi_\star(w^\top x_i) - y_i w^\top x_i \right] + \lambda \|w\|_1 \tag{7}$$

When the transfer function is linear, $\Phi_\star$ is a quadratic function, and we obtain the standard Lasso problem that minimizes squared loss with an $\ell_1$ penalty. When the transfer function is the logit function, (7) reduces to sparse logistic regression. Modulo the $\ell_1$ penalty term, the above objective is a sample version of the following stochastic optimization problem:

$$\min_{w} \ \mathbb{E}[\Phi_\star(w^\top x) - y\, w^\top x]. \tag{8}$$
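The two special cases of the calibrated loss mentioned above can be checked by hand. The short derivation below is a worked illustration, writing $z = w^\top x$ and, for the logistic case, $\sigma(z) = 1/(1+e^{-z})$ for the transfer function; these shorthand symbols are introduced here only for the example.

```latex
% Linear transfer function: g(z) = z, so \Phi(z) = z^2/2.
\Phi(z) - y z \;=\; \tfrac{1}{2} z^2 - y z \;=\; \tfrac{1}{2}(z - y)^2 - \tfrac{1}{2} y^2 ,
% i.e. the squared loss up to a term that does not depend on w.

% Logistic transfer function: g(z) = \sigma(z), so \Phi(z) = \log(1 + e^{z}).
\Phi(z) - y z \;=\; \log(1 + e^{z}) - y z
            \;=\; -\,y \log \sigma(z) \;-\; (1 - y) \log\bigl(1 - \sigma(z)\bigr),
% i.e. the logistic (negative log-likelihood) loss, using
% \log \sigma(z) = z - \log(1 + e^{z}) and \log(1 - \sigma(z)) = -\log(1 + e^{z}).
```

In both cases $\Phi' = g$, so minimizing (7) with the corresponding $\Phi$ gives the Lasso and sparse logistic regression respectively.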

If $\Phi_\star' = g_\star$, then the optimal solution to the above problem corresponds to the single index model that satisfies $\mathbb{E}[Y \mid X = x] = g_\star(w_\star^\top x)$. Hence the above calibrated loss function takes into account the transfer function $g_\star$ used in the SIM via $\Phi_\star$, and automatically adapts to the SIM from which the data is generated. When $g_\star$ is unknown, we instead consider the following optimization problem:

$$\hat{w}, \hat{g} = \arg\min_{w, g} \ \frac{1}{n}\sum_{i=1}^n \left[ \Phi(w^\top x_i) - y_i w^\top x_i \right] + \lambda \|w\|_1 \quad \text{s.t.} \quad g = \Phi' \in \mathcal{G} \tag{9}$$

where the set $\mathcal{G} = \{g : \mathbb{R} \to \mathbb{R} \text{ is a 1-Lipschitz, monotonic function}\}$. Note that the above optimization problem optimizes for $g$ via its integral. ciSILO solves the above optimization problem by iteratively minimizing for $w$ and $g$. The pseudo-code for ciSILO is given in Algorithm 2. There are three key update procedures performed in each iteration of ciSILO, which we explain below.

In Step 4, ciSILO fixes $g$ to $g_{t-1}$ and performs one step of a proximal point update on the objective in problem (9) w.r.t. $w$ to get:

$$w_t = \mathrm{Prox}_{\lambda\eta, \|\cdot\|_1}\left( w_{t-1} - \frac{\eta}{n}\sum_{i=1}^n (g_{t-1}(w_{t-1}^\top x_i) - y_i)\, x_i \right). \tag{10}$$

This step has the same form as the update step in iSILO. The difference is that $g'_{t-1}$ does not feature in the gradient of the calibrated loss at all, whereas iSILO drops it as an approximation. Thus, the proximal point steps using a calibrated loss function can be performed exactly, unlike the proximal point steps in iSILO.

The use of a calibrated loss function brings with it another challenge: the LPAV procedure, which was designed to minimize the squared loss, can no longer be used in ciSILO to estimate $g_\star$. ciSILO instead uses a novel quadratic program to efficiently estimate $g_\star$. From the first order optimality conditions of the optimization problem (9) for $w$ at $w_t$ we get that the optimal function $g_t$ should satisfy

$$\frac{1}{n}\sum_{i=1}^n (g_t(w_t^\top x_i) - y_i)\, x_i + \lambda \beta_t = 0, \quad \beta_t \in \partial \|w_t\|_1. \tag{11}$$

$g_t$ is updated such that the L.H.S. of (11) has the smallest possible norm. This can be cast as a quadratic program (QP) as follows. Define $p = [p_1, \ldots, p_n]^\top$, where $p_i = w_t^\top x_i$, and $z = [z_1, \ldots, z_n]^\top$, where $z_i = g_t(p_i)$. Let $X = [x_1, \ldots, x_n]^\top$ be the $n \times d$ data matrix. Let $q = -X^\top y$. Now, solve the problem

$$\min_{z} \ \|X^\top z + q\|_2^2 \quad \text{s.t.} \quad 0 \le z_i \le 1 \ \ \forall i \ \text{ and } \ 0 \le z_j - z_i \le p_j - p_i \ \text{ if } p_i \le p_j \tag{12}$$

We call optimization problem (12) QPFit. It differs from the LPAV in that it is derived from optimizing a calibrated loss function, which could be very different from the squared loss.

2.4 Initializing iSILO and ciSILO

Since both iSILO and ciSILO are non-convex, alternating minimization procedures, a good initialization is key to achieving good performance. A simple initialization would be to choose $w_0$ randomly and $g_0$ to be the identity function. However, we initialize both methods with $\hat{w}, \hat{g}$ obtained by running the (efficient) SILO algorithm from Section 2.1. We demonstrate in the next section that this yields very good theoretical guarantees, and in Section 4 that it yields good empirical performance.

Remarks: Like in iSILO, we perform book-keeping steps in ciSILO too. Since obtaining exact or approximate gradients in iSILO and ciSILO is easy, we use first order methods to solve for $\hat{w}$. Using line search methods in ciSILO to compute step sizes would require evaluating the calibrated loss function. This can be computationally intensive, since we have access to the calibrated loss function only via its gradient. Hence, in iSILO and ciSILO we use a fixed step size to perform our updates. Despite the use of a fixed step size, we show empirically that iSILO is often as competitive as, and sometimes better at making predictions than, GLM based methods with optimal step sizes, and that ciSILO is significantly superior.
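The shared update (6)/(10) and the book-keeping loop of Algorithms 1 and 2 can be sketched as follows. This is an illustrative skeleton, not the authors' implementation: the names `prox_step`, `alternate` and `fit_g` are hypothetical, rows of `X` are assumed to be samples, and the routine that refits the transfer function (LPAV for iSILO, QPFit for ciSILO) is passed in as a callable rather than implemented here.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_step(w, g, X, y, eta, lam):
    """One update of the form (6)/(10): gradient-like step, then soft-thresholding."""
    residual = g(X @ w) - y
    direction = X.T @ residual / len(y)
    return soft_threshold(w - eta * direction, eta * lam)

def alternate(X, y, X_val, y_val, w0, g0, fit_g, eta, lam, T):
    """Alternate between w- and g-updates, keeping the best validation MSE.

    fit_g(p, y) is assumed to return a callable transfer function.
    """
    def mse(w, g):
        return np.mean((g(X_val @ w) - y_val) ** 2)

    w, g = w0, g0
    best_w, best_g, best_err = w0, g0, mse(w0, g0)
    for _ in range(T):
        w = prox_step(w, g, X, y, eta, lam)       # step 4
        err = mse(w, g)                           # book-keeping after the w-update
        if err <= best_err:
            best_w, best_g, best_err = w, g, err
        g = fit_g(X @ w, y)                       # refit the transfer function
        err = mse(w, g)                           # book-keeping after the g-update
        if err <= best_err:
            best_w, best_g, best_err = w, g, err
    return best_w, best_g, best_err
```

Because the returned pair is the best one seen on the validation data, the validation error of the output can never exceed that of the initializer, which is the property the analysis of the iterative variants relies on.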

Algorithm 2: ciSILO
Require: Data: $X = [x_1, \ldots, x_n]$, Labels: $y = [y_1, \ldots, y_n]^\top$, Regularization parameter $\lambda$, step size $\eta$, Initial parameters: $w_0 \in \mathbb{R}^d$, $g_0 : \mathbb{R} \to \mathbb{R}$ a 1-Lipschitz, monotonic function.
 1: Initialize $\hat{w} = w_0$, $\hat{g} = g_0$.
 2: opterr = MSE($w_0, g_0$)
 3: for $t = 1, 2, \ldots, T$ do
 4:   Perform the update step shown in Equation (10) to obtain $w_t$.
 5:   Calculate err = MSE($w_t, g_{t-1}$).
 6:   if err $\le$ opterr then
 7:     opterr = err.
 8:     $\hat{w} = w_t$, $\hat{g} = g_{t-1}$
 9:   end if
10:   Calculate: $p \leftarrow X w_t$, $q \leftarrow -X^\top y$
11:   Obtain $g_t$ by solving problem (12) and linear interpolation.
12:   Calculate err = MSE($w_t, g_t$).
13:   if err $\le$ opterr then
14:     opterr = err.
15:     $\hat{w} = w_t$, $\hat{g} = g_t$
16:   end if
17: end for
18: Output $\hat{w}, \hat{g}$

3 Theoretical analysis of SILO, iSILO and ciSILO

In this section, we analyze the excess risk of the predictors output by SILO, iSILO and ciSILO. For a given hypothesis $\hat{h}(x) = \hat{g}(\hat{w}^\top x)$, define $\mathrm{err}(h) := \mathbb{E}(h(x) - y)^2$. The excess risk is then defined as

$$\mathcal{E}(\hat{h}) := \mathrm{err}(\hat{h}) - \mathrm{err}(h_\star) = \mathbb{E}(y - \hat{h}(x))^2 - \mathbb{E}(y - g_\star(w_\star^\top x))^2 \tag{13}$$

We first list the technical assumptions we make:

A1. The data $x_1, \ldots, x_n$ is sampled i.i.d. from the standard multivariate Gaussian distribution.
A2. $\mathbb{E}[Y \mid X = x] = g_\star(w_\star^\top x)$, and $0 \le Y \le 1$.
A3. $g_\star$ is monotonic and Lipschitz.
A4. $\|w_\star\|_0 \le s$, $\|w_\star\|_2 \le 1$, $\|\hat{w}\|_0 \le k$, and $k \ll d$.

We provide sketches of relevant results in this section, and refer the interested reader to the Appendix for detailed proofs. Our first main result is an excess risk bound for SILO:

Theorem 1. Let $\hat{h}(x) = \hat{g}(\hat{w}^\top x)$ be the hypothesis output by SILO. Let $\kappa = \mathbb{E}_{\mu \sim N(0,1)}\, g_\star(\mu)\mu > 0$. Then under assumptions A1-A4, the excess risk of the predictor $\hat{h}$ is, with probability at least $1 - \delta$, bounded from above by

$$\mathcal{E}(\hat{h}) = \tilde{O}\left( \sqrt{\frac{(s+k)\log(2d)}{\kappa}} \left(\frac{s}{n}\right)^{1/4} + \frac{\sqrt{s}}{\kappa\sqrt{n}}\,(s+k)\log(2d) \right) \tag{14}$$

where $\tilde{O}$ hides factors that are poly-logarithmic in $n$, $d$, $1/\delta$, $s$ and $k$.
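The constant $\kappa = \mathbb{E}_{\mu \sim N(0,1)}\, g_\star(\mu)\mu$ in Theorem 1 is a one-dimensional Gaussian expectation, so for a given smooth transfer function it can be approximated numerically. The sketch below uses Gauss-Hermite quadrature from NumPy; the function name `kappa`, the choice of 60 nodes, and the two example transfer functions are assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def kappa(g, degree=60):
    """Approximate E[g(mu) * mu] for mu ~ N(0, 1) by Gauss-Hermite quadrature."""
    # Nodes and weights for the weight function exp(-x^2 / 2).
    nodes, weights = hermegauss(degree)
    # Divide by sqrt(2*pi) to turn the weighted integral into an expectation.
    return np.sum(weights * g(nodes) * nodes) / np.sqrt(2.0 * np.pi)

kappa_linear = kappa(lambda t: t)                             # E[mu^2] = 1
kappa_logistic = kappa(lambda t: 1.0 / (1.0 + np.exp(-t)))    # logistic transfer function
```

For the linear transfer function the quadrature is exact and returns the second moment of a standard Gaussian. For the logistic function, Stein's identity gives $\kappa = \mathbb{E}[g'(\mu)]$, so the value must lie strictly between 0 and 1/4.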

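Proof Sketch: For notational convenience, denote $\Delta^2 = \frac{1}{\kappa}\sqrt{\frac{C s \log(2d/s)}{n}}$, where $C > 0$ is a universal constant. WLOG, we can assume that $\|\hat{w}\|_0 \le k$. Our assumption on the sparsity of $\hat{w}$ is pretty lenient, and is most often satisfied in practice. Also, since $\hat{w}$ is obtained from SILO, we have $\|\hat{w}\|_2 \le 1$, $\|\hat{w}\|_1 \le \sqrt{s}$. From a result of Plan and Vershynin [10, Corollary 3.1] (Lemma 4 in the appendix), we know that $\|w_\star - \hat{w}\|_2^2 \le \Delta^2$. The excess risk $\mathcal{E}(\hat{h})$ can be bounded as follows:

$$\begin{aligned}
\mathcal{E}(\hat{h}) &= \mathbb{E}\left[(\hat{g}(\hat{w}^\top x) - y)^2 - (g_\star(w_\star^\top x) - y)^2\right] = \mathbb{E}(\hat{g}(\hat{w}^\top x) - g_\star(w_\star^\top x))^2 \\
&= \mathbb{E}(\hat{g}(\hat{w}^\top x) - \hat{g}(w_\star^\top x) + \hat{g}(w_\star^\top x) - g_\star(w_\star^\top x))^2 \\
&\le 2(s+k)\Delta^2\log(2d) + 2\,\mathbb{E}(\hat{g}(w_\star^\top x) - g_\star(w_\star^\top x))^2
\end{aligned}$$

with probability at least $1 - \delta$. The second equality holds because $\mathbb{E}[y \mid x] = g_\star(w_\star^\top x)$, so the cross term vanishes. For the inequality we used the fact that $\hat{g}$ is 1-Lipschitz, and upper bounds on the expected suprema of a collection of Gaussian random variables. Next, we bound the R.H.S. of the above equation:

$$\begin{aligned}
\mathbb{E}(\hat{g}(w_\star^\top x) - g_\star(w_\star^\top x))^2 &\overset{(a)}{\le} \mathbb{E}(\hat{g}(w_\star^\top x) - y)^2 - \mathbb{E}(g_\star(w_\star^\top x) - y)^2 \\
&\overset{(b)}{\le} \frac{1}{n}\sum_{i=1}^n \left[ (\hat{g}(w_\star^\top x_i) - y_i)^2 - (g_\star(w_\star^\top x_i) - y_i)^2 \right] + \tilde{O}\left(\frac{(s\log(2d))^{1/4}}{\sqrt{n}}\right)
\end{aligned}$$

In inequality (a) we used a certain projection inequality for convex sets (see Lemma 1 in the appendix). To obtain inequality (b) we replace the expected value quantities with their empirical versions, plus deviation terms. Via standard application of large deviation inequalities, it is possible to establish that these deviations are $\tilde{O}\left((s\log(2d))^{1/4}/\sqrt{n}\right)$ (see Lemma 5 in the appendix). The proof concludes by upper bounding the empirical term in the above equation using the optimality of $\hat{g}$ and properties of maxima of a collection of Gaussian random variables.

Our next result is an upper bound on the excess risk of iSILO and ciSILO:

Theorem 2. Suppose $\hat{g}, \hat{w}$ are the outputs of SILO on our data. Let $\hat{h}(x) = \hat{g}(\hat{w}^\top x)$ be the hypothesis corresponding to these outputs. Let $h_\star(x) := g_\star(w_\star^\top x)$. Now, let $\hat{h}_T$ be the output of ciSILO obtained by using $\hat{g}, \hat{w}$ as initializers.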
Then under the assumptions A1-A4, with high probability we can bound the excess risk of $\hat{h}_T$ by

$$\mathcal{E}(\hat{h}_T) \le \tilde{O}\left( \sqrt{\frac{(s+k)\log(2d)}{\kappa}} \left(\frac{s}{n}\right)^{1/4} + \frac{\sqrt{s}}{\kappa\sqrt{n}}\,(s+k)\log(2d) + \sqrt{\frac{s\log^2(2d+1)}{n}} + \sqrt{\frac{1}{n}} \right)$$

where $\tilde{O}$ hides factors that are poly-logarithmic in $n$, $d$, $1/\delta$, $s$, $k$. Moreover, the same excess risk guarantees hold for $\hat{h}_T$ obtained by running iSILO.

Proof Sketch: From Theorem 1 we know that

$$\mathcal{E}(\hat{h}) = \mathrm{err}(\hat{h}) - \mathrm{err}(h_\star) \le \tilde{O}\left( \sqrt{\frac{(s+k)\log(2d)}{\kappa}} \left(\frac{s}{n}\right)^{1/4} + \frac{\sqrt{s}}{\kappa\sqrt{n}}\,(s+k)\log(2d) \right).$$

Using standard large deviation arguments (see Lemma 6 in the appendix) we can claim that the empirical error $\widehat{\mathrm{err}}(\hat{h})$ and $\mathrm{err}(\hat{h})$ differ by at most $\tilde{O}\left(\sqrt{s\log^2(2d+1)/n}\right)$ with probability at least $1 - \delta$. This gives us

$$\begin{aligned}
\widehat{\mathrm{err}}(\hat{h}) &= \mathrm{err}(\hat{h}) + \tilde{O}\left(\sqrt{\frac{s\log^2(2d+1)}{n}}\right) = \mathrm{err}(h_\star) + \mathrm{err}(\hat{h}) - \mathrm{err}(h_\star) + \tilde{O}\left(\sqrt{\frac{s\log^2(2d+1)}{n}}\right) \\
&\le \mathrm{err}(h_\star) + \tilde{O}\left( \sqrt{\frac{(s+k)\log(2d)}{\kappa}} \left(\frac{s}{n}\right)^{1/4} + \frac{\sqrt{s}}{\kappa\sqrt{n}}\,(s+k)\log(2d) + \sqrt{\frac{s\log^2(2d+1)}{n}} \right).
\end{aligned}$$

Now consider $\hat{h}_T$ obtained by running either ciSILO or iSILO for $T$ iterations, when initialized with $\hat{w}, \hat{g}$ obtained by running SILO first on the data. Since $\hat{h}_T$ is chosen by using a held-out validation set as the iterate corresponding to the smallest validation error, we can claim via the Hoeffding inequality that the empirical error of $\hat{h}_T$ cannot be too much larger than that of $\hat{h}$ (for otherwise $\hat{h}_T$ will not be the iterate with the smallest validation error). Precisely, if the validation set is of size $n$, then with high probability

$$\widehat{\mathrm{err}}(\hat{h}_T) \le \widehat{\mathrm{err}}(\hat{h}) + \tilde{O}\left(\frac{1}{\sqrt{n}}\right).$$

Using the above inequalities, and via standard large deviation arguments to bound $\mathrm{err}(\hat{h}_T) - \widehat{\mathrm{err}}(\hat{h}_T)$, we get the desired result.

Figure 1: Error rates are normalized so that the Slisotron has an error of 1. Note that ciSILO consistently outperforms all other methods, and iSILO is very competitive. The numbers below each dataset refer to $(n, d)$.

Remarks: In the bound of Theorem 2, the first term in $\tilde{O}$ dominates, and the excess risk bound is essentially $\tilde{O}\left( \sqrt{(s+k)\log(2d)/\kappa}\,(s/n)^{1/4} \right)$. Also, using the output of SILO to initialize iSILO and ciSILO yields strong theoretical guarantees.

The constant $\kappa$ in our results: $\kappa$ acts like the signal to noise ratio in our results. The larger $\kappa$ is, the better our bound gets. For example, for the logistic model, $\kappa$ is approximately the norm of the data ($\sqrt{\log(d)}$). For measurements of the form $y = \mathrm{sign}(x^\top w)$, $\kappa$ is a constant. The case $\kappa < 0$ can be easily tackled by reversing the signs of $y$, and $\kappa = 0$ implies that the data and observations are uncorrelated, so naturally any error bound will be meaningless.

Comparisons to existing results in low dimensions: In [5] the authors obtained dimension dependent as well as dimension independent bounds on the prediction error for the Slisotron algorithm for the SIM problem. However, these results were obtained under the restrictive assumption that $\|w_\star\|_2 \le W$, $\|x\|_2 \le B$, and both $W, B$ are fixed and independent of dimensions (in their analysis $B = 1$). In order to carry through a correct high-dimensional analysis, one needs to let either $W$ or $B$ or both grow with $d$. In our analysis, we assume that the data is sampled from a standard multivariate Gaussian, and hence $\|x\|_2 \le \sqrt{d}$ with high probability.
If one were to replace $B$ with $\sqrt{d}$ in the results of [5], then the excess risk of their predictor would scale as $\min\{ d/n^{1/3}, \sqrt{d}/n^{1/4} \}$, and since $n \ll d$, their bounds are meaningless in the high-dimensional setting. In contrast, our results in Theorem 2 have a (poly)-logarithmic dependence on $d$, and hence are useful in the high dimensional setting studied in this paper. The same arguments apply to the results of [6], where in addition one needs a fresh batch of samples at each run.

4 Experimental results

We tested our algorithms SILO, iSILO and ciSILO on many real world high dimensional datasets. For comparison with methods that assume $g_\star$ known, we used Sparse Logistic Regression (SLR) and Sparse Squared Hinge Loss minimization (SHL) [13] (code downloaded from schmidtm/software/l1general.html). We also tested the Slisotron [5] algorithm designed for low-dimensional SIMs. For each dataset we randomly chose 60% of the data for training, and 20% each for validation and testing. The parameters $\lambda, \eta$ are chosen via validation. Mac-Win, Crypt-Elec, Atheism-Religion and Auto-Motorcycle are from the 20 Newsgroups dataset. Arcene is from the NIPS challenge, and the Page dataset is obtained from the WebKB dataset [8]. Prostate and Colon cancer datasets are available online (jiashun/research/software/hcclassification/prostate/).

Figure 1 shows the misclassification error obtained on the test set. We show results for 8 datasets of varying size. Additional results are available in the supplementary material. Since the datasets (and errors) are varied, we normalize the error rates so that the Slisotron has unit error. As we can see from these results, using the calibrated loss in ciSILO yields the best performance in all the datasets considered, except Mac-Win. iSILO is as good as or better than SLR in 6/8 cases. It is encouraging to note that iSILO and ciSILO do well despite not having the luxury of choosing optimal step sizes at each iteration. Finally, the relatively poor performance of SILO underlines the importance of iterative methods in the SIM learning setting.

5 Conclusions

In this paper, we introduced a suite of algorithms based on sparse parameter estimation for learning single index models in the high dimensional setting. We derived excess risk guarantees for the proposed methods. Our algorithm employing a calibrated loss and a novel quadratic programming method to fit the transfer function achieves superior results compared to standard high dimensional classification methods based on minimizing the logistic or the hinge loss. In the future we plan to investigate learning single index models with structural constraints other than sparsity, such as low rank, group sparsity, and indeed other very general constraints.

References

[1] Pierre Alquier and Gérard Biau. Sparse single-index model. The Journal of Machine Learning Research, 14(1):243-280, 2013.
[2] Joel L Horowitz. Semiparametric and nonparametric methods in econometrics. Springer.
[3] Joel L Horowitz and Wolfgang Härdle. Direct semiparametric estimation of single-index models with discrete covariates. Journal of the American Statistical Association, 91(436):1632-1640, 1996.
[4] Hidehiko Ichimura. Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. Journal of Econometrics, 58(1):71-120, 1993.
[5] Sham M Kakade, Varun Kanade, Ohad Shamir, and Adam Kalai. Efficient learning of generalized linear and single index models with isotonic regression. In Advances in Neural Information Processing Systems, 2011.
[6] Adam Tauman Kalai and Ravi Sastry. The Isotron algorithm: High-dimensional isotonic regression. In COLT.
[7] Sahand N Negahban, Pradeep Ravikumar, Martin J Wainwright, and Bin Yu. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statistical Science, 27(4), 2012.
[8] Kamal Paul Nigam. Using unlabeled data to improve text classification. PhD thesis, Citeseer, 2001.
[9] Mee Young Park and Trevor Hastie. L1-regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4).
[10] Yaniv Plan and Roman Vershynin. Robust 1-bit compressed sensing and sparse logistic regression: A convex programming approach. Information Theory, IEEE Transactions on, 59(1):482-494.

[11] Yaniv Plan, Roman Vershynin, and Elena Yudovina. High-dimensional estimation with geometric constraints. arXiv preprint arXiv:1404.3749, 2014.
[12] Nikhil S Rao, Robert D Nowak, Christopher R Cox, and Timothy T Rogers. Classification with sparse overlapping groups. arXiv preprint arXiv:1402.4512, 2014.
[13] Mark Schmidt, Glenn Fung, and Romer Rosales. Optimization methods for l1-regularization. University of British Columbia, Technical Report TR-2009-19.
[14] Nathan Srebro, Karthik Sridharan, and Ambuj Tewari. Smoothness, low noise and fast rates. In Advances in Neural Information Processing Systems, 2010.
[15] Sara A Van de Geer. High-dimensional generalized linear models and the lasso. The Annals of Statistics, pages 614-645.
[16] Tong Zhang. Covering number bounds of certain regularized linear function classes. The Journal of Machine Learning Research, 2.

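A Preliminaries

We shall need a few definitions and a few important lemmas and propositions before we can state the proofs of our theorems. We shall consider the following function class:

$$\mathcal{G} = \{g : [-W, W] \to [0, 1],\ g \text{ is 1-Lipschitz and monotonic}\}. \tag{15}$$

Though the above definition of $\mathcal{G}$ uses an unspecified parameter $W$, most often we shall use $W = \sqrt{s\log(2d)}$. The following result concerning suprema of a collection of i.i.d. Gaussian random variables is standard and we shall state it without proof.

Proposition 1. Let $[g_i]_{i=1}^m$ be a collection of $m$ i.i.d. Gaussian random variables with mean 0 and variance $\sigma^2$. Then,

$$\max_{i \in [m]} |g_i| \le \sigma\sqrt{2\log(2m)} + \sigma\sqrt{2\log(2/\delta)} \quad \text{w.p. at least } 1 - \delta.$$

The next lemma is standard and a proof can be found in Lemma 9 in [5].

Lemma 1. Let $\mathcal{F}$ be a convex class of functions, and let $f_\star = \arg\min_{f \in \mathcal{F}} \mathbb{E}(f(x) - y)^2$. Suppose that $\mathbb{E}[Y \mid X = x] = g_\star(w_\star^\top x)$ for some $g_\star \in \mathcal{G}$. Then for any $f \in \mathcal{F}$, the following holds true:

$$\mathbb{E}[(f(x) - y)^2] - \mathbb{E}[(f_\star(x) - y)^2] \ge \mathbb{E}[(f(x) - f_\star(x))^2] \tag{16}$$

Lemma 2. Let $x \in \mathbb{R}^d$ be a standard normal random vector. Then with probability at least $1 - \delta$,

$$|w_\star^\top x| \le \tilde{O}\left(\sqrt{s\log(2d)}\right).$$

Proof. The proof follows immediately from Proposition 1 and the fact that $\|w_\star\|_1 \le \sqrt{s}$.

Lemma 3. Let $e \in \mathbb{R}^d$ be such that $\|e\|_0 \le s + k$ and $\|e\|_2 \le 1$. Let $x$ be a standard normal random vector. Then with probability at least $1 - \delta$,

$$|e^\top x| \le \tilde{O}\left(\sqrt{(s+k)\log(2d)}\right).$$

Proof. Let $e = [e_1, \ldots, e_d]$. Similarly, let $x = [x_1, \ldots, x_d]$. We then have

$$\begin{aligned}
|e^\top x| &= \Big|\sum_{i=1}^d e_i x_i\Big| && (17) \\
&\le \max_i |x_i| \sum_{i=1}^d |e_i| && (18) \\
&\overset{(a)}{\le} \sqrt{\log(2d)} \sum_{i=1}^{s+k} |e_i| \quad \text{w.p. at least } 1 - \delta && (19) \\
&\overset{(b)}{\le} \sqrt{(s+k)\log(2d)}. && (20)
\end{aligned}$$

In obtaining inequality (a) we used the fact that the max of the absolute value of $d$ Gaussian random variables is bounded by $\sqrt{\log(2d)}$, up to the factors hidden by $\tilde{O}$. In inequality (b) we used the fact that $\|e\|_0 \le s + k$, so only $s + k$ of the elements of $e$ are non-zero, together with the Cauchy-Schwarz inequality and $\|e\|_2 \le 1$.

We next need the following important result (Corollary 3.1 in [10]).

Lemma 4. Let $\mathcal{W} = \{w \in \mathbb{R}^d : \|w\|_2 \le 1, \|w\|_1 \le \sqrt{s}\}$. Let $\hat{w}$ be obtained from SILO, shown in the main paper. Suppose $\hat{w} \in \mathcal{W}$. Let $x_1, \ldots, x_n$ be independent Gaussian random vectors. Assume that the measurements satisfy $\mathbb{E}[Y \mid X = x] = g_\star(w_\star^\top x)$, where $\|w_\star\|_2 \le 1$, $\|w_\star\|_0 \le s$. Then with probability at least $1 - \delta$, the solution $\hat{w}$ obtained from SILO satisfies the inequality

$$\|\hat{w} - w_\star\|_2^2 \le \Delta^2 = \frac{1}{\kappa}\sqrt{\frac{C s\log(2d/s)}{n}},$$

where $C > 0$ is a universal constant, and $\kappa = \mathbb{E}_{\mu \sim N(0,1)}\, g_\star(\mu)\mu$.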
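The deterministic part of the chain (17)-(20) in the proof of Lemma 3 (Hölder's inequality followed by Cauchy-Schwarz on the support of $e$) can be checked numerically. The snippet below is only a sanity check on one random draw; the dimension, the sparsity level and the seed are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d, sparsity = 1000, 12            # `sparsity` plays the role of s + k

# Build a unit-norm vector e with at most `sparsity` non-zero entries.
e = np.zeros(d)
support = rng.choice(d, size=sparsity, replace=False)
e[support] = rng.standard_normal(sparsity)
e /= np.linalg.norm(e)

x = rng.standard_normal(d)        # standard normal random vector

lhs = abs(e @ x)                                   # |e^T x|, as in (17)
holder = np.abs(e).sum() * np.abs(x).max()         # ||e||_1 * max_i |x_i|, as in (18)
final = np.sqrt(sparsity) * np.abs(x).max()        # sqrt(s + k) * max_i |x_i|
```

The three quantities are ordered `lhs <= holder <= final` for every draw; the probabilistic part of the lemma only enters through the bound on `np.abs(x).max()`.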

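Lemma 5. With probability at least $1 - \delta$,

$$\mathbb{E}(\hat{g}(w_\star^\top x) - y)^2 - \mathbb{E}(g_\star(w_\star^\top x) - y)^2 \le \frac{1}{n}\sum_{i=1}^n \left[ (\hat{g}(w_\star^\top x_i) - y_i)^2 - (g_\star(w_\star^\top x_i) - y_i)^2 \right] + \tilde{O}\left(\sqrt{\frac{W}{n}}\right) \tag{21}$$

where $\tilde{O}$ hides factors that are (poly)-logarithmic in $1/\delta$.

Proof. From Lemma 6 (i) in [5] we know that

$$N_2(r, \mathcal{G}, z_1, \ldots, z_n) \le N_\infty(r, \mathcal{G}) \le \frac{1}{r}\, 2^{2W/r}, \tag{22}$$

where $N_2(r, \mathcal{G}, z_1, \ldots, z_n)$ is the $L_2$ empirical covering number of the function class $\mathcal{G}$ at radius $r$, and $N_\infty(r, \mathcal{G})$ is the $L_\infty$ covering number. Using the Dudley entropy integral, we can upper bound the empirical Rademacher complexity by

$$\hat{R}_n(\mathcal{G}) \le \inf_{\alpha > 0}\left\{ 4\alpha + 10\int_\alpha^1 \sqrt{\frac{\log(1/r) + 2W/r}{n}}\, dr \right\} = O\left(\sqrt{\frac{W}{n}}\right). \tag{23}$$

Hence, via standard large deviation inequalities we can claim that

$$\mathbb{E}[(\hat{g}(w_\star^\top x) - y)^2] \le \frac{1}{n}\sum_{i=1}^n (\hat{g}(w_\star^\top x_i) - y_i)^2 + O\left(\sqrt{\frac{W}{n}}\right). \tag{24}$$

Similarly, via standard concentration inequalities we can claim that with probability at least $1 - \delta$,

$$\left| \mathbb{E}[(g_\star(w_\star^\top x) - y)^2] - \frac{1}{n}\sum_{i=1}^n (g_\star(w_\star^\top x_i) - y_i)^2 \right| \le O\left(\sqrt{\frac{\log(2/\delta)}{n}}\right) \tag{25}$$

and hence, putting together the above two inequalities, the desired result follows.

B Proof of Theorem 1

For notational convenience, denote $\Delta^2 = \frac{1}{\kappa}\sqrt{\frac{C s\log(2d/s)}{n}}$, where $C > 0$ is a universal constant. Since $\hat{w}$ is obtained from SILO, we have $\|\hat{w}\|_2 \le 1$, $\|\hat{w}\|_1 \le \sqrt{s}$. The excess risk $\mathcal{E}(\hat{h})$ can be bounded as follows:

$$\begin{aligned}
\mathcal{E}(\hat{h}) &= \mathbb{E}\left[(\hat{g}(\hat{w}^\top x) - y)^2 - (g_\star(w_\star^\top x) - y)^2\right] \\
&= \mathbb{E}(\hat{g}(\hat{w}^\top x) - g_\star(w_\star^\top x))^2 \\
&= \mathbb{E}(\hat{g}(\hat{w}^\top x) - \hat{g}(w_\star^\top x) + \hat{g}(w_\star^\top x) - g_\star(w_\star^\top x))^2 \\
&\le 2\,\mathbb{E}(\hat{g}(\hat{w}^\top x) - \hat{g}(w_\star^\top x))^2 + 2\,\mathbb{E}(\hat{g}(w_\star^\top x) - g_\star(w_\star^\top x))^2 \\
&\overset{(a)}{\le} 2\,\mathbb{E}((\hat{w} - w_\star)^\top x)^2 + 2\,\mathbb{E}(\hat{g}(w_\star^\top x) - g_\star(w_\star^\top x))^2 \\
&\overset{(b)}{\le} 2(s+k)\Delta^2\log(2d) + 2\,\mathbb{E}(\hat{g}(w_\star^\top x) - g_\star(w_\star^\top x))^2 \quad \text{with probability at least } 1 - \delta,
\end{aligned} \tag{26}$$

where in order to obtain inequality (a) we used the fact that $\hat{g}$ is 1-Lipschitz, and in order to obtain inequality (b) we used Lemma 3. We shall now bound the R.H.S. of inequality (26). We do this as follows:

$$\begin{aligned}
\mathbb{E}(\hat{g}(w_\star^\top x) - g_\star(w_\star^\top x))^2 &\overset{(a)}{\le} \mathbb{E}(\hat{g}(w_\star^\top x) - y)^2 - \mathbb{E}(g_\star(w_\star^\top x) - y)^2 && (27) \\
&\overset{(b)}{\le} \frac{1}{n}\sum_{i=1}^n \left[ (\hat{g}(w_\star^\top x_i) - y_i)^2 - (g_\star(w_\star^\top x_i) - y_i)^2 \right] + \epsilon && (28)
\end{aligned}$$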

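In inequality (a) we used Lemma 1 with the function class $\mathcal{F} = \mathcal{G} \circ w_\star$. In inequality (b) we used Lemma 5 to bound the expectation quantity in terms of its empirical quantity, with $W$ set to the maximum value of $|w_\star^\top x_i|$. We know from Lemma 2 that this max value is $\sqrt{s\log(2d)}$ with probability at least $1 - \delta$. Hence, by substituting $\sqrt{s\log(2d)}$ for $W$, we get $\epsilon = \tilde{O}\left(\sqrt{\sqrt{s\log(2d)}/n}\right)$. Next we shall upper bound the empirical term in the above equation. We have

$$\begin{aligned}
&\frac{1}{n}\sum_{i=1}^n (\hat{g}(w_\star^\top x_i) - y_i)^2 - \frac{1}{n}\sum_{i=1}^n (g_\star(w_\star^\top x_i) - y_i)^2 \\
&= \frac{1}{n}\sum_{i=1}^n (\hat{g}(\hat{w}^\top x_i) - y_i - \hat{g}(\hat{w}^\top x_i) + \hat{g}(w_\star^\top x_i))^2 - \frac{1}{n}\sum_{i=1}^n (g_\star(\hat{w}^\top x_i) - y_i - g_\star(\hat{w}^\top x_i) + g_\star(w_\star^\top x_i))^2 \\
&= \underbrace{\frac{1}{n}\sum_{i=1}^n (\hat{g}(\hat{w}^\top x_i) - y_i)^2 - \frac{1}{n}\sum_{i=1}^n (g_\star(\hat{w}^\top x_i) - y_i)^2}_{\le 0}
+ \underbrace{\frac{1}{n}\sum_{i=1}^n (\hat{g}(\hat{w}^\top x_i) - \hat{g}(w_\star^\top x_i))^2}_{T_1}
- \underbrace{\frac{1}{n}\sum_{i=1}^n (g_\star(w_\star^\top x_i) - g_\star(\hat{w}^\top x_i))^2}_{\ge 0} \\
&\quad \underbrace{-\frac{2}{n}\sum_{i=1}^n (\hat{g}(\hat{w}^\top x_i) - y_i)(\hat{g}(\hat{w}^\top x_i) - \hat{g}(w_\star^\top x_i))}_{T_2}
+ \underbrace{\frac{2}{n}\sum_{i=1}^n (g_\star(\hat{w}^\top x_i) - y_i)(g_\star(\hat{w}^\top x_i) - g_\star(w_\star^\top x_i))}_{T_3}
\end{aligned} \tag{29}$$

where the term marked as $\le 0$ is non-positive because $\hat{g}$ is the solution to a minimization problem that minimizes the empirical squared error under monotonicity and 1-Lipschitz constraints. Since $g_\star$ is also monotonic and 1-Lipschitz, the squared error corresponding to the predictor $\hat{g}(\hat{w}^\top x)$ should be no larger than the squared error corresponding to $g_\star(\hat{w}^\top x)$. The term marked as $\ge 0$ is non-negative because it is an average of squared quantities, so subtracting it can only decrease the total. We shall now bound $T_1, T_2, T_3$ as follows:

$$\begin{aligned}
T_1 &= \frac{1}{n}\sum_{i=1}^n (\hat{g}(\hat{w}^\top x_i) - \hat{g}(w_\star^\top x_i))^2 && (30) \\
&\overset{(a)}{\le} \frac{1}{n}\sum_{i=1}^n ((\hat{w} - w_\star)^\top x_i)^2 && (31) \\
&\overset{(b)}{\le} (s+k)\Delta^2\log(2d) && (32)
\end{aligned}$$

where, to obtain inequality (a), we used the fact that $\hat{g}$ is 1-Lipschitz, and to obtain inequality (b) we used Lemma 3 applied to the $(s+k)$-sparse vector $(\hat{w} - w_\star)/\Delta$. To upper bound $T_2$ we proceed as follows:

$$\begin{aligned}
|T_2| &= \frac{2}{n}\Big|\sum_{i=1}^n (\hat{g}(\hat{w}^\top x_i) - y_i)(\hat{g}(\hat{w}^\top x_i) - \hat{g}(w_\star^\top x_i))\Big| && (33) \\
&\overset{(a)}{\le} \frac{2}{n}\sum_{i=1}^n |\hat{g}(\hat{w}^\top x_i) - \hat{g}(w_\star^\top x_i)| && (34) \\
&\overset{(b)}{\le} 2\Delta\sqrt{(s+k)\log(2d)} && (35)
\end{aligned}$$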

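To obtain inequality (a) we used the fact that $|y_i - \hat g(\hat w^\top x_i)| \le 1$, and to obtain inequality (b) we used the fact that $\hat g$ is 1-Lipschitz and Lemma 3. The same reasoning can be applied to upper bound $T_3$ to get $|T_3| \le \Delta\sqrt{k\log(2d)}$. Finally, using Lemma [?], we know that $\|w_* - \hat w\|_2^2 = \Delta^2 \le \tilde O(\sqrt{s/n})$. Gathering all the terms, we get with probability at least $1-\delta$,

$$\mathcal{E}(\hat h) = \tilde O\Big(\sqrt{\sqrt{\frac{s}{n}}\,(s + k)\log(2d)} + \sqrt{\frac{s}{n}}\,(s + k)\log(2d)\Big), \tag{36}$$

where $\tilde O$ also hides the dependence on the constant $\mathbb{E}_{\mu \sim N(0,1)}\, g_*(\mu)\mu$, which depends on $g_*$.

C Large Deviation Guarantees for iSILO, ciSILO

Lemma 6. For any hypothesis $h(x) = g(w^\top x)$, where $\mathcal{W} = \{w \in \mathbb{R}^d : \|w\|_1 \le \sqrt{s},\ \|w\|_2 \le 1\}$, $g \in \mathcal{G}$, $w \in \mathcal{W}$, we have

$$\mathrm{err}(h) \le \widehat{\mathrm{err}}(h) + \tilde O\Big(\sqrt{\widehat{\mathrm{err}}(h)}\,\sqrt{\frac{s}{n}}\Big),$$

where the $\tilde O$ hides factors (poly)-logarithmic in $d$, $n$, $1/\delta$. In particular, the above result also applies to $h_T$, which is the hypothesis obtained by running iSILO or ciSILO for $T$ iterations, and to $\hat h$, the hypothesis obtained by running SILO.

Before we give the proof of this lemma, we would like to point out that our assumption that $\hat w \in \mathcal{W}$ is not at all restrictive. In practice, the iterates of the proximal gradient method used in SILO-M are sparse for a sufficiently large regularization parameter.

Proof. Consider the function class $\mathcal{H} = \{h(x) = g(w^\top x) : w \in \mathcal{W},\ g \in \mathcal{G}\}$. By construction, we are guaranteed that $h_T, \hat h \in \mathcal{H}$ w.h.p., with $W = \sqrt{s\log(2d)}$. In order to establish a large deviation bound on the risk of $h_T$ we shall first calculate the worst case Rademacher complexity of $\mathcal{H}$. To do this, we establish the $L_2$ covering number of the function class $\mathcal{H}$ by establishing the $L_\infty$ covering number of $\mathcal{G}$ and the $L_2$ covering number of $\mathcal{W}$. Both these results are standard. From Lemma 6 in [5] we have

$$\log N_\infty(\epsilon, \mathcal{G}) \le \log\frac{1}{\epsilon} + \frac{2\sqrt{s\log(2d)}}{\epsilon}. \tag{37}$$

Since $\|w\|_1 \le \sqrt{s}$ and $\|x\|_\infty \le \tilde O(\sqrt{\log(2d)})$, we can use Theorem 3 in [6] to conclude that w.h.p.

$$\log N_2(\epsilon, \mathcal{W}, n) \le \frac{s\log^2(2d + 1)}{\epsilon^2}. \tag{38}$$

It is not hard to see that

$$\log N_2(\epsilon, \mathcal{H}, n) \le \log N_2\Big(\frac{\epsilon}{2\sqrt{2}}, \mathcal{W}, n\Big) + \log N_\infty\Big(\frac{\epsilon}{2\sqrt{2}}, \mathcal{G}\Big) \tag{39}$$

$$= \tilde O\Big(\frac{s\log^2(2d + 1)}{\epsilon^2}\Big). \tag{40}$$

Using Lemma A.[?] in [?] we can bound the worst case Rademacher complexity of $\mathcal{H}$ by

$$\hat R_n(\mathcal{H}) \le \tilde O\Big(\sqrt{\frac{s\log^2(2d + 1)}{n}}\Big).$$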

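Finally, applying Theorem [?] in [?] we get with probability at least $1-\delta$,

$$\mathrm{err}(h_T) \le \widehat{\mathrm{err}}(h_T) + \tilde O\Big(\sqrt{\widehat{\mathrm{err}}(h_T)}\,\sqrt{\frac{s\log^2(2d + 1)}{n}}\Big).$$

D Proof of Theorem 2

Proof. From Theorem 1 we know that

$$\mathcal{E}(\hat h) = \mathrm{err}(\hat h) - \mathrm{err}(h_*) \le \tilde O\Big(\sqrt{\sqrt{\frac{s}{n}}\,(s + k)\log(2d)} + \sqrt{\frac{s}{n}}\,(s + k)\log(2d)\Big). \tag{41}$$

Using Lemma 6 we can say that with probability at least $1-\delta$,

$$\begin{aligned}
\widehat{\mathrm{err}}(\hat h) &= \mathrm{err}(\hat h) + \tilde O\Big(\sqrt{\frac{s\log^2(2d + 1)}{n}}\Big) \\
&= \mathrm{err}(h_*) + \mathrm{err}(\hat h) - \mathrm{err}(h_*) + \tilde O\Big(\sqrt{\frac{s\log^2(2d + 1)}{n}}\Big) \qquad (42) \\
&= \mathrm{err}(h_*) + \tilde O\Big(\sqrt{\sqrt{\frac{s}{n}}\,(s + k)\log(2d)} + \sqrt{\frac{s}{n}}\,(s + k)\log(2d)\Big) + \tilde O\Big(\sqrt{\frac{s\log^2(2d + 1)}{n}}\Big). \qquad (43)
\end{aligned}$$

Now consider $\hat h_T$ obtained by running iSILO for $T$ iterations, when initialized with $\hat w$, $\hat g$ obtained by running SILO first on the data. Since $\hat h_T$ is chosen by using a held-out validation set as the iterate corresponding to the smallest validation error, we can claim via Hoeffding's inequality that the empirical error of $\hat h_T$ cannot be too much larger than that of $\hat h$ (for otherwise $\hat h_T$ would not be the iterate with the smallest validation error). Precisely, if the validation set is of size $n$, then with high probability

$$\widehat{\mathrm{err}}(\hat h_T) \le \widehat{\mathrm{err}}(\hat h) + \tilde O\Big(\frac{1}{\sqrt{n}}\Big). \tag{44}$$

Summing up Equations (44) and (42) we get

$$\widehat{\mathrm{err}}(\hat h_T) \le \mathrm{err}(h_*) + \tilde O\Big(\sqrt{\sqrt{\frac{s}{n}}\,(s + k)\log(2d)} + \sqrt{\frac{s}{n}}\,(s + k)\log(2d) + \sqrt{\frac{s\log^2(2d + 1)}{n}} + \sqrt{\frac{1}{n}}\Big). \tag{45}$$

Now using Lemma 6 to upper bound $\mathrm{err}(\hat h_T)$ in terms of $\widehat{\mathrm{err}}(\hat h_T)$, and combining it with the above bound, we get the desired result. The same arguments apply to the ciSILO algorithm as well.

E Additional Experimental Results

Here we report results on other high dimensional datasets. Figure 2 again shows the advantage of the calibrated and iterative method ciSILO. Table 1 has the details of the datasets in Figure 2.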
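The held-out selection step invoked in the proof of Theorem 2 (choosing, among the iterates, the one with the smallest validation error) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the helper names `make_hypothesis` and `select_iterate`, the clipping transfer function standing in for a monotone 1-Lipschitz $g$, and the toy weights and validation data are all hypothetical.

```python
import numpy as np


def make_hypothesis(w):
    """Build h(x) = g(w^T x) with g = clip to [0, 1].

    The clip is a hypothetical stand-in for a monotone, 1-Lipschitz
    transfer function; it is not the estimator from the paper.
    """
    w = np.asarray(w, dtype=float)
    return lambda X: np.clip(X @ w, 0.0, 1.0)


def select_iterate(hypotheses, X_val, y_val):
    """Return (index, error) of the iterate with the smallest
    mean squared error on the held-out validation set."""
    errors = [float(np.mean((h(X_val) - y_val) ** 2)) for h in hypotheses]
    best = int(np.argmin(errors))
    return best, errors[best]


# Toy validation set and three hypothetical iterates (assumed values).
X_val = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y_val = np.array([0.5, 0.5, 1.0])
iterates = [make_hypothesis(w) for w in ([1.0, 0.0], [0.5, 0.5], [0.0, 1.0])]

best_index, best_error = select_iterate(iterates, X_val, y_val)
print(best_index, best_error)
```

Because the selected iterate minimizes the validation error by construction, its validation error is never larger than that of the initial SILO hypothesis when that hypothesis is included among the candidates, which is the property the Hoeffding argument relies on.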

[Figure 2: plot comparing the methods Slisotron, SHL, SLR, SILO and iSILO on the datasets Eyedata, Link and PageLink.]

Figure 2: Comparison of different methods over different datasets. The results are normalized so that the Slisotron has error = 1.

Table 1: Dataset details (columns: Dataset, d). Rows: Leukemia 729; Eyedata; Link; Page+Link; Gisette.

Convexity, Inequalities, and Norms

Convexity, Inequalities, and Norms Covexity, Iequalities, ad Norms Covex Fuctios You are probably familiar with the otio of cocavity of fuctios. Give a twicedifferetiable fuctio ϕ: R R, We say that ϕ is covex (or cocave up) if ϕ (x) 0 for

More information

Chapter 7 Methods of Finding Estimators

Chapter 7 Methods of Finding Estimators Chapter 7 for BST 695: Special Topics i Statistical Theory. Kui Zhag, 011 Chapter 7 Methods of Fidig Estimators Sectio 7.1 Itroductio Defiitio 7.1.1 A poit estimator is ay fuctio W( X) W( X1, X,, X ) of

More information

Properties of MLE: consistency, asymptotic normality. Fisher information.

Properties of MLE: consistency, asymptotic normality. Fisher information. Lecture 3 Properties of MLE: cosistecy, asymptotic ormality. Fisher iformatio. I this sectio we will try to uderstad why MLEs are good. Let us recall two facts from probability that we be used ofte throughout

More information

Modified Line Search Method for Global Optimization

Modified Line Search Method for Global Optimization Modified Lie Search Method for Global Optimizatio Cria Grosa ad Ajith Abraham Ceter of Excellece for Quatifiable Quality of Service Norwegia Uiversity of Sciece ad Techology Trodheim, Norway {cria, ajith}@q2s.tu.o

More information

I. Chi-squared Distributions

I. Chi-squared Distributions 1 M 358K Supplemet to Chapter 23: CHI-SQUARED DISTRIBUTIONS, T-DISTRIBUTIONS, AND DEGREES OF FREEDOM To uderstad t-distributios, we first eed to look at aother family of distributios, the chi-squared distributios.

More information

Hypothesis testing. Null and alternative hypotheses

Hypothesis testing. Null and alternative hypotheses Hypothesis testig Aother importat use of samplig distributios is to test hypotheses about populatio parameters, e.g. mea, proportio, regressio coefficiets, etc. For example, it is possible to stipulate

More information

THE REGRESSION MODEL IN MATRIX FORM. For simple linear regression, meaning one predictor, the model is. for i = 1, 2, 3,, n

THE REGRESSION MODEL IN MATRIX FORM. For simple linear regression, meaning one predictor, the model is. for i = 1, 2, 3,, n We will cosider the liear regressio model i matrix form. For simple liear regressio, meaig oe predictor, the model is i = + x i + ε i for i =,,,, This model icludes the assumptio that the ε i s are a sample

More information

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008

In nite Sequences. Dr. Philippe B. Laval Kennesaw State University. October 9, 2008 I ite Sequeces Dr. Philippe B. Laval Keesaw State Uiversity October 9, 2008 Abstract This had out is a itroductio to i ite sequeces. mai de itios ad presets some elemetary results. It gives the I ite Sequeces

More information

0.7 0.6 0.2 0 0 96 96.5 97 97.5 98 98.5 99 99.5 100 100.5 96.5 97 97.5 98 98.5 99 99.5 100 100.5

0.7 0.6 0.2 0 0 96 96.5 97 97.5 98 98.5 99 99.5 100 100.5 96.5 97 97.5 98 98.5 99 99.5 100 100.5 Sectio 13 Kolmogorov-Smirov test. Suppose that we have a i.i.d. sample X 1,..., X with some ukow distributio P ad we would like to test the hypothesis that P is equal to a particular distributio P 0, i.e.

More information

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method

Chapter 6: Variance, the law of large numbers and the Monte-Carlo method Chapter 6: Variace, the law of large umbers ad the Mote-Carlo method Expected value, variace, ad Chebyshev iequality. If X is a radom variable recall that the expected value of X, E[X] is the average value

More information

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies ( 3.1.1) Limitations of Experiments. Pseudocode ( 3.1.2) Theoretical Analysis

Running Time ( 3.1) Analysis of Algorithms. Experimental Studies ( 3.1.1) Limitations of Experiments. Pseudocode ( 3.1.2) Theoretical Analysis Ruig Time ( 3.) Aalysis of Algorithms Iput Algorithm Output A algorithm is a step-by-step procedure for solvig a problem i a fiite amout of time. Most algorithms trasform iput objects ito output objects.

More information

A probabilistic proof of a binomial identity

A probabilistic proof of a binomial identity A probabilistic proof of a biomial idetity Joatho Peterso Abstract We give a elemetary probabilistic proof of a biomial idetity. The proof is obtaied by computig the probability of a certai evet i two

More information

SUPPLEMENTARY MATERIAL TO GENERAL NON-EXACT ORACLE INEQUALITIES FOR CLASSES WITH A SUBEXPONENTIAL ENVELOPE

SUPPLEMENTARY MATERIAL TO GENERAL NON-EXACT ORACLE INEQUALITIES FOR CLASSES WITH A SUBEXPONENTIAL ENVELOPE SUPPLEMENTARY MATERIAL TO GENERAL NON-EXACT ORACLE INEQUALITIES FOR CLASSES WITH A SUBEXPONENTIAL ENVELOPE By Guillaume Lecué CNRS, LAMA, Mare-la-vallée, 77454 Frace ad By Shahar Medelso Departmet of Mathematics,

More information

Maximum Likelihood Estimators.

Maximum Likelihood Estimators. Lecture 2 Maximum Likelihood Estimators. Matlab example. As a motivatio, let us look at oe Matlab example. Let us geerate a radom sample of size 00 from beta distributio Beta(5, 2). We will lear the defiitio

More information

Asymptotic Growth of Functions

Asymptotic Growth of Functions CMPS Itroductio to Aalysis of Algorithms Fall 3 Asymptotic Growth of Fuctios We itroduce several types of asymptotic otatio which are used to compare the performace ad efficiecy of algorithms As we ll

More information

Regularized Distance Metric Learning: Theory and Algorithm

Regularized Distance Metric Learning: Theory and Algorithm Regularized Distace Metric Learig: Theory ad Algorithm Rog Ji 1 Shiju Wag 2 Yag Zhou 1 1 Dept. of Computer Sciece & Egieerig, Michiga State Uiversity, East Lasig, MI 48824 2 Radiology ad Imagig Scieces,

More information

Lecture 4: Cauchy sequences, Bolzano-Weierstrass, and the Squeeze theorem

Lecture 4: Cauchy sequences, Bolzano-Weierstrass, and the Squeeze theorem Lecture 4: Cauchy sequeces, Bolzao-Weierstrass, ad the Squeeze theorem The purpose of this lecture is more modest tha the previous oes. It is to state certai coditios uder which we are guarateed that limits

More information

LECTURE 13: Cross-validation

LECTURE 13: Cross-validation LECTURE 3: Cross-validatio Resampli methods Cross Validatio Bootstrap Bias ad variace estimatio with the Bootstrap Three-way data partitioi Itroductio to Patter Aalysis Ricardo Gutierrez-Osua Texas A&M

More information

University of California, Los Angeles Department of Statistics. Distributions related to the normal distribution

University of California, Los Angeles Department of Statistics. Distributions related to the normal distribution Uiversity of Califoria, Los Ageles Departmet of Statistics Statistics 100B Istructor: Nicolas Christou Three importat distributios: Distributios related to the ormal distributio Chi-square (χ ) distributio.

More information

Totally Corrective Boosting Algorithms that Maximize the Margin

Totally Corrective Boosting Algorithms that Maximize the Margin Mafred K. Warmuth [email protected] Ju Liao [email protected] Uiversity of Califoria at Sata Cruz, Sata Cruz, CA 95064, USA Guar Rätsch [email protected] Friedrich Miescher Laboratory of

More information

Department of Computer Science, University of Otago

Department of Computer Science, University of Otago Departmet of Computer Sciece, Uiversity of Otago Techical Report OUCS-2006-09 Permutatios Cotaiig May Patters Authors: M.H. Albert Departmet of Computer Sciece, Uiversity of Otago Micah Colema, Rya Fly

More information

Annuities Under Random Rates of Interest II By Abraham Zaks. Technion I.I.T. Haifa ISRAEL and Haifa University Haifa ISRAEL.

Annuities Under Random Rates of Interest II By Abraham Zaks. Technion I.I.T. Haifa ISRAEL and Haifa University Haifa ISRAEL. Auities Uder Radom Rates of Iterest II By Abraham Zas Techio I.I.T. Haifa ISRAEL ad Haifa Uiversity Haifa ISRAEL Departmet of Mathematics, Techio - Israel Istitute of Techology, 3000, Haifa, Israel I memory

More information

Chapter 14 Nonparametric Statistics

Chapter 14 Nonparametric Statistics Chapter 14 Noparametric Statistics A.K.A. distributio-free statistics! Does ot deped o the populatio fittig ay particular type of distributio (e.g, ormal). Sice these methods make fewer assumptios, they

More information

1 Correlation and Regression Analysis

1 Correlation and Regression Analysis 1 Correlatio ad Regressio Aalysis I this sectio we will be ivestigatig the relatioship betwee two cotiuous variable, such as height ad weight, the cocetratio of a ijected drug ad heart rate, or the cosumptio

More information

Sequences and Series

Sequences and Series CHAPTER 9 Sequeces ad Series 9.. Covergece: Defiitio ad Examples Sequeces The purpose of this chapter is to itroduce a particular way of geeratig algorithms for fidig the values of fuctios defied by their

More information

Confidence Intervals for One Mean

Confidence Intervals for One Mean Chapter 420 Cofidece Itervals for Oe Mea Itroductio This routie calculates the sample size ecessary to achieve a specified distace from the mea to the cofidece limit(s) at a stated cofidece level for a

More information

Entropy of bi-capacities

Entropy of bi-capacities Etropy of bi-capacities Iva Kojadiovic LINA CNRS FRE 2729 Site école polytechique de l uiv. de Nates Rue Christia Pauc 44306 Nates, Frace [email protected] Jea-Luc Marichal Applied Mathematics

More information

Determining the sample size

Determining the sample size Determiig the sample size Oe of the most commo questios ay statisticia gets asked is How large a sample size do I eed? Researchers are ofte surprised to fid out that the aswer depeds o a umber of factors

More information

Incremental calculation of weighted mean and variance

Incremental calculation of weighted mean and variance Icremetal calculatio of weighted mea ad variace Toy Fich [email protected] [email protected] Uiversity of Cambridge Computig Service February 009 Abstract I these otes I eplai how to derive formulae for umerically

More information

Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring

Non-life insurance mathematics. Nils F. Haavardsson, University of Oslo and DNB Skadeforsikring No-life isurace mathematics Nils F. Haavardsso, Uiversity of Oslo ad DNB Skadeforsikrig Mai issues so far Why does isurace work? How is risk premium defied ad why is it importat? How ca claim frequecy

More information

Output Analysis (2, Chapters 10 &11 Law)

Output Analysis (2, Chapters 10 &11 Law) B. Maddah ENMG 6 Simulatio 05/0/07 Output Aalysis (, Chapters 10 &11 Law) Comparig alterative system cofiguratio Sice the output of a simulatio is radom, the comparig differet systems via simulatio should

More information

Case Study. Normal and t Distributions. Density Plot. Normal Distributions

Case Study. Normal and t Distributions. Density Plot. Normal Distributions Case Study Normal ad t Distributios Bret Halo ad Bret Larget Departmet of Statistics Uiversity of Wiscosi Madiso October 11 13, 2011 Case Study Body temperature varies withi idividuals over time (it ca

More information

1 Computing the Standard Deviation of Sample Means

1 Computing the Standard Deviation of Sample Means Computig the Stadard Deviatio of Sample Meas Quality cotrol charts are based o sample meas ot o idividual values withi a sample. A sample is a group of items, which are cosidered all together for our aalysis.

More information

Systems Design Project: Indoor Location of Wireless Devices

Systems Design Project: Indoor Location of Wireless Devices Systems Desig Project: Idoor Locatio of Wireless Devices Prepared By: Bria Murphy Seior Systems Sciece ad Egieerig Washigto Uiversity i St. Louis Phoe: (805) 698-5295 Email: [email protected] Supervised

More information

Overview of some probability distributions.

Overview of some probability distributions. Lecture Overview of some probability distributios. I this lecture we will review several commo distributios that will be used ofte throughtout the class. Each distributio is usually described by its probability

More information

The analysis of the Cournot oligopoly model considering the subjective motive in the strategy selection

The analysis of the Cournot oligopoly model considering the subjective motive in the strategy selection The aalysis of the Courot oligopoly model cosiderig the subjective motive i the strategy selectio Shigehito Furuyama Teruhisa Nakai Departmet of Systems Maagemet Egieerig Faculty of Egieerig Kasai Uiversity

More information

THE TWO-VARIABLE LINEAR REGRESSION MODEL

THE TWO-VARIABLE LINEAR REGRESSION MODEL THE TWO-VARIABLE LINEAR REGRESSION MODEL Herma J. Bieres Pesylvaia State Uiversity April 30, 202. Itroductio Suppose you are a ecoomics or busiess maor i a college close to the beach i the souther part

More information

SUPPORT UNION RECOVERY IN HIGH-DIMENSIONAL MULTIVARIATE REGRESSION 1

SUPPORT UNION RECOVERY IN HIGH-DIMENSIONAL MULTIVARIATE REGRESSION 1 The Aals of Statistics 2011, Vol. 39, No. 1, 1 47 DOI: 10.1214/09-AOS776 Istitute of Mathematical Statistics, 2011 SUPPORT UNION RECOVERY IN HIGH-DIMENSIONAL MULTIVARIATE REGRESSION 1 BY GUILLAUME OBOZINSKI,

More information

AMS 2000 subject classification. Primary 62G08, 62G20; secondary 62G99

AMS 2000 subject classification. Primary 62G08, 62G20; secondary 62G99 VARIABLE SELECTION IN NONPARAMETRIC ADDITIVE MODELS Jia Huag 1, Joel L. Horowitz 2 ad Fegrog Wei 3 1 Uiversity of Iowa, 2 Northwester Uiversity ad 3 Uiversity of West Georgia Abstract We cosider a oparametric

More information

Lecture 4: Cheeger s Inequality

Lecture 4: Cheeger s Inequality Spectral Graph Theory ad Applicatios WS 0/0 Lecture 4: Cheeger s Iequality Lecturer: Thomas Sauerwald & He Su Statemet of Cheeger s Iequality I this lecture we assume for simplicity that G is a d-regular

More information

SAMPLE QUESTIONS FOR FINAL EXAM. (1) (2) (3) (4) Find the following using the definition of the Riemann integral: (2x + 1)dx

SAMPLE QUESTIONS FOR FINAL EXAM. (1) (2) (3) (4) Find the following using the definition of the Riemann integral: (2x + 1)dx SAMPLE QUESTIONS FOR FINAL EXAM REAL ANALYSIS I FALL 006 3 4 Fid the followig usig the defiitio of the Riema itegral: a 0 x + dx 3 Cosider the partitio P x 0 3, x 3 +, x 3 +,......, x 3 3 + 3 of the iterval

More information

TIGHT BOUNDS ON EXPECTED ORDER STATISTICS

TIGHT BOUNDS ON EXPECTED ORDER STATISTICS Probability i the Egieerig ad Iformatioal Scieces, 20, 2006, 667 686+ Prited i the U+S+A+ TIGHT BOUNDS ON EXPECTED ORDER STATISTICS DIMITRIS BERTSIMAS Sloa School of Maagemet ad Operatios Research Ceter

More information

Plug-in martingales for testing exchangeability on-line

Plug-in martingales for testing exchangeability on-line Plug-i martigales for testig exchageability o-lie Valetia Fedorova, Alex Gammerma, Ilia Nouretdiov, ad Vladimir Vovk Computer Learig Research Cetre Royal Holloway, Uiversity of Lodo, UK {valetia,ilia,alex,vovk}@cs.rhul.ac.uk

More information

Research Article Sign Data Derivative Recovery

Research Article Sign Data Derivative Recovery Iteratioal Scholarly Research Network ISRN Applied Mathematics Volume 0, Article ID 63070, 7 pages doi:0.540/0/63070 Research Article Sig Data Derivative Recovery L. M. Housto, G. A. Glass, ad A. D. Dymikov

More information

1. MATHEMATICAL INDUCTION

1. MATHEMATICAL INDUCTION 1. MATHEMATICAL INDUCTION EXAMPLE 1: Prove that for ay iteger 1. Proof: 1 + 2 + 3 +... + ( + 1 2 (1.1 STEP 1: For 1 (1.1 is true, sice 1 1(1 + 1. 2 STEP 2: Suppose (1.1 is true for some k 1, that is 1

More information

A Faster Clause-Shortening Algorithm for SAT with No Restriction on Clause Length

A Faster Clause-Shortening Algorithm for SAT with No Restriction on Clause Length Joural o Satisfiability, Boolea Modelig ad Computatio 1 2005) 49-60 A Faster Clause-Shorteig Algorithm for SAT with No Restrictio o Clause Legth Evgey Datsi Alexader Wolpert Departmet of Computer Sciece

More information

Lecture 2: Karger s Min Cut Algorithm

Lecture 2: Karger s Min Cut Algorithm priceto uiv. F 3 cos 5: Advaced Algorithm Desig Lecture : Karger s Mi Cut Algorithm Lecturer: Sajeev Arora Scribe:Sajeev Today s topic is simple but gorgeous: Karger s mi cut algorithm ad its extesio.

More information

Finding the circle that best fits a set of points

Finding the circle that best fits a set of points Fidig the circle that best fits a set of poits L. MAISONOBE October 5 th 007 Cotets 1 Itroductio Solvig the problem.1 Priciples............................... Iitializatio.............................

More information

Chapter 7 - Sampling Distributions. 1 Introduction. What is statistics? It consist of three major areas:

Chapter 7 - Sampling Distributions. 1 Introduction. What is statistics? It consist of three major areas: Chapter 7 - Samplig Distributios 1 Itroductio What is statistics? It cosist of three major areas: Data Collectio: samplig plas ad experimetal desigs Descriptive Statistics: umerical ad graphical summaries

More information

Introduction to Statistical Learning Theory

Introduction to Statistical Learning Theory Itroductio to Statistical Learig Theory Olivier Bousquet 1, Stéphae Bouchero 2, ad Gábor Lugosi 3 1 Max-Plack Istitute for Biological Cyberetics Spemastr 38, D-72076 Tübige, Germay olivierbousquet@m4xorg

More information

Chapter 5: Inner Product Spaces

Chapter 5: Inner Product Spaces Chapter 5: Ier Product Spaces Chapter 5: Ier Product Spaces SECION A Itroductio to Ier Product Spaces By the ed of this sectio you will be able to uderstad what is meat by a ier product space give examples

More information

Universal coding for classes of sources

Universal coding for classes of sources Coexios module: m46228 Uiversal codig for classes of sources Dever Greee This work is produced by The Coexios Project ad licesed uder the Creative Commos Attributio Licese We have discussed several parametric

More information

On the Generalization Ability of Online Learning Algorithms for Pairwise Loss Functions

On the Generalization Ability of Online Learning Algorithms for Pairwise Loss Functions O the Geeraliatio Ability of Olie Learig Algorithms for Pairwise Loss Fuctios Purushottam Kar [email protected] Departmet of Computer Sciece ad Egieerig, Idia Istitute of Techology, Kapur, UP 208

More information

Class Meeting # 16: The Fourier Transform on R n

Class Meeting # 16: The Fourier Transform on R n MATH 18.152 COUSE NOTES - CLASS MEETING # 16 18.152 Itroductio to PDEs, Fall 2011 Professor: Jared Speck Class Meetig # 16: The Fourier Trasform o 1. Itroductio to the Fourier Trasform Earlier i the course,

More information

Normal Distribution.

Normal Distribution. Normal Distributio www.icrf.l Normal distributio I probability theory, the ormal or Gaussia distributio, is a cotiuous probability distributio that is ofte used as a first approimatio to describe realvalued

More information

THE ABRACADABRA PROBLEM

THE ABRACADABRA PROBLEM THE ABRACADABRA PROBLEM FRANCESCO CARAVENNA Abstract. We preset a detailed solutio of Exercise E0.6 i [Wil9]: i a radom sequece of letters, draw idepedetly ad uiformly from the Eglish alphabet, the expected

More information

BASIC STATISTICS. f(x 1,x 2,..., x n )=f(x 1 )f(x 2 ) f(x n )= f(x i ) (1)

BASIC STATISTICS. f(x 1,x 2,..., x n )=f(x 1 )f(x 2 ) f(x n )= f(x i ) (1) BASIC STATISTICS. SAMPLES, RANDOM SAMPLING AND SAMPLE STATISTICS.. Radom Sample. The radom variables X,X 2,..., X are called a radom sample of size from the populatio f(x if X,X 2,..., X are mutually idepedet

More information

Research Method (I) --Knowledge on Sampling (Simple Random Sampling)

Research Method (I) --Knowledge on Sampling (Simple Random Sampling) Research Method (I) --Kowledge o Samplig (Simple Radom Samplig) 1. Itroductio to samplig 1.1 Defiitio of samplig Samplig ca be defied as selectig part of the elemets i a populatio. It results i the fact

More information

where: T = number of years of cash flow in investment's life n = the year in which the cash flow X n i = IRR = the internal rate of return

where: T = number of years of cash flow in investment's life n = the year in which the cash flow X n i = IRR = the internal rate of return EVALUATING ALTERNATIVE CAPITAL INVESTMENT PROGRAMS By Ke D. Duft, Extesio Ecoomist I the March 98 issue of this publicatio we reviewed the procedure by which a capital ivestmet project was assessed. The

More information

Estimating Probability Distributions by Observing Betting Practices

Estimating Probability Distributions by Observing Betting Practices 5th Iteratioal Symposium o Imprecise Probability: Theories ad Applicatios, Prague, Czech Republic, 007 Estimatig Probability Distributios by Observig Bettig Practices Dr C Lych Natioal Uiversity of Irelad,

More information

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES Read Sectio 1.5 (pages 5 9) Overview I Sectio 1.5 we lear to work with summatio otatio ad formulas. We will also itroduce a brief overview of sequeces,

More information

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN Aalyzig Logitudial Data from Complex Surveys Usig SUDAAN Darryl Creel Statistics ad Epidemiology, RTI Iteratioal, 312 Trotter Farm Drive, Rockville, MD, 20850 Abstract SUDAAN: Software for the Statistical

More information

Coordinating Principal Component Analyzers

Coordinating Principal Component Analyzers Coordiatig Pricipal Compoet Aalyzers J.J. Verbeek ad N. Vlassis ad B. Kröse Iformatics Istitute, Uiversity of Amsterdam Kruislaa 403, 1098 SJ Amsterdam, The Netherlads Abstract. Mixtures of Pricipal Compoet

More information

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown Z-TEST / Z-STATISTIC: used to test hypotheses about µ whe the populatio stadard deviatio is kow ad populatio distributio is ormal or sample size is large T-TEST / T-STATISTIC: used to test hypotheses about

More information

THIN SEQUENCES AND THE GRAM MATRIX PAMELA GORKIN, JOHN E. MCCARTHY, SANDRA POTT, AND BRETT D. WICK

THIN SEQUENCES AND THE GRAM MATRIX PAMELA GORKIN, JOHN E. MCCARTHY, SANDRA POTT, AND BRETT D. WICK THIN SEQUENCES AND THE GRAM MATRIX PAMELA GORKIN, JOHN E MCCARTHY, SANDRA POTT, AND BRETT D WICK Abstract We provide a ew proof of Volberg s Theorem characterizig thi iterpolatig sequeces as those for

More information

INVESTMENT PERFORMANCE COUNCIL (IPC)

INVESTMENT PERFORMANCE COUNCIL (IPC) INVESTMENT PEFOMANCE COUNCIL (IPC) INVITATION TO COMMENT: Global Ivestmet Performace Stadards (GIPS ) Guidace Statemet o Calculatio Methodology The Associatio for Ivestmet Maagemet ad esearch (AIM) seeks

More information

The following example will help us understand The Sampling Distribution of the Mean. C1 C2 C3 C4 C5 50 miles 84 miles 38 miles 120 miles 48 miles

The following example will help us understand The Sampling Distribution of the Mean. C1 C2 C3 C4 C5 50 miles 84 miles 38 miles 120 miles 48 miles The followig eample will help us uderstad The Samplig Distributio of the Mea Review: The populatio is the etire collectio of all idividuals or objects of iterest The sample is the portio of the populatio

More information

Stock Market Trading via Stochastic Network Optimization

Stock Market Trading via Stochastic Network Optimization PROC. IEEE CONFERENCE ON DECISION AND CONTROL (CDC), ATLANTA, GA, DEC. 2010 1 Stock Market Tradig via Stochastic Network Optimizatio Michael J. Neely Uiversity of Souther Califoria http://www-rcf.usc.edu/

More information

3. Greatest Common Divisor - Least Common Multiple

3. Greatest Common Divisor - Least Common Multiple 3 Greatest Commo Divisor - Least Commo Multiple Defiitio 31: The greatest commo divisor of two atural umbers a ad b is the largest atural umber c which divides both a ad b We deote the greatest commo gcd

More information

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT Keywords: project maagemet, resource allocatio, etwork plaig Vladimir N Burkov, Dmitri A Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT The paper deals with the problems of resource allocatio betwee

More information

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling

Taking DCOP to the Real World: Efficient Complete Solutions for Distributed Multi-Event Scheduling Taig DCOP to the Real World: Efficiet Complete Solutios for Distributed Multi-Evet Schedulig Rajiv T. Maheswara, Milid Tambe, Emma Bowrig, Joatha P. Pearce, ad Pradeep araatham Uiversity of Souther Califoria

More information

UC Berkeley Department of Electrical Engineering and Computer Science. EE 126: Probablity and Random Processes. Solutions 9 Spring 2006

UC Berkeley Department of Electrical Engineering and Computer Science. EE 126: Probablity and Random Processes. Solutions 9 Spring 2006 Exam format UC Bereley Departmet of Electrical Egieerig ad Computer Sciece EE 6: Probablity ad Radom Processes Solutios 9 Sprig 006 The secod midterm will be held o Wedesday May 7; CHECK the fial exam

More information

Lesson 15 ANOVA (analysis of variance)

Lesson 15 ANOVA (analysis of variance) Outlie Variability -betwee group variability -withi group variability -total variability -F-ratio Computatio -sums of squares (betwee/withi/total -degrees of freedom (betwee/withi/total -mea square (betwee/withi

More information

Project Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments

Project Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments Project Deliverables CS 361, Lecture 28 Jared Saia Uiversity of New Mexico Each Group should tur i oe group project cosistig of: About 6-12 pages of text (ca be loger with appedix) 6-12 figures (please

More information

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 8

CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06 LECTURE 8 CME 30: NUMERICAL LINEAR ALGEBRA FALL 005/06 LECTURE 8 GENE H GOLUB 1 Positive Defiite Matrices A matrix A is positive defiite if x Ax > 0 for all ozero x A positive defiite matrix has real ad positive

More information

Soving Recurrence Relations

Soving Recurrence Relations Sovig Recurrece Relatios Part 1. Homogeeous liear 2d degree relatios with costat coefficiets. Cosider the recurrece relatio ( ) T () + at ( 1) + bt ( 2) = 0 This is called a homogeeous liear 2d degree

More information

1 The Gaussian channel

1 The Gaussian channel ECE 77 Lecture 0 The Gaussia chael Objective: I this lecture we will lear about commuicatio over a chael of practical iterest, i which the trasmitted sigal is subjected to additive white Gaussia oise.

More information

Stochastic Online Scheduling with Precedence Constraints

Stochastic Online Scheduling with Precedence Constraints Stochastic Olie Schedulig with Precedece Costraits Nicole Megow Tark Vredeveld July 15, 2008 Abstract We cosider the preemptive ad o-preemptive problems of schedulig obs with precedece costraits o parallel

More information

Chatpun Khamyat Department of Industrial Engineering, Kasetsart University, Bangkok, Thailand [email protected]

Chatpun Khamyat Department of Industrial Engineering, Kasetsart University, Bangkok, Thailand ocpky@hotmail.com SOLVING THE OIL DELIVERY TRUCKS ROUTING PROBLEM WITH MODIFY MULTI-TRAVELING SALESMAN PROBLEM APPROACH CASE STUDY: THE SME'S OIL LOGISTIC COMPANY IN BANGKOK THAILAND Chatpu Khamyat Departmet of Idustrial

More information

MARTINGALES AND A BASIC APPLICATION

MARTINGALES AND A BASIC APPLICATION MARTINGALES AND A BASIC APPLICATION TURNER SMITH Abstract. This paper will develop the measure-theoretic approach to probability i order to preset the defiitio of martigales. From there we will apply this

More information
