HIGH-DIMENSIONAL REGRESSION WITH NOISY AND MISSING DATA: PROVABLE GUARANTEES WITH NONCONVEXITY


The Annals of Statistics 2012, Vol. 40, No. 3, DOI: /12-AOS1018. Institute of Mathematical Statistics, 2012

HIGH-DIMENSIONAL REGRESSION WITH NOISY AND MISSING DATA: PROVABLE GUARANTEES WITH NONCONVEXITY

BY PO-LING LOH 1,2 AND MARTIN J. WAINWRIGHT 2

University of California, Berkeley

Although the standard formulations of prediction problems involve fully-observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly involving dependence, as well. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization problems that are inherently nonconvex, and it is difficult to establish theoretical guarantees on practical algorithms. While our approach also involves optimizing nonconvex programs, we are able both to analyze the statistical error associated with any global optimum, and more surprisingly, to prove that a simple algorithm based on projected gradient descent will converge in polynomial time to a small neighborhood of the set of all global minimizers. On the statistical side, we provide nonasymptotic bounds that hold with high probability for the cases of noisy, missing and/or dependent data. On the computational side, we prove that under the same types of conditions required for statistical consistency, the projected gradient descent algorithm is guaranteed to converge at a geometric rate to a near-global minimizer. We illustrate these theoretical predictions with simulations, showing close agreement with the predicted scalings.

1. Introduction. In standard formulations of prediction problems, it is assumed that the covariates are fully observed and sampled independently from some underlying distribution. However, these assumptions are not realistic for many applications, in which covariates may be observed only partially, observed subject to corruption, or exhibit some type of dependency.
Consider the problem of modeling the voting behavior of politicians: in this setting, votes may be missing due to abstentions, and temporally dependent due to collusion or tit-for-tat behavior. Similarly, surveys often suffer from the missing data problem, since users fail to respond to all questions. Sensor network data also tends to be both noisy due to measurement error, and partially missing due to failures or drop-outs of sensors.

Received September 2011; revised May.

1 Supported in part by a Hertz Foundation Fellowship and the Department of Defense (DoD) through an NDSEG Fellowship.

2 Supported in part by NSF Grant DMS and Air Force Office of Scientific Research Grant AFOSR-09NL184.

MSC2010 subject classifications. Primary 62F12; secondary 68W25.

Key words and phrases. High-dimensional statistics, missing data, nonconvexity, regularization, sparse linear regression, M-estimation.

There are a variety of methods for dealing with noisy and/or missing data, including various heuristic methods, as well as likelihood-based methods involving the expectation-maximization (EM) algorithm (e.g., see the book [8] and references therein). A challenge in this context is the possible nonconvexity of associated optimization problems. For instance, in applications of EM, problems in which the negative likelihood is a convex function often become nonconvex with missing or noisy data. Consequently, although the EM algorithm will converge to a local minimum, it is difficult to guarantee that the local optimum is close to a global minimum.

In this paper, we study these issues in the context of high-dimensional sparse linear regression, in particular in the case when the predictors or covariates are noisy, missing, and/or dependent. Our main contribution is to develop and study simple methods for handling these issues, and to prove theoretical results about both the associated statistical error and the optimization error. Like EM-based approaches, our estimators are based on solving optimization problems that may be nonconvex; however, despite this nonconvexity, we are still able to prove that a simple form of projected gradient descent will produce an output whose distance to any global optimum is as small as the statistical error. As a second result, we bound the statistical error, showing that it has the same scaling as the minimax rates for the classical cases of perfectly observed and independently sampled covariates. In this way, we obtain estimators for noisy, missing, and/or dependent data with the same scaling behavior as the usual fully-observed and independent case. The resulting estimators also allow us to solve the problem of high-dimensional Gaussian graphical model selection with missing data.

There is a large body of work on the problem of corrupted covariates or errors-in-variables for regression problems (e.g., see the papers and books [3, 6, 7, 21], as well as references therein).
Much of the earlier theoretical work is classical in nature, meaning that it requires the sample size n to diverge with the dimension p fixed. Most relevant to this paper is more recent work that has examined issues of corrupted and/or missing data in the context of high-dimensional sparse linear models, allowing for p ≫ n. Städler and Bühlmann [18] developed an EM-based method for sparse inverse covariance matrix estimation in the missing data regime, and used this result to derive an algorithm for sparse linear regression with missing data. As mentioned above, however, it is difficult to guarantee that EM will converge to a point close to a global optimum of the likelihood, in contrast to the methods studied here. Rosenbaum and Tsybakov [14] studied the sparse linear model when the covariates are corrupted by noise, and proposed a modified form of the Dantzig selector (see the discussion following our main results for a detailed comparison to this past work, and also to concurrent work [15] by the same authors). For the particular case of multiplicative noise, the type of estimator that we consider here has been studied in past work [21]; however, that theoretical analysis is of the classical type, holding only for n ≫ p, in contrast to the high-dimensional models that are of interest here.

The remainder of this paper is organized as follows. We begin in Section 2 with background and a precise description of the problem. We then introduce the class of estimators we will consider and the form of the projected gradient descent algorithm. Section 3 is devoted to a description of our main results, including a pair of general theorems on the statistical and optimization error, and then a series of corollaries applying our results to the cases of noisy, missing, and dependent data. In Section 4, we present simulations confirming that our methods work in practice, and verify the theoretically predicted scaling laws. Section 5 contains proofs of some of the main results, with the remaining proofs contained in the supplementary Appendix [9].

NOTATION. For a matrix M, we write ‖M‖_max := max_{i,j} |m_ij| for the elementwise ℓ∞-norm of M. Furthermore, |||M|||_1 denotes the induced ℓ1-operator norm (maximum absolute column sum) of M, and |||M|||_op is the spectral norm of M. We write κ(M) := λ_max(M)/λ_min(M) for the condition number of M. For matrices M_1, M_2, we write M_1 ⊙ M_2 to denote the componentwise Hadamard product, and write M_1 : M_2 to denote componentwise division. For functions f(n) and g(n), we write f(n) ≲ g(n) to mean that f(n) ≤ c g(n) for a universal constant c ∈ (0, ∞), and similarly, f(n) ≳ g(n) when f(n) ≥ c′ g(n) for some universal constant c′ ∈ (0, ∞). Finally, we write f(n) ≍ g(n) when f(n) ≲ g(n) and f(n) ≳ g(n) hold simultaneously.

2. Background and problem setup. In this section, we provide background and a precise description of the problem, and then motivate the class of estimators analyzed in this paper. We then discuss a simple class of projected gradient descent algorithms that can be used to obtain an estimator.

2.1. Observation model and high-dimensional framework. Suppose we observe a response variable y_i ∈ ℝ linked to a covariate vector x_i ∈ ℝ^p via the linear model

(2.1)  y_i = ⟨x_i, β*⟩ + ε_i  for i = 1, 2, ..., n.

Here, the regression vector β* ∈ ℝ^p is unknown, and ε_i ∈ ℝ is observation noise, independent of x_i.
Rather than directly observing each x_i ∈ ℝ^p, we observe a vector z_i ∈ ℝ^p linked to x_i via some conditional distribution, that is,

(2.2)  z_i ∼ Q(· | x_i)  for i = 1, 2, ..., n.

This setup applies to various disturbances to the covariates, including:

(a) Covariates with additive noise: We observe z_i = x_i + w_i, where w_i ∈ ℝ^p is a random vector independent of x_i, say zero-mean with known covariance matrix Σ_w.

(b) Missing data: For some fraction ρ ∈ [0, 1), we observe a random vector z_i ∈ ℝ^p such that for each component j, we independently observe z_ij = x_ij with probability 1 − ρ, and z_ij = ∗ (missing) with probability ρ. We can also consider the case when the entries in the jth column have a different probability ρ_j of being missing.

(c) Covariates with multiplicative noise: Generalizing the missing data problem, suppose we observe z_i = x_i ⊙ u_i, where u_i ∈ ℝ^p is again a random vector independent of x_i, and ⊙ is the Hadamard product. The problem of missing data is a special case of multiplicative noise, where all u_ij's are independent and u_ij ∼ Bernoulli(1 − ρ_j).

Our first set of results is deterministic, depending on specific instantiations of the observations {(y_i, z_i)}_{i=1}^n. However, we are also interested in results that hold with high probability when the x_i's and z_i's are drawn at random. We consider both the case when the x_i's are drawn i.i.d. from a fixed distribution, and the case of dependent covariates, when the x_i's are generated according to a stationary vector autoregressive (VAR) process.

We work within a high-dimensional framework that allows the number of predictors p to grow and possibly exceed the sample size n. Of course, consistent estimation when p ≫ n is impossible unless the model is endowed with additional structure, for instance sparsity in the parameter vector β*. Consequently, we study the class of models where β* has at most k nonzero parameters, where k is also allowed to increase to infinity with p and n.

2.2. M-estimators for noisy and missing covariates. In order to motivate the class of estimators we will consider, let us begin by examining a simple deterministic problem. Let Σ_x ≻ 0 be the covariance matrix of the covariates, and consider the ℓ1-constrained quadratic program

(2.3)  β̂ ∈ arg min_{‖β‖_1 ≤ R} { (1/2) β^T Σ_x β − ⟨Σ_x β*, β⟩ }.

As long as the constraint radius R is at least ‖β*‖_1, the unique solution to this convex program is β̂ = β*.
Of course, this program is an idealization, since in practice we do not know the covariance matrix Σ_x, and we certainly do not know Σ_x β*; after all, β* is the quantity we are trying to estimate! Nonetheless, this idealization still provides useful intuition, as it suggests various estimators based on the plug-in principle. Given a set of samples, it is natural to form estimates of the quantities Σ_x and Σ_x β*, which we denote by Γ̂ ∈ ℝ^{p×p} and γ̂ ∈ ℝ^p, respectively, and to consider the modified program

(2.4)  β̂ ∈ arg min_{‖β‖_1 ≤ R} { (1/2) β^T Γ̂ β − ⟨γ̂, β⟩ },

or alternatively, the regularized version

(2.5)  β̂ ∈ arg min_{β ∈ ℝ^p} { (1/2) β^T Γ̂ β − ⟨γ̂, β⟩ + λ_n ‖β‖_1 },

where λ_n > 0 is a user-defined regularization parameter. Note that the two problems are equivalent by Lagrangian duality when the objectives are convex, but not in the case of a nonconvex objective. The Lasso [4, 19] is a special case of these programs, obtained by setting

(2.6)  Γ̂_Las := (1/n) X^T X  and  γ̂_Las := (1/n) X^T y,

where we have introduced the shorthand y = (y_1, ..., y_n)^T ∈ ℝ^n, and X ∈ ℝ^{n×p} with x_i^T as its ith row. A simple calculation shows that (Γ̂_Las, γ̂_Las) are unbiased estimators of the pair (Σ_x, Σ_x β*). This unbiasedness, together with additional concentration inequalities (to be described in the sequel), underlies the well-known analysis of the Lasso in the high-dimensional regime.

In this paper, we focus on more general instantiations of the programs (2.4) and (2.5), involving different choices of the pair (Γ̂, γ̂) that are adapted to the cases of noisy and/or missing data. Note that the matrix Γ̂_Las is positive semidefinite, so the Lasso program is convex. In sharp contrast, for the case of noisy or missing data, the most natural choice of the matrix Γ̂ is not positive semidefinite, hence the quadratic losses appearing in the problems (2.4) and (2.5) are nonconvex. Furthermore, when Γ̂ has negative eigenvalues, the objective in equation (2.5) is unbounded from below. Hence, we make use of the following regularized estimator:

(2.7)  β̂ ∈ arg min_{‖β‖_1 ≤ b_0 √k} { (1/2) β^T Γ̂ β − ⟨γ̂, β⟩ + λ_n ‖β‖_1 }

for a suitable constant b_0. In the presence of nonconvexity, it is generally impossible to provide a polynomial-time algorithm that converges to a (near-)global optimum, due to the presence of local minima. Remarkably, we are able to prove that this issue is not significant in our setting, and a simple projected gradient descent algorithm applied to the programs (2.4) or (2.7) converges with high probability to a vector extremely close to any global optimum. Let us illustrate these ideas with some examples. Recall that (Γ̂, γ̂) serve as unbiased estimators for (Σ_x, Σ_x β*).

EXAMPLE 1 (Additive noise).
Suppose we observe Z = X + W, where W is a random matrix independent of X, with rows w_i drawn i.i.d. from a zero-mean distribution with known covariance Σ_w. We consider the pair

(2.8)  Γ̂_add := (1/n) Z^T Z − Σ_w  and  γ̂_add := (1/n) Z^T y.

Note that when Σ_w = 0 (corresponding to the noiseless case), the estimators reduce to the standard Lasso. However, when Σ_w ≠ 0, the matrix Γ̂_add is not positive semidefinite in the high-dimensional regime (n ≪ p). Indeed, since the matrix (1/n) Z^T Z has rank at most n, the subtracted matrix Σ_w may cause Γ̂_add to have a

large number of negative eigenvalues. For instance, if Σ_w = σ_w² I with σ_w² > 0, then Γ̂_add has at least p − n eigenvalues equal to −σ_w².

EXAMPLE 2 (Missing data). We now consider the case where the entries of X are missing at random. Let us first describe an estimator for the special case where each entry is missing at random, independently with some constant probability ρ ∈ [0, 1). (In Example 3 to follow, we will describe the extension to general missing probabilities.) Consequently, we observe the matrix Z ∈ ℝ^{n×p} with entries

Z_ij = X_ij with probability 1 − ρ, and Z_ij = 0 otherwise.

Given the observed matrix Z ∈ ℝ^{n×p}, we use

(2.9)  Γ̂_mis := (1/n) Z̃^T Z̃ − ρ · diag((1/n) Z̃^T Z̃)  and  γ̂_mis := (1/n) Z̃^T y,

where Z̃_ij = Z_ij / (1 − ρ). It is easy to see that the pair (Γ̂_mis, γ̂_mis) reduces to the pair (Γ̂_Las, γ̂_Las) for the standard Lasso when ρ = 0, corresponding to no missing data. In the more interesting case when ρ ∈ (0, 1), the matrix (1/n) Z̃^T Z̃ in equation (2.9) has rank at most n, so the subtracted diagonal matrix may cause the matrix Γ̂_mis to have a large number of negative eigenvalues when p ≫ n. As a consequence, the matrix Γ̂_mis is not (in general) positive semidefinite, so the associated quadratic function is not convex.

EXAMPLE 3 (Multiplicative noise). As a generalization of the previous example, we now consider the case of multiplicative noise. In particular, suppose we observe the quantity Z = X ⊙ U, where U is a matrix of nonnegative noise variables. In many applications, it is natural to assume that the rows u_i of U are drawn in an i.i.d. manner, say from some distribution in which both the vector E[u_1] and the matrix E[u_1 u_1^T] have strictly positive entries. This general family of multiplicative noise models arises in various applications; we refer the reader to the papers [3, 6, 7, 21] for more discussion and examples. A natural choice of the pair (Γ̂, γ̂) is given by the quantities

(2.10)  Γ̂_mul := (1/n) Z^T Z : E[u_1 u_1^T]  and  γ̂_mul := (1/n) Z^T y : E[u_1],

where : denotes elementwise division.
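To make the nonconvexity in Example 1 concrete, the following sketch forms the pair (Γ̂_add, γ̂_add) from (2.8) and counts negative eigenvalues in the regime n < p. All sizes and noise scales here are illustrative choices, not values from the paper, and the response is a synthetic placeholder, since only Γ̂_add matters for the eigenvalue count:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 300          # n < p: high-dimensional regime (illustrative sizes)
sigma_w = 0.2            # additive-noise scale (assumption for this sketch)

X = rng.standard_normal((n, p))            # true covariates, Sigma_x = I
W = sigma_w * rng.standard_normal((n, p))  # noise with Sigma_w = sigma_w^2 I
Z = X + W                                  # observed covariates
y = rng.standard_normal(n)                 # placeholder response

Sigma_w = sigma_w**2 * np.eye(p)

# Unbiased surrogates (2.8) for (Sigma_x, Sigma_x beta*)
Gamma_add = Z.T @ Z / n - Sigma_w
gamma_add = Z.T @ y / n

# Z'Z/n has rank at most n, so subtracting sigma_w^2 I leaves at least
# p - n eigenvalues equal to -sigma_w^2: the quadratic loss is nonconvex.
eigvals = np.linalg.eigvalsh(Gamma_add)
print((eigvals < 0).sum())
```

Counting the negative eigenvalues confirms the rank argument in the text: at least p − n = 200 of them appear.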
A small calculation shows that these are unbiased estimators of Σ_x and Σ_x β*, respectively. The estimators (2.10) have been studied in past work [21], but only under classical scaling (n ≫ p). As a special case of the estimators (2.10), suppose the entries u_ij of U are independent Bernoulli(1 − ρ_j) random variables. Then the observed matrix Z = X ⊙ U corresponds to a missing-data matrix, where each element of the jth column has probability ρ_j of being missing. In this case, the estimators (2.10) become

(2.11)  Γ̂_mis = (1/n) Z^T Z : M  and  γ̂_mis = (1/n) Z^T y : (1 − ρ),

where M := E[u_1 u_1^T] satisfies

M_ij = (1 − ρ_i)(1 − ρ_j) if i ≠ j, and M_ij = 1 − ρ_i if i = j,

ρ is the parameter vector containing the ρ_j's, and 1 is the vector of all 1's. In this way, we obtain a generalization of the estimator discussed in Example 2.

2.3. Restricted eigenvalue conditions. Given an estimate β̂, there are various ways to assess its closeness to β*. In this paper, we focus on the ℓ2-norm ‖β̂ − β*‖_2, as well as the closely related ℓ1-norm ‖β̂ − β*‖_1. When the covariate matrix X is fully observed (so that the Lasso can be applied), it is now well understood that a sufficient condition for ℓ2-recovery is that the matrix Γ̂_Las = (1/n) X^T X satisfy a certain type of restricted eigenvalue (RE) condition (e.g., [2, 20]). In this paper, we make use of the following condition.

DEFINITION 1 (Lower-RE condition). The matrix Γ̂ satisfies a lower restricted eigenvalue condition with curvature α_1 > 0 and tolerance τ(n, p) > 0 if

(2.12)  θ^T Γ̂ θ ≥ α_1 ‖θ‖_2² − τ(n, p) ‖θ‖_1²  for all θ ∈ ℝ^p.

It can be shown that when the Lasso matrix Γ̂_Las = (1/n) X^T X satisfies this RE condition (2.12), the Lasso estimate has low ℓ2-error for any vector β* supported on any subset of size at most k ≲ τ(n, p)^{-1}. In particular, bound (2.12) implies a sparse RE condition for all k of this magnitude, and conversely, Lemma 11 in the Appendix of [9] shows that a sparse RE condition implies bound (2.12). In this paper, we work with condition (2.12), since it is especially convenient for analyzing optimization algorithms. In the standard setting (with uncorrupted and fully observed design matrices), it is known that for many choices of the design matrix X (with rows having covariance Σ), the Lasso matrix Γ̂_Las will satisfy such an RE condition with high probability (e.g., [13, 17]), with α_1 = (1/2) λ_min(Σ) and τ(n, p) ≍ (log p)/n. A significant portion of the analysis in this paper is devoted to proving that different choices of Γ̂, such as the matrices Γ̂_add and Γ̂_mis defined earlier, also satisfy condition (2.12) with high probability.
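The column-specific missing-data estimator (2.11) is easy to form in code. In the sketch below, the vector ρ, the design, and the response are all synthetic stand-ins chosen only to exercise the formulas:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 100, 50
rho = rng.uniform(0.0, 0.3, size=p)   # column-wise missing probabilities (illustrative)

X = rng.standard_normal((n, p))
mask = rng.random((n, p)) > rho       # entry (i, j) observed with probability 1 - rho_j
Z = np.where(mask, X, 0.0)            # missing entries zeroed out
y = rng.standard_normal(n)            # placeholder response

# M_ij = (1 - rho_i)(1 - rho_j) off the diagonal, and 1 - rho_i on it
one_minus = 1.0 - rho
M = np.outer(one_minus, one_minus)
np.fill_diagonal(M, one_minus)

# Estimators (2.11): elementwise division by M and by (1 - rho)
Gamma_mis = (Z.T @ Z / n) / M
gamma_mis = (Z.T @ y / n) / one_minus
```

Dividing entry (i, j) of Z'Z/n by M_ij exactly undoes the Bernoulli attenuation in expectation, which is why (Γ̂_mis, γ̂_mis) are unbiased for (Σ_x, Σ_x β*).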
This fact is by no means obvious, since, as previously discussed, the matrices Γ̂_add and Γ̂_mis generally have large numbers of negative eigenvalues. Finally, although such upper bounds are not necessary for statistical consistency, our algorithmic results make use of the analogous upper restricted eigenvalue condition, formalized in the following:

DEFINITION 2 (Upper-RE condition). The matrix Γ̂ satisfies an upper restricted eigenvalue condition with smoothness α_2 > 0 and tolerance τ(n, p) > 0 if

(2.13)  θ^T Γ̂ θ ≤ α_2 ‖θ‖_2² + τ(n, p) ‖θ‖_1²  for all θ ∈ ℝ^p.
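The RE conditions quantify over all θ ∈ ℝ^p, so they cannot be certified by brute force; but a cheap numerical spot-check on random sparse directions can at least fail to falsify the lower bound (2.12). The sketch below applies it to Γ̂_add with Σ_x = I, α_1 = 1/2 (half the minimum eigenvalue of Σ_x), and τ(n, p) = (log p)/n; these constants and sizes are illustrative, not the ones produced by the theory, and passing the check on a finite sample of directions is evidence rather than a proof:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma_w = 200, 400, 0.2
X = rng.standard_normal((n, p))                    # Sigma_x = I
Z = X + sigma_w * rng.standard_normal((n, p))
Gamma = Z.T @ Z / n - sigma_w**2 * np.eye(p)       # surrogate (2.8)

alpha1 = 0.5            # = lambda_min(Sigma_x)/2 for Sigma_x = I (illustrative)
tau = np.log(p) / n     # tolerance scaling from the text, with constant 1

# Spot-check (2.12) on 1000 random 10-sparse directions.
ok = True
for _ in range(1000):
    theta = np.zeros(p)
    support = rng.choice(p, size=10, replace=False)
    theta[support] = rng.standard_normal(10)
    lhs = theta @ Gamma @ theta
    rhs = alpha1 * theta @ theta - tau * np.abs(theta).sum() ** 2
    ok = ok and lhs >= rhs
print(ok)
```

For sparse directions the ℓ1 penalty term τ‖θ‖_1² is small relative to α_1‖θ‖_2², so the inequality has room to hold even though Γ̂ itself has many negative eigenvalues; dense directions aligned with those eigenvectors are exactly the ones the tolerance term absorbs.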

In recent work on high-dimensional projected gradient descent, Agarwal et al. [1] make use of a more general form of the lower and upper bounds (2.12) and (2.13), applicable to nonquadratic losses as well, which are referred to as the restricted strong convexity (RSC) and restricted smoothness (RSM) conditions, respectively. For various classes of random design matrices, it can be shown that the Lasso matrix Γ̂_Las satisfies the upper bound (2.13) with α_2 = 2 λ_max(Σ_x) and τ(n, p) ≍ (log p)/n; see Raskutti et al. [13] for the Gaussian case and Rudelson and Zhou [17] for the sub-Gaussian setting. We will establish similar scaling for our choices of Γ̂.

2.4. Gradient descent algorithms. In addition to proving results about the global minima of the (possibly nonconvex) programs (2.4) and (2.5), we are also interested in polynomial-time procedures for approximating such optima. In this paper, we analyze some simple algorithms for solving either the constrained program (2.4) or the Lagrangian version (2.7). Note that the gradient of the quadratic loss function takes the form ∇L(β) = Γ̂ β − γ̂. In application to the constrained version, the method of projected gradient descent generates a sequence of iterates {β^t, t = 0, 1, 2, ...} by the recursion

(2.14)  β^{t+1} = arg min_{‖β‖_1 ≤ R} { L(β^t) + ⟨∇L(β^t), β − β^t⟩ + (η/2) ‖β − β^t‖_2² },

where η > 0 is a stepsize parameter. Equivalently, this update can be written as β^{t+1} = Π(β^t − (1/η) ∇L(β^t)), where Π denotes the ℓ2-projection onto the ℓ1-ball of radius R. This projection can be computed rapidly in O(p) time using a procedure due to Duchi et al. [5]. For the Lagrangian update, we use a slight variant of the projected gradient update (2.14), namely

(2.15)  β^{t+1} = arg min_{‖β‖_1 ≤ R} { L(β^t) + ⟨∇L(β^t), β − β^t⟩ + (η/2) ‖β − β^t‖_2² + λ_n ‖β‖_1 },

with the only difference being the inclusion of the regularization term. This update can also be performed efficiently by performing two projections onto the ℓ1-ball; see the paper [1] for details.
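The update (2.14) is simple to implement. Below is one self-contained sketch: the projection uses the standard sort-based algorithm (an O(p log p) variant; the O(p) expected-time method cited from Duchi et al. [5] is a refinement of the same idea), and the loss uses a generic surrogate pair (Γ̂, γ̂). The iteration counts and sanity check are illustrative, not from the paper:

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto the l1-ball of the given radius
    (sort-based variant of the projection algorithm of Duchi et al. [5])."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                  # sorted magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)     # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def projected_gradient_descent(Gamma, gamma, radius, eta, T=500):
    """Iterates (2.14) for L(beta) = beta' Gamma beta / 2 - <gamma, beta>:
    beta <- Proj_{l1-ball}(beta - grad / eta)."""
    beta = np.zeros(len(gamma))
    for _ in range(T):
        grad = Gamma @ beta - gamma
        beta = project_l1_ball(beta - grad / eta, radius)
    return beta

# Convex sanity check: with Gamma = I, the unconstrained minimizer is
# beta = gamma, and a large radius leaves it feasible.
Gamma = np.eye(3)
gamma = np.array([1.0, 0.5, 0.0])
beta_hat = projected_gradient_descent(Gamma, gamma, radius=10.0, eta=2.0, T=200)
```

The same routine applies unchanged when Γ̂ is indefinite, as for Γ̂_add or Γ̂_mis; only the ℓ1-ball constraint keeps the iterates bounded in that case.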
When the objective function is convex (equivalently, Γ̂ is positive semidefinite), the iterates (2.14) or (2.15) are guaranteed to converge to a global minimum of the objective functions (2.4) and (2.7), respectively. In our setting, the matrix Γ̂ need not be positive semidefinite, so the best generic guarantee is that the iterates converge to a local optimum. However, our analysis shows that for the family of programs (2.4) or (2.7), under a reasonable set of conditions satisfied by various statistical models, the iterates actually converge to a point extremely close to any global optimum in both ℓ1-norm and ℓ2-norm; see Theorem 2 to follow for a more detailed statement.

3. Main results and consequences. We now state our main results and discuss their consequences for noisy, missing, and dependent data.

3.1. General results. We provide theoretical guarantees for both the constrained estimator (2.4) and the Lagrangian version (2.7). Note that we obtain different optimization problems as we vary the choice of the pair (Γ̂, γ̂) ∈ ℝ^{p×p} × ℝ^p. We begin by stating a pair of general results, applicable to any pair that satisfies certain conditions. Our first result (Theorem 1) provides bounds on the statistical error, namely the quantity ‖β̂ − β*‖_2, as well as the corresponding ℓ1-error, where β̂ is any global optimum of the programs (2.4) or (2.7). Since the problem may be nonconvex in general, it is not immediately obvious that one can obtain a provably good approximation to any global optimum without resorting to costly search methods. In order to assuage this concern, our second result (Theorem 2) provides rigorous bounds on the optimization error, namely the differences ‖β^t − β̂‖_2 and ‖β^t − β̂‖_1 incurred by the iterate β^t after running t rounds of the projected gradient descent updates (2.14) or (2.15).

3.1.1. Statistical error. In controlling the statistical error, we assume that the matrix Γ̂ satisfies a lower-RE condition with curvature α_1 and tolerance τ(n, p), as previously defined (2.12). Recall that Γ̂ and γ̂ serve as surrogates to the deterministic quantities Σ_x ∈ ℝ^{p×p} and Σ_x β* ∈ ℝ^p, respectively. Our results also involve a measure of deviation in these surrogates. In particular, we assume that there is some function ϕ(Q, σ_ε), depending on the two sources of noise in our problem: the standard deviation σ_ε of the observation noise vector ε from equation (2.1), and the conditional distribution Q from equation (2.2) that links the covariates x_i to the observed versions z_i. With this notation, we consider the deviation condition

(3.1)  ‖γ̂ − Γ̂ β*‖_∞ ≤ ϕ(Q, σ_ε) √((log p)/n).

To aid intuition, note that inequality (3.1) holds whenever the following two deviation conditions are satisfied:

(3.2)  ‖γ̂ − Σ_x β*‖_∞ ≤ ϕ(Q, σ_ε) √((log p)/n)  and  ‖(Γ̂ − Σ_x) β*‖_∞ ≤ ϕ(Q, σ_ε) √((log p)/n).
The pair of inequalities (3.2) clearly measures the deviation of the estimators (Γ̂, γ̂) from their population versions, and they are sometimes easier to verify theoretically. However, inequality (3.1) may be used directly to derive tighter bounds (e.g., in the additive noise case). Indeed, the bounds established via inequalities (3.2) are not sharp in the limit of low noise on the covariates, due to the second

inequality. In the proofs of our corollaries to follow, we will verify the deviation conditions for various forms of noisy, missing, and dependent data, with the quantity ϕ(Q, σ_ε) changing depending on the model. We have the following result, which applies to any global optimum β̂ of the regularized version (2.7) with λ_n ≥ 4 ϕ(Q, σ_ε) √((log p)/n):

THEOREM 1 (Statistical error). Suppose the surrogates (Γ̂, γ̂) satisfy the deviation bound (3.1), and the matrix Γ̂ satisfies the lower-RE condition (2.12) with parameters (α_1, τ) such that

(3.3)  √k τ(n, p) ≤ min{ α_1 / (128 √k), (ϕ(Q, σ_ε) / b_0) √((log p)/n) }.

Then for any vector β* with sparsity at most k, there is a universal positive constant c_0 such that any global optimum β̂ of the Lagrangian program (2.7) with any b_0 ≥ ‖β*‖_2 satisfies the bounds

(3.4a)  ‖β̂ − β*‖_2 ≤ (c_0 √k / α_1) max{ ϕ(Q, σ_ε) √((log p)/n), λ_n }

and

(3.4b)  ‖β̂ − β*‖_1 ≤ (8 c_0 k / α_1) max{ ϕ(Q, σ_ε) √((log p)/n), λ_n }.

The same bounds (without λ_n) also apply to the constrained program (2.4) with radius choice R = ‖β*‖_1.

Remarks. To be clear, all the claims of Theorem 1 are deterministic. Probabilistic conditions will enter when we analyze specific statistical models and certify that the RE condition (3.3) and deviation conditions are satisfied by a random pair (Γ̂, γ̂) with high probability. We note that for the standard Lasso choice (Γ̂_Las, γ̂_Las) of this matrix-vector pair, bounds of the form (3.4) for sub-Gaussian noise are well known from past work (e.g., [2, 11, 12, 23]). The novelty of Theorem 1 lies in allowing for general pairs of such surrogates, which, as shown by the examples discussed earlier, can lead to nonconvexity in the underlying M-estimator. Moreover, some interesting differences arise due to the term ϕ(Q, σ_ε), which changes depending on the nature of the model (missing, noisy, and/or dependent), as will be clarified in the sequel. Proving that the conditions of Theorem 1 are satisfied with high probability for noisy/missing data requires some nontrivial analysis involving both concentration inequalities and random matrix theory.
Note that in the presence of nonconvexity, it is possible in principle for the optimization problems (2.4) and (2.7) to have many global optima that are separated by large distances. Interestingly, Theorem 1 guarantees that this unpleasant feature does not arise under the stated conditions: given any two global optima β̂ and β̃

of the program (2.4), Theorem 1 combined with the triangle inequality guarantees that

‖β̂ − β̃‖_2 ≤ ‖β̂ − β*‖_2 + ‖β* − β̃‖_2 ≤ 2 c_0 (ϕ(Q, σ_ε) / α_1) √((k log p)/n)

[and similarly for the program (2.7)]. Consequently, under any scaling such that (k log p)/n = o(1), the set of all global optima must lie within an ℓ2-ball whose radius shrinks to zero.

In addition, it is worth observing that Theorem 1 makes a specific prediction for the scaling behavior of the ℓ2-error ‖β̂ − β*‖_2. In order to study this scaling prediction, we performed simulations under the additive noise model described in Example 1, using the parameter settings Σ_x = I and Σ_w = σ_w² I with σ_w = 0.2. Panel (a) of Figure 1 provides plots^3 of the error ‖β̂ − β*‖_2 versus the sample size n, for problem dimensions p ∈ {128, 256, 512}. Note that for all three choices of dimensions, the error decreases to zero as the sample size n increases, showing consistency of the method. The curves also shift to the right as the dimension p increases, reflecting the natural intuition that larger problems are harder in a certain sense. Theorem 1 makes a specific prediction about this scaling behavior: in particular, if we plot the ℓ2-error versus the rescaled sample size n/(k log p), the curves should roughly align for different values of p. Panel (b) shows the same data re-plotted on these rescaled axes, thus verifying the predicted stacking behavior.

FIG. 1. Plots of the error ‖β̂ − β*‖_2 after running projected gradient descent on the nonconvex objective, with sparsity k ≈ √p. Plot (a) is an error plot for i.i.d. data with additive noise, and plot (b) shows the ℓ2-error versus the rescaled sample size n/(k log p). As predicted by Theorem 1, the curves align for different values of p in the rescaled plot.

3 Corollary 1, to be stated shortly, guarantees that the conditions of Theorem 1 are satisfied with high probability for the additive noise model. In addition, Theorem 2 to follow provides an efficient method of obtaining an accurate approximation of the global optimum.
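The scaling experiment behind Figure 1 can be reproduced in miniature. The sketch below combines the additive-noise surrogates (2.8) with the projected gradient iterates (2.14), using the oracle radius R = ‖β*‖_1; the dimensions, stepsize, and iteration budget are illustrative choices, much smaller than the simulations reported in the paper:

```python
import numpy as np

def project_l1_ball(v, radius):
    """Sort-based Euclidean projection onto the l1-ball."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

rng = np.random.default_rng(3)
p, k, sigma_w, sigma_eps = 128, 8, 0.2, 0.5
beta_star = np.zeros(p)
beta_star[:k] = 1.0

errors = []
for n in [100, 400, 1600]:
    X = rng.standard_normal((n, p))                 # Sigma_x = I, as in Figure 1
    y = X @ beta_star + sigma_eps * rng.standard_normal(n)
    Z = X + sigma_w * rng.standard_normal((n, p))   # additive noise on covariates

    Gamma = Z.T @ Z / n - sigma_w**2 * np.eye(p)    # surrogate (2.8)
    gamma = Z.T @ y / n
    R = np.abs(beta_star).sum()                     # oracle radius R = ||beta*||_1

    beta = np.zeros(p)
    for _ in range(2000):                           # projected gradient iterates (2.14)
        beta = project_l1_ball(beta - 0.05 * (Gamma @ beta - gamma), R)
    errors.append(np.linalg.norm(beta - beta_star))

print(errors)
```

As in panel (a) of Figure 1, the ℓ2-error shrinks as n grows, even though the objective being minimized is nonconvex at the smaller sample sizes.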

Finally, as noted by a reviewer, the constraint R = ‖β*‖_1 in the program (2.4) is rather restrictive, since β* is unknown. Theorem 1 merely establishes a heuristic for the scaling expected for this optimal radius. In this regard, the Lagrangian estimator (2.7) is more appealing, since it only requires choosing b_0 to be larger than ‖β*‖_2, and the conditions on the regularizer λ_n are the standard ones from past work on the Lasso.

3.1.2. Optimization error. Although Theorem 1 provides guarantees that hold uniformly for any global minimizer, it does not provide guidance on how to approximate such a global minimizer using a polynomial-time algorithm. Indeed, for nonconvex programs in general, gradient-type methods may become trapped in local minima, and it is impossible to guarantee that all such local minima are close to a global optimum. Nonetheless, we are able to show that for the family of programs (2.4), under reasonable conditions on Γ̂ satisfied in various settings, simple gradient methods will converge geometrically fast to a very good approximation of any global optimum. The following theorem supposes that we apply the projected gradient updates (2.14) to the constrained program (2.4), or the composite updates (2.15) to the Lagrangian program (2.7), with stepsize η = 2 α_2. In both cases, we assume that n ≳ k log p, as is required for statistical consistency in Theorem 1.

THEOREM 2 (Optimization error). Under the conditions of Theorem 1:

(a) For any global optimum β̂ of the constrained program (2.4), there are universal positive constants (c_1, c_2) and a contraction coefficient γ ∈ (0, 1), independent of (n, p, k), such that the gradient descent iterates (2.14) satisfy the bounds

(3.5)  ‖β^t − β̂‖_2² ≤ γ^t ‖β^0 − β̂‖_2² + c_1 ((log p)/n) ‖β̂ − β*‖_1² + c_2 ‖β̂ − β*‖_2²,

(3.6)  ‖β^t − β̂‖_1 ≤ 2 √k ‖β^t − β̂‖_2 + 2 √k ‖β̂ − β*‖_2 + 2 ‖β̂ − β*‖_1

for all t ≥ 0.
(b) Letting φ denote the objective function of the Lagrangian program (2.7) with global optimum β̂, and applying the composite gradient updates (2.15), there are universal positive constants (c_1, c_2) and a contraction coefficient γ ∈ (0, 1), independent of (n, p, k), such that

(3.7)  ‖β^t − β̂‖_2² ≤ c_1 ‖β̂ − β*‖_2² =: δ²  for all iterates t ≥ T,

where T := c_2 log((φ(β^0) − φ(β̂)) / δ²) / log(1/γ).

Remarks. As with Theorem 1, these claims are deterministic in nature. Probabilistic conditions will enter into the corollaries, which involve proving that the surrogate matrices Γ̂ used for noisy, missing and/or dependent data satisfy the

lower- and upper-RE conditions with high probability. The proof of Theorem 2 itself is based on an extension of a result due to Agarwal et al. [1] on the convergence of projected gradient descent and composite gradient descent in high dimensions. Their result, as originally stated, imposed convexity of the loss function, but the proof can be modified so as to apply to the nonconvex loss functions of interest here. As noted following Theorem 1, all global minimizers of the nonconvex program (2.4) lie within a small ball. In addition, Theorem 2 guarantees that the local minimizers also lie within a ball of the same magnitude. Note that in order to show that Theorem 2 can be applied to the specific statistical models of interest in this paper, a considerable amount of technical analysis remains in order to establish that its conditions hold with high probability.

In order to understand the significance of the bounds (3.5) and (3.7), note that they provide upper bounds for the ℓ2-distance between the iterate β^t at time t, which is easily computed in polynomial time, and any global optimum β̂ of the program (2.4) or (2.7), which may be difficult to compute. Focusing on bound (3.5), since γ ∈ (0, 1), the first term in the bound vanishes as t increases. The remaining terms involve the statistical errors ‖β̂ − β*‖_q, for q = 1, 2, which are controlled in Theorem 1. It can be verified that the two terms involving the statistical error on the right-hand side are bounded as O((k log p)/n), so Theorem 2 guarantees that projected gradient descent produces an output that is essentially as good, in terms of statistical error, as any global optimum of the program (2.4). Bound (3.7) provides a similar guarantee for composite gradient descent applied to the Lagrangian version.

Experimentally, we have found that the predictions of Theorem 2 are borne out in simulations. Figure 2 shows the results of applying the projected gradient descent method to solve the optimization problem (2.4) in the case of additive noise

FIG. 2.
Plots of the optimization error log(‖β^t − β̂‖_2) and statistical error log(‖β^t − β*‖_2) versus iteration number t, generated by running projected gradient descent on the nonconvex objective. Each plot shows the solution path for the same problem instance, using 10 different starting points. As predicted by Theorem 2, the optimization error decreases geometrically.

[panel (a)] and missing data [panel (b)]. In each case, we generated a random problem instance, and then applied the projected gradient descent method to compute an estimate $\hat\beta$. We then reapplied the projected gradient method to the same problem instance 10 times, each time with a random starting point, and measured the error $\|\beta^t - \hat\beta\|_2$ between the iterates and the first estimate (optimization error), and the error $\|\beta^t - \beta^*\|_2$ between the iterates and the truth (statistical error). Within each panel, the blue traces show the optimization error over 10 trials, and the red traces show the statistical error. On the logarithmic scale given, a geometric rate of convergence corresponds to a straight line. As predicted by Theorem 2, regardless of the starting point, the iterates $\{\beta^t\}$ exhibit geometric convergence to the same fixed point.⁴ The statistical error contracts geometrically up to a certain point, then flattens out.

3.2. Some consequences. As discussed previously, both Theorems 1 and 2 are deterministic results. Applying them to specific statistical models requires some additional work in order to establish that the stated conditions are met. We now turn to the statements of some consequences of these theorems for the different cases of noisy, missing and dependent data. In all the corollaries below, the claims hold with probability greater than $1 - c_1 \exp(-c_2 \log p)$, where $(c_1, c_2)$ are universal positive constants, independent of all other problem parameters. Note that in all corollaries, the triplet $(n, p, k)$ is assumed to satisfy a scaling of the form $n \gtrsim k \log p$, as is necessary for $\ell_2$-consistent estimation of $k$-sparse vectors in $p$ dimensions.

DEFINITION 3. We say that a random matrix $X \in \mathbb{R}^{n \times p}$ is sub-Gaussian with parameters $(\Sigma, \sigma^2)$ if: (a) each row $x_i^T \in \mathbb{R}^p$ is sampled independently from a zero-mean distribution with covariance $\Sigma$, and (b) for any unit vector $u \in \mathbb{R}^p$, the random variable $u^T x_i$ is sub-Gaussian with parameter at most $\sigma$.
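For concreteness, the additive-noise surrogate pair used in Corollary 1 below, $\hat\Gamma_{\mathrm{add}} = \frac{1}{n}Z^TZ - \Sigma_w$ and $\hat\gamma_{\mathrm{add}} = \frac{1}{n}Z^Ty$, can be formed in a few lines. A minimal numpy sketch (the function name is ours, and the noise covariance $\Sigma_w$ is assumed known here):

```python
import numpy as np

def additive_noise_surrogates(Z, y, Sigma_w):
    """Surrogate pair (Gamma_add, gamma_add) for the corrupted Lasso, Z = X + W.

    Subtracting the noise covariance Sigma_w makes Gamma_add an unbiased
    estimate of Sigma_x, at the price of possible indefiniteness when n < p."""
    n = Z.shape[0]
    Gamma_add = Z.T @ Z / n - Sigma_w
    gamma_add = Z.T @ y / n
    return Gamma_add, gamma_add
```

When $\Sigma_w = 0$, this reduces to the usual Lasso ingredients $X^TX/n$ and $X^Ty/n$.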
For instance, if we form a random matrix by drawing each row independently from the distribution $N(0, \Sigma)$, then the resulting matrix $X \in \mathbb{R}^{n \times p}$ is a sub-Gaussian matrix with parameters $(\Sigma, \|\Sigma\|_{op})$.

⁴ To be precise, Theorem 2 states that the iterates will converge geometrically to a small neighborhood of all the global optima.

3.2.1. Bounds for additive noise: i.i.d. case. We begin with the case of i.i.d. samples with additive noise, as described in Example 1.

COROLLARY 1. Suppose that we observe $Z = X + W$, where the random matrices $X, W \in \mathbb{R}^{n \times p}$ are sub-Gaussian with parameters $(\Sigma_x, \sigma_x^2)$ and $(\Sigma_w, \sigma_w^2)$, respectively, and let $\varepsilon$ be

an i.i.d. sub-Gaussian vector with parameter $\sigma_\varepsilon^2$. Let $\sigma_z^2 = \sigma_x^2 + \sigma_w^2$. Then under the scaling $n \gtrsim \max\{\frac{\sigma_z^4}{\lambda_{\min}^2(\Sigma_x)}, 1\}\, k \log p$, for the M-estimator based on the surrogates $(\hat\Gamma_{\mathrm{add}}, \hat\gamma_{\mathrm{add}})$, the results of Theorems 1 and 2 hold with parameters $\alpha_1 = \frac{1}{2}\lambda_{\min}(\Sigma_x)$ and $\varphi(\mathcal{Q}, \sigma_\varepsilon) = c_0 \sigma_z (\sigma_w + \sigma_\varepsilon)\|\beta^*\|_2$, with probability at least $1 - c_1\exp(-c_2\log p)$.

Remarks. (a) Consequently, the $\ell_2$-error of any optimal solution $\hat\beta$ satisfies the bound

$$\|\hat\beta - \beta^*\|_2 \lesssim \frac{\sigma_z(\sigma_w + \sigma_\varepsilon)}{\lambda_{\min}(\Sigma_x)}\, \|\beta^*\|_2 \sqrt{\frac{k\log p}{n}}$$

with high probability. The prefactor in this bound has a natural interpretation as an inverse signal-to-noise ratio; for instance, when $X$ and $W$ are zero-mean Gaussian matrices with row covariances $\Sigma_x = \sigma_x^2 I$ and $\Sigma_w = \sigma_w^2 I$, respectively, we have $\lambda_{\min}(\Sigma_x) = \sigma_x^2$, so

$$\frac{(\sigma_w + \sigma_\varepsilon)\sqrt{\sigma_x^2 + \sigma_w^2}}{\lambda_{\min}(\Sigma_x)} = \frac{\sigma_w + \sigma_\varepsilon}{\sigma_x}\sqrt{1 + \frac{\sigma_w^2}{\sigma_x^2}}.$$

This quantity grows with the ratios $\sigma_w/\sigma_x$ and $\sigma_\varepsilon/\sigma_x$, which measure the inverse SNR of the observed covariates and predictors, respectively. Note that when $\sigma_w = 0$, corresponding to the case of uncorrupted covariates, the bound on the $\ell_2$-error agrees with known results. See Section 4 for simulations and further discussion of the consequences of Corollary 1.

(b) We may also compare the results in (a) with bounds from past work on high-dimensional sparse regression with noisy covariates [15]. In that work, Rosenbaum and Tsybakov derive similar concentration bounds on sub-Gaussian matrices. The tolerance parameters are all $O(\sqrt{\frac{\log p}{n}})$, with prefactors depending on the sub-Gaussian parameters of the matrices. In particular, in their notation,

$$\nu \asymp (\sigma_x\sigma_w + \sigma_w\sigma_\varepsilon + \sigma_w^2)\sqrt{\frac{\log p}{n}}\,\|\beta^*\|_1,$$

leading to the bound (cf. Theorem 2 of Rosenbaum and Tsybakov [15])

$$\|\hat\beta - \beta^*\|_2 \lesssim \frac{\nu\sqrt{k}}{\lambda_{\min}(\Sigma_x)} \asymp \frac{\sigma^2}{\lambda_{\min}(\Sigma_x)}\sqrt{\frac{k\log p}{n}}\,\|\beta^*\|_1.$$

Extensions to unknown noise covariance. Situations may arise in which the noise covariance $\Sigma_w$ is unknown and must be estimated from the data. One simple method is to assume that $\Sigma_w$ is estimated from independent observations of the

noise. In this case, suppose we independently observe a matrix $W_0 \in \mathbb{R}^{n \times p}$ with i.i.d. vectors of noise. Then we use $\hat\Sigma_w = \frac{1}{n} W_0^T W_0$ as our estimate of $\Sigma_w$. A more sophisticated variant of this method (cf. Chapter 4 of Carroll et al. [3]) assumes that we observe $k_i$ replicate measurements $Z_{i1}, \ldots, Z_{ik_i}$ for each $x_i$, and forms the estimator

(3.8) $$\hat\Sigma_w = \frac{\sum_{i=1}^n \sum_{j=1}^{k_i} (Z_{ij} - \bar Z_i)(Z_{ij} - \bar Z_i)^T}{\sum_{i=1}^n (k_i - 1)}.$$

Based on the estimator $\hat\Sigma_w$, we form the pair $(\hat\Gamma, \hat\gamma)$ such that $\hat\gamma = \frac{1}{n} Z^T y$ and $\hat\Gamma = \frac{1}{n} Z^T Z - \hat\Sigma_w$. In the proofs of Section 5, we will analyze the case where $\hat\Sigma_w = \frac{1}{n} W_0^T W_0$, and show that the result of Corollary 1 still holds when $\Sigma_w$ must be estimated from the data. Note that the estimator in equation (3.8) will also yield the same result, but the analysis is more complicated.

3.2.2. Bounds for missing data: i.i.d. case. Next, we turn to the case of i.i.d. samples with missing data, as discussed in Example 3. For a missing data parameter vector $\rho$, we define $\rho_{\max} := \max_j \rho_j$, and assume $\rho_{\max} < 1$.

COROLLARY 2. Let $X \in \mathbb{R}^{n\times p}$ be sub-Gaussian with parameters $(\Sigma_x, \sigma_x^2)$, and let $Z$ be the missing-data matrix with parameter $\rho$. Let $\varepsilon$ be an i.i.d. sub-Gaussian vector with parameter $\sigma_\varepsilon^2$. If $n \gtrsim \max(\frac{\sigma_x^4}{(1-\rho_{\max})^4 \lambda_{\min}^2(\Sigma_x)}, 1)\, k\log p$, then Theorems 1 and 2 hold with probability at least $1 - c_1\exp(-c_2\log p)$ for $\alpha_1 = \frac{1}{2}\lambda_{\min}(\Sigma_x)$ and $\varphi(\mathcal{Q}, \sigma_\varepsilon) = c_0 \frac{\sigma_x}{1-\rho_{\max}}(\sigma_\varepsilon + \frac{\sigma_x}{1-\rho_{\max}})\|\beta^*\|_2$.

Remarks. Suppose $X$ is a Gaussian random matrix and $\rho_j = \rho$ for all $j$. In this case, the ratio $\frac{\sigma_x^2}{\lambda_{\min}(\Sigma_x)} = \frac{\lambda_{\max}(\Sigma_x)}{\lambda_{\min}(\Sigma_x)} = \kappa(\Sigma_x)$ is the condition number of $\Sigma_x$. Then

$$\frac{\varphi(\mathcal{Q},\sigma_\varepsilon)}{\alpha_1} \lesssim \left(\frac{1}{\lambda_{\min}(\Sigma_x)}\,\frac{\sigma_x\sigma_\varepsilon}{1-\rho} + \frac{\kappa(\Sigma_x)}{(1-\rho)^2}\right)\|\beta^*\|_2,$$

a quantity that depends on both the conditioning of $\Sigma_x$ and the fraction $\rho \in [0, 1)$ of missing data. We will consider the results of Corollary 2 applied to this example in the simulations of Section 4.

Extensions to unknown ρ. As in the additive noise case, we may wish to consider the case when the missing data parameters $\rho$ are not observed and must be estimated from the data.
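Under the zero-filling convention of Example 3 (unobserved entries of $Z$ set to zero), the surrogate pair underlying Corollary 2 can be sketched as follows (numpy; the function name is ours, and $\rho$ is taken as known here):

```python
import numpy as np

def missing_data_surrogates(Z, y, rho):
    """Surrogates for the missing-data Lasso (missing entries of Z zero-filled).

    rho is the vector of per-column missing probabilities; dividing by the
    mask matrix M corrects the bias that zero-filling introduces."""
    n = Z.shape[0]
    obs = 1.0 - np.asarray(rho)        # per-column observation probabilities
    M = np.outer(obs, obs)             # off-diagonal correction (1-rho_i)(1-rho_j)
    np.fill_diagonal(M, obs)           # diagonal entries need only (1-rho_j)
    Gamma_mis = (Z.T @ Z / n) / M
    gamma_mis = (Z.T @ y / n) / obs
    return Gamma_mis, gamma_mis
```

With $\rho = 0$ (no missing data), the pair reduces to the usual $Z^TZ/n$ and $Z^Ty/n$.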
For each $j = 1, 2, \ldots, p$, we estimate $\rho_j$ using $\hat\rho_j$, the empirical fraction of missing entries in the $j$th column. Let $\hat\rho \in \mathbb{R}^p$ denote the resulting estimator of $\rho$. Naturally, we use the pair of estimators $(\hat\Gamma, \hat\gamma)$ defined by

(3.9) $$\hat\Gamma = \frac{Z^T Z}{n} \oslash \hat M \qquad\text{and}\qquad \hat\gamma = \frac{1}{n} Z^T y \oslash (1 - \hat\rho),$$

where $\oslash$ denotes elementwise division

and the matrix $\hat M$ has entries

$$\hat M_{ij} = \begin{cases} (1-\hat\rho_i)(1-\hat\rho_j), & \text{if } i \neq j,\\ 1-\hat\rho_i, & \text{if } i = j.\end{cases}$$

We will show in Section 5 that Corollary 2 holds when $\rho$ is estimated by $\hat\rho$.

3.2.3. Bounds for dependent data. Turning to the case of dependent data, we consider the setting where the rows of $X$ are drawn from a stationary vector autoregressive (VAR) process according to

(3.10) $$x_{i+1} = A x_i + v_i \qquad\text{for } i = 1, 2, \ldots, n-1,$$

where $v_i \in \mathbb{R}^p$ is a zero-mean noise vector with covariance matrix $\Sigma_v$, and $A \in \mathbb{R}^{p\times p}$ is a driving matrix with spectral norm $\|A\|_2 < 1$. We assume the rows of $X$ are drawn from a Gaussian distribution with covariance $\Sigma_x$, such that $\Sigma_x = A\Sigma_x A^T + \Sigma_v$. Hence, the rows of $X$ are identically distributed but not independent, with the choice $A = 0$ giving rise to the i.i.d. scenario. Corollaries 3 and 4 correspond to the cases of additive noise and missing data for a Gaussian VAR process.

COROLLARY 3. Suppose the rows of $X$ are drawn according to a Gaussian VAR process with driving matrix $A$. Suppose the additive noise matrix $W$ is i.i.d. with Gaussian rows, and let $\varepsilon$ be an i.i.d. sub-Gaussian vector with parameter $\sigma_\varepsilon^2$. If $n \gtrsim \max(\frac{\zeta^4}{\lambda_{\min}^2(\Sigma_x)}, 1)\, k\log p$, with $\zeta^2 = \|\Sigma_w\|_{op} + \frac{2\|\Sigma_x\|_{op}}{1 - \|A\|_{op}}$, then Theorems 1 and 2 hold with probability at least $1 - c_1\exp(-c_2\log p)$ for $\alpha_1 = \frac12 \lambda_{\min}(\Sigma_x)$ and $\varphi(\mathcal{Q},\sigma_\varepsilon) = c_0(\sigma_\varepsilon \zeta + \zeta^2)\|\beta^*\|_2$.

COROLLARY 4. Suppose the rows of $X$ are drawn according to a Gaussian VAR process with driving matrix $A$, and $Z$ is the observed matrix subject to missing data, with parameter $\rho$. Let $\varepsilon$ be an i.i.d. sub-Gaussian vector with parameter $\sigma_\varepsilon^2$. If $n \gtrsim \max(\frac{\tilde\zeta^4}{\lambda_{\min}^2(\Sigma_x)}, 1)\, k\log p$, with $\tilde\zeta^2 = \frac{2\|\Sigma_x\|_{op}}{(1-\rho_{\max})^2(1 - \|A\|_{op})}$, then Theorems 1 and 2 hold with probability at least $1 - c_1\exp(-c_2\log p)$ for $\alpha_1 = \frac12\lambda_{\min}(\Sigma_x)$ and $\varphi(\mathcal{Q},\sigma_\varepsilon) = c_0(\sigma_\varepsilon \tilde\zeta + \tilde\zeta^2)\|\beta^*\|_2$.

REMARKS. Note that the scaling and the form of $\varphi$ in Corollaries 2–4 are very similar, except with different effective variances $\frac{\sigma_x^2}{(1-\rho_{\max})^2}$, $\zeta^2$ or $\tilde\zeta^2$, depending on the type of corruption in the data.
As we will see in Section 5, the proofs involve verifying the deviation conditions (3.2) using similar techniques. On the other hand, the proof of Corollary 1 proceeds via deviation condition (3.1), which produces a tighter bound. Note that we may extend the cases of dependent data to situations where $\Sigma_w$ and $\rho$ are unknown and must be estimated from the data. The proofs of these extensions are identical to the i.i.d. case, so we omit them.
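The data model of Corollaries 3 and 4 is straightforward to simulate: choosing $\Sigma_v = \Sigma_x - A\Sigma_x A^T$ in (3.10) makes the chain stationary with marginal covariance $\Sigma_x$. A numpy sketch (the function name is ours):

```python
import numpy as np

def simulate_var_rows(n, Sigma_x, A, rng):
    """Draw n rows from the stationary Gaussian VAR process
    x_{i+1} = A x_i + v_i with Cov(v_i) = Sigma_v = Sigma_x - A Sigma_x A'.

    Requires ||A||_2 < 1 and Sigma_v positive semidefinite, so that every
    row has marginal covariance Sigma_x."""
    p = Sigma_x.shape[0]
    Sigma_v = Sigma_x - A @ Sigma_x @ A.T
    L_x = np.linalg.cholesky(Sigma_x)
    # Sigma_v may be only positive semidefinite, so factor it via eigh
    evals, evecs = np.linalg.eigh(Sigma_v)
    L_v = evecs @ np.diag(np.sqrt(np.clip(evals, 0.0, None)))
    X = np.empty((n, p))
    X[0] = L_x @ rng.standard_normal(p)          # start at stationarity
    for i in range(n - 1):
        X[i + 1] = A @ X[i] + L_v @ rng.standard_normal(p)
    return X
```

With $A = 0$ the rows are i.i.d. $N(0, \Sigma_x)$, recovering the setting of Corollaries 1 and 2.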

3.3. Application to graphical model inverse covariance estimation. The problem of inverse covariance estimation for a Gaussian graphical model is also related to the Lasso. Meinshausen and Bühlmann [10] prescribed a way to recover the support of the precision matrix $\Theta$ when each column of $\Theta$ is $k$-sparse, via linear regression and the Lasso. More recently, Yuan [22] proposed a method for estimating $\Theta$ using the Dantzig selector, and obtained error bounds on $\|\hat\Theta - \Theta\|_1$ when the columns of $\Theta$ are bounded in $\ell_1$. Both of these results assume that $X$ is fully observed and has i.i.d. rows.

Suppose we are given a matrix $X \in \mathbb{R}^{n\times p}$ of samples from a multivariate Gaussian distribution, where each row is distributed according to $N(0, \Sigma)$. We assume the rows of $X$ are either i.i.d. or sampled from a Gaussian VAR process. Based on the modified Lasso of the previous section, we devise a method to estimate $\Theta = \Sigma^{-1}$ based on a corrupted observation matrix $Z$, when $\Theta$ is sparse. Our method bears similarity to the method of Yuan [22], but is valid in the case of corrupted data, and does not require an $\ell_1$ column bound.

Let $X^j$ denote the $j$th column of $X$, and let $X^{-j}$ denote the matrix $X$ with the $j$th column removed. By standard results on Gaussian graphical models, there exists a vector $\theta^j \in \mathbb{R}^{p-1}$ such that

(3.11) $$X^j = X^{-j}\theta^j + \varepsilon^j,$$

where $\varepsilon^j$ is a vector of i.i.d. Gaussians and $\varepsilon^j \perp X^{-j}$ for each $j$. If we define $a_j := (\Sigma_{jj} - \Sigma_{j,-j}\theta^j)^{-1}$, we can verify that $\Theta_{j,-j} = -a_j \theta^j$. Our algorithm, described below, forms estimates $\hat\theta^j$ and $\hat a_j$ for each $j$, then combines the estimates to obtain an estimate $\hat\Theta_{j,-j} = -\hat a_j \hat\theta^j$.

In the additive noise case, we observe the matrix $Z = X + W$. From the equations (3.11), we obtain $Z^j = X^{-j}\theta^j + (\varepsilon^j + W^j)$. Note that $\delta^j = \varepsilon^j + W^j$ is a vector of i.i.d. Gaussians, and since $X \perp W$, we have $\delta^j \perp X^{-j}$. Hence, our results on covariates with additive noise allow us to recover $\theta^j$ from $Z$. We can verify that this reduces to solving the program (2.4) or (2.7) with the pair $(\hat\Gamma^{(j)}, \hat\gamma^{(j)}) = (\hat\Sigma_{-j,-j}, \frac{1}{n}(Z^{-j})^T Z^j)$, where $\hat\Sigma = \frac{1}{n}Z^TZ - \Sigma_w$.
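In code, the reduction just described is a matter of sub-matrix indexing of the corrected covariance estimate. A numpy sketch for the additive-noise case (the function name is ours; as in the display above, $\hat\gamma^{(j)} = \frac{1}{n}(Z^{-j})^TZ^j$, which is a natural choice when the noise covariance has no cross-correlation between column $j$ and the remaining columns, e.g., diagonal $\Sigma_w$):

```python
import numpy as np

def nodewise_surrogates_additive(Z, Sigma_w, j):
    """Pair (Gamma_j, gamma_j) for regressing column j of Z on the others,
    using the additive-noise correction Sigma_hat = Z'Z/n - Sigma_w."""
    n = Z.shape[0]
    Sigma_hat = Z.T @ Z / n - Sigma_w
    keep = np.arange(Z.shape[1]) != j
    Gamma_j = Sigma_hat[np.ix_(keep, keep)]    # Sigma_hat_{-j,-j}
    gamma_j = (Z[:, keep].T @ Z[:, j]) / n     # (1/n) (Z^{-j})' Z^j
    return Gamma_j, gamma_j
```

Each pair can then be passed to the projected or composite gradient routine to obtain $\hat\theta^j$.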
When $Z$ is a missing-data version of $X$, we similarly estimate the vectors $\theta^j$ via equation (3.11), using our results on the Lasso with missing covariates. Here, both covariates and responses are subject to missing data, but this makes no difference in our theoretical results. For each $j$, we use the pair

$$(\hat\Gamma^{(j)}, \hat\gamma^{(j)}) = \left(\hat\Sigma_{-j,-j},\ \frac{1}{n}(Z^{-j})^T Z^j \oslash \big((1-\rho_{-j})(1-\rho_j)\big)\right),$$

where $\hat\Sigma = \frac{1}{n}Z^TZ \oslash M$, the symbol $\oslash$ denotes elementwise division, and $M$ is defined as in Example 3. To obtain the estimate $\hat\Theta$, we therefore propose the following procedure, based on the estimators $\{(\hat\Gamma^{(j)}, \hat\gamma^{(j)})\}_{j=1}^p$ and $\hat\Sigma$.

ALGORITHM 3.1. (1) Perform $p$ linear regressions of the variables $Z^j$ upon the remaining variables $Z^{-j}$, using the program (2.4) or (2.7) with the estimators $(\hat\Gamma^{(j)}, \hat\gamma^{(j)})$, to obtain estimates $\hat\theta^j$ of $\theta^j$.

(2) Estimate the scalars $a_j$ using the quantity $\hat a_j := (\hat\Sigma_{jj} - \hat\Sigma_{j,-j}\hat\theta^j)^{-1}$, based on the estimator $\hat\Sigma$. Form $\tilde\Theta$ with $\tilde\Theta_{j,-j} = -\hat a_j\hat\theta^j$ and $\tilde\Theta_{jj} = \hat a_j$.

(3) Set $\hat\Theta = \arg\min_{\Theta \in S^p} \|\Theta - \tilde\Theta\|_1$, where $S^p$ is the set of symmetric $p \times p$ matrices.

Note that the minimization in step (3) is a linear program, so it is easily solved with standard methods. We have the following corollary about $\hat\Theta$:

COROLLARY 5. Suppose the columns of the matrix $\Theta$ are $k$-sparse, and suppose the condition number $\kappa(\Sigma)$ is nonzero and finite. Suppose we have

(3.12) $$\big\|\hat\gamma^{(j)} - \hat\Gamma^{(j)}\theta^j\big\|_\infty \leq \varphi(\mathcal{Q},\sigma_\varepsilon)\sqrt{\frac{\log p}{n}} \qquad \text{for all } j,$$

and suppose we have the following additional deviation condition on $\hat\Sigma$:

(3.13) $$\|\hat\Sigma - \Sigma\|_{\max} \leq c\,\varphi(\mathcal{Q},\sigma_\varepsilon)\sqrt{\frac{\log p}{n}}.$$

Finally, suppose the lower-RE condition holds uniformly over the matrices $\hat\Gamma^{(j)}$ with the scaling (3.3). Then under the estimation procedure of Algorithm 3.1, there exists a universal constant $c_0$ such that

$$\|\hat\Theta - \Theta\|_{op} \leq c_0\,\kappa^2(\Sigma)\left(\frac{\varphi(\mathcal{Q},\sigma_\varepsilon)}{\lambda_{\min}^2(\Sigma)} + \frac{\varphi(\mathcal{Q},\sigma_\varepsilon)}{\alpha_1}\right)k\sqrt{\frac{\log p}{n}}.$$

REMARKS. Note that Corollary 5 is again a deterministic result, with parallel structure to Theorem 1. Furthermore, the deviation bounds (3.12) and (3.13) hold for all scenarios considered in Section 3.2 above, using Corollaries 1–4 for the first two inequalities, and a similar bounding technique for $\|\hat\Sigma - \Sigma\|_{\max}$; and the lower-RE condition holds over all matrices $\hat\Gamma^{(j)}$ by the same technique used to establish the lower-RE condition for $\hat\Gamma$. The uniformity of the lower-RE bound over all sub-matrices holds because

$$0 < \lambda_{\min}(\Sigma) \leq \lambda_{\min}(\Sigma_{-j,-j}) \leq \lambda_{\max}(\Sigma_{-j,-j}) \leq \lambda_{\max}(\Sigma) < \infty.$$

Hence, the error bound in Corollary 5 holds with probability at least $1-c_1\exp(-c_2\log p)$ when $n \gtrsim k\log p$, for the appropriate values of $\varphi$ and $\alpha_1$.

4. Simulations. In this section, we report some additional simulation results to confirm that the scalings predicted by our theory are sharp. In Figure 1 following Theorem 1, we showed that the error curves align when plotted against a suitably rescaled sample size, in the case of additive noise perturbations.
Panel (a) of Figure 3 shows these same types of rescaled curves for the case of missing data, with sparsity $k \approx \sqrt{p}$, covariate matrix $\Sigma_x = I$, and missing fraction $\rho = 0.2$, whereas panel (b) shows the rescaled plots for the vector autoregressive case with additive

FIG. 3. Plots of the error $\|\hat\beta - \beta^*\|_2$ after running projected gradient descent on the nonconvex objective, with sparsity $k \approx \sqrt{p}$. In all cases, we plotted the error versus the rescaled sample size $\frac{n}{k\log p}$. As predicted by Theorems 1 and 2, the curves align for different values of $p$ when plotted in this rescaled manner. (a) Missing data case with i.i.d. covariates. (b) Vector autoregressive data with additive noise. Each point represents an average over 100 trials.

noise perturbations, using a driving matrix $A$ with $\|A\|_{op} = 0.2$. Each point corresponds to an average over 100 trials. Once again, we see excellent agreement with the scaling law provided by Theorem 1.

We also ran simulations to verify the form of the function $\varphi(\mathcal{Q},\sigma_\varepsilon)$ appearing in Corollaries 1 and 2. In the additive noise setting for i.i.d. data, we set $\Sigma_x = I$ and took $\varepsilon$ to be i.i.d. Gaussian noise with $\sigma_\varepsilon = 0.5$. For a fixed value of the parameters $p = 256$ and $k \approx \log p$, we ran the projected gradient descent algorithm for different values of $\sigma_w \in (0.1, 0.3)$, such that $\Sigma_w = \sigma_w^2 I$ and $n = 60(1 + \sigma_w^2)^2\, k\log p$, with $\|\beta^*\|_2 = 1$. According to the theory, $\frac{\varphi(\mathcal{Q},\sigma_\varepsilon)}{\alpha_1} \lesssim (\sigma_w + 0.5)\sqrt{1 + \sigma_w^2}$, so that

$$\|\hat\beta - \beta^*\|_2 \lesssim (\sigma_w + 0.5)\sqrt{1+\sigma_w^2}\,\sqrt{\frac{k\log p}{n}} \propto \frac{\sigma_w + 0.5}{\sqrt{1+\sigma_w^2}}.$$

In order to verify this theoretical prediction, we plotted $\sigma_w$ versus the rescaled error $\frac{\sqrt{1+\sigma_w^2}}{\sigma_w + 0.5}\,\|\hat\beta - \beta^*\|_2$. As shown by Figure 4(a), the curve is roughly constant, as predicted by the theory.

Similarly, in the missing data setting for i.i.d. data, we set $\Sigma_x = I$ and took $\varepsilon$ to be i.i.d. Gaussian noise with $\sigma_\varepsilon = 0.5$. For a fixed value of the parameters $p = 128$ and $k \approx \log p$, we ran simulations for different values of the missing data parameter $\rho \in (0, 0.3)$, such that $n = \frac{60}{(1-\rho)^4}\, k\log p$. According to the theory, $\frac{\varphi(\mathcal{Q},\sigma_\varepsilon)}{\alpha_1} \lesssim \frac{\sigma_\varepsilon}{1-\rho} + \frac{1}{(1-\rho)^2}$. Consequently, with our specified scalings of $(n,p,k)$, we should expect a

FIG. 4. (a) Plot of the rescaled $\ell_2$-error $\frac{\sqrt{1+\sigma_w^2}}{\sigma_w+0.5}\|\hat\beta-\beta^*\|_2$ versus the additive noise standard deviation $\sigma_w$, for the i.i.d. model with additive noise. (b) Plot of the rescaled $\ell_2$-error $\frac{\|\hat\beta-\beta^*\|_2}{1+0.5(1-\rho)}$ versus the missing fraction $\rho$, for the i.i.d. model with missing data. Both curves are roughly constant, showing that our error bounds on $\|\hat\beta-\beta^*\|_2$ exhibit the proper scaling. Each point represents an average over 200 trials.

bound of the form

$$\|\hat\beta - \beta^*\|_2 \lesssim \frac{\varphi(\mathcal{Q},\sigma_\varepsilon)}{\alpha_1}\sqrt{\frac{k\log p}{n}} \propto 1 + 0.5(1-\rho).$$

The plot of $\rho$ versus the rescaled error $\frac{\|\hat\beta-\beta^*\|_2}{1+0.5(1-\rho)}$ is shown in Figure 4(b). The curve is again roughly constant, agreeing with the theoretical results.

Finally, we studied the behavior of the inverse covariance matrix estimation algorithm on three types of Gaussian graphical models:

(a) Chain-structured graphs. In this case, all nodes of the graph are arranged in a linear chain. Hence, each node (except the two end nodes) has degree $k = 2$. The diagonal entries of $\Theta$ are set equal to 1, and all entries corresponding to links in the chain are set equal to 0.1. Then $\Theta$ is rescaled so $\|\Theta\|_{op} = 1$.

(b) Star-structured graphs. In this case, all nodes are connected to a central node, which has degree $k \approx 0.1p$. All other nodes have degree 1. The diagonal entries of $\Theta$ are set equal to 1, and all entries corresponding to edges in the graph are set equal to 0.1. Then $\Theta$ is rescaled so $\|\Theta\|_{op} = 1$.

(c) Erdős–Rényi graphs. This example comes from Rothman et al. [16]. For a sparsity parameter $k \approx \log p$, we randomly generate the matrix $\Theta$ by first generating the matrix $B$ such that the diagonal entries are 0, and all other entries are independently equal to 0.5 with probability $k/p$, and 0 otherwise. Then $\delta$ is chosen so that $\Theta = B + \delta I$ has condition number $p$. Finally, $\Theta$ is rescaled so $\|\Theta\|_{op} = 1$.
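The chain construction in (a) is straightforward to reproduce; a numpy sketch (the function name is ours):

```python
import numpy as np

def chain_precision_matrix(p, edge_weight=0.1):
    """Chain-structured precision matrix from simulation (a): unit diagonal,
    edge_weight on the links of a linear chain, rescaled so ||Theta||_op = 1."""
    Theta = np.eye(p)
    idx = np.arange(p - 1)
    Theta[idx, idx + 1] = edge_weight   # upper-diagonal chain links
    Theta[idx + 1, idx] = edge_weight   # symmetric lower-diagonal links
    return Theta / np.linalg.norm(Theta, 2)
```

The star and Erdős–Rényi constructions in (b) and (c) differ only in which off-diagonal entries are populated before rescaling.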

After generating the matrix $X$ of i.i.d. samples from the appropriate graphical model, with covariance matrix $\Sigma_x = \Theta^{-1}$, we generated the corrupted matrix $Z = X + W$ with $\Sigma_w = (0.2)^2 I$ in the additive noise case, or the missing-data matrix $Z$ with $\rho = 0.2$ in the missing data case. Panels (a) and (c) in Figure 5 show the error $\|\hat\Theta - \Theta\|_{op}$, rescaled by $\frac{1}{\sqrt{k}}$, plotted against the sample size $n$ for a chain-structured graph. In panels (b) and (d), we have the same error plotted against the rescaled sample size $n/(k\log p)$. Once again, we see good agreement with the theoretical predictions. We have obtained qualitatively similar results for the star and Erdős–Rényi graphs.

FIG. 5. (a) Plots of the error $\|\hat\Theta - \Theta\|_{op}$ after running projected gradient descent on the nonconvex objective for a chain-structured Gaussian graphical model with additive noise. As predicted by Theorems 1 and 2, all curves align when the error is rescaled by $\frac{1}{\sqrt{k}}$ and plotted against the ratio $\frac{n}{k\log p}$, as shown in (b). Plots (c) and (d) show the results of simulations on missing data sets. Each point represents the average over 50 trials.
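For completeness, the combination steps (2) and (3) of Algorithm 3.1, used in the graphical-model simulations above, can be sketched as follows (numpy; names are ours). Since any value between $\tilde\Theta_{ij}$ and $\tilde\Theta_{ji}$ minimizes $|s-\tilde\Theta_{ij}| + |s-\tilde\Theta_{ji}|$, averaging $\tilde\Theta$ with its transpose is one minimizer of the elementwise $\ell_1$ problem in step (3); a full linear-program solution could be substituted.

```python
import numpy as np

def combine_nodewise_estimates(theta_hats, Sigma_hat):
    """Steps (2)-(3) of Algorithm 3.1: form a_j, assemble Theta_tilde, symmetrize.

    theta_hats[j] is the estimated vector for regressing column j on the rest;
    Sigma_hat is the corrected covariance estimate."""
    p = Sigma_hat.shape[0]
    Theta_tilde = np.zeros((p, p))
    for j in range(p):
        keep = np.arange(p) != j
        a_j = 1.0 / (Sigma_hat[j, j] - Sigma_hat[j, keep] @ theta_hats[j])
        Theta_tilde[j, j] = a_j
        Theta_tilde[j, keep] = -a_j * theta_hats[j]
    # elementwise-l1 symmetrization (one valid minimizer of step (3))
    return 0.5 * (Theta_tilde + Theta_tilde.T)
```

With exact inputs ($\hat\Sigma = \Sigma$ and $\hat\theta^j = \theta^j$), the routine returns $\Theta$ itself, since $\hat a_j$ then equals $\Theta_{jj}$ by the standard Schur-complement identity.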


More information

5 Boolean Decision Trees (February 11)

5 Boolean Decision Trees (February 11) 5 Boolea Decisio Trees (February 11) 5.1 Graph Coectivity Suppose we are give a udirected graph G, represeted as a boolea adjacecy matrix = (a ij ), where a ij = 1 if ad oly if vertices i ad j are coected

More information

Section 11.3: The Integral Test

Section 11.3: The Integral Test Sectio.3: The Itegral Test Most of the series we have looked at have either diverged or have coverged ad we have bee able to fid what they coverge to. I geeral however, the problem is much more difficult

More information

Chapter 5: Inner Product Spaces

Chapter 5: Inner Product Spaces Chapter 5: Ier Product Spaces Chapter 5: Ier Product Spaces SECION A Itroductio to Ier Product Spaces By the ed of this sectio you will be able to uderstad what is meat by a ier product space give examples

More information

Solutions to Selected Problems In: Pattern Classification by Duda, Hart, Stork

Solutions to Selected Problems In: Pattern Classification by Duda, Hart, Stork Solutios to Selected Problems I: Patter Classificatio by Duda, Hart, Stork Joh L. Weatherwax February 4, 008 Problem Solutios Chapter Bayesia Decisio Theory Problem radomized rules Part a: Let Rx be the

More information

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES

SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES SECTION 1.5 : SUMMATION NOTATION + WORK WITH SEQUENCES Read Sectio 1.5 (pages 5 9) Overview I Sectio 1.5 we lear to work with summatio otatio ad formulas. We will also itroduce a brief overview of sequeces,

More information

Quadrat Sampling in Population Ecology

Quadrat Sampling in Population Ecology Quadrat Samplig i Populatio Ecology Backgroud Estimatig the abudace of orgaisms. Ecology is ofte referred to as the "study of distributio ad abudace". This beig true, we would ofte like to kow how may

More information

Hypergeometric Distributions

Hypergeometric Distributions 7.4 Hypergeometric Distributios Whe choosig the startig lie-up for a game, a coach obviously has to choose a differet player for each positio. Similarly, whe a uio elects delegates for a covetio or you

More information

Theorems About Power Series

Theorems About Power Series Physics 6A Witer 20 Theorems About Power Series Cosider a power series, f(x) = a x, () where the a are real coefficiets ad x is a real variable. There exists a real o-egative umber R, called the radius

More information

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable Week 3 Coditioal probabilities, Bayes formula, WEEK 3 page 1 Expected value of a radom variable We recall our discussio of 5 card poker hads. Example 13 : a) What is the probability of evet A that a 5

More information

Class Meeting # 16: The Fourier Transform on R n

Class Meeting # 16: The Fourier Transform on R n MATH 18.152 COUSE NOTES - CLASS MEETING # 16 18.152 Itroductio to PDEs, Fall 2011 Professor: Jared Speck Class Meetig # 16: The Fourier Trasform o 1. Itroductio to the Fourier Trasform Earlier i the course,

More information

Lecture 4: Cheeger s Inequality

Lecture 4: Cheeger s Inequality Spectral Graph Theory ad Applicatios WS 0/0 Lecture 4: Cheeger s Iequality Lecturer: Thomas Sauerwald & He Su Statemet of Cheeger s Iequality I this lecture we assume for simplicity that G is a d-regular

More information

AMS 2000 subject classification. Primary 62G08, 62G20; secondary 62G99

AMS 2000 subject classification. Primary 62G08, 62G20; secondary 62G99 VARIABLE SELECTION IN NONPARAMETRIC ADDITIVE MODELS Jia Huag 1, Joel L. Horowitz 2 ad Fegrog Wei 3 1 Uiversity of Iowa, 2 Northwester Uiversity ad 3 Uiversity of West Georgia Abstract We cosider a oparametric

More information

BASIC STATISTICS. f(x 1,x 2,..., x n )=f(x 1 )f(x 2 ) f(x n )= f(x i ) (1)

BASIC STATISTICS. f(x 1,x 2,..., x n )=f(x 1 )f(x 2 ) f(x n )= f(x i ) (1) BASIC STATISTICS. SAMPLES, RANDOM SAMPLING AND SAMPLE STATISTICS.. Radom Sample. The radom variables X,X 2,..., X are called a radom sample of size from the populatio f(x if X,X 2,..., X are mutually idepedet

More information

PSYCHOLOGICAL STATISTICS

PSYCHOLOGICAL STATISTICS UNIVERSITY OF CALICUT SCHOOL OF DISTANCE EDUCATION B Sc. Cousellig Psychology (0 Adm.) IV SEMESTER COMPLEMENTARY COURSE PSYCHOLOGICAL STATISTICS QUESTION BANK. Iferetial statistics is the brach of statistics

More information

THE ABRACADABRA PROBLEM

THE ABRACADABRA PROBLEM THE ABRACADABRA PROBLEM FRANCESCO CARAVENNA Abstract. We preset a detailed solutio of Exercise E0.6 i [Wil9]: i a radom sequece of letters, draw idepedetly ad uiformly from the Eglish alphabet, the expected

More information

Trigonometric Form of a Complex Number. The Complex Plane. axis. ( 2, 1) or 2 i FIGURE 6.44. The absolute value of the complex number z a bi is

Trigonometric Form of a Complex Number. The Complex Plane. axis. ( 2, 1) or 2 i FIGURE 6.44. The absolute value of the complex number z a bi is 0_0605.qxd /5/05 0:45 AM Page 470 470 Chapter 6 Additioal Topics i Trigoometry 6.5 Trigoometric Form of a Complex Number What you should lear Plot complex umbers i the complex plae ad fid absolute values

More information

1. C. The formula for the confidence interval for a population mean is: x t, which was

1. C. The formula for the confidence interval for a population mean is: x t, which was s 1. C. The formula for the cofidece iterval for a populatio mea is: x t, which was based o the sample Mea. So, x is guarateed to be i the iterval you form.. D. Use the rule : p-value

More information

UC Berkeley Department of Electrical Engineering and Computer Science. EE 126: Probablity and Random Processes. Solutions 9 Spring 2006

UC Berkeley Department of Electrical Engineering and Computer Science. EE 126: Probablity and Random Processes. Solutions 9 Spring 2006 Exam format UC Bereley Departmet of Electrical Egieerig ad Computer Sciece EE 6: Probablity ad Radom Processes Solutios 9 Sprig 006 The secod midterm will be held o Wedesday May 7; CHECK the fial exam

More information

WHEN IS THE (CO)SINE OF A RATIONAL ANGLE EQUAL TO A RATIONAL NUMBER?

WHEN IS THE (CO)SINE OF A RATIONAL ANGLE EQUAL TO A RATIONAL NUMBER? WHEN IS THE (CO)SINE OF A RATIONAL ANGLE EQUAL TO A RATIONAL NUMBER? JÖRG JAHNEL 1. My Motivatio Some Sort of a Itroductio Last term I tought Topological Groups at the Göttige Georg August Uiversity. This

More information

Lesson 17 Pearson s Correlation Coefficient

Lesson 17 Pearson s Correlation Coefficient Outlie Measures of Relatioships Pearso s Correlatio Coefficiet (r) -types of data -scatter plots -measure of directio -measure of stregth Computatio -covariatio of X ad Y -uique variatio i X ad Y -measurig

More information

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN Aalyzig Logitudial Data from Complex Surveys Usig SUDAAN Darryl Creel Statistics ad Epidemiology, RTI Iteratioal, 312 Trotter Farm Drive, Rockville, MD, 20850 Abstract SUDAAN: Software for the Statistical

More information

LECTURE 13: Cross-validation

LECTURE 13: Cross-validation LECTURE 3: Cross-validatio Resampli methods Cross Validatio Bootstrap Bias ad variace estimatio with the Bootstrap Three-way data partitioi Itroductio to Patter Aalysis Ricardo Gutierrez-Osua Texas A&M

More information

The following example will help us understand The Sampling Distribution of the Mean. C1 C2 C3 C4 C5 50 miles 84 miles 38 miles 120 miles 48 miles

The following example will help us understand The Sampling Distribution of the Mean. C1 C2 C3 C4 C5 50 miles 84 miles 38 miles 120 miles 48 miles The followig eample will help us uderstad The Samplig Distributio of the Mea Review: The populatio is the etire collectio of all idividuals or objects of iterest The sample is the portio of the populatio

More information

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT

Vladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT Keywords: project maagemet, resource allocatio, etwork plaig Vladimir N Burkov, Dmitri A Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT The paper deals with the problems of resource allocatio betwee

More information

THE HEIGHT OF q-binary SEARCH TREES

THE HEIGHT OF q-binary SEARCH TREES THE HEIGHT OF q-binary SEARCH TREES MICHAEL DRMOTA AND HELMUT PRODINGER Abstract. q biary search trees are obtaied from words, equipped with the geometric distributio istead of permutatios. The average

More information

A Recursive Formula for Moments of a Binomial Distribution

A Recursive Formula for Moments of a Binomial Distribution A Recursive Formula for Momets of a Biomial Distributio Árpád Béyi beyi@mathumassedu, Uiversity of Massachusetts, Amherst, MA 01003 ad Saverio M Maago smmaago@psavymil Naval Postgraduate School, Moterey,

More information

Annuities Under Random Rates of Interest II By Abraham Zaks. Technion I.I.T. Haifa ISRAEL and Haifa University Haifa ISRAEL.

Annuities Under Random Rates of Interest II By Abraham Zaks. Technion I.I.T. Haifa ISRAEL and Haifa University Haifa ISRAEL. Auities Uder Radom Rates of Iterest II By Abraham Zas Techio I.I.T. Haifa ISRAEL ad Haifa Uiversity Haifa ISRAEL Departmet of Mathematics, Techio - Israel Istitute of Techology, 3000, Haifa, Israel I memory

More information

Lecture 2: Karger s Min Cut Algorithm

Lecture 2: Karger s Min Cut Algorithm priceto uiv. F 3 cos 5: Advaced Algorithm Desig Lecture : Karger s Mi Cut Algorithm Lecturer: Sajeev Arora Scribe:Sajeev Today s topic is simple but gorgeous: Karger s mi cut algorithm ad its extesio.

More information

Chapter 7 - Sampling Distributions. 1 Introduction. What is statistics? It consist of three major areas:

Chapter 7 - Sampling Distributions. 1 Introduction. What is statistics? It consist of three major areas: Chapter 7 - Samplig Distributios 1 Itroductio What is statistics? It cosist of three major areas: Data Collectio: samplig plas ad experimetal desigs Descriptive Statistics: umerical ad graphical summaries

More information

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13 EECS 70 Discrete Mathematics ad Probability Theory Sprig 2014 Aat Sahai Note 13 Itroductio At this poit, we have see eough examples that it is worth just takig stock of our model of probability ad may

More information

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth

.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth Questio 1: What is a ordiary auity? Let s look at a ordiary auity that is certai ad simple. By this, we mea a auity over a fixed term whose paymet period matches the iterest coversio period. Additioally,

More information

Soving Recurrence Relations

Soving Recurrence Relations Sovig Recurrece Relatios Part 1. Homogeeous liear 2d degree relatios with costat coefficiets. Cosider the recurrece relatio ( ) T () + at ( 1) + bt ( 2) = 0 This is called a homogeeous liear 2d degree

More information

The Stable Marriage Problem

The Stable Marriage Problem The Stable Marriage Problem William Hut Lae Departmet of Computer Sciece ad Electrical Egieerig, West Virgiia Uiversity, Morgatow, WV William.Hut@mail.wvu.edu 1 Itroductio Imagie you are a matchmaker,

More information

COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S 2 CONTROL CHART FOR THE CHANGES IN A PROCESS

COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S 2 CONTROL CHART FOR THE CHANGES IN A PROCESS COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S CONTROL CHART FOR THE CHANGES IN A PROCESS Supraee Lisawadi Departmet of Mathematics ad Statistics, Faculty of Sciece ad Techoology, Thammasat

More information

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design A Combied Cotiuous/Biary Geetic Algorithm for Microstrip Atea Desig Rady L. Haupt The Pesylvaia State Uiversity Applied Research Laboratory P. O. Box 30 State College, PA 16804-0030 haupt@ieee.org Abstract:

More information

THE problem of fitting a circle to a collection of points

THE problem of fitting a circle to a collection of points IEEE TRANACTION ON INTRUMENTATION AND MEAUREMENT, VOL. XX, NO. Y, MONTH 000 A Few Methods for Fittig Circles to Data Dale Umbach, Kerry N. Joes Abstract Five methods are discussed to fit circles to data.

More information

PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM

PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY Physical ad Mathematical Scieces 2015, 1, p. 15 19 M a t h e m a t i c s AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM A. G. GULYAN Chair of Actuarial Mathematics

More information

CHAPTER 3 DIGITAL CODING OF SIGNALS

CHAPTER 3 DIGITAL CODING OF SIGNALS CHAPTER 3 DIGITAL CODING OF SIGNALS Computers are ofte used to automate the recordig of measuremets. The trasducers ad sigal coditioig circuits produce a voltage sigal that is proportioal to a quatity

More information

3. Greatest Common Divisor - Least Common Multiple

3. Greatest Common Divisor - Least Common Multiple 3 Greatest Commo Divisor - Least Commo Multiple Defiitio 31: The greatest commo divisor of two atural umbers a ad b is the largest atural umber c which divides both a ad b We deote the greatest commo gcd

More information

Chapter 7: Confidence Interval and Sample Size

Chapter 7: Confidence Interval and Sample Size Chapter 7: Cofidece Iterval ad Sample Size Learig Objectives Upo successful completio of Chapter 7, you will be able to: Fid the cofidece iterval for the mea, proportio, ad variace. Determie the miimum

More information

hp calculators HP 12C Statistics - average and standard deviation Average and standard deviation concepts HP12C average and standard deviation

hp calculators HP 12C Statistics - average and standard deviation Average and standard deviation concepts HP12C average and standard deviation HP 1C Statistics - average ad stadard deviatio Average ad stadard deviatio cocepts HP1C average ad stadard deviatio Practice calculatig averages ad stadard deviatios with oe or two variables HP 1C Statistics

More information

DAME - Microsoft Excel add-in for solving multicriteria decision problems with scenarios Radomir Perzina 1, Jaroslav Ramik 2

DAME - Microsoft Excel add-in for solving multicriteria decision problems with scenarios Radomir Perzina 1, Jaroslav Ramik 2 Itroductio DAME - Microsoft Excel add-i for solvig multicriteria decisio problems with scearios Radomir Perzia, Jaroslav Ramik 2 Abstract. The mai goal of every ecoomic aget is to make a good decisio,

More information

Project Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments

Project Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments Project Deliverables CS 361, Lecture 28 Jared Saia Uiversity of New Mexico Each Group should tur i oe group project cosistig of: About 6-12 pages of text (ca be loger with appedix) 6-12 figures (please

More information

Present Values, Investment Returns and Discount Rates

Present Values, Investment Returns and Discount Rates Preset Values, Ivestmet Returs ad Discout Rates Dimitry Midli, ASA, MAAA, PhD Presidet CDI Advisors LLC dmidli@cdiadvisors.com May 2, 203 Copyright 20, CDI Advisors LLC The cocept of preset value lies

More information

Measures of Spread and Boxplots Discrete Math, Section 9.4

Measures of Spread and Boxplots Discrete Math, Section 9.4 Measures of Spread ad Boxplots Discrete Math, Sectio 9.4 We start with a example: Example 1: Comparig Mea ad Media Compute the mea ad media of each data set: S 1 = {4, 6, 8, 10, 1, 14, 16} S = {4, 7, 9,

More information

TIGHT BOUNDS ON EXPECTED ORDER STATISTICS

TIGHT BOUNDS ON EXPECTED ORDER STATISTICS Probability i the Egieerig ad Iformatioal Scieces, 20, 2006, 667 686+ Prited i the U+S+A+ TIGHT BOUNDS ON EXPECTED ORDER STATISTICS DIMITRIS BERTSIMAS Sloa School of Maagemet ad Operatios Research Ceter

More information

An Efficient Polynomial Approximation of the Normal Distribution Function & Its Inverse Function

An Efficient Polynomial Approximation of the Normal Distribution Function & Its Inverse Function A Efficiet Polyomial Approximatio of the Normal Distributio Fuctio & Its Iverse Fuctio Wisto A. Richards, 1 Robi Atoie, * 1 Asho Sahai, ad 3 M. Raghuadh Acharya 1 Departmet of Mathematics & Computer Sciece;

More information

Chair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics

Chair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics Chair for Network Architectures ad Services Istitute of Iformatics TU Müche Prof. Carle Network Security Chapter 2 Basics 2.4 Radom Number Geeratio for Cryptographic Protocols Motivatio It is crucial to

More information

A gentle introduction to Expectation Maximization

A gentle introduction to Expectation Maximization A getle itroductio to Expectatio Maximizatio Mark Johso Brow Uiversity November 2009 1 / 15 Outlie What is Expectatio Maximizatio? Mixture models ad clusterig EM for setece topic modelig 2 / 15 Why Expectatio

More information

INVESTMENT PERFORMANCE COUNCIL (IPC)

INVESTMENT PERFORMANCE COUNCIL (IPC) INVESTMENT PEFOMANCE COUNCIL (IPC) INVITATION TO COMMENT: Global Ivestmet Performace Stadards (GIPS ) Guidace Statemet o Calculatio Methodology The Associatio for Ivestmet Maagemet ad esearch (AIM) seeks

More information

A Mathematical Perspective on Gambling

A Mathematical Perspective on Gambling A Mathematical Perspective o Gamblig Molly Maxwell Abstract. This paper presets some basic topics i probability ad statistics, icludig sample spaces, probabilistic evets, expectatios, the biomial ad ormal

More information

THIN SEQUENCES AND THE GRAM MATRIX PAMELA GORKIN, JOHN E. MCCARTHY, SANDRA POTT, AND BRETT D. WICK

THIN SEQUENCES AND THE GRAM MATRIX PAMELA GORKIN, JOHN E. MCCARTHY, SANDRA POTT, AND BRETT D. WICK THIN SEQUENCES AND THE GRAM MATRIX PAMELA GORKIN, JOHN E MCCARTHY, SANDRA POTT, AND BRETT D WICK Abstract We provide a ew proof of Volberg s Theorem characterizig thi iterpolatig sequeces as those for

More information

, a Wishart distribution with n -1 degrees of freedom and scale matrix.

, a Wishart distribution with n -1 degrees of freedom and scale matrix. UMEÅ UNIVERSITET Matematisk-statistiska istitutioe Multivariat dataaalys D MSTD79 PA TENTAMEN 004-0-9 LÖSNINGSFÖRSLAG TILL TENTAMEN I MATEMATISK STATISTIK Multivariat dataaalys D, 5 poäg.. Assume that

More information

Ekkehart Schlicht: Economic Surplus and Derived Demand

Ekkehart Schlicht: Economic Surplus and Derived Demand Ekkehart Schlicht: Ecoomic Surplus ad Derived Demad Muich Discussio Paper No. 2006-17 Departmet of Ecoomics Uiversity of Muich Volkswirtschaftliche Fakultät Ludwig-Maximilias-Uiversität Müche Olie at http://epub.ub.ui-mueche.de/940/

More information

Coordinating Principal Component Analyzers

Coordinating Principal Component Analyzers Coordiatig Pricipal Compoet Aalyzers J.J. Verbeek ad N. Vlassis ad B. Kröse Iformatics Istitute, Uiversity of Amsterdam Kruislaa 403, 1098 SJ Amsterdam, The Netherlads Abstract. Mixtures of Pricipal Compoet

More information