HIGH-DIMENSIONAL REGRESSION WITH NOISY AND MISSING DATA: PROVABLE GUARANTEES WITH NONCONVEXITY
The Annals of Statistics 2012, Vol. 40, No. 3, DOI: 10.1214/12-AOS1018
© Institute of Mathematical Statistics, 2012

HIGH-DIMENSIONAL REGRESSION WITH NOISY AND MISSING DATA: PROVABLE GUARANTEES WITH NONCONVEXITY

BY PO-LING LOH 1,2 AND MARTIN J. WAINWRIGHT 2

University of California, Berkeley

Although the standard formulations of prediction problems involve fully-observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly involving dependence, as well. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization problems that are inherently nonconvex, and it is difficult to establish theoretical guarantees on practical algorithms. While our approach also involves optimizing nonconvex programs, we are able to both analyze the statistical error associated with any global optimum, and more surprisingly, to prove that a simple algorithm based on projected gradient descent will converge in polynomial time to a small neighborhood of the set of all global minimizers. On the statistical side, we provide nonasymptotic bounds that hold with high probability for the cases of noisy, missing and/or dependent data. On the computational side, we prove that under the same types of conditions required for statistical consistency, the projected gradient descent algorithm is guaranteed to converge at a geometric rate to a near-global minimizer. We illustrate these theoretical predictions with simulations, showing close agreement with the predicted scalings.

1. Introduction. In standard formulations of prediction problems, it is assumed that the covariates are fully-observed and sampled independently from some underlying distribution. However, these assumptions are not realistic for many applications, in which covariates may be observed only partially, observed subject to corruption, or exhibit some type of dependency.
Consider the problem of modeling the voting behavior of politicians: in this setting, votes may be missing due to abstentions, and temporally dependent due to collusion or "tit-for-tat" behavior. Similarly, surveys often suffer from the missing data problem, since users fail to respond to all questions. Sensor network data also tends to be both noisy due to measurement error, and partially missing due to failures or drop-outs of sensors.

Received September 2011; revised May 2012.
1 Supported in part by a Hertz Foundation Fellowship and the Department of Defense (DoD) through a NDSEG Fellowship.
2 Supported in part by NSF Grant DMS and Air Force Office of Scientific Research Grant AFOSR-09NL184.
MSC2010 subject classifications. Primary 62F12; secondary 68W25.
Key words and phrases. High-dimensional statistics, missing data, nonconvexity, regularization, sparse linear regression, M-estimation.
There are a variety of methods for dealing with noisy and/or missing data, including various heuristic methods, as well as likelihood-based methods involving the expectation-maximization (EM) algorithm (e.g., see the book [8] and references therein). A challenge in this context is the possible nonconvexity of associated optimization problems. For instance, in applications of EM, problems in which the negative likelihood is a convex function often become nonconvex with missing or noisy data. Consequently, although the EM algorithm will converge to a local minimum, it is difficult to guarantee that the local optimum is close to a global minimum.

In this paper, we study these issues in the context of high-dimensional sparse linear regression, in particular in the case when the predictors or covariates are noisy, missing, and/or dependent. Our main contribution is to develop and study simple methods for handling these issues, and to prove theoretical results about both the associated statistical error and the optimization error. Like EM-based approaches, our estimators are based on solving optimization problems that may be nonconvex; however, despite this nonconvexity, we are still able to prove that a simple form of projected gradient descent will produce an output that is sufficiently close (as small as the statistical error) to any global optimum. As a second result, we bound the statistical error, showing that it has the same scaling as the minimax rates for the classical cases of perfectly observed and independently sampled covariates. In this way, we obtain estimators for noisy, missing, and/or dependent data that have the same scaling behavior as the usual fully-observed and independent case. The resulting estimators allow us to solve the problem of high-dimensional Gaussian graphical model selection with missing data.

There is a large body of work on the problem of corrupted covariates or errors-in-variables for regression problems (e.g., see the papers and books [3, 6, 7, 21], as well as references therein).
Much of the earlier theoretical work is classical in nature, meaning that it requires that the sample size n diverges with the dimension p fixed. Most relevant to this paper is more recent work that has examined issues of corrupted and/or missing data in the context of high-dimensional sparse linear models, allowing for p ≫ n. Städler and Bühlmann [18] developed an EM-based method for sparse inverse covariance matrix estimation in the missing data regime, and used this result to derive an algorithm for sparse linear regression with missing data. As mentioned above, however, it is difficult to guarantee that EM will converge to a point close to a global optimum of the likelihood, in contrast to the methods studied here. Rosenbaum and Tsybakov [14] studied the sparse linear model when the covariates are corrupted by noise, and proposed a modified form of the Dantzig selector (see the discussion following our main results for a detailed comparison to this past work, and also to concurrent work [15] by the same authors). For the particular case of multiplicative noise, the type of estimator that we consider here has been studied in past work [21]; however, this theoretical analysis is of the classical type, holding only for p ≪ n, in contrast to the high-dimensional models that are of interest here.
The remainder of this paper is organized as follows. We begin in Section 2 with background and a precise description of the problem. We then introduce the class of estimators we will consider and the form of the projected gradient descent algorithm. Section 3 is devoted to a description of our main results, including a pair of general theorems on the statistical and optimization error, and then a series of corollaries applying our results to the cases of noisy, missing, and dependent data. In Section 4, we present simulations to confirm that our methods work in practice, and verify the theoretically-predicted scaling laws. Section 5 contains proofs of some of the main results, with the remaining proofs contained in the supplementary Appendix [9].

NOTATION. For a matrix M, we write ‖M‖_max := max_{i,j} |m_ij| to denote the elementwise ℓ∞-norm of M. Furthermore, |||M|||_1 denotes the induced ℓ1-operator norm (maximum absolute column sum) of M, and |||M|||_op is the spectral norm of M. We write κ(M) := λ_max(M)/λ_min(M) for the condition number of M. For matrices M_1 and M_2, we write M_1 ⊙ M_2 to denote the componentwise Hadamard product, and M_1 ÷ M_2 to denote componentwise division. For functions f(n) and g(n), we write f(n) ≲ g(n) to mean that f(n) ≤ c g(n) for a universal constant c ∈ (0, ∞), and similarly, f(n) ≳ g(n) when f(n) ≥ c′ g(n) for some universal constant c′ ∈ (0, ∞). Finally, we write f(n) ≍ g(n) when f(n) ≲ g(n) and f(n) ≳ g(n) hold simultaneously.

2. Background and problem setup. In this section, we provide background and a precise description of the problem, and then motivate the class of estimators analyzed in this paper. We then discuss a simple class of projected gradient descent algorithms that can be used to obtain an estimator.

2.1. Observation model and high-dimensional framework. Suppose we observe a response variable y_i ∈ ℝ linked to a covariate vector x_i ∈ ℝ^p via the linear model

(2.1)  y_i = ⟨x_i, β*⟩ + ε_i  for i = 1, 2, ..., n.

Here, the regression vector β* ∈ ℝ^p is unknown, and ε_i ∈ ℝ is observation noise, independent of x_i.
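As a concrete instance of the linear model (2.1), the following sketch generates a k-sparse regression vector and i.i.d. observations; the dimensions, sparsity level, and noise scale are illustrative choices of ours, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 200, 50, 5               # sample size, ambient dimension, sparsity

beta_star = np.zeros(p)            # k-sparse regression vector beta*
beta_star[:k] = 1.0

X = rng.normal(size=(n, p))        # covariate vectors x_i as rows (here N(0, I))
eps = 0.1 * rng.normal(size=n)     # observation noise, independent of x_i
y = X @ beta_star + eps            # y_i = <x_i, beta*> + eps_i, as in (2.1)
```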
Rather than directly observing each x_i ∈ ℝ^p, we observe a vector z_i ∈ ℝ^p linked to x_i via some conditional distribution, that is,

(2.2)  z_i ~ Q(· | x_i)  for i = 1, 2, ..., n.

This setup applies to various disturbances to the covariates, including:

(a) Covariates with additive noise: We observe z_i = x_i + w_i, where w_i ∈ ℝ^p is a random vector independent of x_i, say zero-mean with known covariance matrix Σ_w.
(b) Missing data: For some fraction ρ ∈ [0, 1), we observe a random vector z_i ∈ ℝ^p such that for each component j, we independently observe z_ij = x_ij with probability 1 − ρ, and z_ij = ∗ (missing) with probability ρ. We can also consider the case when the entries in the jth column have a different probability ρ_j of being missing.

(c) Covariates with multiplicative noise: Generalizing the missing data problem, suppose we observe z_i = x_i ⊙ u_i, where u_i ∈ ℝ^p is again a random vector independent of x_i, and ⊙ is the Hadamard product. The problem of missing data is a special case of multiplicative noise, where all the u_ij's are independent and u_ij ~ Bernoulli(1 − ρ_j).

Our first set of results is deterministic, depending on specific instantiations of the observations {(y_i, z_i)}_{i=1}^n. However, we are also interested in results that hold with high probability when the x_i's and z_i's are drawn at random. We consider both the case when the x_i's are drawn i.i.d. from a fixed distribution, and the case of dependent covariates, when the x_i's are generated according to a stationary vector autoregressive (VAR) process. We work within a high-dimensional framework that allows the number of predictors p to grow and possibly exceed the sample size n. Of course, consistent estimation when p ≫ n is impossible unless the model is endowed with additional structure, for instance, sparsity in the parameter vector β*. Consequently, we study the class of models where β* has at most k nonzero parameters, where k is also allowed to increase to infinity with p and n.

2.2. M-estimators for noisy and missing covariates. In order to motivate the class of estimators we will consider, let us begin by examining a simple deterministic problem. Let Σ_x ≻ 0 be the covariance matrix of the covariates, and consider the ℓ1-constrained quadratic program

(2.3)  β̂ ∈ arg min_{‖β‖₁ ≤ R} { (1/2) βᵀ Σ_x β − ⟨Σ_x β*, β⟩ }.

As long as the constraint radius R is at least ‖β*‖₁, the unique solution to this convex program is β̂ = β*.
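To see why β̂ = β* in the idealized program (2.3), note that the objective has gradient Σ_x β − Σ_x β*, which vanishes only at β = β* when Σ_x ≻ 0, and the ℓ₁-constraint is inactive there once R ≥ ‖β*‖₁. A minimal numerical check of this stationarity condition, with an arbitrary positive-definite Σ_x of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 20
beta_star = np.zeros(p)
beta_star[:3] = [1.0, -2.0, 0.5]

# An arbitrary positive-definite covariance Sigma_x (illustrative choice).
A = rng.normal(size=(p, p))
Sigma_x = A @ A.T / p + np.eye(p)

# Stationary point of 0.5 * b' Sigma_x b - <Sigma_x beta*, b>:
# solving Sigma_x b = Sigma_x beta* recovers beta* exactly.
b_hat = np.linalg.solve(Sigma_x, Sigma_x @ beta_star)
```

Solving Σ_x β = Σ_x β* recovers β* to machine precision, matching the claim that β* is the unique minimizer.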
Of course, this program is an idealization, since in practice we may not know the covariance matrix Σ_x, and we certainly do not know Σ_x β*; after all, β* is the quantity we are trying to estimate! Nonetheless, this idealization still provides useful intuition, as it suggests various estimators based on the plug-in principle. Given a set of samples, it is natural to form estimates of the quantities Σ_x and Σ_x β*, which we denote by Γ̂ ∈ ℝ^{p×p} and γ̂ ∈ ℝ^p, respectively, and to consider the modified program

(2.4)  β̂ ∈ arg min_{‖β‖₁ ≤ R} { (1/2) βᵀ Γ̂ β − ⟨γ̂, β⟩ },

or alternatively, the regularized version

(2.5)  β̂ ∈ arg min_{β ∈ ℝ^p} { (1/2) βᵀ Γ̂ β − ⟨γ̂, β⟩ + λ_n ‖β‖₁ },

where λ_n > 0 is a user-defined regularization parameter. Note that the two problems are equivalent by Lagrangian duality when the objectives are convex, but not in the case of a nonconvex objective. The Lasso [4, 19] is a special case of these programs, obtained by setting

(2.6)  Γ̂_Las := (1/n) Xᵀ X  and  γ̂_Las := (1/n) Xᵀ y,

where we have introduced the shorthand y = (y₁, ..., y_n)ᵀ ∈ ℝⁿ, and X ∈ ℝ^{n×p} with x_iᵀ as its ith row. A simple calculation shows that (Γ̂_Las, γ̂_Las) are unbiased estimators of the pair (Σ_x, Σ_x β*). This unbiasedness and additional concentration inequalities (to be described in the sequel) underlie the well-known analysis of the Lasso in the high-dimensional regime.

In this paper, we focus on more general instantiations of the programs (2.4) and (2.5), involving different choices of the pair (Γ̂, γ̂) that are adapted to the cases of noisy and/or missing data. Note that the matrix Γ̂_Las is positive semidefinite, so the Lasso program is convex. In sharp contrast, for the case of noisy or missing data, the most natural choice of the matrix Γ̂ is not positive semidefinite, hence the quadratic losses appearing in the problems (2.4) and (2.5) are nonconvex. Furthermore, when Γ̂ has negative eigenvalues, the objective in equation (2.5) is unbounded from below. Hence, we make use of the following regularized estimator:

(2.7)  β̂ ∈ arg min_{‖β‖₁ ≤ b₀√k} { (1/2) βᵀ Γ̂ β − ⟨γ̂, β⟩ + λ_n ‖β‖₁ }

for a suitable constant b₀. In the presence of nonconvexity, it is generally impossible to provide a polynomial-time algorithm that converges to a (near) global optimum, due to the presence of local minima. Remarkably, we are able to prove that this issue is not significant in our setting, and a simple projected gradient descent algorithm applied to the programs (2.4) or (2.7) converges with high probability to a vector extremely close to any global optimum. Let us illustrate these ideas with some examples. Recall that (Γ̂, γ̂) serve as unbiased estimators for (Σ_x, Σ_x β*).

EXAMPLE 1 (Additive noise).
Suppose we observe Z = X + W, where W is a random matrix independent of X, with rows w_i drawn i.i.d. from a zero-mean distribution with known covariance Σ_w. We consider the pair

(2.8)  Γ̂_add := (1/n) Zᵀ Z − Σ_w  and  γ̂_add := (1/n) Zᵀ y.

Note that when Σ_w = 0 (corresponding to the noiseless case), the estimators reduce to the standard Lasso. However, when Σ_w ≠ 0, the matrix Γ̂_add is not positive semidefinite in the high-dimensional regime (n ≪ p). Indeed, since the matrix (1/n) Zᵀ Z has rank at most n, the subtracted matrix Σ_w may cause Γ̂_add to have a large number of negative eigenvalues. For instance, if Σ_w = σ_w² I with σ_w² > 0, then Γ̂_add has at least p − n eigenvalues equal to −σ_w².

EXAMPLE 2 (Missing data). We now consider the case where the entries of X are missing at random. Let us first describe an estimator for the special case where each entry is missing at random, independently with some constant probability ρ ∈ [0, 1). (In Example 3 to follow, we will describe the extension to general missing probabilities.) Consequently, we observe the matrix Z ∈ ℝ^{n×p} with entries

Z_ij = { X_ij, with probability 1 − ρ,
       { 0, otherwise.

Given the observed matrix Z ∈ ℝ^{n×p}, we use

(2.9)  Γ̂_mis := (1/n) Z̃ᵀ Z̃ − ρ · diag((1/n) Z̃ᵀ Z̃)  and  γ̂_mis := (1/n) Z̃ᵀ y,

where Z̃_ij = Z_ij / (1 − ρ). It is easy to see that the pair (Γ̂_mis, γ̂_mis) reduces to the pair (Γ̂_Las, γ̂_Las) for the standard Lasso when ρ = 0, corresponding to no missing data. In the more interesting case when ρ ∈ (0, 1), the matrix (1/n) Z̃ᵀ Z̃ in equation (2.9) has rank at most n, so the subtracted diagonal matrix may cause the matrix Γ̂_mis to have a large number of negative eigenvalues when p ≫ n. As a consequence, the matrix Γ̂_mis is not (in general) positive semidefinite, so the associated quadratic function is not convex.

EXAMPLE 3 (Multiplicative noise). As a generalization of the previous example, we now consider the case of multiplicative noise. In particular, suppose we observe the quantity Z = X ⊙ U, where U is a matrix of nonnegative noise variables. In many applications, it is natural to assume that the rows u_i of U are drawn in an i.i.d. manner, say from some distribution in which both the vector E[u₁] and the matrix E[u₁u₁ᵀ] have strictly positive entries. This general family of multiplicative noise models arises in various applications; we refer the reader to the papers [3, 6, 7, 21] for more discussion and examples. A natural choice of the pair (Γ̂, γ̂) is given by the quantities

(2.10)  Γ̂_mul := (1/n) Zᵀ Z ÷ E[u₁u₁ᵀ]  and  γ̂_mul := (1/n) Zᵀ y ÷ E[u₁],

where ÷ denotes elementwise division.
A small calculation shows that these are unbiased estimators of Σ_x and Σ_x β*, respectively. The estimators (2.10) have been studied in past work [21], but only under classical scaling (n ≫ p). As a special case of the estimators (2.10), suppose the entries u_ij of U are independent Bernoulli(1 − ρ_j) random variables. Then the observed matrix Z = X ⊙ U corresponds to a missing-data matrix, where each element of the jth column has probability ρ_j of being missing. In this case, the estimators (2.10) become

(2.11)  Γ̂_mis = (1/n) Zᵀ Z ÷ M  and  γ̂_mis = (1/n) Zᵀ y ÷ (1 − ρ),

where M := E[u₁u₁ᵀ] satisfies

M_ij = { (1 − ρ_i)(1 − ρ_j), if i ≠ j,
       { 1 − ρ_i, if i = j,

ρ is the parameter vector containing the ρ_j's, and 1 is the vector of all 1's. In this way, we obtain a generalization of the estimator discussed in Example 2.

2.3. Restricted eigenvalue conditions. Given an estimate β̂, there are various ways to assess its closeness to β*. In this paper, we focus on the ℓ2-norm ‖β̂ − β*‖₂, as well as the closely related ℓ1-norm ‖β̂ − β*‖₁. When the covariate matrix X is fully observed (so that the Lasso can be applied), it is now well understood that a sufficient condition for ℓ2-recovery is that the matrix Γ̂_Las = (1/n) Xᵀ X satisfy a certain type of restricted eigenvalue (RE) condition (e.g., [2, 20]). In this paper, we make use of the following condition.

DEFINITION 1 (Lower-RE condition). The matrix Γ̂ satisfies a lower restricted eigenvalue condition with curvature α₁ > 0 and tolerance τ(n, p) > 0 if

(2.12)  θᵀ Γ̂ θ ≥ α₁ ‖θ‖₂² − τ(n, p) ‖θ‖₁²  for all θ ∈ ℝ^p.

It can be shown that when the Lasso matrix Γ̂_Las = (1/n) Xᵀ X satisfies this RE condition (2.12), the Lasso estimate has low ℓ2-error for any vector β* supported on any subset of size at most k ≲ τ(n, p)^{−1}. In particular, bound (2.12) implies a sparse RE condition for all k of this magnitude, and conversely, Lemma 11 in the Appendix of [9] shows that a sparse RE condition implies bound (2.12). In this paper, we work with condition (2.12), since it is especially convenient for analyzing optimization algorithms. In the standard setting (with uncorrupted and fully observed design matrices), it is known that for many choices of the design matrix X (with rows having covariance Σ), the Lasso matrix Γ̂_Las will satisfy such an RE condition with high probability (e.g., [13, 17]) with α₁ = (1/2) λ_min(Σ) and τ(n, p) ≍ (log p)/n. A significant portion of the analysis in this paper is devoted to proving that different choices of Γ̂, such as the matrices Γ̂_add and Γ̂_mis defined earlier, also satisfy condition (2.12) with high probability.
This fact is by no means obvious, since as previously discussed, the matrices Γ̂_add and Γ̂_mis generally have large numbers of negative eigenvalues. Finally, although such upper bounds are not necessary for statistical consistency, our algorithmic results make use of the analogous upper restricted eigenvalue condition, formalized in the following:

DEFINITION 2 (Upper-RE condition). The matrix Γ̂ satisfies an upper restricted eigenvalue condition with smoothness α₂ > 0 and tolerance τ(n, p) > 0 if

(2.13)  θᵀ Γ̂ θ ≤ α₂ ‖θ‖₂² + τ(n, p) ‖θ‖₁²  for all θ ∈ ℝ^p.
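A small simulation makes Definitions 1 and 2 concrete. The sketch below forms the additive-noise surrogate Γ̂_add of equation (2.8) in a regime with p > n, confirms that it is indefinite (it has eigenvalues equal to −σ_w² on the null space of ZᵀZ), and then checks the lower-RE inequality (2.12) on randomly sampled sparse and dense directions. The pair (α₁, τ) used here is an illustrative guess of ours, of the order suggested by the theory, not a certified constant.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, sigma_w = 200, 400, 0.5

X = rng.normal(size=(n, p))                      # true covariates, Sigma_x = I here
Z = X + sigma_w * rng.normal(size=(n, p))
Gamma = Z.T @ Z / n - sigma_w**2 * np.eye(p)     # surrogate (2.8), indefinite for p > n

alpha1 = 0.25                 # illustrative curvature, of order lambda_min(Sigma_x)/2
tau = np.log(p) / n           # illustrative tolerance of order (log p)/n

# Check (2.12) on random sparse and dense directions theta.
ok = True
for _ in range(200):
    if rng.random() < 0.5:                       # a random 10-sparse direction
        theta = np.zeros(p)
        supp = rng.choice(p, size=10, replace=False)
        theta[supp] = rng.normal(size=10)
    else:                                        # a random dense direction
        theta = rng.normal(size=p)
    lhs = theta @ Gamma @ theta
    rhs = alpha1 * theta @ theta - tau * np.abs(theta).sum() ** 2
    ok = ok and (lhs >= rhs)
```

For dense directions the tolerance term τ‖θ‖₁² is large, so the bound holds trivially; for sparse directions the quadratic form concentrates near its population value, which is how an indefinite matrix can still satisfy a lower-RE condition.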
In recent work on high-dimensional projected gradient descent, Agarwal et al. [1] make use of a more general form of the lower and upper bounds (2.12) and (2.13), applicable to nonquadratic losses as well, which are referred to as the restricted strong convexity (RSC) and restricted smoothness (RSM) conditions, respectively. For various classes of random design matrices, it can be shown that the Lasso matrix Γ̂_Las satisfies the upper bound (2.13) with α₂ = 2 λ_max(Σ_x) and τ(n, p) ≍ (log p)/n; see Raskutti et al. [13] for the Gaussian case and Rudelson and Zhou [17] for the sub-Gaussian setting. We will establish similar scalings for our choices of Γ̂.

2.4. Gradient descent algorithms. In addition to proving results about the global minima of the (possibly nonconvex) programs (2.4) and (2.5), we are also interested in polynomial-time procedures for approximating such optima. In this paper, we analyze some simple algorithms for solving either the constrained program (2.4) or the Lagrangian version (2.7). Note that the gradient of the quadratic loss function takes the form ∇L(β) = Γ̂β − γ̂. In application to the constrained version, the method of projected gradient descent generates a sequence of iterates {βᵗ, t = 0, 1, 2, ...} by the recursion

(2.14)  βᵗ⁺¹ = arg min_{‖β‖₁ ≤ R} { L(βᵗ) + ⟨∇L(βᵗ), β − βᵗ⟩ + (η/2) ‖β − βᵗ‖₂² },

where η > 0 is a stepsize parameter. Equivalently, this update can be written as βᵗ⁺¹ = Π(βᵗ − (1/η) ∇L(βᵗ)), where Π denotes the ℓ2-projection onto the ℓ1-ball of radius R. This projection can be computed rapidly in O(p) time using a procedure due to Duchi et al. [5]. For the Lagrangian update, we use a slight variant of the projected gradient update (2.14), namely

(2.15)  βᵗ⁺¹ = arg min_{‖β‖₁ ≤ R} { L(βᵗ) + ⟨∇L(βᵗ), β − βᵗ⟩ + (η/2) ‖β − βᵗ‖₂² + λ_n ‖β‖₁ },

with the only difference being the inclusion of the regularization term. This update can also be performed efficiently by performing two projections onto the ℓ1-ball; see the paper [1] for details.
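The update (2.14) is straightforward to implement. The sketch below uses a sort-based O(p log p) projection onto the ℓ₁-ball (the procedure of Duchi et al. [5] achieves expected O(p) time but computes the same point), and runs the iteration on an indefinite additive-noise surrogate from Example 1 starting from two different points; problem sizes and noise levels are again illustrative choices of ours.

```python
import numpy as np

def project_l1(v, R):
    """Euclidean projection of v onto the l1-ball of radius R.

    Sort-based O(p log p) variant; the algorithm of Duchi et al. runs in
    expected O(p) time but returns the same projection.
    """
    if np.abs(v).sum() <= R:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    ranks = np.arange(1, u.size + 1)
    rho = ranks[u - (css - R) / ranks > 0].max()
    theta = (css[rho - 1] - R) / rho
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def projected_gradient(Gamma, gamma, R, eta, T, beta0=None):
    """Iterates (2.14) for the quadratic loss L(b) = 0.5 b' Gamma b - <gamma, b>."""
    beta = np.zeros(gamma.size) if beta0 is None else beta0.copy()
    for _ in range(T):
        beta = project_l1(beta - (Gamma @ beta - gamma) / eta, R)
    return beta

# A corrupted-covariate instance (Example 1) with p > n, so Gamma is indefinite.
rng = np.random.default_rng(5)
n, p, k, sigma_w = 100, 256, 8, 0.2
beta_star = np.zeros(p); beta_star[:k] = 1.0 / np.sqrt(k)
X = rng.normal(size=(n, p))
y = X @ beta_star + 0.1 * rng.normal(size=n)
Z = X + sigma_w * rng.normal(size=(n, p))

Gamma = Z.T @ Z / n - sigma_w**2 * np.eye(p)     # pair (2.8)
gamma = Z.T @ y / n
R = np.abs(beta_star).sum()                      # idealized radius R = ||beta*||_1
eta = 2 * np.linalg.eigvalsh(Gamma).max()        # stepsize set by the smoothness level

beta_a = projected_gradient(Gamma, gamma, R, eta, T=1500)
beta_b = projected_gradient(Gamma, gamma, R, eta, T=1500,
                            beta0=project_l1(rng.normal(size=p), R))
```

Despite the nonconvexity, runs from different starting points land at essentially the same point, consistent with the behavior described around Theorem 2.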
When the objective function is convex (equivalently, Γ̂ is positive semidefinite), the iterates (2.14) or (2.15) are guaranteed to converge to a global minimum of the objective functions (2.4) and (2.7), respectively. In our setting, the matrix Γ̂ need not be positive semidefinite, so the best generic guarantee is that the iterates converge to a local optimum. However, our analysis shows that for the family of programs (2.4) or (2.7), under a reasonable set of conditions satisfied by various statistical models, the iterates actually converge to a point extremely close to any global optimum in both ℓ1-norm and ℓ2-norm; see Theorem 2 to follow for a more detailed statement.
3. Main results and consequences. We now state our main results and discuss their consequences for noisy, missing, and dependent data.

3.1. General results. We provide theoretical guarantees for both the constrained estimator (2.4) and the Lagrangian version (2.7). Note that we obtain different optimization problems as we vary the choice of the pair (Γ̂, γ̂) ∈ ℝ^{p×p} × ℝ^p. We begin by stating a pair of general results, applicable to any pair that satisfies certain conditions. Our first result (Theorem 1) provides bounds on the statistical error, namely the quantity ‖β̂ − β*‖₂, as well as the corresponding ℓ1-error, where β̂ is any global optimum of the programs (2.4) or (2.7). Since the problem may be nonconvex in general, it is not immediately obvious that one can obtain a provably good approximation to any global optimum without resorting to costly search methods. In order to assuage this concern, our second result (Theorem 2) provides rigorous bounds on the optimization error, namely the differences ‖βᵗ − β̂‖₂ and ‖βᵗ − β̂‖₁ incurred by the iterate βᵗ after running t rounds of the projected gradient descent updates (2.14) or (2.15).

3.2. Statistical error. In controlling the statistical error, we assume that the matrix Γ̂ satisfies a lower-RE condition with curvature α₁ and tolerance τ(n, p), as previously defined (2.12). Recall that Γ̂ and γ̂ serve as surrogates to the deterministic quantities Σ_x ∈ ℝ^{p×p} and Σ_x β* ∈ ℝ^p, respectively. Our results also involve a measure of deviation in these surrogates. In particular, we assume that there is some function φ(Q, σ_ε), depending on the two sources of noise in our problem: the standard deviation σ_ε of the observation noise vector ε from equation (2.1), and the conditional distribution Q from equation (2.2) that links the covariates x_i to the observed versions z_i. With this notation, we consider the deviation condition

(3.1)  ‖γ̂ − Γ̂ β*‖∞ ≤ φ(Q, σ_ε) √((log p)/n).

To aid intuition, note that by the triangle inequality, inequality (3.1) holds (with φ rescaled by a constant) whenever the following two deviation conditions are satisfied:

(3.2)  ‖γ̂ − Σ_x β*‖∞ ≤ φ(Q, σ_ε) √((log p)/n)  and  ‖(Γ̂ − Σ_x) β*‖∞ ≤ φ(Q, σ_ε) √((log p)/n).
The pair of inequalities (3.2) clearly measures the deviation of the estimators (Γ̂, γ̂) from their population versions, and they are sometimes easier to verify theoretically. However, inequality (3.1) may be used directly to derive tighter bounds (e.g., in the additive noise case). Indeed, the bounds established via inequalities (3.2) are not sharp in the limit of low noise on the covariates, due to the second inequality. In the proofs of our corollaries to follow, we will verify the deviation conditions for various forms of noisy, missing, and dependent data, with the quantity φ(Q, σ_ε) changing depending on the model. We have the following result, which applies to any global optimum β̂ of the regularized version (2.7) with λ_n ≥ 4 φ(Q, σ_ε) √((log p)/n):

THEOREM 1 (Statistical error). Suppose the surrogates (Γ̂, γ̂) satisfy the deviation bound (3.1), and the matrix Γ̂ satisfies the lower-RE condition (2.12) with parameters (α₁, τ) such that

(3.3)  √k τ(n, p) ≤ min{ α₁ / (128 √k), (φ(Q, σ_ε) / b₀) √((log p)/n) }.

Then for any vector β* with sparsity at most k, there is a universal positive constant c₀ such that any global optimum β̂ of the Lagrangian program (2.7) with any b₀ ≥ ‖β*‖₂ satisfies the bounds

(3.4a)  ‖β̂ − β*‖₂ ≤ (c₀ √k / α₁) max{ φ(Q, σ_ε) √((log p)/n), λ_n }

and

(3.4b)  ‖β̂ − β*‖₁ ≤ (8 c₀ k / α₁) max{ φ(Q, σ_ε) √((log p)/n), λ_n }.

The same bounds (without λ_n) also apply to the constrained program (2.4) with radius choice R = ‖β*‖₁.

REMARKS. To be clear, all the claims of Theorem 1 are deterministic. Probabilistic conditions will enter when we analyze specific statistical models and certify that the RE condition (3.3) and deviation conditions are satisfied by a random pair (Γ̂, γ̂) with high probability. We note that for the standard Lasso choice (Γ̂_Las, γ̂_Las) of this matrix-vector pair, bounds of the form (3.4) for sub-Gaussian noise are well known from past work (e.g., [2, 11, 12, 23]). The novelty of Theorem 1 is in allowing for general pairs of such surrogates, which, as shown by the examples discussed earlier, can lead to nonconvexity in the underlying M-estimator. Moreover, some interesting differences arise due to the term φ(Q, σ_ε), which changes depending on the nature of the model (missing, noisy, and/or dependent), as will be clarified in the sequel. Proving that the conditions of Theorem 1 are satisfied with high probability for noisy/missing data requires some nontrivial analysis involving both concentration inequalities and random matrix theory.
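The deviation condition (3.1) can also be checked empirically. For the additive-noise pair (2.8), the sketch below computes ‖γ̂ − Γ̂β*‖∞ at two sample sizes and confirms that it shrinks at roughly the √(1/n) rate that the bound predicts; all parameter values are our illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)
p, k, sigma_w = 200, 5, 0.5
beta_star = np.zeros(p)
beta_star[:k] = 1.0

def deviation(n):
    """||gamma_hat - Gamma_hat beta*||_inf for the additive-noise pair (2.8)."""
    X = rng.normal(size=(n, p))
    y = X @ beta_star + 0.1 * rng.normal(size=n)
    Z = X + sigma_w * rng.normal(size=(n, p))
    Gamma = Z.T @ Z / n - sigma_w**2 * np.eye(p)
    gamma = Z.T @ y / n
    return np.abs(gamma - Gamma @ beta_star).max()

# Deviation at a small and a much larger sample size.
d_small, d_large = deviation(100), deviation(10000)
```

The difference γ̂ − Γ̂β* has mean zero under this model, so the maximal coordinate shrinks as n grows, in line with the √((log p)/n) scaling.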
Note that in the presence of nonconvexity, it is possible in principle for the optimization problems (2.4) and (2.7) to have many global optima that are separated by large distances. Interestingly, Theorem 1 guarantees that this unpleasant feature does not arise under the stated conditions: given any two global optima β̂ and β̃ of the program (2.4), Theorem 1 combined with the triangle inequality guarantees that

‖β̂ − β̃‖₂ ≤ ‖β̂ − β*‖₂ + ‖β* − β̃‖₂ ≤ 2 c₀ (φ(Q, σ_ε) / α₁) √((k log p)/n)

[and similarly for the program (2.7)]. Consequently, under any scaling such that (k log p)/n = o(1), the set of all global optima must lie within an ℓ2-ball whose radius shrinks to zero.

In addition, it is worth observing that Theorem 1 makes a specific prediction for the scaling behavior of the ℓ2-error ‖β̂ − β*‖₂. In order to study this scaling prediction, we performed simulations under the additive noise model described in Example 1, using the parameter settings Σ_x = I and Σ_w = σ_w² I with σ_w = 0.2. Panel (a) of Figure 1 provides plots³ of the error ‖β̂ − β*‖₂ versus the sample size n, for problem dimensions p ∈ {128, 256, 512}. Note that for all three choices of dimensions, the error decreases to zero as the sample size n increases, showing consistency of the method. The curves also shift to the right as the dimension p increases, reflecting the natural intuition that larger problems are harder in a certain sense. Theorem 1 makes a specific prediction about this scaling behavior: in particular, if we plot the ℓ2-error versus the rescaled sample size n/(k log p), the curves should roughly align for different values of p. Panel (b) shows the same data re-plotted on these rescaled axes, thus verifying the predicted "stacking" behavior.

FIG. 1. Plots of the error ‖β̂ − β*‖₂ after running projected gradient descent on the nonconvex objective, with sparsity k ≍ √p. Plot (a) is an error plot for i.i.d. data with additive noise, and plot (b) shows the ℓ2-error versus the rescaled sample size n/(k log p). As predicted by Theorem 1, the curves align for different values of p in the rescaled plot.

³ Corollary 1, to be stated shortly, guarantees that the conditions of Theorem 1 are satisfied with high probability for the additive noise model. In addition, Theorem 2 to follow provides an efficient method of obtaining an accurate approximation of the global optimum.
Finally, as noted by a reviewer, the constraint R = ‖β*‖₁ in the program (2.4) is rather restrictive, since β* is unknown. Theorem 1 merely establishes a heuristic for the scaling expected for this optimal radius. In this regard, the Lagrangian estimator (2.7) is more appealing, since it only requires choosing b₀ to be larger than ‖β*‖₂, and the conditions on the regularizer λ_n are the standard ones from past work on the Lasso.

3.3. Optimization error. Although Theorem 1 provides guarantees that hold uniformly for any global minimizer, it does not provide guidance on how to approximate such a global minimizer using a polynomial-time algorithm. Indeed, for nonconvex programs in general, gradient-type methods may become trapped in local minima, and it is impossible to guarantee that all such local minima are close to a global optimum. Nonetheless, we are able to show that for the family of programs (2.4), under reasonable conditions on Γ̂ satisfied in various settings, simple gradient methods will converge geometrically fast to a very good approximation of any global optimum. The following theorem supposes that we apply the projected gradient updates (2.14) to the constrained program (2.4), or the composite updates (2.15) to the Lagrangian program (2.7), with stepsize η = 2α₂. In both cases, we assume that n ≳ k log p, as is required for statistical consistency in Theorem 1.

THEOREM 2 (Optimization error). Under the conditions of Theorem 1:

(a) For any global optimum β̂ of the constrained program (2.4), there are universal positive constants (c₁, c₂) and a contraction coefficient γ ∈ (0, 1), independent of (n, p, k), such that the gradient descent iterates (2.14) satisfy the bounds

(3.5)  ‖βᵗ − β̂‖₂² ≤ γᵗ ‖β⁰ − β̂‖₂² + c₁ ((log p)/n) ‖β̂ − β*‖₁² + c₂ ‖β̂ − β*‖₂²,

(3.6)  ‖βᵗ − β̂‖₁ ≤ 2√k ‖βᵗ − β̂‖₂ + 2√k ‖β̂ − β*‖₂ + 2 ‖β̂ − β*‖₁

for all t ≥ 0.
(b) Lettig φ deote the objective fuctio of Lagragia program (2.7) with global optimum β, ad applyig composite gradiet updates (2.15), there are uiversal positive costats (c 1,c 2 ) ad a cotractio coefficiet γ (0, 1), idepedet of (,p,k), such that (3.7) β t β 2 2 c 1 β β 2 2 where T := c 2 log (φ(β0 ) φ( β)) δ 2 / log(1/γ ). } {{ } δ 2 for all iterates t T, Remarks. As with Theorem 1, these claims are determiistic i ature. Probabilistic coditios will eter ito the corollaries, which ivolve provig that the surrogate matrices Ɣ used for oisy, missig ad/or depedet data satisfy the
13 HIGH-DIMENSIONAL NOISY LASSO 1649 lower- ad upper-re coditios with high probability. The proof of Theorem 2 itself is based o a extesio of a result due to Agarwal et al. [1] o the covergece of projected gradiet descet ad composite gradiet descet i high dimesios. Their result, as origially stated, imposed covexity of the loss fuctio, but the proof ca be modified so as to apply to the ocovex loss fuctios of iterest here. As oted followig Theorem 1, all global miimizers of the ocovex program (2.4) lie withi a small ball. I additio, Theorem 2 guaratees that the local miimizers also lie withi a ball of the same magitude. Note that i order to show that Theorem 2 ca be applied to the specific statistical models of iterest i this paper, a cosiderable amout of techical aalysis remais i order to establish that its coditios hold with high probability. I order to uderstad the sigificace of the bouds (3.5) ad(3.7), ote that they provide upper bouds for the l 2 -distace betwee the iterate β t at time t, which is easily computed i polyomial-time, ad ay global optimum β of the program (2.4) or(2.7), which may be difficult to compute. Focusig o boud (3.5), sice γ (0, 1), the first term i the boud vaishes as t icreases. The remaiig terms ivolve the statistical errors β β q,forq = 1, 2, which are cotrolled i Theorem 1. It ca be verified that the two terms ivolvig the statistical error o the right-had side are bouded as O( k log p ), so Theorem 2 guaratees that projected gradiet descet produce a output that is essetially as good i terms of statistical error as ay global optimum of the program (2.4). Boud (3.7) provides a similar guaratee for composite gradiet descet applied to the Lagragia versio. Experimetally, we have foud that the predictios of Theorem 2 are bore out i simulatios. Figure 2 shows the results of applyig the projected gradiet descet method to solve the optimizatio problem (2.4) i the case of additive oise (a) (b) FIG. 2. 
Plots of the optimizatio error log( β t β 2 ) ad statistical error log( β t β 2 ) versus iteratio umber t, geerated by ruig projected gradiet descet o the ocovex objective. Each plot shows the solutio path for the same problem istace, usig 10 differet startig poits. As predicted by Theorem 2, the optimizatio error decreases geometrically.
P.-L. LOH AND M. J. WAINWRIGHT

[panel (a)], and missing data [panel (b)]. In each case, we generated a random problem instance, and then applied the projected gradient descent method to compute an estimate β̂. We then reapplied the projected gradient method to the same problem instance 10 times, each time with a random starting point, and measured the error ||β^t − β̂||_2 between the iterates and the first estimate (optimization error), and the error ||β^t − β*||_2 between the iterates and the truth (statistical error). Within each panel, the blue traces show the optimization error over 10 trials, and the red traces show the statistical error. On the logarithmic scale given, a geometric rate of convergence corresponds to a straight line. As predicted by Theorem 2, regardless of the starting point, the iterates {β^t} exhibit geometric convergence to the same fixed point.⁴ The statistical error contracts geometrically up to a certain point, then flattens out.

3.2. Some consequences. As discussed previously, both Theorems 1 and 2 are deterministic results. Applying them to specific statistical models requires some additional work in order to establish that the stated conditions are met. We now turn to the statements of some consequences of these theorems for different cases of noisy, missing and dependent data. In all the corollaries below, the claims hold with probability greater than 1 − c1 exp(−c2 log p), where (c1, c2) are universal positive constants, independent of all other problem parameters. Note that in all corollaries, the triplet (n, p, k) is assumed to satisfy a scaling of the form n ≳ k log p, as is necessary for ℓ2-consistent estimation of k-sparse vectors in p dimensions.

DEFINITION 3. We say that a random matrix X ∈ R^{n×p} is sub-Gaussian with parameters (Σ, σ²) if:
(a) each row x_i^T ∈ R^p is sampled independently from a zero-mean distribution with covariance Σ, and
(b) for any unit vector u ∈ R^p, the random variable u^T x_i is sub-Gaussian with parameter at most σ.
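Definition 3 can be made concrete with a small numerical sketch; the sample size, dimension, and the Toeplitz covariance below are arbitrary illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 50  # illustrative sample size and dimension

# An arbitrary covariance with decaying correlations (Toeplitz).
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))

# Draw each row independently from N(0, Sigma); per Definition 3, the
# resulting matrix is sub-Gaussian with parameters (Sigma, ||Sigma||_op).
L = np.linalg.cholesky(Sigma)
X = rng.standard_normal((n, p)) @ L.T

# Sanity check: the sample covariance approximates Sigma for large n.
Sigma_hat = X.T @ X / n
```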
For instance, if we form a random matrix by drawing each row independently from the distribution N(0, Σ), then the resulting matrix X ∈ R^{n×p} is a sub-Gaussian matrix with parameters (Σ, ||Σ||_op).

3.2.1. Bounds for additive noise: i.i.d. case. We begin with the case of i.i.d. samples with additive noise, as described in Example 1.

COROLLARY 1. Suppose that we observe Z = X + W, where the random matrices X, W ∈ R^{n×p} are sub-Gaussian with parameters (Σ_x, σ_x²) and (Σ_w, σ_w²), respectively, and let ε be

⁴ To be precise, Theorem 2 states that the iterates will converge geometrically to a small neighborhood of all the global optima.
an i.i.d. sub-Gaussian vector with parameter σ_ε². Let σ_z² = σ_x² + σ_w². Then under the scaling

n ≳ max{σ_z⁴ / λ²_min(Σ_x), 1} k log p,

for the M-estimator based on the surrogates (Γ̂_add, γ̂_add), the results of Theorems 1 and 2 hold with parameters α1 = (1/2) λ_min(Σ_x) and φ(Q, σ_ε) = c0 σ_z (σ_w + σ_ε) ||β*||_2, with probability at least 1 − c1 exp(−c2 log p).

Remarks. (a) Consequently, the ℓ2-error of any optimal solution β̂ satisfies the bound

||β̂ − β*||_2 ≲ [σ_z (σ_w + σ_ε) / λ_min(Σ_x)] ||β*||_2 √(k log p / n)

with high probability. The prefactor in this bound has a natural interpretation as an inverse signal-to-noise ratio; for instance, when X and W are zero-mean Gaussian matrices with row covariances Σ_x = σ_x² I and Σ_w = σ_w² I, respectively, we have λ_min(Σ_x) = σ_x², so

σ_z (σ_w + σ_ε) / λ_min(Σ_x) = (σ_w + σ_ε) √(σ_x² + σ_w²) / σ_x² = [(σ_w + σ_ε)/σ_x] √(1 + σ_w²/σ_x²).

This quantity grows with the ratios σ_w/σ_x and σ_ε/σ_x, which measure the inverse SNR of the observed covariates and predictors, respectively. Note that when σ_w = 0, corresponding to the case of uncorrupted covariates, the bound on ℓ2-error agrees with known results. See Section 4 for simulations and further discussion of the consequences of Corollary 1.

(b) We may also compare the results in (a) with bounds from past work on high-dimensional sparse regression with noisy covariates [15]. In this work, Rosenbaum and Tsybakov derive similar concentration bounds on sub-Gaussian matrices. The tolerance parameters are all O(√(log p / n)), with prefactors depending on the sub-Gaussian parameters of the matrices. In particular, in their notation,

ν ≲ (σ_x σ_w + σ_w σ_ε + σ_w²) √(log p / n) ||β*||_1,

leading to the bound (cf. Theorem 2 of Rosenbaum and Tsybakov [15])

||β̂ − β*||_2 ≲ ν √k / λ_min(Σ_x) ≲ [σ² / λ_min(Σ_x)] √(k log p / n) ||β*||_1,

where σ² = σ_x σ_w + σ_w σ_ε + σ_w² collects the noise prefactors.

Extensions to unknown noise covariance. Situations may arise where the noise covariance Σ_w is unknown, and must be estimated from the data. One simple method is to assume that Σ_w is estimated from independent observations of the
noise. In this case, suppose we independently observe a matrix W0 ∈ R^{n×p} with i.i.d. rows of noise. Then we use Σ̂_w = (1/n) W0^T W0 as our estimate of Σ_w. A more sophisticated variant of this method (cf. Chapter 4 of Carroll et al. [3]) assumes that we observe k_i replicate measurements Z_{i1}, ..., Z_{ik_i} for each x_i, and forms the estimator

(3.8)  Σ̂_w = [Σ_{i=1}^n Σ_{j=1}^{k_i} (Z_{ij} − Z̄_i)(Z_{ij} − Z̄_i)^T] / [Σ_{i=1}^n (k_i − 1)].

Based on the estimator Σ̂_w, we form the pair (Γ̂, γ̂) with γ̂ = (1/n) Z^T y and Γ̂ = (1/n) Z^T Z − Σ̂_w. In the proofs of Section 5, we will analyze the case where Σ̂_w = (1/n) W0^T W0, and show that the result of Corollary 1 still holds when Σ_w must be estimated from the data. Note that the estimator in equation (3.8) will also yield the same result, but the analysis is more complicated.

3.2.2. Bounds for missing data: i.i.d. case. Next, we turn to the case of i.i.d. samples with missing data, as discussed in Example 3. For a missing data parameter vector ρ, we define ρ_max := max_j ρ_j, and assume ρ_max < 1.

COROLLARY 2. Let X ∈ R^{n×p} be sub-Gaussian with parameters (Σ_x, σ_x²), and let Z be the missing-data matrix with parameter ρ. Let ε be an i.i.d. sub-Gaussian vector with parameter σ_ε². If

n ≳ max{σ_x⁴ / [(1 − ρ_max)⁴ λ²_min(Σ_x)], 1} k log p,

then Theorems 1 and 2 hold with probability at least 1 − c1 exp(−c2 log p), for α1 = (1/2) λ_min(Σ_x) and φ(Q, σ_ε) = c0 [σ_x/(1 − ρ_max)] (σ_ε + σ_x/(1 − ρ_max)) ||β*||_2.

Remarks. Suppose X is a Gaussian random matrix and ρ_j = ρ for all j. In this case, the ratio σ_x²/λ_min(Σ_x) = λ_max(Σ_x)/λ_min(Σ_x) = κ(Σ_x) is the condition number of Σ_x. Then

φ(Q, σ_ε)/α1 ≲ [σ_x σ_ε / (λ_min(Σ_x)(1 − ρ)) + κ(Σ_x)/(1 − ρ)²] ||β*||_2,

a quantity that depends on both the conditioning of Σ_x and the fraction ρ ∈ [0, 1) of missing data. We will consider the results of Corollary 2 applied to this example in the simulations of Section 4.

Extensions to unknown ρ. As in the additive noise case, we may wish to consider the case when the missing data parameters ρ are not observed and must be estimated from the data.
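The additive-noise surrogates of Corollary 1, together with the simple plug-in estimate of Σ_w from an independent noise sample, can be sketched as follows; all sizes, sparsity, and noise levels are illustrative assumptions, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 500, 64, 5            # illustrative sizes
sigma_w, sigma_eps = 0.2, 0.5   # illustrative noise levels

# Ground truth: k-sparse beta*, identity-covariance design.
beta_star = np.zeros(p)
beta_star[:k] = 1.0 / np.sqrt(k)
X = rng.standard_normal((n, p))
y = X @ beta_star + sigma_eps * rng.standard_normal(n)

# Example 1: covariates observed with additive noise, Z = X + W.
Sigma_w = sigma_w ** 2 * np.eye(p)
Z = X + sigma_w * rng.standard_normal((n, p))

# Surrogates of Corollary 1: unbiased for (Sigma_x, Sigma_x @ beta*).
Gamma_add = Z.T @ Z / n - Sigma_w
gamma_add = Z.T @ y / n

# If Sigma_w is unknown: the simple variant estimates it from an
# independent noise sample W0, as discussed before equation (3.8).
W0 = sigma_w * rng.standard_normal((n, p))
Sigma_w_hat = W0.T @ W0 / n
Gamma_hat = Z.T @ Z / n - Sigma_w_hat
```

Note that Γ̂_add, unlike a sample covariance, may have negative eigenvalues when n < p, which is exactly why the overall program is nonconvex.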
For each j = 1, 2, ..., p, we estimate ρ_j by ρ̂_j, the empirical fraction of missing entries in the jth column of Z. Let ρ̂ ∈ R^p denote the resulting estimator of ρ. Naturally, we use the pair of estimators (Γ̂, γ̂) defined by

(3.9)  Γ̂ = (1/n) Z^T Z ⊘ M̂  and  γ̂ = (1/n) Z^T y ⊘ (1 − ρ̂),

in which ⊘ denotes elementwise division,
where

M̂_ij = (1 − ρ̂_i)(1 − ρ̂_j) for i ≠ j,  and  M̂_ii = 1 − ρ̂_i.

We will show in Section 5 that Corollary 2 holds when ρ is estimated by ρ̂.

3.2.3. Bounds for dependent data. Turning to the case of dependent data, we consider the setting where the rows of X are drawn from a stationary vector autoregressive (VAR) process according to

(3.10)  x_{i+1} = A x_i + v_i  for i = 1, 2, ..., n − 1,

where v_i ∈ R^p is a zero-mean noise vector with covariance matrix Σ_v, and A ∈ R^{p×p} is a driving matrix with spectral norm ||A||_2 < 1. We assume the rows of X are drawn from a Gaussian distribution with covariance Σ_x, such that Σ_x = A Σ_x A^T + Σ_v. Hence, the rows of X are identically distributed but not independent, with the choice A = 0 giving rise to the i.i.d. scenario. Corollaries 3 and 4 correspond to the cases of additive noise and missing data for a Gaussian VAR process.

COROLLARY 3. Suppose the rows of X are drawn according to a Gaussian VAR process with driving matrix A. Suppose the additive noise matrix W is i.i.d. with Gaussian rows, and let ε be an i.i.d. sub-Gaussian vector with parameter σ_ε². If

n ≳ max{ζ⁴ / λ²_min(Σ_x), 1} k log p,  with ζ² = ||Σ_w||_op + 2 ||Σ_x||_op / (1 − ||A||_op),

then Theorems 1 and 2 hold with probability at least 1 − c1 exp(−c2 log p), for α1 = (1/2) λ_min(Σ_x) and φ(Q, σ_ε) = c0 (σ_ε ζ + ζ²) ||β*||_2.

COROLLARY 4. Suppose the rows of X are drawn according to a Gaussian VAR process with driving matrix A, and Z is the observed matrix subject to missing data, with parameter ρ. Let ε be an i.i.d. sub-Gaussian vector with parameter σ_ε². If

n ≳ max{ζ⁴ / λ²_min(Σ_x), 1} k log p,  with ζ² = ||Σ_x||_op / [(1 − ρ_max)² (1 − ||A||_op)],

then Theorems 1 and 2 hold with probability at least 1 − c1 exp(−c2 log p), for α1 = (1/2) λ_min(Σ_x) and φ(Q, σ_ε) = c0 (σ_ε ζ + ζ²) ||β*||_2.

REMARKS. Note that the scaling and the form of φ in Corollaries 2–4 are very similar, except with different effective variances: σ_x²/(1 − ρ_max)² in Corollary 2, or ζ² as defined in Corollaries 3 and 4, depending on the type of corruption in the data.
As we will see in Section 5, the proofs involve verifying the deviation conditions (3.2) using similar techniques. On the other hand, the proof of Corollary 1 proceeds via deviation condition (3.1), which produces a tighter bound. Note that we may extend the cases of dependent data to situations when Σ_w and ρ are unknown and must be estimated from the data. The proofs of these extensions are identical to the i.i.d. case, so we will omit them.
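The missing-data construction of equations (3.9), with ρ estimated columnwise from the observation pattern, can be sketched as follows; the sizes, the uniform missing fraction, and the noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, k, rho = 500, 64, 5, 0.2  # illustrative sizes; uniform missing fraction

beta_star = np.zeros(p)
beta_star[:k] = 1.0 / np.sqrt(k)
X = rng.standard_normal((n, p))
y = X @ beta_star + 0.5 * rng.standard_normal(n)

# Example 3: each entry observed independently with probability 1 - rho;
# missing entries are recorded as zero.
observed = rng.random((n, p)) > rho
Z = np.where(observed, X, 0.0)

# Estimate rho_j as the empirical fraction of missing entries in column j.
rho_hat = 1.0 - observed.mean(axis=0)
obs_frac = 1.0 - rho_hat

# The matrix M-hat of equation (3.9): (1-rho_i)(1-rho_j) off the diagonal
# and (1-rho_i) on the diagonal.
M_hat = np.outer(obs_frac, obs_frac)
np.fill_diagonal(M_hat, obs_frac)

# Surrogates (3.9): elementwise division corrects the zero-fill bias.
Gamma_hat = (Z.T @ Z / n) / M_hat
gamma_hat = (Z.T @ y / n) / obs_frac
```

The elementwise rescaling makes Γ̂ and γ̂ unbiased for Σ_x and Σ_x β*, at the price of Γ̂ possibly being indefinite when n < p.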
3.3. Application to graphical model inverse covariance estimation. The problem of inverse covariance estimation for a Gaussian graphical model is also related to the Lasso. Meinshausen and Bühlmann [10] prescribed a way to recover the support of the precision matrix Θ when each column of Θ is k-sparse, via linear regression and the Lasso. More recently, Yuan [22] proposed a method for estimating Θ using the Dantzig selector, and obtained error bounds on ||Θ̂ − Θ||_1 when the columns of Θ are bounded in ℓ1. Both of these results assume that X is fully-observed and has i.i.d. rows.

Suppose we are given a matrix X ∈ R^{n×p} of samples from a multivariate Gaussian distribution, where each row is distributed according to N(0, Σ). We assume the rows of X are either i.i.d. or sampled from a Gaussian VAR process. Based on the modified Lasso of the previous section, we devise a method to estimate Θ = Σ^{-1} based on a corrupted observation matrix Z, when Θ is sparse. Our method bears similarity to the method of Yuan [22], but is valid in the case of corrupted data, and does not require an ℓ1 column bound.

Let X_j denote the jth column of X, and let X_{−j} denote the matrix X with the jth column removed. By standard results on Gaussian graphical models, there exists a vector θ^j ∈ R^{p−1} such that

(3.11)  X_j = X_{−j} θ^j + ε_j,

where ε_j is a vector of i.i.d. Gaussians and ε_j ⊥ X_{−j}, for each j. If we define a_j := (Σ_jj − Σ_{j,−j} θ^j)^{-1}, we can verify that Θ_{j,−j} = −a_j θ^j. Our algorithm, described below, forms estimates θ̂^j and â_j for each j, then combines the estimates via Θ̂_{j,−j} = −â_j θ̂^j.

In the additive noise case, we observe the matrix Z = X + W. From the equations (3.11), we obtain Z_j = X_{−j} θ^j + (ε_j + W_j). Note that δ_j = ε_j + W_j is a vector of i.i.d. Gaussians, and since X ⊥ W, we have δ_j ⊥ X_{−j}. Hence, our results on covariates with additive noise allow us to recover θ^j from Z. We can verify that this reduces to solving the program (2.4) or (2.7) with the pair (Γ̂^(j), γ̂^(j)) = (Σ̂_{−j,−j}, (1/n) Z_{−j}^T Z_j), where Σ̂ = (1/n) Z^T Z − Σ_w.
When Z is a missing-data version of X, we similarly estimate the vectors θ^j via equation (3.11), using our results on the Lasso with missing covariates. Here, both covariates and responses are subject to missing data, but this makes no difference in our theoretical results. For each j, we use the pair

(Γ̂^(j), γ̂^(j)) = (Σ̂_{−j,−j}, (1/n) Z_{−j}^T Z_j ⊘ [(1 − ρ_{−j})(1 − ρ_j)]),

where Σ̂ = (1/n) Z^T Z ⊘ M, and M is defined as in Example 3.

To obtain the estimate Θ̂, we therefore propose the following procedure, based on the estimators {(Γ̂^(j), γ̂^(j))}_{j=1}^p and Σ̂.

ALGORITHM 3.1.

(1) Perform p linear regressions of the variables Z_j upon the remaining variables Z_{−j}, using the program (2.4) or (2.7) with the estimators (Γ̂^(j), γ̂^(j)), to obtain estimates θ̂^j of θ^j.
(2) Estimate the scalars a_j using the quantity â_j := (Σ̂_jj − Σ̂_{j,−j} θ̂^j)^{-1}, based on the estimator Σ̂. Form Θ̃ with Θ̃_{j,−j} = −â_j θ̂^j and Θ̃_jj = â_j.

(3) Set Θ̂ = arg min_{Θ ∈ S^p} ||Θ − Θ̃||_1, where S^p is the set of symmetric p × p matrices.

Note that the minimization in step (3) is a linear program, so is easily solved with standard methods. We have the following corollary about Θ̂:

COROLLARY 5. Suppose the columns of the matrix Θ are k-sparse, and suppose the condition number κ(Σ) is nonzero and finite. Suppose we have the deviation bounds

(3.12)  ||γ̂^(j) − Γ̂^(j) θ^j||_∞ ≤ φ(Q, σ_ε) √(log p / n)  for all j,

and suppose we have the following additional deviation condition on Σ̂:

(3.13)  ||Σ̂ − Σ||_max ≤ c φ(Q, σ_ε) √(log p / n).

Finally, suppose the lower-RE condition holds uniformly over the matrices Γ̂^(j) with the scaling (3.3). Then under the estimation procedure of Algorithm 3.1, there exists a universal constant c0 such that

||Θ̂ − Θ||_op ≤ c0 κ²(Σ) [φ(Q, σ_ε)/λ_min(Σ)] (1/λ_min(Σ) + φ(Q, σ_ε)/α1) k √(log p / n).

REMARKS. Note that Corollary 5 is again a deterministic result, with parallel structure to Theorem 1. Furthermore, the deviation bounds (3.12) and (3.13) hold for all scenarios considered in Section 3.2 above, using Corollaries 1–4 for the first two inequalities, and a similar bounding technique for ||Σ̂ − Σ||_max; and the lower-RE condition holds over all matrices Γ̂^(j) by the same technique used to establish the lower-RE condition for Γ̂. The uniformity of the lower-RE bound over all sub-matrices holds because

0 < λ_min(Σ) ≤ λ_min(Σ_{−j,−j}) ≤ λ_max(Σ_{−j,−j}) ≤ λ_max(Σ) < ∞.

Hence, the error bound in Corollary 5 holds with probability at least 1 − c1 exp(−c2 log p) when n ≳ k log p, for the appropriate values of φ and α1.

4. Simulations. In this section, we report some additional simulation results to confirm that the scalings predicted by our theory are sharp. In Figure 1 following Theorem 1, we showed that the error curves align when plotted against a suitably rescaled sample size, in the case of additive noise perturbations.
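The nodewise procedure of Algorithm 3.1 above can be sketched numerically for the additive-noise case. In this sketch, an ISTA-style composite gradient loop stands in for the program (2.7), symmetric averaging is used as one minimizer of the linear program in step (3), and all problem sizes, the step size, and the regularization level are illustrative assumptions rather than the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 30, 2000
sigma_w = 0.1  # known additive-noise standard deviation (illustrative)

# Chain-structured precision matrix: diagonal 1, adjacent links 0.1.
Theta = np.eye(p)
for i in range(p - 1):
    Theta[i, i + 1] = Theta[i + 1, i] = 0.1
Sigma = np.linalg.inv(Theta)

# Rows of X are N(0, Sigma); we observe Z = X + W.
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Z = X + sigma_w * rng.standard_normal((n, p))

# Noise-corrected covariance estimate.
Sigma_hat = Z.T @ Z / n - sigma_w ** 2 * np.eye(p)

def composite_grad(Gamma, gamma, lam, step=0.1, iters=500):
    """ISTA on 0.5*t'Gamma t - gamma't + lam*||t||_1 (stand-in for (2.7))."""
    t = np.zeros(len(gamma))
    for _ in range(iters):
        g = t - step * (Gamma @ t - gamma)
        t = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)
    return t

lam = np.sqrt(np.log(p) / n)
Theta_tilde = np.zeros((p, p))
for j in range(p):
    idx = np.arange(p) != j
    Gamma_j = Sigma_hat[np.ix_(idx, idx)]   # Gamma-hat^(j)
    gamma_j = Z[:, idx].T @ Z[:, j] / n     # gamma-hat^(j)
    theta_j = composite_grad(Gamma_j, gamma_j, lam)  # step (1)
    a_j = 1.0 / (Sigma_hat[j, j] - Sigma_hat[j, idx] @ theta_j)
    Theta_tilde[j, j] = a_j                 # step (2)
    Theta_tilde[j, idx] = -a_j * theta_j

# Step (3): symmetric averaging is one minimizer of the elementwise
# l1 projection onto symmetric matrices.
Theta_hat = 0.5 * (Theta_tilde + Theta_tilde.T)
```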
Panel (a) of Figure 3 shows these same types of rescaled curves for the case of missing data, with sparsity k ≈ √p, covariate matrix Σ_x = I, and missing fraction ρ = 0.2, whereas panel (b) shows the rescaled plots for the vector autoregressive case with additive
noise perturbations, using a driving matrix A with ||A||_op = 0.2. Each point corresponds to an average over 100 trials. Once again, we see excellent agreement with the scaling law provided by Theorem 1.

FIG. 3. Plots of the error ||β̂ − β*||_2 after running projected gradient descent on the nonconvex objective, with sparsity k ≈ √p. In all cases, we plotted the error versus the rescaled sample size n/(k log p). As predicted by Theorems 1 and 2, the curves align for different values of p when plotted in this rescaled manner. (a) Missing data case with i.i.d. covariates. (b) Vector autoregressive data with additive noise. Each point represents an average over 100 trials.

We also ran simulations to verify the form of the function φ(Q, σ_ε) appearing in Corollaries 1 and 2. In the additive noise setting for i.i.d. data, we set Σ_x = I and ε equal to i.i.d. Gaussian noise with σ_ε = 0.5. For a fixed value of the parameters p = 256 and k ∝ log p, we ran the projected gradient descent algorithm for different values of σ_w ∈ (0.1, 0.3), such that Σ_w = σ_w² I and n = 60 (1 + σ_w²)² k log p, with ||β*||_2 = 1. According to the theory, φ(Q, σ_ε)/α1 ≲ (σ_w + 0.5) √(1 + σ_w²), so that

||β̂ − β*||_2 ≲ (σ_w + 0.5) √(1 + σ_w²) √(k log p / n) ∝ (σ_w + 0.5) / √(1 + σ_w²).

In order to verify this theoretical prediction, we plotted σ_w versus the rescaled error [√(1 + σ_w²)/(σ_w + 0.5)] ||β̂ − β*||_2. As shown by Figure 4(a), the curve is roughly constant, as predicted by the theory.

Similarly, in the missing data setting for i.i.d. data, we set Σ_x = I and ε equal to i.i.d. Gaussian noise with σ_ε = 0.5. For a fixed value of the parameters p = 128 and k ∝ log p, we ran simulations for different values of the missing data parameter ρ ∈ (0, 0.3), such that n = 60 k log p / (1 − ρ)⁴. According to the theory, φ(Q, σ_ε)/α1 ≲ σ_ε/(1 − ρ) + 1/(1 − ρ)². Consequently, with our specified scalings of (n, p, k), we should expect a
FIG. 4. (a) Plot of the rescaled ℓ2-error [√(1 + σ_w²)/(σ_w + 0.5)] ||β̂ − β*||_2 versus the additive noise standard deviation σ_w, for the i.i.d. model with additive noise. (b) Plot of the rescaled ℓ2-error ||β̂ − β*||_2 / (1 + 0.5(1 − ρ)) versus the missing fraction ρ, for the i.i.d. model with missing data. Both curves are roughly constant, showing that our error bounds on ||β̂ − β*||_2 exhibit the proper scaling. Each point represents an average over 200 trials.

bound of the form

||β̂ − β*||_2 ≲ [φ(Q, σ_ε)/α1] √(k log p / n) ∝ 1 + 0.5(1 − ρ).

The plot of ρ versus the rescaled error ||β̂ − β*||_2 / (1 + 0.5(1 − ρ)) is shown in Figure 4(b). The curve is again roughly constant, agreeing with the theoretical results.

Finally, we studied the behavior of the inverse covariance matrix estimation algorithm on three types of Gaussian graphical models:

(a) Chain-structured graphs. In this case, all nodes of the graph are arranged in a linear chain. Hence, each node (except the two end nodes) has degree k = 2. The diagonal entries of Θ are set equal to 1, and all entries corresponding to links in the chain are set equal to 0.1. Then Θ is rescaled so ||Θ||_op = 1.

(b) Star-structured graphs. In this case, all nodes are connected to a central node, which has degree k ≈ 0.1p. All other nodes have degree 1. The diagonal entries of Θ are set equal to 1, and all entries corresponding to edges in the graph are set equal to 0.1. Then Θ is rescaled so ||Θ||_op = 1.

(c) Erdős–Rényi graphs. This example comes from Rothman et al. [16]. For a sparsity parameter k ≈ log p, we randomly generate the matrix Θ by first generating the matrix B such that the diagonal entries are 0, and all other entries are independently equal to 0.5 with probability k/p, and 0 otherwise. Then δ is chosen so that Θ = B + δI has condition number p. Finally, Θ is rescaled so ||Θ||_op = 1.
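The three precision-matrix ensembles above can be generated as in the following sketch; the graph sizes are illustrative, and the closed-form choice of δ in the Erdős–Rényi construction is our own reading of the condition-number requirement:

```python
import numpy as np

rng = np.random.default_rng(4)

def chain_precision(p):
    """Chain graph: diagonal 1, adjacent links 0.1, rescaled to ||.||_op = 1."""
    T = np.eye(p)
    for i in range(p - 1):
        T[i, i + 1] = T[i + 1, i] = 0.1
    return T / np.linalg.norm(T, 2)

def star_precision(p):
    """Star graph: node 0 is the hub, linked to ~0.1p nodes with weight 0.1."""
    T = np.eye(p)
    for i in range(1, max(1, int(0.1 * p)) + 1):
        T[0, i] = T[i, 0] = 0.1
    return T / np.linalg.norm(T, 2)

def erdos_renyi_precision(p, k):
    """Rothman et al. construction: Theta = B + delta*I with cond. number p."""
    upper = np.triu(np.where(rng.random((p, p)) < k / p, 0.5, 0.0), 1)
    B = upper + upper.T  # symmetric, zero diagonal
    ev = np.linalg.eigvalsh(B)
    # Solve (ev_max + delta) / (ev_min + delta) = p for delta; the resulting
    # matrix is positive definite since ev_min + delta = (ev_max - ev_min)/(p-1).
    delta = (ev.max() - p * ev.min()) / (p - 1)
    return (B + delta * np.eye(p)) / np.linalg.norm(B + delta * np.eye(p), 2)
```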
After generating the matrix X of i.i.d. samples from the appropriate graphical model, with covariance matrix Σ_x = Θ^{-1}, we generated the corrupted matrix Z = X + W with Σ_w = (0.2)² I in the additive noise case, or the missing-data matrix Z with ρ = 0.2 in the missing data case. Panels (a) and (c) in Figure 5 show the rescaled ℓ2-error (1/√k) ||Θ̂ − Θ||_op plotted against the sample size n, for a chain-structured graph. In panels (b) and (d), we have the ℓ2-error plotted against the rescaled sample size n/(k log p). Once again, we see good agreement with the theoretical predictions. We have obtained qualitatively similar results for the star and Erdős–Rényi graphs.

FIG. 5. (a) Plots of the error ||Θ̂ − Θ||_op after running projected gradient descent on the nonconvex objective, for a chain-structured Gaussian graphical model with additive noise. As predicted by Theorems 1 and 2, all curves align when the error is rescaled by 1/√k and plotted against the ratio n/(k log p), as shown in (b). Plots (c) and (d) show the results of simulations on missing-data sets. Each point represents the average over 50 trials.
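A minimal sketch of the projected gradient descent routine used throughout these experiments, applied to the nonconvex quadratic objective over an ℓ1-ball as in program (2.4); the sort-based ℓ1 projection, all problem sizes, the step size, and the oracle choice of radius ||β*||_1 are illustrative assumptions rather than the paper's exact settings:

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection onto {x : ||x||_1 <= radius} (sort-based)."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def projected_gradient(Gamma, gamma, radius, step=0.05, iters=1000):
    """Projected gradient descent on the possibly nonconvex quadratic
    0.5*b'Gamma b - gamma'b over an l1-ball, as in program (2.4)."""
    b = np.zeros(len(gamma))
    for _ in range(iters):
        b = project_l1_ball(b - step * (Gamma @ b - gamma), radius)
    return b

# Illustrative run on the additive-noise surrogates.
rng = np.random.default_rng(5)
n, p, k = 800, 64, 4
beta_star = np.zeros(p)
beta_star[:k] = 0.5
X = rng.standard_normal((n, p))
Z = X + 0.2 * rng.standard_normal((n, p))  # sigma_w = 0.2
y = X @ beta_star + 0.25 * rng.standard_normal(n)
Gamma = Z.T @ Z / n - 0.2 ** 2 * np.eye(p)
gamma = Z.T @ y / n
beta_hat = projected_gradient(Gamma, gamma, radius=np.abs(beta_star).sum())
```

Even though Γ̂ may be indefinite, the iterates stay inside the ℓ1-ball, which is the mechanism behind the geometric convergence guaranteed by Theorem 2.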
More informationBASIC STATISTICS. f(x 1,x 2,..., x n )=f(x 1 )f(x 2 ) f(x n )= f(x i ) (1)
BASIC STATISTICS. SAMPLES, RANDOM SAMPLING AND SAMPLE STATISTICS.. Radom Sample. The radom variables X,X 2,..., X are called a radom sample of size from the populatio f(x if X,X 2,..., X are mutually idepedet
More informationPSYCHOLOGICAL STATISTICS
UNIVERSITY OF CALICUT SCHOOL OF DISTANCE EDUCATION B Sc. Cousellig Psychology (0 Adm.) IV SEMESTER COMPLEMENTARY COURSE PSYCHOLOGICAL STATISTICS QUESTION BANK. Iferetial statistics is the brach of statistics
More informationTHE ABRACADABRA PROBLEM
THE ABRACADABRA PROBLEM FRANCESCO CARAVENNA Abstract. We preset a detailed solutio of Exercise E0.6 i [Wil9]: i a radom sequece of letters, draw idepedetly ad uiformly from the Eglish alphabet, the expected
More informationTrigonometric Form of a Complex Number. The Complex Plane. axis. ( 2, 1) or 2 i FIGURE 6.44. The absolute value of the complex number z a bi is
0_0605.qxd /5/05 0:45 AM Page 470 470 Chapter 6 Additioal Topics i Trigoometry 6.5 Trigoometric Form of a Complex Number What you should lear Plot complex umbers i the complex plae ad fid absolute values
More information1. C. The formula for the confidence interval for a population mean is: x t, which was
s 1. C. The formula for the cofidece iterval for a populatio mea is: x t, which was based o the sample Mea. So, x is guarateed to be i the iterval you form.. D. Use the rule : p-value
More informationUC Berkeley Department of Electrical Engineering and Computer Science. EE 126: Probablity and Random Processes. Solutions 9 Spring 2006
Exam format UC Bereley Departmet of Electrical Egieerig ad Computer Sciece EE 6: Probablity ad Radom Processes Solutios 9 Sprig 006 The secod midterm will be held o Wedesday May 7; CHECK the fial exam
More informationWHEN IS THE (CO)SINE OF A RATIONAL ANGLE EQUAL TO A RATIONAL NUMBER?
WHEN IS THE (CO)SINE OF A RATIONAL ANGLE EQUAL TO A RATIONAL NUMBER? JÖRG JAHNEL 1. My Motivatio Some Sort of a Itroductio Last term I tought Topological Groups at the Göttige Georg August Uiversity. This
More informationLesson 17 Pearson s Correlation Coefficient
Outlie Measures of Relatioships Pearso s Correlatio Coefficiet (r) -types of data -scatter plots -measure of directio -measure of stregth Computatio -covariatio of X ad Y -uique variatio i X ad Y -measurig
More informationAnalyzing Longitudinal Data from Complex Surveys Using SUDAAN
Aalyzig Logitudial Data from Complex Surveys Usig SUDAAN Darryl Creel Statistics ad Epidemiology, RTI Iteratioal, 312 Trotter Farm Drive, Rockville, MD, 20850 Abstract SUDAAN: Software for the Statistical
More informationLECTURE 13: Cross-validation
LECTURE 3: Cross-validatio Resampli methods Cross Validatio Bootstrap Bias ad variace estimatio with the Bootstrap Three-way data partitioi Itroductio to Patter Aalysis Ricardo Gutierrez-Osua Texas A&M
More informationThe following example will help us understand The Sampling Distribution of the Mean. C1 C2 C3 C4 C5 50 miles 84 miles 38 miles 120 miles 48 miles
The followig eample will help us uderstad The Samplig Distributio of the Mea Review: The populatio is the etire collectio of all idividuals or objects of iterest The sample is the portio of the populatio
More informationVladimir N. Burkov, Dmitri A. Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT
Keywords: project maagemet, resource allocatio, etwork plaig Vladimir N Burkov, Dmitri A Novikov MODELS AND METHODS OF MULTIPROJECTS MANAGEMENT The paper deals with the problems of resource allocatio betwee
More informationTHE HEIGHT OF q-binary SEARCH TREES
THE HEIGHT OF q-binary SEARCH TREES MICHAEL DRMOTA AND HELMUT PRODINGER Abstract. q biary search trees are obtaied from words, equipped with the geometric distributio istead of permutatios. The average
More informationA Recursive Formula for Moments of a Binomial Distribution
A Recursive Formula for Momets of a Biomial Distributio Árpád Béyi beyi@mathumassedu, Uiversity of Massachusetts, Amherst, MA 01003 ad Saverio M Maago smmaago@psavymil Naval Postgraduate School, Moterey,
More informationAnnuities Under Random Rates of Interest II By Abraham Zaks. Technion I.I.T. Haifa ISRAEL and Haifa University Haifa ISRAEL.
Auities Uder Radom Rates of Iterest II By Abraham Zas Techio I.I.T. Haifa ISRAEL ad Haifa Uiversity Haifa ISRAEL Departmet of Mathematics, Techio - Israel Istitute of Techology, 3000, Haifa, Israel I memory
More informationLecture 2: Karger s Min Cut Algorithm
priceto uiv. F 3 cos 5: Advaced Algorithm Desig Lecture : Karger s Mi Cut Algorithm Lecturer: Sajeev Arora Scribe:Sajeev Today s topic is simple but gorgeous: Karger s mi cut algorithm ad its extesio.
More informationChapter 7 - Sampling Distributions. 1 Introduction. What is statistics? It consist of three major areas:
Chapter 7 - Samplig Distributios 1 Itroductio What is statistics? It cosist of three major areas: Data Collectio: samplig plas ad experimetal desigs Descriptive Statistics: umerical ad graphical summaries
More informationDiscrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13
EECS 70 Discrete Mathematics ad Probability Theory Sprig 2014 Aat Sahai Note 13 Itroductio At this poit, we have see eough examples that it is worth just takig stock of our model of probability ad may
More information.04. This means $1000 is multiplied by 1.02 five times, once for each of the remaining sixmonth
Questio 1: What is a ordiary auity? Let s look at a ordiary auity that is certai ad simple. By this, we mea a auity over a fixed term whose paymet period matches the iterest coversio period. Additioally,
More informationSoving Recurrence Relations
Sovig Recurrece Relatios Part 1. Homogeeous liear 2d degree relatios with costat coefficiets. Cosider the recurrece relatio ( ) T () + at ( 1) + bt ( 2) = 0 This is called a homogeeous liear 2d degree
More informationThe Stable Marriage Problem
The Stable Marriage Problem William Hut Lae Departmet of Computer Sciece ad Electrical Egieerig, West Virgiia Uiversity, Morgatow, WV William.Hut@mail.wvu.edu 1 Itroductio Imagie you are a matchmaker,
More informationCOMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S 2 CONTROL CHART FOR THE CHANGES IN A PROCESS
COMPARISON OF THE EFFICIENCY OF S-CONTROL CHART AND EWMA-S CONTROL CHART FOR THE CHANGES IN A PROCESS Supraee Lisawadi Departmet of Mathematics ad Statistics, Faculty of Sciece ad Techoology, Thammasat
More informationA Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design
A Combied Cotiuous/Biary Geetic Algorithm for Microstrip Atea Desig Rady L. Haupt The Pesylvaia State Uiversity Applied Research Laboratory P. O. Box 30 State College, PA 16804-0030 haupt@ieee.org Abstract:
More informationTHE problem of fitting a circle to a collection of points
IEEE TRANACTION ON INTRUMENTATION AND MEAUREMENT, VOL. XX, NO. Y, MONTH 000 A Few Methods for Fittig Circles to Data Dale Umbach, Kerry N. Joes Abstract Five methods are discussed to fit circles to data.
More informationPROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM
PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY Physical ad Mathematical Scieces 2015, 1, p. 15 19 M a t h e m a t i c s AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM A. G. GULYAN Chair of Actuarial Mathematics
More informationCHAPTER 3 DIGITAL CODING OF SIGNALS
CHAPTER 3 DIGITAL CODING OF SIGNALS Computers are ofte used to automate the recordig of measuremets. The trasducers ad sigal coditioig circuits produce a voltage sigal that is proportioal to a quatity
More information3. Greatest Common Divisor - Least Common Multiple
3 Greatest Commo Divisor - Least Commo Multiple Defiitio 31: The greatest commo divisor of two atural umbers a ad b is the largest atural umber c which divides both a ad b We deote the greatest commo gcd
More informationChapter 7: Confidence Interval and Sample Size
Chapter 7: Cofidece Iterval ad Sample Size Learig Objectives Upo successful completio of Chapter 7, you will be able to: Fid the cofidece iterval for the mea, proportio, ad variace. Determie the miimum
More informationhp calculators HP 12C Statistics - average and standard deviation Average and standard deviation concepts HP12C average and standard deviation
HP 1C Statistics - average ad stadard deviatio Average ad stadard deviatio cocepts HP1C average ad stadard deviatio Practice calculatig averages ad stadard deviatios with oe or two variables HP 1C Statistics
More informationDAME - Microsoft Excel add-in for solving multicriteria decision problems with scenarios Radomir Perzina 1, Jaroslav Ramik 2
Itroductio DAME - Microsoft Excel add-i for solvig multicriteria decisio problems with scearios Radomir Perzia, Jaroslav Ramik 2 Abstract. The mai goal of every ecoomic aget is to make a good decisio,
More informationProject Deliverables. CS 361, Lecture 28. Outline. Project Deliverables. Administrative. Project Comments
Project Deliverables CS 361, Lecture 28 Jared Saia Uiversity of New Mexico Each Group should tur i oe group project cosistig of: About 6-12 pages of text (ca be loger with appedix) 6-12 figures (please
More informationPresent Values, Investment Returns and Discount Rates
Preset Values, Ivestmet Returs ad Discout Rates Dimitry Midli, ASA, MAAA, PhD Presidet CDI Advisors LLC dmidli@cdiadvisors.com May 2, 203 Copyright 20, CDI Advisors LLC The cocept of preset value lies
More informationMeasures of Spread and Boxplots Discrete Math, Section 9.4
Measures of Spread ad Boxplots Discrete Math, Sectio 9.4 We start with a example: Example 1: Comparig Mea ad Media Compute the mea ad media of each data set: S 1 = {4, 6, 8, 10, 1, 14, 16} S = {4, 7, 9,
More informationTIGHT BOUNDS ON EXPECTED ORDER STATISTICS
Probability i the Egieerig ad Iformatioal Scieces, 20, 2006, 667 686+ Prited i the U+S+A+ TIGHT BOUNDS ON EXPECTED ORDER STATISTICS DIMITRIS BERTSIMAS Sloa School of Maagemet ad Operatios Research Ceter
More informationAn Efficient Polynomial Approximation of the Normal Distribution Function & Its Inverse Function
A Efficiet Polyomial Approximatio of the Normal Distributio Fuctio & Its Iverse Fuctio Wisto A. Richards, 1 Robi Atoie, * 1 Asho Sahai, ad 3 M. Raghuadh Acharya 1 Departmet of Mathematics & Computer Sciece;
More informationChair for Network Architectures and Services Institute of Informatics TU München Prof. Carle. Network Security. Chapter 2 Basics
Chair for Network Architectures ad Services Istitute of Iformatics TU Müche Prof. Carle Network Security Chapter 2 Basics 2.4 Radom Number Geeratio for Cryptographic Protocols Motivatio It is crucial to
More informationA gentle introduction to Expectation Maximization
A getle itroductio to Expectatio Maximizatio Mark Johso Brow Uiversity November 2009 1 / 15 Outlie What is Expectatio Maximizatio? Mixture models ad clusterig EM for setece topic modelig 2 / 15 Why Expectatio
More informationINVESTMENT PERFORMANCE COUNCIL (IPC)
INVESTMENT PEFOMANCE COUNCIL (IPC) INVITATION TO COMMENT: Global Ivestmet Performace Stadards (GIPS ) Guidace Statemet o Calculatio Methodology The Associatio for Ivestmet Maagemet ad esearch (AIM) seeks
More informationA Mathematical Perspective on Gambling
A Mathematical Perspective o Gamblig Molly Maxwell Abstract. This paper presets some basic topics i probability ad statistics, icludig sample spaces, probabilistic evets, expectatios, the biomial ad ormal
More informationTHIN SEQUENCES AND THE GRAM MATRIX PAMELA GORKIN, JOHN E. MCCARTHY, SANDRA POTT, AND BRETT D. WICK
THIN SEQUENCES AND THE GRAM MATRIX PAMELA GORKIN, JOHN E MCCARTHY, SANDRA POTT, AND BRETT D WICK Abstract We provide a ew proof of Volberg s Theorem characterizig thi iterpolatig sequeces as those for
More information, a Wishart distribution with n -1 degrees of freedom and scale matrix.
UMEÅ UNIVERSITET Matematisk-statistiska istitutioe Multivariat dataaalys D MSTD79 PA TENTAMEN 004-0-9 LÖSNINGSFÖRSLAG TILL TENTAMEN I MATEMATISK STATISTIK Multivariat dataaalys D, 5 poäg.. Assume that
More informationEkkehart Schlicht: Economic Surplus and Derived Demand
Ekkehart Schlicht: Ecoomic Surplus ad Derived Demad Muich Discussio Paper No. 2006-17 Departmet of Ecoomics Uiversity of Muich Volkswirtschaftliche Fakultät Ludwig-Maximilias-Uiversität Müche Olie at http://epub.ub.ui-mueche.de/940/
More informationCoordinating Principal Component Analyzers
Coordiatig Pricipal Compoet Aalyzers J.J. Verbeek ad N. Vlassis ad B. Kröse Iformatics Istitute, Uiversity of Amsterdam Kruislaa 403, 1098 SJ Amsterdam, The Netherlads Abstract. Mixtures of Pricipal Compoet
More information