SUPPORT UNION RECOVERY IN HIGH-DIMENSIONAL MULTIVARIATE REGRESSION


The Annals of Statistics
2011, Vol. 39, No. 1, 1–47
DOI: 10.1214/09-AOS776
© Institute of Mathematical Statistics, 2011

BY GUILLAUME OBOZINSKI, MARTIN J. WAINWRIGHT AND MICHAEL I. JORDAN

INRIA Willow Project-Team, Laboratoire d'Informatique de l'École Normale Supérieure; University of California, Berkeley; and University of California, Berkeley

In multivariate regression, a K-dimensional response vector is regressed upon a common set of p covariates, with a matrix B* ∈ R^{p×K} of regression coefficients. We study the behavior of the multivariate group Lasso, in which block regularization based on the l1/l2 norm is used for support union recovery, or recovery of the set of s rows for which B* is nonzero. Under high-dimensional scaling, we show that the multivariate group Lasso exhibits a threshold for the recovery of the exact row pattern with high probability over the random design and noise that is specified by the sample complexity parameter θ(n, p, s) := n/[2ψ(B*) log(p − s)]. Here n is the sample size, and ψ(B*) is a sparsity-overlap function measuring a combination of the sparsities and overlaps of the K regression coefficient vectors that constitute the model. We prove that the multivariate group Lasso succeeds for problem sequences (n, p, s) such that θ(n, p, s) exceeds a critical level θ_u, and fails for sequences such that θ(n, p, s) lies below a critical level θ_l. For the special case of the standard Gaussian ensemble, we show that θ_l = θ_u, so that the characterization is sharp. The sparsity-overlap function ψ(B*) reveals that, if the design is uncorrelated on the active rows, l1/l2 regularization for multivariate regression never harms performance relative to an ordinary Lasso approach and can yield substantial improvements in sample complexity (up to a factor of K) when the coefficient vectors are suitably orthogonal. For more general designs, it is possible for the ordinary Lasso to outperform the multivariate group Lasso.
We complement our analysis with simulations that demonstrate the sharpness of our theoretical results, even for relatively small problems.

1. Introduction. The development of efficient algorithms for estimation of large-scale models has been a major goal of statistical learning research in the last decade. There is now a substantial body of work based on l1-regularization dating back to the seminal work of Tibshirani (1996) and Donoho and collaborators [Chen, Donoho and Saunders (1998); Donoho and Huo (2001)]. The bulk of

Received June 2009; revised November. Supported in part by NSF Grants DMS and CCF to MJW, and by NSF Grant and DARPA IPTO Contract FA to MIJ.
AMS 2000 subject classifications. Primary 62J07; secondary 62F07.
Key words and phrases. Lasso, block-norm, second-order cone program, sparsity, variable selection, multivariate regression, high-dimensional scaling, simultaneous Lasso, group Lasso.
this work has focused on the standard problem of linear regression, in which one makes observations of the form

(1) y = Xβ* + w,

where y ∈ R^n is a real-valued vector of observations, w ∈ R^n is an additive zero-mean noise vector and X ∈ R^{n×p} is the design matrix. A subset of the components of the unknown parameter vector β* ∈ R^p are assumed nonzero; the goal is to identify these coefficients and (possibly) estimate their values. This goal can be formulated in terms of the solution of the penalized optimization problem

(2) argmin_{β ∈ R^p} { (1/(2n)) ‖y − Xβ‖²₂ + λ_n ‖β‖₀ },

where ‖β‖₀ counts the number of nonzero components in β and where λ_n > 0 is a regularization parameter. Unfortunately, this optimization problem is computationally intractable, a fact which has led various authors to consider the convex relaxation [Tibshirani (1996); Chen, Donoho and Saunders (1998)]

(3) argmin_{β ∈ R^p} { (1/(2n)) ‖y − Xβ‖²₂ + λ_n ‖β‖₁ },

in which ‖β‖₀ is replaced with the l1 norm ‖β‖₁. This relaxation, often referred to as the Lasso [Tibshirani (1996)], is a quadratic program, and can be solved efficiently by various methods [e.g., Boyd and Vandenberghe (2004); Osborne, Presnell and Turlach (2000); Efron et al. (2004)]. A variety of theoretical results are now in place for the Lasso, both in the traditional setting where the sample size n tends to infinity with the problem size p fixed [Knight and Fu (2000)], as well as under high-dimensional scaling, in which p and n tend to infinity simultaneously, thereby allowing p to be comparable to or even larger than n [e.g., Meinshausen and Bühlmann (2006); Wainwright (2009b); Meinshausen and Yu (2009); Bickel, Ritov and Tsybakov (2009)]. In many applications, it is natural to impose sparsity constraints on the regression vector β*, and a variety of such constraints have been considered. For example, one can consider a hard sparsity model in which β* is assumed to contain at most s nonzero entries, or a soft sparsity model in which β* is assumed to belong to an lq ball with q < 1. Analyses also differ in terms of the loss functions that are considered.
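As an illustrative sketch (not the authors' algorithm), the Lasso program (3) can be solved by the standard proximal-gradient method ISTA; the step size below is the usual choice 1/L with L the largest eigenvalue of X^T X/n, and the iteration count is an illustrative parameter:

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: sign(v) * max(|v| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize (1/(2n))||y - X b||_2^2 + lam * ||b||_1 by proximal gradient (ISTA)."""
    n, p = X.shape
    # Step size 1/L, where L is the Lipschitz constant of the smooth part's gradient.
    L = np.linalg.eigvalsh(X.T @ X / n).max()
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta
```

For the trivial identity design X = I_n (so p = n), the minimizer is soft-thresholding of y at level nλ, which gives a quick sanity check of the implementation.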
For the model or variable selection problem, it is natural to consider the zero–one loss associated with the problem of recovering the unknown support set of β*. Alternatively, one can view the Lasso as a shrinkage estimator to be compared to traditional least squares or ridge regression; in this case, it is natural to study the l2 loss ‖β̂ − β*‖₂ between the estimate β̂ and the ground truth. In other settings, the prediction error E[(Y − X^T β̂)²] may be of primary interest, and one tries to show risk consistency (namely, that the estimated model predicts as well as the best sparse model, whether or not the true model is sparse).
A number of alternatives to the Lasso have been explored in recent years, and in some cases stronger theoretical results have been obtained [Fan and Li (2001); Frank and Friedman (1993); Huang, Horowitz and Ma (2008)]. However, the resulting optimization problems are generally nonconvex and thus difficult to solve in practice. The Lasso remains a focus of attention due to its combination of favorable statistical and computational properties.

1.1. Block-structured regularization. While the assumption of sparsity at the level of individual coefficients is one way to give meaning to high-dimensional (p ≫ n) regression, there are other structural assumptions that are natural in regression, and which may provide additional leverage. For instance, in a hierarchical regression model, groups of regression coefficients may be required to be zero or nonzero in a blockwise manner; for example, one might wish to include a particular covariate and all powers of that covariate as a group [Yuan and Lin (2006); Zhao, Rocha and Yu (2009)]. Another example arises when we consider variable selection in the setting of multivariate regression: multiple regressions can be related by a (partially) shared sparsity pattern, such as when there are an underlying set of covariates that are relevant across regressions [Obozinski, Taskar and Jordan (2010); Argyriou, Evgeniou and Pontil (2006); Turlach, Venables and Wright (2005); Zhang et al. (2008)]. Based on such motivations, a recent line of research [Bach, Lanckriet and Jordan (2004); Tropp (2006); Yuan and Lin (2006); Zhao, Rocha and Yu (2009); Obozinski, Taskar and Jordan (2010); Ravikumar et al. (2009)] has studied the use of block-regularization schemes, in which the l1 norm is composed with some other lq norm (q > 1), thereby obtaining the l1/lq norm defined as a sum of lq norms over groups of regression coefficients. The best known examples of such block norms are the l1/l∞ norm [Turlach, Venables and Wright (2005); Zhang et al. (2008)] and the l1/l2 norm [Obozinski, Taskar and Jordan (2010)].
In this paper, we investigate the use of l1/l2 block-regularization in the context of high-dimensional multivariate linear regression, in which a collection of K scalar outputs are regressed on the same design matrix X ∈ R^{n×p}. Representing the regression coefficients as a p × K matrix B*, the multivariate regression model takes the form

(4) Y = XB* + W,

where Y ∈ R^{n×K} and W ∈ R^{n×K} are matrices of observations and zero-mean noise, respectively. In addition, we assume a hard-sparsity model for the regression coefficients in which column k of the coefficient matrix B* has nonzero entries on a subset

(5) S_k := {i ∈ {1,...,p} | β*_{ik} ≠ 0}

of size s_k := |S_k|. In many applications it is natural to expect that the supports S_k should overlap. In that case, instead of estimating the support of each regression
separately, it might be beneficial to first estimate the set of variables which are relevant to any of the multivariate responses and to estimate only subsequently the individual supports within that set. Thus we focus on the problem of recovering the union of the supports, namely the set S := ∪_{k=1}^K S_k, corresponding to the subset of indices in {1,...,p} that are involved in at least one regression. We consider a range of problems in which variables can be relevant to all, some, only one or none of the regressions, and we investigate if and how the overlap of the individual supports and the relatedness of individual regressions benefit or hinder estimation of the support union. The support union problem can be understood as the generalization of the problem of variable selection to the group setting. Rather than selecting specific components of a coefficient vector, we aim to select specific rows of a coefficient matrix. We thus also refer to the support union problem as the row selection problem. We note that recovering S, although not equivalent to recovering each of the distinct individual supports S_k, addresses the essential difficulty in recovering those supports. Indeed, as we show in Section 2.2, given a method that returns the row support S with |S| ≪ p (with high probability), it is straightforward to recover the individual supports S_k by ordinary least-squares and thresholding. If computational complexity were not a concern, then the natural way to perform row selection for B* would be by solving the optimization problem

(6) argmin_{B ∈ R^{p×K}} { (1/(2n)) ‖Y − XB‖²_F + λ_n ‖B‖_{l0/lq} },

where B = (β_{ik})_{1≤i≤p, 1≤k≤K} is a p × K matrix, ‖·‖_F denotes the Frobenius norm,¹ and the norm ‖B‖_{l0/lq} counts the number of rows in B that have nonzero lq norm.
As before, the l0 component of this regularizer yields a nonconvex and computationally intractable problem, so that it is natural to consider the relaxation

(7) argmin_{B ∈ R^{p×K}} { (1/(2n)) ‖Y − XB‖²_F + λ_n ‖B‖_{l1/lq} },

where ‖B‖_{l1/lq} is the block l1/lq norm

(8) ‖B‖_{l1/lq} := Σ_{i=1}^p ( Σ_{j=1}^K |β_{ij}|^q )^{1/q} = Σ_{i=1}^p ‖β_i‖_q.

The relaxation (7) is a natural generalization of the Lasso; indeed, it specializes to the Lasso in the case K = 1. For later reference, we also note that setting q = 1 leads to the use of the l1/l1 block norm in the relaxation (7). Since this norm decouples across both the rows and columns, this particular choice is equivalent to solving K separate Lasso problems, one for each column of the p × K

¹The Frobenius norm of a matrix A is given by ‖A‖_F := (Σ_{i,j} A²_{ij})^{1/2}.
regression matrix B. A more interesting choice is q = 2, which yields a block l1/l2 norm that couples together the columns of B. Regularization with the l1/l2 norm is commonly referred to as the group Lasso in the setting of univariate regression [Yuan and Lin (2006)]. We thus refer to l1/l2 regularization in the multivariate setting as the multivariate group Lasso. Note that the multivariate group Lasso can be viewed as a special case of the group Lasso, in that it involves a specific grouping of regression coefficients, but the multivariate setting brings new statistical issues to the fore. As we discuss in Appendix B, the multivariate group Lasso can be cast as a second-order cone program (SOCP). This is a family of convex optimization problems that can be solved efficiently with interior point methods [Boyd and Vandenberghe (2004)] and includes quadratic programs as a particular case. Some recent work has addressed certain statistical aspects of block-regularization schemes. Meier, van de Geer and Bühlmann (2008) perform an analysis of risk consistency with block-norm regularization. Bach (2008) provides an analysis of blockwise support recovery for the kernelized group Lasso in the classical, fixed p setting. In the high-dimensional setting, Ravikumar et al. (2009) study the consistency of blockwise support recovery for the group Lasso for fixed design matrices, and their result is generalized by Liu and Zhang (2008) to blockwise support recovery in the setting of general l1/lq regularization, again for fixed design matrices. However, these analyses do not discriminate between various values of q, yielding the same qualitative results and the same convergence rates for q = 1 as for q > 1.
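The block norms defined in (8) are simple to compute directly; the small numpy illustration below (function name ours) sums the within-row lq norms, and shows how q = 1 decouples into the plain elementwise l1 norm while q = 2 couples the columns:

```python
import numpy as np

def block_l1_lq_norm(B, q):
    """Compute ||B||_{l1/lq}: the sum over rows of the lq norm of each row."""
    return np.sum(np.linalg.norm(B, ord=q, axis=1))

B = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])
# l1/l2: row norms are 5, 0, 1, so the block norm is 6.
# l1/l1: decouples into the plain l1 norm of all entries, here 8.
```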
Our focus, which is motivated by the empirical observation that the group Lasso and the multivariate group Lasso can outperform the ordinary Lasso [Bach (2008); Yuan and Lin (2006); Zhao, Rocha and Yu (2009); Obozinski, Taskar and Jordan (2010)], is precisely the distinction between q = 1 and q > 1 (specifically q = 2). We note that in concurrent work Negahban and Wainwright (2008) have studied the related problem of support recovery for the l1/l∞ relaxation. The distinction between q = 1 and q = 2 is also significant from an optimization-theoretic point of view. In particular, the SOCP relaxations underlying the multivariate group Lasso (q = 2) are generally tighter than the quadratic programming relaxation underlying the Lasso (q = 1); however, the improved accuracy is generally obtained at a higher computational cost [Boyd and Vandenberghe (2004)]. Thus we can view our problem as an instance of the general question of the relationship of statistical efficiency to computational efficiency: does the qualitatively greater amount of computational effort involved in solving the multivariate group Lasso always yield greater statistical efficiency? More specifically, can we give theoretical conditions under which solving the generalized Lasso problem (7) has greater statistical efficiency than naive strategies based on the ordinary Lasso? Conversely, can the multivariate group Lasso ever be worse than the ordinary Lasso?
With this motivation, this paper provides a detailed analysis of model selection consistency of the multivariate group Lasso (7) with l1/l2 regularization. Statistical efficiency is defined in terms of the scaling of the sample size n, as a function of the problem size p and of the sparsity structure of the regression matrix B*, required for consistent row selection. Our analysis is high-dimensional in nature, allowing both n and p to diverge, and yielding explicit error bounds as a function of n and p. As detailed below, our analysis provides affirmative answers to both of the questions above. First, we demonstrate that under certain structural assumptions on the design and regression matrix B*, the multivariate group Lasso is always guaranteed to outperform the ordinary Lasso, in that it correctly performs row selection for sample sizes for which the Lasso fails with high probability. Second, we also exhibit some problems (though arguably not generic) for which the multivariate group Lasso will be outperformed by the naive strategy of applying the Lasso separately to each of the K columns, and taking the union of supports.

1.2. Our results. The main contribution of this paper is to show that under certain technical conditions on the design and noise matrices, the model selection performance of block-regularized l1/l2 regression (7) is governed by the sample complexity function

(9) θ_{l1/l2}(n, p; B*) := n / [2ψ(B*) log(p − s)],

where n is the sample size, p is the ambient dimension, s = |S| is the number of rows that are nonzero and ψ(·) is a sparsity-overlap function. Our use of the term sample complexity for θ_{l1/l2} reflects the role it plays in our analysis as the rate at which the sample size must grow in order to obtain consistent row selection as a function of the problem parameters.
More precisely, for scalings (n, p, s, B*) such that θ_{l1/l2}(n, p; B*) exceeds a fixed critical threshold θ_u ∈ (0, +∞), we show that the probability of correct row selection by the l1/l2 multivariate group Lasso converges to one, and conversely, for scalings such that θ_{l1/l2}(n, p; B*) is below another threshold θ_l, we show that the multivariate group Lasso fails with high probability. Whereas the ratio (log p)/n is standard for the high-dimensional theory of l1-regularization, the function ψ(B*) is a novel and interesting quantity, one which measures both the sparsity of the matrix B* as well as the overlap between the different regressions, represented by the columns of B* [see equation (16) for the precise definition of ψ(B*)]. As a particular illustration, consider the special case of a univariate regression with K = 1, in which the convex program (7) reduces to the ordinary Lasso (3). In this case, if the design matrix is drawn from the standard Gaussian ensemble [i.e., X_ij ~ N(0, 1), i.i.d.], we show that the sparsity-overlap function reduces to ψ(B*) = s, corresponding to the support size of the single coefficient vector. We thus recover as a corollary a previously known result [Wainwright (2009b)]: namely, the Lasso succeeds in performing exact support recovery once the ratio n/[2s log(p − s)] exceeds a certain critical threshold. At
the other extreme, for a genuinely multivariate problem with K > 1 and s nonzero rows, again for a standard Gaussian design, when the regression matrix is suitably orthonormal relative to the design (see Section 2 for a precise definition), the sparsity-overlap function is given by ψ(B*) = s/K. In this case, l1/l2 block-regularization has sample complexity lower by a factor of K relative to the naive approach of solving K separate Lasso problems. Of course, there is also a range of behavior between these two extremes, in which the gain in sample complexity varies smoothly as a function of the sparsity-overlap ψ(B*) in the interval [s/K, s]. On the other hand, we also show that for suitably correlated designs, it is possible that the sample complexity ψ(B*) associated with l1/l2 block-regularization is larger than that of the ordinary Lasso (l1/l1) approach.

The remainder of the paper is organized as follows. In Section 2, we provide a precise statement of our main results (Theorems 1 and 2), discuss some of their consequences and illustrate the close agreement between our theoretical results and simulations. Sections 3 and 4 are devoted to the proofs of Theorems 1 and 2, respectively, with the arguments broken down into a series of steps. More technical results are deferred to the appendices. We conclude with a brief discussion in Section 5.

1.3. Notation. We collect here some notation used throughout the paper. For a (possibly random) matrix M ∈ R^{p×K}, we define the Frobenius norm ‖M‖_F := (Σ_{i,j} m²_{ij})^{1/2}, and for parameters 1 ≤ a ≤ b ≤ ∞, the l_a/l_b block norm is defined as follows:

(10) ‖M‖_{la/lb} := { Σ_{i=1}^p ( Σ_{k=1}^K |m_{ik}|^b )^{a/b} }^{1/a}.

These vector norms on matrices should be distinguished from the (a, b)-operator norms

(11) |||M|||_{a,b} := sup_{‖x‖_b = 1} ‖Mx‖_a

(although some norms belong to both families; see Lemma 7 in Appendix E). Important special cases of the latter include the spectral norm |||M|||_{2,2} (also denoted |||M|||₂), and the l∞ operator norm |||M|||_{∞,∞} = max_{i=1,...,p} Σ_{j=1}^K |M_{ij}|, denoted |||M|||_∞ for short.
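The factor-of-K comparison above can be made concrete with a short computation (Python; the numbers p, s, K are illustrative choices, not from the paper): at the success threshold θ = 1, the sample complexity (9) gives a required sample size n = 2ψ log(p − s), so ψ = s versus ψ = s/K translates directly into a factor-K saving:

```python
import math

def theta(n, p, s, psi):
    """Sample complexity parameter theta = n / (2 * psi * log(p - s)) from eq. (9)."""
    return n / (2.0 * psi * math.log(p - s))

# Illustrative numbers (ours): p = 1024, s = 32, K = 4.
# Univariate-style problem: psi = s; orthonormal multivariate case: psi = s / K.
n_lasso = 2 * 32 * math.log(1024 - 32)        # n at which theta = 1 with psi = s
n_group = 2 * (32 / 4) * math.log(1024 - 32)  # n at which theta = 1 with psi = s / K
# The group-Lasso sample size is a factor K = 4 smaller, matching the discussion above.
```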
In addition to the usual Landau notation O and o, we write a_n = Ω(b_n) for sequences such that b_n/a_n = o(1). We also use the notation a_n = Θ(b_n) if both a_n = O(b_n) and b_n = O(a_n) hold.

2. Main results and some consequences. The analysis of this paper considers the multivariate group Lasso estimator, obtained as a solution to the SOCP

(12) argmin_{B ∈ R^{p×K}} { (1/(2n)) ‖Y − XB‖²_F + λ_n ‖B‖_{l1/l2} }
for random ensembles of multivariate linear regression problems, each of the form (4), where the noise matrix W ∈ R^{n×K} is assumed to consist of i.i.d. elements W_ij ~ N(0, σ²). We consider random design matrices X with each row drawn in an i.i.d. manner from a zero-mean Gaussian N(0, Σ), where Σ ≻ 0 is a p × p covariance matrix. Although the block-regularized problem (12) need not have a unique solution in general, a consequence of our analysis is that in the regime of interest, the solution is unique, so that we may talk unambiguously about the estimated support Ŝ. The main object of study in this paper is the probability P[Ŝ = S], where the probability is taken both over the random choice of noise matrix W and random design matrix X. We study the behavior of this probability as elements of the triplet (n, p, s) tend to infinity.

2.1. Notation and assumptions. More precisely, our main result applies to sequences of models indexed by (n, p(n), s(n)), an associated sequence {Σ} of p × p covariance matrices and a sequence {B*} of coefficient matrices with row support

(13) S := {i | β*_i ≠ 0}

of size |S| = s = s(n). We use S^c to denote its complement (i.e., S^c := {1,...,p}\S). We let

(14) b*_min := min_{i∈S} ‖β*_i‖₂

correspond to the minimal l2 row-norm of the coefficient matrix B* over its nonzero rows. Given an observed pair (Y, X) from the model (4), the goal is to estimate the row support S of the matrix B*. We impose the following conditions on the covariance Σ of the design matrix:

(A1) Bounded eigenspectrum: There exist fixed constants C_min > 0 and C_max < +∞ such that all eigenvalues of the s × s matrix Σ_SS are contained in the interval [C_min, C_max].
(A2) Irrepresentable condition: There exists a fixed parameter γ ∈ (0, 1] such that |||Σ_{S^c S}(Σ_SS)^{-1}|||_∞ ≤ 1 − γ.
(A3) Self-incoherence: |||(Σ_SS)^{-1}|||_∞ ≤ D_max for some D_max < +∞.

The lower bound involving C_min in assumption (A1) prevents excess dependence among elements of the design matrix associated with the support S; conditions of this form are required for model selection consistency or l2-consistency of the Lasso.
The upper bound involving C_max in assumption (A1) is not needed for proving success but only failure of the multivariate group Lasso. The irrepresentable condition and self-incoherence assumptions are also well known from previous work on variable selection consistency of the Lasso [Meinshausen and Bühlmann (2006); Tropp (2006); Zhao and Yu (2006)]. Although such assumptions are not needed in analyzing l2- or risk consistency, they are known to be
necessary for variable selection consistency of the Lasso. Indeed, in the absence of such conditions, it is always possible to make the Lasso fail, even with an arbitrarily large sample size. [See, however, Meinshausen and Yu (2009) for methods that weaken the irrepresentable condition.] Note that these assumptions are trivially satisfied by the standard Gaussian ensemble Σ = I_{p×p}, with C_min = C_max = 1, D_max = 1 and γ = 1. More generally, it can be shown that various matrix classes (e.g., Toeplitz matrices, tree-structured covariance matrices, bounded off-diagonal matrices) satisfy these conditions [Meinshausen and Bühlmann (2006); Zhao and Yu (2006); Wainwright (2009b)].

We require a few pieces of notation before stating the main results. For an arbitrary matrix B_S ∈ R^{s×K} with ith row β_i ∈ R^{1×K}, we define the matrix ζ(B_S) ∈ R^{s×K} with ith row

(15) ζ(β_i) := β_i / ‖β_i‖₂

when β_i ≠ 0, and we set ζ(β_i) = 0 otherwise. With this notation, the sparsity-overlap function is given by

(16) ψ(B) := |||ζ(B_S)^T (Σ_SS)^{-1} ζ(B_S)|||₂,

where |||·|||₂ denotes the spectral norm. We use this sparsity-overlap function to define the sample complexity parameter, which captures the effective sample size

(17) θ_{l1/l2}(n, p; B*) := n / [2ψ(B*) log(p − s)].

In the following two theorems, we consider a random design matrix X drawn with i.i.d. N(0, Σ) row vectors, where Σ satisfies assumptions (A1)–(A3), and an observation matrix Y specified by model (4). In order to capture dependence induced by the design covariance matrix Σ, for any positive semidefinite matrix Q ⪰ 0, we define the quantities

(18a) ρ_l(Q) := (1/2) min_{i≠j} [Q_ii + Q_jj − 2Q_ij]

and

(18b) ρ_u(Q) := max_i Q_ii.

We note that by definition, we have ρ_l(Q) ≤ ρ_u(Q) whenever Q ⪰ 0. Our bounds are stated in terms of these quantities as applied to the conditional covariance matrix Σ_{S^c S^c | S} := Σ_{S^c S^c} − Σ_{S^c S}(Σ_SS)^{-1}Σ_{S S^c}. Our first result is an achievability result, showing that the multivariate group Lasso succeeds in recovering the row support and yields consistency in l∞/l2 norm.
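The definitions (15)–(16) translate directly into code. The sketch below (numpy; helper names are ours) row-normalizes B_S and takes the required spectral norm; as a sanity check, for K = 1 with Σ_SS = I it returns the support size s, consistent with the univariate discussion in Section 1.2:

```python
import numpy as np

def zeta(B_S):
    """Row-normalize: each nonzero row of B_S is divided by its l2 norm (eq. (15))."""
    Z = np.zeros_like(B_S, dtype=float)
    norms = np.linalg.norm(B_S, axis=1)
    nz = norms > 0
    Z[nz] = B_S[nz] / norms[nz, None]
    return Z

def sparsity_overlap(B_S, Sigma_SS):
    """psi(B) = spectral norm of zeta(B_S)^T Sigma_SS^{-1} zeta(B_S) (eq. (16))."""
    Z = zeta(B_S)
    M = Z.T @ np.linalg.solve(Sigma_SS, Z)
    return np.linalg.norm(M, ord=2)  # ord=2 on a matrix gives the spectral norm
```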
We state this result for sequences of regularization parameters λ_n = √(f(p) log p / n), where f(p) → +∞ is any function such that λ_n → 0 as p → +∞. We also assume that n is sufficiently large such that s/n < 1/2.

THEOREM 1. Suppose that we solve the multivariate group Lasso with specified regularization parameter sequence λ_n for a sequence of problems indexed by (n, p, B*, Σ) that satisfy assumptions (A1)–(A3), and such that, for some ν > 0,

(19) θ_{l1/l2}(n, p; B*) = n / [2ψ(B*) log(p − s)] > (1 + ν) ρ_u(Σ_{S^c S^c | S}) / γ².

Then for universal constants c_i > 0 (i.e., independent of n, p, s, B*, Σ), with probability greater than 1 − c_2 exp(−c_3 K log s) − c_0 exp(−c_1 log(p − s)), the following statements hold:

(a) The multivariate group Lasso has a unique solution B̂ with row support S(B̂) that is contained within the true row support S(B*), and moreover satisfies the bound

(20) ‖B̂ − B*‖_{l∞/l2} ≤ √(8K log s / (C_min n)) + λ_n (D_max + 6√s / C_min) =: ρ(n, s, λ_n).

(b) If ρ(n, s, λ_n)/b*_min = o(1), then the estimate of the row support, S(B̂) := {i ∈ {1,...,p} | β̂_i ≠ 0}, specified by this unique solution is equal to the row support set S(B*) of the true model.

Note that the theorem is naturally separated into two distinct but related claims. Part (a) guarantees that the method produces no false inclusions and, moreover, bounds the maximum l2-error across the rows. Part (b) requires an additional assumption, namely the restriction ρ(n, s, λ_n)/b*_min = o(1) ensuring that the error ρ is of lower order than the minimum l2 norm b*_min across rows, but also guarantees the stronger result of no false exclusions as well, so that the method recovers the row support exactly. Note that the probability of these events converges to one only if both (p − s) and s tend to infinity, which might seem counterintuitive initially (since problems with larger support sets s might seem harder). However, as we discuss at the end of Section 3.3, this dependence can be removed at the expense of a slightly slower convergence rate for ‖B̂ − B*‖_{l∞/l2}.
Our second main theorem is a negative result, showing that the multivariate group Lasso fails with high probability if the rescaled sample size θ_{l1/l2} is below a critical threshold. In order to clarify the phrasing of this result, note that Theorem 1 can be summarized succinctly as guaranteeing that there is a unique solution B̂ with the correct row support that satisfies ‖B̂ − B*‖_{l∞/l2} = o(b*_min). The following result shows that such a guarantee cannot hold if the sample size n scales too slowly relative to p, s and the other problem parameters.
THEOREM 2. Consider problem sequences indexed by (n, p, B*, Σ) that satisfy assumptions (A1)–(A2), and with minimum value b*_min such that (b*_min)² = Ω((log p)/n), and suppose that we solve the multivariate group Lasso with any positive regularization sequence {λ_n}. Then there exist ν > 0 and universal constants c_i > 0 such that if the sample size is bounded as

(21) θ_{l1/l2}(n, p; B*) = n / [2ψ(B*) log(p − s)] < (1 − ν) ρ_l(Σ_{S^c S^c | S}) / (2 − γ)²,

then with probability greater than 1 − c_0 exp{−c_1 min(Ks, θ_{l1/l2} log(p − s))}, there is no solution B̂ of the multivariate group Lasso that has the correct row support and satisfies the bound ‖B̂ − B*‖_{l∞/l2} = o(b*_min).

The proof of this claim is provided in Section 4. We note that information-theoretic methods [Wainwright (2009a)] imply that no method (including the multivariate group Lasso) can perform exact support recovery unless n/s → +∞, so that the probability given in Theorem 2 converges to one under the given conditions. Note that Theorems 1 and 2 in conjunction imply that the rescaled sample size θ_{l1/l2}(n, p; B*) = n/[2ψ(B*) log(p − s)] captures the behavior of the multivariate group Lasso for support recovery and estimation in block l∞/l2 norm. For the special case of random design matrices drawn from the standard Gaussian ensemble (i.e., Σ = I_{p×p}), the given scalings are sharp:

COROLLARY 1. For the standard Gaussian ensemble, the multivariate group Lasso undergoes a sharp threshold at the level θ_{l1/l2}(n, p, B*) = 1. More specifically, for any δ > 0:

(a) For problem sequences (n, p, B*) such that θ_{l1/l2}(n, p, B*) > 1 + δ, the multivariate group Lasso succeeds with high probability.
(b) Conversely, for sequences such that θ_{l1/l2}(n, p, B*) < 1 − δ, the multivariate group Lasso fails with high probability.

PROOF. In the special case Σ = I_{p×p}, it is straightforward to verify that all the assumptions are satisfied: in particular, we have C_min = C_max = 1, D_max = 1 and γ = 1. Moreover, a short calculation shows that ρ_u(I) = ρ_l(I) = 1.
Consequently, the thresholds given in the sufficient condition (19) and the necessary condition (21) are both equal to one.

2.2. Efficient estimation of individual supports. The preceding results address exact recovery of the support union of the regression matrix B*. As demonstrated by the following procedure and the associated corollary of Theorem 1, once the
row support has been recovered, it is straightforward to recover the individual supports of each column of the regression matrix via the additional steps of performing ordinary least squares and thresholding.

Efficient multistage estimation of individual supports:
(1) estimate the support union with Ŝ, the support union of the solution B̂ of the multivariate group Lasso;
(2) compute the restricted ordinary least squares (ROLS) estimate,

B̃_Ŝ := argmin_{B_Ŝ} ‖Y − X_Ŝ B_Ŝ‖_F,

for the restricted multivariate problem;
(3) compute the matrix T(B̃_Ŝ) obtained by thresholding B̃_Ŝ at the level √(2 log(K|Ŝ|)/(C_min n)), and estimate the individual supports by the nonzero entries of T(B̃_Ŝ).

The following result, which is proved in Appendix A, shows that under the assumptions of Theorem 1, the additional post-processing applied to the support union estimate will recover the individual supports with high probability:

COROLLARY 2. Under assumptions (A1)–(A3) and the additional assumptions of Theorem 1, if for all individual nonzero coefficients β*_ik, i ∈ S, 1 ≤ k ≤ K, we have |β*_ik|² ≥ 4 log(Ks)/(C_min n), then with probability greater than 1 − exp(−c_0 K log s) the above two-step estimation procedure recovers the individual supports of B*.

2.3. Some consequences of Theorems 1 and 2. We begin by making some simple observations about the sparsity-overlap function.

LEMMA 1. (a) For any design satisfying assumption (A1), the sparsity-overlap ψ(B*) obeys the bounds

(22) s/(C_max K) ≤ ψ(B*) ≤ s/C_min;

(b) if Σ_SS = I_{s×s}, and if the columns (Z^{(k)}) of the matrix Z = ζ(B*) are orthogonal, then the sparsity-overlap function is ψ(B*) = max_{k=1,...,K} ‖Z^{(k)}‖²₂.

The proof of this claim is provided in Appendix C. Based on this lemma, we now study some special cases of Theorems 1 and 2. The simplest special case is the univariate regression problem (K = 1), in which case the quantity ζ(β*) [as defined in equation (15)] simply yields an s-dimensional sign vector with elements z*_i = sign(β*_i). [Recall that the sign function is defined as sign(0) = 0, sign(x) = 1
if x > 0 and sign(x) = −1 if x < 0.] In this case, the sparsity-overlap function is given by ψ(β*) = z*^T (Σ_SS)^{-1} z*, and as a consequence of Lemma 1(a), we have ψ(β*) = Θ(s). Consequently, a simple corollary of Theorems 1 and 2 is that the Lasso succeeds once the ratio n/[2s log(p − s)] exceeds a certain critical threshold, determined by the eigenspectrum and incoherence properties of Σ, and it fails below a certain threshold. This result matches the necessary and sufficient conditions established in previous work on the Lasso [Wainwright (2009b)]. We can also use Lemma 1 and Theorems 1 and 2 to compare the performance of the multivariate group Lasso to the following (arguably naive) strategy for row selection using the ordinary Lasso.

Row selection using ordinary Lasso:
(1) Apply the ordinary Lasso separately to each of the K univariate regression problems specified by the columns of B*, thereby obtaining estimates β̂^{(k)} for k = 1,...,K.
(2) For k = 1,...,K, estimate the support of individual columns via Ŝ_k := {i | β̂^{(k)}_i ≠ 0}.
(3) Estimate the row support by taking the union: Ŝ = ∪_{k=1}^K Ŝ_k.

To understand the conditions governing the success/failure of this procedure, note that it succeeds if and only if for each nonzero row i ∈ S = ∪_{k=1}^K S_k, the variable β̂^{(k)}_i is nonzero for at least one k, and for all j ∈ S^c = {1,...,p}\S, the variable β̂^{(k)}_j = 0 for all k = 1,...,K. From our understanding of the univariate case, we know that for θ_u = ρ_u(Σ_{S^c S^c | S})/γ², the condition

(23) n ≥ 2θ_u max_{k=1,...,K} ψ(β^{(k)}_S) log(p − s_k) ≥ 2θ_u max_{k=1,...,K} ψ(β^{(k)}_S) log(p − s)

is sufficient to ensure that the ordinary Lasso succeeds in row selection. Conversely, for θ_l = ρ_l(Σ_{S^c S^c | S})/(2 − γ)², if the sample size is upper bounded as

n < 2θ_l max_{k=1,...,K} ψ(β^{(k)}_S) log(p − s),

then there will exist some j ∈ S^c such that for at least one k ∈ {1,...,K}, there holds β̂^{(k)}_j ≠ 0 with high probability, implying failure of the ordinary Lasso.
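The naive row-selection strategy above, followed by the restricted-OLS post-processing of Section 2.2, can be sketched end to end. This is a minimal illustration (Python/numpy; function names, the ISTA Lasso solver and the numeric threshold are our choices, not the paper's; the union estimate from either the naive strategy or the group Lasso can feed the post-processing step):

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    # Proximal gradient for (1/(2n))||y - X b||^2 + lam ||b||_1.
    n, p = X.shape
    L = np.linalg.eigvalsh(X.T @ X / n).max()
    beta = np.zeros(p)
    for _ in range(n_iter):
        beta = soft_threshold(beta - (X.T @ (X @ beta - y) / n) / L, lam / L)
    return beta

def naive_row_selection(Y, X, lam):
    """Steps (1)-(3): one Lasso per column, per-column supports, then their union."""
    supports = [set(np.flatnonzero(lasso_ista(X, Y[:, k], lam) != 0))
                for k in range(Y.shape[1])]
    return sorted(set().union(*supports))

def individual_supports(Y, X, S_hat, tau):
    """Post-processing: restricted OLS on S_hat, then threshold at level tau.

    Returns a boolean p-by-K matrix marking the estimated individual supports.
    """
    p, K = X.shape[1], Y.shape[1]
    B_rols, *_ = np.linalg.lstsq(X[:, S_hat], Y, rcond=None)
    supports = np.zeros((p, K), dtype=bool)
    supports[S_hat] = np.abs(B_rols) > tau
    return supports
```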
A natural question is whether the multivariate group Lasso, by taking into account the couplings across columns, always outperforms (or at least matches) the naive strategy. The following result, proven in Appendix D, shows that if the design is uncorrelated on the support, then indeed this is the case.

COROLLARY 3 (Multivariate group Lasso versus ordinary Lasso). Assume Σ_SS = I_{s×s}. Then for any multivariate regression problem, row selection using
the ordinary Lasso strategy requires, with high probability, at least as many samples as the ℓ₁/ℓ₂ multivariate group Lasso. In particular, the relative efficiency of the multivariate group Lasso versus the ordinary Lasso is given by the ratio
(24)   [max_{k=1,...,K} ψ(β^(k)_S) log(p − s_k)] / [ψ(B*_S) log(p − s)] ≥ 1.

We consider the special case of identical regressions, for which a result can be stated for any covariance design, and the case of orthonormal regressions, which illustrates Corollary 3.

EXAMPLE 1 (Identical regressions). Suppose that B* := β* 1ᵀ_K, that is, B* consists of K copies of the same coefficient vector β* ∈ ℝᵖ, with support of cardinality |S| = s. We then have [ζ(B*)]_{ik} = sign(β*_i)/√K, from which we see that ψ(B*) = z*ᵀ(Σ_SS)⁻¹z*, with z* being an s-dimensional sign vector with elements z*_i = sign(β*_i). Consequently, we have the equality ψ(B*) = ψ(β*), so that under our analysis there is no benefit in using the multivariate group Lasso relative to the strategy of solving separate Lasso problems and constructing the union of individually estimated supports. This behavior may seem counterintuitive, since under the model (4) we essentially have K observations of the coefficient vector β* with the same design matrix but K independent noise realizations, which could help to reduce the effective noise variance from σ² to σ²/K if the fact that the regressions are identical were known. It must be borne in mind, however, that in our high-dimensional analysis the noise variance does not grow as the dimensionality grows, and thus asymptotically the noise is dominated by the interference between the covariates, which grows as (p − s). It is thus this high-dimensional interference that dominates the rates given in Theorems 1 and 2. In contrast to this pessimistic example, we now turn to the most optimistic extreme:

EXAMPLE 2 (Orthonormal regressions). Suppose that Σ_SS = I_{s×s} and (for s > K) suppose that B* is constructed such that the columns of the s × K matrix ζ(B*) are orthogonal and with equal norm (which implies their norm equals √(s/K)).
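The quantities appearing in Lemma 1 and Example 1 are straightforward to compute numerically. The following sketch (the helper names zeta and psi are ours) normalizes each nonzero row of B to unit ℓ₂ norm and takes ψ(B_S) to be the maximal eigenvalue of ζ(B_S)ᵀ Σ_SS⁻¹ ζ(B_S); the assertion at the end reproduces the identity ψ(B*) = ψ(β*) from Example 1 for a matrix of K identical columns.

```python
import numpy as np

def zeta(B):
    """Row-wise normalization zeta(B): each nonzero row of B scaled to unit l2 norm.
    For a one-column B this reduces to the sign vector of its entries."""
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    Z = np.zeros_like(B, dtype=float)
    nz = norms[:, 0] > 0
    Z[nz] = B[nz] / norms[nz]
    return Z

def psi(B_S, Sigma_SS):
    """Sparsity-overlap function: maximal eigenvalue of
    zeta(B_S)^T Sigma_SS^{-1} zeta(B_S); B_S has shape (s, K)."""
    Z = zeta(B_S)
    M = Z.T @ np.linalg.solve(Sigma_SS, Z)
    return float(np.linalg.eigvalsh(M).max())

# Example 1 check: K identical regressions give psi(B) = psi(beta).
rng = np.random.default_rng(1)
s, K = 4, 3
A = rng.standard_normal((s, s))
Sigma = A @ A.T + s * np.eye(s)          # a generic positive-definite Sigma_SS
beta = np.array([1.0, -2.0, 0.5, -0.3])
B = np.outer(beta, np.ones(K))           # K copies of the same coefficient vector
assert abs(psi(B, Sigma) - psi(beta.reshape(-1, 1), Sigma)) < 1e-8
```

The same helpers also let one verify the Lemma 1(a) bounds s/(C_max K) ≤ ψ(B*) ≤ s/C_min, with C_min and C_max the extreme eigenvalues of Σ_SS.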
Under these conditions, we claim that the sample complexity of the multivariate group Lasso is smaller than that of the ordinary Lasso by a factor of 1/K. Indeed, using Lemma 1(b), we observe that
K ψ(B*) = K ‖Z*^(1)‖² = ∑_{k=1}^K ‖Z*^(k)‖² = tr(Z*ᵀZ*) = tr(Z*Z*ᵀ) = s,
because Z*Z*ᵀ ∈ ℝ^{s×s} is the Gram matrix of s unit vectors in ℝ^K and its diagonal elements are therefore all equal to 1. Consequently, the multivariate group Lasso
recovers the row support with high probability for sequences such that n/[2(s/K) log(p − s)] > 1, which allows for sample sizes K times smaller than the ordinary Lasso approach.

Corollary 3 and the subsequent examples address the case of uncorrelated design (Σ_SS = I_{s×s}) on the row support S, for which the multivariate group Lasso is never worse than the ordinary Lasso in performing row selection. The following example shows that if the supports are disjoint, the ordinary Lasso has the same sample complexity as the multivariate group Lasso for uncorrelated design Σ_SS = I_{s×s}, but can be better than the multivariate group Lasso for designs Σ_SS with suitable correlations:

COROLLARY 4 (Disjoint supports). Suppose that the support sets S_k of the individual regression problems are disjoint. Then for any design covariance Σ_SS, we have
(25)   max_{1≤k≤K} ψ(β^(k)_S) ≤(a) ψ(B*_S) ≤(b) ∑_{k=1}^K ψ(β^(k)_S).

PROOF. First note that, since all supports are disjoint, Z*^(k)_i = sign(β*_{ik}), so that Z*^(k)_S = ζ(β^(k)_S). Inequality (b) is then immediate because we have
|||Z*_Sᵀ Σ_SS⁻¹ Z*_S|||₂ ≤ tr(Z*_Sᵀ Σ_SS⁻¹ Z*_S).
To establish inequality (a), we note that
ψ(B*) = max_{x ∈ ℝ^K : ‖x‖ ≤ 1} xᵀ Z*_Sᵀ Σ_SS⁻¹ Z*_S x ≥ max_{1≤k≤K} e_kᵀ Z*_Sᵀ Σ_SS⁻¹ Z*_S e_k = max_{1≤k≤K} Z*^(k)_Sᵀ Σ_SS⁻¹ Z*^(k)_S.

A caveat in interpreting Corollary 4, and more generally in comparing the performance of the ordinary Lasso and the multivariate group Lasso, is that for a general covariance matrix Σ_SS, assumptions (A2) and (A3) required by Theorem 1 do not induce the same constraints on the covariance matrix when applied to the multivariate problem as when applied to the individual regressions. Indeed, in the latter case, (A2) would require max_k |||Σ_{Sᶜ_k S_k}(Σ_{S_k S_k})⁻¹|||_∞ ≤ 1 − γ and (A3) would require max_k |||(Σ_{S_k S_k})⁻¹|||_∞ ≤ D_max. Thus (A3) is a stronger assumption in the multivariate case but (A2) is not. We illustrate Corollary 4 with an example.

EXAMPLE 3 (Disjoint support with uncorrelated design). Suppose that Σ_SS = I_{s×s}, and the supports are disjoint.
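The sandwich inequality (25) in Corollary 4 is easy to verify numerically for a generic positive-definite Σ_SS and a disjoint-support sign matrix. In the sketch below (an illustration of ours, not taken from the paper), each z^(k) = ζ(β^(k)_S) is the sign vector of the k-th column, ψ(β^(k)_S) = z^(k)ᵀ Σ_SS⁻¹ z^(k), and ψ(B*_S) is the maximal eigenvalue of Z*ᵀ Σ_SS⁻¹ Z*; since these are the diagonal entries, the top eigenvalue and the trace of the same positive semidefinite matrix, both inequalities hold by construction.

```python
import numpy as np

rng = np.random.default_rng(7)
s, K = 6, 3
# Disjoint supports: each row of B has exactly one nonzero (sign) entry.
B = np.zeros((s, K))
for i in range(s):
    B[i, i % K] = rng.choice([-1.0, 1.0])
A = rng.standard_normal((s, s))
Sigma = A @ A.T + s * np.eye(s)           # generic positive-definite Sigma_SS

Sigma_inv = np.linalg.inv(Sigma)
Z = B                                     # rows already have unit norm, so zeta(B) = B
M = Z.T @ Sigma_inv @ Z
psi_group = np.linalg.eigvalsh(M).max()   # psi(B_S): top eigenvalue of M
psi_single = [Z[:, k] @ Sigma_inv @ Z[:, k] for k in range(K)]  # diagonal of M

# (25): max_k psi(beta^(k)_S) <= psi(B_S) <= sum_k psi(beta^(k)_S)
assert max(psi_single) <= psi_group + 1e-12
assert psi_group <= sum(psi_single) + 1e-12
```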
In this case, we claim that the sample complexity of the ℓ₁/ℓ₂ multivariate group Lasso is the same as that of the ordinary Lasso.
If the individual regressions have disjoint supports, then Z*_S = ζ(B*_S) has only a single nonzero entry per row and therefore the columns of Z* are orthogonal. Moreover, Z*_{ik} = sign(β^(k)_i). By Lemma 1(b), the sparsity-overlap function ψ(B*) is equal to the largest squared column norm. But ‖Z*^(k)‖² = ∑_{i=1}^s sign(β^(k)_i)² = s_k. Thus, the sample complexity of the multivariate group Lasso is the same as that of the ordinary Lasso in this case.²

Finally, we consider an example that illustrates the effect of correlated designs:

EXAMPLE 4 (Effects of correlated designs). To illustrate the behavior of the sparsity-overlap function in the presence of correlations in the design, we consider the simple case of two regressions with supports of size 2. For parameters ϑ₁, ϑ₂ ∈ [0, π] and μ ∈ (−1, +1), consider regression matrices B* such that B* = ζ(B*_S) and
(26)   ζ(B*_S) = [cos(ϑ₁) sin(ϑ₁); cos(ϑ₂) sin(ϑ₂)]   and   Σ_SS⁻¹ = [1 μ; μ 1].
Setting M* = ζ(B*_S)ᵀ Σ_SS⁻¹ ζ(B*_S), a simple calculation shows that
tr(M*) = 2(1 + μ cos(ϑ₁ − ϑ₂))   and   det(M*) = (1 − μ²) sin(ϑ₁ − ϑ₂)²,
so that the eigenvalues of M* are
μ₊ = (1 + μ)(1 + cos(ϑ₁ − ϑ₂))   and   μ₋ = (1 − μ)(1 − cos(ϑ₁ − ϑ₂)),
and ψ(B*) = max(μ₊, μ₋). On the other hand, with
z₁ = ζ(β^(1)) = (sign(cos(ϑ₁)), sign(cos(ϑ₂)))ᵀ   and   z₂ = ζ(β^(2)) = (sign(sin(ϑ₁)), sign(sin(ϑ₂)))ᵀ,
we have
ψ(β^(1)) = z₁ᵀ Σ_SS⁻¹ z₁ = 1_{cos(ϑ₁)≠0} + 1_{cos(ϑ₂)≠0} + 2μ sign(cos(ϑ₁) cos(ϑ₂)),
ψ(β^(2)) = z₂ᵀ Σ_SS⁻¹ z₂ = 1_{sin(ϑ₁)≠0} + 1_{sin(ϑ₂)≠0} + 2μ sign(sin(ϑ₁) sin(ϑ₂)).
Figure 1 provides a graphical comparison of these sample complexity functions. The function max(ψ(β^(1)), ψ(β^(2))) is discontinuous on S = (π/2)ℤ × ℝ ∪ ℝ × (π/2)ℤ, and, as a consequence, so is its difference with ψ(B*).
Note that, for fixed ϑ₁ or fixed ϑ₂, some of these discontinuities are removable discontinuities of the induced function of the other variable, and these discontinuities therefore

² In making this assertion, we are ignoring any difference between log(p − s_k) and log(p − s), which is valid, for instance, in the regime of sublinear sparsity, when s_k/p → 0 and s/p → 0.
FIG. 1. Comparison of sparsity-overlap functions for ℓ₁/ℓ₂ and the Lasso. For the pair (ϑ₁, ϑ₂)/(2π), we represent in each row of plots, corresponding respectively to μ = 0 (top), 0.9 (middle) and −0.9 (bottom), from left to right, the quantities: ψ(B*) (left), max(ψ(β^(1)), ψ(β^(2))) (center) and max(0, ψ(B*) − max(ψ(β^(1)), ψ(β^(2)))) (right). The latter indicates when the inequality ψ(B*) ≤ max(ψ(β^(1)), ψ(β^(2))) does not hold and by how much it is violated.

create needles, slits or flaps in the graph of the function ψ. Denote by R₊ and R₋ the sets
R₊ = {(ϑ₁, ϑ₂) | min[cos(ϑ₁) cos(ϑ₂), sin(ϑ₁) sin(ϑ₂)] > 0},
R₋ = {(ϑ₁, ϑ₂) | max[cos(ϑ₁) cos(ϑ₂), sin(ϑ₁) sin(ϑ₂)] < 0},
on which max(ψ(β^(1)), ψ(β^(2))) reaches its minimum value when μ ≤ −0.5 and μ ≥ 0.5, respectively (see middle and bottom center plots in Figure 4). For μ = 0, the top center graph illustrates that max(ψ(β^(1)), ψ(β^(2))) is equal to 2 except for the cases of matrices B*_S with disjoint supports, corresponding to the discrete set D = {(kπ/2, (k ± 1)π/2), k ∈ ℤ}, for which it equals 1. The top rightmost graph illustrates that, as shown in Corollary 3, the inequality always holds for an uncorrelated design. For μ > 0, the inequality ψ(B*) ≤ max(ψ(β^(1)), ψ(β^(2))) is violated only on a subset of S ∪ R₋; and for μ < 0, the inequality is symmetrically violated on a subset of S ∪ R₊ (see Figure 4).
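The closed-form eigenvalues in Example 4 can be checked against a direct numerical computation. In this sketch (ours), we read display (26) as specifying Σ_SS⁻¹ = [[1, μ], [μ, 1]] — an assumption on our part, chosen because it is the reading consistent with the stated trace, determinant and eigenvalue formulas.

```python
import numpy as np

def example4_eigs(th1, th2, mu):
    """Eigenvalues of M* = zeta(B_S)^T Sigma_SS^{-1} zeta(B_S) for Example 4,
    with Sigma_SS^{-1} = [[1, mu], [mu, 1]] (our reading of display (26))."""
    Zs = np.array([[np.cos(th1), np.sin(th1)],
                   [np.cos(th2), np.sin(th2)]])
    Omega = np.array([[1.0, mu], [mu, 1.0]])   # plays the role of Sigma_SS^{-1}
    M = Zs.T @ Omega @ Zs
    return np.sort(np.linalg.eigvalsh(M))

def closed_form(th1, th2, mu):
    """Stated closed forms: (1 +/- mu)(1 +/- cos(th1 - th2))."""
    c = np.cos(th1 - th2)
    return np.sort(np.array([(1 + mu) * (1 + c), (1 - mu) * (1 - c)]))
```

For any (ϑ₁, ϑ₂, μ) the two computations agree to machine precision, confirming that the trace and determinant expressions above factor into the stated pair μ₊, μ₋.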
2.4. Illustrative simulations. In this section, we present the results of simulations that illustrate the sharpness of Theorems 1 and 2, and furthermore demonstrate how quickly the predicted behavior is observed as elements of the triple (n, p, s) grow in different regimes. We explore the case of two regressions (i.e., K = 2) which share an identical support set S of cardinality |S| = s in Section 2.4.1, and consider a slightly more general case in Section 2.4.2.

2.4.1. Threshold effect in the standard Gaussian case. The first set of experiments was designed to reveal the threshold effect predicted by Theorems 1 and 2. The design matrix X is sampled from the standard Gaussian ensemble, with i.i.d. entries X_ij ~ N(0, 1). We consider two types of sparsity: logarithmic sparsity, where s = α log(p) for α = 2/log(2), and linear sparsity, where s = αp for α = 1/8, for various ambient model dimensions p ∈ {16, 32, 64, 256, 512, 1024}. For a given triplet (n, p, s), we solve the block-regularized problem (12) with the regularization parameter λ_n = √(log(p − s)(log s)/n). For each fixed (p, s) pair, we measure the sample complexity in terms of a parameter θ, in particular letting n = 2θs log(p − s) for θ ∈ [0.25, 1.5]. We let the matrix B* ∈ ℝ^{p×2} of regression coefficients have entries β*_ij ∈ {−1/√2, 1/√2}, choosing the parameters to vary the angle between the two columns, thereby obtaining various desired values of ψ(B*). Since Σ = I_{p×p} for the standard Gaussian ensemble, the sparsity-overlap function ψ(B*) is simply the maximal eigenvalue of the Gram matrix ζ(B*_S)ᵀζ(B*_S). Since |β*_ij| = 1/√2 by construction, we are guaranteed that B*_S = ζ(B*_S), that the minimum value b*_min = 1, and, moreover, that the columns of ζ(B*_S) have the same Euclidean norm. To construct parameter matrices B* that satisfy |β*_ij| = 1/√2, we choose both p and the sparsity scalings so that the obtained values for s are multiples of four. We then construct the columns Z*^(1) and Z*^(2) of the matrix Z* = ζ(B*) from copies of vectors of length four.
Denoting by ⊗ the usual matrix tensor product, we consider the following 4-vectors:

Identical regressions: We set Z*^(1) = Z*^(2) = (1/√2) 1_s, so that the sparsity-overlap function is ψ(B*) = s.

Orthonormal regressions: Here B* is constructed with Z*^(1) ⊥ Z*^(2), so that ψ(B*) = s/2, the most favorable situation. In order to achieve this orthonormality, we set Z*^(1) = (1/√2) 1_s and Z*^(2) = (1/√2) 1_{s/2} ⊗ (1, −1)ᵀ.

Intermediate angles: In this intermediate case, the columns Z*^(1) and Z*^(2) are at a 60° angle, which leads to ψ(B*) = (3/4)s. Specifically, we set Z*^(1) = (1/√2) 1_s and Z*^(2) = (1/√2) 1_{s/4} ⊗ (1, 1, 1, −1)ᵀ.

Figure 2 shows plots for linear sparsity (left column) and logarithmic sparsity (right column) in all three cases, where the multivariate group Lasso was used
FIG. 2. Plots of support union recovery probability, P[Ŝ = S], versus the control parameter θ = n/[2s log(p − s)] for two different types of sparsity: linear sparsity in the left column (s = p/8) and logarithmic sparsity in the right column (s = 2 log₂(p)). The first three rows are based on using the multivariate group Lasso to estimate the support for the three cases of identical regressions, intermediate angles and orthonormal regressions. The fourth row presents results for the Lasso in the case of identical regressions.
FIG. 3. Plots of support recovery probability, P[Ŝ = S], versus the control parameter θ = n/[2s log(p − s)] for two different types of sparsity: logarithmic sparsity on top (s = O(log(p))) and linear sparsity on bottom (s = αp), and for increasing values of p from left to right. The noise level is set at σ = 0.1. Each graph shows four curves (black, red, green, blue) corresponding to the case of independent ℓ₁ regularization and, for ℓ₁/ℓ₂ regularization, the cases of identical regressions, intermediate angles and orthonormal regressions. Note how curves corresponding to the same case across different problem sizes p all coincide, as predicted by Theorems 1 and 2. Moreover, consistent with the theory, the curves for the identical regressions group reach P[Ŝ = S] ≈ 0.50 at θ ≈ 1, whereas the orthonormal regressions group reaches 50% success substantially earlier.

(top three rows), as well as the reference Lasso case for identical regressions (bottom row). Each panel plots the success probability, P[Ŝ = S], versus the rescaled sample size θ = n/[2s log(p − s)]. Under this rescaling, Theorems 1 and 2 predict that the curves should align, and that the success probability should transition to 1 once θ exceeds a critical threshold (dependent on the type of ensemble). Note that for suitably large problem sizes (p ≥ 128), the curves do align in the predicted way, showing step-function behavior. Figure 3 plots data from the same simulations in a different format. Here the top row corresponds to logarithmic sparsity, and the bottom row to linear sparsity; each panel shows the four different choices of B*, with the problem size p increasing from right to
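The three column constructions used in the simulations can be written down directly. The following numpy sketch (our reconstruction of the length-four block construction described above) builds Z*^(1) and Z*^(2) and checks that, with Σ = I, the sparsity-overlap values come out as s, (3/4)s and s/2 for the identical, intermediate-angle and orthonormal cases, respectively.

```python
import numpy as np

def build_columns(s, case):
    """Columns Z1, Z2 of zeta(B*_S) with entries +-1/sqrt(2), built from
    repeated short sign blocks; s must be a multiple of four."""
    assert s % 4 == 0
    Z1 = np.ones(s) / np.sqrt(2)
    if case == "identical":
        Z2 = Z1.copy()
    elif case == "intermediate":      # columns at a 60-degree angle
        Z2 = np.kron(np.ones(s // 4), np.array([1.0, 1.0, 1.0, -1.0])) / np.sqrt(2)
    elif case == "orthonormal":       # orthogonal columns of equal norm
        Z2 = np.kron(np.ones(s // 2), np.array([1.0, -1.0])) / np.sqrt(2)
    else:
        raise ValueError(case)
    return np.column_stack([Z1, Z2])

def psi_standard_gaussian(Z):
    """With Sigma = I, psi(B*) is the top eigenvalue of the Gram matrix Z^T Z."""
    return float(np.linalg.eigvalsh(Z.T @ Z).max())
```

For example, with s = 8 the three cases give ψ(B*) = 8, 6 and 4, matching the predicted sample-complexity ordering of the three simulation ensembles.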