VARIABLE SELECTION IN NONPARAMETRIC ADDITIVE MODELS

Jian Huang 1, Joel L. Horowitz 2 and Fengrong Wei 3

1 University of Iowa, 2 Northwestern University and 3 University of West Georgia

Abstract: We consider a nonparametric additive model of a conditional mean function in which the number of variables and additive components may be larger than the sample size but the number of nonzero additive components is small relative to the sample size. The statistical problem is to determine which additive components are nonzero. The additive components are approximated by truncated series expansions with B-spline bases. With this approximation, the problem of component selection becomes that of selecting the groups of coefficients in the expansion. We apply the adaptive group Lasso to select nonzero components, using the group Lasso to obtain an initial estimator and to reduce the dimension of the problem. We give conditions under which the group Lasso selects a model whose number of components is comparable with the underlying model, and the adaptive group Lasso selects the nonzero components correctly with probability approaching one as the sample size increases and achieves the optimal rate of convergence. The results of Monte Carlo experiments show that the adaptive group Lasso procedure works well with samples of moderate size. A data example is used to illustrate the application of the proposed method.

KEY WORDS: adaptive group Lasso; component selection; high-dimensional data; nonparametric regression; selection consistency.

Short title: Nonparametric component selection.

AMS 2000 subject classification. Primary 62G08, 62G20; secondary 62G99.
1 Introduction

Let $(Y_i, X_i)$, $i = 1, \ldots, n$, be random vectors that are independently and identically distributed as $(Y, X)$, where $Y$ is a response variable and $X = (X_1, \ldots, X_p)'$ is a $p$-dimensional covariate vector. Consider the nonparametric additive model

    Y_i = \mu + \sum_{j=1}^p f_j(X_{ij}) + \varepsilon_i,    (1)

where $\mu$ is an intercept term, $X_{ij}$ is the $j$th component of $X_i$, the $f_j$'s are unknown functions, and $\varepsilon_i$ is an unobserved random variable with mean zero and finite variance $\sigma^2$. Suppose that some of the additive components $f_j$ are zero. The problem addressed in this paper is to distinguish the nonzero components from the zero components and estimate the nonzero components. We allow the possibility that $p$ is larger than the sample size $n$, which we represent by letting $p$ increase as $n$ increases. We propose a penalized method for variable selection in (1) and show that the proposed method can correctly select the nonzero components with high probability.

There has been much work on penalized methods for variable selection and estimation with high-dimensional data. Methods that have been proposed include the bridge estimator (Frank and Friedman 1993; Huang, Horowitz and Ma 2008), the least absolute shrinkage and selection operator or Lasso (Tibshirani 1996), the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li 2001; Fan and Peng 2004), and the minimum concave penalty (Zhang 2010). Much progress has been made in understanding the statistical properties of these methods. In particular, many authors have studied the variable selection, estimation and prediction properties of the Lasso in high-dimensional settings. See, for example, Meinshausen and Bühlmann (2006), Zhao and Yu (2006), Zou (2006), Bunea, Tsybakov and Wegkamp (2007), Meinshausen and Yu (2008), Huang, Ma and Zhang (2008), van de Geer (2008) and Zhang and Huang (2008), among others. All these authors assume a linear or other parametric model. In many applications, however, there is little a priori justification for assuming that the effects of covariates take a linear form or belong to any other known, finite-dimensional parametric family.
For example, in studies of economic development, the effects of covariates on the growth of gross domestic product can be nonlinear. Similarly, there is evidence of nonlinearity in the gene expression data used in the empirical example in Section 5.

There is a large body of literature on estimation in nonparametric additive models. For example, Stone (1985, 1986) showed that additive spline estimators achieve the same optimal rate of convergence for a general fixed $p$ as for $p = 1$. Horowitz and Mammen (2004) and Horowitz,
Klemelä, and Mammen (2006) showed that if $p$ is fixed and mild regularity conditions hold, then oracle-efficient estimates of the $f_j$'s can be obtained by a two-step procedure. Here, oracle efficiency means that the estimator of each $f_j$ has the same asymptotic distribution that it would have if all the other $f_j$'s were known. However, these papers do not discuss variable selection in nonparametric additive models.

Zhang et al. (2004) and Lin and Zhang (2006) have investigated the use of penalization methods in smoothing spline ANOVA with a fixed number of covariates. Zhang et al. (2004) used a Lasso-type penalty but did not investigate model-selection consistency. Lin and Zhang (2006) proposed the component selection and smoothing operator (COSSO) method for model selection and estimation in multivariate nonparametric regression models. For fixed $p$, they showed that the COSSO estimator in the additive model converges at the rate $n^{-d/(2d+1)}$, where $d$ is the order of smoothness of the components. They also showed that, in the special case of a tensor product design, the COSSO correctly selects the nonzero additive components with high probability. Zhang and Lin (2006) considered the COSSO for nonparametric regression in exponential families.

Meier, van de Geer, and Bühlmann (2008) treat variable selection in a nonparametric additive model in which the numbers of zero and nonzero $f_j$'s may both be larger than $n$. They propose a penalized least-squares estimator for variable selection and estimation. They give conditions under which, with probability approaching 1, their procedure selects a set of $f_j$'s containing all the additive components whose distance from zero in a certain metric exceeds a specified threshold. However, they do not establish model-selection consistency of their procedure. Even asymptotically, the selected set may be larger than the set of nonzero $f_j$'s. Moreover, they impose a compatibility condition that relates the levels and smoothness of the $f_j$'s. The compatibility condition does not have a straightforward, intuitive interpretation and, as they point out, cannot be checked empirically.
Ravikumar, Liu, Lafferty and Wasserman (2009) proposed a penalized approach for variable selection in nonparametric additive models. In their approach, the penalty is imposed on the $\ell_2$ norm of the nonparametric components, as well as on the mean value of the components, to ensure identifiability. In their theoretical results, they require that the eigenvalues of a design matrix be bounded away from zero and infinity, where the design matrix is formed from the basis functions for the nonzero components. It is not clear whether this condition holds in general, especially when the number of nonzero components diverges with $n$. Another critical condition required in the results of Ravikumar et al. (2009) is similar to the irrepresentable condition of Zhao and Yu (2007). It is not clear for what type of basis functions this condition is satisfied. We do not require such a condition in our results on selection consistency of the adaptive group Lasso.
Several other recent papers have also considered variable selection in nonparametric models. For example, Wang, Chen and Li (2007) and Wang and Xia (2008) considered the use of group Lasso and SCAD methods for model selection and estimation in varying coefficient models with a fixed number of coefficients and covariates. Bach (2007) applies what amounts to the group Lasso to a nonparametric additive model with a fixed number of covariates. He established model selection consistency under conditions that are considerably more complicated than the ones we require for a possibly diverging number of covariates.

In this paper, we propose to use the adaptive group Lasso for variable selection in (1) based on a spline approximation to the nonparametric components. With this approximation, each nonparametric component is represented by a linear combination of spline basis functions. Consequently, the problem of component selection becomes that of selecting the groups of coefficients in the linear combinations. It is natural to apply the group Lasso method, since it is desirable to take into account the grouping structure in the approximating model. To achieve model selection consistency, we apply the group Lasso iteratively as follows. First, we use the group Lasso to obtain an initial estimator and reduce the dimension of the problem. Then we use the adaptive group Lasso to select the final set of nonparametric components. The adaptive group Lasso is a simple generalization of the adaptive Lasso (Zou 2006) to the method of the group Lasso (Yuan and Lin 2006). However, here we apply this approach to nonparametric additive modeling.

We assume that the number of nonzero $f_j$'s is fixed. This enables us to achieve model selection consistency under simple assumptions that are easy to interpret. We do not have to impose compatibility or irrepresentable conditions, nor do we need to assume conditions on the eigenvalues of certain matrices formed from the spline basis functions.
We show that the group Lasso selects a model whose number of components is bounded, with probability approaching one, by a constant that is independent of the sample size. Then, using the group Lasso result as the initial estimator, the adaptive group Lasso selects the correct model with probability approaching 1 and achieves the optimal rate of convergence for nonparametric estimation of an additive model.

The remainder of the paper is organized as follows. Section 2 describes the group Lasso and the adaptive group Lasso for variable selection in nonparametric additive models. Section 3 presents the asymptotic properties of these methods in "large $p$, small $n$" settings. Section 4 presents the results of simulation studies that evaluate the finite-sample performance of these methods. Section 5 provides an illustrative application, and Section 6 contains concluding remarks. Proofs of the results stated in Section 3 are given in the Appendix.
2 Adaptive group Lasso in nonparametric additive models

We describe a two-step approach that uses the group Lasso for variable selection based on a spline representation of each component in additive models. In the first step, we use the standard group Lasso to achieve an initial reduction of the dimension of the model and obtain an initial estimator of the nonparametric components. In the second step, we use the adaptive group Lasso to achieve consistent selection.

Suppose that each $X_j$ takes values in $[a, b]$, where $a < b$ are finite numbers. To ensure unique identification of the $f_j$'s, we assume that $E f_j(X_j) = 0$, $1 \le j \le p$. Let $a = \xi_0 < \xi_1 < \cdots < \xi_K < \xi_{K+1} = b$ be a partition of $[a, b]$ into $K$ subintervals $I_{Kt} = [\xi_t, \xi_{t+1})$, $t = 0, \ldots, K - 1$, and $I_{KK} = [\xi_K, \xi_{K+1}]$, where $K \equiv K_n = n^v$, with $0 < v < 0.5$, is a positive integer such that $\max_{1 \le k \le K+1} |\xi_k - \xi_{k-1}| = O(n^{-v})$. Let $S_n$ be the space of polynomial splines of degree $l \ge 1$ consisting of functions $s$ satisfying: (i) the restriction of $s$ to $I_{Kt}$ is a polynomial of degree $l$ for $1 \le t \le K$; (ii) for $l \ge 2$ and $0 \le l' \le l - 2$, $s$ is $l'$ times continuously differentiable on $[a, b]$. This definition is phrased after Stone (1985) and is a descriptive version of Schumaker (1981, page 108, Definition 4.1).

There exists a normalized B-spline basis $\{\phi_k, 1 \le k \le m_n\}$ for $S_n$, where $m_n = K_n + l$ (Schumaker 1981). Thus for any $f_{nj} \in S_n$, we can write

    f_{nj}(x) = \sum_{k=1}^{m_n} \beta_{jk} \phi_k(x),  1 \le j \le p.    (2)

Under suitable smoothness assumptions, the $f_j$'s can be well approximated by functions in $S_n$. Accordingly, the variable selection method described in this paper is based on the representation (2).

Let $\|a\|_2 = (\sum_{j=1}^m |a_j|^2)^{1/2}$ denote the $\ell_2$ norm of any vector $a \in \mathbb{R}^m$. Let $\beta_{nj} = (\beta_{j1}, \ldots, \beta_{j m_n})'$ and $\beta_n = (\beta_{n1}', \ldots, \beta_{np}')'$. Let $w_n = (w_{n1}, \ldots, w_{np})'$ be a given vector of weights, where $0 \le w_{nj} \le \infty$, $1 \le j \le p$. Consider the penalized least squares criterion

    L_n(\mu, \beta_n) = \sum_{i=1}^n \Big[ Y_i - \mu - \sum_{j=1}^p \sum_{k=1}^{m_n} \beta_{jk} \phi_k(X_{ij}) \Big]^2 + \lambda_n \sum_{j=1}^p w_{nj} \|\beta_{nj}\|_2,    (3)

where $\lambda_n$ is a penalty parameter. We study the estimators that minimize $L_n(\mu, \beta_n)$ subject to the constraints

    \sum_{i=1}^n \sum_{k=1}^{m_n} \beta_{jk} \phi_k(X_{ij}) = 0,  1 \le j \le p.    (4)
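As a concrete illustration, the series representation (2) together with sample-centering of the basis functions (which enforces the constraint (4)) can be sketched in Python. This is our own sketch, not the authors' implementation; the function name and the evenly spaced knot placement are illustrative assumptions.

```python
import numpy as np
from scipy.interpolate import BSpline

def centered_bspline_design(x, n_knots=6, degree=3):
    """Build the n x m_n matrix of centered B-spline basis values for one
    covariate: column k holds psi_k(x_i) = phi_k(x_i) - mean_i phi_k(x_i)."""
    a, b = float(x.min()), float(x.max())
    inner = np.linspace(a, b, n_knots)[1:-1]                 # interior knots
    t = np.r_[[a] * (degree + 1), inner, [b] * (degree + 1)]  # clamped knot vector
    m = len(t) - degree - 1                                   # number of basis functions
    # phi_k is the B-spline with a unit coefficient on the kth basis function
    Phi = np.column_stack([BSpline(t, np.eye(m)[k], degree)(x) for k in range(m)])
    return Phi - Phi.mean(axis=0)   # centering: each column has sample mean zero
```

Stacking such matrices column-wise over the $p$ covariates yields the total design matrix $Z$ used in (6) below.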
These centering constraints are sample analogs of the identifying restriction $E f_j(X_j) = 0$, $1 \le j \le p$. We can convert (3)-(4) to an unconstrained optimization problem by centering the response and the basis functions. Let

    \bar{\phi}_{jk} = \frac{1}{n} \sum_{i=1}^n \phi_k(X_{ij}),  \psi_{jk}(x) = \phi_k(x) - \bar{\phi}_{jk}.    (5)

For simplicity, and without causing confusion, we simply write $\psi_k(x) = \psi_{jk}(x)$. Define $Z_{ij} = (\psi_1(X_{ij}), \ldots, \psi_{m_n}(X_{ij}))'$, so that $Z_{ij}$ consists of the values of the (centered) basis functions at the $i$th observation of the $j$th covariate. Let $Z_j = (Z_{1j}, \ldots, Z_{nj})'$ be the $n \times m_n$ design matrix corresponding to the $j$th covariate. The total design matrix is $Z = (Z_1, \ldots, Z_p)$. Let $Y = (Y_1 - \bar{Y}, \ldots, Y_n - \bar{Y})'$. With this notation, we can write

    L_n(\beta_n; \lambda_n) = \|Y - Z\beta_n\|_2^2 + \lambda_n \sum_{j=1}^p w_{nj} \|\beta_{nj}\|_2.    (6)

Here we have dropped $\mu$ from the argument of $L_n$; with the centering, $\hat{\mu} = \bar{Y}$. Then minimizing (3) subject to (4) is equivalent to minimizing (6) with respect to $\beta_n$, but the centering constraints are not needed for (6).

We now describe the two-step approach to component selection in the nonparametric additive model (1).

Step 1. Compute the group Lasso estimator. Let

    L_{n1}(\beta_n; \lambda_{n1}) = \|Y - Z\beta_n\|_2^2 + \lambda_{n1} \sum_{j=1}^p \|\beta_{nj}\|_2.

This objective function is the special case of (6) that is obtained by setting $w_{nj} = 1$, $1 \le j \le p$. The group Lasso estimator is $\tilde{\beta}_n \equiv \tilde{\beta}_n(\lambda_{n1}) = \arg\min_{\beta_n} L_{n1}(\beta_n; \lambda_{n1})$.

Step 2. Use the group Lasso estimator $\tilde{\beta}_n$ to obtain the weights by setting

    w_{nj} = \|\tilde{\beta}_{nj}\|_2^{-1}  if  \|\tilde{\beta}_{nj}\|_2 > 0,  and  w_{nj} = \infty  if  \|\tilde{\beta}_{nj}\|_2 = 0.
The adaptive group Lasso objective function is

    L_{n2}(\beta_n; \lambda_{n2}) = \|Y - Z\beta_n\|_2^2 + \lambda_{n2} \sum_{j=1}^p w_{nj} \|\beta_{nj}\|_2.

Here we define $0 \cdot \infty = 0$; thus the components not selected by the group Lasso are not included in Step 2. The adaptive group Lasso estimator is $\hat{\beta}_n \equiv \hat{\beta}_n(\lambda_{n2}) = \arg\min_{\beta_n} L_{n2}(\beta_n; \lambda_{n2})$. Finally, the adaptive group Lasso estimators of $\mu$ and $f_j$ are

    \hat{\mu}_n = \bar{Y} = \frac{1}{n} \sum_{i=1}^n Y_i,  \hat{f}_{nj}(x) = \sum_{k=1}^{m_n} \hat{\beta}_{jk} \psi_k(x),  1 \le j \le p.

3 Main results

This section presents our results on the asymptotic properties of the estimators defined in Steps 1 and 2 of Section 2. Let $k$ be a nonnegative integer, and let $\alpha \in (0, 1]$ be such that $d = k + \alpha > 0.5$. Let $\mathcal{F}$ be the class of functions $f$ on $[a, b]$ whose $k$th derivative $f^{(k)}$ exists and satisfies a Lipschitz condition of order $\alpha$: $|f^{(k)}(s) - f^{(k)}(t)| \le C |s - t|^{\alpha}$ for $s, t \in [a, b]$.

In (1), without loss of generality, suppose that the first $q$ components are nonzero, that is, $f_j(x) \not\equiv 0$ for $1 \le j \le q$, but $f_j(x) \equiv 0$ for $q + 1 \le j \le p$. Let $A_1 = \{1, \ldots, q\}$ and $A_0 = \{q + 1, \ldots, p\}$. Define $\|f\|_2 = [\int_a^b f^2(x)\,dx]^{1/2}$ for any function $f$, whenever the integral exists. We make the following assumptions.

(A1) The number of nonzero components $q$ is fixed and there is a constant $c_f > 0$ such that $\min_{1 \le j \le q} \|f_j\|_2 \ge c_f$.

(A2) The random variables $\varepsilon_1, \ldots, \varepsilon_n$ are independent and identically distributed with $E\varepsilon_i = 0$ and $\mathrm{Var}(\varepsilon_i) = \sigma^2$. Furthermore, their tail probabilities satisfy $P(|\varepsilon_i| > x) \le K \exp(-C x^2)$, $i = 1, \ldots, n$, for all $x \ge 0$ and for constants $C$ and $K$.

(A3) $E f_j(X_j) = 0$ and $f_j \in \mathcal{F}$, $j = 1, \ldots, q$.

(A4) The covariate vector $X$ has a continuous density and there exist constants $C_1$ and $C_2$ such that the density function $g_j$ of $X_j$ satisfies $0 < C_1 \le g_j(x) \le C_2 < \infty$ on $[a, b]$ for every $1 \le j \le p$.

We note that (A1), (A3), and (A4) are standard conditions for nonparametric additive models. They would be needed to estimate the nonzero additive components at the optimal $\ell_2$ rate of
convergence on $[a, b]$, even if $q$ were fixed and known. Only (A2) strengthens the assumptions needed for nonparametric estimation of a nonparametric additive model. While condition (A1) is reasonable in most applications, it would be interesting to relax this condition and investigate the case when the number of nonzero components can also increase with the sample size. The only technical reason that we assume this condition is related to Lemma 3 given in the Appendix, which is concerned with the properties of the smallest and largest eigenvalues of the design matrix formed from the spline basis functions. If this lemma can be extended to the case of a divergent number of components, then (A1) can be relaxed. However, it is clear that there needs to be a restriction on the number of nonzero components to ensure model identification.

3.1 Estimation consistency of the group Lasso

In this section, we consider the selection and estimation properties of the group Lasso estimator. Define $\tilde{A}_1 = \{j : \|\tilde{\beta}_{nj}\|_2 \neq 0, 1 \le j \le p\}$. Let $|A|$ denote the cardinality of any set $A \subseteq \{1, \ldots, p\}$.

Theorem 1 Suppose that (A1) to (A4) hold and $\lambda_{n1} \ge C \sqrt{n \log(p m_n)}$ for a sufficiently large constant $C$.
(i) With probability converging to 1, $|\tilde{A}_1| \le M_1 |A_1| = M_1 q$ for a finite constant $M_1 > 1$.
(ii) If $m_n^2 \log(p m_n)/n \to 0$ and $\lambda_{n1}^2 m_n / n^2 \to 0$ as $n \to \infty$, then all the nonzero $\beta_{nj}$, $1 \le j \le q$, are selected with probability converging to one.
(iii)

    \sum_{j=1}^p \|\tilde{\beta}_{nj} - \beta_{nj}\|_2^2 = O_p\Big( \frac{m_n \log(p m_n)}{n} \Big) + O_p\Big( \frac{m_n}{n} \Big) + O\Big( \frac{1}{m_n^{2d-1}} \Big) + O\Big( \frac{4 m_n^2 \lambda_{n1}^2}{n^2} \Big).

Part (i) of Theorem 1 says that, with probability approaching 1, the group Lasso selects a model whose dimension is a constant multiple of the number of nonzero additive components $f_j$, regardless of the number of additive components that are zero. Part (ii) implies that every nonzero coefficient group will be selected with high probability. Part (iii) shows that the difference between the coefficients in the spline representation of the nonparametric functions in (1) and their estimators converges to zero in probability.
The rate of convergence is determined by four terms: the stochastic error in estimating the nonparametric components (the first term) and the intercept $\mu$ (the second term), the spline approximation error (the third term), and the bias due to penalization (the fourth term). Let $\tilde{f}_{nj}(x) = \sum_{k=1}^{m_n} \tilde{\beta}_{jk} \psi_k(x)$, $1 \le j \le p$. The following theorem is a consequence of Theorem 1.
Theorem 2 Suppose that (A1) to (A4) hold and that $\lambda_{n1} \ge C \sqrt{n \log(p m_n)}$ for a sufficiently large constant $C$. Then:
(i) Let $\tilde{A}_f = \{j : \|\tilde{f}_{nj}\|_2 > 0, 1 \le j \le p\}$. There is a constant $M_1 > 1$ such that, with probability converging to 1, $|\tilde{A}_f| \le M_1 q$.
(ii) If $m_n \log(p m_n)/n \to 0$ and $\lambda_{n1}^2 m_n / n^2 \to 0$ as $n \to \infty$, then all the nonzero additive components $f_j$, $1 \le j \le q$, are selected with probability converging to one.
(iii)

    \|\tilde{f}_{nj} - f_j\|_2^2 = O_p\Big( \frac{m_n \log(p m_n)}{n} \Big) + O_p\Big( \frac{1}{n} \Big) + O\Big( \frac{1}{m_n^{2d}} \Big) + O\Big( \frac{4 m_n \lambda_{n1}^2}{n^2} \Big),  j \in \tilde{A}_2,

where $\tilde{A}_2 = A_1 \cup \tilde{A}_1$.

Thus, under the conditions of Theorem 2, the group Lasso selects all the nonzero additive components with high probability. Part (iii) of the theorem gives the rate of convergence of the group Lasso estimator of the nonparametric components. For any two sequences $\{a_n, b_n, n = 1, 2, \ldots\}$, we write $a_n \asymp b_n$ if there are constants $0 < c_1 < c_2 < \infty$ such that $c_1 \le a_n / b_n \le c_2$ for all sufficiently large $n$. We now state a useful corollary of Theorem 2.

Corollary 1 Suppose that (A1) to (A4) hold. If $\lambda_{n1} \asymp \sqrt{n \log(p m_n)}$ and $m_n \asymp n^{1/(2d+1)}$, then:
(i) If $n^{-2d/(2d+1)} \log(p) \to 0$ as $n \to \infty$, then with probability converging to one, all the nonzero components $f_j$, $1 \le j \le q$, are selected, and the number of selected components is no more than $M_1 q$.
(ii) $\|\tilde{f}_{nj} - f_j\|_2^2 = O_p\big( n^{-2d/(2d+1)} \log(p m_n) \big)$, $j \in \tilde{A}_2$.

For the $\lambda_{n1}$ and $m_n$ given in Corollary 1, the number of zero components can be as large as $\exp(o(n^{2d/(2d+1)}))$. For example, if each $f_j$ has a continuous second derivative ($d = 2$), then it is $\exp(o(n^{4/5}))$, which can be much larger than $n$.

3.2 Selection consistency of the adaptive group Lasso

We now consider the properties of the adaptive group Lasso. We first state a general result concerning the selection consistency of the adaptive group Lasso, assuming that an initial consistent estimator is available. We then apply it to the case when the group Lasso is used as the initial estimator. We make the following assumptions.
(B1) The initial estimators $\tilde{\beta}_{nj}$ are $r_n$-consistent at zero:

    r_n \max_{j \in A_0} \|\tilde{\beta}_{nj}\|_2 = O_P(1),  r_n \to \infty,

and there exists a constant $c_b > 0$ such that

    P\big( \min_{j \in A_1} \|\tilde{\beta}_{nj}\|_2 \ge c_b b_{n1} \big) \to 1,  where  b_{n1} = \min_{j \in A_1} \|\beta_{nj}\|_2.

(B2) Let $q$ be the number of nonzero components and $s_n = p - q$ be the number of zero components. Suppose that

    (a)  \frac{n^{1/2} \log^{1/2}(s_n m_n)}{\lambda_{n2} r_n} + \frac{m_n}{\lambda_{n2} r_n} + \frac{\lambda_{n2} m_n^{1/4}}{n} = o(1),

    (b)  \frac{n^{1/2}}{\lambda_{n2} r_n m_n^{(2d+1)/2}} = o(1).

We state condition (B1) for a general initial estimator, to highlight the point that the availability of an $r_n$-consistent estimator at zero is crucial for the adaptive group Lasso to be selection consistent. In other words, any initial estimator satisfying (B1) will ensure that the adaptive group Lasso (based on this initial estimator) is selection consistent, provided that certain regularity conditions are satisfied. We note that it follows immediately from Theorem 1 that the group Lasso estimator satisfies (B1). We will come back to this point below.

For $\hat{\beta}_n \equiv (\hat{\beta}_{n1}, \ldots, \hat{\beta}_{np})$ and $\beta_n \equiv (\beta_{n1}, \ldots, \beta_{np})$, we say $\hat{\beta}_n =_0 \beta_n$ if $\mathrm{sgn}_0(\|\hat{\beta}_{nj}\|) = \mathrm{sgn}_0(\|\beta_{nj}\|)$, $1 \le j \le p$, where $\mathrm{sgn}_0(|x|) = 1$ if $|x| > 0$ and $= 0$ if $|x| = 0$.

Theorem 3 Suppose that conditions (B1), (B2) and (A1)-(A4) hold. Then
(i) $P(\hat{\beta}_n =_0 \beta_n) \to 1$.
(ii)

    \sum_{j=1}^q \|\hat{\beta}_{nj} - \beta_{nj}\|_2^2 = O_p\Big( \frac{m_n}{n} \Big) + O_p\Big( \frac{1}{n} \Big) + O\Big( \frac{1}{m_n^{2d-1}} \Big) + O\Big( \frac{4 m_n^2 \lambda_{n2}^2}{n^2} \Big).

This theorem is concerned with the selection and estimation properties of the adaptive group Lasso in terms of $\beta_n$. The following theorem states the results in terms of the estimators of the nonparametric components.
Theorem 4 Suppose that conditions (B1), (B2) and (A1)-(A4) hold. Then
(i) $P\big( \|\hat{f}_{nj}\|_2 > 0, j \in A_1, \text{ and } \|\hat{f}_{nj}\|_2 = 0, j \in A_0 \big) \to 1$.
(ii)

    \sum_{j=1}^q \|\hat{f}_{nj} - f_j\|_2^2 = O_p\Big( \frac{m_n}{n} \Big) + O_p\Big( \frac{1}{n} \Big) + O\Big( \frac{1}{m_n^{2d}} \Big) + O\Big( \frac{4 m_n \lambda_{n2}^2}{n^2} \Big).

Part (i) of this theorem states that the adaptive group Lasso can consistently distinguish nonzero components from zero components. Part (ii) gives an upper bound on the rate of convergence of the estimator.

We now apply the above results to our proposed procedure described in Section 2, in which we first obtain the group Lasso estimator and then use it as the initial estimator in the adaptive group Lasso. By Theorem 1, if $\lambda_{n1} \asymp \sqrt{n \log(p m_n)}$ and $m_n \asymp n^{1/(2d+1)}$ for $d \ge 1$, then the group Lasso estimator satisfies (B1) with $r_n \asymp n^{d/(2d+1)} / \sqrt{\log(p m_n)}$. In this case, (B2) simplifies to

    \frac{\lambda_{n2}}{n^{(8d+3)/(8d+4)}} = o(1)  and  \frac{n^{1/(4d+2)} \log^{1/2}(p m_n)}{\lambda_{n2}} = o(1).    (7)

We summarize the above discussion in the following corollary.

Corollary 2 Let the group Lasso estimator $\tilde{\beta}_n \equiv \tilde{\beta}_n(\lambda_{n1})$ with $\lambda_{n1} \asymp \sqrt{n \log(p m_n)}$ and $m_n \asymp n^{1/(2d+1)}$ be the initial estimator in the adaptive group Lasso. Suppose that the conditions of Theorem 1 hold. If $\lambda_{n2} = O(n^{1/2})$ and satisfies (7), then the adaptive group Lasso consistently selects the nonzero components in (1), that is, part (i) of Theorem 4 holds. In addition,

    \sum_{j=1}^q \|\hat{f}_{nj} - f_j\|_2^2 = O_p\big( n^{-2d/(2d+1)} \big).

This corollary follows directly from Theorems 1 and 4. The largest $\lambda_{n2}$ allowed is $\lambda_{n2} = O(n^{1/2})$. With this $\lambda_{n2}$, the first equation in (7) is satisfied. Substituting it into the second equation in (7), we obtain $p = \exp(o(n^{2d/(2d+1)}))$, which is the largest $p$ permitted and can be larger than $n$. Thus, under the conditions of this corollary, our proposed adaptive group Lasso estimator using the group Lasso as the initial estimator is selection consistent and achieves the optimal rate of convergence even when $p$ is larger than $n$. Following model selection, oracle-efficient, asymptotically normal estimators of the nonzero components can be obtained by using existing methods.
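To make the two-step procedure of Section 2 concrete, the following Python sketch minimizes the group Lasso objective (6) by a simple proximal gradient (ISTA) scheme and then reruns it with the Step 2 weights. This solver and all names are our own simplified illustration under the stated objective; the paper's own computations use the algorithm of Yuan and Lin (2006), not this code.

```python
import numpy as np

def group_lasso(Z, y, groups, lam, weights=None, n_iter=1000):
    """Proximal-gradient minimizer of ||y - Z b||_2^2 + lam * sum_j w_j ||b_j||_2."""
    w = np.ones(len(groups)) if weights is None else np.asarray(weights, float)
    step = 1.0 / (2 * np.linalg.norm(Z, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    b = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        v = b - step * 2 * Z.T @ (Z @ b - y)       # gradient step on the squared error
        for j, idx in enumerate(groups):
            nrm = np.linalg.norm(v[idx])
            thr = step * lam * w[j]                # w_j = inf removes the group outright
            b[idx] = 0.0 if nrm <= thr else (1 - thr / nrm) * v[idx]
    return b

def adaptive_group_lasso(Z, y, groups, lam1, lam2):
    """Step 1: group Lasso initial estimator; Step 2: weighted (adaptive) group Lasso."""
    b_init = group_lasso(Z, y, groups, lam1)
    norms = [np.linalg.norm(b_init[idx]) for idx in groups]
    w = np.array([1.0 / nv if nv > 0 else np.inf for nv in norms])
    return group_lasso(Z, y, groups, lam2, weights=w)
```

Groups eliminated in Step 1 receive infinite weights and are therefore excluded in Step 2, mirroring the convention $0 \cdot \infty = 0$ above.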
4 Simulation studies

We use simulation to evaluate the performance of the adaptive group Lasso with regard to variable selection. The generating model is

    y_i = f(x_i) + \varepsilon_i \equiv \sum_{j=1}^p f_j(x_{ij}) + \varepsilon_i,  i = 1, \ldots, n.    (8)

Since $p$ can be larger than $n$, we consider two ways to select the penalty parameter: the BIC (Schwarz 1978) and the EBIC (Chen and Chen 2008). The BIC is defined as

    BIC(\lambda) = \log(RSS_\lambda) + df_\lambda \cdot \frac{\log n}{n}.

Here $RSS_\lambda$ is the residual sum of squares for a given $\lambda$, and the degrees of freedom $df_\lambda = \hat{q}_\lambda m_n$, where $\hat{q}_\lambda$ is the number of nonzero estimated components for the given $\lambda$. The EBIC is defined as

    EBIC(\lambda) = \log(RSS_\lambda) + df_\lambda \cdot \frac{\log n}{n} + \nu \, df_\lambda \cdot \frac{\log p}{n},

where $0 \le \nu \le 1$ is a constant.

We have also considered two other possible ways of defining $df$: (a) using the trace of a linear smoother based on a quadratic approximation; (b) using the number of estimated nonzero components. We have decided to use the definition given above based on the results from our simulations. We note that the $df$ for the group Lasso of Yuan and Lin (2006) requires an initial (least squares) estimator, which is not available when $p > n$. Thus their $df$ is not applicable to our problem.

In our simulation example, we compare the adaptive group Lasso with the group Lasso and the ordinary Lasso. Here the ordinary Lasso estimator is defined as the value that minimizes

    \|Y - Z\beta_n\|_2^2 + \lambda_n \sum_{j=1}^p \sum_{k=1}^{m_n} |\beta_{jk}|.

This simple application of the Lasso does not take into account the grouping structure in the spline expansions of the components. The group Lasso and the adaptive group Lasso estimates are computed using the algorithm proposed by Yuan and Lin (2006). The ordinary Lasso estimates are computed using the Lars algorithm (Efron et al. 2004). The group Lasso is used as the initial estimate for the adaptive group Lasso. We also compare the results from the nonparametric additive modeling with those from the
standard linear regression model with the Lasso. We note that this is not a fair comparison, because the generating model is highly nonlinear. Our purpose is to illustrate that it is necessary to use nonparametric models when the underlying model deviates substantially from linearity in the context of variable selection with high-dimensional data, and that model misspecification can lead to bad selection results.

Example 1. We generate data from the model

    y_i = f(x_i) + \varepsilon_i \equiv \sum_{j=1}^p f_j(x_{ij}) + \varepsilon_i,  i = 1, \ldots, n,

where

    f_1(t) = 5t,  f_2(t) = 3(2t - 1)^2,  f_3(t) = \frac{4 \sin(2\pi t)}{2 - \sin(2\pi t)},

    f_4(t) = 6\big( 0.1 \sin(2\pi t) + 0.2 \cos(2\pi t) + 0.3 \sin^2(2\pi t) + 0.4 \cos^3(2\pi t) + 0.5 \sin^3(2\pi t) \big),

and $f_5(t) = \cdots = f_p(t) = 0$. Thus the number of nonzero functions is $q = 4$. This generating model is the same as Example 1 of Lin and Zhang (2006); however, here we use this model in high-dimensional settings. We consider the cases where $p = 1000$ and three different sample sizes: $n = 50$, $100$ and $200$. We use the cubic B-spline with six evenly distributed knots for all the functions $f_k$. The number of replications in all the simulations is 400.

The covariates are simulated as follows. First, we generate $w_{i1}, \ldots, w_{ip}, u_i, v_i$ independently from $N(0, 1)$ truncated to the interval $[0, 1]$, $i = 1, \ldots, n$. Then we set $x_{ik} = (w_{ik} + t u_i)/(1 + t)$ for $k = 1, \ldots, 4$ and $x_{ik} = (w_{ik} + t v_i)/(1 + t)$ for $k = 5, \ldots, p$, where the parameter $t$ controls the amount of correlation among predictors. We have $\mathrm{Corr}(x_{ik}, x_{ij}) = t^2/(1 + t^2)$ for $1 \le j, k \le 4$ and for $5 \le j, k \le p$, but the covariates of the nonzero components and the zero components are independent. We consider $t = 0, 1$ in our simulations.

The signal-to-noise ratio is defined to be $\mathrm{sd}(f)/\mathrm{sd}(\varepsilon)$. The variance of the normal error term $\varepsilon_i$ is chosen to give a signal-to-noise ratio (SNR) of 3.11:1. This value is the same as the estimated SNR in the real data example below, computed as the square root of the ratio of the sum of squared estimated components to the sum of squared residuals. The results of the 400 Monte Carlo replications are summarized in Table 1.
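The data-generating design of Example 1 can be sketched in Python as follows. The component functions follow Example 1 of Lin and Zhang (2006) as described in the text; the function names and the use of scipy's truncnorm are our own illustrative choices, not the authors' code.

```python
import numpy as np
from scipy.stats import truncnorm

def f1(t): return 5 * t
def f2(t): return 3 * (2 * t - 1) ** 2
def f3(t): return 4 * np.sin(2 * np.pi * t) / (2 - np.sin(2 * np.pi * t))
def f4(t):
    s, c = np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)
    return 6 * (0.1 * s + 0.2 * c + 0.3 * s**2 + 0.4 * c**3 + 0.5 * s**3)

def simulate_example1(n, p, t, rng):
    """Covariates as in Example 1: N(0,1) truncated to [0,1], with correlation
    t^2/(1+t^2) within the nonzero block and within the zero block."""
    w = truncnorm.rvs(0, 1, size=(n, p), random_state=rng)  # truncation limits [0, 1]
    u = truncnorm.rvs(0, 1, size=(n, 1), random_state=rng)  # shared by x_1, ..., x_4
    v = truncnorm.rvs(0, 1, size=(n, 1), random_state=rng)  # shared by x_5, ..., x_p
    x = np.empty((n, p))
    x[:, :4] = (w[:, :4] + t * u) / (1 + t)
    x[:, 4:] = (w[:, 4:] + t * v) / (1 + t)
    fx = f1(x[:, 0]) + f2(x[:, 1]) + f3(x[:, 2]) + f4(x[:, 3])
    return x, fx
```

Adding normal noise scaled to the desired SNR to the returned conditional mean `fx` completes one simulated data set.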
The columns give the mean number of variables selected (NV), the model error (ER), the percentage of replications in which all the correct additive components are included in the selected model (IN), and the percentage of replications in which precisely the correct components are selected (CS). The corresponding standard errors are in parentheses. The model error is computed as the average of $n^{-1} \sum_{i=1}^n [\hat{f}(x_i) - f(x_i)]^2$ over the 400 Monte Carlo replications, where $f$ is the true conditional mean function.

Table 1 shows that the adaptive group Lasso selects all the nonzero components (IN) and selects exactly the correct model (CS) more frequently than the other methods do. For example,
with the BIC and $n = 200$, the percentage of correct selections (CS) by the adaptive group Lasso ranges from 65.25% to 81%, which is much higher than the corresponding ranges for the group Lasso and for the ordinary Lasso. The adaptive group Lasso and the group Lasso perform better than the ordinary Lasso in all of the experiments, which illustrates the importance of taking account of the group structure of the coefficients of the spline expansion. Correlation among covariates increases the difficulty of component selection, so it is not surprising that all methods perform better with independent covariates than with correlated ones. The percentage of correct selections increases as the sample size increases. The linear model with the Lasso never selects the correct model. This illustrates the poor results that can be produced by a linear model when the true conditional mean function is nonlinear.

Table 1 also shows that the model error (ME) of the group Lasso is only slightly larger than that of the adaptive group Lasso. The models selected by the group Lasso nest those selected by the adaptive group Lasso and, therefore, have more estimated coefficients than the models selected by the adaptive group Lasso. Consequently, the group Lasso estimators of the conditional mean function have a larger variance and a larger ME. The differences between the MEs of the two methods are small, however, because, as can be seen from the NV column, the models selected by the group Lasso in our experiments have only slightly more estimated coefficients than the models selected by the adaptive group Lasso.

Example 2. We now compare the adaptive group Lasso with the COSSO (Lin and Zhang 2006). This comparison was suggested to us by the Associate Editor. Because the COSSO algorithm only works for the case when $p$ is smaller than $n$, we use the same setup as in Example 1 of Lin and Zhang (2006). In this example, the generating model is as in (8) with 4 nonzero components. Let $X_j = (W_j + tU)/(1 + t)$, $j = 1, \ldots, p$, where $W_1, \ldots, W_p$ and $U$ are i.i.d. from $N(0, 1)$ truncated to the interval $[0, 1]$. Therefore $\mathrm{Corr}(X_j, X_k) = t^2/(1 + t^2)$ for $j \neq k$.
The random error term $\varepsilon$ is normal with mean zero and variance chosen so that the SNR is 3:1. We consider three different sample sizes, $n = 50$, $100$ or $200$, and three different numbers of predictors, $p = 10$, $20$ or $50$. The COSSO estimator is computed using the Matlab software that is publicly available at hzhang/cosso.html. The COSSO procedure uses either generalized cross-validation or 5-fold cross-validation. Based on the simulation results of Lin and Zhang (2006) and our own simulations, the COSSO with 5-fold cross-validation has better selection performance. Thus we compare the adaptive group Lasso with the BIC or EBIC to the COSSO with 5-fold cross-validation. The results are given in Table 2.

For independent predictors, when $n = 200$ and $p = 10$, $20$ or $50$, the adaptive group Lasso and the COSSO have similar performance in terms of selection accuracy and model error. However, for smaller $n$ and larger $p$, the adaptive group Lasso does significantly better. For example, for $n = 100$ and $p = 50$, the percentage of correct selection for the adaptive group Lasso is 81-83%,
but it is only 11% for the COSSO. The model error of the adaptive group Lasso is similar to or smaller than that of the COSSO. In several experiments, the model error of the COSSO is 2 to more than 7 times larger than that of the adaptive group Lasso. It is interesting to note that when $n = 50$ and $p = 20$ or $50$, the adaptive group Lasso still does a decent job of selecting the correct model, but the COSSO does poorly in these two cases. In particular, for $n = 50$ and $p = 50$, the COSSO did not select the exactly correct model in any of the simulation runs. For dependent predictors, the comparison is even more favorable to the adaptive group Lasso, which performs significantly better than the COSSO in terms of both model error and selection accuracy in all the cases.

5 Data example

We use the data set reported in Scheetz et al. (2006) to illustrate the application of the proposed method in high-dimensional settings. For this data set, 120 twelve-week-old male rats were selected for tissue harvesting from the eyes and for microarray analysis. The microarrays used to analyze the RNA from the eyes of these animals contain over 31,042 different probe sets (Affymetrix GeneChip Rat Genome Array). The intensity values were normalized using the robust multi-chip averaging method (Irizarry et al. 2003) to obtain summary expression values for each probe set. Gene expression levels were analyzed on a logarithmic scale. We are interested in finding the genes that are related to the gene TRIM32. This gene was recently found to cause Bardet-Biedl syndrome (Chiang et al. 2006), which is a genetically heterogeneous disease of multiple organ systems including the retina. Although over 30,000 probe sets are represented on the Rat Genome Array, many of them are not expressed in the eye tissue, and an initial screening using correlation shows that most probe sets have very low correlation with TRIM32. In addition, we expect only a small number of genes to be related to TRIM32. Therefore, we use the 500 probe sets that are expressed in the eye and have the highest marginal correlation with TRIM32 in the analysis.
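The marginal-correlation screening step described above can be sketched as follows; `screen_by_correlation` is a hypothetical helper name, and this is our illustration of the idea rather than the authors' code.

```python
import numpy as np

def screen_by_correlation(X, y, n_keep=500):
    """Return the column indices of the n_keep covariates with the largest
    absolute marginal (Pearson) correlation with the response y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = np.abs(Xc.T @ yc) / denom      # |corr(X_j, y)| for every column j
    return np.sort(np.argsort(corr)[-n_keep:])
```

Applied to the full expression matrix with the TRIM32 probe as response, this reduces the problem to the $p = 500$ retained probe sets before the additive model is fit.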
Thus the sample size is $n = 120$ (i.e., there are 120 arrays from 120 rats) and $p = 500$. It is expected that only a few genes are related to TRIM32, so this is a sparse, high-dimensional regression problem. We use the nonparametric additive model to describe the relation between the expression of TRIM32 and those of the 500 genes, and we estimate model (1) using the ordinary Lasso, the group Lasso, and the adaptive group Lasso for the nonparametric additive model. To compare the results of the nonparametric additive model with those of the linear regression model, we also analyzed the data using the linear regression model with the Lasso. We scale the covariates so that their values are
between 0 and 1 and use cubic splines with six evenly distributed knots to estimate the additive components. The penalty parameters in all the methods are chosen using the BIC or EBIC, as in the simulation study.

Table 3 lists the probes selected by the group Lasso and the adaptive group Lasso, indicated by check marks. Table 4 shows the number of variables and the residual sum of squares obtained with each estimation method. For the ordinary Lasso with the spline expansion, a variable is considered to be selected if any of the estimated coefficients of the spline approximation to its additive component are nonzero. Depending on whether the BIC or EBIC is used, the adaptive group Lasso selects 15 variables and the linear model selects 8-14 variables; the corresponding numbers of variables selected by the group Lasso and by the ordinary Lasso with the spline expansion are reported in Table 4. Table 4 shows that the adaptive group Lasso does better than the other methods in terms of residual sum of squares (RSS). We have also examined the plots (not shown) of the estimated additive components obtained with the group Lasso and the adaptive group Lasso. Most are highly nonlinear, confirming the need to take nonlinearity into account.

In order to evaluate the performance of the methods, we use cross-validation and compare the prediction mean square errors (PEs). We randomly partition the data into 6 subsets, each consisting of 20 observations. We then fit the model with 5 subsets as the training set and calculate the PE for the remaining subset, which we treat as the test set. We repeat this process 6 times, using each of the 6 subsets as the test set once, and compute the averages of the numbers of probes selected and of the prediction errors over these 6 calculations. We then replicate this whole process 400 times (this was suggested to us by the Associate Editor). Table 5 gives the average values over the 400 replications. The adaptive group Lasso has a smaller average prediction error than the group Lasso, the ordinary Lasso and the linear regression with the Lasso.
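One replication of this cross-validation scheme can be sketched generically in Python; `cv_prediction_error` and `fit_predict` are hypothetical names, and the estimator itself is abstracted away behind the `fit_predict` callback.

```python
import numpy as np

def cv_prediction_error(X, y, fit_predict, n_folds=6, seed=None):
    """Randomly partition the data into n_folds test sets, fit on the remaining
    folds, and return the prediction mean square error averaged over folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)          # 6 folds of 20 when n = 120
    pes = []
    for test in folds:
        train = np.setdiff1d(idx, test)
        y_hat = fit_predict(X[train], y[train], X[test])
        pes.append(np.mean((y[test] - y_hat) ** 2))
    return float(np.mean(pes))
```

Averaging the returned value over repeated random partitions (400 in the text) gives the reported average PEs.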
The ordinary Lasso selects far more probe sets than the other approaches, but this does not lead to better prediction performance. Therefore, in this example, the adaptive group Lasso provides the investigator with a more targeted list of probe sets, which can serve as a starting point for further study. It is of interest to compare the selection results from the adaptive group Lasso and the linear regression model with the Lasso. The adaptive group Lasso and the linear model with the Lasso select different sets of genes. When the penalty parameter is chosen with the BIC, the adaptive group Lasso selects 5 genes that are not selected by the linear model with the Lasso. In addition, the linear model with the Lasso selects 5 genes that are not selected by the adaptive group Lasso. When the penalty parameter is selected with the EBIC, the adaptive group Lasso selects 10 genes that are not selected by the linear model with the Lasso. The estimated effects of many of the genes are nonlinear, and the Monte Carlo results of Section 4 show that the performance of the linear model with the Lasso can be very poor in the presence of nonlinearity. Therefore, we interpret the differences between the gene selections of the adaptive group Lasso and the linear model with the Lasso as evidence that the selections produced by the linear model are misleading.

6 Concluding remarks

In this paper, we propose to use the adaptive group Lasso for variable selection in nonparametric additive models in sparse, high-dimensional settings. A key requirement for the adaptive group Lasso to be selection consistent is that the initial estimator is estimation consistent and selects all the important components with high probability. In low-dimensional settings, finding an initial consistent estimator is relatively easy and can be achieved by many well-established approaches, such as the additive spline estimators. However, in high-dimensional settings, finding an initial consistent estimator is difficult. Under the conditions stated in Theorem 1, the group Lasso is shown to be consistent and to select all the important components. Thus the group Lasso can be used as the initial estimator in the adaptive group Lasso to achieve selection consistency. Following model selection, oracle-efficient, asymptotically normal estimators of the nonzero components can be obtained by using existing methods. Our simulation results indicate that our procedure works well for variable selection in the models considered. Therefore, the adaptive group Lasso is a useful approach for variable selection and estimation in sparse, high-dimensional nonparametric additive models. Our theoretical results are concerned with a fixed sequence of penalty parameters, and so are not applicable to the case in which the penalty parameters are selected by data-driven procedures such as the BIC. This is an important and challenging problem that deserves further investigation but is beyond the scope of this paper. We have only considered linear nonparametric additive models. The adaptive group Lasso can be applied to generalized nonparametric additive models, such as the generalized logistic nonparametric additive model, and to other nonparametric models with high-dimensional data. However, more work is needed to understand the properties of this approach in those more complicated models.

Acknowledgements. The authors wish to thank the Editor, the Associate Editor and two anonymous referees for their helpful comments. Jian Huang is supported in part by NIH grant CA and NSF grant DMS. Joel L. Horowitz was supported in part by NSF grant SES.
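The two-step estimator discussed in these remarks, with the group Lasso as the initial estimator and the adaptive group Lasso refit using weights built from it, can be sketched with a simple proximal-gradient solver. This is an illustrative implementation under assumptions of our own: plain ISTA iterations, our scaling of the objective, and weights equal to the inverse initial group norms (groups set to zero in the first step receive infinite weight and are thereby excluded). The paper itself does not prescribe a particular optimization algorithm.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Block soft-thresholding: set the whole group to zero when its
    norm is below t, otherwise shrink it toward zero."""
    nv = np.linalg.norm(v)
    return np.zeros_like(v) if nv <= t else (1.0 - t / nv) * v

def group_lasso(Z, y, groups, lam, weights=None, n_iter=500):
    """Proximal-gradient (ISTA) solver for
    (1/2) * ||y - Z b||^2 + lam * sum_j w_j * ||b_(j)||_2."""
    labels = np.unique(groups)
    idx = [np.flatnonzero(groups == g) for g in labels]
    w = np.ones(len(idx)) if weights is None else np.asarray(weights, float)
    b = np.zeros(Z.shape[1])
    L = np.linalg.norm(Z, 2) ** 2  # Lipschitz constant of the smooth part
    for _ in range(n_iter):
        b = b - Z.T @ (Z @ b - y) / L          # gradient step
        for j, ix in enumerate(idx):
            b[ix] = group_soft_threshold(b[ix], lam * w[j] / L)
    return b

def adaptive_group_lasso(Z, y, groups, lam1, lam2):
    """Two-step procedure: group Lasso for an initial estimate, then a
    weighted group Lasso with w_j = 1 / ||initial group j||_2; groups
    zeroed in the first step get infinite weight (excluded)."""
    b_init = group_lasso(Z, y, groups, lam1)
    norms = np.array([np.linalg.norm(b_init[groups == g])
                      for g in np.unique(groups)])
    w = np.where(norms > 0, 1.0 / np.maximum(norms, 1e-12), np.inf)
    return group_lasso(Z, y, groups, lam2, weights=w)
```

Combined with a spline design matrix and its group labels, `adaptive_group_lasso` returns one coefficient block per covariate; a covariate is selected when its block is nonzero.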
7 Appendix: Proofs

We first prove the following lemmas. Denote the centered versions of $S_{nj}$ by
$$S_{nj}^0 = \Big\{ f_{nj} : f_{nj}(x) = \sum_{k=1}^{m_n} b_{jk}\psi_k(x),\ (b_{j1},\dots,b_{jm_n})' \in \mathbb{R}^{m_n} \Big\}, \quad 1 \le j \le p,$$
where the $\psi_k$'s are the centered spline bases defined in (5).

Lemma 1. Suppose that $f \in \mathcal{F}$ and $Ef(X_j) = 0$. Then, under (A3) and (A4), there exists an $f_n \in S_{nj}^0$ satisfying $\|f_n - f\|_2 = O_p(m_n^{-d} + m_n^{1/2}n^{-1/2})$. In particular, if we choose $m_n = O(n^{1/(2d+1)})$, then $\|f_n - f\|_2 = O_p(m_n^{-d}) = O_p(n^{-d/(2d+1)})$.

Proof of Lemma 1. By (A4), for $f \in \mathcal{F}$, there is an $\bar{f}_n \in S_n$ such that $\|\bar{f}_n - f\|_2 = O(m_n^{-d})$. Let $f_n = \bar{f}_n - n^{-1}\sum_{i=1}^n \bar{f}_n(X_{ij})$. Then $f_n \in S_{nj}^0$ and $\|f_n - f\| \le \|\bar{f}_n - f\| + |P_n \bar{f}_n|$, where $P_n$ is the empirical measure of the i.i.d. random variables $X_{1j},\dots,X_{nj}$. Consider $P_n \bar{f}_n = (P_n - P)\bar{f}_n + P(\bar{f}_n - f)$. Here we use the linear functional notation, i.e., $Pf = \int f\,dP$, where $P$ is the probability measure of $X_{1j}$. For any $\varepsilon > 0$, the bracketing number $N_{[\,]}(\varepsilon, S_{nj}^0, L_2(P))$ of $S_{nj}^0$ satisfies $\log N_{[\,]}(\varepsilon, S_{nj}^0, L_2(P)) \le c_1 m_n \log(1/\varepsilon)$ for some constant $c_1 > 0$ (Shen and Wong 1994, page 597). Thus, by the maximal inequality (see, e.g., van der Vaart 1998, page 288), $(P_n - P)\bar{f}_n = O_p(n^{-1/2}m_n^{1/2})$. By (A4), $P(\bar{f}_n - f) \le C_2\|\bar{f}_n - f\|_2 = O(m_n^{-d})$ for some constant $C_2 > 0$. The lemma follows from the triangle inequality.

Lemma 2. Suppose that conditions (A2) and (A4) hold. Let
$$T_{jk} = n^{-1/2} m_n^{1/2} \sum_{i=1}^n \psi_k(X_{ij})\varepsilon_i, \quad 1 \le j \le p,\ 1 \le k \le m_n,$$
and $T_n = \max_{1\le j\le p,\,1\le k\le m_n}|T_{jk}|$. Then
$$E(T_n) \le C_1 n^{-1/2} m_n^{1/2} \sqrt{\log(pm_n)}\,\Big(\sqrt{2C_2 n m_n^{-1}\log(pm_n)} + 4\log(2pm_n) + C_2 n m_n^{-1}\Big)^{1/2},$$
where $C_1$ and $C_2$ are two positive constants. In particular, when $m_n\log(pm_n)/n \to 0$, $E(T_n) = O(1)\sqrt{\log(pm_n)}$.

Proof of Lemma 2. Let $s_{jk}^2 = \sum_{i=1}^n \psi_k^2(X_{ij})$. Conditional on the $X_{ij}$'s, the $T_{jk}$'s are sub-Gaussian. Let $s_n^2 = \max_{1\le j\le p,\,1\le k\le m_n} s_{jk}^2$. By (A2) and the maximal inequality for sub-Gaussian random variables (van der Vaart and Wellner 1996, Lemmas 2.2.1 and 2.2.2),
$$E\big(\max_{1\le j\le p,\,1\le k\le m_n}|T_{jk}| \mid X_{ij},\ 1\le i\le n,\ 1\le j\le p\big) \le C_1 n^{-1/2} m_n^{1/2} s_n \sqrt{\log(pm_n)}.$$
Therefore,
$$E\big(\max_{1\le j\le p,\,1\le k\le m_n}|T_{jk}|\big) \le C_1 n^{-1/2} m_n^{1/2} \sqrt{\log(pm_n)}\, E(s_n), \quad (9)$$
where $C_1 > 0$ is a constant. By (A4) and the properties of B-splines, $|\psi_k(X_{ij})| \le |\phi_k(X_{ij})| + |\bar{\phi}_{jk}| \le 2$ and
$$E(\psi_k(X_{ij}))^2 \le C_2 m_n^{-1} \quad (10)$$
for a constant $C_2 > 0$, for every $1\le j\le p$ and $1\le k\le m_n$. By (10),
$$\sum_{i=1}^n E\big[\psi_k^2(X_{ij}) - E\psi_k^2(X_{ij})\big]^2 \le 4C_2 n m_n^{-1} \quad (11)$$
and
$$\max_{1\le j\le p,\,1\le k\le m_n}\ \sum_{i=1}^n E\psi_k^2(X_{ij}) \le C_2 n m_n^{-1}.$$
By Lemma A.1 of van de Geer (2008), (10) and (11) imply
$$E\Big(\max_{1\le j\le p,\,1\le k\le m_n}\Big|\sum_{i=1}^n \{\psi_k^2(X_{ij}) - E\psi_k^2(X_{ij})\}\Big|\Big) \le \sqrt{2C_2 n m_n^{-1}\log(pm_n)} + 4\log(2pm_n). \quad (12)$$
Therefore, by (12) and the triangle inequality,
$$E s_n^2 \le \sqrt{2C_2 n m_n^{-1}\log(pm_n)} + 4\log(2pm_n) + C_2 n m_n^{-1}.$$
Now since $E s_n \le (E s_n^2)^{1/2}$, we have
$$E s_n \le \Big(\sqrt{2C_2 n m_n^{-1}\log(pm_n)} + 4\log(2pm_n) + C_2 n m_n^{-1}\Big)^{1/2}. \quad (13)$$
The lemma follows from (9) and (13).

Denote $\beta_A = (\beta_j', j \in A)'$ and $Z_A = (Z_j, j \in A)$. Here $\beta_A$ is an $|A|m_n \times 1$ vector and $Z_A$ is an $n \times |A|m_n$ matrix. Let $C_A = Z_A'Z_A/n$. When $A = \{1,\dots,p\}$, we simply write $C = Z'Z/n$. Let $\rho_{\min}(C_A)$ and $\rho_{\max}(C_A)$ be the minimum and maximum eigenvalues of $C_A$, respectively.

Lemma 3. Let $m_n = O(n^\gamma)$, where $0 < \gamma < 0.5$. Suppose that $|A|$ is bounded by a fixed constant independent of $n$ and $p$. Let $h_n \equiv m_n^{-1}$. Then, under (A3) and (A4), with probability converging to one,
$$c_1 h_n \le \rho_{\min}(C_A) \le \rho_{\max}(C_A) \le c_2 h_n,$$
where $c_1$ and $c_2$ are two positive constants.

Proof of Lemma 3. Without loss of generality, suppose $A = \{1,\dots,q\}$. Then $Z_A = (Z_1,\dots,Z_q)$. Let $b = (b_1',\dots,b_q')'$, where $b_j \in \mathbb{R}^{m_n}$. By Lemma 3 of Stone (1985),
$$\|Z_1 b_1 + \cdots + Z_q b_q\|_2 \ge c_3\big(\|Z_1 b_1\|_2 + \cdots + \|Z_q b_q\|_2\big)$$
for a certain constant $c_3 > 0$. By the triangle inequality,
$$\|Z_1 b_1 + \cdots + Z_q b_q\|_2 \le \|Z_1 b_1\|_2 + \cdots + \|Z_q b_q\|_2.$$
Since $Z_A b = Z_1 b_1 + \cdots + Z_q b_q$, the above two inequalities imply that
$$c_3^2\big(\|Z_1 b_1\|_2^2 + \cdots + \|Z_q b_q\|_2^2\big) \le \|Z_A b\|_2^2 \le q\big(\|Z_1 b_1\|_2^2 + \cdots + \|Z_q b_q\|_2^2\big). \quad (14)$$
Let $C_j = n^{-1} Z_j'Z_j$. By Lemma 6.2 of Zhou, Shen and Wolfe (1998),
$$c_4 h_n \le \rho_{\min}(C_j) \le \rho_{\max}(C_j) \le c_5 h_n, \quad j \in A. \quad (15)$$
Since $C_A = n^{-1} Z_A'Z_A$, it follows from (14) that
$$c_3^2\big(b_1'C_1 b_1 + \cdots + b_q'C_q b_q\big) \le b'C_A b \le q\big(b_1'C_1 b_1 + \cdots + b_q'C_q b_q\big).$$
Therefore, by (15),
$$\frac{b_1'C_1 b_1 + \cdots + b_q'C_q b_q}{\|b\|_2^2} = \frac{b_1'C_1 b_1}{\|b_1\|_2^2}\frac{\|b_1\|_2^2}{\|b\|_2^2} + \cdots + \frac{b_q'C_q b_q}{\|b_q\|_2^2}\frac{\|b_q\|_2^2}{\|b\|_2^2} \ge \rho_{\min}(C_1)\frac{\|b_1\|_2^2}{\|b\|_2^2} + \cdots + \rho_{\min}(C_q)\frac{\|b_q\|_2^2}{\|b\|_2^2} \ge c_4 h_n.$$
Similarly,
$$\frac{b_1'C_1 b_1 + \cdots + b_q'C_q b_q}{\|b\|_2^2} \le c_5 h_n.$$
Thus we have
$$c_3^2 c_4 h_n \le \frac{b'C_A b}{b'b} \le q c_5 h_n.$$
The lemma follows.

Proof of Theorem 1. The proof of parts (i) and (ii) essentially follows the proof of Theorem 2.1 of Wei and Huang (2008). The only change that must be made here is that we need to consider the error in approximating the regression functions by splines. Specifically, let $\xi = \varepsilon + \delta$, where $\delta = (\delta_1,\dots,\delta_n)'$ with $\delta_i = \sum_{j=1}^q \big(f_{0j}(X_{ij}) - f_{nj}(X_{ij})\big)$. Since $\|f_{0j} - f_{nj}\|_2 = O(m_n^{-d}) = O(n^{-d/(2d+1)})$ for $m_n = n^{1/(2d+1)}$, we have
$$\|\delta\|_2 \le C_1\sqrt{q\, n\, m_n^{-2d}} = C_1\sqrt{q}\, n^{1/(4d+2)}$$
for some constant $C_1 > 0$. For any integer $t$, let
$$\chi_t = \max_{|A|=t}\ \max_{\|U_{Ak}\|_2 = 1,\,1\le k\le t}\ \frac{\xi' V_A(s_A)}{\|V_A(s_A)\|_2} \quad\text{and}\quad \tilde{\chi}_t = \max_{|A|=t}\ \max_{\|U_{Ak}\|_2 = 1,\,1\le k\le t}\ \frac{\varepsilon' V_A(s_A)}{\|V_A(s_A)\|_2},$$
where $V_A(s_A) = Z_A(Z_A'Z_A)^{-1}s_A - (I - P_A)X\beta$, $s_A = (s_{A1}',\dots,s_{Am}')'$ with $s_{Ak} = \lambda d_{Ak} U_{Ak}$ and $\|U_{Ak}\|_2 = 1$. For a sufficiently large constant $C_2 > 0$, define
$$\Omega_{t_0} = \big\{(Z,\varepsilon) : \chi_t \le \sigma C_2 \sqrt{(t \vee 1)\, m_n \log(pm_n)},\ \forall\, t \ge t_0\big\}$$
and
$$\tilde{\Omega}_{t_0} = \big\{(Z,\varepsilon) : \tilde{\chi}_t \le \sigma C_2 \sqrt{(t \vee 1)\, m_n \log(pm_n)},\ \forall\, t \ge t_0\big\},$$
where $t_0 \ge 0$.
As in the proof of Theorem 2.1 of Wei and Huang (2008), on $\tilde{\Omega}_{t_0}$ we have $|\tilde{A}_1| \le M_1 q$ for a constant $M_1 > 1$. By the triangle and Cauchy-Schwarz inequalities,
$$\frac{\xi' V_A(s_A)}{\|V_A(s_A)\|_2} = \frac{\varepsilon' V_A(s_A) + \delta' V_A(s_A)}{\|V_A(s_A)\|_2} \le \frac{\varepsilon' V_A(s_A)}{\|V_A(s_A)\|_2} + \|\delta\|_2. \quad (16)$$
In the proof of Theorem 2.1 of Wei and Huang (2008), it is shown that
$$P(\tilde{\Omega}_{t_0}) \ge 1 - 2p\exp\big(-p^{1+c_0}\big) \to 1. \quad (17)$$
Since $\delta' V_A(s_A)/\|V_A(s_A)\|_2 \le \|\delta\|_2 \le C_1\sqrt{q}\, n^{1/(2(2d+1))}$ and $m_n = O(n^{1/(2d+1)})$, we have, for all $t \ge 0$ and sufficiently large $n$,
$$\|\delta\|_2 \le C_1\sqrt{q}\, n^{1/(2(2d+1))} \le \sigma C_2 \sqrt{(t \vee 1)\, m_n \log(p)}. \quad (18)$$
It follows from (16), (17) and (18) that $P(\Omega_{t_0}) \to 1$. This completes the proof of part (i) of Theorem 1.

Before proving part (ii), we first prove part (iii) of Theorem 1. By the definition of $\hat{\beta}_n \equiv (\hat{\beta}_{n1}',\dots,\hat{\beta}_{np}')'$,
$$\|Y - Z\hat{\beta}_n\|_2^2 + \lambda_{n1}\sum_{j=1}^p \|\hat{\beta}_{nj}\|_2 \le \|Y - Z\beta\|_2^2 + \lambda_{n1}\sum_{j=1}^p \|\beta_j\|_2. \quad (19)$$
Let $A_2 = \{j : \|\hat{\beta}_{nj}\|_2 \ne 0 \text{ or } \|\beta_j\|_2 \ne 0\}$ and $d_2 = |A_2|$. By part (i), $d_2 = O_p(q)$. By (19) and the definition of $A_2$,
$$\|Y - Z_{A_2}\hat{\beta}_{A_2}\|_2^2 + \lambda_{n1}\sum_{j\in A_2}\|\hat{\beta}_{nj}\|_2 \le \|Y - Z_{A_2}\beta_{A_2}\|_2^2 + \lambda_{n1}\sum_{j\in A_2}\|\beta_j\|_2. \quad (20)$$
Let $\eta = Y - Z\beta$. Write
$$Y - Z_{A_2}\hat{\beta}_{A_2} = Y - Z\beta - Z_{A_2}(\hat{\beta}_{A_2} - \beta_{A_2}) = \eta - Z_{A_2}(\hat{\beta}_{A_2} - \beta_{A_2}).$$
We have
$$\|Y - Z_{A_2}\hat{\beta}_{A_2}\|_2^2 = \|Z_{A_2}(\hat{\beta}_{A_2} - \beta_{A_2})\|_2^2 - 2\eta' Z_{A_2}(\hat{\beta}_{A_2} - \beta_{A_2}) + \eta'\eta.$$
We can rewrite (20) as
$$\|Z_{A_2}(\hat{\beta}_{A_2} - \beta_{A_2})\|_2^2 - 2\eta' Z_{A_2}(\hat{\beta}_{A_2} - \beta_{A_2}) \le \lambda_{n1}\sum_{j\in A_1}\|\beta_j\|_2 - \lambda_{n1}\sum_{j\in A_1}\|\hat{\beta}_{nj}\|_2. \quad (21)$$
Now
$$\sum_{j\in A_1}\|\beta_j\|_2 - \sum_{j\in A_1}\|\hat{\beta}_{nj}\|_2 \le \sqrt{|A_1|}\,\|\beta_{A_1} - \hat{\beta}_{A_1}\|_2 \le \sqrt{|A_1|}\,\|\beta_{A_2} - \hat{\beta}_{A_2}\|_2. \quad (22)$$
Let $\nu = Z_{A_2}(\hat{\beta}_{A_2} - \beta_{A_2})$. Combining (20), (21) and (22), we get
$$\|\nu\|_2^2 \le 2\eta'\nu + \lambda_{n1}\sqrt{|A_1|}\,\|\hat{\beta}_{A_2} - \beta_{A_2}\|_2. \quad (23)$$
Let $\tilde{\eta}$ be the projection of $\eta$ onto the span of $Z_{A_2}$, that is, $\tilde{\eta} = Z_{A_2}(Z_{A_2}'Z_{A_2})^{-1}Z_{A_2}'\eta$. By the Cauchy-Schwarz inequality,
$$2\eta'\nu = 2\tilde{\eta}'\nu \le 2\|\tilde{\eta}\|_2\|\nu\|_2 \le 2\|\tilde{\eta}\|_2^2 + \tfrac{1}{2}\|\nu\|_2^2. \quad (24)$$
From (23) and (24), we have
$$\|\nu\|_2^2 \le 4\|\tilde{\eta}\|_2^2 + 2\lambda_{n1}\sqrt{|A_1|}\,\|\hat{\beta}_{A_2} - \beta_{A_2}\|_2.$$
Let $c_n$ be the smallest eigenvalue of $Z_{A_2}'Z_{A_2}/n$. By Lemma 3 and part (i), $c_n \asymp_p m_n^{-1}$. Since $\|\nu\|_2^2 \ge n c_n \|\hat{\beta}_{A_2} - \beta_{A_2}\|_2^2$ and $2ab \le a^2 + b^2$,
$$n c_n \|\hat{\beta}_{A_2} - \beta_{A_2}\|_2^2 \le 4\|\tilde{\eta}\|_2^2 + \frac{(2\lambda_{n1}\sqrt{|A_1|})^2}{2 n c_n} + \frac{n c_n}{2}\|\hat{\beta}_{A_2} - \beta_{A_2}\|_2^2.$$
It follows that
$$\|\hat{\beta}_{A_2} - \beta_{A_2}\|_2^2 \le \frac{8\|\tilde{\eta}\|_2^2}{n c_n} + \frac{4\lambda_{n1}^2 |A_1|}{n^2 c_n^2}. \quad (25)$$
Let $f_0(X_i) = \sum_{j=1}^p f_{0j}(X_{ij})$ and $f_{0A}(X_i) = \sum_{j\in A} f_{0j}(X_{ij})$. Write
$$\eta_i = Y_i - \mu - f_0(X_i) + (\mu - \bar{Y}) + f_0(X_i) - \sum_{j\in A_2} Z_{ij}'\beta_j = \varepsilon_i + (\mu - \bar{Y}) + f_{0A_2}(X_i) - f_{nA_2}(X_i).$$
Since $|\mu - \bar{Y}|^2 = O_p(n^{-1})$ and $\|f_{0j} - f_{nj}\|_2 = O(m_n^{-d})$, we have
$$\|\tilde{\eta}\|_2 \le \|\tilde{\varepsilon}\|_2 + O_p(1) + O\big((n d_2)^{1/2} m_n^{-d}\big), \quad (26)$$
where $\tilde{\varepsilon}$ is the projection of $\varepsilon = (\varepsilon_1,\dots,\varepsilon_n)'$ onto the span of $Z_{A_2}$. We have
$$\|\tilde{\varepsilon}\|_2^2 = \|(Z_{A_2}'Z_{A_2})^{-1/2} Z_{A_2}'\varepsilon\|_2^2 \le \frac{1}{n c_n}\|Z_{A_2}'\varepsilon\|_2^2.$$
Now
$$\max_{A:\,|A|\le d_2}\|Z_A'\varepsilon\|_2^2 \le d_2 m_n \max_{1\le j\le p,\,1\le k\le m_n}|Z_{jk}'\varepsilon|^2,$$
where $Z_{jk} = (\psi_k(X_{1j}),\dots,\psi_k(X_{nj}))'$. By Lemma 2,
$$\max_{1\le j\le p,\,1\le k\le m_n}|Z_{jk}'\varepsilon|^2 = n m_n^{-1}\max_{1\le j\le p,\,1\le k\le m_n}\big|(m_n/n)^{1/2} Z_{jk}'\varepsilon\big|^2 = O_p(1)\, n m_n^{-1}\log(pm_n).$$
It follows that
$$\|\tilde{\varepsilon}\|_2^2 = O_p(1)\,\frac{d_2\log(pm_n)}{c_n}. \quad (27)$$
Combining (25), (26) and (27), we get
$$\|\hat{\beta}_{A_2} - \beta_{A_2}\|_2^2 \le O_p\Big(\frac{d_2\log(pm_n)}{n c_n^2}\Big) + O_p\Big(\frac{1}{n c_n}\Big) + O\Big(\frac{d_2 m_n^{-2d}}{c_n}\Big) + \frac{4\lambda_{n1}^2 |A_1|}{n^2 c_n^2}.$$
Since $d_2 = O_p(q)$ and $c_n \asymp_p m_n^{-1}$, we have
$$\|\hat{\beta}_{A_2} - \beta_{A_2}\|_2^2 = O_p\Big(\frac{m_n^2\log(pm_n)}{n}\Big) + O_p\Big(\frac{m_n}{n}\Big) + O\big(m_n^{1-2d}\big) + O\Big(\frac{4 m_n^2\lambda_{n1}^2}{n^2}\Big).$$
This completes the proof of part (iii).

We now prove part (ii). Since $\|f_{0j}\|_2 \ge c_f > 0$, $1\le j\le q$, $\|f_{nj} - f_{0j}\|_2 = O(m_n^{-d})$ and $\|f_{nj}\|_2 \ge \|f_{0j}\|_2 - \|f_{nj} - f_{0j}\|_2$, we have $\|f_{nj}\|_2 \ge 0.5 c_f$ for sufficiently large $n$. By a result of de Boor (2001) (see also (12) of Stone (1986)), there are positive constants $c_6$ and $c_7$ such that
$$c_6 m_n^{-1}\|\beta_j\|_2^2 \le \|f_{nj}\|_2^2 \le c_7 m_n^{-1}\|\beta_j\|_2^2.$$
It follows that $\|\beta_j\|_2^2 \ge c_7^{-1} m_n \|f_{nj}\|_2^2 \ge \tfrac{1}{4} c_7^{-1} c_f^2 m_n$. Therefore, if $\|\beta_j\|_2 \ne 0$ but $\|\hat{\beta}_{nj}\|_2 = 0$, then
$$\|\hat{\beta}_{nj} - \beta_j\|_2^2 \ge \tfrac{1}{4} c_7^{-1} c_f^2 m_n. \quad (28)$$
However, since $m_n\log(pm_n)/n \to 0$ and $\lambda_{n1}^2 m_n/n^2 \to 0$, (28) contradicts part (iii).

Proof of Theorem 2. By the definition of $\hat{f}_{nj}$, $1\le j\le p$, parts (i) and (ii) follow directly from parts (i) and (ii) of Theorem 1. Now consider part (iii). By the properties of splines (de Boor 2001),
$$c_6 m_n^{-1}\|\hat{\beta}_{nj} - \beta_j\|_2^2 \le \|\hat{f}_{nj} - f_{nj}\|_2^2 \le c_7 m_n^{-1}\|\hat{\beta}_{nj} - \beta_j\|_2^2.$$
Thus
$$\|\hat{f}_{nj} - f_{nj}\|_2^2 = O_p\Big(\frac{m_n\log(pm_n)}{n}\Big) + O_p\Big(\frac{1}{n}\Big) + O\big(m_n^{-2d}\big) + O\Big(\frac{4 m_n\lambda_{n1}^2}{n^2}\Big). \quad (29)$$
By (A3),
$$\|f_{nj} - f_{0j}\|_2^2 = O(m_n^{-2d}). \quad (30)$$
Part (iii) follows from (29) and (30).

In the proofs below, for any matrix $H$, denote by $\|H\|$ its 2-norm, which is equal to its largest singular value. This norm satisfies the inequality $\|Hx\| \le \|H\|\,\|x\|$ for any column vector $x$ whose dimension is the same as the number of columns of $H$. Denote $\beta_{A_1} = (\beta_j', j\in A_1)'$, $\hat{\beta}_{A_1} = (\hat{\beta}_{nj}', j\in A_1)'$, and $Z_{A_1} = (Z_j, j\in A_1)$. Define $C_{A_1} = n^{-1} Z_{A_1}'Z_{A_1}$. Let $\rho_{n1}$ and $\rho_{n2}$ be the smallest and largest eigenvalues of $C_{A_1}$, respectively.

Proof of Theorem 3. By the KKT conditions, a necessary and sufficient condition for $\hat{\beta}_n$ is
$$\begin{cases} -2 Z_j'\big(Y - Z\hat{\beta}_n\big) = \lambda_{n2} w_{nj}\,\dfrac{\hat{\beta}_{nj}}{\|\hat{\beta}_{nj}\|_2}, & \|\hat{\beta}_{nj}\|_2 \ne 0,\ j \in A_1, \\[6pt] \big\|2 Z_j'\big(Y - Z\hat{\beta}_n\big)\big\|_2 \le \lambda_{n2} w_{nj}, & \hat{\beta}_{nj} = 0,\ j \notin A_1. \end{cases} \quad (31)$$
Let $\nu_n = \big(w_{nj}\hat{\beta}_{nj}'/(2\|\hat{\beta}_{nj}\|_2),\ j\in A_1\big)'$. Define
$$\hat{\beta}_{A_1} = (Z_{A_1}'Z_{A_1})^{-1}\big(Z_{A_1}'Y - \lambda_{n2}\nu_n\big). \quad (32)$$
If $\hat{\beta}_n = (\hat{\beta}_{A_1}', 0')'$, then the equation in (31) holds for this $\hat{\beta}_n$. Thus, since $Z\hat{\beta}_n = Z_{A_1}\hat{\beta}_{A_1}$ for this $\hat{\beta}_n$ and $\{Z_j, j\in A_1\}$ are linearly independent, $\hat{\beta}_n = (\hat{\beta}_{A_1}', 0')'$ and $\hat{\beta}_n =_0 \beta$ if
$$\|Z_j'(Y - Z_{A_1}\hat{\beta}_{A_1})\|_2 \le \lambda_{n2} w_{nj}/2, \quad j\notin A_1.$$
This is true if
$$\big|\,\|\hat{\beta}_{nj}\|_2 - \|\beta_j\|_2\,\big| < \|\beta_j\|_2,\ j\in A_1, \quad\text{and}\quad \|Z_j'(Y - Z_{A_1}\hat{\beta}_{A_1})\|_2 \le \lambda_{n2} w_{nj}/2,\ j\notin A_1.$$
Therefore,
$$P(\hat{\beta}_n \ne_0 \beta) \le P\big(\|\hat{\beta}_{nj} - \beta_j\|_2 \ge \|\beta_j\|_2,\ \exists\, j\in A_1\big) + P\big(\|Z_j'(Y - Z_{A_1}\hat{\beta}_{A_1})\|_2 > \lambda_{n2} w_{nj}/2,\ \exists\, j\notin A_1\big).$$
Let $f_{0j}(X_j) = (f_{0j}(X_{1j}),\dots,f_{0j}(X_{nj}))'$ and $\delta = \sum_{j\in A_1} f_{0j}(X_j) - Z_{A_1}\beta_{A_1}$. By Lemma 1, we have
$$n^{-1}\|\delta\|_2^2 = O_p(q\, m_n^{-2d}). \quad (33)$$
Let $H = I - Z_{A_1}(Z_{A_1}'Z_{A_1})^{-1}Z_{A_1}'$. By (32),
$$\hat{\beta}_{A_1} - \beta_{A_1} = n^{-1} C_{A_1}^{-1}\big(Z_{A_1}'(\varepsilon + \delta) - \lambda_{n2}\nu_n\big) \quad (34)$$
and
$$Y - Z_{A_1}\hat{\beta}_{A_1} = H\varepsilon + H\delta + \lambda_{n2} Z_{A_1} C_{A_1}^{-1}\nu_n / n. \quad (35)$$
Based on these two equations, Lemma 5 below shows that $P(\|\hat{\beta}_{nj} - \beta_j\|_2 \ge \|\beta_j\|_2,\ \exists\, j\in A_1) \to 0$, and Lemma 6 below shows that $P(\|Z_j'(Y - Z_{A_1}\hat{\beta}_{A_1})\|_2 > \lambda_{n2} w_{nj}/2,\ \exists\, j\notin A_1) \to 0$. These two results lead to part (i) of the theorem.

We now prove part (ii) of Theorem 3. As in (26), for $\eta = Y - Z\beta$ and $\tilde{\eta}_1 = Z_{A_1}(Z_{A_1}'Z_{A_1})^{-1}Z_{A_1}'\eta$, we have
$$\|\tilde{\eta}_1\|_2 \le \|\tilde{\varepsilon}_1\|_2 + O_p(1) + O\big((qn)^{1/2} m_n^{-d}\big), \quad (36)$$
where $\tilde{\varepsilon}_1$ is the projection of $\varepsilon = (\varepsilon_1,\dots,\varepsilon_n)'$ onto the span of $Z_{A_1}$. We have
$$\|\tilde{\varepsilon}_1\|_2^2 = \|(Z_{A_1}'Z_{A_1})^{-1/2} Z_{A_1}'\varepsilon\|_2^2 \le \frac{1}{n\rho_{n1}}\|Z_{A_1}'\varepsilon\|_2^2 = O_p(1)\,\frac{|A_1|}{\rho_{n1}}. \quad (37)$$
Now, similarly to the proof of (25), we can show that
$$\|\hat{\beta}_{A_1} - \beta_{A_1}\|_2^2 \le \frac{8\|\tilde{\eta}_1\|_2^2}{n\rho_{n1}} + \frac{4\lambda_{n2}^2 |A_1|}{n^2\rho_{n1}^2}. \quad (38)$$
Combining (36), (37) and (38), we get
$$\|\hat{\beta}_{A_1} - \beta_{A_1}\|_2^2 = O_p\Big(\frac{8 q}{n\rho_{n1}^2}\Big) + O_p\Big(\frac{1}{n\rho_{n1}}\Big) + O\Big(\frac{q\, m_n^{-2d}}{\rho_{n1}}\Big) + O\Big(\frac{4\lambda_{n2}^2 q}{n^2\rho_{n1}^2}\Big).$$
Since $\rho_{n1} \asymp_p m_n^{-1}$, the result follows.

The following lemmas are needed in the proof of Theorem 3.

Lemma 4. For $\nu_n = \big(w_{nj}\hat{\beta}_{nj}'/(2\|\hat{\beta}_{nj}\|_2),\ j\in A_1\big)'$, under condition (B1),
$$\|\nu_n\|_2 = O_p(h_{n2}), \quad\text{where}\quad h_{n2} = \big(b_{n1}^{-4} c_b^{-2} r_n^{-2} + q\, b_{n1}^{-2}\big)^{1/2}.$$

Proof of Lemma 4. Write
$$2\|\nu_n\|_2 = \Big(\sum_{j\in A_1} w_{nj}^2\Big)^{1/2} = \Big(\sum_{j\in A_1} \|\tilde{\beta}_{nj}\|_2^{-2}\Big)^{1/2} \le \Big(\sum_{j\in A_1}\Big|\frac{1}{\|\tilde{\beta}_{nj}\|_2} - \frac{1}{\|\beta_j\|_2}\Big|^2\Big)^{1/2} + \Big(\sum_{j\in A_1}\|\beta_j\|_2^{-2}\Big)^{1/2}.$$
Under (B2),
$$\sum_{j\in A_1}\Big|\frac{1}{\|\tilde{\beta}_{nj}\|_2} - \frac{1}{\|\beta_j\|_2}\Big|^2 = \sum_{j\in A_1}\frac{\big|\,\|\beta_j\|_2 - \|\tilde{\beta}_{nj}\|_2\,\big|^2}{\|\tilde{\beta}_{nj}\|_2^2\,\|\beta_j\|_2^2} \le M c_b^{-2} b_{n1}^{-4}\,\|\tilde{\beta}_n - \beta\|_2^2,$$
and $\sum_{j\in A_1}\|\beta_j\|_2^{-2} \le q\, b_{n1}^{-2}$. The claim follows.

Let $\rho_{n3}$ be the maximum of the largest eigenvalues of $n^{-1} Z_j'Z_j$, $j\in A_0$, that is, $\rho_{n3} = \max_{j\in A_0}\|n^{-1} Z_j'Z_j\|$. By Lemma 3,
$$b_{n1}^{-1} = O(m_n^{-1/2}), \quad \rho_{n1} \asymp_p m_n^{-1}, \quad \rho_{n2} \asymp_p m_n^{-1} \quad\text{and}\quad \rho_{n3} \asymp_p m_n^{-1}. \quad (39)$$

Lemma 5. Under conditions (B1), (B2), (A3) and (A4),
$$P\big(\|\hat{\beta}_{nj} - \beta_j\|_2 \ge \|\beta_j\|_2,\ \exists\, j\in A_1\big) \to 0. \quad (40)$$
Proof of Lemma 5. Let $T_j$ be the $m_n \times q m_n$ matrix of the form $T_j = (0_m,\dots,0_m, I_m, 0_m,\dots,0_m)$, where $0_m$ is an $m_n\times m_n$ matrix of zeros, $I_m$ is the $m_n\times m_n$ identity matrix, and $I_m$ is at the $j$th block. By (34),
$$\hat{\beta}_{nj} - \beta_j = n^{-1} T_j C_{A_1}^{-1}\big(Z_{A_1}'\varepsilon + Z_{A_1}'\delta - \lambda_{n2}\nu_n\big).$$
By the triangle inequality,
$$\|\hat{\beta}_{nj} - \beta_j\|_2 \le n^{-1}\|T_j C_{A_1}^{-1} Z_{A_1}'\varepsilon\|_2 + n^{-1}\|T_j C_{A_1}^{-1} Z_{A_1}'\delta\|_2 + n^{-1}\lambda_{n2}\|T_j C_{A_1}^{-1}\nu_n\|_2. \quad (41)$$
Let $C$ be a generic constant independent of $n$. For the first term on the right-hand side,
$$\max_{j\in A_1} n^{-1}\|T_j C_{A_1}^{-1} Z_{A_1}'\varepsilon\|_2 \le n^{-1}\rho_{n1}^{-1}\|Z_{A_1}'\varepsilon\|_2 = n^{-1/2}\rho_{n1}^{-1}\|n^{-1/2} Z_{A_1}'\varepsilon\|_2 = O_p(1)\, n^{-1/2}\rho_{n1}^{-1} m_n^{-1/2}(q m_n)^{1/2}. \quad (42)$$
By (33), the second term satisfies
$$\max_{j\in A_1} n^{-1}\|T_j C_{A_1}^{-1} Z_{A_1}'\delta\|_2 \le \rho_{n1}^{-1}\Big(\frac{\rho_{n2}}{n}\Big)^{1/2} n^{-1/2}\|\delta\|_2 = O_p(1)\,\rho_{n1}^{-1}\rho_{n2}^{1/2} q^{1/2} m_n^{-d}. \quad (43)$$
By Lemma 4, the third term satisfies
$$\max_{j\in A_1} n^{-1}\lambda_{n2}\|T_j C_{A_1}^{-1}\nu_n\|_2 \le n^{-1}\lambda_{n2}\rho_{n1}^{-1}\|\nu_n\|_2 = O_p(1)\,\frac{\lambda_{n2} h_{n2}}{n\rho_{n1}}. \quad (44)$$
Thus (40) follows from (39), (42), (43), (44) and condition (B2a).

Lemma 6. Under conditions (B1), (B2), (A3) and (A4),
$$P\big(\|Z_j'(Y - Z_{A_1}\hat{\beta}_{A_1})\|_2 > \lambda_{n2} w_{nj}/2,\ \exists\, j\notin A_1\big) \to 0. \quad (45)$$

Proof of Lemma 6. By (35), we have
$$Z_j'\big(Y - Z_{A_1}\hat{\beta}_{A_1}\big) = Z_j' H\varepsilon + Z_j' H\delta + \lambda_{n2} Z_j' Z_{A_1} C_{A_1}^{-1}\nu_n / n. \quad (46)$$
Recall that $s_n = p - q$ is the number of zero components in the model. By Lemma 2,
$$E\Big(\max_{j\notin A_1} n^{-1/2}\|Z_j' H\varepsilon\|_2\Big) \le O(1)\{\log(s_n m_n)\}^{1/2}. \quad (47)$$
Since $w_{nj} = \|\tilde{\beta}_{nj}\|_2^{-1} = O_p(r_n)$ for $j\notin A_1$, by (47) we have, for the first term on the right-hand side of (46),
$$P\big(\|Z_j' H\varepsilon\|_2 > \lambda_{n2} w_{nj}/6,\ \exists\, j\notin A_1\big) \le P\big(\|Z_j' H\varepsilon\|_2 > C\lambda_{n2} r_n,\ \exists\, j\notin A_1\big) + o(1)$$
$$= P\Big(\max_{j\notin A_1} n^{-1/2}\|Z_j' H\varepsilon\|_2 > C n^{-1/2}\lambda_{n2} r_n\Big) + o(1) \le \frac{O(1)\, n^{1/2}\{\log(s_n m_n)\}^{1/2}}{C\lambda_{n2} r_n} + o(1). \quad (48)$$
By (33), the second term on the right-hand side of (46) satisfies
$$\max_{j\notin A_1}\|Z_j' H\delta\|_2 \le n^{1/2}\max_{j\notin A_1}\|n^{-1} Z_j' Z_j\|^{1/2}\,\|H\|\,\|\delta\|_2 = O_p(1)\,\rho_{n3}^{1/2} q^{1/2}\, n\, m_n^{-d}. \quad (49)$$
By Lemma 4, the third term on the right-hand side of (46) satisfies
$$\max_{j\notin A_1}\lambda_{n2}\, n^{-1}\|Z_j' Z_{A_1} C_{A_1}^{-1}\nu_n\|_2 \le \lambda_{n2}\max_{j\notin A_1}\|n^{-1/2} Z_j\|\,\|n^{-1/2} Z_{A_1} C_{A_1}^{-1/2}\|\,\|C_{A_1}^{-1/2}\|\,\|\nu_n\|_2 = \lambda_{n2}\rho_{n3}^{1/2}\rho_{n1}^{-1/2}\, O_p\big(q^{1/2} b_{n1}^{-1}\big). \quad (50)$$
Therefore, (45) follows from (39), (48), (49), (50) and condition (B2b).

Proof of Theorem 4. The proof is similar to that of Theorem 2 and is omitted.

References

[1] Bach, F. R. (2007). Consistency of the group Lasso and multiple kernel learning. Journal of Machine Learning Research 9.

[2] Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electronic Journal of Statistics 1.

[3] Chen, J. and Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model space. Biometrika 95.

[4] Chen, J. and Chen, Z. (2009). Extended BIC for small-n-large-p sparse GLM. Available from stachez/cheche.pdf.
More informationarxiv:1506.03481v1 [stat.me] 10 Jun 2015
BEHAVIOUR OF ABC FOR BIG DATA By Wetao Li ad Paul Fearhead Lacaster Uiversity arxiv:1506.03481v1 [stat.me] 10 Ju 2015 May statistical applicatios ivolve models that it is difficult to evaluate the likelihood,
More informationThe Euler Totient, the Möbius and the Divisor Functions
The Euler Totiet, the Möbius ad the Divisor Fuctios Rosica Dieva July 29, 2005 Mout Holyoke College South Hadley, MA 01075 1 Ackowledgemets This work was supported by the Mout Holyoke College fellowship
More informationJoint Probability Distributions and Random Samples
STAT5 Sprig 204 Lecture Notes Chapter 5 February, 204 Joit Probability Distributios ad Radom Samples 5. Joitly Distributed Radom Variables Chapter Overview Joitly distributed rv Joit mass fuctio, margial
More informationThis is arithmetic average of the x values and is usually referred to simply as the mean.
prepared by Dr. Adre Lehre, Dept. of Geology, Humboldt State Uiversity http://www.humboldt.edu/~geodept/geology51/51_hadouts/statistical_aalysis.pdf STATISTICAL ANALYSIS OF HYDROLOGIC DATA This hadout
More informationLecture 5: Span, linear independence, bases, and dimension
Lecture 5: Spa, liear idepedece, bases, ad dimesio Travis Schedler Thurs, Sep 23, 2010 (versio: 9/21 9:55 PM) 1 Motivatio Motivatio To uderstad what it meas that R has dimesio oe, R 2 dimesio 2, etc.;
More informationFIBONACCI NUMBERS: AN APPLICATION OF LINEAR ALGEBRA. 1. Powers of a matrix
FIBONACCI NUMBERS: AN APPLICATION OF LINEAR ALGEBRA. Powers of a matrix We begi with a propositio which illustrates the usefuless of the diagoalizatio. Recall that a square matrix A is diogaalizable if
More informationGregory Carey, 1998 Linear Transformations & Composites  1. Linear Transformations and Linear Composites
Gregory Carey, 1998 Liear Trasformatios & Composites  1 Liear Trasformatios ad Liear Composites I Liear Trasformatios of Variables Meas ad Stadard Deviatios of Liear Trasformatios A liear trasformatio
More information{{1}, {2, 4}, {3}} {{1, 3, 4}, {2}} {{1}, {2}, {3, 4}} 5.4 Stirling Numbers
. Stirlig Numbers Whe coutig various types of fuctios from., we quicly discovered that eumeratig the umber of oto fuctios was a difficult problem. For a domai of five elemets ad a rage of four elemets,
More informationEstimating Probability Distributions by Observing Betting Practices
5th Iteratioal Symposium o Imprecise Probability: Theories ad Applicatios, Prague, Czech Republic, 007 Estimatig Probability Distributios by Observig Bettig Practices Dr C Lych Natioal Uiversity of Irelad,
More informationCHAPTER 3 DIGITAL CODING OF SIGNALS
CHAPTER 3 DIGITAL CODING OF SIGNALS Computers are ofte used to automate the recordig of measuremets. The trasducers ad sigal coditioig circuits produce a voltage sigal that is proportioal to a quatity
More informationAnalyzing Longitudinal Data from Complex Surveys Using SUDAAN
Aalyzig Logitudial Data from Complex Surveys Usig SUDAAN Darryl Creel Statistics ad Epidemiology, RTI Iteratioal, 312 Trotter Farm Drive, Rockville, MD, 20850 Abstract SUDAAN: Software for the Statistical
More informationStatistical inference: example 1. Inferential Statistics
Statistical iferece: example 1 Iferetial Statistics POPULATION SAMPLE A clothig store chai regularly buys from a supplier large quatities of a certai piece of clothig. Each item ca be classified either
More information5 Boolean Decision Trees (February 11)
5 Boolea Decisio Trees (February 11) 5.1 Graph Coectivity Suppose we are give a udirected graph G, represeted as a boolea adjacecy matrix = (a ij ), where a ij = 1 if ad oly if vertices i ad j are coected
More informationSolutions to Selected Problems In: Pattern Classification by Duda, Hart, Stork
Solutios to Selected Problems I: Patter Classificatio by Duda, Hart, Stork Joh L. Weatherwax February 4, 008 Problem Solutios Chapter Bayesia Decisio Theory Problem radomized rules Part a: Let Rx be the
More informationx(x 1)(x 2)... (x k + 1) = [x] k n+m 1
1 Coutig mappigs For every real x ad positive iteger k, let [x] k deote the fallig factorial ad x(x 1)(x 2)... (x k + 1) ( ) x = [x] k k k!, ( ) k = 1. 0 I the sequel, X = {x 1,..., x m }, Y = {y 1,...,
More information1 Introduction to reducing variance in Monte Carlo simulations
Copyright c 007 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a uow mea µ = E(X) of a distributio by
More informationData Analysis and Statistical Behaviors of Stock Market Fluctuations
44 JOURNAL OF COMPUTERS, VOL. 3, NO. 0, OCTOBER 2008 Data Aalysis ad Statistical Behaviors of Stock Market Fluctuatios Ju Wag Departmet of Mathematics, Beijig Jiaotog Uiversity, Beijig 00044, Chia Email:
More informationSection 11.3: The Integral Test
Sectio.3: The Itegral Test Most of the series we have looked at have either diverged or have coverged ad we have bee able to fid what they coverge to. I geeral however, the problem is much more difficult
More informationINVESTMENT PERFORMANCE COUNCIL (IPC)
INVESTMENT PEFOMANCE COUNCIL (IPC) INVITATION TO COMMENT: Global Ivestmet Performace Stadards (GIPS ) Guidace Statemet o Calculatio Methodology The Associatio for Ivestmet Maagemet ad esearch (AIM) seeks
More informationA Faster ClauseShortening Algorithm for SAT with No Restriction on Clause Length
Joural o Satisfiability, Boolea Modelig ad Computatio 1 2005) 4960 A Faster ClauseShorteig Algorithm for SAT with No Restrictio o Clause Legth Evgey Datsi Alexader Wolpert Departmet of Computer Sciece
More informationChapter 9: Correlation and Regression: Solutions
Chapter 9: Correlatio ad Regressio: Solutios 9.1 Correlatio I this sectio, we aim to aswer the questio: Is there a relatioship betwee A ad B? Is there a relatioship betwee the umber of emploee traiig hours
More informationSUPPLEMENTARY MATERIAL TO GENERAL NONEXACT ORACLE INEQUALITIES FOR CLASSES WITH A SUBEXPONENTIAL ENVELOPE
SUPPLEMENTARY MATERIAL TO GENERAL NONEXACT ORACLE INEQUALITIES FOR CLASSES WITH A SUBEXPONENTIAL ENVELOPE By Guillaume Lecué CNRS, LAMA, Marelavallée, 77454 Frace ad By Shahar Medelso Departmet of Mathematics,
More informationWeek 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable
Week 3 Coditioal probabilities, Bayes formula, WEEK 3 page 1 Expected value of a radom variable We recall our discussio of 5 card poker hads. Example 13 : a) What is the probability of evet A that a 5
More informationNr. 2. Interpolation of Discount Factors. Heinz Cremers Willi Schwarz. Mai 1996
Nr 2 Iterpolatio of Discout Factors Heiz Cremers Willi Schwarz Mai 1996 Autore: Herausgeber: Prof Dr Heiz Cremers Quatitative Methode ud Spezielle Bakbetriebslehre Hochschule für Bakwirtschaft Dr Willi
More informationCS103A Handout 23 Winter 2002 February 22, 2002 Solving Recurrence Relations
CS3A Hadout 3 Witer 00 February, 00 Solvig Recurrece Relatios Itroductio A wide variety of recurrece problems occur i models. Some of these recurrece relatios ca be solved usig iteratio or some other ad
More informationEstimating the Mean and Variance of a Normal Distribution
Estimatig the Mea ad Variace of a Normal Distributio Learig Objectives After completig this module, the studet will be able to eplai the value of repeatig eperimets eplai the role of the law of large umbers
More informationTrigonometric Form of a Complex Number. The Complex Plane. axis. ( 2, 1) or 2 i FIGURE 6.44. The absolute value of the complex number z a bi is
0_0605.qxd /5/05 0:45 AM Page 470 470 Chapter 6 Additioal Topics i Trigoometry 6.5 Trigoometric Form of a Complex Number What you should lear Plot complex umbers i the complex plae ad fid absolute values
More informationChapter 7: Confidence Interval and Sample Size
Chapter 7: Cofidece Iterval ad Sample Size Learig Objectives Upo successful completio of Chapter 7, you will be able to: Fid the cofidece iterval for the mea, proportio, ad variace. Determie the miimum
More informationDiscrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13
EECS 70 Discrete Mathematics ad Probability Theory Sprig 2014 Aat Sahai Note 13 Itroductio At this poit, we have see eough examples that it is worth just takig stock of our model of probability ad may
More informationThe following example will help us understand The Sampling Distribution of the Mean. C1 C2 C3 C4 C5 50 miles 84 miles 38 miles 120 miles 48 miles
The followig eample will help us uderstad The Samplig Distributio of the Mea Review: The populatio is the etire collectio of all idividuals or objects of iterest The sample is the portio of the populatio
More informationZTEST / ZSTATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown
ZTEST / ZSTATISTIC: used to test hypotheses about µ whe the populatio stadard deviatio is kow ad populatio distributio is ormal or sample size is large TTEST / TSTATISTIC: used to test hypotheses about
More informationIran. J. Chem. Chem. Eng. Vol. 26, No.1, 2007. Sensitivity Analysis of Water Flooding Optimization by Dynamic Optimization
Ira. J. Chem. Chem. Eg. Vol. 6, No., 007 Sesitivity Aalysis of Water Floodig Optimizatio by Dyamic Optimizatio Gharesheiklou, Ali Asghar* + ; MousaviDehghai, Sayed Ali Research Istitute of Petroleum Idustry
More informationFactors of sums of powers of binomial coefficients
ACTA ARITHMETICA LXXXVI.1 (1998) Factors of sums of powers of biomial coefficiets by Neil J. Cali (Clemso, S.C.) Dedicated to the memory of Paul Erdős 1. Itroductio. It is well ow that if ( ) a f,a = the
More informationUnit 20 Hypotheses Testing
Uit 2 Hypotheses Testig Objectives: To uderstad how to formulate a ull hypothesis ad a alterative hypothesis about a populatio proportio, ad how to choose a sigificace level To uderstad how to collect
More informationPresent Values, Investment Returns and Discount Rates
Preset Values, Ivestmet Returs ad Discout Rates Dimitry Midli, ASA, MAAA, PhD Presidet CDI Advisors LLC dmidli@cdiadvisors.com May 2, 203 Copyright 20, CDI Advisors LLC The cocept of preset value lies
More informationTHE HEIGHT OF qbinary SEARCH TREES
THE HEIGHT OF qbinary SEARCH TREES MICHAEL DRMOTA AND HELMUT PRODINGER Abstract. q biary search trees are obtaied from words, equipped with the geometric distributio istead of permutatios. The average
More informationSIMULTANEOUS ANALYSIS OF LASSO AND DANTZIG SELECTOR. By Peter J. Bickel, Ya acov Ritov and Alexandre B. Tsybakov
Submitted to the Aals of Statistics SIMULTANEOUS ANALYSIS OF LASSO AND DANTZIG SELECTOR By Peter J. Bickel, Ya acov Ritov ad Alexadre B. Tsybakov We show that, uder a sparsity sceario, the Lasso estimator
More information