AMS 2000 subject classification. Primary 62G08, 62G20; secondary 62G99


VARIABLE SELECTION IN NONPARAMETRIC ADDITIVE MODELS

Jian Huang$^1$, Joel L. Horowitz$^2$ and Fengrong Wei$^3$
$^1$University of Iowa, $^2$Northwestern University and $^3$University of West Georgia

Abstract

We consider a nonparametric additive model of a conditional mean function in which the number of variables and additive components may be larger than the sample size but the number of nonzero additive components is small relative to the sample size. The statistical problem is to determine which additive components are nonzero. The additive components are approximated by truncated series expansions with B-spline bases. With this approximation, the problem of component selection becomes that of selecting the groups of coefficients in the expansion. We apply the adaptive group Lasso to select nonzero components, using the group Lasso to obtain an initial estimator and reduce the dimension of the problem. We give conditions under which the group Lasso selects a model whose number of components is comparable with the underlying model, and the adaptive group Lasso selects the nonzero components correctly with probability approaching one as the sample size increases and achieves the optimal rate of convergence. The results of Monte Carlo experiments show that the adaptive group Lasso procedure works well with samples of moderate size. A data example is used to illustrate the application of the proposed method.

KEY WORDS: adaptive group Lasso; component selection; high-dimensional data; nonparametric regression; selection consistency.

Short title. Nonparametric component selection

AMS 2000 subject classification. Primary 62G08, 62G20; secondary 62G99

AOS R2 (December 7,

1 Introduction

Let $(Y_i, X_i)$, $i = 1, \ldots, n$, be random vectors that are independently and identically distributed as $(Y, X)$, where $Y$ is a response variable and $X = (X_1, \ldots, X_p)$ is a $p$-dimensional covariate vector. Consider the nonparametric additive model

$$Y_i = \mu + \sum_{j=1}^{p} f_j(X_{ij}) + \varepsilon_i, \qquad (1)$$

where $\mu$ is an intercept term, $X_{ij}$ is the $j$th component of $X_i$, the $f_j$'s are unknown functions, and $\varepsilon_i$ is an unobserved random variable with mean zero and finite variance $\sigma^2$. Suppose that some of the additive components $f_j$ are zero. The problem addressed in this paper is to distinguish the nonzero components from the zero components and to estimate the nonzero components. We allow the possibility that $p$ is larger than the sample size $n$, which we represent by letting $p$ increase as $n$ increases. We propose a penalized method for variable selection in (1) and show that the proposed method can correctly select the nonzero components with high probability.

There has been much work on penalized methods for variable selection and estimation with high-dimensional data. Methods that have been proposed include the bridge estimator (Frank and Friedman 1993; Huang, Horowitz and Ma 2008); the least absolute shrinkage and selection operator or Lasso (Tibshirani 1996); the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li 2001; Fan and Peng 2004); and the minimum concave penalty (Zhang). Much progress has been made in understanding the statistical properties of these methods. In particular, many authors have studied the variable selection, estimation and prediction properties of the Lasso in high-dimensional settings. See, for example, Meinshausen and Bühlmann (2006); Zhao and Yu (2006); Zou (2006); Bunea, Tsybakov and Wegkamp (2007); Meinshausen and Yu (2008); Huang, Ma and Zhang (2008); van de Geer (2008) and Zhang and Huang (2008), among others. All these authors assume a linear or other parametric model. In many applications, however, there is little a priori justification for assuming that the effects of covariates take a linear form or belong to any other known, finite-dimensional parametric family.
For example, in studies of economic development, the effects of covariates on the growth of gross domestic product can be nonlinear. Similarly, there is evidence of nonlinearity in the gene expression data used in the empirical example in Section 5.

There is a large body of literature on estimation in nonparametric additive models. For example, Stone (1985, 1986) showed that additive spline estimators achieve the same optimal rate of convergence for a general fixed $p$ as for $p = 1$. Horowitz and Mammen (2004) and Horowitz,

Klemelä, and Mammen (2006) showed that if $p$ is fixed and mild regularity conditions hold, then oracle-efficient estimates of the $f_j$'s can be obtained by a two-step procedure. Here, oracle efficiency means that the estimator of each $f_j$ has the same asymptotic distribution that it would have if all the other $f_j$'s were known. However, these papers do not discuss variable selection in nonparametric additive models.

Zhang et al. (2004) and Lin and Zhang (2006) have investigated the use of penalization methods in smoothing spline ANOVA with a fixed number of covariates. Zhang et al. (2004) used a Lasso-type penalty but did not investigate model-selection consistency. Lin and Zhang (2006) proposed the component selection and smoothing operator (COSSO) method for model selection and estimation in multivariate nonparametric regression models. For fixed $p$, they showed that the COSSO estimator in the additive model converges at the rate $n^{-d/(2d+1)}$, where $d$ is the order of smoothness of the components. They also showed that, in the special case of a tensor product design, the COSSO correctly selects the nonzero additive components with high probability. Zhang and Lin (2006) considered the COSSO for nonparametric regression in exponential families.

Meier, van de Geer, and Bühlmann (2008) treat variable selection in a nonparametric additive model in which the numbers of zero and nonzero $f_j$'s may both be larger than $n$. They propose a penalized least-squares estimator for variable selection and estimation. They give conditions under which, with probability approaching 1, their procedure selects a set of $f_j$'s containing all the additive components whose distance from zero in a certain metric exceeds a specified threshold. However, they do not establish model-selection consistency of their procedure. Even asymptotically, the selected set may be larger than the set of nonzero $f_j$'s. Moreover, they impose a compatibility condition that relates the levels and smoothness of the $f_j$'s.
The compatibility condition does not have a straightforward, intuitive interpretation and, as they point out, cannot be checked empirically.

Ravikumar, Liu, Lafferty and Wasserman (2009) proposed a penalized approach for variable selection in nonparametric additive models. In their approach, the penalty is imposed on the $\ell_2$ norm of the nonparametric components, as well as on the mean value of the components to ensure identifiability. In their theoretical results, they require that the eigenvalues of a design matrix be bounded away from zero and infinity, where the design matrix is formed from the basis functions for the nonzero components. It is not clear whether this condition holds in general, especially when the number of nonzero components diverges with $n$. Another critical condition required in the results of Ravikumar et al. (2009) is similar to the irrepresentable condition of Zhao and Yu (2006). It is not clear for what type of basis functions this condition is satisfied. We do not require such a condition in our results on selection consistency of the adaptive group Lasso.

Several other recent papers have also considered variable selection in nonparametric models. For example, Wang, Chen and Li (2007) and Wang and Xia (2008) considered the use of group Lasso and SCAD methods for model selection and estimation in varying coefficient models with a fixed number of coefficients and covariates. Bach (2007) applies what amounts to the group Lasso to a nonparametric additive model with a fixed number of covariates. He established model selection consistency under conditions that are considerably more complicated than the ones we require for a possibly diverging number of covariates.

In this paper, we propose to use the adaptive group Lasso for variable selection in (1) based on a spline approximation to the nonparametric components. With this approximation, each nonparametric component is represented by a linear combination of spline basis functions. Consequently, the problem of component selection becomes that of selecting the groups of coefficients in the linear combinations. It is natural to apply the group Lasso method, since it is desirable to take into account the grouping structure in the approximating model. To achieve model selection consistency, we apply the group Lasso iteratively as follows. First, we use the group Lasso to obtain an initial estimator and reduce the dimension of the problem. Then we use the adaptive group Lasso to select the final set of nonparametric components. The adaptive group Lasso is a simple generalization of the adaptive Lasso (Zou 2006) to the method of the group Lasso (Yuan and Lin 2006). However, here we apply this approach to nonparametric additive modeling.

We assume that the number of nonzero $f_j$'s is fixed. This enables us to achieve model selection consistency under simple assumptions that are easy to interpret. We do not have to impose compatibility or irrepresentable conditions, nor do we need to assume conditions on the eigenvalues of certain matrices formed from the spline basis functions.
We show that the group Lasso selects a model whose number of components is bounded with probability approaching one by a constant that is independent of the sample size. Then, using the group Lasso result as the initial estimator, the adaptive group Lasso selects the correct model with probability approaching 1 and achieves the optimal rate of convergence for nonparametric estimation of an additive model.

The remainder of the paper is organized as follows. Section 2 describes the group Lasso and the adaptive group Lasso for variable selection in nonparametric additive models. Section 3 presents the asymptotic properties of these methods in "large $p$, small $n$" settings. Section 4 presents the results of simulation studies to evaluate the finite-sample performance of these methods. Section 5 provides an illustrative application, and Section 6 includes concluding remarks. Proofs of the results stated in Section 3 are given in the Appendix.

2 Adaptive group Lasso in nonparametric additive models

We describe a two-step approach that uses the group Lasso for variable selection based on a spline representation of each component in additive models. In the first step, we use the standard group Lasso to achieve an initial reduction of the dimension in the model and obtain an initial estimator of the nonparametric components. In the second step, we use the adaptive group Lasso to achieve consistent selection.

Suppose that each $X_j$ takes values in $[a, b]$ where $a < b$ are finite numbers. To ensure unique identification of the $f_j$'s, we assume that $E f_j(X_j) = 0$, $1 \le j \le p$. Let $a = \xi_0 < \xi_1 < \cdots < \xi_K < \xi_{K+1} = b$ be a partition of $[a, b]$ into $K$ subintervals $I_{Kt} = [\xi_t, \xi_{t+1})$, $t = 0, \ldots, K-1$, and $I_{KK} = [\xi_K, \xi_{K+1}]$, where $K \equiv K_n = n^{v}$ with $0 < v < 0.5$ is a positive integer such that $\max_{1 \le k \le K+1} |\xi_k - \xi_{k-1}| = O(n^{-v})$. Let $S_n$ be the space of polynomial splines of degree $l \ge 1$ consisting of functions $s$ satisfying: (i) the restriction of $s$ to $I_{Kt}$ is a polynomial of degree $l$ for $1 \le t \le K$; (ii) for $l \ge 2$ and $0 \le l' \le l - 2$, $s$ is $l'$ times continuously differentiable on $[a, b]$. This definition is phrased after Stone (1985), which is a descriptive version of Schumaker (1981, page 108, Definition 4.1). There exists a normalized B-spline basis $\{\phi_k, 1 \le k \le m_n\}$ for $S_n$, where $m_n \equiv K_n + l$ (Schumaker 1981). Thus for any $f_{nj} \in S_n$, we can write

$$f_{nj}(x) = \sum_{k=1}^{m_n} \beta_{jk} \phi_k(x), \qquad 1 \le j \le p. \qquad (2)$$

Under suitable smoothness assumptions, the $f_j$'s can be well approximated by functions in $S_n$. Accordingly, the variable selection method described in this paper is based on the representation (2).

Let $\|a\|_2 \equiv (\sum_{j=1}^{m} |a_j|^2)^{1/2}$ denote the $\ell_2$ norm of any vector $a \in \mathbb{R}^m$. Let $\beta_{nj} = (\beta_{j1}, \ldots, \beta_{jm_n})'$ and $\beta_n = (\beta_{n1}', \ldots, \beta_{np}')'$. Let $w_n = (w_{n1}, \ldots, w_{np})'$ be a given vector of weights, where $0 \le w_{nj} \le \infty$, $1 \le j \le p$. Consider the penalized least squares criterion

$$L_n(\mu, \beta_n) = \sum_{i=1}^{n} \Big[ Y_i - \mu - \sum_{j=1}^{p} \sum_{k=1}^{m_n} \beta_{jk} \phi_k(X_{ij}) \Big]^2 + \lambda_n \sum_{j=1}^{p} w_{nj} \|\beta_{nj}\|_2, \qquad (3)$$

where $\lambda_n$ is a penalty parameter. We study the estimators that minimize $L_n(\mu, \beta_n)$ subject to the constraints

$$\sum_{i=1}^{n} \sum_{k=1}^{m_n} \beta_{jk} \phi_k(X_{ij}) = 0, \qquad 1 \le j \le p. \qquad (4)$$
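To make the basis $\{\phi_k\}$ in (2) concrete, the following is a minimal sketch that evaluates a clamped B-spline basis on $[a, b]$ by the Cox-de Boor recursion. The function name `bspline_basis`, its arguments, and the convention that `degree` is the polynomial degree (so that $K$ interior knots give $K$ + `degree` + 1 basis functions) are assumptions made for illustration only; they are not taken from the software used for the results in this paper.

```python
import numpy as np

def bspline_basis(x, interior_knots, a=0.0, b=1.0, degree=3):
    """Evaluate a clamped B-spline basis at the points x in [a, b].

    Returns an array of shape (len(x), K + degree + 1), where K is the
    number of interior knots (illustrative convention, see lead-in).
    """
    x = np.asarray(x, dtype=float)
    # Clamped knot vector: boundary knots repeated degree + 1 times.
    t = np.concatenate([np.repeat(a, degree + 1),
                        np.asarray(interior_knots, dtype=float),
                        np.repeat(b, degree + 1)])
    n_int = len(t) - 1
    # Degree-0 basis: indicators of the half-open knot intervals.
    B = np.zeros((x.size, n_int))
    for i in range(n_int):
        B[:, i] = (t[i] <= x) & (x < t[i + 1])
    # Close the last non-empty interval on the right so x == b is covered.
    B[x == b, len(t) - degree - 2] = 1.0
    # Cox-de Boor recursion, raising the degree one step at a time.
    for d in range(1, degree + 1):
        B_new = np.zeros((x.size, n_int - d))
        for i in range(n_int - d):
            left_den = t[i + d] - t[i]
            right_den = t[i + d + 1] - t[i + 1]
            if left_den > 0:
                B_new[:, i] += (x - t[i]) / left_den * B[:, i]
            if right_den > 0:
                B_new[:, i] += (t[i + d + 1] - x) / right_den * B[:, i + 1]
        B = B_new
    return B

# Cubic basis with six equally spaced interior knots on [0, 1].
knots = np.linspace(0.0, 1.0, 8)[1:-1]
Phi = bspline_basis(np.linspace(0.0, 1.0, 201), knots)
```

Each row of `Phi` holds the values $\phi_1(x), \ldots, \phi_{m}(x)$ at one evaluation point; the rows sum to one because the normalized B-splines form a partition of unity.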

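These centering constraints are sample analogs of the identifying restriction $E f_j(X_j) = 0$, $1 \le j \le p$. We can convert (3)-(4) to an unconstrained optimization problem by centering the response and the basis functions. Let

$$\bar{\phi}_{jk} = \frac{1}{n} \sum_{i=1}^{n} \phi_k(X_{ij}), \qquad \psi_{jk}(x) = \phi_k(x) - \bar{\phi}_{jk}. \qquad (5)$$

For simplicity and without causing confusion, we simply write $\psi_k(x) = \psi_{jk}(x)$. Define $Z_{ij} = (\psi_1(X_{ij}), \ldots, \psi_{m_n}(X_{ij}))'$. So $Z_{ij}$ consists of values of the (centered) basis functions at the $i$th observation of the $j$th covariate. Let $Z_j = (Z_{1j}, \ldots, Z_{nj})'$ be the $n \times m_n$ design matrix corresponding to the $j$th covariate. The total design matrix is $Z = (Z_1, \ldots, Z_p)$. Let $Y = (Y_1 - \bar{Y}, \ldots, Y_n - \bar{Y})'$. With this notation, we can write

$$L_n(\beta_n; \lambda_n) = \|Y - Z\beta_n\|_2^2 + \lambda_n \sum_{j=1}^{p} w_{nj} \|\beta_{nj}\|_2. \qquad (6)$$

Here we have dropped $\mu$ in the argument of $L_n$. With the centering, $\hat{\mu} = \bar{Y}$. Then minimizing (3) subject to (4) is equivalent to minimizing (6) with respect to $\beta_n$, but the centering constraints are not needed for (6).

We now describe the two-step approach to component selection in the nonparametric additive model (1).

Step 1. Compute the group Lasso estimator. Let

$$L_{n1}(\beta_n; \lambda_{n1}) = \|Y - Z\beta_n\|_2^2 + \lambda_{n1} \sum_{j=1}^{p} \|\beta_{nj}\|_2.$$

This objective function is the special case of (6) that is obtained by setting $w_{nj} = 1$, $1 \le j \le p$. The group Lasso estimator is $\tilde{\beta}_n \equiv \tilde{\beta}_n(\lambda_{n1}) = \arg\min_{\beta_n} L_{n1}(\beta_n; \lambda_{n1})$.

Step 2. Use the group Lasso estimator $\tilde{\beta}_n$ to obtain the weights by setting

$$w_{nj} = \begin{cases} \|\tilde{\beta}_{nj}\|_2^{-1}, & \text{if } \|\tilde{\beta}_{nj}\|_2 > 0, \\ \infty, & \text{if } \|\tilde{\beta}_{nj}\|_2 = 0. \end{cases}$$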
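The weighted criterion (6), the group Lasso fit of Step 1, and the weights of Step 2 can be sketched as follows. This is an illustration under stated assumptions, not the algorithm used for the numerical results in this paper: the minimization is done here by plain proximal gradient descent with group soft-thresholding, and the names `group_lasso`, `adaptive_weights`, `adaptive_group_lasso`, the fixed iteration count, and the representation of groups as index arrays are all hypothetical choices. An infinite weight simply forces the corresponding group to zero.

```python
import numpy as np

def group_lasso(Z, y, groups, lam, weights=None, n_iter=2000):
    """Minimise ||y - Z b||_2^2 + lam * sum_j w_j ||b_j||_2 (criterion (6))
    by proximal gradient descent; `groups` is a list of index arrays."""
    w = np.ones(len(groups)) if weights is None else np.asarray(weights, float)
    beta = np.zeros(Z.shape[1])
    # Step size 1/L, where L = 2 * sigma_max(Z)^2 bounds the gradient's
    # Lipschitz constant for the squared-error term.
    step = 1.0 / (2.0 * np.linalg.norm(Z, 2) ** 2)
    for _ in range(n_iter):
        grad = -2.0 * Z.T @ (y - Z @ beta)
        v = beta - step * grad
        for j, idx in enumerate(groups):
            norm = np.linalg.norm(v[idx])
            thresh = step * lam * w[j]
            if norm <= thresh:            # also covers w_j = infinity
                beta[idx] = 0.0
            else:                         # group soft-thresholding
                beta[idx] = (1.0 - thresh / norm) * v[idx]
    return beta

def adaptive_weights(beta_init, groups):
    """Weights of Step 2: 1/||b_j||_2 for selected groups, infinity otherwise."""
    norms = np.array([np.linalg.norm(beta_init[idx]) for idx in groups])
    w = np.full(len(groups), np.inf)
    w[norms > 0] = 1.0 / norms[norms > 0]
    return w

def adaptive_group_lasso(Z, y, groups, lam1, lam2):
    """Two-step procedure: group Lasso, then the weighted criterion (6)."""
    beta_init = group_lasso(Z, y, groups, lam1)
    w = adaptive_weights(beta_init, groups)
    return group_lasso(Z, y, groups, lam2, weights=w)
```

Here `Z` is the centered design matrix and `y` the centered response, with `groups[j]` holding the column indices of $Z_j$. Any group dropped in the first step receives an infinite weight and therefore stays at zero in the second step.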

The adaptive group Lasso objective function is

$$L_{n2}(\beta_n; \lambda_{n2}) = \|Y - Z\beta_n\|_2^2 + \lambda_{n2} \sum_{j=1}^{p} w_{nj} \|\beta_{nj}\|_2.$$

Here we define $0 \cdot \infty = 0$. Thus the components not selected by the group Lasso are not included in Step 2. The adaptive group Lasso estimator is $\hat{\beta}_n \equiv \hat{\beta}_n(\lambda_{n2}) = \arg\min_{\beta_n} L_{n2}(\beta_n; \lambda_{n2})$. Finally, the adaptive group Lasso estimators of $\mu$ and $f_j$ are

$$\hat{\mu}_n = \bar{Y} \equiv n^{-1} \sum_{i=1}^{n} Y_i, \qquad \hat{f}_{nj}(x) = \sum_{k=1}^{m_n} \hat{\beta}_{jk} \psi_k(x), \quad 1 \le j \le p.$$

3 Main results

This section presents our results on the asymptotic properties of the estimators defined in Steps 1 and 2 of Section 2.

Let $k$ be a non-negative integer, and let $\alpha \in (0, 1]$ be such that $d = k + \alpha > 0.5$. Let $\mathcal{F}$ be the class of functions $f$ on $[0, 1]$ whose $k$th derivative $f^{(k)}$ exists and satisfies a Lipschitz condition of order $\alpha$:

$$|f^{(k)}(s) - f^{(k)}(t)| \le C |s - t|^{\alpha} \quad \text{for } s, t \in [a, b].$$

In (1), without loss of generality, suppose that the first $q$ components are nonzero, that is, $f_j(x) \ne 0$, $1 \le j \le q$, but $f_j(x) \equiv 0$, $q + 1 \le j \le p$. Let $A_1 = \{1, \ldots, q\}$ and $A_0 = \{q + 1, \ldots, p\}$. Define $\|f\|_2 = [\int_a^b f^2(x)\,dx]^{1/2}$ for any function $f$, whenever the integral exists.

We make the following assumptions.

(A1) The number of nonzero components $q$ is fixed and there is a constant $c_f > 0$ such that $\min_{1 \le j \le q} \|f_j\|_2 \ge c_f$.

(A2) The random variables $\varepsilon_1, \ldots, \varepsilon_n$ are independent and identically distributed with $E\varepsilon_i = 0$ and $\mathrm{Var}(\varepsilon_i) = \sigma^2$. Furthermore, their tail probabilities satisfy $P(|\varepsilon_i| > x) \le K \exp(-Cx^2)$, $i = 1, \ldots, n$, for all $x \ge 0$ and for constants $C$ and $K$.

(A3) $E f_j(X_j) = 0$ and $f_j \in \mathcal{F}$, $j = 1, \ldots, q$.

(A4) The covariate vector $X$ has a continuous density and there exist constants $C_1$ and $C_2$ such that the density function $g_j$ of $X_j$ satisfies $0 < C_1 \le g_j(x) \le C_2 < \infty$ on $[a, b]$ for every $1 \le j \le p$.

We note that (A1), (A3), and (A4) are standard conditions for nonparametric additive models. They would be needed to estimate the nonzero additive components at the optimal $\ell_2$ rate of

convergence on $[a, b]$, even if $q$ were fixed and known. Only (A2) strengthens the assumptions needed for nonparametric estimation of a nonparametric additive model. While condition (A1) is reasonable in most applications, it would be interesting to relax this condition and investigate the case when the number of nonzero components can also increase with the sample size. The only technical reason that we assume this condition is related to Lemma 3 given in the Appendix, which is concerned with the properties of the smallest and largest eigenvalues of the design matrix formed from the spline basis functions. If this lemma can be extended to the case of a divergent number of components, then (A1) can be relaxed. However, it is clear that there needs to be a restriction on the number of nonzero components to ensure model identification.

3.1 Estimation consistency of the group Lasso

In this section, we consider the selection and estimation properties of the group Lasso estimator. Define $\tilde{A}_1 = \{j : \|\tilde{\beta}_{nj}\|_2 \ne 0, 1 \le j \le p\}$. Let $|A|$ denote the cardinality of any set $A \subseteq \{1, \ldots, p\}$.

Theorem 1 Suppose that (A1) to (A4) hold and $\lambda_{n1} \ge C\sqrt{n \log(p m_n)}$ for a sufficiently large constant $C$.

(i) With probability converging to 1, $|\tilde{A}_1| \le M_1 |A_1| = M_1 q$ for a finite constant $M_1 > 1$.

(ii) If $m_n^2 \log(p m_n)/n \to 0$ and $(\lambda_{n1}^2 m_n)/n^2 \to 0$ as $n \to \infty$, then all the nonzero $\beta_{nj}$, $1 \le j \le q$, are selected with probability converging to one.

(iii)
$$\sum_{j=1}^{p} \|\tilde{\beta}_{nj} - \beta_{nj}\|_2^2 = O_p\Big(\frac{m_n^2 \log(p m_n)}{n}\Big) + O_p\Big(\frac{m_n}{n}\Big) + O\Big(\frac{1}{m_n^{2d-1}}\Big) + O\Big(\frac{4 m_n^2 \lambda_{n1}^2}{n^2}\Big).$$

Part (i) of Theorem 1 says that, with probability approaching 1, the group Lasso selects a model whose dimension is a constant multiple of the number of nonzero additive components $f_j$, regardless of the number of additive components that are zero. Part (ii) implies that every nonzero coefficient will be selected with high probability. Part (iii) shows that the difference between the coefficients in the spline representation of the nonparametric functions in (1) and their estimators converges to zero in probability.
The rate of convergence is determined by four terms: the stochastic error in estimating the nonparametric components (the first term) and the intercept $\mu$ (the second term), the spline approximation error (the third term) and the bias due to penalization (the fourth term).

Let $\tilde{f}_{nj}(x) = \sum_{k=1}^{m_n} \tilde{\beta}_{jk} \psi_k(x)$, $1 \le j \le p$. The following theorem is a consequence of Theorem 1.

Theorem 2 Suppose that (A1) to (A4) hold and that $\lambda_{n1} \ge C\sqrt{n \log(p m_n)}$ for a sufficiently large constant $C$. Then,

(i) Let $\tilde{A}_f = \{j : \|\tilde{f}_{nj}\|_2 > 0, 1 \le j \le p\}$. There is a constant $M_1 > 1$ such that, with probability converging to 1, $|\tilde{A}_f| \le M_1 q$.

(ii) If $(m_n \log(p m_n))/n \to 0$ and $(\lambda_{n1}^2 m_n)/n^2 \to 0$ as $n \to \infty$, then all the nonzero additive components $f_j$, $1 \le j \le q$, are selected with probability converging to one.

(iii)
$$\|\tilde{f}_{nj} - f_j\|_2^2 = O_p\Big(\frac{m_n \log(p m_n)}{n}\Big) + O_p\Big(\frac{1}{n}\Big) + O\Big(\frac{1}{m_n^{2d}}\Big) + O\Big(\frac{4 m_n \lambda_{n1}^2}{n^2}\Big), \qquad j \in \tilde{A}_2,$$
where $\tilde{A}_2 = A_1 \cup \tilde{A}_1$.

Thus under the conditions of Theorem 2, the group Lasso selects all the nonzero additive components with high probability. Part (iii) of the theorem gives the rate of convergence of the group Lasso estimator of the nonparametric components.

For any two sequences $\{a_n, b_n, n = 1, 2, \ldots\}$, we write $a_n \asymp b_n$ if there are constants $0 < c_1 < c_2 < \infty$ such that $c_1 \le a_n/b_n \le c_2$ for all $n$ sufficiently large. We now state a useful corollary of Theorem 2.

Corollary 1 Suppose that (A1) to (A4) hold. If $\lambda_{n1} \asymp \sqrt{n \log(p m_n)}$ and $m_n \asymp n^{1/(2d+1)}$, then,

(i) If $n^{-2d/(2d+1)} \log(p) \to 0$ as $n \to \infty$, then with probability converging to one, all the nonzero components $f_j$, $1 \le j \le q$, are selected and the number of selected components is no more than $M_1 q$.

(ii)
$$\|\tilde{f}_{nj} - f_j\|_2^2 = O_p\big(n^{-2d/(2d+1)} \log(p m_n)\big), \qquad j \in \tilde{A}_2.$$

For the $\lambda_{n1}$ and $m_n$ given in Corollary 1, the number of zero components can be as large as $\exp(o(n^{2d/(2d+1)}))$. For example, if each $f_j$ has a continuous second derivative ($d = 2$), then it is $\exp(o(n^{4/5}))$, which can be much larger than $n$.

3.2 Selection consistency of the adaptive group Lasso

We now consider the properties of the adaptive group Lasso. We first state a general result concerning the selection consistency of the adaptive group Lasso, assuming an initial consistent estimator is available. We then apply it to the case when the group Lasso is used as the initial estimator. We make the following assumptions.

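(B1) The initial estimators $\tilde{\beta}_{nj}$ are $r_n$-consistent at zero:
$$r_n \max_{j \in A_0} \|\tilde{\beta}_{nj}\|_2 = O_P(1), \qquad r_n \to \infty,$$
and there exists a constant $c_b > 0$ such that
$$P\Big(\min_{j \in A_1} \|\tilde{\beta}_{nj}\|_2 \ge c_b b_{n1}\Big) \to 1,$$
where $b_{n1} = \min_{j \in A_1} \|\beta_{nj}\|_2$.

(B2) Let $q$ be the number of nonzero components and $s_n = p - q$ be the number of zero components. Suppose that

(a) $\dfrac{m_n}{n^{1/2}} + \dfrac{\lambda_{n2} m_n^{1/4}}{n} = o(1)$,

(b) $\dfrac{n^{1/2} \log^{1/2}(s_n m_n)}{\lambda_{n2} r_n} + \dfrac{n}{\lambda_{n2} r_n m_n^{(2d+1)/2}} = o(1)$.

We state condition (B1) for a general initial estimator, to highlight the point that the availability of an $r_n$-consistent estimator at zero is crucial for the adaptive group Lasso to be selection consistent. In other words, any initial estimator satisfying (B1) will ensure that the adaptive group Lasso (based on this initial estimator) is selection consistent, provided that certain regularity conditions are satisfied. We note that it follows immediately from Theorem 1 that the group Lasso estimator satisfies (B1). We will come back to this point below.

For $\hat{\beta}_n \equiv (\hat{\beta}_{n1}', \ldots, \hat{\beta}_{np}')'$ and $\beta_n \equiv (\beta_{n1}', \ldots, \beta_{np}')'$, we say $\hat{\beta}_n =_0 \beta_n$ if $\mathrm{sgn}_0(\|\hat{\beta}_{nj}\|) = \mathrm{sgn}_0(\|\beta_{nj}\|)$, $1 \le j \le p$, where $\mathrm{sgn}_0(|x|) = 1$ if $|x| > 0$ and $= 0$ if $|x| = 0$.

Theorem 3 Suppose that conditions (B1), (B2) and (A1)-(A4) hold. Then

(i) $P(\hat{\beta}_n =_0 \beta_n) \to 1$.

(ii)
$$\sum_{j=1}^{q} \|\hat{\beta}_{nj} - \beta_{nj}\|_2^2 = O_p\Big(\frac{m_n^2}{n}\Big) + O_p\Big(\frac{m_n}{n}\Big) + O\Big(\frac{1}{m_n^{2d-1}}\Big) + O\Big(\frac{4 m_n^2 \lambda_{n2}^2}{n^2}\Big).$$

This theorem is concerned with the selection and estimation properties of the adaptive group Lasso in terms of $\hat{\beta}_n$. The following theorem states the results in terms of the estimators of the nonparametric components.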

Theorem 4 Suppose that conditions (B1), (B2) and (A1)-(A4) hold. Then

(i) $P\big(\|\hat{f}_{nj}\|_2 > 0,\ j \in A_1 \text{ and } \|\hat{f}_{nj}\|_2 = 0,\ j \in A_0\big) \to 1$.

(ii)
$$\sum_{j=1}^{q} \|\hat{f}_{nj} - f_j\|_2^2 = O_p\Big(\frac{m_n}{n}\Big) + O_p\Big(\frac{1}{n}\Big) + O\Big(\frac{1}{m_n^{2d}}\Big) + O\Big(\frac{4 m_n \lambda_{n2}^2}{n^2}\Big).$$

Part (i) of this theorem states that the adaptive group Lasso can consistently distinguish nonzero components from zero components. Part (ii) gives an upper bound on the rate of convergence of the estimator.

We now apply the above results to our proposed procedure described in Section 2, in which we first obtain the group Lasso estimator and then use it as the initial estimator in the adaptive group Lasso.

By Theorem 1, if $\lambda_{n1} \asymp \sqrt{n \log(p m_n)}$ and $m_n \asymp n^{1/(2d+1)}$ for $d \ge 1$, then the group Lasso estimator satisfies (B1) with $r_n \asymp n^{d/(2d+1)}/\sqrt{\log(p m_n)}$. In this case, (B2) simplifies to

$$\frac{\lambda_{n2}}{n^{(8d+3)/(8d+4)}} = o(1) \quad \text{and} \quad \frac{n^{1/(4d+2)} \log^{1/2}(p m_n)}{\lambda_{n2}} = o(1). \qquad (7)$$

We summarize the above discussion in the following corollary.

Corollary 2 Let the group Lasso estimator $\tilde{\beta}_n \equiv \tilde{\beta}_n(\lambda_{n1})$ with $\lambda_{n1} \asymp \sqrt{n \log(p m_n)}$ and $m_n \asymp n^{1/(2d+1)}$ be the initial estimator in the adaptive group Lasso. Suppose that the conditions of Theorem 1 hold. If $\lambda_{n2} \le O(n^{1/2})$ and satisfies (7), then the adaptive group Lasso consistently selects the nonzero components in (1), that is, part (i) of Theorem 4 holds. In addition,

$$\sum_{j=1}^{q} \|\hat{f}_{nj} - f_j\|_2^2 = O_p\big(n^{-2d/(2d+1)}\big).$$

This corollary follows directly from Theorems 1 and 4. The largest $\lambda_{n2}$ allowed is $\lambda_{n2} = O(n^{1/2})$. With this $\lambda_{n2}$, the first condition in (7) is satisfied. Substituting it into the second condition in (7), we obtain $p = \exp(o(n^{2d/(2d+1)}))$, which is the largest $p$ permitted and can be larger than $n$. Thus, under the conditions of this corollary, our proposed adaptive group Lasso estimator using the group Lasso as the initial estimator is selection consistent and achieves the optimal rate of convergence even when $p$ is larger than $n$. Following model selection, oracle-efficient, asymptotically normal estimators of the nonzero components can be obtained by using existing methods.
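The substitution of the largest admissible penalty into the two conditions labeled (7) can be written out explicitly. The following is only a worked restatement under the choices $\lambda_{n2} = n^{1/2}$ and $m_n \asymp n^{1/(2d+1)}$; the step $\log m_n = O(\log n)$ is used to pass from $\log(p m_n)$ to $\log p$ and is harmless when $p$ grows at least polynomially in $n$.

```latex
\begin{align*}
\frac{\lambda_{n2}}{n^{(8d+3)/(8d+4)}}
  &= n^{\frac{1}{2} - \frac{8d+3}{8d+4}}
   = n^{-\frac{4d+1}{8d+4}} \longrightarrow 0, \\[4pt]
\frac{n^{1/(4d+2)} \log^{1/2}(p m_n)}{\lambda_{n2}}
  &= n^{\frac{1}{4d+2} - \frac{1}{2}} \log^{1/2}(p m_n)
   = n^{-\frac{d}{2d+1}} \log^{1/2}(p m_n), \\[4pt]
n^{-\frac{d}{2d+1}} \log^{1/2}(p m_n) = o(1)
  &\iff \log(p m_n) = o\bigl(n^{2d/(2d+1)}\bigr)
   \iff p = \exp\bigl(o(n^{2d/(2d+1)})\bigr).
\end{align*}
```

For instance, with $d = 2$ the first exponent is $-9/20$ and the admissible dimension is $p = \exp(o(n^{4/5}))$.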

4 Simulation studies

We use simulation to evaluate the performance of the adaptive group Lasso with regard to variable selection. The generating model is

$$y_i = f(x_i) + \varepsilon_i \equiv \sum_{j=1}^{p} f_j(x_{ij}) + \varepsilon_i, \qquad i = 1, \ldots, n. \qquad (8)$$

Since $p$ can be larger than $n$, we consider two ways to select the penalty parameter, the BIC (Schwarz 1978) and the EBIC (Chen and Chen 2008). The BIC is defined as

$$\mathrm{BIC}(\lambda) = \log(\mathrm{RSS}_\lambda) + \mathrm{df}_\lambda \cdot \frac{\log n}{n}.$$

Here $\mathrm{RSS}_\lambda$ is the residual sum of squares for a given $\lambda$, and the degrees of freedom are $\mathrm{df}_\lambda = \hat{q}_\lambda m_n$, where $\hat{q}_\lambda$ is the number of nonzero estimated components for the given $\lambda$. The EBIC is defined as

$$\mathrm{EBIC}(\lambda) = \log(\mathrm{RSS}_\lambda) + \mathrm{df}_\lambda \cdot \frac{\log n}{n} + \nu \cdot \mathrm{df}_\lambda \cdot \frac{\log p}{n},$$

where $0 \le \nu \le 1$ is a constant. We use $\nu = $ ….

We have also considered two other possible ways of defining df: (a) using the trace of a linear smoother based on a quadratic approximation; (b) using the number of estimated nonzero components. We have decided to use the definition given above based on the results from our simulations. We note that the df for the group Lasso of Yuan and Lin (2006) requires an initial (least squares) estimator, which is not available when $p > n$. Thus their df is not applicable to our problem.

In our simulation example, we compare the adaptive group Lasso with the group Lasso and the ordinary Lasso. Here the ordinary Lasso estimator is defined as the value that minimizes

$$\|Y - Z\beta_n\|_2^2 + \lambda_n \sum_{j=1}^{p} \sum_{k=1}^{m_n} |\beta_{jk}|.$$

This simple application of the Lasso does not take into account the grouping structure in the spline expansions of the components. The group Lasso and the adaptive group Lasso estimates are computed using the algorithm proposed by Yuan and Lin (2006). The ordinary Lasso estimates are computed using the Lars algorithm (Efron et al. 2004). The group Lasso is used as the initial estimate for the adaptive group Lasso.

We also compare the results from the nonparametric additive modeling with those from the

standard linear regression model with Lasso. We note that this is not a fair comparison because the generating model is highly nonlinear. Our purpose is to illustrate that it is necessary to use nonparametric models when the underlying model deviates substantially from linear models in the context of variable selection with high-dimensional data and that model misspecification can lead to bad selection results.

Example 1. We generate data from the model

$$y_i = f(x_i) + \varepsilon_i \equiv \sum_{j=1}^{p} f_j(x_{ij}) + \varepsilon_i, \qquad i = 1, \ldots, n,$$

where
$$f_1(t) = 5t, \qquad f_2(t) = 3(2t - 1)^2, \qquad f_3(t) = \frac{4 \sin(2\pi t)}{2 - \sin(2\pi t)},$$
$$f_4(t) = 6\big(0.1 \sin(2\pi t) + 0.2 \cos(2\pi t) + 0.3 \sin(2\pi t)^2 + 0.4 \cos(2\pi t)^3 + 0.5 \sin(2\pi t)^3\big),$$
and $f_5(t) = \cdots = f_p(t) = 0$. Thus the number of nonzero functions is $q = 4$. This generating model is the same as Example 1 of Lin and Zhang (2006). However, here we use this model in high-dimensional settings. We consider the cases where $p = 1000$ and three different sample sizes: $n = 50$, 100 and 200. We use the cubic B-spline with six evenly distributed knots for all the functions $f_k$. The number of replications in all the simulations is 400.

The covariates are simulated as follows. First, we generate $w_{i1}, \ldots, w_{ip}, u_i, v_i$ independently from $N(0, 1)$ truncated to the interval $[0, 1]$, $i = 1, \ldots, n$. Then we set $x_{ik} = (w_{ik} + t u_i)/(1 + t)$ for $k = 1, \ldots, 4$ and $x_{ik} = (w_{ik} + t v_i)/(1 + t)$ for $k = 5, \ldots, p$, where the parameter $t$ controls the amount of correlation among predictors. For $j \ne k$, we have $\mathrm{Corr}(x_{ik}, x_{ij}) = t^2/(1 + t^2)$, $1 \le j \le 4$, $1 \le k \le 4$, and $\mathrm{Corr}(x_{ik}, x_{ij}) = t^2/(1 + t^2)$, $4 < j \le p$, $4 < k \le p$, but the covariates of the nonzero components and zero components are independent. We consider $t = 0, 1$ in our simulation. The signal-to-noise ratio is defined to be $\mathrm{sd}(f)/\mathrm{sd}(\varepsilon)$. The error term is chosen to be $\varepsilon_i \sim N(0, …)$ to give a signal-to-noise ratio (SNR) of 3.11 : 1. This value is the same as the estimated SNR in the real data example below, which is the square root of the ratio of the sum of squared estimated components to the sum of squared residuals.

The results of 400 Monte Carlo replications are summarized in Table 1.
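The data-generating mechanism of Example 1 can be sketched as follows. This is an illustration only: the function names `truncated_std_normal` and `simulate_example1`, the rejection sampler used for the truncated normal draws, and the `noise_sd` argument (the error standard deviation is left as a free parameter here) are assumptions made for this sketch, and the coefficients in `f4` follow the form of Example 1 of Lin and Zhang (2006).

```python
import numpy as np

def truncated_std_normal(rng, shape):
    """N(0, 1) draws truncated to [0, 1], obtained by rejection sampling."""
    out = np.empty(shape)
    flat = out.reshape(-1)               # view on `out`, filled in place
    filled = 0
    while filled < flat.size:
        draw = rng.standard_normal(4 * (flat.size - filled) + 16)
        keep = draw[(draw >= 0.0) & (draw <= 1.0)]
        take = min(keep.size, flat.size - filled)
        flat[filled:filled + take] = keep[:take]
        filled += take
    return out

def simulate_example1(n, p, t, noise_sd, seed=0):
    """Covariates, responses and true mean for the Example 1 design."""
    rng = np.random.default_rng(seed)
    w = truncated_std_normal(rng, (n, p))
    u = truncated_std_normal(rng, n)
    v = truncated_std_normal(rng, n)
    x = np.empty((n, p))
    # First four covariates share u_i; the remaining ones share v_i.
    x[:, :4] = (w[:, :4] + t * u[:, None]) / (1.0 + t)
    x[:, 4:] = (w[:, 4:] + t * v[:, None]) / (1.0 + t)
    s = lambda z: np.sin(2.0 * np.pi * z)
    c = lambda z: np.cos(2.0 * np.pi * z)
    f1 = 5.0 * x[:, 0]
    f2 = 3.0 * (2.0 * x[:, 1] - 1.0) ** 2
    f3 = 4.0 * s(x[:, 2]) / (2.0 - s(x[:, 2]))
    f4 = 6.0 * (0.1 * s(x[:, 3]) + 0.2 * c(x[:, 3]) + 0.3 * s(x[:, 3]) ** 2
                + 0.4 * c(x[:, 3]) ** 3 + 0.5 * s(x[:, 3]) ** 3)
    f = f1 + f2 + f3 + f4                # true conditional mean
    y = f + noise_sd * rng.standard_normal(n)
    return x, y, f
```

With `t = 0` the covariates are independent; with `t = 1` any two covariates within the same block have correlation $t^2/(1+t^2) = 0.5$, while covariates in different blocks remain independent.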
The columns are the mean number of variables selected (NV), model error (ME), the percentage of replications in which all the correct additive components are included in the selected model (IN), and the percentage of replications in which precisely the correct components are selected (CS). The corresponding standard errors are in parentheses. The model error is computed as the average of $n^{-1} \sum_{i=1}^{n} [\hat{f}(x_i) - f(x_i)]^2$ over the 400 Monte Carlo replications, where $f$ is the true conditional mean function.

Table 1 shows that the adaptive group Lasso selects all the nonzero components (IN) and selects exactly the correct model (CS) more frequently than the other methods do. For example,

with the BIC and $n = 200$, the percentage of correct selections (CS) by the adaptive group Lasso ranges from 65.25% to 81%, which is much higher than the ranges …% for the group Lasso and …% for the ordinary Lasso. The adaptive group Lasso and group Lasso perform better than the ordinary Lasso in all of the experiments, which illustrates the importance of taking account of the group structure of the coefficients of the spline expansion. Correlation among covariates increases the difficulty of component selection, so it is not surprising that all methods perform better with independent covariates than with correlated ones. The percentage of correct selections increases as the sample size increases. The linear model with Lasso never selects the correct model. This illustrates the poor results that can be produced by a linear model when the true conditional mean function is nonlinear.

Table 1 also shows that the model error (ME) of the group Lasso is only slightly larger than that of the adaptive group Lasso. The models selected by the group Lasso nest those selected by the adaptive group Lasso and, therefore, have more estimated coefficients. Therefore, the group Lasso estimators of the conditional mean function have a larger variance and larger ME. The differences between the MEs of the two methods are small, however, because as can be seen from the NV column, the models selected by the group Lasso in our experiments have only slightly more estimated coefficients than the models selected by the adaptive group Lasso.

Example 2. We now compare the adaptive group Lasso with the COSSO (Lin and Zhang 2006). This comparison was suggested to us by the Associate Editor. Because the COSSO algorithm only works for the case when $p$ is smaller than $n$, we use the same set-up as in Example 1 of Lin and Zhang (2006). In this example, the generating model is as in (8) with 4 nonzero components. Let $X_j = (W_j + tU)/(1 + t)$, $j = 1, \ldots, p$, where $W_1, \ldots, W_p$ and $U$ are i.i.d. from $N(0, 1)$, truncated to the interval $[0, 1]$. Therefore $\mathrm{corr}(X_j, X_k) = t^2/(1 + t^2)$ for $j \ne k$.
The random error term is $\varepsilon \sim N(0, …)$. The SNR is 3:1. We consider three different sample sizes $n = 50$, 100 or 200 and three different numbers of predictors $p = 10$, 20 or 50. The COSSO estimator is computed using the Matlab software which is publicly available at hzhang/cosso.html.

The COSSO procedure uses either generalized cross validation or 5-fold cross validation. Based on the simulation results of Lin and Zhang (2006) and our own simulations, the COSSO with 5-fold cross validation has better selection performance. Thus we compare the adaptive group Lasso with BIC or EBIC with the COSSO with 5-fold cross validation. The results are given in Table 2. For independent predictors, when $n = 200$ and $p = 10$, 20 or 50, the adaptive group Lasso and COSSO have similar performance in terms of selection accuracy and model error. However, for smaller $n$ and larger $p$, the adaptive group Lasso does significantly better. For example, for $n = 100$ and $p = 50$, the percentage of correct selection for the adaptive group Lasso is 81-83%,

but it is only 11% for the COSSO. The model error of the adaptive group Lasso is similar to or smaller than that of the COSSO. In several experiments, the model error of the COSSO is 2 to more than 7 times larger than that of the adaptive group Lasso. It is interesting to note that when $n = 50$ and $p = 20$ or 50, the adaptive group Lasso still does a decent job in selecting the correct model, but the COSSO does poorly in these two cases. In particular, for $n = 50$ and $p = 50$, the COSSO did not select the exact correct model in any of the simulation runs. For dependent predictors, the comparison is even more favorable to the adaptive group Lasso, which performs significantly better than the COSSO in terms of both model error and selection accuracy in all the cases.

5 Data example

We use the data set reported in Scheetz et al. (2006) to illustrate the application of the proposed method in high-dimensional settings. For this data set, 120 twelve-week-old male rats were selected for tissue harvesting from the eyes and for microarray analysis. The microarrays used to analyze the RNA from the eyes of these animals contain over 31,042 different probe sets (Affymetrix GeneChip Rat Genome Array). The intensity values were normalized using the robust multi-chip averaging method (Irizarry et al. 2003) to obtain summary expression values for each probe set. Gene expression levels were analyzed on a logarithmic scale.

We are interested in finding the genes that are related to the gene TRIM32. This gene was recently found to cause Bardet-Biedl syndrome (Chiang et al. 2006), which is a genetically heterogeneous disease of multiple organ systems including the retina. Although over 30,000 probe sets are represented on the Rat Genome Array, many of them are not expressed in the eye tissue, and initial screening using correlation shows that most probe sets have very low correlation with TRIM32. In addition, we expect only a small number of genes to be related to TRIM32. Therefore, we use the 500 probe sets that are expressed in the eye and have the highest marginal correlation in the analysis.
Thus the sample size is $n = 120$ (i.e., there are 120 arrays from 120 rats) and $p = 500$. It is expected that only a few genes are related to TRIM32. Therefore this is a sparse, high-dimensional regression problem.

We use the nonparametric additive model to model the relation between the expression of TRIM32 and those of the 500 genes. We estimate model (1) using the ordinary Lasso, group Lasso, and adaptive group Lasso for the nonparametric additive model. To compare the results of the nonparametric additive model with that of the linear regression model, we also analyzed the data using the linear regression model with Lasso. We scale the covariates so that their values are

16 betwee 0 ad 1 ad use cubic splies with six evely distributed kots to estimate the additive compoets. The pealty parameters i all the methods are chose usig the BIC or EBIC as i the simulatio study. Table 3 lists the probes selected by the group Lasso ad the adaptive group Lasso, idicated by the check sigs. Table 4 shows the umber of variables, the residual sums of squares obtaied with each estimatio method. For the ordiary Lasso with the splie expasio, a variable is cosidered to be selected if ay of the estimated coefficiets of the splie approximatio to its additive compoet are o-zero. Depedig o whether BIC or EBIC is used, the group Lasso selects variables, the adaptive group Lasso selects 15 variables ad the ordiary Lasso with the splie expasio selects variables, the liear model selects 8-14 variables. Table 4 shows that the adaptive group Lasso does better tha the other methods i terms of residual sum of squares (RSS. We have also examied the plots (ot show of the estimated additive compoets obtaied with the group Lasso ad the adaptive group Lasso, respectively. Most are highly oliear, cofirmig the eed for takig ito accout oliearity. I order to evaluate the performace of the methods, we use cross-validatio ad compare the predictio mea square errors (PEs. We radomly partitio the data ito 6 subsets, each set cosistig of 20 observatios. We the fit the model with 5 subsets as traiig set ad calculate the PE for the remaiig set which we cosider as test set. We repeat this process 6 times, cosiderig oe of the 6 subsets as test set every time. We compute the average of the umbers of probes selected ad the predictio errors of these 6 calculatios. The we replicate this process 400 times (this is suggested to us by the Associate Editor. Table 5 gives the average values over 400 replicatios. The adaptive group Lasso has smaller average predictio error tha the group Lasso, the ordiary Lasso ad the liear regressio with Lasso. 
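The repeated 6-fold cross-validation scheme described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `repeated_kfold_pe`, the `fit` callable interface and the least-squares stand-in learner are assumptions introduced here, and the demo uses 20 replications rather than 400 to keep it fast. It averages only the prediction error, not the number of probes selected.

```python
import numpy as np

def repeated_kfold_pe(X, y, fit, n_folds=6, n_reps=400, seed=0):
    """Average test-set mean squared prediction error over repeated K-fold splits.

    `fit(X_train, y_train)` is a hypothetical callable that returns a function
    mapping a covariate matrix to predictions.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    rep_means = []
    for _ in range(n_reps):
        # Random partition of the n observations into n_folds subsets.
        folds = np.array_split(rng.permutation(n), n_folds)
        fold_pe = []
        for test_idx in folds:
            train_mask = np.ones(n, dtype=bool)
            train_mask[test_idx] = False
            predict = fit(X[train_mask], y[train_mask])
            resid = y[test_idx] - predict(X[test_idx])
            fold_pe.append(np.mean(resid ** 2))
        # Average over the folds of this replication.
        rep_means.append(np.mean(fold_pe))
    # Average over replications.
    return float(np.mean(rep_means))

def fit_least_squares(X_train, y_train):
    # Stand-in learner for the demo only: ordinary least squares with intercept.
    A = np.column_stack([np.ones(len(y_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef

# Simulated data with n = 120, so each of the 6 folds has 20 observations.
rng = np.random.default_rng(1)
X = rng.uniform(size=(120, 3))
y = 1.0 + 2.0 * X[:, 0] + 0.1 * rng.normal(size=120)
pe = repeated_kfold_pe(X, y, fit_least_squares, n_folds=6, n_reps=20)
```

Any of the estimators compared in this section could be passed in place of `fit_least_squares`, provided it is wrapped to follow the same assumed interface, including the choice of penalty parameter inside each training fold.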
The ordinary Lasso selects far more probe sets than the other approaches, but this does not lead to better prediction performance. Therefore, in this example, the adaptive group Lasso provides the investigator with a more targeted list of probe sets, which can serve as a starting point for further study.

It is of interest to compare the selection results from the adaptive group Lasso and the linear regression model with Lasso. The adaptive group Lasso and the linear model with Lasso select different sets of genes. When the penalty parameter is chosen with the BIC, the adaptive group Lasso selects 5 genes that are not selected by the linear model with Lasso. In addition, the linear model with Lasso selects 5 genes that are not selected by the adaptive group Lasso. When the penalty parameter is selected with the EBIC, the adaptive group Lasso selects 10 genes that are not selected by the linear model with Lasso. The estimated effects of many of the genes are nonlinear, and the Monte Carlo results of Section 4 show that the performance of the linear model with Lasso can be very poor in the presence of nonlinearity. Therefore, we interpret the differences between the gene selections of the adaptive group Lasso and the linear model with Lasso as evidence that the selections produced by the linear model are misleading.

6 Concluding remarks

In this paper, we propose to use the adaptive group Lasso for variable selection in nonparametric additive models in sparse, high-dimensional settings. A key requirement for the adaptive group Lasso to be selection consistent is that the initial estimator is estimation consistent and selects all the important components with high probability. In low-dimensional settings, finding an initial consistent estimator is relatively easy and can be achieved by many well established approaches such as the additive spline estimators. However, in high-dimensional settings, finding an initial consistent estimator is difficult. Under the conditions stated in Theorem 1, the group Lasso is shown to be consistent and to select all the important components. Thus the group Lasso can be used as the initial estimator in the adaptive group Lasso to achieve selection consistency. Following model selection, oracle-efficient, asymptotically normal estimators of the non-zero components can be obtained by using existing methods. Our simulation results indicate that our procedure works well for variable selection in the models considered. Therefore, the adaptive group Lasso is a useful approach for variable selection and estimation in sparse, high-dimensional nonparametric additive models.

Our theoretical results are concerned with a fixed sequence of penalty parameters, and so are not applicable to the case where the penalty parameters are selected by data-driven procedures such as the BIC. This is an important and challenging problem that deserves further investigation, but is beyond the scope of this paper. We have only considered linear nonparametric additive models. The adaptive group Lasso can be applied to generalized nonparametric additive models, such as the generalized logistic nonparametric additive model and other nonparametric models with high-dimensional data.
However, more work is needed to understand the properties of this approach in those more complicated models.

Acknowledgements. The authors wish to thank the Editor, Associate Editor and two anonymous referees for their helpful comments. Jian Huang is supported in part by NIH grant CA and NSF grant DMS. Joel L. Horowitz was supported in part by NSF grant SES.

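7 Appendix: Proofs

We first prove the following lemmas. Denote the centered versions of $S_n$ by

$$S_{nj}^0 = \Big\{ f_{nj} : f_{nj}(x) = \sum_{k=1}^{m_n} \beta_{jk}\psi_k(x),\ (\beta_{j1}, \ldots, \beta_{jm_n}) \in \mathbb{R}^{m_n} \Big\}, \quad 1 \le j \le p,$$

where the $\psi_k$'s are the centered spline bases defined in (5).

Lemma 1. Suppose that $f \in \mathcal{F}$ and $Ef(X_j) = 0$. Then, under (A3) and (A4), there exists an $f_n \in S_{nj}^0$ satisfying

$$\|f_n - f\|_2 = O_p(m_n^{-d} + m_n^{1/2} n^{-1/2}).$$

In particular, if we choose $m_n = O(n^{1/(2d+1)})$, then $\|f_n - f\|_2 = O_p(m_n^{-d}) = O_p(n^{-d/(2d+1)})$.

Proof of Lemma 1. By (A4), for $f \in \mathcal{F}$, there is an $f_n^* \in S_n$ such that $\|f - f_n^*\|_2 = O(m_n^{-d})$. Let $f_n = f_n^* - n^{-1}\sum_{i=1}^n f_n^*(X_{ij})$. Then $f_n \in S_{nj}^0$ and $|f_n - f| \le |f_n^* - f| + |P_n f_n^*|$, where $P_n$ is the empirical measure of the i.i.d. random variables $X_{1j}, \ldots, X_{nj}$. Consider

$$P_n f_n^* = (P_n - P) f_n^* + P(f_n^* - f).$$

Here we use the linear functional notation, i.e., $Pf = \int f\,dP$, where $P$ is the probability measure of $X_{1j}$; the second term uses $Pf = Ef(X_j) = 0$. For any $\varepsilon > 0$, the bracketing number $N_{[\,]}(\varepsilon, S_{nj}^0, L_2(P))$ of $S_{nj}^0$ satisfies $\log N_{[\,]}(\varepsilon, S_{nj}^0, L_2(P)) \le c_1 m_n \log(1/\varepsilon)$ for some constant $c_1 > 0$ (Shen and Wong 1994, page 597). Thus by the maximal inequality, see e.g. Van der Vaart (1998, page 288), $(P_n - P) f_n^* = O_p(n^{-1/2} m_n^{1/2})$. By (A4), $|P(f_n^* - f)| \le C_2 \|f_n^* - f\|_2 = O(m_n^{-d})$ for some constant $C_2 > 0$. The lemma follows from the triangle inequality.

Lemma 2. Suppose that conditions (A2) and (A4) hold. Let

$$T_{jk} = n^{-1/2} m_n^{1/2} \sum_{i=1}^n \psi_k(X_{ij})\varepsilon_i, \quad 1 \le j \le p,\ 1 \le k \le m_n,$$

and $T_n = \max_{1 \le j \le p,\, 1 \le k \le m_n} |T_{jk}|$. Then

$$E(T_n) \le C_1 n^{-1/2} m_n^{1/2} \sqrt{\log(pm_n)}\,\Big( \sqrt{2C_2 m_n^{-1} n \log(pm_n)} + 4\log(2pm_n) + C_2 n m_n^{-1} \Big)^{1/2},$$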

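where $C_1$ and $C_2$ are two positive constants. In particular, when $m_n \log(pm_n)/n \to 0$,

$$E(T_n) = O(1)\sqrt{\log(pm_n)}.$$

Proof of Lemma 2. Let $s_{njk}^2 = \sum_{i=1}^n \psi_k^2(X_{ij})$. Conditional on the $X_{ij}$'s, the $T_{jk}$'s are subgaussian. Let $s_n^2 = \max_{1 \le j \le p,\, 1 \le k \le m_n} s_{njk}^2$. By (A2) and the maximal inequality for subgaussian random variables (Van der Vaart and Wellner 1996, Lemmas and 2.2.2),

$$E\Big( \max_{1 \le j \le p,\, 1 \le k \le m_n} |T_{jk}| \,\Big|\, \{X_{ij}, 1 \le i \le n, 1 \le j \le p\} \Big) \le C_1 n^{-1/2} m_n^{1/2} s_n \sqrt{\log(pm_n)}.$$

Therefore,

$$E\Big( \max_{1 \le j \le p,\, 1 \le k \le m_n} |T_{jk}| \Big) \le C_1 n^{-1/2} m_n^{1/2} \sqrt{\log(pm_n)}\, E(s_n), \tag{9}$$

where $C_1 > 0$ is a constant. By (A4) and the properties of B-splines,

$$|\psi_k(X_{ij})| \le |\phi_k(X_{ij})| + |\bar\phi_{jk}| \le 2 \quad \text{and} \quad E(\psi_k(X_{ij}))^2 \le C_2 m_n^{-1}, \tag{10}$$

for a constant $C_2 > 0$, for every $1 \le j \le p$ and $1 \le k \le m_n$. By (10),

$$\sum_{i=1}^n E[\psi_k^2(X_{ij}) - E\psi_k^2(X_{ij})]^2 \le 4C_2 n m_n^{-1}, \tag{11}$$

and

$$\max_{1 \le j \le p,\, 1 \le k \le m_n} \sum_{i=1}^n E\psi_k^2(X_{ij}) \le C_2 n m_n^{-1}. \tag{12}$$

By Lemma A.1 of van de Geer (2008), (10) and (11) imply

$$E\Big( \max_{1 \le j \le p,\, 1 \le k \le m_n} \Big| \sum_{i=1}^n \{\psi_k^2(X_{ij}) - E\psi_k^2(X_{ij})\} \Big| \Big) \le \sqrt{2C_2 m_n^{-1} n \log(pm_n)} + 4\log(2pm_n).$$

Therefore, by (12) and the triangle inequality,

$$Es_n^2 \le \sqrt{2C_2 m_n^{-1} n \log(pm_n)} + 4\log(2pm_n) + C_2 n m_n^{-1}.$$

Now since $Es_n \le (Es_n^2)^{1/2}$, we have

$$Es_n \le \Big( \sqrt{2C_2 m_n^{-1} n \log(pm_n)} + 4\log(2pm_n) + C_2 n m_n^{-1} \Big)^{1/2}. \tag{13}$$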

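The lemma follows from (9) and (13).

Denote $\beta_A = (\beta_j', j \in A)'$ and $Z_A = (Z_j, j \in A)$. Here $\beta_A$ is an $|A|m_n \times 1$ vector and $Z_A$ is an $n \times |A|m_n$ matrix. Let $C_A = Z_A'Z_A/n$. When $A = \{1, \ldots, p\}$, we simply write $C = Z'Z/n$. Let $\rho_{\min}(C_A)$ and $\rho_{\max}(C_A)$ be the minimum and maximum eigenvalues of $C_A$, respectively.

Lemma 3. Let $m_n = O(n^{\gamma})$ where $0 < \gamma < 0.5$. Suppose that $|A|$ is bounded by a fixed constant independent of $n$ and $p$. Let $h \equiv h_n \asymp m_n^{-1}$. Then, under (A3) and (A4), with probability converging to one,

$$c_1 h \le \rho_{\min}(C_A) \le \rho_{\max}(C_A) \le c_2 h,$$

where $c_1$ and $c_2$ are two positive constants.

Proof of Lemma 3. Without loss of generality, suppose $A = \{1, \ldots, q\}$. Then $Z_A = (Z_1, \ldots, Z_q)$. Let $b = (b_1', \ldots, b_q')'$, where $b_j \in \mathbb{R}^{m_n}$. By Lemma 3 of Stone (1985),

$$\|Z_1 b_1 + \cdots + Z_q b_q\|_2 \ge c_3 (\|Z_1 b_1\|_2 + \cdots + \|Z_q b_q\|_2)$$

for a certain constant $c_3 > 0$. By the triangle inequality,

$$\|Z_1 b_1 + \cdots + Z_q b_q\|_2 \le \|Z_1 b_1\|_2 + \cdots + \|Z_q b_q\|_2.$$

Since $Z_A b = Z_1 b_1 + \cdots + Z_q b_q$, the above two inequalities imply that

$$c_3 (\|Z_1 b_1\|_2 + \cdots + \|Z_q b_q\|_2) \le \|Z_A b\|_2 \le \|Z_1 b_1\|_2 + \cdots + \|Z_q b_q\|_2.$$

Therefore,

$$c_3^2 (\|Z_1 b_1\|_2^2 + \cdots + \|Z_q b_q\|_2^2) \le \|Z_A b\|_2^2 \le 2(\|Z_1 b_1\|_2^2 + \cdots + \|Z_q b_q\|_2^2). \tag{14}$$

Let $C_j = n^{-1} Z_j'Z_j$. By Lemma 6.2 of Zhou, Shen and Wolfe (1998),

$$c_4 h \le \rho_{\min}(C_j) \le \rho_{\max}(C_j) \le c_5 h, \quad j \in A. \tag{15}$$

Since $C_A = n^{-1} Z_A'Z_A$, it follows from (14) that

$$c_3^2 (b_1'C_1 b_1 + \cdots + b_q'C_q b_q) \le b'C_A b \le 2(b_1'C_1 b_1 + \cdots + b_q'C_q b_q).$$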

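Therefore, by (15),

$$\frac{b_1'C_1 b_1}{\|b\|_2^2} + \cdots + \frac{b_q'C_q b_q}{\|b\|_2^2} = \frac{b_1'C_1 b_1}{\|b_1\|_2^2}\cdot\frac{\|b_1\|_2^2}{\|b\|_2^2} + \cdots + \frac{b_q'C_q b_q}{\|b_q\|_2^2}\cdot\frac{\|b_q\|_2^2}{\|b\|_2^2} \ge \rho_{\min}(C_1)\frac{\|b_1\|_2^2}{\|b\|_2^2} + \cdots + \rho_{\min}(C_q)\frac{\|b_q\|_2^2}{\|b\|_2^2} \ge c_4 h.$$

Similarly,

$$\frac{b_1'C_1 b_1}{\|b\|_2^2} + \cdots + \frac{b_q'C_q b_q}{\|b\|_2^2} \le c_5 h.$$

Thus we have

$$c_3^2 c_4 h \le \frac{b'C_A b}{b'b} \le 2c_5 h.$$

The lemma follows.

Proof of Theorem 1. The proof of parts (i) and (ii) essentially follows the proof of Theorem 2.1 of Wei and Huang (2008). The only change that must be made here is that we need to consider the approximation error of the regression functions by splines. Specifically, let $\xi_n = \varepsilon_n + \delta_n$, where $\delta_n = (\delta_{n1}, \ldots, \delta_{nn})'$ with $\delta_{ni} = \sum_{j=1}^q (f_{0j}(X_{ij}) - f_{nj}(X_{ij}))$. Since $\|f_{0j} - f_{nj}\|_2 = O(m_n^{-d}) = O(n^{-d/(2d+1)})$ for $m_n = n^{1/(2d+1)}$, we have

$$\|\delta_n\|_2 \le C_1 \sqrt{n q m_n^{-2d}} = C_1 \sqrt{q}\, n^{1/(4d+2)},$$

for some constant $C_1 > 0$. For any integer $t$, let

$$\chi_t = \max_{|A| = t}\ \max_{\|U_{A_k}\|_2 = 1,\, 1 \le k \le t} \frac{|\xi_n' V_A(s)|}{\|V_A(s)\|_2} \quad \text{and} \quad \chi_t^* = \max_{|A| = t}\ \max_{\|U_{A_k}\|_2 = 1,\, 1 \le k \le t} \frac{|\varepsilon_n' V_A(s)|}{\|V_A(s)\|_2},$$

where $V_A(S_A) = Z_A(Z_A'Z_A)^{-1}\bar S_A - (I - P_A)X\beta$ for $N(A) = q_1 = m \ge 0$, $S_A = (S_{A_1}', \ldots, S_{A_m}')'$, $S_{A_k} = \lambda\sqrt{d_{A_k}}\,U_{A_k}$ and $\|U_{A_k}\|_2 = 1$.

For a sufficiently large constant $C_2 > 0$, define

$$\Omega_{t_0} = \Big\{ (Z, \varepsilon) : \chi_t \le \sigma C_2 \sqrt{(t \vee 1) m_n \log(pm_n)},\ \forall t \ge t_0 \Big\}$$

and

$$\Omega_{t_0}^* = \Big\{ (Z, \varepsilon) : \chi_t^* \le \sigma C_2 \sqrt{(t \vee 1) m_n \log(pm_n)},\ \forall t \ge t_0 \Big\}.$$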

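As in the proof of Theorem 2.1 of Wei and Huang (2008),

$$(Z, \varepsilon) \in \Omega_q \ \Rightarrow\ |\tilde A_1| \le M_1 q,$$

for a constant $M_1 > 1$. By the triangle and Cauchy-Schwarz inequalities,

$$\frac{|\xi_n' V_A(s)|}{\|V_A(s)\|_2} = \frac{|\varepsilon_n' V_A(s) + \delta_n' V_A(s)|}{\|V_A(s)\|_2} \le \frac{|\varepsilon_n' V_A(s)|}{\|V_A(s)\|_2} + \|\delta_n\|_2. \tag{16}$$

In the proof of Theorem 2.1 of Wei and Huang (2008), it is shown that

$$P(\Omega_0^*) \ge 2 - \frac{2}{p^{1+c_0}} - \exp\Big( \frac{2p}{p^{1+c_0}} \Big) \to 1. \tag{17}$$

Since $|\delta_n' V_A(s)|/\|V_A(s)\|_2 \le \|\delta_n\|_2 \le C_1\sqrt{q}\, n^{1/(2(2d+1))}$ and $m_n = O(n^{1/(2d+1)})$, we have for all $t \ge 0$ and $n$ sufficiently large,

$$\|\delta_n\|_2 \le C_1\sqrt{q}\, n^{1/(2(2d+1))} \le \sigma C_2 \sqrt{(t \vee 1) m_n \log(p)}. \tag{18}$$

It follows from (16), (17) and (18) that $P(\Omega_0) \to 1$. This completes the proof of part (i) of Theorem 1.

Before proving part (ii), we first prove part (iii) of Theorem 1. By the definition of $\tilde\beta_n \equiv (\tilde\beta_{n1}', \ldots, \tilde\beta_{np}')'$,

$$\|Y - Z\tilde\beta_n\|_2^2 + \lambda_{n1}\sum_{j=1}^p \|\tilde\beta_{nj}\|_2 \le \|Y - Z\beta_n\|_2^2 + \lambda_{n1}\sum_{j=1}^p \|\beta_{nj}\|_2. \tag{19}$$

Let $A_2 = \{j : \|\beta_{nj}\|_2 \ne 0 \text{ or } \|\tilde\beta_{nj}\|_2 \ne 0\}$ and $d_{n2} = |A_2|$. By part (i), $d_{n2} = O_p(q)$. By (19) and the definition of $A_2$,

$$\|Y - Z_{A_2}\tilde\beta_{nA_2}\|_2^2 + \lambda_{n1}\sum_{j \in A_2} \|\tilde\beta_{nj}\|_2 \le \|Y - Z_{A_2}\beta_{nA_2}\|_2^2 + \lambda_{n1}\sum_{j \in A_2} \|\beta_{nj}\|_2. \tag{20}$$

Let $\eta_n = Y - Z\beta_n$. Write

$$Y - Z_{A_2}\tilde\beta_{nA_2} = Y - Z\beta_n - Z_{A_2}(\tilde\beta_{nA_2} - \beta_{nA_2}) = \eta_n - Z_{A_2}(\tilde\beta_{nA_2} - \beta_{nA_2}).$$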

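We have

$$\|Y - Z_{A_2}\tilde\beta_{nA_2}\|_2^2 = \|Z_{A_2}(\tilde\beta_{nA_2} - \beta_{nA_2})\|_2^2 - 2\eta_n' Z_{A_2}(\tilde\beta_{nA_2} - \beta_{nA_2}) + \eta_n'\eta_n.$$

We can rewrite (20) as

$$\|Z_{A_2}(\tilde\beta_{nA_2} - \beta_{nA_2})\|_2^2 - 2\eta_n' Z_{A_2}(\tilde\beta_{nA_2} - \beta_{nA_2}) \le \lambda_{n1}\sum_{j \in A_1} \|\beta_{nj}\|_2 - \lambda_{n1}\sum_{j \in A_1} \|\tilde\beta_{nj}\|_2. \tag{21}$$

Now

$$\Big| \sum_{j \in A_1} \|\beta_{nj}\|_2 - \sum_{j \in A_1} \|\tilde\beta_{nj}\|_2 \Big| \le \sqrt{|A_1|}\,\|\tilde\beta_{nA_1} - \beta_{nA_1}\|_2 \le \sqrt{|A_1|}\,\|\tilde\beta_{nA_2} - \beta_{nA_2}\|_2. \tag{22}$$

Let $\nu_n = Z_{A_2}(\tilde\beta_{nA_2} - \beta_{nA_2})$. Combining (20), (21) and (22), we get

$$\|\nu_n\|_2^2 - 2\eta_n'\nu_n \le \lambda_{n1}\sqrt{|A_1|}\,\|\tilde\beta_{nA_2} - \beta_{nA_2}\|_2. \tag{23}$$

Let $\eta_n^*$ be the projection of $\eta_n$ to the span of $Z_{A_2}$, that is, $\eta_n^* = Z_{A_2}(Z_{A_2}'Z_{A_2})^{-1}Z_{A_2}'\eta_n$. By the Cauchy-Schwarz inequality,

$$2|\eta_n'\nu_n| \le 2\|\eta_n^*\|_2\,\|\nu_n\|_2 \le 2\|\eta_n^*\|_2^2 + \tfrac{1}{2}\|\nu_n\|_2^2. \tag{24}$$

From (23) and (24), we have

$$\|\nu_n\|_2^2 \le 4\|\eta_n^*\|_2^2 + 2\lambda_{n1}\sqrt{|A_1|}\,\|\tilde\beta_{nA_2} - \beta_{nA_2}\|_2.$$

Let $c_{n*}$ be the smallest eigenvalue of $Z_{A_2}'Z_{A_2}/n$. By Lemma 3 and part (i), $c_{n*} \asymp_p m_n^{-1}$. Since $\|\nu_n\|_2^2 \ge n c_{n*}\|\tilde\beta_{nA_2} - \beta_{nA_2}\|_2^2$ and $2ab \le a^2 + b^2$,

$$n c_{n*}\|\tilde\beta_{nA_2} - \beta_{nA_2}\|_2^2 \le 4\|\eta_n^*\|_2^2 + \frac{(2\lambda_{n1}\sqrt{|A_1|})^2}{2 n c_{n*}} + \frac{1}{2} n c_{n*}\|\tilde\beta_{nA_2} - \beta_{nA_2}\|_2^2.$$

It follows that

$$\|\tilde\beta_{nA_2} - \beta_{nA_2}\|_2^2 \le \frac{8\|\eta_n^*\|_2^2}{n c_{n*}} + \frac{4\lambda_{n1}^2 |A_1|}{n^2 c_{n*}^2}. \tag{25}$$

Let $f_0(X_i) = \sum_{j=1}^p f_{0j}(X_{ij})$ and $f_{0A}(X_i) = \sum_{j \in A} f_{0j}(X_{ij})$. Write

$$\eta_i = Y_i - \mu - f_0(X_i) + (\mu - \bar Y) + f_0(X_i) - \sum_{j \in A_2} Z_{ij}'\beta_{nj} = \varepsilon_i + (\mu - \bar Y) + f_{0A_2}(X_i) - f_{nA_2}(X_i).$$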

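Since $|\mu - \bar Y|^2 = O_p(n^{-1})$ and $\|f_{0j} - f_{nj}\|_\infty = O(m_n^{-d})$, we have

$$\|\eta_n^*\|_2^2 \le 2\|\varepsilon_n^*\|_2^2 + O_p(1) + O(n d_{n2} m_n^{-2d}), \tag{26}$$

where $\varepsilon_n^*$ is the projection of $\varepsilon_n = (\varepsilon_1, \ldots, \varepsilon_n)'$ to the span of $Z_{A_2}$. We have

$$\|\varepsilon_n^*\|_2^2 = \|(Z_{A_2}'Z_{A_2})^{-1/2} Z_{A_2}'\varepsilon_n\|_2^2 \le \frac{1}{n c_{n*}}\|Z_{A_2}'\varepsilon_n\|_2^2.$$

Now

$$\max_{A : |A| \le d_{n2}} \|Z_A'\varepsilon_n\|_2^2 = \max_{A : |A| \le d_{n2}} \sum_{j \in A} \|Z_j'\varepsilon_n\|_2^2 \le d_{n2} m_n \max_{1 \le j \le p,\, 1 \le k \le m_n} |Z_{jk}'\varepsilon_n|^2,$$

where $Z_{jk} = (\psi_k(X_{1j}), \ldots, \psi_k(X_{nj}))'$. By Lemma 2,

$$\max_{1 \le j \le p,\, 1 \le k \le m_n} |Z_{jk}'\varepsilon_n|^2 = n m_n^{-1} \max_{1 \le j \le p,\, 1 \le k \le m_n} |(m_n/n)^{1/2} Z_{jk}'\varepsilon_n|^2 = O_p(1)\, n m_n^{-1} \log(pm_n).$$

It follows that

$$\|\varepsilon_n^*\|_2^2 = O_p(1)\frac{d_{n2}\log(pm_n)}{c_{n*}}. \tag{27}$$

Combining (25), (26) and (27), we get

$$\|\tilde\beta_{nA_2} - \beta_{nA_2}\|_2^2 \le O_p\Big( \frac{d_{n2}\log(pm_n)}{n c_{n*}^2} \Big) + O_p\Big( \frac{1}{n c_{n*}} \Big) + O\Big( \frac{d_{n2} m_n^{-2d}}{c_{n*}} \Big) + \frac{4\lambda_{n1}^2 |A_1|}{n^2 c_{n*}^2}.$$

Since $d_{n2} = O_p(q)$ and $c_{n*} \asymp_p m_n^{-1}$, we have

$$\|\tilde\beta_{nA_2} - \beta_{nA_2}\|_2^2 \le O_p\Big( \frac{m_n^2\log(pm_n)}{n} \Big) + O_p\Big( \frac{m_n}{n} \Big) + O\Big( \frac{1}{m_n^{2d-1}} \Big) + O\Big( \frac{4 m_n^2\lambda_{n1}^2}{n^2} \Big).$$

This completes the proof of part (iii).

We now prove part (ii). Since $\|f_j\|_2 \ge c_f > 0$, $1 \le j \le q$, $\|f_j - f_{nj}\|_2 = O(m_n^{-d})$ and $\|f_{nj}\|_2 \ge \|f_j\|_2 - \|f_j - f_{nj}\|_2$, we have $\|f_{nj}\|_2 \ge 0.5 c_f$ for $n$ sufficiently large. By a result of de Boor (2001), see also (12) of Stone (1986), there are positive constants $c_6$ and $c_7$ such that

$$c_6 m_n^{-1}\|\beta_{nj}\|_2^2 \le \|f_{nj}\|_2^2 \le c_7 m_n^{-1}\|\beta_{nj}\|_2^2.$$

It follows that $\|\beta_{nj}\|_2^2 \ge c_7^{-1} m_n \|f_{nj}\|_2^2 \ge 0.25 c_7^{-1} c_f^2 m_n$. Therefore, if $\|\beta_{nj}\|_2 \ne 0$ but $\|\tilde\beta_{nj}\|_2 = 0$, then

$$\|\tilde\beta_{nj} - \beta_{nj}\|_2^2 \ge 0.25 c_7^{-1} c_f^2 m_n. \tag{28}$$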

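However, since $(m_n\log(pm_n))/n \to 0$ and $(\lambda_{n1}^2 m_n)/n^2 \to 0$, (28) contradicts part (iii).

Proof of Theorem 2. By the definition of $\tilde f_{nj}$, $1 \le j \le p$, parts (i) and (ii) follow from parts (i) and (ii) of Theorem 1 directly. Now consider part (iii). By the properties of splines (de Boor 2001),

$$c_6 m_n^{-1}\|\tilde\beta_{nj} - \beta_{nj}\|_2^2 \le \|\tilde f_{nj} - f_{nj}\|_2^2 \le c_7 m_n^{-1}\|\tilde\beta_{nj} - \beta_{nj}\|_2^2.$$

Thus

$$\|\tilde f_{nj} - f_{nj}\|_2^2 = O_p\Big( \frac{m_n\log(pm_n)}{n} \Big) + O_p\Big( \frac{1}{n} \Big) + O\Big( \frac{1}{m_n^{2d}} \Big) + O\Big( \frac{4 m_n\lambda_{n1}^2}{n^2} \Big). \tag{29}$$

By (A3),

$$\|f_j - f_{nj}\|_2^2 = O(m_n^{-2d}). \tag{30}$$

Part (iii) follows from (29) and (30).

In the proofs below, for any matrix $H$, denote its 2-norm by $\|H\|$, which is equal to its largest singular value (the largest eigenvalue when $H$ is symmetric and nonnegative definite). This norm satisfies the inequality $\|Hx\| \le \|H\|\,\|x\|$ for a column vector $x$ whose dimension is the same as the number of columns of $H$.

Denote $\beta_{nA_1} = (\beta_{nj}', j \in A_1)'$, $\hat\beta_{nA_1} = (\hat\beta_{nj}', j \in A_1)'$, and $Z_{A_1} = (Z_j, j \in A_1)$. Define $C_{A_1} = n^{-1} Z_{A_1}'Z_{A_1}$. Let $\rho_{n1}$ and $\rho_{n2}$ be the smallest and largest eigenvalues of $C_{A_1}$, respectively.

Proof of Theorem 3. By the KKT conditions, a necessary and sufficient condition for $\hat\beta_n$ is

$$\begin{cases} 2Z_j'(Y - Z\hat\beta_n) = \lambda_{n2} w_{nj}\,\dfrac{\hat\beta_{nj}}{\|\hat\beta_{nj}\|_2}, & \|\hat\beta_{nj}\|_2 \ne 0,\ j \ge 1, \\[2ex] 2\|Z_j'(Y - Z\hat\beta_n)\|_2 \le \lambda_{n2} w_{nj}, & \hat\beta_{nj} = 0,\ j \ge 1. \end{cases} \tag{31}$$

Let $\nu_n = (w_{nj}\hat\beta_{nj}/(2\|\hat\beta_{nj}\|), j \in A_1)'$. Define

$$\hat\beta_{nA_1} = (Z_{A_1}'Z_{A_1})^{-1}(Z_{A_1}'Y - \lambda_{n2}\nu_n). \tag{32}$$

If $\hat\beta_{nA_1} =_0 \beta_{nA_1}$, then the equation in (31) holds for $\hat\beta_n \equiv (\hat\beta_{nA_1}', 0')'$. Thus, since $Z\hat\beta_n = Z_{A_1}\hat\beta_{nA_1}$ for this $\hat\beta_n$ and $\{Z_j, j \in A_1\}$ are linearly independent,

$$\hat\beta_n =_0 \beta_n \quad \text{if} \quad \begin{cases} \hat\beta_{nA_1} =_0 \beta_{nA_1}, \\ \|Z_j'(Y - Z_{A_1}\hat\beta_{nA_1})\|_2 \le \lambda_{n2} w_{nj}/2, & \forall j \notin A_1. \end{cases}$$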

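This is true if

$$\hat\beta_n =_0 \beta_n \quad \text{if} \quad \begin{cases} \big|\, \|\beta_{nj}\|_2 - \|\hat\beta_{nj}\|_2 \,\big| < \|\beta_{nj}\|_2, & \forall j \in A_1, \\ \|Z_j'(Y - Z_{A_1}\hat\beta_{nA_1})\|_2 \le \lambda_{n2} w_{nj}/2, & \forall j \notin A_1. \end{cases}$$

Therefore,

$$P(\hat\beta_n \ne_0 \beta_n) \le P\big( \|\hat\beta_{nj} - \beta_{nj}\|_2 \ge \|\beta_{nj}\|_2,\ \exists j \in A_1 \big) + P\big( \|Z_j'(Y - Z_{A_1}\hat\beta_{nA_1})\|_2 > \lambda_{n2} w_{nj}/2,\ \exists j \notin A_1 \big).$$

Let $f_{0j}(X_j) = (f_{0j}(X_{1j}), \ldots, f_{0j}(X_{nj}))'$ and $\delta_n = \sum_{j \in A_1} f_{0j}(X_j) - Z_{A_1}\beta_{nA_1}$. By Lemma 1, we have

$$n^{-1}\|\delta_n\|_2^2 = O_p(q m_n^{-2d}). \tag{33}$$

Let $H_n = I_n - Z_{A_1}(Z_{A_1}'Z_{A_1})^{-1}Z_{A_1}'$. By (32),

$$\hat\beta_{nA_1} - \beta_{nA_1} = n^{-1} C_{A_1}^{-1}\big( Z_{A_1}'(\varepsilon_n + \delta_n) - \lambda_{n2}\nu_n \big), \tag{34}$$

and

$$Y - Z_{A_1}\hat\beta_{nA_1} = H_n\varepsilon_n + H_n\delta_n + \lambda_{n2} Z_{A_1} C_{A_1}^{-1}\nu_n / n. \tag{35}$$

Based on these two equations, Lemma 5 below shows that

$$P\big( \|\hat\beta_{nj} - \beta_{nj}\|_2 \ge \|\beta_{nj}\|_2,\ \exists j \in A_1 \big) \to 0,$$

and Lemma 6 below shows that

$$P\big( \|Z_j'(Y - Z_{A_1}\hat\beta_{nA_1})\|_2 > \lambda_{n2} w_{nj}/2,\ \exists j \notin A_1 \big) \to 0.$$

These two results lead to part (i) of the theorem.

We now prove part (ii) of Theorem 3. As in (26), for $\eta_n = Y - Z\beta_n$ and $\eta_{n1}^* = Z_{A_1}(Z_{A_1}'Z_{A_1})^{-1}Z_{A_1}'\eta_n$, we have

$$\|\eta_{n1}^*\|_2^2 \le 2\|\varepsilon_{n1}^*\|_2^2 + O_p(1) + O(n q m_n^{-2d}), \tag{36}$$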

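where $\varepsilon_{n1}^*$ is the projection of $\varepsilon_n = (\varepsilon_1, \ldots, \varepsilon_n)'$ to the span of $Z_{A_1}$. We have

$$\|\varepsilon_{n1}^*\|_2^2 = \|(Z_{A_1}'Z_{A_1})^{-1/2} Z_{A_1}'\varepsilon_n\|_2^2 \le \frac{1}{n\rho_{n1}}\|Z_{A_1}'\varepsilon_n\|_2^2 = O_p(1)\frac{|A_1|}{\rho_{n1}}. \tag{37}$$

Now similarly to the proof of (25), we can show that

$$\|\hat\beta_{nA_1} - \beta_{nA_1}\|_2^2 \le \frac{8\|\eta_{n1}^*\|_2^2}{n\rho_{n1}} + \frac{4\lambda_{n2}^2 |A_1|}{n^2\rho_{n1}^2}. \tag{38}$$

Combining (36), (37) and (38), we get

$$\|\hat\beta_{nA_1} - \beta_{nA_1}\|_2^2 = O_p\Big( \frac{8}{n\rho_{n1}^2} \Big) + O_p\Big( \frac{1}{n\rho_{n1}} \Big) + O\Big( \frac{1}{\rho_{n1} m_n^{2d}} \Big) + O\Big( \frac{4\lambda_{n2}^2}{n^2\rho_{n1}^2} \Big).$$

Since $\rho_{n1} \asymp_p m_n^{-1}$, the result follows.

The following lemmas are needed in the proof of Theorem 3.

Lemma 4. For $\nu_n = (w_{nj}\hat\beta_{nj}/(2\|\hat\beta_{nj}\|), j \in A_1)'$, under condition (B1),

$$\|\nu_n\|_2^2 = O_p(h_n^2) = O_p\big( (b_{n1}^2 c_b)^{-2} r_n^{-1} + q b_{n1}^{-2} \big).$$

Proof of Lemma 4. Write

$$\|\nu_n\|_2^2 = \frac{1}{4}\sum_{j \in A_1} w_{nj}^2 \le \sum_{j \in A_1} \|\tilde\beta_{nj}\|_2^{-2} = \sum_{j \in A_1} \frac{\|\beta_{nj}\|_2^2 - \|\tilde\beta_{nj}\|_2^2}{\|\tilde\beta_{nj}\|_2^2\,\|\beta_{nj}\|_2^2} + \sum_{j \in A_1} \|\beta_{nj}\|_2^{-2}.$$

Under (B2),

$$\sum_{j \in A_1} \frac{\big|\, \|\beta_{nj}\|_2^2 - \|\tilde\beta_{nj}\|_2^2 \,\big|}{\|\tilde\beta_{nj}\|_2^2\,\|\beta_{nj}\|_2^2} \le M c_b^{-2} b_{n1}^{-4}\,\|\tilde\beta_n - \beta_n\|,$$

and $\sum_{j \in A_1} \|\beta_{nj}\|_2^{-2} \le q b_{n1}^{-2}$. The claim follows.

Let $\rho_{n3}$ be the maximum of the largest eigenvalues of $n^{-1} Z_j'Z_j$, $j \in A_0$, that is, $\rho_{n3} = \max_{j \in A_0} \|n^{-1} Z_j'Z_j\|_2$. By Lemma 3,

$$b_{n1} \asymp O(m_n^{1/2}), \quad \rho_{n1} \asymp_p m_n^{-1}, \quad \rho_{n2} \asymp_p m_n^{-1} \quad \text{and} \quad \rho_{n3} \asymp_p m_n^{-1}. \tag{39}$$

Lemma 5. Under conditions (B1), (B2), (A3) and (A4),

$$P\big( \|\hat\beta_{nj} - \beta_{nj}\|_2 \ge \|\beta_{nj}\|_2,\ \exists j \in A_1 \big) \to 0. \tag{40}$$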

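Proof of Lemma 5. Let $T_{nj}$ be an $m_n \times q m_n$ matrix of the form

$$T_{nj} = (0_{m_n}, \ldots, 0_{m_n}, I_{m_n}, 0_{m_n}, \ldots, 0_{m_n}),$$

where $0_{m_n}$ is an $m_n \times m_n$ matrix of zeros, $I_{m_n}$ is an $m_n \times m_n$ identity matrix, and $I_{m_n}$ is at the $j$th block. By (34),

$$\hat\beta_{nj} - \beta_{nj} = n^{-1} T_{nj} C_{A_1}^{-1}\big( Z_{A_1}'\varepsilon_n + Z_{A_1}'\delta_n - \lambda_{n2}\nu_n \big).$$

By the triangle inequality,

$$\|\hat\beta_{nj} - \beta_{nj}\|_2 \le n^{-1}\|T_{nj} C_{A_1}^{-1} Z_{A_1}'\varepsilon_n\|_2 + n^{-1}\|T_{nj} C_{A_1}^{-1} Z_{A_1}'\delta_n\|_2 + n^{-1}\lambda_{n2}\|T_{nj} C_{A_1}^{-1}\nu_n\|_2. \tag{41}$$

Let $C$ be a generic constant independent of $n$. For the first term on the right-hand side,

$$\max_{j \in A_1} n^{-1}\|T_{nj} C_{A_1}^{-1} Z_{A_1}'\varepsilon_n\|_2 \le n^{-1}\rho_{n1}^{-1}\|Z_{A_1}'\varepsilon_n\|_2 = n^{-1/2}\rho_{n1}^{-1}\|n^{-1/2} Z_{A_1}'\varepsilon_n\|_2 = O_p(1)\, n^{-1/2}\rho_{n1}^{-1} m_n^{-1/2}(q m_n)^{1/2}. \tag{42}$$

By (33), for the second term,

$$\max_{j \in A_1} n^{-1}\|T_{nj} C_{A_1}^{-1} Z_{A_1}'\delta_n\|_2 \le \|C_{A_1}^{-1}\|_2\,\|n^{-1} Z_{A_1}'Z_{A_1}\|_2^{1/2}\,\|n^{-1/2}\delta_n\|_2 = O_p(1)\,\rho_{n1}^{-1}\rho_{n2}^{1/2} q^{1/2} m_n^{-d}. \tag{43}$$

By Lemma 4, for the third term,

$$\max_{j \in A_1} n^{-1}\lambda_{n2}\|T_{nj} C_{A_1}^{-1}\nu_n\|_2 \le n^{-1}\lambda_{n2}\rho_{n1}^{-1}\|\nu_n\|_2 = O_p(1)\,\rho_{n1}^{-1} n^{-1}\lambda_{n2} h_n. \tag{44}$$

Thus (40) follows from (39), (42), (43), (44) and condition (B2a).

Lemma 6. Under conditions (B1), (B2), (A3) and (A4),

$$P\big( \|Z_j'(Y - Z_{A_1}\hat\beta_{nA_1})\|_2 > \lambda_{n2} w_{nj}/2,\ \exists j \notin A_1 \big) \to 0. \tag{45}$$

Proof of Lemma 6. By (35), we have

$$Z_j'(Y - Z_{A_1}\hat\beta_{nA_1}) = Z_j'H_n\varepsilon_n + Z_j'H_n\delta_n + \lambda_{n2} n^{-1} Z_j'Z_{A_1} C_{A_1}^{-1}\nu_n. \tag{46}$$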

Recall that $s_n = p - q$ is the number of zero components in the model. By Lemma 2,

$$E\Big( \max_{j \notin A_1} \|n^{-1/2} Z_j'H_n\varepsilon_n\|_2 \Big) \le O(1)\{\log(s_n m_n)\}^{1/2}. \tag{47}$$

Since $w_{nj} = \|\tilde\beta_{nj}\|^{-1} = O_p(r_n)$ for $j \notin A_1$ and by (47), for the first term on the right-hand side of (46) we have

$$P\big( \|Z_j'H_n\varepsilon_n\|_2 > \lambda_{n2} w_{nj}/6,\ \exists j \notin A_1 \big) \le P\big( \|Z_j'H_n\varepsilon_n\|_2 > C\lambda_{n2} r_n,\ \exists j \notin A_1 \big) + o(1)$$
$$= P\Big( \max_{j \notin A_1} \|n^{-1/2} Z_j'H_n\varepsilon_n\|_2 > C n^{-1/2}\lambda_{n2} r_n \Big) + o(1) \le O(1)\frac{n^{1/2}\{\log(s_n m_n)\}^{1/2}}{C\lambda_{n2} r_n} + o(1). \tag{48}$$

By (33), for the second term on the right-hand side of (46),

$$\max_{j \notin A_1} \|Z_j'H_n\delta_n\|_2 \le n^{1/2}\max_{j \notin A_1} \|n^{-1} Z_j'Z_j\|_2^{1/2}\cdot\|H_n\|_2\,\|\delta_n\|_2 = O(1)\, n\rho_{n3}^{1/2} q^{1/2} m_n^{-d}. \tag{49}$$

By Lemma 4, for the third term on the right-hand side of (46),

$$\max_{j \notin A_1} \lambda_{n2} n^{-1}\|Z_j'Z_{A_1} C_{A_1}^{-1}\nu_n\|_2 \le \lambda_{n2}\max_{j \notin A_1} \|n^{-1/2} Z_j\|_2\,\|n^{-1/2} Z_{A_1} C_{A_1}^{-1/2}\|_2\,\|C_{A_1}^{-1/2}\|_2\,\|\nu_n\|_2 = \lambda_{n2}\rho_{n3}^{1/2}\rho_{n1}^{-1/2}\, O_p(q b_{n1}^{-1}). \tag{50}$$

Therefore, (45) follows from (39), (48), (49), (50) and condition (B2b).

Proof of Theorem 4. The proof is similar to that of Theorem 2 and is omitted.

References

[1] Bach, F. R. (2007). Consistency of the group Lasso and multiple kernel learning. Journal of Machine Learning Research, 9.
[2] Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the lasso. Electronic Journal of Statistics.
[3] Chen, J. and Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model space. Biometrika, 95.
[4] Chen, J. and Chen, Z. (2009). Extended BIC for small-n-large-p sparse GLM. Available from stachez/cheche.pdf.


More information

Overview of some probability distributions.

Overview of some probability distributions. Lecture Overview of some probability distributios. I this lecture we will review several commo distributios that will be used ofte throughtout the class. Each distributio is usually described by its probability

More information

PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM

PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM PROCEEDINGS OF THE YEREVAN STATE UNIVERSITY Physical ad Mathematical Scieces 2015, 1, p. 15 19 M a t h e m a t i c s AN ALTERNATIVE MODEL FOR BONUS-MALUS SYSTEM A. G. GULYAN Chair of Actuarial Mathematics

More information

0.7 0.6 0.2 0 0 96 96.5 97 97.5 98 98.5 99 99.5 100 100.5 96.5 97 97.5 98 98.5 99 99.5 100 100.5

0.7 0.6 0.2 0 0 96 96.5 97 97.5 98 98.5 99 99.5 100 100.5 96.5 97 97.5 98 98.5 99 99.5 100 100.5 Sectio 13 Kolmogorov-Smirov test. Suppose that we have a i.i.d. sample X 1,..., X with some ukow distributio P ad we would like to test the hypothesis that P is equal to a particular distributio P 0, i.e.

More information

Hypothesis testing. Null and alternative hypotheses

Hypothesis testing. Null and alternative hypotheses Hypothesis testig Aother importat use of samplig distributios is to test hypotheses about populatio parameters, e.g. mea, proportio, regressio coefficiets, etc. For example, it is possible to stipulate

More information

Our aim is to show that under reasonable assumptions a given 2π-periodic function f can be represented as convergent series

Our aim is to show that under reasonable assumptions a given 2π-periodic function f can be represented as convergent series 8 Fourier Series Our aim is to show that uder reasoable assumptios a give -periodic fuctio f ca be represeted as coverget series f(x) = a + (a cos x + b si x). (8.) By defiitio, the covergece of the series

More information

HIGH-DIMENSIONAL REGRESSION WITH NOISY AND MISSING DATA: PROVABLE GUARANTEES WITH NONCONVEXITY

HIGH-DIMENSIONAL REGRESSION WITH NOISY AND MISSING DATA: PROVABLE GUARANTEES WITH NONCONVEXITY The Aals of Statistics 2012, Vol. 40, No. 3, 1637 1664 DOI: 10.1214/12-AOS1018 Istitute of Mathematical Statistics, 2012 HIGH-DIMENSIONAL REGRESSION WITH NOISY AND MISSING DATA: PROVABLE GUARANTEES WITH

More information

Systems Design Project: Indoor Location of Wireless Devices

Systems Design Project: Indoor Location of Wireless Devices Systems Desig Project: Idoor Locatio of Wireless Devices Prepared By: Bria Murphy Seior Systems Sciece ad Egieerig Washigto Uiversity i St. Louis Phoe: (805) 698-5295 Email: bcm1@cec.wustl.edu Supervised

More information

Case Study. Normal and t Distributions. Density Plot. Normal Distributions

Case Study. Normal and t Distributions. Density Plot. Normal Distributions Case Study Normal ad t Distributios Bret Halo ad Bret Larget Departmet of Statistics Uiversity of Wiscosi Madiso October 11 13, 2011 Case Study Body temperature varies withi idividuals over time (it ca

More information

Center, Spread, and Shape in Inference: Claims, Caveats, and Insights

Center, Spread, and Shape in Inference: Claims, Caveats, and Insights Ceter, Spread, ad Shape i Iferece: Claims, Caveats, ad Isights Dr. Nacy Pfeig (Uiversity of Pittsburgh) AMATYC November 2008 Prelimiary Activities 1. I would like to produce a iterval estimate for the

More information

MARTINGALES AND A BASIC APPLICATION

MARTINGALES AND A BASIC APPLICATION MARTINGALES AND A BASIC APPLICATION TURNER SMITH Abstract. This paper will develop the measure-theoretic approach to probability i order to preset the defiitio of martigales. From there we will apply this

More information

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design

A Combined Continuous/Binary Genetic Algorithm for Microstrip Antenna Design A Combied Cotiuous/Biary Geetic Algorithm for Microstrip Atea Desig Rady L. Haupt The Pesylvaia State Uiversity Applied Research Laboratory P. O. Box 30 State College, PA 16804-0030 haupt@ieee.org Abstract:

More information

AQA STATISTICS 1 REVISION NOTES

AQA STATISTICS 1 REVISION NOTES AQA STATISTICS 1 REVISION NOTES AVERAGES AND MEASURES OF SPREAD www.mathsbox.org.uk Mode : the most commo or most popular data value the oly average that ca be used for qualitative data ot suitable if

More information

THE ABRACADABRA PROBLEM

THE ABRACADABRA PROBLEM THE ABRACADABRA PROBLEM FRANCESCO CARAVENNA Abstract. We preset a detailed solutio of Exercise E0.6 i [Wil9]: i a radom sequece of letters, draw idepedetly ad uiformly from the Eglish alphabet, the expected

More information

Key Ideas Section 8-1: Overview hypothesis testing Hypothesis Hypothesis Test Section 8-2: Basics of Hypothesis Testing Null Hypothesis

Key Ideas Section 8-1: Overview hypothesis testing Hypothesis Hypothesis Test Section 8-2: Basics of Hypothesis Testing Null Hypothesis Chapter 8 Key Ideas Hypothesis (Null ad Alterative), Hypothesis Test, Test Statistic, P-value Type I Error, Type II Error, Sigificace Level, Power Sectio 8-1: Overview Cofidece Itervals (Chapter 7) are

More information

The second difference is the sequence of differences of the first difference sequence, 2

The second difference is the sequence of differences of the first difference sequence, 2 Differece Equatios I differetial equatios, you look for a fuctio that satisfies ad equatio ivolvig derivatives. I differece equatios, istead of a fuctio of a cotiuous variable (such as time), we look for

More information

SUPPORT UNION RECOVERY IN HIGH-DIMENSIONAL MULTIVARIATE REGRESSION 1

SUPPORT UNION RECOVERY IN HIGH-DIMENSIONAL MULTIVARIATE REGRESSION 1 The Aals of Statistics 2011, Vol. 39, No. 1, 1 47 DOI: 10.1214/09-AOS776 Istitute of Mathematical Statistics, 2011 SUPPORT UNION RECOVERY IN HIGH-DIMENSIONAL MULTIVARIATE REGRESSION 1 BY GUILLAUME OBOZINSKI,

More information

arxiv:1506.03481v1 [stat.me] 10 Jun 2015

arxiv:1506.03481v1 [stat.me] 10 Jun 2015 BEHAVIOUR OF ABC FOR BIG DATA By Wetao Li ad Paul Fearhead Lacaster Uiversity arxiv:1506.03481v1 [stat.me] 10 Ju 2015 May statistical applicatios ivolve models that it is difficult to evaluate the likelihood,

More information

The Euler Totient, the Möbius and the Divisor Functions

The Euler Totient, the Möbius and the Divisor Functions The Euler Totiet, the Möbius ad the Divisor Fuctios Rosica Dieva July 29, 2005 Mout Holyoke College South Hadley, MA 01075 1 Ackowledgemets This work was supported by the Mout Holyoke College fellowship

More information

Joint Probability Distributions and Random Samples

Joint Probability Distributions and Random Samples STAT5 Sprig 204 Lecture Notes Chapter 5 February, 204 Joit Probability Distributios ad Radom Samples 5. Joitly Distributed Radom Variables Chapter Overview Joitly distributed rv Joit mass fuctio, margial

More information

This is arithmetic average of the x values and is usually referred to simply as the mean.

This is arithmetic average of the x values and is usually referred to simply as the mean. prepared by Dr. Adre Lehre, Dept. of Geology, Humboldt State Uiversity http://www.humboldt.edu/~geodept/geology51/51_hadouts/statistical_aalysis.pdf STATISTICAL ANALYSIS OF HYDROLOGIC DATA This hadout

More information

Lecture 5: Span, linear independence, bases, and dimension

Lecture 5: Span, linear independence, bases, and dimension Lecture 5: Spa, liear idepedece, bases, ad dimesio Travis Schedler Thurs, Sep 23, 2010 (versio: 9/21 9:55 PM) 1 Motivatio Motivatio To uderstad what it meas that R has dimesio oe, R 2 dimesio 2, etc.;

More information

FIBONACCI NUMBERS: AN APPLICATION OF LINEAR ALGEBRA. 1. Powers of a matrix

FIBONACCI NUMBERS: AN APPLICATION OF LINEAR ALGEBRA. 1. Powers of a matrix FIBONACCI NUMBERS: AN APPLICATION OF LINEAR ALGEBRA. Powers of a matrix We begi with a propositio which illustrates the usefuless of the diagoalizatio. Recall that a square matrix A is diogaalizable if

More information

Gregory Carey, 1998 Linear Transformations & Composites - 1. Linear Transformations and Linear Composites

Gregory Carey, 1998 Linear Transformations & Composites - 1. Linear Transformations and Linear Composites Gregory Carey, 1998 Liear Trasformatios & Composites - 1 Liear Trasformatios ad Liear Composites I Liear Trasformatios of Variables Meas ad Stadard Deviatios of Liear Trasformatios A liear trasformatio

More information

{{1}, {2, 4}, {3}} {{1, 3, 4}, {2}} {{1}, {2}, {3, 4}} 5.4 Stirling Numbers

{{1}, {2, 4}, {3}} {{1, 3, 4}, {2}} {{1}, {2}, {3, 4}} 5.4 Stirling Numbers . Stirlig Numbers Whe coutig various types of fuctios from., we quicly discovered that eumeratig the umber of oto fuctios was a difficult problem. For a domai of five elemets ad a rage of four elemets,

More information

Estimating Probability Distributions by Observing Betting Practices

Estimating Probability Distributions by Observing Betting Practices 5th Iteratioal Symposium o Imprecise Probability: Theories ad Applicatios, Prague, Czech Republic, 007 Estimatig Probability Distributios by Observig Bettig Practices Dr C Lych Natioal Uiversity of Irelad,

More information

CHAPTER 3 DIGITAL CODING OF SIGNALS

CHAPTER 3 DIGITAL CODING OF SIGNALS CHAPTER 3 DIGITAL CODING OF SIGNALS Computers are ofte used to automate the recordig of measuremets. The trasducers ad sigal coditioig circuits produce a voltage sigal that is proportioal to a quatity

More information

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN

Analyzing Longitudinal Data from Complex Surveys Using SUDAAN Aalyzig Logitudial Data from Complex Surveys Usig SUDAAN Darryl Creel Statistics ad Epidemiology, RTI Iteratioal, 312 Trotter Farm Drive, Rockville, MD, 20850 Abstract SUDAAN: Software for the Statistical

More information

Statistical inference: example 1. Inferential Statistics

Statistical inference: example 1. Inferential Statistics Statistical iferece: example 1 Iferetial Statistics POPULATION SAMPLE A clothig store chai regularly buys from a supplier large quatities of a certai piece of clothig. Each item ca be classified either

More information

5 Boolean Decision Trees (February 11)

5 Boolean Decision Trees (February 11) 5 Boolea Decisio Trees (February 11) 5.1 Graph Coectivity Suppose we are give a udirected graph G, represeted as a boolea adjacecy matrix = (a ij ), where a ij = 1 if ad oly if vertices i ad j are coected

More information

Solutions to Selected Problems In: Pattern Classification by Duda, Hart, Stork

Solutions to Selected Problems In: Pattern Classification by Duda, Hart, Stork Solutios to Selected Problems I: Patter Classificatio by Duda, Hart, Stork Joh L. Weatherwax February 4, 008 Problem Solutios Chapter Bayesia Decisio Theory Problem radomized rules Part a: Let Rx be the

More information

x(x 1)(x 2)... (x k + 1) = [x] k n+m 1

x(x 1)(x 2)... (x k + 1) = [x] k n+m 1 1 Coutig mappigs For every real x ad positive iteger k, let [x] k deote the fallig factorial ad x(x 1)(x 2)... (x k + 1) ( ) x = [x] k k k!, ( ) k = 1. 0 I the sequel, X = {x 1,..., x m }, Y = {y 1,...,

More information

1 Introduction to reducing variance in Monte Carlo simulations

1 Introduction to reducing variance in Monte Carlo simulations Copyright c 007 by Karl Sigma 1 Itroductio to reducig variace i Mote Carlo simulatios 11 Review of cofidece itervals for estimatig a mea I statistics, we estimate a uow mea µ = E(X) of a distributio by

More information

Data Analysis and Statistical Behaviors of Stock Market Fluctuations

Data Analysis and Statistical Behaviors of Stock Market Fluctuations 44 JOURNAL OF COMPUTERS, VOL. 3, NO. 0, OCTOBER 2008 Data Aalysis ad Statistical Behaviors of Stock Market Fluctuatios Ju Wag Departmet of Mathematics, Beijig Jiaotog Uiversity, Beijig 00044, Chia Email:

More information

Section 11.3: The Integral Test

Section 11.3: The Integral Test Sectio.3: The Itegral Test Most of the series we have looked at have either diverged or have coverged ad we have bee able to fid what they coverge to. I geeral however, the problem is much more difficult

More information

INVESTMENT PERFORMANCE COUNCIL (IPC)

INVESTMENT PERFORMANCE COUNCIL (IPC) INVESTMENT PEFOMANCE COUNCIL (IPC) INVITATION TO COMMENT: Global Ivestmet Performace Stadards (GIPS ) Guidace Statemet o Calculatio Methodology The Associatio for Ivestmet Maagemet ad esearch (AIM) seeks

More information

A Faster Clause-Shortening Algorithm for SAT with No Restriction on Clause Length

A Faster Clause-Shortening Algorithm for SAT with No Restriction on Clause Length Joural o Satisfiability, Boolea Modelig ad Computatio 1 2005) 49-60 A Faster Clause-Shorteig Algorithm for SAT with No Restrictio o Clause Legth Evgey Datsi Alexader Wolpert Departmet of Computer Sciece

More information

Chapter 9: Correlation and Regression: Solutions

Chapter 9: Correlation and Regression: Solutions Chapter 9: Correlatio ad Regressio: Solutios 9.1 Correlatio I this sectio, we aim to aswer the questio: Is there a relatioship betwee A ad B? Is there a relatioship betwee the umber of emploee traiig hours

More information

SUPPLEMENTARY MATERIAL TO GENERAL NON-EXACT ORACLE INEQUALITIES FOR CLASSES WITH A SUBEXPONENTIAL ENVELOPE

SUPPLEMENTARY MATERIAL TO GENERAL NON-EXACT ORACLE INEQUALITIES FOR CLASSES WITH A SUBEXPONENTIAL ENVELOPE SUPPLEMENTARY MATERIAL TO GENERAL NON-EXACT ORACLE INEQUALITIES FOR CLASSES WITH A SUBEXPONENTIAL ENVELOPE By Guillaume Lecué CNRS, LAMA, Mare-la-vallée, 77454 Frace ad By Shahar Medelso Departmet of Mathematics,

More information

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable

Week 3 Conditional probabilities, Bayes formula, WEEK 3 page 1 Expected value of a random variable Week 3 Coditioal probabilities, Bayes formula, WEEK 3 page 1 Expected value of a radom variable We recall our discussio of 5 card poker hads. Example 13 : a) What is the probability of evet A that a 5

More information

Nr. 2. Interpolation of Discount Factors. Heinz Cremers Willi Schwarz. Mai 1996

Nr. 2. Interpolation of Discount Factors. Heinz Cremers Willi Schwarz. Mai 1996 Nr 2 Iterpolatio of Discout Factors Heiz Cremers Willi Schwarz Mai 1996 Autore: Herausgeber: Prof Dr Heiz Cremers Quatitative Methode ud Spezielle Bakbetriebslehre Hochschule für Bakwirtschaft Dr Willi

More information

CS103A Handout 23 Winter 2002 February 22, 2002 Solving Recurrence Relations

CS103A Handout 23 Winter 2002 February 22, 2002 Solving Recurrence Relations CS3A Hadout 3 Witer 00 February, 00 Solvig Recurrece Relatios Itroductio A wide variety of recurrece problems occur i models. Some of these recurrece relatios ca be solved usig iteratio or some other ad

More information

Estimating the Mean and Variance of a Normal Distribution

Estimating the Mean and Variance of a Normal Distribution Estimatig the Mea ad Variace of a Normal Distributio Learig Objectives After completig this module, the studet will be able to eplai the value of repeatig eperimets eplai the role of the law of large umbers

More information

Trigonometric Form of a Complex Number. The Complex Plane. axis. ( 2, 1) or 2 i FIGURE 6.44. The absolute value of the complex number z a bi is

Trigonometric Form of a Complex Number. The Complex Plane. axis. ( 2, 1) or 2 i FIGURE 6.44. The absolute value of the complex number z a bi is 0_0605.qxd /5/05 0:45 AM Page 470 470 Chapter 6 Additioal Topics i Trigoometry 6.5 Trigoometric Form of a Complex Number What you should lear Plot complex umbers i the complex plae ad fid absolute values

More information

Chapter 7: Confidence Interval and Sample Size

Chapter 7: Confidence Interval and Sample Size Chapter 7: Cofidece Iterval ad Sample Size Learig Objectives Upo successful completio of Chapter 7, you will be able to: Fid the cofidece iterval for the mea, proportio, ad variace. Determie the miimum

More information

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Note 13 EECS 70 Discrete Mathematics ad Probability Theory Sprig 2014 Aat Sahai Note 13 Itroductio At this poit, we have see eough examples that it is worth just takig stock of our model of probability ad may

More information

The following example will help us understand The Sampling Distribution of the Mean. C1 C2 C3 C4 C5 50 miles 84 miles 38 miles 120 miles 48 miles

The following example will help us understand The Sampling Distribution of the Mean. C1 C2 C3 C4 C5 50 miles 84 miles 38 miles 120 miles 48 miles The followig eample will help us uderstad The Samplig Distributio of the Mea Review: The populatio is the etire collectio of all idividuals or objects of iterest The sample is the portio of the populatio

More information

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown Z-TEST / Z-STATISTIC: used to test hypotheses about µ whe the populatio stadard deviatio is kow ad populatio distributio is ormal or sample size is large T-TEST / T-STATISTIC: used to test hypotheses about

More information

Iran. J. Chem. Chem. Eng. Vol. 26, No.1, 2007. Sensitivity Analysis of Water Flooding Optimization by Dynamic Optimization

Iran. J. Chem. Chem. Eng. Vol. 26, No.1, 2007. Sensitivity Analysis of Water Flooding Optimization by Dynamic Optimization Ira. J. Chem. Chem. Eg. Vol. 6, No., 007 Sesitivity Aalysis of Water Floodig Optimizatio by Dyamic Optimizatio Gharesheiklou, Ali Asghar* + ; Mousavi-Dehghai, Sayed Ali Research Istitute of Petroleum Idustry

More information

Factors of sums of powers of binomial coefficients

Factors of sums of powers of binomial coefficients ACTA ARITHMETICA LXXXVI.1 (1998) Factors of sums of powers of biomial coefficiets by Neil J. Cali (Clemso, S.C.) Dedicated to the memory of Paul Erdős 1. Itroductio. It is well ow that if ( ) a f,a = the

More information

Unit 20 Hypotheses Testing

Unit 20 Hypotheses Testing Uit 2 Hypotheses Testig Objectives: To uderstad how to formulate a ull hypothesis ad a alterative hypothesis about a populatio proportio, ad how to choose a sigificace level To uderstad how to collect

More information

Present Values, Investment Returns and Discount Rates

Present Values, Investment Returns and Discount Rates Preset Values, Ivestmet Returs ad Discout Rates Dimitry Midli, ASA, MAAA, PhD Presidet CDI Advisors LLC dmidli@cdiadvisors.com May 2, 203 Copyright 20, CDI Advisors LLC The cocept of preset value lies

More information

THE HEIGHT OF q-binary SEARCH TREES

THE HEIGHT OF q-binary SEARCH TREES THE HEIGHT OF q-binary SEARCH TREES MICHAEL DRMOTA AND HELMUT PRODINGER Abstract. q biary search trees are obtaied from words, equipped with the geometric distributio istead of permutatios. The average

More information

SIMULTANEOUS ANALYSIS OF LASSO AND DANTZIG SELECTOR. By Peter J. Bickel, Ya acov Ritov and Alexandre B. Tsybakov

SIMULTANEOUS ANALYSIS OF LASSO AND DANTZIG SELECTOR. By Peter J. Bickel, Ya acov Ritov and Alexandre B. Tsybakov Submitted to the Aals of Statistics SIMULTANEOUS ANALYSIS OF LASSO AND DANTZIG SELECTOR By Peter J. Bickel, Ya acov Ritov ad Alexadre B. Tsybakov We show that, uder a sparsity sceario, the Lasso estimator

More information