# Data analysis in supersaturated designs

Save this PDF as:

Size: px
Start display at page:

## Transcription

2 36 R. Li, D.K.J. Lin / Statistics & Probability Letters 59 (2002) Westfall et al., 998). In this paper, we introduce an approach from the viewpoint of frequentist analysis. Fan and Li (200) proposed a class of variable selection procedures via nonconcave penalized likelihood. They showed that with the proper choice of penalty function and regularization parameter, their approaches possess an oracle property. Namely, the true coecients that are zero are automatically estimated as zero, and the other coecients are estimated as if the true submodel were known in advance. However, their approach cannot be directly applied for analyzing SSD because regularity conditions imposed in their paper require the design matrix to be full rank. This cannot be satised by an SSD. In this paper, we extend the nonconcave penalized likelihood approaches to least squares, namely nonconvex penalized least squares, and focus on the situation in which the design matrix is not full rank. Theoretic properties of the proposed approach are investigated, and empirical comparisons via Monte Carlo simulation are conducted. This paper is organized as follows. In Section 2, we briey discuss nonconvex penalized least squares and introduce a variable selection procedure for SSD. Root n consistency of the resulting estimator via penalized least squares is established. We also show that the introduced procedure possesses an oracle property. An iterative ridge regression algorithm is employed to nd the solution of the penalized least squares. A choice of initial value of unknown coecients for the iterative ridge regression is suggested. In Section 3, some empirical comparisons via Monte Carlo simulation are conducted. A real data example is used for illustration. Section 4 gives the proof of the main theorem. Some conclusions are given in Section Variable selection for screening active eects 2.. Preliminary Assume that observations Y ;:::;Y n are independent samples from the linear regression model Y i = xi T + i (2.) with E( i ) = 0 and var( i )= 2, where x i is the vector of input variables. Following Fan and Li (200), a form of penalized least squares is dened as Q() n d (Y i xi T ) 2 + p n ( j ); 2n (2.2) i= j= where p n ( ) is a penalty function, and n is a tuning parameter, which can be chosen by a data-driven approach, such as cross-validation (CV) and generalized cross-validation (GCV, Craven and Wahba, 979). Many variable selection criteria are closely related to this penalized least squares. Take the penalty function to be the entropy penalty, namely, p n ( )= 2 2 ni( 0), where I( ) is an indicator function. Note that the dimension or the size of a model equals the number of nonzero regression coecients in the model. This actually equals j I( j 0). In other words, the penalized least squares (2.2) with the entropy penalty can be rewritten as n (Y i xi T ) 2 + 2n 2 2 n M ; (2.3) i=

3 R. Li, D.K.J. Lin / Statistics & Probability Letters 59 (2002) where M = j I( j 0), the size of the underlying candidate model. Hence, many popular variable selection criteria can be derived from the penalized least squares (2.3) by taking dierent values of n. For instance, the AIC and BIC correspond to n = 2(= n) and log n(= n), respectively, although these two criteria were motivated from dierent principles. The entropy penalty is not continuous, and can be improved by its smooth version, namely a hard thresholding penalty function, p ( )= 2 ( ) 2 I( ); proposed by Fan (997). Other penalty functions have been used in the literature. The L 2 penalty p n ( )=2 n 2 yields a ridge regression, and the L penalty p n ( )= n results in LASSO (Tibshirani, 996). More generally, the L q penalty leads to a bridge regression (see Frank and Friedman, 993; Fu, 998; Knight and Fu, 2000). All aforementioned penalties do not satisfy the conditions for desired properties in terms of continuity, sparsity and unbiasedness, advocated by Fan and Li (200). They suggested using the smoothly clipped absolute deviation (SCAD) penalty, proposed by Fan (997). The rst-order derivative of SCAD is dened by { p ()= I( 6 )+ (a ) } + (a ) I( ) for 0 with a =3:7, and p (0) = 0. For simplicity of presentation, we will use SCAD for all procedures using the SCAD penalty. As recommended by Fan and Li, we employ penalized least squares with the SCAD penalty to identify the sparse active eects in the analysis stage of SSD in this paper Asymptotic properties When (x i ;Y i );i=;:::;n, are independent and identically distributed, Fan and Li have established an oracle property for their penalized likelihood estimator. In what follows, we will show that the oracle property still holds for the penalized least-squares estimator when the predictor variable x is a xed design even if design matrix is singular. We next introduce some related notations. Denote by 0 the true value of, and let 0 =( 0 ;:::; d0 ) T =( T 0 ; T 20 )T. Without loss of generality, assume that 20 = 0, and all components of 0 are not equal to 0. Denote by s the number of nonzero components of. Let X consist of the rst s columns of X, the design matrix of the linear regression model (2.), and x k consist of the rst s components of x k ;k=;:::;n. Dene V = lim n (=n)x T X and let V consist of the rst s columns and rows of V. To establish the oracle property for the penalized least-squares estimate, we need the following two conditions: (C) max 6k6n x T k (XT X ) x k 0; as n. (C2) V is nite and V 0. Conditions (C) and (C2) guarantee that the asymptotic normality holds for the least-squares estimator of ˆ. We show in Theorem that the penalized least-squares estimate is root n consistent and possesses the oracle property for the penalized least-squares estimator with the SCAD penalty.

4 38 R. Li, D.K.J. Lin / Statistics & Probability Letters 59 (2002) Theorem. Consider model (2.) and suppose that the random errors are independent and identically distributed with zero mean and nite positive variance 2. Suppose that conditions (C) and (C2) hold. For the SCAD penalty; if n 0 and n n ; then (a) (Root n consistency). With probability tendingto one; there exists a local minimizer ˆ of Q(); dened in (2.2); such that ˆ is a root n consistent estimator of ; (b) (Oracle property). With probability tendingto one; the root n consistent estimator in Part (a) satises ˆ 2 = 0 and n( ˆ 0 ) N(0; 2 V ): The proof of Theorem is given in Section 4. From Theorem, with proper rate of n, the SCAD results in a root n consistency estimator. Moreover, the resulting estimator correctly identies the inactive eects and estimates the active eects as if we knew the true submodel in advance Iterative ridge regression Local quadratic approximation Note that the SCAD penalty function is singular at the origin and may not have the second derivative at some points. In order to apply the Newton Raphson algorithm to the penalized least squares, Fan and Li (200) locally approximate the SCAD penalty function by a quadratic function as follows. Given an initial value (0) that is close to the true value of, when (0) j is not very close to 0, the penalty p ( j ) can be locally approximated by the quadratic function as [p ( j )] = p ( j ) sgn( j ) {p ( (0) j )= (0) j } j and set ˆj =0 if (0) j is very close to 0. With the local quadratic approximation, the solution for the penalized least squares can be found by iteratively computing the following ridge regression with an initial value (0) : where () = {X T X + n ( (0) )} X T y; ( (0) ) = diag{p ( (0) )= (0) ;:::;p ( (0) d )= (0) d }: This can be easily implemented in many statistical packages Initial value The local quadratic approximation requires a good initial value which is close to the true value 0. When the design matrix is full rank, the least-squares estimate can serve as the initial value of. When the sample size is relatively large, the least-squares estimate possesses root n consistency, and hence it is very close to the true value of. In the SSD settings, however, the number of experiments may be less than the number of potential candidates, and therefore the design matrix is not full rank. In fact, the regression coecients are not identiable without further assumptions, for example, the eect sparsity assumption. Here we use stepwise variable selection to nd an initial

5 R. Li, D.K.J. Lin / Statistics & Probability Letters 59 (2002) value of. In other words, we rst apply stepwise variable selection to the full model with small thresholding values (i.e. large value of signicance level ) such that all active factors are included in the selected model. Of course, some insignicant factors may still stay in the resulting model at this step. 3. Simulation study and example In this section we compare the performance of the SCAD and stepwise variable selection procedure for analyzing SSD. All simulations are conducted using MATLAB codes. GCV (Craven and Wahba, 979) was used for determining the regularization parameter in the SCAD. In the following examples, the level of signicance is set to be 0. for both F-enter and F-remove in the stepwise variable selection procedure for choosing an initial value for the SCAD. The comparison is made in terms of the ability of identifying true model and the size of resulting model. Example. Given the design matrix constructed in Lin (993); we simulated 000 data sets from the linear model Y = x T + ; where the random error is N(0; ). We generate data from the following three models: Model I: =8, 2 = 5 and all other components of are equal to 0. Model II: =0; 2 =9, 3 = 2 and all other components of are equal to 0. Model III: = 20; 3 =2; 5 =0; 7 =5; 6 = 2 and all other components of are equal to 0. In Model I, there are two large active eects; while in Models II and III, there are some large eects, some moderate eects and a small eect. In our simulations, we also employ stepwise regression with three dierent criteria. One selects signicant eects by setting the level of signicance to be 0.05, the other two select a subset with the best AIC and BIC score in the stepwise regression, respectively. Table summarizes the simulation results. From Table, it is clear that SCAD outperforms all three stepwise regression methods in terms of rate of identifying true models and size of selected models. Example 2 (Williams Rubber Experiment). The proposed procedure is applied to the SSD constructed by Lin (993). Stepwise variable selection selects the following factors: X 5 ; X 2 ; X 20 ; X 4 ;X 0 ;X ;X 7 ;X ;X 4 ;X 7 and X 22. Then the SCAD procedure is applied. The selected tuning parameter is ˆ =6:5673. The nal model selected by the SCAD identies X 4 ;X 2 ;X 5 ;X 20 as the active eects. The result is consistent with the conclusion of Williams (968). 4. Proofs To prove Theorem, we prove the following two lemmas rst.

6 40 R. Li, D.K.J. Lin / Statistics & Probability Letters 59 (2002) Table Percent of 000 simulations in Example Model Method Rate of the true model being identied Avg. size of tted model Median Mean I II III True model: Y =8x +5x 2 + Stepwise (p = 0:05) 27.00% Stepwise (AIC) 2.20% Stepwise (BIC).0% SCAD 82.70% True model: Y =0x +9x 2 +2x 3 + Stepwise (p = 0:05) 32.30% Stepwise (AIC) 3.00% Stepwise (BIC) 3.50% SCAD 74.70% True model: Y = 20x +2x 3 +0x 5 +5x 7 +2x 6 + Stepwise (p = 0:05) 38.90% Stepwise (AIC) 7.00% Stepwise (BIC) 24.70% SCAD 7.90% Lemma. Under the conditions of Theorem ; there exists a large constant C such that { } P inf Q( 0 + n =2 u) Q( 0 ) ; (4.) u =C where u =(u T ; 0) and the dimension of u is the same as that of 0. Proof. Dene D n (u) Q( 0 + n u) Q( 0 ): Note that p n (0) = 0 and 20 =0; D n (u) n { y X ( 2n 0 + u ) 2 y X 0 2 } + i= s {p n ( j0 + n u j ) p n ( j0 )}; (4.2) j= where y =(Y ;:::;Y n ) T. The rst term in (4.2) can be simplied ( ) n 2 n u T n XT X u n =2 u T x i (Y i xi T 0 ): (4.3) i=

7 R. Li, D.K.J. Lin / Statistics & Probability Letters 59 (2002) By the assumption of (C2); the rst term in (4.3) equals n 2 ut V u { + o()}. Using R = E(R)+ O P { var(r)} for any random variable with nite second moment; it follows that the second term in (4.3) equals to n u T V u O P (). Note that V is nite and positive denite. By choosing a suciently large C; the rst term will dominate the second term; uniformly in u = C. By Taylor s expansion, the second term on the right-hand side of (4.2) becomes s j= [n =2 p n ( j0 ) sgn( j0 )u j + n p n ( j0 )u 2 j ( + o())] which is bounded by sn =2 a n u + n b n u 2 ; where a n = max{ p n ( j0 ) : j0 0} and b n = max{ p n ( j0 ) : j0 0}: For the SCAD penalty, if n 0, then a n = 0 and b n = 0 when n is large enough. Therefore, the second term in (4.2) is dominated by the rst term of (4.3). Hence, by choosing suciently large C, (4.) holds. This completes the proof. Lemma 2. Under the conditions of Theorem ; for any given satisfyingthat 0 = O P (n =2 ) and any constant C; the followingequation holds with probability tendingto one; {( )} {( )} Q = min Q : 0 2 6Cn =2 2 Proof. It is sucient to show that; with probability tending to as n ; for any satisfying 0 =O P (n =2 ) and for some small n = Cn =2 and j = s 0 for 0 j n j 0 for n j 0: (4.5) By some straightforward computations; it follows j n xt (j)(y X 0 )+ n xt (j)x( 0 )+p n ( j ) sgn( j ); where x (j) is the jth column of X. Since V is nite, var{(=n)x(j) T (y XT 0 )} =O(n ), and it follows that n xt (j)(y X 0 )=O P (n =2 ):

8 42 R. Li, D.K.J. Lin / Statistics & Probability Letters 59 (2002) By condition (C2) n xt (j)x = V j ( + o()); where V j is the jth column of V. By the assumption that 0 =O P (n =2 ), j = n { n p n ( j ) sgn( j )+O P (n =2 = n )}: Since lim inf 0 + n p n ()= and n =2 = n 0asn, the sign of the derivative is completely determined by that of j. Hence, (4.4) and (4.5) follows. This completes the proof. Proof of Theorem.. To show Part (a); it suces to show that there exists a large constant C such that { } P inf Q( 0 + n =2 u) Q( 0 ) : (4.6) u =C This implies that with probability tending to one there exists a local minimum in the ball { 0 + n =2 u : u 6 C}: Hence; there exists a local minimizer ˆ such that ˆ 0 =O P (n =2 ). Since Q( 0 + n =2 u) Q( 0 ) [ {( 0 + n =2 u = Q 20 + n =2 u 2 )} Q {( 0 + n =2 u 20 )}] [ + Q {( 0 + n =2 u 20 )} ] Q( 0 ) : By Lemma 2, with probability tending to the rst term is positive as 20 = 0. By Lemma, (4.6) holds. This completes the proof of Part (a). Now we show Part (b). It follows by Lemma 2 that with probability tending to one, ˆ 2 = 0. It can be easily shown that there exists a ˆ in Part (a) that is a root n consistent local minimizer of {( )} Q ; 0 regarded as a function of, and satisfying the j ( ) ˆ = 0 for j =;:::;s: = 0 Note that ˆ is a consistent estimator and 20 ˆ) j n xt (j)(y X 0 )+ n xt (j)x () ( ˆ 0 ) p n ( ˆ j ) = n xt (j)(y X 0 )+V T (j)( ˆ 0 )(+o P ()) (p n ( 0 j ) sgn( j0 )+{p n ( j0 )+o P ()}( ˆ j j0 ));

9 R. Li, D.K.J. Lin / Statistics & Probability Letters 59 (2002) where X () consists of the rst s columns of X and V (j) is the jth-column of V. It follows by Slutsky s Theorem and the Hajek Sidak Central Limit Theorem (CLT) that n(v + ){ ˆ 0 +(V + ) b} N(0; 2 V ) in distribution, where = diag{p n ( 0 );:::;p n ( s0 )}; b =(p n ( 0 ) sgn( 0 );:::;p n ( s0 ) sgn( s0 )) T : Note that for the SCAD penalty, = 0 and b = 0 as n 0. Thus, n( ˆ 0 ) N(0; 2 V ) in distribution. This completes the proof of Part (b). 5. Conclusion While the construction of SSDs has been paid increasing attention, the analysis of SSDs deserves special attention. In this paper, we proposed a variable selection approach for analyzing experiments when the full model contains many potential candidate eects. It has been shown that the proposed approach possesses an oracle property. The proposed approach has been empirically tested. A real data example was used to illustrate the eectiveness of the proposed approach. Acknowledgements The authors thank the referee and Dr. H. McGrath for their valuable comments and suggestions which led to a great improvement in the presentation of this paper. Li s research was supported by an NSF Grant DMS References Craven, P., Wahba, G., 979. Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation Numer. Math. 3, Fan, J., 997. Comments on Wavelets in statistics: a review by A. Antoniadis. J. Italian Statist. Assoc. 6, Fan, J., Li, R., 200. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assos. 96, Fang, K.T., Lin, D.K.J., Ma, C.X., On the construction of multi-level supersaturated designs. J. Statist. Plan. Inference 86, Frank, I.E., Friedman, J.H., 993. A statistical view of some chemometrics regression tools. Technometrics 35, Fu, W.J., 998. Penalized regression: the bridge versus the LASSO. J. Comput. Graphical Statist. 7, Knight, K., Fu, W., Asymptotic for Lasso-type estimators. Ann. Statist. 28, Lin, D.K.J., 99. Systematic supersaturated designs. Working Paper, No. 264, Department of Statistics, University of Tennessee. Lin, D.K.J., 993. A new class of supersaturated designs. Technometrics 35, Lin, D.K.J., 995. Generating system supersaturated designs. Technometrics 37,

10 44 R. Li, D.K.J. Lin / Statistics & Probability Letters 59 (2002) Lin, D.K.J., Supersaturated design: theory and application. In: Park, S.H., Vining, G.G. (Eds.), Statistical Process Monitoring and Optimization. Marcel Dekker, New York (Chapter 8). Tibshirani, R.J., 996. Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. B 58, Westfall, P.H., Young, S.S., Lin, D.K.J., 998. Forward selection error control in analysis of supersaturated designs. Statist. Sinica 8, 0 7. Williams, K.R., 968. Designed experiments. Rubber Age 00, 65 7.

### Penalized Logistic Regression and Classification of Microarray Data

Penalized Logistic Regression and Classification of Microarray Data Milan, May 2003 Anestis Antoniadis Laboratoire IMAG-LMC University Joseph Fourier Grenoble, France Penalized Logistic Regression andclassification

### The degrees of freedom of the Lasso in underdetermined linear regression models

The degrees of freedom of the Lasso in underdetermined linear regression models C. Dossal (1), M. Kachour (2), J. Fadili (2), G. Peyré (3), C. Chesneau (4) (1) IMB, Université Bordeaux 1 (2) GREYC, ENSICAEN

### On closed-form solutions of a resource allocation problem in parallel funding of R&D projects

Operations Research Letters 27 (2000) 229 234 www.elsevier.com/locate/dsw On closed-form solutions of a resource allocation problem in parallel funding of R&D proects Ulku Gurler, Mustafa. C. Pnar, Mohamed

A Memory Reduction Method in Pricing American Options Raymond H. Chan Yong Chen y K. M. Yeung z Abstract This paper concerns with the pricing of American options by simulation methods. In the traditional

### Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

### On the degrees of freedom in shrinkage estimation

On the degrees of freedom in shrinkage estimation Kengo Kato Graduate School of Economics, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan kato ken@hkg.odn.ne.jp October, 2007 Abstract

### Adaptive Search with Stochastic Acceptance Probabilities for Global Optimization

Adaptive Search with Stochastic Acceptance Probabilities for Global Optimization Archis Ghate a and Robert L. Smith b a Industrial Engineering, University of Washington, Box 352650, Seattle, Washington,

### Topic 1: Matrices and Systems of Linear Equations.

Topic 1: Matrices and Systems of Linear Equations Let us start with a review of some linear algebra concepts we have already learned, such as matrices, determinants, etc Also, we shall review the method

### Introduction to Logistic Regression

OpenStax-CNX module: m42090 1 Introduction to Logistic Regression Dan Calderon This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 3.0 Abstract Gives introduction

### How to assess the risk of a large portfolio? How to estimate a large covariance matrix?

Chapter 3 Sparse Portfolio Allocation This chapter touches some practical aspects of portfolio allocation and risk assessment from a large pool of financial assets (e.g. stocks) How to assess the risk

### Blockwise Sparse Regression

Blockwise Sparse Regression 1 Blockwise Sparse Regression Yuwon Kim, Jinseog Kim, and Yongdai Kim Seoul National University, Korea Abstract: Yuan an Lin (2004) proposed the grouped LASSO, which achieves

### BLOCKWISE SPARSE REGRESSION

Statistica Sinica 16(2006), 375-390 BLOCKWISE SPARSE REGRESSION Yuwon Kim, Jinseog Kim and Yongdai Kim Seoul National University Abstract: Yuan an Lin (2004) proposed the grouped LASSO, which achieves

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### Regularized Logistic Regression for Mind Reading with Parallel Validation

Regularized Logistic Regression for Mind Reading with Parallel Validation Heikki Huttunen, Jukka-Pekka Kauppi, Jussi Tohka Tampere University of Technology Department of Signal Processing Tampere, Finland

### Similarity and Diagonalization. Similar Matrices

MATH022 Linear Algebra Brief lecture notes 48 Similarity and Diagonalization Similar Matrices Let A and B be n n matrices. We say that A is similar to B if there is an invertible n n matrix P such that

Adaptive Online Gradient Descent Peter L Bartlett Division of Computer Science Department of Statistics UC Berkeley Berkeley, CA 94709 bartlett@csberkeleyedu Elad Hazan IBM Almaden Research Center 650

### Chapter 2. Dynamic panel data models

Chapter 2. Dynamic panel data models Master of Science in Economics - University of Geneva Christophe Hurlin, Université d Orléans Université d Orléans April 2010 Introduction De nition We now consider

### MATH 304 Linear Algebra Lecture 8: Inverse matrix (continued). Elementary matrices. Transpose of a matrix.

MATH 304 Linear Algebra Lecture 8: Inverse matrix (continued). Elementary matrices. Transpose of a matrix. Inverse matrix Definition. Let A be an n n matrix. The inverse of A is an n n matrix, denoted

Solving linear equations on parallel distributed memory architectures by extrapolation Christer Andersson Abstract Extrapolation methods can be used to accelerate the convergence of vector sequences. It

### Lasso on Categorical Data

Lasso on Categorical Data Yunjin Choi, Rina Park, Michael Seo December 14, 2012 1 Introduction In social science studies, the variables of interest are often categorical, such as race, gender, and nationality.

### On the (S, 1;S) lost sales inventory model with priority. October 1997. Abstract

On the (S, 1;S) lost sales inventory model with priority demand classes R. Dekker R.M. Hill y M.J. Kleijn October 1997 Abstract In this paper an inventory model with several demand classes, prioritised

### Date: April 12, 2001. Contents

2 Lagrange Multipliers Date: April 12, 2001 Contents 2.1. Introduction to Lagrange Multipliers......... p. 2 2.2. Enhanced Fritz John Optimality Conditions...... p. 12 2.3. Informative Lagrange Multipliers...........

### Joint models for classification and comparison of mortality in different countries.

Joint models for classification and comparison of mortality in different countries. Viani D. Biatat 1 and Iain D. Currie 1 1 Department of Actuarial Mathematics and Statistics, and the Maxwell Institute

### ON THE DEGREES OF FREEDOM OF THE LASSO

The Annals of Statistics 2007, Vol. 35, No. 5, 2173 2192 DOI: 10.1214/009053607000000127 Institute of Mathematical Statistics, 2007 ON THE DEGREES OF FREEDOM OF THE LASSO BY HUI ZOU, TREVOR HASTIE AND

### Fuzzy regression model with fuzzy input and output data for manpower forecasting

Fuzzy Sets and Systems 9 (200) 205 23 www.elsevier.com/locate/fss Fuzzy regression model with fuzzy input and output data for manpower forecasting Hong Tau Lee, Sheu Hua Chen Department of Industrial Engineering

### Learning Big (Image) Data via Coresets for Dictionaries

Learning Big (Image) Data via Coresets for Dictionaries Dan Feldman, Micha Feigin, and Nir Sochen Abstract. Signal and image processing have seen in the last few years an explosion of interest in a new

### SYSTEMS OF REGRESSION EQUATIONS

SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations

### Lecture 11. Shuanglin Shao. October 2nd and 7th, 2013

Lecture 11 Shuanglin Shao October 2nd and 7th, 2013 Matrix determinants: addition. Determinants: multiplication. Adjoint of a matrix. Cramer s rule to solve a linear system. Recall that from the previous

### Summary of Probability

Summary of Probability Mathematical Physics I Rules of Probability The probability of an event is called P(A), which is a positive number less than or equal to 1. The total probability for all possible

### School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213

,, 1{8 () c A Note on Learning from Multiple-Instance Examples AVRIM BLUM avrim+@cs.cmu.edu ADAM KALAI akalai+@cs.cmu.edu School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 1513 Abstract.

### Degrees of Freedom and Model Search

Degrees of Freedom and Model Search Ryan J. Tibshirani Abstract Degrees of freedom is a fundamental concept in statistical modeling, as it provides a quantitative description of the amount of fitting performed

### MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +

### Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

### 1 Teaching notes on GMM 1.

Bent E. Sørensen January 23, 2007 1 Teaching notes on GMM 1. Generalized Method of Moment (GMM) estimation is one of two developments in econometrics in the 80ies that revolutionized empirical work in

### Notes on Factoring. MA 206 Kurt Bryan

The General Approach Notes on Factoring MA 26 Kurt Bryan Suppose I hand you n, a 2 digit integer and tell you that n is composite, with smallest prime factor around 5 digits. Finding a nontrivial factor

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Maximum Likelihood Estimation of an ARMA(p,q) Model

Maximum Likelihood Estimation of an ARMA(p,q) Model Constantino Hevia The World Bank. DECRG. October 8 This note describes the Matlab function arma_mle.m that computes the maximum likelihood estimates

### Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

### Effective Linear Discriminant Analysis for High Dimensional, Low Sample Size Data

Effective Linear Discriant Analysis for High Dimensional, Low Sample Size Data Zhihua Qiao, Lan Zhou and Jianhua Z. Huang Abstract In the so-called high dimensional, low sample size (HDLSS) settings, LDA

### Model selection in R featuring the lasso. Chris Franck LISA Short Course March 26, 2013

Model selection in R featuring the lasso Chris Franck LISA Short Course March 26, 2013 Goals Overview of LISA Classic data example: prostate data (Stamey et. al) Brief review of regression and model selection.

### 10.3 POWER METHOD FOR APPROXIMATING EIGENVALUES

58 CHAPTER NUMERICAL METHODS. POWER METHOD FOR APPROXIMATING EIGENVALUES In Chapter 7 you saw that the eigenvalues of an n n matrix A are obtained by solving its characteristic equation n c nn c nn...

### Applications to Data Smoothing and Image Processing I

Applications to Data Smoothing and Image Processing I MA 348 Kurt Bryan Signals and Images Let t denote time and consider a signal a(t) on some time interval, say t. We ll assume that the signal a(t) is

### (Quasi-)Newton methods

(Quasi-)Newton methods 1 Introduction 1.1 Newton method Newton method is a method to find the zeros of a differentiable non-linear function g, x such that g(x) = 0, where g : R n R n. Given a starting

### LINEAR SYSTEMS. Consider the following example of a linear system:

LINEAR SYSTEMS Consider the following example of a linear system: Its unique solution is x +2x 2 +3x 3 = 5 x + x 3 = 3 3x + x 2 +3x 3 = 3 x =, x 2 =0, x 3 = 2 In general we want to solve n equations in

### Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components

Eigenvalues, Eigenvectors, Matrix Factoring, and Principal Components The eigenvalues and eigenvectors of a square matrix play a key role in some important operations in statistics. In particular, they

### Chapter 12 Modal Decomposition of State-Space Models 12.1 Introduction The solutions obtained in previous chapters, whether in time domain or transfor

Lectures on Dynamic Systems and Control Mohammed Dahleh Munther A. Dahleh George Verghese Department of Electrical Engineering and Computer Science Massachuasetts Institute of Technology 1 1 c Chapter

### Sales forecasting # 2

Sales forecasting # 2 Arthur Charpentier arthur.charpentier@univ-rennes1.fr 1 Agenda Qualitative and quantitative methods, a very general introduction Series decomposition Short versus long term forecasting

### 171:290 Model Selection Lecture II: The Akaike Information Criterion

171:290 Model Selection Lecture II: The Akaike Information Criterion Department of Biostatistics Department of Statistics and Actuarial Science August 28, 2012 Introduction AIC, the Akaike Information

### Statistical Machine Learning

Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

### The Delta Method and Applications

Chapter 5 The Delta Method and Applications 5.1 Linear approximations of functions In the simplest form of the central limit theorem, Theorem 4.18, we consider a sequence X 1, X,... of independent and

### Bayesian Penalized Methods for High Dimensional Data

Bayesian Penalized Methods for High Dimensional Data Joseph G. Ibrahim Joint with Hongtu Zhu and Zakaria Khondker What is Covered? Motivation GLRR: Bayesian Generalized Low Rank Regression L2R2: Bayesian

### Continued Fractions and the Euclidean Algorithm

Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction

### Penalized regression: Introduction

Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

### Quadratic forms Cochran s theorem, degrees of freedom, and all that

Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us

### Bootstrapping Big Data

Bootstrapping Big Data Ariel Kleiner Ameet Talwalkar Purnamrita Sarkar Michael I. Jordan Computer Science Division University of California, Berkeley {akleiner, ameet, psarkar, jordan}@eecs.berkeley.edu

### Integer roots of quadratic and cubic polynomials with integer coefficients

Integer roots of quadratic and cubic polynomials with integer coefficients Konstantine Zelator Mathematics, Computer Science and Statistics 212 Ben Franklin Hall Bloomsburg University 400 East Second Street

### The Characteristic Polynomial

Physics 116A Winter 2011 The Characteristic Polynomial 1 Coefficients of the characteristic polynomial Consider the eigenvalue problem for an n n matrix A, A v = λ v, v 0 (1) The solution to this problem

### Math 115A HW4 Solutions University of California, Los Angeles. 5 2i 6 + 4i. (5 2i)7i (6 + 4i)( 3 + i) = 35i + 14 ( 22 6i) = 36 + 41i.

Math 5A HW4 Solutions September 5, 202 University of California, Los Angeles Problem 4..3b Calculate the determinant, 5 2i 6 + 4i 3 + i 7i Solution: The textbook s instructions give us, (5 2i)7i (6 + 4i)(

### Statistical machine learning, high dimension and big data

Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,

### Nonparametric adaptive age replacement with a one-cycle criterion

Nonparametric adaptive age replacement with a one-cycle criterion P. Coolen-Schrijner, F.P.A. Coolen Department of Mathematical Sciences University of Durham, Durham, DH1 3LE, UK e-mail: Pauline.Schrijner@durham.ac.uk

### Machine Learning Big Data using Map Reduce

Machine Learning Big Data using Map Reduce By Michael Bowles, PhD Where Does Big Data Come From? -Web data (web logs, click histories) -e-commerce applications (purchase histories) -Retail purchase histories

Applying MCMC Methods to Multi-level Models submitted by William J Browne for the degree of PhD of the University of Bath 1998 COPYRIGHT Attention is drawn tothefactthatcopyright of this thesis rests with

### On Load Balancing in Erlang Networks. The dynamic resource allocation problem arises in a variety of applications.

1 On Load Balancing in Erlang Networks 1.1 Introduction The dynamic resource allocation problem arises in a variety of applications. The generic resource allocation setting involves a number of locations

### Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh

Modern Optimization Methods for Big Data Problems MATH11146 The University of Edinburgh Peter Richtárik Week 3 Randomized Coordinate Descent With Arbitrary Sampling January 27, 2016 1 / 30 The Problem

### NOTES ON LINEAR TRANSFORMATIONS

NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all

### The Dirichlet Unit Theorem

Chapter 6 The Dirichlet Unit Theorem As usual, we will be working in the ring B of algebraic integers of a number field L. Two factorizations of an element of B are regarded as essentially the same if

### Social Media Aided Stock Market Predictions by Sparsity Induced Regression

Social Media Aided Stock Market Predictions by Sparsity Induced Regression Delft Center for Systems and Control Social Media Aided Stock Market Predictions by Sparsity Induced Regression For the degree

### Inverses and powers: Rules of Matrix Arithmetic

Contents 1 Inverses and powers: Rules of Matrix Arithmetic 1.1 What about division of matrices? 1.2 Properties of the Inverse of a Matrix 1.2.1 Theorem (Uniqueness of Inverse) 1.2.2 Inverse Test 1.2.3

### Factorization Theorems

Chapter 7 Factorization Theorems This chapter highlights a few of the many factorization theorems for matrices While some factorization results are relatively direct, others are iterative While some factorization

### MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column

### NEW TABLE AND NUMERICAL APPROXIMATIONS FOR KOLMOGOROV-SMIRNOV/LILLIEFORS/VAN SOEST NORMALITY TEST

NEW TABLE AND NUMERICAL APPROXIMATIONS FOR KOLMOGOROV-SMIRNOV/LILLIEFORS/VAN SOEST NORMALITY TEST PAUL MOLIN AND HERVÉ ABDI Abstract. We give new critical values for the Kolmogorov-Smirnov/Lilliefors/Van

### Solving Linear Systems, Continued and The Inverse of a Matrix

, Continued and The of a Matrix Calculus III Summer 2013, Session II Monday, July 15, 2013 Agenda 1. The rank of a matrix 2. The inverse of a square matrix Gaussian Gaussian solves a linear system by reducing

### Acknowledgments. Data Mining with Regression. Data Mining Context. Overview. Colleagues

Data Mining with Regression Teaching an old dog some new tricks Acknowledgments Colleagues Dean Foster in Statistics Lyle Ungar in Computer Science Bob Stine Department of Statistics The School of the

### Estimation and Inference in Cointegration Models Economics 582

Estimation and Inference in Cointegration Models Economics 582 Eric Zivot May 17, 2012 Tests for Cointegration Let the ( 1) vector Y be (1). Recall, Y is cointegrated with 0 cointegrating vectors if there

### Przemys law Biecek and Teresa Ledwina

Data Driven Smooth Tests Przemys law Biecek and Teresa Ledwina 1 General Description Smooth test was introduced by Neyman (1937) to verify simple null hypothesis asserting that observations obey completely

### Fitting Subject-specific Curves to Grouped Longitudinal Data

Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,

### 160 CHAPTER 4. VECTOR SPACES

160 CHAPTER 4. VECTOR SPACES 4. Rank and Nullity In this section, we look at relationships between the row space, column space, null space of a matrix and its transpose. We will derive fundamental results

A Globally Convergent Primal-Dual Interior Point Method for Constrained Optimization Hiroshi Yamashita 3 Abstract This paper proposes a primal-dual interior point method for solving general nonlinearly

### Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091)

Monte Carlo and Empirical Methods for Stochastic Inference (MASM11/FMS091) Magnus Wiktorsson Centre for Mathematical Sciences Lund University, Sweden Lecture 5 Sequential Monte Carlo methods I February

### Chapter 3: The Multiple Linear Regression Model

Chapter 3: The Multiple Linear Regression Model Advanced Econometrics - HEC Lausanne Christophe Hurlin University of Orléans November 23, 2013 Christophe Hurlin (University of Orléans) Advanced Econometrics

### Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering Department of Industrial Engineering and Management Sciences Northwestern University September 15th, 2014

### Master s Theory Exam Spring 2006

Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem

### Classification/Decision Trees (II)

Classification/Decision Trees (II) Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Right Sized Trees Let the expected misclassification rate of a tree T be R (T ).

### constraint. Let us penalize ourselves for making the constraint too big. We end up with a

Chapter 4 Constrained Optimization 4.1 Equality Constraints (Lagrangians) Suppose we have a problem: Maximize 5, (x 1, 2) 2, 2(x 2, 1) 2 subject to x 1 +4x 2 =3 If we ignore the constraint, we get the

### 3 Does the Simplex Algorithm Work?

Does the Simplex Algorithm Work? In this section we carefully examine the simplex algorithm introduced in the previous chapter. Our goal is to either prove that it works, or to determine those circumstances

### CAPM, Arbitrage, and Linear Factor Models

CAPM, Arbitrage, and Linear Factor Models CAPM, Arbitrage, Linear Factor Models 1/ 41 Introduction We now assume all investors actually choose mean-variance e cient portfolios. By equating these investors

### A Lower Bound for Area{Universal Graphs. K. Mehlhorn { Abstract. We establish a lower bound on the eciency of area{universal circuits.

A Lower Bound for Area{Universal Graphs Gianfranco Bilardi y Shiva Chaudhuri z Devdatt Dubhashi x K. Mehlhorn { Abstract We establish a lower bound on the eciency of area{universal circuits. The area A

### Piecewise Cubic Splines

280 CHAP. 5 CURVE FITTING Piecewise Cubic Splines The fitting of a polynomial curve to a set of data points has applications in CAD (computer-assisted design), CAM (computer-assisted manufacturing), and

### Finite cloud method: a true meshless technique based on a xed reproducing kernel approximation

INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING Int. J. Numer. Meth. Engng 2001; 50:2373 2410 Finite cloud method: a true meshless technique based on a xed reproducing kernel approximation N.

### Further Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1

Further Study on Strong Lagrangian Duality Property for Invex Programs via Penalty Functions 1 J. Zhang Institute of Applied Mathematics, Chongqing University of Posts and Telecommunications, Chongqing

### 2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)

2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came

### A Network Flow Approach in Cloud Computing

1 A Network Flow Approach in Cloud Computing Soheil Feizi, Amy Zhang, Muriel Médard RLE at MIT Abstract In this paper, by using network flow principles, we propose algorithms to address various challenges

### The Ideal Class Group

Chapter 5 The Ideal Class Group We will use Minkowski theory, which belongs to the general area of geometry of numbers, to gain insight into the ideal class group of a number field. We have already mentioned

### The Lindsey-Fox Algorithm for Factoring Polynomials

OpenStax-CNX module: m15573 1 The Lindsey-Fox Algorithm for Factoring Polynomials C. Sidney Burrus This work is produced by OpenStax-CNX and licensed under the Creative Commons Attribution License 2.0

### Tilburg University. Publication date: 1998. Link to publication

Tilburg University Time Series Analysis of Non-Gaussian Observations Based on State Space Models from Both Classical and Bayesian Perspectives Durbin, J.; Koopman, S.J.M. Publication date: 1998 Link to

### Simultaneous or Sequential? Search Strategies in the U.S. Auto. Insurance Industry. Elisabeth Honka 1. Pradeep Chintagunta 2

Simultaneous or Sequential? Search Strategies in the U.S. Auto Insurance Industry Elisabeth Honka 1 University of Texas at Dallas Pradeep Chintagunta 2 University of Chicago Booth School of Business October

### Todd R. Nelson, Department of Statistics, Brigham Young University, Provo, UT

SAS Interface for Run-to-Run Batch Process Monitoring Using Real-Time Data Todd R Nelson, Department of Statistics, Brigham Young University, Provo, UT Scott D Grimshaw, Department of Statistics, Brigham

### arxiv:1112.0829v1 [math.pr] 5 Dec 2011

How Not to Win a Million Dollars: A Counterexample to a Conjecture of L. Breiman Thomas P. Hayes arxiv:1112.0829v1 [math.pr] 5 Dec 2011 Abstract Consider a gambling game in which we are allowed to repeatedly