# Selecting the Correct Number of Factors in Approximate Factor Models: The Large Panel Case with Group Bridge Estimators



Mehmet Caner (North Carolina State University) and Xu Han (City University of Hong Kong)

November 5, 2013

Abstract: This paper proposes a group bridge estimator to select the correct number of factors in approximate factor models. It contributes to the literature on shrinkage estimation and factor models by extending the conventional bridge estimator from a single equation to a large panel context. The proposed estimator can consistently estimate the factor loadings of relevant factors and shrink the loadings of irrelevant factors to zero with probability approaching one. Hence, it provides a consistent estimate of the number of factors. We also propose an algorithm for the new estimator; Monte Carlo experiments show that our algorithm converges reasonably fast and that our estimator has very good performance in small samples. An empirical example is also presented based on a commonly used US macroeconomic data set.

Key words: bridge estimation, common factors, selection consistency

We thank the co-editor Shakeeb Khan, an associate editor and two anonymous referees who made this paper better. We also thank Robin Sickles, James Stock, and the participants at the High Dimension Reduction Conference (December 2010) in London, the Panel Data Conference (July 2011) in Montreal, the CIREQ Econometrics Conference (May 2013), and the North American Summer Meeting of the Econometric Society (June 2013) for their comments. Caner: Department of Economics, North Carolina State University, 4168 Nelson Hall, Raleigh, NC. Han: Department of Economics and Finance, City University of Hong Kong.

## 2 The Model

We use the following representation for the factor model:

X_it = λ_i^0' F_t^0 + e_it,   (2.1)

where X_it is the observed datum for the i-th cross section at time t, for i = 1, 2, ..., N and t = 1, 2, ..., T; F_t^0 is an r×1 vector of common factors; and λ_i^0 is an r×1 vector of factor loadings associated with F_t^0. The true number of factors is r. λ_i^0' F_t^0 is the common component of X_it, and e_it is the idiosyncratic error. F_t^0, λ_i^0, e_it, and r are unobserved.

Here we explain several terms used in our discussion of factor models. We call (2.1) an approximate factor model because our assumptions (see Section 2.1) allow some cross-sectional correlation in the idiosyncratic errors. It is more general than a strict factor model, which assumes e_it to be uncorrelated across i. Also, we refer to (2.1) as a static factor model because X_it has a contemporaneous relationship with the factors. This is different from the dynamic factor model in Forni et al. (2000). The dynamic factor model has the representation X_it = Σ_{j=1}^q φ_ij(L) f_jt + e_it, where the q-dimensional dynamic factors are orthonormal white noises and the one-sided filters φ_ij(L) are dynamic factor loadings. Hence, the factor model X_it = a_i f_t + b_i f_{t-1} + e_it has one dynamic factor but two static factors. Our focus is the static factor model with cross-sectionally correlated e_it. For conciseness, we use the term factor model to refer to the static, approximate factor model in the rest of the paper, unless otherwise specified.

Now, we rewrite model (2.1) in vector form:

X_i = F^0 λ_i^0 + e_i   for i = 1, 2, ..., N,   (2.2)

where X_i = (X_i1, X_i2, ..., X_iT)' is a T×1 vector of observations on the i-th cross section, F^0 = (F_1^0, F_2^0, ..., F_T^0)' is a T×r matrix of the unobserved factors, and e_i = (e_i1, e_i2, ..., e_iT)' is a T×1 vector of the idiosyncratic shocks of the i-th cross section. (2.1) and (2.2)
can also be represented in matrix form:

X = F^0 Λ^0' + e,   (2.3)

where X = (X_1, X_2, ..., X_N) is a T×N matrix of observed data, Λ^0 = (λ_1^0, λ_2^0, ..., λ_N^0)' is the N×r factor loading matrix, and e = (e_1, e_2, ..., e_N) is a T×N matrix of idiosyncratic shocks.

We use Principal Components Analysis (PCA) to estimate the unknown factors, and thereafter we use bridge estimation to obtain the correct number of factors. The conventional PCA minimizes the following objective function:

V(k) = min_{Λ^k, F^k} (NT)^{-1} Σ_{i=1}^N Σ_{t=1}^T (X_it − λ_i^k' F_t^k)²,   (2.4)

where the superscript k denotes that the number of factors is to be determined, and both λ_i^k and F_t^k are k×1 vectors. We consider 0 ≤ k ≤ p, where p is some predetermined upper bound such that p ≥ r. For a given k, the solution to the objective function (2.4) is that F̂^k (T×k) equals √T times the eigenvectors corresponding to the k largest eigenvalues of XX', and Λ̂^k_OLS = X'F̂^k(F̂^k'F̂^k)^{-1} = X'F̂^k/T (as the eigenvectors are orthonormal, F̂^k'F̂^k/T = I_k). Let C_NT = min(√N, √T). We define our group bridge estimator as the Λ̂ that minimizes the following objective function:

Λ̂ = argmin_Λ L(Λ),   (2.5)

L(Λ) = (NT)^{-1} Σ_{i=1}^N Σ_{t=1}^T (X_it − λ_i' F̂_t^p)² + (γ/C_NT²) Σ_{j=1}^p (Σ_{i=1}^N λ_ij²)^α,   (2.6)

where F̂^p is the T×p principal component estimate of the factors, F̂_t^p is the transpose of the t-th row of F̂^p, λ_i = (λ_i1, ..., λ_ip)' is a p×1 vector in a compact subset of R^p, Λ ≡ (λ_1, λ_2, ..., λ_N)' is N×p, α is a constant with 0 < α < 1/2, and γ is a tuning parameter.

The penalty term in (2.6) is different from that of a conventional bridge estimator such as the one in HHM (2008). First, unlike single equation shrinkage estimation, our penalty term is divided by C_NT², which indicates that the tuning parameter will depend on both N and T. Second, we use (Σ_{i=1}^N λ_ij²)^α instead of the HHM (2008) type of penalty Σ_{j=1}^p |λ_ij|^δ (0 < δ < 1). A necessary condition for Σ_{j=1}^p |λ_ij|^δ to shrink the last p−r columns of Λ to zero is that the corresponding non-penalized estimates of these loadings converge in probability to zero for all i and j = r+1, ..., p. This necessary condition fails because the last p−r columns of F̂^p are correlated with some X_i's, due to the limited cross-sectional correlation among the idiosyncratic errors. Hence, using Σ_{j=1}^p |λ_ij|^δ as the penalty will yield many zero and a few nonzero estimates in the last p−r columns of Λ, and will cause overestimation if we use the number of nonzero columns of Λ as an estimator of r.
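To make the two estimation steps concrete, the closed-form PCA solution and the objective L(Λ) in (2.6) can be sketched in Python (a minimal sketch; the function names are ours, and C_NT² = min(N, T) is written directly):

```python
import numpy as np

def pca_factors(X, k):
    """PCA solution to (2.4): F_hat equals sqrt(T) times the eigenvectors
    of XX' for its k largest eigenvalues, and Lambda_hat = X'F_hat / T."""
    T, N = X.shape
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)   # symmetric T x T matrix
    idx = np.argsort(eigvals)[::-1][:k]          # k largest eigenvalues
    F_hat = np.sqrt(T) * eigvecs[:, idx]         # T x k, F'F/T = I_k
    Lambda_hat = X.T @ F_hat / T                 # N x k OLS loadings
    return F_hat, Lambda_hat

def group_bridge_objective(Lam, X, F_p, gamma, alpha):
    """Objective L(Lambda) in (2.6): least-squares fit plus the group
    bridge penalty on the squared column norms of Lambda (N x p)."""
    T, N = X.shape
    C2 = min(N, T)                               # C_NT^2 = min(N, T)
    resid = X - F_p @ Lam.T                      # T x N residual matrix
    fit = (resid ** 2).sum() / (N * T)
    penalty = (gamma / C2) * ((Lam ** 2).sum(axis=0) ** alpha).sum()
    return fit + penalty
```

Note that the penalty depends on each column of Λ only through its squared norm, which is what allows an entire column to be shrunk to zero as a group.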
To shrink the entire j-th (j > r) column of Λ to zero, we apply the penalty term in (2.6) to penalize the norm of the entire vector Λ_j ≡ (λ_1j, ..., λ_Nj)' rather than each single element λ_ij. The idea is related to the group LASSO (Yuan and Lin, 2006), which is designed for single equation models. In a factor model setup, the analog of the group LASSO's penalty term would be proportional to Σ_{j=1}^p (Σ_{i=1}^N λ_ij²)^{1/2}, which provides an intuition for why we have 0 < α < 1/2. If N = 1, then the model only involves a single equation, and the group LASSO's penalty reduces to Σ_{j=1}^p |λ_j|. Accordingly, our penalty becomes Σ_{j=1}^p |λ_j|^{2α} with 0 < 2α < 1, which is the familiar penalty term in single equation bridge estimation. Huang et al. (2009) also propose a group bridge estimator for single equation shrinkage estimation. In our large panel setup, the analog of Huang et al.'s (2009) penalty term would be proportional to Σ_{j=1}^p (Σ_{i=1}^N |λ_ij|)^δ (0 < δ < 1). Their penalty is designed to achieve selection consistency both within and among groups. Under this paper's framework, a group of coefficients corresponds to a

column in the factor loading matrix. The context of this paper is different from that of Huang et al. (2009), and we do not need to consider within-group selection, for three reasons. First, within-group selection means distinguishing zero from nonzero elements in the first r columns of Λ (N×p), but this is not necessary: our purpose is to distinguish the first r nonzero columns from the last p−r zero columns of Λ, i.e., we only need to select groups in order to determine r. Second, it is well known that both factors and factor loadings estimated by principal components are consistent estimates of their original counterparts only up to some rotation. The rotation can be so generic that the economic meaning of the principal component estimates is completely unclear. Within-group selection is based on the post-rotation factor loadings; hence, unless there is a specific reason to detect zero and nonzero elements in the post-rotation factor loading matrix, within-group selection is not very meaningful under the current setup. Third, as our penalty term uses squares rather than absolute values, the computation is much easier than that of the penalty used by Huang et al. (2009). Hirose and Konishi (2012) apply the group LASSO estimator in the context of strict factor models. However, their estimator is substantially different from (2.5), primarily because they focus on selecting variables that are not affected by any of the factors, whereas this paper focuses on selecting factors that do not affect any of the variables. In other words, they are interested in which rows of Λ are zero, whereas we care about which columns of Λ are zero, so Hirose and Konishi's (2012) penalty term cannot determine the number of factors by design. One could in principle construct a group LASSO estimator to select the zero columns of Λ.
However, such an estimator is different from Hirose and Konishi (2012), and its theoretical properties are an open question beyond the scope of this paper.

### 2.1 Assumptions

Let ‖A‖ ≡ [trace(A'A)]^{1/2} denote the norm of a matrix A and →_p denote convergence in probability. Our theoretical results are derived under the following assumptions.

1. E‖F_t^0‖^4 < ∞, T^{-1} Σ_{t=1}^T F_t^0 F_t^0' →_p Σ_F (r×r) as T → ∞, and Σ_F is finite and positive definite.

2. ‖λ_i^0‖ ≤ λ̄ < ∞, ‖Λ^0'Λ^0/N − Σ_Λ‖ → 0 as N → ∞, and Σ_Λ is finite and positive definite. The j-th column of Λ, denoted Λ_j, is inside a compact subset of R^N for all N and j = 1, ..., p.

3. There exists a positive and finite constant M, not depending on N or T, such that for all N and T:
   (i) E(e_it) = 0, E|e_it|^8 ≤ M.
   (ii) E(e_s'e_t/N) = E[N^{-1} Σ_{i=1}^N e_is e_it] = ι_N(s, t), |ι_N(s, s)| ≤ M for all s, and T^{-1} Σ_{s=1}^T Σ_{t=1}^T |ι_N(s, t)| ≤ M.

   (iii) E(e_it e_jt) = τ_ij,t with |τ_ij,t| ≤ |τ_ij| for some τ_ij and for all t. In addition, N^{-1} Σ_{i=1}^N Σ_{j=1}^N |τ_ij| ≤ M.
   (iv) E(e_it e_js) = τ_ij,ts and (NT)^{-1} Σ_{i=1}^N Σ_{j=1}^N Σ_{t=1}^T Σ_{s=1}^T |τ_ij,ts| ≤ M.
   (v) For every (t, s), E|N^{-1/2} Σ_{i=1}^N [e_is e_it − E(e_is e_it)]|^4 ≤ M.

4. E[N^{-1} Σ_{i=1}^N ‖T^{-1/2} Σ_{t=1}^T F_t^0 e_it‖²] ≤ M.

5. The eigenvalues of the r×r matrix (Σ_Λ Σ_F) are distinct.

6. (i) As N, T → ∞, γ/C_NT² → 0, where C_NT = min(√N, √T). (ii) Also, as N, T → ∞, γ N^{1−α}/C_NT² → ∞, where 0 < α < 1/2.

Assumptions 1-4 are Assumptions A-D in Bai and Ng (2002). Assumption 1 is standard for factor models. With Assumption 2, we consider only nonrandom factor loadings; the results hold when the factor loadings are random but independent of the factors and errors. The divergence rate of Λ'Λ is assumed to be N, the same as in Bai and Ng (2002). It is possible to allow a divergence rate slower than N, which indicates a weaker factor structure. A slower divergence rate of Λ'Λ will slow down the convergence rate of F̂_t^k and change the divergence rate of γ specified in Assumption 6. We conjecture that our bridge estimator still works when Λ'Λ diverges at a slower rate, and we leave that as a topic for future research. Assumption 3 allows cross-sectional as well as serial dependence in the idiosyncratic errors; hence, the model follows an approximate factor structure. Assumption 4 allows for some dependence between factors and errors. Assumption 5 is Assumption G in Bai (2003); it ensures the existence of the limit of the rotation matrix H, which will be introduced in the next subsection. Assumption 6 concerns the tuning parameter. Compared with single equation bridge estimation, where the tuning parameter depends only on N or T, the divergence rate of γ depends

on both N and T. Hence, we extend the application of bridge estimation from single equation models to large panels. Also, note that a conventional bridge penalty involves the sum of |λ_ij|^δ with 0 < δ < 1. However, as we use λ_ij² instead of |λ_ij| in the penalty term, the restriction on the power has to be adjusted accordingly, i.e., 0 < α < 1/2.

### 2.2 Theoretical Results

Before discussing the main theoretical results, it is useful to describe some notation and transformations. We partition F̂^p in the following way:

F̂^p = (F̂^{1:r} ⋮ F̂^{(r+1):p}),   (2.7)

where F̂^{1:r} corresponds to the first r columns of F̂^p, and F̂^{(r+1):p} to the last p−r columns. It is worth noting that there is an important difference between F̂^{1:r} and F̂^{(r+1):p}. It is well known that F̂^{1:r} consistently estimates F^0 (for each t) up to some rotation (Bai, 2003), but F̂^{(r+1):p} is just a set of vectors containing noisy information from the idiosyncratic errors. In fact, to the best of our knowledge, the properties of F̂^{(r+1):p} are still hardly discussed in either the factor model or the random matrix theory literature. We will treat F̂^{1:r} and F̂^{(r+1):p} using different techniques. Let F̂_t^{1:r} denote the transpose of the t-th row of F̂^{1:r}. Bai (2003) shows that F̂_t^{1:r} − H'F_t^0 →_p 0, where H is some r×r matrix³ converging to a nonsingular limit H_0. Since the factors are estimated up to the nonsingular rotation matrix H, we rewrite (2.2) as

X_i = F^0 H H^{-1} λ_i^0 + e_i.   (2.8)

Replacing the unobserved F^0 H with the principal component estimates F̂^p, we obtain the transformation

X_i = F̂^{1:r} H^{-1} λ_i^0 + e_i + (F^0 H − F̂^{1:r}) H^{-1} λ_i^0 = F̂^p λ*_i + u_i,   (2.9)

where u_i and λ*_i are defined as

u_i ≡ e_i − (F̂^{1:r} − F^0 H) H^{-1} λ_i^0,   λ*_i ≡ [H^{-1} λ_i^0 ; 0_{(p−r)×1}].   (2.10)

We set the last p−r entries of λ*_i equal to zero because the model has r factors.
Note that F̂^p is² an estimated regressor and that the transformed error term u_i involves two components: the true error term, which is heteroskedastic and serially correlated, and an additional term depending on F̂^{1:r}, F^0 and λ_i^0. Compared with the i.i.d. errors in HHM (2008) or the independent errors in Huang et al. (2009), our error term e_i is much less restricted (see Assumption 3). Also, note that the second term in u_i involves factors estimated via principal components and that (2.6) uses estimated factors instead of the unobserved F^0. These features are new in the shrinkage literature and bring an extra layer of difficulty. Hence, our estimator in large panels is a nontrivial extension of existing methods.

Given the λ*_i defined above, we can obtain the transformed N×p loading matrix Λ* ≡ (λ*_1, ..., λ*_N)' = [Λ^0 H^{-1}' ⋮ 0_{N×(p−r)}]. Thus, the goal is to prove that Λ̂ − Λ* converges to zero. The last p−r columns of Λ̂ should be zero and its first r columns should be nonzero, so that we can consistently determine the number of factors from the number of nonzero columns of Λ̂. Let λ̂_i denote the transpose of the i-th row of Λ̂, the solution to (2.6). The following theorem establishes the convergence rate of λ̂_i.

Theorem 1: Under Assumptions 1-6, ‖λ̂_i − λ*_i‖ = O_p(C_NT^{-1}).

It is noteworthy that this rate is not affected by γ, as long as the divergence rate of γ satisfies Assumption 6. This is an extension of Horowitz, Huang, and Ma's (2008) result to a high dimensional factor model context. Under some appropriate assumptions, HHM (2008) show that the bridge estimator (denoted β̂) has the familiar OLS convergence rate, i.e., ‖β̂ − β_0‖ = O_p(n^{-1/2}), in a single equation model. In this paper, the rate depends on both N and T and is similar to Bai's (2003) result for the OLS estimator of the factor loadings (given the principal components estimates of the factors). Hence, Theorem 1 shows that the group bridge estimator defined in (2.5) is as good as conventional estimators in terms of convergence rate.

² This is why Bai and Ng (2002) define their estimator of the factors as F̂^k V^k, where V^k is a diagonal matrix consisting of the k largest eigenvalues of XX'/(NT) in descending order. Since the k-th (k > r) eigenvalue of XX'/(NT) is O_p(C_NT^{-2}), the last p−r columns of their factor matrix are asymptotically zero and only the first r columns matter. As their criteria focus on the sum of squared errors, this asymptotically rank-deficient design does not affect their results. Since we focus on estimating and penalizing the factor loadings, we use F̂^k, which ensures full column rank, instead of Bai and Ng's (2002) F̂^k V^k as the estimator of F^0.

³ The definition of H is given by Lemma 1 in the Appendix.
The next theorem shows that our estimator sets the last p−r columns of Λ̂ exactly to zero with probability converging to one.

Theorem 2: Under Assumptions 1-6, (i) Λ̂^{(r+1):p} = 0_{N×(p−r)} with probability converging to one as N, T → ∞, where Λ̂^{(r+1):p} denotes the last p−r columns of Λ̂; (ii) P(Λ̂_j = 0_{N×1}) → 0 as N, T → ∞ for any j = 1, 2, ..., r, where Λ̂_j represents the j-th column of Λ̂.

This theorem shows that our estimator achieves selection consistency for the zero elements in Λ*: the first r columns of Λ̂ will be nonzero and the last p−r columns will be zero as N and T

diverge. Hence, it is natural to define the estimator of the number of factors as

ˆr ≡ the number of nonzero columns of Λ̂.   (2.11)

The following corollary establishes the consistency of ˆr; the proof is straightforward given Theorem 2.

Corollary 1: Under Assumptions 1-6, the number of factors in (2.3) can be consistently determined by ˆr.

### 2.3 Computation

In this subsection we show how to implement our method. The initial step is estimating the p factors via PCA in (2.4). Next, we show how to solve the optimization problem in (2.5) and (2.6). As the bridge penalty is not differentiable at zero for 0 < α < 1/2, standard gradient-based methods are not applicable to our estimator. We therefore develop an algorithm to compute the solution of (2.6). Let Λ̂^(m) be the value at the m-th iteration of the algorithm, m = 0, 1, .... Let T denote the convergence tolerance and ν some small positive number; in this paper we set ν = 10^{-4} and T to a small positive value. Set the initial value equal to the OLS solution, Λ̂^(0) = X'F̂^p/T.

For m = 1, 2, ...:

(1) Let Λ̂^{(m),j} denote the j-th column of Λ̂^(m). Compute g_1 = (X − F̂^p Λ̂^{(m)'})'F̂^p/N and

g_2(j, ν) = − αγ Λ̂^{(m),j} / [(Λ̂^{(m),j}'Λ̂^{(m),j})^{1−α} + ν] × (T/C_NT²),

where ν avoids a zero denominator when Λ̂^{(m),j} = 0. Since the value of the tuning parameter will be selected as well, the constant term T/C_NT² (for a given sample) can be dropped in the algorithm. Set g_2(ν) = [g_2(1, ν), g_2(2, ν), ..., g_2(p, ν)].

(2) Define the N×p gradient matrix g = [g^1, g^2, ..., g^p], where the i-th element of the j-th column g^j (j = 1, ..., p) is

g^j_i = g_{1,ij} + g_2(j, ν)_i if |λ^{(m),j}_i| > T,
g^j_i = 0 if |λ^{(m),j}_i| ≤ T,

where g_{1,ij} is the (i, j)-th element of g_1, g_2(j, ν)_i is the i-th element of g_2(j, ν), and λ^{(m),j}_i is the i-th element of Λ̂^{(m),j}.

(3) Let max|g| denote the largest element of g in absolute value. Rescale g = g/max|g| if max|g| > 0; otherwise set g = 0.

(4) Λ̂^{(m+1)} = Λ̂^(m) + Δ g/(1 + m/750), where Δ is the increment for this iteration algorithm.
We set Δ = 10^{-3}, which follows the value used by HHM (2008).
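Steps (1)-(4) can be sketched as follows (a minimal sketch under our reading of the algorithm: the tolerance default, the stopping rule, and the final hard-thresholding that makes frozen loadings exactly zero are our choices, and the constant T/C_NT² is dropped from g_2 as the text suggests):

```python
import numpy as np

def group_bridge_fit(X, F_p, gamma, alpha=0.25, nu=1e-4,
                     tol=1e-2, delta=1e-3, max_iter=2000):
    """Iterative solver for (2.5)-(2.6) following steps (1)-(4).
    tol plays the role of the convergence threshold T in the text,
    nu avoids a zero denominator, and delta = 10^-3 as in the text."""
    T, N = X.shape
    Lam = X.T @ F_p / T                          # step 0: OLS initial value
    for m in range(1, max_iter + 1):
        # step (1): fit gradient and (rescaled) penalty gradient
        g1 = (X - F_p @ Lam.T).T @ F_p / N       # N x p
        col_norm2 = (Lam ** 2).sum(axis=0)       # squared column norms
        g2 = -alpha * gamma * Lam / (col_norm2 ** (1 - alpha) + nu)
        # step (2): zero the gradient for loadings already near zero
        g = np.where(np.abs(Lam) > tol, g1 + g2, 0.0)
        # step (3): rescale by the largest absolute element
        gmax = np.abs(g).max()
        if gmax == 0:
            break                                # all loadings frozen
        # step (4): damped update
        Lam = Lam + delta * (g / gmax) / (1 + m / 750)
    # set frozen (near-zero) loadings exactly to zero
    return np.where(np.abs(Lam) > tol, Lam, 0.0)

def estimate_r(Lam):
    """r_hat = number of nonzero columns of Lambda_hat, as in (2.11)."""
    return int((np.abs(Lam).sum(axis=0) > 0).sum())
```

The sign of g_2 follows from descending on the penalty: reducing a column's norm moves each loading toward zero, so the penalty contribution to the update points against Λ̂^{(m),j}.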

… accelerates our algorithm. We follow the method developed by Wang et al. (2009) to determine the optimal choice of γ. The optimal γ is the one that minimizes the following criterion:

log[(NT)^{-1} Σ_{i=1}^N Σ_{t=1}^T (X_it − λ̂_i'F̂_t^p)²] + ˆr(γ) ((N+T)/(NT)) ln(NT/(N+T)) ln[ln(N)],

where ˆr(γ) denotes the fact that the estimated number of factors is affected by the choice of γ. Note that the number of parameters in our model is proportional to N, so we use ln[ln(N)] in the penalty, which has the same rate as suggested by Wang et al. (2009).

## 3 Simulations

In this section, we explore the finite sample properties of our estimator. We also compare our estimator with some of the estimators in the literature: IC_p proposed by Bai and Ng (2002), IC^{T;n} proposed by Hallin and Liška (2007), ED proposed by Onatski (2010), and ER and GR proposed by Ahn and Horenstein (2013). Given the principal component estimator F̂^k, Bai and Ng (2002) use OLS to estimate the factor loadings by minimizing the sum of squared residuals:

V(k, F̂^k) = min_Λ (NT)^{-1} Σ_{i=1}^N Σ_{t=1}^T (X_it − λ_i^k'F̂_t^k)².

Then the IC_p estimator is defined as the k that minimizes the following criterion:

IC_p(k) = log[V(k, F̂^k)] + k ((N+T)/(NT)) ln(NT/(N+T)).

The criterion can be considered the analog of the conventional BIC in the factor model context. It penalizes the integer k to prevent the model from overfitting. Hallin and Liška's (2007) estimator is designed to determine the number of dynamic factors (usually denoted q in the literature), but it is applicable in our Monte Carlo experiments, where static and dynamic factors coincide by design, i.e., r = q. Hallin and Liška's (2007) estimator follows the idea of Bai and Ng (2002), but their penalty term involves an additional scale parameter c. In theory, the choice of c > 0 does not matter as N, T → ∞.
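The IC_p criterion above is straightforward to compute (a sketch; the loop over k and the function name are ours, and V(k, F̂^k) uses the OLS loadings given the principal component factors):

```python
import numpy as np

def ic_p(X, p_max):
    """Bai-Ng IC_p: for each k, compute V(k, F_hat_k) with OLS loadings
    and add the penalty k * ((N+T)/(NT)) * ln(NT/(N+T)); return the
    minimizing k as the estimated number of factors."""
    T, N = X.shape
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)
    order = np.argsort(eigvals)[::-1]
    crit = []
    for k in range(1, p_max + 1):
        F_k = np.sqrt(T) * eigvecs[:, order[:k]]   # T x k factors
        Lam_k = X.T @ F_k / T                      # N x k OLS loadings
        V = ((X - F_k @ Lam_k.T) ** 2).sum() / (N * T)
        penalty = k * ((N + T) / (N * T)) * np.log(N * T / (N + T))
        crit.append(np.log(V) + penalty)
    return int(np.argmin(crit)) + 1
```

For strong factors and modest noise, the log sum of squared residuals drops sharply until k reaches r and only marginally thereafter, so the penalty picks out r.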
For given N and T, they propose an algorithm to select the scale parameter: c is varied on a predetermined grid ranging from zero to some positive number c̄, and their statistic is computed for each value of c using J nested subsamples (i = 1, ..., N_j; t = 1, ..., T_j) for j = 1, ..., J, where J is a fixed positive integer, 0 < N_1 < ... < N_j < ... < N_J = N and 0 < T_1 < ... < T_j < ... < T_J = T. Hallin and Liška (2007) show that [0, c̄] will contain several intervals that generate estimates not varying across subsamples. They call these stability intervals and suggest that the c belonging to the second stability interval is the optimal one (the first stability interval contains zero). Since the scale parameter in IC^{T;n}


### Chapter 2. Dynamic panel data models

Chapter 2. Dynamic panel data models Master of Science in Economics - University of Geneva Christophe Hurlin, Université d Orléans Université d Orléans April 2010 Introduction De nition We now consider

### Econometric Analysis of Cross Section and Panel Data Second Edition. Jeffrey M. Wooldridge. The MIT Press Cambridge, Massachusetts London, England

Econometric Analysis of Cross Section and Panel Data Second Edition Jeffrey M. Wooldridge The MIT Press Cambridge, Massachusetts London, England Preface Acknowledgments xxi xxix I INTRODUCTION AND BACKGROUND

### The Dirichlet Unit Theorem

Chapter 6 The Dirichlet Unit Theorem As usual, we will be working in the ring B of algebraic integers of a number field L. Two factorizations of an element of B are regarded as essentially the same if

### THE AUSTRALIAN BUSINESS CYCLE: A COINCIDENT INDICATOR APPROACH

THE AUSTRALIAN BUSINESS CYCLE: A COINCIDENT INDICATOR APPROACH Christian Gillitzer, Jonathan Kearns and Anthony Richards Research Discussion Paper 2005-07 October 2005 Economic Group Reserve Bank of Australia

### 4.5 Linear Dependence and Linear Independence

4.5 Linear Dependence and Linear Independence 267 32. {v 1, v 2 }, where v 1, v 2 are collinear vectors in R 3. 33. Prove that if S and S are subsets of a vector space V such that S is a subset of S, then

### Chapter 15 Introduction to Linear Programming

Chapter 15 Introduction to Linear Programming An Introduction to Optimization Spring, 2014 Wei-Ta Chu 1 Brief History of Linear Programming The goal of linear programming is to determine the values of

### Markov Chains, Stochastic Processes, and Advanced Matrix Decomposition

Markov Chains, Stochastic Processes, and Advanced Matrix Decomposition Jack Gilbert Copyright (c) 2014 Jack Gilbert. Permission is granted to copy, distribute and/or modify this document under the terms

### Summary of week 8 (Lectures 22, 23 and 24)

WEEK 8 Summary of week 8 (Lectures 22, 23 and 24) This week we completed our discussion of Chapter 5 of [VST] Recall that if V and W are inner product spaces then a linear map T : V W is called an isometry

### D-optimal plans in observational studies

D-optimal plans in observational studies Constanze Pumplün Stefan Rüping Katharina Morik Claus Weihs October 11, 2005 Abstract This paper investigates the use of Design of Experiments in observational

### Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering

Two Topics in Parametric Integration Applied to Stochastic Simulation in Industrial Engineering Department of Industrial Engineering and Management Sciences Northwestern University September 15th, 2014

### By choosing to view this document, you agree to all provisions of the copyright laws protecting it.

This material is posted here with permission of the IEEE Such permission of the IEEE does not in any way imply IEEE endorsement of any of Helsinki University of Technology's products or services Internal

### Financial TIme Series Analysis: Part II

Department of Mathematics and Statistics, University of Vaasa, Finland January 29 February 13, 2015 Feb 14, 2015 1 Univariate linear stochastic models: further topics Unobserved component model Signal

### 2. Linear regression with multiple regressors

2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measures-of-fit in multiple regression Assumptions

### Econometric Methods fo Panel Data Part II

Econometric Methods fo Panel Data Part II Robert M. Kunst University of Vienna April 2009 1 Tests in panel models Whereas restriction tests within a specific panel model follow the usual principles, based

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Penalized regression: Introduction

Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

### Factorization Theorems

Chapter 7 Factorization Theorems This chapter highlights a few of the many factorization theorems for matrices While some factorization results are relatively direct, others are iterative While some factorization

### Multiple Testing. Joseph P. Romano, Azeem M. Shaikh, and Michael Wolf. Abstract

Multiple Testing Joseph P. Romano, Azeem M. Shaikh, and Michael Wolf Abstract Multiple testing refers to any instance that involves the simultaneous testing of more than one hypothesis. If decisions about

### 2. What are the theoretical and practical consequences of autocorrelation?

Lecture 10 Serial Correlation In this lecture, you will learn the following: 1. What is the nature of autocorrelation? 2. What are the theoretical and practical consequences of autocorrelation? 3. Since

### THE FUNDAMENTAL THEOREM OF ARBITRAGE PRICING

THE FUNDAMENTAL THEOREM OF ARBITRAGE PRICING 1. Introduction The Black-Scholes theory, which is the main subject of this course and its sequel, is based on the Efficient Market Hypothesis, that arbitrages

### MATH 240 Fall, Chapter 1: Linear Equations and Matrices

MATH 240 Fall, 2007 Chapter Summaries for Kolman / Hill, Elementary Linear Algebra, 9th Ed. written by Prof. J. Beachy Sections 1.1 1.5, 2.1 2.3, 4.2 4.9, 3.1 3.5, 5.3 5.5, 6.1 6.3, 6.5, 7.1 7.3 DEFINITIONS

### Maximum Likelihood Estimation of an ARMA(p,q) Model

Maximum Likelihood Estimation of an ARMA(p,q) Model Constantino Hevia The World Bank. DECRG. October 8 This note describes the Matlab function arma_mle.m that computes the maximum likelihood estimates

### MATLAB and Big Data: Illustrative Example

MATLAB and Big Data: Illustrative Example Rick Mansfield Cornell University August 19, 2014 Goals Use a concrete example from my research to: Demonstrate the value of vectorization Introduce key commands/functions

### Clustering in the Linear Model

Short Guides to Microeconometrics Fall 2014 Kurt Schmidheiny Universität Basel Clustering in the Linear Model 2 1 Introduction Clustering in the Linear Model This handout extends the handout on The Multiple

### Example 4.1 (nonlinear pendulum dynamics with friction) Figure 4.1: Pendulum. asin. k, a, and b. We study stability of the origin x

Lecture 4. LaSalle s Invariance Principle We begin with a motivating eample. Eample 4.1 (nonlinear pendulum dynamics with friction) Figure 4.1: Pendulum Dynamics of a pendulum with friction can be written

### Powerful new tools for time series analysis

Christopher F Baum Boston College & DIW August 2007 hristopher F Baum ( Boston College & DIW) NASUG2007 1 / 26 Introduction This presentation discusses two recent developments in time series analysis by

### Time Series Analysis

Time Series Analysis Forecasting with ARIMA models Andrés M. Alonso Carolina García-Martos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and García-Martos (UC3M-UPM)

### Ordinary Least Squares and Poisson Regression Models by Luc Anselin University of Illinois Champaign-Urbana, IL

Appendix C Ordinary Least Squares and Poisson Regression Models by Luc Anselin University of Illinois Champaign-Urbana, IL This note provides a brief description of the statistical background, estimators

### Linear Threshold Units

Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

### The Bootstrap. 1 Introduction. The Bootstrap 2. Short Guides to Microeconometrics Fall Kurt Schmidheiny Unversität Basel

Short Guides to Microeconometrics Fall 2016 The Bootstrap Kurt Schmidheiny Unversität Basel The Bootstrap 2 1a) The asymptotic sampling distribution is very difficult to derive. 1b) The asymptotic sampling

### INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS

INDISTINGUISHABILITY OF ABSOLUTELY CONTINUOUS AND SINGULAR DISTRIBUTIONS STEVEN P. LALLEY AND ANDREW NOBEL Abstract. It is shown that there are no consistent decision rules for the hypothesis testing problem

ONDERZOEKSRAPPORT NR 8904 OPTIMAl PREMIUM CONTROl IN A NON-liFE INSURANCE BUSINESS BY M. VANDEBROEK & J. DHAENE D/1989/2376/5 1 IN A OPTIMAl PREMIUM CONTROl NON-liFE INSURANCE BUSINESS By Martina Vandebroek

### Solving Sets of Equations. 150 B.C.E., 九章算術 Carl Friedrich Gauss,

Solving Sets of Equations 5 B.C.E., 九章算術 Carl Friedrich Gauss, 777-855 Gaussian-Jordan Elimination In Gauss-Jordan elimination, matrix is reduced to diagonal rather than triangular form Row combinations

### ECON20310 LECTURE SYNOPSIS REAL BUSINESS CYCLE

ECON20310 LECTURE SYNOPSIS REAL BUSINESS CYCLE YUAN TIAN This synopsis is designed merely for keep a record of the materials covered in lectures. Please refer to your own lecture notes for all proofs.

### MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

### A Simple Way to Calculate Confidence Intervals for Partially Identified Parameters. By Tiemen M. Woutersen. Draft, January

A Simple Way to Calculate Confidence Intervals for Partially Identified Parameters By Tiemen M. Woutersen Draft, January 2006 Abstract. This note proposes a new way to calculate confidence intervals of

### 2.5 Gaussian Elimination

page 150 150 CHAPTER 2 Matrices and Systems of Linear Equations 37 10 the linear algebra package of Maple, the three elementary 20 23 1 row operations are 12 1 swaprow(a,i,j): permute rows i and j 3 3

### Testing for serial correlation in linear panel-data models

The Stata Journal (2003) 3, Number 2, pp. 168 177 Testing for serial correlation in linear panel-data models David M. Drukker Stata Corporation Abstract. Because serial correlation in linear panel-data

### Notes V General Equilibrium: Positive Theory. 1 Walrasian Equilibrium and Excess Demand

Notes V General Equilibrium: Positive Theory In this lecture we go on considering a general equilibrium model of a private ownership economy. In contrast to the Notes IV, we focus on positive issues such

### NOTES ON LINEAR TRANSFORMATIONS

NOTES ON LINEAR TRANSFORMATIONS Definition 1. Let V and W be vector spaces. A function T : V W is a linear transformation from V to W if the following two properties hold. i T v + v = T v + T v for all

### EC 6310: Advanced Econometric Theory

EC 6310: Advanced Econometric Theory July 2008 Slides for Lecture on Bayesian Computation in the Nonlinear Regression Model Gary Koop, University of Strathclyde 1 Summary Readings: Chapter 5 of textbook.

### Subsets of Euclidean domains possessing a unique division algorithm

Subsets of Euclidean domains possessing a unique division algorithm Andrew D. Lewis 2009/03/16 Abstract Subsets of a Euclidean domain are characterised with the following objectives: (1) ensuring uniqueness

### p Values and Alternative Boundaries

p Values and Alternative Boundaries for CUSUM Tests Achim Zeileis Working Paper No. 78 December 2000 December 2000 SFB Adaptive Information Systems and Modelling in Economics and Management Science Vienna

### Communication on the Grassmann Manifold: A Geometric Approach to the Noncoherent Multiple-Antenna Channel

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 48, NO. 2, FEBRUARY 2002 359 Communication on the Grassmann Manifold: A Geometric Approach to the Noncoherent Multiple-Antenna Channel Lizhong Zheng, Student

### Inequality, Mobility and Income Distribution Comparisons

Fiscal Studies (1997) vol. 18, no. 3, pp. 93 30 Inequality, Mobility and Income Distribution Comparisons JOHN CREEDY * Abstract his paper examines the relationship between the cross-sectional and lifetime

### Statistical machine learning, high dimension and big data

Statistical machine learning, high dimension and big data S. Gaïffas 1 14 mars 2014 1 CMAP - Ecole Polytechnique Agenda for today Divide and Conquer principle for collaborative filtering Graphical modelling,

### Macroeconomic Shocks and the Business Cycle: Evidence from a Structural Factor Model

Macroeconomic Shocks and the Business Cycle: Evidence from a Structural Factor Model Mario Forni Università di Modena e Reggio Emilia CEPR and RECent Luca Gambetti Universitat Autonoma de Barcelona and

### ELEC-E8104 Stochastics models and estimation, Lecture 3b: Linear Estimation in Static Systems

Stochastics models and estimation, Lecture 3b: Linear Estimation in Static Systems Minimum Mean Square Error (MMSE) MMSE estimation of Gaussian random vectors Linear MMSE estimator for arbitrarily distributed

### FULLY MODIFIED OLS FOR HETEROGENEOUS COINTEGRATED PANELS

FULLY MODIFIED OLS FOR HEEROGENEOUS COINEGRAED PANELS Peter Pedroni ABSRAC his chapter uses fully modified OLS principles to develop new methods for estimating and testing hypotheses for cointegrating

### Forecasting Retail Credit Market Conditions

Forecasting Retail Credit Market Conditions Eric McVittie Experian Experian and the marks used herein are service marks or registered trademarks of Experian Limited. Other products and company names mentioned

### FIXED EFFECTS AND RELATED ESTIMATORS FOR CORRELATED RANDOM COEFFICIENT AND TREATMENT EFFECT PANEL DATA MODELS

FIXED EFFECTS AND RELATED ESTIMATORS FOR CORRELATED RANDOM COEFFICIENT AND TREATMENT EFFECT PANEL DATA MODELS Jeffrey M. Wooldridge Department of Economics Michigan State University East Lansing, MI 48824-1038