Estimating an ARMA Process
Statistics 910, #12

Overview

1. Main ideas
2. Fitting autoregressions
3. Fitting with moving average components
4. Standard errors
5. Examples
6. Appendix: Simple estimators for autoregressions

Main ideas

Efficiency. Maximum likelihood is nice, if you know the right distribution. For time series, it's more motivation for least squares. If the data are Gaussian, then ML is efficient; if the data are not Gaussian, then ML amounts to complicated weighted least squares.

The Gaussian log-likelihood for the stationary process $\{X_t\}$ that generates $X = (X_1, \ldots, X_n)'$ is (minus twice the log of the likelihood)

  $-2\,\ell(\mu, \phi, \theta, \sigma^2) = n \log 2\pi + \log|\Gamma_n| + (X - \mu)'\,\Gamma_n^{-1}\,(X - \mu).$   (1)

Think of the covariances $\gamma(k)$ as functions of the parameters of the process,

  $\gamma(k) = \sigma^2\, g(k; \phi, \theta).$   (2)

Finding the maximum likelihood estimates of $\mu$, $\phi$, and $\theta$ for an ARMA(p, q) process is then simply a numerical minimization of the negative log-likelihood. All you need to do is express the covariances in (1) as functions of the unknown parameters. For example, for the AR(1) process $X_t = \phi_1 X_{t-1} + w_t$ with $\mu = 0$ (given), $\gamma(0) = \sigma^2/(1 - \phi_1^2)$ and $\gamma(h) = \phi_1^{|h|}\,\gamma(0)$.
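As a concrete illustration of this direct approach, here is a minimal R sketch (not the course script) that builds $\Gamma_n$ for a mean-zero AR(1) from $\gamma(h) = \sigma^2\phi^{|h|}/(1-\phi^2)$ and minimizes (1) numerically; the simulated series, starting values, and sample size are illustrative only.

```r
## Sketch: evaluate -2 log-likelihood (1) for a mean-zero AR(1) and minimize it.
set.seed(1)
x <- as.numeric(arima.sim(model = list(ar = 0.6), n = 200))
n <- length(x)

neg2loglik <- function(par) {               # par = (phi, log sigma^2)
  phi <- par[1]; sigma2 <- exp(par[2])
  if (abs(phi) >= 1) return(Inf)            # stay inside the stationary region
  gamma <- sigma2 * phi^(0:(n - 1)) / (1 - phi^2)   # gamma(0), ..., gamma(n-1)
  Gam <- toeplitz(gamma)                    # Gamma_n
  n * log(2 * pi) + as.numeric(determinant(Gam)$modulus) +
    drop(t(x) %*% solve(Gam, x))
}

fit <- optim(c(0.5, 0), neg2loglik)         # numerical minimization of (1)
c(phi = fit$par[1], sigma2 = exp(fit$par[2]))
```

Each evaluation inverts an $n \times n$ matrix, so this brute-force version only makes sense for short series; the recursions described next avoid that cost.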
Recursion. The models we consider are causal, with time flowing in one direction. Hence, it is useful to decompose the joint distribution of $X$ in the log-likelihood (1) as a sequence of one-sided conditional distributions:

  $f(x_1, \ldots, x_n) = f(x_1)\, f(x_2 \mid x_1)\, f(x_3 \mid x_1, x_2) \cdots f(x_n \mid x_1, \ldots, x_{n-1}).$

MLEs for AR(1). It's useful to solve for the MLE in closed form for the simplest of models. The factored log-likelihood simplifies for a Gaussian AR(1) process:

  $\ell(\phi, \sigma^2) = \log f(x_1)\, f(x_2 \mid x_1) \cdots f(x_n \mid x_{n-1})$
  $\qquad = -\tfrac{n}{2}\log(2\pi\sigma^2) + \tfrac{1}{2}\log(1 - \phi^2) - \underbrace{\Big[\, x_1^2(1 - \phi^2) + \textstyle\sum_{t=2}^n (x_t - \phi x_{t-1})^2 \,\Big]}_{SS} \big/ 2\sigma^2,$

where $f(x_1)$ is the $N\!\big(0, \sigma^2/(1 - \phi^2)\big)$ density and $SS$ denotes the sum of squares from the exponential term in the likelihood. The derivatives that give the MLEs for $\sigma^2$ and $\phi$ are

  $\dfrac{\partial \ell}{\partial \sigma^2} = -\dfrac{n}{2\sigma^2} + \dfrac{SS}{2\sigma^4}$

and

  $\dfrac{\partial \ell}{\partial \phi} = \dfrac{\phi\, x_1^2 + \sum_{t=2}^n (x_t - \phi x_{t-1})\, x_{t-1}}{\sigma^2} - \dfrac{\phi}{1 - \phi^2}.$

Setting these to zero and solving gives

  $\hat\sigma^2 = SS/n \qquad \text{and} \qquad \hat\phi\left(\dfrac{\hat\sigma^2}{1 - \hat\phi^2} - x_1^2\right) = \sum_{t=2}^n (x_t - \hat\phi x_{t-1})\, x_{t-1}.$

Since the left-hand side of the second expression has approximate expected value zero (note the expression for the variance of $X_t$), the MLE can be seen to be quite similar to the usual LS estimator.
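A sketch of these closed-form ideas in R: profile out $\sigma^2$ (since $\hat\sigma^2 = SS/n$ for any $\phi$), maximize the exact AR(1) log-likelihood over $\phi$, and compare with the conditional least squares estimate. Simulated data, for illustration only.

```r
## Sketch: exact AR(1) MLE via the profile log-likelihood vs. conditional LS.
set.seed(2)
x <- as.numeric(arima.sim(model = list(ar = 0.8), n = 300))
n <- length(x)

profile.loglik <- function(phi) {
  SS <- x[1]^2 * (1 - phi^2) + sum((x[-1] - phi * x[-n])^2)
  sigma2 <- SS / n                               # MLE of sigma^2 given phi
  -(n / 2) * log(2 * pi * sigma2) + 0.5 * log(1 - phi^2) - SS / (2 * sigma2)
}

phi.mle <- optimize(profile.loglik, c(-0.99, 0.99), maximum = TRUE)$maximum
phi.ls  <- sum(x[-1] * x[-n]) / sum(x[-n]^2)     # conditional LS, no intercept
c(MLE = phi.mle, LS = phi.ls)
```

The two estimates differ only through the treatment of $x_1$, so for moderate $n$ they are nearly identical.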
Initial values. Again thinking recursively, the likelihood looks a lot like that in the normal linear model if we only knew enough to get started. If we condition on $X_1, X_2, \ldots, X_p$ and $w_1, w_2, \ldots, w_q$ (i.e., assume we know these), then

  $-2 \log f(x_{p+1}, \ldots, x_n \mid x_1^p, w_1^q) = c + (n - p)\log\sigma^2 + \dfrac{1}{\sigma^2}\sum_{t=p+1}^{n} w_t^2, \qquad w_t = x_t - \phi_1 x_{t-1} - \cdots - \phi_p x_{t-p} - \theta_1 w_{t-1} - \cdots - \theta_q w_{t-q},$

by the change of variables from $x_t$ to $w_t$, as in the normal linear model. This expression is amenable to least squares. This approach is called conditional least squares in R.

What does this leave out? Conditioning on the initial values does not leave out too much, really. We still

- need values for $w_1, \ldots, w_q$, and
- would like to gain further information from $X_1, \ldots, X_p$.

Autoregressions

Why start with autoregressions? Several reasons:

- These often fit quite well (we don't need the MA terms) because we know that we can approximate $X_t = \sum_j \pi_j X_{t-j} + w_t$.
- This Markovian approach has become more popular because of its speed. Estimation is fast (MLEs require some iteration).
- We avoid estimation of the initial $w_t$'s.
- Sampling properties are well known (essentially those of the normal linear model with stochastic explanatory variables).

Backwards. Even if we don't want the AR model itself, autoregressions are often used to estimate the initial errors $w_1, w_2, \ldots, w_q$. By fitting an autoregression backwards in time, we can use the fit to estimate, say, $\hat w_t^{(m)} = X_t - \sum_{j=1}^m \hat\pi_j X_{t+j}$ (if we assume normality, the process is reversible).
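A quick sketch of the distinction in R: method = "CSS" in arima() is the conditional (least squares) fit described above, while method = "ML" uses the full likelihood. The ARMA(1,1) series here is simulated purely for illustration.

```r
## Sketch: conditional least squares vs. full ML in R's arima().
set.seed(3)
x <- arima.sim(model = list(ar = 0.7, ma = 0.4), n = 500)

fit.css <- arima(x, order = c(1, 0, 1), include.mean = FALSE, method = "CSS")
fit.ml  <- arima(x, order = c(1, 0, 1), include.mean = FALSE, method = "ML")
rbind(CSS = coef(fit.css), ML = coef(fit.ml))
```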
MLE for autoregression. In the AR(p) case,

  $f(x_1, \ldots, x_n) = \underbrace{f(x_1, x_2, \ldots, x_p)}_{\text{messy}}\; \underbrace{f(x_{p+1} \mid x_1^p) \cdots f(x_n \mid x_{n-p}^{n-1})}_{\text{simple}} = f(x_1, x_2, \ldots, x_p)\; \dfrac{e^{-\frac{1}{2}\sum_{p+1}^n w_t^2/\sigma^2}}{(2\pi\sigma^2)^{(n-p)/2}},$

where $w_t = (x_t - \mu) - \sum_j \phi_j (x_{t-j} - \mu)$ for $t = p+1, \ldots, n$. Except for the initial terms (the messy part), we have the same likelihood as in the normal linear model, and the MLEs are those that we would get from the least squares regression of $x_t$ on $x_{t-1}, \ldots, x_{t-p}$ and a constant. (If we call the constant in that regression $\beta_0$, then $\hat\mu = \hat\beta_0/(1 - \hat\phi_1 - \cdots - \hat\phi_p)$.) Except for ignoring the contribution from $x_1, \ldots, x_p$, least squares matches maximum likelihood in the AR(p) case. Hence, maximum likelihood cannot improve the estimates much unless p is large relative to n.

Recursion = triangular factorization. A recursion captures the full likelihood. For an AR(p) model with coefficients $\phi_p = (\phi_{p1}, \phi_{p2}, \ldots, \phi_{pp})$, express the lower-order coefficients as functions of $\phi_p$ (e.g., find $\gamma(0)$ and $\phi_{11} = \mathrm{Corr}(X_t, X_{t-1})$ in terms of $\phi_p$). If we can do that, it is simple to model $f(x_1, \ldots, x_p) = f(x_1)\, f(x_2 \mid x_1) \cdots f(x_p \mid x_1^{p-1})$. The prior AR(1) example shows this.

In general, use the Levinson recursion to obtain a triangular decomposition of the covariance matrix $\Gamma_n$. This is done by converting the correlated variables $X_1, \ldots, X_n$ into a collection, say $U_1, U_2, \ldots, U_n$, of uncorrelated variables. There are many ways of doing this, such as Gram-Schmidt or the Cholesky factorization. In the following, let $P_{1\cdots j}$ denote the projection onto the random variables $X_1, \ldots, X_j$ (as in fitting a regression). Following the Cholesky factorization, construct

  $U_1 = X_1$
  $U_2 = X_2 - P_1 X_2 = X_2 - \phi_{1,1} X_1$
  $U_3 = X_3 - P_{12} X_3 = X_3 - \phi_{2,2} X_1 - \phi_{2,1} X_2$
  $U_4 = X_4 - P_{123} X_4 = X_4 - \phi_{3,3} X_1 - \phi_{3,2} X_2 - \phi_{3,1} X_3$
  $\vdots$
  $U_j = X_j - \sum_{k=1}^{j-1} \phi_{j-1,k}\, X_{j-k}.$

(This sequence of projections differs from those used in the numerically superior modified Gram-Schmidt method. GS sweeps $X_1$ from all of the others first, filling the first column of $L$ rather than building it recursively one row at a time.)

Let $L$ denote the lower triangular matrix that begins

  $L = \begin{pmatrix} 1 & 0 & 0 & 0 & \cdots \\ -\phi_{11} & 1 & 0 & 0 & \cdots \\ -\phi_{22} & -\phi_{21} & 1 & 0 & \cdots \\ -\phi_{33} & -\phi_{32} & -\phi_{31} & 1 & \cdots \\ \vdots & & & & \ddots \end{pmatrix},$

with diagonal elements $L_{kk} = 1$ and off-diagonal elements $L_{kj} = -\phi_{k-1,k-j}$, $j = 1, \ldots, k-1$. Since the $U_j$ are uncorrelated, we have

  $D_n = \mathrm{Var}(U) = \mathrm{Var}(LX) = L\,\Gamma_n\,L'$   (3)
  $\Gamma_n^{-1} = L'\,D_n^{-1}\,L,$   (4)

where $D_n$ is the diagonal matrix with the conditional variances $\sigma_k^2 = \mathrm{Var}\big(X_t - \sum_{j=1}^{k} \phi_{kj} X_{t-j}\big)$ along the diagonal.

Comments. It follows that $\sigma^2 = \lim_{n\to\infty} |\Gamma_{n+1}|/|\Gamma_n|$. (Where have we seen this ratio of determinants before? Toeplitz matrices, Szegö's theorem, and Levinson's recursion itself.) If the process is indeed AR(p), the lower triangular matrix $L$ is banded, with p subdiagonal stripes: the element $L_{kj} = 0$ for $j < k - p$.
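The construction can be checked numerically. The following sketch runs the Durbin-Levinson recursion on the autocorrelations of an AR(2) (the coefficients and dimension are arbitrary), builds $L$ row by row, and verifies that $L\,\Gamma_n\,L'$ is diagonal and that $L$ is banded below lag p.

```r
## Sketch: triangular factorization of Gamma_n for an AR(2) via Durbin-Levinson.
phi <- c(0.5, 0.3); n <- 8
rho <- ARMAacf(ar = phi, lag.max = n - 1)         # rho(0), ..., rho(n-1)
Gam <- toeplitz(rho)                              # Gamma_n up to the scale gamma(0)

L <- diag(n)
a <- rho[2]                                       # phi_{1,1} = rho(1)
v <- 1 - rho[2]^2                                 # prediction error variance / gamma(0)
L[2, 1] <- -a
for (k in 2:(n - 1)) {                            # Durbin-Levinson step k
  kk <- (rho[k + 1] - sum(a * rho[k:2])) / v      # partial autocorrelation phi_{kk}
  a  <- c(a - kk * rev(a), kk)                    # phi_{k,1}, ..., phi_{k,k}
  v  <- v * (1 - kk^2)
  L[k + 1, 1:k] <- -rev(a)                        # row k+1: -phi_{k,k}, ..., -phi_{k,1}
}
round(L %*% Gam %*% t(L), 10)                     # diagonal: the conditional variances
round(L, 3)                                       # rows are banded beyond lag p = 2
```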
If not an AR model? Since $w_t$ is a linear combination of $X_s$, $s = 1, \ldots, t$, the truncated sum $\tilde w_t = X_t - \sum_{j=1}^{t-1} \pi_j X_{t-j}$ is a simple way of computing/representing $w_t = X_t - \sum_{j \ge 1} \pi_j X_{t-j}$. The terms become more accurate ($\tilde w_t$ approaches $w_t$) as $t$ increases, since $w_t - \tilde w_t = \sum_{j \ge t} \pi_j X_{t-j}$ is of order $\gamma(0)\,c^{\,t}$ for some $c < 1$ (the $\pi_j$ are eventually a sum of geometric series). We'll do an example of this in R; see the accompanying script.

ARMA via regression. Given estimates of $w_t$, fit the regression model using the estimates of the error process as covariates. That is, regress $X_t$ on its p lags and q lags of the estimated errors,

  $X_t = \phi_1 X_{t-1} + \cdots + \phi_p X_{t-p} + \theta_1 \tilde w_{t-1} + \cdots + \theta_q \tilde w_{t-q} + w_t.$

This method works rather well, and it opens up all of the tools of regression diagnostics for use in time series analysis. The method dates back to J. Durbin (1960), "The fitting of time series models" (Intl. Statist. Review), and was exploited in a model selection context by Hannan and Kavalieris (1984) (Biometrika 71).

Standard errors. Another insight from this approach is that it suggests the properties of estimators of $\phi$ and $\theta$. Estimates of $\phi$ (ignoring the MA components) resemble those from the usual regression, only with lags. Hence, for an AR(p) model, it follows that $\mathrm{Var}(\hat\phi) \approx \sigma^2\, \hat\Gamma_p^{-1}/n$.

Maximum likelihood ARMA(p, q) models. As noted earlier, the MLE is easy to compute if we condition on prior observations $X_0, \ldots, X_{-p+1}$ and prior errors $w_0, \ldots, w_{-q+1}$, for then the likelihood can be written in terms of $w_1, \ldots, w_n$, which are iid Gaussian. Estimation then reduces to minimizing $\sum_{t=1}^n w_t^2$.
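A sketch of Durbin's regression idea for an ARMA(1,1): first approximate the errors with the residuals of a long autoregression, then regress $X_t$ on its lag and the lagged residual with lm(). The simulated data and the AR order m = 20 are illustrative choices, not part of the original method.

```r
## Sketch: two-stage (Durbin) regression estimates of an ARMA(1,1).
set.seed(4)
x <- as.numeric(arima.sim(model = list(ar = 0.7, ma = 0.4), n = 1000))
n <- length(x)

m <- 20                                           # order of the long autoregression
w.tilde <- ar(x, aic = FALSE, order.max = m, method = "ols")$resid   # estimated errors

d <- data.frame(y = x[-1], xlag = x[-n], wlag = w.tilde[-n])
fit <- lm(y ~ xlag + wlag - 1, data = d)          # no intercept since mu = 0 here
coef(fit)                                         # approximate (phi, theta)
```

The usual lm() diagnostics (residual plots, influence measures) then apply directly, which is part of the appeal of the method.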
Without this conditioning, the resulting estimates of $w_1, w_2, \ldots, w_n$ are functions of the parameters $\phi$ and $\theta$; we can choose the parameters that minimize this sum of squares, giving a nonlinear least squares problem.

Approximate noise terms. Assume for this section that $\mu = 0$ so that we can focus on the estimates of $\phi$ and $\theta$. Let $\beta = (\phi_1, \ldots, \phi_p, \theta_1, \ldots, \theta_q)$ denote the vector of unknown coefficients. Consider setting the prior unknown $X_j$ and $w_j$ for $j \le 0$ to their marginal expected values (0) and define

  $w_1(\beta) = X_1,$
  $w_2(\beta) = X_2 - \phi_1 X_1 - \theta_1 w_1,$
  $w_3(\beta) = X_3 - \phi_1 X_2 - \phi_2 X_1 - \theta_1 w_2 - \theta_2 w_1,$
  $\vdots$
  $w_n(\beta) = X_n - \sum_j \phi_j X_{n-j} - \sum_j \theta_j w_{n-j}.$

Backcasting. In general, algorithms start like this and then find estimates of $x_t$ and $w_t$ for $t \le 0$ (the method is known as "backcasting"). We will avoid those methods and consider (coming lecture) methods based on the Kalman filter (used in R). We will continue this discussion, though, to explore how the iterative estimation proceeds and to discover the form of the asymptotic variance matrix of the estimates.

Nonlinear least squares. The nonlinear sum of squares is approximately a least squares problem,

  $\sum_{t=1}^n w_t(\beta)^2 \approx \sum_{t=1}^n \big(w_t(\beta_0) - D_t'(\beta - \beta_0)\big)^2,$   (5)

where the derivative terms are $D_{tj} = -\partial w_t/\partial \beta_j$. The negative sign is chosen to make the sum (5) look like a regression sum-of-squares $\sum_i (y_i - x_i'\beta)^2$, with $D_t$ playing the role of $x_i$ and $w_t(\beta_0)$ as $y_i$. The estimate of $\beta$ is thus

  $\hat\beta = \beta_0 + (D'D)^{-1} D'\, w(\beta_0).$
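A sketch of the recursion above in R: compute $w_t(\beta)$ with the pre-sample $X$ and $w$ set to zero, and minimize the sum of squares with a general-purpose optimizer. The ARMA(1,1) data and starting values are illustrative.

```r
## Sketch: conditional sum of squares for an ARMA(1,1), minimized with optim().
set.seed(5)
x <- as.numeric(arima.sim(model = list(ar = 0.7, ma = 0.4), n = 500))

css <- function(beta) {                    # beta = (phi, theta)
  phi <- beta[1]; theta <- beta[2]
  w <- numeric(length(x))
  w[1] <- x[1]                             # X_0 = w_0 = 0
  for (t in 2:length(x))
    w[t] <- x[t] - phi * x[t - 1] - theta * w[t - 1]
  sum(w^2)
}

fit <- optim(c(0, 0), css)                 # nonlinear least squares over (phi, theta)
fit$par                                    # compare: arima(x, c(1, 0, 1), method = "CSS")
```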
Continuing the analogy to least squares, the variance matrix of $\hat\beta$ is then (approximating $\tilde w_t \approx w_t$)

  $\mathrm{Var}(\hat\beta) \approx \sigma^2 (D'D)^{-1}.$

Derivatives. The derivatives simplify once the estimates are beyond the direct presence of the initial conditions where we filled in zeros. For $t > \max(p, q)$,

  $D_{tj} = -\dfrac{\partial w_t}{\partial \phi_j} = X_{t-j} - \sum_{k=1}^{q} \theta_k D_{t-k,j}, \qquad j = 1, \ldots, p,$

and, for the moving average terms (use the product rule for derivatives),

  $D_{t,p+j} = -\dfrac{\partial w_t}{\partial \theta_j} = w_{t-j} - \sum_{k=1}^{q} \theta_k D_{t-k,p+j}, \qquad j = 1, \ldots, q.$

That is, $\theta(B) D_{t,j} = X_{t-j}$ for the AR terms and $\theta(B) D_{t,p+j} = w_{t-j}$ for the MA terms. The derivatives thus behave as two AR processes, for which (see Property 3.9)

  $D_{t,j} \approx B^j u_t, \qquad u_t = \dfrac{1}{\theta(B)} X_t = \dfrac{1}{\phi(B)} w_t,$

and

  $D_{t,p+j} \approx B^j v_t, \qquad v_t = \dfrac{1}{\theta(B)} w_t.$

Hence, the matrix $D'D/n$ is approximately the variance-covariance matrix of two distinct AR processes, both with errors $w_t$.

Example: ARMA(1,1) process. The variance/covariance matrix of the estimates is (page 135 in the 2nd edition, 134 in the 3rd)

  $n\,\mathrm{Var}(\hat\phi, \hat\theta) \approx \sigma^2 \begin{pmatrix} E\,u_t^2 & E\,u_t v_t \\ E\,u_t v_t & E\,v_t^2 \end{pmatrix}^{-1} = \begin{pmatrix} 1/(1-\phi^2) & 1/(1+\phi\theta) \\ 1/(1+\phi\theta) & 1/(1-\theta^2) \end{pmatrix}^{-1}.$

A related issue at this point is whether it is better to use this expression for the expected Fisher information, or to use the observed matrix of second derivatives (which is available within the actual estimation routine as $D'D$). There is a long literature comparing the use of expected versus observed Fisher information.
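A sketch of the ARMA(1,1) formula above in R, compared with the standard errors reported by arima() on a simulated series; the values of $\phi$, $\theta$, and n are illustrative.

```r
## Sketch: asymptotic standard errors for an ARMA(1,1) from the matrix above.
phi <- 0.7; theta <- 0.4; n <- 500
V <- solve(matrix(c(1 / (1 - phi^2),       1 / (1 + phi * theta),
                    1 / (1 + phi * theta), 1 / (1 - theta^2)), 2, 2)) / n
sqrt(diag(V))                              # asymptotic se(phi.hat), se(theta.hat)

set.seed(6)
x <- arima.sim(model = list(ar = phi, ma = theta), n = n)
fit <- arima(x, order = c(1, 0, 1), include.mean = FALSE)
sqrt(diag(fit$var.coef))                   # standard errors from the fitted model
```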
Asymptotic distributions

Information matrix. MLEs in regular problems are asymptotically normal and unbiased (to this order of approximation), with variance determined by the information matrix: the asymptotic variance is the inverse of the matrix of expected second derivatives of the negative log-likelihood,

  $\hat\theta \approx \mathrm{AN}\big(\theta,\, I_n\big), \qquad I_n = \Big(E\Big[-\dfrac{\partial^2 \ell}{\partial\theta\,\partial\theta'}\Big]\Big)^{-1}.$

For the AR(1) example, we have

  $E\Big(-\dfrac{\partial^2 \ell}{\partial\phi^2}\Big) = E\Big(\dfrac{\sum_{t=2}^n X_{t-1}^2 - X_1^2}{\sigma^2}\Big) + \dfrac{1+\phi^2}{(1-\phi^2)^2} \approx \dfrac{n}{1-\phi^2},$

as we obtained for least squares.

Independence of $\sigma^2$ and $\phi$. The mixed partial is

  $E\Big(\dfrac{\partial^2 \ell}{\partial\sigma^2\,\partial\phi}\Big) = E\Big(\dfrac{\partial SS/\partial\phi}{2\sigma^4}\Big) = -\,E\Big(\dfrac{\phi X_1^2 + \sum_{t=2}^n w_t X_{t-1}}{\sigma^4}\Big) \approx 0,$

since $E(w_t X_{t-1}) = 0$ (the remaining term is $O(1)$), so the estimated coefficients and the error variance estimate are asymptotically independent.

Invariance. The variance of $\hat\phi$ or $\hat\theta$ does not depend on $\sigma^2$: the accuracy of the estimate of $\phi$ depends only upon the correlations of the process, not the level of the noise variance.

Examples

Examples using US unemployment are in the file macro.r on the text web site.

Alternative conditions. These results are utopian in the sense that they hold when all of the assumptions (large n, normality, known orders, ...) hold. Simulation makes it easy to explore what happens when these assumptions fail. Interesting examples that you can explore (see the simulation sketch below) are:

- Skewness in the distribution of the AR(1) estimator.
- Effect of outliers in the $w_t$ process.
- Boundary effects as $\phi_1$ nears 1.
- Effect of model specification errors.

Effects on prediction. What are the optimal predictors once we replace $\phi$ and $\theta$ by estimates, particularly when the fitted model may not match the p and q of the underlying data-generating mechanism? (As if the underlying process were an ARMA process!)
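Here is a small simulation sketch for the first and third items in the list above: near the boundary the sampling distribution of the AR(1) least squares estimate is noticeably skewed and biased downward. The settings (n = 100, $\phi$ = 0.95, 2000 replications) are arbitrary.

```r
## Sketch: sampling distribution of the AR(1) LS estimator near the boundary.
set.seed(7)
phi <- 0.95; n <- 100
phi.hat <- replicate(2000, {
  x <- as.numeric(arima.sim(model = list(ar = phi), n = n))
  sum(x[-1] * x[-n]) / sum(x[-n]^2)        # LS estimate, no intercept
})
hist(phi.hat, breaks = 40, main = "LS estimates of phi (n = 100, phi = 0.95)")
c(mean = mean(phi.hat), sd = sd(phi.hat))  # note the downward bias
```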
Appendix: Estimators for Autoregressions

Introduction. In applications, easily fitting autoregressions is important for obtaining initial values of parameters and for getting estimates of the error process. Least squares is a popular choice, as is the Yule-Walker procedure. Unlike the YW procedure, LS can produce a nonstationary fitted model. Both the YW and LS estimators are noniterative and consistent, so they can be used as starting values for iterative methods like ML.

Yule-Walker estimators are basically moment estimators, and so have the consistency and asymptotic normality associated with smooth functions of moment estimators (Slutsky's theorem). For an autoregression of order p, the Yule-Walker estimator solves the system

  $\hat\Gamma_p\, \hat\phi_p = \hat\gamma_p.$

The Yule-Walker estimator has two important properties not shared by other estimators for autoregressions:

1. Estimates for a sequence of models of order p = 1, 2, ..., n-1 are easily computed recursively using the Levinson algorithm applied to the estimated covariance matrix $\hat\Gamma_n$.
2. The associated estimates define stationary processes; that is, the zeros of $\hat\phi(z)$ must lie outside the unit circle. The stationarity follows from observing that $\hat\Gamma_p$ is positive definite.

Asymptotics. The result is essentially that which obtains in the corresponding regression equation. Assuming that the process is indeed AR(p), we have the same asymptotic distribution for all three of the estimators:

  $\sqrt{n}\,(\hat\phi - \phi) \sim \mathrm{AN}(0,\, \sigma^2\,\Gamma_p^{-1}).$

A feature useful for identification of p is embedded here. If we overfit a model of order m > p, the fitted model is still correct (though not efficient), with $\phi_j = 0$ for $j = p+1, \ldots, m$. Consequently, the overfit coefficients have mean zero with variance that is O(1/n).
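A sketch of the Yule-Walker system in R: estimate the autocovariances, solve $\hat\Gamma_p \hat\phi_p = \hat\gamma_p$ directly, and compare with ar.yw(). The AR(2) data are simulated for illustration.

```r
## Sketch: Yule-Walker estimates by hand vs. ar.yw().
set.seed(8)
x <- as.numeric(arima.sim(model = list(ar = c(0.5, 0.3)), n = 400))
p <- 2

g <- acf(x, lag.max = p, type = "covariance", plot = FALSE)$acf[, 1, 1]  # gamma.hat(0..p)
phi.yw <- solve(toeplitz(g[1:p]), g[2:(p + 1)])                          # solve Gamma.hat phi = gamma.hat
rbind(by.hand = phi.yw, ar.yw = ar.yw(x, aic = FALSE, order.max = p)$ar)
```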
Similar properties. The methods for deriving properties of the YW and LS estimators are similar. The LS estimator is considered here, and properties of the YW estimator follow similarly. The two types of estimators are asymptotically equivalent (though rather different in reasonably sized applications).

Estimating equations. Write the normal equations for the LS estimator as

  $\sqrt{n}\,(\hat\phi - \phi) = (X'X/n)^{-1}\, (X'w)/\sqrt{n},$

where the $(n-p) \times p$ matrix $X$ and the vector $w$ are defined as

  $X = (X_p, \ldots, X_{n-1})', \qquad X_t = (X_t, \ldots, X_{t-p+1})', \qquad w = (w_{p+1}, \ldots, w_n)'.$

The cross-product matrix is nonsingular because of the a.s. convergence $X'X/n \xrightarrow{a.s.} \Gamma_p$ and the nonsingularity of $\Gamma_p$. Asymptotically, then,

  $\sqrt{n}\,(\hat\phi - \phi) \approx \Gamma_p^{-1}\, (X'w)/\sqrt{n}.$

Return of the sandwich. To make life easier, assume that the $w_t$ are independent. It follows that

  $\sqrt{n}\,(\hat\phi - \phi) \sim \mathrm{AN}\big(0,\; \Gamma_p^{-1}\, \mathrm{Var}(X'w/\sqrt{n})\, \Gamma_p^{-1}\big),$

so we are left to deal with $X'w/\sqrt{n}$. Consider the jth element of this vector (not related to the terms in Levinson's method considered earlier),

  $U_j = \sum_{t=p+1}^n w_t X_{t-j}/\sqrt{n}.$

From the assumed independence, the variances and covariances of the $U_j$ are

  $n\,\mathrm{Cov}(U_j, U_k) = \sum_{s,t=p+1}^n E\big(w_t X_{t-j}\, w_s X_{s-k}\big) = (n-p)\,\sigma^2\,\gamma(j-k) \approx n\,\sigma^2\,\gamma(j-k),$

or, in matrix notation, $\mathrm{Var}(U) \approx \sigma^2\,\Gamma_p$. Hence, the asymptotic distribution is

  $\sqrt{n}\,(\hat\phi - \phi) \sim \mathrm{AN}(0,\, \sigma^2\,\Gamma_p^{-1}),$

as one would anticipate from the results for regression ($\sigma^2 (X'X)^{-1}$).
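A sketch checking this asymptotic variance for the LS estimator of an AR(2) by simulation; all settings (coefficients, n, number of replications) are illustrative.

```r
## Sketch: empirical vs. asymptotic variance of sqrt(n)(phi.hat - phi) for an AR(2).
set.seed(9)
phi <- c(0.5, 0.3); sigma2 <- 1; n <- 400; p <- 2

rho <- ARMAacf(ar = phi, lag.max = p)                  # rho(0), ..., rho(p)
gamma0 <- sigma2 / (1 - sum(phi * rho[2:(p + 1)]))     # gamma(0) for the AR(p)
V.asym <- sigma2 * solve(toeplitz(gamma0 * rho[1:p]))  # sigma^2 Gamma_p^{-1}

est <- replicate(1000, {
  x <- as.numeric(arima.sim(model = list(ar = phi), n = n))
  X <- embed(x, p + 1)                                 # columns: x_t, x_{t-1}, x_{t-2}
  lm.fit(X[, -1], X[, 1])$coefficients                 # LS regression, no intercept
})
round(V.asym, 3)
round(n * cov(t(est)), 3)                              # empirical counterpart
```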
Weaker assumptions. You can obtain the same result without independence by assuming that the $w_t$ are stationary to order 4 and writing the process as an infinite moving average $X_t = \sum_j \psi_j w_{t-j}$, or by assuming that the $w_t$ form a martingale difference sequence. Taking the former approach,

  $U_j = \sum_{t=p+1}^n w_t \Big( \sum_{m=0}^{\infty} \psi_m w_{t-j-m} \Big) \Big/ \sqrt{n},$

and the covariance is (assuming $\kappa_4 = 0$)

  $n\,\mathrm{Cov}(U_j, U_k) = \sum_{s,t=p+1}^n \sum_{l,m} \psi_l \psi_m\, E(w_t\, w_s\, w_{t-j-m}\, w_{s-k-l})$
  $\qquad = \sum_{s,t=p+1}^n \sum_{l,m} \psi_l \psi_m\, E(w_t w_s)\, E(w_{t-j-m}\, w_{s-k-l})$
  $\qquad = \sum_{t=p+1}^n \sum_{l}\,\sum_{m=k-j+l} \psi_l \psi_m\, E(w_t^2)\, E(w_{t-k-m}\, w_{t-k-l})$
  $\qquad = (n-p)\,\sigma^4 \sum_{l} \psi_l\, \psi_{l+k-j} = (n-p)\,\sigma^2\,\gamma(k-j),$

as before.