Time Series Analysis




Time Series Analysis
Andrea Beccarini, Center for Quantitative Economics (CQE)
Winter 2013/2014

Introduction: Objectives.
Time series are ubiquitous in economics and very important in macroeconomics and financial economics: GDP, inflation rates, unemployment, interest rates, stock prices.
You will learn the formal mathematical treatment of time series and stochastic processes, what the most important standard models in economics are, and how to fit models to real-world time series.

Introduction: Prerequisites.
Descriptive statistics, probability theory, statistical inference.

Introduction: Class and material.
Class teacher: Sarah Meyer. Time: Tue., 12:00-14:00. Location: CAWM 3. Start: 22 October 2013.
Material: course page on Blackboard; slides and class material are (or will be) downloadable.

Introduction: Literature.
Neusser, Klaus (2011), Zeitreihenanalyse in den Wirtschaftswissenschaften, 3rd ed., Teubner, Wiesbaden (available online in the RUB network).
Hamilton, James D. (1994), Time Series Analysis, Princeton University Press, Princeton.
Pfaff, Bernhard (2006), Analysis of Integrated and Cointegrated Time Series with R, Springer, New York.
Schlittgen, Rainer and Streitberg, Bernd (1997), Zeitreihenanalyse, 7th ed., Oldenbourg, Munich.

Basics: Definition.
Definition (time series): a sequence of observations ordered by time is called a time series.
Time series can be univariate or multivariate; time can be discrete or continuous; the states can be discrete or continuous.

Basics: Definition.
Typical notations: x_1, x_2, ..., x_T, or x(1), x(2), ..., x(T), or x_t, t = 1, ..., T, or (x_t)_{t>=0}.
This course is about univariate time series in discrete time with continuous states.

Basics: Examples.
[Figure: quarterly GDP, Germany, 1991 Q1 to 2012 Q2; GDP in current billion euro.]

Basics: Examples.
[Figure: DAX index and log(DAX), 31 December 1964 to 6 April 2009.]

Basics: Definition.
Definition (stochastic process): a sequence (X_t)_{t in T} of random variables, all defined on the same probability space (Ω, A, P), is called a stochastic process with discrete time parameter (usually T = N or T = Z).
Short version: a stochastic process is a sequence of random variables.
A stochastic process depends on both chance and time.

Basics: Definition.
Distinguish four cases; both time and chance can be fixed or variable:
- t fixed, ω fixed: X_t(ω) is a real number.
- t variable, ω fixed: X_t(ω) is a sequence of real numbers (path, realization, trajectory).
- t fixed, ω variable: X_t(ω) is a random variable.
- t variable, ω variable: X_t(ω) is a stochastic process.
(process.r)

Basics: Examples.
Example 1 (white noise): ε_t ~ NID(0, σ^2).
Example 2 (random walk): X_t = X_{t-1} + ε_t with X_0 = 0 and ε_t ~ NID(0, σ^2).
Example 3 (a random constant): X_t = Z with Z ~ N(0, σ^2).
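The three example processes are easy to simulate. The course's companion scripts are in R; the following is a minimal NumPy sketch (variable names and seed are illustrative choices, not from the course material):

```python
import numpy as np

rng = np.random.default_rng(0)
T, sigma = 200, 1.0

# Example 1: white noise, eps_t ~ NID(0, sigma^2)
eps = rng.normal(0.0, sigma, T)

# Example 2: random walk, X_t = X_{t-1} + eps_t with X_0 = 0
walk = np.concatenate(([0.0], np.cumsum(eps)))

# Example 3: a random constant, X_t = Z for all t with Z ~ N(0, sigma^2)
const = np.full(T, rng.normal(0.0, sigma))

# The random walk satisfies the recursion exactly ...
assert np.allclose(np.diff(walk), eps)
# ... and the random-constant path never changes over time
assert np.ptp(const) == 0.0
```

Plotting a few such paths is a quick way to see how differently the three processes behave although all are driven by the same normal draws.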

Basics: Moment functions.
Definition (moment functions): the following functions of time are called moment functions:
μ(t) = E(X_t) (expectation function)
σ^2(t) = Var(X_t) (variance function)
γ(s, t) = Cov(X_s, X_t) (covariance function)
Correlation function (autocorrelation function): ρ(s, t) = γ(s, t) / sqrt(σ^2(s) σ^2(t)).
(moments.r) [1]

Basics: Estimation of moment functions.
Usually the moment functions are unknown and have to be estimated. Problem: only a single path (realization) can be observed. Think of the data as an array X_t^(i) with t = 1, ..., T indexing time and i = 1, ..., n indexing realizations; in practice only one column, say (X_t^(1)), is observed.
Can we still estimate the expectation function μ(t) and the autocovariance function γ(s, t)? Under which conditions?

Basics: Estimation of moment functions.
Usually, the expectation function μ(t) would be estimated by averaging over realizations: μ̂(t) = (1/n) Σ_{i=1}^{n} X_t^(i).

Basics: Estimation of moment functions.
Under certain conditions, μ(t) can be estimated by averaging over time: μ̂ = (1/T) Σ_{t=1}^{T} X_t^(1).

Basics: Estimation of moment functions.
Usually, the autocovariance γ(t, t+h) would be estimated by averaging over realizations: γ̂(t, t+h) = (1/n) Σ_{i=1}^{n} (X_t^(i) - μ̂(t))(X_{t+h}^(i) - μ̂(t+h)).

Basics: Estimation of moment functions.
Under certain conditions, γ(t, t+h) can be estimated by averaging over time: γ̂(t, t+h) = (1/T) Σ_{t=1}^{T-h} (X_t^(1) - μ̂)(X_{t+h}^(1) - μ̂).

Basics: Definition.
Moment functions cannot be estimated without additional assumptions since only one path is observed. There are restrictions which make estimation of the moment functions possible:
Restriction of time heterogeneity: the distribution of (X_t(ω))_{t in T} must not be completely different for each t in T.
Restriction of memory: if the values of the process are coupled too closely over time, the individual observations do not supply any (or only insufficient) information about the distribution.

Basics: Restriction of time heterogeneity: Stationarity.
Definition (strong stationarity): let (X_t)_{t in T} be a stochastic process, and let t_1, ..., t_n in T be an arbitrary number n in N of arbitrary time points. (X_t)_{t in T} is called strongly stationary if for arbitrary h in Z
P(X_{t_1} <= x_1, ..., X_{t_n} <= x_n) = P(X_{t_1+h} <= x_1, ..., X_{t_n+h} <= x_n).
Implication: all univariate marginal distributions are identical.

Basics: Restriction of time heterogeneity: Stationarity.
Definition (weak stationarity): (X_t)_{t in T} is called weakly stationary if
1. the expectation exists and is constant: E(X_t) = μ < ∞ for all t in T;
2. the variance exists and is constant: Var(X_t) = σ^2 < ∞ for all t in T;
3. for all t, s, r in Z (in the admissible range): γ(t, s) = γ(t+r, s+r).
Simplified notation for covariance and correlation functions: γ(h) = γ(t, t+h), ρ(h) = ρ(t, t+h).

Basics: Restriction of time heterogeneity: Stationarity.
Strong stationarity implies weak stationarity (but only if the first two moments exist).
A stochastic process is called Gaussian if the joint distribution of X_{t_1}, ..., X_{t_n} is multivariate normal. For Gaussian processes, weak and strong stationarity coincide.
Intuition: an observed time series can be regarded as a realization of a stationary process if a gliding window of appropriate width always displays qualitatively the same picture.
(stationary.r) Examples [2]

Basics: Restriction of memory: Ergodicity.
Definition (ergodicity, I): let (X_t)_{t in T} be a weakly stationary stochastic process with expectation μ and autocovariance γ(h); define μ̂_T = (1/T) Σ_{t=1}^{T} X_t. (X_t)_{t in T} is called (expectation) ergodic if
lim_{T→∞} E[(μ̂_T - μ)^2] = 0.

Basics: Restriction of memory: Ergodicity.
Definition (ergodicity, II): let (X_t)_{t in T} be a weakly stationary stochastic process with expectation μ and autocovariance γ(h); define γ̂(h) = (1/T) Σ_{t=1}^{T-h} (X_t - μ)(X_{t+h} - μ). (X_t)_{t in T} is called (covariance) ergodic if for all h in Z
lim_{T→∞} E[(γ̂(h) - γ(h))^2] = 0.

Basics: Restriction of memory: Ergodicity.
Ergodicity is consistency (in quadratic mean) of the estimators μ̂ of μ and γ̂(h) of γ(h) for dependent observations.
The process (X_t)_{t in T} is expectation ergodic if (γ(h))_{h in Z} is absolutely summable, i.e. Σ_{h=-∞}^{∞} |γ(h)| < ∞.
The dependence between far-away observations must be sufficiently small.

Basics: Restriction of memory: Ergodicity.
Ergodicity condition (for the autocovariance): a stationary Gaussian process (X_t)_{t in T} with absolutely summable autocovariance function γ(h) is (autocovariance) ergodic.
Under ergodicity, the law of large numbers holds even if the observations are dependent. If the dependence γ(h) does not diminish fast enough, the estimators are no longer consistent.
Examples [3]

Basics: Estimation of moment functions.
Summary of estimators (electricity.r):
μ̂ = X̄_T = (1/T) Σ_{t=1}^{T} X_t
γ̂(h) = (1/T) Σ_{t=1}^{T-h} (X_t - μ̂)(X_{t+h} - μ̂)
ρ̂(h) = γ̂(h) / γ̂(0)
Sometimes γ̂(h) is defined with the factor 1/(T-h) instead of 1/T.
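These estimators translate directly into code. A hedged NumPy sketch (the helper names are mine, not from the course's electricity.r), using the 1/T convention from the slide:

```python
import numpy as np

def acov(x, h):
    """Empirical autocovariance gamma_hat(h), divisor 1/T as on the slide."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    mu = x.mean()
    return np.sum((x[:T - h] - mu) * (x[h:] - mu)) / T

def acorr(x, h):
    """Empirical autocorrelation rho_hat(h) = gamma_hat(h) / gamma_hat(0)."""
    return acov(x, h) / acov(x, 0)

x = np.array([1.0, 2.0, 4.0, 3.0, 5.0, 4.0, 6.0])
print(acov(x, 0), acorr(x, 1))  # acov(x, 0) equals np.var(x)
```

With the 1/T divisor, γ̂(0) coincides with the (population-style) sample variance, and |ρ̂(h)| <= 1 is guaranteed by the Cauchy-Schwarz inequality; with the 1/(T-h) divisor that bound can fail.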

Basics: Estimation of moment functions.
A closer look at the expectation estimator μ̂:
The estimator μ̂ is unbiased, i.e. E(μ̂) = μ. [4]
The variance of μ̂ is Var(μ̂) = (1/T) [γ(0) + 2 Σ_{h=1}^{T-1} (1 - h/T) γ(h)].
Under ergodicity, for T → ∞, T · Var(μ̂) → γ(0) + 2 Σ_{h=1}^{∞} γ(h) = Σ_{h=-∞}^{∞} γ(h). [5]

Basics: Estimation of moment functions.
For Gaussian processes, μ̂ is normally distributed, μ̂ ~ N(μ, Var(μ̂)), and asymptotically
sqrt(T) (μ̂ - μ) → Z ~ N(0, γ(0) + 2 Σ_{h=1}^{∞} γ(h)).
For non-Gaussian processes, μ̂ is (often) asymptotically normal with the same limiting distribution.

Basics: Estimation of moment functions.
A closer look at the autocovariance estimators γ̂(h): for Gaussian processes with absolutely summable covariance function, the vector
(sqrt(T)(γ̂(0) - γ(0)), ..., sqrt(T)(γ̂(K) - γ(K)))
is asymptotically multivariate normal with expectation vector (0, ..., 0) and
lim_{T→∞} T Cov(γ̂(h_1), γ̂(h_2)) = Σ_{r=-∞}^{∞} [γ(r) γ(r - h_1 + h_2) + γ(r - h_2) γ(r + h_1)].

Basics: Estimation of moment functions.
A closer look at the autocorrelation estimators ρ̂(h): for Gaussian processes with absolutely summable covariance function, the random vector
(sqrt(T)(ρ̂(0) - ρ(0)), ..., sqrt(T)(ρ̂(K) - ρ(K)))
is asymptotically multivariate normal with expectation vector (0, ..., 0) and a complicated covariance matrix.
Be careful: for small to medium sample sizes the autocovariance and autocorrelation estimators are biased! (autocorr.r)

Basics: Estimation of moment functions.
An important special case for the autocorrelation estimators: let (ε_t) be a white-noise process with Var(ε_t) = σ^2 < ∞; then
E(ρ̂(h)) = -T^{-1} + O(T^{-2})
Cov(ρ̂(h_1), ρ̂(h_2)) = T^{-1} + O(T^{-2}) for h_1 = h_2, and O(T^{-2}) otherwise.
For white-noise processes and long time series, the empirical autocorrelations are therefore approximately independent normal random variables with expectation -T^{-1} and variance T^{-1}.
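This approximation is easy to sanity-check by simulation: for white noise, the empirical autocorrelations should cluster around -1/T with standard deviation about 1/sqrt(T). A NumPy sketch (the seed, sample size, and three-standard-error band are my illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)
T = 2000
eps = rng.normal(0.0, 1.0, T)

def rho_hat(x, h):
    """Empirical autocorrelation rho_hat(h) = gamma_hat(h) / gamma_hat(0)."""
    d = x - x.mean()
    return np.sum(d[:len(x) - h] * d[h:]) / np.sum(d * d)

# For white noise, rho_hat(h) is approximately N(-1/T, 1/T), so nearly all
# lags should fall within +/- 3 standard errors of -1/T.
band = 3.0 / np.sqrt(T)
inside = [abs(rho_hat(eps, h) + 1.0 / T) < band for h in range(1, 21)]
print(sum(inside), "of 20 lags inside the band")
```

This is exactly the logic behind the dashed confidence bands drawn in standard correlogram plots (there usually with a 1.96/sqrt(T) band).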

Mathematical digression (I): Complex numbers.
Some quadratic equations do not have real solutions, e.g. x^2 + 1 = 0. Still it is possible (and sensible) to define solutions to such equations. The definition in common notation is i = sqrt(-1), where i is the number which, when squared, equals -1. The number i is called imaginary (i.e. not real).

Mathematical digression (I): Complex numbers.
Other imaginary numbers follow from this definition, e.g. sqrt(-16) = sqrt(16) sqrt(-1) = 4i and sqrt(-5) = sqrt(5) sqrt(-1) = sqrt(5) i.
Further, it is possible to define numbers that contain both a real part and an imaginary part, e.g. 5 - 8i or, in general, a + bi. Such numbers are called complex, and the set of complex numbers is denoted by C. The pair a + bi and a - bi is called conjugate complex.

Mathematical digression (I): Complex numbers.
Geometric interpretation: [Figure: the complex number a + bi as a point in the plane, with real part a on the real axis, imaginary part b on the imaginary axis, absolute value r, and angle θ.]

Mathematical digression (I): Complex numbers.
Polar coordinates and Cartesian coordinates:
z = a + bi = r (cos θ + i sin θ) = r e^{iθ}
a = r cos θ, b = r sin θ
r = sqrt(a^2 + b^2), θ = arctan(b/a).
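Python's standard-library cmath module implements exactly these conversions, which makes the formulas easy to verify (note that the arctan(b/a) formula only covers the right half-plane; cmath uses the quadrant-aware atan2 internally):

```python
import cmath
import math

z = 3 + 4j                      # a + bi with a = 3, b = 4
r, theta = cmath.polar(z)       # r = sqrt(a^2 + b^2), theta = angle of z

assert math.isclose(r, 5.0)                    # sqrt(9 + 16) = 5
assert math.isclose(theta, math.atan2(4, 3))   # arctan(b/a) here, since a > 0

# Back from polar to Cartesian: z = r (cos theta + i sin theta) = r e^{i theta}
assert cmath.isclose(cmath.rect(r, theta), z)
assert cmath.isclose(r * cmath.exp(1j * theta), z)
```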

Mathematical digression (I): Complex numbers.
Rules of calculus:
Addition: (a + bi) + (c + di) = (a + c) + (b + d)i.
Multiplication (Cartesian coordinates): (a + bi)(c + di) = (ac - bd) + (ad + bc)i.
Multiplication (polar coordinates): r_1 e^{iθ_1} · r_2 e^{iθ_2} = r_1 r_2 e^{i(θ_1 + θ_2)}.

Mathematical digression (I): Complex numbers.
Addition: [Figure: a + bi and c + di added in the plane; the sum (a + c) + (b + d)i is the fourth corner of the parallelogram spanned by the two numbers.]

Mathematical digression (I): Complex numbers.
Multiplication: [Figure: multiplication in polar coordinates; the absolute values multiply, r = r_1 r_2, and the angles add, θ = θ_1 + θ_2.]

Mathematical digression (I): Complex numbers.
The quadratic equation x^2 + px + q = 0 has the solutions x = -p/2 ± sqrt(p^2/4 - q). If p^2/4 - q < 0, the solutions are complex (and conjugate).

Mathematical digression (I): Complex numbers.
Example: the solutions of x^2 - 2x + 5 = 0 are
x = -(-2)/2 + sqrt((-2)^2/4 - 5) = 1 + 2i and
x = -(-2)/2 - sqrt((-2)^2/4 - 5) = 1 - 2i.
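The same roots can be found numerically; numpy.roots takes the coefficients from the highest power down:

```python
import numpy as np

# Coefficients of x^2 - 2x + 5, ordered from the highest power down
roots = np.roots([1.0, -2.0, 5.0])
roots = sorted(roots, key=lambda z: z.imag)
print(roots)  # the conjugate pair 1 - 2i and 1 + 2i
```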

Mathematical digression (II): Linear difference equations.
First-order difference equation with initial value x_0: x_t = c + φ_1 x_{t-1}.
p-th order difference equation with initial values x_0, ..., x_{p-1}: x_t = c + φ_1 x_{t-1} + ... + φ_p x_{t-p}.
A sequence (x_t)_{t=0,1,...} that satisfies the difference equation is called a solution of the difference equation.
Examples (diffequation.r)

Mathematical digression (II): Linear difference equations.
We only consider the homogeneous case, i.e. c = 0. The general solution of the first-order difference equation x_t = φ_1 x_{t-1} is x_t = A φ_1^t with an arbitrary constant A, since x_t = A φ_1^t = φ_1 A φ_1^{t-1} = φ_1 x_{t-1}.
The constant is determined by the initial condition: A = x_0.
The sequence x_t = A φ_1^t is convergent if and only if |φ_1| < 1.

Mathematical digression (II): Linear difference equations.
Solution of the p-th order difference equation x_t = φ_1 x_{t-1} + ... + φ_p x_{t-p}: try x_t = A z^t; then
A z^t = φ_1 A z^{t-1} + ... + φ_p A z^{t-p}
z^t = φ_1 z^{t-1} + ... + φ_p z^{t-p}
and thus 1 - φ_1 z^{-1} - ... - φ_p z^{-p} = 0 (characteristic polynomial, characteristic equation).

Mathematical digression (II): Linear difference equations.
There are p (possibly complex, possibly non-distinct) solutions of the characteristic equation. Denote the solutions (called roots) by z_1, ..., z_p.
If all roots are real and distinct, then x_t = A_1 z_1^t + ... + A_p z_p^t is a solution of the homogeneous difference equation. If there are complex roots, the solution is oscillating.
The constants A_1, ..., A_p can be determined with p initial conditions (x_0, x_1, ..., x_{p-1}).

Mathematical digression (II): Linear difference equations.
Stability condition: the linear difference equation x_t = φ_1 x_{t-1} + ... + φ_p x_{t-p} is stable (i.e. convergent) if and only if all roots of the characteristic polynomial 1 - φ_1 z - ... - φ_p z^p = 0 lie outside the unit circle, i.e. |z_i| > 1 for all i = 1, ..., p.
In R, the stability condition can be checked easily using the commands polyroot (base package) or armaRoots (fArma package).
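Outside R, the same check is a few lines of NumPy; a hedged sketch (the helper name is mine, not a standard API):

```python
import numpy as np

def is_stable(phi):
    """Check |z_i| > 1 for all roots of 1 - phi_1 z - ... - phi_p z^p.

    numpy.roots expects coefficients from the highest power down, so the
    characteristic polynomial becomes [-phi_p, ..., -phi_1, 1].
    """
    phi = np.asarray(phi, dtype=float)
    coeffs = np.concatenate((-phi[::-1], [1.0]))
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(is_stable([0.5]))        # AR(1)-type case phi_1 = 0.5: root z = 2
print(is_stable([1.2]))        # phi_1 = 1.2: root z = 1/1.2, inside the unit circle
print(is_stable([0.5, 0.3]))   # a stable second-order case
```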

ARMA models: Definition.
Definition (ARMA process): let (ε_t)_{t in T} be a white noise process; the stochastic process
X_t = φ_1 X_{t-1} + ... + φ_p X_{t-p} + ε_t + θ_1 ε_{t-1} + ... + θ_q ε_{t-q}
with φ_p, θ_q ≠ 0 is called an ARMA(p, q) process (AutoRegressive Moving Average process).
ARMA processes are important since every stationary process can be approximated by an ARMA process.

ARMA models: Lag operator and lag polynomial.
The lag operator is a convenient notational tool. The lag operator L shifts the time index of a stochastic process: L (X_t)_{t in T} = (X_{t-1})_{t in T}, i.e. L X_t = X_{t-1}.
Rules: L^2 X_t = L(L X_t) = X_{t-2}; L^n X_t = X_{t-n}; L^{-1} X_t = X_{t+1}; L^0 X_t = X_t.

ARMA models: Lag operator and lag polynomial.
A lag polynomial is A(L) = a_0 + a_1 L + a_2 L^2 + ... + a_p L^p.
Example: let A(L) = 1 - 0.5L and B(L) = 1 + 4L^2; then C(L) = A(L)B(L) = (1 - 0.5L)(1 + 4L^2) = 1 - 0.5L + 4L^2 - 2L^3.
Lag polynomials can be treated in the same way as ordinary polynomials.
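Because lag polynomials multiply like ordinary polynomials, the example can be checked with numpy.polymul, which expects coefficients in descending order (hence the reversals when working with coefficients ordered by ascending power of L):

```python
import numpy as np

# Coefficients by ascending power of L: A(L) = 1 - 0.5L, B(L) = 1 + 4L^2
A = np.array([1.0, -0.5])
B = np.array([1.0, 0.0, 4.0])

# polymul works in descending order: reverse, multiply, reverse back
C = np.polymul(A[::-1], B[::-1])[::-1]
print(C)  # coefficients of C(L) = 1 - 0.5L + 4L^2 - 2L^3
```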

ARMA models: Lag operator and lag polynomial.
Define the lag polynomials Φ(L) = 1 - φ_1 L - ... - φ_p L^p and Θ(L) = 1 + θ_1 L + ... + θ_q L^q. The ARMA(p, q) process can be written compactly as Φ(L) X_t = Θ(L) ε_t.
Important special cases:
MA(q) process: X_t = ε_t + θ_1 ε_{t-1} + ... + θ_q ε_{t-q}
AR(1) process: X_t = φ_1 X_{t-1} + ε_t
AR(p) process: X_t = φ_1 X_{t-1} + ... + φ_p X_{t-p} + ε_t

ARMA models: MA(q) process.
The MA(q) process is X_t = Θ(L) ε_t, i.e. X_t = ε_t + θ_1 ε_{t-1} + ... + θ_q ε_{t-q}, with ε_t ~ NID(0, σ_ε^2).
Expectation function: E(X_t) = E(ε_t + θ_1 ε_{t-1} + ... + θ_q ε_{t-q}) = E(ε_t) + θ_1 E(ε_{t-1}) + ... + θ_q E(ε_{t-q}) = 0.

ARMA models: MA(q) process.
Autocovariance function: γ(s, t) = E[(ε_s + θ_1 ε_{s-1} + ... + θ_q ε_{s-q})(ε_t + θ_1 ε_{t-1} + ... + θ_q ε_{t-q})]. Expanding the product gives a double sum of cross terms θ_i θ_j ε_{s-i} ε_{t-j} (with θ_0 = 1).
The expectations of the cross products are E(ε_s ε_t) = 0 for s ≠ t and E(ε_s ε_t) = σ_ε^2 for s = t.

ARMA models: MA(q) process.
Define θ_0 = 1; then
γ(t, t) = σ_ε^2 Σ_{i=0}^{q} θ_i^2
γ(t-1, t) = σ_ε^2 Σ_{i=0}^{q-1} θ_i θ_{i+1}
γ(t-2, t) = σ_ε^2 Σ_{i=0}^{q-2} θ_i θ_{i+2}
...
γ(t-q, t) = σ_ε^2 θ_0 θ_q = σ_ε^2 θ_q
γ(s, t) = 0 for s < t - q.
Hence MA(q) processes are always stationary.
Simulation of MA(q) processes (maqsim.r)
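These formulas collapse to γ(t-h, t) = σ_ε^2 Σ_{i=0}^{q-h} θ_i θ_{i+h}. A small NumPy sketch of the theoretical autocovariances (the helper name is mine, not from the course's maqsim.r):

```python
import numpy as np

def ma_acov(theta, sigma2=1.0):
    """Theoretical autocovariances gamma(0), ..., gamma(q) of an MA(q)
    process: gamma(h) = sigma^2 * sum_i theta_i theta_{i+h}, theta_0 = 1."""
    th = np.concatenate(([1.0], np.asarray(theta, dtype=float)))
    q = len(th) - 1
    return np.array([sigma2 * np.sum(th[:q + 1 - h] * th[h:])
                     for h in range(q + 1)])

gamma = ma_acov([0.5, -0.3], sigma2=2.0)
print(gamma)  # gamma(0) = 2(1 + 0.25 + 0.09), gamma(1) = 2(0.5 - 0.15), gamma(2) = -0.6
```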

ARMA models: AR(1) process.
The AR(1) process is Φ(L) X_t = ε_t, i.e. (1 - φ_1 L) X_t = ε_t, or X_t = φ_1 X_{t-1} + ε_t, with ε_t ~ NID(0, σ_ε^2).
Expectation and variance function [6]
Stability condition: AR(1) processes are stable if |φ_1| < 1.

ARMA models: AR(1) process.
Stationarity: stable AR(1) processes are weakly stationary if [7] E(X_0) = 0 and Var(X_0) = σ_ε^2 / (1 - φ_1^2).
Nonstationary stable processes converge towards stationarity [8]. It is common parlance to call stable processes stationary.
Covariance function of the stationary AR(1) process [9]

ARMA models: AR(p) process.
The AR(p) process is Φ(L) X_t = ε_t, i.e. X_t = φ_1 X_{t-1} + ... + φ_p X_{t-p} + ε_t, with ε_t ~ NID(0, σ_ε^2).
Assumption: ε_t is independent of X_{t-1}, X_{t-2}, ... (innovations).
Expectation function [10]. The covariance function is complicated (ar2autocov.r).

ARMA models: AR(p) process.
AR(p) processes are stable if all roots of the characteristic equation Φ(z) = 0 are larger than 1 in absolute value, |z_i| > 1 for i = 1, ..., p.
An AR(p) process is weakly stationary if the joint distribution of the p initial values (X_0, X_{-1}, ..., X_{-(p-1)}) is appropriate.
Stable AR(p) processes converge towards stationarity; they are often called stationary.
Simulation of AR(p) processes (arpsim.r)

ARMA models: Invertibility.
AR and MA processes can be inverted (into each other). Example: consider the stable AR(1) process with |φ_1| < 1:
X_t = φ_1 X_{t-1} + ε_t = φ_1 (φ_1 X_{t-2} + ε_{t-1}) + ε_t = φ_1^2 X_{t-2} + φ_1 ε_{t-1} + ε_t = ... = φ_1^n X_{t-n} + φ_1^{n-1} ε_{t-(n-1)} + ... + φ_1^2 ε_{t-2} + φ_1 ε_{t-1} + ε_t.

ARMA models: Invertibility.
Since |φ_1| < 1,
X_t = Σ_{i=0}^{∞} φ_1^i ε_{t-i} = ε_t + θ_1 ε_{t-1} + θ_2 ε_{t-2} + ... with θ_i = φ_1^i.
A stable AR(1) process can be written as an MA(∞) process (the same is true for stable AR(p) processes).
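The MA(∞) representation can be checked numerically: with θ_i = φ_1^i, the implied autocovariance σ^2 Σ_i θ_i θ_{i+h} should equal the AR(1) closed form σ^2 φ_1^h / (1 - φ_1^2), since Σ_i φ_1^i φ_1^{i+h} = φ_1^h / (1 - φ_1^2). A sketch with a truncated sum (truncation order and parameter values are my illustrative choices):

```python
import numpy as np

phi, sigma2 = 0.8, 1.0
n = 500                       # truncation order of the MA(infinity) sum
psi = phi ** np.arange(n)     # theta_i = phi^i

def gamma_ma(h):
    """gamma(h) = sigma^2 * sum_i psi_i psi_{i+h} from the MA representation."""
    return sigma2 * np.sum(psi[:n - h] * psi[h:])

for h in range(4):
    closed_form = sigma2 * phi ** h / (1.0 - phi ** 2)
    assert abs(gamma_ma(h) - closed_form) < 1e-6
```

The truncation error is of order φ^{2n}, which is negligible here.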

ARMA models: Invertibility.
Using lag polynomials this can be written as
(1 - φ_1 L) X_t = ε_t, X_t = (1 - φ_1 L)^{-1} ε_t = Σ_{i=0}^{∞} (φ_1 L)^i ε_t.
General compact and elegant notation: Φ(L) X_t = ε_t, X_t = (Φ(L))^{-1} ε_t = Θ(L) ε_t.

ARMA models: Invertibility.
An MA(q) process can be written as AR(∞) if all roots of Θ(z) = 0 are larger than 1 in absolute value (invertibility condition).
Example: MA(1) with |θ_1| < 1; from X_t = ε_t + θ_1 ε_{t-1} and θ_1 X_{t-1} = θ_1 ε_{t-1} + θ_1^2 ε_{t-2} we find X_t = θ_1 X_{t-1} + ε_t - θ_1^2 ε_{t-2}.
Repeated substitution of the ε_{t-i} terms yields X_t = Σ_{i=1}^{∞} φ_i X_{t-i} + ε_t with φ_i = (-1)^{i+1} θ_1^i.

ARMA models: Invertibility.
Summary: ARMA(p, q) processes are stable if all roots of Φ(z) = 0 are larger than 1 in absolute value; ARMA(p, q) processes are invertible if all roots of Θ(z) = 0 are larger than 1 in absolute value.

ARMA models: Invertibility.
Sometimes (e.g. for proofs) it is useful to write an ARMA(p, q) process either as AR(∞) or as MA(∞):
Φ(L) X_t = Θ(L) ε_t
(Θ(L))^{-1} Φ(L) X_t = ε_t (AR(∞) form)
X_t = (Φ(L))^{-1} Θ(L) ε_t (MA(∞) form)

ARMA models: Deterministic components.
Until now we only considered processes with zero expectation. Many processes have both a zero-expectation stochastic component (Y_t) and a non-zero deterministic component (D_t).
Examples: linear trend D_t = a + bt; exponential trend D_t = a b^t; seasonal patterns.
Let (X_t)_{t in Z} be a stochastic process with deterministic component D_t and define Y_t = X_t - D_t.

ARMA models: Deterministic components.
Then E(Y_t) = 0 and
Cov(Y_t, Y_s) = E[(Y_t - E(Y_t))(Y_s - E(Y_s))] = E[(X_t - D_t - E(X_t - D_t))(X_s - D_s - E(X_s - D_s))] = E[(X_t - E(X_t))(X_s - E(X_s))] = Cov(X_t, X_s).
The covariance function does not depend on the deterministic component. To derive the covariance function of a stochastic process, simply drop the deterministic component.

ARMA models: Deterministic components.
Special case D_t = μ_t = μ: the ARMA(p, q) process with constant (non-zero) expectation is
X_t - μ = φ_1 (X_{t-1} - μ) + ... + φ_p (X_{t-p} - μ) + ε_t + θ_1 ε_{t-1} + ... + θ_q ε_{t-q}.
The process can also be written as X_t = c + φ_1 X_{t-1} + ... + φ_p X_{t-p} + ε_t + θ_1 ε_{t-1} + ... + θ_q ε_{t-q}, where c = μ (1 - φ_1 - ... - φ_p).

ARMA models: Deterministic components.
Wold's representation theorem: every stationary stochastic process (X_t)_{t in T} can be represented as X_t = Σ_{h=0}^{∞} ψ_h ε_{t-h} + D_t with ψ_0 = 1, Σ_{h=0}^{∞} ψ_h^2 < ∞, and ε_t white noise with variance σ^2 > 0.
Stationary stochastic processes can thus be written as the sum of a deterministic process and an MA(∞) process. Often, low-order ARMA(p, q) processes can approximate MA(∞) processes well.

ARMA models: Linear processes and filters.
Definition (linear process): let (ε_t)_{t in Z} be a white noise process; a stochastic process (X_t)_{t in Z} is called linear if it can be written as X_t = Σ_{h=-∞}^{∞} ψ_h ε_{t-h} = Ψ(L) ε_t, where the coefficients are absolutely summable, i.e. Σ_{h=-∞}^{∞} |ψ_h| < ∞.
The lag polynomial Ψ(L) is called a (linear) filter.

ARMA models: Linear processes and filters.
Some special filters:
Change from previous period (difference filter): Ψ(L) = 1 - L.
Change from the same period last year (for quarterly or monthly data): Ψ(L) = 1 - L^4 or Ψ(L) = 1 - L^12.
Elimination of seasonal influences (quarterly data): Ψ(L) = (1 + L + L^2 + L^3)/4 or the symmetric filter Ψ(L) = 0.125L^2 + 0.25L + 0.25 + 0.25L^{-1} + 0.125L^{-2}.

ARMA models: Linear processes and filters.
Hodrick-Prescott filter (an important tool in empirical macroeconomics): decompose a time series (X_t) into a long-term growth component (G_t) and a short-term cyclical component (C_t), X_t = G_t + C_t.
There is a trade-off between goodness-of-fit and smoothness of G_t. Minimize the criterion function
Σ_{t=1}^{T} (X_t - G_t)^2 + λ Σ_{t=2}^{T-1} [(G_{t+1} - G_t) - (G_t - G_{t-1})]^2
with respect to G_t for a given smoothness parameter λ.

ARMA models Linear processes and filter
The FOCs of the minimization problem are
(G_1, ..., G_T)′ = A (X_1, ..., X_T)′
where A = (I + λK′K)^{−1} and K is the (T−2) × T second-difference matrix
K = [ 1 −2  1  0  0 ...  0  0  0 ]
    [ 0  1 −2  1  0 ...  0  0  0 ]
    [ 0  0  1 −2  1 ...  0  0  0 ]
    [ ..........................  ]
    [ 0  0  0  0  0 ...  1 −2  1 ]

ARMA models Linear processes and filter
The HP filter is a linear filter
Typical values for the smoothing parameter λ:
λ = 10 for annual data, λ = 1600 for quarterly data, λ = 14400 for monthly data
Implementation in R (code by Olaf Posch)
Empirical examples (hpfilter.r)
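The first-order conditions above can be implemented in a few lines. This is a dense-matrix Python sketch (the course material uses an R implementation, hpfilter.r); it is fine for short series but not tuned for long ones:

```python
import numpy as np

def hp_filter(x, lam=1600.0):
    """Trend G solving min sum (x - G)^2 + lam * sum (second diff of G)^2.

    Solves the first-order conditions G = (I + lam * K'K)^{-1} x,
    where K is the (T-2) x T second-difference matrix.
    """
    x = np.asarray(x, dtype=float)
    T = len(x)
    K = np.zeros((T - 2, T))
    for i in range(T - 2):
        K[i, i:i + 3] = [1.0, -2.0, 1.0]   # rows of second differences
    A = np.linalg.inv(np.eye(T) + lam * K.T @ K)
    return A @ x

# a linear series has zero second differences, so the trend reproduces it
t = np.arange(10.0)
g = hp_filter(2.0 + 0.5 * t, lam=1600.0)
```

Since K annihilates linear series, (I + λK′K)x = x for linear x, so the extracted trend equals the input exactly; for noisy data the trend smooths toward that case as λ grows.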

Estimation of ARMA models The estimation problem
Problem: The parameters φ_1, ..., φ_p, θ_1, ..., θ_q, σ_ε² of an ARMA(p, q) process are usually unknown
They have to be estimated from an observed time series X_1, ..., X_T
Standard estimation methods: least squares (OLS) and maximum likelihood (ML)
Assumption: the lag orders p and q are known

Estimation of ARMA models Least squares estimation of AR(p) models
The AR(p) model with non-zero constant expectation
X_t = c + φ_1 X_{t−1} + ... + φ_p X_{t−p} + ε_t
can be written in matrix notation as
[ X_{p+1} ]   [ 1  X_p      X_{p−1}  ...  X_1     ] [ c   ]   [ ε_{p+1} ]
[ X_{p+2} ] = [ 1  X_{p+1}  X_p      ...  X_2     ] [ φ_1 ] + [ ε_{p+2} ]
[ ...     ]   [ ...                               ] [ ... ]   [ ...     ]
[ X_T     ]   [ 1  X_{T−1}  X_{T−2}  ...  X_{T−p} ] [ φ_p ]   [ ε_T     ]
Compact notation: y = Xβ + u

Estimation of ARMA models Least squares estimation of AR(p) models
The standard least squares estimator is β̂ = (X′X)^{−1} X′y
The matrix of regressors X is stochastic, so the usual finite-sample results for OLS regression do not hold
But: there is no contemporaneous correlation between the error term and the regressors
Hence, the OLS estimators are consistent and asymptotically efficient
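The regression above runs with any OLS routine. A Python sketch (the simulated AR(2) parameters are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# simulate a stationary AR(2): X_t = 1 + 0.5 X_{t-1} + 0.2 X_{t-2} + eps_t
T, c, phi = 5000, 1.0, np.array([0.5, 0.2])
x = np.zeros(T)
for t in range(2, T):
    x[t] = c + phi[0] * x[t - 1] + phi[1] * x[t - 2] + rng.standard_normal()

# build y = X beta + u exactly as in the matrix notation on the slide
p = 2
y = x[p:]
X = np.column_stack([np.ones(T - p)] + [x[p - j:T - j] for j in range(1, p + 1)])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
# beta_hat is approximately [c, phi_1, phi_2]
```

Consistency shows up as the estimates approaching (1, 0.5, 0.2) as T grows.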

Estimation of ARMA models Least squares estimation of ARMA models
Solve the ARMA equation
X_t = c + φ_1 X_{t−1} + ... + φ_p X_{t−p} + ε_t + θ_1 ε_{t−1} + ... + θ_q ε_{t−q}
for ε_t:
ε_t = X_t − c − φ_1 X_{t−1} − ... − φ_p X_{t−p} − θ_1 ε_{t−1} − ... − θ_q ε_{t−q}
Define the residuals as functions of the unknown parameters
ε̂_t(d, f_1, ..., f_p, g_1, ..., g_q) = X_t − d − f_1 X_{t−1} − ... − f_p X_{t−p} − g_1 ε̂_{t−1} − ... − g_q ε̂_{t−q}

Estimation of ARMA models Least squares estimation of ARMA models
Define the sum of squared residuals
S(d, f_1, ..., f_p, g_1, ..., g_q) = Σ_{t=1}^T (ε̂_t(d, f_1, ..., f_p, g_1, ..., g_q))²
The least squares estimators are
(ĉ, φ̂_1, ..., φ̂_p, θ̂_1, ..., θ̂_q) = arg min S(d, f_1, ..., f_p, g_1, ..., g_q)
Since the residuals are defined recursively, one needs starting values ε̂_0, ..., ε̂_{−q+1} and X_0, ..., X_{−p+1} to calculate ε̂_1
Easiest way: set all starting values to zero ("conditional" estimation)

Estimation of ARMA models Least squares estimation of ARMA models
The first order conditions form a nonlinear equation system which cannot be solved analytically
Minimization by standard numerical methods (implemented in all usual statistical packages): either solve the nonlinear first order conditions or minimize S directly
Simple special case: ARMA(1, 1) (arma11.r)
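Conditional least squares for a zero-mean ARMA(1,1) can be sketched as follows (the slides reference arma11.r in R; this Python version with scipy is an illustrative stand-in, and the true parameter values 0.6 and 0.3 are assumed for the demo):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# simulate ARMA(1,1): X_t = 0.6 X_{t-1} + eps_t + 0.3 eps_{t-1}
T = 3000
eps = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.6 * x[t - 1] + eps[t] + 0.3 * eps[t - 1]

def ssr(params):
    """Sum of squared residuals, starting values set to zero (conditional LS)."""
    f, g = params
    e = np.zeros(T)
    for t in range(1, T):
        e[t] = x[t] - f * x[t - 1] - g * e[t - 1]
    return np.sum(e[1:] ** 2)

res = minimize(ssr, x0=[0.0, 0.0], method="Nelder-Mead")
phi_hat, theta_hat = res.x
```

The recursion inside `ssr` is exactly the residual definition from the previous slide, with d = 0 because the constant is omitted here.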

Estimation of ARMA models Maximum likelihood estimation
Additional assumption: the innovations ε_t are normally distributed
Implication: ARMA processes are Gaussian, and the joint distribution of X_1, ..., X_T is multivariate normal
X = (X_1, ..., X_T)′ ∼ N(µ, Σ)

Estimation of ARMA models Maximum likelihood estimation
Expectation vector
µ = E(X_1, ..., X_T)′ = (c/(1 − φ_1 − ... − φ_p), ..., c/(1 − φ_1 − ... − φ_p))′
Covariance matrix
Σ = Cov(X_1, ..., X_T)′ = [ γ(0)    γ(1)    ...  γ(T−1) ]
                          [ γ(1)    γ(0)    ...  γ(T−2) ]
                          [ ...                         ]
                          [ γ(T−1)  γ(T−2)  ...  γ(0)   ]

Estimation of ARMA models Maximum likelihood estimation
The expectation vector and the covariance matrix contain all unknown parameters ψ = (φ_1, ..., φ_p, θ_1, ..., θ_q, c, σ_ε²)
The likelihood function is
L(ψ; X) = (2π)^{−T/2} (det Σ)^{−1/2} exp(−½ (X − µ)′ Σ^{−1} (X − µ))
and the loglikelihood function is
ln L(ψ; X) = −(T/2) ln(2π) − ½ ln(det Σ) − ½ (X − µ)′ Σ^{−1} (X − µ)
The ML estimators are ψ̂ = arg max ln L(ψ; X)
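The loglikelihood above can be evaluated directly once the autocovariances are known. A small Python sketch for an MA(1), where γ(0) = (1 + θ²)σ² and γ(1) = θσ² (the helper name and example values are mine, not the course's):

```python
import numpy as np
from scipy.linalg import toeplitz

def gaussian_loglik(x, mu, gammas):
    """ln L for X ~ N(mu * 1, Sigma), Sigma Toeplitz in the autocovariances."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    acov = np.zeros(T)
    acov[:len(gammas)] = gammas          # gamma(h) = 0 beyond supplied lags
    Sigma = toeplitz(acov)
    d = x - mu
    sign, logdet = np.linalg.slogdet(Sigma)
    return (-T / 2 * np.log(2 * np.pi) - 0.5 * logdet
            - 0.5 * d @ np.linalg.solve(Sigma, d))

# MA(1) with theta = 0.5, sigma^2 = 1: gamma(0) = 1.25, gamma(1) = 0.5
ll = gaussian_loglik([0.1, -0.2, 0.3], mu=0.0, gammas=[1.25, 0.5])
```

An ML routine would wrap this in an optimizer over ψ, recomputing γ(·) from the candidate ARMA parameters at each step.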

Estimation of ARMA models Maximum likelihood estimation
The loglikelihood function has to be maximized by numerical methods
Standard properties of ML estimators:
1 consistency
2 asymptotic efficiency
3 asymptotically jointly normally distributed
4 the covariance matrix of the estimators can be consistently estimated
Example: ML estimation of an ARMA(3, 3) model for the interest rate spread (arma33.r)

Estimation of ARMA models Hypothesis tests
Since the estimation method is maximum likelihood, the classical tests (Wald, LR, LM) are applicable
General null and alternative hypotheses
H_0: g(ψ) = 0 against H_1: not H_0
where g(ψ) is an m-dimensional function of the parameters
Example: if H_0: φ_1 = 0, then m = 1 and g(ψ) = φ_1

Estimation of ARMA models Hypothesis tests
Likelihood ratio test statistic
LR = 2(ln L(ψ̂_ML) − ln L(ψ̂_R))
where ψ̂_ML and ψ̂_R are the unrestricted and restricted estimators
Under the null hypothesis, LR →d χ²_m, and H_0 is rejected at significance level α if LR > χ²_{m;1−α}
Disadvantage: two models must be estimated

Estimation of ARMA models Hypothesis tests
For the Wald test we only consider g(ψ) = ψ − ψ_0, i.e.
H_0: ψ = ψ_0 against H_1: not H_0
Test statistic
W = (ψ̂ − ψ_0)′ [Ĉov(ψ̂)]^{−1} (ψ̂ − ψ_0)
If the null hypothesis is true, then W →d χ²_m
The asymptotic covariance matrix can be estimated consistently as Ĉov(ψ̂) = H^{−1}, where H is the Hessian matrix returned by the maximization procedure

Estimation of ARMA models Hypothesis tests
Test example 1: H_0: φ_1 = 0 against H_1: φ_1 ≠ 0
Test example 2: H_0: ψ = ψ_0 against H_1: not H_0
Illustration (arma33.r)

Estimation of ARMA models Model selection
Usually, the lag orders p and q of an ARMA model are unknown
Trade-off: goodness-of-fit against parsimony
Akaike's information criterion for the model with non-zero expectation
AIC = ln σ̂² + 2(p + q + 1)/T
where ln σ̂² measures goodness-of-fit and 2(p + q + 1)/T is the penalty term
Choose the model with the smallest AIC

Estimation of ARMA models Model selection
Bayesian information criterion BIC (Schwarz information criterion)
BIC = ln σ̂² + (p + q + 1) ln(T)/T
Hannan-Quinn information criterion
HQ = ln σ̂² + 2(p + q + 1) ln(ln T)/T
Both BIC and HQ are consistent, while the AIC tends to overfit
Illustration (arma33.r)
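A quick numeric check of the trade-off between fit and parsimony (the σ̂² values below are made up to show how the penalties differ):

```python
import numpy as np

def info_criteria(sigma2_hat, p, q, T):
    """AIC, BIC and HQ for an ARMA(p, q) with non-zero mean (k = p+q+1)."""
    k = p + q + 1
    aic = np.log(sigma2_hat) + 2 * k / T
    bic = np.log(sigma2_hat) + k * np.log(T) / T
    hq = np.log(sigma2_hat) + 2 * k * np.log(np.log(T)) / T
    return aic, bic, hq

# a big model that fits slightly better vs a small parsimonious one:
# BIC penalizes the extra parameters more heavily than AIC once ln T > 2
aic_small, bic_small, _ = info_criteria(1.00, 1, 0, 500)
aic_big, bic_big, _ = info_criteria(0.98, 3, 2, 500)
```

With these numbers the AIC prefers the big model while the BIC prefers the small one, which is exactly the overfitting tendency the slide describes.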

Estimation of ARMA models Model selection
Another illustration: the true model is ARMA(2, 1) with X_t = 0.5X_{t−1} + 0.3X_{t−2} + ε_t + 0.7ε_{t−1}; 1000 samples of size n = 500 were generated; the table shows the orders p and q selected by AIC and BIC
      # orders selected by AIC        # orders selected by BIC
        q                               q
p       0    1    2    3    4    5      0    1    2    3    4    5
0       0    0    0    0    0    0      0    0    0    0    0    0
1       0   18   64   23   14    6      0  310  167    4    0    0
2       0  171   21   16    5    7      0  503    3    1    0    0
3       0    7   35   58   80   45      1    0    2    1    0    0
4       9    2   12  139   37   44      6    1    0    0    0    0
5      11    6   12   56   46   56      1    0    0    0    0    0

Integrated processes Difference operator
Define the difference operator ∆ = 1 − L; then ∆X_t = X_t − X_{t−1}
Second order differences: ∆² = ∆(∆) = (1 − L)² = 1 − 2L + L²
Higher orders ∆^n are defined in the same way; note that ∆^n ≠ 1 − L^n

Integrated processes Definition
Definition: Integrated process
A stochastic process is called integrated of order 1 if
∆X_t = µ + Ψ(L)ε_t
where ε_t is white noise, Ψ(1) ≠ 0, and Σ_{j=0}^∞ j·|ψ_j| < ∞
Common notation: X_t ∼ I(1)
I(1) processes are also called difference stationary or unit root processes (stochastic trends, as opposed to deterministic trends)
Trend stationary processes are not I(1) (since for them Ψ(1) = 0)

Integrated processes Definition
Stationary processes are sometimes called I(0)
Higher orders of integration are possible, e.g. X_t ∼ I(2) means ∆²X_t ∼ I(0)
In general, X_t ∼ I(d) means that ∆^d X_t ∼ I(0)
Most economic time series are either I(0) or I(1); some may be I(2)

Integrated processes Definition
Example 1: The random walk with drift, X_t = b + X_{t−1} + ε_t, is I(1) because
∆X_t = X_t − X_{t−1} = b + ε_t = b + Ψ(L)ε_t
where ψ_0 = 1 and ψ_j = 0 for j ≠ 0, so Ψ(1) = 1 ≠ 0

Integrated processes Definition
Example 2: The trend stationary process, X_t = a + bt + ε_t, is not I(1) because
∆X_t = b + ε_t − ε_{t−1} = b + Ψ(L)ε_t
with ψ_0 = 1, ψ_1 = −1 and ψ_j = 0 for all other j, so Ψ(1) = 0

Integrated processes Definition
Example 3: The AR(2) process
X_t = b + (1 + φ)X_{t−1} − φX_{t−2} + ε_t, i.e. (1 − φL)(1 − L)X_t = b + ε_t,
is I(1) if |φ| < 1 because ∆X_t = Ψ(L)(b + ε_t) with
Ψ(L) = (1 − φL)^{−1} = 1 + φL + φ²L² + φ³L³ + φ⁴L⁴ + ...
and thus Ψ(1) = Σ_{i=0}^∞ φ^i = 1/(1 − φ) ≠ 0. The roots of the characteristic equation are z = 1 and z = 1/φ

Integrated processes Definition
Example 4: The process X_t = 0.5X_{t−1} − 0.4X_{t−2} + ε_t is a stationary (stable) zero expectation AR(2) process; the process Y_t = a + bt + X_t is trend stationary and I(0) since
∆Y_t = b + ∆X_t with ∆X_t = (1 − L)(1 − 0.5L + 0.4L²)^{−1} ε_t
and therefore Ψ(1) = 0 (i0andi1.r)

Integrated processes Definition
Definition: ARIMA process
Let (ε_t)_{t∈T} be a white noise process; the stochastic process (X_t)_{t∈Z} is called an integrated autoregressive moving-average process of orders p, d and q, or ARIMA(p, d, q), if ∆^d X_t is an ARMA(p, q) process, i.e.
Φ(L)∆^d X_t = Θ(L)ε_t
For d > 0 the process is nonstationary (I(d)) even if all roots of Φ(z) = 0 are outside the unit circle
Simulation of an ARIMA(p, d, q) process (arimapdqsim.r)

Integrated processes Deterministic versus stochastic trends
Why is it important to distinguish deterministic and stochastic trends?
Reason 1: Long-term forecasts and forecasting errors
Deterministic trend: the forecasting error variance is bounded
Stochastic trend: the forecasting error variance is unbounded
Illustrations (i0andi1.r)

Integrated processes Deterministic versus stochastic trends
Why is it important to distinguish deterministic and stochastic trends?
Reason 2: Spurious regression
OLS regressions will show spurious relationships between time series with (deterministic or stochastic) trends
Detrending works if the series have deterministic trends, but it does not help if the series are integrated
Illustrations (spurious1.r)
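The spurious regression effect is easy to reproduce. A Monte Carlo sketch in Python along the lines of spurious1.r (sample size and replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
T, n_rep = 200, 200
reject = 0
for _ in range(n_rep):
    x = np.cumsum(rng.standard_normal(T))   # two independent random walks
    y = np.cumsum(rng.standard_normal(T))
    X = np.column_stack([np.ones(T), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (T - 2)
    se_b = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    if abs(beta[1] / se_b) > 1.96:          # naive 5% t-test of beta = 0
        reject += 1
# far more than the nominal 5% of the t-tests reject beta = 0
```

Even though x and y are independent by construction, the usual t-test rejects β = 0 in the large majority of replications.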

Integrated processes Integrated processes and parameter estimation
OLS estimators (and ML estimators) are consistent and asymptotically normal for stationary processes
The asymptotic normality is lost if the processes are integrated
We only look at the very special case with ε_t ∼ NID(0, 1) and X_0 = 0:
X_t = φ_1 X_{t−1} + ε_t
The AR(1) process is stationary if |φ_1| < 1 and has a unit root if φ_1 = 1

Integrated processes Integrated processes and parameter estimation
The usual OLS estimator of φ_1 is
φ̂_1 = Σ_{t=1}^T X_t X_{t−1} / Σ_{t=1}^T X²_{t−1}
What does the distribution of φ̂_1 look like? Influence of φ_1 and T? Consistency? Asymptotic normality?
Illustration (phihat.r)

Integrated processes Integrated processes and parameter estimation
Consistency and asymptotic normality for I(0) processes (|φ_1| < 1):
plim φ̂_1 = φ_1 and √T(φ̂_1 − φ_1) →d Z ∼ N(0, 1 − φ_1²)
Consistency for I(1) processes (φ_1 = 1):
plim φ̂_1 = 1 and T(φ̂_1 − 1) →d V
where V is a nondegenerate, nonnormal random variable
Root-T-consistency versus superconsistency
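The contrast between the root-T rate and superconsistency shows up in a small simulation (an illustrative Python analogue of phihat.r; sample sizes and replication counts are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

def phi_hat(phi, T):
    """OLS slope of X_t on X_{t-1} for one simulated AR(1) path, X_0 = 0."""
    eps = rng.standard_normal(T + 1)
    x = np.zeros(T + 1)
    for t in range(1, T + 1):
        x[t] = phi * x[t - 1] + eps[t]
    return np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2)

est_unit = [phi_hat(1.0, 1000) for _ in range(200)]
est_stat = [phi_hat(0.5, 1000) for _ in range(200)]
err_unit = np.mean(np.abs(np.array(est_unit) - 1.0))
err_stat = np.mean(np.abs(np.array(est_stat) - 0.5))
# under the unit root, T(phi_hat - 1) is stochastically bounded, so
# phi_hat collapses to 1 much faster than the stationary root-T rate
```

With T = 1000 the average estimation error in the unit root case is an order of magnitude smaller than in the stationary case, even though the limit distribution is nonnormal.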

Integrated processes Unit root tests
It is important to distinguish between trend stationarity and difference stationarity
Test of the hypothesis that a process has a unit root (i.e. is I(1))
Classical approaches: (Augmented) Dickey-Fuller test, Phillips-Perron test
Basic tool: the linear regression
X_t = deterministics + φX_{t−1} + ε_t
∆X_t = deterministics + (φ − 1)X_{t−1} + ε_t, with β := φ − 1

Integrated processes Unit root tests
Null and alternative hypothesis
H_0: φ = 1 (unit root) against H_1: φ < 1 (no unit root)
or, equivalently,
H_0: β = 0 (unit root) against H_1: β < 0 (no unit root)
Unit root tests are one-sided; explosive processes are ruled out
Rejecting the null hypothesis is evidence in favour of stationarity
If the null hypothesis is not rejected, there could be a unit root

Integrated processes DF test and ADF test
Dickey-Fuller (DF) and Augmented Dickey-Fuller (ADF) tests
Possible regressions
X_t = φX_{t−1} + ε_t  or  ∆X_t = βX_{t−1} + ε_t
X_t = a + φX_{t−1} + ε_t  or  ∆X_t = a + βX_{t−1} + ε_t
X_t = a + bt + φX_{t−1} + ε_t  or  ∆X_t = a + bt + βX_{t−1} + ε_t
Assumption for the Dickey-Fuller test: no autocorrelation in ε_t
If there is autocorrelation in ε_t, use the augmented DF test

Integrated processes DF test and ADF test
Dickey-Fuller regression, case 1: no constant, no trend
∆X_t = βX_{t−1} + ε_t
Null and alternative hypotheses: H_0: β = 0 against H_1: β < 0
Null hypothesis: stochastic trend without drift
Alternative hypothesis: stationary process around zero

Integrated processes DF test and ADF test
Dickey-Fuller regression, case 2: constant, no trend
∆X_t = a + βX_{t−1} + ε_t
Null and alternative hypotheses
H_0: β = 0 (or H_0: β = 0, a = 0) against H_1: β < 0 (or H_1: β < 0, a ≠ 0)
Null hypothesis: stochastic trend without drift
Alternative hypothesis: stationary process around a constant

Integrated processes DF test and ADF test
Dickey-Fuller regression, case 3: constant and trend
∆X_t = a + bt + βX_{t−1} + ε_t
Null and alternative hypotheses
H_0: β = 0 (or β = 0, b = 0) against H_1: β < 0 (or β < 0, b ≠ 0)
Null hypothesis: stochastic trend with drift
Alternative hypothesis: trend stationary process

Integrated processes DF test and ADF test
Dickey-Fuller test statistics for single hypotheses
ρ-test: T β̂
τ-test: β̂ / σ̂_β̂
The τ-test statistic is computed in the same way as the usual t-test statistic
Reject the null hypothesis if the test statistics are too small
The critical values are not the quantiles of the t-distribution; there are tables with the correct critical values (e.g. Hamilton, table B.6)

Integrated processes DF test and ADF test
The Dickey-Fuller test statistics for the joint hypotheses are computed in the same way as the usual F-test statistics
Reject the null hypothesis if the test statistic is too large
The critical values are not the quantiles of the F-distribution; there are tables with the correct critical values (e.g. Hamilton, table B.7)
Illustrations (dftest.r)

Integrated processes DF test and ADF test
If there is autocorrelation in ε_t, the DF test does not work (dftest.r)
Augmented Dickey-Fuller test (ADF test) regressions:
∆X_t = γ_1 ∆X_{t−1} + ... + γ_p ∆X_{t−p} + βX_{t−1} + ε_t
∆X_t = a + γ_1 ∆X_{t−1} + ... + γ_p ∆X_{t−p} + βX_{t−1} + ε_t
∆X_t = a + bt + γ_1 ∆X_{t−1} + ... + γ_p ∆X_{t−p} + βX_{t−1} + ε_t
The added lagged differences capture the autocorrelation
The number of lags p must be large enough to make ε_t white noise
The critical values remain the same as in the no-correlation case
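An ADF regression with a constant is just OLS on differenced data. A Python sketch (remember that τ must be compared with Dickey-Fuller critical values, roughly −2.86 at the 5% level for this case; packaged implementations such as statsmodels' adfuller handle the critical values for you):

```python
import numpy as np

def adf_tau(x, p=1):
    """tau statistic of the ADF regression with a constant:
    dX_t = a + gamma_1 dX_{t-1} + ... + gamma_p dX_{t-p} + beta X_{t-1} + e_t
    """
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    y = dx[p:]
    cols = [np.ones(len(y))]
    cols += [dx[p - j:len(dx) - j] for j in range(1, p + 1)]
    cols.append(x[p:len(x) - 1])             # the level X_{t-1}
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[-1, -1])
    return beta[-1] / se

rng = np.random.default_rng(4)
rw = np.cumsum(rng.standard_normal(500))     # unit root process
e = rng.standard_normal(500)
ar = np.zeros(500)
for t in range(1, 500):                      # stationary AR(1), phi = 0.5
    ar[t] = 0.5 * ar[t - 1] + e[t]
tau_rw, tau_ar = adf_tau(rw), adf_tau(ar)
# tau_ar is far below -2.86 (reject the unit root); tau_rw typically is not
```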

Integrated processes DF test and ADF test
Further interesting topics (but we skip these):
Phillips-Perron test
Structural breaks and unit roots
KPSS test of stationarity, with H_0: X_t ∼ I(0) against H_1: X_t ∼ I(1)

Integrated processes Regression with integrated processes
Spurious regression: if X_t and Y_t are independent but both I(1), then the regression Y_t = α + βX_t + u_t will result in an estimated coefficient β̂ that is significantly different from 0 with probability 1 as T → ∞
BUT: the regression Y_t = α + βX_t + u_t may be sensible even though X_t and Y_t are I(1): cointegration

Integrated processes Regression with integrated processes
Definition: Cointegration
Two stochastic processes (X_t)_{t∈T} and (Y_t)_{t∈T} are cointegrated if both processes are I(1) and there is a constant β such that the process (Y_t − βX_t) is I(0)
If β is known, cointegration can be tested using a standard unit root test on the process (Y_t − βX_t)
If β is unknown, it can be estimated from the linear regression Y_t = α + βX_t + u_t, and cointegration is tested using a modified unit root test on the residual process (û_t)_{t=1,...,T}
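The two-step idea (estimate β by OLS in levels, then examine the residual) can be sketched in Python; the data-generating process and the simple autocorrelation check below are illustrative, and a real test would compare a DF statistic on û_t against the Engle-Granger critical values:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 500
w = np.cumsum(rng.standard_normal(T))     # common stochastic trend
x = w + rng.standard_normal(T)
y = 2.0 * w + rng.standard_normal(T)      # cointegrated with beta = 2

# step 1: estimate beta from the levels regression (superconsistent)
X = np.column_stack([np.ones(T), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ coef

# step 2: the residual should behave like an I(0) process; e.g. its
# first-order autocorrelation stays well below 1
rho = np.corrcoef(u_hat[1:], u_hat[:-1])[0, 1]
```

Because β̂ converges at rate T, even moderate samples pin down the cointegrating coefficient quite precisely, and û_t inherits the stationarity of Y_t − βX_t.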

GARCH models Conditional expectation
Let (X, Y) be a bivariate random variable with a joint density function; then
E(X | Y = y) = ∫ x f_{X|Y=y}(x) dx
is the conditional expectation of X given Y = y
E(X | Y) denotes a random variable that realizes as E(X | Y = y) if the random variable Y realizes as y
Both E(X | Y) and E(X | Y = y) are called conditional expectation

GARCH models Conditional variance
Let (X, Y) be a bivariate random variable with a joint density function; then
Var(X | Y = y) = ∫ (x − E(X | Y = y))² f_{X|Y=y}(x) dx
is the conditional variance of X given Y = y
Var(X | Y) denotes a random variable that realizes as Var(X | Y = y) if the random variable Y realizes as y
Both Var(X | Y = y) and Var(X | Y) are called conditional variance

GARCH models Rules for conditional expectations
1 Law of iterated expectations: E(E(X | Y)) = E(X)
2 If X and Y are independent, then E(X | Y) = E(X)
3 The condition can be treated like a constant: E(XY | Y) = Y E(X | Y)
4 The conditional expectation is a linear operator: for a_1, ..., a_n ∈ R,
E(Σ_{i=1}^n a_i X_i | Y) = Σ_{i=1}^n a_i E(X_i | Y)

GARCH models Basics
Some economic time series show volatility clusters, e.g. stock returns, commodity price changes, inflation rates, ...
Simple autoregressive models cannot capture volatility clusters since their conditional variance is constant
Example: for the stationary AR(1) process X_t = αX_{t−1} + ε_t with |α| < 1, the unconditional variance is Var(X_t) = σ_X² = σ_ε²/(1 − α²), while the conditional variance is Var(X_t | X_{t−1}) = σ_ε²

GARCH models Basics
In the following, we will focus on stock returns
Empirical fact: squared (or absolute) returns are positively autocorrelated
Implication: returns are not independent over time
The dependence is nonlinear
How can we model this kind of dependence?

GARCH models ARCH(1)-process
Definition: ARCH(1)-process
The stochastic process (X_t)_{t∈Z} is called an ARCH(1)-process if, for all t ∈ Z,
E(X_t | X_{t−1}) = 0
Var(X_t | X_{t−1}) = σ_t² = α_0 + α_1 X²_{t−1}
with α_0, α_1 > 0
Often, an additional assumption is X_t | (X_{t−1} = x_{t−1}) ∼ N(0, α_0 + α_1 x²_{t−1})

GARCH models ARCH(1)-process
The unconditional distribution of X_t is non-normal
Leptokurtosis: the tails are heavier than those of the normal distribution
Example of an ARCH(1)-process:
X_t = ε_t σ_t
where (ε_t)_{t∈Z} is white noise with σ_ε² = 1 and σ_t = √(α_0 + α_1 X²_{t−1})

GARCH models ARCH(1)-process
One can show that
E(X_t | X_{t−1}) = 0 and E(X_t) = 0
Var(X_t | X_{t−1}) = α_0 + α_1 X²_{t−1} and, under the stationarity condition 0 < α_1 < 1, Var(X_t) = α_0/(1 − α_1)
Cov(X_t, X_{t−i}) = 0 for i > 0
The unconditional kurtosis is 3(1 − α_1²)/(1 − 3α_1²) if ε_t ∼ N(0, 1); if α_1 > 1/√3 ≈ 0.57735, the kurtosis does not exist

GARCH models ARCH(1)-process
Squared returns follow
X_t² = α_0 + α_1 X²_{t−1} + v_t with v_t = σ_t²(ε_t² − 1)
Thus, the squared returns of an ARCH(1) process are AR(1)
The process (v_t)_{t∈Z} is white noise:
E(v_t) = 0
Var(v_t) = E(v_t²) = const.
Cov(v_t, v_{t−i}) = 0 (i = 1, 2, ...)

GARCH models ARCH(1)-process
Simulation of an ARCH(1)-process for t = 1, ..., 2500
Parameters: α_0 = 0.05, α_1 = 0.95, start value X_0 = 0
Conditional distribution: ε_t ∼ N(0, 1)
(archsim.r)
Check whether the simulated time series shows the typical stylized facts of return distributions
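A Python analogue of the archsim.r exercise (here with α_1 = 0.3 rather than the slide's 0.95, so that higher moments exist and the sample autocorrelations are well behaved):

```python
import numpy as np

rng = np.random.default_rng(6)

def simulate_arch1(alpha0, alpha1, T, x0=0.0):
    """X_t = eps_t * sigma_t with sigma_t^2 = alpha0 + alpha1 * X_{t-1}^2."""
    x = np.zeros(T + 1)
    x[0] = x0
    for t in range(1, T + 1):
        sigma = np.sqrt(alpha0 + alpha1 * x[t - 1] ** 2)
        x[t] = rng.standard_normal() * sigma
    return x[1:]

x = simulate_arch1(0.05, 0.3, 20000)
# stylized facts: returns are uncorrelated, but their squares are not
r1 = np.corrcoef(x[1:], x[:-1])[0, 1]
r1_sq = np.corrcoef(x[1:] ** 2, x[:-1] ** 2)[0, 1]
```

The sample variance should settle near α_0/(1 − α_1) ≈ 0.071, r1 near zero, and the lag-1 autocorrelation of the squares near α_1 = 0.3, matching the AR(1)-in-squares representation.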

GARCH models Estimation of an ARCH(1)-process
Of course, we do not know the true values of the model parameters α_0 and α_1
How can we estimate the unknown parameters α_0 and α_1 from observations X_1, ..., X_T?
Because X_t² = α_0 + α_1 X²_{t−1} + v_t, a possible estimation method is OLS

GARCH models Estimation of an ARCH(1)-process
OLS estimator of α_1:
α̂_1 = Σ_{t=2}^T (X_t² − X̄²)(X²_{t−1} − X̄²) / Σ_{t=2}^T (X²_{t−1} − X̄²)² ≈ ρ̂(X_t², X²_{t−1})
where X̄² denotes the sample mean of the squared observations, i.e. approximately the first-order sample autocorrelation of the squared returns
Careful: these estimators are only consistent if the kurtosis exists (i.e. if α_1 < 1/√3)
Test of ARCH effects: H_0: α_1 = 0 against H_1: α_1 > 0

GARCH models Estimation of an ARCH(1)-process
For T large, under H_0,
√T α̂_1 ∼ N(0, 1) approximately
Reject H_0 if √T α̂_1 > Φ^{−1}(1 − α)
Second version of this test: consider the R² of the regression X_t² = α_0 + α_1 X²_{t−1} + v_t; then, under H_0,
T α̂_1² ≈ TR² ∼ χ²_1 approximately
Reject H_0 if TR² exceeds the (1 − α)-quantile of the χ²_1 distribution

GARCH models ARCH(p)-process
Definition: ARCH(p)-process
The stochastic process (X_t)_{t∈Z} is called an ARCH(p)-process if
E(X_t | X_{t−1}, ..., X_{t−p}) = 0
Var(X_t | X_{t−1}, ..., X_{t−p}) = σ_t² = α_0 + α_1 X²_{t−1} + ... + α_p X²_{t−p}
for t ∈ Z, where α_i ≥ 0 for i = 0, 1, ..., p − 1 and α_p > 0
Often, an additional assumption is that X_t | (X_{t−1} = x_{t−1}, ..., X_{t−p} = x_{t−p}) ∼ N(0, σ_t²)

GARCH models ARCH(p)-process
Example of an ARCH(p)-process:
X_t = ε_t σ_t
where (ε_t)_{t∈Z} is white noise with σ_ε² = 1 and σ_t = √(α_0 + α_1 X²_{t−1} + ... + α_p X²_{t−p})
An ARCH(p) process is weakly stationary if all roots of 1 − α_1 z − α_2 z² − ... − α_p z^p = 0 are outside the unit circle
Then, for all t ∈ Z, E(X_t) = 0 and Var(X_t) = α_0 / (1 − Σ_{i=1}^p α_i)

GARCH models ARCH(p)-process
If (X_t)_{t∈Z} is a stationary ARCH(p) process, then (X_t²)_{t∈Z} is a stationary AR(p) process
X_t² = α_0 + α_1 X²_{t−1} + ... + α_p X²_{t−p} + v_t
As to the error term:
E(v_t) = 0
Var(v_t) = const.
Cov(v_t, v_{t−i}) = 0 for i = 1, 2, ...
Simulating an ARCH(p) process is easy

GARCH models Estimation of ARCH(p) models
OLS estimation of X_t² = α_0 + α_1 X²_{t−1} + ... + α_p X²_{t−p} + v_t
Test of ARCH effects: H_0: α_1 = α_2 = ... = α_p = 0 against H_1: not H_0
Let R² denote the coefficient of determination of the regression
Under H_0, the test statistic TR² ∼ χ²_p approximately; thus reject H_0 if TR² exceeds the (1 − α)-quantile of the χ²_p distribution

GARCH models Maximum likelihood estimation
Basic idea of the maximum likelihood estimation method: choose the parameters such that the joint density of the observations, f_{X_1,...,X_T}(x_1, ..., x_T), is maximized
Let X_1, ..., X_T denote a random sample from X
The density f_X(x; θ) depends on R unknown parameters θ = (θ_1, ..., θ_R)

GARCH models Maximum likelihood estimation
ML estimation of θ: maximize the (log)likelihood function
L(θ) = f_{X_1,...,X_T}(x_1, ..., x_T; θ) = Π_{t=1}^T f_X(x_t; θ)
ln L(θ) = Σ_{t=1}^T ln f_X(x_t; θ)
ML estimate: θ̂ = arg max ln L(θ)

GARCH models Maximum likelihood estimation
Since observations are independent in random samples,
f_{X_1,...,X_T}(x_1, ..., x_T) = Π_{t=1}^T f_{X_t}(x_t)
or
ln f_{X_1,...,X_T}(x_1, ..., x_T) = Σ_{t=1}^T ln f_{X_t}(x_t) = Σ_{t=1}^T ln f_X(x_t)
But: ARCH returns are not independent!

GARCH models Maximum likelihood estimation
Factorization with dependent observations:
f_{X_1,...,X_T}(x_1, ..., x_T) = Π_{t=1}^T f_{X_t | X_{t−1},...,X_1}(x_t | x_{t−1}, ..., x_1)
or
ln f_{X_1,...,X_T}(x_1, ..., x_T) = Σ_{t=1}^T ln f_{X_t | X_{t−1},...,X_1}(x_t | x_{t−1}, ..., x_1)
Hence, for an ARCH(1)-process,
f_{X_1,...,X_T}(x_1, ..., x_T) = f_{X_1}(x_1) Π_{t=2}^T (1/√(2πσ_t²)) exp(−½ (x_t/σ_t)²)

GARCH models Maximum likelihood estimation
The marginal density of X_1 is complicated but becomes negligible for large T and will therefore be dropped from now on
Log-likelihood function (without the initial marginal density):
ln L(α_0, α_1 | x_1, ..., x_T) = −(T/2) ln(2π) − ½ Σ_{t=2}^T ln σ_t² − ½ Σ_{t=2}^T (x_t/σ_t)²
where σ_t² = α_0 + α_1 x²_{t−1}
ML estimation of α_0 and α_1 by numerical maximization of ln L(α_0, α_1) with respect to α_0 and α_1
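The numerical maximization can be sketched as follows in Python (the true parameter values and the optimizer choice are illustrative; the constant −(T/2)ln 2π is dropped since it does not affect the maximizer):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)

# simulate ARCH(1) data with alpha_0 = 0.05, alpha_1 = 0.4
T, a0, a1 = 5000, 0.05, 0.4
x = np.zeros(T)
for t in range(1, T):
    x[t] = rng.standard_normal() * np.sqrt(a0 + a1 * x[t - 1] ** 2)

def neg_loglik(params):
    """Negative conditional Gaussian log-likelihood (constant dropped)."""
    alpha0, alpha1 = params
    if alpha0 <= 0 or alpha1 < 0:
        return np.inf                      # keep the variance positive
    s2 = alpha0 + alpha1 * x[:-1] ** 2     # sigma_t^2 for t = 2, ..., T
    return 0.5 * np.sum(np.log(s2) + x[1:] ** 2 / s2)

res = minimize(neg_loglik, x0=[0.1, 0.1], method="Nelder-Mead")
a0_hat, a1_hat = res.x
```

Minimizing the negative log-likelihood is the same as the maximization on the slide; the parameter guard replaces a formal positivity constraint.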

GARCH models GARCH(p,q)-process
Definition: GARCH(p,q)-process
The stochastic process (X_t)_{t∈Z} is called a GARCH(p, q)-process if
E(X_t | X_{t−1}, X_{t−2}, ...) = 0
Var(X_t | X_{t−1}, X_{t−2}, ...) = σ_t² = α_0 + α_1 X²_{t−1} + ... + α_p X²_{t−p} + β_1 σ²_{t−1} + ... + β_q σ²_{t−q}
for t ∈ Z, with α_i, β_i ≥ 0
Often, an additional assumption is that (X_t | X_{t−1} = x_{t−1}, X_{t−2} = x_{t−2}, ...) ∼ N(0, σ_t²)

GARCH models GARCH(p,q)-process
Conditional variance of a GARCH(1, 1) process
Var(X_t | X_{t−1}, X_{t−2}, ...) = σ_t² = α_0 + α_1 X²_{t−1} + β_1 σ²_{t−1}
= α_0/(1 − β_1) + α_1 Σ_{i=1}^∞ β_1^{i−1} X²_{t−i}
Unconditional variance
Var(X_t) = α_0 / (1 − Σ_{i=1}^p α_i − Σ_{j=1}^q β_j)

GARCH models GARCH(p,q)-process
Necessary condition for weak stationarity:
Σ_{i=1}^p α_i + Σ_{j=1}^q β_j < 1
(X_t)_{t∈Z} has no autocorrelation
GARCH processes can be written as ARMA(max(p, q), q)-processes in the squared returns
Example: GARCH(1, 1)-process with X_t = ε_t σ_t and σ_t² = α_0 + α_1 X²_{t−1} + β_1 σ²_{t−1}

GARCH models Estimation of GARCH(p,q)-processes
One option: estimate the ARMA(max(p, q), q)-process in the squared returns
Alternative (and better) method: maximum likelihood
For a GARCH(1, 1)-process,
f_{X_1,...,X_T}(x_1, ..., x_T) = f_{X_1}(x_1) Π_{t=2}^T (1/√(2πσ_t²)) exp(−½ (x_t/σ_t)²)

GARCH models Estimation of GARCH(p,q)-processes
Again, the density of X_1 can be neglected
Log-likelihood function
ln L(α_0, α_1, β_1 | x_1, ..., x_T) = −(T/2) ln(2π) − ½ Σ_{t=2}^T ln σ_t² − ½ Σ_{t=2}^T (x_t/σ_t)²
with σ_t² = α_0 + α_1 x²_{t−1} + β_1 σ²_{t−1} and starting value σ_1² = 0
ML estimation of α_0, α_1 and β_1 by numerical maximization

GARCH models Estimation of GARCH(p,q)-processes
Conditional h-step forecast of the volatility σ²_{t+h} in a GARCH(1, 1) model:
E(σ²_{t+h} | X_t, X_{t−1}, ...) = (α_1 + β_1)^h (σ_t² − α_0/(1 − α_1 − β_1)) + α_0/(1 − α_1 − β_1)
If the process is stationary,
lim_{h→∞} E(σ²_{t+h} | X_t, X_{t−1}, ...) = α_0/(1 − α_1 − β_1)
Simulation of GARCH processes is easy; the estimation can be computer intensive
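The forecast formula can be checked numerically (the parameter values below are illustrative):

```python
import numpy as np

def garch11_forecast(sigma2_t, h, alpha0, alpha1, beta1):
    """E(sigma^2_{t+h} | information at t) for a stationary GARCH(1,1)."""
    longrun = alpha0 / (1.0 - alpha1 - beta1)   # unconditional variance
    return longrun + (alpha1 + beta1) ** h * (sigma2_t - longrun)

# with alpha0 = 0.05, alpha1 = 0.1, beta1 = 0.8 the long-run variance is 0.5;
# starting above it, the forecast decays geometrically towards 0.5
f1 = garch11_forecast(1.0, 1, 0.05, 0.1, 0.8)
f200 = garch11_forecast(1.0, 200, 0.05, 0.1, 0.8)
```

The persistence parameter α_1 + β_1 governs the decay: the closer it is to 1, the longer a volatility shock affects the forecast.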

GARCH models Residuals of an estimated GARCH(1,1) model
Careful: residuals are slightly different from what you know from OLS regressions
Estimates: α̂_0, α̂_1, β̂_1, µ̂
From σ_t² = α_0 + α_1 X²_{t−1} + β_1 σ²_{t−1} and X_t = µ + σ_t ε_t we calculate the standardized residuals
ε̂_t = (X_t − µ̂)/σ̂_t = (X_t − µ̂) / √(α̂_0 + α̂_1 X²_{t−1} + β̂_1 σ̂²_{t−1})
Histogram of the standardized residuals

GARCH models AR(p)-ARCH(q)-models
Definition: (X_t)_{t∈Z} is called an AR(p)-ARCH(q)-process if (shown here for p = q = 1)
X_t = µ + φ_1 X_{t−1} + ε_t (mean equation)
σ_t² = α_0 + α_1 ε²_{t−1} (variance equation)
where ε_t ∼ N(0, σ_t²)
Maximum likelihood estimation

GARCH models Extensions of the GARCH model
There are a number of possible extensions to the GARCH model:
Empirical fact: negative shocks have a larger impact on volatility than positive shocks (leverage effect); news impact curve
Nonnormal innovations, e.g. ε_t ∼ t_ν