Lectures in Modern Economic Time Series Analysis. 2nd ed.
© Bo Sjö
Linköping, Sweden
bo.sjo@liu.se
October 30, 2011

CONTENTS

1 Introduction
   Outline of this Book/Text/Course/Workshop
   Why Econometrics?
   Junk Science and Junk Econometrics
2 Introduction to Econometric Time Series
   Programs
   Different types of time series
   Repetition - Your First Courses in Statistics and Econometrics

I Basic Statistics

3 Time Series Modeling - An Overview
   Statistical Models
   Random Variables
   Moments of random variables
   Popular Distributions in Econometrics
   Analysing the Distribution
   Multidimensional Random Variables
   Marginal and Conditional Densities
4 The Linear Regression Model
   A General Description
5 The Method of Maximum Likelihood
   MLE for a Univariate Process
   MLE for a Linear Combination of Variables
   The Classical Tests - Wald, LM and LR Tests

II Time Series Modeling

6 Random Walks, White Noise and All That
   Different types of processes
   White Noise
   The Log Normal Distribution
   The ARIMA Model
   The Random Walk Model
   Martingale Processes
   Markov Processes
   Brownian Motions
   Brownian motions and the sum of white noise
   The geometric Brownian motion
   A more formal definition
7 Introduction to Time Series Modeling
   Descriptive Tools for Time Series
   Weak and Strong Stationarity
   Weak Stationarity, Covariance Stationarity and Ergodic Processes
   Strong Stationarity
   Finding the Optimal Lag Length and Information Criteria
   The Lag Operator
   Generating Functions
   The Difference Operator
   Filters
   Dynamics and Stability
   Fractional Integration
   Building an ARIMA Model: The Box-Jenkins Approach
   Is the ARMA Model Identified?
8 Theoretical Properties of Time Series Models
   The Principle of Duality
   Wold's Decomposition Theorem
   Additional Topics: Seasonality, Non-stationarity, Aggregation
9 Overview of Single Equation Dynamic Models
   Multipliers and Long-run Solutions of Dynamic Models
   Vector Autoregressive Models
   How to Estimate a VAR?
   Impulse Responses in a VAR with Non-stationary Variables and Cointegration
   BVAR, TVAR etc.

III Granger Non-causality Tests

10 Introduction to Exogeneity and Multicollinearity
   Exogeneity
   Weak Exogeneity
   Strong Exogeneity
   Super Exogeneity
   Multicollinearity and the Understanding of Multiple Regression
11 Univariate Tests of the Order of Integration
   The DF-test
   The ADF-test
   The Phillips-Perron Test
   The LMSP-test
   The KPSS-test
   The G(p, q) Test
   The Alternative Hypothesis in I(1) Tests
   Fractional Integration
12 Non-Stationarity and Co-integration
   The Spurious Regression Problem
   Integrated Variables and Co-integration
   Approaches to Testing for Co-integration
   Integrated Variables and Common Trends
   A Deeper Look at Johansen's Test
13 The Estimation of Dynamic Models
   Deterministic Explanatory Variables
   The Deterministic Trend Model
   Stochastic Explanatory Variables
   Lagged Dependent Variables
   Lagged Dependent Variables and Autocorrelation
   The Problems of Dependence and the Initial Observation
   Estimation with Integrated Variables
   Encompassing
14 ARCH Models
   Practical Modelling Tips
   Some ARCH Theory
   Some Different Types of ARCH and GARCH Models
   The Estimation of ARCH Models
15 Econometrics and Rational Expectations
   Rational vs. Other Types of Expectations
   Typical Errors in the Modeling of Expectations
   Modeling Rational Expectations
   Testing Rational Expectations
   A Research Strategy

References

APPENDIX
Appendix III Operators
   The Expectations Operator
   The Variance Operator
   The Covariance Operator
   The Sum Operator
   The Plim Operator
   The Lag and the Difference Operators

Abstract

1. INTRODUCTION

"He who controls the past controls the future." George Orwell, "1984".

Please respect that this is work in progress. It has never been my intention to write a commercial book, or a perfect textbook in time series econometrics. It is simply a collection of lectures in a popular form that can serve as a complement to the ordinary textbooks and articles used in education. The parts dealing with tests for unit roots (order of integration) and cointegration are not well developed. These topics have a memo of their own, "A Guide to testing for unit roots and cointegration".

When I started to put these lecture notes together some years ago I decided on the title "Lectures in Modern Time Series Econometrics" because I thought that the contents were a bit "modern" compared with the standard econometric textbook. During the fall of 2010, as I started to update the notes, I thought that it was time to remove the word "modern" from the title. A quick look in Damodar Gujarati's textbook "Basic Econometrics" from 2009 convinced me to keep the word "modern" in the title. Gujarati's text on time series hasn't changed since the 1970s, even though time series econometrics has changed completely since the 70s. Thus, under these circumstances I see no reason to change the title, at least not yet.

There are four ways in which one can do time series econometrics. The first is to use the approach of the 1970s: view your time series model just like any linear regression, and impose a number of ad hoc restrictions that will hide all the problems you find. This is not a good approach. It is only found in old textbooks and never in today's research; you might only see it used in journals of very low scientific standing. Second, you can use theory to derive a time series model, and interesting parameters, that you then estimate with appropriate estimators. Examples of this are to derive utility functions, assume that agents have rational expectations, etc. This is a proper research strategy. However, it typically takes good data, and you need to be original in your approach, but you can get published in good journals. The third approach is simply to do a statistical description of the data series, in the form of a vector autoregressive system, or the reduced form of the vector error correction model. This system can be used for forecasting, for analysing relationships among data series, and for investigating the effects of unforeseen shocks such as drastic changes in energy prices, money supply etc. The fourth way is to go beyond the vector autoregressive system and try to estimate structural parameters in the form of elasticities and policy intervention parameters. If you forget about the first method, the choice depends on the problem at hand and how you choose to formulate it. This book aims at telling you how to use methods three and four.

The basic thinking is that your data is the real world; theories are abstractions that we use to understand the real world. In applied econometric time series you should always strive to build well-defined statistical models, that is, models that are consistent with the data chosen. There is a complex statistical theory behind all this, which I will try to popularize in this book. I do not see this book as a substitute for an ordinary textbook. It is simply a complement.

1.1 Outline of this Book/Text/Course/Workshop

This book is intended for people who have done a basic course in statistics and econometrics, either at the undergraduate or at the graduate level. If you did an undergraduate course I assume that you did it well. Econometrics is the type of course where every lecture, and every textbook chapter, leads to the next level. The best way to learn econometrics is to be active: read several books and work on your own with econometric software. No teacher can teach you how to run the software; that is something you have to learn on your own by practicing. There are some very good software packages out there. The outline differs between the graduate and the Ph.D. level mainly in the theoretical parts. At the Ph.D. level, there is more stress on theoretical backgrounds.

1) I will begin by talking about why econometrics is different from statistics, and why econometric time series is different from the econometrics you meet in many basic textbooks.
2) I will repeat very briefly basic statistics and linear regression, and stress what you should know in terms of testing and modeling dynamic models. For most students that will imply going back and doing some quick repetition.
3) Introduction to statistical theory, including maximum likelihood, random variables, density functions and stochastic processes.
4) Basic time series properties and processes.
5) Using and understanding ARFIMA and VAR modelling techniques.
6) Testing for non-stationarity in the form of stochastic trends, i.e. tests for unit roots.
7) The spurious regression problem.
8) Testing for and understanding cointegration.
9) Testing for Granger non-causality.
10) The theory of reduction, exogeneity, and building dynamic models and systems.
11) Modelling time-varying variances: ARCH and GARCH models.
12) The implications and consequences of rational expectations for econometric modelling.
13) Non-linearities.
14) Additional topics.

For most of these topics I have developed more or less self-instructing exercises.

1.2 Why Econometrics?

Why is there a subject called econometrics? Why study econometrics, instead of statistics? Why not let the statisticians teach statistics, and in particular time series techniques? These are common questions, raised during seminars and in private, by students, statisticians and economists. The answer is that each scientific area tends to create its own special methodological problems, often heavily interrelated with theoretical issues. These problems, and the ways of solving them, are important in a particular area of science but not necessarily in others. Economics is a typical example, where the formulation of the economic and the statistical problem are deeply interrelated from the beginning.

In everyday life we are forced to make decisions based on limited information. Most of our decisions deal with an uncertain, stochastic future. We all base our decisions on some view of the economy where we assume that certain events are linked to each other in more or less complex ways. Economists call this a model of the economy. We can describe the economy and the behavior of individuals in terms of multivariate stochastic processes. Decisions based on stochastic sequences play a central role in economics and in finance. Stochastic processes are the basis for our understanding of the behavior of economic agents and of how their behavior determines the future path of the economy.

Most econometric textbooks deal with stochastic time series as a special application of the linear regression technique. Though this approach is acceptable for an introductory course in econometrics, it is unsatisfactory for students with a deeper interest in economics and finance. To understand the empirical and theoretical work in these areas, it is necessary to understand some of the basic philosophy behind stochastic time series.

This work is a work in progress. It is based on my lectures on Modern Economic Time Series Analysis at the Department of Economics, first at the University of Gothenburg and later at the University of Skövde and Linköping University in Sweden. The material is not ready for widespread distribution. This work, most likely, contains lots of errors; some are known by the author, and some are not yet detected. The different sections do not necessarily follow in a logical order. Therefore, I invite anyone who has opinions about this work to share them with me.

The first part of this work provides a repetition of some basic statistical concepts, which are necessary for understanding modern economic time series analysis. The motive for repeating these concepts is that they play a larger role in econometrics than many contemporary textbooks in econometrics indicate. Econometrics did not change much from the first edition of Johnston in the 60s until the revised version of Kmenta in the mid 80s. However, the critique against the use of econometrics delivered by Sims, Lucas, Leamer, Hendry and others, in combination with new insights into the behavior of non-stationary time series and the rapid development of computer technology, has revolutionized econometric modeling and resulted in an explosion of knowledge. The demands for writing a decent thesis, or a scientific paper, based on econometric methods have risen far beyond what one can learn in an introductory course in econometrics.

1.3 Junk Science and Junk Econometrics

In the media you often hear about this and that being proved by scientific research. In the late 1990s newspapers told that someone had proved that genetically modified (GM) food could be dangerous. The news spread quickly, and according to the story the original article had been stopped from being published by scientists with suspicious motives. Various lobby groups immediately jumped up: GM food was dangerous, should be banned, and more money should go into this line of research. What had happened was the following. A researcher claimed to have shown that GM food was bad for health. He presented these results to a number of media people, who distributed them. (Remember the fuss about "cold fusion".) The results were presented in a paper sent to a scientific journal for publication. The journal, however, did not publish the article. It was dismissed because the results were not based on a sound scientific method. The researcher had fed rats with potatoes. One group of rats got GM potatoes, the other group got normal non-GM potatoes. The rats that got GM potatoes seemed to develop cancer more often than the control group. The statistical difference between the groups was not big, but sufficiently big for those wanting to confirm their a priori belief that GM food is bad. A somewhat embarrassing detail, never reported in the media, is that rats in general do not like potatoes. As a consequence, both groups of rats in this study were suffering from starvation, which severely affected the test. It was not possible to determine whether the difference between the two groups was caused by starvation or by GM food. Once the researcher conditioned on the effects of starvation, the difference became insignificant.

This is an example of junk science: bad science getting a lot of media exposure because the results fit the interests of lobby groups and can be used to scare people. The lesson for econometricians is obvious: if you come up with "good" results you get rewarded, while "bad" results can quickly be forgotten. The GM food example is extreme. Econometric work seldom gets such media coverage, though there are examples, such as claims that Sweden's economic growth is lower than in other similar countries, or the assumed dynamic effects of a reduction of marginal taxes. There are significant results that depend on one single outlier. Once the outlier is removed, the significance is gone, and the whole story built on that result is also gone (a small simulation at the end of this chapter illustrates the point). In these lectures we will argue that the only way to avoid junk econometrics is careful and systematic construction and testing of models. Basically, this is the modern econometric time series approach. Why is this modern, and why stress the idea of testing? The answers are simply that careers have been built on running junk econometric equations, and most people are unfamiliar with scientific methods in general and with the consequences of living in a world surrounded by random variables in particular.
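The single-outlier problem just described is easy to demonstrate by simulation. The sketch below regresses two unrelated series on each other, with and without one extreme observation; the slope turns "significant" only when the outlier is included. This is a minimal, hypothetical illustration assuming NumPy is available; the data and all names are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 30
x = rng.normal(0, 1, n)
y = rng.normal(0, 1, n)  # y is unrelated to x by construction

def ols_t_stat(x, y):
    """t-statistic of the slope in a simple regression of y on x."""
    n = len(x)
    xc, yc = x - x.mean(), y - y.mean()
    beta = np.sum(xc * yc) / np.sum(xc ** 2)
    alpha = y.mean() - beta * x.mean()
    resid = y - alpha - beta * x
    s2 = np.sum(resid ** 2) / (n - 2)          # residual variance
    se_beta = np.sqrt(s2 / np.sum(xc ** 2))    # standard error of the slope
    return beta / se_beta

# One extreme, influential observation is enough to create "significance".
x_out = np.append(x, 10.0)
y_out = np.append(y, 10.0)

print(f"t-stat without outlier: {ols_t_stat(x, y):.2f}")
print(f"t-stat with one outlier: {ols_t_stat(x_out, y_out):.2f}")
```

Systematic residual and stability diagnostics, of the kind developed in later chapters, are what catch this sort of spurious result.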

2. INTRODUCTION TO ECONOMETRIC TIME SERIES

"Time is a great teacher, but unfortunately it kills all its pupils." Louis Hector Berlioz

A time series is simply data ordered by time. For an econometrician a time series is usually data that is also generated over time, in such a way that time can be seen as a driving factor behind the data. Time series analysis comprises approaches that look for regularities in these data ordered by time. In comparison with other academic fields, the modeling of economic time series is characterized by the following problems, which partly motivate why econometrics is a subject of its own:

The empirical sample sizes in economics are generally small, especially compared with many applications in physics or biology, where anything below 500 observations is considered a small sample.

Economic time series are dependent in the sense that they are correlated with other economic time series. In economic science, problems are almost never concerned with univariate series. Consumption, as an example, is a function of income; at the same time, consumption also affects income directly and through various other variables.

Economic time series are often dependent over time. Many series display high autocorrelation, as well as cross autocorrelation with other variables over time.

Economic time series are generally non-stationary. Their means and variances change over time, implying that estimated parameters might follow unknown distributions instead of standard tabulated distributions like the normal distribution. Non-stationarity arises from productivity growth and price inflation. Non-stationary economic series appear to be integrated, driven by stochastic trends, perhaps as a result of stochastic changes in total factor productivity. Integrated variables, and in particular the need to model them, are not that common outside economics. In some situations, therefore, inference in econometrics becomes quite complicated and requires the development of new statistical techniques for handling stochastic trends. The concepts of cointegration and common trends, and the recently developed asymptotic theory for integrated variables, are examples of this.

Economic time series cannot be assumed to be drawn from samples in the way assumed in classical statistics. The classical approach is to start from a population from which a sample is drawn. Since the sampling process can be controlled, the variables which make up the sample can be seen as random variables. Hypotheses are then formulated and tested conditionally on the assumption that the random variables have a specific distribution. Economic time series are seldom random variables drawn from some underlying population in the classical statistical sense. Observations do not represent a random sample in the classical statistical sense, because the econometrician cannot control the sampling process. Variables like GDP, money, prices and dividends are given from history. To get a different sample we would have to re-run history, which of course is impossible. The way statistical theory deals with this situation is to reverse the approach taken in classical statistical analysis, and build a model that describes the behavior of the observed data. A model which achieves this is called a well defined statistical model; it can be understood as a parsimonious, time invariant model with white noise residuals, that makes sense from economic theory.

Finally, from the view of economics, the subject of statistics deals mainly with the estimation of, and inference about, covariances only. The econometrician, however, must also give estimated parameters an economic interpretation. This problem cannot always be solved ex post, after the model has been estimated. When it comes to time series, economic theory is an integrated part of the modeling process. Given a well defined statistical model, estimated parameters should represent the behavior of economic agents. Many econometric studies fail because researchers assume that their estimates can be given an economic interpretation without considering the statistical properties of the model, or the simple fact that there is in general not a one-to-one correspondence between observed variables and the concepts defined in economic theory.

2.1 Programs

Here is a list of statistical software that you should be familiar with; please google them (those recommended for time series are marked with *):

- *RATS and CATS in RATS: Regression Analysis of Time Series and Cointegrating Analysis of Time Series
- *PcGive: comes highly recommended. Included in the OxMetrics modules; see also Timberlake Consultants for more programs.
- *Gretl (free GNU license, very good for students in econometrics)
- *JMulti (free, for multivariate time series analysis; updated? The discussion forum is quite dead)
- *EViews
- Gauss (good for simulation)
- STATA (used by the World Bank, good for microeconometrics and panel data, OK on time series)
- LIMDEP (mostly free with some editions of Greene's econometrics textbook? You need to pay for duration models?)
- SAS - Statistical Analysis System (good for big data sets, but not time series; mainly medicine, "the calculus program for decision makers")
- Shazam

And more; some are very special programs for this and that, but I don't find them worth mentioning in this context. For a recent discussion about the controversies in econometrics, see The Economic Journal.

There is a bunch of software that allows you to program your own models or use other people's modules:

- Matlab
- R (free, GNU license, connects with Gretl)
- Ox

You should also know about C, C++, and LaTeX to be a good econometrician. Please google. For Data Envelopment Analysis (DEA) I recommend Tom Coelli's DEAP 2.1 or Paul W. Wilson's FEAR.

2.2 Different types of time series

Given the general definition of time series above, there are many types of time series. The focus in econometrics, macroeconomics and finance is on stochastic time series, typically in the time domain, which are non-stationary in levels but become what is called covariance stationary after differencing. In a broad perspective, time series analysis typically aims at making time series more understandable by decomposing them into different parts. The aim of this introduction is to give a general overview of the subject.

A time series is any sequence ordered by time. The sequence can be either deterministic or stochastic. The primary interest in economics is in stochastic time series, where the sequence of observations is made up of the outcomes of random variables. A sequence of stochastic variables ordered by time is called a stochastic time series process. The random variables that make up the process can either be discrete random variables, taking on a given set of integer numbers, or continuous random variables, taking on any real number between $\pm\infty$. While discrete random variables are possible, they are not that common in economic time series research.

Another dimension in modeling time series is to consider processes in discrete time or in continuous time. The principal difference is that stochastic variables in continuous time can take different values at any point in time. In a discrete time process, the variables are observed at fixed intervals of time (t), and they are assumed not to change between these observation points. Truly discrete time variables are not common in finance and economics; there are few, if any, variables that remain fixed between their points of observation. The distinction between continuous time and discrete time is not a matter of measurability alone. A common mistake is to be confused by the fact that economic variables are measured at discrete time intervals. The money stock is generally measured and recorded as an end-of-month value. This way of measuring the stock of money does not imply that it remains unchanged between the observation intervals; instead it changes whenever the money market is open. The same holds for variables like production and consumption. These activities take place 24 hours a day, during the whole year. They are measured as the flow of income and consumption over a period, typically a quarter, representing the integral sum of these activities. Usually, a discrete time variable is written with a time subscript ($x_t$) while a continuous time variable is written as $x(t)$. The continuous time approach has a number of benefits, but the cost and quality of the empirical results seldom motivate it. It is better to use discrete time approaches as an approximation to the underlying continuous time system. The cost of this simplification is small compared with the complexity of continuous time analysis. This should not be understood as a rejection of all continuous time approaches. Continuous time is good for analyzing a number of well defined problems, like aggregation over time and individuals. In the end it should lead to a better understanding of adjustment speeds, stability conditions and interactions among economic time series, see Sjöö (1990, 1995).²

In addition, stochastic time series can be analysed in the time domain or in the frequency domain. In the time domain the data is analysed ordered in given time periods such as days, weeks, years etc. The frequency approach decomposes time series into frequencies by using trigonometric functions like sines and cosines. Spectral analysis is an example of analysis that uses the frequency domain, to identify regularities such as seasonal factors, trends, and systematic lags in adjustment. The main advantage of analysing time series in the frequency domain is that it is relatively easy to handle continuous time processes and observations recorded as aggregations over time, such as consumption. However, in economics and finance we are typically faced with given observations at given frequencies, and we seek to study the behavior of agents operating in real time. Under these circumstances, the time domain is the most interesting road ahead, because it has a direct intuitive appeal to both economists and policy makers.

Our interest is usually in analysing discrete time stochastic processes in the time domain. A time series process is generally indicated with brackets, like $\{y_t\}$. In some situations it is necessary to be more precise about the length of the process. Writing $\{y_t\}_1^{\infty}$ indicates that the process starts at period one and continues infinitely. The process consists of random variables because we can view each element in $\{y_t\}$ as a random variable. Let the process go from the integer values 1 up to T. If necessary, to be exact, the first variable in the process can be written as $y_{t_1}$, the second variable as $y_{t_2}$, etc., up until $y_{t_T}$. The distribution function of the process can then be written as $F(y_{t_1}, y_{t_2}, \ldots, y_{t_T})$.
² We can also mention the different types of series that are used: stocks, flows and price variables. Stocks are variables that can be observed at a point in time, like the money stock or inventories. Flows are variables that can only be observed over some period, like consumption or GDP. In this context price variables include prices, interest rates and similar variables which can be observed in a market at a given point in time. Combining these variables into a multivariate process, and constructing econometric models from observed variables in discrete time, produces further problems, and in general they are quite difficult to solve without using continuous time methods. Usually, careful discrete time modelling will reduce the problems to a large extent.
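As a small illustration of the time domain versus frequency domain distinction discussed above, the sketch below simulates a quarterly series with a seasonal cycle, describes it in the time domain through its sample autocorrelations, and in the frequency domain through a periodogram computed with a discrete Fourier transform. This is a toy example assuming NumPy is available; the series and all names are my own inventions.

```python
import numpy as np

# Simulate 200 quarterly observations: seasonal cycle plus white noise.
rng = np.random.default_rng(0)
T = 200
t = np.arange(T)
y = 2.0 * np.sin(2 * np.pi * t / 4) + rng.normal(0, 1, T)  # period = 4 quarters

# Time-domain description: sample autocorrelations at lags 1..8.
y_c = y - y.mean()
acf = [np.sum(y_c[k:] * y_c[:T - k]) / np.sum(y_c ** 2) for k in range(1, 9)]
print("ACF(1..8):", np.round(acf, 2))  # a large positive value appears at lag 4

# Frequency-domain description: the periodogram |DFT|^2 / T.
dft = np.fft.rfft(y_c)
freqs = np.fft.rfftfreq(T, d=1.0)            # in cycles per quarter
periodogram = (np.abs(dft) ** 2) / T
peak = freqs[np.argmax(periodogram)]
print(f"Spectral peak at frequency {peak:.3f} (seasonal frequency = 0.25)")
```

The same seasonal regularity shows up in both domains: as a spike in the autocorrelation function at lag 4, and as a spectral peak at frequency 0.25.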

In some situations it is necessary to start from the very beginning. A time series is data ordered by time. A stochastic time series is a set of random variables ordered by time. Let $\tilde{Y}_{it}$ represent the stochastic variable $\tilde{Y}_i$ at time t. Observations on this random variable are often indicated as $y_{it}$. In general terms, a stochastic time series is a series of random variables ordered by time. A series starting at time t = 1 and ending at time t = T, consisting of T different random variables, is written as $\{\tilde{Y}_{1,1}, \tilde{Y}_{2,2}, \ldots, \tilde{Y}_{T,T}\}$. Of course, assuming that the series is built up by individual random variables, each with its own independent probability distribution, is a complex thought. But nothing in our definition of stochastic time series rules out that the data is made up of completely different random variables. Sometimes, to understand and find solutions to practical problems, it will be necessary to go all the way back to the most basic assumptions.

Suppose we are given a time series consisting of yearly observations of interest rates, {6.6, 7.5, 5.9, 5.4, 5.5, 4.5, 4.3, 4.8}. The first question to ask is whether this is a stochastic series in the sense that these numbers were generated by one stochastic process, or perhaps by several different stochastic processes. Further questions would be whether the process or processes are best represented as continuous or discrete, and whether the observations are independent or dependent. Quite often we will assume that the series is generated by the same identical stochastic process in discrete time. Based on these assumptions the modelling process tries to find systematic historical patterns and cross-correlations with other variables in the data.

All time series methods aim at decomposing the series into separate parts in some way. The standard approach in time series analysis is to decompose the series as

$y_t = T_{t,d} + S_{t,d} + C_{t,d} + I_t,$

where $T_{t,d}$ and $S_{t,d}$ represent (deterministic) trend and seasonal components, $C_{t,d}$ is a deterministic cyclical component and $I_t$ is a process representing irregular factors.³ For time series econometrics this definition is limited, since the econometrician is highly interested in the irregular component. As an alternative, let $\{y_t\}$ be a stochastic time series process, which is composed as

$y_t = \text{systematic components} + \text{unsystematic components} = T_d + T_s + S_d + S_s + y^*_t + e_t, \quad (2.1)$

where the systematic components include deterministic trends $T_d$, stochastic trends $T_s$, deterministic seasonals $S_d$, stochastic seasonals $S_s$, a stationary process (or the short-run dynamics) $y^*_t$, and finally a white noise innovation term $e_t$. The modeling problem can be described as the problem of identifying the systematic components such that the residual becomes a white noise process. For all series, remember that any inference is potentially wrong if not all components have been modeled correctly. This is so regardless of whether we model a simple univariate series with time series techniques, a reduced system, or a structural model. Inference is only valid for a correctly specified model.

³ For simplicity we assume a linear process. An alternative is to assume that the components are multiplicative, $x_t = T_{t,d} \cdot S_{t,d} \cdot C_{t,d} \cdot I_t$.
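To make the decomposition in (2.1) concrete, the following sketch simulates a series as the sum of a deterministic trend, a stochastic trend (a random walk), a deterministic seasonal, a stationary AR(1) component and white noise. It is a toy illustration assuming NumPy; all parameter values and names are my own choices, and a real series of course arrives already summed, so the components must be identified by the methods of later chapters.

```python
import numpy as np

rng = np.random.default_rng(42)
T = 200
t = np.arange(T)

# Systematic components of equation (2.1)
T_d = 0.05 * t                              # deterministic trend
T_s = np.cumsum(rng.normal(0, 0.3, T))      # stochastic trend: a random walk
S_d = 1.5 * np.sin(2 * np.pi * t / 4)       # deterministic seasonal (quarterly)
# (a stochastic seasonal S_s is omitted here for brevity)

# Stationary short-run dynamics y*_t: an AR(1) with coefficient 0.7
y_star = np.zeros(T)
eps = rng.normal(0, 1, T)
for i in range(1, T):
    y_star[i] = 0.7 * y_star[i - 1] + eps[i]

e = rng.normal(0, 0.5, T)                   # white noise innovation

y = T_d + T_s + S_d + y_star + e            # the observed series

# The modeling task runs in reverse: given only y, identify the systematic
# components so that what is left over is white noise.
print(y[:8].round(2))
```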

2.3 Repetition - Your First Courses in Statistics and Econometrics

To be completed...

In your first course in statistics you learned how to use descriptive statistics: the mean and the variance. Next you learned to calculate the mean and the variance from a sample that represents the whole underlying population. For the mean and the variance to work as a description of the underlying population it is necessary to construct the sample in such a way that the difference between the sample mean and the true population mean is non-systematic, meaning that the difference between the sample mean and the population mean is unpredictable. This means that your estimated sample mean is a random variable with known characteristics. The most important thing is to construct a sampling mechanism so that the mean calculated from the sample has the characteristics you want it to have: the estimated mean should be unbiased, efficient and consistent. You also learned about random variables, probabilities, distribution functions and frequency distributions.

Your first course in econometrics

"A theory should be as simple as possible, but not simpler." Albert Einstein

To be completed...

- Random variables, OLS, minimizing the sum of squares, assumptions 1-5(6), understanding multiple regression, multicollinearity, properties of the OLS estimator
- Matrix algebra
- Tests and solutions for heteroscedasticity (cross-section) and autocorrelation (time series). If you took a good course you should have learned the three golden rules: test, test, test, and learned about the properties of the OLS estimator.
- Generalized least squares, GLS
- System estimation: demand and supply models
- Further extensions: panel data, Tobit, Heckit, discrete choice, probit/logit, duration
- Time series: distributed lag models, partial adjustment models, error correction models, lag structure, stationarity vs. non-stationarity, co-integration

What you need to know... What you probably do not know but should know.

OLS

Ordinary least squares is a common estimation method. Suppose there are two series $\{y_t, x_t\}$ and the model

$y_t = \alpha + \beta x_t + \varepsilon_t.$

Minimize the sum of squares over the sample $t = 1, 2, \ldots, T$,

$S = \sum_{t=1}^{T} \varepsilon_t^2 = \sum_{t=1}^{T} (y_t - \alpha - \beta x_t)^2.$

Take the derivatives of S with respect to $\alpha$ and $\beta$, set the expressions to zero, $\partial S / \partial \alpha = 0$ and $\partial S / \partial \beta = 0$, and solve for $\hat{\alpha}$ and $\hat{\beta}$, giving

$\hat{\beta} = \frac{\sum_{t=1}^{T} (x_t - \bar{x})(y_t - \bar{y})}{\sum_{t=1}^{T} (x_t - \bar{x})^2}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta} \bar{x}.$

The total sum of squares decomposes as

$TSS = ESS + RSS,$

$1 = \frac{ESS}{TSS} + \frac{RSS}{TSS},$

$R^2 = 1 - \frac{RSS}{TSS} = \frac{ESS}{TSS}.$

Basic assumptions:

1) $E(\varepsilon_t) = 0$ for all t
2) $E(\varepsilon_t^2) = \sigma^2$ for all t
3) $E(\varepsilon_t \varepsilon_{t-k}) = 0$ for all $k \neq 0$
4) $E(X_t \varepsilon_t) = 0$
5) $E(X'X) \neq 0$
6) $\varepsilon_t \sim NID(0, \sigma^2)$

Discuss these properties:

- Properties: Gauss-Markov, BLUE
- Deviations: misspecification (adding an extra variable, forgetting a relevant variable), multicollinearity, errors-in-variables problems, homoscedasticity vs. heteroscedasticity, autocorrelation
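As a check on the OLS formulas above, here is a minimal sketch that simulates data from $y_t = \alpha + \beta x_t + \varepsilon_t$ and computes $\hat{\beta}$, $\hat{\alpha}$ and $R^2$ directly from the least-squares expressions. It assumes NumPy; the true parameter values are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100
alpha_true, beta_true = 1.0, 0.5

x = rng.normal(0, 2, T)
eps = rng.normal(0, 1, T)        # satisfies assumptions 1)-4) and 6) by construction
y = alpha_true + beta_true * x + eps

# OLS estimates from the closed-form solution of the normal equations
beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * x.mean()

# R^2 from the sum-of-squares decomposition TSS = ESS + RSS
resid = y - alpha_hat - beta_hat * x
RSS = np.sum(resid ** 2)
TSS = np.sum((y - y.mean()) ** 2)
R2 = 1 - RSS / TSS

print(f"beta_hat = {beta_hat:.3f}, alpha_hat = {alpha_hat:.3f}, R^2 = {R2:.3f}")
```

With 100 observations the estimates should land close to the true values of 1.0 and 0.5, which is the unbiasedness and consistency of OLS at work under the assumptions listed above.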


Part I
Basic Statistics


3. TIME SERIES MODELING - AN OVERVIEW

Economists are generally interested in a small part of what is normally included in the subject of time series analysis. Various techniques such as filtering, smoothing and interpolation, developed for deterministic time series, are of relatively minor interest for economists. Time series econometrics is more focused on the stochastic part of time series. The following is a brief overview of time series modeling from an econometric perspective. It is not a textbook in mathematical statistics, nor is the ambition to be extremely rigorous in the presentation of statistical concepts. The aim is more to be a guide for the not yet so informed economist who wants to know more about the statistical concepts behind time series econometrics. When approaching time series econometrics the statistical vocabulary quickly grows and can become overwhelming. These first two chapters seek to make it possible for people without deeper knowledge of mathematical statistics to read and follow the econometric and financial time series literature.

A time series is simply a set of observations ordered by time. Time series techniques seek to decompose this ordered series into different components, which in turn can be used to generate forecasts, to learn about the dynamics of the series, and to learn how it relates to other series. There are a number of dimensions and decisions to keep account of when approaching this subject. First, the series, or the process, can be univariate or multivariate, depending on the problem at hand. Second, the series can be stochastic or purely deterministic. In the former case a stochastic random process is generating the observations. Third, given that the series is stochastic, with perhaps deterministic components, it can be modeled in the time domain or in the frequency domain. Modeling in the frequency domain implies describing the series in terms of sine and cosine functions of different wavelengths. This is a useful approach for solving some problems, but not a general approach for economic time series modeling. Fourth, the data generating process and the statistical model can be constructed in continuous or discrete time. Continuous time econometrics is good for some problems but not all, and in general it leads to more complex models. A discrete time approach builds on the assumption that the observed data is unchanged between the intervals of observation. This is a convenient approximation, which makes modeling easier but comes at a cost in the form of aggregation biases. In the general case, however, this is a low cost compared with the costs of general misspecification. A special chapter deals with the discussion of discrete versus continuous time modeling.

The typical economic time series is a discrete stochastic process modeled in the time domain. Time series can be modelled by smoothing and filter techniques. For economists these techniques are generally uninteresting, though we will briefly come back to the concept of filters. The simplest way to model an economic time series is to use autoregressive techniques, or ARIMA techniques in the general case. Most economic time series, however, are better modeled as part of a multivariate stochastic process. Economic theory deals with systems of economic variables, leading to single equation transfer functions and systems of equations in a VAR model. These techniques are descriptive; they do not identify structural, or deep, parameters like elasticities, marginal propensities to consume etc.

To estimate more specific economic models, we turn to techniques such as VECM, SVAR, and structural VECM. What is outlined above is quite different from the typical basic econometric textbook approach, which starts with OLS and in practice ends with GLS as the solution to all problems. Here we will develop methods which first describe the statistical properties of the (joint) series at hand, and then allow the researcher to answer economic questions in such a way that the conclusions are statistically and economically valid. To get there we have to start with some basic statistics.

3.1 Statistical Models

A general definition of statistical time series analysis is that it finds a mathematical model that links observed variables with the stochastic mechanism that generated the data. This sounds abstract, but the purpose of this abstraction is to understand the analytical tools of time series statistics. The practical problem is the following: we have some stochastic observations over time. We know that these observations have been generated by a process, but we do not know what this process looks like. Statistical time series analysis is about developing the tools needed to mimic the unknown data generating process (DGP).

We can formulate some general features of the model. First, it should be a well-defined statistical model in the sense that the assumptions behind the model should be valid for the data chosen. Later we will define more exactly what this implies for an econometric model. For the time being, we can say that the single most important criterion is that the residuals should be a white noise process. Second, the parameters of the model should be stable over time. Third, the model should be simple, or parsimonious, meaning that its functional form should be simple. Fourth, the model should be parameterized in such a way that it is possible to give the parameters a clear interpretation and identify them with events in the real world. Finally, the model should be able to explain rival models describing the dependent variable(s). The way to build a well-defined statistical model is to investigate the underlying assumptions of the model in a systematic way. It can easily be shown that t-values, $R^2$, and Durbin-Watson values are not sufficient for determining the fit of a model. In later chapters we will introduce a systematic test procedure.

The final aim of econometric modelling is to learn about economic behavior. To some extent this always implies using some a priori knowledge in the form of theoretical relationships. Economists, in general, have extremely strong a priori beliefs about the size and sign of certain parameters. This way of thinking has led to much confusion, because a priori beliefs can be driven too far. Econometrics is basically about measuring correlations. It is a common misunderstanding among non-econometricians that correlations can be too high or too low, or be deemed right or wrong. Measured correlations are the outcome of the data used, only. Anyone who thinks of an estimated correlation as wrong must also explain what went wrong in the estimation process, which requires knowledge of econometrics and the real world.
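To illustrate the white-noise criterion, the sketch below fits an AR(1) by OLS to simulated data and computes a Ljung-Box Q statistic on the residuals; a small Q (a large p-value) is consistent with white noise. This is a minimal sketch assuming NumPy and SciPy, not the systematic test procedure introduced in later chapters, and all names are my own.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(7)
T = 300

# Simulate an AR(1): y_t = 0.6 y_{t-1} + e_t
y = np.zeros(T)
e = rng.normal(0, 1, T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + e[t]

# Fit y_t = phi * y_{t-1} + residual by OLS
phi_hat = np.sum(y[1:] * y[:-1]) / np.sum(y[:-1] ** 2)
resid = y[1:] - phi_hat * y[:-1]

# Ljung-Box Q over the first h residual autocorrelations:
# Q = n(n+2) * sum_k rho_k^2 / (n-k), approximately chi-square with h d.o.f.
n, h = len(resid), 10
r = resid - resid.mean()
rho = np.array([np.sum(r[k:] * r[:n - k]) for k in range(1, h + 1)]) / np.sum(r ** 2)
Q = n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, h + 1)))
p_value = chi2.sf(Q, df=h)  # strictly, d.o.f. should be reduced by the one AR parameter fitted

print(f"phi_hat = {phi_hat:.3f}, Q({h}) = {Q:.2f}, p-value = {p_value:.3f}")
```

Because the fitted model matches the true DGP here, the residuals should pass the test; refitting with a deliberately wrong model (for example, regressing on a constant only) would drive the p-value towards zero.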

3.2 Random Variables

The basic reason for dealing with stochastic models rather than deterministic models is that we are faced with random variables. A popular definition of random variables goes like this: a random variable is a variable that can take on more than one value.¹ For every possible value that a random variable can take on, there is a number between zero and one that describes the probability that the random variable will take on this value. In the following, a random variable is indicated with a tilde, as in $\tilde{X}$.

In statistical terms, a random variable is associated with the outcome of a statistical experiment. All possible outcomes of such an experiment make up the sample space. If S is a sample space with a probability measure, and if X is a real-valued function defined over S, then X is called a random variable. There are two types of random variables: discrete random variables, which only take on a specific number of real values, and (absolutely) continuous random variables, which can take on any value between $\pm\infty$. It is also possible to examine discontinuous random variables, but we will limit ourselves to the first two types.

If the discrete random variable X can take on k values $(x_1, \ldots, x_k)$, the probability of observing a value $x_j$ can be stated as

$P(x_j) = p_j. \quad (3.1)$

Since probabilities of discrete random variables are additive, the probability of observing one of the k possible outcomes is equal to 1.0, or, using the notation just introduced,

$P(x_1, x_2, \ldots, \text{or } x_k) = p_1 + p_2 + \cdots + p_k = 1. \quad (3.2)$

A discrete random variable is described by its probability function, $F(x_i)$, which specifies the probability with which X takes on a certain value. (The term cumulative distribution is used synonymously with probability function.)

In time series econometrics we are in most applications dealing with continuous random variables. Unlike discrete variables, it is not possible to associate a specific observation with a certain probability, since these variables can take on an infinite range of numbers. The probability that a continuous random variable will take on a certain value is always zero. Because the variable is continuous, we cannot distinguish between 1.01 and values arbitrarily close to it. This does not mean that the variables do not take on specific values. The outcome of the experiment, or the observation, is of course always a given number. Thus, for a continuous random variable, statements of probability must be made in terms of the probability that the random variable X is less than or equal to some specific value. We express this with the distribution function F(x) of the random variable X as follows:

$F(x) = P(\tilde{X} \le x) \quad \text{for } -\infty < x < \infty, \quad (3.3)$

which states the probability of X taking a value less than or equal to x. The continuous analogue of the probability function is called the density function f(x), which we get by differentiating the distribution function with respect to the observations (x):

$\frac{dF(x)}{dx} = f(x). \quad (3.4)$

¹ Random variables (RV:s) are also called stochastic variables, chance variables, or variates.
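As a quick numerical check of (3.3) and (3.4), the sketch below differentiates the standard normal distribution function numerically and compares the result with the density at the same point. It assumes SciPy's norm for the distribution and density functions; the evaluation point is arbitrary.

```python
import numpy as np
from scipy.stats import norm

# Check (3.4): the density f(x) is the derivative of the distribution
# function F(x), here for the standard normal distribution.
x = 0.7
h = 1e-6
numerical_derivative = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)

print(f"dF/dx at {x}: {numerical_derivative:.6f}")
print(f"f({x})     : {norm.pdf(x):.6f}")
```

The two numbers agree to several decimal places, which is exactly the relationship stated in (3.4).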

The fundamental theorem of integral calculus gives us the following expression for the probability that X takes on a value less than or equal to x:

$F(x) = \int_{-\infty}^{x} f(u)\,du. \quad (3.5)$

It follows that for any two constants a and b, with a < b, the probability that X takes on a value in the interval from a to b is given by

$F(b) - F(a) = \int_{-\infty}^{b} f(u)\,du - \int_{-\infty}^{a} f(u)\,du \quad (3.6)$

$= \int_{a}^{b} f(u)\,du. \quad (3.7)$

The term density function is used in a way that is analogous to density in physics. Think of a rod of variable density, measured by the function f(x). To obtain the weight of some given length of this rod, we would have to integrate its density function over that particular part in which we are interested. Random variables are described by their density function and/or by their moments: the mean, the variance etc. Given the density function, the moments can be determined exactly. In statistical work, we must first estimate the moments, and from the moments we can learn about the density function. For instance, we can test if the assumption of an underlying normal density function is consistent with the observed data. A random variable can be predicted; in other words, it is possible to form an expectation of its outcome based on its density function. Appendix III deals with the expectations operator and other operators related to random variables.

3.3 Moments of random variables

Random variables are characterized by their probability density functions (pdf's) or their moments. In the previous section we introduced pdf's. Moments refers to measurements such as the mean, the variance, skewness, etc. If we know the exact density function of a random variable, then we also know the moments. In applied work, we will typically first calculate the moments from a sample, and from the moments figure out the density function of the variables. The term moment originates from physics and the moment of a pendulum. For our purposes it can be thought of as a general term which includes the definition of concepts like the mean and the variance, without referring to any specific distribution.

Starting with the first moment, the mathematical expectation of a discrete random variable is given by

$E(\tilde{X}) = \sum x f(x), \quad (3.8)$

where E is the expectation operator and f(x) is the value of its probability function at x. Thus, $E(\tilde{X})$ represents the mean of the discrete random variable $\tilde{X}$, or, in other words, the first moment of the random variable. For a continuous random variable $\tilde{X}$, the mathematical expectation is

$E(\tilde{X}) = \int_{-\infty}^{\infty} x f(x)\,dx, \quad (3.9)$

where f(x) is the value of its probability density at x. The first moment can also be referred to as the location of the random variable. Location is a more generic concept than the first moment or the mean.

The term moment is used in situations where we are interested in the expected value of a function of a random variable, rather than the expectation of the specific variable itself. Say that we are interested in $\tilde{Y}$, whose values are related to $\tilde{X}$ by the equation $y = g(x)$. The expectation of $\tilde{Y}$ is equal to the expectation of $g(\tilde{X})$, since $E(\tilde{Y}) = E[g(\tilde{X})]$. In the continuous case this leads to

$E(\tilde{Y}) = E[g(\tilde{X})] = \int_{-\infty}^{\infty} g(x) f(x)\,dx. \quad (3.10)$

Like density, the term moment, or moment about the origin, has its explanation in physics. (In physics the length of a lever arm is measured as the distance from the origin. Or, if we refer to the example with the rod above, the first moment around the mean would correspond to the horizontal center of gravity of the rod.) Reasoning from intuition, the mean can be seen as the midpoint of the limits of the density. The midpoint can be scaled in such a way that it becomes the origin of the x-axis. The term moments of a random variable is a more general way of talking about the mean and variance of a variable. Setting g(x) equal to $x^r$, we get the r:th moment around the origin,

$\mu'_r = E(\tilde{X}^r) = \sum x^r f(x), \quad (3.11)$

when $\tilde{X}$ is a discrete variable. In the continuous case we get

$\mu'_r = E(\tilde{X}^r) = \int_{-\infty}^{\infty} x^r f(x)\,dx. \quad (3.12)$

The first moment is nothing else than the mean, or the expected value, of $\tilde{X}$. The second moment is the variance. Higher moments give additional information about the distribution and density functions of random variables. Now, defining $g(\tilde{X}) = (\tilde{X} - \mu)^r$, we get what is called the r:th moment about the mean of the distribution of the random variable $\tilde{X}$. For r = 0, 1, 2, 3, ... we get, for a discrete variable,

$\mu_r = E[(\tilde{X} - \mu)^r] = \sum (x - \mu)^r f(x), \quad (3.13)$

and when $\tilde{X}$ is continuous,

$\mu_r = E[(\tilde{X} - \mu)^r] = \int_{-\infty}^{\infty} (x - \mu)^r f(x)\,dx. \quad (3.14)$

The second moment about the mean, also called the second central moment, is nothing else than the variance:

$var(\tilde{X}) = \int_{-\infty}^{\infty} [x - E(\tilde{X})]^2 f(x)\,dx \quad (3.15)$

$= \int_{-\infty}^{\infty} x^2 f(x)\,dx - [E(\tilde{X})]^2 \quad (3.16)$

$= E(\tilde{X}^2) - [E(\tilde{X})]^2, \quad (3.17)$

where f(x) is the value of the probability density function of the random variable $\tilde{X}$ at x. A more generic expression for the variance is dispersion. We can say that