PITFALLS IN TIME SERIES ANALYSIS. Cliff Hurvich Stern School, NYU




The t-Test

If x_1, ..., x_n are independent and identically distributed with mean 0, and n is not too small, then

t = x̄ / (s / √n)

has approximately a standard normal distribution, where x̄ is the sample mean and s is the sample standard deviation. But if the true mean µ is (say) positive, then t will typically be large, in the right tail of a standard normal distribution. If, for example, the t-statistic is 3, we would have strong evidence that the true population mean is not zero. Indeed, the probability that a standard normal exceeds 3 is just .0013. So by looking at t-statistics, we can draw conclusions from the data while controlling the error rates (false positives, false negatives).

Consider a data set of monthly global temperatures (n = 1632). Is the plot sloping up (global warming), or is it just an illusion?
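As a quick illustration (my own, not part of the talk), the t-statistic can be computed directly from its definition; the sample values below are made up:

```python
import math
import statistics

def t_statistic(data, mu0=0.0):
    """One-sample t-statistic for H0: population mean equals mu0."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)          # sample standard deviation
    return (xbar - mu0) / (s / math.sqrt(n))

# A sample clearly centered away from zero gives a large t-statistic.
sample = [1.1, 0.9, 1.3, 0.8, 1.2, 1.0, 0.95, 1.05]
print(round(t_statistic(sample), 2))
```

With this sample the statistic comes out far above 3, so we would reject the hypothesis that the population mean is zero.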

[Figure: Temperatures, Northern Hemisphere, Monthly 1854-1989, Seasonally Adjusted. Degrees C (y-axis) vs. Month (x-axis).]

A simple approach to this: look at the monthly changes in temperature and test whether these changes have a zero population mean. We get x̄ = .000754 Degrees C / Month and t = 0.11. No evidence of global warming.

Another way to approach the problem: run a simple linear regression of the temperatures on a time variable. The estimated slope is β̂ = .000322 Degrees C / Month, and the t-statistic for the slope is t = 22.2. Now we get strong evidence of global warming!

There's something strange here, since two apparently reasonable methods give completely different results. What's the problem?
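The puzzle can be reproduced on simulated data. The sketch below (my own illustration, not the actual temperature series) generates a weak linear trend plus strongly autocorrelated AR(1) noise, then applies both methods: the t-test on the differences comes out small, while the naive regression t-statistic is large.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical series: a weak trend plus strongly autocorrelated AR(1) noise,
# loosely mimicking the temperature example (parameters are made up).
n = 1500
slope_true = 0.0003
x, e = [], 0.0
for t in range(n):
    e = 0.9 * e + random.gauss(0, 0.1)
    x.append(slope_true * t + e)

# Method 1: t-test on the monthly changes (first differences).
diffs = [x[t] - x[t - 1] for t in range(1, n)]
t_diff = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))

# Method 2: naive OLS regression of the series on time.
tbar = (n - 1) / 2
xbar = statistics.mean(x)
sxx = sum((t - tbar) ** 2 for t in range(n))
beta = sum((t - tbar) * (x[t] - xbar) for t in range(n)) / sxx
resid = [x[t] - xbar - beta * (t - tbar) for t in range(n)]
s2 = sum(r * r for r in resid) / (n - 2)
t_slope = beta / math.sqrt(s2 / sxx)

print(f"differences t = {t_diff:.2f}, naive regression slope t = {t_slope:.2f}")
```

Both methods see the same data, yet the two t-statistics disagree wildly, just as in the temperature example; the rest of the talk explains why.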

Regression is also used for prediction. Let's try predicting this month's stock return (y_t) based on three logged financial ratios from the previous month (time t−1). Data for NYSE, December 1963 - December 1994 (n = 385). The t-statistics for the least-squares coefficients of log dividend yield, log book-to-market ratio and log earnings-to-price ratio are 3.02, 2.40 and 2.43, respectively. So we have strong evidence of predictability of stock returns based on past financial ratios.

Now, let's see if the current stock price can be predicted from the past stock price. Consider the Russell 2000 stock index. The slope in the linear regression of today's price on yesterday's price is β̂ = .994, with a t-statistic of t = 260. So price is highly predictable from past prices.

[Figure: Today's vs. Yesterday's Russell 2000 Index, July 27, 2000 - Jan 22, 2003, n = 615. Russell (y-axis) vs. lagrussell (x-axis).]

Of course, to make money, we have to predict returns. The scatterplot indicates that returns are not very predictable. Linear regression of today's returns on yesterday's yields an estimated slope of β̂ = .00292 and t = .07. No evidence of predictability of stock returns based on past returns.

[Figure: Today's vs. Yesterday's Russell 2000 Return, July 27, 2000 - Jan 22, 2003, n = 615. RussRet (y-axis) vs. lagrussret (x-axis).]

Another useful statistical tool is correlation. Consider daily US and UK bond yields (n = 960). The Pearson correlation between the yields is .317, which is highly statistically significant, with a p-value less than .0005. We could also try regressing the UK yield on the US yield. The slope is β̂ = .3709, with t = 10.33. This slope is essentially the same as the correlation in this case. The two yields seem to be significantly linked.

The problem: none of the conclusions above can be trusted, because the t-statistic does not behave in the usual way in these situations. In time series, we cannot assume that the observations are independent! This will often affect the distribution of the t-statistic and invalidate the usual inferences.

Plan for the rest of the talk:
- Discuss correlation
- Describe the autoregressive model for time series
- Explain why the analyses above were flawed
- Discuss cointegration to measure co-movement of two or more series

Correlation

Suppose X and Y are two random variables, e.g., yesterday's Russell and today's Russell. They have theoretical means µ_x and µ_y, so µ_x = E[X] and µ_y = E[Y].

Define the variance: Var(X) = E[(X − µ_x)²].

Now define the covariance, which describes how X and Y move together, or covary: Cov(X, Y) = E[(X − µ_x)(Y − µ_y)]. Note that Cov(X, X) = Var(X).

Finally, define the correlation: Corr(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)).
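These definitions translate directly into code. Here is a minimal sketch with hand-rolled sample versions of the formulas (the variable names and data are my own):

```python
import math

def covariance(xs, ys):
    """Sample covariance: average product of deviations from the means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def correlation(xs, ys):
    """Corr(X, Y) = Cov(X, Y) / sqrt(Var(X) * Var(Y))."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]          # an exact linear function of xs
print(correlation(xs, ys))     # a perfect linear relationship gives correlation 1
```

Note that Cov(X, X) = Var(X) falls out of the code for free: `covariance(xs, xs)` is just the sample variance.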

The Autoregressive Model

Let {x_t} be a time series, i.e., a sequence of random variables. A very useful model for {x_t} is the first-order autoregressive (AR(1)) model:

x_t = ρ x_{t−1} + ε_t,    −1 < ρ < 1,

where the {ε_t} are independent normal with constant mean (say, zero) and constant variance.

Autocorrelation describes the correlations between the series and its time-lagged values. We could plot x_t versus x_{t−1} and estimate the slope. The estimated and true slopes represent the sample and population autocorrelation at lag 1. We could do the same thing for any lag, so we get a sample and a population autocorrelation sequence, {ρ̂_r} and {ρ_r}, for r = 0, 1, 2, .... For the AR(1) model, we have ρ_r = ρ^r.
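A short simulation can check the claim ρ_r = ρ^r. This sketch (my own, with an assumed ρ = 0.8) compares sample autocorrelations of a simulated AR(1) to the theoretical values:

```python
import random

random.seed(0)

def simulate_ar1(n, rho, sigma=1.0):
    """Simulate x_t = rho * x_{t-1} + eps_t with independent normal errors."""
    x, prev = [], 0.0
    for _ in range(n):
        prev = rho * prev + random.gauss(0, sigma)
        x.append(prev)
    return x

def sample_autocorr(x, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(x)
    m = sum(x) / n
    c0 = sum((v - m) ** 2 for v in x)
    cl = sum((x[t] - m) * (x[t + lag] - m) for t in range(n - lag))
    return cl / c0

rho = 0.8
x = simulate_ar1(20000, rho)
for r in (1, 2, 3):
    print(f"lag {r}: sample {sample_autocorr(x, r):.3f}  vs  rho^r {rho ** r:.3f}")
```

With a long series the sample autocorrelations land close to 0.8, 0.64 and 0.512, the geometric decay the model predicts.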

The AR(1) process is mean reverting: the next value is expected to be closer to the mean (zero) than the current value. The conditional mean of x_{t+1} is ρ x_t, and |ρ| < 1. The autocorrelation leads to predictability: as long as ρ ≠ 0, the process is predictable, and the best predictor of x_{t+1} is ρ x_t.

However, there is a downside to correlation: it typically invalidates the standard methods of statistical inference. In the global temperatures example, the temperatures show autocorrelation (potentially with a trend added). When we adequately account for the autocorrelation, the t-statistic for global warming based on a regression on time becomes t = 2.44. This is much less than the value t = 22.2 we got earlier assuming no autocorrelation, but it still provides moderately strong evidence of global warming. The autocorrelation also affects the variance of the sample mean, thereby invalidating the corresponding t-statistic.
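The effect on the sample mean can be made concrete: for a stationary AR(1), the variance of x̄ is inflated by roughly the factor (1 + ρ)/(1 − ρ) relative to the independent-data formula Var(x_t)/n. A Monte Carlo sketch (my own, with assumed parameters) illustrates this:

```python
import random
import statistics

random.seed(42)

rho, sigma_eps, n, reps = 0.8, 1.0, 200, 2000
var_x = sigma_eps ** 2 / (1 - rho ** 2)        # stationary variance of the AR(1)

means = []
for _ in range(reps):
    x, prev = [], random.gauss(0, var_x ** 0.5)  # start at stationarity
    for _ in range(n):
        prev = rho * prev + random.gauss(0, sigma_eps)
        x.append(prev)
    means.append(sum(x) / n)

empirical = statistics.variance(means)
naive = var_x / n                               # what the usual t-test assumes
corrected = (var_x / n) * (1 + rho) / (1 - rho)
print(f"naive {naive:.4f}  corrected {corrected:.4f}  empirical {empirical:.4f}")
```

With ρ = 0.8 the inflation factor is 9, so a t-statistic built from the naive formula is roughly three times too large, which is exactly the kind of distortion behind the t = 22.2 in the temperature regression.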

In the example on prediction of stock returns from financial ratios, it turns out that the financial ratios show strong autocorrelation. If we combine an AR(1) model for the ratios with a regression model for the stock returns, there will be a correlation between the errors in the two models. The net result is that the least-squares coefficients will be biased (they estimate the wrong thing, on average), and the t-statistics will not be valid. When we correctly account for these problems, the t-statistics on the financial ratios become 1.96, 1.31 and 1.25, as compared to the original (incorrect) values of 3.02, 2.40 and 2.43. So the evidence for predictability of stock returns based on financial ratios is actually quite marginal, and far weaker than it seemed before.

The Random Walk

In the AR(1) model, as ρ approaches 1, the mean reversion becomes weaker: we get longer excursions from zero. For an AR(1) model, we have

Var(x_t) = Var(ε_t) / (1 − ρ²).

As ρ approaches 1, Var(x_t) goes to ∞. When ρ becomes exactly equal to 1, we get the random walk,

x_t = x_{t−1} + ε_t.

The random walk is not stationary, and has an infinite variance. In a random walk, the expected waiting time to return to the current value is infinite (extremely long excursions!). In a random walk starting from zero, the path is much more likely to spend almost all of its time above zero than to spend about 50% of its time above zero.
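A quick simulation (my own sketch) shows the non-stationarity directly: across replications, the variance of x_t for a random walk grows linearly in t rather than settling down to a constant.

```python
import random
import statistics

random.seed(7)

def random_walk(n, sigma=1.0):
    """x_t = x_{t-1} + eps_t, starting from zero."""
    x, pos = [], 0.0
    for _ in range(n):
        pos += random.gauss(0, sigma)
        x.append(pos)
    return x

# Across many replications, Var(x_t) grows like t * sigma^2: no stationary variance.
reps = 2000
walks = [random_walk(400) for _ in range(reps)]
for t in (100, 200, 400):
    v = statistics.variance(w[t - 1] for w in walks)
    print(f"Var(x_{t}) ~ {v:.1f}  (theory: {t})")
```

Doubling the horizon roughly doubles the variance, which is what "Var(x_t) goes to ∞" means in practice: the longer you watch, the wider the excursions.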

Stock prices follow a random walk, as long as markets are efficient: if the price change were predictable, investors would quickly figure this out, thereby removing the predictability. In an efficient market, the best forecast of the future price is the current price, and the best forecast of the future return is zero. Since the variance of a random walk is infinite, it makes no sense to talk about the correlation between stock prices (assuming that the prices follow a random walk, or simply that prices have an infinite variance).

[Figure: Two independent random walks, n = 500. Estimated correlation = .53.]

It can be shown that if we take two random walks that are completely independent of each other, there is a very high probability of finding a (spuriously) high correlation coefficient between them. (This may explain the bond yield example.) This underscores the futility of looking at correlations between two price series. The t-statistic in the regression of one independent random walk on the other goes to ∞ as the sample size increases. So even though there is no relationship between the two series, we are guaranteed to declare (wrongly) that there is a relationship if we use naive regression methods and the sample size is large enough. My two simulated independent random walks seem to move together, but it's just an illusion: the Pearson correlation is .53, and the estimated regression coefficient is .74, with a t-statistic of 13.87. All of this "structure" is spurious!
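This spurious-regression phenomenon is easy to reproduce. The sketch below (my own; the sample size and replication count are arbitrary) regresses pairs of freshly simulated, completely independent random walks on each other and counts how often the naive t-statistic looks "significant":

```python
import math
import random

random.seed(3)

def random_walk(n):
    pos, out = 0.0, []
    for _ in range(n):
        pos += random.gauss(0, 1)
        out.append(pos)
    return out

def naive_regression_t(y, x):
    """Slope and its usual OLS t-statistic for y on x (no autocorrelation correction)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    resid = [b - my - beta * (a - mx) for a, b in zip(x, y)]
    s2 = sum(r * r for r in resid) / (n - 2)
    return beta, beta / math.sqrt(s2 / sxx)

# Two independent walks still produce "significant" t-statistics
# far more often than the nominal 5% rate.
n, reps = 500, 200
spurious = sum(abs(naive_regression_t(random_walk(n), random_walk(n))[1]) > 2
               for _ in range(reps))
print(f"{spurious}/{reps} regressions look 'significant' at the 5% level")
```

If the usual theory applied, only about 5% of these regressions would cross the |t| > 2 threshold; in the simulation the large majority do, even though no relationship exists.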

Unit Root Tests

The random walk nature of prices also invalidates the t-statistic in the regression of current price on past price. To try to determine whether our price data came from a random walk, we can test whether the true slope is 1. But the t-statistic for this hypothesis does not have an approximately standard normal distribution, even if we really have a random walk. Fortunately, the distribution of this t-statistic has been determined (Dickey and Fuller), and tables are available. The result is a unit root test. In the unit root test, we test the null hypothesis that the series is a random walk against the alternative hypothesis that it is an AR(1) with ρ < 1. Note that under the alternative hypothesis the series is stationary, and therefore mean reverting, while under the null hypothesis it is nonstationary.

Cointegration

Suppose we have two nonstationary series {x_t} and {y_t}, both (approximately) random walks. How do we measure their tendency to move together? Correlation is meaningless here: both series wander all over the place, since they are nonstationary. Instead of looking at how they wander from a particular point (such as zero), let's look at how they wander from each other. Maybe the "spread" {y_t − x_t} is stationary. Then even though both series wander all over the place separately, they are tied to each other in that the spread between them is mean reverting. So we can make bets on the reversion of this spread. More generally, maybe there is a β such that the linear combination {y_t − β x_t} is stationary. If so, then we say that {x_t} and {y_t} are cointegrated.

A simple approach to cointegration is first to do unit root tests on {x_t} and {y_t} separately. Next, estimate β by an (ordinary) regression of {y_t} on {x_t}, and finally do a unit root test on the residuals {y_t − β̂ x_t}. If the tests indicate that {x_t} and {y_t} are nonstationary, but {y_t − β̂ x_t} is stationary, then we declare that {x_t} and {y_t} are cointegrated, with cointegrating parameter β̂.
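The two-step procedure can be sketched in code. The Dickey-Fuller t-statistic below is a bare-bones version (no constant, no trend, no lag augmentation), and in practice the accept/reject decision would use the Dickey-Fuller and Engle-Granger critical-value tables rather than normal ones. The data are simulated, with a true cointegrating parameter β = 2:

```python
import math
import random

random.seed(5)

def df_tstat(u):
    """t-statistic of phi in the Dickey-Fuller regression
    diff(u)_t = phi * u_{t-1} + e_t (no constant, no trend)."""
    n = len(u)
    num = sum((u[t] - u[t - 1]) * u[t - 1] for t in range(1, n))
    den = sum(u[t - 1] ** 2 for t in range(1, n))
    phi = num / den
    resid = [(u[t] - u[t - 1]) - phi * u[t - 1] for t in range(1, n)]
    s2 = sum(r * r for r in resid) / (n - 2)
    return phi / math.sqrt(s2 / den)

# Hypothetical cointegrated pair: x is a random walk, y = 2x plus stationary noise.
n = 1000
x, pos = [], 0.0
for _ in range(n):
    pos += random.gauss(0, 1)
    x.append(pos)
y = [2 * v + random.gauss(0, 1) for v in x]

# Step 1: estimate beta by OLS of y on x.
mx, my = sum(x) / n, sum(y) / n
beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

# Step 2: Dickey-Fuller test on the residuals y - beta*x.
resid = [b - my - beta * (a - mx) for a, b in zip(x, y)]
print(f"beta-hat = {beta:.3f}")
print(f"DF t-statistic on x:         {df_tstat(x):.2f}")
print(f"DF t-statistic on residuals: {df_tstat(resid):.2f}")
```

Typically the statistic on x itself is not strongly negative (no evidence against a unit root), while the statistic on the residuals is hugely negative (the spread is stationary), so the pair would be declared cointegrated with β̂ close to 2.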