COMP6053 lecture: Time series analysis, autocorrelation jn2@ecs.soton.ac.uk

Time series analysis The basic idea of time series analysis is simple: given an observed sequence, how can we build a model that can predict what comes next? Obvious applications in finance, business, ecology, agriculture, demography, etc.

What's different about time series? In most of the contexts we've seen so far, there's an implicit assumption that observations are independent of each other. In other words, the fact that subject 27 is 165cm tall and terrible at basketball says nothing at all about what will happen with subject 28.

What's different about time series? In time series data, this is not true. We're hoping for exactly the opposite: that what happens at time t contains information about what will happen at time t+1. Each observation is treated first as an outcome and then as a predictor variable as we move forward in time.

Ways of dealing with time series Despite (or perhaps because of) the practical uses of time series, there is no single universal technique for handling them. Lots of different ways to proceed depending on the implicit theory of data generation we're proposing. Easiest to illustrate with examples...

Example 1: Lake Huron data Our first example data set is a series of annual measurements of the level of Lake Huron, in feet, from 1875 to 1972. It's a built-in data set in R, so we only need data(LakeHuron) to access it. R already "knows" that this is a time series.
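
A minimal sketch of loading and plotting the series; because LakeHuron is stored as a ts object, plot draws a proper time axis automatically:
data(LakeHuron)
plot(LakeHuron, lwd = 2, col = "blue")   # annual level in feet, 1875 to 1972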

Example 1: Lake Huron data

Ex. 2: Australian beer production Our second example is data on monthly Australian beer production, in millions of litres. The time series runs from January 1956 to August 1995. The data is available in beer.csv.

Ex. 2: Australian beer production R doesn't yet know that this is a time series: the data comes in as a plain column of numbers. We use the ts function to specify that something should be interpreted as a time series, optionally specifying the seasonal period: beer = ts(beer[,1], start=1956, freq=12)
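
A minimal sketch of the full loading step, assuming beer.csv sits in the working directory with the monthly production figures in its first column:
beer = read.csv("beer.csv")
beer = ts(beer[,1], start = 1956, freq = 12)   # monthly data, so freq = 12
plot(beer, lwd = 2, col = "blue")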

Ex. 2: Australian beer production

Two goals in time series modelling We assume there's some structure in the time series data, obscured by random noise. Structure = trends + seasonal variation. The Lake Huron data has no obvious repetitive structure, but possibly a downward trend. The beer data shows clear seasonality and a trend.

Models of data generation The most basic model of data generation is to suppose that there is no structure in the time series at all, and that each observation is an independent random variate. An example: white noise. In this case, the best we can do is simply predict the mean value of the data set.

Lake Huron: prediction if observations were independent

Beer production: prediction if observations were independent

Producing these graphs in R
# nullBeer is the no-structure null model; presumably fitted as ARIMA(0,0,0)
nullBeer = arima(beer, order = c(0, 0, 0))
png("beermeanpredict.png", width = 800, height = 400)
plot(beer, xlim = c(1956, 2000), lwd = 2, col = "blue")
lines(predict(nullBeer, n.ahead = 50)$pred, lwd = 2, col = "red")
lines(predict(nullBeer, n.ahead = 50)$pred + 1.96 * predict(nullBeer, n.ahead = 50)$se, lwd = 2, lty = "dotted", col = "red")
lines(predict(nullBeer, n.ahead = 50)$pred - 1.96 * predict(nullBeer, n.ahead = 50)$se, lwd = 2, lty = "dotted", col = "red")
graphics.off()

Simple approach to trends We could ignore the seasonal variation and the random noise and simply fit a linear or polynomial model to the data. For example: tb = seq(1956,1995.8,length=length(beer)) tb2 = tb^2 polybeer = lm(beer ~ tb + tb2)
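
A minimal sketch of overlaying the fitted quadratic on the observed series:
plot(beer, lwd = 2, col = "blue")
lines(tb, fitted(polybeer), lwd = 2, col = "red")   # the quadratic trend line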

Polynomial fit of lake level on time

Polynomial fit of beer production on time

Regression on time a good idea? This is an OK start: it gives us some sense of what the trend line is. But we probably don't believe that beer production or lake level is a function of the calendar date. More likely these things are a function of their own history, and we need methods that can capture that.

Autoregression A better approach is to ask whether the next value in the time series can be predicted as some function of its previous values. This is called autoregression. We want to build a regression model of the current value fitted on one or more previous values (lagged values). But how many?
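
A minimal sketch of the idea, regressing each lake level on the previous year's level by hand (variable names are hypothetical):
lake = as.numeric(LakeHuron)
n = length(lake)
previous = lake[1:(n - 1)]
current = lake[2:n]
summary(lm(current ~ previous))   # a lag-1 autoregression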

Autocorrelation and partial autocorrelation We can look directly at the time series and ask how much information there is in previous values that helps predict the current value. The acf function looks at the correlation between now and various points in the past. Partial autocorrelation (pacf) does the same, but "partials out" the other effects to get the unique contribution of each time lag.
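
A minimal sketch of producing the plots on the next two slides:
acf(LakeHuron)
pacf(LakeHuron)
acf(beer)
pacf(beer)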

ACF & PACF, Lake Huron data

ACF & PACF, beer data

ACF & PACF plots ACF shows a correlation that fades as we take longer lagged values in the Lake Huron time series. ACF shows periodic structure in the beer time series reflecting its seasonal nature.

ACF & PACF plots But if t[0] is correlated with t[-1], and t[-1] is correlated with t[-2], then t[0] will necessarily be correlated with t[-2] also. So we need to look at the PACF values. We find that only the most recent value is really useful in building an autoregression model for the Lake Huron data, for example.

Autoregression models With the ar command we can fit autoregression models and ask R to use AIC to decide how many lagged values should be included in the model. For example: arb = ar(beer) The Lake Huron model includes only one lagged value; the beer model includes 24.
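
A minimal sketch for both series; ar stores the AIC-chosen lag count in the $order component (arLake is a hypothetical name):
arLake = ar(LakeHuron)
arLake$order   # 1 lagged value
arb = ar(beer)
arb$order      # 24 lagged values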

Autoregression model, lake data, 1 lagged term

Autoregression model, beer data, 24 lagged terms

Automatically separating trends, seasonal effects, and noise The stl procedure uses locally weighted regression to separate out a trend line, and parcels out the seasonal effect. For example: plot(stl(beer, s.window="periodic"), col="blue", lwd=2) If things go well, there should be no autocorrelation structure left in the residuals.
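
A minimal sketch of checking that claim, using the remainder component that stl returns:
fit = stl(beer, s.window = "periodic")
acf(fit$time.series[, "remainder"])   # ideally, no significant spikes beyond lag 0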

Exponential smoothing A reasonable guess about the next value in a series is that it will be a weighted average of previous values, with the most recent values weighted most strongly. This assumption constitutes exponential smoothing: the prediction for t[0] is α t[-1] + α(1-α) t[-2] + α(1-α)² t[-3] + ...
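
A minimal sketch of the recursion that generates these geometric weights, using a hypothetical α of 0.8:
alpha = 0.8
pred = numeric(length(LakeHuron))
pred[1] = LakeHuron[1]   # start the recursion at the first observation
for (i in 2:length(LakeHuron)) {
  # unrolling this gives alpha*t[-1] + alpha*(1-alpha)*t[-2] + ...
  pred[i] = alpha * LakeHuron[i - 1] + (1 - alpha) * pred[i - 1]
}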

Holt-Winters procedure The logic can be applied to the basic level of the prediction, to the trend term, and to the seasonal term. The Holt-Winters procedure automatically does this for all three; for example: HWB = HoltWinters(beer)
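
A minimal sketch of forecasting from the fitted model, here two years (24 months) ahead:
HWB = HoltWinters(beer)
pred = predict(HWB, n.ahead = 24)
plot(HWB, pred)   # observed, fitted, and predicted values together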

Holt-Winters analysis on beer data

Holt-Winters analysis on lake data The process seems to work well with the seasonal beer data. For the lake data, we have not specified a seasonal period, and we might also drop the trend term, thus: HWLake = HoltWinters(LakeHuron, gamma=FALSE, beta=FALSE)

Holt-Winters analysis on lake data

Holt-Winters analysis on lake data The fitted alpha value is close to 1 (i.e., a very short memory), so the prediction is that the process will stay where it was. What if we put the trend term back in? HWLake = HoltWinters(LakeHuron, gamma=FALSE)

Holt-Winters analysis on lake data Is the trend term overdoing it (beta = 0.17)?

Differencing Some time series techniques (e.g., ARIMA) are based on the assumption that the series is stationary, i.e., that it has constant mean, variance, and autocorrelation values over time. If we want to use these techniques we may need to work with the differenced values rather than the raw values.

Differencing This just means transforming t[1] into t[1] - t[0], and so on. We can use the diff command to make this easy. To plot the beer data as a differenced series: plot(diff(beer), lwd=2, col="green")
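
A minimal sketch with hypothetical numbers to show exactly what diff computes:
x = c(580.4, 581.9, 580.9)   # hypothetical lake levels in feet
diff(x)                      # returns 1.5, -1.0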

Differencing

Some housekeeping in R To get access to some relevant ARIMA model fitting functions, we need to download the "forecast" package. install.packages("forecast") library(forecast)

Auto-regressive integrated moving-average models (ARIMA) ARIMA is a method for putting together all of the techniques we've seen so far. A non-seasonal ARIMA model is specified with p, d, and q parameters. p: no. of autoregression terms. d: no. of difference levels. q: no. of moving-average (smoothing) terms.

Auto-regressive integrated moving-average models (ARIMA) ARIMA(0,0,0) is simply predicting the mean of the overall time series, i.e., no structure. ARIMA(0,1,0) works with differences, not raw values, and predicts the next value without any autoregression or smoothing. This is therefore a random walk. ARIMA(1,0,0) and ARIMA(24,0,0) are the models we originally fitted to the lake and beer data.
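
A minimal sketch of fitting these fixed-order models directly with the arima function:
arima(LakeHuron, order = c(0, 0, 0))   # mean-only model, no structure
arima(LakeHuron, order = c(0, 1, 0))   # random walk on the differenced series
arima(LakeHuron, order = c(1, 0, 0))   # the AR(1) lake model from earlier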

Auto-regressive integrated moving-average models (ARIMA) We can also have seasonal ARIMA models: three more terms apply to the seasonal effects. The "forecast" library includes a very convenient auto.arima function that uses AIC to find the most parsimonious model in the space of possible models.
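
A minimal sketch; the orders in the comments are those reported on the next two slides, though auto.arima's choice can vary across package versions:
fitLake = auto.arima(LakeHuron)    # selects ARIMA(1,1,2) here
fitBeer = auto.arima(beer)         # selects ARIMA(2,1,2)(2,0,0)[12]
plot(forecast(fitLake, h = 20))    # 20 years ahead, with prediction intervals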

ARIMA(1,1,2) model of lake data

ARIMA(2,1,2)(2,0,0)[12] model of beer data

Fourier transforms No time to discuss Fourier transforms... But they're useful when you suspect there are seasonal or cyclic components in the data, but you don't yet know the period of these components. In the beer example, we already knew the seasonal period was 12, of course.

Additional material The beer.csv data set. The R script used to do the analyses. A general intro to time series analysis in R by Walter Zucchini and Oleg Nenadic. An intro to ARIMA models by Robert Nau. Another useful intro to time series analysis.