Time Series Analysis with R - Part I. Walter Zucchini, Oleg Nenadić
|
|
|
- Douglas Cox
- 10 years ago
- Views:
Transcription
1 Time Series Analysis with R - Part I Walter Zucchini, Oleg Nenadić
2 Contents 1 Getting started Downloading and Installing R Data Preparation and Import in R Basic R commands: Data Manipulation and Visualization Simple Component Analysis Linear Filtering of Time Series Decomposition of Time Series Regression analysis Exponential Smoothing Introductionary Remarks Exponential Smoothing and Prediction of Time Series ARIMA Models Introductionary Remarks Analysis of Autocorrelations and Partial Autocorrelations Parameter Estimation of ARIMA Models Diagnostic Checking Prediction of ARIMA Models A Function Reference 22 1
3 Chapter 1 Getting started 1.1 Downloading and Installing R R is a widely used environment for statistical analysis. The striking difference between R and most other statistical software is that it is free software and that it is maintained by scientists for scientists. Since its introduction in 1996, the R project has gained many users and contributors, which continously extend the capabilities of R by releasing add ons (packages) that offer previously not available functions and methods or improve the existing ones. One disadvantage or advantage, depending on the point of view, is that R is used within a command line interface, which imposes a slightly steeper learning curve than other software. But, once this burden hab been taken, R offers almost unlimited possibilities for statistical data analysis. R is distributed by the Comprehensive R Archive Network (CRAN) it is available from the url: The current version of R (1.7.0, approx. 20 MB) for Windows can be downloaded by selecting R binaries windows base and downloading the file rw1070.exe from the CRAN website. R can then be installed by executing the downloaded file. The installation procedure is straightforward, one usually only has to specify the target directory in which to install R. After the installation, R can be started like any other application for Windows, that is by double clicking on the corresponding icon. 1.2 Data Preparation and Import in R Importing data into R can be carried out in various ways to name a few, R offers means for importing ASCII and binary data, data from other applications or even for database connections. Since a common denominator for data analysis (cough) seem to be spreadsheet applications like e.g. Microsoft Excel c, the 2
4 CHAPTER 1. GETTING STARTED 3 remainder of this section will focus on importing data from Excel to R. Let s assume we have the following dataset tui as a spreadsheet in Excel. (The dataset can be downloaded from as a zipped Excel file). The spreadsheet contains stock data for the TUI AG from Jan., 3rd 2000 to May, 14th 2002, namely date (1st column), opening values (2nd column), highest and lowest values (3rd and 4th column), closing values (5th column) and trading volumes (6th column). A convenient way of preparing the data is to clean up the table within Excel, so that only the data itself and one row containing the column names remain. Once the data has this form, it can be exported as a CSV ( comma seperated values ) file, e.g. to C:/tui.csv. After conversion to a CSV file, the data can be loaded into R. In our case, the tui dataset is imported by typing tui <- read.csv("c:/tui.csv", header=t, dec=",", sep=";") into the R console. The right hand side, read.csv( ), is the R command, which reads the CSV file (tui.csv). Note, that paths use slashes / instead of backslashes. Further options within this command include header and dec, which specify if the dataset has the first row containing column names (header=t). In case one does not have a naming row, one would use header=f instead. The option dec sets the decimal seperator used in case it differs from a point (here a comma is used as a decimal seperator). 1.3 Basic R commands: Data Manipulation and Visualization The dataset is stored as a matrix object with the name tui. In order to access particular elements of objects, square brackets ([ ]) are used. Rows and colums of matrices can be accessed with object[ row,column]. The closing values of the TUI shares are stored in the 5th column, so they can be selected with tui[,5]. A chart with the closing values can be created using the plot( ) command: plot(tui[,5],type="l") The plot( ) command allows for a number of optional arguments, one of them is type="l", which sets the plot type to lines. Especially graphics commands allow for a variety of additional options, e.g.: plot(tui[,5], type="l", lwd=2, col="red", xlab="time", ylab="closing values",
5 CHAPTER 1. GETTING STARTED 4 main="tui AG", ylim=c(0,60) ) TUI AG closing values time Figure 1.1: Creating charts with the plot( ) command First it has to be noted that R allows commands to be longer than one line the plot( ) command is evaluated after the closing bracket! The option lwd=2 (line width) is used to control the thickness of the plotted line and col="red" controls the colour used (a list of available colour names for using within this option can be obtained by typing colors() into the console). xlab and ylab are used for labeling the axes while main specifies the title of the plot. Limits of the plotting region from a to b (for the x and y axes) can be set using the xlim=c(a,b) and/or ylim=c(a,b) options. A complete list of available options can be displayed using the help system of R. Typing?plot into the console opens the help file for the plot( ) command (the helpfile for the graphical parameters is accessed with?par). Every R function has a corresponding help file which can be accessed by typing a question mark and the command (without brackets). It contains further details about the function and available options, references and examples of usage. Now back to our TUI shares. Assume that we want to plot differences of the logarithms of the returns. In order to do so, we need two operators, log( ) which takes logarithms and diff( ) which computes differences of a given object: plot(diff(log(tui[,5])),type="l")
6 CHAPTER 1. GETTING STARTED 5 Operations in R can be nested (diff(log( ))) as in the example above one just needs to care about the brackets (each opening bracket requires a closing one)! Another aspect to time series is to investigate distributional properties. A first step would be to draw a histogram and to compare it with e.g. the density of the normal distribution: Histogram of diff(tui[, 4]) Density diff(tui[, 4]) Figure 1.2: Comparing histograms with densities The figure is created using following commands: hist(diff(tui[,4]),prob=t,ylim=c(0,0.6),xlim=c(-5,5),col="red") lines(density(diff(tui[,4])),lwd=2) In the first line, the hist( ) command is used for creating a histogramm of the differences. The option prob=t causes the histogram to be displayed based on relative frequencies. The other options (ylim, xlim and col) are solely for enhancing the display. A nonparametric estimate of the density is added using the function density( ) together with lines( ). In order to add the density of the fitted normal distribution, one needs to estimate the mean and the standard deviation using mean( ) and sd( ): mu<-mean(diff(tui[,4])) sigma<-sd(diff(tui[,4]))
7 CHAPTER 1. GETTING STARTED 6 Having estimated the mean and standard deviation, one additionally needs to define x values for which the corresponding values of the normal distribution are to be calculated: x<-seq(-4,4,length=100) y<-dnorm(x,mu,sigma) lines(x,y,lwd=2,col="blue") seq(a,b,length) creates a sequence of length values from a to b in this case 100 values from 4 to 4. The corresponding values of the normal distribution for given µ and σ are then computed using the function dnorm(x,µ,σ). Adding lines to existing plots is done with the function lines(x,y) the major difference to plot(x,y) is that plot( ) clears the graphics window. Another option for comparison with the normal distribution is given using the function qqnorm(diff(tui[,4])) which draws a quantile- quantile plot: Normal Q Q Plot Sample Quantiles Theoretical Quantiles Figure 1.3: Comparing empirical with theoretical quantiles Without going too much into detail, the ideal case (i.e. normally distributed observations) is given when the observations lie on the line (which is drawn using abline(0,1)). There are various means for testing normality of given data. The Kolmogorov- Smirnoff test can be used to test if sample follows a specific distribution. In order
8 CHAPTER 1. GETTING STARTED 7 to test the normality of the differences of the logs from the TUI-shares, one can use the function ks.test( ) together with the corresponding distribution name: x<-diff(log(tui[,5])) ks.test(x,"pnorm",mean(x),sd(x)) Since we want to compare the data against the (fitted) normal distribution, we include "pnorm", mean(x) and sd(x) (as result we should get a significant deviation from the normal distribution). Unfortunately, the Kolmogorov Smirnoff test does not take account of where the deviations come from causes for that range from incorrectly estimated parameters, skewness to leptocurtic distributions. Another possibility (for testing normality) is to use the Shapiro Test (which still does not take account of skewness etc., but generally performs a bit better): shapiro.test(x) Since this test is a test for normality, we do not have to specify a distribution but only the data which is to be tested (here the results should be effectively the same, except for a slightly different p-value).
9 Chapter 2 Simple Component Analysis 2.1 Linear Filtering of Time Series A key concept in traditional time series analysis is the decomposition of a given time series X t into a trend T t, a seasonal component S t and the remainder e t. A common method for obtaining the trend is to use linear filters on given time series: T t = λ i X t+i i= A simple class of linear filters are moving averages with equal weights: T t = 1 2a + 1 a i= a X t+i In this case, the filtered value of a time series at a given period τ is represented by the average of the values {x τ a,..., x τ,..., x τ+a }. The coefficients of the filtering are { 1,..., 1 }. 2a+1 2a+1 Applying moving averages with a = 2, 12, and 40 to the closing values of our tui dataset implies using following filters: a = 2 : λ i = { 1 5, 1 5, 1 5, 1 5, 1 5 } a = 12 : λ i = { 1 25,..., 1 25 } } {{ } 25 times a = 40 : λ i = { 1 81,..., 1 81 } } {{ } 81 times 8
10 CHAPTER 2. SIMPLE COMPONENT ANALYSIS 9 A possible interpretation of these filters are (approximately) weekly (a = 2), monthly (a = 12) and quaterly (a = 40) averages of returns. The fitering is carried out in R with the filter( ) command. One way to plot the closing values of the TUI shares and the averages in different colours is to use the following code: library(ts) plot(tui[,5],type="l") tui.1 <- filter(tui[,5],filter=rep(1/5,5)) tui.2 <- filter(tui[,5],filter=rep(1/25,25)) tui.3 <- filter(tui[,5],filter=rep(1/81,81)) lines(tui.1,col="red") lines(tui.2,col="purple") lines(tui.3,col="blue") It creates the following output: tui[, 4] Index Figure 2.1: Closing values and averages for a = 2, 12 and Decomposition of Time Series Another possibility for evaluating the trend of a time series is to use nonparametric regression techniques (which in fact can be seen as a special case of linear
11 CHAPTER 2. SIMPLE COMPONENT ANALYSIS 10 filters). The function stl( ) performs a seasonal decomposition of a given time series X t by determining the trend T t using loess regression and then calculating the seasonal component S t (and the residuals e t ) from the differences X t T t. Performing the seasonal decomposition for the time series beer (monthly beer production in Australia from Jan to Aug. 1995) is done using the following commands: beer<-read.csv("c:/beer.csv",header=t,dec=",",sep=";") beer<-ts(beer[,1],start=1956,freq=12) plot(stl(log(beer),s.window="periodic")) remainder trend seasonal data time Figure 2.2: Seasonal decomposition using stl( ) First, the data is read from C:/beer.csv (the datafile is available for download from as a zipped csv-file). Then the data is transformed into a ts object. This transformation is required for most of the time series functions, since a time series contains more information than the values itself, namely information about dates and frequencies at which the time series has been recorded.
12 CHAPTER 2. SIMPLE COMPONENT ANALYSIS Regression analysis R offers the functions lsfit( ) (least squares fit) and lm( ) (linear models, a more general function) for regression analysis. This section focuses on lm( ), since it offers more features, especially when it comes to testing significance of the coefficients. Consider again the beer data. Assume that we wish to fit the following model (a parabola) to the logs of beer: log(x t ) = α 0 + α 1 t + α 2 t 2 + e t The fit can be carried out in R with the following commands: lbeer<-log(beer) t<-seq(1956,1995.2,length=length(beer)) t2<-t^2 plot(lbeer) lm(lbeer~t+t2) lines(lm(lbeer~t+t2)$fit,col=2,lwd=2) lbeer Time Figure 2.3: Fitting a parabola to lbeer using lm( ) In the first row of the commands above, logarithms of beer are calculated and stored as lbeer. Explanatory variables (t and t 2 as t and t2) are defined in the
13 CHAPTER 2. SIMPLE COMPONENT ANALYSIS 12 second and third row. The actual fit of the model is done using lm(lbeer~t+t2). The function lm( ) returns a list object, whose element can be accessed using the $ sign: lm(lbier~t+t2)$coefficients returns the estimated coefficients (α 0, α 1 and α 2 ); lm(lbier~t+t2)$fit returns the fitted values ˆX t of the model. Extending the model to log(x t ) = α 0 + α 1 t + α 2 t 2 + β cos ( ) 2 π + γ sin 12 ( ) 2 π + e t 12 so that it includes the first Fourier frequency is straightforward. After defining the two additional explanatory variables, cos.t and sin.t, the model can be estimated in the usual way : lbeer<-log(beer) t<-seq(1956,1995.2,length=length(beer)) t2<-t^2 sin.t<-sin(2*pi*t) cos.t<-cos(2*pi*t) plot(lbeer) lines(lm(lbeer~t+t2+sin.t+cos.t)$fit,col=4) lbeer Time Figure 2.4: Fitting a parabola and the first fourier frequency to lbeer Note that in this case sin.t does not include 12 in the denominator, since 1 12 has already been considered during the transformation of beer and the creation of t.
14 CHAPTER 2. SIMPLE COMPONENT ANALYSIS 13 Another important aspect in regression analysis is to test the significance of the coefficients. In the case of lm( ), one can use the summary( ) command: summary(lm(lbeer t+t2+sin.t+cos.t)) which returns the following output: Call: lm(formula = lbeer ~ t + t2 + sin.t + cos.t) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(> t ) (Intercept) e e <2e-16 *** t 3.868e e <2e-16 *** t e e <2e-16 *** sin.t e e <2e-16 *** cos.t e e Signif. codes: 0 *** ** 0.01 * Residual standard error: on 471 degrees of freedom Multiple R-Squared: , Adjusted R-squared: 0.8 F-statistic: on 4 and 471 DF, p-value: < 2.2e-16 Apart from the coefficient estimates and their standard error, the output also includes the corresponding t-statistics and p values. In our case, the coefficients α 0 (Intercept), α 1 (t), α 2 (t 2 ) and β (sin(t)) differ significantly from zero, while γ does not seem to (one would include γ anyway, since Fourier frequencies are always kept in pairs of sine and cosine).
15 Chapter 3 Exponential Smoothing 3.1 Introductionary Remarks A natural estimate for predicting the next value of a given time series x t at the period t = τ is to take weighted sums of past observations: ˆx (t=τ) (1) = λ 0 x τ + λ 1 x τ It seems reasonable to weight recent observations more than observations from the past. Hence, one possibility is to use geometric weights λ i = α(1 α) i ; 0 < α < 1 such that ˆx (t=τ) (1) = α x τ + α(1 α) x τ 1 + α(1 α) 2 x τ Exponential smoothing in its basic form (the term exponential comes from the fact that the weights decay exponentially) should only be used for time series with no systematic trend and/or seasonal components. It has been generalized to the Holt Winters procedure in order to deal with time series containg trend and seasonal variation. In this case, three smoothing parameters are required, namely α (for the level), β (for the trend) and γ (for the seasonal variation). 3.2 Exponential Smoothing and Prediction of Time Series The ts library in R contains the function HoltWinters(x,alpha,beta,gamma), which lets one perform the Holt Winters procedure on a time series x. One can specify the three smoothing parameters with the options alpha, beta and gamma. Particular components can be excluded by setting the value of the corresponding parameter to zero, e.g. one can exclude the seasonal component by using 14
16 CHAPTER 3. EXPONENTIAL SMOOTHING 15 gamma=0. In case one does not specify smoothing parameters, these are determined automatically (i.e. by minimizing the mean squared prediction error from one step forecasts). Thus, exponential smoothing of the beer dataset could be performed as follows in R: beer<-read.csv("c:/beer.csv",header=t,dec=",",sep=";") beer<-ts(beer[,1],start=1956,freq=12) This loads the dataset from the CSV file and transforms it to a ts object. HoltWinters(beer) This performs the Holt Winters procedure on the beer dataset. As result, it displays a list with e.g. the smoothing parameters (which should be α 0.076, β 0.07 and γ in this case). Another component of the list is the enty fitted, which can be accessed with HoltWinters(beer)$fitted: plot(beer) lines(holtwinters(beer)$fitted,col="red") beer Time Figure 3.1: Exponential smoothing of the beer dataset R offers the function predict( ), which is a generic function for predictions from various models. In order to use predict( ), one has to save the fit of a model
17 CHAPTER 3. EXPONENTIAL SMOOTHING 16 to an object, e.g.: beer.hw<-holtwinters(beer) In this case, we have saved the fit from the Holt Winters procedure on beer as beer.hw. predict(beer.hw,n.ahead=12) gives us the predicted values for the next 12 periods (i.e. Sep to Aug. 1996). The following commands can be used to create a graph with the predictions for the next 4 years (i.e. 48 months): plot(beer,xlim=c(1956,1999)) lines(predict(beer.hw,n.ahead=48),col=2) beer Time Figure 3.2: Predicting beer with exponential smoothing
18 Chapter 4 ARIMA Models 4.1 Introductionary Remarks Forecasting based on ARIMA (autoregressive integrated moving averages) models, commonly know as the Box Jenkins approach, comprises following stages: i.) Model identification ii.) Parameter estimation iii.) Diagnostic checking These stages are repeated until a suitable model for the given data has been identified (e.g. for prediction). The following three sections show some facilities that R offers for assisting the three stages in the Box Jenkins approach. 4.2 Analysis of Autocorrelations and Partial Autocorrelations A first step in analyzing time series is to examine the autocorrelations (ACF) and partial autocorrelations (PACF). R provides the functions acf( ) and pacf( ) for computing and plotting of ACF and PACF. The order of pure AR and MA processes can be identified from the ACF and PACF as shown below: sim.ar<-arima.sim(list(ar=c(0.4,0.4)),n=1000) sim.ma<-arima.sim(list(ma=c(0.6,-0.4)),n=1000) par(mfrow=c(2,2)) acf(sim.ar,main="acf of AR(2) process") acf(sim.ma,main="acf of MA(2) process") pacf(sim.ar,main="pacf of AR(2) process") pacf(sim.ma,main="pacf of MA(2) process") 17
19 CHAPTER 4. ARIMA MODELS 18 ACF of AR(2) process ACF of MA(2) process ACF ACF Lag Lag PACF of AR(2) process PACF of MA(2) process Partial ACF Partial ACF Lag Lag Figure 4.1: ACF and PACF of AR and MA models The function arima.sim( ) was used to simulate ARIMA(p,d,q) models ; in the first line 1000 observations of an ARIMA(2,0,0) model (i.e. AR(2) model) were simulated and saved as sim.ar. Equivalently, the second line simulated 1000 observations from a MA(2) model and saved them to sim.ma. An useful command for graphical displays is par(mfrow=c(h,v)) which splits the graphics window into (h v) regions in this case we have set up 4 seperate regions within the graphics window. The last four lines created the ACF and PACF plots of the two simulated processes. Note that by default the plots include confidence intervals (based on uncorrelated series). 4.3 Parameter Estimation of ARIMA Models Once the order of the ARIMA(p,d,q) model has been specified, the function arima( ) from the ts library can be used to estimate the parameters: arima(data,order=c(p,d,q)) Fitting e.g. an ARIMA(1,0,1) model on the LakeHuron dataset (annual levels of the Lake Huron from 1875 to 1972) is done using
20 CHAPTER 4. ARIMA MODELS 19 data(lakehuron) fit<-arima(lakehuron,order=c(1,0,1)) Here, fit is a list containing e.g. the coefficients (fit$coef), residuals (fit$residuals) and the Akaike Information Criterion AIC (fit$aic). 4.4 Diagnostic Checking A first step in diagnostic checking of fitted models is to analyze the residuals from the fit for any signs of non randomness. R has the function tsdiag( ), which produces a diagnostic plot of a fitted time series model: fit<-arima(lakehuron,order=c(1,0,1)) tsdiag(fit) It produces following output containing a plot of the residuals, the autocorrelation of the residuals and the p-values of the Ljung Box statistic for the first 10 lags: Standardized Residuals Time ACF of Residuals ACF Lag p values for Ljung Box statistic p value lag Figure 4.2: Output from tsdiag
21 CHAPTER 4. ARIMA MODELS 20 The Box Pierce (and Ljung Box) test examines the Null of independently distributed residuals. It s derived from the idea that the residuals of a correctly specified model are independently distributed. If the residuals are not, then they come from a miss specified model. The function Box.test( ) computes the test statistic for a given lag: Box.test(fit$residuals,lag=1) 4.5 Prediction of ARIMA Models Once a model has been identified and its parameters have been estimated, one purpose is to predict future values of a time series. Lets assume, that we are satisfied with the fit of an ARIMA(1,0,1) model to the LakeHuron data: fit<-arima(lakehuron,order=c(1,0,1)) As with Exponential Smoothing, the function predict( ) can be used for predicting future values of the levels under the model: LH.pred<-predict(fit,n.ahead=8) Here we have predicted the levels of Lake Huron for the next 8 years (i.e. until 1980). In this case, LH.pred is a list containing two entries, the predicted values LH.pred$pred and the standard errors of the prediction LH.pred$se. Using a rule of thumb for an approximate confidence interval (95%) of the prediction, prediction ± 2 SE, one can e.g. plot the Lake Huron data, predicted values and an approximate confidence interval: plot(lakehuron,xlim=c(1875,1980),ylim=c(575,584)) LH.pred<-predict(fit,n.ahead=8) lines(lh.pred$pred,col="red") lines(lh.pred$pred+2*lh.pred$se,col="red",lty=3) lines(lh.pred$pred-2*lh.pred$se,col="red",lty=3) First, the levels of Lake Huron are plotted. To leave some space for adding the predicted values, the x-axis has been limited from 1875 to 1980 with xlim=c(1875,1980) ; the use of ylim is purely for cosmetic purposes here. The prediction takes place in the second line using predict( ) on our fitted model. Adding the prediction and the approximate confidence interval is done in the last three lines. The confidence bands are drawn as a red, dotted line (using the options col="red" and lty=3):
22 CHAPTER 4. ARIMA MODELS 21 LakeHuron Time Figure 4.3: Lake Huron levels and predicted values
23 Appendix A Function Reference abline( ) Graphics command p. 6 acf( ) Estimation of the autocorrelation function p. 17 arima( ) Fitting ARIMA models p. 18 arima.sim( ) Simulation of ARIMA models p. 17 Box.test( ) Box Pierce and Ljung Box test p. 20 c( ) Vector command p. 5 cos( ) Cosine p. 12 density( ) Density estimation p. 5 diff( ) Takes differences p. 4 dnorm( ) Normal distribution p. 6 filter( ) Filtering of time series p. 9 hist( ) Draws a histogram p. 5 HoltWinters( ) Holt Winters procedure p. 14 ks.test( ) Kolmogorov Smirnov test p. 5 length( ) Vector command p. 12 lines( ) Graphics command p. 5 lm( ) Linear models p. 11 log( ) Calculates logs p. 4 lsfit( ) Least squares estimation p. 11 mean( ) Calculates means p. 5 pacf( ) Estimation of the partial autocorrelation function p. 17 plot( ) Graphics command p. 3 predict( ) Generic function for prediction p. 15 read.csv( ) Data import from CSV files p. 3 rep( ) Vector command p. 9 sd( ) Standard deviation p. 5 seq( ) Vector command p. 6 shapiro.test( ) Shapiro Wilk test p. 7 sin( ) Sine p. 12 stl( ) Seasonal decomposition of time series p. 10 summary( ) Generic function for summaries p. 13 ts( ) Creating time series objects p. 10 tsdiag( ) Time series diagnostic p. 19 qqnorm( ) Quantile quantile plot p. 6 22
Statistical Analysis with R
Statistical Analysis with R - a quick start - OLEG NENADIĆ, WALTER ZUCCHINI September 2004 Contents 1 An Introduction to R 3 1.1 Downloading and Installing R..................... 3 1.2 Getting Started..............................
9th Russian Summer School in Information Retrieval Big Data Analytics with R
9th Russian Summer School in Information Retrieval Big Data Analytics with R Introduction to Time Series with R A. Karakitsiou A. Migdalas Industrial Logistics, ETS Institute Luleå University of Technology
COMP6053 lecture: Time series analysis, autocorrelation. [email protected]
COMP6053 lecture: Time series analysis, autocorrelation [email protected] Time series analysis The basic idea of time series analysis is simple: given an observed sequence, how can we build a model that
Multiple Linear Regression
Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is
Using R for Windows and Macintosh
2010 Using R for Windows and Macintosh R is the most commonly used statistical package among researchers in Statistics. It is freely distributed open source software. For detailed information about downloading
TIME SERIES ANALYSIS
TIME SERIES ANALYSIS L.M. BHAR AND V.K.SHARMA Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-0 02 [email protected]. Introduction Time series (TS) data refers to observations
Time Series Analysis
Time Series 1 April 9, 2013 Time Series Analysis This chapter presents an introduction to the branch of statistics known as time series analysis. Often the data we collect in environmental studies is collected
MGT 267 PROJECT. Forecasting the United States Retail Sales of the Pharmacies and Drug Stores. Done by: Shunwei Wang & Mohammad Zainal
MGT 267 PROJECT Forecasting the United States Retail Sales of the Pharmacies and Drug Stores Done by: Shunwei Wang & Mohammad Zainal Dec. 2002 The retail sale (Million) ABSTRACT The present study aims
TIME SERIES ANALYSIS
TIME SERIES ANALYSIS Ramasubramanian V. I.A.S.R.I., Library Avenue, New Delhi- 110 012 [email protected] 1. Introduction A Time Series (TS) is a sequence of observations ordered in time. Mostly these
Time Series Analysis and Forecasting
Time Series Analysis and Forecasting Math 667 Al Nosedal Department of Mathematics Indiana University of Pennsylvania Time Series Analysis and Forecasting p. 1/11 Introduction Many decision-making applications
Psychology 205: Research Methods in Psychology
Psychology 205: Research Methods in Psychology Using R to analyze the data for study 2 Department of Psychology Northwestern University Evanston, Illinois USA November, 2012 1 / 38 Outline 1 Getting ready
Time Series Analysis of Aviation Data
Time Series Analysis of Aviation Data Dr. Richard Xie February, 2012 What is a Time Series A time series is a sequence of observations in chorological order, such as Daily closing price of stock MSFT in
IBM SPSS Forecasting 22
IBM SPSS Forecasting 22 Note Before using this information and the product it supports, read the information in Notices on page 33. Product Information This edition applies to version 22, release 0, modification
ITSM-R Reference Manual
ITSM-R Reference Manual George Weigt June 5, 2015 1 Contents 1 Introduction 3 1.1 Time series analysis in a nutshell............................... 3 1.2 White Noise Variance.....................................
Using JMP Version 4 for Time Series Analysis Bill Gjertsen, SAS, Cary, NC
Using JMP Version 4 for Time Series Analysis Bill Gjertsen, SAS, Cary, NC Abstract Three examples of time series will be illustrated. One is the classical airline passenger demand data with definite seasonal
5 Correlation and Data Exploration
5 Correlation and Data Exploration Correlation In Unit 3, we did some correlation analyses of data from studies related to the acquisition order and acquisition difficulty of English morphemes by both
Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller
Tutorial 3: Graphics and Exploratory Data Analysis in R Jason Pienaar and Tom Miller Getting to know the data An important first step before performing any kind of statistical analysis is to familiarize
Time Series Analysis AMS 316
Time Series Analysis AMS 316 Programming language and software environment for data manipulation, calculation and graphical display. Originally created by Ross Ihaka and Robert Gentleman at University
Statistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate
Time Series Analysis
Time Series Analysis Identifying possible ARIMA models Andrés M. Alonso Carolina García-Martos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and García-Martos
IBM SPSS Forecasting 21
IBM SPSS Forecasting 21 Note: Before using this information and the product it supports, read the general information under Notices on p. 107. This edition applies to IBM SPSS Statistics 21 and to all
MSwM examples. Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech.
MSwM examples Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech February 24, 2014 Abstract Two examples are described to illustrate the use of
A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn
A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps
Using R for Linear Regression
Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional
7 Time series analysis
7 Time series analysis In Chapters 16, 17, 33 36 in Zuur, Ieno and Smith (2007), various time series techniques are discussed. Applying these methods in Brodgar is straightforward, and most choices are
3. Regression & Exponential Smoothing
3. Regression & Exponential Smoothing 3.1 Forecasting a Single Time Series Two main approaches are traditionally used to model a single time series z 1, z 2,..., z n 1. Models the observation z t as a
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma [email protected] The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
Getting started with qplot
Chapter 2 Getting started with qplot 2.1 Introduction In this chapter, you will learn to make a wide variety of plots with your first ggplot2 function, qplot(), short for quick plot. qplot makes it easy
Regression step-by-step using Microsoft Excel
Step 1: Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression
Generalized Linear Models
Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the
A Short Guide to R with RStudio
Short Guides to Microeconometrics Fall 2013 Prof. Dr. Kurt Schmidheiny Universität Basel A Short Guide to R with RStudio 1 Introduction 2 2 Installing R and RStudio 2 3 The RStudio Environment 2 4 Additions
2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
Forecasting in STATA: Tools and Tricks
Forecasting in STATA: Tools and Tricks Introduction This manual is intended to be a reference guide for time series forecasting in STATA. It will be updated periodically during the semester, and will be
Data Analysis Tools. Tools for Summarizing Data
Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool
Advanced Forecasting Techniques and Models: ARIMA
Advanced Forecasting Techniques and Models: ARIMA Short Examples Series using Risk Simulator For more information please visit: www.realoptionsvaluation.com or contact us at: [email protected]
Rob J Hyndman. Forecasting using. 11. Dynamic regression OTexts.com/fpp/9/1/ Forecasting using R 1
Rob J Hyndman Forecasting using 11. Dynamic regression OTexts.com/fpp/9/1/ Forecasting using R 1 Outline 1 Regression with ARIMA errors 2 Example: Japanese cars 3 Using Fourier terms for seasonality 4
Java Modules for Time Series Analysis
Java Modules for Time Series Analysis Agenda Clustering Non-normal distributions Multifactor modeling Implied ratings Time series prediction 1. Clustering + Cluster 1 Synthetic Clustering + Time series
KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management
KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To
We extended the additive model in two variables to the interaction model by adding a third term to the equation.
Quadratic Models We extended the additive model in two variables to the interaction model by adding a third term to the equation. Similarly, we can extend the linear model in one variable to the quadratic
NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
2. What is the general linear model to be used to model linear trend? (Write out the model) = + + + or
Simple and Multiple Regression Analysis Example: Explore the relationships among Month, Adv.$ and Sales $: 1. Prepare a scatter plot of these data. The scatter plots for Adv.$ versus Sales, and Month versus
Graphics in R. Biostatistics 615/815
Graphics in R Biostatistics 615/815 Last Lecture Introduction to R Programming Controlling Loops Defining your own functions Today Introduction to Graphics in R Examples of commonly used graphics functions
Time Series - ARIMA Models. Instructor: G. William Schwert
APS 425 Fall 25 Time Series : ARIMA Models Instructor: G. William Schwert 585-275-247 [email protected] Topics Typical time series plot Pattern recognition in auto and partial autocorrelations
Software Review: ITSM 2000 Professional Version 6.0.
Lee, J. & Strazicich, M.C. (2002). Software Review: ITSM 2000 Professional Version 6.0. International Journal of Forecasting, 18(3): 455-459 (June 2002). Published by Elsevier (ISSN: 0169-2070). http://0-
MIXED MODEL ANALYSIS USING R
Research Methods Group MIXED MODEL ANALYSIS USING R Using Case Study 4 from the BIOMETRICS & RESEARCH METHODS TEACHING RESOURCE BY Stephen Mbunzi & Sonal Nagda www.ilri.org/rmg www.worldagroforestrycentre.org/rmg
Forecasting of Paddy Production in Sri Lanka: A Time Series Analysis using ARIMA Model
Tropical Agricultural Research Vol. 24 (): 2-3 (22) Forecasting of Paddy Production in Sri Lanka: A Time Series Analysis using ARIMA Model V. Sivapathasundaram * and C. Bogahawatte Postgraduate Institute
Chapter 23. Inferences for Regression
Chapter 23. Inferences for Regression Topics covered in this chapter: Simple Linear Regression Simple Linear Regression Example 23.1: Crying and IQ The Problem: Infants who cry easily may be more easily
Financial Econometrics MFE MATLAB Introduction. Kevin Sheppard University of Oxford
Financial Econometrics MFE MATLAB Introduction Kevin Sheppard University of Oxford October 21, 2013 2007-2013 Kevin Sheppard 2 Contents Introduction i 1 Getting Started 1 2 Basic Input and Operators 5
Chapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -
ANOVA. February 12, 2015
ANOVA February 12, 2015 1 ANOVA models Last time, we discussed the use of categorical variables in multivariate regression. Often, these are encoded as indicator columns in the design matrix. In [1]: %%R
5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
Vector Time Series Model Representations and Analysis with XploRe
0-1 Vector Time Series Model Representations and Analysis with plore Julius Mungo CASE - Center for Applied Statistics and Economics Humboldt-Universität zu Berlin [email protected] plore MulTi Motivation
Exercises on using R for Statistics and Hypothesis Testing Dr. Wenjia Wang
Exercises on using R for Statistics and Hypothesis Testing Dr. Wenjia Wang School of Computing Sciences, UEA University of East Anglia Brief Introduction to R R is a free open source statistics and mathematical
How To Model A Series With Sas
Chapter 7 Chapter Table of Contents OVERVIEW...193 GETTING STARTED...194 TheThreeStagesofARIMAModeling...194 IdentificationStage...194 Estimation and Diagnostic Checking Stage...... 200 Forecasting Stage...205
Tutorial 2: Reading and Manipulating Files Jason Pienaar and Tom Miller
Tutorial 2: Reading and Manipulating Files Jason Pienaar and Tom Miller Most of you want to use R to analyze data. However, while R does have a data editor, other programs such as excel are often better
Package empiricalfdr.deseq2
Type Package Package empiricalfdr.deseq2 May 27, 2015 Title Simulation-Based False Discovery Rate in RNA-Seq Version 1.0.3 Date 2015-05-26 Author Mikhail V. Matz Maintainer Mikhail V. Matz
Simple Methods and Procedures Used in Forecasting
Simple Methods and Procedures Used in Forecasting The project prepared by : Sven Gingelmaier Michael Richter Under direction of the Maria Jadamus-Hacura What Is Forecasting? Prediction of future events
Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering
Engineering Problem Solving and Excel EGN 1006 Introduction to Engineering Mathematical Solution Procedures Commonly Used in Engineering Analysis Data Analysis Techniques (Statistics) Curve Fitting techniques
Some useful concepts in univariate time series analysis
Some useful concepts in univariate time series analysis Autoregressive moving average models Autocorrelation functions Model Estimation Diagnostic measure Model selection Forecasting Assumptions: 1. Non-seasonal
Below is a very brief tutorial on the basic capabilities of Excel. Refer to the Excel help files for more information.
Excel Tutorial Below is a very brief tutorial on the basic capabilities of Excel. Refer to the Excel help files for more information. Working with Data Entering and Formatting Data Before entering data
Time Series Analysis of Household Electric Consumption with ARIMA and ARMA Models
, March 13-15, 2013, Hong Kong Time Series Analysis of Household Electric Consumption with ARIMA and ARMA Models Pasapitch Chujai*, Nittaya Kerdprasop, and Kittisak Kerdprasop Abstract The purposes of
NCSS Statistical Software
Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the
1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
Polynomial Neural Network Discovery Client User Guide
Polynomial Neural Network Discovery Client User Guide Version 1.3 Table of contents Table of contents...2 1. Introduction...3 1.1 Overview...3 1.2 PNN algorithm principles...3 1.3 Additional criteria...3
Gamma Distribution Fitting
Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics
Sales forecasting # 2
Sales forecasting # 2 Arthur Charpentier [email protected] 1 Agenda Qualitative and quantitative methods, a very general introduction Series decomposition Short versus long term forecasting
4 Other useful features on the course web page. 5 Accessing SAS
1 Using SAS outside of ITCs Statistical Methods and Computing, 22S:30/105 Instructor: Cowles Lab 1 Jan 31, 2014 You can access SAS from off campus by using the ITC Virtual Desktop Go to https://virtualdesktopuiowaedu
Forecasting Using Eviews 2.0: An Overview
Forecasting Using Eviews 2.0: An Overview Some Preliminaries In what follows it will be useful to distinguish between ex post and ex ante forecasting. In terms of time series modeling, both predict values
Time Series Analysis: Basic Forecasting.
Time Series Analysis: Basic Forecasting. As published in Benchmarks RSS Matters, April 2015 http://web3.unt.edu/benchmarks/issues/2015/04/rss-matters Jon Starkweather, PhD 1 Jon Starkweather, PhD [email protected]
Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data
Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable
Promotional Forecast Demonstration
Exhibit 2: Promotional Forecast Demonstration Consider the problem of forecasting for a proposed promotion that will start in December 1997 and continues beyond the forecast horizon. Assume that the promotion
Time Series Analysis
Time Series Analysis [email protected] Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby 1 Outline of the lecture Identification of univariate time series models, cont.:
NCSS Statistical Software
Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the
Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.
1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis
Time Series in Mathematical Finance
Instituto Superior Técnico (IST, Portugal) and CEMAT [email protected] European Summer School in Industrial Mathematics Universidad Carlos III de Madrid July 2013 Outline The objective of this short
Chapter 27 Using Predictor Variables. Chapter Table of Contents
Chapter 27 Using Predictor Variables Chapter Table of Contents LINEAR TREND...1329 TIME TREND CURVES...1330 REGRESSORS...1332 ADJUSTMENTS...1334 DYNAMIC REGRESSOR...1335 INTERVENTIONS...1339 TheInterventionSpecificationWindow...1339
Exchange Rate Regime Analysis for the Chinese Yuan
Exchange Rate Regime Analysis for the Chinese Yuan Achim Zeileis Ajay Shah Ila Patnaik Abstract We investigate the Chinese exchange rate regime after China gave up on a fixed exchange rate to the US dollar
Comparing Nested Models
Comparing Nested Models ST 430/514 Two models are nested if one model contains all the terms of the other, and at least one additional term. The larger model is the complete (or full) model, and the smaller
GeoGebra Statistics and Probability
GeoGebra Statistics and Probability Project Maths Development Team 2013 www.projectmaths.ie Page 1 of 24 Index Activity Topic Page 1 Introduction GeoGebra Statistics 3 2 To calculate the Sum, Mean, Count,
Figure 1. An embedded chart on a worksheet.
8. Excel Charts and Analysis ToolPak Charts, also known as graphs, have been an integral part of spreadsheets since the early days of Lotus 1-2-3. Charting features have improved significantly over the
SPSS TUTORIAL & EXERCISE BOOK
UNIVERSITY OF MISKOLC Faculty of Economics Institute of Business Information and Methods Department of Business Statistics and Economic Forecasting PETRA PETROVICS SPSS TUTORIAL & EXERCISE BOOK FOR BUSINESS
Lab 13: Logistic Regression
Lab 13: Logistic Regression Spam Emails Today we will be working with a corpus of emails received by a single gmail account over the first three months of 2012. Just like any other email address this account
Correlation and Simple Linear Regression
Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a
Summary of R software commands used to generate bootstrap and permutation test output and figures in Chapter 16
Summary of R software commands used to generate bootstrap and permutation test output and figures in Chapter 16 Since R is command line driven and the primary software of Chapter 16, this document details
Week 5: Multiple Linear Regression
BUS41100 Applied Regression Analysis Week 5: Multiple Linear Regression Parameter estimation and inference, forecasting, diagnostics, dummy variables Robert B. Gramacy The University of Chicago Booth School
ESTIMATING THE DISTRIBUTION OF DEMAND USING BOUNDED SALES DATA
ESTIMATING THE DISTRIBUTION OF DEMAND USING BOUNDED SALES DATA Michael R. Middleton, McLaren School of Business, University of San Francisco 0 Fulton Street, San Francisco, CA -00 -- [email protected]
N-Way Analysis of Variance
N-Way Analysis of Variance 1 Introduction A good example when to use a n-way ANOVA is for a factorial design. A factorial design is an efficient way to conduct an experiment. Each observation has data
Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.
Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing
Scatter Plots with Error Bars
Chapter 165 Scatter Plots with Error Bars Introduction The procedure extends the capability of the basic scatter plot by allowing you to plot the variability in Y and X corresponding to each point. Each
SIMPLIFIED PERFORMANCE MODEL FOR HYBRID WIND DIESEL SYSTEMS. J. F. MANWELL, J. G. McGOWAN and U. ABDULWAHID
SIMPLIFIED PERFORMANCE MODEL FOR HYBRID WIND DIESEL SYSTEMS J. F. MANWELL, J. G. McGOWAN and U. ABDULWAHID Renewable Energy Laboratory Department of Mechanical and Industrial Engineering University of
Threshold Autoregressive Models in Finance: A Comparative Approach
University of Wollongong Research Online Applied Statistics Education and Research Collaboration (ASEARC) - Conference Papers Faculty of Informatics 2011 Threshold Autoregressive Models in Finance: A Comparative
Directions for using SPSS
Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...
SUMAN DUVVURU STAT 567 PROJECT REPORT
SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.
Simple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,
USE OF ARIMA TIME SERIES AND REGRESSORS TO FORECAST THE SALE OF ELECTRICITY
Paper PO10 USE OF ARIMA TIME SERIES AND REGRESSORS TO FORECAST THE SALE OF ELECTRICITY Beatrice Ugiliweneza, University of Louisville, Louisville, KY ABSTRACT Objectives: To forecast the sales made by
How To Use Statgraphics Centurion Xvii (Version 17) On A Computer Or A Computer (For Free)
Statgraphics Centurion XVII (currently in beta test) is a major upgrade to Statpoint's flagship data analysis and visualization product. It contains 32 new statistical procedures and significant upgrades
Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs
1.1 Introduction Lavastorm Analytic Library Predictive and Statistical Analytics Node Pack FAQs For brevity, the Lavastorm Analytics Library (LAL) Predictive and Statistical Analytics Node Pack will be
Analysis and Computation for Finance Time Series - An Introduction
ECMM703 Analysis and Computation for Finance Time Series - An Introduction Alejandra González Harrison 161 Email: [email protected] Time Series - An Introduction A time series is a sequence of observations
