Time Series Analysis Using SAS®
Part I: The Augmented Dickey-Fuller (ADF) Test
By Ismail E. Mohamed

ABSTRACT

The purpose of this series of articles is to discuss SAS programming techniques specifically designed to simulate the steps involved in time series data analysis. The first part of this series will cover the Augmented Dickey-Fuller (ADF) test of time series (stationarity test). The second part will cover cointegration and error correction. The SAS techniques presented in both parts can be used with the more complex SAS routines such as PROC ARIMA, which require a high level of research and analysis expertise (Bails & Peppers, 1982).

INTRODUCTION

Time series data analysis has applications in many areas, including studying the relationship between wages and house prices, profits and dividends, and consumption and GDP. Many analysts erroneously use the framework of linear regression (OLS) models to predict change over time or to extrapolate from present conditions to future conditions. Extreme caution is needed when interpreting the results of regression models estimated using time series data.

Statisticians and analysts working with time series data uncovered a serious problem with standard analysis techniques applied to time series. Estimation of the parameters of the Ordinary Least Squares (OLS) regression model produced statistically significant results between time series that contain a trend and are otherwise random. This finding led to considerable work on how to determine what properties a time series must possess if econometric techniques are to be used. One basic conclusion was that any time series used in econometric applications must be stationary (Granger and Newbold, 1974).

This paper will discuss a simple SAS framework to assist SAS programmers in understanding and modeling time series data as a univariate series (Eq 1).

Y = α + βX + ε    (1)

BASICS AND TERMINOLOGY

Time series datasets are different from other ordinary datasets in that their observations are recorded sequentially over equal time increments (daily, weekly, monthly, quarterly, annually, etc.).
A simple example of a time series dataset (RawData) is illustrated below.

YEAR  QTR         X          Y
1987    4  -0.05294   0.067891
1988    1  -0.14696   0.063533
1988    2  -0.12600   0.065794
1988    3  -0.14656   0.060760
1988    4  -0.06056   0.062053
1989    1  -0.02644   0.057527
1989    2  -0.05778   0.049068
1989    3   0.01924   0.061497
1989    4  -0.10823   0.060421
1990    1  -0.04056   0.050771
1990    2  -0.03390   0.036702
1990    3  -0.06903   0.016959
1990    4   0.07547   0.002585

Each of x and y is called a series, while the combination of the two variables YEAR and QTR represents the sequential equal time increments.

If the x and y series are both non-stationary random processes (integrated), then modeling the x, y relationship as a simple OLS relationship as in Equation 1 will only generate a spurious regression. Granger and Newbold (1974) introduced the notion of a spurious regression, which they argued produces statistically significant results between series that contain a trend and are otherwise random.

Time series stationarity is a statistical characteristic of a series' mean and variance over time. If both are constant over time, then the series is said to be a stationary process (i.e., it is not a random walk/has no unit root); otherwise, the series is described as a non-stationary process (i.e., a random walk/has a unit root). Differencing techniques are normally used to transform a time series from non-stationary to stationary by subtracting from each datum in the series its predecessor. As such, the set of observations that corresponds to the initial time period (t) when the measurement was taken describes the series level. Differencing a series using differencing operations produces other sets of observations such as the first-differenced values, the second-differenced values, and so on.

x level                     x_t
x 1st-differenced value     x_t - x_{t-1}
x 2nd-differenced value     x_t - x_{t-2}   (the two-period difference, as returned by DIF2; differencing the first differences again would instead give x_t - 2x_{t-1} + x_{t-2})

If a series is stationary without any differencing, it is designated I(0), or integrated of order 0. On the other hand, a series that has stationary first differences is designated I(1), or integrated of order 1.

Stationarity of a series is an important phenomenon because it can influence the series' behavior. For example, the term shock is used frequently to indicate an unexpected change in the value of a variable (or error). For a stationary series a shock will gradually die away. That is, a shock during time t will have a smaller effect in time t+1, a still smaller effect in time t+2, etc.

The data used in this paper are assumed to represent time series data. Each series in Equation 1, namely x and y, must be examined at level for stationarity before proceeding to investigate the relationship between the two variables (the OLS regression analysis). In this specification, because the data used in the paper are quarterly series, stationarity testing will be conducted at level for up to 5 lagged periods.
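To make the contrast between a stationary series and a random walk concrete, the following is a minimal simulation sketch and is not part of the paper's analysis. The dataset name SimSeries, the variable names, the seed, and the coefficient 0.5 are all assumptions chosen for illustration.

```sas
/* Hypothetical illustration: one stationary AR(1) series and one random walk,
   both driven by the same sequence of shocks e.                              */
DATA SimSeries;
   CALL STREAMINIT(12345);            /* fixed seed so the run is repeatable  */
   ar1 = 0;                           /* stationary series, I(0)              */
   rw  = 0;                           /* random walk, I(1)                    */
   DO t = 1 TO 100;
      e   = RAND('NORMAL', 0, 1);     /* the shock arriving in period t       */
      ar1 = 0.5 * ar1 + e;            /* earlier shocks shrink by half each period */
      rw  = rw + e;                   /* earlier shocks are carried forward in full */
      rw_1st_DIFF = DIF1(rw);         /* first difference of the random walk  */
      OUTPUT;
   END;
RUN;
```

In this sketch rw_1st_DIFF is missing for t = 1 and equals the shock e from t = 2 onward, which is why one round of differencing turns the simulated random walk into a stationary series.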
The stationarity test will utilize the Augmented Dickey-Fuller (ADF) technique (Dickey and Fuller, 1981), which is a generalized auto-regression model formulated in the following regression equation:

$$\Delta x_{t} = \kappa\, x_{t-1} + \sum_{k=1}^{5} \varpi_{k}\, \Delta x_{t-k} + \varepsilon_{t} \qquad (2)$$

The model hypotheses of interest are:

The series is
   H0: Non-stationary
   HA: Stationary

The ADF statistic is compared to critical values to draw conclusions about stationarity (see Dickey and Fuller, 1979 for the critical values).

AN ANATOMY OF AN ADF EQUATION

$$\Delta x_{t} = \kappa\, x_{t-1} + \sum_{k=1}^{5} \varpi_{k}\, \Delta x_{t-k} + \varepsilon_{t}$$

   $\Delta x_{t}$                        This is the 1st-differenced value of x
   $x_{t-1}$                             This is the 1st-lagged value of x
   $\Delta x_{t-k},\ k = 1, \dots, 5$    These are the 1st, 2nd, 3rd, 4th, and 5th lagged values of the 1st-differenced values of x
   $\varepsilon_{t}$                     This is the error term

The above elements can be easily seen in the following chart.
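For reference, with the five lags used in this paper the summation in Eq. (2) can be written out term by term. The sketch below also adds a constant c, a symbol assumed here for the intercept that the PROC REG fit in the next section includes, and it states the hypotheses in terms of κ. That reading is the usual interpretation of an ADF regression, not something spelled out in the text above.

```latex
\begin{aligned}
\Delta x_{t} &= c + \kappa\, x_{t-1}
  + \varpi_{1}\,\Delta x_{t-1} + \varpi_{2}\,\Delta x_{t-2} + \varpi_{3}\,\Delta x_{t-3}
  + \varpi_{4}\,\Delta x_{t-4} + \varpi_{5}\,\Delta x_{t-5} + \varepsilon_{t} \\
H_{0} &: \kappa = 0 \quad \text{(unit root, non-stationary)} \\
H_{A} &: \kappa < 0 \quad \text{(stationary)}
\end{aligned}
```

Under this reading, the test statistic is the t-ratio of the estimated κ, compared against the Dickey-Fuller critical values and not the usual t table.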
SAS TECHNIQUES

As mentioned earlier, our sample data are quarterly, which dictates that five lagged differences be included in testing the stationarity of both series (x and y) for more explanatory power. The following SAS DATA step creates the first lagged, the first differenced, and the five lagged-differenced values of the x series. A similar step is needed to create the same variables from the y series.

The DATA step exploits the power of the SAS LAG and DIF functions to create the set of lagged and differenced values of x. The LAG function simply looks back n records in the dataset and allows you to obtain a previous value of a variable and store it in the current observation. 'n' refers to the number of records back in the data and can be an integer from 1 to 99.

Many times the only thing you want to do with a previous value of a variable is to compare it with the current value to compute the difference. The DIFn function works the same way as LAGn, but rather than simply assigning a value, it assigns the difference between the current value and a previous value of a variable. The statement A = DIFn(X) tells SAS that A should equal the current value of x minus the value x had n records back in time.

It is always recommended that the LAG and DIF functions not be executed conditionally, because they could cause unexpected results. If you have to use them with conditional processing of a dataset, first execute the functions and assign their results to a new variable, then use the new variable for the conditional processing. Both LAG and DIF should only be used on the right-hand side of assignment statements.
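The warning about conditional execution can be seen in a small sketch before turning to the main DATA step. The dataset Demo, its six values, and the variable names bad_lag, x_lag, and good_lag are hypothetical and serve only to illustrate the point.

```sas
/* Hypothetical demo data: six observations of x */
DATA Demo;
   INPUT x @@;
   DATALINES;
10 20 30 40 50 60
;
RUN;

DATA LagDemo;
   SET Demo;
   /* Risky: LAG1 executes only on even-numbered rows, so its queue holds the
      value from the previous even-numbered row, not from the previous record */
   IF MOD(_N_, 2) = 0 THEN bad_lag = LAG1(x);

   /* Safer: execute LAG1 on every row, then apply the condition to the result */
   x_lag = LAG1(x);
   IF MOD(_N_, 2) = 0 THEN good_lag = x_lag;
RUN;
```

On rows 2, 4, and 6, bad_lag comes out as missing, 20, and 40, while good_lag holds the true previous values 10, 30, and 50. The difference arises because each LAG call keeps its own queue, and that queue is updated only when the call actually executes.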
   DATA TimeSeries;
      SET RawData;
      x_1st_LAG          = LAG1(x);         /* x(t-1): 1st-lagged value of x           */
      x_1st_DIFF         = DIF1(x);         /* x(t) - x(t-1): 1st-differenced value    */
      x_1st_DIFF_1st_LAG = DIF1(LAG1(x));   /* 1st lag of the 1st-differenced value    */
      x_1st_DIFF_2nd_LAG = DIF1(LAG2(x));   /* 2nd lag of the 1st-differenced value    */
      x_1st_DIFF_3rd_LAG = DIF1(LAG3(x));   /* 3rd lag of the 1st-differenced value    */
      x_1st_DIFF_4th_LAG = DIF1(LAG4(x));   /* 4th lag of the 1st-differenced value    */
      x_1st_DIFF_5th_LAG = DIF1(LAG5(x));   /* 5th lag of the 1st-differenced value    */
   RUN;

SAS Output (partial): 1st-lagged, 1st-differenced, and the 1st through 5th lagged values of the 1st-differenced value of x

                                             DIFF_     DIFF_     DIFF_     DIFF_     DIFF_
YEAR  QTR         X       LAG      DIFF     1_LAG     2_LAG     3_LAG     4_LAG     5_LAG
1987    4  -0.05294         .         .         .         .         .         .         .
1988    1  -0.14696  -0.05294  -0.09402         .         .         .         .         .
1988    2  -0.12600  -0.14696   0.02096  -0.09402         .         .         .         .
1988    3  -0.14656  -0.12600  -0.02057   0.02096  -0.09402         .         .         .
1988    4  -0.06056  -0.14656   0.08600  -0.02057   0.02096  -0.09402         .         .
1989    1  -0.02644  -0.06056   0.03412   0.08600  -0.02057   0.02096  -0.09402         .
1989    2  -0.05778  -0.02644  -0.03134   0.03412   0.08600  -0.02057   0.02096  -0.09402
1989    3   0.01924  -0.05778   0.07702  -0.03134   0.03412   0.08600  -0.02057   0.02096
1989    4  -0.10823   0.01924  -0.12748   0.07702  -0.03134   0.03412   0.08600  -0.02057
1990    1  -0.04056  -0.10823   0.06767  -0.12748   0.07702  -0.03134   0.03412   0.08600
1990    2  -0.03390  -0.04056   0.00666   0.06767  -0.12748   0.07702  -0.03134   0.03412
1990    3  -0.06903  -0.03390  -0.03513   0.00666   0.06767  -0.12748   0.07702  -0.03134
1990    4   0.07547  -0.06903   0.14451  -0.03513   0.00666   0.06767  -0.12748   0.07702
1991    1   0.03567   0.07547  -0.03981   0.14451  -0.03513   0.00666   0.06767  -0.12748
1991    2   0.09819   0.03567   0.06252  -0.03981   0.14451  -0.03513   0.00666   0.06767

Next, the SAS REG procedure, one of many regression procedures in the SAS System, is used to regress the lagged and differenced values of x generated by the above DATA step. The regression model is set up so that the first-differenced value of x is the dependent variable, and the independent variables are the first-lagged value of x and the set of five lagged first-differenced values of the x series. This analysis provides a "best-fit" mathematical equation for the relationship exhibited in Eq (2).

SAS REG procedure for Unit Root Test at level, with fixed 5 Lag Length and a Constant:

   PROC REG DATA = TimeSeries;
      MODEL x_1st_DIFF = x_1st_LAG
                         x_1st_DIFF_1st_LAG
                         x_1st_DIFF_2nd_LAG
                         x_1st_DIFF_3rd_LAG
                         x_1st_DIFF_4th_LAG
                         x_1st_DIFF_5th_LAG;
   RUN;
   QUIT;

DISCUSSION

The t-value of x_1st_LAG generated by the above regression model corresponds to the Augmented Dickey-Fuller (ADF) test statistic. Compare this t-value to the critical values (see Dickey and Fuller, 1979 for the critical values) to test the hypotheses that the x series is:

   H0: Non-stationary
   HA: Stationary

In our example the t-value (-1.83) is greater than the critical values (CVs) at the 1%, 5%, and 10% significance levels (-3.524233, -2.902358, and -2.588587 respectively). We therefore fail to reject the null hypothesis and conclude that the x series is a non-stationary process when tested at level.

WHAT IS NEXT?

If we fail to reject the null hypothesis and conclude that x and perhaps y are non-stationary series, we would have to difference each series once, create a set of lagged and differenced variables as shown in the earlier SAS DATA step, this time from the differenced values of each series, and finally carry out the ADF test (testing the series' stationarity at its first-differenced value). Differencing a series normally transforms it from non-stationarity to stationarity. A difference-stationary series is said to be integrated and is denoted I(d), where d is the order of integration. The order of integration is the number of unit roots contained in the series, or the number of differencing operations it takes to make the series stationary. For our purpose here, since we will difference our example series once, there is one unit root, so it is an I(1) series. Once both x and y are determined to be non-stationary at their level, we will move further to examine the nature of their linear combination.
Specifically, we will be interested in examining the linear combination between the non-stationary x and y. If a stationary linear combination of the two exists, then the x and y series are said to be cointegrated. The linear combination between them is the cointegrating equation and may be interpreted as the long-run equilibrium relationship between the two variables. Fortunately, this test can also be accomplished using the Augmented Dickey-Fuller test, and it will be the subject of the second part of this series of articles.
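The retest at first difference described under WHAT IS NEXT? could be sketched along the following lines. This is an assumption-laden outline and not the author's code: the dataset name TimeSeriesDiff and every variable name beginning with dx are hypothetical, and the step simply reuses the pattern of the earlier DATA step with the first difference of x in place of x.

```sas
/* Hypothetical sketch: ADF regression on the first difference of x */
DATA TimeSeriesDiff;
   SET RawData;
   dx = DIF1(x);                            /* the series now under test: x(t) - x(t-1) */
   dx_1st_LAG          = LAG1(dx);          /* 1st-lagged value of dx                   */
   dx_1st_DIFF         = DIF1(dx);          /* 1st-differenced value of dx              */
   dx_1st_DIFF_1st_LAG = DIF1(LAG1(dx));    /* lags 1 to 5 of the differenced dx        */
   dx_1st_DIFF_2nd_LAG = DIF1(LAG2(dx));
   dx_1st_DIFF_3rd_LAG = DIF1(LAG3(dx));
   dx_1st_DIFF_4th_LAG = DIF1(LAG4(dx));
   dx_1st_DIFF_5th_LAG = DIF1(LAG5(dx));
RUN;

PROC REG DATA = TimeSeriesDiff;
   /* The t-value of dx_1st_LAG plays the role of the ADF statistic */
   MODEL dx_1st_DIFF = dx_1st_LAG
                       dx_1st_DIFF_1st_LAG
                       dx_1st_DIFF_2nd_LAG
                       dx_1st_DIFF_3rd_LAG
                       dx_1st_DIFF_4th_LAG
                       dx_1st_DIFF_5th_LAG;
RUN;
QUIT;
```

As with the test at level, the t-value on dx_1st_LAG would be compared with Dickey-Fuller critical values appropriate to the number of observations actually used, which is one fewer than in the level regression because of the extra differencing.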
SAS Output: Regression Analysis (Unit Root Test), Level with 5 Lags

NULL HYPOTHESIS: 'x' has a unit root
LAG LENGTH: 5 (FIXED)
AUGMENTED DICKEY-FULLER TEST STATISTICS, TEST CRITICAL VALUES:
    1% LEVEL T-STATISTICS = -3.524233
    5% LEVEL T-STATISTICS = -2.902358
   10% LEVEL T-STATISTICS = -2.588587

LEVEL WITH 5 LAGS

The REG Procedure
Model: MODEL1
Dependent Variable: x_1st_DIFF

Number of Observations Read                    78
Number of Observations Used                    72
Number of Observations with Missing Values      6

Analysis of Variance

                              Sum of         Mean
Source              DF       Squares       Square    F Value    Pr > F
Model                6       0.08731      0.01455      21.25    <.0001
Error               65       0.04451   0.00068479
Corrected Total     71       0.13182

Root MSE              0.02617    R-Square    0.6623
Dependent Mean        0.00172    Adj R-Sq    0.6312
Coeff Var          1518.81011

Parameter Estimates

                               Parameter     Standard
Variable                DF      Estimate        Error    t Value    Pr > |t|
Intercept                1       0.00916      0.00422       2.17      0.0338
x_1st_LAG                1      -0.16361      0.08960      -1.83      0.0724
x_1st_DIFF_1st_LAG       1      -0.43485      0.13151      -3.31      0.0015
x_1st_DIFF_2nd_LAG       1       0.11255      0.10735       1.05      0.2983
x_1st_DIFF_3rd_LAG       1       0.23609      0.10676       2.21      0.0305
x_1st_DIFF_4th_LAG       1      -0.42082      0.10964      -3.84      0.0003
x_1st_DIFF_5th_LAG       1      -0.12741      0.10698      -1.19      0.2380

The hypothesis that x has a unit root cannot be rejected.

EVIEWS®¹ CODE AND OUTPUT FOR COMPARISON

Similarly, EVIEWS or other SAS time series tools can be used to carry out the same test. The following EVIEWS code can be used to carry out the ADF test. Results of this code are shown on the next page.

   Uroot(adf,const,lag=5,save=mout)

¹ EVIEWS is an econometrics and time series analysis software package by Quantitative Micro Software. http://www.eviews.com/index.html
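As one example of the "other SAS time series tools" mentioned above, PROC ARIMA can report ADF statistics directly through the STATIONARITY= option of its IDENTIFY statement. The sketch below is not taken from the paper. It assumes that SAS/ETS is licensed and that the RawData dataset with variable x is available.

```sas
/* Hedged sketch: built-in ADF test with 5 augmenting lags via PROC ARIMA (SAS/ETS) */
PROC ARIMA DATA = RawData;
   IDENTIFY VAR = x STATIONARITY = (ADF = (5));   /* request the ADF test at lag 5 */
RUN;
QUIT;
```

The procedure prints Zero Mean, Single Mean, and Trend variants of the test. The Single Mean row is the one that should correspond to the constant-included regression fitted with PROC REG above, so its Tau statistic can serve as a cross-check on the hand-built test.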
The hypothesis that x has a unit root cannot be rejected.

REFERENCES

Bails, Dale G. and Larry C. Peppers (1982). Business Fluctuations: Forecasting Techniques and Applications, Englewood Cliffs, NJ: Prentice-Hall Inc.
Dickey, D. and W. Fuller (1979). Distribution of the Estimators for Autoregressive Time Series with a Unit Root, Journal of the American Statistical Association, 74, 427-431.
Fuller, W. (1996). Introduction to Statistical Time Series, Second Edition. John Wiley, New York.
Granger, C.W.J. and P. Newbold (1974). Spurious Regressions in Econometrics, Journal of Econometrics, 2, 111-120.
Hamilton, J.D. (1994). Time Series Analysis, Princeton University Press.
Phillips, P.C.B. (1987). Time Series Regression with a Unit Root, Econometrica, 55, 277-301.

ACKNOWLEDGEMENTS

My sincere thanks to everyone I have had the pleasure of exchanging time series analysis related ideas with in recent years. Special thanks to Theresa Diventi and Ian Keith, both with the Financial Institutions Regulation Division, Kee N. Cheung with the Housing Finance Analysis Division of the U.S. Department of Housing and Urban Development, and Ronald Hanson with L3 Communications, Enterprise IT Solutions (EITS), for their constructive suggestions, which added much to this paper.

TRADEMARKS

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. EVIEWS and all other EVIEWS product or service names are registered trademarks or trademarks of Quantitative Micro Software in the USA and other countries. ® indicates USA registration.

The author welcomes and encourages any questions, corrections, improvements, feedback, and remarks, both on- and off-topic, via email. Please contact the author:

Ismail E. Mohamed, Ph.D
Software Engineer 5, L3 Communications, Enterprise IT Solutions (EITS)
U.S. Department of Housing & Urban Development
451 7th Street, SW, Room 8212, Washington, DC 20410
E-mail: ismail.mohamed@l-3com.com; Ismail.Mohamed@hud.gov
Phone: 202-402-5884