ARIMA forecasts

Open the usa.dta data set (1984q1-2009q4), create the dates, and declare the data as a time series. Save the data so you won't have to do this step again.

use usa, clear
* ---------------------------------------
* Create dates and declare time-series
* ---------------------------------------
generate date = q(1984q1) + _n - 1
format date %tq
tsset date

Here, we plot real GDP, its difference, its natural log, and the log difference.

qui gen lg = ln(gdp)
qui tsline gdp, name(g, replace)
qui tsline D.gdp, name(dg, replace)
qui tsline lg, name(lg, replace)
qui tsline D.lg, name(dlg, replace)
graph combine g dg lg dlg
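The save step mentioned above is not shown in the listing; a minimal version (assuming you are happy to overwrite usa.dta) is:

```stata
* overwrite usa.dta with the tsset version so the setup need not be repeated
save usa, replace
```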
[Figure: combined time-series plots of real US gross domestic product in levels, first differences, natural logs, and log differences.]

Looks like there is a trend in the level (perhaps exponential). The difference (upper right) may show a slight upward trend until the bottom dropped out in late 2008. Still, I see no reason to use logs, so I won't. Others might disagree.

dfgls gdp
dfgls D.gdp, notrend

. dfgls gdp

DF-GLS for gdp                                   Number of obs =    91
Maxlag = 12 chosen by Schwert criterion

               DF-GLS tau    1% Critical    5% Critical    10% Critical
  [lags]     Test Statistic      Value          Value          Value
------------------------------------------------------------------------------
    12          -0.996          -3.575         -2.753         -2.479
    11          -1.147          -3.575         -2.783         -2.508
    10          -1.571          -3.575         -2.813         -2.537
     9          -1.707          -3.575         -2.842         -2.565
     8          -1.147          -3.575         -2.870         -2.591
     7          -1.131          -3.575         -2.898         -2.617
     6          -1.256          -3.575         -2.924         -2.641
     5          -1.402          -3.575         -2.949         -2.664
     4          -1.371          -3.575         -2.972         -2.686
     3          -1.193          -3.575         -2.994         -2.706
     2          -1.324          -3.575         -3.014         -2.723
     1          -1.155          -3.575         -3.031         -2.739

Opt Lag (Ng-Perron seq t) = 11 with RMSE 55.07213
Min SC   = 8.324652 at lag  1 with RMSE 61.11491
Min MAIC = 8.277655 at lag  1 with RMSE 61.11491

That's not too good. Clearly we are in the not-reject region. The level is nonstationary. And the differences with notrend results:

. dfgls D.gdp, notrend

DF-GLS for D.gdp                                 Number of obs =    90
Maxlag = 12 chosen by Schwert criterion

               DF-GLS mu     1% Critical    5% Critical    10% Critical
  [lags]     Test Statistic      Value          Value          Value
------------------------------------------------------------------------------
    12          -2.644          -2.600         -1.971         -1.672
    11          -2.942          -2.600         -1.986         -1.687
    10          -2.727          -2.600         -2.001         -1.701
     9          -2.140          -2.600         -2.016         -1.716
     8          -2.033          -2.600         -2.031         -1.731
     7          -2.769          -2.600         -2.046         -1.745
     6          -2.930          -2.600         -2.061         -1.759
     5          -2.872          -2.600         -2.075         -1.772
     4          -2.870          -2.600         -2.088         -1.785
     3          -3.241          -2.600         -2.101         -1.797
     2          -3.924          -2.600         -2.113         -1.808
     1          -3.894          -2.600         -2.124         -1.817

Opt Lag (Ng-Perron seq t) = 10 with RMSE 55.81506
Min SC   = 8.33664  at lag  1 with RMSE 61.45603
Min MAIC = 8.700253 at lag  1 with RMSE 61.45603

The statistic is significant at every lag. Go for the differences. Removing the trend has no substantive effect in this case. I think the DF-GLS test is the way to go as opposed to the usual DF or ADF test (it is more powerful than the ADF), so I'll use it. Also, this test in Stata is useful in helping to select the number of lags to use.

First, I'll run the autoregressions manually using the regress command, testing the residuals for autocorrelation after each.

reg D.gdp L.D.gdp
estat bgodfrey
reg D.gdp L(1/2).D.gdp
estat bgodfrey
. reg D.gdp L.D.gdp

      Source |       SS       df       MS              Number of obs =     102
-------------+------------------------------           F(  1,   100) =   47.92
       Model |  168398.972     1  168398.972           Prob > F      =  0.0000
    Residual |  351387.432   100  3513.87432           R-squared     =  0.3240
-------------+------------------------------           Adj R-squared =  0.3172
       Total |  519786.404   101  5146.40003           Root MSE      =  59.278

------------------------------------------------------------------------------
       D.gdp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |
         LD. |   .5712509   .0825183     6.92   0.000      .407537    .7349649
       _cons |   43.95044   10.19719     4.31   0.000     23.71952    64.18137
------------------------------------------------------------------------------

. estat bgodfrey

Breusch-Godfrey LM test for autocorrelation
---------------------------------------------------------------------------
    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |          1.692               1                   0.1933
---------------------------------------------------------------------------
                        H0: no serial correlation

. reg D.gdp L(1/2).D.gdp

      Source |       SS       df       MS              Number of obs =     101
-------------+------------------------------           F(  2,    98) =   24.76
       Model |  174120.822     2   87060.411           Prob > F      =  0.0000
    Residual |  344632.963    98  3516.66289           R-squared     =  0.3357
-------------+------------------------------           Adj R-squared =  0.3221
       Total |  518753.785   100  5187.53785           Root MSE      =  59.301

------------------------------------------------------------------------------
       D.gdp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         gdp |
         LD. |   .4968111   .1008118     4.93   0.000     .2967534    .6968689
        L2D. |   .1295186   .1008543     1.28   0.202    -.0706234    .3296606
       _cons |    38.6639   11.11212     3.48   0.001     16.61225    60.71554
------------------------------------------------------------------------------

. estat bgodfrey

Breusch-Godfrey LM test for autocorrelation
---------------------------------------------------------------------------
    lags(p)  |          chi2               df                 Prob > chi2
-------------+-------------------------------------------------------------
       1     |          0.415               1                   0.5196
---------------------------------------------------------------------------
                        H0: no serial correlation

I estimated AR(1) and AR(2) models on the differenced series. AR(1) is probably the best choice, but I continue the example with AR(2) just for fun.

The arima command is very convenient. It can be used to take differences, add autoregressive terms, add other regressors and their lags, and add autocorrelated errors to the model (called moving-average terms). Here is the syntax:
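As a further check on the lag choice, Stata's postestimation command estat ic reports the AIC and BIC after each regression. A sketch, rerunning the same two autoregressions as above:

```stata
* compare information criteria across the two autoregressions
reg D.gdp L.D.gdp
estat ic            // AIC and BIC for the AR(1)
reg D.gdp L(1/2).D.gdp
estat ic            // AIC and BIC for the AR(2)
```

The model with the smaller criterion values is preferred; this complements the Breusch-Godfrey residual tests rather than replacing them.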
Title

    [TS] arima -- ARIMA, ARMAX, and other dynamic regression models

Syntax

    Basic syntax for a regression model with ARMA disturbances
        arima depvar [indepvars], ar(numlist) ma(numlist)

    Basic syntax for an ARIMA(p,d,q) model
        arima depvar, arima(#p,#d,#q)

    options                      description
    ----------------------------------------------------------------------
    Model
      noconstant                 suppress constant term
      arima(#p,#d,#q)            specify ARIMA(p,d,q) model for dependent
                                   variable
      ar(numlist)                autoregressive terms of the structural
                                   model disturbance
      ma(numlist)                moving-average terms of the structural
                                   model disturbance
      constraints(constraints)   apply specified linear constraints
      collinear                  keep collinear variables

I want 2 autoregressive terms and to take the first difference of real GDP. That is done with arima gdp, arima(2,1,0).

. arima gdp, arima(2,1,0)

(setting optimization to BHHH)
Iteration 0:   log likelihood = -564.46367
Iteration 1:   log likelihood = -564.45944
Iteration 2:   log likelihood = -564.45779
Iteration 3:   log likelihood = -564.45655
Iteration 4:   log likelihood = -564.45556
(switching optimization to BFGS)
Iteration 5:   log likelihood =  -564.4548
Iteration 6:   log likelihood = -564.45308
Iteration 7:   log likelihood =  -564.4521
Iteration 8:   log likelihood = -564.45206

ARIMA regression

Sample:  2 - 104                                Number of obs   =        103
                                                Wald chi2(2)    =      34.79
Log likelihood = -564.4521                      Prob > chi2     =     0.0000

------------------------------------------------------------------------------
             |                 OPG
       D.gdp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
gdp          |
       _cons |   102.3637   17.97174     5.70   0.000     67.13974    137.5877
-------------+----------------------------------------------------------------
ARMA         |
          ar |
         L1. |   .4920216   .1077538     4.57   0.000     .2808281    .7032151
         L2. |   .1274014   .0886153     1.44   0.151    -.0462814    .3010841
-------------+----------------------------------------------------------------
      /sigma |    57.9252   2.452609    23.62   0.000     53.11817    62.73223
------------------------------------------------------------------------------

The results for the AR terms are very close to those from least squares. ML is not making much of a difference in estimating the parameters. Compare the standard errors, though. To generate a series of 1-step-ahead forecasts, simply use
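One caveat when comparing the constants: arima parameterizes its constant as the unconditional mean of the differenced series, while regress reports the intercept directly. Multiplying the arima constant by (1 - phi1 - phi2) recovers something comparable to the least-squares intercept; a quick check with the estimates above:

```stata
* mean-form constant times (1 - sum of AR coefficients)
* should be close to the regression intercept
display 102.3637*(1 - .4920216 - .1274014)
```

This gives about 38.96, close to the 38.66 intercept from the AR(2) regression.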
predict ghat, y

Dynamic forecasts can be generated as well. These use actual values of gdp up to a point and then use forecasted values for all subsequent periods. These will be quite smooth.

predict ghatdy, dynamic(tq(2004q1)) y
tsline gdp ghatdy ghat if tin(2004q1,)

[Figure: real US gross domestic product plotted against the one-step forecasts (y prediction, one-step) and the dynamic forecasts (y prediction, dyn(tq(2004q1))), 2004q1-2010q1.]

You can see that the 1-step forecasts never deviate very far from the actual series (since they use actual values of gdp each time). The dynamic forecast is smoother, and its deviations from actual gdp are fairly large (at least for a while).
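The forecasts above stop at the last observation in the data. To push forecasts past 2009q4, one possible approach (the variable name ghat_oos is mine) is to append empty quarters with tsappend and then predict dynamically from the first out-of-sample date:

```stata
* add 4 empty quarters to the end of the tsset data
tsappend, add(4)
* dynamic forecasts starting at the first out-of-sample quarter
predict ghat_oos, dynamic(tq(2010q1)) y
tsline gdp ghat_oos if tin(2008q1,)
```

Since gdp is missing in the appended rows, everything from 2010q1 onward is a genuine out-of-sample dynamic forecast.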