FORECASTING AND TIME SERIES ANALYSIS USING THE SCA STATISTICAL SYSTEM


FORECASTING AND TIME SERIES ANALYSIS USING THE SCA STATISTICAL SYSTEM, VOLUME 2

Expert System Capabilities for Time Series Modeling
Simultaneous Transfer Function Modeling
Vector Modeling

by Lon-Mu Liu, in collaboration with George E. P. Box and George C. Tiao

This manual is published by Scientific Computing Associates Corp., 913 West Van Buren Street, Suite 3H, Chicago, Illinois, U.S.A.

Copyright Scientific Computing Associates Corp.,


TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION

CHAPTER 2 MODELING AND FORECASTING TIME SERIES USING SCA-EXPERT CAPABILITIES
  2.1 Modeling and Forecasting a Univariate Time Series
  The univariate ARIMA Model
  Modeling and forecasting multi-variable time series
  Transfer Function Models
  LTF Method
  Differencings
  An Illustrative Example for Reduced Form Transfer Function Modeling
  An Illustrative Example for Structural Form Transfer Function Modeling
  Identification of Rational Transfer Function Models
  Appendix A Summary of the SCA Paragraphs in Chapter
    IARIMA, IESTIM
  Appendix B Identification of seasonal ARIMA models using a filtering method
    B.1 Introduction
    B.2 Basic Concepts of the Method and Its Rationale
    B.3 A Summary of the Method
    B.4 Example
    B.5 Discussion
  References in Chapter

CHAPTER 3 MULTIVARIATE TIME SERIES ANALYSIS AND FORECASTING USING SIMULTANEOUS TRANSFER FUNCTION MODELS
  3.1 Model Building Strategy for STF Models
  Identification of STF models
  Specification of an STF model
  Estimation of an STF model
  Diagnostic checking
  Forecasting future observations
  Further analysis of the example
  3.2 Identification of STF Models
  Transfer function models
  The LTF method
  An illustrated example
  Multivariate Time Series Analysis with Interventions
  Econometric Modeling Using the STF Models
  Specification of endogenous variables
  Specification of definitional equations
  Examples for estimation of STF model
    (A) Kmenta's demand-supply model
    (B) Klein's U.S. economy Model I
  Model Simulation
  Summary of the SCA Paragraphs in Chapter
    CCM, STEPAR, MIDEN, ECCM, SCAN, MTSMODEL, MESTIM, IMESTIM, MFORECAST, and CANONICAL Paragraphs
  References in Chapter

CHAPTER 4 MULTIVARIATE TIME SERIES ANALYSIS AND FORECASTING USING VECTOR ARMA MODELS
  4.1 Implications of the Vector ARMA Model
  The vector MA(1) model
  The vector AR(1) model
  The vector ARMA(1,1) model
  A nonstationary vector model
  Relationship of vector ARMA models to transfer function models
  Relationship of vector models to econometric models
  Model building strategy for vector ARMA models
  A Simulated Example
  Sample cross correlation matrices
  Specification of a vector ARMA model
  Model estimation for the simulated MA(1) data
  Diagnostic checks of the fitted model
  Forecasting an estimated model
  Modeling an Autoregressive Process: Lydia Pinkham Data
  Preliminary identification: sample cross correlation matrices
  Preliminary identification, continued: stepwise Autoregressive fitting
  4.3.3 Initial model specification and estimation for Lydia Pinkham data
  Estimation with constraints
  Interpreting the estimation results
  Analysis of a Mixed Vector ARMA Model
  Preliminary model identification, CCM and STEPAR
  Identification methods for a mixed model
  Model specification and estimation for the U.K. financial data
  Diagnostic checking and implication of the fitted model
  Modeling Seasonal Data: Census Housing Data
  Preliminary model identification
  Model specification and estimation
  Diagnostic checks of the estimated model
  Automatic Vector ARMA Estimation
  The simulated example in Section
  The Lydia Pinkham example in Section
  The U.K. financial data example in Section
  The Census housing data example in Section
  Summary of the SCA Paragraphs in Chapter
    CCM, STEPAR, MIDEN, ECCM, SCAN, MTSMODEL, MESTIM, IMESTIM, MFORECAST, and CANONICAL Paragraphs
  References in Chapter


CHAPTER 1 INTRODUCTION

The Forecasting and Modeling Package of the SCA Statistical System comprises five products:

  UTS: Univariate time series analysis and forecasting using Box-Jenkins ARIMA, intervention, and transfer function models. This product also includes forecasting capabilities using general exponential smoothing methods.

  Extended UTS: Univariate time series analysis and forecasting with automatic outlier detection and adjustment, as well as analysis and forecasting of time series containing missing data.

  EXPERT: Automatic time series modeling using Box-Jenkins ARIMA, intervention, transfer function, and vector ARMA models. The automatic vector ARMA modeling component requires the SCA-MTS product.

  ECON/M: Econometric modeling, multivariate time series analysis, and forecasting using simultaneous transfer function (STF) models. This module also provides the seasonal adjustment procedures X-11, X-11-ARIMA, and a model-based canonical decomposition method.

  MTS: Multivariate time series analysis and forecasting using vector ARMA models.

The manual, Forecasting and Time Series Analysis Using the SCA Statistical System, describes the capabilities of the above products. This manual has two volumes. Volume 1 describes the capabilities of the SCA-UTS and Extended UTS products, and Volume 2 describes additional forecasting and time series analysis capabilities of the SCA Statistical System as of the document's print date. Any new capabilities or new SCA products for time series analysis and forecasting will be documented as an addendum to these manuals or as a stand-alone monograph.

Capabilities described in this volume include:

  Expert modeling capabilities (Chapter 2): Automatic time series modeling using Box-Jenkins ARIMA, intervention, and transfer function models.

  Simultaneous transfer function modeling (Chapter 3): Multivariate time series analysis and forecasting using STF models. It also discusses the use of STF models in econometric analysis.

  Vector ARMA modeling (Chapter 4): Multivariate time series analysis and forecasting using vector ARMA models.

This volume should be used in conjunction with its companion manuals, Forecasting and Time Series Analysis Using the SCA Statistical System, Volume 1, and The SCA Statistical System: Reference Manual for Fundamental Capabilities. These companion manuals provide information on univariate time series analysis and on the basic functionality of the SCA System.

Whenever possible, material in this manual is presented in a data analysis form. That is, SCA System capabilities, commands, and output are usually presented within the context of a data analysis. Examples have been chosen both to demonstrate the use of the SCA System and to provide some broad guidelines for forecasting and time series analysis.

This volume is an extension of the manual, Forecasting and Time Series Analysis Using the SCA Statistical System, Volume 1. It is highly recommended that both volumes be used as reference material when working with the time series analysis and forecasting capabilities of the SCA System.

One key reference and source of examples in this manual is the text Time Series Analysis: Forecasting and Control by Box and Jenkins (1970). This text contains many important concepts and properties of forecasting and time series analysis.

REFERENCES

Box, G.E.P. and Jenkins, G.M. (1970). Time Series Analysis: Forecasting and Control. San Francisco: Holden-Day. (Revised edition published 1976).


CHAPTER 2 MODELING AND FORECASTING TIME SERIES USING SCA-EXPERT CAPABILITIES

In forecasting and analysis of time series data, it is well demonstrated that autoregressive integrated moving average (ARIMA), intervention, and transfer function models are very effective in handling practical applications. Vast advancements in both theory and methods in this area of research have been accomplished over the last two decades. Unfortunately, these methods are not as widely used as they should be, given the great advantages they offer. It seems that the complexity and often time-consuming nature of the model building process has imposed a barrier between the methodology and its use in mainstream business and industrial applications.

SCA has removed this barrier with the development of SCA-EXPERT. SCA-EXPERT employs new expert system technology to facilitate automatic ARIMA, intervention, and transfer function modeling. The SCA-EXPERT product is very easy to use, and is well suited to most forecasting and time series analysis applications. It is an asset to novices and experts alike. The automatic nature of the modeling capability provides a mechanism for time series models to be adopted in business and industrial applications. It is a quick and effective solution for repetitive or large-scale modeling and forecasting problems. SCA-EXPERT can identify and estimate an appropriate time series model within seconds. In education, the SCA-EXPERT product allows the student to concentrate on the interpretation and application of results, and to spend less time on the complexity of model identification techniques.

The SCA-EXPERT product is designed to be a self-contained product. It provides all SCA fundamental capabilities (please refer to the SCA document The SCA Statistical System: Reference Manual for Fundamental Capabilities for detailed information).
In addition, it contains the following paragraphs for modeling and forecasting time series: IARIMA, IESTIM, TSMODEL, ESTIM, FORECAST, SFORECAST, OUTLIER, ACF, CCF, CORNER, REGRESSION, DAYS, EASTER, AGGREGATE, PERCENT, and PATCH. Except for IARIMA and IESTIM, the functionality of the above SCA paragraphs can be found in the SCA document Forecasting and Time Series Analysis Using the SCA Statistical System: Volume 1. The capabilities of the IARIMA and IESTIM paragraphs are described in this document. Since outliers (abnormal or extreme observations) commonly occur in real-life time series, it is desirable to employ the SCA Extended UTS capabilities OESTIM and OFORECAST in conjunction with the SCA-EXPERT capabilities. Several examples of this interplay are provided in this document.

This document should be used in conjunction with other SCA reference manuals. For general functionality of the SCA System, see The SCA Statistical System: Reference Manual for Fundamental Capabilities (Liu and Hudak 1991). For detailed information on modeling and forecasting using ARIMA, intervention, and transfer function models, please refer to Forecasting and Time Series Analysis Using the SCA Statistical System: Volume 1 (Liu et al. 1992). The latter text also contains information regarding outlier detection and adjustment in time series modeling.

This document begins by describing how to use SCA-EXPERT for univariate ARIMA modeling. We then describe the use of SCA-EXPERT in transfer function modeling.

2.1 Modeling And Forecasting A Univariate Time Series

ARIMA models (Box and Jenkins 1970) are useful in many aspects of time series analysis. Such models can be used (1) to understand the nature of a time series, (2) to forecast future observations, and (3) to capture serial correlation in an intervention or a transfer function model. ARIMA models are simple in structure, yet they are quite effective in capturing the patterns of serial correlation and in forecasting the future observations of a time series. When forecasts are derived using a more complicated model (such as a multi-variable or non-linear time series model), they are often compared with those generated by an ARIMA model. If the forecasts generated under a more complicated time series model are less accurate than those under an ARIMA model, it often signifies misspecification in the more complicated model, or the existence of outliers in the series. The effects of outliers on forecasting performance are discussed in Hillmer (1984), Ledolter (1989), and Chen and Liu (1993).

It is highly recommended that the first step in any statistical modeling be to plot the data. In time series modeling, we can use the TSPLOT or TPLOT paragraph, or the time plot capability of SCAGRAF (see The SCA Graphics Package User's Guide) for this purpose. By viewing the plot, we can easily spot important characteristics of a time series, for example, nonstationarity of the series, presence of trend or seasonality, and the existence of major outliers or extreme values. Such information is not only useful for modeling, but also important for forecasting and other applications of the time series model.

After examining the time series plot, the IARIMA paragraph can be employed to automatically identify and estimate an ARIMA model for the series. Since the IARIMA paragraph typically employs the conditional likelihood algorithm for model estimation, the SCA estimation paragraph ESTIM (or OESTIM in the Extended UTS) can be used to obtain more efficient parameter estimates if needed. With an appropriately estimated model, the forecasts of the series can be generated using the FORECAST (or OFORECAST in the Extended UTS) paragraph. In this process, if a user's primary interest is forecasting, it is possible to ignore model information and simply generate the forecasts based on the automatically identified and estimated model. Generally speaking, however, it is advisable to have some basic knowledge of ARIMA models in order to better understand the generated forecasts.
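
As a numerical complement to the time series plot, the sample autocorrelation function (ACF) can hint at trend and seasonality: slowly decaying autocorrelations suggest nonstationarity, and a peak at lag 12 in monthly data suggests seasonality. The following is a minimal Python sketch, not SCA code and not the algorithm used by the ACF or IARIMA paragraphs. The function name `sample_acf` and the test series are hypothetical and chosen only for illustration.

```python
def sample_acf(series, max_lag):
    """Return the sample autocorrelations r_1, ..., r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [z - mean for z in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares (normalizer)
    if c0 == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


# Hypothetical monthly series: mild linear trend plus a repeating 12-month pattern.
pattern = [0.0, -0.2, 0.1, 0.2, 0.3, 0.2, 0.1, 0.2, 0.1, 0.3, 0.6, 1.2]
series = [0.005 * t + pattern[t % 12] for t in range(120)]

acf = sample_acf(series, 24)
# acf[11] is the lag-12 autocorrelation; for this series it exceeds the
# lag-6 value acf[5], reflecting the 12-month seasonal pattern.
```

A large autocorrelation at the seasonal lag relative to neighboring lags is the kind of feature that motivates specifying the seasonality (12 for monthly data) before identification.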

Some useful descriptions of ARIMA models are presented in Section 2, where we also present some key ideas used in the identification techniques employed in SCA-EXPERT. In the remainder of this section, we illustrate the modeling and forecasting procedure outlined above by using the logged variety stores sales data discussed in Hillmer, Bell and Tiao (1983).

Figure 1. Log Retail Sales of Variety Stores: 1/1967 to 9/1979

The time series plot of the monthly logged variety stores sales data between January 1967 and September 1979 is displayed in Figure 1. In this plot, we observe that the series has strong seasonality and an upward trend. A downward level shift beginning in mid 1976 (between t=108 and t=120) is also visible. After reading the time series data into the SCA workspace and storing the log transformed data as LVSALES, we can obtain an ARIMA model for the series by entering the statement

  -->IARIMA LVSALES. SEASONALITY IS 12.

  THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 153
  THE CRITICAL VALUE FOR SIGNIFICANCE TESTS OF ACF AND ESTIMATES IS

  SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- UTSMODEL

  VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
             VARIABLE   OR CENTERED
                                          12      1
  LVSALES    RANDOM     ORIGINAL      (1-B  ) (1-B )

  PARAMETER  VARIABLE  NUM./    FACTOR  ORDER  CONS-    VALUE  STD     T
    LABEL      NAME    DENOM.                  TRAINT          ERROR   VALUE
  1          LVSALES   MA       1       12     NONE
             LVSALES   D-AR     1       1      NONE
             LVSALES   D-AR     1       2      NONE

  TOTAL NUMBER OF OBSERVATIONS
  EFFECTIVE NUMBER OF OBSERVATIONS
  RESIDUAL STANDARD ERROR E

In the above IARIMA paragraph, LVSALES is the name of the series, and the SEASONALITY sentence is used to specify the potential seasonality in the series (which is 12 for monthly data). The identified model, which is the same as that presented in Hillmer, Bell, and Tiao (1983), can be expressed as

$$(1 - B^{12})(1 - B)\,\mathrm{LVSALES}_t = \frac{1 - \Theta_1 B^{12}}{1 - \phi_1 B - \phi_2 B^2}\, a_t, \tag{1}$$

or

$$\nabla_{12} \nabla\, \mathrm{LVSALES}_t = \frac{1 - \Theta_1 B^{12}}{1 - \phi_1 B - \phi_2 B^2}\, a_t. \tag{2}$$

In the latter expression, we use $\nabla_{12}$ to represent $(1 - B^{12})$ and $\nabla$ to represent $(1 - B)$. It is important to note that we place the AR(2) operator in the denominator of the above ARIMA model (hence its parameters are referred to as D-AR in the model display of the IARIMA output). Such an expression is somewhat different from the conventional form employed in Box and Jenkins (1970) and others. The rationale is presented in the next section.

The model information generated by the IARIMA paragraph is stored under the model name UTSMODEL by default. To obtain the exact maximum likelihood estimates for the above model, we enter

  -->ESTIM UTSMODEL. METHOD IS EXACT.

  THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 153

  NONLINEAR ESTIMATION TERMINATED DUE TO:
  MAXIMUM NUMBER OF ITERATIONS 10 REACHED

  SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- UTSMODEL

  VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
             VARIABLE   OR CENTERED
                                          12      1
  LVSALES    RANDOM     ORIGINAL      (1-B  ) (1-B )

  PARAMETER  VARIABLE  NUM./    FACTOR  ORDER  CONS-    VALUE  STD     T
    LABEL      NAME    DENOM.                  TRAINT          ERROR   VALUE
  1          LVSALES   MA       1       12     NONE
             LVSALES   D-AR     1       1      NONE
             LVSALES   D-AR     1       2      NONE

  EFFECTIVE NUMBER OF OBSERVATIONS
  R-SQUARE
  RESIDUAL STANDARD ERROR E

We notice that the estimates under the EXACT method are somewhat different from those under the IARIMA paragraph (where a conditional likelihood method is employed). The difference is more pronounced for the seasonal MA parameter. Exact estimation is recommended if an ARIMA model contains MA parameters, particularly seasonal MA parameters. If a model contains only AR parameters, there is no need to perform exact maximum likelihood estimation. To obtain forecasts for the series, we enter

  -->FORECAST UTSMODEL

  FORECASTS, BEGINNING AT

  TIME   FORECAST   STD. ERROR   ACTUAL IF KNOWN
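
The differencing operators that appear in the identified model, (1 - B) and (1 - B^12), can be sketched in a few lines of Python. This is an illustration only, not SCA code; the function name `difference` and the toy series are hypothetical. The example uses a series with an exact linear trend and an exact 12-month pattern, so that the effect of each operator is easy to see.

```python
def difference(series, lag=1):
    """Apply (1 - B^lag) to a series: return z_t - z_{t-lag} for t >= lag."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must satisfy 1 <= lag < len(series)")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical monthly series: linear trend plus a fixed seasonal pattern.
seasonal = [3, 1, 2, 4, 5, 4, 3, 4, 3, 5, 8, 12]
z = [2 * t + seasonal[t % 12] for t in range(48)]

w12 = difference(z, lag=12)   # (1 - B^12) z_t: seasonal pattern removed
w = difference(w12, lag=1)    # (1 - B)(1 - B^12) z_t: remaining trend removed

# The two operators commute, so the order of application does not matter.
w_alt = difference(difference(z, lag=1), lag=12)
```

Each application of (1 - B^lag) shortens the series by `lag` observations, so applying both operators to monthly data leaves 13 fewer values than the original series.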

The above forecasts seem to follow the pattern of the original series very well. As shown in Figure 1, the variety stores series contains major outliers. To account for the effects of outliers, we may use the OESTIM paragraph in the Extended UTS product to perform joint estimation of model parameters and outlier effects. Here, we enter

  -->OESTIM UTSMODEL. METHOD IS EXACT. NEW-SERIES ARE ADJR, ADJY.

  THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 153

  SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- UTSMODEL

  VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
             VARIABLE   OR CENTERED
                                          12      1
  LVSALES    RANDOM     ORIGINAL      (1-B  ) (1-B )

  PARAMETER  VARIABLE  NUM./    FACTOR  ORDER  CONS-    VALUE  STD     T
    LABEL      NAME    DENOM.                  TRAINT          ERROR   VALUE
  1          LVSALES   MA       1       12     NONE
             LVSALES   D-AR     1       1      NONE
             LVSALES   D-AR     1       2      NONE

  SUMMARY OF OUTLIER DETECTION AND ADJUSTMENT

  TIME   ESTIMATE   T-VALUE   TYPE
                              TC
                              AO
                              LS

  TOTAL NUMBER OF OBSERVATIONS
  EFFECTIVE NUMBER OF OBSERVATIONS
  RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT) E-01
  RESIDUAL STANDARD ERROR (WITH OUTLIER ADJUSTMENT) E

In this example, three outliers are identified. At time period 45, a temporary change (TC) occurs. At time period 96, an additive outlier (AO) is found. At time period 112, a negative level shift (LS) occurs in the series. The last outlier, which occurred in April 1976, corresponds to the closing of a major variety store chain, W.T. Grant. As a result, a significant proportion of retail sales previously made at variety stores shifted to department stores. The parameter estimates obtained by the OESTIM paragraph are different from those obtained under the ESTIM paragraph, since the model parameters are jointly estimated with the outlier effects in the OESTIM paragraph. A display of outliers, adjusted series, and the original series is shown in Figure 2.

Figure 2. Outliers and Adjusted Series of Log Retail Sales Data

To obtain forecasts with outlier effects accounted for, we enter

  -->OFORECAST UTSMODEL.

  RESIDUAL STANDARD ERROR (USES DATA UP TO THE FIRST FORECAST ORIGIN)= E-01

  TIME   ESTIMATE   T-VALUE   TYPE
                              TC
                              AO
                              LS

  FORECASTS, BEGINNING AT

  TIME   FORECAST   STD. ERROR   ACTUAL IF KNOWN
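
The three outlier types reported above (AO, TC, LS) differ in how long their effect persists. The sketch below illustrates the shape of each effect in Python; it is not SCA code and does not reproduce the detection procedure of OESTIM. The function name `outlier_effect`, the sizes used, and the decay rate `delta=0.7` for a temporary change are assumptions made for illustration (0.7 is a commonly used damping value, but the value used by SCA is not stated in this section).

```python
def outlier_effect(kind, n, start, size, delta=0.7):
    """Return the effect of a single outlier on a series of length n.

    kind  : "AO" (additive outlier), "TC" (temporary change), "LS" (level shift)
    start : 0-based index at which the outlier occurs
    size  : magnitude of the initial impact
    delta : assumed decay rate of a temporary change (0 < delta < 1)
    """
    if kind not in ("AO", "TC", "LS"):
        raise ValueError("kind must be 'AO', 'TC', or 'LS'")
    effect = [0.0] * n
    for t in range(start, n):
        if kind == "AO":
            # One-period pulse: affects only the observation at `start`.
            effect[t] = size if t == start else 0.0
        elif kind == "LS":
            # Permanent step: every observation from `start` on is shifted.
            effect[t] = size
        else:
            # Temporary change: initial impact decays geometrically.
            effect[t] = size * delta ** (t - start)
    return effect


# Hypothetical sizes, for illustration only.
ao = outlier_effect("AO", 6, 2, 1.5)
ls = outlier_effect("LS", 6, 2, -1.0)
tc = outlier_effect("TC", 6, 2, 1.0, delta=0.5)

# An outlier-adjusted series is the observed series minus the summed effects.
observed = [10.0, 10.0, 11.5, 9.5, 9.25, 9.125]
adjusted = [y - a - l - c for y, a, l, c in zip(observed, ao, ls, tc)]
```

Because a level shift never dies out, it continues to affect observations near the forecast origin, whereas an additive outlier or a temporary change far from the forecast origin has little influence on the forecasts.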

The forecasts generated by the OFORECAST paragraph are not much different from those generated under the FORECAST paragraph. This is not surprising since the forecast origin is far away from the last outlier. More detailed discussions regarding the effects of outliers on forecasts can be found in Chen and Liu (1993).

2.2 The Univariate ARIMA Model

Following Box and Jenkins (1970), the characteristics of a univariate time series typically can be well represented by a relatively simple ARIMA model. Using the backshift operator $B$ (where $BZ_t = Z_{t-1}$), a non-seasonal ARIMA model traditionally is expressed as

$$\phi(B)(1 - B)^d Z_t = C_0 + \theta(B) a_t, \quad t = 1, 2, \ldots, n \tag{3}$$

where $\{Z_t\}$ is a time series with $n$ observations, $\{a_t\}$ is a sequence of random errors that are independently and identically distributed with a normal distribution $N(0, \sigma_a^2)$, $C_0$ is a constant term, and $d$ is the number of differencings. The $\phi(B)$ and $\theta(B)$ operators are polynomials in $B$ where

$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p, \quad \text{and}$$

$$\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q.$$

The value $p$ denotes the order of the autoregressive (AR) operator $\phi(B)$, and $q$ denotes the order of the moving average (MA) operator $\theta(B)$. In most practical situations, $p$ and $q$ are small values no greater than 3, and $d$ is 0, 1, or at most 2. Double differencing (i.e., $d = 2$) seldom occurs in real-life time series applications. The model in (3) can also be expressed as

$$(1 - B)^d Z_t = C + \frac{\theta(B)}{\phi(B)} a_t \tag{4}$$

with $C = C_0 / (1 - \phi_1 - \phi_2 - \cdots - \phi_p)$. The latter representation is preferable since the term $C$ can be easily interpreted. When $d = 0$ (i.e., the series requires no differencing), the constant term $C$ represents the mean of the series. An ARIMA model in such a case is also referred to as an autoregressive-moving average (ARMA) model.
When $d = 1$ (i.e., the series requires a first-order differencing), the constant term represents the trend of the series (i.e., the increase or decrease between two successive observations). When $d = 2$ (i.e., the series requires double differencing), the constant term represents the second-order trend (i.e., the trend of the trend), which seldom occurs in real-life applications. Unlike the term $C$ in (4), the term $C_0$ in (3) does not have an easy-to-understand interpretation. For these reasons, all ARIMA models identified by SCA-EXPERT are expressed in the form of (4). A special case of (4) is the mixed ARMA(1,1) model, which can be expressed as

$$Z_t = C + \frac{1 - \theta_1 B}{1 - \phi_1 B} a_t. \tag{5}$$

SCA-EXPERT uses the parameter estimates of the above ARMA(1,1) model, and the sample ACF and PACF of the series, to automatically identify the number of differencings required and the initial orders for the AR and MA polynomials.

Seasonal ARIMA models

The models in (3) and (4) can be extended to represent a seasonal time series. The general form of a multiplicative seasonal ARIMA model can be expressed as

$$\phi(B)\Phi(B^s)(1 - B)^d (1 - B^s)^D Z_t = C_0 + \theta(B)\Theta(B^s) a_t, \quad t = 1, 2, \ldots, n \tag{6}$$

or

$$(1 - B)^d (1 - B^s)^D Z_t = C + \frac{\theta(B)\Theta(B^s)}{\phi(B)\Phi(B^s)} a_t, \quad t = 1, 2, \ldots, n \tag{7}$$

where

$$\Phi(B^s) = 1 - \Phi_1 B^s - \Phi_2 B^{2s} - \cdots - \Phi_P B^{Ps}, \quad \text{and}$$

$$\Theta(B^s) = 1 - \Theta_1 B^s - \Theta_2 B^{2s} - \cdots - \Theta_Q B^{Qs}.$$

In most practical applications, the values of $D$, $P$, and $Q$ are either 0 or 1. A special case of model (7) is the mixed ARMA(1,1)xARMA(1,1)$_s$ model, which can be expressed as

$$Z_t = C + \frac{(1 - \theta_1 B)(1 - \Theta_1 B^s)}{(1 - \phi_1 B)(1 - \Phi_1 B^s)} a_t. \tag{8}$$

SCA-EXPERT uses the parameter estimates of the above model, and the sample ACF and PACF for the filtered series of model (8), to automatically identify the differencings required for the final model and the initial orders for the seasonal and non-seasonal AR and MA polynomials. More details regarding the theory and techniques for the identification of seasonal ARIMA models can be found in Liu (1989). A revised version of Liu (1989) is included in this document as Appendix B.
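
Two pieces of arithmetic in this section can be made concrete with a short Python sketch: the conversion from the constant $C_0$ in (3) to the constant $C$ in (4), and the expansion of a multiplicative seasonal operator such as $(1 - \phi_1 B)(1 - \Phi_1 B^s)$ into a single polynomial in $B$. This is an illustration only, not SCA code; the function names and the parameter values are hypothetical.

```python
def constant_from_c0(c0, ar_coeffs):
    """Convert C_0 in form (3) to C in form (4): C = C_0 / (1 - sum of phi_i)."""
    denom = 1.0 - sum(ar_coeffs)
    if denom == 0:
        raise ValueError("AR coefficients sum to 1; the constant C is undefined")
    return c0 / denom


def multiply_backshift(poly_a, poly_b):
    """Multiply two polynomials in B, each given as {power of B: coefficient}."""
    product = {}
    for i, a in poly_a.items():
        for j, b in poly_b.items():
            product[i + j] = product.get(i + j, 0.0) + a * b
    return product


# Hypothetical parameter values, for illustration only.
phi1, Phi1, s = 0.5, 0.8, 12

regular_ar = {0: 1.0, 1: -phi1}    # 1 - phi_1 B
seasonal_ar = {0: 1.0, s: -Phi1}   # 1 - Phi_1 B^s
combined = multiply_backshift(regular_ar, seasonal_ar)
# combined holds the coefficients of 1 - phi_1 B - Phi_1 B^12 + phi_1 Phi_1 B^13

# With d = 0, an AR(1) model with C_0 = 2 and phi_1 = 0.5 has mean C = 4.
mean_level = constant_from_c0(2.0, [phi1])
```

The expansion shows why the multiplicative form is economical: two parameters generate nonzero coefficients at lags 1, 12, and 13, with the lag-13 coefficient fixed by the other two.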

The determination of appropriate differencing order(s) for a seasonal time series can be difficult in some situations. We have found that a number of published models are in fact over-differenced. SCA-EXPERT employs an effective technique to determine appropriate differencing orders and avoid over-differencing.

2.3 Modeling and Forecasting Multi-Variable Time Series

An effective means for modeling and forecasting multi-variable time series is to employ transfer function models. Transfer function models (Box and Jenkins 1970) can be regarded as extensions of classical regression and econometric models, and are useful in many applications. In this document, we describe two application areas: (1) forecasting, and (2) understanding and interpretation of inter-relationships among variables in a system. The latter application is also known as structural analysis.

Traditionally, structural form models (i.e., models that allow for contemporaneous relationships between the input variables and the output variable) are used for both forecasting and structural analysis. A number of difficulties in the identification (specification) and estimation of structural form models have been extensively discussed in the econometric literature. Liu (1991) suggested that if the primary interest of transfer function modeling is forecasting, a reduced form model may be preferable to a structural form model. A reduced form model does not allow for contemporaneous relationships in a model unless an input variable is certain to be exogenous. Due to this restriction, the use of a reduced form model avoids a number of difficulties that commonly occur in structural form modeling.

Both reduced form and structural form models may consist of a system of equations. Using the approach discussed in this document, we can deal with the identification and estimation of the model in the system equation by equation. This flexibility allows us to focus on a partial system (e.g., just one or a few equations) depending upon our interest and application. Joint model estimation for a system of transfer function equations is discussed in Wall (1976), Liu and Hudak (1985), and Liu et al. (1986).

Transfer Function Models

A transfer function model can be a single-equation model or a multi-equation model. The latter is also referred to as a simultaneous transfer function (STF) model (Wall 1976, Liu et al. 1986, Liu 1987, and Liu 1991). For notational simplicity, we will consider a transfer function model with two variables, $Y_t$ and $X_t$, where $Y_t$ and $X_t$ may be inter-related and both can be endogenous variables. Assuming both $Y_t$ and $X_t$ are stationary, the general form of a transfer function model can be expressed as

$$Y_t = C + \omega(B) X_t + N_t, \quad t = 1, 2, \ldots, n \tag{9}$$

or

$$Y_t = C + \frac{\omega(B)}{\delta(B)} X_t + N_t, \quad t = 1, 2, \ldots, n \tag{10}$$

where $C$ is a constant term, and $N_t$ is the disturbance term, which may follow a stationary ARMA process described in model (4) or (7). The polynomials $\omega(B)$ and $\delta(B)$ can be generally expressed as

$$\omega(B) = \omega_0 + \omega_1 B + \omega_2 B^2 + \cdots + \omega_g B^g, \quad \text{and} \tag{11}$$

$$\delta(B) = 1 - \delta_1 B - \cdots - \delta_r B^r. \tag{12}$$

We refer to the model in (9) as a linear transfer function (LTF) model (which implies $\delta(B) = 1$), and the model in (10) as a rational transfer function model (which implies $\delta(B) \neq 1$). It is also important to note that the parameter $\omega_0$ in (11) is constrained to be zero (i.e., cannot be present in the $\omega(B)$ polynomial) if the model is in reduced form.

Similar to (9) and (10), we may consider a transfer function model for $X_t$ dependent on $Y_t$, which may be expressed as

$$X_t = C' + \omega'(B) Y_t + N'_t, \quad t = 1, 2, \ldots, n \tag{13}$$

or

$$X_t = C' + \frac{\omega'(B)}{\delta'(B)} Y_t + N'_t, \quad t = 1, 2, \ldots, n. \tag{14}$$

Joint estimation for a system of transfer function equations using the maximum likelihood method was first addressed in Wall (1976) and implemented in the SCA System (Liu et al. 1986). Identification of transfer function equations is discussed next.

LTF Method

An effective way to identify a transfer function equation is to employ the linear transfer function (LTF) method. The LTF method follows an approach proposed by Liu and Hanssens (1982) and is detailed in Liu and Hudak (1985), Liu (1986, 1987), Pankratz (1991), and particularly in Liu et al. (1992). This method is effective for both non-seasonal and seasonal time series, and for both reduced form and structural form models. Furthermore, it is flexible, easy to use, and easy to understand. More information regarding the LTF method can be found in Chapter 9 of Forecasting and Time Series Analysis Using the SCA Statistical System: Volume 1 (Liu et al. 1992).

Consider the transfer function equations described in (9) through (12).
The LTF method employs the following linear transfer function model for the identification of a structural form equation:

$$Y_t = C + (v_0 + v_1 B + v_2 B^2 + \cdots + v_k B^k) X_t + N_t \tag{15}$$

where $k$ is a lag order for $X_t$ chosen by the user based on the subject matter, and $N_t$ is the disturbance term (to be discussed later). For the identification of a reduced form equation, the following model structure is employed:

$$Y_t = C + (v_1 B + v_2 B^2 + \cdots + v_k B^k) X_t + N_t \tag{16}$$

The key difference between (15) and (16) is that (16) excludes a potential contemporaneous relationship between $X_t$ and $Y_t$, whereas (15) allows for one. If $X_t$ is an exogenous variable, the model structures in (15) and (16) typically lead to the same model if $X_t$ and $Y_t$ do not have a contemporaneous relationship. This is particularly true if the exogenous variable is a pre-determined non-stochastic time series. However, if $X_t$ is also an endogenous variable, the transfer function weight estimates based on (15) can be seriously biased, and give rather misleading results even if $X_t$ and $Y_t$ are not contemporaneously related. Therefore, it is preferable to employ a reduced form model unless exogeneity of an input variable is definite.

The disturbance term in (15) and (16) can be effectively approximated by simple autoregressive models for the purpose of model identification. If the output variable is non-seasonal, the disturbance term may be approximated by

$$N_t = \frac{1}{1 - \phi_1 B} a_t. \tag{17}$$

In the case of a seasonal output variable (with seasonality $s$), an initial approximation for the disturbance term may be

$$N_t = \frac{1}{(1 - \phi_1 B)(1 - \Phi_1 B^s)} a_t. \tag{18}$$

The combined use of a linear transfer function with an autoregressive disturbance term provides some unique advantages. These include:

(1) Obtaining efficient estimates of transfer function weights. Based on these weights, we can determine if a linear or a rational transfer function is needed for the model equation.

(2) Obtaining an estimated disturbance series $\hat{N}_t$. The model for $\hat{N}_t$ can then be easily identified by the IARIMA paragraph.

(3) Providing information on differencing. If either $\phi_1$ or $\Phi_1$ in (17) and (18) is close to 1, then it is necessary to perform appropriate differencing(s) on all variables in the model.
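
The difference between the structural form (15) and the reduced form (16) is only whether the lag-0 weight is allowed. The following Python sketch applies a set of transfer function weights to an input series to make that difference concrete. It is an illustration, not SCA code and not an estimation procedure; the function name `ltf_response`, the weights, and the input series are hypothetical.

```python
def ltf_response(weights, x, first_lag):
    """Compute the sum over j of v_j * x_{t-j} for each time t.

    weights   : [v_first_lag, v_first_lag+1, ...], one weight per lag
    x         : input series
    first_lag : 0 for the structural form (15), 1 for the reduced form (16)

    Entries without enough history in x are returned as None.
    """
    if first_lag < 0:
        raise ValueError("first_lag must be non-negative")
    max_lag = first_lag + len(weights) - 1
    response = []
    for t in range(len(x)):
        if t < max_lag:
            response.append(None)  # not enough past values of x
        else:
            response.append(
                sum(v * x[t - first_lag - i] for i, v in enumerate(weights))
            )
    return response


x = [1, 2, 3, 4]

# Structural form (15): weights v_0 = 2 and v_1 = 1, so x_t itself enters.
structural = ltf_response([2, 1], x, first_lag=0)

# Reduced form (16): v_0 is excluded, only v_1 = 1 acts on x_{t-1}.
reduced = ltf_response([1], x, first_lag=1)
```

In the reduced form, the response at time t depends only on past values of the input, which is why it remains usable for forecasting when the input is itself endogenous.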
This topic is discussed further in the next subsection.

Differencings

The models we have discussed assume that both $Y_t$ and $X_t$ are stationary. In most real-life applications, $Y_t$ and $X_t$ may not satisfy this assumption. As in ARIMA modeling, the determination of differencing orders is a key aspect of transfer function modeling. We may examine the value of $\phi_1$ or $\Phi_1$ in the autoregressive term (to see if it is close to 1) to determine if a regular (i.e., $(1 - B)$) and/or a seasonal (i.e., $(1 - B^s)$) differencing is necessary. However, it is important to note that the estimates of $\phi_1$ or $\Phi_1$ can be seriously biased if a non-seasonal or seasonal MA parameter is required in the disturbance term. This is particularly serious for the seasonal AR(1) estimate. An easy way to overcome this difficulty is to employ the IARIMA paragraph to automatically identify an ARIMA model for the estimated disturbance series. If the identified model for the disturbance series requires differencing, or if the model contains a non-seasonal or seasonal AR(1) polynomial with the parameter estimate(s) close to 1, then appropriate differencing(s) is necessary.

To illustrate this method of differencing determination, we employ a set of data consisting of monthly shipments and new orders of durable goods in the United States between January 1958 and December 1974 (Hiller 1976, and Liu 1987). These series are displayed in Figure 3. Our primary interest in the analysis is to develop a reduced form model that employs information in both series for forecasting. Following the analysis in Liu (1987), we employ only the data between January 1958 and December 1972 (i.e., the first 180 observations) for model building; the last 24 observations can be used for post-sample forecasting comparison (see Liu 1987). Both series are log transformed, and stored in the SCA workspace as SHIPMENT and NEWORDER respectively.

Figure 3. U.S. Durable Goods Shipments and New Orders

(a) Durable goods shipments

(b) Durable goods new orders

Our first step is to develop a transfer function equation for the variable SHIPMENT (with NEWORDER as an input variable). Following the models presented in (16) and (18), we consider a linear transfer function model

$$Y_t = C + (v_1 B + v_2 B^2 + \cdots + v_6 B^6) X_t + \frac{1}{(1-\phi_1 B)(1-\Phi_1 B^{12})}\, a_t \qquad (19)$$

for the determination of differencing orders (and subsequently for the identification of the model equation). The above LTF model can be specified and estimated using the following TSMODEL and ESTIM paragraphs.

-->TSMODEL NAME IS EQ1. NO
--> MODEL IS SHIPMENT=C1+(1 TO 6)NEWORDER+1/(1)(12)NOISE.
-->ESTIM EQ1. HOLD DISTURBANCE(NS). OUTPUT LEVEL(BRIEF).

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180

NONLINEAR ESTIMATION TERMINATED DUE TO:
RELATIVE CHANGE IN (OBJECTIVE FUNCTION)**0.5 LESS THAN D-03

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- EQ

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
SHIPMENT  RANDOM    ORIGINAL     NONE
NEWORDER  RANDOM    ORIGINAL     NONE

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 C1 CNST 1 0 NONE
NEWORDER NUM. 1 1 NONE
NEWORDER NUM. 1 2 NONE
NEWORDER NUM. 1 3 NONE
NEWORDER NUM. 1 4 NONE
NEWORDER NUM. 1 5 NONE
NEWORDER NUM. 1 6 NONE
SHIPMENT D-AR 1 1 NONE
SHIPMENT D-AR 2 12 NONE

EFFECTIVE NUMBER OF OBSERVATIONS
R-SQUARE
RESIDUAL STANDARD ERROR

Reviewing the above estimation results, we find that the seasonal AR(1) parameter estimate is close to 1 ($\hat{\Phi}_1 = .9121$), suggesting that a seasonal differencing is required for the model. Since we stored the estimated disturbance series ($\hat{N}_t$) in the variable NS, we can identify a model for the disturbance series using the following IARIMA paragraph:

-->IARIMA NS. SEASONALITY IS 12.

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180
THE CRITICAL VALUE FOR SIGNIFICANCE TESTS OF ACF AND ESTIMATES IS

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- UTSMODEL

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
                                     12
NS        RANDOM    ORIGINAL     (1-B  )

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 CNST 1 0 NONE
NS MA 1 12 NONE
NS D-AR 1 1 NONE

TOTAL NUMBER OF OBSERVATIONS
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL STANDARD ERROR

The above results confirm that the transfer function equation for SHIPMENT requires a seasonal differencing, but a regular differencing does not seem to be necessary. Using a model similar to (19), we can determine the differencing order(s) required for the model equation for NEWORDER. The LTF model specification and its subsequent estimation results are listed below:

-->TSMODEL NAME IS EQ2. NO
--> MODEL IS NEWORDER=C2+(1 TO 6)SHIPMENT+1/(1)(12)NOISE.
-->ESTIM EQ2. HOLD DISTURBANCE(NS). OUTPUT LEVEL(BRIEF).

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180

NONLINEAR ESTIMATION TERMINATED DUE TO:
RELATIVE CHANGE IN (OBJECTIVE FUNCTION)**0.5 LESS THAN D-03

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- EQ

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
NEWORDER  RANDOM    ORIGINAL     NONE
SHIPMENT  RANDOM    ORIGINAL     NONE

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 C2 CNST 1 0 NONE
SHIPMENT NUM. 1 1 NONE
SHIPMENT NUM. 1 2 NONE
SHIPMENT NUM. 1 3 NONE
SHIPMENT NUM. 1 4 NONE
SHIPMENT NUM. 1 5 NONE
SHIPMENT NUM. 1 6 NONE
NEWORDER D-AR 1 1 NONE
NEWORDER D-AR 2 12 NONE

EFFECTIVE NUMBER OF OBSERVATIONS
R-SQUARE
RESIDUAL STANDARD ERROR

Since the non-seasonal AR(1) estimate is close to 1 ($\hat{\phi}_1 = .9888$), regular differencing appears to be necessary. However, it is not clear whether seasonal differencing is required, since the seasonal AR(1) estimate is not as close to 1. The IARIMA paragraph (as shown below) provides valuable information in determining differencing orders.

-->IARIMA NS. SEASONALITY IS 12.

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180
THE CRITICAL VALUE FOR SIGNIFICANCE TESTS OF ACF AND ESTIMATES IS

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- UTSMODEL

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
                                     1
NS        RANDOM    ORIGINAL     (1-B  )

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 NS MA 1 1 NONE
NS MA 2 12 NONE
NS D-AR 1 12 NONE

TOTAL NUMBER OF OBSERVATIONS
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL STANDARD ERROR

Based on the above results, it is advisable to consider seasonal differencing since the estimate of $\Phi_1$ is rather close to 1.

In some situations, it is possible that both $Y_t$ and $X_t$ are nonstationary (or seasonal), but $\phi_1$ (or $\Phi_1$) in (17) or (18) is not close to 1. In such a circumstance, if we believe that the nonstationarity (or seasonality) of $Y_t$ is caused solely by $X_t$, then we should not include differencing in the model. Otherwise, appropriate differencing(s) should be imposed. Generally speaking, if both $Y_t$ and $X_t$ are seasonal, it is safe to impose a seasonal differencing in a transfer function model. This is because seasonality in most applications (e.g., business, economic or environmental studies) is caused by some common factors or common environments, rather than just by the particular input variable(s) in the model. This is also generally true for nonstationarity. However, in some situations it is possible that the nonstationarity of the output variable is caused by the input variable(s). In such a situation, first-order differencing should not be imposed. The results with and without differencing can be quite different.
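The differencing check described in this subsection can be illustrated outside of SCA. The following Python sketch is not SCA code and does not reproduce the ESTIM or IARIMA paragraphs. It assumes NumPy is available, uses a simulated monthly series rather than the durable goods data, and the helper names `difference` and `ar1_estimate` are hypothetical. It applies the operator $(1-B^{s})$ and compares a simple least-squares AR coefficient at lag 12 before and after seasonal differencing.

```python
import numpy as np

def difference(y, lag=1):
    """Apply the operator (1 - B**lag) to a one-dimensional series."""
    y = np.asarray(y, dtype=float)
    return y[lag:] - y[:-lag]

def ar1_estimate(y, lag=1):
    """Least-squares estimate of phi in y[t] = phi * y[t-lag] + e[t], after centering."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    x, z = y[:-lag], y[lag:]
    return float(x @ z / (x @ x))

rng = np.random.default_rng(0)
n, s = 180, 12

# Simulated seasonal random walk: y[t] = y[t-12] + e[t] (illustrative data only)
e = rng.normal(size=n)
y = np.zeros(n)
for t in range(n):
    y[t] = (y[t - s] if t >= s else 0.0) + e[t]

phi_before = ar1_estimate(y, lag=s)                    # seasonal AR coefficient, undifferenced
phi_after = ar1_estimate(difference(y, lag=s), lag=s)  # same coefficient after (1 - B**12)
print(round(phi_before, 3), round(phi_after, 3))
```

A seasonal coefficient near 1 before differencing, and near 0 afterwards, is the pattern that points toward seasonal differencing. As the text cautions, such a simple estimate can be badly biased when the disturbance needs an MA term, which is why the manual relies on IARIMA rather than on the raw AR estimate.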

An Illustrative Example for Reduced Form Transfer Function Modeling

In this section, we continue using the example presented in Section 3.3 to illustrate key aspects of transfer function modeling using SCA-EXPERT. Based on the analysis presented in Section 3.3, we find that a seasonal differencing is necessary for the SHIPMENT model. The remaining SCA paragraphs we use to automatically identify a transfer function model for SHIPMENT are listed below.

TSMODEL NAME IS EQ1. NO
   MODEL IS SHIPMENT(12)=C1+(1 TO 6)NEWORDER(12)+1/(1)(12)NOISE.
IESTIM EQ1. PRESERVE ARMA. HOLD DISTURBANCE(NS).
IARIMA NS. SEASONALITY IS 12. REPLACE EQ1.
ESTIM EQ1. METHOD IS EXACT. HOLD
   OUTPUT LEVEL(BRIEF).

In the above SCA statements, the TSMODEL paragraph specifies the differencing, the multiplicative AR(1) disturbance, and most importantly the initial lags for the linear transfer function. The IESTIM paragraph estimates the parameters in the specified model, and automatically deletes insignificant transfer function weight estimates from the model following a prudent algorithm. This algorithm first deletes insignificant weight estimates from both ends. Insignificant weight estimates in the middle are not deleted in the first pass, but are deleted in later iterations. An insignificant constant term is always deleted in the final iteration. The sentence "PRESERVE ARMA" requests that parameter estimates in the ARMA component not be deleted, even if they are insignificant. The estimated disturbance series is stored in the variable NS. The IARIMA paragraph is then used to identify an ARMA model for the estimated disturbance series. By specifying the sentence "REPLACE EQ1", the identified ARMA model automatically replaces the intermediate AR disturbance term in the transfer function model EQ1.
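The end-trimming rule described above can be sketched as follows. This is an illustration in Python, not the IESTIM implementation: the function name `trim_insignificant_ends`, the critical value of 2.0, and the sample t-values are all assumptions made for the example. The actual paragraph re-estimates the model after each revision and applies further rules (for the middle weights and the constant term) that are not shown here.

```python
def trim_insignificant_ends(lags, t_values, critical=2.0):
    """Drop insignificant weights from both ends of a lag window.

    Weights whose |t| is below `critical` are removed from the low-lag end
    and from the high-lag end; insignificant weights in the middle are kept,
    mirroring the first pass described in the text.
    """
    keep = [abs(t) >= critical for t in t_values]
    if not any(keep):
        return []
    first = keep.index(True)
    last = len(keep) - 1 - keep[::-1].index(True)
    return list(lags[first:last + 1])

# Hypothetical t-values for lags 1..6 (not taken from the SCA output)
lags = [1, 2, 3, 4, 5, 6]
t_values = [3.1, 0.8, 2.6, 1.1, -0.4, 0.9]
print(trim_insignificant_ends(lags, t_values))  # [1, 2, 3]
```

In this made-up case lag 2 is insignificant but survives the first pass because it lies between two significant weights, while lags 4 to 6 are dropped from the upper end.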
If the transfer function model is in linear form (i.e., with $\delta(B) = 1$), then we have obtained a final model after the execution of the above SCA statements. The model can be more accurately estimated using the ESTIM paragraph. For seasonal time series, it is recommended that EXACT maximum likelihood estimation be used. In this example, we shall use the default CONDITIONAL method. The residual series is stored in the variable RES in this example, and can be used for diagnostic checking. The output for the above SCA paragraphs is listed below:

-->TSMODEL NAME IS EQ1. NO
--> MODEL IS SHIPMENT(12)=C1+(1 TO 6)NEWORDER(12)+1/(1)(12)NOISE.
-->IESTIM EQ1. PRESERVE ARMA. HOLD DISTURBANCE(NS).

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- EQ

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
                                     12
SHIPMENT  RANDOM    ORIGINAL     (1-B  )
                                     12
NEWORDER  RANDOM    ORIGINAL     (1-B  )

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 C1 CNST 1 0 NONE
NEWORDER NUM. 1 1 NONE
NEWORDER NUM. 1 2 NONE
NEWORDER NUM. 1 3 NONE
NEWORDER NUM. 1 4 NONE
NEWORDER NUM. 1 5 NONE
NEWORDER NUM. 1 6 NONE
SHIPMENT D-AR 1 1 NONE
SHIPMENT D-AR 2 12 NONE

TOTAL NUMBER OF OBSERVATIONS
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT) E-01

=================================================
RESULTS OF THE REVISED MODEL: REVISION NUMBER 1
=================================================

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- EQ

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
                                     12
SHIPMENT  RANDOM    ORIGINAL     (1-B  )
                                     12
NEWORDER  RANDOM    ORIGINAL     (1-B  )

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 C1 CNST 1 0 NONE
NEWORDER NUM. 1 1 NONE
NEWORDER NUM. 1 2 NONE
NEWORDER NUM. 1 3 NONE
SHIPMENT D-AR 1 1 NONE
SHIPMENT D-AR 2 12 NONE

TOTAL NUMBER OF OBSERVATIONS
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT)

-->IARIMA NS. SEASONALITY IS 12. REPLACE EQ1.

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180
THE CRITICAL VALUE FOR SIGNIFICANCE TESTS OF ACF AND ESTIMATES IS

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- UTSMODEL

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
NS        RANDOM    ORIGINAL     NONE

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 NS MA 1 12 NONE
NS D-AR 1 1 NONE

TOTAL NUMBER OF OBSERVATIONS
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL STANDARD ERROR

-->ESTIM EQ1. HOLD RESIDUALS(RES). OUTPUT LEVEL(BRIEF).

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180

NONLINEAR ESTIMATION TERMINATED DUE TO:
RELATIVE CHANGE IN (OBJECTIVE FUNCTION)**0.5 LESS THAN .1000D-02

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- EQ

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
                                     12
SHIPMENT  RANDOM    ORIGINAL     (1-B  )
                                     12
NEWORDER  RANDOM    ORIGINAL     (1-B  )

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 C1 CNST 1 0 NONE
NEWORDER NUM. 1 1 NONE
NEWORDER NUM. 1 2 NONE
NEWORDER NUM. 1 3 NONE
SHIPMENT MA 1 12 NONE
SHIPMENT D-AR 1 1 NONE

EFFECTIVE NUMBER OF OBSERVATIONS
R-SQUARE
RESIDUAL STANDARD ERROR

Thus the final transfer function model for SHIPMENT is

$$\mathrm{SHIPMENT}_t = C + (\omega_1 B + \omega_2 B^2 + \omega_3 B^3)\,\mathrm{NEWORDER}_t + \frac{1-\Theta_1 B^{12}}{1-\phi_1 B}\, a_t. \qquad (20)$$

Similarly, following the analysis in Section 3.3, we find that regular and seasonal differencing $(1-B)(1-B^{12})$ is necessary for the NEWORDER model. The SCA paragraphs for automatic identification of a transfer function model for NEWORDER are listed below:

TSMODEL NAME IS EQ2. NO
   MODEL IS NEWORDER(1,12)=C2+(1 TO 6)SHIPMENT(1,12)+1/(1)(12)NOISE.
IESTIM EQ2. PRESERVE ARMA. HOLD DISTURBANCE(NS).
IARIMA NS. SEASONALITY IS 12. REPLACE EQ2.
ESTIM EQ2. METHOD IS EXACT. HOLD
   OUTPUT LEVEL(BRIEF).

The output for the above SCA paragraphs is listed below:

-->TSMODEL NAME IS EQ2. NO
--> MODEL IS NEWORDER(1,12)=C2+(1 TO 6)SHIPMENT(1,12)+1/(1)(12)NOISE.
-->IESTIM EQ2. PRESERVE ARMA. HOLD DISTURBANCE(NS).

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- EQ

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
                                     1       12
NEWORDER  RANDOM    ORIGINAL     (1-B  ) (1-B  )
                                     1       12
SHIPMENT  RANDOM    ORIGINAL     (1-B  ) (1-B  )

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 C2 CNST 1 0 NONE
SHIPMENT NUM. 1 1 NONE
SHIPMENT NUM. 1 2 NONE
SHIPMENT NUM. 1 3 NONE
SHIPMENT NUM. 1 4 NONE
SHIPMENT NUM. 1 5 NONE
SHIPMENT NUM. 1 6 NONE
NEWORDER D-AR 1 1 NONE
NEWORDER D-AR 2 12 NONE

TOTAL NUMBER OF OBSERVATIONS
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT) E-01

=================================================
RESULTS OF THE REVISED MODEL: REVISION NUMBER 1
=================================================

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- EQ

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
                                     1       12
NEWORDER  RANDOM    ORIGINAL     (1-B  ) (1-B  )
                                     1       12
SHIPMENT  RANDOM    ORIGINAL     (1-B  ) (1-B  )

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 C2 CNST 1 0 NONE
SHIPMENT NUM. 1 2 NONE
NEWORDER D-AR 1 1 NONE
NEWORDER D-AR 2 12 NONE

TOTAL NUMBER OF OBSERVATIONS
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT) E-01

=================================================
RESULTS OF THE REVISED MODEL: REVISION NUMBER 2
=================================================

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- EQ

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
                                     1       12
NEWORDER  RANDOM    ORIGINAL     (1-B  ) (1-B  )
                                     1       12
SHIPMENT  RANDOM    ORIGINAL     (1-B  ) (1-B  )

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 SHIPMENT NUM. 1 2 NONE
NEWORDER D-AR 1 1 NONE
NEWORDER D-AR 2 12 NONE

TOTAL NUMBER OF OBSERVATIONS
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT)

-->IARIMA NS. SEASONALITY IS 12. REPLACE EQ2.

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180
THE CRITICAL VALUE FOR SIGNIFICANCE TESTS OF ACF AND ESTIMATES IS

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- UTSMODEL

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
NS        RANDOM    ORIGINAL     NONE

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 NS MA 1 1 NONE
NS MA 2 12 NONE

TOTAL NUMBER OF OBSERVATIONS
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL STANDARD ERROR

-->ESTIM EQ2. HOLD RESIDUALS(RES). OUTPUT LEVEL(BRIEF).

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180

NONLINEAR ESTIMATION TERMINATED DUE TO:
RELATIVE CHANGE IN (OBJECTIVE FUNCTION)**0.5 LESS THAN .1000D-02

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- EQ

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
                                     1       12
NEWORDER  RANDOM    ORIGINAL     (1-B  ) (1-B  )
                                     1       12

SHIPMENT  RANDOM    ORIGINAL     (1-B  ) (1-B  )

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 SHIPMENT NUM. 1 2 NONE
NEWORDER MA 1 1 NONE
NEWORDER MA 2 12 NONE

EFFECTIVE NUMBER OF OBSERVATIONS
R-SQUARE
RESIDUAL STANDARD ERROR

Thus the final transfer function model for NEWORDER is

$$\mathrm{NEWORDER}_t = \omega_2 B^2\,\mathrm{SHIPMENT}_t + (1-\theta_1 B)(1-\Theta_1 B^{12})\, a_t. \qquad (21)$$

After appropriate diagnostic checking, if we find the above transfer function equations satisfactory, the forecasts can be obtained by entering

-->SFORECAST EQ1,EQ

FORECASTS, BEGINNING AT ORIGIN =

SERIES:  SHIPMENT               NEWORDER
TIME     FORECAST   STD ERR     FORECAST   STD ERR

ERROR COVARIANCE MATRIX

Model Revisions

In reviewing the above transfer function equations, we find that the model for the first equation seems quite reasonable. This equation implies that monthly forecasts for durable goods shipments can be improved by employing the information on new orders of durable goods one to three months prior to the current month. On the other hand, it is doubtful whether it is useful to include $\mathrm{SHIPMENT}_{t-2}$ in the second equation. The main reasons are (1) the significant transfer function weight occurs only at a single lag (a somewhat odd lag, and therefore possibly spurious), and (2) the weight estimate is not large and is barely significant (t=-1.98) based on the last estimation. In forecasting, employing spurious relationships in a model may cause more harm than good. With this in mind, we may consider a model for NEWORDER without the input variable SHIPMENT. This leads us to build an ARIMA model for the NEWORDER series (since no input variable is now present in the model). The SCA paragraphs used in building an ARIMA model for NEWORDER, estimating the model and generating forecasts are listed below:

-->IARIMA NEWORDER. SEASONALITY IS 12.

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180
THE CRITICAL VALUE FOR SIGNIFICANCE TESTS OF ACF AND ESTIMATES IS

SAMPLE ACF OF THE RESIDUALS (** SIGNIFICANT VALUES EXIST **)
T-VALUE
T-VALUE
T-VALUE

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- UTSMODEL

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
                                     12      1
NEWORDER  RANDOM    ORIGINAL     (1-B  ) (1-B  )

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 NEWORDER MA 1 1 NONE
NEWORDER MA 2 12 NONE

TOTAL NUMBER OF OBSERVATIONS
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL STANDARD ERROR E-01
--

In the above output, the sample ACF of the residuals is displayed since at least one sample autocorrelation is highly significant (lag 13 in this case). However, since this sample autocorrelation is only -0.22, the model is still acceptable and requires no further revision.

-->ESTIM UTSMODEL. METHOD IS EXACT. HOLD RESIDUALS(RES).
--> STOP MAXITER(20). OUTPUT

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180

NONLINEAR ESTIMATION TERMINATED DUE TO:
RELATIVE CHANGE IN THE STANDARD ERROR LESS THAN .1000D-02

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- UTSMODEL

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
                                     12      1
NEWORDER  RANDOM    ORIGINAL     (1-B  ) (1-B  )

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 NEWORDER MA 1 1 NONE
NEWORDER MA 2 12 NONE

EFFECTIVE NUMBER OF OBSERVATIONS
R-SQUARE
RESIDUAL STANDARD ERROR

The forecasts for SHIPMENT and NEWORDER can also be obtained by using the SFORECAST paragraph. The output is listed below.

-->SFORECAST EQ1, UTSMODEL

FORECASTS, BEGINNING AT ORIGIN =

SERIES:  SHIPMENT               NEWORDER
TIME     FORECAST   STD ERR     FORECAST   STD ERR

ERROR COVARIANCE MATRIX

In the above results, we find that the forecasts for SHIPMENT are not much different from those presented earlier. However, the forecasts for NEWORDER are quite different since they are now based on an ARIMA model.

Comparison of Forecasting Performance

It is useful to compare the forecasts under a transfer function model with those using an ARIMA model. In this case in particular, if we also include first-order differencing in the transfer function model for SHIPMENT, all transfer function weights become insignificant and the SHIPMENT model is reduced to an ARIMA model. We may wonder which model performs better in this case. The automatic identification of the ARIMA model for SHIPMENT, its estimation, and the generation of forecasts can be performed using the following SCA paragraphs:

-->IARIMA SHIPMENT. SEASONALITY IS 12.

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180
THE CRITICAL VALUE FOR SIGNIFICANCE TESTS OF ACF AND ESTIMATES IS

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- UTSMODEL

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
                                     12      1
SHIPMENT  RANDOM    ORIGINAL     (1-B  ) (1-B  )

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 SHIPMENT MA 1 12 NONE

TOTAL NUMBER OF OBSERVATIONS
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL STANDARD ERROR E-01
--

-->ESTIM UTSMODEL. METHOD IS EXACT. HOLD
--> OUTPUT LEVEL(BRIEF).

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180

NONLINEAR ESTIMATION TERMINATED DUE TO:
RELATIVE CHANGE IN THE STANDARD ERROR LESS THAN .1000D-02

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- UTSMODEL

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
                                     12      1
SHIPMENT  RANDOM    ORIGINAL     (1-B  ) (1-B  )

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 SHIPMENT MA 1 12 NONE

EFFECTIVE NUMBER OF OBSERVATIONS
R-SQUARE
RESIDUAL STANDARD ERROR

-->FORECAST UTSMODEL. NOFS IS

FORECASTS, BEGINNING AT

TIME   FORECAST   STD. ERROR   ACTUAL IF KNOWN

The forecasts for SHIPMENT using the ARIMA model and the transfer function model are not very different, particularly in the short term. This is because the SHIPMENT series is dominated by seasonality and subject to limited effects from other series. However, based on the study in Liu (1989), the reduced form transfer function model indeed provides better forecasts than the ARIMA model in a forecasting comparison using one-step to five-step ahead forecasts. Liu (1989) also noted that the observation at t=204 is possibly an outlier, and appropriate consideration must be given to it in such a forecasting comparison.

An Illustrative Example for Structural Form Transfer Function Modeling

In this section we employ a set of stock market data to illustrate structural form transfer function modeling. The data consist of the following monthly series, each from January 1976 through December 1991 inclusive (a total of 192 observations for each series):

(1) The monthly average of the Standard and Poor's 500 stock index,
(2) The monthly average of long term government security interest rates (from the Federal Reserve Bulletin), and
(3) The monthly composite index of leading indicators (from Business Conditions Digest).

The data are displayed in Figure 4 and stored in the SCA workspace under the labels SP500, LONGTERM and LINDCTR, respectively. From Figure 4, we see that SP500 increases steadily until observation 142, at which time it plummets for three consecutive periods. This period corresponds to the stock market crash in October-December 1987.

We will also analyze the natural logarithms of all time series. The logarithmic transformation is frequently used to achieve a more homogeneous variance in a data set. In the case of economic data, it is also employed so that the parameters in the model can be interpreted in terms of elasticity. In this way, we can assess the percent change in the response for a 1% change in an explanatory variable. The log transformed series of SP500, LONGTERM and LINDCTR are stored in the SCA workspace under the labels LNSP500, LNLONG, and LNLINDTR respectively.
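The elasticity interpretation of a log-log relationship can be checked with a little arithmetic. The sketch below is written in Python purely for illustration; the function name `percent_change_in_response` and the elasticity value 0.8 are hypothetical and are not estimates from this example. It assumes a static relationship ln(Y) = c + beta * ln(X) and computes the exact percent change in Y for a given percent change in X.

```python
def percent_change_in_response(elasticity, percent_change_in_input):
    """Exact percent change in Y implied by ln(Y) = c + elasticity * ln(X)."""
    ratio = (1.0 + percent_change_in_input / 100.0) ** elasticity
    return 100.0 * (ratio - 1.0)

beta = 0.8  # hypothetical elasticity, for illustration only
exact = percent_change_in_response(beta, 1.0)
print(round(exact, 4))  # close to beta for a 1% change in X
```

For small changes the exact figure is very close to the coefficient itself, which is why a coefficient on a logged input can be read directly as the approximate percent response to a 1% change.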
In this data set, it is reasonable to assume that stock prices can be influenced by interest rates and economic conditions, but not vice versa. Hence a structural form model can be built in a straightforward manner based on the linear transfer function model described in (15). Since the stock index shows major changes after t=141 (September 1987), we shall perform model identification using the data between t=1 and t=141. We then examine the parameter estimates of the identified model using the entire time span. In the LTF model specified below, we assume that the maximum lag order (k) for each input variable is 5. Similar results are obtained for k between 2 and 6. The SCA statements that carry out automatic transfer function modeling, and their output, are displayed below.

-->TSMODEL STOCKMDL. NO SHOW. MODEL
--> LNSP500(1)=CNST+(0 TO 5)LNIRLONG(1)+(0 TO 5)LNLINDTR(1)+1/(1)NOISE.
-->IESTIM STOCKMDL. PRESERVE ARMA. HOLD
--> MAXREVISON 5. SPAN 1,141.


THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 141

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- STOCKMDL

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
                                     1
LNSP500   RANDOM    ORIGINAL     (1-B  )
                                     1
LNIRLONG  RANDOM    ORIGINAL     (1-B  )
                                     1
LNLINDTR  RANDOM    ORIGINAL     (1-B  )

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 CNST CNST 1 0 NONE
LNIRLONG NUM. 1 0 NONE
LNIRLONG NUM. 1 1 NONE
LNIRLONG NUM. 1 2 NONE
LNIRLONG NUM. 1 3 NONE
LNIRLONG NUM. 1 4 NONE
LNIRLONG NUM. 1 5 NONE
LNLINDTR NUM. 1 0 NONE
LNLINDTR NUM. 1 1 NONE
LNLINDTR NUM. 1 2 NONE
LNLINDTR NUM. 1 3 NONE
LNLINDTR NUM. 1 4 NONE
LNLINDTR NUM. 1 5 NONE
LNSP500 D-AR 1 1 NONE

TOTAL NUMBER OF OBSERVATIONS
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT) E-01

=================================================
RESULTS OF THE REVISED MODEL: REVISION NUMBER 1
=================================================

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- STOCKMDL

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
                                     1
LNSP500   RANDOM    ORIGINAL     (1-B  )
                                     1
LNIRLONG  RANDOM    ORIGINAL     (1-B  )
                                     1
LNLINDTR  RANDOM    ORIGINAL     (1-B  )

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 CNST CNST 1 0 NONE
LNIRLONG NUM. 1 0 NONE
LNIRLONG NUM. 1 1 NONE
LNIRLONG NUM. 1 2 NONE
LNLINDTR NUM. 1 0 NONE
LNLINDTR NUM. 1 1 NONE
LNLINDTR NUM. 1 2 NONE
LNLINDTR NUM. 1 3 NONE

LNLINDTR NUM. 1 4 NONE
LNLINDTR NUM. 1 5 NONE
LNSP500 D-AR 1 1 NONE

TOTAL NUMBER OF OBSERVATIONS
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT) E-01

=================================================
RESULTS OF THE REVISED MODEL: REVISION NUMBER 2
=================================================

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- STOCKMDL

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
                                     1
LNSP500   RANDOM    ORIGINAL     (1-B  )
                                     1
LNIRLONG  RANDOM    ORIGINAL     (1-B  )
                                     1
LNLINDTR  RANDOM    ORIGINAL     (1-B  )

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 CNST CNST 1 0 NONE
LNIRLONG NUM. 1 0 NONE
LNIRLONG NUM. 1 1 NONE
LNLINDTR NUM. 1 0 NONE
LNLINDTR NUM. 1 3 NONE
LNSP500 D-AR 1 1 NONE

TOTAL NUMBER OF OBSERVATIONS
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT) E-01

=================================================
RESULTS OF THE REVISED MODEL: REVISION NUMBER 3
=================================================

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- STOCKMDL

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
                                     1
LNSP500   RANDOM    ORIGINAL     (1-B  )
                                     1
LNIRLONG  RANDOM    ORIGINAL     (1-B  )
                                     1
LNLINDTR  RANDOM    ORIGINAL     (1-B  )

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 CNST CNST 1 0 NONE
LNIRLONG NUM. 1 0 NONE
LNIRLONG NUM. 1 1 NONE
LNLINDTR NUM. 1 0 NONE
LNSP500 D-AR 1 1 NONE

TOTAL NUMBER OF OBSERVATIONS
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT)

-->IARIMA NS. REPLACE STOCKMDL.

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 141
THE CRITICAL VALUE FOR SIGNIFICANCE TESTS OF ACF AND ESTIMATES IS

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- UTSMODEL

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
NS        RANDOM    ORIGINAL     NONE

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE

TOTAL NUMBER OF OBSERVATIONS
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL STANDARD ERROR

The above IARIMA paragraph identifies a white noise model for the disturbance term. This seems appropriate if we examine the sample ACF of the disturbance series (as shown below). However, since LNSP500 follows an ARIMA(0,1,1) model with a rather significant $\theta_1$ estimate, we shall include an MA(1) term in the final transfer function model.

-->ACF NS. MAXLAG IS 12.

TIME PERIOD ANALYZED TO 141
NAME OF THE SERIES NS
EFFECTIVE NUMBER OF OBSERVATIONS
STANDARD DEVIATION OF THE SERIES
MEAN OF THE (DIFFERENCED) SERIES
STANDARD DEVIATION OF THE MEAN
T-VALUE OF MEAN (AGAINST ZERO)

[Sample autocorrelations of NS for lags 1 through 12, with standard errors (ST.E.), Q statistics, and a character plot]

Based on the results shown in the above IESTIM paragraph, it is clear that LNSP500 is influenced by LNLINDTR only contemporaneously (i.e., at lag 0). However, LNSP500 may be influenced by LNLONG at lags 0, 1 and 2 at a decreasing rate. The iterative estimation result in "REVISION NUMBER 1" of the above IESTIM paragraph supports this postulation. With this in mind, we may entertain a rational transfer function $\omega_0/(1-\delta B)$ for the LNLONG input variable. Thus the complete transfer function model for LNSP500 can be specified as:

$$\mathrm{LNSP500}_t = C + \frac{\omega_0}{1-\delta B}\,\mathrm{LNLONG}_t + \omega_1\,\mathrm{LNLINDTR}_t + (1-\theta_1 B)\, a_t$$

The above model is specified in the next two SCA paragraphs. Please note that it is necessary to provide a reasonable initial value for the numerator parameters if a rational transfer function model is employed. The specified model is then estimated by the ESTIM paragraph.

-->W0=
-->TSMODEL STOCKMDL. NO SHOW. MODEL
-->
--> +(1-THETA*B)NOISE
-->ESTIM STOCKMDL. HOLD RESIDUALS(RES). OUTPUT LEVEL(BRIEF).

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 192

NONLINEAR ESTIMATION TERMINATED DUE TO:
RELATIVE CHANGE IN (OBJECTIVE FUNCTION)**0.5 LESS THAN D-03

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- STOCKMDL

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
                                     1
LNSP500   RANDOM    ORIGINAL     (1-B  )
                                     1
LNIRLONG  RANDOM    ORIGINAL     (1-B  )
                                     1
LNLINDTR  RANDOM    ORIGINAL     (1-B  )

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 CNST CNST 1 0 NONE
W0 LNIRLONG NUM. 1 0 NONE
LNIRLONG DENM 1 1 NONE
LNLINDTR NUM. 1 0 NONE
THETA LNSP500 MA 1 1 NONE

EFFECTIVE NUMBER OF OBSERVATIONS
R-SQUARE

RESIDUAL STANDARD ERROR

The above estimation results show that the estimate of $\delta$ has a significant t value. This result seems to support the postulation that LNLONG influences LNSP500 beyond lag 0 at a decreasing rate. However, since there are major outliers in the response series, we shall estimate the above model with outlier effects considered. The SCA paragraph OESTIM is used and the results are listed below.

-->OESTIM STOCKMDL. HOLD RESIDUALS(RES). OUTPUT LEVEL(BRIEF).

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 192

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- STOCKMDL

VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
          VARIABLE  OR CENTERED
                                     1
LNSP500   RANDOM    ORIGINAL     (1-B  )
                                     1
LNIRLONG  RANDOM    ORIGINAL     (1-B  )
                                     1
LNLINDTR  RANDOM    ORIGINAL     (1-B  )

PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
1 CNST CNST 1 0 NONE
W0 LNIRLONG NUM. 1 0 NONE
LNIRLONG DENM 1 1 NONE
LNLINDTR NUM. 1 0 NONE
THETA LNSP500 MA 1 1 NONE

SUMMARY OF OUTLIER DETECTION AND ADJUSTMENT

TIME   ESTIMATE   T-VALUE   TYPE
                            AO
                            AO
                            IO
                            IO
                            LS
                            TC
                            AO
                            IO
                            IO
                            IO
                            IO
                            IO

TOTAL NUMBER OF OBSERVATIONS
EFFECTIVE NUMBER OF OBSERVATIONS
RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT) E-01
RESIDUAL STANDARD ERROR (WITH OUTLIER ADJUSTMENT) E-01
--

Based on the above OESTIM results, we find that the estimate of $\delta$ is insignificant, as shown by its t value. Therefore we may consider a simple linear transfer function for LNLONG (i.e., with $\delta = 0$ in the above model) and conclude that LNSP500 is influenced by LNLONG only contemporaneously. In the output for OESTIM, the two major outliers detected at t=142 and 143 (both innovational outliers) represent a dynamic level shift caused by the stock market crash that occurred in October 1987. Other outliers also correspond to anomalous changes in the stock market. More details on the types of outliers and their effects on a series can be found in Chen and Liu (1993). In addition to the differences in the estimates of transfer function weights, the trend estimate under the ESTIM paragraph corresponds to 6.6% annually, which is much lower than the trend estimate under the OESTIM paragraph. The results of the above two paragraphs demonstrate the importance of outlier detection and adjustment in time series modeling. It is useful to note that if we employ ESTIM and OESTIM using only the first 141 observations, the results are consistent with those shown above. However, we find that the parameter estimates for interest rates are much more stable than those for the economic leading indicator. This raises an interesting question concerning the role of economic leading indicators in affecting stock market prices.

Identification of Rational Transfer Function Models

In Section 3.5, we provide an example involving the use of rational transfer function models. In some situations, a transfer function may have a rather pronounced denominator polynomial $\delta(B)$, and special techniques may be employed to identify the form of the transfer function. In this section, we use an example to illustrate these techniques. The Series M data set of Box and Jenkins (1970) is used.
The output series (response) consists of sales data, and the input series (explanatory variable) is a leading indicator. There are 150 observations in each series. The data are displayed in Figure 5 and stored in the SCA workspace under the labels SALES and LEADING. In this section, we will use the first 126 observations only for model building and estimation. The last 24 observations are reserved to compare forecast performance using different models.

Since we expect the transfer function weights to have a persistent die-out pattern, we choose a larger value for k, namely 10. In addition, each transfer function weight is assigned a name in order to store its value. This allows us to use the estimated weights to identify a rational transfer function more conveniently. We also know a priori that sales could be influenced by a leading indicator, but not vice versa; thus a structural form model is employed. Following the procedure outlined in Section 3.3, we determine that first-order differencing is necessary. The remaining SCA paragraphs for the identification of a transfer function model are listed below.

46 EXPERT MODELING AND FORECASTING TIME SERIES >TSMODEL SALESMDL. NO --> MODEL IS SALES(1)=CNST+(0 TO 10; V0 TO V10)LEADING(1)+1/(1)NOISE. -->IESTIM SALESMDL. PRESERVE ARMA. HOLD DISTURBANCE(NS). THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 126 SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- SALESMDL VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED 1 SALES RANDOM ORIGINAL (1-B ) 1 LEADING RANDOM ORIGINAL (1-B ) PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE 1 CNST CNST 1 0 NONE V0 LEADING NUM. 1 0 NONE V1 LEADING NUM. 1 1 NONE V2 LEADING NUM. 1 2 NONE V3 LEADING NUM. 1 3 NONE V4 LEADING NUM. 1 4 NONE V5 LEADING NUM. 1 5 NONE V6 LEADING NUM. 1 6 NONE V7 LEADING NUM. 1 7 NONE V8 LEADING NUM. 1 8 NONE V9 LEADING NUM. 1 9 NONE V10 LEADING NUM NONE SALES D-AR 1 1 NONE TOTAL NUMBER OF OBSERVATIONS EFFECTIVE NUMBER OF OBSERVATIONS RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT) E+00 ================================================= RESULTS OF THE REVISED MODEL: REVISION NUMBER 1 ================================================= SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- SALESMDL VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED 1 SALES RANDOM ORIGINAL (1-B ) 1 LEADING RANDOM ORIGINAL (1-B ) PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE 1 CNST CNST 1 0 NONE V3 LEADING NUM. 1 3 NONE V4 LEADING NUM. 1 4 NONE V5 LEADING NUM. 1 5 NONE V6 LEADING NUM. 1 6 NONE V7 LEADING NUM. 1 7 NONE V8 LEADING NUM. 1 8 NONE

47 2.38 EXPERT MODELING AND FORECASTING TIME SERIES 8 V9 LEADING NUM. 1 9 NONE V10 LEADING NUM NONE SALES D-AR 1 1 NONE TOTAL NUMBER OF OBSERVATIONS EFFECTIVE NUMBER OF OBSERVATIONS RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT) E Since the transfer function weights display a die-out pattern, this indicates that a rational transfer function may be employed. The rational transfer function can be identified using the corner method (see later output). The model for the disturbance term again can be identified using the IARIMA paragraph as shown below. -->IARIMA NS. SEASONALITY IS 12. THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 126 THE CRITICAL VALUE FOR SIGNIFICANCE TESTS OF ACF AND ESTIMATES IS SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- UTSMODEL VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED NS RANDOM ORIGINAL NONE PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE 1 NS MA 1 1 NONE TOTAL NUMBER OF OBSERVATIONS EFFECTIVE NUMBER OF OBSERVATIONS RESIDUAL STANDARD ERROR E In order to employ the corner method to identify a rational transfer function, first we need to join the transfer function weights sequentially to form a new variable. The joined transfer function weights can then be used in the CORNER paragraph to identify a model. The variables V0 to V10 store the most up-to-date transfer function weights during the iterative estimation. -->JOIN OLD ARE V0 TO V10. NEW IS TFWEIGHTS. THE JOIN OPERATION HAS BEEN COMPLETED, RESULT IS STORED IN VARIABLE TFWEIGHT VARIABLE TFWEIGHT IS A 11 BY 1 MATRIX --

-->CORNER TFWEIGHTS.

CORNER TABLE FOR THE TRANSFER FUNCTION WEIGHTS IN TFWEIGHT

The above results show that the rational transfer function $(\omega_3 B^3)/(1 - \delta B)$ is appropriate for the estimated weights. We can now specify the complete rational transfer function model as below. It is followed by model estimation using the exact maximum likelihood method.

-->TSMODEL SALESMDL. NO
--> MODEL IS +(V3*B**3)/(1-D1*B)LEADING(1)+(1-TH*B)NOISE.
-->ESTIM SALESMDL. METHOD IS EXACT.HOLD
--> OUTPUT LEVEL(BRIEF).

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 126

NONLINEAR ESTIMATION TERMINATED DUE TO:
RELATIVE CHANGE IN THE STANDARD ERROR LESS THAN D-03

SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- SALESMDL

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
SALES      RANDOM     ORIGINAL      (1-B)
LEADING    RANDOM     ORIGINAL      (1-B)

PARAMETER  VARIABLE  NUM./    FACTOR  ORDER  CONSTRAINT  VALUE  STD ERROR  T VALUE
LABEL      NAME      DENOM.
CNST                 CNST     1       0      NONE
V3         LEADING   NUM.     1       3      NONE
D1         LEADING   DENM     1       1      NONE
TH         SALES     MA       1       1      NONE

EFFECTIVE NUMBER OF OBSERVATIONS
R-SQUARE
RESIDUAL STANDARD ERROR
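The way a corner table points to a form such as $(\omega_3 B^3)/(1 - \delta B)$ can be illustrated with a small sketch. It is not the SCA CORNER paragraph. It assumes the determinant form in which the corner table is commonly presented: entry (f, g) is the determinant of the g-by-g matrix whose (i, j) element is the scaled weight at lag f + i - j, with weights at negative lags taken as zero. The function names and the values of omega and delta are hypothetical, chosen only to mimic a delay of 3 followed by a geometric die-out.

```python
def det(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]
    n = len(a)
    d = 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[p][c]) < 1e-12:
            return 0.0
        if p != c:
            a[c], a[p] = a[p], a[c]
            d = -d
        d *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return d


def corner_table(weights, max_g):
    """Rows are lags f = 0, 1, ...; columns are g = 1, ..., max_g."""
    scale = max(abs(v) for v in weights)
    eta = [v / scale for v in weights]  # weights scaled by the largest one

    def w(j):
        return eta[j] if 0 <= j < len(eta) else 0.0

    rows = len(eta) - max_g + 1  # only use lags that are actually available
    return [[det([[w(f + i - j) for j in range(g)] for i in range(g)])
             for g in range(1, max_g + 1)]
            for f in range(rows)]


# Hypothetical weights v0..v10 generated by (omega * B**3) / (1 - delta * B)
omega, delta = 4.7, 0.7
weights = [0.0, 0.0, 0.0] + [omega * delta ** k for k in range(8)]

table = corner_table(weights, max_g=3)
for f, row in enumerate(table):
    print(f, ["%7.3f" % x for x in row])
```

For these artificial weights, rows 0 to 2 are entirely zero (a delay of three periods), row 3 is non-zero in every column, and from row 4 onward only the first column is non-zero. That corner pattern is what suggests a single numerator term at lag 3 and a first-order denominator. With estimated weights the zeros are only approximate and must be judged against sampling error.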

APPENDIX A SUMMARY OF THE SCA PARAGRAPHS

This section provides a summary of those SCA paragraphs employed in this document. The syntax for the paragraphs is presented in both brief and full form. The brief display of the syntax contains the most frequently used sentences of a paragraph, while the full display presents all possible modifying sentences of a paragraph. In addition, special remarks related to a paragraph may also be presented with the description. It is recommended that the brief form be used before employing any System capability that can be accessed only through the use of the full form of the paragraph syntax.

Each SCA paragraph begins with a paragraph name and is followed by modifying sentences. Sentences that may be used as modifiers for a paragraph are shown below, and the types of arguments used in each sentence are also specified. Sentences not designated as required may be omitted, since default conditions (or values) exist. The most frequently used required sentence is given as the first sentence of the paragraph. The portion of this sentence that may be omitted is underlined. This portion may be omitted only if this sentence appears as the first sentence in a paragraph. Otherwise, all portions of the sentence must be used. The last character of each line except the last line must be the continuation character.

The paragraphs to be explained in this summary are IARIMA and IESTIM.

Legend
v: variable or model name
r: real value
i: integer
w: keyword

IARIMA Paragraph

The IARIMA paragraph is used to automatically identify an appropriate ARIMA model for a seasonal or non-seasonal time series. In addition, it estimates the parameters of the identified model. The IARIMA paragraph may be followed by the ESTIM paragraph if more precise parameter estimates are desired, or by the OESTIM paragraph if joint estimation of model parameters and outlier effects is needed. The IARIMA paragraph can also be used in conjunction with the IESTIM paragraph to facilitate more automatic transfer function modeling.

Syntax for the IARIMA paragraph

Brief syntax

IARIMA VARIABLE IS
       SEASONALITY IS
       SPAN IS i1, i2.

Required sentence: VARIABLE

Full syntax

IARIMA VARIABLE IS
       NAME IS
       SEASONALITY IS i. (PERIOD IS
       SPAN IS i1,
       DFORDERS ARE v1, v2, - -
       DELETE-CONSTANT/NO
       REPLACE
       COMPONENT-SERIES ARE v1, v2,
       HOLD RESIDUALS(v), FITTED(v), VARIANCE(v).

Required sentence: VARIABLE

Sentences used in the IARIMA paragraph

VARIABLE sentence
The VARIABLE sentence is used to specify the name of a series for which an ARIMA model will be identified and estimated. It is a required sentence.

NAME sentence
The NAME sentence is used to specify a name (label) for the model identified by the IARIMA paragraph. When it is not specified, the default name UTSMODEL is used internally.

SEASONALITY sentence
The SEASONALITY sentence is used to specify the potential seasonality the series may possess. The seasonality is 4 for quarterly data, 12 for monthly data, and so on. If a seasonality is specified but the series is in fact non-seasonal, an appropriate non-seasonal model will still be obtained (assuming that the series is medium to long in length). If a series is seasonal but no seasonality is specified, the identified model will not be appropriate (this is signaled by significant sample autocorrelations of the residuals). Hence the SEASONALITY sentence is required if the series is seasonal, and optional if the series is non-seasonal. It is safe to specify the potential seasonality if the series is medium to long in length.

PERIOD sentence
The PERIOD sentence serves the same purpose as the SEASONALITY sentence. Either PERIOD or SEASONALITY may be used to specify the periodicity or seasonality of a time series, but not both.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, i1 to i2, for which the data are used to identify and estimate an ARIMA model. The default is the maximum span available for the series.

DFORDER sentence
The DFORDER sentence is used to specify the differencing order(s) that must be included in the final ARIMA model. In addition to the specified differencing(s), other differencing orders may be included in the final ARIMA model if they are found to be necessary. By default, the differencing orders for the ARIMA model of a time series are determined automatically by the IARIMA paragraph. Imposing the differencing order(s) may be particularly useful when a time series is short.

DELETE-CONSTANT sentence
The DELETE-CONSTANT sentence is used to specify how the deletion of the constant term in an ARIMA model is handled. By default (which is DELETE-CONSTANT), the constant term is deleted from the model if it is insignificant, and retained in the model if it is significant. However, if NO DELETE is specified, the constant term will be retained in the model regardless of whether it is significant.

REPLACE sentence
The REPLACE sentence is used to specify the name of a transfer function model or intervention model whose ARMA component is to be replaced by the ARIMA model identified by the IARIMA paragraph. The REPLACE sentence provides a link for the combined use of the IARIMA and IESTIM paragraphs.

COMPONENT sentence
The COMPONENT sentence is used to specify the names of variables to store the residual series, R series, and S series (see Appendix B for the definition of the R and S series). The S series will not be generated if the SEASONALITY (or PERIOD) sentence is not specified.

HOLD sentence
The HOLD sentence is used to specify those values computed for particular functions to be retained in the workspace until the end of the session. Only those statistics desired to be retained need be named. Values are placed in the variable named in parentheses. The default is that none of the values of the statistics below will be retained after the paragraph is used. The values that may be retained are:

RESIDUALS : the residual series

FITTED    : the one-step-ahead forecasts (fitted values) of the series
VARIANCE  : the variance of the noise

IESTIM Paragraph

The IESTIM paragraph is used to estimate a time series model (transfer function, intervention, or ARIMA model) and automatically delete insignificant parameter estimates in an appropriate manner. For a transfer function or intervention model, if no parameter estimate is significant for an input variable, the variable will be deleted entirely. For an ARIMA model, insignificant parameter estimates will be deleted and necessary MA parameters will be included to ensure white noise residuals.

Syntax for the IESTIM paragraph

Brief syntax

IESTIM MODEL
       SPAN IS i1,
       HOLD RESIDUALS(v).

Required sentence: MODEL

Full syntax

IESTIM MODEL
       SPAN IS i1,
       METHOD IS
       STOP ARE MAXIT(i),
       PRESERVE ARMA, CONSTANT, v1, v2, - -
       MAXREVISION IS
       OUTPUT IS LEVEL(w),
                 PRINT(w1, w2, - -
                 NOPRINT(w1, w2, - -
       HOLD RESIDUALS(v), FITTED(v), VARIANCE(v).

Required sentence: MODEL

Sentences used in the IESTIM paragraph

MODEL sentence
The MODEL sentence is used to specify the name (label) of the model to be estimated. The name must be one specified in a previous TSMODEL or IARIMA paragraph. It is a required sentence.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, i1 to i2, for which data are analyzed. The default is the maximum span available for the series.

METHOD sentence
The METHOD sentence is used to specify the method for the computation of the likelihood function used in model estimation. The keyword may be CONDITIONAL for the conditional likelihood or EXACT for the exact likelihood function. The default is CONDITIONAL.

STOP sentence
The STOP sentence is used to specify the stopping criterion for the nonlinear estimation of parameters. This estimation is conditional on the most recent outlier adjustment. Estimation is terminated when the relative change in the value of the likelihood function or of the parameter estimates between two successive iterations is less than or equal to the convergence criterion, or when the maximum number of iterations is reached. The argument, i, for the keyword MAXIT specifies the maximum number of iterations. The default is i=10. The argument, r1, for the keyword LIKELIHOOD specifies the value of the relative convergence criterion on the likelihood function. The argument, r2, for the keyword ESTIMATE specifies the value of the relative convergence criterion on the parameter estimates. Default values are used for r1 and r2 when these keywords are omitted.

PRESERVE sentence
The PRESERVE sentence is used to specify the components in a transfer function or intervention model whose functional forms will not be altered, regardless of the significance of the parameter estimates in those components. The keywords are ARMA, CONSTANT, or variable names specified in a transfer function or intervention model. If the keyword ARMA is specified, the ARMA model for the noise will not be changed. If the keyword CONSTANT is specified, the constant term remains in the model regardless of its significance. Similarly, if a valid variable name is specified, the transfer function for that variable will not be altered regardless of the significance of the parameter estimates in the transfer function. If the PRESERVE sentence is not used, the functional forms of all components in a model may be modified depending upon the significance of the parameter estimates.

MAXREVISION sentence
The MAXREVISION sentence is used to specify the maximum number of model revisions allowed in the iterative model estimation. The default is 3.

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output displayed for selected statistics. Control is achieved in a two stage procedure. First, a basic LEVEL of output (default NORMAL) is designated. Output may then be increased (decreased) from this level by use of PRINT (NOPRINT). The keywords for LEVEL and the output displayed are:

BRIEF    : estimates and their related statistics only
NORMAL   : RCORR
DETAILED : ITERATION, CORR, and RCORR

where the keywords on the right denote:

ITERATION : the parameter and covariance estimates for each iteration
CORR      : the correlation matrix for the parameter estimates
RCORR     : the reduced correlation matrix for the parameter estimates (i.e., a display in which all values have no more than two decimal places and those estimates within two standard errors of zero are displayed as dots, ".")

The keyword OITERATION may also be specified to activate the display of estimation results for major stages of iterative outlier and parameter estimation.

HOLD sentence
The HOLD sentence is used to specify those values computed for particular functions to be retained in the workspace until the end of the session. Only those statistics desired to be retained need be named. Values are placed in the variable named in parentheses. The default is that none of the values of the statistics below will be retained after the paragraph is used. The values that may be retained are:

RESIDUALS : the residual series without outlier adjustment
FITTED    : the one-step-ahead forecasts (fitted values) of the series
VARIANCE  : the variance of the noise

APPENDIX B IDENTIFICATION OF SEASONAL ARIMA MODELS USING A FILTERING METHOD

SUMMARY

This appendix summarizes a method for identifying ARIMA models for seasonal time series using an intermediary model and a filtering method. This method is found to be useful when conventional methods, such as using the sample ACF and PACF, fail to reveal a clear-cut model. This filtering identification method is also found to be particularly effective when a seasonal time series is subject to calendar variations, moving-holiday effects, and interventions.

B.1 Introduction

In the application of univariate time series analysis and forecasting, seasonal time series occur frequently and deserve special attention. A seasonal time series has a tendency to repeat the pattern of its past, and can often be modeled and forecasted with high accuracy. The autoregressive-integrated moving average (ARIMA) models proposed in Box and Jenkins (1970) are particularly useful in the analysis of such time series. For stationary seasonal time series, the general form of a seasonal ARIMA model can be written as

$$\phi(B)\,\Phi(B^s)\,Y_t = C + \theta(B)\,\Theta(B^s)\,a_t, \tag{1}$$

where

$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p,$$
$$\Phi(B^s) = 1 - \Phi_1 B^s - \Phi_2 B^{2s} - \cdots - \Phi_P B^{Ps},$$
$$\theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q,$$
$$\Theta(B^s) = 1 - \Theta_1 B^s - \Theta_2 B^{2s} - \cdots - \Theta_Q B^{Qs},$$

and $B$ is the backshift operator, $B Y_t = Y_{t-1}$. For non-stationary seasonal time series, the general form of the model can be written as

$$\phi(B)\,\Phi(B^s)\,(1-B)^{d_1}(1-B^s)^{d_2}\,Y_t = C + \theta(B)\,\Theta(B^s)\,a_t. \tag{2}$$

In Box-Jenkins ARIMA modeling, identification of a tentative model has been found to be an important but difficult stage, particularly when a time series follows a mixed autoregressive-moving average (ARMA) model. This difficulty is further compounded when seasonality is also present in a time series. Several major advances have been made in the identification of non-seasonal mixed ARIMA models during the past few years, particularly

the extended autocorrelation function (EACF) and the smallest canonical correlation (SCAN) methods developed by Tsay and Tiao (1984, 1985). These methods are very informative in the identification of ARIMA models for non-seasonal time series; however, their direct application to the identification of seasonal time series is less successful.

In this paper, we propose a method which can be used to identify seasonal ARIMA models when traditional methods, such as the autocorrelation function (ACF) and partial autocorrelation function (PACF) methods, do not provide a clear-cut model. The method uses an intermediary model as a tool for filtering the series. The intermediary model is estimated, and the seasonal time series is then filtered into two separate component series. Of these two component series, one represents the behavior of the non-seasonal part of the ARIMA model, and the other represents the behavior of the seasonal part of the model. The proposed identification method, referred to as the filtering identification method, has been applied to a number of simulated and actual time series and found to be very useful. In this paper, we apply this method to an actual time series that represents a more difficult case in the identification of seasonal ARIMA models.

B.2 Basic Concepts of the Method and Its Rationale

The basic concepts of the filtering identification method are quite simple and can be easily explained. Assume that we have a seasonal time series $Y_t$. If we can derive a time series $R_t$ from $Y_t$ that represents the non-seasonal behavior (model) of $Y_t$, and another time series $S_t$ from $Y_t$ that represents the seasonal behavior (model) of $Y_t$, we can then obtain the model for $Y_t$ by combining the models for $R_t$ and $S_t$. If $R_t$ and $S_t$ are not contaminated with each other, then their models can be easily identified since they are either purely non-seasonal or purely seasonal time series.

To simplify the explanation, we assume that the series $Y_t$ is stationary and follows the model in (1). The model in (1) can be rewritten as

$$\Phi(B^s)\,Y_t = \frac{C}{\phi(B)} + \Theta(B^s)\,\frac{\theta(B)}{\phi(B)}\,a_t \tag{3}$$

or

$$\Phi(B^s)\,Y_t = C_1 + \Theta(B^s)\,R_t \tag{4}$$

with

$$R_t = \frac{\theta(B)}{\phi(B)}\,a_t \quad\text{and}\quad C_1 = \frac{C}{1 - \phi_1 - \phi_2 - \cdots - \phi_p}.$$

Similarly, the model in (1) can also be re-written as

$$\phi(B)\,Y_t = C_2 + \theta(B)\,S_t \tag{5}$$

with

$$S_t = \frac{\Theta(B^s)}{\Phi(B^s)}\,a_t \quad\text{and}\quad C_2 = \frac{C}{1 - \Phi_1 - \Phi_2 - \cdots - \Phi_P}.$$

Based on (4) and (5), it is easy to see that $R_t$ is the noise series of $Y_t$ when it is filtered by $\Phi(B^s)$ and $\Theta(B^s)$ using Model (4), and $S_t$ is the noise series of $Y_t$ when it is filtered by $\phi(B)$ and $\theta(B)$ using Model (5). We are particularly interested in how well the series $R_t$ can represent the non-seasonal behavior of the series. It is important to note that the filtering procedure using model (4) does not use the $\phi(B)$ and $\theta(B)$ polynomials. Therefore, as long as the seasonal polynomials $\Phi(B^s)$ and $\Theta(B^s)$ are reasonable approximations of those in the underlying model, the series $R_t$ can be generated appropriately and will represent the non-seasonal part of the model adequately.

In practice, the time series model and its parameters are unknown; therefore it is necessary to employ an intermediary model to generate the series $R_t$ and $S_t$ from $Y_t$. The following multiplicative ARMA(1,1) model is found to be a good intermediary model:

$$(1 - \phi_1 B)(1 - \Phi_1 B^s)\,Y_t = C + (1 - \theta_1 B)(1 - \Theta_1 B^s)\,a_t \tag{6}$$

The parameters of the above model can be easily estimated (Liu et al. 1986), and the series $R_t$ can be obtained by filtering $Y_t$ using the model

$$(1 - \Phi_1 B^s)\,Y_t = C/(1 - \phi_1) + (1 - \Theta_1 B^s)\,R_t, \quad\text{if } \phi_1 \neq 1, \tag{7}$$

and $S_t$ can be obtained using

$$(1 - \phi_1 B)\,Y_t = C/(1 - \Phi_1) + (1 - \theta_1 B)\,S_t, \quad\text{if } \Phi_1 \neq 1. \tag{8}$$

Here $C$, $\Phi_1$, $\phi_1$, $\Theta_1$, and $\theta_1$ are estimated values. Even though the model in (6) is quite simple, it contains 16 possible sub-models. Furthermore, an ARMA(1,1) model is very flexible and can be used to approximate a wide variety of ARIMA processes, such as AR(2), MA(2), ARMA(1,2), or ARMA(2,1). In practice, the seasonal components of an ARIMA model are usually rather simple, and the seasonal ARMA(1,1) is in fact an adequate model for most applications.

In the application of the above method, it is important to note that $\phi_1$ and $\Phi_1$ cannot be equal to 1. However, if $\phi_1$ is close to 1 in the intermediary model, it implies that a regular differencing $(1 - B)$ is necessary. Similarly, if $\Phi_1$ is close to 1, it implies that a seasonal differencing $(1 - B^s)$ is necessary. Therefore with slight modifications, the above method can

be extended to non-stationary seasonal time series. For the convenience of application, it is useful to always include the constant term $C$ in model (6).

B.3 A Summary of the Method

The above method can be summarized in the following steps. We shall refer to the original time series as $Z_t$.

Step 1: First, we examine the sample ACF's of $Z_t$, $(1-B)Z_t$, $(1-B^s)Z_t$, and $(1-B)(1-B^s)Z_t$ and check whether any differencings are necessary and whether seasonalities are present. We may then examine the sample ACF of the appropriately differenced series. If an appropriate seasonal ARIMA model is obvious based on the sample ACF of the series, the model identification is completed. Otherwise, we need to proceed to the next step. We shall use $Y_t$ to represent an appropriately differenced series of $Z_t$.

Step 2: If a tentative model for $Y_t$ is not obvious from the sample ACF of $Y_t$, we fit the intermediary model described in (6), and then generate the series $R_t$ and $S_t$ if neither $\phi_1$ nor $\Phi_1$ is close to 1. If either $\phi_1$ or $\Phi_1$ is close to 1, then appropriate differencing(s) is necessary. After differencing(s), we can fit the intermediary model in (6) once more and subsequently generate the series $R_t$ and $S_t$.

Step 3: Now we can use the sample ACF, PACF, and EACF of $R_t$ to identify an appropriate ARMA model for the series $R_t$.

Step 4: To identify a model for $S_t$, we can use the sample ACF of $S_t$. If any ambiguities exist for the model of $S_t$, we may also examine the estimated values of $\Phi_1$ and $\Theta_1$ and determine an appropriate model for the series $S_t$.

After the models for $R_t$ and $S_t$ are identified, we can combine the models of these two component series with appropriate differencings and obtain a tentative model for $Z_t$.

In the above steps, it is useful to check whether the intermediary model is appropriate. Typically, slight serial correlations in its residuals will not affect the identification of a model using this method and should not be overemphasized. However, if serial correlations persist, or if a second seasonality exists, then the intermediary model in (6) should be modified to account for the higher order parameters or other seasonalities in the series.

Non-multiplicative Seasonal Models

The framework employed in this identification method automatically implies a multiplicative model. Occasionally, a non-multiplicative model may be preferable to a multiplicative model. When the need for a non-multiplicative model is suspected, we may also try a tentative non-multiplicative model of similar order and select the model that performs well in both estimation and forecasting.
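The filtering in Step 2 can be sketched as follows. This is a minimal illustration of the recursions implied by (7) and (8), not the SCA implementation. It assumes that the intermediary model (6) has already been estimated, and that pre-sample values of $Y_t$, $R_t$, and $S_t$ are set to zero (a conditional start, which is an assumption of this sketch). The function name and all numerical values are hypothetical.

```python
def filter_components(y, s, c, phi1, Phi1, theta1, Theta1):
    """Generate the R and S series from y using the intermediary model.

    R_t = Y_t - Phi1*Y_{t-s} - C/(1 - phi1) + Theta1*R_{t-s}    (from (7))
    S_t = Y_t - phi1*Y_{t-1} - C/(1 - Phi1) + theta1*S_{t-1}    (from (8))
    Pre-sample values are taken as zero (assumption of this sketch).
    """
    if phi1 == 1.0 or Phi1 == 1.0:
        raise ValueError("phi1 and Phi1 must differ from 1; difference first")
    c1 = c / (1.0 - phi1)
    c2 = c / (1.0 - Phi1)
    r, sea = [], []
    for t in range(len(y)):
        y_s = y[t - s] if t >= s else 0.0
        r_s = r[t - s] if t >= s else 0.0
        r.append(y[t] - Phi1 * y_s - c1 + Theta1 * r_s)
        y_1 = y[t - 1] if t >= 1 else 0.0
        s_1 = sea[t - 1] if t >= 1 else 0.0
        sea.append(y[t] - phi1 * y_1 - c2 + theta1 * s_1)
    return r, sea


# Round-trip check with hypothetical values: build Y from a known R series
# through (7), then confirm that the filter recovers that R series.
s, c, phi1, Phi1, theta1, Theta1 = 4, 0.2, 0.5, 0.3, 0.1, 0.4
r_true = [0.3, -0.1, 0.4, 0.2, -0.5, 0.1, 0.0, 0.6, -0.2, 0.3]
y = []
for t, rt in enumerate(r_true):
    y_s = y[t - s] if t >= s else 0.0
    r_s = r_true[t - s] if t >= s else 0.0
    y.append(Phi1 * y_s + c / (1.0 - phi1) + rt - Theta1 * r_s)

r_hat, s_hat = filter_components(y, s, c, phi1, Phi1, theta1, Theta1)
print(max(abs(a - b) for a, b in zip(r_hat, r_true)))
```

The sample ACF, PACF, and EACF of the resulting R series, and the sample ACF of the S series, would then be examined as in Steps 3 and 4.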

B.4 Example

As mentioned earlier, the intermediary model (6) includes a number of sub-models. One may suspect that the filtering identification method works if the actual model happens to be a sub-model of (6), and does not work well otherwise. Based on a number of simulated and actual time series, we find it is true that this approach is quite convenient if the actual model is a sub-model of (6). However, it is also true that the method works well even if the model is quite different from the intermediary model. This situation is illustrated in the following example, where the U.S. quarterly nominal GNP (non-seasonally adjusted) is analyzed.

The U.S. GNP series consists of 92 quarterly observations beginning in 1947. The series was discussed in Feige and Pearce (1979) and Roberts (1974). The analysis in Feige and Pearce (1979) was based on the log transformed data ($LNGNP_t$). Using the traditional Box-Jenkins identification method, Feige and Pearce (1979) obtain the following model:

$$(1 - B)(1 - B^4)(1 - \phi_1 B)(1 - \Phi_1 B^4)\,LNGNP_t = a_t \tag{9}$$

To use the method outlined in this paper to identify a model for $LNGNP_t$, we first examine the sample ACF's of $LNGNP_t$, $(1-B)LNGNP_t$, $(1-B^4)LNGNP_t$, and $(1-B)(1-B^4)LNGNP_t$, which are presented in Figure 1. Based on the sample ACF, we find it is necessary to consider a seasonal differencing $(1-B^4)$, but not both regular and seasonal differencings $((1-B)(1-B^4))$. Therefore, we applied the filtering method to the series $(1-B^4)LNGNP_t$. The estimated intermediary model is

$$(1 - .839B)(1 + .318B^4)\,Y_t = 0.13 + (1 + .172B)(1 - .270B^4)\,a_t, \tag{10}$$

where $Y_t = (1-B^4)LNGNP_t$. The t-values of the estimated parameters, in the order in which the estimates appear in (10), are (10.39), (-1.89), (2.00), (-1.20), and (1.48). The sample autocorrelations of the residuals of the intermediary model are all insignificant except at lag 2 ($r_2 = 0.31$). Using the above model, we can easily generate the series $R_t$ and $S_t$.

The sample ACF, PACF and EACF of $R_t$ are presented in Figure 2, and the sample ACF of $S_t$ is presented in Figure 3. Based on the sample EACF of $R_t$ in Figure 2, it is easy to see that an AR(3) or an ARMA(1,2) model is appropriate for $R_t$. Based on the sample ACF of $S_t$ in Figure 3, it is found that a seasonal MA(1) model is appropriate for $S_t$. Therefore the following models are derived:

$$(1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3)(1 - B^4)\,LNGNP_t = C + (1 - \Theta_1 B^4)\,a_t \tag{11}$$

or

$$(1 - \phi_1 B)(1 - B^4)\,LNGNP_t = C + (1 - \theta_1 B - \theta_2 B^2)(1 - \Theta_1 B^4)\,a_t \tag{12}$$
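The differencing check that opens this example (sample ACF's of the series under the four candidate differencings) can be sketched as below. The GNP data are not reproduced here, so the series `z` is artificial: a linear trend plus a fixed quarterly pattern plus a small deterministic wiggle. The function names are hypothetical, and the ACF uses the usual estimator with divisor n.

```python
def difference(x, lag=1):
    """Apply (1 - B**lag) to the series x."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


def sample_acf(x, nlags):
    """Sample autocorrelations r_0, ..., r_nlags (divisor n)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d) / n
    return [sum(d[t] * d[t - k] for t in range(k, n)) / n / c0
            for k in range(nlags + 1)]


# Artificial quarterly series (not the GNP data)
pattern = [0.05, -0.02, 0.01, -0.04]
z = [0.02 * t + pattern[t % 4] + 0.01 * ((7 * t) % 5 - 2) for t in range(92)]

candidates = {
    "Z": z,
    "(1-B)Z": difference(z, 1),
    "(1-B^4)Z": difference(z, 4),
    "(1-B)(1-B^4)Z": difference(difference(z, 4), 1),
}
for name, series in candidates.items():
    acf = sample_acf(series, 8)
    print(name, " ".join("%5.2f" % r for r in acf[1:]))
```

A slowly decaying ACF for the undifferenced series, or large values that persist at lags 4, 8, ... after regular differencing, are the kinds of patterns one looks for when deciding which differencing to retain.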

Figure 1. Sample ACF's of the Log Transformed U.S. Quarterly GNP. Panels: ACF of $LNGNP_t$; ACF of $(1-B)LNGNP_t$; ACF of $(1-B^4)LNGNP_t$; ACF of $(1-B)(1-B^4)LNGNP_t$.

The estimated models for (11) and (12) are:

$$(1 - \hat{\phi}_1 B - .019B^2 + .337B^3)(1 - B^4)\,LNGNP_t = .017 + (1 - .428B^4)\,a_t, \tag{13}$$

with t-values (10.33), (.12), (-3.31), (4.19), and (3.81) in the order in which the estimates appear, or

$$(1 - .747B)(1 - B^4)\,LNGNP_t = .016 + (1 + .302B + .341B^2)(1 - .564B^4)\,a_t, \tag{14}$$

with t-values (8.18), (2.65), (-2.51), (-3.04), and (6.05). By examining the sample ACF of the residuals, both models appear to be appropriate. Using the same computer software (Liu et al. 1986), the model in (9) is estimated as:

$$(1 - B)(1 - B^4)(1 - .189B)(1 + .488B^4)\,LNGNP_t = a_t, \tag{15}$$

with t-values (1.77) and (-5.33). Apparently, the model presented in Feige and Pearce (1979) is over-differenced. The residual standard error of this over-differenced model is larger than those of models (13) and (14). Furthermore, its residual series has a significant autocorrelation at lag 2 ($r_2 = .23$). Therefore model (13) or (14) is preferable to model (9).

Figure 2. Sample ACF, PACF, and EACF of $R_t$ under Model (10). Panels: Sample ACF of $R_t$; Sample PACF of $R_t$.

Sample EACF of $R_t$

SIMPLIFIED EXTENDED ACF TABLE (5% LEVEL)

(Q-->)
(P= 0)  X X X O O O O O O O O O O
(P= 1)  X X O O O O O O O O O O O
(P= 2)  X X O O O O O O O O O O O
(P= 3)  O O O O O O O O O O O O O
(P= 4)  X O O O O O O O O O O O O
(P= 5)  X O O O O O O O O O O O O
(P= 6)  X X X O O O O O O O O O O

Figure 3. Sample ACF of $S_t$ under Model (10).

B.5 Discussion

In reviewing seasonal time series models presented in the current literature, we often discover that the model for a seasonal time series is over-differenced and results in an airline model. Even though over-differencing may not cause great harm in forecasting, it can produce an unduly complicated model. The use of the filtering identification method discussed in this paper can greatly reduce the possibility of over-differencing. In addition to over-differencing, ignoring calendar effects can also result in a complicated model (see e.g., Bell and Hillmer 1983, and Liu 1986). The combination of the filtering method and the linear transfer function (LTF) method discussed in Liu (1986) and Liu et al. (1986) can greatly alleviate this problem.

It is important to realize that different identification approaches may lead to slightly different models. However, these models may have similar implications once they are expressed in terms of their π-weights or ψ-weights. In general, forecasting is not sensitive to slight differences in the models.

The method discussed in this paper requires estimation of an intermediary model and filtering of the series using the intermediary model. It requires more computation than the traditional method using differencings and their sample ACF and PACF. Even though an experienced modeler may reach a similar ARIMA model using some heuristic techniques, the filtering identification method discussed in this paper provides a systematic approach for complicated problems and also avoids potential difficulties in the identification of a seasonal ARIMA model. This approach is particularly valuable in the instruction of seasonal ARIMA modeling, where a beginner may not have the sophistication of an experienced ARIMA modeler.
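The remark about ψ-weights can be made concrete with a short sketch. The ψ-weights are the coefficients of $\psi(B) = \theta(B)/\phi(B)$, obtained by matching coefficients in $\phi(B)\psi(B) = \theta(B)$. The helper names and the two models compared below are hypothetical and are not taken from the example in B.4. Polynomials are given as coefficient lists in ascending powers of $B$, so $1 - .5B$ is `[1.0, -0.5]`.

```python
def poly_mul(a, b):
    """Multiply two polynomials in B (ascending coefficient lists)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def psi_weights(ar_poly, ma_poly, n):
    """First n psi-weights of theta(B)/phi(B); both polynomials start with 1."""
    psi = []
    for j in range(n):
        val = ma_poly[j] if j < len(ma_poly) else 0.0
        for i in range(1, min(j, len(ar_poly) - 1) + 1):
            val -= ar_poly[i] * psi[j - i]
        psi.append(val)
    return psi


# Two hypothetical non-seasonal models of different form
model_a = psi_weights([1.0, -0.6], [1.0], 8)        # (1 - .6B) Y = a
model_b = psi_weights([1.0, -0.5], [1.0, 0.1], 8)   # (1 - .5B) Y = (1 + .1B) a
for j, (pa, pb) in enumerate(zip(model_a, model_b)):
    print(j, "%6.3f %6.3f" % (pa, pb))

# A multiplicative seasonal MA operator is expanded before use,
# e.g. (1 - .3B)(1 - .4B^4):
seasonal_ma = poly_mul([1.0, -0.3], [1.0, 0.0, 0.0, 0.0, -0.4])
```

Comparing the two printed columns shows how close the implied responses of two differently specified models are, which is one way to judge whether a difference between identified models matters for forecasting.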

REFERENCES

Bell, W.R. and Hillmer, S.C. (1983). Modeling Time Series with Calendar Variation. Journal of the American Statistical Association 78:

Box, G.E.P. and Jenkins, G.M. (1970). Time Series Analysis: Forecasting and Control. San Francisco: Holden Day.

Feige, E.L. and Pearce, D.K. (1979). The Casual Causal Relationship Between Money and Income: Some Caveats for Time Series Analysis. The Review of Economics and Statistics LXI:

Liu, L.-M. (1986). Identification of Time Series Models in the Presence of Calendar Variation. International Journal of Forecasting 2:

Liu, L.-M., Hudak, G., Box, G.E.P., Muller, M.E. and Tiao, G.C. (1986). The SCA Statistical System: Reference Manual for Forecasting and Time Series Analysis. Scientific Computing Associates, P.O. Box 625, DeKalb, Illinois

Roberts, H.V. (1974). Conversational Statistics. Palo Alto: The Scientific Press.

Tsay, R.S. and Tiao, G.C. (1984). Consistent Estimates of Autoregressive Parameters and Extended Sample Autocorrelation Function for Stationary and Nonstationary ARMA Models. Journal of the American Statistical Association 79:

Tsay, R.S. and Tiao, G.C. (1985). Use of Canonical Analysis in Time Series Model Identification. Biometrika 72:


CHAPTER 3

MULTIVARIATE TIME SERIES ANALYSIS AND FORECASTING USING SIMULTANEOUS TRANSFER FUNCTION MODELS

This chapter discusses a class of models that can be used in multivariate time series analysis and forecasting, as well as in a wide range of econometric time series studies. This class of models, known as simultaneous transfer function (STF) models (Liu and Hudak, 1985, 1986), allows a system of transfer function models to be estimated and forecasted jointly. A special case of STF models is the class of rational distributed lag structural-form (RSF) models discussed in Wall (1976) and Hanssens and Liu (1983). In its most general form, the STF model can be written as

$$\Phi(B)\,Y_t = C + \Gamma(B)\,X_t + N_t, \qquad t = 1, 2, \ldots, n \tag{1}$$

$$N_{it} = \frac{\theta_i(B)}{\phi_i(B)}\,a_{it}, \qquad i = 1, 2, \ldots, k$$

where $Y_t$ is a $k \times 1$ vector containing observations of k endogenous variables at time t, $X_t$ is an $m \times 1$ vector of exogenous (or predetermined) variables at time t, $a_t$ is a series of $k \times 1$ white noise vectors, which are independently and identically distributed as multivariate normal $N(0, \Sigma)$, and $C$ is a $k \times 1$ vector of constants for the equations. $\Phi(B)$ and $\Gamma(B)$ are matrix operators whose elements have the form $\omega(B)/\delta(B)$ (to be defined in (3)) for the endogenous and exogenous terms, respectively; $\theta_i(B)/\phi_i(B)$ is the ARMA noise process for the i-th equation; and $B$ is the backshift operator.

The lag 0 coefficient matrix of $\Phi(B)$, denoted by $\Phi_0$, represents the contemporaneous endogenous relationships in the STF model. In some applications, it is preferable to restrict this matrix to be an identity matrix, in which case the model in (1) is in reduced form.

The model described in (1) is a simultaneous system with k equations. Each equation in the system may be directly compared to the transfer function models described in Box and Jenkins (1970), hence the terminology "simultaneous transfer function model".
For the model in (1) to be estimable, it is assumed that the system in (1) is identifiable (Hannan 1971, Granger and Newbold 1977, Kohn 1979), that the noise series are stationary and invertible (i.e., the roots of the $\phi_i(B)$ and $\theta_i(B)$ polynomials lie outside the unit circle), and that the roots of $\Phi(B)$ lie outside the unit circle.

STF models can be employed in a wide range of applications. This class of models is useful in any area where simultaneous and/or transfer function modeling may be employed (e.g., forecasting, process control, and the study of variable relationships). In addition, it has been shown that forecasting using single-equation transfer function models with multiple inputs may be handled more conveniently in the framework of STF models (Liu and Hudak 1984). In this chapter, the application of STF models is demonstrated for modeling multiple time series, joint forecasting, and econometric analysis.

Without loss of generality, the discussion and illustrations in the first few sections of this chapter are limited to a simultaneous model with two endogenous variables, $Y_1$ and $Y_2$, and

two exogenous variables, $X_1$ and $X_2$. In each equation, the input variables may be known a priori, or may be determined empirically. In the following discussion, it is assumed that the input variables for $Y_1$ in the first equation are $X_1$ and $Y_2$, and the input variables for $Y_2$ in the second equation are $X_2$ and $Y_1$. The general form of such a simultaneous equation system can be expressed as:

$$\begin{aligned}
Y_{1t} &= C_1 + \beta_{11}(B)\,X_{1t} + \beta_{12}(B)\,Y_{2t} + N_{1t} \\
Y_{2t} &= C_2 + \beta_{21}(B)\,X_{2t} + \beta_{22}(B)\,Y_{1t} + N_{2t}
\end{aligned} \tag{2}$$

where the $\beta_{ij}(B)$'s are linear or rational polynomials in the backshift operator B. The dynamic regression coefficients $\beta_{ij}(B)$ are also referred to as the transfer functions between the corresponding input and output variables. In general, the transfer function may be expressed as $\omega(B)/\delta(B)$ with

$$\begin{aligned}
\omega(B) &= (\omega_0 + \omega_1 B + \cdots + \omega_{s-1} B^{s-1})\,B^b \\
\delta(B) &= 1 - \delta_1 B - \delta_2 B^2 - \cdots - \delta_r B^r,
\end{aligned} \tag{3}$$

where all roots of the $\delta(B)$ polynomial lie outside the unit circle.

In order to better understand the model in (2), consider the following two examples:

$$\begin{aligned}
Y_{1t} &= (0.6 - 0.4B)\,X_{1t} + \frac{0.7B}{1 - 0.6B}\,Y_{2t} + (1 - 0.6B)\,a_{1t} \\
Y_{2t} &= \frac{0.5B}{1 - 1.1B + 0.3B^2}\,X_{2t} - 0.5\,Y_{1t} + \frac{1}{1 - 0.7B}\,a_{2t}
\end{aligned} \tag{4}$$

and

$$\begin{aligned}
Y_{1t} &= (0.6 - 0.4B)\,X_{1t} + \frac{0.7B}{1 - 0.6B}\,Y_{2t} + (1 - 0.6B)\,a_{1t} \\
Y_{2t} &= \frac{0.5B}{1 - 1.1B + 0.3B^2}\,X_{2t} - 0.5B^2\,Y_{1t} + \frac{1}{1 - 0.7B}\,a_{2t}
\end{aligned} \tag{5}$$

The STF model in (4) shows a contemporaneous relationship between $Y_{1t}$ and $Y_{2t}$ in the equation for $Y_{2t}$, while (5) is an STF model without a contemporaneous endogenous relationship in the entire system. In both (4) and (5), the series $a_{1t}$ and $a_{2t}$ may be contemporaneously correlated.

The model described in (1) is very general and contains several important classes of models. In the single equation case, model (1) includes univariate ARIMA, distributed lag, and transfer function models. In the case of a multiple equation model, model (1) becomes a classical simultaneous equation model if the polynomials $\phi_i(B)$, $\theta_i(B)$ and $\delta(B)$ are all equal to 1. For example, a simplified version of the model in (4) may be

$$\begin{aligned}
Y_{1t} &= (0.6 - 0.4B)\,X_{1t} + (0.7B)\,Y_{2t} + a_{1t} \\
Y_{2t} &= (0.5B)\,X_{2t} - 0.5\,Y_{1t} + a_{2t}
\end{aligned} \tag{6}$$

or, in traditional notation,

$$\begin{aligned}
Y_{1t} &= 0.6\,X_{1t} - 0.4\,X_{1(t-1)} + 0.7\,Y_{2(t-1)} + a_{1t} \\
Y_{2t} &= 0.5\,X_{2(t-1)} - 0.5\,Y_{1t} + a_{2t}
\end{aligned} \tag{7}$$

In this chapter, we present SCA paragraphs that are useful for modeling and forecasting multivariate time series using STF models. These paragraphs include:

STEPAR -- performs stepwise autoregressive fitting for multivariate time series
CCM -- computes sample cross correlation matrices for multivariate time series
STFMODEL -- specifies or modifies an STF model
SESTIM -- estimates the parameters of a previously specified STF model
SFORECAST -- forecasts future values for multivariate time series based on a specific STF model
SSIMULATE -- generates time series according to a user specified STF model

The explanation of the syntax of these paragraphs will be interspersed with the descriptions of general STF modeling techniques. In addition, some capabilities will be illustrated by example using computer output (some computer output has been condensed for brevity). A complete description of the syntax for each paragraph can be found at the end of this chapter.

Missing Data

The SCA system has adopted the following convention if missing data are present in one or more series:

(a) For each series and the designated overall span of observations to be analyzed, the span of time indexes beginning with the first non-missing value through the value preceding the next missing value is determined. For example, if a series of 400 observations has missing data at time indices 1-20, at index 150, and at some later indices, then only the data from indices 21 through 149 will be used in the analysis of the series.

(b) The intersection of all these spans is determined. Computations and analyses are based only on data in this span of time indexes.
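The missing data convention in (a) and (b) can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text, not the SCA implementation: the function names `usable_span` and `common_span` are hypothetical, missing values are assumed to be coded as `None` or NaN, and the "designated overall span" is taken to be the whole series.

```python
import math


def _is_missing(value):
    """Assumed missing-value coding: None or a floating-point NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def usable_span(series):
    """Rule (a): 1-based (first, last) indices running from the first
    non-missing value through the value preceding the next missing value.
    Returns None when every value is missing."""
    start = None
    for index, value in enumerate(series, start=1):
        if start is None:
            if not _is_missing(value):
                start = index
        elif _is_missing(value):
            return (start, index - 1)
    return None if start is None else (start, len(series))


def common_span(all_series):
    """Rule (b): intersection of the usable spans of all series."""
    spans = [usable_span(s) for s in all_series]
    if any(span is None for span in spans):
        return None
    first = max(span[0] for span in spans)
    last = min(span[1] for span in spans)
    return (first, last) if first <= last else None


if __name__ == "__main__":
    # 400 observations, missing at indices 1-20, at 150, and (as an assumed,
    # purely illustrative later range) at 300-310.
    y = [1.0] * 400
    for i in list(range(1, 21)) + [150] + list(range(300, 311)):
        y[i - 1] = None
    x = [2.0] * 400
    print(usable_span(y))
    print(common_span([y, x]))
```

Under these assumptions the example series yields the span 21 through 149, matching the example in (a), and the intersection with a complete second series is the same span.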

3.1 Model Building Strategy for STF Models

The class of STF models described in (1) is extensive and may contain a large number of variables and parameters. At times the model building process consists only of the specification of a simultaneous equation system and the estimation of its parameters. At other times, a model is constructed by employing the information contained in the data. That is, given a set of vector time series $Y_t$ and $X_t$ of finite length, the aim is to find a model that contains as few parameters as possible and, at the same time, adequately represents the dynamic and stochastic relationships in the data at hand. In the latter case, the basic ideas in Box and Jenkins (1970) are extended to a three-phase iterative approach for building STF models: (i) tentative model identification, (ii) estimation, and (iii) diagnostic checking.

In general, tentative model identification is the most laborious phase of this model building process and relies heavily on the model builder's skill and judgement. Methods for tentative identification of STF models are discussed in Hanssens and Liu (1983), Liu and Hudak (1984), and Liu (1991). Details are described later in this chapter. After a model is tentatively identified for a simultaneous system, it can then be specified and estimated. A tentatively identified model may or may not be appropriate; therefore, it is important to check whether the statistics of the residual series are consonant with those of white noise processes. If not, the model should be modified using the information revealed in diagnostic checking. After an appropriate model is developed and estimated, the final model can be used for application.

In the next few sections, we shall use the Series M data set in Box and Jenkins (1970) to illustrate the modeling and forecasting techniques available in the SCA System.
In this data set, the output series (response) consists of sales data, and the input series (explanatory variable) is a leading indicator. There are 150 observations in each series. The data are displayed in Figure 1 and stored in the SCA workspace under the labels SALES and LEADING. In this section, we will use the first 126 observations only for model building and estimation. The last 24 observations are reserved to evaluate forecast performance, if desired.

Figure 1. Sales Data with Leading Indicator
[Figure: time series plots of the two series]

Identification of STF models

Since the presence of contemporaneous relationships in a system of equations increases the complexity of STF modeling, the first step in STF model identification is to examine such a possibility. This can be accomplished by examining the residual correlation matrices after fitting autoregressive models of increasing order to the vector time series, i.e., stepwise autoregressive fitting. Based on the presence or absence of correlations in the residual series, we can determine whether certain endogenous variables (dependent series) are contemporaneously correlated. We shall illustrate an example of stepwise AR fitting later in this section. Depending upon whether our interest is to build a reduced-form or a

structural-form model, certain cautions must be taken in order to avoid obtaining misleading models (Liu 1991).

Based on the results of stepwise autoregressive fitting, we can then proceed to identify the model for each equation, one by one. At this stage, we want to know which variables should be included in each equation and the forms of the transfer functions for the input variables in each equation. As shown in Liu (1991), the LTF method discussed in Chapter 8 of Volume 1 and Chapter 2 of this volume can be employed. More details on the use of the LTF method in the identification of STF models are given in Section 3.2 of this chapter. In this section, we first discuss the use of stepwise autoregressive fitting in STF modeling.

Stepwise autoregressive fitting

Using the vector $Z_t$ to represent the joined vector of $Y_t$ and $X_t$, the reduced-form model for the equation system described in (1) can be approximated by the following vector AR(p) model (where p is a sufficiently large number):

$$Z_t = C + \Phi_1 Z_{t-1} + \Phi_2 Z_{t-2} + \cdots + \Phi_p Z_{t-p} + \varepsilon_t. \tag{8}$$

For the STF model described in (1), if a pair of variables is contemporaneously related neither (a) explicitly through the model equations nor (b) implicitly through the covariance matrix $\Sigma$ of $a_t$ in (1), then the corresponding correlation of the residual series ($\varepsilon_t$) in the above fitted autoregressive model will be insignificant. In such a case, there is no need to include lag 0 for the related input variable when we use the LTF model for identification. Such information enables us to avoid potential bias in the identification of a structural-form model and greatly simplifies the task of model identification in such a situation. Since we usually do not know the exact order (p) to use in the vector AR model, a logical approach is to fit a sequence of autoregressive models.
The STEPAR paragraph in the SCA System is used to accomplish such a task. We can obtain a quick overview of stepwise fits through 6 lags for the sales-leading indicator data if we enter

-->STEPAR SALES,LEADING. DFORDER 1. ARFITS ARE 1 TO 6. OUTPUT PRINT(PHI).

Here the DFORDER sentence specifies the differencing order, and the required ARFITS sentence specifies the lags to include in the stepwise fits and the order in which they are added. More will be said on order specification later. The phrase "1 TO 6" is a notational shorthand for "1, 2, 3, 4, 5, 6". Stepwise fits will be made in that order. The following output is edited for brevity:

 DIFFERENCE ORDERS (1-B)
 TIME PERIOD ANALYZED TO 126
 EFFECTIVE NUMBER OF OBSERVATIONS (NOBE)

 SERIES NAME MEAN STD. ERROR
 1 SALES
 2 LEADING

 NOTE: THE APPROX. STD. ERROR FOR THE ESTIMATED CORRELATIONS BELOW IS (1/NOBE**.5) =

 SAMPLE CORRELATION MATRIX OF THE SERIES

 AUTOREGRESSIVE FITTING ON LAG(S) 1
 === PHI( 1) ===
 RESIDUAL COVARIANCE MATRIX S( 1) .213E E E-01
 RESIDUAL CORRELATION MATRIX RS( 1)

 AUTOREGRESSIVE FITTING ON LAG(S) 1 2
 === PHI( 1) ===
 === PHI( 2) ===
 RESIDUAL COVARIANCE MATRIX S( 2) .165E E E-01
 RESIDUAL CORRELATION MATRIX RS( 2)

 AUTOREGRESSIVE FITTING ON LAG(S)
 === PHI( 1) ===
 === PHI( 2) ===

 === PHI( 3) ===
 === PHI( 4) ===
 === PHI( 5) ===
 === PHI( 6) ===
 RESIDUAL COVARIANCE MATRIX S( 6) .564E E E-01
 RESIDUAL CORRELATION MATRIX RS( 6)

The initial portion of the output consists of a descriptive summary of the series. Information is then given on the correlation matrix of the series, followed by a short tabular display that summarizes key information from each successive AR fit and the correlation matrix of the residual series.

Based on the residual correlation matrices displayed above, we conclude that the series SALES and LEADING are not contemporaneously related. This may be verified by dividing the off-diagonal element(s) of the residual correlation matrix by the approximate standard error of the estimated correlations. Therefore it is not necessary to include lag 0 in the LTF models. More detail on the output of the STEPAR paragraph can be found in Chapter 4 of this manual.
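The idea behind stepwise autoregressive fitting can be sketched with ordinary least squares in Python. This is a rough illustration assuming NumPy is available; it is not the algorithm used by the STEPAR paragraph, and the function names and the simulated data are hypothetical. For each order p it fits a vector AR(p) model as in (8) and reports the off-diagonal residual correlation divided by the approximate standard error 1/sqrt(NOBE).

```python
import numpy as np


def var_residual_correlation(z, p):
    """Fit Z_t = C + Phi_1 Z_(t-1) + ... + Phi_p Z_(t-p) + e_t by least squares.

    Returns the residual correlation matrix and the effective number of
    observations (rows of z minus p)."""
    z = np.asarray(z, dtype=float)
    n, _ = z.shape
    y = z[p:]                                            # responses Z_t
    lagged = [z[p - j:n - j] for j in range(1, p + 1)]   # Z_(t-1), ..., Z_(t-p)
    x = np.hstack([np.ones((n - p, 1))] + lagged)        # constant plus lags
    coef, *_ = np.linalg.lstsq(x, y, rcond=None)
    resid = y - x @ coef
    cov = resid.T @ resid / (n - p)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd), n - p


def stepwise_ar(z, max_order, difference=True):
    """Residual correlation check for AR orders 1..max_order."""
    z = np.asarray(z, dtype=float)
    if difference:                        # mirrors a first-order differencing
        z = np.diff(z, axis=0)
    results = []
    for p in range(1, max_order + 1):
        corr, nobe = var_residual_correlation(z, p)
        ratio = corr[0, 1] * np.sqrt(nobe)   # correlation / (1/sqrt(NOBE))
        results.append((p, corr[0, 1], ratio))
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated stand-in data: the second series leads the first by 3 periods.
    lead = np.cumsum(rng.standard_normal(150))
    sales = np.cumsum(rng.standard_normal(150))
    sales[3:] += 2.0 * lead[:-3]
    for p, corr, ratio in stepwise_ar(np.column_stack([sales, lead]), 6):
        print(f"AR({p}): residual corr = {corr: .3f}, corr / std.err = {ratio: .2f}")
```

A ratio well inside plus or minus 2 at the higher orders would be read, as in the text, as no evidence of a contemporaneous relationship between the two series.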

In the above computer display, the values of the autoregression matrices can be summarized by assigning the indicator symbol '+' when an element is greater than two times its estimated standard error, the symbol '-' for values less than minus two standard errors, and '.' for values in between (i.e., insignificant).

Altering the order for stepwise AR fits

In the STEPAR paragraph shown above, we sequentially fit vector AR(1), AR(2), AR(3), AR(4), AR(5) and AR(6) models to our data. However, if it is clear that the series are seasonal (e.g., a seasonality of 12 periods), then we may want to fit the seasonal component first, and then the others. To illustrate this capability, we re-execute the STEPAR paragraph and change the order in which lags are added to the model. Below is an example:

-->STEPAR SALES, LEADING. ARFITS ARE 12,1,2,3,4,5,

Specification of an STF model

Once an STF model is identified, or partially identified, it may be transmitted to the SCA system for analysis. It is likely that the model will be modified during the iterative model building process. During each stage of the process, the following items need to be considered in the specification of an STF model:

(1) the number of equations in the model;
(2) the input variables to be included in each equation;
(3) the form of the transfer function for each input series in an equation;
(4) the form of the noise process for each equation; and
(5) the constraints, if any, on parameters.

The SCA system uses a two-stage procedure to specify an STF model. First, a paragraph is used to specify the model for each equation. This is the TSMODEL paragraph described in Chapter 8 of Volume 1.
The use of this paragraph is identical to that in single-equation transfer function modeling, except that the FIXED-PARAMETER and CONSTRAINT sentences are only pertinent to univariate estimation (ESTIM) and not to simultaneous estimation (SESTIM, to be discussed in Section 3.1.3). In the second stage of STF model specification, the STFMODEL paragraph is used to specify the names of the equations (specified previously by TSMODEL paragraphs) composing the STF model, as well as other characteristics of the model, such as fixed parameters, parameter constraints, covariance structure, and definitional equations. The FIXED-PARAMETER and CONSTRAINT sentences in the STFMODEL paragraph are only pertinent to simultaneous estimation (SESTIM), and not to univariate estimation (ESTIM).

Constraints on parameters

During the estimation process of an STF model, the parameters in the model may be:

(1) varied freely;
(2) held fixed at a specified value; or
(3) constrained to be equal to other parameters.

Case (1) is the default condition. The latter cases are accommodated in the same manner as for transfer function models. Parameters may be fixed to a specific value or constrained to be equal to other parameters using the FIXED-PARAMETER or CONSTRAINT sentences in the STFMODEL paragraph. The syntax for the FIXED-PARAMETER and CONSTRAINT sentences is the same as in the TSMODEL paragraph. More details can be found in the full description of the STFMODEL syntax.

Covariance matrix

The covariance matrix of the noise series may be calculated from the residual series derived from the data using the specified or estimated parameter values. This matrix may also be specified by the user in the STFMODEL paragraph. In certain instances, dependencies between series may be specified in order to reflect the known or postulated independence of one or more series from the others. In these cases, the covariance matrix may be constrained to have zeros in certain off-diagonal elements. For example, if the vector series $Z_t$ consists of the component series $Y_{1t}$, $Y_{2t}$, $Y_{3t}$, $Y_{4t}$ and $Y_{5t}$, with $Y_{1t}$, $Y_{2t}$ and $Y_{3t}$ each independent of $Y_{4t}$ and $Y_{5t}$, then the covariance matrix can be written as a partitioned matrix that is block diagonal. This is reflected by the specification

DEPENDENCY IS (Y1,Y2,Y3), (Y4,Y5).

where Y1, Y2, Y3, Y4, and Y5 are assumed to be the names used to store the data of the series in the SCA workspace.

The use of the DEPENDENCY sentence can be important in the analysis of an under-identified system of equations with contemporaneous relationships. In some situations, the system of equations cannot be estimated unless the noise terms of certain equations are specified as independent.
For example, in the following two-equation system,

$$\begin{aligned}
Y_t &= \beta_1 X_t + (1 - \theta_1 B)\,a_{1t} \\
X_t &= \frac{1}{1 - \phi_2 B}\,a_{2t},
\end{aligned} \tag{9}$$

the noise series $a_{1t}$ and $a_{2t}$ must be assumed to be independent if joint model estimation is to be performed. However, if $a_{1t}$ and $a_{2t}$ are independent, then joint model estimation will not improve the efficiency of the parameter estimates of this model. Joint estimation is not necessary in such a situation.
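The block-diagonal structure implied by a dependency specification can be illustrated with a short Python sketch, assuming NumPy. It only shows the pattern of zero constraints on the covariance matrix; the function names are hypothetical and nothing here reproduces how the SCA System handles the DEPENDENCY sentence internally.

```python
import numpy as np


def dependency_mask(names, groups):
    """Boolean matrix that is True where a covariance element may be non-zero.

    names  : series names in the order used for the covariance matrix
    groups : iterable of groups of names; series in different groups are
             treated as independent (covariance fixed at zero)."""
    index = {name: i for i, name in enumerate(names)}
    mask = np.eye(len(names), dtype=bool)      # variances are always free
    for group in groups:
        ids = [index[name] for name in group]
        mask[np.ix_(ids, ids)] = True          # free block for this group
    return mask


def constrain_covariance(cov, mask):
    """Set the covariance elements outside the allowed blocks to zero."""
    return np.where(mask, np.asarray(cov, dtype=float), 0.0)


if __name__ == "__main__":
    names = ["Y1", "Y2", "Y3", "Y4", "Y5"]
    mask = dependency_mask(names, [("Y1", "Y2", "Y3"), ("Y4", "Y5")])
    print(mask.astype(int))
    # Arbitrary illustrative covariance matrix (all elements 0.5, variances 1).
    cov = np.full((5, 5), 0.5) + 0.5 * np.eye(5)
    print(constrain_covariance(cov, mask))
```

For the five-series example in the text, the mask has one 3 x 3 block and one 2 x 2 block on the diagonal, and every element linking the two groups is zero.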

An illustration

Using the LTF method described in Chapter 2 (and also later in this chapter), we obtain the following models for the SALES and LEADING series:

$$\begin{aligned}
(1 - B)\,\mathrm{SALES}_t &= C_1 + \frac{\omega_3 B^3}{1 - \delta_1 B}\,(1 - B)\,\mathrm{LEADING}_t + (1 - \theta_1 B)\,a_{1t} \\
(1 - B)\,\mathrm{LEADING}_t &= (1 - \theta_2 B)\,a_{2t}
\end{aligned} \tag{10}$$

Before we perform a joint estimation of the above model, we may first estimate each equation separately. This increases the efficiency of STF model estimation in terms of computation time and convergence of the parameter estimates. Note that the equation for SALES_t includes a denominator term. When an equation includes a denominator term, it is necessary that the numerator terms have reasonable initial values. In this example, we use 4.80 as the initial value for the numerator term, W3. In practice, it is highly recommended that the model first be estimated without the denominator term to get appropriate initial values for the numerator. Once initial values are obtained for the numerator, the denominator term may be added to the model and the model re-estimated. Bad initial values may lead to a division-by-zero error during nonlinear estimation.

-->W3=4.80
-->TSMODEL SALESMDL. NO
--> MODEL IS SALES(1)=C1+(W3*B**3)/(1-D1*B)LEADING(1)+(1-TH1*B)NOISE.
-->ESTIM SALESMDL. HOLD RESIDUALS(R1). OUTPUT LEVEL(BRIEF).

 THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 126

 NONLINEAR ESTIMATION TERMINATED DUE TO:
 RELATIVE CHANGE IN (OBJECTIVE FUNCTION)**0.5 LESS THAN .1000D-02

 SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- SALESMDL

 VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
           VARIABLE  OR CENTERED
 SALES     RANDOM    ORIGINAL     (1-B)
 LEADING   RANDOM    ORIGINAL     (1-B)

 PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
 LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
 1 C1                 CNST    1       0      NONE
 2 W3       LEADING   NUM.    1       3      NONE
 3 D1       LEADING   DENM    1       1      NONE

 4 TH1      SALES     MA      1       1      NONE

 EFFECTIVE NUMBER OF OBSERVATIONS
 R-SQUARE
 RESIDUAL STANDARD ERROR E

-->TSMODEL LEADMDL. NO
--> MODEL IS LEADING(1)=(1-TH2*B)NOISE.
-->ESTIM LEADMDL. HOLD RESIDUALS(R2). OUTPUT LEVEL(BRIEF).

 THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 126

 NONLINEAR ESTIMATION TERMINATED DUE TO:
 RELATIVE CHANGE IN (OBJECTIVE FUNCTION)**0.5 LESS THAN .1000D-02

 SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- LEADMDL

 VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
           VARIABLE  OR CENTERED
 LEADING   RANDOM    ORIGINAL     (1-B)

 PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
 LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
 1 TH2      LEADING   MA      1       1      NONE

 EFFECTIVE NUMBER OF OBSERVATIONS
 R-SQUARE
 RESIDUAL STANDARD ERROR E

Now that the individual equations (SALESMDL and LEADMDL) have been specified and estimated, we have good initial values for the model parameters and are ready to put these equations together as a simultaneous transfer function model. The syntax is specified below.

-->STFMODEL JOINTMDL. MODELS ARE SALESMDL,LEADMDL. NO SHOW.

Estimation of an STF model

Once an STF model is tentatively identified and specified, the parameters and the covariance matrix $\Sigma$ can be estimated by maximizing the corresponding likelihood function. Maximum likelihood estimation for STF models is usually referred to as full-information maximum likelihood (FIML) estimation. Extensive literature exists on the properties of FIML estimates (see, for example, Sargan 1961, Fair 1970, Hendry 1971, Chow and Fair 1973, Zellner 1979, Palm and Zellner 1980, Reinsel 1979). When there are no definitional equations (see Section 3.4.2), Wall (1976) derived the following likelihood function for model (1),

$$L(\eta) \propto |\Phi_0|^{n}\,|\Sigma|^{-n/2} \exp\Big\{ -0.5 \sum_{t=1}^{n} a_t' \Sigma^{-1} a_t \Big\} \tag{11}$$

where $\Phi_0$ is the lag 0 structure matrix that contains the contemporaneous endogenous terms, and $\eta$ is a vector that contains all unknown model parameters to be estimated. When the model contains definitional equations, it is possible to eliminate these exact equations via substitution in order to obtain a system of k equations. The likelihood function in (11) can then be used exactly as it stands. This otherwise tedious task of substitution is performed by the computer program automatically, as long as the definitional equations are correctly specified.

When MA parameters are present in an equation, the starting values for $a_t$ can be either assumed to be 0 or estimated using the likelihood function. The current version adopts the conditional algorithm (i.e., the unknown $a_t$'s are assumed to be 0). The numerical minimization of the negative log likelihood function with respect to $\eta$ is performed using a variant of the Gauss-Marquardt method (MACC, 1965). The SCA command to perform FIML estimation on JOINTMDL is specified below. The output follows.

-->SESTIM JOINTMDL. HOLD RESIDUALS(R1,R2). OUTPUT LEVEL(BRIEF).

 THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 126

 LOG LIKELIHOOD AT INITIAL ESTIMATES =

 ITERATION TERMINATED DUE TO:
 RELATIVE CHANGE IN (COVARIANCE DETERMINANT)**0.5 LESS THAN .1000D-03

 TOTAL NUMBER OF ITERATIONS
 THE LOG LIKELIHOOD VALUE D+03
 RELATIVE CHANGE IN (LOG LIKELIHOOD)** D-05
 MAXIMUM RELATIVE CHANGE IN THE ESTIMATES D-01
 RELATIVE CHANGE IN (COVARIANCE DETERMINANT)** D-04

 SUMMARY FOR SIMULTANEOUS TRANSFER FUNCTION MODEL -- JOINTMDL

 MODEL SUMMARY FOR EQUATION 1 -- SALESMDL

 VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
           VARIABLE  OR CENTERED
 SALES     RANDOM    ORIGINAL     (1-B)
 LEADING   RANDOM    ORIGINAL     (1-B)

 PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
 LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
 1 C1                 CNST    1       0      NONE
 2 W3       LEADING   NUM.    1       3      NONE
 3 D1       LEADING   DENM    1       1      NONE
 4 TH1      SALES     MA      1       1      NONE

 MODEL SUMMARY FOR EQUATION 2 -- LEADMDL

 VARIABLE  TYPE OF   ORIGINAL     DIFFERENCING
           VARIABLE  OR CENTERED
 LEADING   RANDOM    ORIGINAL     (1-B)

 PARAMETER  VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
 LABEL      NAME      DENOM.                 TRAINT         ERROR  VALUE
 5 TH2      LEADING   MA      1       1      NONE

 ERROR COVARIANCE MATRIX

 ERROR CORRELATION MATRIX

 THE FOLLOWING SUM OF SQUARES ARE BASED ON 116 OBSERVATIONS

 EQ  VARIABLE  TOTAL  RESIDUAL  R-SQUARE
 1   SALES     E      E
 2   LEADING   E      E

Diagnostic checking

Once the parameters are estimated, various diagnostic checks should be performed on the estimated residual series to determine the adequacy of the model fit and to search for directions of improvement, if necessary. Useful methods include plotting the individual residual series against time to spot possible outliers, and printing the cross correlation matrices (CCM) of the residual series to determine whether they are consonant with those of a white noise vector process.

The cross correlation matrices of a vector time series combine the ACF of all series and the CCF of all pairs of series, displayed in matrix form. In the case of a vector comprised of two series, Y and X, the CCM for lag l is

$$\begin{bmatrix} \rho_{11} & \rho_{12} \\ \rho_{21} & \rho_{22} \end{bmatrix}, \tag{12}$$

where $\rho_{11}$ and $\rho_{22}$ are the lag l values of the ACF for Y and X, respectively; $\rho_{12}$ is the lag l value of the CCF when X leads Y; and $\rho_{21}$ is the lag l value of the CCF when Y leads X. The

values of the CCM for more series have a similar interpretation. In this way, the values associated with the same lag of all ACF and CCF are presented jointly in a compact form.

Since we are interested in spotting non-zero values of the CCM, a display of the actual values may be of less interest to us than whether each value is significantly different from zero. Just as a plot of the ACF is more useful than a listing of its values, it is more beneficial to condense the information in the CCM further, so that significance and insignificance are expressed in a visually striking way. Following Tiao and Box (1981), an effective summary of the pattern of the correlation structure is provided if indicator symbols are used to replace the numerical values of the CCM. These symbols are (+, -, .), where the symbol '+' represents a positively significant value, the symbol '-' represents a negatively significant value, and the symbol '.' represents an insignificant value. The criterion for the significance of a value in the CCM is based on the work of Bartlett (1946). More complete information on the mathematical representation of the CCM and the criterion for significance can be found in Tiao and Box (1981), and Wei (1990).

We can compute and display the CCM for the residual series of the estimated model shown above if we enter

-->CCM R1,R2. MAXLAG IS 12.

We limit the number of lags to be computed by including the MAXLAG sentence (the default limit is 24 lags). We obtain the following.

 TIME PERIOD ANALYZED TO 126
 EFFECTIVE NUMBER OF OBSERVATIONS (NOBE)

 SERIES NAME MEAN STD. ERROR
 1 R1
 2 R2

 NOTE: THE APPROX. STD. ERROR FOR THE ESTIMATED CORRELATIONS BELOW IS (1/NOBE**.5) =

 SAMPLE CORRELATION MATRIX OF THE SERIES

 SUMMARIES OF CROSS CORRELATION MATRICES USING +,-,., WHERE
 + DENOTES A VALUE GREATER THAN 2/SQRT(NOBE)
 - DENOTES A VALUE LESS THAN -2/SQRT(NOBE)
 . DENOTES A NON-SIGNIFICANT VALUE BASED ON THE ABOVE CRITERION

 BEHAVIOR OF VALUES IN (I,J)TH POSITION OF CROSS CORRELATION MATRIX
 OVER ALL OUTPUTTED LAGS WHEN SERIES J LEADS SERIES I

 CROSS CORRELATION MATRICES IN TERMS OF +,-,.
 LAGS 1 THROUGH
 LAGS 7 THROUGH

The above output contains summary information for each series and the cross correlation matrices in terms of the (+, -, .) symbols. The criterion used to designate a symbol is also provided. The cross correlation information is provided in two forms. We have already explained the $k \times k$ matrix form. In addition, the information is also provided within one matrix of symbols. The (i,j) component of this matrix summarizes the information related to that element for all lags. Hence the diagonal elements summarize the significance of the ACF for each series, and each off-diagonal element summarizes the significance of the cross correlations between two series. Based on the above CCM for the residual series, we find that the model we obtained is appropriate for application.
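The (+, -, .) summary of the cross correlation matrices can be mimicked with a short Python sketch, assuming NumPy. It follows the rule quoted in the output (a value is flagged when it exceeds 2/sqrt(NOBE) in absolute value) and the convention that element (i,j) refers to series j leading series i; the function names and the simulated residual-like data are hypothetical, and the sketch is not the CCM paragraph's own algorithm.

```python
import numpy as np


def cross_correlation_matrix(z, lag):
    """Sample cross correlation matrix at a non-negative lag.

    Element (i, j) correlates series i at time t with series j at time
    t - lag, i.e. series j leads series i."""
    z = np.asarray(z, dtype=float)
    n = z.shape[0]
    d = z - z.mean(axis=0)
    scale = np.sqrt((d ** 2).sum(axis=0))
    cross = d[lag:].T @ d[:n - lag]
    return cross / np.outer(scale, scale)


def indicator_symbols(ccm, nobe):
    """Replace each correlation by '+', '-' or '.' using +/- 2/sqrt(NOBE)."""
    limit = 2.0 / np.sqrt(nobe)
    return np.where(ccm > limit, "+", np.where(ccm < -limit, "-", "."))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulated stand-in for two residual series (white noise by construction).
    resid = rng.standard_normal((116, 2))
    for lag in range(1, 7):
        symbols = indicator_symbols(cross_correlation_matrix(resid, lag), len(resid))
        print(f"lag {lag}:", " / ".join(" ".join(row) for row in symbols))
```

For an adequate model most symbols should be '.', with only an occasional '+' or '-' arising by chance.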

Forecasting future observations

When an STF model is used for forecasting, it is necessary to provide additional information regarding the exogenous variables in the model. In the case of a non-stochastic exogenous variable, the series cannot be forecasted and the user must supply all observations through the forecasting periods for that variable. Since such information is completely predetermined, the related variable will not contribute stochastic variation to the forecasts. In the case of stochastic exogenous variables, the user may specify a sequence of individual and joint models. Such models may be univariate ARIMA, transfer function, or other STF models. Once the model(s) for the exogenous variables are specified, the values required in forecasting the endogenous variables can be computed. These values are then used in computing the forecasts of the endogenous variables.

In some situations, the user may prefer to use modified forecasts of the exogenous variables but still use the specified model for computing the variances of the forecasts. This can be accomplished by appending the user's forecasts at the end of the exogenous variables. The computation of forecasts and their standard errors is discussed in Liu and Hudak (1984).

The SFORECAST paragraph is used to compute the forecasts of future values of a vector time series based on a specified STF model. The SFORECAST paragraph requires the value of the estimated covariance matrix $\Sigma$ in order to compute standard errors of forecasts. This matrix represents the current estimated value of $\Sigma$. The current value of the matrix is held internally after estimation and is rewritten after each estimation. Thus, if $\Sigma$ is not stored in a user specified variable, forecasts should be made for a "final" model before any new model is estimated. Listed below is the use of the SFORECAST paragraph and its computer output for the sales-leading indicator example.

-->SFORECAST JOINTMDL. NOFS IS

 FORECASTS, BEGINNING AT ORIGIN =

 SERIES:      SALES               LEADING
 TIME    FORECAST  STD ERR   FORECAST  STD ERR

ERROR COVARIANCE MATRIX

Further analysis of the example

Since the covariance matrix Σ does not show correlation between the residual series, the two equations can in fact be estimated separately. Alternatively, we can still use the STFMODEL paragraph, but specify that the residual series are not correlated. This can be accomplished by adding the sentence DEPENDENCY NONE to the STFMODEL paragraph.

-->STFMODEL JOINTMDL. MODELS ARE SALESMDL,LEADMDL.
--> DEPENDENCY NONE. NO SHOW.
-->SESTIM JOINTMDL. HOLD RESIDUALS(R1,R2). OUTPUT LEVEL(BRIEF).
THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 126 LOG LIKELIHOOD AT INITIAL ESTIMATES = ITERATION TERMINATED DUE TO: RELATIVE CHANGE IN (COVARIANCE DETERMINANT)**0.5 LESS THAN.1000D-03 TOTAL NUMBER OF ITERATIONS THE LOG LIKELIHOOD VALUE D+03 RELATIVE CHANGE IN (LOG LIKELIHOOD)** D-05 MAXIMUM RELATIVE CHANGE IN THE ESTIMATES D-02 RELATIVE CHANGE IN (COVARIANCE DETERMINANT)** D-04 SUMMARY FOR SIMULTANEOUS TRANSFER FUNCTION MODEL -- JOINTMDL MODEL SUMMARY FOR EQUATION 1 -- SALESMDL VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED 1 SALES RANDOM ORIGINAL (1-B ) 1 LEADING RANDOM ORIGINAL (1-B ) PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE 1 C1 CNST 1 0 NONE W3 LEADING NUM. 1 3 NONE D1 LEADING DENM 1 1 NONE TH1 SALES MA 1 1 NONE MODEL SUMMARY FOR EQUATION 2 -- LEADMDL VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED

84 MULTIVARIATE TIME SERIES USING STF MODELS LEADING RANDOM ORIGINAL (1-B ) PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE 5 TH2 LEADING MA 1 1 NONE ERROR COVARIANCE MATRIX ERROR CORRELATION MATRIX THE FOLLOWING SUM OF SQUARES ARE BASED ON 116 OBSERVATIONS EQ VARIABLE TOTAL RESIDUAL R-SQUARE 1 SALES E E LEADING E E >SFORECAST JOINTMDL. NOFS IS FORECASTS, BEGINNING AT ORIGIN = SERIES: SALES LEADING TIME FORECAST STD ERR FORECAST STD ERR ERROR COVARIANCE MATRIX

In the above SFORECAST paragraph, the off-diagonal elements of the error covariance matrix are constrained to be zero.

3.2 Identification of STF Models

Simultaneous transfer function models may be used (1) for forecasting, and (2) for understanding and interpreting the inter-relationships among variables in a system. The latter application is also known as structural analysis. Traditionally, structural-form models (i.e., models that allow for the inclusion of contemporaneous relationships between the input variables and the output variable) are used for both forecasting and structural analysis. A number of difficulties in the identification and estimation of structural-form models have been extensively discussed in the econometric literature. Liu (1991) suggested that if the primary interest of STF modeling is forecasting, a reduced-form model may be preferable to a structural-form model. A reduced-form model does not allow for contemporaneous relationships in a model unless an input variable is certain to be exogenous. With this restriction, the use of a reduced-form model avoids a number of difficulties that commonly occur in structural-form modeling.

Both reduced-form and structural-form models may consist of a system of equations. Using the approach discussed in this section, we can identify the model in the system one equation at a time. This flexibility allows us to focus on a partial system (e.g., just one or a few equations) depending upon our interest and application. It is also useful to note that if a structural-form equation does not contain a contemporaneous endogenous variable as an input variable, then the structural form of the equation is identical to its reduced form. The model developed for the sales-leading indicator data in the previous section is such an example.
With this in mind, it is important to perform stepwise autoregressive fitting and examine the correlation matrices for the residual series of each autoregressive fit. If we can conclude that some endogenous variables are not contemporaneously related, the task of model identification may be simplified.

Transfer function models

For notational simplicity, we shall further simplify the model presented in (2) and consider a simultaneous equation system with just two variables, $Y_t$ and $Z_t$, where $Y_t$ and $Z_t$ may be inter-related and both can be endogenous variables. Assuming both $Y_t$ and $Z_t$ are stationary, the general form for the transfer function model of $Y_t$ can be expressed as

$$Y_t = C + \omega(B) Z_t + N_t, \quad t = 1, 2, \ldots, n \qquad (13)$$

or

$$Y_t = C + \frac{\omega(B)}{\delta(B)} Z_t + N_t, \quad t = 1, 2, \ldots, n \qquad (14)$$

where $C$ is a constant term, and $N_t$ is the disturbance term, which follows a stationary ARMA process. The polynomials $\omega(B)$ and $\delta(B)$ can be generally expressed as

$$\omega(B) = \omega_0 + \omega_1 B + \omega_2 B^2 + \cdots + \omega_g B^g, \quad \text{and} \qquad (15)$$

$$\delta(B) = 1 - \delta_1 B - \cdots - \delta_r B^r. \qquad (16)$$

We refer to the model in (13) as a linear transfer function (LTF) model (which implies $\delta(B) = 1$), and the model in (14) as a rational transfer function model (which implies $\delta(B) \neq 1$). It is also important to note that the parameter $\omega_0$ in (15) is constrained to be zero (i.e., cannot be present in the $\omega(B)$ polynomial) if the model equation is in reduced form.

Similar to (13) and (14), we may consider a transfer function model for $Z_t$ dependent on $Y_t$, which may be expressed as

$$Z_t = C' + \omega'(B) Y_t + N'_t, \quad t = 1, 2, \ldots, n \qquad (17)$$

or

$$Z_t = C' + \frac{\omega'(B)}{\delta'(B)} Y_t + N'_t, \quad t = 1, 2, \ldots, n. \qquad (18)$$

A key task is the identification of the exact forms of $\omega(B)$, $\delta(B)$, $\omega'(B)$, and $\delta'(B)$. We shall employ the LTF method, described below, to address this task.

The LTF method

An effective way to identify a transfer function equation is to employ the linear transfer function (LTF) method. The LTF method follows an approach proposed by Liu and Hanssens (1982) and is detailed in Liu and Hudak (1985), Liu (1986, 1987), Pankratz (1991), and particularly in Liu et al. (1992). This method is effective for both non-seasonal and seasonal time series, and for both reduced-form and structural-form models. Furthermore, it is easy to use, flexible, and easy to understand. More information regarding the LTF method can be found in Chapter 8 of Forecasting and Time Series Analysis Using the SCA Statistical System: Volume 1 (Liu et al. 1992).

Consider the transfer function equations described in (13) through (16). The LTF method employs the following linear transfer function model for the identification of a structural-form equation:

$$Y_t = C + (v_0 + v_1 B + v_2 B^2 + \cdots + v_k B^k) Z_t + N_t \qquad (19)$$

where $k$ is a lag order for $Z_t$ chosen by the user based on the subject matter, and $N_t$ is the disturbance term (to be discussed later).
For the identification of a reduced-form equation, the following model structure is employed:

$$Y_t = C + (v_1 B + v_2 B^2 + \cdots + v_k B^k) Z_t + N_t \qquad (20)$$

The key difference between (19) and (20) is that (20) excludes a potential contemporaneous relationship between $Z_t$ and $Y_t$, whereas (19) allows for one. If $Z_t$ is an exogenous variable, the model structures in (19) and (20) typically lead to the same model if $Z_t$ and $Y_t$ are not contemporaneously related. This is particularly true if the exogenous variable is a pre-determined non-stochastic time series. However, if $Z_t$ is also an endogenous variable, the transfer function weight estimates based on (19) can be seriously biased, and can give rather misleading results even if $Z_t$ and $Y_t$ are not contemporaneously related. Therefore it is preferable to employ a reduced-form model unless the exogeneity of an input variable is definite.

The disturbance term in (19) and (20) can be effectively approximated by simple autoregressive models for the purpose of model identification. If the output variable is nonseasonal, the disturbance term may be approximated by

$$N_t = \frac{1}{1 - \phi_1 B} a_t. \qquad (21)$$

In the case of a seasonal output variable (with seasonality $s$), an initial approximation for the disturbance term may be

$$N_t = \frac{1}{(1 - \phi_1 B)(1 - \Phi_1 B^s)} a_t. \qquad (22)$$

The combined use of a linear transfer function with an autoregressive disturbance term provides some unique advantages. They include:

(1) Obtaining efficient estimates of transfer function weights. Based on these weights, we can determine if a linear or a rational transfer function is needed for each input variable.

(2) Obtaining an estimated disturbance series $\hat{N}_t$. The model for $\hat{N}_t$ can then be easily identified by SCA's IARIMA (automatic modeling capability) paragraph.

(3) Providing information on differencing. If either $\phi_1$ or $\Phi_1$ in (21) and (22) is close to 1, then it is necessary to perform appropriate differencing(s) on all variables in the model. This topic will be further discussed below.
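As an illustration of how the estimated weights can point to a linear or a rational form, the following Python sketch expands a ratio of polynomials written as in (15) and (16) into its lag weights. It is not part of the SCA System: the function name `impulse_weights` and all parameter values are hypothetical. Under the usual reading, weights that cut off after a few lags are consistent with a linear transfer function, while weights that die out gradually suggest a denominator polynomial.

```python
def impulse_weights(omega, delta, n_weights):
    """Expand v(B) = omega(B) / delta(B) into its first n_weights coefficients.

    omega: [w0, w1, ..., wg] for omega(B) = w0 + w1*B + ... + wg*B**g   (form of eq. 15)
    delta: [d1, ..., dr]     for delta(B) = 1 - d1*B - ... - dr*B**r   (form of eq. 16)
    """
    v = []
    for j in range(n_weights):
        w_j = omega[j] if j < len(omega) else 0.0
        # delta(B) * v(B) = omega(B) implies v_j = w_j + sum_i d_i * v_(j-i)
        feedback = sum(d * v[j - i] for i, d in enumerate(delta, start=1) if j - i >= 0)
        v.append(w_j + feedback)
    return v


# Hypothetical rational form: a response starting at lag 3 that decays by 0.5 per lag
rational = impulse_weights([0.0, 0.0, 0.0, 2.0], [0.5], 8)

# Hypothetical linear form: no denominator, so the weights stop after lag 2
linear = impulse_weights([0.0, 1.0, 0.7], [], 5)

print(rational)
print(linear)
```

With these made-up inputs, the first set of weights halves at each lag after lag 3, and the second set is zero beyond lag 2.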
Differencings

The models we discussed assume that both $Y_t$ and $Z_t$ are stationary. In most real-life applications, $Y_t$ and $Z_t$ may not satisfy this assumption. As in ARIMA modeling, the determination of differencing orders is a key aspect of transfer function modeling. We may examine the values of $\phi_1$ and $\Phi_1$ in the autoregressive term (to see if they are close to 1) to determine if a regular (i.e., $(1-B)$) and/or a seasonal (i.e., $(1-B^s)$) differencing is necessary. However, it is important to note that the estimates of $\phi_1$ and $\Phi_1$ can be seriously biased if a nonseasonal or seasonal MA parameter is required in the disturbance term. This is particularly serious for the seasonal AR(1) estimate. An easy way to overcome this difficulty is to employ the IARIMA paragraph to automatically identify an ARIMA model for the estimated disturbance series. If the identified model for the disturbance series requires differencing, or if the model contains a non-seasonal or seasonal AR(1) polynomial with the parameter estimate(s) close to 1, then appropriate differencing(s) is necessary.

An illustrated example

In this section, we employ a set of data consisting of monthly shipments and new orders of durable goods in the United States between January 1958 and December 1974 (Hillmer 1976, and Liu 1987) to illustrate the application of the LTF method in both reduced-form and structural-form model building. These series are displayed in Figure 2. Following the analysis in Liu (1987), we shall only consider the data between January 1958 and December 1972 (i.e., the first 180 observations) for model building. The last 24 observations may be used for post-sample forecasting comparison (see Liu 1987) if desired. Both series are log transformed, and stored in the SCA workspace as SHIPMENT and NEWORDER respectively.
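The regular and seasonal differencing operators discussed above can be sketched in a few lines of Python. This is only an illustration: the helper `difference` and the artificial series `y` are hypothetical, and the series merely has the same length (180) as the modeling sample. It shows that each differencing shortens the series by its lag, and that a seasonal difference of order 12 removes a pattern that repeats every 12 periods.

```python
def difference(series, lag=1):
    """Apply (1 - B**lag) to a list of numbers; the result is `lag` values shorter."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical monthly series: a linear trend plus a pattern repeating every 12 periods
season = [0, 1, 3, 2, 0, -1, -3, -2, 0, 1, 2, -3]
y = [0.5 * t + season[t % 12] for t in range(180)]

seasonal = difference(y, lag=12)     # (1 - B^12) y_t
both = difference(seasonal, lag=1)   # (1 - B)(1 - B^12) y_t

print(len(seasonal), len(both))
```

For this artificial series, the seasonal difference leaves only the constant contribution of the trend, and the additional regular difference removes that as well.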

89 3.24 MULTIVARIATE TIME SERIES USING STF MODELS In this example, we first check if SHIPMENT and NEWORDER could be contemporaneously correlated. This is accomplished by employing the STEPAR paragraph. In this STEPAR analysis, we include a seasonal differencing since both series are highly seasonal (as revealed in Figure 2). Also a vector AR(12) term is fitted first, as suggested in Section >STEPAR SHIPMENT,NEWORDER. DFORDER 12. ARFITS ARE --> OUTPUT PRINT(PHI).

90 MULTIVARIATE TIME SERIES USING STF MODELS DIFFERENCE ORDERS (1-B ) TIME PERIOD ANALYZED TO 180 EFFECTIVE NUMBER OF OBSERVATIONS (NOBE) SERIES NAME MEAN STD. ERROR 1 SHIPMENT NEWORDER NOTE: THE APPROX. STD. ERROR FOR THE ESTIMATED CORRELATIONS BELOW IS (1/NOBE**.5) = SAMPLE CORRELATION MATRIX OF THE SERIES AUTOREGRESSIVE FITTING ON LAG(S) 12. === PHI(12) === RESIDUAL COVARIANCE MATRIX S( 1).338E E E-02 RESIDUAL CORRELATION MATRIX RS( 1) AUTOREGRESSIVE FITTING ON LAG(S) 12 1 === PHI(12) === === PHI( 1) === RESIDUAL COVARIANCE MATRIX S( 2).969E E E-02 RESIDUAL CORRELATION MATRIX RS( 2)

... AUTOREGRESSIVE FITTING ON LAG(S) === PHI(12) === === PHI( 1) === === PHI( 2) === === PHI( 3) === === PHI( 4) === RESIDUAL COVARIANCE MATRIX S( 5).896E E E-02 RESIDUAL CORRELATION MATRIX RS( 5)

Note: The statistical significance of a contemporaneous correlation between the series may be approximated by dividing the off-diagonal element of the sample correlation matrix by the standard error for the estimated correlation (for example, the t value for the correlation in the first matrix is 10.50).

Based on the residual correlation matrices of the above stepwise autoregressive fitting, we find that SHIPMENT and NEWORDER are contemporaneously correlated. However, we cannot determine which series is contemporaneously influenced by the other. This issue will be clarified after we consider the following reduced-form modeling for this system of equations.
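The approximation described in the note above can be written as a short Python sketch. The function name `correlation_t_value` and the inputs are hypothetical and are not the values from the STEPAR output; the only assumption taken from the output is that the approximate standard error of a correlation is 1/NOBE**.5.

```python
import math


def correlation_t_value(r, nobe):
    """Approximate t value: the correlation divided by its standard error 1/sqrt(NOBE)."""
    std_error = 1.0 / math.sqrt(nobe)
    return r / std_error


# Hypothetical inputs: a correlation of 0.5 based on 100 effective observations
t = correlation_t_value(0.5, 100)
print(round(t, 2))
```

A value well beyond 2 in absolute size would, under this approximation, indicate a significant contemporaneous correlation.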

Reduced-form model building and forecasting

In the reduced-form transfer function model identification for this data set, the following linear transfer function model is employed for the determination of differencing orders and subsequently for the identification of the model equation.

$$Y_t = C + (v_1 B + v_2 B^2 + \cdots + v_6 B^6) x_t + \frac{1}{(1 - \phi_1 B)(1 - \Phi_1 B^{12})} a_t. \qquad (23)$$

Based on the above model, we find that it is necessary to apply a differencing of order 12 to both the input and output variables. With the differencing order determined, the above LTF model (with differencing order 12) can be used for model identification. This LTF model can be specified in the SCA System using the TSMODEL paragraph. The IESTIM paragraph is then used to automatically estimate the model and delete the insignificant parameters in the LTF model. In practice, it is recommended that the ARMA component of the model be fixed to either an AR(1) for non-seasonal data or an $\mathrm{AR}(1) \times \mathrm{AR}(1)_S$ for seasonal data during the initial parameter reduction of the model. We accomplish this by including the PRESERVE ARMA sentence in the IESTIM paragraph. We can then refine the ARMA component and update our model as a separate step. In order to allow this, we must retain the disturbance term. This is accomplished by using the optional sentence HOLD DISTURBANCE(NS).

-->TSMODEL NAME IS EQ1. NO
--> MODEL IS SHIPMENT(12)=C1+(1 TO 6)NEWORDER(12)+1/(1)(12)NOISE.
-->IESTIM EQ1. PRESERVE ARMA. HOLD DISTURBANCE(NS).
THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180 SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- EQ VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED 12 SHIPMENT RANDOM ORIGINAL (1-B ) 12 NEWORDER RANDOM ORIGINAL (1-B ) PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE 1 C1 CNST 1 0 NONE NEWORDER NUM. 1 1 NONE NEWORDER NUM. 1 2 NONE NEWORDER NUM. 1 3 NONE NEWORDER NUM. 1 4 NONE NEWORDER NUM. 1 5 NONE NEWORDER NUM. 
1 6 NONE SHIPMENT D-AR 1 1 NONE SHIPMENT D-AR 2 12 NONE

93 3.28 MULTIVARIATE TIME SERIES USING STF MODELS TOTAL NUMBER OF OBSERVATIONS EFFECTIVE NUMBER OF OBSERVATIONS RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT) E-01 ================================================= RESULTS OF THE REVISED MODEL: REVISION NUMBER 1 ================================================= SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- EQ VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED 12 SHIPMENT RANDOM ORIGINAL (1-B ) 12 NEWORDER RANDOM ORIGINAL (1-B ) PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE 1 C1 CNST 1 0 NONE NEWORDER NUM. 1 1 NONE NEWORDER NUM. 1 2 NONE NEWORDER NUM. 1 3 NONE SHIPMENT D-AR 1 1 NONE SHIPMENT D-AR 2 12 NONE TOTAL NUMBER OF OBSERVATIONS EFFECTIVE NUMBER OF OBSERVATIONS RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT) E The results of the IESTIM paragraph indicate that NEWORDER at lags 1, 2 and 3 are positively related to SHIPMENT. The IARIMA paragraph is now used to refine and replace the ARMA component of the above LTF model. Here, we specify a potential seasonality of 12 using the SEASONALITY optional sentence. The REPLACE sentence is specified so that the LTF model (EQ1) is automatically updated with the refined ARMA model which is identified in the IARIMA paragraph. -->IARIMA NS. SEASONALITY IS 12. REPLACE EQ1. THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180 THE CRITICAL VALUE FOR SIGNIFICANCE TESTS OF ACF AND ESTIMATES IS SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- UTSMODEL VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED NS RANDOM ORIGINAL NONE PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE 1 NS MA 1 12 NONE NS D-AR 1 1 NONE

94 MULTIVARIATE TIME SERIES USING STF MODELS 3.29 TOTAL NUMBER OF OBSERVATIONS EFFECTIVE NUMBER OF OBSERVATIONS RESIDUAL STANDARD ERROR E The results of the IARIMA paragraph indicate that the appropriate ARIMA model for the LTF model is an ARIMA(1,0,0)(0,0,1) 12. Since the EQ1 model has already been updated with this new ARIMA component, we complete the modeling process by performing an estimation that includes both the transfer function lags and the new ARIMA parameters. This is accomplished through the following ESTIM paragraph. -->ESTIM EQ1. OUTPUT LEVEL(BRIEF). THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180 NONLINEAR ESTIMATION TERMINATED DUE TO: RELATIVE CHANGE IN (OBJECTIVE FUNCTION)**0.5 LESS THAN.1000D-02 SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- EQ VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED 12 SHIPMENT RANDOM ORIGINAL (1-B ) 12 NEWORDER RANDOM ORIGINAL (1-B ) PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE 1 C1 CNST 1 0 NONE NEWORDER NUM. 1 1 NONE NEWORDER NUM. 1 2 NONE NEWORDER NUM. 1 3 NONE SHIPMENT MA 1 12 NONE SHIPMENT D-AR 1 1 NONE EFFECTIVE NUMBER OF OBSERVATIONS R-SQUARE RESIDUAL STANDARD ERROR E The modeling process is now repeated for the second equation, EQ2. Here, we model the relationship between NEWORDER and SHIPMENT. The LTF model employed is the same as that in (23). Using this model, we determine that the differencing orders are 1, 12. Using these differencings, we proceed with the following steps to identify a transfer function model for the second equation. -->TSMODEL NAME IS EQ2. NO --> MODEL NEWORDER(1,12)=C1+(1 TO 6)SHIPMENT(1,12)+1/(1)(12)NOISE >IESTIM EQ2. PRESERVE ARMA. HOLD DISTURBANCE(NS).

95 3.30 MULTIVARIATE TIME SERIES USING STF MODELS THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180 SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- EQ VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED 1 12 NEWORDER RANDOM ORIGINAL (1-B ) (1-B ) 1 12 SHIPMENT RANDOM ORIGINAL (1-B ) (1-B ) PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE 1 C1 CNST 1 0 NONE SHIPMENT NUM. 1 1 NONE SHIPMENT NUM. 1 2 NONE SHIPMENT NUM. 1 3 NONE SHIPMENT NUM. 1 4 NONE SHIPMENT NUM. 1 5 NONE SHIPMENT NUM. 1 6 NONE NEWORDER D-AR 1 1 NONE NEWORDER D-AR 2 12 NONE TOTAL NUMBER OF OBSERVATIONS EFFECTIVE NUMBER OF OBSERVATIONS RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT) E-01 ================================================= RESULTS OF THE REVISED MODEL: REVISION NUMBER 1 ================================================= SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- EQ VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED 1 12 NEWORDER RANDOM ORIGINAL (1-B ) (1-B ) 1 12 SHIPMENT RANDOM ORIGINAL (1-B ) (1-B ) PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE 1 C1 CNST 1 0 NONE SHIPMENT NUM. 1 2 NONE NEWORDER D-AR 1 1 NONE NEWORDER D-AR 2 12 NONE ================================================= RESULTS OF THE REVISED MODEL: REVISION NUMBER 2 ================================================= SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- EQ VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED

96 MULTIVARIATE TIME SERIES USING STF MODELS NEWORDER RANDOM ORIGINAL (1-B ) (1-B ) 1 12 SHIPMENT RANDOM ORIGINAL (1-B ) (1-B ) PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE 1 SHIPMENT NUM. 1 2 NONE NEWORDER D-AR 1 1 NONE NEWORDER D-AR 2 12 NONE TOTAL NUMBER OF OBSERVATIONS EFFECTIVE NUMBER OF OBSERVATIONS RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT) E >IARIMA NS. SEASONALITY IS 12. REPLACE EQ2. THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180 THE CRITICAL VALUE FOR SIGNIFICANCE TESTS OF ACF AND ESTIMATES IS SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- UTSMODEL VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED NS RANDOM ORIGINAL NONE PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE 1 NS MA 1 1 NONE NS MA 2 12 NONE TOTAL NUMBER OF OBSERVATIONS EFFECTIVE NUMBER OF OBSERVATIONS RESIDUAL STANDARD ERROR E >ESTIM EQ2. OUTPUT LEVEL(BRIEF). THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180 NONLINEAR ESTIMATION TERMINATED DUE TO: RELATIVE CHANGE IN (OBJECTIVE FUNCTION)**0.5 LESS THAN.1000D-02 SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- EQ VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED 1 12 NEWORDER RANDOM ORIGINAL (1-B ) (1-B ) 1 12 SHIPMENT RANDOM ORIGINAL (1-B ) (1-B )

97 3.32 MULTIVARIATE TIME SERIES USING STF MODELS PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE 1 SHIPMENT NUM. 1 2 NONE NEWORDER MA 1 1 NONE NEWORDER MA 2 12 NONE EFFECTIVE NUMBER OF OBSERVATIONS R-SQUARE RESIDUAL STANDARD ERROR E The modeling result of the relationship between NEWORDER and SHIPMENT suggests that SHIPMENTS at lag 2 may have a negative influence on NEWORDER. Although this relationship is borderline significant, we will retain it in the equation for the purpose of demonstrating STF model estimation. The individual equations, EQ1 and EQ2, are now combined using the STFMODEL paragraph and estimated jointly using the SESTIM paragraph. The results are shown below: -->STFMODEL JOINTMDL. -- MODELS ARE EQ1,EQ2. NO SHOW. -->SESTIM JOINTMDL. HOLD RESID(R1,R2). THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU ITERATION TERMINATED DUE TO: RELATIVE CHANGE IN (COVARIANCE DETERMINANT)**0.5 LESS THAN.1000D-03 TOTAL NUMBER OF ITERATIONS THE LOG LIKELIHOOD VALUE D+03 RELATIVE CHANGE IN (LOG LIKELIHOOD)** D-05 MAXIMUM RELATIVE CHANGE IN THE ESTIMATES D-01 RELATIVE CHANGE IN (COVARIANCE DETERMINANT)** D-03 SUMMARY FOR SIMULTANEOUS TRANSFER FUNCTION MODEL -- JOINTMDL MODEL SUMMARY FOR EQUATION 1 -- EQ1 VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED 12 SHIPMENT RANDOM ORIGINAL (1-B ) 12 NEWORDER RANDOM ORIGINAL (1-B ) PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE

98 MULTIVARIATE TIME SERIES USING STF MODELS C1 CNST 1 0 NONE NEWORDER NUM. 1 1 NONE NEWORDER NUM. 1 2 NONE NEWORDER NUM. 1 3 NONE SHIPMENT MA 1 12 NONE SHIPMENT D-AR 1 1 NONE MODEL SUMMARY FOR EQUATION 2 -- EQ2 VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED 1 12 NEWORDER RANDOM ORIGINAL (1-B ) (1-B ) 1 12 SHIPMENT RANDOM ORIGINAL (1-B ) (1-B ) PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE 7 SHIPMENT NUM. 1 2 NONE NEWORDER MA 1 1 NONE NEWORDER MA 2 12 NONE ERROR COVARIANCE MATRIX ERROR CORRELATION MATRIX THE FOLLOWING SUM OF SQUARES ARE BASED ON 164 OBSERVATIONS EQ VARIABLE TOTAL RESIDUAL R-SQUARE 1 SHIPMENT E E NEWORDER E E The model estimation summary for EQ2 reveals that SHIPMENT at lag 2 may be omitted from the model since it is insignificant. If SHIPMENT does not influence NEWORDER, we may extend our analysis by considering the structural-form for the SHIPMENT model. The structural-form model is presented later in this section. To complete our analysis, a diagnostic check is performed on the residual series. Since there are two equations in our STF model, there are two residual series where R1 corresponds to EQ1 and R2 corresponds to EQ2. We want to be sure that there is no large correlation remaining between the residual series, especially in the first few lags or multiples of the seasonal lag. To perform this diagnostic check we examine the cross correlation matrices of

99 3.34 MULTIVARIATE TIME SERIES USING STF MODELS the residual series using the CCM paragraph. In this example, we can conclude that the residuals are clean with only trace correlations at lag 3 and lag 13 of the CCM. The correlation may be considered as spurious. -->CCM R1,R2. MAXLAG IS 24. TIME PERIOD ANALYZED TO 180 EFFECTIVE NUMBER OF OBSERVATIONS (NOBE) SERIES NAME MEAN STD. ERROR 1 R R NOTE: THE APPROX. STD. ERROR FOR THE ESTIMATED CORRELATIONS BELOW IS (1/NOBE**.5) = SAMPLE CORRELATION MATRIX OF THE SERIES SUMMARIES OF CROSS CORRELATION MATRICES USING +,-,., WHERE + DENOTES A VALUE GREATER THAN 2/SQRT(NOBE) - DENOTES A VALUE LESS THAN -2/SQRT(NOBE). DENOTES A NON-SIGNIFICANT VALUE BASED ON THE ABOVE CRITERION BEHAVIOR OF VALUES IN (I,J)TH POSITION OF CROSS CORRELATION MATRIX OVER ALL OUTPUTTED LAGS WHEN SERIES J LEADS SERIES I CROSS CORRELATION MATRICES IN TERMS OF +,-,. LAGS 1 THROUGH LAGS 7 THROUGH LAGS 13 THROUGH LAGS 19 THROUGH
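The symbol criterion printed in the CCM output above can be sketched in Python as follows. The function names and the example matrix are hypothetical, and the value used for NOBE is an assumption chosen only for illustration; the thresholds follow the rule stated in the output (+ above 2/SQRT(NOBE), - below -2/SQRT(NOBE), and . otherwise).

```python
import math


def ccm_symbol(r, nobe):
    """Map one correlation to '+', '-' or '.' using the 2/sqrt(NOBE) criterion."""
    limit = 2.0 / math.sqrt(nobe)
    if r > limit:
        return "+"
    if r < -limit:
        return "-"
    return "."


def summarize(matrix, nobe):
    """Convert a cross correlation matrix at one lag into a matrix of symbols."""
    return [[ccm_symbol(r, nobe) for r in row] for row in matrix]


# Hypothetical lag-k cross correlation matrix for two residual series
lag_matrix = [[0.05, -0.30],
              [0.21, 0.00]]
symbols = summarize(lag_matrix, 164)  # 164 is an assumed NOBE, so the limit is about 0.156

for row in symbols:
    print(" ".join(row))
```

Applying the same rule at every lag and collecting the symbols for a fixed (i,j) position gives the single summary matrix of symbols described for the CCM paragraph.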

100 MULTIVARIATE TIME SERIES USING STF MODELS 3.35 To forecast using STF models, the SFORECAST paragraph is employed. Here, we generate 24 forecasts starting at origin >SFORECAST JOINTMDL FORECASTS, BEGINNING AT ORIGIN = SERIES: SHIPMENT NEWORDER TIME FORECAST STD ERR FORECAST STD ERR E E E E E E E E E E E E E E E E E E E E E E ERROR COVARIANCE MATRIX Structural-form modeling and forecasting Based on the reduced-form model building shown above, we find that SHIPMENT is related to lags 1, 2, and 3 of NEWORDER when a contemporaneous relationship is not considered. On the other hand, NEWORDER is barely related to SHIPMENT at lag 2. As a result, it is more logical to think that SHIPMENT may be influenced by NEWORDER contemporaneously. With this in mind, we consider the following structural-form model identification using the same procedures that were used to identify the reduced-form model.

101 3.36 MULTIVARIATE TIME SERIES USING STF MODELS -->TSMODEL NAME IS EQ1. NO --> MODEL IS SHIPMENT(12)=C1+(0 TO 6)NEWORDER(12)+1/(1)(12)NOISE >IESTIM EQ1. PRESERVE ARMA. HOLD DISTURBANCE(NS). THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180 SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- EQ VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED 12 SHIPMENT RANDOM ORIGINAL (1-B ) 12 NEWORDER RANDOM ORIGINAL (1-B ) PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE 1 C1 CNST 1 0 NONE NEWORDER NUM. 1 0 NONE NEWORDER NUM. 1 1 NONE NEWORDER NUM. 1 2 NONE NEWORDER NUM. 1 3 NONE NEWORDER NUM. 1 4 NONE NEWORDER NUM. 1 5 NONE NEWORDER NUM. 1 6 NONE SHIPMENT D-AR 1 1 NONE SHIPMENT D-AR 2 12 NONE TOTAL NUMBER OF OBSERVATIONS EFFECTIVE NUMBER OF OBSERVATIONS RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT) E-01 ================================================= RESULTS OF THE REVISED MODEL: REVISION NUMBER 1 ================================================= SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- EQ VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED 12 SHIPMENT RANDOM ORIGINAL (1-B ) 12 NEWORDER RANDOM ORIGINAL (1-B ) PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE 1 C1 CNST 1 0 NONE NEWORDER NUM. 1 0 NONE NEWORDER NUM. 1 1 NONE SHIPMENT D-AR 1 1 NONE SHIPMENT D-AR 2 12 NONE TOTAL NUMBER OF OBSERVATIONS EFFECTIVE NUMBER OF OBSERVATIONS RESIDUAL STANDARD ERROR (WITHOUT OUTLIER ADJUSTMENT) E-01

102 MULTIVARIATE TIME SERIES USING STF MODELS From the model summary presented above, we find that there is indeed a contemporaneous relationship between SHIPMENT and NEWORDER. At the beginning of this section, we examined the residual correlation matrices and found a contemporaneous relationship was present. By examining the reduced-form equations, we were then able to make the inference regarding causality between NEWORDER and SHIPMENT. Also, by including a contemporaneous component in the equation, we find that our model becomes less complicated in that NEWORDER at lags 2 and 3 are no longer significant and were omitted from the equation by the IESTIM paragraph. We shall complete the rest of modeling for EQ1 by including the following two SCA paragraphs. -->IARIMA NS. SEASONALITY IS 12. REPLACE EQ1. THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180 THE CRITICAL VALUE FOR SIGNIFICANCE TESTS OF ACF AND ESTIMATES IS SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- UTSMODEL VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED NS RANDOM ORIGINAL NONE PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE 1 NS MA 1 12 NONE NS D-AR 1 1 NONE TOTAL NUMBER OF OBSERVATIONS EFFECTIVE NUMBER OF OBSERVATIONS RESIDUAL STANDARD ERROR E >ESTIM EQ1. HOLD RESIDUALS(RES). OUTPUT LEVEL(BRIEF). THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180 NONLINEAR ESTIMATION TERMINATED DUE TO: RELATIVE CHANGE IN (OBJECTIVE FUNCTION)**0.5 LESS THAN.1000D-02 SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- EQ VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED 12 SHIPMENT RANDOM ORIGINAL (1-B ) 12 NEWORDER RANDOM ORIGINAL (1-B ) PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T

103 3.38 MULTIVARIATE TIME SERIES USING STF MODELS LABEL NAME DENOM. TRAINT ERROR VALUE 1 C1 CNST 1 0 NONE NEWORDER NUM. 1 0 NONE NEWORDER NUM. 1 1 NONE SHIPMENT MA 1 12 NONE SHIPMENT D-AR 1 1 NONE EFFECTIVE NUMBER OF OBSERVATIONS R-SQUARE RESIDUAL STANDARD ERROR E Since we are considering a contemporaneous relationship in EQ1, we can not include a contemporaneous lag in EQ2. We will use the model identified earlier for EQ2 and include lag 2 of SHIPMENT for the sake of illustration. -->TSMODEL NAME IS EQ2. NO --> MODEL IS NEWORDER(1,12)=(2)SHIPMENT(1,12)+(1)(12)NOISE >ESTIM EQ2. HOLD RESIDUALS(RES). OUTPUT LEVEL(BRIEF). THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 180 NONLINEAR ESTIMATION TERMINATED DUE TO: RELATIVE CHANGE IN (OBJECTIVE FUNCTION)**0.5 LESS THAN.1000D-02 SUMMARY FOR UNIVARIATE TIME SERIES MODEL -- EQ VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED 1 12 NEWORDER RANDOM ORIGINAL (1-B ) (1-B ) 1 12 SHIPMENT RANDOM ORIGINAL (1-B ) (1-B ) PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE 1 SHIPMENT NUM. 1 2 NONE NEWORDER MA 1 1 NONE NEWORDER MA 2 12 NONE EFFECTIVE NUMBER OF OBSERVATIONS R-SQUARE RESIDUAL STANDARD ERROR E Below we specify the model for joint equations. The model is then estimated using the SESTIM paragraph. -->STFMODEL JOINTMDL. MODELS ARE EQ1,EQ >SESTIM JOINTMDL. HOLD RESID(R1,R2).

104 MULTIVARIATE TIME SERIES USING STF MODELS 3.39 THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU ITERATION TERMINATED DUE TO: RELATIVE CHANGE IN (COVARIANCE DETERMINANT)**0.5 LESS THAN.1000D-03 TOTAL NUMBER OF ITERATIONS THE LOG LIKELIHOOD VALUE D+03 RELATIVE CHANGE IN (LOG LIKELIHOOD)** D-05 MAXIMUM RELATIVE CHANGE IN THE ESTIMATES D-01 RELATIVE CHANGE IN (COVARIANCE DETERMINANT)** D-03 SUMMARY FOR SIMULTANEOUS TRANSFER FUNCTION MODEL -- JOINTMDL MODEL SUMMARY FOR EQUATION 1 -- EQ1 VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED 12 SHIPMENT RANDOM ORIGINAL (1-B ) 12 NEWORDER RANDOM ORIGINAL (1-B ) PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE 1 C1 CNST 1 0 NONE NEWORDER NUM. 1 0 NONE NEWORDER NUM. 1 1 NONE SHIPMENT MA 1 12 NONE SHIPMENT D-AR 1 1 NONE MODEL SUMMARY FOR EQUATION 2 -- EQ2 VARIABLE TYPE OF ORIGINAL DIFFERENCING VARIABLE OR CENTERED 1 12 NEWORDER RANDOM ORIGINAL (1-B ) (1-B ) 1 12 SHIPMENT RANDOM ORIGINAL (1-B ) (1-B ) PARAMETER VARIABLE NUM./ FACTOR ORDER CONS- VALUE STD T LABEL NAME DENOM. TRAINT ERROR VALUE 6 SHIPMENT NUM. 1 2 NONE NEWORDER MA 1 1 NONE NEWORDER MA 2 12 NONE ERROR COVARIANCE MATRIX

105 3.40 MULTIVARIATE TIME SERIES USING STF MODELS ERROR CORRELATION MATRIX THE FOLLOWING SUM OF SQUARES ARE BASED ON 165 OBSERVATIONS EQ VARIABLE TOTAL RESIDUAL R-SQUARE 1 SHIPMENT E E NEWORDER E E The model summary presented above reveals that SHIPMENT at lag 2 in EQ2 is not significant and may be deleted from the model. We will now examine the CCM of the residual series as a diagnostic check. -->CCM R1,R2. MAXLAG IS 24. TIME PERIOD ANALYZED TO 180 EFFECTIVE NUMBER OF OBSERVATIONS (NOBE) SERIES NAME MEAN STD. ERROR 1 R R NOTE: THE APPROX. STD. ERROR FOR THE ESTIMATED CORRELATIONS BELOW IS (1/NOBE**.5) = SAMPLE CORRELATION MATRIX OF THE SERIES SUMMARIES OF CROSS CORRELATION MATRICES USING +,-,., WHERE + DENOTES A VALUE GREATER THAN 2/SQRT(NOBE) - DENOTES A VALUE LESS THAN -2/SQRT(NOBE). DENOTES A NON-SIGNIFICANT VALUE BASED ON THE ABOVE CRITERION BEHAVIOR OF VALUES IN (I,J)TH POSITION OF CROSS CORRELATION MATRIX OVER ALL OUTPUTTED LAGS WHEN SERIES J LEADS SERIES I CROSS CORRELATION MATRICES IN TERMS OF +,-,. LAGS 1 THROUGH LAGS 7 THROUGH 12

LAGS 13 THROUGH
LAGS 19 THROUGH

The cross correlation matrices of the residual series pass the diagnostic check, although there are some spurious correlations at lag 3 and lag 8. To complete our example, we use the SFORECAST paragraph to generate 24 forecasts beginning at origin

-->SFORECAST JOINTMDL

FORECASTS, BEGINNING AT ORIGIN =

SERIES: SHIPMENT NEWORDER
TIME FORECAST STD ERR FORECAST STD ERR
E E E E E E E E E E E E E E E E E E E E E E

ERROR COVARIANCE MATRIX

3.3 Multivariate Time Series Analysis with Interventions

In this section, we illustrate how an STF model can accommodate time series modeling with interventions. The time series used in this study were discussed in Liu (1987) and are displayed in Figure 3. The first series, called SALES, represents shipments of computer parts from a manufacturer to customers (wholesale distributors) between April 1975 and October. The second series, ORDERS1, represents the same firm's customer order status for computer parts one period prior to shipment. Similarly, the series ORDERS2 represents the firm's customer orders given two periods prior to actual shipment. In this example, one time period contains 28 days and is referred to as a retail month. Therefore, each year contains 13 retail months instead of 12 ordinary calendar months. Manufacturing firms often use retail months to avoid trading day variation and to facilitate production planning.

In this data set, the series ORDERS1 and ORDERS2 could be useful predictors of SALES, since they reflect customer expectations for the quantities of parts they will require over the next one and two periods. Using the LTF method, Liu (1987) found the following STF model to be appropriate for modeling SALES, ORDERS1, and ORDERS2 jointly:

$$\mathrm{SALES}_t = \beta_1\,\mathrm{ORDERS1}_{t-1} + \frac{1}{1-\phi_1 B}\,a_{1t} \tag{24}$$

$$\mathrm{ORDERS1}_t = \beta_2\,\mathrm{ORDERS2}_{t-1} + \frac{1}{1-\phi_2 B}\,a_{2t} \tag{25}$$

$$(1-B)\,\mathrm{ORDERS2}_t = \beta_3\,(1-B)\,\mathrm{INTV2}_t + (1-\phi_3 B)\,a_{3t}. \tag{26}$$

The input $\mathrm{INTV2}_t$ is an intervention indicator variable used to account for end-of-the-year plant shutdowns during the first 4 years of operation. It is a binary variable equal to 1 at t = 6, 19, 32 and 45, and 0 otherwise.

Model specification

The above model can be specified using the following SCA paragraphs:

-->TSMODEL SALESMDL. MODEL SALES=(1)ORDERS1+1/(1)NOISE.
-->TSMODEL ORDR1MDL. MODEL ORDERS1=(1)ORDERS2+1/(1)NOISE.
-->TSMODEL
--> MODEL ORDERS2(1)=(0)INTV2(1,BINARY)+(1)NOISE.
-->STFMODEL JNTMDL. NO
--> MODELS ARE SALESMDL,ORDR1MDL,ORDR2MDL.

Model estimation

The STF model specified above can be estimated using the SESTIM paragraph. The result is displayed below.

-->SESTIM JNTMDL. HOLD RESID(R1,R2,R3). OUTPUT LEVEL(BRIEF).

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 98
LOG LIKELIHOOD AT INITIAL ESTIMATES =
ITERATION TERMINATED DUE TO:
RELATIVE CHANGE IN (COVARIANCE DETERMINANT)**0.5 LESS THAN .1000D-03
TOTAL NUMBER OF ITERATIONS
THE LOG LIKELIHOOD VALUE D+03
RELATIVE CHANGE IN (LOG LIKELIHOOD)** D-05
MAXIMUM RELATIVE CHANGE IN THE ESTIMATES D-01
RELATIVE CHANGE IN (COVARIANCE DETERMINANT)** D-04

SUMMARY FOR SIMULTANEOUS TRANSFER FUNCTION MODEL -- JNTMDL

MODEL SUMMARY FOR EQUATION 1 -- SALESMDL

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
SALES      RANDOM     ORIGINAL      NONE
ORDERS1    RANDOM     ORIGINAL      NONE

PARAMETER   VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL       NAME      DENOM.                 TRAINT         ERROR  VALUE
1           ORDERS1   NUM.    1       1      NONE
            SALES     D-AR    1       1      NONE

MODEL SUMMARY FOR EQUATION 2 -- ORDR1MDL

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
ORDERS1    RANDOM     ORIGINAL      NONE
ORDERS2    RANDOM     ORIGINAL      NONE

PARAMETER   VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL       NAME      DENOM.                 TRAINT         ERROR  VALUE
3           ORDERS2   NUM.    1       1      NONE
            ORDERS1   D-AR    1       1      NONE

MODEL SUMMARY FOR EQUATION 3 -- ORDR2MDL

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
                                        1
ORDERS2    RANDOM     ORIGINAL      (1-B )
                                        1
INTV2      BINARY     ORIGINAL      (1-B )

PARAMETER   VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL       NAME      DENOM.                 TRAINT         ERROR  VALUE
5           INTV2     NUM.    1       0      NONE
            ORDERS2   MA      1       1      NONE

ERROR COVARIANCE MATRIX

ERROR CORRELATION MATRIX

THE FOLLOWING SUM OF SQUARES ARE BASED ON 96 OBSERVATIONS

EQ VARIABLE TOTAL RESIDUAL R-SQUARE
1 SALES E E
ORDERS E E
ORDERS E E

Forecasting

To obtain joint forecasts for the STF model developed above, we first need to make sure that the intervention variable INTV2 (or any other non-stochastic variable) contains values for the time periods to be forecasted. In this case, INTV2 should contain 0 for the periods to be forecasted. With the non-stochastic information appropriately provided, we can enter the following SCA paragraph to obtain the forecasts.

-->SFORECAST JNTMDL. NOFS IS

FORECASTS, BEGINNING AT ORIGIN =

SERIES: SALES ORDERS1
TIME FORECAST STD ERR FORECAST STD ERR

SERIES: ORDERS2
TIME FORECAST STD ERR

ERROR COVARIANCE MATRIX

Econometric Modeling Using the STF Models

The STF models discussed so far also encompass a wide range of models often used in econometric analysis. In this section, we discuss some aspects that are specific to econometric analysis.

Specification of endogenous variables

In the specification of an STF model, it is assumed that all left-hand side variables are endogenous and that all other variables are exogenous. Whenever a model specification departs from this default assumption, the names of the endogenous series must be specified. The following demand-supply model described in Kmenta (1971) is such an example:

$$Q_t = \alpha_1 + \alpha_2 P_t + \alpha_3 D_t + a_{1t} \quad \text{(demand equation)} \tag{27}$$

$$Q_t = \beta_1 + \beta_2 P_t + \beta_3 F_t + \beta_4 YR_t + a_{2t} \quad \text{(supply equation)} \tag{28}$$

where

$Q_t$ : food consumption per head;
$P_t$ : ratio of food prices to general consumer prices;
$D_t$ : disposable income in constant prices;
$F_t$ : ratio of preceding year's prices received by farmers for products to general consumer prices; and
$YR_t$ : time in years.

In this example, $Q_t$ and $P_t$ are endogenous variables. Since only $Q_t$ appears on the left-hand side of the equations, it is necessary to specify $Q_t$ in the first equation and $P_t$ in the second equation as endogenous variables. This is reflected by the specification ENDOGENOUS ARE Q, P.
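To see why both Q and P have to be declared endogenous, note that equations (27) and (28) determine the two variables jointly. The following Python sketch (not SCA code) solves the two equations for price and quantity. The function name `solve_demand_supply`, the coefficient values, and the data values are all hypothetical and chosen only for illustration; they are not estimates from Kmenta's data.

```python
def solve_demand_supply(alpha, beta, D, F, YR, a1=0.0, a2=0.0):
    """Solve equations (27)-(28) jointly for (Q, P).

    alpha = (alpha1, alpha2, alpha3), beta = (beta1, beta2, beta3, beta4).
    a1 and a2 are the disturbances of the demand and supply equations.
    All numeric values used with this function here are hypothetical.
    """
    al1, al2, al3 = alpha
    b1, b2, b3, b4 = beta
    if al2 == b2:
        raise ValueError("demand and supply slopes coincide; no unique solution")
    # Equate the right-hand sides of (27) and (28) and solve for P.
    P = (b1 - al1 + b3 * F + b4 * YR - al3 * D + a2 - a1) / (al2 - b2)
    # Substitute P back into the demand equation to obtain Q.
    Q = al1 + al2 * P + al3 * D + a1
    return Q, P


# Hypothetical coefficients and exogenous values, for illustration only.
Q, P = solve_demand_supply(alpha=(90.0, -0.25, 0.3),
                           beta=(50.0, 0.25, 0.25, 0.25),
                           D=100.0, F=100.0, YR=10.0)
print(Q, P)
```

In this sketch a change in either disturbance moves both Q and P, which is the sense in which P cannot be treated as an exogenous input even though it appears only on the right-hand side.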

Below is the complete specification for the above equation system:

-->TSMODEL NAME IS DEMAND. NO SHOW. MODEL IS
--> Q = A1 + (A2)P + (A3)D + NOISE.
-->TSMODEL NAME IS SUPPLY. NO SHOW. MODEL IS
--> Q = B1 + (B2)P + (B3)F + (B4)YR
-->STFMODEL NAME IS DMDSPLY. MODELS ARE DEMAND,
--> ENDOGENOUS ARE Q, P. NO SHOW

Specification of definitional equations

Some econometric models contain definitional equations. Definitional equations are specified using the IDENTITY sentence. When definitional equations are specified, it is also necessary to specify the endogenous variables in the model (using the ENDOGENOUS sentence). Here we use the U.S. economy Model I discussed in Klein (1950) as an illustrative example. Klein's model is described as:

Consumption function:
$$C_t = \alpha_1 + (\omega_1 + \omega_2 B)P_t + \omega_3 TW_t + a_{1t} \tag{29}$$

Investment function:
$$I_t = \alpha_2 + (\omega_4 + \omega_5 B)P_t + (\omega_6 B)K_t + a_{2t} \tag{30}$$

Demand for labor function:
$$PW_t = \alpha_3 + (\omega_7 + \omega_8 B)E_t + \omega_9 YR_t + a_{3t} \tag{31}$$

Identity (private product):
$$E_t = Y_t + TAX_t - GW_t \tag{32}$$

Identity (national income):
$$Y_t = C_t + I_t + G_t - TAX_t \tag{33}$$

Identity (profits):
$$P_t = Y_t - TW_t \tag{34}$$

Identity (capital stock):
$$K_t = K_{t-1} + I_t \tag{35}$$

Identity:
$$TW_t = PW_t + GW_t, \tag{37}$$

where

$C_t$ : consumption
$I_t$ : investment
$PW_t$ : private wage bill
$GW_t$ : government wage bill
$TW_t$ : total wage bill
$P_t$ : profits
$Y_t$ : national income
$K_t$ : end-of-year capital stock
$E_t$ : private product
$TAX_t$ : indirect taxes
$YR_t$ : time in year ($YR_t$ = year )

The identities specified in the STFMODEL paragraph read almost identically to those in the original equations. For example, the above identity equations can be specified in the IDENTITY sentence as:

IDENTITIES ARE E =Y+TAX-GW; Y =C+I+G-TAX; P =Y-PW-GW; K

In the specification of definitional equations, lag operators are specified within parentheses to the right of a variable (e.g., K(B**1) denotes $K_{t-1}$). In addition, the coefficient of a variable in a definitional equation need not be unity. For example, the identity $E_t = Y_t + 2.3\,TAX_t - 3.2\,GW_t$ can be specified as E = Y + 2.3*TAX - 3.2*GW. The following rules must be followed when expressing identities:

(1) the asterisk notation (*) cannot be omitted;
(2) all coefficients must be constants, and variables cannot be used as coefficients;
(3) coefficients must precede the pertinent variable and cannot be enclosed within parentheses.

Klein's U.S. economy model shown above is specified in the paragraphs below.
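As a check on how the identities (32)-(37) fit together, the following Python sketch (not SCA code) evaluates them for given values of the other series. The function name `klein_identities`, the starting capital stock `K0`, and all data values are hypothetical and serve only as an illustration; the lagged term in (35) is handled by accumulating investment from `K0`.

```python
from itertools import accumulate


def klein_identities(C, I, G, TAX, PW, GW, K0):
    """Evaluate the definitional equations of Klein's Model I.

    Inputs are equal-length lists of values per period; K0 is the capital
    stock before the first period (a hypothetical starting value).
    """
    Y = [c + i + g - tax for c, i, g, tax in zip(C, I, G, TAX)]  # (33)
    E = [y + tax - gw for y, tax, gw in zip(Y, TAX, GW)]         # (32)
    TW = [pw + gw for pw, gw in zip(PW, GW)]                     # (37)
    P = [y - tw for y, tw in zip(Y, TW)]                         # (34)
    # (35): K_t = K_{t-1} + I_t, i.e. K(B**1) + I in the SCA notation.
    K = list(accumulate(I, initial=K0))[1:]
    return {"Y": Y, "E": E, "TW": TW, "P": P, "K": K}


# Hypothetical two-period data, for illustration only.
result = klein_identities(C=[40, 42], I=[3, -1], G=[5, 6], TAX=[4, 5],
                          PW=[25, 26], GW=[2, 3], K0=180)
print(result)
```

Because TW = PW + GW, the profit identity P = Y - TW in (34) and the form P = Y-PW-GW used in the IDENTITY sentence give the same values.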

-->TSMODEL NAME IS CONSUMP. NO SHOW. MODEL IS
--> C = CNST1 + (W1+W2*B)P + (W3)TW + NOISE.
-->TSMODEL NAME IS INVEST. NO SHOW. MODEL IS
--> I = CNST2 + (W4+W5*B)P + (W6)K + NOISE.
-->TSMODEL NAME IS PWAGES. NO SHOW. MODEL IS
--> PW = CNST3 + (W7+W8*B)E + (W9)TR @
-->STFMODEL NAME IS
--> MODELS ARE CONSUMP, INVEST,
--> IDENTITIES ARE E =
--> Y =
--> P = Y
--> K =
--> TW =
--> NO SHOW

Examples for estimation of STF model

The STF models specified in the above sections will now be estimated. The residual series for each equation may be retained for diagnostic checking.

(A) Kmenta's demand-supply model

Univariate estimation of each equation is performed first in order to obtain reasonable initial values for estimation of the simultaneous model. Output related to the two ESTIM paragraphs is suppressed.

-->TSMODEL NAME IS
--> MODEL IS Q=A1+(A2)P+(A3)D+NOISE.
-->ESTIM DEMAND
-->TSMODEL NAME IS
--> MODEL IS Q=B1+(B2)P+(B3)F+(B4)YR+NOISE.
-->ESTIM SUPPLY
-->STFMODEL NAME IS DMDSPLY. MODELS ARE DEMAND,
--> ENDOGENOUS ARE Q,P.

SUMMARY FOR SIMULTANEOUS TRANSFER FUNCTION MODEL -- DMDSPLY

MODEL SUMMARY FOR EQUATION 1 -- DEMAND

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED

Q          RANDOM     ORIGINAL      NONE
P          RANDOM     ORIGINAL      NONE
D          RANDOM     ORIGINAL      NONE

PARAMETER   VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL       NAME      DENOM.                 TRAINT         ERROR  VALUE
1  A1                 CNST    1       0      NONE
   A2       P         NUM.    1       0      NONE
   A3       D         NUM.    1       0      NONE

MODEL SUMMARY FOR EQUATION 2 -- SUPPLY

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
Q          RANDOM     ORIGINAL      NONE
P          RANDOM     ORIGINAL      NONE
F          RANDOM     ORIGINAL      NONE
YR         RANDOM     ORIGINAL      NONE

PARAMETER   VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL       NAME      DENOM.                 TRAINT         ERROR  VALUE
1  B1                 CNST    1       0      NONE
   B2       P         NUM.    1       0      NONE
   B3       F         NUM.    1       0      NONE
   B4       YR        NUM.    1       0      NONE

-->SESTIM MODEL
--> STOP MAXIT(30),LIKE( ),EST( ).

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 20
LOG LIKELIHOOD AT INITIAL ESTIMATES =
ITERATION TERMINATED DUE TO:
RELATIVE CHANGE IN (COVARIANCE DETERMINANT)**0.5 LESS THAN .1000D-06
TOTAL NUMBER OF ITERATIONS
THE LOG LIKELIHOOD VALUE D+02
RELATIVE CHANGE IN (LOG LIKELIHOOD)** D-15
MAXIMUM RELATIVE CHANGE IN THE ESTIMATES D-10
RELATIVE CHANGE IN (COVARIANCE DETERMINANT)** D-10

SUMMARY FOR SIMULTANEOUS TRANSFER FUNCTION MODEL -- DMDSPLY

MODEL SUMMARY FOR EQUATION 1 -- DEMAND

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED

Q          RANDOM     ORIGINAL      NONE
P          RANDOM     ORIGINAL      NONE
D          RANDOM     ORIGINAL      NONE

PARAMETER   VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL       NAME      DENOM.                 TRAINT         ERROR  VALUE
1  A1                 CNST    1       0      NONE
   A2       P         NUM.    1       0      NONE
   A3       D         NUM.    1       0      NONE

MODEL SUMMARY FOR EQUATION 2 -- SUPPLY

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
Q          RANDOM     ORIGINAL      NONE
P          RANDOM     ORIGINAL      NONE
F          RANDOM     ORIGINAL      NONE
YR         RANDOM     ORIGINAL      NONE

PARAMETER   VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL       NAME      DENOM.                 TRAINT         ERROR  VALUE
4  B1                 CNST    1       0      NONE
   B2       P         NUM.    1       0      NONE
   B3       F         NUM.    1       0      NONE
   B4       YR        NUM.    1       0      NONE

ERROR COVARIANCE MATRIX

ERROR CORRELATION MATRIX

THE FOLLOWING SUM OF SQUARES ARE BASED ON 20 OBSERVATIONS

EQ VARIABLE TOTAL RESIDUAL R-SQUARE
1 Q E E
Q E E

(B) Klein's U.S. economy Model I

In the estimation of this model, annual observations from 1920 through 1941 are used. For this data set, two maxima of the likelihood function may be obtained. The global maximum reported in or obtained from the literature and computer programs (e.g., Brundy and Jorgenson 1974, Hausman 1974, Hall and Hall 1981, Joreskog and Sorbom 1978, and TROLL 1980) may not be a valid solution to the model. The estimates obtained below are more stable than those associated with the global maximum, in that deleting a single data point at the beginning does not result in widely varying estimates of the parameters. In the following model estimation, the initial values are obtained by using the ESTIM paragraph equation by equation.

-->STFMODEL NAME IS USECON. MODELS
--> ENDOGENOUS ARE
--> IDENTIES ARE
-->
-->
-->
--> TW=PW+GW.

SUMMARY FOR SIMULTANEOUS TRANSFER FUNCTION MODEL -- USECON

MODEL SUMMARY FOR EQUATION 1 -- CONSUMP

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
C          RANDOM     ORIGINAL      NONE
P          RANDOM     ORIGINAL      NONE
TW         RANDOM     ORIGINAL      NONE

PARAMETER   VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL       NAME      DENOM.                 TRAINT         ERROR  VALUE
1  CNST1              CNST    1       0      NONE
   P10      P         NUM.    1       0      NONE
   P11      P         NUM.    1       1      NONE
   TW10     TW        NUM.    1       0      NONE

MODEL SUMMARY FOR EQUATION 2 -- INVEST

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
I          RANDOM     ORIGINAL      NONE
P          RANDOM     ORIGINAL      NONE
K          RANDOM     ORIGINAL      NONE

PARAMETER   VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL       NAME      DENOM.                 TRAINT         ERROR  VALUE
1  CNST2              CNST    1       0      NONE
   P20      P         NUM.    1       0      NONE
   P21      P         NUM.    1       1      NONE
   K21      K         NUM.    1       1      NONE

MODEL SUMMARY FOR EQUATION 3 -- PWAGES

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
PW         RANDOM     ORIGINAL      NONE
E          RANDOM     ORIGINAL      NONE
YR         RANDOM     ORIGINAL      NONE

PARAMETER   VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL       NAME      DENOM.                 TRAINT         ERROR  VALUE
1  CNST3              CNST    1       0      NONE
   E30      E         NUM.    1       0      NONE
   E31      E         NUM.    1       1      NONE
   YR30     YR        NUM.    1       0      NONE

-->SESTIM MODEL
--> STOP
--> OUTPUT NOPRINT(ITER).

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 22
LOG LIKELIHOOD AT INITIAL ESTIMATES =
** THE LOG LIKELIHOOD CANNOT BE REDUCED -- ITERATING STOPS...
ITERATION TERMINATED DUE TO:
RELATIVE CHANGE IN (COVARIANCE DETERMINANT)**0.5 LESS THAN .1000D-06
TOTAL NUMBER OF ITERATIONS
THE LOG LIKELIHOOD VALUE D+02
RELATIVE CHANGE IN (LOG LIKELIHOOD)** D-06
MAXIMUM RELATIVE CHANGE IN THE ESTIMATES D-02
RELATIVE CHANGE IN (COVARIANCE DETERMINANT)** D+00

SUMMARY FOR SIMULTANEOUS TRANSFER FUNCTION MODEL -- USECON

MODEL SUMMARY FOR EQUATION 1 -- CONSUMP

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
C          RANDOM     ORIGINAL      NONE
P          RANDOM     ORIGINAL      NONE
TW         RANDOM     ORIGINAL      NONE

PARAMETER   VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL       NAME      DENOM.                 TRAINT         ERROR  VALUE
1  CNST1              CNST    1       0      NONE
   P10      P         NUM.    1       0      NONE
   P11      P         NUM.    1       1      NONE
   TW10     TW        NUM.    1       0      NONE

MODEL SUMMARY FOR EQUATION 2 -- INVEST

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
I          RANDOM     ORIGINAL      NONE
P          RANDOM     ORIGINAL      NONE
K          RANDOM     ORIGINAL      NONE

PARAMETER   VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL       NAME      DENOM.                 TRAINT         ERROR  VALUE
5  CNST2              CNST    1       0      NONE
   P20      P         NUM.    1       0      NONE
   P21      P         NUM.    1       1      NONE
   K21      K         NUM.    1       1      NONE

MODEL SUMMARY FOR EQUATION 3 -- PWAGES

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
PW         RANDOM     ORIGINAL      NONE
E          RANDOM     ORIGINAL      NONE
YR         RANDOM     ORIGINAL      NONE

PARAMETER   VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL       NAME      DENOM.                 TRAINT         ERROR  VALUE
9  CNST3              CNST    1       0      NONE
   E30      E         NUM.    1       0      NONE
   E31      E         NUM.    1       1      NONE
   YR30     YR        NUM.    1       0      NONE

ERROR COVARIANCE MATRIX

ERROR CORRELATION MATRIX

THE FOLLOWING SUM OF SQUARES ARE BASED ON 21 OBSERVATIONS

EQ VARIABLE TOTAL RESIDUAL R-SQUARE
1 C E E
I E E
PW E E

To obtain the global maximum likelihood estimates, one may perform nonlinear estimation with initial values in the neighborhood of the global estimates.

Klein's U.S. Economy Model I with Autoregressive Disturbances

To examine the validity of Klein's traditional U.S. economy Model I, an alternative model is now specified. In this model, an autoregressive term is included in the disturbance of each equation. This additional term is employed to reduce the serial correlation of the noise series of each equation.

-->TSMODEL NAME IS
--> MODEL IS C=CNST1+(P10+P11*B)P+(TW10)TW+1/(1-PHI1*B)NOISE.
-->TSMODEL NAME IS
--> MODEL IS I=CNST2+(P20+P21*B)P+(K21*B)K+1/(1-PHI2*B)NOISE.
-->TSMODEL NAME IS
--> MODEL IS PW=CNST3+(E30+E31*B)E+(YR30)YR+1/(1-PHI3*B)NOISE.

The newly specified equations are incorporated into a new STF model. The identity equations are the same as in the traditional model.

--> STFMODEL NAME IS USECON. MODELS
--> ENDOGENOUS ARE
--> IDENTIES ARE
-->
-->
-->
--> TW=PW+GW.

Estimation of the new model is now performed. Note that, due to the inclusion of the autoregressive parameters, there are substantial changes in some parameter estimates. In addition, the diagonal elements of the error covariance matrix are greatly reduced, thus ensuring more accurate forecasts than could be obtained from the previous model. Also, the signs of the estimates for $P_t$ and $P_{t-1}$ in the second equation are now both positive. The latter result is more consistent with theory.

-->SESTIM MODEL
--> STOP
--> OUTPUT NOPRINT(ITER). HOLD RESID(R1,R2,R3).

THE FOLLOWING ANALYSIS IS BASED ON TIME SPAN 1 THRU 22
LOG LIKELIHOOD AT INITIAL ESTIMATES =

ITERATION TERMINATED DUE TO:
MAXIMUM NUMBER OF ITERATIONS 60 REACHED
TOTAL NUMBER OF ITERATIONS
THE LOG LIKELIHOOD VALUE D+02
RELATIVE CHANGE IN (LOG LIKELIHOOD)** D-10
MAXIMUM RELATIVE CHANGE IN THE ESTIMATES D-04
RELATIVE CHANGE IN (COVARIANCE DETERMINANT)** D-05

SUMMARY FOR SIMULTANEOUS TRANSFER FUNCTION MODEL -- USECON

MODEL SUMMARY FOR EQUATION 1 -- CONSUMP

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
C          RANDOM     ORIGINAL      NONE
P          RANDOM     ORIGINAL      NONE
TW         RANDOM     ORIGINAL      NONE

PARAMETER   VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL       NAME      DENOM.                 TRAINT         ERROR  VALUE
1  CNST1              CNST    1       0      NONE
   P10      P         NUM.    1       0      NONE
   P11      P         NUM.    1       1      NONE
   TW10     TW        NUM.    1       0      NONE
   PHI1     C         D-AR    1       1      NONE

MODEL SUMMARY FOR EQUATION 2 -- INVEST

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
I          RANDOM     ORIGINAL      NONE
P          RANDOM     ORIGINAL      NONE
K          RANDOM     ORIGINAL      NONE

PARAMETER   VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL       NAME      DENOM.                 TRAINT         ERROR  VALUE
6  CNST2              CNST    1       0      NONE
   P20      P         NUM.    1       0      NONE
   P21      P         NUM.    1       1      NONE
   K21      K         NUM.    1       1      NONE
   PHI2     I         D-AR    1       1      NONE

MODEL SUMMARY FOR EQUATION 3 -- PWAGES

VARIABLE   TYPE OF    ORIGINAL      DIFFERENCING
           VARIABLE   OR CENTERED
PW         RANDOM     ORIGINAL      NONE
E          RANDOM     ORIGINAL      NONE
YR         RANDOM     ORIGINAL      NONE

PARAMETER   VARIABLE  NUM./   FACTOR  ORDER  CONS-   VALUE  STD    T
LABEL       NAME      DENOM.                 TRAINT         ERROR  VALUE
11 CNST3              CNST    1       0      NONE
   E30      E         NUM.    1       0      NONE
   E31      E         NUM.    1       1      NONE
   YR30     YR        NUM.    1       0      NONE
   PHI3     PW        D-AR    1       1      NONE

ERROR COVARIANCE MATRIX

ERROR CORRELATION MATRIX

THE FOLLOWING SUM OF SQUARES ARE BASED ON 20 OBSERVATIONS

EQ VARIABLE TOTAL RESIDUAL R-SQUARE
1 C E E
I E E
PW E E

Model Simulation

The SSIMULATE paragraph is used to generate series according to a user-specified STF model. This capability is useful for research on STF modeling and for understanding how STF models behave. Material related to the simulation of data according to a specified distribution is also presented in Chapter 5 of the SCA Reference Manual for Fundamental Capabilities.
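To make the idea of generating data from a specified model concrete, the following Python sketch (not SCA code, and not the algorithm used by SSIMULATE) simulates a single equation of the form used in equation (24): an output driven by a one-period-lagged input plus first-order autoregressive noise. The function name `simulate_tf`, the parameter values, and the input series are hypothetical; a full STF simulation would also draw correlated disturbances across equations, which this sketch omits.

```python
import random


def simulate_tf(x, beta, phi, sigma, seed=0):
    """Simulate y[t] = beta * x[t-1] + n[t], with n[t] = phi * n[t-1] + a[t].

    a[t] is Gaussian white noise with standard deviation sigma. The noise
    recursion starts from n = 0 (an assumption of this sketch). The returned
    list covers t = 1, ..., len(x) - 1.
    """
    rng = random.Random(seed)  # seeded generator for reproducible draws
    y = []
    noise = 0.0
    for t in range(1, len(x)):
        noise = phi * noise + rng.gauss(0.0, sigma)
        y.append(beta * x[t - 1] + noise)
    return y


# Hypothetical input series and parameter values, for illustration only.
orders = [100.0, 104.0, 98.0, 101.0, 107.0, 103.0]
sales = simulate_tf(orders, beta=0.9, phi=0.5, sigma=2.0, seed=42)
print(sales)
```

Fitting a model to series generated this way and comparing the estimates with the known parameter values is one way to explore how an estimation procedure behaves.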

SUMMARY OF THE SCA PARAGRAPHS

This section provides a summary of those SCA paragraphs employed in this document. The syntax for the paragraphs is presented in both brief and full form. The brief display of the syntax contains the most frequently used sentences of a paragraph, while the full display presents all possible modifying sentences of a paragraph. In addition, special remarks related to a paragraph may also be presented with the description. It is recommended that the brief form be used before employing any System capability that can be accessed only through the use of the full form of the paragraph syntax.

Each SCA paragraph begins with a paragraph name and is followed by modifying sentences. Sentences that may be used as modifiers for a paragraph are shown below, and the types of arguments used in each sentence are also specified. Sentences not designated as required may be omitted, as default conditions (or values) exist. The most frequently used required sentence is given as the first sentence of the paragraph. The portion of this sentence that may be omitted is underlined. This portion may be omitted only if this sentence appears as the first sentence in a paragraph. Otherwise, all portions of the sentence must be used. The last character of each line except the last line must be the continuation character, "@".

The paragraphs to be explained in this summary are CCM, STEPAR, MIDEN, ECCM, SCAN, MTSMODEL, MESTIM, IMESTIM, MFORECAST and CANONICAL.

Legend

v : variable or model name
r : real value
i : integer
w(.) : keyword with argument

STEPAR Paragraph

The STEPAR paragraph is used to perform stepwise autoregressive fitting of vector series.

Syntax for the STEPAR Paragraph

Brief syntax

STEPAR VARIABLES ARE v1, v2, - -
DFORDERS ARE i1, i2, - -
ARFITS ARE i1, i2,

Required sentences: VARIABLES and ARFITS

Full syntax

STEPAR VARIABLES ARE v1, v2, - -
DFORDERS ARE i1, i2, - -
MAXLAG IS
DESCRIBE./NO
ARFITS ARE i1, i2, - -
RCCMS ARE i1, i2, - -
STANDARDIZED./NO
SPAN IS i1,
OUTPUT IS LEVEL(w),
PRINT(w1,w2,- -
NOPRINT(w1,w2,- -
HOLD PHI(v1,v2,- - -),
RESIDUALS(v1,v2,- -
COVARIANCE(v).

Required sentences: VARIABLES and ARFITS

Sentences Used in the STEPAR Paragraph

VARIABLES sentence
The VARIABLES sentence is used to specify the names (labels) of the series to be analyzed.

DFORDERS sentence
The DFORDERS sentence is used to specify the orders of differencing to be applied to all the series when differencing is the stationarity-inducing transformation being used. The same differencing is applied to each series. The default is none.

MAXLAG sentence
The MAXLAG sentence is used to specify the maximum number of lagged sample cross-correlation matrices to be computed. The default is 24.

DESCRIBE sentence
The DESCRIBE sentence is used to specify the display of descriptive statistics and principal component information of the original series (after any differencing). The default is NO DESCRIBE.

ARFITS sentence
The ARFITS sentence is used to specify the lags employed when performing stepwise autoregressive fitting.

RCCMS sentence
This sentence is used to specify those lags in the stepwise autoregressive fits for which the sample cross correlation matrices of the residual series will be computed and displayed. The number of CCM's of the residual series to be computed is controlled by the MAXLAG sentence. The default is none.

STANDARDIZED sentence
This sentence is used to specify that the stepwise autoregression is based on the standardization of the original or differenced series. The standardized series have variance 1.0, as each series is scaled by dividing by its sample standard deviation. The default is NO STANDARDIZED.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, i1 to i2, for which the data will be analyzed. The default is the maximum span of the series, which is the span of the shortest series if the series are not all of equal length.

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output printed for computed statistics. Control is achieved in a two-stage procedure. First a basic LEVEL of output is specified. Output may then be increased from this level by use of PRINT, or decreased from this level by use of NOPRINT.
The keywords for LEVEL and output printed are:

BRIEF: SUMMARY
NORMAL: SUMMARY and SIGNS
DETAILED: SUMMARY, CCM_VALUES, SIGNS, PHI_VALUES, STD_ERRS and CORR_PHI

where the reserved words on the right denote:

SUMMARY: the summary table for stepwise regression
CCM_VALUES: the values of cross-correlation matrices of residual series after autoregressive fitting
SIGNS: the significant signs for CCM's and autoregressive coefficients
PHI_VALUES: the display of the estimated values of the autoregressive coefficients and principal component information after each lag of a stepwise fit
STD_ERRS: the display of the standard errors of the autoregressive coefficients and principal component information after each lag of a stepwise fit
CORR_PHI: the correlation matrices for the stepwise autoregressive coefficients

These reserved words are also keywords for PRINT and NOPRINT. The default for LEVEL is NORMAL, or the level specified in the PROFILE paragraph.

HOLD sentence
The HOLD sentence is used to specify those values computed for particular functions to be retained in the workspace until the end of the session. Only those statistics desired to be retained need be named. Values are placed in the associated variable named in parentheses. The default is that none of the values of the statistics below will be retained after the paragraph is executed. The values that may be retained are:

PHI: the autoregression matrices for the last autoregression fitted. The number of variable names must be the same as the number of lags specified in the ARFITS sentence.
RESIDUALS: the residual series. The number of variables specified in this sentence must be the same as the number of series in the model.
COVARIANCE: the covariance matrix for the noise.

CCM Paragraph

The CCM paragraph is used to compute sample cross correlation matrices of vector time series.

Syntax for the CCM Paragraph

Brief syntax

CCM VARIABLES ARE v1, v2, - -
DFORDERS ARE i1, i2,

Required sentence: VARIABLES

Full syntax

CCM VARIABLES ARE v1, v2, - -
DFORDERS ARE i1, i2, - -
MAXLAG IS
DESCRIBE./NO
SPAN IS i1,
OUTPUT IS LEVEL(w),
PRINT(w1,w2,- -
NOPRINT(w1, w2, - - -).

Required sentence: VARIABLES

Sentences Used in the CCM Paragraph

VARIABLES sentence
The VARIABLES sentence is used to specify the names (labels) of the series to be analyzed.

DFORDERS sentence
The DFORDERS sentence is used to specify the orders of differencing to be applied to all the series when differencing is the stationarity-inducing transformation being used. The same differencing is applied to each series. The default is none.

MAXLAG sentence
The MAXLAG sentence is used to specify the maximum number of lagged sample cross-correlation matrices to be computed. The default is 24.

DESCRIBE sentence
The DESCRIBE sentence is used to specify the display of descriptive statistics and principal component information of the original series (after any differencing). The default is NO DESCRIBE.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, i1 to i2, for which the data will be analyzed. The default is the maximum span of the series, which is the span of the shortest series if the series are not all of equal length.

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output printed for computed statistics. Control is achieved in a two-stage procedure. First a basic LEVEL of output is specified. Output may then be increased from this level by use of PRINT, or decreased from this level by use of NOPRINT. The keywords for LEVEL and output printed are:

NORMAL: SIGNS
DETAILED: CCM_VALUES and SIGNS

where the reserved words on the right denote:

CCM_VALUES: the values of cross-correlation matrices
SIGNS: the significant signs for CCM's

These reserved words are also keywords for PRINT and NOPRINT. The default for LEVEL is NORMAL, or the level specified in the PROFILE paragraph.
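The SIGNS display summarizes each sample cross correlation by comparing it with the approximate limits of plus or minus 2/SQRT(NOBE) printed in the CCM output. The following Python sketch (not SCA code) illustrates that coding rule. The function names `cross_corr` and `sign_code` are hypothetical, and the estimator in `cross_corr` (full-sample means and standard deviations) is an assumption of this sketch; the exact formula used by the SCA System is not given in this section.

```python
import math


def cross_corr(x, y, lag):
    """Sample correlation between x[t] and y[t - lag], for lag >= 0.

    With lag > 0, series y leads series x. Full-sample means and sums of
    squares are used for scaling (an assumption of this sketch).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((v - mean_x) ** 2 for v in x)
    ss_y = sum((v - mean_y) ** 2 for v in y)
    cov = sum((x[t] - mean_x) * (y[t - lag] - mean_y) for t in range(lag, n))
    return cov / math.sqrt(ss_x * ss_y)


def sign_code(r, nobe):
    """Return '+', '-' or '.' using the 2/sqrt(NOBE) criterion."""
    limit = 2.0 / math.sqrt(nobe)
    if r > limit:
        return "+"
    if r < -limit:
        return "-"
    return "."


# Hypothetical residual series, for illustration only.
r1 = [0.5, -1.2, 0.3, 0.8, -0.4, 1.1, -0.9, 0.2]
r2 = [-0.3, 0.6, -1.0, 0.4, 0.9, -0.2, 1.3, -0.7]
codes = [sign_code(cross_corr(r1, r2, lag), len(r1)) for lag in range(4)]
print(codes)
```

A matrix of such symbols for every pair of series at each lag gives the compact display produced at the NORMAL output level.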

STFMODEL Paragraph

The STFMODEL paragraph is used to specify an STF model or to modify a model specified previously. For each STF model specified in the paragraph, a distinguishing label or name must also be given. A number of different models may be specified in the same SCA session, each having a unique name, and subsequently employed at the user's discretion. Moreover, the label allows a model to be modified (for example, extended) following an estimation.

Syntax for the STFMODEL Paragraph

Brief syntax

STFMODEL NAME IS
MODELS ARE model-name1, model-name2,

Required sentence: MODELS

Full syntax

STFMODEL NAME IS
MODELS ARE model-name1, model-name2, - -
FIXED-PARAMETERS ARE v1, v2, - -
CONSTRAINTS ARE (v1,v2,- - -), - - -, (v1,v2,- -
COVARIANCE IS
DEPENDENCY IS
ENDOGENOUS ARE v1,v2, - -
IDENTITIES ARE - -
SHOW./NO
CHECK. / NO
SIMULATION. / NO SIMULATION.

Required sentence: NAME

Sentences Used in the STFMODEL Paragraph

NAME sentence
The NAME sentence is used to specify a unique name (label) for the STF model specified in the paragraph. This label is used to refer to this model in the SESTIM, SFORECAST and SSIMULATE paragraphs, or if the model is to be modified.

MODEL sentence
The MODEL sentence is used to specify the names of the equations in the STF model. The model name for each equation is that of a label specified previously in a TSMODEL paragraph.

FIXED-PARAMETER sentence
This sentence is used to specify the parameters whose values will be held constant during model estimation, where the v's are the parameter names. The default condition is that no parameters are fixed.

CONSTRAINT sentence
This sentence is used to specify that the parameters within each pair of parentheses will have the same value during model estimation. The default condition is that no parameters are constrained to be equal.

COVARIANCE sentence
The COVARIANCE sentence is used to specify the label of an existing or a new variable where the noise covariance matrix is or will be stored. If the variable is already defined, the covariance matrix will be used as the initial covariance values in estimation and in forecasting. Otherwise, the covariance matrix is calculated from the residual series derived from the specified model and initial parameter estimates. Note that the SCA System designates an internal variable to store the estimated covariance matrix, so the specification of this sentence is optional. The internal variable is overwritten upon each execution of estimation. Specification of this sentence is recommended only when information on multiple models is kept in the SCA workspace.

DEPENDENCY sentence
The DEPENDENCY sentence is used to specify the dependency structure for the noise covariance matrix. Dependent series are grouped within pairs of parentheses. For example, (Y1, Y2, Y3), (Y4, Y5) signifies that the series in the first group (Y1, Y2, Y3) are independent of the series in the second group (Y4, Y5), but the series within each group may be dependent. If all series are independent, the keyword NONE (or NONEXISTENT) can be used to describe the dependency structure. If all series are independent and have the same variance, the keyword DIEQUAL can be used to specify this condition. If this sentence is not specified, all series are assumed to be possibly interdependent.
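The grouping described for the DEPENDENCY sentence amounts to a block-diagonal pattern in the noise covariance matrix. The following Python sketch (not SCA code) builds that pattern for the example (Y1, Y2, Y3), (Y4, Y5). The function name `dependency_mask` is hypothetical, and treating a series that appears in no group as independent of all others is an assumption of this sketch.

```python
def dependency_mask(series, groups):
    """Return a square matrix of booleans for the noise covariance pattern.

    Entry [i][j] is True when the covariance between series i and series j
    is allowed to be nonzero: on the diagonal, or when both series belong
    to the same group.
    """
    group_of = {}
    for index, members in enumerate(groups):
        for name in members:
            group_of[name] = index
    mask = []
    for i, a in enumerate(series):
        row = []
        for j, b in enumerate(series):
            same_group = (a in group_of and b in group_of
                          and group_of[a] == group_of[b])
            row.append(i == j or same_group)
        mask.append(row)
    return mask


names = ["Y1", "Y2", "Y3", "Y4", "Y5"]
mask = dependency_mask(names, [("Y1", "Y2", "Y3"), ("Y4", "Y5")])
for row in mask:
    print("".join("x" if free else "." for free in row))
```

With an empty list of groups the mask reduces to the diagonal, which corresponds to the case where all series are independent.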

SHOW sentence
The SHOW sentence is used to display a summary of the specified model. The default is SHOW.

CHECK sentence
The CHECK sentence is used to check whether all roots of the AR, MA, and denominator polynomials lie outside the unit circle. The default is NO CHECK.

SIMULATION sentence
The SIMULATION sentence is used to specify that the model will be used for simulation purposes. The default is NO SIMULATION.

SESTIM Paragraph

The SESTIM paragraph is used to control the estimation of the parameters of an STF model that has been specified previously.

Syntax for the SESTIM Paragraph

Brief syntax

SESTIM MODEL model-name.

Required sentence: MODEL

Full syntax

SESTIM MODEL
STOP-CRITERIA ARE MAXIT(i),
SPAN IS i1,
OUTPUT IS LEVEL(w),
HOLD RESIDUALS (v1,v2,---),

Required sentence: MODEL

Sentences Used in the SESTIM Paragraph

MODEL sentence
The MODEL sentence is used to specify the name (label) of the STF model to be estimated. The label must be that of a model specified previously in an STFMODEL paragraph.

STOP sentence
The STOP sentence is used to specify the stopping criterion for nonlinear estimation. The argument, i, for the keyword MAXIT specifies the maximum number of iterations (default is i=10), and the argument, r, for the keyword LIKELIHOOD specifies the value of the relative convergence criterion on the likelihood function (default is r=0.001). Estimation iterations will be terminated when the relative change in the value of the likelihood function between two successive iterations is less than or equal to the convergence criterion, or if the maximum number of iterations is exceeded.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, i1 to i2, for which the data will be analyzed. The default is the maximum possible span of the series, which is the span of the shortest series if the series are not all of equal length.

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output printed or plotted for computed statistics. Control is achieved in a two stage procedure. First a basic LEVEL of output is specified, and then output may be increased from this level by use of PRINT, or decreased from this level by use of NOPRINT.
The keywords for LEVEL and output printed or plotted are:

BRIEF: RCORR is not printed
NORMAL: RCORR
DETAILED: RCORR, ITERATION, and CORR

where the reserved words on the right denote:

RCORR: the reduced correlation matrix for the parameter estimates
ITERATION: the parameter and covariance estimates for each iteration
CORR: the correlation matrix for the parameter estimates

These reserved words are also keywords for PRINT and NOPRINT. The default for LEVEL is NORMAL, or the level specified in the PROFILE paragraph.

HOLD sentence
The HOLD sentence is used to specify those values computed for particular functions to be retained in the workspace. Only those statistics desired to be retained need be named.

Values are placed in the variable named in parentheses. The default is that none of the values of the above statistics will be retained after the paragraph. The values that may be retained are:

RESIDUALS: the residual series. The number of variable names specified in this sentence must be the same as the number of series in the model.
FITTED: the one step ahead forecasts (fitted values) for each series. The number of variables specified in this sentence must be the same as the number of series in the model.
COVARIANCE: covariance matrix for the noise series.

SFORECAST Paragraph

The SFORECAST paragraph is used to compute the forecasts of future values of a vector time series based on a specified STF model.

Syntax for the SFORECAST Paragraph

Brief syntax
SFORECAST MODELS ARE model-name1, model-name2,
Required sentence: MODELS

Full syntax
SFORECAST MODELS ARE model-name1, model-name2,
ORIGINS ARE i1, i2,
NOFS ARE i1, i2,
REPLACE v1, v2,
NOPSIWEIGHTS IS i.
OUTPUT IS PRINT(w1,w2,- - -), NOPRINT(w1,w2,- - -),
HOLD FORECASTS(v1,v2,- - -), STD_ERRS(v1,v2,-
Required sentence: MODEL

Sentences Used in the SFORECAST Paragraph

MODEL sentence
The MODEL sentence is used to specify the names (labels) of the STF and univariate models for the series to be forecasted. The labels must be specified in a previous STFMODEL or TSMODEL paragraph.

ORIGINS sentence
The ORIGINS sentence is used to specify the time origins for forecasts. The default is one origin, the last observation.

NOFS sentence
The NOFS sentence is used to specify, for each time origin, the number of time points ahead for which forecasts will be generated. The number of arguments in this sentence must be the same as that in the ORIGINS sentence. The default is 24 forecasts for each time origin.

REPLACE sentence
The REPLACE sentence is used to denote those variables for which the user will specify forecasts. These user supplied forecasts will be used as inputs in the computation of other forecasts. Ordinarily, the SCA System provides forecasts for all output variables according to the user specified models. Note that any variable specified in this sentence must be an output variable in one equation of the STF model and also be used as an input variable in some equation. The values to be used in forecasting must be joined at the end of the original series.

NOPSIWEIGHTS sentence
The NOPSIWEIGHTS sentence is used to specify the number of psi weight matrices to be displayed. See Box and Tiao (1981) for an explanation of these matrices.

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output printed or plotted for computed statistics. Control is achieved by increasing or decreasing the basic level of output by use of PRINT or NOPRINT, respectively. The keyword for PRINT and NOPRINT is:

FORECAST: forecast values for each time origin

The default condition is PRINT(FORECAST).

HOLD sentence
The HOLD sentence is used to specify those values computed for particular functions to be retained in the workspace until the end of the session. Only those statistics desired to be retained need be named. Values are placed in the variable named in parentheses. The default is that none of the values of the above statistics will be retained after the paragraph is executed. The values that may be retained are:

FORECASTS: forecasts at the last time origin

STD_ERRS: standard errors of the forecasts at the last time origin

SSIMULATE Paragraph

The SSIMULATE paragraph is used to generate a series according to a user specified STF model. This capability is useful for both research and the understanding of STF modeling.

Syntax for the SSIMULATE Paragraph

Brief syntax
SSIMULATE MODEL IS model-name.
NOISE IS VARIABLE(v) or distribution(parameters).
NOBS IS i.

Full syntax
SSIMULATE MODEL IS model-name.
VARIABLE IS v.
NOISE IS VARIABLE(v) or distribution(parameters).
NOBS IS i.
SEED IS i.
OMIT IS i.
Required sentences: MODEL, NOISE and NOBS

Sentences Used in the SSIMULATE Paragraph

MODEL sentence
The MODEL sentence is used to specify the name of the STF model to be simulated. The model may be any STF model specified in an STFMODEL paragraph. The SIMULATION sentence must also appear in the STFMODEL paragraph.

VARIABLE sentence
The VARIABLE sentence is used to specify the name(s) of the variable(s) to store the simulation results. If this sentence is not specified, the names of the output variables in the STF model will be used to store the generated data. For the simulation of a single-equation model, only one variable may be specified. For the simulation of a sequence, the number of variables must be the same as the number of equations.

NOISE sentence
The NOISE sentence is used to specify the noise for the simulated time series model. One should specify either the name of the variable containing values to be used as the noise sequence or the distribution for generating the noise. The following distributions can be employed:

N(r1,r2): normal distribution with mean r1 and variance r2
MN(v1,v2): multivariate normal distribution with mean vector v1 and covariance matrix v2. Note that v1 and v2 must be names of defined variables.

NOBS sentence
The NOBS sentence is used to specify the number of observations or cases to be simulated.

SEED sentence
The SEED sentence is used to specify an integer or the name of a variable for starting the random number generation. When a variable is used, the seven-digit value is used as the seed if the variable is not yet defined, or the value of the variable is used if the variable already exists. After the simulation, the variable contains the seed last used. The seed must have no more than 8 digits. The default is

OMIT sentence
The OMIT sentence is used to specify the number of observations to be omitted at the beginning of the simulated data. The default is none.
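The roles of NOISE, NOBS, SEED and OMIT can be pictured with a small Python sketch. It is not SCA code: a univariate AR(1) recursion stands in for an STF model, the function name `simulate_ar1` and the seed value are hypothetical, and it is an assumption of this sketch that the omitted observations are generated in addition to the retained ones. Note that N(r1,r2) is parameterized by the variance, so the standard deviation passed to the generator is the square root of r2.

```python
import random


def simulate_ar1(phi, nobs, mean=0.0, variance=1.0, seed=1234567, omit=0):
    """Simulate Z_t = phi * Z_{t-1} + a_t with a_t ~ N(mean, variance).

    Generates nobs + omit values and drops the first `omit` of them
    (an assumption of this sketch), so start-up effects are discarded.
    """
    rng = random.Random(seed)      # fixed seed gives a reproducible run
    sd = variance ** 0.5           # N(r1, r2) uses the variance r2
    z_prev = 0.0
    out = []
    for _ in range(nobs + omit):
        z_prev = phi * z_prev + rng.gauss(mean, sd)
        out.append(z_prev)
    return out[omit:]


sample = simulate_ar1(phi=0.5, nobs=10, omit=5)
print(len(sample))
```

Rerunning with the same seed reproduces the same series, which is the practical reason for recording the seed.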

REFERENCES

Box, G.E.P. and G.M. Jenkins (1970). Time Series Analysis, Forecasting and Control. San Francisco: Holden-Day.
Brundy, J.M. and D.W. Jorgenson (1974). "The Relative Efficiency of Instrumental Variables Estimators of Systems of Simultaneous Equations". Annals of Economic and Social Measurement 3:
Chow, G.C. and R.C. Fair (1973). "Maximum Likelihood Estimation of Linear Equation Systems with Autoregressive Residuals". Annals of Economic and Social Measurement 2:
Fair, R.C. (1970). "The Estimation of Simultaneous Equation Models with Lagged Endogenous Variables and First Order Serially Correlated Errors". Econometrica 37:
Granger, C.W.J. and P. Newbold (1977). Forecasting Economic Time Series. New York: Academic Press.
Hall, B.H. and R.E. Hall (1981). Time Series Processor, User's Manual (Version. Stanford, California.
Hannan, E.J. (1971). "The Identification Problem for Multiple Equation System with Moving Average Errors". Econometrica 39:
Hanssens, D.M. and L.-M. Liu (1983). "Lag Specification in Rational Distributed Lag Structural Models". Journal of Business and Economic Statistics 1:
Hausman, J.A. (1974). "Full Information Instrumental Variables Estimation of Simultaneous Equations Systems". Annals of Economic and Social Measurement IL4:
Hendry, D.F. (1971). "Maximum Likelihood Estimation of Systems of Simultaneous Regression Equations with Errors Generated by a Vector Autoregressive Process". International Economic Review 12:
Joreskog, K.G. and D. Sorbom (1978). "LISREL IV, Analysis of Linear Structural Relationships by the Method of Maximum Likelihood". SPSS.
Klein, L.R. (1950). "Economic Fluctuations in the United States, ". Cowles Commission Monograph 11. New York: John Wiley and Sons.
Kmenta, J. (1971). Elements of Econometrics. New York: Macmillan Publishing Co.
Kohn, R. (1979). "Identification Results for ARMAX Structures". Econometrica 47:
Liu, L.-M. and D.M. Hanssens (1982). "Identification of Multiple-Input Transfer Function Models". Communications in Statistics - Theory and Methods 11:
Liu, L.-M., G.B. Hudak, G.E.P. Box, M.E. Muller, and G.C. Tiao (1983). The SCA System for Univariate-Multivariate Time Series and General Statistical Analysis. Scientific Computing Associates Corp., Chicago, Illinois.
Liu, L.-M. and G.B. Hudak (1984). "Unified Econometric Model Building Using Simultaneous Transfer Function Equations". Time Series Analysis: Theory and Practice 5. Ed: O.D. Anderson, North-Holland, Amsterdam and New York.
Liu, L.-M. (1987). "Sales Forecasting Using Multi-Equation Transfer Function Models". Journal of Forecasting 6:
Liu, L.-M. (1991). "Use of Linear Transfer Function Analysis in Econometric Time Series Modeling". Statistica Sinica 1:
MACC (1965). GAUSHAUS -- Nonlinear Least Squares. Madison Academic Computing Center, University of Wisconsin, Madison.
Quenouille, M.H. (1957). The Analysis of Multiple Time Series. New York: Hafner Publishing Company.
Reinsel, G. (1979). "FIML Estimation of the Dynamic Simultaneous Equations Model with ARMA Disturbances". Journal of Econometrics 9:
Sargan, J.D. (1961). "The Maximum Likelihood Estimation of Economic Relationships with Autoregressive Residuals". Econometrica 19:
Tiao, G.C. and G.E.P. Box (1981). "Modeling Multiple Time Series with Application". Journal of American Statistical Association 76:
Tiao, G.C. and R.S. Tsay (1983). "Multiple Time Series Modeling and Extended Sample Cross-Correlations". Journal of Business and Economic Statistics 1:
Troll (1980). GREMLIN: Estimation of Equation Systems. MIT Information Processing Services.
Wall, K.D. (1976). "FIML Estimation of Rational Distributed Lag Structural Form Models". Annals of Economic and Social Measurement 5:
Zellner, A. and F. Palm (1974). "Time Series Analysis and Simultaneous Equation Econometric Models". Journal of Econometrics 2:
Zellner, A. (1979). "Statistical Analysis of Econometric Models". Journal of American Statistical Association 74:


CHAPTER 4 MULTIVARIATE TIME SERIES ANALYSIS AND FORECASTING USING VECTOR ARMA MODELS

Box-Jenkins ARIMA (autoregressive-integrated moving average) models are effective characterizations for many univariate (single) time series and are widely used. Such models are effective because they are able to represent the historical "memory" pattern of a single series rather parsimoniously. The models are popular because Box and Jenkins (1970) were able to provide a cohesive framework for model building.

Box-Jenkins time series models can be extended to incorporate the influence of one or more explanatory variables or factors. Intervention analysis incorporates the effects of known external events within the ARIMA model framework. Transfer function models combine information from other related (and possibly stochastic) time series with an ARIMA model of an underlying disturbance to model the behavior of a single series. More information on ARIMA, intervention and transfer function models can be found in Forecasting and Time Series Analysis Using the SCA Statistical System, Volume 1 (Liu and Hudak 1992).

Analyses of business, economic, environmental or industrial time series often require that we consider modeling several variables jointly. In some situations, it is appropriate to assume that the relationship(s) between the input (exogenous) variable(s) and the output (response) variable are unidirectional. That is, we may be able to assume that there is no feedback in a model; in other words, the input affects the output, but the output does not affect the input. For example, it is reasonable to assume that crude oil prices affect the prices of gasoline, but gasoline prices should not affect crude oil prices (at least in the short term). Similarly, temperatures recorded in an attic are affected by the outside temperature, but not conversely.
When the assumption of unidirectional relationships can be justified, it may be appropriate to employ transfer function models for time series analyses. However, in many applications a unidirectional assumption may not be appropriate. Physical laws may dictate the consideration of interrelationships in the case of industrial or environmental data. In the case of business and economic data, there may not be sufficient theoretical understanding to establish any a priori causality. For example, it is difficult to postulate the dynamic relationships between major economic variables such as money supply, interest rate, inflation, producer price and industrial production using only economic theory. In fact, when studying such variables, a primary objective of a time series analysis may be to understand the causal relationships among the variables of the system.

In this document, we consider vector autoregressive-moving average models. Such models are referred to as either vector ARMA or VARMA models. As will be seen, the vector ARMA model is a multivariate analogue of the univariate Box-Jenkins ARIMA model that employs linear operators among the variables within the model. As a result, vector ARMA models may be useful to represent the dynamic relationships between series of interest or to provide more accurate forecasts than those obtained from Box-Jenkins ARIMA models. Quenouille (1957) was one of the earliest advocates of vector time series modeling. More extensive discussions of vector time series and vector ARMA modeling can be found in

Hannan (1970), Tiao and Box (1981), Jenkins and Alavi (1981), Granger and Newbold (1986), Wei (1990), Mills (1990), Pankratz (1991), and references contained therein.

In this section, we will introduce basic notation, the form of vector ARMA models and the assumptions of the model. In Section 1, we consider the implication of vector models for individual series and provide an overview of the vector ARMA modeling strategy. Examples are used in Sections 2, 3 and 4 to illustrate vector ARMA modeling of nonseasonal processes. A seasonal example is presented in Section 5. The capabilities described in the above sections are contained in the SCA-MTS modules. In Section 6 we present how to estimate vector ARMA models in an automatic manner using the SCA System. The latter capabilities are contained in the SCA-EXPERT module.

The vector ARMA model

The basic form and assumptions of the univariate Box-Jenkins autoregressive-moving average (ARMA) model can be extended and applied to multiple time series. The univariate ARMA model has the form

$$Z_t - \phi_1 Z_{t-1} - \phi_2 Z_{t-2} - \cdots - \phi_p Z_{t-p} = C + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q} \tag{1}$$

where $\{a_t\}$ is a sequence of random errors that are independently and identically distributed with a normal distribution having zero mean and constant variance $\sigma^2$. By introducing the backshift operator, $B$, where $BZ_t = Z_{t-1}$; $B^2 Z_t = B(BZ_t) = Z_{t-2}$; and so on, we can rewrite (1) as

$$(1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p) Z_t = C + (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q) a_t \tag{2}$$

or, more simply, as

$$\phi(B) Z_t = C + \theta(B) a_t \tag{3}$$

where $\phi(B) = (1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p)$ and $\theta(B) = (1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q)$.

The constant term, $C$, is related to the mean level of $Z_t$, $\mu$, as $C = (1 - \phi_1 - \phi_2 - \cdots - \phi_p)\mu$. Restrictions need to be imposed on the parameters of $\phi(B)$ and $\theta(B)$ in order to ensure the conditions known as stationarity and invertibility, respectively.
Invertibility is a means to ensure uniqueness in the MA component for a given ACF of a series, while stationarity imposes a behavior on the series that is without any systematic changes in level (trend), variance or strictly periodic behavior (Chatfield 1985). Invertibility and stationarity also permit $\theta(B)$ and $\phi(B)$ to be used as if they represented polynomial operators. If a series is not stationary (e.g., has no fixed mean level), then the autoregressive portion of the model must include a stationarity-inducing operator. Stationarity is usually achieved through the use of one

or more differencing operators. That is, instead of modeling the series $Z_t$, we may instead model the change in the series, $W_t = (1 - B)Z_t = Z_t - Z_{t-1}$.

The vector ARMA model is a direct generalization of the above model. In place of the single series $Z_t$, we consider a vector containing several series. We can represent this vector as $\mathbf{Z}_t = [Z_{1t}, Z_{2t}, \ldots, Z_{kt}]'$. The vector ARMA model can then be written as

$$\mathbf{Z}_t - \boldsymbol{\Phi}_1 \mathbf{Z}_{t-1} - \boldsymbol{\Phi}_2 \mathbf{Z}_{t-2} - \cdots - \boldsymbol{\Phi}_p \mathbf{Z}_{t-p} = \mathbf{C} + \mathbf{a}_t - \boldsymbol{\Theta}_1 \mathbf{a}_{t-1} - \boldsymbol{\Theta}_2 \mathbf{a}_{t-2} - \cdots - \boldsymbol{\Theta}_q \mathbf{a}_{t-q}. \tag{4}$$

In this representation, $\mathbf{C}$ is a vector of constants, the $\boldsymbol{\Phi}$'s and $\boldsymbol{\Theta}$'s are $k \times k$ matrices, and $\{\mathbf{a}_t\}$ is a sequence of random shock vectors that are independently and identically distributed as a k-variate normal distribution with zero mean and covariance matrix $\boldsymbol{\Sigma}$. Equation (4) can be expressed more compactly as

$$(\mathbf{I} - \boldsymbol{\Phi}_1 B - \boldsymbol{\Phi}_2 B^2 - \cdots - \boldsymbol{\Phi}_p B^p)\mathbf{Z}_t = \mathbf{C} + (\mathbf{I} - \boldsymbol{\Theta}_1 B - \boldsymbol{\Theta}_2 B^2 - \cdots - \boldsymbol{\Theta}_q B^q)\mathbf{a}_t \tag{5}$$

or

$$\boldsymbol{\Phi}(B)\mathbf{Z}_t = \mathbf{C} + \boldsymbol{\Theta}(B)\mathbf{a}_t \tag{6}$$

where $\boldsymbol{\Phi}(B) = (\mathbf{I} - \boldsymbol{\Phi}_1 B - \boldsymbol{\Phi}_2 B^2 - \cdots - \boldsymbol{\Phi}_p B^p)$ and $\boldsymbol{\Theta}(B) = (\mathbf{I} - \boldsymbol{\Theta}_1 B - \boldsymbol{\Theta}_2 B^2 - \cdots - \boldsymbol{\Theta}_q B^q)$.

In (5) and (6) above, the term $\mathbf{I}$ represents a $k \times k$ identity matrix and the backshift operator is applied to each component of the vector series $\mathbf{Z}_t$. As in the univariate case, the constant term used above is related to the mean of the vector series as $\mathbf{C} = (\mathbf{I} - \boldsymbol{\Phi}_1 - \boldsymbol{\Phi}_2 - \cdots - \boldsymbol{\Phi}_p)\boldsymbol{\mu}$.

Restrictions are also imposed on the parameter matrices of $\boldsymbol{\Phi}(B)$ and $\boldsymbol{\Theta}(B)$ to ensure stationarity and invertibility as before. Information regarding these restrictions may be found in Tiao and Box (1981), Liu (1986), Wei (1990) and Mills (1990). Differencing of component series is usually employed to achieve a stationary model, although the use of differencing in vector ARMA models is more complicated than its univariate counterpart. Differencing is discussed further in Sections 1.4, 3 and 4. More extensive information can be found in Tiao and Box (1981), Liu (1986) and Wei (1990).
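The recursion implied by the vector ARMA form, and the relation between the constant vector and the mean, can be made concrete with a short Python sketch. It is illustrative only and not part of the SCA System: the function names `matvec` and `varma_filter`, the choice of zero pre-sample values, and the numerical parameter values are all assumptions. With the shocks set to zero, a stationary vector AR(1) recursion settles at the mean vector that satisfies (I - Φ)μ = C.

```python
def matvec(m, v):
    """Multiply a k x k matrix (list of rows) by a length-k vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def varma_filter(c, phis, thetas, shocks):
    """Compute Z_t = C + sum_i Phi_i Z_{t-i} + a_t - sum_j Theta_j a_{t-j}.

    Pre-sample values of Z and a are taken to be zero (an assumption).
    """
    k = len(c)
    z = []
    for t, a_t in enumerate(shocks):
        z_t = [c[i] + a_t[i] for i in range(k)]
        for lag, phi in enumerate(phis, start=1):
            if t - lag >= 0:
                term = matvec(phi, z[t - lag])
                z_t = [z_t[i] + term[i] for i in range(k)]
        for lag, theta in enumerate(thetas, start=1):
            if t - lag >= 0:
                term = matvec(theta, shocks[t - lag])
                z_t = [z_t[i] - term[i] for i in range(k)]
        z.append(z_t)
    return z


# Hypothetical bivariate AR(1) with a constant and no shocks.
phi = [[0.5, 0.2], [0.1, 0.3]]
c = [1.0, 2.0]
path = varma_filter(c, [phi], [], [[0.0, 0.0]] * 200)
mu = path[-1]                       # the level the recursion settles at
check = [mu[i] - matvec(phi, mu)[i] for i in range(2)]  # (I - Phi) mu
print(mu, check)
```

Setting k = 1 reduces the same function to the univariate recursion of equation (1), which is a convenient way to check the sign convention on the moving average terms.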

4.1 Implications of the Vector ARMA Model

In this section, we will consider some special cases of the general vector ARMA model. In this way we may obtain a better understanding of the implications of the vector model. Later in this section, we will provide a brief overview of the relationship of the vector ARMA model to other time series models. For better understanding, we will illustrate all cases using the bivariate vector ARMA model. That is, the illustrations of this section assume the vector series $\mathbf{Z}_t$ consists of two component series, $Z_{1t}$ and $Z_{2t}$.

The vector MA(1) model

As an initial example, consider the stationary vector MA(1) model (i.e., the vector ARMA model of (5) with p=0 and q=1),

$$\mathbf{Z}_t = \mathbf{C} + (\mathbf{I} - \boldsymbol{\Theta} B)\mathbf{a}_t, \tag{7}$$

where

$$\boldsymbol{\Theta} = \begin{bmatrix} \theta_{11} & \theta_{12} \\ \theta_{21} & \theta_{22} \end{bmatrix}.$$

If we use matrix multiplication to expand (7), we obtain the following two equations

$$Z_{1t} = C_1 + a_{1t} - \theta_{11} a_{1(t-1)} - \theta_{12} a_{2(t-1)} \tag{8}$$
$$Z_{2t} = C_2 + a_{2t} - \theta_{21} a_{1(t-1)} - \theta_{22} a_{2(t-1)}.$$

Since there are no autoregressive terms in either equation of (8), $C_1$ represents the mean of $Z_{1t}$ and $C_2$ is that of $Z_{2t}$. We see that each component series depends on the current value of the random shock associated with that series and both elements of the shock vector of the previous time period (i.e., $\mathbf{a}_{t-1}$). In the equation for $Z_{1t}$, we see that $\theta_{11}$ reflects the effect of the prior value of the random shock (error) associated with $Z_{1t}$. The parameter $\theta_{12}$ reflects the effect of the prior value of the shock associated with $Z_{2t}$. The only influence that $Z_{2t}$ has on $Z_{1t}$ is through this term. If this term is not present, the equation reduces to that of a univariate MA(1) model. Analogous statements may be made about the equation for $Z_{2t}$.

Since each component series is only affected by its current shock and the shocks one time period earlier, it can be shown that each individual series behaves as a univariate MA(1) process. The derivation is not provided here.
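One way to see why each component of a vector MA(1) model behaves as a univariate MA(1) process is to look at the autocovariance matrices. The Python sketch below uses the standard results for the model Z_t = C + a_t - Θa_{t-1}: the lag-0 matrix is Σ + ΘΣΘ', the lag-1 matrix Cov(Z_t, Z_{t-1}) is -ΘΣ, and all higher lags are zero. These formulas are not stated in the text above, and the parameter values and helper names are hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def vma1_autocovariances(theta, sigma):
    """Return (Gamma(0), Gamma(1)) for Z_t = C + a_t - Theta a_{t-1}."""
    theta_sigma = matmul(theta, sigma)
    tst = matmul(theta_sigma, transpose(theta))
    n = len(theta)
    gamma0 = [[sigma[i][j] + tst[i][j] for j in range(n)] for i in range(n)]
    gamma1 = [[-theta_sigma[i][j] for j in range(n)] for i in range(n)]
    return gamma0, gamma1


theta = [[0.5, 0.3], [0.2, 0.4]]      # hypothetical parameter values
sigma = [[1.0, 0.0], [0.0, 1.0]]      # uncorrelated unit-variance shocks
gamma0, gamma1 = vma1_autocovariances(theta, sigma)
# Lag-1 autocorrelation of each component series (diagonal elements).
rho1 = [gamma1[i][i] / gamma0[i][i] for i in range(2)]
print(gamma0, gamma1, rho1)
```

Because the autocovariances vanish beyond lag 1, the autocorrelation function of each component cuts off after lag 1, which is the signature of a univariate MA(1) process.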
The vector AR(1) model

As a second example, we now consider the individual component equations for the stationary vector AR(1) model. The matrix form of this model is

$$(\mathbf{I} - \boldsymbol{\Phi} B)\mathbf{Z}_t = \mathbf{C} + \mathbf{a}_t, \tag{9}$$

where

$$\boldsymbol{\Phi} = \begin{bmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{bmatrix}.$$

Using matrix multiplication to expand (9), we find the individual components of this model are

$$Z_{1t} - \phi_{11} Z_{1(t-1)} - \phi_{12} Z_{2(t-1)} = C_1 + a_{1t} \tag{10}$$
$$Z_{2t} - \phi_{21} Z_{1(t-1)} - \phi_{22} Z_{2(t-1)} = C_2 + a_{2t}.$$

We see that each component series is dependent on weighted realizations of each series in the prior period and a current random shock. If we rearrange the terms of (10) slightly, we obtain

$$Z_{1t} = C_1 + \phi_{11} Z_{1(t-1)} + \phi_{12} Z_{2(t-1)} + a_{1t} \tag{11}$$
$$Z_{2t} = C_2 + \phi_{21} Z_{1(t-1)} + \phi_{22} Z_{2(t-1)} + a_{2t}.$$

In this form we may clearly see the underlying bivariate linear model. In each equation of (11), the current observation is expressed in a linear form with respect to its "input" variables (i.e., the components of the vector series in the immediately preceding time period). As in the case of the vector MA(1) model, the effect of one series on the other is through a past value, here the prior observed values (i.e., $\mathbf{Z}_{t-1}$).

Unlike the vector MA(1) case, the constant term does not represent the mean vector of $\mathbf{Z}_t$. In this case the mean vector is related to the constant term according to $\boldsymbol{\mu} = [\mathbf{I} - \boldsymbol{\Phi}]^{-1}\mathbf{C}$. Hence the mean of $Z_{1t}$ is

$$\mu_1 = \{(1 - \phi_{11})(1 - \phi_{22}) - \phi_{12}\phi_{21}\}^{-1}\{(1 - \phi_{22})C_1 + \phi_{12}C_2\}.$$

The mean of $Z_{2t}$ can be computed similarly.

In the above MA(1) model, we noted that individual series behaved as MA(1) processes. Since the general form of the component series of the vector AR(1) process is somewhat similar to that of the vector MA(1) case, we may wonder if each component series may be represented as an AR(1) process when modeled separately. The behavior of the individual series may indicate that this is not the case. In fact, we cannot be entirely sure of the exact model that component series may follow if modeled separately. We can only be certain of the maximum order of the models.
It can be shown (but not provided here) that when modeled separately, the model for each component series can be either a univariate AR(1) or ARMA(2,1) process, depending upon the form of $\boldsymbol{\Phi}$ and possible parameter cancellation. In general, the maximum order for a component series of a vector AR(1) model (when modeled separately) is an ARMA(k, k-1), where k represents the number of component series.

The vector ARMA(1,1) model

The individual series representations corresponding to a stationary vector ARMA(1,1) model are a direct combination of those presented in the above cases. The matrix form of the model is

$$(\mathbf{I} - \boldsymbol{\Phi} B)\mathbf{Z}_t = \mathbf{C} + (\mathbf{I} - \boldsymbol{\Theta} B)\mathbf{a}_t, \tag{12}$$

where

$$\boldsymbol{\Phi} = \begin{bmatrix} \phi_{11} & \phi_{12} \\ \phi_{21} & \phi_{22} \end{bmatrix} \quad \text{and} \quad \boldsymbol{\Theta} = \begin{bmatrix} \theta_{11} & \theta_{12} \\ \theta_{21} & \theta_{22} \end{bmatrix}.$$

Using matrix multiplication to expand (12), we find the individual components of this model are

$$Z_{1t} - \phi_{11} Z_{1(t-1)} - \phi_{12} Z_{2(t-1)} = C_1 + a_{1t} - \theta_{11} a_{1(t-1)} - \theta_{12} a_{2(t-1)} \tag{13}$$
$$Z_{2t} - \phi_{21} Z_{1(t-1)} - \phi_{22} Z_{2(t-1)} = C_2 + a_{2t} - \theta_{21} a_{1(t-1)} - \theta_{22} a_{2(t-1)}.$$

Both series in (13) are dependent on the vector of realizations and the shock vector of the previous time period. The constant term has the same interpretation as in the vector AR(1) case. We can also see that if we were to model either of the series separately, the univariate ARMA model used can be more complex than an ARMA(1,1).

A nonstationary vector model

The previous three examples were all examples of stationary vector models. In practice, it is more common to encounter series that are not stationary (e.g., may exhibit no fixed mean level, cyclic patterns, nonhomogeneous variance, and so on). Differencing is commonly employed in a univariate analysis to achieve stationarity. In the vector setting, the nonstationary behavior of one series may be the direct result of the nonstationarity of another series. For example (see page 34 of Liu, 1986), both of the following series will exhibit nonstationary behavior

$$Z_{1t} = \phi Z_{2(t-1)} + a_{1t}$$
$$(1 - B) Z_{2t} = (1 - \theta B) a_{2t}.$$

In such a case, if we model the component series individually, we may find it necessary to use differencing in each model. However, when modeled jointly, differencing is not required for the first series, and over-differencing of series can lead to complications in model fitting. We can see that we should approach differencing of vector series more cautiously.

A useful univariate ARIMA model is the ARIMA(0,1,1) model. Such a model is roughly equivalent to that associated with simple exponential smoothing, and the model has proven useful in many applications.
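The vector generalization of this model can be obtained from the vector MA(1) model in which all component series are differenced, or from the vector ARMA(1,1) model above with $\boldsymbol{\Phi} = \mathbf{I}$. In the latter case, we have

$$(\mathbf{I} - \mathbf{I}B)\mathbf{Z}_t = \mathbf{C} + (\mathbf{I} - \boldsymbol{\Theta} B)\mathbf{a}_t, \tag{14}$$

where

$$\boldsymbol{\Theta} = \begin{bmatrix} \theta_{11} & \theta_{12} \\ \theta_{21} & \theta_{22} \end{bmatrix}.$$

Expanding (14) we obtain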

$$Z_{1t} - Z_{1(t-1)} = C_1 + a_{1t} - \theta_{11} a_{1(t-1)} - \theta_{12} a_{2(t-1)} \tag{15}$$
$$Z_{2t} - Z_{2(t-1)} = C_2 + a_{2t} - \theta_{21} a_{1(t-1)} - \theta_{22} a_{2(t-1)}.$$

The components of the constant vector now represent the trend of each series. All other interpretations of (15) are the same as those in the vector MA(1) case. Since there are no autoregressive parameters in this model, a component series behaves as an ARIMA(0,1,1) process when it is modeled separately.

Relationship of vector ARMA models to transfer function models

Vector ARMA models are closely related to other models used to represent time series data, such as transfer function models and econometric models. In this section, we explore the relationship between vector ARMA models and unidirectional transfer function models. The relationships between vector ARMA models and econometric models are discussed in Section 1.6.

In the general form of the vector ARMA model given in (4) or (6), all elements of $\mathbf{Z}_t$ are related to all elements of $\mathbf{Z}_{t-1}, \mathbf{Z}_{t-2}, \ldots, \mathbf{Z}_{t-p}$ and there is the potential for feedback relationships between all the component series. For the bivariate vector models examined thus far, we see this most clearly in the individual series equations (10) and (11) of the vector AR(1) case and those of (13) for the vector ARMA(1,1) case. However, suppose that $\phi_{12} = 0$ in the equations of (11). We have the following (the constant term $\mathbf{C}$ is set to zero for the purpose of brevity)

$$Z_{1t} = \phi_{11} Z_{1(t-1)} + a_{1t} \tag{16}$$
$$Z_{2t} = \phi_{21} Z_{1(t-1)} + \phi_{22} Z_{2(t-1)} + a_{2t}.$$

Here $Z_{2t}$ is affected by prior values of both $Z_{1t}$ and $Z_{2t}$, but $Z_{1t}$ is affected only by its own past. Hence we have the basis of a unidirectional relationship. Now $Z_{1t}$ is correlated with $a_{2t}$ (since the equation for $Z_{1t}$ involves $a_{1t}$, and the elements of the shock vector, $\mathbf{a}_t$, have a correlation structure). However, the equation for $Z_{2t}$ involves $Z_{1(t-1)}$, which is independent of $a_{2t}$.
Thus, if we collect like terms, we obtain the transfer function

$$Z_{2t} = \frac{\phi_{21}}{1 - \phi_{22}B} Z_{1(t-1)} + \frac{1}{1 - \phi_{22}B} a_{2t}.$$

In general, if the series that comprise $\mathbf{Z}_t$ can be arranged so that all of the coefficient matrices (i.e., all $\boldsymbol{\Phi}$'s and $\boldsymbol{\Theta}$'s) are lower triangular (or similarly, upper triangular), then the vector ARMA model can be written as a transfer function model.

To illustrate how a transfer function model may be obtained, consider the vector MA(1) model with $\theta_{12} = 0$ (and also $\mathbf{C} = \mathbf{0}$ for convenience). From (8) we have

$$Z_{1t} = a_{1t} - \theta_{11} a_{1(t-1)} = (1 - \theta_{11}B) a_{1t} \tag{17}$$
$$Z_{2t} = a_{2t} - \theta_{21} a_{1(t-1)} - \theta_{22} a_{2(t-1)} = (-\theta_{21}B) a_{1t} + (1 - \theta_{22}B) a_{2t}.$$

If we use the first equation to express $a_{1t}$ in terms of $Z_{1t}$ and then substitute for $a_{1t}$ in the second equation, we obtain

$$Z_{2t} = \frac{-\theta_{21}B}{(1 - \theta_{11}B)} Z_{1t} + (1 - \theta_{22}B) a_{2t}. \tag{18}$$

This is not yet in the form of a transfer function model as $Z_{1(t-1)}$ and $a_{2(t-1)}$ are correlated, as noted above. However, we can express $a_{2t}$ as a function of $a_{1t}$ by writing

$$a_{2t} = \beta a_{1t} + \varepsilon_t \tag{19}$$

where $a_{1t}$ and $\varepsilon_t$ are independent. Now $Z_{1t}$ and $\varepsilon_t$ are independent. Substituting (19) into (18), replacing $a_{1t}$ by $Z_{1t}/(1 - \theta_{11}B)$, and collecting like terms yields

$$Z_{2t} = \frac{\omega_0 + \omega_1 B}{1 - \delta B} Z_{1t} + (1 - \theta_{22}B)\varepsilon_t \tag{20}$$

where $\omega_0 = \beta$, $\omega_1 = -(\beta\theta_{22} + \theta_{21})$, and $\delta = \theta_{11}$. This then is the corresponding transfer function model. It is interesting to note that the input series, $Z_{1t}$, has a contemporaneous effect on $Z_{2t}$ when the model for $Z_{2t}$ is written in the transfer function form of (20), depending on the value of $\beta$. This possible effect is not readily apparent from the vector model, nor from the equations of (17) and (18).

As noted above, if the series of $\mathbf{Z}_t$ can be arranged so that the coefficient matrices are lower triangular, then a transfer function model may be obtained. More generally, if the series can be arranged so that the $\boldsymbol{\Phi}$'s and $\boldsymbol{\Theta}$'s are all lower block triangular, then we may obtain a more general transfer function relationship. In this case, the input vector series may have feedback relationships, and the output vector series may also have such structure. To illustrate, consider the following trivariate vector MA(1) model

$$\begin{bmatrix} Z_{1t} \\ Z_{2t} \\ Z_{3t} \end{bmatrix} = \left( \mathbf{I} - \begin{bmatrix} \theta_{11} & \theta_{12} & 0 \\ \theta_{21} & \theta_{22} & 0 \\ \theta_{31} & \theta_{32} & \theta_{33} \end{bmatrix} B \right) \begin{bmatrix} a_{1t} \\ a_{2t} \\ a_{3t} \end{bmatrix}. \tag{21}$$

Since $\theta_{13} = \theta_{23} = 0$, there is no feedback relationship from $Z_{3t}$ to either $Z_{1t}$ or $Z_{2t}$. Hence $Z_{1t}$ and $Z_{2t}$ form a vector of input series to $Z_{3t}$. These two series are allowed to have feedback relationships (in this case the relationships may be described by a bivariate vector MA(1) model). If more series are part of $\mathbf{Z}_t$, the vector of inputs could consist of more series.
Similarly, a vector of output series may exist. However, the component series comprising this output vector may have feedback relationships among themselves. Such vector ARMA models are related to the econometric simultaneous equation model. Further information on the relationships may be found in Zellner and Palm (1974), and Liu and Hudak (1985).
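The passage from the vector MA(1) model with θ12 = 0 to the transfer function form (20) can be checked numerically. The Python sketch below is an illustration under stated assumptions, not SCA code: it generates shocks with a2t = β a1t + εt, builds Z1t and Z2t from equations (17), and then rebuilds Z2t from (20) multiplied through by (1 - δB), taking ω0 = β, ω1 = -(βθ22 + θ21) and δ = θ11. The function names, parameter values, seed, and the use of zero pre-sample values are all hypothetical choices.

```python
import random


def lagged(series, t, k):
    """Value of series at time t - k, with zero pre-sample values."""
    return series[t - k] if t - k >= 0 else 0.0


def transfer_form_gap(theta11, theta21, theta22, beta, n=50, seed=1):
    """Largest difference between Z2 from (17) and Z2 rebuilt from (20)."""
    rng = random.Random(seed)
    a1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    a2 = [beta * x + e for x, e in zip(a1, eps)]          # equation (19)
    # Vector MA(1) with theta12 = 0, equations (17).
    z1 = [a1[t] - theta11 * lagged(a1, t, 1) for t in range(n)]
    z2 = [a2[t] - theta21 * lagged(a1, t, 1) - theta22 * lagged(a2, t, 1)
          for t in range(n)]
    # Transfer function form (20), multiplied through by (1 - delta B):
    # Z2_t = delta Z2_{t-1} + w0 Z1_t + w1 Z1_{t-1}
    #        + (1 - delta B)(1 - theta22 B) eps_t
    w0, w1, delta = beta, -(beta * theta22 + theta21), theta11
    z2_tf = []
    for t in range(n):
        prev = z2_tf[t - 1] if t >= 1 else 0.0
        value = (delta * prev + w0 * z1[t] + w1 * lagged(z1, t, 1)
                 + eps[t] - (theta22 + delta) * lagged(eps, t, 1)
                 + delta * theta22 * lagged(eps, t, 2))
        z2_tf.append(value)
    return max(abs(x - y) for x, y in zip(z2, z2_tf))


gap = transfer_form_gap(theta11=0.6, theta21=0.3, theta22=0.4, beta=0.5)
print(gap)
```

The two constructions agree up to floating-point rounding, and the term ω0 Z1t makes the contemporaneous effect of the input visible whenever β is nonzero.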

A clear difference between unidirectional transfer function models and vector ARMA models is in the assumption of causality. A unidirectional transfer function model assumes one-way causality. A vector model makes no such assumption since it can represent time series with feedback effects resulting from two-way influences.

There is also another important difference between these two model classes. A vector ARMA model does not contain any explicit contemporaneous relationships among the variables in the data vector. All individual equations derived directly from the vector model involve only lagged variables. As a result, vector models are reduced form models (in which a variable is expressed solely in terms of an error term and predetermined values of the variables of the system). Transfer function models can contain contemporaneous relationships, and consequently can be structural models. This difference is also discussed below.

Relationship of vector models to econometric models

The vector ARMA model is related to the simultaneous equation model that forms the basis of most econometric models. To better understand this relationship, a brief overview of terminology and some concepts related to simultaneous equation systems may be in order.

Two types of variables are commonly employed in a system of equations: endogenous variables, which are determined simultaneously within the system and which the model attempts to explain (i.e., those comprising all dependent variables), and exogenous variables, which are not determined within the system but influence the endogenous variables. Structural models attempt to characterize the underlying theory driving each endogenous variable through a set (or sequence) of equations that involve both endogenous and exogenous variables.
In order to construct a structural model, theory is employed to select what variables are to be included in the system of equations, and how they are regarded (endogenous or exogenous). One significant problem with the development of a structural model is the soundness of the theory on which it is based.

Reduced form models are closely related to structural models. A reduced form model is one in which all endogenous variables are expressed in terms of an error term and predetermined variables. A predetermined variable is one whose value is determined outside the system of equations or prior to the current time period. Hence all exogenous variables and all lagged endogenous variables are considered predetermined variables.

An important concept associated with structural form models is referred to as identification (or identifiability). An equation within the model is called identifiable only when that equation can be distinguished from all others in the system. In this way the coefficients of the system can be estimated. All identifiable structural form models have a corresponding reduced form model, but the converse is not necessarily true. For this reason, reduced form models may not be of use for the purpose of a structural analysis (or for the purpose of interpretation). However, they may be employed as an intermediate step toward a structural analysis.

A vector ARMA model is a reduced form model since it expresses the dependent (endogenous) variables in terms of only their (predetermined) lagged values. Moreover, the vector model is an unrestricted reduced form model since all statistically significant variables are included in each equation of the system. Vector ARMA models estimate the parameters in a dynamic system without imposing any a priori restrictions on the presence of variables in an equation. One important consequence of this approach is that the relationships among the variables are determined by the data themselves, and questionable theory is not used. Thus the results may be more objective.

Vector ARMA modeling may be most appropriate for forecasting future values for some or all of the variables that comprise the vector series. However, we cannot expect to be able to use these models for an immediate relationship interpretation or structural analysis, although useful information may be revealed in such an analysis. Moreover, vector modeling may provide directions for subsequent structural analyses. Additional information on this point may be found in Mills (1990). We will now consider the strategy that will be used in vector ARMA modeling.

Model building strategy for vector ARMA models

Box and Jenkins (1970) proposed an iterative procedure for modeling a univariate time series. This modeling approach consists of (a) tentative model identification, (b) model estimation, and (c) diagnostic checking. This iterative strategy is also applicable to vector ARMA modeling. Statistics useful for identification and diagnostic checking are generalized for the vector case, and estimation procedures are extended for the vector ARMA model. The modeling procedure and statistics are explained further in Sections 2 and 3.

An important facet of the vector ARMA model of (4) and (6) is the number of parameters involved. A univariate ARMA(p,q) of the form of (4) contains $p+q+1$ parameters (this includes the constant term, but excludes the variance of the error sequence, $\sigma^2$). In the vector case, an AR or MA "parameter" actually represents a $k \times k$ matrix of parameters. As a result, the vector ARMA model given in (4) contains $k^2(p+q) + k$ parameters (again excluding the covariance matrix).
Even if the underlying model is an AR(1) or MA(1) model, this means there may be 6 parameters in a model for 2 series, 12 parameters for 3 series, 20 for 4 series, 30 for 5 series, and so on. Hence it is an important modeling concern to reduce the number of parameters where possible. This can be accomplished, to various degrees, by some of the following measures (see Tiao and Box, 1981):

(1) giving the strongest consideration to models of low orders (i.e., small values of p and q),
(2) reducing the number of parameters (i.e., zeroing out insignificant terms) after model fitting,
(3) constraining or fixing model parameters in some other ways (e.g., setting equality constraints on some parameters), and
(4) a priori simplification of a system based on knowledge of the system (with necessary checks made of the simplification used).

Other schemes for model simplification have been proposed. A brief discussion of some of these methods is provided in Section 5. Methods for the simplification of the vector ARMA model remain a fundamental area of research at this time (see for example, Tiao and Tsay 1989, and Tsay 1989a, 1989b).
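The parameter counts quoted above follow directly from the expression $k^2(p+q) + k$. The short Python sketch below reproduces them; the function name is chosen here for illustration and is not part of the SCA System.

```python
def varma_parameter_count(k: int, p: int, q: int) -> int:
    """Number of parameters in a k-series vector ARMA(p, q) model:
    k*k entries for each of the p + q coefficient matrices, plus k constants.
    The error covariance matrix is excluded, as in the text."""
    return k * k * (p + q) + k


# A vector MA(1) (or AR(1)) model, p + q = 1, for 2 to 5 series.
counts = {k: varma_parameter_count(k, p=0, q=1) for k in range(2, 6)}
print(counts)  # {2: 6, 3: 12, 4: 20, 5: 30}
```

With k = 1 the expression reduces to the univariate count p + q + 1, which is a quick consistency check on the formula.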

A Simulated Example

To illustrate some aspects of vector ARMA modeling, we will first consider a simulated set of data. The data consist of two series, simulated according to the model

$$Z_t = C + (I - \Theta B)\,a_t,$$

with $C = \ldots$, $\Theta = \ldots$, and $\Sigma = \ldots$ (of these entries only the value 25.0 survives). There are 250 observations in each series. The data are listed in Table 1 and are stored in the SCA workspace under the names Z1 and Z2. Plots of the series (created using the graphics program SCAGRAF) are given in Figure 1.
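For readers who wish to generate comparable data outside the SCA System, the following Python sketch simulates a bivariate series from $Z_t = C + a_t - \Theta a_{t-1}$. All numerical values (C_HYP, THETA_HYP, CHOL_HYP) are hypothetical and chosen only for illustration; they are not the values used to produce Z1 and Z2. The function name and the use of a lower-triangular factor to induce correlated shocks are likewise assumptions of this sketch.

```python
import random


def simulate_vma1(n, c, theta, chol, seed=0):
    """Simulate n observations of a bivariate vector MA(1) series,
    Z_t = C + a_t - Theta a_{t-1}, with a_t = L e_t and e_t ~ N(0, I).
    `chol` is the lower-triangular 2 x 2 factor L, so Cov(a_t) = L L'."""
    rng = random.Random(seed)

    def shock():
        e1, e2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        return (chol[0][0] * e1, chol[1][0] * e1 + chol[1][1] * e2)

    prev = shock()  # pre-sample shock a_0
    series = []
    for _ in range(n):
        a = shock()
        z1 = c[0] + a[0] - (theta[0][0] * prev[0] + theta[0][1] * prev[1])
        z2 = c[1] + a[1] - (theta[1][0] * prev[0] + theta[1][1] * prev[1])
        series.append((z1, z2))
        prev = a
    return series


# Hypothetical illustration values (not those of the manual).
C_HYP = (10.0, 20.0)
THETA_HYP = [[0.5, 0.2], [-0.3, 0.4]]
CHOL_HYP = [[1.0, 0.0], [0.5, 1.0]]

data = simulate_vma1(250, C_HYP, THETA_HYP, CHOL_HYP, seed=42)
mean_z1 = sum(row[0] for row in data) / len(data)
mean_z2 = sum(row[1] for row in data) / len(data)
print(len(data), round(mean_z1, 1), round(mean_z2, 1))
```

Each simulated series should oscillate about its constant, which is the behavior described for Figure 1.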

Table 1 Simulated vector MA(1) data

Series 1: Z1

Series 2: Z2

Figure 1 Plot of simulated vector MA(1) data

We observe that each series oscillates about a fixed mean level, a typical manifestation of a low-order univariate MA process. Based on the discussion in Section 1.1, we are open to a vector MA(q) representation for the vector series, but may be unsure of the initial order to choose for q. We will now discuss statistics that aid in this decision.

Sample cross correlation matrices

An effective tool in the identification of a univariate ARMA model is the autocorrelation function, ACF. The sample ACF of a single non-seasonal series:

(1) has an initial sequence of positive values that decays slowly if the series is not stationary,
(2) displays a cut-off pattern after a low-order lag, q, (that is, values after this lag are statistically indistinguishable from 0) for a stationary series that can be modeled as a pure MA(q) process, and
(3) decays as a mixture of damping exponential or sine functions when the series is stationary and can be modeled as either a pure AR(p) or mixed ARMA(p,q) process.

It is clear that we will want to consider the ACF of each individual component of our vector series. We will also want to consider an analogue to the ACF that measures the effect of one series on another. This analogue for two series is the cross correlation function, CCF. The CCF is discussed in more detail in Chapter 8 of Forecasting and Time Series Analysis Using the SCA Statistical System, Volume 1. The values of the CCF for two series, X and Y, are presented for negative and positive lags. In one case the CCF reflects the effect of the past of X on the current Y, and in the other case the CCF reflects the effect of the past of Y on the current X.

In the case of two series, we can use the ACF of each series and the CCF of both series to determine if the vector comprised of the two series can be represented by a low-order MA(q) model. If the vector series is stationary, then it can be represented by a low-order MA(q) model if:

(1) the ACF of each series cuts off after q lags, and
(2) the CCF cuts off after q lags in either direction (i.e., for both negative and positive lags).

The same condition is true for a vector comprised of any number of series. For more than two series, we then need to examine the ACF of each individual series and the CCF for all distinct pairs of component series. Although this allows us to identify a low-order MA(q) process, the mechanics of the procedure can become unwieldy as the number of series increases.
We can resolve this problem if we jointly compute the ACF for all series and the CCF for all pairs of series, then display these values in matrix form. In this way we compute and display the cross correlation matrices, CCM, for the vector series. In the case of a vector comprised of two series, X and Y, the CCM for lag $l$ is

$$\begin{bmatrix} \rho_{11} & \rho_{12} \\ \rho_{21} & \rho_{22} \end{bmatrix},$$

where $\rho_{11}$ and $\rho_{22}$ are the lag $l$ values of the ACF for X and Y, respectively; $\rho_{12}$ is the lag $l$ value of the CCF when Y leads X; and $\rho_{21}$ is the lag $l$ value of the CCF when X leads Y. The values of the CCM for more series have a similar interpretation. In this way, the values associated with the same lag of all ACF and CCF are presented jointly in a compact form. Moreover, we can identify a low-order vector MA(q) model if the CCM cuts off after lag q. The CCM cuts off if the matrices consist of insignificant values after lag q. A matrix of values is considered to be insignificant only if all values of the matrix are not significant.
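The construction of the CCM can be sketched in a few lines of Python. This is an illustration only, not the algorithm of the SCA System: it assumes the common sample estimator with divisor n, and it marks a value as significant when its magnitude exceeds 2/sqrt(n), the approximate criterion attributed to Bartlett (1946). The function names and the demonstration series are hypothetical.

```python
import math
import random


def cross_correlation_matrices(series, max_lag):
    """Return {lag: k x k matrix} for a list of k equal-length series.
    Element (i, j) estimates corr(series[i][t], series[j][t - lag]),
    i.e. the cross correlation when series j leads series i."""
    k, n = len(series), len(series[0])
    means = [sum(s) / n for s in series]
    dev = [[x - m for x in s] for s, m in zip(series, means)]
    var0 = [sum(d * d for d in dv) / n for dv in dev]
    ccm = {}
    for lag in range(0, max_lag + 1):
        matrix = []
        for i in range(k):
            row = []
            for j in range(k):
                cov = sum(dev[i][t] * dev[j][t - lag] for t in range(lag, n)) / n
                row.append(cov / math.sqrt(var0[i] * var0[j]))
            matrix.append(row)
        ccm[lag] = matrix
    return ccm


def to_symbols(matrix, n):
    """Replace each value by '+', '-' or '.' using the limit 2/sqrt(n)."""
    limit = 2.0 / math.sqrt(n)
    return [["+" if v > limit else "-" if v < -limit else "." for v in row]
            for row in matrix]


# Demonstration with two artificial series in which y leads x by one period.
rng = random.Random(1)
y = [rng.gauss(0.0, 1.0) for _ in range(500)]
x = [0.0] + y[:-1]                      # x[t] = y[t-1]
ccm = cross_correlation_matrices([x, y], max_lag=3)
symbols = {lag: to_symbols(m, len(x)) for lag, m in ccm.items()}
for lag in (1, 2, 3):
    print(lag, symbols[lag])
```

In the demonstration the (1,2) element of the lag 1 matrix is close to 1 because series 2 (y) leads series 1 (x) by one period, which matches the reading of $\rho_{12}$ given above.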

Since we are interested in spotting a cut-off pattern for values of the CCM, a display of the actual values may be of less interest to us than whether each value is significantly different from zero. Just as a plot of the ACF is more useful than listing the values of the ACF, it would be more beneficial if we can condense the information of the CCM further to express significance and insignificance in a way that is visually more striking.

Following Tiao and Box (1981), an effective summary of the pattern of the correlation structure is provided if indicator symbols are used to replace the numerical values of the CCM. These symbols are (+, -, .) where the symbol '+' represents a positively significant value, the symbol '-' represents a negatively significant value, and the symbol '.' represents an insignificant value. The criterion for the significance of a value in the CCM is based on the work of Bartlett (1946). More complete information on the mathematical representation of the CCM and the criterion for significance can be found in Tiao and Box (1981), Liu (1986) and Wei (1990).

We can compute and display the CCM for the simulated vector series if we enter

-->CCM Z1,Z2. MAXLAG IS 12.

We limit the number of lags to be computed by including the MAXLAG sentence (the default limit is 24 lags). We obtain the following.

    TIME PERIOD ANALYZED TO 250
    EFFECTIVE NUMBER OF OBSERVATIONS (NOBE)
    SERIES NAME MEAN STD. ERROR
    1 Z1
    2 Z2
    NOTE: THE APPROX. STD. ERROR FOR THE ESTIMATED CORRELATIONS BELOW IS (1/NOBE**.5) =
    SAMPLE CORRELATION MATRIX OF THE SERIES
    SUMMARIES OF CROSS CORRELATION MATRICES USING +,-,., WHERE
    + DENOTES A VALUE GREATER THAN 2/SQRT(NOBE)
    - DENOTES A VALUE LESS THAN -2/SQRT(NOBE)
    . DENOTES A NON-SIGNIFICANT VALUE BASED ON THE ABOVE CRITERION
    BEHAVIOR OF VALUES IN (I,J)TH POSITION OF CROSS CORRELATION MATRIX OVER ALL OUTPUTTED LAGS WHEN SERIES J LEADS SERIES I
    CROSS CORRELATION MATRICES IN TERMS OF +,-,.
    LAGS 1 THROUGH

    LAGS 7 THROUGH

The output contains summary information for each series and the cross correlation matrices in terms of the (+, -, .) symbols. The criterion used to designate a symbol is also provided. The cross correlation information is provided in two forms. We have already explained the k × k matrix form. In addition, the information is also provided within one matrix of symbols. The (i,j) component of this matrix summarizes the information related to that element for all lags. Hence the diagonal elements summarize the significance of the ACF for each series. Each off-diagonal element summarizes the significance of a cross correlation with one component series leading another.

We see that all elements of the lag 1 matrix are significant and, except for two isolated elements, the values for all other matrices are not significant. This cut-off structure is indicative of a vector MA(1) model. If we are concerned about the terms in the lag 7 and lag 8 matrices that are significant, or otherwise wish to obtain the values of the CCM, we can include the sentence

    OUTPUT PRINT(CCM).

in the CCM paragraph. If this sentence is included, then the following additional output is provided immediately before the (+, -, .) summaries.

    SAMPLE CROSS CORRELATION MATRICES FOR THE ORIGINAL SERIES.
    THE (I,J) ELEMENT OF THE LAG L MATRIX IS THE ESTIMATE OF THE LAG L CROSS CORRELATION WHEN SERIES J LEADS SERIES I
    [Listing of the cross correlation matrix for each of the 12 lags]

Here, an element of a CCM is designated insignificant if its magnitude is less than 2/SQRT(NOBE), which is twice the approximate standard error for the estimated correlations. Although the values of the lag 7 and 8 matrices exceed this criterion, we may attribute this to sampling fluctuation (or in this case simulation fluctuation). Hence the data support a vector MA(1) representation. We will now specify and estimate this model. The methods for the identification of a pure vector AR(p) model or mixed vector ARMA(p,q) model will be discussed in Sections 3 and 4, respectively.

Specification of a vector ARMA model

The specification of a univariate ARMA model in the SCA System is a virtual transcription of the model equation (see Section 5.1.4 and related sections of Forecasting and Time Series Analysis Using the SCA Statistical System, Volume 1). The same basic structure carries over for the specification of vector ARMA models. Hence we wish to "transcribe" the general equation given in (4). For this example, we wish to specify the model

$$Z_t = C + (I - \Theta B)\,a_t \tag{22}$$

From (5) or (6), we recognize the need to define the series that comprise $Z_t$ and any differencing operators that may have been employed, since the same level of differencing may not have been used for all series (see Section 1.4 for a brief discussion). In addition, we may wish to restrict, reduce or otherwise constrain the parameters of our model (see Section 1.7). We are able to conveniently specify a vector ARMA model, while addressing these important issues, through the MTSMODEL paragraph.

To illustrate the MTSMODEL paragraph, we can specify the model of (22) as follows

-->MTSMODEL MA1. MODEL IS SERIES = CONSTANT +
--> SERIES ARE Z1,Z2. CONSTRAINTS ARE THETA(CTHETA).

The above model specification is similar to its univariate counterpart TSMODEL. A unique model name is designated (here MA1) and the MODEL sentence is used to specify the form of the model. Two additional sentences are included. The SERIES sentence is a required sentence used to specify the components of the vector series. The CONSTRAINT sentence is used to designate a possible set of constraints for the parameter matrices. In the later versions of the SCA System (Release V.2 and later), the MTSMODEL paragraph automatically creates the constraint matrices (with the letter "C" added to the beginning of the name of the parameter matrix) and the user does not need to specify the constraint matrices if the default names are used. The MODEL, SERIES and CONSTRAINT sentences are now described in more detail.

The MODEL sentence

The syntax and use of the MODEL sentence here is virtually identical to that for the specification of a univariate Box-Jenkins ARMA model in the SCA System (see Forecasting and Time Series Analysis Using the SCA Statistical System, Volume 1). There are only two noteworthy differences.

The dissimilarity that is most apparent in the MODEL sentence of the above MTSMODEL paragraph is the use of the label SERIES in lieu of a variable name. The word 'SERIES' is a reserved word in the MODEL sentence and serves as a placeholder for a vector series. In this way the use of the MODEL sentence is the same, regardless of the number of series that may be involved. Of course, the components of the vector series represented by this label must be defined. The SERIES sentence is used for this purpose, as discussed below.
A second difference, more subtle than the use of the 'SERIES' label, is the way that the SCA System handles coefficient matrices. In the context of the current example, CONSTANT represents a 2 × 1 vector of parameters and THETA a 2 × 2 matrix of parameters. The SCA System can determine the orders for these arrays provided it knows the number of component series represented by 'SERIES' (as accomplished through the SERIES sentence).

As in the univariate case, the SCA System can distinguish between ARMA parameter matrices defined previously (i.e., those that exist currently in the SCA workspace) and those that are being initially specified in the sentence. In the latter case, a matrix is created with the default value of 0.1 assigned as the initial parameter value for each matrix element. When the model is estimated, the final estimates obtained are stored in the matrix. If a matrix label is used again, its current values are used as initial estimates for a new estimation. The SCA System takes an analogous approach for the initial values of constants, unless a vector of constants is being initially defined. In this case, its values are

$$(I - \Phi_1 - \Phi_2 - \cdots - \Phi_p)\,\bar{Z},$$

where $\bar{Z}$ is the vector of sample means of the component series and the $\Phi$'s are those currently defined AR matrices of the model, if any.

The SERIES sentence

As noted above, the SERIES sentence is used to define the components of the vector $Z_t$ within the stationary ARMA model. All component series specified must have the same number of observations. In the MTSMODEL paragraph above, we indicate that this vector consists of the two series Z1 and Z2. No differencing is specified.

As we noted in Section 1.4, the use of differencing in the case of a nonstationary vector model is not as straightforward as in the case of a univariate model. It may not be the case that all components of a nonstationary vector time series require the same order(s) of differencing, if any. For that reason, differencing orders are specified for each component series separately, and in the same manner as we do for the univariate case. Listed below are some examples of the SERIES sentence corresponding to various differencing scenarios:

    Series and differencing                 Specification using SERIES sentence
    Z1 and Z2                               SERIES ARE Z1, Z2.
    (1-B)Z1 and (1-B)Z2                     SERIES ARE Z1(1), Z2(1).
    Z1 and (1-B)Z2                          SERIES ARE Z1, Z2(1).
    (1-B)(1-B^12)Z1 and (1-B)(1-B^12)Z2     SERIES ARE Z1(1,12), Z2(1,12).
    (1-B)(1-B^12)Z1 and (1-B^12)Z2          SERIES ARE Z1(1,12), Z2(12).

Constraining ARMA parameters, the CONSTRAINT sentence

In Section 1.7 we noted the concerns about the number of parameters in a vector ARMA model and some ways to limit the number of parameters in coefficient matrices. Two actions that we may consider are:

(1) zeroing out parameters that are either assumed or found to be statistically insignificant, and
(2) constraining various parameters to be equal to one another.

Such actions are accommodated easily within the SCA System through the use of constraint matrices.
For every parameter matrix used in the MODEL sentence, we can associate another matrix that contains information concerning the constraints that apply to the elements of the parameter matrix. The CONSTRAINT sentence is used for this purpose. In the above MTSMODEL paragraph, the sentence CONSTRAINTS ARE THETA(CTHETA) associates the parameter matrix THETA with the constraint matrix CTHETA. In later versions of the SCA System, the constraint matrix CTHETA is automatically created with zeroes in the entire matrix. Thus the CONSTRAINT sentence need not be used in the newer versions of the SCA System.

After a constraint matrix is specified, the value of each element of CTHETA determines how the SCA System will treat the parameter element corresponding to it in the THETA coefficient matrix. For example, consider the (2,1)th element of THETA, $\theta_{21}$, and the (2,1)th element of CTHETA, $c_{21}$.

    If the value of c21 is    Then the parameter θ21 will be
    0                         estimated without any restriction
    1                         constrained (fixed) to its present value during estimation
    k (k > 1)                 held equal to any other ARMA parameter of the model that has
                              the value k in its associated constraint matrix

The same interpretation is made for elements of all other constraint and associated parameter matrices.

We are not required to specify constraint matrices. If no matrix of constraints is defined for a parameter matrix, then all elements of the parameter matrix will be estimated without restriction. Moreover, if the label (name) of a constraint matrix is introduced initially in the MTSMODEL paragraph (i.e., does not exist currently in the workspace), then all values of the matrix are set to 0. Consequently, all elements of its associated parameter matrix will be estimated without restriction. Constraints can be added later, after an initial estimation of the model. As a rule, it is good practice to initially define constraint matrices for all ARMA parameter matrices, even if no constraints are ever applied.

To illustrate how constraints are interpreted by the SCA System, consider the following cases, each defined by a pair of CTHETA and THETA matrices:

    CTHETA    THETA    Result
                       All parameters estimated without constraint (each with initial value 0.1)
                       Same as above, except θ21 is fixed at 0 during estimation
                       θ21 fixed at 0 and θ12 fixed at 0.5 during estimation
                       θ21 = θ12 during estimation

After an estimation, we can impose or change constraints by simply redefining the appropriate values in the parameter and associated constraint matrices that are involved. We do not need to re-specify the MTSMODEL paragraph, provided that all necessary constraint associations have been made earlier (another reason for designating any possible associations when the model is initially specified).

Please note that in order to "zero out" an insignificant parameter, we need to change the value of two matrices.
We need to specify a 1 in the constraint matrix, so that the corresponding element of the coefficient matrix is fixed during estimation. In addition, we need to set the element of the coefficient matrix to the desired fixed value (0 in this case). If we ignore the latter, then the parameter element will be fixed to its last estimated value, not 0. If we ignore the former, then the parameter will be estimated beginning from an initial value of 0, which will result in an error.

We obtain the following terse summary for the model specification

-->MTSMODEL MA1. SERIES ARE Z1,Z2. MODEL
--> SERIES = CONSTANT +
--> CONSTRAINTS ARE THETA(CTHETA).

    SUMMARY FOR MULTIVARIATE ARMA MODEL -- MA1
    VARIABLE DIFFERENCING
    Z1
    Z2
    PARAMETER FACTOR ORDER CONSTRAINT
    1 CONSTANT CONSTANT
    THETA REG MA 1 CTHETA

The summary provides the components of the vector series (and any differencing orders), and the parameter matrices of the model together with the "type", order and associated constraint matrix for each. The abbreviation REG in the summary above denotes a nonseasonal term (sometimes known as regular). The abbreviation SEA in such a summary denotes a seasonal term.

Model estimation for the simulated MA(1) data

We have now specified the model

$$Z_t = C + (I - \Theta B)\,a_t$$

and will maintain information about this model in the workspace under the name (label) MA1. We can estimate this model by entering

-->MESTIM MA1. HOLD RESIDUALS(R1,R2).

The above command is almost identical to that for the estimation of a univariate ARIMA model in the SCA System. The only difference is in the specification of more than one label for retaining the residual series after estimation. For a vector ARMA model, we will have as many residual series as the number of series in our data vector. Hence we need to specify as many labels as we have component series. We obtain the following

    SUMMARY FOR THE MULTIVARIATE ARMA MODEL
    SERIES NAME MEAN STD DEV DIFFERENCE ORDER(S)
    1 Z1
    2 Z2

    NUMBER OF OBSERVATIONS = 250 (EFFECTIVE NUMBER = NOBE = 250)

    MODEL SPECIFICATION WITH PARAMETER VALUES
    PARAMETER    PARAMETER                    PARAMETER
    NUMBER       DESCRIPTION                  VALUE
                 CONSTANT( 1)
                 CONSTANT( 2)
                 MOVING AVERAGE ( 1, 1, 1)
                 MOVING AVERAGE ( 1, 1, 2)
                 MOVING AVERAGE ( 1, 2, 1)
                 MOVING AVERAGE ( 1, 2, 2)
    ERROR COVARIANCE MATRIX

    ITERATIONS TERMINATED DUE TO:
    RELATIVE CHANGE IN DETERMINANT OF COVARIANCE MATRIX .LE. .100E-03
    TOTAL NUMBER OF ITERATIONS IS 8

    FINAL MODEL SUMMARY WITH CONDITIONAL LIKELIHOOD PARAMETER ESTIMATES
    CONSTANT VECTOR (STD ERROR)
    (.102 )
    (.076 )
    THETA MATRICES
    ESTIMATES OF THETA( 1 ) MATRIX AND SIGNIFICANCE
    STANDARD ERRORS
    ERROR COVARIANCE MATRIX
    REDUCED CORRELATION MATRIX OF THE PARAMETERS

There are two sections in the output. The first section consists of summary information before estimation. This includes descriptive information on each component series, the initial values (and any constraints) for each parameter, and the covariance matrix of the (possibly differenced) vector of components. Three columns are present in the initial parameter summary.

The PARAMETER NUMBER column associates an integer with each parameter. This value is used to represent the parameter in the correlation matrix of parameter estimates later in the output summary. The word FIXED appears in this column whenever a constraint exists for a parameter.

The PARAMETER DESCRIPTION column lists all parameters of the model. Here we have 6 parameters, the two terms of the constant vector and the four terms of the MA(1) parameter matrix. The designation of the terms in the constant vector is self-evident. The terms of the MA matrix are designated by three values (l,i,j). This set of values is used to indicate the (i,j) element of the matrix coefficient of lag order l. Since we have only one MA matrix here, of order 1, l=1 for all matrix parameters.

The PARAMETER VALUE column lists the initial value that will be used for a parameter during the estimation process. Here the value 0.1 will be the initial value for all MA terms. The initial value for the constant terms is based on the initial values of all $\Phi$ matrices and the sample mean of the stationary series as follows

$$(I - \Phi_1 - \Phi_2 - \cdots - \Phi_p)\,\bar{Z}.$$

Since there are no AR terms in this model, the initial parameter values are the sample means of the stationary series. The final model summary follows this initial summary.
The final model summary includes the criterion for stopping estimation, the estimates of all model parameters (with the standard error of the estimate for each element of the constant vector, and AR or MA parameter matrix), and the correlation matrix for all parameter estimates. The (+, -, .) notational "matrix of significance" is given for each AR and MA parameter matrix. The interpretation of the symbols of this matrix is the same as that used for the cross correlation matrices. Each estimate is compared to its individual standard error (as displayed) and the '+', '-', or '.' symbols represent an estimate that is positively significant, negatively significant or insignificant at the 5% level, respectively. We have the following as estimates of our model parameters:

$C = \ldots$, $\Theta = \ldots$, and $\Sigma = \ldots$

These estimates are in agreement with the "true" values of this model (i.e., those used to simulate the data).

The correlation between all parameter estimates is also presented. The row and column integer labels correspond to the parameter numbers displayed in the initial summary. The matrix of values is presented in reduced form. That is, the symbol '.' is displayed if the magnitude of the correlation is insignificant. In this way our attention is better drawn to estimates that may be more highly correlated. Such information is valuable if we want to omit or constrain parameter values.
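When parameters are omitted or constrained, the constraint codes described earlier (0 = estimated freely, 1 = fixed at the present value, k > 1 = held equal to other parameters with the same code) decide how each element is treated. The Python sketch below illustrates that convention for a single parameter matrix. It is not SCA code: the function names and the example matrices are hypothetical, and equality codes are only matched within one matrix here, whereas the SCA System matches them across all ARMA parameter matrices of the model.

```python
def classify_constraints(constraint):
    """Group the (row, column) positions of a constraint matrix by treatment:
    code 0 -> free, code 1 -> fixed, code k > 1 -> tied to the same-k group."""
    free, fixed, tied = [], [], {}
    for i, row in enumerate(constraint, start=1):
        for j, code in enumerate(row, start=1):
            if code == 0:
                free.append((i, j))
            elif code == 1:
                fixed.append((i, j))
            else:
                tied.setdefault(code, []).append((i, j))
    return free, fixed, tied


def zero_out(theta, ctheta, i, j):
    """'Zero out' element (i, j): both matrices must change, as noted in the text.
    The parameter is set to 0 and its constraint code is set to 1 (fixed)."""
    theta[i - 1][j - 1] = 0.0
    ctheta[i - 1][j - 1] = 1


# Hypothetical example: theta_21 = theta_12 imposed through the shared code 2.
ctheta_equal = [[0, 2], [2, 0]]
free, fixed, tied = classify_constraints(ctheta_equal)
print(free, fixed, tied)

# Hypothetical example: zero out theta_21 after a first estimation.
theta = [[0.1, 0.1], [0.1, 0.1]]
ctheta = [[0, 0], [0, 0]]
zero_out(theta, ctheta, 2, 1)
print(theta, ctheta)
```

The number of quantities actually estimated is the number of free positions plus one per tied group, which is how equality constraints reduce the parameter count.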

Estimation algorithms for MA parameters

As in its univariate counterpart, the ARMA parameter estimates obtained above are maximum likelihood estimates (i.e., estimates that maximize a likelihood function). Extensive literature exists on properties of the likelihood function for the vector model, and various simplifying approximations to this function (see, for example, Anderson 1971, Nicholls and Hall 1979, Osborn 1977, and Phadke and Kedem 1978).

As in the univariate case, two forms of the likelihood function are available for models that contain MA parameters. The first of these is called the conditional likelihood function as proposed by Wilson (1973). Alternatively, the exact likelihood function, proposed by Hillmer and Tiao (1979) and others, is available. With n vector observations $Z_1, \ldots, Z_n$, both approaches compute a likelihood function on the basis of the stochastic structure of n-p observations,

$$Z_t = C + \sum_{i=1}^{p} \Phi_i Z_{t-i} - \sum_{j=1}^{q} \Theta_j a_{t-j} + a_t, \qquad t = p+1, \ldots, n$$

where $Z_1$ through $Z_p$ are regarded as fixed (known) values. The algorithms employing this function differ in that the "conditional" likelihood algorithm assumes $a_p = \cdots = a_{p-q+1} = 0$ while the "exact" likelihood algorithm computes estimates for these values. Hence the "exact" algorithm is exact for MA parameter matrices only. The conditional and exact algorithms do not affect the estimates of a pure vector AR process.

The exact algorithm is computationally more burdensome, but it can appreciably reduce the potential biases in estimating the moving average parameter matrices, the $\Theta$'s, that may occur under the conditional method, especially when some of the roots of the determinantal polynomial $|\Theta(B)|$ are near the unit circle. It is usually good practice to employ the exact algorithm whenever MA parameters are present, and in particular, for a seasonal model.
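The role of the starting values in the conditional approach can be seen in a small Python sketch for the vector MA(1) case (p = 0, q = 1). With the pre-sample shock set to zero, the residuals follow recursively from $a_t = Z_t - C + \Theta a_{t-1}$. This shows only the residual recursion under that assumption; it is not the estimation algorithm of the SCA System, and the function name and numerical values are hypothetical.

```python
def conditional_residuals_vma1(z, c, theta):
    """Residuals of Z_t = C + a_t - Theta a_{t-1}, computed recursively as
    a_t = Z_t - C + Theta a_{t-1}, with the pre-sample shock a_0 set to 0
    (the assumption made by the conditional approach)."""
    k = len(c)
    prev = [0.0] * k
    residuals = []
    for obs in z:
        a = [obs[i] - c[i] + sum(theta[i][j] * prev[j] for j in range(k))
             for i in range(k)]
        residuals.append(a)
        prev = a
    return residuals


# Hypothetical check: build a short series from known shocks (with a_0 = 0)
# and confirm that the recursion recovers them.
c_hyp = [10.0, 20.0]
theta_hyp = [[0.5, 0.2], [-0.3, 0.4]]
shocks = [[1.0, -0.5], [0.2, 0.3], [-0.7, 0.1], [0.4, 0.9]]

z, prev = [], [0.0, 0.0]
for a in shocks:
    z.append([c_hyp[i] + a[i] - sum(theta_hyp[i][j] * prev[j] for j in range(2))
              for i in range(2)])
    prev = a

recovered = conditional_residuals_vma1(z, c_hyp, theta_hyp)
print(recovered)
```

If the true pre-sample shock is not zero, the error introduced at t = 1 is carried forward through powers of $\Theta$, so it dies out slowly when roots are near the unit circle. This is the situation in which the text recommends the exact method.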
As in the univariate case, the most efficient way to employ the exact estimation method is to first estimate a model using the default conditional method. Then, we can re-estimate the model using the exact method at the later stage. The advantage is that the conditional method will provide a good starting point from which the exact method may begin. We can do this easily within the SCA System since each model maintains information of the last set of estimates for all parameters. We can now re-estimate the parameters of the model just estimated, and retain the residuals of the fit by entering -->MESTIM MA1. METHOD IS EXACT. HOLD RESIDUALS(R1, R2). SUMMARY FOR THE MULTIVARIATE ARMA MODEL SERIES NAME MEAN STD DEV DIFFERENCE ORDER(S) 1 Z Z NUMBER OF OBSERVATIONS = 250 (EFFECTIVE NUMBER = NOBE = 250)

 MODEL SPECIFICATION WITH PARAMETER VALUES

 PARAMETER   PARAMETER                    PARAMETER
  NUMBER     DESCRIPTION                    VALUE
             CONSTANT( 1)
             CONSTANT( 2)
             MOVING AVERAGE ( 1, 1, 1)
             MOVING AVERAGE ( 1, 1, 2)
             MOVING AVERAGE ( 1, 2, 1)
             MOVING AVERAGE ( 1, 2, 2)

 ERROR COVARIANCE MATRIX

 ITERATIONS TERMINATED DUE TO:
 CHANGE IN (-2*LOG LIKELIHOOD)/NOBE .LE. .100E-03
 TOTAL NUMBER OF ITERATIONS IS 4

 FINAL MODEL SUMMARY WITH MAXIMUM LIKELIHOOD PARAMETER ESTIMATES

 CONSTANT VECTOR (STD ERROR)
   (.104 )
   (.078 )

 THETA MATRICES
 ESTIMATES OF THETA( 1 ) MATRIX AND SIGNIFICANCE
 STANDARD ERRORS

 ERROR COVARIANCE MATRIX

 -2*(LOG LIKELIHOOD AT FINAL ESTIMATES) IS

 REDUCED CORRELATION MATRIX OF THE PARAMETERS

We see from the initial parameter summary that estimation begins from the previous set of "final" parameter values. The final estimates are effectively the same as those obtained using the default conditional algorithm, since there are no roots near the unit circle. One additional piece of information in this output is a value for "-2*(LOG LIKELIHOOD AT FINAL ESTIMATES)". This value can be used in likelihood ratio tests that involve the model. No such value is displayed when the conditional estimation algorithm is used, because scaling within the algorithm causes the final objective function to be equal to the product of the number of series and the number of observations per series.

It should be noted that in vector ARMA estimation, the computer execution time and workspace requirement are proportional to the square of the number of series in the model. Hence the computing time can be lengthy when the number of series is large. Moreover, the exact likelihood approach usually requires much more computing time than the conditional likelihood method. It is therefore advisable to give careful thought to the model to be estimated when the number of series is large or when the exact likelihood method is to be employed.

Diagnostic checks of the fitted model

There are several ways in which a fitted ARMA model can be diagnostically checked. One set of checks involves the residual vector series. A basic visual check is to plot each component of the residual series over time. A plot of R1 and R2 does not reveal any anomalies in this example (plots are not displayed here). In the univariate case, we also examine the ACF of the residuals to see whether it is consonant with the ACF of a white noise process. In like fashion, the cross correlation matrices of the residual vector series should be examined.
These residual cross correlation matrices, summarized by the usual indicator symbols, should be consistent with a white noise process in which there is no correlation between the current residual vector and any lagged residual vector. However, since vector models involve stochastic series, it is not likely that all residual cross correlations will be statistically insignificant. Consequently, the general pattern of the residual CCM, and the location of any significant elements (e.g., lag order or element position), should be used as a guide to model adequacy. The residual CCM should be insignificant or, at most, have just a few significant elements at random lags and matrix positions. In the latter case, significant cross correlations should not repeat themselves at the same (i,j) position in the residual CCM. In addition, we may discount significant correlations that occur at much later lags, as such lags are generally less important than shorter lags. An obvious exception is a higher lag that is a multiple of the seasonal period, in the case of seasonal time series. We can compute the CCM for the residuals of our model fit by entering

-->CCM R1,R2

 TIME PERIOD ANALYZED TO 250
 EFFECTIVE NUMBER OF OBSERVATIONS (NOBE)

 SERIES   NAME   MEAN   STD. ERROR
    1     R1
    2     R2

 NOTE: THE APPROX. STD. ERROR FOR THE ESTIMATED CORRELATIONS BELOW
       IS (1/NOBE**.5) =

 SAMPLE CORRELATION MATRIX OF THE SERIES

 SUMMARIES OF CROSS CORRELATION MATRICES USING +,-,., WHERE
   + DENOTES A VALUE GREATER THAN 2/SQRT(NOBE)
   - DENOTES A VALUE LESS THAN -2/SQRT(NOBE)
   . DENOTES A NON-SIGNIFICANT VALUE BASED ON THE ABOVE CRITERION

 BEHAVIOR OF VALUES IN (I,J)TH POSITION OF CROSS CORRELATION MATRIX
 OVER ALL OUTPUTTED LAGS WHEN SERIES J LEADS SERIES I

 CROSS CORRELATION MATRICES IN TERMS OF +,-,.
 LAGS 1 THROUGH
 LAGS 7 THROUGH
 LAGS 13 THROUGH
 LAGS 19 THROUGH

The residual series appear to have a zero mean, and only 2 of the 96 computed cross correlations are significant (approximately 2%). Since their positions do not form a meaningful pattern, we conclude that we have an adequate model fit.

Other diagnostic checks are possible. These include an examination of the reduction in the variance and covariance values of $\Sigma$ with successive fits, and an eigenvalue analysis. The latter is discussed in more detail in Section 5.
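The indicator summary used in the CCM output can be sketched as follows. This is a simplified Python illustration, not the SCA implementation: the function names are hypothetical, and the correlation estimate uses full-sample means and sums of squares, which is one common convention. The (i,j) symbol at a given lag refers to the correlation between series i at time t and series j at time t minus the lag, that is, series j leading series i.

```python
import math


def cross_corr(x, y, lag):
    """Sample correlation between x[t] and y[t - lag] (y leads x by `lag`)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    scale_x = math.sqrt(sum((v - mean_x) ** 2 for v in x))
    scale_y = math.sqrt(sum((v - mean_y) ** 2 for v in y))
    cov = sum((x[t] - mean_x) * (y[t - lag] - mean_y) for t in range(lag, n))
    return cov / (scale_x * scale_y)


def ccm_indicators(series, lag):
    """Return the lag-`lag` cross correlation matrix as rows of +, - and ."""
    nobe = len(series[0])
    bound = 2.0 / math.sqrt(nobe)       # the 2/SQRT(NOBE) criterion
    rows = []
    for x_i in series:
        row = ""
        for x_j in series:
            r = cross_corr(x_i, x_j, lag)
            row += "+" if r > bound else "-" if r < -bound else "."
        rows.append(row)
    return rows
```

Calling `ccm_indicators([r1, r2], lag)` for each lag of interest gives one small matrix of symbols per lag. Because the bounds are roughly two standard errors wide, a few isolated "+" or "-" symbols can be expected by chance even for white noise, which is why the pattern and position of the symbols matter more than their mere presence.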

Forecasting an estimated model

We have now determined that we have an adequate fit of our bivariate series. As in the univariate case, we can now forecast values for both series, here using the MFORECAST paragraph. To forecast our vector series, we can enter

-->MFORECAST MA1. NOFS ARE 10. HOLD FORECASTS(FZ1,FZ2),STD(SZ1,SZ2)

 FORECASTS, BEGINNING AT ORIGIN =

 SERIES:        Z1                    Z2
 TIME    FORECAST   STD ERR    FORECAST   STD ERR

We are provided with 10 forecasts for each component series, together with the standard error of each forecast. As in the univariate case, the NOFS sentence was used to limit the number of forecasts to 10. Without this specification, the default of 24 forecasts is produced for each series. The HOLD sentence was specified in order to retain the forecasts and standard errors that are computed. The forecasts for the first and second component series are retained in the variables FZ1 and FZ2, respectively. In like fashion, SZ1 and SZ2 hold the standard errors. Using these variables and the original component series, we can produce plots of the forecasts and standard error limits using SCAGRAF.

We see in the output of MFORECAST that, except for the first forecasted value, all forecasts for Z1 are the same (the estimated constant). The standard error limits are also constant after the first forecast. The reason for these values is straightforward. As in the case of a single time series, all forecasts are computed using the most current fitted model. Here that model is

$$\begin{bmatrix} Z_{1t} \\ Z_{2t} \end{bmatrix} = \hat{C} + \begin{bmatrix} a_{1t} \\ a_{2t} \end{bmatrix} - \hat{\Theta}_1 \begin{bmatrix} a_{1(t-1)} \\ a_{2(t-1)} \end{bmatrix}, \qquad (23)$$

where $\hat{C}$ and $\hat{\Theta}_1$ are the estimates reported above. From (23) we can derive the difference equation pertaining to Z1. It is

$$Z_{1t} = \hat{c}_1 + a_{1t} - 0.173\,a_{1(t-1)} - 0.447\,a_{2(t-1)}. \qquad (24)$$

Equation (24) states that the value of Z1 at any time period is the estimated constant term $\hat{c}_1$ (here the mean), plus a shock associated with Z1 for the present time period, minus weighted amounts of the shocks that occurred for each component series in the prior period. By using this equation through the forecasting periods, we obtain the following:

$$Z_{1(251)} = \hat{c}_1 + a_{1(251)} - 0.173\,a_{1(250)} - 0.447\,a_{2(250)},$$
$$Z_{1(252)} = \hat{c}_1 + a_{1(252)} - 0.173\,a_{1(251)} - 0.447\,a_{2(251)},$$
$$\vdots$$
$$Z_{1(260)} = \hat{c}_1 + a_{1(260)} - 0.173\,a_{1(259)} - 0.447\,a_{2(259)}.$$

Except for the constant $\hat{c}_1$, all terms in the forecast period are driven by shocks. We can use the values of the residual series at time t=250 to serve as surrogates for $a_{1(250)}$ and $a_{2(250)}$, but the best "guess" we can make for any later shock is its assumed mean value, 0. As a result, the forecast for every value of Z1 after t=251 is the constant $\hat{c}_1$. The same logic holds when we examine the standard errors of the forecasts.

Calculations of forecasts and forecast standard errors

As noted above, forecasts and the standard errors of forecasts for the components of the vector series are obtained from the values through the forecast origin, the fitted vector ARMA model, and the residuals from the fitted model. Suppose vector observations $Z_1, Z_2, \ldots$ are available up to time n and it is desired to forecast the future vector observations $Z_{n+l}$, $l \ge 1$. Forecasts calculated in the SCA System are minimum mean squared error (MMSE) forecasts, so that the forecast for $Z_{n+l}$ is the conditional expectation of $Z_{n+l}$ based on all information to time n. It can be shown that the MMSE forecast can be computed recursively using

$$\hat{Z}_n(l) = C + \Phi_1 \hat{Z}_n(l-1) + \cdots + \Phi_p \hat{Z}_n(l-p) - \Theta_1 E(a_{n+l-1}) - \cdots - \Theta_q E(a_{n+l-q}),$$

where $\hat{Z}_n(j) = Z_{n+j}$ for $j \le 0$, $E(a_{n+j}) = 0$ for $j > 0$, and $E(a_{n+j}) = a_{n+j}$ for $j \le 0$. In practice, neither the values of the parameter matrices nor the values of the error vector sequence are known. Hence we use the estimated parameter values and the corresponding residual vector sequence in their place.
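For a vector MA(1) model the recursion above collapses to two cases, which the following Python sketch illustrates. It is not SCA code: the function names are hypothetical, and only the first row of the MA matrix (0.173 and 0.447) is taken from the text. The constant vector, the second row of the matrix and the last residuals are made-up values used purely for illustration.

```python
def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def ma1_forecasts(c, theta, last_resid, steps):
    """MMSE forecasts for Z_t = C + a_t - Theta a_{t-1} from origin n.

    Lead 1 uses the last residual as a surrogate for a_n. For every later
    lead the expected value of each future shock is 0, so only C remains.
    """
    forecasts = []
    for lead in range(1, steps + 1):
        if lead == 1:
            carry = mat_vec(theta, last_resid)
            forecasts.append([c[i] - carry[i] for i in range(len(c))])
        else:
            forecasts.append(list(c))
    return forecasts


if __name__ == "__main__":
    c = [10.0, 20.0]                        # hypothetical constants
    theta = [[0.173, 0.447], [0.1, 0.2]]    # second row is hypothetical
    last_resid = [1.0, -2.0]                # hypothetical residuals at t = n
    for lead, f in enumerate(ma1_forecasts(c, theta, last_resid, 4), start=1):
        print(lead, f)
```

Only the first forecast depends on the data, through the residuals at the forecast origin. All later forecasts equal the constant vector, which matches the behavior seen in the MFORECAST output.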
If we assume that the vector sequence $\{a_t\}$ is a normally distributed white noise sequence, the vector of forecast errors, say $e_n(l)$, is normally distributed with zero mean vector and covariance matrix

$$V(e_n(l)) = \sum_{i=0}^{l-1} \Psi_i \Sigma \Psi_i',$$

where the $\Psi$ matrices are analogous to the $\psi$ weights of a univariate model and $\Psi_0 = I$. For a more complete discussion of this covariance matrix, see Tiao and Box (1981) and Section 14.7 of Wei (1990).

Within this document, it is tacitly assumed that all components of the vector series being modeled are stochastic (that is, not deterministic). However, the SCA System permits the use of a vector series that is composed of both stochastic and deterministic (nonforecastable) series. If deterministic series are used, it is important to remember that these series cannot be forecasted by the SCA System. Hence, when forecasting in the presence of such series, the "future" values of the deterministic component series must also be supplied.

4.3 Modeling an Autoregressive Process: Lydia Pinkham Data

In Section 2, we modeled and forecasted a bivariate MA(1) vector process. In this section we consider an actual example, the analysis of the annual advertising outlays and sales revenues of Lydia Pinkham's vegetable compound from 1907 through 1960 inclusive. These data have been analyzed by many authors, including Vandaele (1983), Wei (1990) and Pankratz (1991). The data are listed in Table 2 and are plotted in Figure 2 (using SCAGRAF). The data are stored in the SCA workspace under the labels ADVS and SALES.

Table 2 Lydia Pinkham annual data

Advertising outlays (thousands of dollars), read across

Sales revenues (thousands of dollars), read across

Figure 2 Plot of Lydia Pinkham data

The plots of ADVS and SALES indicate what may be nonstationary behavior in both series. In fact, differencing is usually employed when either series is modeled individually or when a transfer function model (see Chapter 8 of Forecasting and Time Series Analysis Using The SCA Statistical System) is used to relate the "input" series ADVS to the "output" series SALES. Because the two series are relatively short (54 observations), we are reluctant to difference the data here. An analysis of the first order differences of each component of the vector series is presented in Section 14.7 of Wei (1990).

Preliminary identification: sample cross correlation matrices

As a first step in the identification of a model, we will compute the CCM of the vector series. Since there are only 54 observations, we will limit the number of lags to 10. To obtain this preliminary overview, we can enter the following (SCA output is edited slightly for presentation purposes).

-->CCM ADVS, SALES. MAXLAG IS 10.

 TIME PERIOD ANALYZED TO 54
 EFFECTIVE NUMBER OF OBSERVATIONS (NOBE)

 SERIES   NAME    MEAN   STD. ERROR
    1     ADVS
    2     SALES

 BEHAVIOR OF VALUES IN (I,J)TH POSITION OF CROSS CORRELATION MATRIX
 OVER ALL OUTPUTTED LAGS WHEN SERIES J LEADS SERIES I

 CROSS CORRELATION MATRICES IN TERMS OF +,-,.
 LAGS 1 THROUGH
 LAGS 7 THROUGH

Clearly, the sample CCMs do not cut off after a few lags. From this we conclude that the vector series composed of ADVS and SALES cannot be represented by a low order vector MA(q) process. (Note: A similar conclusion is reached from the CCM of the vector series composed of the first order differences of ADVS and SALES; see page 365 of Wei 1990.) If a vector ARMA model is to be constructed for these data, we need to include AR terms. A method to determine an AR order is now discussed.

Preliminary identification, continued: stepwise autoregressive fitting

In univariate ARMA model building, the sample ACF is useful in determining whether a series can be well represented by a low order MA(q) process. The sample PACF is useful in determining the order of an AR(p) process. As we have seen in Section 2, the sample CCM is a direct generalization of the sample ACF. Unfortunately, the sample PACF does not extend directly to the multivariate case. However, if we appeal to the regression interpretation of the PACF and employ theory involving multivariate linear models, we can obtain a useful measure for identifying low order vector AR(p) models.

The sample PACF of a single series can be obtained by sequentially fitting a lagged regression to a series $Z_t$ through lag l (l = 1, 2, 3, ...) and retaining the estimate of the last term (i.e., highest lag order) of each fit. Each successive value is a measure of the relative importance of adding the next higher order lag term to the previous fit. If the series follows an AR(p) model, then the usefulness of adding $Z_{t-p-1}$ is negligible, as is that of adding $Z_{t-p-2}$, $Z_{t-p-3}$, and so on. Hence the sample PACF gives us direct information on the AR order, as it should "cut off" after lag p.

We can extend this concept to the multivariate case, except that we need to keep in mind that we will be manipulating vectors and matrices, not scalar values. That is, we cannot reverse the order of the factors in a product. Now, if $Z_t$ follows a vector AR(p) model, we have

$$Z_t = C + \Phi_1 Z_{t-1} + \Phi_2 Z_{t-2} + \cdots + \Phi_p Z_{t-p} + a_t. \qquad (25)$$

We can employ multivariate least squares to obtain estimates of the coefficient parameter matrices of (25) if we re-write the equation slightly. Specifically, for n observations we can re-write (25) as

$$Z_t' = C' + Z_{t-1}'\Phi_1' + \cdots + Z_{t-p}'\Phi_p' + a_t', \qquad t = p+1, \ldots, n. \qquad (26)$$

Thus, we can use (26) to express the multivariate linear model

$$\begin{bmatrix} Z_{p+1}' \\ \vdots \\ Z_n' \end{bmatrix} = \begin{bmatrix} 1 & Z_p' & \cdots & Z_1' \\ \vdots & \vdots & & \vdots \\ 1 & Z_{n-1}' & \cdots & Z_{n-p}' \end{bmatrix} \begin{bmatrix} C' \\ \Phi_1' \\ \vdots \\ \Phi_p' \end{bmatrix} + \begin{bmatrix} a_{p+1}' \\ \vdots \\ a_n' \end{bmatrix}$$

or

$$Y = X\eta + \varepsilon. \qquad (27)$$

Within the framework of (27), the least squares estimates of $\eta$ can be obtained immediately. In practice the order p is unknown, but we can use the representation outlined above to estimate vector AR(l) models for successively higher orders.
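The least squares step in the linear model form $Y = X\eta + \varepsilon$ can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the algorithm used by the SCA System: it builds X and Y directly from the series, solves the normal equations $(X'X)\eta = X'Y$ with a small Gauss-Jordan routine, and ignores numerical refinements. The function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b for square a and matrix b (Gauss-Jordan, partial pivoting)."""
    n = len(a)
    m = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_var(z, p):
    """Least squares estimate of eta = [C'; Phi_1'; ...; Phi_p'].

    z is a list of k-dimensional observations. Row 0 of the result is C',
    rows 1..k are Phi_1', and so on, so result[1 + j][i] estimates the
    (i, j) element of Phi_1.
    """
    k = len(z[0])
    x_rows, y_rows = [], []
    for t in range(p, len(z)):
        row = [1.0]
        for lag in range(1, p + 1):
            row.extend(z[t - lag])
        x_rows.append(row)
        y_rows.append(list(z[t]))
    m = len(x_rows[0])
    n_rows = len(x_rows)
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(m)] for i in range(m)]
    xty = [[sum(x_rows[t][i] * y_rows[t][j] for t in range(n_rows))
            for j in range(k)] for i in range(m)]
    return solve(xtx, xty)
```

Fitting successive orders l = 1, 2, 3, ... with `fit_var(z, l)` and keeping the last k rows of each result gives the (transposed) coefficient matrix of the highest lag in each fit.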

We can now extend the idea behind the univariate PACF to consider the l-th sample partial autoregression matrix, say $\Phi_{ll}$, for l = 1, 2, .... $\Phi_{ll}$ is the matrix of coefficients associated with the highest order lag of each successive fit. If $Z_t$ follows a vector AR(p) model, then $\Phi_l = 0$ for l > p. Correspondingly, all the elements of $\Phi_{ll}$ are expected to be small for l > p. Using least squares theory, estimates of the variances of the elements of $\Phi_{ll}$ can be obtained. As we have done previously, the values of the partial autoregression matrices can be summarized by assigning the indicator symbol '+' when an element is greater than two times its estimated standard error, the symbol '-' for values less than minus two standard errors, and '.' for values in between (i.e., insignificant).

To help determine a tentative order of an autoregressive model, the SCA System calculates a $\chi^2$ statistic after each fitted lag (for an explanation of this statistic, see Tiao and Box 1981). This statistic is based on the determinant of the matrix of the residual sums of squares and cross products after fitting a vector AR(l) model. The matrix, say S(l), is used to derive the statistic

$$M(l) = -\left(N - \tfrac{1}{2} - l\,k\right) \ln \frac{|S(l)|}{|S(l-1)|},$$

where $N = n - p - 1$ is the effective number of observations, k is the number of series, and S(0) is the matrix of the residual sums of squares and cross products when only a constant term is fitted to the vector series. M(l) is asymptotically distributed as $\chi^2$ with $k^2$ degrees of freedom.

In addition to the statistic M(l), the SCA System also computes and displays the Akaike Information Criterion, AIC (see Akaike 1973, 1974), after each fitted lag. The smallest AIC value gives an indication of the order of the model to be used. The cut-off in the significance of M(l) and the smallest value of the AIC usually confirm one another in the determination of the order p.
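The arithmetic behind M(l) can be sketched in Python as follows. The code is an illustration only: it is restricted to two series (k = 2) so that the determinant has a closed form, the function names are hypothetical, and the residual matrices in the example are made-up numbers rather than SCA output. The 5% and 1% critical values of a chi-squared distribution with 4 degrees of freedom are approximately 9.49 and 13.28.

```python
import math

CHI2_4DF_5PCT = 9.49    # approximate 5% critical value, 4 degrees of freedom
CHI2_4DF_1PCT = 13.28   # approximate 1% critical value, 4 degrees of freedom


def det2(s):
    """Determinant of a 2 x 2 matrix given as a list of rows."""
    return s[0][0] * s[1][1] - s[0][1] * s[1][0]


def m_statistic(s_curr, s_prev, n_eff, lag, k=2):
    """M(l) = -(N - 1/2 - l*k) * ln(|S(l)| / |S(l-1)|) for 2 x 2 matrices."""
    ratio = det2(s_curr) / det2(s_prev)
    return -(n_eff - 0.5 - lag * k) * math.log(ratio)


if __name__ == "__main__":
    s0 = [[4.0, 0.0], [0.0, 4.0]]   # hypothetical S(0)
    s1 = [[2.0, 0.0], [0.0, 2.0]]   # hypothetical S(1)
    m1 = m_statistic(s1, s0, n_eff=48, lag=1)
    print(m1, m1 > CHI2_4DF_5PCT, m1 > CHI2_4DF_1PCT)
```

A large drop in the determinant from S(l-1) to S(l) gives a large M(l), indicating that the lag l coefficient matrix contributes to the fit. A ratio close to 1 gives a value near 0.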
In practice, neither the significance of M(l) nor the AIC should be used as the sole criterion. We can obtain a quick overview of stepwise fits through 5 lags for the Lydia Pinkham data if we enter

-->STEPAR ADVS, SALES. ARFITS ARE 1 TO 5.

The ARFITS sentence is a required sentence that specifies the lags to include in the stepwise fits, and in what order. More will be said on order specification later. Here the phrase "1 TO 5" is a notational shorthand for "1, 2, 3, 4, 5". Stepwise fits will be made in that order. We obtain the following

 TIME PERIOD ANALYZED TO 54
 EFFECTIVE NUMBER OF OBSERVATIONS (NOBE)

 SERIES   NAME    MEAN   STD. ERROR
    1     ADVS
    2     SALES

 DETERMINANT OF S(0) = E+11

 NOTE: S(0) IS THE SAMPLE COVARIANCE MATRIX OF W(MAXLAG+1),...,W(NOBE)

 ========== STEPWISE AUTOREGRESSION SUMMARY ==========
      I  RESIDUAL  I  EIGENVAL. I  CHI-SQ  I       I  SIGNIFICANCE
  LAG I  VARIANCES I  OF SIGMA  I   TEST   I  AIC  I  OF PARTIAL AR COEFF.
   1  I  .357E+05  I  .197E+05  I          I       I  + +
      I  .473E+05  I  .633E+05  I          I       I
   2  I  .340E+05  I  .187E+05  I          I       I  . .
      I  .376E+05  I  .529E+05  I          I       I
   3  I  .243E+05  I  .141E+05  I          I       I  . .
      I  .372E+05  I  .475E+05  I          I       I
   4  I  .235E+05  I  .130E+05  I   3.64   I       I  . .
      I  .365E+05  I  .470E+05  I          I       I
   5  I  .224E+05  I  .129E+05  I   2.84   I       I  . .
      I  .342E+05  I  .437E+05  I          I       I

 NOTE: CHI-SQUARED CRITICAL VALUES WITH 4 DEGREES OF FREEDOM ARE
       5 PERCENT: 9.5   1 PERCENT: 13.3

 NOTE: THE PARTIAL AUTOREGRESSION COEFFICIENT MATRIX FOR LAG L IS THE
       ESTIMATED PHI(L) FROM THE FIT WHERE THE MAXIMUM LAG USED IS L
       (I.E. THE LAST COEFFICIENT MATRIX). THE ELEMENTS ARE STANDARDIZED
       BY DIVIDING EACH BY ITS STANDARD ERROR.

The initial portion of the output consists of the same descriptive summary that was presented at the beginning of the prior CCM output. Information is then given on the matrix S(0), followed by a short tabular display that summarizes key information from each successive AR fit. There are five columns of information within this display, three of which have been discussed previously. The column labeled SIGNIFICANCE OF PARTIAL AR COEFF. displays the l-th sample partial autoregression matrix, $\Phi_{ll}$, in reduced symbolic form. The values in the columns labeled CHI-SQ TEST and AIC are the values of the $\chi^2$ statistic M(l) and the Akaike Information Criterion, respectively (critical values for the $\chi^2$ statistic are given at the bottom of the display). The values of the RESIDUAL VARIANCES column are the variances of each component residual series after each fit. Here we can trace the reduction in variance, if any, that occurs in a component series as a new AR term is entered into the model. The values of the EIGENVAL. OF SIGMA column are the eigenvalues of the covariance matrix of the residual series after each fit. The use of eigenvalue information is discussed in more detail in Section 6.

The $\chi^2$ statistic is significant (at the 5% level) through the 3rd lag and is not significant thereafter. The minimum of the AIC occurs at l = 3, and the variances of the residual series decrease through the 3rd lag, leveling off thereafter. On the basis of this information we may conclude that a vector AR(3) model may be appropriate for these data. It may be useful to obtain more information on an AR(3) fit of the data, as is discussed below.

Obtaining CCM and stepwise fit information quickly

We obtained the cross correlation matrices and the stepwise autoregressive fitting information through the use of the CCM and STEPAR paragraphs, respectively. The information from these two paragraphs is similar to that provided by the ACF and PACF paragraphs when a single time series is analyzed. In the univariate case, we can obtain both the sample ACF and the sample PACF by using the IDEN paragraph. An analogous paragraph, MIDEN, exists for this purpose in vector ARMA modeling. We can obtain exactly the information given above if we enter

-->MIDEN ADVS, SALES. MAXLAG IS 10. ARFITS ARE 1 TO 5.

As in the case of the STEPAR paragraph, ARFITS is a required sentence of this SCA paragraph. The use of the CCM and STEPAR paragraphs (or the MIDEN paragraph) also provides us with information that may be of use in identifying the order of a mixed ARMA model. An example of identifying the orders of a mixed vector ARMA model is given in Section 4.

Altering the order for stepwise AR fits

In the STEPAR paragraph shown earlier, we sequentially fit vector AR(1), AR(2), AR(3), AR(4) and AR(5) models to our data and obtained information on the relative value of adding another lagged coefficient matrix to the previous AR fit. We saw that the relative value of adding terms ended after the fit of a full AR(3) model. Intuitively, we may think that the values of the $\chi^2$ statistic will decrease uniformly until insignificance is reached. This was not the case here, as M(2) is significant at the 5% level while M(3) is significant at the 1% level. We may wonder about the relative value of the $\Phi_2$ matrix. That is, we may wonder whether the significance of the AR(2) fit is due to the lag 2 term itself or to the need to include the $\Phi_3$ coefficient matrix in the fit.
The SCA System allows us to examine this possibility quickly, as we have complete freedom in specifying the order in which lagged terms are entered into the fits. Clearly the most common (and rational) method of adding lagged terms is sequentially from lag 1 through some upper bound, here 5. However, if we desire that only specific lags be considered (e.g., 1, 2 and 4) or that the lags be entered in a specific order (e.g., 1, 3, 2), then we may do so through the ARFITS sentence.

To illustrate this capability, we will re-execute the STEPAR paragraph and change the order in which lags are added to the model. The SCA output is edited so that only the summary table is presented.

-->STEPAR ADVS, SALES. ARFITS ARE 1,3,2,4,5.

 ========== STEPWISE AUTOREGRESSION SUMMARY ==========
      I  RESIDUAL  I  EIGENVAL. I  CHI-SQ  I       I  SIGNIFICANCE
  LAG I  VARIANCES I  OF SIGMA  I   TEST   I  AIC  I  OF PARTIAL AR COEFF.
   1  I  .357E+05  I  .197E+05  I          I       I  + +
      I  .473E+05  I  .633E+05  I          I       I

   3  I  .348E+05  I  .157E+05  I          I       I  . .
      I  .421E+05  I  .612E+05  I          I       I
   2  I  .243E+05  I  .141E+05  I          I       I  - -
      I  .372E+05  I  .475E+05  I          I       I
   4  I  .235E+05  I  .130E+05  I   3.64   I       I  . .
      I  .365E+05  I  .470E+05  I          I       I
   5  I  .224E+05  I  .129E+05  I   2.84   I       I  . .
      I  .342E+05  I  .437E+05  I          I       I

 NOTE: CHI-SQUARED CRITICAL VALUES WITH 4 DEGREES OF FREEDOM ARE
       5 PERCENT: 9.5   1 PERCENT: 13.3

The information in rows 1, 4 and 5 is identical to that of the previous table, as it should be. Reversing the order in which the lag 2 and lag 3 terms are included in the fits does not alter our previous interpretation, which calls for a full vector AR(3) model. Note that the smallest value of the AIC occurs in the row associated with lag 2. This does not indicate that the AIC "favors" a vector AR(2) model. Instead, it indicates that the model fit at this stage (with lag 1, 3 and 2 terms) led to the smallest value of the AIC. Thus, an AR(3) model is also implied if the AIC statistic is used.

Obtaining more output from the STEPAR paragraph

It appears that a full vector AR(3) model may be an adequate initial representation for the vector series, based on the summary information of the CCM and STEPAR paragraphs. The default output of the STEPAR paragraph is a summary of what has proven to be the most important information from which we may make an initial inference about modeling the series using only a vector AR model. By default, the STEPAR paragraph provides only a limited report of the information available from the sequence of AR fits. Other information is also available, such as:

(1) estimates and standard errors of the parameters of all estimated coefficient matrices,
(2) the residual covariance matrix (and eigenvalue information, see Section 6),
(3) the correlation matrix of all estimated parameters, and
(4) the CCM of the vector residual series after each fit.

This information is suppressed by default because, in the simplest of terms, it represents an overkill of information.
However, once we have obtained a quick summary, we may want a more extensive (yet limited) amount of information. For the Lydia Pinkham data, we may want more information about the AR(3) fit. Specifically, we would like to see the estimates of the three coefficient matrices and the CCM of the residuals of the fit. We can achieve this by requesting a more detailed level of output and using the RCCM sentence. Specifically, if we enter the command

-->STEPAR ADVS,SALES. ARFITS ARE 1 TO 3. RCCM IS 3. @
-->   MAXLAG IS 10. OUTPUT LEVEL(DETAILED).

we will obtain this information. The sentence 'ARFITS ARE 1 TO 3' is necessary to ensure that a full AR(3) model is fitted. Fits of an AR(1) model and a full AR(2) model will also occur, and cannot be avoided in this paragraph. The RCCM sentence is used to indicate when to display the CCM of the residual series. Here the CCM of the residuals can be computed and displayed after the 1st fit (AR(1)), the 2nd fit (the full AR(2)), or the 3rd fit (the full AR(3)). We indicate that we want the computation and display only at the 3rd fit. We further limit the display to 10 lags through the MAXLAG sentence (the default is 24 lags whenever CCM are computed and displayed). The OUTPUT sentence indicates that we desire a detailed level of output. As a result we will obtain the estimated values of the coefficient matrices after each fit. More complete information on all available output may be found in the syntax for this sentence at the end of this document.

We have now better controlled the amount of output that will be generated, so that the additional information we want will be displayed. Still, quite a bit of information will be generated. For presentation purposes, the output below has been edited to include only information from the last fit.

 AUTOREGRESSIVE FITTING ON LAG(S)

 === PHI( 1) ===
 STANDARD ERRORS

 === PHI( 2) ===
 STANDARD ERRORS

 === PHI( 3) ===
 STANDARD ERRORS

 RESIDUAL COVARIANCE MATRIX S( 3)
 .234E E E+05

 RESIDUAL CORRELATION MATRIX RS( 3)

 EIGENVALUES AND EIGENVECTORS OF S( 3)
 EIGENVALUES
 EIGENVECTORS

 DETERMINANT OF S(J) = E+09
 LEADING TO A VALUE OF THE TEST STATISTIC M = -W*LN(U) =
 APPROXIMATELY DISTRIBUTED AS A CHI SQUARE WITH 4 DF, WHERE
   U = DET(S(J))/DET(S(J-1))
   S(J) = RESIDUAL COVARIANCE MATRIX AFTER JTH FIT
   W = (NOBE-MAXARF-1)-J*K-.5, AND DF = K*K.

 RV(J) = CORRELATION FORM OF V(J) = (X'X)^-1
 NOTE: CORR(PHI(LAG(L1),I1,J1),PHI(LAG(L2),I2,J2))
       = RS(I1,I2)*RV(K*(L1-1)+J1,K*(L2-1)+J2)

 SAMPLE CROSS CORRELATION MATRICES FOR THE RESIDUAL SERIES.
 THE (I,J) ELEMENT OF THE LAG L MATRIX IS THE ESTIMATE OF THE
 LAG L CROSS CORRELATION WHEN SERIES J LEADS SERIES I

 SUMMARIES OF CROSS CORRELATION MATRICES USING +,-,., WHERE
   + DENOTES A VALUE GREATER THAN 2/SQRT(NOBE)
   - DENOTES A VALUE LESS THAN -2/SQRT(NOBE)
   . DENOTES A NON-SIGNIFICANT VALUE BASED ON THE ABOVE CRITERION

 BEHAVIOR OF VALUES IN (I,J)TH POSITION OF CROSS CORRELATION MATRIX
 OVER ALL OUTPUTTED LAGS WHEN SERIES J LEADS SERIES I

 CROSS CORRELATION MATRICES IN TERMS OF +,-,.
 LAGS 1 THROUGH
 LAGS 7 THROUGH

 ========== STEPWISE AUTOREGRESSION SUMMARY ==========
      I  RESIDUAL  I  EIGENVAL. I  CHI-SQ  I       I  SIGNIFICANCE
  LAG I  VARIANCES I  OF SIGMA  I   TEST   I  AIC  I  OF PARTIAL AR COEFF.
   1  I  .343E+05  I  .191E+05  I          I       I  + +
      I  .458E+05  I  .610E+05  I          I       I
   2  I  .326E+05  I  .182E+05  I          I       I  . .
      I  .368E+05  I  .512E+05  I          I       I
   3  I  .234E+05  I  .139E+05  I          I       I  . .
      I  .364E+05  I  .460E+05  I          I       I

We see from the CCM of the residual series that the AR(3) model seems to represent the data adequately. If we so desire, we could consider this as the final model for the data. However, we see that many of the parameter estimates are not statistically significant. We may then consider fitting an AR(3) model with some parameters set to zero (i.e., zeroed out). For this reason, we will now continue with the specification and estimation of an AR(3) model.

The results of the above stepwise regression are somewhat different from those of the lag 1 through lag 5 stepwise fitting. This is because fewer pairs of vector observations are used in computing the stepwise regression when more lags are specified.

Initial model specification and estimation for Lydia Pinkham data

We can now specify an AR(3) model for the data. Since we expect to zero out some parameters, we will also specify a constraint matrix for each coefficient matrix. We enter

-->MTSMODEL LPMODEL. SERIES ARE ADVS, SALES. @
-->   MODEL IS (1 - PHI1*B - PHI2*B**2 - PHI3*B**3)SERIES = CONST + NOISE. @
-->   CONSTRAINTS ARE PHI1(CPHI1), PHI2(CPHI2), PHI3(CPHI3).

 SUMMARY FOR MULTIVARIATE ARMA MODEL -- LPMODEL

 VARIABLE   DIFFERENCING
 ADVS
 SALES

 PARAMETER   FACTOR     ORDER   CONSTRAINT
 1  CONST    CONSTANT
 2  PHI1     REG AR       1     CPHI1
 3  PHI2     REG AR       2     CPHI2
 4  PHI3     REG AR       3     CPHI3

Since our model contains AR terms, the constant term will not be an estimate of the mean. Instead it is related to the mean $\mu$ according to

$$(I - \Phi_1 - \Phi_2 - \Phi_3)\,\mu = C.$$

Since PHI1, PHI2, and PHI3 have not yet been defined, the value 0.1 will be used for all elements of these matrix parameters.
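The relation between the constant vector and the mean can be checked with a short calculation. The Python sketch below is illustrative only: the function name is hypothetical and the mean vector is a made-up value, while the coefficient matrices use the value 0.1 for every element, as in the default initial values mentioned above.

```python
def constant_from_mean(phis, mu):
    """Return C = (I - Phi_1 - ... - Phi_p) mu for square matrices Phi_l."""
    k = len(mu)
    a = [[(1.0 if i == j else 0.0) - sum(phi[i][j] for phi in phis)
          for j in range(k)] for i in range(k)]
    return [sum(a[i][j] * mu[j] for j in range(k)) for i in range(k)]


if __name__ == "__main__":
    phi = [[0.1, 0.1], [0.1, 0.1]]      # every element set to 0.1
    mu = [100.0, 200.0]                 # hypothetical series means
    print(constant_from_mean([phi, phi, phi], mu))
```

With these inputs the matrix multiplying the mean has 0.7 on the diagonal and -0.3 off the diagonal, so the constant for the first series is 0.7(100) - 0.3(200) and the constant for the second is -0.3(100) + 0.7(200). Neither equals the corresponding mean, which is the point made in the text.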
If we had desired, we could have retained the estimates of these matrices from the last stepwise fit by including the HOLD sentence in our last use of the STEPAR paragraph (see the syntax at the end of this document for more information). In such a case, the initial values for all coefficient matrices could be the final values from the STEPAR fits. The initial values for the constant term will be $(I - \Phi_1 - \Phi_2 - \Phi_3)\bar{Z}$, where $\bar{Z}$ is the sample mean. The initial values of all elements of the constraint matrices are automatically set to 0, so that all estimates will be computed without constraint. We can estimate this model by simply entering

-->MESTIM LPMODEL.

No residuals are retained at this time since we intend to constrain some parameters and re-estimate the model later. We obtain the following output

 SUMMARY FOR THE MULTIVARIATE ARMA MODEL

 SERIES   NAME    MEAN   STD DEV   DIFFERENCE ORDER(S)
    1     ADVS
    2     SALES

 NUMBER OF OBSERVATIONS = 54 (EFFECTIVE NUMBER = NOBE = 51)

 MODEL SPECIFICATION WITH PARAMETER VALUES

 PARAMETER   PARAMETER                    PARAMETER
  NUMBER     DESCRIPTION                    VALUE
             CONSTANT( 1)
             CONSTANT( 2)
             AUTOREGRESSIVE ( 1, 1, 1)
             AUTOREGRESSIVE ( 1, 1, 2)
             AUTOREGRESSIVE ( 1, 2, 1)
             AUTOREGRESSIVE ( 1, 2, 2)
             AUTOREGRESSIVE ( 2, 1, 1)
             AUTOREGRESSIVE ( 2, 1, 2)
             AUTOREGRESSIVE ( 2, 2, 1)
             AUTOREGRESSIVE ( 2, 2, 2)
             AUTOREGRESSIVE ( 3, 1, 1)
             AUTOREGRESSIVE ( 3, 1, 2)
             AUTOREGRESSIVE ( 3, 2, 1)
             AUTOREGRESSIVE ( 3, 2, 2)

 ERROR COVARIANCE MATRIX
 E+06

 ITERATIONS TERMINATED DUE TO:
 RELATIVE CHANGE IN DETERMINANT OF COVARIANCE MATRIX .LE. .100E-03
 TOTAL NUMBER OF ITERATIONS IS 5

 FINAL MODEL SUMMARY WITH CONDITIONAL LIKELIHOOD PARAMETER ESTIMATES

181 4.42 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES CONSTANT VECTOR (STD ERROR) ( ) ( ) PHI MATRICES ESTIMATES OF PHI( 1 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS ESTIMATES OF PHI( 2 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS ESTIMATES OF PHI( 3 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS ERROR COVARIANCE MATRIX Estimation with constraints It should not be surprising that the results here are the same as those obtained previously. We see that there are a number of "insignificant" estimates (as compared to their standard errors). We will now re-estimate this model after we zero out some parameters. In order to be a bit conservative regarding parameters that will be zeroed out, we will first consider those parameters whose (absolute) value is less than 1.64 times of its standard error. Hence all insignificant values above, except those of the top row of PHI3, will be zeroed out. In order to constrain these terms to 0, we must set appropriate coefficient matrix elements to 0 and set the elements of their associated constraint matrix to 1. We can use SCA analytic

statements for this purpose, and enter the following sequence (SCA output is edited for presentation purposes)

-->PHI1(2,1) = 0
-->CPHI1(2,1) = 1
-->PHI2(2,1) = 0
-->CPHI2(2,1) = 1
-->PHI2(2,2) = 0
-->CPHI2(2,2) = 1
-->PHI3(2,1) = 0
-->CPHI3(2,1) = 1
-->PHI3(2,2) = 0
-->CPHI3(2,2) = 1
-->MESTIM LPMODEL. HOLD RESIDUALS(R1, R2).

SUMMARY FOR THE MULTIVARIATE ARMA MODEL
SERIES NAME MEAN STD DEV DIFFERENCE ORDER(S)
1 ADVS SALES
NUMBER OF OBSERVATIONS = 54 (EFFECTIVE NUMBER = NOBE = 51)
MODEL SPECIFICATION WITH PARAMETER VALUES
PARAMETER NUMBER, PARAMETER DESCRIPTION, PARAMETER VALUE
CONSTANT( 1)
CONSTANT( 2)
AUTOREGRESSIVE ( 1, 1, 1)
AUTOREGRESSIVE ( 1, 1, 2)
*FIXED* AUTOREGRESSIVE ( 1, 2, 1)
AUTOREGRESSIVE ( 1, 2, 2)
AUTOREGRESSIVE ( 2, 1, 1)
AUTOREGRESSIVE ( 2, 1, 2)
*FIXED* AUTOREGRESSIVE ( 2, 2, 1)
*FIXED* AUTOREGRESSIVE ( 2, 2, 2)
AUTOREGRESSIVE ( 3, 1, 1)
AUTOREGRESSIVE ( 3, 1, 2)
*FIXED* AUTOREGRESSIVE ( 3, 2, 1)
*FIXED* AUTOREGRESSIVE ( 3, 2, 2)
ERROR COVARIANCE MATRIX
ITERATIONS TERMINATED DUE TO: RELATIVE CHANGE IN DETERMINANT OF COVARIANCE MATRIX .LE. .100E-03
TOTAL NUMBER OF ITERATIONS IS 4
FINAL MODEL SUMMARY WITH CONDITIONAL LIKELIHOOD PARAMETER ESTIMATES
CONSTANT VECTOR (STD ERROR)

183 4.44 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES ( ) ( ) PHI MATRICES ESTIMATES OF PHI( 1 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS ESTIMATES OF PHI( 2 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS ESTIMATES OF PHI( 3 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS ERROR COVARIANCE MATRIX The results here have changed slightly, reflecting the imposed constraints. The CCM of R1 and R2 (the residual series) show no anomalies; neither do their time plots (output not shown). However, the estimates for Φ2(1,1) and Φ3(1,1) are insignificant, therefore these parameters need to be constrained to zero and the model needs to be re-estimated Interpreting the estimation results If we now use matrix multiplication, we can write the models for each component series. We obtain the following ADVS t = C1 + (0.52B 0.34B B )ADVS t + (0.59B 0.54B B )SALES + a SALES = C SALES + a. t t 1 2t t 1t
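The constraint step used in this example (set a coefficient to 0 and its constraint-matrix element to 1 when the estimate is small relative to its standard error) can be sketched outside SCA as follows. This is only an illustration in Python: the function name `build_constraints` and the numerical estimates are hypothetical and are not taken from the SCA output, and any hand-made exception (such as keeping the top row of PHI3 free) would still have to be applied separately.

```python
def build_constraints(estimates, std_errors, multiplier=1.64):
    """Zero out small estimates and flag them in a 0/1 constraint matrix.

    An element is treated as "insignificant" when
    |estimate| < multiplier * standard error. A flag of 1 means
    "held fixed at 0"; a flag of 0 means "estimated freely".
    """
    coeffs, constraints = [], []
    for est_row, se_row in zip(estimates, std_errors):
        coeff_row, flag_row = [], []
        for est, se in zip(est_row, se_row):
            fixed = abs(est) < multiplier * se
            coeff_row.append(0.0 if fixed else est)
            flag_row.append(1 if fixed else 0)
        coeffs.append(coeff_row)
        constraints.append(flag_row)
    return coeffs, constraints


# Hypothetical 2x2 estimates and standard errors (NOT values from the manual).
phi_hat = [[0.52, 0.59], [0.05, 0.90]]
phi_se = [[0.15, 0.20], [0.10, 0.12]]

phi, cphi = build_constraints(phi_hat, phi_se)
print(phi)   # coefficient matrix with the small element set to 0
print(cphi)  # matching constraint matrix
```

Passing `multiplier=2.0` would reproduce the "within two standard errors" rule instead of the more conservative 1.64 threshold.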

We see that current annual sales depend on the annual sales of the prior year. Advertising outlays depend on both advertising outlays and sales of prior years. Hence the model implies a feedback relationship from SALES to ADVS, but the converse is not true. Pankratz (1991, page 354) suggests that the feedback relationship may be the result of aggregation, and that less aggregated data may exhibit no feedback. Wei (1990, page 369) also notes that, from the equations above, we should not conclude that a contemporaneous relationship between ADVS and SALES does not exist. In our vector ARMA model framework, contemporaneous relationships are exhibited through the off-diagonal elements of Σ. The value of the correlation coefficient between ADVS and SALES is about [...]. This is a clear indication of a contemporaneous relationship between the two series.

4.4 Analysis of a Mixed Vector ARMA Model

In the previous examples, our final model was either a pure MA or a pure AR vector process. In this example, we consider a mixed vector ARMA model. The data considered in this example are quarterly financial data of the U.K. between 1952/3Q and 1967/4Q, inclusive. The data were analyzed originally by Coen, Gomme and Kendall (1969), and have been studied subsequently by Box and Newbold (1970), Tiao and Box (1981), and Liu (1986). The data consist of the following three series (variable name used within the SCA workspace in parentheses): Financial Times Ordinary Share Index (STOCK), U.K. Car Production (CARPD), and Financial Times Commodity Price Index (CPI). The data are listed in Table 3, and are plotted (using SCAGRAF) in Figure 3. In Coen, Gomme and Kendall (1969), the authors were interested in the possibility of predicting the current value of STOCK from lagged values of CARPD and CPI.
A standard regression analysis was used in which $STOCK_t$ was treated as a dependent variable and $CARPD_{t-6}$ and $CPI_{t-7}$ were used as independent variables. For a discussion of this approach, see Box and Newbold (1970). In the remainder of this section, we will examine vector series with STOCK, CARPD and CPI as components.
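To make the lag structure of that regression concrete, the following Python sketch pairs each value of the dependent series with the regressor values 6 and 7 periods earlier. It is only an illustration of the alignment: the function name `lagged_design` and the toy numbers are hypothetical, and no regression is actually fitted.

```python
def lagged_design(y, x1, x2, lag1=6, lag2=7):
    """Pair y[t] with x1[t - lag1] and x2[t - lag2].

    Only time points where both lagged values exist are kept, so the
    first max(lag1, lag2) observations of y are lost.
    """
    start = max(lag1, lag2)
    return [(y[t], x1[t - lag1], x2[t - lag2]) for t in range(start, len(y))]


# Toy series of length 10 (hypothetical values, not the U.K. data).
stock = list(range(100, 110))
carpd = list(range(200, 210))
cpi = list(range(300, 310))

rows = lagged_design(stock, carpd, cpi)
for row in rows:
    print(row)  # (STOCK at t, CARPD at t-6, CPI at t-7)
```

With 62 quarterly observations, this alignment leaves 62 - 7 = 55 usable rows for the regression.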

Table 3. U.K. Financial Data, 1952/3Q to 1967/4Q

Financial Times Ordinary Share Index, read across
U.K. Car Production, read across (unit: 1000 cars)
Financial Times Commodity Price Index, read across

Figure 3. Plot of U.K. financial data

Preliminary model identification, CCM and STEPAR

Although no seasonality is present, the plots of STOCK, CARPD and CPI appear to indicate a degree of nonstationarity in all three series. In fact, Box and Newbold (1970) use first-order differenced data in their analysis. Because of the issues related to differencing in the vector case (see Section 1.4), we will proceed with caution with respect to differencing all series. As a first step in our preliminary model identification, we will compute the CCM for the vector series. SCA output is edited for presentation purposes.

-->CCM STOCK, CARPD, CPI. MAXLAG IS 10.

TIME PERIOD ANALYZED TO 62
EFFECTIVE NUMBER OF OBSERVATIONS (NOBE)
SERIES NAME MEAN STD. ERROR
1 STOCK CARPD CPI
BEHAVIOR OF VALUES IN (I,J)TH POSITION OF CROSS CORRELATION MATRIX OVER ALL OUTPUTTED LAGS WHEN SERIES J LEADS SERIES I
CROSS CORRELATION MATRICES IN TERMS OF +,-,.
LAGS 1 THROUGH
LAGS 7 THROUGH

The original series show high and persistent auto- and cross-correlation. In particular, the autocorrelations of each series are significantly positive through at least lag 8. This is additional evidence that differencing may be considered. Before such a step is taken, we will consider the results of stepwise AR fitting through 5 lags.

-->STEPAR STOCK, CARPD, CPI. ARFITS ARE 1 TO 5.

TIME PERIOD ANALYZED TO 62
EFFECTIVE NUMBER OF OBSERVATIONS (NOBE)
SERIES NAME MEAN STD. ERROR
1 STOCK CARPD CPI
DETERMINANT OF S(0) = E+09
NOTE: S(0) IS THE SAMPLE COVARIANCE MATRIX OF W(MAXLAG+1),...,W(NOBE)

========== STEPWISE AUTOREGRESSION SUMMARY ==========
     I RESIDUAL  I EIGENVAL. I CHI-SQ I     I SIGNIFICANCE
 LAG I VARIANCES I OF SIGMA  I  TEST  I AIC I OF PARTIAL AR COEFF
  1  I .296E+03  I .378E+01  I        I     I + . .
     I .946E+03  I .235E+03  I        I     I . + .
     I .426E+01  I .101E+04  I        I     I
  2  I .269E+03  I .302E+01  I        I     I - . .
     I .892E+03  I .215E+03  I        I     I . . .
     I .323E+01  I .947E+03  I        I     I
  3  I .245E+03  I .297E+01  I  9.57  I     I . . .
     I .861E+03  I .182E+03  I        I     I . . .
     I .319E+01  I .924E+03  I        I     I
  4  I .243E+03  I .288E+01  I  3.60  I     I . . .
     I .841E+03  I .176E+03  I        I     I . . .
     I .313E+01  I .909E+03  I        I     I
  5  I .213E+03  I .262E+01  I        I     I . + .
     I .749E+03  I .164E+03  I        I     I . + .
     I .292E+01  I .798E+03  I        I     I
NOTE: CHI-SQUARED CRITICAL VALUES WITH 9 DEGREES OF FREEDOM ARE 5 PERCENT: 16.9 1 PERCENT: 21.7

From the summary table above, it is clear that little improvement in the model occurs after lag $l = 2$. After lag 2, most of the elements of $\Phi_{ll}$ are small compared with their estimated standard errors. The $M(l)$ statistic, which is approximately distributed as a $\chi^2$ with 9 degrees of freedom, fails to show significant improvement after lag 2, and there are no obvious reductions in residual variances. In addition, the AIC reaches a minimum at lag 2. As a result, we may conclude that an AR(2) model is adequate for these data. However, we may still have a concern about differencing the data. For this reason, we may also wish to investigate a model using first differences of all series. Alternatively, the data may be suggesting that 2 parameter matrices are required, but one could be an MA matrix. This could account for the stepwise results after lag 1. We will now investigate this possibility.
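The order-selection logic described above can be sketched in Python. Two assumptions are made loudly here. First, the form of the stepwise statistic in `m_statistic` (a Bartlett-corrected log ratio of residual covariance determinants) is assumed from the usual presentation of this procedure; the manual's own definition in an earlier section should be preferred. Second, the statistic values for lags 1, 2 and 5 in the example are hypothetical; only 9.57 and 3.60 appear in the summary table. The function names are illustrative.

```python
import math


def m_statistic(det_prev, det_curr, n_eff, lag, k):
    """Assumed form of the stepwise chi-square statistic for adding one lag.

    det_prev, det_curr: determinants of the residual covariance matrices
    after fitting (lag - 1) and lag autoregressive terms; n_eff: effective
    number of observations; k: number of series. Compared against a
    chi-square distribution with k * k degrees of freedom.
    """
    return -(n_eff - 0.5 - lag * k) * math.log(det_curr / det_prev)


def last_significant_lag(stats, critical):
    """Largest lag (1-based) whose statistic exceeds the critical value."""
    order = 0
    for lag, value in enumerate(stats, start=1):
        if value > critical:
            order = lag
    return order


k = 3                      # STOCK, CARPD, CPI
degrees_of_freedom = k * k  # 9, as in the NOTE under the table
CRITICAL_5_PERCENT = 16.9   # chi-square, 9 degrees of freedom

# Lags 3 and 4 use the values shown in the table; lags 1, 2 and 5 are
# HYPOTHETICAL placeholders chosen only to mimic the described pattern.
stats = [250.0, 30.0, 9.57, 3.60, 12.0]
order = last_significant_lag(stats, CRITICAL_5_PERCENT)
print(order)
```

Under these placeholder values the rule stops at lag 2, matching the conclusion drawn from the table; the AIC column would be checked alongside it in practice.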

Identification methods for a mixed model

As in the univariate case, we have relatively simple and effective tools to determine the order of a pure autoregressive or a pure moving average stationary vector ARMA model. However, if both p and q are nonzero, then the identification of the model can be more difficult if we rely solely on the information provided by the sample CCM and stepwise AR fits. Various methods are available to us for the identification of a mixed model. One such method is based on the combined use of autoregressive fitting and the sample CCM of the residuals of these fits. The rationale for this method is the assumption that if the proper AR and MA orders are p and q, respectively, then the CCM of the residual series after an AR(p) fit may follow the pattern of an MA(q) vector model. However, if the true model is a mixed vector ARMA(p,q) model, then we may have significant bias in the estimation of the pure vector AR parameters. Hence the residuals after an AR(p) fit may not be consonant with a vector MA(q) process. Nevertheless, the method is often useful in practice. In this example, we may wish to investigate the CCM of the residual series after both an AR(1) and an AR(2) fit. We can also examine the values of the $\Phi_1$ matrix as a rudimentary check on differencing (i.e., we may consider differencing all series if $\Phi_1 \approx I$). We can obtain this information if we enter the following SCA command (SCA output is edited for presentation purposes):

-->STEPAR STOCK,CARPD,CPI. ARFITS ARE 1,2. RCCM ARE
--> MAXLAG IS 10. OUTPUT LEVEL(DETAILED).

TIME PERIOD ANALYZED TO 62
EFFECTIVE NUMBER OF OBSERVATIONS (NOBE)
SERIES NAME MEAN STD. ERROR
1 STOCK CARPD CPI
AUTOREGRESSIVE FITTING ON LAG(S) 1
=== PHI( 1) ===
STANDARD ERRORS
RESIDUAL COVARIANCE MATRIX S( 1) .283E E E E E E+01

190 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES 4.51 RESIDUAL CORRELATION MATRIX RS( 1) BEHAVIOR OF VALUES IN (I,J)TH POSITION OF CROSS CORRELATION MATRIX OVER ALL OUTPUTTED LAGS WHEN SERIES J LEADS SERIES I CROSS CORRELATION MATRICES IN TERMS OF +,-,. LAGS 1 THROUGH LAGS 7 THROUGH AUTOREGRESSIVE FITTING ON LAG(S) 1 2 === PHI( 1) === STANDARD ERRORS === PHI( 2) === STANDARD ERRORS RESIDUAL COVARIANCE MATRIX S( 2).258E+03

191 4.52 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES.181E E E E E+01 RESIDUAL CORRELATION MATRIX RS( 2) BEHAVIOR OF VALUES IN (I,J)TH POSITION OF CROSS CORRELATION MATRIX OVER ALL OUTPUTTED LAGS WHEN SERIES J LEADS SERIES I CROSS CORRELATION MATRICES IN TERMS OF +,-,. LAGS 1 THROUGH LAGS 7 THROUGH ========== STEPWISE AUTOREGRESSION SUMMARY ========== I RESIDUAL I EIGENVAL.I CHI-SQ I I SIGNIFICANCE LAG I VARIANCESI OF SIGMA I TEST I AIC I OF PARTIAL AR COEFF I.283E+03 I.394E+01 I I I +.. I.900E+03 I.226E+03 I I I. +. I.439E+01 I.958E+03 I I I I.258E+03 I.311E+01 I I I -.. I.853E+03 I.207E+03 I I I... I.329E+01 I.904E+03 I I I NOTE: CHI-SQUARED CRITICAL VALUES WITH 9 DEGREES OF FREEDOM ARE 5 PERCENT: PERCENT: 21.7 The above information is revealing. The pattern of the CCM of the residuals after an AR(2) fit is consonant with that of a vector white noise process. Hence an AR(2) model may be appropriate for the data. However, we also note that there is only one non-zero residual CCM after an AR(1) fit, and this is the lag 1 CCM. The estimates of Φ 1 are approximately 0.88I. It is possible to interpret this as a need to difference the series. However, it may be more prudent to continue with an AR(1) term. Hence the possibility of a vector ARMA(1,1) model is also suggested.
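The reading of the residual CCM patterns above can be mimicked with a short Python sketch. It converts sample cross correlations into the '+', '-', '.' symbols using the 2/SQRT(NOBE) rule that SCA prints with its CCM summaries, and then reports the last lag at which any symbol is non-null. The function names and the residual correlation values are hypothetical; they only imitate the situation described for the AR(1) residuals.

```python
import math


def ccm_symbols(corr, nobe):
    """Map a correlation matrix to rows of '+', '-', '.' symbols.

    '+' if value > 2/sqrt(nobe), '-' if value < -2/sqrt(nobe), else '.'.
    """
    limit = 2.0 / math.sqrt(nobe)

    def symbol(r):
        if r > limit:
            return "+"
        if r < -limit:
            return "-"
        return "."

    return ["".join(symbol(r) for r in row) for row in corr]


def last_nonzero_lag(ccms, nobe):
    """Largest lag (1-based) whose symbol matrix has a '+' or '-' entry."""
    cutoff = 0
    for lag, corr in enumerate(ccms, start=1):
        if any(ch != "." for row in ccm_symbols(corr, nobe) for ch in row):
            cutoff = lag
    return cutoff


# HYPOTHETICAL 2x2 residual cross correlation matrices for lags 1 to 3.
residual_ccms = [
    [[0.35, -0.30], [0.10, 0.00]],   # lag 1: two significant entries
    [[0.05, -0.10], [0.12, 0.02]],   # lag 2: nothing significant
    [[-0.08, 0.04], [0.01, 0.15]],   # lag 3: nothing significant
]

print(ccm_symbols(residual_ccms[0], nobe=61))
print(last_nonzero_lag(residual_ccms, nobe=61))
```

A cutoff at lag 1 in the residuals of an AR(1) fit is the pattern that points toward adding a single MA matrix, that is, toward an ARMA(1,1) candidate.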

Extended sample cross correlation matrices

The method used above for the identification of a mixed vector ARMA process is somewhat ad hoc. Two other methods for such an identification have also been developed. One is the multivariate extension of the univariate extended sample autocorrelation function, or sample EACF (see Forecasting and Time Series Analysis Using the SCA Statistical System, Volume 1). The multivariate method uses the extended sample cross correlation matrices, or sample ECCM (see Tiao and Tsay, 1983). As in the univariate case, the ECCM provides a unified approach to the identification of both stationary and nonstationary mixed vector ARMA models. A table of values and a simplified summary table are produced that are analogous to those of the univariate case. However, in the vector case, matrices rather than scalar values or symbols are displayed. In the condensed summary, the 'O' and 'X' symbols of the univariate table are replaced by matrices in the reduced ('+', '-', '.') form. Model identification proceeds as in the univariate case. That is, we attempt to determine a triangle of "insignificant" matrices. The vertex of this triangle designates the maximum ARMA orders. The ECCM paragraph is used to compute and display the extended sample cross correlation matrices of a vector time series. We can compute the ECCM here by entering

-->ECCM STOCK,CARPD,CPI

TIME PERIOD ANALYZED TO 62
EFFECTIVE NUMBER OF OBSERVATIONS (NOBE)
*** ALL SERIES HAVE BEEN NORMALIZED ***
THE ECCM TABLE:
(P= 0) (P= 1) (P= 2) (P= 3) (P= 4) (P= 5) (P= 6)

193 4.54 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES (P= 0) (P= 1) (P= 2) (P= 3) (P= 4) (P= 5) (P= 6) ***** THE SIMPLIFIED ECCM TABLE ***** (Q-->) I (P= 0)I I I I (P= 1)I I I I (P= 2)I I I I (P= 3)I I I I (P= 4)I I I I (P= 5)I I I I (P= 6)I I I

The initial summary information is the same as that produced by the CCM or STEPAR paragraph. A triangle of insignificant matrices appears to emanate from p=1, q=1. Hence we may wish to consider the use of a vector ARMA(1,1) model. Due to sampling fluctuations, the condensed ECCM table may not always provide a pattern as clear as the one above. However, it may indicate a few possible sets of candidates for p and q. Moreover, this method is only effective for nonseasonal series, and is appropriate if the number of series involved is small (e.g., no more than 3 or 4 series). When the number of series increases, the ECCM method tends to identify a low order vector ARMA model (such as ARMA(1,1)) for the series.

Smallest canonical correlation table

To correct the latter deficiency in the ECCM method, Tiao and Tsay (1985) developed an improved identification method employing a smallest canonical correlation analysis (SCAN) of the vector series. This method may also uncover hidden relationships that may exist among the series. The approach utilizes a canonical correlation analysis of the vector series and the smallest eigenvalue of a computed matrix. A two-way table of statistics is derived. Each statistic is a function of the smallest eigenvalue of a matrix derived from the autocovariance matrices of the series. The table that summarizes the results is called the smallest canonical correlation (SCAN) table. The SCAN table produced for a vector series has the same form as that produced in the univariate case (see Forecasting and Time Series Analysis Using the SCA Statistical System, Volume 1). That is, unlike the ECCM results, the SCAN table consists of scalar values and the "scalar significance symbols", 'O' and 'X'. Here we search for a corner of insignificant values, the upper left-hand corner of which indicates the maximum orders of p and q. In this example, the SCAN table indicates that the values for p and q are 1.
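The search for a corner of insignificant values can be sketched in Python. The table below is the simplified SCAN table (1% level) reported for this series, with rows indexed by p and columns by q. The rule coded here, a rectangle of 'O' symbols extending to the lower right edge of the table, is a simplification assumed for illustration; the function name `minimal_corners` is hypothetical and this is not the algorithm used inside SCA.

```python
def minimal_corners(table):
    """Return (p, q) pairs that start an all-'O' lower-right rectangle.

    A cell (p, q) is a corner if every entry in rows >= p and columns >= q
    is 'O'. It is minimal if neither the cell above nor the cell to the
    left is also a corner.
    """
    n_rows, n_cols = len(table), len(table[0])

    def is_corner(p, q):
        return all(
            table[i][j] == "O"
            for i in range(p, n_rows)
            for j in range(q, n_cols)
        )

    found = []
    for p in range(n_rows):
        for q in range(n_cols):
            if not is_corner(p, q):
                continue
            corner_above = p > 0 and is_corner(p - 1, q)
            corner_left = q > 0 and is_corner(p, q - 1)
            if not corner_above and not corner_left:
                found.append((p, q))
    return found


# Simplified SCAN table for STOCK, CARPD, CPI (rows p = 0..6, columns q = 0..6).
SCAN_TABLE = [
    "XXXOOOO",
    "XOOOOOO",
    "OOOOOOO",
    "OOOOOOO",
    "OOOOOOO",
    "OOOOOOO",
    "OOOOOOO",
]

print(minimal_corners(SCAN_TABLE))
```

Besides the mixed (p, q) = (1, 1) candidate singled out in the text, the same rule also lists a pure AR(2) and a pure MA(3) candidate, which is consistent with the AR(2) alternative raised by the stepwise fits.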
To obtain the SCAN table for this vector series, we can simply enter

-->SCAN STOCK, CARPD, CPI

TIME PERIOD ANALYZED TO 62
EFFECTIVE NUMBER OF OBSERVATIONS (NOBE)
THE SCAN TABLE (NORMALIZED BY 1% CHI-SQUARE CRITICAL VALUES):
Q:

195 4.56 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES THE CRITERION VALUES (NORMALIZED BY 1% CHI-SQUARE CRITICAL VALUES): Q: SIMPLIFIED SCAN TABLE (1% LEVEL): Q: : X X X O O O O 1: X O O O O O O 2: O O O O O O O 3: O O O O O O O 4: O O O O O O O 5: O O O O O O O 6: O O O O O O O NO. OF ZERO EIGENVALUES: Q: : : : : : : : Model specification and estimation for the U.K. financial data We have evidence to consider either a vector AR(2) or a vector ARMA(1,1) model for the data. Tiao and Box (1981) noted that the ARMA(1,1) model provided a slightly better representation. For that reason, we will pursue the ARMA(1,1) model at this time. We can compare results for this model with those obtained from the AR(2) fit in STEPAR above. We can specify our model by entering -->MTSMODEL UKMODEL. SERIES ARE STOCK,CARPD,CPI. --> MODEL IS --> CONSTRAINTS ARE SUMMARY FOR MULTIVARIATE ARMA MODEL -- UKMODEL VARIABLE DIFFERENCING STOCK CARPD CPI

196 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES 4.57 PARAMETER FACTOR ORDER CONSTRAINT 1 CONST CONSTANT PHI1 REG AR 1 CPHI1 3 THETA REG MA 1 CTHETA We will begin the model estimation using the default method (i.e., the conditional method). The command and the output are shown below. -->MESTIM UKMODEL. SUMMARY FOR THE MULTIVARIATE ARMA MODEL SERIES NAME MEAN STD DEV DIFFERENCE ORDER(S) 1 STOCK CARPD CPI NUMBER OF OBSERVATIONS = 62 (EFFECTIVE NUMBER = NOBE = 61) MODEL SPECIFICATION WITH PARAMETER VALUES PARAMETER PARAMETER PARAMETER NUMBER DESCRIPTION VALUE CONSTANT( 1) CONSTANT( 2) CONSTANT( 3) AUTOREGRESSIVE ( 1, 1, 1) AUTOREGRESSIVE ( 1, 1, 2) AUTOREGRESSIVE ( 1, 1, 3) AUTOREGRESSIVE ( 1, 2, 1) AUTOREGRESSIVE ( 1, 2, 2) AUTOREGRESSIVE ( 1, 2, 3) AUTOREGRESSIVE ( 1, 3, 1) AUTOREGRESSIVE ( 1, 3, 2) AUTOREGRESSIVE ( 1, 3, 3) MOVING AVERAGE ( 1, 1, 1) MOVING AVERAGE ( 1, 1, 2) MOVING AVERAGE ( 1, 1, 3) MOVING AVERAGE ( 1, 2, 1) MOVING AVERAGE ( 1, 2, 2) MOVING AVERAGE ( 1, 2, 3) MOVING AVERAGE ( 1, 3, 1) MOVING AVERAGE ( 1, 3, 2) MOVING AVERAGE ( 1, 3, 3) ERROR COVARIANCE MATRIX ITERATIONS TERMINATED DUE TO: RELATIVE CHANGE IN DETERMINANT OF COVARIANCE MATRIX.LE..100E-03 TOTAL NUMBER OF ITERATIONS IS 15 FINAL MODEL SUMMARY WITH CONDITIONAL LIKELIHOOD PARAMETER ESTIMATES

197 4.58 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES CONSTANT VECTOR (STD ERROR) ( ) ( ) ( ) PHI MATRICES ESTIMATES OF PHI( 1 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS THETA MATRICES ESTIMATES OF THETA( 1 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS ERROR COVARIANCE MATRIX Almost one-half of the ARMA parameter estimates are within two standard errors of zero. As a result, we will specify values in the constraint and parameter matrices in order to "zero out" these terms and then re-estimate the model. -->PHI1(1,3)=0.0 -->PHI1(2,1)=0.0 -->PHI1(2,3)=0.0 -->PHI1(3,1)=0.0 -->PHI1(3,2)=0.0 -->THETA(1,1)=0.0 -->THETA(1,3)=0.0 -->THETA(2,2)=0.0 -->THETA(2,3)=0.0 -->CPHI1(1,3)=1 -->CPHI1(2,1)=1

198 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES >CPHI1(2,3)=1 -->CPHI1(3,1)=1 -->CPHI1(3,2)=1 -->CTHETA(1,1)=1 -->CTHETA(1,3)=1 -->CTHETA(2,2)=1 -->CTHETA(2,3)=1 -->MESTIM UKMODEL. SUMMARY FOR THE MULTIVARIATE ARMA MODEL SERIES NAME MEAN STD DEV DIFFERENCE ORDER(S) 1 STOCK CARPD CPI NUMBER OF OBSERVATIONS = 62 (EFFECTIVE NUMBER = NOBE = 61) MODEL SPECIFICATION WITH PARAMETER VALUES PARAMETER PARAMETER PARAMETER NUMBER DESCRIPTION VALUE CONSTANT( 1) CONSTANT( 2) CONSTANT( 3) AUTOREGRESSIVE ( 1, 1, 1) AUTOREGRESSIVE ( 1, 1, 2) *FIXED* AUTOREGRESSIVE ( 1, 1, 3) *FIXED* AUTOREGRESSIVE ( 1, 2, 1) AUTOREGRESSIVE ( 1, 2, 2) *FIXED* AUTOREGRESSIVE ( 1, 2, 3) *FIXED* AUTOREGRESSIVE ( 1, 3, 1) *FIXED* AUTOREGRESSIVE ( 1, 3, 2) AUTOREGRESSIVE ( 1, 3, 3) *FIXED* MOVING AVERAGE ( 1, 1, 1) MOVING AVERAGE ( 1, 1, 2) *FIXED* MOVING AVERAGE ( 1, 1, 3) MOVING AVERAGE ( 1, 2, 1) *FIXED* MOVING AVERAGE ( 1, 2, 2) *FIXED* MOVING AVERAGE ( 1, 2, 3) MOVING AVERAGE ( 1, 3, 1) MOVING AVERAGE ( 1, 3, 2) MOVING AVERAGE ( 1, 3, 3) ERROR COVARIANCE MATRIX ITERATIONS TERMINATED DUE TO: RELATIVE CHANGE IN DETERMINANT OF COVARIANCE MATRIX.LE..100E-03 TOTAL NUMBER OF ITERATIONS IS 6

199 4.60 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES FINAL MODEL SUMMARY WITH CONDITIONAL LIKELIHOOD PARAMETER ESTIMATES CONSTANT VECTOR (STD ERROR) ( ) ( ) ( ) PHI MATRICES ESTIMATES OF PHI( 1 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS THETA MATRICES ESTIMATES OF THETA( 1 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS ERROR COVARIANCE MATRIX Again, there are many ARMA parameter estimates that are not statistically different from zero. We will restrict the model further and re-estimate it. -->PHI1(1,2)=0.0 -->THETA(1,2)=0.0 -->THETA(2,1)=0.0 -->THETA(3,2)=0.0 -->CPHI1(1,2)=1 -->CTHETA(1,2)=1 -->CTHETA(2,1)=1 -->CTHETA(3,2)=1 -->MESTIM UKMODEL. SUMMARY FOR THE MULTIVARIATE ARMA MODEL SERIES NAME MEAN STD DEV DIFFERENCE ORDER(S)

200 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES STOCK CARPD CPI NUMBER OF OBSERVATIONS = 62 (EFFECTIVE NUMBER = NOBE = 61) MODEL SPECIFICATION WITH PARAMETER VALUES PARAMETER PARAMETER PARAMETER NUMBER DESCRIPTION VALUE CONSTANT( 1) CONSTANT( 2) CONSTANT( 3) AUTOREGRESSIVE ( 1, 1, 1) *FIXED* AUTOREGRESSIVE ( 1, 1, 2) *FIXED* AUTOREGRESSIVE ( 1, 1, 3) *FIXED* AUTOREGRESSIVE ( 1, 2, 1) AUTOREGRESSIVE ( 1, 2, 2) *FIXED* AUTOREGRESSIVE ( 1, 2, 3) *FIXED* AUTOREGRESSIVE ( 1, 3, 1) *FIXED* AUTOREGRESSIVE ( 1, 3, 2) AUTOREGRESSIVE ( 1, 3, 3) *FIXED* MOVING AVERAGE ( 1, 1, 1) *FIXED* MOVING AVERAGE ( 1, 1, 2) *FIXED* MOVING AVERAGE ( 1, 1, 3) *FIXED* MOVING AVERAGE ( 1, 2, 1) *FIXED* MOVING AVERAGE ( 1, 2, 2) *FIXED* MOVING AVERAGE ( 1, 2, 3) MOVING AVERAGE ( 1, 3, 1) *FIXED* MOVING AVERAGE ( 1, 3, 2) MOVING AVERAGE ( 1, 3, 3) ERROR COVARIANCE MATRIX ITERATIONS TERMINATED DUE TO: RELATIVE CHANGE IN DETERMINANT OF COVARIANCE MATRIX.LE..100E-03 TOTAL NUMBER OF ITERATIONS IS 4 FINAL MODEL SUMMARY WITH CONDITIONAL LIKELIHOOD PARAMETER ESTIMATES CONSTANT VECTOR (STD ERROR) ( ) ( ) ( ) PHI MATRICES ESTIMATES OF PHI( 1 ) MATRIX AND SIGNIFICANCE

201 4.62 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES STANDARD ERRORS THETA MATRICES ESTIMATES OF THETA( 1 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS ERROR COVARIANCE MATRIX Since the model contains an MA component, we will re-estimate the final model using the exact algorithm. The SCA command and its output are shown below. In this case, the parameter estimates for the MA matrix are similar between the two estimation methods. This is due to the fact that the MA parameter is far from the unit root. -->MESTIM UKMODEL. METHOD IS EXACT. HOLD RESID(R1,R2,R3). SUMMARY FOR THE MULTIVARIATE ARMA MODEL SERIES NAME MEAN STD DEV DIFFERENCE ORDER(S) 1 STOCK CARPD CPI NUMBER OF OBSERVATIONS = 62 (EFFECTIVE NUMBER = NOBE = 61) MODEL SPECIFICATION WITH PARAMETER VALUES PARAMETER PARAMETER PARAMETER NUMBER DESCRIPTION VALUE CONSTANT( 1) CONSTANT( 2) CONSTANT( 3) AUTOREGRESSIVE ( 1, 1, 1) *FIXED* AUTOREGRESSIVE ( 1, 1, 2) *FIXED* AUTOREGRESSIVE ( 1, 1, 3)

202 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES 4.63 *FIXED* AUTOREGRESSIVE ( 1, 2, 1) AUTOREGRESSIVE ( 1, 2, 2) *FIXED* AUTOREGRESSIVE ( 1, 2, 3) *FIXED* AUTOREGRESSIVE ( 1, 3, 1) *FIXED* AUTOREGRESSIVE ( 1, 3, 2) AUTOREGRESSIVE ( 1, 3, 3) *FIXED* MOVING AVERAGE ( 1, 1, 1) *FIXED* MOVING AVERAGE ( 1, 1, 2) *FIXED* MOVING AVERAGE ( 1, 1, 3) *FIXED* MOVING AVERAGE ( 1, 2, 1) *FIXED* MOVING AVERAGE ( 1, 2, 2) *FIXED* MOVING AVERAGE ( 1, 2, 3) MOVING AVERAGE ( 1, 3, 1) *FIXED* MOVING AVERAGE ( 1, 3, 2) MOVING AVERAGE ( 1, 3, 3) ERROR COVARIANCE MATRIX ITERATIONS TERMINATED DUE TO: CHANGE IN (-2*LOG LIKELIHOOD)/NOBE.LE..100E-03 TOTAL NUMBER OF ITERATIONS IS 2 FINAL MODEL SUMMARY WITH MAXIMUM LIKELIHOOD PARAMETER ESTIMATES CONSTANT VECTOR (STD ERROR) ( ) ( ) ( ) PHI MATRICES ESTIMATES OF PHI( 1 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS THETA MATRICES ESTIMATES OF THETA( 1 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS

203 4.64 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES ERROR COVARIANCE MATRIX *(LOG LIKELIHOOD AT FINAL ESTIMATES) IS E Diagnostic checking and implication of the fitted model As a diagnostic check of the final restricted ARMA(1,1) model, we may plot the residual series of the fit (R1, R2 and R3) and compute their CCM. From the residual plots (not shown here) and the CCM, we conclude that the model is an adequate representation for the vector time series. Below is the display for the CCM output. -->CCM R1,R2,R3. MAXLAG IS 10. TIME PERIOD ANALYZED TO 62 EFFECTIVE NUMBER OF OBSERVATIONS (NOBE) SERIES NAME MEAN STD. ERROR 1 R R R NOTE: THE APPROX. STD. ERROR FOR THE ESTIMATED CORRELATIONS BELOW IS (1/NOBE**.5) = SAMPLE CORRELATION MATRIX OF THE SERIES SUMMARIES OF CROSS CORRELATION MATRICES USING +,-,., WHERE + DENOTES A VALUE GREATER THAN 2/SQRT(NOBE) - DENOTES A VALUE LESS THAN -2/SQRT(NOBE). DENOTES A NON-SIGNIFICANT VALUE BASED ON THE ABOVE CRITERION BEHAVIOR OF VALUES IN (I,J)TH POSITION OF CROSS CORRELATION MATRIX OVER ALL OUTPUTTED LAGS WHEN SERIES J LEADS SERIES I CROSS CORRELATION MATRICES IN TERMS OF +,-,. LAGS 1 THROUGH

LAGS 7 THROUGH

Using the results of the final restricted fit, we obtain the following (approximate) equations for each component series:

$$(1 - 0.98B)\,STOCK_t = a_{1t} \qquad (28)$$
$$(1 - 0.93B)\,CARPD_t = a_{2t} \qquad (29)$$
$$(1 - 0.84B)\,CPI_t = [\ldots]\,a_{1(t-1)} + (1\,[\ldots]\,B)\,a_{3t} \qquad (30)$$

If we substitute (28) into (30), we obtain

$$(1 - 0.84B)\,CPI_t = [\ldots]\,(1 - 0.98B)\,STOCK_{t-1} + (1\,[\ldots]\,B)\,a_{3t} \qquad (31)$$

We see from (28), (29) and (30) that all three series behave approximately as random walks with slightly correlated innovations. From the forecasting point of view, equation (31) is of some interest since it implies that STOCK is a leading indicator at lag 1 for CPI. Its effect, however, is small. This can be confirmed if (31) is compared to a fit of a univariate ARIMA model of CPI (see Tiao and Box, 1981). Although nothing surprising is revealed by this model, we are certain that we have not been misled in any way.

4.5 Modeling Seasonal Data: Census Housing Data

In previous examples, we modeled nonseasonal vector time series. In this section we consider a seasonal example. The bivariate vector time series we will use consists of monthly U.S. housing starts and monthly sales of single family houses from January 1965 to December 1975 (all data are in thousands of units). These series, obtained originally from the U.S. Bureau of the Census, have been studied by Hillmer and Tiao (1979) and others. The data are stored in the SCA workspace under the names HSTARTS and HSOLD. The series are listed in Table 4 and are plotted (using SCAGRAF) in Figure 4.

Table 4. Census Housing Data

Single family housing starts (in thousands), Jan 1965 to Dec 1975, read across

205 4.66 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES Single family houses sold (in thousands) Jan Dec Read across Figure 4. Plots of Census Housing Data

Preliminary model identification

We observe a strong seasonal behavior in both series. Hence it is likely that we will need to consider modeling the seasonally differenced series $(1 - B^{12})HSTARTS_t$ and $(1 - B^{12})HSOLD_t$. We can confirm the nonstationary behavior if we compute the CCM for the original (undifferenced) series. SCA output is edited for presentation purposes.

-->CCM HSTARTS, HSOLD

TIME PERIOD ANALYZED TO 132
EFFECTIVE NUMBER OF OBSERVATIONS (NOBE)
SERIES NAME MEAN STD. ERROR
1 HSTARTS HSOLD
BEHAVIOR OF VALUES IN (I,J)TH POSITION OF CROSS CORRELATION MATRIX OVER ALL OUTPUTTED LAGS WHEN SERIES J LEADS SERIES I
CROSS CORRELATION MATRICES IN TERMS OF +,-,.
LAGS 1 THROUGH
LAGS 7 THROUGH
LAGS 13 THROUGH
LAGS 19 THROUGH

The presence of matrices consisting of '+' values for lags 12 and 24 of the CCM confirms our suspicion that a 12th order differencing of both series is required. (Note: The univariate ACF of each series, not shown, may demonstrate more clearly the need for 12th order differencing.) We will now re-compute the CCM for the appropriately differenced data. Again, SCA output is edited for presentation purposes.

-->CCM HSTARTS, HSOLD. DFORDER IS 12.

207 4.68 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES 12 DIFFERENCE ORDERS (1-B ) TIME PERIOD ANALYZED TO 132 EFFECTIVE NUMBER OF OBSERVATIONS (NOBE) SERIES NAME MEAN STD. ERROR 1 HSTARTS HSOLD BEHAVIOR OF VALUES IN (I,J)TH POSITION OF CROSS CORRELATION MATRIX OVER ALL OUTPUTTED LAGS WHEN SERIES J LEADS SERIES I CROSS CORRELATION MATRICES IN TERMS OF +,-,. LAGS 1 THROUGH LAGS 7 THROUGH LAGS 13 THROUGH LAGS 19 THROUGH The behavior of the 12th and 24th lags indicates that we have partially accounted for nonstationarity due to the seasonal behavior of the series. The long string of matrices consisting of '+' values clearly indicates that a low order MA model is not appropriate. However, the positioning of the string, beginning at lag 1, may also make us wonder whether first order differencing is warranted. Since we wish to approach differencing cautiously, we will first examine the results from stepwise fits of the seasonally differenced vector series. We will consider stepwise fitting through 6 lags (SCA output is edited for presentation purposes). -->STEPAR HSTARTS, HSOLD. DFORDER IS 12. ARFITS ARE 1 TO 6.

208 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES DIFFERENCE ORDERS (1-B ) TIME PERIOD ANALYZED TO 132 EFFECTIVE NUMBER OF OBSERVATIONS (NOBE) SERIES NAME MEAN STD. ERROR 1 HSTARTS HSOLD DETERMINANT OF S(0) = E+04 ========== STEPWISE AUTOREGRESSION SUMMARY ========== I RESIDUAL I EIGENVAL.I CHI-SQ I I SIGNIFICANCE LAG I VARIANCESI OF SIGMA I TEST I AIC I OF PARTIAL AR COEFF I.544E+02 I.174E+02 I I I + + I.202E+02 I.571E+02 I I I I.527E+02 I.172E+02 I 4.67 I I.. I.201E+02 I.557E+02 I I I I.521E+02 I.171E+02 I 1.92 I I.. I.200E+02 I.550E+02 I I I I.515E+02 I.164E+02 I 5.01 I I.. I.195E+02 I.545E+02 I I I I.504E+02 I.160E+02 I 4.95 I I.. I.189E+02 I.534E+02 I I I I.497E+02 I.153E+02 I 6.35 I I.. I.179E+02 I.524E+02 I I I NOTE: CHI-SQUARED CRITICAL VALUES WITH 4 DEGREES OF FREEDOM ARE 5 PERCENT: PERCENT: 13.3 We observe a dramatic cut-off in the χ2 statistic after the first lag. In addition, the value of the AIC is its smallest for an AR(1) fit. We see (from the SIGNIFICANCE OF PARTIAL AR. COEFF. column) that Φ 1 consists entirely of significantly positive values. It would be of interest to observe how close its diagonal elements are to 1 (as the question of first order differencing is still unresolved) as well as the patterns, if any, in the CCM of the residuals of a first-order fit. We can obtain this information if we enter the following (SCA output is edited) -->STEPAR HSTARTS, HSOLD. DFORDER IS 12. ARFIT IS --> RCCM IS 1. MAXLAG IS 36. OUTPUT LEVEL(DETAILED). 12 DIFFERENCE ORDERS (1-B ) TIME PERIOD ANALYZED TO 132 EFFECTIVE NUMBER OF OBSERVATIONS (NOBE) SERIES NAME MEAN STD. ERROR 1 HSTARTS HSOLD

209 4.70 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES AUTOREGRESSIVE FITTING ON LAG(S) 1 === PHI( 1) === STANDARD ERRORS RESIDUAL COVARIANCE MATRIX S( 1).541E E E+02 RESIDUAL CORRELATION MATRIX RS( 1) BEHAVIOR OF VALUES IN (I,J)TH POSITION OF CROSS CORRELATION MATRIX OVER ALL OUTPUTTED LAGS WHEN SERIES J LEADS SERIES I CROSS CORRELATION MATRICES IN TERMS OF +,-,. LAGS 1 THROUGH LAGS 7 THROUGH LAGS 13 THROUGH LAGS 19 THROUGH

LAGS 25 THROUGH
LAGS 31 THROUGH

========== STEPWISE AUTOREGRESSION SUMMARY ==========
     I RESIDUAL  I EIGENVAL. I CHI-SQ I     I SIGNIFICANCE
 LAG I VARIANCES I OF SIGMA  I  TEST  I AIC I OF PARTIAL AR COEFF
  1  I .541E+02  I .176E+02  I        I     I + +
     I .208E+02  I .573E+02  I        I     I
NOTE: CHI-SQUARED CRITICAL VALUES WITH 4 DEGREES OF FREEDOM ARE 5 PERCENT: 9.49 1 PERCENT: 13.3

The values of the Φ matrix do not suggest that further differencing is needed. The residuals of this fit may at first glance appear relatively clean. However, 3 of the 4 values of the CCM at lag 12 are (negatively) significant. This is a clear indication of the need for a moving average matrix at this lag. The summary information shows that the sample mean of each seasonally differenced series is dwarfed by its standard error.

Model specification and estimation

Based on the results of the above STEPAR paragraph, the model

$$(I - \Phi B)(1 - B^{12})Z_t = (I - \Theta B^{12})a_t$$

may be a reasonable representation for the vector series. We can specify this model by entering

-->MTSMODEL HOUSING. SERIES ARE HSTARTS(12),HSOLD(12). MODEL
--> (1-PHI*B)SERIES=(1-THETA*B**12)NOISE. CONSTRAINTS
--> PHI(CPHI),THETA(CTHETA).

SUMMARY FOR MULTIVARIATE ARMA MODEL -- HOUSING
VARIABLE DIFFERENCING
HSTARTS 12
HSOLD 12
PARAMETER FACTOR ORDER CONSTRAINT
1 PHI REG AR 1 CPHI
2 THETA REG MA 12 CTHETA

Since we have a moving average parameter matrix in the model, it may be important to use the exact likelihood algorithm for estimation. An efficient way to proceed is to first

estimate the model using the conditional likelihood algorithm (i.e., the default method) and then re-estimate the model using the exact method. Hence we may enter the following in sequence:

-->MESTIM HOUSING.

SUMMARY FOR THE MULTIVARIATE ARMA MODEL
SERIES NAME MEAN STD DEV DIFFERENCE ORDER(S)
1 HSTARTS
2 HSOLD
NUMBER OF OBSERVATIONS = 132 (EFFECTIVE NUMBER = NOBE = 119)
MODEL SPECIFICATION WITH PARAMETER VALUES
PARAMETER  PARAMETER                  PARAMETER
NUMBER     DESCRIPTION                VALUE
           AUTOREGRESSIVE ( 1, 1, 1)
           AUTOREGRESSIVE ( 1, 1, 2)
           AUTOREGRESSIVE ( 1, 2, 1)
           AUTOREGRESSIVE ( 1, 2, 2)
           MOVING AVERAGE (12, 1, 1)
           MOVING AVERAGE (12, 1, 2)
           MOVING AVERAGE (12, 2, 1)
           MOVING AVERAGE (12, 2, 2)
ERROR COVARIANCE MATRIX
ITERATIONS TERMINATED DUE TO:
RELATIVE CHANGE IN DETERMINANT OF COVARIANCE MATRIX .LE. .100E-03
TOTAL NUMBER OF ITERATIONS IS 7
FINAL MODEL SUMMARY WITH CONDITIONAL LIKELIHOOD PARAMETER ESTIMATES
PHI MATRICES
ESTIMATES OF PHI( 1 ) MATRIX AND SIGNIFICANCE
STANDARD ERRORS
THETA MATRICES
ESTIMATES OF THETA( 12 ) MATRIX AND SIGNIFICANCE

STANDARD ERRORS
ERROR COVARIANCE MATRIX

-->MESTIM HOUSING. METHOD IS EXACT.

SUMMARY FOR THE MULTIVARIATE ARMA MODEL
SERIES NAME MEAN STD DEV DIFFERENCE ORDER(S)
1 HSTARTS
2 HSOLD
NUMBER OF OBSERVATIONS = 132 (EFFECTIVE NUMBER = NOBE = 119)
MODEL SPECIFICATION WITH PARAMETER VALUES
PARAMETER  PARAMETER                  PARAMETER
NUMBER     DESCRIPTION                VALUE
           AUTOREGRESSIVE ( 1, 1, 1)
           AUTOREGRESSIVE ( 1, 1, 2)
           AUTOREGRESSIVE ( 1, 2, 1)
           AUTOREGRESSIVE ( 1, 2, 2)
           MOVING AVERAGE (12, 1, 1)
           MOVING AVERAGE (12, 1, 2)
           MOVING AVERAGE (12, 2, 1)
           MOVING AVERAGE (12, 2, 2)
ERROR COVARIANCE MATRIX
ITERATIONS TERMINATED DUE TO:
MAXIMUM NUMBER OF ITERATIONS 10 REACHED
TOTAL NUMBER OF ITERATIONS IS 14
FINAL MODEL SUMMARY WITH MAXIMUM LIKELIHOOD PARAMETER ESTIMATES
PHI MATRICES
ESTIMATES OF PHI( 1 ) MATRIX AND SIGNIFICANCE

STANDARD ERRORS
THETA MATRICES
ESTIMATES OF THETA( 12 ) MATRIX AND SIGNIFICANCE
STANDARD ERRORS
ERROR COVARIANCE MATRIX
*(LOG LIKELIHOOD AT FINAL ESTIMATES) IS E+03

The estimates obtained with the conditional method and with the exact method differ noticeably. While the estimates of Φ remain essentially unchanged, the diagonal terms of Θ move to unity when the exact method is used. Moreover, the diagonal terms of Σ are appreciably reduced using the exact method (i.e., the variances of the residual series are made much smaller). This provides clear evidence of the usefulness of the exact method, particularly when seasonal moving average terms are involved.

We also note in the fit using the exact method that the off-diagonal terms of Θ are within 2 standard errors of 0. We will now zero out these terms and re-estimate the model. We may enter the following sequence for this purpose.

-->THETA(1,2)=0.0
-->THETA(2,1)=0.0
-->CTHETA(1,2)=1
-->CTHETA(2,1)=1
-->MESTIM HOUSING. METHOD IS EXACT. HOLD RESIDUALS(R1,R2).

SUMMARY FOR THE MULTIVARIATE ARMA MODEL
SERIES NAME MEAN STD DEV DIFFERENCE ORDER(S)
1 HSTARTS
2 HSOLD
NUMBER OF OBSERVATIONS = 132 (EFFECTIVE NUMBER = NOBE = 119)
MODEL SPECIFICATION WITH PARAMETER VALUES
PARAMETER  PARAMETER                  PARAMETER
NUMBER     DESCRIPTION                VALUE

214 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES AUTOREGRESSIVE ( 1, 1, 1) AUTOREGRESSIVE ( 1, 1, 2) AUTOREGRESSIVE ( 1, 2, 1) AUTOREGRESSIVE ( 1, 2, 2) MOVING AVERAGE (12, 1, 1) *FIXED* MOVING AVERAGE (12, 1, 2) *FIXED* MOVING AVERAGE (12, 2, 1) MOVING AVERAGE (12, 2, 2) ERROR COVARIANCE MATRIX ITERATIONS TERMINATED DUE TO: CHANGE IN (-2*LOG LIKELIHOOD)/NOBE.LE..100E-03 TOTAL NUMBER OF ITERATIONS IS 5 FINAL MODEL SUMMARY WITH MAXIMUM LIKELIHOOD PARAMETER ESTIMATES PHI MATRICES ESTIMATES OF PHI( 1 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS THETA MATRICES ESTIMATES OF THETA( 12 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS ERROR COVARIANCE MATRIX *(LOG LIKELIHOOD AT FINAL ESTIMATES) IS E+03
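The effective number of observations reported in the MESTIM output above (NOBE = 119 out of 132) is consistent with losing 12 observations to the seasonal difference (1-B**12) and one more to the lag-1 autoregressive term. The Python sketch below illustrates that bookkeeping. It is not SCA code: the function names are hypothetical, and the assumption that an AR(p) term costs exactly p observations is ours rather than a statement from the SCA output.

```python
def seasonal_difference(series, period=12):
    """Apply (1 - B**period): w[t] = z[t] - z[t - period]."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


def effective_nobs(n_obs, diff_orders=(12,), ar_order=1):
    """Observations left after differencing and AR start-up.

    Assumption (not taken from the SCA documentation): each differencing
    order d removes d observations and an AR(p) term removes p more.
    """
    return n_obs - sum(diff_orders) - ar_order


z = list(range(132))                   # placeholder data, 132 observations
w = seasonal_difference(z, 12)
print(len(w))                          # 120 seasonally differenced values
print(effective_nobs(132, (12,), 1))   # 119, the NOBE shown in the output
```

The same count applies to both the conditional and the exact runs above, since both report NOBE = 119.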

Diagnostic checks of the estimated model

The residuals of the latest fit have been retained in the SCA workspace under the variable names R1 and R2. The plots of these series do not display visible anomalies. The CCMs of these series are shown below (SCA output is edited). The plots and CCMs indicate that we have an adequate representation for the vector series.

-->CCM R1, R2.

TIME PERIOD ANALYZED TO 132
EFFECTIVE NUMBER OF OBSERVATIONS (NOBE)
SERIES NAME MEAN STD. ERROR
1 R1
2 R2
NOTE: THE APPROX. STD. ERROR FOR THE ESTIMATED CORRELATIONS BELOW IS (1/NOBE**.5) =
SAMPLE CORRELATION MATRIX OF THE SERIES
SUMMARIES OF CROSS CORRELATION MATRICES USING +,-,., WHERE
+ DENOTES A VALUE GREATER THAN 2/SQRT(NOBE)
- DENOTES A VALUE LESS THAN -2/SQRT(NOBE)
. DENOTES A NON-SIGNIFICANT VALUE BASED ON THE ABOVE CRITERION
BEHAVIOR OF VALUES IN (I,J)TH POSITION OF CROSS CORRELATION MATRIX OVER ALL OUTPUTTED LAGS WHEN SERIES J LEADS SERIES I
CROSS CORRELATION MATRICES IN TERMS OF +,-,.
LAGS 1 THROUGH
LAGS 7 THROUGH
LAGS 13 THROUGH

LAGS 19 THROUGH

In the above results, the moving average terms are quite close to unity, which suggests that both series have deterministic seasonal components. This example shows that (1) the existence of a deterministic seasonal component can be detected when the exact likelihood method is employed, and (2) an appreciable reduction in the one-step-ahead forecast variance can occur by modeling several series jointly.

4.6 Automatic Vector ARMA Estimation

As shown in the previous sections, a major task in vector ARMA model estimation is to zero out insignificant parameter estimates in the parameter matrices. This tedious task is accomplished automatically by the IMESTIM paragraph in the SCA-EXPERT module. In the IMESTIM paragraph, insignificant parameter estimates are deleted according to the following rules:

(1) In the first model revision, only the parameter estimates with absolute t-values less than 1.64 (rather than 1.96) are deleted. After the first model revision, the critical t-value is set to the typical 1.96. This is to avoid deleting too many parameters in the first model revision.

(2) In the first model revision, if both nonseasonal AR and MA polynomials are present, only insignificant terms in the MA parameter matrices will be deleted. The same rule also applies if both seasonal AR and seasonal MA polynomials are present. This is to circumvent potential problems due to high correlation between the AR and MA parameters in the initial model estimation.

(3) The constant term can only be deleted at the very last model revision. In addition, the user has an option to specify that all constant term parameters be retained even if the estimates are insignificant.

(4) To avoid any non-stationary or non-invertible vector AR or MA polynomial, the initial values of the parameters are reset to 0.1 before estimation in each model revision.
To illustrate the use and performance of the IMESTIM paragraph, we revisit the examples of the previous sections. Comments on each example are given in the corresponding subsection.
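The four deletion rules can be sketched in a few lines of Python. This is only an illustration of the rules as stated above, not the IMESTIM implementation: the function names, the convention of recognizing MA and constant terms by the prefixes "THETA" and "CONST", and the t-values in the example are all hypothetical.

```python
FIRST_CUTOFF = 1.64   # rule (1): looser cutoff in the first revision
LATER_CUTOFF = 1.96   # rule (1): usual cutoff in later revisions
RESET_VALUE = 0.1     # rule (4): starting value before each re-estimation


def select_deletions(t_values, revision, ar_and_ma_present=False,
                     final_revision=False, keep_constants=False):
    """Return the parameter names to constrain to zero in this revision.

    t_values maps a name such as "THETA(1,2)" to its t-value (hypothetical
    naming; MA terms start with "THETA", constants with "CONST").
    """
    cutoff = FIRST_CUTOFF if revision == 1 else LATER_CUTOFF
    deleted = []
    for name, t in t_values.items():
        if abs(t) >= cutoff:
            continue                      # significant: keep
        if name.startswith("CONST"):
            # rule (3): constants go only in the last revision, if allowed
            if final_revision and not keep_constants:
                deleted.append(name)
            continue
        if revision == 1 and ar_and_ma_present and not name.startswith("THETA"):
            continue                      # rule (2): MA terms only at first
        deleted.append(name)
    return deleted


def reset_initial_values(t_values, deleted):
    """Rule (4): restart every remaining parameter at 0.1."""
    return {name: RESET_VALUE for name in t_values if name not in deleted}


# Made-up t-values for illustration only.
t_values = {"PHI(1,1)": 5.2, "PHI(1,2)": 0.8, "THETA(1,1)": 1.7,
            "THETA(1,2)": -0.4, "CONST(1)": 0.3}
first = select_deletions(t_values, revision=1, ar_and_ma_present=True)
later = select_deletions(t_values, revision=2, ar_and_ma_present=True)
print(first)   # ['THETA(1,2)']
print(later)   # ['PHI(1,2)', 'THETA(1,1)', 'THETA(1,2)']
print(reset_initial_values(t_values, first))
```

In the first call the AR term PHI(1,2) survives despite its small t-value because rule (2) protects AR terms while MA terms are still being pruned, and THETA(1,1) survives because 1.7 exceeds the first-revision cutoff of 1.64.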

The simulated example in Section 2

In this example, the MA(1) model is specified in the MTSMODEL paragraph in a manner similar to that in Section 2.2. The only major difference is that the default constraint matrix automatically generated by the MTSMODEL paragraph is used, so the CONSTRAINT sentence need not be specified. The default constraint matrix CTHETA for the parameter matrix THETA is shown in the model summary. After the vector ARMA model is properly specified in the MTSMODEL paragraph, the IMESTIM paragraph is used to estimate the model automatically.

-->MTSMODEL MA1. SERIES ARE Z1,Z2. MODEL IS @
--> SERIES=CONSTANT+(1-THETA*B)NOISE.

SUMMARY FOR MULTIVARIATE ARMA MODEL -- MA1
VARIABLE  DIFFERENCING
Z1
Z2
PARAMETER   FACTOR    ORDER  CONSTRAINT
1 CONSTANT  CONSTANT
2 THETA     REG MA    1      CTHETA

-->IMESTIM MA1. HOLD RESIDUALS(R1,R2).

SUMMARY OF THE TIME SERIES
SERIES NAME MEAN STD DEV DIFFERENCE ORDER(S)
1 Z1
2 Z2
ERROR COVARIANCE MATRIX
ITERATIONS TERMINATED DUE TO:
RELATIVE CHANGE IN DETERMINANT OF COVARIANCE MATRIX .LE E-03
TOTAL NUMBER OF ITERATIONS IS 8
MODEL SUMMARY WITH CONDITIONAL LIKELIHOOD PARAMETER ESTIMATES
CONSTANT VECTOR (STD ERROR)
( ) ( )
THETA MATRICES
ESTIMATES OF THETA( 1 ) MATRIX AND SIGNIFICANCE

STANDARD ERRORS
ERROR COVARIANCE MATRIX

Since all parameter estimates in the Θ matrix are significant, no model revision is necessary. The parameter estimates are the same as those shown in Section 2.3. To reduce the amount of output, the correlation matrix of the parameters is not displayed at the normal level of output. Since this is an MA model, we may re-estimate the model with either the MESTIM or the IMESTIM paragraph using the exact algorithm (by specifying the sentence METHOD IS EXACT).

The Lydia Pinkham example in Section 3

In this example, the vector AR(3) model is specified by the following MTSMODEL paragraph. It is then followed by the IMESTIM paragraph.

-->MTSMODEL LPMODEL. SERIES ARE ADVS,SALES. @
--> MODEL IS (1-PHI1*B-PHI2*B**2-PHI3*B**3)SERIES=CONST+NOISE.

SUMMARY FOR MULTIVARIATE ARMA MODEL -- LPMODEL
VARIABLE  DIFFERENCING
ADVS
SALES
PARAMETER  FACTOR    ORDER  CONSTRAINT
1 CONST    CONSTANT  0      CCONST
2 PHI1     REG AR    1      CPHI1
3 PHI2     REG AR    2      CPHI2
4 PHI3     REG AR    3      CPHI3

-->IMESTIM LPMODEL. HOLD RESIDUALS(R1,R2).

SUMMARY OF THE TIME SERIES
SERIES NAME MEAN STD DEV DIFFERENCE ORDER(S)
1 ADVS
2 SALES
ERROR COVARIANCE MATRIX

219 4.80 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES E+06 ITERATIONS TERMINATED DUE TO: RELATIVE CHANGE IN DETERMINANT OF COVARIANCE MATRIX.LE E-03 TOTAL NUMBER OF ITERATIONS IS 5 MODEL SUMMARY WITH CONDITIONAL LIKELIHOOD PARAMETER ESTIMATES CONSTANT VECTOR (STD ERROR) ( ) ( ) PHI MATRICES ESTIMATES OF PHI( 1 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS ESTIMATES OF PHI( 2 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS ESTIMATES OF PHI( 3 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS ERROR COVARIANCE MATRIX ================================================= RESULTS OF THE REVISED MODEL: REVISION NUMBER 1 ================================================= ITERATIONS TERMINATED DUE TO: RELATIVE CHANGE IN DETERMINANT OF COVARIANCE MATRIX.LE E-03 TOTAL NUMBER OF ITERATIONS IS 11

220 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES 4.81 MODEL SUMMARY WITH CONDITIONAL LIKELIHOOD PARAMETER ESTIMATES CONSTANT VECTOR (STD ERROR) ( ) ( ) PHI MATRICES ESTIMATES OF PHI( 1 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS ESTIMATES OF PHI( 2 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS ESTIMATES OF PHI( 3 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS ERROR COVARIANCE MATRIX ================================================= RESULTS OF THE REVISED MODEL: REVISION NUMBER 2 ================================================= ITERATIONS TERMINATED DUE TO: RELATIVE CHANGE IN DETERMINANT OF COVARIANCE MATRIX.LE E-03 TOTAL NUMBER OF ITERATIONS IS 11 MODEL SUMMARY WITH CONDITIONAL LIKELIHOOD PARAMETER ESTIMATES CONSTANT VECTOR (STD ERROR) ( ) ( )

PHI MATRICES
ESTIMATES OF PHI( 1 ) MATRIX AND SIGNIFICANCE
STANDARD ERRORS
ESTIMATES OF PHI( 2 ) MATRIX AND SIGNIFICANCE
STANDARD ERRORS
ESTIMATES OF PHI( 3 ) MATRIX AND SIGNIFICANCE
STANDARD ERRORS
ERROR COVARIANCE MATRIX

In the first model revision of the above IMESTIM paragraph, all insignificant parameter estimates except Φ3(1,1) and Φ3(1,2) have absolute t-values less than 1.64 and are therefore deleted (i.e., constrained to be zero). After the first model revision, the Φ2(1,1) and Φ3(1,1) estimates become insignificant and are thus deleted in the second model revision. With the IMESTIM paragraph, the model for the Lydia Pinkham example is obtained in a single SCA command.

The U.K. financial data example in Section 4

In the U.K. financial example, the ARMA(1,1) model is specified by the following MTSMODEL paragraph. We then use the IMESTIM paragraph to estimate the model parameters.

-->MTSMODEL UKMODEL. SERIES ARE STOCK,CARPD,CPI. @
--> MODEL IS (1-PHI1*B)SERIES=CONST+(1-THETA*B)NOISE.

222 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES 4.83 SUMMARY FOR MULTIVARIATE ARMA MODEL -- UKMODEL VARIABLE DIFFERENCING STOCK CARPD CPI PARAMETER FACTOR ORDER CONSTRAINT 1 CONST CONSTANT 0 CCONST 2 PHI1 REG AR 1 CPHI1 3 THETA REG MA 1 CTHETA -- -->IMESTIM UKMODEL. SUMMARY OF THE TIME SERIES SERIES NAME MEAN STD DEV DIFFERENCE ORDER(S) 1 STOCK CARPD CPI ERROR COVARIANCE MATRIX ITERATIONS TERMINATED DUE TO: RELATIVE CHANGE IN DETERMINANT OF COVARIANCE MATRIX.LE E-03 TOTAL NUMBER OF ITERATIONS IS 15 MODEL SUMMARY WITH CONDITIONAL LIKELIHOOD PARAMETER ESTIMATES CONSTANT VECTOR (STD ERROR) ( ) ( ) ( ) PHI MATRICES ESTIMATES OF PHI( 1 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS THETA MATRICES ESTIMATES OF THETA( 1 ) MATRIX AND SIGNIFICANCE

223 4.84 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES STANDARD ERRORS ERROR COVARIANCE MATRIX ================================================= RESULTS OF THE REVISED MODEL: REVISION NUMBER 1 ================================================= ITERATIONS TERMINATED DUE TO: RELATIVE CHANGE IN DETERMINANT OF COVARIANCE MATRIX.LE E-03 TOTAL NUMBER OF ITERATIONS IS 11 MODEL SUMMARY WITH CONDITIONAL LIKELIHOOD PARAMETER ESTIMATES CONSTANT VECTOR (STD ERROR) ( ) ( ) ( ) PHI MATRICES ESTIMATES OF PHI( 1 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS THETA MATRICES ESTIMATES OF THETA( 1 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS ERROR COVARIANCE MATRIX

224 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES ================================================= RESULTS OF THE REVISED MODEL: REVISION NUMBER 2 ================================================= ITERATIONS TERMINATED DUE TO: RELATIVE CHANGE IN DETERMINANT OF COVARIANCE MATRIX.LE E-03 TOTAL NUMBER OF ITERATIONS IS 10 MODEL SUMMARY WITH CONDITIONAL LIKELIHOOD PARAMETER ESTIMATES CONSTANT VECTOR (STD ERROR) ( ) ( ) ( ) PHI MATRICES ESTIMATES OF PHI( 1 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS THETA MATRICES ESTIMATES OF THETA( 1 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS ERROR COVARIANCE MATRIX

225 4.86 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES ================================================= RESULTS OF THE REVISED MODEL: REVISION NUMBER 3 ================================================= ITERATIONS TERMINATED DUE TO: RELATIVE CHANGE IN DETERMINANT OF COVARIANCE MATRIX.LE E-03 TOTAL NUMBER OF ITERATIONS IS 15 MODEL SUMMARY WITH CONDITIONAL LIKELIHOOD PARAMETER ESTIMATES CONSTANT VECTOR (STD ERROR) ( ) ( ) ( ) PHI MATRICES ESTIMATES OF PHI( 1 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS THETA MATRICES ESTIMATES OF THETA( 1 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS ERROR COVARIANCE MATRIX MAXIMUM NUMBER OF REVISIONS REACHED -- In the first model revision in the above IMESTIM paragraph, only 3 insignificant terms for the MA(1) matrix ( Θ 1) are constrained to zeroes. The parameter Θ 1 (1,1) has a t-value about and therefore is not constrained to zero. In the second model revision, all insignificant terms in both AR and MA parts are constrained to be zero. Thus only the

diagonal elements of Φ1 and Θ1(3,1) and Θ1(3,3) are retained. After the estimation of the second model, we find Θ1(3,1) to be insignificant. Thus only Θ1(3,3) remains significant in the MA part. In this IMESTIM paragraph, the model estimation terminated at the maximum number of revisions. The maximum number of revisions can be increased by the MAXREVISION sentence. However, the results remain the same in this case (since all parameter estimates are now significant).

The parameter estimates obtained under the IMESTIM paragraph may be somewhat different from those obtained using the MESTIM paragraph. This is because the IMESTIM paragraph always resets the initial values to 0.1 for the parameters to be estimated. Thus, slightly different results may occur if some parameter estimates are marginally significant or insignificant.

The Census housing data example in Section 5

In the Census housing data example, the seasonal vector ARMA model can be specified as below. It is then followed by the IMESTIM paragraph.

-->MTSMODEL HOUSING. SERIES ARE HSTARTS(12),HSOLD(12). MODEL IS @
--> (1-PHI*B)SERIES=(1-THETA*B**12)NOISE.

SUMMARY FOR MULTIVARIATE ARMA MODEL -- HOUSING
VARIABLE  DIFFERENCING
HSTARTS   12
HSOLD     12
PARAMETER  FACTOR  ORDER  CONSTRAINT
1 PHI      REG AR  1      CPHI
2 THETA    REG MA  12     CTHETA

-->IMESTIM HOUSING.

SUMMARY OF THE TIME SERIES
SERIES NAME MEAN STD DEV DIFFERENCE ORDER(S)
1 HSTARTS
2 HSOLD
ERROR COVARIANCE MATRIX
ITERATIONS TERMINATED DUE TO:
RELATIVE CHANGE IN DETERMINANT OF COVARIANCE MATRIX .LE E-03
TOTAL NUMBER OF ITERATIONS IS 7

227 4.88 VECTOR ARMA MODELING OF MULTIPLE TIME SERIES MODEL SUMMARY WITH CONDITIONAL LIKELIHOOD PARAMETER ESTIMATES PHI MATRICES ESTIMATES OF PHI( 1 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS THETA MATRICES ESTIMATES OF THETA( 12 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS ERROR COVARIANCE MATRIX ================================================= RESULTS OF THE REVISED MODEL: REVISION NUMBER 1 ================================================= ITERATIONS TERMINATED DUE TO: RELATIVE CHANGE IN DETERMINANT OF COVARIANCE MATRIX.LE E-03 TOTAL NUMBER OF ITERATIONS IS 7 MODEL SUMMARY WITH CONDITIONAL LIKELIHOOD PARAMETER ESTIMATES PHI MATRICES ESTIMATES OF PHI( 1 ) MATRIX AND SIGNIFICANCE STANDARD ERRORS THETA MATRICES ESTIMATES OF THETA( 12 ) MATRIX AND SIGNIFICANCE

STANDARD ERRORS
ERROR COVARIANCE MATRIX

In the above IMESTIM paragraph, the insignificant seasonal MA terms (in Θ12) are constrained to be zero and the final model is obtained in just one model revision. Since a seasonal MA polynomial is present, it is desirable to estimate the final model using the exact method. This can be accomplished by using either the MESTIM or the IMESTIM paragraph (as shown below).

-->IMESTIM HOUSING. METHOD IS EXACT.

SUMMARY OF THE TIME SERIES
SERIES NAME MEAN STD DEV DIFFERENCE ORDER(S)
1 HSTARTS
2 HSOLD
ERROR COVARIANCE MATRIX
ITERATIONS TERMINATED DUE TO:
CHANGE IN (-2*LOG LIKELIHOOD)/NOBE .LE E-03
TOTAL NUMBER OF ITERATIONS IS 16
MODEL SUMMARY WITH MAXIMUM LIKELIHOOD PARAMETER ESTIMATES
PHI MATRICES
ESTIMATES OF PHI( 1 ) MATRIX AND SIGNIFICANCE
STANDARD ERRORS
THETA MATRICES
ESTIMATES OF THETA( 12 ) MATRIX AND SIGNIFICANCE

STANDARD ERRORS
ERROR COVARIANCE MATRIX
*(LOG LIKELIHOOD AT FINAL ESTIMATES) IS E+03

As the above examples show, the IMESTIM paragraph performs well in automatically deleting insignificant parameter estimates and in arriving at a final parsimonious model that includes only significant parameters.

SUMMARY OF THE SCA PARAGRAPHS

This section provides a summary of those SCA paragraphs employed in this document. The syntax for the paragraphs is presented in both brief and full form. The brief display of the syntax contains the most frequently used sentences of a paragraph, while the full display presents all possible modifying sentences of a paragraph. In addition, special remarks related to a paragraph may also be presented with the description. It is recommended that the brief form be used before employing any System capability that can be accessed only through the use of the full form of the paragraph syntax.

Each SCA paragraph begins with a paragraph name and is followed by modifying sentences. Sentences that may be used as modifiers for a paragraph are shown below and the types of arguments used in each sentence are also specified. Sentences not designated as required may be omitted, since default conditions (or values) exist. The most frequently used required sentence is given as the first sentence of the paragraph. The portion of this sentence that may be omitted is underlined. This portion may be omitted only if this sentence appears as the first sentence in a paragraph. Otherwise, all portions of the sentence must be used. The last character of each line except the last line must be the continuation character, "@".

The paragraphs to be explained in this summary are CCM, STEPAR, MIDEN, ECCM, SCAN, MTSMODEL, MESTIM, IMESTIM, MFORECAST and CANONICAL.

Legend
v : variable or model name
r : real value
i : integer
w(.) : keyword (with argument)

CCM Paragraph

The CCM paragraph is used to compute sample cross correlation matrices of vector time series.

Syntax for the CCM Paragraph

Brief syntax
CCM VARIABLES ARE v1, v2, - - -. @
    DFORDERS ARE i1, i2, - - -.
Required sentence: VARIABLES

Full syntax
CCM VARIABLES ARE v1, v2, - - -. @
    DFORDERS ARE i1, i2, - - -. @
    MAXLAG IS i. @
    DESCRIBE./NO DESCRIBE. @
    SPAN IS i1, i2. @
    OUTPUT IS LEVEL(w), PRINT(w1,w2,- - -), @
              NOPRINT(w1,w2,- - -).
Required sentence: VARIABLES

Sentences Used in the CCM Paragraph

VARIABLES sentence
The VARIABLES sentence is used to specify the names (labels) of the series to be analyzed.

DFORDERS sentence
The DFORDERS sentence is used to specify the orders of differencing to be applied to all the series when differencing is the stationarity-inducing transformation being used. The same differencing is applied to each series. The default is none.

MAXLAG sentence
The MAXLAG sentence is used to specify the maximum number of lagged sample cross-correlation matrices to be computed. The default is 24.

DESCRIBE sentence
The DESCRIBE sentence is used to specify the display of descriptive statistics and principal component information of the original series (after any differencing). The default is NO DESCRIBE.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, i1 to i2, for which the data will be analyzed. The default is the maximum span of the series, which is the span of the shortest series if the series are not all of equal length.

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output printed for computed statistics. Control is achieved in a two-stage procedure. First a basic LEVEL of output is specified. Output may then be increased from this level by use of PRINT, or decreased from this level by use of NOPRINT. The keywords for LEVEL and output printed are:
NORMAL: SIGNS
DETAILED: CCM_VALUES, and SIGNS

where the reserved words on the right denote:
CCM_VALUES: the values of cross-correlation matrices
SIGNS: the significant signs for CCM's
These reserved words are also keywords for PRINT and NOPRINT. The default for LEVEL is NORMAL, or the level specified in the PROFILE paragraph.

STEPAR Paragraph

The STEPAR paragraph is used to perform the stepwise autoregressive fitting of vector series.

Syntax for the STEPAR Paragraph

Brief syntax
STEPAR VARIABLES ARE v1, v2, - - -. @
       DFORDERS ARE i1, i2, - - -. @
       ARFITS ARE i1, i2, - - -.
Required sentences: VARIABLES and ARFITS

Full syntax
STEPAR VARIABLES ARE v1, v2, - - -. @
       DFORDERS ARE i1, i2, - - -. @
       MAXLAG IS i. @
       DESCRIBE./NO DESCRIBE. @
       ARFITS ARE i1, i2, - - -. @
       RCCMS ARE i1, i2, - - -. @
       STANDARDIZED./NO STANDARDIZED. @
       SPAN IS i1, i2. @
       OUTPUT IS LEVEL(w), PRINT(w1,w2,- - -), @
                 NOPRINT(w1,w2,- - -). @
       HOLD PHI(v1,v2,- - -), RESIDUALS(v1,v2,- - -), @
            COVARIANCE(v).
Required sentences: VARIABLES and ARFITS

Sentences Used in the STEPAR Paragraph

VARIABLES sentence
The VARIABLES sentence is used to specify the names (labels) of the series to be analyzed.

DFORDERS sentence
The DFORDERS sentence is used to specify the orders of differencing to be applied to all the series when differencing is the stationarity-inducing transformation being used. The same differencing is applied to each series. The default is none.

MAXLAG sentence
The MAXLAG sentence is used to specify the maximum number of lagged sample cross-correlation matrices to be computed. The default is 24.

DESCRIBE sentence
The DESCRIBE sentence is used to specify the display of descriptive statistics and principal component information of the original series (after any differencing). The default is NO DESCRIBE.

ARFITS sentence
The ARFITS sentence is used to specify the lags employed when performing stepwise autoregression fitting.

RCCMS sentence
This sentence is used to specify those lags in the stepwise autoregressive fits for which the sample cross correlation matrices of residual series will be computed and displayed. The number of CCM's of residual series to be computed is controlled by the MAXLAG sentence. The default is none.

STANDARDIZED sentence
This sentence is used to specify that the stepwise autoregression is based on the standardization of the original or differenced series. The standardized series have variance 1.0, as each series is scaled by dividing by its sample standard deviation. The default is NO STANDARDIZED.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, i1 to i2, for which the data will be analyzed. The default is the maximum span of the series, which is the span of the shortest series if the series are not all of equal length.

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output printed for computed statistics. Control is achieved in a two-stage procedure. 
First a basic LEVEL of output is specified. Output may then be increased from this level by use of PRINT, or decreased from this level by use of NOPRINT. The keywords for LEVEL and output printed are:

BRIEF: SUMMARY
NORMAL: SUMMARY and SIGNS
DETAILED: SUMMARY, CCM_VALUES, SIGNS, PHI_VALUES, STD_ERRS and CORR_PHI
where the reserved words on the right denote:
SUMMARY: the summary table for stepwise regression
CCM_VALUES: the values of cross-correlation matrices of residual series after autoregressive fitting
SIGNS: the significant signs for CCM's and autoregressive coefficients
PHI_VALUES: the display of the estimated values of the autoregressive coefficients and principal component information after each lag of a stepwise fit
STD_ERRS: the display of the standard errors of the autoregressive coefficients and principal component information after each lag of a stepwise fit
CORR_PHI: the correlation matrices for the stepwise autoregressive coefficients
These reserved words are also keywords for PRINT and NOPRINT. The default for LEVEL is NORMAL, or the level specified in the PROFILE paragraph.

HOLD sentence
The HOLD sentence is used to specify those values computed for particular functions to be retained in the workspace until the end of the session. Only those statistics desired to be retained need be named. Values are placed in the associated variable named in parentheses. The default is that none of the values of the statistics below will be retained after the paragraph is executed. The values that may be retained are:
PHI: the autoregression matrices for the last autoregression fitted. The number of variable names must be the same as the number of lags specified in the ARFITS sentence.
RESIDUALS: the residual series. The number of variables specified in this sentence must be the same as the number of series in the model.
COVARIANCE: the covariance matrix for the noise.
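The SUMMARY table produced by STEPAR reports a chi-squared statistic at each lag, and the sample output earlier in this chapter quotes critical values "WITH 4 DEGREES OF FREEDOM" for the two-series housing example. The Python sketch below reproduces such critical values from the closed-form tail probability that holds when the degrees of freedom are even. It is an illustration, not SCA code: the function names are hypothetical, and taking the degrees of freedom to be k*k for k series (one per element of an AR coefficient matrix) is our reading of the output rather than something the manual states here.

```python
import math


def chi2_sf_even(x, df):
    """Upper-tail probability P(X > x) for chi-squared with even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # (x/2)**j / j!
        total += term
    return math.exp(-half) * total


def chi2_critical_even(alpha, df, hi=200.0, tol=1e-10):
    """Solve chi2_sf_even(x, df) = alpha by bisection on [0, hi]."""
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf_even(mid, df) > alpha:
            lo = mid              # tail still too heavy: move right
        else:
            hi = mid
    return (lo + hi) / 2.0


k = 2                             # number of series (HSTARTS and HSOLD)
df = k * k                        # assumed degrees of freedom per AR matrix
print(round(chi2_critical_even(0.05, df), 1))   # 9.5
print(round(chi2_critical_even(0.01, df), 1))   # 13.3
```

For an odd number of degrees of freedom (for example three series, df = 9) this closed form does not apply and a general chi-squared routine would be needed.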

MIDEN Paragraph

The MIDEN paragraph is used to compute sample cross correlation matrices and the stepwise autoregressive fitting of vector series.

Syntax for the MIDEN Paragraph

Brief syntax
MIDEN VARIABLES ARE v1, v2, - - -. @
      DFORDERS ARE i1, i2, - - -. @
      ARFITS ARE i1, i2, - - -.
Required sentences: VARIABLES and ARFITS

Full syntax
MIDEN VARIABLES ARE v1, v2, - - -. @
      DFORDERS ARE i1, i2, - - -. @
      MAXLAG IS i. @
      DESCRIBE./NO DESCRIBE. @
      CCM./NO CCM. @
      STANDARDIZED./NO STANDARDIZED. @
      ARFITS ARE i1, i2, - - -. @
      RCCMS ARE i1, i2, - - -. @
      SPAN IS i1, i2. @
      OUTPUT IS LEVEL(w), PRINT(w1,w2,- - -). @
      HOLD PHI(v1,v2,- - -), RESIDUALS(v1,v2,- - -), @
           COVARIANCE(v).
Required sentences: VARIABLES and ARFITS

Sentences Used in the MIDEN Paragraph

VARIABLES sentence
The VARIABLES sentence is used to specify the names (labels) of the series to be analyzed.

DFORDERS sentence
The DFORDERS sentence is used to specify the orders of differencing to be applied to all the series when differencing is the stationarity-inducing transformation being used. The default is none.

MAXLAG sentence
The MAXLAG sentence is used to specify the maximum number of lagged sample cross-correlation matrices to be computed. The default is 24.

DESCRIBE sentence
The DESCRIBE sentence is used to specify the display of descriptive statistics and principal component information of the original series (after any differencing). The default is NO DESCRIBE.

CCM sentence
The CCM sentence is used to specify the calculation of the cross-correlation matrices for the original or differenced series. The default is CCM.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, i1 to i2, for which the data will be analyzed. The default is the maximum span available for the series, which is the span of the shortest series if the series are not all of equal length.

STANDARDIZED sentence
This sentence is used to specify that the stepwise autoregression is based on the standardization of the original or differenced series. Each series is scaled by division by its sample standard deviation. The default is NO STANDARDIZED.

ARFITS sentence
The ARFITS sentence is used to specify the lags employed when performing stepwise autoregression fitting. The default is none.

RCCMS sentence
This sentence is used to specify those lags in the stepwise autoregressive fits for which the sample cross correlation matrices of residual series will be computed and displayed. The number of CCM's of residual series to be computed is controlled by the MAXLAG sentence. The default is none.

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output printed for computed statistics. Control is achieved in a two-stage procedure. First a basic LEVEL of output (default NORMAL) is specified. Output may then be increased (decreased) from this level by use of PRINT (NOPRINT). 
The keywords for LEVEL and output printed are:
BRIEF: SUMMARY
NORMAL: SUMMARY, and SIGNS
DETAILED: SUMMARY, CCM_VALUES, SIGNS, PHI_VALUES, STD_ERRS and CORR_PHI
where the reserved words (and keywords for PRINT, NOPRINT) on the right denote:

SUMMARY: the summary table for stepwise regression (not applicable if only CCM are computed)
CCM_VALUES: the values of cross-correlation matrices (for both original and residual series after autoregressive fitting)
SIGNS: the significant signs for CCM's and autoregressive coefficients
PHI_VALUES: the display of the estimated values of the autoregressive coefficients and principal component information at each lag used in a stepwise fit (not applicable if only CCM are computed)
STD_ERRS: the display of the standard errors of the autoregressive coefficients and principal component information at each lag used in a stepwise fit (not applicable if only CCM are computed)
CORR_PHI: the correlation matrices for the stepwise autoregressive coefficients (not applicable if only CCM are computed)
These reserved words are also keywords for PRINT and NOPRINT. The default for LEVEL is NORMAL, or the level specified in the PROFILE paragraph.

HOLD sentence
The HOLD sentence is used to specify those values computed for particular functions to be retained in the workspace. Only those statistics desired to be retained need be named. Values are placed in the associated variable named in parentheses. The default is that none of the values of the statistics below will be retained after the paragraph is executed. The values that may be retained are:
PHI: the autoregression matrices for the last autoregression fitted. The number of variable names must be the same as the number of lags specified in the ARFITS sentence.
RESIDUALS: the residual series. The number of variables specified in this sentence must be the same as the number of series in the model.
COVARIANCE: the covariance matrix for the noise.
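The SIGNS output condenses each sample cross correlation into one of the symbols +, -, or ., using the bounds of plus and minus 2/SQRT(NOBE) given in the legend of the CCM output earlier in this chapter. A minimal Python sketch of that mapping follows. It is not SCA code; the function names are hypothetical and the correlation values are made up for illustration.

```python
import math


def sign_symbol(r, nobe):
    """Map a sample correlation to '+', '-' or '.' by the 2/sqrt(NOBE) rule."""
    bound = 2.0 / math.sqrt(nobe)
    if r > bound:
        return "+"
    if r < -bound:
        return "-"
    return "."


def sign_matrix(ccm, nobe):
    """Convert one cross correlation matrix (list of rows) to symbol rows."""
    return [" ".join(sign_symbol(r, nobe) for r in row) for row in ccm]


nobe = 119                                  # NOBE from the housing example
lag12 = [[-0.35, -0.21], [-0.10, -0.40]]    # made-up values, not SCA output
for row in sign_matrix(lag12, nobe):
    print(row)
```

With NOBE = 119 the bound is about 0.183, so in this made-up matrix three of the four entries are flagged as negatively significant and the entry -0.10 is shown as a dot.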

ECCM Paragraph

The ECCM paragraph is used to compute extended sample cross-correlation matrices for a vector time series.

Syntax for the ECCM Paragraph

Brief syntax
    ECCM VARIABLES ARE v1, v2, - -
         DFORDERS ARE i1, i2,
Required sentence: VARIABLES

Full syntax
    ECCM VARIABLES ARE v1, v2, - -
         DFORDERS ARE i1, i2, - -
         MAXLAG IS AR(i1),
         SPAN IS i1,
         OUTPUT IS LEVEL(w), PRINT(w1,w2,- -
         NOPRINT(w1, w2, - - -).
Required sentence: VARIABLES

Sentences Used in the ECCM Paragraph

VARIABLES sentence
The VARIABLES sentence is used to specify the names of the series to be analyzed.

DFORDERS sentence
This sentence is used to specify the orders of differencing to be applied to the series when differencing is the stationarity-inducing transformation being used. The same differencing is applied to each series. The default is none.

MAXLAG sentence
The MAXLAG sentence is used to specify the maximum autoregressive (AR) and moving average (MA) orders to be computed and displayed. The default maximum AR and MA orders are 6.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, i1 to i2, for which the data will be analyzed. The default is the maximum possible span of the series, which is the span of the shortest series if the series are not of equal length.

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output printed for computed statistics. Control is achieved in a two-stage procedure. First a basic LEVEL of output (default NORMAL) is specified. Output may then be increased (decreased) from this level by use of PRINT (NOPRINT). The keywords for LEVEL and output printed are:
    BRIEF:     TABLE
    NORMAL:    TABLE, VALUES
    DETAILED:  TABLE, VALUES, EAR
where the reserved words (and keywords for PRINT, NOPRINT) on the right denote:
    VALUE:  values of the table derived from the sample ECCM
    TABLE:  display of the condensed summary table for the series
    EAR:    the computed extended autoregressive coefficients for the series

SCAN Paragraph

The SCAN paragraph is used to compute the smallest canonical correlation (SCAN) table for a vector time series.

Syntax for the SCAN Paragraph

Brief syntax
    SCAN VARIABLES ARE v1, v2, - -
         DFORDERS ARE i1, i2,
Required sentence: VARIABLES

Full syntax
    SCAN VARIABLES ARE v1, v2, - -
         DFORDERS ARE i1, i2, - -
         MAXLAG IS AR(i1),
         SPAN IS i1,
         OUTPUT IS LEVEL(w), PRINT(w1,w2,- -
         NOPRINT(w1,w2,- - -).
Required sentence: VARIABLES

Sentences Used in the SCAN Paragraph

VARIABLES sentence
The VARIABLES sentence is used to specify the names of the series to be analyzed.

DFORDERS sentence
This sentence is used to specify the orders of differencing to be applied to the series when differencing is the stationarity-inducing transformation being used. The same differencing is applied to each series. The default is none.

MAXLAG sentence
The MAXLAG sentence is used to specify the maximum autoregressive (AR) and moving average (MA) orders to be examined and displayed in the SCAN table. The default maximum AR and MA orders are 6.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, i1 to i2, for which the data will be analyzed. The default is the maximum possible span of the series, which is the span of the shortest series if the series are not of equal length.

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output printed for computed statistics. Control is achieved in a two-stage procedure. First a basic LEVEL of output (default NORMAL) is specified. Output may then be increased (decreased) from this level by use of PRINT (NOPRINT). The keywords for LEVEL and output printed are:
    BRIEF:     TABLE
    NORMAL:    TABLE, VALUES
    DETAILED:  TABLE, VALUES, COUNTS, EIGENVALUES
where the reserved words (and keywords for PRINT, NOPRINT) on the right denote:
    VALUE:        values of the table derived from the analysis
    TABLE:        display of the condensed summary table for the analysis
    COUNTS:       number of zero eigenvalues corresponding to each entry of the SCAN table
    EIGENVALUES:  eigenvalues and the corresponding chi-square values for each combination of AR and MA orders

MTSMODEL Paragraph

The MTSMODEL paragraph is used to specify or modify a vector ARMA model.

Syntax for the MTSMODEL Paragraph

Brief syntax
    MTSMODEL NAME IS model-name.
             SERIES ARE v1(differencing operators), v2(differencing operators),
             MODEL IS @
Required sentences: NAME, SERIES and MODEL

Full syntax
    MTSMODEL NAME IS model-name.
             SERIES ARE v1(differencing operators), v2(differencing operators),
             MODEL IS "model equation".
             STANDARDIZED./NO STANDARDIZED.
             CONSTRAINTS ARE "constraints matrices".
             COVARIANCE IS v.
             DEPENDENCY IS "dependency".
Required sentences: NAME, SERIES and MODEL

Sentences Used in the MTSMODEL Paragraph

NAME sentence
The NAME sentence is used to specify a label (name) for the vector ARMA model being specified in the paragraph. This label is used to refer to the model in both the MESTIM and MFORECAST paragraphs. The label may also be used to identify the model if it is to be modified in any way.

SERIES sentence
The SERIES sentence is used to specify the names of the variables of the vector series and the associated difference operators (if any) for each series. The default is none.

MODEL sentence
The MODEL sentence is used to specify the vector ARMA model.

STANDARDIZED sentence
This sentence is used to specify that all subsequent analysis using the above model name will be based on the standardization of the original or differenced series, that is, division of each series by its sample standard deviation. The default is NO STANDARDIZED.

CONSTRAINT sentence
The CONSTRAINT sentence is used to specify the constraint matrix associated with each parameter matrix.

COVARIANCE sentence
The COVARIANCE sentence is used to specify an existing or a new variable where the noise covariance matrix is or will be stored. If the variable is already defined, its covariance matrix will be used as the initial covariance values in estimation and as the covariance values in forecasting. Otherwise the covariance matrix is calculated from the residual series derived from the specified model and the initial ARMA parameter estimates. Note that the SCA System designates an internal variable to store the estimated covariance matrix, so the specification of this sentence is optional.

DEPENDENCY sentence
The DEPENDENCY sentence is used to specify the dependency structure for the noise covariance matrix. Dependent series are grouped within pairs of parentheses. If all series are independent, the keyword NONE (or NONEXISTANT) can be used to describe the dependency structure. If all series are independent and have the same variance, the keyword DIEQUAL can be used to specify this condition. If this sentence is not specified, all series are assumed to be possibly interdependent.

SHOW sentence
The SHOW sentence is used to display a brief summary of the specified model. The default is SHOW.

MESTIM Paragraph

The MESTIM paragraph is used to perform the estimation of the parameters of a vector ARMA model.

Syntax for the MESTIM Paragraph

Brief syntax
    MESTIM MODEL model-name.
Required sentence: MODEL

Full syntax
    MESTIM MODEL
           METHOD IS
           STOP-CRITERIA ARE MAXIT(i),
           SPAN IS i1,
           OUTPUT IS LEVEL(w), PRINT(w1,w2,- -
           NOPRINT(w1,w2,- -
           HOLD RESIDUALS(v1,v2,- - -), FITTED(v1,v2,- -
           COVARIANCE(v).
Required sentence: MODEL

Sentences Used in the MESTIM Paragraph

MODEL sentence
The MODEL sentence is used to specify the label (name) of the model to be estimated. The label must be a model name specified in a previous MTSMODEL paragraph.

METHOD sentence
The METHOD sentence is used to specify the likelihood function used for model estimation. The keyword, w, may be CONDITIONAL for the "conditional" likelihood or EXACT for the "exact" likelihood function. The default is CONDITIONAL.

STOP sentence
The STOP sentence is used to specify the stopping criterion for nonlinear estimation. The argument, i, for the keyword MAXIT specifies the maximum number of iterations (default is i=10), and the argument, r, for the keyword LIKELIHOOD specifies the value of the relative convergence criterion on the likelihood function (default is r=0.001). Estimation iterations will be terminated when the relative change in the value of the likelihood function between two successive iterations is less than or equal to the convergence criterion, or if the maximum number of iterations is exceeded.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, i1 to i2, for which the data will be analyzed. The default is the maximum span available for the series, which is the span of the shortest series if the series are not all of equal length.

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output printed for computed statistics. Control is achieved in a two-stage procedure. First a basic LEVEL of output (default NORMAL) is specified. Output may then be increased (decreased) from this level by use of PRINT (NOPRINT). The keywords for LEVEL and output printed are:
    BRIEF:     estimates and their related statistics only
    NORMAL:    RCORR
    DETAILED:  RCORR, ITERATION, and CORR
where the reserved words (and keywords for PRINT, NOPRINT) on the right denote:
    RCORR:      the reduced correlation matrix for the parameter estimates
    ITERATION:  the parameter and covariance estimates for each iteration
    CORR:       the correlation matrix for the parameter estimates

HOLD sentence
The HOLD sentence is used to specify those computed values that are to be retained in the workspace. Only those statistics desired to be retained need be named. Values are placed in the variable named in parentheses. By default, none of these values is retained after the paragraph is executed. The values that may be retained are:
    RESIDUALS:   the residual series. The number of variable names specified in this sentence must be the same as the number of series in the model.
    FITTED:      one-step-ahead forecasts (fitted values) for each series. The number of variables specified in this sentence must be the same as the number of series in the model.
    COVARIANCE:  covariance matrix for the noise series.
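The stopping rule described in the STOP sentence can be sketched generically. The Python fragment below is an illustration only, not the SCA estimation code: `iterate_until_converged` and the user-supplied `step` function are hypothetical names, and the defaults simply mirror the MAXIT and LIKELIHOOD defaults quoted above.

```python
def iterate_until_converged(step, params, maxit=10, rel_tol=0.001):
    """Run a nonlinear estimation loop with a relative-change stopping rule.

    step(params) is assumed to perform one iteration and return
    (new_params, likelihood_value).
    """
    previous = None
    value = None
    for iteration in range(1, maxit + 1):
        params, value = step(params)
        if previous is not None:
            # relative change in the likelihood between successive iterations
            denom = max(abs(previous), 1e-300)
            if abs(value - previous) / denom <= rel_tol:
                return params, value, iteration, "converged"
        previous = value
    return params, value, maxit, "maxit reached"
```

A run that ends with the status "maxit reached" suggests either raising MAXIT or checking whether the model is over-parameterized before the estimates are used.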

IMESTIM Paragraph

The IMESTIM paragraph is used to estimate the parameters of a vector ARMA model and automatically delete insignificant parameter estimates in an appropriate manner.

Syntax for the IMESTIM Paragraph

Brief syntax
    IMESTIM MODEL
            SPAN IS i1,
            HOLD RESIDUALS(v1, v2,...).
Required sentence: MODEL

Full syntax
    IMESTIM MODEL
            METHOD IS
            STOP-CRITERIA ARE MAXIT(i),
            SPAN IS i1,
            MAXREVISION IS
            DELETE-CONSTANT/NO
            OUTPUT IS LEVEL(w), PRINT(w1,w2,- -
            NOPRINT(w1,w2,- -
            HOLD RESIDUALS(v1,v2,- - -), FITTED(v1,v2,- -
            COVARIANCE(v).
Required sentence: MODEL

Sentences Used in the IMESTIM Paragraph

MODEL sentence
The MODEL sentence is used to specify the label (name) of the model to be estimated. The label must be a model name specified in a previous MTSMODEL paragraph.

METHOD sentence
The METHOD sentence is used to specify the likelihood function used for model estimation. The keyword, w, may be CONDITIONAL for the "conditional" likelihood or EXACT for the "exact" likelihood function. The default is CONDITIONAL.

STOP sentence
The STOP sentence is used to specify the stopping criterion for nonlinear estimation. The argument, i, for the keyword MAXIT specifies the maximum number of iterations (default is i=10), and the argument, r, for the keyword LIKELIHOOD specifies the value of the relative convergence criterion on the likelihood function (default is r=0.001). Estimation iterations will be terminated when the relative change in the value of the likelihood function between two successive iterations is less than or equal to the convergence criterion, or if the maximum number of iterations is exceeded.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, i1 to i2, for which the data will be analyzed. The default is the maximum span available for the series, which is the span of the shortest series if the series are not all of equal length.

MAXREVISION sentence
The MAXREVISION sentence is used to specify the maximum number of model revisions allowed in the automatic model estimation. The default is 3.

DELETE-CONSTANT sentence
The DELETE-CONSTANT sentence is used to specify that the insignificant terms in the constant vector can be automatically deleted. The default is NO DELETE-CONSTANT; that is, all constant parameter estimates will be retained in the model even if they are insignificant.

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output printed for computed statistics. Control is achieved in a two-stage procedure. First a basic LEVEL of output (default NORMAL) is specified. Output may then be increased (decreased) from this level by use of PRINT (NOPRINT). The keywords for LEVEL and output printed are:
    BRIEF:     estimates and their related statistics only
    NORMAL:    RCORR
    DETAILED:  RCORR, ITERATION, and CORR
where the reserved words (and keywords for PRINT, NOPRINT) on the right denote:
    RCORR:      the reduced correlation matrix for the parameter estimates
    ITERATION:  the parameter and covariance estimates for each iteration
    CORR:       the correlation matrix for the parameter estimates

HOLD sentence
The HOLD sentence is used to specify those computed values that are to be retained in the workspace. Only those statistics desired to be retained need be named. Values are placed in the variable named in parentheses. By default, none of these values is retained after the paragraph is executed. The values that may be retained are:
    RESIDUALS:   the residual series. The number of variable names specified in this sentence must be the same as the number of series in the model.
    FITTED:      one-step-ahead forecasts (fitted values) for each series. The number of variables specified in this sentence must be the same as the number of series in the model.

    COVARIANCE:  covariance matrix for the noise series.

MFORECAST Paragraph

The MFORECAST paragraph is used to compute forecasts of future values of a vector time series based on a specified vector ARMA model.

Syntax for the MFORECAST Paragraph

Brief syntax
    MFORECAST MODEL IS model-name.
Required sentence: MODEL

Full syntax
    MFORECAST MODEL IS model-name.
              ORIGINS ARE i1, i2,
              NOFS ARE i1, i2,
              NONFORECASTABLE-SERIES ARE v1, v2,
              NOPSIWEIGHTS IS i.
              OUTPUT IS PRINT(w1,w2,- - -), NOPRINT(w1,w2,- - -).
              HOLD FORECASTS(v1,v2,- - -), STD-ERRS(v1,v2,- @
Required sentence: MODEL

Sentences Used in the MFORECAST Paragraph

MODEL sentence
The MODEL sentence is used to specify the label (name) of the model to be used for forecasting. The label must be a model name specified in a previous MTSMODEL paragraph.

ORIGINS sentence
The ORIGINS sentence is used to specify the time origins for forecasts. The default is one origin, the last observation.

NOFS sentence
The NOFS sentence is used to specify the number of forecasts to be generated for each time origin. The number of arguments in this sentence must be the same as that in the ORIGINS sentence. The default is 24 forecasts for each time origin.

NONFORECASTABLE sentence
This sentence is used to specify the names of the series for which the user provides the forecasts.

NOPSIWEIGHTS sentence
The NOPSIWEIGHTS sentence is used to specify the number of psi weight matrices to be displayed.

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output printed for computed statistics. Control is achieved by increasing or decreasing the basic level of output by use of PRINT or NOPRINT, respectively. The keyword for PRINT and NOPRINT is:
    FORECASTS:  forecast values for each time origin
The default condition is PRINT(FORECASTS).

HOLD sentence
The HOLD sentence is used to specify those computed values that are to be retained in the workspace. Only those statistics desired to be retained need be named. Values are placed in the associated variable named in parentheses. By default, none of these values is retained after the paragraph is executed. The values that may be retained are:
    FORECASTS:  forecasts at the last time origins
    STD_ERRS:   standard errors of the forecasts at the last time origins
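The psi weight matrices and forecast standard errors mentioned above can be related by a short calculation. The Python sketch below is an illustration under stated assumptions, not the SCA implementation: it assumes NumPy, a model written with autoregressive matrices Phi_i on the left and moving average matrices entering with a minus sign (so Psi_1 = Phi_1 - Theta_1), and the function names are hypothetical.

```python
import numpy as np

def psi_weights(phis, thetas, k, count):
    """Psi_0..Psi_count for a k-dimensional vector ARMA model.

    Assumed recursion: Psi_0 = I and
    Psi_j = sum_{i<=j} Phi_i Psi_{j-i} - Theta_j (Theta_j = 0 beyond order q).
    """
    psis = [np.eye(k)]
    for j in range(1, count + 1):
        psi = np.zeros((k, k))
        for i, phi in enumerate(phis, start=1):
            if i <= j:
                psi += np.asarray(phi, dtype=float) @ psis[j - i]
        if j <= len(thetas):
            psi -= np.asarray(thetas[j - 1], dtype=float)
        psis.append(psi)
    return psis

def forecast_std_errors(psis, sigma, nofs):
    """Standard errors for lead times 1..nofs given noise covariance sigma."""
    sigma = np.asarray(sigma, dtype=float)
    cov = np.zeros_like(sigma)
    rows = []
    for lead in range(nofs):
        # the lead-(l) error covariance accumulates Psi_j sigma Psi_j' for j < l
        cov = cov + psis[lead] @ sigma @ psis[lead].T
        rows.append(np.sqrt(np.diag(cov)))
    return np.array(rows)
```

For a pure vector MA(1) model the weights vanish after lag 1, so under these assumptions the standard errors stop growing from the second lead time onward.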

CANONICAL Paragraph

The CANONICAL paragraph is used to perform a canonical analysis of a vector ARMA model.

Syntax for the CANONICAL Paragraph

Brief syntax
    CANONICAL MODEL IS model-name.
Required sentence: MODEL

Full syntax
    CANONICAL MODEL IS model-name.
              SPAN IS i1, i2.
              OUTPUT ARE LEVEL(w), PRINT(- - -), NOPRINT(- - -).
              HOLD EVALUE(v), EVECTOR(v), TMATRICES(v1,v2,- - -), TSERIES(v1,v2,- @
Required sentence: MODEL

Sentences Used in the CANONICAL Paragraph

MODEL sentence
The MODEL sentence is used to specify the name (label) of a previously specified model for which a canonical analysis will be performed.

SPAN sentence
The SPAN sentence is used to specify the span of time indices, i1 to i2, for which the data will be analyzed. The default is the maximum possible span of the series, which is the span of the shortest series if the series are not of equal length.

OUTPUT sentence
The OUTPUT sentence is used to control the amount of output printed for computed statistics. Control is achieved in a two-stage procedure. First a basic LEVEL of output (default NORMAL) is specified. Output may then be increased (decreased) from this level by use of PRINT (NOPRINT). The keywords for LEVEL and output printed are:
    BRIEF:   EVALUE and EVECTOR
    NORMAL:  EVALUE, EVECTOR and TMATRICES
where the reserved words (and keywords for PRINT, NOPRINT) on the right denote:
    EVALUE:     eigenvalues of the product of two matrices. The first matrix is the inverse of the lag zero covariance matrix of the original series corresponding to the model specified in the MODEL sentence. The second matrix is the lag zero covariance matrix of the forecasted series corresponding to the model specified in the MODEL sentence.
    EVECTOR:    corresponding eigenvectors of the matrix product used in EVALUE. The eigenvectors are scaled so that MVM'=I, where M is the matrix of the scaled eigenvectors, V the lag zero covariance matrix of the original series corresponding to the model specified in the MODEL sentence, and I the identity matrix.
    TMATRICES:  transformed parameter matrices of the given model using the eigenvector matrix M. The transformed parameter matrix is (M)P(INV(M)), where P is a parameter matrix and INV(M) is the inverse of the matrix M.
The reserved words are also keywords for PRINT and NOPRINT. The default for LEVEL is NORMAL, or the level specified in the PROFILE paragraph.

HOLD sentence
The HOLD sentence is used to specify those computed values that are to be retained in the workspace. Only those statistics desired to be retained need be named. Values are placed in the variable named in parentheses. By default, none of these values is retained after the paragraph is executed. The values that may be retained are:
    EVALUE:     eigenvalues defined in OUTPUT
    EVECTOR:    eigenvectors defined in OUTPUT
    TMATRICES:  transformed parameter matrices defined in OUTPUT
    TSERIES:    transformed data of the component series of the given model using the eigenvector matrix M. The transformed data is obtained using MZ, where Z is the original series corresponding to the model specified in the MODEL sentence.
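The definitions of EVALUE, EVECTOR and TMATRICES can be checked numerically. The Python sketch below is an illustration, not the SCA algorithm: it assumes NumPy, takes the two lag zero covariance matrices as given inputs (`v` for the original series and `f` for the forecasted series), and uses a Cholesky factor to obtain eigenvectors scaled so that MVM'=I. The function names are hypothetical.

```python
import numpy as np

def canonical_analysis(v, f):
    """Eigenvalues of inv(v) @ f and eigenvector matrix M with M v M' = I.

    v: lag zero covariance matrix of the original series (positive definite)
    f: lag zero covariance matrix of the forecasted series (symmetric)
    Rows of M are the scaled eigenvectors.
    """
    v = np.asarray(v, dtype=float)
    f = np.asarray(f, dtype=float)
    chol = np.linalg.cholesky(v)            # v = chol @ chol.T
    inv_chol = np.linalg.inv(chol)
    sym = inv_chol @ f @ inv_chol.T         # symmetric matrix, same eigenvalues
    evalues, q = np.linalg.eigh(sym)
    m = (inv_chol.T @ q).T                  # scaling gives m @ v @ m.T = I
    return evalues, m

def transform_parameter(m, p):
    """Transformed parameter matrix (M)P(INV(M))."""
    return m @ np.asarray(p, dtype=float) @ np.linalg.inv(m)

def transform_series(m, z):
    """Transformed data MZ for observations stored as rows of z."""
    return np.asarray(z, dtype=float) @ m.T
```

For a stationary model the eigenvalues lie between 0 and 1; under this reading, values near 0 indicate transformed components that are close to unpredictable, while values near 1 indicate components that are almost perfectly predictable from the past.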

REFERENCES

Akaike, H. (1973). "Information Theory and an Extension of the Maximum Likelihood Principle". Proceedings, 2nd International Symposium on Information Theory (ed. by B.N. Petrov and F. Csaki). Budapest: Akademiai Kiado.
Akaike, H. (1974). "A New Look at Statistical Model Identification". IEEE Transactions on Automatic Control, AC-19.
Anderson, T.W. (1971). The Statistical Analysis of Time Series. New York: Wiley.
Box, G.E.P. and Jenkins, G.M. (1970). Time Series Analysis: Forecasting and Control. San Francisco: Holden-Day. (Revised edition published 1976).
Box, G.E.P. and Newbold, P. (1970). "Some Comments on a Paper by Coen, Gomme and Kendall". Journal of the Royal Statistical Society, A 134.
Box, G.E.P. and Tiao, G.C. (1977). "A Canonical Analysis of Multiple Time Series". Biometrika, 64.
Chatfield, C. (1985). "The Initial Examination of Data (with Discussion)". Journal of the Royal Statistical Society, A 148.
Coen, P.G., Gomme, E.D. and Kendall, M.G. (1969). "Lagged Relationships in Economic Forecasting". Journal of the Royal Statistical Society, A 132: 133.
Granger, C.W.J. and Newbold, P. (1986). Forecasting Economic Time Series, 2nd edition. San Diego: Academic Press.
Hannan, E.J. (1970). Multiple Time Series. New York: Wiley.
Hillmer, S.C. and Tiao, G.C. (1979). "Likelihood Function of Stationary Multiple Autoregressive Moving Average Models". Journal of the American Statistical Association, 74.
Jenkins, G.M. and Alavi, A.S. (1981). "Some Aspects of Modeling and Forecasting Multivariate Time Series". Journal of Time Series Analysis, 2.
Ledolter, J. (1978). "The Analysis of Multivariate Time Series Applied to Problems in Hydrology". Journal of Hydrology, 36.
Liu, L.-M. and Hudak, G.B. (1985). "Unified Econometric Model Building Using Simultaneous Transfer Function Equations". Time Series Analysis: Theory and Practice 7.
Liu, L.-M. (1986). "Multivariate Time Series Analysis Using Vector ARMA Models". Oak Brook, IL: Scientific Computing Associates Corp.
Liu, L.-M. and Hudak, G.B. (1992). Forecasting and Time Series Analysis Using the SCA Statistical System, Volume 1. Oak Brook, IL: Scientific Computing Associates Corp.
MACC (1965). GAUSHAUS - Nonlinear Least Squares. Madison, WI: Madison Academic Computing Center, University of Wisconsin.
Mills, T.C. (1990). Time Series Techniques for Economists. Cambridge: Cambridge University Press.
Nicholls, D.F. and Hall, A.D. (1979). "The Exact Likelihood of Multivariate Autoregressive-Moving Average Models". Biometrika, 66.
Osborn, R.R. (1977). "Exact and Approximate Maximum Likelihood Estimators for Vector Moving Average Processes". Journal of the Royal Statistical Society, Series B, 39.
Pankratz, A. (1991). Forecasting with Dynamic Regression Models. New York: Wiley.
Parzen, E. (1977). "Multiple Time Series Modeling: Determining the Order of Approximating Autoregressive Schemes". Multivariate Analysis IV (ed. by P. Krishnaiah). Amsterdam: North-Holland.
Parzen, E. and Newton, H.J. (1979). "Multiple Time Series Modeling, II". Multivariate Analysis V (ed. by P. Krishnaiah). Amsterdam: North-Holland.
Phadke, M.S. and Kedem, G. (1978). "Computation of the Exact Likelihood Function of Multivariate Moving Average Models". Biometrika, 65.
Quenouille, M.H. (1957). The Analysis of Multiple Time Series. London: Griffin.
Tiao, G.C. and Box, G.E.P. (1981). "Modeling Multiple Time Series with Applications". Journal of the American Statistical Association, 76.
Tiao, G.C., Box, G.E.P., Grupe, M.R., Hudak, G.B., Bell, W.R., and Chang, I. (1979). The Wisconsin Multiple Time Series Program (WMTS-1), A Preliminary Guide. Department of Statistics, University of Wisconsin-Madison.
Tiao, G.C. and Tsay, R.S. (1983). "Multiple Time Series Modeling and Extended Sample Cross-Correlations". Journal of Business & Economic Statistics, 1.
Tiao, G.C. and Tsay, R.S. (1985). "A Canonical Correlation Approach to Modeling Multivariate Time Series". American Statistical Association 1985 Proceedings of the Business and Economic Statistics Section.
Tiao, G.C. and Tsay, R.S. (1989). "Model Specification in Multivariate Time Series" (with discussion). Journal of the Royal Statistical Society, Series B, 51.
Tsay, R.S. (1989a). "Identifying Multivariate Time Series Models". Journal of Time Series Analysis, 10.
Tsay, R.S. (1989b). "Parsimonious Parameterization of Vector Autoregressive Moving Average Models". Journal of Business & Economic Statistics, 7.
Vandaele, W. (1983). Applied Time Series and Box-Jenkins Models. New York: Academic Press.
Wei, W.W.S. (1990). Time Series Analysis: Univariate and Multivariate Methods. Redwood City, CA: Addison-Wesley.
Wilson, G.T. (1973). "The Estimation of Parameters in Multivariate Time Series Models". Journal of the Royal Statistical Society, Series B, 35.
Zellner, A. and Palm, F. (1974). "Time Series Analysis and Simultaneous Equation Econometric Models". Journal of Econometrics 2.