# Using graphical modelling in official statistics

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Quaderni di Statistica Vol. 6, 2004 Using graphical modelling in official statistics Richard N. Penny Fidelio Consultancy Marco Reale Mathematics and Statistics Department, University of Canterbury Summary: People using economic time series would like them to be available as soon as possible after the end of the reference period. However there can be difficulties in getting all the responses required to produce a series of acceptable quality in a timely manner. The earlier the time series is released the more likely there will be tardy respondents, thus the series will have to be estimated without their responses. As the quarterly gross domestic product (QGDP) is the aggregation of a large number of economic time series the difficulties are compounded. An adequate preliminary estimate of QGDP may be made by using models parsimonious in the number of time series involved. Graphical models assist us in obtaining such parsimonious models by identifying the relevant components in a saturated structural VAR enabling us to eliminate unnecessary delays. Even if an earlier release is not possible we could target work to improve the timeliness of series identified in the parsimonious models. Keywords: Early estimates, Irregular structures, Moralization, Structural vector autoregression. 1. Introduction A National Statistical Office (NSO) produces a large number of time series which are updated on regular basis. Some are estimates from surveys run by the NSO, some from data collected by another organization

2 for their administrative purposes (e.g. customs records), and others are combinations of a range of time series (e.g. Consumer Price Index (CPI), Quarterly Gross Domestic Product (QGDP)). It is required to produce these statistical outputs to published quality standards and in a timely manner. To some extent these criteria are related so that care must be taken that improving one aspect of the data (e.g. timeliness) does not impact on another (e.g. quality). One key technical advance would be the ability to easily identify significant relationships between time series. Each time series has particular issues. A knowledge of the relation between different time series would assist NSOs in assessing the benefits of changes in the way particular time series are compiled, particularly for those outputs, such as QGDP, that are combinations of a range of time series. We have been investigating the feasibility of using graphical modelling to identify and model the relationships between time series, particularly to identify where improvements in timeliness could be made without materially affecting quality, or increasing cost. 2. Components of time series As outlined above, the main concern of NSOs is to release data that reflect the social or economic concept that they are meant to represent, within the budget allocated for this work and, crucially, with little or no revisions after release. Much of the reporting on NSO outputs focusses on the movement represented by the new data point, that is the first difference, rather than its absolute value. For any series that is seasonal a large part of any first difference is caused by movements in the seasonal component. Hence the desire for NSOs to seasonally adjust where appropriate. For this reason most statistical agencies provide the measured figure, along with the seasonally adjusted value (where appropriate) and, increasingly, the trend estimates, and direct users to the latter series rather than the unadjusted figures. To provide seasonally adjusted and trend estimates Statistics New Zealand currently uses Census Method II Variant X-12, commonly called

3 X-12 (Findley et al., 1998). For seasonally adjusted or non-seasonal series, work done by Statistics New Zealand (Kazakova, 2001) shows that movements over short time spans will be dominated by the movement in the irregular component. As one of the key conditions for seasonal adjustment is that the seasonal pattern is stable we assume the seasonal factors are not changing significantly over a short time span. What is of interest to many uses are turning points in the trend estimate. It is well known that estimating trend components at the end of a series is difficult, with identification of turning points being particularly problematic. Often evidence of a turning point will appear first in atypical behavior of the irregular component. Therefore it is important that the estimate of the irregular component is not substantially revised. Consequently estimating the next value for all the components bar the irregular should be done well. Therefore any attempt to find a more parsimonious model for any time series should ensure that the irregular component is consistent with that from the more complex model. For this reason we have focussed our attention on estimation of the irregular component. A byproduct of investigating the irregular component is that it is stationary. 3. Models for multivariate time series The relation among several autoregressions can be modelled with the vector autoregression of order k, VAR(k) x t = c + Φ 1 x t 1 + Φ 2 x t Φ k x t k + e t, (1) where x t, x t 1,..., x t k are n-dimensional vectors, with the corresponding coefficient vectors Φ 1, Φ 2,..., Φ k, c is the constant and e t is the error vector (which is assumed iid). If the covariance matrix, H, of e t is not diagonal, the set of linear equations (1) corresponds to a system of seemingly unrelated regressions (Zellner, 1962) and in H are hidden the relations among the components of x t. To highlight such relations we can represent the canonical VAR(k) in (1) in its structural form (SVAR) (Bernanke, 1986; Blanchard and Watson, 1986; Sims, 1986): Θ 0 x t = d + Θ 1 x t 1 + Θ 2 x t Θ k x t k + u t (2)

4 where Θ i = Θ 0 Φ i for i = 1,..., k, d = Θ 0 c and u t = Θ 0 e t with covariance matrix Θ 0 HΘ 0 = D, which is diagonal. If there are no zeros in the coefficient vectors the SVAR is saturated, but in many cases some lagged variables on the right hand side in (2) do not play any role in explaining the current variables, x t. In this case the value of the corresponding coefficient is zero and hence the SVAR is sparse. The order, p, of the regression may be determined by various methods including inspection of a multivariate partial autocorrelation sequence, see (Reinsel, 1993, pp 69-70), or minimization of an order selection criterion such as AIC (Akaike, 1973), HIC (Hannan and Quinn, 1979), SIC (Schwarz, 1978), corrected AIC (Hurvitch and Tsai, 1989). The latter is particularly advisable given its small sample properties and it is the one we use in this paper. We require a further condition on Θ 0, that it represents a recursive dependence of each component of x t on other contemporaneous components. This is equivalent to the existence of a re-ordering of the elements of x t such that Θ 0 is triangular with unit diagonal. Each possible ordering of x t therefore gives a potentially distinct form of (2), but these are all statistically equivalent, corresponding to different factorizations of D = Θ 0 HΘ 0. (3) This inverse problem contrasts with the unique form of (1), which is an attractive feature of that model from a time-series modelling viewpoint. The value of (2) therefore lies in the possibility that there is one particular form which, as a consequence of its representing the data generating process, is more parsimonious in its parameterization than either (1) or the other forms of (2). This would be reflected in the ability to exclude many of the elements of Θ 0 and Θ i from the model without penalizing the adequacy in comparison with the saturated or other forms of either (1) or (2).

5 4. Conditional independence graphs Neglecting, for the present treatment, any effects of time series model estimation, we suppose that we have observations on the vector Gaussian white noise innovations process e t with the usual sample covariance matrix Ĥ. We wish to determine from the data the form of possible sparse structural matrices Θ 0 which are compatible with D. There may be no such unique form without imposing further constraints using insight from the modelling context. Tunnicliffe Wilson (1992) and Swanson and Granger (1997) consider a similar problem. The latter authors focus more on testing for the constraints implied by a particular structural form of Φ 0 which has commonly occurred in practice. Their tests are expressed in terms of partial autocorrelations which, as they remark, are not directional and would therefore appear less appropriate for recursive models. We follow the approach proposed by Reale and Tunnicliffe Wilson (2001) and use pair-wise partial autocorrelations, conditioning on all remaining variables (i.e. components of e t ). With respect to backward stepwise regression this approach has the advantage of leaving the conditioning set unchanged. Nevertheless there is a problem of multiple testing to deal with and later we ll describe a strategy to tackle this issue. The partial correlations, given the Gaussianity, are used to construct the conditional independence graph (CIG) of the variables, following procedures presented, for example, in Whittaker (1990). As Swanson and Granger (1997, p. 359) also remark, the structural form of dependence between the variables is naturally expressed by (and is equivalent to) a directed acyclic graph (DAG), in which nodes representing variables are linked with arrows indicating the direction of any dependence. A DAG implies a single CIG for the variables, but the possible DAGs which might explain a particular CIG may be several or none. The point is that, subject to sampling variability, the CIG is a constructible quantity and a useful one for expressing the data determined constraints on permissible DAG interpretations. The CIG consists of nodes representing the variables, two nodes being without a link if and only if they are independent conditional upon all the

6 remaining variables. In a Gaussian context this conditional independence is indicated by a zero partial autocorrelation: ρ (e it, e jt {e kt, k i, j}) = 0. (4) In the linear least squares context the linear partial autocorrelations in (4) still usefully indicate lack of linear predictability of one variable by another given the inclusion of all remaining variables. The link with Granger causality is quite evident. The set of all such partial correlations required to construct the CIG is conveniently calculated by making use of the inverse variance lemma (Whittaker, 1990) as ρ (e it, e jt {e kt, k i, j}) = W ij / (W ii W jj ) (5) where W = H 1. The sample values are obtained by substituting the sample value Ĥ for H. We then test their significance using thresholds obtained by exploiting the relationship between a regression t value and the sample partial correlation ˆρ given by ˆρ = t/ (t2 + ν) (6) (see Greene, 1993, p. 180), where ν is the residual degrees of freedom in the regression of one of the variables in the partial autocorrelation, upon all the other variables. The t value is that attached, in this regression, to the other variable in the partial autocorrelation. This is a relationship deriving from the linear algebra of least squares, and is not reliant upon statistical assumptions. Standard assumptions are needed to support the usual distribution of t under the null hypothesis that the true value of the relevant variable is zero, which is equivalent to ρ = 0. There are of course statistical pitfalls in applying the test simultaneously to all sample partial autocorrelations. Our approach is to use these values to suggest possible models, and after fitting these, we apply more formal tests and diagnostic checks to converge on an acceptable model.

7 5. From conditional independence graphs to directed acyclic graphs Central to the interpretation of a CIG is the separation theorem. The CIG is constructed by pairwise separation of variables which are independent conditional on the remainder. The separation theorem states that if two blocks of variables are separated, that is there is no link between any member of the first block and any member of the second, then the two blocks are completely independent conditional on the remaining variables. See, for example, Whittaker (1990, pp ) for a general proof and references to more straightforward proofs in the Gaussian case, where the result can be read directly from the joint density. To illustrate the theorem, let us consider the conditional independence graph in Figure, where A, B and C are either random variables or groups of random variables. The graph implies that A C B or alternatively that A B, C = A B. While the CIG leaves room for several possible alterna- A B C Figure 1: Markov property for conditional independence graphs. tive factorizations of the joint density function, the DAG provides a more precise definition. As an example let us consider the DAG in Figure ; it is very similar to the CIG in Figure where the lines, also called edges, are replaced by arrows. Nevertheless the definition in terms of density is now very precise, f(a, B, C) = f(a B)f(B C)f(A). Using the graphical modelling terminology we would say, in this case, that B is a parent of A and C is a parent of B. Although the DAG and the CIG represent a differ- A B C Figure 2: Density factorization implied by a DAG. ent definition of the joint probability, there is a correspondence between

8 these two graphs which is embodied by the moralization rule (Lauritzen and Spiegelhalter 1988). Because of this result we can obtain the CIG from the DAG by transforming the arrows into lines and linking unlinked parents. These kinds of edges are defined as moral. To better explain moralization let us consider a simplified example: obtaining the New Zealand residency. You can become NZ resident (C) for two reasons: general skills (A) or business reasons (B), which basically means having money to invest in New Zealand. We can effectively represent this system with the graph on the left hand side of Figure where both A and B affect C: A and B are parents of C. Assuming no relationship between being rich and being skilful (many real cases would support this assumption) there is no link between A and B. The DAG clearly provides a precise description of the pairwise independence relations. The CIG on the other hand provides a description of more global independence relations, considering the effect of all the variables present in the graph. If the joint distribution of the variables in the graph is not Gaussian, relations should be interpreted in terms of partial correlation. In our example although we assumed no direct dependence between money and skills, the joint consideration of the third variable, the obtainment of the New Zealand residency, would change the situation. In fact, information, for a certain applicant, about the level of skills and the outcome of the application can be revealing about the business capability. The resulting CIG is shown on the right hand side of Figure. A A C C B B Figure 3: Moralization of a directed acyclic graph. Moralization brings us to the next step which is to determine what DAG structures can explain a CIG. This is part of a much wider prob-

9 lem of the search for causal structure, covered for example by Spirtes, Glymour and Scheines (2000). The DAG is very attractive because of its causal interpretation (Pearl 2000), but all we can observe in practice is the CIG obtained by the sample partial correlation. So actually we need to perform the inverse operation of the moralization, which we term demoralization. Unfortunately while the transformation of a DAG into a CIG is unique, there are several DAG s which can give the same CIG. As an example, consider the CIG on the right end side in Figure : it could result from the moralization of all the DAG s in Figure. So we need to identify the moral links and remove Figure 4: Possible directed acyclic graph. them. To do that we need to use all the knowledge we have about the relationships among the random variables in the system. As we shall see in the application in the next section, the search for the DAG is simplified when we are in a structural VAR framework. 6. A graphical model for the quarterly gross national expenditure in New Zealand The quarterly general domestic product is one of the most relevant time series for economic and social analyses and there is more and more pressure to release early reliable estimates for this aggregate. In our analysis we consider a subcomponent of QGDP but the same methodology

10 could be extended to all the subcomponents and eventually we could possibly consider the main time series for each subcomponent jointly. We have QGDP on an expenditure basis (QGDE) from the June quarter of 1988 to the December quarter of 2002, a series of 65 values. As QGDE is quarterly gross national expenditure (QGNE) plus exports minus imports we have investigated the relationship between the top level components of QGNE only for our preliminary investigations. The quarterly general national expenditure is equal to the sum of private final consumption expenditure (PFCE), government final consumption expenditure (GFCE), gross fixed capital formation (GFKF) and stock changes (STOCK). That is QGNE = P F CE + GF CE + GF KF + ST OCK (7) Note that STOCK can have negative and positive values. We only model the irregular components produced by Statistics New Zealand s seasonal adjustment procedure. In order to provide earlier reliable estimates of the QGNE, our strategy is to focus on the most volatile component and rely on the stability of the others. The irregular components of PFCE, GFCE, GFKF and STOCK are plotted in Figures and Values Irregular component of Government Final Consumption Expenditure Figure 5: Irregular components of PFCE and GFCE. These time series are stationary by definition as confirmed by inspection. Nevertheless the methodology we are going to use can be applied to

11 Values Stock changes Figure 6: Irregular components of GFKF and STOCK. systems integrated of the first order without any concern for cointegration as proved by Tunnicliffe Wilson and Reale (2002). Using the corrected AIC we identified a vector autoregression of order 4. We then proceeded with the methodology explained above to obtain the conditional independence graph in figure. We first calculated the sample partial correlation by using the inverse variance lemma (5) and then tested their significance by using (6). In using this testing procedure we have to deal with the issue of multiple testing. A strategy to try to minimize type I and type II errors would be to use different levels of significance of partial correlations. This information combined with cross-correlations of residuals, prior information and moralization consistency will assist in selecting a specific DAG. Looking at the independence graph (Figure ) one of the advantages of graphical modelling is immediately obvious. It is considerably easier to see the intricacies of the relationship between different series at differing orders of lag. Note that in the CIG we represent only the relations with current variables, excluding the relations between past variables. This is because it is the current relations we are interested in. Nevertheless relationships between past variables can sometimes be of help; their use and sampling properties have been studied by Reale and Tunnicliffe Wilson (2002). From Figure general higher level patterns can be clearly identified.

12 Stocks GFKF GFCE PFCE t t 1 t 2 t 3 t 4 Figure 7: Conditional independence graph. It can be seen that there is a web of relationships between GFCE and GFKF at various lags, most at 0.99 significance. There are only two links connecting this group. Both are to PFCE, one from current GFKF and the other from lag 4 GFCE. It can also be seen that PFCE is linked to STOCKS. While this CIG is useful, for official statistics purposes we need also some indication of a causal structure. In order to identify a causal structure among the irregulars we need to identify the DAG and hence the direction of the edges among contemporaneous variables, the direction of the other edges being obvious given the time framework. Therefore we need to determine the causal structure for the contemporaneous relationship between STOCKS, PFCE and GFKF. Using moralization there are three possible directed structures among contemporaneous variables. They are presented in figure. Because of the knowledge we have of the system we can exclude structure B. We then proceed with subset selection and use information criteria, in particular the one proposed by Schwarz (1978), to select the best models for contemporaneous structures A and C (figures and ). Ac-

13 Stocks GFKF GFCE PFCE A B C Figure 8: Possible structures among contemporaneous variables. cording to both the Schwarz and Hannan and Quinn criteria the model in figure provides a better representation of the data. The following table provides the number of parameters, deviance, AIC, HIC and SIC for the saturated model, best model and alternative model. Model k DEV AIC HIC SIC Saturated Best Alternative At this point we now have a model that may be useful for official statistical purposes. We see that GFKF and GFCE are linked to past values of both, so both would be required for total QGNE. Current STOCKS and GFKF are related to current PFCE, which is an AR(1) process. So there is some evidence that a good current value for PFCE is not necessary to produce current QGNE, but rather its information is already contained in the other variables in the graph. We would still need to eventually have a good value for PFCE in order to use it in the PFCE AR(1) model for the next quarter s estimate of QGNE.

14 Stocks GFKF GFCE PFCE t t 1 t 2 t 3 t 4 Figure 9: Best model. However any decision as to when to release QGNE would require further investigation of the subcomponents of the series that we have used, plus using information on the time the various series are available for use. Also we would need to analyse the early estimates from any proposed model with the final estimates as given by Statistics NZ. 7. Conclusions Graphical modelling has been developed to help draw population inferences. While NSOs produce models as part of their outputs the primary task of an NSO is to produce a broad range of timely quality data for use by society. To this end it would be useful to identify the relationships between time series to select those that are crucial to the release of an acceptable first estimate. These crucial series could merit work to improve their timeliness, whereas less important series could be either not collected or have less resources applied to their collection. Our preliminary work shows graphical modelling has potential, but

15 Stocks GFKF GFCE PFCE t t 1 t 2 t 3 t 4 Figure 10: Alternative model. as a NSO often approaches time series analysis with different purposes than straight prediction more work will be required to identify under what conditions and in what areas it will be most useful. An extension of this approach for instance could be devised by including all the components of the time series. Graphical modelling could also be succesfully applied in reducing the number of time series collected by eliminating time series giving information already given by others. The large quantity of data available to a NSO offers applications to data mining, a field where graphical modelling is successful (Borgelt and Kruse, 2002). References Akaike H. (1973), A new look at statistical model identification, IEEE Transactions on Automatic Control, AC-19, Bernanke B.S. (1986), Alternative explanations of the money-income

16 correlation, Carnegie-Rochester Conference Series on Public Policy 25, Blanchard O.J., Watson M.W. (1986), Are business cycles all alike?, The American Business Cycle: Continuity and Change, (Gordon R.J. ed.), University of Chicago Press, Chicago. Borgelt C., Kruse R. (2002), Graphical Models: Methods for Data Analysis and Mining, Wiley, New York. Findley D.F., Monsell B.C., Bell W.R., Otto M.C., Chen B.-C. (1998), New capabilities and methods of the X-12 ARIMA seasonal adjustment program, Journal of Business and Economic Statistics, 16, Greene W.H. (1993), Econometric Analysis, Prentice-Hall, Englewood Cliffs. Hannan E.J., Quinn B.G. (1979), The determination of the order of an autoregression, Journal of the Royal Statistical Society Series B, 41, Hurvitch C.M, Tsai C-L. (1989), Regression and time series model selection in small samples, Biometrika, 76, Kazakova V. (2001), What dominates movements in SNZ seasonally adjusted time series, Internal Report, Statistics New Zealand. Lauritzen S.L., Spiegelhalter D.J. (1988), Local computations with probabilities on graphical structures and their applications to expert systems, Journal of the Royal Statistical Society Series B, 50, Pearl J. (2000), Causality, Cambridge University Press, Cambridge. Reale M., Tunnicliffe Wilson G. (2001), Identification of vector AR models with recursive structural errors using conditional independence graphs, Statistical Methods and Applications, 10, Reale M., Tunnicliffe Wilson G. (2002), The sampling properties of conditional graphs for structural vector autoregressions, Biometrika, 89, Reinsel G.C. (1993), Elements of Multivariate Time Series Analysis, Springer-Verlag, New York. Schwarz G. (1978), Estimating the dimension of a model, The Annals of Statistics, 6, Sims C.A. (1986), Are forecasting models usable for policy analysis?, Federal Reserve Bank of Minneapolis Quarterly Review, 10, 2-16.

17 Spirtes P., Glymour C., Scheines R. (2000), Causation, Prediction and Search, MIT University Press, Cambridge. Swanson N.R., Granger C.W.J. (1997), Impulse response functions based on a causal approach to residual orthogonalization in vector autoregressions, Journal of the American Statistical Association, 92, Tunnicliffe Wilson G. (1992), Structural models for structural change, Quaderni di Statistica e Econometria, 14, Tunnicliffe Wilson G., Reale M. (2002), Causal diagrams for I(1) structural VAR models, University of Canterbury Department of Mathematics and Statistics Research Reports, 2002/6. Whittaker J.C. (1990), Graphical Models in Applied Multivariate Statistics, Chichester, Wiley. Zellner A. (1962), An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias, Journal of the American Statistical Association, 57,

### Chapter 4: Vector Autoregressive Models

Chapter 4: Vector Autoregressive Models 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie IV.1 Vector Autoregressive Models (VAR)...

### Vector Time Series Model Representations and Analysis with XploRe

0-1 Vector Time Series Model Representations and Analysis with plore Julius Mungo CASE - Center for Applied Statistics and Economics Humboldt-Universität zu Berlin mungo@wiwi.hu-berlin.de plore MulTi Motivation

### SYSTEMS OF REGRESSION EQUATIONS

SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations

### Forecasting of Paddy Production in Sri Lanka: A Time Series Analysis using ARIMA Model

Tropical Agricultural Research Vol. 24 (): 2-3 (22) Forecasting of Paddy Production in Sri Lanka: A Time Series Analysis using ARIMA Model V. Sivapathasundaram * and C. Bogahawatte Postgraduate Institute

### State Space Time Series Analysis

State Space Time Series Analysis p. 1 State Space Time Series Analysis Siem Jan Koopman http://staff.feweb.vu.nl/koopman Department of Econometrics VU University Amsterdam Tinbergen Institute 2011 State

### Univariate and Multivariate Methods PEARSON. Addison Wesley

Time Series Analysis Univariate and Multivariate Methods SECOND EDITION William W. S. Wei Department of Statistics The Fox School of Business and Management Temple University PEARSON Addison Wesley Boston

### ADVANCED FORECASTING MODELS USING SAS SOFTWARE

ADVANCED FORECASTING MODELS USING SAS SOFTWARE Girish Kumar Jha IARI, Pusa, New Delhi 110 012 gjha_eco@iari.res.in 1. Transfer Function Model Univariate ARIMA models are useful for analysis and forecasting

### Testing for Granger causality between stock prices and economic growth

MPRA Munich Personal RePEc Archive Testing for Granger causality between stock prices and economic growth Pasquale Foresti 2006 Online at http://mpra.ub.uni-muenchen.de/2962/ MPRA Paper No. 2962, posted

### TEMPORAL CAUSAL RELATIONSHIP BETWEEN STOCK MARKET CAPITALIZATION, TRADE OPENNESS AND REAL GDP: EVIDENCE FROM THAILAND

I J A B E R, Vol. 13, No. 4, (2015): 1525-1534 TEMPORAL CAUSAL RELATIONSHIP BETWEEN STOCK MARKET CAPITALIZATION, TRADE OPENNESS AND REAL GDP: EVIDENCE FROM THAILAND Komain Jiranyakul * Abstract: This study

### Software Review: ITSM 2000 Professional Version 6.0.

Lee, J. & Strazicich, M.C. (2002). Software Review: ITSM 2000 Professional Version 6.0. International Journal of Forecasting, 18(3): 455-459 (June 2002). Published by Elsevier (ISSN: 0169-2070). http://0-

### Studying Achievement

Journal of Business and Economics, ISSN 2155-7950, USA November 2014, Volume 5, No. 11, pp. 2052-2056 DOI: 10.15341/jbe(2155-7950)/11.05.2014/009 Academic Star Publishing Company, 2014 http://www.academicstar.us

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### TIME SERIES ANALYSIS

TIME SERIES ANALYSIS L.M. BHAR AND V.K.SHARMA Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-0 02 lmb@iasri.res.in. Introduction Time series (TS) data refers to observations

### Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

### The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.

Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium

### Chapter 6: Multivariate Cointegration Analysis

Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration

### Marketing Mix Modelling and Big Data P. M Cain

1) Introduction Marketing Mix Modelling and Big Data P. M Cain Big data is generally defined in terms of the volume and variety of structured and unstructured information. Whereas structured data is stored

### Forecasting Geographic Data Michael Leonard and Renee Samy, SAS Institute Inc. Cary, NC, USA

Forecasting Geographic Data Michael Leonard and Renee Samy, SAS Institute Inc. Cary, NC, USA Abstract Virtually all businesses collect and use data that are associated with geographic locations, whether

### 11. Time series and dynamic linear models

11. Time series and dynamic linear models Objective To introduce the Bayesian approach to the modeling and forecasting of time series. Recommended reading West, M. and Harrison, J. (1997). models, (2 nd

### Monitoring the Behaviour of Credit Card Holders with Graphical Chain Models

Journal of Business Finance & Accounting, 30(9) & (10), Nov./Dec. 2003, 0306-686X Monitoring the Behaviour of Credit Card Holders with Graphical Chain Models ELENA STANGHELLINI* 1. INTRODUCTION Consumer

### 16 : Demand Forecasting

16 : Demand Forecasting 1 Session Outline Demand Forecasting Subjective methods can be used only when past data is not available. When past data is available, it is advisable that firms should use statistical

### Is the Forward Exchange Rate a Useful Indicator of the Future Exchange Rate?

Is the Forward Exchange Rate a Useful Indicator of the Future Exchange Rate? Emily Polito, Trinity College In the past two decades, there have been many empirical studies both in support of and opposing

### Investigating Quarterly Trading Day Effects

Investigating Quarterly Trading Day Effects 1 1 1 Kathleen M. McDonald-Johnson, David F. Findley, Erica Cepietz 1 U. S. Census Bureau, 4600 Silver Hill Road, Washington, DC 20233 Abstract When the U. S.

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### TIME SERIES ANALYSIS

TIME SERIES ANALYSIS Ramasubramanian V. I.A.S.R.I., Library Avenue, New Delhi- 110 012 ram_stat@yahoo.co.in 1. Introduction A Time Series (TS) is a sequence of observations ordered in time. Mostly these

### 1 Short Introduction to Time Series

ECONOMICS 7344, Spring 202 Bent E. Sørensen January 24, 202 Short Introduction to Time Series A time series is a collection of stochastic variables x,.., x t,.., x T indexed by an integer value t. The

Chapter 7 Chapter Table of Contents OVERVIEW...193 GETTING STARTED...194 TheThreeStagesofARIMAModeling...194 IdentificationStage...194 Estimation and Diagnostic Checking Stage...... 200 Forecasting Stage...205

### Introduction to time series analysis

Introduction to time series analysis Margherita Gerolimetto November 3, 2010 1 What is a time series? A time series is a collection of observations ordered following a parameter that for us is time. Examples

### Chapter 1. Vector autoregressions. 1.1 VARs and the identi cation problem

Chapter Vector autoregressions We begin by taking a look at the data of macroeconomics. A way to summarize the dynamics of macroeconomic data is to make use of vector autoregressions. VAR models have become

### Adaptive Demand-Forecasting Approach based on Principal Components Time-series an application of data-mining technique to detection of market movement

Adaptive Demand-Forecasting Approach based on Principal Components Time-series an application of data-mining technique to detection of market movement Toshio Sugihara Abstract In this study, an adaptive

### Department of Economics

Department of Economics On Testing for Diagonality of Large Dimensional Covariance Matrices George Kapetanios Working Paper No. 526 October 2004 ISSN 1473-0278 On Testing for Diagonality of Large Dimensional

### Getting Correct Results from PROC REG

Getting Correct Results from PROC REG Nathaniel Derby, Statis Pro Data Analytics, Seattle, WA ABSTRACT PROC REG, SAS s implementation of linear regression, is often used to fit a line without checking

### PITFALLS IN TIME SERIES ANALYSIS. Cliff Hurvich Stern School, NYU

PITFALLS IN TIME SERIES ANALYSIS Cliff Hurvich Stern School, NYU The t -Test If x 1,..., x n are independent and identically distributed with mean 0, and n is not too small, then t = x 0 s n has a standard

### Energy consumption and GDP: causality relationship in G-7 countries and emerging markets

Ž. Energy Economics 25 2003 33 37 Energy consumption and GDP: causality relationship in G-7 countries and emerging markets Ugur Soytas a,, Ramazan Sari b a Middle East Technical Uni ersity, Department

### Working Papers. Cointegration Based Trading Strategy For Soft Commodities Market. Piotr Arendarski Łukasz Postek. No. 2/2012 (68)

Working Papers No. 2/2012 (68) Piotr Arendarski Łukasz Postek Cointegration Based Trading Strategy For Soft Commodities Market Warsaw 2012 Cointegration Based Trading Strategy For Soft Commodities Market

### ON THE DEGREES OF FREEDOM IN RICHLY PARAMETERISED MODELS

COMPSTAT 2004 Symposium c Physica-Verlag/Springer 2004 ON THE DEGREES OF FREEDOM IN RICHLY PARAMETERISED MODELS Salvatore Ingrassia and Isabella Morlini Key words: Richly parameterised models, small data

### Forecasting the US Dollar / Euro Exchange rate Using ARMA Models

Forecasting the US Dollar / Euro Exchange rate Using ARMA Models LIUWEI (9906360) - 1 - ABSTRACT...3 1. INTRODUCTION...4 2. DATA ANALYSIS...5 2.1 Stationary estimation...5 2.2 Dickey-Fuller Test...6 3.

### Dynamics of Real Investment and Stock Prices in Listed Companies of Tehran Stock Exchange

Dynamics of Real Investment and Stock Prices in Listed Companies of Tehran Stock Exchange Farzad Karimi Assistant Professor Department of Management Mobarakeh Branch, Islamic Azad University, Mobarakeh,

### 4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4

4. Simple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/4 Outline The simple linear model Least squares estimation Forecasting with regression Non-linear functional forms Regression

### Forecast covariances in the linear multiregression dynamic model.

Forecast covariances in the linear multiregression dynamic model. Catriona M Queen, Ben J Wright and Casper J Albers The Open University, Milton Keynes, MK7 6AA, UK February 28, 2007 Abstract The linear

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### Multivariate Analysis of Ecological Data

Multivariate Analysis of Ecological Data MICHAEL GREENACRE Professor of Statistics at the Pompeu Fabra University in Barcelona, Spain RAUL PRIMICERIO Associate Professor of Ecology, Evolutionary Biology

### 15.062 Data Mining: Algorithms and Applications Matrix Math Review

.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

### Testing The Quantity Theory of Money in Greece: A Note

ERC Working Paper in Economic 03/10 November 2003 Testing The Quantity Theory of Money in Greece: A Note Erdal Özmen Department of Economics Middle East Technical University Ankara 06531, Turkey ozmen@metu.edu.tr

### 7 Time series analysis

7 Time series analysis In Chapters 16, 17, 33 36 in Zuur, Ieno and Smith (2007), various time series techniques are discussed. Applying these methods in Brodgar is straightforward, and most choices are

### MATH10212 Linear Algebra. Systems of Linear Equations. Definition. An n-dimensional vector is a row or a column of n numbers (or letters): a 1.

MATH10212 Linear Algebra Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thompson, 2006. ISBN 0-534-40596-7. Systems of Linear Equations Definition. An n-dimensional vector is a row or a column

### Time Series Analysis

Time Series Analysis Identifying possible ARIMA models Andrés M. Alonso Carolina García-Martos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and García-Martos

### Chapter 5. Analysis of Multiple Time Series. 5.1 Vector Autoregressions

Chapter 5 Analysis of Multiple Time Series Note: The primary references for these notes are chapters 5 and 6 in Enders (2004). An alternative, but more technical treatment can be found in chapters 10-11

### COURSES: 1. Short Course in Econometrics for the Practitioner (P000500) 2. Short Course in Econometric Analysis of Cointegration (P000537)

Get the latest knowledge from leading global experts. Financial Science Economics Economics Short Courses Presented by the Department of Economics, University of Pretoria WITH 2015 DATES www.ce.up.ac.za

### A Trading Strategy Based on the Lead-Lag Relationship of Spot and Futures Prices of the S&P 500

A Trading Strategy Based on the Lead-Lag Relationship of Spot and Futures Prices of the S&P 500 FE8827 Quantitative Trading Strategies 2010/11 Mini-Term 5 Nanyang Technological University Submitted By:

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Sales forecasting # 2

Sales forecasting # 2 Arthur Charpentier arthur.charpentier@univ-rennes1.fr 1 Agenda Qualitative and quantitative methods, a very general introduction Series decomposition Short versus long term forecasting

### Threshold Autoregressive Models in Finance: A Comparative Approach

University of Wollongong Research Online Applied Statistics Education and Research Collaboration (ASEARC) - Conference Papers Faculty of Informatics 2011 Threshold Autoregressive Models in Finance: A Comparative

### Factor Analysis. Chapter 420. Introduction

Chapter 420 Introduction (FA) is an exploratory technique applied to a set of observed variables that seeks to find underlying factors (subsets of variables) from which the observed variables were generated.

### Chapter 2 Portfolio Management and the Capital Asset Pricing Model

Chapter 2 Portfolio Management and the Capital Asset Pricing Model In this chapter, we explore the issue of risk management in a portfolio of assets. The main issue is how to balance a portfolio, that

### Section A. Index. Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting techniques... 1. Page 1 of 11. EduPristine CMA - Part I

Index Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting techniques... 1 EduPristine CMA - Part I Page 1 of 11 Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting

### Booth School of Business, University of Chicago Business 41202, Spring Quarter 2015, Mr. Ruey S. Tsay. Solutions to Midterm

Booth School of Business, University of Chicago Business 41202, Spring Quarter 2015, Mr. Ruey S. Tsay Solutions to Midterm Problem A: (30 pts) Answer briefly the following questions. Each question has

### Modeling Stock Trading Day Effects Under Flow Day-of-Week Effect Constraints

Modeling Stock Trading Day Effects Under Flow Day-of-Week Effect Constraints David F. Findley U.S. Census Bureau, Washington, DC 20233 (Email:David.F.Findley@census.gov) Brian C. Monsell U.S. Census Bureau,

### Time Series Analysis

Time Series Analysis Forecasting with ARIMA models Andrés M. Alonso Carolina García-Martos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and García-Martos (UC3M-UPM)

### 5. Multiple regression

5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful

### Detecting Stock Calendar Effects in U.S. Census Bureau Inventory Series

Detecting Stock Calendar Effects in U.S. Census Bureau Inventory Series Natalya Titova Brian Monsell Abstract U.S. Census Bureau retail, wholesale, and manufacturing inventory series are evaluated for

### Causal Inference and Major League Baseball

Causal Inference and Major League Baseball Jamie Thornton Department of Mathematical Sciences Montana State University May 4, 2012 A writing project submitted in partial fulfillment of the requirements

### Appendix 1: Time series analysis of peak-rate years and synchrony testing.

Appendix 1: Time series analysis of peak-rate years and synchrony testing. Overview The raw data are accessible at Figshare ( Time series of global resources, DOI 10.6084/m9.figshare.929619), sources are

### Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

### Normalization and Mixed Degrees of Integration in Cointegrated Time Series Systems

Normalization and Mixed Degrees of Integration in Cointegrated Time Series Systems Robert J. Rossana Department of Economics, 04 F/AB, Wayne State University, Detroit MI 480 E-Mail: r.j.rossana@wayne.edu

### Probability and Random Variables. Generation of random variables (r.v.)

Probability and Random Variables Method for generating random variables with a specified probability distribution function. Gaussian And Markov Processes Characterization of Stationary Random Process Linearly

### APPLICATION OF THE VARMA MODEL FOR SALES FORECAST: CASE OF URMIA GRAY CEMENT FACTORY

APPLICATION OF THE VARMA MODEL FOR SALES FORECAST: CASE OF URMIA GRAY CEMENT FACTORY DOI: 10.2478/tjeb-2014-0005 Ramin Bashir KHODAPARASTI 1 Samad MOSLEHI 2 To forecast sales as reliably as possible is

### Time Series Analysis

JUNE 2012 Time Series Analysis CONTENT A time series is a chronological sequence of observations on a particular variable. Usually the observations are taken at regular intervals (days, months, years),

### Performing Unit Root Tests in EViews. Unit Root Testing

Página 1 de 12 Unit Root Testing The theory behind ARMA estimation is based on stationary time series. A series is said to be (weakly or covariance) stationary if the mean and autocovariances of the series

### By choosing to view this document, you agree to all provisions of the copyright laws protecting it.

This material is posted here with permission of the IEEE Such permission of the IEEE does not in any way imply IEEE endorsement of any of Helsinki University of Technology's products or services Internal

### 6. Cholesky factorization

6. Cholesky factorization EE103 (Fall 2011-12) triangular matrices forward and backward substitution the Cholesky factorization solving Ax = b with A positive definite inverse of a positive definite matrix

### THE IMPACT OF EXCHANGE RATE VOLATILITY ON BRAZILIAN MANUFACTURED EXPORTS

THE IMPACT OF EXCHANGE RATE VOLATILITY ON BRAZILIAN MANUFACTURED EXPORTS ANTONIO AGUIRRE UFMG / Department of Economics CEPE (Centre for Research in International Economics) Rua Curitiba, 832 Belo Horizonte

### Analysis of Load Frequency Control Performance Assessment Criteria

520 IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 16, NO. 3, AUGUST 2001 Analysis of Load Frequency Control Performance Assessment Criteria George Gross, Fellow, IEEE and Jeong Woo Lee Abstract This paper presents

### 3. Regression & Exponential Smoothing

3. Regression & Exponential Smoothing 3.1 Forecasting a Single Time Series Two main approaches are traditionally used to model a single time series z 1, z 2,..., z n 1. Models the observation z t as a

### Penalized regression: Introduction

Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

### Solution to Homework 2

Solution to Homework 2 Olena Bormashenko September 23, 2011 Section 1.4: 1(a)(b)(i)(k), 4, 5, 14; Section 1.5: 1(a)(b)(c)(d)(e)(n), 2(a)(c), 13, 16, 17, 18, 27 Section 1.4 1. Compute the following, if

### COMP6053 lecture: Time series analysis, autocorrelation. jn2@ecs.soton.ac.uk

COMP6053 lecture: Time series analysis, autocorrelation jn2@ecs.soton.ac.uk Time series analysis The basic idea of time series analysis is simple: given an observed sequence, how can we build a model that

### Spatial Statistics Chapter 3 Basics of areal data and areal data modeling

Spatial Statistics Chapter 3 Basics of areal data and areal data modeling Recall areal data also known as lattice data are data Y (s), s D where D is a discrete index set. This usually corresponds to data

### INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its

### MULTIPLE REGRESSIONS ON SOME SELECTED MACROECONOMIC VARIABLES ON STOCK MARKET RETURNS FROM 1986-2010

Advances in Economics and International Finance AEIF Vol. 1(1), pp. 1-11, December 2014 Available online at http://www.academiaresearch.org Copyright 2014 Academia Research Full Length Research Paper MULTIPLE

### ITSM-R Reference Manual

ITSM-R Reference Manual George Weigt June 5, 2015 1 Contents 1 Introduction 3 1.1 Time series analysis in a nutshell............................... 3 1.2 White Noise Variance.....................................

### Air passenger departures forecast models A technical note

Ministry of Transport Air passenger departures forecast models A technical note By Haobo Wang Financial, Economic and Statistical Analysis Page 1 of 15 1. Introduction Sine 1999, the Ministry of Business,

### Traffic Driven Analysis of Cellular Data Networks

Traffic Driven Analysis of Cellular Data Networks Samir R. Das Computer Science Department Stony Brook University Joint work with Utpal Paul, Luis Ortiz (Stony Brook U), Milind Buddhikot, Anand Prabhu

### Some useful concepts in univariate time series analysis

Some useful concepts in univariate time series analysis Autoregressive moving average models Autocorrelation functions Model Estimation Diagnostic measure Model selection Forecasting Assumptions: 1. Non-seasonal

### Modeling and Performance Evaluation of Computer Systems Security Operation 1

Modeling and Performance Evaluation of Computer Systems Security Operation 1 D. Guster 2 St.Cloud State University 3 N.K. Krivulin 4 St.Petersburg State University 5 Abstract A model of computer system

### A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS

A SURVEY ON CONTINUOUS ELLIPTICAL VECTOR DISTRIBUTIONS Eusebio GÓMEZ, Miguel A. GÓMEZ-VILLEGAS and J. Miguel MARÍN Abstract In this paper it is taken up a revision and characterization of the class of

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### The Orthogonal Response of Stock Returns to Dividend Yield and Price-to-Earnings Innovations

The Orthogonal Response of Stock Returns to Dividend Yield and Price-to-Earnings Innovations Vichet Sum School of Business and Technology, University of Maryland, Eastern Shore Kiah Hall, Suite 2117-A

### Examining the Relationship between ETFS and Their Underlying Assets in Indian Capital Market

2012 2nd International Conference on Computer and Software Modeling (ICCSM 2012) IPCSIT vol. 54 (2012) (2012) IACSIT Press, Singapore DOI: 10.7763/IPCSIT.2012.V54.20 Examining the Relationship between

### Partial Least Squares (PLS) Regression.

Partial Least Squares (PLS) Regression. Hervé Abdi 1 The University of Texas at Dallas Introduction Pls regression is a recent technique that generalizes and combines features from principal component

### Time Series Analysis: Basic Forecasting.

Time Series Analysis: Basic Forecasting. As published in Benchmarks RSS Matters, April 2015 http://web3.unt.edu/benchmarks/issues/2015/04/rss-matters Jon Starkweather, PhD 1 Jon Starkweather, PhD jonathan.starkweather@unt.edu

### Overview of Factor Analysis

Overview of Factor Analysis Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone: (205) 348-4431 Fax: (205) 348-8648 August 1,

### Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

### Indices of Model Fit STRUCTURAL EQUATION MODELING 2013

Indices of Model Fit STRUCTURAL EQUATION MODELING 2013 Indices of Model Fit A recommended minimal set of fit indices that should be reported and interpreted when reporting the results of SEM analyses:

### Big data in macroeconomics Lucrezia Reichlin London Business School and now-casting economics ltd. COEURE workshop Brussels 3-4 July 2015

Big data in macroeconomics Lucrezia Reichlin London Business School and now-casting economics ltd COEURE workshop Brussels 3-4 July 2015 WHAT IS BIG DATA IN ECONOMICS? Frank Diebold claimed to have introduced

### Promotional Forecast Demonstration

Exhibit 2: Promotional Forecast Demonstration Consider the problem of forecasting for a proposed promotion that will start in December 1997 and continues beyond the forecast horizon. Assume that the promotion

### Joseph Twagilimana, University of Louisville, Louisville, KY

ST14 Comparing Time series, Generalized Linear Models and Artificial Neural Network Models for Transactional Data analysis Joseph Twagilimana, University of Louisville, Louisville, KY ABSTRACT The aim