7 Time series analysis

Save this PDF as:
Size: px
Start display at page:

Transcription

1 7 Time series analysis In Chapters 16, 17, in Zuur, Ieno and Smith (2007), various time series techniques are discussed. Applying these methods in Brodgar is straightforward, and most choices are self-explanatory. This chapter provides extra information on some of the methods. 7.1 Time series techniques Some of the time series techniques discussed in Zuur et al. (2007) are available from the exploration or multivariate menus. Others can be obtained by clicking on the main menu time series button (Figure 7.1). The panels show further time series techniques available in Brodgar from the same menu button. There are four main sections, namely (i) trends, (ii) sudden change, (iii) relationships and (iv) cycles. General time series methods are auto-and cross-correlations, time lags plotting ARIMAX modelling and spectral analysis. These methods make use of R, and applying them in Brodgar is similar to the linear regression menus (Chapter 5). These methods are available from the lower left and right panels in Figure 7.1.

2 128 7 Time series analysis Figure 7.1. Time series analysis menus. 7.2 Dynamic factor analysis Running dynamic factor analysis in Brodgar is simple; select DFA based on Zuur et al. (2003) in the upper left panel in Figure 7.1, and click the GO button. The upper left window in Figure 7.2 will appear. In the upper left window, one can select the model, the number of common trends, the type of error covariance matrix and the response variables. The following DFA models are available in Brodgar: 1. data = M common trends + noise 2. data = M common trends + explanatory variables + noise 3. data = N univariate trends + noise 4. data = N univariate trends + explanatory variables + noise 5. data = 1 common trend + noise 6. data = 1 common trend + explanatory variables + noise 7. 1 time series = trend + noise 8. 1 time series = trend + explanatory variables + noise We now discuss each of these models.

3 7.2 Dynamic factor analysis 129 Figure 7.2. Options for the DFA (Zuur et al. 2003) method. Model groups 1 & 2 These models are discussed in detail in Zuur et al. (2007), and Zuur et al. (2003 ab, 2004). The DFA code in Brodgar can deal with scalar explanatory variables, i.e. explanatory variables that have only one value at time t. However, the model structure is such that an explanatory variable can have a different effect on each of the response variables. This is probably best explained with help of a formula. Suppose we have a dynamic factor model with 5 time series, 2 common trends, and 1 explanatory variable. The mathematical formulation is given by: y 1t = a 11 z 1t + a 12 z 2t + b 1 x t + e 1t y 2t = a 21 z 1t + a 22 z 2t + b 2 x t + e 2t y 3t = a 31 z 1t + a 32 z 2t + b 3 x t + e 3t y 4t = a 41 z 1t + a 42 z 2t + b 4 x t + e 4t y 5t = a 51 z 1t + a 52 z 2t + b 5 x t + e 5t where y it is the value of the i th time series at time t, z 1t and z 2t are the two common trends at time t, x t is the value of the explanatory variable at time t and b 1,.., b 5 are

4 130 7 Time series analysis regression coefficients. The terms e 1t,.., e 5t are noise components and it is assumed that e t = (e 1t,.., e 5t )' is normally distributed with expectation 0 and covariance matrix R. For simplicity, we have ignored the constant level term. In matrix notation, the model can be written as: Y t = A z t + b x t + e t The elements of A are called factor loadings and indicate which common trends are important for which of the N response variables. The parameters b 1,.., b 5 are regression parameters. The relationship with `ordinary' factor analysis becomes clear by writing down the expression for the covariance matrix of Y t. For simplicity, we have omitted the explanatory variable component in the model. Cov(Y t )= A Cov(z t ) A' + R For identification purposes, the covariance matrix of the common trends is set to the identity matrix, which means that we get: Cov(Y t )= A A' + R This is a similar covariance model as in factor analysis, except that our factors (or: common trends) are required to be smoothing functions over time. In factor analysis, the error covariance matrix R is usually taken as a diagonal matrix. Consequently, the off-diagonal elements of the covariance matrix of the response variables are modelled entirely as a function of the factor loadings. However, there is no reason why the user should not use a symmetric, non-diagonal matrix for R in dynamic factor analysis. Our experience using such a matrix is positive. The various DFA models are presented in Table 7.1. To decide which model to use, the AIC criterion can be used. This criterion is a trade-off between measure of fit, and the number of parameters in the model. The advantage of using a non-diagonal matrix for R is that the number of common trends needed for an adequate model fit is smaller; instead of using a common trend for 2-way interactions, one single parameter can be used. The disadvantage is that the number of parameters increases drastically. The implementation of DFA in Brodgar can cope with missing values in the response variables, but not with missing values in the explanatory variables. To avoid a fatal crash, Brodgar replaces these missing values by the average of the explanatory variable. We suggest standardising the explanatory variables, as this makes comparison of the estimated regression parameters easier. The use of highly correlated explanatory variables (multi-collinearity) should also be avoided. Occasionally, it happens that a common trend gives an exact fit for one of the response variables. This behaviour might also occur in `ordinary' factor analysis and is called a Heywood case. Using a non-diagonal matrix for R solves this problem. A possible explanation is that the time series are too noisy (or fluctuate too much), and therefore DFA, which is basically a smoothing technique, might be inappropriate.

5 7.2 Dynamic factor analysis 131 Table 7.1. Various DFA models. Model matrix R Interpretation Y t =Az t + e t Diagonal The N time series are modelled as a linear combination of M common trends. These common trends represent the joint signal in a group, or all the series. The diagonal elements of R indicate the amount of information that cannot be explained by the common trends. Y t =Az t + e t non-diagonal As above. Additionally, 2-way interactions (if present) between the series are modelled by the off-diagonal elements of R. Y t =Az t + bx t + e t Diagonal The N time series are modelled as the effects of measured explanatory variables plus a linear combination of the common trends (a common pattern in the series which cannot be explained by the explanatory variables), plus a certain amount of information (time series specific) that cannot be explained by any of the other components. Y t =Az t + bx t + e t non-diagonal As above. Additionally, 2-way interactions are modelled via the off-diagonal elements of R. This might mean that less common trends are needed for an adequate model fit compared to the model above. We have used Brodgar to analyse data sets up to 20 time series (response variables) and up to 5 explanatory variables. We used 1 to 5 common trends and both the diagonal and non-diagonal matrices R. Results for different starting values were nearly identical, indicating that the numerical estimation procedure indeed obtained the global optimal solution. For larger data sets, this might not always be the case and a local optimum might be found. Model groups 3 to 6 In these models a trend is estimated for each response variable. It is still possible to use a symmetric, non-diagonal matrix R, resulting in so-called seemingly unrelated time series models Harvey (1989).

6 132 7 Time series analysis Model groups 5 & 6 In these models, one common trend is used, and all factor loadings are set to 1. Hence, the N time series are modelled as one common pattern plus a level parameter (see below) plus noise. Basically, this is a dynamic factor model with only one common trend and all factor loadings set to 1. The common trends are shifted up and down via the constant level parameter. Model groups 7 & 8 These models are for univariate series only. They model the time series as: 1 time series = trend + noise. The trend is modelled as a so-called random walk. If there is more than one response variable, a multivariate model is needed. A sensible approach is to start with N univariate trends (trend for each Y), where N is the number of time series, followed by a model containing one common trend, two common trends, three common trends, etc. The models with common trends (groups 1, 2, 5, 6) all have constant level parameters, see Zuur et al. (2003 a,b,2004). Please note that if a model with explanatory variables is selected, you should have specified explanatory variables in the data import step. If a model with explanatory variables is selected, make sure that the appropriate variables are selected in the upper right window in Figure 7.2. Settings for DFA The lower left panel in Figure 7.2 shows the default settings for DFA. Various items can be changed via this window, namely: Number of EM iterations. Parameters in the dynamic factor model are estimated with the so-called EM algorithm. Here, you can set the upper limit of the number of iterations the EM algorithm will carry out. For reasonably fast computers (800 MHz and above), it is advisable to use 1500 EM iterations. If computing time takes too long (either because of a slow computer or a large data set), it can be decreased to 500 iterations in preliminary analyses. If the number of EM iterations is changed, Brodgar stores it as the new default value. Stop criterion EM algorithm. If changes in the maximum likelihood function become smaller than this criterion, the EM algorithm will stop. We advise If a different value is used, please note that Brodgar does not store the new value. Hence, you will need to change this each time Brodgar is started. Try different starting values. Most optimisation routines rely on good starting values. If yes is selected, the EM algorithm starts x-times with different starting values (which are chosen at random) and carries out y-iterations. After x- runs the starting values that resulted in the lowest AIC (or highest value of the likelihood function) are used as starting values in the final run. The user can

7 7.2 Dynamic factor analysis 133 choose the number of runs (either 5 or 10) and the number of EM iterations in each run (50, 100 or 200). We recommend 5 loops with 100 iterations each. Refresh rate of runtime output. The graphical user interface of Brodgar was written in a language called Tcl-Tk. The statistical routines were programmed in FORTRAN. Once the parameter estimation process is started, the FORTRAN code will save results to a file every jth iteration and a signal is given to the graphical user interface to present these results in a window. This allows the user to monitor the progress of the estimation process. The value of j can be changed. Calculate 95% c.i. for beta. These are the confidence intervals for the parameters corresponding to the explanatory variables. Depending on the number of response variables and explanatory variables, estimating the confidence intervals might be time consuming. Error covariance matrix. Here the user can choose between a diagonal matrix or a symmetric, non-diagonal matrix for the error matrix R. Output during runtime The estimation process is started by clicking the GO button in Figure 7.2. During the estimation process a new window will appear, see Figure 7.3. It shows the intermediate results during runtime. In particular, it shows: at which iteration the algorithm is, the estimated value of the log likelihood function, the change in the log likelihood function (compared to the previous iteration), the AIC, BIC and CAIC values (model selection tools) changes in the trends, factor loadings (Z), noise component (R), and parameters for the explanatory variables (beta), the iteration for the starting values (if selected), and most importantly, the common trends plotted versus time.

8 134 7 Time series analysis Figure 7.3. Output during runtime. Recall that the dynamic factor model has a constant level parameter. This is modelled via a dummy explanatory variable (taking the value of one). For this reason, there will always be a change in beta, even if you have not selected explanatory variables. Changes in the parameters (the elements of Z, beta and R) are defined as the sum (over all elements in a matrix) of absolute differences of estimated values in the current iteration and the previous iteration. Validation of the DFA model In this paragraph, model validation of DFA is discussed. A distinction has been made between a graphical and numerical validation. Select Time Series from the main menu, and click the Validation tab (after convergence of the algorithm). This gives the lower right panel in Figure 7.2. This panel is also given in Figure 7.4. The options on the left correspond to the graphical tools, those on the right to numerical information. Because all information is stored in the project directory, one can easily access the validation information without running the DFA algorithm.

9 7.2 Dynamic factor analysis 135 Graphical output Once Brodgar has estimated the parameters in the dynamic factor model, the user can use various graphical tools to assess whether the chosen model is the most optimal one. This process follows similar lines as in linear regression. The following tools are available: Plot all fitted curves in one graph. Plot all common trends in one graph, and in separate graphs. Plot the factor loadings. Plot the model fit and observed values versus time in one graph for each response variable. Plot canonical correlations. Plot the residuals versus time. Plot histograms of the residuals. We discuss these tools below. The CPUE Nephrops data set was used. Time series were standardised. Figure 7.4. Validation options.

10 136 7 Time series analysis Clicking on the button labelled Fitted curves in Figure 7.4, pops up a new window containing all fitted curves in one graph, see Figure 7.5. Using the legend, colours of the lines can be changed to blue. Comparing fitted curves with each other, and with the observed time series ( Plot data in Data exploration), might provide very useful information. Clicking the Common trends button in Figure 7.4 shows the estimated common trends, see Figure 7.6. Again, by clicking on names on the legend, the color of the common trends can be changed. The second and third tabs in Figure 7.6 contain the individual common trends and 95% c.i. The estimated factor loadings for each axis are plotted as vertical lines from 0 to their estimated values, see Figure 7.7. Furthermore, loadings of axis i are plotted versus loadings of axis j for every combination of i and j. By clicking on the dotted line under the name of a tab, the corresponding graph will jump out and can be compared with other graphs from the same window. Via the Settings menu in Figure 7.2, the size of the graphs in Figure 7.6 and Figure 7.7 can be changed. Figure 7.5. Fitted curves for Icelandic CPUE data.

11 7.2 Dynamic factor analysis 137 Figure 7.6. Estimated common trends and 95% confidence intervals. Figure 7.7. Factor loadings axis 2.

12 138 7 Time series analysis Results in Figure 7.7 indicate that the second axis is important for the stations 9, 8, 10, 8 and 11. The fit of the dynamic factor model for each response variable can be viewed via the plot model fit button, see Figure 7.8. Figure 7.8. Model fit for each response variable. Dynamic factor analysis is a dimension reduction technique. It is unlikely that every response variable is fitted well if a small number of common trends is used. This is the same with techniques like principal component analysis and correspondence analysis. In these techniques, points close to the origin are generally not fitted well. The advantage of dynamic factor analysis is that is shows the fit, whereas PCA and CA do not. Dynamic factor analysis is also a smoothing technique. In contrast to ordinary smoothing techniques, dynamic factor analysis estimates the amount of smoothing automatically. Occasionally, dynamic factor analysis produces fitted lines that are an exact fit. This means that the amount of smoothing is too small, which is an indication that the underlying model is inappropriate. In our experience, switching to a symmetric, non-diagonal matrix for the error covariance matrix solves this problem. Another option might be a transformation. Also inspect the data for outliers.

13 7.2 Dynamic factor analysis 139 The residuals are calculated as the observed values minus the fitted values. Fitted values are the values obtained by the Kalman smoothing algorithm (Zuur et al a ). The residuals can be plotted versus time. To enhance visual inspection of the residuals, a smoothing line can be added. Figure 7.9 shows how graphical tools can be used in practise. Various windows can be open simultaneously. Figure 7.9. Validation tools. Canonical correlations are correlations between the original time series and the common trends. If a canonical correlation is large, then this indicates that the corresponding response variable follows the pattern of the common trend. If it is low, then the response variable is not related (in a linear context) to the common trend. These correlations provide similar information as the factor loadings. However, in some multivariate methods (e.g. discriminant analysis and canonical correspondence analysis), there is a discussion on the reliability of weights (factor loadings).

14 140 7 Time series analysis Numerical output The numerical output can be obtained from the options on the right side in Figure 7.4. There are two important buttons, namely Measures of fit and Explanatory variables. Clicking the Measures of fit button will give the AIC, log likelihood value and number of parameters, among others. For the example data, the following output is obtained via this button: Measures of fit LOG Likelihood = AIC = BIC = CIAC = Number of parameters = 43 The output was obtained for a dynamic factor model, with two common trends and a diagonal matrix for the noise covariance matrix. The AIC is the most important quantity. Various models can be tried, and the model with the smallest AIC value is likely to be the best model. The BIC and CIAC are alternative measures of fit, but in our experience the AIC is more useful. The button Explanatory variables will give the estimated values of the constant level parameters and the explanatory variables (if selected) plus standard errors and t-values. For the model above, the following output was obtained. EXPLANATORY VARIABLES Used model: y(t) = GAMMA * alpha(t) + D x(t) + epsilon(t) alpha(t) = alpha(t-1) + eta(t) x(t) contains the explanatory variables. The first column of D contains the constant level. 1. Constant level parameters and standard errors (first column of D) Index, estimated value, standard error and t-value

15 7.2 Dynamic factor analysis Estimated regression parameters No explanatory variables were used. The constant level parameter is modelled with help of a dummy variable, which takes the value of one for each response variable (Zuur et al a ). The second column (containing the values 0.07, 0.15, etc.) contains the estimated values for the constant level parameters. The third column contains the standard errors and the fourth column the t-values. Although these t-values should be interpreted with care, t-values larger than 3 (in absolute sense) indicate a strong relationship between the explanatory variable and the response variable. In this case, all estimated constant level parameters are non-significant different from zero. This did not come as a surprise because the time series were standardised prior to the analysis. If besides the dummy variable, explanatory variables are used, results are presented in the same way. The output of Brodgar can also be found in ascii files, under the output directory of Brodgar. Recall that this is the directory which has the same name (and directory path) as your project name except for the *.brd extension. The relevant files are: results.txt, Res_ci.txt, Res_covm.txt, Res_expl.txt, Res_facl.txt and Res_mf.txt. The text in these files explains what is printed. Other numerical information (e.g. starting values of the trends, estimated error covariance matrix, convergence information) is available and follows the notation in Zuur et al. (2003a). Copy numerical output to the clipboard Numerical output of most techniques in Brodgar can be copied directly to the windows clipboard. From there, you can paste it into Microsoft Excel. Just click one of the Copy to clipboard buttons in Brodgar, and press Control-V in Excel. The interpretation of the output is discussed below. For DFA, the output is as follows. Factor loadings An ascii file containing the loading can be found in: \YourProjectName\facload.txt. The first column contains the factor loadings of the first axis, the second column the loadings of the second axis, etc. Trends An ascii file containing the trends can be found in: \YourProjectName\at.txt. Interpretation of the columns in this file are: First column: time index. Second column: the first common trend. Third column: the lower 95% confidence band of the first common trend. Fourth column: the upper 95% confidence band of the first common trend.

16 142 7 Time series analysis Fifth column: the second common trend (if selected). Sixth column: the lower 95% confidence band of the second common trend. Seventh column: the upper 95% confidence band of the first common trend. Eight column: third common trend (if selected) etc. 7.3 Repeated Loess smoothing in Brodgar. Repeated Loess smoothing is available from the Trend menu under the main menu button Time Series. First, the user can (de-)select response variables. Brodgar will apply repeated Loess smoothing on each selected time series. Figure Settings for repeated Loess smoothing in Brodgar. The following options are available (Figure 7.10). Number of Loess curves. Either 1 or 2.

17 7.3 Repeated Loess smoothing in Brodgar. 143 Span width of the first Loess curve. This should be a value between 0 and 1. Span width of the second Loess curve (if selected in the first step). This should also be a value between 0 and 1, and smaller than the span width of the first Loess curve. If 2 smoothing curves are selected, the number of iterations. This is the iteration of repeated Loess smoothing described above. In general, 5 iterations will suffice. Predict missing values. This option, only available if one smoothing curve is estimated, will predict (using Loess) missing values in the original time series. Note that it will not predict missing values immediately at the beginning or end of the time series. Values along the x-axis. This can be automatic (resulting in values of 1, 2, 3, etc.), or the column labels can be used. In the latter case, make sure that the column labels are numeric. Thickness of line for the mean. This line is only plotted if the appropriate box is ticked in the graphical output menu, see above. The line width can be changed from 3 (default) to other values. Graph type. Allows the user to select saving graphs as a JPG, BMP or PNG. Name of graph. Brodgar will save the graph as: YourName.JPG in the project directory (assuming a JPG file was selected). Title. This title will appear at the top of the figure. Title x-axis. Allows the user to enter a label for the x-axis. Title y-axis. Allows the user to enter a title for the y-axis. The graphical output consists of: All smoothing curves in one graph, with or without a mean value, per component. The first, second and residuals on one page. The style options determine whether graphs are plotted next to each other, under each other, or in a matrix format. It is also possible to access the estimated smoothing curves, click on the Numerical output button. It will give the estimated smoothing curves and data with missing values replaced by estimators (if requested). This output can be

18 144 7 Time series analysis copied and pasted manually to Excel if you wish to create graphs with a different plotting style. 7.4 Seasonal decomposition by Loess smoothing This technique makes use of the R function 'stl', which decomposes a time series into a trend, seasonal component and residual information using Loess smoothing. Examples and details are given in Zuur et al. (2007). A related method is the month or cycle plot, and is discussed next Month plots (cycle plots) This technique plots data of the same month in one graph. It will create a January time series, a February time series, etc. These 12 time series (of the original data) are plotted in one graph in a coplot style. Besides a month plot of the original data, Brodgar will also add a month plot for the components obtained by the seasonal decomposition by Loess smoothing, see Cleveland (1993) for details. An illustration of seasonal decomposition by Loess smoothing and month plots is presented in Figure 7.11 to Figure Figure 7.11 shows monthly CO2 data measured on Hawaii since the late 1950s. The decomposition into a trend and seasonal component using Loess smoothing is presented in Figure 7.12 and Figure We used a Loess window of 21 (span in lags). Figure 7.14 shows the month (or cycle) plot. Results indicate that the monthly series have a maximum in May and a minimum in October. The maximum increase in CO2 was observed in April, and the largest decrease in September. The menu options for the seasonal decomposition using Loess smoothing are given in Figure The options for the month plot are identical.

19 7.4 Seasonal decomposition by Loess smoothing Variable Samples Figure Time series plot of monthly CO2 data at Hawaii. Trends Time Figure Long term trend obtained by seasonal Loess decomposition. of monthly CO2 data at Hawaii.

20 146 7 Time series analysis Seasonal Time Figure Seasonal component obtained by seasonal Loess decomposition. of monthly CO2 data at Hawaii. CO2 seasonal J F M A M J J A S O N D Figure Month plot of seasonal component obtained by seasonal Loess decomposition. of monthly CO2 data at Hawaii.

21 7.4 Seasonal decomposition by Loess smoothing 147 Figure Options in Brodgar for the seasonal decomposition using Loess. In the left panel, a response variable can be selected. The right panel shows the settings. Time series frequency. This is the frequency of the time series. For monthly time series, this will be 12, and for quarterly time series it is 4. Loess window trend and Loess window seasonal. The values determine the amount of smoothing that is used in the sub-series. The default value for the months is periodic and means that mean values per month are taken. Alterna-

22 148 7 Time series analysis tively, the span, in terms of the lag, can be chosen. In Figure 7.13, we used a span of 21 for the seasonal component, and the default value for the trend. Trend degree and seasonal degree. This value determines whether the Loess algorithm should use a linear regression model or polynomial model to obtain the smoothing values. See also the stl help files of R. Setting it to 0 should result in smoother curves. Graphical output. This can be 1 graph per component (with or without a mean curve), and N individual graphs (resulting in Figure 7.12 to Figure 7.14). Thickness of line for the mean. The higher this value is, the thicker the line for the mean values.

Figure 1. An embedded chart on a worksheet.

8. Excel Charts and Analysis ToolPak Charts, also known as graphs, have been an integral part of spreadsheets since the early days of Lotus 1-2-3. Charting features have improved significantly over the

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

Gamma Distribution Fitting

Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics

Directions for using SPSS

Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...

Scatter Plots with Error Bars

Chapter 165 Scatter Plots with Error Bars Introduction The procedure extends the capability of the basic scatter plot by allowing you to plot the variability in Y and X corresponding to each point. Each

PERFORMING REGRESSION ANALYSIS USING MICROSOFT EXCEL

PERFORMING REGRESSION ANALYSIS USING MICROSOFT EXCEL John O. Mason, Ph.D., CPA Professor of Accountancy Culverhouse School of Accountancy The University of Alabama Abstract: This paper introduces you to

2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

36 Time series analysis of Hawaiian waterbirds

Reed, J. M., C. S. Elphick, A. F. Zuur, E. N. Ieno, and G. M. Smith. 2007. Time series analysis of Hawaiian waterbirds. Pages 615-632 in, Analysing Ecological Data. A. F. Zuur, E. N. Ieno, and G. M. Smith.

5. Multiple regression

5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful

Nonlinear Iterative Partial Least Squares Method

Numerical Methods for Determining Principal Component Analysis Abstract Factors Béchu, S., Richard-Plouet, M., Fernandez, V., Walton, J., and Fairley, N. (2016) Developments in numerical treatments for

EXCEL Tutorial: How to use EXCEL for Graphs and Calculations.

EXCEL Tutorial: How to use EXCEL for Graphs and Calculations. Excel is powerful tool and can make your life easier if you are proficient in using it. You will need to use Excel to complete most of your

Dealing with Data in Excel 2010

Dealing with Data in Excel 2010 Excel provides the ability to do computations and graphing of data. Here we provide the basics and some advanced capabilities available in Excel that are useful for dealing

Forecasting in STATA: Tools and Tricks

Forecasting in STATA: Tools and Tricks Introduction This manual is intended to be a reference guide for time series forecasting in STATA. It will be updated periodically during the semester, and will be

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

Spreadsheet software for linear regression analysis

Spreadsheet software for linear regression analysis Robert Nau Fuqua School of Business, Duke University Copies of these slides together with individual Excel files that demonstrate each program are available

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable

Factor Analysis. Chapter 420. Introduction

Chapter 420 Introduction (FA) is an exploratory technique applied to a set of observed variables that seeks to find underlying factors (subsets of variables) from which the observed variables were generated.

There are six different windows that can be opened when using SPSS. The following will give a description of each of them.

SPSS Basics Tutorial 1: SPSS Windows There are six different windows that can be opened when using SPSS. The following will give a description of each of them. The Data Editor The Data Editor is a spreadsheet

Introduction to Quadratic Functions The St. Louis Gateway Arch was constructed from 1963 to 1965. It cost 13 million dollars to build..1 Up and Down or Down and Up Exploring Quadratic Functions...617.2

Software Review: ITSM 2000 Professional Version 6.0.

Lee, J. & Strazicich, M.C. (2002). Software Review: ITSM 2000 Professional Version 6.0. International Journal of Forecasting, 18(3): 455-459 (June 2002). Published by Elsevier (ISSN: 0169-2070). http://0-

Tutorial 2: Using Excel in Data Analysis

Tutorial 2: Using Excel in Data Analysis This tutorial guide addresses several issues particularly relevant in the context of the level 1 Physics lab sessions at Durham: organising your work sheet neatly,

http://school-maths.com Gerrit Stols

For more info and downloads go to: http://school-maths.com Gerrit Stols Acknowledgements GeoGebra is dynamic mathematics open source (free) software for learning and teaching mathematics in schools. It

State Space Time Series Analysis

State Space Time Series Analysis p. 1 State Space Time Series Analysis Siem Jan Koopman http://staff.feweb.vu.nl/koopman Department of Econometrics VU University Amsterdam Tinbergen Institute 2011 State

Introduction to time series analysis

Introduction to time series analysis Margherita Gerolimetto November 3, 2010 1 What is a time series? A time series is a collection of observations ordered following a parameter that for us is time. Examples

Interrupted time series (ITS) analyses

Interrupted time series (ITS) analyses Table of Contents Introduction... 2 Retrieving data from printed ITS graphs... 3 Organising data... 3 Analysing data (using SPSS/PASW Statistics)... 6 Interpreting

15.062 Data Mining: Algorithms and Applications Matrix Math Review

.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

Regression Clustering

Chapter 449 Introduction This algorithm provides for clustering in the multiple regression setting in which you have a dependent variable Y and one or more independent variables, the X s. The algorithm

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

Microsoft Excel Tutorial

Microsoft Excel Tutorial Microsoft Excel spreadsheets are a powerful and easy to use tool to record, plot and analyze experimental data. Excel is commonly used by engineers to tackle sophisticated computations

Assignment objectives:

Assignment objectives: Regression Pivot table Exercise #1- Simple Linear Regression Often the relationship between two variables, Y and X, can be adequately represented by a simple linear equation of the

Module 3: Correlation and Covariance

Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

Multivariate Logistic Regression

1 Multivariate Logistic Regression As in univariate logistic regression, let π(x) represent the probability of an event that depends on p covariates or independent variables. Then, using an inv.logit formulation

2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

TIME SERIES ANALYSIS

TIME SERIES ANALYSIS L.M. BHAR AND V.K.SHARMA Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-0 02 lmb@iasri.res.in. Introduction Time series (TS) data refers to observations

GeoGebra Statistics and Probability

GeoGebra Statistics and Probability Project Maths Development Team 2013 www.projectmaths.ie Page 1 of 24 Index Activity Topic Page 1 Introduction GeoGebra Statistics 3 2 To calculate the Sum, Mean, Count,

Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

Instructions for SPSS 21

1 Instructions for SPSS 21 1 Introduction... 2 1.1 Opening the SPSS program... 2 1.2 General... 2 2 Data inputting and processing... 2 2.1 Manual input and data processing... 2 2.2 Saving data... 3 2.3

Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

Statgraphics Centurion XVII (Version 17)

Statgraphics Centurion XVII (currently in beta test) is a major upgrade to Statpoint's flagship data analysis and visualization product. It contains 32 new statistical procedures and significant upgrades

Demand Forecasting When a product is produced for a market, the demand occurs in the future. The production planning cannot be accomplished unless

Demand Forecasting When a product is produced for a market, the demand occurs in the future. The production planning cannot be accomplished unless the volume of the demand known. The success of the business

Fitting Subject-specific Curves to Grouped Longitudinal Data

Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA

Examples: Multilevel Modeling With Complex Survey Data CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Complex survey data refers to data obtained by stratification, cluster sampling and/or

Calibration and Linear Regression Analysis: A Self-Guided Tutorial

Calibration and Linear Regression Analysis: A Self-Guided Tutorial Part 1 Instrumental Analysis with Excel: The Basics CHM314 Instrumental Analysis Department of Chemistry, University of Toronto Dr. D.

Manual Analysis Software AFD 1201

AFD 1200 - AcoustiTube Manual Analysis Software AFD 1201 Measurement of Transmission loss acc. to Song and Bolton 1 Table of Contents Introduction - Analysis Software AFD 1201... 3 AFD 1200 - AcoustiTube

CHAPTER 9 ADD-INS: ENHANCING EXCEL This chapter discusses the following topics: WHAT CAN AN ADD-IN DO? WHY USE AN ADD-IN (AND NOT JUST EXCEL MACROS/PROGRAMS)? ADD INS INSTALLED WITH EXCEL OTHER ADD-INS

Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round \$200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

Partial Least Squares (PLS) Regression.

Partial Least Squares (PLS) Regression. Hervé Abdi 1 The University of Texas at Dallas Introduction Pls regression is a recent technique that generalizes and combines features from principal component

An analysis method for a quantitative outcome and two categorical explanatory variables.

Chapter 11 Two-Way ANOVA An analysis method for a quantitative outcome and two categorical explanatory variables. If an experiment has a quantitative outcome and two categorical explanatory variables that

Excel Guide for Finite Mathematics and Applied Calculus

Excel Guide for Finite Mathematics and Applied Calculus Revathi Narasimhan Kean University A technology guide to accompany Mathematical Applications, 6 th Edition Applied Calculus, 2 nd Edition Calculus:

How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

Scientific Graphing in Excel 2010

Scientific Graphing in Excel 2010 When you start Excel, you will see the screen below. Various parts of the display are labelled in red, with arrows, to define the terms used in the remainder of this overview.

Analysis of Bayesian Dynamic Linear Models

Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main

Appendix 2.1 Tabular and Graphical Methods Using Excel

Appendix 2.1 Tabular and Graphical Methods Using Excel 1 Appendix 2.1 Tabular and Graphical Methods Using Excel The instructions in this section begin by describing the entry of data into an Excel spreadsheet.

Trend and Seasonal Components

Chapter 2 Trend and Seasonal Components If the plot of a TS reveals an increase of the seasonal and noise fluctuations with the level of the process then some transformation may be necessary before doing

Multivariate Analysis of Variance (MANOVA)

Chapter 415 Multivariate Analysis of Variance (MANOVA) Introduction Multivariate analysis of variance (MANOVA) is an extension of common analysis of variance (ANOVA). In ANOVA, differences among various

Introducing the Multilevel Model for Change

Department of Psychology and Human Development Vanderbilt University GCM, 2010 1 Multilevel Modeling - A Brief Introduction 2 3 4 5 Introduction In this lecture, we introduce the multilevel model for change.

Module 6: Introduction to Time Series Forecasting

Using Statistical Data to Make Decisions Module 6: Introduction to Time Series Forecasting Titus Awokuse and Tom Ilvento, University of Delaware, College of Agriculture and Natural Resources, Food and

GelAnalyzer 2010 User s manual. Contents

GelAnalyzer 2010 User s manual Contents 1. Starting GelAnalyzer... 2 2. The main window... 2 3. Create a new analysis... 2 4. The image window... 3 5. Lanes... 3 5.1 Detect lanes automatically... 3 5.2

Microsoft Excel. Qi Wei

Microsoft Excel Qi Wei Excel (Microsoft Office Excel) is a spreadsheet application written and distributed by Microsoft for Microsoft Windows and Mac OS X. It features calculation, graphing tools, pivot

Zahner 08/2013. Monitor measurements in a separate window.

Zahner 08/2013 Monitor measurements in a separate window. Online Display - 2-1. Introduction...3 2. Installation and Removal...3 2.1 Installation Procedure... 3 2.2 Removal... 4 3. Usage of the Online

Review of Fundamental Mathematics

Review of Fundamental Mathematics As explained in the Preface and in Chapter 1 of your textbook, managerial economics applies microeconomic theory to business decision making. The decision-making tools

Chapter 6: The Information Function 129. CHAPTER 7 Test Calibration

Chapter 6: The Information Function 129 CHAPTER 7 Test Calibration 130 Chapter 7: Test Calibration CHAPTER 7 Test Calibration For didactic purposes, all of the preceding chapters have assumed that the

Data exploration with Microsoft Excel: univariate analysis

Data exploration with Microsoft Excel: univariate analysis Contents 1 Introduction... 1 2 Exploring a variable s frequency distribution... 2 3 Calculating measures of central tendency... 16 4 Calculating

Package MDM. February 19, 2015

Type Package Title Multinomial Diversity Model Version 1.3 Date 2013-06-28 Package MDM February 19, 2015 Author Glenn De'ath ; Code for mdm was adapted from multinom in the nnet package

Engineering Problem Solving and Excel. EGN 1006 Introduction to Engineering

Engineering Problem Solving and Excel EGN 1006 Introduction to Engineering Mathematical Solution Procedures Commonly Used in Engineering Analysis Data Analysis Techniques (Statistics) Curve Fitting techniques

TIBCO Spotfire Network Analytics 1.1. User s Manual

TIBCO Spotfire Network Analytics 1.1 User s Manual Revision date: 26 January 2009 Important Information SOME TIBCO SOFTWARE EMBEDS OR BUNDLES OTHER TIBCO SOFTWARE. USE OF SUCH EMBEDDED OR BUNDLED TIBCO

2. What is the general linear model to be used to model linear trend? (Write out the model) = + + + or

Simple and Multiple Regression Analysis Example: Explore the relationships among Month, Adv.\$ and Sales \$: 1. Prepare a scatter plot of these data. The scatter plots for Adv.\$ versus Sales, and Month versus

Promotional Forecast Demonstration

Exhibit 2: Promotional Forecast Demonstration Consider the problem of forecasting for a proposed promotion that will start in December 1997 and continues beyond the forecast horizon. Assume that the promotion

PGR Computing Programming Skills

PGR Computing Programming Skills Dr. I. Hawke 2008 1 Introduction The purpose of computing is to do something faster, more efficiently and more reliably than you could as a human do it. One obvious point

Penalized regression: Introduction

Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

Using Microsoft Excel to Plot and Analyze Kinetic Data

Entering and Formatting Data Using Microsoft Excel to Plot and Analyze Kinetic Data Open Excel. Set up the spreadsheet page (Sheet 1) so that anyone who reads it will understand the page (Figure 1). Type

Time Series Analysis. 1) smoothing/trend assessment

Time Series Analysis This (not surprisingly) concerns the analysis of data collected over time... weekly values, monthly values, quarterly values, yearly values, etc. Usually the intent is to discern whether

Chapter 4: Vector Autoregressive Models

Chapter 4: Vector Autoregressive Models 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie IV.1 Vector Autoregressive Models (VAR)...

Introduction to Matrix Algebra

Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary

Formulas, Functions and Charts

Formulas, Functions and Charts :: 167 8 Formulas, Functions and Charts 8.1 INTRODUCTION In this leson you can enter formula and functions and perform mathematical calcualtions. You will also be able to

Unit 7 Quadratic Relations of the Form y = ax 2 + bx + c

Unit 7 Quadratic Relations of the Form y = ax 2 + bx + c Lesson Outline BIG PICTURE Students will: manipulate algebraic expressions, as needed to understand quadratic relations; identify characteristics

1 Example of Time Series Analysis by SSA 1

1 Example of Time Series Analysis by SSA 1 Let us illustrate the 'Caterpillar'-SSA technique [1] by the example of time series analysis. Consider the time series FORT (monthly volumes of fortied wine sales

Univariate and Multivariate Methods PEARSON. Addison Wesley

Time Series Analysis Univariate and Multivariate Methods SECOND EDITION William W. S. Wei Department of Statistics The Fox School of Business and Management Temple University PEARSON Addison Wesley Boston

Exploratory Data Analysis with MATLAB

Computer Science and Data Analysis Series Exploratory Data Analysis with MATLAB Second Edition Wendy L Martinez Angel R. Martinez Jeffrey L. Solka ( r ec) CRC Press VV J Taylor & Francis Group Boca Raton

A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study

A Review of Cross Sectional Regression for Financial Data You should already know this material from previous study But I will offer a review, with a focus on issues which arise in finance 1 TYPES OF FINANCIAL

TIME SERIES ANALYSIS

TIME SERIES ANALYSIS Ramasubramanian V. I.A.S.R.I., Library Avenue, New Delhi- 110 012 ram_stat@yahoo.co.in 1. Introduction A Time Series (TS) is a sequence of observations ordered in time. Mostly these

Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree of PhD of Engineering in Informatics

INTERNATIONAL BLACK SEA UNIVERSITY COMPUTER TECHNOLOGIES AND ENGINEERING FACULTY ELABORATION OF AN ALGORITHM OF DETECTING TESTS DIMENSIONALITY Mehtap Ergüven Abstract of Ph.D. Dissertation for the degree

The North Carolina Health Data Explorer

1 The North Carolina Health Data Explorer The Health Data Explorer provides access to health data for North Carolina counties in an interactive, user-friendly atlas of maps, tables, and charts. It allows

CS229 Lecture notes. Andrew Ng

CS229 Lecture notes Andrew Ng Part X Factor analysis Whenwehavedatax (i) R n thatcomesfromamixtureofseveral Gaussians, the EM algorithm can be applied to fit a mixture model. In this setting, we usually

Multivariate Normal Distribution

Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues