Seasonal adjustment methods for the analysis of respiratory disease in environmental epidemiology
Bircan Erbas (1) and Rob J Hyndman (2)

9 August 2000

Abstract: We study the relationship between daily hospital admissions for respiratory disease and various pollutant and climatic variables, looking particularly at the effect of seasonal adjustment on the estimated models. Time series often exhibit seasonal behaviour, and adequate control for the seasonal component is essential before one attempts to model the complex pollution-health association. We show that if these factors are not adequately controlled for, spurious effects of pollutants and climate on morbidity/mortality can be induced. We present a method of seasonal adjustment called STL (Seasonal-Trend decomposition based on Loess smoothing), and apply it to pollution and climate data. We then use the seasonally adjusted series in Generalized Linear Model and Generalized Additive Model analyses of the effects of pollution and climate on hospital admissions for Chronic Obstructive Pulmonary Disease in Melbourne, Australia for the period [...].

(1) Department of General Practice & Public Health, The University of Melbourne, VIC 3010, Australia.
(2) Department of Econometrics and Business Statistics, Monash University, VIC 3800, Australia.
Introduction

The presence of a long-wavelength pattern in hospital admissions and mortality for respiratory disease has been a common methodological issue in many studies [1,2,3]. These long-wavelength patterns are commonly known as seasonality in time series. A seasonal pattern exists when a series is influenced by a seasonal factor (e.g., day of week, or month of year) [4]. Seasonal patterns in the response (hospital admissions and mortality for respiratory disease) have commonly been estimated using Fourier series terms [5,6,7]. Although this method captures the underlying seasonal pattern in the response adequately, it does not accommodate a possible seasonal pattern in each of the explanatory variables. When there is seasonality in the explanatory variables, there will inevitably be collinearity, which may lead to spurious conclusions concerning the effect of an individual pollutant and makes it difficult to separate the effects of pollutants. Despite these serious difficulties, the seasonal pattern in the pollutant and climatic data has been largely ignored.

Our approach is to first seasonally adjust all explanatory variables. This greatly reduces the collinearity problem without affecting the interpretability of the model. It might seem appropriate and consistent to also seasonally adjust the response variable; however, this is not advisable since it is a count variable and we wish to use a Poisson model. Instead, we include Fourier series terms to control for the seasonal pattern in hospital admissions for COPD. To seasonally adjust the pollutant and climatic series we use a time series seasonal adjustment method called STL (Seasonal-Trend decomposition procedure based on Loess smoothing) [8].
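The collinearity argument above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the two series, their amplitudes and the names `pearson`, `remove_periodic_seasonal` and `simulate` are hypothetical, and the seasonal component is removed by subtracting day-of-year means, a crude stand-in for STL. The two simulated series share nothing except an annual cycle, yet they are strongly correlated until that cycle is removed.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def remove_periodic_seasonal(series, period):
    """Subtract the mean of each position in the cycle.

    This is a crude periodic seasonal estimate used for illustration,
    not the STL procedure.
    """
    sums = [0.0] * period
    counts = [0] * period
    for t, v in enumerate(series):
        sums[t % period] += v
        counts[t % period] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [v - means[t % period] for t, v in enumerate(series)]


def simulate(n_years=3, period=365, seed=0):
    """Two series with independent noise and a common annual cycle."""
    rng = random.Random(seed)
    n = n_years * period
    cycle = [math.cos(2 * math.pi * t / period) for t in range(n)]
    pollutant = [10 * c + rng.gauss(0, 1) for c in cycle]
    temperature = [5 * c + rng.gauss(0, 1) for c in cycle]
    return pollutant, temperature


pollutant, temperature = simulate()
raw_corr = pearson(pollutant, temperature)
adj_corr = pearson(remove_periodic_seasonal(pollutant, 365),
                   remove_periodic_seasonal(temperature, 365))
print(f"correlation before adjustment: {raw_corr:.2f}")
print(f"correlation after adjustment:  {adj_corr:.2f}")
```

In this toy setting the correlation before adjustment is driven entirely by the shared cycle, which is the mechanism by which seasonality can induce spurious associations between covariates.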
Daily counts of hospital admissions for ICD 496 (chronic obstructive airways disease) were obtained from the Public Health & Development Division of the Department of Human Services, Victoria, Australia. Air pollution data were obtained from the Environment Protection Authority (EPA), which maintains a network of 12 monitoring stations around
Melbourne. Daily maximum hourly levels of Nitrogen Dioxide (NO2), Ozone (O3), Sulfur Dioxide (SO2) and the Air Particles Index (API) were obtained. Daily humidity (hu) and dry bulb temperature (db) measurements were obtained from the Commonwealth Bureau of Meteorology, which has four major stations in the Melbourne metropolitan area.

We will study the effect of the pollutant and climate variables on hospital admissions for Chronic Obstructive Pulmonary Disease (COPD) in Melbourne, Australia for the period [...]. We will apply the STL method to seasonally adjust the explanatory variables. We will utilize Generalized Linear Models (GLMs) [9] and Generalized Additive Models (GAMs) [10] to analyze the effects of pollutants and climate on hospital admissions for COPD. We will then compare these results with those we have obtained previously [11].

Methodology

Time series seasonal adjustment

A crucial preliminary step before modeling the short-term effects of pollution and climate on daily hospital admissions/mortality is the examination of the underlying behaviour of each of the potential covariates. Since pollutant and climatic variables are time series, decomposition methods may be used to break up a time series into the following components: trend, seasonal and irregular. The trend component consists of the underlying long-term aperiodic rises and/or falls in the level of the series over time. The seasonal component is a pattern that is recurrent over time. The irregular component is the remaining pattern in the series not attributed to trend or seasonality [4]. Both trend and seasonality are potential confounding variables in any analysis, so their identification and removal are important. Seasonality is an established strong confounding variable in the analysis of daily hospital admissions/mortality data. Time series decomposition methods allow us to identify
the strength of the seasonal component in each of the pollutant and climatic variables. After identification, the seasonal component of the series is removed and the resulting seasonally adjusted series is used in subsequent analysis. Extracting the seasonal component thus allows a clearer picture of the other features of the time series.

A number of time series decomposition methods are available. A relatively simple method is classical decomposition [4], but it has several drawbacks, including bias problems near the ends of the series and an inability to allow a smoothly varying seasonal component. To overcome these difficulties, we prefer the STL (Seasonal-Trend decomposition procedure based on Loess) method. We assume an additive decomposition:

$Y_t = T_t + S_t + E_t$

where $Y_t$ denotes the time series of interest, $T_t$ denotes the trend component, $S_t$ denotes the seasonal component and $E_t$ denotes the remainder (or irregular) component. The seasonally adjusted series, $Y_t^*$, is computed simply by subtracting the estimated seasonal component from the original series: $Y_t^* = Y_t - \hat{S}_t$.

STL consists of a sequence of applications of the Loess smoother [12] to give a decomposition that is highly resistant to extreme observations [4]. The STL method involves an iterative algorithm to progressively refine and improve estimates of the trend and seasonal components. STL consists of two recursive procedures, one nested within the other, called the inner loop and the outer loop. In each iteration of the inner loop, the seasonal and trend-cycle components are updated once. An iteration of the outer loop consists of one or two iterations of the inner loop followed by an identification of extreme values. Future iterations of the inner loop downweight the extreme values that were identified in the previous iteration of the outer loop. Between 10 and 20 iterations of the outer loop are usually carried out in total.
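The downweighting of extreme values can be sketched as follows. The text here does not state which weight function is used, so this sketch assumes the bisquare robustness weights of the original STL paper [8], with scale equal to six times the median absolute remainder; the function names `bisquare` and `robustness_weights` are hypothetical.

```python
from statistics import median


def bisquare(u):
    """Bisquare weight: (1 - u^2)^2 for 0 <= u < 1, and 0 otherwise."""
    return (1 - u * u) ** 2 if u < 1 else 0.0


def robustness_weights(remainder):
    """Weights in [0, 1] that shrink towards 0 for large remainders.

    Assumption: the scale is h = 6 * median(|E_t|), as in the
    original STL paper; this text does not specify it.
    """
    h = 6 * median(abs(r) for r in remainder)
    if h == 0:
        # A perfect fit for at least half of the points: nothing to downweight.
        return [1.0] * len(remainder)
    return [bisquare(abs(r) / h) for r in remainder]


# One extreme remainder (5.0) among otherwise small values.
example = [0.1, -0.2, 0.1, 0.0, 5.0, -0.1, 0.2]
weights = robustness_weights(example)
print([round(w, 3) for w in weights])
```

Observations with a weight of zero have no influence on the Loess fits in later inner-loop iterations, which is what makes the decomposition resistant to outliers.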
We describe the steps for a single iteration of the inner loop, assuming a series $Y_t$ which consists of daily observations (so the seasonal period is 365). The iteration consists of updating the estimate of the trend component and calculating a new estimate of the seasonal component. The whole procedure must start with some initial estimate of the trend. This is set to zero; that is, the procedure begins by assuming no trend at all. This poor estimate is quickly updated to something more reasonable after one iteration.

Step 1. Subtract the estimated trend from the original data to obtain the detrended values $Y_t' = Y_t - \hat{T}_t$.

Step 2. For each day of the year, the detrended values are collected to construct a daily sub-series. Each of the 365 sub-series is separately smoothed by a Loess smoother. A preliminary seasonal component is constructed by connecting the smoothed sub-series back together. The Loess smoother is also used to extrapolate the seasonal component a few days before and after the observed data.

Step 3. A moving average is applied to the preliminary seasonal component estimated in Step 2. The result is in turn smoothed by a Loess smoother of length 365. Applying a weighted moving average results in a loss of values at both the beginning and the end of the series; this is overcome by the extrapolation of the seasonal component in Step 2. The purpose of this step is to identify any trend that may have contaminated the preliminary seasonal component in Step 2. If there is little trend in the preliminary seasonal component, the result of this step will be a series with all values close to zero.

Step 4. The new seasonal component is estimated as the difference between the preliminary seasonal component from Step 2 and the smoothed seasonal component from Step 3.

Step 5. The result from Step 4 is subtracted from the original series to give the seasonally adjusted series $Y_t^* = Y_t - \hat{S}_t$.

Step 6. To obtain a new estimate of the trend component $T_t$, we smooth (by Loess) the now seasonally adjusted series $Y_t^*$.
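The six steps can be followed in a deliberately simplified sketch. It is not the STL implementation used in this paper: every Loess smoother is replaced by a centred moving average, windows are truncated at the ends of the series instead of extrapolating, and no robustness weights are applied. The function names, the window lengths and the weekly toy series are all assumptions made for illustration, and the period is assumed to be odd (as 7 and 365 are).

```python
def moving_average(x, window):
    """Centred moving average with the window truncated at both ends,
    so the output has the same length as the input (odd window assumed)."""
    half = window // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def inner_loop(y, period, n_iter=2, seasonal_window=3, trend_window=9):
    """Simplified inner loop; moving averages stand in for Loess."""
    n = len(y)
    trend = [0.0] * n  # the procedure starts from a zero trend
    for _ in range(n_iter):
        # Step 1: detrend.
        detrended = [yt - tt for yt, tt in zip(y, trend)]
        # Step 2: smooth each cycle sub-series, then reassemble.
        prelim = [0.0] * n
        for j in range(period):
            prelim[j::period] = moving_average(detrended[j::period],
                                               seasonal_window)
        # Step 3: low-pass filter to pick up trend left in the seasonal.
        low_pass = moving_average(moving_average(prelim, period), 3)
        # Step 4: new seasonal component.
        seasonal = [c - l for c, l in zip(prelim, low_pass)]
        # Step 5: seasonally adjusted series.
        adjusted = [yt - st for yt, st in zip(y, seasonal)]
        # Step 6: new trend estimate.
        trend = moving_average(adjusted, trend_window)
    return trend, seasonal, adjusted


# Toy series: linear trend plus a weekly pattern that sums to zero.
season = [3.0, 1.0, -2.0, 0.5, -1.5, -2.0, 1.0]
y = [0.01 * t + season[t % 7] for t in range(84)]
trend, seasonal, adjusted = inner_loop(y, period=7)
print([round(s, 2) for s in seasonal[35:42]])
```

Away from the ends of this toy series the estimated seasonal component matches the weekly pattern that was built in; near the ends the truncated windows introduce bias, which is the problem the Loess extrapolation in Step 2 is meant to avoid.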
The outer loop begins with one or two iterations of the inner loop. The resulting estimates of the trend and seasonal components are then used to calculate the irregular component:

$\hat{E}_t = Y_t - \hat{T}_t - \hat{S}_t$.

Large values of $\hat{E}_t$ indicate an extreme observation. These are identified and a weight is calculated for each observation. That concludes the outer loop. To downweight the effects of the extreme observations, future iterations of the inner loop use these weights in Steps 2 to 6. Also, future iterations of the inner loop begin with the trend component from the previous iteration rather than starting with zero as in the very first iteration of the inner loop.

There are two Loess smoothing parameters that must be selected when using the STL procedure: the seasonal smoothing parameter used in Step 2, and a smoothing parameter for the trend component calculated in Step 6 of the inner loop. These smoothing parameters determine the variation from year to year in the seasonality and the trend. Small smoothing parameters allow substantial variation from year to year, and large smoothing parameters allow very little variation from year to year. The procedure is implemented using S-Plus 2000 for Windows.

Generalized linear and additive models

Generalized Linear Models (GLMs) with quasi-likelihood estimation [9], used to model the overdispersion frequently encountered in hospital admissions data [3], were applied to COPD hospital admissions in Melbourne, Australia for the period [...]. Potential explanatory variables are the seasonally adjusted nitrogen dioxide, ozone, humidity, dry bulb temperature and air particles index, and the unadjusted sulfur dioxide. We also included day of week dummy variables, and Fourier series terms (i.e., $\cos(2\pi kt/365)$ and $\sin(2\pi kt/365)$ for $k = 1, 2, 3, 4$) to control for the seasonal effect in COPD hospital admissions. Lags of up to 2 days were included in the analysis for each
pollutant and climatic variable. Covariates were selected using an efficient step-wise selection process in S-Plus, with Akaike's Information Criterion (AIC) [13] used to evaluate different models. The model with the smallest AIC was chosen as the final model.

A non-parametric alternative to the GLM is the Generalized Additive Model (GAM) [10]. These models allow each of the explanatory variables to enter the model in a non-linear manner. As for the GLM, we used step-wise selection in S-Plus to choose the covariates, selecting the model with the smallest AIC statistic. The same explanatory variables were used as for the GLM, except that we allowed each of them to enter the model non-linearly. The non-linear functions were estimated using cubic smoothing splines with four degrees of freedom. The AIC was used to determine whether a variable should be included in the model using a spline or as a linear function.

Results

Seasonal adjustment

Figure 1 displays a time series plot of the response series (COPD) and each of the pollutant and climatic series. There is evidence of seasonality in all but SO2. For each series, we determined the length of the seasonal and trend windows for the Loess smoothing by trial and error, selecting values which appear reasonable for the data. We also assume a periodic component for seasonality; that is, there is the same cycle for each year of the series. We use decomposition plots [4,14] to help visualize the decomposition procedure. A decomposition plot displays a time series plot of the original data in the top panel, and the remaining panels show the trend, the seasonal pattern and the remaining variation which is not accounted for by the trend and seasonality.
Figure 1: A time plot of each series in the data set, for the time period 1 July 1989 to 31 December 1992.
Figure 2: A decomposition plot of COPD, for the time period 1 July 1989 to 31 December 1992.

In Figure 2 we display the decomposition plot for COPD as an example. The length of the bars on the side of each panel is an indication of the strength of the individual components. Each bar represents the same length, but is plotted on a different scale. Clearly there is very little trend in the series (indicated by the long bar in the trend plot) but substantial seasonality present (indicated by the shorter bar in the seasonal plot).

A scatter plot matrix is an exploratory graphical method introduced by Chambers et al. (1983) [15] to investigate more than two series in a multi-dimensional space. In Erbas & Hyndman (2000) [11] we used a scatter plot matrix to show that several supposed non-linear relationships between hospital admissions for COPD and the pollutant and climatic covariates were, in fact, induced by seasonality.
Figure 3: Pairwise scatter plots for hospital admissions for COPD, pollutants and climate. All variables seasonally adjusted except SO2.

Figure 3 displays a scatter plot matrix of the data after all covariates (including COPD) have been seasonally adjusted. It is difficult to see the non-linear relationships between COPD hospital admissions and climate reported in previous studies [16,17,18]. However, we shall see that there is some non-linearity between COPD and dry bulb temperature, and between COPD and SO2, with both covariates lagged by two days.
Table 1: Regression coefficients, corresponding standard errors and p values obtained by a GLM analysis.
Columns: Parameter; Estimate; Standard Error; p-value
Rows: Intercept; $NO_{2,t}$; $hu_t$; $t$; $t^2$; $D_1$ to $D_6$; $\sin(2\pi t/365)$; $\cos(2\pi t/365)$; $\sin(4\pi t/365)$; $\cos(4\pi t/365)$; $\sin(6\pi t/365)$; $\cos(6\pi t/365)$; $\sin(8\pi t/365)$; $\cos(8\pi t/365)$

Generalized linear models

Table 1 displays the estimation results of a GLM analysis with hospital admissions for COPD as the response. Day of week dummies, a quadratic time trend and all other explanatory variables were included linearly in the model. A GLM analysis for COPD hospital admissions in Melbourne, Australia for the period [...] was previously reported in Erbas & Hyndman (2000) [11]. However, those GLMs did not include a seasonal adjustment of the explanatory variables that exhibited an underlying seasonal pattern. The model reported there was similar to the present one, except that it included API lagged at 1 and 2 days. All other variables are the same. Thus, the effect of API appears to be spurious and induced by seasonality.
Table 2: Regression coefficients of linear terms, corresponding standard errors and p values obtained by a GAM analysis.
Columns: Parameter; Estimate; Standard Error; p-value
Rows: Intercept; $NO_{2,t}$; $ozone_{t-2}$; $api_{t-2}$; $hu_t$; $hu_{t-2}$; $D_1$ to $D_6$; $\sin(2\pi t/365)$; $\cos(2\pi t/365)$; $\sin(4\pi t/365)$; $\cos(4\pi t/365)$; $\sin(6\pi t/365)$; $\cos(6\pi t/365)$; $\sin(8\pi t/365)$; $\cos(8\pi t/365)$

Generalized additive models

We obtained the following GAM for the data:

$$E(Y_t \mid X_t) = \exp\Big\{ \beta_0 + \beta_1 NO_{2,t} + \beta_2\, ozone_{t-2} + g_3(SO_{2,t-2}) + \beta_4\, api_{t-2} + g_5(db_{t-2}) + \beta_6\, hu_t + \beta_7\, hu_{t-2} + g_8(t) + \beta_9 D_1 + \beta_{10} D_2 + \beta_{11} D_3 + \beta_{12} D_4 + \beta_{13} D_5 + \beta_{14} D_6 + \sum_{k=1}^{4} \big[\gamma_k \cos(2\pi kt/365) + \theta_k \sin(2\pi kt/365)\big] \Big\} \qquad (1)$$

where $Y_t \mid X_t$ is Pseudo-Poisson (i.e., Poisson with overdispersion). Note that SO2 lagged 2 days, dry bulb temperature lagged 2 days, and time were modelled using a non-linear (spline) function. All other variables are included linearly. Table 2 displays the coefficients for the linear terms in model (1).
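The deterministic and lagged regressors entering model (1) can be constructed as in the following sketch. The helper names, the alignment of $t = 0$ with the first weekday, and the choice of the seventh day as the reference level for the dummies are assumptions; the text does not specify how the dummies were coded.

```python
import math


def fourier_terms(t, n_harmonics=4, period=365):
    """Return cos(2*pi*k*t/period), sin(2*pi*k*t/period) for k = 1..n_harmonics,
    in the order [cos_1, sin_1, cos_2, sin_2, ...]."""
    row = []
    for k in range(1, n_harmonics + 1):
        angle = 2 * math.pi * k * t / period
        row += [math.cos(angle), math.sin(angle)]
    return row


def day_of_week_dummies(t, first_weekday=0):
    """Six 0/1 indicators D1..D6; the seventh day is the reference level
    (an assumed coding)."""
    d = (t + first_weekday) % 7
    return [1 if d == j else 0 for j in range(6)]


def lagged(series, lag):
    """Series shifted so that position t holds the value from `lag` days
    earlier; None where the lag reaches before the start of the data."""
    return [None] * lag + list(series[:len(series) - lag])


# Hypothetical seasonally adjusted ozone values for five days.
ozone = [31.0, 28.5, 40.2, 35.1, 29.9]
print(lagged(ozone, 2))          # ozone_{t-2}
print(day_of_week_dummies(3))    # indicator for the fourth day of the week
print(len(fourier_terms(10)))    # 8 columns: 4 cosine and 4 sine terms
```

Rows with a `None` lag value would have to be dropped before fitting, so a model with lags of up to 2 days loses the first two observations.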
This differs from the model reported in Erbas & Hyndman (2000) [11] in that humidity and dry bulb temperature (lagged 2 days) are now significant and included. All other variables are the same. Thus, it appears that the seasonality in humidity and dry bulb temperature was masking their importance in our previous analysis, and that the seasonal adjustment done here has led to their inclusion. Apart from the non-linear time trend, two covariates were selected to be non-linear: dry bulb temperature and sulfur dioxide (both lagged by 2 days). Figure 4 depicts these relationships.

Figure 4: Non-linear functions in the generalized additive model (1), fitted using cubic smoothing splines. (a) $g_8(t)$, the smooth underlying time trend; (b) $g_3(SO_{2,t-2})$, the non-linear function of sulfur dioxide (lagged 2 days); (c) $g_5(db_{t-2})$, the non-linear function of dry bulb temperature (lagged 2 days). Dashed lines represent pointwise 95% confidence intervals.

In the analysis of daily morbidity/mortality, serial correlation is also an important methodological issue [2,3]. Autocorrelation plots of the residuals allow a visual examination of any remaining correlation structure. We use randomized quantile residuals [19], which were developed for non-normal data. An autocorrelation plot of the randomized quantile residuals from the GAM analysis in (1) is displayed in Figure 5. There is clearly very little remaining significant correlation in the residuals after both seasonally adjusting the explanatory variables and applying GAM methodology.
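The residual check described above can be sketched in two parts: computing a randomized quantile residual for a count, and computing the sample autocorrelation function. The sketch assumes a plain Poisson distribution for the counts, whereas the model in this paper is overdispersed, so it illustrates the idea of [19] and not the exact residuals used here. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def poisson_cdf(k, mu):
    """P(Y <= k) for Y ~ Poisson(mu); zero when k is negative."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return min(total, 1.0)


def randomized_quantile_residual(y, mu, rng):
    """Draw u uniformly between F(y - 1) and F(y), then map it to a
    standard normal quantile."""
    lo, hi = poisson_cdf(y - 1, mu), poisson_cdf(y, mu)
    u = rng.uniform(lo, hi)
    u = min(max(u, 1e-12), 1 - 1e-12)  # keep u strictly inside (0, 1)
    return NormalDist().inv_cdf(u)


def acf(x, max_lag):
    """Sample autocorrelations at lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


rng = random.Random(1)
counts = [3, 5, 2, 4, 6, 3, 1, 4, 5, 2]   # hypothetical daily admissions
fitted = [3.5] * len(counts)              # hypothetical fitted means
residuals = [randomized_quantile_residual(y, mu, rng)
             for y, mu in zip(counts, fitted)]
bound = 1.96 / math.sqrt(len(residuals))  # approximate 95% limits for white noise
print([round(r, 2) for r in acf(residuals, 3)], round(bound, 2))
```

Autocorrelations that fall outside the approximate limits of plus or minus 1.96 divided by the square root of the series length are the ones usually flagged as significant in an autocorrelation plot.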
Figure 5: Autocorrelation plot of randomized quantile residuals from the fit of the GAM in equation (1). The equivalent plot for the GLM showed more significant autocorrelation at lags 1, 3 and 7.

The GLM randomized quantile residuals exhibited greater (although still small) autocorrelation, which was significant at lags 1, 3 and 7. Allowing non-linear functions as in the GAM can greatly reduce the autocorrelation [11] inherent in morbidity/mortality data and thereby greatly simplify the analysis.

Discussion

The analysis described here demonstrates that the effects of pollution and climate on daily counts of morbidity/mortality can be masked by seasonality. This arises because of the collinearity between the variables which is induced by seasonality. We argue that it is only possible to assess the effects of individual variables if this confounding is reduced via seasonal adjustment. We have presented a time series decomposition method, namely STL, that handles any length of seasonality and can handle time series with missing values, something other seasonal adjustment methods cannot handle easily. We have used decomposition plots to break down the variables into a seasonal and a trend component. The explanatory variables were then seasonally adjusted accordingly. We then used a GLM and a GAM analysis with the seasonally adjusted variables to study
the effects of each variable on COPD hospital admissions. In both the GLM and GAM analyses, a linear effect of nitrogen dioxide is statistically significant. We identified a significant non-parametric smooth effect of dry bulb temperature lagged 2 days in the GAM analysis. This was not identified in our previous analysis [11], where seasonal adjustment was not employed. We have shown that seasonality masked the true effects of temperature and humidity, since they became significant only after seasonal adjustment.

The issue of autocorrelation in the residuals was overcome in the GAM analysis, since we were able to have a combination of linear and smooth effects of the regressors. We find that allowing non-linear models can substantially reduce the autocorrelation problem in mortality/morbidity data.

We can view seasonal adjustment as a prelude to more sophisticated analysis, enabling a clearer understanding of the nature of the pollution-climate mixture, and allowing an examination of the unobscured relationships between the covariates and daily counts of morbidity/mortality.

References

[1] GOLDSTEIN, I.F. & CURRIE, B. (1984) Seasonal patterns of asthma: a clue to etiology, Environmental Research, 33.
[2] THURSTON, G.D. & KINNEY, P.L. (1995) Air pollution epidemiology: considerations in time series modeling, Inhalation Toxicology, 7.
[3] SCHWARTZ, J., SPIX, C., TOULOUMI, G., BACAROVA, L., BARUMAMDZADEH, T., TERTRE, A LE., PIEKARSKI, T., PONCE DE LEON, A., PONKA, A., ROSSI, G., SAEZ, M., & SCHOUTEN, J.P. (1996) Methodological issues in studies of air pollution and daily counts of deaths or hospital admissions, Journal of Epidemiology & Community Health, 50(suppl 1), s3-s11.
[4] MAKRIDAKIS, S., WHEELWRIGHT, S.C., & HYNDMAN, R.J. (1998) Forecasting:
methods & applications, 3rd ed., New York: Wiley & Sons.
[5] THURSTON, G.D., ITO, K., KINNEY, P.L. & LIPPMANN, M. (1992) A multi-year study of air pollution and respiratory hospital admissions in three New York state metropolitan areas: results for 1988 and 1989 summers, Journal of Exposure Analysis and Environmental Epidemiology, 2.
[6] SUNYER, J., CASTELLSAGUE, J., SAEZ, M., TOBIAS, A. & ANTO, J. (1996) Air pollution & mortality in Barcelona, Journal of Epidemiology & Community Health, 50(suppl), S76-S80.
[7] SIMPSON, R., WILLIAMS, G., PETROESCHEVSKY, A., MORGAN, G., & RUTHERFORD, S. (1997) Associations between outdoor air pollution and daily mortality in Brisbane, Australia, Archives of Environmental Health, 52.
[8] CLEVELAND, R.B., CLEVELAND, W.S., MCRAE, J.E., & TERPENNING, I. (1990) STL: a seasonal-trend decomposition procedure based on Loess (with discussion), Journal of Official Statistics, 6.
[9] MCCULLAGH, P. & NELDER, J.A. (1989) Generalized linear models, London: Chapman and Hall.
[10] HASTIE, T. & TIBSHIRANI, R.J. (1990) Generalized additive models, London: Chapman and Hall.
[11] ERBAS, B. & HYNDMAN, R.J. (2000) The effect of air pollution & climate on hospital admissions for chronic obstructive airways disease: a non-parametric alternative, Submitted.
[12] CLEVELAND, W.S. & DEVLIN, S. (1988) Locally weighted regression: an approach to regression analysis by local fitting, Journal of the American Statistical Association, 74.
[13] AKAIKE, H. (1973) Information theory & an extension of the maximum likelihood principle, 2nd International Symposium on Information Theory, B.N. Petrov & F. Csaki (eds), Akademiai Kiado, Budapest.
[14] ERBAS, B. & HYNDMAN, R.J. (2000) Data visualization for time series in environmental epidemiology, Submitted.
[15] CHAMBERS, J.M., CLEVELAND, W.S., KLEINER, B., & TUKEY, P.A. (1983) Graphical methods for data analysis, New York: Chapman & Hall.
[16] KINNEY, P.L. & OZKAYNAK, H. (1991) Associations of daily mortality & air pollution in Los Angeles County, Environmental Research, 54.
[17] SCHWARTZ, J. (1995) Short term fluctuations in air pollution and hospital admissions of the elderly for respiratory disease, Thorax, 50.
[18] MORGAN, G., CORBETT, S., & WLODARCZYK, J. (1998) Air pollution and hospital admissions in Sydney, Australia, American Journal of Public Health, 88.
[19] DUNN, P. & SMYTH, G. (1996) Randomized quantile residuals, Journal of Computational and Graphical Statistics, 5.
[20] BATES, D.V., BAKER-ANDERSON, M. & SITZO, R. (1990) Asthma attack periodicity: a study of hospital emergency visits in Vancouver, Environmental Research, 51.
[21] SAEZ, M., SUNYER, J., CASTELLSAGUE, J., MURILLO, C. & ANTO, J.M. (1995) Relationship between weather temperature & mortality: a time series analysis approach in Barcelona, International Journal of Epidemiology, 24.
[22] BALLESTER, F., CORELLA, D., PEREZ-HOYOS, S., SAEZ, M. & HERVAS, A. (1997) Mortality as a function of temperature. A study in Valencia, Spain, International Journal of Epidemiology, 26.
Introduction to time series analysis
Introduction to time series analysis Margherita Gerolimetto November 3, 2010 1 What is a time series? A time series is a collection of observations ordered following a parameter that for us is time. Examples
Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression
Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate
9th Russian Summer School in Information Retrieval Big Data Analytics with R
9th Russian Summer School in Information Retrieval Big Data Analytics with R Introduction to Time Series with R A. Karakitsiou A. Migdalas Industrial Logistics, ETS Institute Luleå University of Technology
National Environment Protection (Ambient Air Quality) Measure. Appendix 6
SOCO National Environment Protection (Ambient Air Quality) Measure Report of the Risk Assessment Taskforce 2 Appendix 6 Pb NO 2 Possible use of Health Risk Assessment in the Review of NEPM Pollutants Specified
New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Introduction
Introduction New Work Item for ISO 3534-5 Predictive Analytics (Initial Notes and Thoughts) Predictive analytics encompasses the body of statistical knowledge supporting the analysis of massive data sets.
Chapter 27 Using Predictor Variables. Chapter Table of Contents
Chapter 27 Using Predictor Variables Chapter Table of Contents LINEAR TREND...1329 TIME TREND CURVES...1330 REGRESSORS...1332 ADJUSTMENTS...1334 DYNAMIC REGRESSOR...1335 INTERVENTIONS...1339 TheInterventionSpecificationWindow...1339
PROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION
PROPERTIES OF THE SAMPLE CORRELATION OF THE BIVARIATE LOGNORMAL DISTRIBUTION Chin-Diew Lai, Department of Statistics, Massey University, New Zealand John C W Rayner, School of Mathematics and Applied Statistics,
Smoothing. Fitting without a parametrization
Smoothing or Fitting without a parametrization Volker Blobel University of Hamburg March 2005 1. Why smoothing 2. Running median smoother 3. Orthogonal polynomials 4. Transformations 5. Spline functions
Automated Biosurveillance Data from England and Wales, 1991 2011
Article DOI: http://dx.doi.org/10.3201/eid1901.120493 Automated Biosurveillance Data from England and Wales, 1991 2011 Technical Appendix This online appendix provides technical details of statistical
16 : Demand Forecasting
16 : Demand Forecasting 1 Session Outline Demand Forecasting Subjective methods can be used only when past data is not available. When past data is available, it is advisable that firms should use statistical
A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn
A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps
Macroeconomic drivers of private health insurance coverage. nib Health Insurance
Macroeconomic drivers of private health insurance coverage nib Health Insurance 1 September 2011 Contents Executive Summary...i 1 Methodology and modelling results... 2 2 Forecasts... 6 References... 8
Exploratory Data Analysis
Goals of EDA Relationship between mean response and covariates (including time). Variance, correlation structure, individual-level heterogeneity. Guidelines for graphical displays of longitudinal data
Multiple Regression: What Is It?
Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in
The Combination Forecasting Model of Auto Sales Based on Seasonal Index and RBF Neural Network
, pp.67-76 http://dx.doi.org/10.14257/ijdta.2016.9.1.06 The Combination Forecasting Model of Auto Sales Based on Seasonal Index and RBF Neural Network Lihua Yang and Baolin Li* School of Economics and
Module 3: Correlation and Covariance
Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis
Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model
Location matters. 3 techniques to incorporate geo-spatial effects in one's predictive model Xavier Conort [email protected] Motivation Location matters! Observed value at one location is
Time series forecasting
Time series forecasting 1 The latest version of this document and related examples are found in http://myy.haaga-helia.fi/~taaak/q Time series forecasting The objective of time series methods is to discover
Big Data, Socio- Psychological Theory, Algorithmic Text Analysis, and Predicting the Michigan Consumer Sentiment Index
Big Data, Socio- Psychological Theory, Algorithmic Text Analysis, and Predicting the Michigan Consumer Sentiment Index Rickard Nyman *, Paul Ormerod Centre for the Study of Decision Making Under Uncertainty,
Effect of Heat Stress on Lactating Sows
NCSU Statistics Department Consulting Project Effect of Heat Stress on Lactating Sows Client : Santa Mendoza Benavides, Department of Animal Science Consulting Team: Sihan Wu, Bo Ning Faculty Advisor:
Time series analysis as a framework for the characterization of waterborne disease outbreaks
Interdisciplinary Perspectives on Drinking Water Risk Assessment and Management (Proceedings of the Santiago (Chile) Symposium, September 1998). IAHS Publ. no. 260, 2000. 127 Time series analysis as a
Web-based Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni
1 Web-based Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed
Financial Risk Management Exam Sample Questions/Answers
Financial Risk Management Exam Sample Questions/Answers Prepared by Daniel HERLEMONT 1 2 3 4 5 6 Chapter 3 Fundamentals of Statistics FRM-99, Question 4 Random walk assumes that returns from one time period
Testing for Granger causality between stock prices and economic growth
MPRA Munich Personal RePEc Archive Testing for Granger causality between stock prices and economic growth Pasquale Foresti 2006 Online at http://mpra.ub.uni-muenchen.de/2962/ MPRA Paper No. 2962, posted
A spreadsheet Approach to Business Quantitative Methods
A spreadsheet Approach to Business Quantitative Methods by John Flaherty Ric Lombardo Paul Morgan Basil desilva David Wilson with contributions by: William McCluskey Richard Borst Lloyd Williams Hugh Williams
Centre for Central Banking Studies
Centre for Central Banking Studies Technical Handbook No. 4 Applied Bayesian econometrics for central bankers Andrew Blake and Haroon Mumtaz CCBS Technical Handbook No. 4 Applied Bayesian econometrics
Simple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
EST.03. An Introduction to Parametric Estimating
EST.03 An Introduction to Parametric Estimating Mr. Larry R. Dysert, CCC A ACE International describes cost estimating as the predictive process used to quantify, cost, and price the resources required
A Novel Technique for Long-Term Anomaly Detection in the Cloud
A Novel Technique for Long-Term Anomaly Detection in the Cloud Owen Vallis, Jordan Hochenbaum, Arun Kejariwal Twitter Inc. Abstract High availability and performance of a web service is key, amongst other
Forecasting Methods. What is forecasting? Why is forecasting important? How can we evaluate a future demand? How do we make mistakes?
Forecasting Methods What is forecasting? Why is forecasting important? How can we evaluate a future demand? How do we make mistakes? Prod - Forecasting Methods Contents. FRAMEWORK OF PLANNING DECISIONS....
How To Understand The Theory Of Probability
Graduate Programs in Statistics Course Titles STAT 100 CALCULUS AND MATR IX ALGEBRA FOR STATISTICS. Differential and integral calculus; infinite series; matrix algebra STAT 195 INTRODUCTION TO MATHEMATICAL
TIME-SERIES ANALYSIS, MODELLING AND FORECASTING USING SAS SOFTWARE
TIME-SERIES ANALYSIS, MODELLING AND FORECASTING USING SAS SOFTWARE Ramasubramanian V. IA.S.R.I., Library Avenue, Pusa, New Delhi 110 012 [email protected] 1. Introduction Time series (TS) data refers
Fitting Subject-specific Curves to Grouped Longitudinal Data
Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: [email protected] Currie,
Exploratory Data Analysis with MATLAB
Computer Science and Data Analysis Series Exploratory Data Analysis with MATLAB Second Edition Wendy L Martinez Angel R. Martinez Jeffrey L. Solka ( r ec) CRC Press VV J Taylor & Francis Group Boca Raton
MATHEMATICAL METHODS OF STATISTICS
MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS
Time series analysis of the dynamics of news websites
Time series analysis of the dynamics of news websites Maria Carla Calzarossa Dipartimento di Ingegneria Industriale e Informazione Università di Pavia via Ferrata 1 I-271 Pavia, Italy [email protected] Daniele
Building risk prediction models - with a focus on Genome-Wide Association Studies. Charles Kooperberg
Building risk prediction models - with a focus on Genome-Wide Association Studies Risk prediction models Based on data: (D i, X i1,..., X ip ) i = 1,..., n we like to fit a model P(D = 1 X 1,..., X p )
Regression Modeling Strategies
Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions
Introduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby
Own Damage, Third Party Property Damage Claims and Malaysian Motor Insurance: An Empirical Examination
Australian Journal of Basic and Applied Sciences, 5(7): 1190-1198, 2011 ISSN 1991-8178 Own Damage, Third Party Property Damage Claims and Malaysian Motor Insurance: An Empirical Examination 1 Mohamed Amraja
TIME SERIES ANALYSIS & FORECASTING
CHAPTER 19 TIME SERIES ANALYSIS & FORECASTING Basic Concepts 1. Time Series Analysis BASIC CONCEPTS AND FORMULA The term Time Series means a set of observations concurring any activity against different
MSwM examples. Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech.
MSwM examples Jose A. Sanchez-Espigares, Alberto Lopez-Moreno Dept. of Statistics and Operations Research UPC-BarcelonaTech February 24, 2014 Abstract Two examples are described to illustrate the use of
TIME SERIES ANALYSIS. A time series is essentially composed of the following four components:
TIME SERIES ANALYSIS A time series is a sequence of data indexed by time, often comprising uniformly spaced observations. It is formed by collecting data over a long range of time at a regular time interval
Section A. Index. Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting techniques... 1. Page 1 of 11. EduPristine CMA - Part I
Index Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting techniques... 1 EduPristine CMA - Part I Page 1 of 11 Section A. Planning, Budgeting and Forecasting Section A.2 Forecasting
Simple Predictive Analytics Curtis Seare
Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use
Statistics in Retail Finance. Chapter 6: Behavioural models
Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural
MSCA 31000 Introduction to Statistical Concepts
MSCA 31000 Introduction to Statistical Concepts This course provides general exposure to basic statistical concepts that are necessary for students to understand the content presented in more advanced
Analysis of Bayesian Dynamic Linear Models
Analysis of Bayesian Dynamic Linear Models Emily M. Casleton December 17, 2010 1 Introduction The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). The main
ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2
University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages
Premaster Statistics Tutorial 4 Full solutions
Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,
Simple Regression Theory II 2010 Samuel L. Baker
SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
Financial Risk Management Exam Sample Questions
Financial Risk Management Exam Sample Questions Prepared by Daniel HERLEMONT 1 PART I - QUANTITATIVE ANALYSIS 3 Chapter 1 - Bunds Fundamentals 3 Chapter 2 - Fundamentals of Probability 7 Chapter 3 Fundamentals
Teaching Multivariate Analysis to Business-Major Students
Teaching Multivariate Analysis to Business-Major Students Wing-Keung Wong and Teck-Wong Soon - Kent Ridge, Singapore 1. Introduction During the last two or three decades, multivariate statistical analysis
Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS
Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple
Forecast. Forecast is the linear function with estimated coefficients. Compute with predict command
Forecast Forecast is the linear function with estimated coefficients T T + h = b0 + b1timet + h Compute with predict command Compute residuals Forecast Intervals eˆ t = = y y t+ h t+ h yˆ b t+ h 0 b Time
