# Analysis of Bayesian Dynamic Linear Models


Emily M. Casleton

December 17

## 1 Introduction

The main purpose of this project is to explore the Bayesian analysis of Dynamic Linear Models (DLMs). DLMs are mainly used to model observations collected over time, for the purposes of forecasting or of detecting model shifts. A DLM is actually a sequence of models, updated at each time step, which justifies the word "dynamic". The characteristic of interest, or the unknown parameters of the time series, is modeled as $\theta$, possibly a vector of parameters, and the analysis of the evolution of $\theta$ is most commonly the main goal. Forecasting is also a common goal, but it too depends on how $\theta$ behaves over time. A model is fit to the parameter(s) of interest at time $t-1$, and this model is used to forecast a value for time $t$. Then an observation is received at time $t$ and compared with the forecast, and the model is updated given this new information as well as any newly obtained and relevant outside information. The Bayesian approach is the most natural way to allow for this updating of information at each time $t$.

The DLM is specified by the following observation equation and system equation:

$$y_t = F_t \theta_t + \nu_t, \qquad \nu_t \sim (0, V_t) \tag{1}$$

$$\theta_t = G_t \theta_{t-1} + \omega_t, \qquad \omega_t \sim (0, W_t) \tag{2}$$

The observation equation (1) models the observation vector at time $t$. These values are modeled with $F_t$, a design matrix of known values of the independent variable(s), multiplied by the state (or system) vector $\theta_t$, plus $\nu_t$, the observation error, which is assumed to have mean 0. The system equation (2) models the state vector as the sum of the zero-mean system (or evolution) error $\omega_t$ and the product of the state vector at the previous time, $\theta_{t-1}$, and the matrix $G_t$, known as the evolution, system, transfer, or state matrix. The observation and evolution errors are assumed to be independent of each other and internally independent.
More concisely, a DLM can be characterized by the quadruple $\{F_t, G_t, V_t, W_t\}$, each element of which may or may not depend on time. For example, the quadruple $\{1, 1, V, W\}$ represents a pure random walk if the error terms are assumed to be normally distributed. In the textbook problems, all four values of this quadruple are
considered known. Clearly, $F_t$ and $G_t$ are chosen by the modeler in accordance with the design of the model. In practice, the evolution variance $W_t$ is typically a chosen value, and it is only $V_t$ that is often unknown and sometimes must be estimated from the data.

Because a DLM is dynamic, it is only locally appropriate in time. The model in Equations (1) and (2) is appropriate at time $t$ until an observation $y_t$ is received and the model is updated according to the new information. The information available at time $t$ will be denoted $D_t$. This information can consist of more than just the observation. For example, if the observations represent a company's sales during month $t$, and it is known that a rival company is going out of business in month $t+1$, the expected increase in sales can also be included. In this project, however, models will be fit to simulated data, so the information set will be closed to external information, i.e. $D_t = \{Y_t, D_{t-1}\}$. In addition, the initial information available before the process is observed, i.e. at time $t = 0$, will be denoted $D_0$.

According to West, Harrison, and Migon [3], a key feature of the Bayesian analysis of DLMs is the use of conjugate prior and posterior distributions for the parameters. Typically this distribution is taken to be the normal distribution, and that is the distribution considered for all models examined in this project. This leads to the following restatement of the general observation and system equations (1) and (2):

$$(Y_t \mid \theta_t) \sim N[F_t \theta_t, V_t] \tag{3}$$

$$(\theta_t \mid \theta_{t-1}) \sim N[G_t \theta_{t-1}, W_t] \tag{4}$$

In addition, the initial information, expressed through the prior distribution of the parameter, is assumed normal, with initial estimates of the mean, $m_0$, and variance, $C_0$:

$$(\theta_0 \mid D_0) \sim N[m_0, C_0] \tag{5}$$

It is also assumed that the observation and system errors are independent of the initial information.
This sequential structure leads to another key feature of the analysis, argued by West and Harrison [2]: given the present, the future is independent of the past. At each time increment, the following distributions describe how the model is updated with respect to the new information.

- Posterior at $t-1$: $(\theta_{t-1} \mid D_{t-1}) \sim N(m_{t-1}, C_{t-1})$
- Prior at $t$: $(\theta_t \mid D_{t-1}) \sim N(a_t, R_t)$
- One-step forecast: $(Y_t \mid D_{t-1}) \sim N(f_t, Q_t)$
- Posterior at $t$: $(\theta_t \mid D_t) \sim N(m_t, C_t)$

The mean of the prior for $\theta_t$ is obtained from the mean of the posterior of $\theta_{t-1}$ by $a_t = G_t m_{t-1}$, and its variance follows as $R_t = G_t C_{t-1} G_t' + W_t$. The
one-step forecast mean is then computed as $f_t = F_t a_t$, and the variance of the forecast distribution again follows as $Q_t = F_t R_t F_t' + V_t$. Finally, the mean of the posterior is updated through the equation $m_t = a_t + A_t e_t$, where $a_t$ is the mean of the prior, $A_t = R_t F_t' Q_t^{-1}$ is known as the adaptive vector, and $e_t = Y_t - f_t$ is the one-step forecast error. The variance of the posterior is updated as $C_t = R_t - A_t Q_t A_t'$. The posterior for $\theta_t$ then yields the prior for $\theta_{t+1}$, and the process is repeated for time $t+1$.

The general observation and system equations (3) and (4) will be used to simulate a data set from each of three different types of models: a random walk, a dynamic straight line through the origin, and a dynamic linear regression. Using the forecasting and recurrence relations above, an attempt will be made to fit a dynamic linear model to each simulated data set. The one-step forecast distribution will be plotted against the simulated values in order to judge the accuracy of the forecasts. Other interesting features of the analysis will also be explored.

## 2 Random Walk

The first data set to which a Bayesian analysis of a DLM will be applied is the simplest: a random walk. This constant model takes the form $F_t = 1$ and $G_t = 1$, with observation and system error variances that do not depend on time. The particular model simulated here takes these values to be $V = 4$ and $W = 0.25$. Therefore, the quadruple that describes this model is $\{1, 1, 4, 0.25\}$, which leads to the following observation and system equations:

$$y_t = \theta_t + \nu_t, \qquad \nu_t \sim (0, 4)$$

$$\theta_t = \theta_{t-1} + \omega_t, \qquad \omega_t \sim (0, 0.25)$$

The simulation process to obtain a random walk data set is as follows:

1. Start with an initial system value $\theta_0$. Here this will be taken to be […].
2. Simulate the first system error, $\omega_1$, from $N(0, 0.25)$.
3. Calculate $\theta_1 = \theta_0 + \omega_1$.
4. Simulate the first observation error, $\nu_1$, from $N(0, 4)$.
5. Calculate the first observation $y_1 = \theta_1 + \nu_1$.
6. Repeat steps 2-5 $(n-1)$ times.

The above algorithm was run, resulting in a random walk with 50 values. These values are plotted against time in Figure 1.
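The simulation steps above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the code used for the report: the function name `simulate_random_walk`, the seed, and in particular the starting value `theta0=10.0` are assumptions made here, since the initial value used in the report is not recoverable from the text.

```python
import numpy as np


def simulate_random_walk(n, theta0, V=4.0, W=0.25, seed=0):
    """Simulate n steps of the DLM {1, 1, V, W} with normal errors.

    theta0 is a hypothetical starting value chosen by the caller.
    Returns the latent states theta_1..theta_n and observations y_1..y_n.
    """
    rng = np.random.default_rng(seed)
    theta = np.empty(n)
    y = np.empty(n)
    prev = theta0
    for t in range(n):
        omega = rng.normal(0.0, np.sqrt(W))  # system error, variance W
        prev = prev + omega                  # theta_t = theta_{t-1} + omega_t
        nu = rng.normal(0.0, np.sqrt(V))     # observation error, variance V
        theta[t] = prev
        y[t] = prev + nu                     # y_t = theta_t + nu_t
    return theta, y


# Fifty values, as in the text; theta0 = 10.0 is an arbitrary illustration.
theta, y = simulate_random_walk(n=50, theta0=10.0)
```

Note that `rng.normal` takes a standard deviation, so the variances $V$ and $W$ are passed through a square root.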

Figure 1: Simulated Random Walk

Next, it is desired to forecast this time series using the forecasting and recurrence relations of the Bayesian analysis of the DLM. The quantities that will be examined in further detail are the mean and variance of the forecast distribution, $f_t$ and $Q_t$; the adaptive coefficient, $A_t$; the error between the mean of the forecast distribution and the actual value, $e_t$; and lastly, the posterior distribution of $\theta_t$ at each time step, characterized by its mean and variance, $m_t$ and $C_t$. These values will be computed at each step using the following algorithm:

1. Start with initial values for the distribution of $\theta$, with mean $m_0$ and variance $C_0$, and estimates for the observation error variance, $V$, and system error variance, $W$.
2. Compute the forecast mean $f_1 = m_0$.
3. Compute the forecast variance $Q_1 = R_1 + V$, where $R_1 = C_0 + W$.
4. Compute the adaptive coefficient $A_1 = R_1 / Q_1 = (C_0 + W) / Q_1$.
5. Compute the forecast error $e_1 = Y_1 - f_1$, where $Y_1$ is the first value in the random walk sequence.
6. Compute the posterior mean $m_1 = m_0 + A_1 e_1$.
7. Compute the posterior variance $C_1 = A_1 V$.
8. Repeat steps 2-7 $(n-1)$ times.

Table 1: Various components of the one-step forecasting and updating recurrence relations for the random walk data. Columns: month $t$; forecast distribution $Q_t$, $f_t$; adaptive coefficient $A_t$; datum $y_t$; error $e_t$; posterior information $m_t$, $C_t$.

This algorithm will first be used to forecast the random walk data set shown in Figure 1 by cheating somewhat: the assumed known values of $m_0$, $C_0$, $V$ and $W$ will be taken to be the values used to simulate the data. The results of this DLM are shown plotted against the original values in Figure 2. The solid red line represents the forecast mean, $f_t$, while the dashed red lines represent 95% confidence bands, computed under the usual normal assumption as $f_t \pm 1.96\sqrt{Q_t}$. For the most part, the forecasted values react to the peaks of the observations, typically one time step behind the peak. The forecast also does not show as many random fluctuations as the original observations; as a sequence, the forecasted values appear to be a smoother version of the original data, delayed by one time step. Although the forecasted values did not show as much variability as the original values, at most time steps the original value was contained within the 95% confidence band.

For the first nine time steps, the values of the components of interest are shown in Table 1, a table similar to that reported by West and Harrison [2]. It can be seen that the adaptive coefficient converges rapidly to […]. One interpretation of the adaptive coefficient is as the prior regression coefficient of $\theta_t$ upon $y_t$. This rapid convergence implies that the prior information from the previous step is given about the same weight in forecasting the next point for all times $t$.
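The eight-step forecasting algorithm for the random walk can be sketched as follows. This is a minimal illustration under stated assumptions, not the report's own code: the function name `filter_random_walk`, the dictionary layout, the seed, and the starting level `10.0` used to generate example data are all hypothetical.

```python
import numpy as np


def filter_random_walk(y, m0, C0, V, W):
    """One-step forecast and update recurrences for the DLM {1, 1, V, W}."""
    keys = ("f", "Q", "A", "e", "m", "C")
    out = {k: np.empty(len(y)) for k in keys}
    m, C = m0, C0
    for t, yt in enumerate(y):
        R = C + W        # prior variance R_t = C_{t-1} + W
        f = m            # forecast mean f_t = m_{t-1}
        Q = R + V        # forecast variance
        A = R / Q        # adaptive coefficient
        e = yt - f       # one-step forecast error
        m = m + A * e    # posterior mean
        C = A * V        # posterior variance
        for k, v in zip(keys, (f, Q, A, e, m, C)):
            out[k][t] = v
    return out


# Example data: a simulated random walk (starting level 10.0 is arbitrary).
rng = np.random.default_rng(0)
theta = 10.0 + np.cumsum(rng.normal(0.0, np.sqrt(0.25), size=50))
y = theta + rng.normal(0.0, np.sqrt(4.0), size=50)

res = filter_random_walk(y, m0=10.0, C0=1.0, V=4.0, W=0.25)
lower = res["f"] - 1.96 * np.sqrt(res["Q"])  # 95% band, normal assumption
upper = res["f"] + 1.96 * np.sqrt(res["Q"])

# Fixed point of the A_t recursion for constant V and W, with r = W / V.
r = 0.25 / 4.0
A_limit = r * (np.sqrt(1.0 + 4.0 / r) - 1.0) / 2.0
```

For constant $V$ and $W$ the sequence $A_t$ does not depend on the data: setting $C = AV$ and $A = (C + W)/(C + W + V)$ gives $A^2 + rA - r = 0$ with $r = W/V$, whose positive root is `A_limit`. With $V = 4$ and $W = 0.25$ this is roughly 0.22, in line with the range of $A_t$ values the report quotes for the random walk.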

Figure 2: Simulated Random Walk with forecasted values and 95% confidence bands.

The above analysis relied on the fact that the prior information consisted of the true values used to simulate the data, so it is not surprising that the forecast was relatively good and the confidence bands relatively narrow. A natural question is how these initial values affect the subsequent forecast, particularly the errors. A second analysis of the same random walk data was performed in which, instead of the initial values being the true values used for simulation, the known values in the model were estimated from the data. The initial mean, $m_0$, was taken to be the first observed value, $m_0 =$ […]. The value of $C_0$ was again taken to be 1. The variance of all 50 values from the random walk was computed to be $s^2 \approx 19$. The estimated values of the observation error variance, $V$, and system error variance, $W$, were each taken to be half of this sample variance, so $V = 9.5$ and $W = 9.5$ were used for the second analysis.

Of initial interest in comparing the two DLMs with differing initial values are the forecast errors, $e_t$, that is, how well each model predicts new values. The errors for each model are plotted against time in Figure 3. Here only the first
25 values are plotted, to make the differences distinguishable and because the pattern continues for the remaining 25 time points. At time $t = 1$, the forecast error of the model with estimated prior information is 0, since the initial value of the realized sequence was taken to be the prior mean. The errors of the two methods are dissimilar for the first five time points, and then converge and are similar for the remaining 20. This can possibly be explained by the values of the adaptive coefficient. At time $t = 8$, the coefficient of $m_0$ when computing $m_8$ is $(1 - A_t)(1 - A_{t-1}) \cdots (1 - A_1)$ with $t = 8$. Therefore, the initial value contributes only about 17% to the estimated value at step 8, and this contribution decreases as time increases.

The noticeable difference between using the known values for the prior and those estimated from the data can be seen by comparing Figure 4 to Figure 2. When the data are used to estimate the initial values, the forecasts react more to the fluctuations of the observations, and the uncertainty associated with the forecast distribution is much larger. This could be due to the fact that the true observation error variance was 16 times larger than the system error variance, whereas in our naive estimation procedure these variances were taken to be equal.

## 3 Dynamic straight line through (0,0)

The model in the previous section was a time-invariant random walk. The observation and system error variances were constant throughout the process, and $G_t$ and $F_t$ were both identically 1. In this section a slightly more complex model will be explored, one that models the local relationship as a straight line through the origin, with a slope that varies with time. Again, a constant error variance model will be used.
Therefore, the quadruple that describes this model is $\{F_t, 1, V, W\}$, which results in the following observation and system equations:

$$y_t = F_t \beta_t + \nu_t, \qquad \nu_t \sim (0, V)$$

$$\beta_t = \beta_{t-1} + \omega_t, \qquad \omega_t \sim (0, W)$$

The covariate used here will be time, so $F_t = t$ for $t = 1, 2, \ldots, n$. To simulate from this model, an algorithm similar to that used for the random walk will be used. The only difference occurs in the equation in step 5, which becomes $y_t = t\,\beta_t + \nu_t$. The initial slope used to simulate the data was $\beta_0 = 3.2$. The observation error variance was taken to be $V = 4$ and the system error variance $W = 0.5$. The result of simulating 50 values in this way is shown in Figure 5. The increasing trend is explained by the use of time as the covariate.

The algorithm to estimate the DLM for a straight line through the origin is similar to that used for the random walk, in that the same quantities need to be computed. However, for the random walk $F_t$ was not time dependent, so some of the formulas must be modified to accommodate this changing value. The updated algorithm is shown below.

Figure 3: Comparison of forecast errors using the known values for the prior information versus using values estimated from the data.

1. Start with initial values for the distribution of the slope, with mean $m_0 = \beta_0$ and variance $C_0$, and estimates for the observation error variance, $V$, and system error variance, $W$.
2. Compute the forecast mean $f_1 = F_1 m_0$.
3. Compute the forecast variance $Q_1 = F_1^2 R_1 + V = F_1^2 (C_0 + W) + V$.
4. Compute the adaptive coefficient $A_1 = R_1 F_1 / Q_1 = (C_0 + W) F_1 / Q_1$.
5. Compute the forecast error $e_1 = Y_1 - f_1$, where $Y_1$ is the first value in the simulated sequence.
6. Compute the posterior mean $m_1 = m_0 + A_1 e_1$.
7. Compute the posterior variance $C_1 = R_1 V / Q_1$.
8. Repeat steps 2-7 $(n-1)$ times.
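The steps above are the scalar case of the general recurrences $a_t = G_t m_{t-1}$, $R_t = G_t C_{t-1} G_t' + W_t$, $f_t = F_t a_t$, $Q_t = F_t R_t F_t' + V$, $A_t = R_t F_t' Q_t^{-1}$ and $C_t = R_t - A_t Q_t A_t'$. The following sketch writes them in matrix form and applies them to a straight line through the origin. It is an illustration only: the function name `dlm_filter`, its argument layout, and the seed are assumptions made here, and a scalar observation with constant $V$ is assumed.

```python
import numpy as np


def dlm_filter(y, F_rows, G, V, W_seq, m0, C0):
    """Forecast/update recurrences for a DLM with scalar observations.

    y: (n,) observations; F_rows: (n, p), row t holds F_t; G: (p, p);
    V: scalar observation variance; W_seq: (n, p, p) system variances;
    m0: (p,) prior mean; C0: (p, p) prior variance.
    """
    n, p = F_rows.shape
    m = np.asarray(m0, dtype=float)
    C = np.asarray(C0, dtype=float)
    f, Q = np.empty(n), np.empty(n)
    A, ms, Cs = np.empty((n, p)), np.empty((n, p)), np.empty((n, p, p))
    for t in range(n):
        Ft = F_rows[t]
        a = G @ m                            # prior mean a_t
        R = G @ C @ G.T + W_seq[t]           # prior variance R_t
        f[t] = Ft @ a                        # forecast mean f_t
        Q[t] = Ft @ R @ Ft + V               # forecast variance Q_t
        A[t] = R @ Ft / Q[t]                 # adaptive vector A_t
        e = y[t] - f[t]                      # forecast error e_t
        m = a + A[t] * e                     # posterior mean m_t
        C = R - Q[t] * np.outer(A[t], A[t])  # posterior variance C_t
        ms[t], Cs[t] = m, C
    return f, Q, A, ms, Cs


# Straight line through the origin: F_t = t, G = 1, V = 4, W = 0.5.
n = 50
rng = np.random.default_rng(0)
t_idx = np.arange(1, n + 1, dtype=float)
beta = 3.2 + np.cumsum(rng.normal(0.0, np.sqrt(0.5), size=n))
y = t_idx * beta + rng.normal(0.0, np.sqrt(4.0), size=n)

f, Q, A, ms, Cs = dlm_filter(
    y, F_rows=t_idx.reshape(-1, 1), G=np.eye(1), V=4.0,
    W_seq=np.full((n, 1, 1), 0.5), m0=[3.2], C0=[[1.0]],
)
```

With these inputs the first step reproduces $Q_1 = 1^2 (1 + 0.5) + 4 = 5.5$ and $A_1 = 1.5/5.5 \approx 0.27$. Writing the update in matrix form means the same function would also accept a two-component state, for example rows $(1, t)$ in `F_rows` with $2 \times 2$ matrices in `W_seq`.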

Figure 4: Random walk with predictions and 95% error bands using values estimated from the data as the prior information.

The algorithm was used to compute the forecasting and recurrence distributions for the simulated data in Figure 5. Cheating was again employed, in that the prior information was taken to be that used to simulate the data; therefore, $\beta_0 = 3.2$, $C_0 = 1$, $V = 4$, and $W = 0.5$ were used in the algorithm above. The resulting predictions and 95% confidence bands are shown in Figure 6. The prediction means were similar to those seen with the random walk, in that the fluctuations of the forecast means lag behind those of the observed values by one point in time. A big difference between the confidence bands of the random walk and those seen here is that the width of the bands increases with time. This is because the variance of the one-step forecast distribution, $Q_t$, depends on the value of the covariate, as seen in step 3 of the algorithm above. Since time is the covariate, this pattern makes sense.

Another difference between the predictions from the first analysis of the random walk and those of the dynamic straight line is that the forecast mean responds more to the apparent random fluctuations of the observations. Where
the DLM analysis for the random walk appeared to be a smoother version of the observations, the dynamic straight line analysis does not appear to smooth the original data at all. This could be explained, at least partially, by comparing the adaptive coefficient, $A_t$, and the posterior variance, $C_t$, in Table 1 for the random walk with those in the first five columns of Table 2. The range of $A_t$ values for the random walk analysis is between 0.22 and 0.23, while for the dynamic straight line analysis it ranges from 0.27 to […]. This implies that for the analysis of this section much less weight is placed on the forecast error when computing the posterior mean. Also, the posterior variance for the random walk analysis ranged only from 0.88 to 1, while for the dynamic straight line it ranged from […] to […]. This implies that for the straight line analysis the forecast variance becomes dominated by the fixed system variance, and there is not much contribution from the posterior variance of $\theta_t$ as time increases.

Figure 5: Dynamic straight line through (0,0)

Instead of examining what the results would have been if the prior information had been estimated from the data, this analysis will be compared to that of a static model. Table 2 compares the distribution of the posterior mean
for $\beta_t$ through the mean $m_t$ and variance $C_t$. Also included in the table is the adaptive coefficient, $A_t$, for each analysis. These values are shown for the first five observations as well as the last five observations.

Figure 6: Simulated dynamic straight line through (0,0) with forecasted values and 95% confidence bands.

The most obvious difference between the two models is in the values of the posterior mean, $m_t$, at large values of $t$. Because time was used as the covariate, the observations increased as time increased. The dynamic model accounted for this by incorporating the increasing covariate into the estimation. The static model, however, held $F_t$ fixed at 1, so to account for the increasing trend, the value of $\beta_t$ increased. In the static model, the posterior variance of $\beta$ held steady throughout at around […], while that of the dynamic model decreased to […]. This difference can easily be seen in the formulas for computing the posterior variance in each model, with the dynamic model taking the increasing covariate into consideration. Lastly, the adaptive coefficient for the dynamic model converges to a value of 0.02, while that of the static model converges to a value an order of magnitude larger, at […]. Because the adaptive coefficient in the dynamic model is closer to 0, the prior distribution is more concentrated than the likelihood. So,
the static model is more sensitive to the latest value of the observation. With the decreasing values of $A_t$ and $Q_t$ for the dynamic model, this implies that the model responds less and less to the most recent data point.

Table 2: Analysis of dynamic straight line through (0,0) using both a dynamic and a static model. Columns: $F_t$ and $y_t$; $m_t$, $C_t$, $A_t$ for the dynamic model; $m_t$, $C_t$, $A_t$ for the static model.

## 4 Dynamic Linear Regression

The last DLM analyzed in this report has a slope and an intercept that are both time dependent. Therefore, $\theta_t = (\alpha_t, \beta_t)'$ will be two-dimensional, and the observation and state equations become

$$y_t = \alpha_t + t\,\beta_t + \nu_t, \qquad \nu_t \sim N(0, V)$$

$$\alpha_t = \alpha_{t-1} + \omega_{\alpha,t}, \qquad \omega_{\alpha,t} \sim N(0, W_{\alpha,t})$$

$$\beta_t = \beta_{t-1} + \omega_{\beta,t}, \qquad \omega_{\beta,t} \sim N(0, W_{\beta,t})$$

The above equations imply that the quadruple characterizing the model is $\{F_t, I, V, W_t\}$, where $I$ is the $2 \times 2$ identity matrix, $F_t = (1, t)$ and

$$W_t = \begin{pmatrix} W_{\alpha,t} & 0 \\ 0 & W_{\beta,t} \end{pmatrix}$$

Here the observation error variance will again be assumed constant, but now the system error variances will vary with time.

An algorithm similar to that used to simulate the observations for the random walk and for the dynamic line with intercept 0 is used to simulate a dynamic linear regression data set. The main difference is that two independent system errors must be simulated to obtain one observation. Here again, time
will be used as the only covariate. Fifty observations were simulated using $\alpha_0 = 12$, $\beta_0 = -1.6$, $V = 3.8$, $W_{\alpha,t} = t^{0.35}$ and $W_{\beta,t} = 0.03/t$. The resulting values are shown in Figure 7. As with the values simulated in the previous section, the trend, here decreasing, is due to the use of time as the covariate together with a negative slope.

Figure 7: Simulated dynamic linear regression.

The algorithm used to forecast this time series is again similar to those above, with added computations due to the two parameters that must be estimated. The necessary calculations are shown below.

1. Start with initial values of the intercept, $\alpha_0$, and slope, $\beta_0$; the initial covariance matrix of their joint distribution, $C_0$, a $2 \times 2$ matrix with 0s on the off-diagonal; an initial estimate of the observation error variance, $V$; and a $2 \times 2$ matrix of system error variances, $W_t$, again with 0s on the off-diagonal.
2. Compute the forecast mean $f_1 = F_1 a_1 = \alpha_0 + (1)\beta_0$.
3. Compute the prior variance $R_1 = C_0 + W_1$.

4. Compute the forecast variance $Q_1 = F_1 R_1 F_1' + V$.
5. Compute the adaptive coefficient $A_1 = R_1 F_1' Q_1^{-1}$.
6. Compute the forecast error $e_1 = y_1 - f_1$.
7. Compute the posterior means $\alpha_1 = \alpha_0 + A_{1(1,1)} e_1$ and $\beta_1 = \beta_0 + A_{1(2,1)} e_1$.
8. Compute the posterior variance $C_1 = R_1 - A_1 Q_1 A_1'$.

The algorithm was applied to the previously simulated data using the known values as the prior information. The results can be seen in Figure 8. For the forecast means, displayed as the solid red line, a pattern similar to that of the dynamic straight line with no intercept is seen: the predicted means appear very similar to the original data, but shifted to the right by one time step. The predicted lines seen here and in the previous section are so similar to the original data that it raises the question of whether this is a result of using the known values of the variances and initial estimates in the algorithms, or whether there is something wrong with the prediction calculations. Although the latter cannot be ruled out, the former can certainly be blamed in the analysis of the dynamic linear regression model, since the true structure of the system variance, i.e. $W_{\alpha,t} = t^{0.35}$ and $W_{\beta,t} = 0.03/t$, was used in the calculations. Not surprisingly, the confidence bands around the forecasted means are very narrow, as they should be: the true state of nature is known and was used in the estimation procedure, so there is little doubt about the estimates.

With a model as complicated as the one used in this section, one may wonder how these initial values would be determined with access only to the observed sequence. Currently, this is an area in which the author does not have any insight.

## 5 Conclusions

Three different types of models were simulated, and a Bayesian analysis of the resulting time series was attempted using dynamic linear models.
The three types of models were a random walk, a dynamic straight line through the origin, and a dynamic linear regression in which the slope and intercept were allowed to vary with the covariate, taken here to be time.

The main problem encountered in the analysis was that prior information about various parameters was needed in order to perform the necessary calculations. Since the data were simulated and did not represent any scientific process, the prior information available to the modeler was the actual values used to simulate the data. Naturally, this led to very accurate results, but this information is typically unknown, and thus this procedure would not translate well to non-simulated data.

The issue of not using the known simulation values for estimation purposes was explored in the random walk example. Here the initial information was estimated from the data values themselves. It was seen that there was not a loss
in the prediction errors; however, the variance of the one-step forecast distribution was significantly larger.

Figure 8: Simulated dynamic linear regression with forecasted values and 95% confidence bands.

In addition to the process of producing initial estimates, there are many other issues that could be addressed with respect to the Bayesian analysis of dynamic linear models. For all simulations in this report the observation variance, $V$, was assumed to be known, although this is the value that typically must be estimated. An analysis that must also model this parameter would be one extension of the work in this report. A model that uses a covariate other than time may also be of interest. Lastly, analyzing non-simulated data with this procedure would be the true test of a modeler's ability.

## 6 Bibliography

1. Petris, Giovanni. (2010). An R Package for Dynamic Linear Models. Journal of Statistical Software, 36(12). v36/i12/.
2. West, Mike and Jeff Harrison. Bayesian Forecasting and Dynamic Models. New York: Springer-Verlag.
3. West, Mike, Jeff Harrison, and Helio Migon. Dynamic Generalized Linear Models and Bayesian Forecasting. Journal of the American Statistical Association 80 (1985).


Lecture 5: Linear least-squares Regression III: Advanced Methods William G. Jacoby Department of Political Science Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Simple Linear Regression

### CHAPTER 2 Estimating Probabilities

CHAPTER 2 Estimating Probabilities Machine Learning Copyright © 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR'S PERMISSION* This is a

### Time series Forecasting using Holt-Winters Exponential Smoothing

Time series Forecasting using Holt-Winters Exponential Smoothing Prajakta S. Kalekar(04329008) Kanwal Rekhi School of Information Technology Under the guidance of Prof. Bernard December 6, 2004 Abstract

### In this chapter, you will learn to use moving averages to estimate and analyze estimates of contract cost and price.

6.0 - Chapter Introduction In this chapter, you will learn to use moving averages to estimate and analyze estimates of contract cost and price. Single Moving Average. If you cannot identify or you cannot

### 7 Time series analysis

7 Time series analysis In Chapters 16, 17, 33–36 in Zuur, Ieno and Smith (2007), various time series techniques are discussed. Applying these methods in Brodgar is straightforward, and most choices are

### Regression Analysis Prof. Soumen Maity Department of Mathematics Indian Institute of Technology, Kharagpur. Lecture - 2 Simple Linear Regression

Regression Analysis Prof. Soumen Maity Department of Mathematics Indian Institute of Technology, Kharagpur Lecture - 2 Simple Linear Regression Hi, this is my second lecture in module one and on simple

### What Does the Correlation Coefficient Really Tell Us About the Individual?

What Does the Correlation Coefficient Really Tell Us About the Individual? R. C. Gardner and R. W. J. Neufeld Department of Psychology University of Western Ontario ABSTRACT The Pearson product moment

### Algebra 1 Course Information

Course Information Course Description: Students will study patterns, relations, and functions, and focus on the use of mathematical models to understand and analyze quantitative relationships. Through

### Conditional guidance as a response to supply uncertainty

1 Conditional guidance as a response to supply uncertainty Appendix to the speech given by Ben Broadbent, External Member of the Monetary Policy Committee, Bank of England At the London Business School,

### Chapter 5 Estimating Demand Functions

Chapter 5 Estimating Demand Functions 1 Why do you need statistics and regression analysis? Ability to read market research papers Analyze your own data in a simple way Assist you in pricing and marketing

### CHAPTER 7: OPTIMAL RISKY PORTFOLIOS

CHAPTER 7: OPTIMAL RISKY PORTFOLIOS PROBLEM SETS 1. (a) and (e). 2. (a) and (c). After real estate is added to the portfolio, there are four asset classes in the portfolio: stocks, bonds, cash and real estate.

### Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

### Forecast covariances in the linear multiregression dynamic model.

Forecast covariances in the linear multiregression dynamic model. Catriona M Queen, Ben J Wright and Casper J Albers The Open University, Milton Keynes, MK7 6AA, UK February 28, 2007 Abstract The linear

### Predictor Coef StDev T P Constant 970667056 616256122 1.58 0.154 X 0.00293 0.06163 0.05 0.963. S = 0.5597 R-Sq = 0.0% R-Sq(adj) = 0.

Statistical analysis using Microsoft Excel Microsoft Excel spreadsheets have become somewhat of a standard for data storage, at least for smaller data sets. This, along with the program often being packaged

### How Much Equity Does the Government Hold?

How Much Equity Does the Government Hold? Alan J. Auerbach University of California, Berkeley and NBER January 2004 This paper was presented at the 2004 Meetings of the American Economic Association. I

### RELEVANT TO ACCA QUALIFICATION PAPER P3. Studying Paper P3? Performance objectives 7, 8 and 9 are relevant to this exam

RELEVANT TO ACCA QUALIFICATION PAPER P3 Studying Paper P3? Performance objectives 7, 8 and 9 are relevant to this exam Business forecasting and strategic planning Quantitative data has always been supplied

### Forecasting Methods. What is forecasting? Why is forecasting important? How can we evaluate a future demand? How do we make mistakes?

Forecasting Methods What is forecasting? Why is forecasting important? How can we evaluate a future demand? How do we make mistakes? Prod - Forecasting Methods Contents. FRAMEWORK OF PLANNING DECISIONS....

### Master's Thesis. A Study on Active Queue Management Mechanisms for. Internet Routers: Design, Performance Analysis, and.

Master's Thesis Title A Study on Active Queue Management Mechanisms for Internet Routers: Design, Performance Analysis, and Parameter Tuning Supervisor Prof. Masayuki Murata Author Tomoya Eguchi February

### 5.5. Solving linear systems by the elimination method

55 Solving linear systems by the elimination method Equivalent systems The major technique of solving systems of equations is changing the original problem into another one which is of an easier to solve

### Time Series and Forecasting

Chapter 22 Page 1 Time Series and Forecasting A time series is a sequence of observations of a random variable. Hence, it is a stochastic process. Examples include the monthly demand for a product, the

### Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

### State Space Time Series Analysis

State Space Time Series Analysis p. 1 State Space Time Series Analysis Siem Jan Koopman http://staff.feweb.vu.nl/koopman Department of Econometrics VU University Amsterdam Tinbergen Institute 2011 State

### Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics

Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

### Regression Analysis: Basic Concepts

The simple linear model Regression Analysis: Basic Concepts Allin Cottrell Represents the dependent variable, y i, as a linear function of one independent variable, x i, subject to a random disturbance

### Investors and Central Bank's Uncertainty Embedded in Index Options On-Line Appendix

Investors and Central Bank's Uncertainty Embedded in Index Options On-Line Appendix Alexander David Haskayne School of Business, University of Calgary Pietro Veronesi University of Chicago Booth School

### PITFALLS IN TIME SERIES ANALYSIS. Cliff Hurvich Stern School, NYU

PITFALLS IN TIME SERIES ANALYSIS Cliff Hurvich Stern School, NYU The t-Test If $x_1, \ldots, x_n$ are independent and identically distributed with mean 0, and n is not too small, then $t = \frac{\bar{x} - 0}{s/\sqrt{n}}$ has a standard

### STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics rsalakhu@utstat.toronto.edu http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### " Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES. From Exploratory Factor Analysis Ledyard R Tucker and Robert C.

CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES From Exploratory Factor Analysis Ledyard R Tucker and Robert C MacCallum 1997 180 CHAPTER 8 FACTOR EXTRACTION BY MATRIX FACTORING TECHNIQUES In

### Time Series Analysis

Time Series Analysis Time series and stochastic processes Andrés M. Alonso Carolina García-Martos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and García-Martos

### Time Series Forecasting Techniques

3 Time Series Forecasting Techniques Back in the 1970s, we were working with a company in the major home appliance industry. In an interview, the person

### Time Series Analysis

Time Series Analysis Forecasting with ARIMA models Andrés M. Alonso Carolina García-Martos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and García-Martos (UC3M-UPM)

### Linear Threshold Units

Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

### 9 Multiplication of Vectors: The Scalar or Dot Product

Arkansas Tech University MATH 934: Calculus III Dr. Marcel B Finan 9 Multiplication of Vectors: The Scalar or Dot Product Up to this point we have defined what vectors are and discussed basic notation

### 1 Maximum likelihood estimation

COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N

### Forecasting methods applied to engineering management

Forecasting methods applied to engineering management Áron Szász-Gábor Abstract. This paper presents arguments for the usefulness of a simple forecasting application package for sustaining operational

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### 17.0 Linear Regression

17.0 Linear Regression 1 Answer Questions Lines Correlation Regression 17.1 Lines The algebraic equation for a line is Y = β 0 + β 1 X 2 The use of coordinate axes to show functional relationships was

### Integrated Resource Plan

Integrated Resource Plan March 19, 2004 PREPARED FOR KAUA'I ISLAND UTILITY COOPERATIVE LCG Consulting 4962 El Camino Real, Suite 112 Los Altos, CA 94022 650-962-9670 1 IRP 1 ELECTRIC LOAD FORECASTING 1.1

### A Basic Introduction to Missing Data

John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item

### LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as

LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values

### 3. Regression & Exponential Smoothing

3. Regression & Exponential Smoothing 3.1 Forecasting a Single Time Series Two main approaches are traditionally used to model a single time series z 1, z 2,..., z n 1. Models the observation z t as a

### Using simulation to calculate the NPV of a project

Using simulation to calculate the NPV of a project Marius Holtan Onward Inc. 5/31/2002 Monte Carlo simulation is fast becoming the technology of choice for evaluating and analyzing assets, be it pure financial

### 99.37, 99.38, 99.38, 99.39, 99.39, 99.39, 99.39, 99.40, 99.41, 99.42 cm

Error Analysis and the Gaussian Distribution In experimental science theory lives or dies based on the results of experimental evidence and thus the analysis of this evidence is a critical part of the

### Industry Environment and Concepts for Forecasting 1

Table of Contents Industry Environment and Concepts for Forecasting 1 Forecasting Methods Overview...2 Multilevel Forecasting...3 Demand Forecasting...4 Integrating Information...5 Simplifying the Forecast...6

### Supplement to Call Centers with Delay Information: Models and Insights

Supplement to Call Centers with Delay Information: Models and Insights Oualid Jouini 1 Zeynep Akşin 2 Yves Dallery 1 1 Laboratoire Genie Industriel, Ecole Centrale Paris, Grande Voie des Vignes, 92290

### 3.8 Finding Antiderivatives; Divergence and Curl of a Vector Field

3.8 Finding Antiderivatives; Divergence and Curl of a Vector Field 77 3.8 Finding Antiderivatives; Divergence and Curl of a Vector Field Overview: The antiderivative in one variable calculus is an important

### Regression Clustering

Chapter 449 Introduction This algorithm provides for clustering in the multiple regression setting in which you have a dependent variable Y and one or more independent variables, the X's. The algorithm

### Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

### e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

### A Comparative Study of the Pickup Method and its Variations Using a Simulated Hotel Reservation Data

A Comparative Study of the Pickup Method and its Variations Using a Simulated Hotel Reservation Data Athanasius Zakhary, Neamat El Gayar Faculty of Computers and Information Cairo University, Giza, Egypt

### Fitting Subject-specific Curves to Grouped Longitudinal Data

Fitting Subject-specific Curves to Grouped Longitudinal Data Djeundje, Viani Heriot-Watt University, Department of Actuarial Mathematics & Statistics Edinburgh, EH14 4AS, UK E-mail: vad5@hw.ac.uk Currie,

### Regression, least squares

Regression, least squares Joe Felsenstein Department of Genome Sciences and Department of Biology Regression, least squares p.1/24 Fitting a straight line X Two distinct cases: The X values are chosen

### APPLIED MISSING DATA ANALYSIS

APPLIED MISSING DATA ANALYSIS Craig K. Enders Series Editor's Note by Todd D. Little THE GUILFORD PRESS New York London Contents 1 An Introduction to Missing Data 1 1.1 Introduction 1 1.2 Chapter Overview

### Determination of g using a spring

INTRODUCTION UNIVERSITY OF SURREY DEPARTMENT OF PHYSICS Level 1 Laboratory: Introduction Experiment Determination of g using a spring This experiment is designed to get you confident in using the quantitative

### Machine Learning Logistic Regression

Machine Learning Logistic Regression Jeff Howbert Introduction to Machine Learning Winter 2012 1 Logistic regression Name is somewhat misleading. Really a technique for classification, not regression.


### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### Forecast. Forecast is the linear function with estimated coefficients. Compute with predict command

Forecast Forecast is the linear function with estimated coefficients $T_{T+h} = b_0 + b_1 \mathrm{Time}_{T+h}$ Compute with predict command Compute residuals Forecast Intervals $\hat{e}_{t+h} = y_{t+h} - \hat{y}_{t+h} = y_{t+h} - b_0 - b_1 \mathrm{Time}$

### , for x = 0, 1, 2, 3,... (4.1) (1 + 1/n) n = 2.71828... b x /x! = e b, x=0

Chapter 4 The Poisson Distribution 4.1 The Fish Distribution? The Poisson distribution is named after Simeon-Denis Poisson (1781–1840). In addition, poisson is French for fish. In this chapter we will

### Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round \$200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

### Variance Reduction. Pricing American Options. Monte Carlo Option Pricing. Delta and Common Random Numbers

Variance Reduction The statistical efficiency of Monte Carlo simulation can be measured by the variance of its output If this variance can be lowered without changing the expected value, fewer replications

### Statistics 104: Section 6!

Page 1 Statistics 104: Section 6! TF: Deirdre (say: Dear-dra) Bloome Email: dbloome@fas.harvard.edu Section Times Thursday 2pm-3pm in SC 109, Thursday 5pm-6pm in SC 705 Office Hours: Thursday 6pm-7pm SC

### The Image Deblurring Problem

page 1 Chapter 1 The Image Deblurring Problem You cannot depend on your eyes when your imagination is out of focus. Mark Twain When we use a camera, we want the recorded image to be a faithful representation