Sales forecasting # 1
1 Sales forecasting # 1 Arthur Charpentier arthur.charpentier@univ-rennes1.fr 1
2 Agenda Qualitative and quantitative methods, a very general introduction Series decomposition Short versus long term forecasting Regression techniques Regression and econometric methods Box & Jenkins ARIMA time series method Forecasting with ARIMA series Practical issues: forecasting with MSExcel 2
3 Some references Major reference for this short course: Pindyck, R.S. & Rubinfeld, D.L. (1997). Econometric Models and Economic Forecasts. McGraw-Hill. A forecast is a quantitative estimate about the likelihood of future events, developed on the basis of past and current information. 3
4 Forecasting challenges? "With over 50 foreign cars already on sale here, the Japanese auto industry isn't likely to carve out a big slice of the U.S. market." - Business Week, 1958. "I think there is a world market for maybe five computers." - Thomas J. Watson, 1943, Chairman of the Board of IBM. "640K ought to be enough for anybody." - Bill Gates, 1981. "Stocks have reached what looks like a permanently high plateau." - Irving Fisher, Professor of Economics, Yale University, October 16,
5 Challenge: use MSExcel (only) to build a forecast model MSExcel is not a statistical software. Dedicated packages can be used instead, e.g. SAS, Gauss, RATS, EViews, S-Plus, or more recently R (a free statistical software). 5
6 Macro versus micro? Macroeconomic Forecasting is related to the prediction of aggregate economic behavior, e.g. GDP, Unemployment, Interest Rates, Exports, Imports, Government Spending, etc. It is a very difficult exercise, which appears frequently in the media. 6
7 Figure 1: Economic growth forecasts (from American Express, University of North Carolina, Goldman Sachs, PNC Financial, Kudlow & Co.), Wall Street Journal, Sept. 12, 2002, for Q4 2002 and later quarters. 7
8 Macro versus micro? Microeconomic Forecasting is related to the prediction of firm sales, industry sales, product sales, prices, costs... It is usually more accurate, and more directly applicable for business managers... The problem is that human behavior is not always rational: there is always unpredictable uncertainty. 8
9 Short versus long term? Figure 2: Forecasting a time series, with different models. 9
10 Short versus long term? Figure 3: Forecasting a time series, with different models. 10
11 Short versus long term? Figure 4: Forecasting financial time series (level of the Nasdaq index, and daily log returns). 11
12 Series decomposition Decomposition assumes that the data consist of data = pattern + error, where the pattern is made of trend, cycle, and seasonality. The general representation is X_t = f(S_t, D_t, C_t, ε_t), where X_t denotes the time series value at time t, S_t the seasonal component at time t (seasonal effect), D_t the trend component at time t (secular trend), C_t the cycle component at time t (cyclical variation), and ε_t the error component at time t (random fluctuations). 12
13 Series decomposition The secular trends are long-run trends that cause changes in an economic data series; three different patterns can be distinguished: linear trend, Ŷ_t = α + βt; constant rate of growth trend, Ŷ_t = Y_0(1 + γ)^t; declining rate of growth trend, Ŷ_t = exp(α − β/t). For the linear trend, adjustments can be made, for instance by introducing breaks. For the constant rate of growth trend, note that log Ŷ_t = log Y_0 + t log(1 + γ), which is a linear model in the logarithm of the series. 13
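As a small illustration of the point above (a Python sketch with made-up numbers, rather than the Excel workflow used in the course): a constant-rate-of-growth trend can be fitted by running ordinary least squares on the logarithm of the series, and the growth rate γ is recovered from the slope.

```python
import math

# Simple OLS for one regressor: returns (intercept, slope).
def ols(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return ybar - b * xbar, b

# Noise-free illustrative series growing 5% per period from Y_0 = 100.
t = list(range(1, 21))
y = [100 * 1.05 ** ti for ti in t]

# Constant-rate-of-growth trend: regress log(Y_t) on t.
a, b = ols(t, [math.log(yi) for yi in y])
gamma_hat = math.exp(b) - 1   # recovered growth rate, ~0.05
y0_hat = math.exp(a)          # recovered starting level, ~100
```

Because the example series is noise-free, the fitted slope equals log(1 + γ) exactly, so exponentiating recovers the growth rate and starting level.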
14 Series decomposition For those two models, standard regression techniques can be used. For the declining rate of growth trend, log Ŷ_t = α − β/t, which is sometimes called a semilog regression model. The cyclical variations are major expansions and contractions in an economic series that are usually longer than a year in duration. The seasonal effects cause variation during a year that tends to be more or less consistent from year to year. From an econometric point of view, a seasonal effect is captured using dummy variables. E.g. for quarterly data, Ŷ_t = α + βt + γ_1 1_{1,t} + γ_2 1_{2,t} + γ_3 1_{3,t} + γ_4 1_{4,t}, where 1_{i,t} is an indicator series, equal to 1 when t is in the i-th quarter, and 0 if not. The random fluctuations cannot be predicted. 14
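A hedged sketch of the seasonal-dummy regression just described, with illustrative numbers (not from the course). Note one practical detail: with an intercept, only three of the four quarterly dummies can be included, otherwise the columns are perfectly collinear (the "dummy variable trap"); here the fourth-quarter effect is absorbed into the intercept.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting (small systems)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        piv = M[i][i]
        M[i] = [v / piv for v in M[i]]
        for r in range(n):
            if r != i:
                f = M[r][i]
                M[r] = [a - f * c for a, c in zip(M[r], M[i])]
    return [M[i][n] for i in range(n)]

season = {1: 5.0, 2: -3.0, 3: 1.0, 4: 0.0}     # illustrative effects, Q4 = baseline
ts = list(range(1, 25))                        # 6 years of quarterly data
y = [10.0 + 2.0 * t + season[(t - 1) % 4 + 1] for t in ts]

# Design matrix: intercept, trend t, and dummies for quarters 1-3.
X = [[1.0, t] + [1.0 if (t - 1) % 4 + 1 == q else 0.0 for q in (1, 2, 3)]
     for t in ts]

# Normal equations X'X beta = X'y, solved directly.
k = len(X[0])
XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
alpha, beta, g1, g2, g3 = solve(XtX, Xty)
```

On this noise-free series the regression recovers the intercept, the trend slope, and the three seasonal effects exactly.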
15 Figure 5: Standard time series model, X t. 15
16 Figure 6: Standard time series model, the linear trend component. 16
17 Figure 7: Removing the linear trend component, X_t − D_t. 17
18 Figure 8: Standard time series model, detecting the cycle on X_t − D_t. 18
19 Figure 9: Standard time series model, X t. 19
20 Figure 10: Removing the linear trend and seasonal component, X_t − D_t − S_t. 20
21 Exogenous versus endogenous variables The model X_t = f(S_t, D_t, C_t, ε_t, Z_t) can also contain exogenous variables Z_t, so that S_t, the seasonal component, can be predicted, i.e. S_{T+1}, S_{T+2}, ..., S_{T+h}; D_t, the trend component, can be predicted, i.e. D_{T+1}, D_{T+2}, ..., D_{T+h}; C_t, the cycle component, can be predicted, i.e. C_{T+1}, C_{T+2}, ..., C_{T+h}; Z_t, the exogenous variables, can be predicted, i.e. Z_{T+1}, Z_{T+2}, ..., Z_{T+h}; but ε_t, the error component, cannot be predicted. 21
22 Exogenous versus endogenous variables As in classical regression models: try to find a model Y_i = X_i'β + ε_i with the highest predictive value. A classical idea in econometrics: compare Ŷ_i and Y_i, which should be as close as possible. E.g. minimize Σ_{i=1}^n (Y_i − Ŷ_i)², the sum of squared errors, which can be related to the R², the MSE, or the RMSE. When dealing with time series, it is possible to add an endogenous component. Endogenous variables are those that the model seeks to explain via the solution of the system of equations. The general model is then X_t = f(S_t, D_t, C_t, ε_t, Z_t, X_{t−1}, X_{t−2}, ..., Z_{t−1}, ..., ε_{t−1}, ...). 22
23 Comparing forecast models In order to evaluate the accuracy - or reliability - of forecasting models, the R² has been seen as a good measure in regression analysis, but the standard is the root mean square error (RMSE), i.e. RMSE = √[(1/n) Σ_{i=1}^n (Y_i − Ŷ_i)²], which is a good measure of the goodness of fit. The smaller the value of the RMSE, the greater the accuracy of a forecasting model. 23
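The RMSE formula above can be sketched directly (a Python illustration with toy numbers; the course does the same computation in Excel). Note the square root, which puts the measure back in the units of the series.

```python
import math

def rmse(y, yhat):
    """Root mean square error: sqrt of the average squared forecast error."""
    n = len(y)
    return math.sqrt(sum((yi - fi) ** 2 for yi, fi in zip(y, yhat)) / n)

# Toy example: two forecasts of the same series; the closer one has the smaller RMSE.
y    = [10.0, 12.0, 14.0, 16.0]
good = [10.5, 11.5, 14.5, 15.5]   # errors of +/- 0.5
bad  = [12.0, 10.0, 16.0, 14.0]   # errors of +/- 2.0
```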
24 Figure 11: Estimation period, ex-ante and ex-post forecasting periods. 24
25 Regression model Consider the following regression model, Y_i = X_i'β + ε_i. Call: lm(formula = weight ~ groupctl + grouptrt - 1) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(>|t|) groupctl e-15 *** grouptrt e-14 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: on 18 degrees of freedom Multiple R-Squared: , Adjusted R-squared: F-statistic: on 2 and 18 DF, p-value: < 2.2e-16 25
26 Least squares estimation Parameters are estimated using ordinary least squares, i.e. β̂ = (X'X)^{-1}X'Y, with E(β̂) = β. Figure 12: Least squares regression of distance on speed (cars data), Y = a + bX. 26
27 Least squares estimation Parameters are estimated using ordinary least squares, i.e. β̂ = (X'X)^{-1}X'Y, with E(β̂) = β. Figure 13: Least squares regression of speed on distance (cars data), X = c + dY. 27
28 Least squares estimation Assuming ε ∼ N(0, σ²), then V(β̂) = σ²(X'X)^{-1}. The variance of residuals σ² can be estimated using ε̂'ε̂/(n − k − 1). It is possible to test H_0: β_i = 0: the statistic β̂_i / (σ̂ √[(X'X)^{-1}]_{i,i}) has a Student t distribution under H_0, with n − k − 1 degrees of freedom. The p-value is the probability, under H_0, of observing a test statistic at least as extreme as the one obtained. The confidence interval for β_i can be obtained easily as [β̂_i − t_{n−k}(1 − α/2) σ̂ √[(X'X)^{-1}]_{i,i} ; β̂_i + t_{n−k}(1 − α/2) σ̂ √[(X'X)^{-1}]_{i,i}], where t_{n−k}(1 − α/2) stands for the (1 − α/2) quantile of the t distribution with n − k degrees of freedom. 28
29 Least squares estimation Figure: pairwise scatterplots of Area, Endemics, and Elevation. 29
30 Least squares estimation The R² is the squared correlation coefficient between the series {Y_1, ..., Y_n} and {Ŷ_1, ..., Ŷ_n}, where Ŷ_i = X_i'β̂. It can be interpreted as the ratio of the variance explained by the regression to the total variance. The adjusted R², denoted R̄², is defined as R̄² = 1 − [(n − 1)/(n − k − 1)](1 − R²). Assume that residuals are N(0, σ²); then Y ∼ N(Xβ, σ²I), and thus it is possible to use maximum likelihood techniques: log L(β, σ | X, Y) = −(n/2) log(2π) − (n/2) log(σ²) − (Y − Xβ)'(Y − Xβ)/(2σ²). The Akaike criterion (AIC) and Schwarz criterion (SBC) can be used to choose a model: AIC = −2 log L + 2k and SBC = −2 log L + k log n. 30
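The Gaussian log-likelihood and the AIC/SBC formulas above can be sketched as follows (a Python illustration on made-up data; k counts the estimated regression parameters). A model fitting x well should beat an intercept-only model on both criteria.

```python
import math

def gaussian_loglik(resid):
    """Maximized Gaussian log-likelihood, with sigma^2 = RSS/n (the MLE)."""
    n = len(resid)
    s2 = sum(e * e for e in resid) / n
    return -n / 2 * math.log(2 * math.pi) - n / 2 * math.log(s2) - n / 2

def aic(loglik, k):
    return -2 * loglik + 2 * k

def sbc(loglik, k, n):
    return -2 * loglik + k * math.log(n)

# Fixed toy data: y depends on x plus small deterministic perturbations.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
e = [0.1, -0.2, 0.15, -0.05, 0.1, -0.1, 0.05, -0.05]
y = [2.0 + 0.5 * xi + ei for xi, ei in zip(x, e)]
n = len(y)

# Model 1: regress y on x by OLS (2 parameters).
xbar, ybar = sum(x) / n, sum(y) / n
b = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
     / sum((xi - xbar) ** 2 for xi in x))
a = ybar - b * xbar
r1 = [yi - (a + b * xi) for xi, yi in zip(x, y)]
aic1, sbc1 = aic(gaussian_loglik(r1), 2), sbc(gaussian_loglik(r1), 2, n)

# Model 2: intercept only (1 parameter).
r2 = [yi - ybar for yi in y]
aic2, sbc2 = aic(gaussian_loglik(r2), 1), sbc(gaussian_loglik(r2), 1, n)
```

The model with the smaller criterion value is preferred; here model 1 wins under both AIC and SBC.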
31 Least squares estimation Fisher's statistic can be used to test globally the significance of the regression, i.e. H_0: β = 0, defined as F = [(n − k)/(k − 1)] · R²/(1 − R²). Additional tests can be run, e.g. to test normality of residuals, such as the Jarque-Bera statistic, defined as JB = (n/6) sk² + (n/24)[κ − 3]², where sk denotes the empirical skewness, and κ the empirical kurtosis. Under the assumption H_0 of normality, JB ∼ χ²(2). 31
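A minimal sketch of the Jarque-Bera statistic (in Python, on a tiny hand-picked sample): skewness and kurtosis are computed from centered sample moments, then combined as in the formula above.

```python
def jarque_bera(x):
    """JB = n/6 * sk^2 + n/24 * (kappa - 3)^2; chi^2(2) under normality."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((xi - m) ** 2 for xi in x) / n
    m3 = sum((xi - m) ** 3 for xi in x) / n
    m4 = sum((xi - m) ** 4 for xi in x) / n
    sk = m3 / m2 ** 1.5      # empirical skewness (0 for a symmetric sample)
    kappa = m4 / m2 ** 2     # empirical kurtosis (3 for a normal distribution)
    return n / 6 * sk ** 2 + n / 24 * (kappa - 3) ** 2, sk, kappa

# Symmetric toy sample: skewness is exactly 0, kurtosis is 1.7.
jb, sk, kappa = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
```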
32 Residuals in linear regression Figure: diagnostic plots for lm(Y ~ X1 + X2): residuals versus fitted values, and normal Q-Q plot of the standardized residuals. 32
33 Prediction in the linear model Given a new observation x_0, the predicted response is x_0'β̂. Note that the associated variance is Var(x_0'β̂) = σ² x_0'(X'X)^{-1}x_0. Since the future observation will be x_0'β + ε (where ε is unknown, and yields additional uncertainty), the confidence interval for this predicted value can be computed as [x_0'β̂ − t_{n−k}(1 − α/2) σ̂ √(1 + x_0'(X'X)^{-1}x_0) ; x_0'β̂ + t_{n−k}(1 − α/2) σ̂ √(1 + x_0'(X'X)^{-1}x_0)], where again t_{n−k}(1 − α/2) stands for the (1 − α/2) quantile of the t distribution with n − k degrees of freedom. Remark: this is rather different from the confidence interval for the mean response, given x_0, which is [x_0'β̂ − t_{n−k}(1 − α/2) σ̂ √(x_0'(X'X)^{-1}x_0) ; x_0'β̂ + t_{n−k}(1 − α/2) σ̂ √(x_0'(X'X)^{-1}x_0)]. 33
34 Prediction in the linear model Figure: confidence and prediction bands for the regression of distance on speed (cars data). 34
35 Regression, basics on statistical regression techniques Remark: statistical uncertainty and parameter uncertainty. Consider i.i.d. observations X_1, ..., X_n from a N(µ, σ²) distribution, where µ is unknown and should be estimated. Step 1: in case σ is known. The natural estimate of the unknown µ is µ̂ = (1/n) Σ_{i=1}^n X_i, and the 95% confidence interval is [µ̂ + u_{2.5%} σ/√n ; µ̂ + u_{97.5%} σ/√n], where u_{2.5%} ≈ −1.96 and u_{97.5%} ≈ +1.96. Both are quantiles of the N(0, 1) distribution. 35
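The known-σ confidence interval above can be sketched with the standard library (a Python illustration on a toy sample with an assumed, hypothetical σ); `statistics.NormalDist` supplies the N(0,1) quantile directly.

```python
from statistics import NormalDist

# 95% confidence interval for mu when sigma is known:
# muhat +/- u_{97.5%} * sigma / sqrt(n), with u_{97.5%} ~ 1.96.
u = NormalDist().inv_cdf(0.975)   # upper 2.5% quantile of N(0, 1)

x = [9.8, 10.2, 10.1, 9.9, 10.0, 10.3, 9.7, 10.0]   # toy sample
sigma = 0.2                                          # assumed known (illustrative)
n = len(x)
muhat = sum(x) / n
half = u * sigma / n ** 0.5
ci = (muhat - half, muhat + half)
```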
36 Regression, basics on statistical regression techniques Step 2: in case σ is unknown. The natural estimate of the unknown µ is still µ̂ = (1/n) Σ_{i=1}^n X_i, and the 95% confidence interval is [µ̂ + t_{2.5%} σ̂/√n ; µ̂ + t_{97.5%} σ̂/√n]. The following table gives values of t_{2.5%} and t_{97.5%} for different values of n. 36
37 Table 1: Quantiles of the t distribution for different values of n. This information is embodied in the form of a model - a single-equation structural model, a multi-equation model, or a time series model. By extrapolating the models beyond the period over which they are estimated, we get forecasts about future events. 37
38 Regression model for time series Consider the following regression model, Y_t = α + βX_t + ε_t, where ε_t ∼ N(0, σ²). Step 1: in case α and β are known. Given a known value X_{T+1}, and if α and β are known, then Ŷ_{T+1} = E(Y_{T+1}) = α + βX_{T+1}. This yields a forecast error, ε̂_{T+1} = Ŷ_{T+1} − Y_{T+1}. This error has two properties: the forecast is unbiased, E(ε̂_{T+1}) = 0; and the forecast error variance is constant, V(ε̂_{T+1}) = E(ε̂²_{T+1}) = σ². 38
39 Regression model for time series Step 2: in case α and β are unknown. The best forecast for Y_{T+1} is then determined from a simple two-stage procedure: estimate the parameters of the linear equation using ordinary least squares; set Ŷ_{T+1} = α̂ + β̂X_{T+1}. Thus, the forecast error is ε̂_{T+1} = Ŷ_{T+1} − Y_{T+1} = (α̂ − α) + (β̂ − β)X_{T+1} − ε_{T+1}. There are thus two sources of error: the additive error term ε_{T+1}, and the random nature of statistical estimation. 39
40 Figure 14: Forecasting techniques, problem of uncertainty related to parameter estimation. 40
41 Regression model for time series Consider the following regression model. Goal of ordinary least squares: minimize Σ_{i=1}^n (Y_i − Ŷ_i)², where Ŷ = α̂ + β̂X. Then β̂ = [n Σ X_i Y_i − Σ X_i Σ Y_i] / [n Σ X_i² − (Σ X_i)²] and α̂ = (Σ Y_i)/n − β̂ (Σ X_i)/n = Ȳ − β̂X̄. The least squares slope can also be written β̂ = Σ(X_i − X̄)(Y_i − Ȳ) / Σ(X_i − X̄)². The forecast error variance is V(ε̂_{T+1}) = V(α̂) + 2X_{T+1} cov(α̂, β̂) + X²_{T+1} V(β̂) + σ². 41
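The two closed-form expressions for the slope above are algebraically identical, which is easy to confirm numerically (a Python sketch on arbitrary toy data):

```python
# Two textbook expressions for the OLS slope, computed on the same toy data:
# raw-sum form:      (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - Sum(x)^2)
# centered form:     Sum((x - xbar)(y - ybar)) / Sum((x - xbar)^2)
x = [1.0, 2.0, 4.0, 5.0, 7.0]
y = [2.1, 2.9, 5.2, 6.1, 7.8]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sxx = sum(xi * xi for xi in x)
b_raw = (n * sxy - sx * sy) / (n * sxx - sx ** 2)

xbar, ybar = sx / n, sy / n
b_centered = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
              / sum((xi - xbar) ** 2 for xi in x))

a = ybar - b_centered * xbar    # intercept: alphahat = ybar - betahat * xbar
```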
42 Regression model for time series Under the assumptions of the linear model: there exists a linear relationship between X and Y, Y = α + βX; the X_i's are nonrandom variables; the errors have zero expected value, E(ε) = 0; the errors have constant variance, V(ε) = σ²; the errors are independent; the errors are normally distributed. 42
43 Regression model and Gauss-Markov theorem Under the first 5 assumptions, the estimators α̂ and β̂ are the best (most efficient) linear unbiased estimators of α and β, in the sense that they have minimum variance among all linear unbiased estimators (i.e. BLUE, best linear unbiased estimators). The two estimators are further asymptotically normal: √n(β̂ − β) → N(0, nσ²/Σ(X_i − X̄)²) and √n(α̂ − α) → N(0, σ² Σ X_i²/Σ(X_i − X̄)²). The variances of α̂ and β̂ can be estimated as V̂(β̂) = σ̂²/Σ(X_i − X̄)² and V̂(α̂) = σ̂² Σ X_i²/(n Σ(X_i − X̄)²), while the covariance is cov(α̂, β̂) = −X̄ σ̂²/Σ(X_i − X̄)². 43
44 Regression model and Gauss-Markov theorem Thus, if σ̂ denotes the estimated standard deviation of ε_{T+1}, the standard deviation s of ε̂_{T+1} can be estimated via ŝ² = σ̂² (1 + 1/T + (X_{T+1} − X̄)²/Σ(X_i − X̄)²) > σ̂². 44
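The forecast standard error formula above can be sketched end to end (a Python illustration on fixed toy data; the deterministic "noise" values are made up). By construction ŝ exceeds σ̂, and it grows the further X_{T+1} lies from the sample mean X̄.

```python
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
e = [0.2, -0.1, 0.05, -0.15, 0.1, -0.1]          # fixed illustrative disturbances
y = [1.0 + 0.8 * xi + ei for xi, ei in zip(x, e)]
n = len(x)

# OLS fit.
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
a = ybar - b * xbar

# Residual variance estimate, sigmahat^2 = RSS / (n - 2).
resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
sigma2 = sum(r * r for r in resid) / (n - 2)

# Forecast standard error at an out-of-sample regressor value.
x_next = 8.0
s2 = sigma2 * (1 + 1 / n + (x_next - xbar) ** 2 / sxx)
s = math.sqrt(s2)
y_next_hat = a + b * x_next
```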
45 RMSE (root mean square error) and Theil's inequality Recall the root mean square error (RMSE), i.e. RMSE = √[(1/n) Σ_{i=1}^n (Y_i − Ŷ_i)²]. Another useful statistic is Theil's inequality coefficient, defined as U = √[(1/T) Σ_{i=1}^T (Ŷ_i − Y_i)²] / (√[(1/T) Σ_{i=1}^T Ŷ_i²] + √[(1/T) Σ_{i=1}^T Y_i²]). From this normalization, U always falls between 0 and 1. U = 0 is a perfect fit, while U = 1 means that the predictive performance is as bad as it could possibly be. 45
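Theil's U as defined above can be sketched as follows (Python, toy numbers). A perfect forecast gives U = 0, and forecasting the exact opposite of the series, Ŷ_i = −Y_i, gives the worst-case value U = 1.

```python
import math

def theil_u(y, yhat):
    """Theil inequality coefficient: 0 = perfect forecast, 1 = worst case."""
    n = len(y)
    num = math.sqrt(sum((f - a) ** 2 for f, a in zip(yhat, y)) / n)
    den = (math.sqrt(sum(f * f for f in yhat) / n)
           + math.sqrt(sum(a * a for a in y) / n))
    return num / den

y    = [3.0, 4.0, 5.0, 6.0]
good = [3.1, 3.9, 5.2, 5.8]           # a decent forecast: small U
worst = [-a for a in y]               # sign-flipped forecast: U = 1
```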
46 Step 3: assume that α, β and X_{T+1} are unknown, but that X̂_{T+1} = X_{T+1} + u_{T+1}, where u_{T+1} ∼ N(0, σ_u²), and the two errors are uncorrelated. Here, the forecast error is ε̂_{T+1} = Ŷ_{T+1} − Y_{T+1} = (α̂ − α) + (β̂ − β)X_{T+1} + β̂u_{T+1} − ε_{T+1}. It can be proved (easily) that E(ε̂_{T+1}) = 0. But its variance is slightly more complicated to derive: V(ε̂_{T+1}) = V(α̂) + 2X_{T+1} cov(α̂, β̂) + (X²_{T+1} + σ_u²) V(β̂) + σ² + β²σ_u². And therefore, the forecast error variance is then s² = σ̂² (1 + 1/T + [(X_{T+1} − X̄)² + σ_u²]/Σ(X_i − X̄)²) + β̂² σ_u² > σ̂², which, again, increases the forecast error. 46
47 To go further, the multiple regression model In the multiple regression model, Y = Xβ + ε, in which Y = (Y_1, Y_2, ..., Y_n)', X is the n × k matrix with rows (X_{1,i}, X_{2,i}, ..., X_{k,i}), β = (β_1, β_2, ..., β_k)' and ε = (ε_1, ε_2, ..., ε_n)', there exists a linear relationship between X_1, ..., X_k and Y, Y = α + β_1 X_1 + ... + β_k X_k; the X_i's are nonrandom variables, and moreover, there is no exact linear relationship between two or more independent variables; the errors have zero expected value, E(ε) = 0; the errors have constant variance, V(ε) = σ²; the errors are independent; 47
48 the errors are normally distributed. The new assumption here is that there is no exact linear relationship between two or more independent variables. If such a relationship exists, the variables are perfectly collinear, i.e. there is perfect collinearity. From a statistical point of view, multicollinearity occurs when two variables are closely related. This might occur e.g. between two series {X_2, X_3, ..., X_T} and {X_1, X_2, ..., X_{T−1}} with strong autocorrelation. 48
49 To go further, forecasting with serially correlated errors In the previous model, errors were homoscedastic. A more general model is obtained when errors are heteroscedastic, i.e. have non-constant variance; the Goldfeld-Quandt test can be performed. An alternative is to assume serial correlation; the Cochrane-Orcutt or Hildreth-Lu procedures can be performed. Consider the following regression model, with −1 < ρ < +1 and η_t ∼ N(0, σ²): Y_t = α + βX_t + ε_t, where ε_t = ρε_{t−1} + η_t. Step 1: assume that α, β and ρ are known. Then Ŷ_{T+1} = α + βX_{T+1} + ε̂_{T+1} = α + βX_{T+1} + ρε_T, assuming that ε̂_{T+1} = ρε_T. Recursively, ε̂_{T+2} = ρ ε̂_{T+1} = ρ² ε_T, 49
50 ε̂_{T+3} = ρ ε̂_{T+2} = ρ³ ε_T, and more generally ε̂_{T+h} = ρ ε̂_{T+h−1} = ρ^h ε_T. Since |ρ| < 1, ρ^h approaches 0 as h gets arbitrarily large. Hence, the information provided by serial correlation becomes less and less useful. Ŷ_{T+1} = α(1 − ρ) + βX_{T+1} + ρ(Y_T − βX_T). Since Y_T = α + βX_T + ε_T, then Ŷ_{T+1} = α + βX_{T+1} + ρε_T. Thus, the forecast error is ε̂_{T+1} = Ŷ_{T+1} − Y_{T+1} = ρε_T − ε_{T+1}. 50
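The recursion ε̂_{T+h} = ρ^h ε_T above is easy to sketch (Python, with illustrative parameter values): the serial-correlation correction shrinks geometrically, so long-horizon forecasts revert to α + βX.

```python
# Multi-step forecasts with known AR(1) errors: Yhat_{T+h} = alpha + beta*X_{T+h} + rho^h * eps_T.
# All parameter values are illustrative, not from the course.
alpha, beta, rho = 2.0, 0.5, 0.5
eps_T = 1.6                       # last observed residual
x_future = [10.0, 10.0, 10.0, 10.0]

forecasts = []
corr = eps_T
for xh in x_future:
    corr *= rho                   # epshat_{T+h} = rho^h * eps_T
    forecasts.append(alpha + beta * xh + corr)
```

With α + βX = 7.0, the corrections are 0.8, 0.4, 0.2, 0.1, so the forecasts decay toward 7.0.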
51 To go further, using lag models We have mentioned earlier that when dealing with time series, it is possible not only to consider the linear regression of Y_t on X_t, but also to consider lagged variates: either X_{t−1}, X_{t−2}, X_{t−3}, ... etc., or Y_{t−1}, Y_{t−2}, Y_{t−3}, ... etc. First, we will focus on adding lagged exogenous explanatory variables, i.e. models such as Y_t = α + β_0 X_t + β_1 X_{t−1} + β_2 X_{t−2} + ... + β_h X_{t−h} + ... + ε_t. Remark: in a very general setting X_t can be a random vector in R^k. 51
52 To go further, a geometric lag model Assume that the weights of the lagged explanatory variables are all positive and decline geometrically with time, Y_t = α + β(X_t + ωX_{t−1} + ω²X_{t−2} + ω³X_{t−3} + ... + ω^h X_{t−h} + ...) + ε_t, with 0 < ω < 1. Note that Y_{t−1} = α + β(X_{t−1} + ωX_{t−2} + ω²X_{t−3} + ... + ω^h X_{t−h−1} + ...) + ε_{t−1}, so that Y_t − ωY_{t−1} = α(1 − ω) + βX_t + η_t, where η_t = ε_t − ωε_{t−1}. Rewriting, Y_t = α(1 − ω) + ωY_{t−1} + βX_t + η_t. 52
53 To go further, a geometric lag model This would be called a single-equation autoregressive model, with a single lagged dependent variable. The presence of a lagged dependent variable in the model causes ordinary least-squares parameter estimates to be biased, although they remain consistent. 53
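The algebra of the Koyck transformation above can be checked numerically (a Python sketch in the noise-free case, with made-up parameter values and X assumed zero before the sample): the infinite distributed-lag form and the autoregressive form generate exactly the same series.

```python
# Verify the Koyck transformation numerically (noise-free case).
alpha, beta, omega = 1.0, 2.0, 0.6
x = [0.0, 1.0, 0.5, 2.0, 1.5, 0.0, 3.0, 1.0]   # X_t, with X = 0 before t = 0

# Distributed-lag form: Y_t = alpha + beta * sum_j omega^j * X_{t-j}.
y_lag = []
for t in range(len(x)):
    s = sum(omega ** j * x[t - j] for j in range(t + 1))
    y_lag.append(alpha + beta * s)

# Autoregressive (Koyck) form: Y_t = alpha*(1 - omega) + omega*Y_{t-1} + beta*X_t.
y_ar = []
prev = alpha                     # steady-state level when X = 0 before the sample
for xt in x:
    prev = alpha * (1 - omega) + omega * prev + beta * xt
    y_ar.append(prev)
```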
54 Estimation of parameters In classical linear econometrics, Y = Xβ + ε, with ε ∼ N(0, σ²). Then β̂ = (X'X)^{-1}X'Y is the ordinary least squares (OLS) estimator, and is also the maximum likelihood (ML) estimator. The maximum likelihood estimator is consistent and asymptotically efficient, and its (asymptotic) variance can be determined. It can be obtained using optimization techniques. Remark: it is also possible to use the generalized method of moments (GMM). 54
55 To go further, modeling a qualitative variable In some cases, the variable of interest is not a continuous variable (such as a price on R), but a binary variable. Consider the following regression model, Y_i = α + βX_i + ε_i, with Y_i ∈ {0, 1}, where the ε_i are independent random variables with zero mean. Then E(Y_i) = α + βX_i. Note that Y_i then has a Bernoulli (binomial) distribution. Classical models are either the probit or the logit model. The idea is that there exists a continuous latent unobservable Y*_i such that Y_i = 1 if Y*_i > t_i, and Y_i = 0 if Y*_i ≤ t_i, with Y*_i = α + βX_i + ε_i, which is now a classical regression model. Equivalently, it means that Y_i has a Bernoulli (binomial) distribution B(p_i), 55
56 where p_i = F(α + βX_i), and F is a cumulative distribution function. If F is the cumulative distribution function of the N(0,1) distribution, i.e. F(x) = (1/√(2π)) ∫_{−∞}^x exp(−z²/2) dz, we obtain the probit model; if F is the cumulative distribution function of the logistic distribution, F(x) = 1/(1 + exp(−x)), we obtain the logit model. Those models can be extended to the so-called ordered probit model, where Y can denote e.g. a rating (AAA, BB+, B-, ... etc.). Maximum likelihood techniques can be used. 56
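The two link functions just defined can be sketched with the standard library (Python; the standard normal CDF is expressed through the error function, and the α, β values below are hypothetical). Both map any real index α + βx_i into a probability p_i in (0, 1).

```python
import math

def probit_cdf(z):
    """Standard normal CDF, via the error function: 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_cdf(z):
    """Logistic CDF: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

# p_i = F(alpha + beta * x_i) for hypothetical parameter values.
alpha, beta = -1.0, 0.5
xs = (0.0, 2.0, 4.0)
p_probit = [probit_cdf(alpha + beta * xi) for xi in xs]
p_logit = [logit_cdf(alpha + beta * xi) for xi in xs]
```

Both CDFs equal 0.5 at zero and are increasing, so larger X_i values yield larger success probabilities under either model.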
57 Modeling the random component The unpredictable random component is the key element when forecasting. Most of the uncertainty comes from this random component ε_t. The lower its variance, the smaller the uncertainty on forecasts. The general theoretical framework related to the randomness of time series is weak stationarity. 57
More informationForecasting Using Eviews 2.0: An Overview
Forecasting Using Eviews 2.0: An Overview Some Preliminaries In what follows it will be useful to distinguish between ex post and ex ante forecasting. In terms of time series modeling, both predict values
More informationANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? 1. INTRODUCTION
ANNUITY LAPSE RATE MODELING: TOBIT OR NOT TOBIT? SAMUEL H. COX AND YIJIA LIN ABSTRACT. We devise an approach, using tobit models for modeling annuity lapse rates. The approach is based on data provided
More informationSYSTEMS OF REGRESSION EQUATIONS
SYSTEMS OF REGRESSION EQUATIONS 1. MULTIPLE EQUATIONS y nt = x nt n + u nt, n = 1,...,N, t = 1,...,T, x nt is 1 k, and n is k 1. This is a version of the standard regression model where the observations
More informationCorrelation and Simple Linear Regression
Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a
More informationPlease follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software
STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used
More informationEDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION
EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION EDUCATION AND VOCABULARY 5-10 hours of input weekly is enough to pick up a new language (Schiff & Myers, 1988). Dutch children spend 5.5 hours/day
More informationThe VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.
Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium
More informationUnivariate Time Series Analysis; ARIMA Models
Econometrics 2 Spring 25 Univariate Time Series Analysis; ARIMA Models Heino Bohn Nielsen of4 Outline of the Lecture () Introduction to univariate time series analysis. (2) Stationarity. (3) Characterizing
More informationVI. Introduction to Logistic Regression
VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models
More informationThe Loss in Efficiency from Using Grouped Data to Estimate Coefficients of Group Level Variables. Kathleen M. Lang* Boston College.
The Loss in Efficiency from Using Grouped Data to Estimate Coefficients of Group Level Variables Kathleen M. Lang* Boston College and Peter Gottschalk Boston College Abstract We derive the efficiency loss
More informationUnivariate Time Series Analysis; ARIMA Models
Econometrics 2 Fall 25 Univariate Time Series Analysis; ARIMA Models Heino Bohn Nielsen of4 Univariate Time Series Analysis We consider a single time series, y,y 2,..., y T. We want to construct simple
More informationTIME SERIES ANALYSIS
TIME SERIES ANALYSIS Ramasubramanian V. I.A.S.R.I., Library Avenue, New Delhi- 110 012 ram_stat@yahoo.co.in 1. Introduction A Time Series (TS) is a sequence of observations ordered in time. Mostly these
More informationOverview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)
Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and
More informationDepartment of Mathematics, Indian Institute of Technology, Kharagpur Assignment 2-3, Probability and Statistics, March 2015. Due:-March 25, 2015.
Department of Mathematics, Indian Institute of Technology, Kharagpur Assignment -3, Probability and Statistics, March 05. Due:-March 5, 05.. Show that the function 0 for x < x+ F (x) = 4 for x < for x
More informationCh.3 Demand Forecasting.
Part 3 : Acquisition & Production Support. Ch.3 Demand Forecasting. Edited by Dr. Seung Hyun Lee (Ph.D., CPL) IEMS Research Center, E-mail : lkangsan@iems.co.kr Demand Forecasting. Definition. An estimate
More informationStatistics 104: Section 6!
Page 1 Statistics 104: Section 6! TF: Deirdre (say: Dear-dra) Bloome Email: dbloome@fas.harvard.edu Section Times Thursday 2pm-3pm in SC 109, Thursday 5pm-6pm in SC 705 Office Hours: Thursday 6pm-7pm SC
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2
University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages
More informationWe extended the additive model in two variables to the interaction model by adding a third term to the equation.
Quadratic Models We extended the additive model in two variables to the interaction model by adding a third term to the equation. Similarly, we can extend the linear model in one variable to the quadratic
More informationGLM I An Introduction to Generalized Linear Models
GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial
More informationLecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
More informationConcepts in Investments Risks and Returns (Relevant to PBE Paper II Management Accounting and Finance)
Concepts in Investments Risks and Returns (Relevant to PBE Paper II Management Accounting and Finance) Mr. Eric Y.W. Leung, CUHK Business School, The Chinese University of Hong Kong In PBE Paper II, students
More informationTesting for Granger causality between stock prices and economic growth
MPRA Munich Personal RePEc Archive Testing for Granger causality between stock prices and economic growth Pasquale Foresti 2006 Online at http://mpra.ub.uni-muenchen.de/2962/ MPRA Paper No. 2962, posted
More informationChapter 10: Basic Linear Unobserved Effects Panel Data. Models:
Chapter 10: Basic Linear Unobserved Effects Panel Data Models: Microeconomic Econometrics I Spring 2010 10.1 Motivation: The Omitted Variables Problem We are interested in the partial effects of the observable
More information2. What is the general linear model to be used to model linear trend? (Write out the model) = + + + or
Simple and Multiple Regression Analysis Example: Explore the relationships among Month, Adv.$ and Sales $: 1. Prepare a scatter plot of these data. The scatter plots for Adv.$ versus Sales, and Month versus
More informationForecasting methods applied to engineering management
Forecasting methods applied to engineering management Áron Szász-Gábor Abstract. This paper presents arguments for the usefulness of a simple forecasting application package for sustaining operational
More informationLecture 15. Endogeneity & Instrumental Variable Estimation
Lecture 15. Endogeneity & Instrumental Variable Estimation Saw that measurement error (on right hand side) means that OLS will be biased (biased toward zero) Potential solution to endogeneity instrumental
More informationInstitute of Actuaries of India Subject CT3 Probability and Mathematical Statistics
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in
More informationA Primer on Forecasting Business Performance
A Primer on Forecasting Business Performance There are two common approaches to forecasting: qualitative and quantitative. Qualitative forecasting methods are important when historical data is not available.
More informationLinear Classification. Volker Tresp Summer 2015
Linear Classification Volker Tresp Summer 2015 1 Classification Classification is the central task of pattern recognition Sensors supply information about an object: to which class do the object belong
More informationUnivariate Regression
Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationTime Series Analysis
Time Series Analysis Identifying possible ARIMA models Andrés M. Alonso Carolina García-Martos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and García-Martos
More informationTIME SERIES ANALYSIS
TIME SERIES ANALYSIS L.M. BHAR AND V.K.SHARMA Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-0 02 lmb@iasri.res.in. Introduction Time series (TS) data refers to observations
More informationTime Series Analysis and Forecasting
Time Series Analysis and Forecasting Math 667 Al Nosedal Department of Mathematics Indiana University of Pennsylvania Time Series Analysis and Forecasting p. 1/11 Introduction Many decision-making applications
More information**BEGINNING OF EXAMINATION** The annual number of claims for an insured has probability function: , 0 < q < 1.
**BEGINNING OF EXAMINATION** 1. You are given: (i) The annual number of claims for an insured has probability function: 3 p x q q x x ( ) = ( 1 ) 3 x, x = 0,1,, 3 (ii) The prior density is π ( q) = q,
More informationThe Use of Event Studies in Finance and Economics. Fall 2001. Gerald P. Dwyer, Jr.
The Use of Event Studies in Finance and Economics University of Rome at Tor Vergata Fall 2001 Gerald P. Dwyer, Jr. Any views are the author s and not necessarily those of the Federal Reserve Bank of Atlanta
More informationAugust 2012 EXAMINATIONS Solution Part I
August 01 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) () In a large random sample,
More informationPart II. Multiple Linear Regression
Part II Multiple Linear Regression 86 Chapter 7 Multiple Regression A multiple linear regression model is a linear model that describes how a y-variable relates to two or more xvariables (or transformations
More informationPITFALLS IN TIME SERIES ANALYSIS. Cliff Hurvich Stern School, NYU
PITFALLS IN TIME SERIES ANALYSIS Cliff Hurvich Stern School, NYU The t -Test If x 1,..., x n are independent and identically distributed with mean 0, and n is not too small, then t = x 0 s n has a standard
More informationChapter 6: Multivariate Cointegration Analysis
Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration
More informationTime Series Analysis
Time Series Analysis Forecasting with ARIMA models Andrés M. Alonso Carolina García-Martos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and García-Martos (UC3M-UPM)
More informationChapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing
More informationChapter 1. Vector autoregressions. 1.1 VARs and the identi cation problem
Chapter Vector autoregressions We begin by taking a look at the data of macroeconomics. A way to summarize the dynamics of macroeconomic data is to make use of vector autoregressions. VAR models have become
More informationNon-Stationary Time Series andunitroottests
Econometrics 2 Fall 2005 Non-Stationary Time Series andunitroottests Heino Bohn Nielsen 1of25 Introduction Many economic time series are trending. Important to distinguish between two important cases:
More informationMaster s Theory Exam Spring 2006
Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem
More informationGeneralized Linear Models
Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the
More informationGenerating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan. 15.450, Fall 2010
Simulation Methods Leonid Kogan MIT, Sloan 15.450, Fall 2010 c Leonid Kogan ( MIT, Sloan ) Simulation Methods 15.450, Fall 2010 1 / 35 Outline 1 Generating Random Numbers 2 Variance Reduction 3 Quasi-Monte
More informationRecent Developments of Statistical Application in. Finance. Ruey S. Tsay. Graduate School of Business. The University of Chicago
Recent Developments of Statistical Application in Finance Ruey S. Tsay Graduate School of Business The University of Chicago Guanghua Conference, June 2004 Summary Focus on two parts: Applications in Finance:
More information2. Descriptive statistics in EViews
2. Descriptive statistics in EViews Features of EViews: Data processing (importing, editing, handling, exporting data) Basic statistical tools (descriptive statistics, inference, graphical tools) Regression
More information