The R² will always increase (and the SSR and MSE will always decrease) each time you add another variable to the right-hand side of the regression.
- Jemima Parker
- 7 years ago
In-Sample Overfitting

There is a serious danger in looking at the simple R² (or SSR or MSE) to select among competing forecast models. The R² will increase (and the SSR and MSE will decrease), or at worst stay unchanged, each time you add another variable to the right-hand side of the regression.

So, for example, if we mechanically apply these criteria to select the trend model from among models of the form

$$T_t = \beta_0 + \beta_1 t + \beta_2 t^2 + \cdots + \beta_p t^p$$

these criteria will direct us to choose p as large as possible.

For the HEPI data, I fit the HEPI to a trend model for p = 1, 2, 3, 4. Here are the results.

[Table: SSR by p; the values did not survive transcription.]

So? What's the problem?
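The mechanical fall in SSR as p grows can be seen in a small sketch. It is only an illustration under stated assumptions: the series is synthetic (a quadratic trend plus noise, not the HEPI data), the helper names `ols` and `trend_ssr` are hypothetical, and time is rescaled to t/T purely for numerical stability (this does not change the fitted values or the SSR). Only the Python standard library is used; in practice a numerical library would do the fitting.

```python
import random

def ols(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def trend_ssr(y, p):
    """SSR from fitting a degree-p polynomial in (rescaled) time to y."""
    T = len(y)
    X = [[(t / T) ** j for j in range(p + 1)] for t in range(1, T + 1)]
    b = ols(X, y)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, row))) ** 2
               for row, yi in zip(X, y))

rng = random.Random(0)
# Synthetic series: quadratic trend plus noise (an assumption, not the HEPI data).
y = [100 + 2.0 * t + 0.05 * t ** 2 + rng.gauss(0, 5) for t in range(1, 41)]

ssr_by_p = {p: trend_ssr(y, p) for p in range(1, 5)}
for p, ssr in ssr_by_p.items():
    print(p, round(ssr, 2))
```

Because the models are nested, each higher-order fit can do no worse in-sample than the one before it, which is exactly why SSR alone cannot be used to pick p.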
The problem is that improving the fit of the model over the sample period by adding additional variables can easily lead to poorer out-of-sample forecasts! The more variables you add to the right-hand side of the forecasting model, the higher the variance of the forecast error!

So, there is a tradeoff that needs to be addressed:

- On the one hand, adding variables to the regression improves the fit of the model.
- On the other hand, adding variables to the regression increases the variance of the forecast error.

The simple R², SSR, and MSE criteria ignore the negative side effects of adding variables to the regression.
We need criteria that account for both the benefits and costs of adding variables to the regression.

- Adjusted R²: select the model that maximizes the adjusted R².
- Akaike Information Criterion (AIC): select the model that minimizes the AIC.
- Schwarz Information Criterion (SIC): select the model that minimizes the SIC.

Let's first look at each of these measures.
Adjusted R²

$$\bar{R}^2 = 1 - \frac{s^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2 / (T-1)}$$

where $s^2 = SSR/(T-k)$, s is the standard error of the regression, and k is the number of regression parameters.

Note that the denominator term depends only on $y_1, \ldots, y_T$ (not on the model that the y's are fit to). So, the only thing that will change as we fit different models will be s². In effect, maximizing the adjusted R² amounts to minimizing the squared standard error of the regression, SSR/(T-k), whereas maximizing the simple R² amounts to minimizing the MSE of the regression, SSR/T.

As we increase the number of variables in the model, SSR will decrease, T-k will decrease, and s² will increase or decrease depending on whether SSR is decreasing less than or more than proportionally to the (linear) decrease in T-k.
AIC and SIC

$$\log(AIC) = \log(SSR/T) + 2k/T$$

$$\log(SIC) = \log(SSR/T) + k \log(T)/T$$

(Note that the AIC and SIC reported by EViews are log(AIC) and log(SIC), but they do not use exactly the same formulas as those given above, which are the ones used in the text, pp. … In fact, there are a number of variations of the AIC and SIC that are used by different books and programs. They all imply the same results with regard to ordering models according to these criteria.)

Note that the AIC and SIC values will be decreasing as additional variables are added to the regression through the first term but will be increasing through the second term, i.e., the penalty term. The preferred model is the one that minimizes the AIC (or SIC).
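As a rough sketch, the three criteria can be computed directly from SSR, T, and k using the formulas given in these notes. The function names and the SSR values below are hypothetical illustrations (they are not the HEPI results), and `tss` stands for the total sum of squares of y around its mean.

```python
import math

def adjusted_r2(ssr, tss, T, k):
    """Adjusted R^2 = 1 - [SSR/(T-k)] / [TSS/(T-1)]."""
    return 1.0 - (ssr / (T - k)) / (tss / (T - 1))

def log_aic(ssr, T, k):
    """log(AIC) = log(SSR/T) + 2k/T."""
    return math.log(ssr / T) + 2.0 * k / T

def log_sic(ssr, T, k):
    """log(SIC) = log(SSR/T) + k*log(T)/T."""
    return math.log(ssr / T) + k * math.log(T) / T

# Hypothetical SSRs for trend models with k = p + 1 parameters and T = 40.
T, tss = 40, 5000.0
ssr_by_k = {2: 2400.0, 3: 950.0, 4: 940.0, 5: 935.0}

for k, ssr in ssr_by_k.items():
    print(k,
          round(adjusted_r2(ssr, tss, T, k), 4),
          round(log_aic(ssr, T, k), 4),
          round(log_sic(ssr, T, k), 4))

# Adjusted R^2 is maximized; AIC and SIC are minimized.
best_adj = max(ssr_by_k, key=lambda k: adjusted_r2(ssr_by_k[k], tss, T, k))
best_aic = min(ssr_by_k, key=lambda k: log_aic(ssr_by_k[k], T, k))
best_sic = min(ssr_by_k, key=lambda k: log_sic(ssr_by_k[k], T, k))
print(best_adj, best_aic, best_sic)
```

In this made-up example the SSR keeps falling as k rises, but the drop from k = 3 to k = 5 is too small to offset the penalty, so all three criteria settle on k = 3.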
Unfortunately, these three criteria (adjusted R², AIC, SIC) will not always select the same model! That is because of the differences in their penalty functions, as illustrated by Figure 4.13 in your text. The SIC imposes the strongest penalty for additional variables, followed by the AIC and then the adjusted R². So, when they select different models, the SIC will choose a more parsimonious model than the AIC, which, in turn, will choose a more parsimonious model than the adjusted R².

The AIC and SIC are more commonly used than the adjusted R². Each of the two has certain (but different) theoretical properties that make it appealing. Which of the two to use in practice when they give different answers is somewhat arbitrary. If we maintain the KISS principle and select the simpler model when there is not a compelling reason to do otherwise, then we should use the SIC.
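A short worked comparison, using the log forms of the AIC and SIC given in these notes, shows when the SIC penalty is the larger of the two:

```latex
\frac{k \log(T)}{T} > \frac{2k}{T}
\iff \log(T) > 2
\iff T > e^{2} \approx 7.39
```

So for any sample with at least 8 observations, each additional parameter costs more under the SIC than under the AIC, and the gap widens as T grows.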
Compute the criteria for the polynomial trends for the HEPI.

[Table: adjusted R² (R-bar-squared), AIC, and SIC by p; the values did not survive transcription.]

[Table: Year, Forecast (p = 2), Forecast (p = 5), Actual, with growth rates in parentheses; most entries did not survive transcription. Surviving fragments, in source order: (3.4%) 44.1 (5.0%); (3.4%) 57.8 (5.5%); (3.3%) 73.7 (6.0%); (3.3%) 9. (6.5%); (3.%) (7.1%); (3.%) 338.5 (7.6%).]

The AIC and SIC select a 5th-order polynomial in t to represent the trend component of the HEPI. There are, however, a couple of reasons why I might still end up selecting the quadratic trend:

1. The AIC and SIC are in-sample fit criteria. And although they account for the costs of overfitting through the inclusion of a penalty term, I am still concerned that extrapolating such a high-order polynomial into the future will be misleading.
2. What might be going on with this series is that the actual trend is a linear or quadratic function of time but the parameters of that function have changed during the sample period. E.g., perhaps

$$y_t = \beta_{0,1} + \beta_{1,1} t \quad \text{for } t = 1, \ldots, T_0$$

$$y_t = \beta_{0,2} + \beta_{1,2} t \quad \text{for } t = T_0+1, \ldots, T, T+1, \ldots$$

There are a number of things that I can do to pursue these possibilities.
Out-of-Sample Fitting

What I am really interested in is the question: Having fit the model over the sample period, how well does it forecast outside of that sample? The in-sample fit criteria that we discussed do not directly answer this question.

Consider the following exercise. Suppose we have a data sample $y_1, \ldots, y_T$.

1. Break it up into two parts, where n << T:
   - $y_1, \ldots, y_{T-n}$ (first T-n observations)
   - $y_{T-n+1}, \ldots, y_T$ (last n observations)
2. Fit the shortened sample, $y_1, \ldots, y_{T-n}$, to various trend models that may seem like plausible choices based on time series plots, in-sample fit criteria, …: linear, quadratic, the one selected by AIC/SIC, log linear, …
3. For each estimated trend model, forecast $y_{T-n+1}, \ldots, y_T$ and compute the forecast errors $e_1, \ldots, e_n$.
4. Compare the errors across the various models:
   - time series plots (of the forecasts and actual values of $y_{T-n+1}, \ldots, y_T$; of the forecast errors)
   - tables of the forecasts, actuals, and errors
   - mean squared prediction errors (MSPE)

$$MSPE = \frac{1}{n} \sum_{i=1}^{n} e_i^2$$
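The four steps above can be sketched as follows. This is a minimal illustration, not the procedure used for the HEPI results: the data are synthetic (a linear trend plus noise), the helper names `ols`, `poly_row`, and `holdout_mspe` are hypothetical, and time is rescaled by T only to keep the pure-Python solver numerically stable.

```python
import random

def ols(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def poly_row(t, p, scale):
    """Regressors 1, t, ..., t^p with time rescaled by `scale`."""
    return [(t / scale) ** j for j in range(p + 1)]

def holdout_mspe(y, p, n):
    """Fit a degree-p trend to the first T-n points; return the MSPE over the last n."""
    T = len(y)
    X_fit = [poly_row(t, p, T) for t in range(1, T - n + 1)]
    b = ols(X_fit, y[:T - n])
    errors = [y[t - 1] - sum(bj * xj for bj, xj in zip(b, poly_row(t, p, T)))
              for t in range(T - n + 1, T + 1)]
    return sum(e * e for e in errors) / n

rng = random.Random(1)
# Synthetic series: linear trend plus noise (an assumption for illustration).
y = [100 + 2.0 * t + rng.gauss(0, 5) for t in range(1, 61)]

mspe = {p: holdout_mspe(y, p, n=8) for p in (1, 2, 5)}
for p, value in mspe.items():
    print(p, round(value, 2))
```

Once a model is chosen this way, it would be re-estimated on all T observations before forecasting beyond the sample.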
The advantage of this approach is that we are actually comparing the trend models in terms of their out-of-sample forecasting performance. A disadvantage is that the comparison is based on models fit over T-n observations rather than the T observations we have available. (Note that if you do use this approach and, for example, settle on the quadratic model, then when you proceed to construct your forecasts for T+1, … you should use the quadratic model fit to the full T observations in your sample.)

Will the fact that, for example, the quadratic trend model outperformed other models in forecasting out of sample based on the short sample mean that it will perform best in forecasting beyond the full sample? No.
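Structural Breaks in the Trend

Suppose that the trend in $y_t$ can be modeled as

$$T_t = \beta_{0,t} + \beta_{1,t} t$$

where

$$\beta_{0,t} = \begin{cases} \beta_{0,1} & \text{if } t \le T_0 \\ \beta_{0,2} & \text{if } t > T_0 \end{cases}
\qquad \text{and} \qquad
\beta_{1,t} = \begin{cases} \beta_{1,1} & \text{if } t \le T_0 \\ \beta_{1,2} & \text{if } t > T_0 \end{cases}$$

In this case,

$$T_{T+h} = \beta_{0,2} + \beta_{1,2} (T+h)$$

Problem: How to estimate $\beta_{0,2}$ and $\beta_{1,2}$?

A bad approach: Regress $y_t$ on 1, t for t = 1, …, T.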
Better approaches

Regress $y_t$ on 1, t for t = $T_0+1, \ldots, T$.

Problems with this approach:

- Not an ideal approach if you want to force either the intercept or slope coefficient to be fixed over the full sample, t = 1, …, T, allowing only one of the coefficients to change at $T_0$.
- Does not allow you to test whether the intercept and/or slope changed at $T_0$.
- Does not provide us with estimated deviations from trend for t = 1, …, $T_0$, which we will want to use to estimate the seasonal and cyclical components of the series to help us forecast those components of the series.
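Introduce dummy variables into the regression to jointly estimate $\beta_{0,1}$, $\beta_{0,2}$, $\beta_{1,1}$, $\beta_{1,2}$.

Let

$$D_t = \begin{cases} 0 & \text{if } t = 1, \ldots, T_0 \\ 1 & \text{if } t > T_0 \end{cases}$$

Run the regression

$$y_t = \alpha_0 + \alpha_1 D_t + \alpha_2 t + \alpha_3 (D_t t) + \varepsilon_t$$

over the full sample, t = 1, …, T. Then

$$\hat\beta_{0,1} = \hat\alpha_0, \quad \hat\beta_{0,2} = \hat\alpha_0 + \hat\alpha_1, \quad \hat\beta_{1,1} = \hat\alpha_2, \quad \hat\beta_{1,2} = \hat\alpha_2 + \hat\alpha_3$$

Suppose we want to allow $\beta_0$ to change at $T_0$ but we want to force $\beta_1$ to remain fixed (i.e., a shift in the intercept of the trend line). Run the regression of $y_t$ on 1, $D_t$, and t to estimate $\alpha_0$, $\alpha_1$, and $\alpha_2$ (= $\beta_1$).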
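A minimal sketch of the dummy-variable regression, assuming a synthetic noise-free series with a known break so that the recovered parameters can be checked against the truth. The break date, the parameter values, and the helper name `ols` are all hypothetical.

```python
def ols(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

T, T0 = 40, 25
# True (hypothetical) trend parameters before and after the break at T0.
b01, b11, b02, b12 = 10.0, 1.0, -5.0, 2.0
y = [b01 + b11 * t if t <= T0 else b02 + b12 * t for t in range(1, T + 1)]

# D_t = 0 for t <= T0 and 1 afterwards; regressors are 1, D_t, t, D_t * t.
D = [0 if t <= T0 else 1 for t in range(1, T + 1)]
X = [[1.0, d, t, d * t] for t, d in zip(range(1, T + 1), D)]
a0, a1, a2, a3 = ols(X, y)

# Map the alphas back to the regime-specific intercepts and slopes.
beta = {"b01": a0, "b02": a0 + a1, "b11": a2, "b12": a2 + a3}
print({name: round(value, 6) for name, value in beta.items()})
```

Because the full sample is used, the same regression also yields residuals for t = 1, …, T0 and lets the coefficients on the dummy terms be tested directly.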
Notes

- This approach extends to higher-order polynomials in a straightforward way, allowing one or more parameters to change at one or more points in time.
- This approach can be extended to allow for breaks at unknown time(s).
Exponential (or Log Linear) Trends

Recall that an alternative to the polynomial trend is the exponential trend model

$$T_t = e^{\beta_0 + \beta_1 t + \beta_2 t^2 + \cdots + \beta_p t^p}$$

whose log is a polynomial in t, since $\log(e^x) = x$.

Assuming that $y_t = T_t + \varepsilon_t$, we can estimate the β's by applying nonlinear least squares: choose $\beta_0, \beta_1, \ldots, \beta_p$ to minimize

$$\sum_{t=1}^{T} \left( y_t - e^{\beta_0 + \beta_1 t + \beta_2 t^2 + \cdots + \beta_p t^p} \right)^2$$

This minimization problem must be solved numerically (vs. analytically), but most modern regression software (including EViews) is well equipped to solve this problem.
To select p in this case, use the NLS residuals, $y_t - T_t(\hat\beta)$, to compute the AIC and/or the SIC, then select the model that minimizes the AIC and/or the SIC. We can also compare the fit of these exponential trend models to the polynomial trend models by comparing AICs and SICs.

If we do select an estimated exponential trend model, the forecast of $y_{T+h}$ made at time T is

$$\hat{y}_{T+h,T} = T_{T+h}(\hat\beta) + \hat\varepsilon_{T+h,T}$$
A related approach that is commonly used in practice

Assume that

$$\log(y_t) = T_t + \varepsilon_t, \qquad T_t = \beta_0 + \beta_1 t + \cdots + \beta_p t^p$$

(The ε's are deviations of log(y) from its trend, vs. deviations of y from its trend.)

In this case,

- we can fit $\log(y_t)$ to 1, t, …, $t^p$ by OLS to estimate the β's
- we can select p by minimizing the AIC and/or SIC across these regressions

The forecasts are

$$\widehat{\log(y)}_{T+h,T} = T_{T+h}(\hat\beta) + \hat\varepsilon_{T+h,T} = T_{T+h}(\hat\beta) \quad \text{if the } \varepsilon\text{'s are i.i.d.}$$

$$\hat{y}_{T+h,T} = e^{\widehat{\log(y)}_{T+h,T}}$$
This approach has the advantage of relying on OLS vs. NLS, but:

- Although this approach produces an unbiased forecast of $\log(y_{T+h})$, it produces a biased forecast of $y_{T+h}$, because $E[f(x)] \ne f(E[x])$ in general if f is nonlinear. There has been some work done on ways to adjust the forecast to reduce this bias.
- NLS is not particularly difficult or unreliable, especially in this setting.

We should also note that the AICs and SICs from the log(y) regressions cannot be meaningfully compared to the AICs and SICs from the y regressions, so it is difficult to choose between the log linear models and the polynomial trend models based on in-sample fits.
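The retransformation bias can be illustrated with a small simulation. This sketch adds an assumption that is not in the notes: the deviations of log(y) from trend are normally distributed with mean 0 and standard deviation σ = 0.5. Under that assumption the mean of exp(ε) is exp(σ²/2), which is greater than exp(0) = 1, so exponentiating an unbiased forecast of log(y) understates the mean of y.

```python
import math
import random

rng = random.Random(0)
sigma = 0.5  # assumed standard deviation of the log-scale deviations

# Simulate exp(eps) for eps ~ N(0, sigma^2) and average the draws.
draws = [math.exp(rng.gauss(0.0, sigma)) for _ in range(100_000)]
mean_of_exp = sum(draws) / len(draws)      # estimates E[exp(eps)]
exp_of_mean = math.exp(0.0)                # exp(E[eps]) = 1
theoretical = math.exp(sigma ** 2 / 2)     # lognormal mean under normality

print(round(mean_of_exp, 3), exp_of_mean, round(theoretical, 3))
```

The gap between the first two printed numbers is the multiplicative bias factor that the adjustment methods mentioned above try to correct.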
Although the exponential trend model is a nonlinear model (in the β's), its natural log is a linear model, and so we also call it a log linear trend model:

$$\log(T_t) = \beta_0 + \beta_1 t + \beta_2 t^2 + \cdots + \beta_p t^p$$
More informationMULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996)
MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL by Michael L. Orlov Chemistry Department, Oregon State University (1996) INTRODUCTION In modern science, regression analysis is a necessary part
More informationTesting for Lack of Fit
Chapter 6 Testing for Lack of Fit How can we tell if a model fits the data? If the model is correct then ˆσ 2 should be an unbiased estimate of σ 2. If we have a model which is not complex enough to fit
More informationWhat s New in Econometrics? Lecture 8 Cluster and Stratified Sampling
What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and
More information171:290 Model Selection Lecture II: The Akaike Information Criterion
171:290 Model Selection Lecture II: The Akaike Information Criterion Department of Biostatistics Department of Statistics and Actuarial Science August 28, 2012 Introduction AIC, the Akaike Information
More informationIs the person a permanent immigrant. A non permanent resident. Does the person identify as male. Person appearing Chinese
Cole Sprague Kai Addae Economics 312 Canadian Census Project Introduction This project is based off of the 2001 Canadian Census data, and examines the relationship between wages and education, while controlling
More informationDefinition 8.1 Two inequalities are equivalent if they have the same solution set. Add or Subtract the same value on both sides of the inequality.
8 Inequalities Concepts: Equivalent Inequalities Linear and Nonlinear Inequalities Absolute Value Inequalities (Sections 4.6 and 1.1) 8.1 Equivalent Inequalities Definition 8.1 Two inequalities are equivalent
More informationSAS Code to Select the Best Multiple Linear Regression Model for Multivariate Data Using Information Criteria
Paper SA01_05 SAS Code to Select the Best Multiple Linear Regression Model for Multivariate Data Using Information Criteria Dennis J. Beal, Science Applications International Corporation, Oak Ridge, TN
More informationTime Series Analysis. 1) smoothing/trend assessment
Time Series Analysis This (not surprisingly) concerns the analysis of data collected over time... weekly values, monthly values, quarterly values, yearly values, etc. Usually the intent is to discern whether
More information16 : Demand Forecasting
16 : Demand Forecasting 1 Session Outline Demand Forecasting Subjective methods can be used only when past data is not available. When past data is available, it is advisable that firms should use statistical
More informationI. Basic concepts: Buoyancy and Elasticity II. Estimating Tax Elasticity III. From Mechanical Projection to Forecast
Elements of Revenue Forecasting II: the Elasticity Approach and Projections of Revenue Components Fiscal Analysis and Forecasting Workshop Bangkok, Thailand June 16 27, 2014 Joshua Greene Consultant IMF-TAOLAM
More informationTIME SERIES ANALYSIS
TIME SERIES ANALYSIS Ramasubramanian V. I.A.S.R.I., Library Avenue, New Delhi- 110 012 ram_stat@yahoo.co.in 1. Introduction A Time Series (TS) is a sequence of observations ordered in time. Mostly these
More informationChapter 15. Mixed Models. 15.1 Overview. A flexible approach to correlated data.
Chapter 15 Mixed Models A flexible approach to correlated data. 15.1 Overview Correlated data arise frequently in statistical analyses. This may be due to grouping of subjects, e.g., students within classrooms,
More informationForecasting in STATA: Tools and Tricks
Forecasting in STATA: Tools and Tricks Introduction This manual is intended to be a reference guide for time series forecasting in STATA. It will be updated periodically during the semester, and will be
More informationA Primer on Forecasting Business Performance
A Primer on Forecasting Business Performance There are two common approaches to forecasting: qualitative and quantitative. Qualitative forecasting methods are important when historical data is not available.
More informationHomework 8 Solutions
Math 17, Section 2 Spring 2011 Homework 8 Solutions Assignment Chapter 7: 7.36, 7.40 Chapter 8: 8.14, 8.16, 8.28, 8.36 (a-d), 8.38, 8.62 Chapter 9: 9.4, 9.14 Chapter 7 7.36] a) A scatterplot is given below.
More informationMissing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13
Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional
More information