Linear and Piecewise Linear Regressions


1. Linear and Piecewise Linear Regressions
Bernadetta Tarigan, Dr. sc. ETHZ
Tarigan Statistical Consulting & Coaching, statisticalcoaching.ch
Doctoral Program in Computer Science of the Universities of Fribourg, Geneva, Lausanne, Neuchâtel, Bern and the EPFL
Hands-on Data Analysis with R, University of Neuchâtel, 10 May 2016
2. Critic data generated from different versions of a software project
- version: the version number of the software project
- rule name: the name of the rule used to generate critics about source code
- entity: the name of the entity that has a critic
- other fields contain additional information about the rules and entities and can be used for filtering
[version, rule name, entity] triplets are unique: for each version, an entity can have only one critic from a given rule.
3. Main calculated value
Number of critics in a version = number of lines with the same version number.
How does the trend in the number of critics change after version 50185? The hypothesis is that after that version the number of critics should go down. But there is too much noise to analyze all the data without filtering.
Ideas on filtering:
- filter by package: focus on the Collections*, Kernel, System* and Nautilus packages, or perform the analysis by package and merge the results
- filter by rule severity: focus only on error rules
- filter by rule group: focus only on Pharo bugs and Bugs
- calculate the trend values for each rule or each package separately and compare trends/eliminate outliers
- consider only entities that were changed since the last version
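The counting step can be sketched in a few lines of Python. This is only an illustration: the rows, rule names and entity names below are made up, and `critics_per_version` is a hypothetical helper, not part of the original analysis.

```python
from collections import Counter

# Hypothetical rows: one (version, rule_name, entity) triplet per critic.
rows = [
    (50183, "RuleA", "Foo"),
    (50183, "RuleB", "Foo"),
    (50184, "RuleA", "Foo"),
    (50185, "RuleA", "Bar"),
    (50185, "RuleB", "Bar"),
    (50185, "RuleC", "Baz"),
]

def critics_per_version(rows, keep=lambda row: True):
    """Count the rows that share a version number, after an optional filter."""
    return Counter(row[0] for row in rows if keep(row))

# Number of critics in each version (all rows).
counts = critics_per_version(rows)

# Same count after filtering by rule, one of the filtering ideas listed above.
counts_rule_a = critics_per_version(rows, keep=lambda row: row[1] == "RuleA")
```

Filtering by package or severity would follow the same pattern, with a different `keep` predicate on the additional fields.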
4. Very noisy indeed
5. What is happening?
6. There is hope
7. This model? Use all versions. What's wrong with this?
8. Or this one? Good enough? Use only versions > 185. What's wrong with this?
9. How about this one? Much better? Use all versions. Looks better, no?
10. Simple linear regression = fitting a line on 1-dim input
$f(x) = \beta_0 + \beta_1 x$, so that $y_i = f(x_i) + \varepsilon_i$
- $\beta_0$: intercept of the line (when $x = 0$, $f(x) = \beta_0$)
- $\beta_1$: slope of the line (a one-unit increase in $x$ changes $f(x)$ by $\beta_1$ units)
- $\varepsilon_i$: random component (statistical error) for the $i$th case; it accounts for the fact that the statistical model does not give an exact fit to each and every data point
$\varepsilon_i$ is unobservable, but we assume that $E(\varepsilon_i) = 0$ and $\mathrm{Var}(\varepsilon_i) = \sigma_\varepsilon^2$ for all $i = 1, \dots, n$.
However, we do not assume any distribution for $\varepsilon_i$.
The population parameters are $\beta_0$, $\beta_1$ and $\sigma_\varepsilon^2$, and we want to estimate them.
11. Estimate best line
Define the fitted value $\hat{y}_i := \hat\beta_0 + \hat\beta_1 x_i$ and the residual $e_i := y_i - \hat{y}_i$.
Points above the line have positive residuals, points below the line have negative residuals.
A good line should have small residuals. They should be small in magnitude, because large negative residuals are as bad as large positive ones.
So we cannot simply require $\sum e_i = 0$. In fact, any line passing through the means of the variables, the point $(\bar{x}, \bar{y})$, satisfies $\sum e_i = 0$.
Two immediate solutions:
- require $\sum |e_i|$ to be as small as possible (least absolute distance)
- require $\sum (e_i)^2$ to be as small as possible (least squares distance)
Consider the second option: it is mathematically easier (e.g. to take derivatives), although the first option is more resistant to outliers.
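The claim that any line through the point of means has residuals summing to zero can be checked in one step. The slope $b$ below is an arbitrary value introduced only for this check:

```latex
% Any line through (\bar{x}, \bar{y}) with arbitrary slope b:
\hat{y}_i = \bar{y} + b\,(x_i - \bar{x})
% Its residuals sum to zero whatever b is:
\sum_{i=1}^{n} e_i
  = \sum_{i=1}^{n} (y_i - \bar{y}) \;-\; b \sum_{i=1}^{n} (x_i - \bar{x})
  = 0 - b \cdot 0 = 0
```

Since every slope passes this check, the condition $\sum e_i = 0$ alone cannot single out a best line.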
12. Least squares solution
Denote $\beta^T = (\beta_0, \beta_1)$ and $x_i^T = (1, x_i)$ (column vectors).
Residual sum of (error) squares: $RSS(\beta) := \sum (e_i)^2 = \sum \{y_i - \beta^T x_i\}^2$
The least squares solution is $\hat\beta^{ls} = \arg\min_\beta RSS(\beta) = \arg\min_\beta \sum \{y_i - \beta^T x_i\}^2$
Easy to solve: set the first partial derivatives equal to zero, check the second derivative.
$\hat\beta_0^{ls} = \bar{y} - \hat\beta_1^{ls}\, \bar{x}$
$\hat\beta_1^{ls} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
Properties of residuals:
- $\sum e_i = 0$, since the least-squares line passes through $(\bar{x}, \bar{y})$
- $\sum x_i e_i = 0$ and $\sum \hat{y}_i e_i = 0$: residuals are uncorrelated with the independent variable $x_i$ and the fitted value $\hat{y}_i$
$\hat\beta^{ls}$ is uniquely defined as long as the $x_i$'s are not all identical; if they are, the denominator $\sum (x_i - \bar{x})^2 = 0$.
An estimate for $\sigma_\varepsilon^2$ is $s_e^2 = RSS/(n - 2)$.
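A minimal Python sketch of the closed-form solution, using only the standard library. The five data points are invented for illustration, and the function name `least_squares_line` is an assumption of this sketch.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) that minimise the residual sum of squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        # All x values identical: the slope is not uniquely defined.
        raise ValueError("all x values are identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = least_squares_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
rss = sum(e ** 2 for e in residuals)
s2 = rss / (len(xs) - 2)  # estimate of the error variance

# The residual properties can be checked numerically.
sum_e = sum(residuals)
sum_xe = sum(x * e for x, e in zip(xs, residuals))
```

For this data the slope is 0.6 and the intercept 2.2, and both `sum_e` and `sum_xe` are zero up to floating-point rounding.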
13. How good is the fit?
Use $s_e^2 = RSS/(n - 2)$: the smaller the better.
Use the coefficient of determination $R^2 = RegSS/TSS$.
Define $TSS := \sum (y_i - \bar{y})^2$, the total sum of squares of the null model, i.e., the model that does not use the independent variable.
Recall $RSS := \sum (y_i - \hat{y}_i)^2$. Clearly $RSS \le TSS$.
Define $RegSS := \sum (\hat{y}_i - \bar{y})^2$. Then $RegSS = TSS - RSS$; it gives the reduction in the squared error due to the linear regression.
Define $R^2 = RegSS/TSS$; clearly $0 \le R^2 \le 1$.
$R^2$ is the proportion of the variation in $y$ that is explained by the linear regression. The larger $R^2$ the better.
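The decomposition can be verified numerically with a short Python sketch. The data and the helper name `sums_of_squares` are made up for illustration.

```python
def sums_of_squares(xs, ys):
    """Return (TSS, RSS, RegSS, R^2) for the least squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    tss = sum((y - y_bar) ** 2 for y in ys)                # null model error
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))    # regression error
    reg_ss = sum((f - y_bar) ** 2 for f in fitted)         # explained part
    return tss, rss, reg_ss, reg_ss / tss

# Made-up data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
tss, rss, reg_ss, r2 = sums_of_squares(xs, ys)
```

Here TSS is 6, RSS is 2.4 and RegSS is 3.6, so the regression explains 60% of the variation in this toy data set.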
14. Famous result: least squares estimates are BLUE
BLUE = Best Linear Unbiased Estimates
$\hat\beta^{ls} = \arg\min_\beta RSS(\beta) = \arg\min_\beta \sum \{y_i - \beta^T x_i\}^2$
Gauss-Markov Theorem: least squares estimates have the smallest variance among all linear unbiased estimates.
Recall: let $\hat\beta$ be an estimate for an unknown parameter $\beta$. The quality of $\hat\beta$ is measured via its mean squared error
$MSE(\hat\beta) := E(\hat\beta - \beta)^2 = (\beta - E\hat\beta)^2 + \mathrm{Var}(\hat\beta) = \text{Bias}^2 + \text{Variance}$
Therefore least squares estimates are famous: if the underlying function $f(x)$ were truly linear (that is, $y = \beta^T x + \varepsilon$), then least squares estimates are your best approximation!
15. Great! But, what next?
Remember that we do not assume any distribution for the statistical error $\varepsilon$, only that $E(\varepsilon_i) = 0$ and $\mathrm{Var}(\varepsilon_i) = \sigma_\varepsilon^2$ for all $i = 1, \dots, n$.
Least squares estimates are a great and truly mathematical solution, but we cannot do much more.
We cannot do statistical inference on them, e.g.
- confidence intervals
- hypothesis tests
which are needed when the goal in estimating the underlying mechanism is to explain or to describe, but not to predict.
Statistical inference: drawing conclusions about a population from a sample with some calculated uncertainty.
When you have two sets of data/samples from the same mechanism $y = \beta^T x + \varepsilon$, you will get two different sets of estimates.
16. Normal distribution of the random error $\varepsilon$
Linear statistical model: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
Assume that the random errors $\varepsilon_i$ are iid and $N(0, \sigma_\varepsilon^2)$ distributed, for $i = 1, \dots, n$. Then
$Y_i \mid x_i \sim N(\beta_0 + \beta_1 x_i,\ \sigma_\varepsilon^2)$
The standard deviation remains constant, but the mean value changes with $x$.
[Figure: normal densities of $y$ at $x_1$, $x_2$, $x_3$, centered at the means $E(y \mid x_j) = b_0 + b_1 x_j$ on the regression line]
17. Maximum Likelihood Estimates
Now that we know the distributions of $Y_i \mid x_i$, which are independent but not identical ($Y_i \mid x_i \sim N(\beta_0 + \beta_1 x_i,\ \sigma_\varepsilon^2)$), we can apply the maximum likelihood estimation (MLE) method.
The MLE estimates for $\beta_0$ and $\beta_1$ are equal to the least squares estimates:
$\hat\beta_1^{MLE} = \hat\beta_1^{ls} = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
$\hat\beta_0^{MLE} = \hat\beta_0^{ls} = \bar{y} - \hat\beta_1^{ls}\, \bar{x}$
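Why the two estimates coincide can be seen from the log-likelihood. The short derivation below is a standard argument under the normal-error assumption, not taken from the slides:

```latex
% Likelihood of independent normal observations Y_i | x_i
L(\beta_0, \beta_1, \sigma_\varepsilon^2)
  = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma_\varepsilon^2}}
    \exp\!\left( -\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma_\varepsilon^2} \right)

% Log-likelihood: the coefficients enter only through RSS(\beta)
\log L
  = -\frac{n}{2}\log\!\left(2\pi\sigma_\varepsilon^2\right)
    - \frac{1}{2\sigma_\varepsilon^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2
  = -\frac{n}{2}\log\!\left(2\pi\sigma_\varepsilon^2\right)
    - \frac{RSS(\beta)}{2\sigma_\varepsilon^2}
```

For any fixed $\sigma_\varepsilon^2 > 0$, maximizing $\log L$ over $(\beta_0, \beta_1)$ is the same as minimizing $RSS(\beta)$, which is the least squares problem.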
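18. Maximum Likelihood Estimates (Cont.)
However, we get more:
$\hat\beta_0 \sim N\!\left(\beta_0,\ \sigma_\varepsilon^2\, \dfrac{\sum_{i=1}^{n} x_i^2}{n\, s_x}\right)$
$\hat\beta_1 \sim N\!\left(\beta_1,\ \dfrac{\sigma_\varepsilon^2}{s_x}\right)$
where $s_x = \sum_{i=1}^{n} (x_i - \bar{x})^2$.
Covariance: $\mathrm{Cov}(\hat\beta_0, \hat\beta_1) = -\sigma_\varepsilon^2\, \dfrac{\bar{x}}{s_x}$
Define $S^2 := RSS(\hat\beta_0, \hat\beta_1)/(n - 2)$, an unbiased estimate for $\sigma_\varepsilon^2$. Then
$\dfrac{(n - 2)\, S^2}{\sigma_\varepsilon^2} \sim \chi^2_{n-2}$
Moreover, $(\hat\beta_0, \hat\beta_1)$ and $S^2$ are independent.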
19. Test statistics
From the results about sampling distributions, it immediately follows that [formulas missing], which are the basis for inferences (significance tests and CI estimation) regarding the two parameters $\beta_0$ and $\beta_1$.
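The formulas of this slide are not available in the text. The standard result for simple linear regression with normal errors, which is presumably what was shown, is sketched here as an assumption; $s_x = \sum (x_i - \bar{x})^2$ and $S^2 = RSS/(n - 2)$:

```latex
% Standardizing with the estimated standard errors gives t statistics
\frac{\hat\beta_0 - \beta_0}{SE(\hat\beta_0)} \sim t_{n-2},
\qquad
\frac{\hat\beta_1 - \beta_1}{SE(\hat\beta_1)} \sim t_{n-2}

% with the standard errors obtained by replacing \sigma_\varepsilon^2 by S^2
SE(\hat\beta_0) = S \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n\, s_x}},
\qquad
SE(\hat\beta_1) = \frac{S}{\sqrt{s_x}}
```

The $t_{n-2}$ distribution arises because a normal variable is divided by an independent estimate of its standard deviation based on a $\chi^2_{n-2}$ variable.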
20. Test of significance
1. Test both parameters simultaneously with an F test
   $H_0$: $\beta_0 = \beta_1 = 0$
   $H_1$: at least one of them is not zero
2. Test each parameter with a t test, for $i = 0, 1$
   $H_0$: $\beta_i = 0$
   $H_1$: $\beta_i \neq 0$
21. Confidence Interval (CI) estimation
The $(1 - \alpha)$ CIs for $\beta_0$ and $\beta_1$, respectively, are
$\hat\beta_0 \pm t_{1-\alpha/2;\, n-2}\, SE(\hat\beta_0)$
$\hat\beta_1 \pm t_{1-\alpha/2;\, n-2}\, SE(\hat\beta_1)$
that is, point estimate ± margin of error.
R returns the SE values.
When $n$ is large, $t$ behaves like the standard normal $Z$.
For $\alpha = 0.05$, $t_{1-\alpha/2;\, n-2} \approx 2$. Remember the rule.
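A minimal Python sketch of the interval computation, assuming the standard error formulas of simple linear regression and the t quantile with n - 2 degrees of freedom (matching the chi-square result for $S^2$). The function name, the data and the `t_crit` parameter are assumptions of this sketch; the default of 2.0 is the large-sample rule of thumb, and the exact quantile has to be supplied for small samples.

```python
import math

def coefficient_intervals(xs, ys, t_crit=2.0):
    """Return estimate, standard error and CI bounds for intercept and slope.

    t_crit is the t quantile t_{1 - alpha/2; n - 2}; the default 2.0 is only
    the large-n approximation for alpha = 0.05.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)  # unbiased estimate of the error variance
    se_b0 = math.sqrt(s2 * sum(x * x for x in xs) / (n * sxx))
    se_b1 = math.sqrt(s2 / sxx)
    return {
        "b0": (b0, se_b0, b0 - t_crit * se_b0, b0 + t_crit * se_b0),
        "b1": (b1, se_b1, b1 - t_crit * se_b1, b1 + t_crit * se_b1),
    }

# Made-up data; with n = 5 the 0.975 quantile of t with 3 df is about 3.182.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
result = coefficient_intervals(xs, ys, t_crit=3.182)
slope, se_slope, slope_low, slope_high = result["b1"]
```

With only five points the interval for the slope is wide and contains zero, which illustrates why the rule of thumb "t is about 2" should be reserved for large samples.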
22. Model validation
Check the assumptions on the random term (i.e., the errors) and the outliers:
1. Zero mean of the errors
2. Constant variance (homoscedasticity) of the errors
3. Independence of the errors
4. Normality of the errors
5. Outlier diagnostic
23. Model Evaluation
The goodness-of-fit or quality of the model: how good is the fit?
Two measures:
- Residual standard error
- Coefficient of determination $R^2$
24. Piecewise linear regression
Other names: hockey stick, broken stick or segmented regression.
It is a simple modification of the linear model, yet very useful.
In different ranges of $x$, different linear relationships occur, so a single linear model may not provide an adequate explanation or description.
Breakpoints are the values of $x$ where the slope changes.
The values of the breakpoints may or may not be known before the analysis; when unknown, they have to be estimated.
25. Even to model a nonlinear relationship!
Breakpoints are the values of $x$ where the slope changes.
The values of the breakpoints may or may not be known before the analysis; when unknown, they have to be estimated.
26. One breakpoint with known value
Let $c$ be the value of the breakpoint. Denote
$(x - c)_+ = \begin{cases} 0, & x \le c \\ x - c, & x > c \end{cases}$
Piecewise linear model: $y = \beta_0 + \beta_1 x + \beta_2 (x - c)_+ + \varepsilon$
It can be written as
$y = \begin{cases} \beta_0 + \beta_1 x + \varepsilon, & x \le c \\ \beta_0 - \beta_2 c + (\beta_1 + \beta_2)\, x + \varepsilon, & x > c \end{cases}$
For $x \le c$ the slope is $\beta_1$. It changes to $\beta_1 + \beta_2$ when $x > c$.
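The equivalence of the two ways of writing the model can be checked with a small Python sketch of the mean function (the error term is left out). The function names and the parameter values are hypothetical, chosen only for illustration.

```python
def hinge(x, c):
    """The term (x - c)_+ : zero up to the breakpoint, x - c beyond it."""
    return max(0.0, x - c)

def mean_hinge_form(x, c, b0, b1, b2):
    """Mean of y written with the hinge term."""
    return b0 + b1 * x + b2 * hinge(x, c)

def mean_by_cases(x, c, b0, b1, b2):
    """Mean of y written as two separate lines."""
    if x <= c:
        return b0 + b1 * x
    return (b0 - b2 * c) + (b1 + b2) * x

# Hypothetical parameters: slope 2 before the breakpoint c = 3, slope 2 - 3 = -1 after.
c, b0, b1, b2 = 3.0, 1.0, 2.0, -3.0
grid = [0.5 * k for k in range(13)]  # x = 0.0, 0.5, ..., 6.0
max_gap = max(
    abs(mean_hinge_form(x, c, b0, b1, b2) - mean_by_cases(x, c, b0, b1, b2))
    for x in grid
)
```

Both forms agree on the whole grid, and the two lines meet at the breakpoint, so the fitted function is continuous at $x = c$.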
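27. Hypothesis test
$y = \begin{cases} \beta_0 + \beta_1 x + \varepsilon, & x \le c \\ \beta_0 - \beta_2 c + (\beta_1 + \beta_2)\, x + \varepsilon, & x > c \end{cases}$
For $x \le c$ the slope is $\beta_1$. It changes to $\beta_1 + \beta_2$ when $x > c$.
To test whether the trend of $y$ turns downward after the breakpoint $c$ as $x$ increases, that is, whether the slope after $c$ is smaller than the slope before it, is to test whether $\beta_2 < 0$.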
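A sketch of how this test could be carried out with a known breakpoint, written in plain Python so that it runs without extra packages. Everything here is an assumption for illustration: the data are invented, `fit_piecewise` and `invert` are hypothetical helpers, and the fit is ordinary least squares on the three columns 1, x and (x - c)_+ via the normal equations.

```python
import math

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit_piecewise(xs, ys, c):
    """Least squares fit of y = b0 + b1*x + b2*(x - c)_+; returns (beta, se)."""
    rows = [[1.0, x, max(0.0, x - c)] for x in xs]
    p = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    rss = sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, y in zip(rows, ys))
    s2 = rss / (len(xs) - p)  # three estimated coefficients
    se = [math.sqrt(s2 * inv[i][i]) for i in range(p)]
    return beta, se

# Invented data: rises with slope about 2 up to x = 5, then falls with slope about -1.
xs = [float(x) for x in range(1, 11)]
ys = [3.2, 4.9, 7.1, 8.8, 11.0, 10.1, 8.9, 8.2, 6.8, 6.0]

beta, se = fit_piecewise(xs, ys, c=5.0)
t_beta2 = beta[2] / se[2]  # test statistic for H0: beta2 = 0 against H1: beta2 < 0
```

A strongly negative `t_beta2`, compared with the lower quantile of the t distribution with n - 3 degrees of freedom, supports the one-sided alternative $\beta_2 < 0$. In R the same fit would typically be obtained from a linear model with the hinge term as an extra predictor.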
More information2. Linear regression with multiple regressors
2. Linear regression with multiple regressors Aim of this section: Introduction of the multiple regression model OLS estimation in multiple regression Measuresoffit in multiple regression Assumptions
More informationPenalized regression: Introduction
Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20thcentury statistics dealt with maximum likelihood
More informationThe Method of Least Squares
Hervé Abdi 1 1 Introduction The least square methods (LSM) is probably the most popular technique in statistics. This is due to several factors. First, most common estimators can be casted within this
More informationHeteroskedasticity and Weighted Least Squares
Econ 507. Econometric Analysis. Spring 2009 April 14, 2009 The Classical Linear Model: 1 Linearity: Y = Xβ + u. 2 Strict exogeneity: E(u) = 0 3 No Multicollinearity: ρ(x) = K. 4 No heteroskedasticity/
More informationJoint Probability Distributions and Random Samples. Week 5, 2011 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage
5 Joint Probability Distributions and Random Samples Week 5, 2011 Stat 4570/5570 Material from Devore s book (Ed 8), and Cengage Two Discrete Random Variables The probability mass function (pmf) of a single
More informationRobust procedures for Canadian Test Day Model final report for the Holstein breed
Robust procedures for Canadian Test Day Model final report for the Holstein breed J. Jamrozik, J. Fatehi and L.R. Schaeffer Centre for Genetic Improvement of Livestock, University of Guelph Introduction
More informationChapter 3: The Multiple Linear Regression Model
Chapter 3: The Multiple Linear Regression Model Advanced Econometrics  HEC Lausanne Christophe Hurlin University of Orléans November 23, 2013 Christophe Hurlin (University of Orléans) Advanced Econometrics
More informationChapter 6: Multivariate Cointegration Analysis
Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration
More informationLesson Lesson Outline Outline
Lesson 15 Linear Regression Lesson 15 Outline Review correlation analysis Dependent and Independent variables Least Squares Regression line Calculating l the slope Calculating the Intercept Residuals and
More information1 The Pareto Distribution
Estimating the Parameters of a Pareto Distribution Introducing a Quantile Regression Method Joseph Lee Petersen Introduction. A broad approach to using correlation coefficients for parameter estimation
More informationLAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING
LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.
More informationAP Statistics 2002 Scoring Guidelines
AP Statistics 2002 Scoring Guidelines The materials included in these files are intended for use by AP teachers for course and exam preparation in the classroom; permission for any other use must be sought
More information