Lecture 7 Linear Regression Diagnostics


 Ellen Morris
Transcription
Lecture 7 Linear Regression Diagnostics, BIOST 515, January 27, 2004 (BIOST 515, Lecture 6)

Major assumptions
1. The relationship between the outcomes and the predictors is (approximately) linear.
2. The error term $\epsilon$ has zero mean.
3. The error term $\epsilon$ has constant variance.
4. The errors are uncorrelated.
5. The errors are normally distributed, or we have an adequate sample size to rely on large-sample theory.
We should always check fitted models to make sure that these assumptions have not been violated.

Departures from the underlying assumptions cannot be detected using any of the summary statistics we've examined so far, such as the t or F statistics or $R^2$. In fact, tests based on these statistics may lead to incorrect inference, since they are based on many of the assumptions above.

Residual analysis
The diagnostic methods we'll be exploring are based primarily on the residuals. Recall that the residual is defined as
$$e_i = y_i - \hat{y}_i, \quad i = 1, \ldots, n,$$
where $\hat{y} = X\hat{\beta}$. If the model is appropriate, it is reasonable to expect the residuals to exhibit properties that agree with the stated assumptions.

Characteristics of residuals
The mean of the $\{e_i\}$ is 0:
$$\bar{e} = \frac{1}{n}\sum_{i=1}^{n} e_i = 0.$$
The estimate of the population variance computed from the sample of the n residuals is
$$S^2 = \frac{1}{n-p-1}\sum_{i=1}^{n} e_i^2,$$
which is the residual mean square, $\mathrm{MSE} = \mathrm{SSE}/(n-p-1)$.
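
As a rough illustration of these two properties, the sketch below fits a least-squares model to simulated data and computes $\bar{e}$ and MSE. The sample size, the coefficients, and the normal errors are assumptions made only for this example; they do not come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2                                  # illustrative sizes: n observations, p predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix with intercept
beta = np.array([1.0, 2.0, -0.5])             # hypothetical true coefficients
y = X @ beta + rng.normal(size=n)             # assumed N(0, 1) errors

# Least-squares fit, fitted values, and residuals e_i = y_i - yhat_i
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
e = y - y_hat

e_bar = e.mean()                              # zero up to rounding when an intercept is fitted
sse = float(e @ e)
mse = sse / (n - p - 1)                       # residual mean square
print(f"mean residual = {e_bar:.2e}, MSE = {mse:.3f}")
```

The mean is zero only up to floating-point error, and only because the model includes an intercept.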

The $\{e_i\}$ are not independent random variables. In general, if the number of residuals (n) is large relative to the number of independent variables (p), the dependency can be ignored for all practical purposes in an analysis of residuals.

Methods for standardizing residuals
- Standardized residuals
- Studentized residuals
- Jackknife residuals

Standardized residuals
An obvious choice for scaling residuals is to divide them by their estimated standard error. The quantity
$$z_i = \frac{e_i}{\sqrt{\mathrm{MSE}}}$$
is called a standardized residual. Based on the linear regression assumptions, we might expect the $z_i$'s to resemble a sample from a N(0, 1) distribution.

Studentized residuals
Using MSE as the variance of the ith residual $e_i$ is only an approximation. We can improve the residual scaling by dividing $e_i$ by the standard deviation of the ith residual. We can show that the covariance matrix of the residuals is
$$\mathrm{var}(e) = \sigma^2 (I - H).$$
Recall that $H = X(X'X)^{-1}X'$ is the hat matrix. The variance of the ith residual is
$$\mathrm{var}(e_i) = \sigma^2 (1 - h_i),$$
where $h_i$ is the ith element on the diagonal of the hat matrix and $0 \le h_i \le 1$.

The quantity
$$r_i = \frac{e_i}{\sqrt{\mathrm{MSE}\,(1 - h_i)}}$$
is called a studentized residual and approximately follows a t distribution with $n-p-1$ degrees of freedom (assuming the assumptions stated at the beginning of lecture are satisfied). Studentized residuals have a mean near 0 and a variance,
$$\frac{1}{n-p-1}\sum_{i=1}^{n} r_i^2,$$
that is slightly larger than 1. In large data sets, the standardized and studentized residuals should not differ dramatically.

Jackknife residuals
The quantity
$$r_{(-i)} = r_i \sqrt{\frac{\mathrm{MSE}}{\mathrm{MSE}_{(-i)}}} = \frac{e_i}{\sqrt{\mathrm{MSE}_{(-i)}\,(1 - h_i)}} = r_i \sqrt{\frac{(n-p-1) - 1}{(n-p-1) - r_i^2}}$$
is called a jackknife residual (or RStudent residual). $\mathrm{MSE}_{(-i)}$ is the residual variance computed with the ith observation deleted. Jackknife residuals have a mean near 0 and a variance
$$\frac{1}{(n-p-1) - 1}\sum_{i=1}^{n} r_{(-i)}^2$$
that is slightly greater than 1. Jackknife residuals are usually the preferred residual for regression diagnostics.
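
The three scalings can be computed side by side. The following is a minimal sketch on simulated data (the sizes, coefficients, and error distribution are assumptions for illustration). It also refits the model with each observation deleted to check numerically that the closed-form expression for the jackknife residual agrees with the leave-one-out definition.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 30, 2                                  # illustrative sizes
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)   # hypothetical model

H = X @ np.linalg.inv(X.T @ X) @ X.T          # hat matrix
h = np.diag(H)                                # diagonal elements h_i
e = y - H @ y                                 # residuals
df = n - p - 1
mse = e @ e / df

z = e / np.sqrt(mse)                          # standardized residuals
r = e / np.sqrt(mse * (1 - h))                # studentized residuals
r_jack = r * np.sqrt((df - 1) / (df - r**2))  # jackknife residuals, closed form

def mse_without(i):
    """Residual variance from the fit with observation i deleted."""
    keep = np.arange(n) != i
    b, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    res = y[keep] - X[keep] @ b
    return res @ res / (df - 1)               # one fewer observation, same p

# Direct leave-one-out version of the same quantity
r_jack_direct = np.array(
    [e[i] / np.sqrt(mse_without(i) * (1 - h[i])) for i in range(n)]
)
print(np.max(np.abs(r_jack - r_jack_direct)))
```

The closed form avoids n separate refits, which is why it is the version usually implemented in practice.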

How to use residuals for diagnostics?
Residual analysis is usually done graphically. We may look at
- Quantile plots: to assess normality
- Scatterplots: to assess model assumptions, such as constant variance and linearity, and to identify potential outliers
- Histograms, stem-and-leaf diagrams and boxplots

Quantile-quantile plots
Quantile-quantile plots can be useful for comparing two samples to determine if they arise from the same distribution. Similarly, we can compare the quantiles of a sample to the quantiles expected if the sample came from some distribution F, for a visual assessment of whether the sample arises from F. In linear regression, this can help us assess the normality of the residuals (if we have relied on an assumption of normality).
To construct a quantile-quantile plot for the residuals, we plot the quantiles of the residuals against the theoretical quantiles expected if the residuals arose from a normal distribution. If the residuals come from a normal distribution, the plot should resemble a straight line. A straight line connecting the 1st and 3rd quartiles is often added to the plot to aid in visual assessment.
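
The construction can be sketched without any plotting library by computing the coordinates of the points and of the quartile reference line. The helper names `normal_qq` and `quartile_line` are hypothetical, and the plotting positions $(i - 0.5)/n$ are one common convention assumed here; software packages differ in this choice.

```python
import numpy as np
from statistics import NormalDist

def normal_qq(residuals):
    """Return (theoretical, sample) quantiles for a normal Q-Q plot."""
    sample_q = np.sort(np.asarray(residuals, dtype=float))
    n = sample_q.size
    probs = (np.arange(1, n + 1) - 0.5) / n   # assumed plotting positions
    theo_q = np.array([NormalDist().inv_cdf(float(pr)) for pr in probs])
    return theo_q, sample_q

def quartile_line(sample_q):
    """Slope and intercept of the line through the 1st and 3rd quartiles."""
    y1, y3 = np.quantile(sample_q, [0.25, 0.75])
    x1, x3 = NormalDist().inv_cdf(0.25), NormalDist().inv_cdf(0.75)
    slope = (y3 - y1) / (x3 - x1)
    return slope, y1 - slope * x1

rng = np.random.default_rng(2)
theo_q, sample_q = normal_qq(rng.normal(size=200))   # simulated N(0, 1) "residuals"
slope, intercept = quartile_line(sample_q)
print(f"reference line: slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Plotting `sample_q` against `theo_q` and overlaying the reference line gives the usual display; for a normal sample the points should fall close to that line.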

[Figure: samples from a N(0, 1) distribution. Normal Q-Q plot of sample quantiles against theoretical quantiles.]
[Figure: samples from a skewed distribution. Normal Q-Q plot of residuals against theoretical quantiles, and a histogram of the residuals (density scale).]
[Figure: samples from a heavy-tailed distribution. Normal Q-Q plot of residuals against theoretical quantiles, and a histogram of the residuals (density scale).]
[Figure: samples from a light-tailed distribution. Normal Q-Q plot of residuals against theoretical quantiles, and a histogram of the residuals (density scale).]

Scatterplots
Another useful aid for inspection is a scatterplot of the residuals against the fitted values and/or the predictors. These plots can help us identify:
- Nonconstant variance
- Violation of the assumption of linearity
- Potential outliers

[Figure: satisfactory residual plot. Residuals $e_i$ against fitted values $\hat{y}_i$.]
[Figure: nonconstant variance. Residuals $e_i$ against fitted values $\hat{y}_i$.]
[Figure: nonconstant variance, second example. Residuals $e_i$ against fitted values $\hat{y}_i$.]

Example
Suppose the true relationship between a predictor, x, and an outcome, y, is
$$E(y_i) = 1 + 2x_i - 0.25x_i^2,$$
but we fit the model
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i.$$
Can we diagnose this with residual plots?
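
[Figure: scatterplot of x vs. y for the fitted model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$.]
[Table: coefficient summary with columns Estimate, Std. Error, t value, Pr(>|t|) and rows (Intercept), x; the numerical entries are missing from this transcription.]
[Figure: normal Q-Q plot of sample quantiles against theoretical quantiles.]
[Figure: fitted values versus residuals. Residuals $e_i$ against fitted values $\hat{y}_i$.]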
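
Next, we fit
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \epsilon_i.$$
[Table: coefficient summary with columns Estimate, Std. Error, t value, Pr(>|t|) and rows (Intercept), x, $x^2$; the numerical entries are missing from this transcription.]
[Figure: normal Q-Q plot of sample quantiles against theoretical quantiles.]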
[Figure: fitted values versus residuals. Residuals $e_i$ against fitted values $\hat{y}_i$.]
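
The example can be reproduced approximately by simulation. In the sketch below, the mean function $1 + 2x - 0.25x^2$ is the one from the example, while the sample size, the range of x, and the N(0, 1) errors are assumptions, since those details are not stated. As a numeric stand-in for the residual plot, it measures how strongly the residuals track a curvature term.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200                                            # assumed sample size
x = rng.uniform(0, 10, size=n)                     # assumed range of the predictor
y = 1 + 2 * x - 0.25 * x**2 + rng.normal(size=n)   # assumed N(0, 1) errors

def fit(X, y):
    """Least-squares coefficients and residuals for design matrix X."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, y - X @ b

X_lin = np.column_stack([np.ones(n), x])           # misspecified: linear in x only
X_quad = np.column_stack([np.ones(n), x, x**2])    # adds the quadratic term
b_lin, e_lin = fit(X_lin, y)
b_quad, e_quad = fit(X_quad, y)

# A curved pattern in the residual plot shows up as correlation between
# the residuals and the squared, centered predictor.
curv = (x - x.mean()) ** 2
corr_lin = np.corrcoef(e_lin, curv)[0, 1]
corr_quad = np.corrcoef(e_quad, curv)[0, 1]
print(f"linear fit: {corr_lin:.2f}, quadratic fit: {corr_quad:.2e}")
print(f"estimated quadratic coefficient: {b_quad[2]:.3f}")
```

Under these assumptions the residuals from the straight-line fit are strongly related to the curvature term, while the residuals from the quadratic fit are not, which mirrors what the two fitted-versus-residual plots are meant to show.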

What if we have more than one predictor and only one is misspecified? Suppose the true model is
$$E[y_i] = 1 + 3w_i + 2x_i - 0.25x_i^2$$
and we fit
$$y_i = \beta_0 + \beta_1 w_i + \beta_2 x_i + \epsilon_i.$$
How can we diagnose model misspecification with residual plots in this case?
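
[Figure: scatterplots of y, w and x.]
[Table: coefficient summary with columns Estimate, Std. Error, t value, Pr(>|t|) and rows (Intercept), x, w; the numerical entries are missing from this transcription.]
[Figure: normal Q-Q plot of sample quantiles against theoretical quantiles.]
[Figure: residuals $e_i$ against fitted values $\hat{y}_i$.]
[Figure: residuals against x.]
[Figure: residuals against w.]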
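
$$y_i = \beta_0 + \beta_1 w_i + \beta_2 x_i + \beta_3 x_i^2 + \epsilon_i$$
[Table: coefficient summary with columns Estimate, Std. Error, t value, Pr(>|t|) and rows (Intercept), x, $x^2$, w; the numerical entries are missing from this transcription.]
[Figure: normal Q-Q plot of sample quantiles against theoretical quantiles.]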
[Figure: residuals $e_i$ against fitted values $\hat{y}_i$.]
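
With two predictors, plotting the residuals against each predictor in turn can point to the misspecified one. The sketch below simulates from the stated mean function $1 + 3w + 2x - 0.25x^2$; the distributions of w and x, the sample size, and the N(0, 1) errors are assumptions, and `curvature_corr` is a hypothetical helper used as a numeric proxy for a curved pattern in each plot.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300                                            # assumed sample size
w = rng.normal(size=n)                             # assumed distribution of w
x = rng.uniform(0, 10, size=n)                     # assumed range of x
y = 1 + 3 * w + 2 * x - 0.25 * x**2 + rng.normal(size=n)

# Fit the misspecified model that is linear in both w and x
X = np.column_stack([np.ones(n), w, x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

def curvature_corr(resid, v):
    """Correlation of the residuals with the squared, centered predictor v."""
    return np.corrcoef(resid, (v - v.mean()) ** 2)[0, 1]

corr_x = curvature_corr(e, x)                      # large in magnitude: curvature in x
corr_w = curvature_corr(e, w)                      # near zero: no curvature in w
print(f"residuals vs x: {corr_x:.2f}, residuals vs w: {corr_w:.2f}")
```

In this simulated setting only the x plot shows systematic curvature, suggesting that the quadratic term belongs to x rather than to w.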

Partial regression plots
- A variation of the plot of residuals versus predictors
- Allows us to study the marginal relationship of a regressor given the other variables that are in the model
- Also called the added variable plot or the adjusted variable plot

For example, suppose we are considering the model
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i,$$
and we are concerned about the correct specification of the relationship between $x_1$ and $y$. To create a partial regression plot, we
1. Regress $y$ on $x_2$ and obtain the fitted values and residuals
$$\hat{y}_i(x_2) = \hat{\theta}_0 + \hat{\theta}_1 x_{i2}$$
$$e_i(y \mid x_2) = y_i - \hat{y}_i(x_2), \quad i = 1, \ldots, n$$
2. Regress $x_1$ on $x_2$ and calculate the residuals
$$\hat{x}_{i1}(x_2) = \hat{\alpha}_0 + \hat{\alpha}_1 x_{i2}$$
$$e_i(x_1 \mid x_2) = x_{i1} - \hat{x}_{i1}(x_2), \quad i = 1, \ldots, n$$
3. Plot the y residuals $e_i(y \mid x_2)$ against the $x_1$ residuals $e_i(x_1 \mid x_2)$.
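
The three steps above can be sketched as follows, using simulated data (the coefficients, the correlation between the predictors, and the error distribution are assumptions for illustration, and `residuals` is a hypothetical helper). The sketch also checks a standard property of this construction: the least-squares slope through the plotted points equals the coefficient of $x_1$ in the full regression.

```python
import numpy as np

def residuals(target, predictor):
    """Residuals from a simple linear regression of target on predictor."""
    Z = np.column_stack([np.ones(len(predictor)), predictor])
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    return target - Z @ coef

rng = np.random.default_rng(5)
n = 100                                            # assumed sample size
x2 = rng.normal(size=n)
x1 = 0.5 * x2 + rng.normal(size=n)                 # assumed correlation with x2
y = 1 + 2 * x1 - x2 + rng.normal(size=n)           # hypothetical true model

e_y = residuals(y, x2)                             # step 1: e_i(y | x2)
e_x1 = residuals(x1, x2)                           # step 2: e_i(x1 | x2)

# Step 3 would plot e_y against e_x1; here we compute the slope of that plot.
slope = (e_x1 @ e_y) / (e_x1 @ e_x1)

# Coefficient of x1 from the full regression of y on x1 and x2
X_full = np.column_stack([np.ones(n), x1, x2])
beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
print(f"partial-plot slope = {slope:.4f}, full-model beta_1 = {beta_full[1]:.4f}")
```

Because the two numbers coincide, departures from a straight line in the plot can be read as evidence that the linear term in $x_1$ is not adequate after adjusting for $x_2$.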

[Figure: partial regression plot from the previous example; axes labeled x and y.]

Comments on partial regression plots
- Use with caution: they only suggest possible relationships between the predictor and the response.
- In general, they will not detect interactions between regressors.
- The presence of strong multicollinearity can cause partial regression plots to give incorrect information.

Next
- Variance stabilizing transformations
- Identifying outliers and influential observations
DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,
More informationModerator and Mediator Analysis
Moderator and Mediator Analysis Seminar General Statistics Marijtje van Duijn October 8, Overview What is moderation and mediation? What is their relation to statistical concepts? Example(s) October 8,
More informationLecture 4 Linear random coefficients models
Lecture 4 Linear random coefficients models Rats example 30 young rats, weights measured weekly for five weeks Dependent variable (Y ij ) is weight for rat i at week j Data: Multilevel: weights (observations)
More informationWebbased Supplementary Materials for Bayesian Effect Estimation. Accounting for Adjustment Uncertainty by Chi Wang, Giovanni
1 Webbased Supplementary Materials for Bayesian Effect Estimation Accounting for Adjustment Uncertainty by Chi Wang, Giovanni Parmigiani, and Francesca Dominici In Web Appendix A, we provide detailed
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN13: 9780470860809 ISBN10: 0470860804 Editors Brian S Everitt & David
More information5. Linear Regression
5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4
More informationMultiple Regression  Selecting the Best Equation An Example Techniques for Selecting the "Best" Regression Equation
Multiple Regression  Selecting the Best Equation When fitting a multiple linear regression model, a researcher will likely include independent variables that are not important in predicting the dependent
More informationBasic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
More informationLecture 1: Review and Exploratory Data Analysis (EDA)
Lecture 1: Review and Exploratory Data Analysis (EDA) Sandy Eckel seckel@jhsph.edu Department of Biostatistics, The Johns Hopkins University, Baltimore USA 21 April 2008 1 / 40 Course Information I Course
More informationConfidence Intervals for One Standard Deviation Using Standard Deviation
Chapter 640 Confidence Intervals for One Standard Deviation Using Standard Deviation Introduction This routine calculates the sample size necessary to achieve a specified interval width or distance from
More information