Lecture 5: Model Checking. Prof. Sharyn O'Halloran. Sustainable Development U9611, Econometrics II
2 Regression Diagnostics
- Unusual and influential data: outliers, leverage, influence
- Heteroskedasticity: non-constant variance
- Multicollinearity: non-independence of the x variables
3 Unusual and Influential Data
Outliers: an observation with a large residual, i.e. one whose dependent-variable value is unusual given its values on the predictor variables. An outlier may indicate a sample peculiarity, a data-entry error, or some other problem.
4 Outliers
Largest positive outliers; largest negative outliers.
  reg api00 meals ell emer
  rvfplot, yline(0)
5 Unusual and Influential Data (cont.)
Leverage: an observation with an extreme value on a predictor variable. Leverage is a measure of how far an independent variable deviates from its mean. High-leverage points can have an effect on the estimates of the regression coefficients.
6 Leverage These cases have relatively large leverage
7 Unusual and Influential Data (cont.)
Influence: influence can be thought of as the product of leverage and outlierness. Removing an influential observation substantially changes the estimates of the coefficients.
8 Influence
[Plot: regression lines with and without the influential data point; this case has the largest influence]
9 Introduction
The problem: least squares is not resistant; one or several observations can have undue influence on the results. A quadratic-in-x term is significant here, but not when the largest x is removed. Why is this a problem? Conclusions that hinge on one or two data points must be considered extremely fragile and possibly misleading.
10 Tools
- Scatterplots
- Residual plots
- Tentative fits of models with one or more cases set aside
- A strategy for dealing with influential observations (11.3)
- Tools to help detect outliers and influential cases (11.4): Cook's distance, leverage, studentized residuals
11 Difficulties to overcome
Detection of influential observations depends on having first determined a good scale for y (transformation) and having the appropriate x's in the model. But the assessment of the appropriate functional form and x's can itself be affected by influential observations (see previous page).
12 Example of Influential Outliers
Log transformation smooths the data and reduces the influence of the outlying observations.
13 General strategy
- Start with a fairly rich model; include possible x's even if you're not sure they will appear in the final model (be careful about this with small sample sizes).
- Resolve influence and transformation simultaneously, early in the data analysis.
- In complicated problems, be prepared for dead ends.
14 Influence
By influential observation(s) we mean one or several observations whose removal causes a different conclusion in the analysis. Two strategies for dealing with the fact that least squares is not resistant: (1) use an estimating procedure that is more resistant than least squares (and don't worry about the influence problem); (2) use least squares with the strategy outlined below.
15 A strategy for dealing with influential cases
Do conclusions change when the case is deleted?
- No: proceed with the case included; study it to see if anything can be learned.
- Yes: is there reason to believe the case belongs to a population other than the one under investigation?
  - Yes: omit the case and proceed.
  - No: does the case have unusually distant explanatory-variable values?
    - Yes: omit the case and proceed; report conclusions for the reduced range of explanatory variables.
    - No: more data are needed to resolve the question.
16 Alcohol Metabolism Example (Section )
Possible influential outliers. [Scatterplot: METABOL vs. GASTRIC, with fitted values] Does the fitted regression model change when the two isolated points are removed?
17 Example: Alcohol Metabolism
Step 1: create indicator variables and interaction terms. Commands to generate dummies for female and male:
  gen female=gender if gender==1   (14 missing values generated)
  gen male=gender if gender==2   (18 missing values generated)
  replace female=0 if female!=1   (14 real changes made)
  replace male=0 if male!=2   (18 real changes made)
Interaction term:
  gen femgas=female*gastric
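The slides do this in Stata; as a hedged, language-neutral sketch, the same indicator-and-interaction construction in Python with numpy (toy data, not the alcohol-metabolism dataset; `gender` coded 1 = female, 2 = male as above):

```python
import numpy as np

# Hypothetical toy data for illustration only.
gender = np.array([1, 2, 2, 1, 2, 1])             # 1 = female, 2 = male
gastric = np.array([1.5, 2.0, 2.9, 1.1, 3.2, 0.9])

# 0/1 dummies, the effect of "gen ... ; replace ... " in the Stata commands
female = (gender == 1).astype(int)
male = (gender == 2).astype(int)

# interaction term, like: gen femgas=female*gastric
femgas = female * gastric
```

Each dummy is 1 exactly for the cases in its group, so the two dummies sum to 1 for every observation, and the interaction keeps gastric only for females.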
18 Example: Alcohol Metabolism (cont.) Step 2: run initial regression model:
19 Example: Alcohol Metabolism (cont.) Step 3: re-run the regression model, excluding the largest values of gastric (cases 31 and 32).
20 Scatterplots Female line does not change
21 Female line in detail
22 Male line in detail
23 Case influence statistics: Introduction
These help identify influential observations and help clarify the course of action. Use them when you suspect influence problems and when graphical displays may not be adequate. One useful set of case influence statistics:
- D_i: Cook's distance, for measuring influence
- h_i: leverage, for measuring unusualness of the x's
- r_i: studentized residual, for measuring outlierness
Note: i = 1, 2, ..., n. Sample use of influence statistics follows.
24 Cook's Distance: a measure of overall influence
  predict D, cooksd
  graph twoway spike D subject
Formula:
  D_i = sum_{j=1}^n (yhat_j(i) - yhat_j)^2 / (p * sigmahat^2)
Note: observations 31 and 32 have large Cook's distances. Cook's distance measures the impact that omitting a case has on the estimated regression coefficients.
25 D_i: Cook's Distance for identifying influential cases
One formula:
  D_i = sum_{j=1}^n (yhat_j(i) - yhat_j)^2 / (p * sigmahat^2)
where yhat_j(i) is the estimated mean of y at observation j, based on the reduced data set with observation i deleted; p is the number of regression coefficients; and sigmahat^2 is the estimated variance from the fit, based on all observations.
Equivalent formula (admittedly mysterious):
  D_i = (1/p) * (studres_i)^2 * h_i / (1 - h_i)
The (studres_i)^2 term is big if case i is unusual in the y-direction; the h_i / (1 - h_i) term is big if case i is unusual in the x-direction.
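The two formulas are exactly equivalent, which can be checked numerically. A hedged Python/numpy sketch on simulated data (not the slides' dataset), computing Cook's distance both by brute-force case deletion and by the closed form:

```python
import numpy as np

# Simulated data for illustration only.
rng = np.random.default_rng(0)
n, p = 30, 2                               # p = number of regression coefficients
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix
h = np.diag(H)                             # leverages h_i
beta = np.linalg.lstsq(X, y, rcond=None)[0]
yhat = X @ beta
res = y - yhat
sigma2 = res @ res / (n - p)               # sigmahat^2 from the full fit
studres = res / np.sqrt(sigma2 * (1 - h))  # studentized residuals

# Deletion definition: D_i = sum_j (yhat_j(i) - yhat_j)^2 / (p * sigmahat^2)
D_del = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta_i = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    D_del[i] = np.sum((X @ beta_i - yhat) ** 2) / (p * sigma2)

# Closed form: D_i = (1/p) * studres_i^2 * h_i / (1 - h_i)
D_closed = (studres ** 2 / p) * h / (1 - h)
```

The two vectors agree to machine precision; the closed form avoids refitting the model n times.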
26 Leverage: h_i for the single-variable case (also called the diagonal element of the hat matrix)
It measures the multivariate distance between the x's for case i and the average x's, accounting for the correlation structure. If there is only one x:
  h_i = 1/n + (x_i - xbar)^2 / sum_{j=1}^n (x_j - xbar)^2
Equivalently:
  h_i = 1/n + (1/(n-1)) * ((x_i - xbar) / s_x)^2
[Plot: the case far from xbar has a relatively large leverage]
Apart from the 1/n term, leverage is the proportion of the total sum of squares of the explanatory variable contributed by the i-th case.
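A small numeric check of the two single-x formulas against the diagonal of the hat matrix, in Python/numpy (toy data chosen so the last point is far from the mean):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 10.0])   # last point is far from the mean
n = len(x)
X = np.column_stack([np.ones(n), x])

# Leverage from the hat matrix H = X (X'X)^{-1} X'
H = X @ np.linalg.inv(X.T @ X) @ X.T
h_hat = np.diag(H)

# Formula 1: h_i = 1/n + (x_i - xbar)^2 / sum_j (x_j - xbar)^2
h1 = 1 / n + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)

# Formula 2: h_i = 1/n + (1/(n-1)) * ((x_i - xbar) / s_x)^2
s_x = x.std(ddof=1)
h2 = 1 / n + ((x - x.mean()) / s_x) ** 2 / (n - 1)
```

All three agree, the extreme point has by far the largest leverage, and the leverages average to p/n (here 2/5), which motivates the h_i > 2p/n rule of thumb below.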
27 Leverage: h_i for the multivariate case
For several x's, h_i has a matrix expression. [Plot: a case can be unusual in its combination of explanatory-variable values even though it is not unusual in x1 or x2 individually]
28 Studentized residual for detecting outliers (in the y-direction)
Formula:
  studres_i = res_i / SE(res_i)
Fact:
  SE(res_i) = sigmahat * sqrt(1 - h_i)
i.e. different residuals have different variances, and since 0 < h_i < 1, those with the largest h_i (unusual x's) have the smallest SE(res_i). For outlier detection use this type of residual (but use ordinary residuals in the standard residual plots).
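A hedged Python/numpy sketch of these facts on simulated data (one deliberately high-leverage x): the extreme-x case gets the largest h_i and therefore the smallest SE(res_i).

```python
import numpy as np

# Simulated data for illustration only; last x is extreme.
rng = np.random.default_rng(1)
n = 25
x = np.append(rng.normal(size=n - 1), 8.0)
y = 0.5 + 1.5 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

beta = np.linalg.lstsq(X, y, rcond=None)[0]
res = y - X @ beta
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
sigma = np.sqrt(res @ res / (n - 2))

se_res = sigma * np.sqrt(1 - h)    # SE(res_i), smaller where h_i is large
studres = res / se_res             # studentized residuals
```

Dividing by SE(res_i) puts residuals at different leverages on a common scale, which is what makes studres usable for outlier detection.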
29 How to use case influence statistics
Get the triplet (D_i, h_i, studres_i) for each i from 1 to n. Look to see whether any D_i's are large: large D_i's indicate influential observations. Note: you ARE allowed to investigate these more closely by manual case deletion. h_i and studres_i help explain the reason for the influence (unusual x-value, outlier, or both), which helps in deciding the course of action outlined in the strategy for dealing with suspected influential cases.
30 ROUGH guidelines for "large" (note the emphasis on ROUGH)
- D_i values near or larger than 1 are good indications of influential cases; sometimes a D_i much larger than the others in the data set is worth looking at.
- The average of the h_i is always p/n; some people suggest treating h_i > 2p/n as large.
- Based on normality, |studres_i| > 2 is considered large.
31 Sample situation with a single x
32 STATA commands: predict derives statistics from the most recently fitted model. Some predict options that can be used after anova or regress are:
  predict newvariable, cooksd     (Cook's distance)
  predict newvariable, rstudent   (studentized residuals)
  predict newvariable, hat        (leverage)
33
1. predict D, cooksd
2. graph twoway scatter D subject, msymbol(i) mlabel(subject) ytitle("Cook's D") xlabel(0(5)35) ylabel(0(0.5)1.5) title("Cook's Distance by subject")
34
1. predict studres, rstudent
2. graph twoway scatter studres subject, msymbol(i) mlabel(subject) ytitle("Studentized Residuals") title("Studentized Residuals by subject")
35
1. predict leverage, hat
2. graph twoway scatter leverage subject, msymbol(i) mlabel(subject) ytitle("Leverage") ylabel(0(.1).5) xlabel(0(5)35) title("Leverage by subject")
36 Alternative case influence statistics
Alternative to D_i: dffits_i (and others). Alternative to studres_i: the externally studentized residual. Suggestion: use whatever is convenient in the statistical computer package you're using. Note: D_i only detects the influence of single cases; influential pairs may go undetected.
37 Partial Residual Plots
A problem: a scatterplot of y vs. x2 gives information regarding mu(y | x2) about (a) whether x2 is a useful predictor of y, (b) nonlinearity in x2, and (c) outliers and influential observations. We would like a plot revealing (a), (b), and (c) for mu(y | x1, x2, x3), e.g. what is the effect of x2 after accounting for x1 and x3?
38 Example: SAT Data (Case 12.01)
Question: is the distribution of state-average SAT scores associated with state expenditure on public education, after accounting for the percentage of high-school students who take the SAT test? We would like to visually explore the function f(expend) in:
  mu(SAT | takers, expend) = beta_0 + beta_1*takers + f(expend)
After controlling for the number of students taking the test, do expenditures affect performance?
39 Step 1: Scatterplots
Marginal plots: y vs. x1 and y vs. x2. The scatterplot matrix depicts the interrelations among these three variables.
40 Stata Commands: avplot
The added-variable plot is also known as a partial-regression leverage plot, an adjusted partial-residual plot, or an adjusted-variable plot. It depicts the relationship between y and one x variable, adjusting for the effects of the other x variables. Avplots help uncover observations exerting a disproportionate influence on the regression model: high-leverage observations show up in added-variable plots as points horizontally distant from the rest of the data.
41 Added variable plots - Is the state with largest expenditure influential? - Is there an association of expend and SAT, after accounting for takers?
42 Alaska is unusual in its expenditure, and is apparently quite influential
43 After accounting for % of students who take SAT, there is a positive association between expenditure and mean SAT scores.
44 Component plus Residual
We'd like to plot y versus x2 but with the effect of x1 subtracted out; i.e. plot y - betahat_0 - betahat_1*x1 versus x2. To approximate this, get the partial residual for x2:
a. Get betahat_0, betahat_1, betahat_2 in muhat(y | x1, x2) = betahat_0 + betahat_1*x1 + betahat_2*x2.
b. Compute the partial residual as pres_2 = res + betahat_2*x2, where res is the residual from the fit in (a).
This is also called a component plus residual; note that pres_2 = y - betahat_0 - betahat_1*x1.
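The identity in the last line can be verified directly. A hedged Python/numpy sketch on simulated data (not the SAT data): fit y on x1 and x2, then form the partial residual for x2 both ways.

```python
import numpy as np

# Simulated data for illustration only.
rng = np.random.default_rng(2)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
res = y - (b0 + b1 * x1 + b2 * x2)        # residuals from the full fit

pres2 = res + b2 * x2                      # component plus residual for x2
# equivalently: y with the intercept and the x1 component subtracted out
pres2_alt = y - b0 - b1 * x1
```

Plotting pres2 against x2 then shows the effect of x2 after accounting for x1, including any nonlinearity.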
45 Stata Commands: cprplot
The component-plus-residual plot is also known as a partial-residual plot. The command cprplot x graphs each observation's residual plus its component predicted from x against values of x. Cprplots help diagnose nonlinearities and suggest alternative functional forms.
46 Graph: cprplot x1
Graphs each observation's residual plus its component predicted from x1, i.e. e_i + b1*x1i, against values of x1i.
47 Heteroskedasticity (Non-constant Variance)
  reg api00 meals ell emer
  rvfplot, yline(0)
Heteroskedastic: systematic variation in the size of the residuals. Here, for instance, the variance for smaller fitted values is greater than for larger ones.
50 Tests for Heteroskedasticity
whitetst is a user-written command downloaded from the web. The model fails both hettest and whitetst.
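Stata's hettest implements a Breusch-Pagan-style test: regress the (scaled) squared residuals on the fitted values and see whether the relationship is flat. A rough Python/numpy sketch of the idea on simulated heteroskedastic data (hypothetical; the scaling follows the standard Breusch-Pagan LM statistic, which is chi-squared with 1 d.f. under homoskedasticity):

```python
import numpy as np

# Simulated data whose error variance grows with x.
rng = np.random.default_rng(6)
n = 200
x = rng.uniform(1, 5, size=n)
y = 1.0 + 2.0 * x + x * rng.normal(size=n)   # sd of the error is proportional to x

X = np.column_stack([np.ones(n), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]
res = y - X @ b
yhat = X @ b

# Auxiliary regression of normalized squared residuals on the fitted values
u = res ** 2 / np.mean(res ** 2)
Z = np.column_stack([np.ones(n), yhat])
g = np.linalg.lstsq(Z, u, rcond=None)[0]
fit_u = Z @ g

# LM statistic = (explained sum of squares of the auxiliary regression) / 2
lm = np.sum((fit_u - u.mean()) ** 2) / 2     # compare with chi2(1) critical value 3.84
```

With variance this strongly tied to x, the statistic comfortably exceeds the 5% critical value, mirroring the "fails hettest" result on the slide.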
51 Another Example
  reg api00 enroll
  rvfplot
These error terms are really bad! Previous analysis suggested logging enrollment to correct skewness.
52 Another Example
Much better: the errors look more or less normal now.
  gen lenroll = log(enroll)
  reg api00 lenroll
  rvfplot
53 Back to First Example
Adding enrollment keeps the errors normal; we don't need to take the log of enrollment this time.
  reg api00 meals ell emer enroll
  rvfplot, yline(0)
54 Weighted regression for certain types of non-constant variance
1. Suppose:
  mu(y | x1, x2) = beta_0 + beta_1*x1 + beta_2*x2
  var(y | x1, x2) = sigma^2 / w_i
and the w_i's are known.
2. Weighted least squares is the appropriate tool for this model; it minimizes the weighted sum of squared residuals:
  sum_{i=1}^n w_i * (y_i - betahat_0 - betahat_1*x1i - betahat_2*x2i)^2
3. In statistical computer programs: use linear regression in the usual way, specify the column w as a weight, and read the output in the usual way.
55 Weighted regression for certain types of non-constant variance (cont.)
4. Important special cases where this is useful:
a. y_i is an average based on a sample of size m_i; then var(y_i) = sigma^2 / m_i, so the weights are w_i = m_i.
b. the variance is proportional to x, so w_i = 1/x_i.
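A hedged Python/numpy sketch of case (a) on simulated data: each y_i is a group average based on m_i observations, so var(y_i) = sigma^2/m_i and the weight is w_i = m_i. WLS solves the normal equations (X'WX) beta = X'Wy.

```python
import numpy as np

# Simulated grouped data for illustration only.
rng = np.random.default_rng(3)
n = 50
x = rng.uniform(1, 10, size=n)
m = rng.integers(1, 20, size=n)                       # group sizes m_i
y = 2.0 + 3.0 * x + rng.normal(scale=1 / np.sqrt(m))  # var(y_i) = sigma^2 / m_i

X = np.column_stack([np.ones(n), x])
W = np.diag(m.astype(float))                          # weights w_i = m_i

# WLS estimate: beta = (X'WX)^{-1} X'Wy
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

This is exactly what specifying the weight column in a regression command does; the estimates recover the true intercept and slope (2 and 3 here) closely.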
56 Multicollinearity
This means that two or more regressors are highly correlated with each other. It doesn't bias the estimates of the dependent variable, so it's not a problem if all you care about is the predictive accuracy of the model. But it does affect inferences about the significance of the collinear variables. To understand why, go back to Venn diagrams.
57 Multicollinearity Variable X explains Blue + Red Variable W explains Green + Red So how should Red be allocated?
58 Multicollinearity We could: 1. Allocate Red to both X and W 2. Split Red between X and W (using some formula) 3. Ignore Red entirely
59 Multicollinearity In fact, only the information in the Blue and Green areas is used to predict Y. Red area is ignored when estimating β x and β w
60 Venn Diagrams and Estimation of Regression Model
[Venn diagram: Temp, Oil, Insulation] Only the part of Oil's variation shared with Temp alone is used in the estimation of beta_1; only the part shared with Insulation alone is used in the estimation of beta_2. The information shared by all three is NOT used in the estimation of either beta_1 or beta_2.
61 Venn Diagrams and Collinearity Y This is the usual situation: some overlap between regressors, but not too much. X W
62 Venn Diagrams and Collinearity
Now the overlap is so big that there's hardly any information left over to use when estimating beta_x and beta_w. These variables interfere with each other.
63 Venn Diagrams and Collinearity
The large overlap reflects collinearity between Temp and Insulation. The overlapping variation of Temp and Insulation is used in explaining the variation in Oil, but NOT in estimating beta_1 and beta_2.
64 Testing for Collinearity
quietly suppresses all output:
  . quietly regress api00 meals ell emer
  . vif
  Variable    VIF    1/VIF
  meals
  ell
  emer
  Mean VIF    2.22
VIF = variance inflation factor; any value over 10 is worrisome. These results are not too bad.
66 Testing for Collinearity
Now add different regressors:
  . qui regress api00 acs_k3 avg_ed grad_sch col_grad some_col
  . vif
  Variable    VIF    1/VIF
  avg_ed
  grad_sch
  col_grad
  some_col
  acs_k3
  Mean VIF    15.66
Much worse. Problem: the education variables are highly correlated. Solution: delete collinear factors.
70 Testing for Collinearity
Delete average parent education:
  . qui regress api00 acs_k3 grad_sch col_grad some_col
  . vif
  Variable    VIF    1/VIF
  col_grad
  grad_sch
  some_col
  acs_k3
  Mean VIF    1.15
This solves the problem.
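Behind Stata's vif command is a simple computation: regress each x_k on the other regressors and set VIF_k = 1/(1 - R²_k). A hedged Python/numpy sketch on simulated data (hypothetical variable names, not the api00 dataset), with one nearly collinear pair:

```python
import numpy as np

def vif(X):
    """Variance inflation factors for an n x k matrix of regressors
    (no constant column): VIF_k = 1 / (1 - R^2_k), where R^2_k comes
    from regressing column k on the other columns plus an intercept."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b = np.linalg.lstsq(others, X[:, j], rcond=None)[0]
        resid = X[:, j] - others @ b
        r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = 1 / (1 - r2)
    return out

# Simulated regressors: x2 is nearly a copy of x1, x3 is independent.
rng = np.random.default_rng(4)
x1 = rng.normal(size=100)
x2 = x1 + 0.1 * rng.normal(size=100)
x3 = rng.normal(size=100)
v = vif(np.column_stack([x1, x2, x3]))
```

The collinear pair gets VIFs far above the rule-of-thumb threshold of 10, while the independent regressor stays near 1; dropping one of the pair removes the problem, as on the slide.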
71 Measurement errors in x's
Fact: least squares estimates are biased, and inferences about
  mu(y | x1, x2) = beta_0 + beta_1*x1 + beta_2*x2
can be misleading, if the available data for estimating the regression are observations (y, x1, x2*), where x2* is an imprecise measurement of x2 (even though it may be an unbiased measurement). This is an important problem to be aware of; general-purpose solutions do not exist in standard statistical programs. Exception: if the purpose of the regression is to predict future y's from future values of x1 and x2*, then there is no need to worry about x2* being a measurement of x2.
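The bias is easy to see by simulation. A hedged Python/numpy sketch of classical errors-in-variables in the one-regressor case (hypothetical data): measuring x with unbiased noise shrinks the estimated slope toward zero by the factor var(x) / (var(x) + var(measurement error)).

```python
import numpy as np

# Simulated illustration of attenuation bias.
rng = np.random.default_rng(5)
n = 100_000
x2 = rng.normal(size=n)                  # true regressor, var = 1
y = 1.0 + 2.0 * x2 + rng.normal(size=n)  # true slope is 2
x2_star = x2 + rng.normal(size=n)        # unbiased but noisy measurement, var = 2

# OLS of y on the mismeasured regressor
X = np.column_stack([np.ones(n), x2_star])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# Theory: plim of the slope is 2 * var(x2) / (var(x2) + var(noise)) = 2 * 1/2 = 1
```

With this much measurement noise the estimated slope lands near 1 rather than the true 2, exactly the attenuation the slide warns about; prediction from x2* itself, by contrast, remains valid.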
Model Diagnostics for Regression
Model Diagnostics for Regression After fitting a regression model it is important to determine whether all the necessary model assumptions are valid before performing inference. If there are any violations,
Obesity in America: A Growing Trend
Obesity in America: A Growing Trend David Todd P e n n s y l v a n i a S t a t e U n i v e r s i t y Utilizing Geographic Information Systems (GIS) to explore obesity in America, this study aims to determine
MULTIPLE REGRESSION WITH CATEGORICAL DATA
DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting
Standard errors of marginal effects in the heteroskedastic probit model
Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic
Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:
Doing Multiple Regression with SPSS Multiple Regression for Data Already in Data Editor Next we want to specify a multiple regression analysis for these data. The menu bar for SPSS offers several options:
Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.
1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis
Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015
Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015 Stata Example (See appendices for full example).. use http://www.nd.edu/~rwilliam/stats2/statafiles/multicoll.dta,
Week 5: Multiple Linear Regression
BUS41100 Applied Regression Analysis Week 5: Multiple Linear Regression Parameter estimation and inference, forecasting, diagnostics, dummy variables Robert B. Gramacy The University of Chicago Booth School
1/27/2013. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2
PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Introduce moderated multiple regression Continuous predictor continuous predictor Continuous predictor categorical predictor Understand
Review Jeopardy. Blue vs. Orange. Review Jeopardy
Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?
Chapter 7: Dummy variable regression
Chapter 7: Dummy variable regression Why include a qualitative independent variable?........................................ 2 Simplest model 3 Simplest case.............................................................
Data analysis and regression in Stata
Data analysis and regression in Stata This handout shows how the weekly beer sales series might be analyzed with Stata (the software package now used for teaching stats at Kellogg), for purposes of comparing
11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
Directions for using SPSS
Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...
Factor Analysis. Chapter 420. Introduction
Chapter 420 Introduction (FA) is an exploratory technique applied to a set of observed variables that seeks to find underlying factors (subsets of variables) from which the observed variables were generated.
Session 7 Bivariate Data and Analysis
Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares
17. SIMPLE LINEAR REGRESSION II
17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.
The importance of graphing the data: Anscombe s regression examples
The importance of graphing the data: Anscombe s regression examples Bruce Weaver Northern Health Research Conference Nipissing University, North Bay May 30-31, 2008 B. Weaver, NHRC 2008 1 The Objective
Economics of Strategy (ECON 4550) Maymester 2015 Applications of Regression Analysis
Economics of Strategy (ECON 4550) Maymester 015 Applications of Regression Analysis Reading: ACME Clinic (ECON 4550 Coursepak, Page 47) and Big Suzy s Snack Cakes (ECON 4550 Coursepak, Page 51) Definitions
Scatter Plots with Error Bars
Chapter 165 Scatter Plots with Error Bars Introduction The procedure extends the capability of the basic scatter plot by allowing you to plot the variability in Y and X corresponding to each point. Each
Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.
Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. In the main dialog box, input the dependent variable and several predictors.
Stata Walkthrough 4: Regression, Prediction, and Forecasting
Stata Walkthrough 4: Regression, Prediction, and Forecasting Over drinks the other evening, my neighbor told me about his 25-year-old nephew, who is dating a 35-year-old woman. God, I can t see them getting
Describing, Exploring, and Comparing Data
24 Chapter 2. Describing, Exploring, and Comparing Data Chapter 2. Describing, Exploring, and Comparing Data There are many tools used in Statistics to visualize, summarize, and describe data. This chapter
Homework 8 Solutions
Math 17, Section 2 Spring 2011 Homework 8 Solutions Assignment Chapter 7: 7.36, 7.40 Chapter 8: 8.14, 8.16, 8.28, 8.36 (a-d), 8.38, 8.62 Chapter 9: 9.4, 9.14 Chapter 7 7.36] a) A scatterplot is given below.
MTH 140 Statistics Videos
MTH 140 Statistics Videos Chapter 1 Picturing Distributions with Graphs Individuals and Variables Categorical Variables: Pie Charts and Bar Graphs Categorical Variables: Pie Charts and Bar Graphs Quantitative
Part 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
Lean Six Sigma Analyze Phase Introduction. TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY
TECH 50800 QUALITY and PRODUCTIVITY in INDUSTRY and TECHNOLOGY Before we begin: Turn on the sound on your computer. There is audio to accompany this presentation. Audio will accompany most of the online
Notes on Applied Linear Regression
Notes on Applied Linear Regression Jamie DeCoster Department of Social Psychology Free University Amsterdam Van der Boechorststraat 1 1081 BT Amsterdam The Netherlands phone: +31 (0)20 444-8935 email:
Means, standard deviations and. and standard errors
CHAPTER 4 Means, standard deviations and standard errors 4.1 Introduction Change of units 4.2 Mean, median and mode Coefficient of variation 4.3 Measures of variation 4.4 Calculating the mean and standard
IMPACT EVALUATION: INSTRUMENTAL VARIABLE METHOD
REPUBLIC OF SOUTH AFRICA GOVERNMENT-WIDE MONITORING & IMPACT EVALUATION SEMINAR IMPACT EVALUATION: INSTRUMENTAL VARIABLE METHOD SHAHID KHANDKER World Bank June 2006 ORGANIZED BY THE WORLD BANK AFRICA IMPACT
UNDERSTANDING THE TWO-WAY ANOVA
UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables
10. Analysis of Longitudinal Studies Repeat-measures analysis
Research Methods II 99 10. Analysis of Longitudinal Studies Repeat-measures analysis This chapter builds on the concepts and methods described in Chapters 7 and 8 of Mother and Child Health: Research methods.
TRINITY COLLEGE. Faculty of Engineering, Mathematics and Science. School of Computer Science & Statistics
UNIVERSITY OF DUBLIN TRINITY COLLEGE Faculty of Engineering, Mathematics and Science School of Computer Science & Statistics BA (Mod) Enter Course Title Trinity Term 2013 Junior/Senior Sophister ST7002
Introduction to Linear Regression
14. Regression A. Introduction to Simple Linear Regression B. Partitioning Sums of Squares C. Standard Error of the Estimate D. Inferential Statistics for b and r E. Influential Observations F. Regression
Systat: Statistical Visualization Software
Systat: Statistical Visualization Software Hilary R. Hafner Jennifer L. DeWinter Steven G. Brown Theresa E. O Brien Sonoma Technology, Inc. Petaluma, CA Presented in Toledo, OH October 28, 2011 STI-910019-3946
Fairfield Public Schools
Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity
