MULTIPLE REGRESSION WITH CATEGORICAL DATA

Size: px
Start display at page:

Transcription

1 DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting coefficients 3. Interaction B. Reading: Agresti and Finlay Statistical Methods in the Social Sciences, 3 rd edition, Chapter, pages 449 to 46. II. CATEGORICAL INDEPENDENT VARIABLES: A. Problem: what does sex discrimination in employment mean and how can it be measured? B. To answer these questions consider these artificial data pertaining to employment records of a sample of employees of Ace Manufacturing: i Sex Merit Pay i Sex Merit Pay Bob M 9.6 Tim M 6.0 Paul M 8.3 George M. Mary F 4. Alan M 9. John M 8.8 Lisa F 3.3 Nancy F. Anne F.7 C. Data: here the dependent variable, Y, is merit pay increase measured in percent and the "independent" variable is sex which is quite obviously a nominal or categorical variable. D. Our goal is to use categorical variables to explain variation in Y, a quantitative dependent variable.. We need to convert the categorical variable gender into a form that makes sense to regression analysis. E. One way to represent a categorical variable is to code the categories 0 and as follows:

2 Posc/Uapp 86 Class 4 Multiple Regression With Categorical Data Page let X = if sex is "male" 0 otherwise. Example: Bob is scored "" because he is male; Mary is 0. F. Called dummy variables, data coded according this 0 and scheme, are in a sense arbitrary but still have some desirable properties.. A dummy variable, in other words, is a numerical representation of the categories of a nominal or ordinal variable. G. Interpretation: by creating X with scores of and 0 we can transform the above table into a set of data that can be analyzed with regular regression. Here is what the data matrix would look like prior to using, say, MINITAB: i Sex Merit Pay i Sex Merit Pay (c) (c) (c3) (c) (c) (c3) Bob 9.6 Tim 6.0 Paul 8.3 George. Mary 0 4. Alan 9. John 8.8 Lisa Nancy 0. Anne 0.7. III. H. Except for the first column, these data can be considered numeric: merit pay is measured in percent, while gender is dummy or binary variable with two values, for male and 0 for female.. We can use these numbers in formulas just like any data.. Of course, there is something artificial about choosing 0 and, for why couldn t we use and or 33 and 55.6 or any other pair of numbers? 3. The answer is that we could. Using scores of 0 and, however, leads to particularly simple interpretations of the results of regression analysis, as we ll see below. INTERPRETATION OF COEFFICIENTS: A. If the categorical variable has K categories (e.g., region which might have K = 4 categories--north, South, Midwest, and West) one uses K - dummy variables as seen later. B. Using regular OLS analysis the parameter estimators can be interpreted as usual: a one-unit change in X leads to \$ change in Y. C. But given the definition of the variables a more straight forward interpretation is possible. The model is:

3 Posc/Uapp 86 Class 4 Multiple Regression With Categorical Data Page 3 E(Y i ) ' \$ 0 X. The model states that the expected value of Y--in this case, the expected merit pay increase--equals β 0 plus β times X. But what are the two possible values of X?. First consider males; that is, X =. Substitute into the model: E(Y i ) ' \$ 0 X ' \$ 0 () ' \$ 0 i. The expected merit pay increase for males is thus β + β Now consider the model for females, i.e., X = 0. Again, make the substitution and reduce the equation. E(Y i ) ' \$ 0 X ' \$ 0 (0) ' \$ 0 i. We can see from these equations that β is the expected value of Y 0 (remember it's merit pay increase in this example) for those subjects or units coded 0 on X--in this instance it is the expected pay ii. increase of females. Stated differently but equivalently, β is the 0 mean of Y (% pay increase) in the population of units coded 0 on X (i.e., females). ) That is, β is µ where µ is the mean of the dependent variable for the group coded 0. ) Remember: the expected value of a random variable is its mean or E(Y ) = µ i β is the "effect," so to speak, of moving or changing from category 0 to category --here of changing from female to male--on the dependent variable. ) Specifically, if β > 0, then the expected value of Y is higher for the group members (e.g. males) than group 0 cases (e.g., females). Thus, if β > 0 then men get higher increases on average than women do. ) On the other hand, if β < 0, then group people (units) get less Y than do group 0 individuals. If β < 0, in other words,

4 Posc/Uapp 86 Class 4 Multiple Regression With Categorical Data Page 4 then females receive higher pay increases. 3) If β = 0, then both groups have the same expected value on Y. 4. Hence, knowing the values of β 0 and β tells us a lot about the nature of the relationship. D. Using 0 and to code gender (or any categorical variable) thus leads to particularly simple interpretations.. If we used other pairs of numbers, we would get the correct results but they would be hard to interpret. E. Example: to put some "flesh" on these concepts suppose we regressed merit pay (Y) on gender.. We obtain the estimated equation: Ŷ ' 3.08 % 4.09X R =.48. The estimate of the average merit pay increase for women in the population is 3.08 percent. (Let X = 0 and simplify the equation.) 3. Men, on average, get 4.09 percent more than women. hence their average increase is: Ŷ men ' 3.08 % 4.09() ' 3.09 % 4.09 ' The "effect" of being male is 4.09 percent greater merit pay than what women get. 5. Here is a partial regression ANOVA table: Source df SS MS F obs Regression (sex) Residual Total variation in Y

5 Posc/Uapp 86 Class 4 Multiple Regression With Categorical Data Page 5 6. At the.05 level, the critical value of F with and 8 degrees of freedom is 5.3. Thus, the observed F is barely significant. Since the critical F at the.0 level is.6, the result (the observed "effect" of Y that is) has a probability of happening by chance of between.05 and.0. IV. ANOTHER EXAMPLE: A. Here is a problem to consider. A law firm has been asked to represent a group of women who charge that their employer, GANGRENE CHEMICAL CO., discriminates against them, especially in pay. The women claim that salary increases for females are consistently and considerably lower than the raises men receive. GANGRENE counters that increases are based entirely on job performance as measured by an impartial "supervisor rating of work" evaluation which includes a number of performance indicators. You have been asked by the law firm to make a preliminary assessment of the merits of the claim. To begin with, you draw a random sample from the company's files: File Quality of Salary No. Sex work Years Increase score Experience Division F 0 9 Production \$ F 90 Production 96 3 F 0 4 Production 47 4 F 80 Production 8 5 F 30 4 Research 64 6 F 70 Research 5 7 F 0 4 Sales 73 8 F 5 7 Production 9 9 M 0 6 Research 8 0 M 80 3 Sales 474 M 50 3 Research 34 M 70 Sales M 30 7 Sales 85 4 M 70 7 Sales 33 5 M 40 Sales 67 6 M 90 6 Production 57 7 M 50 8 Production 390

6 Posc/Uapp 86 Class 4 Multiple Regression With Categorical Data Page 6 B. The data:. Salary increase is measured in extra dollars per month.. The quality of work index ranges from 0 (lowest) to 00 (highest) rating. 3. Division is divisions within the company C. Questions:. Even a cursory glance at the data reveal that men get higher increases than women. But the real question is why?. Notice for example that men differ from women on other factors such as experience, division, and job performance evaluations. Are the differences in salary increases due to these factors? 3. Another problem: suppose raises are tied solely to performance ratings. Is there discrimination in these evaluations. D. Preliminary model: E(Y i ) ' \$ 0 % \$ X X % \$ Z Z % \$ W W. X is gender coded: if female 0 otherwise. Z is job performance evaluation (i.e., quality of work) 3. W = XZ, an interaction term (see below) E. Interaction (W):. The interaction term has this meaning or interpretation: consider the relationship between Y and Z. So far in this course, this relationship has been measured by β, the regression coefficient of Y on Z. This coefficient Z is a partial coefficient in that it measures the impact of Z on Y when other variables have been held constant. But suppose the effect of Z on Y depends on the level of another variable, say X. Then, β by itself would Z not be enough to describe the relationship because there is no simple relationship between Y and Z. It depends on the level of X. This is the idea of interaction.. To be more specific, suppose the relationship between work performance (Z) and pay increase (Y) depends on a worker's sex. That is, suppose a one-unit increase in quality of work performance evaluations for women brings a \$.00 increase in salary but a one unit increase in Z brings a \$.00 increase for men. In this case, the effect of Z, quality of work, depends on or is affected by sex. 3. To test this idea we have to create an "interaction" variable by multiplying Z by X: W ' XZ ' (Gender) (Work index)

7 Posc/Uapp 86 Class 4 Multiple Regression With Categorical Data Page 7 4. The variable can be added to the model. If it turns out to be non-significant or does not seem to add much to the model's explanatory power, then it can be dropped. Dropping the interaction term in this context amounts to saying that the job performance rating has the same impact on salary increases for both sexes. If, on the other hand, there is a difference in effects of Z, the interaction term will explain some of the variation in Y. 5. The important point is that W has a substantive meaning. In this context it would indicate the presence of a second kind of discrimination: i. One type of discrimination is that on average women get smaller raises. ii. Another type is that extra increments in job evaluations bring less rewards for women than men. F. Initial OLS estimates of the model:. This model is called complete because it contains X, Z, and W: Ŷ ' & 9.7X % 4.84Z & 4.05W i. The R for the complete model is.94.. Interpretation of the parameters: i. The coefficients can be interpreted in the usual way. But using the logic presented earlier in conjunction with the definition of the dummy variables, there are more straightforward interpretations. ii. For men the estimated equation is Ŷ ' % 4.84Z because X ' W ' 0 (remember X ' 0 for men and hence W also equals 0) 3. Thus when Z (job performance) is zero, men can expect on average to get a salary increase of \$ For each one-unit increase in their job evaluations they get an extra \$ Now look at the equation for women: Ŷ ' (59.94 & 9.7) % (4.84 & 4.05)Z ' % 30.3 %.79Z i. For women X = and hence -9.7X = -9.7 and so forth. Work through the equations yourself to make sure you grasp what is going on.

8 Posc/Uapp 86 Class 4 Multiple Regression With Categorical Data Page 8 5. Therefore, women can expect an average salary increase of only \$30.3 when Z is zero and a one-unit increase in the job evaluation index only brings an extra 79 cents in raises. 6. These figures suggest that there are indeed two types of discrimination working in the company: i. For men β 0 is \$59.94 whereas for women it is \$30.3, a difference of almost \$30. ii. For men, furthermore, β Z, which measures the return on work performance, is \$4.85 while for women the return is only \$ Here is a diagram that illustrates these ideas. (It is not drawn to scale.) Figure : Interaction Effect V. TESTING FOR INTERACTION: A. The estimates are different but since this is a small sample one will want evidence that the differences are not due to sampling error. We thus need to test the significance of β W.. If it is not significant we might want to drop it from the model.. That would mean that at least one type of discrimination was not operative. B. The strategy is this:. We use an extra or added sum of squares logic.. Does the inclusion in a model of a variable or set of variables significantly increase the explained sum of squares over what we would expect from chance. 3. So we will compare two models R 's: i. The first will be for the model containing all of the variables, a model we call complete. ii. The second is the R for the model without the variables of interest.

9 Posc/Uapp 86 Class 4 Multiple Regression With Categorical Data Page 9 It s called the reduced model. C. One way to do this is to estimate a complete model as above and obtain an observed F for it. Here, the F for the complete model is with 3 and 3 degrees of freedom. Next, estimate a reduced model, that is a model in which W has not been included: E(Y) ' \$ 0 % \$ X X % \$ Z Z i. This model will also have an observed F. The difference between the F's can be evaluated by the formula: F obs ' (R com & R reduced ) ( & R com ) (N & K &) g. D. Here is the R for the complete model and is the R for the R com R reduced reduced model, K is the number of independent variables in the complete model counting interaction (here 3), and g is the number of variables left out of the reduced model (here ).. Believe it or not, the previous formula is just an application of the ratio of mean squares we discussed earlier in the semester when describing the F test. i. The formula is the ratio of two estimators of the error variance. ) The first is the residual sum of squares divided by its degrees of freedom. ) The error or residual sum of squares is equivalent to & R comp 3) When divided by degrees of freedom N - K -, we have an estimator of the error standard deviation. ii. It can be shown that the component R comp & R reduced divided by its degrees of freedom, g, is an independent and unbiased estimator of the error standard deviation, under the hypothesis that the extra explained sum of squares added by the additional variables is nil or zero.. The R's are obtained from MINITAB (or SPSS) in the usual way. i. I just regress Y on all of the variables including the interaction term and noted the R

10 Posc/Uapp 86 Class 4 Multiple Regression With Categorical Data Page 0 ii. I then drop the interaction term and get the R for the reduced model. 3. For this example the F is, where g = and N - K - = 3: F obs ' (.9409 &.8340) ( &.9409) (7 & 3 &) ' 3.5 i. The critical F for and 3 degrees of freedom is 7.8 at the.00 level. ii. Therefore, include interaction term measured by β W is statistically significant (at the.00 level) and should be included in the model. ) That is, the observed F exceeds the critical F at the.00 level. ) In other words, the interaction term adds significantly to the explanatory power of the model. 3) This means in turn that there is evidence of the second kind of discrimination. VI. NEXT TIME: A. More on dummy variables and multiple regression. B. Transformations C. Regression diagnostics Go to Notes page Go to Statistics page

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

LOGIT AND PROBIT ANALYSIS

LOGIT AND PROBIT ANALYSIS A.K. Vasisht I.A.S.R.I., Library Avenue, New Delhi 110 012 amitvasisht@iasri.res.in In dummy regression variable models, it is assumed implicitly that the dependent variable Y

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

1.5 Oneway Analysis of Variance

Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

4. Multiple Regression in Practice

30 Multiple Regression in Practice 4. Multiple Regression in Practice The preceding chapters have helped define the broad principles on which regression analysis is based. What features one should look

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

Univariate Regression

Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

SCHOOL OF HEALTH AND HUMAN SCIENCES DON T FORGET TO RECODE YOUR MISSING VALUES

SCHOOL OF HEALTH AND HUMAN SCIENCES Using SPSS Topics addressed today: 1. Differences between groups 2. Graphing Use the s4data.sav file for the first part of this session. DON T FORGET TO RECODE YOUR

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:

Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test Features Of The Chi-Square Statistic The chi-square test is non-parametric. That is, it makes no assumptions about the distribution

Elementary Statistics Sample Exam #3

Elementary Statistics Sample Exam #3 Instructions. No books or telephones. Only the supplied calculators are allowed. The exam is worth 100 points. 1. A chi square goodness of fit test is considered to

Association Between Variables

Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. In the main dialog box, input the dependent variable and several predictors.

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

SPSS Guide: Regression Analysis

SPSS Guide: Regression Analysis I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

One-Way Analysis of Variance

One-Way Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We

Premaster Statistics Tutorial 4 Full solutions

Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for

Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

Part 1 Expressions, Equations, and Inequalities: Simplifying and Solving

Section 7 Algebraic Manipulations and Solving Part 1 Expressions, Equations, and Inequalities: Simplifying and Solving Before launching into the mathematics, let s take a moment to talk about the words

Solving Rational Equations

Lesson M Lesson : Student Outcomes Students solve rational equations, monitoring for the creation of extraneous solutions. Lesson Notes In the preceding lessons, students learned to add, subtract, multiply,

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

Module 3: Correlation and Covariance

Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

1/27/2013. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2

PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Introduce moderated multiple regression Continuous predictor continuous predictor Continuous predictor categorical predictor Understand

CS 147: Computer Systems Performance Analysis

CS 147: Computer Systems Performance Analysis One-Factor Experiments CS 147: Computer Systems Performance Analysis One-Factor Experiments 1 / 42 Overview Introduction Overview Overview Introduction Finding

One-Way ANOVA using SPSS 11.0. SPSS ANOVA procedures found in the Compare Means analyses. Specifically, we demonstrate

1 One-Way ANOVA using SPSS 11.0 This section covers steps for testing the difference between three or more group means using the SPSS ANOVA procedures found in the Compare Means analyses. Specifically,

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

Module 5: Multiple Regression Analysis

Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

Introduction to Analysis of Variance (ANOVA) Limitations of the t-test

Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only

ECON 142 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE #2

University of California, Berkeley Prof. Ken Chay Department of Economics Fall Semester, 005 ECON 14 SKETCH OF SOLUTIONS FOR APPLIED EXERCISE # Question 1: a. Below are the scatter plots of hourly wages

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

Lecture 15. Endogeneity & Instrumental Variable Estimation

Lecture 15. Endogeneity & Instrumental Variable Estimation Saw that measurement error (on right hand side) means that OLS will be biased (biased toward zero) Potential solution to endogeneity instrumental

Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

Example: Boats and Manatees

Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

Study Guide for the Final Exam

Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

Statistics Review PSY379

Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

Multinomial and Ordinal Logistic Regression

Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,

Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015

Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015 Stata Example (See appendices for full example).. use http://www.nd.edu/~rwilliam/stats2/statafiles/multicoll.dta,

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing

Introduction Hypothesis Testing Mark Lunt Arthritis Research UK Centre for Ecellence in Epidemiology University of Manchester 13/10/2015 We saw last week that we can never know the population parameters

DDBA 8438: The t Test for Independent Samples Video Podcast Transcript

DDBA 8438: The t Test for Independent Samples Video Podcast Transcript JENNIFER ANN MORROW: Welcome to The t Test for Independent Samples. My name is Dr. Jennifer Ann Morrow. In today's demonstration,

Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

Friedman's Two-way Analysis of Variance by Ranks -- Analysis of k-within-group Data with a Quantitative Response Variable

Friedman's Two-way Analysis of Variance by Ranks -- Analysis of k-within-group Data with a Quantitative Response Variable Application: This statistic has two applications that can appear very different,

Point and Interval Estimates

Point and Interval Estimates Suppose we want to estimate a parameter, such as p or µ, based on a finite sample of data. There are two main methods: 1. Point estimate: Summarize the sample by a single number

Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217

Part 3 Comparing Groups Chapter 7 Comparing Paired Groups 189 Chapter 8 Comparing Two Independent Groups 217 Chapter 9 Comparing More Than Two Groups 257 188 Elementary Statistics Using SAS Chapter 7 Comparing

The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests

Statistics 2014 Scoring Guidelines

AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material

This chapter will demonstrate how to perform multiple linear regression with IBM SPSS

CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the

Economics of Strategy (ECON 4550) Maymester 2015 Applications of Regression Analysis

Economics of Strategy (ECON 4550) Maymester 015 Applications of Regression Analysis Reading: ACME Clinic (ECON 4550 Coursepak, Page 47) and Big Suzy s Snack Cakes (ECON 4550 Coursepak, Page 51) Definitions

General Method: Difference of Means. 3. Calculate df: either Welch-Satterthwaite formula or simpler df = min(n 1, n 2 ) 1.

General Method: Difference of Means 1. Calculate x 1, x 2, SE 1, SE 2. 2. Combined SE = SE1 2 + SE2 2. ASSUMES INDEPENDENT SAMPLES. 3. Calculate df: either Welch-Satterthwaite formula or simpler df = min(n

One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups

One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups In analysis of variance, the main research question is whether the sample means are from different populations. The

NOTES ON HLM TERMINOLOGY

HLML01cc 1 FI=HLML01cc NOTES ON HLM TERMINOLOGY by Ralph B. Taylor breck@rbtaylor.net All materials copyright (c) 1998-2002 by Ralph B. Taylor LEVEL 1 Refers to the model describing units within a grouping:

Testing Research and Statistical Hypotheses

Testing Research and Statistical Hypotheses Introduction In the last lab we analyzed metric artifact attributes such as thickness or width/thickness ratio. Those were continuous variables, which as you

Binary Logistic Regression

Binary Logistic Regression Main Effects Model Logistic regression will accept quantitative, binary or categorical predictors and will code the latter two in various ways. Here s a simple model including

Correlational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots

Correlational Research Stephen E. Brock, Ph.D., NCSP California State University, Sacramento 1 Correlational Research A quantitative methodology used to determine whether, and to what degree, a relationship

research/scientific includes the following: statistical hypotheses: you have a null and alternative you accept one and reject the other

1 Hypothesis Testing Richard S. Balkin, Ph.D., LPC-S, NCC 2 Overview When we have questions about the effect of a treatment or intervention or wish to compare groups, we use hypothesis testing Parametric

MODEL I: DRINK REGRESSED ON GPA & MALE, WITHOUT CENTERING

Interpreting Interaction Effects; Interaction Effects and Centering Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised February 20, 2015 Models with interaction effects

CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont

CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont To most people studying statistics a contingency table is a contingency table. We tend to forget, if we ever knew, that contingency

An analysis method for a quantitative outcome and two categorical explanatory variables.

Chapter 11 Two-Way ANOVA An analysis method for a quantitative outcome and two categorical explanatory variables. If an experiment has a quantitative outcome and two categorical explanatory variables that

WHERE DOES THE 10% CONDITION COME FROM?

1 WHERE DOES THE 10% CONDITION COME FROM? The text has mentioned The 10% Condition (at least) twice so far: p. 407 Bernoulli trials must be independent. If that assumption is violated, it is still okay

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

277 CHAPTER VI COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. This chapter contains a full discussion of customer loyalty comparisons between private and public insurance companies

An Introduction to Statistics Course (ECOE 1302) Spring Semester 2011 Chapter 10- TWO-SAMPLE TESTS

The Islamic University of Gaza Faculty of Commerce Department of Economics and Political Sciences An Introduction to Statistics Course (ECOE 130) Spring Semester 011 Chapter 10- TWO-SAMPLE TESTS Practice

Having a coin come up heads or tails is a variable on a nominal scale. Heads is a different category from tails.

Chi-square Goodness of Fit Test The chi-square test is designed to test differences whether one frequency is different from another frequency. The chi-square test is designed for use with data on a nominal

Trend and Seasonal Components

Chapter 2 Trend and Seasonal Components If the plot of a TS reveals an increase of the seasonal and noise fluctuations with the level of the process then some transformation may be necessary before doing

10. Analysis of Longitudinal Studies Repeat-measures analysis

Research Methods II 99 10. Analysis of Longitudinal Studies Repeat-measures analysis This chapter builds on the concepts and methods described in Chapters 7 and 8 of Mother and Child Health: Research methods.

Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

Rockefeller College University at Albany

Rockefeller College University at Albany PAD 705 Handout: Hypothesis Testing on Multiple Parameters In many cases we may wish to know whether two or more variables are jointly significant in a regression.

Analyzing Intervention Effects: Multilevel & Other Approaches. Simplest Intervention Design. Better Design: Have Pretest

Analyzing Intervention Effects: Multilevel & Other Approaches Joop Hox Methodology & Statistics, Utrecht Simplest Intervention Design R X Y E Random assignment Experimental + Control group Analysis: t

Regression step-by-step using Microsoft Excel

Step 1: Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression

One-Way Analysis of Variance (ANOVA) Example Problem

One-Way Analysis of Variance (ANOVA) Example Problem Introduction Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality of two or more population (or treatment) means

Statistiek II. John Nerbonne. October 1, 2010. Dept of Information Science j.nerbonne@rug.nl

Dept of Information Science j.nerbonne@rug.nl October 1, 2010 Course outline 1 One-way ANOVA. 2 Factorial ANOVA. 3 Repeated measures ANOVA. 4 Correlation and regression. 5 Multiple regression. 6 Logistic

Statistics. Measurement. Scales of Measurement 7/18/2012

Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does

individualdifferences

1 Simple ANalysis Of Variance (ANOVA) Oftentimes we have more than two groups that we want to compare. The purpose of ANOVA is to allow us to compare group means from several independent samples. In general,

II. DISTRIBUTIONS distribution normal distribution. standard scores

Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

3.2. Solving quadratic equations. Introduction. Prerequisites. Learning Outcomes. Learning Style

Solving quadratic equations 3.2 Introduction A quadratic equation is one which can be written in the form ax 2 + bx + c = 0 where a, b and c are numbers and x is the unknown whose value(s) we wish to find.

Wooldridge, Introductory Econometrics, 4th ed. Chapter 7: Multiple regression analysis with qualitative information: Binary (or dummy) variables

Wooldridge, Introductory Econometrics, 4th ed. Chapter 7: Multiple regression analysis with qualitative information: Binary (or dummy) variables We often consider relationships between observed outcomes

Solutions of Equations in Two Variables

6.1 Solutions of Equations in Two Variables 6.1 OBJECTIVES 1. Find solutions for an equation in two variables 2. Use ordered pair notation to write solutions for equations in two variables We discussed

Economic Development and the Gender Wage Gap Sherri Haas

Economic Development and the Gender Wage Gap I. INTRODUCTION General wage inequality within countries is a topic that has received a great deal of attention in the economic literature. Differences in wages

5. Linear Regression

5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

Chapter Four. Data Analyses and Presentation of the Findings

Chapter Four Data Analyses and Presentation of the Findings The fourth chapter represents the focal point of the research report. Previous chapters of the report have laid the groundwork for the project.

Multiple Imputation for Missing Data: A Cautionary Tale

Multiple Imputation for Missing Data: A Cautionary Tale Paul D. Allison University of Pennsylvania Address correspondence to Paul D. Allison, Sociology Department, University of Pennsylvania, 3718 Locust