# Linear Models in STATA and ANOVA

Size: px
Start display at page:

Transcription

1 Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples t-test 4-10 f-test: Two Sample for Variances 4-12 Paired Samples t-test 4-13 One way ANOVA 4-15 Practical Session

2 SESSION 4: Linear Models in STATA and ANOVA Strengths of Linear Relationships In the previous session we looked at relationships between variables and the Line of Best Fit through the points on a plot. Linear Regression can tell us whether any perceived relationship between the variables is a significant one. But what about the strength of a relationship? How tightly are the points clustered around the line? The strength of a linear relationship can be measured using the Pearson Correlation Coefficient. The values of the Correlation Coefficient can range from 1 to +1. The following table provides a summary of the types of relationship and their Correlation Coefficients: Linear Relationship Correlation Coefficient Perfect Negative -1 Negative -1 to 0 None 0 Positive 0 to +1 Perfect Positive +1 The higher the Correlation Coefficient, regardless of sign, the stronger the linear relationship between the two variables. From the GSS data set gss91t.dta, we can look at the linear relationships between the education of the respondent (educ), that of the parents (maeduc and paeduc), the age of the respondent (age), and the Occupational Prestige Score (prestg80). In STATA, click on Statistics Summaries, tables & tests Summary Statistics Pairwise correlations 4-2

3 Write the 5 variables. Click to obtain number of observations. Click to obtain the significance level. Click so that STATA marks the significant correlations All possible pairs of variables from your chosen list will have the Correlation Coefficient calculated. No of observations Significance value Correlation coefficients Notice that, for each pair of variables, the number of respondents, N, differs. This is because the default is to exclude missing cases pairwise; that is, if a respondent has missing values for some of the variables, he or she is removed from the Correlation calculations involving those variables, but is included in any others where there are valid values for both variables. 4-3

4 Using the Sig. (2-tailed) value, we can determine whether the Correlation is a significant one. The Null Hypothesis is that the Correlation Coefficient is zero (or close enough to be taken as zero), and we reject this at the 5% level if the significance is less than STATA flags the Correlation Coefficients with an asterisk if they are significant at the 5% level. We can see in our example that there are significant positive Correlations for each pair of the education variables; age is significantly negatively correlated with each of them, and prestg80 has significant positive correlations with each. All these correlations are significant at the 1% level, with the education of mothers and fathers having the strongest relationship. The remaining variable pairing, age and prestg80, does not have a significant linear relationship; the correlation coefficient of is not significantly different from zero, as indicated by the significance level of This is a formal test of what we saw in the scatter plot of prestg80 against age in the previous session, the points seemed randomly scattered. A Note on Non-Linear Relationships It must be emphasised that we are dealing with Linear Relationships. You may find that the correlation coefficient indicates no significant linear relationship between two variables, but they may have a Non-Linear Relationship which we are not testing for. The following is the result of the correlation and scatter plot procedures performed on some hypothetical data. Not significant correlation coefficient. As can be seen, the correlation coefficient is not significant, indicating no linear relationship, while the plot indicates a very obvious quadratic relationship. It is 4-4

5 always a good idea to check for relationships visually using graphics as well as using formal statistical methods! Multiple Linear Regression Simple Linear Regression looks at one dependent variable in terms of one independent (or explanatory) variable. When we want to explain a dependent variable in terms of two or more independent variables we use Multiple Linear Regression. Just as in Simple Linear Regression, the Least Squares method is used to estimate the Coefficients (the constant and the Bs) of the independent variables in the now more general equation: ( Independent Var1) + B ( Independent 2)... depen det variable = B + B1 2 Var 0 + Use the dataset gss91t.dta to investigate the effect of the respondent s age (age), sex (sex), education (educ) and spouse s education (speduc) on the Occupational Prestige score (prestg80). Firstly, we will produce scatter plots of the continuous variables by clicking on Graphics Scatterplot matrix Then we can produce some correlation coefficients by clicking on Statistics Summaries, tables & tests Summary Statistics Pairwise correlations 4-5

6 We cannot see any unusual patterns in the Scatter Plots that would indicate relationships other than linear ones might be present. The correlations indicate that there are significant linear relationships between prestg80 and the two education variables, but not age. However, there are also significant correlations between what will be our 3 continuous independent variables (educ, speduc and age). How will this affect the Multiple Regression? We follow the same procedure as Simple Linear Regression; we click on: Statistics Linear Regression and related Linear regression Choose prestg80 as the dependent variable Choose educ, speduc, age and sex as the independent variables Click OK 4-6

7 sex is not a continuous variable, but, as it is a binary variable, we can use it if we interpret the results with care. The following output is obtained. The 2 nd table is the Model Summary table, which tells us how well we are explaining the dependent variable, prestg80, in terms of the variables we have entered into the model; the figures here are sometimes called the Goodness of Fit statistics. The figure in the row headed R-Squared is the proportion of variability in the dependent variable that can be explained by changes in the values of the independent variables. The higher this proportion, the better the model is fitting to the data. The 1 st table is the ANOVA table and it also indicates whether there is a significant Linear Relationship between the Dependent variable and the combination of the Explanatory variables; an F-Test is used to test the Null Hypothesis that there is no Linear Relationship. The F-Test is given as a part of the 2 nd table. We can see in our example that, with a Significance value (Prob>F) of less than 0.05, we have evidence that there is a significant Linear Relationship. In the 3 rd table, the table of the coefficients, we have the figures that will be used in our equation. All 4 explanatory variables have been entered, but should they all be there? Looking at the 2 columns, headed t and P> t, we can see that the significance level for the variable speduc is more than This indicates that, when the other variables (a constant, educ, age and sex) are used to explain the variability in prestg80, using speduc as well doesn t help to explain it any better; the coefficient of speduc is not significantly different from zero. It is not needed in the model. Recall that, when we looked at the correlation coefficients before fitting this model, educ and speduc were both significantly correlated with prestg80, but educ had the stronger relationship (0.520 compared to 0.355). In addition, the correlation between educ and speduc, 0.619, showed a stronger linear relationship. We should not be surprised, therefore, that the Multiple Linear 4-7

8 Regression indicates that using educ to explain prestg80 means you don t need to use speduc as well. On the other hand, age was not significantly correlated with prestg80, but was significantly correlated with both education variables. We find that it appears as a significant effect when combined with these variables in the Multiple Linear Regression. Removal of variables We now want to remove the insignificant variable speduc, as its presence in the model affects the coefficients of the other variables. We follow the same procedure as before and click on: Statistics Linear Regression and related Linear regression Drop speduc from the independent variables box The output obtained now is as follows: 4-8

9 We can now see that R-squared has decreased to from This is because we have removed the variable speduc from the regression model. The ANOVA table also shows that the combination of variables in each model has a significant Linear Relationship with prestg80. Both educ and age remain significant in the model, however we see that sex has now become not significant. So we repeat the procedure but this time we remove sex from the model. Our final model is shown in the following output: Therefore the regression equation is: ( 2.47 * educ) ( age) prestg 80 = * So, for example, for a person aged 40 with 12 years of education, we estimate the Occupational Prestige score prestg80 as: ( 2.47 *12) + ( * 40) prestg 80 = = 4-9

10 Independent Samples T-Test Under the assumption that the variables are normal, how can we investigate relationships between variables where one is continuous? For these tests, we will use the data set statlaba.dta. In this data set, the children were weighed and measured (among other things) at the age of ten. We want to know whether there is any difference in the average heights of boys and girls at this age. We do this by performing a t-test. We start by stating our Null Hypothesis: H 0 : We assume there is no difference between boys and girls in terms of their height The Alternative Hypothesis is the one used if the Null Hypothesis is rejected. H a : We assume there is difference between boys and girls in terms of their height To perform the t-test, click on: Statistics Summaries, tables & tests Classical tests of hypotheses Group mean comparison test We want to test for differences in the mean HEIGHTS of the children; Move the variable cth to the Variable name area. We want to look at differences in the heights of the two groups BOYS and GIRLS, and so the Group variable name is sex. Click OK. Click or not? We need to do an F-test. 4-10

11 Null is rejected The first part of the output gives some summary statistics; the numbers in each group, and the mean, standard deviation, standard error and the confidence interval of the mean for the height. STATA also gives out the combined statistics for the 2 groups. In the second part of the output, we have the actual t-test. STATA gives out two null hypotheses as well as all the possible alternative hypotheses that we could have. Depending on which test you are after, you could either use a 1- tailed t-test (Ha: diff<0 or Ha: diff>0) or a 2-tailed t-test (Ha: diff!= 0). Our Null Hypothesis says that there is no difference between the boys and girls in terms of their heights; in other words, we are testing whether the difference of , is significantly different from zero. If it is, we must reject the Null Hypothesis, and instead take the Alternative. STATA calculates the t-value, the degrees of freedom and the Significance Level; we can then make our decision quickly based on the displayed Significance Level. We will use the 2-tailed test in our example. If the Significance Level is less than 0.05, we reject the Null Hypothesis and take the Alternative Hypothesis instead. In this case, with a Significance Level of 0.012, we say that there is evidence, at the 5% level, to suggest that there is a difference between the heights of boys and girls at age ten (the Alternative Hypothesis). (From the output, you can see that we can also conclude that this difference is negative). 4-11

12 f-test: Two Sample for Variances The f-test performs a two-sample f-test to compare two population variances. To be able to use the t-test, we need to determine whether the two populations have the same variance or not. In such a case, use the f-test. The f-test compares the f-score to the f distribution. In this case, the null hypothesis (H 0 ) and the alternative hypothesis (H a ) are: H 0 : the two populations have the same variance H a : the two populations do not have the same variance If we look at the same variable cth, we can now determine whether we should have ticked the option Unequal variance or not. This decision is based on an F-test which will check on the variance of the 2 populations. To use the f-test click on Statistics Summaries, tables & tests Classical tests of hypotheses Group variance comparison test Variable to be checked Grouping variable Click OK The following output is obtained. 4-12

13 The 1 st table contains some summary statistics of the two groups. In the 2 nd part of the output, we have the F-test. A significance value (P>F) of 0.05 or more means that the Null Hypothesis of assuming equal variances is acceptable, and we therefore can use the default option Equal Variances in the previous t-test; a significance value of less than 0.05 means that we have to check the option Unequal variances when performing the t-test. In this case, the significance value is comfortably above this threshold, and therefore equal variances are assumed. Paired Samples t-test Imagine you want to compare two groups that are somehow paired; for example, husbands and wives, or mothers and daughters. Knowing about this pairing structure gives extra information, and you should take account of this when performing the t-test. In the data set statlaba.dta, we have the weights of the parents when their child was aged 10 in ftw and mtw. If we want to know if there is a difference between males and females in terms of weight, we can perform a Paired Samples T-Test on these two variables. We start by stating our Null Hypothesis: H 0 : We assume there is no difference between the weights of the parents. The Alternative Hypothesis, is the one used if the Null Hypothesis is rejected. H a : We assume there is difference between the weights of the parents 4-13

14 To perform the t-test, click on: Statistics Summaries, tables & tests Classical tests of hypotheses Two-sample mean comparison test Choose the 2 variables that you want to test. Do not choose if observations are paired. Click OK. As with the Independent Samples T-Test, we are first given some summary statistics. The Paired Samples Test table shows that the difference between the weights of the males and females is is this significantly different from zero? We use this table just as we did in the Independent Samples T-Test, and since the Sig. (2-tailed) column shows a value of less than 0.05, we can say that there is evidence, at the 5% level, to reject the Null Hypothesis that there is no difference between the mothers and fathers in terms of their weight. 4-14

15 One-Way ANOVA We now look at the situation where we want to compare several independent groups. For this we use a One-Way ANOVA (ANALYSIS OF VARIANCE). We will make use of the data set gss91t.dta. We can split the respondents into three groups according to which category of the variable life they fall into; exciting, routine or dull. We want to know if there is any difference in the average years of education of these groups. Our Null Hypothesis is that there is no difference between them in terms of education. We start by stating our Null Hypothesis: H 0 : We assume there is no difference between the level of education of the 3 groups. The Alternative Hypothesis, is the one used if the Null Hypothesis is rejected. H a : We assume there is difference between the level of education of the 3 groups. To perform the one way ANOVA, click on: Statistics ANOVA/MANOVA One-way analysis of variance Choose the response variable. Choose the group or factor variable Click to obtain some summary statistics Click OK. 4-15

16 STATA produces output that enables us to decide whether to accept or reject the Null Hypothesis that there is no difference between the groups. But if we find evidence of a difference, we will not know where the difference lies. For example, those finding life exciting may have a significantly different number of years in education from those finding life dull, but there may be no difference when they are compared to those finding life routine. We therefore ask STATA to perform a further analysis for us, called Bonferroni. The output produced by STATA is below. The 1 st table gives some summary statistics of the 3 groups. The 2 nd table gives the results of the One-Way ANOVA. A measure of the variability found between the groups is shown in the Between Groups line, while the Within Groups line gives a measure of how much the observations within each group vary. These are used to perform the f-test which we use to test our Null Hypothesis that there is no difference between the three groups in terms of their years in education. We interpret the f-test in the same way as we did the t-test; if the significance (in the Prob>F column) is less than 0.05, we have evidence, at the 5% level, to reject the Null Hypothesis, and say that there is some difference between the groups. Otherwise, we accept our Null Hypothesis. 4-16

17 We can see from the output that the f-value of has a significance of less than , and therefore we reject the Null Hypothesis. The 3 rd table then shows us where these differences lie. Bonferroni creates subsets of the categories; if there is no difference between two categories, they are put into the same subset. We can say that, at the 5% level, all 3 categories are different as all significance levels are less than Practical Session 4 Use the data set statlaba.dta. 1. Use correlation and regression to investigate the relationship between the weight of the child at age 10 (ctw) and some physical characteristics: cbw child s weight at birth cth child s height at age 10 sex child s gender (coded 1 for girls, 2 for boys) 2. Repeat Question 1, but instead use the following explanatory variables: fth ftw mth mtw Father s height Father s weight Mother s height Mother s weight Use the data set gss91t.dta. 3. Investigate the Linear Relationships between the following variables using Correlations: educ maeduc paeduc speduc Education of respondent Education of respondent s mother Education of respondent s father Education of respondent s spouse 4. Using Linear Regression, investigate the influence of education and parental education on the choice of marriage partner (Dependent variable speduc). Use the variable sex to distinguish between any gender effects. 5. It is thought that the size of the family might affect educational attainment. Investigate this using educ and sibs (the number of siblings) in a Linear Regression. 6. Also investigate whether the education of the parents (maeduc and paeduc) affects the family size (sibs). 4-17

18 7. How does the result of question 6 influence your interpretation of question 5? Are you perhaps finding a spurious effect? Test whether sibs still has a significant effect on educ when maeduc and paeduc are included in the model. 8. Compute a new variable pared = (maeduc + paeduc) / 2, being the average years of education of the parents. By including pared, maeduc and paeduc in a Multiple Linear Regression, investigate which is the better predictor of educ; the separate measures or the combined measure. Use the data set statlaba.dta. At the age of ten, the children in the sample were given two tests; the Peabody Picture Vocabulary Test and the Raven Progressive Matrices Test. Their scores are stored in the variables ctp and ctr. Create a new variable called tests which is the sum of the two tests; this new variable will be used in the following questions. In each of the questions below, state your Null and Alternative Hypotheses, which of the two you accept on the evidence of the relevant test, and the Significance Level. 9. Use an Independent Samples T-Test to decide whether there is any difference between boys and girls in terms of their scores. 10. By pairing the parents of the child, decide whether there is any difference between fathers and mothers in terms of the heights. (Use fth and mth). 11. The fathers occupation is stored in the variable fto, with the following categories: 0 Professional 1 Teacher / Counsellor 2 Manager / Official 3 Self-employed 4 Sales 5 Clerical 6 Craftsman / Operator 7 Labourer 8 Service worker Recode fto into a new variable, occgrp, with categories: 4-18

19 1 Self-employed 2 Professional/ Manager / Official 3 Teacher / Counsellor 4 Sales/ Clerical/ Service worker 5 Craftsman / Operator 6 Labourer Attach suitable variable and value labels to this new variable. Using a One-Way ANOVA, test whether there is any difference between the occupation groups, in terms of the test scores of their children. Open the data set sceli.dta. In the SCELI questionnaire, employees were asked to compare the current circumstances in their job with what they were doing five years previously. Various aspects were considered: effort Effort put into job promo Chances of promotion secur Level of job security skill Level of skill used speed How fast employee works super Tightness of supervision tasks Variety of tasks train Provision of training They were asked, for each aspect, what, if any, change there had been. The codes used were: 1 Increase 2 No change 3 Decrease 7 Don t know The sex of the respondent is stored in the variable gender, (code 1 is male, and code 2 is female) and the age in age. For each of the job aspects, change code 7, Don t know to a missing value. Choose one or more of the job aspects. For each choice, answer the following questions: 12. What proportion of the employees sampled are employees perceiving a decrease in the job aspect? 4-19

20 13. What proportion of the employees sampled are female employees perceiving an increase or no change in the job aspect? 14. Use a bar chart to illustrate graphically any differences in the pattern of response between males and females. 15. Is there a significant difference in the average ages of the male and female employees in this sample? 16. Choose one or more of the job aspects. For each choice, investigate whether the employees falling into each of the categories have differences in terms of their ages. 4-20

### Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

### The Dummy s Guide to Data Analysis Using SPSS

The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests

### HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

### Chapter 2 Probability Topics SPSS T tests

Chapter 2 Probability Topics SPSS T tests Data file used: gss.sav In the lecture about chapter 2, only the One-Sample T test has been explained. In this handout, we also give the SPSS methods to perform

### DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,

### SCHOOL OF HEALTH AND HUMAN SCIENCES DON T FORGET TO RECODE YOUR MISSING VALUES

SCHOOL OF HEALTH AND HUMAN SCIENCES Using SPSS Topics addressed today: 1. Differences between groups 2. Graphing Use the s4data.sav file for the first part of this session. DON T FORGET TO RECODE YOUR

### MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

### Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

### Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

### Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### SPSS Explore procedure

SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,

### Univariate Regression

Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

### January 26, 2009 The Faculty Center for Teaching and Learning

THE BASICS OF DATA MANAGEMENT AND ANALYSIS A USER GUIDE January 26, 2009 The Faculty Center for Teaching and Learning THE BASICS OF DATA MANAGEMENT AND ANALYSIS Table of Contents Table of Contents... i

### An introduction to IBM SPSS Statistics

An introduction to IBM SPSS Statistics Contents 1 Introduction... 1 2 Entering your data... 2 3 Preparing your data for analysis... 10 4 Exploring your data: univariate analysis... 14 5 Generating descriptive

### Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

### Introduction to Quantitative Methods

Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

### Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

### SPSS Guide: Regression Analysis

SPSS Guide: Regression Analysis I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar

### Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course is designed

### Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:

### Chapter 7 Factor Analysis SPSS

Chapter 7 Factor Analysis SPSS Factor analysis attempts to identify underlying variables, or factors, that explain the pattern of correlations within a set of observed variables. Factor analysis is often

### One-Way ANOVA using SPSS 11.0. SPSS ANOVA procedures found in the Compare Means analyses. Specifically, we demonstrate

1 One-Way ANOVA using SPSS 11.0 This section covers steps for testing the difference between three or more group means using the SPSS ANOVA procedures found in the Compare Means analyses. Specifically,

### Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217

Part 3 Comparing Groups Chapter 7 Comparing Paired Groups 189 Chapter 8 Comparing Two Independent Groups 217 Chapter 9 Comparing More Than Two Groups 257 188 Elementary Statistics Using SAS Chapter 7 Comparing

### Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

### Statistiek II. John Nerbonne. October 1, 2010. Dept of Information Science j.nerbonne@rug.nl

Dept of Information Science j.nerbonne@rug.nl October 1, 2010 Course outline 1 One-way ANOVA. 2 Factorial ANOVA. 3 Repeated measures ANOVA. 4 Correlation and regression. 5 Multiple regression. 6 Logistic

### Projects Involving Statistics (& SPSS)

Projects Involving Statistics (& SPSS) Academic Skills Advice Starting a project which involves using statistics can feel confusing as there seems to be many different things you can do (charts, graphs,

### Using Excel for Statistical Analysis

Using Excel for Statistical Analysis You don t have to have a fancy pants statistics package to do many statistical functions. Excel can perform several statistical tests and analyses. First, make sure

### Simple Linear Regression, Scatterplots, and Bivariate Correlation

1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### Correlational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots

Correlational Research Stephen E. Brock, Ph.D., NCSP California State University, Sacramento 1 Correlational Research A quantitative methodology used to determine whether, and to what degree, a relationship

### " Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### 5. Correlation. Open HeightWeight.sav. Take a moment to review the data file.

5. Correlation Objectives Calculate correlations Calculate correlations for subgroups using split file Create scatterplots with lines of best fit for subgroups and multiple correlations Correlation The

### Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1

Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Calculate counts, means, and standard deviations Produce

### Scatter Plots with Error Bars

Chapter 165 Scatter Plots with Error Bars Introduction The procedure extends the capability of the basic scatter plot by allowing you to plot the variability in Y and X corresponding to each point. Each

### Multivariate Analysis of Variance. The general purpose of multivariate analysis of variance (MANOVA) is to determine

2 - Manova 4.3.05 25 Multivariate Analysis of Variance What Multivariate Analysis of Variance is The general purpose of multivariate analysis of variance (MANOVA) is to determine whether multiple levels

### INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

### UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

### One-Way Analysis of Variance

One-Way Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We

### Two Related Samples t Test

Two Related Samples t Test In this example 1 students saw five pictures of attractive people and five pictures of unattractive people. For each picture, the students rated the friendliness of the person

### We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries?

Statistics: Correlation Richard Buxton. 2008. 1 Introduction We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries? Do

### Multiple Linear Regression

Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is

### Simple Tricks for Using SPSS for Windows

Simple Tricks for Using SPSS for Windows Chapter 14. Follow-up Tests for the Two-Way Factorial ANOVA The Interaction is Not Significant If you have performed a two-way ANOVA using the General Linear Model,

### Mixed 2 x 3 ANOVA. Notes

Mixed 2 x 3 ANOVA This section explains how to perform an ANOVA when one of the variables takes the form of repeated measures and the other variable is between-subjects that is, independent groups of participants

### Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

### THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7.

THERE ARE TWO WAYS TO DO HYPOTHESIS TESTING WITH STATCRUNCH: WITH SUMMARY DATA (AS IN EXAMPLE 7.17, PAGE 236, IN ROSNER); WITH THE ORIGINAL DATA (AS IN EXAMPLE 8.5, PAGE 301 IN ROSNER THAT USES DATA FROM

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### The correlation coefficient

The correlation coefficient Clinical Biostatistics The correlation coefficient Martin Bland Correlation coefficients are used to measure the of the relationship or association between two quantitative

### Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis

### Introduction. Hypothesis Testing. Hypothesis Testing. Significance Testing

Introduction Hypothesis Testing Mark Lunt Arthritis Research UK Centre for Ecellence in Epidemiology University of Manchester 13/10/2015 We saw last week that we can never know the population parameters

### Factors affecting online sales

Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4

### Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. In the main dialog box, input the dependent variable and several predictors.

### CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

### SPSS Tests for Versions 9 to 13

SPSS Tests for Versions 9 to 13 Chapter 2 Descriptive Statistic (including median) Choose Analyze Descriptive statistics Frequencies... Click on variable(s) then press to move to into Variable(s): list

### Introduction to Analysis of Variance (ANOVA) Limitations of the t-test

Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only

### COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

277 CHAPTER VI COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. This chapter contains a full discussion of customer loyalty comparisons between private and public insurance companies

### Section 13, Part 1 ANOVA. Analysis Of Variance

Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability

### Examining Differences (Comparing Groups) using SPSS Inferential statistics (Part I) Dwayne Devonish

Examining Differences (Comparing Groups) using SPSS Inferential statistics (Part I) Dwayne Devonish Statistics Statistics are quantitative methods of describing, analysing, and drawing inferences (conclusions)

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### Association Between Variables

Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

### Using Excel for inferential statistics

FACT SHEET Using Excel for inferential statistics Introduction When you collect data, you expect a certain amount of variation, just caused by chance. A wide variety of statistical tests can be applied

### This chapter will demonstrate how to perform multiple linear regression with IBM SPSS

CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the

### Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

### Independent t- Test (Comparing Two Means)

Independent t- Test (Comparing Two Means) The objectives of this lesson are to learn: the definition/purpose of independent t-test when to use the independent t-test the use of SPSS to complete an independent

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Descriptive Statistics

Descriptive Statistics Primer Descriptive statistics Central tendency Variation Relative position Relationships Calculating descriptive statistics Descriptive Statistics Purpose to describe or summarize

### IBM SPSS Statistics for Beginners for Windows

ISS, NEWCASTLE UNIVERSITY IBM SPSS Statistics for Beginners for Windows A Training Manual for Beginners Dr. S. T. Kometa A Training Manual for Beginners Contents 1 Aims and Objectives... 3 1.1 Learning

### Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

### Module 5: Statistical Analysis

Module 5: Statistical Analysis To answer more complex questions using your data, or in statistical terms, to test your hypothesis, you need to use more advanced statistical tests. This module reviews the

### SPSS Guide How-to, Tips, Tricks & Statistical Techniques

SPSS Guide How-to, Tips, Tricks & Statistical Techniques Support for the course Research Methodology for IB Also useful for your BSc or MSc thesis March 2014 Dr. Marijke Leliveld Jacob Wiebenga, MSc CONTENT

### QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS This booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http://users.ox.ac.uk/~grafen/qmnotes/index.html.

### Directions for using SPSS

Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...

### Analysis of Variance ANOVA

Analysis of Variance ANOVA Overview We ve used the t -test to compare the means from two independent groups. Now we ve come to the final topic of the course: how to compare means from more than two populations.

### t Tests in Excel The Excel Statistical Master By Mark Harmon Copyright 2011 Mark Harmon

t-tests in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com www.excelmasterseries.com

### Main Effects and Interactions

Main Effects & Interactions page 1 Main Effects and Interactions So far, we ve talked about studies in which there is just one independent variable, such as violence of television program. You might randomly

### containing Kendall correlations; and the OUTH = option will create a data set containing Hoeffding statistics.

Getting Correlations Using PROC CORR Correlation analysis provides a method to measure the strength of a linear relationship between two numeric variables. PROC CORR can be used to compute Pearson product-moment

### Data exploration with Microsoft Excel: analysing more than one variable

Data exploration with Microsoft Excel: analysing more than one variable Contents 1 Introduction... 1 2 Comparing different groups or different variables... 2 3 Exploring the association between categorical

### COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared. jn2@ecs.soton.ac.uk

COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared jn2@ecs.soton.ac.uk Relationships between variables So far we have looked at ways of characterizing the distribution

### MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:

Doing Multiple Regression with SPSS Multiple Regression for Data Already in Data Editor Next we want to specify a multiple regression analysis for these data. The menu bar for SPSS offers several options:

### Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing

Chapter 8 Hypothesis Testing 1 Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing 8-3 Testing a Claim About a Proportion 8-5 Testing a Claim About a Mean: s Not Known 8-6 Testing

### Testing for differences I exercises with SPSS

Testing for differences I exercises with SPSS Introduction The exercises presented here are all about the t-test and its non-parametric equivalents in their various forms. In SPSS, all these tests can

### Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

### SPSS-Applications (Data Analysis)

CORTEX fellows training course, University of Zurich, October 2006 Slide 1 SPSS-Applications (Data Analysis) Dr. Jürg Schwarz, juerg.schwarz@schwarzpartners.ch Program 19. October 2006: Morning Lessons

### KSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management

KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### Good luck! BUSINESS STATISTICS FINAL EXAM INSTRUCTIONS. Name:

Glo bal Leadership M BA BUSINESS STATISTICS FINAL EXAM Name: INSTRUCTIONS 1. Do not open this exam until instructed to do so. 2. Be sure to fill in your name before starting the exam. 3. You have two hours

### Statistics for Sports Medicine

Statistics for Sports Medicine Suzanne Hecht, MD University of Minnesota (suzanne.hecht@gmail.com) Fellow s Research Conference July 2012: Philadelphia GOALS Try not to bore you to death!! Try to teach

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### CORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there

CORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there is a relationship between variables, To find out the

### ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R.

ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R. 1. Motivation. Likert items are used to measure respondents attitudes to a particular question or statement. One must recall

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### II. DISTRIBUTIONS distribution normal distribution. standard scores

Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

### Multiple-Comparison Procedures

Multiple-Comparison Procedures References A good review of many methods for both parametric and nonparametric multiple comparisons, planned and unplanned, and with some discussion of the philosophical

### Chapter 23. Inferences for Regression

Chapter 23. Inferences for Regression Topics covered in this chapter: Simple Linear Regression Simple Linear Regression Example 23.1: Crying and IQ The Problem: Infants who cry easily may be more easily