Chi Square & Correlation


Nonparametric Test of χ²

Used when too many assumptions of the t-test are violated:
- The sample size is too small to reflect the population
- The data are not continuous and thus not appropriate for parametric tests based on normal distributions

χ² is another way of showing that some pattern in the data was not created randomly, by chance. χ² can be one- or two-dimensional. χ² deals with the question of whether what we observed is different from what is expected.

Calculating χ²

What would a contingency table look like if no relationship exists between gender and voting for Bush? (i.e., statistical independence)

                  Male   Female   Total
Voted for Bush     25      25       50
Voted for Kerry    25      25       50
Total              50      50      100

NOTE: INDEPENDENT VARIABLES ON COLUMNS AND DEPENDENT ON ROWS

Calculating χ²

What would a contingency table look like if a perfect relationship exists between gender and voting for Bush?

                  Male   Female
Voted for Bush     50      0
Voted for Kerry     0     50

Calculating the expected value

f̂_ij = (f_i)(f_j) / N

f̂_ij = the expected frequency of the cell in the ith row and jth column
f_i = the total in the ith row marginal
f_j = the total in the jth column marginal
N = the grand total, or sample size for the entire table

Expected Voted for Bush = (50 × 50) / 100 = 25
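The expected-frequency formula can be sketched in Python (a minimal illustration, not part of the original slides; the observed counts are from the voting example, and the expected values come from its marginals):

```python
# Expected cell frequency under independence: f_ij = (row total)(column total) / N
observed = [[50, 0],   # Voted for Bush:  Male, Female
            [0, 50]]   # Voted for Kerry: Male, Female

row_totals = [sum(row) for row in observed]          # [50, 50]
col_totals = [sum(col) for col in zip(*observed)]    # [50, 50]
N = sum(row_totals)                                  # 100

expected = [[row_totals[i] * col_totals[j] / N
             for j in range(len(col_totals))]
            for i in range(len(row_totals))]

print(expected)  # [[25.0, 25.0], [25.0, 25.0]] -- every cell is 50*50/100
```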

Nonparametric Test of χ²

Again, the basic question is: was what you are observing in some given data created by chance, or through some systematic process?

χ² = Σ (O − E)² / E

O = observed frequency
E = expected frequency

Nonparametric Test of χ²

The null hypothesis we are testing here is that the proportions of occurrences in each category are equal to each other (H0: B = K). Our research hypothesis is that they are not equal (Ha: B ≠ K). Given the sample size, how many cases could we expect in each category (n / #categories)? The obtained/critical value estimation will provide a coefficient and a probability that the results are random.

Let's do a χ²

                  Male   Female
Voted for Bush     50      0
Voted for Kerry     0     50

(50 − 25)²/25 = 25
(0 − 25)²/25 = 25
(0 − 25)²/25 = 25
(50 − 25)²/25 = 25

χ² = 100

What would χ² be when there is statistical independence?
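The cell-by-cell computation above can be checked with a short Python sketch (illustrative only, not part of the original slides):

```python
# chi-square = sum of (O - E)^2 / E over all cells, for the
# "perfect relationship" table; each E is 25 from the marginals (50*50/100)
observed = [[50, 0],
            [0, 50]]
expected = [[25, 25],
            [25, 25]]

chi_sq = sum((o - e) ** 2 / e
             for o_row, e_row in zip(observed, expected)
             for o, e in zip(o_row, e_row))

print(chi_sq)  # 100.0

# Under perfect statistical independence every O equals its E,
# so every term (O - E)^2 / E is zero and chi-square = 0.
```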

Let's corroborate with SPSS

Chi-Square Tests (statistical independence)
                               Value   df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                            (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             .000b    1     1.000
Continuity Correction a        .000     1     1.000
Likelihood Ratio               .000     1     1.000
Fisher's Exact Test                                         1.000        .579
Linear-by-Linear Association   .000     1     1.000
N of Valid Cases               100

Chi-Square Tests (perfect relationship)
                               Value     df   Asymp. Sig.   Exact Sig.   Exact Sig.
                                              (2-sided)     (2-sided)    (1-sided)
Pearson Chi-Square             100.000b   1     .000
Continuity Correction a        96.040     1     .000
Likelihood Ratio               138.629    1     .000
Fisher's Exact Test                                           .000         .000
Linear-by-Linear Association   99.000     1     .000
N of Valid Cases               100

a. Computed only for a 2x2 table
b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 25.00.

Testing for significance

                  Male   Female
Voted for Bush     20      30
Voted for Kerry    30      20

χ² = 4

How do we know if the relationship is statistically significant?
We need to know the df: df = (R − 1)(C − 1) = (2 − 1)(2 − 1) = 1
We go to the χ² distribution to look for the critical value (CV = 3.84).
Since 4 > 3.84, we conclude that the relationship between gender and voting is statistically significant.
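The decision rule can be sketched in Python; the critical value 3.84 is the standard table value for df = 1 at α = .05, hardcoded here rather than computed (an illustrative sketch, not part of the original slides):

```python
# Compare the obtained chi-square to the critical value for df = (R-1)(C-1)
observed = [[20, 30],
            [30, 20]]
expected = [[25, 25],   # marginals are 50/50 in both directions, N = 100
            [25, 25]]

chi_sq = sum((o - e) ** 2 / e
             for o_row, e_row in zip(observed, expected)
             for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 3.84  # chi-square table: df = 1, alpha = .05

print(chi_sq, df)                # 4.0 1
print(chi_sq > critical_value)   # True -> reject the null hypothesis
```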

When is χ² appropriate to use?

χ² is perhaps the most widely used statistical technique to analyze nominal and ordinal data:
Nominal × nominal (gender and voting preferences)
Nominal × ordinal (gender and opinion of W)

χ² can also be used with larger tables

Opinion of Bush (each cell's contribution (O − E)²/E in parentheses):

          Favorable    Indifferent   Unfavorable   Total
MALE      40 (19.4)    10 (.88)      15 (8.6)        65
FEMALE     5 (15.8)    20 (.72)      55 (6.9)        80
Total     45           30            70             145

χ² = 52.3. Do we reject the null hypothesis?
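The same computation extends to the 2×3 table; a Python sketch (observed counts from the slide, expected counts computed from the marginals; the slide's 52.3 reflects rounding of the individual cell terms):

```python
# chi-square for a 2x3 table: E_ij = (row total)(column total) / N
observed = [[40, 10, 15],   # Male:   Favorable, Indifferent, Unfavorable
            [5, 20, 55]]    # Female

row_t = [sum(r) for r in observed]         # [65, 80]
col_t = [sum(c) for c in zip(*observed)]   # [45, 30, 70]
N = sum(row_t)                             # 145

chi_sq = sum((observed[i][j] - row_t[i] * col_t[j] / N) ** 2
             / (row_t[i] * col_t[j] / N)
             for i in range(len(row_t))
             for j in range(len(col_t)))

print(round(chi_sq, 1))  # 52.4
```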

Correlation (does not mean causation)

We want to know how two variables are related to each other. Does eating doughnuts affect weight? Does spending more hours studying increase test scores? Correlation measures how much two variables overlap with each other.

Types of Correlations

X (cause)              Y (effect)        Correlation   Values
Increases              Increases         Positive      0 to 1
Decreases              Decreases         Positive      0 to 1
Increases              Decreases         Negative      -1 to 0
Decreases              Increases         Negative      -1 to 0
Increases/decreases    Does not change   Independent   0

Conceptualizing Correlation

[Figure: overlapping-circle diagrams for "Measuring Development" — weak correlation (little overlap: GDP, POP, WEIGHT) versus strong correlation (heavy overlap: GDP, EDUCATION).]

Correlation will be associated with what type of validity?

Correlation Coefficient

r_xy = (nΣXY − ΣXΣY) / √[(nΣX² − (ΣX)²)(nΣY² − (ΣY)²)]

Home Value & Square Footage

Log value   Log sqft   value²    sqft²     val × sqft
5.13        4.02       26.3169   16.1604   20.6226
5.20        4.54       27.0400   20.6116   23.6080
4.53        3.53       20.5209   12.4609   15.9909
4.79        3.80       22.9441   14.4400   18.2020
4.78        3.86       22.8484   14.8996   18.4508
4.72        4.17       22.2784   17.3889   19.6824
Σ = 29.15   23.92      141.95    95.96     116.56

Correlation Coefficient

r_xy = [(6 × 116.56) − (29.15)(23.92)] / √{[(141.95 × 6) − (29.15)²][(95.96 × 6) − (23.92)²]} = 2.09 / 2.66 = .78

SPSS Correlations
                              VALUE   SQFT
VALUE   Pearson Correlation   1       .778
        Sig. (2-tailed)       .       .068
        N                     6       6
SQFT    Pearson Correlation   .778    1
        Sig. (2-tailed)       .068    .
        N                     6       6
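The raw-score formula can be verified in Python with the six pairs from the table (a sketch for checking the arithmetic, not the original SPSS run):

```python
import math

# Pearson r from the raw-score formula: log home value vs. log square footage
value = [5.13, 5.20, 4.53, 4.79, 4.78, 4.72]
sqft = [4.02, 4.54, 3.53, 3.80, 3.86, 4.17]
n = len(value)

sx, sy = sum(value), sum(sqft)                    # 29.15, 23.92
sxy = sum(x * y for x, y in zip(value, sqft))     # ~116.56
sx2 = sum(x * x for x in value)                   # ~141.95
sy2 = sum(y * y for y in sqft)                    # ~95.96

r = (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))
print(round(r, 2))  # 0.78, matching the SPSS output
```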

Rules of Thumb

Size of correlation coefficient   General interpretation
.8 – 1.0                          Very strong
.6 – .8                           Strong
.4 – .6                           Moderate
.2 – .4                           Weak
.0 – .2                           Very weak or no relationship

Multiple Correlation Coefficients

SPSS Correlations (N = 46 for every cell)
                              VALUE    SQFT     BTH      BDR
VALUE   Pearson Correlation   1        .784**   .775**   .708**
        Sig. (2-tailed)       .        .000     .000     .000
SQFT    Pearson Correlation   .784**   1        .669**   .654**
        Sig. (2-tailed)       .000     .        .000     .000
BTH     Pearson Correlation   .775**   .669**   1        .895**
        Sig. (2-tailed)       .000     .000     .        .000
BDR     Pearson Correlation   .708**   .654**   .895**   1
        Sig. (2-tailed)       .000     .000     .000     .

**. Correlation is significant at the 0.01 level (2-tailed).

Limitations of correlation coefficients

They tell us how strongly two variables are related. However, r coefficients are limited because they cannot tell us anything about:
1. Causation between X and Y
2. The marginal impact of X on Y
3. What percentage of the variation in Y is explained by X
4. Forecasting

Because of the above, Ordinary Least Squares (OLS) is most useful.

Do you have the BLUES?

B for Best (minimum error)
L for Linear (the form of the relationship)
U for Unbiased (does the parameter truly reflect the effect?)
E for Estimator

Home value and sq. feet

Y = α + βX + ε

[Figure: scatterplot of VALUE (roughly 4.5 to 5.3) against SQFT (roughly 3.4 to 4.6) with a fitted regression line.]

Does the above line meet the BLUE criteria?
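For reference, the least-squares line itself can be computed from the same six value/sqft pairs used in the correlation example; a minimal sketch (the slope and intercept below are illustrative results of that computation, not figures from the slides):

```python
# OLS for Y = a + bX with log home value as Y and log square footage as X:
# b = (n*SumXY - SumX*SumY) / (n*SumX^2 - (SumX)^2), a = mean(Y) - b*mean(X)
value = [5.13, 5.20, 4.53, 4.79, 4.78, 4.72]  # Y (dependent)
sqft = [4.02, 4.54, 3.53, 3.80, 3.86, 4.17]   # X (independent)
n = len(value)

sx, sy = sum(sqft), sum(value)
sxy = sum(x * y for x, y in zip(sqft, value))
sx2 = sum(x * x for x in sqft)

b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)   # slope
a = sy / n - b * sx / n                          # intercept

print(round(a, 2), round(b, 2))  # 2.56 0.58
```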