3. Analysis of Qualitative Data




Inferential Stats, CEC at RUPP
Poch Bunnak, Ph.D.
2010 Poch Bunnak

Content
1. Hypothesis tests about a population proportion: binomial test
2. Chi-square test for goodness of fit
3. Chi-square test for independence
4. Notes on measures of association

1. Hypothesis tests about a population proportion: one-sample binomial test

1.1. Situations
To test whether a sample proportion differs from a test value, such as the value for a given population or a past result.

Examples:
- In 2010, girls accounted for 25% of all students in all universities in Phnom Penh. The Rector wishes to know if the % of girls at RUPP is significantly higher or lower than the overall proportion.
- A president got 55% of the votes. After one year in office, he wants to know if the number of his supporters has increased or decreased.
- A company sold 2500 red and 2300 blue toys. Do the data provide evidence of a significant color preference?

1.2. Test statistics
- Large sample, n > 5/min(p, 1-p): z test
- Small sample: binomial test

1.2.a Test statistics: z test
Assumptions: random sample, binary var, normal distribution of p (for any sample size, the closer p is to 0 or 1, the more skewed the distribution of p).

H0: p = p_h
Ha: p ≠ p_h, or Ha: p > p_h, or Ha: p < p_h

Two ways to obtain the statistic:
- Z = (p - p_h) / S.E.(p), where S.E.(p) = sqrt(p(1 - p)/n)
- Z = sqrt(chi-square)
Then find the p value and make the decision.

1.3. Example
Women today are getting more educated and working outside the home for cash. They are likely to marry later and have fewer children than before. Is this claim true? Use CDHS 2005 data to test if the mean age at marriage and the mean number of children in 2005 are different from those in 2000.

1.2.b Test statistics: binomial test
Based on the binomial distribution (the probability distribution for two outcomes only, i.e. a binary var).
Assumptions: binary var; normality for the z approximation [n*p > 10 and n*(1-p) > 10].

Example
15 girls and 35 boys were enrolled in one class. Does the gender mix of this class differ from the admission quota of 25% girls?
SPSS data: create two vars (gender, with 1 = girls and 2 = boys; n, with 15 for girls and 35 for boys), then weight cases by n.
SPSS analysis: Analyze > Nonparametric Tests > Binomial > move gender into the Test Variable List box > enter 0.25 in the Test Proportion box > OK
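The class example above (15 girls out of 50 against a quota of 25%) can also be checked by hand. The following is a minimal Python sketch, not the SPSS procedure itself. The function names are made up for illustration, and the standard error is evaluated at the hypothesized value p_h, which is a common convention for tests but an assumption about how to read p in the S.E. formula.

```python
import math

def z_test_proportion(successes, n, p_h):
    """One-sample z test for a proportion (upper-tailed p value).

    Assumption: the standard error uses the hypothesized proportion p_h.
    """
    p_hat = successes / n
    se = math.sqrt(p_h * (1 - p_h) / n)
    z = (p_hat - p_h) / se
    # Upper-tail area of the standard normal distribution
    p_upper = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_upper

def binomial_upper_tail(successes, n, p_h):
    """Exact P(X >= successes) for X ~ Binomial(n, p_h)."""
    return sum(
        math.comb(n, k) * p_h**k * (1 - p_h) ** (n - k)
        for k in range(successes, n + 1)
    )

z, p_z = z_test_proportion(15, 50, 0.25)
p_exact = binomial_upper_tail(15, 50, 0.25)
print(f"z = {z:.3f}, one-tailed p (normal approx.) = {p_z:.3f}")
print(f"one-tailed p (exact binomial) = {p_exact:.3f}")
```

The exact tail probability comes out near 0.25, in line with the one-tailed value of 0.252 reported for this example; the uncorrected normal approximation gives a somewhat smaller value, which is why the exact calculation is preferred for small samples.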

Binomial Test

                  Category    N    Observed Prop.   Test Prop.   Asymp. Sig. (1-tailed)
cat   Group 1     1.00        15   .30              .25          .252 (a)
      Group 2     2.00        35   .70
      Total                   50   1.00
a. Based on Z Approximation.

Interpretation
Girls made up 30% of the class, 5 percentage points above the quota. However, the difference is not statistically significant based on the binomial test (n = 50, p (1-tailed) = 0.252).

Practice:
- Redo the test with 150 girls and 350 boys. What do you see?
- A survey of 200 voters showed that 120 voted for A and 80 voted for B. Is there enough evidence to predict the winner?

Other features of the binomial test with SPSS
- SPSS reports a one-tailed p value for the binomial test whenever the test proportion is not 0.5, as in the example above.
- SPSS does not calculate the CI of the difference. You can do this using formulas.
- You can use a cut point to split the data if you do not want to recode (values <= the cut point form group 1).
- Three options for calculating p values:
  - Asymptotic distribution (z approximation).
  - Exact test (based on the actual data, without relying on a sampling-distribution approximation): use when the conditions for the normal approximation are not met, i.e. when the sample is small or p is small.
  - Monte Carlo: when the sample size is too large for the exact test.
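Since SPSS does not report a confidence interval here, one can be computed by formula. The sketch below assumes the usual large-sample (Wald) interval, p ± 1.96 * sqrt(p(1 - p)/n), applied to the class example; the function name and the choice of interval are illustrative assumptions, and other interval formulas exist.

```python
import math

def proportion_ci(successes, n, z_crit=1.96):
    """Large-sample (Wald) confidence interval for a proportion.

    z_crit = 1.96 corresponds to a 95% confidence level.
    """
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_crit * se, p_hat + z_crit * se

p_h = 0.25                      # test proportion (the quota)
low, high = proportion_ci(15, 50)
# CI of the difference between the sample proportion and the test value
diff_low, diff_high = low - p_h, high - p_h
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
print(f"95% CI for p - p_h: ({diff_low:.3f}, {diff_high:.3f})")
```

If the interval for the difference contains 0, that agrees with a non-significant two-sided test at the same confidence level.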

2. Hypothesis tests about a population's proportions: chi-square test for goodness of fit

2.1. Situations
To compare a sample's frequency distribution of a categorical var with expected frequencies (all categories contain the same proportion of values) or with user-specified proportions.

Examples:
- Do the three candidates differ significantly in their number of supporters?
- Is there any evidence that the departments have different numbers of first-year students?
- In 2000, 10% were extremely poor, 20% were just poor, 40% were just above the poverty line, 25% were rich, and 5% were very rich. Has the distribution of living standards changed 10 years later?

2.2. Test statistics: chi-square test for goodness of fit

2.3. Chi-square test assumptions
- Nonparametric test: no assumption about the shape of the distribution.
- Categorical data; data from a random sample.
- The χ² test is valid only if the expected frequency f(e) is at least 5 in every category, or at least if no more than 20% of the categories have f(e) < 5.
- H0: f(o) = f(e) (all categories); Ha: f(o) ≠ f(e) (at least 1 category)

2.3. Example
The distribution of foundation-year students by department is: 100 in English, 80 in math, and 110 in computer science. Is the difference statistically significant at the 99% CL?
H0: students are equally distributed across the departments.
Enter the data in SPSS (dept and n vars) and weight cases by n.
SPSS: Analyze > Nonparametric Tests > Chi-square > put the dept var in the Test Variable List (be sure "All categories equal" is ticked) > OK

2.4. Result and interpretation
Notes:
- f(e) = n of cases / n of categories
- residual = f(o) - f(e)
- χ² = Σ [(f(o) - f(e))² / f(e)]
- df = n of categories - 1
Compare the obtained χ² with the critical χ², or use the asymptotic significance, to make the decision about the test.

Interpretation: The test is not statistically significant (χ² = 4.8, df = 2, p = 0.089), so H0 is not rejected. Thus, there is not enough evidence to support the Ha that the distribution of students differs across the three departments.
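The formulas in the notes can be applied directly to the department counts. This is a small Python sketch using only the standard library; the function name is hypothetical, and the p value uses the closed form exp(-χ²/2), which holds only when df = 2 (as it is here).

```python
import math

def chi_square_gof(observed, expected_props=None):
    """Chi-square goodness-of-fit statistic.

    If expected_props is None, all categories are assumed equally likely.
    Returns (chi2, df, expected_frequencies).
    """
    n = sum(observed)
    k = len(observed)
    if expected_props is None:
        expected_props = [1 / k] * k
    expected = [n * q for q in expected_props]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, k - 1, expected

# English, math, computer science
chi2, df, expected = chi_square_gof([100, 80, 110])
# Upper-tail probability of the chi-square distribution; valid for df = 2 only
p_value = math.exp(-chi2 / 2)
print(f"f(e) = {expected[0]:.2f} per department")
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.3f}")
```

All expected frequencies are well above 5, so the validity condition is met, and the p value is above 0.01, so H0 is not rejected at the 99% CL.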

2.5. Other notes for the chi-square test with SPSS
- If you have many categories but wish to analyze only some of them, you can specify the range of values to be analyzed.
- You can test the frequency distribution against user-defined expected values (enter the values in the order of the category codes; at least one of them must differ from the others, otherwise the test is the same as the equal-f(e) test above).
- Three options for calculating p values:
  - Asymptotic distribution (chi-square approximation).
  - Exact test: when the expected-frequency assumptions are not met (small n, f(e) too small, or too many categories with f(e) < 5).
  - Monte Carlo: when the sample size is too large for the exact test.

3. Hypothesis tests about different population proportions: chi-square test for independence

3.1. Situations
You want to find out whether two categorical vars from a single population are associated. Two vars are associated if they depend on one another (a change in one var goes with a change in the other var).

Examples:
- Is the proportion of female students in two departments (English and Computer) the same? [Gender and dept vars; if the % of females in both depts is the same, there is no association b/w gender and field of study.]
- Does the number of supporters of a president change 1 year after his election? [2 vars: support (yes/no) and time (when elected and 1 year later). If the % of supporters is the same, there is no association.]

If both vars are binary: 2x2 table. In general, for 2 categorical variables: rxc table.

3.3. Chi-square test assumptions
- Nonparametric test: no assumption about the shape of the distribution.
- Categorical or nominal data; data from a random sample.
- The χ² test is valid only if every f(e) > 0 and no more than 20% of the cells have f(e) < 5.
- H0: there is no association b/w var 1 and var 2.
- No association = independence = (% in urban ≈ % in rural) = (f(o) = f(e))

3.4. Example 1: 2x2 table
The mean age at marriage is 19.4 yrs. Is there any significant difference b/w urban and rural areas in the proportion married below the average age?
H0: no association b/w age at marriage and residential location.
SPSS, CDHS 2005 data:
- Recode v511 into the binary var v511_d, with 1 = below 19.4 and 2 = 19.4 or above.
- Analyze > Descriptive Statistics > Crosstabs > v025 as column and v511_d as row > click to open the Statistics box > click Chi-square > Continue > OK

Result

Interpretation
In total, 59% of women married below the average age at marriage. This proportion is higher in rural areas than in urban areas (59.9% versus 55.9%, respectively). A χ² test was performed to see if the two vars are independent, and the result showed that the two vars are independent at the 95% CL (chi-square = 3.4, 2-tailed p = 0.065). Although the proportion married below the mean age is higher in rural than in urban areas, the difference is not statistically significant.

Note that if Ha is one-tailed, then p = 0.065/2 = 0.033, which is significant at the 95% CL: the two vars are dependent!
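The reported p values can be recovered from the test statistic. For a 2x2 table df = 1, and the upper-tail probability of a chi-square variable with 1 df equals erfc(sqrt(χ²/2)). The sketch below applies this identity to the reported χ² = 3.4; the function name is illustrative, and the halving step simply mirrors the one-tailed note above.

```python
import math

def chi2_p_value_df1(chi2):
    """Upper-tail p value of a chi-square statistic with df = 1.

    Uses the identity P(X > x) = erfc(sqrt(x / 2)) for X ~ chi-square(1).
    """
    return math.erfc(math.sqrt(chi2 / 2))

p_two_tailed = chi2_p_value_df1(3.4)   # value reported by the test
p_one_tailed = p_two_tailed / 2        # as in the one-tailed note
print(f"2-tailed p = {p_two_tailed:.3f}, 1-tailed p = {p_one_tailed:.3f}")
```

The halved value falls below 0.05 while the original does not, which is why the conclusion depends on whether a directional Ha was stated before looking at the data.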

Important!
The chi-square test does not tell us how strong the association is. To know this, we need to request measures of association:
- Contingency coefficient: C = sqrt(χ²/(χ² + n)); χ²-based; value from 0 to below 1 (approaches 1 only if the vars have many categories).
- Phi: adjusted for n; φ = sqrt(χ²/n); for 2x2 tables only; value 0 to 1.
- Cramer's V: adjusted for n; V = sqrt(χ²/(n*min(r-1, c-1))); for rxc tables; value 0 to 1.
- Lambda and the uncertainty coefficient: not χ²-based; value 0 to 1; PRE (proportional reduction in error) interpretation: the improvement in predicting one var given knowledge of the other var.

3.4. Example 2: rxc table
Find out if the levels of education of husbands and wives are related (v106 and v701). Is this true for both urban and rural areas? (Use appropriate tests.)

The table below summarizes data on religious preference and attitudes toward abortion in one country. Is respondents' attitude related to their religious preference?

                                      Religious preference
Attitude toward   Liberal       Conservative
abortion          Protestant    Protestant      Catholic   None   Total
Favor             103           182             80         16     381
Oppose            187           238             286        74     785
Total             290           420             366        90     1166
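As a rough check on the abortion-attitude table, the χ² statistic and two of the χ²-based measures of association can be computed by hand. This Python sketch is an illustration under stated assumptions, not SPSS output: the function names are hypothetical, expected frequencies follow the usual (row total x column total)/n rule, and the decision uses the standard tabulated critical value of 7.815 for df = 3 at the 95% CL instead of an exact p value.

```python
import math

def chi_square_independence(table):
    """Chi-square statistic for an r x c table of observed counts.

    Expected count in each cell = row total * column total / n.
    Returns (chi2, df, n).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, n

def association_measures(chi2, n, n_rows, n_cols):
    """Contingency coefficient C and Cramer's V from a chi-square value."""
    c = math.sqrt(chi2 / (chi2 + n))
    v = math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
    return c, v

# Rows: favor, oppose. Columns: liberal Protestant, conservative Protestant,
# Catholic, none.
table = [[103, 182, 80, 16],
         [187, 238, 286, 74]]
chi2, df, n = chi_square_independence(table)
c, v = association_measures(chi2, n, len(table), len(table[0]))
CRITICAL_DF3_95 = 7.815  # tabulated chi-square critical value, df = 3, alpha = 0.05
print(f"chi-square = {chi2:.1f}, df = {df}, n = {n}")
print(f"reject H0 at 95% CL: {chi2 > CRITICAL_DF3_95}")
print(f"C = {c:.2f}, Cramer's V = {v:.2f}")
```

The statistic is far above the critical value, so attitude and religious preference are associated in these data, while C and V of roughly 0.2 suggest the association is not strong. This illustrates the point above that significance and strength are separate questions.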