# CONTINGENCY TABLES ARE NOT ALL THE SAME

David C. Howell, University of Vermont


## Transcription

| | True Condition: First | True Condition: Second | Total |
|---|---|---|---|
| **Guess: First** | 29 | 19 | 48 |
| **Guess: Second** | 19 | 29 | 48 |
| **Total** | 48 | 48 | 96 |

I created these data with these cell frequencies so as to have a table whose Pearson chi-square statistic will be significant at close to α = .05. In this case the probability under the null is .041. (These data are more extreme than the actual data as far as Muriel's ability to detect differences is concerned.)

### Pearson's Chi-Square Test

Let's start with a simple Pearson chi-square test for a 2 × 2 table, along with the odds ratio.

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(29 - 24)^2}{24} + \cdots + \frac{(29 - 24)^2}{24} = 4.17$$

$$OR = \frac{29/19}{19/29} = \frac{1.526}{0.655} = 2.33$$

As I said, this value of chi-square is significant at p = .041. We can reject the null hypothesis and conclude that Muriel's response and the way the tea was prepared are not independent. In other words, she guessed correctly at greater than chance levels. (Note that I did not include Yates' correction for these data, but if I had, the chi-square would have been 3.38 with a p value of .066. We will come back to that later.)

### Fisher's Exact Test and the Hypergeometric Distribution

Fisher specifically did not evaluate these data with a Pearson chi-square, and not just because he couldn't stand Pearson, which he couldn't. His reasoning involved the hypergeometric distribution. When both row and column totals (the "marginals") are fixed, the hypergeometric distribution will tell us the probability that we will have a specified number of observations in cell 11. (We could have picked on any cell, but cell 11 seems like a nice choice. We only need to worry about one cell because we only have 1 df. If we had a 10 in cell 11, then cell 12 must be 38, cell 21 must be 38, and cell 22 must be 10.) The formula for the hypergeometric, if you must know, is

$$p(x) = \frac{\dbinom{n_{.1}}{x}\dbinom{n_{.2}}{n_{1.} - x}}{\dbinom{N}{n_{1.}}} = \frac{\dfrac{48!}{29!\,19!}\cdot\dfrac{48!}{19!\,29!}}{\dfrac{96!}{48!\,48!}} \approx .0207$$

where $x$ = the observation in cell 11, $n_{1.}$ = total of row 1, $n_{.1}$ and $n_{.2}$ = totals of columns 1 and 2, and $N$ = total number of observations.

But of course this will only give us the probability of exactly 29 observations in cell 11. We are going to want the probability of 29 or more observations in that cell, so we have to evaluate this expression for all values of 29 and higher. This is shown below, though I only carried 5 decimals and the entries rapidly drop to zero. But to make this a two-tailed test we also need to know the probability of 19 or fewer observations, which, because we have the same number of each type of tea, is the same. So the two-tailed probability is .0656.

[Table: number in upper left cell, probability, and cumulative sum]

In case you are interested, the distribution is plotted below, where you can see that the vast majority of outcomes lie between about 18 and 31.
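The tail sums described above can be sketched in a few lines. This is a minimal Python illustration using only the standard library, not the author's R program; the function names `hypergeom_pmf` and `fisher_tails` are made up for this example, and the table is the 29/19/19/29 tea table with all marginals equal to 48.

```python
from math import comb


def hypergeom_pmf(x, row1, col1, n):
    """P(cell 11 = x) when the row 1 total, column 1 total, and N are all fixed."""
    return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)


def fisher_tails(x, row1, col1, n):
    """Return (P(X >= x), P(X <= x)) for the count in cell 11."""
    lowest = max(0, row1 + col1 - n)   # smallest count the marginals allow
    highest = min(row1, col1)          # largest count the marginals allow
    upper = sum(hypergeom_pmf(k, row1, col1, n) for k in range(x, highest + 1))
    lower = sum(hypergeom_pmf(k, row1, col1, n) for k in range(lowest, x + 1))
    return upper, lower


# Tea data: 29 in cell 11, every marginal total equal to 48, N = 96.
upper, _ = fisher_tails(29, 48, 48, 96)   # 29 or more in cell 11
_, lower = fisher_tails(19, 48, 48, 96)   # 19 or fewer in cell 11
one_sided = upper
two_sided = upper + lower
print(f"one-sided = {one_sided:.4f}, two-sided = {two_sided:.4f}")
```

Because the marginals are all equal here, the two tails are mirror images, which is why the two-sided value is simply twice the one-sided value.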

Using Fisher's Exact Test we have a one-sided probability of .033, which would lead us to reject the null hypothesis. The two-sided probability would be .0656, which would not allow us to reject the null.

Notice that we did not reject the (two-tailed) null hypothesis using Fisher's test but we did using the Pearson chi-square test. This is most likely a result of the fact that observations are discrete, whereas the chi-square distribution is continuous. I earlier gave the corrected version of chi-square, called Yates' correction, which had a probability of .066. That number is very close to Fisher's value, which is as it should be. Yates was trying to correct for fitting the continuous chi-square distribution to discrete data, and he did so admirably.

### Contingency Tables with One Set of Fixed Marginals

In the data table that we just examined, both the row and column totals were fixed. We knew in advance that there would be 48 cups of each type of tea and that Muriel would make 48 choices of each type. But consider a different example where the row totals are fixed but not the column totals. I will take as an example Exercise 6.13 from the 6th edition of Statistical Methods for Psychology. In 2000 the State of Vermont (to their very great credit, a slight editorial comment) approved a bill authorizing civil unions between gay and lesbian partners. The data shown below suggest that there was a difference on this issue between male and female legislators. I chose this example, in part, because the sample size (145 legislators) is reasonable and the probability (p = .019) is significant but not extreme. Notice that the row totals are fixed because if someone demanded a recount the number of female and male legislators would be the same (44 and 101), but the number of votes for and against the legislation could change. The calculations of Pearson's chi-square and the odds ratio are straightforward.

| | Vote: Yes | Vote: No | Total |
|---|---|---|---|
| **Women** | 35 | 9 | 44 |
| **Men** | 60 | 41 | 101 |
| **Total** | 95 | 50 | 145 |

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(35 - 28.83)^2}{28.83} + \cdots + \frac{(41 - 34.83)^2}{34.83} = 5.50$$

$$OR = \frac{35/9}{60/41} = \frac{3.889}{1.463} = 2.66$$

The usual way to evaluate this test statistic is to compare it to the mathematically defined chi-square distribution. Although that distribution will not be an exact fit to the sampling distribution of this statistic, it will be very close. Using R, or any other statistical software, the two-tailed p value is .0190. By normal standards, this value is statistically significant and we can conclude that how legislators voted depended, to some extent, on their gender. Women were more likely than men to vote for the legislation. The odds ratio was 2.66. Notice that this is a two-tailed test because we could have a large value of chi-square if either men outvoted women or women outvoted men; the difference is squared.

Another, and equivalent, way of running this test is to notice that about 80% of the female legislators supported the measure whereas only 59% of the male legislators did. You probably know from elsewhere that we can test the difference between two proportions using z, where the pooled proportion is

$$p = \frac{35 + 60}{145} = .6552$$

$$z = \frac{p_1 - p_2}{\sqrt{p(1 - p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \frac{.2013}{\sqrt{(.6552)(.3448)(.0326)}} = \frac{.2013}{.0858} = 2.35$$

The two-tailed probability is .0190, which is what we found with chi-square. (The reason that they agree so well is that when you have 1 df a chi-square distribution is just the distribution of the square of a standard normal variable, so this is really the same test.)

### Fisher's Exact Test

Although the marginal totals on the columns in this example are not fixed, we could act as if they are and apply Fisher's Exact Test. If we did so we would find a probability under the null of .0226, which is slightly higher than we found with chi-square.
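The chi-square, odds ratio, and z calculations for a 2 × 2 table can be checked with a short sketch. This is an illustrative Python version, not the author's code. The cell counts 35/9 and 60/41 are an assumption: they are reconstructed from the marginal totals (44, 101, 95 of 145) and the difference in proportions (.2013) quoted in the text. The function names are hypothetical, and the p value uses the fact that with 1 df the chi-square upper tail equals `erfc(sqrt(chi2 / 2))`.

```python
from math import erfc, sqrt


def chi_square_2x2(table, yates=False):
    """Pearson chi-square (optionally Yates-corrected) and its 1-df p value."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            deviation = abs(observed - expected)
            if yates:
                deviation = max(deviation - 0.5, 0.0)  # continuity correction
            chi2 += deviation ** 2 / expected
    return chi2, erfc(sqrt(chi2 / 2))  # upper tail of chi-square on 1 df


def odds_ratio(table):
    (a, b), (c, d) = table
    return (a / b) / (c / d)


def two_proportion_z(table):
    """z test for the difference between two row proportions, pooled variance."""
    (a, b), (c, d) = table
    n1, n2 = a + b, c + d
    pooled = (a + c) / (n1 + n2)
    z = (a / n1 - c / n2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return z, erfc(abs(z) / sqrt(2))  # two-tailed normal probability


votes = [[35, 9], [60, 41]]  # rows: women, men; columns: yes, no (assumed counts)
chi2, p = chi_square_2x2(votes)
z, p_z = two_proportion_z(votes)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}, OR = {odds_ratio(votes):.2f}")
print(f"z = {z:.3f}, z squared = {z * z:.2f}, p = {p_z:.4f}")
```

The same function applied to the tea table, `chi_square_2x2([[29, 19], [19, 29]], yates=True)`, gives the Yates-corrected statistic discussed earlier.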

### Resampling

But let's look at this still a third way. Even though we have a lot of cases (145), the distribution is still discrete (χ² can only take on a specific set of different values) and is being fit against a continuous distribution. So suppose that we set up a simple resampling study for the case with one fixed set of marginals.

First we will assume that the null hypothesis is true, so we are sampling from populations with the same proportion of Yes votes. Then our best estimate of the common population proportion is 95/145 = .6552. In our resampling study we will draw 44 cases, corresponding to women in the legislature. For each woman we will draw a random number between 0 and 1. If that number is less than .6552 we will record her vote as Yes. After we have drawn for all 44 women we will compute the proportion of them who voted Yes. (I could have accomplished the same thing by drawing 44 observations from a binomial with p = .6552, e.g. `rbinom(n = 1, size = 44, prob = .6552)`. I did things the long way because it makes it easier to grasp the underlying process.) In the long run we would expect to have 65.52% of women voting Yes, if the null hypothesis is true, but the actual counts will vary due to normal sampling error.

Then we will do the same thing for males, but this time making 101 draws, with the number of Yes votes equal to the number of times our random number was less than .6552. We will record the difference between the female and male proportions of Yes votes. We will then repeat this experiment 10,000 times, each time generating the number of votes for the legislation by both men and women. Notice that we have set the common probability in both cases at .6552, so, on average, the differences between males and females will come out to 0. When we are all done we will have the distribution of differences for 10,000 cases based on a true null hypothesis, and we can ask what percentage of those differences exceeded .2013, the difference in proportions that we found from our study.[1]

Notice that I am holding the number of males and females constant, but allowing the votes to vary. A histogram of the results is printed below.

[1] Although I am using the difference between the two sample proportions as my statistic, I could instead calculate chi-square for each resampling and use that as my statistic. I just need something that represents a measure of how similar or different the behaviors of men and women are. (In the program that is attached, I use chi-square as my statistic, but I do not compare it to the chi-square distribution.)
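The resampling procedure with fixed row totals can be sketched as follows. This is a minimal Python illustration of the process described above, not the author's attached R program; the function name, the seed, and the use of the difference in proportions (rather than chi-square) as the statistic are choices made for this example.

```python
import random


def resample_fixed_rows(n_women, n_men, yes_total, observed_diff,
                        reps=10_000, seed=1):
    """Two-tailed resampling p value with row totals fixed and votes free to vary."""
    rng = random.Random(seed)
    p_yes = yes_total / (n_women + n_men)  # common proportion under the null
    extreme = 0
    for _ in range(reps):
        # Each legislator votes Yes when a uniform draw falls below p_yes.
        yes_women = sum(rng.random() < p_yes for _ in range(n_women))
        yes_men = sum(rng.random() < p_yes for _ in range(n_men))
        diff = yes_women / n_women - yes_men / n_men
        # Count differences at least as large as the observed one, either direction.
        if abs(diff) >= observed_diff:
            extreme += 1
    return extreme / reps


# 44 women, 101 men, 95 Yes votes overall, observed difference of .2013.
p_value = resample_fixed_rows(44, 101, 95, 0.2013)
print(f"resampling p = {p_value:.4f}")
```

The result will vary slightly with the seed and the number of replications, but it should land near the chi-square probability of about .019.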

The two tails of this distribution are the proportion of differences greater than .2013 and the proportion of differences less than −.2013. If we add together the two tails we find that 1.84% of the observations exceeded ±.2013, which was our obtained difference. These results are comforting because they very nearly duplicate the results of our chi-square test, which had a p value of .0190. One important thing that this tells us is that chi-square is an appropriate statistic when one set of marginals is fixed, at least if we have relatively large sample sizes.

### The Case of No Fixed Marginals

Now we will carry the discussion of contingency tables one step further by considering results in which we would not be willing to assume that either set of marginals is fixed. There have been a number of studies over the years looking at whether the imposition of a death sentence is affected by the race of the defendant (and/or the race of the victim). Peterson (2001) reports data on a study by Unah and Borger (2001) examining the death penalty in North Carolina. The data in the following table show the outcome of sentencing for white and nonwhite (mostly black and Hispanic) defendants when the victim was white. The expected frequencies are shown in parentheses.

Sentencing as a function of the race of the defendant

| Defendant's Race | Death Sentence: Yes | Death Sentence: No | Total |
|---|---|---|---|
| **Nonwhite** | 33 (22.72) | 251 (261.28) | 284 |
| **White** | 33 (43.28) | 508 (497.72) | 541 |
| **Total** | 66 | 759 | 825 |

This table is different from the others we have seen because if we went out and collected new data for some other time period or some other state, both the row totals and the column totals would be expected to change. Although it would be possible to do the arithmetic for Fisher's Exact Test, it would be hard to argue that we should condition on these marginals. So let's look at a different way of approaching the problem.

First of all, there is nothing to stop us from running a plain old Pearson chi-square, and, in fact, that is probably what we should do. These calculations follow.

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(33 - 22.72)^2}{22.72} + \frac{(251 - 261.28)^2}{261.28} + \frac{(33 - 43.28)^2}{43.28} + \frac{(508 - 497.72)^2}{497.72} = 7.71$$

The probability of chi-square = 7.71 on 1 df, if the null is true, is .0055, leading us to reject the null. (The probability for Fisher's test, which I think is not appropriate here, is .007.)

### The Resampling Approach Again

Fisher met a lot of resistance to his idea on contingency tables because people objected to holding the marginal totals fixed. They argued that if the experiment were held again, we would be unlikely to have the same numbers of white and black defendants. Nor would we have the same number of death sentences. (This argument actually went on for a very long time, and we still don't have good resolution. I, myself, have moved back and forth, but I now think that I would recommend Fisher's Exact Test only for the case of fixed marginals.)

One of the huge problems in this debate was that the calculations became totally unwieldy if you let go of the fixed marginals requirement. BUT now we have computers, and they like to do things for us. Computers, like Google, are our friends, as you saw in the previous problem.

Suppose that we pose the following task. We have 825 defendants. For this sample, 284 of them are nonwhite and 541 of them are white. Thus 284/825 = .3442 is the proportion of nonwhite defendants. Similarly, 66/825 = .08 is the proportion of defendants who were sentenced to death. From now on I don't care about how many nonwhite defendants or death sentences we will have in any sampling, but if the null hypothesis is true, .3442 × .08 = .0275 is the expected proportion of nonwhite defendants sentenced to death that we would have in any sample. So what we can do is draw a sample of 825 observations where the probability of landing in cell 11 is .0275. Similar calculations will give us the expected proportions landing in the other cells. Notice that these proportions are calculated by multiplying together the proportions of each race and the proportions of each sentence. This would only be the case if the null hypothesis were true.

Instead of looking at all possible outcomes and their probability, which would be a huge number, I'll instead take a random sample of all of those outcomes. Let's run 10,000 experiments. Each time we run an experiment we let a random number generator put the observations in the four cells randomly based on the calculated probabilities under the null. For example, the random number generator (drawing from the set of numbers 1-4) might pick a 1 with a probability of .0275 and thus add the defendant to the 1st cell, which is cell 1,1, the cell for Nonwhite/Yes. The program makes 825 of these random assignments and pauses. We have filled up our contingency table with one possible outcome of 825 observations, but we need some way to tell how extreme this particular outcome is.

One way to do this is to calculate a chi-square statistic on the data. That is easy to do, and we calculate this statistic for each particular sample. Keep in mind that we are using chi-square just as a measure of extremeness, not because we will evaluate the statistic against the chi-square distribution. We could have used the difference between the cross products in the matrix. The more extreme the results, the larger our chi-square. We just tuck that chi-square value away in some array and repeat the experiment all over again. And we do this 10,000 times.

I actually carried out this experiment. (That took me 37 seconds, which is faster than I could calculate Pearson's chi-square once by hand.) For my 10,000 resamples, the proportion of cases that were more extreme than a chi-square of 7.71 was .0056, which is in excellent agreement with the chi-square distribution itself, which gave a probability of .0055. Remember that these data were drawn at random, so the null hypothesis is actually true. This is beginning to look as if the chi-square distribution is an excellent approximation of the results we obtained here, even when neither set of marginals is fixed.

We can look more closely at the results of the resampling study to see if this is really true. The results of this sampling procedure are shown below. What I have plotted is not the mathematical chi-square distribution but the distribution of chi-square values from our 10,000 resamples. It certainly looks like a chi-square distribution, but looks can be deceiving.
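The experiment with no fixed marginals can be sketched along the same lines. This is a Python illustration of the procedure described above, not the author's R program. The function names are hypothetical, and the default of 2,000 replications (rather than the 10,000 used in the text) is an assumption made only to keep the example quick to run.

```python
import random
from collections import Counter


def chi_square(a, b, c, d):
    """Pearson chi-square for a 2 x 2 table, used only as a measure of extremeness."""
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if 0 in (r1, r2, c1, c2):
        return 0.0  # statistic undefined when a marginal is empty; treat as not extreme
    return n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)


def resample_no_fixed_marginals(table, reps=2_000, seed=1):
    """Drop N observations into four cells using null-hypothesis cell probabilities."""
    (a, b), (c, d) = table
    n = a + b + c + d
    p_row1 = (a + b) / n   # e.g. proportion of nonwhite defendants
    p_col1 = (a + c) / n   # e.g. proportion sentenced to death
    # Under independence each cell probability is a product of marginal proportions.
    cell_probs = [p_row1 * p_col1, p_row1 * (1 - p_col1),
                  (1 - p_row1) * p_col1, (1 - p_row1) * (1 - p_col1)]
    observed = chi_square(a, b, c, d)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(reps):
        counts = Counter(rng.choices(range(4), weights=cell_probs, k=n))
        if chi_square(counts[0], counts[1], counts[2], counts[3]) >= observed:
            extreme += 1
    return observed, extreme / reps


sentencing = [[33, 251], [33, 508]]  # rows: nonwhite, white; columns: death yes, no
observed, p_value = resample_no_fixed_marginals(sentencing)
print(f"observed chi-square = {observed:.2f}, resampling p = {p_value:.4f}")
```

Only the total N is held constant here; both the row and the column totals are free to change from one resample to the next.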

One nice feature of the results is that the mean of this distribution is 0.98, and on both this run and a second run the mean and variance came out very close to those of the true chi-square distribution, for which the mean and variance are df and 2df, or 1 and 2.

Let's see how well they agree across a range of values. In Statistical Methods for Psychology, 7th ed., I used QQ plots to test normality. I just plotted the obtained quantiles against the ones expected from a normal distribution. I will do the same thing here, except that I will plot my obtained chi-square values against the quantiles of the chi-square distribution. For example, I would expect 1% of a chi-square distribution with 1 df to exceed 6.63. Actually I found 94/10,000 = .94% at 6.63 or above. I would expect 50% of the results to be greater than or equal to 0.45, and actually 49.8% met that test. If I obtain similar pairs of values for all of the percentages between 0 and 100, I would have the following result. Notice that the values fall on a straight line. There are only trivial deviations of the results from that line. (The dashed lines represent confidence intervals.)

So now we have at least four ways to evaluate data in 2 × 2 contingency tables. We can run a standard Pearson's chi-square test (or a likelihood ratio test, which I have not described), we can assume that both sets of marginals are fixed and use Fisher's Exact Test, we can assume that row marginals are fixed but not the column marginals, or we can assume nothing about the marginals (other than the total number of observations), running a resampling test in either of the last two cases. In the language of statisticians we are moving from the hypergeometric to the binomial to the multinomial distributions. The fact that our results generally come out to be close is encouraging.

I'm a great fan of randomization tests, but that is partly because I like to write computer programs. For those who don't like to write programs, you can use either Fisher's Exact Test or Pearson's chi-square. There is a slight bias toward Fisher's test in the literature when you have small sample sizes, but don't settle for it until you have read the rest of this document.
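The two quantile checks quoted above (1% beyond 6.63 and 50% beyond 0.45) can be verified against the theoretical distribution with a short sketch. This Python example assumes only the standard identity that the upper tail of a 1-df chi-square at x equals `erfc(sqrt(x / 2))`; the function name is hypothetical, and the observed proportions are the ones reported in the text.

```python
from math import erfc, sqrt


def chi2_upper_tail_1df(x):
    """P(chi-square on 1 df exceeds x), from the square of a standard normal."""
    return erfc(sqrt(x / 2))


# Cutoffs and the proportions of resampled chi-squares reported at or above them.
reported = {6.63: 0.0094, 0.45: 0.498}
for cutoff, found in reported.items():
    expected = chi2_upper_tail_1df(cutoff)
    print(f"cutoff {cutoff}: expected {expected:.4f}, resampling found {found:.4f}")
```

A full QQ plot repeats this comparison at every percentage point rather than at two.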

The one thing that I would not recommend is using Yates' correction with the standard Pearson chi-square. If you are worried about discreteness of the probability distribution, go with Fisher's Exact Test.

Remember that Fisher's Exact Test, as usually presented, applies only to 2 × 2 tables. It can be expanded to larger tables (Howell and Gordon, 1976), but that is not commonly done. (In R and S-Plus there are no restrictions on dimensionality if you use `fisher.test()`.)

### But Are Things Really That Good?

It's nice to know that with reasonably large samples the randomization tests (and the hypergeometric) produce results very similar to those produced by chi-square. But what if we don't have large samples? In that case the chi-square distribution may not be an adequate fit to the resulting test statistic. I will start with Fisher's original experiment, which only had 8 cups of tea. His results are below.

| | True Condition: First | True Condition: Second |
|---|---|---|
| **Guess: First** | 3 | 1 |
| **Guess: Second** | 1 | 3 |

Suppose that we compute chi-square on these data. Our result will yield a chi-square of 2.00 on 1 df with an associated probability of .1573. But Fisher's exact test gives a probability of .4857!!! The reason is fairly simple. There are only a few ways the data could have come out. The table could have looked like any of the following.

[Table: each possible outcome with its χ², the chi-square p value, and the one-tailed and two-tailed Fisher probabilities]

Notice how discrete the results are and how far the (correct) Fisher probabilities are from the chi-square probabilities. With small samples and fixed marginals I strongly suggest that you side with Fisher; he could use some company. We know that for fixed marginals his values are theoretically correct, and we now know that chi-square probabilities are not even close.

So the p values assigned by Fisher and by the chi-square distribution are greatly different. But perhaps this is a bad example because the frequencies are so very small. So let's go back to the example of the Vermont legislature and cut the sample sizes in each cell by a factor of approximately 5. (I had to cheat a bit to get whole numbers.) Remember that this is not a case where we would prefer to use Fisher's Exact Test, because only the row totals are constant. So I will compare a standard chi-square test with the resampling procedure that we used earlier.

| | Vote: Yes | Vote: No | Total |
|---|---|---|---|
| **Women** | 7 | 2 | 9 |
| **Men** | 12 | 8 | 20 |
| **Total** | 19 | 10 | 29 |

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(7 - 5.90)^2}{5.90} + \cdots + \frac{(8 - 6.90)^2}{6.90} = 0.87$$

$$OR = \frac{7/2}{12/8} = \frac{3.50}{1.50} = 2.33$$

The two-sided p value from chi-square is .351, which is not significant. If we had asked for Yates' correction, chi-square would have been .2597 with a p value of .61, whereas Fisher's test would have had a p value of .43. Notice that these values are all over the place.

Now let's use the same resampling approach that we used earlier, holding the row marginals fixed but not the columns. The obtained difference in the two proportions was .7778 − .6000 = .1778. The probability of an outcome more extreme than this, in either direction, was .3521, which is virtually the same as the chi-square probability, but quite different from the probability given by Fisher's test (.43). I prefer the resampling approach because it is in certain ways an exact test. But notice that the standard chi-square test produces almost the same probability. This tells me that chi-square is a good fallback when you don't want to do a randomization test.

Now let's go back to the death sentence data, where neither set of marginals is fixed. If I cut my cell frequencies by a factor of approximately 9, I obtain the following data. The total sample sizes are larger than I would like for an example, but I can't reduce the cells by much more and still have useful data.

| Defendant's Race | Death Sentence: Yes | Death Sentence: No | Total |
|---|---|---|---|
| **Nonwhite** | 4 | 28 | 32 |
| **White** | 4 | 56 | 60 |
| **Total** | 8 | 84 | 92 |

The standard Pearson chi-square on these data is 0.894, for p = .344. When we run a randomization test with the null hypothesis true and no constraints on the marginal totals, we find 3518 results that are greater than our obtained chi-square, for p = .3518. Notice how well that agrees with the Pearson chi-square. Yates' correction gave a p value of .5773, and Fisher's Exact Test again gave a different value. Again we see that the standard chi-square test is to be preferred. (Well, I later reduced my cells by a factor of approximately 10 and found a chi-square probability of .395 and a resampling probability of .410. Who can complain about that?)

So what have we learned? Well, I have learned a lot more than I expected. When the data can reasonably be expected to have both sets of marginal totals fixed, then conditioning on those marginals, which is what Fisher's Exact Test does, is the preferred way to go. However, when one or both sets of marginals are not fixed, and when the sample size is small, Fisher's test gives misleading values. Both when one set of marginals is fixed and when no marginals are fixed, the standard Pearson chi-square test, without Yates' correction, is to be preferred. This would appear to be in line with the recommendation given by Agresti (2002), which is always reassuring. The chi-square test (when one or zero marginals are fixed) agrees remarkably well with randomization tests that seem to be reasonable ways of conceiving of the data. I had expected to find something like this, but I never thought that it would be anywhere near as neat as it is. Now I have to go back to the book that I am revising and re-revise the section on Fisher's Exact Test.

### The Programs

The programs that I created with R can be easily downloaded. They are not particularly elegant, but they work just fine. In each of them I have several sets of data, with a # in front of all but one. A # simply comments out the line. You can create your own data matrix by mimicking what I have there.
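For readers without R, the small-sample comparison for Fisher's eight-cup experiment can be reproduced with a short sketch. This is an illustrative Python example, not one of the author's programs. It enumerates every table compatible with marginals of 4 and reports the chi-square probability next to the Fisher probabilities. The two-tailed rule used here (summing all tables no more probable than the one observed) is an assumption about convention, and the function name is hypothetical.

```python
from math import comb, erfc, sqrt


def enumerate_tables(row1=4, col1=4, n=8):
    """List every 2 x 2 table with the given fixed marginals and its p values."""
    lowest, highest = max(0, row1 + col1 - n), min(row1, col1)
    # Hypergeometric probability of each possible count in cell 11.
    pmf = {x: comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
           for x in range(lowest, highest + 1)}
    results = []
    for x in range(lowest, highest + 1):
        a, b = x, row1 - x
        c, d = col1 - x, n - row1 - col1 + x
        chi2 = n * (a * d - b * c) ** 2 / (row1 * (n - row1) * col1 * (n - col1))
        upper = sum(p for k, p in pmf.items() if k >= x)
        lower = sum(p for k, p in pmf.items() if k <= x)
        # Two-tailed: every table at most as probable as this one (assumed rule).
        two_tail = sum(p for p in pmf.values() if p <= pmf[x] * (1 + 1e-9))
        results.append({"x": x, "chi2": chi2,
                        "p_chi2": erfc(sqrt(chi2 / 2)),
                        "fisher_one": min(upper, lower),
                        "fisher_two": two_tail})
    return results


results = enumerate_tables()
for row in results:
    print(f"cell 11 = {row['x']}: chi2 = {row['chi2']:.2f}, "
          f"p(chi2) = {row['p_chi2']:.4f}, "
          f"Fisher one-tail = {row['fisher_one']:.4f}, "
          f"two-tail = {row['fisher_two']:.4f}")
```

With only five possible tables, the chi-square statistic can take just three distinct values, which is the discreteness problem described in the text.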

The three programs are: Plot the Hypergeometric Distribution, Row Marginals Fixed, and No Fixed Marginals.

8/15/2009

### Contingency tables, chi-square tests

Contingency tables, chi-square tests Joe Felsenstein Contingency tables, chi-square tests p.1/13 Confidence limits on a proportion When you observe the numbers of times two outcomes occur, in a binomial

### CHAPTER 11 CHI-SQUARE: NON-PARAMETRIC COMPARISONS OF FREQUENCY

CHAPTER 11 CHI-SQUARE: NON-PARAMETRIC COMPARISONS OF FREQUENCY The hypothesis testing statistics detailed thus far in this text have all been designed to allow comparison of the means of two or more samples

### Testing Hypotheses using SPSS

Is the mean hourly rate of male workers \$2.00? T-Test One-Sample Statistics Std. Error N Mean Std. Deviation Mean 2997 2.0522 6.6282.2 One-Sample Test Test Value = 2 95% Confidence Interval Mean of the

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### Bivariate Statistics Session 2: Measuring Associations Chi-Square Test

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test Features Of The Chi-Square Statistic The chi-square test is non-parametric. That is, it makes no assumptions about the distribution

### How to Conduct a Hypothesis Test

How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some

### Chi Square for Contingency Tables

2 x 2 Case Chi Square for Contingency Tables A test for p 1 = p 2 We have learned a confidence interval for p 1 p 2, the difference in the population proportions. We want a hypothesis testing procedure

### Association Between Variables

Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

### Solutions 7. Review, one sample t-test, independent two-sample t-test, binomial distribution, standard errors and one-sample proportions.

Solutions 7 Review, one sample t-test, independent two-sample t-test, binomial distribution, standard errors and one-sample proportions. (1) Here we debunk a popular misconception about confidence intervals

### Introduction to Statistics for Computer Science Projects

Introduction Introduction to Statistics for Computer Science Projects Peter Coxhead Whole modules are devoted to statistics and related topics in many degree programmes, so in this short session all I

### Chapter 7 Part 2. Hypothesis testing Power

Chapter 7 Part 2 Hypothesis testing Power November 6, 2008 All of the normal curves in this handout are sampling distributions Goal: To understand the process of hypothesis testing and the relationship

### Chapter 7 Section 7.1: Inference for the Mean of a Population

Chapter 7 Section 7.1: Inference for the Mean of a Population Now let s look at a similar situation Take an SRS of size n Normal Population : N(, ). Both and are unknown parameters. Unlike what we used

### Introduction to Analysis of Variance (ANOVA) Limitations of the t-test

Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only

### Lecture 7: Binomial Test, Chisquare

Lecture 7: Binomial Test, Chisquare Test, and ANOVA May, 01 GENOME 560, Spring 01 Goals ANOVA Binomial test Chi square test Fisher s exact test Su In Lee, CSE & GS suinlee@uw.edu 1 Whirlwind Tour of One/Two

### Chi-Square Analysis (Ch.8) Purpose. Purpose. Examples

Chi-Square Analysis (Ch.8) Chi-square test of association (contingency) x tables rxc tables Post-hoc Interpretation Running SPSS Windows CROSSTABS Chi-square test of goodness of fit Purpose Chi-square

### 4) The goodness of fit test is always a one tail test with the rejection region in the upper tail. Answer: TRUE

Business Statistics, 9e (Groebner/Shannon/Fry) Chapter 13 Goodness of Fit Tests and Contingency Analysis 1) A goodness of fit test can be used to determine whether a set of sample data comes from a specific

### CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

### Hypothesis Testing COMP 245 STATISTICS. Dr N A Heard. 1 Hypothesis Testing 2 1.1 Introduction... 2 1.2 Error Rates and Power of a Test...

Hypothesis Testing COMP 45 STATISTICS Dr N A Heard Contents 1 Hypothesis Testing 1.1 Introduction........................................ 1. Error Rates and Power of a Test.............................

### Chi-Square Test. Contingency Tables. Contingency Tables. Chi-Square Test for Independence. Chi-Square Tests for Goodnessof-Fit

Chi-Square Tests 15 Chapter Chi-Square Test for Independence Chi-Square Tests for Goodness Uniform Goodness- Poisson Goodness- Goodness Test ECDF Tests (Optional) McGraw-Hill/Irwin Copyright 2009 by The

### Week 7: Chi-squared tests

The chi-squared test for association Health Sciences M.Sc. Programme Applied Biostatistics Week 7: Chi-squared tests Table 1 shows the relationship between housing tenure for a sample of pregnant women

### Point Biserial Correlation Tests

Chapter 807 Point Biserial Correlation Tests Introduction The point biserial correlation coefficient (ρ in this chapter) is the product-moment correlation calculated between a continuous random variable

### Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

### Hypothesis Testing for Beginners

Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes

### Probability, Binomial Distributions and Hypothesis Testing Vartanian, SW 540

Probability, Binomial Distributions and Hypothesis Testing Vartanian, SW 540 1. Assume you are tossing a coin 11 times. The following distribution gives the likelihoods of getting a particular number of

### DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS. Posc/Uapp 816 CONTINGENCY TABLES

DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 816 CONTINGENCY TABLES I. AGENDA: A. Cross-classifications 1. Two-by-two and R by C tables 2. Statistical independence 3. The interpretation

### 1. Chi-Squared Tests

1. Chi-Squared Tests We'll now look at how to test statistical hypotheses concerning nominal data, and specifically when nominal data are summarized as tables of frequencies. The tests we will considered

### Two Correlated Proportions (McNemar Test)

Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with

### Chi Square Analysis. When do we use chi square?

Chi Square Analysis When do we use chi square? More often than not in psychological research, we find ourselves collecting scores from participants. These data are usually continuous measures, and might

### 14 Chi-Square CHAPTER Chapter Outline 14.1 THE GOODNESS-OF-FIT TEST 14.2 TEST OF INDEPENDENCE

www.ck12.org CHAPTER 14 Chi-Square Chapter Outline 14.1 THE GOODNESS-OF-FIT TEST 14.2 TEST OF INDEPENDENCE 278 www.ck12.org Chapter 14. Chi-Square 14.1 The Goodness-of-Fit Test Learning Objectives Learn

### Odds ratio, Odds ratio test for independence, chi-squared statistic.

Odds ratio, Odds ratio test for independence, chi-squared statistic. Announcements: Assignment 5 is live on webpage. Due Wed Aug 1 at 4:30pm. (9 days, 1 hour, 58.5 minutes ) Final exam is Aug 9. Review

### Chapter 8 Hypothesis Testing

Chapter 8 Hypothesis Testing Chapter problem: Does the MicroSort method of gender selection increase the likelihood that a baby will be girl? MicroSort: a gender-selection method developed by Genetics

### Sample Size Determination

Sample Size Determination Population A: 10,000 Population B: 5,000 Sample 10% Sample 15% Sample size 1000 Sample size 750 The process of obtaining information from a subset (sample) of a larger group (population)

### Dr. Peter Tröger Hasso Plattner Institute, University of Potsdam. Software Profiling Seminar, Statistics 101

Dr. Peter Tröger Hasso Plattner Institute, University of Potsdam Software Profiling Seminar, 2013 Statistics 101 Descriptive Statistics Population Object Object Object Sample numerical description Object

### Tests for Two Proportions

Chapter 200 Tests for Two Proportions Introduction This module computes power and sample size for hypothesis tests of the difference, ratio, or odds ratio of two independent proportions. The test statistics

### LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

### Stat 371, Cecile Ane Practice problems Midterm #2, Spring 2012

Stat 371, Cecile Ane Practice problems Midterm #2, Spring 2012 The first 3 problems are taken from previous semesters exams, with solutions at the end of this document. The other problems are suggested

### 1 Confidence intervals

Math 143 Inference for Means 1 Statistical inference is inferring information about the distribution of a population from information about a sample. We're generally talking about one of two things: 1.

### Confidence Intervals for Cp

Chapter 296 Confidence Intervals for Cp Introduction This routine calculates the sample size needed to obtain a specified width of a Cp confidence interval at a stated confidence level. Cp is a process

### Chapter 8. Hypothesis Testing

Chapter 8 Hypothesis Testing Hypothesis In statistics, a hypothesis is a claim or statement about a property of a population. A hypothesis test (or test of significance) is a standard procedure for testing

### Distributions: Population, Sample and Sampling Distributions

119 Part 2 / Basic Tools of Research: Sampling, Measurement, Distributions, and Descriptive Statistics Chapter 9 Distributions: Population, Sample and Sampling Distributions In the three preceding chapters

### Chi-square test Fisher's Exact test

Lesson 1 Chi-square test Fisher's Exact test McNemar's Test Lesson 1 Overview Lesson 11 covered two inference methods for categorical data from groups Confidence Intervals for the difference of two proportions

### Hypothesis tests, confidence intervals, and bootstrapping

Hypothesis tests, confidence intervals, and bootstrapping Business Statistics 41000 Fall 2015 1 Topics 1. Hypothesis tests Testing a mean: H0 : µ = µ 0 Testing a proportion: H0 : p = p 0 Testing a difference

### The Basics of a Hypothesis Test

Overview The Basics of a Test Dr Tom Ilvento Department of Food and Resource Economics Alternative way to make inferences from a sample to the Population is via a Test A hypothesis test is based upon A

### statistics Chi-square tests and nonparametric Summary sheet from last time: Hypothesis testing Summary sheet from last time: Confidence intervals

Summary sheet from last time: Confidence intervals Confidence intervals take on the usual form: parameter = statistic ± t crit SE(statistic) parameter SE a s e sqrt(1/n + m x 2 /ss xx ) b s e /sqrt(ss

### Introduction to Regression. Dr. Tom Pierce Radford University

Introduction to Regression Dr. Tom Pierce Radford University In the chapter on correlational techniques we focused on the Pearson R as a tool for learning about the relationship between two variables.

### ANOVA MULTIPLE CHOICE QUESTIONS. In the following multiple-choice questions, select the best answer.

ANOVA MULTIPLE CHOICE QUESTIONS In the following multiple-choice questions, select the best answer. 1. Analysis of variance is a statistical method of comparing the of several populations. a. standard

### CHI-Squared Test of Independence

CHI-Squared Test of Independence Minhaz Fahim Zibran Department of Computer Science University of Calgary, Alberta, Canada. Email: mfzibran@ucalgary.ca Abstract Chi-square (X 2 ) test is a nonparametric

### T-test in SPSS Hypothesis tests of proportions Confidence Intervals (End of chapter 6 material)

T-test in SPSS Hypothesis tests of proportions Confidence Intervals (End of chapter 6 material) Definition of p-value: The probability of getting evidence as strong as you did assuming that the null hypothesis

### Chi-Square Tests. In This Chapter BONUS CHAPTER

BONUS CHAPTER Chi-Square Tests In the previous chapters, we explored the wonderful world of hypothesis testing as we compared means and proportions of one, two, three, and more populations, making an educated

### Chi-square test Testing for independence The r x c contingency tables square test

Chi-square test Testing for independence The r x c contingency tables square test 1 The chi-square distribution HUSRB/0901/1/088 Teaching Mathematics and Statistics in Sciences: Modeling and Computer-aided

### Point and Interval Estimates

Point and Interval Estimates Suppose we want to estimate a parameter, such as p or µ, based on a finite sample of data. There are two main methods: 1. Point estimate: Summarize the sample by a single number

### Introduction to Hypothesis Testing

I. Terms, Concepts. Introduction to Hypothesis Testing A. In general, we do not know the true value of population parameters - they must be estimated. However, we do have hypotheses about what the true

### UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE TWO-WAY ANOVA We have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Common Univariate and Bivariate Applications of the Chi-square Distribution

Common Univariate and Bivariate Applications of the Chi-square Distribution The probability density function defining the chi-square distribution is given in the chapter on Chi-square in Howell's text.

### Math 58. Rumbos Fall 2008 1. Solutions to Review Problems for Exam 2

Math 58. Rumbos Fall 2008 1 Solutions to Review Problems for Exam 2 1. For each of the following scenarios, determine whether the binomial distribution is the appropriate distribution for the random variable

### PASS Sample Size Software

Chapter 250 Introduction The Chi-square test is often used to test whether sets of frequencies or proportions follow certain patterns. The two most common instances are tests of goodness of fit using multinomial

### Using Stata for Categorical Data Analysis

Using Stata for Categorical Data Analysis NOTE: These problems make extensive use of Nick Cox's tab_chi, which is actually a collection of routines, and Adrian Mander's ipf command. From within Stata,

### Margin of Error When Estimating a Population Proportion

Margin of Error When Estimating a Population Proportion Student Outcomes Students use data from a random sample to estimate a population proportion. Students calculate and interpret margin of error in

### CHAPTER 15 NOMINAL MEASURES OF CORRELATION: PHI, THE CONTINGENCY COEFFICIENT, AND CRAMER'S V

CHAPTER 15 NOMINAL MEASURES OF CORRELATION: PHI, THE CONTINGENCY COEFFICIENT, AND CRAMER'S V Chapters 13 and 14 introduced and explained the use of a set of statistical tools that researchers use to measure

### N Mean Std. Deviation Std. Error of Mean

DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 815 SAMPLE QUESTIONS FOR THE FINAL EXAMINATION I. READING: A. Read Agresti and Finlay, Chapters 6, 7, and 8 carefully. 1. Ignore the

### Having a coin come up heads or tails is a variable on a nominal scale. Heads is a different category from tails.

Chi-square Goodness of Fit Test The chi-square test is designed to test whether one frequency is different from another frequency. The chi-square test is designed for use with data on a nominal

### 9-3.4 Likelihood ratio test. Neyman-Pearson lemma

9-3.4 Likelihood ratio test Neyman-Pearson lemma 9-1 Hypothesis Testing 9-1.1 Statistical Hypotheses Statistical hypothesis testing and confidence interval estimation of parameters are the fundamental

### Hypothesis Testing. Learning Objectives. After completing this module, the student will be able to

Hypothesis Testing Learning Objectives After completing this module, the student will be able to carry out a statistical test of significance calculate the acceptance and rejection region calculate and

### Using Excel for inferential statistics

FACT SHEET Using Excel for inferential statistics Introduction When you collect data, you expect a certain amount of variation, just caused by chance. A wide variety of statistical tests can be applied

### Ordinal Regression. Chapter

Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

### Sampling and Hypothesis Testing

Population and sample Sampling and Hypothesis Testing Allin Cottrell Population : an entire set of objects or units of observation of one sort or another. Sample : subset of a population. Parameter versus

### Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### Investigating the Investigative Task: Testing for Skewness An Investigation of Different Test Statistics and their Power to Detect Skewness

Investigating the Investigative Task: Testing for Skewness An Investigation of Different Test Statistics and their Power to Detect Skewness Josh Tabor Canyon del Oro High School Journal of Statistics Education

### Introduction to Hypothesis Testing. Copyright 2014 Pearson Education, Inc. 9-1

Introduction to Hypothesis Testing 9-1 Learning Outcomes Outcome 1. Formulate null and alternative hypotheses for applications involving a single population mean or proportion. Outcome 2. Know what Type

### Chi Square Tests. Chapter 10. 10.1 Introduction

Contents 10 Chi Square Tests 10.1 Introduction 10.2 The Chi Square Distribution 10.3 Goodness of Fit Test 10.4 Chi Square

### Recommend Continued CPS Monitoring. 63 (a) 17 (b) 10 (c) 90. 35 (d) 20 (e) 25 (f) 80. Totals/Marginal 98 37 35 170

Work Sheet 2: Calculating a Chi Square Table 1: Substance Abuse Level by ation Total/Marginal 63 (a) 17 (b) 10 (c) 90 35 (d) 20 (e) 25 (f) 80 Totals/Marginal 98 37 35 170 Step 1: Label Your Table. Label
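The worksheet's calculation can be sketched in a few lines of Python. This is only an illustrative sketch, not the worksheet's own procedure: it assumes the six cell counts (a) through (f) form a 2 x 3 table of observed frequencies, and the function name `chi_square` is hypothetical.

```python
# Observed counts taken from the worksheet's 2 x 3 table (cells a-f).
observed = [
    [63, 17, 10],
    [35, 20, 25],
]


def chi_square(table):
    """Return (Pearson chi-square, degrees of freedom, expected counts).

    Hypothetical helper for illustration; expected counts are computed as
    (row total * column total) / grand total under independence.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum (O - E)^2 / E over every cell of the table.
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


chi2, df, expected = chi_square(observed)
print(f"chi-square = {chi2:.2f} on {df} df")
```

With these counts the marginals reproduce the worksheet's totals (90 and 80 for the rows; 98, 37, and 35 for the columns; 170 overall), and the statistic comes out at about 14.13 on 2 degrees of freedom.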

### BLOSSOMS MODULE ARE RANDOM TRIANGLES ACUTE OR OBTUSE? By Gilbert Strang Professor of Mathematics Massachusetts Institute of Technology

BLOSSOMS MODULE ARE RANDOM TRIANGLES ACUTE OR OBTUSE? By Gilbert Strang Professor of Mathematics Massachusetts Institute of Technology Hi! I'm Gilbert Strang. I teach math at MIT. Mostly what I teach is

### Simulating Chi-Square Test Using Excel

Simulating Chi-Square Test Using Excel Leslie Chandrakantha John Jay College of Criminal Justice of CUNY Mathematics and Computer Science Department 524 West 59 th Street, New York, NY 10019 lchandra@jjay.cuny.edu

### General Procedures for Testing Hypotheses

Chapter 19 General Procedures for Testing Hypotheses 297 CHAPTER 19 General Procedures for Testing Hypotheses Introduction Skeleton Procedure for Testing Hypotheses An Example: Can the Bio-Engineer Increase

### Study Guide for the Final Exam

Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

### Estimating and Finding Confidence Intervals

. Activity 7 Estimating and Finding Confidence Intervals Topic 33 (40) Estimating A Normal Population Mean μ (σ Known) A random sample of size 10 from a population of heights that has a normal distribution

### Crosstabulation & Chi Square

Crosstabulation & Chi Square Robert S Michael Chi-square as an Index of Association After examining the distribution of each of the variables, the researcher's next task is to look for relationships among

### 3.4 Statistical inference for 2 populations based on two samples

3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X_1, X_2, ..., X_m. The second sample will be denoted

### TRANSCRIPT: In this lecture, we will talk about both theoretical and applied concepts related to hypothesis testing.

This is Dr. Chumney. The focus of this lecture is hypothesis testing both what it is, how hypothesis tests are used, and how to conduct hypothesis tests. 1 In this lecture, we will talk about both theoretical

### Hypothesis Testing. Chapter Introduction

Contents 9 Hypothesis Testing 9.1 Introduction 9.2 Hypothesis Test for a Mean 9.2.1 Steps in Hypothesis Testing 9.2.2 Diagrammatic

### Chi Square Test. Paul E. Johnson Department of Political Science. 2 Center for Research Methods and Data Analysis, University of Kansas

Chi Square Test Paul E. Johnson 1 2 1 Department of Political Science 2 Center for Research Methods and Data Analysis, University of Kansas 2011 What is this Presentation? Chi Square Distribution Find

### HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1 PREVIOUSLY used confidence intervals to answer questions such as... You know that 0.25% of women have red/green color blindness. You conduct a study of men

### Module 9: Nonparametric Tests. The Applied Research Center

Module 9: Nonparametric Tests The Applied Research Center Module 9 Overview } Nonparametric Tests } Parametric vs. Nonparametric Tests } Restrictions of Nonparametric Tests } One-Sample Chi-Square Test

### Models for Discrete Variables

Probability Models for Discrete Variables Our study of probability begins much as any data analysis does: What is the distribution of the data? Histograms, boxplots, percentiles, means, standard deviations

### CHI SQUARE DISTRIBUTION

CHI SQUARE DISTRIBUTION 1 Introduction to the Chi Square Test of Independence This test is used to analyse the relationship between two sets of discrete data. Contingency tables are used to examine the

### STAT 35A HW2 Solutions

STAT 35A HW2 Solutions http://www.stat.ucla.edu/~dinov/courses_students.dir/09/spring/stat35.dir 1. A computer consulting firm presently has bids out on three projects. Let A i = { awarded project i },

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### 7 Hypothesis testing - one sample tests

7 Hypothesis testing - one sample tests 7.1 Introduction Definition 7.1 A hypothesis is a statement about a population parameter. Example A hypothesis might be that the mean age of students taking MAS113X

### Chapter Additional: Standard Deviation and Chi- Square

Chapter Additional: Standard Deviation and Chi- Square Chapter Outline: 6.4 Confidence Intervals for the Standard Deviation 7.5 Hypothesis testing for Standard Deviation Section 6.4 Objectives Interpret

### Statistical tests for SPSS

Statistical tests for SPSS Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Premise This book is a very quick, rough and fast description of statistical tests and their usage. It is explicitly

### LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

Period Date LAB : THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

### Section 12 Part 2. Chi-square test

Section 12 Part 2 Chi-square test McNemar's Test Section 12 Part 2 Overview Section 12, Part 1 covered two inference methods for categorical data from 2 groups Confidence Intervals for the difference of

### We know from STAT.1030 that the relevant test statistic for equality of proportions is:

2. Chi 2 -tests for equality of proportions Introduction: Two Samples Consider comparing the sample proportions p 1 and p 2 in independent random samples of size n 1 and n 2 out of two populations which

### The Logic of Statistical Inference-- Testing Hypotheses

The Logic of Statistical Inference-- Testing Hypotheses Confirming your research hypothesis (relationship between 2 variables) is dependent on ruling out Rival hypotheses Research design problems (e.g.

### , for x = 0, 1, 2, 3,... (4.1) (1 + 1/n) n = 2.71828... b x /x! = e b, x=0

Chapter 4 The Poisson Distribution 4.1 The Fish Distribution? The Poisson distribution is named after Simeon-Denis Poisson (1781–1840). In addition, poisson is French for fish. In this chapter we will

### Random Portfolios for Evaluating Trading Strategies

Random Portfolios for Evaluating Trading Strategies Patrick Burns 13th January 2006 Abstract Random portfolios can provide a statistical test that a trading strategy performs better than chance. Each run

### CHAPTER 11. GOODNESS OF FIT AND CONTINGENCY TABLES

CHAPTER 11. GOODNESS OF FIT AND CONTINGENCY TABLES The chi-square distribution was discussed in Chapter 4. We now turn to some applications of this distribution. As previously discussed, chi-square is