CHAPTER EIGHT TESTING HYPOTHESES REGARDING FREQUENCY DATA AND CONTINGENCY TABLES: THE CHI-SQUARE TESTS


Chapter Objectives

In this chapter you will:

Learn how to determine frequencies and proportions using data on nominal scales.

Understand the concept of expected values of frequencies and proportions.

Learn how to use the chi-squared (χ²) distribution to test hypotheses about differences between observed and expected frequencies and proportions of the various values of a nominal variable in a population.

Learn how to use the chi-squared statistic to test hypotheses about the independence or relationship between variables measured on nominal scales.

Learn how to determine the strength of relationships between variables measured on nominal scales.

Learn to use SPSS to test hypotheses about frequencies, proportions, and relationships of variables measured on nominal scales.

A simple thing that we can do with data is to count them. In fact, there is some evidence that writing began developing in the Fertile Crescent as a way of keeping track of the number of domestic animals that were owned by a person or involved in a business transaction. Counting is old! When you come down to it, it is really the only thing we can do with data that are on nominal scales. Frequencies are the results of counting. We can look at relationships between variables on nominal scales and test hypotheses concerning distributions of nominal variables in populations. These latter hypotheses are referred to as hypotheses about goodness of fit since they test how well the observed pattern of frequencies fits an expected pattern. Let's start our investigation of nominal variables by looking at these hypotheses.

Hypotheses About Goodness of Fit

Imagine that Mr. Lycanthrop, the principal of Transylvania High School, suspected that the frequency of students being referred to his office for discipline varied with the phase of the moon. For a year he kept track of the number of referrals for disciplinary action he received, noting carefully the phase the moon was in at each referral. At the end of the academic year Mr. Lycanthrop observed that the one thousand referrals teachers had made were distributed over the phases of the moon as shown in Table 8.1.

Table 8.1
Observed Referrals for Discipline During the Various Phases of the Moon

Phase of the Moon         Number of Referrals
New to waxing quarter     206
Waxing quarter to full    220
Full to waning quarter    305
Waning quarter to new     269

Comparing Observed and Expected Values

What if Mr. Lycanthrop is wrong and the phase of the moon had nothing to do with the frequency of referrals for discipline? In this case we would expect to find the referrals distributed evenly across the different moon phases. The distribution of the 1,000 referrals would look like Table 8.2.

Table 8.2
Expected Referrals for Discipline During the Various Phases of the Moon if the Phase of the Moon and the Number of Referrals Were Not Related to Each Other

Phase of the Moon         Number of Referrals
New to waxing quarter     250
Waxing quarter to full    250
Full to waning quarter    250
Waning quarter to new     250

These expected values are certainly different from the values actually observed over the course of the academic year. In fact, we can see how much the distribution we observed in the study differed from the distribution we would expect if Mr. Lycanthrop were wrong and there were no relationship between the phases of the moon and discipline referrals by calculating the residuals (i.e., the differences between the two distributions) as shown in Table 8.3.

Table 8.3
Observed and Expected Distributions with Residuals

Moon Phase                Observed Frequency (f_o)   Expected Frequency (f_e)   Residual (f_o - f_e)
New to Waxing Quarter     206                        250                        -44
Waxing Quarter to Full    220                        250                        -30
Full to Waning Quarter    305                        250                         55
Waning Quarter to New     269                        250                         19

Note that the sum of the residuals always equals zero.
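The residual arithmetic above can be checked in a few lines. This is a minimal plain-Python sketch using the referral counts from Table 8.1:

```python
# Observed referral counts from Table 8.1 and the expected counts
# (1,000 referrals spread evenly over four moon phases) from Table 8.2.
observed = [206, 220, 305, 269]
expected = [250, 250, 250, 250]

residuals = [fo - fe for fo, fe in zip(observed, expected)]
print(residuals)       # [-44, -30, 55, 19]
print(sum(residuals))  # 0 -- the residuals always cancel out
```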

If Lycanthrop is wrong, he should have observed the same frequencies that were expected if the phase of the moon had no effect on student discipline, and the residuals would all be zero. We see that this is not the case here. We can think of two reasons for this. The first is that the phases of the moon affect the behavior of students at Transylvania High (Lycanthrop is correct). The second is that the differences between the observed frequencies and the expected frequencies are simply due to chance (Lycanthrop is wrong). What is the probability that the residuals could be this great (that is, that the differences between the observed and expected values could be as great as we see in Table 8.3) if Mr. Lycanthrop is wrong about the phases of the moon affecting student behavior and the differences between the observed and expected frequencies are simply due to chance?

There are two problems that we have to deal with when we evaluate the magnitude of these deviations from expectancy. One is that, regardless of how far we are from expectations, a simple sum of the deviations is not useful because it is always zero. For example, if in the table above you change the observed frequencies to 235, 245, 255, and 265, calculate the deviation of each value from the expected frequency for its category (250), and add the deviations up, the sum would be zero. But you can see that these four frequencies are much closer to the expected frequencies than the ones in Mr. Lycanthrop's study in Table 8.3. One way to solve this problem is to square each deviation from expectancy, that is, to examine the squared deviations (f_o - f_e)² rather than the simple deviations (f_o - f_e). Squaring the residuals prevents them from canceling each other out.

Who Invented the Chi-Squared Test?
Karl Pearson introduced the chi-squared test, and the name for it, in an article in 1900 in The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science. Pearson had been in the habit of writing the exponent in the multivariate normal density as -(1/2)χ².

Bliss 4
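The effect of squaring can be seen numerically. This is a minimal plain-Python sketch comparing Lycanthrop's counts with a hypothetical closer-fitting set of counts (235, 245, 255, 265):

```python
# Deviations from the expected 250 for Lycanthrop's counts (Table 8.1)
# and for a hypothetical data set that fits the expectation much better.
far_fit   = [206, 220, 305, 269]
close_fit = [235, 245, 255, 265]
expected = 250

for obs in (far_fit, close_fit):
    devs = [fo - expected for fo in obs]
    print(sum(devs), sum(d ** 2 for d in devs))
# Both simple sums are 0, but the sums of squares (6222 vs. 500)
# separate the badly fitting data from the close fit.
```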

A second problem that we have to solve is that a deviation from expectancy is only meaningful when compared to the number we expected. For example, in Table 8.3, the expected frequency for the first category (New to Waxing Quarter) is 250, and the deviation from this expected frequency is -44. Would 44 points have the same meaning if the expected frequency were 500? Obviously, if we expect 250 and we are 44 points off, we have a bigger deviation from expectation than if we expect 500 and we are 44 points off. A solution to this problem is to calculate, for each category, the ratio of the squared deviation to the expected frequency for that category, (f_o - f_e)²/f_e. The overall deviation from expectation in a study such as Mr. Lycanthrop's would be the sum of these ratios. We can do this by calculating a test statistic known as Chi-Square (χ²) (it starts with a "k" sound and rhymes with "sky") using Formula 8.1.

χ² = Σ [(f_o - f_e)² / f_e]     (8.1)

where the sum is taken over all categories, f_o is the observed frequency in a category, and f_e is the expected frequency for that category.

Table 8.4 demonstrates the procedure for calculating the elements of the formula.

Table 8.4
Calculating the Elements of the χ² Formula

Moon Phase                Observed (f_o)   Expected (f_e)   Residual (f_o - f_e)   Squared Residual (f_o - f_e)²   (f_o - f_e)²/f_e
New to Waxing Quarter     206              250              -44                    1936                             7.744
Waxing Quarter to Full    220              250              -30                     900                             3.600
Full to Waning Quarter    305              250               55                    3025                            12.100
Waning Quarter to New     269              250               19                     361                             1.444

In the research on the effect of the phase of the moon on student behavior, the test statistic is calculated as shown below.

χ² = Σ (f_o - f_e)²/f_e
   = (206 - 250)²/250 + (220 - 250)²/250 + (305 - 250)²/250 + (269 - 250)²/250
   = (-44)²/250 + (-30)²/250 + (55)²/250 + (19)²/250
   = 1936/250 + 900/250 + 3025/250 + 361/250
   = 7.744 + 3.600 + 12.100 + 1.444
   = 24.888

The sampling distribution of the χ² statistic varies based on the degrees of freedom, which is defined as the number of categories minus one. Figure 8.1 displays the shape of the χ² distribution for various degrees of freedom.

Figure 8.1 χ² distributions for 1, 3, 5, and 10 degrees of freedom

Our example has four categories, so we can see that we have three degrees of freedom in this design. The table of critical values in Appendix X can tell us the value of χ² that cuts off a

particular area under the curve. For instance, in the χ² distribution with three degrees of freedom we find that a value of 7.815 cuts off the upper 5% of the area of the distribution. Hence, we can conclude that there is less than a 5% probability that we could have obtained a χ² value of 24.888 (as we did in this case) if there were no differences between the observed and expected distributions of disciplinary referrals. Given the obtained distribution of referrals across the four phases of the moon, Mr. Lycanthrop would have had less than a 5% chance of being correct if he had concluded that the distribution of disciplinary referrals was not related to the phases of the moon.

Chi-Square for Goodness of Fit

When data are on a nominal scale, all we can really do with them is count the number of times a particular value occurs in a sample of data. Using appropriate theory, we can predict what the distribution of these values would be under certain circumstances. For example, in the previous example we could apply simple probability theory to predict that, if there were no relationship between the phases of the moon and the occurrence of discipline referrals, the frequencies of discipline problems should be equal during the four phases of the moon. Now, if the theory were appropriate, the observed distribution of the frequencies should be the same as the theoretically derived distribution. We can use this situation as our null hypothesis:

H_0: f_o = f_e

and test this null hypothesis against the alternative hypothesis that we did not observe the frequency distribution predicted by the theory (H_1: f_o ≠ f_e). We begin by assuming that the null hypothesis is true, calculate the χ² statistic, and determine the probability of getting a χ² value as high as the one obtained if the null hypothesis were true. Remember that if the null hypothesis is true and we reject it, we will commit a Type I error, not a very smart thing to do. So, the probability that the null is true given a particular value of the

obtained χ² statistic is also the probability of our making a Type I error if we decide to reject the null hypothesis in that particular case. As described in Chapter 6, the researcher must decide on the maximum level of Type I error he or she will tolerate before deciding that the chance of the null being true is too high to risk rejecting the null hypothesis. If we find that the chance that the null is true, given our data, is less than this maximum tolerable risk, we can conclude that the chance that the null hypothesis is true is low enough for us to feel comfortable rejecting the null hypothesis and concluding that the obtained distribution is different from the distribution expected under the theory. That is, we can conclude that the observed distribution of values is not a good fit with the expected distribution. A statistically significant result would tell us that the observed distribution does not fit the theoretically expected one.

The expected frequencies or proportions are usually derived from theories (see the box concerning Mendel's theory). But they may also be derived from the proportions in the population, if these are known (see the box about the soldiers' boots). Therefore, a chi-square goodness of fit test may also be used to test hypotheses regarding the difference between obtained sample frequencies/proportions and the population proportions.

Mendel's Theory

Gregor Mendel (1822-1884), an Austrian Roman Catholic monk, is the originator of the modern genetic theories. In his original work (first published in 1866), he predicted that two attributes of peas (color and seed texture) are genetically determined, and each has a dominant and a recessive form. For example, in certain situations, if two yellow pea plants with rough textured peas are cross-pollinated and the resulting seeds are planted, Mendelian theory tells us we would expect to obtain offspring pea plants in the ratio 9/16 yellow, rough; 3/16 yellow, smooth; 3/16 green, rough; and 1/16 green, smooth.
Suppose Mendel had collected the seeds resulting from 90 such cross-pollinated pairs of plants, planted a randomly chosen sample of their seeds, and found that 60 of the offspring were yellow with rough seeds, 15 were yellow with smooth seeds, 10 were green with rough seeds, and 5 were green with smooth seeds. Would you conclude that his results supported his theory?
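One way to answer is to run the goodness-of-fit computation on these hypothetical counts. A minimal plain-Python sketch:

```python
# Hypothetical offspring counts from the Mendel box and the 9:3:3:1
# proportions predicted by Mendelian theory.
observed = [60, 15, 10, 5]             # yellow/rough, yellow/smooth, green/rough, green/smooth
ratios   = [9/16, 3/16, 3/16, 1/16]    # theoretical proportions
n = sum(observed)                      # 90 offspring in all

expected = [n * r for r in ratios]     # [50.625, 16.875, 16.875, 5.625]
chi_sq = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))
print(round(chi_sq, 3))                # 4.815, well under the 3-df critical value of 7.815
```

Because 4.815 does not reach the 5% cutoff, these counts would fail to reject the null hypothesis, i.e., they are consistent with the theory.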

Steps in Conducting a χ² Test for Goodness of Fit

1. Determine the maximum level of Type I error you will tolerate in making decisions about your hypothesis (i.e., the significance level of your statistical test) and use this determination to construct a decision rule that tells you when to reject the null hypothesis. In our example, Mr. Lycanthrop decided that he would only reject the null hypothesis if there were less than a 5% chance of it being true (that is, if the probability of his making a Type I error when he rejected the null hypothesis was less than 5%). So, his decision rule would have been: Reject the null hypothesis if p (the probability that the null hypothesis is true) is less than 5%. Fail to reject the null hypothesis if p is greater than or equal to .05.

2. Determine a theoretical distribution of the data expected under specific circumstances (the expected values). In the case of our example, in the circumstance that there was no relationship between the incidence of disciplinary problems and the phase of the moon, we would expect the incidents of disciplinary referral to be evenly distributed among the four moon phase periods (Table 8.2).

3. Gather data and determine the frequency distribution of the data observed in the field (Table 8.1). In our example we determine how many disciplinary referrals were made during each phase of the moon by checking the records kept by Mr. Lycanthrop.

4. Calculate the value of the χ² statistic using Formula 8.1.

5. Determine the number of degrees of freedom of the design by subtracting one from the number of categories in the frequency distribution. In our example there are four phases of the moon and, therefore, the design has three (4 - 1) degrees of freedom.

6. Use Table X to determine the critical value of χ² for the statistical test. That is, find the value that cuts off the upper portion of the area of the χ² distribution corresponding to the significance level you determined in

step #1. In the case of Mr. Lycanthrop's study, the critical value that cuts off the upper 5% of the χ² distribution with 3 degrees of freedom was found to be 7.815.

7. Compare the value of χ² calculated from the data with the critical value found in step #6. If the calculated value exceeds the tabled critical value, the chance of obtaining a value this large when there were no differences between the observed and expected distributions is less than the maximum probability of making a Type I error that you chose.

8. Apply the decision rule you devised in step #1 to determine whether or not you should reject the null hypothesis. In the case of the possibly moonstruck students, we concluded that the probability of the null hypothesis being true was less than 5% (p < .05). Therefore, we reject the null hypothesis and conclude that the distribution of disciplinary incidents across the various phases of the moon was different from what we would have expected if there were no relationship between moon phase and the frequency of disciplinary incidents.

Army Boots

Uniform boots issued to members of the United States Army come in nine sizes. The boots of currently serving soldiers are distributed as shown in the table below.

Size
Proportion of soldiers

If we take a random sample of 1,000 soldiers whose last address before enlisting was in the state of Florida, we find the boots issued to these 1,000 soldiers were distributed by size in the following way.

Size
Number of soldiers from Florida

Based on these data, would you say that the distribution of boot sizes among soldiers from Florida is the same as the distribution of boots in the population of soldiers in the United States?
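The eight steps above can be sketched in a few lines of plain Python using the moon-phase data (the 7.815 cutoff is the tabled upper-5% critical value for 3 degrees of freedom):

```python
# Goodness-of-fit chi-square for the moon-phase data, Tables 8.1-8.4.
observed = [206, 220, 305, 269]        # step 3: observed frequencies
expected = [sum(observed) / 4] * 4     # step 2: 250 per phase if the moon is irrelevant

chi_sq = sum((fo - fe) ** 2 / fe       # step 4: Formula 8.1
             for fo, fe in zip(observed, expected))
df = len(observed) - 1                 # step 5: 4 categories -> 3 df
critical = 7.815                       # step 6: tabled upper-5% cutoff, 3 df

print(round(chi_sq, 3))                # 24.888
print(chi_sq > critical)               # True -> step 8: reject the null hypothesis
```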

Assumptions for the Chi-Square for Goodness of Fit

Chi-square is one of a family of statistics known as nonparametric statistics. As this name implies, using the statistic makes no assumptions about the parameters (characteristics) of the population from which the data were obtained. So, unlike parametric statistical tests such as t-tests and analysis of variance, we do not have to be concerned about whether the data are more or less normally distributed or whether there is homogeneity of variance among samples. Nonparametric tests are reasonably easy-going. However, as in most of life, there is no such thing as a free lunch. Nonparametric tests make us pay for this easy-goingness. Compared to the corresponding parametric statistical tests, nonparametric tests have lower power. That is, they have less chance of rejecting a false null hypothesis than the corresponding parametric tests. Put simply, if two distributions differ by a given amount, a nonparametric statistical test will yield a higher probability of the null hypothesis being true than would a parametric statistic. So, in the event that the null hypothesis is actually false, there is a greater chance that you would fail to reject the null (and make a Type II error) using a nonparametric statistic than if you had used a parametric statistic. Quite simply, parametric tests are more sensitive to false null hypotheses than are nonparametric statistics.

SPSS: Doing it by Computer

This is the SPSS output obtained by calculating a χ² for goodness of fit using the Lycanthrop data. The table labeled Phase of moon contains the same information that is displayed in Table 8.1.

Phase of moon
                          Observed N   Expected N   Residual
New to waxing quarter     206          250.0        -44.0
Waxing quarter to full    220          250.0        -30.0
Full to waning quarter    305          250.0         55.0
Waning quarter to new     269          250.0         19.0
Total                     1000

Test Statistics
               Phase of moon
Chi-Square     24.888(a)
df             3
Asymp. Sig.    .000
a. 0 cells (.0%) have expected frequencies less than 5. The minimum expected cell frequency is 250.0.

The table labeled Test Statistics presents the obtained value of χ², the number of degrees of freedom (df), and the significance (Asymp. Sig.) of the obtained χ² statistic with the given number of degrees of freedom (p). Of course this significance (the probability that the null hypothesis is true) is not really zero. It couldn't be unless we had data from the entire population of interest. SPSS rounds this value to three decimal places, so we can interpret a printed value of .000 as p < .0005. In any case, using the decision rule we derived earlier, we can reject the null hypothesis and conclude that the expected frequency distribution is not equal to the distribution of the actual data.

Independence/Association

In the nation of West Atlantis, the national legislature is made up of 200 representatives. Recently, geologists have found that large deposits of oil lie off the southern coast of the country along the continental shelf. This region is a large tourist and recreational area, and a well-organized movement has arisen to convince the government not to issue leases to oil companies allowing them to drill in the area.

West Atlantis has a two-party political system that is based on environmental ideology. The Green Party considers the quality of the natural environment to be the ultimate good and its

members believe that the less technology people use, the better off the world will be. The Brown Party's platform calls for greater use of technology and natural resources, and its members believe that what is good for business is good for the entire country. The last time the legislature voted on a similar issue dealing with permitting offshore oil exploration, the vote was as shown in Table 8.3. Statisticians refer to tables such as this one as contingency tables since they allow us to calculate contingency probabilities. For instance, if a legislator is a member of the Green Party, there is a 75% (60/80) chance that the legislator voted yes. Likewise, if a legislator is a member of the Brown Party, there is only a 62.5% (75/120) chance that he or she would have voted in favor of the motion.

Table 8.3
Distribution of Votes by Party

         Party
Vote     Green   Brown
Yes      60      75
No       20      45

Clearly, if there is a relationship between party affiliation and how legislators tend to vote on issues involving offshore drilling leases, both sides in the issue could use this information to target their arguments and financial support to certain legislators. On the other hand, if voting on these issues turns out to be independent of party affiliation, the movement would be better off targeting representatives based on some other variable. In this case, does knowing a legislator's party affiliation change the probability of successfully predicting how a legislator would vote? We can answer this question by expanding Table 8.3 to show the marginals of the rows and columns in the contingency table, as shown in Table 8.4.

Table 8.4
Distribution of Votes by Party With Row and Column Marginals

         Party
Vote     Green   Brown   Total
Yes      60      75      135
No       20      45       65
Total    80      120     200

As can be seen in Table 8.4, the marginals are simply the sums of the rows and columns. Now, assuming that we don't know the party affiliation of the particular legislator in question, our best guess of his or her vote would be that the legislator voted Yes. Quite simply, this is because, looking at the number of legislators who voted Yes and No (the row marginals), we see that more of them voted Yes (135) than voted No (65). If we were to guess that the legislator voted Yes (the smartest thing to do) we would be wrong 65 times, or 32.5% (65/200), of the time.

Let's assume now that we knew the legislator was a member of the Green Party. In this case we need only look at the Green Party column in Table 8.4. Again, our best guess of how the legislator voted would be Yes, since more Greens voted Yes (60) than voted No (20). In this case we would be wrong only 20 times, or 25% (20/80), of the time if we so guessed. Finally, if we know that the legislator is a member of the Brown Party, our best guess is still that the legislator voted Yes, since Table 8.4 shows us that more Browns voted Yes (75) than No (45). If we were to use this decision to guess the Brown legislator's vote we would find we were wrong 45 times out of 120, or 37.5%, of the time. Table 8.5 shows us the accuracy of our guesses on votes under the varying levels of knowledge we had about the party affiliation of the legislators in question.

Table 8.5
Accuracy of Vote Guess Under Three Knowledge Conditions

Party Affiliation   Guessed Vote   Chance of Guessing Incorrectly
Unknown             Yes            32.5%
Green Party         Yes            25.0%
Brown Party         Yes            37.5%

Table 8.5 shows us that knowing that a legislator is a member of the Green Party decreases our chances of guessing incorrectly by 7.5%. In this case, knowing about party

affiliation is useful in predicting how a legislator will vote. Note, however, that knowing that a legislator is a member of the Brown Party actually increases our chances of guessing this legislator's vote incorrectly. In this case, we would be better off without the information. Since knowledge of the value of the party affiliation variable changes our probability of correctly predicting the value of the vote variable, it should be easy to understand that the two variables are not independent of each other. In other words, there is a relationship between these variables.

Now let's look at the values in Table 8.6.

Table 8.6
Distribution of Votes by Party With Row and Column Marginals (Case 2)

         Party
Vote     Green   Brown   Total
Yes      54      81      135
No       26      39       65
Total    80      120     200

Note that in this table, even though the values in the individual cells are different from those in Table 8.4, the marginals remain the same. Thus, there are still 80 legislators in the Green Party and 120 in the Brown, and 135 legislators voted Yes on the bill while 65 voted No. All that has changed is the distribution within the categories. Using the same strategy we used with the previous distribution, we can see that if we do not know what party a particular legislator belonged to, our best guess about his or her vote would be that the legislator voted Yes, since there were 135 voting Yes and only 65 voting No. We would have a 32.5% (65/200) chance of guessing incorrectly. If we know a legislator is a member of the Green Party, again our best guess would be that the legislator voted Yes, since 54 out of 80 of the Green legislators voted in that manner. We would be wrong 32.5% (26/80) of the time, giving us no advantage over guessing when we did not know the legislator's party affiliation. Finally, if we knew that the legislator was a member of the Brown Party, we would also guess that he or she voted in the affirmative, since 81 out of 120 Browns voted affirmatively. We would find we had guessed wrong 32.5% (39/120) of the

time. Again we see that knowing party affiliation, in this case, does not help us in making a prediction about how a legislator will vote. Table 8.7 shows this clearly.

Table 8.7
Accuracy of Vote Guess Under Three Knowledge Conditions for Case 2

Party Affiliation   Guessed Vote   Chance of Guessing Incorrectly
Unknown             Yes            32.5%
Green Party         Yes            32.5%
Brown Party         Yes            32.5%

In this second case we can say that the two variables, party affiliation and the way a legislator voted on offshore oil drilling, are independent of each other. That is, knowing the value of one variable does not help us predict the value of the second variable. In the first case, the variables appeared to be related. If the variables are related, then it should be possible for us to use the information on the previous vote that the West Atlantis legislature took on offshore oil drilling to predict what the next vote might be. Let's look again at the vote on the first offshore drilling bill in Table 8.8.

Table 8.8
Distribution of Votes by Party With Row and Column Marginals and Percents

         Party
Vote     Green        Brown         Total
Yes      60 (75%)     75 (62.5%)    135 (67.5%)
No       20 (25%)     45 (37.5%)     65 (32.5%)
Total    80 (40%)     120 (60%)     200 (100%)

In this table the percentages within each cell correspond to the percent of legislators within each column (each party) who voted positively and negatively on the bill, respectively. These percentages are often referred to as column percents since they give us the proportions based on the column marginal for each column. For instance, we note that 60 out of 80 Green Party members voted for the bill, and 60/80 = 0.75 (or 75%). Similarly, 45 out of 120 members of the Brown Party voted against the bill, and 45/120 = 0.375 (or 37.5%). The percents in the marginals are the percent of the total number of legislators (there are 200 of them) who fall in each row or column. So, the column

marginal percent for members of the Brown Party is 120/200 = 0.60, telling us that 60% of the members of the West Atlantis legislature belong to the Brown Party. The row marginal percent for the legislators who voted Yes is 135/200 = 0.675, telling us that 67.5% of the legislators voted in favor of the bill.

Table 8.9
Distribution of Votes by Party With Row and Column Marginals and Percents (Case 2)

         Party
Vote     Green        Brown         Total
Yes      54 (67.5%)   81 (67.5%)    135 (67.5%)
No       26 (32.5%)   39 (32.5%)     65 (32.5%)
Total    80 (40%)     120 (60%)     200 (100%)

Now look at the same information for Case 2 shown in Table 8.9. Note that the row and column marginals and percent marginals are the same as in Case 1 (Table 8.8). What is different in the two cases are the individual cell frequencies and, therefore, the column percents. Note also that in Case 2 (where the variables are independent) the column percents in a given row are identical to the row marginal percent. This is not true in Case 1, where the variables are related to each other.

So, now let's get down to the question of interest. In the vote on the previous bill (Case 1), were the variables party affiliation and vote independent? If they were, then someone trying to influence the vote knows that he or she should not bother looking at the party affiliation of legislators in order to choose which legislators to spend resources lobbying. On the other hand, if the variables are related to each other, knowing legislators' party affiliations might help in determining how to use limited resources to influence the vote.

Clearly, Case 1 and Case 2 (the situation where there was independence between the variables) are different. All we need to do is look at the column and marginal percentages to see that. Of course, there are two reasons why the distributions might be different. The first is that the two variables are actually related (not independent of each other). The second, however, is that they appear different due to sampling error. That is, because the first vote is not a representative sample of how the 200 members of the legislature

vote when it comes to issues of offshore oil drilling. This could have occurred if there was something special about the circumstances of the Case 1 vote. Perhaps there had recently been a huge oil spill at an offshore oil rig that polluted the ocean around it, and the newspapers and television news programs showed picture after picture of dead birds, fish, and other sea life before the vote was taken.

So, if the two variables were independent, we would expect to get cell frequencies that look like the ones in Case 2. We observed the cell frequencies in Case 1. What is the probability that we could get the observed frequencies (Case 1) if the variables really were independent and the difference between the observed and expected frequencies was simply due to sampling error?

Calculating chi-square for independence. This situation should seem familiar. Here we have a set of observed frequencies and a set of expected frequencies. If the frequencies are distributed the same way in the two sets, we can say the variables are independent. If not, there is an association (relationship) between the two variables. We can determine the probability that the two distributions come from the same population using the chi-squared statistic that we saw at the beginning of the chapter. Remember that the formula is χ² = Σ (f_o - f_e)²/f_e, as noted in Formula 8.1. In this case f_o is the frequency in each of the cells in Case 1 (the table of the frequencies observed in the prior vote) and f_e is the frequency expected if the variables are independent, taken from the corresponding cell in the table for Case 2. Table 8.10 shows these observed and expected frequencies for each cell.

Table 8.10
Distribution of Votes by Party With Expected Values (in parentheses)

         Party
Vote     Green      Brown
Yes      60 (54)    75 (81)
No       20 (26)    45 (39)

We can calculate χ² in the following manner. The resulting χ² statistic will have (R - 1)(C - 1) degrees of freedom, where R = the number of rows in the contingency table and C = the number of columns in the table. In our example, df = (2 - 1)(2 - 1) = (1)(1) = 1.

χ² = (60 - 54)²/54 + (75 - 81)²/81 + (20 - 26)²/26 + (45 - 39)²/39
   = (6)²/54 + (-6)²/81 + (-6)²/26 + (6)²/39
   = 0.667 + 0.444 + 1.385 + 0.923
   = 3.42

Now our question is, what are the chances of getting a value of χ² as high as 3.42 or higher with one degree of freedom if how legislators voted is independent of their party affiliations? We can't determine this directly, but we can find out whether this probability is less than a certain value by looking in Table X. First, let's look at the two possible decisions we can make. We can decide that the variables are independent (in which case we will be saying that the observed frequency distribution among the cells of the table is the same as the distribution of the expected values). We might also decide that the variables are related (i.e., are not independent). In this case, we would find that the observed frequency distribution was different from the expected distribution. As in most cases of using hypothesis-testing statistics, we will set our null hypothesis as the condition where no relationship exists. So, the null hypothesis is that the expected and observed cell frequencies are equal (H_0: f_o = f_e) while the alternative is that the expected and observed frequencies are not equal (H_1: f_o ≠ f_e).

Next we must devise a rule to use when deciding whether or not to reject the null hypothesis. In this case, let us decide that we will reject the null hypothesis when the chances of it being true are less than 5%. Using Table X, we find that, with one degree of freedom, a χ²

value of 3.84 will cut off the upper 5% of the distribution. So, any χ² value above 3.84 has less than a 5% chance of occurring if the two variables were independent; that is, if the null hypothesis were true. Any value below this has more than a 5% chance of occurring. In our example, we obtained a χ² value of 3.41, and now we know that there was more than a 5% chance of obtaining a value this high or higher if the null hypothesis were true. Using our decision rule, then, we will fail to reject the null hypothesis and conclude that we have no reason to believe that a legislator's vote on the offshore drilling bill was related to his or her party affiliation. If we were going to lobby these legislators, party affiliation would not be a good variable to use in order to target our efforts.

Finding expected values. Unlike goodness-of-fit models, expected values in tests for independence do not come from the theory behind the variables. Rather, these tests use the cell frequencies that would be expected if the two variables in question were independent of each other. As hinted in Tables 8.8 and 8.9, these values are a function of the marginal frequencies of the rows and columns. We can find the expected value of a specific cell by following this simple procedure.

1. Find the row marginal of the row containing the specific cell of interest.
2. Find the column marginal of the column containing the specific cell of interest.
3. Multiply the row marginal you found in step 1 by the column marginal found in step 2.
4. Take the product you found in step 3 and divide it by the total number of subjects in the contingency table.

So, if n_i equals the row marginal, n_j equals the column marginal, and n is the total number of subjects,

Bliss 20

Expected value of the cell in row i and column j (Row marginal × Column marginal ÷ Total sample size):

E_ij = (n_i × n_j) / n      (8.2)

where E_ij is the expected value of the cell in row i and column j. Using Equation 8.2 with the data in Table 8.4, we would obtain the expected value for the members of the Green Party who voted in the affirmative (E_11) in this way:

E_11 = (n_1. × n_.1) / n = (135)(80) / 200 = 10,800 / 200 = 54

as can be seen in Table 8.10.

Equation 8.2 can be derived thus. A percent marginal is a row or column marginal expressed as a percent of the total number of subjects. Suppose that π_i. is the percent marginal for row i and π_.j is the percent marginal for column j. If n_i. is the row marginal for row i and n_.j is the column marginal for column j, we can estimate the row percent marginal for each row by π_i. = n_i. / n, and we can estimate the column percent marginal for each column by π_.j = n_.j / n. If the variables are independent, we can apply the multiplication rule of probabilities [p(A ∩ B) = p(A)p(B)] and see that the probability that a subject will fall in the cell in row i and column j is π_ij = π_i. × π_.j = (n_i. / n)(n_.j / n). From this we see that the expected frequency of a cell if the variables are independent is

E_ij = n × π_ij = n × (n_i. / n)(n_.j / n) = (n_i. × n_.j) / n

This, of course, is the row marginal times the column marginal, all divided by the number of subjects in the design.
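Equations 8.1 and 8.2 are easy to check in code. The following sketch (ours, in Python; it is not part of the original chapter) reproduces the expected values and the χ² statistic for the legislator data in Table 8.10:

```python
# Expected frequencies under independence (Equation 8.2) and the
# chi-square statistic (Equation 8.1), using the legislator data of Table 8.10.

observed = [[60, 75],   # Yes votes:  Green, Brown
            [20, 45]]   # No votes:   Green, Brown

n = sum(map(sum, observed))                           # total sample size: 200
row_marginals = [sum(row) for row in observed]        # [135, 65]
col_marginals = [sum(col) for col in zip(*observed)]  # [80, 120]

# Equation 8.2: E_ij = (row marginal i) x (column marginal j) / n
expected = [[row_marginals[i] * col_marginals[j] / n for j in range(2)]
            for i in range(2)]

# Equation 8.1: chi-square = sum over cells of (f_o - f_e)^2 / f_e
chi_square = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
                 for i in range(2) for j in range(2))

df = (2 - 1) * (2 - 1)   # (R - 1)(C - 1) = 1

print(expected)                  # [[54.0, 81.0], [26.0, 39.0]]
print(round(chi_square, 2), df)  # 3.42 1
```

Carrying full precision gives 3.42 rather than the 3.41 obtained above, where each of the four terms is rounded to two decimal places before summing; the conclusion is the same either way.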

To sum up, here are the steps used in conducting a χ² test for association/independence.

1. Determine the maximum level of Type I error you will tolerate in making decisions about your hypothesis and use this determination to construct a decision rule that tells you when to reject the null hypothesis. In our example we decided that we would only reject the null hypothesis if there were less than a 5% chance of it being true. So our decision rule was: Reject the null hypothesis if p (the probability that the null hypothesis is true) is less than 5%. Fail to reject the null hypothesis if p is greater than or equal to .05.
2. Gather data and determine the frequency distribution of the data observed in the field (for example, Table 8.3).
3. Calculate the expected frequencies for each cell using Equation 8.2.
4. Calculate the value of the χ² statistic using Equation 8.1.
5. Calculate the number of degrees of freedom in the design using the formula df = (R − 1)(C − 1) (the number of rows in the contingency table minus one, times the number of columns in the table minus one).
6. Find the critical value of χ² in Table X for your degrees of freedom and your chosen level of Type I error, and compare the calculated value of χ² with this critical value. If the calculated value exceeds the tabled critical value, the chances of obtaining a value this large if there were no differences between the observed and expected distributions are less than the maximum probability you allowed of making a Type I error.
7. Apply the decision rule you devised in step #1 to determine whether or not you should reject the null hypothesis. In the case of the votes of legislators on offshore drilling, we concluded that the probability of the null hypothesis being true was greater than 5% (p > .05). Therefore,

we will fail to reject the null hypothesis and conclude that the variables party affiliation and type of vote cast cannot be said to be related.

The problem of small expected values in contingency tables. Often in social science research we work with rather small groups of subjects. This often results in cells that have small expected values. Table 8.11 displays a contingency table showing the frequency of responses when a sample of 60 people were asked whether or not they had experienced an act of racism directed toward them within the past year. The responses are grouped according to the self-reported racial/ethnic group membership of the participants. As in previous tables, the regular-type numbers are the observed values and the italicized numbers are the expected values, that is, the cell values you would expect to see if the two variables were not related to each other.

Table 8.11
Experienced Racism in the Past Year by Race/Ethnicity
(Rows: Yes, No. Columns: White, Black, Hispanic, Asian, Other. The cell frequencies did not survive transcription.)

If the expected value in any of the cells of a contingency table is less than 5, it has been the standard practice to attempt to increase these expected values by combining rows and/or columns of the table (see Table 8.12). The rationale for this is a bit beyond the scope of this book. Suffice it to say that the result of this problem of small expected values is that calculated probabilities of Type I errors tend to be underestimates of the actual chances of making a Type I error. In other words, when the value of χ² obtained from the data tells you the probability of making a Type I error (the significance of the statistical test) is 4%, it might really be 6%. If p = .04 according to your statistic and you

were testing at α = .05, you would reject the null hypothesis. However, if p were actually equal to .06, you should have failed to reject the null hypothesis. In other words, you would have rejected the null hypothesis when you shouldn't have: a Type I error. Theoretically, then, small expected values increase our chances of making a Type I error beyond the level of α that we initially set. However, Camilli and Hopkins (1979) have shown that Type I error is not really a problem so long as the sample size is at least equal to eight. Overall (1980) noted that the real problem with small sample sizes is more likely to have to do with Type II error (i.e., with power) than with Type I error. It probably is not a bad idea to try to keep expected values of contingency table cells above 5, but one should not adhere to this rule slavishly if the situation warrants smaller expected frequencies.

Table 8.12
Table 8.11 With the Last Three Categories Condensed
(Rows: Yes, No. Columns: White, Black, Other. The cell frequencies did not survive transcription.)
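With expected counts this small, an alternative to collapsing categories is an exact test; note that the SPSS chi-square output shown later in this section includes Fisher's Exact Test for 2 × 2 tables. Here is a minimal sketch of the two-sided Fisher exact p-value in Python, using only the standard library. The function name and the example table are our own illustration, not data from the chapter:

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]].

    With all marginals held fixed, the first cell follows a hypergeometric
    distribution; the two-sided p-value sums the probabilities of every
    table that is no more probable than the one observed.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c

    def prob(x):  # P(first cell = x) given the fixed marginals
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_observed = prob(a)
    lo = max(0, row1 - (n - col1))   # smallest feasible first-cell count
    hi = min(row1, col1)             # largest feasible first-cell count
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_observed + 1e-12)

# A hypothetical 2 x 2 table with tiny cell counts:
print(round(fisher_exact_two_sided(3, 1, 1, 3), 3))   # 0.486
```

Because the test is exact, it needs no minimum-expected-count rule at all, which is why many analysts now prefer it for sparse 2 × 2 tables.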

Some authors suggest the use of Yates' Correction for Continuity when using χ² with a 2 × 2 contingency table. This adjustment to the standard formula for χ² was suggested by Yates in 1934, who noted that, as mentioned in the discussion of small sample sizes, the sampling distribution of the calculated values of χ² is discrete while the theoretical sampling distribution of the statistic is continuous. This is particularly a problem with small samples, and it led Yates to devise this correction for use with 2 × 2 tables. The procedure merely subtracts .5 from the absolute differences between the observed and expected frequencies in the numerator before they are squared, giving Formula 8.3:

χ² = Σ (|f_o − f_e| − .5)² / f_e      (8.3)

This procedure works quite nicely so long as the marginals of the contingency table are fixed. To speak of fixed marginals is to say that, if you had repeated the study with a different sample from the same population, although the individual cell frequencies might change from the original sample, the marginals would be equal to those in the first sample. This is a very unusual situation. For this reason, even though the correction factor is available in the output of most computer statistical packages, it is probably not a good idea to use Yates' Correction for Continuity.

Measures of the strength of association. Remember that the null hypothesis tested by the χ² test for independence is that the cell frequencies observed in the sample are the same as those we would have expected to see if the two variables were independent of each other. In other words, the null hypothesis is that the variables are independent. If we reject this null hypothesis we are simply saying that there is a relationship between the two variables; that the correlation between the two is not zero. However, this does not tell us anything about the strength of the relationship.
As we discussed in Chapter 1, after we determined that the χ² was significant, a finding that allowed us to reject a null hypothesis that there was no relationship between the subjects' race and their experiences with racists, we need to determine the strength of the relationship. In our example we rejected the null hypothesis that there was no relationship between the subjects' race and their

experiences with racists, and now we need to determine whether the association (relationship) was strong or weak.

In order to determine the strength of the relationship between the two variables in 2 × 2 contingency tables, we can use the phi (Φ) statistic. Phi is related to χ² as shown in Formula 8.4:

Φ = √(χ² / N)      (Chi-square ÷ Total sample size)      (8.4)

For tables larger than 2 × 2, the formula can be extended to Cramér's V as shown in Formula 8.5, where k is either the number of rows or the number of columns in the contingency table, whichever is less:

V = √(χ² / (N(k − 1)))      (8.5)

Both Φ and V are correlation coefficients and may have values between 0 and 1.00, where zero indicates no relationship at all and 1.00 a perfect relationship between the two variables. From Chapter 6, you remember that correlation values close to 1.00 indicate a strong relationship, while values close to 0 indicate relative independence or lack of relationship. In other words, one may conclude that if the value of a correlation coefficient is 1.00, we can perfectly predict one variable from the second all of the time. A value of zero, however, can be interpreted as telling us that any attempt to predict the value of one variable from the value of the other would be no more accurate than taking a wild guess.

Be cautious when calculating measures of association with contingency tables! Remember, as with any other correlation coefficient, Φ or V should only be calculated when there is reason to believe that the correlation between the two variables is non-zero. If the χ² test for independence indicates that it is not appropriate to reject the null hypothesis (the hypothesis that tells us that the variables are independent, i.e., not related), we have no reason to believe that Φ or V is not zero. In this case we would be rather foolish to attempt to actually calculate the statistic.
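Formulas 8.4 and 8.5 translate directly into code. A short illustrative sketch (ours, not the chapter's); the χ² value of 7.93 is a hypothetical input chosen to match the hair- and eye-color example that follows, where N = 67:

```python
from math import sqrt

def phi(chi_square, n):
    """Formula 8.4: phi = sqrt(chi-square / N), for a 2 x 2 table."""
    return sqrt(chi_square / n)

def cramers_v(chi_square, n, n_rows, n_cols):
    """Formula 8.5: V = sqrt(chi-square / (N * (k - 1))),
    where k is the smaller of the number of rows and columns."""
    k = min(n_rows, n_cols)
    return sqrt(chi_square / (n * (k - 1)))

# For a 2 x 2 table, k - 1 = 1, so V reduces to phi.
print(round(phi(7.93, 67), 3))              # 0.344
print(round(cramers_v(7.93, 67, 2, 2), 3))  # 0.344
```

Note that both functions take the already-computed χ² as input; they should only be called after the χ² test has rejected the null hypothesis of independence, as the caution above explains.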

SPSS: Doing It by Computer

We have a randomly selected sample of 67 graduate students from a College of Education at a large public university in the southeastern United States. These students responded to a questionnaire that included information on the colors of their hair and eyes. The information on hair and eye color has been coded as either light or dark. Geneticists tell us that the genes that control hair and eye color (there are a lot of them) are close to each other on a particular chromosome. Based on this information (and what we observe every day in the people around us) we have reason to believe that there is a relationship between the hair and eye color of human beings: specifically, that people with light hair tend to have light eyes (and vice versa) and that people with dark hair tend to have dark eyes. If we know whether a person has light or dark hair, we should be able to predict, at least to some degree, whether they have light or dark eyes. We use SPSS to produce a contingency table for these two variables and test the null hypothesis that the two variables are independent. If we can reject this null hypothesis, we can use the Φ statistic to determine the strength of the relationship between these two variables. The results of this analysis are shown below.

The first table of the output (Hair color * Eye color Crosstabulation) shows the frequencies for each of the cells of the contingency table along with the row and column marginals (labeled "Total"). In the second table (Chi-Square Tests), we find the calculated value of the Pearson chi-square with one degree of freedom; this table also reports the Continuity Correction, the Likelihood Ratio, Fisher's Exact Test, and the Linear-by-Linear Association, with the footnotes "Computed only for a 2x2 table" and "1 cells (25.0%) have expected count less than 5." The column labeled "Asymp. Sig." tells us that the chance that the null
hypothesis (the variables are independent) is true is only 0.5% (.005). A third table (Symmetric Measures) reports Phi and Cramér's V for the 67 valid cases. Whether we chose to test this null hypothesis at the .05 or the .01 level, we would reject it, since there is less than a 1% chance that the null is true, and we would conclude that hair and eye color are related to each other in the population from which this sample

was drawn. Since we can now assume there is a relationship between these two variables, we can go on to measure its strength. The third table shows us that Cramér's V statistic has a value of .344, indicating a moderately weak relationship between hair and eye color in the population from which this sample was drawn.
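The individual cell counts in the crosstabulation did not survive transcription, but the reported statistics can still be cross-checked against each other. For a 2 × 2 table Φ = V = √(χ²/N), so the reported V of .344 with N = 67 implies a Pearson χ² of roughly 7.93 (an inferred figure, not one printed in the chapter):

```python
# Back out the chi-square value implied by the reported Cramer's V.
# For a 2 x 2 table, V = phi = sqrt(chi_square / N), so chi_square = V**2 * N.
v = 0.344   # Cramer's V from the Symmetric Measures table
n = 67      # number of valid cases

implied_chi_square = v ** 2 * n
print(round(implied_chi_square, 2))   # 7.93
```

Quick consistency checks like this are a useful habit when reading statistical output, since they catch transcription and data-entry errors early.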

Review Questions and Exercises

1. An investigator is trying to determine if there is a relationship between the marital satisfaction of husbands and that of wives. He identifies a group of married couples and asks each individual independently how happy he or she is in the current marriage. The data (number of respondents in each category) are reported in the following table.

Wife's level of satisfaction (rows: Low, Medium, High) by husband's level of satisfaction (columns: Low, Medium, High). (The cell frequencies did not survive transcription.)

a) Is there a statistically significant relationship between husbands' and wives' satisfaction with the marriage? Please state the null and alternative hypotheses, and test at alpha = .05.

b) If there is a statistically significant relationship, what is its strength? Please calculate an indicator of effect size.

c) In this sample, what is the probability of having a married couple with the wife being unhappy (low satisfaction) and the husband being very happy (high satisfaction) with their marriage?

2. A sociologist is interested in the number of children in a population of 1,49 families that are being investigated. She hypothesizes the following distribution of family size:

No children: 6%
1 child: 16%
2 children: 25%
3 children: 15%
4 children: 9%
5 children: 4%
6 children: 2%
7 children: 2%
8 or more children: 1%

Test the hypothesis that this is the distribution in the population from which the sample was drawn, at the .05 level of significance. Write a paragraph describing your conclusions.

3. As part of an action research project, a school principal administered a standardized test of critical thinking skills to her third-grade classes at the beginning and at the end of a school year. She finds that the mean of all of her third graders' (n = 13) test scores at the end of the year was the same as their mean at the beginning of the year. But, unexpectedly, she also finds that the variance of the scores was not the same.
At the beginning of the year the variance was 9 (s² = 9), while at the end of the year the variance was 20 (s² = 20).

a. Can she conclude that the variance of the scores actually increased? State the null and alternative hypotheses answering this question, and test with alpha = .05.

b. Write a short report to summarize these results, and make a clear statement regarding changes in critical thinking during the third grade.

Chapter 9. Two-Sample Tests. Effect Sizes and Power Paired t Test Calculation Chapter 9 Two-Sample Tests Paired t Test (Correlated Groups t Test) Effect Sizes and Power Paired t Test Calculation Summary Independent t Test Chapter 9 Homework Power and Two-Sample Tests: Paired Versus

More information

StatCrunch and Nonparametric Statistics

StatCrunch and Nonparametric Statistics StatCrunch and Nonparametric Statistics You can use StatCrunch to calculate the values of nonparametric statistics. It may not be obvious how to enter the data in StatCrunch for various data sets that

More information

In the past, the increase in the price of gasoline could be attributed to major national or global

In the past, the increase in the price of gasoline could be attributed to major national or global Chapter 7 Testing Hypotheses Chapter Learning Objectives Understanding the assumptions of statistical hypothesis testing Defining and applying the components in hypothesis testing: the research and null

More information

2 GENETIC DATA ANALYSIS

2 GENETIC DATA ANALYSIS 2.1 Strategies for learning genetics 2 GENETIC DATA ANALYSIS We will begin this lecture by discussing some strategies for learning genetics. Genetics is different from most other biology courses you have

More information

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1) Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

More information

SPSS Explore procedure

SPSS Explore procedure SPSS Explore procedure One useful function in SPSS is the Explore procedure, which will produce histograms, boxplots, stem-and-leaf plots and extensive descriptive statistics. To run the Explore procedure,

More information

Unit 26 Estimation with Confidence Intervals

Unit 26 Estimation with Confidence Intervals Unit 26 Estimation with Confidence Intervals Objectives: To see how confidence intervals are used to estimate a population proportion, a population mean, a difference in population proportions, or a difference

More information

Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217

Part 3. Comparing Groups. Chapter 7 Comparing Paired Groups 189. Chapter 8 Comparing Two Independent Groups 217 Part 3 Comparing Groups Chapter 7 Comparing Paired Groups 189 Chapter 8 Comparing Two Independent Groups 217 Chapter 9 Comparing More Than Two Groups 257 188 Elementary Statistics Using SAS Chapter 7 Comparing

More information

Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

More information

9. Sampling Distributions

9. Sampling Distributions 9. Sampling Distributions Prerequisites none A. Introduction B. Sampling Distribution of the Mean C. Sampling Distribution of Difference Between Means D. Sampling Distribution of Pearson's r E. Sampling

More information

Two Correlated Proportions (McNemar Test)

Two Correlated Proportions (McNemar Test) Chapter 50 Two Correlated Proportions (Mcemar Test) Introduction This procedure computes confidence intervals and hypothesis tests for the comparison of the marginal frequencies of two factors (each with

More information

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

More information

Linear Models in STATA and ANOVA

Linear Models in STATA and ANOVA Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples

More information

Two-Sample T-Tests Assuming Equal Variance (Enter Means)

Two-Sample T-Tests Assuming Equal Variance (Enter Means) Chapter 4 Two-Sample T-Tests Assuming Equal Variance (Enter Means) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when the variances of

More information

Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

More information

Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS

Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS Overview of Non-Parametric Statistics PRESENTER: ELAINE EISENBEISZ OWNER AND PRINCIPAL, OMEGA STATISTICS About Omega Statistics Private practice consultancy based in Southern California, Medical and Clinical

More information

Kenken For Teachers. Tom Davis tomrdavis@earthlink.net http://www.geometer.org/mathcircles June 27, 2010. Abstract

Kenken For Teachers. Tom Davis tomrdavis@earthlink.net http://www.geometer.org/mathcircles June 27, 2010. Abstract Kenken For Teachers Tom Davis tomrdavis@earthlink.net http://www.geometer.org/mathcircles June 7, 00 Abstract Kenken is a puzzle whose solution requires a combination of logic and simple arithmetic skills.

More information

Non-Inferiority Tests for Two Proportions

Non-Inferiority Tests for Two Proportions Chapter 0 Non-Inferiority Tests for Two Proportions Introduction This module provides power analysis and sample size calculation for non-inferiority and superiority tests in twosample designs in which

More information

Chi Square Distribution

Chi Square Distribution 17. Chi Square A. Chi Square Distribution B. One-Way Tables C. Contingency Tables D. Exercises Chi Square is a distribution that has proven to be particularly useful in statistics. The first section describes

More information

Nonparametric Tests. Chi-Square Test for Independence

Nonparametric Tests. Chi-Square Test for Independence DDBA 8438: Nonparametric Statistics: The Chi-Square Test Video Podcast Transcript JENNIFER ANN MORROW: Welcome to "Nonparametric Statistics: The Chi-Square Test." My name is Dr. Jennifer Ann Morrow. In

More information

Odds ratio, Odds ratio test for independence, chi-squared statistic.

Odds ratio, Odds ratio test for independence, chi-squared statistic. Odds ratio, Odds ratio test for independence, chi-squared statistic. Announcements: Assignment 5 is live on webpage. Due Wed Aug 1 at 4:30pm. (9 days, 1 hour, 58.5 minutes ) Final exam is Aug 9. Review

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 06 Introduction This procedure provides several reports for the comparison of two distributions, including confidence intervals for the difference in means, two-sample t-tests, the z-test, the

More information

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

More information

Comparing Means in Two Populations

Comparing Means in Two Populations Comparing Means in Two Populations Overview The previous section discussed hypothesis testing when sampling from a single population (either a single mean or two means from the same population). Now we

More information

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk

Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Analysing Questionnaires using Minitab (for SPSS queries contact -) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:

More information

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing.

Introduction to. Hypothesis Testing CHAPTER LEARNING OBJECTIVES. 1 Identify the four steps of hypothesis testing. Introduction to Hypothesis Testing CHAPTER 8 LEARNING OBJECTIVES After reading this chapter, you should be able to: 1 Identify the four steps of hypothesis testing. 2 Define null hypothesis, alternative

More information

Normality Testing in Excel

Normality Testing in Excel Normality Testing in Excel By Mark Harmon Copyright 2011 Mark Harmon No part of this publication may be reproduced or distributed without the express permission of the author. mark@excelmasterseries.com

More information

Module 3: Correlation and Covariance

Module 3: Correlation and Covariance Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

More information

Statistics. Measurement. Scales of Measurement 7/18/2012

Statistics. Measurement. Scales of Measurement 7/18/2012 Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does

More information

Hypothesis Testing for Beginners

Hypothesis Testing for Beginners Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes

More information

Topic 8. Chi Square Tests

Topic 8. Chi Square Tests BE540W Chi Square Tests Page 1 of 5 Topic 8 Chi Square Tests Topics 1. Introduction to Contingency Tables. Introduction to the Contingency Table Hypothesis Test of No Association.. 3. The Chi Square Test

More information

Projects Involving Statistics (& SPSS)

Projects Involving Statistics (& SPSS) Projects Involving Statistics (& SPSS) Academic Skills Advice Starting a project which involves using statistics can feel confusing as there seems to be many different things you can do (charts, graphs,

More information

UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE TWO-WAY ANOVA UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

More information

Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation

Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation Parkland College A with Honors Projects Honors Program 2014 Calculating P-Values Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating P-Values" (2014). A with Honors Projects.

More information

6.3 Conditional Probability and Independence

6.3 Conditional Probability and Independence 222 CHAPTER 6. PROBABILITY 6.3 Conditional Probability and Independence Conditional Probability Two cubical dice each have a triangle painted on one side, a circle painted on two sides and a square painted

More information

EPS 625 INTERMEDIATE STATISTICS FRIEDMAN TEST

EPS 625 INTERMEDIATE STATISTICS FRIEDMAN TEST EPS 625 INTERMEDIATE STATISTICS The Friedman test is an extension of the Wilcoxon test. The Wilcoxon test can be applied to repeated-measures data if participants are assessed on two occasions or conditions

More information

Tests for Two Proportions

Tests for Two Proportions Chapter 200 Tests for Two Proportions Introduction This module computes power and sample size for hypothesis tests of the difference, ratio, or odds ratio of two independent proportions. The test statistics

More information

This chapter discusses some of the basic concepts in inferential statistics.

This chapter discusses some of the basic concepts in inferential statistics. Research Skills for Psychology Majors: Everything You Need to Know to Get Started Inferential Statistics: Basic Concepts This chapter discusses some of the basic concepts in inferential statistics. Details

More information

Solutions to Homework 10 Statistics 302 Professor Larget

Solutions to Homework 10 Statistics 302 Professor Larget s to Homework 10 Statistics 302 Professor Larget Textbook Exercises 7.14 Rock-Paper-Scissors (Graded for Accurateness) In Data 6.1 on page 367 we see a table, reproduced in the table below that shows the

More information

PRACTICE PROBLEMS - PEDIGREES AND PROBABILITIES

PRACTICE PROBLEMS - PEDIGREES AND PROBABILITIES PRACTICE PROBLEMS - PEDIGREES AND PROBABILITIES 1. Margaret has just learned that she has adult polycystic kidney disease. Her mother also has the disease, as did her maternal grandfather and his younger

More information

Multivariate Analysis of Variance. The general purpose of multivariate analysis of variance (MANOVA) is to determine

Multivariate Analysis of Variance. The general purpose of multivariate analysis of variance (MANOVA) is to determine 2 - Manova 4.3.05 25 Multivariate Analysis of Variance What Multivariate Analysis of Variance is The general purpose of multivariate analysis of variance (MANOVA) is to determine whether multiple levels

More information

Session 7 Fractions and Decimals

Session 7 Fractions and Decimals Key Terms in This Session Session 7 Fractions and Decimals Previously Introduced prime number rational numbers New in This Session period repeating decimal terminating decimal Introduction In this session,

More information

Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures

Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone:

More information

Introduction to the Smith Chart for the MSA Sam Wetterlin 10/12/09 Z +

Introduction to the Smith Chart for the MSA Sam Wetterlin 10/12/09 Z + Introduction to the Smith Chart for the MSA Sam Wetterlin 10/12/09 Quick Review of Reflection Coefficient The Smith chart is a method of graphing reflection coefficients and impedance, and is often useful

More information

Testing differences in proportions

Testing differences in proportions Testing differences in proportions Murray J Fisher RN, ITU Cert., DipAppSc, BHSc, MHPEd, PhD Senior Lecturer and Director Preregistration Programs Sydney Nursing School (MO2) University of Sydney NSW 2006

More information

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. 277 CHAPTER VI COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. This chapter contains a full discussion of customer loyalty comparisons between private and public insurance companies

More information

How To Check For Differences In The One Way Anova

How To Check For Differences In The One Way Anova MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

More information

Introduction to Hypothesis Testing

Introduction to Hypothesis Testing I. Terms, Concepts. Introduction to Hypothesis Testing A. In general, we do not know the true value of population parameters - they must be estimated. However, we do have hypotheses about what the true

More information

The Kruskal-Wallis test:

The Kruskal-Wallis test: Graham Hole Research Skills Kruskal-Wallis handout, version 1.0, page 1 The Kruskal-Wallis test: This test is appropriate for use under the following circumstances: (a) you have three or more conditions

More information

Mind on Statistics. Chapter 15

Mind on Statistics. Chapter 15 Mind on Statistics Chapter 15 Section 15.1 1. A student survey was done to study the relationship between class standing (freshman, sophomore, junior, or senior) and major subject (English, Biology, French,

More information

p ˆ (sample mean and sample

p ˆ (sample mean and sample Chapter 6: Confidence Intervals and Hypothesis Testing When analyzing data, we can t just accept the sample mean or sample proportion as the official mean or proportion. When we estimate the statistics

More information