Probability and the Chi-Square Test
written by J. D. Hendrix

Learning Objectives

Upon completing the exercise, each student should be able:
- to determine the chance that a given state will occur in a system that consists of a finite number of equivalent states;
- to determine the chance of two or more independent events occurring simultaneously by using the product rule;
- to determine the chance that either one or the other of two mutually exclusive events will occur by using the sum rule;
- to test hypotheses based on expected frequencies using the chi-square test.

Background

Many genetic events are random processes. These include the segregation and assortment of genes during meiosis, the combination of gametes during fertilization, and crossover between homologous chromosomes. Scientists describe random natural processes using the mathematical tools of probability and statistics.

A. The Chance that an Event will Occur

Consider a system that consists of a finite number of equivalent states. The chance that a given state will occur is given by the equation

$$C = \frac{a}{t}$$

in which C is the chance (probability) of the state, a is the number of times the state is represented in the system, and t is the total number of equivalent states in the system.

For example, we can develop a mathematical model to describe a coin toss. We assume that a coin toss is a system with two equivalent states, heads-up and tails-up. We describe each state with a probability.

Chance of heads = (number of heads) / (total sides on the coin) = 1/2 = 0.5 = 50%
Chance of tails = (number of tails) / (total sides on the coin) = 1/2 = 0.5 = 50%

We can use probabilities to predict the frequency of an event, or how often an event will occur.
In an experiment with 50 coin tosses:

Expected number of heads = 50 × 0.5 = 25
Expected number of tails = 50 × 0.5 = 25

We test the validity of the hypothesis (that there is an equal chance of getting heads or tails) by comparing the observed number of heads and tails in a coin-toss experiment with the expected values calculated from the probabilities. If the original assumptions in the hypothesis are not valid (for example, if the coin is heavier on one side, or if it is deformed in some way),
then there could be a significant difference between the observed and expected values.

It is customary to express the expected outcome of an experiment involving frequencies as a ratio. In the coin toss experiment, we expect a heads:tails ratio of 1:1.

Imagine a standard deck of 52 playing cards, randomly shuffled. What is the chance of drawing an ace of hearts (A♥) from the deck?

Chance of A♥ = (number of A♥) / (total number of cards) = 1/52 ≈ 0.0192

What is the chance of drawing any ace?

Chance of any ace = (number of aces) / (total number of cards) = 4/52 ≈ 0.0769

Imagine that an ace of hearts was drawn from a standard deck of 52 cards and returned to the deck. Then, the deck was reshuffled and another card drawn. What is the chance that the card will be an ace of hearts?

Chance of A♥ = (number of A♥) / (total number of cards) = 1/52 ≈ 0.0192

Notice that, since the card was returned to the deck, the total number of cards and the chance remain the same.

Imagine that an ace of hearts was drawn from a standard deck of 52 cards and discarded. Then, another card was drawn. What is the probability that the card will be an ace of hearts?

Chance of A♥ = (number of A♥) / (total number of cards) = 0/51 = 0

Notice that the number of cards and the chance have changed. Since there is no longer an ace of hearts in the deck, the probability of drawing an ace of hearts is zero. What is the chance of drawing one of the three remaining aces?

Chance of drawing one of the three remaining aces = 3/51 ≈ 0.0588
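The card calculations above can be checked with a short Python sketch. It is only an illustration of C = a/t; the helper name `chance` and the variable names are assumptions made for this example and are not part of the exercise.

```python
from fractions import Fraction


def chance(a, t):
    """Return C = a / t as an exact fraction (hypothetical helper)."""
    return Fraction(a, t)


# Full, shuffled deck of 52 cards.
ace_of_hearts = chance(1, 52)
any_ace = chance(4, 52)

# After the ace of hearts is drawn and discarded, 51 cards remain.
ace_of_hearts_after_discard = chance(0, 51)
remaining_aces = chance(3, 51)

results = [
    ("Ace of hearts, full deck", ace_of_hearts),
    ("Any ace, full deck", any_ace),
    ("Ace of hearts, after discard", ace_of_hearts_after_discard),
    ("One of three remaining aces", remaining_aces),
]
for label, c in results:
    # Fractions print in lowest terms, so 4/52 appears as 1/13.
    print(f"{label}: {c} = {float(c):.4f}")
```

Using exact fractions avoids rounding until the final decimal is printed, which makes it easy to compare against the hand calculations.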
B. The Chance of Independent Events Occurring Together

The chance of two or more independent events occurring together is the product of their individual probabilities. An example is the simultaneous tossing of two coins. The outcome of the toss on one coin should not affect the outcome on the second coin (unless the coins are glued together). Therefore, the events are independent of each other.

| Outcome on Coin #1 | Outcome on Coin #2 | Probability on Coin #1 | Probability on Coin #2 | Probability of both events occurring |
|---|---|---|---|---|
| Heads | Heads | 0.5 | 0.5 | 0.5 × 0.5 = 0.25 |
| Heads | Tails | 0.5 | 0.5 | 0.5 × 0.5 = 0.25 |
| Tails | Heads | 0.5 | 0.5 | 0.5 × 0.5 = 0.25 |
| Tails | Tails | 0.5 | 0.5 | 0.5 × 0.5 = 0.25 |

What is the probability of rolling a pair of sixes on a standard set of dice? As you probably know, a die is a game cube with six sides, each side marked with one to six dots. Assuming that the mass of the cube is evenly distributed, the chance of rolling any one of the numbers is 1/6. Therefore, the probability of rolling a pair of sixes on a pair of dice is

1/6 × 1/6 = 1/36

C. Mutually Exclusive Events (Either/Or Situations)

The chance that either one or the other of two mutually exclusive events will occur is the sum of their individual probabilities. For example, consider a box containing two red beads, three white beads, and four blue beads. If one bead is randomly chosen, what is the chance that the bead will be either red or white?

Chance of a red bead = 2/9
Chance of a white bead = 3/9
Chance of either red or white = 2/9 + 3/9 = 5/9
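As a minimal sketch, the product rule and the sum rule can be written as two small Python functions and applied to the examples above. The function names `product_rule` and `sum_rule` are hypothetical, chosen only for this illustration. The sketch assumes the events passed to `product_rule` are independent and the events passed to `sum_rule` are mutually exclusive.

```python
from fractions import Fraction
from itertools import product


def product_rule(*probabilities):
    """Chance that all of several independent events occur together."""
    result = Fraction(1)
    for p in probabilities:
        result *= p
    return result


def sum_rule(*probabilities):
    """Chance that one of several mutually exclusive events occurs."""
    return sum(probabilities, Fraction(0))


# Two coins tossed together: every combination has chance 1/2 x 1/2.
half = Fraction(1, 2)
two_coin_outcomes = {
    (first, second): product_rule(half, half)
    for first, second in product(("Heads", "Tails"), repeat=2)
}

# A pair of sixes on two dice.
pair_of_sixes = product_rule(Fraction(1, 6), Fraction(1, 6))

# Box with 2 red, 3 white, and 4 blue beads: red OR white.
total_beads = 2 + 3 + 4
red_or_white = sum_rule(Fraction(2, total_beads), Fraction(3, total_beads))

for outcome, p in two_coin_outcomes.items():
    print(outcome, p)
print("Pair of sixes:", pair_of_sixes)
print("Red or white bead:", red_or_white)
```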
D. Hypothesis Testing Using the Chi-Square Test

Let's develop a formal hypothesis for the coin toss experiment.

Hypothesis: If the mass of a coin is symmetrically distributed on both sides of the coin, then there is an equal probability of a coin toss resulting in heads or tails.

From this hypothesis we can make the following prediction.

Prediction: If a specific coin is tossed 50 times, then 25 of the tosses will result in heads and 25 of the tosses will result in tails.

The prediction can be tested by performing the following experiment.

Experiment: Toss the coin 50 times and count the number of heads and tails.
Independent variable: Number of times the coin is tossed.
Dependent variable: Number of heads or tails.

The observed results in an experiment are almost never exactly equal to the expected results. For example, in the coin toss experiment one expects 25 heads and 25 tails if a coin is tossed 50 times. However, what if the result is 27 heads and 23 tails? Is this a significant difference between the expected and the observed results, or can we attribute the difference to random chance? It seems to make sense that a result of 27 heads, 23 tails is reasonable, but how can we be sure? If we repeated the experiment 100 times, how often would we expect to see this much deviation from the expected value (25:25)?

The chi-square (χ²) test is a statistical test used to determine whether the difference between an expected result and an observed result is significant or whether the difference can be attributed to random chance. To analyze experimental data using the χ² test, the data must consist of a finite number of mutually exclusive outcomes or classes. Also, we must know the probability of each class in order to calculate the expected values.

The degrees of freedom in an experiment is the total number of classes minus one: df = k - 1, where k is the number of classes. In the coin experiment, there are two outcomes or classes of results, heads and tails.
Therefore, there is one degree of freedom.

The value of χ² is given by the equation

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed number of items in a given class, E is the expected number of items in the class, and the summation sign (Σ) indicates the sum of all values of (O - E)²/E for every class in the system.

Consider the following results of the coin toss experiment.
| Toss | # obtained (O) | # expected (E) | O - E | (O - E)² | (O - E)²/E |
|---|---|---|---|---|---|
| Heads | 27 | 50 × 0.5 = 25 | 2 | 4 | 0.160 |
| Tails | 23 | 50 × 0.5 = 25 | -2 | 4 | 0.160 |
| Total | 50 | | | | χ² = 0.320 |

Is the difference between O and E significant? If so, then we reject the hypothesis. If not, then we fail to reject the hypothesis. We evaluate the difference from a table of χ² values, such as the one shown below.

P value = probability that a deviation at least this large arises by chance alone when the hypothesis is true

| df | 0.95 | 0.80 | 0.50 | 0.20 | 0.10 | 0.05 | 0.01 |
|---|---|---|---|---|---|---|---|
| 1 | 0.00393 | 0.0642 | 0.455 | 1.642 | 2.706 | 3.841 | 6.635 |
| 2 | 0.103 | 0.446 | 1.386 | 3.219 | 4.605 | 5.991 | 9.210 |
| 3 | 0.352 | 1.005 | 2.366 | 4.642 | 6.251 | 7.815 | 11.341 |
| 4 | 0.711 | 1.649 | 3.357 | 5.989 | 7.779 | 9.488 | 13.277 |
| 5 | 1.145 | 2.343 | 4.351 | 7.289 | 9.236 | 11.070 | 15.086 |
| 6 | 1.635 | 3.070 | 5.348 | 8.558 | 10.645 | 12.592 | 16.812 |
| 7 | 2.167 | 3.822 | 6.346 | 9.803 | 12.017 | 14.067 | 18.475 |
| 8 | 2.733 | 4.594 | 7.344 | 11.030 | 13.362 | 15.507 | 20.090 |
| 9 | 3.325 | 5.380 | 8.343 | 12.242 | 14.684 | 16.919 | 21.666 |
| 10 | 3.940 | 6.179 | 9.342 | 13.442 | 15.987 | 18.307 | 23.209 |

Locate the value of χ² in the row corresponding to the appropriate df value. In this example, the value of χ² = 0.320, and the value of df = 2 - 1 = 1. Therefore, the χ² value is between 0.0642 and 0.455.

0.0642 < χ² < 0.455

The probability, P, of obtaining a deviation this large by random chance is read from the top row of the table.

0.80 > P > 0.50

How do we interpret this? In most genetics work, deviations are considered significant only if the probability value from the χ² table is 0.05 (5%) or less. This is called a 5% level of significance. If the probability is 0.05 or less, a deviation this large would arise by chance in no more than 5% of experiments if the hypothesis were true, so the deviation is considered significant and the hypothesis is rejected. If the probability is greater than 0.05, then we cannot reject the hypothesis based on the data. In our example, P is greater than 0.05, so the hypothesis is not rejected.

Whew! That's a lot of words.
To put it simply:

- If the P value from the χ² table is 0.05 or less, then the deviation of the observed values from the expected values is significant and the data do not support the hypothesis.
- If the P value from the χ² table is greater than 0.05, then the deviation of the observed values from the expected values is not significant and the data are consistent with the hypothesis (we fail to reject it).
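The calculation and the table lookup can be sketched in Python for the 27 heads : 23 tails result. This is a minimal illustration, not a general statistics routine: the names `chi_square`, `DF1_ROW`, and `bracket_p` are hypothetical, and only the df = 1 row of the χ² table is included.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over every class."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# df = 1 row of the chi-square table, as (P value, chi-square value) pairs,
# ordered from the largest P value to the smallest.
DF1_ROW = [
    (0.95, 0.00393), (0.80, 0.0642), (0.50, 0.455), (0.20, 1.642),
    (0.10, 2.706), (0.05, 3.841), (0.01, 6.635),
]


def bracket_p(chi2, row):
    """Return (lower, upper) bounds on P read from one table row.

    None as a bound means the value lies beyond that end of the table.
    """
    upper = None
    for p, table_value in row:
        if chi2 < table_value:
            return (p, upper)
        upper = p
    return (None, upper)


observed = [27, 23]  # heads, tails
expected = [25, 25]  # 50 tosses x 0.5 for each class
chi2 = chi_square(observed, expected)
lower, upper = bracket_p(chi2, DF1_ROW)

# The deviation is significant at the 5% level when chi2 reaches the
# value tabulated under P = 0.05 (3.841 for df = 1).
significant = chi2 >= 3.841

print(f"chi-square = {chi2:.3f}")
print(f"P is between {lower} and {upper}")
print("significant at the 5% level:", significant)
```

The lookup only brackets P between neighbouring table columns, which is all the printed table allows; software that computes the χ² distribution directly would report a single P value instead.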
In this example, the P value is between 0.50 and 0.80. This means that, if the hypothesis is true, a deviation at least this large would be expected by chance in between 50% and 80% of such experiments. Since the P value is greater than 0.05, the deviation is not significant at the 5% level, and the data are consistent with the hypothesis.

Consider the results of another coin toss experiment, using a different coin.

| Toss | # obtained (O) | # expected (E) | O - E | (O - E)² | (O - E)²/E |
|---|---|---|---|---|---|
| Heads | 13 | 25 | -12 | 144 | 5.76 |
| Tails | 37 | 25 | 12 | 144 | 5.76 |
| Total | 50 | | | | χ² = 11.52 |

As before, df = 2 - 1 = 1. At df = 1, it looks like the χ² value we calculated is off the chart! This simply means that the deviation is larger than the largest value recorded in the chart. The P value must therefore be smaller than 0.01 (and so smaller than 0.05). Hence:

χ² > 6.635
P < 0.01

Since P < 0.05, the deviation of the observed values from the expected values in this coin toss is significant and the data do not support the hypothesis.

Can you suggest a reason why the coin toss experiment failed to support the expected 1:1 ratio in this case? (Here's a hint: read the first sentence of the hypothesis for an important assumption that led us to the 1:1 ratio.)
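Section D asks how often a fair coin would deviate from 25:25 by as much as the observed result. For a coin toss this can also be answered directly, without the χ² table, by counting outcomes with the binomial coefficients. The sketch below uses that exact binomial calculation, which is a different technique from the χ² approximation in the text, so its results only roughly match the table. The function name `prob_deviation_at_least` is hypothetical.

```python
from math import comb


def prob_deviation_at_least(n, heads_observed):
    """Chance that a fair coin tossed n times gives a number of heads
    at least as far from n/2 as heads_observed (in either direction)."""
    observed_deviation = abs(heads_observed - n / 2)
    # Count the head/tail sequences whose number of heads is that far off.
    favourable = sum(
        comb(n, k)
        for k in range(n + 1)
        if abs(k - n / 2) >= observed_deviation
    )
    return favourable / 2 ** n  # 2**n equally likely sequences in total


p_first_coin = prob_deviation_at_least(50, 27)   # 27 heads : 23 tails
p_second_coin = prob_deviation_at_least(50, 13)  # 13 heads : 37 tails

print(f"27:23 or more extreme: {p_first_coin:.3f}")
print(f"13:37 or more extreme: {p_second_coin:.5f}")
```

For the first coin the result falls inside the 0.50 to 0.80 range read from the table, and for the second coin it falls below 0.01, so both approaches lead to the same conclusions here.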
Probability and the Chi-Square Test
Laboratory Report Sheet

Name:
Lab Partners:

1. A standard deck of 52 playing cards has 13 cards of each suit (hearts, spades, diamonds, or clubs). What is the probability of drawing a diamond?

2. If two coins are tossed, what is the probability that one coin will be heads and the other coin will be tails, with either of the two coins being heads? To solve this problem, start with the information given in section B on the product rule, then apply the sum rule to solve for an either/or situation.

3. What is the probability of rolling a seven in any combination on a pair of dice? To solve this problem, you will need to use a combination of the product rule and the sum rule. Try completing this table. Remember that each roll is a mutually exclusive event (that is, if you roll a 1 and a 6, you can't roll a 2 and a 5 at the same time).

| Roll on Die 1 | Roll on Die 2 | Probability of this Roll |
|---|---|---|
| 1 | 6 | 1/6 × 1/6 = 1/36 |
| 2 | | |

Probability of rolling a seven in any combination:
4. In corn, the genes for seed color (purple or yellow) and seed shape (smooth or wrinkled) assort independently of each other. This means that the expected frequencies (probabilities) of these traits in a cross can be treated as independent events occurring simultaneously, so the product rule applies.

If hybrid purple corn is self-fertilized, the following offspring are expected: ¾ purple, ¼ yellow.
If hybrid smooth corn is self-fertilized, the following offspring are expected: ¾ smooth, ¼ wrinkled.

Here's the question: If corn that is both hybrid purple and hybrid smooth is self-fertilized, what results do you expect? Use the product rule to figure out what fraction of the kernels you expect to be purple smooth, purple wrinkled, yellow smooth, and yellow wrinkled.

5. You will be provided a 6-sided die (game cube).
(a) Write a formal hypothesis, prediction, experiment, and variables about the probabilities of tossing numbers on the die.
(b) Perform your experiment. You should have a sufficient sample size (i.e., several hundred rolls) for a valid statistical sample.
(c) Use the χ² test to determine if your data support your hypothesis. Show your work.
(d) Write a brief conclusion summarizing your results. If the data do not support your hypothesis, you should suggest reasons in your conclusion.