Appendix A The Chi-Square (χ²) Test in Genetics

With infinitely large sample sizes, the ideal result of any particular genetic cross is exact conformity to the expected ratio. For example, a cross between two heterozygotes should produce an exact 3:1 ratio of dominant to recessive phenotypes. In any particular real-world experiment, with limited and sometimes very small sample sizes, results are expected to deviate somewhat from the exact theoretical ratio, due simply to chance. In order to evaluate a genetic hypothesis (for example, that a particular trait is due to a recessive allele segregating at a locus), we need a means to distinguish an experimental result that is consistent with the hypothesis within the bounds of simple chance deviations from one that makes the hypothesis intrinsically unlikely ("wrong"), given the data.

Statistical tests are a means of quantifying the results of an experiment as evidence for or against a particular hypothesis. One of the simplest statistical tests is Chi-Square (χ²) Analysis, which compares the "goodness of fit" between observed and expected counts. A hypothesis is developed that predicts how a set of observations will fall into each of two or more categories (the expected result). These counts are compared with the experimental data (the observed result). Allowing for the sample size, the differences between the observed and expected results are reduced to a single number, the chi-square value. Because larger deviations from expectation are expected with more categories, the test also takes into account the degrees of freedom in the experiment. Comparison with a table of probability values shows the probability that a deviation as large as the one observed could have been obtained by chance alone.

[See the website for Bio2900 (Principles of Evolution) for further discussion of the concepts of null hypothesis, significance testing, and Type I & II error: http://www.mun.ca/biology/scarr/2900_hypothesis_testing.htm].
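The effect of sample size on chance deviations from a 3:1 ratio can be illustrated with a small simulation. This is only a sketch: the function name simulate_cross, the sample sizes, and the random seed are assumptions chosen for illustration, and each offspring is simply assigned the dominant phenotype with probability 0.75.

```python
import random


def simulate_cross(n_offspring, p_dominant=0.75, seed=None):
    """Simulate one heterozygote x heterozygote cross.

    Each offspring independently shows the dominant phenotype with
    probability p_dominant (0.75 for an expected 3:1 ratio).
    Returns the counts (dominant, recessive).
    """
    rng = random.Random(seed)
    dominant = sum(rng.random() < p_dominant for _ in range(n_offspring))
    return dominant, n_offspring - dominant


# Hypothetical sample sizes, from a small cross to a very large one.
for n in (20, 200, 20000):
    dominant, recessive = simulate_cross(n, seed=1)
    fraction = dominant / n
    print(f"n = {n:6d}: {dominant} dominant, {recessive} recessive "
          f"(fraction dominant = {fraction:.3f}, expected 0.750)")
```

In repeated runs with different seeds, the fraction of dominant offspring tends to lie closer to 0.75 as the sample size grows, while small crosses can depart noticeably from 3:1 by chance alone.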
CHI-SQUARE CALCULATIONS

The chi-square formula is:

    χ² = Σ [(O − E)² / E]

where
    O = observed # of individuals with phenotype
    E = expected # of individuals with phenotype
    Σ = sum over all phenotypes

i) The chi-square test should be used only on the numerical data themselves, not on ratios or percentages derived from the data.
ii) In experiments where the expected frequency in any phenotypic class is less than five, the true probability is usually slightly larger than the p given in the table.
iii) Expected values should be adjusted to the closest integer [you would not expect a fraction of an individual]; the sum totals of observed and expected values should not differ by more than one individual.
iv) These calculations neglect a number of corrections, including those for small expected classes, multiple simultaneous tests, and the one-tail or two-tail nature of the test. These will be discussed in your statistics course.
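The calculation can be sketched in Python as follows. The counts used here (290 dominant and 110 recessive offspring) are hypothetical, chosen only to illustrate a test against a 3:1 ratio, and the function names chi_square and expected_counts are assumptions made for this example.

```python
def chi_square(observed, expected):
    """Return the sum of (O - E)^2 / E over all phenotypic classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of classes")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_counts(total, ratio):
    """Split a total count according to a ratio such as (3, 1)."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]


# Hypothetical data: raw counts, not ratios or percentages.
observed = [290, 110]                               # dominant, recessive
expected = expected_counts(sum(observed), (3, 1))   # [300.0, 100.0]

chi2 = chi_square(observed, expected)
df = len(observed) - 1                              # df = n - 1 classes

print(f"chi-square = {chi2:.2f}, df = {df}")        # chi-square = 1.33, df = 1
```

With these hypothetical counts the chi-square value is about 1.33 with df = 1, which is below the critical value of 3.84 for p = 0.05, so the 3:1 hypothesis would not be rejected.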

INTERPRETATION OF THE CHI-SQUARE TABLE

The table of chi-square values (below) can be used to determine the probability that any particular result (observed deviation) could have been obtained by chance alone.

i) Horizontal rows indicate the number of degrees of freedom (df). In general, df = (n - 1), where n is the number of observed classes or phenotypes. Within each row, each entry is the chi-square value that would be equalled or exceeded by chance alone with the probability p shown at the top of its column. This entry is called the critical value for that level of probability.
ii) To use the table, enter it on the row corresponding to the number of degrees of freedom. Look for the column with the critical value closest to and less than the chi-square obtained in your calculations. The probability of obtaining your result by chance is less than the p value for that column.
iii) For a biological experiment, we typically set the level of significance at p = 0.05. If the observed chi-square is greater than the critical value for p = 0.05, we conclude that the result could have been obtained by chance less than 5% of the time, and we reject the null hypothesis that chance alone is responsible for the result. We say that this result is statistically significant.

Example #1: χ² = 15.0 and n = 8 phenotypes. The observed chi-square of 15.0 with df = 7 is greater than the critical value of 14.07 for p = 0.05. This means that the observed deviation represented by this chi-square value would be expected to occur by chance less than 5% of the time. The result is statistically significant, and the null hypothesis (that the data are consistent with the expected ratio) can be rejected.

Example #2: χ² = 5.0 and n = 8 phenotypes. The observed chi-square of 5.0 with df = 7 lies between the values 2.83 and 6.35, which correspond to 0.90 > p > 0.50. A deviation as large as that observed would be expected to occur by chance more than 50% of the time. The difference between the observed and expected results is not statistically significant, and the null hypothesis (the ratio being tested) cannot be rejected.

  df    p = 0.90    0.50    0.20    0.05    0.01   0.001
   1        0.02    0.46    1.64    3.84    6.64   10.83
   2        0.21    1.39    3.22    5.99    9.21   13.82
   3        0.58    2.37    4.64    7.82   11.35   16.27
   4        1.06    3.36    5.99    9.49   13.28   18.47
   5        1.61    4.35    7.29   11.07   15.09   20.52
   6        2.20    5.35    8.56   12.59   16.81   22.46
   7        2.83    6.35    9.80   14.07   18.48   24.32
   8        3.49    7.34   11.03   15.51   20.09   26.13
   9        4.17    8.34   12.24   16.92   21.67   27.88
  10        4.87    9.34   13.44   18.31   23.21   29.59
  15        8.55   14.34   19.31   25.00   30.58   37.30
  25       16.47   24.34   30.68   37.65   44.31   52.62
  50       37.69   49.34   58.16   67.51   76.15    86.6
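The table lookup described above can be sketched in Python. The critical values are copied from a few rows of the table; the names P_VALUES, CRITICAL, p_bracket, and is_significant are assumptions made for this illustration, not part of any statistics library.

```python
# p values heading the columns of the chi-square table.
P_VALUES = [0.90, 0.50, 0.20, 0.05, 0.01, 0.001]

# Critical values copied from the table for a few rows (df -> row).
CRITICAL = {
    1: [0.02, 0.46, 1.64, 3.84, 6.64, 10.83],
    2: [0.21, 1.39, 3.22, 5.99, 9.21, 13.82],
    3: [0.58, 2.37, 4.64, 7.82, 11.35, 16.27],
    7: [2.83, 6.35, 9.80, 14.07, 18.48, 24.32],
}


def p_bracket(chi2, df):
    """Return (lower, upper) such that lower < p < upper.

    None on either side means the table gives no bound there.
    """
    lower, upper = None, None
    for p, critical in zip(P_VALUES, CRITICAL[df]):
        if chi2 >= critical:
            upper = p        # chi2 reaches this column: p is below its value
        else:
            lower = p        # first column not reached: p is above its value
            break
    return lower, upper


def is_significant(chi2, df, alpha=0.05):
    """True if chi2 exceeds the critical value for the chosen level."""
    return chi2 > CRITICAL[df][P_VALUES.index(alpha)]


print(p_bracket(15.0, 7), is_significant(15.0, 7))  # (0.01, 0.05) True
print(p_bracket(5.0, 7), is_significant(5.0, 7))    # (0.5, 0.9) False
```

The two calls reproduce Example #1 (0.05 > p > 0.01, significant) and Example #2 (0.90 > p > 0.50, not significant).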

Appendix B Gene Map of Drosophila melanogaster

From William S. Klug, Michael R. Cummings, Concepts of Genetics, Macmillan, 1994, 4th ed., p. 132. This material has been copied under licence from CANCOPY. Resale or further copying of this material is strictly prohibited.

Appendix C Genetic Nomenclature & Notation for Drosophila

Clear notation for any Drosophila genotype will indicate whether the locus involved is on an autosome (II, III, or IV) or a sex chromosome (I (X) or Y), and in the case of two (or more) loci, whether they are on the same or different chromosomes (linked or unlinked, respectively). Dominant alleles at a locus are indicated by a capitalized symbol, recessive alleles by a lowercase symbol. Examples of such notation are as follows.

1) One autosomal locus: e.g. the genotype for ebony body on Chromosome III is ee; for wild-type body at that locus it is e⁺e⁺. For autosomal genes the genotype is the same for male and female, and can be homozygous or heterozygous.
2) One sex-linked locus: In Drosophila, alleles may be present on the X chromosome but not on the Y chromosome, therefore the genotypes for male and female are different. The symbol ( ) indicates a male Y sex-chromosome and therefore the presence of only one allele. e.g. Bar eye on Chromosome I. Bar eye female has genotype BB, Bar eye male is B. Wild-type eye female is B⁺B⁺, wild-type eye male is B⁺.
3) Two unlinked autosomal loci: e.g. vestigial wing (II) and ebony body (III) would have genotype vgvg ee, and wild-type (wing and body at these loci) would have vg⁺vg⁺ e⁺e⁺.
4) Two linked autosomal loci: e.g. curled wing (III, 50.0) and ebony body (III, 70.7). The genotype is written to show the alleles on each homologue: cu e / cu e. Wild-type would be cu⁺ e⁺ / cu⁺ e⁺.
5) Two sex-linked loci: e.g. Bar eye (I, 57.0) and forked bristle (I, 56.7). Female is B f / B f, male is B f /. Wild-type female is B⁺ f⁺ / B⁺ f⁺, wild-type male is B⁺ f⁺ /.
6) One sex-linked and one autosomal locus: e.g. Bar eye (I) and vestigial wing (II). Female is BB vgvg, male is B vgvg; wild-type female is B⁺B⁺ vg⁺vg⁺, wild-type male is B⁺ vg⁺vg⁺.