Topic 21: Goodness of Fit


Topic 21: December 5, 2011

Goodness of fit tests examine the case of a sequence of independent observations, each of which falls into 1 of $k$ possible categories. For example, each of us has one of 4 possible blood types: O, A, B, and AB. The local blood bank has good information from a national database on the fraction of individuals having each blood type, $\pi_O$, $\pi_A$, $\pi_B$, and $\pi_{AB}$. The actual fractions $p_O$, $p_A$, $p_B$, and $p_{AB}$ of these blood types in the community served by a given blood bank may differ from what is seen in the national database. As a consequence, the local blood bank may choose to alter its distribution of blood supply to reflect local conditions more accurately.

To place this assessment strategy in terms of formal hypothesis testing, let $\pi = (\pi_1, \ldots, \pi_k)$ be postulated values of the probability

$$P_\pi\{\text{individual is a member of the } i\text{th category}\} = \pi_i$$

and let $p = (p_1, \ldots, p_k)$ denote the possible states of nature. Then, the parameter space is

$$\Theta = \Big\{p = (p_1, \ldots, p_k);\ p_i \ge 0 \text{ for all } i = 1, \ldots, k,\ \sum_{i=1}^k p_i = 1\Big\}.$$

The hypothesis test is

$$H_0: p_i = \pi_i \text{ for all } i = 1, \ldots, k \quad \text{versus} \quad H_1: p_i \ne \pi_i \text{ for some } i = 1, \ldots, k.$$

The data $x = (x_1, \ldots, x_n)$ are the outcomes of the $n$ observations.

Let's use the likelihood ratio criterion to create a test for the distribution of human blood types in a given population. For the data

$$x = \{O, B, O, A, A, A, A, A, O, AB\}$$

for the blood types of tested individuals, in the case of independent observations, the likelihood is

$$L(p|x) = p_O\, p_B\, p_O\, p_A\, p_A\, p_A\, p_A\, p_A\, p_O\, p_{AB} = p_O^3\, p_A^5\, p_B\, p_{AB}.$$

Notice that the likelihood has a factor of $p_i$ whenever an observation takes on the value $i$. In other words, if we summarize the data using

$$n_i = \#\{\text{observations from category } i\}$$

to create $n = (n_1, n_2, \ldots, n_k)$, a vector that records the number of observations in each category, then the likelihood function is

$$L(p|n) = p_1^{n_1} \cdots p_k^{n_k}.$$
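The reduction from the raw observations to the count vector can be checked numerically. The R sketch below tabulates the ten blood types listed above and confirms that the product of the individual factors agrees with $p_1^{n_1} \cdots p_k^{n_k}$. The probability vector `p` is an arbitrary illustrative choice, not an estimate taken from the text.

```r
# Ten observed blood types from the text
x <- c("O", "B", "O", "A", "A", "A", "A", "A", "O", "AB")
types <- c("O", "A", "B", "AB")

# Count vector n = (n_O, n_A, n_B, n_AB)
n <- setNames(as.vector(table(factor(x, levels = types))), types)

# Illustrative probabilities (an assumption made only for this check)
p <- c(O = 0.4, A = 0.4, B = 0.1, AB = 0.1)

# Likelihood as a product over the observations ...
lik_obs <- prod(p[x])
# ... and as a product over the categories, using the counts
lik_counts <- prod(p^n)

# Log-likelihood: sum of n_i * ln(p_i)
loglik <- sum(n * log(p))

n
c(lik_obs = lik_obs, lik_counts = lik_counts, from_loglik = exp(loglik))
```

All three likelihood values should agree up to floating-point rounding, which is why only the counts are needed from here on.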
The likelihood ratio is the ratio of the maximum value of the likelihood under the null hypothesis and the maximum likelihood for any parameter value. In this case, the numerator is the likelihood evaluated at $\pi$.

$$\Lambda(n) = \frac{L(\pi|n)}{L(\hat p|n)} = \frac{\pi_1^{n_1} \pi_2^{n_2} \cdots \pi_k^{n_k}}{\hat p_1^{n_1} \hat p_2^{n_2} \cdots \hat p_k^{n_k}} = \left(\frac{\pi_1}{\hat p_1}\right)^{n_1} \cdots \left(\frac{\pi_k}{\hat p_k}\right)^{n_k}. \qquad (1)$$
To find the maximum likelihood estimator $\hat p$, we, as usual, begin by taking the logarithm

$$\ln L(p|n) = \sum_{i=1}^k n_i \ln p_i.$$

Because not every set of values for $p_i$ is admissible, we cannot just take derivatives, set them equal to 0 and solve. Indeed, we must find a maximum under the constraint

$$s(p) = \sum_{i=1}^k p_i = 1.$$

The maximization problem is now stated in terms of the method of Lagrange multipliers. This method tells us that at the maximum likelihood estimator $(\hat p_1, \ldots, \hat p_k)$, the gradient of $\ln L(p|n)$ is proportional to the gradient of the constraint $s(p)$. To explain this briefly, recall that the gradient of a function is a vector that is perpendicular to a level set of that function. In this case, $\nabla_p s(p)$ is perpendicular to the level set $\{p;\ s(p) = 1\}$.

Now imagine walking along the set of parameter values of $p$ given by the constraint $s(p) = 1$, keeping track of the values of the score function $\ln L(p|n)$. If the walk takes us from a value of the score function below $\ell_0$ to values above $\ell_0$ (see Figure 1), then the level surfaces

$$\{p;\ s(p) = 1\} \quad \text{and} \quad \{p;\ \ln L(p|n) = \ell_0\}$$

intersect. Consequently, the gradients $\nabla_p s(p)$ and $\nabla_p \ln L(p|n)$ point in different directions on the intersection of these two surfaces. At a local maximum or minimum of the score function, the level surfaces are tangent and the two gradients are parallel. In other words, these two gradient vectors are related by a constant of proportionality, $\lambda$, known as the Lagrange multiplier.

Figure 1: Lagrange multipliers. Level sets of the score function are shown in dashed blue, with values increasing moving downward. The level set $\{s(p) = 1\}$ is shown in black. The gradients for the score function and the constraint are indicated by dashed blue and black arrows, respectively. At the maximum, these two arrows are parallel. Their ratio $\lambda$ is called the Lagrange multiplier. If we view the blue dashed lines as contour lines on a contour map and the black line as a trail, then crossing a contour line indicates walking either up or down hill. When the trail reaches its highest elevation, the trail is tangent to a contour line and the gradient for the hill is perpendicular to the trail.

Consequently, at extreme values,

$$\nabla_p \ln L(\hat p|n) = \lambda \nabla_p s(\hat p),$$

$$\left(\frac{\partial}{\partial p_1} \ln L(\hat p|n), \ldots, \frac{\partial}{\partial p_k} \ln L(\hat p|n)\right) = \lambda \left(\frac{\partial}{\partial p_1} s(\hat p), \ldots, \frac{\partial}{\partial p_k} s(\hat p)\right),$$

$$\left(\frac{n_1}{\hat p_1}, \ldots, \frac{n_k}{\hat p_k}\right) = \lambda (1, \ldots, 1).$$

Each of the components of the two vectors must be equal. In other words,

$$\frac{n_i}{\hat p_i} = \lambda, \qquad n_i = \lambda \hat p_i \quad \text{for all } i = 1, \ldots, k. \qquad (2)$$
Now sum this equality for all values of $i$ and use the constraint $s(\hat p) = 1$ to obtain

$$\sum_{i=1}^k n_i = \lambda \sum_{i=1}^k \hat p_i = \lambda s(\hat p) = \lambda \quad \text{and} \quad n = \lambda.$$

Returning to (2), we have that

$$\frac{n_i}{\hat p_i} = n \quad \text{and} \quad \hat p_i = \frac{n_i}{n}.$$

This is the answer we would guess: the estimate for $p_i$ is the fraction of observations in category $i$. We substitute for $\hat p_i$ in the likelihood ratio (1) to obtain

$$\Lambda(n) = \frac{L(\pi|n)}{L(\hat p|n)} = \left(\frac{n\pi_1}{n_1}\right)^{n_1} \cdots \left(\frac{n\pi_k}{n_k}\right)^{n_k}.$$

Recall that we reject the null hypothesis if this ratio is too low, i.e., the maximum likelihood under the null hypothesis is sufficiently smaller than the maximum likelihood under the alternative hypothesis.

For random variables $X_1, X_2, \ldots, X_n$ having distribution $\pi$, let's summarize the observations as suggested above,

$$N_i = \#\{j;\ X_j = i\}.$$

Set $N = (N_1, \ldots, N_k)$ to be the vector of observed numbers of occurrences for each category $i$. Then the likelihood ratio test statistic

$$-2 \ln \Lambda_n(N) = -2 \sum_{i=1}^k N_i \ln \frac{n\pi_i}{N_i} = 2 \sum_{i=1}^k N_i \ln \frac{N_i}{n\pi_i}$$

has approximately a $\chi^2_{k-1}$ distribution.

The test statistic $-2 \ln \Lambda_n(n)$ is generally rewritten using the notation $O_i = n_i$ for the number of observed occurrences of $i$ and $E_i = n\pi_i$ for the number of expected occurrences of $i$ as given by $H_0$. Then, we can write the test statistic as

$$-2 \ln \Lambda_n(O) = 2 \sum_{i=1}^k O_i \ln \frac{O_i}{E_i}. \qquad (3)$$

This is called the G test statistic.

The traditional method for a test of goodness of fit was introduced between 1895 and 1900 by Karl Pearson and consequently has been in use for longer than the concept of likelihood ratio tests. To show the connection between the two tests, we begin by giving the quadratic Taylor polynomial approximation to the natural logarithm expanded around the value $a_0 = 1$,

$$\ln a \approx (a - 1) - \frac{1}{2}(a - 1)^2.$$

Applying this to $-2 \ln \Lambda_n(n)$ with $a = n\pi_i/n_i$, we find that

$$-2 \ln \Lambda_n(n) = -2 \sum_{i=1}^k n_i \ln \frac{n\pi_i}{n_i} \approx -2 \sum_{i=1}^k n_i \left( \left(\frac{n\pi_i}{n_i} - 1\right) - \frac{1}{2} \left(\frac{n\pi_i}{n_i} - 1\right)^2 \right)$$

$$= -2 \sum_{i=1}^k (n\pi_i - n_i) + \sum_{i=1}^k n_i \left(\frac{n\pi_i}{n_i} - 1\right)^2 = 0 + \sum_{i=1}^k \frac{(n\pi_i - n_i)^2}{n_i}.$$

The first sum vanishes because $\sum_i n\pi_i = n = \sum_i n_i$.
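The steps above can be traced numerically. The following R sketch uses hypothetical null probabilities and counts (both invented for illustration) to compute the maximum likelihood estimate, the likelihood ratio, the statistic in the observed/expected form (3), and the quadratic approximation from the Taylor expansion.

```r
# Hypothetical null probabilities and observed counts (illustrative only)
pi0   <- c(0.2, 0.5, 0.2, 0.1)
n_obs <- c(43, 96, 38, 23)
n     <- sum(n_obs)

# Maximum likelihood estimate: the observed fractions n_i / n
p_hat <- n_obs / n

# Log-likelihood ln L(p | n) = sum of n_i * ln(p_i)
loglik <- function(p) sum(n_obs * log(p))

# Likelihood ratio and the statistic -2 ln(Lambda)
lambda <- exp(loglik(pi0) - loglik(p_hat))
G <- -2 * log(lambda)

# The same statistic written with observed and expected counts, as in (3)
E <- n * pi0
G_oe <- 2 * sum(n_obs * log(n_obs / E))

# Quadratic approximation: sum of (n * pi_i - n_i)^2 / n_i
quad <- sum((E - n_obs)^2 / n_obs)

c(lambda = lambda, G = G, G_oe = G_oe, quad = quad)
```

For counts this close to their expected values, `G` and `quad` should be close to each other; the agreement degrades as the observed fractions move away from the null probabilities, since the Taylor polynomial is only accurate near $a = 1$.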
The data can be displayed in a table

    i           1     2    ...   k
    observed   O_1   O_2   ...  O_k
    expected   E_1   E_2   ...  E_k

Then,

$$-2 \ln \Lambda_n(O) \approx \sum_{i=1}^k \frac{(n\pi_i - n_i)^2}{n_i} \approx \sum_{i=1}^k \frac{(O_i - E_i)^2}{E_i}.$$

The last step replaces the observed count in each denominator by the expected count, which is close to it when $H_0$ holds. The result is Pearson's chi-square statistic.

Example 1. The Red Cross recommends that a blood bank maintain 44% blood type O, 42% blood type A, 10% blood type B, and 4% blood type AB. You suspect that the distribution of blood types in Tucson is not the same as the recommendation. In this case, the hypothesis is

$$H_0: p_O = 0.44,\ p_A = 0.42,\ p_B = 0.10,\ p_{AB} = 0.04 \quad \text{versus} \quad H_1: \text{at least one } p_i \text{ is unequal to the given values.}$$

Based on 400 observations, we find 228 type O, 124 type A, 40 type B and 8 type AB. This gives the table

    i           O     A     B    AB
    observed   228   124    40     8
    expected   176   168    40    16

The command to perform the chi-square test in R is shown below. The program computes the expected number of observations.

    > chisq.test(c(228,124,40,8), p=c(0.44,0.42,0.10,0.04))

            Chi-squared test for given probabilities

    data:  c(228, 124, 40, 8)
    X-squared = 30.887, df = 3, p-value = 8.977e-07

The number of degrees of freedom is $4 - 1 = 3$. Note that the p-value is very low, and so the data give strong evidence that the distribution of blood types in Tucson is not the same as the national distribution.

We can also perform the test using the G statistic in (3):

    > O <- c(228,124,40,8)
    > E <- sum(O)*c(0.44,0.42,0.10,0.04)
    > Gstat <- 2*sum(O*log(O/E))
    > Gstat                  # about 31.64
    > 1 - pchisq(Gstat,3)    # about 6.2e-07

Example 2 (Hardy-Weinberg equilibrium). As we saw with Gregor Mendel's pea experiments, the two-allele Hardy-Weinberg principle states that after two generations of random mating the genotypic frequencies can be represented by a binomial distribution. So, if a population is segregating for two alleles $A_1$ and $A_2$ at an autosomal locus with frequencies $p_1$ and $p_2$, then random mating would give a proportion

$$p_{11} = p_1^2 \text{ for the } A_1A_1 \text{ genotype}, \quad p_{12} = 2p_1p_2 \text{ for the } A_1A_2 \text{ genotype}, \quad p_{22} = p_2^2 \text{ for the } A_2A_2 \text{ genotype}.$$

Then,

$$p_1 = p_{11} + \frac{1}{2} p_{12}, \qquad p_2 = p_{22} + \frac{1}{2} p_{12}.$$

This gives us 5 parameters, $p_1, p_2, p_{11}, p_{12}, p_{22}$, with the constraints that

$$p_1 + p_2 = 1 \quad \text{and} \quad p_{11} + p_{12} + p_{22} = 1.$$

Thus $n = 3$.
The two restrictions from the Hardy-Weinberg equilibrium above give us that $k = 2$. Thus, the chi-square test statistic will have 1 degree of freedom.
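The recipe implied by this parameter count can be sketched in R: estimate the allele frequencies from the genotype counts, form the expected counts under Hardy-Weinberg equilibrium, and compare the chi-square statistic with a $\chi^2_1$ distribution. The function name `hw_expected` and the genotype counts below are hypothetical, chosen only to illustrate the steps.

```r
# Expected genotype counts under Hardy-Weinberg equilibrium.
# counts = (n11, n12, n22): numbers of A1A1, A1A2 and A2A2 genotypes.
hw_expected <- function(counts) {
  n  <- sum(counts)
  p1 <- (counts[1] + counts[2] / 2) / n   # estimated frequency of allele A1
  p2 <- 1 - p1                            # estimated frequency of allele A2
  n * c(p1^2, 2 * p1 * p2, p2^2)
}

counts <- c(30, 50, 20)                   # hypothetical genotype counts
E  <- hw_expected(counts)
X2 <- sum((counts - E)^2 / E)             # Pearson chi-square statistic
p_value <- 1 - pchisq(X2, df = 1)         # 1 degree of freedom

E
c(X2 = X2, p_value = p_value)
```

Because the allele frequencies are estimated from the same data, the expected counts always sum to the sample size, and one degree of freedom is lost relative to a test with fully specified probabilities.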
McDonald et al. (1996) examined variation at the CVJ5 locus in the American oyster, Crassostrea virginica. There were two alleles, L and S, and the genotype frequencies in Panacea, Florida were 14 LL, 21 LS, and 25 SS. So,

$$\hat p_{11} = \frac{14}{60}, \quad \hat p_{12} = \frac{21}{60}, \quad \hat p_{22} = \frac{25}{60}.$$

So, the estimates of $p_1$ and $p_2$ are

$$\hat p_1 = \hat p_{11} + \frac{1}{2} \hat p_{12} = 0.4083, \qquad \hat p_2 = \hat p_{22} + \frac{1}{2} \hat p_{12} = 0.5917.$$

So, the expected numbers of observations are

$$E_{11} = 60 \hat p_1^2 = 10.004, \quad E_{12} = 60 \cdot 2 \hat p_1 \hat p_2 = 28.992, \quad E_{22} = 60 \hat p_2^2 = 21.004.$$

The chi-square statistic, using the expected counts rounded to whole numbers, is

$$\frac{(14 - 10)^2}{10} + \frac{(21 - 29)^2}{29} + \frac{(25 - 21)^2}{21} = 1.600 + 2.207 + 0.762 = 4.569.$$

The p-value is

    > 1 - pchisq(4.569,1)    # about 0.033

Thus, we have moderate evidence against the null hypothesis of a Hardy-Weinberg equilibrium. Many forces may be the cause of this: non-random mating, selection, or migration, to name a few possibilities.

Exercise 3. Perform the chi-square test using the G statistic for the example above.

1 Contingency tables

For an $r \times c$ contingency table, we consider two classifications A and B for an experiment. This gives $c$ categories

$$A_1, \ldots, A_c$$

for classification A and $r$ categories

$$B_1, \ldots, B_r$$

for classification B. Here, we write $O_{ij}$ to denote the number of occurrences for which an individual falls into both category $B_i$ and category $A_j$, so that the first index is the row and the second is the column. The results are then organized into a two-way table.

               A_1    A_2   ...   A_c    total
    B_1       O_11   O_12   ...  O_1c    O_1.
    B_2       O_21   O_22   ...  O_2c    O_2.
    ...        ...    ...   ...   ...     ...
    B_r       O_r1   O_r2   ...  O_rc    O_r.
    total     O_.1   O_.2   ...  O_.c     n

The null hypothesis is that the classifications A and B are independent. To set the parameters for this model, we define

$$p_{ij} = P\{\text{an individual is simultaneously a member of category } B_i \text{ and category } A_j\}.$$
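Then, we have the parameter space

$$\Theta = \Big\{p = (p_{ij},\ 1 \le i \le r,\ 1 \le j \le c);\ p_{ij} \ge 0 \text{ for all } i, j,\ \sum_{i=1}^r \sum_{j=1}^c p_{ij} = 1\Big\}.$$

Write the marginal probabilities

$$p_{i\cdot} = \sum_{j=1}^c p_{ij} \quad \text{and} \quad p_{\cdot j} = \sum_{i=1}^r p_{ij}.$$

The null hypothesis of independence of the categories A and B can be written

$$H_0: p_{ij} = p_{i\cdot}\, p_{\cdot j} \text{ for all } i, j \quad \text{versus} \quad H_1: p_{ij} \ne p_{i\cdot}\, p_{\cdot j} \text{ for some } i, j.$$

Follow the procedure as before for the goodness of fit test to end with the test statistic

$$-2 \sum_{i=1}^r \sum_{j=1}^c O_{ij} \ln \frac{E_{ij}}{O_{ij}} \approx \sum_{i=1}^r \sum_{j=1}^c \frac{(O_{ij} - E_{ij})^2}{E_{ij}}.$$

The null hypothesis $p_{ij} = p_{i\cdot}\, p_{\cdot j}$ can be written in terms of observed and expected observations as

$$\frac{E_{ij}}{n} = \frac{O_{i\cdot}}{n} \frac{O_{\cdot j}}{n}$$

or

$$E_{ij} = \frac{O_{i\cdot}\, O_{\cdot j}}{n}.$$

Example 4. Returning to the study of the smoking habits of 5375 high school children in Tucson in 1967, here is a two-way table summarizing some of the results.

                       student smokes   student does not smoke   total
    2 parents smoke          400                 1380             1780
    1 parent smokes          416                 1823             2239
    0 parents smoke          188                 1168             1356
    total                   1004                 4371             5375

The expected table is

                       student smokes   student does not smoke   total
    2 parents smoke        332.49              1447.51            1780
    1 parent smokes        418.22              1820.78            2239
    0 parents smoke        253.29              1102.71            1356
    total                 1004                 4371               5375

For example,

$$E_{11} = \frac{O_{1\cdot}\, O_{\cdot 1}}{n} = \frac{1780 \cdot 1004}{5375} = 332.49.$$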
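The formula $E_{ij} = O_{i\cdot} O_{\cdot j}/n$ is an outer product of the row and column totals divided by the sample size. A minimal R sketch, using the counts passed to `matrix()` in the R session for this example, is:

```r
# Observed counts: rows are 2, 1, 0 parents smoking;
# columns are student smokes, student does not smoke
smoking <- matrix(c(400, 416, 188, 1380, 1823, 1168), nrow = 3)

n <- sum(smoking)

# E_ij = (row total i) * (column total j) / n
E <- outer(rowSums(smoking), colSums(smoking)) / n

round(E, 2)
```

The expected table has the same row and column totals as the observed table, and it should agree with the `expected` component returned by `chisq.test(smoking)`.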
To compute the chi-square statistic,

$$\frac{(400 - 332.49)^2}{332.49} + \frac{(1380 - 1447.51)^2}{1447.51} + \frac{(416 - 418.22)^2}{418.22} + \frac{(1823 - 1820.78)^2}{1820.78} + \frac{(188 - 253.29)^2}{253.29} + \frac{(1168 - 1102.71)^2}{1102.71}$$

$$= 13.71 + 3.15 + 0.01 + 0.00 + 16.83 + 3.87 = 37.57.$$

The number of degrees of freedom is $(r - 1) \times (c - 1) = (3 - 1) \times (2 - 1) = 2$. Thus, the p-value

    > 1 - pchisq(37.57,2)    # about 6.9e-09

is very small and leads us to reject the null hypothesis. Thus, children's smoking habits are not independent of their parents' smoking habits. An examination of the individual cells shows that the children of parents who do not smoke are less likely to smoke and children who have two parents that smoke are more likely to smoke. Under the null hypothesis, each term of the sum has a mean of no more than about 1, and so values much greater than 1 show contributions that lead to the rejection of $H_0$.

R does the computation for us using the chisq.test command.

    > smoking <- matrix(c(400,416,188,1380,1823,1168), nrow=3)
    > smoking
         [,1] [,2]
    [1,]  400 1380
    [2,]  416 1823
    [3,]  188 1168
    > chisq.test(smoking)

            Pearson's Chi-squared test

    data:  smoking
    X-squared = 37.566, df = 2, p-value = 6.959e-09

Example 5. We have the following data on malaria in three regions of the world.

                       Asia   South America   Africa   Total
    Malaria Type A      31         45           14       90
    Malaria Type B       2         53            5       60
    Malaria Type C      53          2           45      100
    Total               86        100           64      250
The expected table is

                       Asia   South America   Africa   Total
    Malaria Type A    30.96      36.00         23.04     90
    Malaria Type B    20.64      24.00         15.36     60
    Malaria Type C    34.40      40.00         25.60    100
    Total             86        100            64       250

The values for each of the cells in the chi-square statistic are

                       Asia   South America   Africa
    Malaria Type A     0.000      2.250        3.547
    Malaria Type B    16.834     35.042        6.988
    Malaria Type C    10.057     36.100       14.702

and their sum is 125.52. The degrees of freedom are $(c - 1)(r - 1) = 2 \times 2 = 4$. In R, we have

    > malaria <- matrix(c(31,2,53,45,53,2,14,5,45), nrow=3, ncol=3)
    > malaria
         [,1] [,2] [,3]
    [1,]   31   45   14
    [2,]    2   53    5
    [3,]   53    2   45
    > chisq.test(malaria)

            Pearson's Chi-squared test

    data:  malaria
    X-squared = 125.52, df = 4, p-value < 2.2e-16

This provides strong evidence against the independence of malaria type and continent.

Exercise 6 (two-by-two tables). Here is a contingency table that can be thought of as two sets of Bernoulli trials.

                 group 1      group 2      total
    successes     x_1          x_2          x_1 + x_2
    failures      n_1 - x_1    n_2 - x_2    (n_1 + n_2) - (x_1 + x_2)
    total         n_1          n_2          n_1 + n_2

Show that the chi-square test is equivalent to the two-sided two-sample proportion test.

2 Applicability and Alternatives to Chi-squared Tests

The chi-square test uses the central limit theorem and so is based on the ability to use a normal approximation. One criterion, the Cochran conditions, requires that no cell has count zero and that more than 80% of the cells have counts of at least 5 (these conditions are usually checked on the expected counts). If this does not hold, then Fisher's exact test uses the hypergeometric distribution (or its generalization) directly rather than a normal approximation.

For example, consider the $2 \times 2$ table
               A_1    A_2    total
    B_1       O_11   O_12    O_1.
    B_2       O_21   O_22    O_2.
    total     O_.1   O_.2     n

Then, from the point of view of the B classification:

From the $O_{1\cdot}$ belonging to category $B_1$, $O_{11}$ also belong to $A_1$. This outcome can be accomplished in $\binom{O_{1\cdot}}{O_{11}}$ ways.

From the $O_{2\cdot}$ belonging to category $B_2$, $O_{21}$ also belong to $A_1$. This outcome can be accomplished in $\binom{O_{2\cdot}}{O_{21}}$ ways.

We have $O_{\cdot 1}$ from $n$ observations with $A_1$. This can be accomplished in $\binom{n}{O_{\cdot 1}}$ ways.

Under the null hypothesis, every individual can be placed in any group, provided that the given marginal totals are preserved. In this case, the probability of the table above is given by the hypergeometric formula

$$\frac{\binom{O_{1\cdot}}{O_{11}} \binom{O_{2\cdot}}{O_{21}}}{\binom{n}{O_{\cdot 1}}} = \frac{O_{1\cdot}!\, O_{2\cdot}!\, O_{\cdot 1}!\, O_{\cdot 2}!}{O_{11}!\, O_{12}!\, O_{21}!\, O_{22}!\, n!}. \qquad (4)$$

Notice that the formula is symmetric in the column and row variables. Thus, if we had derived the hypergeometric formula from the point of view of the A classification, we would have obtained exactly the same formula (4).

To complete the exact test, we now compute the hypergeometric probabilities over all possible choices for entries in the cells that result in the given marginal values, and rank these probabilities from most likely to least likely. Find the ranking of the actual data. For a one-sided test of "too rare", the P-value is the sum of the probabilities of the tables ranked no higher than the data, that is, the tables that are no more likely than the one observed. A similar idea applies to general $r \times c$ tables.

Example 7. We now return to a table on hemoglobin genotypes on two Indonesian islands. Recall that heterozygotes are protected against malaria.

    genotype    AA    AE    EE
    Flores     128     6     0
    Sumba      119    78     4

We noted that heterozygotes are rare on Flores and that it appears that malaria is less prevalent there since the heterozygote does not provide an adaptive advantage. Here are both the chi-square test and the Fisher exact test.
    > genotype <- matrix(c(128,119,6,78,0,4), nrow=2)
    > genotype
         [,1] [,2] [,3]
    [1,]  128    6    0
    [2,]  119   78    4
    > chisq.test(genotype)

            Pearson's Chi-squared test

    data:  genotype
    X-squared = 54.836, df = 2, p-value = 1.238e-12

    Warning message:
    In chisq.test(genotype) : Chi-squared approximation may be incorrect
    > fisher.test(genotype)

            Fisher's Exact Test for Count Data

    data:  genotype
    p-value = 3.907e-15
    alternative hypothesis: two.sided

Note that R cautions against the use of the chi-square test with these data.

3 Answer to Selected Exercise

3. Here is the R output.

    > O <- c(14,21,25)
    > phat <- c(O[1]+O[2]/2, O[3]+O[2]/2)/sum(O)
    > phat
    [1] 0.4083333 0.5916667
    > E <- sum(O)*c(phat[1]^2, 2*phat[1]*phat[2], phat[2]^2)
    > E
    [1] 10.00417 28.99167 21.00417
    > sum(E)
    [1] 60
    > Gstat <- 2*sum(O*log(O/E))
    > Gstat                  # about 4.57
    > 1 - pchisq(Gstat,1)    # about 0.032

6. The table of expected observations is

                 group 1                                  group 2                                  total
    successes    n_1(x_1+x_2)/(n_1+n_2)                   n_2(x_1+x_2)/(n_1+n_2)                   x_1 + x_2
    failures     n_1((n_1+n_2)-(x_1+x_2))/(n_1+n_2)       n_2((n_1+n_2)-(x_1+x_2))/(n_1+n_2)       (n_1 + n_2) - (x_1 + x_2)
    total        n_1                                      n_2                                      n_1 + n_2
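Now, write $\hat p_i = x_i/n_i$ for the sample proportions from each group, and

$$\hat p_0 = \frac{x_1 + x_2}{n_1 + n_2}$$

for the pooled sample proportion. Then we have the tables of observed and expected observations

    observed     group 1             group 2             total
    successes    n_1 p̂_1            n_2 p̂_2            (n_1 + n_2) p̂_0
    failures     n_1 (1 - p̂_1)      n_2 (1 - p̂_2)      (n_1 + n_2)(1 - p̂_0)
    total        n_1                 n_2                 n_1 + n_2

    expected     group 1             group 2             total
    successes    n_1 p̂_0            n_2 p̂_0            (n_1 + n_2) p̂_0
    failures     n_1 (1 - p̂_0)      n_2 (1 - p̂_0)      (n_1 + n_2)(1 - p̂_0)
    total        n_1                 n_2                 n_1 + n_2

The chi-square test statistic is

$$\frac{(n_1(\hat p_1 - \hat p_0))^2}{n_1 \hat p_0} + \frac{(n_2(\hat p_2 - \hat p_0))^2}{n_2 \hat p_0} + \frac{(n_1((1 - \hat p_1) - (1 - \hat p_0)))^2}{n_1(1 - \hat p_0)} + \frac{(n_2((1 - \hat p_2) - (1 - \hat p_0)))^2}{n_2(1 - \hat p_0)}$$

$$= n_1 \frac{(\hat p_1 - \hat p_0)^2}{\hat p_0} + n_2 \frac{(\hat p_2 - \hat p_0)^2}{\hat p_0} + n_1 \frac{(\hat p_1 - \hat p_0)^2}{1 - \hat p_0} + n_2 \frac{(\hat p_2 - \hat p_0)^2}{1 - \hat p_0}$$

$$= n_1 (\hat p_1 - \hat p_0)^2 \frac{1}{\hat p_0 (1 - \hat p_0)} + n_2 (\hat p_2 - \hat p_0)^2 \frac{1}{\hat p_0 (1 - \hat p_0)} = \frac{n_1 (\hat p_1 - \hat p_0)^2 + n_2 (\hat p_2 - \hat p_0)^2}{\hat p_0 (1 - \hat p_0)}.$$

This is the expression that approximates $-2 \ln \Lambda(x_1, x_2)$ in the likelihood ratio computation for the two-sided two-sample proportion test. To see the equivalence directly, note that $\hat p_1 - \hat p_0 = n_2(\hat p_1 - \hat p_2)/(n_1 + n_2)$ and $\hat p_2 - \hat p_0 = -n_1(\hat p_1 - \hat p_2)/(n_1 + n_2)$, so the numerator equals $n_1 n_2 (\hat p_1 - \hat p_2)^2/(n_1 + n_2)$ and the statistic is

$$\frac{(\hat p_1 - \hat p_2)^2}{\hat p_0 (1 - \hat p_0) \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = z^2,$$

the square of the two-sample $z$ statistic.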
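The algebra in this answer can be confirmed numerically. The R sketch below uses hypothetical success counts for the two groups and compares three quantities: the Pearson statistic from `chisq.test` (with the continuity correction turned off, since R applies it to $2 \times 2$ tables by default), the square of the pooled two-sample $z$ statistic, and the closed-form expression derived above.

```r
# Hypothetical data: x successes out of n trials in each group
x1 <- 30; n1 <- 100
x2 <- 45; n2 <- 120

# Two-by-two table: rows are successes and failures, columns are the groups
tab <- matrix(c(x1, n1 - x1, x2, n2 - x2), nrow = 2)

# Pearson chi-square statistic without the continuity correction
X2 <- unname(chisq.test(tab, correct = FALSE)$statistic)

# Sample proportions and pooled proportion
p1 <- x1 / n1
p2 <- x2 / n2
p0 <- (x1 + x2) / (n1 + n2)

# Two-sample z statistic with pooled variance
z <- (p1 - p2) / sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))

# Closed-form expression from the derivation
closed_form <- (n1 * (p1 - p0)^2 + n2 * (p2 - p0)^2) / (p0 * (1 - p0))

c(chisq = X2, z_squared = z^2, closed_form = closed_form)
```

The three values should coincide up to rounding, so rejecting when the chi-square statistic exceeds the $\chi^2_1$ critical value is the same as rejecting when $|z|$ exceeds the corresponding normal critical value.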
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationUnit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
More informationIs it statistically significant? The chisquare test
UAS Conference Series 2013/14 Is it statistically significant? The chisquare test Dr Gosia Turner Student Data Management and Analysis 14 September 2010 Page 1 Why chisquare? Tests whether two categorical
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models  part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK2800 Kgs. Lyngby
More informationSimulating ChiSquare Test Using Excel
Simulating ChiSquare Test Using Excel Leslie Chandrakantha John Jay College of Criminal Justice of CUNY Mathematics and Computer Science Department 524 West 59 th Street, New York, NY 10019 lchandra@jjay.cuny.edu
More informationAlgebra 2 Chapter 1 Vocabulary. identity  A statement that equates two equivalent expressions.
Chapter 1 Vocabulary identity  A statement that equates two equivalent expressions. verbal model A word equation that represents a reallife problem. algebraic expression  An expression with variables.
More informationHypothesis Testing COMP 245 STATISTICS. Dr N A Heard. 1 Hypothesis Testing 2 1.1 Introduction... 2 1.2 Error Rates and Power of a Test...
Hypothesis Testing COMP 45 STATISTICS Dr N A Heard Contents 1 Hypothesis Testing 1.1 Introduction........................................ 1. Error Rates and Power of a Test.............................
More informationNotes for STA 437/1005 Methods for Multivariate Data
Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.
More information1. Comparing Two Means: Dependent Samples
1. Comparing Two Means: ependent Samples In the preceding lectures we've considered how to test a difference of two means for independent samples. Now we look at how to do the same thing with dependent
More informationNonInferiority Tests for Two Means using Differences
Chapter 450 oninferiority Tests for Two Means using Differences Introduction This procedure computes power and sample size for noninferiority tests in twosample designs in which the outcome is a continuous
More informationTesting Research and Statistical Hypotheses
Testing Research and Statistical Hypotheses Introduction In the last lab we analyzed metric artifact attributes such as thickness or width/thickness ratio. Those were continuous variables, which as you
More informationSenior Secondary Australian Curriculum
Senior Secondary Australian Curriculum Mathematical Methods Glossary Unit 1 Functions and graphs Asymptote A line is an asymptote to a curve if the distance between the line and the curve approaches zero
More informationHow to Conduct a Hypothesis Test
How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some
More informationSummary of Formulas and Concepts. Descriptive Statistics (Ch. 14)
Summary of Formulas and Concepts Descriptive Statistics (Ch. 14) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume
More informationTopic 1 Displaying Data
Topic 1 Displaying Data Categorical Data 1 / 13 Outline Types of Data Pie Charts Bar Charts Twoway Tables Segmented Bar Chart 2 / 13 Types of Data A data set provides information about a group of individuals.
More informationThe NeymanPearson lemma. The NeymanPearson lemma
The NeymanPearson lemma In practical hypothesis testing situations, there are typically many tests possible with significance level α for a null hypothesis versus alternative hypothesis. This leads to
More informationMendelian Genetics. I. Background
Mendelian Genetics Objectives 1. To understand the Principles of Segregation and Independent Assortment. 2. To understand how Mendel s principles can explain transmission of characters from one generation
More informationChiSquare Tests and the FDistribution. Goodness of Fit Multinomial Experiments. Chapter 10
Chapter 0 ChiSquare Tests and the FDistribution 0 Goodness of Fit Multinomial xperiments A multinomial experiment is a probability experiment consisting of a fixed number of trials in which there are
More informationData Analysis Tools. Tools for Summarizing Data
Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool
More informationstatistics Chisquare tests and nonparametric Summary sheet from last time: Hypothesis testing Summary sheet from last time: Confidence intervals
Summary sheet from last time: Confidence intervals Confidence intervals take on the usual form: parameter = statistic ± t crit SE(statistic) parameter SE a s e sqrt(1/n + m x 2 /ss xx ) b s e /sqrt(ss
More informationChi Square (χ 2 ) Statistical Instructions EXP 3082L Jay Gould s Elaboration on Christensen and Evans (1980)
Chi Square (χ 2 ) Statistical Instructions EXP 3082L Jay Gould s Elaboration on Christensen and Evans (1980) For the Driver Behavior Study, the Chi Square Analysis II is the appropriate analysis below.
More information3.4 Statistical inference for 2 populations based on two samples
3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted
More informationCalculating PValues. Parkland College. Isela Guerra Parkland College. Recommended Citation
Parkland College A with Honors Projects Honors Program 2014 Calculating PValues Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating PValues" (2014). A with Honors Projects.
More informationTwo Categorical Variables: the ChiSquare Test
CHAPTER 22 Two Categorical Variables: the ChiSquare Test 22.1 22.2 22.3 22.4 Data Analysis for TwoWay Tables Inference for TwoWay Tables Goodness of Fit Selected Exercise Solutions Introduction We saw
More informationTopic 6: Conditional Probability and Independence
Topic 6: September 1520, 2011 One of the most important concepts in the theory of probability is based on the question: How do we modify the probability of an event in light of the fact that something
More information11. Analysis of Casecontrol Studies Logistic Regression
Research methods II 113 11. Analysis of Casecontrol Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationTHE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7.
THERE ARE TWO WAYS TO DO HYPOTHESIS TESTING WITH STATCRUNCH: WITH SUMMARY DATA (AS IN EXAMPLE 7.17, PAGE 236, IN ROSNER); WITH THE ORIGINAL DATA (AS IN EXAMPLE 8.5, PAGE 301 IN ROSNER THAT USES DATA FROM
More informationAssociation Between Variables
Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi
More informationConfidence Intervals for the Difference Between Two Means
Chapter 47 Confidence Intervals for the Difference Between Two Means Introduction This procedure calculates the sample size necessary to achieve a specified distance from the difference in sample means
More informationBasic Principles of Forensic Molecular Biology and Genetics. Population Genetics
Basic Principles of Forensic Molecular Biology and Genetics Population Genetics Significance of a Match What is the significance of: a fiber match? a hair match? a glass match? a DNA match? Meaning of
More informationPopulation Genetics (Outline)
Population Genetics (Outline) Definition of terms of population genetics: population, species, gene, pool, gene flow Calculation of genotypic of homozygous dominant, recessive, or heterozygous individuals,
More informationChapter 11 Chi square Tests
11.1 Introduction 259 Chapter 11 Chi square Tests 11.1 Introduction In this chapter we will consider the use of chi square tests (χ 2 tests) to determine whether hypothesized models are consistent with
More informationMath 58. Rumbos Fall 2008 1. Solutions to Review Problems for Exam 2
Math 58. Rumbos Fall 2008 1 Solutions to Review Problems for Exam 2 1. For each of the following scenarios, determine whether the binomial distribution is the appropriate distribution for the random variable
More informationUnit 24 Hypothesis Tests about Means
Unit 24 Hypothesis Tests about Means Objectives: To recognize the difference between a paired t test and a twosample t test To perform a paired t test To perform a twosample t test A measure of the amount
More informationINTRODUCTION TO GENETICS USING TOBACCO (Nicotiana tabacum) SEEDLINGS
INTRODUCTION TO GENETICS USING TOBACCO (Nicotiana tabacum) SEEDLINGS By Dr. Susan Petro Based on a lab by Dr. Elaine Winshell Nicotiana tabacum Objectives To apply Mendel s Law of Segregation To use Punnett
More informationMCB142/IB163 Mendelian and Population Genetics 9/19/02
MCB142/IB163 Mendelian and Population Genetics 9/19/02 Practice questions lectures 512 (Hardy Weinberg, chisquare test, Mendel s second law, gene interactions, linkage maps, mapping human diseases, nonrandom
More informationMATH4427 Notebook 3 Spring MATH4427 Notebook Hypotheses: Definitions and Examples... 3
MATH4427 Notebook 3 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 20092016 by Jenny A. Baglivo. All Rights Reserved. Contents 3 MATH4427 Notebook 3 3 3.1 Hypotheses: Definitions and Examples............................
More informationProbability Calculator
Chapter 95 Introduction Most statisticians have a set of probability tables that they refer to in doing their statistical wor. This procedure provides you with a set of electronic statistical tables that
More informationChi Square Distribution
17. Chi Square A. Chi Square Distribution B. OneWay Tables C. Contingency Tables D. Exercises Chi Square is a distribution that has proven to be particularly useful in statistics. The first section describes
More informationCHI SQUARE DISTRIBUTION
CI SQUARE DISTRIBUTION 1 Introduction to the Chi Square Test of Independence This test is used to analyse the relationship between two sets of discrete data. Contingency tables are used to examine the
More information