# Topic 21: Goodness of Fit

Save this PDF as:

Size: px
Start display at page:

## Transcription

1 Topic 21: December 5, 2011 A goodness of fit tests examine the case of a sequence of independent observations each of which can have 1 of k possible categories. For example, each of us has one of 4 possible of blood types, A, B, AB, and O. The local blood bank has good information from a national database of the fraction of individuals having each blood type, π O, π A, π B, and π AB. The actual fraction p O, p A, p B, and p AB of these blood types in the community for a given blood bank may be different than what is seen in the national database. As a consequency, the local blood bank may choose to alter its distribution of blood supply to more accurately reflect local conditions. To place this assessment strategy in terms of formal hypothesis testing, let π = (π 1,..., π k ) be postulated values of the probability P π {individual is a member of i-th category} = π i and let p = (p 1,..., p k ) denote the possible states of nature. Then, the parameter space is The hypothesis test is Θ = {p = (p 1,..., p k ); p i 0 for all i = 1,..., k, p i = 1}. H 0 : p i = π i, for all i = 1,..., k versus H 1 : p i π i, for some i = 1,..., k. The data x = (x 1,..., x n ) are the outcomes of the n observations. Let s use the likelihood ration criterion to create a test for the distribution of human blood types in a given population. For the data x = {O, B, O, A, A, A, A, A, O, AB} for the blood types of tested individuals, then, in the case of independent observations, the likelihood is L(p x) = p O p B p O p A p A p A p A p A p O p AB = p 3 Op 5 Ap B p AB. Notice that the likelihood has a factor of p i whenever an observation take on the value i. In other words, if we summarize the data using n i = #{observations from category i} to create n = (n 1, n 2,, n k ), a vector that records the number of observations in each category, then, the likelihood function L(p n) = p n1 1 pn k k. The likelihood ratio is the ratio of the maximum value of the likelihood under the null hypothesis and the maximum likelihood for any parameter value. In this case, the numerator is the likelihood evaluated at π. Λ(n) = L(n π) L(n ˆp) = πn1 1 πn2 2 ( ) πn k n1 ( ) nk k π1 πk ˆp n1 1 ˆpn2 2 =. (1) ˆpn k ˆp 1 ˆp k k 261

2 To find the maximum likelihood estimator ˆp, we, as usual, begin by taking the logarithm ln L(p n) = n i ln p i. Because not every set of values for p i is admissible, we cannot just take derivatives, set them equal to 0 and solve. Indeed, we must find a maximum under the constraint s(p) = p i = 1. The maximization problem is now stated in terms of the method of Lagrange multipliers. This method tells us that at the maximum likelihood estimator (ˆp 1,..., ˆp k ), the gradient of ln L(p n) is proportional to the gradient of the constraint s(p). To explain this briefly, recall that the gradient of a function is a vector that is perpendicular to a level set of that function. In this case, ˆp s(p) is perpendicular to the level set {p; s(p) = 1}. Now imagine walking along the set of parameter values of p given by the constraint s(p) = 1, keeping track of the values of the score function ln L(p n). If the walk takes us from a value of the score function below l 0 to values above l 0 then (See Figure 1.), the level surfaces s(p) = 1 {p; s(p) = 1} and {ln L(p n = l 0 } intersect. Consequently, the gradients level sets for the score function, values increasing moving downward ˆp s(p) and ˆp ln L(p n) point in different directions on the intersection of these two surfaces. At Figure 1: Lagrance multipliers Level sets of the score function.shown in dashed blue. The level a local maximum or minimum of the set {s(p) = 1} shown in black. The gradients for the score function and the constraint indicated score function, the level surfaces are by dashed blue and black arrows, respectively. At the maximum, these two arrows are parallel, Their ratio λ is called the Langrange multiplier. If we view the blue dashed lines as contour lines tangent and the two gradients are parallel. In other words, the these two up or down hill. When the trail reaches its highest elevation, the trail is tangent to a contour line on a contour map and the black line as a trail. The crossing contour line indicates walking either gradients vectors are related by a constant of proportionality, λ, known as the Lagrange multiplier. Consequently, at extreme and the gradient for the hill is perpendicular to the trail. values, ( p 1 ln L(ˆp n),..., p ln L(ˆp n) = λ ˆp s(p). ) ln L(ˆp n) = p k ( n1,..., n ) k = λ(1,..., 1) ˆp 1 ˆp k ( p 1 s(p),..., Each of the components of the two vectors must be equal. In other words, ) s(p) p k n i ˆp i = λ, n i = λˆp i for all i = 1,..., k. (2) 262

3 Now sum this equality for all values of i and use the constraint s(p) = 1 to obtain Returning to (2), we have that n i = λ ˆp i = λs(p) = λ and n = λ. n 1 ˆp i = n and ˆp i = n i n. This is the answer we would guess - the estimate for p i is the fraction of observations in category i. We substitute for ˆp i in the likelihood ratio (1) to obtain Λ(n) = L(n π) L(n ˆp) = ( ) n1 ( ) nk nπ1 nπk. n 1 Recall that we reject the null hypothesis if this ratio is too low, i.e, the maximum likelihood under the null hypothesis is sufficiently smaller than the maximum likelihood under the alternative hypothesis. For random variables X 1, X 2,..., X n having distribution π, let s summarize the observations as suggested above N i = #{j; X j = i} Set N = (N 1,..., N k ) to be the vector of observed number of occurrences for each category i. Then the likelihood ratio test statistic 2 ln Λ n (N) = 2 N i ln nπ i = 2 N i ln N i N i nπ i has approximately a χ 2 k 1 distribution. The test statistic 2 ln Λ n (n) is generally rewritten using the notation O i = n i for the number of observed occurrences of i and E i = nπ i for the number of expected occurrences of i as given by H 0. Then, we can write the test statistic as 2 ln Λ n (O) = 2 O i ln O i (3) E i This is called the G test statistic. The traditional method for a test of goodness of fit was introduced between 1895 and 1900 by Karl Pearson and consequenttly has been in use for longer that the concept of likelihood ratio tests. To show the connection between the two tests, we begin by giving the quadratic Taylor polynomial approximation to the natural logarithm expanded around the value a 0 = 1. Apply this to 2 ln Λ n (n), we find that 2 ln Λ n (n) = 2 The data can be displayed in a table = 2 n k ln a (a 1) 1 2 (a 1)2. = 0 + ( (nπi ) n i 1 1 n i 2 (nπ i n i ) + (nπ i n i ) 2 n i. ( ) ) 2 nπi 1 n i ( ) 2 nπi n i 1 n i 263

4 Then, 2 ln Λ n (O) i 1 2 k observed O 1 O 2 O k expected E 1 E 2 E k (nπ i n i ) 2 n i (O i E i ) 2 E i. Example 1. The Red Cross recommends that a blood bank maintains 44% blood type O, 42% blood type A, 10% blood type B, 4% blood type AB. You suspect that the distribution of blood types in Tucson is not the same as the recommendation. In this case, the hypothesis is H 0 : p O = 0.44, p A = 0.42, p B = 0.10, p AB = 0.04 versus H 1 : at least one p i is unequal to the given values Based on 400 observations, we find 228 type O, 124 type A, 40 type B and 8 type AB. This give the table i O A B AB observed expected The commands to perform the chi-square test in R is shown below. The program computes the expected number of observations. > chisq.test(c(228,124,40,8),p=c(0.44,0.42,0.10,0.04)) Chi-squared test for given probabilities data: c(228, 124, 40, 8) X-squared = , df = 3, p-value = 8.977e-07 The number of degrees of freedom is 4 1 = 3. Note that the p-value is very low and so the distribution of blood types in Tucson is very unlikely to be the same as the national distribution. We can also perform the test using the G-statistic in (3): > O<-c(228,124,40,8) > E<-sum(O)*c(0.44,0.42,0.10,0.04) > Gstat<-2*sum(O*log(O/E)) > Gstat [1] > 1-pchisq(Gstat,3) [1] e-07 Example 2 (Hardy-Weinberg equilibrium). As we saw with Gregor Mendel s pea experiments, the two-allele Hardy- Weinberg principle states that after two generations of random mating the genotypic frequencies can be represented by a binomial distribution. So, if a population is segregating for two alleles A 1 and A 2 at an autosomal locus with frequencies p 1 and p 2. Then, random mating would give a proportion p 11 = p 2 1 for the A 1 A 1 genotype, p 12 = 2p 1 p 2 for the A 1 A 2 genotype, and p 22 = p 2 2 for the A 2 A 2 genotype. Then, p 1 = p p 12 p 2 = p p 12. This give us 5 parameters: p 1, p 2, p 11, p 12, p 22, with the constraints that p 1 + p 2 = p 11 + p 12 + p 22 = 1. Thus n = 3. The two restrictions from the Handy-Weinberg equilibrium above give us that k = 2. Thus, the chi-square test statistic will have 1 degree of freedom. 264

5 McDonald et al. (1996) examined variation at the CVJ5 locus in the American oyster, Crassostrea virginica. There were two alleles, L and S, and the genotype frequencies in Panacea, Florida were 14 LL, 21 LS, and 25 SS. So, So, the estimate of p 1 and p 2 are So, the expected number of observations is ˆp 11 = 14 60, ˆp 12 = 21 60, ˆp 22 = ˆp 1 = ˆp ˆp 12 = , ˆp 2 = ˆp ˆp 12 = E 11 = 60ˆp 2 1 = , E 12 = 60 2ˆp 1 ˆp 2 = , E 22 = 60ˆp 2 2 = The chi-square statistic The p-value (14 10) (21 29) (25 21)2 21 = = > 1-pchisq(4.569,1) [1] Thus, we have moderate evidence againt the null hypothesis of a Hardy-Weinberg equilibrium. Many forces may be the cause of this - non-random mating, selection, or migration to name a few possibilities. Exercise 3. Perform the chi square test using the G statistic for the example above. 1 Contingency tables For an r c contingency table, we consider two classifications A and B for an experiment. This gives c categories for classification A and r categories A 1,... A c B 1,... B r for classification B. Here, we write O ij to denote the number of occurences for which an individual falls into both category A i and category B j. The results is then organized into a two-way table. A 1 A 2 A c total B 1 O 11 O 12 O 1c O 1 B 2 O 21 O 22 O 2c O B r O r1 O r2 O rc O r total O 1 O 2 O c n The null hypothesis is that the classifications A and B are independent. To set the parameters for this model, we define p ij = P {an individual is simultaneously a member of category A i and category B j }. 265

6 Then, we have the parameter space Θ = {p = (p ij, 1 i r, 1 j c); p ij 0 for all i, j = 1, r c p ij = 1}. j=1 Write the marginal probabilities c r p i = p ij and p j = p ij. j=1 The null hypothesis of independence of the categories A and B can be written H 0 : p ij = p i p j, for all i, j versus H 1 : p ij p i p j, for some i, j. Follow the procedure as before for the goodness of fit test to end with the test statistic 2 r c j=1 O ij ln E ij O ij r c j=1 (O ij E ij ) 2 E ij. The null hypothesis p ij = p i p j can be written in terms of observed and expected observations as or E ij n = O i O j n n. E ij = O i O j n. Example 4. Returning to the study of the smoking habits of 5375 high school children in Tucson in 1967, here is a two-way table summarizing some of the results. The expected table is student student smokes does not smoke total 2 parents smoke parent smokes parents smoke total student student smokes does not smoke total 2 parents smoke parent smokes parents smoke total For example, E 11 = O 1 O 1 n = =

7 To compute the chi-square statistic ( ) ( ) ( ) ( ) ( ) ( ) = = The number of degrees of freedom is (r 1) (c 1) = (3 1) (2 1) = 2. Thus, the p-value > 1-pchisq(37.57,2) [1] e-09 is very small and leads us to reject the null hypothesis. Thus, children smoking habits are not independent of their parents smoking habits. An examination of the individual cells shows that the children of parents who do not smoke are less likely to smoke and children who have two parents that smoke are more likely to smoke. Under the null hypothesis, each cell has a mean of 1 and so values much greater than 1 show contribution that leads to the rejection of H 0. R does the computation for us using the chisq.test command > smoking<-matrix(c(400,416,188,1380,1823,1168),nrow=3) > smoking [,1] [,2] [1,] [2,] [3,] > chisq.test(smoking) Pearson s Chi-squared test data: smoking X-squared = , df = 2, p-value = 6.959e-09 Example 5. We have the following data on malaria in three regions of the world. The expected table is South Asia America Africa Total Malaria Type A Malaria Type B Malaria Type C Total

8 South Asia America Africa Total Malaria Type A Malaria Type B Malaria Type C Total The values for each of the cels in the chi-square statistic is = The degrees of freedom are (c 1)(r 1) = 2 2 = 4. In R, we have > malaria<-matrix(c(31,2,53,45,53,2,14,5,45),nrow=3,ncol=3) > malaria [,1] [,2] [,3] [1,] [2,] [3,] > chisq.test(malaria) Pearson s Chi-squared test data: malaria X-squared = , df = 4, p-value < 2.2e-16 This provides strong evidence against the independence of malaria type and continent. Exercise 6 (two-by-two tables). Here is the contingency table can be thought of as two sets of Bernoulli trials. group 1 group 2 total successes x 1 x 2 x 1 + x 2 failures n 1 x 1 n 2 x 2 (n 1 + n 2 ) (x 1 + x 2 ) total n 1 n 2 n 1 + n 2 Show that the chi-square test is equivalent to the two-sided two sample proportion test. 2 Applicability and Alternatives to Chi-squared Tests The chi-square test uses the central limit theorem and so is based on the ability to us a normal approximation. On criterion, the Cochran conditions requires no cell has count zero, and more than 80% of the cells have counts at least 5. If this does not hold, then Fisher s exact test uses the hypergeometric distribution (or its generalization) directly rather than normal approximation. For example, for the 2 2 table 268

9 Then, from the point of view of the B classification: A 1 A 2 total B 1 O 11 O 12 O 1 B 2 O 21 O 22 O 2 total O 1 O 2 n From the O 1 belonging to category B 1, O 11 also belong to A 1 This outcome can be accomplished in ) ( O1 ways. From the O 2 belonging to category B 2, O 12 also belong to A 1 This outcome can be accomplished in ) O 11 ( O2 O 21 We have O 1 from n observations with A 1. This can be accomplished in ( ) O 1 n ways. Under the null hypothesis that every individual can be placed in any group, provided we have the given marginal information. In this case, the probability of the table above has hypergeometric formula ( O1 )( O2 ) O 11 O ( 21 O 1 n ) = O 1!O 2!O 1!O 2! O 11!O 12!O 21!O 22!n!. (4) Notice that the formula is symmetric in the column and row variables. Thus, if we had derived the hypergeometric formula from the point of view of the A classification we would have obtained exactly the dame formula (4). To complete the exact test, we now compute the hypergeometric probabilities over all possible choices for entries in the cells that result in the given marginal values, and rank these probabilities from most likely to least likely. Find the ranking of the actual data. For a one-sided test of too rare, the P -value is the sum of probabilities of the ranking lower than that of the data. A similar idea applies to general r c tables. Example 7. We now return to a table on hemoglobin genotypes on two Indonesian islands. Recall that heterozygotes are protected against malaria. genotype AA AE EE Flores Sumba We noted that heterozygotes are rare on Flores and that it appears that malaria is less prevalent there since the heterozygote does not provide an adaptive advantage. Here are both the chisquare test and the Fisher exact test. 269

10 > genotype<-matrix(c(128,119,6,78,0,4),nrow=2) > genotype [,1] [,2] [,3] [1,] [2,] > chisq.test(genotype) Pearson s Chi-squared test data: genotype X-squared = , df = 2, p-value = 1.238e-12 Warning message: In chisq.test(genotype) : Chi-squared approximation may be incorrect > fisher.test(genotype) Fisher s Exact Test for Count Data data: genotype p-value = 3.907e-15 alternative hypothesis: two.sided Note that R cautions against the use of the chisquare test with these data. 3 Answer to Selected Exercise 3. Here is the R output. > O<-c(14,21,25) > phat<-c(o[1]+o[2]/2,o[3]+o[2]/2)/sum(o) > phat [1] > E<-sum(O)*c(phat[1]ˆ2,2*phat[1]*phat[2],phat[2]ˆ2) > E [1] > sum(e) [1] 60 > Gstat<-2*sum(O*log(O/E)) > Gstat [1] > 1-pchisq(Gstat,1) [1] The table of expected observations successes group 1 group 2 total n 1(x 1+x 2) n 2(x 1+x 2) n 1+n 2 n 1+n 2 x 1 + x 2 n failures 1((n 1+n 2) (x 1+x 2)) n 2((n 1+n 2) (x 1+x 2)) n 1+n 2 n 1+n 2 (n 1 + n 2 ) (x 1 + x 2 ) total n 1 n 2 n 1 + n 2 270

11 Now, write ˆp i = x i /n i for the sample proportions from each group, and ˆp 0 = x 1 + x 2 n 1 + n 2 for the pooled sample proportion. Then we have the table of observed and expected observations observed group 1 group 2 total successes n 1 ˆp 1 n 2 ˆp 2 (n 1 + n 2 )ˆp 0 failures n 1 (1 ˆp 1 ) n 2 (1 ˆp 2 ) (n 1 + n 2 )(1 ˆp 0 ) total n 1 n 2 n 1 + n 2 The chi-square test statistic expected group 1 group 2 total successes n 1 ˆp 0 n 2 ˆp 0 (n 1 + n 2 )ˆp 0 failures n 1 (1 ˆp 0 ) n 2 (1 ˆp 0 ) (n 1 + n 2 )(1 ˆp 0 ) total n 1 n 2 n 1 + n 2 (n 1(ˆp 1 ˆp 0)) 2 n 1 ˆp 0 + (n2(ˆp2 ˆp0))2 n 2 ˆp 0 + (n1((1 ˆp1) (1 ˆp0)))2 n 1(1 ˆp 0) + (n2((1 ˆp2)+(1 ˆp0)))2 n 2(1 ˆp 0) = n 1 (ˆp 1 ˆp 0) 2 ˆp 0 + n 2 (ˆp 2 ˆp 0) 2 ˆp 0 (ˆp + n 1 ˆp 0) 2 (ˆp 1 (1 ˆp 0) + n 2 ˆp 0) 2 2 (1 ˆp 0) = n 1 (ˆp 1 ˆp 0 ) 2 1 ˆp + n 0(1 p 0) 2(ˆp 2 ˆp 0 ) 2 1 n = 1(ˆp 1 ˆp 0) 2 +n 2(ˆp 2 ˆp 0) 2 ˆp 0(1 p 0) = ln Λ(x 1, x 2 ) from the likelihood ratio computation for the two-sided two sample proportion test ˆp 0(1 p 0) 271

### Topic 19: Goodness of Fit

Topic 19: November 24, 2009 A goodness of fit test examine the case of a sequence if independent experiments each of which can have 1 of k possible outcomes. In terms of hypothesis testing, let π = (π

### O, A, B, and AB. 0 for all i =1,...,k,

Topic 1 1.1 Fit of a Distribution Goodness of fit tests examine the case of a sequence of independent observations each of which can have 1 of k possible categories. For example, each of us has one of

### Topic 21 Goodness of Fit

Topic 21 Goodness of Fit Fit of a Distribution 1 / 14 Outline Fit of a Distribution Blood Bank Likelihood Function Likelihood Ratio Lagrange Multipliers Hanging Chi-Gram 2 / 14 Fit of a Distribution Goodness

### Testing on proportions

Testing on proportions Textbook Section 5.4 April 7, 2011 Example 1. X 1,, X n Bernolli(p). Wish to test H 0 : p p 0 H 1 : p > p 0 (1) Consider a related problem The likelihood ratio test is where c is

### Goodness of fit - 2 classes

Goodness of fit - 2 classes A B 78 22 Do these data correspond reasonably to the proportions 3:1? We previously discussed options for testing p A =0.75! Exact p-value Exact confidence interval Normal approximation

### Unit 29 Chi-Square Goodness-of-Fit Test

Unit 29 Chi-Square Goodness-of-Fit Test Objectives: To perform the chi-square hypothesis test concerning proportions corresponding to more than two categories of a qualitative variable To perform the Bonferroni

### Goodness of Fit Goodness of fit - 2 classes

Goodness of Fit Goodness of fit - 2 classes A B 78 22 Do these data correspond reasonably to the proportions 3:1? We previously discussed options for testing p A =0.75! Exact p-value Exact confidence interval

### CHAPTER NINE. Key Concepts. McNemar s test for matched pairs combining p-values likelihood ratio criterion

CHAPTER NINE Key Concepts chi-square distribution, chi-square test, degrees of freedom observed and expected values, goodness-of-fit tests contingency table, dependent (response) variables, independent

### Chapter 19 The Chi-Square Test

Tutorial for the integration of the software R with introductory statistics Copyright c Grethe Hystad Chapter 19 The Chi-Square Test In this chapter, we will discuss the following topics: We will plot

### Chapter 4. The analysis of Segregation

Chapter 4. The analysis of Segregation 1 The law of segregation (Mendel s rst law) There are two alleles in the two homologous chromosomes at a certain marker. One of the two alleles received from one

### Chi Square for Contingency Tables

2 x 2 Case Chi Square for Contingency Tables A test for p 1 = p 2 We have learned a confidence interval for p 1 p 2, the difference in the population proportions. We want a hypothesis testing procedure

### MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...

MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

### We know from STAT.1030 that the relevant test statistic for equality of proportions is:

2. Chi 2 -tests for equality of proportions Introduction: Two Samples Consider comparing the sample proportions p 1 and p 2 in independent random samples of size n 1 and n 2 out of two populations which

### Lecture 7: Binomial Test, Chisquare

Lecture 7: Binomial Test, Chisquare Test, and ANOVA May, 01 GENOME 560, Spring 01 Goals ANOVA Binomial test Chi square test Fisher s exact test Su In Lee, CSE & GS suinlee@uw.edu 1 Whirlwind Tour of One/Two

### Topic 8. Chi Square Tests

BE540W Chi Square Tests Page 1 of 5 Topic 8 Chi Square Tests Topics 1. Introduction to Contingency Tables. Introduction to the Contingency Table Hypothesis Test of No Association.. 3. The Chi Square Test

### Statistics for EES and MEME Chi-square tests and Fisher s exact test

Statistics for EES and MEME Chi-square tests and Fisher s exact test Dirk Metzler 3. June 2013 Contents 1 X 2 goodness-of-fit test 1 2 X 2 test for homogeneity/independence 3 3 Fisher s exact test 7 4

### CHAPTER 11. GOODNESS OF FIT AND CONTINGENCY TABLES

CHAPTER 11. GOODNESS OF FIT AND CONTINGENCY TABLES The chi-square distribution was discussed in Chapter 4. We now turn to some applications of this distribution. As previously discussed, chi-square is

### Bivariate Statistics Session 2: Measuring Associations Chi-Square Test

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test Features Of The Chi-Square Statistic The chi-square test is non-parametric. That is, it makes no assumptions about the distribution

### 13.2 The Chi Square Test for Homogeneity of Populations The setting: Used to compare distribution of proportions in two or more populations.

13.2 The Chi Square Test for Homogeneity of Populations The setting: Used to compare distribution of proportions in two or more populations. Data is organized in a two way table Explanatory variable (Treatments)

### Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

### 7 Hypothesis testing - one sample tests

7 Hypothesis testing - one sample tests 7.1 Introduction Definition 7.1 A hypothesis is a statement about a population parameter. Example A hypothesis might be that the mean age of students taking MAS113X

### Module 5 Hypotheses Tests: Comparing Two Groups

Module 5 Hypotheses Tests: Comparing Two Groups Objective: In medical research, we often compare the outcomes between two groups of patients, namely exposed and unexposed groups. At the completion of this

### Tests for Two Proportions

Chapter 200 Tests for Two Proportions Introduction This module computes power and sample size for hypothesis tests of the difference, ratio, or odds ratio of two independent proportions. The test statistics

### Chi-square test Fisher s Exact test

Lesson 1 Chi-square test Fisher s Exact test McNemar s Test Lesson 1 Overview Lesson 11 covered two inference methods for categorical data from groups Confidence Intervals for the difference of two proportions

### 2 GENETIC DATA ANALYSIS

2.1 Strategies for learning genetics 2 GENETIC DATA ANALYSIS We will begin this lecture by discussing some strategies for learning genetics. Genetics is different from most other biology courses you have

### Poisson Models for Count Data

Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

### MATH 10: Elementary Statistics and Probability Chapter 11: The Chi-Square Distribution

MATH 10: Elementary Statistics and Probability Chapter 11: The Chi-Square Distribution Tony Pourmohamad Department of Mathematics De Anza College Spring 2015 Objectives By the end of this set of slides,

### CATEGORICAL DATA Chi-Square Tests for Univariate Data

CATEGORICAL DATA Chi-Square Tests For Univariate Data 1 CATEGORICAL DATA Chi-Square Tests for Univariate Data Recall that a categorical variable is one in which the possible values are categories or groupings.

### Today s lecture. Lecture 6: Dichotomous Variables & Chi-square tests. Proportions and 2 2 tables. How do we compare these proportions

Today s lecture Lecture 6: Dichotomous Variables & Chi-square tests Sandy Eckel seckel@jhsph.edu Dichotomous Variables Comparing two proportions using 2 2 tables Study Designs Relative risks, odds ratios

### Chi-square test Testing for independeny The r x c contingency tables square test

Chi-square test Testing for independeny The r x c contingency tables square test 1 The chi-square distribution HUSRB/0901/1/088 Teaching Mathematics and Statistics in Sciences: Modeling and Computer-aided

### The Chi Square Test. Diana Mindrila, Ph.D. Phoebe Balentyne, M.Ed. Based on Chapter 23 of The Basic Practice of Statistics (6 th ed.

The Chi Square Test Diana Mindrila, Ph.D. Phoebe Balentyne, M.Ed. Based on Chapter 23 of The Basic Practice of Statistics (6 th ed.) Concepts: Two-Way Tables The Problem of Multiple Comparisons Expected

### Population and Community Dynamics

Population and Community Dynamics Part 1. Genetic Diversity in Populations Pages 676 to 701 Part 2. Population Growth and Interactions Pages 702 to 745 Review Evolution by Natural Selection new variants

### Section 12 Part 2. Chi-square test

Section 12 Part 2 Chi-square test McNemar s Test Section 12 Part 2 Overview Section 12, Part 1 covered two inference methods for categorical data from 2 groups Confidence Intervals for the difference of

### Sampling and Hypothesis Testing

Population and sample Sampling and Hypothesis Testing Allin Cottrell Population : an entire set of objects or units of observation of one sort or another. Sample : subset of a population. Parameter versus

### 9-3.4 Likelihood ratio test. Neyman-Pearson lemma

9-3.4 Likelihood ratio test Neyman-Pearson lemma 9-1 Hypothesis Testing 9-1.1 Statistical Hypotheses Statistical hypothesis testing and confidence interval estimation of parameters are the fundamental

### Categorical Data Analysis

Richard L. Scheaffer University of Florida The reference material and many examples for this section are based on Chapter 8, Analyzing Association Between Categorical Variables, from Statistical Methods

### Non-Inferiority Tests for Two Proportions

Chapter 0 Non-Inferiority Tests for Two Proportions Introduction This module provides power analysis and sample size calculation for non-inferiority and superiority tests in twosample designs in which

### 1.5 Oneway Analysis of Variance

Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

### Chi-square (χ 2 ) Tests

Math 442 - Mathematical Statistics II May 5, 2008 Common Uses of the χ 2 test. 1. Testing Goodness-of-fit. Chi-square (χ 2 ) Tests 2. Testing Equality of Several Proportions. 3. Homogeneity Test. 4. Testing

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### Appendix A The Chi-Square (32) Test in Genetics

Appendix A The Chi-Square (32) Test in Genetics With infinitely large sample sizes, the ideal result of any particular genetic cross is exact conformation to the expected ratio. For example, a cross between

### Allele Frequencies and Hardy Weinberg Equilibrium

Allele Frequencies and Hardy Weinberg Equilibrium Summer Institute in Statistical Genetics 013 Module 8 Topic Allele Frequencies and Genotype Frequencies How do allele frequencies relate to genotype frequencies

### CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont

CONTINGENCY TABLES ARE NOT ALL THE SAME David C. Howell University of Vermont To most people studying statistics a contingency table is a contingency table. We tend to forget, if we ever knew, that contingency

### Hypothesis Testing. Learning Objectives. After completing this module, the student will be able to

Hypothesis Testing Learning Objectives After completing this module, the student will be able to carry out a statistical test of significance calculate the acceptance and rejection region calculate and

### VI. Introduction to Logistic Regression

VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models

### Hardy-Weinberg Equilibrium Problems

Hardy-Weinberg Equilibrium Problems 1. The frequency of two alleles in a gene pool is 0.19 (A) and 0.81(a). Assume that the population is in Hardy-Weinberg equilibrium. (a) Calculate the percentage of

### Odds ratio, Odds ratio test for independence, chi-squared statistic.

Odds ratio, Odds ratio test for independence, chi-squared statistic. Announcements: Assignment 5 is live on webpage. Due Wed Aug 1 at 4:30pm. (9 days, 1 hour, 58.5 minutes ) Final exam is Aug 9. Review

### Test of Hypotheses. Since the Neyman-Pearson approach involves two statistical hypotheses, one has to decide which one

Test of Hypotheses Hypothesis, Test Statistic, and Rejection Region Imagine that you play a repeated Bernoulli game: you win \$1 if head and lose \$1 if tail. After 10 plays, you lost \$2 in net (4 heads

### The Chi-Square Test. STAT E-50 Introduction to Statistics

STAT -50 Introduction to Statistics The Chi-Square Test The Chi-square test is a nonparametric test that is used to compare experimental results with theoretical models. That is, we will be comparing observed

### Chi-Square Tests. In This Chapter BONUS CHAPTER

BONUS CHAPTER Chi-Square Tests In the previous chapters, we explored the wonderful world of hypothesis testing as we compared means and proportions of one, two, three, and more populations, making an educated

### Chi Squared and Fisher's Exact Tests. Observed vs Expected Distributions

BMS 617 Statistical Techniques for the Biomedical Sciences Lecture 11: Chi-Squared and Fisher's Exact Tests Chi Squared and Fisher's Exact Tests This lecture presents two similarly structured tests, Chi-squared

### Chi-Square Test. Contingency Tables. Contingency Tables. Chi-Square Test for Independence. Chi-Square Tests for Goodnessof-Fit

Chi-Square Tests 15 Chapter Chi-Square Test for Independence Chi-Square Tests for Goodness Uniform Goodness- Poisson Goodness- Goodness Test ECDF Tests (Optional) McGraw-Hill/Irwin Copyright 2009 by The

### Chapter 4: Statistical Hypothesis Testing

Chapter 4: Statistical Hypothesis Testing Christophe Hurlin November 20, 2015 Christophe Hurlin () Advanced Econometrics - Master ESA November 20, 2015 1 / 225 Section 1 Introduction Christophe Hurlin

### For a particular allele N, its frequency in a population is calculated using the formula:

Date: Calculating Allele Frequency Definitions: Allele frequency is a measure of the relative frequency of an allele in a population. Microevolution is defined as the change in the frequency of alleles

### Analyzing tables. Chi-squared, exact and monte carlo tests. Jon Michael Gran Department of Biostatistics, UiO

Analyzing tables Chi-squared, exact and monte carlo tests Jon Michael Gran Department of Biostatistics, UiO MF9130 Introductory course in statistics Thursday 26.05.2011 1 / 50 Overview Aalen chapter 6.5,

### Chapter 23. Two Categorical Variables: The Chi-Square Test

Chapter 23. Two Categorical Variables: The Chi-Square Test 1 Chapter 23. Two Categorical Variables: The Chi-Square Test Two-Way Tables Note. We quickly review two-way tables with an example. Example. Exercise

### Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

### Two-Sample T-Test from Means and SD s

Chapter 07 Two-Sample T-Test from Means and SD s Introduction This procedure computes the two-sample t-test and several other two-sample tests directly from the mean, standard deviation, and sample size.

### Applications in population genetics. Hanan Hamamy Department of Genetic Medicine and Development Geneva University

Applications in population genetics Hanan Hamamy Department of Genetic Medicine and Development Geneva University Training Course in Sexual and Reproductive Health Research Geneva 2013 Population genetics

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### Is it statistically significant? The chi-square test

UAS Conference Series 2013/14 Is it statistically significant? The chi-square test Dr Gosia Turner Student Data Management and Analysis 14 September 2010 Page 1 Why chi-square? Tests whether two categorical

### Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

### Simulating Chi-Square Test Using Excel

Simulating Chi-Square Test Using Excel Leslie Chandrakantha John Jay College of Criminal Justice of CUNY Mathematics and Computer Science Department 524 West 59 th Street, New York, NY 10019 lchandra@jjay.cuny.edu

### Algebra 2 Chapter 1 Vocabulary. identity - A statement that equates two equivalent expressions.

Chapter 1 Vocabulary identity - A statement that equates two equivalent expressions. verbal model- A word equation that represents a real-life problem. algebraic expression - An expression with variables.

### Hypothesis Testing COMP 245 STATISTICS. Dr N A Heard. 1 Hypothesis Testing 2 1.1 Introduction... 2 1.2 Error Rates and Power of a Test...

Hypothesis Testing COMP 45 STATISTICS Dr N A Heard Contents 1 Hypothesis Testing 1.1 Introduction........................................ 1. Error Rates and Power of a Test.............................

### Notes for STA 437/1005 Methods for Multivariate Data

Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.

### 1. Comparing Two Means: Dependent Samples

1. Comparing Two Means: ependent Samples In the preceding lectures we've considered how to test a difference of two means for independent samples. Now we look at how to do the same thing with dependent

### Non-Inferiority Tests for Two Means using Differences

Chapter 450 on-inferiority Tests for Two Means using Differences Introduction This procedure computes power and sample size for non-inferiority tests in two-sample designs in which the outcome is a continuous

### Testing Research and Statistical Hypotheses

Testing Research and Statistical Hypotheses Introduction In the last lab we analyzed metric artifact attributes such as thickness or width/thickness ratio. Those were continuous variables, which as you

### Senior Secondary Australian Curriculum

Senior Secondary Australian Curriculum Mathematical Methods Glossary Unit 1 Functions and graphs Asymptote A line is an asymptote to a curve if the distance between the line and the curve approaches zero

### How to Conduct a Hypothesis Test

How to Conduct a Hypothesis Test The idea of hypothesis testing is relatively straightforward. In various studies we observe certain events. We must ask, is the event due to chance alone, or is there some

### Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

### Topic 1 Displaying Data

Topic 1 Displaying Data Categorical Data 1 / 13 Outline Types of Data Pie Charts Bar Charts Two-way Tables Segmented Bar Chart 2 / 13 Types of Data A data set provides information about a group of individuals.

### The Neyman-Pearson lemma. The Neyman-Pearson lemma

The Neyman-Pearson lemma In practical hypothesis testing situations, there are typically many tests possible with significance level α for a null hypothesis versus alternative hypothesis. This leads to

### Mendelian Genetics. I. Background

Mendelian Genetics Objectives 1. To understand the Principles of Segregation and Independent Assortment. 2. To understand how Mendel s principles can explain transmission of characters from one generation

### Chi-Square Tests and the F-Distribution. Goodness of Fit Multinomial Experiments. Chapter 10

Chapter 0 Chi-Square Tests and the F-Distribution 0 Goodness of Fit Multinomial xperiments A multinomial experiment is a probability experiment consisting of a fixed number of trials in which there are

### Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

### statistics Chi-square tests and nonparametric Summary sheet from last time: Hypothesis testing Summary sheet from last time: Confidence intervals

Summary sheet from last time: Confidence intervals Confidence intervals take on the usual form: parameter = statistic ± t crit SE(statistic) parameter SE a s e sqrt(1/n + m x 2 /ss xx ) b s e /sqrt(ss

### Chi Square (χ 2 ) Statistical Instructions EXP 3082L Jay Gould s Elaboration on Christensen and Evans (1980)

Chi Square (χ 2 ) Statistical Instructions EXP 3082L Jay Gould s Elaboration on Christensen and Evans (1980) For the Driver Behavior Study, the Chi Square Analysis II is the appropriate analysis below.

### 3.4 Statistical inference for 2 populations based on two samples

3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted

### Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation

Parkland College A with Honors Projects Honors Program 2014 Calculating P-Values Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating P-Values" (2014). A with Honors Projects.

### Two Categorical Variables: the Chi-Square Test

CHAPTER 22 Two Categorical Variables: the Chi-Square Test 22.1 22.2 22.3 22.4 Data Analysis for Two-Way Tables Inference for Two-Way Tables Goodness of Fit Selected Exercise Solutions Introduction We saw

### Topic 6: Conditional Probability and Independence

Topic 6: September 15-20, 2011 One of the most important concepts in the theory of probability is based on the question: How do we modify the probability of an event in light of the fact that something

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### THE FIRST SET OF EXAMPLES USE SUMMARY DATA... EXAMPLE 7.2, PAGE 227 DESCRIBES A PROBLEM AND A HYPOTHESIS TEST IS PERFORMED IN EXAMPLE 7.

THERE ARE TWO WAYS TO DO HYPOTHESIS TESTING WITH STATCRUNCH: WITH SUMMARY DATA (AS IN EXAMPLE 7.17, PAGE 236, IN ROSNER); WITH THE ORIGINAL DATA (AS IN EXAMPLE 8.5, PAGE 301 IN ROSNER THAT USES DATA FROM

### Association Between Variables

Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

### Confidence Intervals for the Difference Between Two Means

Chapter 47 Confidence Intervals for the Difference Between Two Means Introduction This procedure calculates the sample size necessary to achieve a specified distance from the difference in sample means

### Basic Principles of Forensic Molecular Biology and Genetics. Population Genetics

Basic Principles of Forensic Molecular Biology and Genetics Population Genetics Significance of a Match What is the significance of: a fiber match? a hair match? a glass match? a DNA match? Meaning of

### Population Genetics (Outline)

Population Genetics (Outline) Definition of terms of population genetics: population, species, gene, pool, gene flow Calculation of genotypic of homozygous dominant, recessive, or heterozygous individuals,

### Chapter 11 Chi square Tests

11.1 Introduction 259 Chapter 11 Chi square Tests 11.1 Introduction In this chapter we will consider the use of chi square tests (χ 2 tests) to determine whether hypothesized models are consistent with

### Math 58. Rumbos Fall 2008 1. Solutions to Review Problems for Exam 2

Math 58. Rumbos Fall 2008 1 Solutions to Review Problems for Exam 2 1. For each of the following scenarios, determine whether the binomial distribution is the appropriate distribution for the random variable

### Unit 24 Hypothesis Tests about Means

Unit 24 Hypothesis Tests about Means Objectives: To recognize the difference between a paired t test and a two-sample t test To perform a paired t test To perform a two-sample t test A measure of the amount

### INTRODUCTION TO GENETICS USING TOBACCO (Nicotiana tabacum) SEEDLINGS

INTRODUCTION TO GENETICS USING TOBACCO (Nicotiana tabacum) SEEDLINGS By Dr. Susan Petro Based on a lab by Dr. Elaine Winshell Nicotiana tabacum Objectives To apply Mendel s Law of Segregation To use Punnett

### MCB142/IB163 Mendelian and Population Genetics 9/19/02

MCB142/IB163 Mendelian and Population Genetics 9/19/02 Practice questions lectures 5-12 (Hardy Weinberg, chi-square test, Mendel s second law, gene interactions, linkage maps, mapping human diseases, non-random

### MATH4427 Notebook 3 Spring MATH4427 Notebook Hypotheses: Definitions and Examples... 3

MATH4427 Notebook 3 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 3 MATH4427 Notebook 3 3 3.1 Hypotheses: Definitions and Examples............................

### Probability Calculator

Chapter 95 Introduction Most statisticians have a set of probability tables that they refer to in doing their statistical wor. This procedure provides you with a set of electronic statistical tables that