# Goodness of fit - 2 classes

## Transcription

1 Goodness of fit - 2 classes

| A | B |
|---|---|
| 78 | 22 |

Do these data correspond reasonably to the proportions 3:1?

We previously discussed options for testing $p_A = 0.75$:

- Exact p-value
- Exact confidence interval
- Normal approximation

2 Goodness of fit - 3 classes

| AA | AB | BB |
|---|---|---|
| 35 | 43 | 22 |

Do these data correspond reasonably to the proportions 1:2:1?

3 Multinomial distribution

Imagine an urn with $k$ types of balls. Let $p_i$ denote the proportion of type $i$. Draw $n$ balls with replacement.

Outcome: $(n_1, n_2, \ldots, n_k)$, with $\sum_i n_i = n$, where $n_i$ is the number of balls drawn that were of type $i$.

$$P(X_1 = n_1, \ldots, X_k = n_k) = \frac{n!}{n_1! \cdots n_k!}\; p_1^{n_1} \cdots p_k^{n_k} \quad \text{if } 0 \le n_i \le n \text{ and } \sum_i n_i = n.$$

Otherwise $P(X_1 = n_1, \ldots, X_k = n_k) = 0$.

4 Example

Let $(p_1, p_2, p_3) = (0.25, 0.50, 0.25)$ and $n = 100$.

$$P(X_1 = 35, X_2 = 43, X_3 = 22) = \frac{100!}{35!\; 43!\; 22!}\; 0.25^{35}\; 0.50^{43}\; 0.25^{22}$$

Rather brutal, numerically speaking. Take logs (and use a computer).
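The advice to take logs can be sketched in R as follows. This is a minimal illustration, not the lecturer's code: the variable names are assumptions, and `lgamma(x + 1)` is used as the standard way to get $\ln(x!)$ without overflow. The built-in `dmultinom()` is evaluated alongside as a cross-check.

```r
# Multinomial probability P(X1 = 35, X2 = 43, X3 = 22) computed on the log scale.
counts <- c(35, 43, 22)
probs  <- c(0.25, 0.50, 0.25)

# log(n!) - sum(log(n_i!)) + sum(n_i * log(p_i)), using lgamma(x + 1) = log(x!)
log_prob <- lgamma(sum(counts) + 1) - sum(lgamma(counts + 1)) +
  sum(counts * log(probs))
prob <- exp(log_prob)

# Cross-check against R's built-in multinomial probability function.
prob_builtin <- dmultinom(counts, prob = probs)

print(c(log_scale = prob, builtin = prob_builtin))
```

Working with `log_prob` directly is usually preferable when many such probabilities have to be multiplied or compared.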

5 Goodness of fit test

We observe $(n_1, n_2, n_3) \sim \text{Multinomial}(n, p = \{p_1, p_2, p_3\})$.

We seek to test $H_0: p_1 = 0.25,\ p_2 = 0.5,\ p_3 = 0.25$ versus $H_a$: $H_0$ is false.

We need two things:

- A test statistic.
- The null distribution of the test statistic.

6 The likelihood-ratio test (LRT)

Back to the first example:

| A | B |
|---|---|
| $n_A$ | $n_B$ |

Test $H_0: (p_A, p_B) = (\pi_A, \pi_B)$ versus $H_a: (p_A, p_B) \ne (\pi_A, \pi_B)$.

MLE under $H_a$: $\hat p_A = n_A / n$, where $n = n_A + n_B$.

Likelihood under $H_a$:

$$L_a = \Pr(n_A \mid p_A = \hat p_A) = \binom{n}{n_A}\, \hat p_A^{\,n_A}\, (1 - \hat p_A)^{n - n_A}$$

Likelihood under $H_0$:

$$L_0 = \Pr(n_A \mid p_A = \pi_A) = \binom{n}{n_A}\, \pi_A^{\,n_A}\, (1 - \pi_A)^{n - n_A}$$

Likelihood ratio test statistic: $\text{LRT} = 2 \ln(L_a / L_0)$

Some clever people have shown that if $H_0$ is true, then LRT follows a $\chi^2(\text{df}=1)$ distribution (approximately).

7 Likelihood-ratio test for the example

We observed $n_A = 78$ and $n_B = 22$.

$H_0: (p_A, p_B) = (0.75, 0.25)$ versus $H_a: (p_A, p_B) \ne (0.75, 0.25)$

$$L_a = \Pr(n_A = 78 \mid p_A = 0.78) = \binom{100}{78}\, 0.78^{78}\, 0.22^{22}$$

$$L_0 = \Pr(n_A = 78 \mid p_A = 0.75) = \binom{100}{78}\, 0.75^{78}\, 0.25^{22}$$

$\text{LRT} = 2 \ln(L_a / L_0) = 0.49$.

Using a $\chi^2(\text{df}=1)$ distribution, we get a p-value of about 0.48. We therefore have no evidence against the null hypothesis.

In R: p-value = `1 - pchisq(0.49, 1)`
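The calculation on this slide can be reproduced with a few lines of R. This is a sketch under the slide's own setup; the variable names are assumptions, and `dbinom(..., log = TRUE)` is used so that the two likelihoods are compared on the log scale.

```r
# Likelihood-ratio test of H0: p_A = 0.75 for n_A = 78, n_B = 22.
n_A  <- 78
n_B  <- 22
n    <- n_A + n_B
pi_A <- 0.75          # null value of p_A
p_hat <- n_A / n      # MLE under the alternative

log_La <- dbinom(n_A, n, p_hat, log = TRUE)   # log-likelihood at the MLE
log_L0 <- dbinom(n_A, n, pi_A,  log = TRUE)   # log-likelihood under H0

lrt     <- 2 * (log_La - log_L0)
p_value <- pchisq(lrt, df = 1, lower.tail = FALSE)

print(round(c(LRT = lrt, p_value = p_value), 2))
```

`pchisq(x, df, lower.tail = FALSE)` is equivalent to `1 - pchisq(x, df)` but keeps more precision for very small p-values.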

8 Null distribution

[Figure: null distribution of the LRT statistic, with the observed LRT marked.]

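9 A little math...

Let $n = n_A + n_B$, $n_A^0 = E[n_A \mid H_0] = n\,\pi_A$, and $n_B^0 = E[n_B \mid H_0] = n\,\pi_B$.

Then

$$\frac{L_a}{L_0} = \left(\frac{n_A}{n_A^0}\right)^{n_A} \left(\frac{n_B}{n_B^0}\right)^{n_B}$$

Or equivalently

$$\text{LRT} = 2\, n_A \ln\!\left(\frac{n_A}{n_A^0}\right) + 2\, n_B \ln\!\left(\frac{n_B}{n_B^0}\right).$$

Why do this?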
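The step from the two binomial likelihoods to this ratio can be written out explicitly. The following is a worked sketch that uses only the definitions above: the binomial coefficients cancel, $n - n_A = n_B$, and $\hat p_A = n_A/n$, $1 - \hat p_A = n_B/n$, $\pi_A = n_A^0/n$, $1 - \pi_A = n_B^0/n$.

```latex
\frac{L_a}{L_0}
  = \frac{\hat p_A^{\,n_A}\,(1-\hat p_A)^{n_B}}{\pi_A^{\,n_A}\,(1-\pi_A)^{n_B}}
  = \left(\frac{n_A/n}{n_A^0/n}\right)^{n_A}
    \left(\frac{n_B/n}{n_B^0/n}\right)^{n_B}
  = \left(\frac{n_A}{n_A^0}\right)^{n_A}
    \left(\frac{n_B}{n_B^0}\right)^{n_B},
\qquad
\mathrm{LRT} = 2\ln\frac{L_a}{L_0}
  = 2\,n_A \ln\frac{n_A}{n_A^0} + 2\,n_B \ln\frac{n_B}{n_B^0}.
```

Written this way, the statistic depends only on observed and expected counts, which is the form that carries over to more than two groups.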

10 Generalization to more than two groups

If we have $k$ groups, then the likelihood ratio test statistic is

$$\text{LRT} = 2 \sum_{i=1}^{k} n_i \ln\!\left(\frac{n_i}{n_i^0}\right)$$

If $H_0$ is true, $\text{LRT} \sim \chi^2(\text{df} = k-1)$, approximately.

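11 Null distributions

[Figure: null distributions of the LRT statistic, one panel per case.]

- 3 groups: $\chi^2(\text{df}=2)$
- 5 groups: $\chi^2(\text{df}=4)$
- 7 groups: $\chi^2(\text{df}=6)$
- 9 groups: $\chi^2(\text{df}=8)$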

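12 Example

In a dihybrid cross of tomatoes we expect the ratio of the phenotypes to be 9:3:3:1. In 1611 tomatoes, we observe the numbers 926, 288, 293, 104. Do these numbers support our hypothesis?

| Phenotype | $n_i$ | $n_i^0$ | $n_i / n_i^0$ | $n_i \ln(n_i / n_i^0)$ |
|---|---|---|---|---|
| Tall, cut-leaf | 926 | 906.2 | 1.02 | 20.03 |
| Tall, potato-leaf | 288 | 302.1 | 0.95 | -13.73 |
| Dwarf, cut-leaf | 293 | 302.1 | 0.97 | -8.93 |
| Dwarf, potato-leaf | 104 | 100.7 | 1.03 | 3.37 |
| Sum | 1611 | 1611 | | 0.74 |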

13 Results

[Figure: null distribution of the LRT statistic, with the observed LRT marked.]

The test statistic LRT is 1.48. Using a $\chi^2(\text{df}=3)$ distribution, we get a p-value of 0.69. We therefore have no evidence against the hypothesis that the ratio of the phenotypes is 9:3:3:1.
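The tomato calculation can be laid out in R as a small table. This is a sketch rather than the original analysis; the column names are assumptions chosen to mirror the symbols $n_i$ and $n_i^0$.

```r
# LRT goodness-of-fit test for the 9:3:3:1 tomato example.
obs <- c(926, 288, 293, 104)
p0  <- c(9, 3, 3, 1) / 16
expected <- sum(obs) * p0                 # n_i^0 = n * p_i under H0

tab <- data.frame(
  phenotype = c("Tall, cut-leaf", "Tall, potato-leaf",
                "Dwarf, cut-leaf", "Dwarf, potato-leaf"),
  n     = obs,
  n0    = expected,
  ratio = obs / expected,
  term  = obs * log(obs / expected)       # n_i * ln(n_i / n_i^0)
)

lrt     <- 2 * sum(tab$term)
p_value <- pchisq(lrt, df = length(obs) - 1, lower.tail = FALSE)

print(tab)
print(round(c(LRT = lrt, p_value = p_value), 2))
```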

14 The chi-square test

There is an alternative technique. The test is called the chi-square test, and has the greater tradition in the literature.

For two groups, calculate the following:

$$X^2 = \frac{(n_A - n_A^0)^2}{n_A^0} + \frac{(n_B - n_B^0)^2}{n_B^0}$$

If $H_0$ is true, then $X^2$ is a draw from a $\chi^2(\text{df}=1)$ distribution (approximately).

15 Example

In the first example we observed $n_A = 78$ and $n_B = 22$. Under the null hypothesis we have $n_A^0 = 75$ and $n_B^0 = 25$. We therefore get

$$X^2 = \frac{(78-75)^2}{75} + \frac{(22-25)^2}{25} = 0.12 + 0.36 = 0.48.$$

This corresponds to a p-value of about 0.49. We therefore have no evidence against the hypothesis $(p_A, p_B) = (0.75, 0.25)$.

Note: using the likelihood ratio test we got a p-value of about 0.48.

In R: `chisq.test(c(78, 22), p = c(0.75, 0.25))`

16 Generalization to more than two groups

As with the likelihood ratio test, there is a generalization to more than just two groups.

If we have $k$ groups, the chi-square test statistic we use is

$$X^2 = \sum_{i=1}^{k} \frac{(n_i - n_i^0)^2}{n_i^0} \sim \chi^2(\text{df} = k-1)$$

17 Tomato example

For the tomato example we get

$$X^2 = \frac{(926-906.2)^2}{906.2} + \frac{(288-302.1)^2}{302.1} + \frac{(293-302.1)^2}{302.1} + \frac{(104-100.7)^2}{100.7} = 1.47$$

Using a $\chi^2(\text{df}=3)$ distribution, we get a p-value of 0.69. We therefore have no evidence against the hypothesis that the ratio of the phenotypes is 9:3:3:1.

Using the likelihood ratio test we also got a p-value of 0.69.

In R: `chisq.test(c(926, 288, 293, 104), p = c(9, 3, 3, 1)/16)`
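As a cross-check, the chi-square statistic can be computed by hand and compared with what `chisq.test()` reports. This is a minimal sketch; the variable names are assumptions.

```r
# Chi-square goodness-of-fit statistic for the tomato data, by hand and via chisq.test().
obs <- c(926, 288, 293, 104)
p0  <- c(9, 3, 3, 1) / 16
expected <- sum(obs) * p0

x2      <- sum((obs - expected)^2 / expected)
p_value <- pchisq(x2, df = length(obs) - 1, lower.tail = FALSE)

fit <- chisq.test(obs, p = p0)   # same test, done by the built-in function

print(round(c(X2 = x2, p_value = p_value), 2))
print(fit)
```

The `statistic`, `parameter` (degrees of freedom), `p.value` and `expected` components of `fit` hold the same quantities as the hand calculation.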

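18 Test statistics

Let $n_i^0$ denote the expected count in group $i$ if $H_0$ is true.

LRT statistic:

$$\text{LRT} = 2 \ln\left\{ \frac{\Pr(\text{data} \mid p = \text{MLE})}{\Pr(\text{data} \mid H_0)} \right\} = \ldots = 2 \sum_i n_i \ln(n_i / n_i^0)$$

$\chi^2$ test statistic:

$$X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \sum_i \frac{(n_i - n_i^0)^2}{n_i^0}$$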

19 Null distribution of test statistic

What values of LRT (or $X^2$) should we expect, if $H_0$ were true?

The null distributions of these statistics may be obtained by:

- Brute-force analytic calculations
- Computer simulations
- Asymptotic approximations

If the sample size $n$ is large, we have, approximately, $\text{LRT} \sim \chi^2(k-1)$ and $X^2 \sim \chi^2(k-1)$.

20 The brute-force method

$$\Pr(\text{LRT} = g \mid H_0) = \sum_{\substack{n_1, n_2, n_3 \\ \text{giving LRT} = g}} \Pr(n_1, n_2, n_3 \mid H_0)$$

This is not feasible.

21 Computer simulation

1. Simulate a table conforming to the null hypothesis. E.g., simulate $(n_1, n_2, n_3) \sim \text{Multinomial}(n=100, \{1/4, 1/2, 1/4\})$.
2. Calculate your test statistic.
3. Repeat steps (1) and (2) many (e.g., 1000 or 10,000) times.

Estimated critical value = the 95th percentile of the results.

Estimated P-value = the proportion of results $\ge$ the observed value.

In R, use `rmultinom(n, size, prob)` to do `n` simulations of a Multinomial(`size`, `prob`).

22 Example

We observe the following data:

| AA | AB | BB |
|---|---|---|
| 35 | 43 | 22 |

We imagine that these are counts $(n_1, n_2, n_3) \sim \text{Multinomial}(n=100, \{p_1, p_2, p_3\})$.

We seek to test $H_0: p_1 = 1/4,\ p_2 = 1/2,\ p_3 = 1/4$.

We calculate $\text{LRT} \approx 4.96$ and $X^2 \approx 5.34$.

Referring to the asymptotic approximations ($\chi^2$ distribution with 2 degrees of freedom), we obtain $P \approx 8.4\%$ (LRT) and $P \approx 6.9\%$ ($X^2$).

With 10,000 simulations under $H_0$, we get $P \approx 8.9\%$ (LRT) and $P \approx 7.4\%$ ($X^2$).
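The simulation recipe can be sketched in R for this example. The helper names `lrt_stat` and `chisq_stat` and the fixed seed are assumptions made for illustration, and the counts (35, 43, 22) are the ones from the multinomial example earlier in these slides. Simulated p-values vary from run to run, so they should land near, not exactly on, the values quoted above.

```r
set.seed(1)  # assumption: fixed seed, only so that a run can be repeated

obs <- c(35, 43, 22)
p0  <- c(0.25, 0.50, 0.25)
n   <- sum(obs)
expected <- n * p0

# Helper functions (hypothetical names) for the two test statistics.
lrt_stat <- function(counts, expected) {
  keep <- counts > 0                     # a zero count contributes 0 to the sum
  2 * sum(counts[keep] * log(counts[keep] / expected[keep]))
}
chisq_stat <- function(counts, expected) sum((counts - expected)^2 / expected)

obs_lrt <- lrt_stat(obs, expected)
obs_x2  <- chisq_stat(obs, expected)

# Steps 1-3: simulate tables under H0 and compute the statistics for each.
sims    <- rmultinom(10000, size = n, prob = p0)   # one simulated table per column
sim_lrt <- apply(sims, 2, lrt_stat,   expected = expected)
sim_x2  <- apply(sims, 2, chisq_stat, expected = expected)

crit_lrt <- quantile(sim_lrt, 0.95)      # estimated 5% critical value
p_lrt    <- mean(sim_lrt >= obs_lrt)     # estimated p-values
p_x2     <- mean(sim_x2  >= obs_x2)

print(round(c(LRT = obs_lrt, X2 = obs_x2), 2))
print(c(crit_lrt = unname(crit_lrt), p_lrt = p_lrt, p_x2 = p_x2))
```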

23 Example

[Figure: two panels showing the estimated null distributions of the LRT statistic and of the chi-square statistic, each with the 95th percentile and the observed value marked.]

24 Summary and recommendation

For either the LRT or the $\chi^2$ test:

- The null distribution is approximately $\chi^2(k-1)$ if the sample size is large.
- The null distribution can be approximated by simulating data under the null hypothesis.

If the sample size is sufficiently large that the expected count in each cell is $\ge 5$, use the asymptotic approximation without worries. Otherwise, consider using computer simulations.

25 Composite hypotheses

Sometimes, we ask not

$$p_{AA} = 0.25,\quad p_{AB} = 0.5,\quad p_{BB} = 0.25$$

but rather something like:

$$p_{AA} = f^2,\quad p_{AB} = 2f(1-f),\quad p_{BB} = (1-f)^2 \quad \text{for some } f.$$

For example: consider the genotypes, of a random sample of individuals, at a diallelic locus. Is the locus in Hardy-Weinberg equilibrium (as expected in the case of random mating)?

Example data:

| AA | AB | BB |
|---|---|---|
| 5 | 20 | 75 |

26 Another example

ABO blood groups: 3 alleles A, B, O.

| Phenotype | Genotype |
|---|---|
| A | AA or AO |
| B | BB or BO |
| AB | AB |
| O | OO |

Allele frequencies: $f_A, f_B, f_O$ (note that $f_A + f_B + f_O = 1$).

Under Hardy-Weinberg equilibrium, we expect

$$p_A = f_A^2 + 2 f_A f_O, \quad p_B = f_B^2 + 2 f_B f_O, \quad p_{AB} = 2 f_A f_B, \quad p_O = f_O^2$$

Example data:

[Table: observed counts of the phenotypes O, A, B, AB.]

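27 LRT for example 1

Data: $(n_{AA}, n_{AB}, n_{BB}) \sim \text{Multinomial}(n, \{p_{AA}, p_{AB}, p_{BB}\})$

We seek to test whether the data conform reasonably to $H_0: p_{AA} = f^2,\ p_{AB} = 2f(1-f),\ p_{BB} = (1-f)^2$ for some $f$.

General MLEs: $\hat p_{AA} = n_{AA}/n,\ \hat p_{AB} = n_{AB}/n,\ \hat p_{BB} = n_{BB}/n$

MLE under $H_0$: $\hat f = (n_{AA} + n_{AB}/2)/n$, giving $\tilde p_{AA} = \hat f^2,\ \tilde p_{AB} = 2 \hat f (1 - \hat f),\ \tilde p_{BB} = (1 - \hat f)^2$

LRT statistic:

$$\text{LRT} = 2 \ln\left\{ \frac{\Pr(n_{AA}, n_{AB}, n_{BB} \mid \hat p_{AA}, \hat p_{AB}, \hat p_{BB})}{\Pr(n_{AA}, n_{AB}, n_{BB} \mid \tilde p_{AA}, \tilde p_{AB}, \tilde p_{BB})} \right\}$$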

28 LRT for example 2

Data: $(n_O, n_A, n_B, n_{AB}) \sim \text{Multinomial}(n, \{p_O, p_A, p_B, p_{AB}\})$

We seek to test whether the data conform reasonably to $H_0: p_A = f_A^2 + 2 f_A f_O,\ p_B = f_B^2 + 2 f_B f_O,\ p_{AB} = 2 f_A f_B,\ p_O = f_O^2$ for some $f_O, f_A, f_B$, where $f_O + f_A + f_B = 1$.

General MLEs: $\hat p_O, \hat p_A, \hat p_B, \hat p_{AB}$, like before.

MLE under $H_0$: requires numerical optimization. Call them $(\hat f_O, \hat f_A, \hat f_B) \rightarrow (\tilde p_O, \tilde p_A, \tilde p_B, \tilde p_{AB})$.

LRT statistic:

$$\text{LRT} = 2 \ln\left\{ \frac{\Pr(n_O, n_A, n_B, n_{AB} \mid \hat p_O, \hat p_A, \hat p_B, \hat p_{AB})}{\Pr(n_O, n_A, n_B, n_{AB} \mid \tilde p_O, \tilde p_A, \tilde p_B, \tilde p_{AB})} \right\}$$

29 $\chi^2$ test for these examples

1. Obtain the MLE(s) under $H_0$.
2. Calculate the corresponding cell probabilities.
3. Turn these into (estimated) expected counts under $H_0$.
4. Calculate

$$X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

30 Null distribution for these cases

Computer simulation (with one wrinkle):

1. Simulate data under $H_0$ (plug in the MLEs for the observed data).
2. Calculate the MLE with the simulated data.
3. Calculate the test statistic with the simulated data.
4. Repeat many times.

Asymptotic approximation:

Under $H_0$, if the sample size, $n$, is large, both the LRT statistic and the $\chi^2$ statistic follow, approximately, a $\chi^2$ distribution with $k - s - 1$ degrees of freedom, where $s$ is the number of parameters estimated under $H_0$.

Note that $s = 1$ for example 1, and $s = 2$ for example 2, and so df = 1 for both examples.

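31 Example 1

Example data:

| AA | AB | BB |
|---|---|---|
| 5 | 20 | 75 |

MLE: $\hat f = (5 + 20/2)/100 = 15\%$

Expected counts:

| AA | AB | BB |
|---|---|---|
| 2.25 | 25.5 | 72.25 |

Test statistics: LRT statistic = 3.87, $X^2$ = 4.65

| Null distribution | LRT | $X^2$ |
|---|---|---|
| Asymptotic $\chi^2(\text{df}=1)$ approximation | P ≈ 4.9% | P ≈ 3.1% |
| 10,000 computer simulations | P ≈ 8.2% | P ≈ 2.4% |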
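The whole of Example 1, including the simulation "with one wrinkle", can be sketched in R. The counts (5, 20, 75) follow from $\hat f = (5 + 20/2)/100$ with $n = 100$; the function name `hwe_stats` and the fixed seed are assumptions made for illustration. Simulated p-values depend on the random draws, so they should be close to, not identical with, the values quoted above.

```r
set.seed(1)  # assumption: fixed seed, only so that a run can be repeated

obs <- c(AA = 5, AB = 20, BB = 75)

# Fit the Hardy-Weinberg model to one table and return both test statistics.
# (Hypothetical helper; assumes the estimated f is strictly between 0 and 1.)
hwe_stats <- function(counts) {
  counts <- as.numeric(counts)
  n <- sum(counts)
  f_hat <- (counts[1] + counts[2] / 2) / n          # MLE of f under H0
  expected <- n * c(f_hat^2, 2 * f_hat * (1 - f_hat), (1 - f_hat)^2)
  pos <- counts > 0                                  # a zero count contributes 0
  c(lrt   = 2 * sum(counts[pos] * log(counts[pos] / expected[pos])),
    x2    = sum((counts - expected)^2 / expected),
    f_hat = f_hat)
}

obs_stats <- hwe_stats(obs)

# Asymptotic approximation: df = k - s - 1 with k = 3 cells, s = 1 parameter.
df     <- length(obs) - 1 - 1
p_asym <- pchisq(obs_stats[c("lrt", "x2")], df, lower.tail = FALSE)

# Simulation: generate tables under H0 with f set to the observed MLE,
# then re-estimate f inside every simulated table (the "wrinkle").
f0 <- unname(obs_stats["f_hat"])
p0 <- c(f0^2, 2 * f0 * (1 - f0), (1 - f0)^2)
sims      <- rmultinom(10000, size = sum(obs), prob = p0)
sim_stats <- apply(sims, 2, hwe_stats)               # rows: lrt, x2, f_hat
p_sim <- c(lrt = mean(sim_stats["lrt", ] >= obs_stats["lrt"]),
           x2  = mean(sim_stats["x2", ]  >= obs_stats["x2"]))

print(round(obs_stats, 2))
print(round(rbind(asymptotic = p_asym, simulation = p_sim), 3))
```

Re-estimating $f$ within each simulated table matters: using the observed $\hat f$ for every table would compare the observed statistic against the wrong reference distribution.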

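32 Example 1

[Figure: two panels showing the estimated null distributions of the LRT statistic and of the chi-square statistic, each with the 95th percentile and the observed value marked. For the chi-square statistic, the 95th percentile is 3.36.]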

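33 Example 2

Example data:

[Table: observed and expected counts of the phenotypes O, A, B, AB.]

MLE: $\hat f_O \approx 62.8\%$, $\hat f_A \approx 25.0\%$, $\hat f_B \approx 12.2\%$.

Test statistics: LRT statistic = 1.99, $X^2$ = 2.10

| Null distribution | LRT | $X^2$ |
|---|---|---|
| Asymptotic $\chi^2(\text{df}=1)$ approximation | P ≈ 16% | P ≈ 15% |
| 10,000 computer simulations | P ≈ 17% | P ≈ 15% |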

34 Example 2

[Figure: two panels showing the estimated null distributions of the LRT statistic and of the chi-square statistic, each with the 95th percentile and the observed value marked.]

35 Example 3

Data on number of sperm bound to an egg:

[Table: number of eggs observed at each sperm count.]

Do these follow a Poisson distribution?

MLE: $\hat\lambda$ = sample average

Expected counts: $n_i^0 = n\, e^{-\hat\lambda}\, \hat\lambda^i / i!$

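36 Example

[Table: observed and expected counts.]

$$X^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \ldots = 42.8$$

$$\text{LRT} = 2 \sum \text{obs} \log(\text{obs}/\text{exp}) = \ldots = 18.8$$

Compare to $\chi^2(\text{df} = 4)$.

P-value ≈ $1 \times 10^{-8}$ ($\chi^2$) and ≈ $9 \times 10^{-4}$ (LRT).

By simulation: p-value = 16/10,000 ($\chi^2$) and 7/10,000 (LRT)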

37 Null simulation results

[Figure: two panels showing the simulated null distributions of the $\chi^2$ statistic and of the LRT statistic, each with the observed value marked.]

38 A final note

With these sorts of goodness-of-fit tests, we are often happy when our model does fit. In other words, we often prefer to fail to reject $H_0$.

Such a conclusion, that the data fit the model reasonably well, should be phrased and considered with caution.

We should think: how much power do I have to detect, with these limited data, a reasonable deviation from $H_0$?

39 2 x 2 tables

Apply a treatment to 20 mice from each of strains A and B, and observe survival.

| | N | Y |
|---|---|---|
| A | 18 | 2 |
| B | 11 | 9 |

Question: Are the survival rates in the two strains the same?

Gather 100 rats and determine whether they are infected with viruses A and B.

| | I-B | NI-B |
|---|---|---|
| I-A | 9 | 9 |
| NI-A | 20 | 62 |

Question: Is infection with virus A independent of infection with virus B?

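40 Underlying probabilities

Observed data:

| | B = 0 | B = 1 | Total |
|---|---|---|---|
| A = 0 | $n_{00}$ | $n_{01}$ | $n_{0+}$ |
| A = 1 | $n_{10}$ | $n_{11}$ | $n_{1+}$ |
| Total | $n_{+0}$ | $n_{+1}$ | $n$ |

Underlying probabilities:

| | B = 0 | B = 1 | Total |
|---|---|---|---|
| A = 0 | $p_{00}$ | $p_{01}$ | $p_{0+}$ |
| A = 1 | $p_{10}$ | $p_{11}$ | $p_{1+}$ |
| Total | $p_{+0}$ | $p_{+1}$ | 1 |

Model: $(n_{00}, n_{01}, n_{10}, n_{11}) \sim \text{Multinomial}(n, \{p_{00}, p_{01}, p_{10}, p_{11}\})$

or $n_{01} \sim \text{Binomial}(n_{0+}, p_{01}/p_{0+})$ and $n_{11} \sim \text{Binomial}(n_{1+}, p_{11}/p_{1+})$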

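41 Conditional probabilities

Underlying probabilities:

| | B = 0 | B = 1 | Total |
|---|---|---|---|
| A = 0 | $p_{00}$ | $p_{01}$ | $p_{0+}$ |
| A = 1 | $p_{10}$ | $p_{11}$ | $p_{1+}$ |
| Total | $p_{+0}$ | $p_{+1}$ | 1 |

Conditional probabilities:

- $\Pr(B = 1 \mid A = 0) = p_{01}/p_{0+}$
- $\Pr(B = 1 \mid A = 1) = p_{11}/p_{1+}$
- $\Pr(A = 1 \mid B = 0) = p_{10}/p_{+0}$
- $\Pr(A = 1 \mid B = 1) = p_{11}/p_{+1}$

The questions in the two examples are the same! They both concern:

$$p_{01}/p_{0+} = p_{11}/p_{1+}$$

Equivalently: $p_{ij} = p_{i+}\, p_{+j}$ for all $i, j$. Think $\Pr(A \text{ and } B) = \Pr(A)\,\Pr(B)$.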

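42 This is a composite hypothesis!

2 x 2 table:

| | B = 0 | B = 1 | Total |
|---|---|---|---|
| A = 0 | $p_{00}$ | $p_{01}$ | $p_{0+}$ |
| A = 1 | $p_{10}$ | $p_{11}$ | $p_{1+}$ |
| Total | $p_{+0}$ | $p_{+1}$ | 1 |

A different view:

| $p_{00}$ | $p_{01}$ | $p_{10}$ | $p_{11}$ |
|---|---|---|---|

In both views, $H_0: p_{ij} = p_{i+}\, p_{+j}$ for all $i, j$.

Degrees of freedom = 4 - 2 - 1 = 1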

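43 Expected counts

Observed data:

| | B = 0 | B = 1 | Total |
|---|---|---|---|
| A = 0 | $n_{00}$ | $n_{01}$ | $n_{0+}$ |
| A = 1 | $n_{10}$ | $n_{11}$ | $n_{1+}$ |
| Total | $n_{+0}$ | $n_{+1}$ | $n$ |

Expected counts:

| | B = 0 | B = 1 | Total |
|---|---|---|---|
| A = 0 | $e_{00}$ | $e_{01}$ | $n_{0+}$ |
| A = 1 | $e_{10}$ | $e_{11}$ | $n_{1+}$ |
| Total | $n_{+0}$ | $n_{+1}$ | $n$ |

To get the expected counts under the null hypothesis we:

1. Estimate $p_{1+}$ and $p_{+1}$ by $n_{1+}/n$ and $n_{+1}/n$, respectively. These are the MLEs under $H_0$!
2. Turn these into estimates of the $p_{ij}$.
3. Multiply these by the total sample size, $n$.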

44 The expected counts

The expected count (assuming $H_0$) for the "11" cell is the following:

$$e_{11} = n\, \hat p_{11} = n\, \hat p_{1+}\, \hat p_{+1} = n\, (n_{1+}/n)\, (n_{+1}/n) = (n_{1+}\, n_{+1})/n$$

The other cells are similar.

We can then calculate a $\chi^2$ or LRT statistic as before!

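45 Example 1

Observed data:

| | N | Y |
|---|---|---|
| A | 18 | 2 |
| B | 11 | 9 |

Expected counts:

| | N | Y |
|---|---|---|
| A | 14.5 | 5.5 |
| B | 14.5 | 5.5 |

$$X^2 = \frac{(18-14.5)^2}{14.5} + \frac{(11-14.5)^2}{14.5} + \frac{(2-5.5)^2}{5.5} + \frac{(9-5.5)^2}{5.5} = 6.14$$

$$\text{LRT} = 2 \left[ 18 \log\!\left(\tfrac{18}{14.5}\right) + 11 \log\!\left(\tfrac{11}{14.5}\right) + 2 \log\!\left(\tfrac{2}{5.5}\right) + 9 \log\!\left(\tfrac{9}{5.5}\right) \right] = 6.52$$

P-values (based on the asymptotic $\chi^2(\text{df}=1)$ approximation): 1.3% and 1.1%, respectively.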
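The expected counts $e_{ij} = n_{i+} n_{+j} / n$ and both statistics can be computed for a whole table at once. This is a sketch with assumed variable names, using the mouse survival counts that are consistent with the expected counts and statistics on this slide (18 and 2 for strain A, 11 and 9 for strain B). `chisq.test()` is called with `correct = FALSE` because its default for 2 x 2 tables applies a continuity correction, which the slide does not use.

```r
# Test of independence for the mouse survival table.
tab <- matrix(c(18, 2,
                11, 9),
              nrow = 2, byrow = TRUE,
              dimnames = list(strain = c("A", "B"), survived = c("N", "Y")))
n <- sum(tab)

# e_ij = (row total i) * (column total j) / n, for all cells at once.
expected <- outer(rowSums(tab), colSums(tab)) / n

x2  <- sum((tab - expected)^2 / expected)
lrt <- 2 * sum(tab * log(tab / expected))   # assumes no zero counts in the table

df    <- (nrow(tab) - 1) * (ncol(tab) - 1)
p_x2  <- pchisq(x2,  df, lower.tail = FALSE)
p_lrt <- pchisq(lrt, df, lower.tail = FALSE)

fit <- chisq.test(tab, correct = FALSE)     # built-in version, no continuity correction

print(expected)
print(round(c(X2 = x2, LRT = lrt, p_x2 = p_x2, p_lrt = p_lrt), 3))
```

Nothing in the code is specific to 2 x 2 tables, so the same lines apply to a larger r x k table of counts.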

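46 Example 2

Observed data:

| | I-B | NI-B |
|---|---|---|
| I-A | 9 | 9 |
| NI-A | 20 | 62 |

Expected counts:

| | I-B | NI-B |
|---|---|---|
| I-A | 5.2 | 12.8 |
| NI-A | 23.8 | 58.2 |

$$X^2 = \frac{(9-5.2)^2}{5.2} + \frac{(20-23.8)^2}{23.8} + \frac{(9-12.8)^2}{12.8} + \frac{(62-58.2)^2}{58.2} = 4.70$$

$$\text{LRT} = 2 \left[ 9 \log\!\left(\tfrac{9}{5.2}\right) + 20 \log\!\left(\tfrac{20}{23.8}\right) + 9 \log\!\left(\tfrac{9}{12.8}\right) + 62 \log\!\left(\tfrac{62}{58.2}\right) \right] = 4.37$$

(The statistics are computed from the unrounded expected counts.)

P-values (based on the asymptotic $\chi^2(\text{df}=1)$ approximation): 3.0% and 3.7%, respectively.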

47 r x k tables

[Table: counts by population (rows: Florida, Iowa, Missouri) and blood type (columns: A, B, AB, O).]

Same distribution of blood types in each population?

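48 Underlying probabilities

Observed data:

| | 1 | 2 | ⋯ | k | Total |
|---|---|---|---|---|---|
| 1 | $n_{11}$ | $n_{12}$ | ⋯ | $n_{1k}$ | $n_{1+}$ |
| 2 | $n_{21}$ | $n_{22}$ | ⋯ | $n_{2k}$ | $n_{2+}$ |
| ⋮ | ⋮ | ⋮ | | ⋮ | ⋮ |
| r | $n_{r1}$ | $n_{r2}$ | ⋯ | $n_{rk}$ | $n_{r+}$ |
| Total | $n_{+1}$ | $n_{+2}$ | ⋯ | $n_{+k}$ | $n$ |

Underlying probabilities:

| | 1 | 2 | ⋯ | k | Total |
|---|---|---|---|---|---|
| 1 | $p_{11}$ | $p_{12}$ | ⋯ | $p_{1k}$ | $p_{1+}$ |
| 2 | $p_{21}$ | $p_{22}$ | ⋯ | $p_{2k}$ | $p_{2+}$ |
| ⋮ | ⋮ | ⋮ | | ⋮ | ⋮ |
| r | $p_{r1}$ | $p_{r2}$ | ⋯ | $p_{rk}$ | $p_{r+}$ |
| Total | $p_{+1}$ | $p_{+2}$ | ⋯ | $p_{+k}$ | 1 |

$H_0: p_{ij} = p_{i+}\, p_{+j}$ for all $i, j$.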

49 Expected counts

[Tables: observed counts and expected counts by population (F, I, M) and blood type (A, B, AB, O).]

Expected counts under $H_0$: $e_{ij} = n_{i+}\, n_{+j} / n$ for all $i, j$.

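50 $\chi^2$ and LRT statistics

[Tables: observed counts and expected counts by population (F, I, M) and blood type (A, B, AB, O).]

$$X^2 \text{ statistic} = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = 5.64$$

$$\text{LRT statistic} = 2 \sum \text{obs} \ln(\text{obs}/\text{exp}) = 5.55$$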

51 Asymptotic approximation

If the sample size is large, the null distribution of the $\chi^2$ and likelihood ratio test statistics will approximately follow a $\chi^2$ distribution with $(r-1) \times (k-1)$ d.f.

In the example, df = $(3-1) \times (4-1) = 6$.

- $X^2 = 5.64$, P = 0.46
- LRT = 5.55, P = 0.48

52 Two-locus linkage in an intercross

[Table: counts of the nine two-locus genotypes (rows: AA, Aa, aa; columns: BB, Bb, bb).]

Are these two loci linked?

53 General test of independence

[Tables: observed counts and expected counts (rows: AA, Aa, aa; columns: BB, Bb, bb).]

- $\chi^2$ test: $X^2 = 10.4$, P = 3.5% (df = 4)
- LRT test: LRT = 9.98, P = 4.1%
- Fisher's exact test: P = 4.6%

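54 A more specific test

[Table: observed counts (rows: AA, Aa, aa; columns: BB, Bb, bb).]

Underlying probabilities:

| | BB | Bb | bb |
|---|---|---|---|
| AA | $\tfrac14 (1-\theta)^2$ | $\tfrac12 \theta(1-\theta)$ | $\tfrac14 \theta^2$ |
| Aa | $\tfrac12 \theta(1-\theta)$ | $\tfrac12 [\theta^2 + (1-\theta)^2]$ | $\tfrac12 \theta(1-\theta)$ |
| aa | $\tfrac14 \theta^2$ | $\tfrac12 \theta(1-\theta)$ | $\tfrac14 (1-\theta)^2$ |

$H_0: \theta = 1/2$ versus $H_a: \theta < 1/2$

Use a likelihood ratio test!

1. Obtain the general MLE of $\theta$.
2. Calculate the LRT statistic $= 2 \ln\left\{ \dfrac{\Pr(\text{data} \mid \hat\theta)}{\Pr(\text{data} \mid \theta = 1/2)} \right\}$.
3. Compare this statistic to a $\chi^2(\text{df}=1)$.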
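The three steps can be sketched in R. The counts below are HYPOTHETICAL, invented purely to make the sketch runnable (the slide's own data are not reproduced here), and the function names `cell_probs` and `loglik` are assumptions. The one-dimensional maximization uses `optimize()`; the lower end of the search interval is set slightly above 0 to avoid `log(0)`.

```r
# Cell probabilities for an intercross at recombination fraction theta
# (rows AA, Aa, aa; columns BB, Bb, bb), following the table above.
cell_probs <- function(theta) {
  r <- theta
  s <- 1 - theta
  matrix(c(s^2 / 4,   r * s / 2,       r^2 / 4,
           r * s / 2, (r^2 + s^2) / 2, r * s / 2,
           r^2 / 4,   r * s / 2,       s^2 / 4),
         nrow = 3, byrow = TRUE)
}

# Multinomial log-likelihood, up to a constant that does not depend on theta.
loglik <- function(theta, counts) sum(counts * log(cell_probs(theta)))

# HYPOTHETICAL counts, for illustration only (not the data from the slides).
counts <- matrix(c(20, 18,  4,
                   17, 45, 16,
                    5, 19, 21),
                 nrow = 3, byrow = TRUE)

# Step 1: MLE of theta, restricted to (0, 1/2].
fit <- optimize(loglik, interval = c(0.001, 0.5), counts = counts, maximum = TRUE)
theta_hat <- fit$maximum

# Step 2: LRT statistic comparing theta_hat with theta = 1/2.
lrt <- 2 * (fit$objective - loglik(0.5, counts))

# Step 3: compare to a chi-square distribution with 1 degree of freedom.
p_value <- pchisq(lrt, df = 1, lower.tail = FALSE)

print(c(theta_hat = theta_hat, LRT = lrt, p_value = p_value))
```

At $\theta = 1/2$ the cell probabilities reduce to the products of the Mendelian marginals (1/4, 1/2, 1/4), so the null hypothesis of this test is the independence hypothesis of the general test with the marginal probabilities fixed.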

55 Results

[Table: observed counts (rows: AA, Aa, aa; columns: BB, Bb, bb).]

MLE: $\hat\theta = 0.359$

LRT statistic: LRT = 7.74, P = 0.54% (df = 1)

Here we assume Mendelian segregation, and that deviation from $H_0$ is in a particular direction.

If these assumptions are correct, we'll have greater power to detect linkage using this more specific approach.

### Goodness of Fit Goodness of fit - 2 classes

Goodness of Fit Goodness of fit - 2 classes A B 78 22 Do these data correspond reasonably to the proportions 3:1? We previously discussed options for testing p A =0.75! Exact p-value Exact confidence interval

### Topic 19: Goodness of Fit

Topic 19: November 24, 2009 A goodness of fit test examine the case of a sequence if independent experiments each of which can have 1 of k possible outcomes. In terms of hypothesis testing, let π = (π

### Topic 21: Goodness of Fit

Topic 21: December 5, 2011 A goodness of fit tests examine the case of a sequence of independent observations each of which can have 1 of k possible categories. For example, each of us has one of 4 possible

### Chapter 4. The analysis of Segregation

Chapter 4. The analysis of Segregation 1 The law of segregation (Mendel s rst law) There are two alleles in the two homologous chromosomes at a certain marker. One of the two alleles received from one

### Chi Square for Contingency Tables

2 x 2 Case Chi Square for Contingency Tables A test for p 1 = p 2 We have learned a confidence interval for p 1 p 2, the difference in the population proportions. We want a hypothesis testing procedure

### Allele Frequencies and Hardy Weinberg Equilibrium

Allele Frequencies and Hardy Weinberg Equilibrium Summer Institute in Statistical Genetics 013 Module 8 Topic Allele Frequencies and Genotype Frequencies How do allele frequencies relate to genotype frequencies

### Topic 21 Goodness of Fit

Topic 21 Goodness of Fit Fit of a Distribution 1 / 14 Outline Fit of a Distribution Blood Bank Likelihood Function Likelihood Ratio Lagrange Multipliers Hanging Chi-Gram 2 / 14 Fit of a Distribution Goodness

### Statistics for EES and MEME Chi-square tests and Fisher s exact test

Statistics for EES and MEME Chi-square tests and Fisher s exact test Dirk Metzler 3. June 2013 Contents 1 X 2 goodness-of-fit test 1 2 X 2 test for homogeneity/independence 3 3 Fisher s exact test 7 4

### Multiple random variables

Multiple random variables Multiple random variables We essentially always consider multiple random variables at once. The key concepts: Joint, conditional and marginal distributions, and independence of

### CATEGORICAL DATA Chi-Square Tests for Univariate Data

CATEGORICAL DATA Chi-Square Tests For Univariate Data 1 CATEGORICAL DATA Chi-Square Tests for Univariate Data Recall that a categorical variable is one in which the possible values are categories or groupings.

### MATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...

MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 2009-2016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................

### 2 GENETIC DATA ANALYSIS

2.1 Strategies for learning genetics 2 GENETIC DATA ANALYSIS We will begin this lecture by discussing some strategies for learning genetics. Genetics is different from most other biology courses you have

### O, A, B, and AB. 0 for all i =1,...,k,

Topic 1 1.1 Fit of a Distribution Goodness of fit tests examine the case of a sequence of independent observations each of which can have 1 of k possible categories. For example, each of us has one of

### CHAPTER NINE. Key Concepts. McNemar s test for matched pairs combining p-values likelihood ratio criterion

CHAPTER NINE Key Concepts chi-square distribution, chi-square test, degrees of freedom observed and expected values, goodness-of-fit tests contingency table, dependent (response) variables, independent

### Mendelian Genetics. I. Background

Mendelian Genetics Objectives 1. To understand the Principles of Segregation and Independent Assortment. 2. To understand how Mendel s principles can explain transmission of characters from one generation

### Comparing Multiple Proportions, Test of Independence and Goodness of Fit

Comparing Multiple Proportions, Test of Independence and Goodness of Fit Content Testing the Equality of Population Proportions for Three or More Populations Test of Independence Goodness of Fit Test 2

### Statistical Testing of Randomness Masaryk University in Brno Faculty of Informatics

Statistical Testing of Randomness Masaryk University in Brno Faculty of Informatics Jan Krhovják Basic Idea Behind the Statistical Tests Generated random sequences properties as sample drawn from uniform/rectangular

### tests whether there is an association between the outcome variable and a predictor variable. In the Assistant, you can perform a Chi-Square Test for

This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. In practice, quality professionals sometimes

### 4. Sum the results of the calculation described in step 3 for all classes of progeny

F09 Biol 322 chi square notes 1. Before proceeding with the chi square calculation, clearly state the genetic hypothesis concerning the data. This hypothesis is an interpretation of the data that gives

### Testing on proportions

Testing on proportions Textbook Section 5.4 April 7, 2011 Example 1. X 1,, X n Bernolli(p). Wish to test H 0 : p p 0 H 1 : p > p 0 (1) Consider a related problem The likelihood ratio test is where c is

### 12.5: CHI-SQUARE GOODNESS OF FIT TESTS

125: Chi-Square Goodness of Fit Tests CD12-1 125: CHI-SQUARE GOODNESS OF FIT TESTS In this section, the χ 2 distribution is used for testing the goodness of fit of a set of data to a specific probability

### CHAPTER 11. GOODNESS OF FIT AND CONTINGENCY TABLES

CHAPTER 11. GOODNESS OF FIT AND CONTINGENCY TABLES The chi-square distribution was discussed in Chapter 4. We now turn to some applications of this distribution. As previously discussed, chi-square is

### Chi-square test Testing for independeny The r x c contingency tables square test

Chi-square test Testing for independeny The r x c contingency tables square test 1 The chi-square distribution HUSRB/0901/1/088 Teaching Mathematics and Statistics in Sciences: Modeling and Computer-aided

### Hypothesis Testing COMP 245 STATISTICS. Dr N A Heard. 1 Hypothesis Testing 2 1.1 Introduction... 2 1.2 Error Rates and Power of a Test...

Hypothesis Testing COMP 45 STATISTICS Dr N A Heard Contents 1 Hypothesis Testing 1.1 Introduction........................................ 1. Error Rates and Power of a Test.............................

### INTRODUCTION TO GENETICS USING TOBACCO (Nicotiana tabacum) SEEDLINGS

INTRODUCTION TO GENETICS USING TOBACCO (Nicotiana tabacum) SEEDLINGS By Dr. Susan Petro Based on a lab by Dr. Elaine Winshell Nicotiana tabacum Objectives To apply Mendel s Law of Segregation To use Punnett

### Lecture 42 Section 14.3. Tue, Apr 8, 2008

the Lecture 42 Section 14.3 Hampden-Sydney College Tue, Apr 8, 2008 Outline the 1 2 the 3 4 5 the The will compute χ 2 areas, but not χ 2 percentiles. (That s ok.) After performing the χ 2 test by hand,

### We know from STAT.1030 that the relevant test statistic for equality of proportions is:

2. Chi 2 -tests for equality of proportions Introduction: Two Samples Consider comparing the sample proportions p 1 and p 2 in independent random samples of size n 1 and n 2 out of two populations which

### AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

Ms. Foglia Date AP: LAB 8: THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

### Chapter 11 Chi square Tests

11.1 Introduction 259 Chapter 11 Chi square Tests 11.1 Introduction In this chapter we will consider the use of chi square tests (χ 2 tests) to determine whether hypothesized models are consistent with

### Using Stata for Categorical Data Analysis

Using Stata for Categorical Data Analysis NOTE: These problems make extensive use of Nick Cox s tab_chi, which is actually a collection of routines, and Adrian Mander s ipf command. From within Stata,

### Lecture 7: Binomial Test, Chisquare

Lecture 7: Binomial Test, Chisquare Test, and ANOVA May, 01 GENOME 560, Spring 01 Goals ANOVA Binomial test Chi square test Fisher s exact test Su In Lee, CSE & GS suinlee@uw.edu 1 Whirlwind Tour of One/Two

### The Goodness-of-Fit Test

on the Lecture 49 Section 14.3 Hampden-Sydney College Tue, Apr 21, 2009 Outline 1 on the 2 3 on the 4 5 Hypotheses on the (Steps 1 and 2) (1) H 0 : H 1 : H 0 is false. (2) α = 0.05. p 1 = 0.24 p 2 = 0.20

### Today s lecture. Lecture 6: Dichotomous Variables & Chi-square tests. Proportions and 2 2 tables. How do we compare these proportions

Today s lecture Lecture 6: Dichotomous Variables & Chi-square tests Sandy Eckel seckel@jhsph.edu Dichotomous Variables Comparing two proportions using 2 2 tables Study Designs Relative risks, odds ratios

### 9-3.4 Likelihood ratio test. Neyman-Pearson lemma

9-3.4 Likelihood ratio test Neyman-Pearson lemma 9-1 Hypothesis Testing 9-1.1 Statistical Hypotheses Statistical hypothesis testing and confidence interval estimation of parameters are the fundamental

### Statistical Analysis of Blood Group Using Maximum Likelihood Estimation

International Journal of Engineering and Technical Research (IJETR) ISSN: 2321-0869, Volume-2, Issue-6, June 2014 Statistical Analysis of Blood Group Using Maximum Likelihood Estimation Nagarajan.D, Sunitha.P,

### Chi-Square Test. Contingency Tables. Contingency Tables. Chi-Square Test for Independence. Chi-Square Tests for Goodnessof-Fit

Chi-Square Tests 15 Chapter Chi-Square Test for Independence Chi-Square Tests for Goodness Uniform Goodness- Poisson Goodness- Goodness Test ECDF Tests (Optional) McGraw-Hill/Irwin Copyright 2009 by The

### Chi Squared and Fisher's Exact Tests. Observed vs Expected Distributions

BMS 617 Statistical Techniques for the Biomedical Sciences Lecture 11: Chi-Squared and Fisher's Exact Tests Chi Squared and Fisher's Exact Tests This lecture presents two similarly structured tests, Chi-squared

### 13.2 The Chi Square Test for Homogeneity of Populations The setting: Used to compare distribution of proportions in two or more populations.

13.2 The Chi Square Test for Homogeneity of Populations The setting: Used to compare distribution of proportions in two or more populations. Data is organized in a two way table Explanatory variable (Treatments)

### Today s Objectives. Probability rules apply to inheritance at more than one locus

Figure 14.8 Segregation of alleles and fertilization as chance events Today s Objectives Use rules of probability to solve genetics problems Define dominance, incomplete dominance, and co-dominance Extend

### LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

Period Date LAB : THE CHI-SQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,

### Ordinal Regression. Chapter

Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

### MATH 10: Elementary Statistics and Probability Chapter 11: The Chi-Square Distribution

MATH 10: Elementary Statistics and Probability Chapter 11: The Chi-Square Distribution Tony Pourmohamad Department of Mathematics De Anza College Spring 2015 Objectives By the end of this set of slides,

### Appendix A The Chi-Square (32) Test in Genetics

Appendix A The Chi-Square (32) Test in Genetics With infinitely large sample sizes, the ideal result of any particular genetic cross is exact conformation to the expected ratio. For example, a cross between

### H + T = 1. p(h + T) = p(h) x p(t)

Probability and Statistics Random Chance A tossed penny can land either heads up or tails up. These are mutually exclusive events, i.e. if the coin lands heads up, it cannot also land tails up on the same

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### Basic concepts and introduction to statistical inference

Basic concepts and introduction to statistical inference Anna Helga Jonsdottir Gunnar Stefansson Sigrun Helga Lund University of Iceland (UI) Basic concepts 1 / 19 A review of concepts Basic concepts Confidence

### Chapter 9: Hypothesis Testing Sections

Chapter 9: Hypothesis Testing Sections - we are still here Skip: 9.2 Testing Simple Hypotheses Skip: 9.3 Uniformly Most Powerful Tests Skip: 9.4 Two-Sided Alternatives 9.5 The t Test 9.6 Comparing the

### Sampling and Hypothesis Testing

Population and sample Sampling and Hypothesis Testing Allin Cottrell Population : an entire set of objects or units of observation of one sort or another. Sample : subset of a population. Parameter versus

### Morgan s fly experiment. Linkage Mapping. Genetic recombination. Genetic Recombination. Crossing Over and Recombination. CS Jim Holland

Morgan s fly experiment Linkage Mapping CS71 009 Jim Holland r r Vg Vg (red eyes, normal wings) prprvgvg (purple eyes, vestigial wings) r prvg vg (F1) prprvgvg 19 151 151 1195 r prvg vg r prvgvg prprvg

### Statistics for Management II-STAT 362-Final Review

Statistics for Management II-STAT 362-Final Review Multiple Choice Identify the letter of the choice that best completes the statement or answers the question. 1. The ability of an interval estimate to

### MCB142/IB163 Mendelian and Population Genetics 9/19/02

MCB142/IB163 Mendelian and Population Genetics 9/19/02 Practice questions lectures 5-12 (Hardy Weinberg, chi-square test, Mendel s second law, gene interactions, linkage maps, mapping human diseases, non-random

### Mendelian Inheritance & Probability

Mendelian Inheritance & Probability (CHAPTER 2- Brooker Text) January 31 & Feb 2, 2006 BIO 184 Dr. Tom Peavy Problem Solving TtYy x ttyy What is the expected phenotypic ratio among offspring? Tt RR x Tt

### The alternative hypothesis,, is the statement that the parameter value somehow differs from that claimed by the null hypothesis. : 0.5 :>0.5 :<0.

Section 8.2-8.5 Null and Alternative Hypotheses... The null hypothesis,, is a statement that the value of a population parameter is equal to some claimed value. :=0.5 The alternative hypothesis,, is the

### Chapter Five: Paired Samples Methods 1/38

Chapter Five: Paired Samples Methods 1/38 5.1 Introduction 2/38 Introduction Paired data arise with some frequency in a variety of research contexts. Patients might have a particular type of laser surgery

### The Neyman-Pearson lemma. The Neyman-Pearson lemma

The Neyman-Pearson lemma In practical hypothesis testing situations, there are typically many tests possible with significance level α for a null hypothesis versus alternative hypothesis. This leads to

### Analyzing tables. Chi-squared, exact and monte carlo tests. Jon Michael Gran Department of Biostatistics, UiO

Analyzing tables Chi-squared, exact and monte carlo tests Jon Michael Gran Department of Biostatistics, UiO MF9130 Introductory course in statistics Thursday 26.05.2011 1 / 50 Overview Aalen chapter 6.5,

### Exact Confidence Intervals

Math 541: Statistical Theory II Instructor: Songfeng Zheng Exact Confidence Intervals Confidence intervals provide an alternative to using an estimator ˆθ when we wish to estimate an unknown parameter

### HLA data analysis in anthropology: basic theory and practice

HLA data analysis in anthropology: basic theory and practice Alicia Sanchez-Mazas and José Manuel Nunes Laboratory of Anthropology, Genetics and Peopling history (AGP), Department of Anthropology and Ecology,

### Introduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures.

Introduction to Hypothesis Testing Point estimation and confidence intervals are useful statistical inference procedures. Another type of inference is used frequently used concerns tests of hypotheses.

### DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS. Posc/Uapp 816 CONTINGENCY TABLES

DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 816 CONTINGENCY TABLES I. AGENDA: A. Cross-classifications 1. Two-by-two and R by C tables 2. Statistical independence 3. The interpretation

### Dr. Young. Genetics Problems Set #1 Answer Key

BIOL276 Dr. Young Name Due Genetics Problems Set #1 Answer Key For problems in genetics, if no particular order is specified, you can assume that a specific order is not required. 1. What is the probability

### Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation

Parkland College A with Honors Projects Honors Program 2014 Calculating P-Values Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating P-Values" (2014). A with Honors Projects.

### Chi Square Test. Paul E. Johnson, Department of Political Science; Center for Research Methods and Data Analysis, University of Kansas

Chi Square Test Paul E. Johnson, Department of Political Science; Center for Research Methods and Data Analysis, University of Kansas, 2011 What is this Presentation? Chi Square Distribution Find

### Math 58. Rumbos Fall 2008. Solutions to Review Problems for Exam 2

Math 58. Rumbos Fall 2008 Solutions to Review Problems for Exam 2 1. For each of the following scenarios, determine whether the binomial distribution is the appropriate distribution for the random variable

### Bivariate Analysis. Comparisons of proportions: Chi Square Test (χ² test) Variable 1. Variable 2 2 LEVELS >2 LEVELS CONTINUOUS

Bivariate Analysis Variable 1 2 LEVELS >2 LEVELS CONTINUOUS Variable 2 2 LEVELS χ² test >2 LEVELS χ² test CONTINUOUS t-test χ² test χ² test ANOVA (F-test)

### Chi-square test Fisher's Exact test

Lesson 1 Chi-square test Fisher's Exact test McNemar's Test Lesson 1 Overview Lesson 11 covered two inference methods for categorical data from groups Confidence Intervals for the difference of two proportions

### Chapter 4: Statistical Hypothesis Testing

Chapter 4: Statistical Hypothesis Testing Christophe Hurlin November 20, 2015 Advanced Econometrics - Master ESA Section 1 Introduction

### Random, Uniform, Clumped: Number of Individuals per Sub-Quadrat

4-1 Population ecology Lab 4: Population dispersion patterns I. Introduction to population dispersion patterns The dispersion of individuals in a population describes their spacing relative to each other.

### Hypothesis Testing. Learning Objectives. After completing this module, the student will be able to

Hypothesis Testing Learning Objectives After completing this module, the student will be able to carry out a statistical test of significance calculate the acceptance and rejection region calculate and

### Hardy-Weinberg Equilibrium Problems

Hardy-Weinberg Equilibrium Problems 1. The frequency of two alleles in a gene pool is 0.19 (A) and 0.81(a). Assume that the population is in Hardy-Weinberg equilibrium. (a) Calculate the percentage of

### PASS Sample Size Software

Chapter 250 Introduction The Chi-square test is often used to test whether sets of frequencies or proportions follow certain patterns. The two most common instances are tests of goodness of fit using multinomial

### Hypothesis testing S2

Basic medical statistics for clinical and experimental research Hypothesis testing S2 Katarzyna Jóźwiak k.jozwiak@nki.nl 2nd November 2015 Introduction Point estimation: use a sample statistic to

### Unit 29 Chi-Square Goodness-of-Fit Test

Unit 29 Chi-Square Goodness-of-Fit Test Objectives: To perform the chi-square hypothesis test concerning proportions corresponding to more than two categories of a qualitative variable To perform the Bonferroni

### Testing Hypotheses using SPSS

Is the mean hourly rate of male workers \$2.00? T-Test One-Sample Statistics Std. Error N Mean Std. Deviation Mean 2997 2.0522 6.6282.2 One-Sample Test Test Value = 2 95% Confidence Interval Mean of the

### Math 62 Statistics Sample Exam Questions

Math 62 Statistics Sample Exam Questions 1. (10) Explain the difference between the distribution of a population and the sampling distribution of a statistic, such as the mean, of a sample randomly selected

### Study Guide for the Final Exam

Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

### Multinomial and Ordinal Logistic Regression

Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,

### Basics of Statistical Machine Learning

CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar

### LS4 Problem Set 2, Population Genetics. Corresponding Quiz: November (Along with positional cloning)

LS4 Problem Set 2, 2010 Population Genetics Corresponding Lectures: November 5, 8 and 10 Corresponding Reading: Hartwell et al., 757-773 Corresponding Quiz: November 15-19 (Along with positional cloning)

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### Breeding Value versus Genetic Value

Breeding Value versus Genetic Value Gene351 (not Gene251) Lecture 6 Key terms Breeding value Genetic value Average effect of alleles Reading: Prescribed None Suggested Falconer & McKay, 1996 Revision

### Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Spring 2014 Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

### Goodness of Fit. Proportional Model. Probability Models & Frequency Data

Probability Models & Frequency Data Goodness of Fit Proportional Model Chi-square Statistic Example R Distribution Assumptions Example R 1 Goodness of Fit Goodness of fit tests are used to compare any

### Estimating the Frequency Distribution of the. Numbers Bet on the California Lottery

Estimating the Frequency Distribution of the Numbers Bet on the California Lottery Mark Finkelstein November 15, 1993 Department of Mathematics, University of California, Irvine, CA 92717. Running head:

### Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### UCLA STAT 13 Introduction to Statistical Methods for the Life and Health Sciences. Chapter 10 Chi-Square Test Relative Risk/Odds Ratios

UCLA STAT 13 Introduction to Statistical Methods for the Life and Health Sciences Instructor: Ivo Dinov, Asst. Prof. of Statistics and Neurology Chapter 10 Chi-Square Test Relative Risk/Odds Ratios Teaching

### Generalization of Chisquare Goodness-of-Fit Test for Binned Data Using Saturated Models, with Application to Histograms

Generalization of Chisquare Goodness-of-Fit Test for Binned Data Using Saturated Models, with Application to Histograms Robert D. Cousins Dept. of Physics and Astronomy University of California, Los Angeles,

### Chapter 11: Linear Regression - Inference in Regression Analysis - Part 2

Chapter 11: Linear Regression - Inference in Regression Analysis - Part 2 Note: Whether we calculate confidence intervals or perform hypothesis tests we need the distribution of the statistic we will use.

### Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing

Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing 8-3 Testing a Claim About a Proportion 8-5 Testing a Claim About a Mean: σ Not Known 8-6 Testing

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### Answers to Mendelian genetics questions BI164 Spring, 2007

Answers to Mendelian genetics questions BI164 Spring, 2007 1. The father has normal vision and must therefore be hemizygous for the normal vision allele. The mother must be a carrier and hence the source

### Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

### Basic Premises of Population Genetics

Population genetics is concerned with the origin, amount and distribution of genetic variation present in populations of organisms, and the fate of this variation through space and time. The fate of genetic

Transmission patterns of linked genes Linkage, recombination and gene mapping Why are linkage relationships important? Linkage and independent assortment - statistical tests of hypotheses Recombination

### Data Analysis and Uncertainty Part 3: Hypothesis Testing/Sampling

Data Analysis and Uncertainty Part 3: Hypothesis Testing/Sampling Instructor: Sargur N. University at Buffalo The State University of New York srihari@cedar.buffalo.edu Topics 1. Hypothesis Testing 1.

### Poisson Models for Count Data

Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

### Null Hypothesis H 0. The null hypothesis (denoted by H 0

Hypothesis test In statistics, a hypothesis is a claim or statement about a property of a population. A hypothesis test (or test of significance) is a standard procedure for testing a claim about a property

### 11-2 Goodness of Fit Test

11-2 Goodness of Fit Test In this section we consider sample data consisting of observed frequency counts arranged in a single row or column (called a one-way frequency table). We will use a hypothesis