Goodness of fit  2 classes


 Peter Lawrence
 2 years ago
 Views:
Transcription
1 Goodness of fit  2 classes A B Do these data correspond reasonably to the proportions 3:1? We previously discussed options for testing p A =0.75! Exact pvalue Exact confidence interval Normal approximation
2 Goodness of fit  3 classes AA AB BB Do these data correspond reasonably to the proportions 1:2:1?
3 Multinomial distribution Imagine an urn with k types of balls. Let p i denote the proportion of type i. Draw n balls with replacement. Outcome: (n 1, n 2,...,n k ), with i n i = n, where n i is the no. balls drawn that were of type i. n! P(X 1 =n 1,...,X k =n k )= n 1! n k! pn 1 1 pn k k if 0 n i n, i n i =n Otherwise P(X 1 =n 1,...,X k =n k )=0.
4 Example Let (p 1, p 2, p 3 ) =(0.25,0.50,0.25)andn=100. P(X 1 =35, X 2 =43, X 3 =22) = 100! 35! 43! 22! Rather brutal, numerically speaking. Take logs (and use a computer).
5 Goodness of fit test We observe (n 1, n 2, n 3 ) Multinomial(n,p={p 1, p 2, p 3 }). We seek to test H 0 : p 1 = 0.25, p 2 = 0.5, p 3 = versus H a : H 0 is false. We need two things: Ateststatistic. The null distribution of the test statistic.
6 The likelihoodratio test (LRT) Back to the first example: A n A B n B Test H 0 :(p A, p B )=(π A,π B ) versus H a :(p A, p B ) (π A,π B ). MLE under H a : ˆp A = n A /n where n = n A + n B. Likelihood under H a : Likelihood under H 0 : L a =Pr(n A p A = ˆp A )= ( ) n n A ˆp n A A L 0 =Pr(n A p A = π A )= ( ) n n A π n A A (1 ˆp A ) n n A (1 π A) n n A Likelihood ratio test statistic: LRT = 2 ln (L a /L 0 ) Some clever people have shown that if H 0 is true, then LRT follows a χ 2 (df=1) distribution (approximately).
7 Likelihoodratio test for the example We observed n A =78andn B =22. H 0 :(p A, p B )=(0.75,0.25) H a :(p A, p B ) (0.75,0.25) L a =Pr(n A =78 p A =0.78) = ( ) = L 0 =Pr(n A =78 p A =0.75) = ( ) = LRT = 2 ln (L a /L 0 )=0.49. Using a χ 2 (df=1) distribution, we get a pvalue of We therefore have no evidence against the null hypothesis. In R: pvalue = 1  pchisq(0.49,1)
8 Null distribution Obs LRT
9 A little math... n=n A +n B, n 0 A =E[n A H 0 ]=n π A, n 0 B =E[n B H 0 ]=n π B. ( ) na ( Then L a /L 0 = na n nb 0 n A 0 B ) nb ) Or equivalently LRT = 2 n A ln( na n 0 A ) +2 n B ln( nb. n 0 B Why do this?
10 Generalization to more than two groups If we have k groups, then the likelihood ratio test statistic is LRT = 2 k i=1 n i ln ( ni n 0 i ) If H 0 is true, LRT χ 2 (df=k1)
11 Null distributions 3 groups: χ 2 (df=2) 5 groups: χ 2 (df=4) groups: χ 2 (df=6) 9 groups: χ 2 (df=8)
12 Example In a dihybrid cross of tomatos we expect the ratio of the phenotypes to be 9:3:3:1. In 1611 tomatos, we observe the numbers 926, 288, 293, 104. Do these numbers support our hypothesis? Phenotype n i n 0 i n i /n 0 i n i ln ( ) n i /n 0 i Tall, cutleaf Tall, potatoleaf Dwarf, cutleaf Dwarf, potatoleaf Sum
13 Results Obs LRT The test statistics LRT is Using a χ 2 (df=3) distribution, we get a pvalue of We therefore have no evidence against the hypothesis that the ratio of the phenotypes is 9:3:3:1.
14 The chisquare test There is an alternative technique. The test is called the chisquare test, and has the greater tradition in the literature. For two groups, calculate the following: X 2 = ( n A n 0 A) 2 n 0 A + ( n B n 0 B) 2 n 0 B If H 0 is true, then X 2 is a draw from a χ 2 (df=1) distribution (approximately).
15 Example In the first example we observed n A =78andn B = 22. Under the null hypothesis we have n 0 A =75andn 0 B =25.Wethereforeget X 2 = (7875) (2225)2 25 = =0.48. This corresponds to a pvalue of We therefore have no evidence against the hypothesis (p A, p B )=(0.75,0.25). Note: using the likelihood ratio test we got a pvalue of In R: chisq.test(c(78,22),p=c(0.75,0.25))
16 Generalization to more than two groups As with the likelihood ratio test, there is a generalization to more than just two groups. If we have k groups, the chisquare test statistic we use is X 2 = k i=1 (n i n 0 i ) 2 n 0 i χ 2 (df=k1)
17 Tomato example For the tomato example we get X 2 = ( ) ( ) ( ) ( ) = = 1.47 Using a χ 2 (df=3) distribution, we get a pvalue of We therefore have no evidence against the hypothesis that the ratio of the phenotypes is 9:3:3:1. Using the likelihood ratio test we also got a pvalue of In R: chisq.test(c(926,288,293,104),p=c(9,3,3,1)/16)
18 Test statistics Let n 0 i denote the expected count in group i if H 0 is true. LRT statistic LRT = 2 ln { } Pr(data p=mle) Pr(data H 0 ) =...=2 i n i ln(n i /n 0 i ) χ 2 test statistic X 2 = (observed expected) 2 expected = i (n i n 0 i ) 2 n 0 i
19 Null distribution of test statistic What values of LRT (or X 2 )shouldweexpect,ifh 0 were true? The null distributions of these statistics may be obtained by: Bruteforce analytic calculations Computer simulations Asymptotic approximations If the sample size n is large, we have LRT χ 2 (k 1) and X 2 χ 2 (k 1)
20 The bruteforce method Pr(LRT = g H 0 ) = n 1,n 2,n 3 giving LRT = g Pr(n 1, n 2, n 3 H 0 ) This is not feasible.
21 Computer simulation 1. Simulate a table conforming to the null hypothesis. E.g., simulate (n 1, n 2, n 3 ) Multinomial(n=100, {1/4, 1/2, 1/4}) 2. Calculate your test statistic. 3. Repeat steps (1) and (2) many (e.g., 1000 or 10,000) times. Estimated critical value the 95th percentile of the results. Estimated Pvalue the prop n of results the observed value. In R, use rmultinom(n, size, prob) to do n simulations of a Multinomial(size, prob).
22 Example We observe the following data: AA AB BB We imagine that these are counts (n 1, n 2, n 3 ) Multinomial(n=100,{p 1, p 2, p 3 }). We seek to test H 0 : p 1 = 1/4, p 2 = 1/2, p 3 = 1/4. We calculate LRT 4.96 and X Referring to the asymptotic approximations (χ 2 dist n with 2 degrees of freedom), we obtain P 8.4% and P 6.9%. With 10,000 simulations under H 0,wegetP 8.9% and P 7.4%.
23 Example Est d null dist n of LRT statistic Observed 95th %ile = LRT Est d null dist n of chi square statistic Observed 95th %ile = X 2
24 Summary and recommendation For either the LRT or the χ 2 test: The null distribution is approximately χ 2 (k 1) if the sample size is large. The null distribution can be approximated by simulating data under the null hypothesis. If the sample size is sufficiently large that the expected count in eachcell is 5, use the asymptotic approximation without worries. Otherwise, consider using computer simulations.
25 Composite hypotheses Sometimes, we ask not p AA = 0.25, p AB = 0.5, p BB = 0.25 But rather something like: p AA = f 2, p AB = 2f(1 f), p BB =(1 f) 2 for some f. For example: Consider the genotypes, of a random sample of individuals, at a diallelic locus. Is the locus in HardyWeinberg equilibrium (as expected in the case of random mating)? Example data: AA AB BB
26 Another example ABO blood groups 3allelesA,B,O. Phenotype A genotype AA or AO B genotype BB or BO AB genotype AB O genotype O Allele frequencies: f A, f B, f O (Note that f A + f B + f O = 1) Under HardyWeinberg equilibrium, we expect p A = f 2 A + 2f A f O p B = f 2 B + 2f B f O p AB = 2f A f B p O = f 2 O Example data: O A B AB
27 LRT for example 1 Data: (n AA, n AB, n BB ) Multinomial(n,{p AA, p AB, p BB }) We seek to test whether the data conform reasonably to H 0 : p AA = f 2, p AB = 2f(1 f), p BB =(1 f) 2 for some f. General MLEs: ˆp AA = n AA /n, ˆp AB = n AB /n, ˆp BB = n BB /n MLE under H 0 : ˆf =(naa + n AB /2)/n p AA = ˆf 2, p AB = 2ˆf (1 ˆf), p BB =(1 ˆf) 2 LRT statistic: LRT = 2 ln { } Pr(nAA, n AB, n BB ˆp AA, ˆp AB, ˆp BB ) Pr(n AA, n AB, n BB p AA, p AB, p BB )
28 LRT for example 2 Data: (n O, n A, n B, n AB ) Multinomial(n,{p O, p A, p B, p AB }) We seek to test whether the data conform reasonably to H 0 : p A = f 2 A + 2f A f O,p B = f 2 B + 2f B f O,p AB = 2f A f B,p O = f 2 O for some f O, f A, f B,wheref O + f A + f B = 1. General MLEs: ˆp O, ˆp A, ˆp B, ˆp AB, like before. MLE under H 0 : Requires numerical optimization Call them (ˆf O,ˆf A,ˆf B ) ( p O, p A, p B, p AB ) LRT statistic: LRT = 2 ln { } Pr(nO, n A, n B, n AB ˆp O, ˆp A, ˆp B, ˆp AB ) Pr(n O, n A, n B, n AB p O, p A, p B, p AB )
29 χ 2 test for these examples Obtain the MLE(s) under H 0. Calculate the corresponding cell probabilities. Turn these into (estimated) expected counts under H 0. Calculate X 2 = (observed expected) 2 expected
30 Null distribution for these cases Computer simulation (with one wrinkle) Simulate data under H 0 (plug in the MLEs for the observed data) Calculate the MLE with the simulated data Calculate the test statistic with the simulated data Repeat many times Asymptotic approximation Under H 0,ifthesamplesize,n,islarge,boththeLRTstatistic and the χ 2 statistic follow, approximately, a χ 2 distribution with k s 1degrees of freedom, where s is the number of parameters estimated under H 0. Note that s = 1 for example 1, and s = 2 for example 2, and so df = 1 for both examples.
31 Example 1 Example data: AA AB BB MLE: ˆf =(5+20/2)/100=15% Expected counts: Test statistics: LRT statistic = 3.87 X 2 =4.65 Asymptotic χ 2 (df = 1) approx n: P 4.9% P 3.1% 10,000 computer simulations: P 8.2% P 2.4%
32 Example 1 Est d null dist n of LRT statistic Observed 95th %ile = LRT Est d null dist n of chi square statistic 95th %ile = 3.36 Observed X 2
33 Example 2 Example data: O A B AB MLE: ˆfO 62.8%, ˆf A 25.0%, ˆf B 12.2%. Expected counts: Test statistics: LRT statistic = 1.99 X 2 =2.10 Asymptotic χ 2 (df = 1) approx n: P 16% P 15% 10,000 computer simulations: P 17% P 15%
34 Example 2 Est d null dist n of LRT statistic Observed 95th %ile = LRT Est d null dist n of chi square statistic Observed 95th %ile = X 2
35 Example 3 Data on number of sperm bound to an egg: count Do these follow a Poisson distribution? MLE: ˆλ =sampleaverage=( )/ Expected counts n 0 i = n e ˆλ ˆλi / i!
36 Example observed expected X 2 = (obs exp) 2 exp =...=42.8 LRT = 2 obs log(obs/exp) =...=18.8 Compare to χ 2 (df = = 4) Pvalue = (χ 2 )and (LRT). By simulation: pvalue = 16/10,000 (χ 2 ) and 7/10,000 (LRT)
37 Null simulation results Observed Simulated χ 2 statistic Observed Simulated LRT statistic
38 A final note With these sorts of goodnessoffit tests, we are often happy when our model does fit. In other words, we often prefer to fail to reject H 0. Such a conclusion, that the data fit the model reasonably well, should be phrased and considered with caution. We should think: how much power do I have to detect, with these limited data, a reasonable deviation from H 0?
39 2 x 2 tables Apply a treatment to 20 mice from strains A and B, and observe survival. Question: N Y A B Are the survival rates in the two strains the same? Gather 100 rats and determine whether they are infected with viruses A and B. IB NIB IA NIA Question: Is infection with virus A independent of infection with virus B?
40 Underlying probabilities Observed data B 0 1 A 0 n 00 n 01 n 0+ 1 n 10 n 11 n 1+ n +0 n +1 n Underlying probabilities B 0 1 A 0 p 00 p 01 p 0+ 1 p 10 p 11 p 1+ p +0 p +1 1 Model: (n 00, n 01, n 10, n 11 ) Multinomial(n,{p 00, p 01, p 10, p 11 }) or n 01 Binomial(n 0+, p 01 /p 0+ ) and n 11 Binomial(n 1+, p 11 /p 1+ )
41 Conditional probabilities Underlying probabilities B 0 1 A 0 p 00 p 01 p 0+ 1 p 10 p 11 p 1+ p +0 p +1 1 Conditional probabilities Pr(B = 1 A=0)=p 01 /p 0+ Pr(B = 1 A=1)=p 11 /p 1+ Pr(A = 1 B=0)=p 10 /p +0 Pr(A = 1 B=1)=p 11 /p +1 The questions in the two examples are the same! They both concern: p 01 /p 0+ = p 11 /p 1+ Equivalently: p ij = p i+ p +j for all i,j think Pr(A and B) = Pr(A) Pr(B).
42 This is a composite hypothesis! 2x2table Adifferentview B 0 1 A 0 p 00 p 01 p 0+ p 00 p 01 p 10 p 11 1 p 10 p 11 p 1+ p +0 p +1 1 H 0 : p ij = p i+ p +j for all i,j H 0 : p ij = p i+ p +j for all i,j Degrees of freedom = = 1
43 Expected counts Observed data B 0 1 A 0 n 00 n 01 n 0+ 1 n 10 n 11 n 1+ n +0 n +1 n Expected counts B 0 1 A 0 e 00 e 01 n 0+ 1 e 10 e 11 n 1+ n +0 n +1 n To get the expected counts under the null hypothesis we: Estimate p 1+ and p +1 by n 1+ /nandn +1 /n, respectively. These are the MLEs under H 0! Turn these into estimates of the p ij. Multiply these by the total sample size, n.
44 The expected counts The expected count (assuming H 0 ) for the 11 cell is the following: e 11 = n ˆp 11 = n ˆp 1+ ˆp +1 = n (n 1+ /n) (n +1 /n) =(n 1+ n +1 )/n The other cells are similar. We can then calculate a χ 2 or LRT statistic as before!
45 Example 1 Observed data N Y A B Expected counts N Y A B X 2 = ( ) ( ) (2 5.5) (9 5.5)2 5.5 =6.14 LRT = 2 [18 log( ) log( 5.5 )] =6.52 Pvalues (based on the asymptotic χ 2 (df = 1) approximation): 1.3% and 1.1%, respectively.
46 Example 2 Observed data IB NIB IA NIA Expected counts IB NIB IA NIA X 2 = (9 5.2) ( ) (9 12.8) ( ) =4.70 LRT = 2 [9 log( ) log( 58.2 )] =4.37 Pvalues (based on the asymptotic χ 2 (df = 1) approximation): 3.0% and 3.7%, respectively.
47 r x k tables Blood type Population A B AB O Florida Iowa Missouri Same distribution of blood types in each population?
48 Underlying probabilities Observed data 1 2 k 1 n 11 n 12 n 1k n 1+ 2 n 21 n 22 n 2k n r n r1 n r2 n rk n r+ n +1 n +2 n +k n Underlying probabilities 1 2 k 1 p 11 p 12 p 1k p 1+ 2 p 21 p 22 p 2k p r p r1 p r2 p rk p r+ p +1 p +2 p +k 1 H 0 : p ij = p i+ p +j for all i,j.
49 Expected counts Observed data A B AB O F I M Expected counts A B AB O F I M Expected counts under H 0 : e ij = n i+ n +j /n for all i,j.
50 χ 2 and LRT statistics Observed data A B AB O F I M Expected counts A B AB O F I M X 2 statistic = (obs exp) 2 exp = =5.64 LRT statistic = 2 obs ln(obs/exp) = =5.55
51 Asymptotic approximation If the sample size is large, the null distribution of the χ 2 and likelihood ratio test statistics will approximately follow a χ 2 distribution with (r 1) (k 1) d.f. In the example, df = (3 1) (4 1) = 6 X 2 =5.64 P=0.46. LRT = 5.55 P=0.48.
52 Twolocus linkage in an intercross BB Bb bb AA Aa aa Are these two loci linked?
53 General test of independence Observed data BB Bb bb AA Aa aa Expected counts BB Bb bb AA Aa aa χ 2 test: X 2 =10.4 P=3.5% (df = 4) LRT test: LRT = 9.98 P=4.1% Fisher s exact test: P = 4.6%
54 A more specific test Observed data BB Bb bb AA Aa aa Underlying probabilities BB Bb bb AA 1 4 (1 θ)2 1 2 θ(1 θ) 1 4 θ2 Aa 1 2 θ(1 θ) 1 2 [θ2 +(1 θ) 2 1 ] 2θ(1 θ) 1 aa 4 θ2 1 2 θ(1 θ) 1 4 (1 θ)2 H 0 : θ =1/2 versus H a : θ<1/2 Use a likelihood ratio test! Obtain the general MLE of θ. Calculate the LRT statistic = 2 ln Compare this statistic to a χ 2 (df = 1). { Pr(data ˆθ) } Pr(data θ=1/2)
55 Results BB Bb bb AA Aa aa MLE: ˆθ =0.359 LRT statistic: LRT = 7.74 P=0.54% (df = 1) Here we assume Mendelian segregation, and that deviation from H 0 is in a particular direction. If these assumptions are correct, we ll have greater power to detect linkage using this more specific approach.
Goodness of Fit Goodness of fit  2 classes
Goodness of Fit Goodness of fit  2 classes A B 78 22 Do these data correspond reasonably to the proportions 3:1? We previously discussed options for testing p A =0.75! Exact pvalue Exact confidence interval
More informationTopic 19: Goodness of Fit
Topic 19: November 24, 2009 A goodness of fit test examine the case of a sequence if independent experiments each of which can have 1 of k possible outcomes. In terms of hypothesis testing, let π = (π
More informationTopic 21: Goodness of Fit
Topic 21: December 5, 2011 A goodness of fit tests examine the case of a sequence of independent observations each of which can have 1 of k possible categories. For example, each of us has one of 4 possible
More informationChapter 4. The analysis of Segregation
Chapter 4. The analysis of Segregation 1 The law of segregation (Mendel s rst law) There are two alleles in the two homologous chromosomes at a certain marker. One of the two alleles received from one
More informationChi Square for Contingency Tables
2 x 2 Case Chi Square for Contingency Tables A test for p 1 = p 2 We have learned a confidence interval for p 1 p 2, the difference in the population proportions. We want a hypothesis testing procedure
More informationAllele Frequencies and Hardy Weinberg Equilibrium
Allele Frequencies and Hardy Weinberg Equilibrium Summer Institute in Statistical Genetics 013 Module 8 Topic Allele Frequencies and Genotype Frequencies How do allele frequencies relate to genotype frequencies
More informationTopic 21 Goodness of Fit
Topic 21 Goodness of Fit Fit of a Distribution 1 / 14 Outline Fit of a Distribution Blood Bank Likelihood Function Likelihood Ratio Lagrange Multipliers Hanging ChiGram 2 / 14 Fit of a Distribution Goodness
More informationStatistics for EES and MEME Chisquare tests and Fisher s exact test
Statistics for EES and MEME Chisquare tests and Fisher s exact test Dirk Metzler 3. June 2013 Contents 1 X 2 goodnessoffit test 1 2 X 2 test for homogeneity/independence 3 3 Fisher s exact test 7 4
More informationMultiple random variables
Multiple random variables Multiple random variables We essentially always consider multiple random variables at once. The key concepts: Joint, conditional and marginal distributions, and independence of
More informationCATEGORICAL DATA ChiSquare Tests for Univariate Data
CATEGORICAL DATA ChiSquare Tests For Univariate Data 1 CATEGORICAL DATA ChiSquare Tests for Univariate Data Recall that a categorical variable is one in which the possible values are categories or groupings.
More informationMATH4427 Notebook 2 Spring 2016. 2 MATH4427 Notebook 2 3. 2.1 Definitions and Examples... 3. 2.2 Performance Measures for Estimators...
MATH4427 Notebook 2 Spring 2016 prepared by Professor Jenny Baglivo c Copyright 20092016 by Jenny A. Baglivo. All Rights Reserved. Contents 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................
More information2 GENETIC DATA ANALYSIS
2.1 Strategies for learning genetics 2 GENETIC DATA ANALYSIS We will begin this lecture by discussing some strategies for learning genetics. Genetics is different from most other biology courses you have
More informationO, A, B, and AB. 0 for all i =1,...,k,
Topic 1 1.1 Fit of a Distribution Goodness of fit tests examine the case of a sequence of independent observations each of which can have 1 of k possible categories. For example, each of us has one of
More informationCHAPTER NINE. Key Concepts. McNemar s test for matched pairs combining pvalues likelihood ratio criterion
CHAPTER NINE Key Concepts chisquare distribution, chisquare test, degrees of freedom observed and expected values, goodnessoffit tests contingency table, dependent (response) variables, independent
More informationMendelian Genetics. I. Background
Mendelian Genetics Objectives 1. To understand the Principles of Segregation and Independent Assortment. 2. To understand how Mendel s principles can explain transmission of characters from one generation
More informationComparing Multiple Proportions, Test of Independence and Goodness of Fit
Comparing Multiple Proportions, Test of Independence and Goodness of Fit Content Testing the Equality of Population Proportions for Three or More Populations Test of Independence Goodness of Fit Test 2
More informationStatistical Testing of Randomness Masaryk University in Brno Faculty of Informatics
Statistical Testing of Randomness Masaryk University in Brno Faculty of Informatics Jan Krhovják Basic Idea Behind the Statistical Tests Generated random sequences properties as sample drawn from uniform/rectangular
More informationtests whether there is an association between the outcome variable and a predictor variable. In the Assistant, you can perform a ChiSquare Test for
This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. In practice, quality professionals sometimes
More information4. Sum the results of the calculation described in step 3 for all classes of progeny
F09 Biol 322 chi square notes 1. Before proceeding with the chi square calculation, clearly state the genetic hypothesis concerning the data. This hypothesis is an interpretation of the data that gives
More informationTesting on proportions
Testing on proportions Textbook Section 5.4 April 7, 2011 Example 1. X 1,, X n Bernolli(p). Wish to test H 0 : p p 0 H 1 : p > p 0 (1) Consider a related problem The likelihood ratio test is where c is
More information12.5: CHISQUARE GOODNESS OF FIT TESTS
125: ChiSquare Goodness of Fit Tests CD121 125: CHISQUARE GOODNESS OF FIT TESTS In this section, the χ 2 distribution is used for testing the goodness of fit of a set of data to a specific probability
More informationCHAPTER 11. GOODNESS OF FIT AND CONTINGENCY TABLES
CHAPTER 11. GOODNESS OF FIT AND CONTINGENCY TABLES The chisquare distribution was discussed in Chapter 4. We now turn to some applications of this distribution. As previously discussed, chisquare is
More informationChisquare test Testing for independeny The r x c contingency tables square test
Chisquare test Testing for independeny The r x c contingency tables square test 1 The chisquare distribution HUSRB/0901/1/088 Teaching Mathematics and Statistics in Sciences: Modeling and Computeraided
More informationHypothesis Testing COMP 245 STATISTICS. Dr N A Heard. 1 Hypothesis Testing 2 1.1 Introduction... 2 1.2 Error Rates and Power of a Test...
Hypothesis Testing COMP 45 STATISTICS Dr N A Heard Contents 1 Hypothesis Testing 1.1 Introduction........................................ 1. Error Rates and Power of a Test.............................
More informationINTRODUCTION TO GENETICS USING TOBACCO (Nicotiana tabacum) SEEDLINGS
INTRODUCTION TO GENETICS USING TOBACCO (Nicotiana tabacum) SEEDLINGS By Dr. Susan Petro Based on a lab by Dr. Elaine Winshell Nicotiana tabacum Objectives To apply Mendel s Law of Segregation To use Punnett
More informationLecture 42 Section 14.3. Tue, Apr 8, 2008
the Lecture 42 Section 14.3 HampdenSydney College Tue, Apr 8, 2008 Outline the 1 2 the 3 4 5 the The will compute χ 2 areas, but not χ 2 percentiles. (That s ok.) After performing the χ 2 test by hand,
More informationWe know from STAT.1030 that the relevant test statistic for equality of proportions is:
2. Chi 2 tests for equality of proportions Introduction: Two Samples Consider comparing the sample proportions p 1 and p 2 in independent random samples of size n 1 and n 2 out of two populations which
More informationAP: LAB 8: THE CHISQUARE TEST. Probability, Random Chance, and Genetics
Ms. Foglia Date AP: LAB 8: THE CHISQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,
More informationChapter 11 Chi square Tests
11.1 Introduction 259 Chapter 11 Chi square Tests 11.1 Introduction In this chapter we will consider the use of chi square tests (χ 2 tests) to determine whether hypothesized models are consistent with
More informationUsing Stata for Categorical Data Analysis
Using Stata for Categorical Data Analysis NOTE: These problems make extensive use of Nick Cox s tab_chi, which is actually a collection of routines, and Adrian Mander s ipf command. From within Stata,
More informationLecture 7: Binomial Test, Chisquare
Lecture 7: Binomial Test, Chisquare Test, and ANOVA May, 01 GENOME 560, Spring 01 Goals ANOVA Binomial test Chi square test Fisher s exact test Su In Lee, CSE & GS suinlee@uw.edu 1 Whirlwind Tour of One/Two
More informationThe GoodnessofFit Test
on the Lecture 49 Section 14.3 HampdenSydney College Tue, Apr 21, 2009 Outline 1 on the 2 3 on the 4 5 Hypotheses on the (Steps 1 and 2) (1) H 0 : H 1 : H 0 is false. (2) α = 0.05. p 1 = 0.24 p 2 = 0.20
More informationToday s lecture. Lecture 6: Dichotomous Variables & Chisquare tests. Proportions and 2 2 tables. How do we compare these proportions
Today s lecture Lecture 6: Dichotomous Variables & Chisquare tests Sandy Eckel seckel@jhsph.edu Dichotomous Variables Comparing two proportions using 2 2 tables Study Designs Relative risks, odds ratios
More information93.4 Likelihood ratio test. NeymanPearson lemma
93.4 Likelihood ratio test NeymanPearson lemma 91 Hypothesis Testing 91.1 Statistical Hypotheses Statistical hypothesis testing and confidence interval estimation of parameters are the fundamental
More informationStatistical Analysis of Blood Group Using Maximum Likelihood Estimation
International Journal of Engineering and Technical Research (IJETR) ISSN: 23210869, Volume2, Issue6, June 2014 Statistical Analysis of Blood Group Using Maximum Likelihood Estimation Nagarajan.D, Sunitha.P,
More informationChiSquare Test. Contingency Tables. Contingency Tables. ChiSquare Test for Independence. ChiSquare Tests for GoodnessofFit
ChiSquare Tests 15 Chapter ChiSquare Test for Independence ChiSquare Tests for Goodness Uniform Goodness Poisson Goodness Goodness Test ECDF Tests (Optional) McGrawHill/Irwin Copyright 2009 by The
More informationChi Squared and Fisher's Exact Tests. Observed vs Expected Distributions
BMS 617 Statistical Techniques for the Biomedical Sciences Lecture 11: ChiSquared and Fisher's Exact Tests Chi Squared and Fisher's Exact Tests This lecture presents two similarly structured tests, Chisquared
More information13.2 The Chi Square Test for Homogeneity of Populations The setting: Used to compare distribution of proportions in two or more populations.
13.2 The Chi Square Test for Homogeneity of Populations The setting: Used to compare distribution of proportions in two or more populations. Data is organized in a two way table Explanatory variable (Treatments)
More informationToday s Objectives. Probability rules apply to inheritance at more than one locus
Figure 14.8 Segregation of alleles and fertilization as chance events Today s Objectives Use rules of probability to solve genetics problems Define dominance, incomplete dominance, and codominance Extend
More informationLAB : THE CHISQUARE TEST. Probability, Random Chance, and Genetics
Period Date LAB : THE CHISQUARE TEST Probability, Random Chance, and Genetics Why do we study random chance and probability at the beginning of a unit on genetics? Genetics is the study of inheritance,
More informationOrdinal Regression. Chapter
Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe
More informationMATH 10: Elementary Statistics and Probability Chapter 11: The ChiSquare Distribution
MATH 10: Elementary Statistics and Probability Chapter 11: The ChiSquare Distribution Tony Pourmohamad Department of Mathematics De Anza College Spring 2015 Objectives By the end of this set of slides,
More informationAppendix A The ChiSquare (32) Test in Genetics
Appendix A The ChiSquare (32) Test in Genetics With infinitely large sample sizes, the ideal result of any particular genetic cross is exact conformation to the expected ratio. For example, a cross between
More informationH + T = 1. p(h + T) = p(h) x p(t)
Probability and Statistics Random Chance A tossed penny can land either heads up or tails up. These are mutually exclusive events, i.e. if the coin lands heads up, it cannot also land tails up on the same
More informationInferential Statistics
Inferential Statistics Sampling and the normal distribution Zscores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are
More informationBasic concepts and introduction to statistical inference
Basic concepts and introduction to statistical inference Anna Helga Jonsdottir Gunnar Stefansson Sigrun Helga Lund University of Iceland (UI) Basic concepts 1 / 19 A review of concepts Basic concepts Confidence
More informationChapter 9: Hypothesis Testing Sections
Chapter 9: Hypothesis Testing Sections  we are still here Skip: 9.2 Testing Simple Hypotheses Skip: 9.3 Uniformly Most Powerful Tests Skip: 9.4 TwoSided Alternatives 9.5 The t Test 9.6 Comparing the
More informationSampling and Hypothesis Testing
Population and sample Sampling and Hypothesis Testing Allin Cottrell Population : an entire set of objects or units of observation of one sort or another. Sample : subset of a population. Parameter versus
More informationMorgan s fly experiment. Linkage Mapping. Genetic recombination. Genetic Recombination. Crossing Over and Recombination. CS Jim Holland
Morgan s fly experiment Linkage Mapping CS71 009 Jim Holland r r Vg Vg (red eyes, normal wings) prprvgvg (purple eyes, vestigial wings) r prvg vg (F1) prprvgvg 19 151 151 1195 r prvg vg r prvgvg prprvg
More informationStatistics for Management IISTAT 362Final Review
Statistics for Management IISTAT 362Final Review Multiple Choice Identify the letter of the choice that best completes the statement or answers the question. 1. The ability of an interval estimate to
More informationMCB142/IB163 Mendelian and Population Genetics 9/19/02
MCB142/IB163 Mendelian and Population Genetics 9/19/02 Practice questions lectures 512 (Hardy Weinberg, chisquare test, Mendel s second law, gene interactions, linkage maps, mapping human diseases, nonrandom
More informationMendelian Inheritance & Probability
Mendelian Inheritance & Probability (CHAPTER 2 Brooker Text) January 31 & Feb 2, 2006 BIO 184 Dr. Tom Peavy Problem Solving TtYy x ttyy What is the expected phenotypic ratio among offspring? Tt RR x Tt
More informationThe alternative hypothesis,, is the statement that the parameter value somehow differs from that claimed by the null hypothesis. : 0.5 :>0.5 :<0.
Section 8.28.5 Null and Alternative Hypotheses... The null hypothesis,, is a statement that the value of a population parameter is equal to some claimed value. :=0.5 The alternative hypothesis,, is the
More informationChapter Five: Paired Samples Methods 1/38
Chapter Five: Paired Samples Methods 1/38 5.1 Introduction 2/38 Introduction Paired data arise with some frequency in a variety of research contexts. Patients might have a particular type of laser surgery
More informationThe NeymanPearson lemma. The NeymanPearson lemma
The NeymanPearson lemma In practical hypothesis testing situations, there are typically many tests possible with significance level α for a null hypothesis versus alternative hypothesis. This leads to
More informationAnalyzing tables. Chisquared, exact and monte carlo tests. Jon Michael Gran Department of Biostatistics, UiO
Analyzing tables Chisquared, exact and monte carlo tests Jon Michael Gran Department of Biostatistics, UiO MF9130 Introductory course in statistics Thursday 26.05.2011 1 / 50 Overview Aalen chapter 6.5,
More informationExact Confidence Intervals
Math 541: Statistical Theory II Instructor: Songfeng Zheng Exact Confidence Intervals Confidence intervals provide an alternative to using an estimator ˆθ when we wish to estimate an unknown parameter
More informationHLA data analysis in anthropology: basic theory and practice
HLA data analysis in anthropology: basic theory and practice Alicia SanchezMazas and José Manuel Nunes Laboratory of Anthropology, Genetics and Peopling history (AGP), Department of Anthropology and Ecology,
More informationIntroduction to Hypothesis Testing. Point estimation and confidence intervals are useful statistical inference procedures.
Introduction to Hypothesis Testing Point estimation and confidence intervals are useful statistical inference procedures. Another type of inference is used frequently used concerns tests of hypotheses.
More informationDEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS. Posc/Uapp 816 CONTINGENCY TABLES
DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 816 CONTINGENCY TABLES I. AGENDA: A. Crossclassifications 1. Twobytwo and R by C tables 2. Statistical independence 3. The interpretation
More informationDr. Young. Genetics Problems Set #1 Answer Key
BIOL276 Dr. Young Name Due Genetics Problems Set #1 Answer Key For problems in genetics, if no particular order is specified, you can assume that a specific order is not required. 1. What is the probability
More informationCalculating PValues. Parkland College. Isela Guerra Parkland College. Recommended Citation
Parkland College A with Honors Projects Honors Program 2014 Calculating PValues Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating PValues" (2014). A with Honors Projects.
More informationChi Square Test. Paul E. Johnson Department of Political Science. 2 Center for Research Methods and Data Analysis, University of Kansas
Chi Square Test Paul E. Johnson 1 2 1 Department of Political Science 2 Center for Research Methods and Data Analysis, University of Kansas 2011 What is this Presentation? Chi Square Distribution Find
More informationMath 58. Rumbos Fall 2008 1. Solutions to Review Problems for Exam 2
Math 58. Rumbos Fall 2008 1 Solutions to Review Problems for Exam 2 1. For each of the following scenarios, determine whether the binomial distribution is the appropriate distribution for the random variable
More informationBivariate Analysis. Comparisons of proportions: Chi Square Test (X 2 test) Variable 1. Variable 2 2 LEVELS >2 LEVELS CONTINUOUS
Bivariate Analysis Variable 1 2 LEVELS >2 LEVELS CONTINUOUS Variable 2 2 LEVELS X 2 chi square test >2 LEVELS X 2 chi square test CONTINUOUS ttest X 2 chi square test X 2 chi square test ANOVA (Ftest)
More informationChisquare test Fisher s Exact test
Lesson 1 Chisquare test Fisher s Exact test McNemar s Test Lesson 1 Overview Lesson 11 covered two inference methods for categorical data from groups Confidence Intervals for the difference of two proportions
More informationChapter 4: Statistical Hypothesis Testing
Chapter 4: Statistical Hypothesis Testing Christophe Hurlin November 20, 2015 Christophe Hurlin () Advanced Econometrics  Master ESA November 20, 2015 1 / 225 Section 1 Introduction Christophe Hurlin
More informationRandom Uniform Clumped. 0 1 2 3 4 5 6 7 8 9 Number of Individuals per SubQuadrat. Number of Individuals per SubQuadrat
41 Population ecology Lab 4: Population dispersion patterns I. Introduction to population dispersion patterns The dispersion of individuals in a population describes their spacing relative to each other.
More informationHypothesis Testing. Learning Objectives. After completing this module, the student will be able to
Hypothesis Testing Learning Objectives After completing this module, the student will be able to carry out a statistical test of significance calculate the acceptance and rejection region calculate and
More informationHardyWeinberg Equilibrium Problems
HardyWeinberg Equilibrium Problems 1. The frequency of two alleles in a gene pool is 0.19 (A) and 0.81(a). Assume that the population is in HardyWeinberg equilibrium. (a) Calculate the percentage of
More informationPASS Sample Size Software
Chapter 250 Introduction The Chisquare test is often used to test whether sets of frequencies or proportions follow certain patterns. The two most common instances are tests of goodness of fit using multinomial
More informationHypothesis testing S2
Basic medical statistics for clinical and experimental research Hypothesis testing S2 Katarzyna Jóźwiak k.jozwiak@nki.nl 2nd November 2015 1/43 Introduction Point estimation: use a sample statistic to
More informationUnit 29 ChiSquare GoodnessofFit Test
Unit 29 ChiSquare GoodnessofFit Test Objectives: To perform the chisquare hypothesis test concerning proportions corresponding to more than two categories of a qualitative variable To perform the Bonferroni
More informationTesting Hypotheses using SPSS
Is the mean hourly rate of male workers $2.00? TTest OneSample Statistics Std. Error N Mean Std. Deviation Mean 2997 2.0522 6.6282.2 OneSample Test Test Value = 2 95% Confidence Interval Mean of the
More informationMath 62 Statistics Sample Exam Questions
Math 62 Statistics Sample Exam Questions 1. (10) Explain the difference between the distribution of a population and the sampling distribution of a statistic, such as the mean, of a sample randomly selected
More informationStudy Guide for the Final Exam
Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make
More informationMultinomial and Ordinal Logistic Regression
Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,
More informationBasics of Statistical Machine Learning
CS761 Spring 2013 Advanced Machine Learning Basics of Statistical Machine Learning Lecturer: Xiaojin Zhu jerryzhu@cs.wisc.edu Modern machine learning is rooted in statistics. You will find many familiar
More informationLS4 Problem Set 2, Population Genetics. Corresponding Quiz: November (Along with positional cloning)
LS4 Problem Set 2, 2010 Population Genetics Corresponding Lectures: November 5, 8 and 10 Corresponding Reading: Hartwell et al., 757773 Corresponding Quiz: November 1519 (Along with positional cloning)
More information11. Analysis of Casecontrol Studies Logistic Regression
Research methods II 113 11. Analysis of Casecontrol Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationBreeding Value versus Genetic Value
Breeding Value versus Genetic Value Gene351 (not( Gene251) Lecture 6 Key terms Breeding value Genetic value Average effect of alleles Reading: Prescribed None Suggested Falconer & McKay, 1996 Revision
More informationClass 19: Two Way Tables, Conditional Distributions, ChiSquare (Text: Sections 2.5; 9.1)
Spring 204 Class 9: Two Way Tables, Conditional Distributions, ChiSquare (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the
More informationGoodness of Fit. Proportional Model. Probability Models & Frequency Data
Probability Models & Frequency Data Goodness of Fit Proportional Model Chisquare Statistic Example R Distribution Assumptions Example R 1 Goodness of Fit Goodness of fit tests are used to compare any
More informationEstimating the Frequency Distribution of the. Numbers Bet on the California Lottery
Estimating the Frequency Distribution of the Numbers Bet on the California Lottery Mark Finkelstein November 15, 1993 Department of Mathematics, University of California, Irvine, CA 92717. Running head:
More information" Y. Notation and Equations for Regression Lecture 11/4. Notation:
Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through
More informationUCLA STAT 13 Introduction to Statistical Methods for the Life and Health Sciences. Chapter 10 ChiSquare Test Relative Risk/Odds Ratios
UCLA STAT 3 Introduction to Statistical Methods for the Life and Health Sciences Instructor: Ivo Dinov, Asst. Prof. of Statistics and Neurology Chapter 0 ChiSquare Test Relative Risk/Odds Ratios Teaching
More informationGeneralization of Chisquare GoodnessofFit Test for Binned Data Using Saturated Models, with Application to Histograms
Generalization of Chisquare GoodnessofFit Test for Binned Data Using Saturated Models, with Application to Histograms Robert D. Cousins Dept. of Physics and Astronomy University of California, Los Angeles,
More informationChapter 11: Linear Regression  Inference in Regression Analysis  Part 2
Chapter 11: Linear Regression  Inference in Regression Analysis  Part 2 Note: Whether we calculate confidence intervals or perform hypothesis tests we need the distribution of the statistic we will use.
More informationChapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 81 Overview 82 Basics of Hypothesis Testing
Chapter 8 Hypothesis Testing 1 Chapter 8 Hypothesis Testing 81 Overview 82 Basics of Hypothesis Testing 83 Testing a Claim About a Proportion 85 Testing a Claim About a Mean: s Not Known 86 Testing
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationAnswers to Mendelian genetics questions BI164 Spring, 2007
Answers to Mendelian genetics questions BI164 Spring, 2007 1. The father has normal vision and must therefore be hemizygous for the normal vision allele. The mother must be a carrier and hence the source
More informationChapter 3 RANDOM VARIATE GENERATION
Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.
More informationBasic Premises of Population Genetics
Population genetics is concerned with the origin, amount and distribution of genetic variation present in populations of organisms, and the fate of this variation through space and time. The fate of genetic
More informationLinkage, recombination and gene mapping. Why are linkage relationships important?
Transmission patterns of linked genes Linkage, recombination and gene mapping Why are linkage relationships important? Linkage and independent assortment  statistical tests of hypotheses Recombination
More informationData Analysis and Uncertainty Part 3: Hypothesis Testing/Sampling
Data Analysis and Uncertainty Part 3: Hypothesis Testing/Sampling Instructor: Sargur N. University at Buffalo The State University of New York srihari@cedar.buffalo.edu Topics 1. Hypothesis Testing 1.
More informationPoisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study loglinear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
More informationNull Hypothesis H 0. The null hypothesis (denoted by H 0
Hypothesis test In statistics, a hypothesis is a claim or statement about a property of a population. A hypothesis test (or test of significance) is a standard procedure for testing a claim about a property
More information112 Goodness of Fit Test
112 Goodness of Fit Test In This section we consider sample data consisting of observed frequency counts arranged in a single row or column (called a oneway frequency table). We will use a hypothesis
More informationBivariate Statistics Session 2: Measuring Associations ChiSquare Test
Bivariate Statistics Session 2: Measuring Associations ChiSquare Test Features Of The ChiSquare Statistic The chisquare test is nonparametric. That is, it makes no assumptions about the distribution
More informationChapter 7 Part 2. Hypothesis testing Power
Chapter 7 Part 2 Hypothesis testing Power November 6, 2008 All of the normal curves in this handout are sampling distributions Goal: To understand the process of hypothesis testing and the relationship
More information