Goodness of Fit

Goodness of fit - 2 classes

A   B
78  22

Do these data correspond reasonably to the proportions 3:1?

We previously discussed options for testing $p_A = 0.75$:

- Exact p-value
- Exact confidence interval
- Normal approximation

Goodness of fit - 3 classes

Classes: AA, AB, BB

Do these data correspond reasonably to the proportions 1:2:1?

Multinomial distribution

Imagine an urn with $k$ types of balls. Let $p_i$ denote the proportion of type $i$. Draw $n$ balls with replacement.

Outcome: $(n_1, n_2, \dots, n_k)$, with $\sum_i n_i = n$, where $n_i$ is the number of balls drawn that were of type $i$.

$$P(X_1 = n_1, \dots, X_k = n_k) = \frac{n!}{n_1! \cdots n_k!}\, p_1^{n_1} \cdots p_k^{n_k} \quad \text{if } 0 \le n_i \le n \text{ and } \sum_i n_i = n.$$

Otherwise $P(X_1 = n_1, \dots, X_k = n_k) = 0$.

Example

Let $(p_1, p_2, p_3) = (0.25, 0.50, 0.25)$ and $n = 100$. Then

$$P(X_1 = 35, X_2 = 43, X_3 = 22) = \frac{100!}{35!\,43!\,22!}\; 0.25^{35}\, 0.50^{43}\, 0.25^{22}.$$

Rather brutal, numerically speaking. Take logs (and use a computer).

Goodness of fit test

We observe $(n_1, n_2, n_3) \sim \text{Multinomial}(n, p = \{p_1, p_2, p_3\})$.

We seek to test $H_0: p_1 = 0.25,\ p_2 = 0.5,\ p_3 = 0.25$ versus $H_a$: $H_0$ is false.

We need two things:

- A test statistic.
- The null distribution of the test statistic.
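Returning to the multinomial example above, the advice to take logs can be sketched in R roughly as follows. The helper name log_multinom is an assumption made for illustration; lgamma() and dmultinom() are standard R functions.

```r
# Log of the multinomial probability, computed with lgamma() so that
# neither 100! nor the tiny probability powers are formed directly.
log_multinom <- function(counts, probs) {
  n <- sum(counts)
  lgamma(n + 1) - sum(lgamma(counts + 1)) + sum(counts * log(probs))
}

counts <- c(35, 43, 22)
probs  <- c(0.25, 0.50, 0.25)

log_p <- log_multinom(counts, probs)
p     <- exp(log_p)

# Cross-check against R's built-in multinomial density.
p_builtin <- dmultinom(counts, prob = probs)
```

Working on the log scale and exponentiating only at the end is the design choice here; for much larger n it may be preferable to stay on the log scale throughout.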

The likelihood-ratio test (LRT)

Back to the first example:

A      B
$n_A$  $n_B$

Test $H_0: (p_A, p_B) = (\pi_A, \pi_B)$ versus $H_a: (p_A, p_B) \ne (\pi_A, \pi_B)$.

MLE under $H_a$: $\hat p_A = n_A / n$, where $n = n_A + n_B$.

Likelihood under $H_a$:

$$L_a = \Pr(n_A \mid p_A = \hat p_A) = \binom{n}{n_A}\, \hat p_A^{\,n_A} (1 - \hat p_A)^{n - n_A}$$

Likelihood under $H_0$:

$$L_0 = \Pr(n_A \mid p_A = \pi_A) = \binom{n}{n_A}\, \pi_A^{\,n_A} (1 - \pi_A)^{n - n_A}$$

Likelihood ratio test statistic: $\text{LRT} = 2 \ln(L_a / L_0)$

Some clever people have shown that if $H_0$ is true, then LRT follows (approximately) a $\chi^2(\text{df}=1)$ distribution.

Likelihood-ratio test for the example

We observed $n_A = 78$ and $n_B = 22$.

$H_0: (p_A, p_B) = (0.75, 0.25)$
$H_a: (p_A, p_B) \ne (0.75, 0.25)$

$$L_a = \Pr(n_A = 78 \mid p_A = 0.78) = \binom{100}{78}\, 0.78^{78}\, 0.22^{22}$$

$$L_0 = \Pr(n_A = 78 \mid p_A = 0.75) = \binom{100}{78}\, 0.75^{78}\, 0.25^{22}$$

$$\text{LRT} = 2 \ln(L_a / L_0) = 0.49$$

Using a $\chi^2(\text{df}=1)$ distribution, we get a p-value of about 0.48. We therefore have no evidence against the null hypothesis.

In R: p-value = 1 - pchisq(0.49,1)
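The two-class calculation above can be reproduced end to end with a short R sketch. Variable names are illustrative assumptions; dbinom() with log = TRUE is used so that the ratio is formed on the log scale.

```r
n_A  <- 78
n_B  <- 22
n    <- n_A + n_B
pi_A <- 0.75                      # hypothesised proportion under H0

p_hat <- n_A / n                  # MLE of p_A under Ha

# Binomial log-likelihoods under Ha and under H0.
logL_a <- dbinom(n_A, n, p_hat, log = TRUE)
logL_0 <- dbinom(n_A, n, pi_A,  log = TRUE)

lrt     <- 2 * (logL_a - logL_0)
p_value <- pchisq(lrt, df = 1, lower.tail = FALSE)
```

Using lower.tail = FALSE is equivalent to 1 - pchisq(...), and tends to be more accurate when the p-value is very small.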

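Null distribution

[Figure: null distribution of the LRT statistic, with the observed LRT marked.]

A little math...

Let $n = n_A + n_B$, $n_A^0 = E[n_A \mid H_0] = n\,\pi_A$, and $n_B^0 = E[n_B \mid H_0] = n\,\pi_B$.

Then

$$\frac{L_a}{L_0} = \left(\frac{n_A}{n_A^0}\right)^{n_A} \left(\frac{n_B}{n_B^0}\right)^{n_B}$$

or equivalently

$$\text{LRT} = 2\, n_A \ln\!\left(\frac{n_A}{n_A^0}\right) + 2\, n_B \ln\!\left(\frac{n_B}{n_B^0}\right).$$

Why do this?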
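The step from the two binomial likelihoods to this ratio can be spelled out as follows. This is a sketch using only the definitions already given, together with $\hat p_A = n_A/n$, $1 - \hat p_A = n_B/n$ and $\pi_B = 1 - \pi_A$:

```latex
\begin{align*}
\frac{L_a}{L_0}
  &= \frac{\binom{n}{n_A}\,\hat p_A^{\,n_A}\,(1-\hat p_A)^{n_B}}
          {\binom{n}{n_A}\,\pi_A^{\,n_A}\,\pi_B^{\,n_B}} \\
  &= \left(\frac{n_A/n}{\pi_A}\right)^{n_A}
     \left(\frac{n_B/n}{\pi_B}\right)^{n_B}
   = \left(\frac{n_A}{n\,\pi_A}\right)^{n_A}
     \left(\frac{n_B}{n\,\pi_B}\right)^{n_B}
   = \left(\frac{n_A}{n_A^0}\right)^{n_A}
     \left(\frac{n_B}{n_B^0}\right)^{n_B}.
\end{align*}
```

The binomial coefficients cancel, so the ratio depends on the data only through observed and expected counts, which is the form that extends to more than two groups.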

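Generalization to more than two groups

If we have $k$ groups, then the likelihood ratio test statistic is

$$\text{LRT} = 2 \sum_{i=1}^{k} n_i \ln\!\left(\frac{n_i}{n_i^0}\right)$$

If $H_0$ is true, $\text{LRT} \sim \chi^2(\text{df} = k-1)$, approximately.

Null distributions

[Figure: null distributions of the LRT. Panels: 3 groups, $\chi^2(\text{df}=2)$; 5 groups, $\chi^2(\text{df}=4)$; 7 groups, $\chi^2(\text{df}=6)$; 9 groups, $\chi^2(\text{df}=8)$.]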

Example

In a dihybrid cross of tomatoes we expect the ratio of the phenotypes to be 9:3:3:1. In 1611 tomatoes, we observe the numbers 926, 288, 293, 104. Do these numbers support our hypothesis?

Phenotype            n_i    n_i^0   n_i/n_i^0   n_i ln(n_i/n_i^0)
Tall, cut-leaf        926   906.2     1.02            20.03
Tall, potato-leaf     288   302.1     0.95           -13.73
Dwarf, cut-leaf       293   302.1     0.97            -8.93
Dwarf, potato-leaf    104   100.7     1.03             3.37
Sum                  1611  1611                        0.74

Results

[Figure: null distribution of the LRT statistic, with the observed LRT marked.]

The test statistic LRT is 1.48. Using a $\chi^2(\text{df}=3)$ distribution, we get a p-value of about 0.69. We therefore have no evidence against the hypothesis that the ratio of the phenotypes is 9:3:3:1.
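The tomato calculation can be checked with a few lines of R. This is a minimal sketch of the general formula $\text{LRT} = 2 \sum_i n_i \ln(n_i / n_i^0)$; the variable names are assumptions for illustration.

```r
observed <- c(926, 288, 293, 104)
ratio    <- c(9, 3, 3, 1)

# Expected counts under H0: the total split in the proportions 9:3:3:1.
expected <- sum(observed) * ratio / sum(ratio)

# Per-phenotype contributions n_i * ln(n_i / n_i^0), then the statistic.
contrib <- observed * log(observed / expected)
lrt     <- 2 * sum(contrib)

p_value <- pchisq(lrt, df = length(observed) - 1, lower.tail = FALSE)
```

Individual contributions can be negative even though the statistic itself cannot be; only their sum is interpreted.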

The chi-square test

There is an alternative technique. The test is called the chi-square test, and has the greater tradition in the literature.

For two groups, calculate the following:

$$X^2 = \frac{(n_A - n_A^0)^2}{n_A^0} + \frac{(n_B - n_B^0)^2}{n_B^0}$$

If $H_0$ is true, then $X^2$ is (approximately) a draw from a $\chi^2(\text{df}=1)$ distribution.

Example

In the first example we observed $n_A = 78$ and $n_B = 22$. Under the null hypothesis we have $n_A^0 = 75$ and $n_B^0 = 25$. We therefore get

$$X^2 = \frac{(78-75)^2}{75} + \frac{(22-25)^2}{25} = 0.12 + 0.36 = 0.48.$$

This corresponds to a p-value of about 0.49. We therefore have no evidence against the hypothesis $(p_A, p_B) = (0.75, 0.25)$.

Note: using the likelihood ratio test we got a p-value of about 0.48.

In R: chisq.test(c(78,22),p=c(0.75,0.25))

Generalization to more than two groups

As with the likelihood ratio test, there is a generalization to more than just two groups. If we have $k$ groups, the chi-square test statistic we use is

$$X^2 = \sum_{i=1}^{k} \frac{(n_i - n_i^0)^2}{n_i^0} \sim \chi^2(\text{df} = k-1)$$

Tomato example

For the tomato example we get

$$X^2 = \frac{(926 - 906.2)^2}{906.2} + \frac{(288 - 302.1)^2}{302.1} + \frac{(293 - 302.1)^2}{302.1} + \frac{(104 - 100.7)^2}{100.7} = 1.47$$

Using a $\chi^2(\text{df}=3)$ distribution, we get a p-value of about 0.69. We therefore have no evidence against the hypothesis that the ratio of the phenotypes is 9:3:3:1.

Using the likelihood ratio test we also got a p-value of about 0.69.

In R: chisq.test(c(926,288,293,104),p=c(9,3,3,1)/16)
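To connect the formula with the chisq.test() call above, the statistic can be computed by hand and compared with what the built-in function reports. A minimal sketch, with illustrative variable names:

```r
observed <- c(926, 288, 293, 104)
probs    <- c(9, 3, 3, 1) / 16
expected <- sum(observed) * probs

# Chi-square statistic: sum of (observed - expected)^2 / expected.
x2      <- sum((observed - expected)^2 / expected)
p_value <- pchisq(x2, df = length(observed) - 1, lower.tail = FALSE)

# The built-in goodness-of-fit test should agree with the hand calculation.
builtin <- chisq.test(observed, p = probs)
```

The fields builtin$statistic, builtin$p.value and builtin$expected hold the values to compare against.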

Test statistics

Let $n_i^0$ denote the expected count in group $i$ if $H_0$ is true.

LRT statistic:

$$\text{LRT} = 2 \ln\left\{ \frac{\Pr(\text{data} \mid p = \text{MLE})}{\Pr(\text{data} \mid H_0)} \right\} = \dots = 2 \sum_i n_i \ln(n_i / n_i^0)$$

$\chi^2$ test statistic:

$$X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \sum_i \frac{(n_i - n_i^0)^2}{n_i^0}$$

Null distribution of test statistic

What values of LRT (or $X^2$) should we expect, if $H_0$ were true?

The null distributions of these statistics may be obtained by:

- Brute-force analytic calculations
- Computer simulations
- Asymptotic approximations

If the sample size $n$ is large, we have $\text{LRT} \sim \chi^2(k-1)$ and $X^2 \sim \chi^2(k-1)$, approximately.

The brute-force method

$$\Pr(\text{LRT} = g \mid H_0) = \sum_{\substack{n_1, n_2, n_3 \\ \text{giving LRT} = g}} \Pr(n_1, n_2, n_3 \mid H_0)$$

This is not feasible.

Computer simulation

1. Simulate a table conforming to the null hypothesis. E.g., simulate $(n_1, n_2, n_3) \sim \text{Multinomial}(n = 100, \{1/4, 1/2, 1/4\})$.
2. Calculate your test statistic.
3. Repeat steps (1) and (2) many (e.g., 1000 or 10,000) times.

Estimated critical value = the 95th percentile of the results.
Estimated P-value = the proportion of results ≥ the observed value.

In R, use rmultinom(n, size, prob) to do n simulations of a Multinomial(size, prob).
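The simulation recipe above might be sketched in R as follows. The function names lrt_stat and chisq_stat, the seed, and the choice of 10,000 replicates are assumptions for illustration; the observed counts (35, 43, 22) are those of the multinomial example earlier in the document.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only

null_probs <- c(1, 2, 1) / 4
n          <- 100
n_sims     <- 10000
expected   <- n * null_probs

lrt_stat <- function(obs, expected) {
  # A cell with a zero count contributes 0 (0 * log 0 is taken as 0).
  2 * sum(ifelse(obs > 0, obs * log(obs / expected), 0))
}
chisq_stat <- function(obs, expected) sum((obs - expected)^2 / expected)

# Steps 1-3: each column of `tables` is one table simulated under H0.
tables  <- rmultinom(n_sims, size = n, prob = null_probs)
sim_lrt <- apply(tables, 2, lrt_stat,   expected = expected)
sim_x2  <- apply(tables, 2, chisq_stat, expected = expected)

observed <- c(35, 43, 22)
obs_lrt  <- lrt_stat(observed, expected)
obs_x2   <- chisq_stat(observed, expected)

crit_lrt <- quantile(sim_lrt, 0.95)    # estimated critical value
p_lrt    <- mean(sim_lrt >= obs_lrt)   # estimated P-values
p_x2     <- mean(sim_x2  >= obs_x2)
```

Because the estimates come from random draws, they vary slightly from run to run; increasing n_sims reduces that variation.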

Example

We observe the following data:

AA  AB  BB
35  43  22

We imagine that these are counts $(n_1, n_2, n_3) \sim \text{Multinomial}(n = 100, \{p_1, p_2, p_3\})$.

We seek to test $H_0: p_1 = 1/4,\ p_2 = 1/2,\ p_3 = 1/4$.

We calculate $\text{LRT} \approx 4.96$ and $X^2 \approx 5.34$.

Referring to the asymptotic approximations ($\chi^2$ distribution with 2 degrees of freedom), we obtain $P \approx 8.4\%$ (LRT) and $P \approx 6.9\%$ ($X^2$).

With 10,000 simulations under $H_0$, we get $P \approx 8.9\%$ (LRT) and $P \approx 7.4\%$ ($X^2$).

[Figure: estimated null distribution of the LRT statistic, with the observed LRT and the 95th percentile marked.]

[Figure: estimated null distribution of the chi-square statistic, with the observed $X^2$ and the 95th percentile marked.]

Summary and recommendation

For either the LRT or the $\chi^2$ test:

- The null distribution is approximately $\chi^2(k-1)$ if the sample size is large.
- The null distribution can be approximated by simulating data under the null hypothesis.

If the sample size is sufficiently large that the expected count in each cell is ≥ 5, use the asymptotic approximation without worries. Otherwise, consider using computer simulations.

Composite hypotheses

Sometimes, we ask not

$$p_{AA} = 0.25,\quad p_{AB} = 0.5,\quad p_{BB} = 0.25$$

but rather something like

$$p_{AA} = f^2,\quad p_{AB} = 2f(1-f),\quad p_{BB} = (1-f)^2 \quad \text{for some } f.$$

For example: consider the genotypes, of a random sample of individuals, at a diallelic locus. Is the locus in Hardy-Weinberg equilibrium (as expected in the case of random mating)?

Example data:

AA  AB  BB
 5  20  75

Another example

ABO blood groups: 3 alleles A, B, O.

Phenotype A  = genotype AA or AO
Phenotype B  = genotype BB or BO
Phenotype AB = genotype AB
Phenotype O  = genotype OO

Allele frequencies: $f_A, f_B, f_O$ (note that $f_A + f_B + f_O = 1$).

Under Hardy-Weinberg equilibrium, we expect

$$p_A = f_A^2 + 2 f_A f_O, \quad p_B = f_B^2 + 2 f_B f_O, \quad p_{AB} = 2 f_A f_B, \quad p_O = f_O^2$$

Example data: counts for the phenotypes O, A, B and AB.

LRT for example 1

Data: $(n_{AA}, n_{AB}, n_{BB}) \sim \text{Multinomial}(n, \{p_{AA}, p_{AB}, p_{BB}\})$

We seek to test whether the data conform reasonably to

$$H_0: p_{AA} = f^2,\ p_{AB} = 2f(1-f),\ p_{BB} = (1-f)^2 \quad \text{for some } f.$$

General MLEs: $\hat p_{AA} = n_{AA}/n$, $\hat p_{AB} = n_{AB}/n$, $\hat p_{BB} = n_{BB}/n$

MLE under $H_0$: $\hat f = (n_{AA} + n_{AB}/2)/n$, giving $\tilde p_{AA} = \hat f^2$, $\tilde p_{AB} = 2 \hat f (1 - \hat f)$, $\tilde p_{BB} = (1 - \hat f)^2$

LRT statistic:

$$\text{LRT} = 2 \ln\left\{ \frac{\Pr(n_{AA}, n_{AB}, n_{BB} \mid \hat p_{AA}, \hat p_{AB}, \hat p_{BB})}{\Pr(n_{AA}, n_{AB}, n_{BB} \mid \tilde p_{AA}, \tilde p_{AB}, \tilde p_{BB})} \right\}$$

LRT for example 2

Data: $(n_O, n_A, n_B, n_{AB}) \sim \text{Multinomial}(n, \{p_O, p_A, p_B, p_{AB}\})$

We seek to test whether the data conform reasonably to

$$H_0: p_A = f_A^2 + 2 f_A f_O,\ p_B = f_B^2 + 2 f_B f_O,\ p_{AB} = 2 f_A f_B,\ p_O = f_O^2$$

for some $f_O, f_A, f_B$, where $f_O + f_A + f_B = 1$.

General MLEs: $\hat p_O, \hat p_A, \hat p_B, \hat p_{AB}$, like before.

MLE under $H_0$: requires numerical optimization. Call them $(\hat f_O, \hat f_A, \hat f_B)$, giving $(\tilde p_O, \tilde p_A, \tilde p_B, \tilde p_{AB})$.

LRT statistic:

$$\text{LRT} = 2 \ln\left\{ \frac{\Pr(n_O, n_A, n_B, n_{AB} \mid \hat p_O, \hat p_A, \hat p_B, \hat p_{AB})}{\Pr(n_O, n_A, n_B, n_{AB} \mid \tilde p_O, \tilde p_A, \tilde p_B, \tilde p_{AB})} \right\}$$

$\chi^2$ test for these examples

- Obtain the MLE(s) under $H_0$.
- Calculate the corresponding cell probabilities.
- Turn these into (estimated) expected counts under $H_0$.
- Calculate

$$X^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$
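One way the numerical optimization for example 2 might be carried out in R is sketched below, followed by both test statistics. The phenotype counts are hypothetical, chosen only for illustration and not taken from the slides; the helper names abo_probs, to_freq and neg_loglik are likewise assumptions. The sketch uses optim() with its default Nelder-Mead method, which is one reasonable choice among several (an EM algorithm is another common approach).

```r
# Hypothetical counts, for illustration only.
counts <- c(O = 100, A = 90, B = 40, AB = 20)

# Hardy-Weinberg cell probabilities for f = c(fO, fA, fB), summing to 1.
abo_probs <- function(f) {
  c(O  = f[1]^2,
    A  = f[2]^2 + 2 * f[2] * f[1],
    B  = f[3]^2 + 2 * f[3] * f[1],
    AB = 2 * f[2] * f[3])
}

# Reparameterise so the frequencies stay positive and sum to 1,
# letting optim() search over two unconstrained parameters.
to_freq <- function(theta) {
  e <- exp(c(0, theta))
  e / sum(e)
}

# Negative multinomial log-likelihood under H0 (constant term dropped).
neg_loglik <- function(theta) -sum(counts * log(abo_probs(to_freq(theta))))

fit      <- optim(c(0, 0), neg_loglik)   # Nelder-Mead by default
f_hat    <- to_freq(fit$par)             # (fO, fA, fB) estimates
expected <- sum(counts) * abo_probs(f_hat)

lrt <- 2 * sum(counts * log(counts / expected))
x2  <- sum((counts - expected)^2 / expected)

# df = k - s - 1 = 4 - 2 - 1 = 1 (two free allele frequencies estimated).
p_values <- pchisq(c(lrt = lrt, x2 = x2), df = 1, lower.tail = FALSE)
```

It is worth checking fit$convergence (0 indicates success) before trusting the estimates.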

Null distribution for these cases

Computer simulation (with one wrinkle)

1. Simulate data under $H_0$ (plug in the MLEs for the observed data).
2. Calculate the MLE with the simulated data.
3. Calculate the test statistic with the simulated data.
4. Repeat many times.

Asymptotic approximation

Under $H_0$, if the sample size, $n$, is large, both the LRT statistic and the $\chi^2$ statistic follow, approximately, a $\chi^2$ distribution with $k - s - 1$ degrees of freedom, where $s$ is the number of parameters estimated under $H_0$.

Note that $s = 1$ for example 1, and $s = 2$ for example 2, and so df = 1 for both examples.

Example 1

Example data:

AA  AB  BB
 5  20  75

MLE: $\hat f = (5 + 20/2)/100 = 15\%$

Expected counts: 2.25, 25.5, 72.25

Test statistics: LRT statistic = 3.87, $X^2$ = 4.65

Asymptotic $\chi^2(\text{df} = 1)$ approximation: $P \approx 4.9\%$ (LRT), $P \approx 3.1\%$ ($X^2$)

10,000 computer simulations: $P \approx 8.2\%$ (LRT), $P \approx 2.4\%$ ($X^2$)
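The simulation procedure with its wrinkle (re-estimating $f$ from every simulated table) might be sketched in R for example 1 as follows. The helper names hwe_probs and hwe_stats and the seed are assumptions for illustration.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only

hwe_probs <- function(f) c(f^2, 2 * f * (1 - f), (1 - f)^2)

# LRT and X^2 for counts (AA, AB, BB), with f estimated from those counts.
hwe_stats <- function(counts) {
  n        <- sum(counts)
  f_hat    <- (counts[1] + counts[2] / 2) / n   # MLE of f under H0
  expected <- n * hwe_probs(f_hat)
  # Zero observed cells contribute 0 to the LRT; a zero expected cell can
  # only occur with a zero observed cell, so it is dropped from X^2.
  lrt <- 2 * sum(ifelse(counts > 0, counts * log(counts / expected), 0))
  x2  <- sum(((counts - expected)^2 / expected)[expected > 0])
  c(lrt = lrt, x2 = x2)
}

observed <- c(5, 20, 75)
obs      <- hwe_stats(observed)

# Simulate under H0 using the MLE from the observed data, then recompute
# the MLE and both statistics for every simulated table.
n         <- sum(observed)
f_hat     <- (observed[1] + observed[2] / 2) / n
sims      <- rmultinom(10000, size = n, prob = hwe_probs(f_hat))
sim_stats <- apply(sims, 2, hwe_stats)          # 2 x 10000 matrix

p_sim  <- rowMeans(sim_stats >= obs)            # simulation P-values
p_asym <- pchisq(obs, df = 1, lower.tail = FALSE)
```

Comparing p_sim with p_asym shows how far the asymptotic approximation can be off when some expected counts are small, as with the AA cell here.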

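Example 1

[Figure: estimated null distribution of the LRT statistic, with the observed LRT and the 95th percentile marked.]

[Figure: estimated null distribution of the chi-square statistic, with the observed $X^2$ marked; 95th percentile = 3.36.]

Example 2

Example data: counts for the phenotypes O, A, B and AB.

MLE: $\hat f_O \approx 62.8\%$, $\hat f_A \approx 25.0\%$, $\hat f_B \approx 12.2\%$.

Expected counts follow from these estimates via the Hardy-Weinberg cell probabilities.

Test statistics: LRT statistic = 1.99, $X^2$ = 2.10

Asymptotic $\chi^2(\text{df} = 1)$ approximation: $P \approx 16\%$ (LRT), $P \approx 15\%$ ($X^2$)

10,000 computer simulations: $P \approx 17\%$ (LRT), $P \approx 15\%$ ($X^2$)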

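Example 2

[Figure: estimated null distribution of the LRT statistic, with the observed LRT and the 95th percentile marked.]

[Figure: estimated null distribution of the chi-square statistic, with the observed $X^2$ and the 95th percentile marked.]

Example 3

Data on number of sperm bound to an egg: a table of counts by number of sperm.

Do these follow a Poisson distribution?

MLE: $\hat\lambda$ = sample average

Expected counts: $n_i^0 = n\, e^{-\hat\lambda}\, \hat\lambda^{i} / i!$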

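Example

Observed and expected counts are compared cell by cell.

$$X^2 = \sum \frac{(\text{obs} - \text{exp})^2}{\text{exp}} = \dots = 42.8$$

$$\text{LRT} = 2 \sum \text{obs} \cdot \log(\text{obs}/\text{exp}) = \dots = 18.8$$

Compare to $\chi^2(\text{df} = 6 - 1 - 1 = 4)$: P-value approximately $1 \times 10^{-8}$ ($\chi^2$) and $9 \times 10^{-4}$ (LRT).

By simulation: p-value = 16/10,000 ($\chi^2$) and 7/10,000 (LRT)

Null simulation results

[Figure: simulated null distribution of the $\chi^2$ statistic, with the observed value marked.]

[Figure: simulated null distribution of the LRT statistic, with the observed value marked.]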

A final note

With these sorts of goodness-of-fit tests, we are often happy when our model does fit. In other words, we often prefer to fail to reject $H_0$.

Such a conclusion, that the data fit the model reasonably well, should be phrased and considered with caution.

We should think: how much power do I have to detect, with these limited data, a reasonable deviation from $H_0$?


Chapter 8 Hypothesis Testing Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing Chapter 8 Hypothesis Testing 1 Chapter 8 Hypothesis Testing 8-1 Overview 8-2 Basics of Hypothesis Testing 8-3 Testing a Claim About a Proportion 8-5 Testing a Claim About a Mean: s Not Known 8-6 Testing

More information

Hardy-Weinberg Equilibrium Problems

Hardy-Weinberg Equilibrium Problems Hardy-Weinberg Equilibrium Problems 1. The frequency of two alleles in a gene pool is 0.19 (A) and 0.81(a). Assume that the population is in Hardy-Weinberg equilibrium. (a) Calculate the percentage of

More information

The Neyman-Pearson lemma. The Neyman-Pearson lemma

The Neyman-Pearson lemma. The Neyman-Pearson lemma The Neyman-Pearson lemma In practical hypothesis testing situations, there are typically many tests possible with significance level α for a null hypothesis versus alternative hypothesis. This leads to

More information

NPTEL STRUCTURAL RELIABILITY

NPTEL STRUCTURAL RELIABILITY NPTEL Course On STRUCTURAL RELIABILITY Module # 02 Lecture 6 Course Format: Web Instructor: Dr. Arunasis Chakraborty Department of Civil Engineering Indian Institute of Technology Guwahati 6. Lecture 06:

More information

Hypothesis testing S2

Hypothesis testing S2 Basic medical statistics for clinical and experimental research Hypothesis testing S2 Katarzyna Jóźwiak k.jozwiak@nki.nl 2nd November 2015 1/43 Introduction Point estimation: use a sample statistic to

More information

LS4 Problem Set 2, Population Genetics. Corresponding Quiz: November (Along with positional cloning)

LS4 Problem Set 2, Population Genetics. Corresponding Quiz: November (Along with positional cloning) LS4 Problem Set 2, 2010 Population Genetics Corresponding Lectures: November 5, 8 and 10 Corresponding Reading: Hartwell et al., 757-773 Corresponding Quiz: November 15-19 (Along with positional cloning)

More information

Null Hypothesis H 0. The null hypothesis (denoted by H 0

Null Hypothesis H 0. The null hypothesis (denoted by H 0 Hypothesis test In statistics, a hypothesis is a claim or statement about a property of a population. A hypothesis test (or test of significance) is a standard procedure for testing a claim about a property

More information

Estimating the Frequency Distribution of the. Numbers Bet on the California Lottery

Estimating the Frequency Distribution of the. Numbers Bet on the California Lottery Estimating the Frequency Distribution of the Numbers Bet on the California Lottery Mark Finkelstein November 15, 1993 Department of Mathematics, University of California, Irvine, CA 92717. Running head:

More information

Morgan s fly experiment. Linkage Mapping. Genetic recombination. Genetic Recombination. Crossing Over and Recombination. CS Jim Holland

Morgan s fly experiment. Linkage Mapping. Genetic recombination. Genetic Recombination. Crossing Over and Recombination. CS Jim Holland Morgan s fly experiment Linkage Mapping CS71 009 Jim Holland r r Vg Vg (red eyes, normal wings) prprvgvg (purple eyes, vestigial wings) r prvg vg (F1) prprvgvg 19 151 151 1195 r prvg vg r prvgvg prprvg

More information

Likelihood: Frequentist vs Bayesian Reasoning

Likelihood: Frequentist vs Bayesian Reasoning "PRINCIPLES OF PHYLOGENETICS: ECOLOGY AND EVOLUTION" Integrative Biology 200B University of California, Berkeley Spring 2009 N Hallinan Likelihood: Frequentist vs Bayesian Reasoning Stochastic odels and

More information

Unit 29 Chi-Square Goodness-of-Fit Test

Unit 29 Chi-Square Goodness-of-Fit Test Unit 29 Chi-Square Goodness-of-Fit Test Objectives: To perform the chi-square hypothesis test concerning proportions corresponding to more than two categories of a qualitative variable To perform the Bonferroni

More information

PhysicsAndMathsTutor.com

PhysicsAndMathsTutor.com 1. A machine fills jars with jam. The weight of jam in each jar is normally distributed. To check the machine is working properly the contents of a random sample of 15 jars are weighed in grams. Unbiased

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

Genetics. The study of heredity. discovered the. Gregor Mendel (1860 s) garden peas.

Genetics. The study of heredity. discovered the. Gregor Mendel (1860 s) garden peas. GENETICS Genetics The study of heredity. Gregor Mendel (1860 s) discovered the fundamental principles of genetics by breeding garden peas. Genetics Alleles 1. Alternative forms of genes. 2. Units that

More information

Chapter 3 RANDOM VARIATE GENERATION

Chapter 3 RANDOM VARIATE GENERATION Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.

More information

UCLA STAT 13 Introduction to Statistical Methods for the Life and Health Sciences. Chapter 10 Chi-Square Test Relative Risk/Odds Ratios

UCLA STAT 13 Introduction to Statistical Methods for the Life and Health Sciences. Chapter 10 Chi-Square Test Relative Risk/Odds Ratios UCLA STAT 3 Introduction to Statistical Methods for the Life and Health Sciences Instructor: Ivo Dinov, Asst. Prof. of Statistics and Neurology Chapter 0 Chi-Square Test Relative Risk/Odds Ratios Teaching

More information

Goodness of Fit. Proportional Model. Probability Models & Frequency Data

Goodness of Fit. Proportional Model. Probability Models & Frequency Data Probability Models & Frequency Data Goodness of Fit Proportional Model Chi-square Statistic Example R Distribution Assumptions Example R 1 Goodness of Fit Goodness of fit tests are used to compare any

More information

Ordinal Regression. Chapter

Ordinal Regression. Chapter Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

More information

Natural Selection, Chi-square & Hardy-Weinberg Calculations

Natural Selection, Chi-square & Hardy-Weinberg Calculations BIOL 0 LAB 5 Natural Selection, Chi-square & Hardy-Weinberg Calculations Variability exists in all natural populations. For a wide variety of reasons, some phenotypes (visible characters) or genotypes

More information

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.

More information

Chapter 23. Two Categorical Variables: The Chi-Square Test

Chapter 23. Two Categorical Variables: The Chi-Square Test Chapter 23. Two Categorical Variables: The Chi-Square Test 1 Chapter 23. Two Categorical Variables: The Chi-Square Test Two-Way Tables Note. We quickly review two-way tables with an example. Example. Exercise

More information

Chapter 11: Linear Regression - Inference in Regression Analysis - Part 2

Chapter 11: Linear Regression - Inference in Regression Analysis - Part 2 Chapter 11: Linear Regression - Inference in Regression Analysis - Part 2 Note: Whether we calculate confidence intervals or perform hypothesis tests we need the distribution of the statistic we will use.

More information

Statistics II Final Exam - January Use the University stationery to give your answers to the following questions.

Statistics II Final Exam - January Use the University stationery to give your answers to the following questions. Statistics II Final Exam - January 2012 Use the University stationery to give your answers to the following questions. Do not forget to write down your name and class group in each page. Indicate clearly

More information

Breeding Value versus Genetic Value

Breeding Value versus Genetic Value Breeding Value versus Genetic Value Gene351 (not( Gene251) Lecture 6 Key terms Breeding value Genetic value Average effect of alleles Reading: Prescribed None Suggested Falconer & McKay, 1996 Revision

More information

12: Analysis of Variance. Introduction

12: Analysis of Variance. Introduction 1: Analysis of Variance Introduction EDA Hypothesis Test Introduction In Chapter 8 and again in Chapter 11 we compared means from two independent groups. In this chapter we extend the procedure to consider

More information

Notes for STA 437/1005 Methods for Multivariate Data

Notes for STA 437/1005 Methods for Multivariate Data Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.

More information

Probability, Binomial Distributions and Hypothesis Testing Vartanian, SW 540

Probability, Binomial Distributions and Hypothesis Testing Vartanian, SW 540 Probability, Binomial Distributions and Hypothesis Testing Vartanian, SW 540 1. Assume you are tossing a coin 11 times. The following distribution gives the likelihoods of getting a particular number of

More information

GENERAL BIOLOGY LAB 1 (BSC1010L) Lab #8: Mendelian Genetics

GENERAL BIOLOGY LAB 1 (BSC1010L) Lab #8: Mendelian Genetics GENERAL BIOLOGY LAB 1 (BSC1010L) Lab #8: Mendelian Genetics OBJECTIVES: Understand Mendel s laws of segregation and independent assortment. Differentiate between an organism s genotype and phenotype. Recognize

More information

CHAPTER 11 CHI-SQUARE: NON-PARAMETRIC COMPARISONS OF FREQUENCY

CHAPTER 11 CHI-SQUARE: NON-PARAMETRIC COMPARISONS OF FREQUENCY CHAPTER 11 CHI-SQUARE: NON-PARAMETRIC COMPARISONS OF FREQUENCY The hypothesis testing statistics detailed thus far in this text have all been designed to allow comparison of the means of two or more samples

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

2 Tests for Goodness of Fit:

2 Tests for Goodness of Fit: Tests for Goodness of Fit: General Notion: We often wish to know whether a particular distribution fits a general definition Example: To use t tests, we must suppose that the population is normally distributed

More information

1. Rejecting a true null hypothesis is classified as a error. 2. Failing to reject a false null hypothesis is classified as a error.

1. Rejecting a true null hypothesis is classified as a error. 2. Failing to reject a false null hypothesis is classified as a error. 1. Rejecting a true null hypothesis is classified as a error. 2. Failing to reject a false null hypothesis is classified as a error. 8.5 Goodness of Fit Test Suppose we want to make an inference about

More information

Chi Square (χ 2 ) Statistical Instructions EXP 3082L Jay Gould s Elaboration on Christensen and Evans (1980)

Chi Square (χ 2 ) Statistical Instructions EXP 3082L Jay Gould s Elaboration on Christensen and Evans (1980) Chi Square (χ 2 ) Statistical Instructions EXP 3082L Jay Gould s Elaboration on Christensen and Evans (1980) For the Driver Behavior Study, the Chi Square Analysis II is the appropriate analysis below.

More information

ANOVA MULTIPLE CHOICE QUESTIONS. In the following multiple-choice questions, select the best answer.

ANOVA MULTIPLE CHOICE QUESTIONS. In the following multiple-choice questions, select the best answer. ANOVA MULTIPLE CHOICE QUESTIONS In the following multiple-choice questions, select the best answer. 1. Analysis of variance is a statistical method of comparing the of several populations. a. standard

More information