Hardy-Weinberg Equilibrium: Part 1 and 2. Hardy-Weinberg Equilibrium: Part 1 and 2

Similar documents
Report. A Note on Exact Tests of Hardy-Weinberg Equilibrium. Janis E. Wigginton, 1 David J. Cutler, 2 and Gonçalo R. Abecasis 1

2 GENETIC DATA ANALYSIS

Basic Principles of Forensic Molecular Biology and Genetics. Population Genetics

Hardy-Weinberg Equilibrium Problems

HLA data analysis in anthropology: basic theory and practice

Heredity. Sarah crosses a homozygous white flower and a homozygous purple flower. The cross results in all purple flowers.

Chapter 9 Patterns of Inheritance

Population Genetics and Multifactorial Inheritance 2002

Forensic Statistics. From the ground up. 15 th International Symposium on Human Identification

Popstats Unplugged. 14 th International Symposium on Human Identification. John V. Planz, Ph.D. UNT Health Science Center at Fort Worth

Mendelian inheritance and the

Name: Class: Date: ID: A

5 GENETIC LINKAGE AND MAPPING

Mendelian Genetics in Drosophila

Heredity - Patterns of Inheritance

AP: LAB 8: THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

Basics of Marker Assisted Selection

Name: 4. A typical phenotypic ratio for a dihybrid cross is a) 9:1 b) 3:4 c) 9:3:3:1 d) 1:2:1:2:1 e) 6:3:3:6

Chi-square test Fisher s Exact test

PRACTICE PROBLEMS - PEDIGREES AND PROBABILITIES

A and B are not absolutely linked. They could be far enough apart on the chromosome that they assort independently.

Summary Genes and Variation Evolution as Genetic Change. Name Class Date

Biology Notes for exam 5 - Population genetics Ch 13, 14, 15

CCR Biology - Chapter 7 Practice Test - Summer 2012

Paternity Testing. Chapter 23

Genetics and Evolution: An ios Application to Supplement Introductory Courses in. Transmission and Evolutionary Genetics

A trait is a variation of a particular character (e.g. color, height). Traits are passed from parents to offspring through genes.

Biology 1406 Exam 4 Notes Cell Division and Genetics Ch. 8, 9

Mendelian and Non-Mendelian Heredity Grade Ten

The Making of the Fittest: Natural Selection in Humans

Genetics for the Novice

SeattleSNPs Interactive Tutorial: Web Tools for Site Selection, Linkage Disequilibrium and Haplotype Analysis

Test Positive True Positive False Positive. Test Negative False Negative True Negative. Figure 5-1: 2 x 2 Contingency Table

PRINCIPLES OF POPULATION GENETICS

Genetics 1. Defective enzyme that does not make melanin. Very pale skin and hair color (albino)

LAB : THE CHI-SQUARE TEST. Probability, Random Chance, and Genetics

Genetics Lecture Notes Lectures 1 2

The Genetics of Drosophila melanogaster

CHROMOSOMES AND INHERITANCE

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Section 12 Part 2. Chi-square test

GENETIC CROSSES. Monohybrid Crosses

Cystic Fibrosis Webquest Sarah Follenweider, The English High School 2009 Summer Research Internship Program

Tutorial on gplink. PLINK tutorial, December 2006; Shaun Purcell,

Deterministic computer simulations were performed to evaluate the effect of maternallytransmitted

Blood Stains at the Crime Scene Forensic Investigation

(1-p) 2. p(1-p) From the table, frequency of DpyUnc = ¼ (p^2) = #DpyUnc = p^2 = ¼(1-p)^2 + ½(1-p)p + ¼(p^2) #Dpy + #DpyUnc

Recommend Continued CPS Monitoring. 63 (a) 17 (b) 10 (c) (d) 20 (e) 25 (f) 80. Totals/Marginal

I. Genes found on the same chromosome = linked genes

The correct answer is c A. Answer a is incorrect. The white-eye gene must be recessive since heterozygous females have red eyes.

Chapter 4 Pedigree Analysis in Human Genetics. Chapter 4 Human Heredity by Michael Cummings 2006 Brooks/Cole-Thomson Learning

P (B) In statistics, the Bayes theorem is often used in the following way: P (Data Unknown)P (Unknown) P (Data)

7 POPULATION GENETICS

Chromosomes, Mapping, and the Meiosis Inheritance Connection

Chapter 3. Chapter Outline. Chapter Outline 9/11/10. Heredity and Evolu4on

Crosstabulation & Chi Square

Problems 1-6: In tomato fruit, red flesh color is dominant over yellow flesh color, Use R for the Red allele and r for the yellow allele.

Chi Square Tests. Chapter Introduction

5. The cells of a multicellular organism, other than gametes and the germ cells from which it develops, are known as

Association Between Variables

LAB : PAPER PET GENETICS. male (hat) female (hair bow) Skin color green or orange Eyes round or square Nose triangle or oval Teeth pointed or square

Terms: The following terms are presented in this lesson (shown in bold italics and on PowerPoint Slides 2 and 3):

DNA Determines Your Appearance!

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test

Evolution (18%) 11 Items Sample Test Prep Questions

Chapter 19 The Chi-Square Test

CHAPTER 10 BLOOD GROUPS: ABO AND Rh

Answer Key Problem Set 5

Exploring contact patterns between two subpopulations

Genetic Mutations. Indicator 4.8: Compare the consequences of mutations in body cells with those in gametes.

Bio EOC Topics for Cell Reproduction: Bio EOC Questions for Cell Reproduction:

Bio 102 Practice Problems Mendelian Genetics and Extensions

Laboratory Procedure Manual. Rh Phenotyping

Bio 102 Practice Problems Mendelian Genetics: Beyond Pea Plants

6.4 Normal Distribution

7A The Origin of Modern Genetics

Two copies of each autosomal gene affect phenotype.

Marker-Assisted Backcrossing. Marker-Assisted Selection. 1. Select donor alleles at markers flanking target gene. Losing the target allele

Genetics 301 Sample Final Examination Spring 2003

MAT 155. Key Concept. February 03, S4.1 2_3 Review & Preview; Basic Concepts of Probability. Review. Chapter 4 Probability

HYPOTHESIS TESTING (ONE SAMPLE) - CHAPTER 7 1. used confidence intervals to answer questions such as...

Sample Size and Power in Clinical Trials

4. Continuous Random Variables, the Pareto and Normal Distributions

Genetics Review for USMLE (Part 2)

2 18. If a boy s father has haemophilia and his mother has one gene for haemophilia. What is the chance that the boy will inherit the disease? 1. 0% 2

Gene Mapping Techniques

Chapter 5. Discrete Probability Distributions

SNP Data Integration and Analysis for Drug- Response Biomarker Discovery

Two-locus population genetics

Lecture 5 : The Poisson Distribution

EMPIRICAL FREQUENCY DISTRIBUTION

Package forensic. February 19, 2015

Continuous and discontinuous variation

B2 5 Inheritrance Genetic Crosses

Probability Distributions

GENETICS OF HUMAN BLOOD TYPE

This fact sheet describes how genes affect our health when they follow a well understood pattern of genetic inheritance known as autosomal recessive.

LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING

GAW 15 Problem 3: Simulated Rheumatoid Arthritis Data Full Model and Simulation Parameters

Transcription:

Allele Frequencies and Genotype Frequencies How do allele frequencies relate to genotype frequencies in a population? If we have genotype frequencies, we can easily obtain allele frequencies. Under what conditions/assumptions can we obtain genotype frequencies from allele frequencies?

Example Cystic Fibrosis is caused by a recessive allele. The locus for the allele is in region 7q31. Of 10,000 Caucasian births, 5 were found to have Cystic Fibrosis and 442 were found to be heterozygous carriers of the mutation that causes the disease. Denote the Cystic Fibrosis allele with cf and the normal allele with N. Based on this sample, how can we estimate the allele frequencies in the population? We can estimate the genotype frequencies in the population based on this sample 5 are cf, cf 10000 442 10000 9553 10000 are N, cf are N, N What are the allele frequencies estimates for cf and N?

Example We can use 0.0005, 0.0442, and 0.9553 as our estimates of the genotype frequencies in the population. The only assumption we use to obtain allele frequency estimates is that the sample is a random sample. Starting with these genotype frequencies, we can estimate the allele frequencies without making any further assumptions: Out of 20,000 alleles in the sample 442+10 20000 1 442+10 20000 =.0226 are cf =.9774 are N

Hardy-Weinberg Equilibrium In contrast, going from allele frequencies to genotype frequencies requires more assumptions. HWE Model Assumptions infinite population discrete generations random mating no selection no migration in or out of population no mutation equal initial genotype frequencies in the two sexes

Hardy-Weinberg Equilibrium Consider a locus with two alleles: A and a Assume in the first generation the alleles are not in HWE and the genotype frequency distribution is as follows: where u + v + w = 1 1st Generation Genotype Frequency AA u Aa v aa w From the genotype frequencies, we can easily obtain allele frequencies: P(A) = u + 1 2 v P(a) = w + 1 2 v

Hardy-Weinberg Equilibrium In the first generation: P(A) = u + 1 2 v and P(a) = w + 1 2 v 2nd Generation Mating Type Mating Frequency Expected Progeny AA AA u 2 AA 1 AA Aa 2uv 2 AA : 1 2 Aa AA aa 2uw Aa Aa Aa v 2 1 4 AA : 1 2 Aa : 1 4 aa 1 Aa aa 2vw 2 Aa : 1 2 aa aa aa w 2 aa Check: u 2 + 2uv + 2uw + v 2 + 2vw + w 2 = (u + v + w) 2 = 1 For the second generation, we have the following genotype frequencies: p P(AA) = u 2 + 1 2 (2uv) + 1 4 v 2 = ( u + 1 2 v) 2 q P(Aa) = uv + 2uw + 1 2 v 2 + vw = 2 ( u + 1 2 v) ( 1 2 v + w) r P(aa) = 1 4 v 2 + 1 2 (2vw) + w 2 = ( w + 1 2 v) 2 What are the genotype frequencies in the third generation?

Hardy-Weinberg Equilibrium The frequency of the AA genotype in the third generation is: P(AA) = ( p + 1 ) ( 2 ( 2 q = u + 1 ) 2 2 v + ( ) 1 2 (u + 12 ) ( ) ) 2 12 2 v v + w = ((u + 12 ) [(u v + 12 ) ( )]) 2 1 v + 2 v + w = ((u + 12 ) ) 2 v [(u + v + w)] = ((u + 12 ) 2 (( 1) v = u + 1 )) 2 2 v = p Similarly, P(Aa) = q and P(aa) = r for generation 3 Equilibrium is reached after one generation of random mating under the Hardy-Weinberg assumptions! That is, the genotype frequencies remain the same from generation to generation.

Hardy-Weinberg Equilibrium When a population is in Hardy-Weinberg equilibrium, the alleles that comprise a genotype can be thought of as having been chosen at random from the alleles in a population. We have the following relationship between genotype frequencies and allele frequencies for a population in Hardy-Weinberg equilibrium: P(AA) = P(A)P(A) P(Aa) = 2P(A)P(a) P(aa) = P(a)P(a)

Hardy-Weinberg Equilibrium For example, consider a diallelic locus with alleles A and B with frequencies 0.85 and 0.15, respectively. If the locus is in HWE, then the genotype frequencies are: P(AA) = 0.85 0.85 = 0.7225 P(AB) = 0.85 0.15 + 0.15 0.85 = 0.2550 P(BB) = 0.15 0.15 = 0.0225

Hardy-Weinberg Equilibrium Example Establishing the genetics of the ABO blood group system was one of the first breakthroughs in Mendelian genetics. The locus corresponding to the ABO blood group has three alleles, A, B and O and is located on chromosome 9q34. Alleles A and B are co-dominant, and the alleles A and B are dominant to O. This leads to the following genotypes and phenotypes: Genotype AA, AO BB, BO AB OO Blood Type A B AB O Mendels first law allows us to quantify the types of gametes an individual can produce. For example, an individual with type AB produces gametes A and B with equal probability (1/2).

Hardy-Weinberg Equilibrium Example From a sample of 21,104 individuals from the city of Berlin, allele frequencies have been estimated to be P(A)=0.2877, P(B)=0.1065 and P(O)=0.6057. If an individual has blood type B, what are the possible genotypes for this individual, what possible gametes can be produced, and what is the frequency of the genotypes and gametes if HWE is assumed? If a person has blood type B, then the genotype can be BO or BB. What is P(genotype is BO blood type is B)? What is P(genotype is BB blood type is B)? What is P(B gamete blood type is B)? What is P(O gamete blood type is B)?

Violations of Hardy-Weinberg Equilibrium With HWE: allele frequencies = genotype frequencies. Violations of HWE assumption include: Small population sizes. Chance events can make a big difference. Deviations from random mating. Assortive mating. Mating between genotypically similar individuals increases homozygosity for the loci involved in mate choice without altering allele frequencies.

Violations of Hardy-Weinberg Equilibrium Disassortive mating. Mating between dissimilar individuals increases heterozygosity without altering allele frequencies. Inbreeding. Mating between relatives increases homozygosity for the whole genome without affecting allele frequencies. Population sub-structure Mutation Migration Selection

Testing Hardy-Weinberg Equilibrium When a locus is not in HWE, then this suggests one or more of the Hardy-Weinberg assumptions is false. Departure from HWE has been used to infer the existence of natural selection, argue for existence of assortive (non-random) mating, and infer genotyping errors. It is therefore of interest to test whether a population is in HWE at a locus. We will discuss the two most popular ways of testing HWE: Chi-Square test Exact test

Chi-Square Goodness-Of-Fit Test Compares observed genotype counts with the values expected under Hardy-Weinberg. For a locus with two alleles, we might construct a table as follows: Genotype Observed Expected under HWE AA n AA np 2 A Aa n Aa 2np A (1 p A ) aa n aa n(1 p A ) 2 where n is the number of individuals in the sample and p A is the probability that a random allele in the population is of type A. We estimate p A with ˆp A = 2n AA+n Aa 2n

Chi-Square Goodness-Of-Fit Test X 2 (Observed count Expected count) 2 = Expected count genotypes ( X 2 naa nˆp 2 2 = a) nˆp a 2 + (n Aa 2nˆp a (1 ˆp a )) 2 ( naa n(1 ˆp a ) 2) 2 + 2nˆp a (1 ˆp a ) n(1 ˆp a ) 2 Under H 0, the X 2 test statistic has an approximate χ 2 distribution with 1 degree of freedom Recall the rule of thumb for such χ 2 tests: the expected count should be at least 5 in every cell. If allele frequencies are low, and/or sample size is small, and/or there are many alleles at a locus, this may be a problem.

Linkage Disequilibrium Example A sample of 1000 British people were genotyped for a hypertension study. The genotypes for a SNP in a candidate gene for hypertension are as follows for the sample: Is the SNP in HWE? Geno 1 Geno 2 Geno 3 SNP 1 298 AA 489 AG 213 GG

HWE Exact Test The Hardy-Weinberg exact test is based on calculating probabilities P(genotype counts allele counts) under HWE.

HWE Exact Test Example Suppose we have a sample of 5 people and we observe genotypes AA, AA, AA, aa, and aa. If the 6 A alleles and the 4 a alleles were at chosen at random to form the genotypes of the 5 people, what is the probability of observing 3 homozygous AA and 2 homozygous aa? If five individuals have among them 6 A alleles and 4 a alleles, what genotype configurations are possible?

HWE Exact Test Example aa Aa AA theoretical probability 2 0 3 1 2 2 0 4 1

HWE Exact Test Example aa Aa AA theoretical probability 2 0 3 0.048 1 2 2 0.571 0 4 1 0.381

HWE Exact Test Example Now suppose we have a sample of 100 individuals and we observe 21 a alleles and 179 A alleles, what genotype configurations are possible?

HWE Exact Test Example Note that specifying the number of heterozygotes determines the number of AA and aa genotypes. aa Aa AA theoretical probability 1.000001 3.000001 5 <.000001 7.000001 9.000047 11.000870 13.009375 15.059283 17.214465 19.406355 21.309604 Wiggington, Cutler, Abecasis (AJHG, 2005)

HWE Exact Test Example The formula is: P(n Aa n A, n a, HWE) = n! n AA!n Aa!n aa! 2nAan A!n a! (2n)! If we had actually observed 13 heterozygotes in our sample, then the exact test one-sided p-value of observing as sample at least as extreme as a sample with 13 heterozygotes would be.009375 +.000870 +.000047 +.000001 = 0.010293 (There is a deficit of heterozygotes. To get this one-sided p-value, we sum the probabilities of all configurations with probability equal to or less that the observed configuration.)

Comparison of HWE χ 2 Test and Exact Test The next slide is Figure 1 from Wigginton et al (AJHG 2005). The upper curves give the type I error rate of the chi-square test; the bottom curves give the type I error rate from the exact test. The exact test is always conservative; the chi-square test can be either conservative or anti-conservative.

HWE TYPE I ERROR

Comparison of HWE χ 2 Test and Exact Test The Exact Test should be preferred for smaller sample sizes and/or multiallelic loci, since the χ 2 test is not valid in these cases (rule of thumb: must expect at least 5 in each cell) The coarseness of Exact Test means it is conservative. In Example 4, we reject the null hypothesis that HWE holds if 13 or fewer heterozygotes are observed. But the observed p-value is actually 0.010293. Thus to reject at the 0.05 level, we actual have to see a p-value as small as 0.010293.

Comparison of HWE χ 2 Test and Exact Test The χ 2 test can have inflated type I error rates. Suppose we have 100 genes for which HWE holds. We conduct 100 χ 2 tests at level 0.05. We expect to reject the null hypothesis that HWE holds in 5 of the tests. However, the results of Wiggington et al (AJHG, 2005) say, on average, it can be more than 5 depending on the minor allele count. Although it is not desirable for a test to be conservative (Exact Test), an anti-conservative test is considered unacceptable. Wiggington et al (AJHG, 2005) give an extreme example with a sample of 1000 individuals. At a nominal a=0.001, the true type I error rate for the χ 2 test exceeds 0.06.

Comparison of HWE χ 2 Test and Exact Test The χ 2 test is a two-sided test. In contrast, the Exact Test can be made one-sided, if appropriate. Specifically, one can test for a deficit of heterozygotes (if one suspects inbreeding or population stratification); test for an excess of heterozygotes (which may indicate genotyping errors for some genotyping technologies). Exact test is more computationally intensive