CH927 Quantitative Genomics Lecture 2. How can quantitative traits be mapped?

Lecture objectives
By the end of this lecture you should be able to explain:
- what the main steps in QTL mapping are
- what the different methods for QTL analysis are (single-marker analysis, interval/flanking-marker mapping, composite interval mapping (CIM) and multiple interval mapping (MIM)), and under which experimental conditions each should be used
- what the different statistical methods for QTL analysis are (t-test, ANOVA, multiple regression and linear regression), and what each can predict about QTLs

The basis for QTL mapping has been known for over 70 years, but a lack of genetic markers prevented its widespread use until the mid-1980s. With DNA sequencing, the number and density of available markers have grown, and more statistically sophisticated mapping methods have also been developed.
1. Score a population for (i) a trait and (ii) the distribution of genome markers.
2. Identify regions of the genome containing QTLs, based on the occurrence of a phenotype-marker association that is significantly more likely than chance.

Association of phenotypes with markers
[Figure: individuals scored at markers A/a and B/b (molecular scores) and for G/g (phenotypic score).]
The results from marker A/a suggest that the gene is very close to the marker. The results from marker B/b suggest that the gene is not linked to the marker.
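The comparison on this slide can be reproduced with a short simulation. This is a minimal sketch that assumes a backcross with the F1 in coupling phase (AG/ag) crossed to aagg, and a single gene G whose genotype can be read from the phenotype. The function names and the recombination fractions (0.02 for a tightly linked marker such as A/a, 0.5 for an unlinked marker such as B/b) are illustrative choices, not values from the lecture.

```python
import random


def simulate_backcross(n, c, seed=1):
    """Simulate n backcross progeny (F1 AG/ag x aagg) for one marker and one gene.

    c is the marker-gene recombination fraction (c = 0.5 means unlinked).
    Returns (marker_het, gene_het) pairs; True means the F1 passed on A (or G).
    """
    rng = random.Random(seed)
    progeny = []
    for _ in range(n):
        marker_het = rng.random() < 0.5      # F1 transmits A or a equally often
        crossover = rng.random() < c         # recombination between marker and gene
        gene_het = marker_het != crossover   # G follows A unless a crossover occurred
        progeny.append((marker_het, gene_het))
    return progeny


def concordance(progeny):
    """Fraction of progeny whose marker class matches their phenotypic class."""
    return sum(m == g for m, g in progeny) / len(progeny)


tight = concordance(simulate_backcross(2000, c=0.02))
unlinked = concordance(simulate_backcross(2000, c=0.5))
print(f"tightly linked marker: {tight:.2f}, unlinked marker: {unlinked:.2f}")
```

The concordance between marker class and phenotypic class is roughly 1 - c. It is close to 1 for the tightly linked marker and close to 0.5 for the unlinked one.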

This generalises the principle, but only for a single gene. For quantitative traits we need to consider multiple loci (quantitative trait loci, QTLs).
[Figure: individuals scored at multiple loci (A/a, B/b, C/c, D/d, G/g, H/h, J/j, K/k).]

Objectives of QTL analysis
1. Score a population for (i) a trait and (ii) the distribution of genome markers.
2. Identify regions of the genome containing QTLs, based on the occurrence of a phenotype-marker association that is significantly more likely than chance.
3. Estimate the effects of the QTLs on the quantitative trait:
- many genes each with a small effect, or a few genes each with a large effect?
- their effects on the trait: is gene action additive or dominant?
- their positions in the genome: linkage and association, epistasis
- their interaction with the environment
4. Identify candidate genes underlying the QTL, and thus the trait.

QTL analysis can be classified by the type of progeny used
All of the different progenies are derived from the same reference population, and different progenies can be produced from this reference population.
[Figure: crossing scheme. Parents P1 (MMQQ) x P2 (mmqq), where M is the marker genotype and Q the QTL genotype, give the F1 (MmQq). The progenies shown include the selfed F2, F7 recombinant inbred lines (RILs), SI lines, and testcrosses TC1 to TC4 involving testers P3 and P4.]

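Backcrosses and near isogenic lines (NILs)
[Figure: backcrossing scheme to P2, producing BC1 (backcross 1) F1, F2 and F3 generations and BC2 F1, F2 and F3 generations, leading to near isogenic lines.]
- BC1 F1: used for QTL mapping
- rapid generation of material for QTL analysis
- near isogenic lines: isolate the part of the genome (A) that is of interest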

To map a quantitative trait:
1. Make a cross and generate marker data
- type of mapping population (e.g. RILs)
2. Generate linkage maps
- genome size, genome coverage
3. Collect phenotypic measurements
- evaluate in a uniform environment
- evaluate in multiple environments
- transform the data if necessary, to approach a normal distribution
[Figure: frequency distributions of trait value for genotypes A1/A1, A1/A2 and A2/A2.]
The total phenotypic variance is the sum of the genetic variance and the environmental variance:
$$V_T = V_G + V_E$$
Here $V_E$ arises from a heterogeneous environment, stochastic events and measurement error. This assumes that genes act additively (i.e. no epistasis) and that their effects are not conditional on the environment. Otherwise:
$$V_T = V_G + V_E + V_{G \times G} + V_{G \times E}$$
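A small simulation can illustrate the additive decomposition $V_T = V_G + V_E$. This sketch makes several assumptions:
- a single additive locus segregating as in an F2, with genotypic values +a, 0 and -a for A1/A1, A1/A2 and A2/A2;
- normally distributed environmental deviations that are independent of genotype;
- no epistasis and no G x E.

The function name and parameter values are arbitrary.

```python
import random
import statistics


def simulate_phenotypes(n, additive_effect, env_sd, seed=2):
    """Simulate one additive F2 locus plus independent environmental noise."""
    rng = random.Random(seed)
    genetic, environment = [], []
    for _ in range(n):
        # Number of A1 alleles received from two heterozygous parents (0, 1 or 2).
        n_a1 = (rng.random() < 0.5) + (rng.random() < 0.5)
        genetic.append((n_a1 - 1) * additive_effect)   # A2/A2 = -a, A1/A2 = 0, A1/A1 = +a
        environment.append(rng.gauss(0.0, env_sd))
    phenotype = [g + e for g, e in zip(genetic, environment)]
    return genetic, environment, phenotype


g, e, y = simulate_phenotypes(5000, additive_effect=1.0, env_sd=1.0)
v_g = statistics.pvariance(g)
v_e = statistics.pvariance(e)
v_t = statistics.pvariance(y)
print(f"V_G = {v_g:.3f}, V_E = {v_e:.3f}, V_G + V_E = {v_g + v_e:.3f}, V_T = {v_t:.3f}")
print(f"proportion of variance that is genetic: {v_g / v_t:.3f}")
```

With a = 1 the expected genetic variance is a²/2 = 0.5, so about one third of the total variance (0.5 of 1.5) is genetic. The simulated values should be close to this.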

4. The statistical machinery for QTL mapping
There are several analysis frameworks for marker-QTL associations:
- single-marker tests (t-test, F-test or linear regression)
- interval/flanking-marker mapping (IM), which uses a pair of markers simultaneously
- composite interval mapping (CIM): analysis of a marker interval, flanked by adjacent markers; based on maximum likelihood (ML)
- multiple interval mapping (MIM)

Four main analysis techniques:
- Simple t-test: evaluates the presence of a QTL through statistical differences between two marker genotypes.
- ANOVA (marker regression): detects marker differences when there are more than two marker genotypes. It produces a ranking of genotypes in order of their phenotypic effect on the trait of interest, and tests for significant differences between genotypes.
- Multiple regression: a simple remodelling of the ANOVA technique in regression terms, with the same ranking and testing for differences.
- Linear regression: the most complex point-analysis method, allowing different characteristics of the QTL to be investigated, including dominance effects, additive effects, genotype-environment interactions and epistasis.

Probabilities and t-tests

Basic mapping format: conditional probabilities
Marker genotypes are observed, but QTL genotypes are missing. The conditional probability that the QTL genotype is $Q_k$ (e.g. Qq), given that the marker genotype is $M_j$ (e.g. Mm), is:
$$\Pr(Q_k \mid M_j) = \frac{\Pr(Q_k M_j)}{\Pr(M_j)}$$
[Figure: P1 (MM QQ) x P2 (mm qq) gives the F1 (Mm Qq), which is selfed to produce the F2.]
In an F2 this can be calculated from the gamete frequencies and the marker genotype probabilities. Consider a QTL linked to a marker with recombination fraction c. The F1 gametes that form the F2 have frequencies:
$$\mathrm{freq}(MQ) = \mathrm{freq}(mq) = \frac{1-c}{2}, \qquad \mathrm{freq}(Mq) = \mathrm{freq}(mQ) = \frac{c}{2}$$
Hence:
$$\Pr(MMQQ) = \Pr(MQ)\Pr(MQ) = \frac{(1-c)^2}{4}$$
$$\Pr(MMQq) = 2\Pr(MQ)\Pr(Mq) = \frac{2c(1-c)}{4}$$
$$\Pr(MMqq) = \Pr(Mq)\Pr(Mq) = \frac{c^2}{4}$$
Since $\Pr(MM) = 1/4$, the conditional probabilities become:
$$\Pr(QQ \mid MM) = \frac{\Pr(MMQQ)}{\Pr(MM)} = (1-c)^2$$
$$\Pr(Qq \mid MM) = \frac{\Pr(MMQq)}{\Pr(MM)} = 2c(1-c)$$
$$\Pr(qq \mid MM) = \frac{\Pr(MMqq)}{\Pr(MM)} = c^2$$
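The same conditional probabilities can be computed for every marker class by enumerating pairs of F1 gametes. This is a minimal sketch that assumes an F2 from MMQQ x mmqq with no selection. The function name and the string labels for genotypes are illustrative.

```python
def f2_qtl_given_marker(c):
    """Pr(QTL genotype | marker genotype) in an F2 from MMQQ x mmqq.

    c is the marker-QTL recombination fraction. Returns probs[marker][qtl] for
    marker genotypes 'MM', 'Mm', 'mm' and QTL genotypes 'QQ', 'Qq', 'qq'.
    """
    # F1 (MQ/mq) gametes: parental types (1 - c)/2 each, recombinant types c/2 each.
    gametes = {("M", "Q"): (1 - c) / 2, ("m", "q"): (1 - c) / 2,
               ("M", "q"): c / 2, ("m", "Q"): c / 2}
    joint = {}
    for (m1, q1), p1 in gametes.items():
        for (m2, q2), p2 in gametes.items():
            # sorted() puts the upper-case allele first, e.g. 'mM' -> 'Mm'.
            marker = "".join(sorted(m1 + m2))
            qtl = "".join(sorted(q1 + q2))
            joint[(marker, qtl)] = joint.get((marker, qtl), 0.0) + p1 * p2
    probs = {}
    for marker in ("MM", "Mm", "mm"):
        p_marker = sum(p for (mk, _), p in joint.items() if mk == marker)
        probs[marker] = {qtl: joint.get((marker, qtl), 0.0) / p_marker
                         for qtl in ("QQ", "Qq", "qq")}
    return probs


for marker, row in f2_qtl_given_marker(0.1).items():
    print(marker, {qtl: round(p, 4) for qtl, p in row.items()})
```

For c = 0.1 this gives Pr(QQ | MM) = 0.81, Pr(Qq | MM) = 0.18 and Pr(qq | MM) = 0.01. These match (1 - c)², 2c(1 - c) and c².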

Using a t-test to probe a QTL
Example: a backcross (MmQq x mmqq) involving two genes, a marker (alleles M, m) and a QTL (alleles Q, q), linked with recombination fraction c.
Progeny genotype:  MmQq      Mmqq   mmQq   mmqq
Frequency:         (1-c)/2   c/2    c/2    (1-c)/2
Mean effect:       μ+a       μ      μ+a    μ
Each marker class has frequency 1/2, so the mean of marker genotype Mm is:
$$m_1 = \frac{\frac{1-c}{2}(\mu+a) + \frac{c}{2}\mu}{1/2} = \mu + (1-c)a$$
The mean of marker genotype mm is:
$$m_0 = \frac{\frac{c}{2}(\mu+a) + \frac{1-c}{2}\mu}{1/2} = \mu + ca$$
It follows that $m_1 - m_0 = (1-2c)a$. If the trait mean differs significantly between the genotypes at a marker locus, the marker is linked to a QTL. However, a small Mm - mm difference can reflect either a QTL of small effect that is tightly linked or a QTL of large effect that is loosely linked.
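The backcross result can be checked by simulation. This sketch assumes normally distributed environmental noise and uses the classical pooled-variance two-sample t statistic. The function names and parameter values (c = 0.1, a = 2) are illustrative.

```python
import math
import random
import statistics


def simulate_backcross_trait(n, c, mu, a, env_sd, seed=3):
    """Backcross MmQq x mmqq: trait values grouped by marker class (Mm or mm).

    QTL genotype Qq has mean mu + a and qq has mean mu, as in the backcross table.
    """
    rng = random.Random(seed)
    groups = {"Mm": [], "mm": []}
    for _ in range(n):
        marker_het = rng.random() < 0.5
        qtl_het = marker_het != (rng.random() < c)   # crossover with probability c
        value = mu + (a if qtl_het else 0.0) + rng.gauss(0.0, env_sd)
        groups["Mm" if marker_het else "mm"].append(value)
    return groups


def pooled_t(x, y):
    """Student's two-sample t statistic with a pooled variance estimate."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))


c, a = 0.1, 2.0
groups = simulate_backcross_trait(2000, c=c, mu=10.0, a=a, env_sd=1.0)
diff = statistics.fmean(groups["Mm"]) - statistics.fmean(groups["mm"])
t = pooled_t(groups["Mm"], groups["mm"])
print(f"Mm - mm = {diff:.2f} (expected (1 - 2c)a = {(1 - 2 * c) * a:.2f}), t = {t:.1f}")
```

With c = 0.1 and a = 2 the expected difference is (1 - 2c)a = 1.6. A QTL of larger effect but looser linkage (for example a ≈ 2.67 with c = 0.2) gives nearly the same expected difference. This is why a single marker cannot separate effect size from linkage.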

ANOVA and single-marker regression

Partitioning of variance: a simple ANOVA model
Partition the variance into genetically determined and environmental components. The model (there is a QTL linked to the marker) is tested against the null hypothesis of no QTL.
[Figure: trait value plotted against genotype (A1/A1, A1/A2, A2/A2).]

Partitioning of variance: methodology
Total sum of squares, $SS_T$: calculate the grand mean and the deviation of each individual from it. Then square each deviation and sum over the whole population. The total mean square is the total variance:
$$MS_T = \frac{SS_T}{n-1}$$
[Figure: trait values of n = 23 individuals grouped by genotype (A1/A1, A1/A2, A2/A2), with the grand mean marked.]

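Partitioning of variance: fitting the model
Calculate the mean for each genotype group. The residual sum of squares, $SS_R$, is the sum of the squared deviations of each individual from its genotype mean. The residual mean square is:
$$MS_R = \frac{SS_R}{n-k}$$
Here k is the number of genotype classes. The residual degrees of freedom are $(n-1) - (k-1) = 20$ for n = 23 and k = 3. $MS_R$ is the variance not explained by the model (i.e. not explained by this QTL).
[Figure: trait values grouped by genotype (A1/A1, A1/A2, A2/A2), with the grand mean and the genotype means marked.]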

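Genetic variance and testing the model
The model sum of squares is summed over genotypes:
$$SS_M = \sum_j n_j\,(\bar{y}_j - \bar{y})^2$$
Here $n_j$ is the number of individuals with genotype j, $\bar{y}_j$ is that genotype's mean and $\bar{y}$ is the grand mean. The genetic variance is the model mean square:
$$MS_M = \frac{SS_M}{k-1}$$
with $k - 1 = 2$ degrees of freedom here. Because sums of squares are additive, $SS_T = SS_M + SS_R$, it is easier to calculate $SS_M = SS_T - SS_R$. (Mean squares are not additive, because each is divided by a different number of degrees of freedom.)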

To test whether the QTL explains a significant amount of the variation, calculate the ratio of model to residual variance:
$$F = \frac{MS_M}{MS_R}$$
The proportion of the variance explained by the QTL is $SS_M / SS_T$. Look up the minimum value of F that is unlikely to have occurred by chance, given 2 d.f. for $MS_M$ and 20 d.f. for $MS_R$ ($F \geq 3.49$ for $p \leq 0.05$ in this case). If F exceeds this value, we can reject the null hypothesis of no QTL.
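The calculations on the preceding slides can be collected into one function. This is a minimal sketch of a one-way ANOVA on marker genotype classes. The function name, the dictionary input format and the small nine-individual data set are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
import statistics


def one_way_anova(groups):
    """One-way ANOVA of trait values on marker genotype class.

    groups maps a genotype label to the list of trait values for that genotype.
    """
    values = [v for vals in groups.values() for v in vals]
    n, k = len(values), len(groups)
    grand_mean = statistics.fmean(values)
    group_means = {g: statistics.fmean(vals) for g, vals in groups.items()}
    # Total: squared deviations of every individual from the grand mean.
    ss_t = sum((v - grand_mean) ** 2 for v in values)
    # Residual: squared deviations of each individual from its own genotype mean.
    ss_r = sum((v - group_means[g]) ** 2 for g, vals in groups.items() for v in vals)
    # Model: (genotype mean - grand mean)^2, weighted by the number of individuals.
    ss_m = sum(len(vals) * (group_means[g] - grand_mean) ** 2 for g, vals in groups.items())
    df_m, df_r = k - 1, n - k
    f_ratio = (ss_m / df_m) / (ss_r / df_r)
    return {"SS_T": ss_t, "SS_M": ss_m, "SS_R": ss_r, "df_M": df_m,
            "df_R": df_r, "F": f_ratio, "R2": ss_m / ss_t}


# Hypothetical trait values for three marker genotypes (three individuals each).
data = {"A1/A1": [12.0, 14.0, 13.0], "A1/A2": [10.0, 11.0, 12.0], "A2/A2": [8.0, 9.0, 10.0]}
result = one_way_anova(data)
print(result)
```

For the toy data the grand mean is 11, SS_T = 30, SS_M = 24 and SS_R = 6. This gives F = (24/2)/(6/6) = 12 on 2 and 6 degrees of freedom, with 80% of the variance explained by the marker.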

This is essentially a least-squares regression
Terms can be incorporated into the model to estimate:
- the additive effect of the alleles, a: half the difference between the means of the two homozygotes. It can be positive or negative, depending on which allele is being considered.
- the dominance deviation, d: the difference between the heterozygote mean and the midpoint of the two homozygote means. It can also be positive or negative.
If |d| = |a|, one allele is completely dominant. If |d| > |a|, the locus shows overdominance.

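Estimation of additive and dominance effects
In the backcross only two marker classes (Mm and mm) are present. Only the difference $m_1 - m_0 = (1-2c)a$ can therefore be estimated, and dominance cannot. In an F2 all three marker classes are observed. Write the QTL genotypic values as QQ = μ + a, Qq = μ + d and qq = μ - a, and the marker-class means as $m_{MM}$, $m_{Mm}$ and $m_{mm}$. The estimated additive effect (a*) and dominance effect (d*) are then:
$$a^* = \frac{m_{MM} - m_{mm}}{2} = a(1-2c)$$
$$d^* = m_{Mm} - \frac{m_{MM} + m_{mm}}{2} = d(1-2c)^2$$
Both estimates shrink towards zero as the marker-QTL recombination fraction c increases.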
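The attenuation of the marker-based estimates can be checked numerically. This sketch assumes an F2, no selection and no other QTL. It computes the expected marker-class means from the conditional probabilities Pr(QTL genotype | marker genotype), then applies the a* and d* formulas. The function names and parameter values are illustrative.

```python
def expected_marker_means(mu, a, d, c):
    """Expected F2 trait mean for each marker class.

    QTL genotypic values are QQ = mu + a, Qq = mu + d and qq = mu - a; c is the
    marker-QTL recombination fraction.
    """
    # Pr(QQ, Qq, qq | marker genotype), derived from the F1 gamete frequencies.
    cond = {
        "MM": ((1 - c) ** 2, 2 * c * (1 - c), c ** 2),
        "Mm": (c * (1 - c), (1 - c) ** 2 + c ** 2, c * (1 - c)),
        "mm": (c ** 2, 2 * c * (1 - c), (1 - c) ** 2),
    }
    values = (mu + a, mu + d, mu - a)
    return {marker: sum(p * v for p, v in zip(probs, values))
            for marker, probs in cond.items()}


def marker_estimates(means):
    """Apparent additive (a*) and dominance (d*) effects from marker-class means."""
    a_star = (means["MM"] - means["mm"]) / 2
    d_star = means["Mm"] - (means["MM"] + means["mm"]) / 2
    return a_star, d_star


c = 0.1
a_star, d_star = marker_estimates(expected_marker_means(mu=10.0, a=2.0, d=1.0, c=c))
print(f"a* = {a_star:.3f} (a(1-2c) = {2.0 * (1 - 2 * c):.3f})")
print(f"d* = {d_star:.3f} (d(1-2c)^2 = {1.0 * (1 - 2 * c) ** 2:.3f})")
```

With a = 2, d = 1 and c = 0.1, the sketch returns a* = 1.6 = a(1 - 2c) and d* = 0.64 = d(1 - 2c)².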

Linear models for QTL detection
Linear models use the linear relationship between the apparent effects of a marker on a quantitative character and the underlying effects of all QTLs linked to that marker:
$$y_{mk} = \mu + b_m + e_{mk}$$
Here $y_{mk}$ is the trait value of the kth individual with marker genotype m, $b_m$ is the effect of marker genotype m on the trait value, and $e_{mk}$ is the residual. The coefficients of this relationship depend on the distances between the QTL and the markers.
- Detection: a QTL is linked to the marker if at least one of the $b_m$ is significantly different from zero.
- Estimation (QTL effect and position): the $b_m$ have to be related to the QTL effects and map position.

Detecting epistasis
One major advantage of linear models is their flexibility. To test for epistasis between two QTLs, use an ANOVA with an interaction term:
$$y = \mu + a_i + b_k + d_{ik} + e$$
The terms are defined as follows:
- $a_i$ is the effect of marker genotype i at the first marker set (which can include more than one locus).
- $b_k$ is the effect of marker genotype k at the second marker set.
- $d_{ik}$ is the interaction between genotype i in the first marker set and genotype k in the second.

The tests are interpreted as follows:
- At least one $a_i$ significantly different from 0: a QTL is linked to the first marker set.
- At least one $b_k$ significantly different from 0: a QTL is linked to the second marker set.
- At least one $d_{ik}$ significantly different from 0: there are interactions between QTLs in sets 1 and 2.
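The effects in this model can be estimated from cell means. This is a minimal sketch that assumes a balanced design, with every pair of marker genotypes observed with the same number of individuals. It only estimates $a_i$, $b_k$ and $d_{ik}$. A full analysis would also test the interaction sum of squares against the residual with an F-test. The genotype labels and trait values are hypothetical.

```python
from statistics import fmean


def two_way_effects(cells):
    """Estimate mu, a_i, b_k and d_ik in y = mu + a_i + b_k + d_ik + e from cell means.

    cells maps (i, k) -> trait values, where i is the genotype at the first marker
    set and k the genotype at the second. Assumes a balanced design.
    """
    rows = sorted({i for i, _ in cells})
    cols = sorted({k for _, k in cells})
    cell_mean = {key: fmean(values) for key, values in cells.items()}
    mu = fmean(cell_mean.values())
    row_mean = {i: fmean(cell_mean[(i, k)] for k in cols) for i in rows}
    col_mean = {k: fmean(cell_mean[(i, k)] for i in rows) for k in cols}
    a = {i: row_mean[i] - mu for i in rows}   # first marker set
    b = {k: col_mean[k] - mu for k in cols}   # second marker set
    # Interaction: what remains of each cell mean after removing the main effects.
    d = {(i, k): cell_mean[(i, k)] - row_mean[i] - col_mean[k] + mu
         for i in rows for k in cols}
    return mu, a, b, d


cells = {("AA", "BB"): [14.0, 16.0], ("AA", "bb"): [9.0, 11.0],
         ("aa", "BB"): [9.0, 11.0], ("aa", "bb"): [9.0, 11.0]}
mu, a, b, d = two_way_effects(cells)
print(f"mu = {mu}, a = {a}, b = {b}")
print(f"d = {d}")
```

In the toy data the trait is raised only when AA and BB occur together. The interaction terms are therefore non-zero; for example, d for (AA, BB) is 1.25.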

Interval mapping and marker regression

Problems with single-marker mapping using ANOVA
If marker density is high, ANOVA on individual marker genotypes (single-marker analysis, or single-marker regression) is effective. It has three important weaknesses:
1. It does not give separate estimates of QTL location and QTL effect.
2. Individuals whose genotypes are missing at the marker must be discarded.
3. When markers are sparse, the QTL may be far from all markers, so the power to detect it decreases.

Interval mapping
Interval mapping uses probability estimates for the genotypes in the intervals between markers. Move the putative QTL position from M1 to M2 in 2 cM steps and plot the profile of the F value. The peak of the profile corresponds to the best estimate of the QTL position.
[Figure: F-value profile plotted against testing position across markers M1 to M5.]
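The genotype probabilities used at each testing position can be sketched for a backcross. The sketch assumes no crossover interference and uses the Haldane map function to convert centimorgans to recombination fractions. Both are assumptions of the sketch rather than choices stated in the lecture, and the function names are illustrative.

```python
import math


def haldane_r(distance_cm):
    """Recombination fraction from map distance (cM) under the Haldane map function."""
    return 0.5 * (1 - math.exp(-2 * distance_cm / 100))


def bc_prob_qtl_het(m1_het, m2_het, pos_cm, interval_cm):
    """Pr(QTL genotype is Qq | flanking marker genotypes) in a backcross.

    m1_het / m2_het: True if the individual is heterozygous (Mm) at M1 / M2.
    The putative QTL lies pos_cm from M1 in an interval of interval_cm.
    """
    r1 = haldane_r(pos_cm)                  # M1 to QTL
    r2 = haldane_r(interval_cm - pos_cm)    # QTL to M2
    r = haldane_r(interval_cm)              # M1 to M2; equals r1 + r2 - 2*r1*r2
    if m1_het and m2_het:
        return (1 - r1) * (1 - r2) / (1 - r)
    if m1_het:
        return (1 - r1) * r2 / r
    if m2_het:
        return r1 * (1 - r2) / r
    return r1 * r2 / (1 - r)


# Scan a 20 cM interval in 2 cM steps for an individual that is Mm at M1 and mm at M2.
for pos in range(0, 21, 2):
    print(f"{pos:2d} cM: Pr(Qq) = {bc_prob_qtl_het(True, False, pos, 20.0):.3f}")
```

For an individual that is Mm at M1 and mm at M2, Pr(Qq) falls from 1 at M1 to 0 at M2, passing through 0.5 at the midpoint of the interval.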

Interval mapping implementation
Carry out the QTL scan stepwise. Once a significant QTL has been identified, other markers are tested for their ability to explain the residual variation. Known QTLs are then said to be fixed, or included as cofactors, in the regression.
[Figure: F-ratio profile from interval mapping by regression (QTL Express), with significant peaks marked by asterisks.]

Interval mapping with a regression approach
Consider a marker interval M1-M2, and assume that a QTL is located at a particular position between the two markers ($r_1$ and θ are fixed). A regression model is constructed with response variable $y_i$ and explanatory variable $x_i^*$. The phenotypic value of individual i affected by a QTL can be expressed as:
$$y_i = \mu + a^* x_i^* + e_i, \qquad i = 1, \ldots, n \quad \text{(latent model)}$$
The terms are:
- μ is the overall mean;
- $x_i^*$ is the indicator variable for QTL genotype ($x_i^* = 1$ for Qq, 0 for qq);
- $a^*$ is the additive effect of the putative QTL on the trait;
- $e_i$ is the residual error, $e_i \sim N(0, \sigma^2)$.
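At a fixed testing position the model can be fitted by ordinary least squares. In this sketch the unknown indicator $x_i^*$ is replaced by its expected value given the flanking markers (the regression approximation often attributed to Haley and Knott). The LOD score is computed from residual sums of squares, assuming normal errors. The function name and the six-individual data set are hypothetical.

```python
import math
import statistics


def regress_on_qtl_prob(y, x):
    """Least-squares fit of y_i = mu + a* x_i + e_i at one putative QTL position.

    x_i is the QTL-genotype indicator (1 for Qq, 0 for qq) or, when the QTL
    genotype is unknown, its expected value Pr(Qq | flanking markers).
    Returns (mu, a_star, lod).
    """
    n = len(y)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a_star = sxy / sxx
    mu = y_bar - a_star * x_bar
    rss_qtl = sum((yi - mu - a_star * xi) ** 2 for xi, yi in zip(x, y))
    rss_null = sum((yi - y_bar) ** 2 for yi in y)   # model with no QTL: y_i = mu + e_i
    # With normal errors, LOD = (n/2) * log10(RSS_null / RSS_qtl).
    lod = (n / 2) * math.log10(rss_null / rss_qtl)
    return mu, a_star, lod


# Hypothetical trait values and Pr(Qq | flanking markers) at the testing position.
y = [10.1, 9.9, 12.2, 11.8, 10.0, 12.1]
x = [0.02, 0.05, 0.98, 0.95, 0.10, 1.0]
mu, a_star, lod = regress_on_qtl_prob(y, x)
print(f"mu = {mu:.3f}, a* = {a_star:.3f}, LOD = {lod:.2f}")
```

Repeating the fit at every testing position (for example every 2 cM) and plotting the LOD, or the equivalent F value, gives the profile described for interval mapping.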

Advantages and disadvantages of interval mapping
Advantages:
- the position of the QTL can be inferred from a support interval
- the estimated position and effects of the QTL tend to be asymptotically unbiased if there is only one segregating QTL on a chromosome
- the method requires fewer individuals
Disadvantages:
- it is not an interval test: even when there is no QTL within an interval, the likelihood profile on that interval can still exceed the threshold if there is a QTL nearby
- if there is more than one QTL on a chromosome, the test statistic at the position being tested will be affected by all of them, and the estimated positions (and effects) may be biased
- it is not efficient to use only two markers at a time for testing

Flanking-marker methods and maximum likelihood

Flanking-marker methods have been the most popular analysis techniques in recent years. This is due to their accuracy and the level of characterisation of the putative QTL that they provide: they combine both detection and estimation of QTL effects and position.
Two basic techniques:
- maximum likelihood
- maximum likelihood estimation through regression
Three methods for estimating likelihood:
- single-marker maximum likelihood (least power)
- flanking-marker maximum likelihood (most versatile)
- order-restricted interval mapping (most power)

LOD score
The QTL position (θ) is estimated using likelihood maps. There are two approaches:
- view θ as a fixed parameter, assuming that the QTL is located at a particular position; or
- view θ as a variable to be estimated (derive the log-likelihood equation for the MLE of θ).
$L_0/L_A$ is the ratio of the likelihood under the null hypothesis (no QTL in the marker interval) to the likelihood under the alternative hypothesis (QTL present). The LOD (log of the odds) score is:
$$\mathrm{LOD} = -\log_{10}\left(\frac{L_0}{L_A}\right) = \log_{10}\left(\frac{L_A}{L_0}\right)$$
Large values therefore favour the presence of a QTL. Each method produces a likelihood map.
[Figure: LOD profile plotted against chromosome position, showing the significance threshold, the estimated QTL location at the peak and its support interval.]
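Two common calculations on a likelihood map are sketched here:
- converting a LOD score to the likelihood-ratio statistic, LR = -2 ln(L_0/L_A) = 2 ln(10) × LOD ≈ 4.61 × LOD;
- reading off a support interval.

The 1-LOD drop used for the support interval is a widely used convention, assumed here rather than given in the lecture. The LOD profile values are hypothetical.

```python
import math


def lod_to_lr(lod):
    """Likelihood-ratio statistic LR = -2 ln(L0/LA) = 2 ln(10) * LOD."""
    return 2 * math.log(10) * lod


def support_interval(positions, lods, drop=1.0):
    """Return (peak position, left end, right end) of a LOD-drop support interval.

    positions are sorted map positions (cM) and lods the LOD score at each. The
    interval is the contiguous run of positions around the peak whose LOD is
    within `drop` of the maximum.
    """
    peak = max(range(len(lods)), key=lods.__getitem__)
    cutoff = lods[peak] - drop
    left = right = peak
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions[peak], positions[left], positions[right]


# Hypothetical LOD profile along a chromosome, tested every 2 cM.
positions = [0, 2, 4, 6, 8, 10, 12]
lods = [0.5, 1.8, 3.2, 4.1, 3.6, 2.4, 1.0]
print("LR for LOD 3:", round(lod_to_lr(3.0), 2))
print("peak, support interval:", support_interval(positions, lods))
```

For the hypothetical profile the peak is at 6 cM, and the 1-LOD support interval runs from 4 to 8 cM.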

Composite interval mapping (CIM)
CIM uses multiple markers as additional factors (marker cofactors).
[Figure: markers i-1, i, i+1 and i+2 along a chromosome, highlighting the interval being mapped.]
Five different types of markers are considered for the regression model, depending on the characteristics of the chromosome region:
- markers surrounding the QTL of interest
- linked and unlinked markers within the QTL region
- linked and unlinked markers outside the QTL region
Method: predict the QTL genotype from the marker genotypes every x cM, and carry out a likelihood-ratio (LR) test for a QTL effect every x cM. CIM combines maximum likelihood estimation and multiple regression.

Permutation testing to determine experiment-wide significance thresholds
The multiple-testing problem asks how often random QTL effects of a certain magnitude are detected in similar datasets.
Method:
- create a large number of random empirical datasets: take your marker data and randomly reassign the phenotypes to the marker genotypes
- repeat the QTL detection process on each dataset
- record the highest LR produced for a random QTL anywhere on the map
- repeat the whole process more than 500 times
- the threshold is the magnitude of the lowest random QTL in the top 5% of these LR results (i.e. the 95th percentile of the maximum LR values)
[Figure: distribution of maximum random LR values, divided into the lower 95% and the top 5%.]
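The permutation procedure can be sketched for single-marker tests. This sketch uses the single-marker F statistic in place of the LR and assumes two genotype classes per marker, as in a backcross. The data are simulated, and the function names and all numbers are illustrative.

```python
import random
import statistics


def max_marker_f(phenotypes, genotypes):
    """Largest single-marker F statistic over all markers (genotype classes 0 and 1).

    genotypes[j][i] is the genotype of individual i at marker j.
    """
    n = len(phenotypes)
    grand = statistics.fmean(phenotypes)
    ss_t = sum((y - grand) ** 2 for y in phenotypes)
    best = 0.0
    for marker in genotypes:
        classes = [[y for y, g in zip(phenotypes, marker) if g == cls] for cls in (0, 1)]
        if min(len(cl) for cl in classes) < 2:
            continue   # marker not informative in this sample
        ss_r = 0.0
        for cl in classes:
            cl_mean = statistics.fmean(cl)
            ss_r += sum((y - cl_mean) ** 2 for y in cl)
        f_ratio = (ss_t - ss_r) / (ss_r / (n - 2))   # 1 model d.f., n - 2 residual d.f.
        best = max(best, f_ratio)
    return best


def permutation_threshold(phenotypes, genotypes, n_perm=500, alpha=0.05, seed=4):
    """Experiment-wide threshold: smallest of the top alpha fraction of permuted maxima."""
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)   # reassign phenotypes to marker genotypes at random
        maxima.append(max_marker_f(shuffled, genotypes))
    maxima.sort()
    return maxima[min(n_perm - 1, round((1 - alpha) * n_perm))]


rng = random.Random(0)
genotypes = [[rng.randint(0, 1) for _ in range(60)] for _ in range(10)]
phenotypes = [2.0 * g + rng.gauss(0.0, 1.0) for g in genotypes[0]]   # QTL at marker 0
threshold = permutation_threshold(phenotypes, genotypes, n_perm=200)
observed = max_marker_f(phenotypes, genotypes)
print(f"observed max F = {observed:.1f}, 5% experiment-wide threshold = {threshold:.1f}")
```

With 200 permutations the threshold is the smallest of the ten largest permuted maxima. The slide recommends more than 500 repetitions in practice.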

Multiple interval mapping (MIM)
MIM uses multiple marker intervals simultaneously and aims to map multiple QTLs in a single step.
Method:
- build regression models that include all QTLs (first detected by CIM)
- use an information criterion (IC) to evaluate alternative models
MIM allows simultaneous detection and estimation of additive, dominance and epistatic effects.
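This sketch illustrates how alternative models can be compared with an information criterion. It uses the Bayesian information criterion (BIC) for Gaussian linear models, computed from residual sums of squares. BIC is one possible choice of criterion, assumed here. The residual sums of squares are hypothetical; in practice they would come from fitting each multi-QTL model.

```python
import math


def bic(rss, n, n_params):
    """Bayesian information criterion for a Gaussian linear model (smaller is better).

    Uses n * ln(RSS / n) + n_params * ln(n), dropping constants shared by all models.
    """
    return n * math.log(rss / n) + n_params * math.log(n)


n = 200
# Hypothetical residual sums of squares for models with 1, 2 and 3 additive QTLs;
# n_params counts the intercept plus one effect per QTL.
models = {"1 QTL": (150.0, 2), "2 QTL": (120.0, 3), "3 QTL": (119.0, 4)}
scores = {name: bic(rss, n, k) for name, (rss, k) in models.items()}
best = min(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: BIC = {score:.1f}")
print("preferred model:", best)
```

Adding a second QTL reduces the residual sum of squares enough to pay its penalty, but adding a third does not. The two-QTL model is therefore preferred.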

Some examples of the final output

Genetics and genomics of post-harvest senescence in broccoli
Vicky Buchanan-Wollaston and Dave Pink (Warwick HRI)
[Figure: JoinMap 2010 broccoli linkage map (linkage groups 1 to 9); plant lines, 211 loci (189 SSR, 22 AFLP).]

QTLs for senescence traits in broccoli
Two major QTLs for time to yellowing were confirmed on the 2010 broccoli map. REML estimate: 64.4% of the line-mean variation is genetic.
[Figure: LOD profiles from MapQTL, with significance assessed by a permutation test (10,000 iterations); surviving labels include chromosome 7, 30.6% of variation and LOD 3.8.]


Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in

More information

2. Simple Linear Regression

2. Simple Linear Regression Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

More information

A and B are not absolutely linked. They could be far enough apart on the chromosome that they assort independently.

A and B are not absolutely linked. They could be far enough apart on the chromosome that they assort independently. Name Section 7.014 Problem Set 5 Please print out this problem set and record your answers on the printed copy. Answers to this problem set are to be turned in to the box outside 68-120 by 5:00pm on Friday

More information

Answer Key Problem Set 5

Answer Key Problem Set 5 7.03 Fall 2003 1 of 6 1. a) Genetic properties of gln2- and gln 3-: Answer Key Problem Set 5 Both are uninducible, as they give decreased glutamine synthetase (GS) activity. Both are recessive, as mating

More information

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

More information

Hardy-Weinberg Equilibrium Problems

Hardy-Weinberg Equilibrium Problems Hardy-Weinberg Equilibrium Problems 1. The frequency of two alleles in a gene pool is 0.19 (A) and 0.81(a). Assume that the population is in Hardy-Weinberg equilibrium. (a) Calculate the percentage of

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

UNDERSTANDING ANALYSIS OF COVARIANCE (ANCOVA)

UNDERSTANDING ANALYSIS OF COVARIANCE (ANCOVA) UNDERSTANDING ANALYSIS OF COVARIANCE () In general, research is conducted for the purpose of explaining the effects of the independent variable on the dependent variable, and the purpose of research design

More information

Heredity - Patterns of Inheritance

Heredity - Patterns of Inheritance Heredity - Patterns of Inheritance Genes and Alleles A. Genes 1. A sequence of nucleotides that codes for a special functional product a. Transfer RNA b. Enzyme c. Structural protein d. Pigments 2. Genes

More information

ABSORBENCY OF PAPER TOWELS

ABSORBENCY OF PAPER TOWELS ABSORBENCY OF PAPER TOWELS 15. Brief Version of the Case Study 15.1 Problem Formulation 15.2 Selection of Factors 15.3 Obtaining Random Samples of Paper Towels 15.4 How will the Absorbency be measured?

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information

Regression step-by-step using Microsoft Excel

Regression step-by-step using Microsoft Excel Step 1: Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression

More information

CHAPTER 13. Experimental Design and Analysis of Variance

CHAPTER 13. Experimental Design and Analysis of Variance CHAPTER 13 Experimental Design and Analysis of Variance CONTENTS STATISTICS IN PRACTICE: BURKE MARKETING SERVICES, INC. 13.1 AN INTRODUCTION TO EXPERIMENTAL DESIGN AND ANALYSIS OF VARIANCE Data Collection

More information

Step-by-Step Guide to Bi-Parental Linkage Mapping WHITE PAPER

Step-by-Step Guide to Bi-Parental Linkage Mapping WHITE PAPER Step-by-Step Guide to Bi-Parental Linkage Mapping WHITE PAPER JMP Genomics Step-by-Step Guide to Bi-Parental Linkage Mapping Introduction JMP Genomics offers several tools for the creation of linkage maps

More information

Non-Parametric Tests (I)

Non-Parametric Tests (I) Lecture 5: Non-Parametric Tests (I) KimHuat LIM lim@stats.ox.ac.uk http://www.stats.ox.ac.uk/~lim/teaching.html Slide 1 5.1 Outline (i) Overview of Distribution-Free Tests (ii) Median Test for Two Independent

More information

One-Way Analysis of Variance (ANOVA) Example Problem

One-Way Analysis of Variance (ANOVA) Example Problem One-Way Analysis of Variance (ANOVA) Example Problem Introduction Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality of two or more population (or treatment) means

More information

Gene Mapping Techniques

Gene Mapping Techniques Gene Mapping Techniques OBJECTIVES By the end of this session the student should be able to: Define genetic linkage and recombinant frequency State how genetic distance may be estimated State how restriction

More information

2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program. Statistics Module Part 3 2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

More information

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

More information

CCR Biology - Chapter 7 Practice Test - Summer 2012

CCR Biology - Chapter 7 Practice Test - Summer 2012 Name: Class: Date: CCR Biology - Chapter 7 Practice Test - Summer 2012 Multiple Choice Identify the choice that best completes the statement or answers the question. 1. A person who has a disorder caused

More information

Presentation by: Ahmad Alsahaf. Research collaborator at the Hydroinformatics lab - Politecnico di Milano MSc in Automation and Control Engineering

Presentation by: Ahmad Alsahaf. Research collaborator at the Hydroinformatics lab - Politecnico di Milano MSc in Automation and Control Engineering Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen 9-October 2015 Presentation by: Ahmad Alsahaf Research collaborator at the Hydroinformatics lab - Politecnico di

More information

MAGIC design. and other topics. Karl Broman. Biostatistics & Medical Informatics University of Wisconsin Madison

MAGIC design. and other topics. Karl Broman. Biostatistics & Medical Informatics University of Wisconsin Madison MAGIC design and other topics Karl Broman Biostatistics & Medical Informatics University of Wisconsin Madison biostat.wisc.edu/ kbroman github.com/kbroman kbroman.wordpress.com @kwbroman CC founders compgen.unc.edu

More information

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

More information

Mendelian and Non-Mendelian Heredity Grade Ten

Mendelian and Non-Mendelian Heredity Grade Ten Ohio Standards Connection: Life Sciences Benchmark C Explain the genetic mechanisms and molecular basis of inheritance. Indicator 6 Explain that a unit of hereditary information is called a gene, and genes

More information

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

More information

Basic Principles of Forensic Molecular Biology and Genetics. Population Genetics

Basic Principles of Forensic Molecular Biology and Genetics. Population Genetics Basic Principles of Forensic Molecular Biology and Genetics Population Genetics Significance of a Match What is the significance of: a fiber match? a hair match? a glass match? a DNA match? Meaning of

More information

Genome 361: Fundamentals of Genetics and Genomics Fall 2015

Genome 361: Fundamentals of Genetics and Genomics Fall 2015 Genome 361: Fundamentals of Genetics and Genomics Fall 2015 Instructor Frances Cheong, kcheong3@uw.edu Teaching Assistants Michael Bradshaw, mjb34@uw.edu Colby Samstag, csamstag@uw.edu Emily Youngblom,

More information

Using Excel for inferential statistics

Using Excel for inferential statistics FACT SHEET Using Excel for inferential statistics Introduction When you collect data, you expect a certain amount of variation, just caused by chance. A wide variety of statistical tests can be applied

More information

This chapter will demonstrate how to perform multiple linear regression with IBM SPSS

This chapter will demonstrate how to perform multiple linear regression with IBM SPSS CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Two-locus population genetics

Two-locus population genetics Two-locus population genetics Introduction So far in this course we ve dealt only with variation at a single locus. There are obviously many traits that are governed by more than a single locus in whose

More information

Name: Class: Date: ID: A

Name: Class: Date: ID: A Name: Class: _ Date: _ Meiosis Quiz 1. (1 point) A kidney cell is an example of which type of cell? a. sex cell b. germ cell c. somatic cell d. haploid cell 2. (1 point) How many chromosomes are in a human

More information

Combining Data from Different Genotyping Platforms. Gonçalo Abecasis Center for Statistical Genetics University of Michigan

Combining Data from Different Genotyping Platforms. Gonçalo Abecasis Center for Statistical Genetics University of Michigan Combining Data from Different Genotyping Platforms Gonçalo Abecasis Center for Statistical Genetics University of Michigan The Challenge Detecting small effects requires very large sample sizes Combined

More information

One-Way ANOVA using SPSS 11.0. SPSS ANOVA procedures found in the Compare Means analyses. Specifically, we demonstrate

One-Way ANOVA using SPSS 11.0. SPSS ANOVA procedures found in the Compare Means analyses. Specifically, we demonstrate 1 One-Way ANOVA using SPSS 11.0 This section covers steps for testing the difference between three or more group means using the SPSS ANOVA procedures found in the Compare Means analyses. Specifically,

More information

12: Analysis of Variance. Introduction

12: Analysis of Variance. Introduction 1: Analysis of Variance Introduction EDA Hypothesis Test Introduction In Chapter 8 and again in Chapter 11 we compared means from two independent groups. In this chapter we extend the procedure to consider

More information

THE GENETIC ARCHITECTURE

THE GENETIC ARCHITECTURE Annu. Rev. Genet. 2001. 35:303 39 Copyright c 2001 by Annual Reviews. All rights reserved THE GENETIC ARCHITECTURE OF QUANTITATIVE TRAITS TrudyF.C.Mackay Department of Genetics, Box 7614, North Carolina

More information

DNA MARKERS FOR ASEASONALITY AND MILK PRODUCTION IN SHEEP. R. G. Mateescu and M.L. Thonney

DNA MARKERS FOR ASEASONALITY AND MILK PRODUCTION IN SHEEP. R. G. Mateescu and M.L. Thonney DNA MARKERS FOR ASEASONALITY AND MILK PRODUCTION IN SHEEP Introduction R. G. Mateescu and M.L. Thonney Department of Animal Science Cornell University Ithaca, New York Knowledge about genetic markers linked

More information

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

More information

Randomized Block Analysis of Variance

Randomized Block Analysis of Variance Chapter 565 Randomized Block Analysis of Variance Introduction This module analyzes a randomized block analysis of variance with up to two treatment factors and their interaction. It provides tables of

More information

Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

More information

International Statistical Institute, 56th Session, 2007: Phil Everson

International Statistical Institute, 56th Session, 2007: Phil Everson Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

More information

Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools. Tools for Summarizing Data Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is

More information

Recall this chart that showed how most of our course would be organized:

Recall this chart that showed how most of our course would be organized: Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

More information

Tutorial 5: Hypothesis Testing

Tutorial 5: Hypothesis Testing Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................

More information

Chapter 9. Two-Sample Tests. Effect Sizes and Power Paired t Test Calculation

Chapter 9. Two-Sample Tests. Effect Sizes and Power Paired t Test Calculation Chapter 9 Two-Sample Tests Paired t Test (Correlated Groups t Test) Effect Sizes and Power Paired t Test Calculation Summary Independent t Test Chapter 9 Homework Power and Two-Sample Tests: Paired Versus

More information

Outline. Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test

Outline. Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test The t-test Outline Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test - Dependent (related) groups t-test - Independent (unrelated) groups t-test Comparing means Correlation

More information

Deterministic computer simulations were performed to evaluate the effect of maternallytransmitted

Deterministic computer simulations were performed to evaluate the effect of maternallytransmitted Supporting Information 3. Host-parasite simulations Deterministic computer simulations were performed to evaluate the effect of maternallytransmitted parasites on the evolution of sex. Briefly, the simulations

More information