CH927 Quantitative Genomics Lecture 2. How can quantitative traits be mapped?
- Barrie Copeland
- 7 years ago
2 Lecture objectives
By the end of this lecture you should be able to explain:
- What the main steps in QTL mapping are
- What the different methods for QTL analysis are (single-marker mapping, interval/flanking-marker mapping, composite interval mapping (CIM) and multiple interval mapping (MIM)), and under which experimental conditions each should be used
- What the different statistical methods for QTL analysis are (t-test, ANOVA, multiple regression, linear regression), and what they can predict about QTLs
3 The basis for QTL mapping has been known for over 70 years, but a lack of genetic markers prevented its widespread use until the mid-1980s. With DNA sequencing, the number and density of markers have grown, and more statistically sophisticated mapping methods have been developed.
1. Score a population for (i) a trait and (ii) the distribution of genome markers.
2. Identify regions of the genome containing QTLs, based on a phenotype-marker association that is significantly more likely than chance.
4 Association of phenotypes with markers
[Figure: individuals scored at marker loci A/a and B/b (molecular scores) and for phenotype G/g (phenotypic score).]
- Results from marker A/a (genotypes co-segregate with the phenotype) suggest that the gene is very close to the marker.
- Results from marker B/b (genotypes do not co-segregate with the phenotype) suggest that the gene is not linked to the marker.
5 This is a generalisation of the principle, but for only one gene. We need to consider quantitative trait loci (multiple genes).
[Figure: the same population scored for phenotype G/g and at several marker loci (A/a, B/b, C/c, D/d, H/h, J/j, K/k).]
6 Objectives of QTL analysis
1. Score a population for (i) a trait and (ii) the distribution of genome markers.
2. Identify regions of the genome containing QTLs, based on a phenotype-marker association that is significantly more likely than chance.
3. Estimate the effects of the QTLs on the quantitative trait:
- many genes with a small effect each, or few genes with a large effect each?
- their effects on the trait: is gene action additive or dominant?
- their positions in the genome: linkage and association, epistasis
- their interaction with the environment
4. Identify candidate genes underlying the QTL, and thus the trait.
7 QTL analysis can be classified by the type of progeny used
All of the different progenies are derived from the same reference population, the cross P1 (MMQQ) x P2 (mmqq), where M = marker genotype and Q = QTL genotype.
[Figure: from the F1 (MmQq), different progenies can be produced: an F2 by selfing, recombinant inbred lines (F7 RILs) by repeated selfing, SI lines, and testcrosses (TC1 to TC4) to parents P3 and P4.]
8 Backcrosses and Near Isogenic Lines (NILs)
[Figure: the F1 is backcrossed to P2 to give BC1 (Backcross 1); selfing gives BC1 F2 and BC1 F3, and further backcrossing gives BC2 F1, BC2 F2 and BC2 F3.]
- BC1 F1: used for QTL mapping
- Rapid generation of material for QTL analysis
- Near isogenic lines: isolate the part of genome A that is of interest
9 To map a quantitative trait:
1. Make a cross and generate marker data
- type of mapping population (e.g. RIL)
2. Generate linkage maps
- genome size, genome coverage
3. Collect phenotypic measurements
- evaluate in a uniform environment
- evaluate in multiple environments
- transform the data if needed (to approach a normal distribution)
[Figure: frequency distributions of trait values for genotypes A1/A1, A1/A2 and A2/A2.]
Total variance: $V_T = V_G + V_E$, i.e. genetic variance plus environmental variance (heterogeneous environments, stochastic events, measurement error).
This assumes that genes act additively (i.e. no epistasis) and that their effects are not conditional on the environment; otherwise $V_T = V_G + V_E + V_{G \times G} + V_{G \times E}$.
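The decomposition $V_T = V_G + V_E$ can be illustrated with a short Python sketch. It rests on several assumptions. The genotype of every individual is treated as known, which is not the case for a real QTL. The trait values are hypothetical. The function name `partition_variance` is made up for this example. V_G is taken as the variance among genotype means and V_E as the pooled within-genotype variance. Both use divide-by-n variances so that the two parts add up exactly. The $V_{G \times G}$ and $V_{G \times E}$ terms are ignored.

```python
from statistics import fmean


def partition_variance(groups):
    """Split total phenotypic variance into genetic (V_G) and environmental (V_E) parts.

    groups maps a genotype label to the trait values of individuals with that
    genotype. Population (divide-by-n) variances are used so that
    V_T = V_G + V_E holds exactly; V_GxG and V_GxE are ignored.
    """
    values = [y for ys in groups.values() for y in ys]
    n = len(values)
    grand = fmean(values)
    means = {g: fmean(ys) for g, ys in groups.items()}
    # Total variance: spread of every individual around the grand mean.
    v_t = sum((y - grand) ** 2 for y in values) / n
    # Genetic variance: spread of genotype means, weighted by genotype counts.
    v_g = sum(len(ys) * (means[g] - grand) ** 2 for g, ys in groups.items()) / n
    # Environmental variance: spread of individuals around their own genotype mean.
    v_e = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys) / n
    return v_t, v_g, v_e


# Hypothetical trait values for the three genotypes at one locus.
groups = {"A1/A1": [9.0, 11.0], "A1/A2": [14.0, 16.0], "A2/A2": [19.0, 21.0]}
v_t, v_g, v_e = partition_variance(groups)
print(f"V_T={v_t:.3f}  V_G={v_g:.3f}  V_E={v_e:.3f}  V_G/V_T={v_g / v_t:.3f}")
```

The ratio V_G/V_T printed at the end is the proportion of phenotypic variance attributable to genotype in this toy example.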
11 4. The statistical machinery for QTL mapping
Several analysis frameworks for marker-QTL associations:
- Single-marker tests (t-test, F-test or linear regression)
- Interval/flanking-marker mapping (IM): a pair of markers analysed simultaneously
- Composite interval mapping (CIM): analysis of a marker interval, flanked by adjacent markers; based on maximum likelihood (ML)
- Multiple interval mapping (MIM)
12 4. The statistical machinery for QTL mapping
Four main analysis techniques:
- Simple t-test: evaluates the presence of a QTL through statistical differences between two marker genotypes.
- ANOVA (marker regression): detects marker differences when there are more than two marker genotypes. It produces a ranking of genotypes, in order of phenotypic effect on the trait of interest, and tests for significant differences between genotypes.
- Multiple regression: a remodelling of the ANOVA technique in regression terms, with the same ranking and testing for differences.
- Linear regression: the most complex point-analysis method, allowing different characteristics of the QTL to be investigated, including additive effects, dominance effects, genotype-environment interactions and epistasis.
13 Probabilities and t-tests
14 Basic mapping format: conditional probabilities
QTL genotypes are missing; marker genotypes are observed. We therefore need, for example, the conditional probability that the QTL genotype is Qq given that the marker genotype is Mm:
$\Pr(Q_k \mid M_j) = \Pr(Q_k M_j) / \Pr(M_j)$
Cross P1 (MM QQ) x P2 (mm qq) and self the F1 (Mm Qq) to give an F2. In the F2, this probability is calculated from the gamete frequencies and the marker genotype probabilities. Consider a QTL linked to a marker with recombination fraction c. The F1 gamete frequencies are:
$\mathrm{freq}(MQ) = \mathrm{freq}(mq) = (1-c)/2$ (parental)
$\mathrm{freq}(Mq) = \mathrm{freq}(mQ) = c/2$ (recombinant)
15 Basic mapping format: conditional probabilities
With F1 gamete frequencies $\mathrm{freq}(MQ) = \mathrm{freq}(mq) = (1-c)/2$ and $\mathrm{freq}(Mq) = \mathrm{freq}(mQ) = c/2$:
$\Pr(MMQQ) = \Pr(MQ)\Pr(MQ) = (1-c)^2/4$
$\Pr(MMQq) = 2\Pr(MQ)\Pr(Mq) = 2c(1-c)/4$
$\Pr(MMqq) = \Pr(Mq)\Pr(Mq) = c^2/4$
Since $\Pr(MM) = 1/4$, the conditional probabilities become:
$\Pr(QQ \mid MM) = \Pr(MMQQ)/\Pr(MM) = (1-c)^2$
$\Pr(Qq \mid MM) = \Pr(MMQq)/\Pr(MM) = 2c(1-c)$
$\Pr(qq \mid MM) = \Pr(MMqq)/\Pr(MM) = c^2$
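The same bookkeeping can be done mechanically for all three marker classes. The Python sketch below assumes random union of F1 gametes in the F2 and uses a hypothetical function name, `f2_qtl_given_marker`. It enumerates every pair of F1 gametes and accumulates the joint marker-QTL genotype probabilities. Each joint probability is then divided by the probability of its marker class.

```python
def f2_qtl_given_marker(c):
    """Pr(QTL genotype | marker genotype) in an F2 from the cross MMQQ x mmqq.

    c is the marker-QTL recombination fraction. F1 gametes: MQ and mq each
    (1 - c)/2 (parental), Mq and mQ each c/2 (recombinant).
    """
    gametes = {("M", "Q"): (1 - c) / 2, ("m", "q"): (1 - c) / 2,
               ("M", "q"): c / 2, ("m", "Q"): c / 2}
    joint = {}
    # Random union of two F1 gametes gives the joint F2 genotype probabilities.
    for (m1, q1), p1 in gametes.items():
        for (m2, q2), p2 in gametes.items():
            marker = "".join(sorted(m1 + m2))  # "MM", "Mm" or "mm"
            qtl = "".join(sorted(q1 + q2))     # "QQ", "Qq" or "qq"
            joint[(marker, qtl)] = joint.get((marker, qtl), 0.0) + p1 * p2
    cond = {}
    for marker in ("MM", "Mm", "mm"):
        # Marginal probability of the marker class (1/4, 1/2, 1/4).
        p_marker = sum(p for (mk, _), p in joint.items() if mk == marker)
        cond[marker] = {qtl: joint[(marker, qtl)] / p_marker
                        for qtl in ("QQ", "Qq", "qq")}
    return cond


for marker, probs in f2_qtl_given_marker(0.1).items():
    print(marker, {g: round(p, 4) for g, p in probs.items()})
```

For the MM class this reproduces $(1-c)^2$, $2c(1-c)$ and $c^2$. It also gives the less obvious Mm case, where $\Pr(Qq \mid Mm) = (1-c)^2 + c^2$.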
16 Using a t-test to probe a QTL
Example: a backcross (MmQq x mmqq) with two genes, a marker (alleles M, m) and a QTL (alleles Q, q). These two genes are linked with recombination fraction c.
Genotype: MmQq | Mmqq | mmQq | mmqq
Frequency: (1-c)/2 | c/2 | c/2 | (1-c)/2
Mean effect: m+a | m | m+a | m
Mean of marker genotype Mm: $m_1 = \left[\tfrac{1-c}{2}(m+a) + \tfrac{c}{2}m\right] / \tfrac{1}{2} = m + (1-c)a$
Mean of marker genotype mm: $m_0 = \left[\tfrac{c}{2}(m+a) + \tfrac{1-c}{2}m\right] / \tfrac{1}{2} = m + ca$
If the trait mean is significantly different between the genotypes at a marker locus, the marker is linked to a QTL. Because $m_1 - m_0 = (1-2c)a$, a small difference between the marker classes can reflect either a QTL of small effect with tight linkage or a QTL of large effect with loose linkage.
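A minimal Python sketch of this slide follows. The helper names `backcross_marker_means` and `pooled_t` are hypothetical, and the trait values are invented. The first function returns the expected marker-class means. The second computes the ordinary equal-variance two-sample t statistic that would compare the observed classes. The loop shows the confounding noted above: two different (effect, linkage) combinations give the same marker-class difference.

```python
from math import sqrt
from statistics import fmean, variance


def backcross_marker_means(mu, a, c):
    """Expected trait means of marker classes Mm and mm in a MmQq x mmqq backcross.

    QTL genotype Qq has mean mu + a and qq has mean mu; c is the marker-QTL
    recombination fraction. Within class Mm, Pr(Qq) = 1 - c; within mm, Pr(Qq) = c.
    """
    m1 = (1 - c) * (mu + a) + c * mu  # = mu + (1 - c) * a
    m0 = c * (mu + a) + (1 - c) * mu  # = mu + c * a
    return m1, m0


def pooled_t(x, y):
    """Two-sample t statistic (equal variances) comparing two marker classes."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (fmean(x) - fmean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))


# The marker-class difference is (1 - 2c) * a, so effect size and linkage are confounded:
for a, c in [(1.0, 0.1), (2.0, 0.3)]:  # small effect/tight vs large effect/loose
    m1, m0 = backcross_marker_means(10.0, a, c)
    print(f"a={a}, c={c}: m1 - m0 = {m1 - m0:.2f}")  # both differences are 0.80
print(pooled_t([11.0, 12.0, 13.0], [9.0, 10.0, 11.0]))  # hypothetical trait values
```

The t statistic is compared with a t distribution on $n_x + n_y - 2$ degrees of freedom. If SciPy is available, `scipy.stats.ttest_ind` performs the same equal-variance test and also returns a p-value.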
17 ANOVA and single marker regression
18 Partitioning of variance: a simple ANOVA model
Partition the variance into genetically determined and environmental components. The model (there is a QTL linked to the marker) is tested against the null hypothesis of no QTL.
[Figure: trait value plotted against genotype (A1/A1, A1/A2, A2/A2).]
19 Partitioning of variance: methodology
Total sum of squares, $SS_T$: calculate the grand mean and the deviation of each individual from it, then square each deviation and sum over the whole population.
Total mean square (= total variance): $MS_T = SS_T / (n-1)$, with $n - 1$ degrees of freedom (here n = 23).
[Figure: trait values of individuals by genotype (A1/A1, A1/A2, A2/A2), with the grand mean.]
20 Partitioning of variance: fitting the model
Calculate the mean for each genotype group.
Residual sum of squares: $SS_R = \sum (\text{deviation of each individual from its genotype mean})^2$
Residual mean square: $MS_R = SS_R / (n - k)$, where k is the number of genotype classes, i.e. $(n-1) - (k-1)$ degrees of freedom. This is the variance not explained by the model (i.e. not explained by this QTL).
[Figure: trait values of individuals by genotype (A1/A1, A1/A2, A2/A2), with the grand mean.]
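21 Genetic variance and testing the model
Model sum of squares: $SS_M = \sum_g n_g (\bar{y}_g - \bar{y})^2$, i.e. for each genotype g, (grand mean - genotype mean)$^2$ multiplied by the number of individuals $n_g$ with that genotype, summed over genotypes.
Genetic (model) mean square: $MS_M = SS_M / 2$ (degrees of freedom = k - 1 = 2).
Because sums of squares are additive ($SS_T = SS_M + SS_R$), it is easier to calculate $SS_M = SS_T - SS_R$ and then divide by its degrees of freedom. Mean squares are not additive, because each is divided by a different number of degrees of freedom.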
22 Genetic variance and testing the model
To test whether the QTL explains a significant amount of the variation, calculate the ratio of model to residual variance, $F = MS_M / MS_R$.
Proportion of variance explained by the QTL: $R^2 = SS_M / SS_T$.
Look up the minimum value of F that is unlikely to have occurred by chance, given 2 d.f. for $MS_M$ and 20 d.f. for $MS_R$ (F = 3.49 at p = 0.05 in this case). If F exceeds this value, we can reject the null hypothesis of no QTL.
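The single-marker ANOVA steps can be collected into one function. The Python sketch below uses a hypothetical function name (`marker_anova`) and made-up data. It obtains the model sum of squares by subtraction, $SS_M = SS_T - SS_R$, and reports $R^2 = SS_M/SS_T$. It only computes F; turning F into a p-value is left to a statistics library.

```python
from statistics import fmean


def marker_anova(groups):
    """One-way ANOVA of trait values grouped by marker genotype.

    groups maps a marker genotype to its list of trait values.
    Returns (F, model d.f., residual d.f., proportion of variance explained).
    """
    values = [y for ys in groups.values() for y in ys]
    n, k = len(values), len(groups)
    grand = fmean(values)
    means = {g: fmean(ys) for g, ys in groups.items()}
    ss_t = sum((y - grand) ** 2 for y in values)                          # total
    ss_r = sum((y - means[g]) ** 2 for g, ys in groups.items() for y in ys)  # residual
    ss_m = ss_t - ss_r                  # sums of squares are additive
    df_m, df_r = k - 1, n - k
    ms_m, ms_r = ss_m / df_m, ss_r / df_r
    return ms_m / ms_r, df_m, df_r, ss_m / ss_t


# Hypothetical trait values for three marker genotypes.
data = {"A1/A1": [9.0, 11.0], "A1/A2": [14.0, 16.0], "A2/A2": [19.0, 21.0]}
f_ratio, df_m, df_r, r2 = marker_anova(data)
print(f"F({df_m}, {df_r}) = {f_ratio:.2f}, R^2 = {r2:.3f}")
```

With SciPy installed, `scipy.stats.f.sf(f_ratio, df_m, df_r)` returns the corresponding p-value. For the lecture's example (n = 23, three genotypes, so 2 and 20 d.f.), the 5% critical value is 3.49.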
23 This is essentially a least-squares regression
Incorporate terms into the model to estimate:
- The additive effect of the alleles, a = half the difference between the means of the two homozygotes. It can be positive or negative, depending on which allele is being considered.
- The dominance deviation, d = the difference between the heterozygote mean and the mid-point of the two homozygote means. It can also be positive or negative.
If d = 0, gene action is purely additive; if |d| = |a|, one allele is completely dominant; if |d| > |a|, one allele shows overdominance.
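As a small illustration, the Python sketch below computes a and d from three genotype means and labels the gene action. The function names are hypothetical and the means are invented. The label "partial dominance" for $0 < |d| < |a|$ is a standard term that the slide does not use. A tolerance is needed because the equality tests are done on floating-point numbers.

```python
def additive_dominance(mean_11, mean_12, mean_22):
    """Additive effect a and dominance deviation d from the three genotype means."""
    a = (mean_11 - mean_22) / 2              # half the homozygote difference
    d = mean_12 - (mean_11 + mean_22) / 2    # heterozygote minus homozygote mid-point
    return a, d


def gene_action(a, d, tol=1e-9):
    """Classify gene action from a and d (tolerance for floating-point equality)."""
    if abs(d) <= tol:
        return "additive"
    if abs(abs(d) - abs(a)) <= tol:
        return "complete dominance"
    return "overdominance" if abs(d) > abs(a) else "partial dominance"


# Hypothetical means for genotypes A1/A1, A1/A2 and A2/A2.
for means in [(20.0, 15.0, 10.0), (20.0, 20.0, 10.0), (20.0, 23.0, 10.0)]:
    a, d = additive_dominance(*means)
    print(means, a, d, gene_action(a, d))
```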
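24 Estimation of additive and dominance effects
In an F2, let $m_2$, $m_1$ and $m_0$ be the mean trait values of marker genotypes MM, Mm and mm, and let a* and d* be the estimated additive and dominance effects. Using the conditional probabilities of QTL genotypes given marker genotypes:
Additive effect: $a^* = (m_2 - m_0)/2 = a(1-2c)$
Dominance effect: $d^* = m_1 - (m_2 + m_0)/2 = d(1-2c)^2$
Both estimates are biased towards zero unless the marker and the QTL are tightly linked (c close to 0). A single marker therefore cannot separate the size of the QTL effect from its recombination fraction with the marker.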
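These two expressions can be checked numerically. The Python sketch below works under stated assumptions: an F2 from MMQQ x mmqq, QTL genotypic values QQ = μ + a, Qq = μ + d and qq = μ - a, and hypothetical function names. It forms the expected mean of each marker class from the conditional QTL probabilities. The estimates built from those means should shrink by factors of $(1-2c)$ and $(1-2c)^2$.

```python
def f2_marker_means(mu, a, d, c):
    """Expected trait mean of each F2 marker class (MM, Mm, mm).

    QTL genotypic values: QQ = mu + a, Qq = mu + d, qq = mu - a;
    c is the marker-QTL recombination fraction.
    """
    # (Pr(QQ), Pr(Qq), Pr(qq)) given each marker genotype.
    cond = {
        "MM": ((1 - c) ** 2, 2 * c * (1 - c), c ** 2),
        "Mm": (c * (1 - c), (1 - c) ** 2 + c ** 2, c * (1 - c)),
        "mm": (c ** 2, 2 * c * (1 - c), (1 - c) ** 2),
    }
    values = (mu + a, mu + d, mu - a)
    return {mk: sum(p * v for p, v in zip(ps, values)) for mk, ps in cond.items()}


def marker_effect_estimates(means):
    """a* and d* computed from the three marker-class means."""
    a_star = (means["MM"] - means["mm"]) / 2
    d_star = means["Mm"] - (means["MM"] + means["mm"]) / 2
    return a_star, d_star


a, d, c = 2.0, 1.0, 0.1
a_star, d_star = marker_effect_estimates(f2_marker_means(10.0, a, d, c))
print(a_star, a * (1 - 2 * c))        # additive estimate shrinks by (1 - 2c)
print(d_star, d * (1 - 2 * c) ** 2)   # dominance estimate shrinks by (1 - 2c)^2
```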
25 Linear models for QTL detection
These models use the linear relationship between the apparent effect of a marker on a quantitative character and the actual effects of all QTLs linked to that marker:
$y_{mk} = \mu + b_m + e_{mk}$
where $y_{mk}$ is the trait value of the kth individual with marker genotype m, and $b_m$ is the effect of marker genotype m on the trait value. Differences in the distance between the QTL and the markers change the terms of this relationship.
- Detection: a QTL is linked to the marker if at least one of the $b_m$ is significantly different from zero.
- Estimation (QTL effect and position): the $b_m$ have to be related to the QTL effects and map position.
26 Detecting epistasis
One major advantage of linear models is their flexibility. To test for epistasis between two QTLs, use an ANOVA with an interaction term:
$y = \mu + a_i + b_k + d_{ik} + e$
where $a_i$ is the effect of marker genotype i at the first marker set (which can contain more than one locus), $b_k$ is the effect of marker genotype k at the second marker set, and $d_{ik}$ is the interaction between genotype i in the first marker set and genotype k in the second.
- At least one $a_i$ significantly different from 0: a QTL is linked to the first marker set.
- At least one $b_k$ significantly different from 0: a QTL is linked to the second marker set.
- At least one $d_{ik}$ significantly different from 0: there are interactions between the QTLs in sets 1 and 2.
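For a balanced design (the same number of individuals in every genotype combination), the interaction test can be computed directly from cell, row and column means. The Python sketch below assumes such a design. The function name `interaction_f` and the data are hypothetical. Unbalanced data would need a general linear-model routine instead, for example `ols` with `anova_lm` in statsmodels.

```python
from statistics import fmean


def interaction_f(cells):
    """F statistic for the interaction term d_ik in y = mu + a_i + b_k + d_ik + e.

    cells[(i, k)] lists trait values of individuals with genotype i at the first
    marker set and k at the second. The design must be balanced (same number of
    individuals in every cell) for these sum-of-squares formulas to hold.
    """
    rows = sorted({i for i, _ in cells})
    cols = sorted({k for _, k in cells})
    reps = len(next(iter(cells.values())))
    cell_mean = {key: fmean(ys) for key, ys in cells.items()}
    grand = fmean(cell_mean.values())
    row_mean = {i: fmean(cell_mean[(i, k)] for k in cols) for i in rows}
    col_mean = {k: fmean(cell_mean[(i, k)] for i in rows) for k in cols}
    # Interaction: departure of each cell mean from the additive (a_i + b_k) prediction.
    ss_int = reps * sum((cell_mean[(i, k)] - row_mean[i] - col_mean[k] + grand) ** 2
                        for i in rows for k in cols)
    # Error: spread of individuals around their own cell mean.
    ss_err = sum((y - cell_mean[key]) ** 2 for key, ys in cells.items() for y in ys)
    df_int = (len(rows) - 1) * (len(cols) - 1)
    df_err = len(rows) * len(cols) * (reps - 1)
    return (ss_int / df_int) / (ss_err / df_err), df_int, df_err


# Hypothetical 2 x 2 layout: the trait is raised only in genotype combination (a, b).
cells = {("A", "B"): [10.0, 12.0], ("A", "b"): [10.0, 12.0],
         ("a", "B"): [10.0, 12.0], ("a", "b"): [18.0, 20.0]}
print(interaction_f(cells))
```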
27 Interval mapping and marker regression
28 Problems with single-marker mapping using ANOVA
If marker density is high, ANOVA on individual marker genotypes (single-marker analysis or single-marker regression) is effective. It has three important weaknesses:
1. It does not give separate estimates of QTL location and QTL effect.
2. Individuals whose genotypes are missing at the marker must be discarded.
3. When markers are sparse, the QTL may be quite far from all markers, so the power of QTL detection decreases.
29 Interval mapping
Interval mapping uses probability estimates for the QTL genotypes in the intervals between markers. Move the putative QTL position in 2 cM steps from M1 to M2 and plot the profile of the F value. The peak of the profile is the best estimate of the QTL position.
[Figure: F-value profile plotted against testing position across markers M1 to M5.]
30 Interval mapping implementation
Carry out the QTL scan stepwise: once a significant QTL has been identified, other markers are tested for their ability to explain the residual variation. Known QTLs are said to be fixed, or are used as cofactors, in the regression.
[Figure: F-ratio profile from interval mapping by regression (QTL Express), with significant peaks marked by asterisks.]
31 Interval mapping with a regression approach
Consider a marker interval M1-M2, and assume that a QTL is located at a particular position between the two markers ($r_1$ and θ are fixed). With response variable $y_i$ and explanatory variable $x_i^*$, a regression model is constructed. The phenotypic value of individual i affected by a QTL can be expressed as
$y_i = \mu + a^* x_i^* + e_i, \quad i = 1, \dots, n$ (latent model)
where:
- μ is the overall mean
- $x_i^*$ is the indicator variable for the QTL genotype ($x_i^* = 1$ for Qq, 0 for qq)
- $a^*$ is the additive effect of the putative QTL on the trait
- $e_i$ is the residual error, $e_i \sim N(0, \sigma^2)$
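Because the QTL genotype $x_i^*$ is not observed, one common regression implementation (Haley-Knott regression) replaces it by its expected value given the flanking marker genotypes. The Python sketch below applies this idea to a backcross under several assumptions:
- there is no crossover interference (Haldane map function);
- the two flanking markers are the only information used;
- the data are simulated and hypothetical.

It scans the interval in 2 cM steps and records $a^*$ and F at each position. All function names are made up, and this is not the QTL Express implementation.

```python
import math
import random


def haldane(d_cm):
    """Recombination fraction for a map distance in cM (Haldane, no interference)."""
    return 0.5 * (1 - math.exp(-2 * d_cm / 100))


def prob_qtl_het(g1, g2, r1, r2):
    """Pr(QTL genotype = Qq | flanking marker genotypes) in a backcross.

    g1, g2: 1 if the individual is Mm at marker M1 / M2, 0 if mm.
    r1, r2: recombination fractions M1-QTL and QTL-M2.
    """
    r = r1 + r2 - 2 * r1 * r2  # M1-M2 recombination fraction (no interference)
    if g1 == 1 and g2 == 1:
        return (1 - r1) * (1 - r2) / (1 - r)
    if g1 == 1 and g2 == 0:
        return (1 - r1) * r2 / r
    if g1 == 0 and g2 == 1:
        return r1 * (1 - r2) / r
    return r1 * r2 / (1 - r)


def regress(x, y):
    """Least-squares fit of y = mu + a*x + e; returns (a_hat, F on 1 and n-2 d.f.)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    a_hat = sxy / sxx
    ss_model = a_hat * sxy
    ss_resid = syy - ss_model
    return a_hat, ss_model / (ss_resid / (n - 2))


def scan_interval(g1, g2, y, length_cm, step_cm=2.0):
    """Regression interval mapping across one marker interval: list of (pos, a_hat, F)."""
    profile = []
    for step in range(int(length_cm / step_cm) + 1):
        pos = step * step_cm
        r1, r2 = haldane(pos), haldane(length_cm - pos)
        x = [prob_qtl_het(m1, m2, r1, r2) for m1, m2 in zip(g1, g2)]
        a_hat, f = regress(x, y)
        profile.append((pos, a_hat, f))
    return profile


def simulate_backcross(n, length_cm, qtl_cm, effect, seed=1):
    """Hypothetical backcross data: QTL at qtl_cm between markers at 0 and length_cm."""
    rng = random.Random(seed)
    r1, r2 = haldane(qtl_cm), haldane(length_cm - qtl_cm)
    g1, g2, y = [], [], []
    for _ in range(n):
        m1 = rng.randint(0, 1)                    # marker M1: Mm (1) or mm (0)
        q = m1 if rng.random() >= r1 else 1 - m1  # crossover between M1 and QTL?
        m2 = q if rng.random() >= r2 else 1 - q   # crossover between QTL and M2?
        g1.append(m1)
        g2.append(m2)
        y.append(10.0 + effect * q + rng.gauss(0.0, 1.0))
    return g1, g2, y


g1, g2, y = simulate_backcross(n=500, length_cm=20, qtl_cm=10, effect=2.0)
profile = scan_interval(g1, g2, y, length_cm=20)
best_pos, best_a, best_f = max(profile, key=lambda row: row[2])
print(f"peak at {best_pos} cM, a* = {best_a:.2f}, F = {best_f:.1f}")
```

At the marker positions (0 and 20 cM) the expected genotype reduces to the marker genotype itself, so the ends of the profile equal single-marker regressions.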
32 Advantages and disadvantages of interval mapping
Advantages:
- the position of the QTL can be inferred from a support interval
- the estimated position and effects of the QTL tend to be asymptotically unbiased if there is only one segregating QTL on a chromosome
- the method requires fewer individuals
Disadvantages:
- it is not an interval test: even when there is no QTL within an interval, the likelihood profile on the interval can still exceed the threshold if there is a QTL nearby
- if there is more than one QTL on a chromosome, the test statistic at the position being tested will be affected by all of them, and the estimated positions and effects can be biased
- it is not efficient to use only two markers at a time for testing
33 Flanking methods and Maximum likelihood
34 Flanking-marker methods have been the most popular analysis techniques in recent years because of their accuracy and the level of characterisation of the putative QTL: they combine detection with estimation of QTL effects and position.
Two basic techniques:
- maximum likelihood
- maximum likelihood estimation through regression
Three methods for estimating likelihood:
- single-marker maximum likelihood (least power)
- flanking-marker maximum likelihood (most versatile)
- order-restricted interval mapping (most power)
35 LOD score
Estimating the QTL position (θ) with likelihood maps:
- view θ as a fixed parameter, assuming the QTL is located at a particular position, or
- view θ as a variable to be estimated (derive the log-likelihood equation for the MLE of θ).
$L_0 / L_A$ is the ratio of the likelihood of the null hypothesis (no QTL in the marker interval) to the likelihood of the alternative hypothesis (QTL present). The LOD (log of the odds) score is $\mathrm{LOD} = -\log_{10}(L_0 / L_A) = \log_{10}(L_A / L_0)$, so large positive values support the presence of a QTL.
In each method a likelihood map is produced.
[Figure: LOD profile along the chromosome, showing the significance threshold, the estimated QTL location and its support interval.]
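For regression-based models with normally distributed errors, the likelihoods need not be computed explicitly. With maximum-likelihood error variances (RSS/n), $L_A / L_0 = (RSS_0 / RSS_A)^{n/2}$. The Python sketch below, with hypothetical function names and residual sums of squares, converts between residual sums of squares, LOD and the likelihood-ratio (LR) statistic $2\ln(L_A/L_0)$.

```python
import math


def lod_from_rss(rss_null, rss_qtl, n):
    """LOD = log10(L_A / L_0) for two normal-error regression models.

    With maximum-likelihood error variances (RSS / n), the likelihood ratio
    reduces to (RSS_0 / RSS_A) ** (n / 2).
    """
    return (n / 2) * math.log10(rss_null / rss_qtl)


def lr_from_lod(lod):
    """Likelihood-ratio statistic LR = 2 ln(L_A / L_0) = 2 ln(10) * LOD."""
    return 2 * math.log(10) * lod


# Hypothetical fits: 100 individuals, RSS without and with a QTL at one position.
lod = lod_from_rss(rss_null=120.0, rss_qtl=100.0, n=100)
print(f"LOD = {lod:.2f}, LR = {lr_from_lod(lod):.2f}")
```

Because LR is about 4.61 times LOD, a LOD threshold of 3 corresponds to an LR of about 13.8.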
36 Composite interval mapping (CIM)
CIM uses multiple markers as additional factors (marker cofactors) alongside the interval being mapped (markers i-1, i, i+1, i+2). Five different types of markers are considered for the regression model, depending on the characteristics of the chromosome region:
- markers surrounding the QTL of interest
- linked and unlinked markers within the QTL region
- linked and unlinked markers outside the QTL region
Method:
- predict the QTL genotype from the markers every x cM
- carry out a likelihood-ratio (LR) test for a QTL effect every x cM
CIM combines ML estimation and multiple regression methods.
37 Permutation testing to determine experiment-wide significance thresholds
The multiple-testing problem: how often are random QTL effects of a certain magnitude detected in similar datasets?
Method:
- create a large number of random empirical datasets: take your marker data and randomly reassign the phenotypes to the marker genotypes
- repeat the QTL detection process on each dataset
- record the highest LR produced for a random QTL anywhere in the map
- repeat the whole process more than 500 times
- threshold = the magnitude of the lowest random QTL in the top 5% of these maximum LR results (i.e. the 95th percentile of the random maxima)
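The permutation procedure can be sketched in Python as follows. Three choices differ from the slide or are assumptions:
- the test statistic is the single-marker ANOVA F rather than a genome-wide LR (for a fixed number of degrees of freedom, F and LR rank scans in the same order);
- the threshold is the nearest-rank 95th percentile of the permutation maxima;
- the function names and genotype data are hypothetical.

```python
import random
from statistics import fmean


def max_marker_f(genotypes, y):
    """Largest single-marker ANOVA F statistic over all markers.

    genotypes[j][i] is the genotype of individual i at marker j.
    """
    n = len(y)
    grand = fmean(y)
    ss_t = sum((yi - grand) ** 2 for yi in y)
    best = 0.0
    for marker in genotypes:
        groups = {}
        for g, yi in zip(marker, y):
            groups.setdefault(g, []).append(yi)
        k = len(groups)
        if k < 2:
            continue  # monomorphic marker: nothing to test
        means = {g: fmean(ys) for g, ys in groups.items()}
        ss_r = sum((yi - means[g]) ** 2 for g, yi in zip(marker, y))
        f = ((ss_t - ss_r) / (k - 1)) / (ss_r / (n - k))
        best = max(best, f)
    return best


def permutation_threshold(genotypes, y, n_perm=1000, alpha=0.05, seed=0):
    """Experiment-wide threshold from the maxima of scans on permuted phenotypes."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)  # reassign phenotypes at random to the marker genotypes
        maxima.append(max_marker_f(genotypes, shuffled))
    maxima.sort()
    rank = round((1 - alpha) * n_perm)  # nearest-rank (1 - alpha) quantile
    return maxima[rank - 1], maxima


# Hypothetical data: 100 backcross individuals, 5 markers, a QTL at marker 0.
rng = random.Random(42)
genotypes = [[rng.randint(0, 1) for _ in range(100)] for _ in range(5)]
y = [2.0 * g + rng.gauss(0.0, 1.0) for g in genotypes[0]]
threshold, maxima = permutation_threshold(genotypes, y, n_perm=1000)
observed = max_marker_f(genotypes, y)
print(f"observed max F = {observed:.1f}, 5% threshold = {threshold:.2f}")
```

Recording only the maximum of each permuted scan is what makes the threshold experiment-wide rather than per-marker.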
38 Multiple interval mapping
MIM uses multiple marker intervals simultaneously and aims to map multiple QTLs in a single step.
Method:
- build regression models that include all QTLs (detected first by CIM)
- use an information criterion (IC) to evaluate alternative models
This allows simultaneous detection and estimation of additive, dominance and epistatic effects.
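As one example of an information criterion, the Bayesian information criterion (BIC) penalises each extra parameter by ln(n). The Python sketch below assumes normal-error regression models and uses BIC; other criteria can be used in MIM. The function names, residual sums of squares and parameter counts are hypothetical. The sketch picks the candidate model with the lowest BIC.

```python
import math


def bic(rss, n, n_params):
    """Bayesian information criterion for a normal-error regression model.

    Uses n*ln(RSS/n) + n_params*ln(n); lower values indicate a better
    trade-off between fit and model size.
    """
    return n * math.log(rss / n) + n_params * math.log(n)


def choose_model(models, n):
    """Return the label of the model with the lowest BIC.

    models maps a label to (residual sum of squares, number of regression
    parameters, i.e. the mean plus all QTL effects in the model).
    """
    return min(models, key=lambda label: bic(models[label][0], n, models[label][1]))


# Hypothetical fits of three candidate MIM models to the same 200 individuals.
candidates = {
    "1 QTL": (180.0, 2),              # mean + one additive effect
    "2 QTL": (150.0, 3),              # mean + two additive effects
    "2 QTL + epistasis": (149.0, 4),  # adds an a x a interaction term
}
print(choose_model(candidates, n=200))  # the extra epistatic term does not pay for itself
```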
39 Some examples of the final output
40 Genetics and genomics of post-harvest senescence in broccoli
Vicky Buchanan-Wollaston and Dave Pink (Warwick HRI)
[Figure: 2010 broccoli linkage map constructed with JoinMap; 211 loci (189 SSR, 22 AFLP); linkage groups 1 to 9.]
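41 QTLs for senescence traits in broccoli
Two major QTLs for time to yellowing were confirmed on the 2010 broccoli map. REML estimate: 64.4% of line-mean variation is genetic.
[Table, only partly recoverable: chromosome, position (cM), LOD and p-value for each QTL; one QTL on chromosome 7 explains 30.6% of the variation; a LOD value of 3.8 is reported. QTLs detected with MapQTL; significance from a permutation test with 10,000 iterations.]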
More informationStep-by-Step Guide to Bi-Parental Linkage Mapping WHITE PAPER
Step-by-Step Guide to Bi-Parental Linkage Mapping WHITE PAPER JMP Genomics Step-by-Step Guide to Bi-Parental Linkage Mapping Introduction JMP Genomics offers several tools for the creation of linkage maps
More informationNon-Parametric Tests (I)
Lecture 5: Non-Parametric Tests (I) KimHuat LIM lim@stats.ox.ac.uk http://www.stats.ox.ac.uk/~lim/teaching.html Slide 1 5.1 Outline (i) Overview of Distribution-Free Tests (ii) Median Test for Two Independent
More informationOne-Way Analysis of Variance (ANOVA) Example Problem
One-Way Analysis of Variance (ANOVA) Example Problem Introduction Analysis of Variance (ANOVA) is a hypothesis-testing technique used to test the equality of two or more population (or treatment) means
More informationGene Mapping Techniques
Gene Mapping Techniques OBJECTIVES By the end of this session the student should be able to: Define genetic linkage and recombinant frequency State how genetic distance may be estimated State how restriction
More information2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
More informationOverview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)
Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and
More informationCCR Biology - Chapter 7 Practice Test - Summer 2012
Name: Class: Date: CCR Biology - Chapter 7 Practice Test - Summer 2012 Multiple Choice Identify the choice that best completes the statement or answers the question. 1. A person who has a disorder caused
More informationPresentation by: Ahmad Alsahaf. Research collaborator at the Hydroinformatics lab - Politecnico di Milano MSc in Automation and Control Engineering
Johann Bernoulli Institute for Mathematics and Computer Science, University of Groningen 9-October 2015 Presentation by: Ahmad Alsahaf Research collaborator at the Hydroinformatics lab - Politecnico di
More informationMAGIC design. and other topics. Karl Broman. Biostatistics & Medical Informatics University of Wisconsin Madison
MAGIC design and other topics Karl Broman Biostatistics & Medical Informatics University of Wisconsin Madison biostat.wisc.edu/ kbroman github.com/kbroman kbroman.wordpress.com @kwbroman CC founders compgen.unc.edu
More informationLesson 1: Comparison of Population Means Part c: Comparison of Two- Means
Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis
More informationMendelian and Non-Mendelian Heredity Grade Ten
Ohio Standards Connection: Life Sciences Benchmark C Explain the genetic mechanisms and molecular basis of inheritance. Indicator 6 Explain that a unit of hereditary information is called a gene, and genes
More informationMULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance
More informationBasic Principles of Forensic Molecular Biology and Genetics. Population Genetics
Basic Principles of Forensic Molecular Biology and Genetics Population Genetics Significance of a Match What is the significance of: a fiber match? a hair match? a glass match? a DNA match? Meaning of
More informationGenome 361: Fundamentals of Genetics and Genomics Fall 2015
Genome 361: Fundamentals of Genetics and Genomics Fall 2015 Instructor Frances Cheong, kcheong3@uw.edu Teaching Assistants Michael Bradshaw, mjb34@uw.edu Colby Samstag, csamstag@uw.edu Emily Youngblom,
More informationUsing Excel for inferential statistics
FACT SHEET Using Excel for inferential statistics Introduction When you collect data, you expect a certain amount of variation, just caused by chance. A wide variety of statistical tests can be applied
More informationThis chapter will demonstrate how to perform multiple linear regression with IBM SPSS
CHAPTER 7B Multiple Regression: Statistical Methods Using IBM SPSS This chapter will demonstrate how to perform multiple linear regression with IBM SPSS first using the standard method and then using the
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationTwo-locus population genetics
Two-locus population genetics Introduction So far in this course we ve dealt only with variation at a single locus. There are obviously many traits that are governed by more than a single locus in whose
More informationName: Class: Date: ID: A
Name: Class: _ Date: _ Meiosis Quiz 1. (1 point) A kidney cell is an example of which type of cell? a. sex cell b. germ cell c. somatic cell d. haploid cell 2. (1 point) How many chromosomes are in a human
More informationCombining Data from Different Genotyping Platforms. Gonçalo Abecasis Center for Statistical Genetics University of Michigan
Combining Data from Different Genotyping Platforms Gonçalo Abecasis Center for Statistical Genetics University of Michigan The Challenge Detecting small effects requires very large sample sizes Combined
More informationOne-Way ANOVA using SPSS 11.0. SPSS ANOVA procedures found in the Compare Means analyses. Specifically, we demonstrate
1 One-Way ANOVA using SPSS 11.0 This section covers steps for testing the difference between three or more group means using the SPSS ANOVA procedures found in the Compare Means analyses. Specifically,
More information12: Analysis of Variance. Introduction
1: Analysis of Variance Introduction EDA Hypothesis Test Introduction In Chapter 8 and again in Chapter 11 we compared means from two independent groups. In this chapter we extend the procedure to consider
More informationTHE GENETIC ARCHITECTURE
Annu. Rev. Genet. 2001. 35:303 39 Copyright c 2001 by Annual Reviews. All rights reserved THE GENETIC ARCHITECTURE OF QUANTITATIVE TRAITS TrudyF.C.Mackay Department of Genetics, Box 7614, North Carolina
More informationDNA MARKERS FOR ASEASONALITY AND MILK PRODUCTION IN SHEEP. R. G. Mateescu and M.L. Thonney
DNA MARKERS FOR ASEASONALITY AND MILK PRODUCTION IN SHEEP Introduction R. G. Mateescu and M.L. Thonney Department of Animal Science Cornell University Ithaca, New York Knowledge about genetic markers linked
More informationCourse Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics
Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This
More informationRandomized Block Analysis of Variance
Chapter 565 Randomized Block Analysis of Variance Introduction This module analyzes a randomized block analysis of variance with up to two treatment factors and their interaction. It provides tables of
More informationChapter 5 Analysis of variance SPSS Analysis of variance
Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,
More informationInternational Statistical Institute, 56th Session, 2007: Phil Everson
Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction
More informationService courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
More informationHow To Run Statistical Tests in Excel
How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting
More informationData Analysis Tools. Tools for Summarizing Data
Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool
More informationMultiple Linear Regression
Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is
More informationRecall this chart that showed how most of our course would be organized:
Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical
More informationTutorial 5: Hypothesis Testing
Tutorial 5: Hypothesis Testing Rob Nicholls nicholls@mrc-lmb.cam.ac.uk MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................
More informationChapter 9. Two-Sample Tests. Effect Sizes and Power Paired t Test Calculation
Chapter 9 Two-Sample Tests Paired t Test (Correlated Groups t Test) Effect Sizes and Power Paired t Test Calculation Summary Independent t Test Chapter 9 Homework Power and Two-Sample Tests: Paired Versus
More informationOutline. Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test
The t-test Outline Definitions Descriptive vs. Inferential Statistics The t-test - One-sample t-test - Dependent (related) groups t-test - Independent (unrelated) groups t-test Comparing means Correlation
More informationDeterministic computer simulations were performed to evaluate the effect of maternallytransmitted
Supporting Information 3. Host-parasite simulations Deterministic computer simulations were performed to evaluate the effect of maternallytransmitted parasites on the evolution of sex. Briefly, the simulations
More information