From Reads to Differentially Expressed Genes. The statistics of differential gene expression analysis using RNA-seq data


 Adam Porter
Transcription
1 From Reads to Differentially Expressed Genes: The statistics of differential gene expression analysis using RNA-seq data
2 experimental design, data collection, modeling, statistical testing
3 experimental design (biological heterogeneity; replicated vs unreplicated; biological vs technical replicates; pooling; multiplexing), data collection, modeling, statistical testing
4 Experimental Design: Unreplicated
Definition: One biological replicate per treatment group.
Pros: Cheap, and can be informative.
Cons: We can only make inferences about the particular biological individuals, not about the treatment groups.
Applications: Pilot studies (although not to assess variation!); non-model organism runs focused on reference transcriptome assembly.
5 Experimental Design: Replicated
Definition: Multiple biological replicates per treatment group.
Pros: We can make inferences about the treatment groups, and we can be more confident about our inferences.
Cons: More expensive.
Applications: Differential expression (and alternative splicing) analysis to make inferences about treatment groups; reliable network inference.
6 Experimental Design: Biological vs Technical Replicates
Biological replicates are measurements from different individuals; technical replicates are measurements from one individual with some technical steps repeated. Usually biological variance > technical variance, so biological replicates are more useful. Again, they also allow us to make inferences about the treatment groups.
7 Experimental Design: Pooling
Definition: Combining multiple samples (individuals, tissues, etc.) during preparation into a single sample, assayed together.
Pros: Entirely necessary in cases in which there isn't enough sample per individual for sequencing. Unreplicated, pooled samples could also decrease bias.
Cons: All ability to measure variability between individuals is lost. A single outlier could bias an entire sample.
Applications: Small or difficult-to-collect samples; possibly to reduce bias in unreplicated designs.
8 Experimental Design: Multiplexing
Definition: Attach a unique nucleotide sequence (barcode) to each sample or replicate, combine the barcoded samples into one pool, then spread the pool across multiple lanes and sequence.
Pros: Removes lane-to-lane technical variation as a source of confounding, because every sample is sequenced in every lane.
Cons: Shorter reads, slightly higher cost.
Applications: Generally recommended in all differential expression studies.
9 experimental design, data collection (multireads; genomic vs transcriptomic mapping), modeling, statistical testing
10 Multireads
A multiread is a read that maps equally well to two or more reference sequences. By default, BWA places such a read at random, uniformly across all equally good reference positions.
read:         AGTCGACTAGCTATTAGCATG
transcript 1: AGTCGACTAGCTATTAGCATG
transcript 2: AGTCGACTAGCTATTAGCATG
11 Genomic Mapping (figure labels: mut, wt)
12 Genomic Mapping
Advantages:
- Less likely to have multireads across different isoforms.
- One can get a sense of the coverage across exons.
Disadvantages:
- It's a bit involved to estimate isoform expression.
- Needs an (annotated) genome, so it is not great for non-model organisms.
13 Transcriptomic Mapping (figure labels: mut and wt for isoform 1; mut and wt for isoform 2)
14 Transcriptomic Mapping
Advantages:
- Transcript-level expression.
- Slightly easier to do.
Disadvantages:
- Multiple isoforms can share an exon, so we can get multireads.
- Requires annotation to aggregate to gene-level counts.
15 Where does RNA-seq data come from? (figure labels: mut 12, wt 21)
16 Where does RNA-seq data come from? Differential isoform expression? (figure labels: mut 12, wt 21)
17 Genomic Mapping
18 Count data (unreplicated) (table columns: gene, wt, mut)
19 Count data (replicated) (table columns: gene, wt1, wt2, mut1, mut)
20 Normalization: Why normalize?
Suppose there are two lanes of data, with twice as many sequences in lane A as in lane B. Without normalization, every gene in lane A will appear to be upregulated.
21 Normalization Techniques: RPKM
Reads Per Kilobase per Million mapped reads (RPKM) is a common normalization procedure:
$$\mathrm{RPKM} = \frac{\text{reads mapped to gene}}{\text{total reads mapped in lane (in millions)} \times \text{gene length (in kilobases)}}$$
However, a few highly expressed genes can dominate total lane counts. Consequently, changes in highly expressed genes can disproportionately affect the scaling factor.
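As a small worked illustration of the RPKM formula on this slide, the following Python sketch computes RPKM for a single gene. The function name `rpkm`, its parameter names, and the example numbers are hypothetical and chosen only for illustration; they do not come from the slides.

```python
def rpkm(gene_reads, total_mapped_reads, gene_length_bp):
    """Reads per kilobase of gene per million mapped reads."""
    millions_mapped = total_mapped_reads / 1e6   # lane total, in millions of reads
    kilobases = gene_length_bp / 1e3             # gene length, in kilobases
    return gene_reads / (millions_mapped * kilobases)


# Hypothetical numbers: 500 reads on a 2,000 bp gene,
# in a lane with 10 million mapped reads.
example = rpkm(gene_reads=500, total_mapped_reads=10_000_000, gene_length_bp=2_000)
print(example)  # 500 / (10 * 2) = 25.0
```

Because the lane total sits in the denominator, any gene that contributes a large share of that total changes the RPKM of every other gene, which is the weakness the slide points out.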
22 Normalization Techniques: RPKM
For example, in one lane of data, the top 2% of genes make up 30% of total lane counts. These 411 genes (out of 20,545) dominate the lane. A constant scaling factor based on total lane count overemphasizes the expression of these genes.
23 Normalization Techniques: Quantile-based techniques
Idea: rescale the empirical distribution to a theoretical one by ordering both and making the nth smallest value of the empirical distribution equal to the nth smallest value of the theoretical distribution. Bullard et al. (2010) showed that these methods lead to more accurate differential expression results when verified with qPCR.
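The rank-matching idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the procedure evaluated by Bullard et al.; the function name `quantile_map` and the example values are hypothetical, and ties are broken arbitrarily by sort order.

```python
def quantile_map(values, reference):
    """Replace the n-th smallest entry of `values` with the
    n-th smallest entry of `reference` (same length required)."""
    if len(values) != len(reference):
        raise ValueError("values and reference must have the same length")
    ref_sorted = sorted(reference)
    # Indices of `values`, ordered from smallest to largest value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [None] * len(values)
    for rank, idx in enumerate(order):
        out[idx] = ref_sorted[rank]
    return out


# Hypothetical lane counts and a hypothetical reference distribution.
lane = [5, 100, 20, 1]
reference = [2.0, 8.0, 30.0, 90.0]
normalized = quantile_map(lane, reference)
print(normalized)  # [8.0, 90.0, 30.0, 2.0]
```

Each gene keeps its rank within the lane; only the values attached to those ranks change.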
24 Normalization Techniques: DESeq's Approach
Size factors are estimated for each column (sample) of the data and are then used directly in the model-fitting step. First, a pseudo-reference is created by taking, for each row (gene), the geometric mean of its counts across samples. Then, for each sample, the size factor is the median over genes of the ratio of that sample's count to the pseudo-reference value.
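The two steps described on this slide (geometric-mean pseudo-reference, then median of ratios) can be sketched as follows. This is an illustrative sketch, not DESeq's own code: the function name `size_factors` is hypothetical, and dropping genes with a zero count in any sample is an assumption made here so that the geometric mean stays positive.

```python
import math
from statistics import median


def size_factors(counts):
    """counts: list of rows (genes); each row is a list of per-sample counts.
    Returns one size factor per sample (column)."""
    n_samples = len(counts[0])
    # Assumption: skip genes with any zero count (their geometric mean is 0).
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Pseudo-reference: geometric mean of each gene across samples.
    pseudo_ref = [
        math.exp(sum(math.log(c) for c in row) / n_samples) for row in usable
    ]
    # Size factor of a sample: median ratio of its counts to the pseudo-reference.
    return [
        median(row[j] / ref for row, ref in zip(usable, pseudo_ref))
        for j in range(n_samples)
    ]


# Hypothetical data: sample 2 was sequenced twice as deeply as sample 1.
counts = [[10, 20], [100, 200], [5, 10], [0, 3]]
factors = size_factors(counts)
print(factors)  # roughly [0.707, 1.414]; their ratio is 2
```

Because the median is used rather than the total, a handful of very highly expressed genes has little influence on the size factor, in contrast to scaling by total lane count.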
25 Normalization Techniques
26 experimental design, data collection, modeling (Poisson vs Negative Binomial models; assessing model assumptions), statistical testing
27 Modeling RNA-seq data: Example: Poisson models. Image of human brain from Anne Brogdon,
28 Modeling RNA-seq data: Models for Overdispersion
DESeq and edgeR from Bioconductor both use a Negative Binomial model, which models the mean and variance separately.
29 Modeling RNA-seq data: Models for Overdispersion
DESeq and edgeR from Bioconductor both use a Negative Binomial model, which models the mean and variance separately. Both packages have ways of assessing model fit. Use them!
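To make "models the mean and variance separately" concrete, here is one common way of writing the two variance functions. The slides do not give formulas, so the notation (count $Y$, mean $\mu$, dispersion $\alpha$) is an assumption introduced here for illustration.

```latex
\begin{align}
\text{Poisson:} \quad & \operatorname{Var}(Y) = \mu \\
\text{Negative Binomial:} \quad & \operatorname{Var}(Y) = \mu + \alpha \mu^{2}, \qquad \alpha \ge 0
\end{align}
```

Under the Poisson model the variance is forced to equal the mean. The extra term $\alpha \mu^{2}$ lets the variance exceed the mean (overdispersion), and setting $\alpha = 0$ recovers the Poisson case.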
30 Modeling RNA-seq data: Consistency between edgeR and DESeq. Using data from Mariano, et al, 2008
31 Modeling RNA-seq data: Models for Overdispersion
Why the difference? DESeq allows for a more local dispersion parameter for similar genes, whereas edgeR has a fixed dispersion parameter.* Anders and Huber,
Figure: the orange dashed line is the variance estimated by edgeR, the purple line is the variance from Poisson, and the orange line is the variance estimated by DESeq.
*New versions of edgeR actually allow local fits, and new versions of DESeq have a fixed dispersion parameter! I am simplifying because this is how it is presented in the DESeq paper.
32 experimental design, data collection, modeling, statistical testing (approaches to testing; p-values; FWER; FDR; q-values)
33 Testing Hypotheses
37 Why does this matter?
38 Multiple Testing
With n samples and p genes, we're doing p simultaneous tests: H1, H2, H3, ..., Hp.
39 Multiple Testing
20,000 simultaneous t-tests on random normal data from the same distribution. There are 1,009 green points (false positives), making up about 0.05 of the comparisons (at α = 0.05).
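The experiment on this slide is easy to reproduce in spirit. The sketch below is an illustration under stated assumptions, not the code behind the slide: it substitutes a two-sample z-test with known unit variance for the t-test so that only the Python standard library is needed, and the group size of 10, the seed, and the function name `count_false_positives` are hypothetical choices.

```python
import random
from statistics import NormalDist


def count_false_positives(n_tests=20_000, n_per_group=10, alpha=0.05, seed=1):
    """Run n_tests two-group comparisons where both groups come from N(0, 1),
    and count how many are (falsely) declared significant at level alpha."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Standard error of the difference of two means of n_per_group N(0, 1) draws.
    se = (2.0 / n_per_group) ** 0.5
    hits = 0
    for _ in range(n_tests):
        mean_a = sum(rng.gauss(0.0, 1.0) for _ in range(n_per_group)) / n_per_group
        mean_b = sum(rng.gauss(0.0, 1.0) for _ in range(n_per_group)) / n_per_group
        z = (mean_a - mean_b) / se
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))  # two-sided p-value
        if p_value < alpha:
            hits += 1
    return hits


hits = count_false_positives()
print(hits, hits / 20_000)  # the fraction should be close to alpha = 0.05
```

Every null hypothesis here is true, so each "significant" result is a false positive, and roughly α of the tests end up in that category no matter how many tests are run.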
40 Multiple Testing: Familywise Error Rate

                         number declared non-significant   number declared significant   total
true null hypotheses     U                                 V                             m0
false null hypotheses    T                                 S                             m - m0
total                    m - R                             R                             m

$$\mathrm{FWER} = P(V \ge 1) = 1 - P(V = 0)$$
41 Multiple Testing: Bonferroni Correction
One way of controlling FWER: test each hypothesis at level α/m, where m is the total number of tests (the number of genes, not the number of samples). Problem: very conservative.
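A short numeric sketch shows why a correction is needed and what Bonferroni does. It assumes the tests are independent, which is an assumption made here for the closed-form expression and not a claim from the slides; the function names and the symbol m for the number of tests are illustrative.

```python
def fwer_independent(per_test_alpha, m):
    """P(at least one false positive) among m independent true-null tests,
    each run at level per_test_alpha: 1 - P(V = 0)."""
    return 1.0 - (1.0 - per_test_alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-test level that keeps the familywise error rate at or below alpha."""
    return alpha / m


m = 20_000          # number of simultaneous tests (genes)
alpha = 0.05

uncorrected = fwer_independent(alpha, m)
corrected = fwer_independent(bonferroni_threshold(alpha, m), m)
print(uncorrected)  # essentially 1: a false positive is almost certain
print(corrected)    # just below 0.05
```

The per-test threshold here is 0.05 / 20,000 = 2.5e-6, which illustrates why the slide calls the correction very conservative: genes with real but modest effects will rarely reach it.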
42 Multiple Testing: False Discovery
With V and R as in the table on slide 40 (V: number of true null hypotheses declared significant; R: total number declared significant):
$$\mathrm{FDR} = E[V/R]$$ (Benjamini and Hochberg, 1995)
43 Multiple Testing: False Discovery
Control this: $\mathrm{FDR} = E[V/R]$ (Benjamini and Hochberg, 1995). Not this: $\mathrm{FWER} = P(V \ge 1)$.
44 Multiple Testing: False Discovery Procedure (Benjamini and Hochberg, 1995)
(figure parameters: δ = 0.05, n = 10)
Imagine 100 genes were tested at δ = 0.1. If 40 were found significant, we'd expect 4 of them to be false discoveries.
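The slide names the Benjamini and Hochberg procedure without spelling out its steps. The sketch below implements the standard step-up rule as it is usually stated: sort the p-values, find the largest rank k with p(k) ≤ (k/m)·δ, and reject the k smallest. The function name and the ten example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, delta=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    by the Benjamini-Hochberg step-up procedure at FDR level delta."""
    m = len(pvalues)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest 1-based rank k such that p_(k) <= (k / m) * delta.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * delta:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Ten hypothetical p-values, tested at delta = 0.05.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(pvals, delta=0.05)
print(sum(rejected))  # 2
```

Only the two smallest p-values fall under their rank-specific thresholds (0.005 and 0.010), so two hypotheses are rejected. A plain 0.05 cutoff would have rejected five, and Bonferroni (0.05 / 10 = 0.005) only one.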
45 Multiple Testing: Storey's q-value (Storey 2002; Storey and Tibshirani, 2003)
When a given test is called significant, its q-value is the expected proportion of false discoveries incurred among all tests with p-values as or more extreme.
46 Multiple Testing: Storey's q-value (Storey 2002; Storey and Tibshirani, 2003)
When a given test is called significant, its q-value is the expected proportion of false discoveries incurred among all tests with p-values as or more extreme. For example, a q-value of 0.023 says that 2.3% of genes with p-values as or more extreme (less likely under the null) are expected to be false positives.
47 Multiple Testing: Storey's q-value, Practical Example
You have funds to test the top 100 differentially expressed gene candidates. How should you pick them? One way: order by absolute log fold change and take the top 100 genes. Then order those 100 by q-value; the product of 100 and the last (largest) q-value is the expected number of false positives.
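The recipe on this slide can be written out directly. The sketch below follows the slide's wording and is only an illustration: the gene names, log fold changes, and q-values are made up, a top-3 selection stands in for the top 100, and the function name `pick_candidates` is hypothetical.

```python
def pick_candidates(genes, k):
    """Take the k genes with the largest absolute log fold change, then
    estimate the expected number of false positives among them as
    k times the largest q-value in that set."""
    top = sorted(genes, key=lambda g: abs(g["log_fc"]), reverse=True)[:k]
    worst_q = max(g["q"] for g in top)
    return top, len(top) * worst_q


# Hypothetical results table.
genes = [
    {"name": "geneA", "log_fc": 3.1, "q": 0.01},
    {"name": "geneB", "log_fc": -2.7, "q": 0.02},
    {"name": "geneC", "log_fc": 0.4, "q": 0.30},
    {"name": "geneD", "log_fc": 2.2, "q": 0.05},
    {"name": "geneE", "log_fc": -0.9, "q": 0.20},
]
top, expected_fp = pick_candidates(genes, k=3)
print([g["name"] for g in top])  # ['geneA', 'geneB', 'geneD']
print(expected_fp)               # about 0.15 (3 genes x largest q-value 0.05)
```

Using the largest q-value in the selected set gives a cautious figure, since every selected gene has a q-value at or below it.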
48 Reading Top Tables
49 Practical: Reading Top Tables
Recall: it's not just about significance, but also effect size.
Sorting options:
- absolute value of log FC (decreasing)
- absolute value of adjusted log FC (decreasing)
- p-value (increasing)
Combinations: absolute value of adjusted log FC (decreasing), subset by adjusted p-value less than some threshold.
51 Beyond Differential Expression Differentially Expressed Gene Combinations (Dettling, et al, 2005)
52 Acknowledgements
The Bioinformatics Core: Dr. Dawei Lin, Ph.D. (Director)
Data Analysis: Dr. Joe Fass, Ph.D. (Lead); Dr. Monica Britton; Mr. Nikhil Joshi
Statistical Programming: Mr. Vince Buffalo (Lead)
Application Development (Web/DB): Mr. Jose Boveda (Lead)
System Admin & HPC: Dr. Zhi Wei Lu (Lead)
Visiting members: Ms. Xinran Dong
Campus Scientific Advisory Board
Chair: Dr. Craig Benham, Ph.D. (Mathematics)
Members: Dr. Gino Cortopassi, Ph.D. (Molecular Sciences); Dr. Vladimir Filkov, Ph.D. (Computer Sciences); Dr. Fredric Gorin, Ph.D. (Neurosciences); Dr. Juan Medrano, Ph.D. (Animal Sciences); Dr. Jie Peng, Ph.D. (Statistics); Dr. David Rocke, Ph.D. (Biostatistics)
Genome Center Director: Dr. Richard Michelmore, Ph.D.
Associate Directors for Bioinformatics: Dr. Ian Korf, Ph.D.; Dr. Patrice Koehl, Ph.D.
More informationFairfield Public Schools
Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity
More informationAnalysing Questionnaires using Minitab (for SPSS queries contact ) Graham.Currell@uwe.ac.uk
Analysing Questionnaires using Minitab (for SPSS queries contact ) Graham.Currell@uwe.ac.uk Structure As a starting point it is useful to consider a basic questionnaire as containing three main sections:
More information13: Additional ANOVA Topics. Post hoc Comparisons
13: Additional ANOVA Topics Post hoc Comparisons ANOVA Assumptions Assessing Group Variances When Distributional Assumptions are Severely Violated KruskalWallis Test Post hoc Comparisons In the prior
More informationMINITAB ASSISTANT WHITE PAPER
MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. OneWay
More informationIntroduction to data analysis: Supervised analysis
Introduction to data analysis: Supervised analysis Introduction to Microarray Technology course May 2011 Solveig Mjelstad Olafsrud solveig@microarray.no Most slides adapted/borrowed from presentations
More informationBASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi110 012 seema@iasri.res.in Genomics A genome is an organism s
More informationPREDA S4classes. Francesco Ferrari October 13, 2015
PREDA S4classes Francesco Ferrari October 13, 2015 Abstract This document provides a description of custom S4 classes used to manage data structures for PREDA: an R package for Position RElated Data Analysis.
More information1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
More informationExample: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation:  Feature vector X,  qualitative response Y, taking values in C
More informationSystematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals
Systematic discovery of regulatory motifs in human promoters and 30 UTRs by comparison of several mammals Xiaohui Xie 1, Jun Lu 1, E. J. Kulbokas 1, Todd R. Golub 1, Vamsi Mootha 1, Kerstin LindbladToh
More informationLukas Windhager LFE Bioinformatik, Institut für Informatik LudwigMaximiliansUniversität München Coverage variability in NGS Data
Lukas Windhager LFE Bioinformatik, Institut für Informatik LudwigMaximiliansUniversität München Coverage variability in NGS Data 06.04.2011 Short talk Reproducible pattern SOLiD reads mapped to rrna
More informationIntroduction to SAGEnhaft
Introduction to SAGEnhaft Tim Beissbarth October 13, 2015 1 Overview Serial Analysis of Gene Expression (SAGE) is a gene expression profiling technique that estimates the abundance of thousands of gene
More informationStandard Deviation Calculator
CSS.com Chapter 35 Standard Deviation Calculator Introduction The is a tool to calculate the standard deviation from the data, the standard error, the range, percentiles, the COV, confidence limits, or
More informationBusiness Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.
Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGrawHill/Irwin, 2008, ISBN: 9780073319889. Required Computing
More informationBiodiversity Data Analysis: Testing Statistical Hypotheses By Joanna Weremijewicz, Simeon Yurek, Steven Green, Ph. D. and Dana Krempels, Ph. D.
Biodiversity Data Analysis: Testing Statistical Hypotheses By Joanna Weremijewicz, Simeon Yurek, Steven Green, Ph. D. and Dana Krempels, Ph. D. In biological science, investigators often collect biological
More informationNumerical Summarization of Data OPRE 6301
Numerical Summarization of Data OPRE 6301 Motivation... In the previous session, we used graphical techniques to describe data. For example: While this histogram provides useful insight, other interesting
More information