From Reads to Differentially Expressed Genes. The statistics of differential gene expression analysis using RNA-seq data
- Adam Porter
- 10 years ago
1 From Reads to Differentially Expressed Genes The statistics of differential gene expression analysis using RNA-seq data
2 experimental design, data collection, modeling, statistical testing
3 experimental design (biological heterogeneity, replicated vs unreplicated, biological vs technical replicates, pooling, multiplexing), data collection, modeling, statistical testing
4 Experimental Design: Unreplicated
Definition: One biological replicate per treatment group.
Pros: Cheap, and can be informative.
Cons: We can only make inferences about the particular biological individuals, not about the treatment groups.
Applications: Pilot studies (although not to assess variation!) and non-model organism runs focused on reference transcriptome assembly.
5 Experimental Design: Replicated
Definition: Multiple biological replicates per treatment group.
Pros: We can make inferences about the treatment groups, and we can be more confident about those inferences.
Cons: More expensive.
Applications: Differential expression (and alternative splicing) analysis to make inferences about treatment groups; reliable network inference.
6 Experimental Design: Biological vs Technical Replicates
Biological replicates are measurements from different individuals; technical replicates are repeated measurements of one individual, with some technical steps replicated. Usually biological variance > technical variance, so biological replicates are more useful. They also allow us to make inferences about the treatment groups.
7 Experimental Design: Pooling
Definition: Combining multiple samples (individuals, tissues, etc.) during preparation into a single sample, assayed together.
Pros: Necessary when there isn't enough sample per individual for sequencing. Unreplicated, pooled samples could also decrease bias.
Cons: All ability to measure variability between individuals is lost. A single outlier could bias an entire sample.
Applications: Small or difficult-to-collect samples; possibly to reduce bias in unreplicated designs.
8 Experimental Design: Multiplexing
Definition: Attach a unique nucleotide sequence to each sample/replicate group and combine them into one pooled sample. Spread the pooled sample across multiple lanes and sequence.
Pros: Removes technical variation as a source of confounding.
Cons: Shorter reads, slightly higher cost.
Applications: Generally recommended in all differential expression studies.
9 experimental design, data collection (multireads, genomic vs transcriptomic mapping), modeling, statistical testing
10 Multireads
A multiread is a read that maps equally well to more than one reference sequence. By default, BWA assigns such a read uniformly at random among the equally good reference positions.
read:         AGTCGACTAGCTATTAGCATG
transcript 1: AGTCGACTAGCTATTAGCATG
transcript 2: AGTCGACTAGCTATTAGCATG
11 Genomic Mapping mut wt
12 Genomic Mapping
Advantages:
- Less likely to have multireads across different isoforms.
- One can get a sense of the coverage across exons.
Disadvantages:
- Estimating isoform expression is somewhat involved.
- Needs an (annotated) genome, so it is not great for non-model organisms.
13 Transcriptomic Mapping mut wt isoform 1 mut wt isoform 2
14 Transcriptomic Mapping
Advantages:
- Transcript-level expression.
- Slightly easier to do.
Disadvantages:
- Multiple isoforms can share an exon, so we can get multireads.
- Requires annotation to aggregate transcript counts into gene-level counts.
15 Where does RNA-seq data come from? mut 12 wt 21
16 Where does RNA-seq data come from? differential isoform expression? mut 12 wt 21
17 Genomic Mapping
18 Count data (unreplicated) gene wt mut
19 Count data (replicated) gene wt1 wt2 mut1 mut
20 Normalization
Why normalize? Suppose there are two lanes of data, with 2 times as many sequences in lane A as in lane B. Without normalization, every gene will appear to be upregulated in lane A.
21 Normalization Techniques: RPKM
Reads Per Kilobase per Million mapped reads (RPKM) is a common normalization procedure:
$$\mathrm{RPKM} = \frac{\text{reads mapped to gene}}{\text{total reads mapped to lane (in millions)} \times \text{gene length (in kilobases)}}$$
However, a few highly expressed genes can dominate total lane counts. Consequently, changes in highly expressed genes can disproportionately affect the scaling factor.
22 Normalization Techniques: RPKM
For example, in one lane of data, the top 2% of genes make up 30% of total lane counts. These 411 genes (out of 20,545) dominate the lane. A constant scaling factor based on total lane count overemphasizes the expression of these genes.
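The RPKM definition and its sensitivity to dominant genes can be illustrated with a minimal sketch. The read counts, gene length, and lane totals below are hypothetical numbers chosen for illustration; they are not taken from the slides.

```python
def rpkm(gene_reads, lane_total_reads, gene_length_bp):
    """Reads per kilobase of gene per million mapped reads in the lane."""
    millions_mapped = lane_total_reads / 1e6
    kilobases = gene_length_bp / 1e3
    return gene_reads / (millions_mapped * kilobases)

# Hypothetical gene of interest: 500 reads, 2,000 bp long.
modest_reads, modest_length = 500, 2_000

# Hypothetical lane totals. In lane B the only change is that one dominant
# gene contributes 300,000 additional reads.
lane_a_total = 1_000_000
lane_b_total = 1_300_000

rpkm_a = rpkm(modest_reads, lane_a_total, modest_length)
rpkm_b = rpkm(modest_reads, lane_b_total, modest_length)
print(rpkm_a)  # 250.0
print(rpkm_b)  # lower than rpkm_a, although the gene's own count is unchanged
```

The gene of interest has the same raw count in both cases, yet its RPKM drops because the dominant gene inflated the lane total.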
23 Normalization Techniques: Quantile-based techniques
Idea: rescale the empirical distribution to a theoretical one by ordering both and making the nth smallest value of the empirical distribution equal to the nth smallest value of the theoretical distribution. Bullard et al. (2010) have shown that these methods lead to more accurate differential expression results when verified with qPCR.
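The rank-matching idea can be sketched in a few lines. This is only an illustration of the mapping described on the slide, not the specific procedure evaluated by Bullard et al.; the function name, the example counts, and the reference values are hypothetical, and ties are broken by position.

```python
def quantile_map(sample, reference):
    """Replace the nth smallest value of `sample` with the nth smallest of `reference`."""
    if len(sample) != len(reference):
        raise ValueError("sample and reference must have the same length")
    ref_sorted = sorted(reference)
    # Indices of the sample values, ordered from smallest to largest value.
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    normalized = [0.0] * len(sample)
    for rank, idx in enumerate(order):
        normalized[idx] = ref_sorted[rank]
    return normalized

counts = [5, 100, 20, 1]            # hypothetical counts for four genes
reference = [2.0, 4.0, 8.0, 16.0]   # hypothetical reference distribution
print(quantile_map(counts, reference))  # [4.0, 16.0, 8.0, 2.0]
```

After the mapping, the sample has exactly the reference distribution, while each gene keeps its rank within the sample.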
24 Normalization Techniques: DESeq's Approach
Size factors are estimated for each column (sample) of the data and are then used directly in the model-fitting step.
First, a pseudoreference is created by taking the geometric mean of each row (gene) across samples. Then, for each sample, the size factor is the median of the ratios of its counts to the pseudoreference values.
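The two steps above (pseudoreference, then median of ratios) can be sketched as follows. This is a plain-Python illustration of the description on the slide, not DESeq's own code. Skipping genes that contain a zero count is an assumption made here so that the geometric mean stays positive, and the count matrix is hypothetical.

```python
import math
from statistics import median

def size_factors(counts):
    """counts: list of rows (genes), each a list of per-sample counts."""
    n_samples = len(counts[0])
    # Assumption: drop genes with any zero count, whose geometric mean is zero.
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Pseudoreference: geometric mean of each gene across samples.
    pseudo_ref = [
        math.exp(sum(math.log(c) for c in row) / n_samples) for row in usable
    ]
    factors = []
    for j in range(n_samples):
        # Ratio of each gene's count in sample j to its pseudoreference value.
        ratios = [row[j] / ref for row, ref in zip(usable, pseudo_ref)]
        factors.append(median(ratios))
    return factors

# Hypothetical data: sample 2 was sequenced twice as deeply as sample 1.
counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 3],
]
factors = size_factors(counts)
print(factors)
```

In this example the second size factor is twice the first, reflecting the twofold difference in depth.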
25 Normalization Techniques
26 experimental design, data collection, modeling (Poisson vs Negative Binomial models, assessing model assumptions), statistical testing
27 Modeling RNA-Seq data Example: Poisson models Image of human brain from Anne Brogdon,
28 Modeling RNA-Seq data: Models for Overdispersion
DESeq and edgeR from Bioconductor both use a Negative Binomial model, which models the mean and variance separately.
29 Modeling RNA-Seq data: Models for Overdispersion
Both packages have ways of assessing model fit. Use them!
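To see why a separate variance parameter matters, compare the variance implied by each model at the same mean. The sketch below assumes the commonly used Negative Binomial parameterization, variance = mean + dispersion × mean²; the slides do not state a parameterization, and the mean and dispersion values are hypothetical.

```python
def poisson_variance(mean):
    """Under a Poisson model the variance equals the mean."""
    return mean

def negative_binomial_variance(mean, dispersion):
    """Assumed parameterization: variance = mean + dispersion * mean**2."""
    return mean + dispersion * mean ** 2

mean_count = 100     # hypothetical mean count for a gene
dispersion = 0.1     # hypothetical dispersion

print(poisson_variance(mean_count))
print(negative_binomial_variance(mean_count, dispersion))
```

With these numbers the Negative Binomial variance is about eleven times the Poisson variance, and the gap grows with the mean. With dispersion 0 the two models coincide.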
30 Modeling RNA-Seq data: Consistency between edgeR and DESeq
Using data from Mariano et al., 2008
31 Modeling RNA-Seq data: Models for Overdispersion
Why the difference? DESeq allows for a more local dispersion parameter for similar genes, whereas edgeR has a fixed dispersion parameter.*
Anders and Huber,
The dashed orange line is the variance estimated by edgeR, the purple line is the variance from the Poisson model, and the solid orange line is the variance estimated by DESeq.
*New versions of edgeR actually allow local fits, and new versions of DESeq have a fixed dispersion parameter! I am simplifying because this is how it is presented in the DESeq paper.
32 experimental design, data collection, modeling, statistical testing (approaches to testing, p-values, FWER, FDR, q-values)
33 Testing Hypotheses
34 Testing Hypotheses
35 Testing Hypotheses
36 Testing Hypotheses
37 Why does this matter?
38 Multiple Testing
With n samples and p genes, we're doing p simultaneous tests: H1, H2, H3, ..., Hp
39 Multiple Testing 20,000 simultaneous t-tests on random normal data from the same distribution. There are 1,009 green points (false positives), making up 0.05 of the comparisons (at α = 0.05).
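The experiment on this slide is easy to approximate. The sketch below swaps in a shortcut rather than running t-tests on normal data: under a true null hypothesis a valid p-value is uniformly distributed on [0, 1], so null p-values are drawn directly. The seed and function name are arbitrary choices for illustration.

```python
import random

def count_false_positives(n_tests, alpha, seed=0):
    """Count null p-values that fall below alpha."""
    rng = random.Random(seed)
    # Under a true null hypothesis, a valid p-value is uniform on [0, 1].
    p_values = [rng.random() for _ in range(n_tests)]
    return sum(p < alpha for p in p_values)

false_positives = count_false_positives(20_000, 0.05)
print(false_positives, false_positives / 20_000)
```

The count should land near 20,000 × 0.05 = 1,000, in line with the 1,009 false positives reported on the slide.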
40 Multiple Testing: Familywise Error Rate
                         number declared      number declared
                         non-significant      significant        total
true null hypotheses     U                    V                  m0
false null hypotheses    T                    S                  m - m0
total                    m - R                R                  m
FWER = P(V ≥ 1) = 1 - P(V = 0)
41 Multiple Testing: Bonferroni Correction
One way of controlling FWER: test each hypothesis at level α/m, where m is the number of simultaneous tests (here, the number of genes).
Problem: very conservative.
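A short calculation shows both why a correction is needed and why Bonferroni is conservative. The sketch assumes m independent tests with all null hypotheses true, which is a simplification not stated on the slides; in that case FWER = 1 - (1 - α)^m.

```python
def fwer_independent(alpha, m):
    """FWER for m independent tests of true nulls, each at level alpha."""
    return 1 - (1 - alpha) ** m

def bonferroni_threshold(alpha, m):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / m

m = 20_000
per_test = bonferroni_threshold(0.05, m)

print(fwer_independent(0.05, 10))    # about 0.40 with only 10 tests
print(per_test)                      # the per-test level for 20,000 tests
print(fwer_independent(per_test, m)) # stays below 0.05
```

With 20,000 genes, each test must reach a p-value near 2.5e-06 to be declared significant, which is why the correction is described as very conservative.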
42 Multiple Testing: False Discovery
                         number declared      number declared
                         non-significant      significant        total
true null hypotheses     U                    V                  m0
false null hypotheses    T                    S                  m - m0
total                    m - R                R                  m
FDR = E[V/R] (Benjamini and Hochberg, 1995)
43 Multiple Testing: False Discovery
(Same table as slide 42.)
Control this: FDR = E[V/R] (Benjamini and Hochberg, 1995)
Not this: FWER = P(V ≥ 1)
44 Multiple Testing: False Discovery Procedure (Benjamini and Hochberg, 1995)
δ = 0.05, n = 10
Imagine 100 genes were tested at δ = 0.1. If 40 were found significant, we'd expect 4 to be false discoveries.
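The Benjamini-Hochberg step-up procedure itself is short: sort the p-values, find the largest rank k whose p-value is at most (k/n)·δ, and reject the k smallest. The sketch below follows that standard description; the ten p-values are hypothetical and were chosen only to match n = 10 and δ = 0.05.

```python
def benjamini_hochberg(p_values, fdr):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * fdr.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            largest_k = rank
    # Reject the hypotheses with the largest_k smallest p-values.
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected

# Hypothetical p-values for n = 10 tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_values, 0.05)
print(sum(rejected))  # 2
```

Five of these p-values are below 0.05, but only the two smallest survive the procedure, because the threshold for rank k is k × 0.005.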
45 Multiple Testing: Storey's q-value (Storey, 2002; Storey and Tibshirani, 2003)
When a given test is called significant, its q-value is the proportion of false discoveries incurred among tests with p-values as or more extreme.
46 Multiple Testing: Storey's q-value (Storey, 2002; Storey and Tibshirani, 2003)
For example, a q-value of 0.023 says that 2.3% of genes with p-values as or more extreme (less likely) are false positives.
47 Multiple Testing: Storey's q-value
Practical example: You have funds to test the top 100 differentially expressed gene candidates. How should you pick them?
One way: order by absolute log fold change and take the top 100 genes. Then order those genes by q-value; the product of 100 and the largest q-value among them is the expected number of false positives.
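The selection rule in this example can be written down directly. The sketch below is illustrative only: the tuple layout, function name, gene names, and values are all hypothetical, and it uses a tiny gene list with a budget of 2 instead of 100.

```python
def pick_candidates(genes, n_top):
    """genes: list of (name, log_fold_change, q_value) tuples.

    Returns the n_top genes with the largest absolute log fold change and the
    expected number of false positives among them.
    """
    ranked = sorted(genes, key=lambda g: abs(g[1]), reverse=True)
    chosen = ranked[:n_top]
    # The largest q-value among the chosen genes bounds the false discovery
    # proportion for the whole list.
    worst_q = max(q for _, _, q in chosen)
    return chosen, len(chosen) * worst_q

# Hypothetical results table.
genes = [
    ("g1", 3.0, 0.01),
    ("g2", -4.0, 0.05),
    ("g3", 0.5, 0.001),
    ("g4", 1.0, 0.2),
]
chosen, expected_fp = pick_candidates(genes, 2)
print([name for name, _, _ in chosen], expected_fp)
```

Note that g3 has the smallest q-value but is not chosen, because its fold change is small. Ranking by effect size and then reading off the q-value keeps both considerations in view.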
48 Reading Top Tables
49 Practical: Reading Top Tables
Recall: it's not just about significance, but also effect size.
Sorting options:
- absolute value of log FC (decreasing)
- absolute value of adjusted log FC (decreasing)
- p-value (increasing)
Combinations: absolute value of adjusted log FC (decreasing), subset by adjusted p-value less than some threshold.
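The combined option, filtering on adjusted p-value and then sorting by effect size, might look like the following minimal sketch. The dictionary keys (`gene`, `log_fc`, `adj_p`) and the rows are hypothetical and do not correspond to the column names of any particular package's top table.

```python
def top_table(rows, max_adj_p):
    """Keep rows with adj_p below the threshold, sorted by decreasing |log_fc|.

    rows: list of dicts with 'gene', 'log_fc', and 'adj_p' keys
    (hypothetical field names).
    """
    kept = [r for r in rows if r["adj_p"] < max_adj_p]
    return sorted(kept, key=lambda r: abs(r["log_fc"]), reverse=True)

# Hypothetical results.
rows = [
    {"gene": "g1", "log_fc": 1.2, "adj_p": 0.03},
    {"gene": "g2", "log_fc": -3.5, "adj_p": 0.001},
    {"gene": "g3", "log_fc": 5.0, "adj_p": 0.40},
    {"gene": "g4", "log_fc": -0.4, "adj_p": 0.01},
]
table = top_table(rows, 0.05)
print([r["gene"] for r in table])  # ['g2', 'g1', 'g4']
```

Gene g3 has the largest fold change but is dropped because it does not pass the significance threshold.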
51 Beyond Differential Expression Differentially Expressed Gene Combinations (Dettling, et al, 2005)
52 Acknowledgements
The Bioinformatics Core
- Dr. Dawei Lin, Ph.D. (Director)
Data Analysis
- Dr. Joe Fass, Ph.D. (Lead)
- Dr. Monica Britton
- Mr. Nikhil Joshi
Statistical Programming
- Mr. Vince Buffalo (Lead)
Application Development (Web/DB)
- Mr. Jose Boveda (Lead)
System Admin & HPC
- Dr. Zhi-Wei Lu (Lead)
Visiting members
- Ms. Xinran Dong
Campus Scientific Advisory Board
Chair
- Dr. Craig Benham, Ph.D. (Mathematics)
Members
- Dr. Gino Cortopassi, Ph.D. (Molecular Sciences)
- Dr. Vladimir Filkov, Ph.D. (Computer Sciences)
- Dr. Fredric Gorin, Ph.D. (Neurosciences)
- Dr. Juan Medrano, Ph.D. (Animal Sciences)
- Dr. Jie Peng, Ph.D. (Statistics)
- Dr. David Rocke, Ph.D. (Biostatistics)
Genome Center Director
- Dr. Richard Michelmore, Ph.D.
Associate Directors for Bioinformatics
- Dr. Ian Korf, Ph.D.
- Dr. Patrice Koehl, Ph.D.
8/7/2012. Experimental Design & Intro to NGS Data Analysis. Examples. Agenda. Shoe Example. Breast Cancer Example. Rat Example (Experimental Design)
Experimental Design & Intro to NGS Data Analysis Ryan Peters Field Application Specialist Partek, Incorporated Agenda Experimental Design Examples ANOVA What assays are possible? NGS Analytical Process
1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
Understanding West Nile Virus Infection
Understanding West Nile Virus Infection The QIAGEN Bioinformatics Solution: Biomedical Genomics Workbench (BXWB) + Ingenuity Pathway Analysis (IPA) Functional Genomics & Predictive Medicine, May 21-22,
Next Generation Sequencing: Adjusting to Big Data. Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013
Next Generation Sequencing: Adjusting to Big Data Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013 Outline Human Genome Project Next-Generation Sequencing Personalized Medicine
Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
9. Sampling Distributions
9. Sampling Distributions Prerequisites none A. Introduction B. Sampling Distribution of the Mean C. Sampling Distribution of Difference Between Means D. Sampling Distribution of Pearson's r E. Sampling
NGS Data Analysis: An Intro to RNA-Seq
NGS Data Analysis: An Intro to RNA-Seq March 25th, 2014 GST Colloquim: March 25th, 2014 1 / 1 Workshop Design Basics of NGS Sample Prep RNA-Seq Analysis GST Colloquim: March 25th, 2014 2 / 1 Experimental
