Gene Expression Analysis


1 Gene Expression Analysis
Jie Peng, Department of Statistics, University of California, Davis. May 2012.
2 RNA expression technologies
High-throughput technologies measure the expression levels of thousands of genes simultaneously: microarray, RNA-seq.
Platforms: Affymetrix GeneChip arrays; Genome Analyzer II, HiSeq 1000/2000.
Goal: study the effects of treatments, developmental stages, tissues, etc. on gene expression.
Experimental design issues:
  pooling, replication
  multiplexing: include multiple bar-coded samples in the same sequencing reaction
  lane, flow cell, run, batch
Library preparation.
Extract data: image analysis; read mapping.
3 Analyzing data
Data structure:
  Microarray: intensity value for each probe on the array.
  RNA-seq: mapped read count for each gene.
Data exploration, filtering.
Normalization.
Fitting differential expression (DE) models.
Calling significant genes.
4 Data exploration
Plots: MA plots, histograms, etc.
Summaries: mean/median, variance/MAD, missing rate, library size, etc.
Filtering:
  Microarray: low intensity, low variation.
  RNA-seq: low count.
5 Normalization
Remove systematic biases due to library preparation, RNA composition, etc., so that samples are comparable. The appropriate method depends on the technology and platform.
Basic assumption: the majority of genes are not differentially expressed across samples.
Global normalization: match certain global features of the samples. For example, make all samples have the same median and MAD, or the same 75% quantile. This does not change the data much (often only up to a scaling factor), but may not remove all systematic biases.
Quantile normalization: impose the same empirical distribution on every sample. This may change the data a lot, and may reduce signal while removing bias.
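As an illustration of the global approach, here is a minimal base-R sketch that gives every sample the same median and MAD. The function name `global.norm`, the choice of the average median and average MAD as common targets, and the simulated matrix are assumptions made for this example, not part of the original slides.

```r
# Sketch: global normalization that matches median and MAD across samples.
# `x` is a hypothetical p-by-n matrix of log intensities (columns = samples).
global.norm <- function(x) {
  med <- apply(x, 2, median, na.rm = TRUE)   # per-sample median
  sc  <- apply(x, 2, mad, na.rm = TRUE)      # per-sample MAD
  target.med <- mean(med)                    # common target location (an assumption)
  target.mad <- mean(sc)                     # common target scale (an assumption)
  centered <- sweep(x, 2, med, "-")          # remove each sample's median
  scaled   <- sweep(centered, 2, sc, "/")    # bring each sample to unit MAD
  scaled * target.mad + target.med           # move all samples to the common target
}

set.seed(1)
x <- cbind(rnorm(1000, 8, 1), rnorm(1000, 9, 1.5), rnorm(1000, 7.5, 0.8))
x.norm <- global.norm(x)
apply(x.norm, 2, median)   # the three medians now agree
apply(x.norm, 2, mad)      # the three MADs now agree
```

Each sample is only shifted and rescaled, so the ordering of genes within a sample is unchanged.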
6 Quantile normalization: an R implementation
    quan.norm <- function(x, quan = 0.5) {
      ## x: p by n data matrix, where columns are the samples.
      norm <- x
      p <- nrow(x)
      n <- ncol(x)
      x.sort <- apply(x, 2, sort)   ## sort genes within a sample
      ## rank genes within a sample; ties.method = "first" keeps the ranks
      ## integer-valued so that they can be used as row indices below
      x.rank <- apply(x, 2, rank, ties.method = "first")
      ## find the common distribution to be matched to:
      qant.sort <- matrix(apply(x.sort, 1, quantile, probs = quan),
                          p, n, byrow = FALSE)
      ## match each sample to the common distribution:
      for (i in 1:n) {
        norm[, i] <- qant.sort[x.rank[, i], i]
      }
      return(norm)
    }
8 Normalization of RNA-seq data
Global normalization by scaling.
Library size normalization:
  Choose a reference sample, e.g., the sample with the median library size.
  For a target sample, multiply its counts by the ratio between the library size of the reference and that of the target.
TMM normalization: takes into account RNA composition differences.
  Ref: Mark D. Robinson and Alicia Oshlack. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology, 11(3):R25, 2010.
Quantile-matched normalization: match a certain quantile across samples, e.g., make the 75% quantile of counts the same for all samples.
  Ref: Bullard JH, Purdom E, Hansen KD, Dudoit S. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 11, 94.
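The two scaling schemes can be sketched in a few lines of base R. The count matrix below is simulated, and the object names (`counts`, `lib`, `ref`, `uq`) as well as the use of the mean upper quartile as the common target are assumptions for this example only.

```r
# Sketch: library size and upper-quartile scaling of a count matrix
# (rows = genes, columns = samples). Data are simulated for illustration.
set.seed(2)
counts <- matrix(rpois(5 * 1000, lambda = rep(c(5, 10, 20, 10, 8), each = 1000)),
                 nrow = 1000)

# Library size normalization: scale every sample to the reference library size.
lib <- colSums(counts)
ref <- lib[order(lib)[ceiling(length(lib) / 2)]]    # sample with the median library size
counts.lib <- sweep(counts, 2, ref / lib, "*")

# Quantile-matched normalization: give every sample the same 75% quantile.
uq <- apply(counts, 2, quantile, probs = 0.75)
counts.uq <- sweep(counts, 2, mean(uq) / uq, "*")

colSums(counts.lib)                                  # all equal to ref
apply(counts.uq, 2, quantile, probs = 0.75)          # all equal to mean(uq)
```

Both methods multiply each sample by a single factor, so they differ only in which global feature is matched.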
10 RNA composition
Observed quantities:
  counts: $Y_{gk}$, the number of reads mapped to gene $g$ in sample $k$.
  library size: $N_k := \sum_g Y_{gk}$, the total number of mapped reads in sample $k$.
  gene length: $L_g$, the length of gene $g$.
Unobserved quantities:
  abundance: $A_{gk}$, the number of RNA transcripts of gene $g$ in sample $k$.
  total abundance: $A_k := \sum_g A_{gk}$, the total amount of RNA transcripts in sample $k$.
  $S_k := \sum_g A_{gk} L_g$.
  relative abundance: $\lambda_{gk} := A_{gk} / A_k$.
For each gene $g$, we'd like to compare the relative abundance across samples, e.g., by testing $H_{0g}: \lambda_{g1} = \lambda_{g2}$.
11 The expected value of $Y_{gk}$ can be modeled as
$$E(Y_{gk}) = \frac{A_{gk} L_g}{\sum_s A_{sk} L_s} N_k = (\lambda_{gk} L_g)\left(\frac{A_k}{S_k} N_k\right) =: \mu_{gk}.$$
Effective library size: $\tilde{N}_k := \frac{A_k}{S_k} N_k$.
If $\tilde{N}_1 = \tilde{N}_2$, then comparing $\lambda_{g1}$ and $\lambda_{g2}$ is equivalent to comparing $\mu_{g1}$ and $\mu_{g2}$, which can be done with a test based on the observed counts $Y_{gk}$.
The goal is therefore to equalize the effective library size across samples.
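For readers who want the intermediate step, the second equality follows by substituting the definitions of relative abundance and of $S_k$ from the previous slide. Written out, using only the symbols already defined there:

```latex
\begin{align*}
E(Y_{gk})
  &= \frac{A_{gk} L_g}{\sum_s A_{sk} L_s}\, N_k
   = \frac{(\lambda_{gk} A_k)\, L_g}{S_k}\, N_k
   && \text{since } A_{gk} = \lambda_{gk} A_k,\quad S_k = \sum_s A_{sk} L_s \\
  &= (\lambda_{gk} L_g)\, \frac{A_k}{S_k}\, N_k
   = \lambda_{gk} L_g\, \tilde{N}_k .
\end{align*}
```

Consequently, when $\tilde{N}_1 = \tilde{N}_2$ the gene length and the effective library size cancel in the ratio, so $\mu_{g1} / \mu_{g2} = \lambda_{g1} / \lambda_{g2}$.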
12 Note that $E(Y_{gk} / N_k) = (\lambda_{gk} L_g)(A_k / S_k)$.
By assuming that most genes are not DE, i.e., for most genes $\lambda_{g1} = \lambda_{g2}$, the trimmed mean of the log ratios
$$\left\{ M_g := \log \frac{Y_{g1}/N_1}{Y_{g2}/N_2} \right\}_g$$
can be used to estimate
$$\log \frac{A_1/S_1}{A_2/S_2}.$$
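The idea can be checked on simulated data. The sketch below is a simplified, unweighted trimmed mean that trims on the log ratios only; it is not the full TMM procedure of Robinson and Oshlack. Gene lengths, the numbers of DE genes, the 30% trimming fraction, and the library sizes are all assumptions chosen for the illustration.

```r
# Sketch: trimmed mean of log ratios as an estimate of log[(A1/S1)/(A2/S2)].
# Simplified and unweighted; all numbers below are made up for illustration.
set.seed(3)
G <- 2000
L <- c(rep(5000, 50), rep(1000, G - 50))     # gene lengths; the first 50 genes are long
lambda1 <- rep(1 / G, G)                     # relative abundances in sample 1
lambda2 <- lambda1
lambda2[1:50]   <- 5   * lambda1[1:50]       # 50 up-regulated genes
lambda2[51:450] <- 0.5 * lambda1[51:450]     # 400 down-regulated genes (sum stays 1)

N <- c(2e6, 3e6)                             # library sizes
# Reads fall on gene g with probability proportional to lambda_gk * L_g.
y1 <- as.vector(rmultinom(1, N[1], lambda1 * L))
y2 <- as.vector(rmultinom(1, N[2], lambda2 * L))

M <- log2((y1 / N[1]) / (y2 / N[2]))         # per-gene log ratios M_g
est <- mean(M, trim = 0.3)                   # trimmed mean, dropping 30% at each end

# Since A_k / S_k = 1 / sum_g(lambda_gk * L_g), the target quantity is:
truth <- log2(sum(lambda2 * L) / sum(lambda1 * L))
c(estimate = est, truth = truth)
```

The estimate is close to, but not exactly, the target because the DE genes are not balanced between up- and down-regulation, so the trimming is slightly asymmetric among the non-DE genes.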
13 Model expression data
Microarray data: assume a multiplicative noise model and model the log intensities as normal random variables.
RNA-seq data:
  Within a sample, it is reasonable to model the counts as Poisson random variables with means proportional to the relative RNA abundance.
  When comparing two samples, the R function glm() with family="poisson" can be used to fit the data. The findings are restricted to these two samples and cannot be generalized to general populations.
  To account for biological variation across samples, various overdispersion models are considered.
  Overdispersion: variance > mean. Note that for Poisson random variables, variance = mean.
  Commonly used overdispersion models: negative binomial, quasi-Poisson, quasi-binomial.
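A small sketch of both situations for a single gene, using base R only. The counts, library sizes, and negative binomial parameters are invented for the example, and using the log library size as an offset is one common way to account for sequencing depth rather than something prescribed by the slides.

```r
# Sketch 1: Poisson GLM comparing one gene between two samples.
y <- c(120, 200)                      # hypothetical read counts for the gene
N <- c(1e6, 1.2e6)                    # hypothetical library sizes
group <- factor(c(1, 2))
fit <- glm(y ~ group + offset(log(N)), family = poisson)
coef(fit)["group2"]                   # log ratio of the two rates: log((200/1.2e6)/(120/1e6))
summary(fit)$coefficients             # Wald test for the group effect

# Sketch 2: replicated samples with extra-Poisson (biological) variation.
set.seed(4)
grp <- factor(rep(1:2, times = c(8, 7)))
yrep <- rnbinom(15, mu = 100, size = 5)        # variance = 100 * (1 + 100/5) > mean
fit.q <- glm(yrep ~ grp, family = quasipoisson)
disp <- summary(fit.q)$dispersion              # estimated variance/mean ratio
disp                                           # well above 1 indicates overdispersion
```

In the first fit the model is saturated (two observations, two parameters), which is why its conclusions cannot extend beyond the two samples at hand.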
16 Cautions
The Poisson model is based on the assumption that reads are randomly and independently distributed. This may not hold, for reasons such as random hexamer priming and GC-content bias.
  Ref: Kasper D. Hansen, Steven E. Brenner, Sandrine Dudoit. Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Research, Vol. 38, No. 12 (01 July 2010), e131.
  Ref: Davide Risso, Katja Schwartz, Gavin Sherlock and Sandrine Dudoit. GC-Content Normalization for RNA-Seq Data. BMC Bioinformatics 2011, 12:480.
Corrections and normalizations may be necessary depending on the goal of the study.
Underdispersion is sometimes observed. The quasi-Poisson model can deal with both overdispersion and underdispersion. The negative binomial model can only model overdispersion.
17 Differentially expressed genes
Microarray: (moderated) t-tests based on log intensities.
RNA-seq: likelihood ratio tests or exact tests based on counts.
Permutation tests, rank tests, empirical Bayes methods, etc.
Multiple comparison adjustment, based on p-values:
  Control the family-wise error rate (FWER): Bonferroni, Holm, etc.
  Control the false discovery rate (FDR): Benjamini & Hochberg (BH), Benjamini & Yekutieli (2001) (BY), etc.
  R function p.adjust.
  Other variants of FDR: R package locfdr, R package qvalue.
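The adjustment methods named above are all available through `p.adjust`. The sketch below applies four of them to simulated p-values; the mixture of 900 uniform and 100 small p-values is an assumption made only to have something to adjust.

```r
# Sketch: comparing multiple-testing adjustments with p.adjust().
set.seed(5)
p <- c(runif(900), rbeta(100, 0.5, 20))        # 900 null genes, 100 with small p-values
methods <- c("bonferroni", "holm", "BH", "BY")
adj <- sapply(methods, function(m) p.adjust(p, method = m))

colSums(adj <= 0.05)     # number of genes called at the 0.05 level by each method
# For every gene, the Holm-adjusted p-value never exceeds the Bonferroni one,
# and the BH-adjusted p-value never exceeds the BY one.
```

The FWER methods call fewer genes than the FDR methods, and BY calls no more genes than BH.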
18 R packages
Microarray: affy, limma, etc.
RNA-seq: DESeq, edgeR, glm, etc.
Bioconductor package edgeR:
  Based on negative binomial models: $Y \sim NB(\mu, \phi)$, $E(Y) = \mu$, $Var(Y) = \mu(1 + \mu\phi)$ ($\mu > 0$, $\phi > 0$).
  To account for the small sample sizes typical of RNA-seq studies, edgeR also uses empirical Bayes ideas to pool information across genes.
  Ref: Mark D. Robinson, Davis J. McCarthy, and Gordon K. Smyth. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26(1):139-140, 2010.
  Ref: M. D. Robinson and G. K. Smyth. Moderated statistical tests for assessing differences in tag abundance. Bioinformatics, 23(21), 2007.
  Ref: M. D. Robinson and G. K. Smyth. Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics, 9(2), 2008.
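The mean-variance relationship of this parameterization can be checked by simulation in base R. In `rnbinom`, the `size` argument corresponds to $1/\phi$; the values of `mu` and `phi` below are arbitrary choices for the example.

```r
# Sketch: negative binomial with mean mu and dispersion phi, Var(Y) = mu * (1 + mu * phi).
set.seed(6)
mu  <- 50
phi <- 0.2
y <- rnbinom(1e5, mu = mu, size = 1 / phi)     # size = 1/phi in R's parameterization

c(mean = mean(y),
  var = var(y),
  theoretical.var = mu * (1 + mu * phi))       # 50 * (1 + 10) = 550
sqrt(phi)                                       # biological coefficient of variation
```

As $\phi$ approaches 0 the variance approaches $\mu$, which recovers the Poisson case.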
19 A case study
An RNA-seq data set with two groups: grp1 with eight replicates, grp2 with seven replicates.
Data exploration.
Data matrix: rows are genes, columns are samples.
> dim(counts)
[Table: geneid, grp1 sample1, grp1 sample2, grp1 sample3, ...; one row per gene.]
Library size:
> barplot(colSums(counts))
Filtering:
> allzero=(rowSums(counts)==0); counts=counts[!allzero,]; dim(counts)
Clustering of samples: are samples from the same group clustered together?
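The exploration steps can be tried without the original data set. The sketch below simulates a count matrix with the same layout (eight plus seven samples); the simulated counts, the 100 genes made differentially expressed, and the use of hierarchical clustering on log2 counts are assumptions for illustration, not the analysis from the slides.

```r
# Sketch: library sizes, filtering of all-zero genes, and sample clustering
# on a simulated count matrix (rows = genes, columns = samples).
set.seed(7)
G <- 500
group <- factor(c(rep(1, 8), rep(2, 7)))
mu <- matrix(rgamma(G, shape = 0.5, scale = 40), G, 15)   # gene-level means
mu[21:120, group == 2] <- 4 * mu[21:120, group == 2]      # 100 genes higher in grp2
counts <- matrix(rnbinom(G * 15, mu = mu, size = 5), nrow = G)
counts[1:20, ] <- 0                                        # 20 genes never observed
colnames(counts) <- paste0("grp", group, " sample", c(1:8, 1:7))

lib.size <- colSums(counts)                # library size per sample
allzero <- rowSums(counts) == 0
counts.f <- counts[!allzero, ]             # drop genes with no reads in any sample
dim(counts.f)

# Cluster samples on log2 counts; do samples from the same group sit together?
hc <- hclust(dist(t(log2(counts.f + 1))))
if (interactive()) {
  barplot(lib.size, las = 2)
  plot(hc)
}
```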
21
> library(edgeR)
> group=factor(c(rep(1,8), rep(2,7)))
> d=DGEList(counts,group)
> d$samples$lib.size
> plotMDS(d)
[Figure: MDS plot of the samples (Dimension 1 vs. Dimension 2), with points labeled by group and sample.]
22 Normalization and MA plots.
> d=calcNormFactors(d,method="TMM")
> samp1="grp1sample 7"; samp2="grp2sample 5"
> maPlot(d$counts[,samp1],d$counts[,samp2],normalize=TRUE,
+   lowess=TRUE, ylim=c(-8,8),pch=19, cex=0.4)
> abline(h=0, lty=2)
> eff.libsize=d$samples$lib.size*d$samples$norm.factors
> names(eff.libsize)=colnames(d$counts)
> maPlot(d$counts[,samp1]/eff.libsize[samp1],
+   d$counts[,samp2]/eff.libsize[samp2],normalize=FALSE,
+   lowess=TRUE, ylim=c(-8,8),pch=19, cex=0.4)
> abline(h=0, lty=2)
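To see what an MA plot is showing, the M and A values can be computed by hand in base R. The sketch below uses simulated Poisson counts for two samples with different sequencing depths and no differential expression; the helper `ma` and all numbers are assumptions for the example, and genes with a zero count are simply dropped here.

```r
# Sketch: M (log ratio) and A (average log expression) for two samples,
# before and after dividing by library size. Data are simulated.
set.seed(8)
G <- 1000
lambda <- rgamma(G, shape = 2, scale = 1)
lambda <- lambda / sum(lambda)               # common relative abundances (no DE genes)
N <- c(1e6, 2.5e6)                           # two different sequencing depths
y1 <- rpois(G, lambda * N[1])
y2 <- rpois(G, lambda * N[2])
keep <- y1 > 0 & y2 > 0                      # log ratios need positive counts

ma <- function(a, b) data.frame(A = (log2(a) + log2(b)) / 2,
                                M = log2(a) - log2(b))
raw  <- ma(y1[keep], y2[keep])
norm <- ma(y1[keep] / sum(y1), y2[keep] / sum(y2))

c(raw = median(raw$M), normalized = median(norm$M))
if (interactive()) {
  plot(norm$A, norm$M, pch = 19, cex = 0.4, xlab = "A", ylab = "M")
  abline(h = 0, lty = 2)
}
```

Before scaling, the cloud is centered near log2(N1/N2); after scaling it is centered near zero, which is what a well-normalized pair of samples should look like.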
24 Two-group comparison and gene calling.
Estimate the dispersion parameters and plot the gene-wise biological coefficient of variation (square root of the dispersion) against gene abundance (in log2 counts per million).
> d=estimateCommonDisp(d, verbose=TRUE)
> d$common.dispersion
> d=estimateTagwiseDisp(d,prior.n=getPriorN(d))
> plotBCV(d)
26 Exact test and gene calling.
> et=exactTest(d,pair=1:2,dispersion="tagwise",
+   rejection.region="doubletail",big.count=900)
> topTags(et,n=100, adjust.method="BY")
> de=decideTestsDGE(et, adjust.method="BY",
+   p.value=0.05)
> summary(de)
The FDR method BY allows for dependence among the tests and is more conservative than BH.
Draw a smear plot of log concentration vs. log fold change to find DE genes that are both statistically and practically significant.
> plotSmear(et,
+   de.tags=rownames(et$table)[as.logical(de)])
28 Look at the p-value distribution
Histogram:
> hist(et$table$PValue, breaks=50,xlab="p-value")
Observe an unusually high bar at p-values close to one.
Examine log p-value vs. log concentration/log CPM: this bar comes primarily from genes with a small number of counts.
Use a threshold (e.g., 10) on the total number of counts across samples to filter out low-count genes.
A similar phenomenon occurs when analyzing exon sequence data in GWAS studies.
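The spike near one can be reproduced with a toy example. The sketch below uses a conditional binomial exact test for two Poisson counts as a stand-in for the negative binomial exact test in edgeR; that substitution, the two expression levels, and the threshold of 10 are assumptions for illustration.

```r
# Sketch: with very low counts, a discrete exact test returns p-values of
# exactly 1 for many genes, producing the bar at the right of the histogram.
set.seed(9)
G <- 2000
mu <- c(rep(0.5, G / 2), rep(100, G / 2))    # half low-count, half high-count genes
y1 <- rpois(G, mu)
y2 <- rpois(G, mu)                           # no true differential expression
total <- y1 + y2

pval <- rep(NA_real_, G)
ok <- total > 0                              # the test is undefined for all-zero genes
pval[ok] <- mapply(function(a, n) binom.test(a, n, p = 0.5)$p.value,
                   y1[ok], total[ok])

low  <- ok & total < 10                      # genes a threshold of 10 would remove
high <- total >= 10
frac.one <- c(low = mean(pval[low] > 0.99), high = mean(pval[high] > 0.99))
frac.one                                     # share of p-values at (or next to) one
if (interactive()) hist(pval[high], breaks = 50, xlab = "p-value")
```

After removing the low-count genes, the histogram of the remaining p-values is much closer to uniform, as expected when no gene is differentially expressed.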
29 [Figure: histogram of p-values; x-axis: p-value, y-axis: frequency.]
31 [Figure: four histograms of p-values (x-axis: p-value, y-axis: frequency). Panels: histogram of p-values; genes with at least 10 total counts (84% of genes pass); genes with at least 20 total counts (79% of genes pass); genes with at least 40 total counts (73% of genes pass).]
32 Summary
Explore the data with graphs and numerical summaries.
Examine normalization with MA plots.
Filter out genes with small counts.
Look at both p-values and fold changes when calling significant genes.
Microarray Analysis The Basics Thomas Girke December 9, 2011 Microarray Analysis Slide 1/42 Technology Challenges Data Analysis Data Depositories R and BioConductor Homework Assignment Microarray Analysis
More informationSTATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI
STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members
More informationConsistent Assay Performance Across Universal Arrays and Scanners
Technical Note: Illumina Systems and Software Consistent Assay Performance Across Universal Arrays and Scanners There are multiple Universal Array and scanner options for running Illumina DASL and GoldenGate
More information2.500 Threshold. 2.000 1000e  001. Threshold. Exponential phase. Cycle Number
application note RealTime PCR: Understanding C T RealTime PCR: Understanding C T 4.500 3.500 1000e + 001 4.000 3.000 1000e + 000 3.500 2.500 Threshold 3.000 2.000 1000e  001 Rn 2500 Rn 1500 Rn 2000
More informationMaterials and Methods. Blocking of Globin Reverse Transcription to Enhance Human Whole Blood Gene Expression Profiling
Application Note Blocking of Globin Reverse Transcription to Enhance Human Whole Blood Gene Expression Profi ling Yasmin BeazerBarclay, Doug Sinon, Christopher Morehouse, Mark Porter, and Mike Kuziora
More informationExploratory data analysis (Chapter 2) Fall 2011
Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,
More informationIntegrating DNA Motif Discovery and GenomeWide Expression Analysis. Erin M. Conlon
Integrating DNA Motif Discovery and GenomeWide Expression Analysis Department of Mathematics and Statistics University of Massachusetts Amherst Statistics in Functional Genomics Workshop Ascona, Switzerland
More informationIntroduction to nextgeneration sequencing data
Introduction to nextgeneration sequencing data David Simpson Centre for Experimental Medicine Queens University Belfast http://www.qub.ac.uk/researchcentres/cem/ Outline History of DNA sequencing NGS
More informationSupplementary Figure 1: Quality Assessment of Mouse Arrays. Supplementary Figure 2: Quality Assessment of Rat Arrays
Supplementary Figure 1: Quality Assessment of Mouse Arrays The mouse microarray data were subjected to an extensive qualitycontrol procedure prior to conducting downstream analyses. We assessed the spread
More informationBusiness Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.
Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGrawHill/Irwin, 2008, ISBN: 9780073319889. Required Computing
More informationALLEN Mouse Brain Atlas
TECHNICAL WHITE PAPER: QUALITY CONTROL STANDARDS FOR HIGHTHROUGHPUT RNA IN SITU HYBRIDIZATION DATA GENERATION Consistent data quality and internal reproducibility are critical concerns for highthroughput
More informationAnalysis of Data. Organizing Data Files in SPSS. Descriptive Statistics
Analysis of Data Claudia J. Stanny PSY 67 Research Design Organizing Data Files in SPSS All data for one subject entered on the same line Identification data Betweensubjects manipulations: variable to
More informationPackage HHG. July 14, 2015
Type Package Package HHG July 14, 2015 Title HellerHellerGorfine Tests of Independence and Equality of Distributions Version 1.5.1 Date 20150713 Author Barak Brill & Shachar Kaufman, based in part
More informationExercise with Gene Ontology  Cytoscape  BiNGO
Exercise with Gene Ontology  Cytoscape  BiNGO This practical has material extracted from http://www.cbs.dtu.dk/chipcourse/exercises/ex_go/goexercise11.php In this exercise we will analyze microarray
More informationAnalyzing the Effect of Treatment and Time on Gene Expression in Partek Genomics Suite (PGS) 6.6: A Breast Cancer Study
Analyzing the Effect of Treatment and Time on Gene Expression in Partek Genomics Suite (PGS) 6.6: A Breast Cancer Study The data for this study is taken from experiment GSE848 from the Gene Expression
More informationBasic Analysis of Microarray Data
Basic Analysis of Microarray Data A User Guide and Tutorial Scott A. Ness, Ph.D. CoDirector, KeckUNM Genomics Resource and Dept. of Molecular Genetics and Microbiology University of New Mexico HSC Tel.
More informationCancer Biostatistics Workshop Science of Doing Science  Biostatistics
Cancer Biostatistics Workshop Science of Doing Science  Biostatistics Yu Shyr, PhD Jan. 18, 2008 Cancer Biostatistics Center VanderbiltIngram Cancer Center Yu.Shyr@vanderbilt.edu Aims Cancer Biostatistics
More information