Gene Expression Analysis
1 Gene Expression Analysis Jie Peng Department of Statistics University of California, Davis May 2012
2 RNA expression technologies. High-throughput technologies to measure the expression levels of thousands of genes simultaneously: microarray, RNA-seq. Platforms: Affymetrix GeneChip arrays; Genome Analyzer II, HiSeq 1000/2000. Goal: study the effects of treatments, developmental stages, tissues, etc. on gene expression. Experimental design issues: pooling; replication; multiplexing (include multiple bar-coded samples in the same sequencing reaction); lane, flow cell, run, batch. Library preparation. Extract data: image analysis; read mapping.
3 Analyzing data. Data structure: microarray: intensity value for each probe on the array; RNA-seq: mapped read count for each gene. Data exploration, filtering. Normalization. Fitting differential expression (DE) models. Calling significant genes.
4 Data exploration. Plots: MA plots, histograms, etc. Summaries: mean/median, variance/MAD, missing rate, library size, etc. Filtering: microarray: low intensity, low variation; RNA-seq: low count.
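The summaries and filters listed on this slide can be sketched in base R. This is a minimal illustration on simulated counts; the matrix `counts`, the gene and sample names, and the threshold of 10 reads are assumptions made for the example, not values from the slides.

```r
set.seed(1)
# Hypothetical toy data: 200 genes x 6 samples of simulated counts
counts <- matrix(rnbinom(200 * 6, mu = 20, size = 2), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))
counts[1:10, ] <- 0                       # force some all-zero genes

lib.size <- colSums(counts)               # library size per sample
gene.summary <- data.frame(               # per-gene numerical summaries
  mean   = rowMeans(counts),
  median = apply(counts, 1, median),
  var    = apply(counts, 1, var),
  mad    = apply(counts, 1, mad)
)

# MA values for two samples on the log2 scale (pseudo-count 0.5 avoids log(0))
l1 <- log2(counts[, 1] + 0.5)
l2 <- log2(counts[, 2] + 0.5)
M <- l1 - l2                              # log ratio
A <- (l1 + l2) / 2                        # average log abundance
# plot(A, M, pch = 19, cex = 0.4); abline(h = 0, lty = 2)

# Low-count filter: keep genes with at least 10 reads in total (assumed threshold)
keep <- rowSums(counts) >= 10
filtered <- counts[keep, , drop = FALSE]
```

The plotting call is left commented out so the sketch runs without a graphics device.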
5 Normalization. Remove systematic biases due to library preparation, RNA composition, etc. so that samples are comparable. Depends on technology and platform. Basic assumption: the majority of genes are not differentially expressed across samples. Global normalization: match certain global features of the samples. For example, make all samples have the same median and MAD, or make all samples have the same 75% quantile. Does not change the data much (often only up to a scaling factor); may not remove all systematic biases. Quantile normalization: impose the same empirical distribution on every sample. May change the data a lot; may reduce signals while removing bias.
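As a rough sketch of global normalization, the following base R code gives every sample the same median and MAD. The simulated log-intensity matrix `x` and the choice of the median of the per-sample medians and MADs as the common reference are assumptions made for illustration.

```r
set.seed(2)
# Hypothetical log-intensity matrix: 500 genes x 4 samples,
# each sample with its own shift and scale
x <- sapply(1:4, function(k) rnorm(500, mean = 8 + k, sd = 1 + 0.2 * k))

med <- apply(x, 2, median)      # per-sample median
md  <- apply(x, 2, mad)         # per-sample MAD
ref.med <- median(med)          # assumed common reference location
ref.mad <- median(md)           # assumed common reference spread

# Center and scale each column, then map to the common location and spread
x.norm <- sweep(sweep(x, 2, med, "-"), 2, md, "/") * ref.mad + ref.med

rbind(median = apply(x.norm, 2, median), mad = apply(x.norm, 2, mad))
```

Each sample is only shifted and rescaled, so the ordering of genes within a sample is unchanged.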
6 Quantile normalization: an R implementation
quan.norm <- function(x, quan = 0.5) {
  ## x: p by n data matrix, where columns are the samples.
  norm <- x
  p <- nrow(x)
  n <- ncol(x)
  x.sort <- apply(x, 2, sort)   ## sort genes within a sample
  ## rank genes within a sample; ties.method = "first" gives integer ranks,
  ## so that they can be used as row indices below
  x.rank <- apply(x, 2, rank, ties.method = "first")
  ## find the common distribution to be matched to:
  qant.sort <- matrix(apply(x.sort, 1, quantile, probs = quan),
                      p, n, byrow = FALSE)
  ## match each sample to the common distribution:
  for (i in 1:n) {
    norm[, i] <- qant.sort[x.rank[, i], i]
  }
  return(norm)
}
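A compact way to check what quantile normalization does is to confirm that, afterwards, every sample has the same set of sorted values. The helper name `quantile_normalize` and the simulated matrix are hypothetical; the common distribution is taken as the median of each order statistic, matching the default quan = 0.5 on the slide.

```r
# Hypothetical helper: quantile normalization to the median order statistics
quantile_normalize <- function(x) {
  # common distribution: median across samples of the k-th smallest value
  target <- apply(apply(x, 2, sort), 1, median)
  # replace each value by the target value of the same within-sample rank
  apply(x, 2, function(col) target[rank(col, ties.method = "first")])
}

set.seed(10)
# Three simulated samples with clearly different distributions
x <- cbind(rnorm(100, 5, 1), rnorm(100, 7, 2), rexp(100))
x.qn <- quantile_normalize(x)

# All samples now share one empirical distribution
same12 <- isTRUE(all.equal(sort(x.qn[, 1]), sort(x.qn[, 2])))
same13 <- isTRUE(all.equal(sort(x.qn[, 1]), sort(x.qn[, 3])))
c(same12, same13)
```

Within each sample the ranks of the genes are preserved; only the values attached to those ranks change.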
8 Normalization of RNA-seq data. Global normalization by scaling. Library size normalization: choose a reference sample, e.g., the sample with a median library size; for a target sample, multiply its counts by the ratio between the library size of the reference and that of the target. TMM normalization: takes into account RNA composition differences. Ref: Mark D Robinson and Alicia Oshlack. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology, 11(3):R25, 2010. Quantile-matched normalization: match a certain quantile across samples, e.g., make the 75% quantile of counts the same for all samples. Ref: Bullard JH, Purdom E, Hansen KD, Dudoit S. Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics 11, 94.
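The two scaling schemes described on this slide can be sketched in base R as follows. The simulated counts, the sequencing depths in `depth`, and the use of the mean upper quartile as the common target are assumptions for illustration only.

```r
set.seed(3)
depth <- c(1, 2, 0.5, 1.5)     # hypothetical relative sequencing depths
counts <- sapply(depth, function(d) rpois(1000, lambda = d * 50))
colnames(counts) <- paste0("s", 1:4)

## Library size normalization
lib.size <- colSums(counts)
# reference: the sample whose library size is closest to the median library size
ref <- which.min(abs(lib.size - median(lib.size)))
# multiply each target sample by (reference library size / its own library size)
scaled.lib <- sweep(counts, 2, lib.size[ref] / lib.size, "*")

## Quantile-matched normalization on the 75% quantile
uq <- apply(counts, 2, quantile, probs = 0.75)
scaled.uq <- sweep(counts, 2, mean(uq) / uq, "*")

colSums(scaled.lib)                              # equal library sizes
apply(scaled.uq, 2, quantile, probs = 0.75)      # equal upper quartiles
```

Both schemes change each sample by a single scaling factor, so they cannot correct biases that differ from gene to gene.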
10 RNA composition. Observed quantities: counts $Y_{gk}$: number of reads mapped to gene $g$ in sample $k$; library size $N_k := \sum_g Y_{gk}$: total number of mapped reads in sample $k$; gene length $L_g$: length of gene $g$. Unobserved quantities: abundance $A_{gk}$: number of RNA transcripts of gene $g$ in sample $k$; total abundance $A_k := \sum_g A_{gk}$: total amount of RNA transcripts in sample $k$; $S_k := \sum_g A_{gk} L_g$; relative abundance $\lambda_{gk} := A_{gk}/A_k$. For each gene $g$, we would like to compare the relative abundance across samples, e.g., by testing $H_{0g}: \lambda_{g1} = \lambda_{g2}$.
11 The expected value of $Y_{gk}$ can be modeled as $E(Y_{gk}) = \frac{A_{gk} L_g}{\sum_s A_{sk} L_s} N_k = (\lambda_{gk} L_g)\left(\frac{A_k}{S_k} N_k\right) =: \mu_{gk}$. Effective library size: $\tilde{N}_k := \frac{A_k}{S_k} N_k$. If $\tilde{N}_1 = \tilde{N}_2$, then comparing $\lambda_{g1}, \lambda_{g2}$ is equivalent to comparing $\mu_{g1}, \mu_{g2}$, which can be done by using a test based on the observed counts $Y_{gk}$. The goal is therefore to equalize the effective library size across samples.
12 Note that $E(Y_{gk}/N_k) = (\lambda_{gk} L_g)(A_k/S_k)$. By assuming that most genes are not DE, i.e., for most genes $\lambda_{g1} = \lambda_{g2}$, the trimmed mean of the log ratios $\left\{ M_g := \log \frac{Y_{g1}/N_1}{Y_{g2}/N_2} \right\}_g$ can be used to estimate $\log \frac{A_1/S_1}{A_2/S_2}$.
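The trimmed-mean idea can be illustrated with a small simulation in base R. This is a simplified, unweighted sketch and not the TMM procedure as implemented in edgeR; the gene lengths, abundances, the 30% trimming fraction, and the use of natural logarithms are all assumptions made for the example.

```r
set.seed(4)
G <- 2000
L <- rep(1000, G); L[1:100] <- 10000         # hypothetical gene lengths
lam1 <- rgamma(G, shape = 2)
lam1[101:200] <- 10 * lam1[101:200]
lam1 <- lam1 / sum(lam1)                     # relative abundances, sample 1

# Sample 2: move 90% of the mass of genes 101:200 onto genes 1:100;
# the remaining 1800 genes keep lambda_g1 = lambda_g2 (not DE).
lam2 <- lam1
moved <- 0.9 * sum(lam1[101:200])
lam2[101:200] <- 0.1 * lam1[101:200]
lam2[1:100] <- lam1[1:100] + moved * lam1[1:100] / sum(lam1[1:100])

c1 <- 1 / sum(lam1 * L)                      # A_1 / S_1
c2 <- 1 / sum(lam2 * L)                      # A_2 / S_2
y1 <- rpois(G, 1e6 * lam1 * L * c1)          # E(Y_g1) = lambda_g1 L_g (A_1/S_1) N_1
y2 <- rpois(G, 2e6 * lam2 * L * c2)
N1 <- sum(y1); N2 <- sum(y2)                 # observed library sizes

ok <- y1 > 0 & y2 > 0                        # log ratios need positive counts
M <- log((y1[ok] / N1) / (y2[ok] / N2))      # M_g
est <- mean(M, trim = 0.3)                   # trimmed mean of the log ratios
truth <- log(c1 / c2)                        # log of (A_1/S_1) / (A_2/S_2)
c(estimate = est, truth = truth)
```

The DE genes fall in the tails of the distribution of $M_g$ and are discarded by the trimming, so the estimate is driven by the genes with equal relative abundance.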
13 Model expression data. Microarray data: assume a multiplicative noise model and model the log intensities as normal random variables. RNA-seq data: within a sample, it is reasonable to model the counts as Poisson random variables with means proportional to the relative RNA abundance. When comparing two samples, the R function glm() with family="poisson" can be used to fit the data; the findings are restricted to these two samples and cannot be generalized to general populations. To account for biological variation across samples, various overdispersion models are considered. Overdispersion: variance > mean. Note that for Poisson random variables, variance = mean. Commonly used overdispersion models: negative binomial, quasi-Poisson, quasi-binomial.
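A minimal sketch of the Poisson and quasi-Poisson fits for a single gene, using glm() from base R. The group sizes follow the 8 versus 7 layout of the case study, but the library sizes, the fold change, and the negative binomial generating model are assumptions made for illustration.

```r
set.seed(5)
group <- factor(rep(c("grp1", "grp2"), c(8, 7)))
lib.size <- round(runif(15, 0.8e6, 1.2e6))        # hypothetical library sizes
mu <- lib.size * 1e-4 * ifelse(group == "grp2", 2, 1)
# Simulated counts for one gene with biological variation (NB, dispersion 0.2)
y <- rnbinom(15, mu = mu, size = 5)

# Poisson GLM with the log library size as offset
fit.pois  <- glm(y ~ group + offset(log(lib.size)), family = poisson)
# Quasi-Poisson: same mean model, variance = phi * mean
fit.quasi <- glm(y ~ group + offset(log(lib.size)), family = quasipoisson)

phi.hat <- summary(fit.quasi)$dispersion          # > 1 indicates overdispersion
se.pois  <- summary(fit.pois)$coefficients[, "Std. Error"]
se.quasi <- summary(fit.quasi)$coefficients[, "Std. Error"]
rbind(se.pois, se.quasi)
```

The two fits give the same coefficient estimates; the quasi-Poisson standard errors are the Poisson ones multiplied by the square root of the estimated dispersion.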
16 Cautions. The Poisson model is based on the assumption that reads are randomly and independently distributed. This may not be true due to various reasons such as random hexamer priming, GC content bias. Ref: Kasper D. Hansen, Steven E. Brenner, Sandrine Dudoit. Biases in Illumina transcriptome sequencing caused by random hexamer priming. Nucleic Acids Research, Vol. 38, No. 12. (01 July 2010), pp. e131-e131; Davide Risso, Katja Schwartz, Gavin Sherlock and Sandrine Dudoit. GC-Content Normalization for RNA-Seq Data. BMC Bioinformatics 2011, 12:480. Corrections and normalizations may be necessary depending on the goal of the study. Underdispersion is sometimes observed. Quasi-Poisson model can deal with both overdispersion and underdispersion. Negative binomial model can only model overdispersion.
17 Differentially expressed genes. Microarray: (moderated) t-tests based on log intensities. RNA-seq: likelihood ratio tests or exact tests based on counts. Permutation tests, rank tests, empirical Bayes methods, etc. Multiple comparison adjustment: based on p-values. Control the familywise error rate (FWER): Bonferroni, Holm, etc. Control the false discovery rate (FDR): Benjamini & Hochberg (BH), Benjamini & Yekutieli (2001) (BY), etc. R function p.adjust. Other variants of FDR: R package locfdr, R package qvalue.
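The adjustment methods named on this slide are all available through p.adjust in base R. The sketch below applies them to simulated p-values; the mixture of 900 null and 100 non-null p-values is an assumption made for the example.

```r
set.seed(6)
# Hypothetical p-values: 900 true nulls (uniform) and 100 signals (near zero)
p <- c(runif(900), rbeta(100, 0.5, 20))

methods <- c("bonferroni", "holm", "BH", "BY")
adj <- sapply(methods, function(m) p.adjust(p, method = m))

# Number of genes called at level 0.05 under each method
n.called <- colSums(adj <= 0.05)
n.called
```

On the same p-values, Holm calls at least as many genes as Bonferroni, and BH calls at least as many as BY.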
18 R packages. Microarray: affy, limma, etc. RNA-seq: DESeq, edgeR, glm, etc. Bioconductor package edgeR: based on negative binomial models: $Y \sim NB(\mu, \phi)$, $E(Y) = \mu$, $Var(Y) = \mu(1 + \mu\phi)$ $(\mu > 0, \phi > 0)$. To account for small sample sizes, as is typical in RNA-seq studies, edgeR also utilizes empirical Bayes ideas to pool information across genes. Ref: Mark D Robinson, Davis J McCarthy, and Gordon K Smyth. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics, 26(1):139-40, 2010; M. D Robinson and G. K Smyth. Moderated statistical tests for assessing differences in tag abundance. Bioinformatics, 23(21): , 2007; M. D Robinson and G. K Smyth. Small-sample estimation of negative binomial dispersion, with applications to SAGE data. Biostatistics, 9(2): , 2008.
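The mean-variance relation of the negative binomial model can be checked numerically with base R, where rnbinom takes the mean `mu` and `size`, the reciprocal of the dispersion. The values of `mu` and `phi` below are arbitrary choices for illustration.

```r
set.seed(7)
mu  <- 50      # assumed mean
phi <- 0.2     # assumed dispersion

# rnbinom(mu, size) has variance mu + mu^2 / size, so size = 1 / phi
y <- rnbinom(1e5, mu = mu, size = 1 / phi)

theory.var <- mu * (1 + mu * phi)      # Var(Y) = mu (1 + mu * phi)
c(mean = mean(y), var = var(y), theory.var = theory.var)
```

As phi approaches zero the variance approaches mu, which recovers the Poisson case.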
19 A case study. An RNA-seq data set with two groups: grp1 with eight replicates, grp2 with seven replicates. Data exploration. Data matrix: rows are genes, columns are samples.
> dim(counts)
[Table preview of the count matrix: columns geneid, grp1 sample1, grp1 sample2, grp1 sample3, ...]
Library size:
> barplot(colSums(counts))
Filtering:
> allzero=(rowSums(counts)==0); counts=counts[!allzero,]; dim(counts)
Clustering of samples: are samples from the same group clustered together?
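The question of whether samples from the same group cluster together can be explored with hierarchical clustering in base R. The sketch below uses simulated counts with the same 8 versus 7 layout; the simulated group effect, the log2 counts-per-million transform, and average linkage are assumptions made for illustration.

```r
set.seed(9)
G <- 500
base <- rgamma(G, shape = 2, scale = 50)       # hypothetical baseline expression
fc <- rep(1, G); fc[1:100] <- 8                # assumed group effect on 100 genes
grp <- rep(c("grp1", "grp2"), c(8, 7))
counts <- sapply(grp, function(g) rpois(G, base * (if (g == "grp2") fc else 1)))
colnames(counts) <- paste(grp, "sample", c(1:8, 1:7))

# log2 counts per million (pseudo-count of 1 avoids log(0))
logcpm <- log2(t(t(counts) / colSums(counts)) * 1e6 + 1)

d  <- dist(t(logcpm))                          # Euclidean distances between samples
hc <- hclust(d, method = "average")
cl <- cutree(hc, k = 2)                        # cut the tree into two clusters
table(group = grp, cluster = cl)
# plot(hc)   # dendrogram of the samples
```

A sample that lands in the other group's cluster is a candidate for closer inspection before fitting any DE model.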
21
> library(edgeR)
> group=factor(c(rep(1,8), rep(2,7)))
> d=DGEList(counts=counts, group=group)
> d$samples$lib.size
> plotMDS(d)
[Figure: MDS plot of the samples, Dimension 1 (x-axis) vs. Dimension 2 (y-axis), with points labeled grp1 sample 1 to 8 and grp2 sample 1 to 7]
22 Normalization and MA plots.
> d=calcNormFactors(d,method="TMM")
> samp1="grp1-sample 7"; samp2="grp2-sample 5"
> maPlot(d$counts[,samp1],d$counts[,samp2],normalize=TRUE,
+ lowess=TRUE, ylim=c(-8,8),pch=19, cex=0.4)
> abline(h=0, lty=2)
> eff.libsize=d$samples$lib.size*d$samples$norm.factors
> names(eff.libsize)=colnames(d$counts)
> maPlot(d$counts[,samp1]/eff.libsize[samp1],
+ d$counts[,samp2]/eff.libsize[samp2],normalize=FALSE,
+ lowess=TRUE, ylim=c(-8,8),pch=19, cex=0.4)
> abline(h=0, lty=2)
24 Two-group comparison and gene calling. Estimate dispersion parameters and plot the genewise biological coefficient of variation (square root of dispersion) against gene abundance (in log2 counts per million).
> d=estimateCommonDisp(d, verbose=TRUE)
> d$common.dispersion
> d=estimateTagwiseDisp(d,prior.n=getPriorN(d))
> plotBCV(d)
26 Exact test and gene calling.
> et=exactTest(d,pair=1:2,dispersion="tagwise",
+ rejection.region="doubletail",big.count=900)
> topTags(et,n=100, adjust.method="BY")
> de=decideTestsDGE(et, adjust.method="BY",
+ p.value=0.05)
> summary(de)
FDR method BY takes into account dependency and is more conservative than method BH. Draw a smear plot of log concentration vs. log fold-change: find DE genes that are both statistically significant and practically significant.
> plotSmear(et,
+ de.tags=rownames(et$table)[as.logical(de)])
28 Look at the p-value distribution. Histogram:
> hist(et$table$PValue, breaks=50,xlab="pvalue")
Observe an unusually high bar at p-values close to one. Examine log p-value vs. log-concentration/log-cpm: this bar comes primarily from genes with a small number of counts. Use a threshold (e.g., 10) on the total number of counts across samples to filter out low-count genes. A similar phenomenon occurs when analyzing exon sequence data in GWAS studies.
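The excess of p-values near one for low-count genes can be reproduced with a small base R simulation. As a stand-in for the edgeR exact test, this sketch uses a conditional binomial test for two Poisson samples with equal library sizes; the simulated expression levels and this choice of test are assumptions made for illustration.

```r
set.seed(8)
G <- 3000
mu <- rgamma(G, shape = 0.3, scale = 30)   # many lowly expressed genes
y1 <- rpois(G, mu)                         # no gene is truly DE
y2 <- rpois(G, mu)
total <- y1 + y2

# Conditional on the total, y1 ~ Binomial(total, 0.5) under the null
tested <- total > 0
pval <- rep(NA_real_, G)
pval[tested] <- mapply(function(a, n) binom.test(a, n, p = 0.5)$p.value,
                       y1[tested], total[tested])

frac.one.all <- mean(pval[tested] > 0.99)  # share of p-values near one, all tested genes
keep <- total >= 10                        # filter on total count, as on the slide
frac.one.filt <- mean(pval[keep] > 0.99)   # share after filtering
c(all = frac.one.all, filtered = frac.one.filt)
# hist(pval[tested], breaks = 50); hist(pval[keep], breaks = 50)
```

With very few reads the test statistic can take only a handful of values, so many genes receive a p-value of exactly one; filtering on the total count removes most of them.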
29 [Figure: histogram of p-values; x-axis: p-value, y-axis: frequency]
31 [Figure: four histograms of p-values (x-axis: p-value, y-axis: frequency). Panels: histogram of p-values; genes with at least 10 total counts: 84% of genes pass; genes with at least 20 total counts: 79% of genes pass; genes with at least 40 total counts: 73% of genes pass]
32 Summary Explore data by graphs and numerical summaries. Examine normalization by MA plots. Filter out genes with small counts. Look at both p-values and fold change for significant genes.
Online Supplement to Polygenic Influence on Educational Attainment Construction of Polygenic Score for Educational Attainment Genotyping was conducted with the Illumina HumanOmni1-Quad v1 platform using
Introduction To Real Time Quantitative PCR (qpcr)
Introduction To Real Time Quantitative PCR (qpcr) SABiosciences, A QIAGEN Company www.sabiosciences.com The Seminar Topics The advantages of qpcr versus conventional PCR Work flow & applications Factors
Next Generation Sequencing: Adjusting to Big Data. Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013
Next Generation Sequencing: Adjusting to Big Data Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013 Outline Human Genome Project Next-Generation Sequencing Personalized Medicine
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics
Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2015 Examinations Aim The aim of the Probability and Mathematical Statistics subject is to provide a grounding in
Next generation DNA sequencing technologies. theory & prac-ce
Next generation DNA sequencing technologies theory & prac-ce Outline Next- Genera-on sequencing (NGS) technologies overview NGS applica-ons NGS workflow: data collec-on and processing the exome sequencing
RNAseq / ChipSeq / Methylseq and personalized genomics
RNAseq / ChipSeq / Methylseq and personalized genomics 7711 Lecture Subhajyo) De, PhD Division of Biomedical Informa)cs and Personalized Biomedicine, Department of Medicine University of Colorado School
MIC - Detecting Novel Associations in Large Data Sets. by Nico Güttler, Andreas Ströhlein and Matt Huska
MIC - Detecting Novel Associations in Large Data Sets by Nico Güttler, Andreas Ströhlein and Matt Huska Outline Motivation Method Results Criticism Conclusions Motivation - Goal Determine important undiscovered
Penalized Logistic Regression and Classification of Microarray Data
Penalized Logistic Regression and Classification of Microarray Data Milan, May 2003 Anestis Antoniadis Laboratoire IMAG-LMC University Joseph Fourier Grenoble, France Penalized Logistic Regression andclassification
