Statistical challenges in RNA-Seq data analysis
|
|
|
- Roy Tyler
- 10 years ago
- Views:
Transcription
1 Statistical challenges in RNA-Seq data analysis Julie Aubert UMR 518 AgroParisTech-INRA Mathématiques et Informatique Appliquées ETGE, Aussois, 2012 April 26 J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April 26 1 / 65
2 Outline 1 Introduction 2 Experimental design 3 Normalization 4 Differential analysis 5 Conclusions 6 R and Bioconductor J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April 26 2 / 65
3 Introduction A statistical model : what for? Aim of an experiment : answer to a biological question Results of an experiment : (numerous, numerical) measurements Model : mathematical formula that relates the experimental conditions and the observed measurements (response) (Statistical) modelling : translating a biological question into a mathematical model ( PIPELINE!) Statistical model : mathematical formula involving the experimental conditions, the biological response, the parameters that describe the influence of the conditions on the (mean, theoretical) response, and a description of the (technical, biological) variability. J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April 26 3 / 65
4 Introduction Biological question Experimental design Sequencing experiment Low-level analysis Exploratory Data Analysis, image analysis, base calling, read mapping, metadata integration Higher-level analysis Biological validation and interprétation Exploratory Data Analysis, normalization and expression quantification, differential analysis, metadata integration *Adapted from S. Dudoit, Berkeley J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April 26 4 / 65
5 Experimental design Experimental Design Basic principles - Fisher (1935) (technical and biological) replications Randomization Blocking Application to NGS Identify controllable biases / technical specificities lane effect< run effect< library prep effect<<biological effect [Marioni et al 2008, Bullard et al 2010] Increase biological replications! Sequencing technology does not eliminate biological variability, Correspondence Nature Biotechnology (July 2011) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April 26 5 / 65
6 Experimental design Experimental Design A good design is a list of experiments to conduct in order to answer to the asked question which maximize collected information and minimize experiments cost with respect to constraints. RNA-seq Which genes or transcripts are expressed? differentially expressed between several conditions? J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April 26 6 / 65
7 Experimental design Experimental Design Technical choices Choice of sequencing technology, type of reads (paired-end?), type of sequencing (directional?), library preparation protocol Sequencing depth Barcoding (attaching a known sequence of nucleotides to the 3 ends of the NGS technology adapter sequences identifing a sample) or not Pooling* of barcoded sample for a simultaneous sequencing and number of samples. Technical challenge : combining approximately equal ratios of cdna preparations to achieve approximately similar depths of sequencing for all samples J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April 26 7 / 65
8 Experimental design Experimental Design Replicate number and sample allocation to runs/lanes Technical vs biological replicates Increasing the number of bio. replicates increases the precision and generalizability of the results Technical variability => inconsistent detection of exons at low levels of coverage (<5reads per nucleotide) (McIntyre et al. 2011) Doing technical replication may be important in studies where low abundant mrnas are the focus. Guidelines from the Encyclopedia of DNA Elements (ENCODE) consortium (June 2011) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April 26 8 / 65
9 Experimental design Library Preparation Protocols J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April 26 9 / 65
10 RNA-sequencing Experimental design Random fragmentation Reverse transcription _ mrnas from a sample Fragmented mrnas cdnas PCR amplification & sequencing mapping # of reads mapped to Gene 1 Gene 2 Gene m counting from Gene 19 from Gene 23 from Gene 56 ATTGCC... GCTAAC... AGCCTC... A vector of counts mapped reads A list of reads Adapted from Li et al. (2011) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
11 Experimental design Sequencing Mapping Counts Fragment expression quantification A real challenge, more important for eucaryots (splicing). Trapnell et al. 2011, Garber et al. 2011, Pachter 2011 J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
12 Experimental design Exploratory data analysis Conducting data analysis is like drinking a fine wine. It is important to swirl and sniff the wine, to unpack the complex bouquet and to appreciate the experience. Gulping the wine doesn t work. (Daniel B. Wright ) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
13 Normalization Normalization Definition Normalization is a process designed to identify and correct technical biases removing the least possible biological signal. This step is technology and platform-dependant. Within-lane normalization Normalisation enabling comparisons of fragments (genes) from a same sample. No need in a differential analysis context. Between-lane normalization Normalisation enabling comparisons of fragments (genes) from different samples. J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
14 Normalization Sources of variability Within-sample Gene length Sequence composition (GC content) Between-sample Depth (total number of sequenced and mapped reads) Sampling bias in library construction? Presence of majority fragments Sequence composition du to PCR-amplification step in library preparation (Pickrell et al. 2010, Risso et al. 2011) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
15 Normalization Length bias (Oshlack 2009, Bullard et al. 2010) At same expression level, a long transcript will have more reads than a shorter transcript. Number of reads expression level µ = E(X) = cnl = Var(X) Test : X mesured number of reads in a library mapping a specific transcript, Poisson r.v. c proportionnality constant N total number of transcripts L gene length t = X 1 X 2 (cn1 L + cn 2 L) Power of test depends on a parameter prop. to (L). Identical result after normalization by gene length (but out of the Poisson framework). J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
16 Normalization StatOmique workshop At lot of different normalization methods... Some are part of models for DE, others are stand-alone They do not rely on similar hypotheses But all of them claim to remove technical bias associated with RNA-seq data Which one is the best? How to and on which criteria choice a normalisation adapted to our experiment? What impact of the bioinformatics, normalisation step or differential analysis method on lists of DE genes? J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
17 Normalization Global normalisation methods Normalised counts are raw counts divided by a scaling factor calculated for each sample Distribution adjustment Assumption (TC, UQ, Median) : read counts are prop. to expression level and sequencing depth Total number of reads : TC (Marioni et al. 2008) Quantile : FQ (Robinson and Smyth 2008) Upper Quartile : UQ (Bullard et al. 2010) Median Method taking length into account Reads Per KiloBase Per Million Mapped : RPKM (Mortazavi et al. 2008) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
18 Notations Normalization x ij : number of reads for gene i in sample j J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
19 Normalization Notations x ij : number of reads for gene i in sample j N j : number of reads in sample j (library size of sample j) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
20 Normalization Notations x ij : number of reads for gene i in sample j N j : number of reads in sample j (library size of sample j) n : number of samples in the experiment J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
21 Normalization Notations x ij : number of reads for gene i in sample j N j : number of reads in sample j (library size of sample j) n : number of samples in the experiment ŝ j : normalization factor associated with sample j J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
22 Normalization Notations x ij : number of reads for gene i in sample j N j : number of reads in sample j (library size of sample j) n : number of samples in the experiment ŝ j : normalization factor associated with sample j L i : length of gene i J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
23 Normalization Total read count normalization (TC) (Marioni et al. 2008) Adjust for lane sequencing depth (library size) Motivation greater lane sequencing depth => greater counts whatever the transcript length and the expression level Assumption read counts are proportional to expression level and sequencing depth (same RNAs in equal proportion) Method divide transcript read count by total number of reads x ij N j = x ij ŝ j, ŝ j = N j (1) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
24 Normalization Problem makes frequencies comparable between lanes, not read counts Solution rescale scaling factors so that their sum across lanes is equal to, e.g., the number of lanes Makes normalization procedures comparable ŝ j = 1 n N j l N l (2) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
25 Normalization Upper Quartile normalization UQ (Bullard et al. 2010) Adjust for lane sequencing depth Motivation total read count is strongly dependent on a few highly expressed transcripts Assumption read counts are proportional to expression level and sequencing depth Method divide transcript read count by, e.g., upper quartile x ij, ŝ j = Q3 j ŝ 1 j n l Q3 l Q3 is computed after exclusion of transcripts with no read count (3) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
26 Normalization Other quantile-based normalizations Median Adjust for lane sequencing depth x ij, ŝ j = median j ŝ 1 j l median l n (4) Full quantile normalization FQ (Robinson and Smyth 2008) Adjust for differences in count distribution (inspired from the microarray literature) Assumption read counts have identical distribution across lanes Method all quantiles of the count distributions are matched between lanes J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
27 Normalization RPKM normalization Reads Per Kilobase per Million mapped reads Adjust for lane sequencing depth (library size) and gene length Motivation greater lane sequencing depth and gene length => greater counts whatever the expression level Assumption read counts are proportional to expression level, gene length and sequencing depth (same RNAs in equal proportion) Method divide gene read count by total number of reads (in million) and gene length (in kilobase) x ij N j L i (5) Allows to compare expression levels between genes of the same sample Unbiased estimation of number of reads but affect the variance. (Oshlack et al. 2009) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
28 Normalization The Effective Library Size concept Motivation Different biological conditions express different RNA repertoires, leading to different total amounts of RNA Assumption A majority of transcripts is not differentially expressed Aim Minimizing effect of (very) majority sequences J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
29 Normalization Methods based on the Effective Library Size Concept Trimmed Mean of M-values Robinson et al (edger) Filter on transcripts with nul counts, on the resp. 30% and 5% more extreme M i = log2( Y ik /N k ) and A values Y ik /N k Hyp : We may not estimate the total ARN production in one condition but we may estimate a global expression change between two conditions from non extreme M i distribution. Calculation of a scaling factor between two conditions and normalization to avoid dependance on a reference sample Anders and Huber Package DESeq ŝ j = median i( k ij : number of reads in sample j assigned to gene i k ij ) (πv=1 m kiv )1/m denominator : pseudo-reference sample : geometric mean across samples J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
30 Normalization 4 real datasets and one simulated dataset RNA-seq or mirna-seq, DE, at least 2 conditions, at least 2 bio. rep., no tech. rep. J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
31 Normalization Comparison procedures Distribution and properties of normalized datasets Boxplots, variability between biological replicates Comparison of DE genes Differential analysis : DESeq v1.6.1 (Anders and Huber 2010), default param. Number of common DE genes, similarity between list of genes (dendrogram - binary distance and Ward linkage) Power and control of the Type-I error rate simulated data non equivalent library sizes presence of majority genes J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
32 Normalization Normalized data distribution When large diff. in lib. size, TC and RPKM do not improve over the raw counts. Example : Mus musculus dataset J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
33 Normalization Within-condition variability Example : Mus musculus, condition D dataset J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
34 Number of DE genes Normalization DESeq v1.6.0, default parameters Input data : raw counts + scaling factors ŝ j (except RPKM) RPKM : normalized data non rounded and normalization parameter ŝ j = 1 Example : E. histolytica dataset, common genes J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
35 Normalization Consensus dendrogram - Ward linkage algo. Consensus matrice : Mean of the distance matrices obtained from each dataset J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
36 Normalization Type-I Error Rate and Power (Simulated data) Inflated FP rate for all the methods except TMM and DESeq J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
37 So the Winner is...? Normalization In most cases The methods yield similar results However... Differences appear based on data characteristics J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
38 So the Winner is...? Normalization In most cases The methods yield similar results However... Differences appear based on data characteristics J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
39 Interpretation Normalization RawCount Often fewer differential expressed genes (A. fumigatus : no DE gene) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
40 Interpretation Normalization RawCount Often fewer differential expressed genes (A. fumigatus : no DE gene) TC, RPKM Sensitive to the presence of majority genes Less effective stabilization of distributions Ineffective (similar to RawCount) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
41 Interpretation Normalization RawCount Often fewer differential expressed genes (A. fumigatus : no DE gene) TC, RPKM Sensitive to the presence of majority genes Less effective stabilization of distributions Ineffective (similar to RawCount) FQ Can increase between group variance Is based on an very (too) strong assumption (similar distributions) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
42 Interpretation Normalization RawCount Often fewer differential expressed genes (A. fumigatus : no DE gene) TC, RPKM Sensitive to the presence of majority genes Less effective stabilization of distributions Ineffective (similar to RawCount) FQ Can increase between group variance Is based on an very (too) strong assumption (similar distributions) Median High variability of housekeeping genes J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
43 Normalization Interpretation RawCount Often fewer differential expressed genes (A. fumigatus : no DE gene) TC, RPKM Sensitive to the presence of majority genes Less effective stabilization of distributions Ineffective (similar to RawCount) FQ Can increase between group variance Is based on an very (too) strong assumption (similar distributions) Median High variability of housekeeping genes TC, RPKM, FQ, Med, UQ Adjustment of distributions, implies a similarity between RNA repertoires expressed J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
44 Normalization Conclusions on normalization RNA-seq data are affected by technical biaises (total number of mapped reads per lane, gene length, composition bias) Csq1 : non-uniformity of the distribution of reads along the genome Csq2 : technical variability within and between-sample A normalization is needed and has a great impact on the DE genes (Bullard et al 2010), StatOmique Detection of differential expression in RNA-seq data is inherently biased (more power to detect DE of longer genes) Do not normalise by gene length in a context of differential analysis. J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
45 Normalization Conclusions on StatOmique workshop Hypothesis : the majority of genes is invariant between two samples. Differences between methods when presence of majority sequences, very different library depths. TMM and DESeq : performant and robust methods in a DE analysis context on the gene scale. Normalisation is necessary and not trivial. Perspectives Application to transcripts : estimation count : to do. Comparison of differential analysis methods (in progress) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
46 Differential analysis Differential analysis Aim : To detect differentially expressed genes between two conditions Discrete quantitative data Few replicates Overdispersion problem Challenge : method which takes into account overdispersion and few number of replicates Proposed methods : edger, DESeq for the most used and known An abundant littérature Few Comparison of methods : Pachter et al. (2011), Kvam et Liu (2012) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
47 Differential analysis Differential analysis gene-by-gene- with replicates For each gene i Is there a significant difference in expression between condition A and B? Statistical model (definition and parameter estimation) - Generalized linear framework Test : Equality of relative abundance of gene i in condition A and B vs non-equality Let be Y ijk the count for replicate j in condition k from gene i Y ijk follows a Poisson distribution (µ ijk ). µ ijk m jk = exp(α i + β ij ) <=> log( µ ijk m jk ) = α i + β ij Property : Var(Y ijk ) = Mean(Y ijk ) = µ ijk In case of overdispersion, of the type I error rate (prob. to declare incorrectly a gene DE). J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
48 Differential analysis Alternative : Negative Binomial Models A supplementary dispersion parameter φ to model the variance How to obtain good estimates of overdispersion parameter for each gene? 1 Pseudo-likelihood (goodness-of-fit statistic) 2 Quasi-likelihood (deviance statistic) 3 Adjusted likelihood Many genes, very few biological samples - so difficult to estimate φ on a gene-by-gene basis J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
49 Differential analysis φ estimation - Some proposed solutions Method Variance Reference DESeq µ(1 + φ µ µ) Anders et Huber (2010) edger µ(1 + φµ) Robinson et Smyth (2009) NBPseq µ(1 + φµ α 1 ) Di et al. (2011) edger : borrow information across genes for stable estimates of φ : 3 ways to estimate φ (common, trend, moderated) DESeq : estimation of the relationship between mean and variance using a local or parametric regression NBPseq : φ and α estimated by LM based on all the genes. J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
50 Differential analysis Negative Binomial Models µ ijk = λ ij m jk where m j k : size factor (library size) Test : H 0i : λ ia = λ ib vs H 1i : λ ia λib edger DESeq Adjust observed counts up or down depending on whether library sizes are below or above the geometric mean => Creates approximately identically distributed pseudodata Estimation of φ i by conditional ML conditioning on the TC for gene i Empirical Bayes procedure to shrink dispersions toward a consensus value An exact test analogous to Fisher s exact test but adapted to overdispersed data (Robinson and Smyth 2008) Test similar to Fisher s exact test (calculation has changed) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
51 Differential analysis Practical considerations Input Data = raw counts normalization offsets are included in the model Version matters : edger et DESeq (Bioconductor 2.9) edger TMM normalization Common dispersion must be estimated before tagwise dispersions GLM functionality (for experiments with multiple factors) now available DESeq Two possibilities to obtain a smooth functions fj ( ) Conservative estimates : genes are assigned the maximum of the fitted and empirical values of α i (sharingmode = "maximum") Local fit regression (as described in paper) is no longer the default Each column = independent biological replicate J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
52 Differential analysis edger or DESeq? None is perfect! results obtained by these two methods are mostly the same Robles et al edger DESeq a slightly inflated FPR from edger (small values of n or only high-counts transcripts) Performance improves as number of replicates increases conservative whatever n over-conservative behaviour when only low-counts (<100) Remark : non-parametric methods not enough detection power (Kvam and Liu 2012). J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
53 Differential analysis Conclusions on differential analysis Methods dedicated to microarrays are not applicable to RNA-seq Fisher test is adapted only in the case without replicate (do not allow inference) Poisson model is not adapted to experiments with biological replicates Negative binomial model : distinction of methods on the information sharing for modelisation of the dispersion parameter Negative binomial model not enough flexible? (how to take into account zero-inflation and heavy tail) : ZI-BN, Tweedie-Poisson Numerous methods are regularly proposed : no standard J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
54 Differential analysis Other challenges Multiple testing Usual False Discovery Rate control procedures (e.g Benjamini-Hochberg) assume accurate pvalues (not based on asymptotic distribution of test statistics). The non-respect of this hypothesis may conduct to strong control of the FDR. (Li et Tibshirani 2011) After differential analysis... Les tests d enrichissement de catégorie fonctionnelle supposent que les gènes ont la même chance DE. Or parfois sur-détection des gènes les plus long et les plus exprimés. GOSeq (Young et al. 2011) Prediction Prise en compte du fléau de la dimension p«n J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
55 Conclusions General conclusions and perspectives Pratical conclusions Need to collaborate between biologists, bioinformaticians et statisticians and in a ideal world since the project construction Adaptation of methods and tools to the asked question (no pipeline) Validation of data quality and of data normalization Still a lot of subjects for statisticians J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
56 Conclusions Aknowledgements - StatOmique All the participants of the StatOmique workshop :M.-A. Dillies, B. Jagla, A. Rau, J. Estelle, G. Guernec, L. Jouneau, B. Schaeffer, D. Laloe, C. Hennequet-Antier, M. Jeanmougin, M. Guedj, N. Servant, C. Keime, D. Castel, S. Le Crom, F. Jaffrezic, G. Marot, C. Le Gall, D. Charif The biologists who annotated or accepted their data be included in the study : C. Chau Hon, T. Strub, I. Davidson, G. Janbon J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
57 R and Bioconductor R est un environnement logiciel libre pour le calcul statistique et les graphiques. Open source, Open development, sous licence GNU General Public Licence. J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
58 R and Bioconductor Installer R et ses extensions R est disponible sous forme d exécutables pré-compilés. Il est téléchargeable via un réseau mondial de sites miroirs : le CRAN (Comprehensive R Archive Network). Installer un package supplémentaire setrepositories() install.packages( monpackage ) Charger un package library(monpackage) Rq : attention aux dépendances entre les packages. J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
59 A quoi ça ressemble? R and Bioconductor En ligne de commance ou avec interface graphique. J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
60 R and Bioconductor R un langage de programmation R est un langage interprété et non compilé. Utiliser R consiste à manipuler des objets (données ou fonctions). Un objet est défini par : un type : vecteur, tableau, matrice, liste, etc. un mode : numerique, caractère, complexe, logique. une taille : nombre d éléments contenus dans l objet. une valeur : NA si manquant, NaN, -Inf, Inf. un nom Tests, boucles, fonctions, classes d objets etc. J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
61 R and Bioconductor Chercher de l aide help.start() : affiche un menu d aide help(mafonction) : affiche la description de mafonction?mafonction : affiche la description de mafonction help.search( motclef ) : donne la liste des commandes dans lesquelles apparait motclef apropos( motclef ) : affiche toutes les fonctions dont le nom contient motclef Vignettes : certains packages disposent de Vignettes (guide utilisateur). Remarque : on peut utiliser des expressions régulières (?regexp) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
62 R and Bioconductor Quelques fonctions utiles getwd() : affiche le répertoire de travail setwd( CheminMonRepertoire ) : spécifier son répertoire de travail list.files() : quels fichiers dans mon répertoire de travail? ls() : quels objets dans ma session R? load( monanciennesession.rdata ) : charger un environnement save() : sauver un environnement. J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
63 R and Bioconductor J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
64 R and Bioconductor Bioconductor Projet dédié à l analyse de données génomiques à haut débit sous R. pour les microarrays, les données de séquences, l annotation, high-throughput assays 2 versions par an suite à chaque nouvelle version de R. Actuellement : v2.9 (R 2.15) Installation 516 bibliothèques de fonctions (libraries) pour l analyse et la réprésentation graphique plus de 500 bibliothèques d annotation source("http ://bioconductor.org/bioclite.r") bioclite( monpackage ) bioclite(c("pkg1", "pkg2")) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
65 R and Bioconductor Bioconductor pour l analyse de données NGS Différents types de données gérés : ChIPseq, RNAseq, reséquençage Différents types de fichiers gérés : Fasta, fastaq, BAM, GFF, BED, WIG Interface avec Sequence Read Archive : SRAdb J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
66 R and Bioconductor Bioconductor pour l analyse de données NGS Différents types d analyse Manipulation des séquences, des génomes ou régions génomiques Biostrings, IRanges, GenomicRanges, genomeintervals Analyse qualité, prétraitement ShortRead, Rsamtools ChIPseq : Recherche et annotation des pics, recherche de motifs chipseq, chipseqr, ChIPpeakAnno, BayesPeak, PICS, CSAR, ChIPsim, rgadem RNAseq : Recherche de gènes différentiellement exprimés, d évènements d épissage alternatif DESeq, DEXSeq, edger, bayseq, DEGseq, etc. Reséquençage ciblé : analyse de l enrichissement : TEQC from C. Keime J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
67 R and Bioconductor First commands Installation des packages : source("http :// bioclite(c("deseq", "edger")) Chargement des packages : library(deseq) library(edger) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
68 R and Bioconductor edger main commands generate raw counts from NB, create list object y <- matrix(rnbinom(80,size=1/0.2,mu=10),nrow=20,ncol=4) rownames(y) <- paste("gene",1 :nrow(y),sep=".") group <- factor(c(1,1,2,2)) perform DA with edger y <- DGEList(counts=y,group=group) y <- calcnormfactors(y,method="tmm") y <- estimatecommondisp(y) y <- estimatetagwisedisp(y) result <- exacttest(y,dispersion="tagwise") Observe some results - DGE with FDR BH toptags(result) summary(decidetestsdge(result),p.value=0.05) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
69 DESeq main commands R and Bioconductor cds <- newcountdataset(y, group) cds <- estimatesizefactors(cds) sizefactors(cds) cds <- estimatedispersions(cds) res <- nbinomtest( cds, "1", "2" ) J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
70 R and Bioconductor Quelques références pour débuter : manuel, FAQ, RJournal, etc... cran.r-project.org/doc/contrib/paradis-rdebuts_fr.pdf G. Millot, (2009), Comprendre et réaliser les tests statistiques à l aide de R, Editions De Boeck, 704 p. J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
71 R and Bioconductor References Anders, S and Huber, W. (2010) Differential expression analysis for sequence count data, Genome Biology,11 :R106. Bullard JH, Purdom E, Hansen KD, Dudoit S. (2010) Evaluation of statistical methods for normalization and differential expression in mrna-seq experiments, BMC Bioinformatics, 11 :94 Di Y, Schaef, DW, Cumbie JS, Chang JH (2011) The NBP Negative Binomial Model for Assessing Differential Gene Expression from RNA-Seq, Statistical Applications in Genetics and Molecular Biology, 10(1), Article 24. J. Aubert () Stat. challenges RNA-Seq ETEGE, 2012 April / 65
72 Kvam V, Liu P (2012) A comparison of statistical methods for detecting differentially expressed genes from RNA-seq data Li J, Jiang H, Wong WH, (2010) Modeling non-uniformity in short read rates in RNA-Seq data, Genome Biology, 11 :R50 Li J, Witten DM, Johnstone IM, Tisbhirani R (2011) Normalization, testing, and false discovery rate estimation for RNA-sequencing data, Biostatistics, 1-16 Marioni J.C., Mason C.E. et al. (2008) RNA-seq : An assessment of technical reproducibility and comparison with gene expression arrays, Genome Research, 18 : Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. (2008) Mapping and quantifying mammalian transcriptomes by RNA-seq. Nature Methods, 5(7), Robles et al (2012) Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing, preprint Pachter J. Aubert L () (2011) ModelsStat. for challenges transcript RNA-Seq quantification ETEGE, 2012 from April / 65 References R and Bioconductor
73 Robinson MD, Oshlack A. (2010) A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biology, 11 :R25 Robinson MD and Smyth, GK. (2008) Small-sample estimation of negative binomial dispersion, with applications to SAGE data Biostatistics, 9, 2 ; Robinson MD, McCarthy DJ, Smyth, GK. (2009) edger : a Bioconductor package for differential expression analysis of digital gene expression data, Bioinformatics Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. (2011) Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation, Nature Biotechnology, 28(5) : 511?515. Young MD, Wakefiled MJ, Smyth GK., Oshlack A. (2011) Gene ontology analysis for RNA-seq :accounting for selection bias, Genome J. Aubert () Biology Stat. challenges RNA-Seq ETEGE, 2012 April / 65 References R and Bioconductor
Gene Expression Analysis
Gene Expression Analysis Jie Peng Department of Statistics University of California, Davis May 2012 RNA expression technologies High-throughput technologies to measure the expression levels of thousands
From Reads to Differentially Expressed Genes. The statistics of differential gene expression analysis using RNA-seq data
From Reads to Differentially Expressed Genes The statistics of differential gene expression analysis using RNA-seq data experimental design data collection modeling statistical testing biological heterogeneity
Introduction. Overview of Bioconductor packages for short read analysis
Overview of Bioconductor packages for short read analysis Introduction General introduction SRAdb Pseudo code (Shortread) Short overview of some packages Quality assessment Example sequencing data in Bioconductor
RNA-seq. Quantification and Differential Expression. Genomics: Lecture #12
(2) Quantification and Differential Expression Institut für Medizinische Genetik und Humangenetik Charité Universitätsmedizin Berlin Genomics: Lecture #12 Today (2) Gene Expression per Sources of bias,
Normalization of RNA-Seq
Normalization of RNA-Seq Davide Risso Modified: April 27, 2012. Compiled: April 27, 2012 1 Retrieving the data Usually, an RNA-Seq data analysis from scratch starts with a set of FASTQ files (see e.g.
Expression Quantification (I)
Expression Quantification (I) Mario Fasold, LIFE, IZBI Sequencing Technology One Illumina HiSeq 2000 run produces 2 times (paired-end) ca. 1,2 Billion reads ca. 120 GB FASTQ file RNA-seq protocol Task
FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem
FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem Elsa Bernard Laurent Jacob Julien Mairal Jean-Philippe Vert September 24, 2013 Abstract FlipFlop implements a fast method for de novo transcript
Challenges associated with analysis and storage of NGS data
Challenges associated with analysis and storage of NGS data Gabriella Rustici Research and training coordinator Functional Genomics Group [email protected] Next-generation sequencing Next-generation sequencing
edger: differential expression analysis of digital gene expression data User s Guide Yunshun Chen, Davis McCarthy, Mark Robinson, Gordon K.
edger: differential expression analysis of digital gene expression data User s Guide Yunshun Chen, Davis McCarthy, Mark Robinson, Gordon K. Smyth First edition 17 September 2008 Last revised 8 October
Frequently Asked Questions Next Generation Sequencing
Frequently Asked Questions Next Generation Sequencing Import These Frequently Asked Questions for Next Generation Sequencing are some of the more common questions our customers ask. Questions are divided
Standards, Guidelines and Best Practices for RNA-Seq V1.0 (June 2011) The ENCODE Consortium
Standards, Guidelines and Best Practices for RNA-Seq V1.0 (June 2011) The ENCODE Consortium I. Introduction: Sequence based assays of transcriptomes (RNA-seq) are in wide use because of their favorable
Practical Differential Gene Expression. Introduction
Practical Differential Gene Expression Introduction In this tutorial you will learn how to use R packages for analysis of differential expression. The dataset we use are the gene-summarized count data
Statistical analysis of modern sequencing data quality control, modelling and interpretation
Statistical analysis of modern sequencing data quality control, modelling and interpretation Jörg Rahnenführer Technische Universität Dortmund, Fakultät Statistik Email: [email protected]
Statistical issues in the analysis of microarray data
Statistical issues in the analysis of microarray data Daniel Gerhard Institute of Biostatistics Leibniz University of Hannover ESNATS Summerschool, Zermatt D. Gerhard (LUH) Analysis of microarray data
Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS)
Introduction to transcriptome analysis using High Throughput Sequencing technologies (HTS) A typical RNA Seq experiment Library construction Protocol variations Fragmentation methods RNA: nebulization,
NGS Data Analysis: An Intro to RNA-Seq
NGS Data Analysis: An Intro to RNA-Seq March 25th, 2014 GST Colloquim: March 25th, 2014 1 / 1 Workshop Design Basics of NGS Sample Prep RNA-Seq Analysis GST Colloquim: March 25th, 2014 2 / 1 Experimental
Basic processing of next-generation sequencing (NGS) data
Basic processing of next-generation sequencing (NGS) data Getting from raw sequence data to expression analysis! 1 Reminder: we are measuring expression of protein coding genes by transcript abundance
PreciseTM Whitepaper
Precise TM Whitepaper Introduction LIMITATIONS OF EXISTING RNA-SEQ METHODS Correctly designed gene expression studies require large numbers of samples, accurate results and low analysis costs. Analysis
EDASeq: Exploratory Data Analysis and Normalization for RNA-Seq
EDASeq: Exploratory Data Analysis and Normalization for RNA-Seq Davide Risso Modified: May 22, 2012. Compiled: October 14, 2013 1 Introduction In this document, we show how to conduct Exploratory Data
Comparing Methods for Identifying Transcription Factor Target Genes
Comparing Methods for Identifying Transcription Factor Target Genes Alena van Bömmel (R 3.3.73) Matthew Huska (R 3.3.18) Max Planck Institute for Molecular Genetics Folie 1 Transcriptional Regulation TF
Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
Discovery and Quantification of RNA with RNASeq Roderic Guigó Serra Centre de Regulació Genòmica (CRG) [email protected]
Bioinformatique et Séquençage Haut Débit, Discovery and Quantification of RNA with RNASeq Roderic Guigó Serra Centre de Regulació Genòmica (CRG) [email protected] 1 RNA Transcription to RNA and subsequent
Sun Management Center Change Manager 1.0.1 Release Notes
Sun Management Center Change Manager 1.0.1 Release Notes Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95054 U.S.A. Part No: 817 0891 10 May 2003 Copyright 2003 Sun Microsystems, Inc. 4150
RNAseq / ChipSeq / Methylseq and personalized genomics
RNAseq / ChipSeq / Methylseq and personalized genomics 7711 Lecture Subhajyo) De, PhD Division of Biomedical Informa)cs and Personalized Biomedicine, Department of Medicine University of Colorado School
Analysis of gene expression data. Ulf Leser and Philippe Thomas
Analysis of gene expression data Ulf Leser and Philippe Thomas This Lecture Protein synthesis Microarray Idea Technologies Applications Problems Quality control Normalization Analysis next week! Ulf Leser:
False Discovery Rates
False Discovery Rates John D. Storey Princeton University, Princeton, USA January 2010 Multiple Hypothesis Testing In hypothesis testing, statistical significance is typically based on calculations involving
Solaris 10 Documentation README
Solaris 10 Documentation README Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95054 U.S.A. Part No: 817 0550 10 January 2005 Copyright 2005 Sun Microsystems, Inc. 4150 Network Circle, Santa
N1 Grid Service Provisioning System 5.0 User s Guide for the Linux Plug-In
N1 Grid Service Provisioning System 5.0 User s Guide for the Linux Plug-In Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95054 U.S.A. Part No: 819 0735 December 2004 Copyright 2004 Sun Microsystems,
Régression logistique : introduction
Chapitre 16 Introduction à la statistique avec R Régression logistique : introduction Une variable à expliquer binaire Expliquer un risque suicidaire élevé en prison par La durée de la peine L existence
Package empiricalfdr.deseq2
Type Package Package empiricalfdr.deseq2 May 27, 2015 Title Simulation-Based False Discovery Rate in RNA-Seq Version 1.0.3 Date 2015-05-26 Author Mikhail V. Matz Maintainer Mikhail V. Matz
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 [email protected] Genomics A genome is an organism s
Tutorial for proteome data analysis using the Perseus software platform
Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information
Row Quantile Normalisation of Microarrays
Row Quantile Normalisation of Microarrays W. B. Langdon Departments of Mathematical Sciences and Biological Sciences University of Essex, CO4 3SQ Technical Report CES-484 ISSN: 1744-8050 23 June 2008 Abstract
Computational Genomics. Next generation sequencing (NGS)
Computational Genomics Next generation sequencing (NGS) Sequencing technology defies Moore s law Nature Methods 2011 Log 10 (price) Sequencing the Human Genome 2001: Human Genome Project 2.7G$, 11 years
Using Galaxy for NGS Analysis. Daniel Blankenberg Postdoctoral Research Associate The Galaxy Team http://usegalaxy.org
Using Galaxy for NGS Analysis Daniel Blankenberg Postdoctoral Research Associate The Galaxy Team http://usegalaxy.org Overview NGS Data Galaxy tools for NGS Data Galaxy for Sequencing Facilities Overview
Sun StorEdge A5000 Installation Guide
Sun StorEdge A5000 Installation Guide for Windows NT Server 4.0 Sun Microsystems, Inc. 901 San Antonio Road Palo Alto, CA 94303-4900 USA 650 960-1300 Fax 650 969-9131 Part No. 805-7273-11 October 1998,
Introduction au BIM. ESEB 38170 Seyssinet-Pariset Economie de la construction email : [email protected]
Quel est l objectif? 1 La France n est pas le seul pays impliqué 2 Une démarche obligatoire 3 Une organisation plus efficace 4 Le contexte 5 Risque d erreur INTERVENANTS : - Architecte - Économiste - Contrôleur
POB-JAVA Documentation
POB-JAVA Documentation 1 INTRODUCTION... 4 2 INSTALLING POB-JAVA... 5 Installation of the GNUARM compiler... 5 Installing the Java Development Kit... 7 Installing of POB-Java... 8 3 CONFIGURATION... 9
New Technologies for Sensitive, Low-Input RNA-Seq. Clontech Laboratories, Inc.
New Technologies for Sensitive, Low-Input RNA-Seq Clontech Laboratories, Inc. Outline Introduction Single-Cell-Capable mrna-seq Using SMART Technology SMARTer Ultra Low RNA Kit for the Fluidigm C 1 System
Statistical Analysis Strategies for Shotgun Proteomics Data
Statistical Analysis Strategies for Shotgun Proteomics Data Ming Li, Ph.D. Cancer Biostatistics Center Vanderbilt University Medical Center Ayers Institute Biomarker Pipeline normal shotgun proteome analysis
RNA-Seq Tutorial 1. John Garbe Research Informatics Support Systems, MSI March 19, 2012
RNA-Seq Tutorial 1 John Garbe Research Informatics Support Systems, MSI March 19, 2012 Tutorial 1 RNA-Seq Tutorials RNA-Seq experiment design and analysis Instruction on individual software will be provided
BIOS 6660: Analysis of Biomedical Big Data Using R and Bioconductor, Fall 2015 Computer Lab: Education 2 North Room 2201DE (TTh 10:30 to 11:50 am)
BIOS 6660: Analysis of Biomedical Big Data Using R and Bioconductor, Fall 2015 Computer Lab: Education 2 North Room 2201DE (TTh 10:30 to 11:50 am) Course Instructor: Dr. Tzu L. Phang, Assistant Professor
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012
Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts
SAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
17 July 2014 WEB-SERVER MANUAL. Contact: Michael Hackenberg ([email protected])
WEB-SERVER MANUAL Contact: Michael Hackenberg ([email protected]) 1 1 Introduction srnabench is a free web-server tool and standalone application for processing small- RNA data obtained from next generation
Il est repris ci-dessous sans aucune complétude - quelques éléments de cet article, dont il est fait des citations (texte entre guillemets).
Modélisation déclarative et sémantique, ontologies, assemblage et intégration de modèles, génération de code Declarative and semantic modelling, ontologies, model linking and integration, code generation
Next Generation Sequencing: Adjusting to Big Data. Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013
Next Generation Sequencing: Adjusting to Big Data Daniel Nicorici, Dr.Tech. Statistikot Suomen Lääketeollisuudessa 29.10.2013 Outline Human Genome Project Next-Generation Sequencing Personalized Medicine
Lectures 1 and 8 15. February 7, 2013. Genomics 2012: Repetitorium. Peter N Robinson. VL1: Next- Generation Sequencing. VL8 9: Variant Calling
Lectures 1 and 8 15 February 7, 2013 This is a review of the material from lectures 1 and 8 14. Note that the material from lecture 15 is not relevant for the final exam. Today we will go over the material
Molecular Genetics: Challenges for Statistical Practice. J.K. Lindsey
Molecular Genetics: Challenges for Statistical Practice J.K. Lindsey 1. What is a Microarray? 2. Design Questions 3. Modelling Questions 4. Longitudinal Data 5. Conclusions 1. What is a microarray? A microarray
Statistiques en grande dimension
Statistiques en grande dimension Christophe Giraud 1,2 et Tristan Mary-Huart 3,4 (1) Université Paris-Sud (2) Ecole Polytechnique (3) AgroParistech (4) INRA - Le Moulon M2 MathSV & Maths Aléa C. Giraud
FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem
FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem Elsa Bernard Laurent Jacob Julien Mairal Jean-Philippe Vert May 3, 2016 Abstract FlipFlop implements a fast method for de novo transcript
Sun StorEdge RAID Manager 6.2.21 Release Notes
Sun StorEdge RAID Manager 6.2.21 Release Notes formicrosoftwindowsnt Sun Microsystems, Inc. 901 San Antonio Road Palo Alto, CA 94303-4900 USA 650 960-1300 Fax 650 969-9131 Part No. 805-6890-11 November
Introduction. GEAL Bibliothèque Java pour écrire des algorithmes évolutionnaires. Objectifs. Simplicité Evolution et coévolution Parallélisme
GEAL 1.2 Generic Evolutionary Algorithm Library http://dpt-info.u-strasbg.fr/~blansche/fr/geal.html 1 /38 Introduction GEAL Bibliothèque Java pour écrire des algorithmes évolutionnaires Objectifs Généricité
3 The spreadsheet execution model and its consequences
Paper SP06 On the use of spreadsheets in statistical analysis Martin Gregory, Merck Serono, Darmstadt, Germany 1 Abstract While most of us use spreadsheets in our everyday work, usually for keeping track
Modifier le texte d'un élément d'un feuillet, en le spécifiant par son numéro d'index:
Bezier Curve Une courbe de "Bézier" (fondé sur "drawing object"). select polygon 1 of page 1 of layout "Feuillet 1" of document 1 set class of selection to Bezier curve select Bezier curve 1 of page 1
Integrating DNA Motif Discovery and Genome-Wide Expression Analysis. Erin M. Conlon
Integrating DNA Motif Discovery and Genome-Wide Expression Analysis Department of Mathematics and Statistics University of Massachusetts Amherst Statistics in Functional Genomics Workshop Ascona, Switzerland
Detection of water leakage using laser images from 3D laser scanning data
Detection of water leakage using laser images from 3D laser scanning data QUANHONG FENG 1, GUOJUAN WANG 2 & KENNERT RÖSHOFF 3 1 Berg Bygg Konsult (BBK) AB,Ankdammsgatan 20, SE-171 43, Solna, Sweden (e-mail:[email protected])
University of Glasgow - Programme Structure Summary C1G5-5100 MSc Bioinformatics, Polyomics and Systems Biology
University of Glasgow - Programme Structure Summary C1G5-5100 MSc Bioinformatics, Polyomics and Systems Biology Programme Structure - the MSc outcome will require 180 credits total (full-time only) - 60
Lecture 2: Descriptive Statistics and Exploratory Data Analysis
Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals
Statistics Graduate Courses
Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.
Segmentation models and applications with R
Segmentation models and applications with R Franck Picard UMR 5558 UCB CNRS LBBE, Lyon, France [email protected] http://pbil.univ-lyon1.fr/members/fpicard/ INED-28/04/11 F. Picard (CNRS-LBBE)
System Requirements Orion
Orion Date 21/12/12 Version 1.0 Référence 001 Auteur Antoine Crué VOS CONTACTS TECHNIQUES JEAN-PHILIPPE SENCKEISEN ANTOINE CRUE LIGNE DIRECTE : 01 34 93 35 33 EMAIL : [email protected] LIGNE DIRECTE
Sun Enterprise Optional Power Sequencer Installation Guide
Sun Enterprise Optional Power Sequencer Installation Guide For the Sun Enterprise 6500/5500 System Cabinet and the Sun Enterprise 68-inch Expansion Cabinet Sun Microsystems, Inc. 901 San Antonio Road Palo
Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics
Analysis and Integration of Big Data from Next-Generation Genomics, Epigenomics, and Transcriptomics Christopher Benner, PhD Director, Integrative Genomics and Bioinformatics Core (IGC) idash Webinar,
Systems Biology of Cancer: which pathways to choose? Emmanuel Barillot
Systems Biology of Cancer: which pathways to choose? Emmanuel Barillot Central questions to cancer On the clinical side: predict the phenotype On the biological side: explain tumorigenesis and tumoral
Gene expression analysis. Ulf Leser and Karin Zimmermann
Gene expression analysis Ulf Leser and Karin Zimmermann Ulf Leser: Bioinformatics, Wintersemester 2010/2011 1 Last lecture What are microarrays? - Biomolecular devices measuring the transcriptome of a
Comparing Functional Data Analysis Approach and Nonparametric Mixed-Effects Modeling Approach for Longitudinal Data Analysis
Comparing Functional Data Analysis Approach and Nonparametric Mixed-Effects Modeling Approach for Longitudinal Data Analysis Hulin Wu, PhD, Professor (with Dr. Shuang Wu) Department of Biostatistics &
Next Generation Sequencing
Next Generation Sequencing Technology and applications 10/1/2015 Jeroen Van Houdt - Genomics Core - KU Leuven - UZ Leuven 1 Landmarks in DNA sequencing 1953 Discovery of DNA double helix structure 1977
CRAC: An integrated approach to analyse RNA-seq reads Additional File 3 Results on simulated RNA-seq data.
: An integrated approach to analyse RNA-seq reads Additional File 3 Results on simulated RNA-seq data. Nicolas Philippe and Mikael Salson and Thérèse Commes and Eric Rivals February 13, 2013 1 Results
Core Facility Genomics
Core Facility Genomics versatile genome or transcriptome analyses based on quantifiable highthroughput data ascertainment 1 Topics Collaboration with Harald Binder and Clemens Kreutz Project: Microarray
Service Level Definitions and Interactions
Service Level Definitions and Interactions By Adrian Cockcroft - Enterprise Engineering Sun BluePrints OnLine - April 1999 http://www.sun.com/blueprints Sun Microsystems, Inc. 901 San Antonio Road Palo
Personnalisez votre intérieur avec les revêtements imprimés ALYOS design
Plafond tendu à froid ALYOS technology ALYOS technology vous propose un ensemble de solutions techniques pour vos intérieurs. Spécialiste dans le domaine du plafond tendu, nous avons conçu et développé
RETRIEVING SEQUENCE INFORMATION. Nucleotide sequence databases. Database search. Sequence alignment and comparison
RETRIEVING SEQUENCE INFORMATION Nucleotide sequence databases Database search Sequence alignment and comparison Biological sequence databases Originally just a storage place for sequences. Currently the
Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
Troncatures dans les modèles linéaires simples et à effets mixtes sous R
Troncatures dans les modèles linéaires simples et à effets mixtes sous R Lyon, 27 et 28 juin 2013 D.Thiam, J.C Thalabard, G.Nuel Université Paris Descartes MAP5 UMR CNRS 8145 IRD UMR 216 2èmes Rencontres
business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar
business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel
Upgrading the Solaris PC NetLink Software
Upgrading the Solaris PC NetLink Software By Don DeVitt - Enterprise Engineering Sun BluePrints OnLine - January 2000 http://www.sun.com/blueprints Sun Microsystems, Inc. 901 San Antonio Road Palo Alto,
Mir-X mirna First-Strand Synthesis Kit User Manual
User Manual Mir-X mirna First-Strand Synthesis Kit User Manual United States/Canada 800.662.2566 Asia Pacific +1.650.919.7300 Europe +33.(0)1.3904.6880 Japan +81.(0)77.543.6116 Clontech Laboratories, Inc.
Introduction ToIP/Asterisk Quelques applications Trixbox/FOP Autres distributions Conclusion. Asterisk et la ToIP. Projet tuteuré
Asterisk et la ToIP Projet tuteuré Luis Alonso Domínguez López, Romain Gegout, Quentin Hourlier, Benoit Henryon IUT Charlemagne, Licence ASRALL 2008-2009 31 mars 2009 Asterisk et la ToIP 31 mars 2009 1
Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center
Computational Challenges in Storage, Analysis and Interpretation of Next-Generation Sequencing Data Shouguo Gao Ph. D Department of Physics and Comprehensive Diabetes Center Next Generation Sequencing
From the help desk: Bootstrapped standard errors
The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution
Introduction to Bioinformatics 3. DNA editing and contig assembly
Introduction to Bioinformatics 3. DNA editing and contig assembly Benjamin F. Matthews United States Department of Agriculture Soybean Genomics and Improvement Laboratory Beltsville, MD 20708 [email protected]
Exploratory data analysis (Chapter 2) Fall 2011
Exploratory data analysis (Chapter 2) Fall 2011 Data Examples Example 1: Survey Data 1 Data collected from a Stat 371 class in Fall 2005 2 They answered questions about their: gender, major, year in school,
Advanced Software Engineering Agile Software Engineering. Version 1.0
Advanced Software Engineering Agile Software Engineering 1 Version 1.0 Basic direction A agile method has to be A method is called agile if it follows Incremental the principles of the agile manifest.
NATIONAL GENETICS REFERENCE LABORATORY (Manchester)
NATIONAL GENETICS REFERENCE LABORATORY (Manchester) MLPA analysis spreadsheets User Guide (updated October 2006) INTRODUCTION These spreadsheets are designed to assist with MLPA analysis using the kits
Protein Protein Interaction Networks
Functional Pattern Mining from Genome Scale Protein Protein Interaction Networks Young-Rae Cho, Ph.D. Assistant Professor Department of Computer Science Baylor University it My Definition of Bioinformatics
SunFDDI 6.0 on the Sun Enterprise 10000 Server
SunFDDI 6.0 on the Sun Enterprise 10000 Server Sun Microsystems, Inc. 901 San Antonio Road Palo Alto, CA 94303-4900 USA 650 960-1300 Fax 650 969-9131 Part No.: 806-3610-11 November 1999, Revision A Send
Quantitative proteomics background
Proteomics data analysis seminar Quantitative proteomics and transcriptomics of anaerobic and aerobic yeast cultures reveals post transcriptional regulation of key cellular processes de Groot, M., Daran
A Primer of Genome Science THIRD
A Primer of Genome Science THIRD EDITION GREG GIBSON-SPENCER V. MUSE North Carolina State University Sinauer Associates, Inc. Publishers Sunderland, Massachusetts USA Contents Preface xi 1 Genome Projects:
How Sequencing Experiments Fail
How Sequencing Experiments Fail v1.0 Simon Andrews [email protected] Classes of Failure Technical Tracking Library Contamination Biological Interpretation Something went wrong with a machine
Audit de sécurité avec Backtrack 5
Audit de sécurité avec Backtrack 5 DUMITRESCU Andrei EL RAOUSTI Habib Université de Versailles Saint-Quentin-En-Yvelines 24-05-2012 UVSQ - Audit de sécurité avec Backtrack 5 DUMITRESCU Andrei EL RAOUSTI
Module 1. Sequence Formats and Retrieval. Charles Steward
The Open Door Workshop Module 1 Sequence Formats and Retrieval Charles Steward 1 Aims Acquaint you with different file formats and associated annotations. Introduce different nucleotide and protein databases.
Scrubbing Disks Using the Solaris Operating Environment Format Program
Scrubbing Disks Using the Solaris Operating Environment Format Program By Rob Snevely - Enterprise Technology Center Sun BluePrints OnLine - June 2000 http://www.sun.com/blueprints Sun Microsystems, Inc.
