Supervised analysis of gene expression data
1 Supervised analysis of gene expression data. Bing Zhang, Department of Biomedical Informatics, Vanderbilt University
2 Gene expression
- Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product.
- For a specific cell at a specific time, only a subset of the genes coded in the genome are expressed.
- Transcriptional control is critical in gene expression regulation.
- Measuring the mRNA expression level can:
  - provide a good indicator of the corresponding protein expression level
  - provide insight into the mechanisms of transcriptional regulation
(Graph courtesy of Wikipedia)
3 Candidate gene approach vs high-throughput approach
[Figure: Northern blots for chalcone synthase, protein kinase, and actin at 0, 10 min, 30 min, 1 h, 3 h, 6 h, and 24 h, next to a microarray time course at 10 min, 30 min, 1 h, 3 h, 6 h, and 24 h]
- Advantages of high-throughput technologies: high throughput; exploratory analysis; relationships between genes or between samples
- Challenges in high-throughput technologies: cost; data analysis
4 High-throughput transcriptome profiling approaches
- Transcriptome: the set of all messenger RNA (mRNA) molecules, or "transcripts", produced in one cell or a population of cells.
- Hybridization-based approaches: fluorescently labeled cDNA is incubated with microarrays, and the hybridization signal is measured.
  - cDNA microarrays (printed arrays)
  - High-density oligo arrays (synthesized arrays)
- Sequencing-based approaches: the cDNA sequence is determined directly, and counts are measured.
  - Sanger sequencing of cDNA or EST libraries
  - Serial Analysis of Gene Expression (SAGE)
  - Massively Parallel Signature Sequencing (MPSS)
  - RNA-Seq
5 Microarray: two-color vs single-color
[Figure: array preparation and target preparation. Two-color arrays: insert amplification by PCR with vector-specific or gene-specific primers, printing, coupling, denaturing; cells/tissue, total RNA, first-strand cDNA synthesis, Cy3- or Cy5-labelled cDNA, mixing, hybridization, ratio Cy5/Cy3. Single-color arrays: perfect match and mismatch probe sets made by in situ synthesis by photolithography; cells/tissue, polyA+ RNA, cDNA synthesis, double-stranded cDNA (T7), in vitro transcription, biotin-labelled cRNA, hybridization, staining, ratio array 1/array 2. Source: Schulze and Downward, Nature Cell Biol, 3:E190, 2001]
Figure 1 Schematic overview of probe array and target preparation for spotted cDNA microarrays and high-density oligonucleotide microarrays.
a, cDNA microarrays. Array preparation: inserts from cDNA collections or libraries (such as IMAGE libraries) are amplified using either vector-specific or gene-specific primers. PCR products are printed at specified sites on glass slides using high-precision arraying robots. Through the use of chemical linkers, selective covalent attachment of the coding strand to the glass surface can be achieved. Target preparation: RNA from ...
b, High-density oligonucleotide microarrays. Array preparation: sequences of short oligonucleotides (typically 25mers) are chosen from the mRNA reference sequence of each gene, often representing the most unique part of the transcript in the 5′-untranslated region. Light-directed, in situ oligonucleotide synthesis is used to generate high-density probe arrays containing over 300,000 individual elements. Target preparation: polyA+ RNA from different tissues or cell populations is used to ...
... intensities and ratios of mRNA abundance for the genes represented on the array.
Applied Bioinformatics, Spring 2011
6 Overall workflow of a microarray study Biological question Experiment design Microarray experiment Image analysis Pre-processing Data Analysis Experimental verification Hypothesis
7 Data matrix
Rows are genes (probe sets); columns are samples.
[Table: columns Probe_set_id, HNE0_1, HNE0_2, HNE0_3, HNE60_1, HNE60_2, HNE60_3; rows 1007_s_at, 1053_at, 117_at, 121_at, 1255_g_at, 1294_at, 1316_at, 1320_at, 1405_i_at, 1431_at, 1438_at, 1487_at, 1494_f_at, 1552256_a_at, 1552257_a_at; the numeric expression values are not recoverable from this transcript]
8 Three major goals of gene expression studies
- Class comparison (supervised analysis), e.g. disease biomarker discovery
  - Method: differential expression analysis
  - Input: gene expression data, class labels of the samples
  - Output: differentially expressed genes
- Class detection (unsupervised analysis), e.g. patient subgroup detection
  - Method: clustering analysis
  - Input: gene expression data
  - Output: groups of similar samples or genes
- Class prediction (supervised learning), e.g. disease diagnosis and prognosis
  - Method: machine learning techniques
  - Input: gene expression data, class labels of the samples (training data)
  - Output: prediction model
9 Data preprocessing I: missing value imputation
- Replace with zeros: replace all missing values with 0
- Replace with row averages: replace missing values with the mean of the available values in each row (gene)
- KNN imputation: estimate missing values via K-nearest neighbors analysis
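The three imputation strategies on this slide can be sketched in plain Python. This is a minimal illustration, not the method of any particular package: the function names, the use of `None` for missing values, the root-mean-square distance, and the toy matrix are all assumptions made for the example.

```python
import math

def impute_zero(matrix):
    """Replace every missing value (None) with 0."""
    return [[0.0 if v is None else v for v in row] for row in matrix]

def impute_row_mean(matrix):
    """Replace missing values with the mean of the observed values in the same row (gene)."""
    out = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        out.append([mean if v is None else v for v in row])
    return out

def impute_knn(matrix, k=2):
    """Fill each missing value with the average of that sample's value in the
    k most similar genes (assumed distance: RMS difference over shared samples)."""
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    out = [list(row) for row in matrix]
    for i, row in enumerate(matrix):
        for j, v in enumerate(row):
            if v is not None:
                continue
            # Candidate neighbours: other genes that were measured in sample j.
            candidates = [(distance(row, other), other[j])
                          for n, other in enumerate(matrix)
                          if n != i and other[j] is not None]
            candidates.sort(key=lambda c: c[0])
            nearest = [val for d, val in candidates[:k] if d != math.inf]
            if nearest:  # otherwise the value stays missing
                out[i][j] = sum(nearest) / len(nearest)
    return out

# Toy log-expression matrix (made-up values): rows are genes, columns are samples.
data = [
    [8.0, 8.2, None, 8.1],
    [8.1, 8.3, 8.4, 8.0],
    [7.9, 8.1, 8.2, 8.2],
    [2.0, 2.1, 2.3, 1.9],
]
print(impute_zero(data)[0])
print(impute_row_mean(data)[0])
print(impute_knn(data, k=2)[0])
```

In the toy matrix the missing value belongs to a gene expressed near 8, so zero imputation introduces an artificial outlier, while the row mean and the KNN estimate both stay in the gene's own range.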
10 Data preprocessing II: normalization
Goal: make arrays comparable.
- Adjust the arrays using control or housekeeping genes that are expected to have the same intensity level across all samples
- Adjust using spike-in controls
- Global normalization: multiply each array by a constant so that the mean (or median) intensity is the same for every array
- Quantile normalization: match the percentiles of each array
[Figure panels: No normalization; Global normalization; Quantile normalization]
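Global and quantile normalization can be sketched as follows. This is an illustrative version only: the function names and the small intensity matrix are made up, the common target median is an arbitrary choice (the mean of the array medians), and ties are broken by row order rather than averaged as a production implementation might do.

```python
from statistics import median

def global_normalize(matrix):
    """Multiply each array (column) by a constant so all arrays share one median."""
    columns = list(zip(*matrix))
    medians = [median(col) for col in columns]
    target = sum(medians) / len(medians)  # assumed target: mean of the array medians
    factors = [target / m for m in medians]
    return [[v * f for v, f in zip(row, factors)] for row in matrix]

def quantile_normalize(matrix):
    """Give every array (column) the same distribution of values."""
    columns = [list(col) for col in zip(*matrix)]
    n_genes = len(matrix)
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the r-th smallest value across arrays.
    reference = [sum(col[r] for col in sorted_cols) / len(sorted_cols)
                 for r in range(n_genes)]
    normalized_cols = []
    for col in columns:
        order = sorted(range(n_genes), key=lambda g: col[g])
        new_col = [0.0] * n_genes
        for rank, gene in enumerate(order):
            new_col[gene] = reference[rank]  # value at this rank in the reference
        normalized_cols.append(new_col)
    return [list(row) for row in zip(*normalized_cols)]

# Made-up raw intensities: rows are genes, columns are arrays.
raw = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
]
for row in global_normalize(raw):
    print(row)
for row in quantile_normalize(raw):
    print(row)
```

Global normalization only rescales each array, so the shape of each distribution is kept; quantile normalization replaces the values entirely and keeps only the within-array ranks.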
11 Data preprocessing III: transformation
Goal: make the data more closely meet the assumptions of a statistical inference procedure.
- Log transformation to improve normality
[Figure: histogram of a and histogram of log(a); x-axes a and log(a), y-axes frequency]
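A small sketch of the log transformation, using made-up raw intensities. The base-2 logarithm is an assumption here (the slide only says "log"), chosen because it is common for expression data; the gap between mean and median is used as a rough indicator of skew.

```python
import math
from statistics import mean, median

def log2_transform(values):
    """Log2-transform strictly positive intensities."""
    return [math.log2(v) for v in values]

# Made-up right-skewed raw intensities for one array.
intensities = [120.0, 250.0, 480.0, 1000.0, 2100.0, 16000.0]
logged = log2_transform(intensities)

# A large gap between mean and median signals a skewed distribution.
print(mean(intensities) - median(intensities))
print(mean(logged) - median(logged))
```

On the raw scale one very bright probe dominates the mean; after the transformation the mean and median sit much closer together, which is the sense in which the data become more nearly symmetric.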
12 Differential expression
[Table: the data matrix (genes as rows, samples as columns) with the sample columns divided into a Case group and a Control group]
13 Fold change
- n-fold change, with arbitrarily selected fold change cut-offs (usually 2-fold)
- Pros: intuitive; simple and rapid
- Cons: statistically inefficient; magnitude does not necessarily indicate importance
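A minimal sketch of a fold change filter, assuming the values are already log2-transformed so that a 2-fold change corresponds to a difference of 1 between group means. The function names and the two example genes are made up for illustration.

```python
import math
from statistics import mean

def log2_fold_change(case, control):
    """Difference of group means on the log2 scale, i.e. log2 of the fold change."""
    return mean(case) - mean(control)

def passes_fold_cutoff(case, control, fold=2.0):
    """True if the absolute fold change reaches the chosen cut-off."""
    return abs(log2_fold_change(case, control)) >= math.log2(fold)

# Made-up log2 expression values, three replicates per group.
steady_gene = {"control": [8.0, 8.1, 7.9], "case": [8.6, 8.7, 8.5]}
noisy_gene = {"control": [5.0, 9.0, 7.0], "case": [6.0, 10.5, 8.0]}

for name, gene in (("steady_gene", steady_gene), ("noisy_gene", noisy_gene)):
    lfc = log2_fold_change(gene["case"], gene["control"])
    print(name, round(lfc, 2), passes_fold_cutoff(gene["case"], gene["control"]))
```

The example is built to show the weakness listed on the slide: the cut-off ignores replicate variability, so a consistent but modest change can fail while a noisy gene passes.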
14 Statistical analysis: hypothesis testing
- A statistical hypothesis is an assumption about a population parameter, e.g. a group mean.
- Null hypothesis: $H_0: \mu_1 = \mu_2$
- Alternative hypothesis: $H_1: \mu_1 \neq \mu_2$
[Table: the data matrix with the sample columns divided into Case and Control groups]
15 Statistical analysis: comparing means of two groups
- Parametric method: Student's t-test
  - Assumes a normal distribution of the data
- Non-parametric method: Mann-Whitney U test
  - Does not rely on the data belonging to any particular distribution
  - Based on ranks of observations
- Student's t-test vs Mann-Whitney U test
  - Robustness: the U test is more robust to outliers
  - Efficiency: when normality holds, the efficiency of the U test is about 0.95 relative to the t-test. For distributions sufficiently far from normal and for sufficiently large sample sizes, the U test can be considerably more efficient than the t-test.
[Figure: two GeneX examples; first panel: t-test p=0.06, U test p=0.1; second panel: t-test p=0.32, U test p=0.1]
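The robustness point can be illustrated with a short sketch that computes the pooled-variance t statistic and an exact two-sided Mann-Whitney p value by enumerating all group assignments. The function names and the example values are made up, and the enumeration is only practical for small samples such as three replicates per group.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

def t_statistic(x, y):
    """Student's two-sample t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled * (1 / nx + 1 / ny))

def u_statistic(x, y):
    """Number of (x, y) pairs with x > y; a tie counts as one half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def mann_whitney_exact_p(x, y):
    """Exact two-sided p value: share of all possible splits of the pooled data
    whose U is at least as far from its null centre as the observed U."""
    pooled = list(x) + list(y)
    nx, ny = len(x), len(y)
    centre = nx * ny / 2
    observed = abs(u_statistic(x, y) - centre)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), nx):
        chosen = set(idx)
        gx = [pooled[i] for i in chosen]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(u_statistic(gx, gy) - centre) >= observed - 1e-12:
            extreme += 1
    return extreme / total

# Made-up log2 values; the second case group contains one outlier.
control = [7.0, 7.2, 7.1]
case = [8.0, 8.3, 8.1]
case_outlier = [8.0, 8.3, 12.0]

print(t_statistic(case, control), mann_whitney_exact_p(case, control))
print(t_statistic(case_outlier, control), mann_whitney_exact_p(case_outlier, control))
```

Because the U test uses only ranks, the outlier leaves its p value unchanged, while the t statistic shrinks as the outlier inflates the pooled variance. With three replicates per group, the smallest two-sided exact p value the U test can reach is 2/20 = 0.1.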
16 Statistical tests for different types of comparisons
Goal | Data: continuous/normal | Data: rank | Data: nominal
Compare two unpaired groups | Unpaired t-test | Mann-Whitney test | Fisher's exact test or chi-square test
Compare two paired groups | Paired t-test | Wilcoxon test | McNemar's test
Compare three or more groups | One-way ANOVA | Kruskal-Wallis test | Chi-square test
Association to quantitative phenotypes | Pearson's correlation | Spearman's correlation | Contingency coefficients
17 Correction for multiple testing: why?
- In an experiment with a 10,000-gene array in which the significance level is set at 0.05, 10,000 x 0.05 = 500 genes would be expected to be called significant by chance even if none is differentially expressed.
- The probability of drawing the wrong conclusion in at least one of the n different tests (assuming the tests are independent) is $P(\text{wrong}) = 1 - (1 - \alpha_s)^n = \alpha_g$, where $\alpha_s$ is the significance level at the single-gene level and $\alpha_g$ is the global significance level.
- Each row of the data matrix is a test.
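The arithmetic on this slide can be checked with a few lines of Python. The function names are made up, and the family-wise formula assumes independent tests.

```python
def expected_false_positives(alpha_single, n_tests):
    """Expected number of genes called significant when no gene truly differs."""
    return alpha_single * n_tests

def family_wise_error(alpha_single, n_tests):
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha_single) ** n_tests

print(expected_false_positives(0.05, 10_000))
for n in (1, 10, 100, 10_000):
    print(n, family_wise_error(0.05, n))
```

Even at ten tests the chance of at least one false positive is already about 40%, and for array-sized experiments it is practically 1.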
18 Correction for multiple testing: how?
- Control the family-wise error rate (FWER), the probability of at least one type I error in the entire set (family) of hypotheses tested.
  - e.g. standard Bonferroni correction: corrected p value = uncorrected p value x number of genes tested (capped at 1)
- Control the false discovery rate (FDR), the expected proportion of false positives among the rejected hypotheses.
  - e.g. Benjamini and Hochberg correction:
    - Rank all genes by p value, smallest first
    - Pick a desired FDR level q (e.g. 5%)
    - Starting from the top of the list, accept genes with $p \le \frac{i}{m} q$, where i is the rank of the gene in the list and m is the total number of genes tested. In the standard step-up form, every gene ranked at or above the largest i that satisfies the inequality is accepted.
[Table columns: p; Bonferroni; Rank (i); q; (i/m)*q; significant?]
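Both corrections are short enough to sketch directly. The version below uses the standard step-up form of the Benjamini and Hochberg procedure (find the largest rank i with p ≤ (i/m)·q and accept everything ranked at or above it); the function names and the list of p values are made up for illustration.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p values: p x number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per gene: True if called significant at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda g: p_values[g])  # gene indices, smallest p first
    largest_rank = 0
    for rank, gene in enumerate(order, start=1):
        if p_values[gene] <= rank / m * q:
            largest_rank = rank  # keep the largest rank that meets its threshold
    significant = [False] * m
    for gene in order[:largest_rank]:
        significant[gene] = True
    return significant

# Made-up raw p values for ten genes.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

adjusted = bonferroni(raw_p)
print([round(p, 3) for p in adjusted])
print(sum(p < 0.05 for p in adjusted), "gene(s) pass Bonferroni at 0.05")
print(sum(benjamini_hochberg(raw_p, q=0.05)), "gene(s) pass Benjamini-Hochberg at q = 0.05")
```

For these p values the FDR procedure accepts one more gene than Bonferroni, which reflects the usual trade-off: FWER control is stricter, FDR control tolerates a bounded share of false positives in exchange for more discoveries.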
19 Resources
- Data sources: Gene Expression Omnibus (GEO); ArrayExpress
- Microarray data analysis tools: Bioconductor; Expression Profiler
20 Summary
- Three major goals of gene expression studies: class comparison; class detection; class prediction
- Gene expression data pre-processing steps: missing data imputation; normalization; transformation
- Statistical tests for two-group comparative studies: Student's t-test; Mann-Whitney U test
- Multiple-test adjustment: control the family-wise error rate (FWER); control the false discovery rate (FDR)
21 Exercise
- Data set: james_west_2005_hne_6h_60vs0.txt (or james_west_2005_hne_6h_60vs0_head100.txt)
- Probe sets (or the top 100 probe sets)
- Two groups (HNE0 and HNE60, three replicates in each group)
- No missing values; already normalized; already log transformed
- Use the t-test in Expression Profiler or Excel to identify genes that are differentially expressed between the two groups.
- Apply multiple-test adjustment to the raw p values.
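For readers who prefer a script to a spreadsheet, the per-gene t-test can be sketched in plain Python. Everything about the input is an assumption: the tab-delimited layout, the column names, and the two example rows are made up and stand in for the exercise file. The p value uses the closed-form CDF of Student's t distribution with 4 degrees of freedom, which applies only to the equal-variance test with three replicates per group.

```python
import csv
import io
from math import sqrt
from statistics import mean, variance

# Made-up stand-in for the exercise file (assumed tab-delimited, log-transformed).
EXAMPLE_TEXT = (
    "Probe_set_id\tHNE0_1\tHNE0_2\tHNE0_3\tHNE60_1\tHNE60_2\tHNE60_3\n"
    "probe_A\t7.0\t7.2\t7.1\t8.0\t8.3\t8.1\n"
    "probe_B\t5.0\t5.4\t5.2\t5.1\t5.3\t5.2\n"
)

def t_cdf_df4(t):
    """CDF of Student's t distribution with exactly 4 degrees of freedom."""
    u = t * t / (1.0 + t * t / 4.0)
    return 0.5 + 0.375 * (t / sqrt(1.0 + t * t / 4.0)) * (1.0 - u / 12.0)

def t_test_3_vs_3(x, y):
    """Two-sided equal-variance t-test for two groups of three replicates.
    Raises ZeroDivisionError if both groups have zero variance."""
    if len(x) != 3 or len(y) != 3:
        raise ValueError("this sketch only handles three replicates per group")
    pooled = (2 * variance(x) + 2 * variance(y)) / 4
    t = (mean(x) - mean(y)) / sqrt(pooled * (1 / 3 + 1 / 3))
    p = 2.0 * (1.0 - t_cdf_df4(abs(t)))
    return t, p

def analyse(text):
    """Return {probe id: (t, p)}, comparing the first three columns with the last three."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    next(reader)  # skip the header row
    results = {}
    for row in reader:
        values = [float(v) for v in row[1:]]
        results[row[0]] = t_test_3_vs_3(values[:3], values[3:])
    return results

results = analyse(EXAMPLE_TEXT)
for probe, (t, p) in results.items():
    print(probe, round(t, 3), round(p, 5))
```

The raw p values produced this way are the input for the multiple-test adjustment that the exercise asks for next.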
Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals
More informationRecombinant DNA & Genetic Engineering. Tools for Genetic Manipulation
Recombinant DNA & Genetic Engineering g Genetic Manipulation: Tools Kathleen Hill Associate Professor Department of Biology The University of Western Ontario Tools for Genetic Manipulation DNA, RNA, cdna
More informationBBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS
BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS 1. The Technology Strategy sets out six areas where technological developments are required to push the frontiers of knowledge
More informationHiPer RT-PCR Teaching Kit
HiPer RT-PCR Teaching Kit Product Code: HTBM024 Number of experiments that can be performed: 5 Duration of Experiment: Protocol: 4 hours Agarose Gel Electrophoresis: 45 minutes Storage Instructions: The
More informationStep-by-Step Guide to Basic Expression Analysis and Normalization
Step-by-Step Guide to Basic Expression Analysis and Normalization Page 1 Introduction This document shows you how to perform a basic analysis and normalization of your data. A full review of this document
More informationUNIVERSITY OF NAIROBI
UNIVERSITY OF NAIROBI MASTERS IN PROJECT PLANNING AND MANAGEMENT NAME: SARU CAROLYNN ELIZABETH REGISTRATION NO: L50/61646/2013 COURSE CODE: LDP 603 COURSE TITLE: RESEARCH METHODS LECTURER: GAKUU CHRISTOPHER
More informationData Analysis on the ABI PRISM 7700 Sequence Detection System: Setting Baselines and Thresholds. Overview. Data Analysis Tutorial
Data Analysis on the ABI PRISM 7700 Sequence Detection System: Setting Baselines and Thresholds Overview In order for accuracy and precision to be optimal, the assay must be properly evaluated and a few
More information2. True or False? The sequence of nucleotides in the human genome is 90.9% identical from one person to the next. False (it s 99.
1. True or False? A typical chromosome can contain several hundred to several thousand genes, arranged in linear order along the DNA molecule present in the chromosome. True 2. True or False? The sequence
More informationQUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS
QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS This booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http://users.ox.ac.uk/~grafen/qmnotes/index.html.
More informationForensic DNA Testing Terminology
Forensic DNA Testing Terminology ABI 310 Genetic Analyzer a capillary electrophoresis instrument used by forensic DNA laboratories to separate short tandem repeat (STR) loci on the basis of their size.
More informationCore Facility Genomics
Core Facility Genomics versatile genome or transcriptome analyses based on quantifiable highthroughput data ascertainment 1 Topics Collaboration with Harald Binder and Clemens Kreutz Project: Microarray
More informationJust the Facts: A Basic Introduction to the Science Underlying NCBI Resources
1 of 8 11/7/2004 11:00 AM National Center for Biotechnology Information About NCBI NCBI at a Glance A Science Primer Human Genome Resources Model Organisms Guide Outreach and Education Databases and Tools
More informationSoftware and Methods for the Analysis of Affymetrix GeneChip Data. Rafael A Irizarry Department of Biostatistics Johns Hopkins University
Software and Methods for the Analysis of Affymetrix GeneChip Data Rafael A Irizarry Department of Biostatistics Johns Hopkins University Outline Overview Bioconductor Project Examples 1: Gene Annotation
More informationReal-time quantitative RT -PCR (Taqman)
Real-time quantitative RT -PCR (Taqman) Author: SC, Patti Lab, 3/03 This is performed as a 2-step reaction: 1. cdna synthesis from DNase 1-treated total RNA 2. PCR 1. cdna synthesis (Advantage RT-for-PCR
More informationAGILENT S BIOINFORMATICS ANALYSIS SOFTWARE
ACCELERATING PROGRESS IS IN OUR GENES AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE GENESPRING GENE EXPRESSION (GX) MASS PROFILER PROFESSIONAL (MPP) PATHWAY ARCHITECT (PA) See Deeper. Reach Further. BIOINFORMATICS
More informationDr Alexander Henzing
Horizon 2020 Health, Demographic Change & Wellbeing EU funding, research and collaboration opportunities for 2016/17 Innovate UK funding opportunities in omics, bridging health and life sciences Dr Alexander
More informationCancer Biostatistics Workshop Science of Doing Science - Biostatistics
Cancer Biostatistics Workshop Science of Doing Science - Biostatistics Yu Shyr, PhD Jan. 18, 2008 Cancer Biostatistics Center Vanderbilt-Ingram Cancer Center Yu.Shyr@vanderbilt.edu Aims Cancer Biostatistics
More informationAnalysis of Illumina Gene Expression Microarray Data
Analysis of Illumina Gene Expression Microarray Data Asta Laiho, Msc. Tech. Bioinformatics research engineer The Finnish DNA Microarray Centre Turku Centre for Biotechnology, Finland The Finnish DNA Microarray
More informationNext Generation Sequencing
Next Generation Sequencing DNA sequence represents a single format onto which a broad range of biological phenomena can be projected for high-throughput data collection Over the past three years, massively
More informationTwo-Way ANOVA tests. I. Definition and Applications...2. II. Two-Way ANOVA prerequisites...2. III. How to use the Two-Way ANOVA tool?...
Two-Way ANOVA tests Contents at a glance I. Definition and Applications...2 II. Two-Way ANOVA prerequisites...2 III. How to use the Two-Way ANOVA tool?...3 A. Parametric test, assume variances equal....4
More informationLecture 11 Data storage and LIMS solutions. Stéphane LE CROM lecrom@biologie.ens.fr
Lecture 11 Data storage and LIMS solutions Stéphane LE CROM lecrom@biologie.ens.fr Various steps of a DNA microarray experiment Experimental steps Data analysis Experimental design set up Chips on catalog
More informationMaterials and Methods. Blocking of Globin Reverse Transcription to Enhance Human Whole Blood Gene Expression Profiling
Application Note Blocking of Globin Reverse Transcription to Enhance Human Whole Blood Gene Expression Profi ling Yasmin Beazer-Barclay, Doug Sinon, Christopher Morehouse, Mark Porter, and Mike Kuziora
More informationPackage dunn.test. January 6, 2016
Version 1.3.2 Date 2016-01-06 Package dunn.test January 6, 2016 Title Dunn's Test of Multiple Comparisons Using Rank Sums Author Alexis Dinno Maintainer Alexis Dinno
More informationbusiness statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar
business statistics using Excel Glyn Davis & Branko Pecar OXFORD UNIVERSITY PRESS Detailed contents Introduction to Microsoft Excel 2003 Overview Learning Objectives 1.1 Introduction to Microsoft Excel
More informationIntroduction to next-generation sequencing data
Introduction to next-generation sequencing data David Simpson Centre for Experimental Medicine Queens University Belfast http://www.qub.ac.uk/research-centres/cem/ Outline History of DNA sequencing NGS
More informationReal-time PCR: Understanding C t
APPLICATION NOTE Real-Time PCR Real-time PCR: Understanding C t Real-time PCR, also called quantitative PCR or qpcr, can provide a simple and elegant method for determining the amount of a target sequence
More informationImproving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP
Improving the Performance of Data Mining Models with Data Preparation Using SAS Enterprise Miner Ricardo Galante, SAS Institute Brasil, São Paulo, SP ABSTRACT In data mining modelling, data preparation
More informationGene Models & Bed format: What they represent.
GeneModels&Bedformat:Whattheyrepresent. Gene models are hypotheses about the structure of transcripts produced by a gene. Like all models, they may be correct, partly correct, or entirely wrong. Typically,
More informationBiotechnology: DNA Technology & Genomics
Chapter 20. Biotechnology: DNA Technology & Genomics 2003-2004 The BIG Questions How can we use our knowledge of DNA to: diagnose disease or defect? cure disease or defect? change/improve organisms? What
More informationDiscovery and Quantification of RNA with RNASeq Roderic Guigó Serra Centre de Regulació Genòmica (CRG) roderic.guigo@crg.cat
Bioinformatique et Séquençage Haut Débit, Discovery and Quantification of RNA with RNASeq Roderic Guigó Serra Centre de Regulació Genòmica (CRG) roderic.guigo@crg.cat 1 RNA Transcription to RNA and subsequent
More informationRNA-seq. Quantification and Differential Expression. Genomics: Lecture #12
(2) Quantification and Differential Expression Institut für Medizinische Genetik und Humangenetik Charité Universitätsmedizin Berlin Genomics: Lecture #12 Today (2) Gene Expression per Sources of bias,
More informationIntroduction to Quantitative Methods
Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................
More informationTHE KRUSKAL WALLLIS TEST
THE KRUSKAL WALLLIS TEST TEODORA H. MEHOTCHEVA Wednesday, 23 rd April 08 THE KRUSKAL-WALLIS TEST: The non-parametric alternative to ANOVA: testing for difference between several independent groups 2 NON
More informationOrganizing Your Approach to a Data Analysis
Biost/Stat 578 B: Data Analysis Emerson, September 29, 2003 Handout #1 Organizing Your Approach to a Data Analysis The general theme should be to maximize thinking about the data analysis and to minimize
More informationFinal Project Report
CPSC545 by Introduction to Data Mining Prof. Martin Schultz & Prof. Mark Gerstein Student Name: Yu Kor Hugo Lam Student ID : 904907866 Due Date : May 7, 2007 Introduction Final Project Report Pseudogenes
More information