Statistical issues in the analysis of microarray data


1 Statistical issues in the analysis of microarray data
  Daniel Gerhard
  Institute of Biostatistics, Leibniz University of Hannover
  ESNATS Summerschool, Zermatt
  D. Gerhard (LUH) Analysis of microarray data 23. Sep 09 1 / 30
2 Table of Contents
  1 Outline
  2 Experimental design
  3 Statistical modelling
  4 Hypotheses testing
  5 Gene set enrichment analysis
  6 Classification
3 Outline
  Focus is set on single-channel microarrays
    One sample per array
    Gene expressions for thousands of oligonucleotides
  Identifying genes that are differentially expressed due to a treatment
  Finding significantly differentially expressed genes with a given error probability
  (Predicting a treatment level given the gene expression data)
4 Controlled experiments
  Independent replications
    Multiple sources of variability are present: sample, array, environmental variability, ...
    Account for this variability in the experimental design by several replications of arrays, samples, multiple timepoints, ...
  Randomisation
    Needed to separate treatment effects from other factors that might influence gene expression
5 Experimental design
  Planning an experiment
    Multiple arrays per sample? This enables estimating array variability, but a large amount of RNA is needed.
    More complex designs need a larger number of arrays and samples.
    Measuring covariates that are not directly of interest but might influence gene expression
  Simple classic design
    2 treatments (control/treatment), multiple arrays/samples per treatment
6 Data structure
            Treatment A                   Treatment B                   ...
            Array 1   Array 2   Array 3   Array 4   Array 5   Array 6   ...
  Gene 1    y_11      y_12      y_13      y_14      y_15      y_16
  Gene 2    y_21      y_22      y_23      y_24      y_25      y_26
  Gene 3    y_31      y_32      y_33      y_34      y_35      y_36
7 Data example
  Generating artificial data
    2 treatments (A, B)
    20 arrays per treatment
    5000 genes per array
    Normally distributed residuals and array effects: within-array sd = 1; between-array sd = [...]
    [...] genes show an effect (δ = ±2)
    2^x transformation
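The data-generating recipe above could be sketched roughly as follows. This is only an illustration in plain Python: the between-array sd (0.5) and the number of effect genes (100) are placeholder assumptions, since those values are not legible on the slide, and the function name `simulate_expression` is hypothetical.

```python
import random

def simulate_expression(n_genes=5000, n_arrays_per_trt=20, n_effect_genes=100,
                        delta=2.0, within_sd=1.0, between_sd=0.5, seed=1):
    """Return (data, labels); data[g][a] is the raw intensity of gene g on array a.

    between_sd and n_effect_genes are assumed placeholder values.
    """
    rng = random.Random(seed)
    labels = ["A"] * n_arrays_per_trt + ["B"] * n_arrays_per_trt
    # One random shift per array (between-array variability).
    array_effect = [rng.gauss(0.0, between_sd) for _ in labels]
    data = []
    for g in range(n_genes):
        # The first n_effect_genes genes get an effect of +delta or -delta in treatment B.
        effect = 0.0
        if g < n_effect_genes:
            effect = delta if g % 2 == 0 else -delta
        row = []
        for a, trt in enumerate(labels):
            log_value = array_effect[a] + rng.gauss(0.0, within_sd)
            if trt == "B":
                log_value += effect
            row.append(2.0 ** log_value)  # 2^x transformation to the raw scale
        data.append(row)
    return data, labels

# Small run so the example finishes quickly.
data, labels = simulate_expression(n_genes=200, n_effect_genes=10)
print(len(data), len(data[0]), labels.count("A"), labels.count("B"))
```

Because of the 2^x step, all simulated intensities are positive and right-skewed, which is what the later log2 transformation is meant to undo.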
8 Data example
  [Figure; axis label: Array]
9 Data example
  [Figure: density plot; axes: x, density]
10 Normalisation
  Preliminary data processing
    Checking for hybridisation errors
    Variability between arrays might bias the results
    Only a few genes are expected to show an effect
    Using all observations or known expressions of reference genes to standardise arrays
    Trying to shift the data towards a normal distribution (commonly by log2 transformation)
11 Example data transformation
  [Figure: two density panels, "original" (x-axis: x) and "transformed" (x-axis: log2(x)); y-axis: density]
12 Median normalisation
  [Figure: two density panels, "transformed" and "normalised" (x-axis: log2(x)); y-axis: density]
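A minimal sketch of log2 transformation followed by median normalisation. The slides do not spell out the exact rule, so this assumes the common variant in which each array's median log2 value is subtracted, giving every array a median of 0; the function name is hypothetical.

```python
import math
import statistics

def log2_median_normalise(data):
    """data[g][a]: positive raw intensities.

    Returns log2 values with each array's (column's) median shifted to 0.
    """
    n_arrays = len(data[0])
    logged = [[math.log2(v) for v in row] for row in data]
    medians = [statistics.median(row[a] for row in logged) for a in range(n_arrays)]
    return [[row[a] - medians[a] for a in range(n_arrays)] for row in logged]

# Two arrays; the second is uniformly 4 times brighter than the first.
raw = [[1.0, 4.0], [2.0, 8.0], [4.0, 16.0]]
normalised = log2_median_normalise(raw)
print(normalised)  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```

After normalisation the two arrays agree gene by gene, so the array-wide brightness difference no longer masquerades as differential expression.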
13 Estimating treatment effects
  Statistical models
    Trying to explain the effects by only a few parameters in a statistical model
    Estimating parameters, e.g. by minimising residuals
    Due to limited computational resources, models can be fitted separately for each gene
  2-sample design
    For the simple treatment-control design, the difference between arithmetic means and its standard error can be estimated for each gene.
    After applying the inverse of the log2 transformation, the fold change is the parameter of interest (back-transforming a difference of log2 means gives a ratio of geometric means on the original scale).
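For a single gene, the two-sample estimate could look like the sketch below. It assumes log2-scale values and an unpooled (Welch-type) standard error; the slides do not say which variance estimate is used, and the numbers are made up for illustration.

```python
import math
import statistics

def mean_difference(group_a, group_b):
    """Difference of means (B - A) on the log2 scale and its unpooled standard error."""
    diff = statistics.mean(group_b) - statistics.mean(group_a)
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return diff, se

# Illustrative log2 expression values for one gene.
log2_a = [7.9, 8.1, 8.0, 8.2, 7.8]
log2_b = [9.0, 9.2, 8.9, 9.1, 8.8]
diff, se = mean_difference(log2_a, log2_b)
fold_change = 2.0 ** diff  # inverse of the log2 transformation
print(round(diff, 3), round(se, 3), round(fold_change, 3))
```

Here a log2 difference of about 1 corresponds to roughly a two-fold change on the original scale.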
14 Parametric vs. nonparametric methods
  Parametric methods
    Assuming a normal distribution after log2 transformation
    Summarising the data by means and standard errors is adequate under the assumptions of a general linear model
  Nonparametric methods
    For skewed distributions, providing only means and standard errors might be misleading
    Instead using medians, IQR, range, ...
    Applying rank transformation, resampling methods, ...
    Interpretation of treatment comparisons might be more complicated in models with fewer assumptions
    Lack of power at small sample sizes
15 Independent observations?
  No complete randomisation
    Observations from non-randomised experimental units might be correlated, e.g.
      Multiple arrays for the same sample
      Samples of the same individual over time
      Block structures
      ...
    Assuming independence of correlated observations may lead to underestimation of variability
    Introducing multiple error terms in the model
    Increased complexity of the model, increase in the sample size needed
16 Hypotheses testing
  Test for a single gene
    Setting up hypotheses of interest (e.g. H_0: parameter of interest equals 0)
    Constructing test statistics for each gene
    Calculating p-values under the assumption of a null distribution for the test statistic
  Borrowing information from multiple genes
    At small sample sizes the genewise estimation of standard errors is difficult
    Adding a fudge factor to the standard error to minimise the coefficient of variation
    Borrowing information about variability from all genes by empirical Bayes
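The effect of a fudge factor can be illustrated with a small sketch. The value `s0 = 0.1` is an arbitrary assumption made here for illustration; the slide only says that it is chosen to minimise the coefficient of variation, and that selection step is not shown.

```python
def t_like_statistic(diff, se, s0=0.0):
    """Mean difference divided by (standard error + fudge factor s0)."""
    return diff / (se + s0)

# (difference, standard error) for two hypothetical genes with the same ordinary t-statistic.
genes = {"small variance": (0.05, 0.01), "large effect": (1.0, 0.2)}
for name, (diff, se) in genes.items():
    plain = t_like_statistic(diff, se)
    moderated = t_like_statistic(diff, se, s0=0.1)
    print(name, round(plain, 2), round(moderated, 2))
```

Both genes have the same ordinary statistic, but with the fudge factor the gene whose tiny effect is only significant because of a very small standard error is ranked far below the gene with the large effect.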
17 t-test results for the example
  Distribution of p-values:
  [Figure: histogram; axes: p-value, frequency]
18 Error rates
  As multiple hypotheses are tested, there is a choice of controlling different error rates, and the individual type I error might not be adequate.

                 # H_0 not rejected   # H_0 rejected
  # true H_0     U                    V                 m_0
  # false H_0    T                    S                 m - m_0
  known          m - R                R                 m

  PCER  Per-comparison error rate: E(V)/m
  FWER  Familywise error rate: P(V > 0)
  FDR   False discovery rate: E(V/R)
  ...
19 FWER controlling procedures
  Calculating adjusted p-values $\tilde{p}_i$ ($i = 1, \ldots, m$)
    Bonferroni: $\tilde{p}_i = \min\{1, m\,p_i\}$ (single-step)
    Holm: $\tilde{p}_i = \min\{1, \max\{\tilde{p}_{i-1}, (m - i + 1)\,p_i\}\}$ (step-down, for $p_1 \le \ldots \le p_i \le \ldots \le p_m$)
    Utilising a multivariate distribution, resampling methods (single-step)
    ...
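The Bonferroni and Holm adjustments can be sketched in a few lines of plain Python. The function names are chosen here for illustration, and the p-values are made up.

```python
def bonferroni(pvalues):
    """Single-step adjustment: multiply each p-value by m and cap at 1."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]

def holm(pvalues):
    """Step-down adjustment, returned in the original order of the p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):  # rank 0 is the smallest p-value
        running_max = max(running_max, (m - rank) * pvalues[idx])
        adjusted[idx] = min(1.0, running_max)
    return adjusted

p = [0.01, 0.04, 0.03, 0.005]
print([round(v, 4) for v in bonferroni(p)])
print([round(v, 4) for v in holm(p)])
```

Holm's adjusted p-values are never larger than Bonferroni's, which is why Holm rejects at least as many hypotheses at the same FWER level.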
20 FWER control using data-driven weights
  Weighted step-down procedure
    Weight the m unadjusted p-values, $p_i^* = p_i / w_i$, and order them so that $p_1^* \le \ldots \le p_m^*$
    Reject $H_i$ as long as $p_i^* \le \alpha / \sum_{k=i}^{m} w_k$
  Obtaining weights
    Choosing weights independently of the significance of the test
    Gathering information about the distribution of hypotheses under the null or under the alternative
  Examples
    Weighting by the total variance of the entire sample, $w_i = S_i$
    Weighting by non-decreasing monotone functions of the total variance, $w_i = f(S_i)$
    Using principal components to define weights
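A sketch of the weighted step-down rule, read here as a weighted Holm procedure: hypotheses are visited in order of increasing weighted p-value, and the weights of already rejected hypotheses are removed from the denominator. The function name and the example weights are illustrative assumptions.

```python
def weighted_stepdown(pvalues, weights, alpha=0.05):
    """Return a list of booleans, True where H_i is rejected."""
    m = len(pvalues)
    weighted = [p / w for p, w in zip(pvalues, weights)]
    order = sorted(range(m), key=lambda i: weighted[i])
    remaining_weight = sum(weights)  # sum of w_k over hypotheses not yet rejected
    rejected = [False] * m
    for idx in order:
        if weighted[idx] <= alpha / remaining_weight:
            rejected[idx] = True
            remaining_weight -= weights[idx]
        else:
            break  # step-down: stop at the first non-rejection
    return rejected

p = [0.001, 0.02, 0.03, 0.5]
print(weighted_stepdown(p, [1, 1, 1, 1]))  # [True, False, False, False]
print(weighted_stepdown(p, [1, 3, 1, 1]))  # [True, True, False, False]
```

With equal weights the rule reduces to the ordinary Holm procedure; giving the second hypothesis a larger weight lets it be rejected as well.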
21 FDR controlling procedures
  Calculating adjusted p-values $\tilde{p}_j$ ($j = 1, \ldots, m$)
    Benjamini-Hochberg: $\tilde{p}_j = \min_{j \le i} \{ \frac{m}{i}\,p_i \}$ (step-up, for $p_1 \le \ldots \le p_i \le \ldots \le p_m$)
    Benjamini-Yekutieli: correction under dependence (step-up)
    Storey: pFDR (estimating $m_0 / m$)
    ...
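The Benjamini-Hochberg step-up adjustment can be sketched as a running minimum taken from the largest p-value downwards. The function name is chosen here for illustration, and the p-values are made up.

```python
def benjamini_hochberg(pvalues):
    """BH adjusted p-values, returned in the original order of the p-values."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m - 1, -1, -1):  # start with the largest p-value
        idx = order[rank]
        running_min = min(running_min, m * pvalues[idx] / (rank + 1))
        adjusted[idx] = running_min
    return adjusted

p = [0.01, 0.04, 0.03, 0.005]
print([round(v, 4) for v in benjamini_hochberg(p)])
```

Rejecting all hypotheses whose adjusted p-value is at most q corresponds to applying the step-up procedure at FDR level q.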
22 Comparison of adjustment methods
  Using adjusted p-values for the data example

  Method         # H_0 rejected   # H_0 falsely rejected
  unadjusted     [...]            [...]
  Bonferroni     91               0
  Holm           92               0
  S_i weighted   44               0
  minP           82               0
  BH             [...]            [...]
  BY             98               0
  Storey         [...]            [...]
23 Volcano plot
  [Figure: volcano plot; axes: log2 fold change, -log10 p-value; legend: unadjusted, Bonferroni, BH, BY, minP]
24 Gene set enrichment analysis
  Define multiple sets of genes
  Test differential expression for these gene sets
    Small effects of single genes are hard to detect
    Combination of multiple small effects to get the big picture
    Reduction of the dimensionality of the multiple testing problem
    Test effects for whole pathways, functional groups, etc.
25 Assigning genetic features to known classes
  Classification
    Reformulating the problem into a setting with p regressors to estimate the class membership probability (control/treatment) for each gene
    Finding a classification rule by e.g.
      Logistic regression
      Discriminant analysis
      SVM
      ...
  Validation
    Fitting the model to training data
    Validation of the model by test data
    Cross-validation to validate the model on training data
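Cross-validation on the training data amounts to repeatedly holding out part of the arrays. A minimal sketch of the index bookkeeping follows; the interleaved fold assignment and the function name are assumptions made for illustration, and in practice the arrays would usually be shuffled first.

```python
def k_fold_indices(n_samples, n_folds):
    """Return a list of (train_indices, validation_indices) pairs."""
    folds = [list(range(f, n_samples, n_folds)) for f in range(n_folds)]
    splits = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n_samples) if i not in held]
        splits.append((train, held_out))
    return splits

# 40 training arrays split into 5 folds.
splits = k_fold_indices(n_samples=40, n_folds=5)
for train, validation in splits:
    print(len(train), len(validation))
```

Each array is held out exactly once, so every fitted model is evaluated on arrays it has not seen.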
26 Problem of high dimensions p >> n
  Problem
    Logistic regression and LDA require that the number of observations is larger than the number of variables
    Reducing the number of variables by feature selection
    Using penalized logistic regression, ...
27 Feature selection
  Filtering genes
    Multiple testing approaches can be used as a filter
    Select all variables corresponding to genes with a p-value $p \le p_0$
    Perform, for example, logistic regression to model the posterior probability of K classes:
      $\log \frac{\Pr(G = k \mid X = x)}{\Pr(G = K \mid X = x)} = \beta_{k0} + \beta_k^T x$
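Given coefficients, the log-odds model above determines all K class probabilities, with class K as the reference. The sketch below uses made-up coefficients and a hypothetical function name; fitting the coefficients is not shown.

```python
import math

def class_probabilities(x, betas):
    """betas: list of (intercept, coefficients) for classes 1..K-1; class K is the reference.

    Returns [Pr(G=1|x), ..., Pr(G=K|x)].
    """
    etas = [b0 + sum(b * xi for b, xi in zip(bvec, x)) for b0, bvec in betas]
    denom = 1.0 + sum(math.exp(e) for e in etas)
    probs = [math.exp(e) / denom for e in etas]
    probs.append(1.0 / denom)  # reference class K
    return probs

# Two classes (K = 2), two selected genes; coefficients are illustrative only.
x = [1.0, -2.0]
betas = [(0.5, [1.0, 0.5])]
probs = class_probabilities(x, betas)
print([round(v, 4) for v in probs])
```

Taking the log of the ratio of the first probability to the last recovers the linear predictor, here 0.5 + 1.0 * 1.0 + 0.5 * (-2.0) = 0.5.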
28 A second example
  Generating artificial training data
    2 treatments (A, B)
    20 arrays per treatment
    5000 genes per array
    Normally distributed residuals and array effects: within-array sd = 1; between-array sd = 0.5
    Genes show N(0, 0.25) distributed effects
    2^x transformation
  Generating test data
    10 arrays per treatment
    Same effects as in training data
  Both datasets are log2 transformed and median normalised.
29 Feature selection / Classification
  Choosing only the 10 genes with the best t-test results as covariates
  Performing LDA and logistic regression
  Validation by the test set

  LDA:
        A   B
    A   8   2
    B   2   8

  Logistic regression:
        A   B
    A   7   2
    B   3   8
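The test-set accuracy implied by the two tables can be worked out directly. The sketch assumes correct classifications sit on the diagonal; since the test set has 10 arrays per treatment and the columns of the logistic regression table each sum to 10, the columns presumably hold the true classes.

```python
def accuracy(confusion):
    """Share of correctly classified arrays (diagonal over total)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total

lda = [[8, 2], [2, 8]]
logistic = [[7, 2], [3, 8]]
print(accuracy(lda), accuracy(logistic))  # 0.8 0.75
```

On this small test set LDA classifies 16 of 20 arrays correctly and logistic regression 15 of 20.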
30 References
  Dudoit, S and Van der Laan, M (2008): Multiple Testing Procedures with Application to Genomics. Springer Series in Statistics.
  Gentleman, R, Carey, VJ, Huber, W, Irizarry, RA, and Dudoit, S (2005): Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, Statistics for Biology and Health.
  Hastie, T, Tibshirani, R, and Friedman, J (2001): The Elements of Statistical Learning. Data Mining, Inference and Prediction. Springer Series in Statistics.
  Benjamini, Y and Hochberg, Y (1995): Controlling the false discovery rate: a new and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57:
  Benjamini, Y and Yekutieli, D (2001): The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 29:
  Finos, L and Salmaso, L (2007): FDR- and FWE-controlling methods using data-driven weights. JSPI 137:
  Kropf, S and Läuter, J (2002): Multiple tests for different sets of variables using a data-driven ordering of hypotheses, with an application to gene expression data. Biometrical Journal 44:
  Saeys, Y, Inza, I, and Larrañaga, P (2007): A review of feature selection techniques in bioinformatics. Bioinformatics 23:
  Schwender, H, Ickstadt, K, and Rahnenführer, J (2008): Classification with High-Dimensional Genetic Data: Assigning Patients and Genetic Features to Known Classes. Biometrical Journal 50:
  Storey, JD and Tibshirani, R (2003): Statistical significance for genomewide studies. PNAS 100:
More informationUnivariate Regression
Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is
More informationControlling the number of false discoveries: application to highdimensional genomic data
Journal of Statistical Planning and Inference 124 (2004) 379 398 www.elsevier.com/locate/jspi Controlling the number of false discoveries: application to highdimensional genomic data Edward L. Korn a;,
More informationStatistical basics for Biology: p s, alphas, and measurement scales.
334 Volume 25: Mini Workshops Statistical basics for Biology: p s, alphas, and measurement scales. Catherine Teare Ketter School of Marine Programs University of Georgia Athens Georgia 306023636 (706)
More informationOutline. Topic 4  Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4  Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test  Fall 2013 R 2 and the coefficient of correlation
More informationFORMALIZED DATA SNOOPING BASED ON GENERALIZED ERROR RATES
Econometric Theory, 24, 2008, 404 447+ Printed in the United States of America+ DOI: 10+10170S0266466608080171 FORMALIZED DATA SNOOPING BASED ON GENERALIZED ERROR RATES JOSEPH P. ROMANO Stanford University
More informationIntroduction to General and Generalized Linear Models
Introduction to General and Generalized Linear Models General Linear Models  part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK2800 Kgs. Lyngby
More informationExamples. David Ruppert. April 25, 2009. Cornell University. Statistics for Financial Engineering: Some R. Examples. David Ruppert.
Cornell University April 25, 2009 Outline 1 2 3 4 A little about myself BA and MA in mathematics PhD in statistics in 1977 taught in the statistics department at North Carolina for 10 years have been in
More informationApplying Statistics Recommended by Regulatory Documents
Applying Statistics Recommended by Regulatory Documents Steven Walfish President, Statistical Outsourcing Services steven@statisticaloutsourcingservices.com 301325 32531293129 About the Speaker Mr. Steven
More informationTOWARD BIG DATA ANALYSIS WORKSHOP
TOWARD BIG DATA ANALYSIS WORKSHOP 邁 向 巨 量 資 料 分 析 研 討 會 摘 要 集 2015.06.0506 巨 量 資 料 之 矩 陣 視 覺 化 陳 君 厚 中 央 研 究 院 統 計 科 學 研 究 所 摘 要 視 覺 化 (Visualization) 與 探 索 式 資 料 分 析 (Exploratory Data Analysis, EDA)
More informationMEU. INSTITUTE OF HEALTH SCIENCES COURSE SYLLABUS. Biostatistics
MEU. INSTITUTE OF HEALTH SCIENCES COURSE SYLLABUS title course code: Program name: Contingency Tables and Log Linear Models Level Biostatistics Hours/week Ther. Recite. Lab. Others Total Master of Sci.
More informationNotes for STA 437/1005 Methods for Multivariate Data
Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.
More informationIntegrating DNA Motif Discovery and GenomeWide Expression Analysis. Erin M. Conlon
Integrating DNA Motif Discovery and GenomeWide Expression Analysis Department of Mathematics and Statistics University of Massachusetts Amherst Statistics in Functional Genomics Workshop Ascona, Switzerland
More informationInterpretation of Somers D under four simple models
Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms
More informationQVALUE: The Manual Version 1.0
QVALUE: The Manual Version 1.0 Alan Dabney and John D. Storey Department of Biostatistics University of Washington Email: jstorey@u.washington.edu March 2003; Updated June 2003; Updated January 2004 Table
More informationMA2823: Foundations of Machine Learning
MA2823: Foundations of Machine Learning École Centrale Paris Fall 2015 ChloéAgathe Azencot Centre for Computational Biology, Mines ParisTech chloe agathe.azencott@mines paristech.fr TAs: Jiaqian Yu jiaqian.yu@centralesupelec.fr
More informationMachine Learning Methods for Demand Estimation
Machine Learning Methods for Demand Estimation By Patrick Bajari, Denis Nekipelov, Stephen P. Ryan, and Miaoyu Yang Over the past decade, there has been a high level of interest in modeling consumer behavior
More informationAnalysis of Illumina Gene Expression Microarray Data
Analysis of Illumina Gene Expression Microarray Data Asta Laiho, Msc. Tech. Bioinformatics research engineer The Finnish DNA Microarray Centre Turku Centre for Biotechnology, Finland The Finnish DNA Microarray
More informationStatistical Applications in Genetics and Molecular Biology
Statistical Applications in Genetics and Molecular Biology Volume 10, Issue 1 2011 Article 28 The Joint Null Criterion for Multiple Hypothesis Tests Jeffrey T. Leek, Johns Hopkins Bloomberg School of Public
More informationFinding statistical patterns in Big Data
Finding statistical patterns in Big Data Patrick RubinDelanchy University of Bristol & Heilbronn Institute for Mathematical Research IAS Research Workshop: Data science for the real world (workshop 1)
More informationBuilding risk prediction models  with a focus on GenomeWide Association Studies. Charles Kooperberg
Building risk prediction models  with a focus on GenomeWide Association Studies Risk prediction models Based on data: (D i, X i1,..., X ip ) i = 1,..., n we like to fit a model P(D = 1 X 1,..., X p )
More informationStatistical Models in R
Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 16233 Fall, 2007 Outline Statistical Models Structure of models in R Model Assessment (Part IA) Anova
More information