SAS MACROS AND SAS JMP GENOMICS FOR ANALYSIS OF MICROARRAY GENE EXPRESSION DATA


J. Sreekumar
Central Tuber Crops Research Institute
Sreekariyam, Thiruvananthapuram
Sreejyothi_in@yahoo.com

1. Introduction

Microarray gene expression experiments allow biologists to monitor the expression levels of thousands of genes simultaneously. Applications of microarrays range from the study of gene expression in plants under different environmental stress conditions to the comparison of gene expression profiles of tumours from cancer patients. DNA microarray experiments raise numerous statistical questions in fields as diverse as image analysis, experimental design, hypothesis testing, cluster analysis and distribution theory. Noise creeps into microarray experiments at every stage, from the preparation of tissue samples to the extraction of data. The greatest challenge for array technology lies in the analysis of the gene expression data, in particular in identifying which genes are differentially expressed across tissue samples or experimental conditions. This article summarizes some of the issues involved and briefly reviews the analysis tools (macros) available in SAS. It also describes how SAS JMP Genomics helps with the design of microarray experiments and with the analysis of the resulting gene expression data.

Any microarray experiment involves a number of distinct stages. First, the experiment must be designed. The researchers must decide which genes are to be printed on the arrays, which sources of RNA are to be hybridized to the arrays, and on how many arrays the hybridizations will be replicated. Second, hybridization is followed by a number of data-cleaning steps, or low-level analysis of the microarray data. The microarray images must be processed to acquire red and green foreground and background intensities for each spot.
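The sketch below makes the output of these low-level steps concrete. For a single spot, it computes the log-ratio M = log2(R / G) and the average log-intensity A = 0.5 * log2(R * G). Two-colour normalization methods, such as loess normalization, commonly operate on these two values. This is a generic Python sketch, not code from SAS or JMP Genomics. It assumes simple subtraction of the local background, and the intensity values are made up for illustration.

```python
import math


def ma_values(r_fg, r_bg, g_fg, g_bg):
    """Return (M, A) for one spot of a two-colour array, or None.

    R and G are the background-subtracted red and green intensities.
    M = log2(R / G) is the log-ratio; A = 0.5 * log2(R * G) is the
    average log-intensity. None is returned when either corrected
    intensity is not positive, because the logarithm is then undefined.
    """
    r = r_fg - r_bg  # background-corrected red (Cy5) intensity
    g = g_fg - g_bg  # background-corrected green (Cy3) intensity
    if r <= 0 or g <= 0:
        return None
    m = math.log2(r / g)
    a = 0.5 * math.log2(r * g)
    return m, a


# Hypothetical spots: (red foreground, red background, green foreground, green background)
spots = [(1200, 200, 600, 100), (500, 100, 900, 100), (150, 200, 400, 50)]
for spot in spots:
    print(spot, ma_values(*spot))
```

Spots whose corrected intensity is not positive (the third one here) cannot be log-transformed. Whether to flag them, filter them or use a different background method is a design decision this sketch leaves to the caller.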
The acquired red/green ratios must then be normalized. This adjusts for dye bias and for any other systematic variation not due to differences between the RNA samples being studied. Third, the normalized ratios are analysed by various graphical and numerical means. The aim is to select differentially expressed (DE) genes, or to find groups of genes whose expression profiles can reliably classify the different RNA sources into meaningful groups.

2. What is JMP Genomics?

JMP Genomics is statistical discovery software from SAS that combines the analytical procedures of SAS with the interactive interface of JMP. Research organizations use it to uncover meaningful patterns in high-throughput genetics, expression microarray and proteomics data. Interactive graphics and analysis dialog boxes make it easy to explore data relationships using a comprehensive set of traditional and advanced statistical algorithms. Statistical results are dynamically linked to the graphics to give a complete picture of the research results. More than 100 procedures for genetics, microarray and proteomics analysis make JMP Genomics an all-in-one solution. It covers screening a genome for significant genetic markers, looking for meaningful patterns in expression microarray data and examining high-throughput spectral data in a proteomics lab. JMP Genomics can be used to:
(a) identify key genes in large microarray data sets;
(b) assess quality-control metrics to identify and remove outlier arrays;
(c) normalize within and across arrays to remove the effects of experimental biases;
(d) perform gene-by-gene modelling to discover statistically significant differences; and
(e) reveal biological insight with pattern discovery and predictive modelling tools.

3. SAS Macros for Microarray data analysis

Several SAS macros are available for analysing microarray gene expression data. Hennequet-Antier et al. (2005) developed the AnovArray package, a collection of SAS macros based on analysis of variance (ANOVA) models. SAS procedures handle a wide range of statistical analyses used in microarray analysis, such as clustering, supervised classification, singular value decomposition and (partial least squares) regression. Because AnovArray is written in SAS, it interfaces naturally with all of these tools and can draw on everything SAS offers. AnovArray can be applied to normalized data from macroarray or microarray experiments with balanced factorial designs and a complete model. It includes a macro that identifies differentially expressed genes between experimental conditions under either a hypothesis of homogeneous variance (HOM) or one of heterogeneous variance (HET). The following sections roughly follow the analysis steps in SAS JMP Genomics, then describe the SAS macros for gene expression analysis in detail.

4. Gene expression analysis in JMP

4.1 Importing data in JMP

For expression and exon analysis, JMP Genomics requires two files: a design file and a data file. The design file contains all the information about the sample attributes. You should include as much information about your experiment as possible. This includes technical variables (e.g., date or batch) as well as experimental and clinical variables, and it makes the sources of variance easier to understand when you run the quality control processes. The design file has two required columns (variables). The Array column is numeric and has a unique number for each array.
The ColumnName column contains a unique identifier for each array. Tools to create these variables are found in the Experimental Design submenu of JMP Genomics.

When preparing to import Affymetrix expression data, the Affymetrix Experimental Design Wizard can help create a design file. It works from Affymetrix Array Attribute (ARR) files produced by Affymetrix's Expression Console, or from existing text or Excel files. When importing design information from text or Excel files, the design file template must contain:
- a column labelled File or FileName, containing the file names of all the arrays in the study; and
- at least one column with design information that will be used in statistical tests (e.g., Treatment).

Select Genomics > Experimental Design File > Affymetrix Experimental Design File Wizard. Click Next. In the following window, enter a name for the study and choose either Extract information from ARR Files (AGCC format) or Import design information from an existing text, CSV, or Excel file. To import Illumina expression data, go to Genomics > Import > Illumina > Expression.

4.2 Experimental designs

Before carrying out a microarray experiment, one must decide how many microarray slides will be used and which mRNA samples will be hybridized to each slide. Certain decisions must also be made in the preparation of the mRNA samples. For example: will the RNA from different animals be pooled or kept separate, and will fluorescent labelling be done separately for each array or in one step for a batch of RNA? Careful attention to these issues ensures that the available resources are used well, that obvious biases are avoided, and that the experimenter's primary questions can be answered. Kerr and Churchill (2001) and Glonek and Solomon (2002) apply ideas from optimal experimental design to suggest efficient designs for some common microarray experiments. Pan, Lin and Le (2002) consider sample size. Speed and Yang examine the efficiency of using a reference sample as against direct comparison.

4.3 The basic expression workflow

A sample workflow for the analysis of microarray data is as follows:
1) Generation of the Data Sets
   a. Experimental Design File Builder
   b. Data Set Creation
2) Evaluation of the Data Quality
   a. Raw Data Distribution Analysis
   b. Ratio Analysis (Raw Data)
   c. Ratio Analysis (Loess Normalization)
3) Comparison of Different Methods for Data Normalization
   a. Data Standardization (Median) & Standardized Distribution Analysis
   b. Loess Normalization Across Arrays & Distribution Analysis (Loess Normalized Data)
4) Evaluation of Normalized Data Quality
   a. Correlation and Principal Components
   b. Correlation and Grouped Scatter Plots
5) Primary Data Analysis for Determining Significant Differences in Gene Expression
   a. Analysis of Variance
   b. Mixed Model Analysis
6) Further Analysis
   a. Transpose Tall and Wide
   b. K-Means Clustering
   c. Distance Matrix
7) Predictive Modeling

5. SAS AnovArray package

The functions of the AnovArray package are written in the SAS macro language, so they can be submitted directly to SAS. They can also be combined with SAS procedures for clustering, supervised classification, singular value decomposition, partial least squares regression and so on.
AnovArray is available from the site. The data must be supplied in a text file (.txt extension) with columns separated by spaces or tabs. The input data are assumed to be normalized and to come from a balanced experimental design. The dataset must contain a column named GENE that holds the gene identifier.

5.1 The contents of the AnovArray package

The package contains five macros: global_analysis, cleandata, adjust, differential_analysis and comparison. The recommended approach is to iterate global_analysis, cleandata and adjust, then run differential_analysis and comparison.
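These input requirements (a whitespace-separated text file, a GENE column and a balanced design) can be checked before the macros are run. The following Python sketch is illustrative only and is not part of AnovArray. It assumes a long layout with one measurement per row, and the column names TREATMENT, DYE and SIGNAL are hypothetical. It reports the design as balanced and complete when every gene-by-factor-level combination occurs, and occurs equally often.

```python
from collections import Counter


def read_expression_table(text):
    """Parse whitespace- or tab-separated text with a header row into dicts."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, body = lines[0], lines[1:]
    if "GENE" not in header:
        raise ValueError("input must contain a GENE column")
    rows = []
    for fields in body:
        if len(fields) != len(header):
            raise ValueError(f"expected {len(header)} fields, got {len(fields)}")
        rows.append(dict(zip(header, fields)))
    return rows


def is_balanced_complete(rows, factors):
    """True if every GENE x factor-level cell is present with equal replication."""
    n_cells = len({r["GENE"] for r in rows})
    for f in factors:
        n_cells *= len({r[f] for r in rows})
    counts = Counter((r["GENE"],) + tuple(r[f] for f in factors) for r in rows)
    # Complete: no empty cells. Balanced: every cell has the same replicate count.
    return len(counts) == n_cells and len(set(counts.values())) == 1


# Hypothetical long-format data: one normalized measurement per row.
data = """
GENE TREATMENT DYE SIGNAL
g1   control   Cy3 7.9
g1   control   Cy5 8.1
g1   stress    Cy3 9.0
g1   stress    Cy5 9.2
g2   control   Cy3 5.1
g2   control   Cy5 5.0
g2   stress    Cy3 5.2
g2   stress    Cy5 5.3
"""

rows = read_expression_table(data)
print(is_balanced_complete(rows, ["TREATMENT", "DYE"]))      # True
print(is_balanced_complete(rows[:-1], ["TREATMENT", "DYE"]))  # False: one cell is empty
```

For a real file, the text would be read first, for example with `open("expression.txt").read()` (the file name is hypothetical).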

Figure 1: The AnovArray analysis strategy.

5.2 The global_analysis macro

The global_analysis macro uses PROC ANOVA to perform an analysis of variance on the data. The model is assumed to be complete and to come from a balanced experimental design. In the SAS output window, the macro displays the analysis of variance table with Fisher's F test for each factor under consideration. It can also compute fitted values, residuals and standardized residuals, and it produces several graphs of the standardized residuals.

The AnovArray package must be loaded before any macro is run. The first step is therefore to load it with

%include "c:/.../AnovArray.1.0.sas";

The global_analysis macro can then be executed with

%global_analysis(data=, stmts=, outdata=, outgraph=, procopt=, options=)

The arguments are:
- data specifies the input dataset.
- stmts specifies the PROC ANOVA statements for the analysis, separated by semicolons and passed as a single argument to the %str macro function.
- options are the same as for the PROC ANOVA statement.

5.3 The cleandata macro

The cleandata macro provides dataset-cleaning facilities to remove suspicious genes from the dataset. Sometimes these genes are explicitly known. In that case, the user can give the list of their identifiers as a cleandata macro argument.
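For the simplest balanced case, the quantities that global_analysis reports and the residual screening that cleandata applies can be sketched in a few lines. This numpy sketch is not AnovArray's code, and it makes three assumptions:
- It fits an additive gene + condition model with one observation per cell.
- It standardizes residuals by the root mean squared error only. SAS-style studentized residuals also divide by sqrt(1 - leverage), but leverage is the same for every cell in this design, so the two differ only by a constant factor.
- It uses a made-up 5 x 3 table.

The threshold `limit` plays the role of cleandata's limit argument.

```python
import numpy as np


def additive_fit(y):
    """Least-squares fit of y[i, j] = mu + gene_i + condition_j.

    y is a (genes x conditions) array with one observation per cell.
    Returns fitted values, raw residuals and residuals scaled by the root MSE.
    """
    grand = y.mean()
    gene_means = y.mean(axis=1, keepdims=True)
    cond_means = y.mean(axis=0, keepdims=True)
    fitted = gene_means + cond_means - grand
    residuals = y - fitted
    df_error = (y.shape[0] - 1) * (y.shape[1] - 1)
    mse = (residuals ** 2).sum() / df_error  # zero only if the fit is perfect
    return fitted, residuals, residuals / np.sqrt(mse)


def flag_suspicious_genes(std_resid, limit):
    """Row indices of genes with any standardized residual outside [-limit, limit]."""
    return np.flatnonzero((np.abs(std_resid) > limit).any(axis=1)).tolist()


# Made-up, perfectly additive table of 5 genes x 3 conditions ...
gene_effect = np.arange(5.0).reshape(5, 1)
cond_effect = np.array([[0.0, 0.5, -0.5]])
y = 8.0 + gene_effect + cond_effect
# ... with one aberrant measurement for gene 2 in condition 0.
y[2, 0] += 3.0

fitted, resid, std_resid = additive_fit(y)
print(flag_suspicious_genes(std_resid, limit=2.0))  # [2]
```

A single aberrant value also shifts the fitted gene and condition effects. Its standardized residual (about 2.07 here) is therefore smaller than the raw spike might suggest, which is one reason the fit-and-clean steps are repeated.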

%cleandata(data=, outdatakeep=, outdatadrop=, outdataoutliers=, limit=, droplist=, options=)

The arguments are:
- data specifies the dataset to be cleaned.
- outdatakeep (optional) names the output dataset, i.e. the original dataset with the selected genes removed.
- outdatadrop names the dataset containing the removed genes.
- limit is a real number; genes whose standardized residuals fall below -limit or above limit are removed from the original data.

5.4 The adjust macro

This is the normalization step, used to remove systematic errors before the differential analysis. The adjust macro writes nothing to the output window. Instead, it creates a dataset containing the adjusted signal.

%adjust(data=, outdata=, signal=, list=)

The arguments are:
- data refers to the input dataset.
- outdata refers to the output dataset after adjustment.
- signal refers to the name of the signal variable to be adjusted.
- list refers to the effects to be subtracted from the signal.

5.5 The differential_analysis macro

The differential_analysis macro identifies genes that are differentially expressed between two or more experimental or biological conditions. The test is based on ANOVA model comparison, and the user chooses between two procedures:
- hypothesis=hom assumes that all genes share a common (homogeneous) variance.
- hypothesis=het allows each gene its own (heterogeneous) variance.

The differential_analysis macro provides an output dataset with adjusted p-values under both the homogeneous and the heterogeneous assumption.
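A simplified per-gene one-way analysis illustrates the difference between the two variance hypotheses. This numpy sketch is not AnovArray's exact model or code. It assumes a balanced array of shape (genes, conditions, replicates). For each gene, it compares the between-condition mean square with one of two variance estimates:
- HET: the gene's own within-condition variance.
- HOM: a variance pooled over all genes, which has many more error degrees of freedom.

The small data set is made up.

```python
import numpy as np


def gene_f_statistics(y):
    """Per-gene F statistics for a condition effect under HOM and HET variances.

    y has shape (genes, conditions, replicates) and must be balanced.
    Returns {"hom": (F, df_num, df_den), "het": (F, df_num, df_den)}.
    """
    n_genes, n_cond, n_rep = y.shape
    cell_means = y.mean(axis=2)                          # (genes, conditions)
    gene_means = cell_means.mean(axis=1, keepdims=True)  # (genes, 1)
    ss_between = n_rep * ((cell_means - gene_means) ** 2).sum(axis=1)
    ss_within = ((y - cell_means[:, :, None]) ** 2).sum(axis=(1, 2))

    df_num = n_cond - 1
    df_het = n_cond * (n_rep - 1)  # each gene estimates its own variance
    df_hom = n_genes * df_het      # one variance pooled over all genes

    ms_between = ss_between / df_num
    f_het = ms_between / (ss_within / df_het)
    f_hom = ms_between / (ss_within.sum() / df_hom)
    return {"hom": (f_hom, df_num, df_hom), "het": (f_het, df_num, df_het)}


# Made-up data: 2 genes x 2 conditions x 2 replicates.
y = np.array([
    [[1.0, 3.0], [5.0, 7.0]],  # gene 0: condition means 2 and 6
    [[0.0, 4.0], [1.0, 3.0]],  # gene 1: no condition effect, larger spread
])
result = gene_f_statistics(y)
f_het, df_num, df_het = result["het"]  # f_het is [8.0, 0.0]; df (1, 2)
f_hom, _, df_hom = result["hom"]       # f_hom is [16 / 3.5, 0.0]; df (1, 4)
print(f_het, f_hom)
```

Gene 0 gets F = 8 under HET but only about 4.57 under HOM. This is because gene 1's larger within-condition spread inflates the pooled variance. P-values would then come from the upper tail of the F distribution with the returned degrees of freedom, for example via `scipy.stats.f.sf(f, df_num, df_den)`.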
%differential_analysis(data=, outdata=, outgraph=, hypothesis=, signal=, treatment=, fdr=)

The arguments are:
- data specifies the input dataset.
- outdata names the output dataset containing the results of the analysis.
- outgraph (optional) names the graphical output file.
- hypothesis specifies the homogeneous or heterogeneous variance hypothesis.
- signal specifies the variable to be analysed.
- treatment specifies the variable containing the conditions under which differential expression is to be analysed.
- fdr specifies the false discovery rate.

5.6 The comparison macro

The comparison macro compares the results of the differential_analysis macro under the homogeneous and heterogeneous variance hypotheses. Any gene that leads to different conclusions under the two hypotheses should have its variance examined before the results are finalized.

Figure 2: Interpretation of the comparison graph.

In summary, the five macros of the package can be used either independently or together, following the analysis strategy shown in Figure 1. The user defines the ANOVA model in the global_analysis macro. This macro computes the classical ANOVA table, which identifies the factors that help explain the observed differences in gene expression. Several graphs are available to check the model assumptions: homogeneity of variance and a Gaussian distribution of the residuals. These graphs can also highlight experimental factors that affect a subpopulation of genes. Several models can be tested, and the quality-control facilities (statistics in the ANOVA table, graphs) help to select the most appropriate one.

Depending on the results of global_analysis, it may be necessary to use the adjust and cleandata macros. The adjust macro systematically removes undesirable effects (factors) observed in the graphs produced by global_analysis. Likewise, the cleandata macro removes genes that do not satisfy the assumptions of the model. We advise using this iterative process (global_analysis, cleandata and adjust) before running the differential_analysis macro.
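The false discovery rate control in differential_analysis and the HOM-versus-HET check performed by comparison can be sketched together. The text does not say which FDR adjustment AnovArray implements. This pure-Python sketch assumes the Benjamini and Hochberg (1995) step-up procedure, and the gene names, p-values and the 0.05 level are made up. It adjusts the two sets of p-values and lists the genes whose call changes between the two hypotheses, i.e. the genes whose variances deserve a closer look.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Step down from the largest p-value so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def discordant_genes(genes, p_hom, p_het, alpha=0.05):
    """Genes called differentially expressed under exactly one variance hypothesis."""
    adj_hom = benjamini_hochberg(p_hom)
    adj_het = benjamini_hochberg(p_het)
    return [g for g, a, b in zip(genes, adj_hom, adj_het) if (a < alpha) != (b < alpha)]


genes = ["g1", "g2", "g3", "g4"]
p_hom = [0.001, 0.002, 0.30, 0.04]
p_het = [0.001, 0.20, 0.30, 0.50]
print(discordant_genes(genes, p_hom, p_het))  # ['g2']
```

Here g2 is significant only under the homogeneous-variance hypothesis. This is the kind of gene the comparison plot is meant to expose.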
The aim of the iterative process is to make sure that the model fits the data well and that its assumptions are satisfied. This is essential for obtaining reliable results on differentially expressed genes. The package also allows the differential analysis to be carried out under two hypotheses: either all genes have the same variance (homogeneous model, HOM) or each gene has its own variance (heterogeneous model, HET). The differential_analysis macro produces the list of genes differentially expressed between several experimental conditions, using p-values and adjusted p-values. For each gene, the null hypothesis is that the gene-by-condition interaction is null. The p-value is the probability, if this null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. P-values are calculated for each gene both under the hypothesis that all genes have the same variance and under the hypothesis that each gene has its own variance. The p-values are then corrected for multiple comparisons using the FDR (False Discovery Rate) (Benjamini and Hochberg, 1995; Tusher et al., 2001). A gene is declared differentially expressed if its adjusted p-value is lower than a significance level given by the user. Finally, the comparison macro compares graphically the results obtained under the two variance models. A plot of adjusted p-values under the homogeneous-variance hypothesis against adjusted p-values under the heterogeneous-variance hypothesis shows which genes probably do not satisfy the homogeneity-of-variance assumption.

6. Other macros in SAS

Other SAS macros are also available for microarray data analysis. The macros developed by Don (Dongguang) Li (NCIC-CTG, Queen's University) and Lei Qin (Cancer Research Institute, Queen's University) are available at

Their article introduces a SAS macro-based program for microarray data normalization. It discusses the algorithms of different methods for both within-array and between-array normalization. Using breast cancer data from the Stanford Microarray Database, the program was implemented to provide an automatic normalization process with optional method selection and automatic graph generation. The SAS macros written in the McIntyre Lab are available at

References

Kerr, M. K., and Churchill, G. A. (2001). Experimental design for gene expression microarrays. Biostatistics 2,
Glonek, G. F. V., and Solomon, P. J. (2002). Factorial designs for microarray experiments. Technical Report, Department of Applied Mathematics, University of Adelaide, Australia.
Pan, W., Lin, J., and Le, C. (2002). How many replicates of arrays are required to detect gene expression changes in microarray experiments? A mixture model approach. Genome Biology 3(5): research
Dudoit, S., Yang, Y. H., Speed, T. P., and Callow, M. J. (2002). Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments. Statistica Sinica 12,
Yang, Y. H., Dudoit, S., Luu, P., and Speed, T. P. (2001). Normalization for cDNA microarray data. In M. L. Bittner, Y. Chen, A. N. Dorsel, and E. R. Dougherty (eds.), Microarrays: Optical Technologies and Informatics, Volume 4266 of Proceedings of SPIE.
Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B 57, No. 1, pp
Tusher, V. G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS 98, No. 9, pp
Hennequet-Antier, C., Chiapello, H., Piot, K., Degrelle, S., Hue, I., Renard, J.-P., Rodolphe, F., and Robin, S. (2005). AnovArray: a set of SAS macros for the analysis of variance of gene expression data. BMC Bioinformatics 6:150.


More information

Regression step-by-step using Microsoft Excel

Regression step-by-step using Microsoft Excel Step 1: Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression

More information

Research Methods & Experimental Design

Research Methods & Experimental Design Research Methods & Experimental Design 16.422 Human Supervisory Control April 2004 Research Methods Qualitative vs. quantitative Understanding the relationship between objectives (research question) and

More information

Capturing Best Practice for Microarray Gene Expression Data Analysis

Capturing Best Practice for Microarray Gene Expression Data Analysis Capturing Best Practice for Microarray Gene Expression Data Analysis Gregory Piatetsky-Shapiro KDnuggets gps@kdnuggets.com Tom Khabaza SPSS tkhabaza@spss.com Sridhar Ramaswamy MIT / Whitehead Institute

More information

DeCyder Extended Data Analysis module Version 1.0

DeCyder Extended Data Analysis module Version 1.0 GE Healthcare DeCyder Extended Data Analysis module Version 1.0 Module for DeCyder 2D version 6.5 User Manual Contents 1 Introduction 1.1 Introduction... 7 1.2 The DeCyder EDA User Manual... 9 1.3 Getting

More information

Multiple Regression: What Is It?

Multiple Regression: What Is It? Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in

More information

Cluster software and Java TreeView

Cluster software and Java TreeView Cluster software and Java TreeView To download the software: http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/treeview.html Cluster 3.0

More information

Data analysis and regression in Stata

Data analysis and regression in Stata Data analysis and regression in Stata This handout shows how the weekly beer sales series might be analyzed with Stata (the software package now used for teaching stats at Kellogg), for purposes of comparing

More information

Using Illumina BaseSpace Apps to Analyze RNA Sequencing Data

Using Illumina BaseSpace Apps to Analyze RNA Sequencing Data Using Illumina BaseSpace Apps to Analyze RNA Sequencing Data The Illumina TopHat Alignment and Cufflinks Assembly and Differential Expression apps make RNA data analysis accessible to any user, regardless

More information

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information

More information

International Journal of Computer Trends and Technology (IJCTT) volume 4 Issue 8 August 2013

International Journal of Computer Trends and Technology (IJCTT) volume 4 Issue 8 August 2013 A Short-Term Traffic Prediction On A Distributed Network Using Multiple Regression Equation Ms.Sharmi.S 1 Research Scholar, MS University,Thirunelvelli Dr.M.Punithavalli Director, SREC,Coimbatore. Abstract:

More information

Distances, Clustering, and Classification. Heatmaps

Distances, Clustering, and Classification. Heatmaps Distances, Clustering, and Classification Heatmaps 1 Distance Clustering organizes things that are close into groups What does it mean for two genes to be close? What does it mean for two samples to be

More information

Lecture 2: Descriptive Statistics and Exploratory Data Analysis

Lecture 2: Descriptive Statistics and Exploratory Data Analysis Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals

More information

Rachel J. Goldberg, Guideline Research/Atlanta, Inc., Duluth, GA

Rachel J. Goldberg, Guideline Research/Atlanta, Inc., Duluth, GA PROC FACTOR: How to Interpret the Output of a Real-World Example Rachel J. Goldberg, Guideline Research/Atlanta, Inc., Duluth, GA ABSTRACT THE METHOD This paper summarizes a real-world example of a factor

More information

Using R for Linear Regression

Using R for Linear Regression Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional

More information

Unsupervised and supervised dimension reduction: Algorithms and connections

Unsupervised and supervised dimension reduction: Algorithms and connections Unsupervised and supervised dimension reduction: Algorithms and connections Jieping Ye Department of Computer Science and Engineering Evolutionary Functional Genomics Center The Biodesign Institute Arizona

More information

FACTOR ANALYSIS. Factor Analysis is similar to PCA in that it is a technique for studying the interrelationships among variables.

FACTOR ANALYSIS. Factor Analysis is similar to PCA in that it is a technique for studying the interrelationships among variables. FACTOR ANALYSIS Introduction Factor Analysis is similar to PCA in that it is a technique for studying the interrelationships among variables Both methods differ from regression in that they don t have

More information

Concepts of Experimental Design

Concepts of Experimental Design Design Institute for Six Sigma A SAS White Paper Table of Contents Introduction...1 Basic Concepts... 1 Designing an Experiment... 2 Write Down Research Problem and Questions... 2 Define Population...

More information

200631 - ADO - Omics Data Analysis

200631 - ADO - Omics Data Analysis Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2016 200 - FME - School of Mathematics and Statistics 1004 - UB - (ENG)Universitat de Barcelona MASTER'S DEGREE IN STATISTICS AND

More information

Data Acquisition. DNA microarrays. The functional genomics pipeline. Experimental design affects outcome data analysis

Data Acquisition. DNA microarrays. The functional genomics pipeline. Experimental design affects outcome data analysis Data Acquisition DNA microarrays The functional genomics pipeline Experimental design affects outcome data analysis Data acquisition microarray processing Data preprocessing scaling/normalization/filtering

More information

SAS Certificate Applied Statistics and SAS Programming

SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and SAS Programming SAS Certificate Applied Statistics and Advanced SAS Programming Brigham Young University Department of Statistics offers an Applied Statistics and

More information

A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Variance and Bias

A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Variance and Bias A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Variance and Bias B. M. Bolstad, R. A. Irizarry 2, M. Astrand 3 and T. P. Speed 4, 5 Group in Biostatistics, University

More information

Exercise with Gene Ontology - Cytoscape - BiNGO

Exercise with Gene Ontology - Cytoscape - BiNGO Exercise with Gene Ontology - Cytoscape - BiNGO This practical has material extracted from http://www.cbs.dtu.dk/chipcourse/exercises/ex_go/goexercise11.php In this exercise we will analyze microarray

More information

Why do statisticians "hate" us?

Why do statisticians hate us? Why do statisticians "hate" us? David Hand, Heikki Mannila, Padhraic Smyth "Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data

More information

Introduction to Pattern Recognition

Introduction to Pattern Recognition Introduction to Pattern Recognition Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2009 CS 551, Spring 2009 c 2009, Selim Aksoy (Bilkent University)

More information

Package empiricalfdr.deseq2

Package empiricalfdr.deseq2 Type Package Package empiricalfdr.deseq2 May 27, 2015 Title Simulation-Based False Discovery Rate in RNA-Seq Version 1.0.3 Date 2015-05-26 Author Mikhail V. Matz Maintainer Mikhail V. Matz

More information

Chapter 6 Experiment Process

Chapter 6 Experiment Process Chapter 6 Process ation is not simple; we have to prepare, conduct and analyze experiments properly. One of the main advantages of an experiment is the control of, for example, subjects, objects and instrumentation.

More information

An Introduction to. Metrics. used during. Software Development

An Introduction to. Metrics. used during. Software Development An Introduction to Metrics used during Software Development Life Cycle www.softwaretestinggenius.com Page 1 of 10 Define the Metric Objectives You can t control what you can t measure. This is a quote

More information

Recombinant DNA and Biotechnology

Recombinant DNA and Biotechnology Recombinant DNA and Biotechnology Chapter 18 Lecture Objectives What Is Recombinant DNA? How Are New Genes Inserted into Cells? What Sources of DNA Are Used in Cloning? What Other Tools Are Used to Study

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012

Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization. Learning Goals. GENOME 560, Spring 2012 Why Taking This Course? Course Introduction, Descriptive Statistics and Data Visualization GENOME 560, Spring 2012 Data are interesting because they help us understand the world Genomics: Massive Amounts

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

ABSORBENCY OF PAPER TOWELS

ABSORBENCY OF PAPER TOWELS ABSORBENCY OF PAPER TOWELS 15. Brief Version of the Case Study 15.1 Problem Formulation 15.2 Selection of Factors 15.3 Obtaining Random Samples of Paper Towels 15.4 How will the Absorbency be measured?

More information

BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS

BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS BBSRC TECHNOLOGY STRATEGY: TECHNOLOGIES NEEDED BY RESEARCH KNOWLEDGE PROVIDERS 1. The Technology Strategy sets out six areas where technological developments are required to push the frontiers of knowledge

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is

More information

Tutorial Segmentation and Classification

Tutorial Segmentation and Classification MARKETING ENGINEERING FOR EXCEL TUTORIAL VERSION 1.0.8 Tutorial Segmentation and Classification Marketing Engineering for Excel is a Microsoft Excel add-in. The software runs from within Microsoft Excel

More information

IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA

IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA CALIFORNIA STATE UNIVERSITY, LOS ANGELES INFORMATION TECHNOLOGY SERVICES IBM SPSS Statistics 20 Part 4: Chi-Square and ANOVA Summer 2013, Version 2.0 Table of Contents Introduction...2 Downloading the

More information

Basic Analysis of Microarray Data

Basic Analysis of Microarray Data Basic Analysis of Microarray Data A User Guide and Tutorial Scott A. Ness, Ph.D. Co-Director, Keck-UNM Genomics Resource and Dept. of Molecular Genetics and Microbiology University of New Mexico HSC Tel.

More information

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi - 110 012 seema@iasri.res.in 1. Descriptive Statistics Statistics

More information

12: Analysis of Variance. Introduction

12: Analysis of Variance. Introduction 1: Analysis of Variance Introduction EDA Hypothesis Test Introduction In Chapter 8 and again in Chapter 11 we compared means from two independent groups. In this chapter we extend the procedure to consider

More information

STATISTICS AND GENE EXPRESSION ANALYSIS

STATISTICS AND GENE EXPRESSION ANALYSIS STATISTICS AND GENE EXPRESSION ANALYSIS TERRY SPEED Department of Statistics, University of California at Berkeley Division of Genetics & Bioinformatics, Walter & Eliza Hall Institute of Medical Research

More information

Simple Predictive Analytics Curtis Seare

Simple Predictive Analytics Curtis Seare Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

More information

Randomized Block Analysis of Variance

Randomized Block Analysis of Variance Chapter 565 Randomized Block Analysis of Variance Introduction This module analyzes a randomized block analysis of variance with up to two treatment factors and their interaction. It provides tables of

More information

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data

Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data CMPE 59H Comparison of Non-linear Dimensionality Reduction Techniques for Classification with Gene Expression Microarray Data Term Project Report Fatma Güney, Kübra Kalkan 1/15/2013 Keywords: Non-linear

More information

Two-Way ANOVA tests. I. Definition and Applications...2. II. Two-Way ANOVA prerequisites...2. III. How to use the Two-Way ANOVA tool?...

Two-Way ANOVA tests. I. Definition and Applications...2. II. Two-Way ANOVA prerequisites...2. III. How to use the Two-Way ANOVA tool?... Two-Way ANOVA tests Contents at a glance I. Definition and Applications...2 II. Two-Way ANOVA prerequisites...2 III. How to use the Two-Way ANOVA tool?...3 A. Parametric test, assume variances equal....4

More information

Valutazione di software di analisi di microarray basato su simulazioni di immagini da dati reali

Valutazione di software di analisi di microarray basato su simulazioni di immagini da dati reali Testing and evaluation of microarray image analysis software Valutazione di software di analisi di microarray basato su simulazioni di immagini da dati reali Ignazio Infantino Tutorial su Metodi e strumenti

More information

What is Data Analysis. Kerala School of MathematicsCourse in Statistics for Scientis. Introduction to Data Analysis. Steps in a Statistical Study

What is Data Analysis. Kerala School of MathematicsCourse in Statistics for Scientis. Introduction to Data Analysis. Steps in a Statistical Study Kerala School of Mathematics Course in Statistics for Scientists Introduction to Data Analysis T.Krishnan Strand Life Sciences, Bangalore What is Data Analysis Statistics is a body of methods how to use

More information

Statistical Issues in cdna Microarray Data Analysis

Statistical Issues in cdna Microarray Data Analysis Citation: Smyth, G. K., Yang, Y.-H., Speed, T. P. (2003). Statistical issues in cdna microarray data analysis. Methods in Molecular Biology 224, 111-136. [PubMed ID 12710670] Statistical Issues in cdna

More information

Moderation. Moderation

Moderation. Moderation Stats - Moderation Moderation A moderator is a variable that specifies conditions under which a given predictor is related to an outcome. The moderator explains when a DV and IV are related. Moderation

More information

Linear Models in STATA and ANOVA

Linear Models in STATA and ANOVA Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples

More information

UKB_WCSGAX: UK Biobank 500K Samples Genotyping Data Generation by the Affymetrix Research Services Laboratory. April, 2015

UKB_WCSGAX: UK Biobank 500K Samples Genotyping Data Generation by the Affymetrix Research Services Laboratory. April, 2015 UKB_WCSGAX: UK Biobank 500K Samples Genotyping Data Generation by the Affymetrix Research Services Laboratory April, 2015 1 Contents Overview... 3 Rare Variants... 3 Observation... 3 Approach... 3 ApoE

More information

Frozen Robust Multi-Array Analysis and the Gene Expression Barcode

Frozen Robust Multi-Array Analysis and the Gene Expression Barcode Frozen Robust Multi-Array Analysis and the Gene Expression Barcode Matthew N. McCall October 13, 2015 Contents 1 Frozen Robust Multiarray Analysis (frma) 2 1.1 From CEL files to expression estimates...................

More information

Single-Cell DNA Sequencing with the C 1. Single-Cell Auto Prep System. Reveal hidden populations and genetic diversity within complex samples

Single-Cell DNA Sequencing with the C 1. Single-Cell Auto Prep System. Reveal hidden populations and genetic diversity within complex samples DATA Sheet Single-Cell DNA Sequencing with the C 1 Single-Cell Auto Prep System Reveal hidden populations and genetic diversity within complex samples Single-cell sensitivity Discover and detect SNPs,

More information

ALLEN Mouse Brain Atlas

ALLEN Mouse Brain Atlas TECHNICAL WHITE PAPER: QUALITY CONTROL STANDARDS FOR HIGH-THROUGHPUT RNA IN SITU HYBRIDIZATION DATA GENERATION Consistent data quality and internal reproducibility are critical concerns for high-throughput

More information

Analyzing microrna Data and Integrating mirna with Gene Expression Data in Partek Genomics Suite 6.6

Analyzing microrna Data and Integrating mirna with Gene Expression Data in Partek Genomics Suite 6.6 Analyzing microrna Data and Integrating mirna with Gene Expression Data in Partek Genomics Suite 6.6 Overview This tutorial outlines how microrna data can be analyzed within Partek Genomics Suite. Additionally,

More information