Frozen Robust Multi-Array Analysis and the Gene Expression Barcode
|
|
|
- Kenneth Willis
- 10 years ago
- Views:
Transcription
1 Frozen Robust Multi-Array Analysis and the Gene Expression Barcode Matthew N. McCall October 13, 2015 Contents 1 Frozen Robust Multiarray Analysis (frma) From CEL files to expression estimates Advanced Options Summarization Methods Input Vectors Output Parameters Creation of Gene Expression Barcodes Getting Started Example Output Options The frma implements methods for the preprocessing (frma) and analysis (barcode) of single microarrays. The frma algorithm allows the user to preprocess arrays individually while retaining the advantages of multi-array preprocessing methods such as RMA. The Gene Expression Barcode provides estimates of absolute expression rather than relative expression. 1
2 1 Frozen Robust Multiarray Analysis (frma) Frozen RMA (frma) is a microarray preprocessing algorithm that allows one to analyze microarrays individually or in small batches and then combine the data for analysis. This is accomplished by utilizing information from the large publicly available microarray databases. Specifically, estimates of probe-specific effects and variances are precomputed and frozen. Then, with new data sets, these frozen parameters are used in concert with information from the new array(s) to preprocess the data. The frma is particularly useful when it is not feasible to preprocess all of the data simultaneously. Such situations often arise when: ˆ large meta-analyses require one to preprocess more data than can be stored in memory, ˆ datasets grow incrementally and it would be laborious to preprocess all of the data each time a new array is added, ˆ microarrays are used to aid in clinical diagnosis and treatment, one needs to obtain information based on a single sample hybridized to a single microarray. Additionally, frma down-weights probes that appear to be batchy, those that are most susceptible to batch-effects. 1.1 From CEL files to expression estimates For Affymetrix microarray data, it is customary to view CEL files as the starting point for preprocessing and analysis. One or more CEL files can be read into R using the ReadAffy function from the affy package to produce an AffyBatch object or using the read.celfiles function from the oligo to produce a FeatureSet. The goal of frma is to obtain reliable gene-level intensities from the raw microarray data. This amounts to converting raw probe-level intensities into background-corrected, normalized, and summarized gene-level intensities. The frma function takes an Affy- Batch or FeatureSet as input and produces an object containing gene-level expression values. This object can take one of two forms, an ExpressionSet or a frmaexpressionset, depending upon the additional information that is requested. The gene-level intensities stored in both these objects can be accessed using the exprs method. In addition to the raw data, the frma algorithm requires a number of frozen parameter vectors. Among these are the reference distribution to which the data are normalized and the probe-effect estimates. We have computed these frozen parameters for many 2
3 popular Affymetrix platforms. The data for each of these platforms is stored in an R package of the form <platform>frmavecs. By default, the frma function attempts to load the appropriate data package for the input data object. The remainder of this section describes typical use of the frma. For details of the statistical methodology implemented in the frma function, please read the following papers: McCall MN, Bolstad BM, and Irizarry RA (2010). Frozen Robust Multi-Array Analysis (frma), Biostatistics, 11(2): McCall MN, Jaffee HA, Irizarry RA (2012). frma ST: Frozen robust multiarray analysis for Affymetrix Exon and Gene ST arrays, Bioinformatics, 28(23): The following is a simple description of how to preprocess a single CEL file using the default version of frma: 1. Download and install frma and the appropriate frozen parameter package (i.e. hgu133afrmavecs, hgu133plus2frmavecs, etc.). 2. Load the frma package. > library(frma) 3. Read in the data as either an AffyBatch or FeatureSet object. For the early Affymetrix platforms (e.g. HGU133plus2), you should use the ReadAffy function from the affy package to read in the CEL files. For the later Affymetrix platforms (e.g. Gene ST or Exon ST), you should use the read.celfiles function from the oligo package. 4. In this vignette we will load an example AffyBatch object: > library(frmaexampledata) > data(affybatchexample) 5. Preprocess the raw data object using the default version of frma. > object <- frma(affybatchexample) The final object will be an ExpressionSet or frmaexpressionset. The latter is an extension of the ExpressionSet class to hold additional information related to the preprocessing procedure such as weights and residuals. To obtain a matrix of gene-level expression values, enter the following command: > e <- exprs(object) 3
4 To assess the quality of the expression estimates, one can use the GNUSE function: > GNUSE(object,type="stats") GSM CEL.gz GSM CEL.gz GSM CEL.gz median IQR % % The GNUSE is a modification of the NUSE quality metric that allows one to assess the quality of individual samples relative to the large training data set used to compute the frozen parameter vectors. The GNUSE is described in more detail in the following paper: McCall MN, Murakami PN, Lukk M, Huber W, Irizarry RA (2011). Affymetrix GeneChip Microarray Quality, BMC Bioinformatics, 12:137. Assessing Depending on the number of CEL files and on the memory available to your system, you might experience errors like Cannot allocate vector.... An obvious option is to increase the memory available to your R process (by adding memory and/or closing external applications). You might also consider analyzing the data in smaller batches. Recall that frma allows you to preprocess data separately and then combine the preprocessed data for further analysis. 1.2 Advanced Options The default arguments to the frma function will be sufficient for most users; however, additional options have been implemented to allow the user to control each stage of the preprocessing, as well as, the information returned by the frma function. This flexibility is instrumental in allowing users to easily explore alternative methods of preprocessing Summarization Methods Summarization refers to the method used to combine probe-level expression values to obtain gene-level expression estimates. There are several summarization methods that one can choose from when running frma. A brief description of each of the methods follow: ˆ average: subtract the probe-effect and then compute the mean of the probes in each probeset. 4
5 ˆ median: subtract the probe-effect and compute the median of the probes in each probeset. ˆ median polish: this is the same as the median summarization because the probeeffects have already been removed. ˆ weighted average: compute a weighted average of the probes in each probeset with weights equal to the inverse of the sum of the precomputed within and between batch variance estimates. ˆ robust weighted average: compute a weighted average of the probes in each probeset with weights equal to the weights returned by an M-estimation procedure divided by the sum of the precomputed within and between batch variance estimates. ˆ random effect: the robust weighted average method adapted for a batch of new arrays (see the frma paper for details) Input Vectors While the vast majority of users will use the precomputed vectors provided in the frmavecs packages, the frma function will accept user-supplied frozen parameter vectors. The frmatools package contains functions to create your own frozen parameter vectors. There are several situations in which creating your own frozen parameter vectors may be beneficial. These are described in detail in: McCall MN and Irizarry RA (2011). Thawing Frozen Robust Multi-array Analysis (frma), BMC Bioinformatics, 12:369. If you create your own frozen parameter vectors using functions in the frmatools package, the vectors will already be in the correct format: a list with elements normvec, probevec, probevarbetween, probevarwithin, probesetsd, and medianse. A description of each of these elements follows: ˆ normvec: a vector containing values of the reference distribution to which samples will be quantile normalized ˆ probevec: a vector of probe-effect estimates ˆ probevarbetween: a vector of the between batch variance for each probe ˆ probevarwithin: a vector of the within batch variance for each probe ˆ probesetsd: a vector of average within probeset standard deviations 5
6 ˆ medianse: a vector of median standard errors of the expression estimates (used by the GNUSE function) Output Parameters While the default is to only return the gene-level expression estimates and if applicable their standard errors, the frma function can also return additional information about the estimates depending on the summarization method chosen. A description of the arguments that can be included in the output.param argument follows: ˆ weights: the weights from the M-estimation procedure ˆ residuals: the residuals from fitting the probe-level model ˆ randomeffects: estimated random effects from fitting the probe-level model with random effect summarization Note that not all of these outputs are available for all of the summarization methods. 2 Creation of Gene Expression Barcodes The barcode algorithm is designed to estimate which genes are expressed and which are unexpressed in a given microarray hybridization. This is accomplished by: (1) using the distribution of observed log 2 intensities across a wide variety of tissues to estimate an expressed and an unexpressed distribution for each gene, and (2) using these estimated distributions to determine which genes are expressed / unexpressed in a given sample. The first step is accomplished by fitting a hierarchical mixture model to the plethora of publicly available data. The second step is accomplished by determining where the observed intensities from the new array fall in the estimated distributions. The default output of the barcode function is a vector of ones and zeros denoting which genes are estimated to be expressed (ones) and unexpressed (zeros). We call this a gene expression barcode. For more details about the Gene Expression Barcode, please read the following papers: McCall MN, Uppal K, Jaffee HA, Zilliox MJ, and Irizarry RA (2011). The Gene Expression Barcode: leveraging public data repositories to begin cataloging the human and murine transcriptomes, Nucleic Acids Research, 39:D1011-D
7 McCall MN, Jaffee HA, Zelisko SJ, Sinha N, Hooiveld G, Irizarry RA, Zilliox MJ (2014). The Gene Expression Barcode 3.0: improved data processing and mining tools, Nucleic Acids Research, 42(D1):D938-D Getting Started To create a gene expression barcode, one needs estimates of the gene expression distributions specifically the mean and variance of the unexpressed distribution for each gene. We have computed these for several popular Affymetrix platforms. To use one of these, simply preprocess your data using the default options of frma and then run the barcode function on the resulting object. Similar to the frma function, the barcode function requires platform specific precomputed parameters. These parameters are stored in the same R packages as the frozen parameter vectors used by frma. In the default implementation, the barcode function attempts to load the appropriate set of parameters for the given ExpressionSet or frma- ExpressionSet object. It is also possible for the user to supply the necessary parameters via optional arguments Example 1. Download and install the frma package and the appropriate data package(s) (i.e. hgu133afrmavecs). 2. Load the frma package. > library(frma) 3. Read in the data and preprocess using the default options. > library(frmaexampledata) > data(affybatchexample) > object <- frma(affybatchexample) 4. Convert the expression values to a gene expression barcode. > bc <- barcode(object) 2.2 Output Options The default output of the barcode function is to return a vector of ones (expressed) and zeros (unexpressed); however, there are alternative output options. A brief description of each of these follows: 7
8 ˆ weight: a vector of weights which roughly correspond to the probability of expression for each gene. ˆ z-score: a vector of z-scores under the unexpressed normal distribution for each gene. ˆ p-value: a vector of p-values under the unexpressed normal distributionfor each gene. ˆ lod: a vector of LOD scores under the unexpressed normal distributionfor each gene. 8
Software and Methods for the Analysis of Affymetrix GeneChip Data. Rafael A Irizarry Department of Biostatistics Johns Hopkins University
Software and Methods for the Analysis of Affymetrix GeneChip Data Rafael A Irizarry Department of Biostatistics Johns Hopkins University Outline Overview Bioconductor Project Examples 1: Gene Annotation
affyplm: Fitting Probe Level Models
affyplm: Fitting Probe Level Models Ben Bolstad [email protected] http://bmbolstad.com April 16, 2015 Contents 1 Introduction 2 2 Fitting Probe Level Models 2 2.1 What is a Probe Level Model and What is
Quality Assessment of Exon and Gene Arrays
Quality Assessment of Exon and Gene Arrays I. Introduction In this white paper we describe some quality assessment procedures that are computed from CEL files from Whole Transcript (WT) based arrays such
Row Quantile Normalisation of Microarrays
Row Quantile Normalisation of Microarrays W. B. Langdon Departments of Mathematical Sciences and Biological Sciences University of Essex, CO4 3SQ Technical Report CES-484 ISSN: 1744-8050 23 June 2008 Abstract
Predictive Gene Signature Selection for Adjuvant Chemotherapy in Non-Small Cell Lung Cancer Patients
Predictive Gene Signature Selection for Adjuvant Chemotherapy in Non-Small Cell Lung Cancer Patients by Li Liu A practicum report submitted to the Department of Public Health Sciences in conformity with
DEVELOPMENT OF MAP/REDUCE BASED MICROARRAY ANALYSIS TOOLS
Clemson University TigerPrints All Theses Theses 8-2013 DEVELOPMENT OF MAP/REDUCE BASED MICROARRAY ANALYSIS TOOLS Guangyu Yang Clemson University, [email protected] Follow this and additional works at:
UKB_WCSGAX: UK Biobank 500K Samples Genotyping Data Generation by the Affymetrix Research Services Laboratory. April, 2015
UKB_WCSGAX: UK Biobank 500K Samples Genotyping Data Generation by the Affymetrix Research Services Laboratory April, 2015 1 Contents Overview... 3 Rare Variants... 3 Observation... 3 Approach... 3 ApoE
Gene Expression Analysis
Gene Expression Analysis Jie Peng Department of Statistics University of California, Davis May 2012 RNA expression technologies High-throughput technologies to measure the expression levels of thousands
Measuring gene expression (Microarrays) Ulf Leser
Measuring gene expression (Microarrays) Ulf Leser This Lecture Gene expression Microarrays Idea Technologies Problems Quality control Normalization Analysis next week! 2 http://learn.genetics.utah.edu/content/molecules/transcribe/
Analysis of gene expression data. Ulf Leser and Philippe Thomas
Analysis of gene expression data Ulf Leser and Philippe Thomas This Lecture Protein synthesis Microarray Idea Technologies Applications Problems Quality control Normalization Analysis next week! Ulf Leser:
Analyzing and Benchmarking Genomic Preprocessing and Batch Effect Removal Methods in Big Data Infrastructure
Faculty of Science and Bio-Engineering Sciences Department of Computer Science Analyzing and Benchmarking Genomic Preprocessing and Batch Effect Removal Methods in Big Data Infrastructure Graduation thesis
Exiqon Array Software Manual. Quick guide to data extraction from mircury LNA microrna Arrays
Exiqon Array Software Manual Quick guide to data extraction from mircury LNA microrna Arrays March 2010 Table of contents Introduction Overview...................................................... 3 ImaGene
Microarray Analysis. The Basics. Thomas Girke. December 9, 2011. Microarray Analysis Slide 1/42
Microarray Analysis The Basics Thomas Girke December 9, 2011 Microarray Analysis Slide 1/42 Technology Challenges Data Analysis Data Depositories R and BioConductor Homework Assignment Microarray Analysis
Data Mining - Evaluation of Classifiers
Data Mining - Evaluation of Classifiers Lecturer: JERZY STEFANOWSKI Institute of Computing Sciences Poznan University of Technology Poznan, Poland Lecture 4 SE Master Course 2008/2009 revised for 2010
Normalization Methods for Analysis of Affymetrix GeneChip Microarray
Microarray Data Analysis Normalization Methods for Analysis of Affymetrix GeneChip Microarray 中 央 研 究 院 生 命 科 學 圖 書 館 2008 年 教 育 訓 練 課 程 2008/01/29 1 吳 漢 銘 淡 江 大 學 數 學 系 [email protected] http://www.hmwu.idv.tw
Simulating Investment Portfolios
Page 5 of 9 brackets will now appear around your formula. Array formulas control multiple cells at once. When gen_resample is used as an array formula, it assures that the random sample taken from the
Using the Grid for the interactive workflow management in biomedicine. Andrea Schenone BIOLAB DIST University of Genova
Using the Grid for the interactive workflow management in biomedicine Andrea Schenone BIOLAB DIST University of Genova overview background requirements solution case study results background A multilevel
1/27/2013. PSY 512: Advanced Statistics for Psychological and Behavioral Research 2
PSY 512: Advanced Statistics for Psychological and Behavioral Research 2 Introduce moderated multiple regression Continuous predictor continuous predictor Continuous predictor categorical predictor Understand
Package GSA. R topics documented: February 19, 2015
Package GSA February 19, 2015 Title Gene set analysis Version 1.03 Author Brad Efron and R. Tibshirani Description Gene set analysis Maintainer Rob Tibshirani Dependencies impute
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
They can be obtained in HQJHQH format directly from the home page at: http://www.engene.cnb.uam.es/downloads/kobayashi.dat
HQJHQH70 *XLGHG7RXU This document contains a Guided Tour through the HQJHQH platform and it was created for training purposes with respect to the system options and analysis possibilities. It is not intended
Statistical issues in the analysis of microarray data
Statistical issues in the analysis of microarray data Daniel Gerhard Institute of Biostatistics Leibniz University of Hannover ESNATS Summerschool, Zermatt D. Gerhard (LUH) Analysis of microarray data
Standardization and Its Effects on K-Means Clustering Algorithm
Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03
Removing Unwanted Variation from High Dimensional Data with Negative Controls
Removing Unwanted Variation from High Dimensional Data with Negative Controls Johann A. Gagnon-Bartsch Laurent Jacob Terence P. Speed December 2, 2013 Abstract High dimensional data suffer from unwanted
Lecture 2: Descriptive Statistics and Exploratory Data Analysis
Lecture 2: Descriptive Statistics and Exploratory Data Analysis Further Thoughts on Experimental Design 16 Individuals (8 each from two populations) with replicates Pop 1 Pop 2 Randomly sample 4 individuals
Cluster software and Java TreeView
Cluster software and Java TreeView To download the software: http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/treeview.html Cluster 3.0
Multiple regression - Matrices
Multiple regression - Matrices This handout will present various matrices which are substantively interesting and/or provide useful means of summarizing the data for analytical purposes. As we will see,
Package GEOquery. August 18, 2015
Type Package Package GEOquery August 18, 2015 Title Get data from NCBI Gene Expression Omnibus (GEO) Version 2.34.0 Date 2014-09-28 Author Maintainer BugReports
FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem
FlipFlop: Fast Lasso-based Isoform Prediction as a Flow Problem Elsa Bernard Laurent Jacob Julien Mairal Jean-Philippe Vert September 24, 2013 Abstract FlipFlop implements a fast method for de novo transcript
Lecture 11 Data storage and LIMS solutions. Stéphane LE CROM [email protected]
Lecture 11 Data storage and LIMS solutions Stéphane LE CROM [email protected] Various steps of a DNA microarray experiment Experimental steps Data analysis Experimental design set up Chips on catalog
WEEK #3, Lecture 1: Sparse Systems, MATLAB Graphics
WEEK #3, Lecture 1: Sparse Systems, MATLAB Graphics Visualization of Matrices Good visuals anchor any presentation. MATLAB has a wide variety of ways to display data and calculation results that can be
Basic Analysis of Microarray Data
Basic Analysis of Microarray Data A User Guide and Tutorial Scott A. Ness, Ph.D. Co-Director, Keck-UNM Genomics Resource and Dept. of Molecular Genetics and Microbiology University of New Mexico HSC Tel.
InSyBio BioNets: Utmost efficiency in gene expression data and biological networks analysis
InSyBio BioNets: Utmost efficiency in gene expression data and biological networks analysis WHITE PAPER By InSyBio Ltd Konstantinos Theofilatos Bioinformatician, PhD InSyBio Technical Sales Manager August
Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation
Parkland College A with Honors Projects Honors Program 2014 Calculating P-Values Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating P-Values" (2014). A with Honors Projects.
Package cpm. July 28, 2015
Package cpm July 28, 2015 Title Sequential and Batch Change Detection Using Parametric and Nonparametric Methods Version 2.2 Date 2015-07-09 Depends R (>= 2.15.0), methods Author Gordon J. Ross Maintainer
A Toolbox for Bicluster Analysis in R
Sebastian Kaiser and Friedrich Leisch A Toolbox for Bicluster Analysis in R Technical Report Number 028, 2008 Department of Statistics University of Munich http://www.stat.uni-muenchen.de A Toolbox for
Paper PO03. A Case of Online Data Processing and Statistical Analysis via SAS/IntrNet. Sijian Zhang University of Alabama at Birmingham
Paper PO03 A Case of Online Data Processing and Statistical Analysis via SAS/IntrNet Sijian Zhang University of Alabama at Birmingham BACKGROUND It is common to see that statisticians at the statistical
Classification of Bad Accounts in Credit Card Industry
Classification of Bad Accounts in Credit Card Industry Chengwei Yuan December 12, 2014 Introduction Risk management is critical for a credit card company to survive in such competing industry. In addition
Stepwise Regression. Chapter 311. Introduction. Variable Selection Procedures. Forward (Step-Up) Selection
Chapter 311 Introduction Often, theory and experience give only general direction as to which of a pool of candidate variables (including transformed variables) should be included in the regression model.
Gene expression analysis. Ulf Leser and Karin Zimmermann
Gene expression analysis Ulf Leser and Karin Zimmermann Ulf Leser: Bioinformatics, Wintersemester 2010/2011 1 Last lecture What are microarrays? - Biomolecular devices measuring the transcriptome of a
APPENDIX D PHN CHECK DIGIT NUMBER ROUTINE
Ministry of Health Services Professional and Software Compliance Standards For HL7 Messaging APPENDIX D PHN CHECK DIGIT NUMBER ROUTINE Version 2.1 November 21, 2003 Professional and Software Compliance
Multivariate Normal Distribution
Multivariate Normal Distribution Lecture 4 July 21, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Lecture #4-7/21/2011 Slide 1 of 41 Last Time Matrices and vectors Eigenvalues
NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
Package empiricalfdr.deseq2
Type Package Package empiricalfdr.deseq2 May 27, 2015 Title Simulation-Based False Discovery Rate in RNA-Seq Version 1.0.3 Date 2015-05-26 Author Mikhail V. Matz Maintainer Mikhail V. Matz
Local outlier detection in data forensics: data mining approach to flag unusual schools
Local outlier detection in data forensics: data mining approach to flag unusual schools Mayuko Simon Data Recognition Corporation Paper presented at the 2012 Conference on Statistical Detection of Potential
15.062 Data Mining: Algorithms and Applications Matrix Math Review
.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop
Correlation of microarray and quantitative real-time PCR results. Elisa Wurmbach Mount Sinai School of Medicine New York
Correlation of microarray and quantitative real-time PCR results Elisa Wurmbach Mount Sinai School of Medicine New York Microarray techniques Oligo-array: Affymetrix, Codelink, spotted oligo-arrays (60-70mers)
Boolean Network Models
Boolean Network Models 2/5/03 History Kaufmann, 1970s Studied organization and dynamics properties of (N,k) Boolean Networks Found out that highly connected networks behave differently than lowly connected
Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC
Text Mining in JMP with R Andrew T. Karl, Senior Management Consultant, Adsurgo LLC Heath Rushing, Principal Consultant and Co-Founder, Adsurgo LLC 1. Introduction A popular rule of thumb suggests that
Introduction to robust calibration and variance stabilisation with VSN
Introduction to robust calibration and variance stabilisation with VSN Wolfgang Huber January 5, 2016 Contents 1 Getting started 1 2 Running VSN on data from a single two-colour array 2 3 Running VSN on
The Scientific Data Mining Process
Chapter 4 The Scientific Data Mining Process When I use a word, Humpty Dumpty said, in rather a scornful tone, it means just what I choose it to mean neither more nor less. Lewis Carroll [87, p. 214] In
User Manual. Transcriptome Analysis Console (TAC) Software. For Research Use Only. Not for use in diagnostic procedures. P/N 703150 Rev.
User Manual Transcriptome Analysis Console (TAC) Software For Research Use Only. Not for use in diagnostic procedures. P/N 703150 Rev. 1 Trademarks Affymetrix, Axiom, Command Console, DMET, GeneAtlas,
There are a number of different methods that can be used to carry out a cluster analysis; these methods can be classified as follows:
Statistics: Rosie Cornish. 2007. 3.1 Cluster Analysis 1 Introduction This handout is designed to provide only a brief introduction to cluster analysis and how it is done. Books giving further details are
DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7
DATA MINING TOOL FOR INTEGRATED COMPLAINT MANAGEMENT SYSTEM WEKA 3.6.7 UNDER THE GUIDANCE Dr. N.P. DHAVALE, DGM, INFINET Department SUBMITTED TO INSTITUTE FOR DEVELOPMENT AND RESEARCH IN BANKING TECHNOLOGY
A Brief Study of the Nurse Scheduling Problem (NSP)
A Brief Study of the Nurse Scheduling Problem (NSP) Lizzy Augustine, Morgan Faer, Andreas Kavountzis, Reema Patel Submitted Tuesday December 15, 2009 0. Introduction and Background Our interest in the
Microarray Data Mining: Puce a ADN
Microarray Data Mining: Puce a ADN Recent Developments Gregory Piatetsky-Shapiro KDnuggets EGC 2005, Paris 2005 KDnuggets EGC 2005 Role of Gene Expression Cell Nucleus Chromosome Gene expression Protein
Microarray Data Analysis. A step by step analysis using BRB-Array Tools
Microarray Data Analysis A step by step analysis using BRB-Array Tools 1 EXAMINATION OF DIFFERENTIAL GENE EXPRESSION (1) Objective: to find genes whose expression is changed before and after chemotherapy.
Programming Exercise 3: Multi-class Classification and Neural Networks
Programming Exercise 3: Multi-class Classification and Neural Networks Machine Learning November 4, 2011 Introduction In this exercise, you will implement one-vs-all logistic regression and neural networks
Lambda Architecture. Near Real-Time Big Data Analytics Using Hadoop. January 2015. Email: [email protected] Website: www.qburst.com
Lambda Architecture Near Real-Time Big Data Analytics Using Hadoop January 2015 Contents Overview... 3 Lambda Architecture: A Quick Introduction... 4 Batch Layer... 4 Serving Layer... 4 Speed Layer...
From Reads to Differentially Expressed Genes. The statistics of differential gene expression analysis using RNA-seq data
From Reads to Differentially Expressed Genes The statistics of differential gene expression analysis using RNA-seq data experimental design data collection modeling statistical testing biological heterogeneity
2.2 Elimination of Trend and Seasonality
26 CHAPTER 2. TREND AND SEASONAL COMPONENTS 2.2 Elimination of Trend and Seasonality Here we assume that the TS model is additive and there exist both trend and seasonal components, that is X t = m t +
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES
BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 123 CHAPTER 7 BEHAVIOR BASED CREDIT CARD FRAUD DETECTION USING SUPPORT VECTOR MACHINES 7.1 Introduction Even though using SVM presents
More details on the inputs, functionality, and output can be found below.
Overview: The SMEEACT (Software for More Efficient, Ethical, and Affordable Clinical Trials) web interface (http://research.mdacc.tmc.edu/smeeactweb) implements a single analysis of a two-armed trial comparing
Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.
Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are
Tutorial for proteome data analysis using the Perseus software platform
Tutorial for proteome data analysis using the Perseus software platform Laboratory of Mass Spectrometry, LNBio, CNPEM Tutorial version 1.0, January 2014. Note: This tutorial was written based on the information
Introduction To Real Time Quantitative PCR (qpcr)
Introduction To Real Time Quantitative PCR (qpcr) SABiosciences, A QIAGEN Company www.sabiosciences.com The Seminar Topics The advantages of qpcr versus conventional PCR Work flow & applications Factors
Tutorial 5: Hypothesis Testing
Tutorial 5: Hypothesis Testing Rob Nicholls [email protected] MRC LMB Statistics Course 2014 Contents 1 Introduction................................ 1 2 Testing distributional assumptions....................
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank
Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through
GAIA: Genomic Analysis of Important Aberrations
GAIA: Genomic Analysis of Important Aberrations Sandro Morganella Stefano Maria Pagnotta Michele Ceccarelli Contents 1 Overview 1 2 Installation 2 3 Package Dependencies 2 4 Vega Data Description 2 4.1
Using MATLAB: Bioinformatics Toolbox for Life Sciences
Using MATLAB: Bioinformatics Toolbox for Life Sciences MR. SARAWUT WONGPHAYAK BIOINFORMATICS PROGRAM, SCHOOL OF BIORESOURCES AND TECHNOLOGY, AND SCHOOL OF INFORMATION TECHNOLOGY, KING MONGKUT S UNIVERSITY
Categorical Data Visualization and Clustering Using Subjective Factors
Categorical Data Visualization and Clustering Using Subjective Factors Chia-Hui Chang and Zhi-Kai Ding Department of Computer Science and Information Engineering, National Central University, Chung-Li,
WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat
Information Builders enables agile information solutions with business intelligence (BI) and integration technologies. WebFOCUS the most widely utilized business intelligence platform connects to any enterprise
Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln. Log-Rank Test for More Than Two Groups
Survey, Statistics and Psychometrics Core Research Facility University of Nebraska-Lincoln Log-Rank Test for More Than Two Groups Prepared by Harlan Sayles (SRAM) Revised by Julia Soulakova (Statistics)
Polynomial Neural Network Discovery Client User Guide
Polynomial Neural Network Discovery Client User Guide Version 1.3 Table of contents Table of contents...2 1. Introduction...3 1.1 Overview...3 1.2 PNN algorithm principles...3 1.3 Additional criteria...3
Package hier.part. February 20, 2015. Index 11. Goodness of Fit Measures for a Regression Hierarchy
Version 1.0-4 Date 2013-01-07 Title Hierarchical Partitioning Author Chris Walsh and Ralph Mac Nally Depends gtools Package hier.part February 20, 2015 Variance partition of a multivariate data set Maintainer
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 [email protected] Genomics A genome is an organism s
Magruder Statistics & Data Analysis
Magruder Statistics & Data Analysis Caution: There will be Equations! Based Closely On: Program Model The International Harmonized Protocol for the Proficiency Testing of Analytical Laboratories, 2006
Gamma Distribution Fitting
Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics
WISE Sampling Distribution of the Mean Tutorial
Name Date Class WISE Sampling Distribution of the Mean Tutorial Exercise 1: How accurate is a sample mean? Overview A friend of yours developed a scale to measure Life Satisfaction. For the population
How To Understand And Solve A Linear Programming Problem
At the end of the lesson, you should be able to: Chapter 2: Systems of Linear Equations and Matrices: 2.1: Solutions of Linear Systems by the Echelon Method Define linear systems, unique solution, inconsistent,
Medical Information Management & Mining. You Chen Jan,15, 2013 [email protected]
Medical Information Management & Mining You Chen Jan,15, 2013 [email protected] 1 Trees Building Materials Trees cannot be used to build a house directly. How can we transform trees to building materials?
Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6
Introduction to Bioinformatics AS 250.265 Laboratory Assignment 6 In the last lab, you learned how to perform basic multiple sequence alignments. While useful in themselves for determining conserved residues
CNV Univariate Analysis Tutorial
CNV Univariate Analysis Tutorial Release 8.1 Golden Helix, Inc. March 18, 2014 Contents 1. Overview 2 2. CNAM Optimal Segmenting 4 A. Performing CNAM Optimal Segmenting..................................
Oracle Data Miner (Extension of SQL Developer 4.0)
An Oracle White Paper September 2013 Oracle Data Miner (Extension of SQL Developer 4.0) Integrate Oracle R Enterprise Mining Algorithms into a workflow using the SQL Query node Denny Wong Oracle Data Mining
