Affymetrix .CEL Files. Preprocessing for Affymetrix GeneChip Data. Methods. MAS 5.0 Signal: Background Adjustment
Preprocessing for Affymetrix GeneChip Data
1/18/11
Copyright 2011 Dan Nettleton

Affymetrix .CEL Files

A .CEL file contains one number representing signal intensity for each probe cell on a single GeneChip. .CEL files can be read with Affymetrix software or in R using the Bioconductor package affy. We will discuss two methods for normalizing and obtaining expression measures using data from Affymetrix .CEL files.

Methods

1. Microarray Analysis Suite (MAS) 5.0 Signal, proposed by Affymetrix. Statistical Algorithms Description Document (2002), Affymetrix Inc.
2. Robust Multi-array Average (RMA), proposed by Irizarry et al. (2003), Biostatistics 4, 249-264.

These are perhaps the two most popular of many methods for normalizing and computing expression measures using Affymetrix data. Currently > 5 methods are described and compared at

MAS 5.0 Signal: Background Adjustment

Each chip is divided into 16 rectangular zones. The lowest 2% of intensities in each zone are averaged to form a zone-specific background value denoted $bz_k$ for zones $k = 1, 2, \ldots, 16$. The standard deviation of the lowest 2% of intensities in each zone is calculated and denoted $nz_k$ for zones $k = 1, 2, \ldots, 16$. Let $d_k(x,y)$ denote the distance from the center of zone $k$ to a probe cell located at coordinates $(x,y)$ on the chip.

[Figure: GeneChip Divided into 16 Zones]

[Figure: 16 Distances to Zone Centers for Each Probe Cell, showing a probe cell at coordinates $(x,y)$ and the distances $d_1(x,y), d_2(x,y), \ldots, d_{16}(x,y)$]
MAS 5.0 Signal: Background Adjustment

Let $w_k(x,y) = 1 / (d_k^2(x,y) + 100)$.

Denote the background for the cell located at coordinates $(x,y)$ by
$$b(x,y) = \frac{\sum_{k=1}^{16} w_k(x,y)\, bz_k}{\sum_{k=1}^{16} w_k(x,y)}.$$

Denote the noise for the cell located at coordinates $(x,y)$ by
$$n(x,y) = \frac{\sum_{k=1}^{16} w_k(x,y)\, nz_k}{\sum_{k=1}^{16} w_k(x,y)}.$$

Let $I(x,y)$ denote the original intensity of the cell located at coordinates $(x,y)$ on the chip. (75th percentile of 36 pixel intensities in the center of the cell.)

Let $I'(x,y) = \max(I(x,y),\ 0.5)$.

Define the background-adjusted intensity for the cell at coordinates $(x,y)$ by
$$A(x,y) = \max\{\, I'(x,y) - b(x,y),\ 0.5\, n(x,y) \,\}.$$

Henceforth these background-adjusted intensities will be referred to as either PM or MM for perfect match or mismatch cells, respectively.

MAS 5.0 Signal: Ideal Mismatch Computation

MM values are supposed to provide measures of cross-hybridization and stray signal intensity that inflate the value of PM. In the simplest case, a PM value would be corrected simply by subtracting its corresponding MM value. However, some MM values are bigger than their corresponding PM values, so that PM - MM would become negative.

Because negative values do not make a lot of sense and would pose problems with subsequent steps in analysis, Affymetrix determines an Ideal Mismatch (IM) value for each probe pair that is guaranteed to be less than PM.

For a given probe set containing $n$ probe pairs, let $PM_j$ and $MM_j$ denote the perfect match and mismatch values of the $j$th probe pair. The IM value from the $j$th probe pair ($IM_j$) is determined as follows:

If $PM_j > MM_j$, then $IM_j = MM_j$ and no further computation is needed.

If $PM_j \le MM_j$, compute
$$M = \mathrm{TBW}\{\log_2(PM_1/MM_1), \ldots, \log_2(PM_n/MM_n)\}$$
where TBW denotes a one-step Tukey BiWeight (a special weighted average described later).

If $M > 0.03$, then $IM_j = PM_j / 2^{M}$.

If $M \le 0.03$, then compute
$$P = \frac{0.03}{1 + (0.03 - M)/10}$$
and let $IM_j = PM_j / 2^{P}$.

Note that at $M = 0.03$, $IM_j = PM_j / 1.0211$, so that $PM_j$ will be slightly larger than $IM_j$. As $M$ gets larger, $IM_j$ decreases. As $M$ gets smaller, $IM_j$ increases toward $PM_j$.

MAS 5.0 Signal: Signal Log Value Computation

Let $V_j = \max(PM_j - IM_j,\ 2^{-20})$.

Define the probe value for the $j$th probe pair by $PV_j = \log_2(V_j)$.

The signal log value for a given probe set is defined by
$$SLV = \mathrm{TBW}(PV_1, PV_2, \ldots, PV_n)$$
where TBW denotes a one-step Tukey BiWeight (a special weighted average to be discussed later).
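The per-cell and per-probe-pair steps above (zone-weighted background, background adjustment, ideal mismatch, and probe value) can be sketched in a few short functions. This is a minimal illustration rather than the Affymetrix implementation: the function and parameter names are hypothetical, the constants (100, 0.5, 0.03, 10, and 2^-20) are exposed as default arguments on the assumption that they match the formulas above, and M is passed in as a number because the Tukey biweight is only described later.

```python
import math


def zone_weighted_average(x, y, zone_centers, zone_values, smooth=100.0):
    """Weighted average of zone values with weights w_k = 1 / (d_k^2 + smooth).

    Used for both b(x, y) (with the bz_k values) and n(x, y) (with the nz_k values).
    """
    weights = [1.0 / ((x - cx) ** 2 + (y - cy) ** 2 + smooth)
               for cx, cy in zone_centers]
    total = sum(w * v for w, v in zip(weights, zone_values))
    return total / sum(weights)


def background_adjust(intensity, b, n, floor=0.5, noise_frac=0.5):
    """A(x, y) = max(I'(x, y) - b(x, y), noise_frac * n(x, y)), with I' = max(I, floor)."""
    shifted = max(intensity, floor)
    return max(shifted - b, noise_frac * n)


def ideal_mismatch(pm, mm, m, contrast=0.03, scale=10.0):
    """IM_j for one probe pair; m is the probe-set value M (TBW of the log2 ratios)."""
    if pm > mm:
        return mm                      # MM is usable as it is
    if m > contrast:
        return pm / 2.0 ** m           # IM_j = PM_j / 2^M
    p = contrast / (1.0 + (contrast - m) / scale)
    return pm / 2.0 ** p               # IM_j = PM_j / 2^P


def probe_value(pm, im, delta=2.0 ** -20):
    """PV_j = log2(max(PM_j - IM_j, delta))."""
    return math.log2(max(pm - im, delta))


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise each branch.
    centers = [(0.0, 0.0), (100.0, 0.0)]
    print(zone_weighted_average(10.0, 0.0, centers, [30.0, 50.0]))
    print(background_adjust(100.0, 30.0, 10.0))
    for m in (1.0, 0.03, -5.0):
        im = ideal_mismatch(100.0, 120.0, m)
        print(m, im, probe_value(100.0, im))
```

Because the two lower branches of `ideal_mismatch` agree at M = 0.03, IM changes continuously with M, and it stays below PM in every branch, so the argument of the logarithm in `probe_value` is always positive.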
MAS 5.0 Signal: Scaling and Signal Calculation

Let $SLV_i$ denote the signal log value for the $i$th probe set on a single chip. Let $I$ denote the number of probe sets on the chip.

Let
$$SF = \frac{500}{\mathrm{TrimMean}(2^{SLV_1}, 2^{SLV_2}, \ldots, 2^{SLV_I};\ 0.02,\ 0.98)},$$
where TrimMean is the average of the values in parentheses that are strictly between the 0.02 and 0.98 quantiles of the values in parentheses.

MAS 5.0 Signal for the $i$th probe set is $\mathrm{Signal}_i = SF \cdot 2^{SLV_i}$.

All computations are done separately for each chip to obtain a Signal value for each chip and probe set.

The One-Step Tukey BiWeight Estimator Used by Affymetrix

Let $x_1, x_2, \ldots, x_n$ denote observations.

Let $m = \mathrm{median}(x_1, x_2, \ldots, x_n)$.

Let $MAD = \mathrm{median}(|x_1 - m|, |x_2 - m|, \ldots, |x_n - m|)$.

For each $i = 1, 2, \ldots, n$, let
$$t_i = \frac{x_i - m}{5 \cdot MAD + 0.0001}.$$
The 0.0001 term is a factor Affymetrix uses to avoid division by 0.

The One-Step Tukey BiWeight Estimator Used by Affymetrix (ctd.)

Recall the bisquare weight function defined as
$$B(t) = \begin{cases} (1 - t^2)^2 & \text{for } |t| < 1 \\ 0 & \text{for } |t| \ge 1. \end{cases}$$

$$\mathrm{TBW}(x_1, x_2, \ldots, x_n) = \frac{\sum_{i=1}^{n} B(t_i)\, x_i}{\sum_{i=1}^{n} B(t_i)}$$

[Figure: Bisquare Weight Function, $B(t)$ plotted against $t$]

An Example

Compute TBW(1, 7, 13, 15, 28, 175).

$m = (13 + 15)/2 = 14$.

Ignore the 0.0001 factor to make calculations easier.

MAD = median(|1-14|, |7-14|, |13-14|, |15-14|, |28-14|, |175-14|)
    = median(13, 7, 1, 1, 14, 161)
    = median(1, 1, 7, 13, 14, 161)
    = (7 + 13)/2 = 10.

$t_1 = -13/50$, $t_2 = -7/50$, $t_3 = -1/50$, $t_4 = 1/50$, $t_5 = 14/50$, $t_6 = 161/50$

$B(t_1) = B(0.26) = (1 - 0.26^2)^2 = 0.8694$
$B(t_2) = B(0.14) = (1 - 0.14^2)^2 = 0.9612$
$B(t_3) = B(0.02) = (1 - 0.02^2)^2 = 0.9992$
$B(t_4) = B(0.02) = (1 - 0.02^2)^2 = 0.9992$
$B(t_5) = B(0.28) = (1 - 0.28^2)^2 = 0.8493$
$B(t_6) = 0$

$$\mathrm{TBW} = \frac{0.8694 \cdot 1 + 0.9612 \cdot 7 + 0.9992 \cdot 13 + 0.9992 \cdot 15 + 0.8493 \cdot 28 + 0 \cdot 175}{0.8694 + 0.9612 + 0.9992 + 0.9992 + 0.8493 + 0} \approx 12.69$$

Obtaining MAS 5.0 Signal Values from Affymetrix .CEL Files

MAS 5.0 Signal values can be obtained from Affymetrix software. Approximate MAS 5.0 Signal values can be computed with the mas5 function that is part of the Bioconductor package affy. Use whichever method is easiest for you. The differences do not seem to be large enough to matter.
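The one-step Tukey biweight defined earlier in this section is short enough to check numerically. The sketch below follows the definitions as stated (median, MAD, scaled distances, bisquare weights). The function names and the `c` and `eps` arguments are illustrative choices, not part of the affy package, and setting `eps=0.0` mirrors the simplification used in the worked example.

```python
from statistics import median


def bisquare(t):
    """Bisquare weight: (1 - t^2)^2 inside (-1, 1), zero outside."""
    return (1.0 - t * t) ** 2 if abs(t) < 1.0 else 0.0


def tukey_biweight(xs, c=5.0, eps=0.0001):
    """One-step Tukey biweight: a weighted mean that downweights values far from the median."""
    m = median(xs)
    mad = median([abs(x - m) for x in xs])
    # eps keeps the denominator positive when MAD is 0.
    weights = [bisquare((x - m) / (c * mad + eps)) for x in xs]
    return sum(w * x for w, x in zip(weights, xs)) / sum(weights)


if __name__ == "__main__":
    data = [1, 7, 13, 15, 28, 175]
    print(round(tukey_biweight(data, eps=0.0), 2))  # 12.69
    print(round(sum(data) / len(data), 2))          # 39.83, the ordinary mean
```

The observation 175 gets weight 0, so the estimate stays near the bulk of the data, whereas the ordinary mean is pulled up to about 39.83.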
Bioconductor

Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data. Bioconductor provides R packages useful for the analysis of gene expression data. Information about Bioconductor, including installation instructions, can be found at

R Commands for Obtaining MAS 5.0 Signal Values from Affymetrix .CEL Files

Load the Bioconductor package affy.
    library(affy)
Set the working directory to the directory containing all the .CEL files.
    setwd("c:/z/courses/smicroarray/affycel")
Read the .CEL file data.
    Data=ReadAffy()
Compute the MAS 5.0 Signal values.
    signal=mas5(Data)
Write the data to a tab-delimited text file.
    write.exprs(signal, file="mydata.txt")

Robust Multi-array Average (RMA)

1. Background adjust PM values from .CEL files.
2. Take the base-2 log of each background-adjusted PM intensity.
3. Quantile normalize values from step 2 across all GeneChips.
4. Perform median polish separately for each probe set with rows indexed by GeneChip and columns indexed by probe.
5. For each row, find the average of the fitted values from step 4 to use as probe-set-specific expression measures for each GeneChip.

RMA: Background Adjustment

Assume PM = S + B, where signal $S \sim \mathrm{Exp}(\lambda)$ is independent of background $B \sim N^{+}(\mu, \sigma^2)$. $N^{+}(\mu, \sigma^2)$ denotes $N(\mu, \sigma^2)$ truncated on the left at 0.

[Figure: The Probability Density Function of the Exponential Distribution with Mean $1/\lambda$: $\lambda e^{-\lambda s}$ plotted against $s$]

[Figure: The Probability Density Function of the Normal Distribution with Mean $\mu$ and Variance $\sigma^2$: $e^{-(b-\mu)^2/(2\sigma^2)} / (2\pi\sigma^2)^{0.5}$ plotted against $b$]
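Under the model just stated (exponential signal plus a normal background truncated at 0), the conditional mean E(S | PM) has a closed form, since S given PM is a normal distribution with mean a = PM - mu - sigma^2 * lambda and standard deviation sigma, truncated to the interval [0, PM]. The sketch below codes that truncated-normal mean. It is an illustration derived from the model as written here, not a copy of the affy implementation, and the function names and the parameter values in the usage lines are hypothetical.

```python
import math


def std_normal_pdf(z):
    """N(0,1) density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """N(0,1) distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_signal(pm, mu, sigma, lam):
    """E(S | PM = pm) when PM = S + B, S ~ Exp(lam), B ~ N(mu, sigma^2) truncated at 0."""
    a = pm - mu - sigma ** 2 * lam
    b = sigma
    numerator = std_normal_pdf(a / b) - std_normal_pdf((pm - a) / b)
    denominator = std_normal_cdf(a / b) + std_normal_cdf((pm - a) / b) - 1.0
    return a + b * numerator / denominator


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only.
    mu, sigma, lam = 100.0, 30.0, 0.001
    for pm in (50.0, 150.0, 5000.0):
        print(pm, expected_signal(pm, mu, sigma, lam))
```

The adjusted value always lies strictly between 0 and the observed PM, which is why this adjustment never produces the negative values that plain background subtraction can. For PM far above the background level, the result is close to PM - mu - sigma^2 * lambda.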
[Figure: The Probability Density Function of $s + b$, where $s \sim \mathrm{Exp}(\lambda)$ and $b \sim N^{+}(\mu, \sigma^2)$: density of $s + b$ plotted against $s + b$]

RMA: Background Adjustment

[Formula for E(S | PM), written in terms of the N(0,1) density function and the N(0,1) distribution function]

Separately for each chip, estimate $\mu$, $\sigma$, and $\lambda$ from the observed PM distribution. Plug those estimates into the formula above to obtain an estimate of E(S | PM) for each PM value. These serve as background-adjusted PM values.

RMA: Background Adjustment

Obtaining Estimates of $\mu$, $\sigma$, and $\lambda$ (unpublished description of the procedure)

1. Estimate the mode of the PM distribution using a kernel density estimate of the PM density.
2. Estimate the density of the PM values less than the mode. The mode of this distribution serves as an estimate of $\mu$.
3. Assume the data to the left of the estimate of $\mu$ are the background observations that fell below their mean. Use those observations to estimate $\sigma$.
4. Subtract the estimate of $\mu$ from all observations larger than the estimate. The mode of this distribution estimates $1/\lambda$.

[Figure: PM Density Estimate Based on Simulated Data. Data below the estimated mode is used to estimate background parameters $\mu$ and $\sigma$.]

[Figure: Density Estimate of PM Data below the Estimated Mode of the PM Distribution. Estimate of $\mu$ =]

Estimate of $\sigma$

This data is used to estimate $\sigma$ as 6.3. According to the RMA R code, $\sigma$ is estimated as follows:

The purpose of the factor of in the numerator is not clear.
[Figure: Density Estimate of PM - $\hat{\mu}$ Values Greater than Zero. Estimate of $1/\lambda$ = 19]

The mean of these values would be a much better estimate of $1/\lambda$ in this case. (Mean is 988 and $1/\lambda = 1000$.)

RMA: Quantile Normalization

1. After background adjustment, find the smallest $\log_2(PM)$ on each chip.
2. Average the values from step 1.
3. Replace each value in step 1 with the average computed in step 2.
4. Repeat steps 1 through 3 for the second smallest values, third smallest values, ..., largest values.

RMA: Median Polish

For a given probe set with $J$ probe pairs, let $y_{ij}$ denote the background-adjusted, base-2-logged, and quantile-normalized value for GeneChip $i$ and probe $j$.

Assume $y_{ij} = \mu_i + \alpha_j + e_{ij}$, where $\alpha_1 + \alpha_2 + \cdots + \alpha_J = 0$.

$\mu_i$: gene expression of the probe set on GeneChip $i$
$\alpha_j$: probe affinity effect for the $j$th probe in the probe set
$e_{ij}$: residual for the $j$th probe on the $i$th GeneChip

RMA: Median Polish

Perform Tukey's Median Polish on the matrix of $y_{ij}$ values with $y_{ij}$ in the $i$th row and $j$th column.

Let $\hat{y}_{ij}$ denote the fitted value for $y_{ij}$ that results from the median polish procedure.

Let $\hat{\alpha}_j = \hat{y}_{\cdot j} - \hat{y}_{\cdot\cdot}$, where
$$\hat{y}_{\cdot j} = \frac{\sum_{i=1}^{I} \hat{y}_{ij}}{I} \quad \text{and} \quad \hat{y}_{\cdot\cdot} = \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} \hat{y}_{ij}}{IJ}$$
and $I$ denotes the number of GeneChips.

Let
$$\hat{\mu}_i = \hat{y}_{i\cdot} = \frac{\sum_{j=1}^{J} \hat{y}_{ij}}{J}.$$

$\hat{\mu}_i$ is the probe-set-specific measure of expression for GeneChip $i$.

An Example

Suppose the following are background-adjusted, $\log_2$-transformed, quantile-normalized PM intensities for a single probe set. Determine the final RMA expression measures for this probe set.

[Table: GeneChip (rows) by Probe (columns) matrix, followed by its row medians and the matrix after removing row medians]
[Tables: successive matrices from the median polish, alternating between subtracting row medians and subtracting column medians]

All row medians and column medians are 0. Thus the median polish procedure has converged. The above is the residual matrix that we will subtract from the original matrix to obtain the fitted values.

[Tables: original matrix - residuals from median polish = matrix of fitted values, with row means $\hat{\mu}_1, \hat{\mu}_2, \hat{\mu}_3, \hat{\mu}_4, \hat{\mu}_5$, the RMA expression measures for the 5 GeneChips]

R Commands for Obtaining RMA Expression Measures from Affymetrix .CEL Files

Load the Bioconductor package affy.
    library(affy)
Set the working directory to the directory containing all the .CEL files.
    setwd("c:/z/courses/smicroarray/affycel")
Read the .CEL file data.
    Data=ReadAffy()
Compute the RMA measures of expression.
    expr=rma(Data)
Write the data to a tab-delimited text file.
    write.exprs(expr, file="mydata.txt")
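For readers who want to see what the quantile normalization and median polish steps of RMA do on small inputs, the sketch below implements both in plain Python. It is a teaching illustration under simplifying assumptions, not the code behind the rma function: the function names are hypothetical, ties in quantile normalization are broken by position, the median polish starts with rows and stops after a sweep in which every removed median is (numerically) zero, and the small matrices are made-up data.

```python
from statistics import mean, median


def quantile_normalize(chips):
    """chips: one equal-length list of log2(PM) values per GeneChip."""
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Average of the smallest values, of the second smallest values, and so on.
    rank_means = [mean(s[r] for s in sorted_chips) for r in range(n)]
    normalized = []
    for chip in chips:
        order = sorted(range(n), key=lambda i: chip[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = rank_means[rank]
        normalized.append(out)
    return normalized


def median_polish_residuals(y, max_iter=20, tol=1e-9):
    """Alternately remove row and column medians; rows are GeneChips, columns are probes."""
    r = [list(row) for row in y]
    for _ in range(max_iter):
        row_med = [median(row) for row in r]
        r = [[v - m for v in row] for row, m in zip(r, row_med)]
        col_med = [median(col) for col in zip(*r)]
        r = [[v - m for v, m in zip(row, col_med)] for row in r]
        if max(abs(m) for m in row_med + col_med) <= tol:
            break  # all row and column medians are zero: converged
    return r


def rma_expression(y):
    """Row means of the fitted values (original matrix minus residuals)."""
    resid = median_polish_residuals(y)
    return [mean(v - e for v, e in zip(row, res)) for row, res in zip(y, resid)]


if __name__ == "__main__":
    print(quantile_normalize([[5, 2, 3], [4, 1, 6]]))
    # Made-up probe set: 3 GeneChips by 3 probes, with one outlying probe value (60).
    print(rma_expression([[1, 2, 3], [2, 3, 4], [4, 5, 60]]))
```

In the second usage line the outlying value 60 ends up entirely in the residual matrix, so the three expression measures are 2, 3, and 5, whereas the plain mean of the third row would be 23.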
Software and Methods for the Analysis of Affymetrix GeneChip Data. Rafael A Irizarry Department of Biostatistics Johns Hopkins University
Software and Methods for the Analysis of Affymetrix GeneChip Data Rafael A Irizarry Department of Biostatistics Johns Hopkins University Outline Overview Bioconductor Project Examples 1: Gene Annotation
More informationFrozen Robust Multi-Array Analysis and the Gene Expression Barcode
Frozen Robust Multi-Array Analysis and the Gene Expression Barcode Matthew N. McCall October 13, 2015 Contents 1 Frozen Robust Multiarray Analysis (frma) 2 1.1 From CEL files to expression estimates...................
More informationRow Quantile Normalisation of Microarrays
Row Quantile Normalisation of Microarrays W. B. Langdon Departments of Mathematical Sciences and Biological Sciences University of Essex, CO4 3SQ Technical Report CES-484 ISSN: 1744-8050 23 June 2008 Abstract
More informationA Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Variance and Bias
A Comparison of Normalization Methods for High Density Oligonucleotide Array Data Based on Variance and Bias B. M. Bolstad, R. A. Irizarry 2, M. Astrand 3 and T. P. Speed 4, 5 Group in Biostatistics, University
More informationData Acquisition. DNA microarrays. The functional genomics pipeline. Experimental design affects outcome data analysis
Data Acquisition DNA microarrays The functional genomics pipeline Experimental design affects outcome data analysis Data acquisition microarray processing Data preprocessing scaling/normalization/filtering
More informationLAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING
LAB 4 INSTRUCTIONS CONFIDENCE INTERVALS AND HYPOTHESIS TESTING In this lab you will explore the concept of a confidence interval and hypothesis testing through a simulation problem in engineering setting.
More informationPredictive Gene Signature Selection for Adjuvant Chemotherapy in Non-Small Cell Lung Cancer Patients
Predictive Gene Signature Selection for Adjuvant Chemotherapy in Non-Small Cell Lung Cancer Patients by Li Liu A practicum report submitted to the Department of Public Health Sciences in conformity with
More informationSTATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed. Clarificationof zonationprocedure described onpp. 238-239
STATISTICS AND DATA ANALYSIS IN GEOLOGY, 3rd ed. by John C. Davis Clarificationof zonationprocedure described onpp. 38-39 Because the notation used in this section (Eqs. 4.8 through 4.84) is inconsistent
More informationUsing Formulas, Functions, and Data Analysis Tools Excel 2010 Tutorial
Using Formulas, Functions, and Data Analysis Tools Excel 2010 Tutorial Excel file for use with this tutorial Tutor1Data.xlsx File Location http://faculty.ung.edu/kmelton/data/tutor1data.xlsx Introduction:
More informationDEVELOPMENT OF MAP/REDUCE BASED MICROARRAY ANALYSIS TOOLS
Clemson University TigerPrints All Theses Theses 8-2013 DEVELOPMENT OF MAP/REDUCE BASED MICROARRAY ANALYSIS TOOLS Guangyu Yang Clemson University, guangyy@clemson.edu Follow this and additional works at:
More informationCALCULATIONS & STATISTICS
CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents
More informationQuality Assessment of Exon and Gene Arrays
Quality Assessment of Exon and Gene Arrays I. Introduction In this white paper we describe some quality assessment procedures that are computed from CEL files from Whole Transcript (WT) based arrays such
More informationCluster software and Java TreeView
Cluster software and Java TreeView To download the software: http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm http://bonsai.hgc.jp/~mdehoon/software/cluster/manual/treeview.html Cluster 3.0
More informationThe Assignment Problem and the Hungarian Method
The Assignment Problem and the Hungarian Method 1 Example 1: You work as a sales manager for a toy manufacturer, and you currently have three salespeople on the road meeting buyers. Your salespeople are
More informationNumber boards for mini mental sessions
Number boards for mini mental sessions Feel free to edit the document as you wish and customise boards and questions to suit your learners levels Print and laminate for extra sturdiness. Ideal for working
More informationBiostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY
Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to
More informationStandard Deviation Estimator
CSS.com Chapter 905 Standard Deviation Estimator Introduction Even though it is not of primary interest, an estimate of the standard deviation (SD) is needed when calculating the power or sample size of
More informationaffyplm: Fitting Probe Level Models
affyplm: Fitting Probe Level Models Ben Bolstad bmb@bmbolstad.com http://bmbolstad.com April 16, 2015 Contents 1 Introduction 2 2 Fitting Probe Level Models 2 2.1 What is a Probe Level Model and What is
More informationGene Expression Analysis
Gene Expression Analysis Jie Peng Department of Statistics University of California, Davis May 2012 RNA expression technologies High-throughput technologies to measure the expression levels of thousands
More informationSTATS8: Introduction to Biostatistics. Data Exploration. Babak Shahbaba Department of Statistics, UCI
STATS8: Introduction to Biostatistics Data Exploration Babak Shahbaba Department of Statistics, UCI Introduction After clearly defining the scientific problem, selecting a set of representative members
More informationAnalysis of gene expression data. Ulf Leser and Philippe Thomas
Analysis of gene expression data Ulf Leser and Philippe Thomas This Lecture Protein synthesis Microarray Idea Technologies Applications Problems Quality control Normalization Analysis next week! Ulf Leser:
More informationBinary Adders: Half Adders and Full Adders
Binary Adders: Half Adders and Full Adders In this set of slides, we present the two basic types of adders: 1. Half adders, and 2. Full adders. Each type of adder functions to add two binary bits. In order
More informationChapter 3 RANDOM VARIATE GENERATION
Chapter 3 RANDOM VARIATE GENERATION In order to do a Monte Carlo simulation either by hand or by computer, techniques must be developed for generating values of random variables having known distributions.
More informationGeneChip Expression Analysis. Data Analysis Fundamentals
GeneChip Expression Analysis Data Analysis Fundamentals Table of Contents Page No. Introduction 1 Chapter 1 Guidelines for Assessing Sample and Array Quality 2 Chapter 2 Statistical Algorithms Reference
More informationCalculate Highest Common Factors(HCFs) & Least Common Multiples(LCMs) NA1
Calculate Highest Common Factors(HCFs) & Least Common Multiples(LCMs) NA1 What are the multiples of 5? The multiples are in the five times table What are the factors of 90? Each of these is a pair of factors.
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationThe Dummy s Guide to Data Analysis Using SPSS
The Dummy s Guide to Data Analysis Using SPSS Mathematics 57 Scripps College Amy Gamble April, 2001 Amy Gamble 4/30/01 All Rights Rerserved TABLE OF CONTENTS PAGE Helpful Hints for All Tests...1 Tests
More informationROUND(cell or formula, 2)
There are many ways to set up an amortization table. This document shows how to set up five columns for the payment number, payment, interest, payment applied to the outstanding balance, and the outstanding
More informationIntroduction to Statistical Computing in Microsoft Excel By Hector D. Flores; hflores@rice.edu, and Dr. J.A. Dobelman
Introduction to Statistical Computing in Microsoft Excel By Hector D. Flores; hflores@rice.edu, and Dr. J.A. Dobelman Statistics lab will be mainly focused on applying what you have learned in class with
More informationBASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS
BASIC STATISTICAL METHODS FOR GENOMIC DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-110 012 seema@iasri.res.in Genomics A genome is an organism s
More informationGamma Distribution Fitting
Chapter 552 Gamma Distribution Fitting Introduction This module fits the gamma probability distributions to a complete or censored set of individual or grouped data values. It outputs various statistics
More informationChapter 19. General Matrices. An n m matrix is an array. a 11 a 12 a 1m a 21 a 22 a 2m A = a n1 a n2 a nm. The matrix A has n row vectors
Chapter 9. General Matrices An n m matrix is an array a a a m a a a m... = [a ij]. a n a n a nm The matrix A has n row vectors and m column vectors row i (A) = [a i, a i,..., a im ] R m a j a j a nj col
More informationDynamic Programming. Lecture 11. 11.1 Overview. 11.2 Introduction
Lecture 11 Dynamic Programming 11.1 Overview Dynamic Programming is a powerful technique that allows one to solve many different types of problems in time O(n 2 ) or O(n 3 ) for which a naive approach
More informationMeasuring gene expression (Microarrays) Ulf Leser
Measuring gene expression (Microarrays) Ulf Leser This Lecture Gene expression Microarrays Idea Technologies Problems Quality control Normalization Analysis next week! 2 http://learn.genetics.utah.edu/content/molecules/transcribe/
More informationExcel Formulas & Graphs
Using Basic Formulas A formula can be a combination of values (numbers or cell references), math operators and expressions. Excel requires that every formula begin with an equal sign (=). Excel also has
More informationSections 2.11 and 5.8
Sections 211 and 58 Timothy Hanson Department of Statistics, University of South Carolina Stat 704: Data Analysis I 1/25 Gesell data Let X be the age in in months a child speaks his/her first word and
More informationVector and Matrix Norms
Chapter 1 Vector and Matrix Norms 11 Vector Spaces Let F be a field (such as the real numbers, R, or complex numbers, C) with elements called scalars A Vector Space, V, over the field F is a non-empty
More informationUNDERSTANDING THE TWO-WAY ANOVA
UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables
More informationFigure 1. An embedded chart on a worksheet.
8. Excel Charts and Analysis ToolPak Charts, also known as graphs, have been an integral part of spreadsheets since the early days of Lotus 1-2-3. Charting features have improved significantly over the
More informationManifold Learning Examples PCA, LLE and ISOMAP
Manifold Learning Examples PCA, LLE and ISOMAP Dan Ventura October 14, 28 Abstract We try to give a helpful concrete example that demonstrates how to use PCA, LLE and Isomap, attempts to provide some intuition
More informationMultiple Linear Regression in Data Mining
Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple
More informationNotes on Factoring. MA 206 Kurt Bryan
The General Approach Notes on Factoring MA 26 Kurt Bryan Suppose I hand you n, a 2 digit integer and tell you that n is composite, with smallest prime factor around 5 digits. Finding a nontrivial factor
More informationMulti scale random field simulation program
Multi scale random field simulation program 1.15. 2010 (Updated 12.22.2010) Andrew Seifried, Stanford University Introduction This is a supporting document for the series of Matlab scripts used to perform
More informationHow To Run Statistical Tests in Excel
How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting
More informationWorking with whole numbers
1 CHAPTER 1 Working with whole numbers In this chapter you will revise earlier work on: addition and subtraction without a calculator multiplication and division without a calculator using positive and
More informationUsing Excel for Statistical Analysis
2010 Using Excel for Statistical Analysis Microsoft Excel is spreadsheet software that is used to store information in columns and rows, which can then be organized and/or processed. Excel is a powerful
More informationDistribution (Weibull) Fitting
Chapter 550 Distribution (Weibull) Fitting Introduction This procedure estimates the parameters of the exponential, extreme value, logistic, log-logistic, lognormal, normal, and Weibull probability distributions
More informationHow To Check For Differences In The One Way Anova
MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way
More information( ) FACTORING. x In this polynomial the only variable in common to all is x.
FACTORING Factoring is similar to breaking up a number into its multiples. For example, 10=5*. The multiples are 5 and. In a polynomial it is the same way, however, the procedure is somewhat more complicated
More informationHOW TO USE YOUR HP 12 C CALCULATOR
HOW TO USE YOUR HP 12 C CALCULATOR This document is designed to provide you with (1) the basics of how your HP 12C financial calculator operates, and (2) the typical keystrokes that will be required on
More informationLeast Squares Estimation
Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David
More informationCURVE FITTING LEAST SQUARES APPROXIMATION
CURVE FITTING LEAST SQUARES APPROXIMATION Data analysis and curve fitting: Imagine that we are studying a physical system involving two quantities: x and y Also suppose that we expect a linear relationship
More information8. Linear least-squares
8. Linear least-squares EE13 (Fall 211-12) definition examples and applications solution of a least-squares problem, normal equations 8-1 Definition overdetermined linear equations if b range(a), cannot
More informationLinear Threshold Units
Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear
More informationSimple Regression Theory II 2010 Samuel L. Baker
SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the
More informationEngineering Problem Solving and Excel. EGN 1006 Introduction to Engineering
Engineering Problem Solving and Excel EGN 1006 Introduction to Engineering Mathematical Solution Procedures Commonly Used in Engineering Analysis Data Analysis Techniques (Statistics) Curve Fitting techniques
More informationPEARSON R CORRELATION COEFFICIENT
PEARSON R CORRELATION COEFFICIENT Introduction: Sometimes in scientific data, it appears that two variables are connected in such a way that when one variable changes, the other variable changes also.
More informationTIPS FOR DOING STATISTICS IN EXCEL
TIPS FOR DOING STATISTICS IN EXCEL Before you begin, make sure that you have the DATA ANALYSIS pack running on your machine. It comes with Excel. Here s how to check if you have it, and what to do if you
More informationMonte Carlo Simulation. SMG ITS Advanced Excel Workshop
Advanced Excel Workshop Monte Carlo Simulation Page 1 Contents Monte Carlo Simulation Tutorial... 2 Example 1: New Marketing Campaign... 2 VLOOKUP... 5 Example 2: Revenue Forecast... 6 Pivot Table... 8
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More information5 Transforming Time Series
5 Transforming Time Series In many situations, it is desirable or necessary to transform a time series data set before using the sophisticated methods we study in this course: 1. Almost all methods assume
More informationUsing Excel for Data Manipulation and Statistical Analysis: How-to s and Cautions
2010 Using Excel for Data Manipulation and Statistical Analysis: How-to s and Cautions This document describes how to perform some basic statistical procedures in Microsoft Excel. Microsoft Excel is spreadsheet
More informationPackage SHELF. February 5, 2016
Type Package Package SHELF February 5, 2016 Title Tools to Support the Sheffield Elicitation Framework (SHELF) Version 1.1.0 Date 2016-01-29 Author Jeremy Oakley Maintainer Jeremy Oakley
More information1.2 Solving a System of Linear Equations
1.. SOLVING A SYSTEM OF LINEAR EQUATIONS 1. Solving a System of Linear Equations 1..1 Simple Systems - Basic De nitions As noticed above, the general form of a linear system of m equations in n variables
More informationBill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1
Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Calculate counts, means, and standard deviations Produce
More informationseven Statistical Analysis with Excel chapter OVERVIEW CHAPTER
seven Statistical Analysis with Excel CHAPTER chapter OVERVIEW 7.1 Introduction 7.2 Understanding Data 7.3 Relationships in Data 7.4 Distributions 7.5 Summary 7.6 Exercises 147 148 CHAPTER 7 Statistical
More information23. RATIONAL EXPONENTS
23. RATIONAL EXPONENTS renaming radicals rational numbers writing radicals with rational exponents When serious work needs to be done with radicals, they are usually changed to a name that uses exponents,
More informationLogistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression
Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max
More informationDATA ANALYSIS II. Matrix Algorithms
DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where
More informationRational Exponents. Squaring both sides of the equation yields. and to be consistent, we must have
8.6 Rational Exponents 8.6 OBJECTIVES 1. Define rational exponents 2. Simplify expressions containing rational exponents 3. Use a calculator to estimate the value of an expression containing rational exponents
More informationModule 4 (Effect of Alcohol on Worms): Data Analysis
Module 4 (Effect of Alcohol on Worms): Data Analysis Michael Dunn Capuchino High School Introduction In this exercise, you will first process the timelapse data you collected. Then, you will cull (remove)
More informationComparison of Programs for Fixed Kernel Home Range Analysis
1 of 7 5/13/2007 10:16 PM Comparison of Programs for Fixed Kernel Home Range Analysis By Brian R. Mitchell Adjunct Assistant Professor Rubenstein School of Environment and Natural Resources University
More information4. Continuous Random Variables, the Pareto and Normal Distributions
4. Continuous Random Variables, the Pareto and Normal Distributions A continuous random variable X can take any value in a given range (e.g. height, weight, age). The distribution of a continuous random
More informationFEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL
FEGYVERNEKI SÁNDOR, PROBABILITY THEORY AND MATHEmATICAL STATIsTICs 4 IV. RANDOm VECTORs 1. JOINTLY DIsTRIBUTED RANDOm VARIABLEs If are two rom variables defined on the same sample space we define the joint
More informationv w is orthogonal to both v and w. the three vectors v, w and v w form a right-handed set of vectors.
3. Cross product Definition 3.1. Let v and w be two vectors in R 3. The cross product of v and w, denoted v w, is the vector defined as follows: the length of v w is the area of the parallelogram with