Curve Fitting Best Practice




Enabling Science
Curve Fitting Best Practice
Part 5: Robust Fitting and Complex Models

Most researchers are familiar with standard kinetics, Michaelis-Menten and dose response curves, but many more modern techniques of analysis are available that allow you to get greater value from data. This article discusses the methods used in curve fitting today, including Iteratively Re-weighted Least Squares (IRLS), which is also known as robust fitting. The constraints of this technique are also explored, along with the reasons why robust fitting is more widely accepted and used today than when it was introduced some 20 years ago. The principles behind complex models, and how they can be applied, are also discussed.

Quick introduction to weights

By default, equal weight is given to every data point in a curve fit. The best fit is found by the Levenberg-Marquardt algorithm (LVM), which minimizes the sum of squares of the vertical distances between the observed data and the fitted curve (the residuals). By default, LVM minimizes:

$$\sum \left(Y_{\text{data}} - Y_{\text{fit}}\right)^2$$

Unequal weighting can be assigned according to any scheme. If weight values are assigned, LVM minimizes:

$$\sum \left[\frac{Y_{\text{data}} - Y_{\text{fit}}}{\text{Weight}}\right]^2$$

Because each residual is divided by its weight value, the lower the weight value (the closer to 0), the greater that point's bearing on the fit. Unequal weighting can be assigned to data points within a certain tolerance, so that all points are included in the analysis but the less reliable ones have less bearing on the final result.

For example, an instrument may have a certain data range within which it guarantees a high level of accuracy, but when the limits of that range are exceeded, the accuracy of the instrument decreases. In this scenario, more bearing can be given to the data points within the instrument's tolerances. Data points outside that range are still included in the analysis, but they have less bearing on the final result.
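The two quantities above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the LVM algorithm itself; the function name `sum_of_squares` and the sample numbers are assumptions made for this example.

```python
def sum_of_squares(y_data, y_fit, weights=None):
    """Return the quantity the fit tries to minimise.

    With weights=None every point counts equally. Otherwise each residual
    is divided by its weight value before squaring, as in the second formula.
    """
    if weights is None:
        weights = [1.0] * len(y_data)
    return sum(((yd - yf) / w) ** 2 for yd, yf, w in zip(y_data, y_fit, weights))


y_data = [1.0, 2.0, 3.0, 10.0]  # last point lies outside the trusted range
y_fit = [1.0, 2.0, 3.0, 4.0]    # values predicted by some candidate curve

equal = sum_of_squares(y_data, y_fit)
# A large weight value on the last point shrinks its contribution.
down_weighted = sum_of_squares(y_data, y_fit, weights=[1.0, 1.0, 1.0, 6.0])

print(equal)          # 36.0
print(down_weighted)  # 1.0
```

With equal weighting the last point dominates the total. Giving it a weight value of 6 reduces its contribution from 36 to 1, so a fitting algorithm has far less incentive to bend the curve towards it.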
A set of weighting values that reflects this kind of instrument tolerance can then be applied, reducing the impact on the fit of any outliers outside the tolerance range.

IRLS (Robust Fitting)

Standard regression analysis is very sensitive to outliers, and even a single outlier can affect the results considerably, as shown in Fig 1 below. Knocking out an individual outlier improves the curve fit considerably, as shown in Fig 2.

IDBS, Unit 2, Occam Court, Surrey Research Park, Guildford, Surrey GU2 7QB, UK
t: +44 1483 595000 e: info@idbs.com w: http://www.idbs.com

Fig 1: Even one outlying data point can significantly affect the quality of a fit
Fig 2: Knocking out the outlier considerably improves results

Robust fitting is an extension of standard regression (standard non-linear Least Squares Fitting (LSF)) that can down-weight individual outliers in a data set and neutralize their effect on the final result. Robust fitting was introduced about 20 years ago but was not widely accepted at first, because of the many competing techniques available at the time and a lack of understanding of the most appropriate way to use it.

Another reason for the reluctance to adopt IRLS was its computationally intensive nature. Standard non-linear LSF could be worked through on paper using standard math techniques, but IRLS was much harder to perform in the same way. Early curve-fitting software packages did not support robust fitting, which left the technique and its algorithms largely unavailable to the mainstream.

IDBS 2008

IRLS (Robust Fitting)

A fitting process is iterative: on each iteration, the fitting algorithm changes the parameter values, based on the data set provided, in order to converge on the best result. Robust fitting introduces another variable to the fitting process by varying the weights of individual data points as well as the parameter values. On each cycle of the iteration, the weighting value for each data point is changed so that the fit converges on the best fit for the data. If there is an outlier in the data set, it will be significantly down-weighted to achieve a more robust and better fit for the rest of the data set.

There are many IRLS techniques available, but the six most commonly used are:

Tukey's Biweight*
Andrews' Sine*
Geman-McClure
Huber
Welsch
Cauchy

*Undefined over the complete error space, resulting in outliers being removed

Tukey's Biweight and Andrews' Sine are the most commonly used and, because they are not defined over the whole error space, they differ slightly from the other four. With Tukey's Biweight or Andrews' Sine, a data point whose weighting value falls low enough is construed as an outlier and removed from the data set. This occurs in curve-fitting applications such as XLfit when a user chooses to automatically remove outlying points from a data set.

Note: The other four techniques down-weight outlying points so strongly that they have little or no bearing on the fit, which is effectively equivalent to knocking them out. It is possible to combine IRLS with manual outlier knock-out where appropriate.

In the IRLS fitting scenarios in Fig 3 below, Tukey's Biweight is applied to three different sets of data, all of which are well defined but contain easily identifiable outliers. IRLS has removed these data points from the set, making manual intervention unnecessary because the fit is of sufficient quality to be confident in the results produced.
Note: These data sets are well defined and complete, with a reasonably high number of data points. Applying IRLS to well-formed data sets allows the analysis to be of good quality and the process to produce accurate results. Much like standard non-linear LSF, robust fitting assumes that there are no errors in the X values.
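The re-weighting cycle described above can be sketched as follows. To keep the example self-contained it fits a straight line rather than a non-linear model, so it illustrates the IRLS idea only and is not how XLfit or any other package implements it. The function names, the sample data, the tuning constant 4.685 (a conventional default for Tukey's Biweight) and the use of the median absolute residual as a scale estimate are all assumptions of this sketch. Note that the weights here multiply each squared residual, so a weight of 0 removes a point; this is the reciprocal of the divide-by-weight convention used earlier in the article.

```python
import statistics


def weighted_line_fit(x, y, w):
    """Closed-form weighted least-squares fit of y = a + b*x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def tukey_weight(u, c=4.685):
    """Tukey's Biweight: smooth down-weighting, exactly zero beyond the cutoff c."""
    return (1.0 - (u / c) ** 2) ** 2 if abs(u) < c else 0.0


def irls_line(x, y, iterations=20):
    """Alternate between fitting the line and re-weighting each point."""
    w = [1.0] * len(x)  # first pass is an ordinary, equally weighted fit
    for _ in range(iterations):
        a, b = weighted_line_fit(x, y, w)
        residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust scale estimate from the median absolute residual,
        # floored so that a perfect fit cannot cause division by zero.
        scale = max(1.4826 * statistics.median(abs(r) for r in residuals), 1e-12)
        w = [tukey_weight(r / scale) for r in residuals]
    return a, b, w


# Hypothetical data: y = 2 + 3x with small scatter and one gross outlier at x = 5.
x = list(range(10))
noise = [0.1, -0.2, 0.15, -0.05, 0.2, 0.0, -0.1, 0.05, -0.15, 0.1]
y = [2.0 + 3.0 * xi + ni for xi, ni in zip(x, noise)]
y[5] += 30.0

a, b, w = irls_line(x, y)
print(f"intercept={a:.2f} slope={b:.2f} outlier weight={w[5]}")
```

In this sketch the outlier ends up with a weight of exactly zero, which corresponds to the automatic removal described for Tukey's Biweight, while the remaining points keep weights close to 1.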

Fig 3: IRLS fitting improves the accuracy of fit results when applied to three different data sets

For a data set with a large amount of scatter, the process has to change the weight of every point on each iteration, and it is very difficult for the fit to converge on a single best fit. IRLS requires that the data being fitted is of sufficient quality; otherwise it is prone to failure. It is recommended that IRLS is always used in conjunction with other data quality checks to ensure good results.

IRLS (Robust Fitting)

The graph below illustrates how the IRLS process works. Assume that the vertical axis shows the level of impact an individual outlier has on the curve fit as its residual value (its distance from the fitted curve) increases. The blue line, for standard least squares, proceeds to infinity: the further the point is from the fitted line, the higher its outlier status and the greater its impact on the curve fit. The red line is the IRLS fit. For a given outlier in the data set, as its outlier status increases, its impact on the fit decreases and eventually reaches 0.
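The behaviour of the two curves can be reproduced numerically. The sketch below takes the impact of a point to be its residual multiplied by its weight, and uses Tukey's Biweight as the IRLS scheme. Both choices, along with the function names and the tuning constant 4.685, are assumptions made for illustration rather than details taken from the figure. Residuals are expressed in units of the data's scatter.

```python
def influence_least_squares(r):
    """Standard least squares: impact grows without bound as the residual grows."""
    return r


def influence_tukey(r, c=4.685):
    """Tukey's Biweight: impact rises, peaks, then falls to zero once |r| >= c."""
    if abs(r) >= c:
        return 0.0
    return r * (1.0 - (r / c) ** 2) ** 2


print("residual  least-squares  Tukey")
for r in [0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 10.0]:
    print(f"{r:8.1f}  {influence_least_squares(r):13.2f}  {influence_tukey(r):5.2f}")
```

For small residuals the two columns are nearly equal, so well-behaved points are treated much as they would be in a standard fit. Beyond the cutoff the Tukey column is exactly zero, while the least-squares column keeps growing.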

Fig 4: The impact of IRLS on an outlier compared to standard least-squares regression

Complex models

Data fitting and analysis is not confined to basic Michaelis-Menten and dose response models. Complex modelling can be used to analyze different types of data using standard non-linear LSF. The example in Fig 5 below shows time-controlled drug delivery with a number of different parameters being measured, while a drug is administered at different time points in a pulsed manner. The graph analyzes absorption of the drug into the blood stream over time, indicated by the wave-like fit, allowing the researcher to determine the cycle and rate at which the drug is distributed.

Fig 5: Analyzing the cycle and rate at which drug absorption occurs

Composite models such as those shown in Fig 6 allow us to analyze a data set using two different models. For example, the researcher fits the first model up to the point in time at which the data points start to go back down, where the model is changed to analyze a different phase within the data. Although this is a complex model, it allows the researcher to fit results with a high degree of confidence.

Fig 6: Fitting composite data

Fig 7 below shows a common scenario where data is fitted to a standard dose response curve but the data points start decreasing at the end of the measurements. The researcher can set up a technique to remove those final data points, such as applying an IRLS fitting technique to eliminate the points that start to drop off. Alternatively, the researcher can use a model that has been constructed to handle this kind of scenario. A bell-shaped dose response model uses the data points at both the bottom and the top of the curve, so that parameters C1 and C2 can be extracted as the EC50 values for the two linked dose response curves, with measurable slope factors for both curves.

Bell-shaped models provide an effective means of analyzing and interpreting a whole set of data, as opposed to having to reject data points. A scenario such as this comes up frequently in standard dose response analysis. If the last six points of this example were knocked out and a standard dose response curve fitted to the data set, the first curve in the bell-shaped model would give similar or identical results to the standard dose response curve.
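One possible way to write such a model is sketched below. The article does not give the equation, so the form used here (a rising sigmoid with EC50 `c1` multiplied by a falling sigmoid with EC50 `c2`) is an assumption chosen for illustration, as are the function names and parameter values. It is not presented as the definition used by XLfit or any other package. Doses must be greater than zero.

```python
def dose_response(x, bottom, top, c, slope):
    """Standard rising sigmoid; c is the EC50 and slope the slope factor."""
    return bottom + (top - bottom) / (1.0 + (c / x) ** slope)


def bell_dose_response(x, bottom, top, c1, slope1, c2, slope2):
    """Rising phase (EC50 = c1) multiplied by a falling phase (EC50 = c2)."""
    rise = 1.0 / (1.0 + (c1 / x) ** slope1)
    fall = 1.0 / (1.0 + (x / c2) ** slope2)
    return bottom + (top - bottom) * rise * fall


# Hypothetical parameters: the two EC50 values are three decades apart.
params = dict(bottom=0.0, top=100.0, c1=1.0, slope1=1.5, c2=1000.0, slope2=1.5)

for dose in [0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0]:
    standard = dose_response(dose, 0.0, 100.0, 1.0, 1.5)
    bell = bell_dose_response(dose, **params)
    print(f"{dose:8.1f}  standard={standard:6.2f}  bell={bell:6.2f}")
```

At low doses the falling phase is essentially 1, so the bell-shaped model tracks the standard curve closely. This mirrors the observation above that fitting a standard curve after knocking out the declining points gives similar results to the first curve of the bell-shaped model.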

Fig 7: A bell-shaped dose response model producing two fit results

Summary

IRLS provides an advanced technique for reducing and neutralizing the effects of outliers in a fit. By weighting individual data points, IRLS can increase the accuracy of fit results compared to those achieved using standard regression (standard non-linear LSF). Both techniques, however, must be applied to a well-defined and complete data set in order to produce quality results.

Curve fitting is a flexible process offering a range of data analysis types, and researchers do not have to be constrained by standard analysis techniques. Providing a variety of innovative ways of applying data analysis to extract required results in varying scenarios, complex models extend data fitting and analysis beyond basic Michaelis-Menten and dose response models and can be used in a wide range of applications.