Curve Fitting Best Practice

Similar documents
Curve Fitting Best Practice

Applying Statistics Recommended by Regulatory Documents

The KaleidaGraph Guide to Curve Fitting

Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Simple linear regression

Analyzing Dose-Response Data 1

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

Introduction to Logistic Regression

10.2 ITERATIVE METHODS FOR SOLVING LINEAR SYSTEMS. The Jacobi Method

Server Load Prediction

14. Nonlinear least-squares

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

The Effects of Start Prices on the Performance of the Certainty Equivalent Pricing Policy

Simple Regression Theory II 2010 Samuel L. Baker

Assumptions. Assumptions of linear models. Boxplot. Data exploration. Apply to response variable. Apply to error terms from linear model

Machine Learning and Pattern Recognition Logistic Regression

Step Response of RC Circuits

A step-by-step guide to non-linear regression analysis of experimental data using a Microsoft Excel spreadsheet

X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

LOGIT AND PROBIT ANALYSIS

Fitting curves to data using nonlinear regression: a practical and nonmathematical review

Roots of equation fx are the values of x which satisfy the above expression. Also referred to as the zeros of an equation

Homework 8 Solutions

Roots of Equations (Chapters 5 and 6)

Computer programming course in the Department of Physics, University of Calcutta

hp calculators HP 50g Trend Lines The STAT menu Trend Lines Practice predicting the future using trend lines

Data Mining Practical Machine Learning Tools and Techniques

Dealing with Data in Excel 2010

AP Physics 1 and 2 Lab Investigations

Bio-Plex suspension array system tech note 3022

Data Mining Part 5. Prediction

Curve Fitting. Before You Begin

Logistic Regression (1/24/13)

Follow links Class Use and other Permissions. For more information, send to:

Determination of g using a spring

Measuring Line Edge Roughness: Fluctuations in Uncertainty

PHAR 7633 Chapter 19 Multi-Compartment Pharmacokinetic Models

EST.03. An Introduction to Parametric Estimating

Time Series and Forecasting

Transforming Bivariate Data

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

2. Simple Linear Regression

Straightening Data in a Scatterplot Selecting a Good Re-Expression Model

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )

METNUMER - Numerical Methods

Empirical Model-Building and Response Surfaces

You have data! What s next?

The correlation coefficient

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Slope-Intercept Equation. Example

Self-Calibration and Hybrid Mapping

ANALYSIS OF TREND CHAPTER 5

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.

The Method of Least Squares

High School Algebra Reasoning with Equations and Inequalities Solve systems of equations.

Solving Systems of Linear Equations Graphing

Jitter Measurements in Serial Data Signals

5. Multiple regression

Math 132. Population Growth: the World

A Robust Method for Solving Transcendental Equations

Predict the Popularity of YouTube Videos Using Early View Data

Introduction to Quantitative Methods

(Least Squares Investigation)

Data Mining Lab 5: Introduction to Neural Networks

(Quasi-)Newton methods

Gamma Distribution Fitting

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Regression III: Advanced Methods

Non-Linear Regression Samuel L. Baker

Current Standard: Mathematical Concepts and Applications Shape, Space, and Measurement- Primary

Linear Regression. Chapter 5. Prediction via Regression Line Number of new birds and Percent returning. Least Squares

Industry Environment and Concepts for Forecasting 1

Probit Analysis By: Kim Vincent

DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

The equivalence of logistic regression and maximum entropy models

IB Math Research Problem

Fuzzy regression model with fuzzy input and output data for manpower forecasting

Pearson's Correlation Tests

Theory at a Glance (For IES, GATE, PSU)

STATISTICA Formula Guide: Logistic Regression. Table of Contents

Prism 6 Step-by-Step Example Linear Standard Curves Interpolating from a standard curve is a common way of quantifying the concentration of a sample.

Version 5.0. Regression Guide. Harvey Motulsky President, GraphPad Software Inc. GraphPad Prism All rights reserved.

Efficient Curve Fitting Techniques

4. Simple regression. QBUS6840 Predictive Analytics.

Lecture 6. Artificial Neural Networks

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

Applying Localized Realized Volatility Modeling to Futures Indices

ENGG1811 Computing for Engineers. Data Analysis using Excel (weeks 2 and 3)

Nonlinear Regression Functions. SW Ch 8 1/54/

Tutorial on Using Excel Solver to Analyze Spin-Lattice Relaxation Time Data

Case Study in Data Analysis Does a drug prevent cardiomegaly in heart failure?

business statistics using Excel OXFORD UNIVERSITY PRESS Glyn Davis & Branko Pecar

TIME SERIES ANALYSIS. A time series is essentially composed of the following four components:

COMPUTATION OF THREE-DIMENSIONAL ELECTRIC FIELD PROBLEMS BY A BOUNDARY INTEGRAL METHOD AND ITS APPLICATION TO INSULATION DESIGN

An analysis appropriate for a quantitative outcome and a single quantitative explanatory. 9.1 The model behind linear regression

Austin Peay State University Department of Chemistry Chem The Use of the Spectrophotometer and Beer's Law

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

Curve Fitting, Loglog Plots, and Semilog Plots 1

Note on growth and growth accounting

Transcription:

Enabling Science Curve Fitting Best Practice Part 3: Fitting data Regression and residuals are an important function and feature of curve fitting and should be understood by anyone doing this type of analysis. This article explores regression analysis, describing varying models that can be used to fit data, and the results produced from those particular models. The fitting process When performing the fitting process, observational data consists of m values and is commonly termed y values. This dependent variable is subject to error values that are assumed to have a mean of zero. Systematic error may be present but its treatment is outside the scope of regression analysis. The independent variable normally termed x in maths literature is always assumed to be error-free. The independent variables are also called input variables. Regression Regression is used to construct a model that allows the user to predict a value of the y variable given a known value of the x variable, or vice-versa. If the prediction is to be done within the range of values of the x variables used to construct the model, this is referred to as interpolation. Prediction outside the range of the data used to construct the model is known as extrapolation and it is more risky. Regression is used to describe the relationship between two or more variables. It is often taken to be a linear relationship (i.e., a straight line when plotted) but it can also be non-linear (in the shape of a curve). When the regression relationship for the variables is known, we can predict the approximate value of one variable from the value of the other. Assumptions underpinning regression Regression fits a model to data by varying the parameters within that model. There are certain assumptions that underpin any type of regression performed, and the four most general are: The error terms must be normally distributed and independent, for example, there are no systematic errors in the data set being measured All experimental error in measurement is in the y values, with no error terms in the x values ata follows the trend of the model, so for example, when using a sigmoidal dose response model, the assumption is that the measured data set follows a sigmoidal response shape and is not, for example, exponential or linear. Least Squares Fitting Least squares fitting requires an iterative process to find a best fit based on four different sets of information: The measured raw data (normally formed from X and Y pairs but could be more) IBS Unit 2 Occam Court Surrey Research Park Guildford Surrey GU2 7QB UK t: +44 1483 595000 e: info@idbs.com w: http://www.idbs.com

A set of weighting values (this is not a mandatory set of information and will be covered in more detail in the upcoming fifth article in this best practice series, Robust Fitting and Complex Models) A model to fit the data set A set of starting parameter values that give estimates for the fitting process to begin Many different techniques for least squares fitting exist but for the most part they all use similar methods and have slightly different strengths and weaknesses. The five most common are: Steepest escent Newton Gauss-Newton Simplex Levenberg-Marquardt Most curve fitting packages will use the Levenberg-Marquardt algorithm which employs a combination of the Gauss- Newton and Steepest escent methods. Both have advantages and disadvantages depending on whether or not the starting parameter values are close to the minimum (optimum value). The Levenberg-Marquardt method switches between both techniques depending on its point in the fitting process. Criteria for terminating the fitting process All least squares fitting methods use an iterative process. A least squares fitting algorithm takes a set of starting parameter values for a given model, and then uses those starting parameter values as a point at which to begin the fitting process. The fitting algorithm then alters each parameter value in an iterative process or set of cycles in order to determine the optimum solution to the problem. By changing the parameters, and so the shape of the curve, the algorithm then measures the differences in the sum of the residuals squared and will look for successive consecutive iterations where the change in the residuals is converging. Once the convergence limit has been met, the solution is regarded as the optimum and the fitting process ends. All least squares fitting methods use a number of criteria under which it will terminate the process, known as the iteration limit. In general the higher the iteration limit and the smaller the convergence limit the greater the accuracy (but slower to perform). The lower the iteration limit and the higher the convergence limit the quicker the fitting (but less accurate). In most cases that use good starting parameters, a fit will always converge quickly on the optimum solution. Pick the right model up front The decision on which model to use is dependent on the information that the researcher wants to extract from the curve. It is important to know this information before selecting a model. IBS 2008 Page 2 of 6

Try not to use best fit searching Best fit searching can be used to give the researcher an idea as to the shape of the data but it is recommended that best fit searching is not used to pick the model from which decisions are to be made. When trying to perform any experimental measurements from a curve, the researcher should know the best model to represent the relationships in the measured data and understand the results that they want to extract from the data. A model is used to understand the results of the data you have. It acts as a way of generalising the relationships between the different data sets that you have and enables you to extract additional information from the data modelled. Four Parameter Sigmoid Curves Many four parameter models exist in literature. When looking at dose response data, the most commonly used are: Standard 4 Parameter Logistic Model B A y = A + C Fig 1: 4-parameter sigmoid where parameter C = EC 50 Standard 4 Parameter Logistic Model (Log EC50) IBS 2008 Page 3 of 6

B A y = A + C 10 Fig 2: 4-parameter sigmoid where parameter C = Log EC 50 value These two examples are interchangeable and will produce the same results for all parameters except C. Parameter C for the standard logistic model can be converted to the same value as for the Log EC 50 model by taking the log of C. However, this can cause issues as the results are actually different when the error values (confidence intervals) are taken into account. For this reason it is always best to use the correct model based on the results required. One other variation of the two models above is to lock parameter at 1. This causes the term to drop out of this function and turns the models into three-parameter logistic curves with fixed slopes. Modified Morrison Equation * [ E] [ x] Ki + y = V + V b 0 ([ E] [ x] K 2[ E] * i 2 ) + 4[ E] K * i Fig 3: A Modified Morrison model is an alternative to a 4-parameter sigmoid Three Parameter Logistic Models IBS 2008 Page 4 of 6

Example 3-P Fit 80 y A + 1 C B x + x = B B 60 Inh (%) 40 20 0 0 0.2 0.4 0.6 0.8 Conc (µg/l) B A y = A + C Fig 4: 3-parameter sigmoids where C = EC 50 value (top) and Log EC 50 value Five Parameter Logistic Model IBS 2008 Page 5 of 6

100 80 Example 5-P Fit B y = C A + E Inh (%) 60 40 20 0 0.1 10 1000 Conc. (nm) Fig 5: 5-parameter sigmoid where C = EC 50 curve 1 Summary In general, there is no single solution for best-fit of a model s parameters to the data provided, as there is in linear regression. Usually numerical optimization algorithms are applied to determine the best-fit parameters using the least squares fitting techniques mentioned earlier. Again in contrast to linear regression, there may be many local minima of the function to be optimized. In practice, estimated values of the parameters are used, in conjunction with the algorithm, to attempt to find the global minimum of a sum of squares. The appropriate model to choose depends on the analysis to be performed. By changing the model, you change the data representation and the results generated. For each of the results to be measured and extracted from a data set, the researcher should identify the best model with the correct parameterization prior to fitting. IBS 2008 Page 6 of 6