Zurich University of Applied Sciences, School of Engineering
IDP Institute of Data Analysis and Process Design

Nonlinear Regression: A Powerful Tool With Considerable Complexity
Half-Day 1: Estimation and Standard Inference

Andreas Ruckstuhl
Institut für Datenanalyse und Prozessdesign, Zürcher Hochschule für Angewandte Wissenschaften
Nonlinear Regression: Half-Day 1 - Estimation and Standard Inference

Outline
Half-Day 1: Estimation and Standard Inference
- The Nonlinear Regression Model
- Iterative Estimation - Model Fitting
- Inference Based on Linear Approximations
Half-Day 2: Improved Inference and Visualisation
- Likelihood Based Inference
- Profile t Plot and Profile Traces
- Parameter Transformations
Half-Day 3: Bootstrap, Prediction and Calibration
- Bootstrap
- Prediction
- Calibration
- Outlook
1 The Nonlinear Regression Model

The regression model is
Y_i = h(x_i^(1), ..., x_i^(m); θ_1, θ_2, ..., θ_p) + E_i  with E_i indep. N(0, σ²).

In the case of the linear regression model,
h(x_i^(1), ..., x_i^(m); θ_1, θ_2, ..., θ_p) = θ_1 x_i^(1) + θ_2 x_i^(2) + ... + θ_p x_i^(p)  (i.e., m = p).

Examples of nonlinear regression functions:
h(x_i; θ) = θ_1 x_i^(θ_3) / (θ_2 + x_i^(θ_3))
h(x_i; θ) = θ_1 (x_i^(1))^(θ_3) exp(θ_2 x_i^(2))
h(x_i; θ) = θ_1 exp(θ_2 / x_i)
Example: Puromycin
The Michaelis-Menten model for enzyme kinetics relates the initial velocity of an enzymatic reaction to the substrate concentration.

[Figure: initial velocity vs. substrate concentration, for runs treated with Puromycin and untreated]

Y_i = θ_1 x_i / (θ_2 + x_i) + E_i  with E_i i.i.d. N(0, σ²)  (Michaelis-Menten model),
where x is the substrate concentration [ppm] and Y the initial velocity [(number/min)/min].
Example: Biochemical Oxygen Demand (BOD)
Biochemical oxygen demand of stream water.

[Figure: oxygen demand (mg/l) vs. incubation time (days)]

Y_i = θ_1 (1 - exp(-θ_2 x_i)) + E_i  with E_i i.i.d. N(0, σ²),
where Y is the biochemical oxygen demand (BOD) [mg/l] and x the incubation time [days].
Example: Cellulose Membrane
Ratio of protonated to deprotonated carboxyl groups within the pore of a cellulose membrane versus the pH value x of the bulk solution.

[Figure: chemical shift y vs. pH x, panels (a) and (b)]

Theoretically, this relation is described by the Henderson-Hasselbach equation,
Y_i = (θ_1 + θ_2 · 10^(θ_3 + θ_4 x_i)) / (1 + 10^(θ_3 + θ_4 x_i)) + E_i,  i = 1, ..., n,
with E_i i.i.d. N(0, σ²).
Transformably Linear Models
Example: h(x, θ) = θ_1 exp(θ_2 / x).
Applying the log-transformation, we obtain
log(h(x, θ)) = log(θ_1 exp(θ_2 / x)) = log(θ_1) + log(exp(θ_2 / x)) = log(θ_1) + θ_2 · (1/x).
Hence log(h(x, θ)) = ϑ_1 + ϑ_2 x̃  with ϑ_1 = log(θ_1), ϑ_2 = θ_2 and x̃ = 1/x.

Conclusion:
- The complete transformably linear model is log(Y_i) = ϑ_1 + ϑ_2 x̃_i + E_i, E_i i.i.d. N(0, σ²). The error term is additive.
- In the original representation, the model transforms to
  Y_i = exp(ϑ_1 + ϑ_2 x̃_i + E_i) = θ_1 exp(θ_2 / x_i) · Ẽ_i,
  i.e., Ẽ_i is log-normally distributed and the error is multiplicative.
- Transform to a linear model only if required by the error structure. Check the assumptions on the error term by residual analysis.
If there is a deterministic model y = θ_1 x^(θ_2), the random component may be either additive or multiplicative. The Tukey-Anscombe plot (residuals versus fitted values) of the fitted model will show clearly which model is more adequate for the data.

[Figure: Tukey-Anscombe plots for the additive-error fit nls(y ~ a * x^b), i.e. y = a * x^b + E, and the multiplicative-error fit lm(log(y) ~ log(x)), i.e. ln(y) = ln(a) + b*ln(x) + E]
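The multiplicative-error fit above can be sketched numerically; a minimal numpy sketch with simulated data (the parameter values a = 2, b = 1.5, the noise level, and the data grid are hypothetical, and Python stands in for the slides' R calls):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 1.5                                  # hypothetical "true" parameters
x = np.linspace(0.5, 6.0, 50)

# Multiplicative error: y = a * x^b * exp(E) -- the log-linear fit is appropriate
y = a * x**b * np.exp(rng.normal(0.0, 0.1, x.size))

# Fit ln(y) = ln(a) + b*ln(x) + E by ordinary least squares (the lm(log(y) ~ log(x)) fit)
X = np.column_stack([np.ones_like(x), np.log(x)])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
a_hat, b_hat = np.exp(coef[0]), coef[1]

# Residuals on the log scale, as used in a Tukey-Anscombe plot (residuals vs. fitted)
fitted = X @ coef
resid = np.log(y) - fitted
print(a_hat, b_hat)
```

With multiplicative errors, plotting resid against fitted shows a structureless band; fitting the additive model y = a * x^b + E to such data would instead show a residual spread that grows with the fitted values.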
A selection of transformably linear models

h(x, θ) = 1 / (θ_1 + θ_2 exp(-x))              1/h(x, θ) = θ_1 + θ_2 exp(-x)
h(x, θ) = θ_1 x / (θ_2 + x)                    1/h(x, θ) = 1/θ_1 + (θ_2/θ_1) · (1/x)
h(x, θ) = θ_1 x^(θ_2)                          ln(h(x, θ)) = ln(θ_1) + θ_2 ln(x)
h(x, θ) = θ_1 exp(θ_2 g(x))                    ln(h(x, θ)) = ln(θ_1) + θ_2 g(x)
h(x, θ) = exp(θ_1 x^(1) exp(-θ_2 / x^(2)))     ln(ln(h(x, θ))) = ln(θ_1) + ln(x^(1)) - θ_2 / x^(2)
h(x, θ) = θ_1 (x^(1))^(θ_2) (x^(2))^(θ_3)      ln(h(x, θ)) = ln(θ_1) + θ_2 ln(x^(1)) + θ_3 ln(x^(2))
2 Model Fitting Using an Iterative Algorithm
The method of least squares: find the minimum of
S(θ) = Σ_{i=1}^n (y_i - η_i(θ))²  with η_i(θ) = h(x_i; θ).

Key steps for minimising:
- Approximate the surface η(θ) at a temporarily best value θ^(l) by a tangent plane, with η(θ^(l)) as the point of contact.
- Search for the point on the plane which is closest to Y (that is a linear regression fitting problem).
- The new point lies on the plane but not on the surface. However, it defines a parameter vector θ^(l+1) which will be used in the next iteration step.
Algebraically formulated
1. Linear approximation of η(θ) at θ^(l):
   η(θ) ≈ η(θ^(l)) + A^(l) (θ - θ^(l)),
   where A^(l) = A(θ^(l)) is the derivative matrix of η(θ) at θ^(l) in the l-th iteration step.
2. (Local) linear model: Ỹ^(l) ≈ A^(l) β^(l) + E, where Ỹ^(l) = Y - η(θ^(l)) and β^(l) = θ - θ^(l).
3. Least-squares estimation for β^(l) yields β̂^(l). Set θ^(l+1) = θ^(l) + β̂^(l).
4. Repeat steps 1 to 3 until the procedure converges. Result: θ̂ = θ^(l+1).
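Steps 1-4 translate directly into code; a minimal numpy sketch of plain Gauss-Newton for the Michaelis-Menten function, on simulated data with hypothetical parameter values (production implementations such as R's nls() add step-size control and convergence diagnostics that are omitted here):

```python
import numpy as np

def gauss_newton(y, x, theta0, n_iter=20):
    """Plain Gauss-Newton for the Michaelis-Menten model h(x; th) = th1*x/(th2 + x)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        t1, t2 = theta
        eta = t1 * x / (t2 + x)                       # eta(theta^(l))
        # Derivative matrix A^(l): columns are d eta_i / d theta_j at theta^(l)
        A = np.column_stack([x / (t2 + x), -t1 * x / (t2 + x) ** 2])
        # Local linear model:  y - eta  ~  A * beta  ->  least-squares step
        beta, *_ = np.linalg.lstsq(A, y - eta, rcond=None)
        theta = theta + beta                          # theta^(l+1) = theta^(l) + beta-hat
        if np.max(np.abs(beta)) < 1e-10:              # converged
            break
    return theta

# Simulated Puromycin-like data (hypothetical values)
rng = np.random.default_rng(1)
x = np.array([0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56, 1.1, 1.1])
y = 200 * x / (0.05 + x) + rng.normal(0, 5, x.size)

theta_hat = gauss_newton(y, x, theta0=(196, 0.048))
print(theta_hat)
```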
Starting Values
- Interpret the behaviour of the regression function in terms of the parameters, analytically or graphically.
- Transform the regression function to obtain simpler, preferably linear, behaviour.
- Use your knowledge from previous or similar experiments.

Example Puromycin (2) - using transformation:
y ≈ h(x, θ) = θ_1 x / (θ_2 + x) transforms to linearity as
ỹ = 1/y ≈ 1/h(x, θ) = (θ_2/θ_1) · (1/x) + 1/θ_1,
that is, ỹ ≈ β_1 x̃ + β_0 with x̃ = 1/x. Linear regression yields β̂ = (0.005, 0.00025)^T, giving the starting values
θ_1^(0) = 1/β̂_0 ≈ 196 and θ_2^(0) = β̂_1/β̂_0 ≈ 0.048.
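The reciprocal transformation can be reproduced numerically; a small numpy sketch (the data values are illustrative, in the spirit of the Puromycin measurements, not the original data set):

```python
import numpy as np

# Hypothetical concentration/velocity data
x = np.array([0.02, 0.06, 0.11, 0.22, 0.56, 1.10])
y = np.array([ 76., 107., 139., 159., 191., 207.])

# Linearized model: 1/y = beta0 + beta1 * (1/x), with beta0 = 1/theta1, beta1 = theta2/theta1
X = np.column_stack([np.ones_like(x), 1.0 / x])
(beta0, beta1), *_ = np.linalg.lstsq(X, 1.0 / y, rcond=None)

theta1_0 = 1.0 / beta0          # starting value for theta1
theta2_0 = beta1 / beta0        # starting value for theta2
print(theta1_0, theta2_0)
```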
Example Puromycin (3)
[Figure. Left: regression line of 1/velocity on 1/concentration used for determining the starting values θ_1^(0) and θ_2^(0). Right: regression function h(x; θ) based on the starting values θ = θ^(0) and based on the least-squares estimate θ = θ̂, respectively.]
Example: Cellulose membrane (2) - starting values
h(x; θ) = (θ_1 + θ_2 · 10^(θ_3 + θ_4 x)) / (1 + 10^(θ_3 + θ_4 x))  with θ_4 < 0.
We know: h(x; θ) → θ_1 for x → ∞ and h(x; θ) → θ_2 for x → -∞.
From the data, we obtain θ_1^(0) = 163.7 and θ_2^(0) = 159.5.
Let ỹ_i = log_10((θ_1^(0) - y_i) / (y_i - θ_2^(0))); hence ỹ_i ≈ θ_3 + θ_4 x_i.
Simple linear regression results in starting values for both θ_3 and θ_4:
θ_3^(0) = 1.83 and θ_4^(0) = -0.36.
Example: Cellulose membrane (3)
[Figure. (a) Regression line of ỹ on x (= pH) used for determining the starting values θ_3^(0) and θ_4^(0). (b) Regression function h(x; θ) based on the starting values θ = θ^(0) and based on the least-squares estimate θ = θ̂, respectively.]
Self-Starter Function
For repeated use of the same nonlinear regression model, use an automated way of providing starting values. Basically, collect all the manual steps which are necessary to obtain the initial values for a nonlinear regression model into a function. Self-starter functions are specific to a given mean function and calculate starting values for a given dataset.
If SSmicmen() (cf. next slide) is a self-starter function, then you can run the fitting process as
nls(rate ~ SSmicmen(conc, Vm, K), data = d.minor)
How to write your own self-starter functions: see the help pages or, e.g., Ritz & Streibig (2008), Sec. 3.2.
With the standard installation of R, the following self-starter functions are implemented:
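The self-starter idea carries over to other environments; a hypothetical Python sketch that bundles the linearizing transform and the iterative fit into one function, with scipy's curve_fit playing the role of nls() (function names are invented for illustration, and the data values are illustrative, not the original Puromycin measurements):

```python
import numpy as np
from scipy.optimize import curve_fit

def micmen(x, Vm, K):
    """Michaelis-Menten mean function Vm * x / (K + x)."""
    return Vm * x / (K + x)

def ss_micmen_start(x, y):
    """Starting values via the linearizing transform 1/y = 1/Vm + (K/Vm) * (1/x)."""
    X = np.column_stack([np.ones_like(x), 1.0 / x])
    (b0, b1), *_ = np.linalg.lstsq(X, 1.0 / y, rcond=None)
    return 1.0 / b0, b1 / b0                      # (Vm0, K0)

def fit_micmen(x, y):
    """Self-starter-style fit: derive starting values, then run the iterative fit."""
    p0 = ss_micmen_start(x, y)
    popt, pcov = curve_fit(micmen, x, y, p0=p0)
    return popt

x = np.array([0.02, 0.06, 0.11, 0.22, 0.56, 1.10])
y = np.array([76., 107., 139., 159., 191., 207.])
Vm_hat, K_hat = fit_micmen(x, y)
print(Vm_hat, K_hat)
```

The design mirrors what SSmicmen() does internally: the user supplies only the data, and the starting-value logic is hidden inside the fitting routine.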
Self-Starter Functions in the Standard Installation

Model                              Mean function                                              Name of self-starter function
Biexponential                      A1·exp(-x·exp(lrc1)) + A2·exp(-x·exp(lrc2))                SSbiexp(x, A1, lrc1, A2, lrc2)
Asymptotic regression              Asym + (R0 - Asym)·exp(-x·exp(lrc))                        SSasymp(x, Asym, R0, lrc)
Asymptotic regression with offset  Asym·(1 - exp(-(x - c0)·exp(lrc)))                         SSasympOff(x, Asym, lrc, c0)
Asymptotic regression (c0 = 0)     Asym·(1 - exp(-x·exp(lrc)))                                SSasympOrig(x, Asym, lrc)
First-order compartment            x1·exp(lke + lka - lcl) / (exp(lka) - exp(lke)) · (exp(-x2·exp(lke)) - exp(-x2·exp(lka)))   SSfol(x1, x2, lke, lka, lcl)
Gompertz                           Asym·exp(-b2·b3^x)                                         SSgompertz(x, Asym, b2, b3)
Logistic                           A + (B - A) / (1 + exp((xmid - x)/scal))                   SSfpl(x, A, B, xmid, scal)
Logistic (A = 0)                   Asym / (1 + exp((xmid - x)/scal))                          SSlogis(x, Asym, xmid, scal)
Michaelis-Menten                   Vm·x / (K + x)                                             SSmicmen(x, Vm, K)
Weibull                            Asym - Drop·exp(-exp(lrc)·x^pwr)                           SSweibull(x, Asym, Drop, lrc, pwr)
3 Inference Based on Linear Approximations
A look at the summary output of the example Cellulose Membrane shows that it looks very similar to the summary output of a fitted linear regression model:

Formula: delta ~ (T1 + T2 * 10^(T3 + T4 * ph)) / (10^(T3 + T4 * ph) + 1)
Parameters:
       Value    Std. Error  t value   Pr(>|t|)
θ_1   163.706    0.1262     1297.26   < 2e-16 ***
θ_2   159.785    0.1594     1002.19   < 2e-16 ***
θ_3     2.675    0.3813        7.02   3.65e-08 ***
θ_4    -0.512    0.0703       -7.28   1.66e-08 ***
Residual standard error: 0.293137 on 35 degrees of freedom
Number of iterations to convergence: 7
Achieved convergence tolerance: 3.652e-06
The Asymptotic Properties
This approach is based on the local linearization of the model (cf. the iterative estimation procedure),
Y ≈ η(θ) + A(θ) β + E,
where A(θ) is the n×p matrix of partial derivatives. If the estimation procedure has converged, then β̂ = 0.

Asymptotic distribution of the least squares estimator:
θ̂ ~ (as.) N(θ, V(θ))  with asymptotic covariance matrix  V(θ) = σ² (A(θ)^T A(θ))^(-1).
Application in Practice
To explicitly determine the covariance matrix V(θ), we plug in estimates instead of the true parameters: A(θ) is calculated at θ̂, yielding Â, and for the error variance σ² we plug in the usual estimator. Hence
V̂ = σ̂² (Â^T Â)^(-1),
where σ̂² = S(θ̂)/(n - p) = (1/(n - p)) Σ_{i=1}^n (y_i - η_i(θ̂))²  and  Â = A(θ̂).
Approximate 95% confidence interval
Hence, an approximate 95% confidence interval for θ_k is
θ̂_k ± ŝe(θ̂_k) · q^{t_{n-p}}_{0.975},
where ŝe(θ̂_k) is the square root of the kth diagonal element of V̂.

Example Cellulose Membrane: From the summary output
       Value    Std. Error  t value   Pr(>|t|)
θ_1   163.706    0.1262     1297.26   < 2e-16 ***
θ_2   159.785    0.1594     1002.19   < 2e-16 ***
θ_3     2.675    0.3813        7.02   3.65e-08 ***
θ_4    -0.512    0.0703       -7.28   1.66e-08 ***
Residual standard error: 0.293137 on 35 degrees of freedom
we can calculate the 95% confidence interval for θ_1:
163.71 ± 0.13 · q^{t_35}_{0.975} = 163.71 ± 0.26.
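The plug-in computation of V̂ and the interval can be sketched in a few lines; a numpy/scipy sketch for the Michaelis-Menten function (the data and the "converged" parameter values are hypothetical stand-ins, not the cellulose-membrane fit):

```python
import numpy as np
from scipy import stats

# Illustrative data and assumed converged estimates theta-hat
x = np.array([0.02, 0.06, 0.11, 0.22, 0.56, 1.10])
y = np.array([76., 107., 139., 159., 191., 207.])
t1, t2 = 212.7, 0.064            # hypothetical theta-hat
n, p = x.size, 2

eta = t1 * x / (t2 + x)
sigma2 = np.sum((y - eta) ** 2) / (n - p)        # sigma-hat^2 = S(theta-hat)/(n - p)

# A-hat: partial derivatives of eta_i w.r.t. (theta1, theta2), evaluated at theta-hat
A = np.column_stack([x / (t2 + x), -t1 * x / (t2 + x) ** 2])
V = sigma2 * np.linalg.inv(A.T @ A)              # V-hat = sigma-hat^2 (A^T A)^(-1)

se = np.sqrt(np.diag(V))                         # standard errors
q = stats.t.ppf(0.975, df=n - p)                 # t-quantile q^{t_{n-p}}_{0.975}
ci_theta1 = (t1 - q * se[0], t1 + q * se[0])
print(se, ci_theta1)
```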
Example: Puromycin - back to the initial data set
The Michaelis-Menten model for enzyme kinetics relates the initial velocity of an enzymatic reaction to the substrate concentration.

[Figure: initial velocity vs. substrate concentration, for runs treated with Puromycin and untreated]

Y_i = θ_1 x_i / (θ_2 + x_i) + E_i  with E_i i.i.d. N(0, σ²)  (Michaelis-Menten model),
where x is the substrate concentration [ppm] and Y the initial velocity [(number/min)/min].
Example: Puromycin (4)
Model: Y_i = θ_1 x_i / (θ_2 + x_i) + E_i.
Model with and without treatment (all data):
Y_i = (θ_1 + θ_3 z_i) x_i / (θ_2 + θ_4 z_i + x_i) + E_i,
where z_i = 1 for "with" and z_i = 0 for "without" treatment.
Working hypothesis: only the asymptotic velocity θ_1 is influenced by adding Puromycin. Hence the null hypothesis is θ_4 = 0.
R output for the example Puromycin:
       Value    Std. Error  t value  Pr(>|t|)
θ_1   160.286    6.8964     23.24    2.04e-15
θ_2     0.048    0.0083      5.76    1.50e-05
θ_3    52.398    9.5513      5.49    2.71e-05
θ_4     0.016    0.0114      1.44    0.167
Residual standard error: 10.4 on 19 df
Since the P-value of 0.167 is larger than the level of 5%, the null hypothesis is not rejected at the 5% level.
95% confidence interval for θ_4: 0.016 ± 0.0114 · q^{t_19}_{0.975} = [-0.0079, 0.0399].
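The reported t-value, P-value, and interval follow directly from the table entries; a quick scipy check using the rounded values printed in the output (so the figures agree with the table only up to rounding):

```python
from scipy import stats

theta4, se4, df = 0.016, 0.0114, 19              # values from the R output above
t_value = theta4 / se4                           # Wald t-statistic
p_value = 2 * (1 - stats.t.cdf(abs(t_value), df))  # two-sided P-value
q = stats.t.ppf(0.975, df)                       # q^{t_19}_{0.975}
ci = (theta4 - q * se4, theta4 + q * se4)        # approximate 95% CI for theta4
print(t_value, p_value, ci)
```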
Inference for the expected value E(Y | x_0) = h(x_0; θ) at x_0:

Linear regression: h(x_0, β) = x_0^T β is estimated by η̂_0 = x_0^T β̂. A (1 - α)·100% confidence interval for h(x_0, β) is
η̂_0 ± q^{t_{n-p}}_{1-α/2} · ŝe(η̂_0)  with  ŝe(η̂_0) = σ̂ · sqrt(x_0^T (X^T X)^(-1) x_0).

Nonlinear regression: h(x_0, θ) is estimated by η̂_0 = h(x_0, θ̂). A (1 - α)·100% confidence interval for h(x_0, θ) is
h(x_0, θ̂) ± q^{t_{n-p}}_{1-α/2} · ŝe(η̂_0)  with  ŝe(η̂_0) = σ̂ · sqrt(â_0^T (Â^T Â)^(-1) â_0)  and  â_0 = ∂h(x_0, θ)/∂θ |_{θ=θ̂}.
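The nonlinear-case interval can be computed by hand from the gradient â_0; a numpy/scipy sketch for the Michaelis-Menten function at a single point x_0 (data, parameter estimates, and x_0 = 0.4 are hypothetical stand-ins):

```python
import numpy as np
from scipy import stats

x = np.array([0.02, 0.06, 0.11, 0.22, 0.56, 1.10])
y = np.array([76., 107., 139., 159., 191., 207.])
t1, t2 = 212.7, 0.064                    # hypothetical theta-hat
n, p = x.size, 2

eta = t1 * x / (t2 + x)
sigma = np.sqrt(np.sum((y - eta) ** 2) / (n - p))
A = np.column_stack([x / (t2 + x), -t1 * x / (t2 + x) ** 2])   # A-hat
AtA_inv = np.linalg.inv(A.T @ A)

x0 = 0.4                                  # point where E(Y | x0) is wanted
eta0 = t1 * x0 / (t2 + x0)                # eta-hat_0 = h(x0, theta-hat)
a0 = np.array([x0 / (t2 + x0), -t1 * x0 / (t2 + x0) ** 2])     # gradient a-hat_0
se_eta0 = sigma * np.sqrt(a0 @ AtA_inv @ a0)
q = stats.t.ppf(0.975, df=n - p)
band = (eta0 - q * se_eta0, eta0 + q * se_eta0)
print(band)
```

Evaluating this interval on a grid of x_0 values traces out the pointwise confidence band shown on the next slide.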
Confidence Band
[Figure. Left: confidence band (i.e., pointwise confidence intervals) for a fitted straight line (linear regression model; log(PCB concentration) vs. years^(1/3)). Right: confidence band for the fitted curve h(x, θ̂) of the example Biochemical Oxygen Demand (oxygen demand vs. days).]
Variable Selection
How about variable selection in nonlinear regression?
- There is no one-to-one correspondence between predictor variables and parameters as in linear regression! Hence, the number of variables may differ from the number of parameters.
- There are hardly ever problems where some of the variables are in question (the model is derived from subject matter theory!).
- However, there are problems where a submodel (a submodel is nested within the full model) may be adequate to describe the data; cf. Example Puromycin, Slide 17, Half-Day 1.
- If we have a collection of candidate models which need not be submodels of each other, and the subject matter is somehow indifferent to these models, but we want to find the most appropriate model for the data, one can use Akaike's information criterion (AIC) to select the best model (and/or run a residual analysis).
Take Home Message Half-Day 1
- In nonlinear regression, Y_i = h(x_i, θ) + E_i, functions h are analysed which are not linear functions of the unknown parameters θ. Such models are often derived from subject matter theory.
- The flexibility of this model class is bought by a more complex estimation and inference theory.
- Parameter estimation is done by an iterative procedure which needs appropriate starting values.
- Inference is based on an asymptotic theory. For finite sample sizes the results hold only approximately.
- Model assumptions are assessed as in linear regression modelling.