Zurich University of Applied Sciences, School of Engineering
IDP Institute of Data Analysis and Process Design

Nonlinear Regression: A Powerful Tool With Considerable Complexity
Half-Day 1: Estimation and Standard Inference

Andreas Ruckstuhl
Institut für Datenanalyse und Prozessdesign, Zürcher Hochschule für Angewandte Wissenschaften
Nonlinear Regression: Half-Day 1 - Estimation and Standard Inference

Outline
Half-Day 1: Estimation and Standard Inference
- The Nonlinear Regression Model
- Iterative Estimation - Model Fitting
- Inference Based on Linear Approximations
Half-Day 2: Improved Inference and Visualisation
- Likelihood Based Inference
- Profile t Plot and Profile Traces
- Parameter Transformations
Half-Day 3: Bootstrap, Prediction and Calibration
- Bootstrap
- Prediction
- Calibration
- Outlook
1 The Nonlinear Regression Model

The regression model is
Y_i = h(x_i^(1), ..., x_i^(m); θ_1, θ_2, ..., θ_p) + E_i  with E_i indep. N(0, σ²).

In the case of the linear regression model,
h(x_i^(1), ..., x_i^(m); θ_1, θ_2, ..., θ_p) = θ_1 x_i^(1) + θ_2 x_i^(2) + ... + θ_p x_i^(p)  (i.e., m = p).

Examples of nonlinear regression functions:
h(x_i; θ) = θ_1 x_i^(θ_3) / (θ_2 + x_i^(θ_3))
h(x_i; θ) = θ_1 (x_i^(1))^(θ_3) exp(θ_2 x_i^(2))
h(x_i; θ) = θ_1 exp(θ_2 / x_i)
Example: Puromycin
The Michaelis-Menten model for enzyme kinetics relates the initial velocity of an enzymatic reaction to the substrate concentration.

[Figure: initial velocity vs. substrate concentration, for runs treated with Puromycin and untreated]

Y_i = θ_1 x_i / (θ_2 + x_i) + E_i  with E_i i.i.d. N(0, σ²)  (Michaelis-Menten model),
where x is the substrate concentration [ppm] and Y the initial velocity [(number/min)/min].
Example: Biochemical Oxygen Demand (BOD)
Biochemical oxygen demand of stream water.

[Figure: oxygen demand (mg/l) vs. incubation time (days)]

Y_i = θ_1 (1 - exp(-θ_2 x_i)) + E_i  with E_i i.i.d. N(0, σ²),
where Y is the biochemical oxygen demand (BOD) [mg/l] and x the incubation time [days].
Example: Cellulose Membrane
Ratio of protonated to deprotonated carboxyl groups within the pore of a cellulose membrane versus the pH value x of the bulk solution.

[Figure: chemical shift y vs. pH x, panels (a) and (b)]

Theoretically, this relation is described by the Henderson-Hasselbach equation,
Y_i = (θ_1 + θ_2 · 10^(θ_3 + θ_4 x_i)) / (1 + 10^(θ_3 + θ_4 x_i)) + E_i,  i = 1, ..., n,
with E_i i.i.d. N(0, σ²).
Transformably Linear Models
Example: h(x, θ) = θ_1 exp(θ_2 / x).
Applying the log-transformation, we obtain
log(h(x, θ)) = log(θ_1 exp(θ_2 / x)) = log(θ_1) + log(exp(θ_2 / x)) = log(θ_1) + θ_2 · (1/x).
Hence log(h(x, θ)) = ϑ_1 + ϑ_2 x̃  with ϑ_1 = log(θ_1), ϑ_2 = θ_2 and x̃ = 1/x.

Conclusion:
- The complete transformably linear model is log(Y_i) = ϑ_1 + ϑ_2 x̃_i + E_i, E_i i.i.d. N(0, σ²). The error term is additive.
- In the original representation, the model transforms to
  Y_i = exp(ϑ_1 + ϑ_2 x̃_i + E_i) = θ_1 exp(θ_2 / x_i) · Ẽ_i,
  i.e., Ẽ_i is log-normally distributed and the error is multiplicative.
- Transform to a linear model only if required by the error structure. Check the assumptions on the error term by residual analysis.
If there is a deterministic model y = θ_1 x^(θ_2), the random component may be either additive or multiplicative. The Tukey-Anscombe plot (residuals versus fitted values) of the fitted model will show clearly which model is more adequate for the data.

[Figure: Tukey-Anscombe plots for the additive-error fit nls(y ~ a * x^b), i.e. y = a * x^b + E, and the multiplicative-error fit lm(log(y) ~ log(x)), i.e. ln(y) = ln(a) + b*ln(x) + E]
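The multiplicative-error fit above can be sketched numerically; a minimal numpy sketch with simulated data (the parameter values a = 2, b = 1.5, the noise level, and the data grid are hypothetical, and Python stands in for the slides' R calls):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 1.5                                  # hypothetical "true" parameters
x = np.linspace(0.5, 6.0, 50)

# Multiplicative error: y = a * x^b * exp(E) -- the log-linear fit is appropriate
y = a * x**b * np.exp(rng.normal(0.0, 0.1, x.size))

# Fit ln(y) = ln(a) + b*ln(x) + E by ordinary least squares (the lm(log(y) ~ log(x)) fit)
X = np.column_stack([np.ones_like(x), np.log(x)])
coef, *_ = np.linalg.lstsq(X, np.log(y), rcond=None)
a_hat, b_hat = np.exp(coef[0]), coef[1]

# Residuals on the log scale, as used in a Tukey-Anscombe plot (residuals vs. fitted)
fitted = X @ coef
resid = np.log(y) - fitted
print(a_hat, b_hat)
```

With multiplicative errors, plotting resid against fitted shows a structureless band; fitting the additive model y = a * x^b + E to such data would instead show a residual spread that grows with the fitted values.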
A selection of transformably linear models

h(x, θ) = 1 / (θ_1 + θ_2 exp(-x))              1/h(x, θ) = θ_1 + θ_2 exp(-x)
h(x, θ) = θ_1 x / (θ_2 + x)                    1/h(x, θ) = 1/θ_1 + (θ_2/θ_1) · (1/x)
h(x, θ) = θ_1 x^(θ_2)                          ln(h(x, θ)) = ln(θ_1) + θ_2 ln(x)
h(x, θ) = θ_1 exp(θ_2 g(x))                    ln(h(x, θ)) = ln(θ_1) + θ_2 g(x)
h(x, θ) = exp(θ_1 x^(1) exp(-θ_2 / x^(2)))     ln(ln(h(x, θ))) = ln(θ_1) + ln(x^(1)) - θ_2 / x^(2)
h(x, θ) = θ_1 (x^(1))^(θ_2) (x^(2))^(θ_3)      ln(h(x, θ)) = ln(θ_1) + θ_2 ln(x^(1)) + θ_3 ln(x^(2))
2 Model Fitting Using an Iterative Algorithm
The method of least squares: find the minimum of
S(θ) = Σ_{i=1}^n (y_i - η_i(θ))²  with η_i(θ) = h(x_i; θ).

Key steps for minimising:
- Approximate the surface η(θ) at a temporarily best value θ^(l) by a tangent plane, with η(θ^(l)) as the point of contact.
- Search for the point on the plane which is closest to Y (that is a linear regression fitting problem).
- The new point lies on the plane but not on the surface. However, it defines a parameter vector θ^(l+1) which will be used in the next iteration step.
Algebraically formulated
1. Linear approximation of η(θ) at θ^(l):
   η(θ) ≈ η(θ^(l)) + A^(l) (θ - θ^(l)),
   where A^(l) = A(θ^(l)) is the derivative matrix of η(θ) at θ^(l) in the l-th iteration step.
2. (Local) linear model: Ỹ^(l) ≈ A^(l) β^(l) + E, where Ỹ^(l) = Y - η(θ^(l)) and β^(l) = θ - θ^(l).
3. Least-squares estimation for β^(l) yields β̂^(l). Set θ^(l+1) = θ^(l) + β̂^(l).
4. Repeat steps 1 to 3 until the procedure converges. Result: θ̂ = θ^(l+1).
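Steps 1-4 translate directly into code; a minimal numpy sketch of plain Gauss-Newton for the Michaelis-Menten function, on simulated data with hypothetical parameter values (production implementations such as R's nls() add step-size control and convergence diagnostics that are omitted here):

```python
import numpy as np

def gauss_newton(y, x, theta0, n_iter=20):
    """Plain Gauss-Newton for the Michaelis-Menten model h(x; th) = th1*x/(th2 + x)."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        t1, t2 = theta
        eta = t1 * x / (t2 + x)                       # eta(theta^(l))
        # Derivative matrix A^(l): columns are d eta_i / d theta_j at theta^(l)
        A = np.column_stack([x / (t2 + x), -t1 * x / (t2 + x) ** 2])
        # Local linear model:  y - eta  ~  A * beta  ->  least-squares step
        beta, *_ = np.linalg.lstsq(A, y - eta, rcond=None)
        theta = theta + beta                          # theta^(l+1) = theta^(l) + beta-hat
        if np.max(np.abs(beta)) < 1e-10:              # converged
            break
    return theta

# Simulated Puromycin-like data (hypothetical values)
rng = np.random.default_rng(1)
x = np.array([0.02, 0.02, 0.06, 0.06, 0.11, 0.11, 0.22, 0.22, 0.56, 0.56, 1.1, 1.1])
y = 200 * x / (0.05 + x) + rng.normal(0, 5, x.size)

theta_hat = gauss_newton(y, x, theta0=(196, 0.048))
print(theta_hat)
```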
Starting Values
- Interpret the behaviour of the regression function in terms of the parameters, analytically or graphically.
- Transform the regression function to obtain simpler, preferably linear, behaviour.
- Use your knowledge from previous or similar experiments.

Example Puromycin (2) - using transformation:
y ≈ h(x, θ) = θ_1 x / (θ_2 + x) transforms to linearity as
ỹ = 1/y ≈ 1/h(x, θ) = (θ_2/θ_1) · (1/x) + 1/θ_1,
that is, ỹ ≈ β_1 x̃ + β_0 with x̃ = 1/x. Linear regression yields β̂ = (0.005, 0.00025)^T, giving the starting values
θ_1^(0) = 1/β̂_0 ≈ 196 and θ_2^(0) = β̂_1/β̂_0 ≈ 0.048.
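The reciprocal transformation can be reproduced numerically; a small numpy sketch (the data values are illustrative, in the spirit of the Puromycin measurements, not the original data set):

```python
import numpy as np

# Hypothetical concentration/velocity data
x = np.array([0.02, 0.06, 0.11, 0.22, 0.56, 1.10])
y = np.array([ 76., 107., 139., 159., 191., 207.])

# Linearized model: 1/y = beta0 + beta1 * (1/x), with beta0 = 1/theta1, beta1 = theta2/theta1
X = np.column_stack([np.ones_like(x), 1.0 / x])
(beta0, beta1), *_ = np.linalg.lstsq(X, 1.0 / y, rcond=None)

theta1_0 = 1.0 / beta0          # starting value for theta1
theta2_0 = beta1 / beta0        # starting value for theta2
print(theta1_0, theta2_0)
```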
Example Puromycin (3)
[Figure. Left: regression line of 1/velocity on 1/concentration used for determining the starting values θ_1^(0) and θ_2^(0). Right: regression function h(x; θ) based on the starting values θ = θ^(0) and based on the least-squares estimate θ = θ̂, respectively.]
Example: Cellulose membrane (2) - starting values
h(x; θ) = (θ_1 + θ_2 · 10^(θ_3 + θ_4 x)) / (1 + 10^(θ_3 + θ_4 x))  with θ_4 < 0.
We know: h(x; θ) → θ_1 for x → ∞ and h(x; θ) → θ_2 for x → -∞.
From the data, we obtain θ_1^(0) = 163.7 and θ_2^(0) = 159.5.
Let ỹ_i = log_10((θ_1^(0) - y_i) / (y_i - θ_2^(0))); hence ỹ_i ≈ θ_3 + θ_4 x_i.
Simple linear regression results in starting values for both θ_3 and θ_4:
θ_3^(0) = 1.83 and θ_4^(0) = -0.36.
Example: Cellulose membrane (3)
[Figure. (a) Regression line of ỹ on x (= pH) used for determining the starting values θ_3^(0) and θ_4^(0). (b) Regression function h(x; θ) based on the starting values θ = θ^(0) and based on the least-squares estimate θ = θ̂, respectively.]
Self-Starter Function
For repeated use of the same nonlinear regression model, use an automated way of providing starting values. Basically, collect all the manual steps which are necessary to obtain the initial values for a nonlinear regression model into a function. Self-starter functions are specific to a given mean function and calculate starting values for a given dataset.
If SSmicmen() (cf. next slide) is a self-starter function, then you can run the fitting process as
nls(rate ~ SSmicmen(conc, Vm, K), data = d.minor)
How to write your own self-starter functions: see the help pages or, e.g., Ritz & Streibig (2008), Sec. 3.2.
With the standard installation of R, the following self-starter functions are implemented:
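The self-starter idea carries over to other environments; a hypothetical Python sketch that bundles the linearizing transform and the iterative fit into one function, with scipy's curve_fit playing the role of nls() (function names are invented for illustration, and the data values are illustrative, not the original Puromycin measurements):

```python
import numpy as np
from scipy.optimize import curve_fit

def micmen(x, Vm, K):
    """Michaelis-Menten mean function Vm * x / (K + x)."""
    return Vm * x / (K + x)

def ss_micmen_start(x, y):
    """Starting values via the linearizing transform 1/y = 1/Vm + (K/Vm) * (1/x)."""
    X = np.column_stack([np.ones_like(x), 1.0 / x])
    (b0, b1), *_ = np.linalg.lstsq(X, 1.0 / y, rcond=None)
    return 1.0 / b0, b1 / b0                      # (Vm0, K0)

def fit_micmen(x, y):
    """Self-starter-style fit: derive starting values, then run the iterative fit."""
    p0 = ss_micmen_start(x, y)
    popt, pcov = curve_fit(micmen, x, y, p0=p0)
    return popt

x = np.array([0.02, 0.06, 0.11, 0.22, 0.56, 1.10])
y = np.array([76., 107., 139., 159., 191., 207.])
Vm_hat, K_hat = fit_micmen(x, y)
print(Vm_hat, K_hat)
```

The design mirrors what SSmicmen() does internally: the user supplies only the data, and the starting-value logic is hidden inside the fitting routine.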
Self-Starter Functions in the Standard Installation

Model                              Mean function                                              Name of self-starter function
Biexponential                      A1·exp(-x·exp(lrc1)) + A2·exp(-x·exp(lrc2))                SSbiexp(x, A1, lrc1, A2, lrc2)
Asymptotic regression              Asym + (R0 - Asym)·exp(-x·exp(lrc))                        SSasymp(x, Asym, R0, lrc)
Asymptotic regression with offset  Asym·(1 - exp(-(x - c0)·exp(lrc)))                         SSasympOff(x, Asym, lrc, c0)
Asymptotic regression (c0 = 0)     Asym·(1 - exp(-x·exp(lrc)))                                SSasympOrig(x, Asym, lrc)
First-order compartment            x1·exp(lke + lka - lcl) / (exp(lka) - exp(lke)) · (exp(-x2·exp(lke)) - exp(-x2·exp(lka)))   SSfol(x1, x2, lke, lka, lcl)
Gompertz                           Asym·exp(-b2·b3^x)                                         SSgompertz(x, Asym, b2, b3)
Logistic                           A + (B - A) / (1 + exp((xmid - x)/scal))                   SSfpl(x, A, B, xmid, scal)
Logistic (A = 0)                   Asym / (1 + exp((xmid - x)/scal))                          SSlogis(x, Asym, xmid, scal)
Michaelis-Menten                   Vm·x / (K + x)                                             SSmicmen(x, Vm, K)
Weibull                            Asym - Drop·exp(-exp(lrc)·x^pwr)                           SSweibull(x, Asym, Drop, lrc, pwr)
3 Inference Based on Linear Approximations
A look at the summary output of the example Cellulose Membrane shows that it looks very similar to the summary output of a fitted linear regression model:

Formula: delta ~ (T1 + T2 * 10^(T3 + T4 * ph)) / (10^(T3 + T4 * ph) + 1)
Parameters:
       Value    Std. Error  t value   Pr(>|t|)
θ_1   163.706    0.1262     1297.26   < 2e-16 ***
θ_2   159.785    0.1594     1002.19   < 2e-16 ***
θ_3     2.675    0.3813        7.02   3.65e-08 ***
θ_4    -0.512    0.0703       -7.28   1.66e-08 ***
Residual standard error: 0.293137 on 35 degrees of freedom
Number of iterations to convergence: 7
Achieved convergence tolerance: 3.652e-06
The Asymptotic Properties
This approach is based on the local linearization of the model (cf. the iterative estimation procedure),
Y ≈ η(θ) + A(θ) β + E,
where A(θ) is the n×p matrix of partial derivatives. If the estimation procedure has converged, then β̂ = 0.

Asymptotic distribution of the least squares estimator:
θ̂ ~ (as.) N(θ, V(θ))  with asymptotic covariance matrix  V(θ) = σ² (A(θ)^T A(θ))^(-1).
Application in Practice
To explicitly determine the covariance matrix V(θ), we plug in estimates instead of the true parameters: A(θ) is calculated at θ̂, yielding Â, and for the error variance σ² we plug in the usual estimator. Hence
V̂ = σ̂² (Â^T Â)^(-1),
where σ̂² = S(θ̂)/(n - p) = (1/(n - p)) Σ_{i=1}^n (y_i - η_i(θ̂))²  and  Â = A(θ̂).
Approximate 95% confidence interval
Hence, an approximate 95% confidence interval for θ_k is
θ̂_k ± ŝe(θ̂_k) · q^{t_{n-p}}_{0.975},
where ŝe(θ̂_k) is the square root of the kth diagonal element of V̂.

Example Cellulose Membrane: From the summary output
       Value    Std. Error  t value   Pr(>|t|)
θ_1   163.706    0.1262     1297.26   < 2e-16 ***
θ_2   159.785    0.1594     1002.19   < 2e-16 ***
θ_3     2.675    0.3813        7.02   3.65e-08 ***
θ_4    -0.512    0.0703       -7.28   1.66e-08 ***
Residual standard error: 0.293137 on 35 degrees of freedom
we can calculate the 95% confidence interval for θ_1:
163.71 ± 0.13 · q^{t_35}_{0.975} = 163.71 ± 0.26.
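The plug-in computation of V̂ and the interval can be sketched in a few lines; a numpy/scipy sketch for the Michaelis-Menten function (the data and the "converged" parameter values are hypothetical stand-ins, not the cellulose-membrane fit):

```python
import numpy as np
from scipy import stats

# Illustrative data and assumed converged estimates theta-hat
x = np.array([0.02, 0.06, 0.11, 0.22, 0.56, 1.10])
y = np.array([76., 107., 139., 159., 191., 207.])
t1, t2 = 212.7, 0.064            # hypothetical theta-hat
n, p = x.size, 2

eta = t1 * x / (t2 + x)
sigma2 = np.sum((y - eta) ** 2) / (n - p)        # sigma-hat^2 = S(theta-hat)/(n - p)

# A-hat: partial derivatives of eta_i w.r.t. (theta1, theta2), evaluated at theta-hat
A = np.column_stack([x / (t2 + x), -t1 * x / (t2 + x) ** 2])
V = sigma2 * np.linalg.inv(A.T @ A)              # V-hat = sigma-hat^2 (A^T A)^(-1)

se = np.sqrt(np.diag(V))                         # standard errors
q = stats.t.ppf(0.975, df=n - p)                 # t-quantile q^{t_{n-p}}_{0.975}
ci_theta1 = (t1 - q * se[0], t1 + q * se[0])
print(se, ci_theta1)
```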
Example: Puromycin - back to the initial data set
The Michaelis-Menten model for enzyme kinetics relates the initial velocity of an enzymatic reaction to the substrate concentration.

[Figure: initial velocity vs. substrate concentration, for runs treated with Puromycin and untreated]

Y_i = θ_1 x_i / (θ_2 + x_i) + E_i  with E_i i.i.d. N(0, σ²)  (Michaelis-Menten model),
where x is the substrate concentration [ppm] and Y the initial velocity [(number/min)/min].
Example: Puromycin (4)
Model: Y_i = θ_1 x_i / (θ_2 + x_i) + E_i.
Model with and without treatment (all data):
Y_i = (θ_1 + θ_3 z_i) x_i / (θ_2 + θ_4 z_i + x_i) + E_i,
where z_i = 1 for "with" and z_i = 0 for "without" treatment.
Working hypothesis: only the asymptotic velocity θ_1 is influenced by adding Puromycin. Hence the null hypothesis is θ_4 = 0.
R output for the example Puromycin:
       Value    Std. Error  t value  Pr(>|t|)
θ_1   160.286    6.8964     23.24    2.04e-15
θ_2     0.048    0.0083      5.76    1.50e-05
θ_3    52.398    9.5513      5.49    2.71e-05
θ_4     0.016    0.0114      1.44    0.167
Residual standard error: 10.4 on 19 df
Since the P-value of 0.167 is larger than the level of 5%, the null hypothesis is not rejected at the 5% level.
95% confidence interval for θ_4: 0.016 ± 0.0114 · q^{t_19}_{0.975} = [-0.0079, 0.0399].
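The reported t-value, P-value, and interval follow directly from the table entries; a quick scipy check using the rounded values printed in the output (so the figures agree with the table only up to rounding):

```python
from scipy import stats

theta4, se4, df = 0.016, 0.0114, 19              # values from the R output above
t_value = theta4 / se4                           # Wald t-statistic
p_value = 2 * (1 - stats.t.cdf(abs(t_value), df))  # two-sided P-value
q = stats.t.ppf(0.975, df)                       # q^{t_19}_{0.975}
ci = (theta4 - q * se4, theta4 + q * se4)        # approximate 95% CI for theta4
print(t_value, p_value, ci)
```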
Inference for the expected value E(Y | x_0) = h(x_0; θ) at x_0:

Linear regression: h(x_0, β) = x_0^T β is estimated by η̂_0 = x_0^T β̂. A (1 - α)·100% confidence interval for h(x_0, β) is
η̂_0 ± q^{t_{n-p}}_{1-α/2} · ŝe(η̂_0)  with  ŝe(η̂_0) = σ̂ · sqrt(x_0^T (X^T X)^(-1) x_0).

Nonlinear regression: h(x_0, θ) is estimated by η̂_0 = h(x_0, θ̂). A (1 - α)·100% confidence interval for h(x_0, θ) is
h(x_0, θ̂) ± q^{t_{n-p}}_{1-α/2} · ŝe(η̂_0)  with  ŝe(η̂_0) = σ̂ · sqrt(â_0^T (Â^T Â)^(-1) â_0)  and  â_0 = ∂h(x_0, θ)/∂θ |_{θ=θ̂}.
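The nonlinear-case interval can be computed by hand from the gradient â_0; a numpy/scipy sketch for the Michaelis-Menten function at a single point x_0 (data, parameter estimates, and x_0 = 0.4 are hypothetical stand-ins):

```python
import numpy as np
from scipy import stats

x = np.array([0.02, 0.06, 0.11, 0.22, 0.56, 1.10])
y = np.array([76., 107., 139., 159., 191., 207.])
t1, t2 = 212.7, 0.064                    # hypothetical theta-hat
n, p = x.size, 2

eta = t1 * x / (t2 + x)
sigma = np.sqrt(np.sum((y - eta) ** 2) / (n - p))
A = np.column_stack([x / (t2 + x), -t1 * x / (t2 + x) ** 2])   # A-hat
AtA_inv = np.linalg.inv(A.T @ A)

x0 = 0.4                                  # point where E(Y | x0) is wanted
eta0 = t1 * x0 / (t2 + x0)                # eta-hat_0 = h(x0, theta-hat)
a0 = np.array([x0 / (t2 + x0), -t1 * x0 / (t2 + x0) ** 2])     # gradient a-hat_0
se_eta0 = sigma * np.sqrt(a0 @ AtA_inv @ a0)
q = stats.t.ppf(0.975, df=n - p)
band = (eta0 - q * se_eta0, eta0 + q * se_eta0)
print(band)
```

Evaluating this interval on a grid of x_0 values traces out the pointwise confidence band shown on the next slide.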
Confidence Band
[Figure. Left: confidence band (i.e., pointwise confidence intervals) for a fitted straight line (linear regression model; log(PCB concentration) vs. years^(1/3)). Right: confidence band for the fitted curve h(x, θ̂) of the example Biochemical Oxygen Demand (oxygen demand vs. days).]
Variable Selection
How about variable selection in nonlinear regression?
- There is no one-to-one correspondence between predictor variables and parameters as in linear regression! Hence, the number of variables may differ from the number of parameters.
- There are hardly ever problems where some of the variables are in question (the model is derived from subject matter theory!).
- However, there are problems where a submodel (a submodel is nested within the full model) may be adequate to describe the data; cf. Example Puromycin, Slide 17, Half-Day 1.
- If we have a collection of candidate models which need not be submodels of each other, and the subject matter is somehow indifferent to these models, but we want to find the most appropriate model for the data, one can use Akaike's information criterion (AIC) to select the best model (and/or run a residual analysis).
Take Home Message Half-Day 1
- In nonlinear regression, Y_i = h(x_i, θ) + E_i, functions h are analysed which are not linear functions of the unknown parameters θ. Such models are often derived from subject matter theory.
- The flexibility of this model class is bought by a more complex estimation and inference theory.
- Parameter estimation is done by an iterative procedure which needs appropriate starting values.
- Inference is based on an asymptotic theory. For finite sample sizes the results hold only approximately.
- Model assumptions are assessed as in linear regression modelling.