Nonlinear Regression:


Slide 1: Nonlinear Regression: A Powerful Tool With Considerable Complexity

Half-Day 2: Improved Inference and Visualisation

Andreas Ruckstuhl, Institut für Datenanalyse und Prozessdesign, Zürcher Hochschule für Angewandte Wissenschaften
Zurich University of Applied Sciences, School of Engineering, IDP Institute of Data Analysis and Process Design

Slide 2: Outline

Half-Day 1: Estimation and Standard Inference
- The Nonlinear Regression Model
- Iterative Estimation - Model Fitting
- Inference Based on Linear Approximations

Half-Day 2: Improved Inference and Visualisation
- Likelihood Based Inference
- Profile t Plot and Profile Traces
- Parameter Transformations

Half-Day 3: Bootstrap, Prediction and Calibration
- Bootstrap
- Prediction
- Calibration
- Outlook

Slide 3: 2.1 Likelihood Based Inference

F-test for the whole parameter vector $\theta$:

    T = \frac{n-p}{p} \cdot \frac{S(\theta) - S(\hat{\theta})}{S(\hat{\theta})} \;\overset{a}{\sim}\; F_{p,\,n-p} .

This is as in linear regression, where, however, the result holds exactly. The resulting confidence region is

    \left\{ \theta : \; S(\theta) \le S(\hat{\theta}) \left( 1 + \frac{p}{n-p}\, q^{F_{p,n-p}}_{1-\alpha} \right) \right\} .

In linear regression this confidence region is identical to the one based on the multivariate normal distribution of $\hat{\beta}$. In nonlinear regression it is more accurate than the one based on the asymptotic multivariate normal distribution of $\hat{\theta}$. Cf. the discussion of the deviance test and the t-test in GLMs.
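As a concrete illustration, here is a minimal R sketch of this F-test; it assumes the built-in Puromycin data (treated cells) with a Michaelis-Menten fit as a running example, and theta0 is an arbitrary hypothesized value chosen only for illustration:

dat <- subset(Puromycin, state == "treated")
fit <- nls(rate ~ Vm * conc / (K + conc), data = dat,
           start = c(Vm = 200, K = 0.1))

S <- function(Vm, K)                               # residual sum of squares S(theta)
  sum((dat$rate - Vm * dat$conc / (K + dat$conc))^2)

n <- nrow(dat); p <- 2
S.hat  <- sum(resid(fit)^2)                        # S(theta.hat)
theta0 <- c(Vm = 200, K = 0.08)                    # hypothesized theta (illustration only)
T.stat <- (n - p) / p * (S(theta0["Vm"], theta0["K"]) - S.hat) / S.hat
c(T = unname(T.stat), crit = qf(0.95, p, n - p))   # reject H0: theta = theta0 if T > crit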

Slide 4: However, it is very difficult to calculate this more accurate confidence region!

p = 2: We can determine the more accurate confidence region by standard contouring methods, that is, by evaluating $S(\theta)$ on a grid of $\theta$ values and approximating the contours by straight line segments in the grid (example: see next slide).

$p \ge 3$: There are no contour plots.
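A sketch of this contouring recipe for p = 2, reusing dat, fit, S(), S.hat, n and p from the previous sketch; the grid ranges are ad hoc choices:

Vm.grid <- seq(190, 240, length.out = 101)
K.grid  <- seq(0.04, 0.10, length.out = 101)
S.grid  <- outer(Vm.grid, K.grid, Vectorize(S))      # S(theta) evaluated on the grid

levels <- S.hat * (1 + p / (n - p) * qf(c(0.80, 0.95), p, n - p))
contour(Vm.grid, K.grid, S.grid, levels = levels, drawlabels = FALSE,
        xlab = "Vm (theta1)", ylab = "K (theta2)")   # 80% and 95% likelihood contours
points(coef(fit)["Vm"], coef(fit)["K"], pch = 3)     # least-squares estimate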

Slide 5: Likelihood Contour Lines

Nominal 80% and 95% likelihood contour lines and confidence ellipsoids based on the Wald-type asymptotic approximation; + indicates the least-squares estimate. The two solutions agree satisfactorily in the Puromycin example (left) but clearly disagree in the Biochemical Oxygen Demand example (right). (Axes: $\theta_1$ and $\theta_2$.)

Slide 6: F-Test for a Single Parameter: $\theta_k = \theta_k^*$

- Such a null hypothesis makes no statement about the other parameters.
- The other parameters are fitted to the data by least squares.
- The resulting minimum is called $\widetilde{S}_k$; it depends on $\theta_k^*$, hence $\widetilde{S}_k = \widetilde{S}_k(\theta_k^*)$.

The F-test statistic for the test $\theta_k = \theta_k^*$ is

    \widetilde{T}_k = (n-p)\, \frac{\widetilde{S}_k(\theta_k^*) - S(\hat{\theta})}{S(\hat{\theta})} .

It is approximately $F_{1,\,n-p}$ distributed. In linear regression this F-test is equivalent to the t-test, since the F-test statistic is proportional to the square of the t-test statistic. In nonlinear regression this F-test is not equivalent to the asymptotic Wald-type t-test.

Slide 7: A More Accurate t-Test

Based on the previous result we can construct a t-type test which is more accurate than the one introduced initially: take the square root of the F-test statistic and attach the sign of $\hat{\theta}_k - \theta_k^*$,

    T_k(\theta_k^*) := \mathrm{sign}\left( \hat{\theta}_k - \theta_k^* \right) \frac{\sqrt{\widetilde{S}_k(\theta_k^*) - S(\hat{\theta})}}{\hat{\sigma}} , \qquad \hat{\sigma}^2 = \frac{S(\hat{\theta})}{n-p} .

This test statistic is approximately $t_{n-p}$ distributed. (In linear regression this test statistic is equivalent to the usual t-test statistic.)
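The following sketch computes both the single-parameter F statistic and the signed-root t statistic for the running Puromycin example; it assumes dat and fit from the earlier sketch, and K0 is an arbitrary null value. Fixing the parameter is done by keeping K0 out of the start vector, so that only Vm is re-estimated:

K0   <- 0.08                                           # hypothesized value theta_k* (illustration only)
fit0 <- nls(rate ~ Vm * conc / (K0 + conc), data = dat,
            start = c(Vm = 200))                       # K fixed at K0, Vm refitted by least squares
S.hat     <- sum(resid(fit)^2)                         # S(theta.hat)
S.tilde   <- sum(resid(fit0)^2)                        # S~_k(K0)
n <- nrow(dat); p <- 2
sigma.hat <- sqrt(S.hat / (n - p))

T.F <- (n - p) * (S.tilde - S.hat) / S.hat                                    # approx. F(1, n-p)
T.t <- sign(unname(coef(fit)["K"]) - K0) * sqrt(S.tilde - S.hat) / sigma.hat  # approx. t(n-p)
c(F = T.F, t = T.t, p.value = 2 * pt(-abs(T.t), n - p))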

Slide 8: 2.2 Profile t Plot and Profile Traces

Based on the test statistic just introduced, a graphical tool called the profile t plot can be designed for assessing the quality of the linear approximation: we plot the test statistic $T_k(\theta_k)$ as a function of $\theta_k$; this function is called the profile t function. In linear regression the profile t function is a straight line; in nonlinear regression it can be any monotone increasing function.

Profile t plot: plot $T_k(\theta_k)$ versus

    \delta_k(\theta_k) := \frac{\theta_k - \hat{\theta}_k}{\mathrm{se}(\hat{\theta}_k)} .

The more curved the profile t function, the stronger the nonlinearity in a neighbourhood of $\hat{\theta}_k$. Hence the profile t plot shows how accurate the linear approximation underlying the standard test and the standard confidence interval is. The neighbourhood that matters for statistical inference is given by $|\delta_k(\theta_k)| \le 2.5$. Why?
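In R, profile() and its plot method produce such plots directly. A sketch, again for the Puromycin fit fit from above; note that plot.profile.nls puts $\theta_k$ itself, not $\delta_k$, on the x-axis, and absVal = FALSE requests the signed profile t function as on the slides:

pr <- profile(fit)           # profiles T_k(theta_k) for each parameter
plot(pr, absVal = FALSE)     # profile t plots, signed as on the slides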

Slide 9: Example: Profile t Plots

Profile t plot for $\theta_1$ for the Puromycin data (left) and the Biochemical Oxygen Demand data (right). (Axes: $T_1(\theta_1)$, with the corresponding confidence level on the right axis, versus $\delta(\theta_1)$.)

Slide 10: Example: Cellulose Membrane (5) - Profile t Plots

Profile t plots for the four parameters $\theta_1, \ldots, \theta_4$. (Axes in each panel: $T_k(\theta_k)$, with the corresponding confidence level on the right axis, versus $\delta(\theta_k)$.)

Slide 11: Example: Cellulose Membrane (6)

Wald-type CI versus profile-type CI.

R output (Wald-type): the summary table with Value, Std. Error and t value for $\theta_1, \ldots, \theta_4$ and the residual standard error on 35 degrees of freedom.

R output (profile-type):
> confint(mem.fit)
Waiting for profiling to be done...
returning the 2.5% and 97.5% limits for $\theta_1, \ldots, \theta_4$.

The approximate 95% Wald-type confidence intervals $\hat{\theta}_k \pm \mathrm{se}(\hat{\theta}_k)\, q^{t_{n-p}}_{0.975}$ are listed next to the profile-based intervals for $\theta_1, \ldots, \theta_4$.
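A sketch of the same Wald-versus-profile comparison for the running Puromycin fit fit (the membrane fit mem.fit itself is not reproduced here); the Wald intervals are built from the summary table, the profile intervals come from confint():

est <- summary(fit)$coefficients                   # Value, Std. Error, t value
q   <- qt(0.975, df.residual(fit))
wald <- cbind(lower = est[, "Estimate"] - q * est[, "Std. Error"],
              upper = est[, "Estimate"] + q * est[, "Std. Error"])
wald                                               # Wald-type (linear approximation) intervals
confint(fit)                                       # profile-type intervals ("Waiting for profiling...")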

Slide 12: Likelihood Profile Traces

Likelihood profile traces are another useful tool. The parameter $\theta_j$ ($j \ne k$), estimated with $\theta_k$ held fixed at $\theta_k^*$, is considered as a function of $\theta_k^*$; hence the notation $\hat{\theta}_j^{(k)}(\theta_k^*)$.

Remember:

    \min_{\{\theta_h,\, h \ne k\}} S(\theta_1, \ldots, \theta_k^*, \ldots, \theta_p) = S\big(\hat{\theta}_1^{(k)}, \ldots, \hat{\theta}_{k-1}^{(k)}, \theta_k^*, \hat{\theta}_{k+1}^{(k)}, \ldots, \hat{\theta}_p^{(k)}\big) =: \widetilde{S}_k(\theta_k^*) .

Plot the profile trace $\hat{\theta}_j^{(k)}$ versus $\theta_k$, overlaid by the profile trace $\hat{\theta}_k^{(j)}$ versus $\theta_j$ but reflected at the 45° line; that is (y-coordinate vs x-coordinate): $\hat{\theta}_j^{(k)}$ vs $\theta_k$, overlaid by $\theta_j$ vs $\hat{\theta}_k^{(j)}$.
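For a two-parameter model the traces can be read off the profile object directly. A sketch in the (K, Vm) plane, assuming fit and pr from the earlier Puromycin sketches:

tr.K  <- pr$K$par.vals        # K fixed on a grid, Vm re-estimated: trace Vm-hat^(K)
tr.Vm <- pr$Vm$par.vals       # Vm fixed on a grid, K re-estimated: trace K-hat^(Vm)
plot(tr.K[, "K"], tr.K[, "Vm"], type = "l", col = "red",
     xlim = range(tr.K[, "K"], tr.Vm[, "K"]),
     ylim = range(tr.K[, "Vm"], tr.Vm[, "Vm"]),
     xlab = "K", ylab = "Vm")
lines(tr.Vm[, "K"], tr.Vm[, "Vm"], col = "darkgreen")   # second trace, reflected view
points(coef(fit)["K"], coef(fit)["Vm"], pch = 3)        # least-squares estimate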

Slide 13: Examples of Likelihood Profile Traces

Likelihood profile traces for the Puromycin example (left) and the Biochemical Oxygen Demand example (right), complemented by the 80% and 95% confidence regions (gray curves). (Axes: $\theta_1$ and $\theta_2$.)

Slide 14: Properties of Likelihood Profile Traces

In linear regression:
- The profile traces are two straight lines.
- The angle between these two lines represents the correlation between the corresponding estimated parameters: if the correlation is 0, the lines are orthogonal to each other; if the correlation is +1 or -1, the lines coincide.

In nonlinear regression:
- Both traces may be curved. The more strongly the traces deviate from straight lines, the poorer the linear approximation and the inference based on it.
- The angle between the two traces at their intersection still represents the correlation between the two estimated parameters $\hat{\theta}_j$ and $\hat{\theta}_k$.

Slide 15: Example Cellulose Membrane (7): Profile t Plot and Profile Traces

Profile t plots and profile traces for $\theta_1, \ldots, \theta_4$ in a pairs-style display. Traces in the bottom-left panel ($\theta_1$ versus $\theta_4$): red: $\hat{\theta}_4^{(1)}$ vs $\theta_1$; green: $\theta_4$ vs $\hat{\theta}_1^{(4)}$.

Slide 16: 2.3 Parameter Transformations

In this section we study the effects of transforming the parameters. This topic rests on the fact that the mean regression function can usually be written in mathematically equivalent forms. For example, the two expressions of the Michaelis-Menten function

    \frac{\theta_1 x}{\theta_2 + x} = \frac{x}{\vartheta_1 + \vartheta_2 x}

are equivalent, with $\vartheta_1 = \theta_2 / \theta_1$ and $\vartheta_2 = 1 / \theta_1$. Or we have the two equivalent expressions

    \theta_1 e^{\theta_2 x} = \vartheta_1 \vartheta_2^{\,x} ,

hence $\vartheta_1 = \theta_1$ and $\vartheta_2 = e^{\theta_2}$.
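A quick numerical check of the first equivalence on the Puromycin fit (assuming dat and fit from above); the starting values of the second parametrization are derived from the first via the relations $\vartheta_1 = \theta_2/\theta_1$ and $\vartheta_2 = 1/\theta_1$:

v.start <- c(v1 = unname(coef(fit)["K"] / coef(fit)["Vm"]),   # vartheta1 = theta2/theta1
             v2 = unname(1 / coef(fit)["Vm"]))                # vartheta2 = 1/theta1
fit.v <- nls(rate ~ conc / (v1 + v2 * conc), data = dat, start = v.start)
c(deviance(fit), deviance(fit.v))    # same minimal S(theta): the parametrizations are equivalent
coef(fit.v)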

Slide 17: Motivation

The parameters of the regression function are transformed to
- get rid of collinearities,
- improve the convergence of the algorithm,
- improve the linear approximation (e.g., the Wald-type asymptotics), which results in nicer profile traces and hence in a better quality of the Wald-type confidence intervals.

Parameter transformations change neither the deterministic nor the stochastic part of the regression model, in contrast to variable transformations.

Slide 18: Constraints on the Parameter Domain

Subject-matter theory: the parameter domain is subject to constraints, e.g., $\theta_1 > 0$, $a < \theta_2 \le b$.

What to do? Ignore the constraints and observe whether the algorithm converges, and to what values. If it fails: most such constraints can be imposed by a suitable transformation of the parameter concerned.

Slide 19: Examples of Constraints

$\theta > 0$: transformation $\theta \to \phi = \log(\theta)$, so $\theta = \exp(\phi) > 0$ for all $\phi$; $h(x; \theta) \to h(x; e^{\phi})$.

$a < \theta < b$: transformation $\theta \to \phi = \log\big(\frac{b - \theta}{\theta - a}\big)$, so $\theta = a + \frac{b - a}{1 + \exp(\phi)}$.

Let $h(x; \theta) = \theta_1 e^{-\theta_2 x} + \theta_3 e^{-\theta_4 x}$ with $\theta_2, \theta_4 > 0$. The two pairs of parameters $(\theta_1, \theta_2)$ and $(\theta_3, \theta_4)$ are exchangeable and may thus cause convergence problems. Workaround: impose the constraint $\theta_2 < \theta_4$, via the transformation $\theta \to \phi$ with $\theta_1 = \phi_1$, $\theta_2 = e^{\phi_2}$, $\theta_3 = \phi_3$ and $\theta_4 = e^{\phi_2}(1 + e^{\phi_4})$:

    h\big(x; (\phi_1, \phi_2, \phi_3, \phi_4)^T\big) = \phi_1 \exp\big(-e^{\phi_2} x\big) + \phi_3 \exp\big(-e^{\phi_2}(1 + e^{\phi_4})\, x\big) .
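A sketch of the first transformation for the Puromycin example (assuming dat from above): the constraint K > 0 is enforced by estimating lK = log(K) instead of K:

fit.tr <- nls(rate ~ Vm * conc / (exp(lK) + conc), data = dat,
              start = c(Vm = 200, lK = log(0.1)))   # lK is unconstrained, K = exp(lK) > 0 always
coef(fit.tr)
exp(coef(fit.tr)["lK"])                             # back-transformed estimate of K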

Slide 20: Collinearity

Example to show the problem: let $h(x; \theta) = \theta_1 e^{\theta_2 x}$ (*). The partial derivatives (columns of the matrix A) are

    \frac{\partial h(x;\theta)}{\partial \theta_1} = e^{\theta_2 x}, hence a_1^T := (e^{\theta_2 x_1}, \ldots, e^{\theta_2 x_n}),
    \frac{\partial h(x;\theta)}{\partial \theta_2} = \theta_1 x\, e^{\theta_2 x}, hence a_2^T := (\theta_1 x_1 e^{\theta_2 x_1}, \ldots, \theta_1 x_n e^{\theta_2 x_n}).

The vectors $a_1$ and $a_2$ tend towards collinearity if all $x_i > 0$.

Reformulate (*): $h(x; \theta) = \theta_1 \exp\big(\theta_2 ((x - x_0) + x_0)\big)$. Applying the reparametrization $\phi_1 := \theta_1 e^{\theta_2 x_0}$ and $\phi_2 := \theta_2$, we obtain

    h(x; \phi) = \phi_1 \exp\big(\phi_2 (x - x_0)\big) .

This function results in an (approximately) optimal matrix A if $x_0 = \bar{x}$ is chosen.
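A sketch with simulated data (all numbers here are made up for illustration) showing how centring at x0 = mean(x) reduces the correlation between the estimates:

set.seed(1)
x <- seq(1, 5, length.out = 30)
d <- data.frame(x = x, y = 2 * exp(0.4 * x) + rnorm(30, sd = 0.5))

f1 <- nls(y ~ t1 * exp(t2 * x), data = d, start = c(t1 = 1, t2 = 0.5))
x0 <- mean(d$x)
f2 <- nls(y ~ p1 * exp(p2 * (x - x0)), data = d,
          start = c(p1 = unname(coef(f1)["t1"] * exp(coef(f1)["t2"] * x0)),  # phi1 = theta1*exp(theta2*x0)
                    p2 = unname(coef(f1)["t2"])))                            # phi2 = theta2

summary(f1, correlation = TRUE)$correlation   # theta1, theta2: strongly correlated
summary(f2, correlation = TRUE)$correlation   # phi1, phi2: much less correlated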

Slide 21: Example Cellulose Membrane (7): Profile t Plot and Profile Traces

(Slide from Half-Day 2.) $\theta_3$ and $\theta_4$ are highly correlated; the profile traces involving $\theta_2$, $\theta_3$ and $\theta_4$ are clearly twisted.

Slide 22: Example Cellulose Membrane (8)

Regression function

    h(x_i; \theta) = \frac{\theta_1 + \theta_2\, 10^{\theta_3 + \theta_4 ((x_i - \bar{x}) + \bar{x})}}{1 + 10^{\theta_3 + \theta_4 ((x_i - \bar{x}) + \bar{x})}} .

Remove the collinearity by introducing $\widetilde{\theta}_3 := \theta_3 + \theta_4 \bar{x}$, where $\bar{x} = \mathrm{median}(x_i)$:

    h(x_i; \theta) = \frac{\theta_1 + \theta_2\, 10^{\widetilde{\theta}_3 + \theta_4 (x_i - \bar{x})}}{1 + 10^{\widetilde{\theta}_3 + \theta_4 (x_i - \bar{x})}} .

Improve the linear approximation in two further steps: Step 1 introduces a transformed parameter $\widetilde{\theta}_4$; Step 2 replaces $\theta_1$ and $\theta_2$ by $\widetilde{\theta}_1 := \frac{\theta_1 + \theta_2\, 10^{\widetilde{\theta}_3}}{1 + 10^{\widetilde{\theta}_3}}$ (the value of $h$ at $x = \bar{x}$) and a $\log_{10}$-transformed parameter $\widetilde{\theta}_2$.

Slide 23: Example Cellulose Membrane (9)

Profile t functions and profile traces for the transformed parameters after the reparametrization.

Slide 24: Example Cellulose Membrane (10)

Original parametrization: R summary with Value, Std. Error and t value for $\theta_1, \ldots, \theta_4$, the residual standard error on 35 degrees of freedom, and the correlation matrix of the parameter estimates.

Reparametrized: the same summary for the transformed parameters $\widetilde{\theta}_1, \ldots, \widetilde{\theta}_4$, with its residual standard error on 35 degrees of freedom and the correlation matrix of the parameter estimates.

Slide 25: Successful Reparametrization

A successful reparametrization depends both on the regression function and on the data set. There are no general guidelines, which results in a tedious search for successful reparametrizations.

Another example:

    h(x; \theta) = \frac{\theta_1 \theta_3 (x^{(2)} - x^{(3)})}{1 + \theta_2 x^{(1)} + \theta_3 x^{(2)} + \theta_4 x^{(3)}}   (*)

               = \frac{x^{(2)} - x^{(3)}}{\frac{1}{\theta_1 \theta_3} + \frac{\theta_2}{\theta_1 \theta_3} x^{(1)} + \frac{\theta_3}{\theta_1 \theta_3} x^{(2)} + \frac{\theta_4}{\theta_1 \theta_3} x^{(3)}}

               = \frac{x^{(2)} - x^{(3)}}{\phi_1 + \phi_2 x^{(1)} + \phi_3 x^{(2)} + \phi_4 x^{(3)}}   (**)

The parametrization (**) is preferred to (*) in most cases (cf. exercises).

Slide 26: Interpretation?

In most cases the original parameters have a physical interpretation, so the parameters must be back-transformed.

Standard approach for back-transformation. Example: the parameter transformation used was $\theta \to \phi = \ln(\theta)$. Let $\hat{\phi}$ and $\hat{\sigma}_{\hat{\phi}}$ be the estimate and its standard error. Estimate $\theta$ by $\hat{\theta} = \exp(\hat{\phi})$. Its standard error is commonly obtained by Gauss' law of error propagation (cf. Stahel, Sec. 6.1):

    \hat{\sigma}^2_{\hat{\theta}} \approx \left( \left. \frac{\partial \exp(\phi)}{\partial \phi} \right|_{\phi = \hat{\phi}} \right)^2 \hat{\sigma}^2_{\hat{\phi}} = \big( \exp(\hat{\phi}) \big)^2\, \hat{\sigma}^2_{\hat{\phi}} .

Hence an approximate 95% confidence interval for $\theta$ is

    \exp(\hat{\phi}) \pm \hat{\sigma}_{\hat{\theta}}\, q^{t_{n-p}}_{0.975} = \exp(\hat{\phi}) \left( 1 \pm \hat{\sigma}_{\hat{\phi}}\, q^{t_{n-p}}_{0.975} \right) .   (*)

But this approach is not recommended because... (see next slide).

Slide 27: Why Parameter Transformation?

- So that the parameter falls within a predefined domain: confidence intervals according to (*) may violate this requirement!
- Because of the insufficient quality of the confidence interval: Gauss' law of error propagation nullifies the achievements of the reparametrization, since it uses the same linear approximation as the Wald-type asymptotics!

Alternatives to the standard approach: back-transformation of the complete confidence interval. Example:

    \left\{ \theta : \ln(\theta) \in \big[\, \hat{\phi} \pm \hat{\sigma}_{\hat{\phi}}\, q^{t_{df}}_{0.975} \,\big] \right\}

forms a better, but still approximate, 95% confidence interval for $\theta$. It is identical to

    \left[ \exp\big( \hat{\phi} - \hat{\sigma}_{\hat{\phi}}\, q^{t_{df}}_{0.975} \big),\; \exp\big( \hat{\phi} + \hat{\sigma}_{\hat{\phi}}\, q^{t_{df}}_{0.975} \big) \right] ,

since ln/exp is strictly increasing. In the second case, the most convenient approach is to form the confidence interval based on the profile t function.
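A sketch comparing the three intervals for K in the log-parametrized Puromycin fit fit.tr from the constraints example above (phi corresponds to lK = log(K)):

phi <- unname(coef(fit.tr)["lK"])
se  <- summary(fit.tr)$coefficients["lK", "Std. Error"]
q   <- qt(0.975, df.residual(fit.tr))

exp(phi) * (1 + c(-1, 1) * q * se)   # error propagation, interval (*): not recommended
exp(phi + c(-1, 1) * q * se)         # back-transformed Wald interval for K
exp(confint(fit.tr)["lK", ])         # back-transformed profile interval (most accurate)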

Slide 28: Take-Home Message, Half-Day 2

- The commonly used confidence intervals are based on a (crude) linear approximation.
- Use graphical tools like profile t plots and profile traces to assess the quality of the approximate confidence intervals (and hence of the linear approximation).
- If it is insufficient: more accurate confidence intervals for single parameters $\theta_k$ can be calculated using profile t functions (as implemented in confint() anyway).
- The convergence properties of the estimation algorithm and the quality of the Wald-type confidence intervals can be improved by applying suitable reparametrizations (parameter transformations).
- If the interpretation of the original parameters is crucial, then the confidence interval should also be back-transformed and not be determined by Gauss' law of error propagation.
