Zurich University of Applied Sciences, School of Engineering
IDP Institute of Data Analysis and Process Design

Nonlinear Regression: A Powerful Tool With Considerable Complexity
Half-Day 2: Improved Inference and Visualisation

Andreas Ruckstuhl
Institut für Datenanalyse und Prozessdesign, Zürcher Hochschule für Angewandte Wissenschaften
Nonlinear Regression: Half-Day 2

Outline

Half-Day 1: Estimation and Standard Inference
- The Nonlinear Regression Model
- Iterative Estimation - Model Fitting
- Inference Based on Linear Approximations

Half-Day 2: Improved Inference and Visualisation
- Likelihood Based Inference
- Profile t Plot and Profile Traces
- Parameter Transformations

Half-Day 3: Bootstrap, Prediction and Calibration
- Bootstrap
- Prediction
- Calibration
- Outlook
2.1 Likelihood Based Inference

F-test for the whole parameter vector θ:
\[
T = \frac{n-p}{p}\,\frac{S(\theta) - S(\widehat{\theta})}{S(\widehat{\theta})}
\;\overset{a}{\sim}\; F_{p,\,n-p} .
\]
It is as in linear regression, where, however, the result holds exactly. The resulting confidence region is
\[
\Big\{\, \theta : S(\theta) \le S(\widehat{\theta})
\Big( 1 + \frac{p}{n-p}\, q^{F_{p,n-p}}_{1-\alpha} \Big) \Big\} .
\]
- In linear regression, this confidence region is identical to the confidence region based on the multivariate normal distribution of \(\widehat{\beta}\).
- In nonlinear regression, this confidence region is more accurate than the one based on the approximate multivariate normal distribution of \(\widehat{\theta}\). Cf. the discussion of the deviance test and the t-test in GLMs.
However, this more accurate confidence region is much harder to compute!

- p = 2: We can determine the more accurate confidence region by standard contouring methods, that is, by evaluating S(θ) over a grid of θ values and approximating the contours by straight line segments in the grid (example: see next slide).
- p ≥ 3: There are no contour plots.
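For the p = 2 case, the gridding idea can be sketched as follows. This is an illustrative Python/scipy translation, not the course's R code; the BOD-type model y = θ1(1 − exp(−θ2 x)) and the data values are only assumed for demonstration.

```python
# Sketch: joint likelihood-based confidence region for p = 2 by gridding S(theta).
# Model and data are illustrative (a BOD-type curve), not the course's data set.
import numpy as np
from scipy.optimize import least_squares
from scipy.stats import f

x = np.array([1., 2., 3., 4., 5., 7.])
y = np.array([8.3, 10.3, 19.0, 16.0, 15.6, 19.8])

def resid(th):
    return y - th[0] * (1 - np.exp(-th[1] * x))

fit = least_squares(resid, x0=[20., 0.5])
S_hat = float(np.sum(fit.fun ** 2))
n, p = len(y), 2

# Contour level from the F result: S(theta) <= S_hat * (1 + p/(n-p) * q_F)
q = f.ppf(0.95, p, n - p)
level = S_hat * (1 + p / (n - p) * q)

# Evaluate S on a grid; points with S(theta) <= level lie inside the region.
th1_grid = np.linspace(10, 40, 80)
th2_grid = np.linspace(0.1, 1.5, 80)
S = np.array([[np.sum(resid([a, b]) ** 2) for b in th2_grid] for a in th1_grid])
inside = S <= level   # feed S and `level` to a contouring routine for the plot
```

The boolean mask (or the matrix S together with `level`) is what a contouring routine then turns into the straight-line-segment approximation of the region's boundary.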
Likelihood Contour Lines

Nominal 80% and 95% likelihood contour lines and the confidence ellipsoids based on the Wald-type asymptotic approximation; + indicates the least-squares estimate. The two solutions agree satisfactorily in the Puromycin example (left) but disagree clearly in the Biochemical Oxygen Demand example (right).

[Figure: likelihood contours and Wald-type ellipsoids in the (θ1, θ2) plane for both examples; axis-tick residue omitted.]
F-test for a single parameter: θk = θk*

- Such a null hypothesis leaves the other parameters unrestricted.
- The other parameters are fitted to the data by least squares.
- The resulting minimum is called S̃k. It depends on θk, hence S̃k = S̃k(θk).

The F-test statistic for the test θk = θk* is
\[
T_k = (n-p)\, \frac{\widetilde{S}_k(\theta_k^*) - S(\widehat{\theta})}{S(\widehat{\theta})} .
\]
It is approximately \(F_{1,\,n-p}\) distributed.

- In linear regression, this F-test is equivalent to the t-test, since the F-test statistic is proportional to the square of the t-test statistic.
- In nonlinear regression, this F-test is not equivalent to the asymptotic Wald-type t-test.
A More Accurate t-Test

Based on the previous result, we can construct a t-type test that is more accurate than the one introduced initially: take the square root of the F-test statistic and attach the sign of θk − θ̂k,
\[
T_k(\theta_k^*) := \operatorname{sign}\big(\theta_k^* - \widehat{\theta}_k\big)\,
\frac{\sqrt{\widetilde{S}_k(\theta_k^*) - S(\widehat{\theta})}}{\widehat{\sigma}},
\qquad \widehat{\sigma}^2 = \frac{S(\widehat{\theta})}{n-p} .
\]
This test statistic is approximately \(t_{n-p}\) distributed. (In linear regression, this test statistic is equivalent to the usual t-test statistic.)
2.2 Profile t Plot and Profile Traces

Based on the test statistic just introduced, a graphical tool called the profile t plot can be designed for assessing the quality of the linear approximation. We plot the test statistic Tk(θk) as a function of θk: the profile t function.

- In linear regression, the profile t function is a straight line.
- In nonlinear regression, the profile t function can be any monotone increasing function.

Profile t plot: plot Tk(θk) versus
\[
\delta_k(\theta_k) := \frac{\theta_k - \widehat{\theta}_k}{\mathrm{se}(\widehat{\theta}_k)} .
\]

The more curved the profile t function is, the stronger the nonlinearity in a neighbourhood of θ̂k. Hence the profile t plot shows how accurate the linear approximation underlying the standard test and the standard confidence interval is. The neighbourhood important for the statistics is given by |δk(θk)| ≤ 2.5. Why?
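Numerically, the profile t function is obtained by re-fitting the remaining parameters over a grid of fixed θk values. A minimal sketch in Python/scipy (the BOD-type model, data and starting values are assumptions for illustration, not the course's R code):

```python
# Sketch: computing the profile t function T_k(theta_k) numerically.
# Illustrative BOD-type model y = th1*(1 - exp(-th2*x)); not the course data.
import numpy as np
from scipy.optimize import least_squares

x = np.array([1., 2., 3., 4., 5., 7.])
y = np.array([8.3, 10.3, 19.0, 16.0, 15.6, 19.8])

def resid(th):
    return y - th[0] * (1 - np.exp(-th[1] * x))

fit = least_squares(resid, x0=[20., 0.5])
n, p = len(y), 2
S_hat = float(np.sum(fit.fun ** 2))
sigma_hat = np.sqrt(S_hat / (n - p))

def S_tilde_1(theta1):
    """Minimise S over theta2 with theta1 held fixed (profile sum of squares)."""
    r = least_squares(lambda b: resid([theta1, b[0]]), x0=[fit.x[1]])
    return float(np.sum(r.fun ** 2))

def T_1(theta1):
    """Signed square root of the profile F statistic; approximately t_{n-p}."""
    d = max(S_tilde_1(theta1) - S_hat, 0.0)   # guard tiny negative round-off
    return np.sign(theta1 - fit.x[0]) * np.sqrt(d) / sigma_hat

# For the profile t plot, T_1 would be plotted against
# delta_1 = (grid - fit.x[0]) / se(theta1_hat).
grid = fit.x[0] + np.linspace(-5.0, 5.0, 21)
profile = np.array([T_1(t) for t in grid])
```

The function passes through zero at the least-squares estimate and is monotone increasing; its curvature is exactly what the profile t plot visualises.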
Example: Profile t Plots

Profile t plot for θ1 for the Puromycin data (left) and the Biochemical Oxygen Demand data (right).

[Figure: T1(θ1) versus δ(θ1) for both examples, with the 0.80 and 0.99 reference levels marked; axis-tick residue omitted.]
Example: Cellulose membrane (5) - Profile t plots

[Figure: profile t plots Tk(θk) versus δ(θk) for θ1, θ2, θ3, θ4, with the 0.80 and 0.99 reference levels marked; axis-tick residue omitted.]
Example: Cellulose membrane (6)

R output (Wald-type):
Parameters:
      Value     Std. Error   t value
θ1    163.76    .16          197.1
θ2    159.78    .1595        1.3
θ3    .675      .3813        7.
θ4    -.51      .73          -7.8
Residual standard error: .93 on 35 df

R output (profile-type):
> confint(mem.fit)
Waiting for profiling to be done...
      2.5%         97.5%
θ1    163.66197    163.96399
θ2    159.356993   16.95
θ3    1.96575      3.679
θ4    -.688365     -.3797975

Approximate 95% confidence intervals, Wald-type (θ̂k ± se(θ̂k) q^{t35}_{0.975}):
θ1: [163.5, 163.96]   θ2: [159.6, 16.11]   θ3: [1.9, 3.5]   θ4: [-.65, -.37]
Profile-type:
θ1: [163.7, 163.96]   θ2: [159.36, 16.1]   θ3: [1.93, 3.6]   θ4: [-.69, -.38]
Likelihood Profile Traces

Likelihood profile traces are another useful tool. The parameter θj (j ≠ k), estimated with θk held fixed, is considered as a function of θk; hence the notation \(\widehat{\theta}_j^{(k)}(\theta_k)\).

Remember:
\[
\min_{\{\theta_h,\, h \ne k\}} S(\theta_1, \ldots, \theta_k, \ldots, \theta_p)
= S\big(\widehat{\theta}_1^{(k)}, \ldots, \widehat{\theta}_{k-1}^{(k)}, \theta_k,
\widehat{\theta}_{k+1}^{(k)}, \ldots, \widehat{\theta}_p^{(k)}\big)
\overset{\text{short}}{=} \widetilde{S}_k(\theta_k) .
\]

Plot the profile trace \(\widehat{\theta}_j^{(k)}(\theta_k)\) versus θk, overlaid by the profile trace \(\widehat{\theta}_k^{(j)}(\theta_j)\) versus θj reflected at the 45° line; that is, with x- and y-coordinates swapped.
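A sketch of how the two traces of a two-parameter model could be computed numerically (Python/scipy for illustration; the BOD-type model and data are assumptions, not the course's R code):

```python
# Sketch: likelihood profile traces for a two-parameter model.
# Illustrative BOD-type model y = th1*(1 - exp(-th2*x)); not the course data.
import numpy as np
from scipy.optimize import least_squares

x = np.array([1., 2., 3., 4., 5., 7.])
y = np.array([8.3, 10.3, 19.0, 16.0, 15.6, 19.8])

def resid(th):
    return y - th[0] * (1 - np.exp(-th[1] * x))

fit = least_squares(resid, x0=[20., 0.5])

def theta2_given_1(theta1):
    """theta2 re-estimated with theta1 held fixed: a point of the trace theta2^(1)."""
    return float(least_squares(lambda b: resid([theta1, b[0]]), x0=[fit.x[1]]).x[0])

def theta1_given_2(theta2):
    """theta1 re-estimated with theta2 held fixed: a point of the trace theta1^(2)."""
    return float(least_squares(lambda b: resid([b[0], theta2]), x0=[fit.x[0]]).x[0])

t1 = fit.x[0] + np.linspace(-4.0, 4.0, 17)
t2 = fit.x[1] + np.linspace(-0.2, 0.2, 17)
trace_a = [(a, theta2_given_1(a)) for a in t1]   # plotted as is
trace_b = [(theta1_given_2(b), b) for b in t2]   # reflected at the 45-degree line
```

Both traces pass through the least-squares estimate, which is where the angle between them reflects the correlation of the two parameter estimates.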
Examples of Likelihood Profile Traces

Likelihood profile traces for the Puromycin example (left) and the Biochemical Oxygen Demand example (right), complemented by the 80% and 95% confidence regions (gray curves).

[Figure: traces in the (θ1, θ2) plane for both examples; axis-tick residue omitted.]
Properties of Likelihood Profile Traces

In linear regression:
- The profile traces are two straight lines.
- The angle between these two lines reflects the correlation between the corresponding estimated parameters:
  - If the correlation between the parameters is 0, the lines are orthogonal to each other.
  - If the correlation between the parameters is either 1 or -1, the lines coincide.

In nonlinear regression:
- Both traces may be curved. The more strongly the traces deviate from straight lines, the poorer the linear approximation and the inference based on it.
- The angle between the two traces at their intersection still reflects the correlation between the two estimated parameters θ̂j and θ̂k.
Example Cellulose Membrane (7)

Profile t plots and profile traces for θ1, θ2, θ3, θ4.

Traces in the bottom left panel, for example: red: \(\widehat{\theta}_4^{(1)}\) versus θ1; green: θ4 versus \(\widehat{\theta}_1^{(4)}\) (reflected).

[Figure: 4×4 matrix of profile t plots (diagonal) and profile traces (off-diagonal panels); axis-tick residue omitted.]
2.3 Parameter Transformations

In this section we study the effects of transforming the parameters. The topic rests on the fact that the mean regression function can usually be written in mathematically equivalent forms. For example, the two expressions for the Michaelis-Menten function
\[
\frac{\theta_1 x}{\theta_2 + x} = \frac{x}{\vartheta_1 + \vartheta_2 x}
\]
are equivalent, with
\[
\vartheta_1 = \frac{\theta_2}{\theta_1} \quad \text{and} \quad \vartheta_2 = \frac{1}{\theta_1} .
\]
Or, we have the two equivalent expressions
\[
\theta_1 e^{\theta_2 x} = \vartheta_1 \vartheta_2^{\,x},
\qquad \text{hence } \vartheta_1 = \theta_1 \text{ and } \vartheta_2 = e^{\theta_2} .
\]
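The equivalence of such parametrizations is easy to check numerically. A small Python sketch (the parameter values are arbitrary illustrations):

```python
# Sketch: numerically verifying that reparametrized forms of the mean function
# are mathematically equivalent. Parameter values are arbitrary illustrations.
import numpy as np

x = np.linspace(0.1, 5.0, 50)

# Michaelis-Menten: theta1*x/(theta2 + x) == x/(v1 + v2*x)
th1, th2 = 3.0, 1.5
v1, v2 = th2 / th1, 1.0 / th1
mm_equal = np.allclose(th1 * x / (th2 + x), x / (v1 + v2 * x))

# Exponential: theta1*exp(theta2*x) == v1 * v2**x
e1, e2 = 2.0, -0.7
w1, w2 = e1, np.exp(e2)
exp_equal = np.allclose(e1 * np.exp(e2 * x), w1 * w2 ** x)
```

Both flags come out true: the fitted curve is identical in either parametrization, so only the statistical behaviour of the estimates changes.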
Motivation

The parameters of the regression function are transformed to
- get rid of collinearities,
- improve the convergence of the algorithm,
- improve the linear approximation (e.g., the Wald-type asymptotics), which results in "nicer" profile traces and hence in Wald-type confidence intervals of better quality.

Parameter transformations change neither the deterministic nor the stochastic part of the regression model - in contrast to variable transformations!
Constraints on the Parameter Domain

Subject-matter theory: the parameter domain is subject to constraints, e.g., θ1 > 0, a < θ2 < b.

What to do?
- Ignore the constraints and observe whether, and to where, the algorithm converges.
- If it fails: most such constraints can be imposed by a suitable transformation of the parameter concerned.
Examples of Constraints

- θ > 0: Trsf. θ → φ = log(θ); then θ = exp(φ) > 0 for all φ, and h(x; θ) becomes h(x; e^φ).
- a < θ < b: Trsf. θ → φ = log((b − θ)/(θ − a)); then θ = a + (b − a)/(1 + exp(φ)).

Let h(x; θ) = θ1 e^{θ2 x} + θ3 e^{θ4 x} with θ2, θ4 > 0. The two pairs of parameters (θ1, θ2) and (θ3, θ4) are exchangeable and may thus cause convergence problems. Workaround: impose the constraint θ2 < θ4!
Trsf. θ → φ with θ1 = φ1, θ2 = e^{φ2}, θ3 = φ3, and θ4 = e^{φ2}(1 + e^{φ4}):
\[
h\big(x; (\theta_1, \phi_2, \theta_3, \phi_4)^T\big)
= \theta_1 \exp\big(e^{\phi_2} x\big)
+ \theta_3 \exp\big(e^{\phi_2}(1 + e^{\phi_4})\, x\big)
\]
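A quick numeric check of the two constraint-removing transformations above (Python sketch; the values of a, b and θ are arbitrary):

```python
# Sketch: transformations that impose theta > 0 and a < theta < b by construction.
# The chosen values of a, b and the theta grid are arbitrary illustrations.
import numpy as np

# theta > 0 via theta = exp(phi): positive for every real phi
phi = np.linspace(-5.0, 5.0, 101)
all_positive = bool(np.all(np.exp(phi) > 0))

# a < theta < b via phi = log((b - theta)/(theta - a)),
# inverted by theta = a + (b - a)/(1 + exp(phi))
a, b = 2.0, 7.0
theta = np.linspace(2.1, 6.9, 49)
phi_ab = np.log((b - theta) / (theta - a))
back = a + (b - a) / (1.0 + np.exp(phi_ab))
```

The round trip recovers θ exactly, and the back-transformed values stay strictly inside (a, b) no matter what value φ takes during the optimisation.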
Collinearity

An example to show the problem: let h(x; θ) = θ1 e^{θ2 x}  (*). The partial derivatives (→ matrix A) are
\[
\frac{\partial h(x;\theta)}{\partial \theta_1} = e^{\theta_2 x},
\qquad
\frac{\partial h(x;\theta)}{\partial \theta_2} = \theta_1\, x\, e^{\theta_2 x} .
\]
Hence
\[
a_1^T := \big(e^{\theta_2 x_1}, \ldots, e^{\theta_2 x_n}\big),
\qquad
a_2^T := \big(\theta_1 x_1 e^{\theta_2 x_1}, \ldots, \theta_1 x_n e^{\theta_2 x_n}\big) .
\]
The vectors a1 and a2 incline to collinearity if all xi > 0.

Reformulate (*): h(x; θ) = θ1 exp(θ2 ((x − x0) + x0)). Applying the reparametrization φ1 := θ1 e^{θ2 x0} and φ2 := θ2, we obtain
\[
h(x; \phi) = \phi_1 \exp\big(\phi_2 (x - x_0)\big) .
\]
This function results in an (approximately) optimal matrix A if x0 = x̄ is chosen.
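The collinearity and its removal can be illustrated via the angle between the two derivative columns. A Python sketch with the model from this slide (the numeric values of θ and the x grid are arbitrary assumptions):

```python
# Sketch: collinearity of the derivative vectors a1, a2 for h(x) = th1*exp(th2*x),
# and its reduction after centring at x0 = mean(x). Values are illustrative.
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two column vectors of A."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

x = np.linspace(1.0, 10.0, 20)   # all x > 0, as on the slide
th1, th2 = 2.0, 0.3

# Original parametrization: a1 = dh/dth1, a2 = dh/dth2
a1 = np.exp(th2 * x)
a2 = th1 * x * np.exp(th2 * x)
cos_orig = cosine(a1, a2)        # close to 1: nearly collinear columns

# Centred parametrization: h = phi1*exp(phi2*(x - x0)) with x0 = mean(x)
x0 = x.mean()
phi1, phi2 = th1 * np.exp(th2 * x0), th2
b1 = np.exp(phi2 * (x - x0))
b2 = phi1 * (x - x0) * np.exp(phi2 * (x - x0))
cos_cent = cosine(b1, b2)        # noticeably smaller: columns better separated
```

The second column now changes sign over the design points, which is what pulls the angle between the columns away from zero.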
Example Cellulose Membrane (7)

Profile t plots and profile traces (slide shown earlier).

- θ3 and θ4 are highly correlated.
- The profile traces of θ2 and θ3, as well as those of θ2 and θ4, are clearly twisted.

[Figure: 4×4 matrix of profile t plots and profile traces for θ1, θ2, θ3, θ4; axis-tick residue omitted.]
Example Cellulose Membrane (8)

Regression function
\[
h(x; \theta)
= \frac{\theta_1 + \theta_2\, 10^{\theta_3 + \theta_4 ((x - x_0) + x_0)}}
       {1 + 10^{\theta_3 + \theta_4 ((x - x_0) + x_0)}}
\]
Remove the collinearity by introducing \(\tilde\theta_3 := \theta_3 + \theta_4 x_0\), where x0 = median(xi):
\[
h(x; \theta)
= \frac{\theta_1 + \theta_2\, 10^{\tilde\theta_3 + \theta_4 (x - x_0)}}
       {1 + 10^{\tilde\theta_3 + \theta_4 (x - x_0)}}
\]
Improve the linear approximation:
Step 1: introduce \(\tilde\theta_4 := 10^{\theta_4}\):
\[
h(x; \theta)
= \frac{\theta_1 + \theta_2\, 10^{\tilde\theta_3}\, \tilde\theta_4^{\,x - x_0}}
       {1 + 10^{\tilde\theta_3}\, \tilde\theta_4^{\,x - x_0}}
\]
Step 2: introduce
\[
\tilde\theta_1 := \frac{\theta_1 + \theta_2\, 10^{\tilde\theta_3}}{10^{\tilde\theta_3} + 1},
\qquad
\tilde\theta_2 := \log_{10}\Big( \frac{(\theta_1 - \theta_2)\, 10^{\tilde\theta_3}}{10^{\tilde\theta_3} + 1} \Big) :
\]
\[
h(x; \tilde\theta)
= \tilde\theta_1 + 10^{\tilde\theta_2}\,
  \frac{1 - \tilde\theta_4^{\,x - x_0}}{1 + 10^{\tilde\theta_3}\, \tilde\theta_4^{\,x - x_0}}
\]
Example Cellulose Membrane (9)

Profile t functions and profile traces after the reparametrization.

[Figure: 4×4 matrix of profile t plots and profile traces for θ̃1, θ̃2, θ̃3, θ̃4; axis-tick residue omitted.]
Example Cellulose Membrane (10)

Original parametrization:
Parameters:
      Value     Std. Error   t value
θ1    163.76    .16          197.1
θ2    159.785   .159         1.3
θ3    .675      .3813        7.
θ4    -.51      .73          -7.8
Residual standard error: .93137 on 35 df
Correlation of parameter estimates:
      θ1     θ2     θ3
θ2    -.56
θ3    -.3    .771
θ4    .515   -.78   -.989

Reparametrized:
Parameters:
      Value     Std. Error   t value
θ̃1    161.61    .739         187.1
θ̃2    .33       .313         1.3
θ̃3    .6        .595         1.8
θ̃4    .377      .98          6.18
Residual standard error: .931 on 35 df
Correlation of parameter estimates:
      θ̃1      θ̃2     θ̃3
θ̃2    -.561
θ̃3    -.766   .61
θ̃4    .151    .35    -.31
Successful Reparametrization

- A successful reparametrization depends both on the regression function and on the data set.
- There are no general guidelines, which results in a tedious search for successful reparametrizations.

Another example:
\[
h(x; \theta)
= \frac{\theta_1 \theta_3\, (x^{(2)} - x^{(3)})}
       {1 + \theta_2 x^{(1)} + \theta_3 x^{(2)} + \theta_4 x^{(3)}} \quad (*)
\]
\[
= \frac{x^{(2)} - x^{(3)}}
       {\frac{1}{\theta_1\theta_3} + \frac{\theta_2}{\theta_1\theta_3} x^{(1)}
        + \frac{\theta_3}{\theta_1\theta_3} x^{(2)} + \frac{\theta_4}{\theta_1\theta_3} x^{(3)}}
= \frac{x^{(2)} - x^{(3)}}
       {\phi_1 + \phi_2 x^{(1)} + \phi_3 x^{(2)} + \phi_4 x^{(3)}} \quad (**)
\]
The parametrization (**) is preferred to (*) in most cases (cf. exercises).
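The equivalence of (*) and (**) can again be verified numerically. A Python sketch (parameter values and regressor values are arbitrary illustrations):

```python
# Sketch: checking that (**) is the same mean function as (*), with
# phi = (1, theta2, theta3, theta4) / (theta1*theta3). Values are illustrative.
import numpy as np

rng = np.random.default_rng(1)
x1, x2, x3 = rng.uniform(0.5, 2.0, size=(3, 40))   # three regressors

th1, th2, th3, th4 = 2.0, 0.8, 1.5, 0.6
h_star = th1 * th3 * (x2 - x3) / (1 + th2 * x1 + th3 * x2 + th4 * x3)

c = th1 * th3
phi1, phi2, phi3, phi4 = 1 / c, th2 / c, th3 / c, th4 / c
h_star2 = (x2 - x3) / (phi1 + phi2 * x1 + phi3 * x2 + phi4 * x3)
```

Both parametrizations give identical fitted values; (**) is simply a rational-linear form whose denominator coefficients enter more nearly linearly.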
Interpretation?

In most cases, the original parameters have a physical interpretation → the parameters must be back-transformed.

Standard approach for the back-transformation, by example: suppose the parameter transformation θ → φ = ln(θ) was used. Let φ̂ and σ̂_φ̂ be the estimated parameter and its standard error. Estimate θ by θ̂ = exp(φ̂). Its standard error is commonly obtained by the Gaussian law of error propagation (cf. Stahel, Sec. 6.1):
\[
\widehat{\sigma}_{\widehat{\theta}}
\approx \frac{\partial \exp(\phi)}{\partial \phi}\Big|_{\phi=\widehat{\phi}}\,
\widehat{\sigma}_{\widehat{\phi}}
= \exp(\widehat{\phi})\, \widehat{\sigma}_{\widehat{\phi}} .
\]
Hence an approximate 95% confidence interval for θ is
\[
\exp(\widehat{\phi}) \pm \widehat{\sigma}_{\widehat{\theta}}\, q^{t_{n-p}}_{0.975}
= \exp(\widehat{\phi})\, \big( 1 \pm \widehat{\sigma}_{\widehat{\phi}}\, q^{t_{n-p}}_{0.975} \big) . \quad (*)
\]
But this approach is not recommended, because... (see next slide).
Why Parameter Transformation?

1. So that the parameter falls within a predefined domain: confidence intervals according to (*) may violate this requirement!
2. Because of the insufficient quality of the confidence interval: the Gaussian law of error propagation nullifies the achievements of the reparametrization, since it uses the same linear approximation as the Wald-type asymptotics!

Alternatives to the standard approach:
- Back-transform the complete confidence interval. Example:
\[
\big\{\, \theta : \ln(\theta) \in \widehat{\phi} \pm \widehat{\sigma}_{\widehat{\phi}}\, q^{t_{df}}_{0.975} \,\big\}
\]
forms a better, but still approximate, 95% confidence interval for θ. It is identical to
\[
\big[ \exp\big(\widehat{\phi} - \widehat{\sigma}_{\widehat{\phi}}\, q^{t_{df}}_{0.975}\big),\;
\exp\big(\widehat{\phi} + \widehat{\sigma}_{\widehat{\phi}}\, q^{t_{df}}_{0.975}\big) \big] ,
\]
since ln/exp is strictly increasing.
- For the second point, the most convenient approach is to form the confidence interval based on the profile t function.
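A small numeric comparison of the two approaches for θ = exp(φ) (Python sketch; φ̂, its standard error and the degrees of freedom are made-up illustrative values, chosen so that the domain violation actually occurs):

```python
# Sketch: error-propagation CI versus back-transformed CI for theta = exp(phi).
# phi_hat, se_phi and df are made-up numbers for illustration only.
import numpy as np
from scipy.stats import t

phi_hat, se_phi, df = -1.2, 0.6, 35
q = t.ppf(0.975, df)
theta_hat = np.exp(phi_hat)

# Gaussian error propagation: se(theta_hat) ~ exp(phi_hat) * se_phi
half = theta_hat * se_phi * q
ci_gauss = (theta_hat - half, theta_hat + half)   # can leave the domain theta > 0

# Back-transforming the interval for phi respects theta > 0 automatically
ci_back = (np.exp(phi_hat - se_phi * q), np.exp(phi_hat + se_phi * q))
```

With these values the error-propagation interval has a negative lower bound although θ = exp(φ) is positive by construction, while the back-transformed interval stays inside the admissible domain.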
Take Home Messages of Half-Day 2

- The commonly used confidence intervals are based on a (crude) linear approximation. Use graphical tools like profile t plots and profile traces to assess the quality of the approximate confidence intervals (and hence of the linear approximation).
- If it is insufficient: more accurate confidence intervals for single parameters θk can be calculated using profile t functions (implemented in confint() anyway).
- The convergence properties of the estimation algorithm and the quality of the Wald-type confidence intervals can be improved by applying suitable reparametrizations (parameter transformations).
- If the interpretation of the original parameters is crucial, then the confidence interval should also be back-transformed, and not be determined by the Gaussian law of error propagation.