Nonlinear Statistical Models




Earlier in the course, we considered the general linear statistical model (GLM):

    y_i = β_0 + β_1 x_1i + ... + β_k x_ki + ε_i,   i = 1, ..., n,

written in matrix form as y = Xβ + ε, or:

    | y_1 |   | 1  x_11  ...  x_{p-1,1} | | β_0     |   | ε_1 |
    | y_2 | = | 1  x_12  ...  x_{p-1,2} | | β_1     | + | ε_2 |
    | ... |   |          ...            | | ...     |   | ... |
    | y_n |   | 1  x_1n  ...  x_{p-1,n} | | β_{p-1} |   | ε_n |

Specifically, we discussed estimation of the model parameters, measures of uncertainty in these estimates and in the predicted values, and the use of confidence intervals or significance tests to assess the significance of parameters. Through the matrix formulation, we saw that in linear models (where the model is a linear function of the parameters), the estimates of linear functions of the response values, and their standard errors, have simple closed-form expressions and can thus be computed analytically. Unfortunately, this will not be the case for nonlinear models.

Definition: As before, consider a set of n observed values of a response variable y assumed to depend on k explanatory variables x = (x_1, ..., x_k). This situation can be described through the following nonlinear statistical model:

    y_i = f(x_i, θ) + ε_i,   i = 1, ..., n,

where x_i = [x_1i, x_2i, ..., x_ki] is the row vector of observations on the k explanatory variables x_1, ..., x_k for the i-th observational unit, θ = (θ_1, ..., θ_p) is a vector of p parameters, and ε_i is the i-th error term. In a matrix setting, this model can be expressed as:

    | y_1 |   | f(x_1, θ) |   | ε_1 |
    | y_2 | = | f(x_2, θ) | + | ε_2 |
    | ... |   |    ...    |   | ... |
    | y_n |   | f(x_n, θ) |   | ε_n |

      y    =      f(θ)     +    ε

Typically, we assume ε_i ~ iid (0, σ²) and that the explanatory variables in x are fixed. If f(θ) = Xθ, where X is the design matrix as defined earlier, then the model reverts to a linear model.
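To make the "linear in the parameters" distinction concrete, here is a small illustrative sketch (in Python rather than the course's MatLab; all numbers are made up): a quadratic-in-x mean function is still a linear model, because it can be written as a design-matrix product Xβ, while an exponential mean function cannot.

```python
import math

# Illustrative sketch (all numbers made up): "linear" refers to the parameters.
x = [0.0, 1.0, 2.0, 3.0]

# A quadratic mean f(x, beta) = beta0 + beta1*x + beta2*x^2 is LINEAR in
# (beta0, beta1, beta2): it is exactly a design-matrix product X*beta.
beta = [1.0, 0.5, 0.25]                 # hypothetical coefficients
X = [[1.0, xi, xi ** 2] for xi in x]    # design-matrix rows [1, x, x^2]
linear_mean = [sum(b * xij for b, xij in zip(beta, row)) for row in X]

# An exponential mean f(x, theta) = alpha*exp(b1*x) is NONLINEAR in b1:
# no fixed design matrix X gives f(theta) = X*theta.
alpha, b1 = 2.0, -0.5                   # hypothetical parameters
nonlinear_mean = [alpha * math.exp(b1 * xi) for xi in x]

print(linear_mean)
print(nonlinear_mean)
```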

As Leonid has argued, nonlinear models commonly arise as solutions to a system of differential equations which describe some dynamic process in a particular application. Some common examples of nonlinear models are given below:

1. Exponential Growth/Decay Model:

    y_i = α e^(β x_i) + ε_i,

where x is typically time. This arises as a solution of the differential equation:

    dE(y)/dx = β E(y).

2. Two-term Exponential Model:

    y_i = [θ_1 / (θ_1 − θ_2)] [e^(−θ_2 x_i) − e^(−θ_1 x_i)] + ε_i.

This model results when two processes lead to exponential growth or decay. For example, if we are monitoring the concentration of a drug in the bloodstream, which enters the bloodstream from muscle tissue and is removed by the kidneys, then this model is a solution to the differential equation:

    dE(y)/dx = θ_1 E(y_m) − θ_2 E(y),

where y_m is the concentration of the drug in the muscle tissue and y is the concentration in the blood.

3. Mitscherlich Model:

    y_i = α[1 − e^(−β(x_i + δ))] + ε_i.

As an example of where this model is used, let y be the yield of a crop, let x be the amount of some nutrient, and let α be the maximum attainable yield. Then, if we assume the increase in yield is proportional to the difference between α and the actual yield, we get the following differential equation, to which the model is a solution:

    dE(y)/dx = β[α − E(y)],

where δ is the equivalent nutrient value of the soil.

4. Logistic Growth Model:

    y_i = Mu / [u + (M − u) exp(−k t_i)] + ε_i.

This model results from a differential equation where we assume the population growth rate decays as the population increases, so that it

reaches a carrying capacity M. The form of this differential equation was discussed in some detail in past weeks, and is given by:

    dE(y)/dt = k E(y) [1 − E(y)/M],   E(y | t = 0) = u,

where M is the carrying capacity, u is the population size at time 0, and k controls the rate of growth. The form of this model is shown as a fit to the AIDS data (1981-1992) below.

[Figure: logistic growth model fit to the AIDS data, 1981-1992]

Fitting Nonlinear Models via Least Squares: In principle, fitting a nonlinear statistical model using least squares is no different than fitting a linear one. That is, for the model y_i = f(x_i, θ) + ε_i, we seek the parameter vector θ̂ that minimizes the sum of the squared errors:

    Q(θ | y, X) = Σ_{i=1}^{n} ε_i² = Σ_{i=1}^{n} [y_i − f(x_i, θ)]²
                = [y − f(θ)]'[y − f(θ)] = ε'ε,

where f(θ) is the n×1 vector of f(x_i, θ) values. Differentiating Q with respect to the θ_j (in an effort to minimize Q), setting the resulting equations to zero, and evaluating at θ = θ̂ gives:

    ∂Q(θ | y, X)/∂θ_j |_{θ=θ̂} = −2 Σ_{i=1}^{n} [y_i − f(x_i, θ̂)] ∂f(x_i, θ̂)/∂θ_j = 0,   j = 1, ..., p.

These are p nonlinear equations in the p parameters, which in general cannot be solved analytically to obtain explicit solutions for θ̂. As a result, numerical methods are used to solve these nonlinear systems. In MatLab, there are several numerical methods that can be used, although the default is a

generalization of the standard Gauss-Newton method, known as the Levenberg-Marquardt algorithm.

Gauss-Newton Method: This method, like others that may be used, is iterative in nature and requires starting values for the parameters. These starting values are then continually adjusted to home in on the least squares solution. The method is conducted via the following steps:

1. Consider taking a 1st-order Taylor expansion of f(θ) about some starting parameter vector θ0:

    f(θ) ≈ f(θ0) + F(θ0)(θ − θ0),    (1)

where F(θ0) is the n×p matrix of partial derivatives evaluated at θ0 and the n data values x_i, as given below:

    F(θ0) = | ∂f(x_1, θ0)/∂θ_1  ∂f(x_1, θ0)/∂θ_2  ...  ∂f(x_1, θ0)/∂θ_p |
            | ∂f(x_2, θ0)/∂θ_1  ∂f(x_2, θ0)/∂θ_2  ...  ∂f(x_2, θ0)/∂θ_p |
            |        ...                ...        ...         ...       |
            | ∂f(x_n, θ0)/∂θ_1  ∂f(x_n, θ0)/∂θ_2  ...  ∂f(x_n, θ0)/∂θ_p |

2. Calculate f(θ0) and F(θ0). Recognizing f(θ) as an approximately linearized model in θ from equation (1), linear least squares is used to update the starting values and obtain the solution at the next iteration.

3. Repeat steps 1 & 2 until the change in the parameters is below some tolerance.

Some Key Results: Based on this same Taylor expansion, Gallant (1987) established the following asymptotic (large-sample) results:

1. If we assume ε ~ N(0, σ²I), then the least squares parameter vector θ̂ is approximately normally distributed with mean θ and variance matrix Var(θ̂) = (F'F)⁻¹ σ², written:

    θ̂ ~ N[θ, (F'F)⁻¹ σ²].

In practice, we compute F(θ̂) as an estimate of F(θ) and estimate σ² with MSE = SSE/(n − p) to give the asymptotic variance-covariance matrix for θ̂ as:

    s²(θ̂) = Var̂(θ̂) = (F̂'F̂)⁻¹ MSE,   where F̂ = F(θ̂).
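The Gauss-Newton steps above can be sketched in a few lines. The following illustrative Python snippet (the course code is in MatLab; the model and data here are invented) fits the hypothetical one-parameter model f(x, θ) = exp(θx) to noiseless data. With a single parameter, the linear least squares update in step 2 reduces to a scalar ratio.

```python
import math

# Minimal Gauss-Newton sketch for f(x, theta) = exp(theta * x).
# With one parameter, the linearized least squares update in step 2 is
# theta_new = theta + sum(r_i * F_i) / sum(F_i^2), where r_i is the i-th
# residual and F_i = df(x_i, theta)/dtheta = x_i * exp(theta * x_i).

def gauss_newton_exp(x, y, theta0, tol=1e-10, max_iter=50):
    theta = theta0
    for _ in range(max_iter):
        resid = [yi - math.exp(theta * xi) for xi, yi in zip(x, y)]
        F = [xi * math.exp(theta * xi) for xi in x]    # n x 1 matrix of derivatives
        step = sum(r * f for r, f in zip(resid, F)) / sum(f * f for f in F)
        theta += step
        if abs(step) < tol:                            # step 3: stop when change is small
            break
    return theta

# Noiseless data generated from theta = -0.5; the iteration should recover it.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [math.exp(-0.5 * xi) for xi in x]
theta_hat = gauss_newton_exp(x, y, theta0=-1.0)
print(theta_hat)
```

Note how the starting value matters in practice: a poor θ0 can make the first linearization (1) a bad approximation, which is exactly why nlinfit requires user-supplied starting values.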

This gives an approximate 100(1 − α)% confidence interval for a given parameter θ_i as:

    θ̂_i ± t_{1−α/2, n−p} s(θ̂_i) = θ̂_i ± t_{1−α/2, n−p} √[MSE (F̂'F̂)⁻¹_ii],

where (F̂'F̂)⁻¹_ii is the i-th diagonal entry of (F̂'F̂)⁻¹.

2. Again assuming the errors ε ~ N(0, σ²I), then:

    Var(ŷ) = F(F'F)⁻¹F' σ²,

which can be estimated as:

    s²(ŷ) = Var̂(ŷ) = F̂(F̂'F̂)⁻¹F̂' MSE.

This technique of quantifying uncertainty through the use of a first-order Taylor series expansion is known as the Delta Method and was the most commonly used technique for finding approximate standard errors before the advent of modern computing. Gallant (1987) illustrated several cases where these approximations are poor, giving confidence intervals and hypothesis tests which are completely wrong.

Example: Data were collected on the number of bighorn sheep on the Mount Everts winter range in Yellowstone between 1965 and 1999 (obtained from Marty Kardos). Consider the time plot of these data below.

[Figure: time plot of the bighorn sheep counts, 1965-1999]

The population suffered a die-off resulting from infectious keratoconjunctivitis in 1983. Marty indicates that this is fairly typical for bighorn sheep populations: logistic-like population growth up to a carrying capacity, followed by disease-related die-offs and subsequent recovery. For now, consider modeling the initial population growth from 1965 to 1982 using a logistic growth model. Here, we will fit the model and use it to illustrate the calculation of standard errors for the parameter estimates from this nonlinear model. Some more modern techniques for estimating standard errors, and subsequently performing statistical inferences in nonlinear models, are the method of bootstrapping and Markov chain Monte Carlo (MCMC) methods.
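The Delta Method variance computation s²(θ̂) = (F̂'F̂)⁻¹ MSE can be sketched numerically. In this Python illustration (the F matrix and residuals are invented values for a hypothetical 2-parameter fit, not from the sheep example), the 2×2 inverse is done with the cofactor formula:

```python
import math

# Sketch of s^2(theta-hat) = (F'F)^{-1} * MSE for a hypothetical p = 2 fit.
# F and the residuals below are made-up illustration values.
F = [[1.0, 0.0],
     [1.0, 1.0],
     [1.0, 2.0],
     [1.0, 3.0]]                 # n x p matrix of partial derivatives at theta-hat
resid = [0.1, -0.1, 0.1, -0.1]
n, p = len(F), len(F[0])

mse = sum(r * r for r in resid) / (n - p)      # MSE = SSE / (n - p)

# F'F for p = 2, then its inverse via the 2x2 cofactor formula
a = sum(row[0] * row[0] for row in F)
b = sum(row[0] * row[1] for row in F)
d = sum(row[1] * row[1] for row in F)
det = a * d - b * b
FtF_inv = [[d / det, -b / det], [-b / det, a / det]]

# Variance-covariance matrix; standard errors are square roots of its diagonal
varmat = [[FtF_inv[i][j] * mse for j in range(p)] for i in range(p)]
se = [math.sqrt(varmat[i][i]) for i in range(p)]
print(se)
```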

The logistic growth model is a 3-parameter nonlinear model of the form:

    y_i = Mu / [u + (M − u) exp(−k t_i)] + ε_i,

where:

    y_i = the number of bighorn sheep at time t_i,
    M   = the carrying capacity of the population,
    u   = the number of sheep at time 0 (1964 here),
    k   = a parameter representing the growth rate.

Fitting this model to the 1965-1982 data with nonlinear least squares, using the nlinfit function in MatLab, gives the following fitted model:

    ŷ_i = M̂û / [û + (M̂ − û) exp(−k̂ t_i)],

where M̂ = 221.0513, û = 27.4643, k̂ = 0.3279, as computed using the Gauss-Newton method outlined earlier. The fitted model is shown in overlay below. Is this a good model?

[Figure: bighorn sheep counts, 1965-1982, with the fitted logistic curve in overlay]

The code for fitting the logistic model is given below and can be found on the course webpage under the files sheep.m and logistic.m. The first file is the driver program, and logistic.m defines the logistic function.
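The fitted model can be sanity-checked directly. The sketch below (Python rather than the course's MatLab; it takes the printed estimates as M = 221.0513, u = 27.4643, k = 0.3279) evaluates the fitted curve at a few times and verifies numerically, via a central difference, that the logistic mean function satisfies its differential equation dE(y)/dt = k E(y)[1 − E(y)/M].

```python
import math

# Fitted values as reported above (t = year - 1964)
M, u, k = 221.0513, 27.4643, 0.3279

def logistic_mean(t):
    return M * u / (u + (M - u) * math.exp(-k * t))

print(logistic_mean(1.0))    # predicted 1965 count
print(logistic_mean(18.0))   # predicted 1982 count, approaching M

# Central-difference check of the differential equation at t = 5
h = 1e-5
t = 5.0
deriv_num = (logistic_mean(t + h) - logistic_mean(t - h)) / (2 * h)
E = logistic_mean(t)
deriv_ode = k * E * (1 - E / M)
print(abs(deriv_num - deriv_ode))   # should be near zero
```

Note that logistic_mean(0) returns u and the curve levels off near M, matching the interpretation of the parameters.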

% ============================================= %
% Fits a logistic growth model to the 1965-1982 %
% sheep data, and plots the fitted model        %
% ============================================= %
sheep82year = sheepyear(sheepyear<=1982);    % Take only years before 1983
sheep82count = sheepcount(sheepyear<=1982);  % Take only cases before 1983
time = sheep82year - 1964;                   % Defines the year-1964 variable
beta = [221 27 0.33];                        % Parameter starting values
[betahat,resid,j] = nlinfit(time,sheep82count,@logistic,beta);
                                             % Performs NLS, returning the parameter
                                             % estimates, residuals, & Jacobian
yhat = sheep82count - resid;                 % Computes predicted counts
plot(sheep82year,sheep82count,'ko',sheep82year,yhat);
                                             % Plots sheep counts vs year w/ fitted curve
xlabel('Year','fontsize',14);
ylabel('Number of Bighorn Sheep','fontsize',14);
title('Bighorn Sheep Counts: 1965-1982','fontsize',14,'fontweight','b');

I tried a few different things to try to improve the fit, including a Weibull-like modification to the logistic growth model, where I added a parameter in the exponent of the model, yielding a generalized logistic growth model:

    y_i = Mu / {u + (M − u) exp[−k t_i^γ]} + ε_i,

where γ is a parameter controlling the degree of curvature in the logistic shape. If γ = 1, this reverts to the usual logistic growth model. This change is Weibull-like in the sense that we have added a parameter in the exponent of an exponential function, much like the addition of the γ parameter in going from exponential growth to a Weibull growth model. This Weibull-like curvature adjustment to the logistic model has a corresponding differential equation to which it is the solution. This equation has the form:

    dE(y)/dt = γ k t^(γ−1) E(y) [1 − E(y)/M],

where M is the carrying capacity, u is the population size at time 0, k controls the rate of growth, and γ is this curvature piece. When parameterized this way, it turns out that the resulting parameter estimates for k & γ are unstable because they are highly correlated. An alternative, but equivalent, way to parameterize this model is as follows:

    y_i = Mu / {u + (M − u) exp[−(k t_i)^γ]} + ε_i.

This is the form of the model we will use for the remainder of our treatment of this generalized logistic model.

Essentially, γ controls the steepness of the S-shape in the logistic model, where larger values of γ correspond to steeper curves. Since our logistic model for the sheep data appears to not be steep enough, we would expect this parameter to be larger than 1. Fitting this generalized logistic model produced the fitted curve and parameter estimates as given on the next page. Does this seem like a better fit?

[Figure: bighorn sheep counts, 1965-1982, with the generalized logistic fit in overlay]

    M̂ = 207.6399
    û = 67.9731
    k̂ = 0.1380
    γ̂ = 3.4542

The MatLab code used to fit this generalized logistic model is shown below:

% ===================================== %
% Fits a generalized logistic growth    %
% model to the 1965-1982 sheep data,    %
% and plots the fitted model in overlay %
% ===================================== %
beta2 = [221 27 0.3 2];                      % Parameter starting values
[betahat2,resid2,j2] = nlinfit(time,sheep82count,@genlogistic,beta2);
                                             % Performs NLS, returning the parameter
                                             % estimates, residuals, & Jacobian
yhat2 = sheep82count - resid2;               % Computes predicted counts
plot(sheep82year,sheep82count,'ko',sheep82year,yhat2);
                                             % Plots the sheep counts vs year with
                                             % fitted curve in overlay
xlabel('Year','fontsize',14);
ylabel('Number of Bighorn Sheep','fontsize',14);
title('Bighorn Sheep Counts (1965-1982) with Generalized Logistic Fit','fontsize',14,'fontweight','b');

Which of these parameters do you expect to be the least stable (i.e., which do you expect to have the highest variability)?

To get the standard errors of these parameters in θ = (M, u, k, γ), we need to compute the Delta Method approximate variance matrix, given as:

    s²(θ̂) = Var̂(θ̂) = (F̂'F̂)⁻¹ MSE.

To compute the F-matrix, we first compute the partial derivatives of

    f_i(θ) = f(x_i, θ) = Mu / {u + (M − u) exp[−(k t_i)^γ]}

with respect to M, u, k, and γ, respectively. Doing so with repeated use of the product (quotient) rule gives:

    ∂f_i(θ)/∂M = u² {1 − exp[−(k t_i)^γ]} / {u + (M − u) exp[−(k t_i)^γ]}²,

    ∂f_i(θ)/∂u = M² exp[−(k t_i)^γ] / {u + (M − u) exp[−(k t_i)^γ]}²,

    ∂f_i(θ)/∂k = Mu(M − u) γ k^(γ−1) t_i^γ exp[−(k t_i)^γ] / {u + (M − u) exp[−(k t_i)^γ]}²,

    ∂f_i(θ)/∂γ = Mu(M − u) (k t_i)^γ exp[−(k t_i)^γ] log(k t_i) / {u + (M − u) exp[−(k t_i)^γ]}².
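One way to guard against algebra mistakes in derivative formulas like these is to check them against finite differences. The following Python sketch (the notes use MatLab) compares the four analytic partials above to central differences, using the reported fitted values (M = 207.6399, u = 67.9731, k = 0.1380, γ = 3.4542) and an arbitrary time t = 5:

```python
import math

# Fitted values from the generalized logistic fit, and an arbitrary t = 5
M, u, k, g = 207.6399, 67.9731, 0.1380, 3.4542

def f(M, u, k, g, t):
    return M * u / (u + (M - u) * math.exp(-(k * t) ** g))

def analytic_grad(M, u, k, g, t):
    # The four partial derivative formulas stated above
    E = math.exp(-(k * t) ** g)
    den = (u + (M - u) * E) ** 2
    dfdM = u ** 2 * (1 - E) / den
    dfdu = M ** 2 * E / den
    dfdk = M * u * (M - u) * g * k ** (g - 1) * t ** g * E / den
    dfdg = M * u * (M - u) * (k * t) ** g * E * math.log(k * t) / den
    return [dfdM, dfdu, dfdk, dfdg]

# Central differences in each parameter, holding the others fixed
t, h = 5.0, 1e-6
num = [
    (f(M + h, u, k, g, t) - f(M - h, u, k, g, t)) / (2 * h),
    (f(M, u + h, k, g, t) - f(M, u - h, k, g, t)) / (2 * h),
    (f(M, u, k + h, g, t) - f(M, u, k - h, g, t)) / (2 * h),
    (f(M, u, k, g + h, t) - f(M, u, k, g - h, t)) / (2 * h),
]
ana = analytic_grad(M, u, k, g, t)
print(max(abs(a - b) for a, b in zip(ana, num)))   # should be near zero
```

Note that at t = 5 we have kt < 1, so log(kt) < 0 and the γ-derivative is negative there; it changes sign once kt exceeds 1.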

Plugging the least squares estimates θ̂ = (M̂, û, k̂, γ̂) into these derivatives yields the F̂-matrix shown below:

    F̂ = | ∂f(t_1, θ̂)/∂M   ∂f(t_1, θ̂)/∂u   ∂f(t_1, θ̂)/∂k   ∂f(t_1, θ̂)/∂γ |
        | ∂f(t_2, θ̂)/∂M   ∂f(t_2, θ̂)/∂u   ∂f(t_2, θ̂)/∂k   ∂f(t_2, θ̂)/∂γ |
        |       ...              ...              ...              ...      |
        | ∂f(t_17, θ̂)/∂M  ∂f(t_17, θ̂)/∂u  ∂f(t_17, θ̂)/∂k  ∂f(t_17, θ̂)/∂γ |

      = | 0.0001  1.0004     1.2  -0.0969 |
        | 0.0053  1.0161    55.3  -1.9492 |
        | 0.0153  1.0415   153.2  -3.6359 |
        | 0.0371  1.0816   343.9  -5.0945 |
        | 0.0825  1.1242   671.0  -5.0515 |
        | 0.1729  1.1274  1145.9  -1.5678 |
        | 0.3349  1.0106  1629.2   6.4654 |
        | 0.5652  0.7222  1748.7  15.1716 |
        | 0.7904  0.3683  1283.3  16.5367 |
        | 0.9293  0.1276   618.1  10.3187 |
        | 0.9834  0.0303   198.0   3.9937 |
        | 0.9973  0.0050    42.7   0.9987 |
        | 0.9997  0.0006     6.1   0.1615 |
        | 1.0000  0.0000     0.6   0.0165 |
        | 1.0000  0.0000     0.0   0.0010 |
        | 1.0000  0.0000     0.0   0.0000 |
        | 1.0000  0.0000     0.0   0.0000 |

where the actual values were computed in MatLab. Finally, computing MSE (F̂'F̂)⁻¹ to get the variance-covariance matrix of the parameter estimates, extracting the diagonal entries of this 4×4 matrix, and taking square roots yields the standard errors of the parameter estimates. The MatLab code to do this is given below:

% ============================================== %
% Computes the Delta Method parameter estimate   %
% standard errors for a 4-parameter generalized  %
% logistic model                                 %
% ============================================== %
n = length(sheep82year);                     % Sample size
p = length(beta2);                           % Number of model parameters
mse = sum(resid2.^2)/(n-p);                  % Computes the mean squared error (MSE)
Mhat = betahat2(1);
uhat = betahat2(2);
khat = betahat2(3);
ghat = betahat2(4);                          % Renames the four parameter estimates
den = (uhat+(Mhat-uhat)*exp(-(khat*time).^ghat)).^2;
                                             % Denominator of first-order derivatives

dfdM = uhat^2*(1-exp(-(khat*time).^ghat))./den;
                                             % Computes df/dM (nx1 vector)
dfdu = Mhat^2*exp(-(khat*time).^ghat)./den;
                                             % Computes df/du (nx1 vector)
dfdk = Mhat*uhat*(Mhat-uhat)*ghat*khat^(ghat-1)*(time.^ghat).*exp(-(khat*time).^ghat)./den;
                                             % Computes df/dk (nx1 vector)
dfdg = Mhat*uhat*(Mhat-uhat)*((khat*time).^ghat).*log(khat*time).*exp(-(khat*time).^ghat)./den;
                                             % Computes df/dg (nx1 vector)
Fhat = [dfdM dfdu dfdk dfdg];                % Combines derivatives into nx4 matrix
varmat = inv(Fhat'*Fhat)*mse;                % Computes inv(F'F)*MSE
separ = sqrt(diag(varmat));                  % Standard errors are the square roots of
                                             % the diagonal elements of varmat
tstar = tinv(1-.025/p,n-p);                  % Computes the Bonferroni t-value
ci = [betahat2'-tstar*separ betahat2'+tstar*separ];
                                             % Computes Bonferroni CIs for the four parameters

betahat2 =
    207.640   67.973    0.1380    3.4542

separ =
    4.178     7.166     0.0091    0.8093

This provides approximate Bonferroni-adjusted simultaneous 95% confidence intervals for the four parameters (using t_{1−.025/4}(n − p) = t_{.99375}(17 − 4) = tinv(.99375,13) = 2.8961) as:

    M̂ ± t_{.99375}(n−p) SE(M̂) = 207.640 ± 2.8961(4.178)  = (195.54, 219.74)
    û ± t_{.99375}(n−p) SE(û) = 67.973 ± 2.8961(7.166)  = (47.22, 88.73)
    k̂ ± t_{.99375}(n−p) SE(k̂) = 0.1380 ± 2.8961(0.0091) = (0.1116, 0.1645)
    γ̂ ± t_{.99375}(n−p) SE(γ̂) = 3.4542 ± 2.8961(0.8093) = (1.11, 5.80)

In MatLab, the Delta Method of computing standard errors for parameter estimates is the default technique

and can be easily obtained after calling the nlinfit function, by typing:

ci2 = nlparci(betahat2,resid2,j2);           % Computes NLS Delta Method CIs

This returns approximate 95% individual confidence intervals for the four model parameters based on the Delta Method standard errors. We will next explore the use of bootstrapping as an alternative to using Delta Method-based standard errors. Both techniques have their problems, and both consider parameter uncertainty for each parameter separately rather than jointly. Later, we will briefly explore the use of Markov chain Monte Carlo (MCMC) methods, which account for the dependence among model parameters in measuring uncertainty.
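As a closing arithmetic check, the Bonferroni intervals above can be reproduced directly from the estimates and standard errors (a Python sketch; the t-value 2.8961 is the tinv(.99375,13) value reported above):

```python
# Reproduce the Bonferroni-adjusted intervals: estimate +/- t * SE
estimates = [207.640, 67.973, 0.1380, 3.4542]   # M, u, k, gamma
ses       = [4.178,   7.166,  0.0091, 0.8093]
tstar     = 2.8961                               # t_.99375(13), from the notes

cis = [(est - tstar * se, est + tstar * se) for est, se in zip(estimates, ses)]
for (lo, hi), name in zip(cis, ["M", "u", "k", "gamma"]):
    print(f"{name}: ({lo:.4f}, {hi:.4f})")
```

The wide interval for γ, roughly (1.11, 5.80), reflects that γ is the least stable of the four parameters, consistent with the earlier question about which estimate should have the highest variability.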