Nonlinear Statistical Models




Earlier in the course, we considered the general linear statistical model (GLM):

    y_i = β_0 + β_1 x_1i + ... + β_k x_ki + ε_i,   i = 1, ..., n,

written in matrix form as y = Xβ + ε, or:

    | y_1 |   | 1  x_11  ...  x_{p-1,1} | | β_0     |   | ε_1 |
    | y_2 | = | 1  x_12  ...  x_{p-1,2} | | β_1     | + | ε_2 |
    | ... |   |          ...            | | ...     |   | ... |
    | y_n |   | 1  x_1n  ...  x_{p-1,n} | | β_{p-1} |   | ε_n |

Specifically, we discussed estimation of the model parameters, measures of uncertainty in these estimates and in the predicted values, and the use of confidence intervals or significance tests to assess the significance of parameters. Through the matrix formulation, we saw that in linear models (where the model is a linear function of the parameters), the estimates of linear functions of the response values, and their standard errors, have simple closed-form expressions and can thus be computed analytically. Unfortunately, this will not be the case for nonlinear models.

Definition: As before, consider a set of n observed values of a response variable y assumed to depend on k explanatory variables x = (x_1, ..., x_k). This situation can be described through the following nonlinear statistical model:

    y_i = f(x_i, θ) + ε_i,   i = 1, ..., n,

where x_i = [x_1i, x_2i, ..., x_ki] is the row vector of observations on the k explanatory variables x_1, ..., x_k for the i-th observational unit, θ = (θ_1, ..., θ_p) is a vector of p parameters, and ε_i is the i-th error term. In a matrix setting, this model can be expressed as:

    | y_1 |   | f(x_1, θ) |   | ε_1 |
    | y_2 | = | f(x_2, θ) | + | ε_2 |
    | ... |   |    ...    |   | ... |
    | y_n |   | f(x_n, θ) |   | ε_n |

      y    =      f(θ)     +    ε

Typically, we assume ε_i ~ iid (0, σ²) and that the explanatory variables in x are fixed. If f(θ) = Xθ, where X is the design matrix as defined earlier, then the model reverts to a linear model.
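To make the "linear in the parameters" distinction concrete, here is a small illustrative sketch (in Python rather than the course's MatLab; all numbers are made up): a quadratic-in-x mean function is still a linear model, because it can be written as a design-matrix product Xβ, while an exponential mean function cannot.

```python
import math

# Illustrative sketch (all numbers made up): "linear" refers to the parameters.
x = [0.0, 1.0, 2.0, 3.0]

# A quadratic mean f(x, beta) = beta0 + beta1*x + beta2*x^2 is LINEAR in
# (beta0, beta1, beta2): it is exactly a design-matrix product X*beta.
beta = [1.0, 0.5, 0.25]                 # hypothetical coefficients
X = [[1.0, xi, xi ** 2] for xi in x]    # design-matrix rows [1, x, x^2]
linear_mean = [sum(b * xij for b, xij in zip(beta, row)) for row in X]

# An exponential mean f(x, theta) = alpha*exp(b1*x) is NONLINEAR in b1:
# no fixed design matrix X gives f(theta) = X*theta.
alpha, b1 = 2.0, -0.5                   # hypothetical parameters
nonlinear_mean = [alpha * math.exp(b1 * xi) for xi in x]

print(linear_mean)
print(nonlinear_mean)
```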

As Leonid has argued, nonlinear models commonly arise as solutions to a system of differential equations which describe some dynamic process in a particular application. Some common examples of nonlinear models are given below:

1. Exponential Growth/Decay Model:

    y_i = α e^(β x_i) + ε_i,

where x is typically time. This arises as a solution of the differential equation:

    dE(y)/dx = β E(y).

2. Two-term Exponential Model:

    y_i = [θ_1 / (θ_1 − θ_2)] [e^(−θ_2 x_i) − e^(−θ_1 x_i)] + ε_i.

This model results when two processes lead to exponential growth or decay. For example, if we are monitoring the concentration of a drug in the bloodstream, which enters the bloodstream from muscle tissue and is removed by the kidneys, then this model is a solution to the differential equation:

    dE(y)/dx = θ_1 E(y_m) − θ_2 E(y),

where y_m is the concentration of the drug in the muscle tissue and y is the concentration in the blood.

3. Mitscherlich Model:

    y_i = α[1 − e^(−β(x_i + δ))] + ε_i.

As an example of where this model is used, let y be the yield of a crop, let x be the amount of some nutrient, and let α be the maximum attainable yield. Then, if we assume the increase in yield is proportional to the difference between α and the actual yield, we get the following differential equation, to which the model is a solution:

    dE(y)/dx = β[α − E(y)],

where δ is the equivalent nutrient value of the soil.

4. Logistic Growth Model:

    y_i = Mu / [u + (M − u) exp(−k t_i)] + ε_i.

This model results from a differential equation where we assume the population growth rate decays as the population increases, so that it

reaches a carrying capacity M. The form of this differential equation was discussed in some detail in past weeks, and is given by:

    dE(y)/dt = k E(y) [1 − E(y)/M],   E(y | t = 0) = u,

where M is the carrying capacity, u is the population size at time 0, and k controls the rate of growth. The form of this model is shown as a fit to the AIDS data (1981-1992) below.

[Figure: logistic growth model fit to the AIDS data, 1981-1992]

Fitting Nonlinear Models via Least Squares: In principle, fitting a nonlinear statistical model using least squares is no different than fitting a linear one. That is, for the model y_i = f(x_i, θ) + ε_i, we seek the parameter vector θ̂ that minimizes the sum of the squared errors:

    Q(θ | y, X) = Σ_{i=1}^{n} ε_i² = Σ_{i=1}^{n} [y_i − f(x_i, θ)]²
                = [y − f(θ)]'[y − f(θ)] = ε'ε,

where f(θ) is the n×1 vector of f(x_i, θ) values. Differentiating Q with respect to the θ_j (in an effort to minimize Q), setting the resulting equations to zero, and evaluating at θ = θ̂ gives:

    ∂Q(θ | y, X)/∂θ_j |_{θ=θ̂} = −2 Σ_{i=1}^{n} [y_i − f(x_i, θ̂)] ∂f(x_i, θ̂)/∂θ_j = 0,   j = 1, ..., p.

These are p nonlinear equations in the p parameters, which in general cannot be solved analytically to obtain explicit solutions for θ̂. As a result, numerical methods are used to solve these nonlinear systems. In MatLab, there are several numerical methods that can be used, although the default is a

generalization of the standard Gauss-Newton method, known as the Levenberg-Marquardt algorithm.

Gauss-Newton Method: This method, like others that may be used, is iterative in nature and requires starting values for the parameters. These starting values are then continually adjusted to home in on the least squares solution. The method is conducted via the following steps:

1. Consider taking a 1st-order Taylor expansion of f(θ) about some starting parameter vector θ0:

    f(θ) ≈ f(θ0) + F(θ0)(θ − θ0),    (1)

where F(θ0) is the n×p matrix of partial derivatives evaluated at θ0 and the n data values x_i, as given below:

    F(θ0) = | ∂f(x_1, θ0)/∂θ_1  ∂f(x_1, θ0)/∂θ_2  ...  ∂f(x_1, θ0)/∂θ_p |
            | ∂f(x_2, θ0)/∂θ_1  ∂f(x_2, θ0)/∂θ_2  ...  ∂f(x_2, θ0)/∂θ_p |
            |        ...                ...        ...         ...       |
            | ∂f(x_n, θ0)/∂θ_1  ∂f(x_n, θ0)/∂θ_2  ...  ∂f(x_n, θ0)/∂θ_p |

2. Calculate f(θ0) and F(θ0). Recognizing f(θ) as an approximately linearized model in θ from equation (1), linear least squares is used to update the starting values and obtain the solution at the next iteration.

3. Repeat steps 1 & 2 until the change in the parameters is below some tolerance.

Some Key Results: Based on this same Taylor expansion, Gallant (1987) established the following asymptotic (large-sample) results:

1. If we assume ε ~ N(0, σ²I), then the least squares parameter vector θ̂ is approximately normally distributed with mean θ and variance matrix Var(θ̂) = (F'F)⁻¹ σ², written:

    θ̂ ~ N[θ, (F'F)⁻¹ σ²].

In practice, we compute F(θ̂) as an estimate of F(θ) and estimate σ² with MSE = SSE/(n − p) to give the asymptotic variance-covariance matrix for θ̂ as:

    s²(θ̂) = Var̂(θ̂) = (F̂'F̂)⁻¹ MSE,   where F̂ = F(θ̂).
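The Gauss-Newton steps above can be sketched in a few lines. The following illustrative Python snippet (the course code is in MatLab; the model and data here are invented) fits the hypothetical one-parameter model f(x, θ) = exp(θx) to noiseless data. With a single parameter, the linear least squares update in step 2 reduces to a scalar ratio.

```python
import math

# Minimal Gauss-Newton sketch for f(x, theta) = exp(theta * x).
# With one parameter, the linearized least squares update in step 2 is
# theta_new = theta + sum(r_i * F_i) / sum(F_i^2), where r_i is the i-th
# residual and F_i = df(x_i, theta)/dtheta = x_i * exp(theta * x_i).

def gauss_newton_exp(x, y, theta0, tol=1e-10, max_iter=50):
    theta = theta0
    for _ in range(max_iter):
        resid = [yi - math.exp(theta * xi) for xi, yi in zip(x, y)]
        F = [xi * math.exp(theta * xi) for xi in x]    # n x 1 matrix of derivatives
        step = sum(r * f for r, f in zip(resid, F)) / sum(f * f for f in F)
        theta += step
        if abs(step) < tol:                            # step 3: stop when change is small
            break
    return theta

# Noiseless data generated from theta = -0.5; the iteration should recover it.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [math.exp(-0.5 * xi) for xi in x]
theta_hat = gauss_newton_exp(x, y, theta0=-1.0)
print(theta_hat)
```

Note how the starting value matters in practice: a poor θ0 can make the first linearization (1) a bad approximation, which is exactly why nlinfit requires user-supplied starting values.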

This gives an approximate 100(1 − α)% confidence interval for a given parameter θ_i as:

    θ̂_i ± t_{1−α/2, n−p} s(θ̂_i) = θ̂_i ± t_{1−α/2, n−p} √[MSE (F̂'F̂)⁻¹_ii],

where (F̂'F̂)⁻¹_ii is the i-th diagonal entry of (F̂'F̂)⁻¹.

2. Again assuming the errors ε ~ N(0, σ²I), then:

    Var(ŷ) = F(F'F)⁻¹F' σ²,

which can be estimated as:

    s²(ŷ) = Var̂(ŷ) = F̂(F̂'F̂)⁻¹F̂' MSE.

This technique of quantifying uncertainty through the use of a first-order Taylor series expansion is known as the Delta Method and was the most commonly used technique for finding approximate standard errors before the advent of modern computing. Gallant (1987) illustrated several cases where these approximations are poor, giving confidence intervals and hypothesis tests which are completely wrong.

Example: Data were collected on the number of bighorn sheep on the Mount Everts winter range in Yellowstone between 1965 and 1999 (obtained from Marty Kardos). Consider the time plot of these data below.

[Figure: time plot of the bighorn sheep counts, 1965-1999]

The population suffered a die-off resulting from infectious keratoconjunctivitis in 1983. Marty indicates that this is fairly typical for bighorn sheep populations: logistic-like population growth up to a carrying capacity, followed by disease-related die-offs and subsequent recovery. For now, consider modeling the initial population growth from 1965 to 1982 using a logistic growth model. Here, we will fit the model and use it to illustrate the calculation of standard errors for the parameter estimates from this nonlinear model. Some more modern techniques for estimating standard errors, and subsequently performing statistical inferences in nonlinear models, are the method of bootstrapping and Markov chain Monte Carlo (MCMC) methods.
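The Delta Method variance computation s²(θ̂) = (F̂'F̂)⁻¹ MSE can be sketched numerically. In this Python illustration (the F matrix and residuals are invented values for a hypothetical 2-parameter fit, not from the sheep example), the 2×2 inverse is done with the cofactor formula:

```python
import math

# Sketch of s^2(theta-hat) = (F'F)^{-1} * MSE for a hypothetical p = 2 fit.
# F and the residuals below are made-up illustration values.
F = [[1.0, 0.0],
     [1.0, 1.0],
     [1.0, 2.0],
     [1.0, 3.0]]                 # n x p matrix of partial derivatives at theta-hat
resid = [0.1, -0.1, 0.1, -0.1]
n, p = len(F), len(F[0])

mse = sum(r * r for r in resid) / (n - p)      # MSE = SSE / (n - p)

# F'F for p = 2, then its inverse via the 2x2 cofactor formula
a = sum(row[0] * row[0] for row in F)
b = sum(row[0] * row[1] for row in F)
d = sum(row[1] * row[1] for row in F)
det = a * d - b * b
FtF_inv = [[d / det, -b / det], [-b / det, a / det]]

# Variance-covariance matrix; standard errors are square roots of its diagonal
varmat = [[FtF_inv[i][j] * mse for j in range(p)] for i in range(p)]
se = [math.sqrt(varmat[i][i]) for i in range(p)]
print(se)
```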

The logistic growth model is a 3-parameter nonlinear model of the form:

    y_i = Mu / [u + (M − u) exp(−k t_i)] + ε_i,

where:

    y_i = the number of bighorn sheep at time t_i,
    M   = the carrying capacity of the population,
    u   = the number of sheep at time 0 (1964 here),
    k   = a parameter representing the growth rate.

Fitting this model to the 1965-1982 data with nonlinear least squares, using the nlinfit function in MatLab, gives the following fitted model:

    ŷ_i = M̂û / [û + (M̂ − û) exp(−k̂ t_i)],

where M̂ = 221.0513, û = 27.4643, k̂ = 0.3279, as computed using the Gauss-Newton method outlined earlier. The fitted model is shown in overlay below. Is this a good model?

[Figure: bighorn sheep counts, 1965-1982, with the fitted logistic curve in overlay]

The code for fitting the logistic model is given below and can be found on the course webpage under the files sheep.m and logistic.m. The first file is the driver program, and logistic.m defines the logistic function.
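The fitted model can be sanity-checked directly. The sketch below (Python rather than the course's MatLab; it takes the printed estimates as M = 221.0513, u = 27.4643, k = 0.3279) evaluates the fitted curve at a few times and verifies numerically, via a central difference, that the logistic mean function satisfies its differential equation dE(y)/dt = k E(y)[1 − E(y)/M].

```python
import math

# Fitted values as reported above (t = year - 1964)
M, u, k = 221.0513, 27.4643, 0.3279

def logistic_mean(t):
    return M * u / (u + (M - u) * math.exp(-k * t))

print(logistic_mean(1.0))    # predicted 1965 count
print(logistic_mean(18.0))   # predicted 1982 count, approaching M

# Central-difference check of the differential equation at t = 5
h = 1e-5
t = 5.0
deriv_num = (logistic_mean(t + h) - logistic_mean(t - h)) / (2 * h)
E = logistic_mean(t)
deriv_ode = k * E * (1 - E / M)
print(abs(deriv_num - deriv_ode))   # should be near zero
```

Note that logistic_mean(0) returns u and the curve levels off near M, matching the interpretation of the parameters.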

% ============================================= %
% Fits a logistic growth model to the 1965-1982 %
% sheep data, and plots the fitted model        %
% ============================================= %
sheep82year = sheepyear(sheepyear<=1982);    % Take only years before 1983
sheep82count = sheepcount(sheepyear<=1982);  % Take only cases before 1983
time = sheep82year - 1964;                   % Defines the year-1964 variable
beta = [221 27 0.33];                        % Parameter starting values
[betahat,resid,j] = nlinfit(time,sheep82count,@logistic,beta);
                                             % Performs NLS, returning the parameter
                                             % estimates, residuals, & Jacobian
yhat = sheep82count - resid;                 % Computes predicted counts
plot(sheep82year,sheep82count,'ko',sheep82year,yhat);
                                             % Plots sheep counts vs year w/ fitted curve
xlabel('Year','fontsize',14);
ylabel('Number of Bighorn Sheep','fontsize',14);
title('Bighorn Sheep Counts: 1965-1982','fontsize',14,'fontweight','b');

I tried a few different things to try to improve the fit, including a Weibull-like modification to the logistic growth model, where I added a parameter in the exponent of the model, yielding a generalized logistic growth model:

    y_i = Mu / {u + (M − u) exp[−k t_i^γ]} + ε_i,

where γ is a parameter controlling the degree of curvature in the logistic shape. If γ = 1, this reverts to the usual logistic growth model. This change is Weibull-like in the sense that we have added a parameter in the exponent of an exponential function, much like the addition of the γ parameter in going from exponential growth to a Weibull growth model. This Weibull-like curvature adjustment to the logistic model has a corresponding differential equation to which it is the solution. This equation has the form:

    dE(y)/dt = γ k t^(γ−1) E(y) [1 − E(y)/M],

where M is the carrying capacity, u is the population size at time 0, k controls the rate of growth, and γ is this curvature piece. When parameterized this way, it turns out that the resulting parameter estimates for k & γ are unstable because they are highly correlated. An alternative, but equivalent, way to parameterize this model is as follows:

    y_i = Mu / {u + (M − u) exp[−(k t_i)^γ]} + ε_i.

This is the form of the model we will use for the remainder of our treatment of this generalized logistic model.

Essentially, γ controls the steepness of the S-shape in the logistic model, where larger values of γ correspond to steeper curves. Since our logistic model for the sheep data appears to not be steep enough, we would expect this parameter to be larger than 1. Fitting this generalized logistic model produced the fitted curve and parameter estimates as given on the next page. Does this seem like a better fit?

[Figure: bighorn sheep counts, 1965-1982, with the generalized logistic fit in overlay]

    M̂ = 207.6399
    û = 67.9731
    k̂ = 0.1380
    γ̂ = 3.4542

The MatLab code used to fit this generalized logistic model is shown below:

% ===================================== %
% Fits a generalized logistic growth    %
% model to the 1965-1982 sheep data,    %
% and plots the fitted model in overlay %
% ===================================== %
beta2 = [221 27 0.3 2];                      % Parameter starting values
[betahat2,resid2,j2] = nlinfit(time,sheep82count,@genlogistic,beta2);
                                             % Performs NLS, returning the parameter
                                             % estimates, residuals, & Jacobian
yhat2 = sheep82count - resid2;               % Computes predicted counts
plot(sheep82year,sheep82count,'ko',sheep82year,yhat2);
                                             % Plots the sheep counts vs year with
                                             % fitted curve in overlay
xlabel('Year','fontsize',14);
ylabel('Number of Bighorn Sheep','fontsize',14);
title('Bighorn Sheep Counts (1965-1982) with Generalized Logistic Fit','fontsize',14,'fontweight','b');

Which of these parameters do you expect to be the least stable (i.e., which do you expect to have the highest variability)?

To get the standard errors of these parameters in θ = (M, u, k, γ), we need to compute the Delta Method approximate variance matrix, given as:

    s²(θ̂) = Var̂(θ̂) = (F̂'F̂)⁻¹ MSE.

To compute the F-matrix, we first compute the partial derivatives of

    f_i(θ) = f(x_i, θ) = Mu / {u + (M − u) exp[−(k t_i)^γ]}

with respect to M, u, k, and γ, respectively. Doing so with repeated use of the product (quotient) rule gives:

    ∂f_i(θ)/∂M = u² {1 − exp[−(k t_i)^γ]} / {u + (M − u) exp[−(k t_i)^γ]}²,

    ∂f_i(θ)/∂u = M² exp[−(k t_i)^γ] / {u + (M − u) exp[−(k t_i)^γ]}²,

    ∂f_i(θ)/∂k = Mu(M − u) γ k^(γ−1) t_i^γ exp[−(k t_i)^γ] / {u + (M − u) exp[−(k t_i)^γ]}²,

    ∂f_i(θ)/∂γ = Mu(M − u) (k t_i)^γ exp[−(k t_i)^γ] log(k t_i) / {u + (M − u) exp[−(k t_i)^γ]}².
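One way to guard against algebra mistakes in derivative formulas like these is to check them against finite differences. The following Python sketch (the notes use MatLab) compares the four analytic partials above to central differences, using the reported fitted values (M = 207.6399, u = 67.9731, k = 0.1380, γ = 3.4542) and an arbitrary time t = 5:

```python
import math

# Fitted values from the generalized logistic fit, and an arbitrary t = 5
M, u, k, g = 207.6399, 67.9731, 0.1380, 3.4542

def f(M, u, k, g, t):
    return M * u / (u + (M - u) * math.exp(-(k * t) ** g))

def analytic_grad(M, u, k, g, t):
    # The four partial derivative formulas stated above
    E = math.exp(-(k * t) ** g)
    den = (u + (M - u) * E) ** 2
    dfdM = u ** 2 * (1 - E) / den
    dfdu = M ** 2 * E / den
    dfdk = M * u * (M - u) * g * k ** (g - 1) * t ** g * E / den
    dfdg = M * u * (M - u) * (k * t) ** g * E * math.log(k * t) / den
    return [dfdM, dfdu, dfdk, dfdg]

# Central differences in each parameter, holding the others fixed
t, h = 5.0, 1e-6
num = [
    (f(M + h, u, k, g, t) - f(M - h, u, k, g, t)) / (2 * h),
    (f(M, u + h, k, g, t) - f(M, u - h, k, g, t)) / (2 * h),
    (f(M, u, k + h, g, t) - f(M, u, k - h, g, t)) / (2 * h),
    (f(M, u, k, g + h, t) - f(M, u, k, g - h, t)) / (2 * h),
]
ana = analytic_grad(M, u, k, g, t)
print(max(abs(a - b) for a, b in zip(ana, num)))   # should be near zero
```

Note that at t = 5 we have kt < 1, so log(kt) < 0 and the γ-derivative is negative there; it changes sign once kt exceeds 1.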

Plugging the least squares estimates θ̂ = (M̂, û, k̂, γ̂) into these derivatives yields the F̂-matrix shown below:

    F̂ = | ∂f(t_1, θ̂)/∂M   ∂f(t_1, θ̂)/∂u   ∂f(t_1, θ̂)/∂k   ∂f(t_1, θ̂)/∂γ |
        | ∂f(t_2, θ̂)/∂M   ∂f(t_2, θ̂)/∂u   ∂f(t_2, θ̂)/∂k   ∂f(t_2, θ̂)/∂γ |
        |       ...              ...              ...              ...      |
        | ∂f(t_17, θ̂)/∂M  ∂f(t_17, θ̂)/∂u  ∂f(t_17, θ̂)/∂k  ∂f(t_17, θ̂)/∂γ |

      = | 0.0001  1.0004     1.2  -0.0969 |
        | 0.0053  1.0161    55.3  -1.9492 |
        | 0.0153  1.0415   153.2  -3.6359 |
        | 0.0371  1.0816   343.9  -5.0945 |
        | 0.0825  1.1242   671.0  -5.0515 |
        | 0.1729  1.1274  1145.9  -1.5678 |
        | 0.3349  1.0106  1629.2   6.4654 |
        | 0.5652  0.7222  1748.7  15.1716 |
        | 0.7904  0.3683  1283.3  16.5367 |
        | 0.9293  0.1276   618.1  10.3187 |
        | 0.9834  0.0303   198.0   3.9937 |
        | 0.9973  0.0050    42.7   0.9987 |
        | 0.9997  0.0006     6.1   0.1615 |
        | 1.0000  0.0000     0.6   0.0165 |
        | 1.0000  0.0000     0.0   0.0010 |
        | 1.0000  0.0000     0.0   0.0000 |
        | 1.0000  0.0000     0.0   0.0000 |

where the actual values were computed in MatLab. Finally, computing MSE (F̂'F̂)⁻¹ to get the variance-covariance matrix of the parameter estimates, extracting the diagonal entries of this 4×4 matrix, and taking square roots yields the standard errors of the parameter estimates. The MatLab code to do this is given below:

% ============================================== %
% Computes the Delta Method parameter estimate   %
% standard errors for a 4-parameter generalized  %
% logistic model                                 %
% ============================================== %
n = length(sheep82year);                     % Sample size
p = length(beta2);                           % Number of model parameters
mse = sum(resid2.^2)/(n-p);                  % Computes the mean squared error (MSE)
Mhat = betahat2(1);
uhat = betahat2(2);
khat = betahat2(3);
ghat = betahat2(4);                          % Renames the four parameter estimates
den = (uhat+(Mhat-uhat)*exp(-(khat*time).^ghat)).^2;
                                             % Denominator of first-order derivatives

dfdM = uhat^2*(1-exp(-(khat*time).^ghat))./den;
                                             % Computes df/dM (nx1 vector)
dfdu = Mhat^2*exp(-(khat*time).^ghat)./den;
                                             % Computes df/du (nx1 vector)
dfdk = Mhat*uhat*(Mhat-uhat)*ghat*khat^(ghat-1)*(time.^ghat).*exp(-(khat*time).^ghat)./den;
                                             % Computes df/dk (nx1 vector)
dfdg = Mhat*uhat*(Mhat-uhat)*((khat*time).^ghat).*log(khat*time).*exp(-(khat*time).^ghat)./den;
                                             % Computes df/dg (nx1 vector)
Fhat = [dfdM dfdu dfdk dfdg];                % Combines derivatives into nx4 matrix
varmat = inv(Fhat'*Fhat)*mse;                % Computes inv(F'F)*MSE
separ = sqrt(diag(varmat));                  % Standard errors are the square roots of
                                             % the diagonal elements of varmat
tstar = tinv(1-.025/p,n-p);                  % Computes the Bonferroni t-value
ci = [betahat2'-tstar*separ betahat2'+tstar*separ];
                                             % Computes Bonferroni CIs for the four parameters

betahat2 =
    207.640   67.973    0.1380    3.4542

separ =
    4.178     7.166     0.0091    0.8093

This provides approximate Bonferroni-adjusted simultaneous 95% confidence intervals for the four parameters (using t_{1−.025/4}(n − p) = t_{.99375}(17 − 4) = tinv(.99375,13) = 2.8961) as:

    M̂ ± t_{.99375}(n−p) SE(M̂) = 207.640 ± 2.8961(4.178)  = (195.54, 219.74)
    û ± t_{.99375}(n−p) SE(û) = 67.973 ± 2.8961(7.166)  = (47.22, 88.73)
    k̂ ± t_{.99375}(n−p) SE(k̂) = 0.1380 ± 2.8961(0.0091) = (0.1116, 0.1645)
    γ̂ ± t_{.99375}(n−p) SE(γ̂) = 3.4542 ± 2.8961(0.8093) = (1.11, 5.80)

In MatLab, the Delta Method of computing standard errors for parameter estimates is the default technique

and can be easily obtained after calling the nlinfit function, by typing:

ci2 = nlparci(betahat2,resid2,j2);           % Computes NLS Delta Method CIs

This returns approximate 95% individual confidence intervals for the four model parameters based on the Delta Method standard errors. We will next explore the use of bootstrapping as an alternative to using Delta Method-based standard errors. Both techniques have their problems, and both consider parameter uncertainty for each parameter separately rather than jointly. Later, we will briefly explore the use of Markov chain Monte Carlo (MCMC) methods, which account for the dependence among model parameters in measuring uncertainty.
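As a closing arithmetic check, the Bonferroni intervals above can be reproduced directly from the estimates and standard errors (a Python sketch; the t-value 2.8961 is the tinv(.99375,13) value reported above):

```python
# Reproduce the Bonferroni-adjusted intervals: estimate +/- t * SE
estimates = [207.640, 67.973, 0.1380, 3.4542]   # M, u, k, gamma
ses       = [4.178,   7.166,  0.0091, 0.8093]
tstar     = 2.8961                               # t_.99375(13), from the notes

cis = [(est - tstar * se, est + tstar * se) for est, se in zip(estimates, ses)]
for (lo, hi), name in zip(cis, ["M", "u", "k", "gamma"]):
    print(f"{name}: ({lo:.4f}, {hi:.4f})")
```

The wide interval for γ, roughly (1.11, 5.80), reflects that γ is the least stable of the four parameters, consistent with the earlier question about which estimate should have the highest variability.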