Regression Models for Time Series Analysis

Benjamin Kedem (University of Maryland, College Park, MD) and Konstantinos Fokianos (University of Cyprus, Nicosia, Cyprus)

Wiley, New York, 2002
References

Cox (1975). Partial likelihood. Biometrika, 62, 69-76.
Fahrmeir and Tutz (2001). Multivariate Statistical Modelling Based on GLM, 2nd ed. Springer, New York.
Fokianos (1996). Categorical Time Series: Prediction and Control. Ph.D. Thesis, University of Maryland.
Fokianos and Kedem (1998). Prediction and classification of non-stationary categorical time series. Journal of Multivariate Analysis, 67, 277-296.
Kedem (1980). Binary Time Series. Marcel Dekker, New York.
Kedem and Fokianos (2002). Regression Models for Time Series Analysis. Wiley, New York.
McCullagh and Nelder (1989). Generalized Linear Models, 2nd ed. Chapman & Hall, London.
Nelder and Wedderburn (1972). Generalized linear models. Journal of the Royal Statistical Society, Series A, 135, 370-384.
Slud (1982). Consistency and efficiency of inference with the partial likelihood. Biometrika, 69, 547-552.
Slud and Kedem (1994). Partial likelihood analysis of logistic regression and autoregression. Statistica Sinica, 4, 89-106.
Wong (1986). Theory of partial likelihood. Annals of Statistics, 14, 88-123.
Part I: GLM and Time Series Overview

Extension of the GLM of Nelder and Wedderburn (1972) and McCullagh and Nelder (1989) to time series is possible due to:
- An increasing sequence of histories relative to an observer.
- Partial likelihood.
- The partial score is a martingale.
- Well behaved covariates.
Partial Likelihood

Suppose we observe a pair of jointly distributed time series (X_t, Y_t), t = 1, ..., N, where {Y_t} is a response series and {X_t} is a time dependent random covariate. Employing the rules of conditional probability, the joint density of all the X, Y observations can be expressed as

f_θ(x_1, y_1, ..., x_N, y_N) = f_θ(x_1) ∏_{t=2}^{N} f_θ(x_t | d_t) ∏_{t=1}^{N} f_θ(y_t | c_t),   (1)

where

d_t = (y_1, x_1, ..., y_{t-1}, x_{t-1}),
c_t = (y_1, x_1, ..., y_{t-1}, x_{t-1}, x_t).

The second product on the right hand side of (1) constitutes a partial likelihood according to Cox (1975).
Consider an increasing sequence of σ-fields F_0 ⊂ F_1 ⊂ F_2 ⊂ ..., and let Y_1, Y_2, ... be a sequence of random variables on some common probability space such that Y_t is F_t measurable, with

Y_t | F_{t-1} ~ f_t(y_t; θ),

where θ ∈ R^p is a fixed parameter. The partial likelihood (PL) function relative to θ, F_t, and the data Y_1, Y_2, ..., Y_N, is given by the product

PL(θ; y_1, ..., y_N) = ∏_{t=1}^{N} f_t(y_t; θ).   (2)
The General Regression Problem

{Y_t} is a response time series with corresponding p-dimensional covariate process Z_{t-1} = (Z_{(t-1)1}, ..., Z_{(t-1)p})'. Define

F_{t-1} = σ{Y_{t-1}, Y_{t-2}, ..., Z_{t-1}, Z_{t-2}, ...}.

Note: Z_{t-1} may already include past Y_t's.

The conditional expectation of the response given the past is

µ_t = E[Y_t | F_{t-1}].

The problem is to relate µ_t to the covariates.
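As a concrete illustration (not from the slides), here is a minimal sketch, with made-up placeholder data, of how the covariate vector Z_{t-1} can be assembled from lagged responses and an external covariate so that every row is determined by the history F_{t-1}:

```python
# Minimal sketch (placeholder data): build rows Z_{t-1} from lagged values
# so that each row is known at time t-1, i.e. F_{t-1}-measurable.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
N = 200
x = rng.normal(size=N)                           # external covariate X_t
y = rng.poisson(lam=2.0, size=N).astype(float)   # placeholder response Y_t

df = pd.DataFrame({"y": y, "x": x})
df["y_lag1"] = df["y"].shift(1)                  # Y_{t-1}
df["y_lag2"] = df["y"].shift(2)                  # Y_{t-2}
df["x_lag1"] = df["x"].shift(1)                  # X_{t-1}
df = df.dropna()                                 # drop rows with undefined lags

Z = df[["y_lag1", "y_lag2", "x_lag1"]].to_numpy()  # rows are Z_{t-1}
response = df["y"].to_numpy()                      # aligned responses Y_t
```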
Time Series Following GLM

1. Random Component. The conditional distribution of the response given the past belongs to the exponential family of distributions in natural or canonical form,

f(y_t; θ_t, φ | F_{t-1}) = exp{ [y_t θ_t − b(θ_t)] / α_t(φ) + c(y_t; φ) },   (3)

where α_t(φ) = φ/ω_t, with dispersion parameter φ and prior weight ω_t.

2. Systematic Component. There is a monotone function g(·) such that

g(µ_t) = η_t = ∑_{j=1}^{p} β_j Z_{(t-1)j} = Z'_{t-1} β.   (4)

g(·) is the link function and η_t is the linear predictor of the model.
Typical choices for η_t = Z'_{t-1} β could be

β_0 + β_1 Y_{t-1} + β_2 Y_{t-2} + β_3 X_t cos(ω_0 t),
or
β_0 + β_1 Y_{t-1} + β_2 Y_{t-2} + β_3 Y_{t-1} X_t + β_4 X_{t-1},
or
β_0 + β_1 Y_{t-1} + β_2 Y_{t-2}^7 + β_3 Y_{t-1} log(X_{t-12}).
GLM Equations: Differentiating the identity ∫ f(y; θ_t, φ | F_{t-1}) dy = 1 with respect to θ_t implies the relationships

µ_t = E[Y_t | F_{t-1}] = b'(θ_t),   (5)
Var[Y_t | F_{t-1}] = α_t(φ) b''(θ_t) ≡ α_t(φ) V(µ_t).   (6)

Since Var[Y_t | F_{t-1}] > 0, it follows that b' is monotone. Therefore, equation (5) implies that

θ_t = (b')^{-1}(µ_t).   (7)

We see that θ_t itself is a monotone function of µ_t and hence it can be used to define a link function. The link function

g(µ_t) = θ_t(µ_t) = η_t = Z'_{t-1} β   (8)

is called the canonical link function.
Example: Poisson Time Series.

f(y_t; θ_t, φ | F_{t-1}) = exp{ (y_t log µ_t − µ_t) − log y_t! }.

Here E[Y_t | F_{t-1}] = µ_t, b(θ_t) = exp(θ_t) = µ_t, V(µ_t) = µ_t, φ = 1, and ω_t = 1. The canonical link is

g(µ_t) = θ_t(µ_t) = log µ_t = η_t = Z'_{t-1} β.

As an example, if Z_{t-1} = (1, X_t, Y_{t-1})', then

log µ_t = β_0 + β_1 X_t + β_2 Y_{t-1},

with {X_t} standing for some covariate process, or a possible trend, or a possible seasonal component.
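A sketch of computing the MPLE for this example on simulated data (the coefficient values below are illustrative assumptions of mine, not from the slides): with the canonical log link, maximizing the partial likelihood reduces to an ordinary Poisson GLM fit applied to the lagged design, here via statsmodels.

```python
# Sketch: simulate log(mu_t) = b0 + b1*X_t + b2*Y_{t-1} and recover (b0,b1,b2)
# by a Poisson GLM fit on the lagged design, which equals the MPLE here.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
N = 500
beta = np.array([0.3, 0.3, 0.05])     # illustrative true values (kept small for stability)
x = rng.normal(size=N)
y = np.zeros(N)
y[0] = 1.0
for t in range(1, N):
    mu_t = np.exp(beta[0] + beta[1] * x[t] + beta[2] * y[t - 1])
    y[t] = rng.poisson(mu_t)

Z = sm.add_constant(np.column_stack([x[1:], y[:-1]]))   # rows Z_{t-1} = (1, X_t, Y_{t-1})
fit = sm.GLM(y[1:], Z, family=sm.families.Poisson()).fit()
print(fit.params)                      # MPLE, roughly (0.3, 0.3, 0.05)
```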
Example: Binary Time Series

{Y_t} takes the values 0, 1. Let

π_t = P(Y_t = 1 | F_{t-1}).

Then

f(y_t; θ_t, φ | F_{t-1}) = exp{ y_t log[π_t/(1 − π_t)] + log(1 − π_t) }.

The canonical link gives the logistic regression model

g(π_t) = θ_t(π_t) = log[π_t/(1 − π_t)] = η_t = Z'_{t-1} β.   (9)

Note:

π_t = F_l(η_t),   (10)

where F_l denotes the standard logistic cdf.
Partial Likelihood Inference

Given a time series {Y_t}, t = 1, ..., N, conditionally distributed as in (3), the partial likelihood of the observed series is

PL(β) = ∏_{t=1}^{N} f(y_t; θ_t, φ | F_{t-1}).   (11)

Then from (3), the log partial likelihood l(β) is given by

l(β) = ∑_{t=1}^{N} log f(y_t; θ_t, φ | F_{t-1}) = ∑_{t=1}^{N} l_t
     = ∑_{t=1}^{N} { [y_t θ_t − b(θ_t)] / α_t(φ) + c(y_t, φ) }
     = ∑_{t=1}^{N} { [y_t u(Z'_{t-1} β) − b(u(Z'_{t-1} β))] / α_t(φ) + c(y_t, φ) },   (12)

where u = (b')^{-1} ∘ g^{-1}, so that θ_t = u(Z'_{t-1} β) by (4) and (7).
Let ∇ = (∂/∂β_1, ∂/∂β_2, ..., ∂/∂β_p)'. The partial score is the p-dimensional vector

S_N(β) ≡ ∇ l(β) = ∑_{t=1}^{N} Z_{t-1} (∂µ_t/∂η_t) [Y_t − µ_t(β)] / σ_t²(β),   (13)

with σ_t²(β) = Var[Y_t | F_{t-1}]. The partial score vector process {S_t(β)}, t = 1, ..., N, is defined from the partial sums

S_t(β) = ∑_{s=1}^{t} Z_{s-1} (∂µ_s/∂η_s) [Y_s − µ_s(β)] / σ_s²(β).   (14)
The solution of the score equation

S_N(β) = ∇ log PL(β) = 0   (15)

is denoted by β̂ and is referred to as the maximum partial likelihood estimator (MPLE) of β. The system of equations (15) is non-linear and is customarily solved by the Fisher scoring method, an iterative algorithm resembling the Newton-Raphson procedure. Before turning to the Fisher scoring algorithm in our context of conditional inference, it is necessary to introduce several important matrices.
An important role in partial likelihood inference is played by the cumulative conditional information matrix G_N(β), defined as a sum of conditional covariance matrices:

G_N(β) = ∑_{t=1}^{N} Cov[ Z_{t-1} (∂µ_t/∂η_t) (Y_t − µ_t(β)) / σ_t²(β) | F_{t-1} ]
       = ∑_{t=1}^{N} Z_{t-1} (∂µ_t/∂η_t)² [1/σ_t²(β)] Z'_{t-1}
       = Z' W(β) Z.

We also need H_N(β) ≡ −∇∇' l(β). Define R_N(β) from the difference

H_N(β) = G_N(β) − R_N(β).

Fact: For canonical links, R_N(β) = 0.
Fisher Scoring: In Newton-Raphson replace H_N(β) by its conditional expectation:

β̂^{(k+1)} = β̂^{(k)} + G_N^{-1}(β̂^{(k)}) S_N(β̂^{(k)}).

Fisher scoring becomes Newton-Raphson for canonical links. Fisher scoring simplifies to iterative reweighted least squares (IRLS):

β̂^{(k+1)} = (Z' W(β̂^{(k)}) Z)^{-1} Z' W(β̂^{(k)}) q^{(k)},

where q^{(k)} is the working response vector at iteration k.
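A minimal sketch of the update, mine rather than the book's code, specialized to the canonical-link Poisson case of the earlier example: there ∂µ_t/∂η_t = σ_t²(β) = µ_t, so the score is S_N(β) = Σ_t Z_{t-1}(Y_t − µ_t) and G_N(β) = Σ_t µ_t Z_{t-1} Z'_{t-1}. It could be applied, for instance, to the Z and y arrays from the Poisson sketch above.

```python
# Sketch: Fisher scoring for a canonical-link Poisson model.
import numpy as np

def fisher_scoring_poisson(Z, y, n_iter=25, tol=1e-10):
    """Z: (N, p) matrix of rows Z_{t-1}; y: (N,) responses Y_t."""
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        mu = np.exp(Z @ beta)                 # mu_t(beta)
        score = Z.T @ (y - mu)                # S_N(beta)
        info = Z.T @ (mu[:, None] * Z)        # G_N(beta) = Z' W(beta) Z
        step = np.linalg.solve(info, score)
        beta = beta + step                    # beta <- beta + G_N^{-1} S_N
        if np.max(np.abs(step)) < tol:
            break
    return beta
```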
Asymptotic Theory

Assumption A

A1. The true parameter β belongs to an open set B ⊆ R^p.

A2. The covariate vector Z_{t-1} almost surely lies in a nonrandom compact subset Γ of R^p, such that P[∑_{t=1}^{N} Z_{t-1} Z'_{t-1} > 0] = 1. In addition, Z'_{t-1} β lies almost surely in the domain H of the inverse link function h = g^{-1} for all Z_{t-1} ∈ Γ and β ∈ B.

A3. The inverse link function h, defined in (A2), is twice continuously differentiable and ∂h(γ)/∂γ ≠ 0.
A4. There is a probability measure ν on R^p such that ∫_{R^p} z z' ν(dz) is positive definite, and such that under (3) and (4), for Borel sets A ⊂ R^p,

(1/N) ∑_{t=1}^{N} I_{[Z_{t-1} ∈ A]} → ν(A)

in probability as N → ∞, at the true value of β.

A4 calls for asymptotically well behaved covariates:

(1/N) ∑_{t=1}^{N} f(Z_{t-1}) → ∫_{R^p} f(z) ν(dz)

in probability as N → ∞. Thus, there exists a p × p limiting information matrix per observation, G(β), such that

G_N(β)/N → G(β)   (16)

in probability, as N → ∞.
Slud and Kedem (1994), Fokianos and Kedem (1998):

1. {S_t(β)} relative to {F_t}, t = 1, ..., N, is a martingale.
2. R_N(β)/N → 0 in probability.
3. S_N(β)/√N → N_p(0, G(β)) in distribution.
4. √N(β̂ − β) → N_p(0, G^{-1}(β)) in distribution.
100(1 − α)% prediction interval (h = g^{-1}):

µ_t(β) ∈ µ_t(β̂) ± z_{α/2} h'(Z'_{t-1} β) √( Z'_{t-1} G^{-1}(β) Z_{t-1} / N ).

Hypothesis Testing

Let β̃ be the MPLE of β obtained under H_0 with r < p restrictions, H_0: β_1 = ... = β_r = 0, and let β̂ be the unrestricted MPLE. The log partial likelihood ratio statistic

λ_N = 2{ l(β̂) − l(β̃) }   (17)

converges in distribution to χ²_r.
More generally, assume C is a known r × p matrix of full rank r, r < p. Under the general linear hypothesis

H_0: Cβ = β_0 against H_1: Cβ ≠ β_0,   (18)

the Wald statistic

N (Cβ̂ − β_0)' { C G^{-1}(β̂) C' }^{-1} (Cβ̂ − β_0)

converges in distribution to χ²_r.
Diagnostics

l(y; y): maximum log partial likelihood corresponding to the saturated model.
l(µ̂; y): maximum log partial likelihood from the reduced model.

Scaled Deviance: D ≡ 2{ l(y; y) − l(µ̂; y) } ≈ χ²_{N−p}.

AIC(p) = −2 log PL(β̂) + 2p,    BIC(p) = −2 log PL(β̂) + p log N.
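A short sketch of these diagnostics in code, assuming a fitted statsmodels GLM results object such as `fit` from the Poisson sketch earlier (the function name is mine):

```python
# Sketch: deviance, AIC and BIC as defined above, from a fitted GLM results object.
import numpy as np

def glm_ic(fit):
    """fit: a statsmodels GLM results object, e.g. from the Poisson sketch."""
    p = fit.params.size
    n = fit.nobs
    aic = -2.0 * fit.llf + 2.0 * p            # AIC(p) = -2 log PL(beta_hat) + 2p
    bic = -2.0 * fit.llf + p * np.log(n)      # BIC(p) = -2 log PL(beta_hat) + p log N
    return fit.deviance, aic, bic             # D = 2{l(y; y) - l(mu_hat; y)}
```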
Analysis of Mortality Count Data in LA

Weekly data from Los Angeles County during a period of 10 years, from January 1, 1970, to December 31, 1979: weekly sampled filtered time series, N = 508.

Response:   Y    Total mortality (filtered)
Weather:    T    Temperature
            RH   Relative humidity
Pollution:  CO   Carbon monoxide
            SO2  Sulfur dioxide
            NO2  Nitrogen dioxide
            HC   Hydrocarbons
            OZ   Ozone
            KM   Particulates
[Figure: Weekly data of filtered total mortality and temperature, and log-filtered CO, with the corresponding estimated autocorrelation functions. N = 508.]
Covariates and η_t used in Poisson regression (S = SO2, N = NO2). To recover η_t, insert the β's; for Model 2, η_t = β_0 + β_1 Y_{t-1} + β_2 Y_{t-2}, etc.

Model 0: T_t + RH_t + CO_t + S_t + N_t + HC_t + OZ_t + KM_t
Model 1: Y_{t-1}
Model 2: Y_{t-1} + Y_{t-2}
Model 3: Y_{t-1} + Y_{t-2} + T_{t-1}
Model 4: Y_{t-1} + Y_{t-2} + T_{t-1} + log(CO_t)
Model 5: Y_{t-1} + Y_{t-2} + T_{t-1} + T_{t-2} + log(CO_t)
Model 6: Y_{t-1} + Y_{t-2} + T_t + T_{t-1} + log(CO_t)
Comparison of 7 Poisson regression models, N = 508.

Model  p  D       df   AIC     BIC
0      9  315.69  499  333.69  371.76
1      2  276.07  506  280.07  288.53
2      3  222.23  505  228.23  240.92
3      4  203.52  504  211.52  228.44
4      5  174.55  503  184.55  205.71
5      6  174.53  502  186.53  211.91
6      6  171.41  502  183.41  208.79

Choose Model 4:

log(µ̂_t) = β̂_0 + β̂_1 Y_{t-1} + β̂_2 Y_{t-2} + β̂_3 T_{t-1} + β̂_4 log(CO_t).
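A hypothetical sketch of how Model 4 could be fit; `mort`, `temp`, and `co` are placeholder arrays standing in for the filtered weekly series (the data themselves are not reproduced here), and the index arithmetic implements the lag alignment in the model formula.

```python
# Sketch: Poisson regression of Y_t on (1, Y_{t-1}, Y_{t-2}, T_{t-1}, log CO_t).
# mort, temp, co: placeholder 1-d arrays of equal length N for the weekly series.
import numpy as np
import statsmodels.api as sm

def fit_model4(mort, temp, co):
    y = mort[2:]                                  # Y_t for t = 3, ..., N
    Z = sm.add_constant(np.column_stack([
        mort[1:-1],                               # Y_{t-1}
        mort[:-2],                                # Y_{t-2}
        temp[1:-1],                               # T_{t-1}
        np.log(co[2:]),                           # log(CO_t)
    ]))
    return sm.GLM(y, Z, family=sm.families.Poisson()).fit()
```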
[Figure: Poisson regression of filtered LA mortality, Mrt(t) on Mrt(t-1), Mrt(t-2), Tmp(t-1), log(Crb(t)): observed and predicted weekly filtered total mortality from Model 4.]
Comparison of Residuals

[Figure: Working residuals from Models 0 and 4, and their respective estimated autocorrelation functions.]
Part II: Binary Time Series

Example of a binary time series (bts): two categories obtained by clipping,

Y_t ≡ I_{[X_t ∈ C]} = 1 if X_t ∈ C, 0 if X_t ∉ C,   (19)

Y_t ≡ I_{[X_t ≥ r]} = 1 if X_t ≥ r, 0 if X_t < r.   (20)

Other examples, at time t = 1, 2, ...: (Rain, No Rain), (S&P Up, S&P Down), etc.
{Y_t} takes the values 0 or 1, t = 1, 2, 3, ..., and {Z_{t-1}} is a p-dimensional covariate process. Against the backdrop of the general framework presented above, we wish to relate

µ_t(β) = π_t(β) = P_β(Y_t = 1 | F_{t-1})   (21)

to the covariates. For this we need good links!
Standard logistic distribution:

F_l(x) = e^x / (1 + e^x) = 1 / (1 + e^{−x}),  −∞ < x < ∞.

Then F_l^{-1}(x) = log(x/(1 − x)) is the natural link under some conditions.
Fact: For any bts there are θ_j such that

log[ P(Y_t = 1 | Y_{t-1} = y_{t-1}, ..., Y_1 = y_1) / P(Y_t = 0 | Y_{t-1} = y_{t-1}, ..., Y_1 = y_1) ] = θ_0 + θ_1 y_{t-1} + ... + θ_p y_{t-p},

or

π_t(β) = 1 / (1 + exp[−(θ_0 + θ_1 y_{t-1} + ... + θ_p y_{t-p})]).

Fact: Consider an AR(p) time series

X_t = γ_0 + γ_1 X_{t-1} + ... + γ_p X_{t-p} + λ ε_t,

where the ε_t are i.i.d. logistically distributed. Define Y_t = I_{[X_t ≥ r]}. Then

π_t(β) = 1 / (1 + exp[−(γ_0 − r + γ_1 X_{t-1} + ... + γ_p X_{t-p})/λ]).
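A small simulation check of the second Fact, with illustrative parameter values of my own: clip an AR(1) with i.i.d. standard logistic noise at a threshold r and compare the empirical conditional probability of Y_t = 1 with the stated logistic form.

```python
# Sketch: empirical check that clipping an AR(1) with logistic noise at r
# gives P(Y_t = 1 | X_{t-1}) = 1 / (1 + exp(-(g0 - r + g1*X_{t-1}) / lam)).
import numpy as np

rng = np.random.default_rng(2)
N = 100_000
g0, g1, lam, r = 0.2, 0.5, 1.0, 0.3          # illustrative values
x = np.zeros(N)
for t in range(1, N):
    x[t] = g0 + g1 * x[t - 1] + lam * rng.logistic()
y = (x >= r).astype(int)                      # Y_t = I[X_t >= r]

xs = x[:-1]                                   # X_{t-1}, aligned with y[1:]
pi_theory = 1.0 / (1.0 + np.exp(-(g0 - r + g1 * xs) / lam))
mask = np.abs(xs) < 0.05                      # condition on X_{t-1} near 0
print(y[1:][mask].mean(), pi_theory[mask].mean())   # the two should be close
```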
This motivates logistic regression:

π_t(β) ≡ P_β(Y_t = 1 | F_{t-1}) = F_l(β' Z_{t-1}) = 1 / (1 + exp[−β' Z_{t-1}]),

or equivalently, the link function is

logit(π_t(β)) ≡ log[ π_t(β) / (1 − π_t(β)) ] = β' Z_{t-1}.

This is the canonical link.
Link functions for binary time series:

logit      β' Z_{t-1} = log{ π_t(β) / (1 − π_t(β)) }
probit     β' Z_{t-1} = Φ^{-1}{ π_t(β) }
log-log    β' Z_{t-1} = −log{ −log(π_t(β)) }
C-log-log  β' Z_{t-1} = log{ −log(1 − π_t(β)) }

Note: Here all the inverse links are cdf's. In what follows we always assume the inverse link is a differentiable cdf F(x).
The partial likelihood of β takes on the simple product form

PL(β) = ∏_{t=1}^{N} [π_t(β)]^{y_t} [1 − π_t(β)]^{1−y_t}
      = ∏_{t=1}^{N} [F(β' Z_{t-1})]^{y_t} [1 − F(β' Z_{t-1})]^{1−y_t}.

Under Assumption A we have

√N(β̂ − β) → N_p(0, G^{-1}(β)).

For the canonical link (logistic regression),

G_N(β)/N → G(β) = ∫_{R^p} [ e^{β'z} / (1 + e^{β'z})² ] z z' ν(dz).
Illustration of asymptotic normality. Consider

logit(π_t(β)) = β_1 + β_2 cos(2πt/12) + β_3 Y_{t-1},

so that Z_{t-1} = (1, cos(2πt/12), Y_{t-1})'.

[Figure: Logistic autoregression with a sinusoidal component. (a) Y_t. (b) π_t(β), where logit(π_t(β)) = 0.3 + 0.75 cos(2πt/12) + y_{t-1}.]
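A sketch of how a series like the one in the figure can be simulated with the stated values β = (0.3, 0.75, 1)', followed by the MPLE: with the canonical logit link, the MPLE is an ordinary logistic regression fit on the lagged design (statsmodels' Logit is used here as the optimizer).

```python
# Sketch: simulate logit(pi_t) = 0.3 + 0.75*cos(2*pi*t/12) + 1.0*Y_{t-1},
# then compute the MPLE by logistic regression on Z_{t-1} = (1, cos(2*pi*t/12), Y_{t-1}).
import numpy as np
import statsmodels.api as sm

def simulate(n, beta=(0.3, 0.75, 1.0), seed=0):
    rng = np.random.default_rng(seed)
    y = np.zeros(n, dtype=int)
    for t in range(1, n):
        eta = beta[0] + beta[1] * np.cos(2 * np.pi * t / 12) + beta[2] * y[t - 1]
        y[t] = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))
    return y

n = 200
y = simulate(n)
t = np.arange(1, n)
Z = np.column_stack([np.ones(n - 1), np.cos(2 * np.pi * t / 12), y[:-1]])  # rows Z_{t-1}
mple = sm.Logit(y[1:], Z).fit(disp=0)
print(mple.params)          # roughly (0.3, 0.75, 1) up to sampling variability
```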
[Figure: Histograms of normalized MPLE's of β_1, β_2, β_3, where β = (0.3, 0.75, 1)' and N = 200. Each histogram consists of 1000 estimates.]
Goodness of Fit

Let C_1, ..., C_k be a partition of R^p. For j = 1, ..., k, define

M_j ≡ ∑_{t=1}^{N} I_{[Z_{t-1} ∈ C_j]} Y_t   and   E_j(β) ≡ ∑_{t=1}^{N} I_{[Z_{t-1} ∈ C_j]} π_t(β).

Put

M ≡ (M_1, ..., M_k)',   E(β) ≡ (E_1(β), ..., E_k(β))'.
Slud and Kedem (1994), Kedem and Fokianos (2002): With

σ_j² ≡ ∫_{C_j} F(β'z)(1 − F(β'z)) ν(dz),

χ²(β) ≡ (1/N) ∑_{j=1}^{k} (M_j − E_j(β))² / σ_j² → χ²_k.

In practice one needs to adjust the degrees of freedom when replacing β by β̂.
When β̂ and (M − E(β)) are obtained from the same data set,

E(χ²(β̂)) ≈ k − ∑_{j=1}^{k} (B' G(β) B)_{jj} / σ_j².

When β̂ and (M − E(β)) are obtained from independent data sets,

E(χ²(β̂)) ≈ k + ∑_{j=1}^{k} (B' G(β) B)_{jj} / σ_j².
Illustration of the distribution of χ²(β) using Q-Q plots. Consider the previous logistic regression model with a periodic component. Use the partition

C_1 = {Z : Z_1 = 1, −1 ≤ Z_2 < 0, Z_3 = 0}
C_2 = {Z : Z_1 = 1, −1 ≤ Z_2 < 0, Z_3 = 1}
C_3 = {Z : Z_1 = 1, 0 ≤ Z_2 ≤ 1, Z_3 = 0}
C_4 = {Z : Z_1 = 1, 0 ≤ Z_2 ≤ 1, Z_3 = 1}

Then k = 4, M_j is the sum of those Y_t's for which Z_{t-1} is in C_j, j = 1, 2, 3, 4, and the E_j(β) are obtained similarly. Estimate σ_j² by

σ̂_j² = (1/N) ∑_{t=1}^{N} I_{[Z_{t-1} ∈ C_j]} π_t(β)(1 − π_t(β)).
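A sketch (my own illustration) of this statistic for the four-cell partition: it takes a design matrix with rows Z_{t-1} = (1, cos(2πt/12), Y_{t-1}), the responses, and fitted probabilities, for example from the simulation sketch above; the degrees-of-freedom adjustment mentioned earlier is not applied here.

```python
# Sketch: chi^2(beta) = (1/N) * sum_j (M_j - E_j)^2 / sigma_j^2 for the 4-cell partition.
import numpy as np

def chi_sq_stat(Z, y, pi_hat):
    N = len(y)
    cells = [
        (Z[:, 1] < 0) & (Z[:, 2] == 0),       # C1: -1 <= Z2 < 0, Z3 = 0
        (Z[:, 1] < 0) & (Z[:, 2] == 1),       # C2: -1 <= Z2 < 0, Z3 = 1
        (Z[:, 1] >= 0) & (Z[:, 2] == 0),      # C3:  0 <= Z2 <= 1, Z3 = 0
        (Z[:, 1] >= 0) & (Z[:, 2] == 1),      # C4:  0 <= Z2 <= 1, Z3 = 1
    ]
    stat = 0.0
    for in_cell in cells:
        M_j = y[in_cell].sum()                                    # observed count in C_j
        E_j = pi_hat[in_cell].sum()                               # expected count in C_j
        sigma2_j = (pi_hat[in_cell] * (1 - pi_hat[in_cell])).sum() / N
        stat += (M_j - E_j) ** 2 / sigma2_j
    return stat / N

# e.g. chi_sq_stat(Z, y[1:], mple.predict(Z)) with the objects from the sketch above
```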
The χ²_4 approximation is quite good.

[Figure: Q-Q plots of the statistic against the χ²_4 distribution, for N = 200 with β = (0.3, 0.75, 1)' and for N = 400 with β = (0.3, 1, 2)'.]
Example: Modeling successive eruptions of the Old Faithful geyser in Yellowstone National Park, Wyoming.

Y_t = 1 if the duration of the t-th eruption is greater than 3 minutes, and Y_t = 0 if it is less than 3 minutes. N = 299:

101110110101011010110101010
111110101010101010101010101
011111010101011010111011111
011101010101010101010101010
101101010101011101111111011
111011111110101010101011111
101010101110101011010111101
010101110101011011011101010
101101111111010101111011011
101101011101011111011101010
110101111111101010101010101
10
Candidate models η_t = β' Z_{t-1} for Old Faithful:

1: β_0 + β_1 Y_{t-1}
2: β_0 + β_1 Y_{t-1} + β_2 Y_{t-2}
3: β_0 + β_1 Y_{t-1} + β_2 Y_{t-2} + β_3 Y_{t-3}
4: β_0 + β_1 Y_{t-1} + β_2 Y_{t-2} + β_3 Y_{t-3} + β_4 Y_{t-4}

Comparison of models using the logistic link (Model 2 is best; probit regression gives similar results):

Model  p  X²      D       AIC     BIC
1      2  165.00  227.38  231.38  238.46
2      3  165.00  215.53  221.53  232.15
3      4  165.00  215.08  223.08  237.24
4      5  164.97  213.99  223.99  241.69

The fitted Model 2 is

π̂_t = π_t(β̂) = 1 / (1 + exp{ −(β̂_0 + β̂_1 Y_{t-1} + β̂_2 Y_{t-2}) }).
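A small tabulation sketch, not from the book: it parses the 0/1 eruption sequence and computes the empirical P(Y_t = 1 | Y_{t-1}, Y_{t-2}), the quantity that Model 2 describes through the logistic link. The string `seq` below is a short placeholder standing in for the full N = 299 series listed above.

```python
# Sketch: empirical conditional frequencies P(Y_t = 1 | Y_{t-1}, Y_{t-2}).
from collections import Counter

seq = "101110110101011010110101010"          # placeholder: paste the full 299-digit series
y = [int(c) for c in seq]

counts = Counter()
ones = Counter()
for t in range(2, len(y)):
    key = (y[t - 1], y[t - 2])               # (Y_{t-1}, Y_{t-2})
    counts[key] += 1
    ones[key] += y[t]

for key in sorted(counts):
    print(key, ones[key] / counts[key])      # empirical P(Y_t = 1 | Y_{t-1}, Y_{t-2})
```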
Part III: Categorical Time Series

EEG sleep state classified or quantized in four categories as follows: 1 = quiet sleep, 2 = indeterminate sleep, 3 = active sleep, 4 = awake. Here the sleep state categories or levels are assigned integer values. This is an example of a categorical time series {Y_t}, t = 1, ..., N, taking the values 1, ..., 4.

This is an arbitrary integer assignment. Why not the values 7.1, 15.8, 19.24, 71.17? Any other scale?
Assume m categories. The t-th observation of any categorical time series, regardless of the measurement scale, can be represented by the vector Y_t = (Y_{t1}, ..., Y_{tq})' of length q = m − 1, with elements

Y_tj = 1 if the j-th category is observed at time t, and Y_tj = 0 otherwise,

for t = 1, ..., N and j = 1, ..., q. A binary time series is the special case with m = 2, q = 1.
Write, for j = 1, ..., q,

π_tj = E[Y_tj | F_{t-1}] = P(Y_tj = 1 | F_{t-1}),   π_t = (π_t1, ..., π_tq)'.

Define

Y_tm = 1 − ∑_{j=1}^{q} Y_tj,   π_tm = 1 − ∑_{j=1}^{q} π_tj.

Let {Z_{t-1}}, t = 1, ..., N, be a p × q matrix-valued process that represents the covariates: Y_tj corresponds to a vector of length p of random time dependent covariates which forms the j-th column of Z_{t-1}.
Assume the general regression model

π_t(β) = (π_t1(β), π_t2(β), ..., π_tq(β))' = (h_1(Z'_{t-1} β), h_2(Z'_{t-1} β), ..., h_q(Z'_{t-1} β))' = h(Z'_{t-1} β).   (*)

The inverse link function h is defined on R^q and takes values in R^q. We shall only examine nominal and ordinal time series.
Nominal Time Series. Nominal categorical variables lack natural ordering. A model for nominal time series is the multinomial logit model:

π_tj(β) = exp(β_j' z_{t-1}) / [ 1 + ∑_{l=1}^{q} exp(β_l' z_{t-1}) ],   j = 1, ..., q.

Note that

π_tm(β) = 1 / [ 1 + ∑_{l=1}^{q} exp(β_l' z_{t-1}) ].
The multinomial logit model is a special case of (*). Indeed, with z_{t-1} a d-dimensional covariate vector, define β to be the qd-dimensional vector β = (β_1', ..., β_q')' (so p = qd), and Z_{t-1} the qd × q block-diagonal matrix with z_{t-1} repeated along the diagonal:

Z_{t-1} = diag(z_{t-1}, z_{t-1}, ..., z_{t-1}).

Let h stand for the vector valued function whose components h_j, j = 1, ..., q, are given by

π_tj(β) = h_j(η_t) = exp(η_tj) / [ 1 + ∑_{l=1}^{q} exp(η_tl) ],   j = 1, ..., q,

with η_t = (η_t1, ..., η_tq)' = Z'_{t-1} β.
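The slides give no code for this model; below is a minimal sketch under stated assumptions of mine: the data are a simulated three-category Markov chain, the covariate is z_{t-1} = (1, I[Y_{t-1} = 1], I[Y_{t-1} = 2])', and statsmodels' MNLogit plays the role of the multinomial logit fit (its reference category is the first level, so the parameterization matches the slides' only up to the choice of reference category).

```python
# Sketch: multinomial logit fit for a nominal categorical series with lagged
# category indicators as covariates, on simulated Markov-chain data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
m, N = 3, 1000
P = np.array([[0.6, 0.3, 0.1],        # illustrative P(Y_t = j | Y_{t-1} = i)
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
y = np.zeros(N, dtype=int)
for t in range(1, N):
    y[t] = rng.choice(m, p=P[y[t - 1]])

prev = y[:-1]
Z = np.column_stack([np.ones(N - 1),
                     (prev == 1).astype(float),
                     (prev == 2).astype(float)])     # rows z_{t-1}
fit = sm.MNLogit(y[1:], Z).fit(disp=0)
print(fit.params)        # one coefficient column per non-reference category
```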
Ordinal Time Series. Measured on a scale endowed with a natural ordering. A model for ordinal time series needs a latent or auxiliary variable. Put

X_t = −γ' z_{t-1} + e_t,

where
1. the e_t are i.i.d. with cdf F,
2. γ is a d-dimensional vector of parameters,
3. z_{t-1} is a d-dimensional covariate vector.

Define a categorical time series {Y_t} from the levels of {X_t}:

Y_t = j  ⟺  Y_tj = 1  ⟺  θ_{j-1} ≤ X_t < θ_j,

where −∞ = θ_0 < θ_1 < ... < θ_m = ∞.
Then

π_tj = P(θ_{j-1} ≤ X_t < θ_j | F_{t-1}) = F(θ_j + γ' z_{t-1}) − F(θ_{j-1} + γ' z_{t-1}),

for j = 1, ..., m. There are many possibilities depending on F.

Special case: Proportional Odds Model, with

F(x) = F_l(x) = 1 / (1 + exp(−x)).

Then we have, for j = 1, ..., q,

log[ P(Y_t ≤ j | F_{t-1}) / P(Y_t > j | F_{t-1}) ] = θ_j + γ' z_{t-1}.
The proportional odds model has the form (*) with p = q + d:

β = (θ_1, ..., θ_q, γ')',

and Z_{t-1} is the (q + d) × q matrix whose top q × q block is the identity matrix I_q and whose lower d × q block has z_{t-1} in every column:

Z_{t-1} = [ I_q ; z_{t-1} 1'_q ].

Now set h = (h_1, ..., h_q)', and let

π_t1(β) = h_1(η_t) = F(η_t1),
π_tj(β) = h_j(η_t) = F(η_tj) − F(η_{t(j-1)}),   j = 2, ..., q,

where η_t = (η_t1, ..., η_tq)' = Z'_{t-1} β.
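A sketch of fitting such a model, under assumptions of mine: the data are simulated from the latent-variable construction above with illustrative values θ = (−1, 1) and γ = 0.8 and a single covariate z_{t-1} = Y_{t-1}; statsmodels' OrderedModel with a logit link fits the same proportional odds family, although its internal parameterization, P(Y ≤ j) = F(c_j − x'b), means the fitted slope corresponds to −γ here and the thresholds are reported in statsmodels' own transformed form.

```python
# Sketch: simulate an ordinal series via the latent-variable model and fit a
# proportional odds (ordered logit) model with the lagged category as covariate.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(4)
N = 1000
theta = np.array([-1.0, 1.0])                    # cut points theta_1 < theta_2
gamma = 0.8
y = np.zeros(N, dtype=int)
for t in range(1, N):
    x_latent = -gamma * y[t - 1] + rng.logistic()        # X_t = -gamma*z_{t-1} + e_t
    y[t] = np.searchsorted(theta, x_latent, side="right") # categories 0, 1, 2

endog = pd.Series(pd.Categorical(y[1:], categories=[0, 1, 2], ordered=True))
exog = y[:-1].reshape(-1, 1).astype(float)               # z_{t-1} = Y_{t-1}
fit = OrderedModel(endog, exog, distr="logit").fit(method="bfgs", disp=0)
print(fit.params)    # slope (about -gamma under this convention), then threshold params
```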
Partial likelihood estimation. Introduce the multinomial probability

f(y_t; β | F_{t-1}) = ∏_{j=1}^{m} π_tj(β)^{y_tj}.

The partial likelihood is a product of the multinomial probabilities,

PL(β) = ∏_{t=1}^{N} f(y_t; β | F_{t-1}) = ∏_{t=1}^{N} ∏_{j=1}^{m} π_tj(β)^{y_tj},

so that the log partial likelihood is given by

l(β) ≡ log PL(β) = ∑_{t=1}^{N} ∑_{j=1}^{m} y_tj log π_tj(β).
Under a modified Assumption A:

G_N(β)/N → ∫_{R^{p×q}} Z U(β) Σ(β) U'(β) Z' ν(dz) = G(β),

√N(β̂ − β) → N_p(0, G^{-1}(β)),

√N( π_t(β̂) − π_t(β) ) → N_q( 0, Z'_{t-1} D_t(β) G^{-1}(β) D'_t(β) Z_{t-1} ).
Example: Sleep State. Covariates: heart rate, temperature. N = 700. Sleep states: 1 = quiet sleep, 2 = indeterminate sleep, 3 = active sleep, 4 = awake. Ordinal categorical time series: 4 < 1 < 2 < 3.

[Figure: Time plots of sleep state, heart rate, and temperature.]
Fit proportional odds models.

Model  Covariates                         AIC
1      1 + Y_{t-1}                        401.56
2      1 + Y_{t-1} + log R_t              401.51
3      1 + Y_{t-1} + log R_t + T_t        403.32
4      1 + Y_{t-1} + T_t                  403.52
5      1 + Y_{t-1} + Y_{t-2} + log R_t    407.28
6      1 + Y_{t-1} + log R_{t-1}          403.40
7      1 + log R_t                        1692.31
Model 2: 1 + Y_{t-1} + log R_t.

log[ P(Y_t ≤ 4 | F_{t-1}) / P(Y_t > 4 | F_{t-1}) ] = θ_1 + γ_1 Y_{(t-1)1} + γ_2 Y_{(t-1)2} + γ_3 Y_{(t-1)3} + γ_4 log R_t,
log[ P(Y_t ≤ 1 | F_{t-1}) / P(Y_t > 1 | F_{t-1}) ] = θ_2 + γ_1 Y_{(t-1)1} + γ_2 Y_{(t-1)2} + γ_3 Y_{(t-1)3} + γ_4 log R_t,
log[ P(Y_t ≤ 2 | F_{t-1}) / P(Y_t > 2 | F_{t-1}) ] = θ_3 + γ_1 Y_{(t-1)1} + γ_2 Y_{(t-1)2} + γ_3 Y_{(t-1)3} + γ_4 log R_t,

where "≤" refers to the ordering 4 < 1 < 2 < 3. The estimates are

θ̂_1 = 30.352, θ̂_2 = 23.493, θ̂_3 = 20.349,
γ̂_1 = 16.718, γ̂_2 = 9.533, γ̂_3 = 4.755, γ̂_4 = 3.556.

The corresponding standard errors are 12.051, 12.012, 11.985, 0.872, 0.630, 0.501 and 2.470.
[Figure: (a) Observed versus (b) predicted sleep states for Model 2 of the table above, applied to the testing data set. N = 322.]