Regression Models for Time Series Analysis

Size: px
Start display at page:

Download "Regression Models for Time Series Analysis"

Transcription

1 Regression Models for Time Series Analysis Benjamin Kedem 1 and Konstantinos Fokianos 2 1 University of Maryland, College Park, MD 2 University of Cyprus, Nicosia, Cyprus Wiley, New York,

2 Cox (1975). Partial likelihood, Biometrika, 62, Fahrmeir and Tutz (2001). Multivariate Statistical Modelling Based on GLM. 2nd ed., Springer, NY. Fokianos (1996). Categorical Time Series: Prediction and Control. Ph.D. Thesis, University of Maryland. Fokianos and Kedem (1998), Prediction and classification of non-stationary categorical time series. J. Multivariate Analysis,67, Kedem (1980). Binary Time Series, Marcel Dekker, NY ( ) Kedem and Fokianos (2002). Regression Models for Time Series Analysis, Wiley, NY. 2

3 McCullagh and Nelder (1989). Generalized Linear Models. 2nd edi., Chapman & Hall, London. Nelder and Wedderburn (1972). Generalized linear models. JRSS, A, 135, Slud (1982). Consistency and efficiency of inference with the partial likelihood. Biometrika, 69, Slud and Kedem (1994). Partial likelihood analysis of logistic regression and autoregression. Statistica Sinica, 4, Wong (1986). Theory of partial likelihood. Annals of Statistics, 14,

4 Part I: GLM and Time Series Overview Extension of Nelder and Wedderburn (1972), McCullagh and Nelder(1989) GLM to time series is possible due to: Increasing sequence of histories relative to an observer. Partial likelihood. The partial score is a martingale. Well behaved covariates. 4

5 Partial Likelihood Suppose we observe a pair of jointly distributed time series, (X t, Y t ), t = 1,..., N, where {Y t } is a response series and {X t } is a time dependent random covariate. Employing the rules of conditional probability, the joint density of all the X, Y observations can be expressed as, f θ (x 1, y 1,..., x N, y N ) = f θ (x 1 ) N t=2 f θ (x t d t ) d t = (y 1, x 1,..., y t 1, x t 1 ) c t = (y 1, x 1,..., y t 1, x t 1, x t ). N t=1 f θ (y t c t ) (1) The second product on the right hand side of (1) constitutes a partial likelihood according to Cox(1975). 5

6 An increasing sequence of σ-fields F 0 F 1 F Y 1, Y 2,... a sequence of random variables on some common probability space such that Y t is F t measurable. Y t F t 1 f t (y t ; θ). θ R p is a fixed parameter. The partial likelihood (PL) function relative to θ, F t, and the data Y 1, Y 2,..., Y N, is given by the product PL(θ; y 1,..., y N ) = N t=1 f t (y t ; θ) (2) 6

7 The General Regression Problem {Y t } is a response time series with the corresponding p dimensional covariate process, Z t 1 = (Z (t 1)1,..., Z (t 1)p ). Define, F t 1 = σ{y t 1, Y t 2,...,Z t 1,Z t 2,...}. Note: Z t 1 may already include past Y t s. The conditional expectation of the response given the past: µ t = E[Y t F t 1 ], ( ) The problem is to relate µ t to the covariates. 7

8 Time Series Following GLM 1. Random Component. The conditional distribution of the response given the past belongs to the exponential family of distributions in natural or canonical form, { } yt θ t b(θ t ) f(y t ; θ t, φ F t 1 ) = exp + c(y t ; φ). α t (φ) (3) α t (φ) = φ/ω t, dispersion φ, prior weight ω t. 2. Systematic Component. There is a monotone function g( ) such that, g(µ t ) = η t = p j=1 β j Z (t 1)j = Z t 1β. (4) g( ): the link function η t : the linear predictor of the model. 8

9 Typical choices for η t = Z t 1β, could be or or β 0 + β 1 Y t 1 + β 2 Y t 2 + β 3 X t cos(ω 0 t) β 0 + β 1 Y t 1 + β 2 Y t 2 + β 3 Y t 1 X t + β 4 X t 1 β 0 + β 1 Y t 1 + β 2 Y 7 t 2 + β 3Y t 1 log(x t 12 ) 9

10 GLM Equations: f(y; θ t ; φ F t 1 )dy = 1. This implies the relationships: µ t = E[Y t F t 1 ] = b (θ t ). (5) Var[Y t F t 1 ] = α t (φ)b (θ t ) α t (φ)v (µ t ). (6) Since Var[Y t F t 1 ] > 0, it follows that b is monotone. Therefore, equation (5) implies that θ t = (b ) 1 (µ t ). (7) We see that θ t itself is a monotone function of µ t and hence it can be used to define a link function. The link function g(µ t ) = θ t (µ t ) = η t = Z t 1 β (8) is called the canonical link function. 10

11 Example: Poisson Time Series. f(y t ; θ t, φ F t 1 ) = exp {(y t log µ t µ t ) log y t!}. E[Y t F t 1 ] = µ t, b(θ t ) = µ t = exp(θ t ), V (µ t ) = µ t, φ = 1, and ω t = 1. The canonical link is g(µ t ) = θ t (µ t ) = log µ t = η t = Z t 1 β. As an example, if Z t 1 = (1, X t, Y t 1 ), then log µ t = β 0 + β 1 X t + β 2 Y t 1 with {X t } standing for some covariate process, or a possible trend, or a possible seasonal component. 11

12 Example: Binary Time Series {Y t } takes the values 0,1. Let Then π t = P(Y t = 1 F t 1 ). f(y t ; θ t, φ F t 1 ) = { exp y t log ( πt 1 π t ) + log(1 π t ) } The canonical link gives the logistic regression model g(π t ) = θ t (π t ) = log π t = η t = Z t 1β. (9) 1 π t Note: π t = F l (η t ) (10) 12

13 Partial Likelihood Inference Given a time series {Y t }, t = 1,..., N, conditionally distributed as (3). The partial likelihood of the observed series is PL(β) = N t=1 f(y t ; θ t, φ F t 1 ). (11) Then from (3), the log partial likelihood, l(β), is given by l(β) = = = N t=1 N t=1 N t=1 log f(y t ; θ t, φ F t 1 ) N l t t=1 } { yt θ t b(θ t ) + c(y t, φ) α t (φ) { yt u(z t 1 β) b(u(z t 1 β)) + c(y t, φ) α t (φ) } (12) 13

14 ( β 1, β 2,, β p ). The partial score is a p dimensional vector, S N (β) l(β) = N t=1 with σ 2 t (β) = Var[Y t F t 1 ]. µ t (Y t µ t (β)) Z t 1 η t σt 2(β) (13) The partial score vector process {S t (β)}, t = 1,..., N, is defined from the partial sums S t (β) = t s=1 µ s (Y s µ s (β)) Z s 1 η s σs(β) 2. (14) 14

15 The solution of the score equation, S N (β) = logpl(β) = 0 (15) is denoted by ˆβ, and is referred to as the maximum partial likelihood estimator (MPLE) of β. The system of equations (15) is non linear and is customarily solved by the Fisher scoring method, an iterative algorithm resembling the Newton Raphson procedure. Before turning to the Fisher scoring algorithm in our context of conditional inference, it is necessary to introduce several important matrices. 15

16 An important role in partial likelihood inference is played by the cumulative conditional information matrix, G N (β), defined by a sum of conditional covariance matrices, G N (β) = = N t=1 N t=1 We also need: Cov [ = Z W(β)Z. µ t (Y t µ t (β)) Z t 1 η t σt 2(β) ( ) 2 µt 1 Z t 1 η t σt 2(β)Z t 1 H N (β) l(β). Define R N (β) from the difference H N (β) = G N (β) R N (β). Fact: For canonical links R N (β) = 0. F t 1 ] 16

17 Fisher Scoring: In Newton-Raphson replace H N (β) by its conditional expectation: ˆβ (k+1) = ˆβ (k) + G 1 N (ˆβ (k) )S N (ˆβ (k) ). Fisher scoring becomes Newton-Raphson for canonical links. Fisher scoring simplifies to Iterative Reweighted Least Squares: ˆβ (k+1) = ( Z W(ˆβ (k) )Z) 1 Z W(ˆβ (k) )q (k). 17

18 Asymptotic Theory Assumption A A1. The true parameter β belongs to an open set B R p. A2. The covariate vector Z t 1 almost surely lies in a nonrandom compact subset Γ of R p, such that P[ N t=1 Z t 1 Z t 1 > 0] = 1. In addition, Z t 1β lies almost surely in the domain H of the inverse link function h = g 1 for all Z t 1 Γ and β B. A3. The inverse link function h defined in (A2) is twice continuously differentiable and h(γ)/ γ = 0. 18

19 A4. There is a probability measure ν on R p such that R p zz ν(dz) is positive definite, and such that under (3) and (4) for Borel sets A R p, 1 N N t=1 I [Zt 1 A] ν(a) in probability as N, at the true value of β. A4 calls for asymptotically well behaved covariates: Nt=1 f(z t 1 ) N f(z)ν(dz) R p in probability as N. Thus, there exists a p p limiting information matrix per observation, G(β), such that G N (β) G(β) (16) N in probability, as N. 19

20 Slud and K (1994), Fokianos and K (1998): 1. {S t (β)} relative to {F t }, t = 1,..., N, is a martingale. 2. R N(β) N S N(β) N N p (0,G(β)). 4. N(ˆβ β) N p (0,G 1 (β)). 20

21 100(1 α)% prediction interval (h = g 1 ) µ t (β) = µ t (ˆβ)±z α/2 h (Z t 1 β) N Z t 1 G 1 (β)z t 1 Hypothesis Testing Let β be the MPLE of β obtained under H 0 with r < p restrictions, H 0 : β 1 = = β r = 0. Let ˆβ be the unrestricted MPLE. The log partial likelihood ratio statistic converges to χ 2 r. λ N = 2 { l(ˆβ) l( β) } (17) 21

22 More generally: Assume C is a known matrix with full rank r, r < p. Under the general linear hypothesis, H 0 : Cβ = β 0 against H 1 : Cβ β 0, (18) {Cˆβ β 0 } {CG 1 (ˆβ)C } 1 {Cˆβ β 0 } χ 2 r 22

23 Diagnostics l(y; y): maximum log partial likelihood corresponding to the saturated model. l(ˆµ; y): maximum log partial likelihood from the reduced model. Scaled Deviance: D 2{l(y;y) l(ˆµ;y)} χ 2 N p AIC(p) = 2logPL(ˆβ) + 2p, BIC(p) = 2logPL(ˆβ) + plog N 23

24 Analysis of Mortality Count Data in LA Weekly data from Los Angeles County during a period of 10 years from January 1, 1970, to December 31, 1979: Weekly sampled filtered time series. N = 508. Response Y Weather T RH Pollution CO SO 2 NO 2 HC OZ KM Total Mortality (filtered) Temperature Relative humidity Carbon monoxide Sulfur dioxide Nitrogen dioxide Hydrocarbons Ozone Particulates 24

25 Mortality Lag Temperature ACF Series : Mort Lag log(co) Mortality, Temperature, log(co) ACF ACF Series : Temp Series : log(co) Lag Weekly data of filtered total mortality and temperature, and log-filtered CO, and the corresponding estimated autocorrelation functions. N =

26 Covariates and η t used in Poisson regression. S = SO 2, N = NO 2. To recover η t, insert the β s. For Model 2, η t = β 0 + β 1 Y t 1 + β 2 Y t 2, etc. Model 0 T t + RH t + CO t + S t + N t +HC t + OZ t + KM t Model 1 Y t 1 Model 2 Y t 1 + Y t 2 Model 3 Y t 1 + Y t 2 + T t 1 Model 4 Y t 1 + Y t 2 + T t 1 + log(co t ) Model 5 Y t 1 + Y t 2 + T t 1 + T t 2 + log(co t ) Model 6 Y t 1 + Y t 2 + T t + T t 1 + log(co t ) 26

27 Comparison of 7 Poisson regression models. N = 508. Model p D df AIC BIC Choose Model 4: log(ˆµ t ) = ˆβ 0 + ˆβ 1 Y t 1 + ˆβ 2 Y t 2 + ˆβ 3 T t 1 + ˆβ 4 log(co t ) 27

28 Poisson Regression of Filtered LA Mortality Mrt(t) on Mrt(t-1), Mrt(t-2), Tmp(t-1), log(crb(t)) Data... Fit Observed and predicted weekly filtered total mortality from Model 4. 28

29 Comparison of Residuals Model 0 Residuals Model 4 Residuals Series : Resid0 Series : Resid4 ACF ACF Lag Lag Working residuals from Models 0 and 4, and their respective estimated autocorrelation. 29

30 Part II: Binary Time Series Example of bts: Two categories obtained by clipping. Y t I [Xt C] = { 1, if Xt C 0, if X t C (19) Y t I [Xt r] = { 1, if Xt r 0, if X t < r (20) Other examples: at time t = 1,2,..., (Rain, No Rain), (S&P Up, S&P Down), etc. 30

31 {Y t } taking the values 0 or 1, t = 1,2,3,. {Z t 1 } p-dim covariate stochastic data. Against the backdrop of the general framework presented above we wish to relate µ t (β) = π t (β) = P β (Y t = 1 F t 1 ) (21) to the covariates. For this we need good links! 31

32 Standard logistic distribution, F l (x) = Then, ex 1 + e x = e x, < x < Fl 1 (x) = log(x/(1 x)) is the natural link under some conditions. 32

33 Fact: For any bts, there are θ j such that { } P(Yt = 1 Y log t 1 = y t 1,..., Y 1 = y 1 ) = P(Y t = 0 Y t 1 = y t 1,..., Y 1 = y 1 ) θ 0 + θ 1 y t θ p y t p or π t (β) = exp[ (θ 0 + θ 1 y t θ p y t p )] Fact: Consider an AR(p) time series X t = γ 0 + γ 1 X t γ p X t p + λǫ t where ǫ t are i.i.d. logistically distributed. Define: Y t = I [Xt r]. Then π t (β) = exp[ (γ 0 r + γ 1 X t γ p X t p )/λ] 33

34 This motivates logistic regression: π t (β) P β (Y t = 1 F t 1 ) = F l (β Z t 1 ) 1 = 1 + exp[ β Z t 1 ] or equivalently, the link function is { } πt (β) logit(π t (β)) log = β Z t 1 1 π t (β) This is the canonical link. 34

35 Link functions for binary time series. logit β Z t 1 = log{π t (β)/(1 π t (β))} probit β Z t 1 = Φ 1 {π t (β)} log-log β Z t 1 = log{ log(π t (β))} C-log-log β Z t 1 = log{ log(1 π t (β))} Note: Here all the inverse links are cdf s. In what follows we always assume the inverse link is a differentiable cdf F(x). 35

36 The partial likelihood of β takes on the simple product form, PL(β) = = N t=1 N t=1 [π t (β)] y t [1 π t (β)] 1 y t [F(β Z t 1 )] y t [1 F(β Z t 1 )] 1 y t We have under Assumption A: N(ˆβ β) N p (0,G 1 (β)). For the canonical link (logistic regression): G N (β) N G(β) = Rp e β z (1 + e β z ) 2zz ν(dz) 36

37 Illustration of asymptotic normality. ( ) 2πt logit(π t (β)) = β 1 + β 2 cos + β 3 Y t 1 12 so that Z t 1 = (1,cos(2πt/12), Y t 1 ). (a) (b) t Logistic autoregression with a sinusoidal component. a. Y t. b. π t (β) where logit(π t (β)) = cos(2πt/12) + y t 1. 37

38 b1 b b Histograms of normalized MPLE s where β = (0.3,0.75,1), N = 200. Each histogram consists of 1000 estimates. 38

39 Goodness of Fit C 1,, C k, a partition of R p. For j = 1,, k, define, and Put: M j E j (β) N t=1 N t=1 I [Zt 1 C j ] Y t I [Zt 1 C j ] π t(β) M (M 1,, M k ), E(β) (E 1 (β),, E k (β)). 39

40 Slud and K (1994), K and Fokianos (2002): With σ 2 j C j F(β z)(1 F(β z))ν(dz) χ 2 (β) 1 N k j=1 (M j E j (β)) 2 /σ 2 j χ2 k. In practice need to adjust the df when replacing β by ˆβ. 40

41 When ˆβ and (M E(β)) are obtained from the same data set, E(χ 2 (ˆβ)) k k j=1 (B G(β)B) jj /σ 2 j When ˆβ and (M E(β)) are obtained from independent data sets, E(χ 2 (ˆβ)) k + k j=1 (B G(β)B) jj /σ 2 j 41

42 Illustration of the distribution of χ 2 (β) using Q-Q plots. Consider the previous logistic regression model with a periodic component. Use the partition C 1 = {Z : Z 1 = 1, 1 Z 2 < 0, Z 3 = 0} C 2 = {Z : Z 1 = 1, 1 Z 2 < 0, Z 3 = 1} C 3 = {Z : Z 1 = 1,0 Z 2 1, Z 3 = 0} C 4 = {Z : Z 1 = 1,0 Z 2 1, Z 3 = 1} Then, k=4, M j is the sum of those Y t s for which Z t 1 is in C j, j = 1,2,3,4, and the E j (β) are obtained similarly. Estimate σ 2 j by, σ 2 j = 1 N N t=1 I [Zt 1 C j ] π t(β)(1 π t (β)) 42

43 The χ 2 4 approximation is quite good. N=200, b=(0.3,0.75,1) N=400, b=(0.3,1,2) TRUE CHI SQ TRUE CHI SQ CHI SQ STATISTIC CHI SQ STATISTIC 43

44 Example: Modeling successive eruptions of the Old Faithful geyser in Yellowstone National Park, Wyoming. 1 if duration is greater than 3 minutes 0 if duration is less than 3 minutes N =

45 Candidate models η t = β Z t 1 for Old Faithful. 1 β 0 + β 1 Y t 1 2 β 0 + β 1 Y t 1 + β 2 Y t 2 3 β 0 + β 1 Y t 1 + β 2 Y t 2 + β 3 Y t 3 4 β 0 + β 1 Y t 1 + β 2 Y t 2 + β 3 Y t 3 + β 4 Y t 4 Comparison of models using the logistic link: Model 2 is best. Probit reg. gives similar results. Model p X 2 D AIC BIC ˆπ t = π t (ˆβ) = exp { (ˆβ 0 + ˆβ 1 Y t 1 + ˆβ 2 Y t 2 ) }. 45

46 Part III: Categorical Time Series EEG sleep state classified or quantized in four categories as follows. 1 : quiet sleep, 2 : indeterminate sleep, 3 : active sleep, 4 : awake. Here the sleep state categories or levels are assigned integer values. This is an example of a categorical time series {Y t }, t = 1,..., N, taking the values 1,...,4. This is an arbitrary integer assignment. Why not the values 7.1, 15.8, 19.24, 71.17? Any other scale? 46

47 Assume m categories. The t th observation of any categorical time series regardless of the measurement scale can be represented by the vector Y t = (Y t1,..., Y tq ) of length q = m 1, with elements { 1, if the jth category is observed at time t Y tj = 0, otherwise for t = 1,..., N and j = 1,..., q. BTS is a special case with m = 2, q = 1. 47

48 Write for j = 1,..., q, π tj = E[Y tj F t 1 ] = P(Y tj = 1 F t 1 ), Define: π t = (π t1,... π tq ) Y tm = 1 π tm = 1 q Y tj j=1 q j=1 π tj. Let {Z t 1 }, t = 1,..., N, for a p q matrix that represents a covariate process. Y tj corresponds to a vector of length p of random time dependent covariates which forms the jth column of Z t 1. 48

49 Assume the general regression model: ( ) π t (β) = π t1 (β) π t2 (β) π tq (β) = h 1 (Z t 1 β) h 2 (Z t 1 β) h q (Z t 1 β) = h(z t 1 β). The inverse link function h is defined on R q and takes values in R q. We shall only examine nominal and ordinal time series. 49

50 Nominal Time Series. Nominal categorical variables lack natural ordering. A model for nominal time series: Multinomial logit model π tj (β) = Note that exp(β j z t 1) 1 + q l=1 exp(β l z t 1), j = 1,..., q π tm (β) = q l=1 exp(β l z t 1). 50

51 Multinomial logit model is a special case of (*). Indeed, define β to be the p qd-vector β = (β 1,..., β q ), and Z t 1 the qd q matrix Z t 1 = z t z t z t 1. Let h stand for the vector valued function whose components h j, j = 1,..., q, are given by π tj (β) = h j (η t ) = with exp(η tj ) 1 + q l=1 exp(η tl), j = 1,..., q η t = (η t1,..., η tq ) = Z t 1 β. 51

52 Ordinal Time Series. Measured on a scale endowed with a natural ordering. A model for ordinal time series: Need a latent or auxiliary variable. Put X t = γ z t 1 + e t, 1. e t cdf F i.i.d. 2. γ d-dim vector of parameters. 3. z t 1 covariate d-dim vector. Define a categorical time series {Y t }, from the levels of {X t }, Y t = j Y tj = 1 θ j 1 X t < θ j = θ 0 < θ 1 <... < θ m =. 52

53 Then π tj = P(θ j 1 X t < θ j F t 1 ) = F(θ j + γ z t 1 ) F(θ j 1 + γ z t 1 ), for j = 1,..., m. There are many possibilities depending on F. Special case: Proportional Odds Model, F(x) = F l (x) = exp( x). Then we have for j = 1,..., q, { } P[Yt j F log t 1 ] = θ j + γ z t 1 P[Y t > j F t 1 53

54 Proportional odds model has the form (*) with p = (q + d): β = (θ 1,..., θ q, γ ) and Z t 1 the (q + d) q matrix Now set Z t 1 = z t 1 z t 1 z t 1. h = (h 1,..., h q ), and let for j = 2,..., q, π t1 (β) = h 1 (η t ) = F(η t1 ), π tj (β) = h j (η t ) = F(η tj ) F(η t(j 1) ), where η t = (η t1,..., η tq ) = Z t 1 β. 54

55 Partial likelihood estimation. Introduce the multinomial probability f(y t ;β F t 1 ) = m j=1 π tj (β) y tj. The partial likelihood is a product of the multinomial probabilities, PL(β) = = N t=1 N t=1 j=1 f(y t ; β F t 1 ) m π y tj tj (β), so that the partial log-likelihood is given by l(β) logpl(β) = N m t=1 j=1 y tj log π tj (β). 55

56 Under a modified Assumption A: G N (β) N R p q ZU(β)Σ(β)U (β)z ν(dz) = G(β) ( N(ˆβ β) N p 0,G 1 (β) ) N ( πt (ˆβ) π t (β) ) N q ( 0,Zt 1 D t (β)g 1 (β)d t(β)z t 1 ) 56

57 Example: Sleep State. Covariates: Heart Rate, Temperature. N = : quiet sleep, 2 : indeterminate sleep, 3 : active sleep, 4 : awake. Ordinal CTS: 4 < 1 < 2 < 3. State time Heart Rate time Temperature time 57

58 Fit proportional odds models. Model Covariates AIC 1 1+Y t Y t 1 +log R t Y t 1 +log R t +T t Y t 1 +T t Y t 1 +Y t 2 +log R t Y t 1 +log R t log R t

59 Model 2: 1 + Y t 1 + log R t [ ] P(Yt 4 F log t 1 ) = P(Y t > 4 F t 1 ) θ 1 + γ 1 Y (t 1)1 + γ 2 Y (t 1)2 + γ 3 Y (t 1)3 + γ 4 log R t, [ ] P(Yt 1 F log t 1 ) = P(Y t > 1 F t 1 ) θ 2 + γ 1 Y (t 1)1 + γ 2 Y (t 1)2 + γ 3 Y (t 1)3 + γ 4 log R t, ] [ P(Yt 2 F log t 1 ) = P(Y t > 2 F t 1 ) θ 3 + γ 1 Y (t 1)1 + γ 2 Y (t 1)2 + γ 3 Y (t 1)3 + γ 4 log R t, ˆθ 1 = , ˆθ 2 = , ˆθ 3 = , ˆγ 1 = , ˆγ 2 = 9.533, ˆγ 3 = 4.755, ˆγ 4 = The corresponding standard errors are , , , 0.872, 0.630, and

60 (a) State (b) State (a) Observed versus (b) predicted sleep states for Model 2 of Table applied to the testing data set. N =

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

STATISTICA Formula Guide: Logistic Regression. Table of Contents

STATISTICA Formula Guide: Logistic Regression. Table of Contents : Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

More information

i=1 In practice, the natural logarithm of the likelihood function, called the log-likelihood function and denoted by

i=1 In practice, the natural logarithm of the likelihood function, called the log-likelihood function and denoted by Statistics 580 Maximum Likelihood Estimation Introduction Let y (y 1, y 2,..., y n be a vector of iid, random variables from one of a family of distributions on R n and indexed by a p-dimensional parameter

More information

Logistic Regression (1/24/13)

Logistic Regression (1/24/13) STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

More information

Poisson Models for Count Data

Poisson Models for Count Data Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

More information

The Probit Link Function in Generalized Linear Models for Data Mining Applications

The Probit Link Function in Generalized Linear Models for Data Mining Applications Journal of Modern Applied Statistical Methods Copyright 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 164-169 1538 9472/13/$95.00 The Probit Link Function in Generalized Linear Models for Data Mining Applications

More information

Linear Threshold Units

Linear Threshold Units Linear Threshold Units w x hx (... w n x n w We assume that each feature x j and each weight w j is a real number (we will relax this later) We will study three different algorithms for learning linear

More information

Lecture 8: Gamma regression

Lecture 8: Gamma regression Lecture 8: Gamma regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Models with constant coefficient of variation Gamma regression: estimation and testing

More information

Lecture 6: Poisson regression

Lecture 6: Poisson regression Lecture 6: Poisson regression Claudia Czado TU München c (Claudia Czado, TU Munich) ZFS/IMS Göttingen 2004 0 Overview Introduction EDA for Poisson regression Estimation and testing in Poisson regression

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York

CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not

More information

BayesX - Software for Bayesian Inference in Structured Additive Regression

BayesX - Software for Bayesian Inference in Structured Additive Regression BayesX - Software for Bayesian Inference in Structured Additive Regression Thomas Kneib Faculty of Mathematics and Economics, University of Ulm Department of Statistics, Ludwig-Maximilians-University Munich

More information

ECLT5810 E-Commerce Data Mining Technique SAS Enterprise Miner -- Regression Model I. Regression Node

ECLT5810 E-Commerce Data Mining Technique SAS Enterprise Miner -- Regression Model I. Regression Node Enterprise Miner - Regression 1 ECLT5810 E-Commerce Data Mining Technique SAS Enterprise Miner -- Regression Model I. Regression Node 1. Some background: Linear attempts to predict the value of a continuous

More information

Statistics in Retail Finance. Chapter 6: Behavioural models

Statistics in Retail Finance. Chapter 6: Behavioural models Statistics in Retail Finance 1 Overview > So far we have focussed mainly on application scorecards. In this chapter we shall look at behavioural models. We shall cover the following topics:- Behavioural

More information

GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE

GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE ACTA UNIVERSITATIS AGRICULTURAE ET SILVICULTURAE MENDELIANAE BRUNENSIS Volume 62 41 Number 2, 2014 http://dx.doi.org/10.11118/actaun201462020383 GENERALIZED LINEAR MODELS IN VEHICLE INSURANCE Silvie Kafková

More information

GLM, insurance pricing & big data: paying attention to convergence issues.

GLM, insurance pricing & big data: paying attention to convergence issues. GLM, insurance pricing & big data: paying attention to convergence issues. Michaël NOACK - michael.noack@addactis.com Senior consultant & Manager of ADDACTIS Pricing Copyright 2014 ADDACTIS Worldwide.

More information

NON-LIFE INSURANCE PRICING USING THE GENERALIZED ADDITIVE MODEL, SMOOTHING SPLINES AND L-CURVES

NON-LIFE INSURANCE PRICING USING THE GENERALIZED ADDITIVE MODEL, SMOOTHING SPLINES AND L-CURVES NON-LIFE INSURANCE PRICING USING THE GENERALIZED ADDITIVE MODEL, SMOOTHING SPLINES AND L-CURVES Kivan Kaivanipour A thesis submitted for the degree of Master of Science in Engineering Physics Department

More information

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)

Overview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and

More information

Computer exercise 4 Poisson Regression

Computer exercise 4 Poisson Regression Chalmers-University of Gothenburg Department of Mathematical Sciences Probability, Statistics and Risk MVE300 Computer exercise 4 Poisson Regression When dealing with two or more variables, the functional

More information

Logistic regression modeling the probability of success

Logistic regression modeling the probability of success Logistic regression modeling the probability of success Regression models are usually thought of as only being appropriate for target variables that are continuous Is there any situation where we might

More information

Least Squares Estimation

Least Squares Estimation Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

More information

Local classification and local likelihoods

Local classification and local likelihoods Local classification and local likelihoods November 18 k-nearest neighbors The idea of local regression can be extended to classification as well The simplest way of doing so is called nearest neighbor

More information

An extension of the factoring likelihood approach for non-monotone missing data

An extension of the factoring likelihood approach for non-monotone missing data An extension of the factoring likelihood approach for non-monotone missing data Jae Kwang Kim Dong Wan Shin January 14, 2010 ABSTRACT We address the problem of parameter estimation in multivariate distributions

More information

Logit Models for Binary Data

Logit Models for Binary Data Chapter 3 Logit Models for Binary Data We now turn our attention to regression models for dichotomous data, including logistic regression and probit analysis. These models are appropriate when the response

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Chapter 29 The GENMOD Procedure. Chapter Table of Contents

Chapter 29 The GENMOD Procedure. Chapter Table of Contents Chapter 29 The GENMOD Procedure Chapter Table of Contents OVERVIEW...1365 WhatisaGeneralizedLinearModel?...1366 ExamplesofGeneralizedLinearModels...1367 TheGENMODProcedure...1368 GETTING STARTED...1370

More information

Pattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University

Pattern Analysis. Logistic Regression. 12. Mai 2009. Joachim Hornegger. Chair of Pattern Recognition Erlangen University Pattern Analysis Logistic Regression 12. Mai 2009 Joachim Hornegger Chair of Pattern Recognition Erlangen University Pattern Analysis 2 / 43 1 Logistic Regression Posteriors and the Logistic Function Decision

More information

Analysis and Computation for Finance Time Series - An Introduction

Analysis and Computation for Finance Time Series - An Introduction ECMM703 Analysis and Computation for Finance Time Series - An Introduction Alejandra González Harrison 161 Email: mag208@exeter.ac.uk Time Series - An Introduction A time series is a sequence of observations

More information

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

Ordinal Regression. Chapter

Ordinal Regression. Chapter Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

More information

Regression Modeling Strategies

Regression Modeling Strategies Frank E. Harrell, Jr. Regression Modeling Strategies With Applications to Linear Models, Logistic Regression, and Survival Analysis With 141 Figures Springer Contents Preface Typographical Conventions

More information

Lecture Notes 1. Brief Review of Basic Probability

Lecture Notes 1. Brief Review of Basic Probability Probability Review Lecture Notes Brief Review of Basic Probability I assume you know basic probability. Chapters -3 are a review. I will assume you have read and understood Chapters -3. Here is a very

More information

4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4

4. Simple regression. QBUS6840 Predictive Analytics. https://www.otexts.org/fpp/4 4. Simple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/4 Outline The simple linear model Least squares estimation Forecasting with regression Non-linear functional forms Regression

More information

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics.

Service courses for graduate students in degree programs other than the MS or PhD programs in Biostatistics. Course Catalog In order to be assured that all prerequisites are met, students must acquire a permission number from the education coordinator prior to enrolling in any Biostatistics course. Courses are

More information

Factorial experimental designs and generalized linear models

Factorial experimental designs and generalized linear models Statistics & Operations Research Transactions SORT 29 (2) July-December 2005, 249-268 ISSN: 1696-2281 www.idescat.net/sort Statistics & Operations Research c Institut d Estadística de Transactions Catalunya

More information

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression

Logistic Regression. Jia Li. Department of Statistics The Pennsylvania State University. Logistic Regression Logistic Regression Department of Statistics The Pennsylvania State University Email: jiali@stat.psu.edu Logistic Regression Preserve linear classification boundaries. By the Bayes rule: Ĝ(x) = arg max

More information

Statistics Graduate Courses

Statistics Graduate Courses Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

More information

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne

Applied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Model Comparison 2 Model

More information

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical

More information

Lecture 3: Linear methods for classification

Lecture 3: Linear methods for classification Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,

More information

Statistics in Retail Finance. Chapter 2: Statistical models of default

Statistics in Retail Finance. Chapter 2: Statistical models of default Statistics in Retail Finance 1 Overview > We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision

More information

Statistical Machine Learning

Statistical Machine Learning Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes

More information

Poisson Regression or Regression of Counts (& Rates)

Poisson Regression or Regression of Counts (& Rates) Poisson Regression or Regression of (& Rates) Carolyn J. Anderson Department of Educational Psychology University of Illinois at Urbana-Champaign Generalized Linear Models Slide 1 of 51 Outline Outline

More information

GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

More information

Underwriting risk control in non-life insurance via generalized linear models and stochastic programming

Underwriting risk control in non-life insurance via generalized linear models and stochastic programming Underwriting risk control in non-life insurance via generalized linear models and stochastic programming 1 Introduction Martin Branda 1 Abstract. We focus on rating of non-life insurance contracts. We

More information

Statistics and Probability Letters

Statistics and Probability Letters Statistics and Probability Letters 79 (2009) 1884 1889 Contents lists available at ScienceDirect Statistics and Probability Letters journal homepage: www.elsevier.com/locate/stapro Discrete-valued ARMA

More information

How To Understand The Theory Of Probability

How To Understand The Theory Of Probability Graduate Programs in Statistics Course Titles STAT 100 CALCULUS AND MATR IX ALGEBRA FOR STATISTICS. Differential and integral calculus; infinite series; matrix algebra STAT 195 INTRODUCTION TO MATHEMATICAL

More information

TIME SERIES ANALYSIS

TIME SERIES ANALYSIS TIME SERIES ANALYSIS Ramasubramanian V. I.A.S.R.I., Library Avenue, New Delhi- 110 012 ram_stat@yahoo.co.in 1. Introduction A Time Series (TS) is a sequence of observations ordered in time. Mostly these

More information

240ST014 - Data Analysis of Transport and Logistics

240ST014 - Data Analysis of Transport and Logistics Coordinating unit: Teaching unit: Academic year: Degree: ECTS credits: 2015 240 - ETSEIB - Barcelona School of Industrial Engineering 715 - EIO - Department of Statistics and Operations Research MASTER'S

More information

Time Series Analysis

Time Series Analysis Time Series Analysis hm@imm.dtu.dk Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby 1 Outline of the lecture Identification of univariate time series models, cont.:

More information

VI. Introduction to Logistic Regression

VI. Introduction to Logistic Regression VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models

More information

Univariate and Multivariate Methods PEARSON. Addison Wesley

Univariate and Multivariate Methods PEARSON. Addison Wesley Time Series Analysis Univariate and Multivariate Methods SECOND EDITION William W. S. Wei Department of Statistics The Fox School of Business and Management Temple University PEARSON Addison Wesley Boston

More information

Trend and Seasonal Components

Trend and Seasonal Components Chapter 2 Trend and Seasonal Components If the plot of a TS reveals an increase of the seasonal and noise fluctuations with the level of the process then some transformation may be necessary before doing

More information

Time Series Analysis III

Time Series Analysis III Lecture 12: Time Series Analysis III MIT 18.S096 Dr. Kempthorne Fall 2013 MIT 18.S096 Time Series Analysis III 1 Outline Time Series Analysis III 1 Time Series Analysis III MIT 18.S096 Time Series Analysis

More information

MATHEMATICAL METHODS OF STATISTICS

MATHEMATICAL METHODS OF STATISTICS MATHEMATICAL METHODS OF STATISTICS By HARALD CRAMER TROFESSOK IN THE UNIVERSITY OF STOCKHOLM Princeton PRINCETON UNIVERSITY PRESS 1946 TABLE OF CONTENTS. First Part. MATHEMATICAL INTRODUCTION. CHAPTERS

More information

TIME SERIES ANALYSIS

TIME SERIES ANALYSIS TIME SERIES ANALYSIS L.M. BHAR AND V.K.SHARMA Indian Agricultural Statistics Research Institute Library Avenue, New Delhi-0 02 lmb@iasri.res.in. Introduction Time series (TS) data refers to observations

More information

5 Transforming Time Series

5 Transforming Time Series 5 Transforming Time Series In many situations, it is desirable or necessary to transform a time series data set before using the sophisticated methods we study in this course: 1. Almost all methods assume

More information

7 Generalized Estimating Equations

7 Generalized Estimating Equations Chapter 7 The procedure extends the generalized linear model to allow for analysis of repeated measurements or other correlated observations, such as clustered data. Example. Public health of cials can

More information

Analysis of ordinal data with cumulative link models estimation with the R-package ordinal

Analysis of ordinal data with cumulative link models estimation with the R-package ordinal Analysis of ordinal data with cumulative link models estimation with the R-package ordinal Rune Haubo B Christensen June 28, 2015 1 Contents 1 Introduction 3 2 Cumulative link models 4 2.1 Fitting cumulative

More information

Interpretation of Somers D under four simple models

Interpretation of Somers D under four simple models Interpretation of Somers D under four simple models Roger B. Newson 03 September, 04 Introduction Somers D is an ordinal measure of association introduced by Somers (96)[9]. It can be defined in terms

More information

Joint models for classification and comparison of mortality in different countries.

Joint models for classification and comparison of mortality in different countries. Joint models for classification and comparison of mortality in different countries. Viani D. Biatat 1 and Iain D. Currie 1 1 Department of Actuarial Mathematics and Statistics, and the Maxwell Institute

More information

CS 688 Pattern Recognition Lecture 4. Linear Models for Classification

CS 688 Pattern Recognition Lecture 4. Linear Models for Classification CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(

More information

LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as

LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values

More information

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES

CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES CHARACTERISTICS IN FLIGHT DATA ESTIMATION WITH LOGISTIC REGRESSION AND SUPPORT VECTOR MACHINES Claus Gwiggner, Ecole Polytechnique, LIX, Palaiseau, France Gert Lanckriet, University of Berkeley, EECS,

More information

Examples. David Ruppert. April 25, 2009. Cornell University. Statistics for Financial Engineering: Some R. Examples. David Ruppert.

Examples. David Ruppert. April 25, 2009. Cornell University. Statistics for Financial Engineering: Some R. Examples. David Ruppert. Cornell University April 25, 2009 Outline 1 2 3 4 A little about myself BA and MA in mathematics PhD in statistics in 1977 taught in the statistics department at North Carolina for 10 years have been in

More information

Nominal and ordinal logistic regression

Nominal and ordinal logistic regression Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome

More information

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, Discrete Changes JunXuJ.ScottLong Indiana University August 22, 2005 The paper provides technical details on

More information

Introduction to Predictive Modeling Using GLMs

Introduction to Predictive Modeling Using GLMs Introduction to Predictive Modeling Using GLMs Dan Tevet, FCAS, MAAA, Liberty Mutual Insurance Group Anand Khare, FCAS, MAAA, CPCU, Milliman 1 Antitrust Notice The Casualty Actuarial Society is committed

More information

Chapter 6: Multivariate Cointegration Analysis

Chapter 6: Multivariate Cointegration Analysis Chapter 6: Multivariate Cointegration Analysis 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie VI. Multivariate Cointegration

More information

Regression Models for Binary Time Series with Gaps

Regression Models for Binary Time Series with Gaps Regression Models for Binary Time Series with Gaps Bernhard Klingenberg Department of Mathematics and Statistics Williams College, Williamstown, MA 01267 email: bklingen@williams.edu November 15, 2007

More information

13. Poisson Regression Analysis

13. Poisson Regression Analysis 136 Poisson Regression Analysis 13. Poisson Regression Analysis We have so far considered situations where the outcome variable is numeric and Normally distributed, or binary. In clinical work one often

More information

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

More information

Incorporating cost in Bayesian Variable Selection, with application to cost-effective measurement of quality of health care.

Incorporating cost in Bayesian Variable Selection, with application to cost-effective measurement of quality of health care. Incorporating cost in Bayesian Variable Selection, with application to cost-effective measurement of quality of health care University of Florida 10th Annual Winter Workshop: Bayesian Model Selection and

More information

Estimating an ARMA Process

Estimating an ARMA Process Statistics 910, #12 1 Overview Estimating an ARMA Process 1. Main ideas 2. Fitting autoregressions 3. Fitting with moving average components 4. Standard errors 5. Examples 6. Appendix: Simple estimators

More information

171:290 Model Selection Lecture II: The Akaike Information Criterion

171:290 Model Selection Lecture II: The Akaike Information Criterion 171:290 Model Selection Lecture II: The Akaike Information Criterion Department of Biostatistics Department of Statistics and Actuarial Science August 28, 2012 Introduction AIC, the Akaike Information

More information

Logistic Regression (a type of Generalized Linear Model)

Logistic Regression (a type of Generalized Linear Model) Logistic Regression (a type of Generalized Linear Model) 1/36 Today Review of GLMs Logistic Regression 2/36 How do we find patterns in data? We begin with a model of how the world works We use our knowledge

More information

University of Ljubljana Doctoral Programme in Statistics Methodology of Statistical Research Written examination February 14 th, 2014.

University of Ljubljana Doctoral Programme in Statistics Methodology of Statistical Research Written examination February 14 th, 2014. University of Ljubljana Doctoral Programme in Statistics ethodology of Statistical Research Written examination February 14 th, 2014 Name and surname: ID number: Instructions Read carefully the wording

More information

15.1 The Structure of Generalized Linear Models

15.1 The Structure of Generalized Linear Models 15 Generalized Linear Models Due originally to Nelder and Wedderburn (1972), generalized linear models are a remarkable synthesis and extension of familiar regression models such as the linear models described

More information

Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012]

Survival Analysis of Left Truncated Income Protection Insurance Data. [March 29, 2012] Survival Analysis of Left Truncated Income Protection Insurance Data [March 29, 2012] 1 Qing Liu 2 David Pitt 3 Yan Wang 4 Xueyuan Wu Abstract One of the main characteristics of Income Protection Insurance

More information

Profitable Strategies in Horse Race Betting Markets

Profitable Strategies in Horse Race Betting Markets University of Melbourne Department of Mathematics and Statistics Master Thesis Profitable Strategies in Horse Race Betting Markets Author: Alexander Davis Supervisor: Dr. Owen Jones October 18, 2013 Abstract:

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Handling attrition and non-response in longitudinal data

Handling attrition and non-response in longitudinal data Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein

More information

Penalized Logistic Regression and Classification of Microarray Data

Penalized Logistic Regression and Classification of Microarray Data Penalized Logistic Regression and Classification of Microarray Data Milan, May 2003 Anestis Antoniadis Laboratoire IMAG-LMC University Joseph Fourier Grenoble, France Penalized Logistic Regression andclassification

More information

Exponential Random Graph Models for Social Network Analysis. Danny Wyatt 590AI March 6, 2009

Exponential Random Graph Models for Social Network Analysis. Danny Wyatt 590AI March 6, 2009 Exponential Random Graph Models for Social Network Analysis Danny Wyatt 590AI March 6, 2009 Traditional Social Network Analysis Covered by Eytan Traditional SNA uses descriptive statistics Path lengths

More information

Nonnested model comparison of GLM and GAM count regression models for life insurance data

Nonnested model comparison of GLM and GAM count regression models for life insurance data Nonnested model comparison of GLM and GAM count regression models for life insurance data Claudia Czado, Julia Pfettner, Susanne Gschlößl, Frank Schiller December 8, 2009 Abstract Pricing and product development

More information

It is important to bear in mind that one of the first three subscripts is redundant since k = i -j +3.

It is important to bear in mind that one of the first three subscripts is redundant since k = i -j +3. IDENTIFICATION AND ESTIMATION OF AGE, PERIOD AND COHORT EFFECTS IN THE ANALYSIS OF DISCRETE ARCHIVAL DATA Stephen E. Fienberg, University of Minnesota William M. Mason, University of Michigan 1. INTRODUCTION

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct

More information

ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R.

ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R. ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R. 1. Motivation. Likert items are used to measure respondents attitudes to a particular question or statement. One must recall

More information

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg

SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg SPSS TRAINING SESSION 3 ADVANCED TOPICS (PASW STATISTICS 17.0) Sun Li Centre for Academic Computing lsun@smu.edu.sg IN SPSS SESSION 2, WE HAVE LEARNT: Elementary Data Analysis Group Comparison & One-way

More information

Chapter 10 Introduction to Time Series Analysis

Chapter 10 Introduction to Time Series Analysis Chapter 1 Introduction to Time Series Analysis A time series is a collection of observations made sequentially in time. Examples are daily mortality counts, particulate air pollution measurements, and

More information

11. Analysis of Case-control Studies Logistic Regression

11. Analysis of Case-control Studies Logistic Regression Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

More information

Master s Theory Exam Spring 2006

Master s Theory Exam Spring 2006 Spring 2006 This exam contains 7 questions. You should attempt them all. Each question is divided into parts to help lead you through the material. You should attempt to complete as much of each problem

More information

Automated Statistical Modeling for Data Mining David Stephenson 1

Automated Statistical Modeling for Data Mining David Stephenson 1 Automated Statistical Modeling for Data Mining David Stephenson 1 Abstract. We seek to bridge the gap between basic statistical data mining tools and advanced statistical analysis software that requires

More information

UNDERGRADUATE DEGREE DETAILS : BACHELOR OF SCIENCE WITH

UNDERGRADUATE DEGREE DETAILS : BACHELOR OF SCIENCE WITH QATAR UNIVERSITY COLLEGE OF ARTS & SCIENCES Department of Mathematics, Statistics, & Physics UNDERGRADUATE DEGREE DETAILS : Program Requirements and Descriptions BACHELOR OF SCIENCE WITH A MAJOR IN STATISTICS

More information

Multivariate Statistical Inference and Applications

Multivariate Statistical Inference and Applications Multivariate Statistical Inference and Applications ALVIN C. RENCHER Department of Statistics Brigham Young University A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim

More information

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus

Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives

More information

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps

More information

Factor analysis. Angela Montanari

Factor analysis. Angela Montanari Factor analysis Angela Montanari 1 Introduction Factor analysis is a statistical model that allows to explain the correlations between a large number of observed correlated variables through a small number

More information