Statistics 580: The EM Algorithm


Introduction

The EM algorithm is a very general iterative algorithm for parameter estimation by maximum likelihood when some of the random variables involved are not observed, i.e., are considered missing or incomplete. The EM algorithm formalizes an intuitive idea for obtaining parameter estimates when some of the data are missing:

i. replace missing values by estimated values;
ii. estimate the parameters;
iii. repeat step (i) using the estimated parameter values as if they were the true values, and step (ii) using the estimated values as if they were observed values, iterating until convergence.

This idea had been in use for many years before Orchard and Woodbury (1972), in their missing information principle, provided the theoretical foundation for it. The term EM was introduced by Dempster, Laird, and Rubin (1977), where proofs of general results about the behavior of the algorithm were first given, along with a large number of applications.

For this discussion, suppose that we have a random vector $y$ whose joint density $f(y; \theta)$ is indexed by a $p$-dimensional parameter $\theta \in \Theta \subseteq R^p$. If the complete-data vector $y$ were observed, it would be of interest to compute the maximum likelihood estimate of $\theta$ based on the distribution of $y$. The log-likelihood function of $y$,
$$ \log L(\theta; y) = \ell(\theta; y) = \log f(y; \theta), $$
is then the function to be maximized.

In the presence of missing data, however, only a function of the complete-data vector $y$ is observed. We express this by writing $y = (y_{obs}, y_{mis})$, where $y_{obs}$ denotes the observed but incomplete data and $y_{mis}$ denotes the unobserved or missing data. For simplicity of description, assume that the missing data are missing at random (Rubin, 1976), so that
$$ f(y; \theta) = f(y_{obs}, y_{mis}; \theta) = f_1(y_{obs}; \theta)\, f_2(y_{mis} \mid y_{obs}; \theta), $$
where $f_1$ is the joint density of $y_{obs}$ and $f_2$ is the conditional density of $y_{mis}$ given the observed data $y_{obs}$. It follows that
$$ \ell_{obs}(\theta; y_{obs}) = \ell(\theta; y) - \log f_2(y_{mis} \mid y_{obs}; \theta), $$
where $\ell_{obs}(\theta; y_{obs})$ is the observed-data log-likelihood.

The EM algorithm is useful when maximizing $\ell_{obs}$ can be difficult but maximizing the complete-data log-likelihood $\ell$ is simple. However, since $y$ is not fully observed, $\ell$ cannot be evaluated and hence cannot be maximized directly. The EM algorithm attempts to maximize $\ell(\theta; y)$ iteratively by replacing it with its conditional expectation given the observed data $y_{obs}$. This expectation is computed with respect to the distribution of the complete data evaluated at the current estimate of $\theta$.

More specifically, if $\theta^{(0)}$ is an initial value for $\theta$, then on the first iteration we compute
$$ Q(\theta; \theta^{(0)}) = E_{\theta^{(0)}}\left[\ell(\theta; y) \mid y_{obs}\right]. $$
$Q(\theta; \theta^{(0)})$ is then maximized with respect to $\theta$; that is, $\theta^{(1)}$ is found such that
$$ Q(\theta^{(1)}; \theta^{(0)}) \ge Q(\theta; \theta^{(0)}) \quad \text{for all } \theta \in \Theta. $$
Thus the EM algorithm consists of an E-step (expectation step) followed by an M-step (maximization step), defined as follows:

E-step: Compute $Q(\theta; \theta^{(t)})$, where
$$ Q(\theta; \theta^{(t)}) = E_{\theta^{(t)}}\left[\ell(\theta; y) \mid y_{obs}\right]. $$

M-step: Find $\theta^{(t+1)} \in \Theta$ such that
$$ Q(\theta^{(t+1)}; \theta^{(t)}) \ge Q(\theta; \theta^{(t)}) \quad \text{for all } \theta \in \Theta. $$

The E-step and the M-step are repeated alternately until the difference $L(\theta^{(t+1)}) - L(\theta^{(t)})$ is less than $\delta$, a prescribed small quantity.

The computation of these two steps simplifies a great deal when the complete-data log-likelihood is linear in the sufficient statistic for $\theta$. In particular, this is the case when the distribution of the complete-data vector $y$ belongs to the exponential family. The E-step then reduces to computing the expectation of the complete-data sufficient statistic given the observed data. The M-step also simplifies: although it nominally involves maximizing the expected log-likelihood computed in the E-step, in the exponential family case this maximization can be avoided. Instead, the conditional expectations of the sufficient statistics computed in the E-step are substituted directly for the sufficient statistics in the closed-form expressions for the complete-data maximum likelihood estimators of $\theta$, giving the next iterate. Several examples are discussed below to illustrate these steps in the exponential family case.
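Purely as an illustration of the E-step/M-step template just described (an addition, not part of the original notes), the following R sketch gives a generic EM loop; the functions e_step(), m_step(), and loglik() are hypothetical placeholders that a user would supply for a particular model.

em <- function(theta0, y_obs, e_step, m_step, loglik,
               delta = 1e-8, maxiter = 1000) {
  theta <- theta0
  L_old <- loglik(theta, y_obs)
  for (iter in seq_len(maxiter)) {
    suff  <- e_step(theta, y_obs)   # E-step: expected complete-data sufficient statistics
    theta <- m_step(suff, y_obs)    # M-step: complete-data MLE with the expectations plugged in
    L_new <- loglik(theta, y_obs)   # observed-data log-likelihood; EM never decreases it
    if (abs(L_new - L_old) < delta) break
    L_old <- L_new
  }
  list(theta = theta, loglik = L_new, iterations = iter)
}

Each of the examples below amounts to supplying a specific E-step, M-step, and stopping rule of this form.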

As a general algorithm for complex maximum likelihood computations, the EM algorithm has several appealing properties relative to other iterative algorithms such as Newton-Raphson. First, it is typically easy to implement because it relies on complete-data computations: the E-step of each iteration only involves taking expectations over complete-data conditional distributions, and the M-step only requires complete-data maximum likelihood estimation, for which simple closed-form expressions are often already available. Second, it is numerically stable: each iteration increases the log-likelihood $\ell_{obs}(\theta; y_{obs})$, and if $\ell_{obs}(\theta; y_{obs})$ is bounded, the sequence $\ell_{obs}(\theta^{(t)}; y_{obs})$ converges to a stationary value. If the sequence $\theta^{(t)}$ converges, it does so to a local maximum or saddle point of $\ell_{obs}(\theta; y_{obs})$, and to the unique MLE if $\ell_{obs}(\theta; y_{obs})$ is unimodal. A disadvantage of EM is that its rate of convergence can be extremely slow if a lot of data are missing: Dempster, Laird, and Rubin (1977) show that convergence is linear, with the rate determined by the fraction of the information about $\theta$ in $\ell(\theta; y)$ that is missing.

Example 1: Univariate Normal Sample

Let the complete-data vector $y = (y_1, \ldots, y_n)^T$ be a random sample from $N(\mu, \sigma^2)$. Then
$$ f(y; \mu, \sigma^2) = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left\{-\frac{1}{2}\sum_{i=1}^n \frac{(y_i - \mu)^2}{\sigma^2}\right\} = \left(\frac{1}{2\pi\sigma^2}\right)^{n/2} \exp\left\{-\frac{1}{2\sigma^2}\left(\sum y_i^2 - 2\mu\sum y_i + n\mu^2\right)\right\}, $$
which implies that $\left(\sum y_i, \sum y_i^2\right)$ are sufficient statistics for $\theta = (\mu, \sigma^2)^T$. The complete-data log-likelihood function is
$$ \ell(\mu, \sigma^2; y) = -\frac{n}{2}\log(\sigma^2) - \frac{1}{2}\sum_{i=1}^n \frac{(y_i - \mu)^2}{\sigma^2} + \text{constant} = -\frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n y_i^2 + \frac{\mu}{\sigma^2}\sum_{i=1}^n y_i - \frac{n\mu^2}{2\sigma^2} + \text{constant}. $$
It follows that the complete-data log-likelihood is linear in the complete-data sufficient statistics.

Suppose $y_i$, $i = 1, \ldots, m$ are observed and $y_i$, $i = m+1, \ldots, n$ are missing (at random), where the $y_i$ are assumed to be i.i.d. $N(\mu, \sigma^2)$. Denote the observed data vector by $y_{obs} = (y_1, \ldots, y_m)^T$. Since the complete data $y$ are from the exponential family, the E-step requires the computation of
$$ E_\theta\left(\sum_{i=1}^n y_i \,\Big|\, y_{obs}\right) \quad \text{and} \quad E_\theta\left(\sum_{i=1}^n y_i^2 \,\Big|\, y_{obs}\right), $$
instead of the expectation of the complete-data log-likelihood function itself. Thus, at the $t$-th iteration of the E-step, compute
$$ s_1^{(t)} = E_{\mu^{(t)},\sigma^{2(t)}}\left(\sum_{i=1}^n y_i \,\Big|\, y_{obs}\right) = \sum_{i=1}^m y_i + (n - m)\,\mu^{(t)} \qquad (1) $$
since $E_{\mu^{(t)},\sigma^{2(t)}}(y_i) = \mu^{(t)}$ for the missing observations, where $\mu^{(t)}$ and $\sigma^{2(t)}$ are the current estimates of $\mu$ and $\sigma^2$, and

$$ s_2^{(t)} = E_{\mu^{(t)},\sigma^{2(t)}}\left(\sum_{i=1}^n y_i^2 \,\Big|\, y_{obs}\right) = \sum_{i=1}^m y_i^2 + (n - m)\left[\sigma^{2(t)} + \mu^{(t)2}\right] \qquad (2) $$
since $E_{\mu^{(t)},\sigma^{2(t)}}(y_i^2) = \sigma^{2(t)} + \mu^{(t)2}$.

For the M-step, first note that the complete-data maximum likelihood estimates of $\mu$ and $\sigma^2$ are
$$ \hat\mu = \frac{\sum_{i=1}^n y_i}{n} \quad \text{and} \quad \hat\sigma^2 = \frac{\sum_{i=1}^n y_i^2}{n} - \left(\frac{\sum_{i=1}^n y_i}{n}\right)^2. $$
The M-step is defined by substituting the expectations computed in the E-step for the complete-data sufficient statistics on the right-hand side of these expressions, which yields the new iterates of $\mu$ and $\sigma^2$. (Note that the complete-data sufficient statistics themselves cannot be computed directly, since $y_{m+1}, \ldots, y_n$ have not been observed.) We get
$$ \mu^{(t+1)} = \frac{s_1^{(t)}}{n} \qquad (3) $$
and
$$ \sigma^{2(t+1)} = \frac{s_2^{(t)}}{n} - \mu^{(t+1)2}. \qquad (4) $$

Thus the E-step consists of evaluating (1) and (2), beginning with starting values $\mu^{(0)}$ and $\sigma^{2(0)}$, and the M-step consists of substituting these into (3) and (4) to calculate new values $\mu^{(1)}$ and $\sigma^{2(1)}$, and so on. The EM algorithm therefore iterates successively between (1)-(2) and (3)-(4). Of course, in this example it is not necessary to use the EM algorithm at all, since the maximum likelihood estimates of $(\mu, \sigma^2)$ are clearly given by $\hat\mu = \sum_{i=1}^m y_i/m$ and $\hat\sigma^2 = \sum_{i=1}^m y_i^2/m - \hat\mu^2$.
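To connect formulas (1)-(4) to code, here is a small R sketch (an illustration added to these notes, not taken from them); y_obs is assumed to hold the m observed values and n is the complete sample size.

em_normal <- function(y_obs, n, mu = mean(y_obs), sigma2 = var(y_obs),
                      delta = 1e-10, maxiter = 1000) {
  m <- length(y_obs)
  for (t in seq_len(maxiter)) {
    # E-step: expected complete-data sufficient statistics, equations (1) and (2)
    s1 <- sum(y_obs)   + (n - m) * mu
    s2 <- sum(y_obs^2) + (n - m) * (sigma2 + mu^2)
    # M-step: plug the expectations into the complete-data MLE formulas, (3) and (4)
    mu_new     <- s1 / n
    sigma2_new <- s2 / n - mu_new^2
    if (max(abs(mu_new - mu), abs(sigma2_new - sigma2)) < delta) {
      mu <- mu_new; sigma2 <- sigma2_new; break
    }
    mu <- mu_new; sigma2 <- sigma2_new
  }
  c(mu = mu, sigma2 = sigma2)
}

As noted above, the fixed point of this iteration is just the observed-data MLE: em_normal(y_obs, n) converges to mean(y_obs) and mean(y_obs^2) - mean(y_obs)^2.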

Example 2: Sampling from a Multinomial Population

In Example 1 the incomplete data were, in effect, missing data in the conventional sense. In general, however, the EM algorithm also applies to situations in which the complete data contain variables that are not observable by definition. In that set-up the observed data can be viewed as a function, or mapping, of the complete data. The following example was used by Dempster, Laird, and Rubin (1977) as an illustration of the EM algorithm.

Let $y_{obs} = (38, 34, 125)^T$ be observed counts from a multinomial population with cell probabilities $\left(\tfrac{1}{2} - \tfrac{1}{2}\theta,\ \tfrac{1}{4}\theta,\ \tfrac{1}{2} + \tfrac{1}{4}\theta\right)$. The objective is to obtain the maximum likelihood estimate of $\theta$. To put this into the framework of an incomplete-data problem, define $y = (y_1, y_2, y_3, y_4)^T$ with multinomial probabilities
$$ \left(\tfrac{1}{2} - \tfrac{1}{2}\theta,\ \tfrac{1}{4}\theta,\ \tfrac{1}{4}\theta,\ \tfrac{1}{2}\right) \equiv (p_1, p_2, p_3, p_4). $$
The vector $y$ is the complete data. Then define
$$ y_{obs} = (y_1,\ y_2,\ y_3 + y_4)^T $$
as the observed data vector, a function of the complete-data vector. Since only $y_3 + y_4$ is observed, and $y_3$ and $y_4$ are not observed separately, the observed data are incomplete; however, this is not simply a missing data problem.

The complete-data log-likelihood is
$$ \ell(\theta; y) = y_1 \log p_1 + y_2 \log p_2 + y_3 \log p_3 + y_4 \log p_4 + \text{const.}, $$
which is linear in $y_1, y_2, y_3, y_4$, and these are also the sufficient statistics. The E-step requires that $E_\theta(y \mid y_{obs})$ be computed; that is,
$$ E_\theta(y_1 \mid y_{obs}) = y_1 = 38, \qquad E_\theta(y_2 \mid y_{obs}) = y_2 = 34, $$
$$ E_\theta(y_3 \mid y_{obs}) = E_\theta(y_3 \mid y_3 + y_4) = 125\,\frac{\tfrac{1}{4}\theta}{\tfrac{1}{2} + \tfrac{1}{4}\theta}, $$
since, conditional on $y_3 + y_4 = 125$, $y_3$ is distributed as Binomial$(125, p)$ with $p = \tfrac{1}{4}\theta \big/ \left(\tfrac{1}{2} + \tfrac{1}{4}\theta\right)$. Similarly,
$$ E_\theta(y_4 \mid y_{obs}) = E_\theta(y_4 \mid y_3 + y_4) = 125\,\frac{\tfrac{1}{2}}{\tfrac{1}{2} + \tfrac{1}{4}\theta}. $$
Only
$$ y_3^{(t)} = E_{\theta^{(t)}}(y_3 \mid y_{obs}) = \frac{125\,\tfrac{1}{4}\theta^{(t)}}{\tfrac{1}{2} + \tfrac{1}{4}\theta^{(t)}} \qquad (1) $$
actually needs to be computed at the $t$-th iteration of the E-step, as seen below.

For the M-step, note that the complete-data maximum likelihood estimate of $\theta$ is
$$ \hat\theta = \frac{y_2 + y_3}{y_1 + y_2 + y_3}. $$
(To verify this, maximize
$$ \ell(\theta; y) = y_1 \log\left(\tfrac{1}{2} - \tfrac{1}{2}\theta\right) + y_2 \log\tfrac{1}{4}\theta + y_3 \log\tfrac{1}{4}\theta + y_4 \log\tfrac{1}{2} $$
and show that the above is indeed the maximizer.) Thus, substituting the expectations from the E-step for the sufficient statistics in this expression gives
$$ \theta^{(t+1)} = \frac{34 + y_3^{(t)}}{72 + y_3^{(t)}}. \qquad (2) $$
Iterating between (1) and (2) defines the EM algorithm for this problem.
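A few lines of R suffice to carry out this iteration (an illustrative sketch added here; the starting value 0.5 is an arbitrary choice, not taken from the notes):

th <- 0.5                                      # arbitrary starting value theta^(0)
for (t in 1:50) {
  y3     <- 125 * (th / 4) / (1/2 + th / 4)    # E-step, equation (1)
  th_new <- (34 + y3) / (72 + y3)              # M-step, equation (2)
  if (abs(th_new - th) < 1e-10) { th <- th_new; break }
  th <- th_new
}
th    # converges to the MLE of theta, approximately 0.627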

The table below, from Little and Rubin (1987), illustrates the convergence of the EM iterates for this problem from a starting value $\theta^{(0)}$.

Table 1. The EM Algorithm for Example 2 (from Little and Rubin (1987)). Columns: $t$, $\theta^{(t)}$, $\theta^{(t)} - \hat\theta$, and $(\theta^{(t+1)} - \hat\theta)/(\theta^{(t)} - \hat\theta)$.

Example 3: Sample from a Binomial/Poisson Mixture

The following table shows the number of children of $N$ widows entitled to support from a certain pension fund.

Number of children:      0      1      2      3      4      5      6
Observed # of widows:  $n_0$  $n_1$  $n_2$  $n_3$  $n_4$  $n_5$  $n_6$

Since the actual data were not consistent with being a random sample from a Poisson distribution (the number of widows with no children being too large), the following alternative model was adopted. Assume that the discrete random variable is distributed as a mixture of two populations:

Population A: with probability $\xi$, the random variable takes the value 0;
Population B: with probability $(1 - \xi)$, the random variable follows a Poisson distribution with mean $\lambda$.

Let the observed vector of counts be $n_{obs} = (n_0, n_1, \ldots, n_6)^T$. The problem is to obtain the maximum likelihood estimate of $(\lambda, \xi)$. This is reformulated as an incomplete-data problem by regarding the observed number of widows with no children as the sum of observations coming from each of the two populations. Define
$$ n_0 = n_A + n_B, $$
where $n_A$ is the number of widows with no children from population A and $n_B = n_0 - n_A$ is the number of widows with no children from population B. The problem is an incomplete-data problem because $n_A$ is not observed. Let $n = (n_A, n_B, n_1, n_2, \ldots, n_6)$ be the complete-data vector, where we assume that $n_A$ and $n_B$ are observed and $n_0 = n_A + n_B$.

Then the observed counts have density
$$ f(n_{obs}; \xi, \lambda) = k(n_{obs})\,\{P(y = 0)\}^{n_0}\prod_{i=1}^{6}\{P(y = i)\}^{n_i} = k(n_{obs})\left[\xi + (1-\xi)e^{-\lambda}\right]^{n_0}\prod_{i=1}^{6}\left\{\frac{(1-\xi)e^{-\lambda}\lambda^i}{i!}\right\}^{n_i}, $$
where $k(n_{obs})$ is a multinomial coefficient that does not depend on $(\xi, \lambda)$; the mixture term $\left[\xi + (1-\xi)e^{-\lambda}\right]^{n_0}$ makes direct maximization awkward. For the complete data, $n_0 = n_A + n_B$ is split into its two components, and the complete-data density is
$$ f(n; \xi, \lambda) = k(n)\,\xi^{n_A}\left[(1-\xi)e^{-\lambda}\right]^{n_B}\prod_{i=1}^{6}\left\{\frac{(1-\xi)e^{-\lambda}\lambda^i}{i!}\right\}^{n_i}. $$
The complete-data sufficient statistic is $(n_A, n_B, n_1, \ldots, n_6)$, and the complete-data log-likelihood is
$$ \ell(\xi, \lambda; n) = n_A \log\xi + (N - n_A)\left[\log(1-\xi) - \lambda\right] + \sum_{i=1}^{6} i\,n_i \log\lambda + \text{const.}, $$
where $N = n_A + n_B + \sum_{i=1}^6 n_i = \sum_{i=0}^6 n_i$. Thus the complete-data log-likelihood is linear in the sufficient statistics.

The E-step requires the computation of $E_{\xi,\lambda}(n \mid n_{obs})$. This gives
$$ E_{\xi,\lambda}(n_i \mid n_{obs}) = n_i, \quad i = 1, \ldots, 6, $$
and
$$ E_{\xi,\lambda}(n_A \mid n_{obs}) = \frac{n_0\,\xi}{\xi + (1-\xi)\exp(-\lambda)}, $$
since, conditional on $n_0$, $n_A$ is Binomial$(n_0, p)$ with $p = \frac{p_A}{p_A + p_B}$, where $p_A = \xi$ and $p_B = (1-\xi)e^{-\lambda}$. The expression for $E_{\xi,\lambda}(n_B \mid n_{obs})$ follows from that for $E(n_A \mid n_{obs})$ via $n_B = n_0 - n_A$ and is not needed separately in the E-step. So the E-step consists of computing
$$ n_A^{(t)} = \frac{n_0\,\xi^{(t)}}{\xi^{(t)} + (1-\xi^{(t)})\exp(-\lambda^{(t)})} \qquad (1) $$
at the $t$-th iteration.

For the M-step, the complete-data maximum likelihood estimates of $(\xi, \lambda)$ are needed. To obtain them, note that $n_A \sim \text{Bin}(N, \xi)$ and that $n_B, n_1, \ldots, n_6$ are the observed counts for the values $0, 1, \ldots, 6$ of a Poisson distribution with parameter $\lambda$. Thus the complete-data maximum likelihood estimates of $\xi$ and $\lambda$ are
$$ \hat\xi = \frac{n_A}{N} $$

and
$$ \hat\lambda = \frac{\sum_{i=1}^6 i\,n_i}{n_B + \sum_{i=1}^6 n_i}. $$
The M-step therefore computes
$$ \xi^{(t+1)} = \frac{n_A^{(t)}}{N} \qquad (2) $$
and
$$ \lambda^{(t+1)} = \frac{\sum_{i=1}^6 i\,n_i}{n_B^{(t)} + \sum_{i=1}^6 n_i}, \qquad (3) $$
where $n_B^{(t)} = n_0 - n_A^{(t)}$. The EM algorithm consists of iterating successively between (1) and (2)-(3). The following data are reproduced from Thisted (1988).

Number of children:   0   1   2   3   4   5   6
Number of widows:     (counts as reproduced from Thisted (1988))

Starting with $\xi^{(0)} = 0.75$ and $\lambda^{(0)} = 0.40$, the results in Table 2 were obtained.

Table 2. EM Iterations for the Pension Data. Columns: $t$, $\xi$, $\lambda$, $n_A$, $n_B$.

A single iteration produced estimates that are within 0.5% of the maximum likelihood estimates and are comparable to the results after about four iterations of Newton-Raphson. However, the convergence of the subsequent iterations is very slow; this behavior is more typical of the EM algorithm.
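Since the widow counts themselves are not reproduced above, the following R sketch (an illustration added to these notes) takes a generic count vector n_counts = c(n0, n1, ..., n6) and carries out iterations (1)-(3), with the starting values used in the notes as defaults.

em_mixture <- function(n_counts, xi = 0.75, lambda = 0.40,
                       delta = 1e-10, maxiter = 500) {
  # n_counts = c(n0, n1, ..., n6): observed numbers of widows with 0, ..., 6 children
  n0 <- n_counts[1]
  ni <- n_counts[-1]                      # n1, ..., n6
  N  <- sum(n_counts)
  sum_i_ni <- sum((1:6) * ni)             # sum of i * n_i
  for (t in seq_len(maxiter)) {
    # E-step, equation (1): expected number of zero counts coming from population A
    nA <- n0 * xi / (xi + (1 - xi) * exp(-lambda))
    nB <- n0 - nA
    # M-step, equations (2) and (3)
    xi_new     <- nA / N
    lambda_new <- sum_i_ni / (nB + sum(ni))
    if (max(abs(xi_new - xi), abs(lambda_new - lambda)) < delta) {
      xi <- xi_new; lambda <- lambda_new; break
    }
    xi <- xi_new; lambda <- lambda_new
  }
  c(xi = xi, lambda = lambda, nA = nA, nB = nB)
}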

Example 4: Variance Component Estimation (Little and Rubin (1987))

The following example is from Snedecor and Cochran (1967, p. 290). In a study of artificial insemination of cows, semen samples from six randomly selected bulls were tested for their ability to produce conceptions. The number of samples tested varied from bull to bull, and the response variable was the percentage of conceptions obtained from each sample. Interest here centers on the variability of the bull effects, which are taken to be random effects. The data are:

Table 3. Data for Example 4 (from Snedecor and Cochran (1967)). Columns: Bull ($i$), Percentages of Conception, $n_i$; there are $n_i$ samples for bull $i$ and $\sum_i n_i = 35$ samples in total.

A common model for the analysis of such data is the one-way random effects model
$$ y_{ij} = a_i + \epsilon_{ij}, \qquad j = 1, \ldots, n_i, \quad i = 1, \ldots, k, $$
where the bull effects $a_i$ are assumed to be i.i.d. $N(\mu, \sigma_a^2)$ and the within-bull errors $\epsilon_{ij}$ i.i.d. $N(0, \sigma^2)$, with the $a_i$ and $\epsilon_{ij}$ independent. The standard one-way random effects analysis of variance has the layout

Source   d.f.   S.S.   M.S.   F    E(M.S.)
Bull       5                       $\sigma^2 + 5.67\,\sigma_a^2$
Error     29                       $\sigma^2$
Total     34

Equating observed and expected mean squares in this table gives $s^2 = $ MS(Error) as the estimate of $\sigma^2$ and $\left(\text{MS(Bull)} - \text{MS(Error)}\right)/5.67$ as the estimate of $\sigma_a^2$.

To construct an EM algorithm for obtaining the MLEs of $\theta = (\mu, \sigma_a^2, \sigma^2)$, first consider the joint density of $y^* = (y, a)^T$, where $y^*$ is taken as the complete data. This joint density can be written as the product of two factors: the first corresponds to the conditional density of the $y_{ij}$ given the $a_i$, and the second to the density of the $a_i$:
$$ f(y^*; \theta) = f_1(y \mid a; \theta)\, f_2(a; \theta) = \prod_i \prod_j \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2\sigma^2}(y_{ij} - a_i)^2} \times \prod_i \frac{1}{\sqrt{2\pi}\,\sigma_a}\, e^{-\frac{1}{2\sigma_a^2}(a_i - \mu)^2}. $$

Thus the log-likelihood is linear in the following complete-data sufficient statistics:
$$ T_1 = \sum_i a_i, \qquad T_2 = \sum_i a_i^2, \qquad T_3 = \sum_i \sum_j (y_{ij} - a_i)^2 = \sum_i \sum_j (y_{ij} - \bar y_{i.})^2 + \sum_i n_i (\bar y_{i.} - a_i)^2. $$
Here "complete data" means that both $y$ and $a$ are available. Since only $y$ is observed, let $y_{obs} = y$. The E-step of the EM algorithm then requires the computation of the expectations of $T_1$, $T_2$ and $T_3$ given $y_{obs}$, i.e., $E_\theta(T_i \mid y)$ for $i = 1, 2, 3$. The conditional distribution of $a$ given $y$ is needed for computing these expectations.

First, note that the joint distribution of $y^* = (y, a)^T$ is $(N + k)$-dimensional multivariate normal, $N(\mu^*, \Sigma^*)$, where $\mu^* = (\mu_y^T, \mu_a^T)^T$ with $\mu_y = \mu\, j_N$ and $\mu_a = \mu\, j_k$, and $\Sigma^*$ is the $(N + k) \times (N + k)$ matrix
$$ \Sigma^* = \begin{pmatrix} \Sigma & \Sigma_{12} \\ \Sigma_{12}^T & \sigma_a^2 I_k \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_1 & & 0 \\ & \ddots & \\ 0 & & \Sigma_k \end{pmatrix}, \qquad \Sigma_{12} = \sigma_a^2 \begin{pmatrix} j_{n_1} & & 0 \\ & \ddots & \\ 0 & & j_{n_k} \end{pmatrix}, $$
where $\Sigma_i = \sigma^2 I_{n_i} + \sigma_a^2 J_{n_i}$ is an $n_i \times n_i$ matrix. The covariance matrix $\Sigma$ of the marginal distribution of $y$ is obtained by noting that the $y_{ij}$ are jointly normal with common mean $\mu$, common variance $\sigma^2 + \sigma_a^2$, covariance $\sigma_a^2$ within the same bull, and covariance 0 between bulls. That is,
$$ \operatorname{Cov}(y_{ij}, y_{i'j'}) = \operatorname{Cov}(a_i + \epsilon_{ij},\ a_{i'} + \epsilon_{i'j'}) = \begin{cases} \sigma^2 + \sigma_a^2 & \text{if } i = i',\ j = j', \\ \sigma_a^2 & \text{if } i = i',\ j \ne j', \\ 0 & \text{if } i \ne i'. \end{cases} $$
$\Sigma_{12}$ is the covariance of $y$ and $a$ and follows from the fact that $\operatorname{Cov}(y_{ij}, a_{i'}) = \sigma_a^2$ if $i = i'$ and $0$ if $i \ne i'$.

The inverse of $\Sigma$ is needed for the computation of the conditional distribution of $a$ given $y$; it is block diagonal with blocks
$$ \Sigma_i^{-1} = \frac{1}{\sigma^2}\left[I_{n_i} - \frac{\sigma_a^2}{\sigma^2 + n_i \sigma_a^2}\, J_{n_i}\right]. $$
Using a well-known theorem from multivariate normal theory, the distribution of $a$ given $y$ is $N(\alpha, A)$, where
$$ \alpha = \mu_a + \Sigma_{12}^T \Sigma^{-1} (y - \mu_y) \qquad \text{and} \qquad A = \sigma_a^2 I_k - \Sigma_{12}^T \Sigma^{-1} \Sigma_{12}. $$
It can be shown, after some algebra, that the $a_i$ given $y$ are independently distributed as
$$ a_i \mid y \;\sim\; N\left(w_i \mu + (1 - w_i)\bar y_{i.},\ v_i\right), $$

where $w_i = \sigma^2/(\sigma^2 + n_i \sigma_a^2)$, $\bar y_{i.} = \left(\sum_{j=1}^{n_i} y_{ij}\right)/n_i$, and $v_i = w_i \sigma_a^2$. Recall that this conditional distribution was derived so that the expectations of $T_1$, $T_2$ and $T_3$ given $y$ (i.e., $y_{obs}$) could be computed; these now follow easily. The $t$-th iteration of the E-step is therefore
$$ T_1^{(t)} = \sum_i \left[w_i^{(t)}\mu^{(t)} + (1 - w_i^{(t)})\bar y_{i.}\right], $$
$$ T_2^{(t)} = \sum_i \left\{\left[w_i^{(t)}\mu^{(t)} + (1 - w_i^{(t)})\bar y_{i.}\right]^2 + v_i^{(t)}\right\}, $$
$$ T_3^{(t)} = \sum_i \sum_j (y_{ij} - \bar y_{i.})^2 + \sum_i n_i\left[w_i^{(t)2}(\mu^{(t)} - \bar y_{i.})^2 + v_i^{(t)}\right]. $$
Since the complete-data maximum likelihood estimates are
$$ \hat\mu = \frac{T_1}{k}, \qquad \hat\sigma_a^2 = \frac{T_2}{k} - \hat\mu^2, \qquad \hat\sigma^2 = \frac{T_3}{N}, $$
the M-step is obtained by substituting the expectations calculated in the E-step for the sufficient statistics in these expressions:
$$ \mu^{(t+1)} = \frac{T_1^{(t)}}{k}, \qquad \sigma_a^{2(t+1)} = \frac{T_2^{(t)}}{k} - \mu^{(t+1)2}, \qquad \sigma^{2(t+1)} = \frac{T_3^{(t)}}{N}. $$
Iterating between these two sets of equations defines the EM algorithm. With starting values $\mu^{(0)} = 54.0$, $\sigma^{2(0)} = 70.0$ and $\sigma_a^{2(0)} = 248.0$, the maximum likelihood estimates $\hat\mu$, $\hat\sigma_a^2$ and $\hat\sigma^2$ were obtained after 30 iterations. These can be compared with the estimates of $\sigma_a^2$ and $\sigma^2$ obtained by equating observed and expected mean squares in the random effects analysis of variance given above.
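The E- and M-steps above translate directly into R. The sketch below is an illustration added to these notes (not the original code); the data are supplied as a list y_list containing one vector of conception percentages per bull, and the defaults are the starting values quoted above.

em_oneway <- function(y_list, mu = 54.0, sig2 = 70.0, sig2a = 248.0,
                      delta = 1e-10, maxiter = 5000) {
  k    <- length(y_list)
  ni   <- sapply(y_list, length)
  N    <- sum(ni)
  ybar <- sapply(y_list, mean)
  SSW  <- sum(sapply(y_list, function(v) sum((v - mean(v))^2)))  # within-bull sum of squares
  for (t in seq_len(maxiter)) {
    # E-step: conditional moments of the a_i given y
    w  <- sig2 / (sig2 + ni * sig2a)
    ai <- w * mu + (1 - w) * ybar          # E(a_i | y)
    vi <- w * sig2a                        # Var(a_i | y)
    T1 <- sum(ai)
    T2 <- sum(ai^2 + vi)
    T3 <- SSW + sum(ni * (w^2 * (mu - ybar)^2 + vi))
    # M-step: complete-data MLEs with the conditional expectations plugged in
    mu_new    <- T1 / k
    sig2a_new <- T2 / k - mu_new^2
    sig2_new  <- T3 / N
    if (max(abs(c(mu_new - mu, sig2a_new - sig2a, sig2_new - sig2))) < delta) {
      mu <- mu_new; sig2a <- sig2a_new; sig2 <- sig2_new; break
    }
    mu <- mu_new; sig2a <- sig2a_new; sig2 <- sig2_new
  }
  c(mu = mu, sigma2_a = sig2a, sigma2 = sig2)
}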

Convergence of the EM Algorithm

The EM algorithm attempts to maximize $\ell_{obs}(\theta; y_{obs})$ by working with the complete-data log-likelihood $\ell(\theta; y)$. Each iteration of EM has two steps: an E-step and an M-step. The $t$-th E-step finds the conditional expectation of the complete-data log-likelihood with respect to the conditional distribution of $y$ given $y_{obs}$ and the current estimate $\theta^{(t)}$:
$$ Q(\theta; \theta^{(t)}) = E_{\theta^{(t)}}\left[\ell(\theta; y) \mid y_{obs}\right] = \int \ell(\theta; y)\, f(y \mid y_{obs}; \theta^{(t)})\, dy, $$
regarded as a function of $\theta$ for fixed $y_{obs}$ and fixed $\theta^{(t)}$. The $t$-th M-step then finds $\theta^{(t+1)}$ to maximize $Q(\theta; \theta^{(t)})$, i.e., finds $\theta^{(t+1)}$ such that
$$ Q(\theta^{(t+1)}; \theta^{(t)}) \ge Q(\theta; \theta^{(t)}) \quad \text{for all } \theta \in \Theta. $$

To verify that this iteration produces a sequence of iterates that converges to a maximum of $\ell_{obs}(\theta; y_{obs})$, first take conditional expectations of both sides of
$$ \ell_{obs}(\theta; y_{obs}) = \ell(\theta; y) - \log f_2(y_{mis} \mid y_{obs}; \theta) $$
over the distribution of $y$ given $y_{obs}$ at the current estimate $\theta^{(t)}$. Then $\ell_{obs}(\theta; y_{obs})$ can be expressed as
$$ \ell_{obs}(\theta; y_{obs}) = \int \ell(\theta; y)\, f(y \mid y_{obs}; \theta^{(t)})\, dy - \int \log f_2(y_{mis} \mid y_{obs}; \theta)\, f(y \mid y_{obs}; \theta^{(t)})\, dy $$
$$ = E_{\theta^{(t)}}\left[\ell(\theta; y) \mid y_{obs}\right] - E_{\theta^{(t)}}\left[\log f_2(y_{mis} \mid y_{obs}; \theta) \mid y_{obs}\right] = Q(\theta; \theta^{(t)}) - H(\theta; \theta^{(t)}), $$
where $Q(\theta; \theta^{(t)})$ is as defined earlier and $H(\theta; \theta^{(t)}) = E_{\theta^{(t)}}\left[\log f_2(y_{mis} \mid y_{obs}; \theta) \mid y_{obs}\right]$.

The following Lemma is used to prove the main result, namely that the sequence of iterates $\theta^{(t)}$ produced by the EM algorithm converges at least to a local maximum of $\ell_{obs}(\theta; y_{obs})$.

Lemma: For any $\theta \in \Theta$, $H(\theta; \theta^{(t)}) \le H(\theta^{(t)}; \theta^{(t)})$.

Theorem: The EM algorithm increases $\ell_{obs}(\theta; y_{obs})$ at each iteration, that is,
$$ \ell_{obs}(\theta^{(t+1)}; y_{obs}) \ge \ell_{obs}(\theta^{(t)}; y_{obs}), $$
with equality if and only if $Q(\theta^{(t+1)}; \theta^{(t)}) = Q(\theta^{(t)}; \theta^{(t)})$. (A short proof sketch is given at the end of this section.)

This Theorem implies that increasing $Q(\theta; \theta^{(t)})$ at each step leads to maximizing, or at least steadily increasing, $\ell_{obs}(\theta; y_{obs})$. Although the general theory of EM applies to any model, it is particularly useful when the complete data $y$ are from an exponential family since, as seen in the examples, the E-step then reduces to finding the conditional expectations of the complete-data sufficient statistics, and the M-step is often simple. Nevertheless, even when the complete data $y$ are from an exponential family, there are important applications in which complete-data maximum likelihood estimation itself is complicated; see, for example, Little and Rubin (1987) on selection models and log-linear models, which generally require iterative M-steps.
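For completeness, here is a short proof sketch of the Lemma and the Theorem, in the standard way (following Dempster, Laird, and Rubin (1977)); it is added here for reference rather than reproduced from the notes. By Jensen's inequality applied to the concave function $\log$,
$$ H(\theta; \theta^{(t)}) - H(\theta^{(t)}; \theta^{(t)}) = E_{\theta^{(t)}}\left[\log\frac{f_2(y_{mis} \mid y_{obs}; \theta)}{f_2(y_{mis} \mid y_{obs}; \theta^{(t)})} \,\Big|\, y_{obs}\right] \le \log E_{\theta^{(t)}}\left[\frac{f_2(y_{mis} \mid y_{obs}; \theta)}{f_2(y_{mis} \mid y_{obs}; \theta^{(t)})} \,\Big|\, y_{obs}\right] = \log 1 = 0, $$
which proves the Lemma. Then, since $\ell_{obs} = Q - H$,
$$ \ell_{obs}(\theta^{(t+1)}; y_{obs}) - \ell_{obs}(\theta^{(t)}; y_{obs}) = \left[Q(\theta^{(t+1)}; \theta^{(t)}) - Q(\theta^{(t)}; \theta^{(t)})\right] - \left[H(\theta^{(t+1)}; \theta^{(t)}) - H(\theta^{(t)}; \theta^{(t)})\right] \ge 0, $$
because the first bracket is nonnegative by the definition of the M-step and the second is nonpositive by the Lemma.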

In a more general context, EM has been widely used in recent years in computations related to Bayesian analysis, to find the posterior mode of $\theta$, which maximizes $\ell(\theta; y_{obs}) + \log p(\theta)$ over $\theta \in \Theta$ for a prior density $p(\theta)$. In Bayesian computations, the log-likelihoods used above are thus simply replaced by log-posteriors.

Extensions of the EM Algorithm

Some Definitions and Notation

Regular Exponential Family (REF). The joint density of an exponential family may be written in the form
$$ f(y; \theta) = b(y)\, \exp\left\{c(\theta)^T s(y)\right\} / a(\theta), $$
where $s(y)$ is a $k \times 1$ vector of sufficient statistics, $c(\theta)$ is a $k \times 1$ vector of parameters, $\theta$ is a $d \times 1$ vector in $\Omega$, a $d$-dimensional convex set such that $f(y; \theta)$ is a density, and $b(y)$ and $a(\theta)$ are scalars. $c(\theta)$ is called the natural or canonical parameter vector. If $k = d$ and the Jacobian of $c(\theta)$, $\partial c/\partial\theta$, is a full-rank $k \times k$ matrix, then $f(y; \theta)$ is said to belong to a regular exponential family (REF). In this case the density can be written in terms of the natural parameter as
$$ f(y; \theta) = b(y)\, \exp\left\{\theta^T s(y)\right\} / a(\theta). $$

Complete-data score vector:
$$ S(\theta; y) = \frac{\partial \ell(\theta; y)}{\partial \theta}. $$

Observed-data score vector:
$$ S_{obs}(\theta; y_{obs}) = \frac{\partial \ell_{obs}(\theta; y_{obs})}{\partial \theta}. $$

It can also be shown that
$$ S_{obs}(\theta; y_{obs}) = E_\theta\left[S(\theta; y) \mid y_{obs}\right], $$
assuming that the conditions for interchanging the operations of expectation and differentiation hold.

Complete-data information matrix:
$$ I(\theta; y) = -\frac{\partial^2 \ell(\theta; y)}{\partial\theta\,\partial\theta^T}. $$

Complete-data expected information matrix:
$$ \mathcal{I}(\theta) = E_\theta\left[I(\theta; y)\right]. $$

Observed-data information matrix:
$$ I_{obs}(\theta; y_{obs}) = -\frac{\partial^2 \ell_{obs}(\theta; y_{obs})}{\partial\theta\,\partial\theta^T}. $$

Observed-data expected information matrix:
$$ \mathcal{I}_{obs}(\theta) = E_\theta\left[I_{obs}(\theta; y_{obs})\right]. $$

Conditional expected information matrix:
$$ I_c(\theta; y_{obs}) = E_\theta\left[I(\theta; y) \mid y_{obs}\right]. $$

Missing Information Principle

Recall that
$$ \ell_{obs}(\theta; y_{obs}) = \ell(\theta; y) - \log f_2(y_{mis} \mid y_{obs}; \theta). $$
Differentiating twice with respect to $\theta$ gives
$$ I_{obs}(\theta; y_{obs}) = I(\theta; y) + \frac{\partial^2 \log f_2(y_{mis} \mid y_{obs}; \theta)}{\partial\theta\,\partial\theta^T}. $$
Now taking expectations over the conditional distribution of $y$ given $y_{obs}$,
$$ I_{obs}(\theta; y_{obs}) = I_c(\theta; y_{obs}) - I_{mis}(\theta; y_{obs}), $$
where the missing information matrix is
$$ I_{mis}(\theta; y_{obs}) = -E_\theta\left[\frac{\partial^2 \log f_2(y_{mis} \mid y_{obs}; \theta)}{\partial\theta\,\partial\theta^T} \,\Big|\, y_{obs}\right]. $$
In other words, the missing information principle asserts that

Observed information = Complete information - Missing information.

Convergence Rate of EM

The EM algorithm implicitly defines a mapping
$$ \theta^{(t+1)} = M(\theta^{(t)}), \qquad t = 0, 1, \ldots, $$
where $M(\theta) = \left(M_1(\theta), \ldots, M_d(\theta)\right)$. For the problem of maximizing $Q(\theta; \theta')$, it can be shown that $M$ has a fixed point, and since $M$ is continuous and monotone, $\theta^{(t)}$ converges to a point $\theta^* \in \Omega$.

Consider a Taylor series expansion of $\theta^{(t+1)} = M(\theta^{(t)})$ about $\theta^*$, noting that $\theta^* = M(\theta^*)$:
$$ M(\theta^{(t)}) = M(\theta^*) + (\theta^{(t)} - \theta^*)\, \frac{\partial M(\theta)}{\partial\theta}\bigg|_{\theta = \theta^*}, $$
which leads to
$$ \theta^{(t+1)} - \theta^* = (\theta^{(t)} - \theta^*)\, DM, $$
where $DM = \partial M(\theta)/\partial\theta\big|_{\theta = \theta^*}$ is a $d \times d$ matrix. Thus, near $\theta^*$, the EM algorithm is essentially a linear iteration with rate matrix $DM$.

Definition. Recall that the rate of convergence of an iterative process is defined as
$$ r = \lim_{t \to \infty} \frac{\|\theta^{(t+1)} - \theta^*\|}{\|\theta^{(t)} - \theta^*\|}, $$
where $\|\cdot\|$ is any vector norm. For the EM algorithm, the rate of convergence is therefore
$$ r = \lambda_{\max} = \text{the largest eigenvalue of } DM. $$
Dempster, Laird, and Rubin (1977) have shown that
$$ DM = I_{mis}(\theta^*; y_{obs})\, I_c^{-1}(\theta^*; y_{obs}). $$
Thus the rate of convergence is the largest eigenvalue of $I_{mis}(\theta^*; y_{obs})\, I_c^{-1}(\theta^*; y_{obs})$, i.e., of the matrix-valued fraction of missing information.

Obtaining the Covariance Matrix of the MLE from the EM Algorithm

In maximum likelihood estimation, the large-sample covariance matrix of the MLE $\hat\theta$ is usually estimated by the inverse of the observed information matrix. When the EM algorithm is used to compute the maximum likelihood estimates, once convergence is reached we could in principle evaluate $I_{obs}(\hat\theta; y_{obs})$ directly. However, this requires the second-order derivatives of the observed-data log-likelihood $\ell_{obs}(\theta; y_{obs})$, which is usually not a viable option, since we appealed to the EM algorithm expressly to avoid working with $\ell_{obs}(\theta; y_{obs})$ itself. Thus we need to approximate $I_{obs}(\hat\theta; y_{obs})$ by other methods if the EM algorithm is to be a useful alternative to other iterative techniques for maximum likelihood estimation. For this we need a few more results.

In the REF case we can use
$$ I_c(\hat\theta; y_{obs}) = I(\hat\theta; y) $$
to obtain the conditional expected information matrix, because the complete-data log-likelihood is linear in $s(y)$, so that $I(\theta; y)$ depends on $y$ at most linearly through $s(y)$ (and, in the natural parameterization, not at all):
$$ I_c(\theta; y_{obs}) = E_\theta\left[I(\theta; y) \mid y_{obs}\right] = I(\theta; y)\Big|_{s(y) \to E_\theta[s(y)\mid y_{obs}]}. $$

That is, $I_c(\hat\theta; y_{obs})$ can be obtained by replacing the sufficient statistic $s(y)$ by its conditional expectation, evaluated at $\theta = \hat\theta$, in $I(\theta; y)$, the complete-data information matrix.

Example 2 (continued):
$$ \ell(\theta; y) = y_1 \log\left(\tfrac{1}{2} - \tfrac{1}{2}\theta\right) + y_2 \log\tfrac{\theta}{4} + y_3 \log\tfrac{\theta}{4} + y_4 \log\tfrac{1}{2} $$
$$ \frac{\partial \ell(\theta; y)}{\partial\theta} = -\frac{y_1}{1 - \theta} + \frac{y_2}{\theta} + \frac{y_3}{\theta} $$
$$ -\frac{\partial^2 \ell(\theta; y)}{\partial\theta^2} = I(\theta; y) = \frac{y_1}{(1-\theta)^2} + \frac{y_2 + y_3}{\theta^2} $$
$$ E\left[I(\theta; y) \mid y_{obs}\right] = I_c(\theta; y_{obs}) = \frac{38}{(1-\theta)^2} + \frac{34}{\theta^2} + \frac{125}{(2+\theta)\,\theta} $$
$$ I_c(\hat\theta; y_{obs}) = \frac{38}{(1-\hat\theta)^2} + \frac{34}{\hat\theta^2} + \frac{125}{(2+\hat\theta)\,\hat\theta}. $$
This computation can be completed using the value of $\hat\theta$ from the earlier results. The convergence rate can be calculated similarly and is found to be 0.1328, which agrees with the value observed in the actual iterations.
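The computation can be finished numerically; the R sketch below (an addition for illustration, not part of the original notes) locates theta-hat by one-dimensional optimization and reproduces the quoted convergence rate of about 0.1328.

# Observed-data log-likelihood for Example 2, with observed counts (38, 34, 125)
lobs <- function(th) 38*log((1 - th)/2) + 34*log(th/4) + 125*log(1/2 + th/4)
that <- optimize(lobs, c(0.01, 0.99), maximum = TRUE)$maximum   # MLE, about 0.6268
# Conditional expected (complete) information and observed information at theta-hat
Ic   <- 38/(1 - that)^2 + 34/that^2 + 125/((2 + that)*that)
Iobs <- 38/(1 - that)^2 + 34/that^2 + 125/(2 + that)^2
Imis <- Ic - Iobs          # missing information
Imis / Ic                  # fraction of missing information = EM rate, about 0.1328
sqrt(1/Iobs)               # large-sample standard error of theta-hat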

Generalized EM Algorithm (GEM)

Recall that in the M-step we maximize $Q(\theta; \theta^{(t)})$, i.e., we find $\theta^{(t+1)}$ such that
$$ Q(\theta^{(t+1)}; \theta^{(t)}) \ge Q(\theta; \theta^{(t)}) \quad \text{for all } \theta. $$
In the generalized version of the EM algorithm we require only that $\theta^{(t+1)}$ be chosen such that
$$ Q(\theta^{(t+1)}; \theta^{(t)}) \ge Q(\theta^{(t)}; \theta^{(t)}), $$
i.e., $\theta^{(t+1)}$ is chosen to increase $Q(\theta; \theta^{(t)})$ over its value at $\theta^{(t)}$ at each iteration $t$. This is sufficient to ensure that $\ell_{obs}(\theta^{(t+1)}; y_{obs}) \ge \ell_{obs}(\theta^{(t)}; y_{obs})$ at each iteration, so a GEM sequence of iterates also increases the likelihood and, under regularity conditions, converges to a local maximum.

GEM Algorithm Based on a Single N-R Step

GEM-type algorithms are used when a global maximizer of $Q(\theta; \theta^{(t)})$ does not exist in closed form. In that case an iterative method would in general be needed to carry out the M-step, which might make the procedure computationally infeasible. Since in a GEM it is not essential actually to maximize $Q$, but only to increase the likelihood, we may replace the M-step with a step that achieves this. One possibility is a single iteration of the Newton-Raphson (N-R) algorithm applied to $Q$. Let
$$ \theta^{(t+1)} = \theta^{(t)} + a^{(t)}\,\delta^{(t)}, \qquad \delta^{(t)} = -\left[\frac{\partial^2 Q(\theta; \theta^{(t)})}{\partial\theta\,\partial\theta^T}\right]^{-1}_{\theta=\theta^{(t)}} \left[\frac{\partial Q(\theta; \theta^{(t)})}{\partial\theta}\right]_{\theta=\theta^{(t)}}, $$
i.e., $\delta^{(t)}$ is the N-R direction at $\theta^{(t)}$, and $0 < a^{(t)} \le 1$. If $a^{(t)} = 1$ this defines an exact N-R step. Here $a^{(t)}$ is chosen so that the update defines a GEM sequence; this will be achieved if $a^{(t)} < 2$ as $t \to \infty$.

General Mixed Model

The general mixed linear model is
$$ y = X\beta + \sum_{i=1}^{r} Z_i u_i + \epsilon, $$
where $y$ is an observed $n \times 1$ random vector, $X$ ($n \times p$) and $Z_i$ ($n \times q_i$) are matrices of known constants, $\beta$ ($p \times 1$) is a vector of unknown fixed parameters, and the $u_i$ ($q_i \times 1$) are vectors of unobservable random effects. The error vector $\epsilon$ ($n \times 1$) is assumed to be distributed as $n$-dimensional multivariate normal $N(0, \sigma_0^2 I_n)$, and each $u_i$ is assumed to have a $q_i$-dimensional multivariate normal distribution $N_{q_i}(0, \sigma_i^2 I_{q_i})$ for $i = 1, 2, \ldots, r$, independent of one another and of $\epsilon$. We take the complete-data vector to be $(y, u_1, \ldots, u_r)$, where $y$ is the incomplete, or observed, data vector. It can be shown easily that the covariance matrix of $y$ is the $n \times n$ matrix
$$ V = \sum_{i=1}^{r} Z_i Z_i^T \sigma_i^2 + \sigma_0^2 I_n. $$
Let $q = \sum_{i=0}^{r} q_i$, where $q_0 = n$. The joint distribution of $y$ and $u_1, \ldots, u_r$ is $q$-dimensional multivariate normal, $N(\mu, \Sigma)$, where
$$ \mu = \begin{pmatrix} X\beta \\ 0 \end{pmatrix} \qquad \text{and} \qquad \Sigma = \begin{pmatrix} V & \{\sigma_i^2 Z_i\}_{i=1}^r \\ \{\sigma_i^2 Z_i^T\}_{i=1}^r & \operatorname{diag}\{\sigma_i^2 I_{q_i}\}_{i=1}^r \end{pmatrix}. $$
Thus the density function of $(y, u_1, \ldots, u_r)$ is
$$ f(y, u_1, u_2, \ldots, u_r) = (2\pi)^{-q/2}\, |\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2}\, w^T \Sigma^{-1} w\right), $$
where $w = \left[(y - X\beta)^T, u_1^T, \ldots, u_r^T\right]^T$. Factoring the joint density as $f(y \mid u_1, \ldots, u_r)\prod_i f(u_i)$ gives the complete-data log-likelihood
$$ \ell = -\tfrac{1}{2}\, q \log(2\pi) - \tfrac{1}{2}\sum_{i=0}^{r} q_i \log\sigma_i^2 - \tfrac{1}{2}\sum_{i=0}^{r} \frac{u_i^T u_i}{\sigma_i^2}, $$

where $u_0 = y - X\beta - \sum_{i=1}^{r} Z_i u_i\ (= \epsilon)$. Thus the sufficient statistics are
$$ u_i^T u_i, \quad i = 0, \ldots, r, \qquad \text{and} \qquad y - \sum_{i=1}^{r} Z_i u_i, $$
and the complete-data maximum likelihood estimates are
$$ \hat\sigma_i^2 = \frac{u_i^T u_i}{q_i}, \quad i = 0, 1, \ldots, r, \qquad \hat\beta = (X^T X)^{-} X^T\left(y - \sum_{i=1}^{r} Z_i u_i\right). $$

Special Case: Two-Variance Components Model

The general mixed linear model reduces to
$$ y = X\beta + Z_1 u_1 + \epsilon, $$
where $\epsilon \sim N(0, \sigma_0^2 I_n)$ and $u_1 \sim N(0, \sigma_1^2 I_{q_1})$, and the covariance matrix of $y$ is now
$$ V = Z_1 Z_1^T \sigma_1^2 + \sigma_0^2 I_n. $$
The complete-data log-likelihood is
$$ \ell = -\tfrac{1}{2}\, q \log(2\pi) - \tfrac{1}{2}\sum_{i=0}^{1} q_i \log\sigma_i^2 - \tfrac{1}{2}\sum_{i=0}^{1} \frac{u_i^T u_i}{\sigma_i^2}, $$
where $u_0 = y - X\beta - Z_1 u_1$. The complete-data MLEs are
$$ \hat\sigma_i^2 = \frac{u_i^T u_i}{q_i}, \quad i = 0, 1, \qquad \hat\beta = (X^T X)^{-} X^T (y - Z_1 u_1). $$

We need the expected values of the sufficient statistics $u_i^T u_i$, $i = 0, 1$, and $y - Z_1 u_1$, conditional on the observed data vector $y$. Since $u_i \mid y$ is distributed as the $q_i$-dimensional multivariate normal
$$ N\left(\sigma_i^2 Z_i^T V^{-1}(y - X\beta),\ \sigma_i^2 I_{q_i} - \sigma_i^4 Z_i^T V^{-1} Z_i\right), $$
we have
$$ E(u_i^T u_i \mid y) = \sigma_i^4 (y - X\beta)^T V^{-1} Z_i Z_i^T V^{-1} (y - X\beta) + \operatorname{tr}\left(\sigma_i^2 I_{q_i} - \sigma_i^4 Z_i^T V^{-1} Z_i\right), $$
$$ E(y - Z_1 u_1 \mid y) = X\beta + \sigma_0^2 V^{-1}(y - X\beta), $$
noting that
$$ E(u_0 \mid y) = \sigma_0^2 V^{-1}(y - X\beta), \qquad E(u_0^T u_0 \mid y) = \sigma_0^4 (y - X\beta)^T V^{-1} Z_0 Z_0^T V^{-1} (y - X\beta) + \operatorname{tr}\left(\sigma_0^2 I_{q_0} - \sigma_0^4 Z_0^T V^{-1} Z_0\right), $$
where $Z_0 = I_n$.
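The conditional-expectation formulas above can be written as a short R function. This is an illustrative sketch added to the notes (the argument names y, X, Z1, sig2_0, sig2_1 are assumptions of the sketch); it mirrors the em.mixed() code listed later in these notes.

cond_moments <- function(y, X, Z1, beta, sig2_0, sig2_1) {
  n    <- length(y)
  V    <- sig2_1 * Z1 %*% t(Z1) + sig2_0 * diag(n)
  Vinv <- solve(V)
  r    <- y - X %*% beta
  # E(u1' u1 | y) and E(u0' u0 | y), with Z0 = I_n
  s1 <- sig2_1^2 * t(r) %*% Vinv %*% Z1 %*% t(Z1) %*% Vinv %*% r +
        sig2_1 * ncol(Z1) - sig2_1^2 * sum(diag(t(Z1) %*% Vinv %*% Z1))
  s0 <- sig2_0^2 * t(r) %*% Vinv %*% Vinv %*% r +
        sig2_0 * n - sig2_0^2 * sum(diag(Vinv))
  w  <- X %*% beta + sig2_0 * Vinv %*% r        # E(y - Z1 u1 | y)
  list(s0 = as.numeric(s0), s1 = as.numeric(s1), w = w)
}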

From the above we can derive the following EM-type algorithms for this case.

Basic EM Algorithm

Step 1 (E-step). Set $V^{(t)} = Z_1 Z_1^T \sigma_1^{2(t)} + \sigma_0^{2(t)} I_n$ and, for $i = 0, 1$, calculate
$$ \hat s_i^{(t)} = E(u_i^T u_i \mid y)\Big|_{\beta=\beta^{(t)},\,\sigma_i^2=\sigma_i^{2(t)}} = \sigma_i^{4(t)} (y - X\beta^{(t)})^T V^{(t)-1} Z_i Z_i^T V^{(t)-1} (y - X\beta^{(t)}) + \operatorname{tr}\left(\sigma_i^{2(t)} I_{q_i} - \sigma_i^{4(t)} Z_i^T V^{(t)-1} Z_i\right), $$
$$ \hat w^{(t)} = E(y - Z_1 u_1 \mid y)\Big|_{\beta=\beta^{(t)},\,\sigma_i^2=\sigma_i^{2(t)}} = X\beta^{(t)} + \sigma_0^{2(t)} V^{(t)-1} (y - X\beta^{(t)}). $$

Step 2 (M-step).
$$ \sigma_i^{2(t+1)} = \hat s_i^{(t)}/q_i, \quad i = 0, 1, \qquad \beta^{(t+1)} = (X^T X)^{-1} X^T \hat w^{(t)}. $$

ECM Algorithm

Step 1 (E-step). Set $V^{(t)} = Z_1 Z_1^T \sigma_1^{2(t)} + \sigma_0^{2(t)} I_n$ and, for $i = 0, 1$, calculate
$$ \hat s_i^{(t)} = \sigma_i^{4(t)} (y - X\beta^{(t)})^T V^{(t)-1} Z_i Z_i^T V^{(t)-1} (y - X\beta^{(t)}) + \operatorname{tr}\left(\sigma_i^{2(t)} I_{q_i} - \sigma_i^{4(t)} Z_i^T V^{(t)-1} Z_i\right). $$

Step 2 (M-step). Partition the parameter vector $\theta = (\sigma_0^2, \sigma_1^2, \beta)$ as $\theta_1 = (\sigma_0^2, \sigma_1^2)$ and $\theta_2 = \beta$.

CM-step 1. Maximize the complete-data log-likelihood over $\theta_1$:
$$ \sigma_i^{2(t+1)} = \hat s_i^{(t)}/q_i, \quad i = 0, 1. $$

CM-step 2. Calculate $\beta^{(t+1)}$ as
$$ \beta^{(t+1)} = (X^T X)^{-1} X^T \hat w^{(t+1)}, \qquad \text{where } \hat w^{(t+1)} = X\beta^{(t)} + \sigma_0^{2(t+1)} V^{(t+1)-1} (y - X\beta^{(t)}). $$

ECME Algorithm

Step 1 (E-step). Set $V^{(t)} = Z_1 Z_1^T \sigma_1^{2(t)} + \sigma_0^{2(t)} I_n$ and, for $i = 0, 1$, calculate
$$ \hat s_i^{(t)} = E(u_i^T u_i \mid y)\Big|_{\sigma_i^2=\sigma_i^{2(t)}} = \sigma_i^{4(t)}\, y^T P^{(t)} Z_i Z_i^T P^{(t)} y + \operatorname{tr}\left(\sigma_i^{2(t)} I_{q_i} - \sigma_i^{4(t)} Z_i^T V^{(t)-1} Z_i\right), $$
where
$$ P^{(t)} = V^{(t)-1} - V^{(t)-1} X\left(X^T V^{(t)-1} X\right)^{-} X^T V^{(t)-1}. $$

Step 2 (M-step). Partition $\theta$ as $\theta_1 = (\sigma_0^2, \sigma_1^2)$ and $\theta_2 = \beta$, as in ECM.

CM-step 1. Maximize the complete-data log-likelihood over $\theta_1$:
$$ \sigma_i^{2(t+1)} = \hat s_i^{(t)}/q_i, \quad i = 0, 1. $$

CM-step 2. Maximize the observed-data log-likelihood over $\theta_2$ given $\theta_1^{(t+1)} = (\sigma_0^{2(t+1)}, \sigma_1^{2(t+1)})$:
$$ \beta^{(t+1)} = \left(X^T V^{(t+1)-1} X\right)^{-1} X^T V^{(t+1)-1}\, y. $$
(Note: this is the weighted (generalized) least squares estimator of $\beta$.)
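As an illustration of how the ECME variant differs from the basic EM in its beta-update, here is a sketch of one ECME iteration in R (an addition, not the original code; it assumes X has full column rank and uses the same y, x, z conventions as the em.mixed() function listed below).

ecme_step <- function(y, x, z, var0, var1) {
  n  <- length(y)
  q1 <- ncol(z)
  V    <- var1 * z %*% t(z) + var0 * diag(n)
  Vinv <- solve(V)
  # P projects out the fixed effects (generalized least squares residual operator)
  P  <- Vinv - Vinv %*% x %*% solve(t(x) %*% Vinv %*% x) %*% t(x) %*% Vinv
  # E-step quantities (do not require a current value of beta)
  s0 <- var0^2 * t(y) %*% P %*% P %*% y + var0 * n  - var0^2 * sum(diag(Vinv))
  s1 <- var1^2 * t(y) %*% P %*% z %*% t(z) %*% P %*% y +
        var1 * q1 - var1^2 * sum(diag(t(z) %*% Vinv %*% z))
  # CM-step 1: update the variance components
  var0_new <- as.numeric(s0) / n
  var1_new <- as.numeric(s1) / q1
  # CM-step 2: GLS estimate of beta at the updated variances
  Vnew  <- var1_new * z %*% t(z) + var0_new * diag(n)
  Vninv <- solve(Vnew)
  beta_new <- solve(t(x) %*% Vninv %*% x) %*% t(x) %*% Vninv %*% y
  list(beta = beta_new, var0 = var0_new, var1 = var1_new)
}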

Example of Mixed Model Analysis Using the EM Algorithm

The first example is an evaluation of the breeding value of a set of five sires in raising pigs, taken from Snedecor and Cochran (1967). (The data are reported in Appendix 4.) The experiment was designed so that each sire was mated to a random group of dams, each mating producing a litter of pigs whose characteristics (here, average daily gain) were the criterion. The model to be fitted is
$$ y_{ijk} = \mu + \alpha_i + \beta_{ij} + \epsilon_{ijk}, $$
where $\alpha_i$ is a fixed effect associated with the $i$-th sire, $\beta_{ij}$ is a random effect associated with the $i$-th sire and $j$-th dam, and $\epsilon_{ijk}$ is a random error term. Three different initial values for $(\sigma_0^2, \sigma_1^2)$ were used: $(1, 1)$, $(10, 10)$ and $(0.038, 0.0375)$; the last corresponds to the estimates from the SAS ANOVA procedure.

Table 1. Average Daily Gain of Two Pigs of Each Litter (in pounds). Columns: Sire, Dam, Gain (two gain records per litter); the gain values are those entered as the vector y in the R session below.

###########################################################
#     Classical EM algorithm for the Linear Mixed Model   #
###########################################################

em.mixed <- function(y, x, z, beta, var0, var1, maxiter = 2000, tolerance = 1e-0010)
{
   time <- proc.time()
   n  <- nrow(y)
   q1 <- ncol(z)          # number of random effects u1 (columns of z)
   conv <- 1
   L0 <- loglike(y, x, z, beta, var0, var1)
   i <- 0
   cat(" Iter.   sigma0   sigma1   Likelihood", fill = T)
   repeat {
      if (i > maxiter) { conv <- 0; break }
      # marginal covariance matrix V = var1 * Z Z' + var0 * I at the current estimates
      V    <- c(var1) * z %*% t(z) + c(var0) * diag(n)
      Vinv <- solve(V)
      xb    <- x %*% beta
      resid <- (y - xb)
      temp1 <- Vinv %*% resid
      # E-step: expected values of u0'u0, u1'u1 and y - Z u1 given y
      s0 <- c(var0)^2 * t(temp1) %*% temp1 + c(var0) * n - c(var0)^2 * tr(Vinv)
      s1 <- c(var1)^2 * t(temp1) %*% z %*% t(z) %*% temp1 +
            c(var1) * q1 - c(var1)^2 * tr(t(z) %*% Vinv %*% z)
      w  <- xb + c(var0) * temp1
      # M-step: update the variance components and beta
      var0 <- s0/n
      var1 <- s1/q1
      beta <- ginverse(t(x) %*% x) %*% t(x) %*% w   # ginverse: generalized inverse (MASS::ginv in R)
      L1 <- loglike(y, x, z, beta, var0, var1)
      if (L1 < L0) {
         print("log-likelihood must increase, L1 < L0, break.")
         conv <- 0
         break
      }
      i <- i + 1
      cat(" ", i, " ", var0, " ", var1, " ", L1, fill = T)
      if (abs(L1 - L0) < tolerance) { break }       # check for convergence
      L0 <- L1
   }
   list(beta = beta, var0 = var0, var1 = var1, loglikelihood = L0)
}

#########################################################
# loglike calculates the LogLikelihood for Mixed Model  #
#########################################################

loglike <- function(y, x, z, beta, var0, var1)
{
   n <- nrow(y)
   V    <- c(var1) * z %*% t(z) + c(var0) * diag(n)
   Vinv <- ginverse(V)
   xb    <- x %*% beta
   resid <- (y - xb)
   temp1 <- Vinv %*% resid
   (-.5) * (log(det(V)) + t(resid) %*% temp1)
}

> y <- matrix(c(2.77, 2.38, 2.58, 2.94, 2.28, 2.22, 3.01, 2.61, , 2.71,
                2.72, 2.74, 2.87, 2.46, 2.31, 2.24, , 2.56, 2.50, 2.48), 20, 1)
> x1 <- rep(c(1,0,0,0,0), rep(4,5))
> x2 <- rep(c(0,1,0,0,0), rep(4,5))
> x3 <- rep(c(0,0,1,0,0), rep(4,5))
> x4 <- rep(c(0,0,0,1,0), rep(4,5))
> x <- cbind(1, x1, x2, x3, x4)
> x
        x1 x2 x3 x4
 [1,] 1  1  0  0  0
 [2,] 1  1  0  0  0
 [3,] 1  1  0  0  0
 [4,] 1  1  0  0  0
 [5,] 1  0  1  0  0
 [6,] 1  0  1  0  0
 [7,] 1  0  1  0  0
 [8,] 1  0  1  0  0
 [9,] 1  0  0  1  0
[10,] 1  0  0  1  0
[11,] 1  0  0  1  0
[12,] 1  0  0  1  0
[13,] 1  0  0  0  1

[14,] 1  0  0  0  1
[15,] 1  0  0  0  1
[16,] 1  0  0  0  1
[17,] 1  0  0  0  0
[18,] 1  0  0  0  0
[19,] 1  0  0  0  0
[20,] 1  0  0  0  0
> beta <- lm(y ~ x1 + x2 + x3 + x4)$coefficients
> beta
               [,1]
(Intercept)     ...
x1              ...
x2              ...
x3              ...
x4              ...
> z <- matrix(rep(as.vector(diag(1,10)), rep(2,100)), 20, 10)
> z
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
 [1,]    1    0    0    0    0    0    0    0    0     0
 [2,]    1    0    0    0    0    0    0    0    0     0
 [3,]    0    1    0    0    0    0    0    0    0     0
 [4,]    0    1    0    0    0    0    0    0    0     0
 [5,]    0    0    1    0    0    0    0    0    0     0
 [6,]    0    0    1    0    0    0    0    0    0     0
 [7,]    0    0    0    1    0    0    0    0    0     0
 [8,]    0    0    0    1    0    0    0    0    0     0
 [9,]    0    0    0    0    1    0    0    0    0     0
[10,]    0    0    0    0    1    0    0    0    0     0
[11,]    0    0    0    0    0    1    0    0    0     0
[12,]    0    0    0    0    0    1    0    0    0     0
[13,]    0    0    0    0    0    0    1    0    0     0
[14,]    0    0    0    0    0    0    1    0    0     0
[15,]    0    0    0    0    0    0    0    1    0     0
[16,]    0    0    0    0    0    0    0    1    0     0
[17,]    0    0    0    0    0    0    0    0    1     0
[18,]    0    0    0    0    0    0    0    0    1     0
[19,]    0    0    0    0    0    0    0    0    0     1
[20,]    0    0    0    0    0    0    0    0    0     1
> tolerance <- 1e-10

> maxiter <- 2000
> seed <- 100
> tr <- function(x) sum(diag(x))
> pig.em.results = em.mixed(y, x, z, beta, 1, 1)
 Iter.   sigma0   sigma1   Likelihood
   (values of sigma0, sigma1 and the log-likelihood printed at each iteration)


> pig.em.results
$beta:
      [,1]
[1,]   ...
[2,]   ...
[3,]   ...
[4,]   ...
[5,]   ...

$var0:
      [,1]
[1,]   ...

$var1:
      [,1]
[1,]   ...

$loglikelihood:
      [,1]
[1,]   ...
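As a cross-check (an addition, not part of the original notes), the same maximum likelihood fit can be obtained with a standard mixed-model routine. The sketch below assumes the full y vector has been entered and uses the sire/dam layout implied by the x and z matrices above (five sires with four records each, ten litters with two records each); it requires the lme4 package.

library(lme4)
gain <- as.vector(y)                        # the 20 gain values entered above
sire <- factor(rep(1:5,  each = 4))         # five sires, four records each
dam  <- factor(rep(1:10, each = 2))         # ten dams (litters), two records each
fit  <- lmer(gain ~ sire + (1 | dam), REML = FALSE)   # ML fit, to match the EM estimates
summary(fit)    # the dam variance corresponds to var1, the residual variance to var0

The fixed-effect parameterization differs from the dummy coding used in x, but the variance component estimates should agree with those produced by em.mixed().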

References

Aitkin, M. and Wilson, G.T. (1980). Mixture Models, Outliers and the EM Algorithm. Technometrics, 22.

Baker, S.G. (1990). A Simple EM Algorithm for Capture-Recapture Data with Categorical Covariates. Biometrics, 46.

Callanan, T. and Harville, D.A. (1991). Some New Algorithms for Computing REML of Variance Components. Journal of Statistical Computation and Simulation, 38.

Davenport, J.W., Pierce, M.A. and Hathaway, R.J. (1988). A Numerical Comparison of EM and Quasi-Newton Type Algorithms for MLE for a Mixture of Normals. Computer Science and Statistics: Proceedings of the 20th Symposium on the Interface.

Dempster, A.P., Laird, N.M. and Rubin, D.B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, 39.

Fessler, J.A. and Hero, A.O. (1994). Space-Alternating Generalized Expectation-Maximization Algorithm. IEEE Transactions on Signal Processing, 42.

Jamshidian, M. and Jennrich, R.I. (1993). Conjugate Gradient Acceleration of the EM Algorithm. Journal of the American Statistical Association, 88.

Laird, N.M. (1982). Computation of Variance Components Using the EM Algorithm. Journal of Statistical Computation and Simulation, 14.

Laird, N.M., Lange, N. and Stram, D. (1987). Maximum Likelihood Computation with Repeated Measures. Journal of the American Statistical Association, 83.

Lindstrom, M.J. and Bates, D.M. (1988). Newton-Raphson and EM Algorithms for Linear Mixed-Effects Models for Repeated-Measures Data. Journal of the American Statistical Association, 83.

Little, R.J.A. and Rubin, D.B. (1987). Statistical Analysis with Missing Data. New York: John Wiley.

Liu, C.H. and Rubin, D.B. (1994). The ECME Algorithm: A Simple Extension of EM and ECM with Faster Monotone Convergence. Biometrika, 81.

Louis, T.A. (1982). Finding the Observed Information Matrix when Using the EM Algorithm. Journal of the Royal Statistical Society, Series B, 44.

Meng, X.L. (1994). On the Rate of Convergence of the ECM Algorithm. Annals of Statistics, 22.

Meng, X.L. and Rubin, D.B. (1991). Using EM to Obtain Asymptotic Variance-Covariance Matrices: The SEM Algorithm. Journal of the American Statistical Association, 86.

Meng, X.L. and Rubin, D.B. (1993). Maximum Likelihood Estimation via the ECM Algorithm: A General Framework. Biometrika, 80.

Meilijson, I. (1989). A Fast Improvement to the EM Algorithm on its Own Terms. Journal of the Royal Statistical Society, Series B, 44.

Orchard, T. and Woodbury, M.A. (1972). A Missing Information Principle: Theory and Applications. Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1.

Rubin, D.B. (1976). Inference and Missing Data. Biometrika, 63.

Rubin, D.B. (1991). EM and Beyond. Psychometrika, 56.

Tanner, M.A. and Wong, W.H. (1987). The Calculation of Posterior Distributions by Data Augmentation (with discussion). Journal of the American Statistical Association, 82.

Titterington, D.M. (1984). Recursive Parameter Estimation Using Incomplete Data. Journal of the Royal Statistical Society, Series B, 46.

Wei, G.C.G. and Tanner, M.A. (1990). A Monte Carlo Implementation of the EM Algorithm and the Poor Man's Data Augmentation Algorithms. Journal of the American Statistical Association, 85.

Wu, C.F.J. (1983). On the Convergence Properties of the EM Algorithm. The Annals of Statistics, 11.


Item selection by latent class-based methods: an application to nursing homes evaluation Item selection by latent class-based methods: an application to nursing homes evaluation Francesco Bartolucci, Giorgio E. Montanari, Silvia Pandolfi 1 Department of Economics, Finance and Statistics University

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Math 541: Statistical Theory II Lecturer: Songfeng Zheng Maximum Likelihood Estimation 1 Maximum Likelihood Estimation Maximum likelihood is a relatively simple method of constructing an estimator for

More information

Maximum Likelihood Estimation of Logistic Regression Models: Theory and Implementation

Maximum Likelihood Estimation of Logistic Regression Models: Theory and Implementation Maximum Likelihood Estimation of Logistic Regression Models: Theory and Implementation Abstract This article presents an overview of the logistic regression model for dependent variables having two or

More information

A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA

A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA REVSTAT Statistical Journal Volume 4, Number 2, June 2006, 131 142 A LOGNORMAL MODEL FOR INSURANCE CLAIMS DATA Authors: Daiane Aparecida Zuanetti Departamento de Estatística, Universidade Federal de São

More information

Factorial experimental designs and generalized linear models

Factorial experimental designs and generalized linear models Statistics & Operations Research Transactions SORT 29 (2) July-December 2005, 249-268 ISSN: 1696-2281 www.idescat.net/sort Statistics & Operations Research c Institut d Estadística de Transactions Catalunya

More information

Logistic Regression (1/24/13)

Logistic Regression (1/24/13) STA63/CBB540: Statistical methods in computational biology Logistic Regression (/24/3) Lecturer: Barbara Engelhardt Scribe: Dinesh Manandhar Introduction Logistic regression is model for regression used

More information

Optimizing Prediction with Hierarchical Models: Bayesian Clustering

Optimizing Prediction with Hierarchical Models: Bayesian Clustering 1 Technical Report 06/93, (August 30, 1993). Presidencia de la Generalidad. Caballeros 9, 46001 - Valencia, Spain. Tel. (34)(6) 386.6138, Fax (34)(6) 386.3626, e-mail: bernardo@mac.uv.es Optimizing Prediction

More information

Statistical Methods for the Analysis of Alcohol and Drug Uses for Young Adults

Statistical Methods for the Analysis of Alcohol and Drug Uses for Young Adults Journal of Data Science 7(2009), 469-485 Statistical Methods for the Analysis of Alcohol and Drug Uses for Young Adults Liang Zhu, Jianguo Sun and Phillip Wood University of Missouri Abstract: Alcohol

More information

Generating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan. 15.450, Fall 2010

Generating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan. 15.450, Fall 2010 Simulation Methods Leonid Kogan MIT, Sloan 15.450, Fall 2010 c Leonid Kogan ( MIT, Sloan ) Simulation Methods 15.450, Fall 2010 1 / 35 Outline 1 Generating Random Numbers 2 Variance Reduction 3 Quasi-Monte

More information

(Quasi-)Newton methods

(Quasi-)Newton methods (Quasi-)Newton methods 1 Introduction 1.1 Newton method Newton method is a method to find the zeros of a differentiable non-linear function g, x such that g(x) = 0, where g : R n R n. Given a starting

More information

THE CENTRAL LIMIT THEOREM TORONTO

THE CENTRAL LIMIT THEOREM TORONTO THE CENTRAL LIMIT THEOREM DANIEL RÜDT UNIVERSITY OF TORONTO MARCH, 2010 Contents 1 Introduction 1 2 Mathematical Background 3 3 The Central Limit Theorem 4 4 Examples 4 4.1 Roulette......................................

More information

Ideal Class Group and Units

Ideal Class Group and Units Chapter 4 Ideal Class Group and Units We are now interested in understanding two aspects of ring of integers of number fields: how principal they are (that is, what is the proportion of principal ideals

More information

Time Series Analysis

Time Series Analysis Time Series Analysis hm@imm.dtu.dk Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby 1 Outline of the lecture Identification of univariate time series models, cont.:

More information

GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

More information

Problem of Missing Data

Problem of Missing Data VASA Mission of VA Statisticians Association (VASA) Promote & disseminate statistical methodological research relevant to VA studies; Facilitate communication & collaboration among VA-affiliated statisticians;

More information

Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure

Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure Exact Inference for Gaussian Process Regression in case of Big Data with the Cartesian Product Structure Belyaev Mikhail 1,2,3, Burnaev Evgeny 1,2,3, Kapushev Yermek 1,2 1 Institute for Information Transmission

More information

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression

Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Principle Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression Saikat Maitra and Jun Yan Abstract: Dimension reduction is one of the major tasks for multivariate

More information

Advanced statistical inference. Suhasini Subba Rao Email: suhasini.subbarao@stat.tamu.edu

Advanced statistical inference. Suhasini Subba Rao Email: suhasini.subbarao@stat.tamu.edu Advanced statistical inference Suhasini Subba Rao Email: suhasini.subbarao@stat.tamu.edu August 1, 2012 2 Chapter 1 Basic Inference 1.1 A review of results in statistical inference In this section, we

More information

A Tutorial on MM Algorithms

A Tutorial on MM Algorithms A Tutorial on MM Algorithms David R. Hunter 1 Kenneth Lange 2 Department of Statistics 1 Penn State University University Park, PA 16802-2111 Departments of Biomathematics and Human Genetics 2 David Geffen

More information

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher

More information

Stat 704 Data Analysis I Probability Review

Stat 704 Data Analysis I Probability Review 1 / 30 Stat 704 Data Analysis I Probability Review Timothy Hanson Department of Statistics, University of South Carolina Course information 2 / 30 Logistics: Tuesday/Thursday 11:40am to 12:55pm in LeConte

More information

Constrained Bayes and Empirical Bayes Estimator Applications in Insurance Pricing

Constrained Bayes and Empirical Bayes Estimator Applications in Insurance Pricing Communications for Statistical Applications and Methods 2013, Vol 20, No 4, 321 327 DOI: http://dxdoiorg/105351/csam2013204321 Constrained Bayes and Empirical Bayes Estimator Applications in Insurance

More information

Package EstCRM. July 13, 2015

Package EstCRM. July 13, 2015 Version 1.4 Date 2015-7-11 Package EstCRM July 13, 2015 Title Calibrating Parameters for the Samejima's Continuous IRT Model Author Cengiz Zopluoglu Maintainer Cengiz Zopluoglu

More information

Probabilistic Linear Classification: Logistic Regression. Piyush Rai IIT Kanpur

Probabilistic Linear Classification: Logistic Regression. Piyush Rai IIT Kanpur Probabilistic Linear Classification: Logistic Regression Piyush Rai IIT Kanpur Probabilistic Machine Learning (CS772A) Jan 18, 2016 Probabilistic Machine Learning (CS772A) Probabilistic Linear Classification:

More information

Adaptive Search with Stochastic Acceptance Probabilities for Global Optimization

Adaptive Search with Stochastic Acceptance Probabilities for Global Optimization Adaptive Search with Stochastic Acceptance Probabilities for Global Optimization Archis Ghate a and Robert L. Smith b a Industrial Engineering, University of Washington, Box 352650, Seattle, Washington,

More information

Department of Economics

Department of Economics Department of Economics On Testing for Diagonality of Large Dimensional Covariance Matrices George Kapetanios Working Paper No. 526 October 2004 ISSN 1473-0278 On Testing for Diagonality of Large Dimensional

More information

Statistical Theory. Prof. Gesine Reinert

Statistical Theory. Prof. Gesine Reinert Statistical Theory Prof. Gesine Reinert November 23, 2009 Aim: To review and extend the main ideas in Statistical Inference, both from a frequentist viewpoint and from a Bayesian viewpoint. This course

More information

1 Maximum likelihood estimation

1 Maximum likelihood estimation COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N

More information

About predictions in spatial autoregressive models: Optimal and almost optimal strategies

About predictions in spatial autoregressive models: Optimal and almost optimal strategies About predictions in spatial autoregressive models: ptimal and almost optimal strategies Michel Goulard INRA, UMR 1201 DYNAFR, Chemin de Borde Rouge BP52627, F31326 Castanet-Tolosan, FRANCE Thibault Laurent

More information

Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. Philip Kostov and Seamus McErlean

Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. Philip Kostov and Seamus McErlean Using Mixtures-of-Distributions models to inform farm size selection decisions in representative farm modelling. by Philip Kostov and Seamus McErlean Working Paper, Agricultural and Food Economics, Queen

More information

Validation of Software for Bayesian Models using Posterior Quantiles. Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT

Validation of Software for Bayesian Models using Posterior Quantiles. Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT Validation of Software for Bayesian Models using Posterior Quantiles Samantha R. Cook Andrew Gelman Donald B. Rubin DRAFT Abstract We present a simulation-based method designed to establish that software

More information

Continued Fractions and the Euclidean Algorithm

Continued Fractions and the Euclidean Algorithm Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction

More information

171:290 Model Selection Lecture II: The Akaike Information Criterion

171:290 Model Selection Lecture II: The Akaike Information Criterion 171:290 Model Selection Lecture II: The Akaike Information Criterion Department of Biostatistics Department of Statistics and Actuarial Science August 28, 2012 Introduction AIC, the Akaike Information

More information

Treatment of Incomplete Data in the Field of Operational Risk: The Effects on Parameter Estimates, EL and UL Figures

Treatment of Incomplete Data in the Field of Operational Risk: The Effects on Parameter Estimates, EL and UL Figures Chernobai.qxd 2/1/ 1: PM Page 1 Treatment of Incomplete Data in the Field of Operational Risk: The Effects on Parameter Estimates, EL and UL Figures Anna Chernobai; Christian Menn*; Svetlozar T. Rachev;

More information

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes

Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, Discrete Changes JunXuJ.ScottLong Indiana University August 22, 2005 The paper provides technical details on

More information

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

More information

Exponential Random Graph Models for Social Network Analysis. Danny Wyatt 590AI March 6, 2009

Exponential Random Graph Models for Social Network Analysis. Danny Wyatt 590AI March 6, 2009 Exponential Random Graph Models for Social Network Analysis Danny Wyatt 590AI March 6, 2009 Traditional Social Network Analysis Covered by Eytan Traditional SNA uses descriptive statistics Path lengths

More information

Course: Model, Learning, and Inference: Lecture 5

Course: Model, Learning, and Inference: Lecture 5 Course: Model, Learning, and Inference: Lecture 5 Alan Yuille Department of Statistics, UCLA Los Angeles, CA 90095 yuille@stat.ucla.edu Abstract Probability distributions on structured representation.

More information

Markov Chain Monte Carlo Simulation Made Simple

Markov Chain Monte Carlo Simulation Made Simple Markov Chain Monte Carlo Simulation Made Simple Alastair Smith Department of Politics New York University April2,2003 1 Markov Chain Monte Carlo (MCMC) simualtion is a powerful technique to perform numerical

More information