CHAPTER 5: MAXIMUM LIKELIHOOD ESTIMATION


Introduction to Efficient Estimation

Goal: The MLE is an asymptotically efficient estimator under some regularity conditions.

Basic setting

Suppose $X_1, \ldots, X_n$ are i.i.d. from $P_{\theta_0}$ in the model $\mathcal{P}$.

(A0) $\theta \neq \theta'$ implies $P_\theta \neq P_{\theta'}$ (identifiability).

(A1) $P_\theta$ has a density function $p_\theta$ with respect to a dominating $\sigma$-finite measure $\mu$.

(A2) The set $\{x : p_\theta(x) > 0\}$ does not depend on $\theta$.

MLE definition

$$L_n(\theta) = \prod_{i=1}^n p_\theta(X_i), \qquad l_n(\theta) = \sum_{i=1}^n \log p_\theta(X_i).$$

$L_n(\theta)$ and $l_n(\theta)$ are called the likelihood function and the log-likelihood function of $\theta$, respectively. An estimator $\hat\theta_n$ of $\theta_0$ is the maximum likelihood estimator (MLE) of $\theta_0$ if it maximizes the likelihood function $L_n(\theta)$.
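
As a concrete numerical companion to the definition above (an added sketch, not part of the original notes), the following Python code maximizes the log-likelihood $l_n(\theta)$ for an assumed i.i.d. Exponential($\theta$) sample; the data, seed, and function names are illustrative, and the closed-form MLE $1/\bar X$ serves as a check.

```python
# Minimal sketch (assumed Exponential(theta) model, not from the notes):
# maximize l_n(theta) = sum_i log p_theta(X_i) numerically and compare with 1/mean(X).
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
theta0 = 2.0
x = rng.exponential(scale=1.0 / theta0, size=500)   # X_1, ..., X_n ~ Exp(theta0)

def neg_loglik(theta):
    # -l_n(theta), with log p_theta(x) = log(theta) - theta * x
    return -(len(x) * np.log(theta) - theta * x.sum())

fit = minimize_scalar(neg_loglik, bounds=(1e-6, 50.0), method="bounded")
print("numerical MLE:", fit.x, "   closed form 1/mean(X):", 1.0 / x.mean())
```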

Ad Hoc Arguments

$$\sqrt{n}(\hat\theta_n - \theta_0) \to_d N(0, I(\theta_0)^{-1})$$

Consistency: $\hat\theta_n \to \theta_0$ (no asymptotic bias).

Efficiency: the asymptotic variance attains the efficiency bound $I(\theta_0)^{-1}$.

Consistency

Definition 5.1 Let $P$ be a probability measure and let $Q$ be another measure on $(\Omega, \mathcal{A})$ with densities $p$ and $q$ with respect to a $\sigma$-finite measure $\mu$ ($\mu = P + Q$ always works). Suppose $P(\Omega) = 1$ and $Q(\Omega) \le 1$. Then the Kullback-Leibler information $K(P, Q)$ is
$$K(P, Q) = E_P\Big[\log \frac{p(X)}{q(X)}\Big].$$
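
A quick numerical illustration of Definition 5.1 (an added sketch, not from the notes): for two discrete distributions $p$ and $q$ on a common support, the function below evaluates $K(P, Q) = E_P[\log p(X)/q(X)]$, which is strictly positive here for $p \neq q$ and zero when $q = p$, in line with Proposition 5.1 below.

```python
# Minimal sketch (discrete distributions assumed, not from the notes):
# Kullback-Leibler information K(P, Q) = sum_x p(x) * log(p(x)/q(x)).
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.3, 0.4, 0.3])
print(kl(p, q), kl(p, p))   # first value is > 0, second is exactly 0
```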

Proposition 5.1 $K(P, Q)$ is well-defined, and $K(P, Q) \ge 0$. Moreover, $K(P, Q) = 0$ if and only if $P = Q$.

Proof. By Jensen's inequality,
$$K(P, Q) = E_P\Big[-\log \frac{q(X)}{p(X)}\Big] \ge -\log E_P\Big[\frac{q(X)}{p(X)}\Big] = -\log Q(\Omega) \ge 0.$$
Equality holds if and only if $p(x) = M q(x)$ almost surely with respect to $P$ and $Q(\Omega) = 1$, which implies $P = Q$.

Why is the MLE consistent? Since $\hat\theta_n$ maximizes $l_n(\theta)$,
$$\frac{1}{n}\sum_{i=1}^n \log p_{\hat\theta_n}(X_i) \ge \frac{1}{n}\sum_{i=1}^n \log p_{\theta_0}(X_i).$$
Suppose $\hat\theta_n \to \theta^*$. Then we would expect both sides to converge, giving
$$E_{\theta_0}[\log p_{\theta^*}(X)] \ge E_{\theta_0}[\log p_{\theta_0}(X)],$$
which implies $K(P_{\theta_0}, P_{\theta^*}) \le 0$. From Proposition 5.1, $P_{\theta_0} = P_{\theta^*}$. From (A0), $\theta^* = \theta_0$. That is, $\hat\theta_n$ converges to $\theta_0$.

Why is the MLE efficient? Suppose $\hat\theta_n \to \theta_0$. $\hat\theta_n$ solves the likelihood (or score) equation
$$\dot l_n(\hat\theta_n) = \sum_{i=1}^n \dot\ell_{\hat\theta_n}(X_i) = 0.$$
A Taylor expansion at $\theta_0$ gives
$$0 = \sum_{i=1}^n \dot\ell_{\theta_0}(X_i) + \sum_{i=1}^n \ddot\ell_{\theta^*}(X_i)(\hat\theta_n - \theta_0),$$
where $\theta^*$ is between $\theta_0$ and $\hat\theta_n$. Hence
$$\sqrt{n}(\hat\theta_n - \theta_0) = \Big\{-\frac{1}{n}\sum_{i=1}^n \ddot\ell_{\theta^*}(X_i)\Big\}^{-1}\Big\{\frac{1}{\sqrt{n}}\sum_{i=1}^n \dot\ell_{\theta_0}(X_i)\Big\}.$$

$\sqrt{n}(\hat\theta_n - \theta_0)$ is therefore asymptotically equivalent to
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n I(\theta_0)^{-1}\dot\ell_{\theta_0}(X_i).$$
Then $\hat\theta_n$ is an asymptotically linear estimator of $\theta_0$ with influence function $I(\theta_0)^{-1}\dot\ell_{\theta_0} = \tilde l(\cdot, P_{\theta_0} \mid \theta, \mathcal{P})$.

Consistency Results

Theorem 5.1 (Consistency with dominating function) Suppose that
(a) $\Theta$ is compact;
(b) $\log p_\theta(x)$ is continuous in $\theta$ for all $x$;
(c) there exists a function $F(x)$ such that $E_{\theta_0}[F(X)] < \infty$ and $|\log p_\theta(x)| \le F(x)$ for all $x$ and $\theta$.
Then $\hat\theta_n \to_{a.s.} \theta_0$.

Proof. For any sample $\omega \in \Omega$, the sequence $\{\hat\theta_n\}$ lies in the compact set $\Theta$. By choosing a subsequence, $\hat\theta_n \to \theta^*$. If
$$\frac{1}{n}\sum_{i=1}^n \ell_{\hat\theta_n}(X_i) \to E_{\theta_0}[\ell_{\theta^*}(X)],$$
then since
$$\frac{1}{n}\sum_{i=1}^n \ell_{\hat\theta_n}(X_i) \ge \frac{1}{n}\sum_{i=1}^n \ell_{\theta_0}(X_i),$$
we obtain $E_{\theta_0}[\ell_{\theta^*}(X)] \ge E_{\theta_0}[\ell_{\theta_0}(X)]$, so $\theta^* = \theta_0$. Done!

It remains to show $\mathbb{P}_n[\ell_{\hat\theta_n}(X)] \equiv \frac{1}{n}\sum_{i=1}^n \ell_{\hat\theta_n}(X_i) \to E_{\theta_0}[\ell_{\theta^*}(X)]$. It suffices to show $\big|\mathbb{P}_n[\ell_{\hat\theta_n}(X)] - E_{\theta_0}[\ell_{\hat\theta_n}(X)]\big| \to 0$.

We can even prove the following uniform convergence result:
$$\sup_{\theta \in \Theta}\big|\mathbb{P}_n[\ell_\theta(X)] - E_{\theta_0}[\ell_\theta(X)]\big| \to 0.$$
Define
$$\psi(x, \theta, \rho) = \sup_{|\theta' - \theta| < \rho}\big(\ell_{\theta'}(x) - E_{\theta_0}[\ell_{\theta'}(X)]\big).$$
Since $\ell_\theta$ is continuous, $\psi(x, \theta, \rho)$ is measurable, and by the dominated convergence theorem $E_{\theta_0}[\psi(X, \theta, \rho)]$ decreases to $E_{\theta_0}\big[\ell_\theta(X) - E_{\theta_0}[\ell_\theta(X)]\big] = 0$ as $\rho \downarrow 0$. Hence, for any $\epsilon > 0$ and any $\theta \in \Theta$, there exists $\rho_\theta$ such that $E_{\theta_0}[\psi(X, \theta, \rho_\theta)] < \epsilon$.

The union of the balls $\{\theta' : |\theta' - \theta| < \rho_\theta\}$, $\theta \in \Theta$, covers $\Theta$. By the compactness of $\Theta$, there exist finitely many $\theta_1, \ldots, \theta_m$ such that
$$\Theta \subset \bigcup_{i=1}^m \{\theta : |\theta - \theta_i| < \rho_{\theta_i}\}.$$
Then
$$\sup_{\theta \in \Theta}\big\{\mathbb{P}_n[\ell_\theta(X)] - E_{\theta_0}[\ell_\theta(X)]\big\} \le \max_{1 \le i \le m}\mathbb{P}_n[\psi(X, \theta_i, \rho_{\theta_i})],$$
so
$$\limsup_n \sup_{\theta \in \Theta}\big\{\mathbb{P}_n[\ell_\theta(X)] - E_{\theta_0}[\ell_\theta(X)]\big\} \le \max_{1 \le i \le m} E_{\theta_0}[\psi(X, \theta_i, \rho_{\theta_i})] \le \epsilon,$$
and therefore $\limsup_n \sup_{\theta \in \Theta}\{\mathbb{P}_n[\ell_\theta(X)] - E_{\theta_0}[\ell_\theta(X)]\} \le 0$. Similarly, $\limsup_n \sup_{\theta \in \Theta}\{-\mathbb{P}_n[\ell_\theta(X)] + E_{\theta_0}[\ell_\theta(X)]\} \le 0$. Hence
$$\lim_n \sup_{\theta \in \Theta}\big|\mathbb{P}_n[\ell_\theta(X)] - E_{\theta_0}[\ell_\theta(X)]\big| = 0.$$

Theorem 5.2 (Wald's Consistency) Suppose $\Theta$ is compact and $\theta \mapsto \ell_\theta(x) = \log p_\theta(x)$ is upper-semicontinuous for all $x$, in the sense that $\limsup_{\theta' \to \theta}\ell_{\theta'}(x) \le \ell_\theta(x)$. Suppose that for every sufficiently small ball $U \subset \Theta$, $E_{\theta_0}[\sup_{\theta \in U}\ell_\theta(X)] < \infty$. Then $\hat\theta_n \to_p \theta_0$.

Proof. Since $E_{\theta_0}[\ell_{\theta_0}(X)] > E_{\theta_0}[\ell_\theta(X)]$ for any $\theta \neq \theta_0$, there exists a ball $U_\theta$ containing $\theta$ such that
$$E_{\theta_0}[\ell_{\theta_0}(X)] > E_{\theta_0}\big[\sup_{\theta' \in U_\theta}\ell_{\theta'}(X)\big].$$
Otherwise, there would exist a sequence $\theta_m \to \theta$ with $E_{\theta_0}[\ell_{\theta_0}(X)] \le E_{\theta_0}[\ell_{\theta_m}(X)]$. Since $\ell_{\theta_m}(x) \le \sup_{\theta' \in U}\ell_{\theta'}(x)$, where $U$ is a ball satisfying the integrability condition,
$$\limsup_m E_{\theta_0}[\ell_{\theta_m}(X)] \le E_{\theta_0}\big[\limsup_m \ell_{\theta_m}(X)\big] \le E_{\theta_0}[\ell_\theta(X)],$$
so $E_{\theta_0}[\ell_{\theta_0}(X)] \le E_{\theta_0}[\ell_\theta(X)]$, a contradiction!

For any $\epsilon > 0$, the balls $\{U_\theta\}$ cover the compact set $\Theta \cap \{|\theta - \theta_0| > \epsilon\}$, so there exists a finite covering $U_1, \ldots, U_m$. Then
$$P\big(|\hat\theta_n - \theta_0| > \epsilon\big) \le P\Big(\sup_{|\theta - \theta_0| > \epsilon}\mathbb{P}_n[\ell_\theta(X)] \ge \mathbb{P}_n[\ell_{\theta_0}(X)]\Big) \le P\Big(\max_{1 \le i \le m}\mathbb{P}_n\big[\sup_{\theta \in U_i}\ell_\theta(X)\big] \ge \mathbb{P}_n[\ell_{\theta_0}(X)]\Big) \le \sum_{i=1}^m P\Big(\mathbb{P}_n\big[\sup_{\theta \in U_i}\ell_\theta(X)\big] \ge \mathbb{P}_n[\ell_{\theta_0}(X)]\Big).$$
Since $\mathbb{P}_n[\sup_{\theta \in U_i}\ell_\theta(X)] \to_{a.s.} E_{\theta_0}[\sup_{\theta \in U_i}\ell_\theta(X)] < E_{\theta_0}[\ell_{\theta_0}(X)]$, the right-hand side converges to zero. Hence $\hat\theta_n \to_p \theta_0$.

Asymptotic Efficiency Result

Theorem 5.3 Suppose that the model $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ is Hellinger differentiable at an inner point $\theta_0$ of $\Theta \subset \mathbb{R}^k$. Furthermore, suppose that there exists a measurable function $F$ with $E_{\theta_0}[F^2] < \infty$ such that for every $\theta_1$ and $\theta_2$ in a neighborhood of $\theta_0$,
$$|\log p_{\theta_1}(x) - \log p_{\theta_2}(x)| \le F(x)\,|\theta_1 - \theta_2|.$$
If the Fisher information matrix $I(\theta_0)$ is nonsingular and $\hat\theta_n$ is consistent, then
$$\sqrt{n}(\hat\theta_n - \theta_0) = \frac{1}{\sqrt{n}}\sum_{i=1}^n I(\theta_0)^{-1}\dot\ell_{\theta_0}(X_i) + o_{P_{\theta_0}}(1).$$
In particular, $\sqrt{n}(\hat\theta_n - \theta_0) \to_d N(0, I(\theta_0)^{-1})$.

Proof. For any $h_n \to h$, by the Hellinger differentiability,
$$\sqrt{n}\,W_n \to h^T\dot\ell_{\theta_0} \quad \text{in } L_2(P_{\theta_0}), \qquad \text{where } W_n = 2\Big(\sqrt{p_{\theta_0 + h_n/\sqrt{n}}/p_{\theta_0}} - 1\Big),$$
and hence
$$\sqrt{n}\big(\log p_{\theta_0 + h_n/\sqrt{n}} - \log p_{\theta_0}\big) = 2\sqrt{n}\log(1 + W_n/2) \to_p h^T\dot\ell_{\theta_0}.$$
Both
$$E_{\theta_0}\Big[\sqrt{n}(\mathbb{P}_n - P)\big[\sqrt{n}(\log p_{\theta_0 + h_n/\sqrt{n}} - \log p_{\theta_0}) - h^T\dot\ell_{\theta_0}\big]\Big] \quad \text{and} \quad Var_{\theta_0}\Big[\sqrt{n}(\mathbb{P}_n - P)\big[\sqrt{n}(\log p_{\theta_0 + h_n/\sqrt{n}} - \log p_{\theta_0}) - h^T\dot\ell_{\theta_0}\big]\Big]$$
tend to zero, so
$$\sqrt{n}(\mathbb{P}_n - P)\big[\sqrt{n}(\log p_{\theta_0 + h_n/\sqrt{n}} - \log p_{\theta_0}) - h^T\dot\ell_{\theta_0}\big] \to_p 0.$$

From Step I in proving Theorem 4.1,
$$\sum_{i=1}^n\big(\log p_{\theta_0 + h_n/\sqrt{n}} - \log p_{\theta_0}\big)(X_i) = \frac{1}{\sqrt{n}}\sum_{i=1}^n h^T\dot\ell_{\theta_0}(X_i) - \frac{1}{2}h^T I(\theta_0)h + o_{P_{\theta_0}}(1),$$
and
$$n E_{\theta_0}\big[\log p_{\theta_0 + h_n/\sqrt{n}} - \log p_{\theta_0}\big] \to -\frac{1}{2}h^T I(\theta_0)h.$$
Therefore
$$n\mathbb{P}_n\big[\log p_{\theta_0 + h_n/\sqrt{n}} - \log p_{\theta_0}\big] = -\frac{1}{2}h_n^T I(\theta_0)h_n + h_n^T\sqrt{n}(\mathbb{P}_n - P)[\dot\ell_{\theta_0}] + o_{P_{\theta_0}}(1).$$
Choose $h_n = \sqrt{n}(\hat\theta_n - \theta_0)$ and $h_n = I(\theta_0)^{-1}\sqrt{n}(\mathbb{P}_n - P)[\dot\ell_{\theta_0}]$, respectively. Since $\hat\theta_n$ maximizes the likelihood, the first choice gives a value at least as large as the second, so
$$n\mathbb{P}_n\big[\log p_{\hat\theta_n} - \log p_{\theta_0}\big] \ge \frac{1}{2}\big\{\sqrt{n}(\mathbb{P}_n - P)[\dot\ell_{\theta_0}]\big\}^T I(\theta_0)^{-1}\big\{\sqrt{n}(\mathbb{P}_n - P)[\dot\ell_{\theta_0}]\big\} + o_{P_{\theta_0}}(1).$$

Comparing the above two equations,
$$-\frac{1}{2}\big\{\sqrt{n}(\hat\theta_n - \theta_0) - I(\theta_0)^{-1}\sqrt{n}(\mathbb{P}_n - P)[\dot\ell_{\theta_0}]\big\}^T I(\theta_0)\big\{\sqrt{n}(\hat\theta_n - \theta_0) - I(\theta_0)^{-1}\sqrt{n}(\mathbb{P}_n - P)[\dot\ell_{\theta_0}]\big\} + o_{P_{\theta_0}}(1) \ge 0.$$
Since $I(\theta_0)$ is positive definite, this forces
$$\sqrt{n}(\hat\theta_n - \theta_0) = I(\theta_0)^{-1}\sqrt{n}(\mathbb{P}_n - P)[\dot\ell_{\theta_0}] + o_{P_{\theta_0}}(1).$$

Theorem 5.4 Suppose $\theta$ ranges over an open subset of Euclidean space and $\theta \mapsto \ell_\theta(x) = \log p_\theta(x)$ is twice continuously differentiable for every $x$. Suppose $E_{\theta_0}[\dot\ell_{\theta_0}\dot\ell_{\theta_0}^T] < \infty$ and $E_{\theta_0}[\ddot\ell_{\theta_0}]$ exists and is nonsingular. Assume that the second partial derivatives of $\ell_\theta(x)$ are dominated by a fixed integrable function $F(x)$ for every $\theta$ in a neighborhood of $\theta_0$. Suppose $\hat\theta_n \to_p \theta_0$. Then
$$\sqrt{n}(\hat\theta_n - \theta_0) = -\big(E_{\theta_0}[\ddot\ell_{\theta_0}]\big)^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n \dot\ell_{\theta_0}(X_i) + o_{P_{\theta_0}}(1).$$

Proof. $\hat\theta_n$ solves $0 = \sum_{i=1}^n \dot\ell_{\hat\theta_n}(X_i)$. A Taylor expansion gives
$$0 = \sum_{i=1}^n \dot\ell_{\theta_0}(X_i) + \Big\{\sum_{i=1}^n \ddot\ell_{\theta_0}(X_i)\Big\}(\hat\theta_n - \theta_0) + \frac{1}{2}(\hat\theta_n - \theta_0)^T\Big\{\sum_{i=1}^n \ell^{(3)}_{\theta_n^*}(X_i)\Big\}(\hat\theta_n - \theta_0),$$
where $\theta_n^*$ is between $\theta_0$ and $\hat\theta_n$. Therefore
$$\Big|\Big\{\frac{1}{n}\sum_{i=1}^n \ddot\ell_{\theta_0}(X_i)\Big\}(\hat\theta_n - \theta_0) + \frac{1}{n}\sum_{i=1}^n \dot\ell_{\theta_0}(X_i)\Big| \le |\hat\theta_n - \theta_0|^2\,\frac{1}{n}\sum_{i=1}^n F(X_i),$$
and since $\hat\theta_n - \theta_0 = o_p(1)$, the remainder is negligible. Consequently,
$$\sqrt{n}(\hat\theta_n - \theta_0)\Big\{\frac{1}{n}\sum_{i=1}^n \ddot\ell_{\theta_0}(X_i) + o_p(1)\Big\} = -\frac{1}{\sqrt{n}}\sum_{i=1}^n \dot\ell_{\theta_0}(X_i).$$

Computation of MLE

Solve the likelihood (score) equation
$$\sum_{i=1}^n \dot\ell_\theta(X_i) = 0.$$
Newton-Raphson iteration: at the $k$th iteration,
$$\theta^{(k+1)} = \theta^{(k)} - \Big\{\frac{1}{n}\sum_{i=1}^n \ddot\ell_{\theta^{(k)}}(X_i)\Big\}^{-1}\Big\{\frac{1}{n}\sum_{i=1}^n \dot\ell_{\theta^{(k)}}(X_i)\Big\}.$$
Note that $-\frac{1}{n}\sum_{i=1}^n \ddot\ell_{\theta^{(k)}}(X_i) \approx I(\theta^{(k)})$.
Fisher scoring algorithm:
$$\theta^{(k+1)} = \theta^{(k)} + I(\theta^{(k)})^{-1}\Big\{\frac{1}{n}\sum_{i=1}^n \dot\ell_{\theta^{(k)}}(X_i)\Big\}.$$
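
As an added illustration (not from the notes), the sketch below runs the Newton-Raphson iteration above for an assumed Poisson($\lambda$) model, whose MLE is the sample mean and so is easy to verify; replacing $-\frac{1}{n}\sum_i \ddot\ell_{\theta^{(k)}}(X_i)$ in the update by $I(\lambda) = 1/\lambda$ would give the Fisher scoring variant.

```python
# Minimal Newton-Raphson sketch for the score equation (assumed Poisson model,
# not from the notes); the MLE equals the sample mean, which we use as a check.
import numpy as np

rng = np.random.default_rng(1)
x = rng.poisson(lam=3.5, size=200)

def score(lam):          # (1/n) sum_i d/d lambda log p_lambda(X_i) = mean(x/lam - 1)
    return np.mean(x / lam - 1.0)

def second_deriv(lam):   # (1/n) sum_i d^2/d lambda^2 log p_lambda(X_i)
    return -np.mean(x) / lam ** 2

lam = 1.0                                    # starting value theta^(1)
for _ in range(25):
    step = -score(lam) / second_deriv(lam)   # Newton-Raphson update
    lam += step
    if abs(step) < 1e-10:
        break
print("Newton-Raphson MLE:", lam, "   sample mean:", x.mean())
```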

Alternatively, optimize the likelihood function directly with an optimum search algorithm: grid search, quasi-Newton methods (gradient descent algorithms), MCMC, simulated annealing.

EM Algorithm for Missing Data

When part of the data is missing, or some mis-measured data are observed, a commonly used algorithm is the expectation-maximization (EM) algorithm.

Framework of the EM algorithm: $Y = (Y_{mis}, Y_{obs})$. $R$ is a vector of 0/1 indicating which components are missing/not missing, so $Y_{obs} = RY$. The density function for the observed data $(Y_{obs}, R)$ is
$$\int_{Y_{mis}} f(Y; \theta)\,P(R \mid Y)\,dY_{mis}.$$

Missing mechanism. Missing at random (MAR) assumption: $P(R \mid Y) = P(R \mid Y_{obs})$ and $P(R \mid Y)$ does not depend on $\theta$; i.e., the missing probability only depends on the observed data and is not informative about $\theta$. Under MAR, the observed-data likelihood factors as
$$\Big\{\int_{Y_{mis}} f(Y; \theta)\,dY_{mis}\Big\}\,P(R \mid Y_{obs}),$$
so we maximize
$$\int_{Y_{mis}} f(Y; \theta)\,dY_{mis} \quad \text{or} \quad \log\int_{Y_{mis}} f(Y; \theta)\,dY_{mis}.$$

Details of the EM algorithm. We start from any initial value $\theta^{(1)}$ and use the following iterations, where the $k$th iteration consists of an E-step and an M-step.

E-step. We evaluate the conditional expectation $E[\log f(Y; \theta) \mid Y_{obs}, \theta^{(k)}]$:
$$E\big[\log f(Y; \theta) \mid Y_{obs}, \theta^{(k)}\big] = \frac{\int_{Y_{mis}}[\log f(Y; \theta)]\,f(Y; \theta^{(k)})\,dY_{mis}}{\int_{Y_{mis}} f(Y; \theta^{(k)})\,dY_{mis}}.$$

M-step. We obtain $\theta^{(k+1)}$ by maximizing $E[\log f(Y; \theta) \mid Y_{obs}, \theta^{(k)}]$ over $\theta$. We then iterate until convergence of $\theta$; i.e., until the difference between $\theta^{(k+1)}$ and $\theta^{(k)}$ is less than a given criterion.

Rationale for why EM works.

Theorem 5.5 At each iteration of the EM algorithm,
$$\log f(Y_{obs}; \theta^{(k+1)}) \ge \log f(Y_{obs}; \theta^{(k)}),$$
and equality holds if and only if $\theta^{(k+1)} = \theta^{(k)}$.

Proof. By the definition of the M-step,
$$E\big[\log f(Y; \theta^{(k+1)}) \mid Y_{obs}, \theta^{(k)}\big] \ge E\big[\log f(Y; \theta^{(k)}) \mid Y_{obs}, \theta^{(k)}\big].$$
Writing $f(Y; \theta) = f(Y_{mis} \mid Y_{obs}; \theta)\,f(Y_{obs}; \theta)$, this becomes
$$E\big[\log f(Y_{mis} \mid Y_{obs}; \theta^{(k+1)}) \mid Y_{obs}, \theta^{(k)}\big] + \log f(Y_{obs}; \theta^{(k+1)}) \ge E\big[\log f(Y_{mis} \mid Y_{obs}; \theta^{(k)}) \mid Y_{obs}, \theta^{(k)}\big] + \log f(Y_{obs}; \theta^{(k)}).$$
By the Kullback-Leibler inequality (Proposition 5.1),
$$E\big[\log f(Y_{mis} \mid Y_{obs}; \theta^{(k+1)}) \mid Y_{obs}, \theta^{(k)}\big] \le E\big[\log f(Y_{mis} \mid Y_{obs}; \theta^{(k)}) \mid Y_{obs}, \theta^{(k)}\big],$$
so $\log f(Y_{obs}; \theta^{(k+1)}) \ge \log f(Y_{obs}; \theta^{(k)})$. Equality holds iff $\log f(Y_{mis} \mid Y_{obs}; \theta^{(k+1)}) = \log f(Y_{mis} \mid Y_{obs}; \theta^{(k)})$, i.e., $\log f(Y; \theta^{(k+1)}) = \log f(Y; \theta^{(k)})$.

Incorporating Newton-Raphson in EM

E-step. We evaluate the conditional expectations
$$E\Big[\frac{\partial}{\partial\theta}\log f(Y; \theta)\,\Big|\,Y_{obs}, \theta^{(k)}\Big] \quad \text{and} \quad E\Big[\frac{\partial^2}{\partial\theta^2}\log f(Y; \theta)\,\Big|\,Y_{obs}, \theta^{(k)}\Big].$$

M-step. We obtain $\theta^{(k+1)}$ by solving
$$0 = E\Big[\frac{\partial}{\partial\theta}\log f(Y; \theta)\,\Big|\,Y_{obs}, \theta^{(k)}\Big]$$
using a one-step Newton-Raphson iteration:
$$\theta^{(k+1)} = \theta^{(k)} - \Big\{E\Big[\frac{\partial^2}{\partial\theta^2}\log f(Y; \theta)\,\Big|\,Y_{obs}, \theta^{(k)}\Big]\Big\}^{-1} E\Big[\frac{\partial}{\partial\theta}\log f(Y; \theta)\,\Big|\,Y_{obs}, \theta^{(k)}\Big]\bigg|_{\theta = \theta^{(k)}}.$$

Example. Suppose a random vector $Y$ has a multinomial distribution with $n = 197$ and
$$p = \Big(\frac{1}{2} + \frac{\theta}{4},\ \frac{1 - \theta}{4},\ \frac{1 - \theta}{4},\ \frac{\theta}{4}\Big).$$
Then the probability of $Y = (y_1, y_2, y_3, y_4)$ is given by
$$\frac{n!}{y_1!\,y_2!\,y_3!\,y_4!}\Big(\frac{1}{2} + \frac{\theta}{4}\Big)^{y_1}\Big(\frac{1 - \theta}{4}\Big)^{y_2}\Big(\frac{1 - \theta}{4}\Big)^{y_3}\Big(\frac{\theta}{4}\Big)^{y_4}.$$
Suppose we observe $Y = (125, 18, 20, 34)$. If we start with $\theta^{(1)} = 0.5$, the Newton-Raphson iteration converges to the MLE $\hat\theta_n$.
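
A direct Newton-Raphson sketch for this example (added here, not from the notes), using the observed counts $Y = (125, 18, 20, 34)$ and the starting value $\theta^{(1)} = 0.5$ from the slide; the score and second derivative are obtained analytically from the log-likelihood above (constants dropped).

```python
# Newton-Raphson for the multinomial example (sketch, not from the notes).
y1, y2, y3, y4 = 125, 18, 20, 34

def score(t):       # derivative of y1*log(1/2 + t/4) + (y2+y3)*log((1-t)/4) + y4*log(t/4)
    return y1 * 0.25 / (0.5 + t / 4) - (y2 + y3) / (1 - t) + y4 / t

def second_deriv(t):
    return -y1 * 0.0625 / (0.5 + t / 4) ** 2 - (y2 + y3) / (1 - t) ** 2 - y4 / t ** 2

theta = 0.5
for _ in range(50):
    step = -score(theta) / second_deriv(theta)
    theta += step
    if abs(step) < 1e-12:
        break
print("Newton-Raphson MLE of theta:", theta)
```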

EM algorithm: the full data $X$ has a multinomial distribution with $n$ and
$$p = \Big(\frac{1}{2},\ \frac{\theta}{4},\ \frac{1 - \theta}{4},\ \frac{1 - \theta}{4},\ \frac{\theta}{4}\Big),$$
and the observed data are $Y = (X_1 + X_2, X_3, X_4, X_5)$.

The score equation for the complete data $X$ is simple:
$$0 = \frac{X_2 + X_5}{\theta} - \frac{X_3 + X_4}{1 - \theta}.$$
The M-step of the EM algorithm solves
$$0 = E\Big[\frac{X_2 + X_5}{\theta} - \frac{X_3 + X_4}{1 - \theta}\,\Big|\,Y, \theta^{(k)}\Big],$$
while the E-step evaluates this expectation. Here
$$E[X \mid Y, \theta^{(k)}] = \Big(Y_1\frac{1/2}{1/2 + \theta^{(k)}/4},\ Y_1\frac{\theta^{(k)}/4}{1/2 + \theta^{(k)}/4},\ Y_2,\ Y_3,\ Y_4\Big),$$
so
$$\theta^{(k+1)} = \frac{E[X_2 + X_5 \mid Y, \theta^{(k)}]}{E[X_2 + X_5 + X_3 + X_4 \mid Y, \theta^{(k)}]} = \frac{Y_1\frac{\theta^{(k)}/4}{1/2 + \theta^{(k)}/4} + Y_4}{Y_1\frac{\theta^{(k)}/4}{1/2 + \theta^{(k)}/4} + Y_2 + Y_3 + Y_4}.$$
We start from $\theta^{(1)} = 0.5$.
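
The EM update derived above can be iterated directly; the sketch below (added, not from the notes) does so from $\theta^{(1)} = 0.5$. Run alongside the Newton-Raphson sketch earlier, it should reach the same limit, only more slowly (linearly), consistent with the conclusions below.

```python
# EM iteration for the multinomial example (sketch, not from the notes).
Y1, Y2, Y3, Y4 = 125, 18, 20, 34
theta = 0.5
for k in range(200):
    x2 = Y1 * (theta / 4) / (0.5 + theta / 4)    # E[X_2 | Y, theta^(k)]
    new = (x2 + Y4) / (x2 + Y2 + Y3 + Y4)        # theta^(k+1)
    done = abs(new - theta) < 1e-10
    theta = new
    if done:
        break
print("EM estimate of theta:", theta)
```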

[Table: EM iterates $k$, $\theta^{(k+1)}$, $\theta^{(k+1)} - \theta^{(k)}$, and $(\theta^{(k+1)} - \hat\theta_n)/(\theta^{(k)} - \hat\theta_n)$; the numerical entries are not preserved in this transcription.]

Conclusions: the EM converges and the result agrees with what is obtained from the Newton-Raphson iteration; the EM convergence is linear, in the sense that $(\theta^{(k+1)} - \hat\theta_n)/(\theta^{(k)} - \hat\theta_n)$ approaches a constant near convergence; the convergence of the Newton-Raphson iteration is quadratic, in the sense that $(\theta^{(k+1)} - \hat\theta_n)/(\theta^{(k)} - \hat\theta_n)^2$ approaches a constant near convergence; each EM iteration is much less complex than a Newton-Raphson iteration, and this is the advantage of using the EM algorithm.

More examples: the exponential mixture model. Suppose $Y \sim P_\theta$, where $P_\theta$ has density
$$p_\theta(y) = \big\{p\lambda e^{-\lambda y} + (1 - p)\mu e^{-\mu y}\big\}\,I(y > 0)$$
and $\theta = (p, \lambda, \mu) \in (0, 1) \times (0, \infty) \times (0, \infty)$. Consider estimation of $\theta$ based on $Y_1, \ldots, Y_n$ i.i.d. from $p_\theta(y)$. Solving the likelihood equation with Newton-Raphson is computationally involved.

EM algorithm: the complete data are $X = (Y, \Delta) \sim p_\theta(x)$, where
$$p_\theta(x) = p_\theta(y, \delta) = (p\lambda e^{-\lambda y})^\delta\big((1 - p)\mu e^{-\mu y}\big)^{1 - \delta}.$$
This is natural from the following mechanism: $\Delta$ is a Bernoulli variable with $P(\Delta = 1) = p$, and we generate $Y$ from $\text{Exp}(\lambda)$ if $\Delta = 1$ and from $\text{Exp}(\mu)$ if $\Delta = 0$. Thus $\Delta$ is missing. The score equations for $\theta$ based on $X$ are
$$0 = \dot\ell_p(X_1, \ldots, X_n) = \sum_{i=1}^n\Big\{\frac{\Delta_i}{p} - \frac{1 - \Delta_i}{1 - p}\Big\},$$
$$0 = \dot\ell_\lambda(X_1, \ldots, X_n) = \sum_{i=1}^n \Delta_i\Big(\frac{1}{\lambda} - Y_i\Big),$$
$$0 = \dot\ell_\mu(X_1, \ldots, X_n) = \sum_{i=1}^n (1 - \Delta_i)\Big(\frac{1}{\mu} - Y_i\Big).$$

The M-step solves the equations
$$0 = \sum_{i=1}^n E\Big[\frac{\Delta_i}{p} - \frac{1 - \Delta_i}{1 - p}\,\Big|\,Y_1, \ldots, Y_n, p^{(k)}, \lambda^{(k)}, \mu^{(k)}\Big] = \sum_{i=1}^n E\Big[\frac{\Delta_i}{p} - \frac{1 - \Delta_i}{1 - p}\,\Big|\,Y_i, p^{(k)}, \lambda^{(k)}, \mu^{(k)}\Big],$$
$$0 = \sum_{i=1}^n E\Big[\Delta_i\Big(\frac{1}{\lambda} - Y_i\Big)\,\Big|\,Y_1, \ldots, Y_n, p^{(k)}, \lambda^{(k)}, \mu^{(k)}\Big] = \sum_{i=1}^n E\Big[\Delta_i\Big(\frac{1}{\lambda} - Y_i\Big)\,\Big|\,Y_i, p^{(k)}, \lambda^{(k)}, \mu^{(k)}\Big],$$
$$0 = \sum_{i=1}^n E\Big[(1 - \Delta_i)\Big(\frac{1}{\mu} - Y_i\Big)\,\Big|\,Y_1, \ldots, Y_n, p^{(k)}, \lambda^{(k)}, \mu^{(k)}\Big] = \sum_{i=1}^n E\Big[(1 - \Delta_i)\Big(\frac{1}{\mu} - Y_i\Big)\,\Big|\,Y_i, p^{(k)}, \lambda^{(k)}, \mu^{(k)}\Big].$$

This immediately gives
$$p^{(k+1)} = \frac{1}{n}\sum_{i=1}^n E[\Delta_i \mid Y_i, p^{(k)}, \lambda^{(k)}, \mu^{(k)}],$$
$$\lambda^{(k+1)} = \frac{\sum_{i=1}^n E[\Delta_i \mid Y_i, p^{(k)}, \lambda^{(k)}, \mu^{(k)}]}{\sum_{i=1}^n Y_i\,E[\Delta_i \mid Y_i, p^{(k)}, \lambda^{(k)}, \mu^{(k)}]},$$
$$\mu^{(k+1)} = \frac{\sum_{i=1}^n E[(1 - \Delta_i) \mid Y_i, p^{(k)}, \lambda^{(k)}, \mu^{(k)}]}{\sum_{i=1}^n Y_i\,E[(1 - \Delta_i) \mid Y_i, p^{(k)}, \lambda^{(k)}, \mu^{(k)}]}.$$
The conditional expectation is
$$E[\Delta \mid Y, \theta] = \frac{p\lambda e^{-\lambda Y}}{p\lambda e^{-\lambda Y} + (1 - p)\mu e^{-\mu Y}}.$$
As seen above, the EM algorithm facilitates the computation.
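
A minimal EM sketch for this mixture (added, not from the notes), implementing the $p$, $\lambda$, $\mu$ updates above on simulated data; the data, starting values, and the helper name `em_exp_mixture` are illustrative, and, as for any mixture, the component labels may come out swapped.

```python
# EM for the exponential mixture p*Exp(lambda) + (1-p)*Exp(mu)
# (sketch on simulated data, not from the notes).
import numpy as np

def em_exp_mixture(y, p, lam, mu, n_iter=500, tol=1e-10):
    for _ in range(n_iter):
        # E-step: w_i = E[Delta_i | Y_i, current parameters]
        a = p * lam * np.exp(-lam * y)
        b = (1 - p) * mu * np.exp(-mu * y)
        w = a / (a + b)
        # M-step: closed-form updates from the slide
        p_new = w.mean()
        lam_new = w.sum() / (y * w).sum()
        mu_new = (1 - w).sum() / (y * (1 - w)).sum()
        if max(abs(p_new - p), abs(lam_new - lam), abs(mu_new - mu)) < tol:
            return p_new, lam_new, mu_new
        p, lam, mu = p_new, lam_new, mu_new
    return p, lam, mu

rng = np.random.default_rng(2)
n = 2000
delta = rng.random(n) < 0.3                      # latent labels (treated as missing)
y = np.where(delta, rng.exponential(1 / 2.0, n), rng.exponential(1 / 0.5, n))
print(em_exp_mixture(y, p=0.5, lam=1.0, mu=0.4))
```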

Information Calculation in EM

Notation: $\dot\ell_c$ is the score function for $\theta$ based on the full data; $\dot\ell_{mis|obs}$ is the score for $\theta$ in the conditional distribution of $Y_{mis}$ given $Y_{obs}$; $\dot\ell_{obs}$ is the score for $\theta$ in the distribution of $Y_{obs}$. Then
$$\dot\ell_c = \dot\ell_{mis|obs} + \dot\ell_{obs}, \qquad Var(\dot\ell_c) = Var\big(E[\dot\ell_c \mid Y_{obs}]\big) + E\big[Var(\dot\ell_c \mid Y_{obs})\big].$$

Information in the EM algorithm. We obtain the following Louis formula:
$$I_c(\theta) = I_{obs}(\theta) + E[I_{mis|obs}(\theta, Y_{obs})].$$
Thus the complete information is the sum of the observed information and the missing information. One can even show that when the EM converges, the linear convergence rate, denoted $(\theta^{(k+1)} - \hat\theta_n)/(\theta^{(k)} - \hat\theta_n)$, approximates $1 - I_{obs}(\hat\theta_n)/I_c(\hat\theta_n)$.

Nonparametric Maximum Likelihood Estimation

First example. Let $X_1, \ldots, X_n$ be i.i.d. random variables with common distribution $F$, where $F$ is any unknown distribution function. The likelihood function for $F$ is given by
$$L_n(F) = \prod_{i=1}^n f(X_i),$$
where $f$ is the density function of $F$ with respect to some dominating measure. However, the maximum of $L_n(F)$ does not exist. We instead maximize an alternative function
$$\tilde L_n(F) = \prod_{i=1}^n F\{X_i\},$$
where $F\{X_i\}$ denotes the jump size $F(X_i) - F(X_i-)$.

Second example. Suppose $X_1, \ldots, X_n$ are i.i.d. $F$ and $Y_1, \ldots, Y_n$ are i.i.d. $G$. We observe i.i.d. pairs $(Z_1, \Delta_1), \ldots, (Z_n, \Delta_n)$, where $Z_i = \min(X_i, Y_i)$ and $\Delta_i = I(X_i \le Y_i)$. We can think of $X_i$ as a survival time and $Y_i$ as a censoring time. Then the joint likelihood for $(Z_i, \Delta_i)$, $i = 1, \ldots, n$, is
$$L_n(F, G) = \prod_{i=1}^n\big\{f(Z_i)(1 - G(Z_i))\big\}^{\Delta_i}\big\{(1 - F(Z_i))g(Z_i)\big\}^{1 - \Delta_i}.$$
$L_n(F, G)$ does not attain its maximum, so we consider an alternative function
$$\prod_{i=1}^n\big\{F\{Z_i\}(1 - G(Z_i))\big\}^{\Delta_i}\big\{(1 - F(Z_i))G\{Z_i\}\big\}^{1 - \Delta_i}.$$
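
Maximizing the modified likelihood over $F$ (with the censoring distribution treated as a nuisance) gives the product-limit (Kaplan-Meier) estimator; this identification is standard but is not spelled out on the slide, so the sketch below is offered only as an illustration under that assumption.

```python
# Product-limit (Kaplan-Meier) sketch for right-censored data (not from the notes).
import numpy as np

def kaplan_meier(z, delta):
    """Distinct event times and the estimated survival 1 - F at those times."""
    z, delta = np.asarray(z, float), np.asarray(delta, int)
    surv, times, values = 1.0, [], []
    for t in np.unique(z[delta == 1]):
        at_risk = np.sum(z >= t)                    # still at risk just before t
        d = np.sum((z == t) & (delta == 1))         # events at t
        surv *= 1.0 - d / at_risk
        times.append(t)
        values.append(surv)
    return np.array(times), np.array(values)

# toy data: Z_i = min(X_i, Y_i), Delta_i = I(X_i <= Y_i)
z = np.array([2.0, 3.0, 3.0, 5.0, 8.0, 9.0])
delta = np.array([1, 1, 0, 1, 0, 1])
print(kaplan_meier(z, delta))
```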

Third example. Suppose $T$ is a survival time and $Z$ is a covariate. Assume $T \mid Z$ has conditional hazard function $\lambda(t \mid Z) = \lambda(t)e^{\theta^T Z}$. Then the likelihood function from $n$ i.i.d. observations $(T_i, Z_i)$, $i = 1, \ldots, n$, is given by
$$L_n(\theta, \Lambda) = \prod_{i=1}^n\big\{\lambda(T_i)e^{\theta^T Z_i}\exp\{-\Lambda(T_i)e^{\theta^T Z_i}\}\,f(Z_i)\big\}.$$
Note that $f(Z_i)$ is not informative about $\theta$ and $\lambda$, so we can discard it from the likelihood function. Again, we replace

$\lambda(T_i)$ by the jump $\Lambda\{T_i\}$ and obtain a modified function
$$\tilde L_n(\theta, \Lambda) = \prod_{i=1}^n\big\{\Lambda\{T_i\}e^{\theta^T Z_i}\exp\{-\Lambda(T_i)e^{\theta^T Z_i}\}\big\}.$$
Letting $p_i = \Lambda\{T_i\}$, we maximize
$$\prod_{i=1}^n p_i e^{\theta^T Z_i}\exp\Big\{-\Big(\sum_{T_j \le T_i} p_j\Big)e^{\theta^T Z_i}\Big\}$$
or its logarithm
$$\sum_{i=1}^n\Big[\theta^T Z_i - e^{\theta^T Z_i}\sum_{T_j \le T_i} p_j + \log p_i\Big].$$

Fourth example. We consider $X_1, \ldots, X_n$ i.i.d. $F$ and $Y_1, \ldots, Y_n$ i.i.d. $G$. We only observe $(Y_i, \Delta_i)$, where $\Delta_i = I(X_i \le Y_i)$, for $i = 1, \ldots, n$. These data are one type of interval-censored data (current status data). The likelihood for the observations is
$$\prod_{i=1}^n\big\{F(Y_i)^{\Delta_i}(1 - F(Y_i))^{1 - \Delta_i}\,g(Y_i)\big\}.$$
To derive the NPMLE for $F$ and $G$, we instead maximize
$$\prod_{i=1}^n\big\{P_i^{\Delta_i}(1 - P_i)^{1 - \Delta_i}\,q_i\big\},$$
subject to the constraints that $\sum_i q_i = 1$ and $0 \le P_i \le 1$ increases with $Y_i$.

Clearly $\hat q_i = 1/n$ (suppose the $Y_i$ are all distinct). The constrained maximization turns out to be solved by the following steps:
(i) With the $Y_i$ sorted, plot the points $\big(i, \sum_{Y_j \le Y_i}\Delta_j\big)$, $i = 1, \ldots, n$. This is called the cumulative sum diagram.
(ii) Form $H^*(t)$, the greatest convex minorant of the cumulative sum diagram.
(iii) Let $\hat P_i$ be the left derivative of $H^*$ at $i$. Then $(\hat P_1, \ldots, \hat P_n)$ maximizes the objective function.
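
Steps (i)-(iii) can be carried out with the pool-adjacent-violators algorithm, whose output coincides with the left derivatives of the greatest convex minorant of the cumulative sum diagram; the sketch below (added, not from the notes) applies it to the $\Delta_i$ sorted by $Y_i$.

```python
# Pool-adjacent-violators sketch for the current status NPMLE (not from the notes).
import numpy as np

def current_status_npmle(y, delta):
    order = np.argsort(y)
    d = np.asarray(delta, float)[order]
    vals, wts = list(d), [1] * len(d)        # blocks: (mean value, number of points)
    i = 0
    while i < len(vals) - 1:
        if vals[i] > vals[i + 1]:            # violation: pool the two adjacent blocks
            merged = (vals[i] * wts[i] + vals[i + 1] * wts[i + 1]) / (wts[i] + wts[i + 1])
            vals[i:i + 2] = [merged]
            wts[i:i + 2] = [wts[i] + wts[i + 1]]
            i = max(i - 1, 0)
        else:
            i += 1
    p_hat = np.repeat(vals, wts)             # left derivatives of the convex minorant
    return np.sort(np.asarray(y, float)), p_hat

y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
delta = np.array([0, 1, 0, 1, 1, 1])
print(current_status_npmle(y, delta))
```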

Summary of NPMLE: The NPMLE generalizes maximum likelihood estimation from parametric models to semiparametric or nonparametric models. We replace the functional parameter by an empirical function with jumps only at the observed data and maximize a modified likelihood function. Both the computation of the NPMLE and its asymptotic properties can be difficult and vary across specific problems.

Alternative Efficient Estimation

One-step efficient estimation: start from a strongly consistent estimator of the parameter $\theta$, denoted $\tilde\theta_n$, assuming that $\tilde\theta_n - \theta_0 = O_p(n^{-1/2})$. The one-step procedure is a single Newton-Raphson iteration for the likelihood score equation:
$$\hat\theta_n = \tilde\theta_n - \big\{\ddot l_n(\tilde\theta_n)\big\}^{-1}\dot l_n(\tilde\theta_n),$$
where $\dot l_n(\theta)$ is the score function and $\ddot l_n(\theta)$ is the derivative of $\dot l_n(\theta)$.

Result about the one-step estimation.

Theorem 5.6 Let $\ell_\theta(X)$ be the log-likelihood function of $\theta$. Assume that there exists a neighborhood of $\theta_0$ in which $|\ell^{(3)}_\theta(X)| \le F(X)$ with $E[F(X)] < \infty$. Then
$$\sqrt{n}(\hat\theta_n - \theta_0) \to_d N(0, I(\theta_0)^{-1}),$$
where $I(\theta_0)$ is the Fisher information.

Proof. Since $\tilde\theta_n \to_{a.s.} \theta_0$, we perform a Taylor expansion on the right-hand side of the one-step equation and obtain
$$\hat\theta_n = \tilde\theta_n - \big\{\ddot l_n(\tilde\theta_n)\big\}^{-1}\big\{\dot l_n(\theta_0) + \ddot l_n(\theta^*)(\tilde\theta_n - \theta_0)\big\},$$
where $\theta^*$ is between $\tilde\theta_n$ and $\theta_0$. Hence
$$\hat\theta_n - \theta_0 = \Big[I - \big\{\ddot l_n(\tilde\theta_n)\big\}^{-1}\ddot l_n(\theta^*)\Big](\tilde\theta_n - \theta_0) - \big\{\ddot l_n(\tilde\theta_n)\big\}^{-1}\dot l_n(\theta_0).$$
On the other hand, by the condition that $|\ell^{(3)}_\theta(X)| \le F(X)$ with $E[F(X)] < \infty$,
$$\frac{1}{n}\ddot l_n(\theta^*) \to_{a.s.} E[\ddot\ell_{\theta_0}(X)], \qquad \frac{1}{n}\ddot l_n(\tilde\theta_n) \to_{a.s.} E[\ddot\ell_{\theta_0}(X)].$$
Therefore
$$\hat\theta_n - \theta_0 = o_p(\tilde\theta_n - \theta_0) - \big\{E[\ddot\ell_{\theta_0}(X)] + o_p(1)\big\}^{-1}\frac{1}{n}\dot l_n(\theta_0).$$

A slightly different one-step estimator (Fisher scoring form):
$$\hat\theta_n = \tilde\theta_n + I(\tilde\theta_n)^{-1}\,\frac{1}{n}\sum_{i=1}^n \dot\ell_{\tilde\theta_n}(X_i).$$
Other efficient estimation: Bayesian estimation methods (posterior mode, minimax estimator, etc.).
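
A minimal sketch of a one-step estimator of this Fisher-scoring form (added, not from the notes), for an assumed Cauchy location model: the sample median is a $\sqrt{n}$-consistent starting estimator, the score is $\dot\ell_\theta(x) = 2(x - \theta)/(1 + (x - \theta)^2)$, and $I(\theta) = 1/2$.

```python
# One-step (Fisher scoring) estimator for the Cauchy location model
# (sketch, not from the notes).
import numpy as np

rng = np.random.default_rng(3)
theta0 = 1.0
x = theta0 + rng.standard_cauchy(1000)

theta_tilde = np.median(x)                   # preliminary root-n consistent estimator
score = 2 * (x - theta_tilde) / (1 + (x - theta_tilde) ** 2)
fisher_info = 0.5                            # I(theta) = 1/2 for the Cauchy location model
theta_hat = theta_tilde + (1 / fisher_info) * score.mean()
print("median:", theta_tilde, "   one-step estimator:", theta_hat)
```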

Conclusions: The maximum likelihood approach provides a natural and simple way of deriving an efficient estimator. Other approaches to efficient estimation are possible, such as one-step estimation, Bayesian estimation, etc. How do these ideas generalize from parametric models to semiparametric or nonparametric models?

READING MATERIALS: Ferguson, Sections 16-20, Lehmann and Casella, Sections
