CHAPTER 5: MAXIMUM LIKELIHOOD ESTIMATION
Introduction to Efficient Estimation
Goal: the MLE is an asymptotically efficient estimator under some regularity conditions.
Basic setting
Suppose $X_1, \ldots, X_n$ are i.i.d. from $P_{\theta_0}$ in the model $\mathcal{P}$.
(A0). $\theta \neq \theta'$ implies $P_\theta \neq P_{\theta'}$ (identifiability).
(A1). $P_\theta$ has a density function $p_\theta$ with respect to a dominating $\sigma$-finite measure $\mu$.
(A2). The set $\{x : p_\theta(x) > 0\}$ does not depend on $\theta$.
MLE definition
$$L_n(\theta) = \prod_{i=1}^n p_\theta(X_i), \qquad l_n(\theta) = \sum_{i=1}^n \log p_\theta(X_i).$$
$L_n(\theta)$ and $l_n(\theta)$ are called the likelihood function and the log-likelihood function of $\theta$, respectively. An estimator $\hat\theta_n$ of $\theta_0$ is the maximum likelihood estimator (MLE) of $\theta_0$ if it maximizes the likelihood function $L_n(\theta)$.
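As a small illustration (not part of the original notes), for an i.i.d. $\text{Exp}(\lambda)$ sample the log-likelihood is $l_n(\lambda) = n\log\lambda - \lambda\sum_i X_i$, maximized in closed form at $\hat\lambda_n = 1/\bar{X}_n$; a quick grid check confirms this. The data below are made up for illustration.

```python
import math

def exp_log_lik(lam, xs):
    """Log-likelihood l_n(lambda) = n*log(lambda) - lambda*sum(x) for an Exp(lambda) sample."""
    return len(xs) * math.log(lam) - lam * sum(xs)

xs = [0.5, 1.2, 0.3, 2.0, 0.8, 1.5]   # illustrative data, not from the notes
mle = len(xs) / sum(xs)               # closed-form MLE: 1 / sample mean

# The closed form should dominate every candidate rate on a crude grid.
grid = [0.1 * k for k in range(1, 40)]
assert all(exp_log_lik(mle, xs) >= exp_log_lik(lam, xs) for lam in grid)
```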
Ad Hoc Arguments
$$\sqrt{n}(\hat\theta_n - \theta_0) \rightarrow_d N(0, I(\theta_0)^{-1})$$
Consistency: $\hat\theta_n \rightarrow \theta_0$ (no asymptotic bias).
Efficiency: the asymptotic variance attains the efficiency bound $I(\theta_0)^{-1}$.
Consistency
Definition 5.1 Let $P$ be a probability measure and let $Q$ be another measure on $(\Omega, \mathcal{A})$, with densities $p$ and $q$ with respect to a $\sigma$-finite measure $\mu$ ($\mu = P + Q$ always works), where $P(\Omega) = 1$ and $Q(\Omega) \le 1$. Then the Kullback-Leibler information $K(P, Q)$ is
$$K(P, Q) = E_P\left[\log \frac{p(X)}{q(X)}\right].$$
Proposition 5.1 $K(P, Q)$ is well-defined, and $K(P, Q) \ge 0$. $K(P, Q) = 0$ if and only if $P = Q$.

Proof By Jensen's inequality,
$$K(P, Q) = E_P\left[-\log \frac{q(X)}{p(X)}\right] \ge -\log E_P\left[\frac{q(X)}{p(X)}\right] = -\log Q(\Omega) \ge 0.$$
The equality holds if and only if $p(x) = M q(x)$ almost surely with respect to $P$ and $Q(\Omega) = 1$, which together imply $P = Q$.
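Proposition 5.1 is easy to check numerically for discrete distributions; the following sketch (not from the notes) verifies that $K(P, Q) > 0$ for two distinct distributions and that $K(P, P) = 0$.

```python
import math

def kl(p, q):
    """Kullback-Leibler information K(P, Q) = sum_x p(x)*log(p(x)/q(x)) for discrete P, Q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.2, 0.5, 0.3]
q = [0.4, 0.4, 0.2]

assert kl(p, q) > 0            # distinct distributions: strictly positive
assert abs(kl(p, p)) < 1e-12   # P = Q: the information vanishes
```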
Why is MLE consistent?
$\hat\theta_n$ maximizes $l_n(\theta)$, so
$$\frac{1}{n}\sum_{i=1}^n \log p_{\hat\theta_n}(X_i) \ge \frac{1}{n}\sum_{i=1}^n \log p_{\theta_0}(X_i).$$
Suppose $\hat\theta_n \rightarrow \theta^*$. Then we would expect both sides to converge, giving
$$E_{\theta_0}[\log p_{\theta^*}(X)] \ge E_{\theta_0}[\log p_{\theta_0}(X)],$$
which implies $K(P_{\theta_0}, P_{\theta^*}) \le 0$. From Prop. 5.1, $P_{\theta_0} = P_{\theta^*}$. From (A0), $\theta^* = \theta_0$. That is, $\hat\theta_n$ converges to $\theta_0$.
Why is MLE efficient?
Suppose $\hat\theta_n \rightarrow \theta_0$. $\hat\theta_n$ solves the likelihood (or score) equations
$$\dot{l}_n(\hat\theta_n) = \sum_{i=1}^n \dot{l}_{\hat\theta_n}(X_i) = 0.$$
Taylor expansion at $\theta_0$:
$$0 = \sum_{i=1}^n \dot{l}_{\theta_0}(X_i) + \sum_{i=1}^n \ddot{l}_{\theta^*}(X_i)(\hat\theta_n - \theta_0),$$
where $\theta^*$ is between $\theta_0$ and $\hat\theta_n$. Hence
$$\sqrt{n}(\hat\theta_n - \theta_0) = \left\{-\frac{1}{n}\sum_{i=1}^n \ddot{l}_{\theta^*}(X_i)\right\}^{-1} \left\{\frac{1}{\sqrt{n}}\sum_{i=1}^n \dot{l}_{\theta_0}(X_i)\right\}.$$
$\sqrt{n}(\hat\theta_n - \theta_0)$ is asymptotically equivalent to
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n I(\theta_0)^{-1}\dot{l}_{\theta_0}(X_i).$$
Then $\hat\theta_n$ is an asymptotically linear estimator of $\theta_0$ with influence function $I(\theta_0)^{-1}\dot{l}_{\theta_0} = \tilde{l}(\cdot, P_{\theta_0} \mid \theta, \mathcal{P})$.
Consistency Results
Theorem 5.1 (Consistency with dominating function) Suppose that
(a) $\Theta$ is compact.
(b) $\log p_\theta(x)$ is continuous in $\theta$ for all $x$.
(c) There exists a function $F(x)$ such that $E_{\theta_0}[F(X)] < \infty$ and $|\log p_\theta(x)| \le F(x)$ for all $x$ and $\theta$.
Then $\hat\theta_n \rightarrow_{a.s.} \theta_0$.
Proof For any fixed sample $\omega \in \Omega$, since $\Theta$ is compact, by choosing a subsequence we may assume $\hat\theta_n \rightarrow \theta^*$. If
$$\frac{1}{n}\sum_{i=1}^n l_{\hat\theta_n}(X_i) \rightarrow E_{\theta_0}[l_{\theta^*}(X)],$$
then since
$$\frac{1}{n}\sum_{i=1}^n l_{\hat\theta_n}(X_i) \ge \frac{1}{n}\sum_{i=1}^n l_{\theta_0}(X_i) \rightarrow E_{\theta_0}[l_{\theta_0}(X)],$$
we obtain $E_{\theta_0}[l_{\theta^*}(X)] \ge E_{\theta_0}[l_{\theta_0}(X)]$, hence $\theta^* = \theta_0$. Done!
It remains to show $\mathbb{P}_n[l_{\hat\theta_n}(X)] \equiv \frac{1}{n}\sum_{i=1}^n l_{\hat\theta_n}(X_i) \rightarrow E_{\theta_0}[l_{\theta^*}(X)]$. It suffices to show $\mathbb{P}_n[l_{\hat\theta_n}(X)] - E_{\theta_0}[l_{\hat\theta_n}(X)] \rightarrow 0$.
We can even prove the following uniform convergence result:
$$\sup_{\theta \in \Theta}\left|\mathbb{P}_n[l_\theta(X)] - E_{\theta_0}[l_\theta(X)]\right| \rightarrow 0.$$
Define
$$\psi(x, \theta, \rho) = \sup_{|\theta' - \theta| < \rho}\left(l_{\theta'}(x) - E_{\theta_0}[l_{\theta'}(X)]\right).$$
Since $l_\theta$ is continuous, $\psi(x, \theta, \rho)$ is measurable, and by the dominated convergence theorem, $E_{\theta_0}[\psi(X, \theta, \rho)]$ decreases, as $\rho \downarrow 0$, to
$$E_{\theta_0}\left[l_\theta(X) - E_{\theta_0}[l_\theta(X)]\right] = 0.$$
Thus for any $\epsilon > 0$ and any $\theta \in \Theta$, there exists a $\rho_\theta$ such that $E_{\theta_0}[\psi(X, \theta, \rho_\theta)] < \epsilon$.
The union of $\{\theta' : |\theta' - \theta| < \rho_\theta\}$ over $\theta$ covers $\Theta$. By the compactness of $\Theta$, there exist finitely many $\theta_1, \ldots, \theta_m$ such that
$$\Theta \subset \bigcup_{i=1}^m \{\theta : |\theta - \theta_i| < \rho_{\theta_i}\}.$$
Then
$$\sup_{\theta \in \Theta}\left\{\mathbb{P}_n[l_\theta(X)] - E_{\theta_0}[l_\theta(X)]\right\} \le \max_{1 \le i \le m} \mathbb{P}_n[\psi(X, \theta_i, \rho_{\theta_i})],$$
and therefore
$$\limsup_n \sup_{\theta \in \Theta}\left\{\mathbb{P}_n[l_\theta(X)] - E_{\theta_0}[l_\theta(X)]\right\} \le \max_{1 \le i \le m} E_{\theta_0}[\psi(X, \theta_i, \rho_{\theta_i})] \le \epsilon.$$
Similarly, $\limsup_n \sup_{\theta \in \Theta}\left\{-\mathbb{P}_n[l_\theta(X)] + E_{\theta_0}[l_\theta(X)]\right\} \le \epsilon$. Since $\epsilon$ is arbitrary,
$$\lim_n \sup_{\theta \in \Theta}\left|\mathbb{P}_n[l_\theta(X)] - E_{\theta_0}[l_\theta(X)]\right| = 0.$$
Theorem 5.2 (Wald's Consistency) Suppose $\Theta$ is compact, and $\theta \mapsto l_\theta(x) = \log p_\theta(x)$ is upper-semicontinuous for all $x$, in the sense that
$$\limsup_{\theta' \rightarrow \theta} l_{\theta'}(x) \le l_\theta(x).$$
Suppose for every sufficiently small ball $U \subset \Theta$, $E_{\theta_0}[\sup_{\theta \in U} l_\theta(X)] < \infty$. Then $\hat\theta_n \rightarrow_p \theta_0$.
Proof We first show that $E_{\theta_0}[l_{\theta_0}(X)] > E_{\theta_0}[l_\theta(X)]$ for any $\theta \neq \theta_0$ implies that there exists a ball $U_\theta$ containing $\theta$ such that
$$E_{\theta_0}[l_{\theta_0}(X)] > E_{\theta_0}\left[\sup_{\theta' \in U_\theta} l_{\theta'}(X)\right].$$
Otherwise, there would exist a sequence $\theta_m \rightarrow \theta$ with $E_{\theta_0}[l_{\theta_0}(X)] \le E_{\theta_0}[l_{\theta_m}(X)]$. Since $l_{\theta_m}(x) \le \sup_{\theta' \in U} l_{\theta'}(x)$, where $U$ is a small ball satisfying the integrability condition, upper-semicontinuity and dominated convergence give
$$\limsup_m E_{\theta_0}[l_{\theta_m}(X)] \le E_{\theta_0}\left[\limsup_m l_{\theta_m}(X)\right] \le E_{\theta_0}[l_\theta(X)].$$
Thus $E_{\theta_0}[l_{\theta_0}(X)] \le E_{\theta_0}[l_\theta(X)]$, a contradiction!
For any $\epsilon$, the balls $U_\theta$ cover the compact set $\Theta \cap \{|\theta - \theta_0| > \epsilon\}$, so there exists a finite covering by balls $U_1, \ldots, U_m$. Then
$$P(|\hat\theta_n - \theta_0| > \epsilon) \le P\left(\sup_{|\theta - \theta_0| > \epsilon} \mathbb{P}_n[l_\theta(X)] \ge \mathbb{P}_n[l_{\theta_0}(X)]\right) \le P\left(\max_{1 \le i \le m} \mathbb{P}_n\left[\sup_{\theta \in U_i} l_\theta(X)\right] \ge \mathbb{P}_n[l_{\theta_0}(X)]\right)$$
$$\le \sum_{i=1}^m P\left(\mathbb{P}_n\left[\sup_{\theta \in U_i} l_\theta(X)\right] \ge \mathbb{P}_n[l_{\theta_0}(X)]\right).$$
Since
$$\mathbb{P}_n\left[\sup_{\theta \in U_i} l_\theta(X)\right] \rightarrow_{a.s.} E_{\theta_0}\left[\sup_{\theta \in U_i} l_\theta(X)\right] < E_{\theta_0}[l_{\theta_0}(X)],$$
the right-hand side converges to zero. Hence $\hat\theta_n \rightarrow_p \theta_0$.
Asymptotic Efficiency Result
Theorem 5.3 Suppose that the model $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$ is Hellinger differentiable at an inner point $\theta_0$ of $\Theta \subset R^k$. Furthermore, suppose that there exists a measurable function $F$ with $E_{\theta_0}[F^2] < \infty$ such that for every $\theta_1$ and $\theta_2$ in a neighborhood of $\theta_0$,
$$|\log p_{\theta_1}(x) - \log p_{\theta_2}(x)| \le F(x)\,|\theta_1 - \theta_2|.$$
If the Fisher information matrix $I(\theta_0)$ is nonsingular and $\hat\theta_n$ is consistent, then
$$\sqrt{n}(\hat\theta_n - \theta_0) = \frac{1}{\sqrt{n}}\sum_{i=1}^n I(\theta_0)^{-1}\dot{l}_{\theta_0}(X_i) + o_{P_{\theta_0}}(1).$$
In particular, $\sqrt{n}(\hat\theta_n - \theta_0) \rightarrow_d N(0, I(\theta_0)^{-1})$.
Proof For any $h_n \rightarrow h$, by the Hellinger differentiability,
$$W_n = 2\sqrt{n}\left(\sqrt{\frac{p_{\theta_0 + h_n/\sqrt{n}}}{p_{\theta_0}}} - 1\right) \rightarrow h^T\dot{l}_{\theta_0} \quad \text{in } L_2(P_{\theta_0}),$$
and hence
$$\sqrt{n}\left(\log p_{\theta_0 + h_n/\sqrt{n}} - \log p_{\theta_0}\right) = 2\sqrt{n}\log\left(1 + \frac{W_n}{2\sqrt{n}}\right) \rightarrow_p h^T\dot{l}_{\theta_0}.$$
Both
$$E_{\theta_0}\left[\sqrt{n}(\mathbb{P}_n - P)\left[\sqrt{n}(\log p_{\theta_0+h_n/\sqrt{n}} - \log p_{\theta_0}) - h^T\dot{l}_{\theta_0}\right]\right] \rightarrow 0$$
and
$$Var_{\theta_0}\left[\sqrt{n}(\mathbb{P}_n - P)\left[\sqrt{n}(\log p_{\theta_0+h_n/\sqrt{n}} - \log p_{\theta_0}) - h^T\dot{l}_{\theta_0}\right]\right] \rightarrow 0,$$
so
$$\sqrt{n}(\mathbb{P}_n - P)\left[\sqrt{n}(\log p_{\theta_0+h_n/\sqrt{n}} - \log p_{\theta_0}) - h^T\dot{l}_{\theta_0}\right] \rightarrow_p 0.$$
From Step I in proving Theorem 4.1,
$$\sum_{i=1}^n \log\frac{p_{\theta_0 + h_n/\sqrt{n}}}{p_{\theta_0}}(X_i) = \frac{1}{\sqrt{n}}\sum_{i=1}^n h^T\dot{l}_{\theta_0}(X_i) - \frac{1}{2}h^T I(\theta_0)h + o_{P_{\theta_0}}(1),$$
and
$$n E_{\theta_0}\left[\log p_{\theta_0 + h_n/\sqrt{n}} - \log p_{\theta_0}\right] \rightarrow -\frac{1}{2}h^T I(\theta_0)h.$$
Combining,
$$n\mathbb{P}_n\left[\log p_{\theta_0 + h_n/\sqrt{n}} - \log p_{\theta_0}\right] = -\frac{1}{2}h_n^T I(\theta_0)h_n + h_n^T\sqrt{n}(\mathbb{P}_n - P)[\dot{l}_{\theta_0}] + o_{P_{\theta_0}}(1).$$
Choose $h_n = \sqrt{n}(\hat\theta_n - \theta_0)$ and, separately, $\tilde{h}_n = I(\theta_0)^{-1}\sqrt{n}(\mathbb{P}_n - P)[\dot{l}_{\theta_0}]$. The latter choice gives
$$n\mathbb{P}_n\left[\log p_{\theta_0 + \tilde{h}_n/\sqrt{n}} - \log p_{\theta_0}\right] = \frac{1}{2}\left\{\sqrt{n}(\mathbb{P}_n - P)[\dot{l}_{\theta_0}]\right\}^T I(\theta_0)^{-1}\left\{\sqrt{n}(\mathbb{P}_n - P)[\dot{l}_{\theta_0}]\right\} + o_{P_{\theta_0}}(1).$$
Comparing the two expansions and using that $\hat\theta_n$ maximizes the likelihood,
$$-\frac{1}{2}\left\{\sqrt{n}(\hat\theta_n - \theta_0) - I(\theta_0)^{-1}\sqrt{n}(\mathbb{P}_n - P)[\dot{l}_{\theta_0}]\right\}^T I(\theta_0)\left\{\sqrt{n}(\hat\theta_n - \theta_0) - I(\theta_0)^{-1}\sqrt{n}(\mathbb{P}_n - P)[\dot{l}_{\theta_0}]\right\} + o_{P_{\theta_0}}(1) \ge 0.$$
Hence
$$\sqrt{n}(\hat\theta_n - \theta_0) = I(\theta_0)^{-1}\sqrt{n}(\mathbb{P}_n - P)[\dot{l}_{\theta_0}] + o_{P_{\theta_0}}(1).$$
Theorem 5.4 Suppose $\theta$ ranges over an open subset of a Euclidean space, and let $\theta \mapsto l_\theta(x) = \log p_\theta(x)$ be twice continuously differentiable for every $x$. Suppose $E_{\theta_0}[\dot{l}_{\theta_0}\dot{l}_{\theta_0}^T] < \infty$ and $E_{\theta_0}[\ddot{l}_{\theta_0}]$ exists and is nonsingular. Assume that the second partial derivative of $l_\theta(x)$ is dominated by a fixed integrable function $F(x)$ for every $\theta$ in a neighborhood of $\theta_0$. Suppose $\hat\theta_n \rightarrow_p \theta_0$. Then
$$\sqrt{n}(\hat\theta_n - \theta_0) = -\left(E_{\theta_0}[\ddot{l}_{\theta_0}]\right)^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^n \dot{l}_{\theta_0}(X_i) + o_{P_{\theta_0}}(1).$$
Proof $\hat\theta_n$ solves $0 = \sum_{i=1}^n \dot{l}_{\hat\theta_n}(X_i)$. A Taylor expansion around $\theta_0$ gives
$$0 = \sum_{i=1}^n \dot{l}_{\theta_0}(X_i) + \left\{\sum_{i=1}^n \ddot{l}_{\theta_0}(X_i)\right\}(\hat\theta_n - \theta_0) + \frac{1}{2}(\hat\theta_n - \theta_0)^T\left\{\sum_{i=1}^n l^{(3)}_{\theta^*}(X_i)\right\}(\hat\theta_n - \theta_0),$$
with $\theta^*$ between $\theta_0$ and $\hat\theta_n$. Dividing by $n$, the remainder term is bounded by $\frac{1}{n}\sum_{i=1}^n F(X_i)\,|\hat\theta_n - \theta_0|^2$, which is $o_p(|\hat\theta_n - \theta_0|)$ since $\hat\theta_n - \theta_0 = o_p(1)$. Therefore
$$\sqrt{n}(\hat\theta_n - \theta_0)\left\{\frac{1}{n}\sum_{i=1}^n \ddot{l}_{\theta_0}(X_i) + o_p(1)\right\} = -\frac{1}{\sqrt{n}}\sum_{i=1}^n \dot{l}_{\theta_0}(X_i),$$
and the result follows because $\frac{1}{n}\sum_{i=1}^n \ddot{l}_{\theta_0}(X_i) \rightarrow_{a.s.} E_{\theta_0}[\ddot{l}_{\theta_0}(X)]$.
Computation of MLE
Solve the likelihood equation
$$\sum_{i=1}^n \dot{l}_\theta(X_i) = 0.$$
Newton-Raphson iteration: at the $k$th iteration,
$$\theta^{(k+1)} = \theta^{(k)} - \left\{\frac{1}{n}\sum_{i=1}^n \ddot{l}_{\theta^{(k)}}(X_i)\right\}^{-1}\left\{\frac{1}{n}\sum_{i=1}^n \dot{l}_{\theta^{(k)}}(X_i)\right\}.$$
Note $-\frac{1}{n}\sum_{i=1}^n \ddot{l}_{\theta^{(k)}}(X_i) \approx I(\theta^{(k)})$. Fisher scoring algorithm:
$$\theta^{(k+1)} = \theta^{(k)} + I(\theta^{(k)})^{-1}\left\{\frac{1}{n}\sum_{i=1}^n \dot{l}_{\theta^{(k)}}(X_i)\right\}.$$
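Both iterations can be sketched for a simple concrete case, the Poisson rate $\theta$ (chosen here for illustration; it is not an example from the notes). For Poisson data, $\dot{l}_\theta(x) = x/\theta - 1$, $\ddot{l}_\theta(x) = -x/\theta^2$, $I(\theta) = 1/\theta$, and the MLE is the sample mean:

```python
# Newton-Raphson vs Fisher scoring for the Poisson rate theta; the MLE is xbar.
xs = [2, 3, 0, 4, 1, 2, 5, 3]          # fixed illustrative counts
xbar = sum(xs) / len(xs)

def score(theta):                      # (1/n) * sum of l-dot_theta(X_i)
    return xbar / theta - 1.0

def hessian(theta):                    # (1/n) * sum of l-ddot_theta(X_i)
    return -xbar / theta**2

def fisher_info(theta):                # I(theta) = 1/theta for the Poisson model
    return 1.0 / theta

theta_nr = theta_fs = 1.0
for _ in range(25):
    theta_nr = theta_nr - score(theta_nr) / hessian(theta_nr)       # Newton-Raphson
    theta_fs = theta_fs + score(theta_fs) / fisher_info(theta_fs)   # Fisher scoring

assert abs(theta_nr - xbar) < 1e-8
assert abs(theta_fs - xbar) < 1e-8
```

For this particular model Fisher scoring reaches the MLE in a single step, since $\theta + \theta(\bar{x}/\theta - 1) = \bar{x}$, while Newton-Raphson takes several iterations.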
Optimize the likelihood function directly with an optimum search algorithm: grid search, quasi-Newton methods, gradient descent, MCMC, simulated annealing.
EM Algorithm for Missing Data
When part of the data is missing, or some mis-measured data is observed, a commonly used algorithm is the expectation-maximization (EM) algorithm.

Framework of the EM algorithm
$Y = (Y_{mis}, Y_{obs})$. $R$ is a vector of 0/1 indicating which components are missing/not missing, so that $Y_{obs} = RY$. The density function for the observed data $(Y_{obs}, R)$ is
$$\int_{Y_{mis}} f(Y; \theta) P(R \mid Y)\, dY_{mis}.$$
Missing mechanism
Missing at random assumption (MAR): $P(R \mid Y) = P(R \mid Y_{obs})$ and $P(R \mid Y)$ does not depend on $\theta$; i.e., the missing probability only depends on the observed data and is not informative about $\theta$. Under MAR, the observed-data likelihood factors as
$$\left\{\int_{Y_{mis}} f(Y; \theta)\, dY_{mis}\right\} P(R \mid Y_{obs}),$$
so we maximize
$$\int_{Y_{mis}} f(Y; \theta)\, dY_{mis} \quad \text{or} \quad \log \int_{Y_{mis}} f(Y; \theta)\, dY_{mis}.$$
Details of the EM algorithm
We start from any initial value $\theta^{(1)}$ and use the following iterations. The $k$th iteration consists of an E-step and an M-step.

E-step. We evaluate the conditional expectation
$$E\left[\log f(Y; \theta) \mid Y_{obs}, \theta^{(k)}\right] = \frac{\int_{Y_{mis}} [\log f(Y; \theta)]\, f(Y; \theta^{(k)})\, dY_{mis}}{\int_{Y_{mis}} f(Y; \theta^{(k)})\, dY_{mis}}.$$
M-step. We obtain $\theta^{(k+1)}$ by maximizing $E\left[\log f(Y; \theta) \mid Y_{obs}, \theta^{(k)}\right]$ over $\theta$. We then iterate until convergence of $\theta$; i.e., until the difference between $\theta^{(k+1)}$ and $\theta^{(k)}$ is less than a given criterion.
Rationale: why EM works
Theorem 5.5 At each iteration of the EM algorithm,
$$\log f(Y_{obs}; \theta^{(k+1)}) \ge \log f(Y_{obs}; \theta^{(k)}),$$
and the equality holds if and only if $\theta^{(k+1)} = \theta^{(k)}$.
Proof By the M-step,
$$E\left[\log f(Y; \theta^{(k+1)}) \mid Y_{obs}, \theta^{(k)}\right] \ge E\left[\log f(Y; \theta^{(k)}) \mid Y_{obs}, \theta^{(k)}\right].$$
Writing $f(Y; \theta) = f(Y_{mis} \mid Y_{obs}; \theta) f(Y_{obs}; \theta)$, this becomes
$$E\left[\log f(Y_{mis} \mid Y_{obs}; \theta^{(k+1)}) \mid Y_{obs}, \theta^{(k)}\right] + \log f(Y_{obs}; \theta^{(k+1)}) \ge E\left[\log f(Y_{mis} \mid Y_{obs}; \theta^{(k)}) \mid Y_{obs}, \theta^{(k)}\right] + \log f(Y_{obs}; \theta^{(k)}).$$
By the nonnegativity of the Kullback-Leibler information (Proposition 5.1),
$$E\left[\log f(Y_{mis} \mid Y_{obs}; \theta^{(k+1)}) \mid Y_{obs}, \theta^{(k)}\right] \le E\left[\log f(Y_{mis} \mid Y_{obs}; \theta^{(k)}) \mid Y_{obs}, \theta^{(k)}\right],$$
so $\log f(Y_{obs}; \theta^{(k+1)}) \ge \log f(Y_{obs}; \theta^{(k)})$. The equality holds iff $f(Y_{mis} \mid Y_{obs}; \theta^{(k+1)}) = f(Y_{mis} \mid Y_{obs}; \theta^{(k)})$, i.e., $f(Y; \theta^{(k+1)}) = f(Y; \theta^{(k)})$.
Incorporating Newton-Raphson in EM
E-step. We evaluate the conditional expectations
$$E\left[\frac{\partial}{\partial\theta}\log f(Y; \theta) \,\Big|\, Y_{obs}, \theta^{(k)}\right] \quad \text{and} \quad E\left[\frac{\partial^2}{\partial\theta^2}\log f(Y; \theta) \,\Big|\, Y_{obs}, \theta^{(k)}\right].$$
M-step. We obtain $\theta^{(k+1)}$ by solving
$$0 = E\left[\frac{\partial}{\partial\theta}\log f(Y; \theta) \,\Big|\, Y_{obs}, \theta^{(k)}\right]$$
using a one-step Newton-Raphson iteration:
$$\theta^{(k+1)} = \theta^{(k)} - \left\{E\left[\frac{\partial^2}{\partial\theta^2}\log f(Y; \theta) \,\Big|\, Y_{obs}, \theta^{(k)}\right]\right\}^{-1} E\left[\frac{\partial}{\partial\theta}\log f(Y; \theta) \,\Big|\, Y_{obs}, \theta^{(k)}\right]\Bigg|_{\theta = \theta^{(k)}}.$$
Example Suppose a random vector $Y$ has a multinomial distribution with $n = 197$ and
$$p = \left(\frac{1}{2} + \frac{\theta}{4},\; \frac{1-\theta}{4},\; \frac{1-\theta}{4},\; \frac{\theta}{4}\right).$$
Then the probability for $Y = (y_1, y_2, y_3, y_4)$ is given by
$$\frac{n!}{y_1!\, y_2!\, y_3!\, y_4!}\left(\frac{1}{2} + \frac{\theta}{4}\right)^{y_1}\left(\frac{1-\theta}{4}\right)^{y_2}\left(\frac{1-\theta}{4}\right)^{y_3}\left(\frac{\theta}{4}\right)^{y_4}.$$
Suppose we observe $Y = (125, 18, 20, 34)$. If we start with $\theta^{(1)} = 0.5$, the Newton-Raphson iteration converges to the MLE $\hat\theta_n$.
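The Newton-Raphson iteration for this example is easy to reproduce; the sketch below (not from the notes) uses the observed-data score $\dot{l}(\theta) = 125/(2+\theta) - 38/(1-\theta) + 34/\theta$, which follows from the likelihood above after dropping constants.

```python
# Newton-Raphson for the multinomial example with Y = (125, 18, 20, 34).
y1, y2, y3, y4 = 125, 18, 20, 34

def score(t):
    # Observed-data score: derivative of y1*log(1/2+t/4) + (y2+y3)*log((1-t)/4) + y4*log(t/4).
    return y1 / (2 + t) - (y2 + y3) / (1 - t) + y4 / t

def dscore(t):
    return -y1 / (2 + t)**2 - (y2 + y3) / (1 - t)**2 - y4 / t**2

theta = 0.5
for _ in range(10):
    theta = theta - score(theta) / dscore(theta)

# The score equation reduces to 197*theta^2 - 15*theta - 68 = 0 on (0, 1).
assert abs(197 * theta**2 - 15 * theta - 68) < 1e-8
print(round(theta, 4))   # approximately 0.6268
```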
EM algorithm: the full data $X$ has a multinomial distribution with $n$ and
$$p = \left(\frac{1}{2},\; \frac{\theta}{4},\; \frac{1-\theta}{4},\; \frac{1-\theta}{4},\; \frac{\theta}{4}\right),$$
and we observe $Y = (X_1 + X_2, X_3, X_4, X_5)$.
The score equation for the complete data $X$ is simple:
$$0 = \frac{X_2 + X_5}{\theta} - \frac{X_3 + X_4}{1 - \theta}.$$
The M-step of the EM algorithm solves the equation
$$0 = E\left[\frac{X_2 + X_5}{\theta} - \frac{X_3 + X_4}{1 - \theta} \,\Big|\, Y, \theta^{(k)}\right],$$
while the E-step evaluates this expectation. Here
$$E[X \mid Y, \theta^{(k)}] = \left(Y_1\frac{1/2}{1/2 + \theta^{(k)}/4},\; Y_1\frac{\theta^{(k)}/4}{1/2 + \theta^{(k)}/4},\; Y_2,\; Y_3,\; Y_4\right),$$
so
$$\theta^{(k+1)} = \frac{E[X_2 + X_5 \mid Y, \theta^{(k)}]}{E[X_2 + X_5 + X_3 + X_4 \mid Y, \theta^{(k)}]} = \frac{Y_1\frac{\theta^{(k)}/4}{1/2 + \theta^{(k)}/4} + Y_4}{Y_1\frac{\theta^{(k)}/4}{1/2 + \theta^{(k)}/4} + Y_2 + Y_3 + Y_4}.$$
We start from $\theta^{(1)} = 0.5$.
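The EM update above is a one-line recursion; the sketch below (not from the notes) iterates it and checks the fixed point against the observed-data score equation, which reduces to $197\theta^2 - 15\theta - 68 = 0$ (a reduction one can derive directly from the observed-data likelihood).

```python
# EM iteration for the multinomial example, starting from theta^(1) = 0.5.
y1, y2, y3, y4 = 125, 18, 20, 34

def em_update(t):
    x2 = y1 * (t / 4) / (0.5 + t / 4)   # E-step: E[X_2 | Y, theta]
    return (x2 + y4) / (x2 + y2 + y3 + y4)   # M-step: closed-form maximizer

theta = 0.5
for _ in range(50):
    theta = em_update(theta)

# The EM fixed point is the same root found by Newton-Raphson.
assert abs(197 * theta**2 - 15 * theta - 68) < 1e-6
```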
[Table of EM iterates: $k$, $\theta^{(k+1)}$, the successive differences $\theta^{(k+1)} - \theta^{(k)}$, and the ratios $(\theta^{(k+1)} - \hat\theta_n)/(\theta^{(k)} - \hat\theta_n)$; numerical values omitted.]
Conclusions
- The EM converges, and the result agrees with what is obtained from the Newton-Raphson iteration.
- The EM convergence is linear, since $(\theta^{(k+1)} - \hat\theta_n)/(\theta^{(k)} - \hat\theta_n)$ approaches a constant near convergence.
- The convergence of the Newton-Raphson iteration is quadratic, in the sense that $(\theta^{(k+1)} - \hat\theta_n)/(\theta^{(k)} - \hat\theta_n)^2$ approaches a constant near convergence.
- Each EM step is much less complex than a Newton-Raphson step; this is the advantage of using the EM algorithm.
More examples: an exponential mixture model
Suppose $Y \sim P_\theta$, where $P_\theta$ has density
$$p_\theta(y) = \left\{p\lambda e^{-\lambda y} + (1-p)\mu e^{-\mu y}\right\} I(y > 0)$$
and $\theta = (p, \lambda, \mu) \in (0,1) \times (0,\infty) \times (0,\infty)$. Consider estimation of $\theta$ based on $Y_1, \ldots, Y_n$ i.i.d. $\sim p_\theta(y)$. Solving the likelihood equation by Newton-Raphson is computationally involved.
EM algorithm: the complete data is $X = (Y, \Delta) \sim p_\theta(x)$, where
$$p_\theta(x) = p_\theta(y, \delta) = (p\lambda e^{-\lambda y})^\delta\left((1-p)\mu e^{-\mu y}\right)^{1-\delta}.$$
This is natural from the following mechanism: $\Delta$ is a Bernoulli variable with $P(\Delta = 1) = p$, and we generate $Y$ from $\text{Exp}(\lambda)$ if $\Delta = 1$ and from $\text{Exp}(\mu)$ if $\Delta = 0$. Thus $\Delta$ is missing. The score equations for $\theta$ based on $X$ are
$$0 = \frac{\partial}{\partial p}\, l(X_1, \ldots, X_n) = \sum_{i=1}^n \left\{\frac{\Delta_i}{p} - \frac{1 - \Delta_i}{1 - p}\right\},$$
$$0 = \frac{\partial}{\partial\lambda}\, l(X_1, \ldots, X_n) = \sum_{i=1}^n \Delta_i\left(\frac{1}{\lambda} - Y_i\right),$$
$$0 = \frac{\partial}{\partial\mu}\, l(X_1, \ldots, X_n) = \sum_{i=1}^n (1 - \Delta_i)\left(\frac{1}{\mu} - Y_i\right).$$
The M-step solves the equations
$$0 = \sum_{i=1}^n E\left[\left\{\frac{\Delta_i}{p} - \frac{1-\Delta_i}{1-p}\right\} \,\Big|\, Y_1, \ldots, Y_n, p^{(k)}, \lambda^{(k)}, \mu^{(k)}\right] = \sum_{i=1}^n E\left[\left\{\frac{\Delta_i}{p} - \frac{1-\Delta_i}{1-p}\right\} \,\Big|\, Y_i, p^{(k)}, \lambda^{(k)}, \mu^{(k)}\right],$$
$$0 = \sum_{i=1}^n E\left[\Delta_i\left(\frac{1}{\lambda} - Y_i\right) \,\Big|\, Y_i, p^{(k)}, \lambda^{(k)}, \mu^{(k)}\right],$$
$$0 = \sum_{i=1}^n E\left[(1-\Delta_i)\left(\frac{1}{\mu} - Y_i\right) \,\Big|\, Y_i, p^{(k)}, \lambda^{(k)}, \mu^{(k)}\right],$$
where the conditioning reduces from $(Y_1, \ldots, Y_n)$ to $Y_i$ by independence.
This immediately gives
$$p^{(k+1)} = \frac{1}{n}\sum_{i=1}^n E[\Delta_i \mid Y_i, p^{(k)}, \lambda^{(k)}, \mu^{(k)}],$$
$$\lambda^{(k+1)} = \frac{\sum_{i=1}^n E[\Delta_i \mid Y_i, p^{(k)}, \lambda^{(k)}, \mu^{(k)}]}{\sum_{i=1}^n Y_i\, E[\Delta_i \mid Y_i, p^{(k)}, \lambda^{(k)}, \mu^{(k)}]}, \qquad \mu^{(k+1)} = \frac{\sum_{i=1}^n E[(1-\Delta_i) \mid Y_i, p^{(k)}, \lambda^{(k)}, \mu^{(k)}]}{\sum_{i=1}^n Y_i\, E[(1-\Delta_i) \mid Y_i, p^{(k)}, \lambda^{(k)}, \mu^{(k)}]}.$$
The conditional expectation is
$$E[\Delta \mid Y, \theta] = \frac{p\lambda e^{-\lambda Y}}{p\lambda e^{-\lambda Y} + (1-p)\mu e^{-\mu Y}}.$$
As seen above, the EM algorithm facilitates the computation.
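The closed-form updates make the algorithm a few lines of code. The sketch below (illustrative fixed data, not from the notes) also verifies Theorem 5.5 empirically: the observed-data log-likelihood never decreases along the EM iterations.

```python
import math

# EM for the two-component exponential mixture p*lam*e^{-lam*y} + (1-p)*mu*e^{-mu*y}.
ys = [0.11, 0.25, 0.07, 1.90, 2.40, 0.30, 3.10, 0.15, 1.70, 0.45]

def log_lik(p, lam, mu):
    return sum(math.log(p * lam * math.exp(-lam * y)
                        + (1 - p) * mu * math.exp(-mu * y)) for y in ys)

p, lam, mu = 0.5, 2.0, 0.5
liks = [log_lik(p, lam, mu)]
for _ in range(100):
    # E-step: posterior probability that each Y_i came from the Exp(lam) component.
    w = [p * lam * math.exp(-lam * y)
         / (p * lam * math.exp(-lam * y) + (1 - p) * mu * math.exp(-mu * y)) for y in ys]
    # M-step: the closed-form updates for (p, lam, mu) derived above.
    p = sum(w) / len(ys)
    lam = sum(w) / sum(wi * y for wi, y in zip(w, ys))
    mu = sum(1 - wi for wi in w) / sum((1 - wi) * y for wi, y in zip(w, ys))
    liks.append(log_lik(p, lam, mu))

# Theorem 5.5: the observed-data log-likelihood is nondecreasing along EM iterations.
assert all(b >= a - 1e-10 for a, b in zip(liks, liks[1:]))
```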
Information Calculation in EM
Notation
$\dot{l}_c$: the score function for $\theta$ in the full data;
$\dot{l}_{mis|obs}$: the score for $\theta$ in the conditional distribution of $Y_{mis}$ given $Y_{obs}$;
$\dot{l}_{obs}$: the score for $\theta$ in the distribution of $Y_{obs}$.
Then
$$\dot{l}_c = \dot{l}_{mis|obs} + \dot{l}_{obs}, \qquad Var(\dot{l}_c) = Var(E[\dot{l}_c \mid Y_{obs}]) + E[Var(\dot{l}_c \mid Y_{obs})].$$
Information in the EM algorithm
We obtain the following Louis formula:
$$I_c(\theta) = I_{obs}(\theta) + E[I_{mis|obs}(\theta, Y_{obs})].$$
Thus the complete information is the sum of the observed information and the missing information. One can even show that when the EM converges, the linear convergence rate, i.e., the limit of $(\theta^{(k+1)} - \hat\theta_n)/(\theta^{(k)} - \hat\theta_n)$, approximates $1 - I_{obs}(\hat\theta_n)/I_c(\hat\theta_n)$.
Nonparametric Maximum Likelihood Estimation
First example
Let $X_1, \ldots, X_n$ be i.i.d. random variables with common distribution $F$, where $F$ is any unknown distribution function. The likelihood function for $F$ is given by
$$L_n(F) = \prod_{i=1}^n f(X_i),$$
where $f(X_i)$ is the density function of $F$ with respect to some dominating measure. However, the maximum of $L_n(F)$ does not exist. We instead maximize the alternative function
$$\tilde{L}_n(F) = \prod_{i=1}^n F\{X_i\},$$
where $F\{X_i\}$ denotes the jump size $F(X_i) - F(X_i-)$.
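A quick numerical check (not in the notes): with distinct observations, the modified likelihood depends on $F$ only through the jump sizes $p_i = F\{X_i\}$ with $\sum_i p_i \le 1$, and uniform weights $p_i = 1/n$ (the empirical distribution) beat any perturbed weighting.

```python
import math

# Maximizing prod(p_i) subject to sum(p_i) = 1 gives p_i = 1/n: the empirical CDF.
n = 5
uniform = [1.0 / n] * n

def log_lik(ps):
    return sum(math.log(p) for p in ps)

perturbed = [0.3, 0.25, 0.15, 0.2, 0.1]   # another probability vector on the same points
assert abs(sum(perturbed) - 1.0) < 1e-12
assert log_lik(uniform) > log_lik(perturbed)
```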
Second example
Suppose $X_1, \ldots, X_n$ are i.i.d. $\sim F$ and $Y_1, \ldots, Y_n$ are i.i.d. $\sim G$. We observe i.i.d. pairs $(Z_1, \Delta_1), \ldots, (Z_n, \Delta_n)$, where $Z_i = \min(X_i, Y_i)$ and $\Delta_i = I(X_i \le Y_i)$. We can think of $X_i$ as a survival time and $Y_i$ as a censoring time. Then it is easy to calculate that the joint likelihood for $(Z_i, \Delta_i)$, $i = 1, \ldots, n$, is equal to
$$L_n(F, G) = \prod_{i=1}^n \left\{f(Z_i)(1 - G(Z_i))\right\}^{\Delta_i}\left\{(1 - F(Z_i))g(Z_i)\right\}^{1-\Delta_i}.$$
$L_n(F, G)$ does not have a maximum, so we consider the alternative function
$$\prod_{i=1}^n \left\{F\{Z_i\}(1 - G(Z_i))\right\}^{\Delta_i}\left\{(1 - F(Z_i))G\{Z_i\}\right\}^{1-\Delta_i}.$$
Third example
Suppose $T$ is a survival time and $Z$ is a covariate. Assume $T \mid Z$ has conditional hazard function
$$\lambda(t \mid Z) = \lambda(t)e^{\theta^T Z}.$$
Then the likelihood function from $n$ i.i.d. observations $(T_i, Z_i)$, $i = 1, \ldots, n$, is given by
$$L_n(\theta, \Lambda) = \prod_{i=1}^n \left\{\lambda(T_i)e^{\theta^T Z_i}\exp\{-\Lambda(T_i)e^{\theta^T Z_i}\}\, f(Z_i)\right\}.$$
Note $f(Z_i)$ is not informative about $\theta$ and $\lambda$, so we can discard it from the likelihood function. Again, we replace $\lambda\{T_i\}$ by $\Lambda\{T_i\}$ and obtain a modified function
$$\tilde{L}_n(\theta, \Lambda) = \prod_{i=1}^n \left\{\Lambda\{T_i\}e^{\theta^T Z_i}\exp\{-\Lambda(T_i)e^{\theta^T Z_i}\}\right\}.$$
Letting $p_i = \Lambda\{T_i\}$, we maximize
$$\prod_{i=1}^n p_i\, e^{\theta^T Z_i}\exp\left\{-\Big(\sum_{T_j \le T_i} p_j\Big)e^{\theta^T Z_i}\right\}$$
or its logarithm
$$\sum_{i=1}^n \left[\theta^T Z_i - e^{\theta^T Z_i}\sum_{T_j \le T_i} p_j + \log p_i\right].$$
Fourth example
Suppose $X_1, \ldots, X_n$ are i.i.d. $\sim F$ and $Y_1, \ldots, Y_n$ are i.i.d. $\sim G$. We only observe $(Y_i, \Delta_i)$, where $\Delta_i = I(X_i \le Y_i)$, for $i = 1, \ldots, n$. This is one type of interval-censored data (current status data). The likelihood for the observations is
$$\prod_{i=1}^n \left\{F(Y_i)^{\Delta_i}(1 - F(Y_i))^{1-\Delta_i} g(Y_i)\right\}.$$
To derive the NPMLE for $F$ and $G$, we instead maximize
$$\prod_{i=1}^n \left\{P_i^{\Delta_i}(1 - P_i)^{1-\Delta_i} q_i\right\},$$
subject to the constraints that $\sum_i q_i = 1$ and $0 \le P_i \le 1$ increases with $Y_i$.
Clearly, $\hat{q}_i = 1/n$ (supposing the $Y_i$ are all different). The constrained maximization over the $P_i$ turns out to be solved by the following steps:
(i) Plot the points $(i, \sum_{Y_j \le Y_i} \Delta_j)$, $i = 1, \ldots, n$. This is called the cumulative sum diagram.
(ii) Form $H^*(t)$, the greatest convex minorant of the cumulative sum diagram.
(iii) Let $\hat{P}_i$ be the left derivative of $H^*$ at $i$. Then $(\hat{P}_1, \ldots, \hat{P}_n)$ maximizes the objective function.
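The left derivatives of the greatest convex minorant can be computed with the pool-adjacent-violators algorithm (PAVA); the sketch below (hypothetical data, assuming the $Y_i$ are already sorted) is one standard way to implement step (iii).

```python
# Pool-adjacent-violators: the isotonic fit of the Delta_i equals the vector of
# left derivatives of the greatest convex minorant of the cumulative sum diagram.
def pava(values):
    # Each block carries [sum, count]; merge backwards while monotonicity is violated.
    blocks = []
    for v in values:
        blocks.append([v, 1])
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out

# Delta_i listed in order of increasing Y_i (hypothetical current status data).
deltas = [0, 1, 0, 1, 1]
p_hat = pava(deltas)

assert p_hat == [0.0, 0.5, 0.5, 1.0, 1.0]             # the (1, 0) violation was pooled
assert all(a <= b for a, b in zip(p_hat, p_hat[1:]))  # nondecreasing in Y_i
```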
Summary of NPMLE
The NPMLE generalizes maximum likelihood estimation from parametric models to semiparametric or nonparametric models. We replace the functional parameter by an empirical function with jumps only at the observed data and maximize a modified likelihood function. Both the computation and the asymptotic properties of the NPMLE can be difficult, and vary across specific problems.
Alternative Efficient Estimation
One-step efficient estimation
Start from a strongly consistent estimator of $\theta$, denoted $\tilde\theta_n$, assuming that $\tilde\theta_n - \theta_0 = O_p(n^{-1/2})$. The one-step procedure is a single Newton-Raphson iteration in solving the likelihood score equation:
$$\hat\theta_n = \tilde\theta_n - \left\{\ddot{l}_n(\tilde\theta_n)\right\}^{-1}\dot{l}_n(\tilde\theta_n),$$
where $\dot{l}_n(\theta)$ is the score function and $\ddot{l}_n(\theta)$ is the derivative of $\dot{l}_n(\theta)$.
Result about one-step estimation
Theorem 5.6 Let $l_\theta(X)$ be the log-likelihood function of $\theta$. Assume that there exists a neighborhood of $\theta_0$ in which $|l^{(3)}_\theta(X)| \le F(X)$ with $E[F(X)] < \infty$. Then
$$\sqrt{n}(\hat\theta_n - \theta_0) \rightarrow_d N(0, I(\theta_0)^{-1}),$$
where $I(\theta_0)$ is the Fisher information.
Proof Since $\tilde\theta_n \rightarrow_{a.s.} \theta_0$, we perform a Taylor expansion on the right-hand side of the one-step equation and obtain
$$\hat\theta_n = \tilde\theta_n - \left\{\ddot{l}_n(\tilde\theta_n)\right\}^{-1}\left\{\dot{l}_n(\theta_0) + \ddot{l}_n(\theta^*)(\tilde\theta_n - \theta_0)\right\},$$
where $\theta^*$ is between $\tilde\theta_n$ and $\theta_0$. Thus
$$\hat\theta_n - \theta_0 = \left[I - \left\{\ddot{l}_n(\tilde\theta_n)\right\}^{-1}\ddot{l}_n(\theta^*)\right](\tilde\theta_n - \theta_0) - \left\{\ddot{l}_n(\tilde\theta_n)\right\}^{-1}\dot{l}_n(\theta_0).$$
On the other hand, by the condition that $|l^{(3)}_\theta(X)| \le F(X)$ with $E[F(X)] < \infty$,
$$\frac{1}{n}\ddot{l}_n(\theta^*) \rightarrow_{a.s.} E[\ddot{l}_{\theta_0}(X)], \qquad \frac{1}{n}\ddot{l}_n(\tilde\theta_n) \rightarrow_{a.s.} E[\ddot{l}_{\theta_0}(X)].$$
Hence
$$\hat\theta_n - \theta_0 = o_p(|\tilde\theta_n - \theta_0|) - \left\{E[\ddot{l}_{\theta_0}(X)] + o_p(1)\right\}^{-1}\frac{1}{n}\dot{l}_n(\theta_0),$$
and the conclusion follows from the central limit theorem applied to $\dot{l}_n(\theta_0)/\sqrt{n}$.
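A minimal numerical sketch (not from the notes): for an exponential sample, start from the $\sqrt{n}$-consistent but inefficient estimator $\tilde\lambda_n = \log 2/\text{median}$ and take one Newton step on the score $\dot{l}_n(\lambda) = n/\lambda - \sum_i Y_i$; the one-step estimate already lands much closer to the exact MLE $1/\bar{Y}_n$. The data are made up for illustration.

```python
import statistics

# One-step efficient estimation for the exponential rate lambda:
# score l_n'(lam) = n/lam - sum(Y), second derivative l_n''(lam) = -n/lam^2.
ys = [0.5, 1.2, 0.3, 2.0, 0.8, 1.5]
n = len(ys)
mle = n / sum(ys)                               # exact MLE: 1 / sample mean

init = 0.6931471805599453 / statistics.median(ys)   # log(2)/median: root-n consistent start
score = n / init - sum(ys)
hess = -n / init**2
one_step = init - score / hess                  # a single Newton-Raphson step

assert abs(one_step - mle) < abs(init - mle)    # one step already improves on the start
```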
A slightly different one-step estimation:
$$\hat\theta_n = \tilde\theta_n + I(\tilde\theta_n)^{-1}\,\frac{1}{n}\dot{l}_n(\tilde\theta_n).$$
Other efficient estimation: Bayesian estimation methods (posterior mode, minimax estimator, etc.).
Conclusions
- The maximum likelihood approach provides a natural and simple way of deriving an efficient estimator.
- Other approaches to efficient estimation are possible, such as one-step estimation, Bayesian estimation, etc.
- Generalization from parametric models to semiparametric or nonparametric models: how?
READING MATERIALS: Ferguson, Sections 16-20; Lehmann and Casella, Sections
More informationSTAT 830 Convergence in Distribution
STAT 830 Convergence in Distribution Richard Lockhart Simon Fraser University STAT 830 Fall 2011 Richard Lockhart (Simon Fraser University) STAT 830 Convergence in Distribution STAT 830 Fall 2011 1 / 31
More informationCS 688 Pattern Recognition Lecture 4. Linear Models for Classification
CS 688 Pattern Recognition Lecture 4 Linear Models for Classification Probabilistic generative models Probabilistic discriminative models 1 Generative Approach ( x ) p C k p( C k ) Ck p ( ) ( x Ck ) p(
More informationPrinciple of Data Reduction
Chapter 6 Principle of Data Reduction 6.1 Introduction An experimenter uses the information in a sample X 1,..., X n to make inferences about an unknown parameter θ. If the sample size n is large, then
More informationLecture 2 ESTIMATING THE SURVIVAL FUNCTION. One-sample nonparametric methods
Lecture 2 ESTIMATING THE SURVIVAL FUNCTION One-sample nonparametric methods There are commonly three methods for estimating a survivorship function S(t) = P (T > t) without resorting to parametric models:
More informationOverview of Monte Carlo Simulation, Probability Review and Introduction to Matlab
Monte Carlo Simulation: IEOR E4703 Fall 2004 c 2004 by Martin Haugh Overview of Monte Carlo Simulation, Probability Review and Introduction to Matlab 1 Overview of Monte Carlo Simulation 1.1 Why use simulation?
More informationGambling and Data Compression
Gambling and Data Compression Gambling. Horse Race Definition The wealth relative S(X) = b(x)o(x) is the factor by which the gambler s wealth grows if horse X wins the race, where b(x) is the fraction
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationWalrasian Demand. u(x) where B(p, w) = {x R n + : p x w}.
Walrasian Demand Econ 2100 Fall 2015 Lecture 5, September 16 Outline 1 Walrasian Demand 2 Properties of Walrasian Demand 3 An Optimization Recipe 4 First and Second Order Conditions Definition Walrasian
More informationStatistical Machine Learning from Data
Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Gaussian Mixture Models Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique
More informationNonlinear Optimization: Algorithms 3: Interior-point methods
Nonlinear Optimization: Algorithms 3: Interior-point methods INSEAD, Spring 2006 Jean-Philippe Vert Ecole des Mines de Paris Jean-Philippe.Vert@mines.org Nonlinear optimization c 2006 Jean-Philippe Vert,
More informationStatistics 580 The EM Algorithm Introduction
Statistics 580 The EM Algorithm Introduction The EM algorithm is a very general iterative algorithm for parameter estimation by maximum likelihood when some of the random variables involved are not observed
More informationTHE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE. Alexander Barvinok
THE NUMBER OF GRAPHS AND A RANDOM GRAPH WITH A GIVEN DEGREE SEQUENCE Alexer Barvinok Papers are available at http://www.math.lsa.umich.edu/ barvinok/papers.html This is a joint work with J.A. Hartigan
More informationAssessing the Relative Power of Structural Break Tests Using a Framework Based on the Approximate Bahadur Slope
Assessing the Relative Power of Structural Break Tests Using a Framework Based on the Approximate Bahadur Slope Dukpa Kim Boston University Pierre Perron Boston University December 4, 2006 THE TESTING
More informationDistribution (Weibull) Fitting
Chapter 550 Distribution (Weibull) Fitting Introduction This procedure estimates the parameters of the exponential, extreme value, logistic, log-logistic, lognormal, normal, and Weibull probability distributions
More information10.2 Series and Convergence
10.2 Series and Convergence Write sums using sigma notation Find the partial sums of series and determine convergence or divergence of infinite series Find the N th partial sums of geometric series and
More informationContinued Fractions and the Euclidean Algorithm
Continued Fractions and the Euclidean Algorithm Lecture notes prepared for MATH 326, Spring 997 Department of Mathematics and Statistics University at Albany William F Hammond Table of Contents Introduction
More informationNotes for a graduate-level course in asymptotics for statisticians. David R. Hunter Penn State University
Notes for a graduate-level course in asymptotics for statisticians David R. Hunter Penn State University June 2014 Contents Preface 1 1 Mathematical and Statistical Preliminaries 3 1.1 Limits and Continuity..............................
More informationNotes on metric spaces
Notes on metric spaces 1 Introduction The purpose of these notes is to quickly review some of the basic concepts from Real Analysis, Metric Spaces and some related results that will be used in this course.
More informationLinda Staub & Alexandros Gekenidis
Seminar in Statistics: Survival Analysis Chapter 2 Kaplan-Meier Survival Curves and the Log- Rank Test Linda Staub & Alexandros Gekenidis March 7th, 2011 1 Review Outcome variable of interest: time until
More informationAdvanced statistical inference. Suhasini Subba Rao Email: suhasini.subbarao@stat.tamu.edu
Advanced statistical inference Suhasini Subba Rao Email: suhasini.subbarao@stat.tamu.edu August 1, 2012 2 Chapter 1 Basic Inference 1.1 A review of results in statistical inference In this section, we
More informationNonlinear Programming Methods.S2 Quadratic Programming
Nonlinear Programming Methods.S2 Quadratic Programming Operations Research Models and Methods Paul A. Jensen and Jonathan F. Bard A linearly constrained optimization problem with a quadratic objective
More informationFrom Binomial Trees to the Black-Scholes Option Pricing Formulas
Lecture 4 From Binomial Trees to the Black-Scholes Option Pricing Formulas In this lecture, we will extend the example in Lecture 2 to a general setting of binomial trees, as an important model for a single
More informationInner Product Spaces
Math 571 Inner Product Spaces 1. Preliminaries An inner product space is a vector space V along with a function, called an inner product which associates each pair of vectors u, v with a scalar u, v, and
More informationCITY UNIVERSITY LONDON. BEng Degree in Computer Systems Engineering Part II BSc Degree in Computer Systems Engineering Part III PART 2 EXAMINATION
No: CITY UNIVERSITY LONDON BEng Degree in Computer Systems Engineering Part II BSc Degree in Computer Systems Engineering Part III PART 2 EXAMINATION ENGINEERING MATHEMATICS 2 (resit) EX2005 Date: August
More informationDate: April 12, 2001. Contents
2 Lagrange Multipliers Date: April 12, 2001 Contents 2.1. Introduction to Lagrange Multipliers......... p. 2 2.2. Enhanced Fritz John Optimality Conditions...... p. 12 2.3. Informative Lagrange Multipliers...........
More information2DI36 Statistics. 2DI36 Part II (Chapter 7 of MR)
2DI36 Statistics 2DI36 Part II (Chapter 7 of MR) What Have we Done so Far? Last time we introduced the concept of a dataset and seen how we can represent it in various ways But, how did this dataset came
More informationMATH 425, PRACTICE FINAL EXAM SOLUTIONS.
MATH 45, PRACTICE FINAL EXAM SOLUTIONS. Exercise. a Is the operator L defined on smooth functions of x, y by L u := u xx + cosu linear? b Does the answer change if we replace the operator L by the operator
More information1. Prove that the empty set is a subset of every set.
1. Prove that the empty set is a subset of every set. Basic Topology Written by Men-Gen Tsai email: b89902089@ntu.edu.tw Proof: For any element x of the empty set, x is also an element of every set since
More informationNon Parametric Inference
Maura Department of Economics and Finance Università Tor Vergata Outline 1 2 3 Inverse distribution function Theorem: Let U be a uniform random variable on (0, 1). Let X be a continuous random variable
More informationDetekce změn v autoregresních posloupnostech
Nové Hrady 2012 Outline 1 Introduction 2 3 4 Change point problem (retrospective) The data Y 1,..., Y n follow a statistical model, which may change once or several times during the observation period
More informationDifferentiation and Integration
This material is a supplement to Appendix G of Stewart. You should read the appendix, except the last section on complex exponentials, before this material. Differentiation and Integration Suppose we have
More informationIncreasing for all. Convex for all. ( ) Increasing for all (remember that the log function is only defined for ). ( ) Concave for all.
1. Differentiation The first derivative of a function measures by how much changes in reaction to an infinitesimal shift in its argument. The largest the derivative (in absolute value), the faster is evolving.
More informationCHAPTER 2 Estimating Probabilities
CHAPTER 2 Estimating Probabilities Machine Learning Copyright c 2016. Tom M. Mitchell. All rights reserved. *DRAFT OF January 24, 2016* *PLEASE DO NOT DISTRIBUTE WITHOUT AUTHOR S PERMISSION* This is a
More informationOnline Learning, Stability, and Stochastic Gradient Descent
Online Learning, Stability, and Stochastic Gradient Descent arxiv:1105.4701v3 [cs.lg] 8 Sep 2011 September 9, 2011 Tomaso Poggio, Stephen Voinea, Lorenzo Rosasco CBCL, McGovern Institute, CSAIL, Brain
More information17.3.1 Follow the Perturbed Leader
CS787: Advanced Algorithms Topic: Online Learning Presenters: David He, Chris Hopman 17.3.1 Follow the Perturbed Leader 17.3.1.1 Prediction Problem Recall the prediction problem that we discussed in class.
More informationWhat is Statistics? Lecture 1. Introduction and probability review. Idea of parametric inference
0. 1. Introduction and probability review 1.1. What is Statistics? What is Statistics? Lecture 1. Introduction and probability review There are many definitions: I will use A set of principle and procedures
More informationM2S1 Lecture Notes. G. A. Young http://www2.imperial.ac.uk/ ayoung
M2S1 Lecture Notes G. A. Young http://www2.imperial.ac.uk/ ayoung September 2011 ii Contents 1 DEFINITIONS, TERMINOLOGY, NOTATION 1 1.1 EVENTS AND THE SAMPLE SPACE......................... 1 1.1.1 OPERATIONS
More informationMath 461 Fall 2006 Test 2 Solutions
Math 461 Fall 2006 Test 2 Solutions Total points: 100. Do all questions. Explain all answers. No notes, books, or electronic devices. 1. [105+5 points] Assume X Exponential(λ). Justify the following two
More informationMicroeconomic Theory: Basic Math Concepts
Microeconomic Theory: Basic Math Concepts Matt Van Essen University of Alabama Van Essen (U of A) Basic Math Concepts 1 / 66 Basic Math Concepts In this lecture we will review some basic mathematical concepts
More informationA Basic Introduction to Missing Data
John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item
More informationUniversal Algorithm for Trading in Stock Market Based on the Method of Calibration
Universal Algorithm for Trading in Stock Market Based on the Method of Calibration Vladimir V yugin Institute for Information Transmission Problems, Russian Academy of Sciences, Bol shoi Karetnyi per.
More informationProbabilistic Methods for Time-Series Analysis
Probabilistic Methods for Time-Series Analysis 2 Contents 1 Analysis of Changepoint Models 1 1.1 Introduction................................ 1 1.1.1 Model and Notation....................... 2 1.1.2 Example:
More informationTesting predictive performance of binary choice models 1
Testing predictive performance of binary choice models 1 Bas Donkers Econometric Institute and Department of Marketing Erasmus University Rotterdam Bertrand Melenberg Department of Econometrics Tilburg
More informationLecture 15 Introduction to Survival Analysis
Lecture 15 Introduction to Survival Analysis BIOST 515 February 26, 2004 BIOST 515, Lecture 15 Background In logistic regression, we were interested in studying how risk factors were associated with presence
More informationMetric Spaces Joseph Muscat 2003 (Last revised May 2009)
1 Distance J Muscat 1 Metric Spaces Joseph Muscat 2003 (Last revised May 2009) (A revised and expanded version of these notes are now published by Springer.) 1 Distance A metric space can be thought of
More informationALMOST COMMON PRIORS 1. INTRODUCTION
ALMOST COMMON PRIORS ZIV HELLMAN ABSTRACT. What happens when priors are not common? We introduce a measure for how far a type space is from having a common prior, which we term prior distance. If a type
More informationSection 6.1 Joint Distribution Functions
Section 6.1 Joint Distribution Functions We often care about more than one random variable at a time. DEFINITION: For any two random variables X and Y the joint cumulative probability distribution function
More informationConvex analysis and profit/cost/support functions
CALIFORNIA INSTITUTE OF TECHNOLOGY Division of the Humanities and Social Sciences Convex analysis and profit/cost/support functions KC Border October 2004 Revised January 2009 Let A be a subset of R m
More informationPractice with Proofs
Practice with Proofs October 6, 2014 Recall the following Definition 0.1. A function f is increasing if for every x, y in the domain of f, x < y = f(x) < f(y) 1. Prove that h(x) = x 3 is increasing, using
More informationCCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York
BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not
More informationAPPLIED MISSING DATA ANALYSIS
APPLIED MISSING DATA ANALYSIS Craig K. Enders Series Editor's Note by Todd D. little THE GUILFORD PRESS New York London Contents 1 An Introduction to Missing Data 1 1.1 Introduction 1 1.2 Chapter Overview
More informationGLM, insurance pricing & big data: paying attention to convergence issues.
GLM, insurance pricing & big data: paying attention to convergence issues. Michaël NOACK - michael.noack@addactis.com Senior consultant & Manager of ADDACTIS Pricing Copyright 2014 ADDACTIS Worldwide.
More informationPermanents, Order Statistics, Outliers, and Robustness
Permanents, Order Statistics, Outliers, and Robustness N. BALAKRISHNAN Department of Mathematics and Statistics McMaster University Hamilton, Ontario, Canada L8S 4K bala@mcmaster.ca Received: November
More informationINDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)
INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its
More information