CALIFORNIA INSTITUTE OF TECHNOLOGY
Ma 2b, Introduction to Probability and Statistics
KC Border
February 2013

Differentiating under an integral sign

In the derivation of Maximum Likelihood Estimators, or the Cramér-Rao Lower Bound, we differentiated under an integral sign, and I never told you explicitly when that is allowed. Here is a theorem that gives conditions under which it is permissible.

We start with a function $g$ of a vector variable $x$ and a single parameter $\theta$. Let the domain of $g$ be $\Theta \times \mathcal{X}$, where $\mathcal{X}$ is a rectangle in $\mathbb{R}^n$ and $\Theta$ is an interval in $\mathbb{R}$. So
\[ g \colon \Theta \times \mathcal{X} \to \mathbb{R}. \]
Define the function $G \colon \Theta \to \mathbb{R}$ by
\[ G(\theta) = \int_{\mathcal{X}} g(\theta; x)\,dx. \]
In order for this to make sense we assume:

1 Assumption. Let $\mathcal{X}$ be a rectangle in $\mathbb{R}^n$, and let $\Theta$ be an open interval in $\mathbb{R}$. Assume that for every $\theta \in \Theta$ the function $x \mapsto g(\theta; x)$ is continuous and has a finite integral. That is,
\[ \int_{\mathcal{X}} |g(\theta; x)|\,dx < \infty \qquad (\theta \in \Theta). \]

We would like to show that
\[ G'(\theta) = \int_{\mathcal{X}} D_1 g(\theta; x)\,dx, \]
where $D_1 g$ indicates the partial derivative of $g$ with respect to its first argument $\theta$. In order for this to be true, the partial derivative has to exist and be integrable with respect to $x$.

2 Assumption. Assume that for every $x$, the function $\theta \mapsto g(\theta; x)$ is continuous, and that for every $\theta \in \Theta$, the partial derivative $D_1 g(\theta; x)$ exists and is integrable with respect to $x$. That is,
\[ \int_{\mathcal{X}} |D_1 g(\theta; x)|\,dx < \infty. \]
But this is not enough. As you may remember, a partial derivative is a limit of the form
\[ \lim_{h \to 0} \bigl( g(\theta + h, x) - g(\theta, x) \bigr) / h. \]
You may have heard that passing limits through an integral sign requires a boundedness condition (the Lebesgue Dominated Convergence Theorem). It is beyond the scope of this course and this note to go through the details, but let me say that what is required is called a uniform local integrability condition. To get a feeling for what it means, realize that Assumption 2 requires that at each $\theta$ the function $D_1 g(\theta; x)$ is bounded in absolute value by an integrable function of $x$, namely $|D_1 g(\theta; x)|$ itself. Uniform local integrability requires that for each $\theta_0$, there is a little strip $(\theta_0 - \varepsilon, \theta_0 + \varepsilon)$ on which $|D_1 g(\theta; x)|$ is bounded uniformly in $\theta$ for each $x$ by an integrable function of $x$.

3 Assumption. Assume that for every $\theta_0 \in \Theta$ there is an $\varepsilon > 0$ and a nonnegative function $h \colon \mathcal{X} \to \mathbb{R}$ (depending on $\theta_0$) such that for all $x$,
\[ \sup_{\theta \in (\theta_0 - \varepsilon,\, \theta_0 + \varepsilon)} |D_1 g(\theta; x)| \le h(x), \]
and
\[ \int_{\mathcal{X}} h(x)\,dx < \infty. \]

These assumptions are sufficient.¹

4 Theorem. Let $g \colon \Theta \times \mathcal{X} \to \mathbb{R}$ satisfy Assumptions 1, 2, and 3. Then the function $G \colon \Theta \to \mathbb{R}$ defined by
\[ G(\theta) = \int_{\mathcal{X}} g(\theta; x)\,dx \]
is differentiable on $\Theta$ and
\[ G'(\theta) = \int_{\mathcal{X}} D_1 g(\theta; x)\,dx. \]

Proof: See Theorem 24.5 of Aliprantis and Burkinshaw [1, pp. 193-194] and the remarks following it.

Assumption 3 is awkward to use, so a stronger condition that implies it is useful. The next theorem may be easier to apply. It requires $\mathcal{X}$ to be closed and bounded, and $g$ to have a jointly continuous partial derivative. It is similar to Theorem 8.11.2 in Dieudonné [2, p. 177].

5 Theorem. Let $\mathcal{X}$ be a closed and bounded rectangle in $\mathbb{R}^n$, let $\Theta$ be an open interval in $\mathbb{R}$, and let $g \colon \Theta \times \mathcal{X} \to \mathbb{R}$ be jointly continuous. Assume that the partial derivative $D_1 g(\theta; x)$ is jointly continuous on $\Theta \times \mathcal{X}$. Then the function $G \colon \Theta \to \mathbb{R}$ defined by
\[ G(\theta) = \int_{\mathcal{X}} g(\theta; x)\,dx \]
is differentiable on $\Theta$ and
\[ G'(\theta) = \int_{\mathcal{X}} D_1 g(\theta; x)\,dx. \]
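Proof: Since $g$ and $D_1 g$ are jointly continuous, they are bounded on every set of the form $[\theta - \varepsilon, \theta + \varepsilon] \times \mathcal{X}$ with $[\theta - \varepsilon, \theta + \varepsilon] \subset \Theta$, and so integrable. Because $\mathcal{X}$ is bounded, a constant bound on $|D_1 g|$ serves as the integrable function $h$ required by Assumption 3. Therefore the hypotheses of Theorem 4 are satisfied.

¹ We can make weaker assumptions; see, for instance, Aliprantis and Burkinshaw [1, pp. 193-194]. The problem is that the weaker assumptions involve measure-theoretic concepts that are beyond the scope of this course.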
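As a quick sanity check of the conclusion $G'(\theta) = \int D_1 g(\theta; x)\,dx$, here is a minimal numerical sketch. The integrand $g(\theta, x) = \sin(\theta x)$ on $[0, 1]$, the test point $\theta = 0.7$, and the helper names (`simpson`, `G_prime_fd`, `integral_of_d1g`) are all illustrative choices that do not come from these notes. Both $g$ and $D_1 g$ are jointly continuous on a closed bounded rectangle, so the two numbers should agree up to discretization error.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson rule on [a, b] with n subintervals (n made even)."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Illustrative smooth integrand (an assumption, not from the notes).
def g(theta, x):
    return math.sin(theta * x)

# Its partial derivative with respect to theta.
def d1g(theta, x):
    return x * math.cos(theta * x)

def G(theta):
    return simpson(lambda x: g(theta, x), 0.0, 1.0)

def G_prime_fd(theta, h=1e-5):
    """Central-difference approximation to G'(theta)."""
    return (G(theta + h) - G(theta - h)) / (2 * h)

def integral_of_d1g(theta):
    return simpson(lambda x: d1g(theta, x), 0.0, 1.0)

theta = 0.7
lhs = G_prime_fd(theta)        # derivative of the integral
rhs = integral_of_d1g(theta)   # integral of the derivative
print(lhs, rhs)
```

The agreement is only numerical evidence for this one integrand; the theorem is what guarantees it.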
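1 An illustrative (counter)example

The following example shows what can go wrong when the uniform local integrability Assumption 3 is violated. You can find it in Gelbaum and Olmsted [3, Example 9.15, p. 123], but similar examples are well known.

6 Example. Let $\Theta = \mathbb{R}$ and let $\mathcal{X} = [0, 1]$. Define $g \colon \Theta \times \mathcal{X} \to \mathbb{R}$ via
\[ g(\theta, x) = \begin{cases} \dfrac{\theta^3}{x^2}\, e^{-\theta^2/x} & x > 0, \\[1ex] 0 & x = 0. \end{cases} \]
The function $g$ is plotted in Figure 1. (Notice that for fixed $x$, the function $\theta \mapsto g(\theta, x)$ is continuous at each $\theta$; and for each fixed $\theta$, the function $x \mapsto g(\theta, x)$ is continuous at each $x$, including $x = 0$. This is because the exponential term $e^{-\theta^2/x}$ goes to zero much faster than the polynomial term $x^2$ goes to zero as $x \downarrow 0$. The function $g$ is not jointly continuous though: along the curve $x = \theta^2$ we have $g(\theta, x) = e^{-1}/\theta$, which diverges to $\infty$ as $\theta \downarrow 0$ and diverges to $-\infty$ as $\theta \uparrow 0$.)

Define
\[ G(\theta) = \int_0^1 g(\theta; x)\,dx = \theta^3 \int_0^1 x^{-2} e^{-\theta^2/x}\,dx. \]
Consulting a table of integrals if necessary, we find the indefinite integral
\[ \int x^{-2} e^{-a/x}\,dx = e^{-a/x}/a. \]
Thus, letting $a = \theta^2$, we have
\[ G(\theta) = \theta e^{-\theta^2} \qquad (\theta \in \mathbb{R}). \]
The function $G$ is plotted in Figure 2. Differentiation yields
\[ G'(\theta) = (1 - 2\theta^2)\, e^{-\theta^2} \qquad (\theta \in \mathbb{R}). \]
Note that $G'(0) = (1 - 2 \cdot 0^2)\, e^{-0^2} = 1$.

Now let's compute $D_1 g(\theta; x)$. For $x = 0$, $g(\theta; x) = 0$ for all $\theta$, so $D_1 g(\theta; 0) = 0$. For $x > 0$, we have
\[
\begin{aligned}
D_1 g(\theta; x) &= \frac{3\theta^2}{x^2}\, e^{-\theta^2/x} + \frac{\theta^3}{x^2}\, e^{-\theta^2/x} \Bigl( -\frac{2\theta}{x} \Bigr) \\
&= \Bigl( \frac{3\theta^2}{x^2} - \frac{2\theta^4}{x^3} \Bigr) e^{-\theta^2/x}.
\end{aligned}
\]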
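The two closed forms just derived, $G(\theta) = \theta e^{-\theta^2}$ and the formula for $D_1 g$, can be spot-checked numerically. The following is a rough sketch, not part of the original example: the Simpson-rule helper, the grid size, and the test points are arbitrary illustrative choices, and the check is only meaningful for values of $\theta$ that are not too close to zero, where the integrand develops a sharp spike near $x = 0$.

```python
import math

def g(theta, x):
    """The example's integrand, with g(theta, 0) defined to be 0."""
    if x == 0.0:
        return 0.0
    return theta**3 / x**2 * math.exp(-theta**2 / x)

def d1g(theta, x):
    """Partial derivative of g with respect to theta, as derived in the text."""
    if x == 0.0:
        return 0.0
    return (3 * theta**2 / x**2 - 2 * theta**4 / x**3) * math.exp(-theta**2 / x)

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def G_numeric(theta):
    return simpson(lambda x: g(theta, x), 0.0, 1.0)

def G_closed(theta):
    return theta * math.exp(-theta**2)

# Quadrature versus the closed form at a few moderate values of theta.
G_errors = [abs(G_numeric(t) - G_closed(t)) for t in (-1.0, 0.5, 2.0)]

# Central difference in theta versus the formula for D_1 g at one point.
theta, x, h = 0.8, 0.3, 1e-6
fd = (g(theta + h, x) - g(theta - h, x)) / (2 * h)
d1g_error = abs(fd - d1g(theta, x))

print(G_errors, d1g_error)
```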
[Figure 1. Plots of $g(\theta; x) = \dfrac{\theta^3}{x^2}\, e^{-\theta^2/x}$ ($x > 0$). Panels: surface of graph; contours.]
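[Figure 2. Plot of $G(\theta)$.]

So
\[ D_1 g(\theta; x) = \begin{cases} \Bigl( \dfrac{3\theta^2}{x^2} - \dfrac{2\theta^4}{x^3} \Bigr) e^{-\theta^2/x} & x > 0, \\[1ex] 0 & x = 0. \end{cases} \]
(Note that for fixed $\theta$, the limit of $D_1 g(\theta; x)$ as $x \downarrow 0$ is zero, so for each fixed $\theta$, $D_1 g(\theta; x)$ is continuous in $x$. Also, for each fixed $x$, $D_1 g(\theta; x)$ is continuous in $\theta$. But again, along the curve $x = \theta^2$, we have
\[ D_1 g(\theta; x) = \Bigl( \frac{3}{\theta^2} - \frac{2}{\theta^2} \Bigr) e^{-1} = e^{-1}/\theta^2, \]
which diverges to $\infty$ as $\theta \to 0$. Thus $D_1 g(\theta; x)$ is not jointly continuous at $(0, 0)$. See Figures 3 and 4.)

Define the integral
\[ I(\theta) = \int_0^1 D_1 g(\theta; x)\,dx = \int_0^1 \Bigl( \frac{3\theta^2}{x^2} - \frac{2\theta^4}{x^3} \Bigr) e^{-\theta^2/x}\,dx. \]
It satisfies $I(0) = 0$, since $D_1 g(0; x) = 0$ for every $x$. At $\theta = 0$, we have
\[ G'(0) = (1 - 2 \cdot 0^2)\, e^{-0^2} = 1 \ne 0 = I(0) = \int_0^1 D_1 g(0; x)\,dx. \]
So the conclusion of Theorem 4 fails.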
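The mismatch at $\theta = 0$ can be made concrete with a few numbers. The sketch below is illustrative only: it takes the closed form $G(\theta) = \theta e^{-\theta^2}$ from the example as given, forms difference quotients of $G$ at $0$, and integrates $D_1 g$ by a hand-written Simpson rule (the helper names, step sizes, and grid size are assumptions made for this illustration). For contrast it also evaluates $I$ at the arbitrary nonzero point $\theta = 0.5$, where no discrepancy shows up.

```python
import math

def G(theta):
    """Closed form from the example: G(theta) = theta * exp(-theta^2)."""
    return theta * math.exp(-theta**2)

def G_prime(theta):
    """Derivative of the closed form: (1 - 2 theta^2) exp(-theta^2)."""
    return (1 - 2 * theta**2) * math.exp(-theta**2)

def d1g(theta, x):
    """Partial derivative of g with respect to theta (0 at x = 0)."""
    if x == 0.0:
        return 0.0
    return (3 * theta**2 / x**2 - 2 * theta**4 / x**3) * math.exp(-theta**2 / x)

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def I(theta):
    """Integral over [0, 1] of D_1 g(theta; x)."""
    return simpson(lambda x: d1g(theta, x), 0.0, 1.0)

# Difference quotients (G(h) - G(0)) / h approach G'(0) = 1 ...
quotients = [(G(h) - G(0.0)) / h for h in (1e-1, 1e-2, 1e-3)]
# ... while the integrand D_1 g(0; x) is identically zero, so I(0) = 0.
I_at_zero = I(0.0)

# Away from zero the two sides agree (theta = 0.5 is an arbitrary choice).
I_half, G_prime_half = I(0.5), G_prime(0.5)

print(quotients, I_at_zero, I_half, G_prime_half)
```

The difference quotients tend to 1 while the integral of the derivative at $\theta = 0$ is exactly 0, which is the failure described in the text.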
[Figure 3. Plots of the log of $D_1 g(\theta; x) = \Bigl( \dfrac{3\theta^2}{x^2} - \dfrac{2\theta^4}{x^3} \Bigr) e^{-\theta^2/x}$. Panels: surface of the log graph; contours.]
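The failure of the theorem is confined to the case $\theta = 0$. For $\theta > 0$, $I(\theta)$ can be computed as
\[
\begin{aligned}
I(\theta) &= \int_0^1 D_1 g(\theta; x)\,dx = \int_0^1 \Bigl( \frac{3\theta^2}{x^2} - \frac{2\theta^4}{x^3} \Bigr) e^{-\theta^2/x}\,dx \\
&= 3\theta^2 \int_0^1 \frac{1}{x^2}\, e^{-\theta^2/x}\,dx - 2\theta^4 \int_0^1 \frac{1}{x^3}\, e^{-\theta^2/x}\,dx \\
&= 3\theta^2 \Bigl[ \frac{e^{-\theta^2/x}}{\theta^2} \Bigr]_{x=0}^{x=1} - 2\theta^4 \Bigl[ e^{-\theta^2/x} \Bigl( \frac{1}{\theta^4} + \frac{1}{\theta^2 x} \Bigr) \Bigr]_{x=0}^{x=1} \\
&= (1 - 2\theta^2)\, e^{-\theta^2},
\end{aligned}
\]
where the value at $x = 0$ is understood as the limit as $x \downarrow 0$, which is zero in both brackets. So for $\theta > 0$, we have
\[ G'(\theta) = (1 - 2\theta^2)\, e^{-\theta^2} = I(\theta) = \int_0^1 D_1 g(\theta; x)\,dx. \]
So the only problem is right at $\theta = 0$.

Let's check Assumption 3 for $\theta_0 = 0$. We need to find an $\varepsilon > 0$ so that $|\theta| < \varepsilon$ implies $|D_1 g(\theta; x)| \le h(x)$, where $\int_0^1 h(x)\,dx < \infty$. Now for $x > 0$,
\[ D_1 g(\theta; x) = \Bigl( \frac{3\theta^2}{x^2} - \frac{2\theta^4}{x^3} \Bigr) e^{-\theta^2/x}. \]
Looking at points of the form $x = \theta^2$, we see that $h(x)$ must satisfy
\[ h(x) \ge D_1 g(\sqrt{x}; x) = \Bigl( \frac{3x}{x^2} - \frac{2x^2}{x^3} \Bigr) e^{-1} = e^{-1}/x \]
whenever $\sqrt{x} < \varepsilon$. See Figure 4. So for any $\varepsilon > 0$, setting $\delta = \min\{\varepsilon^2, 1\}$, every $x$ with $0 < x < \delta$ satisfies $h(x) \ge e^{-1}/x$. Thus
\[ \int_0^1 h(x)\,dx \ge \int_0^{\delta} h(x)\,dx \ge e^{-1} \int_0^{\delta} \frac{1}{x}\,dx = \infty. \]
Thus Assumption 3 is violated by this example, and the conclusion of the theorem fails.

References

[1] C. D. Aliprantis and O. Burkinshaw. 1998. Principles of real analysis, 3d ed. San Diego: Academic Press.

[2] J. Dieudonné. 1969. Foundations of modern analysis. Number 10-I in Pure and Applied Mathematics. New York: Academic Press. Volume 1 of Treatise on Analysis.

[3] B. R. Gelbaum and J. M. Olmsted. 2003. Counterexamples in analysis. Mineola, NY: Dover. Slightly corrected reprint of the 1965 printing of the work originally published in 1964 by Holden-Day in San Francisco.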
[Figure 4. Cross-sections of $D_1 g(\theta; x) = \Bigl( \dfrac{3\theta^2}{x^2} - \dfrac{2\theta^4}{x^3} \Bigr) e^{-\theta^2/x}$.]
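The failure of Assumption 3 at $\theta_0 = 0$ can also be probed numerically. The sketch below is an illustration under stated assumptions, not part of the original argument: it approximates the envelope $\sup_{|\theta| < \varepsilon} |D_1 g(\theta; x)|$ by a maximum over a finite grid of $\theta$ values (which can only underestimate the true supremum), with $\varepsilon = 0.5$ and the sample points chosen arbitrarily. It then tabulates the lower bound $e^{-1} \int_\delta^{\varepsilon^2} dx/x = e^{-1} \ln(\varepsilon^2/\delta)$ on the integral of any dominating function, for shrinking $\delta$.

```python
import math

def d1g(theta, x):
    """Partial derivative of g with respect to theta (0 at x = 0)."""
    if x == 0.0:
        return 0.0
    return (3 * theta**2 / x**2 - 2 * theta**4 / x**3) * math.exp(-theta**2 / x)

def envelope(x, eps, m=2000):
    """Grid approximation (from below) of sup over |theta| < eps of |D_1 g(theta; x)|."""
    return max(abs(d1g(eps * k / m, x)) for k in range(-m + 1, m))

eps = 0.5  # illustrative strip half-width around theta_0 = 0

# For 0 < x < eps^2, any dominating h must satisfy h(x) >= e^{-1} / x.
xs = (0.1, 0.01, 0.001)
envelope_values = [envelope(x, eps) for x in xs]
required = [math.exp(-1) / x for x in xs]

# Lower bound on the integral of h over [delta, eps^2]; grows without bound
# as delta shrinks, so no integrable h can exist.
deltas = (1e-2, 1e-4, 1e-8)
lower_bounds = [math.exp(-1) * math.log(eps**2 / d) for d in deltas]

print(envelope_values, required, lower_bounds)
```

The logarithmic growth of the lower bounds is slow, but it is unbounded, which is all the argument needs.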